Forensic Science International: Genetics 5 (2011) 146–151

Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/fsig

The GHEP–EMPOP collaboration on mtDNA population data – A new resource for forensic casework

L. Prieto a, B. Zimmermann b, A. Goios c, A. Rodriguez-Monge a, G.G. Paneto d, C. Alves c, A. Alonso e, C. Fridman f, S. Cardoso g, G. Lima h, M.J. Anjos i, M.R. Whittle j, M. Montesino a, R.M.B. Cicarelli d, A.M. Rocha c, C. Albarrán e, M.M. de Pancorbo g, M.F. Pinheiro h, M. Carvalho i, D.R. Sumita j, W. Parson b,*

a Comisaría General de Policía Científica, University Institute of Research in Forensic Sciences (IUICP), Madrid, Spain
b Institute of Legal Medicine, Innsbruck Medical University, Austria
c Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
d Laboratory of Paternity, UNESP, Univ. Estadual Paulista, São Paulo, Brazil
e National Institute of Toxicology and Forensic Sciences (INTCF), Madrid, Spain
f Department of Legal Medicine, Bioethics and Occupational Health, Medical School, University of São Paulo, Brazil
g BIOMICs Research Group, Centro de Investigación y Estudios Avanzados “Lucio Lascaray”, University of the Basque Country, Vitoria-Gasteiz, Spain
h National Institute of Legal Medicine, North Branch, Porto, Portugal
i National Institute of Legal Medicine, Centre Branch, Coimbra, Portugal
j Genomic Engenharia Molecular, São Paulo, Brazil

ARTICLE INFO

Keywords: mtDNA; Mitochondrial DNA; GHEP-ISFG; EMPOP; Population analyses; Forensic science

ABSTRACT

Mitochondrial DNA (mtDNA) population data for forensic purposes are still scarce for some populations, which may limit the evaluation of forensic evidence, especially when the rarity of a haplotype needs to be determined in a database search. In order to improve the collection of mtDNA lineages from the Iberian and South American subcontinents, we here report the results of a collaborative study involving nine laboratories from the Spanish and Portuguese-Speaking Working Group of the International Society for Forensic Genetics (GHEP-ISFG) and EMPOP. The individual laboratories contributed population data that were generated throughout the past 10 years, but in the majority of cases have not been made available to the scientific community. A total of 1019 haplotypes from Iberia (Basque Country, 2 general Spanish populations, 2 North and 1 Central Portugal populations) and Latin America (3 populations from São Paulo) were collected, reviewed and harmonized according to defined EMPOP criteria. The majority of data ambiguities that were found during the reviewing process (41 in total) were transcription errors, confirming that the documentation process is still the most error-prone stage in reporting mtDNA population data, especially when performed manually. This GHEP–EMPOP collaboration has significantly improved the quality of the individual mtDNA datasets and adds mtDNA population data as a valuable resource to the EMPOP database (www.empop.org).

© 2010 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

The importance of mitochondrial DNA (mtDNA) analysis is still growing and it has now become an essential technique in dedicated forensic laboratories [1]. mtDNA is usually investigated in forensic casework when not enough nuclear DNA is available in a questioned sample or when it is necessary to evaluate maternal relationships between individuals. When two mtDNA haplotypes cannot be excluded as originating from the same source, mtDNA databases are queried to determine the rarity of that profile.

* Corresponding author.
Tel.: +43 512 9003 70640; fax: +43 512 9003 73640. E-mail address: walther.parson@i-med.ac.at (W. Parson).
1872-4973/$ – see front matter © 2010 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.fsigen.2010.10.013

Laboratories performing forensic mtDNA testing usually have data sets of their local population(s) at hand to aid frequency searches. Unfortunately, these data sets are usually not available to the general forensic community and are therefore of limited use. Also, some of these data may contain errors or ambiguities as they only rarely – if at all – undergo independent data quality review [2]. However, they constitute a valuable source of information, as mtDNA population data for forensic purposes are generally still in demand. In order to make those data accessible, the individual data sets need to be collected, reviewed and harmonized in a number of aspects, including the systematic performance of plausibility checks, the minimization of error, the adaptation of the sequencing ranges and the standardized presentation (alignment and annotation) of the mtDNA haplotypes.

Table 1
List of participating laboratories in the collaborative GHEP-ISFG–EMPOP study.
Laboratory | Samples | Year of data generation | Range | Publication
Comisaría General de Policía Científica (Madrid, Spain) | 249 | 2000–2010 | Variable, but at least 16024–16365 and 72–340 | This publication
National Institute of Toxicology and Forensic Sciences, INTCF (Madrid, Spain) | 154 | 1995–2000 | 16024–16365 and 73–340 | This publication
Laboratory of Paternity, UNESP, Univ. Estadual Paulista (São Paulo, Brazil) | 142 | 2006–2010 | 16024–576 | This publication
Institute of Molecular Pathology and Immunology of the University of Porto, IPATIMUP (Porto, Portugal) | 132 | 2008–2009 | 16024–576 | This publication
Department of Legal Medicine, Bioethics and Occupational Health, Medical School, University of São Paulo, Brazil | 102 | 2006–2009 | 16024–576 | ONLY EMPOP + future publication
BIOMICs Research Group, Centro de Investigación y Estudios Avanzados “Lucio Lascaray”, University of the Basque Country (Vitoria-Gasteiz, Spain) | 84 | 2003–2007 | 16024–16383 and 66–370 | 29 new haplotypes this publication; 55 haplotypes already published in Ref. [5]
National Institute of Legal Medicine, North Branch (Porto, Portugal) | 55 | 2005–2008 | 16024–16391 and 30–408; 10 codR SNPs + 1 non-coding region SNP | This publication
National Institute of Legal Medicine, Centre Branch (Coimbra, Portugal) | 53 | 2000–2005 | 16024–16365 and 72–340 | This publication
Genomic Engenharia Molecular (São Paulo, Brazil) | 48 | 2002–2007 | 16024–16365 and 73–340 | This publication
Total | 1019

The EDNAP Mitochondrial DNA Population Database (EMPOP) is a collaborative project among forensic and population genetic laboratories worldwide with the aim of increasing the amount of reliable mtDNA population data in a searchable format via the internet (www.empop.org) [3]. The currently available version (Release 2) contains 10,970 haplotypes that have undergone meticulous revision using software-based format and plausibility control and inspection of the data with phylogenetic methods. Although populations of west Eurasian origin are the best represented in EMPOP, it is necessary to continue their collection, especially for underrepresented populations at the regional level, which is the case for Iberian and also South American lineages.
In addition, the phenomenon of migration is influencing the dynamics of populations and new studies are necessary for a more accurate evaluation of the frequency and distribution of mtDNA lineages.

The current study follows a similar initiative driven by the Italian Ge.F.I. Group [4], which collected a total of 395 mtDNA haplotypes from Italy generated by 8 forensic laboratories. Those data were assembled and scrutinized with respect to EMPOP quality criteria and uploaded onto the database, thus making them available to the forensic community. In the current study, the Spanish and Portuguese-Speaking Working Group of the International Society for Forensic Genetics (GHEP-ISFG) has carried out a collaborative exercise by collecting and reviewing a total of 1019 haplotypes from different Iberian and Latin American populations that have been generated in the respective laboratories throughout the past 10 years. The current paper describes the organization of the collaboration and the methods of data review. Observed ambiguities and questionable base calls were communicated to the authors, who inspected raw data for review and clarification. Finally, a comparative analysis of the Iberian populations is presented to support the data with forensically relevant information.

2. Materials and methods

2.1. Participants, samples and requirements

Participating laboratories and the number of contributed samples are shown in Table 1. This collaborative exercise was open to all GHEP labs that met the following requirements: (a) successful participation in the 2008 GHEP mtDNA proficiency test control exercise; (b) supply of mtDNA haplotypes of about 50 unrelated individuals (as far as could possibly be determined) with (c) established geographical origin (region/city/population); (d) minimum sequencing coverage of HVS-I (16024–16365) and HVS-II (73–340) and (e) retention of raw data, if available both forward and reverse sequence information.
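Requirement (d) can be checked mechanically once a laboratory's sequencing range is written as one or more position intervals. The following Python sketch is illustrative only and is not part of the original study: the function names are hypothetical, the reference length of 16569 positions is that of the rCRS, and a range such as 16024–576 (Table 1) is assumed to run across the numbering origin, i.e. from 16024 to 16569 and on from 1 to 576.

```python
# Length of the human mtDNA reference sequence (rCRS) in base pairs.
MTDNA_LENGTH = 16569


def expand(start, end, genome_len=MTDNA_LENGTH):
    """Return all positions from start to end (inclusive).

    Assumption: when start > end the interval crosses the numbering
    origin of the circular genome (e.g. 16024-576).
    """
    if start <= end:
        return set(range(start, end + 1))
    return set(range(start, genome_len + 1)) | set(range(1, end + 1))


# Minimum coverage named in requirement (d): HVS-I and HVS-II.
REQUIRED = expand(16024, 16365) | expand(73, 340)


def meets_minimum_coverage(segments):
    """True if the sequenced segments together cover HVS-I and HVS-II.

    `segments` is a list of (start, end) tuples (hypothetical input format).
    """
    covered = set()
    for start, end in segments:
        covered |= expand(start, end)
    return REQUIRED <= covered
```

Under these assumptions the range 16024–16383 and 66–370 reported for the Basque sample passes, as does the continuous range 16024–576, whereas a hypothetical HVS-II read that starts only at position 100 would fail.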
None of the data included herein have been published elsewhere, except for 55 samples from the Basque Country (of a total of 84) that were previously presented in [5]. We therefore add 29 new haplotypes from the Basque Country to the pool of data in the course of this study. We also note that a subset of 102 lineages from Brazil was part of the evaluation process described herein, but the individual haplotypes will be published in a different context later (Table 1).

2.2. Summary of methods

The mtDNA sequences were generated between 1995 and 2010. Consequently, a wide variety of methods for DNA extraction, amplification, sequencing and electrophoresis were used. We aimed to take into account specific details that have a known effect on data interpretation, such as the older version of the Taq polymerase that left specific footprints in sequence electropherograms and was thus prone to introduce phantom mutations [6]. Details are summarized in Table 2.

Table 2
Analysis methods employed to generate the mtDNA population data.
Laboratory | DNA extraction | Amplification primers | Sequencing primers | Sequencing chemistry | Sequencing machine
Comisaría General de Policía Científica (Madrid, Spain) | P/C/I-Centricon | L15997/H16395 or H17; L48/H408; L350/H619 or L16555/H619 | L15997, H16395, L16555, L16209, H16164, L48, H17, H408, H285, L318, L350, H619 | BigDye Terminator v2.0, v3.0 and v3.1 | ABI 377/310/3130
National Institute of Toxicology and Forensic Sciences, INTCF (Madrid, Spain) | P/C/I-Centricon | L15997/H16391; L48/H408 | L15997, H16391, L16209, H16164, L48, H408 | dRhodamine Terminator | ABI 377
Laboratory of Paternity, UNESP, Univ. Estadual Paulista (São Paulo, Brazil) | FTA Reagent (Whatman) | L15997/H639 | L15997, H16401, L16209, H16164, L29, H408, H159, H285, L314, H599, H639 | BigDye Terminator v3.1 | ABI 3130
Institute of Molecular Pathology and Immunology of the University of Porto, IPATIMUP (Porto, Portugal) | Chelex | L15997/H639; L15900/H599 | L15900, L15997, H16, H159, L16268, L16555, L314, H599, H639 | BigDye Terminator v3.1 | ABI 3130/3100
Department of Legal Medicine, Bioethics and Occupational Health, Medical School, University of São Paulo, Brazil | Salting out [7] | L15978/H16420; L29/H306; L153/H429; L256/H653 | L15978, H16420, L29, H306, L153, H429, L256, H653 | BigDye Terminator v3.1 | ABI 3100/3130
BIOMICs Research Group, Centro de Investigación y Estudios Avanzados “Lucio Lascaray”, University of the Basque Country (Vitoria-Gasteiz, Spain) | Organic | L15996/H16401; L29/H408 | L15996, L29, H16401, H408 | dRhodamine Terminator and BigDye Terminator v3.1 | ABI 310/3130
National Institute of Legal Medicine, North Branch (Porto, Portugal) | Chelex or P/C/I | L15996/H16401; L29/H408; SNPs: [8] | M13 Forward, M13 Reverse | BigDye Terminator v1.1 | ABI 310/3100
National Institute of Legal Medicine, Centre Branch (Coimbra, Portugal) | Chelex | L15997/H16401/L16209/H16164; L48/H408/L314/H285 | L15997/H16401/L16209/H16164; L48/H408/L314/H285 | BigDye Terminator v1.1 | ABI 3130
Genomic Engenharia Molecular (São Paulo, Brazil) | FTA Reagent (Whatman) | L15990/H16391; L34/H370 | L15990/H16391/L16190/H16187; L34/H370/L313/H306 | BigDye Terminator v3.1 | ABI 377/3130xl

2.3. EMPOP revision process

The analysis of mtDNA is usually more challenging for a forensic laboratory than Short Tandem Repeat typing. This is because of its biological characteristics that may lead to difficulties for interpretation, such as heteroplasmy and potential uncertainty of exclusion/non-exclusion scenarios, as well as technical peculiarities, e.g. the lack of standardized commercial support to aid the laboratory process (manufacturing kits), the elevated risk of contamination and sequencing artifacts. In addition, numerous steps in the entire laboratory process lack automation. Critical issues therefore include the separate amplification of HVS-I and HVS-II, which harbors an increased risk of mixing up samples (artificial recombination), and the manual transfer of tabular data. Previous publications have aptly demonstrated these problems by example [9]. Therefore, a careful revision of the mtDNA haplotypes is crucial before they can be used for forensic interpretation in mtDNA databases. We performed IT-based evaluation of the data using formal and phylogenetic methods, such as NETWORK [3,10], to evaluate the following sources of error:
(a) Reference bias.
(b) Phantom mutations.
(c) Base mis-scoring.
(d) Nomenclature issues.
(e) Alignment violation.
(f) Clerical errors.

Table 3
Classification of ambiguities after revision and confirmation by the raw lane data.

(a) Reference bias
Polymorphism | Times
72C | 1
73G | 2
210G | 1
315.1C | 1
16355T | 1
16360T | 1
16390A | 1
Total = 8

(b) Phantom mutation
Position | Times
16293M | 1
527G | 1
Total = 2

(c) Base mis-scoring
Mistaken | Correct | Times
114G | 114A | 1
146T | 146C | 1
150C | 150T | 2
150G | 150T | 1
152T | 152C | 2
195T | 195C | 1
16278G | 16278T | 1
16356T | 16356C | 3
Total = 12

(d) Nomenclature
Position | Times
309.2C without 309.1C | 8
Total = 8

(e) Alignment violation
Position | Times
523.1C 524.1A instead of 524.1A 524.2C | 3
Total = 3

(f) Clerical errors
Mistaken | Correct | Times
163G | 263G | 1
315C | 315.1C | 2
1620G | 16207G | 1
16218C | 16182C | 1
16223 | 16223T | 1
16278C | 16288C | 1
19294T | 16294T | 1
Total = 8

We further aimed at achieving uniformity regarding the following aspects:
(g) Haplogroup assignment, following [11; phylotree, build 10].
(h) Alignment and annotation in length variant regions.
(i) Confirmation of point heteroplasmy.
(j) Revision of sample affiliation (metadata).
(k) Achieving best possible uniformity of sequence ranges.

Compilation and revision processes were carried out at the Comisaría General de Policía Científica (Madrid) and reviewed by the EMPOP group at the Institute of Legal Medicine, Innsbruck Medical University. All polymorphisms were finally cross-referenced against commonly observed phantom mutations [12] and apparent “new polymorphisms” were evaluated using mtDNA literature data and direct Internet queries [13]. When necessary, contributing authors were asked to support their findings with raw data (electropherograms) to evaluate specific polymorphisms.

2.4. Population studies

Molecular diversity indices, pairwise differences between and within populations and an analysis of molecular variance (AMOVA) were calculated using ARLEQUIN (Version 3.5) [14]. The random match probability was calculated as the sum of squared haplotype frequencies based on mtDNA control region sequences. All sequences were aligned and trimmed to a greatest common range of ntps 16024–16365 and ntps 73–340; length variation around ntps 16193 and 309 was disregarded.

3. Results and discussion

3.1. Results of the revision process

A total of 1019 mtDNA haplotypes from 9 populations were examined in the present study (Table 1 and Table S1), of which 154 (from Spain) were already contributed and evaluated earlier. Another 249 haplotypes came from the organizing laboratory (Madrid) and 132 (North Portugal) were generated de novo in the course of this project. Therefore, the total number of as yet unreviewed haplotypes was 484. The communication with the authors of the sequences allowed the correction of questionable polymorphisms in 41 haplotypes (8.5%). The following sections list those according to their source (see also Section 2 and Table 3).

3.1.1. Reference bias

Reference bias is one of the most abundant forms of clerical error; it manifests as a failure to report a polymorphism relative to the rCRS. Note that in some cases (not observed here) other “Anderson sequences” are also mistakenly used as the reference sequence against which the consensus sequences are reported, which can then result in a similar problem. Reference bias is more frequently observed at the beginning and at the end of sequencing strands, due to decreased quality of the electropherograms there. If reverse sequencing reactions are missing or of low quality, reference biases are more frequent. In the present study we noted 8 instances, 3 of which were located at the beginning and 3 at the end of the sequences (Table 3a).

3.1.2. Phantom mutations

Artificial signals in the sequencing electropherograms (e.g. dye blobs, unincorporated dye terminators, inadequate migration conditions leading to shoulder peaks, secondary structures, polymerase footprints, etc.) are referred to as phantom mutations, as they are designated by some analysis software as genuine base calls. This emphasizes the need for manual data review, especially when sequence quality is low. Phantom mutations are usually also located at sequence beginnings and ends, as the quality of the electropherograms is lower there. We observed two instances in this study (Table 3b), one of which (527G) is a well-known phantom hot spot [12].

3.1.3. Base mis-scoring

Base mis-scoring was found to be the most frequent error in the present study (Table 3c). It originates from manual data transfer and insufficient results review. The majority of these errors could be identified by applying stringent scrutiny when checking the data tables or by using automated plausibility checks, such as provided by the emp-tool (www.empop.org/modules/emptool/).

3.1.4. Nomenclature issues

One participating laboratory reported 8 instances of two C-insertions between positions 303 and 310 as 309.2C only, instead of the commonly used notation 309.1C 309.2C. While this constitutes a minor issue, the explicit documentation of 309.1C makes clear that there is no other base inserted here (Table 3d).

3.1.5. Alignment violation

The dinucleotide repeat region between ntps 514 and 524 was earlier referred to as a CA-repeat [15] and was later changed to an AC-repeat-based nomenclature in order to better accommodate a commonly observed transition at ntp 513 [16]. Since then, AC-insertions relative to the rCRS (five repeats) are reported as 524.1A 524.2C (in contrast to the earlier formulated 523.1C 523.2A). In the present study we observed the designation 523.1C 524.1A, which is incompatible with both alignment schemes (Table 3e). In general, the phylogenetically meaningful alignment is recommended [17].

3.1.6. Clerical error

While some of the above-mentioned issues can also be regarded as clerical errors, we list here only those that were undoubtedly introduced by manual data transfer (Table 3f). Again, those would be captured by an electronic evaluation of the data table, such as the emp-tool.

3.2. Results of the Iberian population comparisons

A total of 727 mtDNA control region haplotypes from 6 Iberian populations (Basque, Central Portugal, 2 North Portugal and 2 mixed Spain; Tables S1 and 4) were analyzed and AMOVA was used to test for significant variation in the genetic structure (Table 5). Most of the observed genetic variation was attributable to differences within populations (99.76%). Variance among populations accounted for 0.24% (Table 5a). The Basque population differed significantly in its composition of mtDNA lineages from both North Portuguese populations and one mixed Spanish population (Table 5b). This result may be explained by the relative overrepresentation of hg R0 lineages in the Basque population sample and the lack of hg L lineages that are present, albeit at low frequencies, in the other populations (Table 6). We note here that the different sample sizes may also have an effect on these results. All Iberian populations shared (common) haplotypes to a relatively great extent (Table S2). The Basque shared approximately half of their haplotypes (46.81%) with other Iberian populations from Spain and Portugal. All six Iberian populations included the same most common haplotype 263G 315.1C, which represents the most common HVS-I/II haplotype in west Eurasia (here grouped under hg R0).

Table 4
Descriptive statistics for six populations from the Iberian Peninsula. Analyzed range: ntps 16024–16365, 73–340.
Population statistics | Basque [n = 84] | Central Portugal [n = 53] | North Portugal [n = 55] | North Portugal [n = 132] | Spain [n = 249] | Spain [n = 154]
Number of haplotypes | 47 | 44 | 50 | 105 | 193 | 124
Number of unique haplotypes | 31 | 40 | 47 | 88 | 167 | 114
Random match probability | 0.043 | 0.033 | 0.023 | 0.014 | 0.014 | 0.016
Genetic diversity | 0.957 | 0.967 | 0.977 | 0.986 | 0.986 | 0.984

Table 5
AMOVA results for the six investigated Iberian populations.

(a) Design and results (d.f. stands for degrees of freedom)
Source of variation | d.f. | Sum of squares | Variance components | Percent of variation
Among populations | 5 | 22.162 | 0.00839 Va | 0.24
Within populations | 721 | 2508.999 | 3.47989 Vb | 99.76
Total | 726 | 2531.161 | 3.48828 |

(b) FST comparison among the regional populations
 | Basque [n = 84] | Central Portugal [n = 53] | North Portugal [n = 55] | North Portugal [n = 132] | Spain [n = 249] | Spain [n = 154]
Basque [n = 84] | * | 0.1290 | 0.0342 | 0.0049 | 0.0049 | 0.2432
Central Portugal [n = 53] | 0.0053 | * | 0.8731 | 0.43848 | 0.2002 | 0.3516
North Portugal [n = 55] | 0.0100 | 0.0000 | * | 0.77051 | 0.2891 | 0.4502
North Portugal [n = 132] | 0.0113 | 0.0000 | 0.0000 | * | 0.0986 | 0.4102
Spain [n = 249] | 0.0079 | 0.0025 | 0.0013 | 0.0022 | * | 0.1807
Spain [n = 154] | 0.0016 | 0.0006 | 0.0000 | 0.0002 | 0.0013 | *
FST values are below the diagonal and the p-values (1023 permutations, significance level = 0.05) above the diagonal.

(c) Population average pairwise differences
 | Basque [n = 84] | Central Portugal [n = 53] | North Portugal [n = 55] | North Portugal [n = 132] | Spain [n = 249] | Spain [n = 154]
Basque [n = 84] | 5.82 | 6.40 | 7.11 | 6.85 | 6.16 | 6.62
Central Portugal [n = 53] | 0.03 | 6.92 | 7.56 | 7.31 | 6.67 | 7.16
North Portugal [n = 55] | 0.06 | 0.04 | 8.28 | 7.97 | 7.34 | 7.83
North Portugal [n = 132] | 0.08 | 0.00 | 0.02 | 7.71 | 7.07 | 7.55
Spain [n = 249] | 0.05 | 0.01 | 0.00 | 0.01 | 6.40 | 6.90
Spain [n = 154] | 0.02 | 0.01 | 0.01 | 0.00 | 0.01 | 7.40
Above diagonal: average number of pairwise differences between populations (PiXY); diagonal elements: average number of pairwise differences within population (PiX); below diagonal: corrected average pairwise difference (PiXY − (PiX + PiY)/2).

Table 6
Observed haplogroup frequencies in the Iberian populations.
Haplogroup | Basque [n = 84] | Central Portugal [n = 53] | North Portugal [n = 55] | North Portugal [n = 132] | Spain [n = 249] | Spain [n = 154]
R0 | 67.9% | 49.1% | 49.1% | 45.5% | 56.6% | 51.3%
JT | 15.5% | 26.4% | 20.0% | 17.4% | 14.9% | 13.0%
UK | 9.5% | 15.1% | 16.5% | 19.7% | 22.9% | 20.1%
R* | 2.4% | 1.9% | 3.6% | 1.5% | 0.4% | 4.6%
N* | 4.7% | 7.5% | 3.6% | 9.8% | 2.4% | 7.8%
M | 0.0% | 0.0% | 3.6% | 1.5% | 0.8% | 0.6%
L | 0.0% | 0.0% | 3.6% | 4.6% | 2.0% | 2.6%

4. Conclusions

One of the most important issues in the forensic use of mtDNA analyses is the difficulty of accurately conveying the significance of a match (non-exclusion) between unknown and reference samples to the court.
Non-DNA experts may not immediately be aware of the difference between nDNA and mtDNA evidence, which can then lead to overestimation of the mtDNA match (or underestimation of its significance when only statistical numbers are compared). Also, reliable mtDNA population data in forensics are still scarce although many studies have been published. A sometimes unacceptable rate of error unfortunately makes some of these studies unusable. This is one of the main reasons why forensic mtDNA database projects need to be expanded. Due to the wide variability of populations that are represented in the GHEP-ISFG group, and in order to join forces and make individual datasets available to the forensic community, we have carried out the present project in collaboration with the EMPOP database. The submission of our data has been very useful since some of our populations are not yet represented in EMPOP (Release 2). Our data reviewing process confirmed earlier findings [2,18] that the majority of errors occur due to manual documentation processes without rigorous scrutiny. This study demonstrates that a posteriori plausibility and phylogenetic evaluations help to uncover data idiosyncrasies and obvious errors. By inspection of the raw data we were then able to resolve ambiguities.

Acknowledgements

Antonio Amorim is gratefully acknowledged for hosting the inauguration of the GHEP–EMPOP collaboration and for useful discussion. We are grateful for the contribution of Alexander Röck (Innsbruck), who provided software tools to handle the data. We would like to thank Theresa Harm (Innsbruck) for careful analysis of raw data. This study received support from the Austrian Science Fund (FWF): TR397.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.fsigen.2010.10.013.

References

[1] M.M. Holland, T.J. Parsons, Mitochondrial DNA sequence analysis-validations and use for forensic casework, Forensic Sci. Rev. 11 (1999) 21–50.
[2] W. Parson, H.-J. Bandelt, Extended guidelines for mtDNA typing of population data in forensic science, Forensic Sci. Int. Genet. 1 (2007) 13–19.
[3] W. Parson, A. Dür, EMPOP – a forensic mtDNA database, Forensic Sci. Int. Genet. 1 (2007) 88–92.
[4] C. Turchi, L. Buscemi, C. Previderè, P. Grignani, A. Brandstätter, A. Achilli, W. Parson, A. Tagliabracci, Ge.F.I. Group, Italian mitochondrial DNA database: results of a collaborative exercise and proficiency testing, Int. J. Legal Med. 122 (2008) 199–204.
[5] M.A. Alfonso-Sánchez, S. Cardoso, C. Martínez-Bouzas, J.A. Peña, R.J. Herrera, A. Castro, I. Fernández-Fernández, M.M. De Pancorbo, Mitochondrial DNA haplogroup diversity in Basques: a reassessment based on HVI and HVII polymorphisms, Am. J. Hum. Biol. 20 (2008) 154–164.
[6] W. Parson, The art of reading sequence electropherograms, Ann. Hum. Gen. 71 (2007) 276–278.
[7] S.A. Miller, D.D. Dykes, H.F. Polesky, A simple salting out procedure for extracting DNA from human nucleated cells, Nucleic Acids Res. 16 (3) (1988) 1215.
[8] P.M. Vallone, R.S. Just, M.D. Coble, J.M. Butler, T.J. Parsons, A multiplex allele-specific primer extension assay for forensically informative SNPs distributed throughout the mitochondrial genome, Int. J. Legal Med. 118 (2004) 147–157.
[9] H.-J. Bandelt, P. Lahermo, M. Richards, V. Macaulay, Detecting errors in mtDNA data by phylogenetic analyses, Int. J. Legal Med. 115 (2001) 64–69.
[10] A. Brandstätter, R. Klein, N. Duftner, P. Wiegand, W. Parson, Application of a quasi-median network analysis for the visualization of character conflicts to a population sample of mitochondrial DNA control region sequences from southern Germany (Ulm), Int. J. Legal Med. 120 (2006) 310–314.
[11] M. van Oven, M. Kayser, Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation, Hum. Mutat. 30 (2009) E386–E394.
[12] A. Brandstätter, T. Sänger, S. Lutz-Bonengel, W. Parson, E. Béraud-Colomb, B. Wen, Q.-P. Kong, C.M. Bravi, H.-J. Bandelt, Phantom mutation hotspots in human mitochondrial DNA, Electrophoresis 26 (2005) 3414–3429.
[13] H.-J. Bandelt, A. Salas, C.M. Bravi, What is a ‘novel’ mtDNA mutation – and does ‘novelty’ really matter? J. Hum. Genet. 51 (2006) 1073–1082.
[14] L. Excoffier, H.E.L. Lischer, Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows, Mol. Ecol. Resour. 10 (2010) 564–567.
[15] S. Lutz, H.J. Weisser, J. Heizmann, S. Pollak, A third hypervariable region in the human mitochondrial D-loop, Hum. Genet. 101 (1997) 384.
[16] M.R. Wilson, M.W. Allard, K. Monson, K.W. Miller, B. Budowle, Recommendations for consistent treatment of length variants in the human mitochondrial DNA control region, Forensic Sci. Int. 10 (2002) 35–42.
[17] H.-J. Bandelt, W. Parson, Consistent treatment of length variants in the human mtDNA control region: a reappraisal, Int. J. Legal Med. 122 (2008) 11–21.
[18] W. Parson, A. Brandstätter, A. Alonso, N. Brandt, B. Brinkmann, A. Carracedo, D. Corach, O. Froment, I. Furac, T. Grzybowski, K. Hedberg, C. Keyser-Tracqui, T. Kupiec, S. Lutz-Bonengel, B. Mevag, R. Ploski, H. Schmitter, P. Schneider, D. Syndercombe-Court, E. Sørensen, H. Thew, G. Tully, R. Scheithauer, The EDNAP mitochondrial DNA population database (EMPOP) collaborative exercises: organisation, results and perspectives, Forensic Sci. Int. 139 (2004) 215–226.
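Worked example for Section 2.4. The random match probability defined there (the sum of squared haplotype frequencies) can be illustrated with a short Python sketch. It is not the ARLEQUIN implementation used in the study; the function names and the four-sample toy data set are hypothetical. Genetic diversity is taken here as 1 minus the random match probability, an assumption that is consistent with the paired values in Table 4 (for example 0.043 and 0.957 for the Basque sample).

```python
from collections import Counter


def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies in a population sample."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return sum((count / n) ** 2 for count in counts.values())


def genetic_diversity(haplotypes):
    """Assumed here to be 1 - RMP, which matches the pairs in Table 4."""
    return 1.0 - random_match_probability(haplotypes)


# Toy sample (illustrative only): four individuals, three distinct haplotypes.
toy_sample = [
    "263G 315.1C",
    "263G 315.1C",
    "73G 263G 315.1C",
    "16223T 73G 263G",
]

rmp = random_match_probability(toy_sample)   # (2/4)^2 + (1/4)^2 + (1/4)^2
diversity = genetic_diversity(toy_sample)
```

For the toy sample the frequencies are 0.5, 0.25 and 0.25, so the random match probability is 0.375 and the diversity 0.625. A sample in which every haplotype is unique gives the lowest possible value, 1/n.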