Title: Genotyping errors
1Genotyping errors
- Pierre Taberlet, Eva Bellemain, Aurélie Bonin, François Pompanon
- Laboratoire d'Ecologie Alpine, CNRS UMR 5553, Université Joseph Fourier, Grenoble, France
2Genotyping errors
- Bonin A, Bellemain E, Bronken Eidesen P, Pompanon F, Brochmann C, Taberlet P (2004) How to track and assess genotyping errors in population genetics studies. Molecular Ecology, 13, 3261-3273.
- Pompanon F, Bonin A, Bellemain E, Taberlet P (2005) Genotyping errors: causes, consequences and solutions. Nature Reviews Genetics, in press.
3Genotyping errors
- Definition
- Non-invasive sampling and genotyping errors
- Causes of genotyping errors
- Quantifying genotyping errors
- Consequences of genotyping errors
- How to limit genotyping errors and their impact?
5Definition
- A genotyping error occurs when the observed genotype of an individual does not correspond to the true genotype.
- Genotyping errors can have strong consequences on the biological message that can be deduced from the data.
6Distribution of papers on "genotyping errors"
according to their publication year
- Apparently, more and more attention is paid to
genotyping errors.
7Distribution of papers on "genotyping errors"
according to their subject
- Genotyping errors are a concern in only some research fields (linkage analyses, non-invasive methods).
- What about the other fields using genetic tools (population genetics/genomics)?
8Genotyping errors
- Definition
- Non-invasive sampling and genotyping errors
- Causes of genotyping errors
- Quantifying genotyping errors
- Consequences of genotyping errors
- How to limit genotyping errors and their impact?
9Non-invasive sampling and genotyping errors
- Historical aspects
- Solutions to limit genotyping errors
- Towards a quality index
- Practicals: estimation of the quality index
10Questions about the Pyrenean bear population
Papillon. Photo: J.-J. Camarra, ONC, August 1995
Geographic distribution of the brown bear in
Europe
11Questions about the Pyrenean bear population
- Where should bears be taken from to reinforce the endangered Pyrenean population?
- How many bears are left in the Pyrenees?
- How many males and females?
12The three different sampling methods
- Destructive sampling
- Non-destructive sampling
- Non-invasive sampling
13Destructive sampling
- The animal is killed in order to obtain the tissues necessary for genetic analysis.
- This sampling strategy has been used extensively for isozyme studies, and for mtDNA analysis before PCR was discovered.
- It has been abandoned by many researchers.
14Non-destructive sampling
- The animal is often captured, and a biopsy or a blood sample is taken invasively.
- However, some invasive sampling strategies do not require catching the animal.
- For example, tissues can be obtained from whales and some other large mammals by using biopsy dart guns.
15Non-invasive sampling
- This term should be restricted to situations where the source of DNA is left behind and is collected without having to catch or disturb the animal.
- In the literature, non-destructive sampling is often improperly considered as non-invasive.
- Catching a mammal (or a bird) and plucking a few hairs (or feathers) should not be considered as non-invasive, but rather as non-destructive.
16Non-invasive genetic sampling only possible via
PCR
- Mullis KB, Faloona FA (1987) Specific synthesis of DNA in vitro via a polymerase-catalysed chain reaction. Methods in Enzymology, 155, 335-350.
- Saiki RK, Gelfand DH, Stoffel S, Scharf SJ, Higuchi R, Horn GT, Mullis KB, Erlich HA (1988) Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science, 239, 487-491.
17Problematic results about the census of the
Pyrenean bear population (in 1994)
- More bears than expected!
- No success when trying to replicate the results.
- Two years to understand and solve the problem.
18Potential of non-invasive genetic sampling: two opposing points of view
- Non-invasive sampling can exploit the full potential of DNA analysis.
- True for mtDNA
- Dominant opinion ten years ago
- Non-invasive sampling has serious limitations.
- Many technical problems
- Possibility of genotyping errors
19Non-invasive sampling can exploit the full
potential of DNA analysis
- Morin PA, Moore JJ, Chakraborty R, Jin L, Goodall J, Woodruff DS (1994) Kin selection, social structure, gene flow, and the evolution of chimpanzees. Science, 265, 1193-1201.
- Microsatellite study using hairs as a source of DNA.
- Males are more homozygous than females in general.
- Males stay in their group more than females.
- Wrong results due to more genotyping errors in males (mainly allelic dropout).
20Non-invasive sampling has serious limitations
- Gerloff U, Schlötterer C, Rassmann K, Rambold I, Hohmann G, Fruth B, Tautz D (1995) Amplification of hypervariable simple sequence repeats (microsatellites) from excremental DNA of wild living bonobos (Pan paniscus). Molecular Ecology, 4, 515-518.
- Taberlet P, Griffin S, Goossens B, Questiau S, Manceau V, Escaravage N, Waits LP, Bouvet J (1996) Reliable genotyping of samples with very low DNA quantities using PCR. Nucleic Acids Research, 24, 3189-3194.
- Gagneux P, Boesch C, Woodruff DS (1997) Microsatellite scoring errors associated with noninvasive genotyping based on nuclear DNA amplified from shed hair. Molecular Ecology, 6, 861-868.
22Gagneux P, Woodruff DS, Boesch C (1997) Furtive
mating in female chimpanzees. Nature, 387 (22 May
1997), 358-359.
- Paternity study of the offspring of a chimpanzee community.
- Half of the offspring did not display any allele inherited from an intragroup father.
- Conclusion: these offspring had an extragroup father.
- The dataset contained allelic dropouts (paper retracted in 2001).
23Scan of the Gagneux paper in Mol Ecol
24Genotyping errors: main difficulties in non-invasive sampling
- Contamination
- Allelic dropout
- False alleles
25Contamination
- The ability to detect a single target molecule also brings the possibility of detecting a single contaminant molecule.
- Working with non-invasive genetic sampling is similar to ancient DNA studies.
26Genotyping errors: allelic dropout
- For a heterozygous individual, only one allele is present in the template and/or is amplified in the PCR reaction.
- This error produces a false homozygote.
27Genotyping errors: false alleles
- Artifacts can be generated during the first cycles of the PCR reaction, and can be misinterpreted as true alleles.
- They are very difficult to distinguish from sporadic contamination.
28Genotyping errors: example
[Figure: brown bear, locus G10B, alleles A and B]
- Five independent genotyping experiments using the same DNA extract (from feces).
29Genotyping errors: example
[Figure: alleles A and B]
- Fifty independent genotyping experiments using the same DNA extract (from bear feces), locus G10B.
30Genotyping errors: example
Seven independent experiments using the same DNA extract from bear feces.
31Genotyping errors: example
Seven independent experiments using the same DNA extract from a single marmot hair.
32Influence of the amount of template DNA
From Goossens et al., 1998
33Allelic dropout: mathematical model
- The model is restricted to the genotyping of an individual bearing alleles A and B at an autosomal locus.
- Many assumptions have been made.
34Allelic dropout: mathematical model assumptions
- The DNA extract contains equal numbers of the alleles A and B.
- A single target molecule can be amplified and detected.
- Each single target molecule has the same probability of being amplified.
- 100 PCRs can be performed using the DNA extract, and the target DNA molecules are distributed randomly among the 100 PCR tubes.
- If the initial ratio between alleles A and B (A/B or B/A) in the PCR tube is greater than or equal to five, then only the most common allele will be detected.
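The assumptions above can be turned into a small Monte Carlo sketch. This is only an illustration, not the authors' original simulation: the function name `simulate_tubes`, its parameters, and the default amplification probability of 1.0 are assumptions introduced here.

```python
import random

def simulate_tubes(molecules_per_allele, n_tubes=100, ratio_threshold=5.0,
                   p_amplify=1.0, seed=None):
    """Distribute A and B molecules at random among PCR tubes and classify each tube.

    Hypothetical helper: p_amplify is the (equal) probability that any single
    molecule is amplified; 1.0 is an assumption, not a value from the slides.
    """
    rng = random.Random(seed)
    counts = [[0, 0] for _ in range(n_tubes)]  # [A molecules, B molecules] per tube
    for allele in (0, 1):
        for _ in range(molecules_per_allele):
            if rng.random() < p_amplify:
                counts[rng.randrange(n_tubes)][allele] += 1

    result = {"no_product": 0, "dropout": 0, "correct": 0}
    for a, b in counts:
        if a == 0 and b == 0:
            result["no_product"] += 1   # nothing to amplify
        elif min(a, b) == 0 or max(a, b) / min(a, b) >= ratio_threshold:
            result["dropout"] += 1      # only the most common allele is detected
        else:
            result["correct"] += 1      # both alleles detected
    return result

if __name__ == "__main__":
    # Equal numbers of A and B molecules in the extract, spread over 100 tubes.
    for molecules in (50, 200, 1000):
        print(molecules, simulate_tubes(molecules, seed=1))
```

Running it with increasing numbers of molecules gives a feel for how the share of tubes with a correct heterozygous genotype depends on the amount of template.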
35The problem of very small DNA samples: simulations
[Figure: simulations for a heterozygous individual with alleles A and B; panels labelled "correct genotyping"]
36Results of the simulations
[Figure: probability of correct genotyping at a heterozygous microsatellite locus using very small DNA samples. X-axis: template DNA per amplification (picograms). Curves: PCR product (at least one allele); correct genotyping (both alleles). Note: one cell contains about 7 picograms of DNA.]
37Guidelines for genotyping very small DNA samples
- Multiple-tube approach.
- Navidi W, Arnheim N, Waterman MS (1992) A multiple-tube approach for accurate genotyping of very small DNA samples by using PCR: statistical considerations. American Journal of Human Genetics, 50, 347-359.
- Taberlet P, Griffin S, Goossens B, Questiau S, Manceau V, Escaravage N, Waits LP, Bouvet J (1996) Reliable genotyping of samples with very low DNA quantities using PCR. Nucleic Acids Research, 24, 3189-3194.
38Guidelines for genotyping very small DNA samples
- The guidelines are only valid under the following conditions.
- A single target molecule can be detected.
- The amount of template DNA is very low, in the picogram range, but is not accurately known.
39Guidelines for genotyping very small DNA samples
- Confidence of 99%.
- Multiple-tube approach.
- Heterozygotes: an allele can be recorded only if it has been found at least twice.
- Homozygotes: an individual can be considered as homozygous only if eight independent experiments have shown the same allele.
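The two rules above can be sketched as a small decision function. This is an illustrative reading of the guidelines, not a published implementation: the name `consensus_genotype`, the representation of each positive PCR as a tuple of alleles, and the choice to return `None` for unresolved cases are assumptions.

```python
from collections import Counter

def consensus_genotype(replicates, min_allele_obs=2, min_homozygote_obs=8):
    """Apply the multiple-tube rules to a list of positive PCR results.

    replicates: list of tuples of alleles seen in each independent experiment.
    Returns a 2-tuple genotype, or None when the rules do not yet allow a call.
    """
    # Count in how many independent experiments each allele was seen.
    counts = Counter(a for rep in replicates for a in set(rep))
    # An allele is recorded only if it has been found at least twice.
    accepted = sorted(a for a, c in counts.items() if c >= min_allele_obs)

    if len(accepted) == 2:
        return (accepted[0], accepted[1])      # heterozygote
    if len(accepted) == 1 and counts[accepted[0]] >= min_homozygote_obs:
        return (accepted[0], accepted[0])      # homozygote, seen eight times
    return None  # more experiments needed, or more than two alleles (assumed unresolved)

if __name__ == "__main__":
    print(consensus_genotype([("A", "B"), ("A",), ("B",)]))   # both alleles seen twice
    print(consensus_genotype([("A",)] * 7))                    # not enough for a homozygote
    print(consensus_genotype([("A",)] * 8))
```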
40Guidelines for genotyping very small DNA samples
- How to avoid the multiple-tube approach, or to limit its impact?
- By estimating the amount of template DNA
- Miller C, Joyce P, Waits L (2002) Assessing allelic dropout and genotype reliability using maximum likelihood. Genetics, 160, 357-366.
- Morin P, Chambers K, Boesch C, Vigilant L (2001) Quantitative polymerase chain reaction analysis of DNA from noninvasive samples for accurate microsatellite genotyping of wild chimpanzees (Pan troglodytes verus). Molecular Ecology, 10, 1835-1844.
41Quantitative PCR (from Morin et al., 2001)
Relationship between the initial amount of
template DNA in the PCR and both the proportion
of PCRs with amplification product (grey squares)
and the proportion of PCRs with allelic dropout
(black circles).
42Towards a quality index
- Goal: estimate a quality index associated with each sample.
- This quality index should allow comparisons among samples, loci, and studies.
- Restricted to the situation where the multiple-tube approach is used.
43Towards a quality index
- The estimation of the quality index (QI) is based on the analysis of the whole set of electropherograms produced when using the multiple-tube approach.
- For each locus of a given sample, a QI is estimated using the following steps:
- Step 1: estimation of the most likely consensus genotype
- Step 2: estimation of the score for each repeat
- Step 3: estimation of the quality index for the locus
44Towards a quality index
- Step 1: estimation of the most likely consensus genotype after simultaneous observation of the electropherograms corresponding to the different repeats of this locus. An allele is considered only if it is present at least twice among the different repeats.
- Step 2: estimation of the score for each repeat. If the electropherogram at one repeat corresponds to the consensus genotype, the score "1" is assigned; otherwise the score "0" is assigned, whatever the differences.
- Step 3: estimation of the QI for the locus. The scores assigned to each repeat are summed, and divided by the number of repeats.
- Step 4: estimation of the mean QI per locus and per individual.
45Additional rules
- No signal is scored as "0".
- Electropherograms with an additional allele are scored as "0".
- If the less intense allele is less than 20% of the most intense allele, a score of "0" is given.
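Steps 2 and 3 and the first two additional rules can be sketched in a few lines. This is a minimal illustration under stated assumptions: the names `score_repeat` and `quality_index` are hypothetical, each repeat is given as the set of alleles read from its electropherogram (`None` for no signal), and the peak-intensity rule is assumed to have been applied before the alleles are listed.

```python
def score_repeat(observed, consensus):
    """Step 2: score 1 if the repeat shows exactly the consensus genotype, else 0."""
    if observed is None:          # no signal is scored as 0
        return 0
    # Any difference (missing or additional allele) gives 0.
    return 1 if frozenset(observed) == frozenset(consensus) else 0

def quality_index(repeats, consensus):
    """Step 3: sum of the scores divided by the number of repeats."""
    scores = [score_repeat(r, consensus) for r in repeats]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    # Eight illustrative repeats for one locus (made-up data, consensus A/B).
    repeats = [("A", "B"), ("A",), None, ("A", "B"),
               ("A", "B", "C"), ("A", "B"), ("B",), ("A", "B")]
    print(quality_index(repeats, ("A", "B")))
```

In this made-up example four of the eight repeats match the consensus, so the locus QI is 0.5.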
46Quality index: example 1
Multiple-tube approach, 8 repeats
47Quality index: example 2
Multiple-tube approach, 8 repeats
Step 2, score for each repeat: 0, 1, 0, 0, 1, 0, 0, 0
48Quality index: example 3
Multiple-tube approach, 8 repeats
Step 2, score for each repeat: 1, 1, 0, 1, 1, 0, 0, 1
52Quality indexes for loci, samples, and study
         Sample 1  Sample 2  Sample 3  Sample 4  Sample 5  Mean
Locus 1  0.88      0.63      0.75      0.00      1.00      0.65
Locus 2  1.00      0.38      1.00      0.25      1.00      0.73
Locus 3  1.00      0.25      0.63      0.25      1.00      0.63
Mean     0.96      0.42      0.79      0.17      1.00      0.67
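The row, column, and overall means of the table can be checked with a short script. The QI values are those shown on the slide; the variable names are illustrative only.

```python
# Per-locus QI values for samples 1 to 5, copied from the table.
qi = {
    "Locus 1": [0.88, 0.63, 0.75, 0.00, 1.00],
    "Locus 2": [1.00, 0.38, 1.00, 0.25, 1.00],
    "Locus 3": [1.00, 0.25, 0.63, 0.25, 1.00],
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Mean QI per locus (row means), per sample (column means), and for the study.
locus_means = {locus: round(mean(v), 2) for locus, v in qi.items()}
sample_means = [round(mean(col), 2) for col in zip(*qi.values())]
study_mean = round(mean(x for v in qi.values() for x in v), 2)

if __name__ == "__main__":
    print(locus_means)
    print(sample_means)
    print(study_mean)
```

The low column means flag samples 2 and 4 as unreliable, which is the kind of comparison among samples and loci the quality index is meant to support.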
53Quality indexes for samples
54Quality indexes for loci
55Non-invasive census of the brown bears from the
Deosai National Park (Pakistan)
57Genotyping errors
- Definition
- Non-invasive sampling and genotyping errors
- Causes of genotyping errors
- Quantifying genotyping errors
- Consequences of genotyping errors
- How to limit genotyping errors and their impact?
58Causes of genotyping errors
- Very diverse, complex, and sometimes cryptic origins.
- Grouping errors into discrete categories according to their causes is challenging.
- DNA sequence
- Low DNA quantity or quality
- Biochemical artifacts
- Human errors
59"A" artifact
CGATCGTTAATCAGAATGCATACCGCA
GCTAGCAATTAGTCTTACGTATGGCG
61"A" artifact example
62Three solutions
- Enzymatic treatment of the PCR product with T4 DNA polymerase to remove the additional "A".
- Modification of the PCR parameters.
- Modification of the 5' end of the non-labeled primer.
63Modification of the PCR parameters
64Modification of the primer: principle
65Modification of the primer: result
67PCR conditions
68Modifications reducing the "A"
69Modifications enhancing the "A"
71PCR conditions
72Experiments
73Original XA modified XT
Enhance the "A"
Original:
CGATCGTTAATCAGAATGCATACCGCA
GCTAGCAATTAGTCTTACGTATGGCGT
Modified:
CGATCGTTAATCAGAATGCATACCGCTA
GCTAGCAATTAGTCTTACGTATGGCGA
74Original XT modified XA
Reduce the "A"
Original:
CGATCGTTAATCAGAATGCATACCGCTA
GCTAGCAATTAGTCTTACGTATGGCGA
Modified:
CGATCGTTAATCAGAATGCATACCGCA
GCTAGCAATTAGTCTTACGTATGGCGT
75"A" artifact: conclusion (1)
- Do not perform a final elongation without reason (this elongation enhances the "A" artifact).
- Even at 4 °C, the "A" is slowly added.
- Use the simplest PCR protocol at the beginning.
- In case of scoring difficulty, identify whether the marker is "2-steps" or "3-steps".
- Modify the primers and the PCR protocol if necessary.
76"A" artifact: conclusion (2)
- To enhance the "A":
- Final elongation (up to 90 minutes)
- Put a "G" at the 5' end of the reverse primer
- Add a "PIGtail" (GTGTCTT)
- To reduce the "A":
- 2-step PCR without final elongation
- Put a "T" at the 5' end of the reverse primer
- Good luck with multiplexing markers!
77DNA molecules interactions
- Cause: DNA sequence flanking the marker
- No or less efficient amplification because of a mutation in the target primer sequence (null allele)
- Insertion or deletion in the amplified fragment (size homoplasy of different alleles)
- In heterozygous individuals, preferential amplification of one allele when its denaturation is favoured (allelic dropout)
78Sample quality
- Cause 1: Low DNA quality or quantity
- In heterozygous individuals, amplification of only one allele (allelic dropout)
- In heterozygous individuals, preferential amplification of the shorter allele (short allele dominance)
- Cause 2: Contamination of the DNA extract
- Amplification of a contaminant allele (mistaken allele)
- Cause 3: Extract quality
- No or less efficient restriction/amplification due to inhibitors (allelic dropout)
79Biochemical artifacts and equipment
- Cause 1: Low quality reagents
- Allelic dropout, mistaken alleles
- Cause 2: Equipment precision or reliability
- Allelic dropout, mistaken alleles
- Cause 3: Taq polymerase errors
- False allele
- Cause 4: Lack of specificity
- Mistaken allele
- Cause 5: Electrophoresis artifacts
- Size homoplasy of different alleles, mistaken alleles
80Human factor
- Cause 1: Sample manipulation
- Confusion between samples, e.g. mislabelling or tube mixing (mistaken allele(s))
- Cause 2: Experimental error
- Contamination with exogenous DNA or cross-contamination between samples (mistaken allele(s))
- Use of an inappropriate protocol: reagent forgotten, wrong hybridization temperature, primers, or concentrations of reagents (allelic dropout, mistaken allele(s))
- Cause 3: Data handling
- Misreading of the profile or misidentification of the fluorescent peak (mistaken allele)
- Miscopying or confusion of the genotypes in the database (mistaken allele)
- Software bug in the database or analysis program (mistaken allele)
81Genotyping errors
- Definition
- Non-invasive sampling and genotyping errors
- Causes of genotyping errors
- Quantifying genotyping errors
- Consequences of genotyping errors
- How to limit genotyping errors and their impact?
82Quantifying genotyping errors
- Different estimates, based on replicates within a dataset, have been defined to quantify error rates.
- Some metrics have been proposed for specific errors such as allelic dropouts or false alleles.
- More global metrics, which take into account all
types of detectable genotyping errors, are also
commonly used although they have never been
explicitly defined.
83Quantifying genotyping errors
- First, a reference genotype must be defined as the genotype that minimizes the number of errors in the comparison among replicates. Several reference genotypes may exist. If only two replicates are performed and give contradictory genotypes, either one or the other can be considered as the reference.
- The calculation of error rates is based on the number of mismatches between the reference genotype and the replicates.
84Quantifying genotyping errors
- n individual single-locus genotypes have been replicated t times.
- For diploid individuals, 2nt alleles and nt loci are typed and can be compared to the reference.
- Estimation of the error rates at the allelic, locus, multilocus, and reaction levels.
85Quantifying genotyping errors
- Mean allelic error rate
- Mean error rate per locus
- Error rate per multilocus genotype
- Error rate per reaction
86Mean allelic error rate e_a
- The mean allelic error rate e_a is the ratio between m_a, the number of allelic mismatches, and 2nt, the number of replicated alleles: e_a = m_a / (2nt).
- For microsatellite markers, the error rate per allele can also be estimated for each particular allele, to point out error-prone alleles (for example, alleles prone to dropouts).
87Mean error rate per locus e_l
- The mean error rate per locus e_l is the ratio between m_l, the number of single-locus genotypes including at least one allelic mismatch, and nt, the number of replicated single-locus genotypes: e_l = m_l / (nt).
- This metric can also be estimated for each particular locus, to help identify the error-prone loci.
- As it can be compared between studies and samples, it should become the standard metric.
88Error rate per multilocus genotype e_obs (1)
- The observed error rate per multilocus genotype e_obs is the ratio between m_g, the number of multilocus genotypes including at least one allelic mismatch, and nt, the number of replicated multilocus genotypes: e_obs = m_g / (nt).
- This metric is particularly informative for individual identification, parentage analysis or population size estimation.
89Error rate per multilocus genotype e_ind (2)
- If genotyping errors occur independently among l loci (which is very unlikely), the error rate per multilocus genotype e_ind is deduced from the single-locus error rate e_i at each locus i: e_ind = 1 - Π (1 - e_i), the product being taken over the l loci.
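Under the independence assumption, a multilocus genotype is error-free only if every locus is error-free, so the multilocus error rate is one minus the product of the per-locus probabilities of no error. A minimal sketch follows; the function name `multilocus_error_rate` and the example rates are illustrative, not taken from the slides.

```python
import math

def multilocus_error_rate(per_locus_rates):
    """Error rate per multilocus genotype assuming independent errors among loci."""
    # Probability that no locus carries an error is the product of (1 - e_i).
    return 1.0 - math.prod(1.0 - e for e in per_locus_rates)

if __name__ == "__main__":
    # Illustrative case: the same 5% error rate per locus at 7 and at 10 loci.
    print(round(multilocus_error_rate([0.05] * 7), 2))   # about 0.30
    print(round(multilocus_error_rate([0.05] * 10), 2))  # about 0.40
```

Even a modest per-locus rate therefore translates into a large share of erroneous multilocus genotypes once several loci are combined.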
90Error rate per reaction e_r
- The error rate per reaction e_r is the ratio between m_l, the number of single-locus genotypes including at least one allelic mismatch, and r, the total number of reactions: e_r = m_l / r.
- This metric is equivalent to the mean error rate per locus when the PCR reaction involves one locus, or to the multilocus error rate when all loci are amplified in a single multiplex reaction.
91Estimation of the error rates per allele and per locus, for four replicates (t = 4) of three individuals (n = 3)
Individual  Allele  Rep 1  Rep 2  Rep 3  Rep 4  Reference  Error rate per allele  Error rate per locus
Ind 1       Al 1    A      A      B      A      A          3/8                    2/4
Ind 1       Al 2    A      B      C      A      A
Ind 2       Al 1    A      B      B      A      A or B     2/8                    2/4
Ind 2       Al 2    B      B      B      B      B
Ind 3       Al 1    A      A      A      A      A          1/8                    1/4
Ind 3       Al 2    C      C      B      C      C
Mean                                                       1/4                    5/12
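The worked example in the table can be reproduced with a short script. This is a sketch under stated assumptions: genotypes are treated as unordered allele pairs, the reference is chosen among the observed genotypes, and ties are broken by the smallest number of mismatching genotypes. The helper names are hypothetical, and the tie-break rule is an assumption added here, not part of the slides.

```python
from collections import Counter
from fractions import Fraction

def allelic_mismatches(genotype, reference):
    """Number of alleles (0, 1 or 2) that differ from the reference, ignoring order."""
    shared = sum((Counter(genotype) & Counter(reference)).values())
    return len(reference) - shared

def reference_genotype(replicates):
    """Observed genotype that minimises the mismatches across replicates."""
    candidates = sorted({tuple(sorted(g)) for g in replicates})
    def cost(ref):
        per_rep = [allelic_mismatches(g, ref) for g in replicates]
        # Assumed tie-break: fewest mismatching genotypes.
        return (sum(per_rep), sum(m > 0 for m in per_rep))
    return min(candidates, key=cost)

def error_rates(data):
    """Return (mean allelic error rate, mean error rate per locus) as fractions."""
    m_a = m_l = n_genotypes = 0
    for replicates in data.values():
        ref = reference_genotype(replicates)
        for g in replicates:
            mismatches = allelic_mismatches(g, ref)
            m_a += mismatches            # allelic mismatches
            m_l += mismatches > 0        # genotypes with at least one mismatch
            n_genotypes += 1             # n * t replicated single-locus genotypes
    return Fraction(m_a, 2 * n_genotypes), Fraction(m_l, n_genotypes)

# Replicated genotypes from the table (four replicates of three individuals).
table = {
    "Ind 1": ["AA", "AB", "BC", "AA"],
    "Ind 2": ["AB", "BB", "BB", "AB"],
    "Ind 3": ["AC", "AC", "AB", "AC"],
}
e_a, e_l = error_rates(table)

if __name__ == "__main__":
    print(e_a, e_l)   # 1/4 and 5/12, as in the table
```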
92Reference papers
- Bonin A, Bellemain E, Bronken Eidesen P, Pompanon F, Brochmann C, Taberlet P (2004) How to track and assess genotyping errors in population genetics studies. Molecular Ecology, 13, 3261-3273.
- Hoffman J, Amos W (2005) Microsatellite genotyping errors: detection approaches, common sources and consequences for paternal exclusion. Molecular Ecology, 14, 599-612.
93Example of error rates
- Bonin et al. (2004)
- Bear tissues: 0.008 per locus
- Bear faeces: 0.019 per locus
- AFLP: 0.019 to 0.026 per locus
- Hoffman and Amos (2005)
- 2000 Antarctic fur seals genotyped at 9 microsatellite loci
- 0.0013 to 0.0074 per locus
- Human errors are the most important cause
94Genotyping errors
- Definition
- Non-invasive sampling and genotyping errors
- Causes of genotyping errors
- Quantifying genotyping errors
- Consequences of genotyping errors
- How to limit genotyping errors and their impact?
95Consequences of genotyping errors
- Linkage and association studies
- Individual identification
- Population genetic studies
96Linkage and association studies
- Erroneous genotypes might markedly affect linkage and association studies by hiding the true segregation of alleles.
- The impact on the results is measured by experimental or simulation studies and can be serious even for low error rates (e.g. < 3%).
- For example, in linkage studies, genotyping errors can affect the haplotype frequency and eventually lead to inflation of genetic map lengths.
- Error rates as low as 3% have serious effects on linkage disequilibrium analysis, and a 1% error rate can generate a loss of 53-58% of the linkage information for a trait locus. However, modest error rates might be tolerable in situations that do not involve rare alleles, as in QTL studies.
97Linkage and association studies
- In association studies, because recombination is rare, errors mostly affect non-recombinant genotypes, which are then erroneously interpreted as being the result of recombination. Errors therefore decrease the power for detecting associations.
- The importance of the experimental design must also be emphasised, as it can generate errors that are not randomly distributed across phenotypes (i.e., differential errors). This can be the case when controls and cases are genotyped in different assays while investigating the genetic basis of a disease. Differential and non-differential errors can have opposite consequences on the rate of false positives in statistical tests of association.
98Individual identification
- Genotyping errors can strongly affect individual identification studies that are based on multilocus genotypes by erroneously increasing the number of genotypes observed in a population sample.
- In census studies of rare or elusive species, the population size can be estimated based on the identified genotypes from non-invasive samples collected in the field (e.g., hair or faeces). In this context, genotyping errors can lead to a serious overestimate of population size.
- A 200% overestimate of population size has been found with a 5% error rate per locus when using 7 to 10 loci for genotype identification (Creel et al., 2004). Such an overestimate obviously increases with the number of loci and with the number of samples per genotype.
99Individual identification
- Genotyping errors also have a huge impact in parentage analysis, generating wrong paternity or maternity exclusions.
- Such information on population size and structure is required in conservation biology, and its inaccurate estimation due to genotyping errors could result in wrong decisions in population management.
- In forensic DNA analyses, a false multilocus genotype can prevent the identification of a corpse or lead to erroneous identification (or exoneration) of criminal offenders.
100Population genetic studies
- Most of the studies that take genotyping error into account in population genetics are those that use non-invasive samples, which are error-prone because of the low quality and/or quantity of DNA.
- However, it has been demonstrated that even with high quality DNA the error rate might not be negligible.
- The impact of genotyping errors remains largely unknown in this field, because very few studies have dealt with this topic until now.
- Genotyping errors may lead to erroneous allele identification or allele frequencies, resulting in wrong Fst estimates, false migration rates, or false detection of selection or population bottlenecks.
101Population genetic studies
- Analyses based on allele frequencies will be less affected by errors than those based on individual identification (e.g., parentage analysis), but will be sensitive to sampling effects.
- The apparent low impact of scoring differences has been demonstrated on an AFLP data set that was scored by two different scientists. The two scorers had only 38% of the marker loci in common, but the same biological conclusions about population genetic structure were extracted from the data. In this study, the robustness of the inferred biological message was certainly due to the redundancy of the information contained in the large number of AFLP markers (more than 200 polymorphic loci screened by both scorers).
- Population genomics studies looking for selected markers among several hundred markers would be very sensitive to the impact of genotyping error, especially if the errors are population-specific. There is a great need for studies on the impact of genotyping error in this newly emerging field.
102Genotyping errors
- Definition
- Non-invasive sampling and genotyping errors
- Causes of genotyping errors
- Quantifying genotyping errors
- Consequences of genotyping errors
- How to limit genotyping errors and their impact?
103How to limit genotyping errors and their impact?
- The worst situation arises when a scientist realises at the end of a study that the data were not reliable due to genotyping errors, and that the dataset is not retrievable.
- Such situations are almost never reported in the literature, but their occurrence is probably not rare.
- Therefore, it is important to take into account the possibility of genotyping errors when designing the experimental protocol.
104How to limit genotyping errors and their impact?
- The strategy consists in demonstrating, via an appropriate procedure, that the data produced and the results obtained are reliable.
- The diversity of case studies, error causes, and laboratory contexts makes it impossible to propose a universal and simple procedure.
- As a consequence, the possible solutions to limit the occurrence and the impact of genotyping errors are case-specific.
- The optimal strategy will be determined by several factors, such as the biological question, the tolerable error rate, the sampling possibilities, the equipment and technical skills that are locally available, the financial support and time constraints.
105How to limit genotyping errors and their impact?
- General recommendations
- Limiting the production of errors during genotyping
- Cleaning the dataset after genotyping
- Analysing data taking into account the errors
- Towards quality processes for genotyping
- Practicals: establishing reliable experimental protocols (case studies)
106General recommendations (1)
- A first step consists in checking that the genotyping experiments necessary to reach the scientific goal are realistic given the sample quality and the technical skills available (bad sample quality and limited technical skills obviously influence the error rate).
- A second step involves carrying out a pilot study designed to first evaluate the theoretical error rate compatible with the data analysis, and then to estimate the real error rate based on the analysis of a subset of the samples.
- Finally, it is important to be aware of potential problems all along the experimental procedure, from sampling to data analysis, even after a successful pilot study.
107General recommendations (2)
- Quality controls should be performed in real time during each step and each batch of experiments.
- They should also be diverse enough to detect as many types of errors as possible. For example, highly reproducible errors such as null alleles cannot be detected by replicates, and require Hardy-Weinberg tests or inheritance studies. Conversely, stochastic allelic dropouts might not be detected by Hardy-Weinberg tests, but can be detected by replicating the genotyping assays.
- Control procedures are costly and time consuming. Thus the effort for reducing the error rate must be adapted to the foreseeable impact of the genotyping errors.
- Because genotyping errors may be generated even with high quality standards, and because they cannot all be detected, efforts must be directed towards limiting both their production and their subsequent impact.
108Limiting the production of errors during
genotyping (1)
- Given that human factors can be the main issue during genotype production, the most efficient approach is to concentrate first on minimizing human error.
- Only well-trained bench scientists/technicians should be involved, as suggested by quality assurance standards for forensic DNA testing laboratories.
- Only standardized and validated procedures should be used.
- Human manipulation should be reduced as much as automation allows, from all handling and pipetting steps to allele scoring. However, for allele scoring, software packages are not yet sophisticated enough to prevent scoring errors. Semi-automated scoring followed by human visual inspection appears to be the most reliable procedure.
109Limiting the production of errors during
genotyping (2)
- Limiting genotyping errors during laboratory
experiments requires the systematic use of an
appropriate number of positive and negative
controls, and also the implementation of
replicates for real-time error detection and
error rate estimation.
- In every situation, even with high quality DNA,
replicating 5 to 10% of the samples has been
recommended, but the proportion can vary according to
the goal of the study and the potential impact of
errors.
- As far as possible, these replicates have to be
carried out blind and independently.
- This involves implementing the blind process from
the beginning of the experiment, by carrying out
a systematic duplication of the samples during
sample collection. Such a procedure will not only
allow the detection of laboratory errors, but will
also pick up handling errors at any stage of the
analysis. Moreover, comparing blind samples and
original experiments will produce a fair estimate
of the error rate.
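The comparison between blind replicates and original genotypes can be sketched as follows. This is only a minimal illustration: the function name and the dictionary layout (sample, then locus, then a pair of alleles) are assumptions, and the error rate is taken here simply as the number of mismatching single-locus genotypes divided by the number of single-locus genotypes compared.

```python
def per_locus_error_rate(original, replicate):
    """Fraction of single-locus genotypes that differ between two runs.

    original, replicate: dict sample_id -> dict locus -> (allele, allele).
    Only sample/locus pairs typed in both runs are compared.
    """
    compared = 0
    mismatches = 0
    for sample, loci in replicate.items():
        for locus, rep_genotype in loci.items():
            ref_genotype = original.get(sample, {}).get(locus)
            if ref_genotype is None or rep_genotype is None:
                continue  # missing data cannot be compared
            compared += 1
            # Allele order within a genotype carries no information.
            if sorted(ref_genotype) != sorted(rep_genotype):
                mismatches += 1
    if compared == 0:
        raise ValueError("no genotype typed in both runs")
    return mismatches / compared


# Hypothetical microsatellite data (allele sizes in bp).
original = {
    "S1": {"L1": (120, 124), "L2": (98, 98)},
    "S2": {"L1": (120, 120), "L2": (98, 102)},
}
blind = {
    "S1": {"L1": (124, 120), "L2": (98, 98)},
    "S2": {"L1": (120, 120), "L2": (98, 98)},  # looks like an allelic dropout
}
rate = per_locus_error_rate(original, blind)
```

With these made-up data, one of the four compared genotypes disagrees. A mismatch shows that at least one of the two runs is wrong, not which one, so the sample still has to be retyped.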
110Limiting the production of errors during
genotyping (3)
- When genotyping errors are highly probable, blind
replicates are still necessary but not
sufficient. The systematic replication of each
genotyping assay (i.e., multiple-tube approach)
may be required to define the consensus
genotypes.
- There is a trade-off between the cost of the
experiments and the reliability of the genotypes.
- One role of the pilot study is to determine the
optimal number of replicates required.
- In some cases, errors can also be detected by
replicating the genotyping process using a
different technology such as sequencing, whose
error rates are typically lower than those of
standard genotyping technologies.
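A consensus genotype built from repeated assays of the same sample can be sketched as below. The acceptance rule and both thresholds (`min_allele_obs`, `min_homozygote_obs`) are illustrative assumptions, not values recommended in these slides; in practice they would come out of the pilot study.

```python
from collections import Counter


def consensus_genotype(replicates, min_allele_obs=2, min_homozygote_obs=3):
    """Consensus for one sample at one locus from repeated assays.

    replicates: list of (allele, allele) tuples, one per successful assay.
    Returns a sorted pair of alleles, or None when more assays are needed.
    """
    # Count the number of assays in which each allele was observed.
    seen = Counter()
    for genotype in replicates:
        for allele in set(genotype):
            seen[allele] += 1

    confirmed = sorted(a for a, n in seen.items() if n >= min_allele_obs)
    if len(confirmed) == 2:
        return tuple(confirmed)  # heterozygote, both alleles confirmed
    if len(confirmed) == 1 and len(seen) == 1:
        # A homozygote needs more support, because allelic dropout
        # can make a heterozygote look homozygous in a single assay.
        if len(replicates) >= min_homozygote_obs:
            return (confirmed[0], confirmed[0])
    return None  # ambiguous: an unconfirmed or extra allele was observed


het = consensus_genotype([(120, 120), (120, 124), (120, 124)])
hom = consensus_genotype([(98, 98), (98, 98), (98, 98)])
unresolved = consensus_genotype([(98, 98), (98, 102)])
```

Here `het` is `(120, 124)`, `hom` is `(98, 98)`, and `unresolved` is `None` because allele 102 was seen only once. Raising the thresholds increases reliability at the cost of more assays, which is the trade-off mentioned above.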
111Cleaning the dataset after genotyping (1)
- Even if all erroneous genotypes detected during
the experiments are removed, and possibly
corrected after re-genotyping, some undetected
errors will certainly remain in the data set.
Some of them can still be detected or suspected
by checking their concordance with independent
data.
- The power of detecting errors by consistency with
independent data can influence the strategy for
limiting errors.
- It might be more efficient to retype erroneous
genotypes detected by consistency checking than
to run a large proportion of blind replicates.
112Cleaning the dataset after genotyping (2)
- Testing for Hardy-Weinberg equilibrium is a common
way to check the quality of the data, under the
assumption that a high error rate implies
disequilibrium. However, many other causes can
lead to disequilibrium, including selection,
inbreeding and population admixture.
- Moreover, only a few types of error, such as null
alleles and allelic dropouts, are likely to
produce disequilibrium.
- Therefore there is still a need for other
controls and replicates for detecting errors that
are compatible with Mendelian inheritance and
Hardy-Weinberg equilibrium.
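As an illustration of the Hardy-Weinberg check discussed above, here is a minimal sketch of a chi-square test at a biallelic locus. The function name and the genotype counts are hypothetical. The chi-square approximation is unreliable when expected counts are small, in which case an exact test is preferable.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions at a biallelic locus.

    Arguments are the observed counts of the three genotypes.
    Returns (chi2, p_value) for one degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("locus is monomorphic")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: the second locus shows a heterozygote deficit,
# as null alleles or allelic dropout would produce.
chi2_ok, p_ok = hwe_chi_square(25, 50, 25)
chi2_deficit, p_deficit = hwe_chi_square(40, 20, 40)
```

A significant heterozygote deficit does not by itself point to genotyping errors, since inbreeding or admixture produce the same signal. Conversely, a non-significant test does not show that the data are free of errors.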
113Cleaning the dataset after genotyping (3)
- Several computer programs specifically designed
to detect potential errors are now available.
- Most of them check for Mendelian consistency
and/or Hardy-Weinberg equilibrium, and are
commonly used for pedigree analyses and linkage
studies.
- Some others have been developed to track some
kinds of errors that can be compatible with
Mendelian inheritance or Hardy-Weinberg
equilibrium. For example, some detect a spurious
excess of recombinants in linkage studies and
others focus on inconsistencies between
replicates.
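The Mendelian consistency check performed by many of these programs can be illustrated on a single mother-father-child trio. This sketch is not taken from any of the packages listed in the following slides; the function names and the data layout are assumptions made for the example.

```python
def mendelian_consistent(child, mother, father):
    """True if the child's genotype can be formed from one allele per parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def incompatible_loci(child, mother, father):
    """Loci typed in all three individuals where inheritance fails.

    Each argument is a dict locus -> (allele, allele).
    """
    bad = []
    for locus, genotype in child.items():
        if locus in mother and locus in father:
            if not mendelian_consistent(genotype, mother[locus], father[locus]):
                bad.append(locus)
    return bad


# Hypothetical trio typed at two microsatellite loci.
child = {"L1": (120, 124), "L2": (98, 98)}
mother = {"L1": (120, 120), "L2": (98, 102)}
father = {"L1": (124, 128), "L2": (102, 102)}
flagged = incompatible_loci(child, mother, father)
```

In this example `flagged` is `["L2"]`. The incompatibility could come from an error in any of the three genotypes, for instance a null allele shared by father and child, so a flagged locus calls for retyping rather than automatic correction. Errors that happen to be compatible with Mendelian inheritance pass this check unnoticed.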
114Cleaning the dataset after genotyping (4)
- Removing errors might not reduce bias, depending
on the number and kind of errors detected and the
bias each one creates.
- For instance, when Mendelian-incompatible
genotypes are corrected by retyping, or the
families in which they occur are removed, the
remaining undetected errors can produce an excess
of false positives in some family-based
association tests. This problem has been
addressed by developing an appropriate
Likelihood Ratio Test based on a general genotype
error model.
- In general, taking into account the occurrence of
errors in the analysis is crucial, especially for
large or error-prone data sets.
115Example of genotyping error
116Computer programs for detecting errors (1)
- GEMINI: http://pbil.univ-lyon1.fr/software/Gemini/gemini.htm
- PAWE: http://linkage.rockefeller.edu/pawe/
- PREST: http://fisher.utstat.toronto.edu/sun/Software/Prest/
- Pedcheck: http://watson.hgen.pitt.edu/register/docs/pedcheck.html
- PedManager: http://www.broad.mit.edu/ftp/distribution/software/pedmanager/
- MENDEL: http://www.genetics.ucla.edu/software/
- SIMWALK: http://www.genetics.ucla.edu/software/
- Genocheck: http://softlib.rice.edu/geno.html
- R/QTL: http://www.biostat.jhsph.edu/kbroman/qtl/
117Computer programs for detecting errors (2)
- CERVUS: http://helios.bto.ed.ac.uk/evolgen/cervus/cervus.html
- GIMLET: http://pbil.univ-lyon1.fr/software/Gimlet/gimlet.htm
- RelioType: http://www.cnr.uidaho.edu/lecg/pubs_and_software.htm
- Micro-checker: http://www.microchecker.hull.ac.uk
- DROPOUT: http://www.fs.fed.us/rm/wildlife/genetics
- PARENTE: http://www2.ujf-grenoble.fr/leca/membres/manel.html
- PAPA: http://www.bio.ulaval.ca/louisbernatchez/downloads_fr.htm
- PseudoMarker: http://www.helsinki.fi/tsjuntun/pseudomarker/
- TDTae: ftp://linkage.rockefeller.edu/software/tdtae2/
- LRTae: ftp://linkage.rockefeller.edu/softare/lrtae/
118How to limit genotyping errors and their impact?
119How to limit genotyping errors and their impact?
120Towards quality processes for genotyping (1)
- In every scientific discipline, the reliability
of the conclusions strongly depends on the
quality of the data.
- For geneticists, genotyping errors may strongly
affect the results.
- The protocol used for minimizing the occurrence
of errors, the methods for error detection, and
the estimated error rate should be provided for
each study.
- With this information, it will be possible to
assign to each genotype a quality index, allowing
the scientific community to have a critical view
when unexpected results are published.
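The slide above does not define the quality index. One possible definition, used here purely as an assumption, is the proportion of replicate assays that agree with the retained genotype. The function name and the data are hypothetical.

```python
def quality_index(replicates, retained):
    """Proportion of replicate assays identical to the retained genotype.

    replicates: list of (allele, allele) tuples from repeated assays.
    retained: the (allele, allele) genotype kept in the final data set.
    """
    if not replicates:
        raise ValueError("no replicate assays")
    # Allele order within a genotype is ignored.
    matches = sum(1 for g in replicates if sorted(g) == sorted(retained))
    return matches / len(replicates)


# Hypothetical sample typed four times at one locus.
assays = [(120, 124), (124, 120), (120, 120), (120, 124)]
qi = quality_index(assays, (120, 124))
```

With these data the index is 0.75. Such a value could be reported per genotype, or averaged per locus or per sample, alongside the estimated error rate.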
121Towards quality processes for genotyping (2)
- More and more studies, often in the context of
international programs, generate enormous
datasets that cannot be produced in a single
laboratory.
- The reproducibility of genotyping becomes more
and more important.
- Even for markers known to be robust (SNPs,
microsatellites, AFLPs), differences may appear
among laboratories and over time within the same
laboratory.
122Towards quality processes for genotyping (3)
- Expression studies using microarray experiments
are known to be error-prone, and the scientific
community reacted by designing strict standards:
the Minimum Information About a Microarray
Experiment (MIAME) provides a checklist to guide
authors and journal editors in ensuring that data
are made publicly available in a format that
enables unambiguous interpretation and potential
verification of the conclusions. It includes
several steps, verifying for instance experimental
design, sample preparation, and data measurement.
123Towards quality processes for genotyping (4)
- Genotyping errors have been identified since the
very beginning of molecular genetics.
- Their consequences in statistical genetics were
pointed out in 1957, and null alleles in blood
groups have been recognised since 1938.
- They were too often neglected in the past, and
it is clear that they merit much more attention
given their dramatic impact in some
studies.
- Recently, many papers have dealt with genotyping
errors, and it seems that the scientific
community is beginning to realise their importance.
124Towards quality processes for genotyping (5)
- The fields of ancient DNA and gene expression
suffered a crisis of confidence, with a series of
erroneous papers published in leading journals.
As a consequence, these two scientific
communities set up strict standards
that promoted data quality and solved the crisis.
- In population genetics, the situation is
different because only a few erroneous papers
have been published. Therefore, this community
has apparently not been strongly pushed to
establish strict standards. Another explanation
for the delay in establishing strict standards
might be related to the complexity of the
problems.
- Given the recent awareness of the occurrence of
genotyping errors and of their
potential impact, it can be predicted that more
and more attention will be paid to these
difficulties when designing experimental
protocols and publishing results.
125How to limit genotyping errors and their impact?
- General recommendations
- Limiting the production of errors during
genotyping
- Cleaning the dataset after genotyping
- Analysing data taking into account the errors
- Towards quality processes for genotyping
- Practicals: establishing reliable experimental
protocols (case studies)
126Practicals: establishing reliable experimental
protocols (case studies)
- Identify the question
- Design the pilot study
- Design the sampling strategy
- Design the experimental protocol that will limit
genotyping errors as much as possible
- Design the data analysis process
127Practicals: establishing reliable experimental
protocols (case studies)
- A good approach is to assume that the
experimental protocol used will produce only
artifacts and that all the samples have been
mixed up during the process.
- The best strategy is then to establish an
experimental protocol that demonstrates that no
artifacts have been produced, and that nothing
has been mixed up during the process.
128Phylogeography of Capercaillie
- What is the status of the Pyrenean and Cantabrian
populations?
129Phylogeography of Capercaillie
- 92 faeces samples covering the whole range (23
localities).
- Sequencing of the mitochondrial DNA control
region (443 bp) in both directions.
- Unexpected results that could come from tube
mixing!