1
Genotyping errors
  • Pierre Taberlet, Eva Bellemain, Aurélie Bonin,
    François Pompanon
  • Laboratoire d'Ecologie Alpine, CNRS UMR 5553,
  • Université Joseph Fourier, Grenoble, France

2
Genotyping errors
  • Bonin A, Bellemain E, Bronken Eidesen P, Pompanon
    F, Brochmann C, Taberlet P (2004) How to track
    and assess genotyping errors in population
    genetics studies. Molecular Ecology, 13,
    3261-3273.
  • Pompanon F, Bonin A, Bellemain E, Taberlet P
    (2005) Genotyping errors: causes, consequences
    and solutions. Nature Reviews Genetics, in press.

3

Genotyping errors
  • Definition
  • Non-invasive sampling and genotyping errors
  • Causes of genotyping errors
  • Quantifying genotyping errors
  • Consequences of genotyping errors
  • How to limit genotyping errors and their impact?

5
Definition
  • A genotyping error occurs when the observed
    genotype of an individual does not correspond to
    the true genotype.
  • Genotyping errors can have strong consequences on
    the biological message that can be deduced from
    the data.

6
Distribution of papers on "genotyping errors"
according to their publication year
  • Apparently, more and more attention is paid to
    genotyping errors.

7
Distribution of papers on "genotyping errors"
according to their subject
  • Genotyping errors are a concern for only some
    research fields (linkage analyses, non-invasive
    methods).
  • What about the other fields using genetic tools?
    (population genetics/genomics?)

8

Genotyping errors
  • Definition
  • Non-invasive sampling and genotyping errors
  • Causes of genotyping errors
  • Quantifying genotyping errors
  • Consequences of genotyping errors
  • How to limit genotyping errors and their impact?

9
Non-invasive sampling and genotyping errors
  • Historical aspects
  • Solutions to limit genotyping errors
  • Towards a quality index
  • Practicals: estimation of the quality index

10
Questions about the Pyrenean bear population
Papillon, photo J.-J. Camarra, ONC, August 1995
Geographic distribution of the brown bear in
Europe
11
Questions about the Pyrenean bear population
  • Where to take bears to reinforce the endangered
    Pyrenean population?
  • How many bears are left in the Pyrenees?
  • How many males and females?

12
The three different sampling methods
  • Destructive sampling
  • Non-destructive sampling
  • Non-invasive sampling

13
Destructive sampling
  • The animal is killed in order to obtain the
    tissues necessary for genetic analysis.
  • This sampling strategy has been used extensively
    for isozyme studies, and for mtDNA analysis
    before PCR was discovered.
  • It has been abandoned by many researchers.

14
Non-destructive sampling
  • The animal is often captured, and a biopsy or a
    blood sample is taken invasively.
  • However, some invasive sampling strategies do not
    require catching the animal.
  • For example, tissues can be obtained from whales
    and some other large mammals by using biopsy dart
    guns.

15
Non-invasive sampling
  • This term should be restricted to situations where
    the source of DNA is left behind and is collected
    without having to catch or disturb the animal.
  • In the literature, non-destructive sampling is
    often improperly considered as non-invasive.
  • Catching a mammal (or a bird) and plucking a few
    hairs (or feathers) should not be considered as
    non-invasive, but rather as non-destructive.

16
Non-invasive genetic sampling: only possible via
PCR
  • Mullis KB, Faloona FA (1987) Specific synthesis
    of DNA in vitro via a polymerase-catalysed chain
    reaction. Methods in Enzymology, 155, 335-350.
  • Saiki RK, Gelfand DH, Stoffel S, Scharf SJ,
    Higuchi R, Horn GT, Mullis KB, Erlich HA (1988)
    Primer-directed enzymatic amplification of DNA
    with a thermostable DNA polymerase. Science, 239,
    487-491.

17
Problematic results about the census of the
Pyrenean bear population (in 1994)
  • More bears than expected!
  • No success when trying to replicate the results.
  • Two years to understand and solve the problem.

18
Potential of non-invasive genetic sampling: two
opposing points of view
  • Non-invasive sampling can exploit the full
    potential of DNA analysis.
      • True for mtDNA
      • Dominant opinion ten years ago
  • Non-invasive sampling has serious limitations.
      • Many technical problems
      • Possibility of genotyping errors

19
Non-invasive sampling can exploit the full
potential of DNA analysis
  • Morin PA, Moore JJ, Chakraborty R, Jin L, Goodall
    J, Woodruff DS (1994) Kin selection, social
    structure, gene flow, and the evolution of
    chimpanzees. Science, 265, 1193-1201.
  • Microsatellite study using hairs as a source of
    DNA.
  • Males are more homozygous than females
    in general.
  • Males stay in their group more than females.
  • These results were wrong, owing to more
    genotyping errors in males (mainly allelic
    dropout).

20
Non-invasive sampling has serious limitations
  • Gerloff U, Schlötterer C, Rassmann K, Rambold I,
    Hohmann G, Fruth B, Tautz D (1995) Amplification
    of hypervariable simple sequence repeats
    (microsatellites) from excremental DNA of wild
    living bonobos (Pan paniscus). Molecular Ecology,
    4, 515-518.
  • Taberlet P, Griffin S, Goossens B, Questiau S,
    Manceau V, Escaravage N, Waits LP, Bouvet J
    (1996) Reliable genotyping of samples with very
    low DNA quantities using PCR. Nucleic Acids
    Research, 24, 3189-3194.
  • Gagneux P, Boesch C, Woodruff DS (1997)
    Microsatellite scoring errors associated with
    noninvasive genotyping based on nuclear DNA
    amplified from shed hair. Molecular Ecology, 6,
    861-868.

22
Gagneux P, Woodruff DS, Boesch C (1997) Furtive
mating in female chimpanzees. Nature, 387 (22 May
1997), 358-359.
  • Paternity study of the offspring of a chimpanzee
    community.
  • Half of the offspring did not display any allele
    inherited from an intragroup father.
  • Conclusion: these offspring had an extragroup
    father.
  • The dataset contained allelic dropouts (paper
    retraction in 2001).

23
Scan of the Gagneux paper in Mol Ecol
24
Genotyping errors: main difficulties in
non-invasive sampling
  • Contamination
  • Allelic dropout
  • False alleles

25
Contamination
  • The ability to detect a single target molecule
    also means that a single contaminant molecule
    can be detected.
  • Working with non-invasive genetic sampling is
    similar to ancient DNA studies.

26
Genotyping errors: allelic dropout
  • For a heterozygous individual, only one allele is
    present in the template and/or is amplified in
    the PCR reaction.
  • This error produces a false homozygote.

27
Genotyping errors: false alleles
  • Artifacts can be generated during the first
    cycles of the PCR reaction, and can be
    misinterpreted as true alleles.
  • Very difficult to discern from sporadic
    contamination.

28
Genotyping errors: example
Brown bear, locus G10B (alleles A and B)
  • Five independent genotyping experiments using
    the same DNA extract (from feces).

29
Genotyping errors: example
Brown bear, locus G10B (alleles A and B)
  • Fifty independent genotyping experiments using
    the same DNA extract (from bear feces).

30
Genotyping errors: example
Seven independent experiments using the same DNA
extract from bear feces.
31
Genotyping errors: example
Seven independent experiments using the same DNA
extract from a single marmot hair.
32
Influence of the amount of template DNA
From Goossens et al., 1998
33
Allelic dropout: mathematical model
  • The model is restricted to the genotyping of an
    individual bearing alleles A and B at an
    autosomal locus.
  • Many assumptions have been made.

34
Allelic dropout: mathematical model assumptions
  • The DNA extract contains equal numbers of the
    alleles A and B.
  • A single target molecule can be amplified and
    detected.
  • Each single target molecule has the same
    probability of being amplified.
  • 100 PCRs can be performed using the DNA extract,
    and the target DNA molecules are distributed
    randomly among the 100 PCR tubes.
  • If the initial proportion between alleles A and B
    (A/B or B/A) in the PCR tube is greater than or
    equal to five, then only the most common allele
    will be detected.
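
The assumptions above are enough to run the model as a small simulation. What follows is a minimal Python sketch, not the authors' original program: the function name `simulate_tubes`, the fixed random seed, and the decision to count a tube holding zero copies of one allele as a dropout are assumptions made for illustration.

```python
import random


def simulate_tubes(molecules_per_allele, n_tubes=100, ratio_threshold=5, seed=0):
    """Distribute A and B template molecules at random among PCR tubes
    and classify each tube under the assumptions of the model."""
    rng = random.Random(seed)
    counts = [[0, 0] for _ in range(n_tubes)]  # [A, B] molecules in each tube

    # Equal numbers of A and B molecules, each landing in a random tube.
    for allele in (0, 1):
        for _ in range(molecules_per_allele):
            counts[rng.randrange(n_tubes)][allele] += 1

    outcome = {"no_product": 0, "dropout": 0, "correct": 0}
    for a, b in counts:
        if a + b == 0:
            outcome["no_product"] += 1   # empty tube: nothing to amplify
        elif min(a, b) * ratio_threshold <= max(a, b):
            outcome["dropout"] += 1      # ratio >= 5, or one allele absent
        else:
            outcome["correct"] += 1      # both alleles detected
    return outcome


if __name__ == "__main__":
    for n in (50, 200, 1000, 5000):
        print(n, simulate_tubes(n))
```

Running it over a range of template amounts should show the share of correctly genotyped tubes rising as the number of molecules in the extract increases, which is the qualitative behaviour the model is meant to capture.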

35
The problem of very small DNA samples: simulations
Simulations for a heterozygote individual with
alleles A and B.
[Figure panels labelled "correct genotyping"]
36
Results of the simulations
[Figure: probability of correct genotyping at a
heterozygote microsatellite locus using very small
DNA samples. Curves: PCR product (at least one
allele) and correct genotyping (both alleles),
plotted against template DNA per amplification
(picograms). One cell contains about 7 picograms
of DNA.]
37
Guidelines for genotyping very small DNA samples
  • Multiple-tube approach.
  • Navidi W, Arnheim N, Waterman MS (1992) A
    multiple-tube approach for accurate genotyping of
    very small DNA samples by using PCR: statistical
    considerations. American Journal of Human
    Genetics, 50, 347-359.
  • Taberlet P, Griffin S, Goossens B, Questiau S,
    Manceau V, Escaravage N, Waits LP, Bouvet J
    (1996) Reliable genotyping of samples with very
    low DNA quantities using PCR. Nucleic Acids
    Research, 24, 3189-3194.

38
Guidelines for genotyping very small DNA samples
  • The guidelines are only valid under the following
    conditions.
  • A single target molecule can be detected.
  • The amount of template DNA is very low, in the
    picogram range, but is not accurately known.

39
Guidelines for genotyping very small DNA samples
  • Confidence level of 99%.
  • Multiple-tube approach.
  • Heterozygotes: an allele can be recorded only if
    it has been found at least twice.
  • Homozygotes: an individual can be considered as
    homozygous only if eight independent experiments
    have shown the same allele.
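
The two recording rules above can be written as a small decision function. This is a hedged Python sketch: the name `call_genotype`, the representation of each positive PCR as a set of allele labels, and the choice to return `None` when neither rule is satisfied are assumptions for illustration, not part of the published guidelines.

```python
from collections import Counter


def call_genotype(replicates, min_allele_count=2, min_homozygote_repeats=8):
    """Apply the multiple-tube recording rules to one locus.

    replicates: list of sets, one per positive PCR, holding the alleles seen.
    Returns a 2-tuple of alleles, or None if more replicates are needed.
    """
    # Count in how many independent PCRs each allele was observed.
    counts = Counter(allele for rep in replicates for allele in set(rep))
    confirmed = sorted(a for a, c in counts.items() if c >= min_allele_count)

    if len(confirmed) == 2:
        # Heterozygote: both alleles were found at least twice.
        return (confirmed[0], confirmed[1])

    if len(confirmed) == 1:
        allele = confirmed[0]
        # Homozygote: enough PCRs showing this allele and nothing else.
        single = sum(1 for rep in replicates if set(rep) == {allele})
        if single >= min_homozygote_repeats:
            return (allele, allele)

    # Not enough evidence, or more than two confirmed alleles.
    return None


if __name__ == "__main__":
    print(call_genotype([{"A", "B"}, {"A"}, {"A", "B"}]))
    print(call_genotype([{"A"}] * 7))
    print(call_genotype([{"A"}] * 8))
```

In this sketch, seven identical single-allele results are still treated as undetermined, while the eighth allows the homozygote to be recorded.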

40
Guidelines for genotyping very small DNA samples
  • How to avoid or to limit the impact of the
    multiple-tube approach?
  • By estimating the amount of template DNA
  • Miller C, Joyce P, Waits L (2002) Assessing
    allelic dropout and genotype reliability using
    maximum likelihood. Genetics, 160, 357-366.
  • Morin P, Chambers K, Boesch C, Vigilant L (2001)
    Quantitative polymerase chain reaction analysis
    of DNA from noninvasive samples for accurate
    microsatellite genotyping of wild chimpanzees
    (Pan troglodytes verus). Molecular Ecology, 10,
    1835-1844.

41
Quantitative PCR (from Morin et al., 2001)
Relationship between the initial amount of
template DNA in the PCR and both the proportion
of PCRs with amplification product (grey squares)
and the proportion of PCRs with allelic dropout
(black circles).
42
Towards a quality index
  • Goal: estimate a quality index associated with
    each sample.
  • This quality index should allow comparisons among
    samples, loci, and studies.
  • Restricted to the situation where the
    multiple-tube approach is used.

43
Towards a quality index
  • The estimation of the quality index (QI) is based
    on the analysis of the whole set of
    electropherograms produced when using the
    multiple-tube approach.
  • For each locus of a given sample, a QI is
    estimated using the following steps:
  • Step 1: estimation of the most likely consensus
    genotype
  • Step 2: estimation of the score for each repeat
  • Step 3: estimation of the quality index for the
    locus

44
Towards a quality index
  • Step 1: estimation of the most likely consensus
    genotype after simultaneous observation of the
    electropherograms corresponding to the different
    repeats of this locus. An allele is considered
    only if it is present at least twice among the
    different repeats.
  • Step 2: estimation of the score for each repeat.
    If the electropherogram at one repeat corresponds
    to the consensus genotype, the score "1" is
    assigned, otherwise the score "0" is assigned,
    whatever the differences.
  • Step 3: estimation of the QI for the locus. The
    scores assigned to each repeat are summed, and
    divided by the number of repeats.
  • Step 4: estimation of the mean QI per locus and
    per individual.

45
Additional rules
  • No signal is scored as "0".
  • Electropherograms with an additional allele are
    scored as "0".
  • If the less intense allele is less than 20% of
    the most intense allele, a score of "0" is given.
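
The scoring steps can be sketched in a few lines of Python. The function names, the use of `None` for a repeat with no signal, and the allele labels in the example are illustrative assumptions. The consensus here is simply the set of alleles seen in at least two repeats, whereas the slides describe it as a judgment made while looking at the electropherograms, and the peak-intensity rule is not modelled because it needs peak heights.

```python
from collections import Counter


def consensus_genotype(repeats):
    """Step 1 (simplified): alleles present in at least two repeats."""
    counts = Counter(a for rep in repeats if rep for a in set(rep))
    return frozenset(a for a, c in counts.items() if c >= 2)


def quality_index(repeats):
    """Steps 2 and 3: score each repeat against the consensus, then average.

    repeats: list with one entry per PCR repeat, either a set of alleles
    or None when there was no signal (scored 0).
    """
    consensus = consensus_genotype(repeats)
    scores = [1 if rep and frozenset(rep) == consensus else 0 for rep in repeats]
    return sum(scores) / len(repeats), scores


if __name__ == "__main__":
    # Hypothetical heterozygote A/B typed eight times.
    repeats = [{"A", "B"}, {"A", "B"}, {"A"}, {"A", "B"},
               {"A", "B"}, None, {"B"}, {"A", "B"}]
    qi, scores = quality_index(repeats)
    print(scores, qi)
```

For this hypothetical locus, five of the eight repeats match the consensus, so the quality index is 5/8 = 0.625.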

46
Quality index: example 1
Multiple-tube approach, 8 repeats
47
Quality index: example 2
Multiple-tube approach, 8 repeats
Step 2: score for each repeat: 0, 1, 0, 0, 1, 0, 0, 0
48
Quality index: example 3
Multiple-tube approach, 8 repeats
Step 2: score for each repeat: 1, 1, 0, 1, 1, 0, 0, 1
52
Quality indexes for loci, samples, and study
           Sample 1  Sample 2  Sample 3  Sample 4  Sample 5  Mean
Locus 1    0.88      0.63      0.75      0.00      1.00      0.65
Locus 2    1.00      0.38      1.00      0.25      1.00      0.73
Locus 3    1.00      0.25      0.63      0.25      1.00      0.63
Mean       0.96      0.42      0.79      0.17      1.00      0.67
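
The row, column, and overall means of this table can be recomputed directly from the per-locus, per-sample quality indexes. The Python sketch below copies the fifteen values from the slide. The variable names are illustrative, and the overall figure is taken as the mean of all fifteen values, which is an assumption about how the slide's 0.67 was obtained.

```python
# Quality index per locus (rows) and per sample (columns), from the slide.
qi = {
    "Locus 1": [0.88, 0.63, 0.75, 0.00, 1.00],
    "Locus 2": [1.00, 0.38, 1.00, 0.25, 1.00],
    "Locus 3": [1.00, 0.25, 0.63, 0.25, 1.00],
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Mean QI per locus (across samples).
locus_means = {locus: round(mean(v), 2) for locus, v in qi.items()}

# Mean QI per sample (across loci): transpose the table with zip.
sample_means = [round(mean(col), 2) for col in zip(*qi.values())]

# Mean QI for the whole study (all locus-by-sample cells).
study_mean = round(mean(x for v in qi.values() for x in v), 2)

if __name__ == "__main__":
    print(locus_means)
    print(sample_means)
    print(study_mean)
```

The recomputed values agree with the means shown on the slide, and they make it easy to spot low-quality samples (here samples 2 and 4) as opposed to low-quality loci.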
53
Quality indexes for samples
54
Quality indexes for loci
55
Non-invasive census of the brown bears from the
Deosai National Park (Pakistan)
56
Non-invasive census of the brown bears from the
Deosai National Park (Pakistan)
57

Genotyping errors
  • Definition
  • Non-invasive sampling and genotyping errors
  • Causes of genotyping errors
  • Quantifying genotyping errors
  • Consequences of genotyping errors
  • How to limit genotyping errors and their impact?

58
Causes of genotyping errors
  • Very diverse, complex, and sometimes cryptic
    origins.
  • Grouping errors into discrete categories
    according to their causes is challenging.
  • DNA sequence
  • Low DNA quantity or quality
  • Biochemical artifacts
  • Human errors

59
"A" artifact
CGATCGTTAATCAGAATGCATACCGCA
GCTAGCAATTAGTCTTACGTATGGCG
60
(No Transcript)
61
"A" artifact example
62
Three solutions
  • Enzymatic treatment of the PCR product with T4
    DNA polymerase to remove the additional "A".
  • Modification of the PCR parameters.
  • Modification of the 5' end of the non-labeled
    primer.

63
Modification of the PCR parameters
64
Modification of the primer principle
65
Modification of the primer result
66
(No Transcript)
67
PCR conditions
68
Modifications reducing the "A"
69
Modifications enhancing the "A"
70
(No Transcript)
71
PCR conditions
72
Experiments
73
Original XA, modified XT
Enhance the "A"
Original:
CGATCGTTAATCAGAATGCATACCGCA
GCTAGCAATTAGTCTTACGTATGGCGT
Modified:
CGATCGTTAATCAGAATGCATACCGCTA
GCTAGCAATTAGTCTTACGTATGGCGA
74
Original XT, modified XA
Reduce the "A"
Original:
CGATCGTTAATCAGAATGCATACCGCTA
GCTAGCAATTAGTCTTACGTATGGCGA
Modified:
CGATCGTTAATCAGAATGCATACCGCA
GCTAGCAATTAGTCTTACGTATGGCGT
75
"A" artifact conclusion (1)
  • Do not perform a final elongation without reason
    (this elongation enhances the "A" artifact).
  • Even at 4 °C, the "A" is slowly added.
  • Use the simplest PCR protocol at the
    beginning.
  • In case of scoring difficulty, identify if the
    marker is "2-steps" or "3-steps".
  • Modify the primers and the PCR protocol if
    necessary.

76
"A" artifact conclusion (2)
  • To enhance the "A":
      • Final elongation (up to 90 minutes)
      • Put a "G" at the 5' end of the reverse primer
      • Add a "PIGtail" (GTGTCTT)
  • To reduce the "A":
      • 2-step PCR without final elongation
      • Put a "T" at the 5' end of the reverse primer
  • Good luck with multiplexing markers!

77
DNA molecules interactions
  • Cause: DNA sequence flanking the marker
      • No or less efficient amplification because of a
        mutation in the target primer sequence (null
        allele)
      • Insertion or deletion in the amplified fragment
        (size homoplasy of different alleles)
      • In heterozygous individuals, preferential
        amplification of one allele when its denaturation
        is favoured (allelic dropout)

78
Sample quality
  • Cause 1: low DNA quality or quantity
      • In heterozygous individuals, amplification of
        only one allele (allelic dropout)
      • In heterozygous individuals, preferential
        amplification of the shorter allele (short allele
        dominance)
  • Cause 2: contamination of the DNA extract
      • Amplification of a contaminant allele (mistaken
        allele)
  • Cause 3: extract quality
      • No or less efficient restriction/amplification
        due to inhibitors (allelic dropout)

79
Biochemical artifacts and equipment
  • Cause 1: low quality reagents
      • Allelic dropout, mistaken alleles
  • Cause 2: equipment precision or reliability
      • Allelic dropout, mistaken alleles
  • Cause 3: Taq polymerase errors
      • False allele
  • Cause 4: lack of specificity
      • Mistaken allele
  • Cause 5: electrophoresis artifacts
      • Size homoplasy of different alleles, mistaken
        alleles

80
Human factor
  • Cause 1: sample manipulation
      • Confusion between samples (e.g. mislabelling or
        tube mixing) (mistaken allele(s))
  • Cause 2: experimental error
      • Contamination with an exogenous DNA or
        cross-contamination between samples (mistaken
        allele(s))
      • Use of an inappropriate protocol (reagent
        forgotten, wrong hybridization temperature,
        primers, or concentrations of reagents) (allelic
        dropout, mistaken allele(s))
  • Cause 3: data handling
      • Misreading of the profile or misidentification of
        the fluorescent peak (mistaken allele)
      • Miscopying or confusion of the genotypes in the
        database (mistaken allele)
      • Computing data bug in the database/analysis
        program (mistaken allele)

81

Genotyping errors
  • Definition
  • Non-invasive sampling and genotyping errors
  • Causes of genotyping errors
  • Quantifying genotyping errors
  • Consequences of genotyping errors
  • How to limit genotyping errors and their impact?

82
Quantifying genotyping errors
  • Different estimates, based on replicates within a
    dataset, have been defined to quantify error
    rates.
  • Some metrics have been proposed for specific
    errors such as allelic dropouts or false alleles.
  • More global metrics, which take into account all
    types of detectable genotyping errors, are also
    commonly used although they have never been
    explicitly defined.

83
Quantifying genotyping errors
  • First, a reference genotype must be defined as
    the genotype that minimizes the number of errors
    in the comparison among replicates. Several
    reference genotypes may exist. If only two
    replicates are performed and give contradictory
    genotypes, either one or the other can be
    considered as the reference.
  • The calculation of error rates is based on the
    number of mismatches between the reference
    genotype and the replicates.

84
Quantifying genotyping errors
  • n individual single-locus genotypes have been
    replicated t times.
  • For diploid individuals, 2nt alleles and nt loci
    are typed and can be compared to the reference.
  • Estimation of the error rates at the allelic,
    locus, multilocus, and reaction levels.

85
Quantifying genotyping errors
  • Mean allelic error rate
  • Mean error rate per locus
  • Error rate per multilocus genotype
  • Error rate per reaction

86
Mean allelic error rate e_a
  • The mean allelic error rate e_a is the ratio
    between m_a, the number of allelic mismatches, and
    2nt, the number of replicated alleles.
  • For microsatellite markers, the error rate per
    allele can also be estimated for each particular
    allele, to point out possible error-prone
    alleles (for example, alleles prone to dropouts).

87
Mean error rate per locus e_l
  • The mean error rate per locus e_l is the ratio
    between m_l, the number of single-locus genotypes
    including at least one allelic mismatch, and nt,
    i.e. the number of replicated single-locus
    genotypes.
  • This metric can also be estimated for each
    particular locus, to help identify the
    error-prone loci.
  • As it can be compared between studies and
    samples, it should become the standard metric.

88
Error rate per multilocus genotype e_obs (1)
  • The observed error rate per multilocus genotype
    e_obs is the ratio between m_g, the number of
    multilocus genotypes including at least one
    allelic mismatch, and nt, the number of
    replicated multilocus genotypes.
  • This metric is particularly informative for
    individual identification, parentage analysis or
    population size estimation.

89
Error rate per multilocus genotype e_ind (2)
  • If genotyping errors occur independently among l
    loci (which is very unlikely), the error rate per
    multilocus genotype e_ind is deduced from the
    single-locus error rate e_i at each locus i.
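
The formula that accompanied this slide is not preserved in the transcript. Under the stated independence assumption, the probability that a multilocus genotype carries at least one error is the complement of the probability that every locus is error-free, so a reconstruction consistent with the definitions above would be the following (with $m_a$, $m_l$, $m_g$, $n$, $t$ and $r$ as defined on the neighbouring slides, and the ratios for the other metrics restated for comparison):

```latex
\begin{align}
  e_a            &= \frac{m_a}{2nt}, &
  e_l            &= \frac{m_l}{nt},  &
  e_{\text{obs}} &= \frac{m_g}{nt},  &
  e_r            &= \frac{m_l}{r},   \\
  e_{\text{ind}} &= 1 - \prod_{i=1}^{l} \left(1 - e_i\right).
\end{align}
```

The first line only restates the ratios given in words on the slides. The second line is the standard result for independent events and should be read as a reconstruction, not as a quotation from the original slide.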

90
Error rate per reaction e_r
  • The error rate per reaction e_r is the ratio
    between m_l, the number of single-locus genotypes
    including at least one allelic mismatch, and r,
    the total number of reactions.
  • This metric is equivalent to the mean error rate
    per locus when the PCR reaction involves one
    locus or to the multilocus error rate when all
    loci are amplified in a single multiplex reaction.

91
Estimation of the error rates per allele and per
locus, for four replicates (t = 4) of three
individuals (n = 3)
Individual  Allele  Rep. 1  Rep. 2  Rep. 3  Rep. 4  Reference  Error rate  Error rate
                                                    genotype   per allele  per locus
Ind 1       Al 1    A       A       B       A       A          3/8         2/4
Ind 1       Al 2    A       B       C       A       A
Ind 2       Al 1    A       B       B       A       A or B     2/8         2/4
Ind 2       Al 2    B       B       B       B       B
Ind 3       Al 1    A       A       A       A       A          1/8         1/4
Ind 3       Al 2    C       C       B       C       C
Mean                                                           1/4         5/12
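
The figures in this table can be checked with a short script. The Python sketch below encodes the replicated genotypes from the table and counts mismatches against the reference, allele row by allele row as laid out on the slide. The function and variable names are illustrative, and for individual 2 the reference A/B is chosen arbitrarily among the two equally good candidates.

```python
from fractions import Fraction

# Replicated genotypes (Al 1, Al 2) for each individual, four replicates each.
replicates = {
    "Ind 1": [("A", "A"), ("A", "B"), ("B", "C"), ("A", "A")],
    "Ind 2": [("A", "B"), ("B", "B"), ("B", "B"), ("A", "B")],
    "Ind 3": [("A", "C"), ("A", "C"), ("A", "B"), ("A", "C")],
}
# Reference genotypes; for Ind 2, ("B", "B") would give the same error counts.
reference = {"Ind 1": ("A", "A"), "Ind 2": ("A", "B"), "Ind 3": ("A", "C")}


def error_rates(replicates, reference):
    """Return (mean allelic error rate, mean error rate per locus)."""
    m_a = 0  # number of allelic mismatches
    m_l = 0  # number of single-locus genotypes with at least one mismatch
    nt = 0   # number of replicated single-locus genotypes
    for individual, genotypes in replicates.items():
        for genotype in genotypes:
            mismatches = sum(obs != ref
                             for obs, ref in zip(genotype, reference[individual]))
            m_a += mismatches
            m_l += 1 if mismatches else 0
            nt += 1
    return Fraction(m_a, 2 * nt), Fraction(m_l, nt)


if __name__ == "__main__":
    e_a, e_l = error_rates(replicates, reference)
    print(e_a, e_l)
```

With these data the script gives an allelic error rate of 6/24 = 1/4 and a per-locus error rate of 5/12, matching the last row of the table.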
92
Reference papers
  • Bonin A, Bellemain E, Bronken Eidesen P, Pompanon
    F, Brochmann C, Taberlet P (2004) How to track
    and assess genotyping errors in population
    genetics studies. Molecular Ecology, 13,
    3261-3273.
  • Hoffman J, Amos W (2005) Microsatellite
    genotyping errors: detection approaches, common
    sources and consequences for paternal exclusion.
    Molecular Ecology, 14, 599-612.

93
Example of error rates
  • Bonin et al. (2004)
      • Bear tissues: 0.008 per locus
      • Bear faeces: 0.019 per locus
      • AFLP: 0.019 to 0.026 per locus
  • Hoffman and Amos (2005)
      • 2000 Antarctic fur seals genotyped at 9
        microsatellite loci
      • 0.0013 to 0.0074 per locus
      • Human errors are the most important cause

94

Genotyping errors
  • Definition
  • Non-invasive sampling and genotyping errors
  • Causes of genotyping errors
  • Quantifying genotyping errors
  • Consequences of genotyping errors
  • How to limit genotyping errors and their impact?

95
Consequences of genotyping errors
  • Linkage and association studies
  • Individual identification
  • Population genetic studies

96
Linkage and association studies
  • Erroneous genotypes might markedly affect linkage
    and association studies by hiding the true
    segregation of alleles.
  • The impact on the results is measured by
    experimental or simulation studies and can be
    serious even for low error rates (e.g. < 3%).
  • For example, in linkage studies, genotyping
    errors can affect the haplotype frequency and
    ultimately lead to inflation of genetic map
    lengths.
  • Error rates as low as 3% have serious effects on
    linkage disequilibrium analysis, and a 1% error
    rate can generate a loss of 53-58% of the linkage
    information for a trait locus. However, modest
    error rates might be tolerable in situations that
    do not involve rare alleles, as in QTL studies.

97
Linkage and association studies
  • In association studies, because recombination is
    rare, errors mostly affect non-recombinant
    genotypes, which are then erroneously interpreted
    as being the result of recombination. Errors
    therefore decrease the power for detecting
    associations.
  • The importance of the experimental design has
    also to be emphasised as it can generate errors
    that are not randomly distributed across
    phenotypes (i.e., differential errors). This can
    be the case when controls and cases are genotyped
    in different assays while investigating the
    genetic basis of a disease. Differential and
    non-differential errors can have opposite
    consequences on the rate of false positives in
    statistical tests of association.

98
Individual identification
  • Genotyping errors can strongly affect individual
    identification studies that are based on
    multilocus genotypes by erroneously increasing
    the number of genotypes observed in a population
    sample.
  • In census studies of rare or elusive species, the
    population size can be estimated based on the
    identified genotypes from non-invasive samples
    collected in the field (e.g., hair or faeces). In
    this context, genotyping errors can lead to a
    serious overestimate of population size.
  • A 200% overestimate of population size has been
    found with a 5% error rate per locus when using 7
    to 10 loci for genotype identification (Creel et
    al., 2004). Such an overestimate obviously
    increases with the number of loci and with the
    number of samples per genotype.
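
A rough calculation shows why a modest per-locus error rate inflates the number of apparent individuals. The Python sketch below assumes that errors are independent among loci, which the presentation itself describes as unlikely, so the numbers are only indicative. The function name is illustrative, and the calculation is not the method used in the cited study.

```python
def multilocus_error_rate(per_locus_rates):
    """Probability that a multilocus genotype contains at least one error,
    assuming errors occur independently at each locus."""
    p_all_correct = 1.0
    for e in per_locus_rates:
        p_all_correct *= 1.0 - e
    return 1.0 - p_all_correct


if __name__ == "__main__":
    for n_loci in (7, 10):
        rate = multilocus_error_rate([0.05] * n_loci)
        print(f"{n_loci} loci at 5% per locus: {rate:.2f}")
```

Under this assumption, roughly 30% of multilocus genotypes typed at 7 loci, and about 40% at 10 loci, would contain at least one error. Each such genotype can be mistaken for a new individual, which is why adding loci or samples makes the overestimate worse.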

99
Individual identification
  • Genotyping errors also have a huge impact in
    parentage analysis, generating wrong paternity or
    maternity exclusion.
  • Such information on population size and structure
    is required in conservation biology, and its
    inaccurate estimation due to genotyping errors
    could result in wrong decisions in population
    management.
  • In forensic DNA analyses, a false multilocus
    genotype can prevent the identification of a
    corpse or lead to erroneous identification (or
    exoneration) of criminal offenders.

100
Population genetic studies
  • Most of the studies that take genotyping error
    into account in population genetics are those
    that use non-invasive samples, which are
    error-prone because of the low quality and/or
    quantity of DNA.
  • However, it has been demonstrated that even with
    high quality DNA the error rate might not be
    negligible.
  • The impact of genotyping errors remains largely
    unknown in this field, because very few studies
    have dealt with this topic until now.
  • Genotyping errors may lead to erroneous allele
    identification or allele frequencies, resulting
    in wrong Fst estimates, false migration rates, or
    false detection of selection or population
    bottlenecks.

101
Population genetic studies
  • Analyses based on allele frequencies will be less
    affected by errors than those based on individual
    identification (e.g., parentage analysis), but
    will be sensitive to sampling effects.
  • The apparently low impact of scoring differences
    has been demonstrated on an AFLP data set that
    was scored by two different scientists. The two
    scorers had only 38% of the marker loci in
    common, but the same biological conclusions about
    population genetic structure were drawn from
    the data. In this study, the robustness of the
    inferred biological message was certainly due to
    the redundancy of the information contained in
    the large number of AFLP markers (more than 200
    polymorphic loci screened by both scorers).
  • Population genomics studies looking for selected
    markers among several hundred markers would be
    very sensitive to the impact of genotyping error,
    especially if the errors are population-specific.
    There is a great need for studies on the impact
    of genotyping error in this new emerging field.

102

Genotyping errors
  • Definition
  • Non-invasive sampling and genotyping errors
  • Causes of genotyping errors
  • Quantifying genotyping errors
  • Consequences of genotyping errors
  • How to limit genotyping errors and their impact?

103
How to limit genotyping errors and their impact?
  • The worst situation arises when a scientist
    realises at the end of a study that the data were
    not reliable due to genotyping errors, and that
    the dataset is not retrievable.
  • Such situations are almost never reported in the
    literature, but their occurrence is probably not
    rare.
  • Therefore, it is important to take into account
    the possibility of genotyping errors when
    designing the experimental protocol.

104
How to limit genotyping errors and their impact?
  • The strategy consists in demonstrating, via an
    appropriate procedure, that the data produced and
    the results obtained are reliable.
  • The diversity of case studies, error causes, and
    laboratory contexts makes it impossible to
    propose a universal and simple procedure.
  • As a consequence, the possible solutions to limit
    the occurrence and the impact of genotyping
    errors are case-specific.
  • The optimal strategy will be determined by
    several factors, such as the biological question,
    the tolerable error rate, the sampling
    possibilities, the equipment and technical skills
    that are locally available, the financial support
    and time constraints.

105
How to limit genotyping errors and their impact?
  • General recommendations
  • Limiting the production of errors during
    genotyping
  • Cleaning the dataset after genotyping
  • Analysing data taking into account the errors
  • Towards quality processes for genotyping
  • Practicals: establishing reliable experimental
    protocols (case studies)

106
General recommendations (1)
  • A first step consists in checking that the
    genotyping experiments necessary to reach the
    scientific goal are realistic according to the
    sample quality and the technical skills available
    (bad sample quality and limited technical skills
    obviously influence the error rate).
  • A second step involves carrying out a pilot study
    designed to first evaluate the theoretical error
    rate compatible with the data analysis, and then
    to estimate the real error rate based on the
    analysis of a subset of the samples.
  • Finally, it is important to be aware of potential
    problems all along the experimental procedure,
    even after a successful pilot study, from
    sampling to data analysis.

107
General recommendations (2)
  • Quality controls should be performed in real time
    during each step and each batch of experiments.
  • They should also be diverse enough to detect as
    many types of errors as possible. For example,
    highly reproducible errors such as null alleles
    cannot be detected by replicates, and require
    Hardy-Weinberg tests or inheritance studies.
    Conversely, stochastic allelic dropouts might
    not be detected by Hardy-Weinberg tests, but can
    be detected by replicating the genotyping assays.
  • Control procedures are costly and time-consuming.
    Thus the effort for reducing the error rate must
    be adapted to the foreseeable impact of the
    genotyping errors.
  • Because genotyping errors may be generated even
    with high quality standards, and because they
    cannot all be detected, efforts must be directed
    towards limiting both their production and their
    subsequent impact.

108
Limiting the production of errors during
genotyping (1)
  • Given that human factors can be the main issue
    during genotype production, the most efficient
    approach is to concentrate first on minimizing
    human error.
  • Only well-trained bench scientists/technicians
    should be involved, as suggested by quality
    assurance standards for forensic DNA testing
    laboratories.
  • Only standardized and validated procedures should
    be used.
  • Human manipulation should be reduced as much as
    possible according to the automation
    possibilities, from all handling and pipetting
    steps to allele scoring. However, for allele
    scoring, software packages are not yet
    sophisticated enough to prevent scoring errors.
    Semi-automated scoring followed by human visual
    inspection appears to be the most reliable
    procedure.

109
Limiting the production of errors during
genotyping (2)
  • Limiting genotyping errors during laboratory
    experiments requires the systematic use of an
    appropriate number of positive and negative
    controls, but also requires the implementation of
    replicates for real-time error detection and
    error rate estimation.
  • In every situation, even with high quality DNA,
    replicating 5 to 10% of the samples has been
    recommended, but the proportion can vary
    according to the goal of the study and the
    potential impact of errors.
  • As far as possible, these replicates have to be
    carried out blind and independently.
  • This involves implementing the blind process from
    the beginning of the experiment, by carrying out
    a systematic duplication of the samples during
    sample collection. Such a procedure will not only
    allow the detection of all laboratory errors, but
    will also pick up handling errors at any stage of
    the analysis. Moreover, comparing blind samples
    with the original experiments will produce a fair
    estimate of the error rate.
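
The comparison of blind replicates with the original run can be sketched as follows. This is a minimal illustration that assumes one common definition of the error rate (mismatched single-locus genotypes divided by the number of replicated single-locus genotypes). The data layout, sample names and allele sizes are hypothetical.

```python
from typing import Dict, Tuple

Genotype = Tuple[int, int]  # two allele sizes at one locus; order is irrelevant


def per_locus_error_rate(original: Dict[str, Dict[str, Genotype]],
                         replicate: Dict[str, Dict[str, Genotype]]) -> float:
    """Fraction of single-locus genotypes that differ between the two runs."""
    compared = 0
    mismatched = 0
    for sample, loci in replicate.items():
        for locus, rep_gt in loci.items():
            orig_gt = original.get(sample, {}).get(locus)
            if orig_gt is None:
                continue  # genotype missing from the original run: not comparable
            compared += 1
            if sorted(orig_gt) != sorted(rep_gt):
                mismatched += 1
    if compared == 0:
        raise ValueError("no genotype pairs to compare")
    return mismatched / compared


# Hypothetical data: two samples typed at two loci, then re-typed blind.
original = {
    "S1": {"L1": (120, 124), "L2": (200, 200)},
    "S2": {"L1": (118, 118), "L2": (198, 202)},
}
replicate = {
    "S1": {"L1": (124, 120), "L2": (200, 200)},  # same genotype, alleles swapped
    "S2": {"L1": (118, 122), "L2": (198, 202)},  # L1 differs from the original
}
rate = per_locus_error_rate(original, replicate)
print(f"mismatch rate between runs: {rate:.2f}")
```

A mismatch only shows that at least one of the two runs is wrong, so this figure is best read as a rate of discordance between runs rather than as the exact proportion of erroneous genotypes.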

110
Limiting the production of errors during
genotyping (3)
  • When genotyping errors are highly probable, blind
    replicates are still necessary but not
    sufficient. The systematic replication of each
    genotyping assay (i.e., multiple-tube approach)
    may be required to define the consensus
    genotypes.
  • There is a trade-off between the cost of the
    experiments and the reliability of the genotypes.
  • One role of the pilot study is to determine the
    optimal number of replicates required.
  • In some cases, errors can also be detected by
    replicating the genotyping process using a
    different technology, such as sequencing, whose
    error rates are typically lower than those of
    standard genotyping technologies.
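
A consensus rule for the multiple-tube approach might look like the sketch below. The thresholds are hypothetical parameters that a pilot study would have to set: an allele is accepted once it has been seen in at least `min_allele_obs` replicates, and a homozygote is accepted only after `min_homozygote_reps` concordant replicates with no other allele observed.

```python
from collections import Counter
from typing import List, Optional, Tuple

Genotype = Tuple[int, int]


def consensus_genotype(replicates: List[Genotype],
                       min_allele_obs: int = 2,
                       min_homozygote_reps: int = 3) -> Optional[Genotype]:
    """Return a consensus genotype, or None if more replicates are needed."""
    counts: Counter = Counter()
    for genotype in replicates:
        counts.update(set(genotype))  # count each allele once per replicate
    confirmed = sorted(a for a, n in counts.items() if n >= min_allele_obs)
    if len(confirmed) == 2:
        return (confirmed[0], confirmed[1])  # heterozygote, both alleles confirmed
    if (len(confirmed) == 1 and len(counts) == 1
            and counts[confirmed[0]] >= min_homozygote_reps):
        return (confirmed[0], confirmed[0])  # homozygote seen consistently
    # Unconfirmed alleles, too few replicates, or more than two confirmed
    # alleles (possible contamination): no consensus yet.
    return None


print(consensus_genotype([(120, 124), (120, 120), (120, 124)]))
print(consensus_genotype([(120, 120), (120, 120)]))
```

The stricter requirement for homozygotes reflects the fact that allelic dropout turns a true heterozygote into an apparent homozygote, so a single homozygous result is weak evidence.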

111
Cleaning the dataset after genotyping (1)
  • Even if all erroneous genotypes detected during
    the experiments are removed, and possibly
    corrected after re-genotyping, some undetected
    errors will certainly remain in the data set.
    Some of them can still be detected or suspected
    by looking at the concordance with independent
    data.
  • The power of detecting errors by consistency with
    independent data can influence the strategy for
    limiting errors.
  • It might be more efficient to retype erroneous
    genotypes detected by consistency checking than
    to run a large proportion of blind replicates.

112
Cleaning the dataset after genotyping (2)
  • Testing for Hardy-Weinberg equilibrium is a
    common way to check the quality of the data,
    under the assumption that a high error rate
    implies disequilibrium. However, many other
    causes can lead to disequilibrium, including
    selection, inbreeding and population admixture.
  • Moreover, only a few types of error, such as null
    alleles and allelic dropouts, are likely to
    produce disequilibrium.
  • Therefore there is still a need for other
    controls and replicates for detecting errors that
    are compatible with Mendelian inheritance and
    Hardy-Weinberg equilibrium.
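
As an illustration of the Hardy-Weinberg check discussed above, here is a minimal chi-square goodness-of-fit sketch for a single biallelic locus. The biallelic setting and the genotype counts are assumptions made for the example. For small samples or multi-allelic markers an exact test is usually preferable.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test (1 degree of freedom) of Hardy-Weinberg proportions.

    Returns (chi2, p_value) for observed genotype counts AA, AB, BB.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical locus with a strong heterozygote deficit, the pattern
# expected from null alleles or frequent allelic dropout.
chi2, p_value = hwe_chi_square(40, 20, 40)
print(f"chi2 = {chi2:.1f}, p = {p_value:.2e}")
```

A significant deficit only flags the locus for inspection: selection, inbreeding or admixture can produce the same signal.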

113
Cleaning the dataset after genotyping (3)
  • Several computer programs specifically designed
    to detect potential errors are now available.
  • Most of them check for Mendelian consistency
    and/or Hardy-Weinberg equilibrium, and are
    commonly used for pedigree analyses and linkage
    studies.
  • Some others have been developed to track some
    kinds of errors that can be compatible with
    Mendelian inheritance or Hardy-Weinberg
    equilibrium. For example, some detect a spurious
    excess of recombinants in linkage studies and
    others focus on inconsistencies between
    replicates.
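
The Mendelian consistency check performed by such programs can be illustrated for a single locus in a parent-offspring trio. This is a simplified sketch that does not reproduce the algorithm of any program listed in this presentation. The function name and the genotypes are hypothetical.

```python
from typing import Tuple

Genotype = Tuple[int, int]  # two allele sizes at one locus


def mendelian_consistent(child: Genotype, mother: Genotype,
                         father: Genotype) -> bool:
    """True if the child could have inherited one allele from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


# Compatible trio: 120 from the mother, 124 from the father.
print(mendelian_consistent((120, 124), (120, 122), (124, 126)))
# Incompatible trio: the father carries no 120 allele. An apparent homozygote
# like this child can result from a null allele or an allelic dropout.
print(mendelian_consistent((120, 120), (120, 122), (124, 126)))
```

Errors that happen to yield a compatible genotype pass this check unnoticed, which is why replicate-based controls remain necessary.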

114
Cleaning the dataset after genotyping (4)
  • Removing errors might not reduce bias, depending
    on the number and kind of errors detected and the
    bias each one creates.
  • For instance, when correcting
    Mendelian-incompatible genotypes by retyping or
    removing families in which they occur, the
    undetected errors can produce an excess of false
    positives for some family-based association
    tests. This problem has been addressed by
    developing an appropriate Likelihood Ratio Test
    based on a general genotype error model.
  • In general, taking into account the occurrence of
    errors in the analysis is crucial, especially for
    large or error-prone data sets.

115
Example of genotyping error
116
Computer programs for detecting errors (1)
  • GEMINI http://pbil.univ-lyon1.fr/software/Gemini/gemini.htm
  • PAWE http://linkage.rockefeller.edu/pawe/
  • PREST http://fisher.utstat.toronto.edu/sun/Software/Prest/
  • Pedcheck http://watson.hgen.pitt.edu/register/docs/pedcheck.html
  • PedManager http://www.broad.mit.edu/ftp/distribution/software/pedmanager/
  • MENDEL http://www.genetics.ucla.edu/software/
  • SIMWALK http://www.genetics.ucla.edu/software/
  • Genocheck http://softlib.rice.edu/geno.html
  • R/QTL http://www.biostat.jhsph.edu/kbroman/qtl/
117
Computer programs for detecting errors (2)
  • CERVUS http://helios.bto.ed.ac.uk/evolgen/cervus/cervus.html
  • GIMLET http://pbil.univ-lyon1.fr/software/Gimlet/gimlet.htm
  • RelioType http://www.cnr.uidaho.edu/lecg/pubs_and_software.htm
  • Micro-checker http://www.microchecker.hull.ac.uk
  • DROPOUT http://www.fs.fed.us/rm/wildlife/genetics
  • PARENTE http://www2.ujf-grenoble.fr/leca/membres/manel.html
  • PAPA http://www.bio.ulaval.ca/louisbernatchez/downloads_fr.htm
  • PseudoMarker http://www.helsinki.fi/tsjuntun/pseudomarker/
  • TDTae ftp://linkage.rockefeller.edu/software/tdtae2/
  • LRTae ftp://linkage.rockefeller.edu/softare/lrtae/
118
How to limit genotyping errors and their impact?
119
How to limit genotyping errors and their impact?
120
Towards quality processes for genotyping (1)
  • In every scientific discipline, the reliability
    of the conclusions strongly depends on the
    quality of the data.
  • For geneticists, genotyping errors may strongly
    affect the results.
  • The protocol used for minimizing the occurrence
    of errors, the methods for error detection, and
    the estimated error rate should be provided for
    each study.
  • With this information, it will be possible to
    assign to each genotype a quality index, allowing
    the scientific community to have a critical view
    when unexpected results are published.

121
Towards quality processes for genotyping (2)
  • More and more studies, often in the context of
    international programs, generate enormous
    datasets that cannot be produced in a single
    laboratory.
  • The reproducibility of genotyping becomes more
    and more important.
  • Even for markers known to be robust (SNPs,
    microsatellites, AFLPs), differences may appear
    among laboratories and over time within the same
    laboratory.

122
Towards quality processes for genotyping (3)
  • Expression studies using microarray experiments
    are known to be error-prone, and the scientific
    community reacted by designing strict standards:
    the Minimum Information About a Microarray
    Experiment (MIAME) provides a checklist to guide
    authors and journal editors, ensuring that data
    are made publicly available in a format that
    enables unambiguous interpretation and potential
    verification of the conclusions. It includes
    several steps verifying, for instance, experiment
    design, sample preparation, and data measurement.

123
Towards quality processes for genotyping (4)
  • Genotyping errors have been identified since the
    very beginning of molecular genetics.
  • Their consequences in statistical genetics were
    pointed out in 1957, and null alleles in blood
    groups have been recognised since 1938.
  • They were too often neglected in the past, and
    it is clear that they merit much more attention
    given their dramatic impact in some studies.
  • Recently, many papers have dealt with genotyping
    errors, and it seems that the scientific
    community is beginning to realise their
    importance.

124
Towards quality processes for genotyping (5)
  • The fields of ancient DNA and gene expression
    suffered a crisis of confidence, with a series of
    erroneous papers published in leading journals.
    As a consequence, these two scientific
    communities were able to set up strict standards
    that promoted data quality and solved the crisis.
  • In population genetics, the situation is
    different because only a few erroneous papers
    have been published. Therefore, this community
    has apparently not been strongly pushed to
    establish strict standards. Another explanation
    for the delay in establishing strict standards
    might be related to the complexity of the
    problems.
  • Given the recent awareness of the occurrence of
    genotyping errors and of their potential impact,
    it can be predicted that more and more attention
    will be paid to these difficulties when designing
    experimental protocols and publishing results.

125
How to limit genotyping errors and their impact?
  • General recommendations
  • Limiting the production of errors during
    genotyping
  • Cleaning the dataset after genotyping
  • Analysing data taking into account the errors
  • Towards quality processes for genotyping
  • Practicals: establishing reliable experimental
    protocols (case studies)

126
Practicals: establishing reliable experimental
protocols (case studies)
  • Identify the question
  • Design the pilot study
  • Design the sampling strategy
  • Design the experimental protocol that will limit
    genotyping errors as much as possible
  • Design the data analysis process

127
Practicals: establishing reliable experimental
protocols (case studies)
  • A good approach is to consider that the
    experimental protocol used will produce only
    artifacts and that all the samples have been
    mixed during the process.
  • The best strategy is to try to establish an
    experimental protocol that demonstrates that no
    artifacts have been produced, and that nothing
    has been mixed up during the process.

128
Phylogeography of Capercaillie
  • What is the status of the Pyrenean and Cantabrian
    populations?

129
Phylogeography of Capercaillie
  • 92 faecal samples covering the whole range (23
    localities).
  • Sequencing of the mitochondrial DNA control
    region (443 bp) in both directions.
  • Unexpected results that could come from tube
    mixing!

130
(No Transcript)