Assessment of genotype data quality and the effect of data quality on analysis


1
Assessment of genotype data quality and the
effect of data quality on analysis
  • Laura Scott
  • March 29, 2006

2
Questions
  • What methods can be used to assess quality of
    genotype data?
  • What criteria are used to eliminate bad
    markers?
  • How do markers with poor genotype quality affect
    tests of association and linkage?
  • When are genotype errors not really errors?

3
What is a genotyping error ?
  • Genotype error: the observed genotype and the true
    genotype don't match
  • Truth: AA; observed: AB (or BB)
  • Missing data: the genotype exists but is not called
  • Truth: AA; observed: no call

4
Terminology
  • Blinded
  • Genotyping group is unaware of the duplicates or
    family relationships
  • Unblinded
  • Genotyping group is aware of duplicates or family
    relationships. Use this information to:
  • Form clusters (Illumina)
  • Adjust algorithms (Affy)

5
Sources of apparent genotyping error
  • Sample quality
  • Sample specific characteristics
  • Human/machine error in sample handling
  • Error in genotyping procedure
  • Human error in data manipulation or test
  • Genomic DNA characteristics

6
Methods of detecting genotyping
errors/questionable data
7
Samples with poor success rate
  • Poor quality DNA preparation
  • Low DNA concentration
  • Some whole genome amplification methods produce
    poor quality samples
  • Poor genotyping

8
SNPs with poor success rates
  • Poor genotyping
  • Deletions
  • SNP under primer
  • Sample contamination
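
The success rates on the two slides above can be computed per sample and per SNP from a genotype table. The sketch below is only illustrative: the data layout (a dict of sample name to a list of calls, with None for a missing call) and the function name call_rates are assumptions, not part of the presentation.

```python
def call_rates(genotypes):
    """Return (per-sample, per-SNP) genotyping success rates.

    genotypes: dict mapping sample name -> list of calls in a fixed SNP
    order, where None marks a missing call (hypothetical layout).
    """
    per_sample = {
        sample: sum(call is not None for call in calls) / len(calls)
        for sample, calls in genotypes.items()
    }
    n_snps = len(next(iter(genotypes.values())))
    per_snp = [
        sum(calls[i] is not None for calls in genotypes.values()) / len(genotypes)
        for i in range(n_snps)
    ]
    return per_sample, per_snp


example = {
    "s1": ["AA", "AB", None],
    "s2": ["AA", None, None],
    "s3": ["AB", "BB", "AA"],
}
sample_rates, snp_rates = call_rates(example)
```

Samples or SNPs whose rate falls below a project-specific threshold would then be flagged for review.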

9
Duplicate sample error rates
  • For each SNP
  • For pairs where both samples have data
  • Error rate = number of discrepant pairs / total pairs
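
A minimal sketch of the duplicate-pair calculation for a single SNP follows. The input format (a list of genotype pairs, None for a missing call) and the function name are assumptions made for illustration.

```python
def duplicate_discordance(pairs):
    """Discrepant pairs / total pairs, using only pairs where both calls exist.

    pairs: list of (genotype_1, genotype_2) for one SNP; None = missing.
    Returns None when no pair has both genotypes called.
    """
    called = [(a, b) for a, b in pairs if a is not None and b is not None]
    if not called:
        return None
    discrepant = sum(1 for a, b in called if a != b)
    return discrepant / len(called)


rate = duplicate_discordance(
    [("AA", "AA"), ("AB", "AA"), (None, "AB"), ("BB", "BB")]
)
```

In this example one of the three fully called pairs disagrees, so the rate is 1/3.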

10
Power to detect genotype errors by number of
duplicates and error rate
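
The figure for this slide is not in the transcript. As a rough illustration only, the chance of seeing at least one discrepancy among n duplicate pairs can be sketched under simple assumptions that are not taken from the slides: each genotype is wrong independently with probability e, and two erroneous calls never agree with each other.

```python
def power_to_detect(error_rate, n_pairs):
    """P(at least one discrepant duplicate pair) under the stated assumptions."""
    # A pair is concordant only if both genotypes are called correctly.
    pair_discrepancy = 1 - (1 - error_rate) ** 2
    return 1 - (1 - pair_discrepancy) ** n_pairs


power = power_to_detect(0.01, 100)
```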
11
Detection of Mendelian inheritance
inconsistencies using parent-child trios
[Figure: example parent-child trios with genotypes Aa and aa, labeled Consistent, Inconsistent, and Uninformative (consistent)]
12
Reasons for inconsistencies in Mendelian
inheritance in parent-child trios
  • Child's genotype is not consistent with the
    parents' genotypes
  • Possibilities:
  • Genotyping error
  • New mutation
  • Deletion
  • Sample contamination
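
The trio check described on these slides can be sketched as follows. Genotypes are written as two-character strings such as "Aa"; this encoding and the function name are assumptions for illustration. The check only says whether the child's genotype could arise from one allele of each parent; it cannot say which of the causes listed above is responsible for an inconsistency.

```python
from itertools import product


def mendelian_consistent(father, mother, child):
    """True if the child could inherit one allele from each parent."""
    possible = {
        "".join(sorted(a + b)) for a, b in product(father, mother)
    }
    return "".join(sorted(child)) in possible


consistent = mendelian_consistent("Aa", "aa", "Aa")
inconsistent = mendelian_consistent("aa", "aa", "Aa")
```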

13
Hardy-Weinberg equilibrium
  • In 1908 Hardy and Weinberg separately noticed
    that the allele frequency in a population could
    be used to calculate the expected genotype
    frequencies in randomly mating populations at
    equilibrium
  • If p = frequency of the A allele, the expected
    genotype frequencies are
  • AA: $p^2$
  • AB: $2p(1-p)$
  • BB: $(1-p)^2$

$\chi^2 = \sum (\text{obs count} - \text{exp count})^2 / \text{exp count}$, with 1 df
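
A minimal sketch of this test from genotype counts, using only the standard library. The function name is an assumption. The 1-df p-value uses the identity P(chi-square with 1 df > x) = erfc(sqrt(x/2)).

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium (1 df).

    Requires both alleles to be observed; otherwise an expected
    count is zero and the statistic is undefined.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the A allele
    expected = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = hwe_chi_square(30, 40, 30)
```

With counts 30/40/30 the expected counts are 25/50/25, giving a statistic of 4.0.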
14
Testing Hardy-Weinberg in samples with rarer
alleles
Exact test vs. $\chi^2$ test: the $\chi^2$ test is often very
anti-conservative when allele counts are small
Wigginton JE, Cutler DJ, Abecasis GR. A note on
exact tests of Hardy-Weinberg equilibrium. Am J
Hum Genet 2005;76:887-893.
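
An exact test can be sketched by enumerating every possible heterozygote count given the observed allele counts and summing the probabilities of outcomes no more likely than the one observed. This is a plain enumeration with exact fractions, written for clarity; it is not the optimized implementation from the cited paper, and the function names are assumptions.

```python
from fractions import Fraction
from math import factorial


def het_probability(n_ab, n_a, n):
    """P(n_ab heterozygotes | n individuals, n_a copies of allele A) under HWE."""
    n_b = 2 * n - n_a
    n_aa = (n_a - n_ab) // 2
    n_bb = (n_b - n_ab) // 2
    numerator = 2 ** n_ab * factorial(n) * factorial(n_a) * factorial(n_b)
    denominator = (
        factorial(n_aa) * factorial(n_ab) * factorial(n_bb) * factorial(2 * n)
    )
    return Fraction(numerator, denominator)


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Two-sided exact p-value: sum of probabilities <= the observed one."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n - n_a
    # Heterozygote counts must have the same parity as the allele count.
    probs = {
        h: het_probability(h, n_a, n)
        for h in range(n_a % 2, min(n_a, n_b) + 1, 2)
    }
    observed = probs[n_ab]
    return float(sum(pr for pr in probs.values() if pr <= observed))


p_balanced = hwe_exact_p(5, 10, 5)
p_no_hets = hwe_exact_p(10, 0, 10)
```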
15
Ability to detect Hardy-Weinberg deviations
Cox DG, Kraft P. Quantification of the Power of
Hardy-Weinberg Equilibrium Testing to Detect
Genotyping Error. Hum Hered 2006;61:10-14.
16
Power to detect Hardy-Weinberg deviations
17
Tight double recombinants
  • Need family data
  • Identification of genotypes that are very
    unlikely given the surrounding haplotype

Abecasis: http://www.sph.umich.edu/csg/abecasis/Merlin/tour/error.html
18
Sample switches
  • Samples switched in original tubes
  • Mistakes in sample handling when plates are built
  • Mis-orientation of sample plates during
    genotyping

19
Placement of samples in plates for genotyping to
detect sample switches
  • Mix placement of cases and controls
  • Make unique patterns of sex by row and column of
    individuals
  • Place duplicate samples on separate plates in
    different places
  • Position control samples so that genotyping
    quality can be assessed before the end of the project

20
Effect of genotyping error on case/control
association tests
  • Sample genotypes AA, AB, BB from a population in
    HWE
  • Allele frequencies $p$, $q$
  • Genotype frequencies $p^2$, $2pq$, $q^2$
  • The traditional estimate of allele frequency from
    observed genotype frequencies
    depends on the assumption of HWE
  • Estimates may be biased when HWE is violated

Karen Conneely and Michael Boehnke, unpublished
21
Bias in allele frequency estimates is small for most
levels of missing genotypes
  • Missing data from early stage Illumina
    genotyping
  • 5.1% of heterozygotes (6037/118959)
  • 0.7% of homozygotes (2312/335783)

Karen Conneely and Michael Boehnke, unpublished
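
To get a feel for the size of this bias, the expected allele frequency estimate can be computed directly when heterozygotes and homozygotes go missing at different rates. This is a sketch under assumptions that are not stated on the slide: the population is in HWE, and missingness depends only on whether the genotype is heterozygous or homozygous. The function name is hypothetical.

```python
def observed_allele_freq(p, miss_het, miss_hom):
    """Expected allele-counting estimate of p after differential genotype loss."""
    q = 1 - p
    hom_a = p * p * (1 - miss_hom)      # surviving AA genotypes
    het = 2 * p * q * (1 - miss_het)    # surviving AB genotypes
    hom_b = q * q * (1 - miss_hom)      # surviving BB genotypes
    return (hom_a + het / 2) / (hom_a + het + hom_b)


# Missing rates quoted on the slide: 5.1% of heterozygotes, 0.7% of homozygotes.
estimate = observed_allele_freq(0.2, 0.051, 0.007)
bias = estimate - 0.2
```

Under these assumptions the estimate is pulled slightly toward the more common allele, and the bias vanishes at p = 0.5 by symmetry.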
22
Asymptotic bias in association test due to loss
of genotypes
  • Assuming equal sample sizes, the t-statistic is
    estimated
  • When there is truly no difference between cases
    and controls, and no genotyping error or loss,
  • When genotypes are missing, the variance will
    be under- or overestimated.

Karen Conneely and Michael Boehnke, unpublished
23
Asymptotic bias in association test due to
genotyping error
  • The test will be similarly biased if there is
    systematic genotyping error
  • AB mistyped as AA → anticonservative test
  • AA mistyped as AB → conservative test
  • Exception: when AB → AA and AB → BB with equal
    frequency, the biases cancel out. In this case, the
    test is valid and no power is lost.

Karen Conneely and Michael Boehnke, unpublished
24
Robust tests for case/control association
  • Genotype frequency based tests robust to
    mistyping and missing data because no assumption
    of HWE
  • 2x3 $\chi^2$ test of equal genotype frequencies in
    cases and controls
  • Armitage's test for trend (Armitage, Biometrics
    11:375-86, 1955; Sasieni et al., Biometrics
    53:1253-61, 1997)
  • Equivalent to using logistic regression with
    score test
  • Type 1 error is correct
  • Power greater for Armitage's test than for 2x3
    test if model is additive on log odds scale
  • Small loss of power under systematic
    mistyping/loss

Karen Conneely and Michael Boehnke, unpublished
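
Both genotype-based tests can be sketched from a 2x3 table of counts. The code below is an illustration, not the presenters' implementation: the function names are assumptions, the trend test uses additive scores 0, 1, 2 and the N (rather than N-1) form of the variance, and p-values use closed forms for 1 df (erfc) and 2 df (exp) so that no external library is needed.

```python
import math


def genotype_chi_square(cases, controls):
    """2x3 chi-square test of equal genotype frequencies (2 df)."""
    total = sum(cases) + sum(controls)
    chi2 = 0.0
    for row in (cases, controls):
        for j, observed in enumerate(row):
            expected = sum(row) * (cases[j] + controls[j]) / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2, math.exp(-chi2 / 2)


def armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test (1 df) with the given genotype scores."""
    n = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    total = n_cases + n_controls
    sw_r = sum(w * r for w, r in zip(weights, cases))
    sw_n = sum(w * c for w, c in zip(weights, n))
    sw2_n = sum(w * w * c for w, c in zip(weights, n))
    numerator = total * (total * sw_r - n_cases * sw_n) ** 2
    denominator = n_cases * n_controls * (total * sw2_n - sw_n ** 2)
    chi2 = numerator / denominator
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Counts of genotypes with 0, 1, 2 copies of an allele (made-up numbers).
cases, controls = (10, 20, 30), (30, 20, 10)
geno_chi2, geno_p = genotype_chi_square(cases, controls)
trend_chi2, trend_p = armitage_trend(cases, controls)
```

For this additive-looking table both statistics equal 20, but the trend test spends only 1 df, so its p-value is smaller, which is the power advantage noted on the slide.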
25
Linkage analysis

26
Effect of a single mis-genotyped marker on
linkage
Yonan AL, Palmer AA, Gilliam TC. Hardy-Weinberg
disequilibrium identified genotyping error of the
serotonin transporter (SLC6A4) promoter
polymorphism. Psychiatr Genet 2006;16:31-34.
27
Effect of data cleaning on linkage
[Figure: linkage results precleaning vs. postcleaning; panels: BMI Tecumseh; BMI Tecumseh and Maywood]
Chang YP, Kim JD, Schwander K, et al. The impact
of data quality on the identification of complex
disease genes: experience from the Family Blood
Pressure Program. Eur J Hum Genet 2006;14:
469-477.
28
Hardy-Weinberg deviations caused by presence of
disease associated alleles
  • Deviations from Hardy-Weinberg (HWD) can be
    caused by disease association with marker
  • Hard to distinguish between disease related and
    genotyping error related HWD
  • Let A = non-risk allele, q = frequency of A,
    g = genotype frequency
  • Express HWD as $\Delta$, defined from $g_{AA}$ and $(1-q)^2$

Wittke-Thompson JK, Pluzhnikov A, Cox NJ.
Rational inferences about departures from
Hardy-Weinberg equilibrium. Am J Hum Genet 2005;
76:967-986.
29
Is HWD in cases and controls consistent with a
genetic disease model?
  • Parameterize $\Delta$ in terms of risk of disease for
    each genotype
  • Find best fitting additive, dominant, recessive,
    multiplicative or general model
  • Ask whether the expected counts from this model
    differ significantly from those observed
  • Assess using simulations

Wittke-Thompson JK, Pluzhnikov A, Cox NJ.
Rational inferences about departures from
Hardy-Weinberg equilibrium. Am J Hum Genet 2005;
76:967-986.
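
The idea that a disease model by itself induces HWD in cases and controls can be illustrated with a small calculation. Everything here is an assumption made for the sketch: the general population is in HWE, genotypes are ordered (AA, AB, BB) with B as the risk allele, and the deviation is measured as the AA frequency minus the squared A allele frequency within each group, which may not be the exact $\Delta$ used on the slides.

```python
def case_control_genotype_freqs(q, penetrances):
    """Genotype frequencies (AA, AB, BB) in cases and controls.

    q: population frequency of the non-risk allele A.
    penetrances: P(disease | AA), P(disease | AB), P(disease | BB).
    """
    p = 1 - q
    population = (q * q, 2 * p * q, p * p)
    prevalence = sum(g * f for g, f in zip(population, penetrances))
    cases = tuple(g * f / prevalence for g, f in zip(population, penetrances))
    controls = tuple(
        g * (1 - f) / (1 - prevalence) for g, f in zip(population, penetrances)
    )
    return cases, controls


def hw_deviation(freqs):
    """AA frequency minus its HWE expectation within one group."""
    allele_a = freqs[0] + freqs[1] / 2
    return freqs[0] - allele_a ** 2


mult_cases, mult_controls = case_control_genotype_freqs(0.7, (0.01, 0.02, 0.04))
rec_cases, rec_controls = case_control_genotype_freqs(0.7, (0.01, 0.01, 0.03))
```

Under the multiplicative model the cases stay in HWE (deviation of zero up to rounding), while the recessive model produces an excess of homozygotes among cases.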
30
HWD for cases and controls
$\Delta$ is defined from $g_{AA}$ and $(1-q)^2$; $\Delta$ > 1: excess of homozygotes; $\Delta$ < 1:
deficit of homozygotes
[Figure: HWD in cases and controls under dominant and recessive models]
31
HWD for cases and controls
[Figure: HWD in cases and controls under additive and multiplicative models]
32
HWD summary
  • Test 60 polymorphisms with HWE departures
  • Find 34 are consistent with tested biological
    models
  • Does not prove HWD is due to disease

33
Detection of deletions from data with multiple
null genotypes and Mendelian inconsistencies
  • HapMap data
  • Phase 1 with 1.3 M SNPs
  • 269 individuals
  • Assess similarity of patterns of:
  • Mendelian inconsistencies
  • Null genotypes
  • Calculate the binomial probability of observing
    each pattern n times in m markers relative to
    background rates
  • Use observed heterozygotes / expected
    heterozygotes < 0.4 or 0.7 to help confirm

McCarroll SA, Hadnott TN, Perry GH, et al. Common
deletion polymorphisms in the human genome. Nat
Genet 2006;38:86-92.
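
The binomial step above can be sketched as an upper-tail probability: how likely is it to see a pattern at least n times in m consecutive markers if each marker shows it independently at the background rate? The independence assumption, the use of the upper tail, and the function name are choices made for this sketch, not details taken from the cited paper.

```python
from math import comb


def binomial_tail(n, m, rate):
    """P(X >= n) for X ~ Binomial(m, rate)."""
    return sum(
        comb(m, k) * rate ** k * (1 - rate) ** (m - k) for k in range(n, m + 1)
    )


# Example: 4 of 10 neighboring markers show the pattern; background rate 1%.
p_cluster = binomial_tail(4, 10, 0.01)
```

A very small probability suggests the cluster is unlikely to be background noise and is worth following up as a candidate deletion.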
34
Detection of deletions

McCarroll SA, Hadnott TN, Perry GH, et al. Common
deletion polymorphisms in the human genome. Nat
Genet 2006;38:86-92.
35
SNPs with deviations from expected tests are
found in close proximity to each other
McCarroll SA, Hadnott TN, Perry GH, et al. Common
deletion polymorphisms in the human genome. Nat
Genet 2006;38:86-92.
36
See clustered patterns of deviations from
expected test results
Size distribution of putative deletions
McCarroll SA, Hadnott TN, Perry GH, et al. Common
deletion polymorphisms in the human genome. Nat
Genet 2006;38:86-92.
37
Detection of deletions using Illumina
McCarroll SA, Hadnott TN, Perry GH, et al. Common
deletion polymorphisms in the human genome. Nat
Genet 2006;38:86-92.
38
Deletions summary
  • Identified 541 potential deletions
  • Confirmed majority of deletions that were tested
    by more stringent methods
  • Deletions often in high r2 with surrounding SNPs
  • Find deletions of a few genes: olfactory
    receptors, drug metabolism, sex steroid hormones

McCarroll SA, Hadnott TN, Perry GH, et al. Common
deletion polymorphisms in the human genome. Nat
Genet 2006;38:86-92.
39
Summary
  • Genotype errors can arise in many different ways
  • Multiple methods exist for determining genotype
    data quality
  • Poor quality genotyping data can strongly affect
    some analyses
  • Some poor quality SNPs may have underlying
    biological basis

40
More references
  • Review of sources of genotyping errors
  • Pompanon F, Bonin A, Bellemain E, Taberlet P.
    Genotyping errors: causes, consequences and
    solutions. Nat Rev Genet 2005;6:847-859.
  • Error checking programs
  • Douglas JA, Boehnke M, Lange K. A multipoint
    method for detecting genotyping errors and
    mutations in sibling-pair linkage data. Am J Hum
    Genet 2000;66:1287-1297.
  • Wigginton JE, Abecasis GR. PEDSTATS: descriptive
    statistics, graphics and quality assessment for
    gene mapping data. Bioinformatics 2005;21:
    3445-3447.
  • O'Connell JR, Weeks DE. PedCheck: a program for
    identification of genotype incompatibilities in
    linkage analysis. Am J Hum Genet 1998;63:
    259-266.