Human Genetics - PowerPoint PPT Presentation

1
Human Genetics
  • Genetic Epidemiology

2
  • Family trees can have a lot of nuts

3
Genetic Epidemiology - Aims
  1. Gene detection
  2. Gene characterization

mode of inheritance, allele frequencies →
prevalence, attributable risk
4
Genetic Epidemiology - Methods
  • Aggregation
  • Segregation
  • Co-segregation
  • Association

5
Segregation
determined by a dominant or recessive allele
  • Can the dichotomy or trichotomy be explained by
    Mendelian segregation?

6
Likelihood (parameter(s) | data) ∝
Probability (data | parameter(s))
7
Transmission Probabilities
Value if there is Mendelian segregation: 1, ½, 0
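The three values can be read as the probability that a parent transmits a given allele, depending on whether the parent carries two, one or no copies of it. A minimal sketch under that reading follows; the genotype labels AA, AB and BB and the function name are assumptions made for illustration, since the slide's column labels did not survive.

```python
from fractions import Fraction

# Probability that a parent transmits allele A, by parental genotype,
# when segregation is Mendelian (the values 1, 1/2, 0 on the slide).
# The genotype labels are illustrative assumptions.
MENDELIAN_TRANSMISSION = {
    "AA": Fraction(1),
    "AB": Fraction(1, 2),
    "BB": Fraction(0),
}


def offspring_genotype_probs(father, mother, tau=MENDELIAN_TRANSMISSION):
    """Offspring genotype distribution given the two parental genotypes."""
    tf, tm = tau[father], tau[mother]
    return {
        "AA": tf * tm,
        "AB": tf * (1 - tm) + (1 - tf) * tm,
        "BB": (1 - tf) * (1 - tm),
    }


print(offspring_genotype_probs("AB", "AB"))
```

Passing a different `tau` table is how a non-Mendelian alternative could be expressed in the same framework.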
8
Ascertainment
  • We examine segregating sibships
  • The proportion of sibs affected is larger than
    expected on the basis of Mendelian inheritance
  • The likelihood must be conditional on the mode
    of ascertainment
  • We need to know the proband sampling frame
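
The inflation described above can be illustrated with one simple ascertainment scheme. The sketch below assumes that every sibship with at least one affected sib is sampled (a truncated binomial); this scheme and the function name are illustrative assumptions, not necessarily the sampling frame the slides have in mind.

```python
def expected_affected_proportion(p, sibship_size):
    """Expected proportion of affected sibs among sibships that contain
    at least one affected sib, when each sib is affected independently
    with segregation probability p (truncated binomial)."""
    prob_at_least_one = 1.0 - (1.0 - p) ** sibship_size
    return p / prob_at_least_one


# Recessive trait, both parents carriers: p = 1/4.
for size in (2, 3, 4):
    print(size, round(expected_affected_proportion(0.25, size), 3))
```

For sibships of size 2 the expected proportion is 4/7 rather than 1/4, which is why a likelihood that ignores how the families were found would overestimate the segregation probability.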

9
Cosegregation
  • Chromosome segments are transmitted
  • Cosegregation is caused by linked loci

10
Methods of Linkage Analysis
  • Trait model-based - assume a genetic model
    underlying the trait
  • Trait model-free - no assumptions about the
    genetic model underlying the trait
  • Ascertainment is often not an issue for locus
    detection by linkage analysis

11
Model-based Linkage Analysis
  • If founder marker genotypes are known or can
    be inferred exactly,
  • → no increase in Type 1 error
  • → smallest Type 2 error when the model
    is correct
  • If founder marker genotypes are unknown, we can
  • 1) estimate them
  • 2) use a database
  • All parameters other than the recombination
    fraction are assumed known

12
(No Transcript)
13
Model-free Linkage Analysis: Identity-in-state
versus Identity-by-descent
  • Two alleles are identical by descent if they are
    copies of the same parental allele

14
Sib pairs share
  • 0, 1 or 2 alleles identical by descent at a
    marker locus
  • 0, 1 or 2 alleles identical by descent at a
    trait locus

Linkage
The average proportion shared at any particular
locus is 1/2
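The sharing probabilities behind this average can be checked by enumeration. A small sketch, assuming each sib independently receives one of the two copies from each parent with equal probability (the function name is hypothetical):

```python
from fractions import Fraction
from itertools import product


def sib_pair_ibd_distribution():
    """Distribution of the number of alleles (0, 1 or 2) that two sibs
    share identical by descent at one locus."""
    counts = {0: 0, 1: 0, 2: 0}
    # Each sib receives copy 0 or 1 from the father and copy 0 or 1
    # from the mother; all 16 combinations are equally likely.
    for pat1, mat1, pat2, mat2 in product((0, 1), repeat=4):
        shared = int(pat1 == pat2) + int(mat1 == mat2)
        counts[shared] += 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in counts.items()}


dist = sib_pair_ibd_distribution()
mean_proportion = sum(Fraction(k, 2) * prob for k, prob in dist.items())
print(dist, mean_proportion)
```

The enumeration gives probabilities 1/4, 1/2 and 1/4 for sharing 0, 1 and 2 alleles, so the mean proportion shared is 1/2.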
15
Relative Pair Model-Free Linkage Analysis
  • We correlate relative-pair similarity
    (dissimilarity) for the trait of interest with
    relative-pair similarity (dissimilarity) for a
    marker
  • Linkage between a trait locus and a marker
    locus
  • → positive correlation
  • Affected relative pair analysis: Do affected
    relative pairs share more
    marker alleles than expected if there is no
    linkage?
  • No controls!

16
Association
  • Causes of association between a marker and a
    disease
  • chance
  • stratification, population heterogeneity
  • very close linkage
  • pleiotropy

17
Causes of Allelic Association
Heterogeneity/stratification
The best solution to avoid this confounding is to
study only ethnically homogeneous populations
18
(Tight) Linkage
This chromosome is passed down through the
generations, and now there are many copies. If
the distance between D and A1 is small,
recombinations are unlikely, so most D
chromosomes carry A1. This is the type of allelic
association we are interested in.
19
Guarding Against Stratification
  • Three solutions
  • use a homogeneous population
  • use family-based controls
  • use genomic control

20
Matching on Ethnicity
  • Close relatives are the best controls, but can
    lead to overmatching
  • Cases and control family members must have the
    same family history of disease

21
Transmission Disequilibrium Test (TDT)
  • A design that uses pseudosibs as controls
  • Cases and their parents are typed for markers

Transmitted genotype is A1A2. Untransmitted
genotype is A2A2.
Father transmits A1, does not transmit A2.
Mother transmits A2, does not transmit
A2 (uninformative in terms of alleles).
22
  • Build up a 2 x 2 table
  • The counts a and d come from homozygous
    parents
  • The counts b and c come from heterozygous
    parents
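
The slides do not spell out the test statistic. A minimal sketch of the usual McNemar-type TDT statistic follows; it uses only the off-diagonal counts b and c and refers the result to a chi-square distribution with 1 d.f. The function names are illustrative.

```python
import math


def tdt_statistic(b, c):
    """McNemar-type TDT statistic from the off-diagonal counts b and c,
    i.e. the transmissions from heterozygous parents."""
    if b + c == 0:
        raise ValueError("no heterozygous parents: the test is uninformative")
    return (b - c) ** 2 / (b + c)


def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))


stat = tdt_statistic(30, 10)
print(stat, chi2_1df_pvalue(stat))
```

The counts a and d never enter the statistic, which matches the remark that a homozygous parent is uninformative.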

23
Genomic Control
  • Calculate an association statistic for
    a candidate locus
  • Calculate the same association statistic, from
    the same sample, for a set of unlinked loci
  • Determine significance by reference to the
    results for the unlinked loci
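
One common way to carry out the last step is to estimate a variance inflation factor from the unlinked loci and divide the candidate statistic by it. The sketch below assumes 1 d.f. chi-square statistics and a median-based estimate; the slides do not say which estimator was used, and the function names and example values are hypothetical.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 d.f. (approximately 0.4549).
CHI2_1DF_MEDIAN = 0.4549


def inflation_factor(null_stats):
    """Median-based inflation factor from statistics at unlinked loci,
    floored at 1 so that the correction never inflates the test."""
    return max(1.0, median(null_stats) / CHI2_1DF_MEDIAN)


def genomic_control_pvalue(candidate_stat, null_stats):
    """Adjusted statistic and its chi-square (1 d.f.) p-value."""
    adjusted = candidate_stat / inflation_factor(null_stats)
    return adjusted, math.erfc(math.sqrt(adjusted / 2.0))


# Hypothetical statistics at five unlinked loci and one candidate locus.
print(genomic_control_pvalue(10.0, [0.2, 0.7, 0.9098, 1.5, 3.0]))
```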

24
Linkage Between a Marker and a Disease
  • Intrafamilial association
  • Typically no population association
  • Not affected by population stratification
  • Population association if very close

25
Association versus Linkage
Association                               Linkage
Association at the population level       Intrafamilial association
Pinpoints alleles                         Pinpoints loci
More powerful                             Less powerful
More tests required                       Fewer tests required
More sensitive to mistyping               Less sensitive to mistyping
Sensitive to population stratification    Not sensitive to population stratification
  • Which is better?

26
What is the Best Design and Analysis?
  • If heterogeneity / stratification is a
    non-issue,
  • unrelated cases and controls for association
    analysis
  • (genome scan?)

large extended pedigrees, type all (founders and
non-founders) for 200-400 equi-spaced markers,
for linkage analysis
Note: cost, burden of multiple testing
A wise investigator, like a wise investor, would
hedge bets with a judicious mix
27
Case-Control Data
28
  • Consider the probability structure
  • The Cochran-Armitage trend test tests the null
    hypothesis
  • p2 + ½p1 = q2 + ½q1
  • without assuming the two alleles a person has
    are independent

Sasieni (1997) Biometrics 53: 1253-1261
29
asymptotically has a χ² distribution with 1 d.f.
30
Cochran-Armitage Trend Test
  • Does not assume independence of alleles within a
    person
  • Does assume independence of genotypes from
    person to person
  • Is not valid if there is population
    stratification
  • The increased variance due to stratification
    can be estimated from a random set of markers
    that are independent of the disease
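
For reference, a minimal sketch of the trend statistic with genotype scores 0, 1 and 2 is given below. It uses the standard form of the statistic for a 2 x 3 table of genotype counts; the function name and the example counts are hypothetical, and no stratification correction is applied.

```python
import math


def cochran_armitage_trend(cases, controls):
    """Trend statistic and chi-square (1 d.f.) p-value.

    cases and controls are genotype counts ordered by the number of
    copies (0, 1, 2) of the allele of interest.
    """
    r0, r1, r2 = cases
    s0, s1, s2 = controls
    R, S = sum(cases), sum(controls)
    N = R + S
    n1, n2 = r1 + s1, r2 + s2
    numerator = N * (N * (r1 + 2 * r2) - R * (n1 + 2 * n2)) ** 2
    denominator = R * S * (N * (n1 + 4 * n2) - (n1 + 2 * n2) ** 2)
    if denominator == 0:
        raise ValueError("no genotype variation in the combined sample")
    stat = numerator / denominator
    return stat, math.erfc(math.sqrt(stat / 2.0))


print(cochran_armitage_trend((20, 30, 50), (50, 30, 20)))
```

Because the statistic is built from genotype counts rather than allele counts, it does not rely on the two alleles within a person being independent.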

31
Case-only Studies
  • Suggested as
  • more powerful (only cases needed)
  • more precise (signal decreases faster with
    distance from the causative locus)

32
Case-only Studies
  • No power in the case of a multiplicative model
  • No controls
  • there must be a difference in HWD between
    cases and controls
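
The first point can be checked numerically. The sketch below computes the Hardy-Weinberg disequilibrium (HWD) coefficient among cases, assuming the general population is in Hardy-Weinberg equilibrium. The penetrances are those of the multiplicative and first recessive models simulated later in the deck; the allele frequency 0.3 and the function name are assumptions made for illustration.

```python
def hwd_in_cases(p, penetrances):
    """HWD coefficient among cases: P(A1A1 | case) - P(A1 | case)^2.

    p is the population frequency of allele A1; penetrances are the
    probabilities of being affected given A1A1, A1A and AA.
    """
    q = 1.0 - p
    f2, f1, f0 = penetrances
    joint = (p * p * f2, 2 * p * q * f1, q * q * f0)
    prevalence = sum(joint)
    g2, g1, _g0 = (x / prevalence for x in joint)
    allele_freq = g2 + 0.5 * g1
    return g2 - allele_freq ** 2


print(hwd_in_cases(0.3, (0.81, 0.045, 0.0025)))  # multiplicative model
print(hwd_in_cases(0.3, (1.00, 0.10, 0.10)))     # recessive model
```

Under the multiplicative model the heterozygote penetrance squared equals the product of the two homozygote penetrances, so cases remain in Hardy-Weinberg proportions and a cases-only HWD test has nothing to detect.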

33
(No Transcript)
34
Weighted average of the Cochran-Armitage trend
test and the HWD trend test statistics
35
  • To investigate the null distribution of this
    average we simulate many different situations -
    sample sizes up to 10,000 cases and 10,000
    controls - and generate
  • For all situations considered, the distribution
    is well approximated by a Gamma distribution

36
  • As the sample size and marker allele frequency
    increase, the largest mean and the smallest
    variance occur for 10,000 cases and 10,000
    controls, and for a marker allele frequency 0.5
  • For 10,000 cases and 10,000 controls, and
    marker allele frequency 0.5, the upper tail of
    the distribution is well approximated by a Gamma
    distribution with mean µ = 1.78 and variance σ² =
    3.45
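
A Gamma distribution specified by its mean and variance can be converted to the usual shape and scale parameters by moment matching. A small sketch, with a hypothetical function name:

```python
def gamma_shape_scale(mean, variance):
    """Shape k and scale theta of a Gamma distribution with the given
    mean (k * theta) and variance (k * theta^2)."""
    shape = mean ** 2 / variance
    scale = variance / mean
    return shape, scale


shape, scale = gamma_shape_scale(1.78, 3.45)
print(round(shape, 3), round(scale, 3))
```

For the quoted mean 1.78 and variance 3.45 this gives a shape slightly below 1 and a scale of roughly 1.9.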

37
  • We develop a prediction equation to determine
    percentiles of the null distribution for smaller
    sample sizes and marker allele frequencies
  • We base goodness of fit on the root mean squared
    error (RMSE) of logₑ α, calculated for various
    sample size combinations, from the variance among
    50 replicate samples

38
  • With 90% confidence, the true logₑ α lies in
    the interval logₑ α ± 1.645(RMSE), i.e., α is
    within e^(1.645·RMSE)-fold of the true α
  • For total sample size (R + S) of 200 or larger
    and α of 0.0001 or larger, in the very worst case
    (R = S = 100, α = 0.0001) with 90% confidence α
    could differ from the true α by a factor of at
    most 4.8
  • The average RMSE is 0.35, corresponding to
    being between 78% and 122% of the true α with
    90% confidence
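
The step from an RMSE on the log scale to a multiplicative factor is plain arithmetic. The sketch below assumes the normal-theory multiplier 1.645 for a two-sided 90% interval; the names `fold_factor`, `interval_on_original_scale` and `estimate` are hypothetical.

```python
import math

Z_90 = 1.645  # normal quantile for a two-sided 90% interval


def fold_factor(rmse, z=Z_90):
    """Multiplicative factor exp(z * RMSE) implied by an RMSE on the
    natural-log scale."""
    return math.exp(z * rmse)


def interval_on_original_scale(estimate, rmse, z=Z_90):
    """Back-transform log(estimate) +/- z * RMSE to the original scale."""
    factor = fold_factor(rmse, z)
    return estimate / factor, estimate * factor


print(round(fold_factor(0.35), 2))
print(interval_on_original_scale(0.0001, 0.35))
```

With this formula an RMSE of 0.35 corresponds to a factor of about 1.78 in either direction.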

39
POWER: Genetic Models Simulated
                    Probability of being affected given
Model               A1A1    A1A     AA
1 Recessive 1       1.00    0.10    0.10
2 Recessive 2       1.00    0.05    0.05
3 Additive          1.00    0.50    0.00
4 Multiplicative    0.81    0.045   0.0025
  • Each simulated population contains 500,000
    individuals allowed to randomly mate for 50
    generations after the appearance of a disease
    mutation
  • Marker loci placed at distances of 0 to 6 cM from
    the disease susceptibility locus
  • For type I error, no association between the
    disease and marker loci

40
Tests Performed
  • Homogeneous populations
  • HWD, cases only
  • Allele test
  • Allele test x HWD in cases
  • HWD trend test
  • Cochran-Armitage trend test
  • Cochran-Armitage trend test x HWD trend test
  • Weighted average
  • Population stratification
  • Cochran-Armitage trend test with genomic control
  • Product of this and the HWD trend test
  • Weighted average with genomic control

41
Type I error, homogeneous population
Tests shown: HWD test, cases only; product of the
allele test and HWD test
42
Type I error, population stratification
Tests shown: allele test; Cochran-Armitage trend
test; product of the allele test and HWD test;
weighted average test; product of the
Cochran-Armitage trend test and the HWD test
43
Power, homogeneous population
weighted average test
44
Power, population stratification
Tests shown: HWD trend test; CA test with genomic
control; weighted average with genomic control
45
Conclusions
  • Under recessive inheritance, the weighted
    average has better performance than either the
    Cochran-Armitage trend test or the HWD trend test
  • Has good performance for other models as well
  • The product of the Cochran-Armitage trend test
    statistic and the HWD test statistic (cases only)
    has better power, but has inflated Type I error
    if there is population stratification
  • The weighted average has good overall
    properties and automatically controls for marker
    mistyping

46
  • With acknowledgment to
  • Kijoung Song

47
  • Can we use evolutionary models, when we have
    large amounts of genetic data on a sample of
    cases and controls, to obtain a more powerful way
    of detecting loci involved in the etiology of
    disease?
  • Will these models bear fruit or nuts?