Title: Human Genetics
- Family trees can have a lot of nuts
Genetic Epidemiology - Aims
- Gene detection
- Gene characterization
  - mode of inheritance, allele frequencies
  - prevalence, attributable risk
Genetic Epidemiology - Methods
- Aggregation
- Segregation
- Co-segregation
- Association
Segregation
- Is the trait determined by a dominant or recessive allele?
- Can the dichotomy or trichotomy be explained by Mendelian segregation?
$\text{Likelihood}(\text{parameter(s)} \mid \text{data}) \propto \text{Probability}(\text{data} \mid \text{parameter(s)})$
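Transmission Probabilities
- Values if there is Mendelian segregation: $\tau_{AA} = 1$, $\tau_{Aa} = \tfrac{1}{2}$, $\tau_{aa} = 0$, where $\tau_g$ is the probability that a parent of genotype $g$ transmits allele $A$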
Ascertainment
- We examine segregating sibships
- The proportion of sibs affected is larger than expected on the basis of Mendelian inheritance, because sibships are ascertained through affected members
- The likelihood must be conditional on the mode of ascertainment
- We need to know the proband sampling frame
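A minimal sketch of why the likelihood must be conditional on ascertainment. It assumes complete ascertainment (every sibship with at least one affected sib is sampled) and a single segregation ratio p; the function names and the sibship counts are hypothetical, chosen only for illustration. The naive proportion of affected sibs overestimates p, while dividing each binomial probability by P(at least one affected) recovers it.

```python
import math

def truncated_binomial_loglik(p, sibships):
    """Log-likelihood of segregation ratio p under (assumed) complete ascertainment.

    sibships: list of (s, r, count) = sibship size, number affected, number of
    such sibships. Each binomial probability is divided by P(r >= 1) = 1 - (1 - p)**s.
    """
    ll = 0.0
    for s, r, count in sibships:
        log_binom = (math.log(math.comb(s, r)) + r * math.log(p)
                     + (s - r) * math.log(1.0 - p))
        ll += count * (log_binom - math.log(1.0 - (1.0 - p) ** s))
    return ll

def naive_estimate(sibships):
    """Proportion of sibs affected, ignoring how the sibships were sampled."""
    affected = sum(r * c for s, r, c in sibships)
    total = sum(s * c for s, r, c in sibships)
    return affected / total

def ascertainment_corrected_mle(sibships, step=0.0005):
    """Grid-search maximum likelihood estimate of p on (0, 1)."""
    grid = [i * step for i in range(1, int(round(1 / step)))]
    return max(grid, key=lambda p: truncated_binomial_loglik(p, sibships))

# Hypothetical data: 60 two-sib families with one affected, 10 with both affected.
data = [(2, 1, 60), (2, 2, 10)]
print(round(naive_estimate(data), 3))               # 0.571
print(round(ascertainment_corrected_mle(data), 3))  # 0.25
```

Here the corrected estimate is the Mendelian recessive value of 1/4, whereas the naive estimate is more than twice as large.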
Cosegregation
- Chromosome segments are transmitted
- Cosegregation is caused by linked loci
Methods of Linkage Analysis
- Trait model-based: assume a genetic model underlying the trait
- Trait model-free: no assumptions about the genetic model underlying the trait
- Ascertainment is often not an issue for locus detection by linkage analysis
Model-based Linkage Analysis
- If founder marker genotypes are known or can be inferred exactly:
  - no increase in Type I error
  - smallest Type II error when the model is correct
- If founder marker genotypes are unknown, we can
  - 1) estimate them
  - 2) use a database
- All parameters other than the recombination fraction are assumed known
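A minimal sketch of the model-based LOD score in the simplest setting described above: founder genotypes and phase are known, so each informative meiosis can be scored as recombinant or not, and the recombination fraction θ is the only unknown parameter. The function names and counts are hypothetical.

```python
import math

def lod(theta, recombinants, meioses):
    """LOD score log10[L(theta) / L(1/2)] for phase-known, fully informative meioses."""
    nonrec = meioses - recombinants
    log_l = 0.0
    if recombinants:
        log_l += recombinants * math.log10(theta)
    if nonrec:
        log_l += nonrec * math.log10(1.0 - theta)
    return log_l - meioses * math.log10(0.5)

def max_lod(recombinants, meioses):
    """Maximum likelihood estimate of theta, restricted to [0, 1/2], and its LOD."""
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod(theta_hat, recombinants, meioses)

print(max_lod(0, 10))  # theta_hat = 0.0, LOD about 3.01
print(max_lod(2, 20))
```

Ten non-recombinant meioses give a maximum LOD of about 3, the classical threshold for declaring linkage.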
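Model-free Linkage Analysis: Identity-in-state versus Identity-by-descent
- Two alleles are identical in state if they are of the same type, whether or not they are copies of the same ancestral allele
- Two alleles are identical by descent if they are copies of the same parental allele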
Sib pairs share
- 0, 1 or 2 alleles identical by descent (IBD) at a marker locus
- 0, 1 or 2 alleles IBD at a trait locus
- Linkage: IBD sharing at the marker locus tracks IBD sharing at the trait locus
- The average proportion shared at any particular locus is 1/2
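A small simulation can illustrate where the 1/2 comes from. It assumes non-inbred parents whose four alleles are all distinct by descent, plus Mendelian segregation at a single autosomal locus. Under these assumptions full sibs share 0, 1 or 2 alleles IBD with probabilities 1/4, 1/2, 1/4, so the mean proportion shared is 1/2. The function names are illustrative only.

```python
import random

def sib_ibd_sharing(rng):
    """Number of alleles (0, 1 or 2) two full sibs share IBD at one autosomal locus."""
    # Label the four parental alleles: father (0, 1), mother (2, 3).
    sib1 = (rng.choice((0, 1)), rng.choice((2, 3)))
    sib2 = (rng.choice((0, 1)), rng.choice((2, 3)))
    return (sib1[0] == sib2[0]) + (sib1[1] == sib2[1])

def sharing_distribution(n_pairs=100_000, seed=1):
    """Simulated frequencies of sharing 0, 1, 2 alleles IBD and the mean proportion shared."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(n_pairs):
        counts[sib_ibd_sharing(rng)] += 1
    freqs = [c / n_pairs for c in counts]
    mean_proportion = (freqs[1] * 1 + freqs[2] * 2) / 2
    return freqs, mean_proportion

freqs, mean_prop = sharing_distribution()
print(freqs)      # close to [0.25, 0.5, 0.25]
print(mean_prop)  # close to 0.5
```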
Relative Pair Model-Free Linkage Analysis
- We correlate relative-pair similarity (dissimilarity) for the trait of interest with relative-pair similarity (dissimilarity) for a marker
- Linkage between a trait locus and a marker locus implies a positive correlation
- Affected relative pair analysis: do affected relative pairs share more marker alleles than expected if there is no linkage?
  - No controls!
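One standard way to ask the affected-relative-pair question for sib pairs is the "mean test". The sketch below is an illustration, not necessarily the analysis used in these slides. Under no linkage, the proportion of alleles a sib pair shares IBD takes the values 0, 1/2, 1 with probabilities 1/4, 1/2, 1/4, giving mean 1/2 and variance 1/8. Excess mean sharing among affected pairs is then tested with a one-sided normal approximation. The counts are hypothetical.

```python
import math

def asp_mean_test(n0, n1, n2):
    """One-sided mean test for affected sib pairs.

    n0, n1, n2: numbers of affected sib pairs sharing 0, 1, 2 marker alleles IBD.
    Under no linkage the proportion shared per pair has mean 1/2 and variance 1/8,
    so z is approximately standard normal when there are many pairs.
    """
    n = n0 + n1 + n2
    mean_share = (0.5 * n1 + 1.0 * n2) / n
    z = (mean_share - 0.5) / math.sqrt(1.0 / (8 * n))
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of N(0, 1)
    return mean_share, z, p_one_sided

print(asp_mean_test(25, 50, 25))  # sharing exactly as expected: z = 0
print(asp_mean_test(15, 50, 35))  # excess sharing suggests linkage
```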
Association
- Causes of association between a marker and a
disease
- chance
- stratification, population heterogeneity
- very close linkage
- pleiotropy
Causes of Allelic Association: Heterogeneity/Stratification
- The best solution to avoid this confounding is to study only ethnically homogeneous populations
(Tight) Linkage
The chromosome on which the disease allele D first arose, carrying marker allele A1, is passed down through the generations, and now there are many copies. If the distance between D and A1 is small, recombinations are unlikely, so most D chromosomes carry A1. This is the type of allelic association we are interested in.
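The argument above can be made quantitative under simplifying assumptions. These are labeled here as assumptions: a single mutation event on an A1 chromosome, random mating, no selection, and a constant population frequency of A1. A D chromosome keeps its ancestral A1 with probability (1 - θ)^t after t generations. Once recombination has occurred, it carries A1 with the population frequency. The function name and the numbers are illustrative.

```python
def fraction_d_on_a1(theta, generations, p_a1):
    """Expected fraction of disease (D) chromosomes still carrying marker allele A1.

    theta: recombination fraction between D and the marker.
    p_a1: population frequency of A1 (assumed constant over time).
    The ancestral haplotype survives intact with probability (1 - theta)**generations;
    otherwise the D chromosome carries A1 with probability p_a1.
    """
    intact = (1.0 - theta) ** generations
    return intact + (1.0 - intact) * p_a1

# After 50 generations, with A1 at frequency 0.2 in the population:
for theta in (0.001, 0.01, 0.05):
    print(theta, round(fraction_d_on_a1(theta, 50, 0.2), 3))
```

For small θ, most D chromosomes still carry A1 after 50 generations. For larger θ, the association has largely decayed toward the population frequency of A1.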
Guarding Against Stratification
- use a homogeneous population
- use family-based controls
- use genomic control
Matching on Ethnicity
- Close relatives are the best controls, but can lead to overmatching
- Cases and control family members must have the same family history of disease
Transmission Disequilibrium Test (TDT)
- A design that uses pseudosibs as controls
- Cases and their parents are typed for markers
- Example: father A1A2, mother A2A2, affected child A1A2
  - Transmitted genotype is A1A2; untransmitted genotype is A2A2
  - Father transmits A1, does not transmit A2
  - Mother transmits A2, does not transmit A2 (uninformative in terms of alleles)
- In the 2 × 2 table of transmitted versus untransmitted alleles, the counts a and d come from homozygous parents
- The counts b and c come from heterozygous parents, so only b and c are informative
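A minimal sketch of the TDT for a biallelic marker, assuming alleles labeled "A1" and "A2". Each parent contributes one (transmitted, untransmitted) allele pair. The statistic uses only the heterozygous-parent cells b and c, as (b - c)^2 / (b + c), and is approximately χ² with 1 d.f. under the null hypothesis. The function names are hypothetical, and b + c must be positive.

```python
def tdt_table(parent_transmissions):
    """Build the 2 x 2 TDT table (a, b, c, d) from (transmitted, untransmitted) pairs."""
    a = b = c = d = 0
    for transmitted, untransmitted in parent_transmissions:
        if transmitted == "A1" and untransmitted == "A1":
            a += 1  # homozygous A1A1 parent
        elif transmitted == "A1":
            b += 1  # heterozygous parent transmitted A1
        elif untransmitted == "A1":
            c += 1  # heterozygous parent transmitted A2
        else:
            d += 1  # homozygous A2A2 parent
    return a, b, c, d

def tdt_statistic(b, c):
    """McNemar-type TDT statistic (b - c)^2 / (b + c); about chi-square, 1 d.f. under H0."""
    return (b - c) ** 2 / (b + c)

# The trio above: father transmits A1 (not A2), mother transmits A2 (not A2).
print(tdt_table([("A1", "A2"), ("A2", "A2")]))  # (0, 1, 0, 1)
print(tdt_statistic(30, 10))                    # 10.0
```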
Genomic Control
- Calculate an association statistic for a candidate locus
- Calculate the same association statistic, from the same sample, for a set of unlinked loci
- Determine significance by reference to the results for the unlinked loci
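The last step can be sketched as an empirical p-value: the rank of the candidate-locus statistic among the statistics for the unlinked loci, all computed from the same sample. This is one simple reading of "by reference to the results for the unlinked loci" and is an assumption. The add-one form below keeps the p-value away from zero, and the function name is hypothetical.

```python
def empirical_p_value(candidate_stat, null_stats):
    """Proportion of unlinked-locus statistics at least as large as the candidate's.

    p = (1 + #{null >= observed}) / (1 + number of unlinked loci).
    """
    exceed = sum(1 for s in null_stats if s >= candidate_stat)
    return (1 + exceed) / (1 + len(null_stats))

print(empirical_p_value(5.0, [1.0, 2.0, 3.0, 6.0]))  # 0.4
```

The smallest attainable p-value is 1 / (1 + number of unlinked loci), so many unlinked markers are needed to declare small p-values this way.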
Linkage Between a Marker and a Disease
- Intrafamilial association
- Typically no population association
- Not affected by population stratification
- Population association if very close
Association versus Linkage
Association:
- Association at the population level
- Pinpoints loci
- More sensitive to mistyping
- Sensitive to population stratification
Linkage:
- Intrafamilial association
- Less powerful
- Fewer tests required
- Less sensitive to mistyping
- Not sensitive to population stratification
What is the Best Design and Analysis?
- If heterogeneity/stratification is a non-issue: unrelated cases and controls for association analysis (genome scan?)
- Large extended pedigrees, typing all members (founders and non-founders) for 200-400 equally spaced markers, for linkage analysis
- Note the cost and the burden of multiple testing
- A wise investigator, like a wise investor, would hedge bets with a judicious mix
Case-Control Data
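- Consider the probability structure: $p_i$ and $q_i$ are the probabilities that a case and a control, respectively, carry $i$ copies of the allele ($i = 0, 1, 2$)
- Cochran-Armitage trend test of the null hypothesis $p_2 + \tfrac{1}{2}p_1 = q_2 + \tfrac{1}{2}q_1$
  - without assuming the two alleles a person has are independent
- Sasieni (1997) Biometrics 53:1253-1261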
- The test statistic asymptotically has a $\chi^2$ distribution with 1 d.f.
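A minimal sketch of the Cochran-Armitage trend test for a 2 × 3 table of genotype counts. It assumes additive scores (0, 1, 2) for the number of copies of A1; rescaling the scores, for example to (0, ½, 1), leaves the statistic unchanged. The variance is computed from the pooled genotype counts rather than from allele frequencies, which is why Hardy-Weinberg proportions are not needed. The p-value uses the 1-d.f. χ² tail, P(χ²₁ > x) = erfc(√(x/2)). Function and variable names are illustrative.

```python
import math

def cochran_armitage(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    cases, controls: genotype counts ordered by number of A1 alleles (0, 1, 2).
    Returns the chi-square statistic (1 d.f.) and its asymptotic p-value.
    """
    R, S = sum(cases), sum(controls)  # numbers of cases and controls
    N = R + S
    n = [r + s for r, s in zip(cases, controls)]  # pooled genotype counts
    sum_wr = sum(w * r for w, r in zip(scores, cases))
    sum_wn = sum(w * k for w, k in zip(scores, n))
    sum_w2n = sum(w * w * k for w, k in zip(scores, n))
    t = N * sum_wr - R * sum_wn             # equals N * sum_i w_i (r_i S - s_i R) / N
    var_part = N * sum_w2n - sum_wn ** 2    # N * sum w^2 n - (sum w n)^2
    stat = N * t ** 2 / (R * S * var_part)
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square, 1 d.f.
    return stat, p_value

print(cochran_armitage([10, 20, 30], [30, 20, 10]))  # statistic 20.0
```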
Cochran-Armitage Trend Test
- Does not assume independence of alleles within a person
- Does assume independence of genotypes from person to person
- Is not valid if there is population stratification
  - The increased variance due to stratification can be estimated from a random set of markers that are independent of the disease
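One widely used way to estimate that inflation is Devlin and Roeder's genomic control. The slides do not say which estimator they use, so this choice is an assumption. The 1-d.f. statistics at markers assumed independent of the disease are summarized by their median, divided by the median of a χ²₁ distribution (about 0.4549) to give an inflation factor λ. The candidate-locus statistic is then divided by λ. Function names are hypothetical.

```python
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 d.f.

def genomic_control_lambda(null_stats):
    """Inflation factor from 1-d.f. statistics at markers assumed unrelated to disease."""
    lam = statistics.median(null_stats) / CHI2_1DF_MEDIAN
    return max(lam, 1.0)  # never deflate below the nominal scale

def gc_adjust(stat, lam):
    """Rescale a candidate-locus trend statistic by the estimated inflation factor."""
    return stat / lam

null_stats = [0.2, 0.9, 1.1, 0.4, 2.5, 0.7, 1.6]  # illustrative values only
lam = genomic_control_lambda(null_stats)
print(lam, gc_adjust(12.0, lam))
```

The median is used rather than the mean because it is less affected by the occasional marker that is genuinely associated with disease.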
Case-only Studies
- Suggested as
  - more powerful (only cases needed)
  - more precise (signal decreases faster with distance from the causative locus)
Case-only Studies
- No power in the case of a multiplicative model
  - there must be a difference in Hardy-Weinberg disequilibrium (HWD) between cases and controls
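The "no power under a multiplicative model" point can be checked numerically. The sketch assumes Hardy-Weinberg proportions in the population and borrows two penetrance sets from the simulated models later in these slides: recessive 1 (1.00, 0.10, 0.10) and multiplicative (0.81, 0.045, 0.0025). The A1 frequency of 0.3 and the function names are arbitrary. Under the multiplicative model, case genotype frequencies are again in Hardy-Weinberg proportions, so a case-only HWD test has nothing to detect. The recessive model produces clear HWD in cases.

```python
def case_genotype_freqs(p, penetrances):
    """Genotype frequencies among cases, assuming HWE in the population.

    p: population frequency of A1; penetrances: P(affected | A1A1, A1A, AA).
    """
    q = 1.0 - p
    pop = (p * p, 2 * p * q, q * q)
    joint = [g * f for g, f in zip(pop, penetrances)]
    total = sum(joint)
    return [j / total for j in joint]

def hwd_coefficient(freqs):
    """D = P(A1A1) - P(A1)^2; zero when genotypes are in Hardy-Weinberg proportions."""
    p11, p1x, _ = freqs
    allele = p11 + 0.5 * p1x
    return p11 - allele ** 2

multiplicative = (0.81, 0.045, 0.0025)
recessive_1 = (1.00, 0.10, 0.10)
for name, model in [("multiplicative", multiplicative), ("recessive 1", recessive_1)]:
    print(name, hwd_coefficient(case_genotype_freqs(0.3, model)))
```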
Weighted Average of the Cochran-Armitage Trend Test and the HWD Trend Test Statistics
- To investigate the null distribution of this average, we simulate many different situations (sample sizes up to 10,000 cases and 10,000 controls) and generate its empirical null distribution
- For all situations considered, the distribution is well approximated by a Gamma distribution
- As the sample size and marker allele frequency increase, so does the mean, while the variance decreases; the largest mean and the smallest variance occur for 10,000 cases and 10,000 controls with marker allele frequency 0.5
- For 10,000 cases and 10,000 controls, and marker allele frequency 0.5, the upper tail of the distribution is well approximated by a Gamma distribution with mean $\mu = 1.78$ and variance $\sigma^2 = 3.45$
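To turn the reported mean and variance into critical values, a Gamma distribution can be matched by the method of moments: shape = µ²/σ² and scale = σ²/µ. The upper-tail critical value can then be found numerically. This sketch does not reproduce the authors' prediction equation. It only illustrates the Gamma approximation, using a power series for the regularized incomplete gamma function and bisection, with the standard library only. Function names are hypothetical.

```python
import math

def gamma_from_moments(mean, var):
    """Shape and scale of the Gamma distribution with the given mean and variance."""
    return mean ** 2 / var, var / mean

def gamma_cdf(x, shape, scale):
    """Regularized lower incomplete gamma P(shape, x / scale) via its power series."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = total = 1.0 / shape
    n = 1
    while term > total * 1e-16 and n < 100_000:
        term *= z / (shape + n)
        total += term
        n += 1
    return math.exp(shape * math.log(z) - z - math.lgamma(shape)) * total

def gamma_upper_critical_value(alpha, shape, scale):
    """x such that P(X > x) = alpha, found by bisection."""
    lo, hi = 0.0, scale
    while 1.0 - gamma_cdf(hi, shape, scale) > alpha:
        hi *= 2.0  # expand until the upper tail beyond hi is below alpha
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 1.0 - gamma_cdf(mid, shape, scale) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

shape, scale = gamma_from_moments(1.78, 3.45)
for alpha in (0.05, 0.01, 0.0001):
    print(alpha, round(gamma_upper_critical_value(alpha, shape, scale), 2))
```

With SciPy available, `scipy.stats.gamma(a=shape, scale=scale).isf(alpha)` would give the same critical values more directly.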
- We develop a prediction equation to determine percentiles of the null distribution for smaller sample sizes and marker allele frequencies
- We base goodness of fit on the root mean squared error (RMSE) of $\log_e \alpha$, calculated for various sample size combinations from the variance among 50 replicate samples
- With 90% confidence, the true $\log_e \alpha$ lies in the interval $\log_e \hat{\alpha} \pm 1.645\,\mathrm{RMSE}$, i.e., $\hat{\alpha}$ is within a factor of $e^{1.645\,\mathrm{RMSE}}$ of the true $\alpha$
- For total sample size $R + S \ge 200$ and $\alpha \ge 0.0001$, in the very worst case ($R = S = 100$, $\alpha = 0.0001$), with 90% confidence $\hat{\alpha}$ could differ from the true $\alpha$ by a factor of at most 4.8
- The average RMSE is 0.35, corresponding to $\hat{\alpha}$ being between 78% and 122% of the true $\alpha$ with 90% confidence
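The conversion between the RMSE of $\log_e \alpha$ and the multiplicative error in $\alpha$ is simple arithmetic. It assumes, as in the interval above, that the prediction error on the log scale is approximately normal. In this sketch 1.645 is the standard normal quantile giving two-sided 90% coverage, and the function names are hypothetical. Inverting the formula for the 4.8-fold worst case shows the RMSE that case corresponds to.

```python
import math

Z_90 = 1.645  # standard normal quantile for two-sided 90% coverage

def fold_factor(rmse, z=Z_90):
    """With about 90% confidence the predicted alpha is within this factor of the true alpha."""
    return math.exp(z * rmse)

def rmse_for_fold(fold, z=Z_90):
    """RMSE of log_e(alpha) that corresponds to a given fold factor."""
    return math.log(fold) / z

print(round(rmse_for_fold(4.8), 2))  # 0.95
```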
POWER: Genetic Models Simulated
Probability of being affected given genotype:
Model                A1A1    A1A     AA
1 Recessive 1        1.00    0.10    0.10
2 Recessive 2        1.00    0.05    0.05
3 Additive           1.00    0.50    0.00
4 Multiplicative     0.81    0.045   0.0025
- Each simulated population contains 500,000 individuals, allowed to mate randomly for 50 generations after the appearance of a disease mutation
- Marker loci are placed at distances of 0-6 cM from the disease susceptibility locus
- For Type I error, there is no association between the disease and marker loci
Tests Performed
- Homogeneous populations
  - HWD, cases only
  - Allele test
  - Allele test × HWD in cases
  - HWD trend test
  - Cochran-Armitage trend test
  - Cochran-Armitage trend test × HWD trend test
  - Weighted average
- Population stratification
  - Cochran-Armitage trend test with genomic control
  - Product of this and the HWD trend test
  - Weighted average with genomic control
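For comparison with the genotype-based tests, the allele test listed above can be sketched as a Pearson χ² on the 2 × 2 table of allele counts. This is an assumption about how the slides compute it. Counting each person's two alleles separately treats them as independent, which is why the allele test depends on Hardy-Weinberg proportions in a way the Cochran-Armitage trend test does not. The function name is hypothetical.

```python
def allele_test(cases, controls):
    """Pearson chi-square (1 d.f.) on the 2 x 2 table of allele counts.

    cases, controls: genotype counts ordered by number of A1 alleles (0, 1, 2).
    Each person contributes two alleles, treated as independent observations.
    """
    def allele_counts(g):
        a1 = g[1] + 2 * g[2]
        a2 = 2 * g[0] + g[1]
        return a1, a2

    a, b = allele_counts(cases)
    c, d = allele_counts(controls)
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

print(allele_test([10, 20, 30], [30, 20, 10]))
```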
[Figure: Type I error, homogeneous population. Tests shown: HWD test, cases only; product of the allele test and HWD test]
[Figure: Type I error, population stratification. Tests shown: allele test; Cochran-Armitage trend test; product of the allele test and HWD test; weighted average test; product of the Cochran-Armitage trend test and the HWD test]
[Figure: Power, homogeneous population. Tests shown include: weighted average test]
[Figure: Power, population stratification. Tests shown: HWD trend test; CA test with genomic control; weighted average with genomic control]
Conclusions
- Under recessive inheritance, the weighted average has better performance than either the Cochran-Armitage trend test or the HWD trend test
- It has good performance for other models as well
- The product of the Cochran-Armitage trend test statistic and the HWD test statistic (cases only) has better power, but has inflated Type I error if there is population stratification
- The weighted average has good overall properties and automatically controls for marker mistyping
- With acknowledgment to Kijoung Song
- Can we use evolutionary models, when we have large amounts of genetic data on a sample of cases and controls, to obtain a more powerful way of detecting loci involved in the etiology of disease?
- Will these models bear fruit or nuts?