Title: Power in QTL linkage analysis
1Power in QTL linkage analysis
- Shaun Purcell & Pak Sham
- SGDP, IoP, London, UK
F\pshaun\power.ppt
2Power primer
- Statistics (e.g. chi-squared, z-score) are continuous measures of support for a certain hypothesis
- YES OR NO decision-making: significance testing
- Inevitably leads to two types of mistake:
- - false positive (YES instead of NO) (Type I)
- - false negative (NO instead of YES) (Type II)
3Hypothesis testing
- Null hypothesis: no effect
- A significant result means that we can reject the null hypothesis
- A nonsignificant result means that we cannot reject the null hypothesis
4Statistical significance
- The p-value
- - The probability of a false positive error if the null were in fact true
- Typically, we are willing to incorrectly reject the null 5% or 1% of the time (Type I error)
5Misunderstandings
- p-VALUES
- - that the p value is the probability of the null hypothesis being true
- - that highly significant (small) p values mean large and important effects
- NULL HYPOTHESIS
- - that nonrejection of the null implies its truth
6Limitations
- IF A RESULT IS SIGNIFICANT
- - leads to the conclusion that the null is false
- - BUT, this may be trivial
- IF A RESULT IS NONSIGNIFICANT
- - leads only to the conclusion that it cannot be concluded that the null is false
7Alternate hypothesis
- Neyman & Pearson (1928)
- ALTERNATE HYPOTHESIS
- specifies a precise, non-null state of affairs
with associated risk of error
8[Figure: distribution of the test statistic, P(T) against T]
9STATISTICS vs. REALITY
                     Rejection of H0                      Nonrejection of H0
REALITY: H0 true     Type I error at rate α               Nonsignificant result
REALITY: HA true     Significant result, POWER (1 - β)    Type II error at rate β
10Power
- The probability of rejection of a false null-hypothesis
- depends on
- - the significance criterion (α)
- - the sample size (N)
- - the effect size (NCP)
The probability of detecting a given effect size in a population from a sample of size N, using significance criterion α
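11Impact of alpha
[Figure: distributions of the test statistic, P(T) against T, with the α and β regions marked]
12Impact of effect size, N
[Figure: distributions of the test statistic, P(T) against T, with the α and β regions marked]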
13Applications
EXPERIMENTAL DESIGN - avoiding false positives vs. dealing with false negatives
MAGNITUDE VS. SIGNIFICANCE - highly significant ≠ very important
INTERPRETING NONSIGNIFICANT RESULTS - nonsignificant results only meaningful if power is high
- POWER SURVEYS / META-ANALYSES
- - low power undermines the confidence that can be placed in statistically significant results
14Practical Exercise 1
- Calculation of power for simple case-control association study.
- DATA: allele frequency of A allele for cases and controls
- TEST: 2-by-2 contingency table chi-squared (1 degree of freedom)
15Step 1: determine expected chi-squared
- Hypothetical allele frequencies
- - Cases P(A) = 0.68
- - Controls P(A) = 0.54
- Sample: 150 cases, 150 controls
- Excel spreadsheet: faculty drive\pshaun\chisq.xls
Chi-squared statistic = 12.36
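The spreadsheet itself is not reproduced here, but the figure above can be checked with a short sketch. It assumes the 2-by-2 table counts alleles rather than people (two alleles per individual), which is the reading that reproduces 12.36; the function name `expected_chisq` is made up for this illustration.

```python
def expected_chisq(p_case, p_control, n_cases, n_controls):
    """Pearson chi-squared (1 df) for a 2x2 table of expected allele counts."""
    # Assumption: each individual contributes two alleles to the table.
    a = 2 * n_cases * p_case               # A alleles in cases
    b = 2 * n_cases * (1 - p_case)         # non-A alleles in cases
    c = 2 * n_controls * p_control         # A alleles in controls
    d = 2 * n_controls * (1 - p_control)   # non-A alleles in controls
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    # Hypothetical frequencies from the slide: 0.68 vs 0.54, 150 cases, 150 controls
    print(round(expected_chisq(0.68, 0.54, 150, 150), 2))
```

Because the expected statistic scales linearly with sample size, doubling both groups doubles the value, which is then used as the NCP in Step 3.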
16Step 2. Determine the critical value for a given type I error rate, α
- inverse central chi-squared distribution
[Figure: central chi-squared distribution, P(T) against T, with the critical value marked]
17- http://workshop.colorado.edu/pshaun/gpc/pdf.html
- df = 1, NCP = 0
- α       X
- 0.05    3.84146
- 0.01    6.63489
- 0.001   10.82754
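As an offline alternative to the web calculator, the 1-df critical values can be approximated with the Python standard library. This is a sketch that only covers df = 1, where a chi-squared variate is a squared standard normal; `critical_value_df1` is a name invented here.

```python
from statistics import NormalDist


def critical_value_df1(alpha):
    """Upper-alpha critical value of a central chi-squared with 1 df."""
    # For 1 df, T = Z^2, so P(T > x) = alpha when sqrt(x) is the
    # two-sided standard normal quantile at 1 - alpha/2.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z


if __name__ == "__main__":
    for alpha in (0.05, 0.01, 0.001):
        print(alpha, round(critical_value_df1(alpha), 3))
```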
18Step 3. Determine the power for a given critical value and non-centrality parameter
- non-central chi-squared distribution
[Figure: non-central chi-squared distribution, P(T) against T, with the critical value marked]
19Determining power
- df = 1, NCP = 12.36
- α       X         Power
- 0.05    3.84146   0.94
- 0.01    6.6349    0.83
- 0.001   10.827    0.59
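The power column can be checked without a non-central chi-squared routine, again as a sketch restricted to df = 1. It uses the fact that a 1-df non-central chi-squared variate with NCP λ is distributed as (Z + sqrt(λ))^2 for a standard normal Z; `power_df1` is a hypothetical helper name. For other degrees of freedom a library routine such as `scipy.stats.ncx2.sf` could be used instead.

```python
from math import sqrt
from statistics import NormalDist


def power_df1(ncp, alpha):
    """Power of a 1-df chi-squared test with non-centrality parameter ncp."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)   # square root of the chi-squared critical value
    shift = sqrt(ncp)
    # T = (Z + shift)^2 exceeds crit^2 when Z + shift > crit or Z + shift < -crit.
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)


if __name__ == "__main__":
    for alpha in (0.05, 0.01, 0.001):
        print(alpha, round(power_df1(12.36, alpha), 2))
```

With NCP = 12.36 this gives 0.94, 0.83 and 0.59 for the three alpha levels, matching the table.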
20Exercises
- Using the spreadsheet and the chi-squared calculator, what is power (for the 3 levels of alpha)
- 1. if the sample size were 300 for each group?
- 2. if allele frequencies were 0.24 and 0.18 for 750 cases and 750 controls?
21Answers
- 1. NCP = 24.72
- α       Power
- 0.05    1.00
- 0.01    0.99
- 0.001   0.95
- 2. NCP = 16.27
- α       Power
- 0.05    0.98
- 0.01    0.93
- 0.001   0.77
- nb. Stata: di 1-nchi(df,NCP,invchi(df,α))
22QTL linkage
23Power of tests
- For chi-squared tests on large samples, power is determined by the non-centrality parameter (λ) and degrees of freedom (df)
- λ = E(2lnL1 - 2lnL0) = E(2lnL1) - E(2lnL0)
- where expectations are taken at asymptotic values of maximum likelihood estimates (MLE) under an assumed true model
24Linkage test
[Equations: two piecewise definitions, each with separate cases for i = j and i ≠ j]
25Expected log likelihood under H0
Expectation of the quadratic product is simply s, the sibship size (note standardised trait)
26Expected log likelihood under HA
27Linkage test
Expected NCP
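The equations on these slides were not recovered, so the following is a reconstruction under stated assumptions rather than the slides' own notation. It assumes a standardised, multivariate normal trait vector x for a sibship of size s, IBD configurations i occurring with probability p_i and implying covariance matrix Σ_i, and a null covariance matrix Σ_0 equal to the average of the Σ_i; additive constants are dropped.

```latex
% Assumed notation (not from the slides): x, s, p_i, \Sigma_i, \Sigma_0
\begin{align}
2\ln L &= -\ln|\Sigma| - x^{\top}\Sigma^{-1}x \\
E\left[x^{\top}\Sigma^{-1}x\right] &= \operatorname{tr}\left(\Sigma^{-1}E[xx^{\top}]\right) = s \\
E(2\ln L_0) &= -\ln|\Sigma_0| - s, \qquad \Sigma_0 = \sum_i p_i\,\Sigma_i \\
E(2\ln L_1) &= -\sum_i p_i \ln|\Sigma_i| - s \\
\lambda = E(2\ln L_1) - E(2\ln L_0) &= \ln|\Sigma_0| - \sum_i p_i \ln|\Sigma_i|
\end{align}
```

The quadratic terms cancel because each has expectation s, so under these assumptions the expected NCP depends only on the determinants of the covariance matrices.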
28Approximation of NCP
NCP per sibship is proportional to
- the number of pairs in the sibship (large sibships are powerful)
- the square of the additive QTL variance (decreases rapidly for QTL of very small effect)
- the sibling correlation (structure of residual variance is important)
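The approximation formula itself did not survive extraction. As an illustration only, the sketch below uses a second-order expansion for a sib pair with an additive QTL, complete linkage and a fully informative locus, which gives an NCP of roughly (1 + r^2) / (1 - r^2)^2 × V_A^2 / 8, where r is the overall sib correlation and V_A the additive QTL variance as a proportion of trait variance. Scaling by the number of pairs follows the proportionality stated above; the function and argument names are invented here.

```python
def approx_ncp(v_a, r, sibship_size=2):
    """Approximate expected NCP per sibship for an additive QTL (assumed form)."""
    # Number of sib pairs in a sibship of the given size
    n_pairs = sibship_size * (sibship_size - 1) // 2
    # Var(pi) = 1/8 for sib pairs; the r term reflects the sibling correlation
    per_pair = (1 + r ** 2) / (1 - r ** 2) ** 2 * v_a ** 2 / 8
    return n_pairs * per_pair


if __name__ == "__main__":
    # Hypothetical inputs: QTL variance 0.5, sib correlation 0.25
    print(approx_ncp(0.5, 0.25))
    # Halving the QTL variance cuts the NCP to a quarter
    print(approx_ncp(0.25, 0.25) / approx_ncp(0.5, 0.25))
```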
29QTL linkage
POWER
- Type I errors
- Type II errors
- Sample N
- Effect Size
- Variance explained
- Marker vs functional variant
30Incomplete linkage
- The previous calculations assumed analysis was performed at the QTL.
- - imagine that the test locus is not the QTL but is linked to it.
- Calculate sib-pair IBD distribution at the QTL, conditional on IBD at test locus,
- - a function of recombination fraction
31[Table: distribution of π at the QTL (0, 1/2, 1) conditional on π at the marker M (0, 1/2, 1)]
32- Use conditional probabilities to calculate the sib correlation conditional on IBD sharing at the test marker. For example, for IBD 0 at marker:
[Equation: C0, combining the cases π at QTL = 0, 1/2 and 1]
33- The noncentrality parameter per sib pair is then
given by
34- If the QTL is additive, then
- - attenuation of the NCP is by a factor of (1-2θ)^4
- - = square of the correlation between the proportions of alleles IBD at two loci with recombination fraction θ
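The attenuation factor can be checked numerically. The sketch below assumes the standard sib-pair result that IBD status at one locus given IBD status at a linked locus depends on ψ = θ^2 + (1-θ)^2; the conditional table built here is that standard form, not a transcription of the lost slide, and the function names are hypothetical.

```python
def ibd_given_marker(theta):
    """P(IBD at QTL = 0, 1, 2 alleles | IBD at marker = 0, 1, 2 alleles)."""
    psi = theta ** 2 + (1 - theta) ** 2
    return [
        [psi ** 2, 2 * psi * (1 - psi), (1 - psi) ** 2],            # marker IBD 0
        [psi * (1 - psi), psi ** 2 + (1 - psi) ** 2, psi * (1 - psi)],  # marker IBD 1
        [(1 - psi) ** 2, 2 * psi * (1 - psi), psi ** 2],            # marker IBD 2
    ]


def pi_correlation(theta):
    """Correlation between proportions of alleles IBD at the two loci."""
    prior = [0.25, 0.5, 0.25]   # sib-pair prior for IBD 0, 1, 2
    pi = [0.0, 0.5, 1.0]        # proportion of alleles shared IBD
    trans = ibd_given_marker(theta)
    e_product = 0.0
    for m in range(3):
        e_pi_q = sum(trans[m][q] * pi[q] for q in range(3))
        e_product += prior[m] * pi[m] * e_pi_q
    covariance = e_product - 0.5 * 0.5   # both loci have mean pi of 1/2
    return covariance / 0.125            # both loci have Var(pi) = 1/8


if __name__ == "__main__":
    theta = 0.1
    rho = pi_correlation(theta)
    print(rho, (1 - 2 * theta) ** 2)        # correlation vs (1-2θ)^2
    print(rho ** 2, (1 - 2 * theta) ** 4)   # attenuation of the NCP
```

At θ = 0.1 the correlation is 0.64, so the NCP is attenuated to about 41% of its value at the QTL.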
35Effect of incomplete linkage
36Effect of incomplete linkage
37Comparison to H-E
- Amos & Elston (1989) H-E regression
- - 90% power (at significance level 0.05)
- - QTL variance 0.5
- - marker and major gene are completely linked
- → 320 sib pairs
- → 778 sib pairs if θ = 0.1
38GPC input parameters
- Proportions of variance
- - additive QTL variance
- - dominance QTL variance
- - residual variance (shared / nonshared)
- Recombination fraction (0 - 0.5)
- Sample size & sibship size (2 - 5)
- Type I error rate
- Type II error rate
39GPC output parameters
- Expected sibling correlations
- - by IBD status at the QTL
- - by IBD status at the marker
- Expected NCP per sibship
- Power
- - at different levels of alpha given sample size
- Sample size
- - for specified power at different levels of alpha
40From GPC
- Modelling additive effects only
                     Sibships     Individuals
Pairs                265 (320)    530
Pairs (θ = 0.1)      666 (778)    1332
Trios (θ = 0.1)      220          660
Quads (θ = 0.1)      110          440
Quints (θ = 0.1)     67           335
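The two sib-pair rows can be approximately reproduced with a short sketch. This is not the GPC code; it rests on several assumptions chosen because they match the figures above: QTL variance 0.5 as in the H-E comparison, no shared residual variance, a fully informative marker, 90% power, and alpha = 0.05 applied to a 1-df chi-squared. The expected NCP per pair is taken as ln(1 - r^2) minus the average of ln(1 - r_i^2) over marker IBD classes, and all names are hypothetical.

```python
from math import log
from statistics import NormalDist


def sib_pair_ncp(v_a, theta=0.0, v_shared=0.0):
    """Expected NCP per sib pair for an additive QTL, standardised trait."""
    r = v_a / 2 + v_shared                      # overall sib correlation
    prior = {0.0: 0.25, 0.5: 0.5, 1.0: 0.25}    # pi at the marker and its probability
    ncp = log(1 - r ** 2)
    for pi_marker, prob in prior.items():
        # Expected pi at the QTL given pi at the marker
        pi_qtl = 0.5 + (1 - 2 * theta) ** 2 * (pi_marker - 0.5)
        r_given_marker = v_a * pi_qtl + v_shared
        ncp -= prob * log(1 - r_given_marker ** 2)
    return ncp


def pairs_required(ncp_per_pair, alpha=0.05, power=0.9):
    """Sib pairs needed for the given power (1-df test, far tail ignored)."""
    z = NormalDist()
    total_ncp = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return total_ncp / ncp_per_pair


if __name__ == "__main__":
    print(round(pairs_required(sib_pair_ncp(0.5))))             # complete linkage
    print(round(pairs_required(sib_pair_ncp(0.5, theta=0.1))))  # theta = 0.1
```

The `v_shared` argument allows a shared residual component to be added, which is one way to explore the effect of the residual sibling correlation on the required sample size.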
41Practical Exercise 2
- What is the effect on power to detect linkage of
- 1. QTL variance?
- 2. residual sibling correlation?
- 3. marker-QTL recombination fraction?
42Pairs required (θ = 0, p = 0.05, power = 0.8)
43Pairs required (θ = 0, p = 0.05, power = 0.8)
44Effect of residual correlation
- QTL additive effects account for 10% of trait variance
- Sample size required for 80% power (α = 0.05)
- No dominance
- θ = 0.1
- A: residual correlation 0.35
- B: residual correlation 0.50
- C: residual correlation 0.65
45Individuals required
46Selective genotyping
Unselected
Proband Selection
EDAC
Maximally Dissimilar
ASP
Extreme Discordant
EDAC
Mahalanobis Distance
47Selective genotyping
- The power calculations so far assume an unselected population.
- - calculate expected NCP per sibship
- If we have a sample with trait scores
- - calculate expected NCP for each sibship conditional on trait values
- - this quantity can be used to rank order the sample for genotyping
48Sibship informativeness: sib pairs
49Sibship informativeness: sib pairs
[Figure panels: dominance, rare recessive, unequal allele frequencies]
50Selective genotyping
Columns: ASP, MaxD, PS, ED, EDAC, MDis, SEL B, SEL T, p, d/a
p      d/a    Value
.5     0      15.82
.1     0      17.10
.25    0      15.45
.1     1      16.88
.25    1      15.76
.5     1      18.89
.75    1      27.64
.9     1      43.16
51Impact of selection
52QTL linkage
POWER
- Type I errors
- Type II errors
- Sample N
- Effect Size
- Variance explained
- Marker vs functional variant
- Locus informativeness
53Indices of marker informativeness
- Markers should be highly polymorphic
- - alleles inherited from different sources are likely to be distinguishable
- Heterozygosity (H)
- Polymorphism Information Content (PIC)
- - measure number and frequency of alleles at a locus
54Heterozygosity
- n = number of alleles,
- pi = frequency of the ith allele.
- H = probability that an individual is heterozygous
55Heterozygosity
Allele   Frequency
1        0.20
2        0.35
3        0.05
4        0.40

Genotype   Frequency
11         0.04
12         0.14
13         0.02
14         0.16
22         0.1225
23         0.035
24         0.28
33         0.0025
34         0.04
44         0.16

Heterozygous genotypes only: 12 0.14, 13 0.02, 14 0.16, 23 0.035, 24 0.28, 34 0.04
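The heterozygosity formula was an image that did not survive, so the sketch below assumes the usual definition under Hardy-Weinberg proportions, H = 1 - Σ pi^2, and checks it against the allele and genotype frequencies on this slide. The function name is made up for the example.

```python
def heterozygosity(freqs):
    """H = 1 - sum of squared allele frequencies (Hardy-Weinberg assumed)."""
    return 1 - sum(p * p for p in freqs)


if __name__ == "__main__":
    freqs = [0.20, 0.35, 0.05, 0.40]   # alleles 1 to 4 from the slide
    h = heterozygosity(freqs)
    # Equivalent route: add up the heterozygous genotype frequencies 2 * pi * pj
    het_sum = sum(
        2 * freqs[i] * freqs[j]
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    print(round(h, 3), round(het_sum, 3))
```

Both routes give 0.675, the sum of the six heterozygous genotype frequencies listed on the slide.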
56Polymorphism information content
- IF a parent is heterozygous,
- - their gametes will usually be informative.
- BUT if both parents and child are heterozygous for the same genotype,
- - origins of child's alleles are ambiguous
- IF C = the probability of this occurring,
- - PIC = H - C
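A minimal sketch of PIC = H - C follows. The expression for C is not given in the recovered text, so this assumes the Botstein et al. (1980) form, C = Σ over i < j of 2 pi^2 pj^2, with H computed under Hardy-Weinberg proportions. Under that assumption the function reproduces the PIC values listed for markers M1 to M5 in the multipoint example later in the deck.

```python
def pic(freqs):
    """Polymorphism information content, PIC = H - C (assumed Botstein form)."""
    n = len(freqs)
    h = 1 - sum(p * p for p in freqs)
    # C: chance that the child's alleles cannot be traced because parents
    # and child share the same heterozygous genotype (assumed expression)
    c = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return h - c


if __name__ == "__main__":
    markers = {
        "M1": [0.1, 0.2, 0.7],
        "M2": [0.2] * 5,
        "M3": [0.1] * 5 + [0.2, 0.1, 0.2],
        "M4": [0.2, 0.1, 0.1, 0.2, 0.2, 0.2],
        "M5": [0.2] * 5,
    }
    for name, freqs in markers.items():
        print(name, round(pic(freqs), 2))
```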
57Polymorphism information content
58Possible IBD configurations given parental genotypes
    Parental Mating Type   Configuration   Probability
1   Hom × Hom              1/4   1/2       (1-H)^2
2   Hom × Het              0     1/4       H(1-H)
3   Hom × Het              1/2   3/4       H(1-H)
4   Het × Het              0     1/2       H^2/2
5   Het × Het              0     0         (H^2-C)/4
6   Het × Het              1     1         (H^2-C)/4
7   Het × Het              1/2   1/2       C/2
59PIC & NCP for linkage
- From the table of possible IBD configurations given parental genotypes: [Equation]
- Therefore, NCP is attenuated in proportion to PIC
60QTL linkage
POWER
- Type I errors
- Type II errors
- Sample N
- Effect Size
- Variance explained
- Marker vs functional variant
- Locus informativeness
- Multipoint
61Multipoint IBD
- Estimates IBD sharing at any arbitrary point
along a chromosomal region, using all available
marker information on a chromosome
simultaneously.
62?, ?, ? and PIC
63 [Figure: marker map, M1 to M5, with adjacent markers 5 cM apart]
Allele frequencies:
M1 0.1 0.2 0.7
M2 0.2 0.2 0.2 0.2 0.2
M3 0.1 0.1 0.1 0.1 0.1 0.2 0.1 0.2
M4 0.2 0.1 0.1 0.2 0.2 0.2
M5 0.2 0.2 0.2 0.2 0.2
64 3. Calculate covariance matrix between pi-hat at markers
65 4. Consider each multipoint position
- At each position along the chromosome, calculate covariance between trait locus and each of the markers
           M1      M2      M3      M4      M5
MD (cM)    10      15      20      25      30
PIC        0.41    0.77    0.84    0.79    0.77
RF         0.091   0.130   0.165   0.197   0.226
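The slide does not say which map function links the MD and RF rows. The listed values are consistent with Haldane's map function, θ = (1 - exp(-2d)) / 2 with d in Morgans, so the sketch below assumes it; `haldane_rf` is a name chosen for this example.

```python
from math import exp


def haldane_rf(map_distance_cm):
    """Recombination fraction from map distance in cM (Haldane's map function)."""
    d = map_distance_cm / 100.0    # convert centimorgans to Morgans
    return 0.5 * (1 - exp(-2 * d))


if __name__ == "__main__":
    for distance in (10, 15, 20, 25, 30):
        print(distance, round(haldane_rf(distance), 3))
```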
66Fulker et al multipoint
- If π̂ is a vector of single-marker IBD estimates, then a multipoint IBD estimate at test position t is given by [Equation]
- Conditional on π̂, the variance of π at the test position is reduced by a quantity which can be thought of as a multipoint PIC
67 5. Calculate MPIC
68 10 cM map
69 5 cM map
70Exclusion mapping
- Exclusion: support for the hypothesis that a QTL of at least a certain effect is absent at that position
- Normally, the LRT compares the likelihood at the MLE and the null
- In exclusion mapping, the LRT compares the likelihood of a fixed effect size against the null and therefore can be negative
71Conclusions
- Factors influencing power
- QTL variance
- Sib correlation
- Sibship size
- Marker informativeness
- Marker density
- Phenotypic selection