Power in QTL linkage analysis - PowerPoint PPT Presentation

Slides: 72
Provided by: shaunp2


1
Power in QTL linkage analysis
  • Shaun Purcell & Pak Sham
  • SGDP, IoP, London, UK

F\pshaun\power.ppt
2
Power primer
  • Statistics (e.g. chi-squared, z-score) are
    continuous measures of support for a certain
    hypothesis

YES OR NO decision-making: significance testing
Inevitably leads to two types of mistake:
false positive (YES instead of NO) (Type I)
false negative (NO instead of YES) (Type II)
3
Hypothesis testing
  • Null hypothesis: no effect
  • A significant result means that we can reject
    the null hypothesis
  • A nonsignificant result means that we cannot
    reject the null hypothesis

4
Statistical significance
  • The p-value:
  • the probability of a result at least as extreme as the one observed if the null were in fact true (the risk of a false positive error if we reject)
  • Typically, we are willing to incorrectly reject the null 5% or 1% of the time (Type I error)

5
Misunderstandings
  • p-VALUES
  • that the p value is the probability of the null hypothesis being true
  • that very small p values (highly significant results) mean large and important effects
  • NULL HYPOTHESIS
  • that nonrejection of the null implies its truth

6
Limitations
  • IF A RESULT IS SIGNIFICANT
  • leads to the conclusion that the null is false
  • BUT, this may be trivial
  • IF A RESULT IS NONSIGNIFICANT
  • leads only to the conclusion that it cannot be
    concluded that the null is false

7
Alternate hypothesis
  • Neyman & Pearson (1928)
  • ALTERNATE HYPOTHESIS
  • specifies a precise, non-null state of affairs
    with associated risk of error

8
[Figure: distribution of the test statistic, P(T) plotted against T]
9
STATISTICS (rows) vs REALITY (columns)
                                              H0 true                   HA true
Nonsignificant result (nonrejection of H0)                              Type II error at rate β
Significant result (rejection of H0)          Type I error at rate α    POWER (1 - β)
10
Power
  • The probability of rejection of a false null hypothesis
  • depends on
  • - the significance criterion (α)
  • - the sample size (N)
  • - the effect size (NCP)

The probability of detecting a given effect size in a population from a sample of size N, using significance criterion α
11
Impact of alpha
[Figure: P(T) plotted against T]
12
Impact of effect size, N
[Figure: P(T) plotted against T]
13
Applications
EXPERIMENTAL DESIGN - avoiding false positives
vs. dealing with false negatives
MAGNITUDE VS. SIGNIFICANCE - highly significant ≠ very important
INTERPRETING NONSIGNIFICANT RESULTS - nonsignificant results are only meaningful if power is high
  • POWER SURVEYS / META-ANALYSES
  • - low power undermines the confidence that can be
    placed in statistically significant results

14
Practical Exercise 1
  • Calculation of power for simple case-control
    association study.
  • DATA: allele frequency of the A allele for cases and controls
  • TEST: 2-by-2 contingency table chi-squared
  • (1 degree of freedom)

15
Step 1. Determine expected chi-squared
  • Hypothetical allele frequencies
  • Cases P(A) = 0.68
  • Controls P(A) = 0.54
  • Sample: 150 cases, 150 controls
  • Excel spreadsheet: faculty drive\pshaun\chisq.xls

Chi-squared statistic = 12.36
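A minimal Python sketch of Step 1, as an alternative to the spreadsheet. It assumes an allelic test, with each individual contributing two alleles to the 2-by-2 table; that assumption is what reproduces the 12.36 above (300 alleles per group). The function name is ours.

```python
def expected_allelic_chisq(p_cases: float, p_controls: float,
                           n_cases: int, n_controls: int) -> float:
    """Pearson chi-squared for the 2-by-2 allele-count table that would be
    observed if sample allele frequencies equalled the hypothesised ones.

    Assumes two alleles per individual (allelic test, 1 df)."""
    # Expected allele counts: rows = cases / controls, columns = A / other
    a = 2 * n_cases * p_cases
    b = 2 * n_cases * (1 - p_cases)
    c = 2 * n_controls * p_controls
    d = 2 * n_controls * (1 - p_controls)
    total = a + b + c + d
    # 2-by-2 shortcut: N (ad - bc)^2 / (product of the four margins)
    return total * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


print(round(expected_allelic_chisq(0.68, 0.54, 150, 150), 2))
```

The same function with 300 per group, or with frequencies 0.24 and 0.18 and 750 per group, gives the NCPs quoted in the Answers slide (24.72 and 16.27).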
16
Step 2. Determine the critical value for a given type I error rate, α
- inverse central chi-squared distribution
[Figure: P(T) plotted against T, with the critical value marked]
17
  • http://workshop.colorado.edu/pshaun/gpc/pdf.html
  • df = 1, NCP = 0

α        X
0.05     3.84146
0.01     6.63489
0.001    10.82754
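For 1 df the critical values in this table can be checked with the Python standard library alone, because a central chi-squared variable with 1 df is the square of a standard normal. A sketch restricted to that 1-df case (other df would need a general inverse chi-squared routine, such as SciPy's `scipy.stats.chi2.ppf`); the function name is ours.

```python
from statistics import NormalDist


def chisq1_critical_value(alpha: float) -> float:
    """Upper-alpha critical value of the central chi-squared distribution, 1 df.

    A 1-df chi-squared variable is Z**2 for standard normal Z, so the critical
    value is the square of the two-sided normal quantile z_(1 - alpha/2)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z


for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(chisq1_critical_value(alpha), 5))
```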
18
Step 3. Determine the power for a given critical value and non-centrality parameter
- non-central chi-squared distribution
[Figure: P(T) plotted against T, with the critical value marked]
19
Determining power
  • df = 1, NCP = 12.36

α        X          Power
0.05     3.84146    0.94
0.01     6.6349     0.83
0.001    10.827     0.59
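A matching sketch for Step 3. With 1 df the non-central statistic is (Z + √NCP)², so power reduces to two standard-normal tail areas and no special non-central routine is needed. This covers only the 1-df case used in this exercise; the function name is ours.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_chisq1(ncp: float, alpha: float) -> float:
    """Power of a 1-df chi-squared test with non-centrality parameter ncp.

    The non-central statistic is (Z + sqrt(ncp))**2, so
    P(T > crit) = P(Z > sqrt(crit) - sqrt(ncp)) + P(Z < -sqrt(crit) - sqrt(ncp))."""
    root_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)   # square root of the critical value
    root_ncp = math.sqrt(ncp)
    upper = 1 - STD_NORMAL.cdf(root_crit - root_ncp)
    lower = STD_NORMAL.cdf(-root_crit - root_ncp)
    return upper + lower


for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(power_chisq1(12.36, alpha), 2))
```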
20
Exercises
  • Using the spreadsheet and the chi-squared
    calculator, what is power (for the 3 levels of
    alpha)
  • 1. if the sample size were 300 for each group?
  • 2. if allele frequencies were 0.24 and 0.18 for
    750 cases and 750 controls?

21
Answers
  • 1. NCP = 24.72
      α        Power
      0.05     1.00
      0.01     0.99
      0.001    0.95
  • 2. NCP = 16.27
      α        Power
      0.05     0.98
      0.01     0.93
      0.001    0.77
  • nb. Stata: di 1-nchi(df,NCP,invchi(df,α))
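Putting the three steps together, a self-contained Python sketch that reproduces both answers. `power_1df` plays the role of the Stata one-liner for df = 1 only, and the allelic NCP again assumes two alleles per individual; both function names are ours.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def allelic_ncp(p_cases: float, p_controls: float, n_per_group: int) -> float:
    """Expected allelic chi-squared (taken as the NCP) for equal numbers of
    cases and controls, counting two alleles per individual."""
    n = 2 * n_per_group                      # alleles per group
    a, b = n * p_cases, n * (1 - p_cases)    # case alleles: A, other
    c, d = n * p_controls, n * (1 - p_controls)
    # 2-by-2 shortcut: N (ad - bc)^2 / (row1 * row2 * col1 * col2), rows = n each
    return 2 * n * (a * d - b * c) ** 2 / (n * n * (a + c) * (b + d))


def power_1df(ncp: float, alpha: float) -> float:
    """1-df counterpart of Stata's 1-nchi(df,NCP,invchi(df,alpha))."""
    z_crit, root_ncp = Z.inv_cdf(1 - alpha / 2), math.sqrt(ncp)
    return Z.cdf(root_ncp - z_crit) + Z.cdf(-root_ncp - z_crit)


for label, (p1, p2, n) in {"1": (0.68, 0.54, 300), "2": (0.24, 0.18, 750)}.items():
    ncp = allelic_ncp(p1, p2, n)
    powers = [round(power_1df(ncp, alpha), 2) for alpha in (0.05, 0.01, 0.001)]
    print(label, round(ncp, 2), powers)
```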

22
QTL linkage
23
Power of tests
  • For chi-squared tests on large samples, power is determined by the non-centrality parameter (λ) and the degrees of freedom (df)
  • $\lambda = E(2\ln L_1 - 2\ln L_0) = E(2\ln L_1) - E(2\ln L_0)$
  • where expectations are taken at asymptotic values
    of maximum likelihood estimates (MLE) under an
    assumed true model

24
Linkage test
  • HA
  • H0

for i = j
for i ≠ j
for i = j
for i ≠ j
25
Expected log likelihood under H0
Expectation of the quadratic product is simply s, the sibship size (note: standardised trait)
26
Expected log likelihood under HA
27
Linkage test
Expected NCP
28
Approximation of NCP
NCP per sibship
- is proportional to the number of pairs in the sibship (large sibships are powerful)
- is proportional to the square of the additive QTL variance (decreases rapidly for QTL of very small effect)
- depends on the sibling correlation (the structure of the residual variance is important)
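The slide's formula did not survive extraction. As a hedged illustration of the three dependencies, here is a sketch of one standard small-effect approximation (our derivation, not necessarily the exact expression on the slide): for a standardised trait and an additive QTL with variance V_A, the sib correlation given IBD proportion π is r + V_A(π - 1/2); Var(π) = 1/8 for full sibs; and the Fisher information about a correlation r is (1 + r²)/(1 - r²)². Multiplying by s(s - 1)/2 treats the pairs within a sibship as independent, which is a further simplification. The function name is ours.

```python
def approx_ncp_per_sibship(v_a: float, r: float, s: int = 2) -> float:
    """Small-effect approximation to the expected linkage NCP of one sibship.

    v_a: additive QTL variance (trait standardised to variance 1)
    r:   sibling correlation under the null
    s:   sibship size; its s(s-1)/2 pairs are treated as independent (assumption)
    """
    n_pairs = s * (s - 1) / 2
    var_pi = 1 / 8                                   # Var(proportion IBD) for full sibs
    info_r = (1 + r * r) / (1 - r * r) ** 2          # Fisher information about r, per pair
    return n_pairs * var_pi * v_a ** 2 * info_r


for s in (2, 3, 4, 5):
    print(s, round(approx_ncp_per_sibship(0.1, 0.3, s), 5))
```

Under this approximation, doubling V_A quadruples the NCP, and a trio (three pairs) contributes three times the NCP of a single pair.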
29
QTL linkage
POWER
  • Type I errors
  • Type II errors
  • Sample N
  • Effect size
  • Variance explained
  • Marker vs functional variant
30
Incomplete linkage
  • The previous calculations assumed analysis was
    performed at the QTL.
  • - imagine that the test locus is not the QTL
  • but is linked to it.
  • Calculate sib-pair IBD distribution at the QTL,
    conditional on IBD at test locus,
  • - a function of recombination fraction

31
[Table: conditional probabilities of π at the QTL (0, 1/2, 1) given π at the marker M (0, 1/2, 1)]
32
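  • Use conditional probabilities to calculate the sib correlation conditional on IBD sharing at the test marker. For example, for IBD = 0 at the marker:

$C_0 = \sum_{q \in \{0,\, 1/2,\, 1\}} P(\pi_{QTL} = q \mid \pi_M = 0)\, r_q$, where $r_q$ is the sib correlation when the proportion IBD at the QTL is $q$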
33
  • The noncentrality parameter per sib pair is then
    given by

34
  • If the QTL is additive, then
  • attenuation of the NCP is by a factor of $(1-2\theta)^4$, the square of the correlation between the proportions of alleles IBD at two loci with recombination fraction $\theta$
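A sketch of the conditional-IBD calculation for a full-sib pair, assuming each parent's sharing indicator (whether the two sibs received the same grandparental allele) is preserved between marker and QTL with probability ψ = θ² + (1 - θ)², independently for the two parents. It builds P(π at QTL | π at M), the conditional sib correlations C0, C1/2 and C1 for an additive QTL (parametrised, by our choice, through the average sib correlation r), and checks that the squared IBD correlation equals (1 - 2θ)⁴. All function names are ours.

```python
from itertools import product

PI_VALUES = (0.0, 0.5, 1.0)


def conditional_ibd(theta: float) -> dict:
    """P(pi at QTL = q | pi at marker = m) for a full-sib pair, as {m: {q: p}}."""
    psi = theta ** 2 + (1 - theta) ** 2      # P(sharing indicator unchanged)
    table = {m: {q: 0.0 for q in PI_VALUES} for m in PI_VALUES}
    counts = {m: 0 for m in PI_VALUES}
    for pat_m, mat_m in product((0, 1), repeat=2):   # equally likely marker states
        m = (pat_m + mat_m) / 2
        counts[m] += 1
        for pat_q, mat_q in product((0, 1), repeat=2):
            p = ((psi if pat_q == pat_m else 1 - psi)
                 * (psi if mat_q == mat_m else 1 - psi))
            table[m][(pat_q + mat_q) / 2] += p
    # average over the marker states that give the same pi (two states give 1/2)
    return {m: {q: v / counts[m] for q, v in row.items()} for m, row in table.items()}


def conditional_sib_correlation(theta: float, r: float, v_a: float) -> dict:
    """C_m: expected sib correlation given pi = m at the marker, assuming the
    correlation given pi = q at the QTL is r + v_a * (q - 1/2)."""
    cond = conditional_ibd(theta)
    return {m: sum(p * (r + v_a * (q - 0.5)) for q, p in cond[m].items())
            for m in PI_VALUES}


def ibd_correlation(theta: float) -> float:
    """Correlation between pi at the marker and pi at the QTL."""
    cond = conditional_ibd(theta)
    p_marker = {0.0: 0.25, 0.5: 0.5, 1.0: 0.25}
    cov = sum(p_marker[m] * cond[m][q] * (m - 0.5) * (q - 0.5)
              for m in PI_VALUES for q in PI_VALUES)
    return cov / (1 / 8)                     # Var(pi) = 1/8 at both loci


for theta in (0.0, 0.05, 0.1, 0.2):
    print(theta, round(ibd_correlation(theta) ** 2, 4), round((1 - 2 * theta) ** 4, 4))
```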

35
Effect of incomplete linkage
36
Effect of incomplete linkage
37
Comparison to H-E
  • Amos & Elston (1989) H-E regression
  • - 90% power (at significance level 0.05)
  • - QTL variance 0.5
  • - marker and major gene are completely linked
  • ⇒ 320 sib pairs
  • ⇒ 778 sib pairs if θ = 0.1

38
GPC input parameters
  • Proportions of variance
  • additive QTL variance
  • dominance QTL variance
  • residual variance (shared / nonshared)
  • Recombination fraction ( 0 - 0.5 )
  • Sample size
  • Sibship size (2 - 5)
  • Type I error rate
  • Type II error rate

39
GPC output parameters
  • Expected sibling correlations
  • - by IBD status at the QTL
  • - by IBD status at the marker
  • Expected NCP per sibship
  • Power
  • - at different levels of alpha given sample
    size
  • Sample size
  • - for a specified power, at different levels of alpha
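GPC's own sample-size routine is not shown here. A hedged sketch of the usual approach, assuming an ordinary two-sided 1-df chi-squared test: find by bisection the NCP that gives the requested power, then divide by the expected NCP per sibship. The linkage LRT for a variance component is commonly referred to a 50:50 mixture of chi-squared distributions with 0 and 1 df, so GPC's figures may differ from this version. Function names are ours.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def power_1df(ncp: float, alpha: float) -> float:
    """Power of a two-sided 1-df chi-squared test with the given NCP."""
    z_crit, root_ncp = Z.inv_cdf(1 - alpha / 2), math.sqrt(ncp)
    return Z.cdf(root_ncp - z_crit) + Z.cdf(-root_ncp - z_crit)


def ncp_for_power(alpha: float, power: float, tol: float = 1e-10) -> float:
    """NCP at which the 1-df test reaches the requested power.

    Bisection works because power increases monotonically with the NCP."""
    lo, hi = 0.0, 1.0
    while power_1df(hi, alpha) < power:      # expand until the target is bracketed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if power_1df(mid, alpha) < power:
            lo = mid
        else:
            hi = mid
    return hi


def sibships_required(ncp_per_sibship: float, alpha: float, power: float) -> int:
    """Smallest number of sibships whose summed expected NCP reaches the target."""
    return math.ceil(ncp_for_power(alpha, power) / ncp_per_sibship)


print(round(ncp_for_power(0.05, 0.80), 3))
print(sibships_required(0.01, alpha=0.05, power=0.80))
```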

40
From GPC
  • Modelling additive effects only
  • Figures in parentheses are the H-E sample sizes from Amos & Elston (1989)

                     Sibships     Individuals
Pairs                265 (320)    530
Pairs (θ = 0.1)      666 (778)    1332
Trios (θ = 0.1)      220          660
Quads (θ = 0.1)      110          440
Quints (θ = 0.1)     67           335
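A quick consistency check on the table above, using only its own numbers: if the NCP per sibship scales with the number of sib pairs s(s - 1)/2, the θ = 0.1 pair requirement (666) divided by the pairs per sibship should roughly reproduce the trio, quad and quint rows.

```python
pairs_required = 666                       # sib pairs needed, theta = 0.1 (GPC row)
gpc_sibships = {3: 220, 4: 110, 5: 67}     # trios, quads, quints from the table

for s, reported in gpc_sibships.items():
    pairs_per_sibship = s * (s - 1) // 2
    predicted = pairs_required / pairs_per_sibship
    print(f"s={s}: predicted {predicted:.1f}, GPC {reported}")
```

The predictions (222, 111, 66.6) sit within a few sibships of GPC's figures, in line with the "number of pairs" point on the NCP approximation slide.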
41
Practical Exercise 2
  • What is the effect on power to detect linkage of
  • 1. QTL variance?
  • 2. residual sibling correlation?
  • 3. marker-QTL recombination fraction?
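One way to explore Exercise 2 before opening GPC: a rough Python sketch combining a small-effect NCP approximation (our derivation, not GPC's algorithm), the (1 - 2θ)⁴ attenuation, and the normal approximation NCP ≈ (z(1-α/2) + z(power))² for a 1-df test. Absolute numbers will not match GPC; the direction and rough size of each effect is the point. The function name is ours.

```python
import math
from statistics import NormalDist


def pairs_required(v_a: float, r: float, theta: float = 0.0,
                   alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate number of sib pairs for the given power.

    Assumptions (ours): NCP per pair v_a^2 (1 + r^2) / (8 (1 - r^2)^2),
    attenuated by (1 - 2 theta)^4, and a required NCP of
    (z_{1-alpha/2} + z_{power})^2 (the far tail is ignored)."""
    z = NormalDist().inv_cdf
    ncp_needed = (z(1 - alpha / 2) + z(power)) ** 2
    ncp_pair = v_a ** 2 * (1 + r ** 2) / (8 * (1 - r ** 2) ** 2)
    ncp_pair *= (1 - 2 * theta) ** 4
    return math.ceil(ncp_needed / ncp_pair)


for v_a in (0.1, 0.2, 0.4):                   # 1. QTL variance
    print("v_a", v_a, pairs_required(v_a, r=0.3))
for r in (0.1, 0.3, 0.5):                     # 2. sibling correlation
    print("r", r, pairs_required(0.2, r))
for theta in (0.0, 0.05, 0.1, 0.2):           # 3. recombination fraction
    print("theta", theta, pairs_required(0.2, 0.3, theta))
```

Under these assumptions, halving the QTL variance roughly quadruples the pairs required, a higher sib correlation (at fixed QTL variance) reduces it, and θ = 0.1 inflates it by about 1/0.8⁴ ≈ 2.4.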

42
Pairs required (θ = 0, p = 0.05, power = 0.8)
43
Pairs required (θ = 0, p = 0.05, power = 0.8)
44
Effect of residual correlation
  • QTL additive effects account for 10% of trait variance
  • Sample size required for 80% power (α = 0.05)
  • No dominance
  • θ = 0.1
  • A: residual correlation 0.35
  • B: residual correlation 0.50
  • C: residual correlation 0.65

45
Individuals required
46
Selective genotyping
[Figure: selection schemes: Unselected, Proband Selection, EDAC, Maximally Dissimilar, ASP, Extreme Discordant, Mahalanobis Distance]
47
Selective genotyping
  • The power calculations so far assume an unselected population:
  • - calculate the expected NCP per sibship
  • If we have a sample with trait scores:
  • - calculate the expected NCP for each sibship conditional on its trait values
  • - this quantity can be used to rank-order the sample for genotyping
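A sketch of how a per-sibship quantity can rank a phenotyped sample, for sib pairs only and under a small-effect approximation that is our own derivation (not necessarily the index behind the slides): with null sib correlation r, the expected NCP contributed by a pair with standardised trait values x1, x2 is V_A²/8 times the squared derivative of the bivariate-normal log-likelihood with respect to r. Averaged over unselected pairs this recovers V_A²(1 + r²)/(8(1 - r²)²). Function names are ours.

```python
def pair_score(x1: float, x2: float, r: float) -> float:
    """Derivative with respect to r of the log-likelihood of a standardised
    bivariate-normal sib pair (x1, x2) with correlation r."""
    q = x1 * x1 + x2 * x2
    return (r * (1 - r * r) + (1 + r * r) * x1 * x2 - r * q) / (1 - r * r) ** 2


def expected_pair_ncp(x1: float, x2: float, r: float, v_a: float) -> float:
    """Small-effect expected NCP for one phenotyped sib pair:
    Var(pi) * v_a^2 * score^2, with Var(pi) = 1/8 for full sibs."""
    return v_a ** 2 / 8 * pair_score(x1, x2, r) ** 2


def rank_for_genotyping(pairs, r, v_a):
    """Order (x1, x2) pairs from most to least informative."""
    return sorted(pairs, key=lambda p: expected_pair_ncp(p[0], p[1], r, v_a),
                  reverse=True)


sample = [(0.0, 0.0), (2.0, 2.0), (2.0, -2.0), (1.0, 0.5), (-1.5, -1.8)]
for pair in rank_for_genotyping(sample, r=0.2, v_a=0.1):
    print(pair, round(expected_pair_ncp(*pair, r=0.2, v_a=0.1), 4))
```

For r = 0.2 the discordant pair (2, -2) ranks first, ahead of the concordant pair (2, 2), and the pair at the mean ranks last.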

48
Sibship informativeness: sib pairs
49
Sibship informativeness: sib pairs
[Figure panels: dominance; rare recessive; unequal allele frequencies]
50
Selective genotyping
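Columns: ASP, MaxD, PS, ED, EDAC, MDis, SEL B, SEL T

p      d/a    Values surviving extraction
.5     0      15.82
.1     0      17.10
.25    0      15.45
.1     1      16.88
.25    1      15.76
.5     1      18.89
.75    1      27.64, 43.16
.9     1
[Column positions of these values are not recoverable from the transcript]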
51
Impact of selection
52
QTL linkage
POWER
  • Type I errors
  • Type II errors
  • Sample N
  • Effect size
  • Variance explained
  • Marker vs functional variant
  • Locus informativeness
53
Indices of marker informativeness
  • Markers should be highly polymorphic
  • - alleles inherited from different sources are
    likely to be distinguishable
  • Heterozygosity (H)
  • Polymorphism Information Content (PIC)
  • - both measure the number and frequency of alleles at a locus

54
Heterozygosity
  • $H = 1 - \sum_{i=1}^{n} p_i^2$, the probability that an individual is heterozygous
  • n = number of alleles
  • $p_i$ = frequency of the ith allele

55
Heterozygosity
Allele    Frequency
1         0.20
2         0.35
3         0.05
4         0.40

Genotype frequencies (all genotypes)
11 0.04      12 0.14      13 0.02      14 0.16
22 0.1225    23 0.035     24 0.28
33 0.0025    34 0.04
44 0.16

Genotype frequencies (heterozygotes only)
12 0.14      13 0.02      14 0.16
23 0.035     24 0.28
34 0.04
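The heterozygosity example can be checked directly: under Hardy-Weinberg proportions the heterozygous genotype frequencies sum to 1 - Σ p_i². A short sketch; function names are ours, and the genotype keys assume single-digit allele labels.

```python
from itertools import combinations_with_replacement


def genotype_frequencies(allele_freqs):
    """Hardy-Weinberg genotype frequencies keyed by genotype string, e.g. '24'."""
    out = {}
    for (i, p), (j, q) in combinations_with_replacement(enumerate(allele_freqs, 1), 2):
        out[f"{i}{j}"] = p * p if i == j else 2 * p * q
    return out


def heterozygosity(allele_freqs):
    """H = 1 - sum_i p_i^2."""
    return 1 - sum(p * p for p in allele_freqs)


freqs = [0.20, 0.35, 0.05, 0.40]
g = genotype_frequencies(freqs)
het_total = sum(f for geno, f in g.items() if geno[0] != geno[1])
print(round(heterozygosity(freqs), 4), round(het_total, 4))
```

Both values come to 0.675 for this four-allele example.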
56
Polymorphism information content
  • IF a parent is heterozygous,
  • their gametes will usually be informative.
  • BUT if both parents & the child are heterozygous for the same genotype,
  • the origins of the child's alleles are ambiguous
  • IF C = the probability of this occurring,
  • PIC = H - C
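A sketch of PIC = H - C with C written out: the probability that both parents carry the same heterozygous genotype ij is (2 p_i p_j)², and a child is then also ij with probability 1/2, so C = Σ_{i<j} 2 p_i² p_j². This is the classical PIC formula; the function name is ours.

```python
from itertools import combinations


def pic(allele_freqs):
    """PIC = H - C, with H = 1 - sum p_i^2 and C = sum_{i<j} 2 p_i^2 p_j^2."""
    h = 1 - sum(p * p for p in allele_freqs)
    # (2 p_i p_j)^2 * 1/2 = 2 p_i^2 p_j^2 for each unordered allele pair
    c = sum(2 * (p * q) ** 2 for p, q in combinations(allele_freqs, 2))
    return h - c


print(round(pic([0.20, 0.35, 0.05, 0.40]), 4))
```

For the four-allele heterozygosity example this gives PIC ≈ 0.612, against H = 0.675.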

57
Polymorphism information content
58
Possible IBD configurations given parental genotypes

Configuration   Parental mating type   Estimated P(IBD = 2)   Estimated π   Probability
1               Hom × Hom              1/4                    1/2           (1-H)²
2               Hom × Het              0                      1/4           H(1-H)
3               Hom × Het              1/2                    3/4           H(1-H)
4               Het × Het              0                      1/2           H²/2
5               Het × Het              0                      0             (H²-C)/4
6               Het × Het              1                      1             (H²-C)/4
7               Het × Het              1/2                    1/2           C/2
59
PIC NCP for linkage
  • From the table of possible IBD configurations
    given parental genotypes,
  • Therefore, NCP is attenuated in proportion to
    PIC
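The step from the configuration table to PIC can be checked numerically. A sketch, reading the second of the two configuration values in each row (1/2, 1/4, 3/4, 1/2, 0, 1, 1/2) as π̂, the estimated proportion IBD; its expectation over the seven rows is 1/2 for any H and C. The variance of π̂ comes out as (H - C)/8 = PIC/8. Since π̂ is the expected π given the marker data, Cov(π, π̂) = Var(π̂), and with Var(π) = 1/8 the squared correlation between π and π̂ is PIC. The function name is ours.

```python
def pihat_moments(h: float, c: float):
    """Total probability, mean and variance of pi-hat over the seven
    parental configurations (h = heterozygosity, c = ambiguity probability)."""
    rows = [
        (0.50, (1 - h) ** 2),        # 1 Hom x Hom
        (0.25, h * (1 - h)),         # 2 Hom x Het
        (0.75, h * (1 - h)),         # 3 Hom x Het
        (0.50, h * h / 2),           # 4 Het x Het
        (0.00, (h * h - c) / 4),     # 5 Het x Het
        (1.00, (h * h - c) / 4),     # 6 Het x Het
        (0.50, c / 2),               # 7 Het x Het, ambiguous
    ]
    total = sum(p for _, p in rows)
    mean = sum(x * p for x, p in rows)
    var = sum((x - mean) ** 2 * p for x, p in rows)
    return total, mean, var


# H and C for the four-allele example (allele frequencies 0.20, 0.35, 0.05, 0.40)
total, mean, var = pihat_moments(0.675, 0.0634125)
print(round(total, 6), round(mean, 6), round(8 * var, 6))   # 8 * Var = PIC
```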

60
QTL linkage
POWER
  • Type I errors
  • Type II errors
  • Sample N
  • Effect size
  • Variance explained
  • Marker vs functional variant
  • Locus informativeness
  • Multipoint
61
Multipoint IBD
  • Estimates IBD sharing at any arbitrary point
    along a chromosomal region, using all available
    marker information on a chromosome
    simultaneously.

62
?, ?, ? and PIC

63
Markers M1 to M5, spaced 5 cM apart; allele frequencies:
M1: 0.1 0.2 0.7
M2: 0.2 0.2 0.2 0.2 0.2
M3: 0.1 0.1 0.1 0.1 0.1 0.2 0.1 0.2
M4: 0.2 0.1 0.1 0.2 0.2 0.2
M5: 0.2 0.2 0.2 0.2 0.2
64
3. Calculate the covariance matrix between π̂ at the markers
65
4. Consider each multipoint position
  • At each position along the chromosome, calculate
    covariance between trait locus and each of the
    markers

MD (map distance, cM)        10      15      20      25      30
PIC                          0.41    0.77    0.84    0.79    0.77
RF (recombination fraction)  0.091   0.130   0.165   0.197   0.226
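The PIC and RF rows can be reproduced from the M1 to M5 allele frequencies on the previous slide and the Haldane map function θ = (1 - e^(-2d))/2, with d in Morgans. Pairing M1 to M5 with MD = 10 to 30 cM, in order, is our reading of the slide; function names are ours.

```python
import math
from itertools import combinations


def haldane_rf(cm: float) -> float:
    """Haldane map function: recombination fraction for a distance in cM."""
    return 0.5 * (1 - math.exp(-2 * cm / 100))


def pic(freqs):
    """PIC = H - C, with C = sum_{i<j} 2 p_i^2 p_j^2."""
    h = 1 - sum(p * p for p in freqs)
    return h - sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))


markers = {
    "M1": [0.1, 0.2, 0.7],
    "M2": [0.2] * 5,
    "M3": [0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 0.1, 0.2],
    "M4": [0.2, 0.1, 0.1, 0.2, 0.2, 0.2],
    "M5": [0.2] * 5,
}
for (name, freqs), md in zip(markers.items(), (10, 15, 20, 25, 30)):
    print(name, md, round(pic(freqs), 2), round(haldane_rf(md), 3))
```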
66
Fulker et al multipoint
  • If π̂ is a vector of single-marker IBD estimates, then a multipoint IBD estimate at test position t is given by
  • Conditional on π̂, the variance of π at the test position is reduced by a quantity which can be thought of as a multipoint PIC
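A minimal sketch of a multipoint PIC in this spirit, for sib pairs, under simplifying assumptions that are ours: π̂_j at marker j behaves like PIC_j·π_j plus independent noise, so Var(π̂_j) = PIC_j/8, Cov(π̂_j, π̂_k) = PIC_j·PIC_k·(1 - 2θ_jk)²/8 and Cov(π_t, π̂_j) = PIC_j·(1 - 2θ_tj)²/8, with Haldane recombination fractions. The regression weights are b = S⁻¹c, the variance of π at t falls by c'S⁻¹c, and 8·c'S⁻¹c is used as the multipoint PIC. This is an illustration, not Fulker et al.'s published implementation; all names are ours.

```python
import math


def haldane_rf(cm):
    return 0.5 * (1 - math.exp(-2 * abs(cm) / 100))


def ibd_corr(cm):
    """Correlation of sib-pair pi at two loci cm apart: (1 - 2 theta)^2."""
    return (1 - 2 * haldane_rf(cm)) ** 2


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def multipoint_pic(test_cm, marker_cm, marker_pic):
    """8 * c' S^-1 c: proportional reduction in Var(pi) at the test position."""
    n = len(marker_cm)
    s = [[marker_pic[j] / 8 if j == k else
          marker_pic[j] * marker_pic[k] * ibd_corr(marker_cm[j] - marker_cm[k]) / 8
          for k in range(n)] for j in range(n)]
    c = [marker_pic[j] * ibd_corr(test_cm - marker_cm[j]) / 8 for j in range(n)]
    b = solve(s, c)                      # regression weights on single-point pi-hats
    return 8 * sum(bj * cj for bj, cj in zip(b, c))


positions, pics = [10, 15, 20, 25, 30], [0.41, 0.77, 0.84, 0.79, 0.77]
for t in range(0, 45, 5):
    print(t, round(multipoint_pic(t, positions, pics), 3))
```

With a single marker this reduces to PIC·(1 - 2θ)⁴, the single-marker attenuation; adding markers can only increase it.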

67
5. Calculate MPIC
68
10 cM map
69
5 cM map
70
Exclusion mapping
  • Exclusion: support for the hypothesis that a
    QTL of at least a certain effect is absent at
    that position
  • Normally, the LRT compares the likelihood at the
    MLE and the null
  • In exclusion mapping, the LRT compares the
    likelihood of a fixed effect size against the
    null and therefore can be negative

71
Conclusions
  • Factors influencing power
  • QTL variance
  • Sib correlation
  • Sibship size
  • Marker informativeness
  • Marker density
  • Phenotypic selection