Title: Statistical Digression
1. Statistical Digression
- The p-value
- Under the null hypothesis, the probability that
you observe your data or something more extreme
- Distribution of the test statistic under the null
hypothesis (integrates to 1)
- F
- t
- Chi-square
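As a small illustration of the p-value definition above, the sketch below computes the upper-tail probability for a chi-square statistic with 1 degree of freedom, using only the Python standard library. The function name `chi2_1df_pvalue` and the example statistics are assumptions made for illustration; they are not from the slides.

```python
import math


def chi2_1df_pvalue(x: float) -> float:
    """Upper-tail p-value P(X >= x) for a chi-square variable with 1 df.

    A chi-square(1) variable is the square of a standard normal Z, so
    P(Z^2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(x / 2.0))


if __name__ == "__main__":
    # Illustrative statistics only; larger values give smaller p-values.
    for stat in (1.0, 3.84, 6.63, 10.83):
        print(f"statistic = {stat:5.2f}  p-value = {chi2_1df_pvalue(stat):.4f}")
```

The same idea applies to the F and t distributions, but their tail probabilities need a numerical library rather than a one-line closed form.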
2. The Decision
- Reject the null
- Fail to reject the null
- Truth versus decision
- H0: no change
- H1: difference

                           The decision
                           H0 (no diff)    H1 (diff)
The truth   H0 (no diff)   $1-\alpha$      $\alpha$ (significance level)
            H1 (diff)      $\beta$         $1-\beta$ (power)
3. Distribution of the test statistic under the
alternative hypothesis
[Figure: null distribution and alternative distribution of the test statistic, with the areas $\alpha$ and $\beta$ marked]
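To make the relationship between the null and alternative distributions concrete, here is a minimal sketch of a power calculation for a 1 degree of freedom chi-square test. It assumes the test statistic under the alternative is noncentral chi-square with 1 df, which is the square of a normal variable with mean equal to the square root of the noncentrality parameter. The function name and the example values are illustrative assumptions.

```python
from statistics import NormalDist


def power_chi2_1df(noncentrality: float, alpha: float = 0.05) -> float:
    """Power of a 1-df chi-square test at significance level alpha.

    Under the alternative the statistic is (Z + sqrt(noncentrality))^2 with
    Z standard normal, so the test rejects when |Z + shift| exceeds the
    two-sided normal critical value.
    """
    std_normal = NormalDist()
    crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = noncentrality ** 0.5
    upper = 1.0 - std_normal.cdf(crit - shift)   # reject in the upper tail
    lower = std_normal.cdf(-crit - shift)        # reject in the lower tail
    return upper + lower


if __name__ == "__main__":
    for ncp in (0.0, 2.0, 5.0, 10.0):
        print(f"noncentrality = {ncp:4.1f}  power = {power_chi2_1df(ncp):.3f}")
```

With a noncentrality of zero the alternative coincides with the null, so the power equals the significance level.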
4. Allele Frequency
- Affects power!
- Controlled crosses work very well because they
control allele frequency.

                          Disease status
                          (no dis)   (disease)
Marker type   Allele 1    n11        n12         n1.
              Allele 2    n21        n22         n2.

Power is maximized when n1. = n2.
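One way to see why balanced row totals help: under the null (equal disease proportions in both allele groups), the sampling variance of the difference in disease proportions is proportional to 1/n1. + 1/n2. For a fixed total sample size this factor is smallest when the two row totals are equal. The sketch below checks this numerically; the function names and the total of 200 are hypothetical.

```python
def variance_factor(n1: int, n2: int) -> float:
    """Factor 1/n1 + 1/n2 that scales the variance of a difference
    between two group proportions (smaller means more power)."""
    return 1.0 / n1 + 1.0 / n2


def best_split(total: int) -> int:
    """Row total n1 (with n2 = total - n1) that minimizes the variance factor."""
    candidates = range(1, total)
    return min(candidates, key=lambda n1: variance_factor(n1, total - n1))


if __name__ == "__main__":
    total = 200  # hypothetical number of sampled alleles
    for n1 in (20, 50, 100, 150, 180):
        print(f"n1. = {n1:3d}  n2. = {total - n1:3d}  "
              f"variance factor = {variance_factor(n1, total - n1):.4f}")
    print("best split:", best_split(total))
```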
5. Effective population size
- The number of individuals in a population who
contribute offspring to the next generation
- What matters is the chance that two copies of a
gene will be sampled as the next generation is
produced, and this is affected by the breeding
structure of the population
- The number of offspring any individual leaves to
the next generation is a binomial random variable
- In a population of size N there will be 2N alleles at
any one locus
- IBS: identical by state
- IBD: identical by descent
- Frequency of the two sexes (if they exist in your
organism)
- Overlapping generations?
- Non-random mating success
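The effect of the sex ratio listed above can be sketched with the standard formula for effective population size with unequal numbers of breeding males and females, Ne = 4 Nm Nf / (Nm + Nf). The slides do not state this formula, so treat it as a supporting assumption; the function and argument names are illustrative.

```python
def effective_size_sex_ratio(n_males: int, n_females: int) -> float:
    """Effective population size for unequal numbers of breeding males and
    females: Ne = 4 * Nm * Nf / (Nm + Nf)."""
    if n_males <= 0 or n_females <= 0:
        raise ValueError("both sexes must contribute at least one breeder")
    return 4.0 * n_males * n_females / (n_males + n_females)


if __name__ == "__main__":
    # Same census size of 100 breeders, increasingly skewed sex ratio.
    for males, females in ((50, 50), (25, 75), (10, 90), (1, 99)):
        ne = effective_size_sex_ratio(males, females)
        print(f"males = {males:2d}  females = {females:2d}  Ne = {ne:.1f}")
```

A skewed sex ratio pulls Ne well below the census size, which is one reason Ne and N can differ substantially.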
6. Ne
- Small
- Number of haplotypes is small
- Linkage disequilibrium is high
- Big
- Number of haplotypes is large
- Linkage disequilibrium is low
7. Linkage Disequilibrium
- In an association mapping context should it be
big or small?
8. Socialized medicine
- Track everyone in the population
- Know who is related and how
- Know disease phenotypes
- ICELAND
- Denmark
- Finland
- The Netherlands
- Norway
9. Iceland sells its medical records, pitting
privacy against greater good
- March 3, 2000, CNN.com
- Web posted at 4:09 a.m. EST (0909 GMT)
- From staff reports
- REYKJAVIK, Iceland (CNN) -- Iceland has sold the
medical and genealogy records of its 275,000
citizens to a private medical research company,
turning the entire nation into a virtual petri
dish in hopes of finding cures to diseases that
have afflicted humans for ages.
10. Other isolated populations
- Tonga sells genetic heritage to Australian firm.
The Lancet, Volume 356, Issue 9245, Page 1910
- Estonia sells its genetic database. Archive -
Friday, 10 November, 2000
11. Example
- Association of variants of transcription factor
7-like 2 (TCF7L2) with susceptibility to type 2
diabetes in the Dutch Breda cohort. van
Vliet-Ostaptchouk JV, Shiri-Sverdlov R,
Zhernakova A, Strengman E, van Haeften TW,
Hofker MH, Wijmenga C. Diabetologia 50 (1): 59-62,
Jan 2007
- Aim/hypothesis: A strong association between
susceptibility to type 2 diabetes and common
variants of transcription factor 7-like 2
(TCF7L2), encoding an enteroendocrine
transcription factor involved in glucose
homeostasis, has been reported in three different
populations (Iceland, Denmark and USA) by Grant
et al. We aimed to replicate these findings in a
Dutch cohort.
- Methods: We analysed the genotypes of two intronic
single nucleotide polymorphisms (SNPs) in the TCF7L2
gene in 502 unrelated type 2 diabetes patients
and in a set of healthy controls (n = 920). The
two SNPs showed almost complete linkage
disequilibrium (D' = 0.91).
- Results: We were able to replicate the previously
reported association in our Breda cohort. The
minor alleles of both variants were significantly
over-represented in cases (odds ratio [OR] 1.29,
95% CI 1.09-1.52, $p = 10^{-3}$ for rs12255372; OR
1.41, 95% CI 1.19-1.66, $p = 4.4 \times 10^{-5}$ for
rs7903146). In addition, TCF7L2 haplotypes were
analysed for association with the disease. The
analysis of haplotypes did not reveal any strong
association beyond that expected from analysing
individual SNPs. The TT
- Conclusions/interpretation: Our data strongly
confirm that variants of the TCF7L2 gene
contribute to the risk of type 2 diabetes. The
population-attributable risk from this factor in
the Dutch type 2 diabetes population is 10%.
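As a rough consistency check on reported results like these, an odds ratio and its 95% confidence interval can be converted back into an approximate z-score and two-sided p-value. The sketch below assumes the interval was built on the log-odds scale with a normal approximation, which the abstract does not state. Because the reported values are rounded, the back-calculated p-values only approximate the published ones.

```python
import math


def p_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float):
    """Approximate z-score and two-sided p-value from an odds ratio and its
    95% confidence interval, assuming a normal approximation on the log scale."""
    z_95 = 1.959964  # two-sided 95% normal critical value
    se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_95)
    z = math.log(odds_ratio) / se_log_or
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Values as reported in the abstract for the two SNPs.
    for snp, (or_, lo, hi) in {
        "rs12255372": (1.29, 1.09, 1.52),
        "rs7903146": (1.41, 1.19, 1.66),
    }.items():
        z, p = p_from_or_ci(or_, lo, hi)
        print(f"{snp}: z = {z:.2f}, approximate p = {p:.2e}")
```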
12. Only humans?
- Fragmented habitat
- Ecological niches
- Bottlenecks
- Trees: Neale DB, Savolainen O. Association genetics
of complex traits in conifers. Trends in Plant
Science 9 (7): 325-330, Jul 2004
13. Admixture
- Historically separated populations come together
- Haplotypes are in LD
14. Collaborative Cross
Nat Genet. 2004 Nov;36(11):1133-7.
15. Recombination
- Breaks up LD
- Good for finding the gene
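How quickly recombination breaks up LD can be sketched with the standard random-mating result that the disequilibrium coefficient shrinks by a factor of (1 - r) each generation, where r is the recombination fraction between the two loci. This formula is not on the slide; the function name, starting value, and recombination fractions below are illustrative assumptions.

```python
def ld_after_generations(d0: float, recomb_fraction: float, generations: int) -> float:
    """Expected disequilibrium after t generations of random mating:
    D_t = D_0 * (1 - r)^t."""
    if not 0.0 <= recomb_fraction <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return d0 * (1.0 - recomb_fraction) ** generations


if __name__ == "__main__":
    d0 = 0.20  # hypothetical starting disequilibrium
    for r in (0.5, 0.1, 0.01, 0.001):
        decayed = ld_after_generations(d0, r, 20)
        print(f"r = {r:5.3f}  D after 20 generations = {decayed:.4f}")
```

Tightly linked markers (small r) keep their association for many generations, while loosely linked ones lose it quickly, which is why LD that survives recombination helps localize the gene.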
16.
                Disease
                A       U
Marker   M1     n11     n12
         M2     n21     n22

Sampling and population stratification can
cause problems
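For a 2x2 marker-by-disease table like the one above, a common association test is the Pearson chi-square statistic with 1 degree of freedom. The sketch below is a minimal standard-library version with no continuity correction. The counts are hypothetical, and the test does nothing to correct for the sampling or stratification problems the slide warns about.

```python
import math


def chi2_2x2(n11: int, n12: int, n21: int, n22: int):
    """Pearson chi-square statistic (1 df, no continuity correction) and its
    p-value for a 2x2 table of marker allele by disease status."""
    total = n11 + n12 + n21 + n22
    row1, row2 = n11 + n12, n21 + n22
    col1, col2 = n11 + n21, n12 + n22
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    stat = total * (n11 * n22 - n12 * n21) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square(1)
    return stat, p_value


if __name__ == "__main__":
    stat, p = chi2_2x2(30, 10, 10, 30)  # hypothetical counts
    print(f"chi-square = {stat:.2f}, p = {p:.2e}")
```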
17. The TDT
- Based upon simplex pedigree data: no sampling
problem
- Conditional upon the parental alleles: population
stratification will not bias results
- Spielman and Ewens (1993), Terwilliger and Ott
(1993)
18.
[Figure: simplex pedigree showing the marker genotypes of the Father, the Mother, and the Affected Child, with alleles labeled PT, PU, MT, and MU]
For any Simplex Family there is a set of alleles
that is transmitted (T) to the affected child and
a set of alleles that were not transmitted
(U). If phase is known, maternally transmitted
(MT) can be distinguished from paternally
transmitted (PT).
19.
                     Not transmitted
                     M1      M2
Transmitted   M1     a       b
              M2     c       d

$\mathrm{TDT} = (b - c)^2 / (b + c)$
McNemar's test: $\chi^2_1$
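The TDT statistic uses only the off-diagonal cells of the transmitted by not-transmitted table: b (M1 transmitted, M2 not) and c (M2 transmitted, M1 not). A minimal sketch, with hypothetical counts and an illustrative function name:

```python
import math


def tdt(b: int, c: int):
    """TDT (McNemar) statistic (b - c)^2 / (b + c) and its p-value from the
    chi-square distribution with 1 degree of freedom."""
    if b + c == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    stat = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square(1)
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical data: 60 transmissions of M1 and 40 of M2
    # from heterozygous parents.
    stat, p = tdt(60, 40)
    print(f"TDT = {stat:.2f}, p = {p:.4f}")
```

Cells a and d come from homozygous parents, who carry no information about preferential transmission, so they do not enter the statistic.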
20. The Noncentrality Parameter
- Kaplan, Martin and Weir 1997 show that the
noncentrality parameter $\lambda$ for the TDT is
$2N(1-\theta)I$
- where $\theta$ is the recombination between the marker
and the disease in the population, $I$ is the
association between affected status and the
marker, and $N$ is the number of affected children
21. Extensions of the TDT
- Multiple alleles: Sham and Curtis 1995, Spielman
and Ewens 1996
22. Multiple Testing
- Each marker is tested separately for association
with the disease
- This is a multiple testing situation
- Lander and Kruglyak (1997), Spielman and Ewens
(1996), Risch and Merikangas (1997)
23. Bonferroni?
- The standard Bonferroni procedure is to adjust
the significance level of the test by the number
of tests performed
- $\alpha_T$ is the experimentwide type I error rate
- $\alpha_T / l$ is the Bonferroni corrected type I error for
each test performed
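The Bonferroni adjustment described above amounts to dividing the experimentwide error rate by the number of tests and comparing each p-value to that threshold. A minimal sketch, with hypothetical p-values and illustrative function names:

```python
def bonferroni_threshold(alpha_total: float, n_tests: int) -> float:
    """Per-test significance level: experimentwide alpha divided by the
    number of tests performed."""
    if n_tests < 1:
        raise ValueError("need at least one test")
    return alpha_total / n_tests


def bonferroni_reject(p_values, alpha_total: float = 0.05):
    """Return a list of booleans: True where the test is rejected at the
    Bonferroni-corrected level."""
    threshold = bonferroni_threshold(alpha_total, len(p_values))
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    p_values = [0.001, 0.02, 0.04]  # hypothetical per-marker p-values
    print("per-test threshold:", bonferroni_threshold(0.05, len(p_values)))
    print("rejected:", bonferroni_reject(p_values))
```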
24. Bonferroni and power
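The cost of the Bonferroni correction in power can be sketched numerically. The example below assumes a 1 degree of freedom chi-square test whose statistic under the alternative is noncentral chi-square, and shows how power falls as the number of tests grows while the effect size stays fixed. The noncentrality value and the function name are hypothetical.

```python
from statistics import NormalDist


def power_after_bonferroni(noncentrality: float, n_tests: int,
                           alpha_total: float = 0.05) -> float:
    """Power of one 1-df chi-square test when the experimentwide level
    alpha_total is split equally over n_tests tests."""
    std_normal = NormalDist()
    alpha_per_test = alpha_total / n_tests
    crit = std_normal.inv_cdf(1.0 - alpha_per_test / 2.0)
    shift = noncentrality ** 0.5
    # Reject when |Z + shift| exceeds the two-sided critical value.
    return (1.0 - std_normal.cdf(crit - shift)) + std_normal.cdf(-crit - shift)


if __name__ == "__main__":
    ncp = 10.0  # hypothetical noncentrality parameter
    for n_tests in (1, 10, 100, 1000, 10000):
        print(f"tests = {n_tests:5d}  power = {power_after_bonferroni(ncp, n_tests):.3f}")
```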
25. When are the individual TDT tests independent
under the null?
- The null is no association between the marker and
the disease
- Tests are independent if the markers are not
associated
- Tests are NOT independent if the markers are
associated
26. Dense Concentration of Markers
Tests in this region are unlikely to be
independent
27. An extreme example
- If marker M has two alleles with frequencies .7
and .3
- If marker N has two alleles with frequencies .7
and .3
- If $P(M_1 N_1) = .7$, $P(M_1 N_2) = 0$, $P(M_2 N_1) = 0$, and
$P(M_2 N_2) = .3$
- Then $\mathrm{TDT}_M = \mathrm{TDT}_N$
28. Association between Markers
- $P(M_i N_k) = P(M_i)P(N_k) + \delta$; markers are associated when
$\delta \neq 0$
- This can happen due to recent population
admixture (Mexican Americans) or isolated
populations (Finns, Old Order Amish)
- This can also happen if markers are tightly linked
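Association between two biallelic markers is usually summarized by how far the joint haplotype frequency departs from the product of the allele frequencies. The sketch below computes that departure (commonly written D) together with the standard normalized measures D' and r², neither of which is defined on the slides. It is run on the frequencies from the extreme example in slide 27; the function and argument names are illustrative.

```python
def ld_measures(p_m1n1: float, p_m1: float, p_n1: float):
    """Return (D, D', r^2) for two biallelic markers, given the M1N1 haplotype
    frequency and the M1 and N1 allele frequencies."""
    d = p_m1n1 - p_m1 * p_n1
    if d >= 0:
        d_max = min(p_m1 * (1 - p_n1), (1 - p_m1) * p_n1)
    else:
        d_max = min(p_m1 * p_n1, (1 - p_m1) * (1 - p_n1))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d ** 2 / (p_m1 * (1 - p_m1) * p_n1 * (1 - p_n1))
    return d, d_prime, r_squared


if __name__ == "__main__":
    # Slide 27: P(M1N1) = .7 with both M1 and N1 at frequency .7.
    d, d_prime, r2 = ld_measures(0.7, 0.7, 0.7)
    print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
    # Independent markers with the same allele frequencies.
    d, d_prime, r2 = ld_measures(0.49, 0.7, 0.7)
    print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

In the extreme example the two markers carry exactly the same information, which is why their TDT statistics coincide.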
29. TDTMAX
- TDTMAX is defined as the maximum TDT statistic in
the set of all tests conducted (all markers
biallelic)
- Calculate the TDT for each marker locus
- Pick the maximum
- For multiple alleles, the minimum p-value is
equivalent
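The two steps above (calculate the TDT at each marker, then pick the maximum) can be sketched as follows for biallelic markers. The marker names and transmission counts are hypothetical. The sketch only computes the statistic: because the per-marker tests may be correlated, judging the significance of the maximum needs a reference distribution that accounts for the number and dependence of the tests, which this example does not attempt.

```python
def tdt_statistic(b: int, c: int) -> float:
    """TDT statistic (b - c)^2 / (b + c) for one biallelic marker."""
    if b + c == 0:
        raise ValueError("no informative transmissions at this marker")
    return (b - c) ** 2 / (b + c)


def tdt_max(counts_by_marker):
    """Return (marker, statistic) for the marker with the largest TDT.

    counts_by_marker maps a marker name to its (b, c) transmission counts.
    """
    stats = {name: tdt_statistic(b, c) for name, (b, c) in counts_by_marker.items()}
    best = max(stats, key=stats.get)
    return best, stats[best]


if __name__ == "__main__":
    counts = {  # hypothetical (b, c) counts per marker
        "marker_1": (60, 40),
        "marker_2": (55, 45),
        "marker_3": (70, 30),
    }
    marker, stat = tdt_max(counts)
    print(f"TDTMAX = {stat:.2f} at {marker}")
```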