Statistical Digression - PowerPoint PPT Presentation

Slides: 30
Provided by: McI86

1
Statistical Digression
  • The p-value
  • The probability, under the null hypothesis, of
    observing your data or something more extreme
  • Computed from the distribution of the test
    statistic under the null hypothesis (which
    integrates to 1), for example:
  • F
  • t
  • Chi-square
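
As a minimal sketch of the p-value definition, the block below computes the upper-tail probability for a chi-square statistic with 1 degree of freedom. The function name `chi2_1df_pvalue` and the example statistic are illustrative assumptions, not part of the slides; the closed form used only holds for 1 df.

```python
import math


def chi2_1df_pvalue(statistic: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    With 1 df the statistic is the square of a standard normal Z, so
    P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(statistic / 2.0))


# Illustrative value: 3.841 is close to the familiar 5% critical value for 1 df.
p_value = chi2_1df_pvalue(3.841)
print(f"p = {p_value:.4f}")
```

The p-value is the area of the null distribution at or beyond the observed statistic, which is why the null density has to integrate to 1.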

2
The Decision
  • Reject the null - fail to reject the null
  • Truth versus decision
  • H0: no change
  • H1: difference

                          The Decision
                          H0 (no diff)   H1 (diff)
The truth   H0 (no diff)  1-α            α (significance level)
            H1 (diff)     β              1-β (power)
3
Distribution of the test statistic under the
alternative hypothesis
  • Non-centrality parameter

[Figure: null distribution and alternative distribution of the test statistic, with the regions α and β marked]
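
To make the link between the non-centrality parameter and power concrete, here is a hedged sketch for a 1-df chi-square test. It assumes the test statistic is a squared, shifted standard normal under the alternative; the function name `power_chi2_1df` and the example values are illustrative, not from the slides.

```python
from statistics import NormalDist


def power_chi2_1df(noncentrality: float, alpha: float = 0.05) -> float:
    """Power of a 1-df chi-square test for a given non-centrality parameter.

    Under the alternative the statistic is (Z + shift)**2 with
    shift = sqrt(noncentrality), so the test rejects when |Z + shift|
    exceeds the two-sided normal critical value.
    """
    if noncentrality < 0:
        raise ValueError("the non-centrality parameter cannot be negative")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = noncentrality ** 0.5
    upper_tail = 1.0 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


for ncp in (0.0, 2.0, 7.85, 15.0):
    print(f"non-centrality {ncp:5.2f} -> power {power_chi2_1df(ncp):.3f}")
```

With a non-centrality of zero the alternative coincides with the null, so the "power" equals the significance level; power then rises as the alternative distribution shifts away from the null.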
4
Allele Frequency
  • Affects power!
  • Controlled crosses work very well because they
    control allele frequency.

                        Disease Status
                        (no dis)   (disease)
Marker Type  Allele 1   n11        n12         n1.
             Allele 2   n21        n22         n2.
Power is maximized when n1. = n2.
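
One way to see why balanced allele margins help, sketched under the assumption of a fixed total sample and a fixed difference in disease proportion between the two allele rows: the variance of that difference scales with 1/n1. + 1/n2., which is smallest when the two margins are equal. The total of 200 and the helper name `margin_variance_factor` are hypothetical.

```python
def margin_variance_factor(n1: int, n2: int) -> float:
    """Return 1/n1 + 1/n2, the factor that scales the variance of the
    difference in disease proportions between the two allele rows."""
    return 1.0 / n1 + 1.0 / n2


total = 200  # hypothetical total number of sampled alleles

# Search every split of the total into two non-empty row margins.
best_n1 = min(
    range(1, total),
    key=lambda n1: margin_variance_factor(n1, total - n1),
)
print(best_n1, total - best_n1)
```

A smaller variance factor means a larger test statistic for the same true effect, which is the sense in which controlled crosses, with their even allele frequencies, work well.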
5
Effective population size
  • the number of individuals in a population who
    contribute offspring to the next generation
  • what matters is the chance that two copies of a
    gene will be sampled as the next generation is
    produced, and this is affected by the breeding
    structure of the population
  • The number of offspring any individual leaves to
    the next generation is a binomial random variable
  • In a diploid population of size N there are 2N
    alleles at any one locus
  • IBS: identical by state
  • IBD: identical by descent
  • Frequency of the two sexes (if they exist in your
    organism)
  • Overlapping generations?
  • Non-random mating success
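
The slides do not give a formula, but the effect of an uneven sex ratio can be illustrated with the standard textbook approximation Ne = 4·Nm·Nf / (Nm + Nf). This is a sketch that assumes random mating and non-overlapping generations; the function name and the breeder counts are hypothetical.

```python
def effective_size_sex_ratio(n_males: int, n_females: int) -> float:
    """Effective population size under an unequal sex ratio,
    using the textbook approximation Ne = 4 * Nm * Nf / (Nm + Nf)."""
    if n_males <= 0 or n_females <= 0:
        raise ValueError("both sexes must contribute at least one breeder")
    return 4.0 * n_males * n_females / (n_males + n_females)


# Hypothetical populations, each with 100 breeding individuals.
print(effective_size_sex_ratio(50, 50))
print(effective_size_sex_ratio(10, 90))
```

Both populations have the same census count of breeders, yet the skewed one behaves like a much smaller population, which is the point of distinguishing Ne from N.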

6
Ne
  • Small
  • Number of haplotypes is small
  • Linkage disequilibrium is high
  • Big
  • Number of haplotypes is large
  • Linkage disequilibrium is low

7
Linkage Disequilibrium
  • In an association mapping context should it be
    big or small?

8
Socialized medicine
  • Track everyone in the population
  • Know who is related and how
  • Know disease phenotypes
  • ICELAND
  • Denmark
  • Finland
  • The Netherlands
  • Norway

9
Iceland sells its medical records, pitting
privacy against greater good
  • March 3, 2000, CNN.com
  • Web posted at 4:09 a.m. EST (09:09 GMT)
  • From staff reports
  • REYKJAVIK, Iceland (CNN) -- Iceland has sold the
    medical and genealogy records of its 275,000
    citizens to a private medical research company,
    turning the entire nation into a virtual petri
    dish in hopes of finding cures to diseases that
    have afflicted humans for ages.

10
Other isolated populations
  • Tonga sells genetic heritage to Australian firm.
    The Lancet, Volume 356, Issue 9245, page 1910
  • Estonia sells its genetic database. Archive,
    Friday, 10 November 2000

11
Example
  • Association of variants of transcription factor
    7-like 2 (TCF7L2) with susceptibility to type 2
    diabetes in the Dutch Breda cohort. van
    Vliet-Ostaptchouk JV, Shiri-Sverdlov R,
    Zhernakova A, Strengman E, van Haeften TW,
    Hofker MH, Wijmenga C. Diabetologia 50(1):59-62,
    Jan 2007
  • Aim/hypothesis: A strong association between
    susceptibility to type 2 diabetes and common
    variants of transcription factor 7-like 2
    (TCF7L2), encoding an enteroendocrine
    transcription factor involved in glucose
    homeostasis, has been reported in three different
    populations (Iceland, Denmark and USA) by Grant
    et al. We aimed to replicate these findings in a
    Dutch cohort.
  • Methods: We analysed the genotypes of two intronic
    single nucleotide polymorphisms (SNPs) in the
    TCF7L2 gene in 502 unrelated type 2 diabetes
    patients and in a set of healthy controls
    (n = 920). The two SNPs showed almost complete
    linkage disequilibrium (D' = 0.91).
  • Results: We were able to replicate the previously
    reported association in our Breda cohort. The
    minor alleles of both variants were significantly
    over-represented in cases (odds ratio [OR] 1.29,
    95% CI 1.09-1.52, p 10(-3) for rs12255372; OR
    1.41, 95% CI 1.19-1.66, p = 4.4 x 10(-5) for
    rs7903146). In addition, TCF7L2 haplotypes were
    analysed for association with the disease. The
    analysis of haplotypes did not reveal any strong
    association beyond that expected from analysing
    individual SNPs. The TT
  • Conclusions/interpretation: Our data strongly
    confirm that variants of the TCF7L2 gene
    contribute to the risk of type 2 diabetes. The
    population-attributable risk from this factor in
    the Dutch type 2 diabetes population is 10%.

12
Only humans?
  • Fragmented habitat
  • Ecological niches
  • Bottlenecks
  • Trees: Neale DB, Savolainen O. Association
    genetics of complex traits in conifers. Trends in
    Plant Science 9(7):325-330, Jul 2004

13
Admixture
  • Historically separated populations come together
  • Haplotypes are in LD

14
Collaborative Cross
Nat Genet. 2004 Nov;36(11):1133-7.
15
Recombination
  • Breaks up LD
  • Good for finding the gene

16
                 Disease
                 A      U
Marker   M1      n11    n12
         M2      n21    n22
Sampling and population stratification can
cause problems
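
For reference, the usual case-control analysis of a 2x2 marker-by-disease table is a Pearson chi-square test with 1 df. The sketch below uses the cell names n11, n12, n21 and n22 from the table; the counts are hypothetical, and the function name `chi2_2x2` is an assumption for illustration.

```python
import math


def chi2_2x2(n11: int, n12: int, n21: int, n22: int):
    """Pearson chi-square statistic (no continuity correction) and
    1-df p-value for a 2x2 table of marker allele by disease status."""
    n = n11 + n12 + n21 + n22
    row1, row2 = n11 + n12, n21 + n22
    col1, col2 = n11 + n21, n12 + n22
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column margin must be non-zero")
    statistic = n * (n11 * n22 - n12 * n21) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: allele M1 is more common among affected individuals.
statistic, p_value = chi2_2x2(30, 20, 20, 30)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
```

Nothing in this calculation knows how the individuals were sampled or whether they come from one population, which is why stratification can produce a significant result without any true association.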
17
The TDT
  • Based upon simplex pedigree data: no sampling
    problem
  • Conditional upon the parental alleles: population
    stratification will not bias results
  • Spielman and Ewens (1993), Terwilliger and Ott
    (1993)

18
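[Figure: simplex family pedigree (father, mother, affected child) with marker genotypes, labeling the paternally and maternally transmitted alleles (PT, MT) and the paternally and maternally untransmitted alleles (PU, MU)]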
For any Simplex Family there is a set of alleles
that is transmitted (T) to the affected child and
a set of alleles that were not transmitted
(U). If phase is known, maternally transmitted
(MT) can be distinguished from paternally
transmitted (PT).
19
                     Not Transmitted
                     M1     M2
Transmitted   M1     a      b
              M2     c      d
TDT = (b-c)² / (b+c)
McNemar's test; χ² with 1 df
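
A minimal sketch of the TDT in its McNemar form, (b - c)² / (b + c), where b and c are the off-diagonal counts of the transmitted by not-transmitted table. The counts below are hypothetical, and the function names are assumptions for illustration.

```python
import math


def tdt_statistic(b: int, c: int) -> float:
    """TDT statistic (b - c)**2 / (b + c) from the two discordant cells:
    b = M1 transmitted and M2 not, c = M2 transmitted and M1 not."""
    if b + c == 0:
        raise ValueError("no informative transmissions from heterozygous parents")
    return (b - c) ** 2 / (b + c)


def tdt_pvalue(b: int, c: int) -> float:
    """Upper-tail p-value, referring the statistic to chi-square with 1 df."""
    return math.erfc(math.sqrt(tdt_statistic(b, c) / 2.0))


# Hypothetical counts from heterozygous parents of affected children.
print(tdt_statistic(60, 40), round(tdt_pvalue(60, 40), 4))
```

Only the discordant cells enter the statistic: homozygous parents (cells a and d) always transmit the same allele they withhold, so they carry no information about preferential transmission.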
20
The Noncentrality Parameter
  • Kaplan, Martin and Weir (1997) show that the
    noncentrality parameter λ for the TDT is
    2N(1-θ)I
  • where θ is the recombination fraction between the
    marker and the disease locus in the population, I
    is the association between affected status and
    the marker, and N is the number of affected
    children

21
Extensions of the TDT
  • Multiple alleles: Sham and Curtis (1995), Spielman
    and Ewens (1996)

22
Multiple Testing
  • Each Marker is tested separately for association
    with the disease
  • This is a multiple testing situation
  • Lander and Kruglyak (1997), Spielman and Ewens
    (1996), Risch and Merikangas (1997)

23
Bonferroni?
  • The standard Bonferroni procedure is to divide
    the significance level of the test by the number
    of tests performed, l
  • α_T is the experiment-wide type I error rate
  • α_T/l is the Bonferroni-corrected type I error
    rate for each test performed
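
The Bonferroni rule can be sketched in a few lines: divide the experiment-wide error rate by the number of tests and compare each p-value with the result. The p-values below are hypothetical and the function name is an assumption for illustration.

```python
def significant_after_bonferroni(p_values, alpha_total=0.05):
    """Flag each test as significant at the Bonferroni-corrected level,
    i.e. the experiment-wide rate divided by the number of tests."""
    if not p_values:
        raise ValueError("at least one p-value is required")
    per_test_alpha = alpha_total / len(p_values)
    return [p <= per_test_alpha for p in p_values]


# Hypothetical p-values from five marker tests; the per-test level is 0.05 / 5.
marker_p_values = [0.001, 0.02, 0.04, 0.3, 0.6]
print(significant_after_bonferroni(marker_p_values))
```

Note that two markers that would pass an uncorrected 0.05 threshold no longer do, which is the loss of power that comes with the correction.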

24
Bonferroni and power
25
When are the individual TDT tests independent
under the null?
  • The null is no association between the marker and
    the disease
  • Tests are independent if the markers are not
    associated
  • Tests are NOT independent if the markers are
    associated

26
Dense Concentration of Markers
Tests in this region are unlikely to be
independent
27
An extreme example
  • If marker M has two alleles with frequencies .7
    and .3
  • If marker N has two alleles with frequencies .7
    and .3
  • If P(M1N1) = .7, P(M1N2) = 0, P(M2N1) = 0, and
    P(M2N2) = .3
  • Then TDTm = TDTn
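
The extreme example can be checked with a small sketch. It assumes only the two haplotypes M1N1 and M2N2 exist and that no recombination occurs between the markers, so every transmission carries matching alleles at both loci. The transmission counts and the helper name `tdt_counts` are hypothetical.

```python
def tdt_counts(transmissions, locus):
    """Count informative transmissions at one locus.

    transmissions: list of (transmitted, untransmitted) haplotype pairs,
    each haplotype a tuple (allele at M, allele at N).
    locus: 0 for marker M, 1 for marker N.
    Returns (b, c): allele 1 transmitted over allele 2, and the reverse.
    """
    b = c = 0
    for transmitted, untransmitted in transmissions:
        t_allele, u_allele = transmitted[locus], untransmitted[locus]
        if t_allele == u_allele:
            continue  # homozygous parent at this locus: uninformative
        if t_allele == 1:
            b += 1
        else:
            c += 1
    return b, c


M1N1, M2N2 = (1, 1), (2, 2)
# Hypothetical parental transmissions to affected children.
transmissions = [(M1N1, M2N2)] * 12 + [(M2N2, M1N1)] * 5 + [(M1N1, M1N1)] * 3

b_m, c_m = tdt_counts(transmissions, locus=0)
b_n, c_n = tdt_counts(transmissions, locus=1)
tdt_m = (b_m - c_m) ** 2 / (b_m + c_m)
tdt_n = (b_n - c_n) ** 2 / (b_n + c_n)
print(tdt_m, tdt_n)
```

Because the allele at N is fully determined by the allele at M, the two tests use identical counts and are a single test counted twice, not two independent tests.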

28
Association between Markers
  • P(MiNk) = P(Mi)P(Nk) + δ; markers are associated
    when δ ≠ 0
  • This can happen due to recent population
    admixture (Mexican Americans), isolated
    populations (Finns, Old Order Amish)
  • This can also happen if markers are tightly linked

29
TDTMAX
  • TDTMAX is defined as the maximum TDT statistic in
    the set of all tests conducted (all markers
    biallelic)
  • Calculate the TDT for each marker locus
  • Pick the Maximum
  • For markers with multiple alleles, the minimum
    p-value is the equivalent
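
The two steps above (calculate the TDT for each marker, pick the maximum) can be sketched as follows for biallelic markers. The marker names and discordant counts are hypothetical, and `tdt_max` is an illustrative helper, not a function from any package.

```python
def tdt_max(counts_by_marker):
    """Return (marker, statistic) for the largest TDT across markers.

    counts_by_marker maps a marker name to its discordant counts (b, c);
    markers with no informative transmissions are skipped.
    """
    statistics = {
        marker: (b - c) ** 2 / (b + c)
        for marker, (b, c) in counts_by_marker.items()
        if b + c > 0
    }
    if not statistics:
        raise ValueError("no marker has informative transmissions")
    best_marker = max(statistics, key=statistics.get)
    return best_marker, statistics[best_marker]


# Hypothetical discordant transmission counts for three biallelic markers.
counts = {"marker_1": (60, 40), "marker_2": (55, 45), "marker_3": (70, 30)}
print(tdt_max(counts))
```

The sketch only finds the maximum; judging its significance still requires accounting for the number of markers tested and any association between them.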