Title: Finding the Molecular Basis of Quantitative Genetic Variation
1Finding the Molecular Basis of Quantitative
Genetic Variation
Richard Mott, Wellcome Trust Centre for Human Genetics, Oxford, UK
2Genetic Traits
- Quantitative (height, weight)
- Dichotomous (affected/unaffected)
- Factorial (blood group)
- Mendelian: controlled by a single gene (cystic fibrosis)
- Complex: controlled by multiple genes and environment (diabetes, asthma)
3Molecular Basis of Quantitative Traits
QTL: Quantitative Trait Locus
chromosome
genes
4Molecular Basis of Quantitative Traits
QTL: Quantitative Trait Locus
chromosome
QTG: Quantitative Trait Gene
5Molecular Basis of Quantitative Traits
QTL: Quantitative Trait Locus
chromosome
SNP: Single Nucleotide Polymorphism
QTG: Quantitative Trait Gene
QTN: Quantitative Trait Nucleotide
6Association Studies
- Compare unrelated individuals from a population
- Phenotypes
- Cases vs Controls
- Quantitative measure
- Genotypes: state of the genome at multiple variable locations (Single Nucleotide Polymorphisms, SNPs) in each individual
- Seek correlation between genotype and phenotype
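One common way to seek a genotype-phenotype correlation in a case-control design is a chi-square test on allele counts at a single SNP. The sketch below is illustrative only: the function name and the allele counts are hypothetical, and the slides do not say which test was used.

```python
import math

def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square (1 d.f.) on a 2x2 table of allele counts.

    case_counts and control_counts are (allele_A, allele_a) tuples.
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 d.f. the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical allele counts, for illustration only.
chi2, p = allelic_chi_square(case_counts=(240, 160), control_counts=(200, 200))
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

In a genome scan the same test would be repeated at every genotyped SNP, which is why multiple-testing correction matters.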
7Problems with Association Studies
- Population stratification
- Linkage Disequilibrium
- Allele Frequencies
- Multiple loci
- Small Effect Sizes
- Very few Successes
8Population Stratification
- If the sampling population comprises genetically distinct sub-populations with different disease prevalences
- Then any variant that distinguishes the sub-populations is likely to show disease association
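A small worked example may make the stratification effect concrete. The counts below are invented for illustration: within each sub-population the variant is unrelated to disease (odds ratio 1), yet pooling the two produces an apparent association.

```python
def odds_ratio(cases_with, cases_without, controls_with, controls_without):
    """Odds ratio for carrying a variant, cases versus controls."""
    return (cases_with * controls_without) / (cases_without * controls_with)

# Hypothetical counts: (cases with variant, cases without,
#                       controls with variant, controls without).
# Sub-population 1: carrier frequency 0.8, 100 cases out of 500 sampled.
subpop_1 = (80, 20, 320, 80)
# Sub-population 2: carrier frequency 0.2, 20 cases out of 500 sampled.
subpop_2 = (4, 16, 96, 384)

# Ignore sub-population membership and add the tables together.
pooled = tuple(a + b for a, b in zip(subpop_1, subpop_2))

or_1 = odds_ratio(*subpop_1)
or_2 = odds_ratio(*subpop_2)
or_pooled = odds_ratio(*pooled)
print(or_1, or_2, round(or_pooled, 2))
```

The pooled odds ratio departs from 1 only because the variant and the disease are both more frequent in the first sub-population.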
9Admixture Mapping
- Population is homogeneous but each individual's genome is a mosaic of segments from different populations
- May be used to map disease loci
- multiple sclerosis susceptibility
- Reich et al 2005, Nature Genetics
10Linkage Disequilibrium
Mouse
11Effects of Linkage Disequilibrium
- Correlation between nearby SNPs
- SNPs near to QTN will show association
- Risk of false positive interpretation
- But need only genotype tagging SNPs
- 1 million tagging SNPs will be in LD with 50% of common variants in the human genome
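The correlation between nearby SNPs is usually summarised by D and r². The sketch below uses the standard definitions for two biallelic SNPs; the function name and the example frequencies are hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r2) for two biallelic SNPs.

    p_ab      frequency of the haplotype carrying allele A at SNP 1
              and allele B at SNP 2
    p_a, p_b  frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2

# Perfect LD: allele A is always found together with allele B.
d_perfect, r2_perfect = ld_stats(p_ab=0.3, p_a=0.3, p_b=0.3)

# No LD: the haplotype frequency equals the product of allele frequencies.
d_none, r2_none = ld_stats(p_ab=0.15, p_a=0.3, p_b=0.5)
```

A tagging SNP is useful to the extent that its r² with the untyped causal variant is close to 1.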
12The Common-Disease Common-Variant Hypothesis
- Says
- disease-predisposing variants will exist at relatively high frequency (i.e. >1%) in the population
- they are ancient alleles occurring on specific haplotypes
- they are detectable in a case-control study using tagging SNPs
- Alternative hypothesis says
- disease-predisposing alleles are sporadic new mutations, perhaps around the same genes, on different haplotypes
- families with a history of the same disease owe their condition to different mutation events
- Theoretically detectable with family-based strategies, which do not assume a common origin for the disease alleles, but harder to detect with case-control studies (Pritchard, 2001)
13Power Depends on
- Disease-predisposing alleles
- Effect Size (Odds Ratio)
- Allele frequency
- Sample size: cases, controls
- Number of tagging SNPs
- To detect an allele with an odds ratio of 1.25 and allele frequency > 1%, at 5% Bonferroni genome-wide significance and 80% power, we require
- 6000 cases, 6000 controls
- 0.5 million tagging SNPs, one of which must be in perfect LD with the causative variant
- Hirschhorn and Daly 2005
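The dependence of power on effect size, allele frequency, sample size and number of tests can be sketched with a normal approximation to a two-sided allelic test. This is a rough illustration under stated assumptions (tagging SNP in perfect LD with the causal allele, two alleles per individual, hypothetical function names); it does not attempt to reproduce the figures quoted from Hirschhorn and Daly.

```python
from statistics import NormalDist

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests

def case_allele_frequency(control_freq, odds_ratio):
    """Allele frequency in cases implied by an allelic odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)

def approx_power(control_freq, odds_ratio, n_cases, n_controls, alpha):
    """Normal-approximation power of a two-sided allelic test."""
    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    # Each diploid individual contributes two alleles.
    se = (p1 * (1 - p1) / (2 * n_cases)
          + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # The negligible contribution of the opposite tail is ignored.
    return NormalDist().cdf(abs(p1 - p0) / se - z_crit)

# 5% genome-wide significance spread over 0.5 million tagging SNPs.
alpha = bonferroni_threshold(0.05, 500_000)

# Hypothetical control allele frequency of 0.2, odds ratio 1.25.
power_small = approx_power(0.2, 1.25, 1000, 1000, alpha)
power_large = approx_power(0.2, 1.25, 6000, 6000, alpha)
```

Under these assumptions power rises steeply with sample size and falls as the allele becomes rarer or the odds ratio approaches 1.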
14WTCCC: Wellcome Trust Case-Control Consortium
- 2000 cases from each of
- Type I Diabetes
- Type II Diabetes
- Rheumatoid arthritis
- Susceptibility to TB
- Bipolar depression
- and others
- 3000 common controls
- 0.675 million SNPs
- 10 billion genotypes
- Data expected mid 2006
15Mouse Models
16Map in Human or Animal Models?
Human:
- Disease studied directly
- Population and environment stratification
- Very many SNPs (1,000,000?) required
- Hard to detect trait loci: very large sample sizes (5,000-10,000) required to detect loci of small effect
- Potentially very high mapping resolution: single gene
- Very expensive
Animal model:
- Animal model required
- Population and environment controlled
- Fewer SNPs required (100-10,000)
- Easy to detect QTL with 500 animals
- Poorer mapping resolution: 1 Mb (10 genes)
- Relatively inexpensive
17QTL Mapping in Mice using Inbred Line Crosses
- Genetically homozygous: genome is fixed, strains breed true
- Standard inbred strains available
- Haplotype diversity is controlled far more than in human association studies
- QTL detection is very easy
- QTL fine mapping is hard
18Sizes of Mapped Behavioural QTL in rodents (% of total phenotypic variance)
19Physiological QTL
20Effect sizes of cloned genes
21QTL detection F2 Intercross
[Diagram: inbred strains A and B are crossed (A x B)]
22QTL mapping F2 Intercross
[Diagram: A x B gives F1; F1 animals are then intercrossed]
23QTL mapping F2 Intercross
[Diagram: A x B gives F1; F1 x F1 gives F2]
24QTL mapping F2 Intercross
[Diagram labels: QTL, F1, F2; values shown: 1, -1, 0, 0, 0, 2, -2]
25QTL mapping F2 Intercross
[Diagram labels: F1, F2; values shown: 1, -1, 0, 0, 0, 2, -2]
26QTL mapping F2 Intercross
Genotype a skeleton of markers across the genome
[Diagram labels: 20 cM, F2; values shown: 0, 0, 2, -2]
27QTL mapping F2 Intercross
[Diagram: grid of F2 marker genotypes (AA, AB, BA, BB); values shown: 0, 0, 2, -2]
28QTL mapping F2 Intercross
[Diagram: grid of F2 marker genotypes (AA, AB, BA, BB); values shown: 0, 0, 2, -2]
29Single Marker Association
- Test of association between genotype and trait at each marker position
- ANOVA
- F2 crosses are
- good for detecting QTL
- bad for fine-mapping
- typical mapping resolution: 1/3 chromosome, 20-30 cM
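Single marker association by ANOVA can be sketched as a one-way analysis of variance of the trait across the genotype classes at one marker. The function name and the trait values are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA.

    groups maps a genotype label to the list of trait values observed
    in animals carrying that genotype at the marker.
    """
    values = [y for ys in groups.values() for y in ys]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Variation of the genotype means around the grand mean.
    ss_between = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
                     for ys in groups.values())
    # Variation of individual animals around their genotype mean.
    ss_within = sum((y - sum(ys) / len(ys)) ** 2
                    for ys in groups.values() for y in ys)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical trait values for F2 animals grouped by marker genotype.
trait_by_genotype = {
    "AA": [1.0, 2.0, 3.0],
    "AB": [2.0, 3.0, 4.0],
    "BB": [3.0, 4.0, 5.0],
}
f_stat, df1, df2 = one_way_anova_f(trait_by_genotype)
```

The test is repeated at every marker in the skeleton map, and the F statistic is compared with a genome-wide significance threshold.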
30Increasing mapping resolution
- Increase number of recombinants
- more animals
- more generations in cross
31Heterogeneous Stocks
- cross 8 inbred strains for >10 generations
32Heterogeneous Stocks
- cross 8 inbred strains for >10 generations
33Heterogeneous Stocks
- cross 8 inbred strains for >10 generations
0.25 cM
34Mosaic Crosses
[Diagram labels: G3, GN, F20; inbreeding, mixing, chopping up; HS, AI, outbreds; F2, diallel; RI (RIHS, CC)]
35Analysis of mosaic crosses
[Diagram: a chromosome with its markers and the alleles (1 or 2) observed at each marker]
- Want to predict ancestral strain from genotype
- We know the alleles in the founder strains
- Single marker association lacks power, can't distinguish all strains
- Multipoint analysis: combine data from neighbouring markers
36Analysis of mosaic crosses
[Diagram: a chromosome with its markers and the alleles (1 or 2) observed at each marker]
- Hidden Markov model: HAPPY
- Hidden states: ancestral strains
- Observed states: genotypes
- Unknown phase of genotypes
- analyse both chromosomes simultaneously
- Output is the probability that a locus is descended from a pair of strains
- Mott et al 2000 PNAS
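The idea of hidden ancestral strains and observed genotypes can be illustrated with a forward-backward pass over a much simpler model than HAPPY. The sketch below is not the HAPPY implementation: it treats a single haploid chromosome with known phase, and the constant switch probability, the error rate, the function name and the example data are all assumptions made for illustration.

```python
def strain_posteriors(observed, founder_alleles, switch_prob=0.1, error=0.01):
    """Posterior P(ancestral strain | all genotypes) at each marker.

    observed[m]            allele (1 or 2) seen at marker m
    founder_alleles[m][s]  allele carried by founder strain s at marker m
    switch_prob            assumed recombination chance between markers
    error                  assumed genotyping error rate
    """
    n_markers = len(observed)
    n_strains = len(founder_alleles[0])

    def emit(m, s):
        return 1 - error if founder_alleles[m][s] == observed[m] else error

    def trans(s_from, s_to):
        # After a recombination the new strain is drawn uniformly.
        stay = (1 - switch_prob) if s_from == s_to else 0.0
        return stay + switch_prob / n_strains

    def normalise(v):
        total = sum(v)
        return [x / total for x in v]

    # Forward pass with a uniform prior over strains.
    fwd = [normalise([emit(0, s) / n_strains for s in range(n_strains)])]
    for m in range(1, n_markers):
        prev = fwd[-1]
        fwd.append(normalise([
            emit(m, t) * sum(prev[s] * trans(s, t) for s in range(n_strains))
            for t in range(n_strains)]))

    # Backward pass.
    bwd = [[1.0] * n_strains for _ in range(n_markers)]
    for m in range(n_markers - 2, -1, -1):
        bwd[m] = normalise([
            sum(trans(s, t) * emit(m + 1, t) * bwd[m + 1][t]
                for t in range(n_strains))
            for s in range(n_strains)])

    return [normalise([f * b for f, b in zip(fwd[m], bwd[m])])
            for m in range(n_markers)]

# Hypothetical data: 4 markers, 3 founder strains (columns).
founders = [[1, 1, 2], [1, 2, 2], [1, 2, 1], [2, 1, 1]]
observed = [1, 1, 1, 2]          # matches strain 0 at every marker
posteriors = strain_posteriors(observed, founders)
```

At the first marker strains 0 and 1 carry the same allele, so a single marker cannot separate them; the neighbouring markers are what tip the posterior towards strain 0.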
37Testing for a QTL
- $p_{iL}(s,t)$ = Prob(animal $i$ is descended from strains $s,t$ at locus $L$)
- $p_{iL}(s,t)$ calculated using
- genotype data
- founder strains' alleles
- Phenotype is modelled as
- $y_i = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mathrm{Covariates}_i + e_i$
- Test for no QTL at locus $L$
- $H_0$: the $T(s,t)$ are all the same
- ANOVA
- partial F test
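The model and the test can be sketched in two small functions: one evaluates the expected phenotype as the probability-weighted sum of strain-pair effects T(s,t), the other forms the partial F statistic from the residual sums of squares of the null and full models. Function names and numbers are hypothetical, and fitting the models (estimating the T(s,t) by least squares) is left out.

```python
def expected_phenotype(descent_probs, strain_effects, covariate_term=0.0):
    """Sum over strain pairs of p_iL(s,t) * T(s,t), plus a covariate term."""
    return covariate_term + sum(
        prob * strain_effects[pair] for pair, prob in descent_probs.items())

def partial_f(rss_null, rss_full, n_params_null, n_params_full, n_obs):
    """Partial F statistic comparing two nested linear models."""
    df_num = n_params_full - n_params_null
    df_den = n_obs - n_params_full
    f_stat = ((rss_null - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den

# Hypothetical animal: equally likely to descend from (A, A) or (A, B).
probs = {("A", "A"): 0.5, ("A", "B"): 0.5}
effects = {("A", "A"): 1.0, ("A", "B"): 3.0}
y_hat = expected_phenotype(probs, effects)

# Under H0 every T(s,t) is equal, so the genotype carries no information.
flat_effects = {("A", "A"): 2.5, ("A", "B"): 2.5}
y_hat_null = expected_phenotype(probs, flat_effects)

# Hypothetical residual sums of squares from the two fitted models.
f_stat_qtl, df_num, df_den = partial_f(
    rss_null=120.0, rss_full=100.0,
    n_params_null=1, n_params_full=5, n_obs=105)
```

A large F statistic at locus L indicates that allowing the strain-pair effects to differ explains the phenotype better than a single common mean.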
38Example: Open Field Activity
40OFA Tracking
42[Figure: multipoint and singlepoint analyses plotted against the significance threshold]
Talbot et al 1999, Mott et al 2000
44Relation Between Marker and Genetic Effect
[Diagram labels: QTL, Marker 1, Marker 2; no effect observable, observable effect]
45How Much Mapping Resolution do we need?
46Mapping Resolution in Mouse QTL experiments
- F2: 25-50 Mb, 250-300 genes
- HS: 1-5 Mb, 10-50 genes
- Need more resolution
47Other Outbred Populations
- Commercially available outbreds may contain more historical recombination
- Potentially finer mapping resolution
- How to exploit it?
48MF1 Outbred Mice MF1
49Analysis of MF1
50Single Marker Analysis
51Unknown progenitors
- Sometime in the 1970s.
- LACA x CF
- MF1
52MF1 resemble HS
- Sequencing revealed very few new variants in MF1 compared to HS strains
- Variants present in HS strains are also present in MF1
53MF1 as a mosaic of inbred strains
54Mapping with 30 generation HS
55Mapping with MF1 mice
Yalcin et al 2004 Nature Genetics
56Acknowledgements
- Jonathan Flint
- Binnaz Yalcin
- William Valdar
- Leah Solberg
57Further Reading
- Mouse
- Flint et al Nature Reviews Genetics 2005
- Human
- Hirschhorn and Daly, Nature Reviews Genetics 2005
- Zondervan and Cardon, Nature Reviews Genetics 2004