Title: Zoology 2005 Part 2
1Zoology 2005 Part 2
2Inbred Mouse Strain Haplotype Structure
- When the genomes of a pair of inbred strains are compared, we find a mosaic of segments of identity and difference (Wade et al, Nature 2002).
- A QTL segregating between the strains must lie in a region of sequence difference.
- What happens when we compare more than two strains simultaneously?
3No Simple Haplotype Block Mosaic
Yalcin et al 2004 PNAS
4But a Tree Mosaic
5In-silico Mapping
- Simple idea:
- Collect phenotypes across a set of inbred strains
- Genotype the strains (ONCE)
- Look for phenotype-genotype correlation
- Works well for simple Mendelian traits (eg coat colour)
- Suggested as a panacea for QTL mapping
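The phenotype-genotype correlation step can be sketched as follows. This is a minimal illustration only: the strain names, marker names and values are invented, alleles are assumed to be coded 0/1, and the squared correlation is just one possible score.

```python
from statistics import mean

def in_silico_scan(phenotype, genotypes):
    """Score each marker by the squared correlation between strain
    phenotype means and strain alleles (coded 0/1).

    phenotype: dict strain -> mean phenotype
    genotypes: dict marker -> dict strain -> allele (0 or 1)
    Returns dict marker -> r^2 (0.0 for monomorphic markers).
    """
    strains = sorted(phenotype)
    y = [phenotype[s] for s in strains]
    y_bar = mean(y)
    syy = sum((v - y_bar) ** 2 for v in y)
    scores = {}
    for marker, alleles in genotypes.items():
        x = [alleles[s] for s in strains]
        x_bar = mean(x)
        sxx = sum((v - x_bar) ** 2 for v in x)
        sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
        scores[marker] = 0.0 if sxx == 0 or syy == 0 else sxy ** 2 / (sxx * syy)
    return scores

# Toy data (invented for illustration): six strains, three markers.
phenotype = {"A": 10.0, "B": 11.0, "C": 10.5, "D": 20.0, "E": 21.0, "F": 19.5}
genotypes = {
    "m1": {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1},  # tracks the trait
    "m2": {"A": 0, "B": 1, "C": 0, "D": 1, "E": 0, "F": 1},  # unrelated pattern
    "m3": {"A": 1, "B": 1, "C": 1, "D": 1, "E": 1, "F": 1},  # monomorphic
}
scores = in_silico_scan(phenotype, genotypes)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Because the strains are genotyped once, the same genotype table can be reused for every new phenotype collected on the panel.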
6In-silico Mapping Problems
- Less well-suited for complex traits
- Number of strains required grows quickly with the complexity of the trait. At least 100 strains have been suggested as necessary, possibly more if epistasis is present
- Requires high-density genotype/sequence data to ensure identity-by-state = identity-by-descent
- May be very useful for the dissection of a QTL previously identified in an F2 cross (look for patterns of sequence difference)
7Recombinant Inbred Lines
- Panels of inbred lines descended from pairs of inbred strains
- Genomes are inbred mosaics of the founders
- Lines only need be genotyped once
- Similar to in-silico mapping except
- identity-by-descent = identity-by-state
- Coarser recombination structure
- ?lower resolution mapping?
8BXD chromosome 4
9Testing if a variant is functional without genotyping it (Yalcin et al, Genetics 2005)
- Requirements
- A Heterogeneous Stock, genotyped at a skeleton of markers
- The genome sequences of the progenitor strains
- A statistical test
10Merge Analysis
- Each polymorphism groups together the founders according to their alleles
- If the polymorphism is functional, then a model in which the phenotypic strain effects are estimated after merging the strains together should be as good as a model where each strain can have an independent effect.
- Compare the fit of merged and unmerged genetic models to test if the variant is functional.
- If the fit of the merged model is poor then that variant can be eliminated.
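The comparison of merged and unmerged models can be sketched as a nested-model F-test. This is a simplified illustration, not the published method: it assumes each animal's founder strain at the locus is known with certainty (in an HS it would be inferred probabilistically), and the function names and toy data are invented.

```python
from collections import defaultdict

def rss_by_group(values, labels):
    """Residual sum of squares after fitting one mean per label.
    Returns (rss, number_of_groups)."""
    groups = defaultdict(list)
    for v, g in zip(values, labels):
        groups[g].append(v)
    rss = 0.0
    for members in groups.values():
        m = sum(members) / len(members)
        rss += sum((v - m) ** 2 for v in members)
    return rss, len(groups)

def merge_f_statistic(phenotypes, strains, allele_of_strain):
    """F statistic comparing the merged model (one mean per allele)
    with the unmerged model (one mean per founder strain).
    A large F means merging loses a lot of fit."""
    merged_labels = [allele_of_strain[s] for s in strains]
    rss_u, k_u = rss_by_group(phenotypes, strains)
    rss_m, k_m = rss_by_group(phenotypes, merged_labels)
    df_num = k_u - k_m               # parameters removed by merging
    df_den = len(phenotypes) - k_u   # residual df of the unmerged model
    f = ((rss_m - rss_u) / df_num) / (rss_u / df_den)
    return f, df_num, df_den

# Toy data (invented): four founders, three animals each; A and B are low,
# C and D are high.
strains = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3
phenotypes = [9, 10, 11, 10, 11, 12, 19, 20, 21, 20, 21, 22]

consistent = {"A": 0, "B": 0, "C": 1, "D": 1}    # allele pattern matches the effect
inconsistent = {"A": 0, "B": 1, "C": 0, "D": 1}  # allele pattern does not

f_consistent, _, _ = merge_f_statistic(phenotypes, strains, consistent)
f_inconsistent, _, _ = merge_f_statistic(phenotypes, strains, inconsistent)
print(f_consistent, f_inconsistent)
```

In this toy case the variant whose allele pattern matches the strain effects loses little fit on merging, while the other loses a great deal and would be eliminated.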
11Merge Analysis
12Merge Analysis
13How can we show a gene under a QTL peak affects the trait?
- Genetic Mapping identifies Functional Variants, not Genes
- Could be a control element affecting some other gene
14Quantitative Complementation
[Figure, built up over slides 14-18: phenotype values (axis 0 to 100) plotted for KO and wt alleles against High and Low alleles. The KO vs wt difference d is marked for each of High and Low, and the final build labels the contrast between the two differences ("D d - d" in the extracted text).]
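The figure labels suggest the usual reading of a quantitative complementation test: compare the KO vs wt difference on the High background with the same difference on the Low background, and ask whether the two differences differ. A minimal sketch of that contrast follows; the group keys, the toy values and the pooled-variance t statistic are assumptions made for illustration, not taken from the slides.

```python
from math import sqrt
from statistics import mean

def complementation_contrast(groups):
    """groups: dict (strain_allele, gene_allele) -> list of phenotypes,
    with strain_allele in {"High", "Low"} and gene_allele in {"KO", "wt"}.
    Returns (d_high, d_low, contrast, t), where contrast = d_high - d_low
    and t uses a pooled within-group variance."""
    d_high = mean(groups[("High", "KO")]) - mean(groups[("High", "wt")])
    d_low = mean(groups[("Low", "KO")]) - mean(groups[("Low", "wt")])
    contrast = d_high - d_low

    # Pooled within-group variance across the four groups.
    ss, df = 0.0, 0
    for values in groups.values():
        m = mean(values)
        ss += sum((v - m) ** 2 for v in values)
        df += len(values) - 1
    pooled_var = ss / df
    se = sqrt(pooled_var * sum(1.0 / len(v) for v in groups.values()))
    return d_high, d_low, contrast, contrast / se

# Toy data (invented): the knockout matters on the High background only.
groups = {
    ("High", "KO"): [28, 30, 32],
    ("High", "wt"): [98, 100, 102],
    ("Low", "KO"): [48, 50, 52],
    ("Low", "wt"): [49, 50, 51],
}
d_high, d_low, contrast, t = complementation_contrast(groups)
print(d_high, d_low, contrast, round(t, 2))
```

A contrast near zero would mean the knockout shifts the phenotype equally on both backgrounds, which gives no evidence that the gene underlies the QTL.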
19Using Functional Information to Confirm Genes
- Further experiments
- further bioinformatics, eg networks, functional annotation (GO, KEGG)
- candidate gene sequencing
- gene expression analyses (eQTL) of
- founder strains
- HS
20Mouse/human sequence comparison
21Enhancer reporter assays
[Figure: reporter constructs combining an enhancer, a promoter and a luciferase reporter]
22Enhancer elements affect promoter expression
23Large-Scale Genetic Mapping
- Using a Heterogeneous Stock
- Multiple Phenotypes collected in parallel
24Predictions (from simulation of an HS population)
- In a population of 1,000 HS animals
- Genome-wide power to detect a 5% QTL = 0.92
- Resolution < 2 Mb
25Study design
- 2,000 mice
- 15,000 diallelic markers
- More than 100 phenotypes
- each mouse subject to a battery of tests spread over weeks 5-9 of the animal's life
- more (post-mortem) phenotypes being added
26Phenotypes
27Covariates
- For each phenotype, we recorded covariates, eg,
- experimenter
- time of day
- apparatus (eg, Shock Chamber 3)
28Data collection
- All animals microchipped
- Automated data checking, processing and uploading
- All data uploaded into the Integrated Genotyping
System (IGS) database
29Genotypes from Illumina
- Genotyped and phenotyped 2,000 offspring
- Genotyped 300 parents
- Pedigree analysis shows genotyping was 99.99% accurate
- 11,558 markers polymorphic in HS
30QTL mapping
- Models
- HAPPY and single marker association
- Fitting framework
- Linear regression of (transformed) phenotypes
- Survival analysis for latency data
- Logit-based models for categorical data
- Significant covariates incorporated into the null model, eg
Null: Startle ~ TestChamber + BodyWeight + Year + Age + Hour + Gender
Additive: Null + additive genetic info for locus
Full: Null + full genetic info for locus
31QTL mapping
- Significance tests
- partial F-test (linear models), Chi-square / LRT (others)
- Significance thresholds
- different for each phenotype
- have to take into account LD
- fit distribution to scores of permuted data
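A permutation threshold that respects LD can be sketched as below. Shuffling the phenotype across animals leaves the marker correlations intact, so the maximum score per permutation reflects the correlated scan. Note the simplifications: the score is a squared correlation rather than the tests listed above, the threshold is an empirical quantile rather than a fitted distribution, and all names and toy data are invented.

```python
import random
from statistics import mean

def r_squared(x, y):
    """Squared Pearson correlation; 0.0 if either vector is constant."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)

def permutation_threshold(phenotype, markers, n_perm=200, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the genome-wide maximum score
    over n_perm shuffles of the phenotype."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max(r_squared(m, shuffled) for m in markers))
    maxima.sort()
    return maxima[min(n_perm - 1, int((1 - alpha) * n_perm))]

# Toy data (invented): 60 animals, 20 markers in LD, each marker mostly
# copying its neighbour.
data_rng = random.Random(1)
n_animals, n_markers = 60, 20
markers = [[data_rng.randint(0, 2) for _ in range(n_animals)]]
for _ in range(n_markers - 1):
    prev = markers[-1]
    markers.append(
        [g if data_rng.random() < 0.9 else data_rng.randint(0, 2) for g in prev]
    )
phenotype = [data_rng.gauss(0.0, 1.0) for _ in range(n_animals)]

genome_wide = permutation_threshold(phenotype, markers)
single_marker = permutation_threshold(phenotype, markers[:1])
print(genome_wide, single_marker)
```

Each phenotype gets its own threshold because the permuted maxima depend on that phenotype's distribution and on which animals were measured.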
32E-values
- We set score thresholds using ideas from sequence databank search programs such as BLAST.
- The E-value of a threshold is the number of times you would expect to see a false positive exceed the threshold in a genome scan.
- Applying the Bonferroni correction to the number of marker intervals is too severe because LD makes neighbouring scores correlated.
- Permutation analyses indicate that the most significant random score expected amongst all 12000 marker intervals behaves as if it were drawn from M = 4000 independent tests.
- Hence a nominal P-value of p corresponds to an E-value of pM.
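The conversion between nominal P-values and E-values is simple arithmetic, sketched here with M = 4000 and 12000 marker intervals as quoted above. The function names are invented for illustration.

```python
M_EFFECTIVE = 4000    # effective number of independent tests (from the slides)
N_INTERVALS = 12000   # number of marker intervals (from the slides)

def e_value(p, m_effective=M_EFFECTIVE):
    """Expected number of false positives at nominal P-value p in one scan."""
    return p * m_effective

def nominal_p_for_e(e, m_effective=M_EFFECTIVE):
    """Nominal P-value needed to reach a target E-value."""
    return e / m_effective

target_e = 0.01
p_needed = nominal_p_for_e(target_e)       # threshold using effective tests
p_bonferroni = target_e / N_INTERVALS      # threshold using all intervals
print(p_needed, p_bonferroni)
```

The Bonferroni threshold over all marker intervals is three times stricter than the one based on the effective number of tests, which is the sense in which it is too severe.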
37Problems
- Our population includes both siblings and unrelateds
- We have ignored this distinction
- And therefore we are
- Confounding environmental family effects with genetic family effects
- Allowing ghost peaks due to linkage disequilibrium between markers within a sibship
- Our solution so far
- (1) Investigating the effect of environmental factors and building covariates into the model
- (2) Identifying peaks by a multiple conditional fit
38Multiple Peak Fitting: Forward Selection
- For each phenotype's genome scan
- Make list of all peaks > genome-wide threshold T
- Fit most significant peak, P1
- Go through list of peaks, refitting each one conditional upon the most significant peak.
- Add the most significant remaining peak, P2
- Continue refitting remaining peaks P3, P4 and adding them into model until the most significant remaining peak < T
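The forward selection loop can be sketched with ordinary least squares, here done by orthogonalising each candidate against the peaks already in the model. The assumptions are loud ones: the score is a one-degree-of-freedom F statistic, the threshold T is an F cutoff chosen for the toy data, and the peak names and genotype vectors are invented.

```python
from math import sqrt

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def residualise(v, basis):
    """Remove from v its projection onto each orthonormal basis vector."""
    out = list(v)
    for b in basis:
        c = dot(out, b)
        out = [o - c * bi for o, bi in zip(out, b)]
    return out

def forward_select(phenotype, peaks, threshold):
    """peaks: dict name -> genotype vector. Repeatedly add the peak with the
    largest conditional F statistic until the best remaining one is below
    threshold. Returns a list of (name, F) in order of selection."""
    n = len(phenotype)
    basis = [[1.0 / sqrt(n)] * n]          # intercept only to start
    resid = residualise(phenotype, basis)
    remaining = dict(peaks)
    selected = []
    while remaining:
        rss = dot(resid, resid)
        df = n - len(basis) - 1            # residual df after adding one peak
        if df <= 0:
            break
        best = None
        for name, genotype in remaining.items():
            g_res = residualise(genotype, basis)
            norm2 = dot(g_res, g_res)
            if norm2 < 1e-12:              # collinear with the current model
                continue
            gain = dot(resid, g_res) ** 2 / norm2   # drop in RSS
            rss_after = rss - gain
            f = gain / (rss_after / df) if rss_after > 0 else float("inf")
            if best is None or f > best[1]:
                best = (name, f, g_res, norm2)
        if best is None or best[1] < threshold:
            break
        name, f, g_res, norm2 = best
        unit = [x / sqrt(norm2) for x in g_res]
        basis.append(unit)
        resid = residualise(resid, [unit])
        selected.append((name, f))
        del remaining[name]
    return selected

# Toy data (invented): p1 and p3 are real; p2 is a ghost peak in LD with p1.
p1 = [0] * 8 + [1] * 8
p2 = [0] * 7 + [1] * 9
p3 = [0, 1] * 8
noise = [0.1, -0.1, -0.1, 0.1] * 4
phenotype = [5 + 2 * a + b + e for a, b, e in zip(p1, p3, noise)]

selected = forward_select(phenotype, {"p1": p1, "p2": p2, "p3": p3}, threshold=10.0)
print([name for name, _ in selected])
```

In the toy data p2 scores well above the cutoff on its own, but once p1 is in the model its conditional score collapses, which is how the conditional fit removes ghost peaks.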
39Peaks found by multiple conditional fit
[Figure: number of phenotypes by peaks found, multiple conditional fit (using additive model only)]
40Database for scans
41Database for scans
Additive model
Full model
- E-value thresholds
- additive only
- E < 0.01 is about the same as genome-wide corrected p < 0.01.
42Database for scans
zoom in
43Covariates
44QTL Mapping Validation
- Coat colour
- Detection of known QTLs
45Coat colour genes
46A known QTL: HDL
Wang et al, 2003
HS mapping
47High Resolution QTLs
Phenotype          Chr  Interval (Mb)  Method              Reference        HS position (Mb)
Cue freezing       3    70-83          Genome tagged mice  Liu 2003         71-73
Obesity            2    142-168        Congenic            Demant 2004      150-153
10 week body mass  1    156-160        Progeny testing     Christians 2004  154.5-156
Emotionality       1    143-148        HS                  Mott 2000        143-144.5
Emotionality       10   123-127        HS                  Mott 2000        121.5-122.7
Emotionality       12   54-57          HS                  Mott 2000        55.5-56.5
Emotionality       15   64-77          HS                  Turri 1999       63.5-66
48New QTLs: two examples
- Freeze.During.Tone (from Cue Conditioning behavioural experiment): 1 peak
- % of CD4 in CD3 cells (immunology assay): 10 peaks
49Cue Conditioning
- Freezing in response to a conditioned stimulus
50Cue Conditioning
- Freeze.During.Tone: huge effect, small number of genes
[Figure: peak on chr15 over cntn1, Contactin precursor (Neural cell surface protein)]
51 % CD4 cells in CD3 cells
- huge effect but lots of genes
52 % CD4 in CD3 (under peak)
53All QTLs
- 608 peaks
- Median interval is 938,936 bp
- or about 9 genes per peak
54Summary
- The HS project so far has
- phenotyped 2,500 HS mice
- genotyped 2,300 mice
- mapped over 140 phenotypes
- identified more than 600 potential QTLs
55Confirming gene candidates
- Increased mapping resolution through
- include epistasis
- multivariate
- G x E
- pleiotropy
- sex effects
- Further experiments
- further bioinformatics, eg networks, functional annotation (GO, KEGG)
- candidate gene sequencing
- gene expression analyses (eQTL) of
- founder strains
- HS
56Confirming gene candidates: epistasis
Single marker association of pairwise epistasis
57Work of many hands
- Carmen Arboleda-Hitas
- Amarjit Bhomra
- Peter Burns
- Richard Copley
- Stuart Davidson
- Simon Fiddy
- Jonathan Flint
- Polinka Hernandez
- Sue Miller
- Richard Mott
- Chela Nunez
- Gemma Peachey
- Sagiv Shifman
- Leah Solberg
- Amy Taylor
- Martin Taylor
- Jordana Tzenova-Bell
- William Valdar
- Binnaz Yalcin
- Dave Bannerman
- Shoumo Bhattacharya
- Bill Cookson
- Rob Deacon
- Dominique Gauguier
- Doug Higgs
- Tertius Hough
- Paul Klenerman
- Nick Rawlins
- Project funded by
- The Wellcome Trust, UK