Title: Introduction to QTL mapping
1Introduction to QTL mapping
Manuel Ferreira
Boulder Introductory Course 2006
2Outline
1. Aim
2. The Human Genome
3. Principles of Linkage Analysis
4. Parametric Linkage Analysis
5. Nonparametric Linkage Analysis
31. Aim
4QTL mapping
LOCALIZE and then IDENTIFY a locus that regulates
a trait (QTL)
Nucleotide or sequence of nucleotides with
variation in the population, with different
variants associated with different trait levels.
5For a heritable trait...
Linkage: localize the region of the genome where a QTL that
regulates the trait is likely to be harboured.
Family-specific phenomenon: affected individuals
in a family share the same ancestral
predisposing DNA segment at a given QTL.
Association: identify a QTL that regulates the trait.
Population-specific phenomenon: affected
individuals in a population share the same
ancestral predisposing DNA segment at a given QTL.
62. Human Genome
7DNA structure
A DNA molecule is a linear backbone of
alternating sugar residues and phosphate groups.
Attached to carbon atom 1 of each sugar is a
nitrogenous base: A, C, G or T.
Two DNA molecules are held together in
anti-parallel fashion by hydrogen bonds between
bases (Watson-Crick rules): antiparallel double helix.
A gene is a segment of DNA which is transcribed
to give a protein or RNA product.
Only one strand is read during gene transcription.
Nucleotide = 1 phosphate group + 1 sugar + 1 base
8DNA polymorphisms
Microsatellites: >100,000. Many alleles, e.g. (CA)n;
very informative, even, easily automated.
SNPs: 10,054,521 (25 Jan 05); 10,430,753 (11 Mar 06).
Most with 2 alleles (up to 4); not
very informative, even, easily automated.
[Figure: loci A and B.]
9DNA organization
[Figure: loci A and B on chr1, followed from haploid gametes (22 + 1 chromosomes each) to a diploid zygote of 1 cell, 2 x (22 + 1), and through mitosis (G1 phase, S phase, M phase) to a diploid zygote of >1 cell.]
10DNA recombination
[Figure: meiosis for loci A and B on chr1. A diploid gamete precursor cell, 2 x (22 + 1), gives haploid gamete precursors and then haploid gametes (22 + 1), each labelled nonrecombinant (NR) or recombinant (R); the proportion of recombinant gametes is the recombination fraction (θ).]
11DNA recombination between linked loci
[Figure: meiosis for two closely linked loci A and B. A diploid gamete precursor, 2 x (22 + 1), gives haploid gamete precursors and then haploid gametes (22 + 1), all labelled nonrecombinant (NR); recombination fraction (θ).]
12Human Genome - summary
DNA is a linear sequence of nucleotides
partitioned into 23 chromosomes.
Two copies of each chromosome (2 x 22 autosomes + XY),
from paternal and maternal origins. During
meiosis in gamete precursors, recombination can
occur between maternal and paternal homologs.
Recombination fraction between loci A and B
(θ): proportion of gametes produced that are
recombinant for A and B.
If A and B are very far apart: 50% R, 50% NR, so θ = 0.5.
If A and B are very close together: <50% R, so 0 ≤ θ < 0.5.
Recombination fraction (θ) can be converted to
genetic distance (cM):
Haldane: e.g. θ = 0.17, cM = 20.8
Kosambi: e.g. θ = 0.17, cM = 17.7
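The two conversions quoted above can be checked with a minimal sketch. It assumes the standard forms of the Haldane and Kosambi map functions, which the slide text itself does not spell out; the function names are illustrative only.

```python
import math

def haldane_cm(theta):
    """Haldane map distance in cM (assumes no crossover interference)."""
    if not 0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * theta)

def kosambi_cm(theta):
    """Kosambi map distance in cM (allows for some interference)."""
    if not 0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))

print(round(haldane_cm(0.17), 1))  # 20.8
print(round(kosambi_cm(0.17), 1))  # 17.7
```

Both functions give roughly the same distance for small θ and diverge as θ approaches 0.5.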
133. Principles of Linkage Analysis
14Linkage Analysis requires genetic markers
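[Figure: trait locus Q and markers M1, M2, ..., Mn along a chromosome, in three panels. Each marker is labelled with its recombination fraction θ with Q, from 0.5 for distant markers down to values such as .15 and .1 for the markers closest to Q.]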
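15Linkage Analysis: Parametric vs. Nonparametric
[Figure: diagram relating marker M and trait locus Q on the chromosome (recombination) to the phenotype (Phe). Labels: gene, chromosome, genetic factors (Q, A, D), mode of inheritance, correlation, environmental factors (C, E).]
Adapted from Weiss & Terwilliger 2000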
164. Parametric Linkage Analysis
17Linkage with informative phase known meiosis
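[Figure: pedigree with marker alleles M1..6 and trait alleles Q1,2 on the same chromosome, separated by recombination fraction θ.]
Autosomal dominant, Q1 predisposing allele.
Grandparents: M2M5 Q2Q2 and M1M6 Q1Q?
Parents: M1M2 Q1Q2 (informative, phase known: M1Q1/M2Q2) and M3M4 Q2Q2.
Offspring: M1Q1/M3Q2, M2Q2/M3Q2, M1Q1/M4Q2, M1Q1/M4Q2, M2Q2/M4Q2, M2Q1/M3Q2.
Gametes from the informative parent: NR = M1Q1, M2Q2; R = M1Q2, M2Q1.
θMQ = 1/6 = 0.17 (20.8 cM)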
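The count behind θMQ = 1/6 can be reproduced with a short sketch. The gametes are the six haplotypes that the offspring on this slide received from the phase-known parent; the function name and data layout are hypothetical.

```python
def recombination_fraction(parental_haplotypes, transmitted):
    """Count recombinant gametes transmitted by a phase-known parent.

    A gamete identical to one of the two parental haplotypes is
    nonrecombinant (NR); any other combination is recombinant (R).
    """
    nonrecombinant = set(parental_haplotypes)
    n_rec = sum(1 for gamete in transmitted if gamete not in nonrecombinant)
    return n_rec, len(transmitted), n_rec / len(transmitted)

# Phase of the informative parent: M1Q1/M2Q2
phase = [("M1", "Q1"), ("M2", "Q2")]
# Haplotypes passed to the six offspring by that parent
gametes = [("M1", "Q1"), ("M2", "Q2"), ("M1", "Q1"),
           ("M1", "Q1"), ("M2", "Q2"), ("M2", "Q1")]

n_rec, n, theta = recombination_fraction(phase, gametes)
print(n_rec, n, round(theta, 2))  # 1 6 0.17
```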
18Linkage with informative phase unknown meiosis
[Figure: same pedigree, but the grandparents (Q2Q2 and Q1Q?) are not genotyped at the marker.]
Parents: M1M2 Q1Q2 (informative, phase unknown: M1Q1/M2Q2 or M1Q2/M2Q1) and M3M4 Q2Q2.
Offspring: M1Q1/M3Q2, M2Q2/M3Q2, M1Q1/M4Q2, M1Q1/M4Q2, M2Q2/M4Q2, M2Q1/M3Q2.
Gamete   N   Phase M1Q1/M2Q2   P         Phase M1Q2/M2Q1   P
M1Q1     3   NR                ½(1-θ)    R                 ½θ
M2Q2     2   NR                ½(1-θ)    R                 ½θ
M1Q2     0   R                 ½θ        NR                ½(1-θ)
M2Q1     1   R                 ½θ        NR                ½(1-θ)
19Parametric LOD score calculation
Overall LOD score for a given θ is the sum of all
family LOD scores at θ
e.g. LOD = 3 for θ = 0.28
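As a rough illustration of how a family LOD score at a given θ might be computed, the sketch below assumes the usual definition: log10 of the likelihood at θ divided by the likelihood at θ = 0.5. For a phase-unknown parent, the two possible phases are averaged with equal weight. Function names and the example counts (1 recombinant, 5 nonrecombinant gametes, as in the pedigree of the previous slides) are illustrative.

```python
import math

def lod_phase_known(theta, n_rec, n_nonrec):
    """LOD for one family with phase-known informative meioses."""
    lik = theta ** n_rec * (1 - theta) ** n_nonrec
    null = 0.5 ** (n_rec + n_nonrec)
    return math.log10(lik / null) if lik > 0 else float("-inf")

def lod_phase_unknown(theta, n_a, n_b):
    """LOD when phase is unknown.

    n_a gametes are recombinant under phase 1 (nonrecombinant under
    phase 2); n_b gametes are the reverse. Each phase has prior 1/2.
    """
    lik = 0.5 * (theta ** n_a * (1 - theta) ** n_b
                 + theta ** n_b * (1 - theta) ** n_a)
    null = 0.5 ** (n_a + n_b)
    return math.log10(lik / null) if lik > 0 else float("-inf")

def total_lod(theta, family_lods):
    """Overall LOD at theta: sum of the family LOD functions."""
    return sum(f(theta) for f in family_lods)

families = [lambda t: lod_phase_known(t, 1, 5),
            lambda t: lod_phase_unknown(t, 1, 5)]
print(round(total_lod(1 / 6, families), 2))
```

With the same gametes, the phase-unknown family contributes a lower LOD than the phase-known one, which is the cost of not knowing the phase.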
20Parametric Linkage Analysis - summary
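[Figure: trait locus Q and markers M1, M2, ..., Mn along a chromosome, each marker labelled with its recombination fraction θ with Q (0.5, .4, .3, .1, .3, .4, 0.5).]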
For each marker, estimate the θ that yields the
highest LOD score across all families.
This θ (and the LOD) will depend upon the mode of
inheritance assumed: MOI determines the genotype
at the trait locus Q and thus determines
the number of meioses which are recombinant or
nonrecombinant. Limited to Mendelian diseases.
Markers with a significant parametric LOD score
(>3) are said to be linked to the trait locus
with recombination fraction θ.
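One simple way to find the θ with the highest overall LOD for a marker is a grid search, sketched below. It assumes phase-known meioses only, and the recombinant and nonrecombinant counts per family are made-up illustration values, not data from the course.

```python
import math

def family_lod(theta, n_rec, n_nonrec):
    """Phase-known LOD for one family at a given theta (0 < theta <= 0.5)."""
    return (n_rec * math.log10(theta)
            + n_nonrec * math.log10(1 - theta)
            + (n_rec + n_nonrec) * math.log10(2))

def best_theta(families, step=0.01):
    """Grid search over theta in (0, 0.5] for the maximum summed LOD."""
    grid = [i * step for i in range(1, int(round(0.5 / step)) + 1)]
    scored = [(sum(family_lod(t, r, nr) for r, nr in families), t) for t in grid]
    max_lod, theta = max(scored)
    return theta, max_lod

# Hypothetical (recombinant, nonrecombinant) counts for three families
families = [(1, 5), (0, 4), (2, 8)]
theta, lod = best_theta(families)
print(round(theta, 2), round(lod, 2))
```

With phase-known data, the maximum lies at the pooled proportion of recombinants (here 3 of 20 meioses). Phase-unknown families would need the mixture likelihood instead.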
225. Nonparametric Linkage Analysis
23Approach
Parametric: genotype at marker locus and genotype at
trait locus (latter inferred from phenotype
according to a specific disease model). Parameter
of interest: θ between marker and trait loci.
Nonparametric: genotype at marker locus and
phenotype. If a trait locus truly regulates the
expression of a phenotype, then two relatives
with similar phenotypes should have similar
genotypes at a marker in the vicinity of the
trait locus, and vice-versa. Interest:
correlation between phenotypic similarity and
marker genotypic similarity.
No need to specify mode of inheritance, allele
frequencies, etc.
24Phenotypic similarity between relatives
Squared trait differences
Squared trait sums
Trait cross-product
Trait variance-covariance matrix
Affection concordance
[Figure: trait values of the two relatives in a pair, T1 and T2.]
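For a single pair of relatives with trait values T1 and T2, the first three measures listed above can be computed directly. The sketch below assumes the traits are centred on a population mean before the sum and cross-product are formed. That centring and the function name are assumptions, not something the slide states.

```python
def pair_similarity(t1, t2, mean=0.0):
    """Pairwise phenotypic (dis)similarity measures for two relatives."""
    d1, d2 = t1 - mean, t2 - mean
    return {
        "squared_difference": (d1 - d2) ** 2,  # small when the pair is similar
        "squared_sum": (d1 + d2) ** 2,         # large when both are extreme, same side
        "cross_product": d1 * d2,              # positive when on the same side of the mean
    }

print(pair_similarity(1.0, -0.5))
```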
25Genotypic similarity between relatives
IBS: alleles shared Identical By State "look the
same" and may have the same DNA sequence, but they
are not necessarily derived from a known common
ancestor.
IBD: alleles shared Identical By Descent are
a copy of the same ancestral allele.
[Figure: parents M1Q1/M2Q2 and M3Q3/M3Q4; sibs M1Q1/M3Q3 and M1Q1/M3Q4. At marker M the sibs share 2 alleles IBS but only 1 allele IBD. Inheritance vector (M): 0 0 0 1.]
26Genotypic similarity between relatives
[Figure: three sib pairs from parents M1Q1/M2Q2 and M3Q3/M3Q4.]
Sib 1        Sib 2        Inheritance vector (M)   Number of alleles shared IBD   Proportion of alleles shared IBD
M2Q2/M3Q4    M1Q1/M3Q3    1 1 0 0                  0                              0
M1Q1/M3Q3    M1Q1/M3Q4    0 0 0 1                  1                              0.5
M1Q1/M3Q3    M1Q1/M3Q3    0 0 0 0                  2                              1
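The link between an inheritance vector and IBD sharing for a sib pair can be sketched as follows. The bit order used here (paternal then maternal transmission for sib 1, then the same for sib 2, with 0 and 1 naming the two alleles of each parent) is an assumed encoding chosen for illustration.

```python
def ibd_sharing(vector):
    """Number and proportion of alleles shared IBD by a sib pair.

    vector = (pat1, mat1, pat2, mat2): which of the father's and
    mother's two alleles each sib received (0 or 1).
    """
    pat1, mat1, pat2, mat2 = vector
    n_ibd = int(pat1 == pat2) + int(mat1 == mat2)
    return n_ibd, n_ibd / 2

for v in [(1, 1, 0, 0), (0, 0, 0, 1), (0, 0, 0, 0)]:
    print(v, ibd_sharing(v))
```

The three vectors correspond to sib pairs sharing 0, 1 and 2 alleles IBD, i.e. proportions 0, 0.5 and 1.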
27Genotypic similarity between relatives
[Figure: pedigrees A, B, C and D, annotated 2^(2n).]
28Practical
Aim
(1) Estimate IBD with MERLIN
(2) IBD estimation can be influenced by genotyped individuals and allele frequencies
(3) compute
H:\manuel - Copy folder Linkage to C:\
1. Open with Notepad: pr1.ped, pr1.dat, pr1.map, pr1.freq
2. Start > Run > C:/Linkage/pfe32.exe
3. Run Command Prompt
4. Keep a File Explorer window open
Exercise 1
(1) Estimate IBD for pedigrees A, B and C in the
previous slide
(2) Change allele frequencies (pr1.freq) from
0.25 0.25 0.25 0.25 to (i) 0.45 0.25 0.25 0.05
and (ii) 0.05 0.25 0.25 0.45
29Practical
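[Figure: pedigree with parents A1A2 and A3A4 and offspring A1A3, A1A3 and A2A4.]
Exercise 2
(1) Modify pr1.ped and estimate IBD probabilities
and the proportion of alleles shared IBD between
twin 1 and twin 2 for pedigrees E, F and G
[Figure: pedigrees E, F and G, each with a twin pair genotyped A1A3 and A1A3; other genotyped members are A1A2 and A2A4.]
                  E      F      G
P(IBD=0)          0.08   0.00   0.00
P(IBD=1)          0.31   0.20   0.00
P(IBD=2)          0.61   0.80   1.00
Proportion IBD    0.77   0.90   1.00
Allele frequencies in pr1.freq: 0.25 0.25 0.25
0.25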
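The last row of values on this slide is consistent with the expected proportion of alleles shared IBD, computed from the three IBD probabilities as ½ P(IBD=1) + P(IBD=2). A minimal sketch, with a hypothetical function name:

```python
def proportion_ibd(p0, p1, p2):
    """Expected proportion of alleles shared IBD from P(IBD=0,1,2)."""
    if abs(p0 + p1 + p2 - 1.0) > 1e-6:
        raise ValueError("IBD probabilities must sum to 1")
    return 0.5 * p1 + p2

# IBD probabilities for the twin pairs in pedigrees E, F and G
for label, probs in [("E", (0.08, 0.31, 0.61)),
                     ("F", (0.00, 0.20, 0.80)),
                     ("G", (0.00, 0.00, 1.00))]:
    print(label, proportion_ibd(*probs))
```

Pedigree E gives 0.765, shown on the slide as 0.77.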
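30IBD at a marker: Singlepoint IBD
IBD at a grid (5 cM): Multipoint IBD
[Figure: markers M1, M2, ..., Mn along a chromosome, with IBD estimated either at each marker or at grid positions between them.]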
31Statistics that incorporate both phenotypic and
genotypic similarities
[Figure: phenotypic similarity plotted against genotypic similarity (0, 0.5, 1).]
32Haseman-Elston regression: Quantitative traits
[Figure: phenotypic dissimilarity plotted against genotypic similarity (0, 0.5, 1), with a fitted regression line labelled b and c.]
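A minimal sketch of the original Haseman-Elston idea: regress the squared trait difference of each sib pair on the pair's proportion of alleles shared IBD. A negative slope suggests linkage, because pairs sharing more alleles IBD near a QTL should differ less. The toy data and function name are made up for illustration, and no significance test is included.

```python
def haseman_elston(pairs):
    """Ordinary least squares of squared trait difference on IBD proportion.

    pairs: list of (trait_sib1, trait_sib2, proportion_ibd).
    Returns (intercept, slope).
    """
    x = [p for _, _, p in pairs]
    y = [(t1 - t2) ** 2 for t1, t2, _ in pairs]
    n = len(pairs)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical sib pairs: (trait 1, trait 2, proportion IBD)
toy_pairs = [(2.0, 0.0, 0.0), (-1.0, 1.0, 0.0),
             (0.5, 1.5, 0.5), (1.0, 0.0, 0.5),
             (0.3, 0.3, 1.0), (-0.7, -0.7, 1.0)]
intercept, slope = haseman_elston(toy_pairs)
print(round(intercept, 3), round(slope, 3))
```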
33VC ML method: Quantitative and Categorical traits
[Figure: fit under H1 and H0 plotted against genotypic similarity (0, 0.5, 1); e.g. LOD = 3.]
34Genome-wide linkage analysis (e.g. VC)
Individual LOD scores can be expressed as P
values (pointwise):
LOD    Chi-sq (n-df)    P value
2.1    9.67             0.0009
(Chi-sq = LOD x 4.6)
[Figure: genome-wide LOD score profile, with peaks labelled as true positive or Type I error; axis labels k and LOD.]
Theoretical (Lander & Kruglyak 1995):
LOD 3.6, Chi-sq 16.7, P 0.000022
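The LOD, chi-square and P values quoted on this slide can be approximately reproduced with the sketch below. It assumes a 1-df test whose P value is halved because the tested parameter is constrained to be non-negative. That reading matches the quoted numbers but is an assumption, as are the function names.

```python
import math

def lod_to_chisq(lod):
    """Chi-square equivalent of a LOD score: 2 * ln(10) * LOD (about LOD x 4.6)."""
    return 2.0 * math.log(10.0) * lod

def pointwise_p(lod):
    """One-sided P value for a 1-df chi-square (assumed halved tail area)."""
    chisq = lod_to_chisq(lod)
    # P(chi-square with 1 df > x) = erfc(sqrt(x / 2)); halve for a one-sided test
    return 0.5 * math.erfc(math.sqrt(chisq / 2.0))

for lod in (2.1, 3.6):
    print(lod, round(lod_to_chisq(lod), 2), pointwise_p(lod))
```

For tests with more degrees of freedom, the chi-square tail would need a different distribution function, which is not covered here.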
35Nonparametric Linkage Analysis - summary
No need to specify mode of inheritance
Models phenotypic and genotypic similarity of
relatives
Expression of phenotypic similarity, calculation
of IBD
HE and VC are the most popular statistics used
for linkage of quantitative traits
Other statistics are available, especially for
affection traits