Title: Linkage Analysis: An Introduction
1Linkage Analysis: An Introduction
- Pak Sham
- Twin Workshop 2001
2Linkage Mapping
- Compares inheritance pattern of trait with the inheritance pattern of chromosomal regions
- First gene-mapping in 1913 (Sturtevant)
- Uses naturally occurring DNA variation (polymorphisms) as genetic markers
- >400 Mendelian (single gene) disorders mapped
- Current challenge is to map QTLs
3Linkage = Co-segregation
[Figure: pedigree with marker genotypes A3A4, A1A2, A2A4, A1A3, A2A3, A1A2, A1A4, A3A4, A3A2]
Marker allele A1 cosegregates with dominant disease
4Recombination
Parental genotypes: A1 Q1 / A2 Q2
Likely gametes (non-recombinants): A1 Q1, A2 Q2
Unlikely gametes (recombinants): A1 Q2, A2 Q1
5Recombination of three linked loci
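[Figure: three linked loci, with recombination fractions $\theta_1$ and $\theta_2$ between adjacent loci]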
6Map distance
- Map distance between two loci (Morgans) = expected number of crossovers per meiosis
- Note: map distances are additive
7Recombination and map distance
Haldane map function
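The Haldane map function is usually written $\theta = \frac{1}{2}(1 - e^{-2m})$, with m the map distance in Morgans; it assumes no crossover interference. A minimal Python sketch of the function and its inverse follows. The function names and the two interval lengths are illustrative choices, not taken from the slides.

```python
import math

def haldane_theta(m):
    """Recombination fraction for a map distance m (in Morgans), Haldane's function."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))

def haldane_map_distance(theta):
    """Inverse: map distance in Morgans for a recombination fraction theta < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * theta)

# Two adjacent intervals of 10 cM and 20 cM (illustrative values)
m1, m2 = 0.10, 0.20
t1, t2 = haldane_theta(m1), haldane_theta(m2)

# Map distances add, so the outer loci are m1 + m2 apart.
t13 = haldane_theta(m1 + m2)

# Without interference, the outer loci recombine when exactly one interval does.
t13_no_interference = t1 * (1 - t2) + t2 * (1 - t1)

print(f"theta1={t1:.4f} theta2={t2:.4f} theta13={t13:.4f} sum={t1 + t2:.4f}")
```

The two routes to the outer recombination fraction agree, while the simple sum of the two recombination fractions overshoots: map distances are additive, recombination fractions are not.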
8Methods of Linkage Analysis
- Model-based lod scores
  - Assumes explicit trait model
- Model-free allele sharing methods
  - Affected sib pairs
  - Affected pedigree members
- Quantitative trait loci
  - Variance-components models
9Double Backcross: Fully Informative Gametes
Grandparents: AABB × aabb
Parents: AaBb × aabb
Offspring: Aabb, AaBb, aabb, aaBb
Non-recombinant: AaBb, aabb
Recombinant: Aabb, aaBb
10Linkage Analysis: Fully Informative Gametes
Count data: recombinant gametes R, non-recombinant gametes N
Parameter: recombination fraction $\theta$
Likelihood: $L(\theta) = \theta^R (1-\theta)^N$
Parameter estimate: $\hat{\theta} = R / (N + R)$
Chi-square: test of $\theta = 1/2$
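With fully informative gametes the likelihood is binomial, $\theta^R (1-\theta)^N$, so the estimate and test statistic follow directly from the two counts. The sketch below uses hypothetical counts (R = 2, N = 8) and reports the likelihood-ratio chi-square, which is one common choice and may differ from the statistic shown on the original slide.

```python
import math

def log10_likelihood(theta, R, N):
    """log10 of theta^R * (1 - theta)^N; terms with a zero count are skipped."""
    value = 0.0
    if R:
        value += R * math.log10(theta)
    if N:
        value += N * math.log10(1.0 - theta)
    return value

def mle_theta(R, N):
    """Maximum-likelihood estimate R / (R + N), capped at 1/2 by convention."""
    return min(R / (R + N), 0.5)

def lod(theta, R, N):
    """log10 likelihood ratio of theta against free recombination (theta = 1/2)."""
    return log10_likelihood(theta, R, N) - log10_likelihood(0.5, R, N)

R, N = 2, 8                                      # hypothetical counts
theta_hat = mle_theta(R, N)
max_lod = lod(theta_hat, R, N)
lr_chi_square = 2.0 * math.log(10.0) * max_lod   # likelihood-ratio statistic, 1 df

print(f"theta_hat={theta_hat:.2f} lod={max_lod:.3f} chi2={lr_chi_square:.2f}")
```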
11Phase Unknown Meioses
Parents: aabb × AaBb
Offspring: Aabb, AaBb, aabb, aaBb
Either: AaBb, aabb non-recombinant; Aabb, aaBb recombinant
Or: AaBb, aabb recombinant; Aabb, aaBb non-recombinant
12Linkage Analysis: Phase-unknown Meioses
Count data: recombinant gametes X, non-recombinant gametes Y; or recombinant gametes Y, non-recombinant gametes X
Likelihood: $L(\theta) \propto \theta^X (1-\theta)^Y + \theta^Y (1-\theta)^X$
An example of incomplete data: mixture distribution likelihood function
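When phase is unknown, the two possible phases are averaged, each with prior weight 1/2. The sketch below assumes a single phase-unknown parent with hypothetical counts X = 1 and Y = 9, maximises the lod over a grid of $\theta$ values, and compares it with the lod that the same counts would give if phase were known.

```python
import math

def phase_unknown_lod(theta, X, Y):
    """lod for one phase-unknown parent: the two phases are averaged with weight 1/2."""
    mixture = 0.5 * (theta**X * (1 - theta)**Y + theta**Y * (1 - theta)**X)
    return math.log10(mixture / 0.5 ** (X + Y))

def phase_known_lod(theta, R, N):
    """lod when the R recombinants and N non-recombinants can be counted directly."""
    return math.log10(theta**R * (1 - theta)**N / 0.5 ** (R + N))

X, Y = 1, 9                                  # hypothetical counts
grid = [i / 1000 for i in range(1, 501)]     # theta in (0, 0.5]
theta_hat = max(grid, key=lambda t: phase_unknown_lod(t, X, Y))

unknown = phase_unknown_lod(theta_hat, X, Y)
known = phase_known_lod(theta_hat, X, Y)
print(f"theta_hat={theta_hat:.3f} lod_unknown={unknown:.3f} lod_known={known:.3f}")
```

For tightly linked loci the second mixture term is negligible, so not knowing phase costs close to $\log_{10} 2 \approx 0.3$ lod units in this example.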
13Parental genotypes unknown
Offspring: Aabb, AaBb, aabb, aaBb
Likelihood will be a function of
- allele frequencies (population parameters)
- $\theta$ (transmission parameter)
14Trait phenotypes
Penetrance parameters

Genotype    Disease    Normal
AA          f2         1 - f2
Aa          f1         1 - f1
aa          f0         1 - f0

Each phenotype is compatible with multiple genotypes.
15General Pedigree Likelihood
Likelihood is a sum of products (mixture distribution likelihood)
Number of terms = $(m_1 m_2 \cdots m_k)^{2n}$, where $m_j$ is the number of alleles at locus j and n is the number of pedigree members
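To make the "sum of products" concrete, the sketch below enumerates every ordered genotype configuration for a father-mother-child trio at a single diallelic trait locus. It assumes Hardy-Weinberg proportions for the founders and Mendelian transmission; the allele frequency and penetrances are hypothetical. With one locus of 2 alleles and 3 individuals the sum has $2^{2 \times 3} = 64$ terms.

```python
from itertools import product

def trio_likelihood(affected, q, pen):
    """Likelihood of trio phenotypes by brute-force summation over genotypes.

    affected: (father, mother, child) booleans
    q:        frequency of the disease allele (coded 1)
    pen:      penetrances (f0, f1, f2), indexed by number of disease alleles
    Returns (likelihood, number of terms summed).
    """
    ordered_genotypes = list(product((0, 1), repeat=2))   # (paternal, maternal)
    freq = {0: 1.0 - q, 1: q}

    def p_pheno(is_affected, genotype):
        f = pen[sum(genotype)]
        return f if is_affected else 1.0 - f

    def p_transmit(allele, parent):
        # each of the parent's two alleles is passed on with probability 1/2
        return 0.5 * (parent[0] == allele) + 0.5 * (parent[1] == allele)

    total, n_terms = 0.0, 0
    for gf, gm, gc in product(ordered_genotypes, repeat=3):
        n_terms += 1
        founders = freq[gf[0]] * freq[gf[1]] * freq[gm[0]] * freq[gm[1]]
        transmission = p_transmit(gc[0], gf) * p_transmit(gc[1], gm)
        phenotypes = (p_pheno(affected[0], gf) * p_pheno(affected[1], gm)
                      * p_pheno(affected[2], gc))
        total += founders * transmission * phenotypes
    return total, n_terms

# Hypothetical dominant model with reduced penetrance and phenocopies
likelihood, n_terms = trio_likelihood((True, False, True), q=0.01, pen=(0.01, 0.8, 0.8))
print(n_terms, likelihood)
```

The term count grows exponentially with pedigree size, which is why direct summation is only practical for very small families.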
16Elston-Stewart algorithm
Reduces computations by peeling
Step 1: Condition likelihoods of family 1 on genotype of X.
Step 2: Joint likelihood of families 2 and 1
17Lod Score: Morton (1955)
Lod > 3 → conclude linkage
Prior odds × likelihood ratio = posterior odds: 1:50 × 1000 = 20:1
Lod < -2 → exclude linkage
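The lod score is $\log_{10}[L(\theta)/L(1/2)]$, so a lod of 3 corresponds to a likelihood ratio of 1000. The arithmetic behind the two thresholds can be sketched as follows, taking prior odds of linkage of 1:50 as on the slide. The helper name is illustrative, and it accepts integer lods only so that the result stays an exact fraction.

```python
from fractions import Fraction

def posterior_odds(prior_odds, lod):
    """Posterior odds of linkage = prior odds x likelihood ratio (10 ** lod).

    lod must be an integer here so that the arithmetic stays exact.
    """
    return prior_odds * Fraction(10) ** lod

prior = Fraction(1, 50)

accept = posterior_odds(prior, 3)     # lod = 3
exclude = posterior_odds(prior, -2)   # lod = -2

print(f"lod  3: posterior odds {accept.numerator}:{accept.denominator}")
print(f"lod -2: posterior odds {exclude.numerator}:{exclude.denominator}")
```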
18Linkage Analysis: Admixture Test
Model: probability of linkage in family = $\alpha$
Likelihood: $L(\alpha, \theta) = \alpha L(\theta) + (1-\alpha) L(\theta = 1/2)$
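Dividing each family's admixture likelihood by its likelihood under no linkage gives a heterogeneity lod that sums over families: $\sum_i \log_{10}[\alpha \, LR_i(\theta) + (1 - \alpha)]$. The sketch below is a grid search under simplifying assumptions: phase-known counts per family, hypothetical data (three families with no recombinants, three that look unlinked), and $\alpha$ used as the symbol for the proportion of linked families.

```python
import math

def family_lr(theta, R, N):
    """L(theta) / L(1/2) for R recombinants and N non-recombinants (phase known)."""
    return theta**R * (1 - theta)**N / 0.5 ** (R + N)

def hlod(alpha, theta, families):
    """Heterogeneity lod: each family is linked with probability alpha."""
    return sum(math.log10(alpha * family_lr(theta, R, N) + (1 - alpha))
               for R, N in families)

# Hypothetical (R, N) counts per family
families = [(0, 8)] * 3 + [(4, 4)] * 3

thetas = [i / 100 for i in range(1, 51)]   # 0.01 .. 0.50
alphas = [j / 20 for j in range(0, 21)]    # 0.00 .. 1.00

best_hlod, best_alpha, best_theta = max(
    (hlod(a, t, families), a, t) for a in alphas for t in thetas)
homogeneity_lod = max(hlod(1.0, t, families) for t in thetas)

print(f"admixture: hlod={best_hlod:.2f} alpha={best_alpha:.2f} theta={best_theta:.2f}")
print(f"homogeneity (alpha = 1): lod={homogeneity_lod:.2f}")
```

Because $\alpha = 1$ is part of the grid, the maximised heterogeneity lod can never fall below the homogeneity lod; the gap between the two is the evidence for admixture.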
19Allele sharing (non-parametric) methods
Penrose (1935): Sib pair linkage
For rare disease, IBD by pair type: concordant affected, concordant normal, discordant
Therefore: affected sib pair design
Test H0: proportion of alleles IBD = 1/2
20Affected sib pairs: incomplete marker information
Parameters: IBD sharing probabilities Z = (z0, z1, z2)
Marker genotype data: M
Finite mixture likelihood
Programs: SPLINK, ASPEX
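A finite mixture likelihood for affected sib pairs is commonly written $L(Z) = \prod_k \sum_{i=0}^{2} z_i \, P(M_k \mid \mathrm{IBD} = i)$, where k indexes pairs. The sketch below estimates Z by a plain EM iteration, taking the per-pair weights $P(M_k \mid \mathrm{IBD} = i)$ as given. The data are hypothetical, no "possible triangle" constraint is imposed, and this is not the algorithm of SPLINK or ASPEX.

```python
import math

NULL_Z = (0.25, 0.5, 0.25)   # IBD sharing probabilities for sibs under no linkage

def pair_likelihood(z, w):
    """sum_i z_i * P(M | IBD = i) for one sib pair."""
    return sum(zi * wi for zi, wi in zip(z, w))

def em_ibd_sharing(weights, n_iter=200):
    """EM estimate of (z0, z1, z2) from per-pair weights P(M | IBD = i)."""
    z = list(NULL_Z)
    for _ in range(n_iter):
        totals = [0.0, 0.0, 0.0]
        for w in weights:
            denom = pair_likelihood(z, w)
            for i in range(3):
                totals[i] += z[i] * w[i] / denom   # posterior P(IBD = i | M)
        z = [t / len(weights) for t in totals]
    return tuple(z)

def asp_lod(z, weights):
    """log10 likelihood ratio of z against the null sharing probabilities."""
    return sum(math.log10(pair_likelihood(z, w) / pair_likelihood(NULL_Z, w))
               for w in weights)

# Hypothetical data: 10 fully informative pairs sharing 0, 1, 2 alleles IBD
weights = [(1, 0, 0)] * 1 + [(0, 1, 0)] * 5 + [(0, 0, 1)] * 4

z_hat = em_ibd_sharing(weights)
print([round(z, 3) for z in z_hat], round(asp_lod(z_hat, weights), 3))
```

With fully informative markers the estimate reduces to the observed proportions; with partially informative markers each weight triple has several non-zero entries and the EM iterations do real work.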
21Joint distribution of Pedigree IBD
- IBD of relative pairs are not independent
- e.g. if IBD(1,2) = 2 and IBD(1,3) = 2
- then IBD(2,3) = 2
- Inheritance vector gives joint IBD distribution
- Each element indicates whether
  - paternally inherited allele is transmitted (1)
  - or maternally inherited allele is transmitted (0)
- → Vector of 2N elements (N = number of non-founders)
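The inheritance-vector idea can be checked by enumeration. The sketch below assumes a sibship of three (N = 3 non-founders, so 2N = 6 elements and 64 equally likely vectors) and an element layout chosen for illustration: positions 2i and 2i+1 record the father's and the mother's meiosis for sib i. Exact fractions show that any two sib pairs taken alone look independent, yet the three pairwise IBD values are not jointly independent.

```python
from fractions import Fraction
from itertools import product

N_SIBS = 3   # non-founders; the two parents are the founders

def ibd(vector, i, j):
    """Alleles shared IBD by sibs i and j: one per parental meiosis with equal bits."""
    father_match = vector[2 * i] == vector[2 * j]
    mother_match = vector[2 * i + 1] == vector[2 * j + 1]
    return int(father_match) + int(mother_match)

# Every inheritance vector of length 2N is equally likely at a single locus.
vectors = list(product((0, 1), repeat=2 * N_SIBS))

def prob(event):
    """Probability of an event defined on inheritance vectors."""
    return Fraction(sum(1 for v in vectors if event(v)), len(vectors))

p_single = prob(lambda v: ibd(v, 0, 1) == 2)
p_two = prob(lambda v: ibd(v, 0, 1) == 2 and ibd(v, 0, 2) == 2)
p_three = prob(lambda v: ibd(v, 0, 1) == 2 and ibd(v, 0, 2) == 2 and ibd(v, 1, 2) == 2)

print(len(vectors), p_single, p_two, p_three)
```

Here `p_three` equals `p_two`: once two of the pairs share both alleles, the third pair must as well.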
22Pedigree allele-sharing methods
- Problem
- APM: Affected family members. Uses IBS
- ERPA: Extended Relative Pairs Analysis. Dodgy statistic
- Genehunter NPL: Non-Parametric Linkage. Conservative
- Genehunter-PLUS: Likelihood (tilting)
- All these methods consider affected members only
23Convergence of parametric and non-parametric methods
- Curtis and Sham (1995)
  - MFLINK: treats penetrance as parameter
- Terwilliger et al (2000)
  - Complex recombination fractions
- Parameters with no simple biological interpretation
24Quantitative Sib Pair Linkage
X, Y: standardised to mean 0, variance 1
r: sib correlation
$V_A$: additive QTL variance
$\pi$: proportion of alleles shared IBD
Haseman-Elston Regression (1972)
$(X-Y)^2 = 2(1-r) - 2V_A(\pi - 0.5) + \varepsilon$
Haseman-Elston Revisited (2000)
$XY = r + V_A(\pi - 0.5) + \varepsilon$
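Both regressions can be checked by simulation. The sketch below assumes the sib correlation given IBD is $r + V_A(\pi - 0.5)$, draws standardised bivariate normal trait values, and regresses the squared difference and the cross-product on $\pi - 0.5$. The parameter values (r = 0.3, $V_A$ = 0.4), the seed, and the function names are illustrative.

```python
import math
import random

def simulate_sib_pairs(n_pairs, r, va, seed=1):
    """Simulate (pi, x, y) with standardised traits and IBD-dependent correlation."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_pairs):
        pi = rng.choice((0.0, 0.5, 0.5, 1.0))   # P(pi) = 1/4, 1/2, 1/4
        rho = r + va * (pi - 0.5)               # sib correlation given IBD
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        data.append((pi, x, y))
    return data

def ols(xs, ys):
    """Intercept and slope of the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

r, va = 0.3, 0.4
data = simulate_sib_pairs(200_000, r, va)
centred_pi = [pi - 0.5 for pi, _, _ in data]

# Original Haseman-Elston: squared difference; expect intercept 2(1 - r), slope -2 VA
sd_intercept, sd_slope = ols(centred_pi, [(x - y) ** 2 for _, x, y in data])
# Revisited: cross-product; expect intercept r, slope VA
cp_intercept, cp_slope = ols(centred_pi, [x * y for _, x, y in data])

print(f"squared difference: intercept={sd_intercept:.3f} slope={sd_slope:.3f}")
print(f"cross-product:      intercept={cp_intercept:.3f} slope={cp_slope:.3f}")
```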
25Improved Haseman-Elston
- Sham and Purcell (2001)
- Use as dependent variable
- Gives equivalent power to variance components
model for sib pair data
26Variance components linkage
- Models trait values of pedigree members jointly
- Assumes multivariate normality conditional on IBD
- Covariance between relative pairs
  - $\mathrm{Cov} = V r + V_A [\pi - E(\pi)]$
  - where V = trait variance
  - r = correlation (depends on relationship)
  - $V_A$ = QTL additive variance
  - $E(\pi)$ = expected proportion IBD
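For a single sib pair the variance-components likelihood is a bivariate normal density whose covariance depends on IBD sharing. The sketch below assumes zero means, a common variance V, and $E(\pi) = 0.5$ for sibs; the trait values and parameters are hypothetical and are evaluated at fixed values rather than maximised.

```python
import math

def sib_pair_covariance(pi, V, r, VA, expected_pi=0.5):
    """Trait covariance for a relative pair: V*r + VA*(pi - E(pi))."""
    return V * r + VA * (pi - expected_pi)

def bivariate_normal_loglik(x, y, V, cov):
    """Log density of (x, y) with zero means, common variance V and covariance cov."""
    det = V * V - cov * cov
    quad = (V * x * x - 2.0 * cov * x * y + V * y * y) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

def pair_loglik(x, y, pi, V, r, VA):
    return bivariate_normal_loglik(x, y, V, sib_pair_covariance(pi, V, r, VA))

# A concordant high-scoring pair sharing both alleles IBD (hypothetical values)
x, y, pi = 1.5, 1.5, 1.0
linked = pair_loglik(x, y, pi, V=1.0, r=0.3, VA=0.4)
unlinked = pair_loglik(x, y, pi, V=1.0, r=0.3, VA=0.0)

print(f"log-likelihood with QTL: {linked:.3f}, without: {unlinked:.3f}")
```

Summing such terms over pairs (or using the full multivariate normal for larger pedigrees) and maximising over the parameters gives the linkage test; with $V_A = 0$ the likelihood no longer depends on $\pi$.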
27QTL linkage model for sib-pair data
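[Figure: path diagram for a sib pair. Latent factors N, Q, S with path coefficients n, q, s act on each phenotype (PT1, PT2); cross-sib correlations are labelled 1 and 0 / 0.5 / 1]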
28No linkage
29Under linkage
30Incomplete Marker Information
- IBD sharing cannot be deduced from marker genotypes with certainty
- Obtain probabilities of all possible IBD values
- Finite mixture likelihood
- Pi-hat likelihood
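The two ways of handling uncertain IBD can be compared for a single sib pair. In this sketch, which assumes standardised traits and a sib correlation of $r + V_A(\pi - 0.5)$, the mixture likelihood weights the density at each possible IBD value by its probability, while the pi-hat likelihood evaluates the density once at $\hat{\pi} = \frac{1}{2} p_1 + p_2$. All numerical values are hypothetical.

```python
import math

def loglik_given_pi(x, y, pi, r, VA):
    """Bivariate normal log density, unit variances, correlation r + VA*(pi - 0.5)."""
    rho = r + VA * (pi - 0.5)
    det = 1.0 - rho * rho
    quad = (x * x - 2.0 * rho * x * y + y * y) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

def mixture_loglik(x, y, ibd_probs, r, VA):
    """log of sum_i P(IBD = i | markers) * f(x, y | pi = i/2)."""
    return math.log(sum(p * math.exp(loglik_given_pi(x, y, i / 2.0, r, VA))
                        for i, p in enumerate(ibd_probs)))

def pi_hat_loglik(x, y, ibd_probs, r, VA):
    """Log density evaluated at the expected IBD proportion pi-hat."""
    pi_hat = 0.5 * ibd_probs[1] + ibd_probs[2]
    return loglik_given_pi(x, y, pi_hat, r, VA)

x, y = 1.2, 0.9
uncertain = (0.1, 0.3, 0.6)   # P(IBD = 0, 1, 2 | marker genotypes)
certain = (0.0, 0.0, 1.0)

print(mixture_loglik(x, y, uncertain, 0.3, 0.4), pi_hat_loglik(x, y, uncertain, 0.3, 0.4))
print(mixture_loglik(x, y, certain, 0.3, 0.4), pi_hat_loglik(x, y, certain, 0.3, 0.4))
```

The two agree when IBD is known with certainty or when $V_A = 0$, and differ otherwise.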
31QTL linkage model for sib-pair data
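[Figure: path diagram for a sib pair. Latent factors N, Q, S with path coefficients n, q, s act on each phenotype (PT1, PT2); one cross-sib correlation is labelled 1]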
32Conditioning on Trait Values
Usual test
Conditional test
Zi: IBD probability estimated from marker genotypes
Pi: IBD probability given relationship
33QTL linkage: some problems
- Sensitivity to misspecification of marker allele frequencies and positions
- Sensitivity to non-normality / phenotypic selection
- Heavy computational demand for large pedigrees or many marker loci
- Sensitivity to marker genotype and relationship errors
- Low power and poor localisation for minor QTL