Title: Introduction to Linkage Analysis
1Introduction to Linkage Analysis
- Pak Sham
- Twin Workshop 2003
2Human Genome
- 22 autosomes, XY
- ≈ 3 × 10^9 base-pairs (2 metres long)
- ≈ 2% coding sequences; the rest is regulatory or "junk"
- ≈ 30,000 - 40,000 genes
- Much commonality with other species
3Genetic Variation
- Chromosomal abnormalities
- Duplication (e.g. Down's syndrome)
- Deletion (e.g. Velo-cardio-facial syndrome)
- Major deleterious mutations
- Usually rare (e.g. Huntington's disease)
- Polymorphisms
- Single nucleotide polymorphisms (SNPs)
- Variable length repeats (e.g. microsatellites)
- Some are functional (normal variation)
- Most are non-functional (neutral markers)
4Genetic Mapping of Disease
- Levels of Genetic Analysis
- Estimate heritability (family, twins, adoption)
- Find chromosomal locations (linkage)
- Identify risk variants (association)
- Understand mechanisms (cell biology, etc)
- Applications
- Prediction of genetic risk
- More accurate prediction of genetic risk
- Even more accurate prediction of genetic risk
- Prediction of prognosis and treatment response
- Development of new drug targets
5Strategies of Gene Mapping
- Functional
- Uses knowledge of disease to identify candidate genes
- Finds variants in candidate genes
- Looks for association between variants and disease
- Positional
- Systematic screen of whole genome
- Uses a set of ≈ 400 evenly-spaced markers
- Looks for markers which co-segregate with disease
6Co-segregation
[Pedigree diagram with marker genotypes: A3A4, A1A2, A2A4, A1A3, A2A3, A1A2, A1A4, A3A4, A3A2]
Marker allele A1 co-segregates with dominant disease
7Linkage ⇒ Co-segregation
[Diagram: Parent, Gametes]
Alleles on the same chromosome tend to stay together in meiosis; therefore they tend to be co-transmitted.
8Crossing over between homologous chromosomes
9Map Distance
- Map distance between two loci (Morgans)
- Expected number of crossovers per meiosis
- (1 Morgan = 100 centiMorgans)
- Note: Map distances are additive
- Heterogeneity in recombination frequencies
- Total map length ≈ 33 Morgans
- (1 cM ≈ 10^6 base pairs)
10Recombination
Parental genotype: A1 Q1 / A2 Q2
Non-recombinant gametes (probability 1 − θ): A1 Q1, A2 Q2
Recombinant gametes (probability θ): A1 Q2, A2 Q1
11Recombination Fraction
- Recombination fraction (θ) between two loci
- Proportion of gametes that are recombinant with respect to the two loci
12Recombination and map distance
Haldane map function: θ = ½ (1 − e^(−2m)), where m is the map distance in Morgans
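The conversion between map distance and recombination fraction can be sketched in a few lines. This is a minimal illustration of Haldane's map function, which assumes crossovers occur without interference; the function names are chosen here for illustration.

```python
import math

def haldane_theta(m):
    """Recombination fraction for a map distance m (in Morgans)."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))

def haldane_distance(theta):
    """Inverse map function: distance in Morgans for 0 <= theta < 0.5."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * theta)

# Recombination fraction approaches 1/2 as distance grows.
for cm in (1, 10, 50, 100, 300):
    print(cm, "cM ->", round(haldane_theta(cm / 100.0), 4))
```

For short distances θ is close to m (1 cM gives θ ≈ 0.01), while for distant loci θ flattens out near 1/2, which is why recombination fractions, unlike map distances, are not additive.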
13Double Backcross Fully Informative Gametes
Grandparents: AABB × aabb
Parents: AaBb × aabb
Offspring: Aabb, AaBb, aabb, aaBb
Non-recombinant offspring: AaBb, aabb
Recombinant offspring: Aabb, aaBb
14Linkage Analysis: Fully Informative Gametes
Count data: recombinant gametes R, non-recombinant gametes N
Parameter: recombination fraction θ
Likelihood: L(θ) ∝ θ^R (1 − θ)^N
Estimation: θ̂ = R / (R + N)
Chi-square: χ² = (N − R)^2 / (N + R), testing θ = 1/2
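As a rough illustration of the counting approach, the sketch below estimates θ from recombinant and non-recombinant counts and evaluates the log10 likelihood ratio against θ = 1/2. The counts (2 recombinants, 8 non-recombinants) are made up for the example, and capping the estimate at 1/2 is an assumption of this sketch.

```python
import math

def theta_mle(r, n):
    """Maximum-likelihood estimate of theta, capped at 0.5."""
    return min(r / (r + n), 0.5)

def log10_likelihood(theta, r, n):
    """log10 of theta^r * (1 - theta)^n, with 0^0 treated as 1."""
    total = 0.0
    if r > 0:
        if theta == 0.0:
            return -math.inf
        total += r * math.log10(theta)
    if n > 0:
        if theta == 1.0:
            return -math.inf
        total += n * math.log10(1.0 - theta)
    return total

def lod(theta, r, n):
    """log10 likelihood ratio of theta versus free recombination (0.5)."""
    return log10_likelihood(theta, r, n) - log10_likelihood(0.5, r, n)

r, n = 2, 8                      # hypothetical counts
t_hat = theta_mle(r, n)
print(t_hat, round(lod(t_hat, r, n), 3))
```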
15Phase Unknown Meioses
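Parents: aabb × AaBb (phase of AaBb unknown)
Offspring: Aabb, AaBb, aabb, aaBb
Either: AaBb and aabb are non-recombinant, Aabb and aaBb are recombinant
Or: AaBb and aabb are recombinant, Aabb and aaBb are non-recombinant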
16Mixture distribution likelihood
The probability of the observed data X depends on the status of a discrete variable G: P(X | G)
The status of G is not observed, but the probability distribution of G is available: P(G)
Then the likelihood of the observed data X is
P(X) = Σ_G P(X | G) P(G)
17Linkage Analysis: Phase-unknown Meioses
Count data: either recombinant gametes X and non-recombinant gametes Y, or recombinant gametes Y and non-recombinant gametes X
Likelihood: L(θ) ∝ θ^X (1 − θ)^Y + θ^Y (1 − θ)^X
An example of incomplete data: mixture distribution likelihood function
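The phase-unknown likelihood can be sketched as an equal-weight mixture over the two possible phases. The 1/2 weight on each phase is an assumption of this sketch (it corresponds to the two phases being equally likely a priori), and the counts used are hypothetical.

```python
import math

def phase_unknown_lod(theta, x, y):
    """lod for one phase-unknown parent.

    Under one phase there are x recombinants and y non-recombinants;
    under the other phase the counts are swapped.
    """
    lik = 0.5 * (theta**x * (1 - theta)**y + theta**y * (1 - theta)**x)
    null = 0.5 ** (x + y)        # likelihood at theta = 1/2
    return math.log10(lik / null)

# Five offspring, all of one class under either phase (hypothetical data).
print(round(phase_unknown_lod(0.0, 0, 5), 3))    # phase unknown
print(round(math.log10(2 ** 5), 3))              # same data, phase known
```

With phase unknown, the maximum lod here is log10(16) rather than log10(32): one meiosis's worth of information is spent on inferring the phase.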
18Parental genotypes unknown
Offspring: Aabb, AaBb, aabb, aaBb
Likelihood will be a function of
- allele frequencies (population parameters)
- θ (transmission parameter)
19Complex Phenotypes
Penetrance parameters
Genotype   P(Disease)   P(Normal)
AA         f2           1 − f2
Aa         f1           1 − f1
aa         f0           1 − f0
Each phenotype is compatible with multiple genotypes.
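To make the last point concrete, the sketch below computes the probability of each genotype given a phenotype by Bayes' rule. It assumes Hardy-Weinberg genotype frequencies for a population allele frequency p, which is an assumption added here rather than something stated on the slide; the parameter values are illustrative only.

```python
def genotype_posterior(p, penetrances, affected=True):
    """P(genotype | phenotype) at a diallelic locus.

    p            -- frequency of allele A (Hardy-Weinberg proportions assumed)
    penetrances  -- (f0, f1, f2) = P(disease | aa), P(disease | Aa), P(disease | AA)
    """
    q = 1.0 - p
    prior = {"aa": q * q, "Aa": 2 * p * q, "AA": p * p}
    pen = dict(zip(("aa", "Aa", "AA"), penetrances))
    joint = {g: prior[g] * (pen[g] if affected else 1.0 - pen[g]) for g in prior}
    total = sum(joint.values())
    return {g: v / total for g, v in joint.items()}

# Hypothetical dominant model with phenocopies and incomplete penetrance.
print(genotype_posterior(0.01, (0.01, 0.8, 0.8), affected=True))
```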
20General Pedigree Likelihood
Likelihood is a sum of products (mixture distribution likelihood)
Number of terms = (m1 m2 … mk)^(2n), where mj is the number of alleles at locus j and n is the number of pedigree members
21Elston-Stewart algorithm
Reduces computations by peeling
Step 1: Condition likelihoods of family 1 on genotype of X.
Step 2: Joint likelihood of families 2 and 1
22Lod Score: Morton (1955)
lod(θ) = log10 [ L(θ) / L(θ = 1/2) ]
Lod > 3 ⇒ conclude linkage
Prior odds × linkage (likelihood) ratio = posterior odds
1:50 × 1000 = 20:1
Lod < −2 ⇒ exclude linkage
23Lod Score Curves
[Plot: lod score (vertical axis) against θ from 0 to 0.5 (horizontal axis)]
Lod score curves are additive over pedigrees
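Additivity follows because independent pedigrees multiply their likelihoods, so their log10 likelihood ratios add. The sketch below sums per-family lod curves on a grid of θ values; it assumes fully informative meioses, and the family counts are invented for illustration.

```python
import math

def lod(theta, r, n):
    """lod for r recombinants and n non-recombinants, 0 < theta < 1."""
    return (r * math.log10(theta) + n * math.log10(1.0 - theta)
            - (r + n) * math.log10(0.5))

# Hypothetical (recombinant, non-recombinant) counts for three pedigrees.
families = [(1, 5), (0, 4), (2, 6)]
thetas = [i / 100 for i in range(1, 51)]          # 0.01 ... 0.50

# Total curve = pointwise sum of the per-family curves.
total_curve = [sum(lod(t, r, n) for r, n in families) for t in thetas]
best = max(range(len(thetas)), key=total_curve.__getitem__)
print("max lod", round(total_curve[best], 3), "at theta =", thetas[best])
```

The maximum of the summed curve sits near the pooled estimate 3/18, even though the individual families peak at different values of θ.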
24Lods, chi-squares and p-values
In large samples:
2 × loge(10) × Max lod ~ χ² with 1 df
In small samples:
P ≤ 10^(−Max lod)
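The two conversions can be sketched with the standard library alone. The chi-square tail probability for 1 degree of freedom is computed through the complementary error function; the function names are chosen for this example.

```python
import math

def lod_to_chisq(max_lod):
    """Large-sample chi-square statistic: 2 * ln(10) * max lod."""
    return 2.0 * math.log(10.0) * max_lod

def chisq1_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

def small_sample_bound(max_lod):
    """Upper bound on the p-value: 10 ** (-max lod)."""
    return 10.0 ** (-max_lod)

for z in (1.0, 2.0, 3.0):
    x = lod_to_chisq(z)
    print(z, round(x, 3), chisq1_pvalue(x), small_sample_bound(z))
```

A maximum lod of 3 corresponds to a chi-square of about 13.8, and the large-sample p-value is smaller than the small-sample bound of 0.001.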
25Problems with parametric linkage
- Requires parameters of the disease model to be specified
- Allele frequency
- Penetrances
- These are generally unknown for a complex trait
- Disease model assumes that a single locus is the only source of familial resemblance
- This is generally unrealistic
26Linkage Analysis: Admixture Test (CAB Smith)
Model: Probability of linkage in family = α
Likelihood: L(α, θ) = α L(θ) + (1 − α) L(θ = 1/2)
Note: Another example of mixture likelihood
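A sketch of the admixture likelihood across several families follows. It assumes fully informative meioses so that each family contributes simple recombinant and non-recombinant counts, and it maximises over a coarse grid; the data layout, counts, and function names are hypothetical.

```python
import math

def admixture_log10_lik(alpha, theta, families):
    """Sum over families of log10[alpha * L(theta) + (1 - alpha) * L(1/2)]."""
    total = 0.0
    for r, n in families:
        linked = theta**r * (1.0 - theta)**n
        unlinked = 0.5 ** (r + n)
        total += math.log10(alpha * linked + (1.0 - alpha) * unlinked)
    return total

def hlod(alpha, theta, families):
    """log10 likelihood ratio against no linkage in any family."""
    return (admixture_log10_lik(alpha, theta, families)
            - admixture_log10_lik(0.0, 0.5, families))

# Hypothetical data: two families look linked, two look unlinked.
families = [(0, 6), (0, 5), (3, 3), (2, 2)]
alphas = [i / 20 for i in range(21)]              # 0.00 ... 1.00
thetas = [i / 40 for i in range(1, 21)]           # 0.025 ... 0.50
best = max(((hlod(a, t, families), a, t) for a in alphas for t in thetas),
           key=lambda item: item[0])
print("max hlod %.3f at alpha=%.2f, theta=%.3f" % best)
```

The grid includes α = 1, so the maximised admixture lod can never fall below the ordinary lod computed under homogeneity.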
27Linkage Analysis MOD
- Maximise lod score over several sets of disease models, e.g. dominant, recessive, additive
- Make correction for multiple (k) models
- Adjusted lod = lod − log10(k)
28Allele sharing (non-parametric) methods
Penrose (1935): Sib Pair linkage
For rare disease, IBD by pair type: concordant affected, concordant normal, discordant
Therefore the affected sib pair (ASP) design is efficient
Test H0: Proportion of alleles IBD = 1/2
HA: Proportion of alleles IBD > 1/2
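One simple way to carry out this one-sided test is the so-called mean test, sketched below. It assumes IBD status is known with certainty for every pair and uses a normal approximation; under H0 the proportion of alleles shared by a sib pair has mean 1/2 and variance 1/8. The pair counts are invented.

```python
import math

def asp_mean_test(n0, n1, n2):
    """One-sided mean test for affected sib pairs.

    n0, n1, n2 -- numbers of pairs sharing 0, 1 and 2 alleles IBD.
    Returns (mean proportion IBD, z statistic, one-sided p-value).
    """
    n = n0 + n1 + n2
    pi_bar = (0.5 * n1 + n2) / n
    z = (pi_bar - 0.5) / math.sqrt(1.0 / (8.0 * n))
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))   # P(Z > z)
    return pi_bar, z, p_value

print(asp_mean_test(20, 50, 30))   # hypothetical counts for 100 pairs
```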
29Correlation between IBD of two loci
- For sib pairs
- Corr(πA, πB) = (1 − 2θAB)^2
- ⇒ attenuation of linkage signal with increasing genetic distance from disease locus
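The attenuation is easy to tabulate. This small sketch evaluates the sib-pair IBD correlation at a few recombination fractions; the function name is chosen for illustration.

```python
def ibd_correlation(theta):
    """Correlation between sib-pair IBD proportions at two loci."""
    return (1.0 - 2.0 * theta) ** 2

for theta in (0.0, 0.05, 0.1, 0.2, 0.3, 0.5):
    print(theta, round(ibd_correlation(theta), 3))
```

At θ = 0.1 the correlation is already down to 0.64, and it vanishes for unlinked loci (θ = 0.5).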
30Joint distribution of Pedigree IBD
- IBD of relative pairs are not independent
- e.g. if IBD(1,2) = 2 and IBD(1,3) = 2 then IBD(2,3) = 2
- Inheritance vector gives joint IBD distribution
- Each element indicates whether
- paternally inherited allele is transmitted (1)
- or maternally inherited allele is transmitted (0)
- ⇒ Vector of 2N elements (N = number of non-founders)
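For the simplest case of a nuclear family, IBD sharing between sibs can be read straight off the inheritance vector. The sketch below assumes all non-founders are full sibs of the same two parents and that the vector stores, for each child in turn, the bit for the father's meiosis followed by the bit for the mother's meiosis; this layout and the example vector are assumptions of the sketch.

```python
import itertools

def sib_ibd(vector, i, j):
    """Number of alleles shared IBD (0, 1 or 2) by full sibs i and j (0-based)."""
    pat_i, mat_i = vector[2 * i], vector[2 * i + 1]
    pat_j, mat_j = vector[2 * j], vector[2 * j + 1]
    return int(pat_i == pat_j) + int(mat_i == mat_j)

v = (1, 0, 1, 1, 0, 1)            # hypothetical vector for three sibs
print([sib_ibd(v, i, j) for i, j in ((0, 1), (0, 2), (1, 2))])

# Pairwise IBD values are not independent: over all 64 vectors,
# IBD(1,2) = 2 and IBD(1,3) = 2 always force IBD(2,3) = 2.
forced = all(sib_ibd(w, 1, 2) == 2
             for w in itertools.product((0, 1), repeat=6)
             if sib_ibd(w, 0, 1) == 2 and sib_ibd(w, 0, 2) == 2)
print(forced)
```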
31Inheritance Vector An Example
Ordered genotype notation: 1st allele paternally inherited, 2nd allele maternally inherited
[Pedigree diagram with ordered genotypes 1/2, 3/4, 2/3, 1/3, 1/4, 2/4]
Inheritance vector: (1, 1, 1, 0, 1, 0)
32Pedigree allele-sharing methods
- APM: Affected Pedigree Members; uses IBS
- very sensitive to allele frequency mis-specification
- less powerful than IBD-based methods
- NPL: Non-Parametric Linkage (Genehunter)
- Conservative at positions between markers
- LRT: Delta parameter (Genehunter, Allegro)
- All these methods consider affected members only
33Variance Components Linkage
- Models trait values of pedigree members jointly
- Assumes multivariate normality conditional on IBD
- Covariance between relative pairs
- = Vr + VQ [π − E(π)]
- Where V = trait variance
- r = correlation (depends on relationship)
- VQ = QTL additive variance
- E(π) = expected proportion IBD
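The covariance model can be illustrated for sib pairs, where the expected proportion IBD is 1/2. The parameter values below (V = 1, r = 0.4, VQ = 0.2) are invented for the example.

```python
def pair_covariance(v, r, vq, pi, expected_pi):
    """Trait covariance for a relative pair: V*r + VQ*(pi - E(pi))."""
    return v * r + vq * (pi - expected_pi)

# Sib pairs sharing 0, 1 or 2 alleles IBD at the putative QTL.
for pi in (0.0, 0.5, 1.0):
    print(pi, round(pair_covariance(1.0, 0.4, 0.2, pi, 0.5), 3))
```

Pairs sharing more alleles IBD at a linked locus are expected to be more alike, and the size of that gradient is what identifies VQ.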
34Path Diagram for Sib-Pair QTL model
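[Path diagram: latent variables Q, S, N with path coefficients q, s, n for each of the two sib phenotypes PT1 and PT2; cross-sib correlations labelled 1 and 0 / 0.5 / 1]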
35Incomplete Marker Information
- IBD sharing cannot always be deduced from marker genotypes with certainty
- Obtain probabilities of IBD values (Z0, Z1, Z2)
- Finite mixture likelihood
- Pi-hat likelihood
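The two approaches differ in where the IBD uncertainty enters. In the sketch below, the finite mixture weights the likelihood under each IBD state by its probability, whereas the pi-hat approach first collapses the probabilities into an estimated proportion IBD, taken here as Z1/2 + Z2, and evaluates a single likelihood at that value. The numbers and function names are hypothetical, and the conditional likelihoods are supplied directly rather than derived from trait data.

```python
def pi_hat(z0, z1, z2):
    """Estimated proportion of alleles IBD from the IBD probabilities."""
    return 0.5 * z1 + z2

def mixture_likelihood(z, lik_given_ibd):
    """Sum over IBD states j of Z_j * L(data | IBD = j)."""
    return sum(zj * lj for zj, lj in zip(z, lik_given_ibd))

z = (0.1, 0.5, 0.4)                  # hypothetical (Z0, Z1, Z2) for one pair
lik_given_ibd = (0.2, 0.3, 0.5)      # hypothetical L(data | IBD = 0, 1, 2)
print(pi_hat(*z), mixture_likelihood(z, lik_given_ibd))
```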
36Pi-hat Model
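[Path diagram: latent variables Q, S, N with path coefficients q, s, n for each of the two sib phenotypes PT1 and PT2; one cross-sib correlation labelled 1]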
37Parametric / Allele Sharing
Parametric
Trait Data
Marker Data
Allele sharing
IBD sharing