Title: Linkage disequilibrium LD mapping
1Linkage disequilibrium (LD) mapping
- Also looks at common inheritance, but in populations of unrelated individuals
  - No pedigrees required
- Fine mapping with dense markers, at least every 60 kb
  - Beyond this distance loci are generally in linkage equilibrium
- Also called association mapping
- Typically used to obtain a finer-grained map after coarse mapping by linkage analysis
3Quantifying association with contingency tables
[Table: 2x2 contingency table of observed (o) and expected (e) counts, with row totals R1, R2 and column totals C1, C2]
4Quantifying association with contingency tables
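[Table: worked example of observed (o) and expected (e) counts. The extracted cell values are 50, 50, 10, 90; their positions in the table were not preserved.]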
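The contingency-table slides can be illustrated with a short Python sketch of the Pearson chi-square statistic, where each expected count is (row total x column total) / grand total. The cell layout used here (rows 50, 50 and 10, 90) is an assumption, since the table structure was lost; the function name `chi_square_2x2` is hypothetical.

```python
def chi_square_2x2(observed):
    """Return (expected counts, Pearson chi-square) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in observed]        # R1, R2
    col_totals = [sum(col) for col in zip(*observed)]  # C1, C2
    n = sum(row_totals)                                # grand total
    # Expected count under independence: e_ij = R_i * C_j / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return expected, chi2


# ASSUMPTION: this arrangement of the four extracted values is illustrative only.
observed = [[50, 50],
            [10, 90]]
expected, chi2 = chi_square_2x2(observed)
print(expected)         # [[30.0, 70.0], [30.0, 70.0]]
print(round(chi2, 2))   # 38.1
```

Under this assumed layout the statistic is far above 3.84, the 5% critical value for one degree of freedom, so the two classifications would be judged associated.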
5Linkage and LD analysis in tandem
Figure 12.2 from Primrose and Twyman
6LD mapping elucidates our evolutionary origins
- In Northern European populations, LD extends for 60 kb
- In a Nigerian African population, LD extends for 5 kb, a much shorter distance
- What do we conclude from these findings?
7Resources for genetic mapping
- CEPH (Centre d'Etude du Polymorphisme Humain): large database of pedigrees/nuclear families
- dbSNP: database of SNPs found in the human genome
- Successfully mapped: Type I diabetes, cystic fibrosis, breast cancer (BRCA1 and BRCA2), Crohn's disease, etc.
8Haplotype mapping
- A haplotype is a pattern of SNPs in a contiguous stretch of DNA
- Due to linkage disequilibrium, SNPs are typically inherited in discrete haplotype blocks spanning 10-100 kb
- This greatly simplifies LD analysis: rather than screen all SNPs in a region, we just need to screen a few, and the rest can be inferred
- A complete human haplotype map is still underway
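As a rough illustration of why a few SNPs can stand in for a whole block, the sketch below groups biallelic SNPs whose alleles are perfectly correlated across a set of haplotypes (identical or exactly complementary columns). One SNP per group would then suffice as a tag. The haplotypes, the 0/1 allele coding, and the function name `tag_snp_groups` are all hypothetical, and real tag-SNP selection tolerates imperfect correlation.

```python
def tag_snp_groups(haplotypes):
    """Group SNP positions that carry identical information across haplotypes.

    Each haplotype is a string of '0'/'1' alleles, one character per SNP.
    Two SNPs fall in the same group when their allele columns are identical
    or exact complements, i.e. they are in perfect LD in this sample.
    """
    n_snps = len(haplotypes[0])
    groups = {}
    for i in range(n_snps):
        column = tuple(h[i] for h in haplotypes)
        flipped = tuple("1" if a == "0" else "0" for a in column)
        key = min(column, flipped)  # same key for a column and its complement
        groups.setdefault(key, []).append(i)
    return sorted(groups.values())


# Hypothetical block of five haplotypes typed at four SNPs.
haplotypes = ["0101", "1010", "0101", "1010", "0110"]
print(tag_snp_groups(haplotypes))  # [[0, 1], [2, 3]]
```

Here typing SNP 0 and SNP 2 would be enough to infer the alleles at SNPs 1 and 3 in this sample.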
9Example haplotype map
Figure 12.4 from Primrose and Twyman
10Map functions
- A genetic map function M gives a relation r = M(d) connecting recombination fractions r and genetic map distances d.
- The simplest map function (used by T. H. Morgan) is r = d.
- However, this function applies only if the chance of multiple crossovers is negligible.
- For d > 0.1 (distances > 10 cM), multiple crossovers do occur, leading to r < d.
- This happens because even numbers of crossovers cancel each other out to produce parental types, not recombinants.
11Haldane's Map Function
- Haldane's function gives a correction using Poisson statistics.
- Denote the distribution of crossover points in the interval d by (p_0, p_1, p_2, p_3, ...), where p_k is the probability that exactly k crossovers occur within the interval. Assume this is Poisson.
- The recombination fraction is the probability of an odd number of crossovers:
  - r = p_1 + p_3 + p_5 + ...
- While the map length is the expected number of crossovers:
  - d = p_1 + 2 p_2 + 3 p_3 + ...
12The Poisson function
[Figure: Poisson distributions for d = 1, d = 4, and d = 10]
13Haldane's Map Function
- The recombination fraction is the probability of an odd number of crossovers:
  - r = p_1 + p_3 + p_5 + ...
- If the number of crossovers in the interval follows a Poisson distribution:
  - p_k = e^{-d} d^k / k!, with E(k) = d
- Solving for r we obtain:
  - r = e^{-d} d + e^{-d} d^3 / 3! + ... = (1/2)(1 - e^{-2d})
- Rearranging as a function of r: d = -(1/2) ln(1 - 2r)
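A minimal Python sketch of Haldane's map function and its inverse follows, with d in Morgans (so 10 cM is d = 0.1). The function names are illustrative. The third function sums the odd Poisson terms directly, as a numerical check that the series matches the closed form.

```python
import math


def haldane_r(d):
    """Recombination fraction r for map distance d (Morgans): r = (1 - e^{-2d}) / 2."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def haldane_d(r):
    """Map distance d (Morgans) for recombination fraction r: d = -ln(1 - 2r) / 2."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return -0.5 * math.log(1.0 - 2.0 * r)


def odd_crossover_probability(d, terms=50):
    """Sum p_1 + p_3 + p_5 + ... for Poisson(d) crossover counts (truncated series)."""
    return sum(
        math.exp(-d) * d ** k / math.factorial(k)
        for k in range(1, 2 * terms, 2)
    )


for d in (0.01, 0.1, 0.5, 1.0):
    # r stays close to d for small d and falls below it as d grows
    print(d, round(haldane_r(d), 4), round(odd_crossover_probability(d), 4))
```

For small d the two values printed per row stay close to d itself, which is Morgan's approximation r = d; as d grows, r levels off below 0.5.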
14Genetic phase
- Haplotype: alleles received by an individual from one parent
- Phase: for a doubly heterozygous individual A/a B/b, whether the A allele was received in the same haplotype as the B or the b allele
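[Figure: haplotype diagrams for the PHASE KNOWN and PHASE UNKNOWN cases, with haplotypes labeled A B and a b. In the phase-unknown case two arrangements are shown as alternatives ("could be ... or ...").]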
15An example: Morgan's Fly Experiments
- One gene affects eye color (pr, purple, and pr+, red). The other affects wing length (vg, vestigial, and vg+, normal).
- Morgan crossed pr/pr vg/vg flies with pr+/pr+ vg+/vg+ and then testcrossed the doubly heterozygous F1 females: pr+/pr vg+/vg (female) x pr/pr vg/vg (male).
- Because one parent (the tester) contributes gametes carrying only recessive alleles, the phenotypes of the offspring reveal the gametic contribution of the other, doubly heterozygous parent.
16The test cross format
P: pr/pr vg/vg x pr+/pr+ vg+/vg+
F1: pr+/pr vg+/vg
Testcross: pr+/pr vg+/vg (F1) x pr/pr vg/vg (tester)
17Reverse phase experiment
P: pr+/pr+ vg/vg x pr/pr vg+/vg+
F1: pr+/pr vg/vg+
Testcross: pr+/pr vg/vg+ (F1) x pr/pr vg/vg (tester)
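The recombination fraction from a testcross like the ones above can be sketched in Python as the share of offspring in the non-parental classes. The counts below are hypothetical (the slides do not give Morgan's numbers), wild-type alleles are written as pr+ and vg+, and the function name is illustrative.

```python
def recombination_fraction(counts, parental_classes):
    """Fraction of testcross offspring whose phenotype class is non-parental."""
    total = sum(counts.values())
    recombinants = sum(
        n for phenotype, n in counts.items() if phenotype not in parental_classes
    )
    return recombinants / total


# HYPOTHETICAL offspring counts, keyed by the gamete received from the F1 parent.
counts = {"pr+ vg+": 450, "pr vg": 430, "pr+ vg": 60, "pr vg+": 60}

# F1 in coupling phase (pr+ vg+ / pr vg): the two large classes are parental.
coupling = {"pr+ vg+", "pr vg"}
print(recombination_fraction(counts, coupling))   # 0.12

# Scoring the same offspring against the wrong phase flips the estimate.
repulsion = {"pr+ vg", "pr vg+"}
print(recombination_fraction(counts, repulsion))  # 0.88
```

The second call shows why phase matters: the same offspring counts give an estimate of 0.88 instead of 0.12 if the parental classes are assigned incorrectly.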
18Another example showing the importance of phase
information
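[Figure: three-generation X-linked pedigree. Legend: No Disease; Colorblind; Colorblind + Hemophilia. Genotype labels, in extraction order:
  first pair (individuals 1, 2): HC/Y, HC/hc?
  second pair (individuals 1, 2): HC/Y, HC/hc
  offspring (individuals 1-6): HC, hc/Y, Hc/Y, HC/Y, hc/Y, hc/Y, HC/Y]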
What is the genetic distance between these genes?
Could this computation be done without the
grandparents?
19SNPs and Pharmacogenomics
- Pharmacogenomics refers to the complete list of genes that determine the overall efficacy and toxicity of a drug
- Tries to account for all genes that influence
  - Drug metabolism
  - Drug transport/export
  - Receptors
  - Signaling pathways, etc.
- Your genotype would allow a physician to determine the optimal dose and medication for therapy
- Pharmaceutical companies are spending a lot of money to discover clinically relevant SNPs
21Population Genetics 101: Measuring Genetic Variation
- Hardy-Weinberg equilibrium (HWE)
  - Genotype frequencies depend only on gene frequencies
  - p_A = frequency of allele A
  - p_B = frequency of allele B
  - P(A/A) = p_A^2, P(B/B) = p_B^2, P(A/B) = 2 p_A p_B
  - p_A + p_B = 1
  - p_A^2 + 2 p_A p_B + p_B^2 = 1
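The Hardy-Weinberg relations above can be checked with a few lines of Python. This is a sketch for a two-allele locus; the allele frequency 0.6 is an arbitrary illustrative value and the function name is hypothetical.

```python
def hwe_genotype_frequencies(p_a):
    """Expected genotype frequencies at a two-allele locus under HWE."""
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    p_b = 1.0 - p_a  # p_A + p_B = 1
    return {
        "A/A": p_a ** 2,
        "A/B": 2.0 * p_a * p_b,
        "B/B": p_b ** 2,
    }


freqs = hwe_genotype_frequencies(0.6)  # illustrative allele frequency
for genotype, f in freqs.items():
    print(genotype, round(f, 2))
print(round(sum(freqs.values()), 10))  # the three frequencies sum to 1
```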
22Population Genetics 101: Measuring Genetic Variation
- Observed vs. expected heterozygosity
  - Ho = observed fraction of heterozygous individuals
  - He = expected fraction based on allele frequencies
- The frequency f(X) of allele X is the fraction of times it occurs among all allele copies at the locus (2 per individual)
- He = 1 - the probability of homozygosity
  - He = 1 - f^2(X) - f^2(Y) - ... for all alleles (X, Y, ...)
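As a sketch of the two heterozygosity measures, the Python below computes Ho and He from a list of diploid genotypes at one locus. The five genotypes (microsatellite lengths in bp) are hypothetical and the function name is illustrative.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (Ho, He) for diploid genotypes given as (allele1, allele2) pairs."""
    n = len(genotypes)
    # Ho: observed fraction of individuals carrying two different alleles
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies over all 2n allele copies
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    total_copies = 2 * n
    # He: 1 minus the expected probability of homozygosity
    he = 1.0 - sum((c / total_copies) ** 2 for c in allele_counts.values())
    return ho, he


# HYPOTHETICAL genotypes: microsatellite allele lengths in bp.
genotypes = [(100, 100), (100, 104), (104, 108), (108, 108), (100, 108)]
ho, he = heterozygosity(genotypes)
print(round(ho, 2), round(he, 2))  # 0.6 0.64
```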
23Example: 10 Unique Genotypes (in bp lengths of microsatellite)
Ho = 0.30, He = 0.69
- H near 1: high diversity
- H near 0: asexual mitotic reproduction
- Ho << He indicates selective pressure or non-random mating
24Components of the genetic model
- Components of the genetic model include inheritance pattern (dominant vs. recessive, sex-linked vs. autosomal), trait allele frequency (a common or rare disease?), and the frequency of new mutation at the trait locus.
- Another important component of the genetic model is the penetrance of the trait allele. Knowing the penetrance of the disease allele is crucial because it specifies the probability that an unaffected individual is unaffected because he's a non-gene carrier or because he's a non-penetrant gene carrier. The frequency of phenocopies is an important component, too.
- Rough estimates of the disease allele frequency and penetrance can often be obtained from the literature or from computer databases, such as Online Mendelian Inheritance in Man (http://www3.ncbi.nlm.nih.gov/Omim/). Estimates of the rate of phenocopies and new mutation are frequently guesses, included as a nuisance parameter in some cases to allow for the fact that these can exist.
- Linkage analysis is relatively robust to modest misspecification of the disease allele frequency and penetrance, but misspecification of whether the disease is dominant or recessive can lead to incorrect conclusions of linkage or non-linkage.
25Steps to linkage analysis
- In pedigrees in which the genetic model is known, linkage analysis can be broken down into five steps:
  1. State the components of the genetic model.
  2. Assign underlying disease genotypes given information in the genetic model.
  3. Determine putative linkage phase.
  4. Score the meiotic events as recombinant or non-recombinant.
  5. Calculate and interpret LOD scores.
- Let's take a look at each of these steps in detail.
26State the components of the model
- In this example, the disease allele will be assumed to be rare and to function in an autosomal dominant fashion with complete penetrance, and the disease locus will be assumed to have two alleles:
  - N (for normal or wild-type)
  - A (for affected or disease)
27Assign underlying disease genotypes
- The assumption of complete penetrance of the
disease allele allows all unaffected individuals
in the pedigree to be assigned a disease genotype
of NN. Since the disease allele is assumed rare,
the disease genotype for affected individuals can
be assigned as AN.
28Determine putative phase
- Individual II-1 has inherited the disease trait
together with marker allele 2 from his affected
father. Thus, the A allele at the disease locus
and the 2 allele at the marker locus were
inherited in the gamete transmitted to II-1.
Once the putative linkage phase (the disease
allele "segregates" with marker allele 2) has
been established, this phase can be tested in
subsequent generations.
29Score the meiotic events as recombinant (R) or non-recombinant (NR)
- There are four possible gametes from the affected parent II-1: N1, N2, A1, and A2. Based on the putative linkage phase assigned in step 3, gametes A2 and N1 are non-recombinant, and gametes A1 and N2 are recombinant.
30Calculate LOD scores
- In this example, the highest LOD score is -0.09 at θ = 0.40. At no value of θ is the LOD score positive, let alone > 3.0, so this pedigree has no evidence in favor of linkage between the disease and marker loci.
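For phase-known meioses, a LOD score at recombination fraction θ can be sketched in Python as log10 of the likelihood ratio θ^R (1 - θ)^NR / 0.5^(R + NR). The counts of 5 recombinant and 5 non-recombinant meioses below are an assumption: the pedigree itself is not reproduced here, and these counts were picked only because they give a maximum of about -0.09 at θ = 0.40 on the grid shown. The function name and the θ grid are likewise illustrative.

```python
import math


def lod_score(theta, recombinants, non_recombinants):
    """LOD = log10[ L(theta) / L(theta = 0.5) ] for phase-known meioses."""
    n = recombinants + non_recombinants
    likelihood_linked = theta ** recombinants * (1.0 - theta) ** non_recombinants
    likelihood_unlinked = 0.5 ** n
    if likelihood_linked == 0.0:
        return float("-inf")  # e.g. theta = 0 with at least one recombinant
    return math.log10(likelihood_linked / likelihood_unlinked)


# ASSUMPTION: 5 recombinant and 5 non-recombinant meioses (hypothetical counts).
R, NR = 5, 5
thetas = [0.0, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4]
for theta in thetas:
    print(theta, round(lod_score(theta, R, NR), 2))

best_theta = max(thetas, key=lambda t: lod_score(t, R, NR))
print(best_theta, round(lod_score(best_theta, R, NR), 2))  # 0.4 -0.09
```

With equal numbers of recombinants and non-recombinants the likelihood is highest at θ = 0.5, so every LOD score on the grid is negative, which matches the conclusion of no evidence for linkage.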