Introduction to linkage analysis

1
Introduction to linkage analysis
Harald H.H. Göring
Course: Study Design and Data Analysis for
Genetic Studies, Universidad del Zulia,
Maracaibo, Venezuela, 9-10 April 2005
2
Marker loci
  • There are many different types of polymorphisms,
    e.g.
  • single nucleotide polymorphism (SNP)
      AAACATAGACCGGTT
      AAACATAGCCCGGTT
  • microsatellite/variable number of tandem repeat
    (VNTR)
      AAACATAGCACACA--CCGGTT
      AAACATAGCACACACACCGGTT
  • insertion/deletion (indel)
      AAACATAGACCACCGGTT
      AAACATAG----CCGGTT
  • restriction fragment length polymorphism (RFLP)

3
Tracing chromosomal inheritance using marker
locus genotypes
4
Tracing chromosomal inheritance (fully
informative situation)
5
Linkage analysis: locus with known genotypes
6
Linkage analysis
  • In linkage analysis, one evaluates statistically
    whether or not the alleles at 2 loci co-segregate
    during meiosis more often than expected by
    chance. If the evidence of increased
    co-segregation is convincing, one generally
    concludes that the 2 loci are linked, i.e. are
    located on the same chromosome (syntenic loci).
    The degree of co-segregation provides an estimate
    of the proximity of the 2 loci, with near
    complete co-segregation for very tightly linked
    loci.

7
Let's step back to Mendel
8
One of Mendel's pea crosses
P1
Mendel's law of uniformity
F1
Mendel's law of independent assortment
F2
observed counts: 315 : 108 : 101 : 32
observed ratio: approximately 9 : 3 : 3 : 1
9
P1
Mendel's law of uniformity
F1
Mendel's law of segregation
F2: 25% : 50% : 25% (in expectation)
10
P1
Mendel's law of uniformity
F1
Mendel's law of segregation
F2: 25% : 50% : 25% (in expectation)
11
P1
Mendel's law of uniformity
F1
Mendel's law of independent assortment
F2 two-locus genotype classes (in expectation): 4 classes at 6.25% each, 4 classes at 12.5% each, 1 class at 25%
12
Co-segregation (due to linkage)
P1 generation (diploid): 1 a / 1 a and 2 b / 2 b
gametes (haploid): 1 a and 2 b
Mendel's law of uniformity
F1 generation (diploid): 1 a / 2 b and 1 a / 2 b
gametes (haploid): 1 a, 2 b and 1 a, 2 b
Mendel's law of segregation
F2 generation (diploid): 1 a / 1 a (25%), 1 a / 2 b (50%), 2 b / 2 b (25%)
13
Recombination
  • Recombination between 2 loci is said to have
    occurred if an individual received, from one
    parent, alleles (at these 2 loci) that originated
    in 2 different grandparents.

14
Who is a recombinant?
[Pedigree figure: 10 offspring labeled N (non-recombinant) or R (recombinant): N N N R N N N N R N]
15
Possible explanations for recombination
[Pedigree figure; genotype labels 1/1, 2/2, a/a, b/b; offspring labeled N, R, R, N]
  • I: different chromosomes
  • II: homologous recombination during meiosis
  • III: genotyping error
16
Recombination fraction
  • The recombination fraction between 2 loci is
    defined as the proportion of meioses resulting in
    a recombinant gamete. For loci on different
    chromosomes (or for loci far apart on the same,
    large chromosome), the recombination fraction is
    0.5. Such loci are said to be unlinked. For loci
    close together on the same chromosome, the
    recombination fraction is < 0.5. Such loci are
    said to be linked. The closer the loci, the
    smaller the recombination fraction, down to a
    minimum of 0.

17
Estimation of recombination fraction
[Pedigree figure: 10 offspring labeled N (non-recombinant) or R (recombinant): N N N R N N N N R N]
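With phase and genotypes fully known, the recombination fraction can be estimated by simple counting. The sketch below applies this to the ten meioses labeled on the slide (2 R among 10); the function name and the string encoding of the labels are illustrative assumptions.

```python
def estimate_recombination_fraction(labels: str) -> float:
    """Proportion of recombinant (R) meioses among all scored meioses."""
    scored = [m for m in labels.split() if m in ("N", "R")]
    if not scored:
        raise ValueError("no scored meioses")
    return scored.count("R") / len(scored)

# Labels as they appear on the slide: 8 non-recombinants, 2 recombinants.
theta_hat = estimate_recombination_fraction("N N N R N N N N R N")
print(theta_hat)  # 0.2
```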
18
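Missing phase information: who is a recombinant?
[Pedigree figure; genotype labels: 1/2, 3/3, a/b, c/c]
19
Missing phase and genotype information: who is a
recombinant?
[Pedigree figure; genotype labels: ?/?, 3/3, 1/2, c/c, a/b]
20
Missing phase and genotype information: who is a
recombinant?
[Pedigree figure; genotype labels: ?/?, ?/?, c/c, a/b]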
21
Likelihood
  • The likelihood of a hypothesis (e.g. specific
    parameter value(s)) on a given dataset,
    L(hypothesis | data), is defined to be proportional
    to the probability of the data given the
    hypothesis, P(data | hypothesis):
  • $L(\text{hypothesis} \mid \text{data}) = \text{constant} \times P(\text{data} \mid \text{hypothesis})$
  • Because of the proportionality constant, a
    likelihood by itself has no interpretation.
  • The likelihood ratio (LR) of 2 hypotheses is
    meaningful if the 2 hypotheses are nested (i.e.,
    one hypothesis is contained within the other)
  • Under certain conditions, maximum likelihood
    estimates are asymptotically unbiased and
    asymptotically efficient. Likelihood theory
    describes how to interpret a likelihood ratio.

22
Evaluating the evidence of linkage: lod score
The lod (logarithm of odds) score is defined as
the logarithm (to the base 10) of the likelihood
ratio of 2 hypotheses on a given dataset.
In linkage analysis, typically the different
hypotheses refer to different values of the
recombination fraction $\theta$:
$Z(\theta) = \log_{10} \left[ L(\theta) / L(\theta = 0.5) \right]$
23
Who is a recombinant?
[Pedigree figure: 10 offspring labeled N (non-recombinant) or R (recombinant): N N N R N N N N R N]
24
Example lod score calculation
(2 recombinants among 10 phase-known meioses)
$\theta$    $Z(\theta)$
0      $-\infty$
0.1    0.644
0.2    0.837
0.3    0.725
0.4    0.439
0.5    0
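The tabulated lod scores can be reproduced with a few lines of code. This is a minimal sketch that assumes 2 recombinants among 10 fully informative, phase-known meioses (the counts shown in the "Who is a recombinant?" pedigree); the function name is illustrative.

```python
import math

def lod_known_phase(theta: float, recombinants: int, meioses: int) -> float:
    """Z(theta) = log10[L(theta) / L(0.5)] for phase-known meioses."""
    nonrecombinants = meioses - recombinants
    likelihood = theta ** recombinants * (1.0 - theta) ** nonrecombinants
    null_likelihood = 0.5 ** meioses
    if likelihood == 0.0:
        # e.g. theta = 0 is impossible once a recombinant has been observed
        return float("-inf")
    return math.log10(likelihood / null_likelihood)

for theta in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"{theta:.1f}  {lod_known_phase(theta, 2, 10):.3f}")
```

The curve peaks at the counting estimate of 2/10 = 0.2, where the lod score is about 0.837.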
25
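Missing phase information: who is a recombinant?
[Pedigree figure; genotype labels: 1/2, 3/3, a/b, c/c]
26
Example lod score calculation (missing phase
information)
$P(\text{data} \mid \theta) = P(\text{phase 1}) \, P(\text{data} \mid \text{phase 1}, \theta) + P(\text{phase 2}) \, P(\text{data} \mid \text{phase 2}, \theta)$
$\theta$    $Z(\theta)$
0      $-\infty$
0.1    0.343
0.2    0.536
0.3    0.427
0.4    0.175
0.5    0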
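When phase is unknown, the likelihood averages over both possible phases. The sketch below assumes the two phases are equally likely a priori (probability 0.5 each) and that one phase implies 2 recombinants among 10 meioses while the other implies 8; under these assumptions it reproduces the tabulated values. The function name is illustrative.

```python
import math

def lod_unknown_phase(theta: float, recombinants: int, meioses: int) -> float:
    """Lod score averaging the likelihood over two equally likely phases."""
    k, n = recombinants, meioses
    phase1 = theta ** k * (1.0 - theta) ** (n - k)    # k recombinants
    phase2 = theta ** (n - k) * (1.0 - theta) ** k    # n - k recombinants
    likelihood = 0.5 * phase1 + 0.5 * phase2
    null_likelihood = 0.5 ** n                        # same under either phase
    if likelihood == 0.0:
        return float("-inf")
    return math.log10(likelihood / null_likelihood)

for theta in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"{theta:.1f}  {lod_unknown_phase(theta, 2, 10):.3f}")
```

Compared with the phase-known case, each lod score drops by roughly log10(2) = 0.301 for small recombination fractions, which is the price of not knowing the phase.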
27
Missing phase and genotype information: who is a
recombinant?
[Pedigree figure; genotype labels: ?/?, ?/?, c/c, a/b]
28
Example lod score calculation (missing phase and
genotype information)
Assuming 3 equally frequent alleles, i.e. P(1) =
P(2) = P(3) = 0.333:
$\theta$    $Z(\theta)$
0      -0.304
0.1    0.204
0.2    0.346
0.3    0.264
0.4    0.096
0.5    0
Assuming P(1) = 0.495, P(2) = 0.495, P(3) = 0.010:
$\theta$    $Z(\theta)$
0      -0.378
0.1    0.183
0.2    0.332
0.3    0.253
0.4    0.091
0.5    0
29
[Figure comparing three cases: known phase, known genotypes; unknown phase, known genotypes; unknown phase, unknown genotypes]
30
Interpretation of lod score
  • The traditional threshold for declaring evidence
    of linkage statistically significant is a lod
    score of 3, or a likelihood ratio of 1000:1,
    meaning the likelihood of linkage on the data is
    1000 times higher than the likelihood of no
    linkage on the data.
  • Asymptotically, a lod score of 3 has a point-wise
    significance level (p-value) of 0.0001. In other
    words, the probability of obtaining a lod score
    of at least this magnitude by chance is 0.0001.
  • Due to the many linkage tests being conducted as
    part of a genome-wide linkage scan, a lod score
    of 3 has a genome-wide significance level of
    about 0.05.

31
P-value
The p-value is defined as the probability of
obtaining an outcome at least as extreme as
observed by chance (i.e. when the null hypothesis
is true).
Example: Testing whether a coin is fair. H0:
P(head) = 0.5; H1: P(head) ≠ 0.5 (2-sided
alternative hypothesis). You observe 1 head out
of 10 coin tosses. The p-value then is the
probability of observing exactly 1 head in 10
trials (observed outcome), or 0 heads in 10 trials
(more extreme outcome), or 9 (equally extreme
outcome) or 10 (more extreme outcome) heads in 10
trials.
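The coin example can be worked through exactly. The sketch below sums the binomial probabilities of the four outcomes named above (0, 1, 9 and 10 heads); the function name is illustrative, and "at least as extreme" is taken to mean "at least as far from the expected count of n/2", which is an assumption that suits the symmetric null hypothesis P(head) = 0.5.

```python
from math import comb

def two_sided_binomial_p_value(observed: int, n: int) -> float:
    """Two-sided p-value for H0: P(head) = 0.5 in n tosses."""
    distance = abs(observed - n / 2)
    # Outcomes lying at least as far from n/2 as the observed count.
    extreme = [k for k in range(n + 1) if abs(k - n / 2) >= distance]
    return sum(comb(n, k) for k in extreme) / 2 ** n

p = two_sided_binomial_p_value(observed=1, n=10)
print(p)  # 22/1024 = 0.021484375
```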
32
P-value
The p-value is defined as the probability of
obtaining an outcome at least as extreme as
observed by chance (i.e. when the null hypothesis
is true).
Example: Testing whether 2 loci are linked. H0:
P(recombination) = 0.5; H1: P(recombination) < 0.5
(1-sided alternative hypothesis). You observe 0
recombinants and 10 non-recombinants in 10
informative meioses. The p-value then is the
probability of observing exactly 0 recombinants
in 10 trials (observed outcome; there is no more
extreme outcome).
33
Lod score
Example: Testing whether 2 loci are linked. H0:
P(recombination) = 0.5; H1: P(recombination) < 0.5
(1-sided alternative hypothesis). You observe 0
recombinants and 10 non-recombinants in 10
informative meioses. The p-value then is the
probability of observing exactly 0 recombinants
in 10 trials (observed outcome; there is no more
extreme outcome).
In the ideal case, 10 fully informative meioses
may suffice to obtain significant evidence of
linkage.
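The ideal case can be checked numerically. This sketch assumes all 10 meioses are fully informative and phase-known, so that the likelihood at a recombination fraction of 0 is 1; the function names are illustrative.

```python
import math

def p_value_zero_recombinants(n_meioses: int) -> float:
    """P(0 recombinants in n meioses) under H0: P(recombination) = 0.5."""
    return 0.5 ** n_meioses

def max_lod_zero_recombinants(n_meioses: int) -> float:
    """Lod score at theta = 0 when no recombinant is observed:
    log10[(1 - 0)^n / 0.5^n] = n * log10(2)."""
    return n_meioses * math.log10(2.0)

p = p_value_zero_recombinants(10)
z = max_lod_zero_recombinants(10)
print(f"p-value = {p:.6f}, lod = {z:.3f}")  # p-value = 0.000977, lod = 3.010
```

Ten meioses without a recombinant therefore just reach the traditional lod score threshold of 3.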
34
Lod score and significance level
lod score    (point-wise) p-value
0.588        0.05
1.175        0.01
2.000        0.001
3.000        0.0001
4.000        0.00001
5.000        0.000001
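The table can be approximated from asymptotic likelihood theory. The sketch below assumes the usual large-sample result for a one-sided test of a single parameter: 2 ln(10) times the lod score is compared with a chi-square distribution with 1 degree of freedom, and the tail probability is halved. The function name is illustrative, and the results agree with the table only up to its rounding.

```python
import math

def pointwise_p_value(lod: float) -> float:
    """Approximate one-sided asymptotic p-value for a lod score."""
    chi2 = 2.0 * math.log(10.0) * lod
    # For 1 degree of freedom, P(chi-square >= x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))

for lod in (0.588, 1.175, 2.0, 3.0, 4.0, 5.0):
    print(f"{lod:.3f}  {pointwise_p_value(lod):.2g}")
```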
35
Linkage analysis reduces multiple testing problem
  • Linkage analysis is so useful because it greatly
    reduces the multiple testing problem:
    3,000,000,000 bp of DNA are interrogated in 500
    independent linkage tests for human data. This is
    possible because a meiotic recombination event
    occurs on average only once every 100,000,000 bp.
  • No specification of prior hypotheses is therefore
    necessary, as all possible hypotheses can be
    screened.
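The link between the point-wise and the genome-wide significance level can be illustrated with a short calculation. The sketch assumes 500 independent tests, as stated above, each carried out at the point-wise level of 0.0001 that corresponds to a lod score of 3; the function name is illustrative.

```python
def genome_wide_significance(pointwise_p: float, n_tests: int) -> float:
    """Probability of at least one false positive among independent tests."""
    return 1.0 - (1.0 - pointwise_p) ** n_tests

alpha = genome_wide_significance(pointwise_p=0.0001, n_tests=500)
print(f"{alpha:.4f}")

# The simpler Bonferroni bound gives nearly the same number.
print(0.0001 * 500)
```

Both values are close to 0.05, which is why a lod score of 3 is commonly treated as genome-wide significant.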

36
Linkage analysis: trait locus with unknown
genotypes
37
Statistical gene mapping with trait phenotypes
38
Many different types of linkage methods
  • penetrance model-based linkage analysis
    (classical linkage analysis)
  • penetrance model-free linkage analysis
    (model-free or non-parametric linkage
    analysis)
  • affected sib-pair linkage analysis
  • affected relative-pair linkage analysis
  • regression-based linkage analysis
  • variance components-based linkage analysis

39
Variation with each linkage method
  • 2-point analysis vs. multiple 2-point analysis
    vs. multi-point analysis
  • exact calculation vs. approximation (e.g., MCMC)
  • qualitative trait vs. quantitative traits
  • rare simple mendelian diseases vs. common
    complex multifactorial diseases

40
Penetrance-model-based linkage analysis
41
Segregation analysis
In segregation analysis, one attempts to
characterize the mode of inheritance of a trait,
by statistically examining the segregation
pattern of the trait through a sample of related
individuals. In a way, heritability analysis is
a form of segregation analysis. In heritability
analysis, the analysis is not focused on
characterization of the segregation pattern per
se, but on quantification of inheritance assuming
a given mode of inheritance (such as, generally,
additivity/co-dominance).
42
Relationship between genotypes and phenotypes
(penetrances) at the ABO blood group locus
penetrance = P(phenotype given genotype)
            Phenotype (blood group)
Genotype    A    B    AB   O
A/A         1    0    0    0
A/B         0    0    1    0
A/O         1    0    0    0
B/B         0    1    0    0
B/O         0    1    0    0
O/O         0    0    0    1
43
Probability model correlating trait phenotypes
and trait locus genotypes: penetrances
penetrance = P(phenotype given genotype)
Ex.: fully-penetrant dominant disease without
phenocopies
               Phenotype
Genotype       unaffected   affected
+/+            1            0
D/+ or +/D     0            1
D/D            0            1
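A penetrance table is what allows unobserved trait locus genotypes to be inferred from phenotypes. The sketch below encodes the fully penetrant dominant model (writing the normal allele as "+") and applies Bayes' rule. The disease allele frequency of 0.001 and the use of Hardy-Weinberg proportions for the genotype priors are assumptions made for illustration only.

```python
# P(affected | genotype) for a fully penetrant dominant disease, no phenocopies.
PENETRANCE = {"+/+": 0.0, "D/+": 1.0, "D/D": 1.0}

def genotype_priors(q: float) -> dict:
    """Hardy-Weinberg genotype frequencies for disease allele frequency q."""
    p = 1.0 - q
    return {"+/+": p * p, "D/+": 2.0 * p * q, "D/D": q * q}

def genotype_posteriors(affected: bool, q: float) -> dict:
    """P(genotype | phenotype) via Bayes' rule."""
    priors = genotype_priors(q)
    joint = {
        g: priors[g] * (PENETRANCE[g] if affected else 1.0 - PENETRANCE[g])
        for g in priors
    }
    total = sum(joint.values())
    return {g: joint[g] / total for g in joint}

post_affected = genotype_posteriors(affected=True, q=0.001)    # hypothetical q
post_unaffected = genotype_posteriors(affected=False, q=0.001)
print(post_affected)
print(post_unaffected)
```

For a rare allele, an affected individual is almost certainly heterozygous and an unaffected individual is certainly +/+, so the phenotype acts almost like an observed genotype.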
44
Statistical gene mapping with trait
phenotypes: simple dominant inheritance model
45
Linkage analysis: trait locus (genotypes based on
assumed dominant inheritance model)
46
Example of multipoint lod score curve
Pseudoxanthoma elasticum
From: Le Saux et al. (1999) Pseudoxanthoma
elasticum maps to an 820 kb region of the p13.1
region of chromosome 16. Genomics 62:1-10
47
Genetic heterogeneity
[Figure with time axes illustrating four scenarios]
  • locus homogeneity, allelic homogeneity
  • locus homogeneity, allelic heterogeneity
  • locus heterogeneity, allelic homogeneity (at
    each locus)
  • locus heterogeneity, allelic heterogeneity (at
    each locus)
48
Pros and cons of penetrance-model-based linkage
analysis
  • + potentially very powerful (under suitable
    penetrance model)
  • + statistically well-behaved
  • - requires specification of penetrance model
  • - not powerful at all under unsuitable penetrance
    model

49
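Effects of model misspecification
[Pedigree figure, shown once per assumed model: parents with marker genotypes 1/2 and 3/4; offspring with marker genotypes 1/3, 1/4, 2/3. Under each model one parent is informative for linkage and the other is uninformative.]
dominant inheritance: P(aff. | D/D or D/+) = 1, P(aff. | +/+) = 0
  inferred trait genotypes: parents +/+ and D/+; offspring D/+, +/+, D/+
recessive inheritance: P(aff. | D/D) = 1, P(aff. | +/+ or D/+) = 0
  inferred trait genotypes: parents D/+ and D/D; offspring D/D, D/+, D/D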
50
Pros and cons of penetrance-model-based linkage
analysis
  • + potentially very powerful (under suitable
    penetrance model)
  • + statistically well-behaved
  • - requires specification of penetrance model
  • - not powerful at all under unsuitable penetrance
    model
  • - modeling flexibility limited
  • - computationally intensive

51
Mendelian vs. complex traits
  • simple mendelian disease
      • genotypes of a single locus cause disease
      • often little genetic (locus) heterogeneity
        (sometimes even little allelic heterogeneity)
      • little interaction between genotypes at different
        genes
      • often hardly any environmental effects
      • often low prevalence
      • often early onset
      • often clear mode of inheritance
      • good pedigrees for gene mapping can often be
        found
      • often straightforward to map
  • complex multifactorial disease
      • genotypes of a single locus merely increase risk
        of disease
      • genotypes of many different genes (and various
        environmental factors) jointly and often
        interactively determine the disease status
      • important environmental factors
      • often high prevalence
      • often late onset
      • no clear mode of inheritance
      • not easy to find good pedigrees for gene
        mapping
      • difficult to map

52
A quantitative trait is not necessarily complex
observed trait phenotypes
53
Fundamental problem in complex trait gene mapping
[Diagram labels: correlation to be detected; etiology given ascertainment; genetic distance (linkage, allelic association)]
54
Etiological complexity
[Diagram: the trait phenotype is influenced by gene 1, gene 2, gene 3, other gene(s), environmental factors 1, 2 and 3, and other environmental factor(s)]
55
How to improve power to detect correlations
between trait phenotypes and trait locus
genotypes?
etiology
56
How to simplify the etiological architecture?
  • choose tractable trait
  • Are there sub-phenotypes within trait?
  • age of onset
  • severity
  • combination of symptoms (syndrome)
  • endophenotype or biomarker vs. disease
  • quantitative vs. qualitative (discrete)
  • Dichotomizing quantitative phenotypes leads to
    loss of information.
  • simple/cheap measurement vs. uncertain/expensive
    diagnosis
  • not as clinically relevant, but with simpler
    etiology
  • given trait, choose appropriate study
    design/ascertainment protocol
  • study population
  • genetic heterogeneity
  • environmental heterogeneity
  • random ascertainment vs. ascertainment based on
    phenotype of interest
  • single or multiple probands
  • concordant or discordant probands
  • pedigrees with apparent mendelian inheritance?
  • inbred pedigrees?

57
Affected sib-pair linkage analysis
58
Identity-by-state (IBS) vs. identity-by-descent
(IBD)
If IBD then necessarily IBS (assuming absence of
mutation event). If IBS then not necessarily IBD
(unless a locus is 100% informative, i.e. has an
infinite number of alleles, each with
infinitesimally small allele frequency).
59
Probabilistic inference of IBD
[Figure: probabilities p of the numbers of alleles shared IBD and not shared IBD (NIBD) by a pair of relatives]
60
Rationale of affected sib-pair linkage analysis
  • A pair of sibs affected with the same disorder
    is expected to share the alleles at the trait
    locus/loci, and also alleles at linked loci,
    more often (> 50%) than a random pair of
    sibs (50%).

61
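Basic concept of affected sib pair linkage
analysis
62
Affected sib pair linkage analysis (mean test)
[Example pedigree: parents with marker genotypes 1/2 and 3/4; affected sibs with marker genotypes 1/3 and 1/4]
                           NIBD   IBD
counts in example ped.     1      1
total counts in dataset
Conditional on the fact that both sibs are
affected, test whether the proportion of alleles
shared IBD exceeds 1/2.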
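The mean test can be sketched as a one-sided binomial test on the total IBD and NIBD counts. The sketch assumes fully informative meioses, so that each parental allele comparison is shared IBD with probability 1/2 under the null hypothesis of no linkage, and it treats the comparisons as independent. The dataset totals (60 IBD, 40 NIBD) are hypothetical, and the function name is illustrative.

```python
from math import comb

def mean_test_p_value(ibd: int, nibd: int) -> float:
    """One-sided exact binomial p-value for H0: P(shared IBD) = 0.5."""
    n = ibd + nibd
    # Probability of observing at least `ibd` shared alleles among n comparisons.
    return sum(comb(n, k) for k in range(ibd, n + 1)) / 2 ** n

# The single example pedigree (1 IBD, 1 NIBD) carries no evidence by itself.
print(mean_test_p_value(1, 1))  # 0.75

# Hypothetical totals over a whole dataset of affected sib pairs.
print(mean_test_p_value(60, 40))
```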
63
Affected sib pair linkage analysis (mean test)
NIBD IBD
probability
counts in ex. 1 1
total counts
64
Penetrance-model based linkage analysis on
affected sib pair
65
Penetrance-model-based linkage analysis on
affected sib pair
assuming a rare recessive trait w/o phenocopies
66
Penetrance-based linkage analysis on affected sib
pair
(assuming a rare, recessive trait w/o
phenocopies)
67
Relationship of affected sib-pair linkage
analysis and penetrance-model-based linkage
analysis
For an affected sib-pair of unaffected parents,
affected sib-pair linkage analysis and
penetrance-model-based linkage analysis assuming
a rare recessive trait w/o phenocopies are
identical.
68
Penetrance-based linkage analysis on affected sib
pair
Assuming a rare, recessive trait w/o
phenocopies, the father is no longer
informative.
Penetrance-based linkage analysis is then no
longer equivalent to affected sib pair linkage
analysis.
69
Pseudo-marker analog of affected sib pair
linkage analysis (mean test)
pseudo-marker genotypes
70
Take home message regarding relationship of
penetrance-model-based and model-free
approaches to gene mapping
  • The perceived differences between
    penetrance-model based and many popular
    model-free methods are more related to the
    underlying study design than the statistical
    methodology.
  • A deterministic pseudo-marker genotype
    assignment algorithm can be used to mimic popular
    model-free approaches, allowing joint analysis
    of different data structures for linkage and/or
    LD in a framework identical to penetrance-based
    analysis.
  • These pseudo-marker statistics are generally
    better behaved and more powerful than their
    conventional model-free analogs.

71
Regression-based methods for linkage analysis of
quantitative traits
The basic rationale behind this approach (in its
various forms) is that pairs of individuals (of a
given relationship) with similar phenotypes are
expected to be more similar to each other
genetically at/near loci influencing the trait of
interest than pairs of relatives (of the same
relationship) who have dissimilar phenotypes. The
degree of phenotypic similarity therefore should
be reflected in the proportion of alleles that
individuals share IBD at/near trait loci.
72
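Haseman-Elston sib pair linkage test for
quantitative traits
[Figure: squared phenotypic difference between 2 sibs (D², vertical axis) plotted against the proportion of alleles shared IBD (horizontal axis: 0, 0.5, 1), with a regression line]
Statistical inference: Is the regression slope < 0?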
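The regression can be sketched with ordinary least squares. The sib-pair phenotypes and IBD proportions below are made-up illustrative data, and the function name is an assumption; a real analysis would also need a significance test for the slope.

```python
def haseman_elston_slope(ibd_proportions, sib_phenotypes):
    """OLS slope of squared sib-pair phenotype difference on IBD proportion."""
    x = list(ibd_proportions)
    y = [(a - b) ** 2 for a, b in sib_phenotypes]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical data: IBD proportion at a marker and the phenotypes of both sibs.
ibd = [0.0, 0.0, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0]
pairs = [(10, 14), (12, 7), (9, 12), (11, 8), (10, 13), (14, 12), (10, 11), (9, 9)]

slope = haseman_elston_slope(ibd, pairs)
print(slope)  # negative: pairs sharing more alleles IBD are more alike
```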
73
Variance components-based linkage analysis
74
Rationale of variance components-based linkage
analysis
  • The pattern of phenotypic similarity among
    pedigree members should be reflected by the
    pattern of IBD sharing among them at chromosomal
    loci influencing the trait of interest.

75
Variance components approach: multivariate normal
distribution (MVN)
In variance components analysis, the phenotype is
generally assumed to follow a multivariate normal
distribution, specified by:
  • the no. of individuals (in a pedigree), n
  • the n × n covariance matrix
  • the phenotype vector
  • the mean phenotype vector
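For reference, the standard multivariate normal density combines these four quantities as follows. The symbols are assumptions chosen here and are not necessarily the presenter's notation: $n$ is the number of individuals in the pedigree, $\boldsymbol{\Omega}$ the $n \times n$ covariance matrix, $\mathbf{y}$ the phenotype vector and $\boldsymbol{\mu}$ the mean phenotype vector.

```latex
f(\mathbf{y}) = (2\pi)^{-n/2} \, |\boldsymbol{\Omega}|^{-1/2}
  \exp\left[ -\tfrac{1}{2} (\mathbf{y} - \boldsymbol{\mu})^{\top}
  \boldsymbol{\Omega}^{-1} (\mathbf{y} - \boldsymbol{\mu}) \right]
```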
76
Modeling the resemblance among relatives
heritability analysis
linkage analysis
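A common way to write the two models is as a decomposition of the phenotypic covariance matrix. The formulation and all symbols below are assumptions based on the standard variance components literature, not taken from the slide: $\boldsymbol{\Phi}$ is the kinship matrix (so $2\boldsymbol{\Phi}$ holds the expected allele sharing), $\hat{\boldsymbol{\Pi}}$ the matrix of estimated IBD allele sharing at the tested chromosomal position, $\mathbf{I}$ the identity matrix, and $\sigma^2_g$, $\sigma^2_q$, $\sigma^2_e$ the polygenic, locus-specific and residual variances.

```latex
\text{heritability analysis:} \quad
  \boldsymbol{\Omega} = 2\boldsymbol{\Phi}\,\sigma^2_g + \mathbf{I}\,\sigma^2_e

\text{linkage analysis:} \quad
  \boldsymbol{\Omega} = \hat{\boldsymbol{\Pi}}\,\sigma^2_q
    + 2\boldsymbol{\Phi}\,\sigma^2_g + \mathbf{I}\,\sigma^2_e
```

Under this reading, linkage is tested by asking whether $\sigma^2_q > 0$, that is, whether the linkage model fits better than the heritability model.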
77
Matrix of estimated allele sharing among relatives
[Pedigree: parents P (marker genotype 1/2) and M (3/3); sibs S1, S2, S3 (each 1/3)]
      P     M     S1    S2    S3
P     1     0     0.5   0.5   0.5
M           1     0.5   0.5   0.5
S1                1     0.5   0.5
S2                      1     0.5
S3                            1

      P     M     S1    S2    S3
P     1     0     0.5   0.5   0.5
M           1     0.5   0.5   0.5
S1                1     0.75  0.75
S2                      1     0.75
S3                            1
78
Variance components-based lod score
79
Sample size requirements to detect linkage to a
QTL with a lod score of 3 and 80% power
80
Pros and cons of variance-components-based
linkage analysis
  • + no need to specify inheritance model
  • + robust to allelic heterogeneity at a locus
  • + modeling flexibility
  • + computationally feasible even on large
    pedigrees
  • - generally assumes additive inheritance model
  • - modeling restrictions
  • - not always well-behaved statistically
    (depending on phenotypic distribution and
    ascertainment)
  • - generally less powerful than
    penetrance-model-based linkage analysis under
    suitable model

81
Choice of covariates
Covariates ought to be included in the likelihood
model if they are known to influence the
phenotype of interest and if their own genetic
regulation does not overlap the genetic
regulation of the target phenotype. Typical
examples include sex and age. In the analysis
of height, information on nutrition during
childhood should probably be included during
analysis. However, known growth hormone levels
probably should not be.
82
Choice of covariates
83
Choice of covariates
84
Choice of covariates: special case of
treatment/medication
85
Before treatment/medication of affected
individuals
unaffected
affected
86
After (partially effective) treatment /
medication of affected individuals
apparent effect of covariate
unaffected
affected
87
Choice of covariates: special case of
treatment/medication
  • If medication is ineffective/partially effective,
    including treatment as a covariate is worse than
    ignoring it in the analysis.
  • If medication is very effective, such that the
    phenotypic mean of individuals after treatment is
    equal to the phenotypic mean of the population as
    a whole, then including medication as a covariate
    has no effect.
  • If medication is extremely effective, such that
    the phenotypic mean of individuals after
    treatment is better than the phenotypic mean of
    the population as a whole, then including
    medication as a covariate is better than ignoring
    it, but still far from satisfying.
  • Either censor individuals or, better, infer or
    integrate over their phenotypes before treatment,
    based on information on efficacy etc.

88
Two-point vs. multi-point linkage analysis
  • In linkage analysis, one always examines whether
    or not the alleles at 2 loci tend to co-segregate
    during meiosis.
  • In two-point linkage analysis, chromosomal
    inheritance is inferred from the observed trait
    phenotypes on the one hand (locus 1) and from a
    single (genotyped) marker locus on the other hand
    (locus 2).
  • In multi-point linkage analysis, chromosomal
    inheritance is inferred from the observed trait
    phenotypes on the one hand (locus 1) and from
    multiple (genotyped) marker loci on the other
    hand (locus 2).

89
Pros and cons of multi-point linkage analysis
  • + Genotypes at multiple markers contain at least
    as much and generally more information to infer
    chromosomal inheritance than genotypes at a
    single marker, resulting in greater power to
    detect linkage.
  • + The number of independent tests in genome-wide
    linkage analysis is somewhat reduced in
    multi-point linkage analysis vs. two-point
    linkage analysis.
  • - Multi-point linkage analysis requires knowledge
    of the genetic marker map (marker order and
    inter-marker recombination fractions). If this
    information is incorrect, power can be reduced
    and/or the false positive rate can be increased.
  • - Multi-point linkage analysis is more
    susceptible to genotyping errors.
  • - Multi-point linkage analysis typically assumes
    linkage equilibrium between markers. If this does
    not hold, power can be reduced and/or the false
    positive rate can be increased.
  • - Multi-point linkage analysis is computationally
    more demanding than two-point linkage analysis.

90
Genetic map vs. physical map
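[Figure: four markers m1, m2, m3, m4 shown on two maps]
recombination fractions between adjacent markers: $\theta_{12}$, $\theta_{23}$, $\theta_{34}$
genetic map: marker positions x1, x2, x3, x4 (in cM)
physical map: marker positions y1, y2, y3, y4 (in Mb)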
91
Genetic map distance vs. recombination fraction
Def. of recombination fraction: probability that
recombination takes place between 2 chromosomal
positions during meiosis. Recombination fractions
are not additive, i.e., for 3 loci and
recombination fractions $\theta_{12}$ and $\theta_{23}$,
$\theta_{13} \neq \theta_{12} + \theta_{23}$.
Def. of genetic map distance (Morgan, M):
distance in which 1 recombination event is
expected to take place or, equivalently, average
distance between recombination events.
1 centi-Morgan (cM) is equal to 1/100
Morgan. Genetic map distances are additive, i.e.
for 3 loci and map distances $x_{12}$ cM and $x_{23}$ cM,
$x_{13} = x_{12} + x_{23}$ cM.
Neither recombination fractions nor genetic map
distances are easily converted into physical map
distances.
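The contrast between the two scales can be made concrete with a map function. The sketch below uses Haldane's map function, which assumes no crossover interference; this choice is an assumption for illustration and is not named on the slide. The recombination fractions 0.1 and 0.2 are made-up example values.

```python
import math

def haldane_distance(theta: float) -> float:
    """Map distance in Morgans for recombination fraction theta (Haldane)."""
    return -0.5 * math.log(1.0 - 2.0 * theta)

def haldane_theta(distance_morgans: float) -> float:
    """Recombination fraction for a map distance in Morgans (Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))

theta12, theta23 = 0.1, 0.2          # hypothetical adjacent intervals
x12 = haldane_distance(theta12)
x23 = haldane_distance(theta23)
x13 = x12 + x23                      # map distances add
theta13 = haldane_theta(x13)         # recombination fractions do not

print(f"x13 = {100 * x13:.1f} cM")
print(f"theta13 = {theta13:.3f}, theta12 + theta23 = {theta12 + theta23:.3f}")
```

Under this map function the recombination fraction across both intervals is 0.26 rather than 0.30, because double recombinants between the outer loci look like non-recombinants.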
92
Why a genome-wide linkage scan may fail
  • The sample size is too small.
  • The marker genotypes are not sufficiently
    informative (low heterozygosity and/or large gaps
    in marker map).
  • There is no major gene.
  • The chosen analytical approach is unsuitable.
  • Bad luck!

93
A fairytale of 2 traits
94
Heritability estimates
trait A: 45-82%
trait B: 63-92%
95
Quantitative trait A (sample 1)
large, randomly ascertained pedigrees
no. of phenotyped individuals: 268
trait heritability estimate: 0.55
96
Quantitative trait B (sample 1)
large, randomly ascertained pedigrees
no. of phenotyped individuals: 324
trait heritability estimate: 0.88
97
Quantitative trait A (sample 1)
98
Quantitative trait A (samples 1--2)
99
Quantitative trait A (samples 1--3)
100
Quantitative trait A (samples 1--3 combined)
101
Quantitative trait B (sample 1)
102
Quantitative trait B (samples 1--2)
103
Quantitative trait B (samples 1--3)
104
Quantitative trait B (samples 1--4)
105
Quantitative trait B (samples 1--5)
106
Quantitative trait B (samples 1--6)
107
Quantitative trait B (samples 1--7)
108
Quantitative trait B (samples 1--8)
109
Quantitative trait B (samples 1--9)
110
quantitative trait A: lipoprotein A
(concentration in serum)
quantitative trait B: height (in adults)
111
Heritability of adult height (additive
heritability, adjusted for sex and age)
study        sample size   heritability estimate
TOPS         2199          0.78
FLS          705           0.83
GAIT         324           0.88
SAFHS        903           0.76
SAFDS        737           0.92
SHFS (AZ)    643           0.80
SHFS (DK)    675           0.81
SHFS (OK)    647           0.79
Jiri         616           0.63
total        7449
112
Polygenic or oligogenic ?
113
Height (9 samples)