Title: OVERVIEW
1OVERVIEW
- Elston-Stewart Algorithm
- Quantitative Trait Loci
- Non-parametric analysis
2Model specification in linkage analysis
- In linkage analysis, the parameter of primary
interest is q, the recombination fraction.
- This is the only parameter that appears in the
log-likelihood function.
- However, in order to deal with more complex
pedigrees and loci, it is helpful to introduce
other parameters in the framework of a general
model.
3Other parameters
- Penetrance parameters specify the relationship
between genotype and phenotype.
- Let g and x represent the vectors of observed
genotypes and phenotypes, respectively, over all
individuals.
- Then the penetrance function can be written
$P(x \mid g) = \prod_i P(x_i \mid g_i)$
- For example, consider disease allele D and normal
allele d, in which case four penetrance
parameters are required. These give the
conditional probabilities of disease given the
ordered genotypes DD, Dd, dD, dd. For autosomal
dominance, the parameters are (1,1,1,0).
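As a minimal sketch of the product form of the penetrance function, the snippet below multiplies per-individual terms for a binary (affected/unaffected) phenotype. The table uses the D/d allele labels from this slide; the function name `penetrance` and the boolean phenotype coding are assumptions made for illustration.

```python
from math import prod

# P(affected | genotype) for a fully penetrant autosomal dominant trait,
# with D the disease allele and d the normal allele (ordered genotypes).
PENETRANCE = {"DD": 1.0, "Dd": 1.0, "dD": 1.0, "dd": 0.0}


def penetrance(phenotypes, genotypes, table=PENETRANCE):
    """Return P(x | g) as the product of per-individual terms P(x_i | g_i).

    phenotypes: list of booleans (True = affected), one per individual.
    genotypes:  list of ordered genotype strings, one per individual.
    """
    terms = []
    for affected, genotype in zip(phenotypes, genotypes):
        p_affected = table[genotype]
        terms.append(p_affected if affected else 1.0 - p_affected)
    return prod(terms)


# An affected carrier together with an unaffected non-carrier is consistent.
print(penetrance([True, False], ["Dd", "dd"]))
# An unaffected carrier has probability zero under complete penetrance.
print(penetrance([False], ["Dd"]))
```

Reduced penetrance could be explored by replacing the 1.0 entries with smaller values.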
4Other parameters
- Transmission parameters give the probability of
inheriting a genotype given the genotypes of the
parents.
- Transmission is captured by the term
$p(g_i \mid g_{i,f}, g_{i,m})$, which is a function of the
recombination fraction q.
- Transmission parameters only apply to individuals
whose parents are included in the pedigree.
- Individuals whose parents are not included are
called founding members (founders). The
parameters which define the distribution of
genotypes in the founding members of the pedigree
are known as population parameters.
5The likelihood function
$G_i$ represents the set of all possible genotypes of
individual i of n. Individuals 1 .. p are
founders (parents). Individuals p+1 .. n are
non-founders (children). The likelihood function is
prohibitive to evaluate directly due to the huge number of
products and sums. However, the Elston-Stewart
algorithm gives a method for aggregating terms in
the calculation to reduce the number of products
and sums required.
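The formula itself did not survive in this transcript. One standard way of writing a pedigree likelihood that matches the description above is sketched below. It is an assumption, not the recovered slide equation. In particular, $P(g_i)$ for the founder genotype probabilities is notation introduced here.

```latex
L = \sum_{g_1 \in G_1} \cdots \sum_{g_n \in G_n}
    \prod_{i=1}^{n} P(x_i \mid g_i)
    \prod_{i=1}^{p} P(g_i)
    \prod_{i=p+1}^{n} p(g_i \mid g_{i,f}, g_{i,m})
```

Under this form each term of the sum is a product of n penetrance factors, p founder factors, and n - p transmission factors, that is, 2n probabilities in total.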
6Complexity analysis
It is not hard to see that the number of terms in
this expression is enormous! For two loci, with
$m_1$ and $m_2$ alleles respectively, this corresponds to $m_1 m_2$
haplotypes and $(m_1 m_2)^2$ possible ordered genotypes
for an individual. There are $(m_1 m_2)^{2n}$ genotype
combinations over n individuals. Therefore, the
likelihood function is a sum over $(m_1 m_2)^{2n}$ terms,
each term being a product of 2n probabilities.
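To get a feel for these counts, the following sketch evaluates them for a small case. The function name `likelihood_size` and the example values (2 and 4 alleles, 10 individuals) are hypothetical and chosen only for illustration.

```python
def likelihood_size(m1, m2, n):
    """Count the pieces of the unfactored two-locus likelihood.

    m1, m2: number of alleles at the two loci.
    n:      number of individuals in the pedigree.
    Returns (haplotypes, genotypes per individual, terms in the sum,
    probabilities multiplied in each term).
    """
    haplotypes = m1 * m2
    genotypes = haplotypes ** 2        # ordered pairs of haplotypes
    terms = genotypes ** n             # one term per genotype combination
    factors_per_term = 2 * n
    return haplotypes, genotypes, terms, factors_per_term


# Hypothetical example: 2 and 4 alleles, 10 individuals.
print(likelihood_size(2, 4, 10))
```

Even this modest pedigree gives 2**60 (about 1.15e18) terms, which is why direct evaluation is impractical.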
7Elston-Stewart Algorithm
Elston and Stewart suggested a simple recursion
for grouping terms of the likelihood function
which greatly reduces the number of additions and
multiplications. A simple example of how this
works is as follows. Examine this equation (how
many operations?)
Compared to the following re-write, which is
possible because some terms in the sum are
independent of others (how many operations?)
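The two equations on this slide are missing from the transcript. As a generic illustration of the idea (not the slide's own example), compare evaluating the double sum of a_i * b_j term by term with evaluating the factored form (sum of a_i) * (sum of b_j). The operation counters below are a simple tally, with every addition into an accumulator counted as one addition.

```python
def naive_sum(a, b):
    """Evaluate sum_i sum_j a[i] * b[j] term by term, counting operations."""
    total, mults, adds = 0.0, 0, 0
    for ai in a:
        for bj in b:
            total += ai * bj
            mults += 1
            adds += 1
    return total, mults, adds


def factored_sum(a, b):
    """Evaluate (sum_i a[i]) * (sum_j b[j]), counting operations."""
    sum_a, sum_b, adds = 0.0, 0.0, 0
    for ai in a:
        sum_a += ai
        adds += 1
    for bj in b:
        sum_b += bj
        adds += 1
    return sum_a * sum_b, 1, adds


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
print(naive_sum(a, b))     # value, multiplications, additions
print(factored_sum(a, b))  # same value with far fewer operations
```

With k values in each list the naive form needs on the order of k**2 operations and the factored form on the order of k. The same kind of regrouping, applied recursively through a pedigree, is what produces the savings described above.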
9Analysis of quantitative traits (QTLs)
- Traits that are determined by a single locus are
necessarily discrete.
- Single-locus traits have been our focus so far,
and include ABO blood type, HLA antigens, and
rare dominant and recessive diseases.
- The situation with continuous traits is less
clear (e.g., height, weight).
- While these clearly exhibit genetic inheritance,
they cannot solely be determined by the action of
genes at a single locus, because a single locus
is discrete, not continuous, in nature.
- The bell-shaped distribution of most of these
quantitative traits suggests that several or many
factors, both genetic and environmental, are at
play.
- Underlying genetic events at a single locus are
masked (or at best convoluted) by the operation
of the other factors.
- Unfortunately, most traits are
continuous/quantitative.
10Moving to non-parametric approaches
- Complex diseases such as heart disease, diabetes,
and depression are caused by multiple genetic and
environmental factors.
- A complete likelihood model would include all
these factors, their joint probability
distribution in the population, and their joint
effect on the penetrance.
- How to proceed in the midst of all of this
complexity is an open problem.
- However, the main approach has been to abandon
the so-called parametric method of conventional
linkage (in which q is the main parameter) and to
instead measure the association between the
sharing of marker alleles among siblings and the
sharing of their disease status.
11Measures of allele sharing by relatives
- The concept of allele sharing is central to
non-parametric methods of linkage analysis.
- There are two different forms, identical-by-state
(IBS) and identical-by-descent (IBD).
- Two alleles of the same physical form are IBS.
- If, in addition to being IBS, the two alleles
descended from the same ancestral allele, they
are also IBD.
12Example of IBS versus IBD
- Consider the following pedigree with loci A and B,
which are in extremely tight linkage.
(Pedigree figure: four individuals with genotypes
A2A2 B1B2, A1A2 B1B1, A1A2 B1B2, A1A2 B1B1)
- At locus A, how many alleles do the siblings have
that are IBS?
- At locus A, how many alleles do the siblings have
that are IBD?
- Note the close relationship between IBD and
recombination events. This is why IBD is more
relevant than IBS for linkage analysis.
13Measuring association with IBD
- Define two indicator variables:
  - $D_f = 1$ if the two siblings have the same paternal
  allele, 0 otherwise
  - $D_m = 1$ if the two siblings have the same maternal
  allele, 0 otherwise
- Let $D = D_f + D_m$ be the total IBD value of the
sib-pair.
- D is a binomial random variable with possible
values 0, 1, 2 with probabilities ¼, ½, ¼.
- For two loci A and B, their corresponding IBD
values $D_A$ and $D_B$ will be independent for unlinked
loci, but positively correlated for linked loci.
- The IBD status at locus A will be the same as the
IBD status at locus B if and only if
  - NEITHER haplotype is recombinant between the
  loci OR
  - BOTH haplotypes are recombinant between the loci
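The distribution of D can be checked by brute-force enumeration. The sketch below assumes each sibling independently receives either of a parent's two alleles with probability ½, and labels those alleles 0 and 1. The function name `ibd_distribution` is made up for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def ibd_distribution():
    """Enumerate D = Df + Dm for a sib pair at a single locus.

    f1, f2: which paternal allele (0 or 1) sibling 1 and sibling 2 receive.
    m1, m2: which maternal allele (0 or 1) sibling 1 and sibling 2 receive.
    All 2**4 transmission patterns are equally likely.
    """
    dist = Counter()
    weight = Fraction(1, 16)
    for f1, f2, m1, m2 in product((0, 1), repeat=4):
        d_f = int(f1 == f2)  # same paternal allele
        d_m = int(m1 == m2)  # same maternal allele
        dist[d_f + d_m] += weight
    return dict(dist)


print(ibd_distribution())
```

The enumeration gives probabilities ¼, ½, ¼ for D = 0, 1, 2, which is the Binomial(2, ½) distribution.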
14Measuring association with IBD
- In terms of the recombination fraction,
$Y = q^2 + (1-q)^2$, where Y is the probability that the IBD
status of A and B is the same.
- The same is true for haplotypes transmitted from
the other parent.
- It can be shown that $\mathrm{corr}(D_A, D_B) = 2Y - 1$.
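The correlation result can be verified numerically by exact enumeration. The sketch below assumes that each sibling's transmitted haplotype is recombinant between A and B with probability q, independently of the other sibling and of the other parent. The function name `ibd_correlation` and the variable names are illustrative only.

```python
from itertools import product


def ibd_correlation(q):
    """Exact corr(D_A, D_B) for a sib pair at recombination fraction q."""
    # One parent. a1, a2: which parental allele each sibling gets at locus A.
    # r1, r2: whether each sibling's transmitted haplotype is recombinant.
    one_parent = {}  # (sharing at A, sharing at B) -> probability
    for a1, a2, r1, r2 in product((0, 1), repeat=4):
        prob = 0.25 * (q if r1 else 1 - q) * (q if r2 else 1 - q)
        b1, b2 = a1 ^ r1, a2 ^ r2  # recombination flips the origin at B
        key = (int(a1 == a2), int(b1 == b2))
        one_parent[key] = one_parent.get(key, 0.0) + prob

    # Both parents contribute independently: D = Df + Dm at each locus.
    joint = {}
    for (fa, fb), pf in one_parent.items():
        for (ma, mb), pm in one_parent.items():
            key = (fa + ma, fb + mb)
            joint[key] = joint.get(key, 0.0) + pf * pm

    mean_a = sum(p * da for (da, _), p in joint.items())
    mean_b = sum(p * db for (_, db), p in joint.items())
    cov = sum(p * (da - mean_a) * (db - mean_b) for (da, db), p in joint.items())
    var_a = sum(p * (da - mean_a) ** 2 for (da, _), p in joint.items())
    var_b = sum(p * (db - mean_b) ** 2 for (_, db), p in joint.items())
    return cov / (var_a * var_b) ** 0.5


for q in (0.0, 0.1, 0.25, 0.5):
    y = q ** 2 + (1 - q) ** 2
    print(q, ibd_correlation(q), 2 * y - 1)
```

The two printed values agree for each q. The correlation is 1 for completely linked loci (q = 0) and 0 for unlinked loci (q = 0.5).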
15Measuring association between IBD and a
quantitative trait
- For an entire population of individuals, we can
aggregate the individual D.
- Let p = the proportion of alleles at a locus that
are IBD between pairs of relatives. Note p = 0,
0.5, 1 corresponds to D = 0, 1, 2, which
counts alleles for a single sib-pair.
- Let $X_1$ and $X_2$ be the (continuous) quantitative
trait values of siblings 1 and 2.
- Haseman/Elston method: regress $(X_1 - X_2)^2$ onto p.
- A regression coefficient significantly less than
zero is evidence for linkage.
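A minimal sketch of the regression step follows, run on simulated sib pairs rather than real data. The simulation model (sibling difference variance shrinking linearly with p, with an arbitrary factor of 0.8) and the names `simulate_sib_pairs` and `ols_slope` are assumptions made purely for illustration. A real analysis would also test whether the slope is significantly below zero.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_sib_pairs(n_pairs, seed=0):
    """Simulate IBD proportions and squared trait differences (synthetic)."""
    rng = random.Random(seed)
    ibd_props, sq_diffs = [], []
    for _ in range(n_pairs):
        # p takes values 0, 0.5, 1 with probabilities 1/4, 1/2, 1/4.
        p = rng.choice([0.0, 0.5, 0.5, 1.0])
        # Assumed model: siblings sharing more alleles IBD at a linked locus
        # have more similar trait values, so the difference varies less.
        sd = (2.0 * (1.0 - 0.8 * p)) ** 0.5
        diff = rng.gauss(0.0, sd)  # X1 - X2
        ibd_props.append(p)
        sq_diffs.append(diff ** 2)
    return ibd_props, sq_diffs


ibd_props, sq_diffs = simulate_sib_pairs(2000)
slope = ols_slope(ibd_props, sq_diffs)
print(slope)  # negative under this simulated linkage model
```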
16(Figure: scatter plot of $(X_1 - X_2)^2$ against p)
17Components of the genetic model
- Components of the genetic model include
inheritance pattern (dominant vs. recessive,
sex-linked vs. autosomal), trait allele frequency
(a common or rare disease?), and the frequency of
new mutation at the trait locus.
- Another important component of the genetic model
is the penetrance of the trait allele. Knowing
the penetrance of the disease allele is crucial
because it specifies the probability that an
unaffected individual is unaffected because he is
a non-carrier or because he is a
non-penetrant carrier. The frequency of
phenocopies is an important component, too.
- Rough estimates of the disease allele frequency
and penetrance can often be obtained from the
literature or from computer databases, such as
Online Mendelian Inheritance in Man
(http://www3.ncbi.nlm.nih.gov/Omim/). Estimates
of the rate of phenocopies and new mutation are
frequently guesses, included as nuisance
parameters in some cases to allow for the fact
that these can exist.
- Linkage analysis is relatively robust to modest
misspecification of the disease allele frequency
and penetrance, but misspecification of whether
the disease is dominant or recessive can lead to
incorrect conclusions of linkage or non-linkage.
18Steps to linkage analysis
- In pedigrees in which the genetic model is known,
linkage analysis can be broken down into five
steps:
  1. State the components of the genetic model.
  2. Assign underlying disease genotypes given
  information in the genetic model.
  3. Determine putative linkage phase.
  4. Score the meiotic events as recombinant or
  non-recombinant.
  5. Calculate and interpret LOD scores.
- Let's take a look at each of these steps in
detail.
19State the components of the model
- In this example, the disease allele will be
assumed to be rare and to function in an
autosomal dominant fashion with complete
penetrance, and the disease locus will be assumed
to have two alleles:
  - N (for normal or wild-type)
  - A (for affected or disease)
20Assign underlying disease genotypes
- The assumption of complete penetrance of the
disease allele allows all unaffected individuals
in the pedigree to be assigned a disease genotype
of NN. Since the disease allele is assumed rare,
the disease genotype for affected individuals can
be assigned as AN.
21Determine putative phase
- Individual II-1 has inherited the disease trait
together with marker allele 2 from his affected
father. Thus, the A allele at the disease locus
and the 2 allele at the marker locus were
inherited in the gamete transmitted to II-1.
Once the putative linkage phase (the disease
allele "segregates" with marker allele 2) has
been established, this phase can be tested in
subsequent generations.
22Score the meiotic events as recombinant (R) or
non-recombinant (NR)
- There are four possible gametes from the
affected parent II-1: N1, N2, A1, and A2. Based
on the putative linkage phase assigned in step 3,
gametes A2 and N1 are non-recombinant, and
gametes A1 and N2 are recombinant.
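The scoring rule can be written as a small lookup. This sketch assumes the putative phase from step 3 (disease allele A with marker allele 2, and therefore N with marker allele 1). The function name `score_gamete` is hypothetical.

```python
def score_gamete(gamete, phase=("A2", "N1")):
    """Return 'NR' if the gamete matches the putative phase, else 'R'."""
    return "NR" if gamete in phase else "R"


# Score all four gametes that the affected parent II-1 can transmit.
scores = {g: score_gamete(g) for g in ("N1", "N2", "A1", "A2")}
print(scores)
```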
23Calculate LOD scores
- In this example, the highest LOD score is -0.09
at q = 0.40. At no value of q is the LOD score
positive, let alone > 3.0, so this pedigree provides no
evidence in favor of linkage between the disease
and marker loci.
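The slide does not show the formula, so the following is a sketch of the usual LOD score for phase-known, fully informative meioses: the log10 ratio of the likelihood at recombination fraction q to the likelihood at q = 0.5. The function name `lod_score` and the recombinant counts used below are hypothetical and are not taken from the example pedigree.

```python
import math


def lod_score(q, recombinants, non_recombinants):
    """LOD score at recombination fraction q for phase-known meioses.

    LOD(q) = log10( q**R * (1 - q)**NR / 0.5**(R + NR) )
    """
    if not 0.0 <= q <= 0.5:
        raise ValueError("q must lie in [0, 0.5]")
    if q == 0.0 and recombinants > 0:
        return float("-inf")  # any observed recombinant rules out q = 0
    n = recombinants + non_recombinants
    log_lik = non_recombinants * math.log10(1.0 - q)
    if recombinants > 0:
        log_lik += recombinants * math.log10(q)
    return log_lik - n * math.log10(0.5)


# Hypothetical counts: 5 recombinant and 5 non-recombinant meioses.
for q in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(q, round(lod_score(q, 5, 5), 2))
```

With as many recombinants as non-recombinants, the score is negative at every q below 0.5 and zero at q = 0.5, which is the pattern expected when there is no evidence for linkage.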