OVERVIEW - PowerPoint PPT Presentation

Slides: 24
Provided by: bioin9
Transcript and Presenter's Notes

1
OVERVIEW
  • Elston-Stewart Algorithm
  • Quantitative Trait Loci
  • Non-parametric analysis

2
Model specification in linkage analysis
  • In linkage analysis, the parameter of primary
    interest is q, the recombination fraction.
  • This is the only parameter that appears in the
    log-likelihood function
  • However, in order to deal with more complex
    pedigrees and loci, it is helpful to introduce
    other parameters in the framework of a general
    model.

3
Other parameters
  • Penetrance parameters specify the relationship
    between genotype and phenotype.
  • Let g and x represent the vectors of observed
    genotypes and phenotypes, respectively, over all
    individuals.
  • Then the penetrance function can be written
    $P(x \mid g) = \prod_i P(x_i \mid g_i)$
  • For example, consider disease allele D and normal
    allele d, in which case four penetrance
    parameters are required. These give the
    conditional probabilities of disease given the
    genotypes DD, Dd, dD, dd. For autosomal
    dominance with complete penetrance, the
    parameters are (1,1,1,0).
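A minimal Python sketch of the penetrance function as a product over individuals. Only the dominant parameters (1, 1, 1, 0) come from the slide; the names `PENETRANCE` and `penetrance_prob` and the three-person example are illustrative assumptions.

```python
# Penetrances P(affected | genotype) for a fully penetrant autosomal
# dominant trait, genotypes ordered DD, Dd, dD, dd as on the slide.
PENETRANCE = {"DD": 1.0, "Dd": 1.0, "dD": 1.0, "dd": 0.0}

def penetrance_prob(phenotypes, genotypes):
    """P(x | g) as a product of per-individual terms P(x_i | g_i)."""
    prob = 1.0
    for affected, genotype in zip(phenotypes, genotypes):
        p_affected = PENETRANCE[genotype]
        prob *= p_affected if affected else 1.0 - p_affected
    return prob

# Hypothetical three-person example: affected, unaffected, affected.
consistent = penetrance_prob([True, False, True], ["Dd", "dd", "DD"])
impossible = penetrance_prob([True, False, True], ["Dd", "Dd", "DD"])
print(consistent, impossible)
```

With complete penetrance every genotype assignment is either fully consistent with the phenotypes (probability 1) or ruled out (probability 0); incomplete penetrance would replace the 1s with values below 1.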

4
Other parameters
  • Transmission parameters give the probability of
    inheriting a genotype given the genotype of the
    parents.
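  • Transmission is captured by the term
    $P(g_i \mid g_{i,f}, g_{i,m})$, where $g_{i,f}$ and
    $g_{i,m}$ are the genotypes of the father and mother
    of individual i. It is a function of the
    recombination fraction q.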
  • Transmission parameters only apply to individuals
    whose parents are included in the pedigree.
  • Individuals whose parents are not included are
    called founding members (founders). The
    parameters which define the distribution of
    genotypes in the founding members of the pedigree
    are known as population parameters.
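The recombination fraction enters the transmission term through the gametes a parent can pass on. The following Python sketch, under the assumption of a parent whose two-locus phase is known, lists the four gamete probabilities; the function name `gamete_probs` and the tuple representation of haplotypes are hypothetical choices for illustration.

```python
def gamete_probs(hap1, hap2, q):
    """Gamete probabilities for a parent with phase-known haplotypes.

    hap1 and hap2 are (allele at locus 1, allele at locus 2) tuples and
    q is the recombination fraction between the two loci.
    """
    if not 0.0 <= q <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    gametes = [
        (hap1, (1 - q) / 2),              # non-recombinant
        (hap2, (1 - q) / 2),              # non-recombinant
        ((hap1[0], hap2[1]), q / 2),      # recombinant
        ((hap2[0], hap1[1]), q / 2),      # recombinant
    ]
    probs = {}
    for gamete, p in gametes:
        # Accumulate, because a homozygous locus makes gametes coincide.
        probs[gamete] = probs.get(gamete, 0.0) + p
    return probs

probs = gamete_probs(("A", "2"), ("N", "1"), q=0.1)
```

Because the two parents' meioses are independent, the transmission probability of a child's genotype is the product of one such gamete probability from each parent.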

5
The likelihood function
Gi represents all possible genotypes of
individual i of n. Individuals 1 .. p are
founders (parents). Individuals p+1 .. n are
non-founders (children). This function is
prohibitive to evaluate due to the huge number of
products and sums. However, the Elston-Stewart
algorithm gives a method for aggregating terms in
the calculation to reduce the number of products
and sums required.
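The likelihood described here is usually written in the following standard form. This is a reconstruction from the definitions on the preceding slides (penetrance, population, and transmission terms), assumed rather than copied from the slide:

```latex
L(q) = \sum_{g_1 \in G_1} \cdots \sum_{g_n \in G_n}
       \prod_{i=1}^{n} P(x_i \mid g_i)
       \prod_{j=1}^{p} P(g_j)
       \prod_{k=p+1}^{n} P(g_k \mid g_{k,f}, g_{k,m})
```

Each summand contains n penetrance terms, p founder terms, and n - p transmission terms, which gives 2n factors in total.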
6
Complexity analysis
It is not hard to see that the number of terms in
this expression is enormous! For two loci, with
m1 and m2 alleles each, this corresponds to $m_1 m_2$
ordered haplotypes and $(m_1 m_2)^2$ possible genotypes
for an individual. There are $(m_1 m_2)^{2n}$ genotype
combinations over n individuals. Therefore, the
likelihood function is a sum over $(m_1 m_2)^{2n}$ terms,
each term being a product of 2n probabilities.
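To get a feel for the size of this sum, the counts can be computed directly. This is a small Python sketch; the function name and the example values of m1, m2, and n are arbitrary assumptions.

```python
def likelihood_term_count(m1, m2, n):
    """Number of summands and factors per summand in the naive likelihood."""
    haplotypes = m1 * m2            # two-locus haplotypes
    genotypes = haplotypes ** 2     # pairs of haplotypes per individual
    terms = genotypes ** n          # joint genotype assignments for n people
    factors_per_term = 2 * n
    return terms, factors_per_term

# Hypothetical example: 2 disease alleles, 4 marker alleles, 10 people.
terms, factors = likelihood_term_count(m1=2, m2=4, n=10)
print(terms, factors)
```

Even this small hypothetical pedigree gives 64^10 = 2^60 summands, which is why brute-force evaluation is impractical.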
7
Elston-Stewart Algorithm
Elston and Stewart suggested a simple recursion
for grouping terms of the likelihood function
which greatly reduces the number of additions and
multiplications. A simple example of how this
works is as follows. Examine this equation (how
many operations?)
Compared to the following re-write, which is
possible because some terms in the sum are
independent of others (how many operations?)
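The grouping idea can be illustrated with a generic double sum; this is not the slide's own equation, only an assumed stand-in with the same flavor. The sum of a_i * b_j over all pairs equals (sum of a_i) * (sum of b_j), and the Python sketch below counts the arithmetic operations in each form. Function names and input values are made up.

```python
def naive_sum(a, b):
    """Sum of a_i * b_j over all pairs, counting arithmetic operations."""
    total, ops = 0.0, 0
    for ai in a:
        for bj in b:
            total += ai * bj
            ops += 2              # one multiplication and one addition
    return total, ops

def factored_sum(a, b):
    """Same value computed as (sum of a) * (sum of b), counting operations."""
    sum_a, sum_b, ops = 0.0, 0.0, 0
    for ai in a:
        sum_a += ai
        ops += 1
    for bj in b:
        sum_b += bj
        ops += 1
    return sum_a * sum_b, ops + 1  # plus the final multiplication

a = [0.1, 0.2, 0.3, 0.4]
b = [0.5, 0.25, 0.25]
naive_value, naive_ops = naive_sum(a, b)
factored_value, factored_ops = factored_sum(a, b)
```

The naive form grows with the product of the list lengths, the factored form with their sum. The Elston-Stewart recursion applies the same kind of regrouping to the pedigree likelihood.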
8
(No Transcript)
9
Analysis of quantitative traits (QTLs)
  • Traits that are determined by a single locus are
    necessarily discrete.
  • Single-locus traits have been our focus so far,
    and include ABO blood type, HLA antigens, and
    rare dominant and recessive diseases.
  • The situation with continuous traits is less
    clear (e.g., height, weight).
  • While these clearly exhibit genetic inheritance,
    they cannot solely be determined by the action of
    genes at a single locus, because a single locus
    is discrete, not continuous, in nature.
  • The bell-shaped distribution of most of these
    quantitative traits suggests that several or many
    factors, both genetic and environmental, are at
    play.
  • Underlying genetic events at a single locus are
    masked (or at best convoluted) by the operation
    of the other factors.
  • Unfortunately, most traits are
    continuous/quantitative.

10
Moving to non-parametric approaches
  • Complex diseases such as heart disease, diabetes,
    and depression are caused by multiple genetic and
    environmental factors.
  • A complete likelihood model would include all
    these factors, their joint probability
    distribution in the population, and their joint
    effect on the penetrance.
  • How to proceed in the midst of all of this
    complexity is an open problem.
  • However, the main approach has been to abandon
    the so-called parametric method of conventional
    linkage (in which q is the main parameter) and to
    instead measure the association between the
    sharing of marker alleles among siblings and the
    sharing of their disease status.

11
Measures of allele sharing by relatives
  • The concept of allele sharing is central to
    non-parametric methods of linkage analysis.
  • There are two different forms, identical-by-state
    (IBS) and identical-by-descent (IBD).
  • Two alleles of the same physical form are IBS.
  • If, in addition to being IBS, the two alleles
    descended from the same ancestral allele, they
    are also IBD.

12
Example of IBS versus IBD
  • Consider the following pedigree with loci A and B
    which are in extremely tight linkage

    [Pedigree figure: genotypes A2A2 B1B2 and A1A2 B1B1
    (apparently the parents), A1A2 B1B2 and A1A2 B1B1
    (apparently the two siblings)]
  • At locus A, how many alleles do the siblings have
    that are IBS?
  • At locus A, how many alleles do the siblings have
    that are IBD?
  • Note the close relationship between IBD and
    recombination events. This is why IBD is more
    relevant than IBS for linkage analysis
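IBS sharing can be counted from the genotypes alone, as in this short Python sketch. It assumes that the last two genotypes in the pedigree figure (A1A2 B1B2 and A1A2 B1B1) belong to the two siblings; the function name is hypothetical.

```python
from collections import Counter

def ibs_count(genotype1, genotype2):
    """Number of alleles (0, 1 or 2) two genotypes share identical by state."""
    shared = Counter(genotype1) & Counter(genotype2)   # multiset intersection
    return sum(shared.values())

# Assumed sibling genotypes at loci A and B.
ibs_at_a = ibs_count(("A1", "A2"), ("A1", "A2"))
ibs_at_b = ibs_count(("B1", "B2"), ("B1", "B1"))
```

IBD sharing cannot be computed this way: it requires tracing each allele back to a specific parental copy, which is where the tight linkage with locus B helps.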

13
Measuring association with IBD
  • Define two indicator variables:
  • Df = 1 if the two siblings have the same paternal
    allele, 0 otherwise
  • Dm = 1 if the two siblings have the same maternal
    allele, 0 otherwise
  • Let D = Df + Dm be the total IBD value of the
    sib-pair.
  • D is a binomial random variable with possible
    values 0, 1, 2 with probabilities ¼, ½, ¼.
  • For two loci A and B, their corresponding IBD
    values DA and DB will be independent for unlinked
    loci, but positively correlated for linked loci.
  • The IBD status at locus A will be the same as the
    IBD status at locus B if and only if
  • NEITHER haplotype is recombinant between the
    loci OR
  • BOTH haplotypes are recombinant between the loci
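The probabilities ¼, ½, ¼ for D can be checked by enumerating the equally likely transmissions at a single locus. This is a minimal Python sketch; labelling each parent's two allele copies 0 and 1 is an illustrative assumption.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sib_pair_ibd_distribution():
    """Exact distribution of D = Df + Dm for a sib pair at one locus."""
    counts = Counter()
    # Each sibling receives copy 0 or 1 from the father and from the mother.
    for f1, m1, f2, m2 in product((0, 1), repeat=4):
        d_f = int(f1 == f2)    # same paternal allele
        d_m = int(m1 == m2)    # same maternal allele
        counts[d_f + d_m] += 1
    total = sum(counts.values())
    return {d: Fraction(c, total) for d, c in sorted(counts.items())}

distribution = sib_pair_ibd_distribution()
```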

14
Measuring association with IBD
  • In terms of the recombination fraction,
    $Y = q^2 + (1-q)^2$, where Y is the probability that
    the IBD status at A and B is the same for the
    haplotypes transmitted from one parent.
  • The same is true for haplotypes transmitted from
    the other parent.
  • It can be shown that $\mathrm{corr}(D_A, D_B) = 2Y - 1$.
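The correlation result can be verified numerically by enumerating, for each parent, which copy each sibling receives at locus A and whether each transmitted gamete is recombinant. This Python sketch is an assumed model of the two-locus transmission, not code from the presentation, and all function names are hypothetical.

```python
from itertools import product

def one_parent_joint(q):
    """Joint distribution of (IBD at A, IBD at B) for one parent's alleles."""
    joint = {}
    for a1, a2, r1, r2 in product((0, 1), repeat=4):
        # a = copy transmitted at locus A; r = 1 if the gamete is recombinant.
        p = 0.25 * (q if r1 else 1 - q) * (q if r2 else 1 - q)
        b1, b2 = a1 ^ r1, a2 ^ r2      # copy transmitted at locus B
        key = (int(a1 == a2), int(b1 == b2))
        joint[key] = joint.get(key, 0.0) + p
    return joint

def ibd_correlation(q):
    """corr(DA, DB) for a sib pair, with D summed over both parents."""
    one = one_parent_joint(q)
    e_a = e_b = e_ab = e_aa = e_bb = 0.0
    for ((fa, fb), pf), ((ma, mb), pm) in product(one.items(), repeat=2):
        da, db, p = fa + ma, fb + mb, pf * pm
        e_a += p * da
        e_b += p * db
        e_ab += p * da * db
        e_aa += p * da * da
        e_bb += p * db * db
    cov = e_ab - e_a * e_b
    return cov / ((e_aa - e_a ** 2) * (e_bb - e_b ** 2)) ** 0.5

def same_status_prob(q):
    """Y = q^2 + (1 - q)^2, the probability of equal IBD status at A and B."""
    return q ** 2 + (1 - q) ** 2
```

For unlinked loci (q = 0.5) this gives Y = 0.5 and a correlation of 0; for completely linked loci (q = 0) it gives Y = 1 and a correlation of 1.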

15
Measuring association between IBD and a
quantitative trait
  • For an entire population of individuals, we can
    aggregate the individual D.
  • Let p be the proportion of alleles at a locus that
    are IBD between pairs of relatives. Note p = 0,
    0.5, 1 corresponding to D = 0, 1, 2, which
    counts alleles for a single sib-pair.
  • Let X1 and X2 be the (continuous) quantitative
    trait values of siblings 1 and 2.
  • Haseman-Elston method: regress $(X_1 - X_2)^2$ on p.
  • A regression coefficient significantly less than
    zero is evidence for linkage.
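The regression step can be sketched in plain Python as an ordinary least-squares slope. The function name and the three sib pairs below are made-up toy values, not data from the presentation, and no significance test is included.

```python
def haseman_elston_slope(x1, x2, p):
    """Least-squares slope of (X1 - X2)^2 regressed on the IBD proportion p."""
    y = [(a - b) ** 2 for a, b in zip(x1, x2)]
    n = len(y)
    mean_p = sum(p) / n
    mean_y = sum(y) / n
    sxy = sum((pi - mean_p) * (yi - mean_y) for pi, yi in zip(p, y))
    sxx = sum((pi - mean_p) ** 2 for pi in p)
    return sxy / sxx

# Made-up toy data for three sib pairs.
x1 = [5.0, 4.0, 3.0]
x2 = [2.0, 2.0, 2.0]
p = [0.0, 0.5, 1.0]
slope = haseman_elston_slope(x1, x2, p)
```

In this toy data the squared trait difference shrinks as IBD sharing grows, so the slope is negative, which is the direction expected under linkage.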

16
[Scatter plot: (X1-X2)^2 on the vertical axis against p on the horizontal axis]
17
Components of the genetic model
  • Components of the genetic model include
    inheritance pattern (dominant vs. recessive,
    sex-linked vs. autosomal), trait allele frequency
    (a common or rare disease?), and the frequency of
    new mutation at the trait locus.
  • Another important component of the genetic model
    is the penetrance of the trait allele. Knowing
    the penetrance of the disease allele is crucial
    because it specifies the probability that an
    unaffected individual is unaffected because he's
    a non-gene carrier or because he's a
    non-penetrant gene carrier. The frequency of
    phenocopies is an important component, too.
  • Rough estimates of the disease allele frequency
    and penetrance can often be obtained from the
    literature or from computer databases, such as
    Online Mendelian Inheritance in Man
    (http://www3.ncbi.nlm.nih.gov/Omim/). Estimates
    of the rate of phenocopies and new mutation are
    frequently guesses, included as a nuisance
    parameter in some cases to allow for the fact
    that these can exist.
  • Linkage analysis is relatively robust to modest
    misspecification of the disease allele frequency
    and penetrance, but misspecification of whether
    the disease is dominant or recessive can lead to
    incorrect conclusions of linkage or non-linkage.

18
Steps to linkage analysis
  • In pedigrees in which the genetic model is known,
    linkage analysis can be broken down into five
    steps:
    1. State the components of the genetic model.
    2. Assign underlying disease genotypes given
       information in the genetic model.
    3. Determine putative linkage phase.
    4. Score the meiotic events as recombinant or
       non-recombinant.
    5. Calculate and interpret LOD scores.
  • Let's take a look at each of these steps in
    detail.

19
State the components of the model
  • In this example, the disease allele will be
    assumed to be rare and to function in an
    autosomal dominant fashion with complete
    penetrance, and the disease locus will be assumed
    to have two alleles:
  • N (for normal or wild-type)
  • A (for affected or disease)

20
Assign underlying disease genotypes
  • The assumption of complete penetrance of the
    disease allele allows all unaffected individuals
    in the pedigree to be assigned a disease genotype
    of NN. Since the disease allele is assumed rare,
    the disease genotype for affected individuals can
    be assigned as AN.

21
Determine putative phase
  • Individual II-1 has inherited the disease trait
    together with marker allele 2 from his affected
    father. Thus, the A allele at the disease locus
    and the 2 allele at the marker locus were
    inherited in the gamete transmitted to II-1.
    Once the putative linkage phase (the disease
    allele "segregates" with marker allele 2) has
    been established, this phase can be tested in
    subsequent generations.

22
Score the meiotic events as recombinant (R) or
non-recombinant (NR)
  • There are four possible gametes from the
    affected parent II-1: N1, N2, A1, and A2. Based
    on the putative linkage phase assigned in step 3,
    gametes A2 and N1 are non-recombinant, and
    gametes A1 and N2 are recombinant.
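The scoring rule amounts to checking each gamete against the two parental haplotypes implied by the putative phase. This is a minimal Python sketch; the string encoding of gametes and the function name are assumptions made for illustration.

```python
def score_gamete(gamete, phase):
    """Return 'NR' if the gamete matches a parental haplotype, else 'R'."""
    return "NR" if gamete in phase else "R"

# Putative phase from step 3: disease allele A travels with marker allele 2,
# so the parental haplotypes of II-1 are A2 and N1.
phase = {"A2", "N1"}
scores = {g: score_gamete(g, phase) for g in ("N1", "N2", "A1", "A2")}
print(scores)
```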

23
Calculate LOD scores
  • In this example, the highest LOD score is -0.09
    at q = 0.40. At no value of q is the LOD score
    positive, let alone > 3.0, so this pedigree
    provides no evidence in favor of linkage between
    the disease and marker loci.
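For phase-known meioses, the LOD score compares the likelihood at a given q with the likelihood at q = 0.5 (no linkage). The Python sketch below uses that standard formula. The recombinant and non-recombinant counts are hypothetical, since the slide's pedigree and counts are not in the transcript, and the grid of q values is likewise an assumption.

```python
import math

def lod_score(recombinants, non_recombinants, q):
    """Phase-known LOD score: log10 of L(q) / L(0.5)."""
    if not 0.0 < q <= 0.5:
        raise ValueError("q must satisfy 0 < q <= 0.5")
    n = recombinants + non_recombinants
    log_l_q = (recombinants * math.log10(q)
               + non_recombinants * math.log10(1 - q))
    log_l_null = n * math.log10(0.5)
    return log_l_q - log_l_null

# Hypothetical counts (5 recombinant, 5 non-recombinant meioses).
grid = [0.01, 0.05, 0.10, 0.20, 0.30, 0.40]
lods = {q: lod_score(5, 5, q) for q in grid}
best_q = max(lods, key=lods.get)
```

When half of the scored meioses are recombinant, as in these hypothetical counts, the LOD score is negative at every q below 0.5 and rises toward 0 as q approaches 0.5, the same qualitative pattern as the example on the slide.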