Title: OVERVIEW
1 OVERVIEW
- Wrap up phase discussion: computing LOD scores with unknown phase
- How do we determine the model of inheritance? Autosomal dominant, autosomal recessive, or sex-linked?
- How are good markers chosen?
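2 Phase Unknown: Computing the LOD score
[Figure: pedigree for colorblindness and hemophilia. Legend entries: No Disease, Colorblind, Colorblind Hemophilia. Individuals 1 and 2 in the upper generation and 1 to 6 in the lower generation. Genotype labels as extracted: HC/Y; HC/hc OR Hc/hC (phase unknown); HC, hc/Y, Hc/Y, HC/Y, hc/Y, hc/Y, HC/Y.]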
What is the genetic distance between these genes?
Could this computation be done without the
grandparents?
3 Y statistics
- In the previous example, the recombination fraction r = 1/6 or 5/6 depending on the phase.
- Note that it is always possible to bin the progeny into two groups, but that the difficulty lies in labeling the recombinant group.
- Early geneticists thought that such pedigrees did not yield information on linkage.
- However, assuming k of n recombinants, Bernstein (1931) pointed out that the product y = k(n-k) is the same in either phase.
- Note that y is largest for r = ½ and zero for r = 0.
- Bernstein published mean values of y for different parameters n and r. These enabled researchers to estimate recombination fractions for phase-unknown data.
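The phase invariance of y can be checked with a minimal sketch. It assumes the six-offspring example from the previous slide, where one phase labels 1 offspring as recombinant and the other labels 5; the function name is illustrative only.

```python
def y_statistic(k: int, n: int) -> int:
    """Bernstein's y = k(n - k), with k offspring in one bin out of n."""
    return k * (n - k)


n = 6
# One phase labels 1 offspring as recombinant; the opposite phase labels 5.
y_phase_1 = y_statistic(1, n)
y_phase_2 = y_statistic(n - 1, n)
print(y_phase_1, y_phase_2)  # the two values agree
```

Swapping the labels of the two bins replaces k with n - k, which leaves the product unchanged.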
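4 Modified LOD score
- The likelihood ratio function can also be modified to deal with unknown phase.
- Recall that the LOD score is the log10 of a ratio of two likelihoods, L(θ) and L(θ = ½), with L(θ) = θ^k (1-θ)^(n-k) for k recombinants among n meioses.
- If phase is unknown, k cannot be computed. But since the two phases are equally likely, the likelihood is the average over both: L(θ) = ½ [θ^k (1-θ)^(n-k) + θ^(n-k) (1-θ)^k].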
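A minimal sketch of the phase-unknown LOD score, assuming the standard binomial likelihood θ^k (1-θ)^(n-k) averaged over the two equally likely phases. The function names are illustrative, and the counts (1 or 5 recombinants out of 6) are taken from the earlier pedigree example.

```python
import math


def lod_phase_known(theta: float, k: int, n: int) -> float:
    """log10 of L(theta) / L(1/2) for k recombinants among n meioses."""
    likelihood = theta**k * (1 - theta) ** (n - k)
    return math.log10(likelihood / 0.5**n)


def lod_phase_unknown(theta: float, k: int, n: int) -> float:
    """Same ratio, with the likelihood averaged over both phases."""
    likelihood = 0.5 * (
        theta**k * (1 - theta) ** (n - k) + theta ** (n - k) * (1 - theta) ** k
    )
    return math.log10(likelihood / 0.5**n)


# Scan theta on a grid that avoids theta = 0, where the likelihood can be zero.
grid = [i / 100 for i in range(1, 51)]
best_theta = max(grid, key=lambda t: lod_phase_unknown(t, 1, 6))
print(best_theta, round(lod_phase_unknown(best_theta, 1, 6), 3))
```

Because the two phases are averaged, the result is the same whether the smaller bin (k = 1) or the larger bin (k = 5) is called recombinant.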
5 How do we determine which model of inheritance to use?
- FOR AUTOSOMAL DOMINANCE
- Since diseases are typically rare, the frequency of allele D is assumed low.
- Most affected individuals are therefore expected to have genotype Dd rather than DD.
- Matings between an affected and an unaffected individual will be of type Dd x dd.
- In this case the probability of affected offspring will be 50% (see next table).
- This prediction can be used as a test of autosomal dominance.
6 Six possible mating types: autosomal dominance
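The slide's table did not survive extraction. As a rough sketch, the six mating types and the probability of an affected child can be enumerated under the assumptions of a single locus with alleles D and d, complete penetrance, and Mendelian segregation. The function name is illustrative.

```python
from itertools import combinations_with_replacement, product

GENOTYPES = ["DD", "Dd", "dd"]


def affected_probability(parent1: str, parent2: str) -> float:
    """P(child carries at least one D), taking one allele from each parent."""
    gametes = list(product(parent1, parent2))  # four equally likely allele pairs
    affected = sum(1 for a, b in gametes if "D" in (a, b))
    return affected / len(gametes)


mating_types = list(combinations_with_replacement(GENOTYPES, 2))
for p1, p2 in mating_types:
    print(f"{p1} x {p2}: P(affected) = {affected_probability(p1, p2):.2f}")
```

The Dd x dd row is the one used in the tests that follow, since it predicts half of the offspring affected.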
7 Test 1: the binomial test
- The binomial distribution is the discrete probability distribution of the number of successes k in a sequence of n independent yes/no experiments, each of which yields success with probability p.
- We want the associated p-value of observing k when p = 1/2.
8 Example: Opalescent dentine (Neel and Schull 1954)
Examined 112 offspring of an affected parent. 52 were similarly affected as the parent. The other 60 were normal. Are these observations consistent with autosomal dominance?
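One way to answer the question is an exact binomial test. The sketch below assumes a two-sided test that sums the probabilities of all outcomes no more likely than the observed one. The slide does not say which convention was used, so this is an illustration, not a reproduction of the original calculation.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    # Small tolerance so that ties with pmf[k] are counted despite rounding.
    threshold = pmf[k] * (1 + 1e-9)
    return min(1.0, sum(q for q in pmf if q <= threshold))


# 52 affected out of 112 offspring, tested against p = 1/2.
p_value = binomial_two_sided_p(52, 112)
print(f"p-value = {p_value:.4f}")
```

A large p-value means the observed 52:60 split is compatible with the 1:1 ratio expected for Dd x dd matings.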
9 Test 2: Maximum likelihood
- The hypothesis of p = 1/2 can also be tested by considering the likelihood of the data L(p) as a function of parameter p.
- Given k of n offspring are affected, L(p) = C(n,k) p^k (1-p)^(n-k), where C(n,k) is the binomial coefficient "n choose k".
- Let L1 be the maximum value of this function for 0 ≤ p ≤ 1, and let L0 be the value at p = 1/2.
- The likelihood ratio statistic is λ = 2(ln L1 - ln L0). What are the lower and upper bounds of λ?
- If the hypothesis that p = 1/2 is true, λ can be shown to be distributed approximately as chi-square with 1 degree of freedom.
10 Maximizing the likelihood
- Computing L1 requires finding the value of p which maximizes L.
- Solution: set the derivative of L with respect to p equal to 0 and solve for p, ignoring the constant "n choose k" term.
- The factors p^(k-1) and (1-p)^(n-k-1) in the derivative equal zero only at the extremes of p.
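The slide's own equations were lost in extraction. The derivation can be sketched as follows, assuming the binomial likelihood L(p) = C(n,k) p^k (1-p)^(n-k) and dropping the constant C(n,k):

```latex
\begin{aligned}
L(p) &\propto p^{k}(1-p)^{n-k} \\
\frac{dL}{dp} &= k\,p^{k-1}(1-p)^{n-k} - (n-k)\,p^{k}(1-p)^{n-k-1} \\
 &= p^{k-1}(1-p)^{n-k-1}\,\bigl[\,k(1-p) - (n-k)\,p\,\bigr] = 0 \\
k(1-p) &= (n-k)\,p \;\Longrightarrow\; \hat{p} = \frac{k}{n}
\end{aligned}
```

The leading factors vanish only at p = 0 or p = 1, so the interior maximum comes from the bracketed term.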
11 Plots of L vs. p for two different k
[Figure: two plots of L against p, one for k = 50, n = 100 and one for k = 5, n = 100.]
12 Maximizing the likelihood (continued)
- The maximum likelihood estimate of p is k/n. This is used to compute L1.
- The p-value is given by the probability that a chi-square with 1 DOF will exceed 2(ln L1 - ln L0).
13 Back to dentin example
- Example for the opalescent dentin data: k = 52 and n = 112, so the maximum likelihood estimate of p is 52/112 ≈ 0.464.
- Thus the likelihood ratio statistic λ = 0.5719. This gives a p-value of 0.4495 (see next slide).
- This value is very close to that calculated with the binomial CDF.
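The numbers on this slide can be reproduced with a short sketch, assuming the binomial likelihood with k = 52 affected out of n = 112. The chi-square tail probability with 1 degree of freedom is computed from the complementary error function, so no external library is needed. The function name is illustrative.

```python
import math


def likelihood_ratio_test(k: int, n: int, p0: float = 0.5):
    """Return (lambda, p-value) for H0: p = p0, assuming 0 < k < n."""
    p_hat = k / n  # maximum likelihood estimate

    def log_likelihood(p: float) -> float:
        # The binomial coefficient cancels in the difference, so it is omitted.
        return k * math.log(p) + (n - k) * math.log(1 - p)

    lam = max(0.0, 2 * (log_likelihood(p_hat) - log_likelihood(p0)))
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(lam / 2))
    return lam, p_value


lam, p_value = likelihood_ratio_test(52, 112)
print(f"lambda = {lam:.4f}, p-value = {p_value:.4f}")
```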
14 Chi-square with k degrees of freedom
15 Autosomal recessive disorders
- The entire preceding discussion was focused on testing for autosomal dominance. The autosomal recessive model is harder to test because of so-called ascertainment bias, which occurs as follows:
- For a dominant disorder, we are examining Dd x dd, which means that families can be selected based on the phenotypes of the parents.
- For a recessive disorder, the mating one would like to examine is Dd x Dd, with an expected segregation ratio of ¼.
- Since these parents are phenotypically normal, we must select families based on the presence of diseased children.
- However, this ascertainment procedure will miss families with the Dd x Dd mating type that by chance had no affected children.
16 Autosomal recessive disorders (continued)
- The need to account for the incomplete selection of a mating type in segregation analysis was pointed out by Fisher in a classic 1934 paper entitled "The effect of methods of ascertainment upon the estimation of frequencies".
- "It is a statistical commonplace that the interpretation of a body of data requires a knowledge of how it was obtained. Nevertheless, in human genetics especially, statistical methods are sometimes put forward, and their respective claims advocated with entire disregard of the conditions of ascertainment."
17 Truncated binomial method
- Consider Dd x Dd families with X observed affected individuals out of s total offspring.
- X is a binomial random variable with parameters s and p, where the expected fraction affected is p = ¼ for a rare recessive disorder.
- Assume all families with X > 0 are ascertained.
- We wish to compute P(X = x | X > 0) = C(s,x) p^x (1-p)^(s-x) / [1 - (1-p)^s], which can be plugged into the maximum likelihood framework already described.
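A minimal sketch of the truncated binomial, assuming complete ascertainment of every sibship with at least one affected child. The function names are illustrative. The second function shows how much the apparent segregation ratio is inflated when families with no affected children are missed.

```python
from math import comb


def truncated_binomial_pmf(x: int, s: int, p: float = 0.25) -> float:
    """P(X = x | X > 0) for X ~ Binomial(s, p)."""
    if not 1 <= x <= s:
        return 0.0
    return comb(s, x) * p**x * (1 - p) ** (s - x) / (1 - (1 - p) ** s)


def apparent_segregation_ratio(s: int, p: float = 0.25) -> float:
    """E[X | X > 0] / s: expected affected fraction among ascertained sibships."""
    return p / (1 - (1 - p) ** s)


for s in (1, 2, 4, 8):
    print(s, round(apparent_segregation_ratio(s), 3))
```

For small sibships the apparent ratio is well above ¼, and it approaches ¼ as sibship size grows.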
18 Population genetics as another tool to establish the mode of inheritance
- The field of population genetics is concerned with the distribution patterns of alleles and the factors that alter or maintain their frequencies.
- One might expect a detrimental allele to disappear from the population over time, but population genetic analysis shows that this is not usually the case.
- The main measure is the allele frequency (or gene frequency), i.e., the frequency with which an allele is present in the population.
- Keep in mind that this measure is not the same as the frequency of different genotypes, which involve two alleles in combination.
19 Population Genetics 101: Measuring Genetic Variation
- Genotype frequencies depend only on allele frequencies.
- For recessive disorders, the genotype BB may result in selective disadvantage, but this is balanced by the occurrence of new mutations (see next slide). This is the Hardy-Weinberg equilibrium (HWE).
- pA = frequency of allele A
- pB = frequency of allele B
- P(A/A) = pA², P(B/B) = pB², P(A/B) = 2pApB
- pA + pB = 1
- (pA + pB)² = 1 ⇒ pA² + 2pApB + pB² = 1
20
[Figure: alleles A and a, with arrows labeled "New mutations" and "Elimination by disease".]
21 Population Genetics (continued)
- When the allele frequency is known, the expected genotype frequencies can be determined.
- For instance, if the frequency p of allele A is 60%: p = 0.6, q = 0.4; frequency of AA = 0.36, frequency of Aa = 0.48, frequency of aa = 0.16.
- Conversely, when genotype frequencies are known, the allele frequencies can be estimated. For instance, for a recessive disorder one knows the frequency of disease, which provides an estimate of q². From this, all other genotype frequencies can be estimated.
- In the case of linkage analysis, the demonstration of Hardy-Weinberg equilibrium is very strong evidence for a genetic basis for a trait.
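Both directions of the calculation can be sketched in a few lines, assuming Hardy-Weinberg proportions at a two-allele locus. The function names are illustrative, and the disease incidence of 1 in 10,000 is a hypothetical value, not a figure from the slides.

```python
def hardy_weinberg(p: float) -> dict:
    """Expected genotype frequencies for allele A at frequency p, allele a at q = 1 - p."""
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def carrier_frequency(incidence: float) -> float:
    """Estimate the Aa frequency from recessive disease incidence (an estimate of q^2)."""
    q = incidence**0.5
    return 2 * (1 - q) * q


print(hardy_weinberg(0.6))  # the slide's example: p = 0.6, q = 0.4
print(carrier_frequency(1 / 10_000))  # hypothetical incidence
```

The second call shows why carriers of a rare recessive allele are far more common than affected individuals.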
22 Which loci make good markers?
- Clearly, they must be polymorphic (q > 1).
- However, we also want high heterozygosity, since it is heterozygous matings that are the most informative.
- Observed vs. expected heterozygosity:
- Ho = observed fraction of heterozygous individuals
- He = expected fraction based on allele frequencies
- The frequency f(X) of allele X is the fraction of times it occurs over all allele copies at the locus (2 per individual).
- He = 1 - the probability of homozygosity = 1 - [f²(X) + f²(Y) + ...] for all alleles (X, Y, ...)
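A minimal sketch of the two heterozygosity measures for a single locus. The genotypes below are hypothetical microsatellite lengths in bp, invented for illustration, and the function name is an assumption.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (Ho, He) for a list of (allele, allele) pairs at one locus."""
    n = len(genotypes)
    # Ho: fraction of individuals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies are counted over all 2n allele copies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    # He: 1 minus the expected probability of homozygosity.
    he = 1 - sum((c / total) ** 2 for c in counts.values())
    return ho, he


# Hypothetical sample of four individuals (allele lengths in bp).
sample = [(120, 120), (120, 124), (124, 124), (120, 124)]
ho, he = heterozygosity(sample)
print(f"Ho = {ho:.2f}, He = {he:.2f}")
```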
23 Example: 10 Unique Genotypes (in bp lengths of microsatellite)
Ho = 0.30, He = 0.69
H = 1: high diversity. H = 0: asexual mitotic reproduction. Ho << He indicates selective pressure or non-random mating.
24 Components of the genetic model
- Components of the genetic model include inheritance pattern (dominant vs. recessive, sex-linked vs. autosomal), trait allele frequency (a common or rare disease?), and the frequency of new mutation at the trait locus.
- Another important component of the genetic model is the penetrance of the trait allele. Knowing the penetrance of the disease allele is crucial because it specifies the probability that an unaffected individual is unaffected because he's a non-gene carrier or because he's a non-penetrant gene carrier. The frequency of phenocopies is an important component, too.
- Rough estimates of the disease allele frequency and penetrance can often be obtained from the literature or from computer databases, such as Online Mendelian Inheritance in Man (http://www3.ncbi.nlm.nih.gov/Omim/). Estimates of the rate of phenocopies and new mutation are frequently guesses, included as a nuisance parameter in some cases to allow for the fact that these can exist.
- Linkage analysis is relatively robust to modest misspecification of the disease allele frequency and penetrance, but misspecification of whether the disease is dominant or recessive can lead to incorrect conclusions of linkage or non-linkage.
25 Steps to linkage analysis
- In pedigrees in which the genetic model is known, linkage analysis can be broken down into five steps:
- State the components of the genetic model.
- Assign underlying disease genotypes given information in the genetic model.
- Determine putative linkage phase.
- Score the meiotic events as recombinant or non-recombinant.
- Calculate and interpret LOD scores.
- Let's take a look at each of these steps in detail.
26 State the components of the model
- In this example, the disease allele will be assumed to be rare and to function in an autosomal dominant fashion with complete penetrance, and the disease locus will be assumed to have two alleles:
- N (for normal or wild-type)
- A (for affected or disease)
27 Assign underlying disease genotypes
- The assumption of complete penetrance of the
disease allele allows all unaffected individuals
in the pedigree to be assigned a disease genotype
of NN. Since the disease allele is assumed rare,
the disease genotype for affected individuals can
be assigned as AN.
28 Determine putative phase
- Individual II-1 has inherited the disease trait
together with marker allele 2 from his affected
father. Thus, the A allele at the disease locus
and the 2 allele at the marker locus were
inherited in the gamete transmitted to II-1.
Once the putative linkage phase (the disease
allele "segregates" with marker allele 2) has
been established, this phase can be tested in
subsequent generations.
29 Score the meiotic events as recombinant (R) or non-recombinant (NR)
- There are four possible gametes from the affected parent II-1: N1, N2, A1, and A2. Based on the putative linkage phase assigned in step 3, gametes A2 and N1 are non-recombinant, so gametes A1 and N2 are recombinant.
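The scoring step can be sketched as a simple lookup, assuming the putative phase in which A travels with marker allele 2 and N with marker allele 1. The list of transmitted gametes is hypothetical, since the pedigree figure is not available here, and the function name is illustrative.

```python
def score_meioses(gametes, nonrecombinant=("A2", "N1")):
    """Count non-recombinant (NR) and recombinant (R) gametes under a putative phase."""
    scores = ["NR" if g in nonrecombinant else "R" for g in gametes]
    return scores.count("NR"), scores.count("R")


# Hypothetical gametes transmitted by II-1 to his children (not from the slide).
transmitted = ["A2", "N1", "A1", "N1", "A2", "N2"]
nr, r = score_meioses(transmitted)
print(f"NR = {nr}, R = {r}")
```

The R and NR counts are the k and n - k that feed into the LOD score calculation.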
30 Calculate LOD scores
- In this example, the highest LOD score is -0.09 at θ = 0.40. At no value of θ is the LOD score positive, let alone > 3.0, so this pedigree has no evidence in favor of linkage between the disease and marker loci.
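A sketch of how such a table of LOD scores can be computed for phase-known meioses, assuming the standard formula LOD(θ) = log10[θ^k (1-θ)^(n-k) / (½)^n]. The counts of 5 recombinants out of 10 meioses and the grid of θ values are hypothetical. The slide does not state the pedigree's actual counts.

```python
import math


def lod_score(theta: float, k: int, n: int) -> float:
    """LOD for k recombinants among n phase-known meioses, with 0 <= theta < 1."""
    if theta == 0.0 and k > 0:
        return float("-inf")  # any recombinant rules out theta = 0
    log_l = (k * math.log10(theta) if k else 0.0) + (n - k) * math.log10(1 - theta)
    return log_l - n * math.log10(0.5)


thetas = [0.0, 0.1, 0.2, 0.3, 0.4]  # assumed grid; theta = 0.5 always gives LOD = 0
k, n = 5, 10  # hypothetical counts, not taken from the slide
for theta in thetas:
    print(f"theta = {theta:.1f}: LOD = {lod_score(theta, k, n):.2f}")
best_theta = max(thetas, key=lambda t: lod_score(t, k, n))
```

With these made-up counts every LOD score on the grid is negative, which is the pattern the slide describes as no evidence for linkage.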
31 Deviations from Mendelian segregation
- The trait is not governed by the alleles of a single locus
- Ascertainment bias: heterozygous parents with no diseased children are not sampled
- Differential survival: some genotypes do not survive
- Phenocopies: the disease can also be caused by environmental factors
- Incomplete penetrance: the genotype does not always lead to disease
For each of these possibilities, is the segregation ratio increased or decreased?