Detecting adaptive protein evolution

Slides: 40

Provided by: zih6

1
Detecting adaptive protein evolution
Ziheng Yang, Department of Biology, University College London
2
There are two main explanations for genetic variation observed within a population or between species:
Natural selection (survival of the fittest)
Mutation and drift (survival of the luckiest)
Gillespie, J.H. 1998. Population genetics: a concise guide. Johns Hopkins University Press, Baltimore.
Hartl, D.L., and A.G. Clark. 1997. Principles of population genetics. Sinauer Associates, Sunderland, Massachusetts.
3
Positive & negative selection
Genotype    AA     Aa        aa
Frequency   p^2    2p(1-p)   (1-p)^2
Fitness     1      1+s       1+2s
(A: wild-type allele; a: new mutant)
s is the selection coefficient
s ≈ 0: neutral evolution
s < 0: negative (purifying) selection
s > 0: positive selection (adaptive evolution)
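A minimal sketch of what the fitness table implies for one generation of selection, assuming random mating, an infinite population (so no drift) and the fitnesses 1, 1+s and 1+2s from the table. The function names and the example values of q and s are illustrative only and do not come from the slides.

```python
def next_mutant_frequency(q, s):
    """Frequency of the mutant allele a after one generation of selection.

    q is the current frequency of a, so p = 1 - q is the frequency of A.
    Genotype fitnesses are 1 (AA), 1 + s (Aa) and 1 + 2s (aa).
    """
    p = 1.0 - q
    # Genotype frequencies before selection (Hardy-Weinberg proportions).
    freq_AA, freq_Aa, freq_aa = p * p, 2.0 * p * q, q * q
    w_AA, w_Aa, w_aa = 1.0, 1.0 + s, 1.0 + 2.0 * s
    mean_fitness = freq_AA * w_AA + freq_Aa * w_Aa + freq_aa * w_aa
    # Heterozygotes contribute half of their alleles to a, aa homozygotes all.
    return (0.5 * freq_Aa * w_Aa + freq_aa * w_aa) / mean_fitness


def trajectory(q0, s, generations):
    """Deterministic frequency trajectory of allele a (drift is ignored)."""
    freqs = [q0]
    for _ in range(generations):
        freqs.append(next_mutant_frequency(freqs[-1], s))
    return freqs


if __name__ == "__main__":
    for s in (-0.1, 0.0, 0.1):
        print(s, [round(q, 4) for q in trajectory(0.1, s, 5)])
```

With s = 0 the frequency stays where it started, with s > 0 it rises and with s < 0 it falls. In a finite population drift acts on top of this.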
4
Positive & negative selection
Whether mutation or selection dominates the fate of the new allele depends on the magnitude of Ns relative to 1, where N is the effective population size.
Ns < -3: fatal mutations
-3 < Ns < -1: unlucky losers
-1 < Ns < 1: nearly neutral
1 < Ns < 3: occasional hopefuls
Ns > 3: rare monsters
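The Ns categories above can be written as a small lookup. This is only a sketch: the slide does not say which category the boundary values (-3, -1, 1, 3) belong to, so their assignment here is an assumption, and the function name is hypothetical.

```python
def classify_mutation(effective_size, s):
    """Label a new mutation by the value of Ns, following the slide's bins.

    Boundary values are assigned arbitrarily (an assumption); the slide
    only gives open intervals.
    """
    ns = effective_size * s
    if ns < -3:
        return "fatal mutation"
    if ns < -1:
        return "unlucky loser"
    if ns <= 1:
        return "nearly neutral"
    if ns <= 3:
        return "occasional hopeful"
    return "rare monster"


if __name__ == "__main__":
    # Same selection coefficient, different effective population sizes.
    for n in (100, 2_000, 100_000):
        print(n, classify_mutation(n, 0.001))
```

The usage loop shows the point of the slide: the same s can be nearly neutral in a small population and strongly selected in a large one.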
5
Theories of molecular evolution
Akashi, H. (1999) Gene 238: 39-51
6
Detecting the effect of natural selection is
useful for (a) advancing evolutionary theory
(b) inferring functional significance from
genomic data.
7
Evolutionary conservation means functional
significance.
Thomas, et al. 2003. Nature 424: 788-793
8
Fast-evolving genes or gene regions are also
functionally important if the variability is
driven by natural selection.
9
In protein-coding genes, we can distinguish
between synonymous (silent) and nonsynonymous
(replacement) mutations, and contrast their
substitution rates to infer selection on the
protein.
10
Synonymous & nonsynonymous substitutions
11
Definitions
dS (KS): number of synonymous substitutions per synonymous site
dN (KA): number of nonsynonymous substitutions per nonsynonymous site
ω = dN/dS: nonsynonymous/synonymous rate ratio
12
The ω ratio measures selection at the protein level

ω = 1: neutral evolution
ω < 1: negative (purifying) selection
ω > 1: positive (diversifying) selection
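As a small illustration of the definitions, the ratio and its interpretation can be computed from estimates of dN and dS. The function names and the numeric values are hypothetical, and a point estimate of ω on its own is not evidence of selection without a statistical test.

```python
def omega(dn, ds):
    """Nonsynonymous/synonymous rate ratio, omega = dN/dS."""
    if ds <= 0:
        raise ValueError("dS must be positive for omega to be defined")
    return dn / ds


def selection_regime(dn, ds):
    """Interpret omega as on the slide; this is not a significance test."""
    w = omega(dn, ds)
    if w > 1:
        return "positive (diversifying) selection"
    if w < 1:
        return "negative (purifying) selection"
    return "neutral evolution"


if __name__ == "__main__":
    # Hypothetical estimates, for illustration only.
    print(omega(0.02, 0.10), selection_regime(0.02, 0.10))
    print(omega(0.30, 0.10), selection_regime(0.30, 0.10))
```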

13
Data information
a2    GGC TCT CAC TCC ATG AGG TAT TTC TTC ACA TCC
a24   ... ..C ... ... ... ..T ... ... .A. ..C ...
a11   ... ..C ..A ... ... ... ... ... .A. ..C ...
aw24  ... ..C ... ... ... ... ... ... CA. ..C ...
aw68  ... ..C ... ... ... ..A ... ... .A. ..C ...
a3    ... ..T ..T ... ... ... ... ... C.. ..T ...
14
Early studies averaged synonymous and nonsynonymous rates over sites and had little power to detect adaptive evolution.
15
Possible approaches

Test each site for positive selection (Suzuki & Gojobori 1999 Mol. Biol. Evol. 16: 1315-1328)

Decide on which sites might be under selection and focus on them (Hughes & Nei 1988 Nature 335: 167-170) (fixed-sites model)

Use a statistical distribution to model the ω variation (random-sites model, fishing expedition)

16
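A simple approach (Fitch et al. 1997; Suzuki & Gojobori 1999)
[Figure: codons at a single site placed on a phylogeny. Labels recoverable from the transcript include TTC, ATC, TTA, TAT and TTT; the others are garbled.]
3 nonsynonymous changes, 1 synonymous change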
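The counting step can be sketched as follows, assuming the standard genetic code and that ancestral codons have already been assigned to the interior nodes of the tree. The branch list below is hypothetical, chosen only so that the totals match the slide; it is not the tree from the figure, and the function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(ancestor, descendant):
    """Return 'none', 'synonymous' or 'nonsynonymous' for one branch."""
    if ancestor == descendant:
        return "none"
    if CODON_TABLE[ancestor] == CODON_TABLE[descendant]:
        return "synonymous"
    return "nonsynonymous"


def count_changes(branches):
    """Count changes over (ancestral codon, descendant codon) pairs."""
    counts = {"synonymous": 0, "nonsynonymous": 0}
    for ancestor, descendant in branches:
        kind = classify_change(ancestor, descendant)
        if kind != "none":
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical branches at one site (not the slide's tree).
    branches = [
        ("TTC", "TTC"),  # no change
        ("TTC", "TTT"),  # Phe -> Phe
        ("TTC", "ATC"),  # Phe -> Ile
        ("TTC", "TTA"),  # Phe -> Leu
        ("TTT", "TAT"),  # Phe -> Tyr
    ]
    print(count_changes(branches))
```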
17
Use of codon models to detect amino acid sites
under diversifying selection

Likelihood Ratio Test (LRT) for sites under
positive selection
Bayes calculation of posterior probabilities of
sites under positive selection

18
Rates to CTG
Synonymous
CTC (Leu) → CTG (Leu): π_CTG
TTG (Leu) → CTG (Leu): κπ_CTG
Nonsynonymous
GTG (Val) → CTG (Leu): ωπ_CTG
CCG (Pro) → CTG (Leu): ωκπ_CTG
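The four example rates follow one rule: start from the frequency of the target codon, multiply by κ if the single nucleotide change is a transition, and multiply by ω if the amino acid changes. A minimal sketch of that rule, assuming the standard genetic code; the function names and the numeric values of κ, ω and the codon frequency are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
PURINES = {"A", "G"}


def is_transition(x, y):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (x in PURINES) == (y in PURINES)


def relative_rate(source, target, kappa, omega, pi_target):
    """Relative substitution rate from codon `source` to codon `target`."""
    if CODON_TABLE[source] == "*" or CODON_TABLE[target] == "*":
        return 0.0  # stop codons are excluded from the model
    diffs = [(a, b) for a, b in zip(source, target) if a != b]
    if len(diffs) != 1:
        return 0.0  # only single-nucleotide changes happen instantaneously
    rate = pi_target
    if is_transition(*diffs[0]):
        rate *= kappa
    if CODON_TABLE[source] != CODON_TABLE[target]:
        rate *= omega
    return rate


if __name__ == "__main__":
    kappa, omega, pi_ctg = 2.0, 0.25, 0.02  # hypothetical values
    for source in ("CTC", "TTG", "GTG", "CCG"):
        print(source, "-> CTG", relative_rate(source, "CTG", kappa, omega, pi_ctg))
```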
19
Rate matrix Q = {qij}
(Goldman & Yang 1994 Mol Biol Evol 11: 725-736; Muse & Gaut 1994 Mol Biol Evol 11: 715-724)
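The equation on this slide did not survive in the transcript. A common way of writing the rate from codon i to codon j (i ≠ j), consistent with the "Rates to CTG" slide, is given below. The symbols κ (transition/transversion rate ratio) and π_j (equilibrium frequency of codon j) are the usual ones for this parameterization and are an assumption here, since the slide text itself is missing.

```latex
q_{ij} =
\begin{cases}
0, & \text{if } i \text{ and } j \text{ differ at two or three codon positions} \\
\pi_j, & \text{for a synonymous transversion} \\
\kappa \pi_j, & \text{for a synonymous transition} \\
\omega \pi_j, & \text{for a nonsynonymous transversion} \\
\omega \kappa \pi_j, & \text{for a nonsynonymous transition}
\end{cases}
```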
20
LRT of sites under positive selection
H0: there are no sites at which ω > 1
H1: there are such sites
Compare 2Δℓ = 2(ℓ1 - ℓ0) with a χ² distribution
(Nielsen & Yang 1998 Genetics 148: 929-936; Yang, Nielsen, Goldman & Pedersen 2000 Genetics 155: 431-449)
21
Two pairs of useful models

M1a (Nearly Neutral)
Site class k   0        1
pk             p0       p1
ωk             ω0 < 1   ω1 = 1
M2a (Positive Selection)
Site class k   0        1        2
pk             p0       p1       p2
ωk             ω0 < 1   ω1 = 1   ω2 > 1

Modified from Nielsen & Yang (1998), where ω0 = 0 is fixed
22

M7 (beta, using 10 site classes)
ω ~ beta(p, q)
M8 (beta&ω)
p0 of sites from beta(p, q)
p1 = 1 - p0 of sites with ωs > 1

From Yang et al. (2000)
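To see what "beta, using 10 site classes" amounts to, the beta distribution can be replaced by 10 equally probable classes, each represented by one ω value. The sketch below approximates the class means by simulation with the Python standard library. This is a swapped-in Monte Carlo shortcut, not the numerical procedure of any particular program; the function name, the number of draws and the seed are arbitrary choices. The parameters beta(0.10, 0.35) are the M7 estimates reported for the MHC data later in the talk.

```python
import random


def discretise_beta(p, q, n_classes=10, n_draws=100_000, seed=1):
    """Approximate a beta(p, q) by n_classes equal-probability classes.

    Returns the mean omega of each class, in increasing order. n_draws is
    assumed to be a multiple of n_classes so every class has the same size.
    """
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(p, q) for _ in range(n_draws))
    size = n_draws // n_classes
    return [
        sum(draws[k * size:(k + 1) * size]) / size
        for k in range(n_classes)
    ]


if __name__ == "__main__":
    for k, w in enumerate(discretise_beta(0.10, 0.35)):
        print(f"class {k}: omega = {w:.4f} (proportion 0.1)")
```

Every class value lies between 0 and 1, which is why M7 cannot accommodate sites with ω > 1 and can serve as the null model for M8.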
23
(No Transcript)
24
Discretisation of a continuous distribution: M7 (beta)
[Figure: number of sites plotted against the ω ratio, with the ω axis running from 0 to 1.]
25
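Mixture distribution: M8 (beta&ω)
[Figure: number of sites plotted against the ω ratio. A proportion p0 of sites comes from beta(p, q) over the range 0 to 1, and a proportion p1 forms an extra class marked at ω = 1.7.]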
26
Likelihood function and Empirical Bayesian inference of sites under selection (M2a)
Site class k    0        1        2
Proportion pk   p0       p1       p2
ω ratio ωk      ω0 < 1   ω1 = 1   ω2 > 1
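The formulas on this slide are missing from the transcript. Under a site-class model the usual construction is that the likelihood at a site is the mixture of its likelihoods under each class, weighted by the class proportions, and the posterior probability of a class follows from Bayes' theorem. A minimal sketch, assuming the per-class site likelihoods have already been computed on the tree; the function names and all numbers are hypothetical.

```python
def site_likelihood(proportions, class_likelihoods):
    """Mixture likelihood of one site: sum over k of p_k * f(x | omega_k)."""
    return sum(p * f for p, f in zip(proportions, class_likelihoods))


def posterior_class_probabilities(proportions, class_likelihoods):
    """Posterior probability that the site belongs to each class."""
    total = site_likelihood(proportions, class_likelihoods)
    return [p * f / total for p, f in zip(proportions, class_likelihoods)]


if __name__ == "__main__":
    # Hypothetical M2a proportions (p0, p1, p2) and per-class likelihoods
    # of the data at a single site.
    proportions = [0.5, 0.3, 0.2]
    class_likelihoods = [1e-5, 2e-5, 8e-5]
    posterior = posterior_class_probabilities(proportions, class_likelihoods)
    print([round(x, 4) for x in posterior])
```

Plugging maximum likelihood estimates of the proportions and ω values into this calculation ignores the uncertainty in those estimates, which matters most in small data sets.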
27
Bayes Empirical Bayes (BEB) M2a
28
Human MHC Class I data: 192 alleles, 270 codons
Model         ℓ           Parameter estimates
M7 (beta)     -7,498.97   beta(0.10, 0.35)
M8 (beta&ω)   -7,232.68   p0 = 0.90, beta(0.17, 0.71), (p1 = 0.10), ωs = 5.12
Likelihood ratio test of positive selection: 2Δℓ = 2 × 266.29 = 532.58, P < 0.000, d.f. = 2
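The test statistic can be checked from the two log-likelihoods. The sketch below handles only the case of 2 degrees of freedom, where the upper tail probability of the χ² distribution has the closed form exp(-x/2); the function name is illustrative, and other degrees of freedom would need a statistics library.

```python
import math


def likelihood_ratio_test_df2(lnl_null, lnl_alt):
    """LRT statistic 2(l1 - l0) and its chi-square p-value for d.f. = 2."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    if statistic <= 0:
        return statistic, 1.0
    # Survival function of a chi-square with 2 degrees of freedom.
    return statistic, math.exp(-statistic / 2.0)


if __name__ == "__main__":
    # Log-likelihoods from the slide: M7 (null) and M8 (alternative).
    statistic, p_value = likelihood_ratio_test_df2(-7498.97, -7232.68)
    print(f"2*delta_lnL = {statistic:.2f}, P = {p_value:.3g}")
```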
29
Posterior probabilities for MHC
30
25 sites identified by M8 (beta&ω) using both NEB & BEB
31
Comparison between NEB and BEB from real data
analysis and computer simulation suggests that

BEB is effective in correcting high false
positive rates of NEB in small (non-informative)
data sets.
BEB does not seem to cause a loss of power in
large (informative) data sets.
Some wrong models are more useful than the true
model.

32
A small data set (HTLV tax gene) (Suzuki & Nei 2004 MBE 21: 914-921)
20 sequences, 181 codons. 23 singleton differences on star tree: 2 synonymous, 21 nonsynonymous
NEB: M0 (one-ratio), M2 (selection), M2a (PositiveSelection), M8 (beta&ω) all give ω = 4.87. Every site is under positive selection with P = 1
BEB: 21 sites have 0.91 < P < 0.93 under M2a and 0.96 < P < 0.97 under M8. Other sites have P = 57% or 70%.
33
Performance measures in simulation
True positive: 50/80
False positive: 10/120
Accuracy: 50/60
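One reading of these fractions, stated here as an assumption because the slide's figure is missing: 80 sites are truly under positive selection, 120 are not, and the method flags 60 sites, of which 50 are correct. The three measures then follow directly; the function and key names are illustrative.

```python
def performance(true_pos, false_pos, n_selected_sites, n_other_sites):
    """Performance measures for site identification in a simulation.

    true_pos: flagged sites that are truly under positive selection
    false_pos: flagged sites that are not
    n_selected_sites: all sites truly under positive selection
    n_other_sites: all sites not under positive selection
    """
    return {
        "true_positive_rate": true_pos / n_selected_sites,
        "false_positive_rate": false_pos / n_other_sites,
        "accuracy": true_pos / (true_pos + false_pos),
    }


if __name__ == "__main__":
    for name, value in performance(50, 10, 80, 120).items():
        print(f"{name}: {value:.3f}")
```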
34
Performance of BEB (NEB) in simulations (cutoff P = 95%)
35
Advantages of ML

Accounts for the genetic code
Accounts for ts/tv rate bias and codon usage bias
Avoids bias in ancestral reconstruction
Uses probability theory to correct for multiple
hits

36
Assumptions & Limitations

Same selective pressure over all lineages
No recombination within the sequence
No variation in synonymous rate among sites
Same rate for all amino acid changes
No sequencing or alignment errors
The level of sequence divergence and the number
of sequences are two major factors affecting
accuracy and power. Data of only a few closely
related sequences do not contain much information.

37
Adaptive molecular evolution

proteins involved in immunity or defence (MHC, immunoglobulin VH, class 1 chitinase)
proteins involved in evading defence systems (HIV env, nef, gag, pol, etc., capsid in FMD virus, flu virus hemagglutinin gene)
proteins involved in male & female reproduction (abalone sperm lysin, sea urchin bindin, proteins in mammals)
Miscellaneous

38
Acknowledgments
BBSRC
http://abacus.gene.ucl.ac.uk/
39
References
Yang, Z., and J.P. Bielawski. 2000. Statistical methods for detecting molecular adaptation. Trends in Ecology and Evolution 15: 496-503.
Yang, Z. 2001. Adaptive molecular evolution, Chapter 12 (pp. 327-350) in Handbook of statistical genetics, eds. D. Balding, M. Bishop, and C. Cannings. Wiley, New York.
Yang, Z. 2002. Inference of selection from multiple species alignments. Current Opinion in Genetics and Development 12: 688-694.
Wong, W.S.W., et al. 2004. Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics 168: 1041-1051.
Yang, Z., et al. submitted. Bayes empirical Bayes inference of amino acid sites under positive selection. Molecular Biology and Evolution.
