Title: What is positive selection
1. What is positive selection?
- dN = rate of nonsynonymous substitution
- dS = rate of synonymous substitution
- Let ω = the ratio dN/dS
2. Positive selection occurs when the ω ratio exceeds unity
__________________________________
Type of selection       Outcome
__________________________________
Purifying selection     dN/dS < 1
No selection            dN/dS = 1
Positive selection      dN/dS > 1
__________________________________
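The three outcomes in the table can be written as a small classifier. This is only a sketch: the function name `classify_selection`, the tolerance `tol`, and the rate values are hypothetical, chosen for illustration.

```python
def classify_selection(dn, ds, tol=1e-9):
    """Classify a dN/dS ratio using the three outcomes in the table."""
    if ds <= 0:
        raise ValueError("dS must be positive for the ratio to be defined")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        return omega, "no selection"
    if omega < 1.0:
        return omega, "purifying selection"
    return omega, "positive selection"

# Hypothetical rate estimates, for illustration only.
for dn, ds in [(0.01, 0.05), (0.03, 0.03), (0.08, 0.02)]:
    omega, label = classify_selection(dn, ds)
    print(f"dN={dn}, dS={ds}: omega={omega:.2f} -> {label}")
```

A point estimate of ω alone does not show that it differs significantly from 1; that requires a statistical test.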
3. How do we test for positive selection?
- 1. Estimate means and variances of dN and dS for all pairwise species comparisons.
- 2. Use a t-test to determine whether dN and dS differ significantly.
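The two steps above can be sketched for a single species pair. One common form of the test statistic is t = (dN − dS) / sqrt(Var(dN) + Var(dS)). The function name, the input values, and the use of a normal approximation for the p-value are assumptions made for this illustration, not details taken from the slides.

```python
import math

def dn_ds_t_test(dn, var_dn, ds, var_ds):
    """Return (t, p) for one pairwise comparison.

    t = (dN - dS) / sqrt(Var(dN) + Var(dS)); p is a two-sided p-value
    from a normal approximation (an assumption of this sketch).
    """
    t = (dn - ds) / math.sqrt(var_dn + var_ds)
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p

# Hypothetical estimates and variances for one species pair.
t, p = dn_ds_t_test(dn=0.050, var_dn=0.0001, ds=0.020, var_ds=0.0001)
print(f"t = {t:.3f}, two-sided p = {p:.4f}")
```

In a full analysis the same calculation would be repeated for every pairwise comparison.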
6. Some problems
- 1. Averages over all amino acid positions in a protein.
- 2. Averages over all lineages.
- 3. Can detect positive selection only when it is very strong and consistent through evolutionary time.
7. Some more problems
- Ignores the phylogenetic framework in which
adaptive molecular evolution occurs!
8. Suppose a significant ω ratio is detected between cow and pig
               1     2     3     4     5     6
1. Cow       ---    ns    ns    ns  3.45    ns
2. Deer            ---    ns    ns    ns    ns
3. Whale                 ---    ns    ns    ns
4. Hippo                       ---    ns    ns
5. Pig                               ---    ns
6. Camel                                   ---
ns = non-significant
9. A phylogenetic perspective
[Figure: phylogeny of Cow, Deer, Whale, Hippo, Pig, and Camel, with an outgroup. Several branches are marked "ω > 1?".]
10. An example: lysozyme evolution in colobine monkeys
- Colobine monkeys are leaf-eaters that have evolved a complex foregut (like ruminants).
- The stomach expresses a high level of the bacteriolytic enzyme lysozyme.
11. Phylogeny of Colobines and Cercopithecines
[Figure: phylogeny from Messier & Stewart (1997), annotated "Foregut fermentation evolved".
Colobines: Hanuman langur, Purple-faced langur, Dusky langur, Francois' langur, Proboscis monkey, Guereza colobus, Angolan colobus.
Cercopithecines: Patas monkey, Vervet, Talapoin, Rhesus macaque, Allen's monkey, Olive baboon, Sooty mangabey.
Also shown: Chimpanzee.]
12. Phylogeny of Colobines and Cercopithecines
[Figure: the phylogeny of slide 11, from Messier & Stewart (1997), with one branch annotated ω = 4.7.]
14. A Maximum-Likelihood (ML) approach to the detection of positive selection
- ML methods evaluate the probability (i.e., the likelihood) of obtaining a set of DNA sequences given:
  - a specific phylogenetic tree
  - an explicit model of nucleotide substitution.
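To make "likelihood given a tree and a model" concrete, here is a deliberately small sketch. It assumes the simplest possible tree (two sequences joined by one branch of length d) and swaps in the Jukes-Cantor (JC69) nucleotide model, which is much simpler than the codon model used later in the talk. The sequences and function name are hypothetical.

```python
import math

def jc69_log_likelihood(seq_a, seq_b, d):
    """Log-likelihood of two aligned sequences separated by distance d.

    The "tree" is a single branch of length d (expected substitutions
    per site) and the model is Jukes-Cantor (JC69).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # P(no change at a site)
    p_diff = 0.25 - 0.25 * e   # P(change to one specific other base)
    log_l = 0.0
    for a, b in zip(seq_a, seq_b):
        # 0.25 is the equilibrium frequency of the base in seq_a.
        log_l += math.log(0.25 * (p_same if a == b else p_diff))
    return log_l

# Hypothetical toy alignment: 20 sites, 3 differences.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGAACGTCCGTACGA"

p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
d_hat = -0.75 * math.log(1.0 - 4.0 * p / 3.0)  # closed-form ML estimate under JC69
for d in (0.05, d_hat, 0.50):
    print(f"d={d:.3f}  lnL={jc69_log_likelihood(seq_a, seq_b, d):.3f}")
```

The log-likelihood is highest at the ML estimate of d and lower at the other two values tried. ML programs do the same thing with more parameters and a full tree.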
15. Some details of the model
- Implemented in the PAML package of Yang (1997)
- Uses a Markov process to describe substitutions between sense codons
- Parameters include:
  - transition/transversion ratio (κ)
  - codon frequencies (π)
  - branch lengths scaled for time (t)
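A minimal sketch of how a Markov process on sense codons is commonly parameterized in codon models of this family. It is not code from PAML. The rate from codon i to codon j is zero unless the codons differ at exactly one position; otherwise it is the frequency of codon j, multiplied by κ for a transition and by the dN/dS ratio ω for a nonsynonymous change. The function name `codon_rate` and the example values are assumptions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}

def codon_rate(i, j, kappa, omega, pi_j):
    """Relative substitution rate from sense codon i to sense codon j."""
    if CODE[i] == "*" or CODE[j] == "*":
        raise ValueError("the model is defined on sense codons only")
    diffs = [(a, b) for a, b in zip(i, j) if a != b]
    if len(diffs) != 1:
        return 0.0  # identical codons, or more than one nucleotide change
    a, b = diffs[0]
    rate = pi_j
    if (a in PURINES) == (b in PURINES):
        rate *= kappa  # transition (purine<->purine or pyrimidine<->pyrimidine)
    if CODE[i] != CODE[j]:
        rate *= omega  # nonsynonymous change
    return rate

# Hypothetical parameter values, for illustration only.
for i, j in [("TTT", "TTC"), ("TTT", "TTA"), ("TTT", "CTT"), ("TTT", "CCT")]:
    print(i, "->", j, codon_rate(i, j, kappa=2.0, omega=0.5, pi_j=0.1))
```

Branch lengths (t) enter when these rates are converted to substitution probabilities along each branch; that step is omitted here.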
17. Testing for positive selection involves comparing two models
- Model M7: Assumes ω ratios follow a beta distribution (i.e., constrained to the interval 0-1).
- Model M8: Adds a second class of sites to M7 at which ω ratios can exceed unity (i.e., positive selection).
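The difference between the two models can be illustrated by simulating per-site ω values under each. This is a sketch only: it draws random values rather than fitting anything, and the beta shape parameters `p` and `q`, the proportion `p0`, and the extra-class ratio `omega_s` are hypothetical values.

```python
import random

def sample_omegas_m7(n_sites, p, q, rng):
    """M7-style: every site's omega is drawn from beta(p, q), so 0 <= omega <= 1."""
    return [rng.betavariate(p, q) for _ in range(n_sites)]

def sample_omegas_m8(n_sites, p, q, p0, omega_s, rng):
    """M8-style: a fraction p0 of sites is beta-distributed as in M7;
    the remaining sites form an extra class with ratio omega_s."""
    return [rng.betavariate(p, q) if rng.random() < p0 else omega_s
            for _ in range(n_sites)]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
m7 = sample_omegas_m7(1000, p=0.5, q=2.0, rng=rng)
m8 = sample_omegas_m8(1000, p=0.5, q=2.0, p0=0.9, omega_s=3.0, rng=rng)

print("M7: any site with omega > 1?", any(w > 1.0 for w in m7))
print("M8: sites with omega > 1:", sum(w > 1.0 for w in m8))
```

Under M7 no site can have ω above 1, so only M8 can accommodate positively selected sites.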
20. Statistical testing can be done by likelihood ratio tests (LRTs)
- 1. Obtain the log-likelihood score from model M7, $\ell_{M7}$ (null model).
- 2. Obtain the log-likelihood score from model M8, $\ell_{M8}$ (positive selection).
- 3. Test for significance:
  - $\chi^2 = 2(\ell_{M8} - \ell_{M7})$ with 2 d.f.
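The three steps can be sketched as follows. The log-likelihood values are hypothetical. The helper `chi2_sf` is written here for illustration and uses the closed-form chi-square tail probabilities for 1 and 2 degrees of freedom. The degrees of freedom are passed in explicitly; they equal the number of extra free parameters in the alternative model (M8 adds two to M7: the proportion of sites in the extra class and that class's ω).

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (closed forms for df 1 and 2)."""
    if x < 0:
        raise ValueError("the test statistic cannot be negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only covers df = 1 or df = 2")

def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * (lnL_alt - lnL_null), p-value)."""
    stat = 2.0 * (lnl_alt - lnl_null)
    # A slightly negative statistic can arise from numerical noise; clamp at 0.
    return stat, chi2_sf(max(stat, 0.0), df)

# Hypothetical log-likelihoods, for illustration only.
stat, p_value = likelihood_ratio_test(lnl_null=-1234.5, lnl_alt=-1228.0, df=2)
print(f"2*dlnL = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value means that allowing a class of sites with ω > 1 fits the data significantly better than the null model.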
23. Advantages of ML approach
- 1. Allows for formal statistical testing by likelihood ratio tests.
- 2. Allows for individual codons subject to positive selection to be identified.
- 3. Allows for positive selection to be inferred along individual branches of a phylogeny.
24. Application to the pantophysin gene in marine gadid fishes
- Pantophysin is an integral membrane protein localized to small (< 100 nm) cytoplasmic microvesicles
- Believed to function in a variety of intracellular shuttling pathways
- Exact function remains unknown
26. Transmembrane structure of pantophysin
[Figure: residue-by-residue topology diagram of pantophysin, from the cytoplasmic NH2 terminus to the cytoplasmic COOH terminus. The chain crosses the microvesicle membrane several times. Labeled regions: intravesicular domains (IV1, IV2) in the lumen of the microvesicle, transmembrane (TM) domains within the microvesicle membrane, and cytoplasmic (Cyt) domains in the cytoplasm.]
27. Genealogy of PanI alleles in the Atlantic cod
[Figure: genealogy in which the sampled alleles fall into two groups, PanIA alleles (N = 64) and PanIB alleles (N = 64), each marked with the value 100. Tip labels are sample identifiers with the prefixes BA, BS, IC, NS, and NF and the suffix A or B (e.g., BA105A, NF42B). Gadus ogac also appears in the tree. Scale bar: 1 change.]
28. Amino acid differences between PanIA and PanIB alleles
[Figure: the pantophysin topology diagram (lumen of microvesicle, microvesicle membrane, cytoplasm; NH2 and COOH termini), with the IV1 domain labeled, showing the positions at which PanIA and PanIB alleles differ.]
29. Amino acid substitutions within PanI allelic classes
______________________________________________________________________________
Allele   Codon      Amino acid    Location   Classification(a)   Distribution
         position   change                                       in sample
______________________________________________________________________________
PanIA    61         Lys to Gln    IV1        Radical             Fixed
         64         Asn to Thr    IV1        Radical             Fixed
         79         Ser to Thr    IV1        Radical             Fixed
PanIB    43         Glu to Val    IV1        Radical             Fixed
         61         Lys to Asn    IV1        Radical             Fixed
         64         Asn to Asp    IV1        Radical             Fixed
______________________________________________________________________________
(a) Following Taylor (1986)