Title: Conservation Pattern in 145 Aldehyde Dehydrogenases
1Analysis of Paralogous Subfamilies
2Sequence Analysis - Overview
3Diagnosing Subfamily Differences
- Sequence families or superfamilies often contain
  paralogous genes - genes that have evolved from a
  common ancestor to carry out related but
  different functions.
- The defining feature of paralogous sequences is a
  gene duplication event in their common
  evolutionary history.
- Common examples
  - tRNAs for different amino acids or codons.
  - Serine proteases: elastase, trypsin,
    chymotrypsin
  - Globins: myoglobin, alpha hemoglobin, beta
    hemoglobin ...
4Paralogous Subfamilies
- We want to find what sequence residues define the
identity of paralogous families within the same
superfamily.
5What are we trying to discover?
- The important task is to ask the right question
  - What does the subfamily have in common?
    - The obvious question after studying homologous
      families, but
    - Fails to carefully consider the nature of a
      pattern of conserved residues in a superfamily of
      sequences.
    - Unproductive because it leads to inefficient use
      of the available data
  - What makes the subfamily different from the rest
    of the family or superfamily?
6Diagnosing Subfamily Differences
- Columns in a multiple sequence alignment can be
  crudely classified into three distinct
  categories
  - Important to the common function and structure of
    the family
    - limited variability across the entire family
  - Important to the specific function of a subfamily
    of sequences
    - likely to be limited variability within the
      subfamily
    - indeterminate variability outside the subfamily
    - residues within the subfamily differ from those
      outside the subfamily
  - Mainly scaffolding or filler - residue identity
    is not critical to either the family or subfamily
    - highly variable within the family and subfamily
    - Not different between subfamilies and the entire
      family
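
The three categories can be illustrated with a crude counting sketch. It is not the authors' implementation: the function name `classify_column`, the `max_variants` cutoff for "limited variability", and the toy alignment are all assumptions made for illustration.

```python
def classify_column(column, in_subfamily, max_variants=1):
    """Crudely assign one alignment column to one of the three categories.

    column       -- residues at this position, one per sequence
    in_subfamily -- parallel list of booleans, True for subfamily members
    max_variants -- hypothetical cutoff for "limited variability"
    """
    inside = {r for r, flag in zip(column, in_subfamily) if flag}
    outside = {r for r, flag in zip(column, in_subfamily) if not flag}

    # Category 1: limited variability across the entire family.
    if len(inside | outside) <= max_variants:
        return "family"
    # Category 2: limited variability inside the subfamily, and no residue
    # shared with the sequences outside it.
    if len(inside) <= max_variants and inside.isdisjoint(outside):
        return "subfamily"
    # Category 3: everything else is treated as scaffolding or filler.
    return "filler"


# Toy alignment (hypothetical): the first two sequences form the subfamily.
sequences = ["GAC", "GAU", "GCA", "GCG"]
membership = [True, True, False, False]
labels = [classify_column(col, membership) for col in zip(*sequences)]
print(labels)
```

A hard cutoff like this is brittle on real alignments, where a single divergent sequence changes the label of a column.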
7Diagnosing Subfamily Differences
- What is the biological model?
- Simple set theory or counting implementation.
- Information theory implementation.
- Apply the analysis to tRNAs
- Set theory results.
- Information theory results.
- Apply the analysis to Aldehyde Dehydrogenases and
Glutathione S-Transferases using the information
theory implementation.
8Disjoint Subset Analysis
- High scores indicate residues essential to the
  function of specific subfamilies.
- The analysis corresponds to a straightforward
  physical model.
- Has successfully predicted the transfer RNA sites
  essential for amino acid acceptor activity.
- Predicted previously unknown biochemistry in tRNA
  processing.
9Disjoint Subset Analysis
- Much more powerful than consensus analyses
- Given a superfamily with families A, B, and C
- Consensus analysis asks three simple questions.
  - What is invariant in family A?
  - What is invariant in family B?
  - What is invariant in family C?
- Disjoint subset analysis asks three complex
  questions.
  - What is uniquely family A and not families B or C?
  - What is uniquely family B and not families A or C?
  - What is uniquely family C and not families A or B?
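
The contrast between the two sets of questions can be sketched with plain set operations. The function names and the three toy families below are hypothetical and chosen only to make the difference visible; this is not the authors' code.

```python
def consensus_positions(family):
    """Positions where every sequence in the family has the same residue."""
    return [i for i, col in enumerate(zip(*family)) if len(set(col)) == 1]


def disjoint_positions(name, families):
    """Positions where the named family shares no residue with any other family."""
    others = [seq for fam, seqs in families.items() if fam != name for seq in seqs]
    hits = []
    # Walk the family's columns and the other families' columns in parallel.
    for i, (own, rest) in enumerate(zip(zip(*families[name]), zip(*others))):
        if set(own).isdisjoint(rest):
            hits.append(i)
    return hits


# Hypothetical aligned sequences for families A, B, and C.
families = {
    "A": ["ACGU", "ACGA"],
    "B": ["AGGU", "AGCU"],
    "C": ["AUGU", "AUGC"],
}

for name in families:
    print(name,
          "consensus:", consensus_positions(families[name]),
          "disjoint:", disjoint_positions(name, families))
```

In this toy data the consensus question reports several invariant positions per family, including position 0, which is invariant across the whole superfamily. The disjoint question reports only position 1, the one column that actually separates each family from the other two.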
10Disjoint Subset Analysis
- Disjoint subset analysis is based on an explicit
  model of macromolecular identity determinants.
- Biological macromolecules have two types of
  identity determinants: positive and negative.
- Positive identity determinants mediate
  interactions with other molecules
  (macromolecules, ligands, co-factors, substrates,
  or inhibitors) that are essential to the correct
  functioning of the molecule.
- Negative identity determinants prevent
  interactions, so-called forbidden interactions,
  with other molecules that would lead to incorrect
  functioning of the molecule, that is, carrying out
  the function of a different family of molecules.
11Disjoint Subset Analysis
- Disjoint subset analysis is based on an explicit
  model of macromolecular identity determinants.
- Molecules within a homologous family have an
  overlapping set of positive identity determinants
  at the same positions within the structure and
  sequence of the molecule.
- Paralogous subfamilies can have positive identity
  determinants at different positions within the
  molecule.
- Paralogous subfamilies that share a necessary
  interaction will most likely share positive
  identity determinants for that interaction.
- Individual molecules within the family may have
  completely different negative identity
  determinants for any particular forbidden
  interaction.
12Analysis of Alanine tRNAs
         Sequence      vs Ala-1      vs Ala-2
Ala-1    G G G G G
Ala-2    G G C G C
Arg-1    G A U U A     s d d d d     s d d d d
Arg-2    G G A C C     s s d d d     s s d d s
Leu-1    G G U A A     s s d d d     s s d d d
Leu-2    G C U G G     s d d s s     s d d s d
Leu-3    G C U C G     s d d d s     s d d d d
Total number of ds     0 3 5 4 3     0 3 5 4 4

(s = same residue as the Ala sequence at that position, d = different)

Aggregate the totaled ds (discrete sequences):
(0,3,5,4,3) + (0,3,5,4,4) = (0,6,10,8,7)

Ala position 3 (G,C) is discrete from Arg (U,A) and Leu (U,U,U)
and hence completely and logically identifies Ala.
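
The counting behind this table can be reproduced in a few lines. The sequences are the seven from the slide; the function and variable names are illustrative assumptions.

```python
# Five-position tRNA fragments from the slide.
ala = {"Ala-1": "GGGGG", "Ala-2": "GGCGC"}
others = {
    "Arg-1": "GAUUA", "Arg-2": "GGACC",
    "Leu-1": "GGUAA", "Leu-2": "GCUGG", "Leu-3": "GCUCG",
}


def count_differences(reference, sequences):
    """Per position, count the sequences whose residue differs from the reference."""
    return [sum(seq[i] != ref for seq in sequences)
            for i, ref in enumerate(reference)]


# One row of d counts for each Ala sequence, then the column-wise aggregate.
per_reference = {name: count_differences(seq, others.values())
                 for name, seq in ala.items()}
aggregate = [sum(col) for col in zip(*per_reference.values())]

# A position is fully discrete when every Ala/non-Ala pair differs there.
max_score = len(ala) * len(others)
discrete = [i + 1 for i, total in enumerate(aggregate) if total == max_score]

print(per_reference, aggregate, discrete)
```

Only position 3 reaches the maximum possible score of 10 (2 Ala sequences times 5 non-Ala sequences), which matches the conclusion on the slide.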
13Two Entropy Measures
Family Entropy
Group Entropy Distance
pi = foreground residue frequency
qi = background residue frequency
14Residue Frequency Data
- Family Entropy
  - Foreground residue frequencies, pi, are taken from
    each column of the alignment data
  - Background frequencies, qi, are taken as the
    expected values of residues in random sequences
- Group Entropy Distances
  - Foreground residue frequencies, pi, are taken
    from a single column of a defined group within
    the alignment data
  - Background residue frequencies, qi, are taken
    from a single column of all residues outside of
    the defined group within the alignment data
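
As a minimal sketch of how the two measures might be computed from one alignment column, the code below assumes both are relative entropies in bits. Family Entropy is taken as the sum of pi log2(pi/qi) against a uniform random-sequence background, which is an assumption. Group Entropy Distance is taken as the symmetric sum of the two relative entropies, the form used in the worked Ala tRNA calculation. The function names and the `pseudocount` smoothing that keeps the logarithms finite are also assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def frequencies(residues, alphabet, pseudocount=0.5):
    """Smoothed residue frequencies; the pseudocount value is an assumption."""
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(alphabet)
    return {a: (counts[a] + pseudocount) / total for a in alphabet}


def relative_entropy(p, q):
    """Sum of p_i * log2(p_i / q_i), in bits."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in p)


def family_entropy(column, alphabet, background=None):
    """Whole-column frequencies against a random background (uniform unless given)."""
    p = frequencies(column, alphabet)
    q = background or {a: 1 / len(alphabet) for a in alphabet}
    return relative_entropy(p, q)


def group_entropy_distance(column, in_group, alphabet):
    """Symmetric relative entropy between group and non-group residues."""
    p = frequencies([r for r, g in zip(column, in_group) if g], alphabet)
    q = frequencies([r for r, g in zip(column, in_group) if not g], alphabet)
    return relative_entropy(p, q) + relative_entropy(q, p)


# Columns 1 and 3 of the Ala/Arg/Leu example; the first two sequences are Ala.
in_ala = [True, True, False, False, False, False, False]
col1, col3 = "GGGGGGG", "GCUAUUU"
fe1, fe3 = family_entropy(col1, "ACGU"), family_entropy(col3, "ACGU")
ged1 = group_entropy_distance(col1, in_ala, "ACGU")
ged3 = group_entropy_distance(col3, in_ala, "ACGU")
print(fe1, fe3, ged1, ged3)
```

Under these assumptions the invariant column 1 scores higher on Family Entropy, while column 3, where Ala differs from every other sequence, scores higher on Group Entropy Distance.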
15Group Entropy Distance Ala tRNAs
Ala-1  G G G G G
Ala-2  G G C G C
Arg-1  G A U U A
Arg-2  G G A C C
Leu-1  G G U A A
Leu-2  G C U G G
Leu-3  G C U C G

pi = fractions of nucleotides within the Alanine group.
qi = fractions of nucleotides in the Arginine + Leucine group.

pa = 0.1    pc = 0.4    pg = 0.4    pu = 0.1
qa = 0.15   qc = 0.05   qg = 0.05   qu = 0.75

\[
\begin{aligned}
\mathrm{GED} ={}& 0.1\log_2\frac{0.1}{0.15} + 0.4\log_2\frac{0.4}{0.05} + 0.4\log_2\frac{0.4}{0.05} + 0.1\log_2\frac{0.1}{0.75} \\
&+ 0.15\log_2\frac{0.15}{0.1} + 0.05\log_2\frac{0.05}{0.4} + 0.05\log_2\frac{0.05}{0.4} + 0.75\log_2\frac{0.75}{0.1} \\
={}& -0.059 + 1.200 + 1.200 - 0.291 + 0.088 - 0.150 - 0.150 + 2.180 \\
\approx{}& 4.02
\end{aligned}
\]
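
The arithmetic can be checked numerically. The frequencies are the ones given on the slide; base-2 logarithms are an inference from the printed term values (for example, 0.4 log2(0.4/0.05) = 1.200). The variable names are illustrative.

```python
import math

# Frequencies as given on the slide.
p = {"A": 0.1, "C": 0.4, "G": 0.4, "U": 0.1}        # Alanine group
q = {"A": 0.15, "C": 0.05, "G": 0.05, "U": 0.75}    # Arginine + Leucine group

# The eight terms, in the same order as the written calculation.
forward = [p[b] * math.log2(p[b] / q[b]) for b in "ACGU"]
reverse = [q[b] * math.log2(q[b] / p[b]) for b in "ACGU"]
ged = sum(forward) + sum(reverse)

print([round(t, 3) for t in forward + reverse], round(ged, 3))
```

Because 0.1 is smaller than 0.15, the first term, 0.1 log2(0.1/0.15), is negative (about -0.058), and the eight terms sum to about 4.02 bits.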
16Calculating Group Entropy
17Analyzing tRNA isoacceptors
- 67 tRNAs from Escherichia coli, near relatives,
  and its bacteriophage
- 20 amino acid isoacceptor subfamilies
  - only one sequence in some isoacceptor subfamilies
    (Phe)
  - as many as eight sequences in Leu and Pro
    subfamilies
- William H. McClain, University of Wisconsin
19Analysis of diagnostic sequence elements in 67
tRNAs from E. coli
20Consensus Analysis of 3 Valine tRNAs
21Analysis of diagnostic sequence elements in 67
tRNAs from E. coli
22Analysis of diagnostic sequence elements in 67
tRNAs from E. coli
23tRNA Discriminator Positions
24Differences Among Groups of Aldehyde
Dehydrogenases
- Hugh Nicholas, Pittsburgh Supercomputing Center
- John Hempel, University of Pittsburgh
- John Perozich, University of Pittsburgh
- Bi-Cheng Wang, University of Georgia
- Ronald Lindahl, University of South Dakota
25Relationship Among ALDH Families
26Motifs Strength and Consensus
Red: 100% conserved. Green: > 90%. Blue: > 80%.
Italics: functional residues.
27Relationship Among Motifs
Rat Class 3 Aldehyde Dehydrogenase
28Two Entropy Measures
Family Entropy
Group Entropy Distance
pi = foreground residue frequency
qi = background residue frequency
29Graphical Classification of Residues
(Figure: residues plotted by Family Entropy and Group Entropy Distance. Labeled regions: Type 1 Residues, Type 2 Residues, Type 3 Residues, and a Forbidden Region.)
30Diagnostic positions for Class 3 ALDH
31Motifs and Diagnostic Residues for all ALDH
Classes
32Motifs and Class 3 Discriminators
33ALDH Motif 6
Catalytic thiol (cys)
34ALDH Motif 8
NAD binding and specificity
35Asp 247: A Sjögren-Larsson Mutation in Class 3
ALDH
36Differences Among Groups of Glutathione
S-Transferases
- Hugh B. Nicholas Jr.
- Troy Wymore
- David W. Deerfield, II.
37Glutathione S-Transferase
- Detoxifies organic chemicals containing halogen
  or double bonds by addition of Glutathione.
- Subsequent processing pathway leads to excretion.
- The catalytic residue (thiol) is from
  Glutathione.
- Only the cytoplasmic form is presented here.
- Classified into six groups, initially based on
  Swiss-Prot database annotation. The exact number of
  groups is still subject to debate.
- Found in bacteria and all kinds of eukaryotes.
- 126 sequences from the Swiss-Prot database.
38Consensus Bootstrap Phylogeny
39MEME ZOOPS Motifs for GST
40MEME ZOOPS Motifs -- Rat Mu-1
42Cross Entropy Group Positions
43Group Specific Amino Acids: Mu, Alpha, and Theta
GSTs
Rat Mu1
Human Alpha1
Human Theta2