Conservation Pattern in 145 Aldehyde Dehydrogenases - PowerPoint PPT Presentation

About This Presentation
Title:

Conservation Pattern in 145 Aldehyde Dehydrogenases

Description:

PSC's Biomedical Supercomputing Initiative -- An NIH Resource Center 1 ... Mainly scaffolding or 'filler' - residue identity is not critical to either the ...

Slides: 44
Provided by: hughbni

Transcript and Presenter's Notes



1
Analysis of Paralogous Subfamilies
2
Sequence Analysis - Overview
3
Diagnosing Subfamily Differences
  • Sequence families or superfamilies often contain
    paralogous genes: genes that have evolved from a
    common ancestor to carry out related but
    different functions.
  • The defining feature of paralogous sequences is a
    gene duplication event in their common
    evolutionary history.
  • Common examples:
      • tRNAs for different amino acids or codons.
      • Serine proteases: elastase, trypsin,
        chymotrypsin.
      • Globins: myoglobin, alpha hemoglobin, beta
        hemoglobin ...

4
Paralogous Subfamilies
  • We want to find which sequence residues define the
    identity of paralogous families within the same
    superfamily.

5
What are we trying to discover?
  • The important task is to ask the right question.
  • What does the subfamily have in common?
      • The obvious question after studying homologous
        families, but:
      • It fails to consider carefully the nature of a
        pattern of conserved residues in a superfamily of
        sequences.
      • It is unproductive because it leads to
        inefficient use of the available data.
  • What makes the subfamily different from the rest
    of the family or superfamily?

6
Diagnosing Subfamily Differences
  • Columns in a multiple sequence alignment can be
    crudely classified into three distinct
    categories:
      • Important to the common function and structure of
        the family
          • limited variability across the entire family
      • Important to the specific function of a subfamily
        of sequences
          • likely to be limited variability within the
            subfamily
          • indeterminate variability outside the subfamily
          • residues within the subfamily differ from those
            outside the subfamily
      • Mainly scaffolding or filler: residue identity
        is not critical to either the family or subfamily
          • highly variable within the family and subfamily
          • not different between subfamilies and the entire
            family
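The three column categories can be made concrete with a crude counting rule. The sketch below is a hypothetical illustration, not the authors' implementation: `classify_column` and both thresholds are assumptions, since the slides give no numeric cut-offs. It labels each column of the five-position Ala/Arg/Leu tRNA fragments from the example on slide 12, treating Ala as the subfamily of interest.

```python
# Hypothetical thresholds: the slides do not give numeric cut-offs.
MAX_FAMILY_TYPES = 1     # "limited variability across the entire family"
MAX_SUBFAMILY_TYPES = 2  # "limited variability within the subfamily"


def classify_column(column, in_subfamily):
    """Crudely assign one alignment column to one of the three categories.

    column       -- residues at this position, one per sequence
    in_subfamily -- booleans, True for sequences in the subfamily of interest
    """
    inside = {r for r, member in zip(column, in_subfamily) if member}
    outside = {r for r, member in zip(column, in_subfamily) if not member}
    if len(set(column)) <= MAX_FAMILY_TYPES:
        return "family"       # conserved across the whole family
    if len(inside) <= MAX_SUBFAMILY_TYPES and inside.isdisjoint(outside):
        return "subfamily"    # limited inside, different from outside
    return "scaffolding"      # variable everywhere; identity not critical


alignment = {
    "Ala-1": "GGGGG", "Ala-2": "GGCGC",
    "Arg-1": "GAUUA", "Arg-2": "GGACC",
    "Leu-1": "GGUAA", "Leu-2": "GCUGG", "Leu-3": "GCUCG",
}
is_ala = [name.startswith("Ala") for name in alignment]
labels = [classify_column(col, is_ala) for col in zip(*alignment.values())]
print(labels)
# ['family', 'scaffolding', 'subfamily', 'scaffolding', 'scaffolding']
```

With only two Ala sequences the subfamily threshold is never binding, so the decision rests entirely on disjointness; a real alignment would need thresholds tuned to its size.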

7
Diagnosing Subfamily Differences
  • What is the biological model?
  • Simple set theory or counting implementation.
  • Information theory implementation.
  • Apply the analysis to tRNAs:
      • Set theory results.
      • Information theory results.
  • Apply the analysis to Aldehyde Dehydrogenases and
    Glutathione S-Transferases using the information
    theory implementation.

8
Disjoint Subset Analysis
  • High scores indicate residues essential to the
    function of specific subfamilies of the family.
  • The analysis corresponds to a straightforward
    physical model.
  • Has successfully predicted the transfer RNA sites
    essential for amino acid acceptor activity.
  • Predicted previously unknown biochemistry in tRNA
    processing.

9
Disjoint Subset Analysis
  • Much more powerful than consensus analyses
  • Given a superfamily with families A, B, and C:
      • Consensus analysis asks three simple questions.
          • What is invariant in family A?
          • What is invariant in family B?
          • What is invariant in family C?
      • Disjoint subset analysis asks three complex
        questions.
          • What is unique to family A and not found in
            family B or C?
          • What is unique to family B and not found in
            family A or C?
          • What is unique to family C and not found in
            family A or B?
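The contrast between the two kinds of question can be sketched on the Ala, Arg, and Leu tRNA fragments from the example on slide 12, treating each amino acid group as one family. `invariant_positions` and `disjoint_positions` are hypothetical helper names, and positions are reported 1-based.

```python
def invariant_positions(family):
    """Consensus question: positions where every member has the same residue."""
    return [i + 1 for i, col in enumerate(zip(*family)) if len(set(col)) == 1]


def disjoint_positions(family, others):
    """Disjoint subset question: positions whose residues in `family`
    never occur at that position in any sequence of `others`."""
    positions = []
    for i, (mine, theirs) in enumerate(zip(zip(*family), zip(*others))):
        if set(mine).isdisjoint(theirs):
            positions.append(i + 1)
    return positions


families = {
    "Ala": ["GGGGG", "GGCGC"],
    "Arg": ["GAUUA", "GGACC"],
    "Leu": ["GGUAA", "GCUGG", "GCUCG"],
}

for name, members in families.items():
    # Pool every sequence that belongs to a different family.
    others = [s for other, seqs in families.items() if other != name for s in seqs]
    print(name, invariant_positions(members), disjoint_positions(members, others))
# Ala [1, 2, 4] [3]
# Arg [1] []
# Leu [1, 3] []
```

The consensus questions return positions such as 1, which every family shares and which therefore cannot tell the families apart; the disjoint question isolates position 3 for Ala.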

10
Disjoint Subset Analysis
  • Disjoint subset analysis is based on an explicit
    model of macromolecular identity determinants.
  • Biological macromolecules have two types of
    identity determinants: positive and negative.
  • Positive identity determinants mediate
    interactions with other molecules
    (macromolecules, ligands, co-factors, substrates,
    or inhibitors) that are essential to the correct
    functioning of the molecule.
  • Negative identity determinants prevent
    interactions, so-called forbidden interactions,
    with other molecules that would lead to incorrect
    functioning of the molecule, that is, carrying out
    the function of a different family of molecules.

11
Disjoint Subset Analysis
  • Disjoint subset analysis is based on an explicit
    model of macromolecular identity determinants.
  • Molecules within a homologous family have an
    overlapping set of positive identity determinants
    at the same positions within the structure and
    sequence of the molecule.
  • Paralogous subfamilies can have positive identity
    determinants at different positions within the
    molecule.
  • Paralogous subfamilies that share a necessary
    interaction will most likely share positive
    identity determinants for that interaction.
  • Individual molecules within the family may have
    completely different negative identity
    determinants for any particular forbidden
    interaction.

12
Analysis of Alanine tRNAs
s = same nucleotide as the Ala sequence being compared; d = different (discrete).

Seq      1 2 3 4 5   vs Ala-1    vs Ala-2
Ala-1    G G G G G
Ala-2    G G C G C
Arg-1    G A U U A   s d d d d   s d d d d
Arg-2    G G A C C   s s d d d   s s d d s
Leu-1    G G U A A   s s d d d   s s d d d
Leu-2    G C U G G   s d d s s   s d d s d
Leu-3    G C U C G   s d d d s   s d d d d
Total d              0 3 5 4 3   0 3 5 4 4

Aggregate the totaled ds (discrete sequences): (0,3,5,4,3) + (0,3,5,4,4) = (0,6,10,8,7). Ala position 3 (G, C) is discrete from Arg (U, A) and Leu (U, U, U) and hence completely and logically identifies Ala.
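The s/d tally above is simple enough to reproduce directly. This sketch hard-codes the seven sequences from the slide; `mark` and `d_totals` are hypothetical names, not part of any published tool.

```python
ala = {"Ala-1": "GGGGG", "Ala-2": "GGCGC"}
others = {
    "Arg-1": "GAUUA", "Arg-2": "GGACC",
    "Leu-1": "GGUAA", "Leu-2": "GCUGG", "Leu-3": "GCUCG",
}


def mark(reference, seq):
    """'s' where seq has the same nucleotide as reference, 'd' where it differs."""
    return ["s" if a == b else "d" for a, b in zip(reference, seq)]


def d_totals(reference, seqs):
    """Number of 'd' marks at each position, summed over seqs."""
    totals = [0] * len(reference)
    for seq in seqs:
        for i, m in enumerate(mark(reference, seq)):
            if m == "d":
                totals[i] += 1
    return totals


per_reference = {name: d_totals(ref, others.values()) for name, ref in ala.items()}
aggregate = [sum(column) for column in zip(*per_reference.values())]
print(per_reference)  # {'Ala-1': [0, 3, 5, 4, 3], 'Ala-2': [0, 3, 5, 4, 4]}
print(aggregate)      # [0, 6, 10, 8, 7]

# Position 3 (index 2): are the Ala nucleotides disjoint from all the others?
ala_pos3 = {seq[2] for seq in ala.values()}
other_pos3 = {seq[2] for seq in others.values()}
print(ala_pos3.isdisjoint(other_pos3))  # True
```

The largest aggregate count, 10 at position 3, falls at the same position whose nucleotide set separates Ala completely from Arg and Leu.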
13
Two Entropy Measures
Family Entropy
Group Entropy Distance

p_i: foreground residue frequency; q_i: background residue frequency
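The formulas on this slide were not carried into the transcript. A reconstruction consistent with the definitions above and with the worked Alanine calculation on slide 15 (which sums p_i log(p_i/q_i) and q_i log(q_i/p_i) terms whose values match base-2 logarithms) is given below. The form shown for the family entropy, a relative entropy of the column against the random-sequence background, is an assumption; the original slide may define it differently.

$$H_{\mathrm{family}} = \sum_i p_i \log_2 \frac{p_i}{q_i}$$

$$\mathrm{GED} = \sum_i p_i \log_2 \frac{p_i}{q_i} + \sum_i q_i \log_2 \frac{q_i}{p_i} = \sum_i (p_i - q_i) \log_2 \frac{p_i}{q_i}$$

The second form shows that GED is the symmetrized Kullback-Leibler (Jeffreys) divergence between the group and everything outside it. It is zero only when the two distributions are identical and grows without bound if any p_i or q_i is zero, so smoothed frequencies are needed in practice.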
14
Residue Frequency Data
  • Family Entropy
      • Foreground residue frequencies, p_i, are taken from
        each column of the alignment data.
      • Background residue frequencies, q_i, are taken as
        the expected values of residues in random sequences.
  • Group Entropy Distances
      • Foreground residue frequencies, p_i, are taken
        from a single column of a defined group within
        the alignment data.
      • Background residue frequencies, q_i, are taken
        from a single column of all residues outside of
        the defined group within the alignment data.
15
Group Entropy Distance: Ala tRNAs
Ala-1 G G G G G
Ala-2 G G C G C
Arg-1 G A U U A
Arg-2 G G A C C
Leu-1 G G U A A
Leu-2 G C U G G
Leu-3 G C U C G
p_i: fractions of nucleotides within the Alanine group; q_i: fractions of nucleotides in the combined Arginine and Leucine group.
p_A = 0.1, p_C = 0.4, p_G = 0.4, p_U = 0.1
q_A = 0.15, q_C = 0.05, q_G = 0.05, q_U = 0.75
$$\mathrm{GED} = 0.1\log_2\frac{0.1}{0.15} + 0.4\log_2\frac{0.4}{0.05} + 0.4\log_2\frac{0.4}{0.05} + 0.1\log_2\frac{0.1}{0.75} + 0.15\log_2\frac{0.15}{0.1} + 0.05\log_2\frac{0.05}{0.4} + 0.05\log_2\frac{0.05}{0.4} + 0.75\log_2\frac{0.75}{0.1}$$
$$\mathrm{GED} = -0.058 + 1.200 + 1.200 - 0.291 + 0.088 - 0.150 - 0.150 + 2.180 = 4.019$$
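The calculation can be checked numerically. This sketch assumes base-2 logarithms, which reproduce the individual term values on the slide; `group_entropy_distance` is a hypothetical name.

```python
import math


def group_entropy_distance(p, q):
    """Sum of p_i*log2(p_i/q_i) + q_i*log2(q_i/p_i) over residue types, in bits.

    Every p_i and q_i must be nonzero; otherwise a log ratio is undefined.
    """
    return sum(p[a] * math.log2(p[a] / q[a]) + q[a] * math.log2(q[a] / p[a]) for a in p)


# Frequencies copied from the slide.
p = {"A": 0.1, "C": 0.4, "G": 0.4, "U": 0.1}      # Alanine group
q = {"A": 0.15, "C": 0.05, "G": 0.05, "U": 0.75}  # Arginine and Leucine group

print(round(group_entropy_distance(p, q), 3))  # 4.019
```

Passing raw, unsmoothed frequencies containing zeros raises an exception here (a division by zero or a logarithm of zero), which is one practical reason the group frequencies must be smoothed before the distance is computed.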
16
Calculating Group Entropy
17
Analyzing tRNA isoacceptors
  • 67 tRNAs from Escherichia coli, near relatives,
    and its bacteriophage
  • 20 amino acid isoacceptor subfamilies
      • only one sequence in some isoacceptor subfamilies
        (Phe)
      • as many as eight sequences in the Leu and Pro
        subfamilies
  • William H. McClain, University of Wisconsin

18
(No Transcript)
19
Analysis of diagnostic sequence elements in 67
tRNAs from E. coli
20
Consensus Analysis of 3 Valine tRNAs
21
Analysis of diagnostic sequence elements in 67
tRNAs from E. coli
22
Analysis of diagnostic sequence elements in 67
tRNAs from E. coli
23
tRNA Discriminator Positions
24
Differences Among Groups of Aldehyde Dehydrogenases
  • Hugh Nicholas, Pittsburgh Supercomputing Center
  • John Hempel, University of Pittsburgh
  • John Perozich, University of Pittsburgh
  • Bi-Cheng Wang, University of Georgia
  • Ronald Lindahl, University of South Dakota

25
Relationship Among ALDH Families
26
Motif Strength and Consensus
Red: 100% conserved; Green: >90%; Blue: >80%.
Italics: functional residues.
27
Relationship Among Motifs
Rat Class 3 Aldehyde Dehydrogenase
28
Two Entropy Measures
Family Entropy
Group Entropy Distance

p_i: foreground residue frequency; q_i: background residue frequency
29
Graphical Classification of Residues
[Figure: residues plotted with axes Family Entropy and Group Entropy Distance; regions labeled Type 1 Residues, Type 2 Residues, Type 3 Residues, and Forbidden Region.]
30
Diagnostic positions for Class 3 ALDH
31
Motifs and Diagnostic Residues for all ALDH
Classes
32
Motifs and Class 3 Discriminators
33
ALDH Motif 6
Catalytic thiol (Cys)
34
ALDH Motif 8
NAD binding and specificity
35
Asp 247: A Sjögren-Larsson Mutation in Class 3 ALDH
36
Differences Among Groups of Glutathione S-Transferases
  • Hugh B. Nicholas Jr.
  • Troy Wymore
  • David W. Deerfield, II.

37
Glutathione S-Transferase
  • Detoxifies organic chemicals containing halogens
    or double bonds by addition of glutathione.
  • A subsequent processing pathway leads to excretion.
  • The catalytic residue (thiol) comes from
    glutathione.
  • Only the cytoplasmic form is presented here.
  • Classified into six groups, initially based on
    Swiss-Prot database annotation. The exact number of
    groups is still subject to debate.
  • Found in bacteria and all kinds of eukaryotes.
  • 126 sequences from the Swiss-Prot database.

38
Consensus Bootstrap Phylogeny
39
MEME ZOOPS Motifs for GST
40
MEME ZOOPS Motifs -- Rat Mu-1
41
(No Transcript)
42
Cross Entropy Group Positions
43
Group-Specific Amino Acids: Mu, Alpha, and Theta GSTs
Rat Mu1
Human Alpha1
Human Theta2