Title: TMpro: Transmembrane Helix Prediction using Amino Acid Properties and Latent Semantic Analysis
1. TMpro: Transmembrane Helix Prediction using Amino Acid Properties and Latent Semantic Analysis
- Madhavi Ganapathiraju, N. Balakrishnan, Raj Reddy and Judith Klein-Seetharaman
- Carnegie Mellon University
6th International Conference on Bioinformatics, Hong Kong, PR China, August 29th, 2007
2. Outline
- Introduction
- Membrane proteins
- Transmembrane helix prediction
- Previous methods
- Drawbacks
- Amino acid properties
- Approach
- Algorithm
- Features and models
- Evaluations
- Web server
Introduction
Properties
Approach
Algorithm
Web Server
Previous Methods
3. Membrane Proteins
Embedded in the cell / organelle membrane
[Figure labels: Membrane Protein, Cell Membrane, Soluble Protein]
- Important class of proteins
- Carry out many important functions
- Provide access to cell for drug targeting
4. Transmembrane Segment Characteristics
- Transmembrane
- 30Å hydrophobic core
- A helix has to be 19 residues long to go from one side to the other
[Figure, side view: Cytoplasm (aqueous medium) on one side of the membrane, Extracellular (aqueous medium) on the other]
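The 19-residue figure can be checked with simple arithmetic. The sketch below assumes the textbook rise of about 1.5 Å per residue for an ideal alpha-helix, a value that is not stated on the slide; the constant and function names are illustrative only.

```python
import math

# Assumption (not from the slide): an ideal alpha-helix rises ~1.5 Å per residue.
RISE_PER_RESIDUE_A = 1.5

def helix_span(n_residues, rise=RISE_PER_RESIDUE_A):
    """Length in Å covered by a straight helix of n_residues."""
    return n_residues * rise

def residues_to_span(thickness_a, rise=RISE_PER_RESIDUE_A):
    """Smallest residue count whose helix length reaches thickness_a."""
    return math.ceil(thickness_a / rise)

print(helix_span(19))        # 28.5
print(residues_to_span(30))  # 20
```

Under this assumption 19 residues cover 28.5 Å, which is close to the 30 Å core; the exact count depends on the rise value used and on how much the helix tilts in the membrane.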
- Questions to be addressed by prediction algorithm
- How many transmembrane segments are there?
- Where are the transmembrane segments located in the primary sequence?
5. Transmembrane Helix Prediction
- Important for determining
- protein family
- structure and function
- regions accessible from extracellular side
- Challenges
- Little available training data
- Overtraining
- Difficulty in discovery of novel architectures
6. Hydrophobicity scale
Kyte-Doolittle hydrophobicity profile
KD scale, GES scale, WW scale
9-residue window average hydrophobicity
Limitations: segment boundary unclear, low accuracy
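The windowed hydrophobicity profile can be sketched as follows. The scale values are the published Kyte-Doolittle hydropathy indices; the function names and the cutoff in `candidate_positions` are hypothetical and chosen only for illustration. The example sequence is the one shown later in this talk.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=9):
    """Mean KD hydropathy for every full window of the sequence."""
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

def candidate_positions(profile, threshold=1.6):
    """Indices of windows whose average exceeds a hypothetical cutoff."""
    return [i for i, value in enumerate(profile) if value > threshold]

seq = "VQLAHHFSEPEITLIIFGVMAGVIGTILLISYGIRRLIKK"
profile = hydropathy_profile(seq)
print(len(profile))  # 32
```

A threshold on a smoothed profile marks hydrophobic stretches but not where each segment starts and ends, which is the boundary limitation noted on the slide.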
7. Current best methods use HMMs
Hidden Markov Model Methods (TMHMM)
[Figure: potassium channel, actual vs. predicted]
Limitations: too many parameters, restrictive topology
8. TMpro: property-based algorithm for transmembrane helix prediction
9. Opportunities for Improvement
Amino acid properties
Nonpolar residues
Charged Residues
Aromatic Residues
- Previous methods
- Do not employ all possible property distributions
- Find average occurrences of amino acids
10. Properties We Studied
11. Modified Representation of Primary Sequence
Amino Acid Property Sequences
Charge
Polarity
Aromaticity
Size
Electronic properties
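Re-encoding a primary sequence as property sequences can be sketched as below. The residue groupings (for example, counting histidine as positively charged) and the one-character symbols are assumptions made for illustration; the slide does not list the assignments used in TMpro, and only two of the five properties are shown.

```python
def encode(seq, groups, default):
    """Replace each residue by the symbol of the first group containing it."""
    out = []
    for aa in seq:
        for symbol, members in groups:
            if aa in members:
                out.append(symbol)
                break
        else:
            # Residue belongs to none of the listed groups.
            out.append(default)
    return "".join(out)

# Assumed groupings, not taken from the slide.
CHARGE = [("p", set("KRH")), ("n", set("DE"))]          # default "-" = uncharged
AROMATICITY = [("R", set("FWY")), ("-", set("AVLI"))]   # default "." = neutral

seq = "GIRRLIKK"
print(encode(seq, CHARGE, "-"))       # --pp--pp
print(encode(seq, AROMATICITY, "."))  # .-..--..
```

Each property yields its own sequence over a small alphabet, so one protein is represented by several parallel strings rather than a single 20-letter string.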
12. Predictive Capability of Each Property
- Adjust parameters of TMHMM (v 1.0)
- To make it emit one of the property values
- Properties considered
- Polarity: polar, non-polar
- Aromaticity: aromatic, aliphatic, neutral
- Electronic properties: strong donor, weak donor, neutral, weak acceptor, strong acceptor
3-valued property observations achieve 91% of the accuracy of 20-valued amino acid observations
13. Approach
Biology-Language Analogy
Ganapathiraju et al. (2004), LNCS 3345
14. Text Domain Equivalent
Documents and Words
Documents: 15-residue windows
VQLAHHFSEPEITLIIFGVMAGVIGTILLISYGIRRLIKK
----ppn-n-n---- -p--pp-p----p-- -.-.RRR....-.-- OOO.OOO.O.OOoOO
W1: positively charged
W2: polar
W3: nonpolar
W4: aromatic
W5: aliphatic
W6: strong electron acceptor
W7: strong electron donor
W8: weak electron acceptor
W9: weak electron donor
W10: medium sized
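The document and word analogy can be sketched as a count matrix built from sequence windows. Only the word names come from the slide; the residue sets behind each word, the restriction to three words, and the one-residue step between windows are assumptions for illustration.

```python
# Hypothetical word definitions: the residue sets are assumptions,
# only the word names come from the slide.
WORDS = {
    "W1 positively charged": set("KRH"),
    "W4 aromatic": set("FWY"),
    "W5 aliphatic": set("AVLI"),
}

def windows(seq, size=15, step=1):
    """Yield every full window ("document") of the sequence."""
    for start in range(0, len(seq) - size + 1, step):
        yield seq[start:start + size]

def word_document_matrix(seq, size=15, step=1):
    """Rows are words, columns are windows, entries are word counts."""
    docs = list(windows(seq, size, step))
    return [
        [sum(aa in members for aa in doc) for doc in docs]
        for members in WORDS.values()
    ]

seq = "VQLAHHFSEPEITLIIFGVMAGVIGTILLISYGIRRLIKK"
matrix = word_document_matrix(seq)
print(len(matrix), len(matrix[0]))  # 3 26
```

Each column describes one window by how often each property word occurs in it, which is the same shape of data that text analysis uses for documents and vocabulary terms.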
15. Latent Semantic Analysis
Build Word-Document Matrix W (axes: Words, Documents)
W = U S V^T
For classification, the feature vectors S V^T can be used
Reduced dimensions: 4
Distinct features of TM and nonTM achieved
[Figure: TM and nonTM documents plotted along Dimension 1 and Dimension 2]
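The decomposition W = U S V^T and the reduced S V^T features can be sketched with NumPy. This is a minimal illustration on random toy counts, assuming a words-by-documents layout for W; it is not the TMpro implementation, and the function name is hypothetical.

```python
import numpy as np

def lsa_document_features(W, k=4):
    """Return the k-dimensional S V^T representation of each document.

    W is a words-by-documents count matrix; column j of the result is
    the reduced feature vector of document j.
    """
    U, s, Vt = np.linalg.svd(np.asarray(W, dtype=float), full_matrices=False)
    # Keep only the k largest singular values and their right singular vectors.
    return np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
W = rng.integers(0, 6, size=(10, 30))   # toy counts, not real data
features = lsa_document_features(W)
print(features.shape)  # (4, 30)
```

Truncating to the leading dimensions gives each window a short dense vector, which is what a downstream classifier would take as input.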
16. Different Classifiers/Models
- Support vector machines
- Neural networks
- Linear classifier
- Hidden Markov modeling
- Decision trees
Neural network with LSA features is called TMpro
17. Evaluations
Uses evolutionary information and many more model
parameters
Benchmark Server Results: http://cubic.bioc.columbia.edu/services/tmh_benchmark/
Evaluation on larger datasets
18. TMpro Web Interface
http://linzer.blm.cs.cmu.edu/tmpro/
Novel features for manual annotation
19. Acknowledgements
- Co-authors
- Judith Klein-Seetharaman
- Raj Reddy
- N. Balakrishnan
- Web-site Development
- Christopher Jon Jursa
- Hassan A. Karimi
21. Larger training data does not improve TMHMM
STMHMM is TMHMM trained with 145 recent TM proteins
22. Performance on Recent Large Dataset