Title: Mathematical Challenges in Protein Motif Recognition
2 Mathematical Challenges in Protein Motif Recognition
Bonnie Berger, MIT
3 Approaches to Structural Motif Recognition
- Alignments
- Multiple alignments
- HMMs
- Threading
- Profile methods (1D, 3D)
- Statistical methods
4 Structural Motif Recognition
1) Collect a database of positive examples of a motif (e.g., coiled coil, beta helix).
2) Devise a method to determine whether an unknown sequence folds as the motif.
3) Verify in the lab.
5 Our Coiled-Coil Programs
- PairCoil (Berger, Wilson, Wolf, Tonchev, Milla, Kim, 1995)
  - predicts 2-stranded CCs
  - http://theory.lcs.mit.edu/paircoil
- MultiCoil (Wolf, Kim, Berger, 1997)
  - predicts 3-stranded CCs
  - http://theory.lcs.mit.edu/multicoil
- LearnCoil-Histidine Kinase (Singh, Berger, Kim, Berger, Cochran, 1998)
  - predicts CCs in histidine kinase linker domains
  - http://theory.lcs.mit.edu/learncoil
- LearnCoil-VMF (Singh, Berger, Kim, 1999)
  - predicts CCs in viral membrane fusion proteins
  - http://theory.lcs.mit.edu/learncoil-vmf
6 Long-Distance Correlations
In beta structures, amino acids that are close in the folded 3D structure may be far apart in the linear sequence.
7 Biological Importance of Beta Helices
- Surface proteins in human infectious disease
  - virulence factors (plants, too)
  - adhesins
  - toxins
  - allergens
- Amyloid fibrils (e.g., Alzheimer's, Creutzfeldt-Jakob (Mad Cow) disease)
- Potential new materials
8 What is Known
- Solved beta-helix structures
  - 12 structures in PDB in 7 different SCOP families
- Related work
  - 1D profile of pectate lyase (Heffron et al. 98)
  - HMM (e.g., HMMER)
  - Threading (e.g., 3D-PSSM)
9 Key Databases
Solved structures:
- Protein Data Bank (PDB) (100s of non-redundant structures): www.rcsb.org/pdb/
Sequence databases:
- GenBank (100s of thousands of protein sequences): www.ncbi.nlm.nih.gov/Genbank/GenbankSearch.html
- SWISS-PROT (10s of thousands of protein sequences): www.ebi.ac.uk/swissprot
10 BetaWrap Program
Bradley, Cowen, Menke, King, Berger, RECOMB 2001
- Performance
  - On PDB: no false positives, no false negatives.
  - Recognizes beta helices in PDB across SCOP families in cross-validation.
  - Recognizes many new potential beta helices.
  - Runs in linear time (5 min. on SWISS-PROT).
11 BetaWrap Program
- Histogram of protein scores for
  - beta helices not in database (12 proteins)
  - non-beta helices in PDB (1346 proteins)
12 Single Rung of a Beta Helix
14 3D Pairwise Correlations
- Stacking residues in adjacent beta-strands exhibit strong correlations.
- Residues in the T2 turn have special correlations (asparagine ladder, aliphatic stacking).
[Figure label: B1]
17 Question: how can we find these correlations, which lie a variable distance apart in sequence?
Tailspike: 63-residue turn
18 Finding Candidate Wraps
- Assume we have the correct location of a single T2 turn (fixed B2, B3).
[Figure labels: Candidate Rung, B3, T2, B2]
- Generate the 5 best-scoring candidates for the next rung.
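
The candidate-generation step can be sketched roughly as follows. This is a minimal illustration, not the BetaWrap implementation: the function name, the gap bounds `min_gap` and `max_gap`, and the caller-supplied `score_fn` are all hypothetical. Only "keep the 5 best-scoring candidates" comes from the slide.

```python
import heapq

def candidate_next_rungs(seq, b2_start, score_fn, min_gap=18, max_gap=30, k=5):
    """Return the k best (score, position) candidates for the next rung.

    b2_start is the known start of the B2 strand in the current rung.
    min_gap and max_gap bound the number of residues between successive
    rungs; both values are assumptions made for this sketch.
    """
    candidates = []
    for gap in range(min_gap, max_gap + 1):
        nxt = b2_start + gap
        if nxt >= len(seq):
            break  # ran off the end of the sequence
        candidates.append((score_fn(seq, b2_start, nxt), nxt))
    # Keep only the k highest-scoring alignments.
    return heapq.nlargest(k, candidates)

# Toy scoring function (hypothetical): prefers rungs 22 residues apart.
toy_score = lambda seq, i, j: -abs((j - i) - 22)
top5 = candidate_next_rungs("A" * 100, 0, toy_score)
```

Because the gap between rungs is scanned over a range rather than fixed, the same loop tolerates turns of varying length, which is the difficulty raised on the preceding slide.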
19 Scoring Candidate Wraps (rung-to-rung)
Similar to the probabilistic framework, plus:
- Pairwise probabilities taken from amphipathic beta (not beta helix) structures in PDB.
- Additional stacking bonuses on internal pairs.
- Incorporates a distribution on turn lengths.
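
One way to combine the three ingredients listed above into a single rung-to-rung score is a sum of log-probabilities plus a bonus term. The sketch below assumes this additive form. The probability tables, the strand length, the inward-facing positions, the `STACKERS` set, and the bonus value are hypothetical placeholders, not values from the talk.

```python
import math

# Hypothetical residues that earn a stacking bonus (assumption).
STACKERS = set("VILFN")

def rung_to_rung_score(seq, i, j, pair_prob, turn_len_prob,
                       strand_len=5, inward=(1, 3), bonus=1.0, floor=1e-4):
    """Score aligning the strand starting at i with the strand starting at j.

    pair_prob maps (residue, residue) to a pairwise probability.
    turn_len_prob maps the gap j - i to a probability.
    Unseen pairs or gaps fall back to the small `floor` probability.
    """
    # Turn-length term.
    score = math.log(turn_len_prob.get(j - i, floor))
    for k in range(strand_len):
        a, b = seq[i + k], seq[j + k]
        # Pairwise correlation term for the stacked residues.
        score += math.log(pair_prob.get((a, b), floor))
        # Extra bonus for favorable stacking at inward-facing positions.
        if k in inward and a in STACKERS and b in STACKERS:
            score += bonus
    return score

# Toy tables (made-up numbers, for illustration only).
toy_pairs = {("V", "V"): 0.5}
toy_turns = {22: 0.5}
good = rung_to_rung_score("VVVVV" + "G" * 17 + "VVVVV", 0, 22, toy_pairs, toy_turns)
bad = rung_to_rung_score("VVVVV" + "G" * 17 + "GGGGG", 0, 22, toy_pairs, toy_turns)
```

Working in log space keeps the score additive, so rung-to-rung scores can later be summed across a whole wrap.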
20 Scoring Candidate Wraps (5 rungs)
- Iterate out to 5 rungs, generating candidate wraps.
- Score each wrap:
  - sum the rung-to-rung scores
  - B1 correlations filter
  - screen for alpha-helical content
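
The iteration over rungs can be sketched as a small branching search: each partial wrap is extended by its 5 best next rungs, and the wrap score is the sum of rung-to-rung scores. This is an illustrative sketch under stated assumptions. The gap bounds and `score_fn` are hypothetical, and the B1 correlations filter and alpha-helical screen are omitted because the slides do not specify them.

```python
import heapq

def best_wrap(seq, start, score_fn, n_rungs=5, k=5, min_gap=18, max_gap=30):
    """Return (total_score, rung_positions) for the best wrap, or None.

    Starts from a known rung at `start` and extends out to n_rungs rungs,
    keeping the k best-scoring next rungs for each partial wrap.
    """
    wraps = [(0.0, [start])]
    for _ in range(n_rungs - 1):
        extended = []
        for total, rungs in wraps:
            last = rungs[-1]
            steps = [(score_fn(seq, last, last + g), last + g)
                     for g in range(min_gap, max_gap + 1)
                     if last + g < len(seq)]
            for s, nxt in heapq.nlargest(k, steps):
                # Wrap score is the running sum of rung-to-rung scores.
                extended.append((total + s, rungs + [nxt]))
        wraps = extended
    # No complete wrap exists if the sequence is too short.
    return max(wraps, default=None)

# Toy scoring function (hypothetical): prefers rungs 22 residues apart.
toy_score = lambda seq, i, j: -abs((j - i) - 22)
result = best_wrap("A" * 200, 0, toy_score)
```

With k = 5 and 5 rungs, at most 5^4 = 625 wraps are examined per starting position, so the work per position is bounded by a constant.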
21 Potential Beta Helices
- Toxins
  - Vacuolating cytotoxin from the human gastric pathogen H. pylori
  - Toxin B from the enterohemorrhagic E. coli strain O157:H7
- Allergens
  - Antigen AMB A II, major allergen from A. artemisiifolia (ragweed)
  - Major pollen allergen CRY J II, from C. japonica (Japanese cedar)
- Adhesins
  - AIDA-I, involved in diffuse adherence of diarrheagenic E. coli
- Other cell surface proteins
  - Outer membrane protein B from Rickettsia japonica
  - Putative outer membrane protein F from Chlamydia trachomatis
  - Toxin-like outer membrane protein from Helicobacter pylori
22 The Problem
- Given an amino acid residue subsequence, does it fold as a coiled coil? A beta helix?
- Very difficult:
  - peptide synthesis (1-2 months)
  - X-ray crystallization, NMR (>1 year)
  - molecular dynamics
- Our goal: predict folded structure based on a template of positive examples.
23 Collaborators
Math / CS: Mona Singh, Ethan Wolf, Phil Bradley, Lenore Cowen, Matt Menke, David Wilson, Theo Tonchev
Biologists: Peter S. Kim, Jonathan King, Andrea Cochran, James Berger, Mari Milla