Title: Protein Structural Prediction
2. Protein Structure is Hierarchical
3. Structure Determines Function
The Protein Folding Problem
- What determines structure?
  - Energy
  - Kinematics
- How can we determine structure?
  - Experimental methods
  - Computational predictions
4. Primary Structure: Sequence
- The primary structure of a protein is its amino acid sequence
5. Primary Structure: Sequence
- Twenty different amino acids have distinct shapes and properties
6. Primary Structure: Sequence
A useful mnemonic for the hydrophobic amino acids is "FAMILY VW" (Phe, Ala, Met, Ile, Leu, Tyr, Val, Trp)
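As a minimal sketch, the mnemonic can be turned into a lookup table and used to count hydrophobic residues in a sequence. The one-letter codes come straight from "FAMILY VW". The helper name `hydrophobic_fraction` is hypothetical, and treating hydrophobicity as a yes/no property is a simplification: the graded scale on the hydrophobic-effect slide is more informative.

```python
# One-letter codes spelled out by the "FAMILY VW" mnemonic
HYDROPHOBIC = {
    "F": "Phe", "A": "Ala", "M": "Met", "I": "Ile",
    "L": "Leu", "Y": "Tyr", "V": "Val", "W": "Trp",
}

def hydrophobic_fraction(sequence: str) -> float:
    """Fraction of residues in `sequence` that belong to the FAMILY VW set (hypothetical helper)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)

print(hydrophobic_fraction("MKTAYIAKQR"))
```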
7. Secondary Structure: α, β, loops
- α helices and β sheets are stabilized by hydrogen bonds between backbone carbonyl oxygen and amide hydrogen atoms
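In an α helix the hydrogen-bond pattern is regular. The backbone C=O of residue i accepts a hydrogen bond from the N-H of residue i+4. The sketch below enumerates those pairs for an idealized helix. The function name is hypothetical, and real helices often have irregular or missing H-bonds at their ends. Setting `step` to 3 or 5 gives the analogous pattern for 3-10 and π helices.

```python
from typing import List, Tuple

def helix_hbonds(start: int, end: int, step: int = 4) -> List[Tuple[int, int]]:
    """(acceptor, donor) residue pairs for an ideal helix spanning start..end inclusive.

    The C=O of residue i accepts an H-bond from the N-H of residue i + step
    (step=4 for an alpha helix, 3 for a 3-10 helix, 5 for a pi helix).
    """
    return [(i, i + step) for i in range(start, end - step + 1)]

print(helix_hbonds(1, 10))
```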
8. Secondary Structure: α helix
9. Secondary Structure: β sheet
β sheet
β bulge
10. Second-and-a-half-ary Structure: Motifs
beta helix
beta barrel
beta trefoil
11. Tertiary Structure: Domains
12. Mosaic Proteins
13. Tertiary Structure: A Protein Fold
14. Protein Folds Composed of α, β, Other
15. Quaternary Structure: Multimeric Proteins or Functional Assemblies
- Multimeric proteins
- Macromolecular assemblies
Ribosome: protein synthesis
Hemoglobin: a tetramer
Replisome: DNA copying
16. Protein Folding
- The amino-acid sequence of a protein determines its 3D fold [Anfinsen et al., 1950s]
  - Some exceptions:
    - All proteins can be denatured
    - Some proteins have multiple conformations
    - Some proteins get folding help from chaperones
- The function of a protein is determined by its 3D fold
- Can we predict the 3D fold of a protein given its amino-acid sequence?
17. The Levinthal Paradox
- Given a small protein (100 aa), assume 3 possible conformations per peptide bond: $3^{100} \approx 5 \times 10^{47}$ conformations
- The fastest motions take $10^{-15}$ s, so sampling all conformations would take $5 \times 10^{32}$ s
  - $60 \times 60 \times 24 \times 365 = 31{,}536{,}000$ seconds in a year
- Sampling all conformations would therefore take $1.6 \times 10^{25}$ years
- Yet each protein folds quickly into a single stable native conformation: this is the Levinthal paradox
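The arithmetic on this slide can be reproduced in a few lines. This sketch uses the slide's own assumptions: 3 states per residue, 100 residues, one conformation sampled per $10^{-15}$ s, and 365-day years. The constant names are illustrative.

```python
STATES_PER_RESIDUE = 3        # slide's assumption: 3 conformations per peptide bond
N_RESIDUES = 100              # a "small" protein
SECONDS_PER_SAMPLE = 1e-15    # fastest molecular motions
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

conformations = STATES_PER_RESIDUE ** N_RESIDUES   # exact integer, ~5e47
seconds = conformations * SECONDS_PER_SAMPLE       # time to visit every conformation once
years = seconds / SECONDS_PER_YEAR

print(f"{conformations:.2e} conformations")
print(f"{seconds:.2e} s = {years:.2e} years")
```

The printed values match the slide's order-of-magnitude estimates.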
18. Quick Overview of Energy
Bond | Strength (kcal/mol)
H-bonds | 3-7
Ionic bonds | 10
Hydrophobic interactions | 1-2
Van der Waals interactions | 1
Disulfide bridge | 51
19. The Hydrophobic Effect
- Important for folding, because every amino acid participates!
Experimentally Determined Hydrophobicity Levels
Amino acid | Hydrophobicity
Trp | 2.25
Ile | 1.80
Phe | 1.79
Leu | 1.70
Cys | 1.54
Met | 1.23
Val | 1.22
Tyr | 0.96
Pro | 0.72
Ala | 0.31
Thr | 0.26
His | 0.13
Gly | 0.00
Ser | -0.04
Gln | -0.22
Asn | -0.60
Glu | -0.64
Asp | -0.77
Lys | -0.99
Arg | -1.01
Fauchère and Pliska (1983). Eur. J. Med. Chem. 18, 369-375.
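A scale like this is often used to build a sliding-window hydrophobicity profile along a sequence. The sketch below uses the 20 values from the table above, keyed by one-letter code. The default window length of 5 and the function names are assumptions chosen for illustration; they are not part of the original slide.

```python
from typing import List

# Hydrophobicity values from the table above (Fauchere and Pliska, 1983), keyed by one-letter code
FAUCHERE_PLISKA = {
    "W": 2.25, "I": 1.80, "F": 1.79, "L": 1.70, "C": 1.54,
    "M": 1.23, "V": 1.22, "Y": 0.96, "P": 0.72, "A": 0.31,
    "T": 0.26, "H": 0.13, "G": 0.00, "S": -0.04, "Q": -0.22,
    "N": -0.60, "E": -0.64, "D": -0.77, "K": -0.99, "R": -1.01,
}

def mean_hydrophobicity(segment: str) -> float:
    """Average scale value over a segment (raises KeyError on non-standard residues)."""
    if not segment:
        raise ValueError("empty segment")
    return sum(FAUCHERE_PLISKA[aa] for aa in segment.upper()) / len(segment)

def window_profile(sequence: str, window: int = 5) -> List[float]:
    """Mean hydrophobicity of every length-`window` stretch, left to right."""
    return [mean_hydrophobicity(sequence[i:i + window])
            for i in range(len(sequence) - window + 1)]

print(window_profile("MKTAYIAKQR"))
```

Peaks in the profile mark stretches likely to be buried in the protein core (or in a membrane). A longer window smooths the profile at the cost of resolution.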
20. Protein Structure Determination
- Experimental
  - X-ray crystallography
  - NMR spectroscopy
- Computational structure prediction
  - (The Holy Grail)
  - Sequence implies structure, so in principle we can predict the structure from the sequence alone
21. Protein Structure Prediction
- Ab initio
  - Use just first principles: energy, geometry, and kinematics
- Homology
  - Find the best match to a database of sequences with known 3D structure
- Threading
- Meta-servers and other methods
22. Ab initio Prediction
- Sampling the global conformation space
  - Lattice models / discrete-state models
  - Molecular dynamics
  - Pre-set libraries of fragment 3D motifs
- Picking native conformations with an energy function
  - Solvation model: how the protein interacts with water
  - Pair interactions between amino acids
- Predicting secondary structure
- Local homology
- Fragment libraries
23. Lattice String Folding
- HP model: the main modeled force is hydrophobic attraction
  - NP-hard on both the 2-D square and 3-D cubic lattices
  - Constant-factor approximation algorithms exist
  - Not very relevant biologically
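Below is a minimal sketch of the HP model's energy function on the 2-D square lattice. It assumes the usual convention that each pair of H residues on adjacent lattice sites, but not adjacent in the chain, contributes -1. The conformation is encoded as a hypothetical string of moves (U/D/L/R). Scoring a single conformation is easy; the NP-hardness lies in searching for the minimum-energy conformation.

```python
from typing import Dict, List, Tuple

MOVES: Dict[str, Tuple[int, int]] = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def walk_coordinates(moves: str) -> List[Tuple[int, int]]:
    """Turn a string of moves into lattice coordinates, starting at the origin."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    return coords

def hp_energy(sequence: str, moves: str) -> int:
    """-1 for every H-H pair on neighbouring lattice sites that are not chain neighbours."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly len(sequence) - 1 moves")
    coords = walk_coordinates(moves)
    position = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES.values():
            j = position.get((x + dx, y + dy))
            # j > i + 1 skips the chain neighbour and counts each contact only once
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy

print(hp_energy("HPPH", "RUL"))  # the two H ends meet around a square
```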
24. Lattice String Folding
25. ROSETTA
http://www.bioinfo.rpi.edu/bystrc/hmmstr/server.php
http://depts.washington.edu/bakerpg/papers/Bonneau-ARBBS-v30-p173.pdf
- Monte Carlo based method
- Limits the conformational search space by using a sequence-structure motif library, I-Sites (http://isites.bio.rpi.edu/Isites/)
  - 261 patterns in the library
  - Certain positions in a motif favor certain residues
- Remove all sequences with <25% identity
- Find the structures of the 25 nearest sequence neighbors of each 9-mer
- Rationale
  - Local structures often fold independently of the full protein
  - Large areas of a protein can be predicted by matching its sequence to I-Sites
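The 9-mer step can be sketched as follows. Cut the query into overlapping 9-residue windows, then keep the closest fragments from a library for each window. This is a simplified illustration. The toy library holds only sequence strings, and ranking by the number of identical positions is an assumed stand-in for the more sophisticated scoring used in practice. The neighbor count of 25 follows the slide; all names are hypothetical.

```python
from typing import Dict, List, Tuple

def windows(sequence: str, k: int = 9) -> List[Tuple[int, str]]:
    """All overlapping k-residue windows as (start index, fragment)."""
    return [(i, sequence[i:i + k]) for i in range(len(sequence) - k + 1)]

def identity(a: str, b: str) -> int:
    """Number of positions where two equal-length fragments agree."""
    return sum(x == y for x, y in zip(a, b))

def nearest_fragments(sequence: str, library: List[str], k: int = 9,
                      n_neighbors: int = 25) -> Dict[int, List[str]]:
    """For each window, the n_neighbors library fragments with the highest identity."""
    picks: Dict[int, List[str]] = {}
    for start, query in windows(sequence, k):
        # sorted() is stable, so ties keep their library order
        ranked = sorted(library, key=lambda frag: identity(query, frag), reverse=True)
        picks[start] = ranked[:n_neighbors]
    return picks

toy_library = ["AAAAAAAAA", "CCCCCCCCC", "AAAACCCCC"]
print(nearest_fragments("AAAAAAAAAC", toy_library, n_neighbors=2))
```

In the real method, each picked fragment carries backbone coordinates. Those coordinates, not the sequence strings, are what get inserted into the model during the search.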
26. I-Sites Examples
- Non-polar helix
  - Abundance of alanine at all positions
  - Non-polar side chains favored at positions 3, 6, 10 (methionine, leucine, isoleucine)
- Amphipathic helix
  - Non-polar side chains favored at positions 6, 9, 13, 16 (methionine, leucine, isoleucine)
  - Polar side chains favored at positions 1, 8, 11, 18 (glutamic acid, lysine)
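A helical wheel shows why those positions form one face of the helix. In an ideal α helix, each residue advances about 100° around the axis (3.6 residues per turn). The sketch below assumes that ideal geometry and numbers positions as on the slide. It projects the amphipathic helix's positions onto the wheel and measures the arc each group occupies. The function names are hypothetical.

```python
from typing import Iterable, List

DEGREES_PER_RESIDUE = 100.0  # ideal alpha helix: 3.6 residues per 360-degree turn

def wheel_angles(positions: Iterable[int]) -> List[float]:
    """Angular position of each residue on the helical wheel, in [0, 360)."""
    return [(p * DEGREES_PER_RESIDUE) % 360.0 for p in positions]

def arc_width(angles: List[float]) -> float:
    """Width in degrees of the smallest arc that contains all the angles."""
    a = sorted(angles)
    gaps = [b - x for x, b in zip(a, a[1:])]
    gaps.append(a[0] + 360.0 - a[-1])  # wrap-around gap
    return 360.0 - max(gaps)

nonpolar = wheel_angles([6, 9, 13, 16])
polar = wheel_angles([1, 8, 11, 18])
print(arc_width(nonpolar), arc_width(polar))
```

For the amphipathic example, the non-polar positions fall within an 80° arc (160° to 240°) and the polar positions within a 100° arc (0° to 100°). The two groups therefore sit on different faces of the helix.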
27. ROSETTA Method
- New structures are generated by swapping compatible fragments
- Accepted structures are clustered based on energy and structural size
- The best cluster is the one with the greatest number of conformations within 4 Å RMSD of the cluster center
- Representative structures from each of the five best clusters are returned to the user as predictions
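The clustering step can be sketched as a greedy procedure over a precomputed matrix of pairwise RMSDs. Pick the structure with the most neighbors within 4 Å, make it a cluster, remove its members, and repeat up to five times. This is an assumption-laden simplification: it ignores the energy criterion mentioned on the slide. The greedy scheme and the function names are illustrative, not Rosetta's actual code.

```python
from typing import List, Sequence, Set, Tuple

def neighbours(i: int, candidates: Set[int],
               rmsd: Sequence[Sequence[float]], cutoff: float) -> Set[int]:
    """Candidates within `cutoff` RMSD of structure i (including i itself)."""
    return {j for j in candidates if rmsd[i][j] <= cutoff}

def greedy_clusters(rmsd: Sequence[Sequence[float]], cutoff: float = 4.0,
                    max_clusters: int = 5) -> List[Tuple[int, List[int]]]:
    """Return (center, members) pairs, largest cluster first; assumes rmsd[i][i] == 0."""
    remaining = set(range(len(rmsd)))
    clusters: List[Tuple[int, List[int]]] = []
    while remaining and len(clusters) < max_clusters:
        # ties go to the lowest index because max() keeps the first maximum it sees
        center = max(sorted(remaining),
                     key=lambda i: len(neighbours(i, remaining, rmsd, cutoff)))
        members = neighbours(center, remaining, rmsd, cutoff)
        clusters.append((center, sorted(members)))
        remaining -= members
    return clusters

toy_rmsd = [[0, 1, 2, 9], [1, 0, 1.5, 9], [2, 1.5, 0, 9], [9, 9, 9, 0]]
print(greedy_clusters(toy_rmsd))
```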
28. Robetta Rosetta
30. Rosetta results in CASP
31. Rosetta Results
- In CASP4, Rosetta's best models ranged from 6-10 Å Cα RMSD
- For comparison, good comparative models give 2-5 Å Cα RMSD
- Most effective with small proteins (<100 residues) and structures with helices
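Cα RMSD is normally computed after optimally superimposing the two structures. Below is a minimal sketch of the Kabsch algorithm (SVD of the covariance matrix) using NumPy. It assumes the two coordinate arrays list matching Cα atoms in the same order. The function name is hypothetical.

```python
import numpy as np

def kabsch_rmsd(P, Q) -> float:
    """RMSD between two N x 3 coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.ndim != 2 or P.shape != Q.shape or P.shape[1] != 3:
        raise ValueError("need two N x 3 arrays of matching atoms")
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # flip the axis of the smallest singular value if needed so R is a proper rotation
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))

points = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
rotated = [[-y + 5, x + 5, z + 5] for x, y, z in points]  # 90-degree turn plus a shift
print(kabsch_rmsd(points, rotated))
```

The determinant check keeps the superposition a proper rotation. Without it, a mirror image of a chiral structure could be reported as a perfect match.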
32. Only a few folds are found in nature
33. The SCOP Database
- Structural Classification Of Proteins
- FAMILY: proteins that are >30% similar, or >15% similar and have similar known structure/function
- SUPERFAMILY: proteins whose families have some sequence and function/structure similarity, suggesting a common evolutionary origin
- COMMON FOLD: superfamilies that have the same secondary structures in the same arrangement, probably resulting from physics and chemistry
- CLASS: alpha, beta, alpha/beta, alpha+beta, multidomain
34. Status of Protein Databases
PDB
SCOP: Structural Classification of Proteins, release 1.67. 24,037 PDB entries (15 May 2004), 65,122 domains.
Class | Number of folds | Number of superfamilies | Number of families
All alpha proteins | 202 | 342 | 550
All beta proteins | 141 | 280 | 529
Alpha and beta proteins (α/β) | 130 | 213 | 593
Alpha and beta proteins (α+β) | 260 | 386 | 650
Multi-domain proteins | 40 | 40 | 55
Membrane and cell surface proteins | 42 | 82 | 91
Small proteins | 71 | 104 | 162
Total | 887 | 1447 | 2630
EMBL
35. Evolution of Protein Domains
- The numbers of members in different families obey a power law
- 429 families are common to all 14 eukaryotes
  - 80% of animal domains, 90% of fungal domains
- 80% of proteins in eukaryotes are multidomain
- Domains usually combine pairwise in the same order. Why?
Chothia, Gough, Vogel, Teichmann, Science 300:1701-1703, 2003
Evolution of proteins happens mainly through duplication, recombination, and divergence
36. Homology-based Prediction
- Align the query sequence with sequences of known structure, usually >30% similar
- Superimpose the aligned sequence onto the structure template, according to the computed sequence alignment
- Perform local refinement of the resulting structure in 3D
The number of unique structural folds is small (possibly a few thousand)
90% of new structures submitted to PDB in the past three years have similar folds in PDB
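The first step of homology-based prediction, aligning the query with a template sequence, can be illustrated with a basic Needleman-Wunsch global alignment. This sketch assumes a simple scoring scheme: match +1, mismatch -1, linear gap -2. Real protein alignment typically uses substitution matrices and affine gap penalties instead. Percent identity is computed as identical columns divided by alignment length, which is only one of several conventions. Function names are hypothetical.

```python
from typing import Tuple

def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[str, str]:
    """Global alignment of a and b; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a
    # Trace back from the bottom-right corner, preferring the diagonal on ties
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical, non-gap columns as a percentage of alignment length."""
    identical = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * identical / len(aln_a)

query, template = needleman_wunsch("ACDEF", "ACEF")
print(query, template, percent_identity(query, template))
```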
37. Examples of Fold Classes
38. Homology-based Prediction
39. Homology-based Prediction