Title: Protein Conformation Prediction (Part II)
1. Protein Conformation Prediction (Part II)
2. Review: two folding models
- Framework model
  - Secondary structure forms first
  - Secondary structure segments then assemble
- Hydrophobic collapse
  - Molten state: compact but denatured
  - Secondary structure forms after the chain settles in
  - van der Waals forces and hydrogen bonds require close proximity
3. Two approaches
- De novo (or ab initio)
  - From the beginning, or from first principles
- Comparative/homology based
  - Sequence similarity
4. Homology based
- Find a similar protein of known structure
- Structure should be similar
5. How
- Know the phi and psi angles of the similar protein
- Can apply those same angles
- Known as threading
6. Threading issues
- What are the chances that the two lengths will be the same?
- Where to put the longer portions?
- Where to put the gaps?
- Once again, multiple sequence alignments (MSAs)
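A minimal sketch of the threading idea from slides 5 and 6, assuming a pairwise alignment is already available and that gaps are written as `-`. The function name `thread_angles`, the toy sequences, and the angle values are all invented for illustration. Target residues that face a gap in the template receive no angles, which is exactly the "where to put the longer portions" problem.

```python
from typing import List, Optional, Tuple

Angles = Tuple[float, float]  # (phi, psi) in degrees


def thread_angles(aligned_target: str,
                  aligned_template: str,
                  template_angles: List[Angles]) -> List[Optional[Angles]]:
    """Copy template (phi, psi) pairs onto target residues via an alignment.

    Both aligned strings must have equal length and use '-' for gaps.
    Target residues aligned to a template gap get None (nothing to copy).
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    result: List[Optional[Angles]] = []
    t_idx = 0  # index into template_angles (ungapped template positions)
    for tgt, tpl in zip(aligned_target, aligned_template):
        if tgt != "-" and tpl != "-":
            result.append(template_angles[t_idx])  # aligned pair: copy angles
        elif tgt != "-":
            result.append(None)  # insertion in the target: no template angles
        # tgt == '-' means a template residue has no partner and is skipped
        if tpl != "-":
            t_idx += 1
    return result


# Toy data (hypothetical sequences and angles)
template_angles = [(-60.0, -45.0), (-62.0, -41.0), (-120.0, 130.0), (-65.0, -40.0)]
threaded = thread_angles("MKA-L", "M-AVL", template_angles)
for residue, angles in zip("MKAL", threaded):
    print(residue, angles)
```

The `None` entries mark the loop regions that a real method would still have to model some other way.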
7. Popular homology-based approach
- 3D-PSSM (threading server)
- Remember PSSMs?
  - Position-specific scoring matrix
  - Profiles
- 3D-PSSM performs MSAs but augments them with additional 3D alignments
  - Aligning known 3D conformations in three dimensions
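As a reminder of what a PSSM (profile) holds, here is a small sketch that builds log-odds columns from a toy multiple sequence alignment. It assumes a uniform background distribution and a simple pseudocount, which are simplifications chosen for this example; `build_pssm`, `score_sequence`, and the toy alignment are hypothetical and do not reflect how the 3D-PSSM server itself is implemented.

```python
import math
from collections import Counter
from typing import Dict, List


def build_pssm(msa: List[str], alphabet: str,
               pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Return one log2-odds column per alignment position.

    Assumes a uniform background over `alphabet`; gaps ('-') are not counted.
    """
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(alphabet)
    pssm = []
    for col in range(length):
        counts = Counter(row[col] for row in msa if row[col] != "-")
        total = sum(counts.values()) + pseudocount * len(alphabet)
        column = {}
        for aa in alphabet:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score_sequence(pssm: List[Dict[str, float]], seq: str) -> float:
    """Sum the per-position scores of an ungapped sequence."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


toy_msa = ["ACD", "ACE", "AGD", "A-D"]  # invented alignment
pssm = build_pssm(toy_msa, alphabet="ACDEG")
print(score_sequence(pssm, "ACD"), score_sequence(pssm, "GEA"))
```

A sequence that matches the conserved columns scores positive, while one that disagrees everywhere scores negative.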
8. De novo approaches
- Molecular dynamics
  - Summation of all forces exerted at all locations simultaneously
  - Computationally intensive
  - We do not fully understand forces such as hydrophobic avoidance of solvent
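To make the cost of "summing all forces at all locations" concrete, the sketch below sums pairwise Lennard-Jones forces, a standard textbook stand-in for van der Waals interactions. It is only an illustration: real force fields add bonded, electrostatic, and solvent terms, and the function name and reduced units (epsilon = sigma = 1) are assumptions of this example.

```python
from typing import List, Sequence


def lennard_jones_forces(positions: Sequence[Sequence[float]],
                         epsilon: float = 1.0,
                         sigma: float = 1.0) -> List[List[float]]:
    """Sum pairwise Lennard-Jones forces on every particle (O(N^2) pairs).

    U(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6); the force on i from j is
    24*eps*(2*(sigma/r)**12 - (sigma/r)**6) / r**2 * (r_i - r_j).
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            sr6 = (sigma * sigma / r2) ** 3
            magnitude = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                forces[i][k] += magnitude * d[k]  # force on i
                forces[j][k] -= magnitude * d[k]  # equal and opposite on j
    return forces


forces = lennard_jones_forces([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0)])
for f in forces:
    print(f)
```

The double loop visits every pair at every time step, which is why the slide calls the approach computationally intensive.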
9. Middle ground
- Secondary structure prediction
  - Accuracy in the mid to upper 70s (percent)
- Work the loops to fold the secondary structures into an energetically optimal conformation
10. One approach
- See how often amino acids (aas) show up at specific positions in secondary structure
- Chou-Fasman
  - Empirical parameters for α, β, and β-turns
  - P(α, aa) = f(aa) / ave(α)
  - P(β, aa) = f(aa) / ave(β)
  - P(β-turn, aa) = f(aa) / ave(β-turn)

  Name            P(α)   P(β)   P(turn)
  Alanine         142     83     66
  Arginine         98     93     95
  Aspartic acid   101     54    146
  ...
  Valine          106    170     50
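A small sketch of how such propensities can be computed from annotated sequences. It assumes that f(aa) is the fraction of a residue type found in the given state and that the average is the overall fraction of residues in that state, then scales by 100 to match the table. The function name and the toy data are invented for illustration.

```python
from collections import Counter
from typing import Dict


def propensities(sequence: str, states: str, state: str) -> Dict[str, float]:
    """P(state, aa) = 100 * f(state, aa) / ave(state).

    f(state, aa): fraction of residues of type aa observed in `state`.
    ave(state):   fraction of all residues observed in `state`.
    """
    if len(sequence) != len(states):
        raise ValueError("sequence and state string must have the same length")
    total = Counter(sequence)
    in_state = Counter(aa for aa, s in zip(sequence, states) if s == state)
    average = sum(in_state.values()) / len(sequence)
    if average == 0:
        raise ValueError("state never occurs in the training data")
    return {aa: 100.0 * (in_state[aa] / total[aa]) / average for aa in total}


# Toy training data (hypothetical): H = helix, E = strand, - = other
toy_sequence = "AAAAVVVVGG"
toy_states = "HHHHHEEE--"
helix_p = propensities(toy_sequence, toy_states, "H")
print(helix_p)
```

Values above 100 mean the residue is found in that state more often than average, values below 100 mean less often.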
11. Algorithm
- Identify regions where 4 out of 6 contiguous residues (3 out of 5 for β) have P(α-helix) > 100
- Extend the helix in both directions until reaching a set of four contiguous residues with an average P(α-helix) < 100

  1 MAKYNEKKEK KRIAKERIDI LFSLAERVFP YSPELAKRYV ELALLVQQKA
    HHHHH HHHHHHHHHH H HHHHHHHH HHHHHHHHHH
 51 KVKIPRKWKR RYCKKCHAFL VPGINARVRL RQKRMPHIVV KCLECGHIMR
    T SSTTTT SB TTT B BTTTEEEEE E SSS EEEE EETTTTEEEE
101 YPYIKEIKKR RKEKMEYGGL VPR
    EE
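The two steps of slide 11 can be sketched as follows. Only the four P(α) values listed on slide 10 are used, so the toy sequence is restricted to A, R, D, and V. The extension rule is one interpretation of the slide: the helix grows by one residue as long as the four-residue window at that end, including the new residue, averages at least 100. All function names are hypothetical.

```python
from typing import List, Tuple

# P(alpha) values taken from the slide table; other residues are not covered here
P_ALPHA = {"A": 142, "R": 98, "D": 101, "V": 106}


def find_helix_nuclei(seq: str, window: int = 6, needed: int = 4,
                      threshold: float = 100) -> List[Tuple[int, int]]:
    """Return [start, end) windows where at least `needed` residues exceed threshold."""
    nuclei = []
    for start in range(len(seq) - window + 1):
        hits = sum(1 for aa in seq[start:start + window] if P_ALPHA[aa] > threshold)
        if hits >= needed:
            nuclei.append((start, start + window))
    return nuclei


def extend_helix(seq: str, start: int, end: int,
                 threshold: float = 100) -> Tuple[int, int]:
    """Grow [start, end) while the terminal 4-residue window averages >= threshold."""
    def average(lo: int, hi: int) -> float:
        return sum(P_ALPHA[aa] for aa in seq[lo:hi]) / (hi - lo)

    while start > 0 and average(start - 1, start + 3) >= threshold:
        start -= 1
    while end < len(seq) and average(end - 3, end + 1) >= threshold:
        end += 1
    return start, end


toy = "RRRRAAVDAARRRR"  # invented sequence
nuclei = find_helix_nuclei(toy)
helix = extend_helix(toy, *nuclei[0])
print(nuclei[0], helix)
```

A full implementation would also need the β rule (3 out of 5) and a way to resolve regions claimed by both helix and strand, which this sketch leaves out.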
12. For turns
- Chou and Fasman also determined turn frequencies
- Most hairpins are three in length
- Predict a turn when p(β-turn) = f(i) × f(i+1) × f(i+2) × f(i+3) is greater than P(α) or P(β)

  Name            P(α)  P(β)  P(turn)  f(i)    f(i+1)  f(i+2)  f(i+3)
  Alanine         142    83    66      0.06    0.076   0.035   0.058
  Arginine         98    93    95      0.070   0.106   0.099   0.085
  Aspartic acid   101    54   146      0.147   0.110   0.179   0.081
  ...
  Valine          106   170    50      0.062   0.048   0.028   0.053
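The positional product on slide 12 can be sketched with the four rows of the table. The code only computes p(β-turn) for each four-residue window and reports the best one; how that product is then compared against the helix and strand parameters is left out, since the slide does not spell it out. Function names and the toy sequence are invented.

```python
from typing import Tuple

# Positional turn frequencies f(i), f(i+1), f(i+2), f(i+3) from the slide table
TURN_F = {
    "A": (0.060, 0.076, 0.035, 0.058),
    "R": (0.070, 0.106, 0.099, 0.085),
    "D": (0.147, 0.110, 0.179, 0.081),
    "V": (0.062, 0.048, 0.028, 0.053),
}


def turn_probability(tetrapeptide: str) -> float:
    """p(turn) = f(i) * f(i+1) * f(i+2) * f(i+3) for four consecutive residues."""
    if len(tetrapeptide) != 4:
        raise ValueError("expected exactly four residues")
    p = 1.0
    for position, aa in enumerate(tetrapeptide):
        p *= TURN_F[aa][position]
    return p


def best_turn_start(seq: str) -> Tuple[int, float]:
    """Return the start index and p(turn) of the highest-scoring window."""
    scores = [turn_probability(seq[i:i + 4]) for i in range(len(seq) - 3)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


start, p = best_turn_start("VVVDRDRVVV")  # invented sequence
print(start, p)
```

Each residue contributes a different frequency depending on which of the four turn positions it occupies, which is why the table has four f columns.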
13. Patterns in usage
- Patterns can be used to augment these statistical approaches
- In some cases, one side of a helix likes water (is hydrophilic)
  - Every 4th amino acid is hydrophilic
  - Helps identify the helix
  - Helps identify that it is solvent exposed
- Other patterns: coiled coils
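One simple way to look for the "every 4th residue is hydrophilic" pattern is to split a segment into four phases and measure the hydrophilic fraction in each. The residue classification below is an assumption made for this sketch, as are the function name and the toy peptide.

```python
from typing import List

# Assumed set of hydrophilic residues (a simplification for illustration)
HYDROPHILIC = set("DEKRNQSTH")


def hydrophilic_fraction_by_phase(seq: str, period: int = 4) -> List[float]:
    """For each phase 0..period-1, fraction of residues at phase, phase+period, ...
    that are hydrophilic."""
    fractions = []
    for phase in range(period):
        residues = seq[phase::period]
        if not residues:
            fractions.append(0.0)
            continue
        fractions.append(sum(aa in HYDROPHILIC for aa in residues) / len(residues))
    return fractions


pattern = hydrophilic_fraction_by_phase("KLLAELLAKLLA")  # invented peptide
print(pattern)
```

One phase close to 1.0 with the others close to 0.0 suggests a helix with a single water-facing side.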
14. Sounds like
- Does this sound familiar?
- Probability of a sequence of occurrences?
[Figure labels: Hairpin position 1, Hairpin position 2, Hairpin position 3]
15. HMMSTR
- Hidden Markov Model
- Hidden states: helix, beta sheet, turn
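To show how hidden states such as helix, beta sheet, and turn can be recovered from a sequence, here is a generic log-space Viterbi decoder on a three-state toy model. Every probability below is invented, the alphabet is cut down to three residues, and this is not the HMMSTR model itself.

```python
import math
from typing import Dict


def viterbi(observations: str, states: str,
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> str:
    """Return the most probable hidden state path (log-space Viterbi)."""
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor for state s at this position
            prev = max(states, key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


# Toy model (all numbers hypothetical): H = helix, E = beta sheet, T = turn
STATES = "HET"
START = {"H": 0.4, "E": 0.4, "T": 0.2}
TRANS = {"H": {"H": 0.80, "E": 0.10, "T": 0.10},
         "E": {"H": 0.10, "E": 0.80, "T": 0.10},
         "T": {"H": 0.35, "E": 0.35, "T": 0.30}}
EMIT = {"H": {"A": 0.70, "V": 0.20, "G": 0.10},
        "E": {"A": 0.20, "V": 0.70, "G": 0.10},
        "T": {"A": 0.15, "V": 0.15, "G": 0.70}}

path = viterbi("AAAAGGVVVV", STATES, START, TRANS, EMIT)
print(path)
```

The decoder weighs how well each residue fits a state against how costly it is to switch states, which is the "probability of a sequence of occurrences" idea from slide 14.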
16. Motifs
- Proteins are organized into
  - Domains
  - Domains are composed of motifs
- PFAM
  - Database of protein families
  - Hidden Markov Models
17. HMMER and PFAM
18. CASP
- Critical Assessment of techniques for protein Structure Prediction
- Biennial conference contest
- Secret, newly experimentally determined structures

  CASP1 (1994)   CASP2 (1996)   CASP3 (1998)
  CASP4 (2000)   CASP5 (2002)   CASP6 (2004)
  CASP7 (2006)   CASP8 (2008)   CASP9 (2010)
  CASP10 (2012)
19. CASP evaluation
- Root mean square (RMS) for angles
- No intermol contacts
- Secondary structure
- Surface
- Buried
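For the first criterion, a root mean square deviation over angles has to respect the fact that angles wrap around at ±180 degrees. The sketch below shows one way to do that; it illustrates the idea only and is not CASP's actual evaluation code. The function name is hypothetical.

```python
import math
from typing import Sequence


def angular_rms(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square difference between two lists of angles in degrees.

    Each difference is wrapped into [-180, 180) so that 170 and -170
    count as 20 degrees apart, not 340.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty angle lists of equal length")
    total = 0.0
    for p, o in zip(predicted, observed):
        diff = (p - o + 180.0) % 360.0 - 180.0
        total += diff * diff
    return math.sqrt(total / len(predicted))


print(angular_rms([170.0, -60.0], [-170.0, -50.0]))
```

Without the wrap-around step, a prediction that is nearly perfect but sits on the other side of the ±180 boundary would be scored as badly wrong.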
20. Approaches
- Have seen comparative/homology-based approaches
  - HMM-based methods rely on multiple sequence alignments (homology)
- Now turn to de novo
  - Split into two: ab initio and knowledge based
21. ROSETTA (CASP3 winner): de novo, knowledge based
- Build a list of possible conformations (25) for each segment (length 9)
  - Predicted secondary structure
  - Database of structures
- Randomly draw from this list, apply ψ and φ, and score the conformation
  - Monte Carlo simulated annealing procedure
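The draw, apply, and score loop can be sketched as a Metropolis Monte Carlo search with a cooling temperature. This is a toy under stated assumptions, not ROSETTA's implementation: the fragment library is random, the scoring function `toy_score` simply rewards helix-like angles, and the cooling schedule and all names are invented for illustration.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Angles = Tuple[float, float]  # (phi, psi) in degrees


def fragment_assembly(n_residues: int,
                      fragment_library: Dict[int, List[List[Angles]]],
                      score: Callable[[List[Angles]], float],
                      steps: int = 2000,
                      t_start: float = 100.0,
                      t_end: float = 1.0,
                      seed: int = 0) -> Tuple[List[Angles], float]:
    """Monte Carlo simulated annealing over fragment insertions (lower score is better)."""
    rng = random.Random(seed)
    starts = sorted(fragment_library)
    conformation = [(-150.0, 150.0)] * n_residues  # start from an extended chain
    current = score(conformation)
    best, best_score = list(conformation), current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        start = rng.choice(starts)
        fragment = rng.choice(fragment_library[start])
        trial = list(conformation)
        trial[start:start + len(fragment)] = fragment  # apply the fragment's angles
        trial_score = score(trial)
        delta = trial_score - current
        # Metropolis criterion: always accept improvements, sometimes accept worse
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            conformation, current = trial, trial_score
            if current < best_score:
                best, best_score = list(conformation), current
    return best, best_score


def toy_score(conformation: List[Angles]) -> float:
    """Hypothetical score: squared distance from helix-like angles (-60, -45)."""
    return sum((phi + 60.0) ** 2 + (psi + 45.0) ** 2 for phi, psi in conformation)


# Toy library: 25 random fragments of length 9 for every segment start
CANONICAL = [(-60.0, -45.0), (-120.0, 130.0), (-150.0, 150.0)]
lib_rng = random.Random(1)
N, FRAG_LEN, PER_SEGMENT = 12, 9, 25
library = {s: [[lib_rng.choice(CANONICAL) for _ in range(FRAG_LEN)]
               for _ in range(PER_SEGMENT)]
           for s in range(N - FRAG_LEN + 1)}

best, best_score = fragment_assembly(N, library, toy_score)
print(best_score)
```

Accepting some worse moves early, while the temperature is high, is what lets the search escape local minima before it settles.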
22. ROSETTA (CASP3 winner)
- Scoring the global conformation
  - Hydrophobic burial
  - Electrostatics
  - Disulfide bonding
  - Main chain hydrogen bonding
  - Strand pairing
  - Sheet formation
  - Helix-strand interactions
  - Excluded volume
24. Letter  Name                          Definition
    H       Alpha helix (4-12)            Two or more consecutive bridge partners at i and i+4.
    B       Isolated beta-bridge residue  Must not have a neighbor that qualifies it for H, E, G, or I status. Bridge partner is identified in BP1 or BP2 column.
    E       Strand ("extended")           Has at least one bridge partner and at least one neighbor bridged in parallel or antiparallel.
    G       3-10 helix                    Two or more consecutive bridge partners at i and i+3.
    I       Pi helix                      Two or more consecutive bridge partners at i and i+5.
    T       Turn                          Bridge partner at i+3, i+4, or i+5, but no bridged neighbor that would qualify them for H, G, or I status.
    S       Bend                          Local curvature greater than 70 degrees, measured as the angle between alpha carbons at i-2, i, and i+2.
    blank   None                          Meets none of the criteria above.
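The bend (S) criterion is the one entry in the table that depends only on alpha-carbon geometry, so it is easy to sketch. The code below assumes the angle is the change in direction between the vector from Cα(i-2) to Cα(i) and the vector from Cα(i) to Cα(i+2). The function names and coordinates are invented for illustration.

```python
import math
from typing import List, Sequence


def bend_angle(ca_before: Sequence[float], ca: Sequence[float],
               ca_after: Sequence[float]) -> float:
    """Angle in degrees between (ca - ca_before) and (ca_after - ca)."""
    v1 = [b - a for a, b in zip(ca_before, ca)]
    v2 = [b - a for a, b in zip(ca, ca_after)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cosine))


def bend_flags(ca_coords: List[Sequence[float]], cutoff: float = 70.0) -> List[bool]:
    """Flag residue i as a bend when the angle at i-2, i, i+2 exceeds the cutoff."""
    flags = []
    for i in range(len(ca_coords)):
        if i < 2 or i + 2 >= len(ca_coords):
            flags.append(False)  # not enough neighbors to measure curvature
        else:
            angle = bend_angle(ca_coords[i - 2], ca_coords[i], ca_coords[i + 2])
            flags.append(angle > cutoff)
    return flags


# Toy alpha-carbon trace with a right-angle corner at the middle residue
corner = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
          (2.0, 1.0, 0.0), (2.0, 2.0, 0.0)]
print(bend_flags(corner))
```

The remaining letters depend on hydrogen-bond and bridge-partner assignments, which require full backbone coordinates and are not covered by this sketch.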