Title: Protein Folding: Interrelation between Secondary and Tertiary Structure Determination
1. Protein Folding: Interrelation between Secondary and Tertiary Structure Determination
Karl F. Freed
James Franck Institute and Department of Chemistry, University of Chicago
KITPC, Beijing, China, July 29, 2009
2. Human, dog, rat, and worm genome projects obtain genes that code for protein sequences
[Figures: www.ncbi.nlm.nih.gov/Genbank/genbankgrowth.jpg; www.genomesonline.org/images/gold_s1.gif]
3. Proteins: the primary functional biomolecules
- insulin: hormone, glucose levels
- cytochrome c: heme group, electron transfer
- ribonuclease: cleaves RNA
- lysozyme: cleaves carbohydrates
- myoglobin: oxygen storage
- hemoglobin: oxygen transport
- glutamine synthetase: synthesizes glutamine
- antibody: recognizes/targets foreign bodies
Our goal is to determine the folded structures
Function follows from structure 
4. Protein Folding Problem
SEQUENCE
Folding: 0.00001 to 10 sec
Native State: responsible for function
5. Why haven't we solved The Protein Folding Problem after 40 years?
Mother Folding
- Two aspects 
- Predict pathways 
- Predict structure
Sequence determines structure 
6. "Give me an aa sequence; I'll produce a pathway and the final structure." Why so difficult?
- We're not smart enough
- It's a very complex system
7. "Give me an aa sequence; I'll produce a pathway and the final structure." Why so difficult?
- Complex problem
- Too many atoms (not enough computing power)
- Force fields inaccurate (pairwise interactions inadequate)
- Complex interplay between secondary and tertiary structure formation (local vs. long-range structure)
- High degree of folding cooperativity
- Averaging doesn't work (no mean-field models)
- H2O solvent is difficult to treat
- Don't know all the rules?
- Not enough information?
- Reductionist models (e.g. H/P) often too simple
8. What are the fundamental principles needed to predict pathways and structures?
9. Two aspects of The Protein Folding Problem
- Mechanistic studies: how does it get from the U-state to the N-state?
- Structure prediction: successful when homology is available, but side-steps The Question: What are the Principles?
10. Why do they fold into specific structures?
11. What level of representation is needed?
- Cα
- Cβ
- Beads-on-a-string
12. Monomer structure and miscibility of polyolefins
K. F. Freed and J. Dudowicz, Adv. Polym. Sci. 183, 63-126 (2005).
13. Major themes and challenges in protein folding
1) Polymer bends in certain ways
2) Satisfy main-chain hydrogen bonds and form secondary structure
3) Bury hydrophobic residues and pack the atoms
Must satisfy 1, 2, and 3 simultaneously
14. Vast Conformational Search: Levinthal Paradox
How does a protein find the time to fold?
Polypeptide backbone is flexible, but shows preferences for specific conformations
[Figure: Ramachandran map (φ, ψ) showing the poly-proline II basin, the beta basin, and the helical/turn basin]
And we have to search too!
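The scale of the search can be made concrete with a back-of-envelope estimate. The sketch below is only illustrative: the three input values (states per residue, chain length, sampling rate) are assumed textbook-style numbers, not figures taken from the slides.

```python
# Back-of-envelope Levinthal estimate.
# All three inputs are ASSUMED illustrative values, not data from the talk.
CONFORMATIONS_PER_RESIDUE = 3    # assumed: roughly one state per Ramachandran basin
N_RESIDUES = 100                 # assumed chain length
SAMPLES_PER_SECOND = 1e13        # assumed conformational sampling rate
SECONDS_PER_YEAR = 3.156e7


def exhaustive_search_years(states_per_residue, n_residues, rate):
    """Years needed to visit every backbone conformation exactly once."""
    n_conformations = states_per_residue ** n_residues
    return n_conformations / rate / SECONDS_PER_YEAR


years = exhaustive_search_years(
    CONFORMATIONS_PER_RESIDUE, N_RESIDUES, SAMPLES_PER_SECOND
)
print(f"Exhaustive search would take about {years:.1e} years")
```

Under these assumptions the exhaustive search time is astronomically longer than the observed folding times of microseconds to seconds, which is the point of the paradox: folding cannot be a random search.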
15. What info is needed to fold proteins? All-atom? Dihedral angles?
All-atom simulation of protein and solvent would take decades
16. Reduced representation: side chains are only Cβ
Retains the 3 themes
Big challenge: retain the sequence information lost with removal of the side chains
17. 1) Proteins bend in certain ways
Sampling in dihedral space: φ-ψ angles and Ramachandran BASINS
φ and ψ show very strong preferences for certain regions of the φ-ψ map, called Ramachandran basins (due to steric and electrostatic interactions).
[Figure: Ramachandran map with regions labeled ε, β, PPII, extended, αL, and helical]
Where to get this information?
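For readers who want to see how a backbone dihedral angle is obtained from coordinates, here is a minimal sketch using the standard four-point torsion formula. The function name and the toy coordinates are illustrative assumptions; for φ the four atoms would be C(i-1), N(i), Cα(i), C(i), and for ψ they would be N(i), Cα(i), C(i), N(i+1).

```python
import numpy as np


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points p0-p1-p2-p3."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond b1.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


# Toy coordinates (not real protein atoms): a planar trans arrangement.
trans = dihedral_degrees([0, 1, 0], [0, 0, 0], [1, 0, 0], [1, -1, 0])
# Same first three points, last point rotated out of the plane.
quarter_turn = dihedral_degrees([0, 1, 0], [0, 0, 0], [1, 0, 0], [1, 0, 1])
print(trans, quarter_turn)
```

Applying such a function along the chain gives one (φ, ψ) pair per residue, which is the coordinate system in which the Ramachandran basins are defined.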
18. From computer simulations?
But force fields can vary widely
Zaman et al., JMB 2003
19. Data-mine the Protein Data Bank (PDB) of crystal structures
Extract the (φ, ψ) distributions for each type of amino acid
[Figure: Ramachandran maps (φ, ψ) for THR and ALA]
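Extracting a per-residue-type distribution amounts to building a normalized two-dimensional histogram over the (φ, ψ) map. The sketch below shows the idea on synthetic angles; the cluster centers and widths are invented toy values, not PDB statistics, and the function name and bin count are assumptions.

```python
import numpy as np


def ramachandran_histogram(phi_psi, bins=36):
    """Normalized 2D histogram of (phi, psi) pairs given in degrees."""
    angles = np.asarray(phi_psi, dtype=float)
    # Wrap every angle into [-180, 180) so nothing falls outside the map.
    angles = (angles + 180.0) % 360.0 - 180.0
    edges = np.linspace(-180.0, 180.0, bins + 1)
    counts, _, _ = np.histogram2d(angles[:, 0], angles[:, 1], bins=[edges, edges])
    return counts / counts.sum()


# Synthetic stand-in for mined data (toy values, NOT real PDB statistics).
rng = np.random.default_rng(0)
observations = {
    "ALA": rng.normal(loc=(-63.0, -43.0), scale=15.0, size=(500, 2)),
    "THR": rng.normal(loc=(-120.0, 130.0), scale=20.0, size=(500, 2)),
}
distributions = {
    residue: ramachandran_histogram(angles)
    for residue, angles in observations.items()
}
for residue, dist in distributions.items():
    print(residue, dist.shape, round(float(dist.sum()), 6))
```

With real data the `observations` dictionary would be filled from dihedral angles computed over a curated set of crystal structures, one array per amino acid type.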
20. Ramachandran map of ALA with neighbors ALA, ASP
[Figure panels: ALA, ASP and ALA, ALA]
Map depends on neighbor type and conformation
Strong correlations in sequence
21. Move-set uses highly selected trimers from the PDB
Specifying side chain type and backbone geometry implicitly includes all-atom side chain information
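One way to picture a trimer-based move is as a lookup: the residue types at positions i-1, i, i+1 select a set of observed backbone geometries, and a move resamples the central residue from that set. The sketch below is a hypothetical illustration only. The library contents are invented, the names `TRIMER_LIBRARY` and `propose_move` are assumptions, and it conditions only on neighbor type, not on neighbor conformation.

```python
import random

# HYPOTHETICAL toy library: (left, center, right) residue types mapped to
# (phi, psi) pairs for the center residue. All angle values are invented.
TRIMER_LIBRARY = {
    ("ALA", "ALA", "ALA"): [(-63.0, -43.0), (-60.0, -45.0), (-70.0, 140.0)],
    ("ALA", "ALA", "ASP"): [(-65.0, -40.0), (-120.0, 130.0)],
}


def propose_move(sequence, angles, position, rng):
    """Return a copy of `angles` with an interior residue resampled."""
    key = tuple(sequence[position - 1: position + 2])
    candidates = TRIMER_LIBRARY.get(key)
    new_angles = list(angles)
    if candidates:  # leave the residue unchanged if the trimer is unknown
        new_angles[position] = rng.choice(candidates)
    return new_angles


sequence = ["ALA", "ALA", "ALA", "ASP"]
angles = [(-150.0, 150.0)] * len(sequence)
rng = random.Random(0)
trial = propose_move(sequence, angles, 1, rng)
print(trial)
```

A Monte Carlo driver would accept or reject each `trial` using an energy function; that step is omitted here.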
22. Knowledge-based energy function: assign interaction energies according to the observed distances in the PDB
$\mathrm{Prob}_{\mathrm{PDB}}(r_{ij})$: probability of finding 2 atoms some distance apart, e.g. Dist(Cα of Ala, Cα of Val).
$\mathrm{Energy}_{\mathrm{PDB}}(r_{ij}) = -\ln\left(\mathrm{Prob}_{\mathrm{PDB}}(r_{ij})\right)$
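The relation between observed distance probabilities and energies can be sketched in a few lines. This is a minimal illustration of the negative-log-probability form stated on this slide, with no reference-state correction. The distances are synthetic, and the bin edges and pseudocount are assumptions added to avoid taking the logarithm of zero.

```python
import numpy as np


def statistical_potential(distances, bin_edges, pseudocount=1.0):
    """Energy per distance bin as -ln(probability of that bin)."""
    counts, _ = np.histogram(distances, bins=bin_edges)
    # The pseudocount (an assumption) keeps empty bins at a finite energy.
    prob = (counts + pseudocount) / (counts.sum() + pseudocount * len(counts))
    return -np.log(prob)


# Synthetic pair distances in angstroms (toy data, NOT mined from the PDB).
rng = np.random.default_rng(1)
distances = rng.normal(loc=6.0, scale=1.0, size=2000)
edges = np.arange(2.0, 12.5, 1.0)  # ten 1 A bins covering 2 A to 12 A
energy = statistical_potential(distances, edges)
for lo, e in zip(edges[:-1], energy):
    print(f"{lo:4.1f}-{lo + 1.0:4.1f} A: {e:6.2f}")
```

Frequently observed distances receive low energies and rare distances receive high energies, so a simulation scored with this table is biased toward PDB-like geometry.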
23. Secondary structure prediction methods
 SASA,  
24. What prevents accuracy of secondary structure prediction from reaching 90%?
Secondary structure often depends on long-range interactions, i.e. tertiary structure
This is supported by the following studies:
- The same fragment from different parts of protein G forms varying secondary structures: Minor and Kim (1996)
- Secondary structure prediction accuracy decreases with increasing contact order: Kihara (2005)
- The same sequence fragment can be found in multiple native secondary structure types: Pan et al. (1999), Jacobini et al. (2000), Zhou et al. (2000), Ikeda and Higo (2006)
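Contact order measures how far apart in sequence the contacting residues are, on average. A minimal sketch of the commonly used relative contact order, the mean sequence separation of contacting pairs divided by chain length, is given below. Using only Cα coordinates and an 8 Å cutoff are simplifying assumptions of this sketch, and the straight toy chain is not a real structure.

```python
import numpy as np


def relative_contact_order(coords, cutoff=8.0, min_separation=1):
    """Mean sequence separation of contacting pairs, divided by chain length."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.triu_indices(n, k=min_separation)  # all pairs with j - i >= k
    in_contact = dist[i, j] < cutoff
    n_contacts = int(in_contact.sum())
    if n_contacts == 0:
        return 0.0
    return float((j - i)[in_contact].sum() / (n * n_contacts))


# Toy chain: ten points on a straight line, 3.8 A apart (an assumed spacing).
straight_chain = [[3.8 * k, 0.0, 0.0] for k in range(10)]
rco = relative_contact_order(straight_chain)
print(round(rco, 4))
```

A fully extended chain has only short-range contacts and therefore a low value; folds rich in long-range contacts, where local sequence information is least sufficient, score higher.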
25. What we do differently
Couple secondary and tertiary structure during the folding process
Restrict possible secondary structure as the chain folds
"Eliminate all other factors, and the one which remains must be the truth."
A. C. Doyle, The Sign of the Four (1890)
26. Trimer library, full PDB
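27. Iterations mimic steps in folding pathway
[Figure, panel B: major pathway starting from the unfolded state, with Round 0 and Round 1 profiles on a 0 to 1 scale; structural elements labeled β1, β2, helix, 3-10, β3, β4, β5]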
28. [Figure, panels A, B, C: energy versus Cα RMSD for 1af7, 1r69, and 1ubq]
29. [Figure: secondary structure frequency (0 to 1) versus residue index for 1af7, 1di2, 1r69, and 1b72A, one panel per folding round from Round 0 up to Round 8]
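The idea of restricting secondary structure round by round can be illustrated with a small sketch: compute the per-residue frequency of each secondary structure label over an ensemble of folded models, then narrow the allowed labels wherever one label dominates. This is a hypothetical illustration, not the method as implemented by the authors. The one-letter labels, the 0.75 threshold, and the toy ensemble are all assumptions.

```python
from collections import Counter


def secondary_structure_frequency(ensemble):
    """Per-residue frequency of each label across equal-length strings."""
    n_models = len(ensemble)
    freqs = []
    for pos in range(len(ensemble[0])):
        counts = Counter(structure[pos] for structure in ensemble)
        freqs.append({label: c / n_models for label, c in counts.items()})
    return freqs


def fix_positions(freqs, allowed, threshold=0.75):
    """Restrict a position to its dominant label once it reaches `threshold`."""
    updated = []
    for options, freq in zip(allowed, freqs):
        label, top = max(freq.items(), key=lambda kv: kv[1])
        if top >= threshold and label in options:
            updated.append({label})
        else:
            updated.append(set(options))
    return updated


# Toy ensemble from one hypothetical round (H helix, E strand, C coil).
ensemble = ["HHHCE", "HHHCE", "HHCCE", "HHHEC", "HHHCC"]
allowed = [{"H", "E", "C"} for _ in range(5)]
freqs = secondary_structure_frequency(ensemble)
allowed = fix_positions(freqs, allowed)
print(allowed)
```

Repeating the two steps with the narrowed `allowed` sets feeding the next round of simulations gives a progressively smaller search space.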
31. 1AF7: 3.4 Å RMSD
1TIF: 5.4 Å RMSD
1SAP: 7.8 Å RMSD
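RMSD values like these are conventionally computed after optimally superimposing the predicted and native coordinates. A minimal sketch using the Kabsch algorithm is shown below; whether the slides used exactly this procedure is an assumption, and the four-point structure is toy data.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two N x 3 coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P_c = P - P.mean(axis=0)  # remove translation
    Q_c = Q - Q.mean(axis=0)
    H = P_c.T @ Q_c           # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Toy check: a rotated and translated copy should give an RMSD near zero.
theta = np.radians(30.0)
rot_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
model = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [0.0, 3.8, 3.8]])
native = model @ rot_z.T + np.array([5.0, -2.0, 1.0])
rmsd_same = kabsch_rmsd(model, native)
perturbed = native.copy()
perturbed[0] += np.array([2.0, 0.0, 0.0])
rmsd_perturbed = kabsch_rmsd(model, perturbed)
print(rmsd_same, rmsd_perturbed)
```

For a protein the inputs would be the Cα coordinates of the predicted and experimental structures, listed in the same residue order.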
32. Novel Aspects
- Predict 2° and 3° structure without using homology
- Use principles of protein structure and folding
- Couple 2° and 3° structure formation
- Sequential stabilization
- Iterative fixing to reduce the search
- Use a Cβ representation
- Potential function: orientational and 2° structure dependence in a Cβ model
- Q8-level 2° structure, (φ,ψ) prediction, can outperform PSIPRED
- Outputs pathway information
33. Conclusions
34. Acknowledgements
- Prof. Tobin Sosnick, Joe DeBartolo: ab initio folding, secondary structure prediction
- Dr. Andres Colubri: folding simulations, software
- Dr. Abhishek Jha (MIT): coil library, unfolded state, α, β propensities, structure refinement, electrostatics
- James Fitzgerald (Stanford): statistical potentials, torsional dynamics
- Prof. M. Zaman (UT Austin): simulations on peptides
- All-atom statistical potential: Dr. Min-yi Shen, Prof. A. Sali (UCSF)
- Funding: NIH, NSF, Burroughs Wellcome Fund