Title: Introduction to Protein Folding
1 Introduction to Protein Folding
Yaohang Li, Department of Computer Science, North
Carolina A&T State University
2 Introduction to Protein Folding
- Proteins are biology's workhorses
- Nanomachines
- Carry out biochemical function
- Enzyme
- Structural elements
- Antibodies
- Genome
- Blueprint of Protein
- Specify the sequence of amino acids
- Protein Folding
- Protein self-assembles to a particular shape
- The shape determines the function of the protein
- Connection between Genome (sequence) and Protein Function
3 Protein Folding: Grand Challenges of Computational Biology
- Protein Folding
- Predicting the 3-D Structure of protein
- Problems related to folding
- Dynamic structure prediction
- Protein docking
- Protein-protein interaction
- Issues
- Models
- Force fields (e.g. CHARMM, AMBER)
- Lots of parameters, constrained by experiment: good enough?
- Sampling
- Can simulate 1 ns (10^-9 s) in a day
- Need to sample 10^4 to 10^6 ns!
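The sampling gap above can be turned into wall-clock time with a little arithmetic. The sketch below assumes only the slide's own figure of about 1 ns of simulated time per day; the function name and the year length are illustrative choices. Under that assumption, 10^4 to 10^6 ns corresponds to roughly 27 to 2,700 years of computing.

```python
# Assumption taken from the slide: about 1 ns of simulated time per day.
NS_PER_DAY = 1.0
DAYS_PER_YEAR = 365.25


def years_needed(target_ns, ns_per_day=NS_PER_DAY):
    """Wall-clock years needed to simulate target_ns nanoseconds."""
    return target_ns / ns_per_day / DAYS_PER_YEAR


# The two ends of the range quoted on the slide.
for target_ns in (1e4, 1e6):
    print(f"{target_ns:.0e} ns -> about {years_needed(target_ns):,.0f} years")
```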
4 Why simulation?
- Physics --> chemistry --> biology
- Start from the laws of physics and chemistry
- explain the properties of biomolecules
- Experiments
- less detailed
- Spectroscopies, FRET, NMR, etc.
- Crystals are static
- Constrained by the experimental environment
- Costly
- Time Consuming
- Simulations
- very detailed
- Femtosecond time resolution
- Angstrom spatial resolution
- Much like having thousands of completely detailed single molecule experiments
- Relatively Cheap
5 Goals
- Can we characterize folding computationally?
- Accurate rates
- Detailed mechanisms
- Can we design proteins?
- Specific stable structure
- Retention of function
6 Levels of Structure in Proteins
- Primary Structure
- amino acid sequence
- Secondary Structure
- Common motifs
- alpha helix
- beta sheet
- coil (turn)
- Tertiary Structure
- 3D conformation
- Quaternary Structure
- Multiple polypeptide chains
7 Tertiary Structure Prediction
- Three variants
- Ab initio prediction
- Protein Threading
- Homology-based prediction (Comparative modelling)
8 Ab initio
- From Merriam-Webster's Dictionary
- Etymology: Latin
- Meaning: From the beginning
- In the field of Bioinformatics
- Predicting tertiary structure in the absence of homology to a known structure
[Figure: amino acid sequence, labeled "Ab Initio"]
APKFFRGGNWKMNGKRSL
GELIHTLGDAKLSADTEVVCGI APSITEKVVFQETKAIAD
NKD WSKVEVHESRIYGGSVTNCK
ELASQHDVDGFLVGGASLKPVDGFLHALAEGLGVDINAKH...........
9 Anfinsen's thermodynamic hypothesis
The 3D structure of a protein in its native
environment is the one in which the Gibbs free
energy of the whole system is the lowest.
Sequence determines structure.
Anfinsen, C.B. Principles that govern the folding
of protein chains. Science 181, 223-30 (1973).
10 Folding Energy Landscape
11 Levinthal paradox
If a protein has 100 amino acids, and each amino
acid has 3 conformational states, then the
protein has 3^100 conformational states, of
which 1 is the native state.
Protein: High Degree of Freedom --> Huge Search Space
Mathematician: Curse of Dimensionality
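To get a feel for the size of this search space, the count can be computed directly. The sketch below uses the slide's numbers (100 residues, 3 states each). The sampling rate of 10^13 conformations per second is a hypothetical figure chosen only for illustration; at that rate, exhaustive enumeration would still take on the order of 10^27 years.

```python
def conformations(residues, states_per_residue):
    """Number of chain conformations if every residue chooses independently."""
    return states_per_residue ** residues


# Hypothetical sampling rate, chosen only for illustration.
SAMPLES_PER_SECOND = 1e13
SECONDS_PER_YEAR = 365.25 * 24 * 3600

n = conformations(100, 3)
years = n / SAMPLES_PER_SECOND / SECONDS_PER_YEAR

print(f"{n:.3e} conformations")
print(f"about {years:.1e} years to enumerate them all")
```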
12 Ab Initio methods
- Assumption: The structure that a protein folds into inside cells is the structure with the lowest global free energy (or a structure very similar to it)
- Finding native-like protein conformations requires developing
- an accurate potential function that permits calculation of the free energy given a structure
- an efficient method for searching for energy minima
13 Models
- Models are used to reduce the search space (simplify the computation)
- Three kinds of models
- Lattice models
- Discrete state off-lattice models
- Narrowing the search with Local Structure Prediction
14 Lattice models
- How to reduce complexity
- Represent peptide chains as paths on a lattice
- Advantage
- Analytical and computational simplicity
- Disadvantage
- Restricted ability to represent subtle geometric considerations (e.g. strand twist)
- Backbone is reproduced with an accuracy no better than approximately half the lattice spacing
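A minimal sketch of a lattice model, under assumptions not stated on the slide: it uses the two-letter HP (hydrophobic/polar) model on a 2D square lattice, where the chain is a self-avoiding walk and each pair of H residues that are lattice neighbours but not chain neighbours contributes -1 to the energy. The function names are illustrative. Exhaustive enumeration is only feasible for very short chains, which is exactly why a search method is needed.

```python
from itertools import product

# Unit moves on a 2D square lattice.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def walk(moves):
    """Return the lattice coordinates of the chain, or None if it overlaps itself."""
    pos = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        nxt = (pos[-1][0] + dx, pos[-1][1] + dy)
        if nxt in pos:
            return None
        pos.append(nxt)
    return pos


def energy(seq, pos):
    """-1 for each H-H pair that touches on the lattice but is not bonded."""
    e = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):
            if seq[i] == "H" and seq[j] == "H":
                dist = abs(pos[i][0] - pos[j][0]) + abs(pos[i][1] - pos[j][1])
                if dist == 1:
                    e -= 1
    return e


def best_fold(seq):
    """Enumerate every self-avoiding walk and keep one of lowest energy."""
    best = None
    for moves in product("UDLR", repeat=len(seq) - 1):
        pos = walk(moves)
        if pos is None:
            continue
        e = energy(seq, pos)
        if best is None or e < best[0]:
            best = (e, "".join(moves))
    return best


print(best_fold("HPPH"))
```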
15 Discrete State Off-Lattice Models
- How to reduce complexity
- Only allow certain side chain structures and limited peptide-bond rotations, e.g.
- limit side chain to a single rotamer
- limit the backbone to specific Phi/Psi pairs
- The Omega angle tends to be planar (0° or 180°)
16 Narrowing the search with Local Structure Prediction
- How to reduce complexity
- Use Local Structure Biases: Local Structures excised from proteins can fold independently of the full protein
- But: Strength and Multiplicity of Local Structure Biases are highly sequence dependent
- Use sequence motifs (Bystroff et al.)
- Have strong tendencies to adopt a single local conformation (in different sequences)
- Made good predictions in CASP2
17 Scoring Functions
- Scoring function
- Appropriate for the reduced space and
- can be calculated rapidly
- Three examples
- Solvation-based scores: Classify sites in known proteins by degree of solvent exposure and determine frequencies of each amino acid in each site
- Pair interactions: How likely two residues are to be near each other
- Secondary Structure Arrangement: Score how well secondary structure elements match with each other
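The solvation-based score described above can be sketched as a log-odds table built from counts. This is an illustration under stated assumptions, not the scoring function of any particular program: the two exposure classes, the toy observations and the function names are all hypothetical, and lower scores are taken to mean "more favourable".

```python
import math
from collections import Counter


def solvation_scores(observations):
    """Log-odds score for each (amino acid, exposure class) pair.

    observations: list of (amino_acid, exposure_class) tuples taken from
    known structures. Lower (more negative) means more favourable.
    """
    pair_counts = Counter(observations)
    aa_counts = Counter(aa for aa, _ in observations)
    env_counts = Counter(env for _, env in observations)
    total = len(observations)
    scores = {}
    for (aa, env), observed in pair_counts.items():
        # Count expected if residue type and exposure were independent.
        expected = aa_counts[aa] * env_counts[env] / total
        scores[(aa, env)] = -math.log(observed / expected)
    return scores


def score_structure(sites, scores):
    """Sum the per-site scores of a candidate structure."""
    return sum(scores[site] for site in sites)


# Hypothetical toy data: leucine mostly buried, lysine mostly exposed.
toy = ([("L", "buried")] * 3 + [("L", "exposed")]
       + [("K", "buried")] + [("K", "exposed")] * 3)
scores = solvation_scores(toy)

good = score_structure([("L", "buried"), ("K", "exposed")], scores)
bad = score_structure([("L", "exposed"), ("K", "buried")], scores)
print(good, bad)
```

Pairs never seen in the training data have no entry in the table; a fuller version would need pseudocounts or a default value for them.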
18 Who is ROSETTA
- ROSETTA Stone (from http://www.ba.dlr.de/ne/pe/virtis/stone1.htm)
- ROSETTA: The "translation" of silent symbols into a living language
- Silent symbols --> Living language
- Hieroglyphs --> Greek
- Analogically: polypeptide --> Tertiary Structure
19 What is ROSETTA Method
- Kim T. Simons, Rich Bonneau, David Baker, et al.
- Model: Narrow the search with Local Structure Prediction
- Scoring function: Solvation-based and Pair interactions
- Method outline (two steps)
- "Get tiny pieces": sequence profile alignment
- "Put them together": Monte-Carlo method with Bayesian scoring function, giving N near-native structures
20 "Get tiny pieces"
- Assumption: The distribution of conformations sampled for a given nine-residue segment of the chain is reasonably well approximated by the distribution of structures adopted by the sequence (and closely related sequences) in known protein structures.
- Method: Fragment libraries for each three- and nine-residue segment of the chain are extracted from the PDB using sequence profile alignment
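The fragment-picking idea above can be sketched as a sliding-window search. Note the swap: the slide's method uses sequence profile alignment, whereas this sketch scores windows by simple residue identity, purely to keep the example short. The library contents, protein names and function names are hypothetical.

```python
def window_score(query, candidate):
    """Number of identical residues at matching positions."""
    return sum(a == b for a, b in zip(query, candidate))


def pick_fragments(target, library, frag_len=9, top_k=2):
    """For each window of the target, list the top_k best library windows.

    library maps a (hypothetical) protein name to its sequence. Each hit is
    (score, protein name, offset within that protein).
    """
    picks = {}
    for start in range(len(target) - frag_len + 1):
        query = target[start:start + frag_len]
        scored = []
        for name, seq in library.items():
            for off in range(len(seq) - frag_len + 1):
                cand = seq[off:off + frag_len]
                scored.append((window_score(query, cand), name, off))
        # Highest score first; ties keep their library order.
        scored.sort(key=lambda hit: -hit[0])
        picks[start] = scored[:top_k]
    return picks


# Hypothetical two-protein library and a short target, using 3-residue windows.
library = {"protA": "ACDEFGHIK", "protB": "LMNPQRSTV"}
picks = pick_fragments("DEFGH", library, frag_len=3, top_k=2)
for start, hits in picks.items():
    print(start, hits)
```

In a real fragment library each hit would also carry the backbone conformation of the matched segment, which is what gets inserted into the model during assembly.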
21 "Put them together"
- Which is better? You need
- Energy function and
- Space searching method
22 "Put them together" (cont.)
- Sample the resulting conformational space with the Monte-Carlo method
- Bayesian scoring function
- Choose the most likely structure given the sequence
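23 ROSETTA -- Scoring Function (cont.)
Bayesian Theorem
P(structure | sequence) = P(structure) P(sequence | structure) / P(sequence)
In ab initio folding, we assume P(structure) [formula missing]
Comparing different structures of the same sequence, P(sequence) is a constant
In threading, with pairs of positions independent
P(sequence | structure) = product over pairs i < j of P(aa_i, aa_j | r_ij)
aa_i: amino acid at position i; r_ij: distance between residues i and j
Using Bayesian theorem for each i and j
P(aa_i, aa_j | r_ij) = P(aa_i, aa_j) P(r_ij | aa_i, aa_j) / P(r_ij)
P(aa_i, aa_j): independent of structure in same sequence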
24 ROSETTA -- Scoring Function (cont.)
The Scoring Function favours
- Compact structure
- Buried hydrophobic residues
- (Paired beta-strands)
25 Metropolis-Hastings Method
- Simulating a Markov Chain
- Generate a new state y from the current state x
- Change the configuration of a randomly selected 3-residue segment
- Metropolis-Hastings Ratio r
- r >= 1: accept y
- r < 1: accept y with probability r, otherwise reject y
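The acceptance rule above can be sketched in a few lines. This is a minimal illustration under assumptions not on the slide: the proposal is symmetric and the target is a Boltzmann distribution, so the ratio reduces to r = exp(-(E(y) - E(x)) / kT). The one-dimensional toy energy and proposal stand in for a real move set such as a 3-residue fragment change.

```python
import math
import random


def metropolis_step(x, energy, propose, kT, rng):
    """One Metropolis step; returns the next state of the chain."""
    y = propose(x, rng)
    delta = energy(y) - energy(x)
    if delta <= 0:
        return y  # r >= 1: always accept
    r = math.exp(-delta / kT)
    # r < 1: accept y with probability r, otherwise stay at x.
    return y if rng.random() < r else x


def toy_energy(x):
    """Hypothetical energy with a single minimum at x = 2."""
    return (x - 2.0) ** 2


def toy_propose(x, rng):
    """Symmetric random-walk proposal."""
    return x + rng.uniform(-0.5, 0.5)


def run_chain(steps=5000, kT=0.1, seed=0):
    rng = random.Random(seed)
    x = -3.0
    samples = []
    for _ in range(steps):
        x = metropolis_step(x, toy_energy, toy_propose, kT, rng)
        samples.append(x)
    return samples


samples = run_chain()
tail = samples[1000:]  # discard the early part of the chain
print(sum(tail) / len(tail))
```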
26 Simulated Annealing
- Physically motivated approach
- high temperature --> move around
- low temperature --> no free energy to move
- cool quickly --> defective crystal
- cool slowly --> perfect crystal
- Analogue
- Natural annealing process <----> Monte Carlo method
- The best crystal structure <----> Native conformation
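The cooling analogy above can be sketched as a Metropolis search whose temperature is lowered gradually. Everything specific here is an assumption made for illustration: the rugged one-dimensional energy, the geometric cooling schedule and all parameter values. It is not the accelerated tempering scheme mentioned elsewhere in these slides.

```python
import math
import random


def energy(x):
    """Hypothetical rugged 1-D landscape with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)


def anneal(x0, t_start=5.0, t_end=0.01, cooling=0.999, step=0.5, seed=0):
    """Simulated annealing with geometric cooling; returns the best state seen."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        y = x + rng.uniform(-step, step)
        delta = energy(y) - e
        # High t: most uphill moves pass. Low t: almost none do.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, e = y, e + delta
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling  # cool slowly
    return best_x, best_e


best_x, best_e = anneal(4.0)
print(best_x, best_e)
```

A cooling factor closer to 1 corresponds to slower cooling and gives the search more time at each temperature, at the cost of more steps.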
27 My Contribution: Accelerated Simulated Tempering Scheme
28 Our Simulation Results
29 The CASP contest
- CASP (Critical Assessment of Structure Prediction)
- Experimentalists announce some protein sequences whose structures they are going to solve
- CASP puts these sequences on the web for prediction, with a deadline
- Computational biologists submit their predictions
- CASP evaluates the predictions against the structures solved by the experimentalists
30 ROSETTA results
- Facts: Left: Native structures; Right: Predictions
31 Protein Folding Application: Research on Mad Cow Disease
32 Summary
- Before you ask me questions
- Is Rosetta really ab initio?