Title: Protein Structure Prediction
Protein Sequence Analysis
- Molecular properties (charge at a given pH, molecular weight, isoelectric point, hydrophobicity)
- Secondary structure
- Super-secondary structure (signal peptide, coiled coil, trans-membrane, etc.)
- 3-D prediction, threading (tertiary structure)
- Domains, motifs, etc.
- Subunits (quaternary structure)
Self-assembly
- Proteins self-assemble in solution
- All of the information necessary to determine the complex 3-D structure is in the amino acid sequence
- Structure determines function
- Lock-and-key model of enzyme function
- Know the sequence, know the function?
- Nearly infinite complexity
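Structure of a Peptide
[Figure: peptide chain from the N-terminal to the C-terminal end]
- Backbone dihedral (torsion) angles: φ (C-N-Cα-C) and ψ (N-Cα-C-N)
- Instead of 9 variables, use 2 variables (φ, ψ) for each amino acid
- ω ≈ 180° for the peptide bond between C=O and N-H, which is kept planar by resonance stabilization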
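As a sketch (not from the original slides), φ and ψ can be computed from the Cartesian coordinates of four consecutive backbone atoms. The helper names are hypothetical; the formula is the standard atan2 form of the dihedral angle, which gives the IUPAC sign convention (clockwise positive, range -180° to 180°).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four points p0-p1-p2-p3.
    For phi pass C(i-1), N, CA, C; for psi pass N, CA, C, N(i+1)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Four points with the first and last on the same side of the central bond give 0° (cis); opposite sides give ±180° (trans), the geometry the slide quotes for ω.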
Structure Prediction
- Protein structure prediction is the "Holy Grail" of bioinformatics
- Since structure determines function, structure prediction should allow protein design, design of inhibitors, etc.
- Huge amounts of genome data: what are the functions of all of these proteins?
Chemical Properties of Proteins
- Proteins are linear polymers of 20 amino acids
- The chemical properties of a protein are determined by its amino acids
- Molecular weight, net charge at a given pH and isoelectric point are simple calculations from the amino acid composition
- Hydrophobicity is a property of groups of amino acids, best examined as a graph
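A minimal sketch of these calculations from composition. The average residue masses are standard values; the pKa set is an assumption (published tables differ, so computed pI values vary between tools). The isoelectric point is found by bisection, because the net charge falls monotonically as pH rises.

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # one water for the free N- and C-termini

# Illustrative pKa values (an assumption; other tables use slightly different numbers).
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}

def molecular_weight(seq):
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def net_charge(seq, ph):
    """Henderson-Hasselbalch estimate of the net charge at a given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POS.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEG.items())
    return pos - neg

def isoelectric_point(seq, tol=1e-6):
    """pH at which the net charge is zero, by bisection on [0, 14]."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid   # still positive: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2
```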
[Figure: residue annotations "Increase local flexibility" and "Increase stability"]
Terminology
- Active site, Blocks, Core, Fold
- Domain, Motif
- Family, superfamily
- Module
- Class
- Primary, Secondary, Tertiary, Quaternary
Secondary Structure
- Protein secondary structure takes one of three forms: α helix, β sheet, or turn/coil/loop
- Secondary structure elements are tightly packed in the protein core in a hydrophobic environment
- Secondary structure is predicted within a small window of neighbouring residues
- Many different algorithms exist; none is highly accurate
- Better predictions come from a multiple alignment
- Methods: neural networks, nearest-neighbor method, HMM, etc.
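To illustrate prediction within a small window, the sketch below averages per-residue helix and strand propensities over ±3 residues and labels each residue H, E or C. The propensity numbers and the 1.05 cut-off are hypothetical placeholders, not a published scale; real methods use trained parameters and, ideally, multiple-alignment profiles.

```python
# Hypothetical propensities (illustrative only); unlisted residues count as neutral (1.0).
HELIX = {"A": 1.4, "E": 1.5, "L": 1.2, "M": 1.4, "G": 0.6, "P": 0.6}
STRAND = {"V": 1.7, "I": 1.6, "Y": 1.5, "F": 1.4, "G": 0.8, "P": 0.6}

def window_average(seq, table, half_width=3):
    """Average a per-residue score over +/- half_width residues (truncated at chain ends)."""
    scores = [table.get(aa, 1.0) for aa in seq]
    out = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half_width), min(len(seq), i + half_width + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out

def predict_ss(seq, half_width=3, threshold=1.05):
    """Label each residue H (helix), E (strand) or C (coil) from windowed propensities."""
    helix = window_average(seq, HELIX, half_width)
    strand = window_average(seq, STRAND, half_width)
    labels = []
    for ph, pe in zip(helix, strand):
        if max(ph, pe) < threshold:
            labels.append("C")
        elif ph >= pe:
            labels.append("H")
        else:
            labels.append("E")
    return "".join(labels)
```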
3-D Structure of Protein
[Figure: protein structure with labelled α-helix, β-sheet, loop and turn regions]
- Alpha-helix: right-handed (most), 3.6 residues per turn, φ ≈ -60°, ψ ≈ -40° on average
- Beta-sheet: antiparallel and parallel
- Loop, turn or coil
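The figure of 3.6 residues per turn means each residue is rotated 360°/3.6 = 100° around the helix axis, which is the basis of a helical-wheel plot. A small sketch; the function names are hypothetical.

```python
RESIDUES_PER_TURN = 3.6  # alpha-helix value quoted above

def wheel_angle(i, residues_per_turn=RESIDUES_PER_TURN):
    """Angular position (degrees) of residue i around the helix axis."""
    return (i * 360.0 / residues_per_turn) % 360.0

def helical_wheel(seq):
    """(residue, angle) pairs for drawing a helical wheel."""
    return [(aa, round(wheel_angle(i), 1)) for i, aa in enumerate(seq)]
```

Because 18 × 100° = 1800° = 5 full turns, residues i and i + 18 sit at the same wheel position.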
Neural Networks for Secondary Structure Prediction
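The slide's figure did not survive. As a rough sketch of how a neural-network predictor typically sees a sequence, each residue is represented by a one-hot encoded window of neighbours (20 amino acids plus a padding flag) that feeds a layer producing helix/strand/coil probabilities. The window size and the single softmax layer are assumptions; real predictors use trained weights and usually hidden layers.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_windows(seq, half_width=6):
    """One-hot encode a window of 2*half_width+1 residues centred on each position.
    Each window slot uses 21 inputs: 20 amino acids plus one padding/unknown flag."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    width = 2 * half_width + 1
    features = []
    for centre in range(len(seq)):
        vec = [0] * (width * 21)
        for k in range(width):
            pos = centre - half_width + k
            slot = k * 21
            if 0 <= pos < len(seq) and seq[pos] in index:
                vec[slot + index[seq[pos]]] = 1
            else:
                vec[slot + 20] = 1  # off the chain end, or unknown residue
        features.append(vec)
    return features

def softmax_layer(x, weights, biases):
    """One output unit per class (e.g. H, E, C); returns class probabilities."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b for w, b in zip(weights, biases)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```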
Protein Structure Classification
- Class α: a bundle of α helices connected by loops on the surface of the protein
- Class β: antiparallel β sheets
- Class α/β: mainly parallel β sheets with intervening α helices
- Class α+β: mainly segregated α helices and antiparallel β sheets
- Multidomain proteins comprise domains representing more than one of the above four classes
- Membrane and cell-surface proteins: α helices (hydrophobic) of a particular length range, traversing a membrane
[Figure: example structures of class β, class α, class α+β and class α/β proteins, and membrane proteins]
Structure Prediction on the Web
- Secondary Structural Content Prediction (SSCP), EMBL, Heidelberg: http://www.bork.embl-heidelberg.de/SSCP/sscp_seq.html
- BCM Search Launcher: Protein Secondary Structure Prediction, Baylor College of Medicine: http://dot.imgen.bcm.tmc.edu:9331/seq-search/struc-predict.html
- PREDATOR, EMBL, Heidelberg: http://www.embl-heidelberg.de/cgi/predator_serv.pl
- UCLA-DOE Protein Fold Recognition Server: http://www.doe-mbi.ucla.edu/people/fischer/TEST/getsequence.html
Super-secondary Structure
- Common structural motifs
- Membrane spanning
- Signal peptide
- Coiled coil
- Helix-turn-helix
Hydrophobicity Profile for Secondary Structure
(Reveals positions of turns between secondary structure elements, exposed and buried residues, membrane-spanning segments and antigenic sites)
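A hydropathy profile is a sliding-window average of a per-residue hydrophobicity scale. This sketch uses the Kyte-Doolittle scale; the window of 19 and cut-off of 1.6 for flagging possible membrane-spanning segments are commonly quoted settings, treated here as assumptions rather than fixed rules.

```python
# Kyte-Doolittle hydropathy scale.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def hydropathy_profile(seq, window=9):
    """Mean hydropathy of each full window; value j is centred on residue j + window // 2."""
    values = [KD[aa] for aa in seq]
    return [sum(values[j:j + window]) / window for j in range(len(values) - window + 1)]

def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Start indices of windows whose mean hydropathy exceeds the threshold."""
    profile = hydropathy_profile(seq, window)
    return [j for j, v in enumerate(profile) if v > threshold]
```

Plotting `hydropathy_profile(seq)` against residue position gives the graph the slide refers to; peaks suggest buried or membrane-spanning stretches, troughs suggest exposed regions.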
3-D Structure
- Cannot be accurately predicted from sequence alone (known as ab initio prediction)
- Levinthal's paradox: a 100-aa protein has 3^200 possible backbone configurations, many orders of magnitude beyond the capacity of the fastest computers
- There are perhaps only a few hundred basic structures, but we don't yet have this vocabulary or the ability to recognize variants on a theme
- Methods: HMM, structure profile method, contact potential method, threading method, conformational energy (Monte Carlo algorithm)
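A quick check of the Levinthal arithmetic: with 2 backbone angles (φ, ψ) per residue and an assumed 3 states per angle, 100 residues give 3^200 ≈ 2.7 × 10^95 conformations. The sampling rate below is a hypothetical figure chosen only to show the scale.

```python
import math

def levinthal_count(n_residues, states_per_angle=3, angles_per_residue=2):
    """Backbone conformations if each (phi, psi) angle takes states_per_angle values."""
    return states_per_angle ** (angles_per_residue * n_residues)

n = levinthal_count(100)               # exact integer 3**200
print(f"3^200 ~ {n:.2e} (log10 = {math.log10(n):.1f})")
rate = 1e13                            # hypothetical conformations sampled per second
print(f"exhaustive search: ~{n / rate:.1e} s at {rate:.0e} conformations/s")
```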
Procedure of Prediction
[Flowchart: database similarity search; relationship to known structure? (Yes/No); align sequence with known structure; family analysis; 3D comparative modeling; predict 3D structure; 3D structural analysis in lab]
Hidden Markov Models for 2D and 3D
- Hidden Markov Models (HMMs) are a more sophisticated form of profile analysis
- Rather than only building a table of amino acid frequencies at each position, they also model the transitions from one position to the next, including insertions and deletions
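A toy illustration of HMM decoding (not a profile HMM): a two-state model (helix H, coil C) with hypothetical transition and emission probabilities, decoded with the Viterbi algorithm to find the most probable state path. All parameter values and the helix-favouring residue set are invented for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HELIX_FAVOURING = set("AELMKQ")  # hypothetical grouping

STATES = ("H", "C")
START = {"H": 0.5, "C": 0.5}
TRANS = {"H": {"H": 0.9, "C": 0.1},   # hypothetical transition probabilities
         "C": {"H": 0.1, "C": 0.9}}
EMIT = {
    "H": {aa: (0.1 if aa in HELIX_FAVOURING else 0.4 / 14) for aa in AMINO_ACIDS},
    "C": {aa: 0.05 for aa in AMINO_ACIDS},  # uniform emissions for coil
}

def viterbi(seq, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Most probable hidden-state path, computed in log space."""
    scores = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    backpointers = []
    for aa in seq[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: scores[p] + math.log(trans[p][s]))
            new_scores[s] = scores[prev] + math.log(trans[prev][s]) + math.log(emit[s][aa])
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    state = max(states, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):  # trace back from the last residue
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))
```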
Homology Modeling
- If two proteins show sufficient sequence similarity, this essentially guarantees that they adopt the same structure. Safe thresholds: >50% identity over 25 residues, >30% identity over 50 residues, >25% identity over 80 residues or more
- If one of the two similar proteins has a known structure, a rough model of the protein of unknown structure can be built
- The quality of the model diminishes with lower sequence identity
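The rule of thumb above can be written as a simple check. Treating the three thresholds as steps (so, for example, 60 aligned residues uses the 30% rule) is an assumption; the slide gives only the three points, and the function name is hypothetical.

```python
def safe_homology_zone(identity_pct, aligned_length):
    """True if identity over the aligned length meets the slide's thresholds:
    >50% over 25 residues, >30% over 50, >25% over 80 or more."""
    if aligned_length >= 80:
        return identity_pct > 25
    if aligned_length >= 50:
        return identity_pct > 30
    if aligned_length >= 25:
        return identity_pct > 50
    return False  # too short to judge safely under this rule
```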
Steps in Homology Modeling
1. Do a sequence alignment with a protein of known structure
Known structure:   ksedemkase--------dlkkhgatvltalg
Unknown structure: kseddmrrseafgctytcdlrkhgntvltalg
2. Replace any side chains that are different in the homolog (green side chains in the figure)
3. Rebuild loops where there are gaps in the alignment
4. Adjust side chains to accommodate the new residues and loops
5. Energy-minimize the model
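Step 1 yields an aligned pair from which the identity and the gap runs (the loops to rebuild in step 3) can be read off. In this sketch the gap run in the known-structure row is written as eight characters so both rows have equal length, an assumption inferred from the sequence lengths; identity is counted over ungapped columns, one of several conventions. Function names are hypothetical.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity over columns where neither aligned row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs), len(pairs)

def gap_runs(row, gap="-"):
    """(start, end) index ranges of consecutive gaps: candidate loops to rebuild."""
    runs, start = [], None
    for i, ch in enumerate(row):
        if ch == gap and start is None:
            start = i
        elif ch != gap and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs

known = "ksedemkase--------dlkkhgatvltalg"
unknown = "kseddmrrseafgctytcdlrkhgntvltalg"
identity, columns = percent_identity(known, unknown)
print(f"{identity:.1f}% identity over {columns} columns; gaps at {gap_runs(known)}")
```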
Structure 3D Profile Method (or 3D-1D Method)
[Figure: 36 environments; data from a library of known structures (amino acid residues)]
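A heavily simplified sketch of 3D-1D scoring: each structural position is assigned an environment class, and a sequence is scored by summing environment-specific residue scores. Here there are only two hypothetical environments with invented scores (not the 36 environments of the method), and the sequence is slid along the profile without gaps, whereas real methods align with dynamic programming.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical environment-specific scores (illustrative only).
ENV_SCORES = {
    "buried": {aa: (1.0 if aa in "AILMFVW" else -1.0) for aa in AMINO_ACIDS},
    "exposed": {aa: (1.0 if aa in "DEKRNQSTH" else -0.5) for aa in AMINO_ACIDS},
}

def profile_score(environments, seq, table=ENV_SCORES):
    """Sum of scores for seq placed on the structure, one residue per position."""
    return sum(table[env][aa] for env, aa in zip(environments, seq))

def best_ungapped_fit(environments, seq, table=ENV_SCORES):
    """Slide seq along the profile without gaps; return (best offset, best score),
    or None if seq is shorter than the profile."""
    n = len(environments)
    best = None
    for offset in range(len(seq) - n + 1):
        s = profile_score(environments, seq[offset:offset + n], table)
        if best is None or s > best[1]:
            best = (offset, s)
    return best
```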
Threading Protein Structures
- Best bet is to compare with similar sequences that have known structures → threading
- Only works for proteins with >25% sequence similarity to a protein with known structure
- Current state of the art requires many days of computing on a dedicated workstation
- Some websites offer quick approximations
- Will improve as more 3-D structures are described
- Another aspect of the Genome Project
Monte Carlo Algorithm for 3D
- x: a set of atomic coordinates, or main-chain and side-chain torsion angles, of a protein
- E(x): conformational energy
- k is Boltzmann's constant; T is an effective temperature
- Metropolis algorithm:
1. Generate a random state x and calculate E(x)
2. Perturb x to x' to generate a neighbouring conformation
3. Calculate E(x')
4. If E(x') < E(x), accept x' as the new state (downhill). Otherwise accept x' with probability exp(-(E(x') - E(x))/kT) (uphill)
5. Return to step 2
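A minimal Python sketch of the Metropolis steps. The quadratic `toy_energy` is a hypothetical stand-in for a real conformational energy E(x), and kT, step size and step count are arbitrary choices; the sketch also keeps the lowest-energy state seen, which is common when Monte Carlo is used for structure search.

```python
import math
import random

def metropolis_accept(delta_e, kT, u):
    """Accept downhill moves always; accept uphill moves with probability exp(-delta_e / kT).
    u is a uniform random number in [0, 1)."""
    if delta_e <= 0:
        return True
    return u < math.exp(-delta_e / kT)

def toy_energy(x):
    """Hypothetical energy: a bowl with its minimum at x_i = 1.0 for every coordinate."""
    return sum((xi - 1.0) ** 2 for xi in x)

def metropolis(energy, x0, kT=0.01, step=0.1, n_steps=20000, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    e = energy(x)                                   # step 1: start state and E(x)
    best_x, best_e = list(x), e
    for _ in range(n_steps):                        # step 5: repeat from step 2
        trial = list(x)
        i = rng.randrange(len(trial))
        trial[i] += rng.gauss(0.0, step)            # step 2: perturb x -> x'
        e_trial = energy(trial)                     # step 3: E(x')
        if metropolis_accept(e_trial - e, kT, rng.random()):  # step 4
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = list(x), e
    return best_x, best_e
```

Raising kT lets the walk climb out of local minima more often at the cost of slower settling; simulated annealing lowers kT gradually during the run.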