Protein Structure Prediction - PowerPoint PPT Presentation

1
Protein Structure Prediction

Luonan Chen

2
Protein Sequence Analysis

Molecular properties (pH, mol. wt., isoelectric
point, hydrophobicity)
Secondary Structure
Super-secondary (signal peptide, coiled-coil,
trans-membrane, etc.)
3-D prediction, Threading (tertiary structure)
Domains, motifs, etc.
Subunit (quaternary structure)

3
Self-assembly

Proteins self-assemble in solution
All of the information necessary to determine the
complex 3-D structure is in the amino acid
sequences
Structure determines function
Lock-and-key model of enzyme function
Know the sequence, know the function?
Nearly infinite complexity

4
Structure of Peptide
N-terminal
C-terminal
Peptide
Backbone C-N-Cα-C: dihedral (torsion) angles (φ, ψ)
Instead of 9 variables, use 2 variables (φ, ψ) for
each AA; ω ≈ 180° (C=O and N-H), because the
peptide bond is planar (stable resonance)
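
As a rough illustration of the torsion angles named on this slide, the sketch below computes a signed dihedral angle from four points. The function name `dihedral` and the test coordinates are illustrative assumptions, not part of the original slides; for φ the four points would be C(i-1), N, Cα, C, and for ψ they would be N, Cα, C, N(i+1).

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane p0-p1-p2
    n2 = _cross(b2, b3)          # normal of the plane p1-p2-p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (hypothetical): the central bond lies along the z axis.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, trans)
```

Describing each residue by two such angles instead of nine Cartesian coordinates is what makes the (φ, ψ) representation compact.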
5
Structure prediction

Protein Structure prediction is the Holy Grail
of bioinformatics
Since structure determines function, structure
prediction should allow protein design, design of
inhibitors, etc.
Huge amounts of genome data - what are the
functions of all of these proteins?

6
Chemical Properties of Proteins

Proteins are linear polymers of 20 amino acids
Chemical properties of the protein are determined
by its amino acids
Molecular wt., pH, isoelectric point are simple
calculations from amino acid composition
Hydrophobicity is a property of groups of amino
acids - best examined as a graph
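
A minimal sketch of the "simple calculation" for molecular weight: sum the average residue masses and add one water for the free termini. The table below uses commonly tabulated average residue masses rounded to two decimals; treat the values and the function name `molecular_weight` as assumptions to be checked against a reference table before any real use.

```python
# Average residue masses in daltons (amino acid minus one water), rounded.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # one water is added back for the N- and C-terminal groups

def molecular_weight(sequence):
    """Approximate average molecular weight of a peptide, in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER

print(round(molecular_weight("GA"), 2))
```

Unknown letters raise a `KeyError` here, which is a deliberate choice for a sketch: silently skipping them would understate the mass.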

7
(Increase local flexibility)
(Increase stability)
8
Terminology

Active site, Blocks, Core, Fold
Domain, Motif
Family, superfamily
Module
Class
Primary, Secondary, Tertiary, Quaternary

9
Secondary Structure

Protein 2ndary structure takes one of three
forms
α helix
β sheet
Turn, coil or loop
2ndary structure elements are tightly packed in the
protein core in a hydrophobic environment
2ndary structure is predicted within a small
window
Many different algorithms, not highly accurate
Better predictions from a multiple alignment
Methods: neural networks, nearest-neighbor
method, HMM
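
To make "predicted within a small window" concrete, here is a sketch of how a sequence might be cut into fixed-size windows and one-hot encoded before being passed to a classifier such as a neural network. The window size, the padding character, and the names `windows` and `one_hot` are illustrative assumptions; the slides do not specify an encoding.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def windows(sequence, size=13, pad="-"):
    """Return one window per residue, centred on it (size must be odd)."""
    half = size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]

def one_hot(window):
    """Encode a window as 20 binary inputs per position (padding is all zeros)."""
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector

for w in windows("ACDEF", size=3):
    print(w, sum(one_hot(w)))
```

Each window would be labelled with the state (helix, sheet, or coil) of its central residue when training a predictor.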

10
3-D Structure of Protein
Right-handed turn (most), 3.6 residues per turn,
φ ≈ -60°, ψ ≈ -40° on average
Turn or coil
Antiparallel and parallel
Alpha-helix
Beta-sheet
Loop
Loop and Turn
11
Neural Networks for 2ndary Structure
12
Protein Structure Classification

Class α: a bundle of α helices connected by loops
on the surface of the protein
Class β: antiparallel β sheets
Class α/β: mainly parallel β sheets with
intervening α helices
Class α+β: mainly segregated α helices and
antiparallel β sheets
Multidomain proteins comprise domains
representing more than one of the above 4 classes
Membrane and cell-surface proteins: α helices
(hydrophobic) with a particular length range,
traversing a membrane

13
Class β
Class α
Class α+β
Class α/β
membrane
Membrane proteins
14
Structure Prediction on the Web

Secondary Structural Content Prediction (SSCP),
EMBL, Heidelberg
http://www.bork.embl-heidelberg.de/SSCP/sscp_seq.html
BCM Search Launcher: Protein Secondary Structure
Prediction, Baylor College of Medicine
http://dot.imgen.bcm.tmc.edu:9331/seq-search/struc-predict.html
PREDATOR, EMBL, Heidelberg
http://www.embl-heidelberg.de/cgi/predator_serv.pl
UCLA-DOE Protein Fold Recognition Server
http://www.doe-mbi.ucla.edu/people/fischer/TEST/getsequence.html

15
Super-secondary Structure

Common structural motifs
Membrane spanning
Signal peptide
Coiled coil
Helix-turn-helix

16
Hydrophobicity Profile for 2ndary Structure
(positions of turns between 2ndary structure,
exposed and buried residues, membrane-spanning
segments, antigenic sites)
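
A hydrophobicity profile is usually a sliding-window average of per-residue scores. The sketch below uses the Kyte-Doolittle hydropathy scale; the choice of scale, the window length, and the name `hydropathy_profile` are assumptions, since the slides do not say which scale was plotted.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=7):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

# Toy sequence: a hydrophobic stretch followed by a charged stretch.
for position, value in enumerate(hydropathy_profile("IIIRRR", window=3)):
    print(position, round(value, 2))
```

Plotting the values against position gives the graph the slide refers to; long runs of high values are the usual hint for membrane-spanning segments, and wider windows (around 19 residues) are commonly used for that purpose.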
17
3-D Structure

Cannot be accurately predicted from sequence
alone (known as ab initio)
Levinthal's paradox: a 100 aa protein has 3^200
possible backbone configurations - many orders of
magnitude beyond the capacity of the fastest
computers
There are perhaps only a few hundred basic
structures, but we don't yet have this vocabulary
or the ability to recognize variants on a theme
Methods: HMM, structure profile method, contact
potential method, threading method,
conformational energy (Monte Carlo algorithm)
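
The size of the number in Levinthal's paradox is easy to check. The sketch below assumes the usual reading of the estimate, 3 states for each of the 2 backbone angles of each of 100 residues, and a purely hypothetical search rate of 10^15 conformations per second; neither assumption is stated on the slide.

```python
STATES_PER_ANGLE = 3       # assumed discretisation of each torsion angle
ANGLES_PER_RESIDUE = 2     # phi and psi
RESIDUES = 100

conformations = STATES_PER_ANGLE ** (ANGLES_PER_RESIDUE * RESIDUES)

RATE_PER_SECOND = 10 ** 15             # hypothetical, very optimistic
SECONDS_PER_YEAR = 365 * 24 * 3600
years_needed = conformations // (RATE_PER_SECOND * SECONDS_PER_YEAR)

print(f"{conformations:.2e} conformations")
print(f"about 10^{len(str(years_needed)) - 1} years of exhaustive search")
```

Even with a rate far beyond any real machine, exhaustive enumeration is out of reach, which is why the methods listed above all restrict or guide the search.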

18
Procedure of Prediction
No
Database similarity search
Align Known structure
sequence
Family analysis
Yes
Relationship to known structure
Predict 3D structure
3D comparative modeling
Yes
No
3D structural Analysis in Lab
19
Hidden Markov Models for 2D and 3D

Hidden Markov Models (HMMs) are a more
sophisticated form of profile analysis.
Rather than build a table of amino acid
frequencies at each position, they model the
transition from one amino acid to the next.
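
As a toy illustration of how an HMM can label a sequence, the sketch below runs the Viterbi algorithm on a two-state model (H for helix, C for coil). The states, the four-letter alphabet, and every probability are made up for the example; a real model would have many more states and parameters estimated from known structures.

```python
import math

def viterbi(observations, states, start, trans, emit):
    """Most probable state path for the observations (log-space Viterbi)."""
    scores = [{s: math.log(start[s]) + math.log(emit[s][observations[0]])
               for s in states}]
    backpointers = []
    for symbol in observations[1:]:
        row, pointers = {}, {}
        for s in states:
            # Best previous state to arrive in s from.
            prev = max(states, key=lambda p: scores[-1][p] + math.log(trans[p][s]))
            row[s] = (scores[-1][prev] + math.log(trans[prev][s])
                      + math.log(emit[s][symbol]))
            pointers[s] = prev
        scores.append(row)
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))

# Hypothetical parameters: helix favours A and L, coil favours G and P.
STATES = ("H", "C")
START = {"H": 0.5, "C": 0.5}
TRANS = {"H": {"H": 0.9, "C": 0.1}, "C": {"H": 0.1, "C": 0.9}}
EMIT = {
    "H": {"A": 0.4, "L": 0.4, "G": 0.1, "P": 0.1},
    "C": {"A": 0.1, "L": 0.1, "G": 0.4, "P": 0.4},
}

print(viterbi("AALLGGPP", STATES, START, TRANS, EMIT))
```

The transition probabilities are what distinguish this from a plain per-position frequency table: staying in a state is cheap and switching is costly, so isolated residues do not flip the label.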

20
(No Transcript)
21
Homology Modeling
If two proteins show sufficient sequence similarity, it
essentially guarantees that they adopt the same
structure.
Safe thresholds:
>50% identity over 25 residues
>30% identity over 50 residues
>25% identity over 80 residues or more
If one of the two similar proteins has a known
structure, one can build a rough model of the
protein of unknown structure.
Quality of the model diminishes with lower
sequence identity.
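
The thresholds on this slide can be checked mechanically once two sequences are aligned. The sketch below is one possible reading: identity is counted over aligned, gap-free columns, and an alignment is "safe" if it meets any one of the three length and identity pairs. The function names and the toy alignment are assumptions for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (% identity, aligned length), ignoring columns with a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0, 0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns), len(columns)

# (minimum aligned length, minimum % identity), taken from the slide.
SAFE_THRESHOLDS = [(25, 50.0), (50, 30.0), (80, 25.0)]

def in_safe_zone(identity, length):
    """True if the alignment satisfies at least one threshold pair."""
    return any(length >= min_len and identity > min_pct
               for min_len, min_pct in SAFE_THRESHOLDS)

identity, length = percent_identity("ACDE-F", "ACDQGF")  # toy alignment
print(identity, length, in_safe_zone(identity, length))
```

The toy alignment is 80% identical but far too short to qualify, which is the point of pairing each identity cut-off with a minimum length.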
22
Steps in Homology Modeling
1. Do sequence alignment with protein of known
structure
Known Structure:   ksedemkase- - - -dlkkhgatvltalg
Unknown Structure: kseddmrrseafgctytcdlrkhgntvltalg
2. Replace any side chains that are different in
the homolog (green side chains)
3. Rebuild loops where there are gaps in the
alignment

4. Adjust side-chains to accommodate the new
residues and loops
5. Energy minimize

23
Structure 3D Profile Method (or 3D-1D method)
36 environments
Data from known library
(AA Residues)
24
Threading Protein Structures

Best bet is to compare with similar sequences
that have known structures >> Threading
Only works for proteins with >25% sequence
similarity to a protein with known structure
Current state of the art requires many days of
computing on a dedicated workstation
Some websites offer quick approximations
Will improve as more 3-D structures are described
Another aspect of the Genome Project

25
Monte Carlo Algorithm for 3D

x: set of atomic coordinates or
mainchain-sidechain torsion angles of a protein
E(x): conformation energy
k is Boltzmann's constant; T
is an effective temperature
Metropolis Algorithm
1. Generate a random state x, calculate E(x)
2. Perturb x: x → x', to generate a neighbouring
conformation
3. Calculate E(x')
4. If E(x) > E(x'), accept x' as a new state
(downhill). Otherwise accept x' with
probability exp(-(E(x')-E(x))/kT) (uphill)
5. Return to 2
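
The Metropolis steps can be sketched in a few lines. To keep the example runnable, the "conformation" here is a single number and the energy is a toy double-well function; the energy function, step size, kT value, and the name `metropolis` are all assumptions, and a real application would use a vector of torsion angles with a molecular force field.

```python
import math
import random

def toy_energy(x):
    """Hypothetical 1-D double-well energy with its deeper minimum near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def metropolis(energy, x0, step=0.5, kT=1.0, n_steps=10000, seed=0):
    """Metropolis sampling; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)                       # step 1: initial state
    best_x, best_e = x, e
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)    # step 2: perturb x -> x'
        e_new = energy(x_new)                   # step 3: E(x')
        # Step 4: always accept downhill; accept uphill with Boltzmann probability.
        if e_new < e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e                       # step 5 is the loop itself

best_x, best_e = metropolis(toy_energy, x0=2.0)
print(round(best_x, 2), round(best_e, 2))
```

Accepting some uphill moves is what lets the search leave the shallower well on the right and find the deeper one on the left; lowering kT over time turns the same loop into simulated annealing.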