Structure prediction presentation

Title: Structure prediction

1
Structure prediction

Why do we need structure prediction?
Intellectual
Practical
Levinthal's paradox
Anfinsen's experiments
How it will be solved
Physics
Computer science
The protein design problem

2
4 Basic Levels of Protein Structure and all
information is in the sequence
3
What is homology?
4
Structures change linearly with evolutionary distance (ED)
5
Homology

Similar sequence - similar structure?

6
Examples of similar structure
7
Zones
8
Homology can be detected by Sequence Alignment

Key aspect of sequence comparison is sequence
alignment
A sequence alignment maximizes the number of
positions that are in agreement in two sequences

9
Alignments

Local alignment
Global alignments

Global alignment
LGPSTKDFGKISESREFDN
LNQLERSFGKINM-RLEDA
Local alignment
----------FGKI----------
----------FGKI----------
10
Dotplots
11
Methods to align

Optimal alignment
Maximise similarities
Minimize gaps
Score of an alignment
Score for substitution
Gap-opening and gap-extension costs
Dynamic programming
Finds optimal solution
BLAST
Heuristic, fast algorithm using indexes (hashes)
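
The dynamic programming idea on this slide can be sketched in a few lines. This is a minimal illustration, not the scoring used by any real tool: it assumes a simple match/mismatch score instead of a substitution matrix, and a single linear gap penalty instead of separate gap-opening and gap-extension costs. The function names and default scores are chosen here for illustration only.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global (Needleman-Wunsch) score with a linear gap penalty."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # substitution
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal local (Smith-Waterman) score: scores never drop below zero."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

The only difference between the two functions is the floor at zero and taking the best cell anywhere in the table, which is what lets a local alignment ignore poorly matching flanks.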

12
When are two sequences homologous?
13
Identities do not provide the best similarity
14
Statistics of Sequence scores

Local alignments
Scores follow an extreme value distribution
Score depends on log(length)
E-value (or P-value)
Global alignments
Heuristics
Randomize sequences

15
Statistical comparison of alignment scores
16
How to improve alignments

Use more evolutionary information
Multiple alignments
Profiles
HMMs
Profile-profile alignments
Using additional information
Structure
Structural alignments

17
Multiple sequence alignments

Computationally intensive
Heuristic methods

18
Profiles can be used to detect distant homologs

Extra information
How to best use
Different methods
Patterns
Evolutionary method
Profile methods
HMMs
ANNs

19
PSI-BLAST in a nutshell

With a protein sequence as query, use BLAST to
search a protein sequence database.
Collapse significant local alignments (those with
E-value less than or equal to a set threshold h)
into a multiple alignment, using the residues of
the query sequence as alignment-column
placeholders.
Abstract a position-specific score matrix from
the multiple alignment.
Search the database with the score matrix as
query.
Iterate a fixed number of times, or until
convergence.
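
The step "abstract a position-specific score matrix" can be illustrated with a much simplified sketch. It assumes a uniform background frequency and a fixed pseudocount, and it ignores sequence weighting; real PSI-BLAST handles all three differently. The names `build_pssm` and `score_window` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_rows, pseudocount=1.0):
    """Log-odds score per column and residue from equal-length aligned rows."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for column in zip(*aligned_rows):
        # Gaps and unknown characters are simply not counted.
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_window(pssm, segment):
    """Score an ungapped segment of the same length as the matrix."""
    return sum(col.get(res, 0.0) for col, res in zip(pssm, segment))
```

Searching the database with the matrix then amounts to sliding `score_window` (or a gapped variant) along each database sequence instead of comparing against the single query.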

20
Protein structure prediction (and other uses for
molecules of life in a computer)

Secondary structure predictions
Homology detection
What is homology?
Why is it related to protein structure?
How does it work?
Simulations of folding
What is physics?
Realistic simulations (Folding@home)
Smart simulations (Rosetta@home)

21
It's not that simple...

Amino acid sequence contains all the information
for 3D structure (experiments of Anfinsen,
1970s)
But, there are thousands of atoms, rotatable
bonds, solvent and other molecules to deal
with...

22
All the 3D information is in the sequence
23
Levinthal Paradox

Cyrus Levinthal, Columbia University, 1968
Levinthal's paradox
If we have 3 rotamers per residue, a 100-residue
protein has 3^100 possible conformations.
Searching all of these would take longer than the
age of the universe, yet proteins fold in less
than a second.
Resolution: proteins have to fold through some
directed process
Goal is to understand the dynamics of this process
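
The arithmetic behind the paradox is easy to check. The sampling rate below (10^13 conformations per second) and the age of the universe (about 1.38 x 10^10 years) are assumptions added for this illustration; the slide itself only gives 3 states per residue and 100 residues.

```python
AGE_OF_UNIVERSE_YEARS = 1.38e10  # assumed round figure
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_search_years(n_residues=100, states_per_residue=3,
                           rate_per_second=1e13):
    """Return (number of conformations, years needed to visit them all)."""
    conformations = states_per_residue ** n_residues
    seconds = conformations / rate_per_second  # assumed sampling rate
    return conformations, seconds / SECONDS_PER_YEAR


conformations, years = levinthal_search_years()
print(f"{conformations:.2e} conformations, {years:.2e} years to search")
print(f"ratio to age of universe: {years / AGE_OF_UNIVERSE_YEARS:.2e}")
```

Even with a far more generous sampling rate, the exhaustive search stays many orders of magnitude out of reach, which is the point of the paradox.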

24
Old vs. New Views of folding

Old
Hierarchical view of protein folding
Secondary structures form, then interact to form
tertiary structures
General order of events
New
Statistical ensembles of states
Potential energy landscape
Folding Funnel

25
Two alternatives for structure prediction

Simulation of protein folding
Folding@home (Erik next week)
Identification of lowest energy structure
More successful (today)
Several layers
Secondary structure
3D-structure

26
Secondary structure prediction

AA preferences for different SS
Pro
Has no backbone NH group
Cannot donate a backbone H-bond
Prefers coils
Also found in the N-terminal part of helices and
in beta-turns
Gly (compared with Ala)
No sidechain on Gly (more flexible)
Polar groups in loops
Additional H-bonds to backbone

27
Amino acid preferences in coil
28
Amino acid preferences in β-Strand
29
Secondary structure preferences

Cβ-branched AAs prefer sheets
Entropic cost in helices of sidechain rotations
Hydrophobic groups prefer SS-elements
Negatively charged residues at the N-terminal end
of helices due to the dipole effect.

30
Amino acid preferences in β-Strand
31
Amino acid preferences in α-Helix
32
Secondary structure preferences

Cβ-branched AAs prefer sheets
Entropic cost in helices of sidechain rotations
Hydrophobic groups in SS-elements
Polar in loops
Negatively charged residues at the N-terminal end
of helices due to the dipole effect.
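
Preferences like these are usually quantified as propensities: how often a residue type occurs in a given state relative to how often that state occurs overall. The sketch below computes that ratio from a sequence and a per-residue state string. The input format and the function name are assumptions for illustration, and real propensity tables are derived from large structure databases, not from a single chain.

```python
from collections import Counter


def propensities(sequence, states):
    """Propensity of each (residue, state) pair.

    propensity = P(state | residue) / P(state); values above 1 mean the
    residue is over-represented in that state.
    """
    if len(sequence) != len(states):
        raise ValueError("sequence and states must have the same length")
    pair_counts = Counter(zip(sequence, states))
    residue_counts = Counter(sequence)
    state_counts = Counter(states)
    n = len(sequence)
    return {
        (res, st): (pair_counts[(res, st)] / residue_counts[res])
                   / (state_counts[st] / n)
        for res in residue_counts
        for st in state_counts
    }
```

For example, `propensities("AAGG", "HHCC")` gives 2.0 for `("A", "H")` and 0.0 for `("A", "C")` in this toy input.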

33
Templates for helix, loops and sheet
34
More elaborate templates

Key residues
Gly in turns

35
Incorporating globular effects

Hydrophobic lake

36
Example of SS predictions
37
PHD (Rost and Sander, 1994)
38
PHD-Input
39
PHD-architecture
40
PHD-predictions
41
PHD summary

First method with Q3 > 70%
Correct length distributions
Much better beta strand predictions
Good correlation between score and accuracy
Better predictions for larger multiple sequence
alignments
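
Q3 is the percentage of residues whose three-state label (helix, strand, coil) is predicted correctly. A minimal sketch, assuming both predictions and observations are given as equal-length strings of state letters:

```python
def q3(predicted, observed):
    """Percentage of residues with the correct three-state label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

With `q3("HHHEECCC", "HHHEEECC")`, seven of eight positions agree, so the result is 87.5.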

42
Threading

A priori prediction of the Interferon fold in
1985
Good prediction of helices

43
Prediction of interferon fold
44
FR methodologies
45
3D profiles
46
Threading or Fold recognition
VIFVLWGNAARQKCN LLFQTKHQHAVLACPH
47
PROSA/THREADER
48
How good is FR?

LiveBench and CASP measure performance.
E-values work reasonably well.
In the real world, you might get a few percent
more hits with FR compared to PSI-BLAST.
Individual researcher vs. genome-wide analysis
Structure information not necessary?

49
Success of FR
50
Alignments are not always perfect
51
Does threading really work?

Evolutionary methods work better
Secondary Structure Predictions might help

52
AB-INITIO methods

Simulate the process of folding
Folding@home: MD simulations of small peptides
Find the lowest energy structure
Does not simulate the folding process.
Consequences of small energy gap
Unrealistic to model exactly
Easier to distinguish between Correct/Incorrect
than between Folded/non-folded.

53
How Rosetta Works

Minimize energy in the folded state
Uses a combination of energy formulas based on
the likelihood of particular structures, and the
fitness of the sequence
Side-chains simplified to a centroid located at
center of mass of the side-chain
Average of observed side-chain centroids in known
structures
Local sequence does not decide the local
structure, it only biases the decision
Non-local favorable conditions
Buried hydrophobic fragments
Paired β-strands
Specific side-chain interactions

54
Rosetta: clustering the models

Compare models to each other with RMSD
Models can come from different family members
Cutoff varied to give 80-100 members in largest
cluster
The largest clusters are assumed to contain the
best structures (attractors in folding space...?)
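
The clustering step can be sketched as follows: compute pairwise RMSD, then pick the model with the most neighbours within a cutoff. This is a simplified illustration, not Rosetta's code. It assumes all models are already superimposed and have the same atoms in the same order, so no optimal superposition is performed, and the cutoff is passed in by the caller instead of being tuned to a target cluster size.

```python
import math


def rmsd(model_a, model_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates."""
    if len(model_a) != len(model_b):
        raise ValueError("models must have the same number of atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(model_a, model_b))
    return math.sqrt(squared / len(model_a))


def largest_cluster(models, cutoff):
    """Return (index of cluster centre, indices of its members)."""
    best_center, best_members = None, []
    for i, center in enumerate(models):
        members = [j for j, other in enumerate(models)
                   if rmsd(center, other) <= cutoff]
        if len(members) > len(best_members):
            best_center, best_members = i, members
    return best_center, best_members
```

To mimic the slide's rule, one could call `largest_cluster` repeatedly with different cutoffs until the returned member list falls in the desired size range.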

55
Recent improvements to Rosetta

Refinement in HR Rosetta
1. Make small dihedral changes
2. Rebuild sidechains
3. Minimize (in dihedral space)
4. Evaluate energy
5. Go to 1
5 out of 16 small proteins < 1.5 Å

56
Physics of Rosetta

Is Rosetta physical?
What are the most important terms in globular and
local free energies?
How do proteins really fold?
What do you think?

57
Designs

Molten globule designs
Regan 50

58
Design of a four-helix bundle (DeGrado, 1991)
Molten Globule
59
What characterizes a molten globule

Compact
Good secondary structure
Not solid
Sidechains not packed
No cooperative folding

60
Mayo method

Automatic design
1. Take fold (backbone)
2. Take sequence (random)
3. Mutate sequence
4. Build sidechains
5. Calculate energy
6. Accept/reject
7. Go to 3
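
The loop on this slide has the shape of a Monte Carlo search over sequences for a fixed backbone. The sketch below follows that shape only. The "backbone" is reduced to a string of buried (B) and exposed (E) positions, the energy is a toy hydrophobic/polar rule invented for this example, and the side-chain building step is omitted; none of this is the energy function or search procedure actually used by Dahiyat and Mayo.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")  # assumption: crude hydrophobic set


def toy_energy(sequence, burial):
    """Toy score: -1 for hydrophobic-in-core or polar-on-surface, else +1."""
    energy = 0
    for res, site in zip(sequence, burial):
        favourable = (res in HYDROPHOBIC) == (site == "B")
        energy += -1 if favourable else 1
    return energy


def design(burial, steps=2000, temperature=1.0, seed=0):
    """Metropolis search over sequences; returns (best sequence, energy)."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in burial]      # take random sequence
    energy = toy_energy(seq, burial)
    best_seq, best_energy = list(seq), energy
    for _ in range(steps):
        pos = rng.randrange(len(seq))                    # mutate sequence
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        # A real method would rebuild side chains here; omitted in this toy.
        new_energy = toy_energy(seq, burial)             # calculate energy
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy                          # accept
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
        else:
            seq[pos] = old                               # reject
    return "".join(best_seq), best_energy
```

Calling `design("BBEEBBEE")` returns a sequence whose hydrophobic residues tend to sit at the B positions; the fixed seed only makes the run reproducible.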

61
Designing a non-zinc finger
62
Design of a non-zinc finger (Dahiyat and Mayo)
63
Alfabetin
64
Non-MG alfabetin shows cooperative folding
65
TOP7 (Rosetta Design)

Novel fold
Iterate between design and refinement
Non-molten-globule behavior.

66
Iteration between Seq and Str
Sequence Structure
67
The project

Three goals
Learn how to develop a (binary) predictor
Read the background literature
Write a scientific report about your work
Write a program that can do all of this.
Additional goal (for top grades)
Make a web-server
Combine your predictor into a full system

68
Tools

Python (or other language)
Write scripts to do the work
svmlight
Used in the bioinformatics course
Preparsed datafiles
Annotations
PSIBLAST needs to be parsed
Evaluation programs should be developed.
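
One common way to prepare input for svmlight is a sliding window over the sequence with one binary feature per (window position, amino acid) pair. The sketch below writes lines in svmlight's sparse `label index:value` format. The window size, the feature numbering and the function names are choices made for this example, not requirements of the course.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_features(sequence, center, window=3):
    """1-based indices of the active one-hot features for one residue."""
    half = window // 2
    features = []
    for offset in range(-half, half + 1):
        pos = center + offset
        # Positions outside the sequence or unknown residues stay all-zero.
        if 0 <= pos < len(sequence) and sequence[pos] in AMINO_ACIDS:
            slot = offset + half
            features.append(slot * len(AMINO_ACIDS)
                            + AMINO_ACIDS.index(sequence[pos]) + 1)
    return features  # ascending, as svmlight expects


def svmlight_line(is_positive, features):
    """Format one training example for a binary classifier."""
    label = "+1" if is_positive else "-1"
    return label + " " + " ".join(f"{index}:1" for index in features)
```

For the middle residue of `"ACD"` with a window of 3, the active features are 1, 22 and 43, giving the line `+1 1:1 22:1 43:1` for a positive example.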

69
The projects

Binary classifier
Alpha-helix/other, etc.
Surface area
Membrane/non-membrane
Globular or membrane datasets

70
The program

Input
Sequence in fasta file
Output
A prediction for each residue in the sequence
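
A skeleton for such a program might look like the sketch below: read FASTA records, then emit one prediction character per residue. The `classify` argument stands in for whatever trained predictor is used; the rule shown in the usage note is a placeholder with no predictive value.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def predict_residues(sequence, classify):
    """Apply classify(sequence, index) to every residue; one char each."""
    return "".join(classify(sequence, i) for i in range(len(sequence)))
```

A usage example with a placeholder rule: `predict_residues("AGE", lambda s, i: "H" if s[i] in "AELM" else "-")` returns `"H-H"`. In a real run, `read_fasta(open(path))` would supply the sequences and the lambda would be replaced by the trained classifier.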

71
Web-server

For top grades (A,B) a web-server should be
developed, using the following steps.
Learn how to use PHP
Ask for an account on a web-server.
Use the template index.php available from the
web page.

72
The report

The following sections
Abstract
Introduction
Methods
Results and Discussion
Conclusions
References
More info May 8
