Proteiinianalyysi 7 - PowerPoint PPT Presentation

Slides: 82
Provided by: luh8
Transcript and Presenter's Notes

1
Proteiinianalyysi 7 (Protein Analysis 7)
  • Prediction of three-dimensional structure
  • http://www.bioinfo.biocenter.helsinki.fi/downloads/teaching/spring2006/proteiinianalyysi

2
From sequence to structure
  • comparative modelling
  • prediction of one-dimensional state (class) from
    sequence
  • recognition of three-dimensional structure from a given
    library (fold recognition)
  • ab initio prediction of three-dimensional structure

3
Motivation
  • Protein structure determines protein function
  • For the majority of proteins the structure is not
    known

4
  • Curve fitted to data for homologous families
  • Divergence of common cores: the fraction of residues in the
    core decreases with increasing sequence divergence

Chothia & Lesk (1986)
5
Steps in comparative modelling
  • Find suitable template(s)
  • Build alignment between target and template(s)
  • Build model(s)
  • Replace sidechains
  • Resolve conflicts in the structure
  • Model loops (regions without an alignment)
  • Evaluate and select model(s)

6
State of the art in homology modelling
  • Template search: (iterative) sequence database searches
    (PSI-BLAST)
  • Alignment step: multiple alignment of close to fairly
    distant homologues
  • Modelling step: rigid body assembly, segment matching, or
    satisfaction of spatial constraints

7
An alignment defines structurally equivalent
positions!
Template structure
Template sequence
Alignment
Target sequence
Model
8
The crucial importance of the alignment
Template sequence
Template structure
Alignment
Target sequence
Model
9
Modelling by spatial restraints
  • Generate many constraints
  • Homology-derived constraints: distances and angles between
    aligned positions should be similar
  • Stereochemical constraints: bond lengths, bond angles,
    dihedral angles, nonbonded atom-atom contacts
  • Model derived by minimizing restraint violations

Modeller: Šali & Blundell (1993)
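
As a rough illustration of deriving a model by minimizing restraint violations, the sketch below moves three points in the plane until their pairwise distances match target values, using gradient descent on the sum of squared violations. It is a toy under stated assumptions (2D points, harmonic penalties, a fixed step size, invented target distances, hypothetical function names); it is not MODELLER's objective function or optimizer.

```python
import math

def restraint_energy(coords, restraints):
    """Sum of squared violations of the target distances."""
    total = 0.0
    for i, j, target in restraints:
        total += (math.dist(coords[i], coords[j]) - target) ** 2
    return total

def minimize(coords, restraints, step=0.05, n_iter=2000):
    """Plain gradient descent on restraint_energy (illustrative only)."""
    coords = [list(p) for p in coords]
    for _ in range(n_iter):
        grads = [[0.0, 0.0] for _ in coords]
        for i, j, target in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction undefined for coincident points
            factor = 2.0 * (d - target) / d
            for k in range(2):
                g = factor * (coords[i][k] - coords[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        for p, g in zip(coords, grads):
            for k in range(2):
                p[k] -= step * g[k]
    return coords

# Invented example: three points and three target distances (i, j, distance).
start = [(0.0, 0.0), (1.0, 0.2), (0.3, 1.0)]
restraints = [(0, 1, 3.8), (1, 2, 3.8), (0, 2, 5.5)]
model = minimize(start, restraints)
```

In a real modelling program the restraints number in the thousands and mix distance, angle, and stereochemical terms, but the principle of driving a combined violation score downhill is the same.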
10
Loop modelling
  • Exposed loop regions usually more variable than
    protein core
  • Often very important for protein function
  • Loops longer than 5 residues are difficult to build
  • Mini-protein folding problem

11
Model evaluation
  • Check of stereochemistry: bond lengths and angles, peptide
    bond planarity, side-chain ring planarity, chirality,
    torsion angles, clashes
  • Check of spatial features: hydrophobic core, solvent
    accessibility, distribution of charged groups,
    atom-atom distances, atomic volumes, main-chain
    hydrogen bonding
  • 3D profiles/mean force potentials: residue environment

12
Knowledge-based mean force potentials
  • Compute typical atomic/residue environments based
    on known protein structures

Melo & Feytmans (1997)
13
Modelling a transcription factor
  • Sequence from different species
  • Is binding to ligand conserved?

14
Ligand binding domain
hydrogen bonds to ligand
homo-serine lactone moiety binding
acyl moiety binding
15
DNA binding domain
DNA binding domain
Linker
16
New Loop
Template
Target
Variable loops
MODELLER output
17
Ligand binding pocket
18
Errors in comparative modelling
  1. Side chain packing
  2. Distortions and shifts
  3. Loops
  4. Misalignments
  5. Incorrect template

True structure
Template
Model
Marti-Renom et al. (2000)
19
Modelling accuracy
Marti-Renom et al. (2000)
20
Applications of homology modelling
Marti-Renom et al. (2000)
21
Structural genomics
  • Post-genomics: many new sequences, no function
  • Aim: a structure for every protein
  • High-throughput structure determination: robotics,
    standard protocols for cloning/expression/crystallization

22
Structural coverage
high quality models
Complete models
Total 43
Vitkup et al. (2001)
23
Target selection
24
Fold recognition - Assumption
  • Native structure is the global minimum energy
    conformation
  • So we need a discriminating energy function and a
    conformation generator
  • Conformation generators: backbone from a homologous
    template (comparative modelling), backbone from an
    analogous template (fold recognition), or comprehensive
    sampling (ab initio)

25
Fold recognition steps
  • Template library: known structures from the Protein Data
    Bank; fold classification suggests a limited number of
    fold types
  • Score sequence-structure fitness: environmental
    preferences of amino acids (Boltzmann engine)
  • Search problem: alignment, complicated by pair potentials
  • Significance of best score in database search: reference
    state

26
(No Transcript)
27
(No Transcript)
28
Potentials of mean force
  • Boltzmann engine
  • In thermodynamic equilibrium, particles are
    partitioned between states proportionally to
    exp(-ΔG), with ΔG in units of kT
  • Effective energy = negative logarithm of the
    equilibrium constant
  • Count occurrences per state
  • Radial distribution of amino acid pairs (Sippl)
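
The "count occurrences per state" idea can be sketched as an inverse Boltzmann calculation: the effective energy of a state is the negative logarithm of its observed frequency relative to a reference frequency. The counts, the distance bins, and the pseudocount below are invented for illustration, and the function name is hypothetical; this is not Sippl's actual parameterization.

```python
import math

def mean_force_potential(observed, reference, pseudocount=1.0):
    """Effective energy per state, in units of kT:
    E(state) = -ln(f_observed(state) / f_reference(state))."""
    n_states = len(observed)
    n_obs = sum(observed.values()) + pseudocount * n_states
    n_ref = sum(reference.values()) + pseudocount * n_states
    potential = {}
    for state in observed:
        f_obs = (observed[state] + pseudocount) / n_obs
        f_ref = (reference[state] + pseudocount) / n_ref
        potential[state] = -math.log(f_obs / f_ref)
    return potential

# Invented counts: distance bins (in Å) for one residue pair type versus
# the same bins counted over all residue pairs (the reference state).
observed = {"0-4": 30, "4-8": 50, "8-12": 20}
reference = {"0-4": 1000, "4-8": 5000, "8-12": 4000}
potential = mean_force_potential(observed, reference)
```

States that are over-represented relative to the reference get a negative (favourable) energy, and under-represented states get a positive one.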

29
Structural environment
  • Single-residue preferences: 20 x 3 x 3 x 3
  • Helix, strand, coil
  • Accessibility
  • Contact area (indirectly codes for aa type)
  • Contact pair potentials
  • Atomic contacts within 4 Å
  • C-beta atoms within 7 Å
  • Secondary structure of residues i and j
  • 3 x 20 x 3 x 20 = 3600 preferences

30
Information content
Arg-Asp helix-helix (dashed); Arg-Asp strand-strand (solid);
Arg-Asp (dotted)
31
Threading algorithms
  • Dynamic programming
  • Simple
  • frozen approximation
  • Read sequence-dependent environment from template
    (1st round), then from aligned target sequence
  • Stochastic optimization (Monte Carlo)
  • Pair potentials
  • Exhaustive search
  • Simplify search space (e.g., ignore loops)
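
The dynamic programming option can be sketched as a global alignment of the target sequence against the structural environments of template positions, using single-residue scores only. Everything specific here is an assumption made for illustration: the two-letter environment alphabet (B = buried, E = exposed), the toy score, the linear gap penalty, and the function names.

```python
def thread(sequence, environments, score, gap=-1.0):
    """Best global alignment score of a sequence onto template positions."""
    n, m = len(sequence), len(environments)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap
    for j in range(1, m + 1):
        best[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                # residue i placed at template position j
                best[i - 1][j - 1] + score(sequence[i - 1], environments[j - 1]),
                best[i - 1][j] + gap,   # residue i left unaligned
                best[i][j - 1] + gap,   # template position j skipped
            )
    return best[n][m]

HYDROPHOBIC = set("AVILMFWC")

def toy_score(residue, environment):
    """Hypothetical score: hydrophobic residues prefer buried positions."""
    buried = environment == "B"
    return 1.0 if (residue in HYDROPHOBIC) == buried else -1.0

template_envs = "BBEEBBEE"
good = thread("VLKDIAEK", template_envs, toy_score)  # pattern fits the template
bad = thread("KDVLEKIA", template_envs, toy_score)   # pattern is out of phase
```

Because each position is scored independently of its neighbours, the optimum is found exactly; pair potentials break this independence, which is why they complicate the search.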

32
Prospect model (Xu & Xu)
$E_{total} = v_{mutate} E_{mutate} + v_{single} E_{single} + v_{pair} E_{pair} + v_{gap} E_{gap}$
Weights $v$ optimized on training set
33
Prospect - segmentation
  • Finds optimal threading fairly efficiently
  • Topological complexity
  • No gaps in secondary structure elements
  • Pair energy term only evaluated between secondary
    structure elements

34
Prospect - observations
  • Mutation energy is the most important
  • Single-residue terms with profile information
    generate reasonably good alignments for 2/3 of
    test cases
  • The pairwise energy term can thus be ignored
    during the search for optimal alignment, but is
    used in evaluating the fold recognition

35
Performance comparison
Method       Family only      Superfamily      Fold only
             Top 1   Top 5    Top 1   Top 5    Top 1   Top 5
Using pair potential
PROSPECT     84.1    88.2     52.6    64.8     27.7    50.3
Using dynamic programming, structural environment
FUGUE        82.2    85.8     41.9    53.2     12.5    26.8
THREADER     49.2    58.9     10.8    24.7     14.6    37.7
Using sequence similarity only
PSI-BLAST    71.2    72.3     27.4    27.9      4.0     4.7
HMMER        67.7    73.5     20.7    31.3      4.4    14.6
SAMT98       70.1    75.4     28.3    38.9      3.4    18.7
BLASTLINK    74.6    78.9     29.3    40.6      6.9    16.5
SSEARCH      68.6    75.7     20.7    32.5      5.6    15.6
36
Threading score - significance
  • Target sequence fold library
  • Each threading aligns a different sub-sequence
  • Compute Z-score for each by ungapped threading on
    large decoy (Sippl)
  • Reverse threading
  • Design optimal sequence for a given fold

37
Incorrect self-threading
38
(No Transcript)
39
(No Transcript)
40
(No Transcript)
41
(No Transcript)
42
Fold recognized
43
Fold recognized Poor alignment of residues
44
(No Transcript)
45
Ab initio prediction
  • HMMSTR/I-sites/Rosetta. HMMSTR is a hidden Markov
    model based on protein STRucture. Each Markov
    state in this model represents a position in one
    of the I-sites motifs. HMMSTR can predict local
    structure (as backbone angles), secondary
    structure, and supersecondary structure (edge
    versus middle strand, hairpin versus diverging
    turn).
  • I-sites Library. I-sites is a library of folding
    initiation site motifs, which are sequence motifs
    that correlate with particular local structures
    such as beta hairpins and helix caps. I-sites can
    be used to predict local structure, or to predict
    which parts of a protein are likely to fold
    early, initiating folding.

46
Intermediates are not observed, but
Folding is 2-state
Unfolded
Folded
47
Nucleation sites
something happens first...
48
Early folding events might be recorded in the
database
Short, recurrent sequence patterns could be
folding Initiation sites
recurrent part
HDFPIEGGDSPMQTIFFWSNANAKLSHGY
CPYDNIWMQTIFFNQSAAVYSVLHLIFLT IDMNPQGSIEMQTIFFGYA
ESAELSPVVNFLEEMQTIFFISGFTQTANSD
INWGSMQTIFFEEWQLMNVMDKIPSIFNESKKKGIAMQTIFFILSGR
PPPMQTIFFVIVNYNESKHALWCSVD
PWMWNLMQTIFFISQQVIEIPS
MQTIFFVFSHDEQMKLKGLKGA
Non-homologous proteins
Nature has selected for these patterns because
they speed folding.
49
(No Transcript)
50
How to read an I-sites motif profile
51
Backbone angles and sequence pattern for
Amphipathic alpha-helix
52
Superposition of the top scoring 30
true-positives
53
Conserved polar (green) and non-polar (purple)
sidechains
54
Serine alpha-N-cap
55
HMMSTR
A Markov state. A hidden Markov model consists of
Markov states connected by directed transitions.
Each state emits an output symbol, representing
sequence or structure. There are four categories
of emission symbols in our model: b, d, r, and c,
corresponding to amino acid residues, three-state
secondary structure, backbone angles (discretized
into regions of phi-psi space) and structural
context (e.g. hairpin versus diverging turn,
middle versus end-strand), respectively.
Bystroff C, Thorsson V & Baker D. (2000). HMMSTR: A
hidden Markov model for local sequence-structure
correlations in proteins. Journal of Molecular
Biology 301, 173-90.
56
Merging of two I-sites motifs to form an HMM.
57
(No Transcript)
58
Sequence Profiles
Sequence alignment
VIVAANRSA
VIVSAARTA
VIASAVRTA
VIVDAGRSA
VIASGVRTA
VIVAAKRTA
VIVSAVRTP
Sequence profile
VIVSAARTA
VIVSAVRTP
aa
VIVDAGRTA
VIVDAGRTA
VIVSGARTP


VIVDFGRTP
VIVSATRTP
VIVSATRTP
VIVGALRTP
VIVSATRTP
VIVSATRTP
VIASAARTA
VIVDAIRTP
Red: high probability ratio (LLR > 1). Green: background
probability ratio (LLR = 0). Blue: low probability ratio (LLR < -1).
VIVAAYRTA
VIVSAARTP
VIVDAIRTP
VIVSAVRTA
VIVAAHRTA
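
A sequence profile of the kind shown on this slide can be sketched as a per-column log-likelihood ratio (LLR) of observed amino acid frequency against a background frequency. The sketch uses the seven aligned sequences listed under "Sequence alignment"; the uniform background of 1/20, the pseudocount of 1, the natural logarithm, and the function name are assumptions, not details taken from the slide.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def llr_profile(alignment, pseudocount=1.0):
    """Per-column log-likelihood ratios against a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    denom = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)  # missing residues count as 0
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / denom) / background)
            for aa in AMINO_ACIDS
        })
    return profile

alignment = [
    "VIVAANRSA",
    "VIVSAARTA",
    "VIASAVRTA",
    "VIVDAGRSA",
    "VIASGVRTA",
    "VIVAAKRTA",
    "VIVSAVRTP",
]
profile = llr_profile(alignment)
```

Fully conserved columns, such as the leading V, score well above 1, while residues never seen in a column score below 0.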
59
I-sites motifs
Backbone angles: ψ = green, φ = red
Amino acids arranged from non-polar to polar
60
Why do I-sites exist?
1. They are ancient conserved regions?
2. They fold independently?
61
Patterns of conservation suggest independent
folding
1. backbone angle constraints
2. sidechain contacts
3. negative design
62
NMR structures confirm independent folding
diverging turn motif
NMR structure of a 7-residue I-sites motif in
isolation (Yi et al, J. Mol. Biol, 1998)
63
Fold prediction Rosetta method
  • Knowledge based scoring function

Bayes' law:
$$P(\mathrm{structure} \mid \mathrm{sequence}) = \frac{P(\mathrm{structure})\, P(\mathrm{sequence} \mid \mathrm{structure})}{P(\mathrm{sequence})}$$
P(sequence | structure) = f(residue contacts in native structures)
P(structure) = probability of a protein-like structure (no clashes, globular shape)
Figure labels: near-native structures; protein-like structures; sequence-consistent local structure
Simons et al. (1997)
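
Taking the negative logarithm of Bayes' law turns the product of probabilities into a sum of energy-like terms, which is one way to read a knowledge-based scoring function of this kind. The step is sketched below; the symbol $E$ is notation introduced here, not taken from the slide.

```latex
\begin{align*}
P(\mathrm{structure}\mid\mathrm{sequence})
  &= \frac{P(\mathrm{structure})\,P(\mathrm{sequence}\mid\mathrm{structure})}
          {P(\mathrm{sequence})} \\
E(\mathrm{structure}) \equiv -\ln P(\mathrm{structure}\mid\mathrm{sequence})
  &= -\ln P(\mathrm{structure})
     - \ln P(\mathrm{sequence}\mid\mathrm{structure})
     + \ln P(\mathrm{sequence})
\end{align*}
```

For a fixed sequence, $\ln P(\mathrm{sequence})$ is a constant, so ranking candidate structures by $E$ depends only on the first two terms.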
64
Rosetta
(1) A stone with three ancient languages on
it. (2) A program (David Baker) that simulates
the folding of a protein, using statistical
energies and moves.
65
The Folding Problem
Two parts:
(1) The Search Problem: Is the true structure one of my 2 million guesses?
(2) The Discrimination Problem: If it's one of these 2 million, which one is it?
66
Fragment insertion Monte Carlo
Rosetta
backbone torsion angles
accept or reject
moveset
Energy function
Choose fragment from moveset
change backbone angles
Convert angles to 3D coordinates
67
Backbone angles are restrained in I-sites regions
Rosetta
regions of high-confidence I-sites prediction
backbone torsion angles
moveset
Fragments that deviate from the paradigm (>90° in
φ or ψ) are removed from the moveset.
Generally, about one-third of the sequence has an
I-sites prediction with confidence > 0.75, and is
restrained.
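
The filtering rule can be sketched as an angular comparison with wrap-around at ±180°. The per-residue treatment, the example angles, and the function names are assumptions made for illustration; the slide does not say how deviations are measured over a whole fragment.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def filter_by_paradigm(candidates, paradigm, max_dev=90.0):
    """Keep (phi, psi) pairs within max_dev of the paradigm in both angles."""
    kept = []
    for phi, psi in candidates:
        if (angle_diff(phi, paradigm[0]) <= max_dev
                and angle_diff(psi, paradigm[1]) <= max_dev):
            kept.append((phi, psi))
    return kept

# Invented example: a helical paradigm and four candidate angle pairs.
paradigm = (-60.0, -45.0)
candidates = [(-65.0, -40.0), (-120.0, 130.0), (170.0, -50.0), (-100.0, 30.0)]
kept = filter_by_paradigm(candidates, paradigm)
```

The wrap-around matters because torsion angles are periodic: 170° and -170° are only 20° apart, not 340°.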
68
Sequence dependent features
Rosetta
69
Sequence-independent features
Rosetta
Probabilities from the database
Current structure
The energy score for a contact between secondary
structures is summed using database statistics.
70
CASP4 predictions
Rosetta
31 target sequences. Ab initio prediction, i.e.
sequence homolog data was ignored if present.
61% topologically correct
60% locally correct
73% secondary structure correct
71
T0116 262-322 (61 residues)
Rosetta
prediction
true structure
Topologically correct (rmsd = 5.9 Å) but helix is
mis-predicted as loop.
72
T0121 126-199 (66 residues)
Rosetta
prediction
true structure
Topologically correct (rmsd = 5.9 Å) but loop is
mis-predicted as helix.
73
T0122 57-153 (97 residues)
Rosetta
prediction
true structure
...contains a 53 residue stretch with max
deviation 96
74
T0112 153-213
Rosetta
prediction
true structure
Low rmsd (5.6 Å) and all angles correct (mda =
84), but topologically wrong!
(this is rare)
75
Rosetta
What needs to be fixed?
Turns
8% of the residues in the targets have φ > 0;
44% of these are at glycine residues. 7% of
the residues in the predictions have φ > 0, but
only 16% of these are at glycines.
Contact order
True structure 0.252
Predictions 0.119
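
Contact order measures how far apart in sequence the residues in spatial contact are, so a lower value for predictions than for true structures indicates that the predictions favour local contacts. The sketch below uses the relative contact order, (1 / (L x N)) x sum of |i - j| over the N contacts of an L-residue chain. The one-point-per-residue representation, the 8 Å cutoff, and the function name are assumptions; the slide does not state how its values were computed.

```python
import math

def relative_contact_order(coords, cutoff=8.0, min_separation=1):
    """Mean sequence separation of contacting residues, divided by chain length."""
    n_res = len(coords)
    n_contacts = 0
    total_separation = 0
    for i in range(n_res):
        for j in range(i + min_separation, n_res):
            if math.dist(coords[i], coords[j]) <= cutoff:
                n_contacts += 1
                total_separation += j - i
    if n_contacts == 0:
        return 0.0
    return total_separation / (n_res * n_contacts)

# Invented example: six residues on a straight line, 3.8 Å apart.
# Only neighbours at sequence separation 1 (3.8 Å) and 2 (7.6 Å) are in contact.
extended = [(3.8 * i, 0.0, 0.0) for i in range(6)]
co = relative_contact_order(extended)
```

An extended chain has only short-range contacts and therefore a low contact order; folds that bring distant parts of the sequence together score higher.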
76
Prediction of protein structure
  • The ROSETTA program is the most famous method.
  • It uses different models to treat the local and nonlocal
    interactions.
  • Sequence-dependent local interactions bias
    segments of the chain to sample distinct sets of
    local structures.
  • The distribution of local structures adopted by short
    sequence segments in known three-dimensional structures
    is used as an approximation to the distribution of
    structures sampled by isolated peptides with the
    corresponding sequences.
  • Nonlocal interactions select the lowest
    free-energy tertiary structures from the many
    conformations compatible with these local biases.
  • The primary nonlocal interactions considered are
    hydrophobic burial, electrostatics, main-chain
    hydrogen bonding and excluded volume.
  • Structures are generated by minimizing the nonlocal
    interaction energy in the space defined by the local
    structure distributions using Monte Carlo simulated
    annealing.

77
Using NMR to guide Rosetta
  • We have extended the ROSETTA ab initio structure
    prediction strategy to the problem of generating
    models of proteins using limited experimental
    data. By incorporating chemical shift and NOE
    information and more recently dipolar coupling
    information into the Rosetta structure generation
    procedure, it has been possible to generate much
    more accurate models than with ab initio
    structure prediction alone or using the same
    limited data sets with conventional NMR structure
    generation methodology. An exciting recent
    development is that the Rosetta procedure can
    also take advantage of unassigned NMR data and
    hence circumvent the difficult and tedious step
    of assigning NMR spectra.

78
Rosetta in comparative modelling
  • We have also developed a method for comparative
    modeling that was one of the top performing
    methods in the CASP4 experiment. The method
    utilizes a new protein sequence-structure
    alignment method, and structurally variable
    regions, such as long loops not present in the
    structure of a homologue, are built using a
    modification of the Rosetta ab initio structure
    prediction methodology. Both the ab initio and
    the comparative modeling methods have been
    implemented in a server called ROBETTA, which was
    one of the best all-around fully automated
    structure prediction servers in the CASP5 test.

79
Prediction algorithms have underlying principles
Darwin: protein evolution. Principle: proteins
that evolved from a common ancestor have the same
fold.
Boltzmann: protein folding. Principle: proteins
search conformational space, minimizing the free
energy.
80
Summary
  • Most prediction methods depend on sequence
    homology (Darwin).
  • Folding predictions combine statistics and
    simulations.
  • Putative folding initiation sites can be found
    using database statistics.
  • Knowledge-based energy functions are derived from
    database statistics.
  • The folding problem is really two problems: the
    search problem and the discrimination problem.
  • If we knew how proteins fold, we could predict
    their structures.
  • We don't know how proteins fold.

81
CASP6 current status
  • Comparative modelling extended to distant
    homologues
  • Easy: PSI-BLAST neighbours
  • Hard: indirect PSI-BLAST neighbours
  • Fold recognition merged with comparative
    modelling
  • Ab initio methods based on fragment assembly
    generate models (among top N predictions) that
    have some resemblance to the real structure