Title: 3D Structure Prediction


1
3D Structure Prediction & Assessment Pt. 2
  • David Wishart
  • Rm. 2123 Dent/Pharm Centre
  • david.wishart_at_ualberta.ca

2
3D Structure Generation
  • X-ray Crystallography
  • NMR Spectroscopy
  • Homology or Comparative Modelling
  • Threading (1D and 2D threading)
  • Secondary Structure Prediction
  • Ab initio Structure Prediction

3
Outline
  • Threading (1D and 3D threading)
  • Secondary Structure Prediction
  • Ab initio Structure Prediction
  • Structure Evaluation & Assessment
  • PERL and PDB

4
Definition
  • Threading - A protein fold recognition technique
    that involves incrementally replacing the
    sequence of a known protein structure with a
    query sequence of unknown structure. The new
    model structure is evaluated using a simple
    heuristic measure of protein fold quality. The
    process is repeated against all known 3D
    structures until an optimal fit is found.

5
Why Threading?
  • Secondary structure is more conserved than
    primary structure
  • Tertiary structure is more conserved than
    secondary structure
  • Therefore very remote relationships can be better
    detected through 2° or 3° structural homology
    instead of sequence homology

6-10
Visualizing Threading
[Animation over slides 6-10: the query sequence is placed onto the template one residue at a time (T, H, R, E, ...).]
THREADINGSEQNCEECNQESGNI ERHTHREADINGSEQNCETHREAD
GSEQNCEQCQESGIDAERTHR...
Threading
  • Database of 3D structures and sequences: Protein
    Data Bank (or non-redundant subset)
  • Query sequence: sequence with < 25% identity to
    known structures
  • Alignment protocol: dynamic programming
  • Evaluation protocol: distance-based potential or
    secondary structure
  • Ranking protocol

12
2 Kinds of Threading
  • 2D Threading or Prediction Based Methods (PBM):
    predict secondary structure (SS) or ASA of the
    query, then evaluate on the basis of SS and/or
    ASA matches
  • 3D Threading or Distance Based Methods (DBM):
    create a 3D model of the structure, then evaluate
    it using a distance-based hydrophobicity or
    pseudo-thermodynamic potential

13
2D Threading Algorithm
  • Convert PDB to a database containing sequence, SS
    and ASA information
  • Predict the SS and ASA for the query sequence
    using a high-end algorithm
  • Perform a dynamic programming alignment using the
    query against the database (include sequence, SS
    and ASA)
  • Rank the alignments and select the most probable
    fold

14
Database Conversion
>Protein1
THREADINGSEQNCEECNQESGNI
HHHHHHCCCCEEEEECCCHHHHHH
ERHTHREADINGSEQNCETHREAD
HHCCEEEEECCCCCHHHHHHHHHH
>Protein2
QWETRYEWQEDFSHAECNQESGNI
EEEEECCCCHHHHHHHHHHHHHHH
YTREWQHGFDSASQWETRA
CCCCEEEEECCCEEEEECC
>Protein3
LKHGMNSNWEDFSHAECNQESG
EEECCEEEECCCEEECCCCCCC
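
A converted database of this kind could be read back as sketched below. This is only an illustration: it assumes a FASTA-like layout in which a `>` header is followed by alternating sequence and secondary structure lines, and the function name `parse_2d_database` is hypothetical rather than part of any threading server.

```python
def parse_2d_database(text):
    """Return {name: (sequence, secondary_structure)} from a FASTA-like text.

    Assumed layout: a '>' header, then alternating lines of residues and
    of H/E/C secondary structure states.
    """
    blocks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:]
            blocks[current] = []
        elif current is not None:
            blocks[current].append(line)

    records = {}
    for name, lines in blocks.items():
        seq = "".join(lines[0::2])  # lines 1, 3, ...: residues
        ss = "".join(lines[1::2])   # lines 2, 4, ...: H/E/C states
        if len(seq) != len(ss):
            raise ValueError(f"{name}: sequence and SS lengths differ")
        records[name] = (seq, ss)
    return records


EXAMPLE = """\
>Protein3
LKHGMNSNWEDFSHAECNQESG
EEECCEEEECCCEEECCCCCCC
"""

records = parse_2d_database(EXAMPLE)
```

The length check catches records whose sequence and annotation have drifted out of register, which would otherwise corrupt every later alignment score.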
15
Secondary Structure
16
2° Structure Identification
  • DSSP - Database of Secondary Structures for
    Proteins (swift.embl-heidelberg.de/dssp)
  • VADAR - Volume Area Dihedral Angle Reporter
    (redpoll.pharmacy.ualberta.ca)
  • PDB - Protein Data Bank (www.rcsb.org)

QHTAWCLTSEQHTAAVIWDCETPGKQNGAYQEDCA
HHHHHHCCEEEEEEEEEEECCHHHHHHHCCCCCCC
17
Accessible Surface Area
[Figure labels: Van der Waals Surface, Solvent Probe, Accessible Surface, Reentrant Surface]
18
ASA Calculation
  • DSSP - Database of Secondary Structures for
    Proteins (swift.embl-heidelberg.de/dssp)
  • VADAR - Volume Area Dihedral Angle Reporter
    (www.pence.ualberta.ca/ftp/vadar)
  • GetArea - www.scsb.utmb.edu/getarea/area_form.html

QHTAWCLTSEQHTAAVIWDCETPGKQNGAYQEDCAMD
BBPPBEEEEEPBPBPBPBBPEEEPBPEPEEEEEEEEE
1056298799415251510478941496989999999
19
Other ASA sites
  • Connolly Molecular Surface Home Page:
    http://www.biohedron.com/
  • Naccess Home Page:
    http://sjh.bi.umist.ac.uk/naccess.html
  • ASA Parallelization:
    http://cmag.cit.nih.gov/Asa.htm
  • Protein Structure Database:
    http://www.psc.edu/biomed/pages/research/PSdb/

20
2D Threading Algorithm
  • Convert PDB to a database containing sequence, SS
    and ASA information
  • Predict the SS and ASA for the query sequence
    using a high-end algorithm
  • Perform a dynamic programming alignment using the
    query against the database (include sequence, SS
    and ASA)
  • Rank the alignments and select the most probable
    fold

21
2° Structure Prediction
  • Statistical (Chou-Fasman, GOR)
  • Homology or Nearest Neighbor (Levin)
  • Physico-Chemical (Lim, Eisenberg)
  • Pattern Matching (Cohen, Rooman)
  • Neural Nets (Qian & Sejnowski, Karplus)
  • Evolutionary Methods (Barton, Niemann)
  • Combined Approaches (Rost, Levin, Argos)

22
Chou-Fasman Statistics
23
The PHD Approach
[Figure: PROFILE...]
24
The PHD Algorithm
  • Search the SWISS-PROT database and select high
    scoring homologues
  • Create a sequence profile from the resulting
    multiple alignment
  • Include global sequence info in the profile
  • Input the profile into a trained two-layer neural
    network to predict the structure and to
    clean-up the prediction

25
Prediction Performance
26
Best of the Best
  • PredictProtein-PHD (72%):
    http://cubic.bioc.columbia.edu/predictprotein
  • Jpred (73-75%):
    http://jura.ebi.ac.uk:8888/
  • PREDATOR (75%):
    http://www.embl-heidelberg.de/cgi/predator_serv.pl
  • PSIpred (77%):
    http://insulin.brunel.ac.uk/psipred

27
ASA Prediction
  • PredictProtein-PHDacc (58%):
    http://cubic.bioc.columbia.edu/predictprotein
  • PredAcc (70%?):
    condor.urbb.jussieu.fr/PredAccCfg.html

QHTAWCLTSEQHTAAVIW
BBPPBEEEEEPBPBPBPB
28
2D Threading Algorithm
  • Convert PDB to a database containing sequence, SS
    and ASA information
  • Predict the SS and ASA for the query sequence
    using a high-end algorithm
  • Perform a dynamic programming alignment using the
    query against the database (include sequence, SS
    and ASA)
  • Rank the alignments and select the most probable
    fold

29
Dynamic Programming
     G   E   N   E   T   I   C   S
G   60  40  30  20  20   0  10   0
E   40  50  30  30  20   0  10   0
N   30  30  40  20  20   0  10   0
E   20  20  20  30  20  10  10   0
S   20  20  20  20  20   0  10  10
I   10  10  10  10  10  20  10   0
S    0   0   0   0   0   0   0  10
30
Sij (Identity Matrix)
   A C D E F G H I K L M N P Q R S T V W Y
A  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
C  0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D  0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E  0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
F  0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
G  0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H  0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
I  0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
K  0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
L  0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
M  0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
N  0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
P  0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
Q  0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
R  0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
S  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
T  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
V  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
W  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
Y  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
31
A Simple Example...
[Animation: the score matrix is filled in cell by cell; the last frame is shown.]
     A  A  T  V  D
A    1  1  0  0  0
V    0  1  1  2
V
D
32
A Simple Example...
     A  A  T  V  D
A    1  1  0  0  0
V    0  1  1  2  1
V    0  1  1  2  2
D    0  1  1  1  3

Alignments shown:
A A T V D
A - V V D

A A T V D
A V V D

A A T V D
A V - V D
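
The matrix fill in this example can be reproduced with a short script. This is a sketch under one assumption inferred from the numbers on the slide: each cell is the identity score for that residue pair plus the best score found anywhere above and to the left of it, with no gap penalty. The names `identity_score` and `fill_matrix` are illustrative only.

```python
def identity_score(a, b):
    """Identity matrix Sij: 1 for identical residues, 0 otherwise."""
    return 1 if a == b else 0


def fill_matrix(query, template, score=identity_score):
    """Fill H where H[i][j] = S(i, j) + max of H over rows < i and columns < j."""
    rows, cols = len(template), len(query)
    H = [[0] * cols for _ in range(rows)]
    # best[i][j] holds the maximum of H over rows < i and columns < j;
    # the extra row and column of zeros cover the empty case.
    best = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            H[i][j] = score(template[i], query[j]) + best[i][j]
            best[i + 1][j + 1] = max(best[i][j + 1], best[i + 1][j], H[i][j])
    return H


H = fill_matrix("AATVD", "AVVD")
for residue, row in zip("AVVD", H):
    print(residue, row)
```

The bottom-right cell holds the score of the best alignment ending at D/D; tracing back through the cells that supplied each maximum would recover the alignments themselves.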
33
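Let's Include 2° Info & ASA
$S_{ij}^{strc}$ (secondary structure states H, E, C):
     H  E  C
H    1  0  0
E    0  1  0
C    0  0  1
$S_{ij}^{asa}$ (accessibility states E, P, B):
     E  P  B
E    1  0  0
P    0  1  0
B    0  0  1
$S_{ij}^{total} = k_1 S_{ij}^{seq} + k_2 S_{ij}^{strc} + k_3 S_{ij}^{asa}$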
34
A Simple Example...
[Animation: the matrix is refilled using sequence plus secondary structure scores; the last frame is shown.]
SS:      E  E  E  C  C
         A  A  T  V  D
E   A    2  2  1  0  0
E   V    1  3  3  3
C   V
C   D
35
A Simple Example...
SS:      E  E  E  C  C
         A  A  T  V  D
E   A    2  2  1  0  0
E   V    1  3  3  3  2
C   V    0  2  3  5  4
C   D    0  2  3  4  7

Alignments shown:
A A T V D
A - V V D

A A T V D
A V V D

A A T V D
A V - V D
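
The combined scoring used in this example can be sketched as a weighted sum over several annotation "tracks" (sequence, secondary structure and, optionally, ASA). The code assumes all weights equal 1 and no ASA track, which is what the numbers on the slide appear to use, and it assumes the same fill rule as the sequence-only example (cell score plus the best score above and to the left). The function name `thread_2d` is hypothetical.

```python
def thread_2d(tracks):
    """Fill the score matrix for a set of annotation tracks.

    tracks: list of (query_string, template_string, weight) tuples,
    for example one for residues and one for H/E/C states.
    """
    cols = len(tracks[0][0])  # query length
    rows = len(tracks[0][1])  # template length
    H = [[0] * cols for _ in range(rows)]
    best = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            # Weighted sum of identity scores, one term per track.
            s = sum(w * (t[i] == q[j]) for q, t, w in tracks)
            H[i][j] = s + best[i][j]
            best[i + 1][j + 1] = max(best[i][j + 1], best[i + 1][j], H[i][j])
    return H


matrix = thread_2d([
    ("AATVD", "AVVD", 1),  # sequence track, weight k1 = 1
    ("EEECC", "EECC", 1),  # secondary structure track, weight k2 = 1
])
```

Adding an ASA track is one more tuple in the list, and changing the weights shifts how much each kind of evidence contributes to the final score.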
36
2D Threading Performance
  • In test sets 2D threading methods can identify
    30-40% of proteins having very remote homologues
    (i.e. not detected by BLAST) using minimal
    non-redundant databases (<700 proteins)
  • If the database is expanded 4x the performance
    jumps to 70-75%
  • Performs best on true homologues as opposed to
    postulated analogues

37
2D Threading Advantages
  • Algorithm is easy to implement
  • Algorithm is very fast (10x faster than 3D
    threading approaches)
  • The 2D database is small (<500 kbytes) compared
    to 3D database (>1.5 Gbytes)
  • Appears to be just as accurate as DBM or other 3D
    threading approaches
  • Very amenable to web servers

38
Servers - PredictProtein
39
Servers - 123D
40
Servers - GenThreader
41
Servers - LIBRA 1
42
More Servers - www.bronco.ualberta.ca
43
2D Threading Disadvantages
  • Reliability is not 100%, making most threading
    predictions suspect unless experimental evidence
    can be used to support the conclusion
  • Does not produce a 3D model at the end of the
    process
  • Doesn't include all aspects of 2° and 3°
    structure features in prediction process
  • PSI-BLAST may be just as good (faster too!)

44
Making it Better
  • Include 3D threading analysis as part of the 2D
    threading process -- offers another layer of
    information
  • Include more information about the coil state
    (3-state prediction isn't good enough)
  • Include other biochemical (ligands, function,
    binding partners, motifs) or phylogenetic
    (origin, species) information

45
Outline
  • Threading (1D and 3D threading)
  • Secondary Structure Prediction
  • Ab initio Structure Prediction
  • Structure Evaluation & Assessment
  • PERL and PDB

46
Ab Initio Prediction
  • Predicting the 3D structure without any prior
    knowledge
  • Used when homology modelling or threading have
    failed (no homologues are evident)
  • Equivalent to solving the Protein Folding
    Problem
  • Still a research problem

47
Polypeptides can be...
  • Represented by a range of approaches or
    approximations including
  • all atom representations in cartesian space
  • all atom representations in dihedral space
  • simplified atomic versions in dihedral space
  • tube/cylinder/ribbon representations
  • lattice models

48
Ab Initio Folding
  • Two Central Problems: sampling conformational
    space ($10^{100}$) and the energy minimum problem
  • The Sampling Problem (Solutions): lattice models,
    off-lattice models, simplified chain methods,
    parallelism
  • The Energy Problem (Solutions): threading
    energies, packing assessment, topology assessment

49
A Simple 2D Lattice
3.5Å
50
Lattice Folding
51
Lattice Algorithm
  1. Build an n x m matrix (a 2D array)
  2. Choose an arbitrary point as your N-terminal
     residue (start residue)
  3. Add or subtract 1 from the x or y position of
     the start residue
  4. Check to see if the new point (residue) is off
     the lattice or is already occupied
  5. Evaluate the energy
  6. Go to step 3 and repeat until done
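
The chain-growing part of these steps might look like the sketch below. It is an illustration rather than the presenter's code: it assumes each new residue is placed next to the most recently placed one, restarts from scratch when the chain traps itself, and leaves out the energy evaluation. The name `grow_chain` and the `max_tries` limit are hypothetical.

```python
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # add or subtract 1 from x or y


def grow_chain(n_residues, width, height, rng, max_tries=1000):
    """Grow a self-avoiding chain of n_residues on a width x height lattice."""
    for _ in range(max_tries):
        start = (rng.randrange(width), rng.randrange(height))
        chain = [start]
        occupied = {start}
        while len(chain) < n_residues:
            x, y = chain[-1]
            # Keep only moves that stay on the lattice and hit an empty site.
            options = [
                (x + dx, y + dy)
                for dx, dy in MOVES
                if 0 <= x + dx < width
                and 0 <= y + dy < height
                and (x + dx, y + dy) not in occupied
            ]
            if not options:
                break  # dead end: discard this chain and start again
            nxt = rng.choice(options)
            chain.append(nxt)
            occupied.add(nxt)
        if len(chain) == n_residues:
            return chain
    raise RuntimeError("no self-avoiding chain found")


chain = grow_chain(10, 6, 6, random.Random(0))
```

Passing in a seeded `random.Random` keeps runs reproducible, which helps when comparing energies across many generated conformations.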

52
Lattice Energy Algorithm
  • Red = hydrophobic, Blue = hydrophilic
  • If Red is near empty space: E = E + 1
  • If Blue is near empty space: E = E - 1
  • If Red is near another Red: E = E - 1
  • If Blue is near another Blue: E = E + 0
  • If Blue is near Red: E = E + 0
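
These rules can be turned into a small scoring function. Several details are assumptions made for the sketch, not stated on the slide: "near" means one of the four adjacent lattice sites, every empty adjacent site counts separately, sites outside the chain are all treated as empty, and each Red-Red contact between residues that are not chain neighbours is counted once. The name `lattice_energy` is hypothetical.

```python
NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def lattice_energy(chain, colours):
    """Score a 2D lattice conformation.

    chain: list of (x, y) positions, one per residue, in chain order.
    colours: string of 'R' (hydrophobic) and 'B' (hydrophilic).
    """
    index_at = {pos: i for i, pos in enumerate(chain)}
    energy = 0
    for i, (x, y) in enumerate(chain):
        for dx, dy in NEIGHBOURS:
            j = index_at.get((x + dx, y + dy))
            if j is None:
                # Empty site: penalise exposed Red, reward exposed Blue.
                energy += 1 if colours[i] == "R" else -1
            elif j > i + 1 and colours[i] == "R" and colours[j] == "R":
                energy -= 1  # non-bonded Red-Red contact, counted once
            # Blue-Blue and Blue-Red contacts add 0.
    return energy


folded = lattice_energy([(0, 0), (1, 0), (1, 1), (0, 1)], "RBBR")
extended = lattice_energy([(0, 0), (1, 0), (2, 0), (3, 0)], "RBBR")
```

For this four-residue chain the square conformation brings the two Red residues into contact and scores lower than the extended one, which is the behaviour the rules are meant to reward.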

53
More Complex Lattices
54
3D Lattices
55
Really Complex 3D Lattices
J. Skolnick
56
Lattice Methods
Advantages
  • Easiest and quickest way to build a polypeptide
  • Implicitly includes excluded volume
  • More complex lattices allow reasonably accurate
    representation
Disadvantages
  • At best, only an approximation to the real thing
  • Does not allow accurate constructs
  • Complex lattices are as costly as the real thing

57
Non-Lattice Models
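[Figure: backbone geometry of consecutive residues i and i+1 (atoms H, R, C, N, O), labelled with the lengths 1.53 Å, 1.00 Å, 1.32 Å, 1.47 Å and 1.24 Å and a 3.5 Å spacing]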
58
Vistraj & Foldtraj
  • Chris Hogue & Howard Feldman (SLRI)
  • Uses simplified Cα chain to represent polypeptide
    backbone
  • Generates a simplified self-avoiding chain of
    100 residues in 3 sec
  • Uses a binary tree search to look for potential
    collisions in 3D space
  • Reconstructs full polypeptide from Cα atoms

59
Simplified Chain Representation
[Figure: chain positions 1-4, with each new position specified in spherical coordinates by the angles θ and φ]
60
The Search Sphere
[Figure labels: Helix, Coil, β-Sheet]
61
Building a Cα Peptide Chain
[Figure panels: n = 3, n = 5, n = 7, n = 9]
62
Simplified Chain Representation
Reconstructing backbone atoms from Cα atoms
63
(No Transcript)
64
Best Method So Far...
Rosetta - David Baker
65
Blue Gene and Protein Folding
66
Outline
  • Threading (1D and 3D threading)
  • Secondary Structure Prediction
  • Ab initio Structure Prediction
  • Structure Evaluation & Assessment
  • PERL and PDB

67
Why Assess Structure?
  • A structure can (and often does) have mistakes
  • A poor structure will lead to poor models of
    mechanism or relationship
  • Unusual parts of a structure may indicate
    something important (or an error)

68
Famous bad structures
  • Azotobacter ferredoxin (wrong space group)
  • Zn-metallothionein (mistraced chain)
  • Alpha bungarotoxin (poor stereochemistry)
  • Yeast enolase (mistraced chain)
  • Ras P21 oncogene (mistraced chain)
  • Gene V protein (poor stereochemistry)

69
How to Assess Structure?
  • Assess experimental fit (look at R factor or
    rmsd)
  • Assess correctness of overall fold (look at
    disposition of hydrophobes)
  • Assess structure quality (packing,
    stereochemistry, bad contacts, etc.)

70
A Good Protein Structure..
X-ray structure (R factor)
  • R = 0.59: random chain
  • R = 0.45: initial structure
  • R = 0.35: getting there
  • R = 0.25: typical protein
  • R = 0.15: best case
  • R = 0.05: small molecule
NMR structure (rmsd)
  • rmsd = 4 Å: random
  • rmsd = 2 Å: initial fit
  • rmsd = 1.5 Å: OK
  • rmsd = 0.8 Å: typical
  • rmsd = 0.4 Å: best case
  • rmsd = 0.2 Å: dream on
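
For reference, both fit measures can be computed in a few lines. This is a sketch using the standard textbook definitions rather than anything specific to the slides: the R factor is taken as the summed absolute difference between observed and calculated structure-factor amplitudes divided by the summed observed amplitudes, and the rmsd is taken between two coordinate sets that are assumed to be already superimposed. The function names are illustrative.

```python
import math


def r_factor(f_obs, f_calc):
    """R = sum(|Fobs - Fcalc|) / sum(|Fobs|), for lists of amplitudes."""
    numerator = sum(abs(o - c) for o, c in zip(f_obs, f_calc))
    return numerator / sum(abs(o) for o in f_obs)


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    squared = sum(
        (a - b) ** 2
        for pa, pb in zip(coords_a, coords_b)
        for a, b in zip(pa, pb)
    )
    return math.sqrt(squared / len(coords_a))


example_r = r_factor([10.0, 20.0], [9.0, 22.0])
example_rmsd = rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)])
```

In practice the two structures would first be optimally superimposed before the rmsd is taken; that step is omitted here to keep the example short.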

71
A Good Protein Structure..
  • Minimizes disallowed torsion angles
  • Maximizes number of hydrogen bonds
  • Maximizes buried hydrophobic ASA
  • Maximizes exposed hydrophilic ASA
  • Minimizes interstitial cavities or spaces

72
A Good Protein Structure..
  • Minimizes number of bad contacts
  • Minimizes number of buried charges
  • Minimizes radius of gyration
  • Minimizes covalent and noncovalent (van der Waals
    and coulombic) energies

73
Radius & Radius of Gyration
  • $\mathrm{RAD} = 3.875 \times \mathrm{NUMRES}^{0.333}$ (Folded)
  • $\mathrm{RADG} = 0.41 \times (110 \times \mathrm{NUMRES})^{0.5}$ (Unfolded)
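
The two estimates can be compared with a radius of gyration measured from coordinates, as sketched below. Two things are assumptions here: the trailing numbers in the formulas are read as exponents (0.333 and 0.5), and all lengths are taken to be in Å. The measured value uses the usual unweighted definition (root-mean-square distance of the points from their centroid). Function names are illustrative.

```python
import math


def folded_radius(num_res):
    """Expected radius of a folded protein: 3.875 * NUMRES^0.333."""
    return 3.875 * num_res ** 0.333


def unfolded_radius_of_gyration(num_res):
    """Expected radius of gyration of an unfolded chain: 0.41 * (110 * NUMRES)^0.5."""
    return 0.41 * (110 * num_res) ** 0.5


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) points."""
    n = len(coords)
    centre = [sum(c[k] for c in coords) / n for k in range(3)]
    total = sum(sum((c[k] - centre[k]) ** 2 for k in range(3)) for c in coords)
    return math.sqrt(total / n)


print(round(folded_radius(100), 1), round(unfolded_radius_of_gyration(100), 1))
```

A model whose measured radius of gyration sits far above the folded estimate for its length is probably too loosely packed to be a native-like structure.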

74
Packing Volume
[Figure panels: Loose Packing, Dense Packing, Protein]
Proteins are Densely Packed
75
Accessible Surface Area
76
Accessible Surface Area
  • Solvation free energy is related to ASA
  • $\Delta G = \sum_i \Delta\sigma_i A_i$
  • Proteins typically have 60% of their ASA
    comprised of polar atoms or residues
  • Proteins typically have 40% of their ASA
    comprised of nonpolar atoms or residues
  • ΔASA (obs - exp.) reveals shape/roughness

77
Structure Validation Servers
  • WhatIf Web Server -
    http://www.cmbi.kun.nl:1100/WIWWWI/
  • Biotech Validation Suite -
    http://biotech.ebi.ac.uk:8400/cgi-bin/sendquery
  • Verify3D -
    http://www.doe-mbi.ucla.edu/Services/Verify_3D/
  • VADAR - http://redpoll.pharmacy.ualberta.ca

78
(No Transcript)
79
(No Transcript)
80
(No Transcript)
81
(No Transcript)
82
Structure Validation Programs
  • PROCHECK -
    http://www.biochem.ucl.ac.uk/roman/procheck/procheck.html
  • PROSA II -
    http://lore.came.sbg.ac.at/People/mo/Prosa/prosa.html
  • VADAR - http://www.pence.ualberta.ca/ftp/vadar/
  • DSSP - http://www.embl-heidelberg.de/dssp/

83
Procheck
84
Slides Located At...
http://redpoll.pharmacy.ualberta.ca