Protein Structural Prediction

Slides: 40
Provided by: root
Learn more at: https://web.stanford.edu
Transcript and Presenter's Notes


1
Protein Structural Prediction
2
Protein Structure is Hierarchical
3
Structure Determines Function
The Protein Folding Problem
  • What determines structure?
  • Energy
  • Kinematics
  • How can we determine structure?
  • Experimental methods
  • Computational predictions

4
Primary Structure: Sequence
  • The primary structure of a protein is the amino
    acid sequence

5
Primary Structure: Sequence
  • Twenty different amino acids have distinct shapes
    and properties

6
Primary Structure: Sequence
A useful mnemonic for the hydrophobic amino acids
is "FAMILY VW"
7
Secondary Structure: α, β, loops
  • α helices and β sheets are stabilized by
    hydrogen bonds between backbone oxygen and
    hydrogen atoms

8
Secondary Structure: α helix
9
Secondary Structure: β sheet
β sheet
β bulge
10
Second-and-a-half-ary Structure: Motifs
beta helix
beta barrel
beta trefoil
11
Tertiary Structure: Domains
12
Mosaic Proteins
13
Tertiary Structure: A Protein Fold
14
Protein Folds: Composed of α, β, other
15
Quaternary Structure: Multimeric Proteins or
Functional Assemblies
  • Multimeric Proteins
  • Macromolecular Assemblies

Ribosome: Protein synthesis
Hemoglobin: A tetramer
Replisome: DNA copying
16
Protein Folding
  • The amino-acid sequence of a protein determines
    the 3D fold (Anfinsen et al., 1950s)
  • Some exceptions
  • All proteins can be denatured
  • Some proteins have multiple conformations
  • Some proteins get folding help from chaperones
  • The function of a protein is determined by its 3D
    fold
  • Can we predict 3D fold of a protein given its
    amino-acid sequence?

17
The Levinthal Paradox
  • Given a small protein (100 aa), assume 3 possible
    conformations per peptide bond
  • 3^100 ≈ 5 × 10^47 conformations
  • Fastest motions take 10^-15 sec, so sampling all
    conformations would take 5 × 10^32 sec
  • 60 × 60 × 24 × 365 = 31,536,000 seconds in a year
  • Sampling all conformations would take 1.6 × 10^25
    years
  • Yet each protein folds quickly into a single stable
    native conformation: the Levinthal paradox
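
The arithmetic on this slide can be checked with a short Python sketch. It uses only the slide's own assumptions (100 peptide bonds, 3 states each, one conformation sampled per 10^-15 sec); the variable names are illustrative.

```python
# Levinthal estimate under the slide's assumptions.
n_bonds = 100             # small protein, one degree of freedom per peptide bond
states_per_bond = 3       # assumed conformations per peptide bond
sampling_time_s = 1e-15   # time scale of the fastest motions

conformations = states_per_bond ** n_bonds          # exact integer, about 5e47
total_seconds = conformations * sampling_time_s     # about 5e32 sec
seconds_per_year = 60 * 60 * 24 * 365               # 31,536,000
total_years = total_seconds / seconds_per_year      # about 1.6e25 years

print(f"conformations: {conformations:.1e}")
print(f"seconds:       {total_seconds:.1e}")
print(f"years:         {total_years:.1e}")
```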

18
Quick Overview of Energy
Strength (kcal/mole)   Bond
3-7                    H-bonds
10                     Ionic bonds
1-2                    Hydrophobic interactions
1                      van der Waals interactions
51                     Disulfide bridge
19
The Hydrophobic Effect
  • Important for folding, because every amino acid
    participates!

Experimentally Determined Hydrophobicity Levels
Trp   2.25
Ile   1.80
Phe   1.79
Leu   1.70
Cys   1.54
Met   1.23
Val   1.22
Tyr   0.96
Pro   0.72
Ala   0.31
Thr   0.26
His   0.13
Gly   0.00
Ser  -0.04
Gln  -0.22
Asn  -0.60
Glu  -0.64
Asp  -0.77
Lys  -0.99
Arg  -1.01
Fauchère and Pliska (1983). Eur. J. Med. Chem.
18, 369-75.
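
As a rough illustration of how such a scale is used, the sketch below averages the tabulated values over a sliding window along a sequence. The one-letter codes, the function name `window_hydrophobicity`, and the window length are assumptions made for this example, not part of the slide.

```python
# Hydrophobicity values from the table above, keyed by one-letter code.
HYDROPHOBICITY = {
    "W": 2.25, "I": 1.80, "F": 1.79, "L": 1.70, "C": 1.54,
    "M": 1.23, "V": 1.22, "Y": 0.96, "P": 0.72, "A": 0.31,
    "T": 0.26, "H": 0.13, "G": 0.00, "S": -0.04, "Q": -0.22,
    "N": -0.60, "E": -0.64, "D": -0.77, "K": -0.99, "R": -1.01,
}

def window_hydrophobicity(seq, window=5):
    """Mean hydrophobicity of every length-`window` stretch of `seq`.

    Hypothetical helper: the window size is an arbitrary choice.
    """
    scores = [HYDROPHOBICITY[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

# Example: a hydrophobic stretch followed by a charged one.
profile = window_hydrophobicity("LLIVFKDEKR", window=5)
print([round(value, 2) for value in profile])
```

High window averages suggest stretches likely to be buried; strongly negative ones suggest solvent-exposed stretches.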
20
Protein Structure Determination
  • Experimental
  • X-ray crystallography
  • NMR spectroscopy
  • Computational Structure Prediction
  • (The Holy Grail)
  • Sequence implies structure, therefore in
    principle we can predict the structure from the
    sequence alone

21
Protein Structure Prediction
  • ab initio
  • Use just first principles: energy, geometry, and
    kinematics
  • Homology
  • Find the best match to a database of sequences
    with known 3D-structure
  • Threading
  • Meta-servers and other methods

22
Ab initio Prediction
  • Sampling the global conformation space
  • Lattice models / Discrete-state models
  • Molecular Dynamics
  • Pre-set libraries of fragment 3D motifs
  • Picking native conformations with an energy
    function
  • Solvation model: how the protein interacts with water
  • Pair interactions between amino acids
  • Predicting secondary structure
  • Local homology
  • Fragment libraries

23
Lattice String Folding
  • HP model: the main modeled force is hydrophobic
    attraction
  • NP-hard on both the 2-D square and 3-D cubic lattices
  • Constant approximation algorithms
  • Not so relevant biologically
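
A minimal sketch of the HP model's energy function on the 2-D square lattice, assuming the usual convention that each pair of H residues that are lattice neighbors but not adjacent in the chain contributes -1. The function name `hp_energy` and the input format are choices made for this example.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain laid out on the 2-D square lattice.

    sequence: string of 'H' (hydrophobic) and 'P' (polar) residues.
    coords:   list of (x, y) lattice points, one per residue, in chain order.
    Returns minus the number of non-bonded H-H lattice contacts.
    """
    if len(sequence) != len(coords):
        raise ValueError("need one coordinate per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")

    index_at = {xy: i for i, xy in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so each contact is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbor)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts

# The same chain, stretched out and folded into a square.
extended = [(0, 0), (1, 0), (2, 0), (3, 0)]
folded = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HPPH", extended), hp_energy("HPPH", folded))
```

Folding the chain brings the two H residues into contact and lowers the energy. Finding the conformation that minimizes this energy is the NP-hard problem referred to above.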

24
Lattice String Folding
25
ROSETTA
http://www.bioinfo.rpi.edu/bystrc/hmmstr/server.php
  • http://depts.washington.edu/bakerpg/papers/Bonneau-ARBBS-v30-p173.pdf
  • Monte Carlo based method
  • Limit conformational search space by using the
    sequence-structure motif I-Sites library
    (http://isites.bio.rpi.edu/Isites/)
  • 261 patterns in library
  • Certain positions in motif favor certain residues
  • Remove all sequences with <25% identity
  • Find structures of the 25 nearest sequence
    neighbors of each 9-mer
  • Rationale
  • Local structures often fold independently of full
    protein
  • Can predict large areas of protein by matching
    sequence to I-Sites

26
I-Sites Examples
  • Non polar helix
  • Abundance of alanine at all positions
  • Non-polar side chains favored at positions 3, 6,
    10 (methionine, leucine, isoleucine)
  • Amphipathic helix
  • Non-polar side chains favored at positions 6, 9,
    13, 16 (methionine, leucine, isoleucine)
  • Polar side chains favored at positions 1, 8, 11,
    18 (glutamic acid, lysine)

27
ROSETTA Method
  • New structures generated by swapping compatible
    fragments
  • Accepted structures are clustered based on energy
    and structural size
  • Best cluster is the one with the greatest number of
    conformations within 4 Å RMS deviation of the
    structure at its center
  • Representative structures taken from each of the
    best five clusters and returned to the user as
    predictions
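
The cluster selection described above can be sketched as a greedy procedure over a precomputed pairwise RMSD matrix. This is an illustrative assumption about how the step might be carried out, not Rosetta's actual code; the function name `top_clusters` and the toy matrix are made up for this example.

```python
def top_clusters(rmsd, cutoff=4.0, n_clusters=5):
    """Greedy clustering on a symmetric pairwise RMSD matrix (in Å).

    Repeatedly picks the structure with the most remaining neighbors
    within `cutoff`, reports it as a cluster center, and removes the
    whole cluster. Returns a list of (center_index, member_indices).
    """
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining and len(clusters) < n_clusters:
        best_center, best_members = None, set()
        for i in sorted(remaining):
            members = {j for j in remaining if rmsd[i][j] <= cutoff}
            if len(members) > len(best_members):
                best_center, best_members = i, members
        clusters.append((best_center, sorted(best_members)))
        remaining -= best_members
    return clusters

# Toy matrix: structures 0-2 are close to one another, 3 is an outlier.
toy_rmsd = [
    [0.0, 1.0, 3.0, 9.0],
    [1.0, 0.0, 2.0, 9.0],
    [3.0, 2.0, 0.0, 9.0],
    [9.0, 9.0, 9.0, 0.0],
]
for center, members in top_clusters(toy_rmsd):
    print(center, members)
```

The center of each reported cluster plays the role of the representative structure returned to the user.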

28
Robetta Rosetta
30
Rosetta results in CASP
31
Rosetta Results
  • In CASP4, Rosetta's best models ranged from 6-10
    Å RMSD Cα
  • For comparison, good comparative models give 2-5
    Å RMSD Cα
  • Most effective with small proteins (<100
    residues) and structures with helices
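
For reference, the Cα RMSD quoted above is the root-mean-square distance between corresponding Cα atoms of two structures. The sketch below assumes the two coordinate sets are already optimally superimposed and have matching residue order; a real comparison would first perform a least-squares superposition, which is omitted here. The function name `ca_rmsd` is illustrative.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """RMSD in Å between two equal-length lists of (x, y, z) Cα positions.

    Assumes the structures are already superimposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy example: every atom of the model is displaced by (3, 4, 0) Å.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(x + 3.0, y + 4.0, z) for x, y, z in native]
print(ca_rmsd(native, model))
```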

32
Only a few folds are found in nature
33
The SCOP Database
  • Structural Classification Of Proteins
  • FAMILY: proteins that are >30% similar, or >15%
    similar and have similar known structure/function
  • SUPERFAMILY: proteins whose families have some
    sequence and function/structure similarity
    suggesting a common evolutionary origin
  • COMMON FOLD: superfamilies that have the same
    secondary structures in the same arrangement,
    probably resulting from physics and chemistry
  • CLASS: alpha, beta, alpha/beta, alpha+beta,
    multidomain

34
Status of Protein Databases
PDB
SCOP: Structural Classification of Proteins. 1.67
release: 24037 PDB entries (15 May 2004), 65122
domains.
Class                                Number of folds  Number of superfamilies  Number of families
All alpha proteins                               202                      342                 550
All beta proteins                                141                      280                 529
Alpha and beta proteins (a/b)                    130                      213                 593
Alpha and beta proteins (a+b)                    260                      386                 650
Multi-domain proteins                             40                       40                  55
Membrane and cell surface proteins                42                       82                  91
Small proteins                                    71                      104                 162
Total                                            887                     1447                2630
EMBL
35
Evolution of Proteins: Domains
  • Number of members in different families obeys a power law
  • 429 families common to all 14 eukaryotes
  • 80% of animal domains, 90% of fungi domains
  • 80% of proteins are multidomain in eukaryotes
  • domains usually combine pairwise in same order
    --why?

Chothia, Gough, Vogel, Teichmann, Science
300:1701-1703, 2003
Evolution of proteins happens mainly through
duplication, recombination, and divergence
36
Homology-based Prediction
  • Align query sequence with sequences of known
    structure, usually >30% similar
  • Superimpose the aligned sequence onto the
    structure template, according to the computed
    sequence alignment
  • Perform local refinement of the resulting
    structure in 3D
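
The template selection step can be sketched as follows, assuming the pairwise alignments are already computed and using percent identity over aligned columns as a stand-in for "similarity". The function names, the gap character `-`, and the template identifiers are hypothetical.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent of identical residues over columns where neither side is a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (q, t)
        for q, t in zip(aligned_query, aligned_template)
        if q != "-" and t != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for q, t in pairs if q == t)
    return 100.0 * matches / len(pairs)

def pick_template(alignments, threshold=30.0):
    """Return (template_id, identity) for the best template above `threshold`.

    alignments maps a template id to (aligned_query, aligned_template).
    Returns None when no template clears the threshold.
    """
    best = None
    for template_id, (query, template) in alignments.items():
        identity = percent_identity(query, template)
        if identity > threshold and (best is None or identity > best[1]):
            best = (template_id, identity)
    return best

# Toy alignments against two made-up templates.
candidates = {
    "templateA": ("ACDE", "ACDF"),
    "templateB": ("ACDE", "WWWW"),
}
print(pick_template(candidates))
```

When no template clears the threshold, homology modeling is not applicable and one would fall back to threading or ab initio methods.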

The number of unique structural folds is small
(possibly a few thousand)
90% of new structures submitted to PDB in the
past three years have similar folds in PDB
37
Examples of Fold Classes
38
Homology-based Prediction
39
Homology-based Prediction