Proteins Secondary Structure Predictions - PowerPoint PPT Presentation

Original file: Bioinformatics Tools; last modified by yaelmg; created 3/28/2003 11:41:44 AM; on-screen show (4:3), PowerPoint presentation.

1
Proteins Secondary Structure Predictions
Structural Bioinformatics
2
Structure Prediction Motivation
  • Better understand protein function
  • Broaden homology detection
  • Detect similar function where sequence differs
  • (only 50% of remote homologs can be detected
    based on sequence alone)
  • Explain disease
  • Explain the effect of mutations
  • Design drugs

3
Myoglobin: the first protein structure solved by
X-ray crystallography
Solved in 1958 by John Kendrew of Cambridge University
(Max Perutz solved the related hemoglobin structure).
Kendrew and Perutz shared the 1962 Nobel Prize in Chemistry.
As of 1.1.2012 there were 72,468 protein structures
in the Protein Data Bank (PDB). This is a great increase,
but still an order of magnitude lower than the 533,657 protein
sequences in UniProt.
4
What can we do?
MERFGYTRAANCEAP.
  • Predicting the three-dimensional structure
    of a protein from its sequence is very hard
    (sometimes impossible)
  • However, we can predict the secondary structure
    with relatively high precision

5
What do we mean by Secondary Structure?
  • Secondary structure elements are the building
    blocks of protein structure


6
What do we mean by Secondary Structure?
  • Secondary structure is usually divided into
    three categories:
    - Alpha helix
    - Beta strand (sheet)
    - Anything else: turn/loop
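Solved structures are usually annotated per residue with a finer alphabet, for example the eight DSSP codes. That annotation is then collapsed into the three categories above. The sketch below assumes one common reduction: H, G and I become helix, E and B become strand, and everything else becomes turn/loop. Conventions differ between benchmarks, so treat both the mapping and the function name as assumptions.

```python
# Assumed 8-to-3 reduction of DSSP codes; benchmarks differ in the details.
DSSP_TO_THREE_STATE = {
    "H": "H",  # alpha helix
    "G": "H",  # 3-10 helix
    "I": "H",  # pi helix
    "E": "E",  # extended strand in a beta ladder
    "B": "E",  # isolated beta bridge
}


def to_three_state(dssp: str) -> str:
    """Map each DSSP code to H (helix), E (strand) or C (turn/loop/anything else)."""
    return "".join(DSSP_TO_THREE_STATE.get(code, "C") for code in dssp)


print(to_three_state("HHHGGGIEEB TTS-"))  # HHHHHHHEEECCCCC
```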
7
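Alpha Helix (Pauling, 1951)
  • A consecutive stretch of 5-40 amino acids
    (average 10).
  • A right-handed spiral conformation.
  • 3.6 amino acids per turn.
  • Stabilized by hydrogen bonds between the backbone
    C=O of residue i and the N-H of residue i+4.

[Figure: one turn spans 3.6 residues and rises 5.4 Å]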
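The helix numbers above translate directly into sizes. Here is a minimal sketch that assumes the textbook rise of 1.5 Å per residue along the helix axis. That value is not stated on the slide; it gives 3.6 × 1.5 = 5.4 Å per turn. The function name is illustrative.

```python
RESIDUES_PER_TURN = 3.6  # from the slide
RISE_PER_RESIDUE = 1.5   # Å along the helix axis; textbook value, assumed here


def helix_geometry(n_residues: int) -> tuple[float, float]:
    """Return (number of turns, length along the axis in Å) for an ideal alpha helix."""
    turns = n_residues / RESIDUES_PER_TURN
    length = n_residues * RISE_PER_RESIDUE
    return turns, length


turns, length = helix_geometry(10)  # the average helix length quoted above
print(f"{turns:.1f} turns, {length:.1f} Å")  # 2.8 turns, 15.0 Å
```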
8
Beta Strand (Pauling and Corey, 1951)
  • Extended sections of the polypeptide chain (from the
    same or different chains) run alongside each other
    and are linked together by hydrogen bonds.
  • Each such section is called a β-strand and
    consists of 5-10 amino acids.

[Figure: a β-strand]
9
Beta Sheet
Adjacent strands pair up to form a β-sheet.
[Figure: antiparallel and parallel β-sheets]
10
Loops
  • Connect the secondary structure elements.
  • Vary in length and shape.
  • Are located at the surface of the folded protein and
    therefore may play an important role in biological
    recognition processes.

11
Three-Dimensional (Tertiary) Structure
  • Describes how the alpha helices, beta sheets and
    random coils of one whole polypeptide chain are
    packed with respect to each other

12
Secondary
Tertiary
?
?
RBP
?
Globin
13
How do the (secondary and tertiary) structures
relate to the primary protein sequence?
14
[Figure: SEQUENCE → STRUCTURE]
  • Early experiments (Anfinsen) showed that the sequence
    of a protein is sufficient to determine its structure.
  • Protein structure is more conserved than protein
    sequence and more closely related to function.
15
How Can Different Amino Acid Sequences Determine
Similar Protein Structures?
Lesk and Chothia 1980
16
The Globin Family
17
Different sequences can result in similar
structures (PDB: 1ecd, 2hhd)
18
  • By comparing sequences and structures, we can
    learn which features determine structure and
    function.

19
The Globin Family
20
Why is proline 36 conserved throughout the globin
family?
21
Where are the gaps?
The gaps in the pairwise alignment map to
the loop regions.
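The observation that gaps fall in loops can be checked mechanically once one of the two aligned proteins has a known per-residue secondary structure. The sketch below is a minimal version. The function name, the H/E/C alphabet (C = turn/loop) and the toy alignment are hypothetical, not taken from the globin data.

```python
def labels_at_gaps(aligned_query: str, aligned_known: str, known_ss: str) -> list[str]:
    """Secondary-structure labels of the known-structure residues that sit opposite a gap in the query.

    known_ss holds one label per (ungapped) residue of the known-structure sequence.
    """
    if len(aligned_query) != len(aligned_known):
        raise ValueError("aligned sequences must have equal length")
    labels = []
    residue = 0  # index into known_ss
    for q, k in zip(aligned_query, aligned_known):
        if k == "-":
            continue  # no known-structure residue in this column
        if q == "-":
            labels.append(known_ss[residue])
        residue += 1
    return labels


# Hypothetical toy alignment: the query lacks two residues that are loop (C) in the known structure.
print(labels_at_gaps("ACDE--FGHIK", "ACDEWYFGHIK", "HHHHCCEEEEE"))  # ['C', 'C']
```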
22
How are remote homologs related in terms of their
structure?
[Figure: structures of RBP and β-lactoglobulin]
23
PSI-BLAST alignment of RBP and β-lactoglobulin (iteration 3)
Score = 159 bits (404), Expect = 1e-38
Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)

Query  3    WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ  54
             V  L  LA  A                 S  V ENFD     G WY   K
Sbjct  1    MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG  59

Query  55   DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ  114
              I A  S  E G      K          V             PAK
Sbjct  60   NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL-----  112

Query  115  KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA  164
                  WI  TDY  YA  YSC                  R P  LPPE
Sbjct  113  MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET  159
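The identity and gap counts in the header can be checked directly from the aligned rows. The minimal sketch below concatenates the three query and subject blocks and counts columns. The "Positives" figure is not reproduced: it requires a substitution matrix (BLOSUM62 is BLAST's default), which is left out here. The function name is illustrative.

```python
# The three alignment blocks above, concatenated (query and subject rows only).
QUERY = ("WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ"
         "DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ"
         "KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA")
SBJCT = ("MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG"
         "NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL-----"
         "MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET")


def alignment_stats(query: str, sbjct: str) -> dict[str, int]:
    """Count alignment columns, identical residue pairs and gap columns in a pairwise alignment."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    identities = sum(q == s and q != "-" for q, s in zip(query, sbjct))
    gaps = sum(q == "-" or s == "-" for q, s in zip(query, sbjct))
    return {"length": len(query), "identities": identities, "gaps": gaps}


stats = alignment_stats(QUERY, SBJCT)
for key in ("identities", "gaps"):
    print(f"{key}: {stats[key]}/{stats['length']} ({100 * stats[key] // stats['length']}%)")
# identities: 41/170 (24%)
# gaps: 19/170 (11%)
```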
24
The Retinol Binding Protein
[Figure: RBP and β-lactoglobulin]
25
Structure Prediction
  • Goal: predict protein structure based
    on sequence information

26
Prediction Approaches
  • Two-stage approach
    1. Primary (sequence) to secondary structure
    2. Secondary to tertiary structure
  • One-stage approach
    - Primary to tertiary structure

27
Secondary Structure Prediction
  • Given a primary sequence such as
    ADSGHYRFASGFTYKKMNCTEAA,
    what secondary structure will it adopt?

28
Secondary Structure Prediction Methods
  • Chou-Fasman / GOR methods
    - Based on amino acid frequencies
  • Machine learning methods
    - PHDsec and PSIpred
  • HMM (Hidden Markov Model)

29
Chou and Fasman (1974)
Propensity values (×100; 100 = average)

Amino acid      P(α)   P(β)  P(turn)
Alanine          142     83       66
Arginine          98     93       95
Aspartic acid    101     54      146
Asparagine        67     89      156
Cysteine          70    119      119
Glutamic acid    151     37       74
Glutamine        111    110       98
Glycine           57     75      156
Histidine        100     87       95
Isoleucine       108    160       47
Leucine          121    130       59
Lysine           114     74      101
Methionine       145    105       60
Phenylalanine    113    138       60
Proline           57     55      152
Serine            77     75      143
Threonine         83    119       96
Tryptophan       108    137       96
Tyrosine          69    147      114
Valine           106    170       50
The propensity of an amino acid to be part of a
certain secondary structure (e.g. proline has a
low propensity to be in an alpha helix or beta
sheet, so it acts as a breaker)
  • Success rate of 50%
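A propensity compares how often a residue occurs in a state with how often that state occurs overall: P(state | residue) / P(state), scaled by 100. A value of 100 therefore means no preference. The minimal sketch below computes such a table from annotated sequences. The function name and the toy data are hypothetical. Real tables like the one above are derived from many solved structures.

```python
from collections import Counter


def propensities(sequences: list[str], structures: list[str]) -> dict[tuple[str, str], float]:
    """Chou-Fasman-style propensity P(state | residue) / P(state), scaled so that 100 = no preference."""
    residue_counts, state_counts, pair_counts = Counter(), Counter(), Counter()
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("each sequence needs one structure label per residue")
        for residue, state in zip(seq, ss):
            residue_counts[residue] += 1
            state_counts[state] += 1
            pair_counts[residue, state] += 1
    total = sum(residue_counts.values())
    return {
        (residue, state): 100 * (pair_counts[residue, state] / residue_counts[residue])
        / (state_counts[state] / total)
        for residue in residue_counts
        for state in state_counts
    }


# Hypothetical toy data (H = helix, E = strand, C = turn/loop).
table = propensities(["AAAAGPVVVV"], ["HHHHCCEEEE"])
print(round(table["A", "H"]), round(table["P", "H"]), round(table["V", "E"]))  # 250 0 250
```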

30
Secondary Structure Method Improvements
  • Sliding window approach
  • Typical alpha helices are about 12 residues long;
    typical beta strands are about 6 residues long
  • Look at all windows of size 12 (helix) and 6 (strand)
  • Calculate a score for each window. If the score is
    > threshold, predict an alpha helix/beta strand there

TGTAGPOLKCHIQWMLPLKK
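The sliding-window steps can be sketched in Python. This is a simplified reading of the slide, not the full Chou-Fasman procedure, which uses nucleation and extension rules. The window sizes 12 and 6 come from the slide. The following are assumptions:
- a threshold of 100 (the table's "average" propensity);
- conflicts between helix and strand are resolved in favour of the higher score;
- letters outside the table, such as the O in the example, are treated as neutral.

```python
# (P(alpha), P(beta)) from the Chou-Fasman table, keyed by one-letter code.
PROPENSITY = {
    "A": (142, 83), "R": (98, 93), "D": (101, 54), "N": (67, 89), "C": (70, 119),
    "E": (151, 37), "Q": (111, 110), "G": (57, 75), "H": (100, 87), "I": (108, 160),
    "L": (121, 130), "K": (114, 74), "M": (145, 105), "F": (113, 138), "P": (57, 55),
    "S": (77, 75), "T": (83, 119), "W": (108, 137), "Y": (69, 147), "V": (106, 170),
}
NEUTRAL = (100, 100)  # assumption: letters outside the table (e.g. "O") count as average


def predict(seq: str, threshold: float = 100.0, helix_window: int = 12, strand_window: int = 6) -> str:
    """Label each residue H, E or C from windows whose mean propensity exceeds the threshold."""
    n = len(seq)
    best = {"H": [0.0] * n, "E": [0.0] * n}  # best passing window score covering each residue
    for state, column, width in (("H", 0, helix_window), ("E", 1, strand_window)):
        for start in range(n - width + 1):
            window = seq[start:start + width]
            score = sum(PROPENSITY.get(aa, NEUTRAL)[column] for aa in window) / width
            if score > threshold:
                for i in range(start, start + width):
                    best[state][i] = max(best[state][i], score)
    labels = []
    for h, e in zip(best["H"], best["E"]):
        if h == 0.0 and e == 0.0:
            labels.append("C")  # no window passed the threshold
        else:
            labels.append("H" if h >= e else "E")  # conflicts go to the higher-scoring state
    return "".join(labels)


print(predict("TGTAGPOLKCHIQWMLPLKK"))
```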
31
Improvements since the 1980s
  • Adding conservation information from multiple
    sequence alignments (MSA)
  • Smarter algorithms (e.g. machine learning, HMM).

Success rate > 75-80%
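Success rates for three-state predictors are usually reported as Q3: the percentage of residues whose predicted state matches the observed one. Assuming that is the measure meant here, it can be computed as below. The example label strings are hypothetical.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted 3-state label (H/E/C) matches the observed label."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty label strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


print(q3("HHHHCCEEEC", "HHHCCCEEEE"))  # 80.0
```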
32
Machine learning approach for predicting
Secondary Structure (PHD, PSIpred)
[Figure: the query sequence is searched against SwissProt]
  • Step 1
  • Generating a multiple sequence alignment

[Figure: the query aligned with subject sequences]
33
  • Step 2
  • Additional sequences are added using a profile.
    We end up with an MSA that represents the protein
    family.

[Figure: the query and its subjects form a seed that is extended into a family MSA]
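The profile in Step 2 can be illustrated with a minimal sketch: per-column amino-acid frequencies of the family MSA. This is a simplification assumed for illustration. Tools such as PSIPRED instead build a PSI-BLAST position-specific scoring matrix, which adds sequence weighting, pseudocounts and log-odds scaling. The MSA rows below are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_profile(msa: list[str]) -> list[dict[str, float]]:
    """Per-column amino-acid frequencies of an MSA; gaps and unknown letters are ignored."""
    if not msa or any(len(row) != len(msa[0]) for row in msa):
        raise ValueError("need at least one row, all of the same length")
    profile = []
    for column in zip(*msa):
        counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append({aa: counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS})
    return profile


# Hypothetical family MSA: query first, then subjects.
msa = ["ACDK", "ACEK", "GC-R"]
profile = frequency_profile(msa)
print(profile[0]["A"], profile[1]["C"], profile[2]["D"])  # 0.6666666666666666 1.0 0.5
```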
34
Step 3
  • The sequence profile of the protein family is
    compared (by machine learning methods) to
    sequences with known secondary structure.

[Figure: the family MSA, built from the query seed, is compared by a machine learning method with known structures]
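PHD and PSIpred pass a window of profile columns around each residue to neural networks trained on sequences of known structure. The sketch below shows the same data flow under several substitutions:
- a 1-nearest-neighbour classifier stands in for the neural network;
- one-hot encoded sequences stand in for real MSA profiles;
- the window half-width of 2, the function names and the training data are all hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_profile(seq: str) -> list[list[float]]:
    """Stand-in for a real MSA profile: one row of 20 values per residue."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]


def window_features(profile: list[list[float]], i: int, half_width: int) -> list[float]:
    """Concatenate profile rows i-half_width..i+half_width; positions off the ends become zero rows."""
    features: list[float] = []
    for j in range(i - half_width, i + half_width + 1):
        features.extend(profile[j] if 0 <= j < len(profile) else [0.0] * len(AMINO_ACIDS))
    return features


def predict_ss(query_profile: list[list[float]],
               known: list[tuple[list[list[float]], str]],
               half_width: int = 2) -> str:
    """Label each query position with the state of the closest training window (1-nearest neighbour)."""
    training = [
        (window_features(profile, i, half_width), label)
        for profile, ss in known
        for i, label in enumerate(ss)
    ]
    labels = []
    for i in range(len(query_profile)):
        x = window_features(query_profile, i, half_width)
        _, label = min(((math.dist(x, vec), label) for vec, label in training),
                       key=lambda pair: pair[0])
        labels.append(label)
    return "".join(labels)


# Hypothetical training data: one "known structure" with its H/E/C labels.
known = [(one_hot_profile("AAAAVVVV"), "HHHHEEEE")]
print(predict_ss(one_hot_profile("AAAAVVVV"), known))  # HHHHEEEE
```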
35
HMM approach for predicting Secondary Structure
(SAM)
  • An HMM lets us calculate the probability of
    assigning a given secondary structure to a sequence

Sequence:   TGTAGPOLKCHIQWML
Structure:  HHHHHHHLLLLBBBBB
p = ?
36
[Figure: probability table, annotated with:]
  • Beginning with an α-helix
  • The probability of observing alanine as part of a β-sheet
  • α-helix followed by α-helix
  • The probability of observing a residue that belongs to an
    α-helix followed by a residue belonging to a turn: 0.15
The table is built from a large database of known
secondary structures.
37
  • The above table enables us to calculate the
    probability of assigning secondary structure to a
    protein
  • Example

Sequence: TGQ    Structure: HHH
p = 0.45 × 0.041 × 0.8 × 0.028 × 0.8 × 0.0635 ≈ 2.1 × 10^-5 (0.0021%)
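The example reads as a standard HMM path probability:
- start in helix (0.45), emit T from helix (0.041);
- stay in helix (0.8), emit G (0.028);
- stay in helix (0.8), emit Q (0.0635).

The product is about 2.1 × 10^-5. The minimal sketch below encodes only the probabilities quoted on the slides. The full table is not reproduced, so any other state/residue pair raises a KeyError. The assignment of each factor to a start, transition or emission entry is an interpretation of the example, and the dictionary and function names are hypothetical.

```python
# Only the probabilities quoted on the slides; the rest of the table is not reproduced here.
# States: H = alpha helix, L = turn/loop (the slides' alphabet).
START = {"H": 0.45}
TRANSITION = {("H", "H"): 0.8, ("H", "L"): 0.15}
EMISSION = {("H", "T"): 0.041, ("H", "G"): 0.028, ("H", "Q"): 0.0635}


def path_probability(sequence: str, states: str) -> float:
    """Joint probability of a residue sequence and one secondary-structure assignment."""
    if len(sequence) != len(states) or not sequence:
        raise ValueError("need one state per residue")
    p = START[states[0]] * EMISSION[states[0], sequence[0]]
    for i in range(1, len(sequence)):
        p *= TRANSITION[states[i - 1], states[i]] * EMISSION[states[i], sequence[i]]
    return p


print(f"{path_probability('TGQ', 'HHH'):.4g}")  # 2.099e-05
```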