Proteins Secondary Structure Predictions - PowerPoint PPT Presentation

Slides: 39

Provided by: 9918

Transcript and Presenter's Notes

1
Proteins Secondary Structure Predictions
Structural Bioinformatics
2
The first high-resolution structure of a
protein, myoglobin,
was solved in 1958 by Max Perutz and John Kendrew of
Cambridge University (they won the 1962 Nobel
Prize in Chemistry).
On 12.12.2013 there were 89,110 protein
structures in the protein structure
database. A great increase, but still an order of magnitude
lower than the total number of protein sequences in
sequence databases (close to 1,000,000)
3
What can we do to bridge the gap?
MERFGYTRAANCEAP.

Predicting the three-dimensional structure
of a protein from its sequence is very hard
(sometimes impossible).
However, we can predict the secondary structure
with relatively high precision.

4
What do we mean by secondary structure?

Secondary structures are the building blocks of
the protein structure

5
What do we mean by secondary structure?

Secondary structure is usually divided into
three categories:

Alpha helix
Beta strand (sheet)
Anything else: turn/loop
6
The different secondary structures are combined
together to form the tertiary structure of the
protein
7
[Figure labels: Secondary, Tertiary, RBP, Globin]
8
Secondary Structure Prediction

Given a primary sequence
ADSGHYRFASGFTYKKMNCTEAA
what secondary structure will it adopt
(alpha helix, beta strand or random coil)?

9
Secondary Structure Prediction Methods

Statistical methods
Based on amino acid frequencies
HMM (Hidden Markov Model)
Machine learning methods
SVM, neural networks

10
Chou and Fasman (1974)
Statistical Methods for SS prediction
Name            P(a)   P(b)   P(turn)
Alanine          142     83      66
Arginine          98     93      95
Aspartic Acid    101     54     146
Asparagine        67     89     156
Cysteine          70    119     119
Glutamic Acid    151     37      74
Glutamine        111    110      98
Glycine           57     75     156
Histidine        100     87      95
Isoleucine       108    160      47
Leucine          121    130      59
Lysine           114     74     101
Methionine       145    105      60
Phenylalanine    113    138      60
Proline           57     55     152
Serine            77     75     143
Threonine         83    119      96
Tryptophan       108    137      96
Tyrosine          69    147     114
Valine           106    170      50
The table gives the propensity of each amino acid to be part of a
certain secondary structure (e.g. proline has a
low propensity for being in an alpha helix or beta
sheet -> breaker)

Success rate of about 50%
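
The propensity table can be used as a simple lookup. The following is a minimal sketch, not the full Chou-Fasman procedure: it only reads the values from the table on this slide. The one-letter keys and the helper names `preferred_state` and `mean_propensities` are illustrative assumptions.

```python
# Propensities (x100) copied from the table on this slide: (P(a), P(b), P(turn)).
# One-letter amino acid codes are used as keys for convenience.
PROPENSITY = {
    "A": (142, 83, 66),   "R": (98, 93, 95),    "D": (101, 54, 146),
    "N": (67, 89, 156),   "C": (70, 119, 119),  "E": (151, 37, 74),
    "Q": (111, 110, 98),  "G": (57, 75, 156),   "H": (100, 87, 95),
    "I": (108, 160, 47),  "L": (121, 130, 59),  "K": (114, 74, 101),
    "M": (145, 105, 60),  "F": (113, 138, 60),  "P": (57, 55, 152),
    "S": (77, 75, 143),   "T": (83, 119, 96),   "W": (108, 137, 96),
    "Y": (69, 147, 114),  "V": (106, 170, 50),
}
STATES = ("alpha helix", "beta strand", "turn")


def preferred_state(residue):
    """Return the state with the highest propensity for one residue.

    Ties (e.g. cysteine, 119 vs 119) resolve to the first state listed.
    """
    scores = PROPENSITY[residue]
    return STATES[scores.index(max(scores))]


def mean_propensities(segment):
    """Average P(a), P(b) and P(turn) over all residues of a segment."""
    n = len(segment)
    return tuple(sum(PROPENSITY[r][i] for r in segment) / n for i in range(3))


if __name__ == "__main__":
    for aa in "PEV":
        print(aa, preferred_state(aa))
    print(mean_propensities("ADSGHYRFASGFTYKKMNCTEAA"))
```

Proline and glycine come out as turn formers here, which is the "breaker" behaviour mentioned above.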

11
Secondary Structure Method Improvements

Sliding window approach
Most alpha helices are about 12 residues long
Most beta strands are about 6 residues long
Look at all windows of size 6/12
Calculate a score for each window. If the score > threshold
-> predict this is an alpha helix/beta sheet

TGTAGPQLKCHIQWMLPLKK
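
The sliding-window idea can be sketched as follows. This is a hedged illustration only: the score is assumed to be the mean helix propensity P(a) of the window (values from the Chou and Fasman slide), and the threshold of 100 is a hypothetical choice, not a value given in the slides.

```python
# Helix propensities P(a) (x100) from the Chou and Fasman table.
P_ALPHA = {
    "A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111,
    "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
    "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106,
}


def window_scores(sequence, propensity, window):
    """Mean propensity of every window; list index = window start position."""
    return [
        sum(propensity[r] for r in sequence[i:i + window]) / window
        for i in range(len(sequence) - window + 1)
    ]


def predict(sequence, propensity, window, threshold, symbol="H"):
    """Mark every residue covered by at least one window scoring above threshold."""
    marks = ["-"] * len(sequence)
    for start, score in enumerate(window_scores(sequence, propensity, window)):
        if score > threshold:
            for j in range(start, start + window):
                marks[j] = symbol
    return "".join(marks)


if __name__ == "__main__":
    seq = "TGTAGPQLKCHIQWMLPLKK"
    # Window of 12 for helices, as on the slide; threshold 100 is an assumption.
    print(seq)
    print(predict(seq, P_ALPHA, window=12, threshold=100))
```

The same two functions would work for beta strands by passing a P(b) table and a window of 6.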
12
Improvements since the 1980s

Adding information from conservation in MSA
Smarter algorithms (e.g. Machine learning, HMM).

13
HMM (Hidden Markov Model) approach for
predicting Secondary Structure

HMM enables us to calculate the probability of
assigning a sequence to a secondary structure

TGTAGPQLKCHIQWML
HHHHHHHLLLLBBBBB
p = ?
14
Beginning with an α-helix
The probability of observing alanine as part of a
β-sheet
α-helix followed by α-helix
The probability of observing a residue which
belongs to an α-helix followed by a residue
belonging to a turn = 0.15
Table built according to a large database of known
secondary structures
15

Example
What is the probability that the sequence TGQ
will be in a helical structure?

TGQ -> HHH
p = 0.45 x 0.041 x 0.8 x 0.028 x 0.8 x 0.0635
  = 0.000020995 (about 0.0021%)
Success of HMM-based methods -> 75-80%
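
The product in this example can be reproduced with a few lines of code. This is a sketch under an assumption: following the previous slide, 0.45 is read as the probability of beginning with a helix, 0.8 as the helix-to-helix transition, and 0.041, 0.028 and 0.0635 as the probabilities of observing T, G and Q in a helix. The slides do not label the factors explicitly.

```python
# Parameters taken from the worked example; their roles are an assumed reading.
START = {"H": 0.45}                       # begin in a helix
TRANSITION = {("H", "H"): 0.8}            # helix followed by helix
EMISSION = {("H", "T"): 0.041,            # residue observed in a helix
            ("H", "G"): 0.028,
            ("H", "Q"): 0.0635}


def path_probability(sequence, states):
    """Joint probability of a residue sequence and one given state path."""
    p = START[states[0]] * EMISSION[(states[0], sequence[0])]
    for i in range(1, len(sequence)):
        p *= TRANSITION[(states[i - 1], states[i])]
        p *= EMISSION[(states[i], sequence[i])]
    return p


if __name__ == "__main__":
    p = path_probability("TGQ", "HHH")
    print(f"p = {p:.4e} ({p * 100:.5f}%)")
```

A full predictor would compare this value across all possible state paths (for example with the Viterbi algorithm) rather than score a single path.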
16

What can we learn from secondary structure
predictions?

17
Mad Cow Disease: PrPc to PrPsc
PrPc
PrPsc
18
How does the protein structure relate to the
primary protein sequence?
19
SEQUENCE
- Early experiments have shown that the sequence
of the protein is sufficient to determine its
structure (Anfinsen)
- Protein structure is more
conserved than protein sequence and more closely
related to function.
20
How (can) different amino acid sequences determine
similar protein structures?
Lesk and Chothia 1980
21
The Globin Family
22
Different sequences can result in similar
structures
1ecd
2hhd
23

Can we learn about the important features
which determine structure and function by
comparing the sequences and structures?

24
The Globin Family
25
Why is proline 36 conserved throughout the globin
family?
26
Where are the gaps?
The gaps in the pairwise alignment are mapped to
the loop regions
27
How are remote homologs related in terms of their
structure?
RBP
β-lactoglobulin
28
PSI-BLAST alignment of RBP and β-lactoglobulin,
iteration 3
Score = 159 bits (404), Expect = 1e-38
Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)

Query   3  WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ  54
            V  L  LA  A                 S  V ENFD     G WY   K
Sbjct   1  MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG  59

Query  55  DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ  114
             I A  S  E G      K          V             PAK
Sbjct  60  NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL-----  112

Query 115  KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA  164
                 WI  TDY  YA  YSC                  R P  LPPE
Sbjct 113  MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET  159
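
The identity and gap counts reported for this alignment can be recomputed from the two aligned strings. The sketch below assumes the three alignment blocks are simply concatenated; the function name `alignment_stats` is illustrative. The "Positives" count is not reproduced because it depends on the substitution matrix used by PSI-BLAST.

```python
# Aligned sequences from the PSI-BLAST output, blocks concatenated ('-' = gap).
QUERY = (
    "WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ"
    "DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ"
    "KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA"
)
SBJCT = (
    "MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG"
    "NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL-----"
    "MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET"
)


def alignment_stats(query, sbjct):
    """Return (identities, gaps, alignment length) for two aligned strings."""
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have equal length")
    identities = sum(q == s and q != "-" for q, s in zip(query, sbjct))
    gaps = sum(q == "-" or s == "-" for q, s in zip(query, sbjct))
    return identities, gaps, len(query)


if __name__ == "__main__":
    ident, gaps, length = alignment_stats(QUERY, SBJCT)
    print(f"Identities = {ident}/{length} ({100 * ident / length:.0f}%)")
    print(f"Gaps = {gaps}/{length} ({100 * gaps / length:.0f}%)")
```

Only about a quarter of the aligned positions are identical, which is the point of the slide: the two proteins are remote homologs by sequence.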
29
The Retinol Binding Protein
β-lactoglobulin
30
Taken together
MERFGYTRAANCEAP.
FUNCTION
31
Pfam

Database that contains a large collection of
multiple sequence alignments of protein families
(common structures)
Very useful for function prediction.

http://pfam.sanger.ac.uk/
32
The zinc-finger family (domain)
Known family of Transcription Factors
Protein sequence
ZINC FINGER DOMAIN
33
Pfam
Based on profile hidden Markov models (HMMs),
which represent the protein family.
An HMM, in comparison to a PSSM, is a model which considers
dependencies between the different columns in the
matrix (different residues) and is thus much more
powerful!
http://pfam.sanger.ac.uk/
34
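Profile HMM (Hidden Markov Model) can accurately
represent an MSA
[Figure: profile HMM for alignment columns 16-19, with match states M16-M19, insert states I16-I19 (emitting any residue X) and delete states D16-D19; transitions are labelled 100% or 50%]
MSA, columns 16-19:
D R T R
D R T S
S - - S
S P T R
D R T R
D P T S
D - - S
D - - S
D - - S
D - - R
Match state emission probabilities:
M16: D 0.8, S 0.2
M17: P 0.4, R 0.6
M18: T 1.0
M19: R 0.4, S 0.6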
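
The match-state emission probabilities on this slide can be derived from the four alignment columns by counting residues and ignoring gaps. The sketch below does only that; it assumes plain frequency counts without pseudocounts or sequence weighting, which real profile HMM software typically adds. The function names are illustrative.

```python
from collections import Counter

# The ten aligned sequences from the slide, columns 16-19 ('-' = gap).
MSA = ["DRTR", "DRTS", "S--S", "SPTR", "DRTR",
       "DPTS", "D--S", "D--S", "D--S", "D--R"]


def match_emissions(msa):
    """Per-column residue frequencies among the non-gap characters."""
    emissions = []
    for column in zip(*msa):
        residues = [c for c in column if c != "-"]
        counts = Counter(residues)
        emissions.append({aa: n / len(residues) for aa, n in counts.items()})
    return emissions


def gap_fractions(msa):
    """Fraction of sequences with a gap in each column (uses of a delete state)."""
    return [column.count("-") / len(column) for column in zip(*msa)]


if __name__ == "__main__":
    for number, probs in zip(range(16, 20), match_emissions(MSA)):
        print(f"M{number}:", dict(sorted(probs.items())))
    print("gap fractions:", gap_fractions(MSA))
```

Half of the sequences skip columns 17 and 18, which corresponds to the 50% labels on the transitions into the delete states.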
35
Extra Slides (for your interest)
36
Alpha Helix: Pauling (1951)

A consecutive stretch of 5-40 amino acids
(average 10).
A right-handed spiral conformation.
3.6 amino acids per turn.
Stabilized by hydrogen bonds

3.6 residues = 5.4 Å
37
Beta Strand: Pauling and Corey (1951)
β-strand
> An extended polypeptide chain is called a β-strand (consists of 5-10 amino acids)
> The chains are connected together by hydrogen bonds to form a β-sheet
β-sheet
38
Loops