Protein Structure Prediction by a Data-level Parallel Algorithm (Proceedings of the 1989 ACM/IEEE Conference on Supercomputing)

1
Protein Structure Prediction by a Data-level Parallel Algorithm
Proceedings of the 1989 ACM/IEEE Conference on Supercomputing

Speaker: Chuan-Cheng Lin
Advisor: Prof. R. C. T. Lee
CSIE, National Chi Nan University

2
Outline

Concepts
Introduction
Approach
Example
Conclusions
Reference

3
Concepts
[Figure: Cell, Chromosome, DNA, Gene; Protein synthesis. Labels: trillions; 23 pairs; 3 billion; 80,000; DNA words; polypeptides -> amino acids]
4
Concepts

Protein Synthesis

Transcription
Enzyme
Messenger RNA
5
Concepts

Protein Synthesis

Translation
6
Concepts
8
Concepts
Amino acid
9
Concepts
Peptide bond
10
Concepts
11
Concepts

Protein
Primary structure
Secondary structure
Tertiary structure
Quaternary structure

12
Introduction

What is a protein?
Why do we predict protein structure?
X-ray
NMR
19,006 known protein structures (22-Oct-2002)
How?

13
Introduction

Methods of protein structure prediction
AI
Neural Network
PHI-PSI
Potential Energy
Statistical method

14
Introduction
Determining the native folded state of a
protein given only its primary sequence of
amino acids is referred to as the protein
folding problem.
15
Introduction
The protein folding problem is, given an amino
acid sequence, to find its correctly folded 3D
protein structure. Protein folding in the
hydrophobic-hydrophilic (HP) model is
NP-complete [BL98].
16
Given a test protein sequence, we want to
compare every part of it against every part of
every protein in the database, and then select
the similar parts of the proteins in the database.
17
The Basic Algorithm
Step 1: Specify the initial parameters, such as
the initial window size W, the window weight
pattern P, and N, the number of best matches to
keep.
18
Window size
1. Large or small?
2. Five and seven are good choices for the
initial window size.
3. A smaller window is used to find the
best matches for the prediction with the next
larger window.
19
The Weight Pattern
Position  1  2  3  4  5
P         1  1  2  1  1
20
The Basic Algorithm
Step 2: Move the window over the test protein
sequence and, at each position, extract an amino
acid segment S of length W, and do the following:
21
The Basic Algorithm
2-1. Set the window size in every processor to be of length W.
2-2. Send S to every processor.
2-3. Match S against all s_i, i = 1, 2, ..., m, in all the processors, and compute a score for each s_i using a scoring function.
2-4. Select the N segments from s_1, ..., s_m which have the N highest scores.
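Steps 2-3 and 2-4 can be sketched serially in Python. This is only an illustration: the scoring function used in the talk is not preserved in this transcript, so `weighted_identity_score` is a hypothetical placeholder that adds the weight P[k] wherever the residues at position k agree. The names `top_n_matches` and `segments` are likewise assumptions, and the eight segments are the W = 5 windows from the example in this talk.

```python
import heapq


def weighted_identity_score(query, candidate, weights):
    # ASSUMPTION: placeholder scoring, not the function from the talk.
    # Adds the window weight at every position where the residues agree.
    return sum(w for q, c, w in zip(query, candidate, weights) if q == c)


def top_n_matches(query, segments, weights, n):
    # On the Connection Machine every processor would score its own segment
    # at the same time; here the segments are simply scored one after another.
    scored = (
        (weighted_identity_score(query, seg, weights), index)
        for index, seg in enumerate(segments, start=1)
    )
    # Keep the n (score, processor number) pairs with the highest scores.
    return heapq.nlargest(n, scored)


# Segments held by processors P1 to P8 in the example (W = 5).
segments = ["ALGGP", "LGGPE", "GGPEP", "GPEPY", "ALGGA", "LGGAS", "GGASE", "GASEW"]
best = top_n_matches("ALGGP", segments, weights=[1, 1, 2, 1, 1], n=2)
print(best)
```

Under this placeholder score the two best matches are the segments on processors 1 and 5. The absolute score values are not comparable to the ones reported in the talk, since the real scoring function is different.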
22
Compute a score
23
Why do we bother to use the top N matches
rather than just the one with the highest score?
If the majority of the top N matches have a
similar structure, then the input will at least
have a tendency to form that structure as
well.
24
The Basic Algorithm
Step 3: If the recursive mode is chosen, adjust
the parameters (e.g. the window size) and repeat
Step 2 unless the end conditions are met or
PHI-PSI has gone through a pre-specified number
of recursive levels.
25
Example
Step 1: Initial parameters: W = 5, N = 2,
recursive level = 1, Sr = 0
26
The Weight Pattern
Position  1  2  3  4  5
P         1  1  2  1  1
27
Step 2-1: The layout of the known protein
structure data on the Connection Machine

KP1: A L G G P E P Y

Processor 1
            A        L        G        G        P
PHI    -64.19  -100.49   106.63   -66.44   -92.02
PSI    -33.26     8.49     0.20   163.55    -6.88

Processor 4
            G        P        E        P        Y
PHI    -66.44   -92.02   -70.98   -70.98   -84.58
PSI    163.55    -6.88   140.07   141.20   120.99
28
KP2: A L G G A S E W

Processor 5
            A        L        G        G        A
PHI    -61.48   -94.70    83.20  -106.05  -136.41
PSI    -28.65     3.88    22.82    -8.01   142.77

Processor 8
            G        A        S        E        W
PHI   -106.05  -136.41   -61.15  -153.99  -125.64
PSI     -8.01   142.77   -37.26   171.98   120.48
29
P1: A L G G P
P2: L G G P E
P3: G G P E P
P4: G P E P Y
P5: A L G G A
P6: L G G A S
P7: G G A S E
P8: G A S E W
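The layout above, with one window of W residues and its PHI and PSI angles per processor, can be reproduced with a short Python sketch. The `Segment` class and `layout_segments` function are hypothetical names chosen for illustration; the residue and angle values are the KP1 and KP2 data from the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    residues: str  # amino acid letters in the window
    phi: tuple     # PHI angle of each residue, in degrees
    psi: tuple     # PSI angle of each residue, in degrees


def layout_segments(known_proteins, window):
    """Cut every known protein into overlapping windows, one per processor."""
    segments = []
    for residues, phi, psi in known_proteins:
        for start in range(len(residues) - window + 1):
            end = start + window
            segments.append(
                Segment(residues[start:end], tuple(phi[start:end]), tuple(psi[start:end]))
            )
    return segments


KP1 = ("ALGGPEPY",
       [-64.19, -100.49, 106.63, -66.44, -92.02, -70.98, -70.98, -84.58],
       [-33.26, 8.49, 0.20, 163.55, -6.88, 140.07, 141.20, 120.99])
KP2 = ("ALGGASEW",
       [-61.48, -94.70, 83.20, -106.05, -136.41, -61.15, -153.99, -125.64],
       [-28.65, 3.88, 22.82, -8.01, 142.77, -37.26, 171.98, 120.48])

processors = layout_segments([KP1, KP2], window=5)
for number, segment in enumerate(processors, start=1):
    print(f"P{number}: {segment.residues}")
```

Calling `layout_segments([KP1, KP2], window=7)` gives the four segments used at the next recursive level.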
30
Step 2-2
Testing protein sequence: ALGGPNAWTG
A L G G P N A W T G
S = ALGGP
Send S to P1 through P8
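Extracting the segment S at each window position is a plain sliding window. A minimal sketch, assuming a hypothetical helper named `windows`; the test sequence is the one from this example.

```python
def windows(sequence, window):
    """Yield (start position, segment S) for every window position."""
    for start in range(len(sequence) - window + 1):
        yield start, sequence[start:start + window]


test_sequence = "ALGGPNAWTG"
segments = list(windows(test_sequence, 5))
for start, s in segments:
    print(start, s)
```

The first segment produced is ALGGP, the S that is broadcast to all processors in this step.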
31
P1: A L G G P
P2: L G G P E
P3: G G P E P
P4: G P E P Y
P5: A L G G A
P6: L G G A S
P7: G G A S E
P8: G A S E W
S = ALGGP
32
Step 2-3
S = ALGGP, P1 = ALGGP
34
S = ALGGP, P2 = LGGPE
35
S = ALGGP, P3 = GGPEP
36
S = ALGGP, P4 = GPEPY
37
S = ALGGP, P5 = ALGGA
38
Step 2-4
Scores: 19, 23, 31.5, 40, 57.5, 63, 70, 80
39
S = ALGGP

P1
            A        L        G        G        P
PHI    -64.19  -100.49   106.63   -66.44   -92.02
PSI    -33.26     8.49     0.20   163.55    -6.88

P5
            A        L        G        G        A
PHI    -61.48   -94.70    83.20  -106.05  -136.41
PSI    -28.65     3.88    22.82    -8.01   142.77
40
S
            A        L        G        G        P
PHI    -64.19  -100.49   106.63   -66.44   -92.02
PSI    -33.26     8.49     0.20   163.55    -6.88

Test protein
            A        L        G        G        P        N        A        W        T        G
PHI    -64.19  -100.49   106.63   -66.44   -92.02  -108.72   -66.18   -73.71  -125.23   -85.96
PSI    -33.26     8.49     0.20   163.55    -6.88   116.38   155.62   125.74    18.36  -162.86
41
Step 3: if Sr < recursive level
then W = W + 2, Sr++, go to Step 2
else end
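The control flow of Step 3 can be sketched as a loop. This is a hedged illustration: `run_recursive` and `match_step` are hypothetical names, `match_step` stands in for all of Step 2, and the defaults are the example's initial parameters (W = 5, recursive level = 1, window growth of 2).

```python
def run_recursive(match_step, window=5, recursive_level=1, growth=2):
    """Run Step 2 once, then repeat it with a larger window at each level."""
    sr = 0
    results = [match_step(window)]
    while sr < recursive_level:
        window += growth  # W = W + 2
        sr += 1           # Sr++
        results.append(match_step(window))
    return results


# Stand-in for Step 2 that only records the window size it was called with.
visited = run_recursive(lambda w: w)
print(visited)
```

With the example's parameters, Step 2 runs with W = 5 and then once more with W = 7.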
42
The Weight Pattern
Position  1  2  3  4  5  6  7
P         1  2  2  3  2  2  1
43
Step 2-1: The layout of the known protein
structure data on the Connection Machine

KP1: A L G G P E P Y

Processor 1
            A        L        G        G        P        E        P
PHI    -64.19  -100.49   106.63   -66.44   -92.02   -70.98   -70.98
PSI    -33.26     8.49     0.20   163.55    -6.88   140.07   141.20

Processor 2
            L        G        G        P        E        P        Y
PHI   -100.49   106.63   -66.44   -92.02   -70.98   -70.98   -84.58
PSI      8.49     0.20   163.55    -6.88   140.07   141.20   120.99
44
KP2: A L G G A S E W

Processor 3
            A        L        G        G        A        S        E
PHI    -61.48   -94.70    83.20  -106.05  -136.41   -61.15  -153.99
PSI    -28.65     3.88    22.82    -8.01   142.77   -37.26   171.98

Processor 4
            L        G        G        A        S        E        W
PHI    -94.70    83.20  -106.05  -136.41   -61.15  -153.99  -125.64
PSI      3.88    22.82    -8.01   142.77   -37.26   171.98   120.48
45
Step 2-2
Testing protein sequence: ALGGPNA
A L G G P N A
S = ALGGPNA
Send S to P1 through P4
46
P1: A L G G P E P
P2: L G G P E P Y
P3: A L G G A S E
P4: L G G A S E W
S = A L G G P N A
47
Step 2-3
S = ALGGPNA, P1 = ALGGPEP
49
S = ALGGPNA, P2 = LGGPEPY
51
S = ALGGPNA, P3 = ALGGASE
53
S = ALGGPNA, P4 = LGGASEW
55
Step 2-4
Score 1 = -74.9
Score 2 = -970.63
Score 3 = -1592.74
Score 4 = -860.25
56
Processor 1
            A        L        G        G        P        E        P
PHI    -64.19  -100.49   106.63   -66.44   -92.02   -70.98   -70.98
PSI    -33.26     8.49     0.20   163.55    -6.88   140.07   141.20

Processor 4
            L        G        G        A        S        E        W
PHI    -94.70    83.20  -106.05  -136.41   -61.15  -153.99  -125.64
PSI      3.88    22.82    -8.01   142.77   -37.26   171.98   120.48
57
Test protein
            A        L        G        G        P        N        A
PHI    -64.19  -100.49   106.63   -66.44   -92.02   -70.98   -70.98
PSI    -33.26     8.49     0.20   163.55    -6.88   140.07   141.20
58
Prediction errors

The prediction errors are measured
in terms of the PHI and PSI angles.
There are several ways to measure the
errors, such as:
Residue errors
Overall errors

59
Residue errors: the difference between the real
angle values computed from the 3D coordinates
and the values predicted by the algorithm for a
particular residue in a protein.
Overall errors: the average of the residue
errors of all the proteins in the database.
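The two error measures can be sketched in Python. Two points here are assumptions rather than statements from the talk: angle differences are wrapped so that 170 and -170 degrees count as 20 degrees apart, and the overall error is read as the mean over all residues of all proteins pooled together. The function names are hypothetical.

```python
def angle_difference(real, predicted):
    """Smallest absolute difference between two angles, in degrees."""
    # ASSUMPTION: angles are periodic, so the difference is wrapped to 0..180.
    diff = abs(real - predicted) % 360.0
    return min(diff, 360.0 - diff)


def residue_errors(real_angles, predicted_angles):
    """Per-residue error between real and predicted angles of one protein."""
    return [angle_difference(r, p) for r, p in zip(real_angles, predicted_angles)]


def overall_error(errors_per_protein):
    """Mean residue error over all proteins (one reading of 'overall errors')."""
    pooled = [e for errors in errors_per_protein for e in errors]
    return sum(pooled) / len(pooled)


# Hypothetical values, for illustration only.
errors_a = residue_errors([170.0, -60.0], [-170.0, -50.0])
errors_b = residue_errors([90.0], [60.0])
print(errors_a, errors_b, overall_error([errors_a, errors_b]))
```

The same functions apply to PHI and PSI separately; how the two angles are combined into one figure is not stated in the transcript.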
60
Conclusions
Secondary Structure Prediction
62
Reference

[BL98] Berger, B. and Leighton, T., "Protein Folding in the
Hydrophobic-Hydrophilic (HP) Model is NP-Complete,"
Journal of Computational Biology,
Vol. 5, No. 1, 1998, pp. 27-40.
Protein Structure Prediction:
http://cmgm.stanford.edu/WWW/www_predict.html
PDB (Protein Data Bank):
http://www.rcsb.org/pdb/

63
Thank you
