Title: Protein Structure Prediction by a Data-level Parallel
Proceedings of the 1989 ACM/IEEE Conference on Supercomputing
- Speaker: Chuan-Cheng Lin
- Advisor: Prof. R. C. T. Lee
- CSIE, National Chi Nan University
Outline
- Concepts
- Introduction
- Approach
- Example
- Conclusions
- Reference
Concepts
[Figure: Cell, Chromosome, DNA, Gene; labels include "trillions", "23 pairs", "3 billion", "80,000", "DNA words", "Protein synthesis", and "polypeptides -> amino acids"]
Concepts
[Figure: Transcription; labels: Enzyme, Messenger RNA]

Concepts
[Figure: Translation]
Concepts
[Figure: Amino acid]

Concepts
[Figure: Peptide bond]
Concepts
- Protein
  - Primary structure
  - Secondary structure
  - Tertiary structure
  - Quaternary structure
Introduction
- What is a protein?
- Why do we predict protein structure?
  - X-ray
  - NMR
  - 19,006 known protein structures (as of 22 Oct 2002)
- How?
Introduction
- Methods of protein structure prediction
  - AI
  - Neural networks
  - PHI-PSI
  - Potential energy
  - Statistical methods
Introduction
Determining the native folded state of a protein given only its primary sequence of amino acids is referred to as the protein folding problem.
Introduction
The protein folding problem is: given an amino acid sequence, find its correctly folded 3D protein structure. Protein folding in the Hydrophobic-Hydrophilic (HP) model is NP-complete [BL98].
Given a test protein sequence, we want to compare every part of it against every part of every protein in the database, and then select the most similar parts of the proteins in the database.
The Basic Algorithm
Step 1: Specify the initial parameters, such as the initial window size W, the window weight pattern P, and N, the number of best matches to keep.
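Here is a minimal Python sketch of the Step 1 parameters, assuming they are bundled into one record. The class and field names (`PhiPsiParams`, `window_size`, `weights`, `n_best`, `recursive_levels`) are hypothetical. The check that the weight pattern has one entry per window position follows from the weight patterns in this talk, each of which has exactly W entries.

```python
from dataclasses import dataclass


@dataclass
class PhiPsiParams:
    """Initial parameters for Step 1 (hypothetical names, for illustration)."""
    window_size: int           # W, the initial window size
    weights: list[float]       # P, one weight per window position
    n_best: int                # N, the number of best matches to keep
    recursive_levels: int = 0  # how many extra passes of Step 2 are allowed

    def __post_init__(self) -> None:
        # The weight pattern must supply exactly one weight per window position.
        if len(self.weights) != self.window_size:
            raise ValueError("len(weights) must equal window_size")
        if self.n_best < 1:
            raise ValueError("n_best must be at least 1")


# The values used in the example: W = 5, P = 1 1 2 1 1, N = 2, one recursive level.
params = PhiPsiParams(window_size=5, weights=[1, 1, 2, 1, 1], n_best=2, recursive_levels=1)
print(params)
```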
Window size
1. Large or small?
2. Five and seven are good choices for the initial window size.
3. A smaller window is used to find the best matches for predicting the next, larger window.
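Sliding a window of size W over a sequence can be sketched as follows. The helper name `windows` is hypothetical. Applied to the known protein ALGGPEPY with W = 5, it yields the four segments that the example stores in processors P1 to P4.

```python
def windows(sequence: str, w: int) -> list[tuple[int, str]]:
    """Return (start, segment) for every full length-w window of `sequence`."""
    if w <= 0:
        raise ValueError("window size must be positive")
    # A sequence of length L has L - w + 1 full windows (none if L < w).
    return [(i, sequence[i:i + w]) for i in range(len(sequence) - w + 1)]


for start, segment in windows("ALGGPEPY", 5):
    print(start, segment)
```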
The Weight Pattern (W = 5)
Position  1  2  3  4  5
P         1  1  2  1  1
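The weight pattern scales the contribution of each window position, so with this pattern the center residue counts twice as much as the others. The transcript does not show the actual scoring function. This sketch therefore assumes a hypothetical identity similarity: 1 for identical amino acids and 0 otherwise. An amino acid substitution matrix could be used in its place.

```python
def weighted_score(s: str, t: str, weights: list[float]) -> float:
    """Weighted position-by-position similarity of two equal-length segments.

    ASSUMPTION: per-residue similarity is 1 for identical amino acids and 0
    otherwise; the slides do not show the real scoring function.
    """
    if not (len(s) == len(t) == len(weights)):
        raise ValueError("segments and weight pattern must have the same length")
    # Add the weight of every position where the two residues agree.
    return sum(w for a, b, w in zip(s, t, weights) if a == b)


P = [1, 1, 2, 1, 1]
print(weighted_score("ALGGP", "ALGGP", P))  # 6
print(weighted_score("ALGGP", "ALGGA", P))  # 5
```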
The Basic Algorithm
Step 2: Move the window over the test protein sequence. At each position, extract an amino acid segment S of length W, and do the following:
The Basic Algorithm
2-1. Set the window size in every processor to W.
2-2. Send S to every processor.
2-3. Match S against all segments s_i, i = 1, 2, ..., m, in all the processors, and compute a score for each s_i using a scoring function.
2-4. Select the N segments from s_1, ..., s_m that have the N highest scores.
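Step 2 can be sketched sequentially, assuming the same hypothetical identity-based weighted score. A plain loop stands in for the Connection Machine processors; on the real machine, each processor would hold one segment s_i and score it in parallel. The function names are illustrative.

```python
import heapq


def score(s: str, t: str, weights: list[float]) -> float:
    # ASSUMPTION: identity similarity (1 if residues match, else 0), weighted by P.
    return sum(w for a, b, w in zip(s, t, weights) if a == b)


def step2(test_seq: str, database: list[str], weights: list[float], n_best: int):
    """Return (start, S, [(i, score), ...]) for every window S of the test sequence.

    Each database entry is a length-W segment s_i; on the Connection Machine each
    would live in its own processor and be scored at the same time (Step 2-3).
    """
    w = len(weights)                      # 2-1: window size W
    results = []
    for start in range(len(test_seq) - w + 1):
        s = test_seq[start:start + w]     # the segment S at this position
        # 2-2 / 2-3: compare S with every stored segment s_1..s_m
        scores = [score(s, si, weights) for si in database]
        # 2-4: indices of the N highest scores (ties keep database order)
        best = heapq.nlargest(n_best, range(len(database)), key=scores.__getitem__)
        results.append((start, s, [(i, scores[i]) for i in best]))
    return results


database = ["ALGGP", "LGGPE", "GGPEP", "GPEPY", "ALGGA", "LGGAS", "GGASE", "GASEW"]
for start, s, top in step2("ALGGPNAWTG", database, [1, 1, 2, 1, 1], n_best=2):
    print(start, s, top)
```

With these inputs, the first window ALGGP keeps database entries 0 and 4 (ALGGP and ALGGA) as its two best matches.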
Compute a score
Why bother to use the top N matches rather than just the one with the highest score?
If the majority of the top N matches have a similar structure, then the input will at least have a tendency to form that structure as well.
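The argument above amounts to a majority vote over the top N matches. Here is a minimal sketch, assuming each match has already been reduced to a structure label. The labels and the strict-majority rule are assumptions and do not come from the slides.

```python
from collections import Counter
from typing import Optional


def majority_structure(match_labels: list[str]) -> Optional[str]:
    """Return the structure label shared by a strict majority of the top-N
    matches, or None if no label has a majority.

    ASSUMPTION: labels such as "helix" or "coil" are hypothetical inputs; the
    slides do not say how the structures of matches are compared.
    """
    if not match_labels:
        return None
    label, count = Counter(match_labels).most_common(1)[0]
    return label if count > len(match_labels) / 2 else None


print(majority_structure(["helix", "helix", "coil"]))   # helix
print(majority_structure(["helix", "strand", "coil"]))  # None
```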
The Basic Algorithm
Step 3: If the recursive mode is chosen, adjust the parameters (e.g., the window size) and repeat Step 2, unless the end conditions are met or PHI-PSI has gone through a pre-specified number of recursive levels.
Example
Step 1: Initial parameters: W = 5, P = (1, 1, 2, 1, 1), N = 2, recursive level = 1, Sr = 0 (Sr counts the recursive levels completed so far)
Step 2-1: The layout of the known protein structure data on the Connection Machine
Known protein KP1: A L G G P E P Y

Processor 1: A L G G P
PHI  -64.19  -100.49  106.63  -66.44  -92.02
PSI  -33.26    8.49    0.20  163.55   -6.88

Processor 4: G P E P Y
PHI  -66.44  -92.02  -70.98  -70.98  -84.58
PSI  163.55   -6.88  140.07  141.20  120.99
Known protein KP2: A L G G A S E W

Processor 5: A L G G A
PHI  -61.48  -94.70  83.20  -106.05  -136.41
PSI  -28.65    3.88  22.82    -8.01   142.77

Processor 8: G A S E W
PHI  -106.05  -136.41  -61.15  -153.99  -125.64
PSI    -8.01   142.77  -37.26   171.98   120.48
Segments held by the processors (W = 5):
P1: A L G G P
P2: L G G P E
P3: G G P E P
P4: G P E P Y
P5: A L G G A
P6: L G G A S
P7: G G A S E
P8: G A S E W
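The layout can be reproduced from the two known proteins and the PHI/PSI angles given on the slides. Every length-W segment goes to its own processor, together with the angles of its residues. The `Processor` record and `layout` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    protein: str       # name of the known protein, e.g. "KP1"
    start: int         # 0-based offset of the segment within that protein
    segment: str       # amino acids held by this processor
    phi: list[float]   # PHI angle of each residue in the segment
    psi: list[float]   # PSI angle of each residue in the segment


def layout(proteins: dict[str, tuple[str, list[float], list[float]]], w: int) -> list[Processor]:
    """Assign every length-w segment of every known protein to its own processor."""
    procs = []
    for name, (seq, phi, psi) in proteins.items():
        for i in range(len(seq) - w + 1):
            procs.append(Processor(name, i, seq[i:i + w], phi[i:i + w], psi[i:i + w]))
    return procs


# Angles as listed on the slides for KP1 and KP2.
known = {
    "KP1": ("ALGGPEPY",
            [-64.19, -100.49, 106.63, -66.44, -92.02, -70.98, -70.98, -84.58],
            [-33.26, 8.49, 0.20, 163.55, -6.88, 140.07, 141.20, 120.99]),
    "KP2": ("ALGGASEW",
            [-61.48, -94.70, 83.20, -106.05, -136.41, -61.15, -153.99, -125.64],
            [-28.65, 3.88, 22.82, -8.01, 142.77, -37.26, 171.98, 120.48]),
}

for k, p in enumerate(layout(known, 5), start=1):
    print(f"P{k}: {p.protein} {p.segment}")
```

With W = 7, the same function yields the four segments ALGGPEP, LGGPEPY, ALGGASE, and LGGASEW used in the second recursive level.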
Step 2-2
Test protein sequence: A L G G P N A W T G
S = ALGGP
Send S to P1 through P8.
Step 2-3
S = ALGGP vs. P1 = ALGGP
S = ALGGP vs. P2 = LGGPE
S = ALGGP vs. P3 = GGPEP
S = ALGGP vs. P4 = GPEPY
S = ALGGP vs. P5 = ALGGA
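On the Connection Machine, Step 2-3 runs in all processors at once. The sketch below mimics that data-level parallelism with NumPy broadcasting: S is compared against every stored segment in a single vectorized operation. The identity-based scoring is an assumption, so these scores are not the ones reported in Step 2-4. Under this assumption, the two best matches are still P1 and P5, the same pair the example selects.

```python
import numpy as np

# Rows = processors P1..P8, columns = window positions (W = 5).
db = np.array([list(s) for s in
               ["ALGGP", "LGGPE", "GGPEP", "GPEPY", "ALGGA", "LGGAS", "GGASE", "GASEW"]])
weights = np.array([1, 1, 2, 1, 1])
S = np.array(list("ALGGP"))

# Broadcasting S against every row compares all processors at once (SIMD style).
# ASSUMPTION: identity similarity (1 if residues match, else 0), weighted by P.
scores = (db == S).astype(int) @ weights
print(scores)  # [6 2 1 0 5 2 0 0]

# Step 2-4: indices of the N = 2 highest scores (stable sort keeps ties in order).
best = np.argsort(-scores, kind="stable")[:2]
print(best + 1)  # processor numbers: [1 5]
```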
Step 2-4
Scores: 19, 23, 31.5, 40, 57.5, 63, 70, 80
The N = 2 best matches for S = A L G G P:

P1: A L G G P
PHI  -64.19  -100.49  106.63  -66.44  -92.02
PSI  -33.26    8.49    0.20  163.55   -6.88

P5: A L G G A
PHI  -61.48  -94.70  83.20  -106.05  -136.41
PSI  -28.65    3.88  22.82    -8.01   142.77
S: A L G G P (angles taken from P1)
PHI  -64.19  -100.49  106.63  -66.44  -92.02
PSI  -33.26    8.49    0.20  163.55   -6.88

Test protein: A L G G P N A W T G
PHI  -64.19  -100.49  106.63  -66.44  -92.02  -108.72  -66.18  -73.71  -125.23  -85.96
PSI  -33.26    8.49    0.20  163.55   -6.88   116.38  155.62  125.74   18.36  -162.86
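In the example, the residues of S take the PHI/PSI angles of the best match, P1. Below is a minimal sketch of that copy step. It assumes predictions are stored in per-residue lists, with None marking positions not yet predicted. The slides do not state how overlapping windows are combined, so this sketch simply overwrites the covered positions. The function name `copy_best_match` is hypothetical.

```python
from typing import Optional


def copy_best_match(pred_phi: list[Optional[float]], pred_psi: list[Optional[float]],
                    start: int, match_phi: list[float], match_psi: list[float]) -> None:
    """Write the best match's PHI/PSI angles onto the test-protein residues
    covered by the window that begins at `start` (0-based).

    ASSUMPTION: angles are copied unchanged, as in the example where S = ALGGP
    receives P1's angles; existing values in that range are overwritten.
    """
    if start + len(match_phi) > len(pred_phi):
        raise ValueError("window runs past the end of the test protein")
    for k, (phi, psi) in enumerate(zip(match_phi, match_psi)):
        pred_phi[start + k] = phi
        pred_psi[start + k] = psi


test_protein = "ALGGPNAWTG"
pred_phi: list[Optional[float]] = [None] * len(test_protein)
pred_psi: list[Optional[float]] = [None] * len(test_protein)

# The best match for S = ALGGP (the window starting at residue 0) is P1.
p1_phi = [-64.19, -100.49, 106.63, -66.44, -92.02]
p1_psi = [-33.26, 8.49, 0.20, 163.55, -6.88]
copy_best_match(pred_phi, pred_psi, 0, p1_phi, p1_psi)
print(pred_phi)
```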
Step 3
if Sr < recursive level then
    W = W + 2
    Sr = Sr + 1
    go to Step 2
else
    end
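The Step 3 control flow can be sketched as a loop. Here `step2` is a hypothetical callable standing in for Step 2 run at the current window size. A new weight pattern is also needed for each new W; the slides give (1, 2, 2, 3, 2, 2, 1) for W = 7. With the example's parameters (W = 5, recursive level = 1), the loop runs Step 2 twice, with W = 5 and then W = 7.

```python
from typing import Callable


def run_phi_psi(w: int, recursive_level: int, step2: Callable[[int], None]) -> list[int]:
    """Run Step 2, then repeat it with a window 2 residues wider until
    Sr reaches the recursive level. Returns the window sizes used."""
    sr = 0
    sizes = []
    while True:
        step2(w)          # Step 2 with the current window size
        sizes.append(w)
        if sr < recursive_level:
            w += 2        # W = W + 2
            sr += 1       # Sr = Sr + 1, then go to Step 2
        else:
            break         # end
    return sizes


print(run_phi_psi(5, 1, lambda w: None))  # [5, 7]
```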
The Weight Pattern (W = 7)
Position  1  2  3  4  5  6  7
P         1  2  2  3  2  2  1
Step 2-1: The layout of the known protein structure data on the Connection Machine (W = 7)
Known protein KP1: A L G G P E P Y

Processor 1: A L G G P E P
PHI  -64.19  -100.49  106.63  -66.44  -92.02  -70.98  -70.98
PSI  -33.26    8.49    0.20  163.55   -6.88  140.07  141.20

Processor 2: L G G P E P Y
PHI  -100.49  106.63  -66.44  -92.02  -70.98  -70.98  -84.58
PSI     8.49    0.20  163.55   -6.88  140.07  141.20  120.99
Known protein KP2: A L G G A S E W

Processor 3: A L G G A S E
PHI  -61.48  -94.70  83.20  -106.05  -136.41  -61.15  -153.99
PSI  -28.65    3.88  22.82    -8.01   142.77  -37.26   171.98

Processor 4: L G G A S E W
PHI  -94.70  83.20  -106.05  -136.41  -61.15  -153.99  -125.64
PSI    3.88  22.82    -8.01   142.77  -37.26   171.98   120.48
Step 2-2
Test protein sequence: A L G G P N A
S = ALGGPNA
Send S to P1 through P4.
Segments held by the processors (W = 7):
P1: A L G G P E P
P2: L G G P E P Y
P3: A L G G A S E
P4: L G G A S E W
S = A L G G P N A
Step 2-3
S = ALGGPNA vs. P1 = ALGGPEP
S = ALGGPNA vs. P2 = LGGPEPY
S = ALGGPNA vs. P3 = ALGGASE
S = ALGGPNA vs. P4 = LGGASEW
Step 2-4
Score(P1) = -74.9, Score(P2) = -970.63, Score(P3) = -1592.74, Score(P4) = -860.25
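Step 2-4 keeps the N = 2 highest scores, even though all four scores here are negative. A short check with the reported scores, using the standard-library `heapq.nlargest`, selects P1 and P4. These are the two processors the example shows as the best matches.

```python
import heapq

# Scores reported in Step 2-4 for S = ALGGPNA (processor number -> score).
scores = {1: -74.9, 2: -970.63, 3: -1592.74, 4: -860.25}

# Keep the N = 2 processors with the highest scores (higher is better).
best = heapq.nlargest(2, scores, key=scores.get)
print(best)  # [1, 4]
```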
The N = 2 best matches for S = A L G G P N A:

Processor 1: A L G G P E P
PHI  -64.19  -100.49  106.63  -66.44  -92.02  -70.98  -70.98
PSI  -33.26    8.49    0.20  163.55   -6.88  140.07  141.20

Processor 4: L G G A S E W
PHI  -94.70  83.20  -106.05  -136.41  -61.15  -153.99  -125.64
PSI    3.88  22.82    -8.01   142.77  -37.26   171.98   120.48
Test protein: A L G G P N A
PHI  -64.19  -100.49  106.63  -66.44  -92.02  -70.98  -70.98
PSI  -33.26    8.49    0.20  163.55   -6.88  140.07  141.20
Prediction errors
- The prediction errors are measured in terms of PHI and PSI angles.
- There are several ways to measure the errors, such as:
  - Residue errors
  - Overall errors
Residue errors: the difference between the real angle values, computed from the 3D coordinates, and the values predicted by the algorithm for a particular residue in a protein.
Overall errors: the average of the residue errors over all the proteins in the database.
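The two error measures can be sketched as follows. Two assumptions are made here, since the slides specify neither. First, angle differences wrap around at 360 degrees, so -170 and 170 are 20 degrees apart rather than 340. Second, the overall error pools every residue from every protein into one average rather than averaging per-protein means. The function names are hypothetical.

```python
def angle_diff(predicted: float, real: float) -> float:
    """Absolute difference between two angles in degrees, wrapped to [0, 180].

    ASSUMPTION: differences wrap around; the slides do not say so explicitly.
    """
    d = abs(predicted - real) % 360.0
    return min(d, 360.0 - d)


def residue_error(pred: tuple[float, float], real: tuple[float, float]) -> tuple[float, float]:
    """(PHI error, PSI error) for one residue, given (PHI, PSI) pairs."""
    return angle_diff(pred[0], real[0]), angle_diff(pred[1], real[1])


def overall_error(per_protein: list[list[float]]) -> float:
    """Average of all residue errors (one angle type) over all proteins.

    ASSUMPTION: every residue of every protein is pooled into one average.
    """
    all_errors = [e for protein in per_protein for e in protein]
    return sum(all_errors) / len(all_errors)


print(residue_error((170.0, -60.0), (-170.0, -45.0)))  # (20.0, 15.0)
print(overall_error([[10.0, 20.0], [30.0]]))           # 20.0
```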
Conclusions
Secondary Structure Prediction
Reference
- [BL98] Berger, B. and Leighton, T., "Protein Folding in the Hydrophobic-Hydrophilic (HP) Model is NP-Complete," Journal of Computational Biology, Vol. 5, No. 1, 1998, pp. 27-40.
- Protein Structure Prediction: http://cmgm.stanford.edu/WWW/www_predict.html
- PDB (Protein Data Bank): http://www.rcsb.org/pdb/
Thank you