PROTEIN SECONDARY - PowerPoint PPT Presentation

PROTEIN SECONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM, by En-Shiun Annie Lee. CS 882 Protein Folding, instructed by Professor Ming Li.


Transcript and Presenter's Notes


1
PROTEIN SECONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM
By En-Shiun Annie Lee
CS 882 Protein Folding, instructed by Professor Ming Li
2
0. OUTLINE
  • Introduction
  • Problem
  • Methods (4)
  • HMM Examples (3)
  • Segmentation HMM
  • Profile HMM
  • Conditional Random Field
  • Proposal

3
1. INTRODUCTION

4
1. Genomics
  • Achievements in Genomics
  • BLAST (Basic Local Alignment Search Tool)
  • Most cited paper published in the 1990s
  • Cited more than 15,000 times
  • Human Genome Project
  • Completed April 2003

5
1. Proteomics
  • Precedence to Proteomics
  • Protein Data Bank (PDB)
  • 40,132 structures
  • cited more than 6,000 times

6
1. Proteomics
Number of Protein Structures in Protein Data Bank
7
1. Secondary Structure
  • Importance
  • The known secondary structure may be used as an
    input for the tertiary structure predictions.

8
1. Protein Structure
  • Primary Structure

9
1. Protein Structure
  • Secondary Structure

10
1. Secondary Structure
  • α-helix
  • Interaction between the i-th and (i+4)-th residues

11
1. Secondary Structure
  • β-sheet/strand
  • Parallel or Anti-parallel

12
1. Secondary Structure
  • Coil (loop)

13
1. Protein Structure
  • Tertiary Structure

14
1. Protein Structure
  • Super-Secondary (2.5) Structure

15
1. Protein Structure
  • Quaternary Structure

16
2. PROBLEM

17
2. Secondary Structure
  • Problem
  • Given
  • A primary sequence of amino acids
  • $a_1 a_2 \ldots a_n$
  • Find
  • Secondary structure of each $a_i$ as
  • α-helix: H
  • β-strand: E
  • coil: C

18
2. Secondary Structure
  • Example
  • Given
  • Primary Sequence
  • GHWIATRGQLIREAYEDYRHFSSECPFIP
  • Find
  • Secondary Structure Element
  • CEEEEECHHHHHHHHHHHCCCHHCCCCCC
  • Note segments

19
2. Prediction Quality
  • Three-state prediction accuracy
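  • $Q_3 = \frac{\text{number of correctly predicted residues}}{\text{total number of residues}} \times 100\%$
  • Per-state accuracies: $Q_\alpha$, $Q_\beta$, $Q_c$
  • $Q_3$ for random prediction is about 33%
  • Theoretical limit: $Q_3$ of about 90%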
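The accuracy measures on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used in any of the cited methods: the function name `q3_scores` and the example prediction are hypothetical, and the per-state scores are keyed `QH`, `QE`, `QC` for the three labels.

```python
from collections import Counter

def q3_scores(true_ss: str, pred_ss: str) -> dict:
    """Return overall Q3 and per-state accuracies (in percent) for H/E/C strings."""
    if len(true_ss) != len(pred_ss) or not true_ss:
        raise ValueError("sequences must be non-empty and of equal length")
    total = Counter(true_ss)  # residues per true state
    correct = Counter(t for t, p in zip(true_ss, pred_ss) if t == p)
    scores = {"Q3": 100.0 * sum(correct.values()) / len(true_ss)}
    for state in "HEC":
        if total[state]:  # skip states that never occur in the true structure
            scores["Q" + state] = 100.0 * correct[state] / total[state]
    return scores

# True structure from the example slide; the prediction is made up for
# illustration by forcing the first three residues to coil.
true_ss = "CEEEEECHHHHHHHHHHHCCCHHCCCCCC"
pred_ss = "CCC" + true_ss[3:]
print(q3_scores(true_ss, pred_ss))
```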

20
2. Prediction Quality
  • Segment Overlap (SOV)
  • Higher penalties for core segment regions
  • Matthews Correlation Coefficients (MCC)
  • Prediction errors made for each state
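For the per-state Matthews correlation coefficient, a minimal sketch is given below. It treats one state at a time as the positive class; the function name `mcc` and the convention of returning 0.0 when the denominator vanishes are assumptions made for this illustration.

```python
import math

def mcc(true_ss: str, pred_ss: str, state: str) -> float:
    """Matthews correlation coefficient for one state (e.g. 'H') versus the rest."""
    pairs = list(zip(true_ss, pred_ss))
    tp = sum(t == state and p == state for t, p in pairs)
    tn = sum(t != state and p != state for t, p in pairs)
    fp = sum(t != state and p == state for t, p in pairs)
    fn = sum(t == state and p != state for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(mcc("HHHHCCCC", "HHHCCCCC", "H"))
```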

21
2. True Structures
  • Three dimensional PDB data
  • DSSP (Dictionary of Secondary Structure of
    Proteins)
  • 8 states, each reduced to one of 3 states (DSSP code: description → reduced state)
  • H: alpha helix → H
  • G: 3₁₀-helix → H
  • I: 5-helix (pi helix) → H
  • E: extended strand (beta ladder) → E
  • B: residue in isolated beta-bridge → E
  • T: hydrogen-bonded turn → C
  • S: bend → C
  • C: coil → C
  • STRIDE
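The 8-to-3 state reduction listed on this slide is a simple lookup. The sketch below encodes exactly that table; the name `reduce_dssp` is hypothetical, and mapping unknown symbols (such as a blank or '-') to coil is an assumption, not part of the slide.

```python
# DSSP 8-state code -> 3-state code, as listed on the slide.
DSSP_TO_3STATE = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and bridges
    "T": "C", "S": "C", "C": "C",   # turns, bends, coil
}

def reduce_dssp(dssp: str) -> str:
    """Map a DSSP 8-state string to the 3-state alphabet H/E/C."""
    # Assumption: any symbol not in the table is treated as coil.
    return "".join(DSSP_TO_3STATE.get(ch, "C") for ch in dssp)

print(reduce_dssp("CEEEEBTTGGGHHHHSC"))
```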

22
3. METHODS

23
3. Sliding Window
  • Sliding-Window
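As a rough illustration of the sliding-window idea, the sketch below extracts one fixed-width window per residue, so that a classifier could predict the state of the central residue from its neighbours. The function name `windows`, the half-width of 2, and the '-' padding symbol are all assumptions chosen for the example, not values taken from the slides.

```python
def windows(seq: str, half_width: int = 2, pad: str = "-"):
    """Return one window of 2*half_width + 1 residues centred on each position."""
    padded = pad * half_width + seq + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(seq))]

# First residues of the example primary sequence used later in the deck.
for centre, win in zip("GHWIATRG", windows("GHWIATRG")):
    print(centre, win)
```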

27
3. Four Methods
  • Statistical Method
  • Neural Network
  • Support Vector Machine
  • Hidden Markov Model

28
3a. Statistical Method
  • Propensity
  • Ex. Chou-Fasman (50–53%)

29
3b. Neural Network
  • Ex. PHD (71%)

30
3c. SVM
  • Ex. PSIPRED (76–78%)

31
3d. HMM Definition
  • State set Q
  • Output alphabet Σ

32
3d. HMM Definition
  • Transition probabilities
  • Probability of entering state p from state q
  • $T_q(p)$
  • $\forall q \in Q$
  • $\forall p \in Q$

33
3d. HMM Definition
  • Emission probabilities
  • Probability that state q emits each letter of Σ
  • $E_q(a_i)$
  • $\forall a_i \in \Sigma$
  • $\forall q \in Q$

34
3d. HMM Decoding
  • Problem
  • Given
  • HMM $(Q, \Sigma, E, T)$ and
  • Sequence S
  • Where $S = S_1, S_2, \ldots, S_n$
  • Find
  • Most probable path of states gone through to get S
  • Where $X = X_1, X_2, \ldots, X_n$ is the state sequence

35
3d. HMM Decoding
  • Optimize
  • $\Pr[S, X]$
  • $X = X_1, X_2, \ldots, X_n$ is the state sequence
  • $S = S_1, S_2, \ldots, S_n$
  • $\Pr[S \mid X]$

36
3d. HMM Decoding
  • Dynamic programming
  • Memoryless
  • $\Pr[X_n, S_n] = \Pr[X_{n-1}, S_{n-1}] \cdot T_{X_{n-1}}(X_n) \cdot E_{X_n}(S_n)$
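The dynamic-programming recurrence above is the core of Viterbi decoding. The following is a minimal sketch in log space, assuming dictionary-based tables `T[q][p]` and `E[q][a]` in the notation of the definition slides. The initial distribution `start` and the two-state toy model are assumptions added for the example; the slides do not specify them.

```python
import math

def viterbi(obs, states, start, T, E):
    """Return the most probable state path for obs and its log joint probability."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # score[q]: best log Pr[X_1..X_i, S_1..S_i] over paths ending in state q
    score = {q: log(start[q]) + log(E[q][obs[0]]) for q in states}
    back = []  # back[i][p]: best predecessor of state p at step i + 1
    for sym in obs[1:]:
        prev, score, ptr = score, {}, {}
        for p in states:
            best = max(states, key=lambda q: prev[q] + log(T[q][p]))
            score[p] = prev[best] + log(T[best][p]) + log(E[p][sym])
            ptr[p] = best
        back.append(ptr)

    # Trace back from the best final state.
    last = max(states, key=lambda q: score[q])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1], score[last]

# Toy model (hypothetical numbers): helix-like state H prefers 'A', coil C prefers 'G'.
states = ["H", "C"]
start = {"H": 0.5, "C": 0.5}
T = {"H": {"H": 0.9, "C": 0.1}, "C": {"H": 0.1, "C": 0.9}}
E = {"H": {"A": 0.8, "G": 0.2}, "C": {"A": 0.2, "G": 0.8}}
path, logp = viterbi("AAAGGG", states, start, T, E)
print("".join(path), logp)
```

Working in log space avoids numerical underflow on long sequences, which matters for proteins with hundreds of residues.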

37
4. HMM EXAMPLES

38
4a. SEMI-HMM

39
4a. Semi-HMM
  • Definition
  • Each state can emit a sequence
  • Move emission probabilities into states
  • Model secondary structure segments

40
4a. Segmentation
  • Sequence Segments

42
4a. Segmentation
  • Sequence Segments
  • T = secondary structural type of each segment: H, E, or L
  • S = end position of each individual structural segment
  • R = known amino acid sequence

43
4a. Segmentation
  • Sequence Segments
  • $T_2$ = E (β-strand)
  • $S_2 = 9$
  • $R_{[S_1+1 : S_2]}$: residues of the second segment
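The (T, S, R) representation can be illustrated by converting a per-residue label string into segments. In this sketch the helper names `to_segments` and `segment_residues`, the label string, and the placeholder residue string are hypothetical; the labels were chosen so that the second segment is a β-strand ending at position 9, matching the values on the slide.

```python
from itertools import groupby

def to_segments(labels: str):
    """Convert per-residue labels into (T, S): segment types and 1-based end positions."""
    types, ends, pos = [], [], 0
    for seg_type, run in groupby(labels):
        pos += len(list(run))   # position of the last residue of this run
        types.append(seg_type)
        ends.append(pos)
    return types, ends

def segment_residues(residues: str, ends, j: int) -> str:
    """Residues S_{j-1}+1 .. S_j of the j-th segment (j is 1-based, S_0 = 0)."""
    start = ends[j - 2] if j > 1 else 0
    return residues[start:ends[j - 1]]

labels = "LLLLEEEEELLHHHH"      # hypothetical 15-residue labelling
residues = "ABCDEFGHIJKLMNO"    # placeholder letters, not a real protein
T_types, S_ends = to_segments(labels)
print(T_types, S_ends)
print(T_types[1], S_ends[1], segment_residues(residues, S_ends, 2))
```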

44
4a. Bayesian
  • Bayesian Formulation
  • R = sequence of all amino acid residues
  • S = ends of the segments
  • T = secondary structural types of the segments
  • H, E, L

45
4a. Bayesian
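  • Bayesian Formulation
  • $P(S, T \mid R) = \frac{P(R \mid S, T)\, P(S, T)}{P(R)}$

  1. Likelihood: $P(R \mid S, T)$
  2. Prior probability: $P(S, T)$
  3. Constant $P(R)$: the same for all $(S, T)$, so it is dropped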

46
4a. Bayesian
  • Likelihood (term 1)
  • m = total number of segments
  • $S_j$ = end of the j-th segment
  • $T_j$ = secondary structural type of the j-th segment
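Assuming the segments are conditionally independent given the segmentation (an assumption for this sketch, not something stated on the slide), the likelihood can be written as a product over the m segments. Here $S_0 = 0$, and the notation $R_{[a:b]}$ for residues a through b is illustrative.

```latex
P(R \mid S, T) \;=\; \prod_{j=1}^{m} P\!\left(R_{[S_{j-1}+1 \,:\, S_j]} \;\middle|\; S_{j-1}, S_j, T_j\right),
\qquad S_0 = 0
```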

49
4a. Bayesian
  • Likelihood (term 1)
  • Segment positions: N-terminus, internal, C-terminus
50
4a. BSPPS
  • Bayesian Segmentation PPS

52
4a. Results
  • Better than PSIPRED
  • (w/o homology information)

54
4b. PROFILE-HMM

55
4b. Profile HMM
  • Main States
  • Columns of alignment

56
4b. Profile HMM
  • Insertion States

57
4b. Profile HMM
  • Deletion States
  • Jump over 1 column in alignment

58
4b. Profile HMM
  • Combined
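The combined architecture can be sketched by listing which transitions are allowed between main (match), insertion, and deletion states. The layout below follows the commonly used textbook profile HMM, which is an assumption here since the slide's diagram is not in the transcript; the function name `profile_hmm_topology` and the state names (`B`, `E`, `M1`, `I0`, `D1`, ...) are hypothetical.

```python
def profile_hmm_topology(n: int) -> dict:
    """Allowed transitions of a profile HMM with n alignment columns.

    States: B (begin), M1..Mn (main), I0..In (insertion), D1..Dn (deletion), E (end).
    """
    trans = {}
    for k in range(n + 1):
        main = "B" if k == 0 else f"M{k}"
        next_main = "E" if k == n else f"M{k + 1}"
        # From column k: advance to the next main state, insert after column k,
        # or delete (skip) column k + 1 when it exists.
        targets = [next_main, f"I{k}"] + ([f"D{k + 1}"] if k < n else [])
        trans[main] = list(targets)
        trans[f"I{k}"] = list(targets)   # includes the self-loop I_k -> I_k
        if k >= 1:
            trans[f"D{k}"] = list(targets)
    return trans

for state, targets in profile_hmm_topology(2).items():
    print(state, "->", targets)
```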

60
4b. HMMSTR
  • HMM for local protein STRucture
  • Pronounced hamster

61
4b. I-Site Library
  • I-sites Library
  • Motif: short basic structural fragments
  • 3–19 residues
  • 262 motifs
  • Highly predictable
  • Non-redundant PDB data (<25% similarity)
  • Fold uniquely across protein family
  • Exhaustive motif clustering

62
4b. Build HMM
  • States
  • Amino acid sequence and
  • Structural attribute
  • Transition from state
  • Adjacent positions in motif
  • No gap or insertion states

63
4b. Build HMM
  • Emission probability distributions
  • b = observed amino acid
  • (20 probability values)
  • d = secondary structure
  • (helix, strand, loop)
  • r = backbone angle region
  • (11 dihedral angle symbols)
  • c = structural context descriptor
  • (10 context symbols)

64
4b. Build HMM
  • Model I-site Library
  • Each of the 262 motifs is a chain in the HMM
  • Merge states based on similarity of
  • Sequence
  • Structure

66
4b. HMMSTR Merge
  • Ex. β-Hairpin

70
4b. HMMSTR Training
  • Input: PDB proteins
  • Find
  • Best state sequence for each sequence
  • Probability distribution of one amino acid
  • Integrate 3 data sets
  • Aligned probability distribution
  • Amino acid and context information
  • Contact map

71
4b. HMMSTR Summary
  • 282 nodes
  • 317 transitions
  • 31 merged motifs

72
4b. HMMSTR Summary
  • Introduces structural context at the level of
    super-secondary structure
  • Predicts higher-order 3D tertiary structure
  • Side result: predicts 1D secondary structure

73
4c. CONDITIONAL RANDOM FIELD

74
4c. HMM Disadvantages
  • Does not model
  • Multiple interacting features
  • Long-range dependencies
  • Strict independence assumptions

75
4c. Conditional Model
  • Allow
  • Arbitrary features
  • Non-independent features
  • Transition probability
  • With respect to past and future observations

76
4c. Conditional Model
[Figure: graphical models of an HMM and a CRF, each over labels y1 to y6 and observations x1 to x6]
77
4c. Random Field
  • Random Field (Undirected graphical model)
  • Let $G = (Y, E)$ be a graph
  • Where each vertex $Y_v$ is a random variable
  • If $P(Y_v \mid \text{all other } Y) = P(Y_v \mid \text{neighbours of } Y_v)$, then
    Y is a random field

78
4c. Random Field
  • Example
  • $P(Y_5 \mid \text{all other } Y) = P(Y_5 \mid Y_4, Y_6)$

79
4c. Conditional RF
  • Conditional Random Field
  • Let X = r.v. over data sequences to be labeled
  • observations
  • Let Y = r.v. over corresponding label sequences
  • labels
  • Let $G = (V, E)$ be a graph
  • S.t. $Y = (Y_v)_{v \in V}$, so Y is indexed by the vertices of G
  • If $P(Y_v \mid X, Y_w, w \neq v) = P(Y_v \mid X, Y_w, w \sim v)$, then
    (X, Y) is a conditional random field

80
4c. Conditional RF
  • Example
  • $P(Y_3 \mid X, \text{all other } Y) = P(Y_3 \mid X, Y_2, Y_4)$

81
4c. HMM vs. CRF
  • HMM
  • Maximize $P(x, y \mid \theta) = P(y \mid x, \theta)\, P(x \mid \theta)$
  • Transition and emission probabilities
  • Transition/emission based on only one x
  • CRF
  • Maximize $P(y \mid x, \theta)$
  • Feature function f(i, j, k)
  • Feature function based on all x
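To make the contrast concrete, the sketch below computes the conditional distribution P(y | x) of a tiny linear-chain CRF by brute-force enumeration. Everything here is illustrative: the feature signature `f(prev_label, label, x, i)`, the two feature functions, and the weights are assumptions, and enumeration is only feasible for very short sequences (real implementations use dynamic programming). Note that the first feature inspects observations around position i, which an HMM emission cannot do.

```python
import math
from itertools import product

def crf_conditional(x, labels, features, weights):
    """Return {y: P(y | x)} for all label sequences y of a linear-chain CRF."""
    def score(y):
        total = 0.0
        for i in range(len(x)):
            prev = y[i - 1] if i > 0 else None
            total += sum(w * f(prev, y[i], x, i) for f, w in zip(features, weights))
        return total

    unnorm = {y: math.exp(score(y)) for y in product(labels, repeat=len(x))}
    z = sum(unnorm.values())  # partition function Z(x)
    return {y: v / z for y, v in unnorm.items()}

def helix_former_nearby(prev, cur, x, i):
    # Hypothetical feature: label H with an 'A' anywhere in the window x[i-1..i+1].
    return 1.0 if cur == "H" and "A" in x[max(0, i - 1): i + 2] else 0.0

def same_as_previous(prev, cur, x, i):
    # Hypothetical feature: rewards staying in the same label.
    return 1.0 if prev == cur else 0.0

dist = crf_conditional("AAGG", "HC", [helix_former_nearby, same_as_previous], [1.5, 1.0])
best = max(dist, key=dist.get)
print("".join(best), round(dist[best], 3))
```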

82
4c. Beta-Wrap
  • β-Helix
  • 3 parallel β-strands
  • Connected by coils
  • Few solved structures
  • 9 SCOP SuperFamilies
  • 14 RH solved structures in PDB
  • Solved structures differ widely

83
4c. Graph Definition
  • Let $G = (V, E_1, E_2)$ be a graph
  • V = nodes/states: secondary structures
  • Edges = interactions
  • $E_1$
  • Edges between adjacent neighbors
  • Implied in the model
  • $E_2$
  • Edges for long-range interactions
  • Explicitly considered

84
4c. Beta-Wrap Example
  • Simple Example
  • S2 = first β-strand
  • S3 = coil
  • S4 = second β-strand
  • S5 = coil
  • S6 = α-helix
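The graph G = (V, E1, E2) for this example can be represented with a small data structure in which adjacent-neighbor edges (E1) are implied by the node order and long-range edges (E2) are stored explicitly. The class name `SegmentGraph` is hypothetical, and the single long-range edge between the two β-strands S2 and S4 is an assumption made for illustration; the slide does not list the E2 edges.

```python
from dataclasses import dataclass, field

@dataclass
class SegmentGraph:
    nodes: dict                                    # segment name -> structure type (H/E/C)
    long_range: set = field(default_factory=set)   # E2: explicit long-range edges

    def adjacent_edges(self):
        """E1: edges between consecutive segments, implied by node order."""
        names = list(self.nodes)
        return list(zip(names, names[1:]))

    def neighbours(self, name):
        """All segments connected to `name` through E1 or E2."""
        edges = self.adjacent_edges() + sorted(self.long_range)
        return sorted({b if a == name else a for a, b in edges if name in (a, b)})

graph = SegmentGraph(
    nodes={"S2": "E", "S3": "C", "S4": "E", "S5": "C", "S6": "H"},
    long_range={("S2", "S4")},   # assumed pairing of the two strands
)
print(graph.adjacent_edges())
print(graph.neighbours("S4"))
```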

85
4c. Beta-Wrap
  • β-Helix Solution

86
5. PROPOSAL

87
5. Difficulties
  • Do not infer global interaction
  • i.e. Beta-sheet interactions
  • Protein structure definition constraint

88
5. Possible Future Work
  • Novel methods of secondary structure prediction
  • Model as Integer Programming
  • Super-secondary structure prediction

89
5. Acknowledgement
  • Professor Ming Li
  • Guidance in
  • knowledge and
  • expertise
  • Bioinformatics lab
  • Mentoring a rookie
  • Class
  • Attention and listening