CISC 841 Bioinformatics

Combining HMMs with SVMs. Li Liao, CISC841, F07.


1
CISC 841 Bioinformatics: Combining HMMs with SVMs
2
HMM gradients
  • Fisher score: $\langle X \rangle = \nabla_\theta \log P(X \mid H, \theta)$
  • The gradient of a sequence X with respect to a
    given model is computed using the
    forward-backward algorithm.
  • Each dimension corresponds to one parameter of
    the model.
  • The feature space is tailored to the sequences
    from which the model was trained.
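
The forward-backward computation described above can be sketched in plain Python for a small discrete HMM. This is a minimal illustration, not the lecture's implementation: the function names, the integer-coded symbols, the toy two-state model, and the restriction to emission parameters are all assumptions. The gradient is taken with respect to each emission probability treated as a free parameter (no sum-to-one correction), and no numerical scaling is applied, so it only suits short sequences.

```python
import math

def forward(x, init, trans, emit):
    """f[t][j] = P(x[0..t], state at t = j)."""
    n = len(init)
    f = [[init[j] * emit[j][x[0]] for j in range(n)]]
    for t in range(1, len(x)):
        prev = f[-1]
        f.append([emit[j][x[t]] * sum(prev[i] * trans[i][j] for i in range(n))
                  for j in range(n)])
    return f

def backward(x, trans, emit):
    """b[t][i] = P(x[t+1..] | state at t = i)."""
    n = len(trans)
    b = [[1.0] * n]
    for t in range(len(x) - 1, 0, -1):
        nxt = b[0]
        b.insert(0, [sum(trans[i][j] * emit[j][x[t]] * nxt[j] for j in range(n))
                     for i in range(n)])
    return b

def log_likelihood(x, init, trans, emit):
    return math.log(sum(forward(x, init, trans, emit)[-1]))

def fisher_score_emissions(x, init, trans, emit):
    """Gradient of log P(x) w.r.t. each emission probability emit[j][a]."""
    f = forward(x, init, trans, emit)
    b = backward(x, trans, emit)
    px = sum(f[-1])
    n, m = len(emit), len(emit[0])
    expected = [[0.0] * m for _ in range(n)]  # expected emission counts
    for t, sym in enumerate(x):
        for j in range(n):
            expected[j][sym] += f[t][j] * b[t][j] / px  # posterior of state j at t
    # One dimension per emission parameter.
    return [[expected[j][a] / emit[j][a] for a in range(m)] for j in range(n)]

# Hypothetical two-state model over a two-letter alphabet {0, 1}.
INIT = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]
score = fisher_score_emissions([0, 1, 1, 0], INIT, TRANS, EMIT)
```

Flattening `score` row by row gives one feature per emission parameter, which matches the "one dimension per parameter" picture on this slide.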

3
SVM-Fisher discrimination
  • A probabilistic hidden Markov model $\theta$ is
    trained from some example sequences
    $x_1, x_2, x_3, \ldots, x_N$.
  • Usually the probability model $P(x_i \mid \theta)$
    (or a function of $P(x_i \mid \theta)$) is used as a
    measure of sequence-model membership, and a
    threshold on this measure is used to decide
    membership.
  • The Fisher vector is a vector of gradients of
    $P(x_i \mid \theta)$ (or gradients of a function of
    $P(x_i \mid \theta)$) w.r.t. the parameters of the
    model: $U_{x_i} = \nabla_\theta P(x_i \mid \theta)$
  • One can take the training example sequences
    (positive set) and other sequences that are
    known to be non-members (negative set), and
    transform them into Fisher vectors.
  • A Support Vector Machine (SVM) can be trained
    using the positive and negative Fisher vectors,
    and can be used to classify other sequences.
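
One way to hand the positive and negative Fisher vectors to an SVM package is to write them in the sparse text format read by SVM-light (the package named later in this presentation): one example per line, a target of +1 or -1, then `index:value` pairs with indices starting at 1 in increasing order. The sketch below is only an illustration; the function name and the toy vectors are hypothetical, and dropping zero-valued features is a choice made here.

```python
def to_svmlight_line(label, vector):
    """Format one labelled Fisher vector as an SVM-light input line."""
    if label not in (1, -1):
        raise ValueError("label must be +1 (member) or -1 (non-member)")
    # Feature indices are 1-based; zero-valued features are omitted.
    features = " ".join(f"{i}:{v:.6g}"
                        for i, v in enumerate(vector, start=1) if v != 0.0)
    return f"{label:+d} {features}".rstrip()

# Hypothetical Fisher vectors for two positive and two negative sequences.
positives = [[0.5, 0.0, -1.25], [0.75, 0.1, -0.9]]
negatives = [[-0.4, 2.0, 0.0], [-0.6, 1.5, 0.3]]
lines = [to_svmlight_line(+1, v) for v in positives]
lines += [to_svmlight_line(-1, v) for v in negatives]
```

Writing `"\n".join(lines)` to a file gives a training file that SVM-light's learner can read.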

4
Application: Protein remote homology detection
5
SVM-Pairwise method
[Figure: SVM-pairwise workflow in three numbered steps. Labels: positive training set (protein homologs); negative training set (protein non-homologs); positive and negative pairwise score vectors; testing data (target protein of unknown function); support vector machine; binary classification.]
6
Experiment: known protein families
Jaakkola, Diekhans and Haussler 1999
7
Sample family sizes
8
A measure of sensitivity and specificity
[Figure: three examples with ROC = 1, ROC = 0.67, and ROC = 0.]
ROC: the receiver operating characteristic score is
the normalized area under a curve that plots true
positives as a function of false positives.
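
The ROC score defined above can be computed directly from a ranked list of labels. This is a minimal sketch, assuming the examples are already sorted from highest to lowest classifier score and that there are no tied scores; the function name and the toy rankings are illustrative, not taken from the slides.

```python
def roc_score(ranked_labels):
    """Normalized area under the true-positive vs. false-positive curve.

    ranked_labels: booleans (True = positive), best-scoring example first.
    """
    n_pos = sum(1 for y in ranked_labels if y)
    n_neg = len(ranked_labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    true_pos = 0
    area = 0
    for is_pos in ranked_labels:
        if is_pos:
            true_pos += 1      # curve moves up
        else:
            area += true_pos   # curve moves right: add a column of height TP
    return area / (n_pos * n_neg)

perfect = roc_score([True, True, False, False])         # 1.0
worst = roc_score([False, False, True])                 # 0.0
mixed = roc_score([True, False, False, True, False])    # 4/6, about 0.67
```

A score of 1 means every positive is ranked above every negative, and 0 means the reverse.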
9
Application: Discriminating signal peptides from
transmembrane proteins
10

Feature selection
  • We expect gradients w.r.t. transition parameters
    to be better discrimination features.
  • We look for those transitions that are used
    differentially by TM proteins and SP proteins:
    - transform each signal peptide sequence (1275)
      into a Fisher vector w.r.t. transition
      parameters and find the resultant vector
    - transform each TM sequence into a Fisher
      vector w.r.t. transition parameters and find
      the resultant vector
    - compare the two resultant vectors
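
The comparison of resultant vectors can be sketched as follows. The slide does not say how the two vectors are compared, so this illustration makes assumptions: each resultant is divided by its set size (the two sets differ in size), and transitions are ranked by the absolute difference. All names and the toy vectors are hypothetical.

```python
def resultant(vectors):
    """Component-wise sum of a list of equal-length Fisher vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) for d in range(dim)]

def rank_transitions(sp_vectors, tm_vectors):
    """Indices of transition parameters, most differentially used first."""
    # Assumption: normalize each resultant by its set size before comparing.
    sp_mean = [c / len(sp_vectors) for c in resultant(sp_vectors)]
    tm_mean = [c / len(tm_vectors) for c in resultant(tm_vectors)]
    diffs = [t - s for s, t in zip(sp_mean, tm_mean)]
    return sorted(range(len(diffs)), key=lambda d: abs(diffs[d]), reverse=True)

# Toy Fisher vectors over three hypothetical transition parameters.
sp = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
tm = [[1.0, 4.0, 2.0]]
ranking = rank_transitions(sp, tm)  # [1, 0, 2]
```

The top-ranked indices are the candidate discrimination features.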

11
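Gradients of $P(s \mid x)$
  • In pattern recognition problems, we are
    interested in $P(s \mid x, \theta)$ rather than
    $P(x \mid \theta)$.
  • $U_{s \mid x} = \nabla_\theta \log P(s \mid x, \theta) = \nabla_\theta \log P(s, x \mid \theta) - \nabla_\theta \log P(x \mid \theta)$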

12

Classification experiment
  • 10-fold cross-validation experiment using:
    - positive set (247 TM proteins)
    - negative set (1275 signal-peptide-containing
      proteins)
  • The SVM-light package is used.
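
The fold structure of such an experiment can be sketched as below. The slide does not say how the folds were drawn; this illustration assumes stratified folds (positives and negatives split separately so each fold keeps roughly the same class ratio) and a fixed shuffle seed. The function names are hypothetical, and the indices stand in for the actual proteins.

```python
import random

def stratified_folds(n_pos, n_neg, k=10, seed=0):
    """Split example indices into k folds, class by class."""
    rng = random.Random(seed)
    pos = list(range(n_pos))
    neg = list(range(n_neg))
    rng.shuffle(pos)
    rng.shuffle(neg)
    # Every k-th shuffled index goes to the same fold.
    return [{"pos": pos[f::k], "neg": neg[f::k]} for f in range(k)]

def train_test_splits(folds):
    """Yield (train, test) pairs, holding out one fold at a time."""
    for held_out, test in enumerate(folds):
        train = {"pos": [], "neg": []}
        for f, fold in enumerate(folds):
            if f != held_out:
                train["pos"] += fold["pos"]
                train["neg"] += fold["neg"]
        yield train, test

folds = stratified_folds(247, 1275)  # set sizes from the slide
splits = list(train_test_splits(folds))
```

Each of the ten splits trains on nine folds and tests on the remaining one, so every protein is tested exactly once.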

13
Discrimination results
  • Results:
  • A third (68) more of the SP proteins that were
    incorrectly classified as TM proteins are
    identified correctly.

14
Application: Protein-Protein Interaction Prediction
15
Interaction Profile Hidden Markov Model (ipHMM)
Friedrich et al. (2006)
16
  • Knowledge transfer
  • Build ipHMM from proteins whose structural
    information is available.
  • Align the sequences of proteins whose
    structural information is not available to the
    model.

Likelihood Score Vector
$\langle LS_{a_i, A},\ LS_{a_i, B},\ LS_{b_j, A},\ LS_{b_j, B} \rangle$
Fisher Score Vector
$U(x) = \nabla_\theta \log P(x \mid \theta)$
$U_{ij} = \frac{E_j(i)}{e_j(i)} - \sum_k E_j(k)$
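
The emission Fisher score on this slide, E_j(i) / e_j(i) minus the sum over k of E_j(k), can be evaluated once the expected emission counts are known. The sketch below assumes E_j(i) is the expected number of times state j emits symbol i (for example from the forward-backward algorithm) and e_j(i) is the corresponding emission probability. The function name and the numbers are hypothetical.

```python
def fisher_scores_from_counts(expected, emit):
    """Return U with U[j][i] = E_j(i) / e_j(i) - sum_k E_j(k).

    expected[j][i]: expected count of symbol i emitted from state j.
    emit[j][i]:     emission probability of symbol i in state j.
    Note the code indexes by state first, then symbol.
    """
    scores = []
    for counts_j, probs_j in zip(expected, emit):
        total_j = sum(counts_j)  # expected number of visits to state j
        scores.append([c / p - total_j for c, p in zip(counts_j, probs_j)])
    return scores

# Hypothetical expected counts and emission probabilities for two states.
expected = [[1.5, 0.5], [0.25, 1.75]]
emit = [[0.9, 0.1], [0.2, 0.8]]
U = fisher_scores_from_counts(expected, emit)
```

A symbol emitted more often than its emission probability predicts gets a positive score, and within each state the scores weighted by e_j(i) sum to zero.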
19
Data set: Friedrich et al. (2006), 2018 proteins
in 36 domain families
20
Conclusions
  • Structural information at binding sites enhances
    protein-protein interaction prediction.
  • Interaction profile HMMs can transfer structural
    information.
  • Fisher scores extracted from domain profiles
    further enhance protein-protein interaction
    prediction for proteins with no available
    structural information.