Predicting PTS1 Peroxisomal Matrix Proteins with SVMs

1
Predicting PTS1 Peroxisomal Matrix Proteins with SVMs
  • John Hawkins & Mikael Bodén

2
Introduction
  • The Peroxisome is an organelle involved in
    numerous metabolic pathways.
  • Proteins are manufactured from nuclear
    transcripts in the cytoplasm and then imported
    into the peroxisome.
  • Import is dependent upon targeting signals.
  • Accurate peroxisomal localisation prediction is
    important for several reasons:
  • Cost-effective in-silico experimentation
  • Aiding the drug design process

3
PTS1 Peroxisomal Proteins
  • The majority of Peroxisomal Matrix Proteins use a
    C-terminal signal called PTS1.
  • Highly conserved trimer
  • Known dependencies between locations in the final
    twelve residues.
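
As a rough illustration of the region these bullets refer to, the sketch below cuts the final twelve residues from a protein sequence and splits them into the C-terminal tripeptide and the nine residues before it. The function name and the example sequence are hypothetical and are not taken from the authors' software.

```python
def split_c_terminus(sequence, window=12, motif_len=3):
    """Return (upstream, tripeptide) taken from the C-terminal window.

    Assumption: the window is the final `window` residues, and the last
    `motif_len` of those are the candidate PTS1 tripeptide.
    """
    seq = sequence.strip().upper()
    if len(seq) < window:
        raise ValueError("sequence is shorter than the C-terminal window")
    tail = seq[-window:]
    return tail[:-motif_len], tail[-motif_len:]


# Made-up sequence ending in SKL, used here only as an example tripeptide.
upstream, tripeptide = split_c_terminus("MAAAAAAAAAAQWERTYIPASKL")
print(upstream, tripeptide)
```

With the default arguments the first value is a 9-mer and the second a 3-mer, matching the twelve-residue window mentioned above.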

4
Prediction Approaches
  • Amino Acid Composition (1983) - K. Nishikawa, Y.
    Kubota, and T. Ooi.
  • Motif Recognition - PSORT (1990) Nakai & Horton
  • Sequence Level Machine Learning - PeroxiP (2003)
    Emanuelsson, Elofsson, von Heijne and Cristóbal
  • Eliminate sequences bound elsewhere
  • Verify valid PTS1 motif
  • Machine learning module applied to the 9-mer
    preceding the PTS1 tri-peptide
  • Custom parameterised model - PTS1 Predictor
    (2003) Neuberger, Maurer-Stroh, Eisenhaber,
    Hartig & Eisenhaber.

5
Previous Work
  • Previously established that including the
    tri-peptide in the input to the machine learning
    module improves the prediction accuracy.
  • SVMs outperformed Neural Networks, Naïve Bayes
    and Decision Tree methods.

6
Current Work
  • Exploration of kernel space using jackknife
    tests
  • Investigation of the information provided by the
    inclusion of an amino acid composition window.
  • Use of a logistic output function on the SVM -
    Platt (2000)
7
Amino Acid Composition Study

8
Amino Acid Composition
  • Almost all kernels are improved by the
    inclusion of amino acid composition (AAC).
  • Only the Gaussian kernel with gamma = 0.01 is
    negatively affected.
  • The third order polynomial kernel performs best: MCC
    of 0.70 including or excluding lower order terms.
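
To make the two ingredients on this slide concrete, the sketch below computes an amino acid composition vector for a sequence window and evaluates a polynomial kernel with and without lower order terms. It assumes the usual reading: (u·v + 1)^d expands to include lower order terms, while (u·v)^d keeps only the terms of degree d. How the authors defined the composition window and encoded their inputs is not stated in the slides, so the names and encoding here are illustrative only.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(window):
    """Fraction of each standard amino acid in the window (20 values)."""
    window = window.upper()
    counts = [window.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("window contains no standard amino acids")
    return [c / total for c in counts]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, degree=3, lower_order_terms=True):
    """(u.v + 1)^d with lower order terms, (u.v)^d without."""
    offset = 1.0 if lower_order_terms else 0.0
    return (dot(u, v) + offset) ** degree


a = amino_acid_composition("QWERTYIPASKL")
b = amino_acid_composition("AAGHLLPPQSRL")
print(polynomial_kernel(a, b, degree=3, lower_order_terms=True))
print(polynomial_kernel(a, b, degree=3, lower_order_terms=False))
```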

9
Logistic Output Study

10
Logistic Output Study
  • Logistic output improves performance for all
    kernels, regardless of whether amino acid
    composition is used.
  • The fifth order polynomial kernel performs best of
    all: MCC of 0.74 with or without lower order
    terms.
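
The logistic output attributed to Platt maps the raw SVM decision value f to a probability through a sigmoid, P(y = 1 | f) = 1 / (1 + exp(A·f + B)), where A and B are fitted on held-out decision values. The sketch below shows only the mapping; the values of A and B are placeholders and are not the authors' fitted parameters.

```python
import math


def platt_probability(decision_value, a, b):
    """Platt-style sigmoid: 1 / (1 + exp(a * f + b)), computed stably."""
    z = a * decision_value + b
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


# Placeholder parameters; a negative `a` makes larger decision values
# map to higher probabilities.
A, B = -2.0, 0.0
for f in (-2.0, 0.0, 2.0):
    print(f, platt_probability(f, A, B))
```

Fitting A and B (by maximum likelihood in Platt's method) is left out to keep the sketch short.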

11
Final Model

12
Final Model Performance
  • Trained on the PeroxiP dataset, our model yielded
    an average MCC of 0.76, a 52% improvement.
  • Training on our updated 2005 dataset yielded
    a further 1% increase in performance.
  • The sensitivity of our final model is identical
    to that of the original PeroxiP; all of the
    improvement in performance has come from a
    45% increase in the specificity of the model.
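
For reference, the measures named on this slide can be computed from confusion counts as sketched below. The slides do not define specificity: the sketch uses TN / (TN + FP), but some of the localisation prediction literature uses the name for TP / (TP + FP), so that variant is included as `precision`. The counts in the usage lines are invented for illustration.

```python
import math


def sensitivity(tp, fn):
    """Fraction of true positives that are recovered."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true negatives that are rejected."""
    return tn / (tn + fp)


def precision(tp, fp):
    """Fraction of positive predictions that are correct."""
    return tp / (tp + fp)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 if any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Invented counts, only to show the calls.
print(sensitivity(tp=80, fn=20), specificity(tn=900, fp=100))
print(precision(tp=80, fp=100), mcc(tp=80, tn=900, fp=100, fn=20))
```

Because MCC combines all four counts, a gain in specificity alone can raise it while sensitivity stays fixed, which is the pattern this slide describes.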

13
Life without a Peroxisome
  • Several species of parasitic eukaryotes are
    believed not to possess a peroxisome.
  • False positive testing using these proteomes.
  • PeroxiP has a false positive rate of approx. 1.88%.
  • PTS1Prowler has a false positive rate of 0.088%,
    over an order of magnitude smaller.
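
The test on this slide rests on one assumption: if an organism has no peroxisome, every protein predicted as peroxisomal is a false positive. The rate is then the number of positive predictions divided by the proteome size. A minimal sketch follows; the function name and the counts are hypothetical, not the authors' data.

```python
def false_positive_rate(predicted_positive, proteome_size):
    """Fraction of a peroxisome-free proteome predicted as PTS1 proteins.

    Assumes every protein in the proteome is a true negative.
    """
    if proteome_size <= 0:
        raise ValueError("proteome_size must be positive")
    if not 0 <= predicted_positive <= proteome_size:
        raise ValueError("predicted_positive must lie in [0, proteome_size]")
    return predicted_positive / proteome_size


# Invented counts: 5 positive predictions in a proteome of 1000 proteins.
rate = false_positive_rate(5, 1000)
print(rate, 100.0 * rate)  # as a fraction and as a percentage
```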

14
PTS1 Predictor
  • Neuberger et al. trained their model only with
    positive instances
  • No MCC reported
  • Estimated sensitivity of 90%
  • Testing on prokaryotic proteomes, false positive
    rate of 0.74%
  • Performing the same test with PTS1Prowler gives a
    FP rate of 0.08%

15
Conclusions
  • Using the three-stage classifier design of
    PeroxiP, we have refined the machine learning
    module considerably.
  • A final MCC of 0.77, an improvement of 54%
  • False positive rate of 0.088%, well over an order
    of magnitude smaller than that of PeroxiP.
  • Although our sensitivity is lower than that of
    PTS1 Predictor, their false positive rate is
    almost an order of magnitude greater.
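
The comparisons in these conclusions can be checked with simple arithmetic on the figures quoted in the slides. The sketch assumes the improvement is a relative percentage over the PeroxiP MCC, which lets the baseline be back-calculated. That baseline is inferred here and is not stated in the slides.

```python
def fold_difference(larger, smaller):
    """How many times larger one rate is than another."""
    return larger / smaller


def implied_baseline(new_value, percent_improvement):
    """Baseline that a relative improvement of the given size implies."""
    return new_value / (1.0 + percent_improvement / 100.0)


# False positive rates quoted on the eukaryotic and prokaryotic tests.
print(fold_difference(1.88, 0.088))   # PeroxiP vs PTS1Prowler
print(fold_difference(0.74, 0.08))    # PTS1 Predictor vs PTS1Prowler

# MCC figures quoted with their improvements.
print(implied_baseline(0.76, 52))
print(implied_baseline(0.77, 54))
```

The first ratio comes out at roughly 21 and the second at roughly 9, in line with "well over" and "almost" an order of magnitude. Both MCC figures imply a baseline of about 0.50.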

16
The End
  • ?