Leo Molokov - PowerPoint PPT Presentation

Slides: 20
Provided by: csCha

Transcript and Presenter's Notes

Title: Leo Molokov


1
Predicting protein-protein interaction using SVM
  • Leo Molokov
  • Xidan Li
  • Parhati Aibibula

2
Basic information
  • The project is based on paper [1] (see Acknowledgements).
  • A different data set was used.
  • This leads to small changes in the results.
  • A comparison is provided.
  • Tests on parameter values were performed.
  • The data summarization procedure was changed.

3
Model built
  • Feature vector
  • Symmetric pairs introduced
  • x = (p1, p2), where pi represents a protein,
  • p = (d, wf), where d is domain information,
  • wf is amino acid frequencies.
  • Kernel function
  • Gaussian RBF with various parameter values
  • (0.01, 0.02, 0.04, 0.06 and 0.08 were checked)
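
A minimal sketch of the model on this slide, assuming a protein vector is the domain information followed by the amino acid frequencies, and a pair vector is the two protein vectors joined end to end. The function names and toy values are hypothetical. The kernel form exp(-gamma * ||x - y||^2) is one common convention and may not be the exact parameterization used in the project.

```python
import math

def protein_features(domains, frequencies):
    """p = (d, wf): domain indicators followed by amino acid frequencies."""
    return list(domains) + list(frequencies)

def pair_features(p1, p2):
    """x = (p1, p2): concatenate the two protein vectors."""
    return list(p1) + list(p2)

def symmetric_pairs(p1, p2):
    """Return both orderings, (p1, p2) and (p2, p1)."""
    return [pair_features(p1, p2), pair_features(p2, p1)]

def rbf_kernel(x, y, gamma):
    """Gaussian RBF: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Toy proteins: 3 hypothetical domain flags and 2 hypothetical frequencies.
p1 = protein_features([1, 0, 0], [0.25, 0.75])
p2 = protein_features([0, 1, 0], [0.5, 0.5])
x12, x21 = symmetric_pairs(p1, p2)

for gamma in (0.01, 0.02, 0.04, 0.06, 0.08):  # values listed on the slide
    print(gamma, rbf_kernel(x12, x21, gamma))
```

The two orderings of a pair give different vectors, which is why both are added to the sample set.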

4
Data sets
  • SGD
  • Saccharomyces Genome Database (S. cerevisiae)
  • http://www.yeastgenome.org/
  • DIP
  • Database of Interacting Proteins
  • http://dip.doe-mbi.ucla.edu/
  • Pfam-A
  • Protein family classification
  • http://www.sanger.ac.uk/Software/Pfam/

5
Data extraction
7
Training and test samples.
  • Training
  • Negative-to-positive (N/P) ratios of 1.0, 2.0 and 4.0
    were considered.
  • N/P = 1.0, 2.0, 4.0
  • The number of positives is constant and equal to 1500.
  • P = 1500
  • Symmetric pairs are generated.
  • Test
  • P = 400, N/P = 180 (compared to N/P = 60 used before)
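
A rough sketch of how a training set with a given N/P ratio could be assembled. It assumes negatives are sampled at random from candidate pairs and that the symmetric counterpart is added for negatives as well as positives; the slides do not state either detail. All identifiers below are hypothetical placeholders.

```python
import random

def build_training_set(positives, candidate_negatives, np_ratio, seed=0):
    """Sample negatives at the requested N/P ratio and add symmetric pairs.

    positives, candidate_negatives: lists of (protein_a, protein_b) tuples.
    Returns a list of ((protein_a, protein_b), label) with label +1 or -1.
    """
    rng = random.Random(seed)
    n_negatives = int(round(np_ratio * len(positives)))
    if n_negatives > len(candidate_negatives):
        raise ValueError("not enough candidate negatives for this N/P ratio")
    negatives = rng.sample(candidate_negatives, n_negatives)

    samples = []
    for pairs, label in ((positives, +1), (negatives, -1)):
        for a, b in pairs:
            samples.append(((a, b), label))
            samples.append(((b, a), label))  # symmetric counterpart
    return samples

# Hypothetical identifiers standing in for real protein pairs.
positives = [(f"P{i}", f"Q{i}") for i in range(1500)]
candidates = [(f"P{i}", f"R{j}") for i in range(100) for j in range(100)]

for ratio in (1.0, 2.0, 4.0):
    train = build_training_set(positives, candidates, ratio)
    n_pos = sum(1 for _, label in train if label == +1)
    n_neg = len(train) - n_pos
    print(ratio, n_pos, n_neg)
```

Adding the symmetric counterpart doubles both classes, so the N/P ratio is unchanged.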

8
Multiple SVM
  • Training time depends polynomially on the
    training sample size.
  • The training data set is split into subsets, on
    each of which a separate SVM is trained.
  • The outputs are summarized by
  • Taking a weighted average -OR-
  • Taking the minimum
  • The first approach seems to work better with > 3
    SVMs
  • Precision and sensitivity are actually increased
9
Comparison tables.
  • Different data sets were used, although the overall
    parameters don't fluctuate much.
  • N/P = 60 on the test set gives similar values.
  • N/P = 180 is actually used.

10
Old data: different N/P ratios
11
New data: different N/P ratios
13
Old data: various numbers of SVMs
14
New data: various numbers of SVMs
15
Different summary approach
16
G-parameter of RBF
17
Conclusions
  • The method may be used to yield a number of unknown
    interactions by chance.
  • FPR is reasonably low.
  • Sensitivity is high.
  • Precision exceeds that of random assignment.
  • Training on sets with a high N/P ratio is required.
  • Multiple SVMs increase performance in both
    senses (precision and sensitivity).
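
For reference, the quantities named in these conclusions can be computed as sketched below. The test-set sizes (P = 400, N/P = 180) come from the slides; the confusion counts are hypothetical placeholders for illustration only and are not project results. The random-assignment baseline assumes positives are assigned uniformly at random, giving an expected precision of P / (P + N).

```python
def precision(tp, fp):
    """Fraction of predicted positives that are true positives."""
    return tp / (tp + fp) if tp + fp else 0.0

def sensitivity(tp, fn):
    """Fraction of actual positives that are recovered."""
    return tp / (tp + fn) if tp + fn else 0.0

def false_positive_rate(fp, tn):
    """Fraction of actual negatives that are predicted positive."""
    return fp / (fp + tn) if fp + tn else 0.0

def random_precision(np_ratio):
    """Expected precision of random assignment: P / (P + N) = 1 / (1 + N/P)."""
    return 1.0 / (1.0 + np_ratio)

# Test-set sizes from the slides: P = 400, N/P = 180, so N = 72000.
P, NP_RATIO = 400, 180
N = P * NP_RATIO

# Hypothetical confusion counts, for illustration only.
tp, fp = 200, 720
fn, tn = P - tp, N - fp

print("precision  ", precision(tp, fp))
print("sensitivity", sensitivity(tp, fn))
print("FPR        ", false_positive_rate(fp, tn))
print("random     ", random_precision(NP_RATIO))
```

At N/P = 180 the random baseline is 1/181, so even a low FPR leaves many false positives relative to the 400 true pairs; precision has to be judged against that baseline.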

18
Some Cons
  • Negative labels are assigned to some pairs that may
    actually be positive.
  • Positive might not mean interacting, an old
    problem of statistics.
  • Low sensitivity
  • Studies were performed on only one species.

19
Acknowledgements
  • Shinsuke Dohkan, Asako Koike and Toshihisa Takagi
  • For "Improving the performance of an SVM-based
    method for predicting protein-protein
    interactions", for keeping in touch, and for some
    valuable advice!
  • Dia project team
  • For the tool Dia!
  • You all
  • For listening