An Iterative Bayesian Updating Method for Biological Pathway Prediction

1
An Iterative Bayesian Updating Method for
Biological Pathway Prediction
  • Corey Powell

2
Biological Pathway Prediction
  • Input: A list of genes that work together in a
    biological pathway.
  • Goal: Determine all genes in the biological
    pathway.

3
Desirable Properties for a Biological Pathway
Prediction Method
  • The method should be able to integrate data from
    multiple data sources.
  • The method should give probabilistic assessments
    of pathway membership.
  • The method should identify genes that might be
    misclassified as being in the pathway.

4
The Method
  • Determine prior estimates for the probability
    that each gene is in the pathway.
  • Repeat:
      • Determine score functions for each gene and
        data source.
      • Compute positive and negative conditional
        distributions for the score function from each
        data source.
      • Update the probability estimates for each gene
        using the positive and negative conditional
        distributions together with Bayes' Theorem and
        the Naïve Bayes assumption.
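
As a rough illustration, the loop on this slide might be sketched in Python as below. The slides do not say how the score functions are defined, how the conditional distributions are estimated, or whether the original prior is reused on every pass, so all of those choices are assumptions made here: the score is a probability-weighted mean similarity to the other genes, the conditional distributions are soft histograms over equal-width bins with a pseudocount, and each pass restarts from the original priors. The names `score_genes`, `to_bins`, `conditionals`, `iterate`, and `block_similarity` are hypothetical.

```python
import math

def score_genes(similarity, probs):
    """Assumed score: probability-weighted mean similarity to the other genes."""
    n = len(probs)
    scores = []
    for g in range(n):
        weight = sum(probs[j] for j in range(n) if j != g)
        total = sum(probs[j] * similarity[g][j] for j in range(n) if j != g)
        scores.append(total / weight if weight > 0 else 0.0)
    return scores

def to_bins(scores, n_bins):
    """Discretize scores into equal-width bins over their observed range."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all scores are equal
    return [min(int((s - lo) / width), n_bins - 1) for s in scores]

def conditionals(bins, probs, n_bins, pseudocount=1.0):
    """Soft histograms: P(bin | in pathway) and P(bin | not in pathway)."""
    pos = [pseudocount] * n_bins
    neg = [pseudocount] * n_bins
    for b, p in zip(bins, probs):
        pos[b] += p          # each gene counts toward "positive" with weight p
        neg[b] += 1.0 - p    # and toward "negative" with weight 1 - p
    pos_sum, neg_sum = sum(pos), sum(neg)
    return [x / pos_sum for x in pos], [x / neg_sum for x in neg]

def iterate(priors, sources, n_iter=3, n_bins=2):
    """Repeat: score, estimate conditionals, update with naive Bayes."""
    probs = list(priors)
    for _ in range(n_iter):
        # Start from the prior log-odds (assumption: priors are reused each pass).
        log_odds = [math.log(p / (1.0 - p)) for p in priors]
        for similarity in sources:
            bins = to_bins(score_genes(similarity, probs), n_bins)
            pos, neg = conditionals(bins, probs, n_bins)
            # Naive Bayes: likelihood ratios from each source multiply,
            # so their logarithms add.
            for g, b in enumerate(bins):
                log_odds[g] += math.log(pos[b] / neg[b])
        probs = [1.0 / (1.0 + math.exp(-x)) for x in log_odds]
    return probs

def block_similarity(high, low, n=6, members=(0, 1, 2)):
    """Toy matrix: pathway members are mutually similar, all other pairs are not."""
    return [[high if i in members and j in members else low for j in range(n)]
            for i in range(n)]

# Toy data: genes 0 and 1 are known members, gene 2 is an unlabelled member.
priors = [0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
coexpression = block_similarity(0.9, 0.1)
interaction = block_similarity(0.8, 0.2)
probs = iterate(priors, [coexpression, interaction])
print([round(p, 3) for p in probs])
```

In this toy setup the unlabelled member (gene 2) ends up with a higher probability than the unrelated genes, because both data sources place it in the high-score bin alongside the known members.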

5
Testing the Method
  • The method was tested on the Ribosome, Oxidative
    Phosphorylation, Proteasome, and Cell Cycle
    pathways from KEGG, together with random
    pathways consisting of 25, 50, 75, and 100 genes.
  • The pathway probabilities were estimated using
    1) gene coexpression data, 2) protein interaction
    data, and 3) both 1) and 2).
  • The testing was based on 5-fold
    cross-validation, with a gene classified as being
    in the pathway if its probability was greater
    than 50%.
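
The evaluation protocol on this slide could be sketched roughly as follows. The slides give only the fold count and the 50% threshold, so the rest is assumed: the shuffling and fold assignment, the precision and recall definitions, and the helper names `five_fold_splits`, `classify`, and `precision_recall` are all hypothetical.

```python
import random

def five_fold_splits(pathway_genes, k=5, seed=0):
    """Yield (seed_genes, held_out_genes) pairs for k-fold cross-validation."""
    genes = list(pathway_genes)
    random.Random(seed).shuffle(genes)
    folds = [genes[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        seeds = [g for j, fold in enumerate(folds) if j != i for g in fold]
        yield seeds, held_out

def classify(probabilities, threshold=0.5):
    """Genes whose estimated pathway probability is strictly above the threshold."""
    return {gene for gene, p in probabilities.items() if p > threshold}

def precision_recall(predicted, actual):
    """Precision and recall of a predicted gene set against the true members."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Usage with made-up probabilities for three genes.
predicted = classify({"geneA": 0.92, "geneB": 0.50, "geneC": 0.07})
print(predicted, precision_recall(predicted, {"geneA", "geneB"}))
```

In each fold, the seed genes would be passed to the prediction method as the known pathway members, and the held-out genes would be checked against the classified set.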

6
General Behavior of the Method
  • Pathway probabilities tended to converge either
    to 0 or to 1. The convergence was very rapid for
    pathways that were well represented in the data.
  • The method exploded if run for too many
    iterations.

7
Results
  • For the random pathways, the method would rarely
    classify a gene as belonging to the pathway, and
    this classification was almost always incorrect.
  • The performance of the method was best on the
    Ribosome pathway, followed by the Proteasome,
    Oxidative Phosphorylation, and Cell Cycle
    pathways. Combining gene coexpression data and
    protein interaction data sometimes resulted in
    dramatic improvements in precision.