People - PowerPoint PPT Presentation

Title:

People

Provided by: jacobs3

Transcript and Presenter's Notes


1
Title
  • People

2
Outline
  • Motivation
  • Theoretical foundations
  • Biological extensions
  • Implementation
  • Validation techniques
  • Results from yeast

3
Motivation
  • Post-genomics, want to understand an organism's
    protein-protein interaction network
  • Model network as a probabilistic graph
  • Edge weights represent interaction probabilities
  • Interested in protein signaling cascades
  • Show up as simple paths in the graph
  • Want to find biologically interesting paths
    efficiently
  • Score paths, with high scores reflecting
    importance
  • Extended graph algorithms provide speed

4
Theoretical Foundation
  • Finding long, simple paths is NP-Complete
  • Reduction from TSP
  • Requirement for paths to be simple is what drives
    hardness
  • Color-Coding is a randomized, dynamic-programming
    based algorithm for finding paths of fixed length
  • Developed by Alon et al.

5
Color-Coding
  • Need to eliminate restriction for paths to be
    simple
  • Instead, randomly color graph and require paths
    be colorful (containing exactly one vertex of
    each color)
  • Number of colors = fixed length of paths
  • A colorful path is always simple
  • The problem of finding colorful paths can be
    solved with dynamic programming
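The coloring step above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the function names `random_coloring` and `is_colorful` are hypothetical, and vertices are assumed to be hashable labels.

```python
import random

def random_coloring(vertices, k, rng=random):
    """Assign each vertex one of k colors (0..k-1) uniformly at random."""
    return {v: rng.randrange(k) for v in vertices}

def is_colorful(path, coloring):
    """True if no color repeats along the path.

    For a path of k vertices under k colors, this is the same as
    containing exactly one vertex of each color.
    """
    colors = [coloring[v] for v in path]
    return len(set(colors)) == len(colors)

# A path that revisits a vertex repeats that vertex's color, so it can
# never be colorful. This is why a colorful path is always simple.
coloring = {"a": 0, "b": 1, "c": 0}
print(is_colorful(["a", "b"], coloring))       # distinct colors
print(is_colorful(["a", "b", "a"], coloring))  # revisits "a"
```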

6
Color-Coding DP
  • Key point: a colorful path of length k contains a
    colorful path of length k-1.
  • Store path information at each node for each
    subset of k colors
  • Only 2^k color subsets, rather than O(n^k) node
    subsets
  • Runtime is O(2^k k m) << O(k n^k) brute force
  • Space is O(2^k n) << O(k n^k) brute force
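The recurrence can be sketched as follows for one fixed coloring. This is a hedged illustration under stated assumptions rather than the presenters' implementation: the name `best_colorful_path`, the adjacency-dict input format, and the choice to minimize summed edge weight are all assumptions. Color subsets are stored as bitmasks, so each vertex has at most 2^k table entries.

```python
import math

def best_colorful_path(adj, coloring, k):
    """Minimum-weight colorful path on k vertices under a fixed coloring.

    adj: {u: {v: weight}}, an undirected graph with both directions stored.
    coloring: {vertex: color in 0..k-1}.
    Returns (weight, path), or (math.inf, None) if no colorful path exists.
    """
    # table[(v, mask)] = best (weight, path) ending at v whose vertices
    # use exactly the colors in the bitmask `mask`.
    table = {(v, 1 << coloring[v]): (0.0, (v,)) for v in adj}
    for _ in range(k - 1):
        extended = {}
        for (u, mask), (w, path) in table.items():
            for v, w_uv in adj[u].items():
                bit = 1 << coloring[v]
                if mask & bit:
                    continue  # color already used: path would not be colorful
                key = (v, mask | bit)
                cand = (w + w_uv, path + (v,))
                if key not in extended or cand[0] < extended[key][0]:
                    extended[key] = cand
        table = extended
    return min(table.values(), default=(math.inf, None))
```

Storing the full path in each entry keeps the sketch short but costs an extra factor of k in space; keeping only a back-pointer per entry would stay within the O(2^k n) bound quoted above.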

7
PICTURE

8
Monte Carlo Details
  • A colorful path is simple, but a simple path may
    not be colorful under a given coloring
  • Solution: run multiple independent trials
  • After one trial,
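As a rough guide to how many trials are needed, the standard color-coding analysis (not text recovered from this slide) says that a fixed simple path on k vertices is colorful under a uniformly random k-coloring with probability k!/k^k, which is at least e^-k. The sketch below turns that into a trial count; the function names and the default 99.9% target are illustrative assumptions.

```python
import math

def colorful_probability(k):
    """Chance that a fixed k-vertex simple path gets k distinct colors."""
    return math.factorial(k) / k**k

def trials_needed(k, success=0.999):
    """Smallest t with 1 - (1 - p)^t >= success, where p = k!/k^k."""
    p = colorful_probability(k)
    return math.ceil(math.log(1.0 - success) / math.log(1.0 - p))

for k in (4, 6, 8):
    print(k, colorful_probability(k), trials_needed(k))
```

Because trials are independent, the failure probability shrinks geometrically in the number of trials, so a high success probability costs only a multiplicative factor in runtime.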

9
Adding Biology
  • Color-Coding gives an algorithmic basis, now
    introduce biologically motivated extensions
  • Can set the start or end of path by type
  • E.g., screening by Gene Ontology categories
  • Can force the inclusion of a protein on the path
    by giving it a unique color
  • Using counters, can specify path must contain
    between x and y proteins of a given type
  • Computational cost multiplicative in y per counter

10
Adding Biology - Segmented Paths
  • Pathways may be ordered
  • Signaling pathways going from the membrane, to
    nuclear proteins and finally transcription
    factors
  • Assign each protein an integer label based on
    biological information, build path out of ordered
    sequences of labeled proteins
  • Now only need to constrain color collisions among
    proteins with the same label
  • If path length is equally split among labels,
    probability of correct coloring rises
  • Modifications allow for inability to assign
    proteins to unique labels

11
Adding Biology - More Structures
  • Modifications to the Color-Coding recurrence
    allow for the discovery of structures beyond
    simple paths
  • Example: two-terminal series-parallel graphs
  • Capture parallel signaling pathways
  • PICTURE

12
Generating Edge Weights
  • So far, have glossed over how weights
    (probabilities) on the protein graph are assigned
  • Here, following our previous work, use a logistic
    function of three variables (for a pair of
    proteins)
  • Number of times interaction between them was
    experimentally observed
  • Pearson correlation coefficient of expressions
    (for corresponding genes)
  • Their small world clustering coefficient
  • Use MIPS data as the gold standard for training
    the relative weighting of the variables
  • Taking log of weights makes path score additive
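The weighting scheme above can be illustrated with a small sketch. The coefficients in `coef` are made-up placeholders, not the values trained on MIPS, and all function names are hypothetical. The second half shows why taking logs makes the path score additive: the negative log of a product of edge probabilities equals the sum of the per-edge negative logs.

```python
import math

def interaction_probability(n_obs, expr_corr, clustering,
                            coef=(-4.0, 1.0, 2.0, 2.0)):
    """Logistic function of the three per-pair variables.

    coef = (intercept, weight for observation count, weight for expression
    correlation, weight for clustering coefficient). Placeholder values only.
    """
    b0, b1, b2, b3 = coef
    z = b0 + b1 * n_obs + b2 * expr_corr + b3 * clustering
    return 1.0 / (1.0 + math.exp(-z))

def edge_weight(p):
    """Negative log-probability: smaller weight means a more reliable edge."""
    return -math.log(p)

def path_weight(edge_probs):
    """Sum of edge weights, equal to -log of the product of probabilities."""
    return sum(edge_weight(p) for p in edge_probs)

probs = [0.9, 0.8, 0.5]
print(path_weight(probs), -math.log(0.9 * 0.8 * 0.5))
```

With this convention, the most probable path is the one of minimum total weight, which is the form a shortest-path style dynamic program expects.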

13
Application
  • Tested our simple path implementation with the
    yeast interaction network
  • 4,500 vertices, 14,500 edges
  • Based on interaction data from Database of
    Interacting Proteins (Feb 2004)
  • Varied path length; much faster than brute force
    at the longer end of the spectrum (14x for paths
    of length 9)
  • Focus on paths from membrane proteins to
    transcription factors

14
Validation Techniques
  • Three methods of validation
  • Two statistical
  • Functional enrichment p-value based on how many
    proteins in the path are similar (by GO category)
  • Weight p-value compares weights of paths to those
    found when the protein graph undergoes random
    degree-preserving shuffling
  • Lastly, search for expected pathways
  • MAP-Kinase, ubiquitin-ligation
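The weight p-value idea can be sketched as below. This is an assumption-laden illustration, not the presenters' procedure: it randomizes the graph with double edge swaps (one common way to preserve every vertex degree), ignores how edge weights are reassigned, and uses a hypothetical `empirical_p_value` with the usual +1 correction.

```python
import random

def degree_preserving_shuffle(edges, n_swaps, rng=random):
    """Randomize an undirected simple graph by double edge swaps.

    Each swap replaces edges (a, b), (c, d) with (a, d), (c, b), so every
    vertex keeps its degree. Swaps that would create a self-loop or a
    duplicate edge are skipped.
    """
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: swap could create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # swap would duplicate an existing edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list

def empirical_p_value(observed, random_scores):
    """Fraction of random path weights at least as low as the observed one."""
    hits = sum(1 for s in random_scores if s <= observed)
    return (hits + 1) / (len(random_scores) + 1)
```

In use, one would search for best paths in many shuffled graphs, collect their weights as `random_scores`, and compare the weight of each path found in the real network against that distribution.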

15
BIO ABOUT MAPK AND ETC?

16
Statistical Results
  • 100 best length-8 paths @ 99.9% success
  • 100 normal/2000 random paths used for weight
    p-value

17
MAPK Recovery Results

18
Conclusion
  • Presented efficient, color-coding based
    algorithms for finding simple paths
  • Added biological extensions, other structures
  • Applied our algorithms to yeast
  • Showed that 60% of discovered pathways were
    significantly enriched
  • Recovered known MAP-Kinase pathways

19
(Future Work?)