Title: Building biological networks from diverse genomic data
1. Building biological networks from diverse genomic data
- Chad Myers
- Department of Computer Science, Lewis-Sigler Institute for Integrative Genomics
- Princeton University
- PRIME Workshop on Pathway Databases and Modeling Tools
- June 16, 2006
2. Motivation: building biological networks from experimental data
Explosion of functional genomic DATA
?
KNOWLEDGE of components and inter-relationships that lead to function
- Find missing pathway components
- Detect uncharacterized crosstalk between pathways
- Discover novel pathways
3. Motivation: building biological networks from experimental data
The experimental data are noisy.
How can we harness this information without sacrificing precision?
4. Directed network discovery: involving the biologist in the search process
- Previous approaches to network analysis from genomic data
  - largely undirected global approaches that detect interesting network features
- Incorporating expert direction can
  - Improve sensitivity and precision by using context information
  - Focus on relevant information for biologist user (allows interactivity)
[Figure: Two-hybrid interaction network, yeast (SH3 domain), Boone lab]
Previous work: Bader et al. (2003), Asthana et al. (2004), Yamanashi et al. (2004, 2005), Kato et al. (2005)
5. bioPIXIE system overview
bioPIXIE: Pathway Inference from eXperimental Interaction Evidence
6. Overview
- How do we integrate heterogeneous evidence?
- Expert-driven network discovery
- Making it usable: practical visualization and other interface considerations
- Does it work? (evaluation experiments and biological validation)
- Challenges/opportunities and future work
7. Heterogeneous data integration
- Diverse forms of data: what's a unifying framework?
- Variable coverage, reliability, and relevance
- Integration scheme should utilize information in data when available, but be robust when missing
Data types: physical binding, cellular localization, genetic interaction, expression, sequence (TF motifs, coding, ...)
→ Map to associations of genes/proteins
→ Bayes net
8. Bayes net for evidence integration
- Input evidence grouped by lab (source) and by type
- Structure
  - Naïve Bayes (60 nodes)
  - (also tried TAN)
- CPTs
  - learned from GO gold standard
[Figure: Bayes net. We infer the root node, Functional Relationship, from the evidence nodes: Microarray correlation, Shared transcription factors, Synthetic lethality, Synthetic rescue, Co-localization, Purified complex, Two-hybrid, Affinity precipitation]
Fully-connected, weighted graph of proteins
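A minimal sketch of how a naïve Bayes network can turn several pieces of evidence into a posterior probability of functional relationship for one protein pair. The evidence names, the prior, and every probability in `CPT` are invented for illustration; in the talk the CPTs are learned from a GO gold standard, and the actual network has about 60 nodes. Unobserved evidence is simply left out of the product, which is one way to stay robust when data are missing.

```python
import math

# Hypothetical CPT entries: (P(evidence present | FR), P(evidence present | not FR)).
# These numbers are illustrative only; they are not learned from any gold standard.
CPT = {
    "two_hybrid": (0.20, 0.01),
    "synthetic_lethality": (0.10, 0.005),
    "co_localization": (0.60, 0.20),
}
PRIOR_FR = 0.05  # assumed prior probability that a random pair is functionally related


def posterior_fr(evidence, cpt=CPT, prior=PRIOR_FR):
    """Return P(FR | evidence) under a naive Bayes model.

    `evidence` maps evidence names to True (present) or False (tested, absent).
    Names missing from the mapping are treated as unobserved and contribute nothing.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for name, present in evidence.items():
        p_fr, p_not_fr = cpt[name]
        if present:
            log_odds += math.log(p_fr / p_not_fr)
        else:
            log_odds += math.log((1.0 - p_fr) / (1.0 - p_not_fr))
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Computing such a posterior for every protein pair would give the weights of the fully-connected graph named on the slide. Working in log-odds keeps the product of many likelihood ratios numerically stable.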
10. Expert-driven network discovery
- Local search in the PPI network centered at the query
- Which proteins should we extract as a single, functionally coherent group?
- Should consider confidence in links and topology surrounding query group
11. Extracting relevant proteins
- Basic idea: compute expected linkage to query set
- $e_{ij} = P(\text{protein } i \text{ is functionally related to protein } j \mid \text{evidence})$
- $X_{ij}$: binary RV with prob. $e_{ij}$
- $S_Q(p_i) = \sum_{j \in Q} X_{ij}$: number of links from protein $i$ to query set, $Q$
- Find proteins that maximize the expected linkage, $E[S_Q(p_i)] = \sum_{j \in Q} e_{ij}$
What about indirect links to the query set?
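The expected-linkage score on this slide can be sketched in a few lines. The dictionary layout (pairs stored as `frozenset` keys) and the function names are assumptions made for this example, not part of bioPIXIE.

```python
def expected_linkage(e, query, candidates):
    """Score each candidate i outside the query by the sum of e_ij over j in Q.

    `e` maps frozenset({i, j}) to the posterior probability that i and j are
    functionally related; pairs absent from `e` count as probability 0.
    """
    query = set(query)
    scores = {}
    for i in candidates:
        if i in query:
            continue  # query proteins are not scored against themselves
        scores[i] = sum(e.get(frozenset((i, j)), 0.0) for j in query)
    return scores


def top_k(scores, k):
    """Return the k highest-scoring proteins, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Because each X_ij is a binary variable with probability e_ij, linearity of expectation makes the expected number of links a plain sum, so no sampling is needed.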
12. Graph search: handling indirect links
- Solution: iterative expanding search where indirect links to the query through high-confidence neighbors are counted
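One way to realize an expanding search is a two-round procedure: first pick a few high-confidence direct neighbors of the query, then re-score the remaining proteins against the query plus those neighbors. The slide does not give the details, so the two-round structure, the `min_conf` threshold, and the parameter names below are all assumptions for illustration.

```python
def link(e, i, j):
    """Posterior probability of a functional relationship; 0 if the pair is absent."""
    return e.get(frozenset((i, j)), 0.0)


def expanding_search(e, proteins, query, n_neighbors=2, min_conf=0.5, n_results=5):
    """Two-round search sketch (hypothetical parameters, not bioPIXIE's actual ones)."""
    query = set(query)

    def score(i, group):
        # Expected number of links from protein i into `group`.
        return sum(link(e, i, j) for j in group)

    # Round 1: rank by expected linkage to the query, then keep only proteins
    # that have at least one high-confidence direct link to a query protein.
    direct = {p: score(p, query) for p in proteins if p not in query}
    ranked_direct = sorted(direct, key=direct.get, reverse=True)
    neighbors = [
        p for p in ranked_direct
        if max((link(e, p, q) for q in query), default=0.0) >= min_conf
    ][:n_neighbors]

    # Round 2: links to the chosen neighbors now count as indirect links to the query.
    expanded = query | set(neighbors)
    rest = {p: score(p, expanded) for p in proteins if p not in expanded}
    ranked_rest = sorted(rest, key=rest.get, reverse=True)
    return (neighbors + ranked_rest)[:n_results]
```

In a toy graph where `I` is strongly linked to neighbor `N` but not to the query itself, the second round moves `I` ahead of a protein with only a weak direct link, which is the behavior the slide describes.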
14. Making bioPIXIE usable
- Guiding principles
  - Accessibility (users can access most recent data with little effort)
  - Simplicity vs. flexibility
  - Drill-down (details, e.g. supporting exp. data, hidden until requested)
  - Browseable
15. Graph visualization
17. Evaluation experiments
Recovering known network components: How much does integration help?
- Results averaged over 31 pathways, processes, and complexes (KEGG, GO, MIPS)
- 10 random proteins as query set and try to recover remaining members
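The hold-out protocol above can be sketched as follows. `rank_fn` stands for any search method that takes a query set and returns a ranked list of non-query proteins; that interface, the function names, and the use of average precision as a summary of the precision-recall curve are assumptions made for this example.

```python
import random


def average_precision(ranked, positives):
    """Mean of the precision values at the rank of each recovered positive."""
    hits, total = 0, 0.0
    for rank, protein in enumerate(ranked, start=1):
        if protein in positives:
            hits += 1
            total += hits / rank
    return total / len(positives) if positives else 0.0


def evaluate_group(members, rank_fn, n_query=10, seed=0):
    """Hide all but `n_query` random members of a known group and try to recover them.

    `members` must contain more than `n_query` proteins.
    """
    rng = random.Random(seed)
    pool = sorted(members)  # sort so the sample is reproducible for a given seed
    query = set(rng.sample(pool, n_query))
    held_out = set(pool) - query
    ranked = rank_fn(query)
    return average_precision(ranked, held_out)
```

Averaging `evaluate_group` over many known pathways, processes, and complexes, and over several random seeds, would give a single number per method that can be compared across integration or search strategies.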
18. Evaluation experiments (2)
Recovering known network components: Do naïve methods of integration/search work just as well?
- Results averaged over 31 pathways, processes, and complexes (KEGG, GO, MIPS)
- 10 random proteins as query set and try to recover remaining members
19. Biological validation: finding new components
- Using bioPIXIE to characterize unknown genes
S. cerevisiae uncharacterized gene, YPL077C: predicted involvement in chromosome segregation
20. Biological validation: finding new components
P-value based on blind counting: $1.98 \times 10^{-7}$, Fisher's exact test
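For readers unfamiliar with the test, a one-sided Fisher's exact test on a 2x2 table of counts can be computed directly from the hypergeometric distribution. The slide reports only the resulting P-value, not the underlying counts, so this sketch makes no attempt to reproduce it; the function name and table layout are assumptions for illustration.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided P-value for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of seeing a top-left
    count of `a` or larger under the null hypothesis of no association.
    """
    row1 = a + b          # size of the first group
    col1 = a + c          # total number of "positive" observations
    total = a + b + c + d
    denom = comb(total, row1)
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        # Hypergeometric probability of exactly k positives in the first group.
        p += comb(col1, k) * comb(total - col1, row1 - k) / denom
    return p
```

In a counting experiment, the rows would be two strains (for example mutant and wild type) and the columns the number of cells scored with and without the phenotype.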
21. Biological validation: novel links between pathways
- DNA replication initiation: Cdc7, the switch that starts replication (activated by Dbf4)
- Linked to Hsp90 complex by our method
- Hsp90 (yeast: hsc82, hsp82): cytosolic molecular chaperone that participates in the folding of several signaling kinases and hormone receptors
(Helmut Pospiech)
22. Genetic analysis of DNA replication-Hsp90 link
[Figure: three growth panels (RT, 30 °C, 37 °C; 10^5 cells each) with strains wt, dbf4Δ, hsp82Δ, hsc82Δ, cpr7Δ, dbf4Δ hsp82Δ, dbf4Δ hsc82Δ, dbf4Δ cpr7Δ]
YKO: Dbf4 vs. hsp82, hsc82 and co-chaperones cpr7, sti1, cdc37
24. Practical challenges/opportunities
- Visualizing complex networks of interactions in a meaningful way
  - how does it scale with added data?
  - easy user navigation around the network
- Data-centric vs. established knowledge views
  - How do we overlay current knowledge of pathways with predictions derived from experimental data?
25. Future work
An observation: the more specific we can be about the end goal, the better the accuracy of our prediction
26. Future work
Exploiting relevance and reliability variation: context-specific integration
27. Summary
- bioPIXIE can facilitate precise network discovery from experimental data using
  - Bayesian data integration
  - Expert-directed search
  - Web-based dynamic interface
- bioPIXIE is an effective tool for browsing genomic evidence and generating specific, testable hypotheses
http://pixie.princeton.edu
28. Acknowledgements
Olga Troyanskaya, Drew Robson, Adam Wible, Kara Dolinski, Camelia Chiriac, Matt Hibbs, Curtis Huttenhower, David Botstein Lab, Leonid Kruglyak Lab
Thank you!
http://pixie.princeton.edu
29. Evaluation experiments (3): what about noise in the query set?
[Figure: AUPRC vs. number of random proteins out of 20 total query proteins]
30. Evaluation experiments (4)
Comparing with existing approaches
- SEEDY: proteins ranked by max. direct connection to query
- Complexpander
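The baseline described above, ranking by the strongest single direct link to the query, can be sketched as follows. This is only a reading of the one-line description on the slide, not the published implementation of either tool, and the data layout and function name are assumptions.

```python
def rank_by_max_link(e, proteins, query):
    """Rank non-query proteins by their single strongest link to any query protein.

    `e` maps frozenset({i, j}) to a link confidence; absent pairs count as 0.
    """
    query = set(query)
    score = {
        p: max((e.get(frozenset((p, q)), 0.0) for q in query), default=0.0)
        for p in proteins
        if p not in query
    }
    return sorted(score, key=score.get, reverse=True)
```

A protein with one strong link outranks a protein with several moderate links under this rule, whereas a sum over all query links would favor the latter. That difference is what the comparison in this part of the talk probes.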
31. Hydroxyurea sensitivity (replication inhibitor)
[Figure: growth panels at 30 °C and 37 °C (10^6 cells) with HU at 0 mM, 50 mM, and 100 mM; strains wt, dbf4Δ, hsp82Δ, hsc82Δ, cpr7Δ, sti1Δ, dbf4Δ hsp82Δ, dbf4Δ hsc82Δ, dbf4Δ cpr7Δ, dbf4Δ sti1Δ]
32. Is this interaction specific to DNA replication?
MMS sensitivity (induces DNA damage)
- Conclusions
  - Hsp90 complex plays a specific role in DNA replication
  - Hsc82 and hsp82 do not have identical functions
  - Possible new link between signaling cascades, stress, and DNA replication
  - Our system generates specific, testable hypotheses
[Figure: MMS sensitivity panel at 37 °C (10^6 cells); strains wt, dbf4Δ, hsp82Δ, hsc82Δ, cpr7Δ, sti1Δ, dbf4Δ hsp82Δ, dbf4Δ hsc82Δ, dbf4Δ cpr7Δ, dbf4Δ sti1Δ]
MMS treatment has no apparent effect at RT, 30 °C or 37 °C (shown)