Integrating heterogeneous genomic data wrap-up

1
Integrating heterogeneous genomic data wrap-up
CSCI5980 Functional Genomics, Systems Biology
and Bioinformatics
  • Rui Kuang and Chad Myers
  • Department of Computer Science and Engineering
  • University of Minnesota

2
Announcements
  • HW 3 due tonight at midnight!
  • Project presentations: May 5, 7, 12, 14
  • Presentation schedule out tomorrow
  • Attendance required!
  • Project reports due midnight, May 14th
  • 10 pages: abstract, intro, methods, experiments,
    discussion and/or conclusion

3
Outline for today
  • Wrap up discussion of inference of functional
    linkage network based on heterogeneous data
  • Paper discussion: Workman et al.

4
Handling noise
  • General approaches
  • Simple strategies
  • Conjunctive integration (keep only data/features
    that are supported independently across multiple
    input datasets) (AND integration)
  • Disjunctive integration (OR integration)
  • Pros/Cons?
  • More sophisticated idea: use machine learning to
    learn which datasets are reliable (optimal
    weighting); this requires a gold standard

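The two simple strategies above can be sketched in a few lines of Python. This is only an illustration, not the lecture's implementation: the gene names (G1 to G6), the three toy datasets, and the helper names `pair`, `conjunctive`, and `disjunctive` are all hypothetical, and "supported across multiple datasets" is read here as "reported by at least two datasets".

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Set

Pair = FrozenSet[str]


def pair(a: str, b: str) -> Pair:
    """Unordered gene pair, so (A, B) and (B, A) compare equal."""
    return frozenset((a, b))


def disjunctive(datasets: Iterable[Set[Pair]]) -> Set[Pair]:
    """OR integration: keep a pair if any dataset reports it."""
    result: Set[Pair] = set()
    for dataset in datasets:
        result |= dataset
    return result


def conjunctive(datasets: Iterable[Set[Pair]], min_support: int = 2) -> Set[Pair]:
    """AND integration: keep a pair only if at least `min_support` datasets report it."""
    counts = Counter(p for dataset in datasets for p in dataset)
    return {p for p, n in counts.items() if n >= min_support}


# Hypothetical toy datasets (placeholder gene names, not real measurements).
two_hybrid = {pair("G1", "G2"), pair("G2", "G3")}
co_precipitation = {pair("G2", "G1"), pair("G3", "G4")}
co_expression = {pair("G1", "G2"), pair("G3", "G4"), pair("G5", "G6")}
all_datasets: List[Set[Pair]] = [two_hybrid, co_precipitation, co_expression]

and_pairs = conjunctive(all_datasets)
or_pairs = disjunctive(all_datasets)
```

AND integration trades coverage for confidence (fewer pairs, each backed by independent evidence), while OR integration keeps every reported pair, including any noise in each dataset.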
5
Gold Standards
  • Expert-curated assignments of genes to functional
    groups, complexes, or pathways
  • Gene Ontology
  • KEGG - Kyoto Encyclopedia of Genes and Genomes
  • MIPS - Munich Information Center for Protein
    Sequences

6
Remember the Gene Ontology?
  • Three hierarchies
  • Molecular function
  • Biological process
  • Cellular component
  • Curated annotation

7
Gold standard (pairwise data)
Predicted Gene Pairs
Based on TPs and FPs, calculate precision and
recall, and draw ROC curves
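A minimal sketch of this evaluation, assuming predictions arrive as a ranked list of gene pairs and the gold standard provides both positive (same group) and negative pairs. The function name `precision_recall_points`, the toy pairs, and the choice to skip pairs the gold standard does not cover are assumptions for illustration, not details taken from the slides.

```python
from typing import FrozenSet, List, Set, Tuple

Pair = FrozenSet[str]


def precision_recall_points(
    ranked_pairs: List[Pair],
    gold_positives: Set[Pair],
    gold_negatives: Set[Pair],
) -> List[Tuple[float, float]]:
    """Walk down the ranked predictions and record (precision, recall)
    after each pair that the gold standard can judge."""
    tp = fp = 0
    points = []
    for p in ranked_pairs:
        if p in gold_positives:
            tp += 1
        elif p in gold_negatives:
            fp += 1
        else:
            continue  # pair not covered by the gold standard: skipped (an assumption)
        points.append((tp / (tp + fp), tp / len(gold_positives)))
    return points


# Hypothetical toy example with placeholder gene names.
ab, cd, ac, bd = (frozenset(x) for x in (("A", "B"), ("C", "D"), ("A", "C"), ("B", "D")))
points = precision_recall_points(
    ranked_pairs=[ab, cd, ac, bd],   # most confident prediction first
    gold_positives={ab, ac},
    gold_negatives={cd},
)
```

Plotting the resulting points gives a precision-recall curve. An ROC curve would additionally need the false positive rate (FP divided by the number of gold-standard negatives), which this sketch does not compute.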
8
An example evaluation based on GO gold standard
9
Disagreement between gold standards
10
Caveat: functional biases in evaluations
11
Comparison of individual datasets
(based on comparison against GO bio. process gold
standard)
Myers et al. Finding function: evaluation methods
for functional genomic data. BMC Genomics (2006)
12
Process-specific evaluation
13
Bayesian data integration: a simple model
We'd like to infer whether a gene pair (e.g. Cdc7, Dbf4)
is functionally related
(naïve Bayes classifier)
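As a rough sketch of what a naïve Bayes classifier computes for one gene pair: each dataset contributes the likelihood of its observation under "functionally related" (FR) and under "not related", the observations are treated as conditionally independent given the class, and the posterior follows from Bayes' rule. The function name, the prior, and the likelihood values below are hypothetical placeholders, not numbers from the lecture.

```python
import math
from typing import List, Tuple


def naive_bayes_posterior(prior_fr: float, likelihoods: List[Tuple[float, float]]) -> float:
    """Posterior P(FR | evidence) for one gene pair.

    `likelihoods` holds one (P(e_i | FR), P(e_i | not FR)) tuple per dataset.
    The naive Bayes assumption is that datasets are conditionally
    independent given the class, so their likelihoods multiply.
    """
    log_fr = math.log(prior_fr)
    log_not = math.log(1.0 - prior_fr)
    for p_given_fr, p_given_not in likelihoods:
        log_fr += math.log(p_given_fr)
        log_not += math.log(p_given_not)
    # Normalize in log space to avoid underflow when there are many datasets.
    m = max(log_fr, log_not)
    num = math.exp(log_fr - m)
    return num / (num + math.exp(log_not - m))


# Hypothetical numbers: a low prior, two datasets that each favor FR.
posterior = naive_bayes_posterior(0.01, [(0.6, 0.05), (0.3, 0.02)])
```

With no evidence the function returns the prior unchanged; each dataset then shifts the odds by its likelihood ratio, so reliable datasets (large ratios) dominate the result.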
14
Modeling process-specific signal
[Figure: Cdc7 and Dbf4; labels: Contexts (derived from user's query), Datasets, Reliability variation]
What are we assuming here?
15
Bayesian integration: an intuitive view
[Figure: Cdc7 and Dbf4; labels: Contexts (derived from user's query), Datasets, Reliability variation. Two panels (Context: ribosome; Context: membrane organization), each showing PDFs of observed co-expression (Pearson correlation) for unrelated genes and functionally related genes]
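One way to read the two panels: the same observed correlation can carry very different weight depending on the context, because the densities for related and unrelated pairs separate differently. The sketch below uses Gaussian densities with made-up parameters; the parameter values, the `PARAMS` table, and even which context is the more informative one are assumptions for illustration only (a Gaussian is also a simplification for a quantity bounded in [-1, 1]).

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical (mean, sd) of the Pearson correlation for each class, per context.
PARAMS = {
    "ribosome": {"related": (0.7, 0.15), "unrelated": (0.0, 0.3)},
    "membrane organization": {"related": (0.1, 0.3), "unrelated": (0.0, 0.3)},
}


def likelihood_ratio(r: float, context: str) -> float:
    """P(r | related, context) / P(r | unrelated, context)."""
    related = normal_pdf(r, *PARAMS[context]["related"])
    unrelated = normal_pdf(r, *PARAMS[context]["unrelated"])
    return related / unrelated


# The same observed correlation, weighed in two different contexts.
lr_ribosome = likelihood_ratio(0.7, "ribosome")
lr_membrane = likelihood_ratio(0.7, "membrane organization")
```

Under these made-up parameters, a correlation of 0.7 is strong evidence in the first context and only weak evidence in the second, which is the kind of reliability variation a context-specific model is meant to capture.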
16
Bayesian integration example
  • DNA replication initiation complex, Cdc7-Dbf4

[Figure: Cdc7 and Dbf4 with observed evidence; context derived from user's query. Inferred probability that the pair is functionally related (FR): 0.998]
18
Evaluation experiments
Recovering known network components
Conclusion 1: Robust integration is important
Precision (TP / (TP + FP))
of recovered same-process protein pairs
(8 of 174 input datasets)
19
Evaluation experiments (2)
Does incorporating biological context information
improve prediction?
Comparison: same data, same query; simple (global) vs.
context-specific integration
[Figure: simple structure (global) and context-sensitive BN; labels: Datasets, Contexts]
20
Evaluation experiments (2)
Conclusion 2: Using contextual information in
integration is critical!
RNA splicing (GO:0008380)
10-protein query; each point is the average of 50 trials
21
Evaluation experiments (2)
RNA splicing: same 5 query genes
Context-specific network: 6 FPs / 80% precision
Global network: 22 FPs / 27% precision
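As a quick consistency check on these figures, precision = TP / (TP + FP) can be solved for TP. The sketch assumes the reported precision values are percentages (80% and 27%); the function name is hypothetical.

```python
def implied_true_positives(false_positives: int, precision: float) -> float:
    """Solve precision = TP / (TP + FP) for TP."""
    return precision * false_positives / (1.0 - precision)


context_tp = implied_true_positives(6, 0.80)   # context-specific network
global_tp = implied_true_positives(22, 0.27)   # global network
```

Under that reading, the context-specific network implies about 24 true positives and the global network about 8, so both results would correspond to roughly 30 recovered genes, with the context-specific one containing far more correct ones.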
22
A consistent improvement
  • Context-specific integration improves 44/53
    evaluated bio. process GO terms, by an average of 25%

10-protein query; each point is the average of 50 trials
[Figure: axes labeled Context-specific network and Global network (both in terms of recovered proteins)]
23
A practical system for network discovery
bioPIXIE: Pathway Inference from eXperimental
Interaction Evidence
  • Gene expression: gene expression datasets 1, 2, ..., N
  • Physical interactions: yeast two-hybrid dataset 1,
    co-precipitation dataset 1
  • Genetic interactions: synthetic lethality dataset,
    synthetic rescue dataset
  • Other: transcription factor binding sites,
    localization, curated literature
  • Data integration via a Bayesian network
  • Network recovery algorithm
  • Results displayed in a dynamic visualization
Myers et al. Discovery of biological networks
from diverse functional genomic data. Genome
Biology (2005).
25
Making our approach practical: effective data
visualization

  • Guiding principles
  • Accessibility (users can access the most recent
    data with little effort)
  • Drill-down (details, e.g. supporting experimental
    data, hidden until requested)
  • Browseable

26
Graph visualization


28
Biological validation: characterizing new genes
Uncharacterized genes: YPL077C, YPL017C,
YPL144W. Predicted involvement in chromosome
segregation
29
Biological validation: characterizing new genes
[Figure: Differential Interference Contrast, DAPI, and FACS panels for wild type, YPL017CΔ, YPL077CΔ, and YPL144WΔ. Prediction: chromosome segregation]