Title: Networks and Pathways I
1Networks and Pathways I
CBW Bioinformatics Workshop, February 24th 2005, Vancouver
Christopher Hogue, The Blueprint Initiative
2About this talk
- The Problem of Choice: Too Many Databases
- Data Exchange Formats
- Pathway Resources
- KEGG
- EcoCyc
- Small Molecule Resources
- PubChem
- SMID-BLAST
3Molecular Assembly Data
- Interaction pair
- A binds B
- Database of Interactions
- Molecule Vertex
- Interaction Edge
- Tools/Computations
- Graph Theory
- Pathway Finding
- Simulations
- Cellular CAD
Goodsell
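The vertex/edge view above can be sketched in a few lines of Python. This is a minimal illustration, not any particular database's data model: the molecule names and the `build_network` and `find_path` helpers are hypothetical, and pathway finding is reduced to a plain breadth-first search over undirected "A binds B" pairs.

```python
from collections import defaultdict, deque


def build_network(pairs):
    """Build an undirected adjacency map: molecules are vertices, interactions are edges."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns one shortest chain of interactions, or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessors to recover the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical interaction pairs ("A binds B"); the names are placeholders.
pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
network = build_network(pairs)
print(find_path(network, "A", "D"))
```

Real interaction records carry far more than a pair of names (evidence, experimental method, molecule type), so a production model would attach attributes to both vertices and edges.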
4Molecular Assembly: What Databases to Use?
- DNA
- RNA
- Proteins
- Small molecules
- Complexes
5The Problem
- So many assembly databases, all with their own data models, formats, and data access methods.
http://cbio.mskcc.org/prl/
6User Behavior
- The problem of too much choice.
- (M. Lepper at Stanford and S. Iyengar at Columbia)
- Two tables in a supermarket
- 24 jars of jam vs 6 jars of jam.
- 3% vs 30% of shoppers bought a jar.
- Choice frustration.
- Leads to incrementalism as essential user criticism is withdrawn.
- Can't Debug
- This jam is a little bitter compared to
- the other 6?
- the other 24?
- A whole lot of bad jam that nobody wants to buy
7User Behavior
- The problem of too much choice.
- (M. Lepper at Stanford and S. Iyengar at Columbia)
- Two tables in a supermarket
- 24 jars of jam vs 6 jars of jam.
- 3% vs 30% of shoppers bought a jar.
- Choice frustration.
- Leads to incrementalism
- Essential user criticism is withdrawn.
- Can't Debug
- This jam is a little bitter compared to
- the other 6?
- the other 24?
- A whole lot of bad jam that nobody wants to buy
8Standards Fatigue
- Data Standards are not an effective goal to achieve results in a timely way
- Interactions/Pathways since NIH meeting in Nov 1999. Efforts are still not integrated (PSI/IMEX and BioPAX).
- Information Systems are better goals.
- Wet Lab Scientists are busy people who are (excuse me) trying to write papers.
- Ongoing wishful thinking about latest new technology.
- If only we had the semantic web it would fix everything!
9Community Standards
- IMEX (BIND/DIP/INTACT/MINT/MIPS)
- BioPAX (pathway databases)
- SBML (>70 software systems collaborating)
- Cytoscape (collaborating interface developers)
- NCBI/Blueprint (architecture)
- Model Organism Databases (GMOD architecture)
- Journals and Editors
- Scientific Societies (FASEB)
- Member and Non-member Scientists
10Interaction Standards - PSI
11BioPAX Pathways/Reactions
12Exchange Formats in the Pathway Data Space
Database Exchange Formats
Simulation Model Exchange Formats
BioPAX
SBML, CellML
Genetic Interactions
PSI-MI 2
Rate Formulas
Biochemical Reactions
13Two Views on Biomolecular Assembly Data
Integration
- Separate Models
- Pathways
- Interactions
- Separate Databases
- Multiple DB ontologies
- Ad-hoc curation standards
- Ontology Consortia
- PSI
- BioPAX
- APIs Exchange Only
- Publish or perish
- Unified Model
- Networks with Interactions and Reactions
- GenBank-Like Data Archive
- One Ontology archiving all
- Professional Curation
- Single Curation Standard
- FTP Services
- APIs Atomistic Objects
- Service or perish
14Where to define data objects? API or Exchange or Archive?
- Software Systems Components (OSI Layers)
- Human Interfaces
- Application Programming Interfaces
- Communications Protocols (Exchange)
- Content Structure (Archive)
- Database (ODBC/JDBC compliant MySQL)
- Document Structures (XML)
- Architectures (Compatible orchestration of the above)
- Platforms (Runs the above: Windows, Linux, Unix)
Atomistic
All-or-none
15BioPAX Motivation
Common format will make data more accessible,
promoting data sharing and distributed curation
efforts
Application
Database
User
With BioPAX
Before BioPAX
>150 DBs and tools
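A rough piece of arithmetic shows why a common format helps at this scale. The sketch below is an illustration under stated assumptions, not a figure from the talk: it assumes every resource would otherwise need a dedicated one-way translator to every other resource, and that with a shared format each resource needs only one exporter and one importer. The function names are hypothetical.

```python
def translators_without_common_format(n_resources):
    """Worst case: one one-way translator for every ordered pair of resources."""
    return n_resources * (n_resources - 1)


def translators_with_common_format(n_resources):
    """One exporter to, and one importer from, the shared format per resource."""
    return 2 * n_resources


n = 150  # lower bound taken from the ">150 DBs and tools" figure label
print(translators_without_common_format(n))
print(translators_with_common_format(n))
```

In practice not every pair of resources needs to exchange data, so the first number is an upper bound; the point is that the pairwise count grows quadratically while the shared-format count grows linearly.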
16Pathways, Interactions and Signaling
Metabolic Pathways
Molecular Interaction Networks
Signaling Pathways
18Summary: Working with a spectrum of communities.
- Identify the communities.
- Recognize that communities are disjoint.
- Success will arise from broad collaboration across the spectrum of identified communities.
- Service all communities effectively with a whole system.
- Drive innovation more through applications development and use.
- Gain and effectively incorporate user critique.
- Understand user needs, behaviors.
19Pathway Databases
20http://www.genome.jp/kegg/
34PubChem Small Molecules
35PubChem
- Substance
- Descriptions of chemical samples, from a variety of sources, and links to PubMed citations, protein 3D structures, and biological screening results that are available in PubChem BioAssay.
- If the contents of a chemical sample are known, the description includes links to PubChem Compound.
- Compound
- Includes mixtures
39Links PubChem Bioassay
Similarity Search
Similar Compounds
45Small Molecule Interaction DB.
- SMID-BLAST - for finding small molecule binding
sites based on 3D structures.
47What's in SMID? SMID-BLAST?
- SMID is a derived relational database
- 3D structures that have small molecule binding sites
- CDD domain regions: families of conserved domains
- Small molecule binding residues are mapped onto CDDs.
- SMID-BLAST enhances domain searching with small molecule binding context.
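The idea of carrying binding residues from a conserved domain onto a query sequence can be sketched as follows. This is not the SMID-BLAST implementation: it assumes a domain hit supplies an alignment of domain positions to query positions, and the `transfer_binding_sites` helper, the positions, and the ligand names are all hypothetical.

```python
def transfer_binding_sites(domain_to_query, domain_sites):
    """Map ligand-binding positions annotated on a domain onto a query sequence.

    domain_to_query: aligned columns as {domain position: query position}.
    domain_sites: {ligand name: [domain positions that contact the ligand]}.
    Returns {ligand name: [query positions]}; unaligned positions are dropped,
    and ligands with no aligned positions are omitted.
    """
    transferred = {}
    for ligand, positions in domain_sites.items():
        mapped = [domain_to_query[p] for p in positions if p in domain_to_query]
        if mapped:
            transferred[ligand] = mapped
    return transferred


# Hypothetical alignment and binding-site annotations.
alignment = {10: 42, 11: 43, 12: 44, 15: 48}
sites = {"phenol": [10, 12, 13], "heme": [20]}
result = transfer_binding_sites(alignment, sites)
print(result)
```

A real system would also need to check whether the residue at each mapped query position is compatible with binding, which this sketch ignores.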
48Proteomics HUPO Poster
- Proteomics: Phenol upregulated protein in H. salinarium.
- Spots identified by 2D gels of +/- 1 mM phenol in 4.5 M NaCl
- Han, Han, Kim, Joo and Chan-Wha Kim, Korea University
- H. salinarium is not sequenced
- Mass spec peptide hits to Halobacterium sp. NRC-1
- GI 15791191 (Vng2406c) and
- GI 15791140 (Vng2339c)
- Poster authors presented no conclusions other than that these were completely unknown proteins.
50Little information from CDD
53Completely relaxed Search settings
55Aromatic ligand binding site phenol
56Oxygen Reactive site
57SMID-BLAST
- Offers small molecule context in addition to CDD domain hits
- With SMID-BLAST we can speculate on how two proteins work to utilize phenol as a carbon source
- Reactive species and loose specificity hydrophobic binding sites.
58SMID-BLAST Standalone
- Scoring System
- Distinguishes site specificity
- Weights substrate/binding site size
- Generates GenPept Annotation
- Suitable for use in sequence analysis pipelines