Title: BioPAX: The Birth of a Data Exchange Language for Biological Pathways
1 BioPAX: The Birth of a Data Exchange Language for Biological Pathways
- Joanne Luciano
- BioPAX Group
- www.biopax.org
- 7th International Annual Bio-Ontologies Meeting
- 30 July 2004
- Glasgow, Scotland
- United Kingdom
2 Introduction
- BioPAX: Biopathway Exchange Language
- Emerged at ISMB
- conceived at ISMB 01
- born just before ISMB 02 (Protégé workshop)
- crawling at ISMB 03 (Level 0.5)
- walking at ISMB 04 (Level 1.0)
- approaching the terrible twos
3 What is a pathway?
Depends on who you ask
4 Research Community Need
- Existing resources: WIT, BioCyc, Reactome, aMAZE, KEGG, BIND, DIP, HPRD, MINT, IntAct, PSI format, CSNDB, TRANSPATH, TRANSFAC, PubGene, GeneWays
- Pathway Databases: Metabolic, Protein Interaction, Signal Transduction, Gene Regulatory
- Integrated Pathway Database
5 Design Goals
- Encapsulation: An entire pathway in one record
- Compatible: Use existing standards wherever possible
- Computable: From file reading to logical inference
- Successful: Buy-in from the research community
6 Technical Logistics: Goals
- Interoperability
- Integration and exchange of pathway data
- Interchange through a common (standard) representation
- Accommodate existing database representations
- Provide a basis for future databases
- Enable development of tools for searching and reasoning over the database
7 Technical Logistics (contd)
- Why OWL DL?
- Expressivity (biology has complex relationships)
- W3C Standard (use existing standards)
- Semantic Web enabled
- XML based (the exchange language in computing)
- Machine Computable
- Enable full reasoning capability, from file reading to logical inference
- Facilitate integration of knowledge, data, tool development
- Uncover inconsistencies and new knowledge
- OWL DL
- Complete: all conclusions are guaranteed to be computed
- Decidable: all computations will finish in finite time (with OWL Lite, a short amount of time)
8 Social Logistics: How we engendered buy-in from the field. Made it much easier
- Take things in steps
- Pathway Database -> Data Exchange Format
- Data Exchange Format -> Release in Levels of increasing complexity (early success leads to early adoption leads to the possibility of overall project success)
- Get buy-in and get involvement: leads to acceptance later
- Support the existing databases (BioCyc, WIT, BIND, etc.)
- Got database sources to agree to participate in the development to assure that their DBs will be properly represented
- Got database sources to agree to export in the new format once it is defined
9 Social Logistics (contd)
- Get buy-in (continued)
- Community Involvement and Support
- Core group (from community, small, meet regularly)
- Mailing List
- User community
- Subgroups
- International Meetings and Presentations
- Tool developers
- Modelers
- Users (researchers)
- Ontology developers
- Database providers
- Complementary representations (SBML, CellML)
- Like minds
- General Community
10 Social Logistics (contd)
- Get organized
- Small core group advancing standard
- International representation via mailing lists
- Collaborate with complementary representations
- Bi-weekly conference calls, bi-monthly F2F
- Cost paid by participants and DOE
- Special interests have subgroups
- Core group member plus outside experts
- Tackle specific challenges
11 Implementation of BioPAX
- Designed using GKB Editor and Protégé
- BioPAX uses OWL to define the schema
- BioPAX instances store the data
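The split between schema and instances can be illustrated with a small sketch in plain Python. It is not the BioPAX OWL file: the class names, slot names, and the `allowed_slots` and `validate` helpers below are hypothetical, chosen only to show how a schema constrains the instances that carry the data.

```python
# Hypothetical mini-schema: class -> (parent class, slots the class introduces).
# These names are illustrative only and are not taken from the BioPAX OWL file.
SCHEMA = {
    "Entity": (None, {"name"}),
    "Interaction": ("Entity", {"participants"}),
    "Conversion": ("Interaction", {"left", "right"}),
    "PhysicalEntity": ("Entity", set()),
}


def allowed_slots(cls):
    """Collect the slots a class defines plus those inherited from its ancestors."""
    slots = set()
    while cls is not None:
        parent, own = SCHEMA[cls]
        slots |= own
        cls = parent
    return slots


def validate(instance):
    """Return the slot names used by an instance that its class does not allow."""
    used = set(instance) - {"id", "type"}
    return sorted(used - allowed_slots(instance["type"]))


# Instances (individuals) carry the data and refer to schema classes by name.
glucose = {"id": "glc", "type": "PhysicalEntity", "name": "glucose"}
reaction = {
    "id": "rxn1",
    "type": "Conversion",
    "name": "example reaction",
    "left": ["glc"],
    "right": ["g6p"],
}
bad = {"id": "x", "type": "PhysicalEntity", "left": ["glc"]}

print(validate(glucose))   # no disallowed slots
print(validate(reaction))  # no disallowed slots
print(validate(bad))       # "left" is not a PhysicalEntity slot
```

In an OWL setting the schema would be a set of class and property definitions and the check would be done by a reasoner; the dictionary here only stands in for that idea.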
12 BioPAX Ontology
13 OWL (schema)
Instances (Individuals): data
14 Complex Relationships Captured
15 Ontology Slot Definitions
16 Integration -> Knowledge: Knowledge is Power
- Data in the same format
- Metabolic, Protein-Protein Interaction
- Signal Transduction, Gene Regulation
- Facilitates
- Centralized public pathway DB
- Share data between existing DBs
- Distribute public and proprietary data
- Knowledge Assembly
- Reasoning
17 A Common Exchange Language
Promotes collaboration (big science), accessibility
Figure labels: BioPAX; Without BioPAX; >100 DBs and tools
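One way to read this contrast, as a back-of-the-envelope sketch: if every pair of n databases or tools needs its own translator in each direction, the count grows as n(n-1), whereas a shared format needs only one exporter and one importer per resource. The pairwise-translator model and the function names are assumptions made for illustration; the slide itself gives only the figure of more than 100 DBs and tools.

```python
def pairwise_translators(n):
    """Direct translators if each resource converts to every other resource."""
    return n * (n - 1)


def hub_translators(n):
    """Translators if each resource only reads and writes one shared format."""
    return 2 * n


n = 100  # the slide mentions more than 100 DBs and tools
print(pairwise_translators(n), hub_translators(n))  # 9900 200
```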
18 Consistency Checking: Nutrient-related analysis of a BioPAX knowledge base
Figure legend: Known Nutrient set; Fired Reaction; Unfired Reaction; Essential compounds; Missing essential compound; Biomass
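The legend suggests a forward-propagation style of check: start from a known nutrient set, fire every reaction whose substrates are all available, add its products, and repeat until nothing changes. The sketch below illustrates that idea under stated assumptions; the toy reactions, compound names, and the `propagate` function are hypothetical and are not the analysis actually run on the BioPAX knowledge base.

```python
def propagate(nutrients, reactions, essential):
    """Fire reactions to a fixed point starting from a nutrient set.

    reactions maps a reaction id to (substrates, products), both sets.
    Returns (fired ids, unfired ids, missing essential compounds).
    """
    available = set(nutrients)
    fired = set()
    changed = True
    while changed:
        changed = False
        for rid, (substrates, products) in reactions.items():
            # A reaction fires once all of its substrates are available.
            if rid not in fired and substrates <= available:
                fired.add(rid)
                available |= products
                changed = True
    unfired = set(reactions) - fired
    missing = set(essential) - available
    return fired, unfired, missing


# Toy knowledge base (hypothetical compounds and reactions).
reactions = {
    "r1": ({"A"}, {"B"}),
    "r2": ({"B", "C"}, {"D"}),
    "r3": ({"E"}, {"F"}),  # E is never supplied or produced
}
fired, unfired, missing = propagate({"A", "C"}, reactions, essential={"D", "F"})
print(sorted(fired), sorted(unfired), sorted(missing))
# ['r1', 'r2'] ['r3'] ['F']
```

In this reading, an unfired reaction or a missing essential compound points to a gap or inconsistency in the knowledge base, or in the assumed nutrient set.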
19 What Next?
- BioPAX Future Development
- Level 2, 3, future levels
- BOF (check schedule)
- Talk later today by Gary Bader at BioPathways SIG
- Poster in Main Conference
- Development of tools and API
- libBioPAX
- Semantic Web Life Science Initiatives
- BOF Sunday
20 BioPAX Supporting Groups
- Databases
- BioCyc (www.biocyc.org)
- BIND (www.bind.ca)
- WIT (wit.mcs.anl.gov/WIT2)
- PharmGKB (www.pharmgkb.org)
- Grants
- Department of Energy (Workshop)
- Groups
- Memorial Sloan-Kettering Cancer Center: G. Bader, M. Cary, J. Luciano, C. Sander
- SRI Bioinformatics Research Group: P. Karp, S. Paley, J. Pick
- University of Colorado Health Sciences Center: I. Shah
- BioPathways Consortium: J. Luciano, E. Neumann, A. Regev, V. Schachter
- Argonne National Laboratory: N. Maltsev, E. Marland
- Samuel Lunenfeld Research Institute: C. Hogue
- Harvard Medical School: E. Brauner, D. Marks, J. Luciano, A. Regev
- NIST: R. Goldberg
- Stanford: T. Klein
- Columbia: A. Rzhetsky
- Dana Farber Cancer Institute: J. Zucker
- Collaborating Organizations
- Proteomics Standards Initiative (PSI)
- Systems Biology Markup Language (SBML)
- CellML
- Chemical Markup Language (CML)
The BioPAX Community
21 Exchange Formats in the Pathway Data Space
Figure labels: Database Exchange Formats; Simulation Model Exchange Formats; SBML, CellML; PSI; Biochemical Reactions; Protein Interaction Networks; Rate Formulas
Figure axes: Metabolic Pathways (Low Detail to High Detail); Regulatory Pathways (Low Detail to High Detail)
22 Level 1 BioPAX: Released July 2004
Figure labels: Database Exchange Formats; Simulation Model Exchange Formats; SBML, CellML; Genetic Interactions; PSI; Rate Formulas; BioPAX Level 1; Biochemical Reactions
23 Exchange Formats in the Pathway Data Space
Figure labels: Database Exchange Formats; Simulation Model Exchange Formats; BioPAX; SBML, CellML; Genetic Interactions; PSI; Rate Formulas; Biochemical Reactions