Title: BioPAX: The Birth of a Data Exchange Language for Biological Pathways
1 BioPAX: The Birth of a Data Exchange Language for Biological Pathways
- Joanne Luciano
- BioPAX Core Group
- www.biopax.org
- 7th International Annual Bio-Ontologies Meeting
- 30 July 2004
- Glasgow, Scotland
- United Kingdom
2 Introduction
- BioPAX: Biopathway Exchange Language
- Emerged at ISMB
- conceived at ISMB 01
- born at ISMB 02
- crawling at ISMB 03 (Level 0.5)
- walking at ISMB 04 (Level 1.0)
- now approaching the terrible twos
3 What is a pathway?
Depends on who you ask
4 Research Community Need
- Existing resources: WIT, BioCyc, Reactome, aMAZE, KEGG, BIND, DIP, HPRD, MINT, IntAct, PSI format, CSNDB, TRANSPATH, TRANSFAC, PubGene, GeneWays
- Integrated Pathway Database
- Pathway Databases: Metabolic, Protein Interaction, Signal Transduction, Gene Regulatory
5 Design Goals
- Encapsulation: An entire pathway in one record
- Compatible: Use existing standards wherever possible
- Computable: From file reading to logical inference
- Successful: Buy-in from the research community
6 Technical Logistics: Goals
- Interoperability
- Integration and exchange of pathway data
- Interchange through a common (standard) representation
- Accommodate existing database representations
- Provide a basis for future databases
- Enable development of tools for searching and reasoning over the database
7 Technical Logistics (contd)
- Why OWL? Why OWL DL?
- Expressivity (biology: complex relationships)
- W3C Standard (use existing standards)
- Semantic Web enabled
- XML based (the exchange language in computing)
- Machine Computable
- Facilitate integration of knowledge, data, tool development
- Uncover inconsistencies and new knowledge
- OWL DL
- Enable full reasoning capability for users
- From file reading to logical inference
- Complete: all conclusions are guaranteed to be computed
- Decidable: all computations will finish in finite time (with OWL Lite, short amount of time)
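As a rough illustration of "from file reading to logical inference", the sketch below derives class memberships that were never stated explicitly by following subclass links. It is far simpler than OWL DL reasoning and only shows the idea of computing implied facts. The class names and the `SUBCLASS_OF` table are invented for this example and are not taken from the BioPAX schema.

```python
# Toy subclass axioms (child -> parent). Hypothetical names, assumed to form
# an acyclic hierarchy; this is not the BioPAX ontology.
SUBCLASS_OF = {
    "BiochemicalReaction": "Conversion",
    "Conversion": "Interaction",
    "Interaction": "Entity",
    "Pathway": "Entity",
}


def inferred_types(asserted):
    """Return the asserted class followed by every superclass it implies."""
    types = [asserted]
    while types[-1] in SUBCLASS_OF:
        types.append(SUBCLASS_OF[types[-1]])
    return types


def instances_of(cls, assertions):
    """List instances whose asserted class is cls or a subclass of cls."""
    return sorted(
        name for name, asserted in assertions.items()
        if cls in inferred_types(asserted)
    )


# Each instance is asserted to belong to exactly one class.
ASSERTIONS = {"rxn1": "BiochemicalReaction", "glycolysis": "Pathway"}
```

Here `instances_of("Interaction", ASSERTIONS)` finds `rxn1` even though it was only asserted to be a `BiochemicalReaction`; a query that only read the file, without inference, would miss it.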
8 Social Logistics
- Get organized
- Make the decision and commitment
- 2 or 3 dedicated individuals
- Small core group
- Bi-weekly conference calls, bi-monthly F2F
- Commitment and resources
- Participants willing and able to cover their costs
- Outside funding (DOE)
- Special interests and needs form subgroup task forces
- Core group member(s)
- Outside experts
- International representation and participation (Outreach and Community Building)
- Conferences and mailing lists
- Follow-up and individual
- Collaborate with complementary/competing representations
9 Social Logistics: How we engendered buy-in from the field, which made life much easier
- Take things in steps
- Pathway Database vision -> Data Exchange Format as 1st step
- Data Exchange Format -> Release in Levels of increasing complexity: Level 1 supports Metabolic pathways, Level 2 ...
- Early success leads to early adoption, which leads to increased probability of overall project success.
- Get buy-in and get involvement; this leads to acceptance later
- Support the existing databases (BioCyc, WIT, BIND, etc.)
- Got database sources to agree to participate in the development to assure that their DBs will be properly represented
- Got database sources to agree to export in the new format once it is defined
10 Social Logistics (contd)
- Get buy-in (continued)
- Community Involvement and Support
- Core group (represents voice of community, small, committed)
- Mailing List
- User community
- Subgroups
- International Meetings and Presentations
- Tool developers
- Modelers
- Users (researchers)
- Ontology developers
- Database providers
- Complementary representations (SBML, CellML)
- Like minds
- General Community
11 Implementation of BioPAX
- Designed using GKB Editor and Protégé
- BioPAX uses OWL to define the Schema
- BioPAX Instances to store the data
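The split between an OWL schema and the instances that carry the data can be sketched with a tiny RDF/XML document. This is a minimal example using Python's standard `xml.etree.ElementTree`. The `ex:` namespace, the class names, and the instance IDs are hypothetical and do not reproduce the actual BioPAX Level 1 ontology.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
EX = "http://example.org/toy-pathway#"  # hypothetical namespace, not BioPAX's

# Two class definitions (schema) followed by two individuals (data).
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:owl="{OWL}" xmlns:ex="{EX}">
  <owl:Class rdf:ID="Pathway"/>
  <owl:Class rdf:ID="Reaction"/>
  <ex:Pathway rdf:ID="glycolysis"/>
  <ex:Reaction rdf:ID="hexokinase_step"/>
</rdf:RDF>"""


def split_schema_and_data(xml_text):
    """Return (class names, {instance id: class name}) from the toy document."""
    root = ET.fromstring(xml_text)
    classes, individuals = [], {}
    for elem in root:
        ident = elem.get(f"{{{RDF}}}ID")
        if elem.tag == f"{{{OWL}}}Class":
            classes.append(ident)  # schema: a class definition
        elif elem.tag.startswith(f"{{{EX}}}"):
            # data: an individual typed by its element name
            individuals[ident] = elem.tag.split("}", 1)[1]
    return classes, individuals


classes, individuals = split_schema_and_data(DOC)
```

In a real exchange the schema and the instance data would normally live in separate files, with the data file referring to the published ontology; they are combined here only to keep the sketch self-contained.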
12 BioPAX Ontology
13 OWL (schema)
Instances (Individuals): data
14 Complex Relationships Captured
15 Ontology Slot Definitions
16 Integration -> Knowledge: Knowledge is Power
- Data in the same format
- Metabolic, Protein-Protein Interaction
- Signal Transduction, Gene Regulation
- Facilitates
- Centralized public pathway DB
- Share data between existing DBs
- Distribute public and proprietary data
- Knowledge Assembly
- Reasoning
17 A Common Exchange Language
Promotes collaboration (big science), accessibility
BioPAX
Without BioPAX: >100 DBs and tools
18 Consistency Checking: Nutrient-related analysis of a BioPAX knowledge base
Known Nutrient set
Fired Reaction
Unfired Reaction
Essential compounds
Missing essential compound
Biomass
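The slide shows only the figure legend, not the algorithm behind it. One plausible reading is a forward propagation: start from the known nutrient set, fire every reaction whose substrates are all available, add its products, and repeat until nothing changes. The sketch below follows that assumption. The function names, the reaction table, and the compound names are invented for illustration.

```python
def propagate(nutrients, reactions):
    """Fire reactions reachable from the nutrient set.

    reactions maps a reaction name to (set of substrates, set of products).
    Returns (available compounds, fired reactions, unfired reactions).
    """
    available = set(nutrients)
    fired = set()
    changed = True
    while changed:
        changed = False
        for name, (substrates, products) in reactions.items():
            if name not in fired and substrates <= available:
                fired.add(name)
                available |= products
                changed = True
    return available, fired, set(reactions) - fired


def missing_essentials(available, essential):
    """Essential compounds that the propagation never produced."""
    return set(essential) - available


# Hypothetical toy knowledge base.
REACTIONS = {
    "r1": ({"glucose", "ATP"}, {"G6P", "ADP"}),
    "r2": ({"G6P"}, {"F6P"}),
    "r3": ({"X"}, {"Y"}),  # can never fire: X is neither a nutrient nor produced
}
available, fired, unfired = propagate({"glucose", "ATP"}, REACTIONS)
missing = missing_essentials(available, {"F6P", "Y"})
```

An unfired reaction or a missing essential compound points at a gap or inconsistency in the knowledge base, which is the kind of check the slide title describes.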
19 What Next?
- BioPAX future Development
- Level 2, 3, future levels
- BOF (check schedule)
- Talk later today by Gary Bader at BioPathways SIG
- Poster in Main Conference (check program)
- Development of tools and API
- libBioPAX
- Semantic Web Life Science Initiatives
- BOF Sunday
20 BioPAX Supporting Groups
- Databases
- BioCyc (www.biocyc.org)
- BIND (www.bind.ca)
- WIT (wit.mcs.anl.gov/WIT2)
- PharmGKB (www.pharmgkb.org)
- Grants
- Department of Energy (Workshop)
- Groups
- Memorial Sloan-Kettering Cancer Center: G. Bader, M. Cary, J. Luciano, C. Sander
- SRI Bioinformatics Research Group: P. Karp, S. Paley, J. Pick
- University of Colorado Health Sciences Center: I. Shah
- BioPathways Consortium: J. Luciano, E. Neumann, A. Regev, V. Schachter
- Argonne National Laboratory: N. Maltsev, E. Marland
- Samuel Lunenfeld Research Institute: C. Hogue
- Harvard Medical School: E. Brauner, D. Marks, J. Luciano, A. Regev
- NIST: R. Goldberg
- Stanford: T. Klein
- Columbia: A. Rzhetsky
- Dana Farber Cancer Institute: J. Zucker
- Collaborating Organizations
- Proteomics Standards Initiative (PSI)
- Systems Biology Markup Language (SBML)
- CellML
- Chemical Markup Language (CML)
The BioPAX Community
21 Exchange Formats in the Pathway Data Space
Database Exchange Formats
Simulation Model Exchange Formats
SBML, CellML
PSI
Biochemical Reactions
Protein Interaction Networks
Rate Formulas
Metabolic Pathways: Low Detail, High Detail
Regulatory Pathways: Low Detail, High Detail
22 Level 1 BioPAX: Released July 2004
Database Exchange Formats
Simulation Model Exchange Formats
SBML, CellML
Genetic Interactions
PSI
Rate Formulas
BioPAX Level 1
Biochemical Reactions
23 Exchange Formats in the Pathway Data Space
Database Exchange Formats
Simulation Model Exchange Formats
BioPAX
SBML, CellML
Genetic Interactions
PSI
Rate Formulas
Biochemical Reactions