Title: Pathway Tools / BioCyc Fundamentals
1Pathway Tools / BioCyc Fundamentals
- Peter D. Karp, Ph.D.
- Bioinformatics Research Group
- SRI International
- pkarp_at_ai.sri.com
- BioCyc.org
- EcoCyc.org, MetaCyc.org, HumanCyc.org
2Pathway Tools Capabilities
- Create and maintain an organism database integrating genome, pathway, regulatory information
- Computational inference tools
- Interactive editing tools
- Query and visualize that database
- Use the database to interpret omics data
- Metabolic network analysis tools
- Comparative analysis tools
- Export the metabolic network to SBML
- Speed creation of flux-balance models by an order of magnitude
3BioCyc
- Hundreds of microbial genomes
- Inferred operons and metabolic networks
- Couples curated data with computational predictions
- Supports analysis of omics data
- Comparative analysis tools
- Microbial emphasis. Exceptions: HumanCyc, MouseCyc, CattleCyc
4Model Organism Databases / Organism Specific Databases
- DBs that describe the genome and other information about an organism
- Every sequenced organism with an active experimental community requires a MOD
- Integrate genome data with information about the biochemical and genetic network of the organism
- Integrate literature-based information with computational predictions
- Curated by experts for that organism
- No one group can curate all the world's genomes
- Distribute workload across a community of experts to create a community resource
5Rationale for MODs
- Each complete genome is incomplete in several respects
- 40-60% of genes have no assigned function
- Roughly 7% of those assigned functions are incorrect
- Many assigned functions are non-specific
- Need continuous updating of annotations with respect to new experimental data and computational predictions
- MODs are platforms for global analyses of an organism
- Interpret omics data in a pathway context
- In silico prediction of essential genes
- Characterize systems properties of metabolic and genetic networks
6What is Curation?
- Ongoing updating and refinement of a PGDB
- Correcting false-positive and false-negative predictions
- Incorporating information from experimental literature
- Authoring of comments and citations
- Updating database fields
- Gene positions, names, synonyms
- Protein functions, activators, inhibitors
- Addition of new pathways, modification of existing pathways
- Defining TF binding sites, promoters, regulation of transcription initiation and other processes
7Pathway/Genome Database
[Diagram: contents of a Pathway/Genome Database for a cell: Pathways; Reactions; Compounds; Sequence Features; Proteins; RNAs; Regulation (Operons, Promoters, DNA Binding Sites, Regulatory Interactions); Genes; Chromosomes; Plasmids]
8BioCyc Collection of 507 Pathway/Genome Databases
- Pathway/Genome Database (PGDB) combines information about
- Pathways, reactions, substrates
- Enzymes, transporters
- Genes, replicons
- Transcription factors/sites, promoters, operons
- Tier 1: Literature-Derived PGDBs
- MetaCyc
- EcoCyc -- Escherichia coli K-12
- Tier 2: Computationally-derived DBs, Some Curation -- 24 PGDBs
- HumanCyc
- Mycobacterium tuberculosis
- Tier 3: Computationally-derived DBs, No Curation -- 481 DBs
9Pathway Tools Overview
[Diagram: PathoLogic combines an Annotated Genome with the MetaCyc Reference Pathway DB to produce a Pathway/Genome Database, which is used through the Pathway/Genome Navigator and the Pathway/Genome Editors]
10Pathway Tools Software: PathoLogic
- Computational creation of new Pathway/Genome Databases
- Transforms genome into Pathway Tools schema and layers inferred information above the genome
- Predicts operons
- Predicts metabolic network
- Predicts which genes code for missing enzymes in metabolic pathways
- Infers transport reactions from transporter names
Bioinformatics 18:S225, 2002
11Pathway Tools Software: Pathway/Genome Editors
- Interactively update PGDBs with graphical editors
- Support geographically distributed teams of curators with object database system
- Gene editor
- Protein editor
- Reaction editor
- Compound editor
- Pathway editor
- Operon editor
- Publication editor
12Pathway Tools Software: Pathway/Genome Navigator
- Querying and visualization of
- Pathways
- Reactions
- Metabolites
- Proteins
- Genes
- Chromosomes
- Two modes of operation
- Web mode
- Desktop mode
- Most functionality shared, but each has unique
functionality
13Pathway Tools Software PGDBs Created Outside SRI
- 1,700 licensees; 75 groups applying software to 300 organisms
- Saccharomyces cerevisiae, SGD project, Stanford University
- 135 pathways / 565 publications
- Candida albicans, CGD project, Stanford University
- dictyBase, Northwestern University
- Mouse, MGD, Jackson Laboratory
- Under development
- Drosophila, FlyBase
- C. elegans, WormBase
- Arabidopsis thaliana, TAIR, Carnegie Institution of Washington
- 288 pathways / 2282 publications
- PlantCyc, Carnegie Institution of Washington
- Six Solanaceae species, Cornell University
- GrameneDB, Cold Spring Harbor Laboratory
- Medicago truncatula, Samuel Roberts Noble Foundation
14Pathway Tools Software PGDBs Created Outside SRI
- NIAID BRCs for Biodefense pathogens
- BioHealthBase -- Mycobacterium tuberculosis, Francisella tularensis
- Pathema -- 80 PGDBs
- PATRIC -- Brucella suis, Coxiella burnetii, Rickettsia typhi
- EuPathDB -- Cryptosporidium, Plasmodium
- G. Xie, Los Alamos Lab, Dental pathogens
- F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa
- V. Schachter, Genoscope, Acinetobacter
- M. Bibb, John Innes Centre, Streptomyces coelicolor
- G. Church, Harvard, Prochlorococcus marinus, multiple strains
- E. Uberbacher, ORNL and G. Serres, MBL, Shewanella oneidensis
- R.J.S. Baerends, University of Groningen, Lactococcus lactis IL1403, Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus subtilis 168, Bacillus cereus ATCC14579
- Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania major
- Sergio Encarnacion, UNAM, Sinorhizobium meliloti
- Mark van der Giezen, University of London, Entamoeba histolytica, Giardia intestinalis
- Michael Gottfert, Technische Universität Dresden, Bradyrhizobium japonicum
- Artiva Maria Goudel, Universidade Federal de Santa Catarina, Brazil, Chromobacterium violaceum ATCC 12472
15Pathway Tools Software PGDBs Created Outside SRI
- Large scale users
- C. Medigue, Genoscope, 200 PGDBs
- G. Sutton, J. Craig Venter Institute, 80 PGDBs
- G. Burger, U Montreal, 60 PGDBs
- Bart Weimer, Utah State University, Lactococcus lactis, Brevibacterium linens, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii, Listeria monocytogenes
- Partial listing of outside PGDBs at BioCyc.org
16Obtaining a PGDB for Organism of Interest
- Find existing curated PGDB
- Find existing PGDB in BioCyc
- Create your own
17EcoCyc Project EcoCyc.org
- E. coli Encyclopedia
- Review-level Model-Organism Database for E. coli
- Tracks evolving annotation of the E. coli genome and cellular networks
- The two paradigms of EcoCyc
- Multi-dimensional annotation of the E. coli K-12 genome
- Positions of genes; functions of gene products (76 / 66 exp)
- Gene Ontology terms; MultiFun terms
- Gene product summaries and literature citations
- Evidence codes
- Multimeric complexes
- Metabolic pathways
- Cellular regulation
Karp, Gunsalus, Collado-Vides, Paulsen
Nuc. Acids Res. 35:7577, 2007; ASM News 70:25, 2004; Science 293:2040
18EcoCyc E. coli Dataset (EcoCyc v13.6)
URL: EcoCyc.org
Pathways: 246
Reactions: Metabolic 1,394; Transport 246
Compounds: 1,830
Citations: 19,000
Proteins: 4,479; Complexes: 895; RNAs: 285
Gene Regulation: Operons 3,369; Trans Factors 196; Promoters 1,796; TF Binding Sites 2,205
Genes: 4,492
19Paradigm 1: EcoCyc as Textual Review Article
- All gene products for which experimental literature exists are curated with a minireview summary
- Found on protein and RNA pages, not gene pages!
- 3257 gene products contain summaries
- Summaries cover function, interactions, mutant phenotypes, crystal structures, regulation, and more
- Additional summaries found in pages for operons, pathways
- EcoCyc cites 17,300 publications
20Paradigm 2: EcoCyc as Computational Symbolic Theory
- Highly structured, high-fidelity knowledge representation provides computable information
- Each molecular species defined as a DB object
- Genes, proteins, small molecules
- Each molecular interaction defined as a DB object
- Metabolic reactions
- Transport reactions
- Transcriptional regulation of gene expression
- 220 database fields capture extensive properties
and relationships
21EcoCyc Procedures
- DB updates performed by 5 staff curators
- Information gathered from biomedical literature
- Enter data into structured database fields
- Author extensive summaries
- Update evidence codes
- Corrections submitted by E. coli researchers
- Four releases per year
- Quality assurance of data and software
- Evaluate database consistency constraints
- Perform element balancing of reactions
- Run other checking programs
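The element-balancing check listed above can be illustrated with a short sketch. This is not the EcoCyc implementation: the formula parser, the function names, and the example reaction are assumptions chosen for illustration, and the parser handles only simple formulas without parentheses or charges.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no parentheses or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_totals(side):
    """Sum element counts over (coefficient, formula) pairs on one side of a reaction."""
    totals = Counter()
    for coefficient, formula in side:
        for element, n in parse_formula(formula).items():
            totals[element] += coefficient * n
    return totals


def is_balanced(reactants, products):
    """A reaction is element-balanced when both sides have identical atom totals."""
    return side_totals(reactants) == side_totals(products)


# Illustrative reaction: C6H12O6 + 6 O2 -> 6 CO2 + 6 H2O
balanced = is_balanced([(1, "C6H12O6"), (6, "O2")], [(6, "CO2"), (6, "H2O")])
# Same reaction with one water missing on the product side.
unbalanced = is_balanced([(1, "C6H12O6"), (6, "O2")], [(6, "CO2"), (5, "H2O")])
```

A check of this kind flags reactions whose stoichiometry was mistyped during curation; a production checker would also need to handle charge, generic groups, and polymer formulas.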
22EcoCyc Accelerates Science
- Experimentalists
- E. coli experimentalists
- Experimentalists working with other microbes
- Analysis of expression data
- Computational biologists
- Biological research using computational methods
- Genome annotation
- Study connectivity of E. coli metabolic network
- Study phylogenetic extent of metabolic pathways and enzymes in all domains of life
- Bioinformaticists
- Training and validation of new bioinformatics algorithms: predict operons, promoters, protein functional linkages, protein-protein interactions
- Metabolic engineers
- Design of organisms for the production of organic acids, amino acids, ethanol, hydrogen, and solvents
- Educators
23MetaCyc Metabolic Encyclopedia
- Describe a representative sample of every experimentally determined metabolic pathway
- Describe properties of metabolic enzymes
- Literature-based DB with extensive references and commentary
- Pathways, reactions, enzymes, substrates
- Jointly developed by
- P. Karp, R. Caspi, C. Fulcher, SRI International
- L. Mueller, A. Pujar, Boyce Thompson Institute
- S. Rhee, P. Zhang, Carnegie Institution
Nucleic Acids Research 2008
24Applications of MetaCyc
- Reference source on metabolic pathways
- Metabolic engineering
- Find enzymes with desired activities, regulatory properties
- Determine cofactor requirements
- Predict pathways from genomes
- Systematic studies of metabolism
- Computer-aided education
25MetaCyc Data -- Version 13.6
Pathways 1,436
Reactions 8,200
Enzymes 6,060
Small Molecules 8,400
Organisms 1,800
Citations 21,700
26Taxonomic Distribution of MetaCyc Pathways (version 13.1)
Bacteria 883
Green Plants 607
Fungi 199
Mammals 159
Archaea 112
27MetaCyc Curation
- DB updates by 5 staff curators
- Information gathered from biomedical literature
- Emphasis on microbial and plant pathways
- More prevalent pathways given higher priority
- Review-level database
- Four releases per year
- Quality assurance of data and software
- Evaluate database consistency constraints
- Perform element balancing of reactions
- Run other checking programs
- Display every DB object
28MetaCyc Curation
- Ontologies guide querying
- Pathways (recently revised), compounds, enzymatic reactions
- Example: Coenzyme M biosynthesis
- Extensive citations and commentary
- Evidence codes
- Controlled vocabulary of evidence types
- Attach to pathways and enzymes
- Code, Citation, Curator, date
- Release notes explain recent updates
- http://biocyc.org/metacyc/release-notes.shtml
29MetaCyc Data
- Of the 1548 enzymes
- 818 are monomers
- 730 are multimers
- 570 are homomultimers, 160 are heteromultimers
- Enzymes with cofactors 512
- Enzymes with activators or inhibitors 577
- Average pathway length 5 reactions
30Enzyme Data Available in MetaCyc
- Reaction(s) catalyzed
- Alternative substrates
- Activators, inhibitors, cofactors, prosthetic groups
- Subunit structure
- Genes
- Features on protein sequence
- Cellular location
- pI, molecular weight, Km, Vmax
- Gene Ontology terms
- Links to other bioinformatics databases
31What is a Pathway?
- A connected sequence of biochemical reactions
- Occurs in one organism
- Conserved through evolution
- Regulated as a unit
- Often starts or stops at one of 13 common
intermediate metabolites
32MetaCyc Pathway Variants
- Pathways that accomplish similar biochemical functions using different biochemical routes
- Alanine biosynthesis I: E. coli
- Alanine biosynthesis II: H. sapiens
- Pathways that accomplish similar biochemical functions using similar sets of reactions
- Several variants of TCA Cycle
33MetaCyc Super-Pathways
- Groups of pathways linked by common substrates
- Example: Super-pathway containing
- Chorismate biosynthesis
- Tryptophan biosynthesis
- Phenylalanine biosynthesis
- Tyrosine biosynthesis
- Super-pathways defined by listing their component pathways
- Multiple levels of super-pathways can be defined
- Pathway layout algorithms accommodate
super-pathways
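Because super-pathways are defined by listing their components, and can nest, expanding one into its base pathways is a simple recursion. The sketch below assumes a plain dictionary representation; the super-pathway names and the function name are hypothetical and do not reflect how Pathway Tools stores this relationship.

```python
# Hypothetical data: a super-pathway maps to the list of its component pathways,
# and a component may itself be a super-pathway.
SUPER_PATHWAYS = {
    "aromatic amino acid biosynthesis": [
        "chorismate biosynthesis",
        "tryptophan biosynthesis",
        "phenylalanine and tyrosine biosynthesis",
    ],
    "phenylalanine and tyrosine biosynthesis": [
        "phenylalanine biosynthesis",
        "tyrosine biosynthesis",
    ],
}


def base_pathways(name, super_pathways=SUPER_PATHWAYS):
    """Expand a pathway name into its base pathways, depth first."""
    if name not in super_pathways:
        return [name]  # a base pathway expands to itself
    expanded = []
    for component in super_pathways[name]:
        expanded.extend(base_pathways(component, super_pathways))
    return expanded


components = base_pathways("aromatic amino acid biosynthesis")
```

Storing only the component list keeps each reaction defined once, in its base pathway, so an edit to a base pathway is visible in every super-pathway that includes it.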
34Family of Pathway/GenomeDatabases
35Comparison with KEGG
- KEGG vs MetaCyc: Reference pathway collections
- KEGG maps are not pathways (Nuc Acids Res 34:3687, 2006)
- KEGG maps contain multiple biological pathways
- Two genes chosen at random from a BioCyc pathway are more likely to be related according to genome context methods than two genes chosen from a KEGG pathway
- KEGG maps are composites of pathways in many organisms -- do not identify what specific pathways were elucidated in what organisms
- KEGG has no literature citations, no comments, less enzyme detail
- KEGG assigns half as many reactions to pathways as MetaCyc
- KEGG vs organism-specific PGDBs
- KEGG does not curate or customize pathway networks for each organism
- Highly curated PGDBs now exist for important organisms such as E. coli, yeast, mouse, Arabidopsis
36Comparison of Pathway Tools to KEGG
- Inference tools
- KEGG does not predict presence or absence of pathways
- KEGG lacks pathway hole filler, operon predictor
- Curation tools
- KEGG does not distribute curation tools
- No ability to customize pathways to the organism
- Pathway Tools schema much more comprehensive
- Visualization and analysis
- KEGG does not perform automatic pathway layout
- KEGG metabolic-map diagram extremely limited
- No comparative pathway analysis
37Pathway Tools Implementation Details
- Platforms
- Macintosh, PC/Linux, and PC/Windows platforms
- Same binary can run as desktop app or Web server
- Production-quality software
- Version control
- Two regular releases per year
- Extensive quality assurance
- Extensive documentation
- Auto-patch
- Automatic DB-upgrade
- 480,000 lines of Lisp code
38- ptools-support_at_ai.sri.com
39Pathway Tools Architecture
[Architecture diagram: Pathway Genome Navigator (Web Mode, Desktop Mode); Protein Editor, Pathway Editor, Reaction Editor; Lisp, Perl, and Java access through the GFP API; Ocelot DBMS with storage in Oracle or MySQL, or in a disk file]
40Ocelot Knowledge Server Architecture
- Frame data model
- Minimizes size of schema relative to semantic complexity
- Schema is stored within the DB
- Schema is self documenting
- Slot units define metadata about slots
- Domain, range, inverse
- Collection type, number of values, value constraints
- Comment
- Schema evolution facilitated by
- Easy addition/removal of slots, or alteration of slot datatypes
- Flexible data formats that do not require dumping/reloading of data
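To make the frame data model concrete, here is a minimal sketch of frames whose slots are described by slot units carrying domain, range, and inverse metadata. It is not Ocelot's API: the class names, method names, and the "product"/"gene" slots are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class SlotUnit:
    """Metadata about one slot; the field names here are illustrative."""
    name: str
    domain: str          # class of frames that may carry the slot
    range: str           # class of frames allowed as values
    inverse: str = ""    # slot to update on the value frame, if any
    comment: str = ""


class KB:
    def __init__(self):
        self.frames = {}      # frame id -> {"class": str, "slots": {slot: [values]}}
        self.slot_units = {}  # the schema is stored alongside the data

    def add_frame(self, frame_id, frame_class):
        self.frames[frame_id] = {"class": frame_class, "slots": {}}

    def add_slot(self, unit):
        self.slot_units[unit.name] = unit

    def put(self, frame_id, slot, value_id):
        """Store a value after checking domain and range, then maintain the inverse slot."""
        unit = self.slot_units[slot]
        if self.frames[frame_id]["class"] != unit.domain:
            raise ValueError(f"{slot} is not defined for {frame_id}")
        if self.frames[value_id]["class"] != unit.range:
            raise ValueError(f"{value_id} is outside the range of {slot}")
        self.frames[frame_id]["slots"].setdefault(slot, []).append(value_id)
        if unit.inverse:
            self.frames[value_id]["slots"].setdefault(unit.inverse, []).append(frame_id)


kb = KB()
kb.add_slot(SlotUnit("product", domain="Genes", range="Proteins", inverse="gene"))
kb.add_frame("trpA", "Genes")
kb.add_frame("TrpA-protein", "Proteins")
kb.put("trpA", "product", "TrpA-protein")
```

Adding a new slot in this design means adding one more slot unit; existing frames need no migration, which is the kind of schema evolution the slide describes.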
41Ocelot Storage System Architecture
- Persistent storage via disk files or Oracle or MySQL
- Concurrent development: Oracle or MySQL
- Single-user development: disk files
- Oracle/MySQL DBMS storage
- DBMS is submerged within Ocelot, invisible to users
- Frames transferred from DBMS to Ocelot
- On demand
- By background prefetcher
- Memory cache
- Persistent disk cache to speed performance via Internet
- Transaction logging facility
42Why Do We Code in Common Lisp?
- Gat studied Lisp and Java implementations of 16 programs by 14 programmers (Intelligence 11:21, 2000)
- The average Lisp program ran 33 times faster than the average Java program
- The average Lisp program was written 5 times faster than the average Java program
- Roberts compared Java and Lisp implementations of a Domain Name Server (DNS) resolver
- http://www.findinglisp.com/papers/case_study_java_lisp_dns.html
- The Lisp version had half as many lines of code
43Common Lisp Programming Environment
- Interpreted and/or compiled execution
- Fabulous debugging environment
- High-level language
- Interactive data exploration
- Extensive built-in libraries
- Dynamic redefinition
- Find out more!
- See ALU.org or
- http://www.international-lisp-conference.org/
44PathoLogic Processing
- Translate source genome to PGDB form
- Predict operons
- Predict metabolic pathways
- Predict pathway hole fillers
- Transport inference parser
- Build metabolic overview diagram
45PathoLogic Step 1: Translate Genome to PGDB
[Diagram: PathoLogic software integrates genome and pathway data to identify putative metabolic networks. Inputs: Annotated Genomic Sequence and the Multi-organism Pathway Database (MetaCyc). Output: a Pathway/Genome Database containing Pathways, Reactions, Compounds, Gene Products, Genes, and a Genomic Map]
47PathoLogic Step 2: Predict Operons
- Predict adjacent genes A and B in same operon based on
- Intergenic distance
- Functional relatedness of A and B
- Tests for functional relatedness
- A and B in same gene functional class (MultiFun)
- A and B in same metabolic pathway
- A codes for enzyme in a pathway and B codes for transporter involving a substrate in that pathway
- A and B are monomers in same protein complex
- Correctly predicts 80% of E. coli transcription units
- Marks predicted operons with computational evidence codes
Bioinformatics 20:709-17, 2004
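The pairwise decision described on this slide can be sketched as follows. This is a simplified illustration, not the published predictor: the distance thresholds (50 and 200 bp), the rule for combining distance with functional relatedness, and the gene records are all assumptions, and the enzyme/transporter test is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    name: str
    start: int
    end: int
    strand: str
    functional_class: str = ""
    pathways: set = field(default_factory=set)
    complexes: set = field(default_factory=set)


def functionally_related(a, b):
    """Apply three of the relatedness tests: same class, shared pathway, shared complex."""
    same_class = bool(a.functional_class) and a.functional_class == b.functional_class
    return same_class or bool(a.pathways & b.pathways) or bool(a.complexes & b.complexes)


def same_operon(a, b, close_bp=50, far_bp=200):
    """Assumed rule: very close pairs are joined; moderately close pairs need functional support."""
    if a.strand != b.strand:
        return False
    distance = b.start - a.end
    if distance <= close_bp:
        return True
    return distance <= far_bp and functionally_related(a, b)


def predict_operons(genes):
    """Walk genes in chromosomal order and chain adjacent same-operon pairs."""
    operons = []
    for gene in sorted(genes, key=lambda g: g.start):
        if operons and same_operon(operons[-1][-1], gene):
            operons[-1].append(gene)
        else:
            operons.append([gene])
    return [[g.name for g in operon] for operon in operons]


# Hypothetical genes for illustration.
genes = [
    Gene("geneA", 100, 900, "+", pathways={"pathway-1"}),
    Gene("geneB", 920, 1800, "+", pathways={"pathway-1"}),
    Gene("geneC", 1950, 2700, "+", pathways={"pathway-1"}),
    Gene("geneD", 3500, 4200, "+"),
    Gene("geneE", 4210, 5000, "-"),
]
operons = predict_operons(genes)
```

Here geneB and geneC are 150 bp apart, too far for distance alone, but their shared pathway joins them; geneD and geneE are close but on opposite strands, so they stay separate.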
48PathoLogic Step 3: Prediction of Metabolic Pathways
- Infer reaction complement of organism
- Match enzymes in source genome to MetaCyc reactions they catalyze
- Match enzyme names and EC numbers to MetaCyc
- Support user in manually matching additional enzymes
- Computationally predict which MetaCyc metabolic pathways are present
- For each MetaCyc pathway, evaluate which of its reactions are catalyzed by the organism
49Match Enzymes to Reactions
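[Flowchart: a gene product (example: UDP-glucose-4-epimerase, EC 5.1.3.2) is matched against MetaCyc reactions (example: the reaction between UDP-D-glucose and UDP-galactose). Match: yes, assign (2057 proteins matched by EC, 314 matched by name). Match: no, test for probable enzyme (name ends in -ase; 1320). Probable enzyme: yes, manually search, then Assign or Can't Assign (625); no, not a metabolic enzyme]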
50Import Pathways
[Flowchart: MetaCyc pathways containing matched reactions; Import All; Prune? (yes: Delete; no: Manual Review, ending in delete or keep)]
51Pathway Prediction
- Prediction is hard because
- Enzyme naming is irregular
- Some reactions present in multiple pathways
- Pathway variants share many reactions in common
- MetaCyc now has many pathways
52Pathway Scoring Criteria
- Imported pathways must satisfy
- Pathways outside their taxonomic range must have enzymes for all reactions
- If any reactions in a pathway are designated as key, an enzyme must be present for at least one
- Pathway P is imported if any of these conditions is satisfied
- One unique enzyme present for P
- P missing at most one reaction
- More reactions present than absent for P
- P is not a superset of another pathway with the same number of enzymes present
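The scoring criteria above can be read as two required conditions followed by a set of alternatives. The sketch below encodes that reading; it is an interpretation, not PathoLogic's code. In particular, "unique enzyme" is assumed to mean an enzyme for a reaction that occurs in no other pathway, the superset criterion is left out because it requires comparing pathways against each other, and all parameter names are hypothetical.

```python
def should_import(reactions, present, unique, key=frozenset(), in_taxonomic_range=True):
    """Decide whether to import one pathway.

    reactions: all reaction ids in the pathway
    present:   reactions with an enzyme in the organism
    unique:    reactions that occur in no other pathway (assumed meaning of "unique")
    key:       reactions designated as key for this pathway
    """
    reactions, present = set(reactions), set(present) & set(reactions)
    missing = reactions - present

    # Required conditions.
    if not in_taxonomic_range and missing:
        return False
    if key and not (set(key) & present):
        return False

    # Any one of these is sufficient.
    has_unique_enzyme = bool(set(unique) & present)
    missing_at_most_one = len(missing) <= 1
    more_present_than_absent = len(present) > len(missing)
    return has_unique_enzyme or missing_at_most_one or more_present_than_absent


# Hypothetical pathways for illustration.
by_unique_enzyme = should_import(["r1", "r2", "r3", "r4"], ["r1"], unique=["r1"])
too_sparse = should_import(["r1", "r2", "r3", "r4"], ["r1"], unique=[])
out_of_range = should_import(["r1", "r2", "r3"], ["r1", "r2"], unique=[], in_taxonomic_range=False)
key_missing = should_import(["r1", "r2", "r3"], ["r1", "r2"], unique=[], key=["r3"])
```

The required conditions act as vetoes: a pathway with two of three reactions present would normally be imported, but not when it lies outside its taxonomic range or lacks every key reaction.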
53Pathway Evidence Report
54PathoLogic Step 4: Pathway Hole Filler
- Definition: Pathway Holes are reactions in metabolic pathways for which no enzyme is identified
[Pathway diagram: NAD biosynthesis from L-aspartate through iminoaspartate, quinolinate, nicotinate nucleotide, and deamido-NAD to NAD. Annotated enzymes: quinolinate synthetase nadA; n.n. pyrophosphorylase nadC; NAD synthetase, NH3-dependent (CC3619). Reactions labeled only by EC number (1.4.3.-, 2.7.7.18, 6.3.5.1) are marked as holes]
55Step 1: Query UniProt for all sequences having EC of pathway hole
Step 2: BLAST against target genome
Steps 3 and 4: Consolidate hits and evaluate evidence
[Diagram: enzyme A sequences from organisms 1 through 8 are searched against target genes X, Y, and Z; 7 queries have high-scoring hits to sequence Y]
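The consolidation step can be sketched as grouping BLAST hits by target gene and summarizing each group with the kinds of evidence named on the Bayes Classifier slide (number of queries, best E-value, average rank). The tuple layout, function name, and hit values below are assumptions for illustration and are not output from a real search.

```python
from collections import defaultdict


def consolidate_hits(hits):
    """Group BLAST hits by target gene and summarize the evidence for each.

    hits: iterable of (query_id, target_gene, e_value, rank) tuples, where
    rank is the position of the target in that query's BLAST output.
    """
    by_target = defaultdict(list)
    for query_id, target_gene, e_value, rank in hits:
        by_target[target_gene].append((query_id, e_value, rank))

    summary = {}
    for target_gene, rows in by_target.items():
        summary[target_gene] = {
            "num_queries": len({query for query, _, _ in rows}),
            "best_e_value": min(e for _, e, _ in rows),
            "avg_rank": sum(r for _, _, r in rows) / len(rows),
        }
    return summary


# Hypothetical hits: three query organisms hit geneY, one also hits geneX weakly.
hits = [
    ("organism1", "geneY", 1e-40, 1),
    ("organism2", "geneY", 1e-35, 1),
    ("organism3", "geneY", 1e-30, 2),
    ("organism3", "geneX", 1e-3, 5),
]
summary = consolidate_hits(hits)
```

A candidate that many independent queries hit near the top of their output, as geneY is here, is stronger evidence than a single good hit, which is why the per-target summary is computed before any classification.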
56Bayes Classifier
P(protein has function X | E-value, avg. rank, aln. length, etc.)
[Network diagram: the node "protein has function X" is connected to the evidence nodes best E-value, pwy direction, avg. rank in BLAST output, adjacent rxns, number of queries, and % of query aligned]
57Pathway Hole Filler
- Why should hole filler find things beyond the original genome annotation?
- Reverse BLAST searches more sensitive
- Reverse BLAST searches find second domains
- Integration of multiple evidence types
58Caulobacter crescentus Pathway Holes
- 130 pathways containing 582 reactions
- 92 pathways contain 236 pathway holes
- Caulobacter holes filled
- 77 holes filled at P > 0.9
- Previous functions of candidate hole fillers
- No predicted function
- Correctly assigned single function
- Incorrectly assigned function
- Imprecise functional assignment
- BMC Bioinformatics 5:76, 2004
59Example Pathway
[Pathway diagram: NAD biosynthesis from L-aspartate through iminoaspartate, quinolinate, nicotinate nucleotide, and deamido-NAD to NAD, with candidate hole fillers: CC2913 (P = 0.99) for 1.4.3.-, CC3431 (P = 0.90) for 2.7.7.18, and CC3619 (P = 0.99) for 6.3.5.1. Annotated enzymes: quinolinate synthetase nadA (CC2912); n.n. pyrophosphorylase nadC (CC2915); NAD synthetase, NH3-dependent (CC3619)]
- CC2913: L-aspartate oxidase (wrong EC on rxn)
- CC3431: ORF
- CC3619: put. NAD(+)-synthetase (multidomain)
60PathoLogic Step 5: Transport Inference Parser
- Problem: Write a program to query a genome annotation to compute the substrates an organism can transport
- Typical genome annotations for transporters
- ATP transporter for ribose
- ribose ABC transporter
- D-ribose ATP transporter
- ABC transporter, membrane spanning protein ribose
- ABC transporter, membrane spanning protein D-ribose
61Transport Inference Parser
- Input: ATP transporter of phosphonate
- Output: Structured description of transport activity
- Locates most transporters in genome annotation using keyword analysis
- Parse product name using a series of rules to identify
- Transported substrate, co-substrate
- Influx/efflux
- Energy coupling mechanism
- Creates transport reaction object
- phosphonate[periplasm] + H2O + ATP → phosphonate + Pi + ADP
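A toy version of the keyword-and-rule approach shows how differently worded annotations can reduce to the same structured description. The keyword list, the energy-coupling rules, and the output fields below are assumptions invented for this sketch; the actual parser's rules are more extensive and are not reproduced here.

```python
import re

# Hypothetical keyword rules for illustration.
TRANSPORTER_KEYWORDS = ("transporter", "permease", "symporter", "antiporter")
ENERGY_RULES = [
    (re.compile(r"\b(ABC|ATP)\b", re.IGNORECASE), "ATP"),
    (re.compile(r"\bsymporter\b", re.IGNORECASE), "ion gradient"),
]
NOISE = re.compile(
    r"\b(ABC|ATP|transporter|permease|symporter|antiporter|membrane|spanning|protein|for|of)\b",
    re.IGNORECASE,
)


def parse_transporter(product_name):
    """Return a structured description, or None if the name does not look like a transporter."""
    lowered = product_name.lower()
    if not any(word in lowered for word in TRANSPORTER_KEYWORDS):
        return None
    energy = next(
        (label for pattern, label in ENERGY_RULES if pattern.search(product_name)), None
    )
    # Whatever remains after removing the generic transporter vocabulary is taken as the substrate.
    substrate = NOISE.sub(" ", product_name)
    substrate = re.sub(r"[,\s]+", " ", substrate).strip() or None
    return {"substrate": substrate, "energy_coupling": energy}


annotations = [
    "ATP transporter for ribose",
    "ribose ABC transporter",
    "D-ribose ATP transporter",
    "ABC transporter, membrane spanning protein ribose",
    "ABC transporter, membrane spanning protein D-ribose",
]
parsed = [parse_transporter(name) for name in annotations]
```

All five annotation styles yield an ATP-coupled transporter of ribose or D-ribose; mapping those two names onto one compound object, and building the reaction with its compartments, would be the next steps and are outside this sketch.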
62Transport Inference Parser
- Permits symbolic computation with transport activities
- Compute transportable substrates of the cell
- Compute connectivity among compartments for substrates
- Facilitate reasoning about transport/metabolism connections
- Draw transport cartoon in protein pages, cellular overview
63Transport Inference Parser
- User reviews all assignments using interactive tool that allows assignments to be revised
- User also reviews transporters for which no assignment was made
64Regulation
65Encoding Cellular Regulation in Pathway Tools --
Goals
- Facilitate curation of wide range of regulatory information within a formal ontology
- Compute with regulatory mechanisms and pathways
- Summary statistics, complex queries
- Pattern discovery
- Visualization of network components
- Provide training sets for inference of regulatory networks
- Interpret gene-expression datasets in the context of known regulatory mechanisms
66Regulatory Interactions Supported by Pathway Tools
- Substrate-level regulation of enzyme activity
- Binding to proteins or small molecules (phosphorylation)
- Regulation of transcription initiation
- Attenuation of transcription
- Regulation of translation by proteins and by small RNAs
67Regulation in Pathway Tools
- Editing tools
- Transcription factor display window
- Transcription unit display window
- Regulatory Overview / Omics Viewer
68Regulatory Interaction Editor
69Regulatory Overview and Omics Viewer
- Show regulatory relationships among gene groups
70Infer Anti-Microbial Drug Targets
- Infer drug targets as genes coding for enzymes that catalyze chokepoint reactions
- Two types of chokepoint reactions
- Chokepoint analysis of Plasmodium falciparum
- 216/303 reactions are chokepoints (73%)
- All 3 clinically proven anti-malarial drugs target chokepoints
- 21/24 biologically validated drug targets are chokepoints
- 11.2% of chokepoints are drug targets
- 3.4% of non-chokepoints are drug targets
- => Chokepoints are significantly enriched for drug targets
Genome Research 14:917, 2004
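The slide does not spell out the two chokepoint types. The sketch below assumes the definition used in the cited Genome Research paper: a chokepoint reaction is either the only reaction that consumes some metabolite or the only reaction that produces some metabolite. The function name and toy network are hypothetical, and refinements such as excluding dead-end metabolites are not handled.

```python
from collections import defaultdict


def find_chokepoints(reactions):
    """reactions: dict of reaction id -> (substrates, products).

    Returns the ids of reactions that are the only consumer of some
    metabolite or the only producer of some metabolite.
    """
    consumers = defaultdict(set)
    producers = defaultdict(set)
    for rxn, (substrates, products) in reactions.items():
        for metabolite in substrates:
            consumers[metabolite].add(rxn)
        for metabolite in products:
            producers[metabolite].add(rxn)

    chokepoints = set()
    for index in (consumers, producers):
        for rxns in index.values():
            if len(rxns) == 1:
                chokepoints |= rxns
    return chokepoints


# Hypothetical toy network: R1 and R2 are redundant routes from A to B.
toy = {
    "R1": (["A"], ["B"]),
    "R2": (["A"], ["B"]),
    "R3": (["B"], ["C"]),
}
chokepoints = find_chokepoints(toy)
```

In the toy network only R3 is a chokepoint: blocking it leaves B with no consumer and C with no producer, whereas blocking R1 alone is compensated by R2.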
71Comparative Analysis
- Via Cellular Overview
- Comparative genome browser
- Comparative pathway table
- Comparative analysis reports
- Compare reaction complements
- Compare pathway complements
- Compare transporter complements
72Summary
- Pathway/Genome Databases
- MetaCyc: non-redundant DB of literature-derived pathways
- 400 organism-specific PGDBs available through SRI at BioCyc.org
- Computational theories of biochemical machinery
- Pathway Tools software
- Extract pathways from genomes
- Morph annotated genome into structured ontology
- Distributed curation tools for MODs
- Query, visualization, WWW publishing
73Information Sources
- Pathway Tools User's Guide
- aic-export/pathway-tools/ptools/13.0/doc/manuals/userguide.pdf
- NOTE: Location of the aic-export directory can vary across different computers
- Pathway Tools Web Site
- http://bioinformatics.ai.sri.com/ptools/
- Publications, FAQ, programming examples, etc.
- Slides from this tutorial
- http://www.ai.sri.com/pkarp/talks/
- BioCyc Webinars
- http://biocyc.org/webinar.shtml
74BioCyc and Pathway Tools Availability
- BioCyc.org Web site and database files freely available to all
- Pathway Tools freely available to non-profits
- Macintosh, PC/Windows, PC/Linux
75Symbolic Systems Biology
- Definition
- Global analyses of biological systems using
symbolic computing
76Symbolic Systems Biology
- "Symbolic computing is concerned with the representation and manipulation of information in symbolic form. It is often contrasted with numeric representation." -- R. Cameron
- Examples of symbolic computation
- Symbolic algebra programs, e.g., Mathematica, Graphing Calculator
- Compilers and interpreters for programming languages
- Database query languages
- Text analysis programs, e.g., Google
- String matching for DNA and protein sequences
- Artificial Intelligence methods, e.g., expert systems, symbolic logic, machine learning, natural language understanding
77Symbolic Systems Biology
- Concerned with different questions than quantitative systems biology
- Symbolic analyses can in many cases produce answers when quantitative approaches fail because of lack of parameters or intractable mathematics
- Symbolic computation is intimately dependent on the use of structured ontologies
78Pathway Tools Ontology
- 1064 classes
- Main classes such as
- Pathways, Reactions, Compounds, Macromolecules, Proteins, Replicons, DNA-Segments (Genes, Operons, Promoters)
- Taxonomies for Pathways, Reactions, Compounds
- 205 slots
- Meta-data: Creator, Creation-Date
- Comment, Citations, Common-Name, Synonyms
- Attributes: Molecular-Weight, DNA-Footprint-Size
- Relationships: Catalyzes, Component-Of, Product
- Classes, instances, slots all stored side by side
in DBMS
79Critiquing the Parts List
Slide thanks to Hirotada Mori (minus the banana!)
80Dead End Metabolites
- A small molecule C is a dead-end if
- C is produced only by SMM (small-molecule metabolism) reactions in Compartment, and no transporter acts on C in Compartment, OR
- C is consumed only by SMM reactions in Compartment, and no transporter acts on C in Compartment
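The definition translates directly into set operations over one compartment's reactions. The sketch below reads "produced only" as produced but never consumed, and "consumed only" as consumed but never produced; it treats every reaction as irreversible. The function name and toy data are assumptions, and this is not the Pathway Tools implementation.

```python
def dead_end_metabolites(reactions, transported=frozenset()):
    """Find dead-end metabolites in one compartment.

    reactions:   iterable of (substrates, products) pairs for that compartment
    transported: metabolites acted on by some transporter in that compartment
    """
    produced, consumed = set(), set()
    for substrates, products in reactions:
        consumed.update(substrates)
        produced.update(products)
    only_produced = produced - consumed  # nothing downstream uses them
    only_consumed = consumed - produced  # nothing upstream makes them
    return (only_produced | only_consumed) - set(transported)


# Hypothetical linear pathway A -> B -> C, where A is imported by a transporter.
toy_reactions = [(["A"], ["B"]), (["B"], ["C"])]
dead_ends = dead_end_metabolites(toy_reactions, transported={"A"})
```

In the toy pathway, A is consumed only but is excused by its transporter, so C is the single dead end; such a result usually points to a missing reaction, a missing transporter, or an annotation error.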
81Dead End Metabolites
- Not yet an official part of Pathway Tools
- Contact us if you'd like to use it
82Reachability Analysis of Metabolic Networks
- Given
- A PGDB for an organism
- A set of initial metabolites
- Infer
- What set of products can be synthesized by the small-molecule metabolism of the organism
- Motivations
- Quality control for PGDBs
- Verify that a known growth medium yields known essential compounds
- Experiment with other growth media
- Experiment with reaction knock-outs
- Limitations
- Cannot properly handle compounds required for their own synthesis
- Nutrients needed for reachability may be a superset of those required for growth
Romero and Karp, Pacific Symposium on Biocomputing, 2001
83Algorithm: Forward Propagation Through Production System
- Each reaction becomes a production rule
- Each of the 21 metabolites in the nutrient set becomes an axiom
A + B → C
84Nutrients: A, B, C, E, F
A + B → W
C + D → X
E + F → Y
W + Y → Z
Produced Compounds: W, Y, Z
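The worked example on this slide can be reproduced with a few lines of forward chaining: each reaction fires once all of its reactants are available, and the loop repeats until no new compound appears. The function name and data layout are assumptions; this is a sketch of the general technique, not the code from the cited paper.

```python
def reachable_products(nutrients, reactions):
    """Fire every reaction whose reactants are all available until nothing new appears."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if set(reactants) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available - set(nutrients)


# The nutrient set and the four reactions from the slide.
nutrients = {"A", "B", "C", "E", "F"}
reactions = [
    (("A", "B"), ("W",)),
    (("C", "D"), ("X",)),
    (("E", "F"), ("Y",)),
    (("W", "Y"), ("Z",)),
]
print(sorted(reachable_products(nutrients, reactions)))  # ['W', 'Y', 'Z']
```

X is never produced because D is neither a nutrient nor the product of any reaction. Removing a reaction from the list and rerunning gives the knock-out experiment mentioned among the motivations.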
85Initial Metabolite Nutrient Set (Total 21
compounds)
86Essential Compounds: E. coli, Total 41 compounds
- Proteins (20)
- Amino acids
- Nucleic acids (DNA, RNA) (8)
- Nucleosides
- Cell membrane (3)
- Phospholipids
- Cell wall (10)
- Peptidoglycan precursors
- Outer cell wall precursors (Lipid-A,
oligosaccharides)
88- http://brg.ai.sri.com/ptools09/slides/Tuesday/growth-experiment-Markus-Krummenacker.txt
89Flux Balance Modeling
- Generate, store, and update metabolic model within Pathway Tools
- Fast, accurate generation of metabolic model
- Close coupling to genome and regulatory information
- Extensive schema
- Extensive query and visualization tools
- Debug/validate model using Pathway Tools
- Export to SBML and import to constraint solver for model execution
- Visualize reaction flux and omics data using overviews
- Copy/update multiple PGDBs to reflect alternative strains