Pathway Tools / BioCyc Fundamentals

1
Pathway Tools / BioCyc Fundamentals
  • Peter D. Karp, Ph.D.
  • Bioinformatics Research Group
  • SRI International
  • pkarp_at_ai.sri.com
  • BioCyc.org
  • EcoCyc.org, MetaCyc.org, HumanCyc.org

2
Pathway Tools Capabilities
  • Create and maintain an organism database
    integrating genome, pathway, regulatory
    information
  • Computational inference tools
  • Interactive editing tools
  • Query and visualize that database
  • Use the database to interpret omics data
  • Metabolic network analysis tools
  • Comparative analysis tools
  • Export the metabolic network to SBML
  • Speed creation of flux-balance models by an order
    of magnitude

3
BioCyc
  • Hundreds of microbial genomes
  • Inferred operons and metabolic networks
  • Couples curated data with computational
    predictions
  • Supports analysis of omics data
  • Comparative analysis tools
  • Microbial emphasis; exceptions:
  • HumanCyc, MouseCyc, CattleCyc

4
Model Organism Databases / Organism-Specific Databases
  • DBs that describe the genome and other
    information about an organism
  • Every sequenced organism with an active
    experimental community requires a MOD
  • Integrate genome data with information about the
    biochemical and genetic network of the organism
  • Integrate literature-based information with
    computational predictions
  • Curated by experts for that organism
  • No one group can curate all the world's genomes
  • Distribute workload across a community of experts
    to create a community resource

5
Rationale for MODs
  • Each complete genome is incomplete in several
    respects
  • 40-60% of genes have no assigned function
  • Roughly 7% of the assigned functions are
    incorrect
  • Many assigned functions are non-specific
  • Need continuous updating of annotations with
    respect to new experimental data and
    computational predictions
  • MODs are platforms for global analyses of an
    organism
  • Interpret omics data in a pathway context
  • In silico prediction of essential genes
  • Characterize systems properties of metabolic and
    genetic networks

6
What is Curation?
  • Ongoing updating and refinement of a PGDB
  • Correcting false-positive and false-negative
    predictions
  • Incorporating information from experimental
    literature
  • Authoring of comments and citations
  • Updating database fields
  • Gene positions, names, synonyms
  • Protein functions, activators, inhibitors
  • Addition of new pathways, modification of
    existing pathways
  • Defining TF binding sites, promoters, regulation
    of transcription initiation and other processes

7
Pathway/Genome Database
[Diagram of PGDB contents, labeled CELL:]
  • Pathways
  • Reactions
  • Compounds
  • Sequence features
  • Proteins, RNAs
  • Regulation: operons, promoters, DNA binding sites,
    regulatory interactions
  • Genes
  • Chromosomes, plasmids
8
BioCyc Collection of 507 Pathway/Genome Databases
  • Pathway/Genome Database (PGDB) combines
    information about
  • Pathways, reactions, substrates
  • Enzymes, transporters
  • Genes, replicons
  • Transcription factors/sites, promoters, operons
  • Tier 1: Literature-derived PGDBs
  • MetaCyc
  • EcoCyc -- Escherichia coli K-12
  • Tier 2: Computationally derived DBs, some
    curation -- 24 PGDBs
  • HumanCyc
  • Mycobacterium tuberculosis
  • Tier 3: Computationally derived DBs, no curation
    -- 481 DBs

9
Pathway Tools Overview
Annotated Genome
MetaCyc Reference Pathway DB
PathoLogic
Pathway/Genome Database
Pathway/Genome Navigator
Pathway/Genome Editors
10
Pathway Tools Software: PathoLogic
  • Computational creation of new Pathway/Genome
    Databases
  • Transforms genome into Pathway Tools schema and
    layers inferred information above the genome
  • Predicts operons
  • Predicts metabolic network
  • Predicts which genes code for missing enzymes in
    metabolic pathways
  • Infers transport reactions from transporter names

Bioinformatics 18:S225 (2002)
11
Pathway Tools Software: Pathway/Genome Editors
  • Interactively update PGDBs with graphical editors
  • Support geographically distributed teams of
    curators with object database system
  • Gene editor
  • Protein editor
  • Reaction editor
  • Compound editor
  • Pathway editor
  • Operon editor
  • Publication editor

12
Pathway Tools Software: Pathway/Genome Navigator
  • Querying and visualization of
  • Pathways
  • Reactions
  • Metabolites
  • Proteins
  • Genes
  • Chromosomes
  • Two modes of operation
  • Web mode
  • Desktop mode
  • Most functionality shared, but each has unique
    functionality

13
Pathway Tools Software: PGDBs Created Outside SRI
  • 1,700 licensees; 75 groups applying the software to
    300 organisms
  • Saccharomyces cerevisiae, SGD project, Stanford
    University
  • 135 pathways / 565 publications
  • Candida albicans, CGD project, Stanford
    University
  • dictyBase, Northwestern University
  • Mouse, MGD, Jackson Laboratory
  • Under development
  • Drosophila, FlyBase
  • C. elegans, WormBase
  • Arabidopsis thaliana, TAIR, Carnegie Institution
    of Washington
  • 288 pathways / 2282 publications
  • PlantCyc, Carnegie Institution of Washington
  • Six Solanaceae species, Cornell University
  • GrameneDB, Cold Spring Harbor Laboratory
  • Medicago truncatula, Samuel Roberts Noble
    Foundation

14
Pathway Tools Software: PGDBs Created Outside SRI
  • NIAID BRCs for Biodefense pathogens
  • BioHealthBase -- Mycobacterium tuberculosis,
    Francisella tularensis
  • Pathema -- 80 PGDBs
  • PATRIC -- Brucella suis, Coxiella burnetii,
    Rickettsia typhi
  • EuPathDB -- Cryptosporidium, Plasmodium
  • G. Xie, Los Alamos Lab, Dental pathogens
  • F. Brinkman, Simon Fraser Univ, Pseudomonas
    aeruginosa
  • V. Schachter, Genoscope, Acinetobacter
  • M. Bibb, John Innes Centre, Streptomyces
    coelicolor
  • G. Church, Harvard, Prochlorococcus marinus,
    multiple strains
  • E. Uberbacher, ORNL and G. Serres, MBL,
    Shewanella oneidensis
  • R.J.S. Baerends, University of Groningen,
    Lactococcus lactis IL1403, Lactococcus lactis
    MG1363, Streptococcus pneumoniae TIGR4, Bacillus
    subtilis 168, Bacillus cereus ATCC14579
  • Matthew Berriman, Sanger Centre, Trypanosoma
    brucei, Leishmania major
  • Sergio Encarnacion, UNAM, Sinorhizobium meliloti
  • Mark van der Giezen, University of London,
    Entamoeba histolytica, Giardia intestinalis
  • Michael Göttfert, Technische Universität Dresden,
    Bradyrhizobium japonicum
  • Artiva Maria Goudel, Universidade Federal de
    Santa Catarina, Brazil, Chromobacterium violaceum
    ATCC 12472

15
Pathway Tools Software: PGDBs Created Outside SRI
  • Large scale users
  • C. Medigue, Genoscope, 200 PGDBs
  • G. Sutton, J. Craig Venter Institute, 80 PGDBs
  • G. Burger, U Montreal, 60 PGDBs
  • Bart Weimer, Utah State University, Lactococcus
    lactis, Brevibacterium linens, Lactobacillus
    acidophilus, Lactobacillus plantarum,
    Lactobacillus johnsonii, Listeria monocytogenes
  • Partial listing of outside PGDBs at BioCyc.org

16
Obtaining a PGDB for Organism of Interest
  • Find existing curated PGDB
  • Find existing PGDB in BioCyc
  • Create your own

17
EcoCyc Project -- EcoCyc.org
  • E. coli Encyclopedia
  • Review-level Model-Organism Database for E. coli
  • Tracks evolving annotation of the E. coli genome
    and cellular networks
  • The two paradigms of EcoCyc
  • Multi-dimensional annotation of the E. coli K-12
    genome
  • Positions of genes; functions of gene products
    (76% / 66% exp)
  • Gene Ontology terms; MultiFun terms
  • Gene product summaries and literature citations
  • Evidence codes
  • Multimeric complexes
  • Metabolic pathways
  • Cellular regulation

Karp, Gunsalus, Collado-Vides, Paulsen
Nuc. Acids Res. 35:7577 (2007); ASM News 70:25 (2004); Science 293:2040
18
EcoCyc E. coli Dataset
  • EcoCyc v13.6, Pathway/Genome Navigator (URL: EcoCyc.org)
  • Pathways: 246
  • Reactions: Metabolic 1,394; Transport 246
  • Compounds: 1,830
  • Citations: 19,000
  • Proteins: 4,479; Complexes: 895; RNAs: 285
  • Gene regulation: Operons 3,369; Transcription factors 196;
    Promoters 1,796; TF binding sites 2,205
  • Genes: 4,492
19
Paradigm 1: EcoCyc as Textual Review Article
  • All gene products for which experimental
    literature exists are curated with a minireview
    summary
  • Found on protein and RNA pages, not gene pages!
  • 3257 gene products contain summaries
  • Summaries cover function, interactions, mutant
    phenotypes, crystal structures, regulation, and
    more
  • Additional summaries found in pages for operons,
    pathways
  • EcoCyc cites 17,300 publications

20
Paradigm 2: EcoCyc as Computational Symbolic Theory
  • Highly structured, high-fidelity knowledge
    representation provides computable information
  • Each molecular species defined as a DB object
  • Genes, proteins, small molecules
  • Each molecular interaction defined as a DB object
  • Metabolic reactions
  • Transport reactions
  • Transcriptional regulation of gene expression
  • 220 database fields capture extensive properties
    and relationships

21
EcoCyc Procedures
  • DB updates performed by 5 staff curators
  • Information gathered from biomedical literature
  • Enter data into structured database fields
  • Author extensive summaries
  • Update evidence codes
  • Corrections submitted by E. coli researchers
  • Four releases per year
  • Quality assurance of data and software
  • Evaluate database consistency constraints
  • Perform element balancing of reactions
  • Run other checking programs
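
The element-balancing check listed above can be illustrated with a minimal Python sketch. This is not the Pathway Tools implementation (which is written in Lisp); the function names and the simple formula parser, which handles no parentheses, charges, or isotopes, are assumptions made for illustration.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def side_total(side):
    """Sum atoms over one side, given (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coefficient * n
    return total

def is_balanced(reactants, products):
    """True if every element occurs equally often on both sides."""
    return side_total(reactants) == side_total(products)

# Glucose oxidation is balanced; H2 + O2 -> H2O is not.
print(is_balanced([(1, "C6H12O6"), (6, "O2")], [(6, "CO2"), (6, "H2O")]))
print(is_balanced([(1, "H2"), (1, "O2")], [(1, "H2O")]))
```

A check of this kind flags reactions whose stoichiometry was entered incorrectly, so a curator can review them.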

22
EcoCyc Accelerates Science
  • Experimentalists
  • E. coli experimentalists
  • Experimentalists working with other microbes
  • Analysis of expression data
  • Computational biologists
  • Biological research using computational methods
  • Genome annotation
  • Study connectivity of E. coli metabolic network
  • Study phylogenetic extent of metabolic pathways
    and enzymes in all domains of life
  • Bioinformaticists
  • Training and validation of new bioinformatics
    algorithms that predict operons, promoters, protein
    functional linkages, and protein-protein
    interactions
  • Metabolic engineers
  • Design of organisms for the production of
    organic acids, amino acids, ethanol, hydrogen,
    and solvents
  • Educators

23
MetaCyc: Metabolic Encyclopedia
  • Describe a representative sample of every
    experimentally determined metabolic pathway
  • Describe properties of metabolic enzymes
  • Literature-based DB with extensive references and
    commentary
  • Pathways, reactions, enzymes, substrates
  • Jointly developed by
  • P. Karp, R. Caspi, C. Fulcher, SRI International
  • L. Mueller, A. Pujar, Boyce Thompson Institute
  • S. Rhee, P. Zhang, Carnegie Institution

Nucleic Acids Research 2008
24
Applications of MetaCyc
  • Reference source on metabolic pathways
  • Metabolic engineering
  • Find enzymes with desired activities, regulatory
    properties
  • Determine cofactor requirements
  • Predict pathways from genomes
  • Systematic studies of metabolism
  • Computer-aided education

25
MetaCyc Data -- Version 13.6
Pathways 1,436
Reactions 8,200
Enzymes 6,060
Small Molecules 8,400
Organisms 1,800
Citations 21,700
26
Taxonomic Distribution of MetaCyc Pathways
(version 13.1)
Bacteria 883
Green Plants 607
Fungi 199
Mammals 159
Archaea 112
27
MetaCyc Curation
  • DB updates by 5 staff curators
  • Information gathered from biomedical literature
  • Emphasis on microbial and plant pathways
  • More prevalent pathways given higher priority
  • Review-level database
  • Four releases per year
  • Quality assurance of data and software
  • Evaluate database consistency constraints
  • Perform element balancing of reactions
  • Run other checking programs
  • Display every DB object

28
MetaCyc Curation
  • Ontologies guide querying
  • Pathways (recently revised), compounds, enzymatic
    reactions
  • Example: Coenzyme M biosynthesis
  • Extensive citations and commentary
  • Evidence codes
  • Controlled vocabulary of evidence types
  • Attach to pathways and enzymes
  • Code, citation, curator, date
  • Release notes explain recent updates
  • http://biocyc.org/metacyc/release-notes.shtml

29
MetaCyc Data
  • Of the 1548 enzymes
  • 818 are monomers
  • 730 are multimers
  • 570 are homomultimers, 160 are heteromultimers
  • Enzymes with cofactors 512
  • Enzymes with activators or inhibitors 577
  • Average pathway length 5 reactions

30
Enzyme Data Available in MetaCyc
  • Reaction(s) catalyzed
  • Alternative substrates
  • Activators, inhibitors, cofactors, prosthetic
    groups
  • Subunit structure
  • Genes
  • Features on protein sequence
  • Cellular location
  • pI, molecular weight, Km, Vmax
  • Gene Ontology terms
  • Links to other bioinformatics databases

31
What is a Pathway?
  • A connected sequence of biochemical reactions
  • Occurs in one organism
  • Conserved through evolution
  • Regulated as a unit
  • Often starts or stops at one of 13 common
    intermediate metabolites

32
MetaCyc Pathway Variants
  • Pathways that accomplish similar biochemical
    functions using different biochemical routes
  • Alanine biosynthesis I -- E. coli
  • Alanine biosynthesis II -- H. sapiens
  • Pathways that accomplish similar biochemical
    functions using similar sets of reactions
  • Several variants of TCA Cycle

33
MetaCyc Super-Pathways
  • Groups of pathways linked by common substrates
  • Example: Super-pathway containing
  • Chorismate biosynthesis
  • Tryptophan biosynthesis
  • Phenylalanine biosynthesis
  • Tyrosine biosynthesis
  • Super-pathways defined by listing their component
    pathways
  • Multiple levels of super-pathways can be defined
  • Pathway layout algorithms accommodate
    super-pathways
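
Defining a super-pathway by listing its component pathways, with several levels of nesting allowed, can be sketched as a recursive expansion. The pathway names follow the example above; the reaction identifiers, dictionary layout, and function name are hypothetical and do not reflect the MetaCyc schema.

```python
def expand_pathway(name, components, reactions):
    """Return the ordered, de-duplicated reactions of a pathway or super-pathway."""
    if name in reactions:                      # a base pathway
        return list(reactions[name])
    ordered = []
    for sub in components[name]:               # a super-pathway: recurse
        for rxn in expand_pathway(sub, components, reactions):
            if rxn not in ordered:
                ordered.append(rxn)
    return ordered

# Hypothetical reaction identifiers for illustration only.
reactions = {
    "chorismate biosynthesis": ["rxn-1", "rxn-2"],
    "tryptophan biosynthesis": ["rxn-3"],
    "tyrosine biosynthesis": ["rxn-4"],
}
components = {
    "aromatic amino acid super-pathway": [
        "chorismate biosynthesis",
        "tryptophan biosynthesis",
        "tyrosine biosynthesis",
    ],
    # A second level: a super-pathway that contains another super-pathway.
    "top-level super-pathway": [
        "aromatic amino acid super-pathway",
        "chorismate biosynthesis",
    ],
}

print(expand_pathway("top-level super-pathway", components, reactions))
```

Because only component names are stored, a change to a base pathway is picked up by every super-pathway that lists it.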

34
Family of Pathway/Genome Databases
35
Comparison with KEGG
  • KEGG vs MetaCyc: Reference pathway collections
  • KEGG maps are not pathways (Nuc Acids
    Res 34:3687, 2006)
  • KEGG maps contain multiple biological pathways
  • Two genes chosen at random from a BioCyc pathway
    are more likely to be related according to genome
    context methods than from a KEGG pathway
  • KEGG maps are composites of pathways in many
    organisms -- they do not identify which specific
    pathways were elucidated in which organisms
  • KEGG has no literature citations, no comments,
    less enzyme detail
  • KEGG assigns half as many reactions to pathways
    as MetaCyc
  • KEGG vs organism-specific PGDBs
  • KEGG does not curate or customize pathway
    networks for each organism
  • Highly curated PGDBs now exist for important
    organisms such as E. coli, yeast, mouse,
    Arabidopsis

36
Comparison of Pathway Tools to KEGG
  • Inference tools
  • KEGG does not predict presence or absence of
    pathways
  • KEGG lacks pathway hole filler, operon predictor
  • Curation tools
  • KEGG does not distribute curation tools
  • No ability to customize pathways to the organism
  • Pathway Tools schema much more comprehensive
  • Visualization and analysis
  • KEGG does not perform automatic pathway layout
  • KEGG metabolic-map diagram extremely limited
  • No comparative pathway analysis

37
Pathway Tools Implementation Details
  • Platforms
  • Macintosh, PC/Linux, and PC/Windows platforms
  • Same binary can run as desktop app or Web server
  • Production-quality software
  • Version control
  • Two regular releases per year
  • Extensive quality assurance
  • Extensive documentation
  • Auto-patch
  • Automatic DB-upgrade
  • 480,000 lines of Lisp code

38
  • ptools-support_at_ai.sri.com

39
Pathway Tools Architecture
Pathway Genome Navigator
Web Mode
Desktop Mode
Protein Editor Pathway Editor Reaction Editor
Lisp Perl Java
GFP API
Oracle or MySQL
Disk File
Ocelot DBMS
40
Ocelot Knowledge Server Architecture
  • Frame data model
  • Minimizes size of schema relative to semantic
    complexity
  • Schema is stored within the DB
  • Schema is self documenting
  • Slot units define metadata about slots
  • Domain, range, inverse
  • Collection type, number of values, value
    constraints
  • Comment
  • Schema evolution facilitated by
  • Easy addition/removal of slots, or alteration of
    slot datatypes
  • Flexible data formats that do not require
    dumping/reloading of data
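
The idea that slot units carry schema metadata inside the database itself can be illustrated with a small sketch. This is a toy model, not Ocelot's API: the class names, the dictionary-based frame store, and the example slot names are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SlotUnit:
    """Metadata about one slot, stored alongside the data it describes."""
    name: str
    domain: str                    # class whose frames may carry this slot
    value_type: type               # the slot's "range"
    inverse: Optional[str] = None  # slot maintained on the referenced frame
    comment: str = ""

@dataclass
class KB:
    slot_units: dict = field(default_factory=dict)
    frames: dict = field(default_factory=dict)  # frame id -> {"class": ..., slot: [values]}

    def add_slot(self, unit):
        # Adding a slot changes the schema without reloading existing frames.
        self.slot_units[unit.name] = unit

    def add_frame(self, frame_id, frame_class):
        self.frames[frame_id] = {"class": frame_class}

    def add_value(self, frame_id, slot, value):
        unit = self.slot_units[slot]
        frame = self.frames[frame_id]
        if frame["class"] != unit.domain:
            raise ValueError(f"{slot} is not defined for class {frame['class']}")
        if not isinstance(value, unit.value_type):
            raise TypeError(f"{slot} expects {unit.value_type.__name__}")
        frame.setdefault(slot, []).append(value)
        if unit.inverse:  # here the value is assumed to be another frame's id
            self.frames[value].setdefault(unit.inverse, []).append(frame_id)

kb = KB()
kb.add_slot(SlotUnit("catalyzes", "Protein", str, inverse="catalyzed-by"))
kb.add_frame("enzyme-1", "Protein")
kb.add_frame("rxn-1", "Reaction")
kb.add_value("enzyme-1", "catalyzes", "rxn-1")
print(kb.frames["rxn-1"])
```

Keeping domain, range, and inverse in the slot unit lets generic code validate values and maintain inverse links without per-slot logic.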

41
Ocelot Storage System Architecture
  • Persistent storage via disk files or Oracle or
    MySQL
  • Concurrent development: Oracle or MySQL
  • Single-user development: disk files
  • Oracle/MySQL DBMS storage
  • DBMS is submerged within Ocelot, invisible to
    users
  • Frames transferred from DBMS to Ocelot
  • On demand
  • By background prefetcher
  • Memory cache
  • Persistent disk cache to speed performance via
    Internet
  • Transaction logging facility

42
Why Do We Code in Common Lisp?
  • Gat studied Lisp and Java implementations of 16
    programs by 14 programmers (Intelligence 11:21,
    2000)
  • The average Lisp program ran 33 times faster than
    the average Java program
  • The average Lisp program was written 5 times
    faster than the average Java program
  • Roberts compared Java and Lisp implementations of
    a Domain Name Server (DNS) resolver
  • http://www.findinglisp.com/papers/case_study_java_lisp_dns.html
  • The Lisp version had half as many lines of code

43
Common Lisp Programming Environment
  • Interpreted and/or compiled execution
  • Fabulous debugging environment
  • High-level language
  • Interactive data exploration
  • Extensive built-in libraries
  • Dynamic redefinition
  • Find out more!
  • See ALU.org or
  • http://www.international-lisp-conference.org/

44
PathoLogic Processing
  • Translate source genome to PGDB form
  • Predict operons
  • Predict metabolic pathways
  • Predict pathway hole fillers
  • Transport inference parser
  • Build metabolic overview diagram

45
PathoLogic Step 1: Translate Genome to PGDB
[Diagram: PathoLogic Software integrates genome and pathway data to identify putative metabolic networks.]
  • Inputs: Annotated Genomic Sequence; Multi-organism
    Pathway Database (MetaCyc)
  • Output: Pathway/Genome Database containing Pathways,
    Reactions, Compounds, Gene Products, Genes, Genomic Map
47
PathoLogic Step 2: Predict Operons
  • Predict adjacent genes A and B in same operon
    based on
  • Intergenic distance
  • Functional relatedness of A and B
  • Tests for functional relatedness
  • A and B in same gene functional class (MultiFun)
  • A and B in same metabolic pathway
  • A codes for enzyme in a pathway and B codes for
    transporter involving a substrate in that pathway
  • A and B are monomers in same protein complex
  • Correctly predicts 80% of E. coli transcription
    units
  • Marks predicted operons with computational
    evidence codes

Bioinformatics 20:709-17 (2004)
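
The pairwise tests above can be sketched as a simple rule-based pass over adjacent genes. This is a stand-in for illustration, not the published scoring method: the distance thresholds, the same-strand requirement, and all names are assumptions, and the enzyme/transporter test is omitted for brevity.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    start: int
    end: int
    strand: str
    func_class: str = ""
    pathways: frozenset = frozenset()
    complexes: frozenset = frozenset()

def functionally_related(a, b):
    """Same functional class, a shared pathway, or a shared protein complex."""
    return bool(
        (a.func_class and a.func_class == b.func_class)
        or (a.pathways & b.pathways)
        or (a.complexes & b.complexes)
    )

def same_operon(a, b, max_gap=50, max_related_gap=300):
    """Hypothetical rule: a short gap suffices; a longer gap needs functional evidence."""
    if a.strand != b.strand:
        return False
    gap = b.start - a.end
    if gap <= max_gap:
        return True
    return gap <= max_related_gap and functionally_related(a, b)

def predict_operons(genes):
    """Group genes, sorted by position, into runs of adjacent same-operon pairs."""
    operons = []
    for gene in sorted(genes, key=lambda g: g.start):
        if operons and same_operon(operons[-1][-1], gene):
            operons[-1].append(gene)
        else:
            operons.append([gene])
    return [[g.name for g in operon] for operon in operons]

genes = [
    Gene("a", 0, 100, "+"),
    Gene("b", 120, 200, "+", pathways=frozenset({"p1"})),
    Gene("c", 450, 600, "+", pathways=frozenset({"p1"})),
    Gene("d", 620, 800, "-"),
]
print(predict_operons(genes))
```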
48
PathoLogic Step 3: Prediction of Metabolic Pathways
  • Infer reaction complement of organism
  • Match enzymes in source genome to MetaCyc
    reactions they catalyze
  • Match enzyme names and EC numbers to MetaCyc
  • Support user in manually matching additional
    enzymes
  • Computationally predict which MetaCyc metabolic
    pathways are present
  • For each MetaCyc pathway, evaluate which of its
    reactions are catalyzed by the organism
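
Matching gene products to reactions, first by EC number and then by enzyme name, can be sketched as two dictionary lookups. The lookup tables, the reaction identifier, and the function name are hypothetical; real name matching has to cope with far more irregular annotations.

```python
def match_enzymes(gene_products, reactions_by_ec, reactions_by_name):
    """Assign each gene product to a reaction by EC number, else by name.

    gene_products maps gene -> (product name, EC number or None).
    Returns (assigned, unmatched); unmatched genes are left for manual review.
    """
    assigned, unmatched = {}, []
    for gene, (name, ec) in gene_products.items():
        key = name.strip().lower()
        if ec in reactions_by_ec:
            assigned[gene] = (reactions_by_ec[ec], "EC number")
        elif key in reactions_by_name:
            assigned[gene] = (reactions_by_name[key], "name")
        else:
            unmatched.append(gene)
    return assigned, unmatched

# Hypothetical reference tables with a single reaction.
reactions_by_ec = {"5.1.3.2": "rxn-A"}
reactions_by_name = {"udp-glucose-4-epimerase": "rxn-A"}

gene_products = {
    "gene1": ("UDP-glucose-4-epimerase", "5.1.3.2"),
    "gene2": ("UDP-glucose-4-epimerase", None),
    "gene3": ("hypothetical protein", None),
}
print(match_enzymes(gene_products, reactions_by_ec, reactions_by_name))
```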

49
Match Enzymes to Reactions
[Flowchart: each gene product (example: UDP-glucose-4-epimerase, EC 5.1.3.2, reaction UDP-D-glucose ↔ UDP-galactose) is matched against MetaCyc. Box labels: Match (yes/no); Assign; Probable enzyme "-ase" (yes/no); Manually search (yes/no); Not a metabolic enzyme; Can't Assign. Counts shown: 2057 proteins matched by EC, 314 matched by name, 1320, 625.]
50
Import Pathways
[Flowchart. Box labels: MetaCyc; reactions; containing pathways; Import All; Prune? (yes/no); Delete; Manual Review (yes/no); delete; keep.]
51
Pathway Prediction
  • Prediction is hard because
  • Enzyme naming is irregular
  • Some reactions present in multiple pathways
  • Pathway variants share many reactions in common
  • MetaCyc now has many pathways

52
Pathway Scoring Criteria
  • Imported pathways must satisfy
  • Pathways outside their taxonomic range must have
    enzymes for all reactions
  • If any reactions in a pathway are designated as
    key, an enzyme must be present for at least one
  • Pathway P is imported if any of these conditions
    is satisfied
  • One unique enzyme present for P
  • P missing at most one reaction
  • More reactions present than absent for P
  • P is not a superset of another pathway with the
    same number of enzymes present
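
The scoring criteria above can be read as two required conditions followed by a set of alternatives. The sketch below encodes that reading; the function signature is hypothetical, "one unique enzyme" is interpreted as an enzyme for a reaction found only in P, and the final superset criterion is not modeled.

```python
def should_import(reactions, present, in_taxonomic_range,
                  key_reactions=(), unique_reactions=()):
    """Decide whether pathway P is kept, under one reading of the criteria.

    reactions: all reactions of P; present: those with an enzyme in the organism.
    """
    reactions = set(reactions)
    present = set(present) & reactions
    missing = reactions - present

    # Required: outside its taxonomic range, P needs enzymes for all reactions.
    if not in_taxonomic_range and missing:
        return False
    # Required: if key reactions are designated, at least one must have an enzyme.
    if key_reactions and not (set(key_reactions) & present):
        return False

    # Any one of the following is sufficient.
    return (
        bool(set(unique_reactions) & present)
        or len(missing) <= 1
        or len(present) > len(missing)
    )

print(should_import(["r1", "r2", "r3", "r4"], ["r1", "r2", "r3"], True))
print(should_import(["r1", "r2", "r3", "r4"], ["r1", "r2", "r3"], False))
```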

53
Pathway Evidence Report
54
PathoLogic Step 4: Pathway Hole Filler
  • Definition: Pathway holes are reactions in
    metabolic pathways for which no enzyme is
    identified

[Pathway diagram: L-aspartate, iminoaspartate, quinolinate, nicotinate nucleotide, deamido-NAD, NAD. Enzymes shown: quinolinate synthetase nadA; n.n. pyrophosphorylase nadC; NAD synthetase, NH3-dependent (CC3619). Reactions 1.4.3.-, 2.7.7.18, and 6.3.5.1 appear alongside the label "holes".]
55
  • Step 1: Query UniProt for all sequences having the
    EC number of the pathway hole
  • Step 2: BLAST against target genome
  • Steps 3 and 4: Consolidate hits and evaluate
    evidence
[Diagram: query sequences for enzyme A from organisms 1-8 are searched against target genes X, Y, and Z; 7 queries have high-scoring hits to sequence Y.]
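
The consolidation step can be sketched as grouping BLAST hits by target gene and summarizing them. The tuple layout (query, gene, E-value, rank), the feature names, and the example values are assumptions for illustration; no real BLAST output format is parsed here.

```python
from collections import defaultdict

def consolidate_hits(hits):
    """Summarize hits per candidate gene.

    hits: iterable of (query_id, target_gene, e_value, rank_in_blast_output).
    """
    by_gene = defaultdict(list)
    for query, gene, e_value, rank in hits:
        by_gene[gene].append((query, e_value, rank))

    summary = {}
    for gene, rows in by_gene.items():
        summary[gene] = {
            "n_queries": len({query for query, _, _ in rows}),
            "best_e_value": min(e for _, e, _ in rows),
            "avg_rank": sum(rank for _, _, rank in rows) / len(rows),
        }
    return summary

# Hypothetical hits: two queries agree on geneY, one weak hit to geneX.
hits = [
    ("org1-enzA", "geneY", 1e-50, 1),
    ("org2-enzA", "geneY", 1e-40, 1),
    ("org1-enzA", "geneX", 1e-5, 2),
]
for gene, features in consolidate_hits(hits).items():
    print(gene, features)
```

Features like these (number of queries, best E-value, average rank) are the kind of evidence a downstream classifier can weigh for each candidate.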
56
Bayes Classifier
P(protein has function X | E-value, avg. rank, aln. length, etc.)
[Bayes network diagram: the node "protein has function X" is connected to the evidence nodes best E-value, pwy direction, avg. rank in BLAST output, adjacent rxns, number of queries, and % of query aligned.]
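
How several evidence types combine into one probability can be illustrated with a naive Bayes calculation. This is a simplification: the classifier on the slide is a Bayes network whose structure and parameters are not given here, so the independence assumption and every number below are hypothetical.

```python
def posterior(prior, likelihoods):
    """Naive Bayes posterior that a candidate has the function.

    likelihoods: list of (P(evidence | has function), P(evidence | lacks function)),
    one pair per observed piece of evidence, assumed conditionally independent.
    """
    p_yes, p_no = prior, 1.0 - prior
    for given_yes, given_no in likelihoods:
        p_yes *= given_yes
        p_no *= given_no
    return p_yes / (p_yes + p_no)

# Hypothetical evidence: a strong E-value and a favorable average rank.
evidence = [(0.8, 0.2), (0.6, 0.3)]
print(round(posterior(0.5, evidence), 3))
```

Each additional evidence type multiplies into the odds, which is why integrating several weak signals can exceed a confidence threshold that no single signal reaches.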
57
Pathway Hole Filler
  • Why should hole filler find things beyond the
    original genome annotation?
  • Reverse BLAST searches more sensitive
  • Reverse BLAST searches find second domains
  • Integration of multiple evidence types

58
Caulobacter crescentus Pathway Holes
  • 130 pathways containing 582 reactions
  • 92 pathways contain 236 pathway holes
  • Caulobacter holes filled
  • 77 holes filled at P > 0.9
  • Previous functions of candidate hole fillers
  • No predicted function
  • Correctly assigned single function
  • Incorrectly assigned function
  • Imprecise functional assignment
  • BMC Bioinformatics 5:76 (2004)

59
Example Pathway
[Pathway diagram: L-aspartate, iminoaspartate, quinolinate, nicotinate nucleotide, deamido-NAD, NAD. Enzymes shown: quinolinate synthetase nadA (CC2912); n.n. pyrophosphorylase nadC (CC2915); NAD synthetase, NH3-dependent (CC3619).]
  • Hole 1.4.3.-: CC2913, P=0.99
  • Hole 2.7.7.18: CC3431, P=0.90
  • Hole 6.3.5.1: CC3619, P=0.99
  • Previous annotations: CC2913, L-aspartate oxidase
    (wrong EC on rxn); CC3431, ORF; CC3619, put.
    NAD(+)-synthetase (multidomain)
60
PathoLogic Step 5: Transport Inference Parser
  • Problem: Write a program to query a genome
    annotation to compute the substrates an organism
    can transport
  • Typical genome annotations for transporters
  • ATP transporter for ribose
  • ribose ABC transporter
  • D-ribose ATP transporter
  • ABC transporter, membrane spanning protein
    ribose
  • ABC transporter, membrane spanning protein
    D-ribose

61
Transport Inference Parser
  • Input: ATP transporter of phosphonate
  • Output: Structured description of transport
    activity
  • Locates most transporters in genome annotation
    using keyword analysis
  • Parse product name using a series of rules to
    identify
  • Transported substrate, co-substrate
  • Influx/efflux
  • Energy coupling mechanism
  • Creates transport reaction object
  • phosphonate[periplasm] + H2O + ATP → phosphonate +
    Pi + ADP
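
The rule-based parsing described above can be sketched with a few regular expressions. The two rules cover only the annotation styles quoted on these slides; the rule set, function name, and output fields are assumptions, and the real parser handles many more cases (co-substrates, efflux, other energy couplings).

```python
import re

# Hypothetical rules for locating the transported substrate in a product name.
SUBSTRATE_RULES = [
    re.compile(r"transporter\s+(?:for|of)\s+(?P<substrate>[\w-]+)", re.IGNORECASE),
    re.compile(r"^(?P<substrate>[\w-]+)\s+(?:ABC|ATP)\s+transporter", re.IGNORECASE),
]

def parse_transporter(product_name):
    """Return a structured description, or None if the name is not a transporter."""
    if not re.search(r"transporter", product_name, re.IGNORECASE):
        return None  # keyword filter: not a transporter annotation

    substrate = None
    for rule in SUBSTRATE_RULES:
        match = rule.search(product_name)
        if match:
            substrate = match.group("substrate")
            break

    energy = "ATP" if re.search(r"\b(?:ABC|ATP)\b", product_name, re.IGNORECASE) else None

    reaction = None
    if substrate and energy == "ATP":
        # Influx is assumed; the reaction string mirrors the slide's example.
        reaction = f"{substrate}[periplasm] + H2O + ATP -> {substrate} + Pi + ADP"
    return {"substrate": substrate, "energy": energy, "reaction": reaction}

print(parse_transporter("ATP transporter of phosphonate"))
print(parse_transporter("ABC transporter, membrane spanning protein ribose"))
```

Names for which no substrate is recovered come back with `substrate` set to `None`, which corresponds to the transporters a user would review by hand.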

62
Transport Inference Parser
  • Permits symbolic computation with transport
    activities
  • Compute transportable substrates of the cell
  • Compute connectivity among compartments for
    substrates
  • Facilitate reasoning about transport/metabolism
    connections
  • Draw transport cartoon in protein pages, cellular
    overview

63
Transport Inference Parser
  • User reviews all assignments using interactive
    tool that allows assignments to be revised
  • User also reviews transporters for which no
    assignment was made

64
Regulation
65
Encoding Cellular Regulation in Pathway Tools --
Goals
  • Facilitate curation of wide range of regulatory
    information within a formal ontology
  • Compute with regulatory mechanisms and pathways
  • Summary statistics, complex queries
  • Pattern discovery
  • Visualization of network components
  • Provide training sets for inference of regulatory
    networks
  • Interpret gene-expression datasets in the context
    of known regulatory mechanisms

66
Regulatory Interactions Supported by Pathway Tools
  • Substrate-level regulation of enzyme activity
  • Binding to proteins or small molecules
    (phosphorylation)
  • Regulation of transcription initiation
  • Attenuation of transcription
  • Regulation of translation by proteins and by
    small RNAs

67
Regulation in Pathway Tools
  • Editing tools
  • Transcription factor display window
  • Transcription unit display window
  • Regulatory Overview / Omics Viewer

68
Regulatory Interaction Editor
69
Regulatory Overview and Omics Viewer
  • Show regulatory relationships among gene groups

70
Infer Anti-Microbial Drug Targets
  • Infer drug targets as genes coding for enzymes
    that encode chokepoint reactions
  • Two types of chokepoint reactions
  • Chokepoint analysis of Plasmodium falciparum
  • 216/303 reactions are chokepoints (73%)
  • All 3 clinically proven anti-malarial drugs
    target chokepoints
  • 21/24 biologically validated drug targets are
    chokepoints
  • 11.2% of chokepoints are drug targets
  • 3.4% of non-chokepoints are drug targets
  • => Chokepoints are significantly enriched for
    drug targets

Genome Research 14:917 (2004)
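
The slide does not spell out the two chokepoint types. The sketch below assumes the usual definition: a reaction is a chokepoint if it is the only reaction consuming some metabolite, or the only reaction producing some metabolite. The data layout and names are hypothetical.

```python
from collections import defaultdict

def chokepoints(reactions):
    """Find chokepoint reactions.

    reactions: {reaction id: (substrates, products)}.
    Assumed definition: the sole consumer or the sole producer of a metabolite.
    """
    consumers, producers = defaultdict(set), defaultdict(set)
    for rxn, (substrates, products) in reactions.items():
        for metabolite in substrates:
            consumers[metabolite].add(rxn)
        for metabolite in products:
            producers[metabolite].add(rxn)

    found = set()
    for rxns in list(consumers.values()) + list(producers.values()):
        if len(rxns) == 1:
            found |= rxns
    return found

# r1 and r2 are alternatives for A -> B; r3 alone consumes B and produces C.
network = {
    "r1": (["A"], ["B"]),
    "r2": (["A"], ["B"]),
    "r3": (["B"], ["C"]),
}
print(chokepoints(network))
```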
71
Comparative Analysis
  • Via Cellular Overview
  • Comparative genome browser
  • Comparative pathway table
  • Comparative analysis reports
  • Compare reaction complements
  • Compare pathway complements
  • Compare transporter complements

72
Summary
  • Pathway/Genome Databases
  • MetaCyc non-redundant DB of literature-derived
    pathways
  • 400 organism-specific PGDBs available through SRI
    at BioCyc.org
  • Computational theories of biochemical machinery
  • Pathway Tools software
  • Extract pathways from genomes
  • Morph annotated genome into structured ontology
  • Distributed curation tools for MODs
  • Query, visualization, WWW publishing

73
Information Sources
  • Pathway Tools User's Guide
  • aic-export/pathway-tools/ptools/13.0/doc/manuals/userguide.pdf
  • NOTE: Location of the aic-export directory can
    vary across different computers
  • Pathway Tools Web Site
  • http://bioinformatics.ai.sri.com/ptools/
  • Publications, FAQ, programming examples, etc.
  • Slides from this tutorial
  • http://www.ai.sri.com/pkarp/talks/
  • BioCyc Webinars
  • http://biocyc.org/webinar.shtml

74
BioCyc and Pathway Tools Availability
  • BioCyc.org Web site and database files freely
    available to all
  • Pathway Tools freely available to non-profits
  • Macintosh, PC/Windows, PC/Linux

75
Symbolic Systems Biology
  • Definition
  • Global analyses of biological systems using
    symbolic computing

76
Symbolic Systems Biology
  • Symbolic computing is concerned with the
    representation and manipulation of information in
    symbolic form. It is often contrasted with
    numeric representation. -- R. Cameron
  • Examples of symbolic computation
  • Symbolic algebra programs, e.g., Mathematica,
    Graphing Calculator
  • Compilers and interpreters for programming
    languages
  • Database query languages
  • Text analysis programs, e.g., Google
  • String matching for DNA and protein sequences
  • Artificial Intelligence methods, e.g., expert
    systems, symbolic logic, machine learning,
    natural language understanding

77
Symbolic Systems Biology
  • Concerned with different questions than
    quantitative systems biology
  • Symbolic analyses can in many cases produce
    answers when quantitative approaches fail because
    of lack of parameters or intractable mathematics
  • Symbolic computation is intimately dependent on
    the use of structured ontologies

78
Pathway Tools Ontology
  • 1064 classes
  • Main classes such as
  • Pathways, Reactions, Compounds, Macromolecules,
    Proteins, Replicons, DNA-Segments (Genes,
    Operons, Promoters)
  • Taxonomies for Pathways, Reactions, Compounds
  • 205 slots
  • Meta-data Creator, Creation-Date
  • Comment, Citations, Common-Name, Synonyms
  • Attributes Molecular-Weight, DNA-Footprint-Size
  • Relationships Catalyzes, Component-Of, Product
  • Classes, instances, slots all stored side by side
    in DBMS

79
Critiquing the Parts List
Slide thanks to Hirotada Mori (minus the banana!)
80
Dead End Metabolites
  • A small molecule C is a dead-end if
  • C is produced only by SMM reactions in
    Compartment, and no transporter acts on C in
    Compartment OR
  • C is consumed only by SMM reactions in
    Compartment, and no transporter acts on C in
    Compartment
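
The dead-end definition above can be sketched for a single compartment. The reading assumed here is that a metabolite is a dead end if it appears only as a product or only as a substrate of small-molecule metabolism (SMM) reactions, and no transporter acts on it; the function name and inputs are hypothetical.

```python
def dead_end_metabolites(reactions, transported=frozenset()):
    """Find dead-end metabolites in one compartment.

    reactions: iterable of (substrates, products) for SMM reactions.
    transported: metabolites acted on by some transporter in that compartment.
    """
    produced, consumed = set(), set()
    for substrates, products in reactions:
        consumed.update(substrates)
        produced.update(products)
    # Produced but never consumed, or consumed but never produced.
    one_sided = produced ^ consumed
    return one_sided - set(transported)

# A is imported by a transporter; C is produced but never used.
smm = [(["A"], ["B"]), (["B"], ["C"])]
print(dead_end_metabolites(smm, transported={"A"}))
```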

81
Dead End Metabolites
  • Not yet an official part of Pathway Tools
  • Contact us if you'd like to use it

82
Reachability Analysis of Metabolic Networks
  • Given
  • A PGDB for an organism
  • A set of initial metabolites
  • Infer
  • What set of products can be synthesized by the
    small-molecule metabolism of the organism
  • Motivations
  • Quality control for PGDBs
  • Verify that a known growth medium yields known
    essential compounds
  • Experiment with other growth media
  • Experiment with reaction knock-outs
  • Limitations
  • Cannot properly handle compounds required for
    their own synthesis
  • Nutrients needed for reachability may be a
    superset of those required for growth

Romero and Karp, Pacific Symposium on
Biocomputing, 2001
83
Algorithm: Forward Propagation Through Production System
  • Each reaction becomes a production rule
  • Each of the 21 metabolites in the nutrient set
    becomes an axiom

A + B → C
84
Nutrients: A, B, C, E, F
A + B → W
C + D → X
E + F → Y
W + Y → Z
Produced compounds: W, Y, Z
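
The forward-propagation idea, where each reaction acts as a production rule and each nutrient as an axiom, can be sketched as a fixed-point loop. The function name and data layout are assumptions; the example reproduces the nutrients and reactions from this slide.

```python
def reachable(nutrients, reactions):
    """Return compounds that can be produced from the nutrient set.

    reactions: iterable of (substrates, products). A reaction fires once all
    of its substrates are available; firing repeats until nothing new appears.
    """
    known = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if set(substrates) <= known and not set(products) <= known:
                known |= set(products)
                changed = True
    return known - set(nutrients)

nutrients = {"A", "B", "C", "E", "F"}
reactions = [
    (["A", "B"], ["W"]),
    (["C", "D"], ["X"]),   # never fires: D is not available
    (["E", "F"], ["Y"]),
    (["W", "Y"], ["Z"]),
]
print(sorted(reachable(nutrients, reactions)))
```

Removing a reaction from the list and rerunning gives a simple way to experiment with knock-outs, and changing `nutrients` models a different growth medium.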
85
Initial Metabolite Nutrient Set (total 21 compounds)
86
Essential Compounds: E. coli (total 41 compounds)
  • Proteins (20)
  • Amino acids
  • Nucleic acids (DNA and RNA) (8)
  • Nucleosides
  • Cell membrane (3)
  • Phospholipids
  • Cell wall (10)
  • Peptidoglycan precursors
  • Outer cell wall precursors (Lipid-A,
    oligosaccharides)

88
  • http://brg.ai.sri.com/ptools09/slides/Tuesday/growth-experiment-Markus-Krummenacker.txt

89
Flux Balance Modeling
  • Generate, store, and update metabolic model
    within Pathway Tools
  • Fast, accurate generation of metabolic model
  • Close coupling to genome and regulatory
    information
  • Extensive schema
  • Extensive query and visualization tools
  • Debug/validate model using Pathway Tools
  • Export to SBML and import to constraint solver
    for model execution
  • Visualize reaction flux and omics data using
    overviews
  • Copy/update multiple PGDBs to reflect alternative
    strains