Title: Combine knowledge, data, and tools to
1Goals of Computational Biology
- Combine knowledge, data, and tools to
- conduct complex analyses.
- simulate complex phenomena.
2Goals of Computational Biology
- Combine knowledge, data, and tools to
Analysis and Prediction
- conduct complex analyses.
- simulate complex phenomena.
3Goals of Computational Biology
- Combine knowledge, data, and tools to
- conduct complex analyses.
- simulate complex phenomena.
Model Development
4Experimental Data
Knowledge
BioLingua
Interactive Guidance from Biologists
5BioLingua Computational Biology Workbench
- Integrates Genomic and Data Analysis Tools
- Integrates Organism-specific as well as General Knowledge
- Unifies Important Knowledge Bases
- Integrates Model Development and Refinement tools
- Offers a Flexible Open Programming Methodology
- Provides Convenient Universal Access (fully web-enabled)
6BioLingua Computational Biology Workbench
Unified Basic Concepts Layer
Structures provided for important
biological concepts e.g., reactions, molecules,
enzymes, experiments, expression-levels, etc.
Integrated K/DB Layer
KEGG
BioCyc
GO
Remote Access Other K/DBs
SMD
Locally mirror important K/DBs
7BioLingua Computational Biology Workbench
Computed Concepts Layer
An ever-expanding library of computations that
produce complex, virtual, biological concepts,
such as pathways, complexes, regulons, etc.
Unified Basic Concepts Layer
Structures provided for important
biological concepts e.g., reactions, molecules,
enzymes, experiments, expression-levels, etc.
Integrated K/DB Layer
KEGG
BioCyc
GO
Remote Access Other K/DBs
SMD
Locally mirror important K/DBs
8BioLingua Computational Biology Workbench
Standard analytic tools
Computed Concepts Layer
An ever-expanding library of computations that
produce complex, virtual, biological concepts,
such as pathways, complexes, regulons, etc.
Unified Basic Concepts Layer
Structures provided for important
biological concepts e.g., reactions, molecules,
enzymes, experiments, expression-levels, etc.
Integrated K/DB Layer
KEGG
BioCyc
GO
Remote Access Other K/DBs
SMD
Locally mirror important K/DBs
9BioLingua Computational Biology Workbench
Standard analytic tools, plus discovery tools
that combine knowledge and data under user
control.
Computed Concepts Layer
An ever-expanding library of computations that
produce complex, virtual, biological concepts,
such as pathways, complexes, regulons, etc.
Unified Basic Concepts Layer
Structures provided for important
biological concepts e.g., reactions, molecules,
enzymes, experiments, expression-levels, etc.
Integrated K/DB Layer
KEGG
BioCyc
GO
Remote Access Other K/DBs
SMD
Locally mirror important K/DBs
10BioLingua Computational Biology Workbench
BioLisp Scripting Layer
A simple programming language to be used by
biologists to answer specific questions
regarding the integration of their data with the
concepts below.
Standard analytic tools, plus discovery tools
that combine knowledge and data under user
control.
Computed Concepts Layer
An ever-expanding library of computations that
produce complex, virtual, biological concepts,
such as pathways, complexes, regulons, etc.
Unified Basic Concepts Layer
Structures provided for important
biological concepts e.g., reactions, molecules,
enzymes, experiments, expression-levels, etc.
Integrated K/DB Layer
KEGG
BioCyc
GO
Remote Access Other K/DBs
SMD
Locally mirror important K/DBs
11BioLisp
(loop for pm4gene in (Genes ProcMed4)
      as all-orthologous (all-blast-orthologs pm4gene)
      as 6803ortholog (intersect (Genes Syny6803) all-orthologous)
      when (and (not-any member-geneid (Genes slotv Proc9313) all-orthologous)
                (any 'member-geneID 6803ortholog)
                (> ma-ratio (ma-select 6803ortholog Hihara1) 2))
      collect light-specific-genes 6803ortholog)
- Biologically Specialized Programming Language
- Highly efficient (deeply compiled and optimized)
- Very Concise and Expressive (general purpose)
- Based upon Lisp
- The second oldest programming language
- The standard language of Artificial Intelligence
12BioLisp
- And 8th-graders can learn it!
13Goals of Computational Biology
Enable biologists to
- Combine knowledge, data, and tools to
Analysis and Prediction
- conduct complex analyses.
- simulate complex phenomena.
Model Development
14Goals of Computational Biology
Enable biologists to
- Combine knowledge, data, and tools to
Analysis and Prediction
- conduct complex analyses.
- simulate complex phenomena.
15How do cells control response to light?
I.e., What genes are related to the adaptation
to high light?
Prochlorococcus MED4
Prochlorococcus MIT9313
16How do cells control response to light?
I.e., What genes are related to the adaptation
to high light?
Outline Protocol
Look for
- Gene present in Prochlorococcus MED4
  (MED4 is naturally adapted to grow in high light)
- Ortholog absent in Prochlorococcus MIT9313
  (MIT9313 is naturally adapted to grow in low light)
- Ortholog present in Synechocystis PCC 6803
  (in order to make contact with annotation and microarray data)
- Synechocystis PCC 6803 ortholog responds to high light
  (gene turns on by a factor > 2 in response to high light)
17English Protocol
For each gene in ProMed4,
  find all of the gene's Blast orthologs,
  find those from Syny6803.
When there are not any Pro9313 genes in the Blast orthologs,
  and there are any 6803 orthologs,
  and the expression ratio for the 6803 orthologs
  in the Hihara microarray data is > 2,
collect the 6803 orthologs in a list called
light-specific-genes.
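The English protocol can also be sketched outside the workbench. The Python below is a minimal illustration only, not BioLingua code: every gene name, ortholog link, and expression ratio is made up, the dictionaries stand in for the workbench's knowledge bases, and reading "the expression ratio ... is > 2" as "at least one 6803 ortholog exceeds 2" is an assumption.

```python
# Toy stand-ins for the knowledge bases; all values below are hypothetical.
ORGANISM_OF = {
    "pm4-a": "ProMed4", "pm4-b": "ProMed4", "pm4-c": "ProMed4",
    "p9313-x": "Pro9313",
    "s6803-1": "Syny6803", "s6803-2": "Syny6803", "s6803-3": "Syny6803",
}
BLAST_ORTHOLOGS = {
    "pm4-a": ["s6803-1"],             # no Pro9313 ortholog, responsive 6803 ortholog
    "pm4-b": ["p9313-x", "s6803-2"],  # has a Pro9313 ortholog, so it is rejected
    "pm4-c": ["s6803-3"],             # 6803 ortholog does not respond to high light
}
HIGH_LIGHT_RATIO = {"s6803-1": 3.1, "s6803-2": 4.0, "s6803-3": 1.2}


def light_specific_genes(threshold=2.0):
    """Collect Syny6803 orthologs that pass every step of the protocol."""
    collected = []
    for gene, organism in ORGANISM_OF.items():
        if organism != "ProMed4":
            continue
        orthologs = BLAST_ORTHOLOGS.get(gene, [])
        in_6803 = [g for g in orthologs if ORGANISM_OF[g] == "Syny6803"]
        has_9313 = any(ORGANISM_OF[g] == "Pro9313" for g in orthologs)
        responds = any(HIGH_LIGHT_RATIO.get(g, 0.0) > threshold for g in in_6803)
        if not has_9313 and in_6803 and responds:
            collected.extend(in_6803)
    return collected


print(light_specific_genes())  # ['s6803-1']
```

Only the first toy gene survives all three filters, which mirrors how each clause of the protocol narrows the candidate set.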
19BioLisp Program
(loop for pm4gene in (Genes ProcMed4)
      as all-orthologous (all-blast-orthologs pm4gene)
      as 6803ortholog (intersect (Genes Syny6803) all-orthologous)
      when (and (not-any member-geneid (Genes slotv Proc9313) all-orthologous)
                (any 'member-geneID 6803ortholog)
                (> ma-ratio (ma-select 6803ortholog Hihara1) 2))
      collect light-specific-genes 6803ortholog)
By Jeff Elhai
21BioLingua Alpha Platform
(http://nostoc.stanford.edu:8002/biologin)
22Set Light-specific genes
Syny6803sll0990  Formaldehyde dehydrogenase (glutathione dependent)
Syny6803srl7009  trnR tRNA Arg (UCU)
Syny6803slr1331  Processing protease
Syny6803slr1332  fabF beta ketoacyl acyl carrier protein synthase
Syny6803sll0337  Sensor histidine kinase
Syny6803sll0335  Hypothetical
Syny6803sll0789  Response regulator (OmpR)
Syny6803sll0788  Hypothetical protein
Syny6803sll0576  Putative epimerase/hydratase
23Goals of Computational Biology
Enable biologists to
- Combine knowledge, data, and tools to
Analysis and Prediction
- conduct complex analyses.
- simulate complex phenomena.
24Cyclodyn Experimental Design
Continuous Culture Turbidostat
Light Levels
Sampling mRNA/cDNA
Time
25Light
26P700 apoprotein subunit Ia (psaA)
ATP synthase subunit a (atpI)
photosystem I subunit III (psaF)
photosystem II D2 protein (psbD2)
sensory transduction histidine kinase
phycocyanin β subunit (cpcB)
phycocyanin α subunit (cpcA)
allophycocyanin α chain (apcA)
photosystem II D1 protein (psbA2)
phycocyanin associated linker protein (cpcC)
OmpR subfamily hypothetical protein (atp1)
photosystem II CP43 protein (psbC)
photosystem II D1 protein (psbA3)
50S ribosomal protein L10 (rpl10)
27Verbal Model
When "awake" (day), the cell regulates its
photosystem (PS) genes so as to match
photosynthetic output to energy demands. When
the available light exceeds its needs, the PS is
down-regulated, leading to an "M" pattern of
expression. At night, the cell "sleeps", leading
to another drop in expression.
Graphical Model
28Computable Model
(photosynthesis isa process with
  inputs (chloroplast-inside.water everywhere.light chloroplast-outside.nadph
          chloroplast-outside.adp chloroplast-outside.pi)
  outputs (chloroplast-outside.atp chloroplast-outside.nadph everywhere.o2)
  implemented-by photosystem)
(photosystem composition (psii antenna-array atpase pq-pool))
(light-absorption isa process with
  inputs (everywhere.light)
  outputs (chlorophyll.energy)
  function absorption
  implemented-by chlorophyll)
(light-energy-concentration isa process with
  outputs psii.energy
  driver chlorophyll.energy
  function concentration
  implemented-by antenna-array)
(psii-water-breakdown isa process with
  inputs (chloroplast-inside.water)
  driver psii.energy
  outputs (psii.e- psii.e- chloroplast-inside.h chloroplast-inside.o2)
  function molecular-splitting
  implemented-by psii)
(psii-pq-reduction isa process with
  inputs (psii.e- chloroplast-membrane.h chloroplast-membrane.plastoquinone)
  outputs (chloroplast-membrane.plastoquinol)
  function reduction
  implemented-by psii
  inhibited-by dcmu)
30Explanation by Pathway Tracing
(track-object 'chloroplast-inside.water)
Tracking CHLOROPLAST-INSIDE.WATER
-> PHOTOSYNTHESIS
Tracking CHLOROPLAST-OUTSIDE.ATP
Tracking CHLOROPLAST-OUTSIDE.NADPH
Tracking EVERYWHERE.O2
-> PSII-WATER-BREAKDOWN
Tracking PSII.E-
-> PSII-PQ-REDUCTION
Tracking CHLOROPLAST-MEMBRANE.PLASTOQUINOL
-> E-FUNNLING-PSII-TO-PSI
Tracking PSI.E-
-> PSI-NADPH-FORMATION
Tracking CHLOROPLAST-INSIDE.H
-> ATP-FORMATION
Tracking CHLOROPLAST-INSIDE.O2
-> O2-DIFFUSSION
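A trace of this kind can be produced by following each object into every process that consumes it and then recursing on that process's outputs. The Python below is only a sketch of that idea and is not the workbench's track-object: the dictionary layout is an assumption, only three of the processes from the computable model are included, and the duplicated psii.e- output is listed once.

```python
# Simplified process table; the data layout is an assumption for this sketch.
PROCESSES = {
    "photosynthesis": {
        "inputs": ["chloroplast-inside.water", "everywhere.light"],
        "outputs": ["chloroplast-outside.atp", "chloroplast-outside.nadph",
                    "everywhere.o2"],
    },
    "psii-water-breakdown": {
        "inputs": ["chloroplast-inside.water"],
        "outputs": ["psii.e-", "chloroplast-inside.h", "chloroplast-inside.o2"],
    },
    "psii-pq-reduction": {
        "inputs": ["psii.e-", "chloroplast-membrane.h",
                   "chloroplast-membrane.plastoquinone"],
        "outputs": ["chloroplast-membrane.plastoquinol"],
    },
}


def track_object(obj, processes=PROCESSES, seen=None, depth=0):
    """Return trace lines for obj and everything produced downstream of it."""
    seen = set() if seen is None else seen
    indent = "  " * depth
    lines = [f"{indent}Tracking {obj.upper()}"]
    if obj in seen:          # already expanded once; avoids looping on cycles
        return lines
    seen.add(obj)
    for name, process in processes.items():
        if obj in process["inputs"]:
            lines.append(f"{indent}  -> {name.upper()}")
            for output in process["outputs"]:
                lines.extend(track_object(output, processes, seen, depth + 2))
    return lines


for line in track_object("chloroplast-inside.water"):
    print(line)
```

Because the model is declarative, the same table that describes the pathway also drives the explanation; no separate tracing rules are needed.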
32Current representational practice
From GenNav, the NIH Gene Ontology Browser
33Goal: Replace the Gene Ontology with Process Models
(photosynthesis isa process with
  inputs (chloroplast-inside.water everywhere.light chloroplast-outside.nadph
          chloroplast-outside.adp chloroplast-outside.pi)
  outputs (chloroplast-outside.atp chloroplast-outside.nadph everywhere.o2)
  implemented-by photosystem)
(photosystem composition (psii antenna-array atpase pq-pool))
(light-absorption isa process with
  inputs (everywhere.light)
  outputs (chlorophyll.energy)
  function absorption
  implemented-by chlorophyll)
(light-energy-concentration isa process with
  outputs psii.energy
  driver chlorophyll.energy
  function concentration
  implemented-by antenna-array)
(psii-water-breakdown isa process with
  inputs (chloroplast-inside.water)
  driver psii.energy
  outputs (psii.e- psii.e- chloroplast-inside.h chloroplast-inside.o2)
  function molecular-splitting
  implemented-by psii)
(psii-pq-reduction isa process with
  inputs (psii.e- chloroplast-membrane.h chloroplast-membrane.plastoquinone)
  outputs (chloroplast-membrane.plastoquinol)
  function reduction
  implemented-by psii
  inhibited-by dcmu)
From GenNav, the NIH Gene Ontology Browser
35BioLingua Frame Browser
NewGO (Computer-Usable Content)
NowGO (Human-Usable Content)
39Goals of Computational Biology
Enable biologists to
- Combine knowledge, data, and tools to
- conduct complex analyses.
- simulate complex phenomena.
Model Development
40Model Development
41Model Development as Search in Model Space
Generic Biological Processes
Experimental Data
Biological Constraints
Model Space Search
Model Fitting
Best Model and Parameterization
Possible Models
42Some Generic Biological Processes
(Diagram labels: Light, Photo Protein, PhotoSynthesis, UnControlled Degradation, Product, Redox Potential, Reactive Oxygen Species, Controlled Degradation, Control Species, Rate, Type I Regulation, Protein, RNA, Translation, Type II Regulation, Transcription; several links are marked "-")
43With no constraints, any genes could be
co-regulated
44How many regulatory models are there?
n = 300, L = 4
45How many regulatory models are there?
1/2(n^2 - n)^L
89700^4
Model Identification n requires 2
observations!
n = 300, L = 4
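The figure 89700 on this slide matches n^2 - n for n = 300, that is, the number of ordered pairs of distinct items among 300. The short Python check below reproduces that arithmetic; the slide does not define n and L, so they are treated here as plain integers, and the variable names are illustrative.

```python
n, L = 300, 4

# Ordered pairs of distinct items among n: n choices for the first,
# n - 1 for the second.
ordered_pairs = n * n - n
print(ordered_pairs)  # 89700

# Size of the space when the base count is raised to the power L.
model_count = ordered_pairs ** L
print(f"{model_count:.3e}")
```

Even for these modest values the count is far beyond what exhaustive enumeration can cover, which is the motivation for constraining the search.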
46Model Development Protocol
Generic Biological Processes
Experimental Data
Biological Constraints
Model Space Search
Model Fitting
Best Model and Parameterization
Possible Models
47Genomic Analysis
Regulon Architecture (simplified)
Some promoters have high sequence
similarity, suggesting correlated regulation
48Genomic analysis constrains co-regulation
49Data regularities further constrain probable
co-regulation
50Data regularities further constrain probable
co-regulation
51Regularities based upon predicted correlations
(Diagram labels: NBLA, NBLR, DFR; one link is marked "-")
52(Diagram labels: NBLA, NBLR, DFR; one link is marked "-"; relations are labeled Correlated or AntiCorrelated)
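One simple way to label a gene pair as Correlated or AntiCorrelated from expression data is to threshold the Pearson correlation of the two time series. The Python below is a sketch under that assumption; the deck does not say which statistic was used, the 0.8 cutoff is arbitrary, and the expression values are invented (only the gene names come from the slides).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def classify(xs, ys, cutoff=0.8):
    """Label a pair by the sign and strength of its correlation."""
    r = pearson(xs, ys)
    if r >= cutoff:
        return "Correlated"
    if r <= -cutoff:
        return "AntiCorrelated"
    return "Unconstrained"


# Hypothetical expression time series, for illustration only.
EXPRESSION = {
    "NBLA": [1.0, 2.0, 4.0, 3.0, 1.0],
    "NBLR": [1.1, 2.2, 3.9, 3.1, 0.9],
    "DFR": [4.0, 3.0, 1.0, 2.0, 4.2],
}

print(classify(EXPRESSION["NBLA"], EXPRESSION["NBLR"]))  # Correlated
print(classify(EXPRESSION["NBLA"], EXPRESSION["DFR"]))   # AntiCorrelated
```

Pairs that fall between the two cutoffs are left unconstrained, so the data only rule models out where the signal is strong.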
53Goals of Computational Biology
Enable biologists to
- Combine knowledge, data, and tools to
- conduct complex analyses.
- simulate complex phenomena.
Model Development
54Model Development Protocol
Generic Biological Processes
Experimental Data
Biological Constraints
Model Space Search
Model Fitting
Best Model and Parameterization
Possible Models
56Model with fitted parameters
58Goals of Computational Biology
Enable biologists to
- Combine knowledge, data, and tools to
- conduct complex analyses.
- simulate complex phenomena.
Model Development
59Model of Photosynthetic Light Regulation
(Diagram labels: NBLA, NBLR, PBS, DFR, Health, psbA1, RR, Photosynthetic activity, psbA2, Light, cpcB; several links are marked "-")
60Model Development Protocol
Biological Process Knowledge
Preliminary Hypotheses
Experimental Data
Model Neighborhood Search
61Neighborhood search limits the search
to subsystems thought to be relevant.
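A neighborhood search of this kind can be sketched as generating every model that differs from the current one by a single link, while only touching links among the nodes considered relevant. The Python below illustrates that idea under stated assumptions: the signed-edge dictionary is a hypothetical representation, the starting links are invented, and the deck's actual search operators are not specified.

```python
from itertools import permutations


def neighbors(model, relevant):
    """Yield models one edit away, editing only links among relevant nodes.

    model maps (source, target) to +1 (activates) or -1 (inhibits).
    """
    for edge in permutations(relevant, 2):
        if edge in model:
            # Existing link: try removing it, then try flipping its sign.
            yield {e: s for e, s in model.items() if e != edge}
            flipped = dict(model)
            flipped[edge] = -model[edge]
            yield flipped
        else:
            # Missing link: try adding it with either sign.
            for sign in (+1, -1):
                added = dict(model)
                added[edge] = sign
                yield added


# Hypothetical starting model; node names are taken from the slides.
model = {("Light", "RR"): +1, ("NBLR", "NBLA"): +1}
relevant = ["NBLA", "NBLR", "DFR"]

candidates = list(neighbors(model, relevant))
print(len(candidates))  # 12
```

With three relevant nodes there are six ordered pairs and two edits per pair, so twelve candidates; links outside the chosen subsystem, such as Light to RR here, are never changed.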
62Model of Photosynthetic Light Regulation
(Diagram labels: NBLA, NBLR, PBS, DFR, Health, psbA1, RR, Photosynthetic activity, psbA2, Light, cpcB; several links are marked "-")
63Adding Regulon Constraints
(Diagram labels: NBLA, NBLR, PBS, Regulon Constraints, DFR, Health, psbA1, RR, Photosynthetic activity, psbA2, Light, cpcB; several links are marked "-")
64Adding Biological Process Knowledge
(Diagram labels: energy, NBLA, NBLR, PBS, damage, Regulon Constraints, Health, DFR, psbA1, Signal Cascade, Signal Detection, RR, Photosynthetic activity, psbA2, Light, cpcB; several links are marked "-")
65Model Development Protocol
Biological Process Knowledge
Preliminary Hypotheses
Experimental Data
Model Neighborhood Search
66Improved Regulatory Model
3000 candidates tried
(Diagram labels: energy, NBLA, NBLR, PBS, damage, Regulon Constraints, Health, DFR, psbA1, Signal Cascade, Signal Detection, RR, Photosynthetic activity, psbA2, Light, cpcB; several links are marked "-")
67Goals of Computational Biology
Enable biologists to
- Combine knowledge, data, and tools to
Analysis and Prediction
- conduct complex analyses.
- simulate complex phenomena.
Model Development
68BioLingua Computational Biology Workbench
- Integrates Genomic and Data Analysis Tools
- Integrates Organism-specific as well as General Knowledge
- Unifies Important Knowledge Bases
- Integrates Model Development and Refinement tools
- Offers a Flexible Open Programming Methodology
- Provides Convenient Universal Access (fully web-enabled)
69BioLingua Computational Biology Workbench
- Integrates Genomic and Data Analysis Tools
- Integrates Organism-specific as well as General Knowledge
- Unifies Important Knowledge Bases
- Integrates Model Development and Refinement tools
- Offers a Flexible Open Programming Methodology
- Provides Convenient Universal Access (fully web-enabled)
- And 8th-graders can learn it!
70BioLingua
- JP Massar
- Mike Travers
- Stephen Bay
- Devaki Bhaya
- Jeff Elhai
- Bob Haxo
- Sumudu Watagala
- Monica Jain
- Ashvin Kumar
- Pat Langley
- Andrew Pohorille
- Karl Schweighofer
- Colin Smith
- Serdar Uckun
With support from NASA, Franz Inc., and Xanalys
71Cyclodyn Experiments
- Rochelle Labiosa
- Stephen Bay
- Devaki Bhaya
- CJ Tu
- Arthur Grossman
- Tasha Reddy
- Kevin Arrigo
With support from NASA, Stanford, and Arthur's and
Kevin's Labs
72Discovery Tools
- Pat Langley
- Stephen Bay
- Andrew Pohorille
- Lonnie Chrisman
- Kazumi Saito
- Dileep George
With support from NASA and NTT
73Available reports and papers
- JP Massar, M Travers, J Shrager (in prep.). BioLingua: A new paradigm in interactive computational biology. (And online: http://aracyc.stanford.edu/jshrager/jeff/mbcs/weblistener/index.html)
- S Bay, J Shrager, A Pohorille, P Langley (to appear). Revising regulatory networks: From expression data to linear causal models. J. Biomed. Informatics.
- K Saito, D George, S Bay, J Shrager (2003). Inducing biological models from temporal gene expression data. Proceedings of the 6th International Conference on Discovery Systems. Sapporo, Japan.
- J Shrager (2003). The fiction of function. BioInformatics.
- L Chrisman, et al. (2003). Incorporating biological knowledge into evaluation of causal regulatory hypotheses. Proc. of the Pacific Symposium on Biocomputing (PSB2003). Hawaii.
- R Labiosa, et al. (2003). Diurnal variations in pathways of photosynthetic carbon fixation in a freshwater cyanobacterium. Presented at the Euro-American Geophysical Society meeting, Nice, France.
- P Langley, J Shrager, K Saito (2002). Computational discovery of communicable scientific knowledge. In Magnani, Nersessian, Pizzi (Eds), Logical and Computational Aspects of Model-Based Reasoning. Dordrecht: Kluwer Academic.
- J Shrager, P Langley, A Pohorille (2002). Guiding revision of regulatory models with expression data. Proc. of the Pacific Symposium on BioComputing. World Scientific Press.
- J Shrager (2001). High throughput discovery: Search and interpretation on the path to new drugs. In K. Crowley, et al. (Eds.) Design for Science. Hillsdale, NJ: Lawrence Erlbaum. pp 325-348.