Title:

Combine knowledge, data, and tools to

Slides: 74
Provided by: Revi152

Transcript and Presenter's Notes

Title: Combine knowledge, data, and tools to


1
Goals of Computational Biology
  • Combine knowledge, data, and tools to
  • conduct complex analyses.
  • simulate complex phenomena.
  • propose abstract models.
  • refine specific models.

2
Goals of Computational Biology
  • Combine knowledge, data, and tools to

Analysis and Prediction
  • conduct complex analyses.
  • simulate complex phenomena.
  • propose abstract models.
  • refine specific models.

3
Goals of Computational Biology
  • Combine knowledge, data, and tools to
  • conduct complex analyses.
  • simulate complex phenomena.
  • propose abstract models.
  • refine specific models.

Model Development
4
Experimental Data
Knowledge
BioLingua
Interactive Guidance from Biologists
5
BioLingua Computational Biology Workbench
  • Integrates Genomic and Data Analysis Tools
  • Integrates Organism-specific as well as General
    Knowledge
  • Unifies Important Knowledge Bases
  • Integrates Model Development and Refinement
    tools
  • Offers a Flexible Open Programming Methodology
  • Provides Convenient Universal Access (fully
    web-enabled)

10
BioLingua Computational Biology Workbench
BioLisp Scripting Layer
  A simple programming language to be used by biologists to answer
  specific questions regarding the integration of their data with the
  concepts below.
Standard analytic tools, plus discovery tools that combine knowledge
and data under user control.
Computed Concepts Layer
  An ever-expanding library of computations that produce complex,
  virtual, biological concepts, such as pathways, complexes, regulons, etc.
Unified Basic Concepts Layer
  Structures provided for important biological concepts, e.g., reactions,
  molecules, enzymes, experiments, expression-levels, etc.
Integrated K/DB Layer
  Locally mirror important K/DBs: KEGG, BioCyc, GO
  Remote access to other K/DBs: SMD
12
BioLisp
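(loop for pm4gene in (Genes ProcMed4)
      ;; every BLAST ortholog of the MED4 gene, in any organism
      as all-orthologous = (all-blast-orthologs pm4gene)
      ;; the subset of those orthologs found in Synechocystis PCC 6803
      as 6803ortholog = (intersect (Genes Syny6803) all-orthologous)
      when (and (not-any member-geneid (Genes slotv Proc9313) all-orthologous)
                (any 'member-geneID 6803ortholog)
                ;; microarray expression ratio above 2 in the Hihara data
                (> (ma-ratio (ma-select 6803ortholog Hihara1)) 2))
      collect 6803ortholog into light-specific-genes)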
  • Biologically Specialized Programming Language
  • Highly efficient (deeply compiled and optimized)
  • Very Concise and Expressive (general purpose)
  • Based upon Lisp
  • The second oldest programming language
  • The standard language of Artificial Intelligence
  • And 8th-graders can learn it!

13
Goals of Computational Biology
Enable biologists to
  • Combine knowledge, data, and tools to

Analysis and Prediction
  • conduct complex analyses.
  • simulate complex phenomena.
  • propose abstract models.
  • refine specific models.

Model Development
14
Goals of Computational Biology
Enable biologists to
  • Combine knowledge, data, and tools to

Analysis and Prediction
  • conduct complex analyses.
  • simulate complex phenomena.
  • propose abstract models.
  • refine specific models.

15
How do cells control response to light?
I.e., What genes are related to the adaptation
to high light?
Prochlorococcus MED4
Prochlorococcus MIT9313
16
How do cells control response to light?
I.e., What genes are related to the adaptation
to high light?
Outline Protocol
Look for
  • Gene present in Prochlorococcus MED4
    (MED4 is naturally adapted to grow in high light)
  • Ortholog absent in Prochlorococcus MIT9313
    (MIT9313 is naturally adapted to grow in low light)
  • Ortholog present in Synechocystis PCC 6803
    (in order to make contact with annotation and microarray data)
  • Synechocystis PCC 6803 ortholog responds to high light
    (gene turns on by a factor > 2 in response to high light)

17
English Protocol
For each gene in ProMed4,
  find all of the gene's BLAST orthologs;
  find those that come from Syny6803.
  When there are not any Pro9313 genes among the BLAST orthologs,
    and there are any 6803 orthologs,
    and the expression ratio for the 6803 orthologs in the Hihara
    microarray data is > 2,
  collect the 6803 orthologs in a list called light-specific-genes.
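The protocol above can be sketched outside the workbench as a plain filter over gene sets. This is a minimal Python illustration, not BioLingua code: the gene identifiers, the ortholog table, and the expression ratios are hypothetical toy values, and requiring every 6803 ortholog to exceed the threshold is an assumption, since the slide does not say how multiple orthologs are handled.

```python
# Toy stand-ins for the workbench's gene sets, BLAST ortholog table, and
# microarray ratios. Every name and value below is hypothetical.
ORTHOLOGS = {
    "pm4-a": {"syn-1"},
    "pm4-b": {"syn-2", "p9313-x"},
    "pm4-c": {"syn-3"},
    "pm4-d": set(),
}
SYN6803_GENES = {"syn-1", "syn-2", "syn-3"}
P9313_GENES = {"p9313-x"}
HIGH_LIGHT_RATIO = {"syn-1": 3.5, "syn-2": 4.0, "syn-3": 1.2}


def light_specific_genes(med4_genes, orthologs, syn_genes, p9313_genes,
                         ratios, threshold=2.0):
    """Return 6803 orthologs of MED4 genes that pass all protocol filters."""
    selected = []
    for gene in med4_genes:
        all_orthologs = orthologs.get(gene, set())
        syn_orthologs = sorted(all_orthologs & syn_genes)
        if all_orthologs & p9313_genes:
            continue  # an ortholog exists in the low-light strain
        if not syn_orthologs:
            continue  # no link to annotation and microarray data
        # Assumption: every 6803 ortholog must exceed the threshold.
        if all(ratios.get(g, 0.0) > threshold for g in syn_orthologs):
            selected.extend(syn_orthologs)
    return selected


result = light_specific_genes(sorted(ORTHOLOGS), ORTHOLOGS, SYN6803_GENES,
                              P9313_GENES, HIGH_LIGHT_RATIO)
print(result)
```

In this toy data only pm4-a survives all three filters: pm4-b has an ortholog in the low-light strain, pm4-c has a ratio below the threshold, and pm4-d has no orthologs at all.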
19
BioLisp Program
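(loop for pm4gene in (Genes ProcMed4)
      ;; every BLAST ortholog of the MED4 gene, in any organism
      as all-orthologous = (all-blast-orthologs pm4gene)
      ;; the subset of those orthologs found in Synechocystis PCC 6803
      as 6803ortholog = (intersect (Genes Syny6803) all-orthologous)
      when (and (not-any member-geneid (Genes slotv Proc9313) all-orthologous)
                (any 'member-geneID 6803ortholog)
                ;; microarray expression ratio above 2 in the Hihara data
                (> (ma-ratio (ma-select 6803ortholog Hihara1)) 2))
      collect 6803ortholog into light-specific-genes)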
By Jeff Elhai
21
BioLingua Alpha Platform
(http://nostoc.stanford.edu:8002/biologin)
22
Set: Light-specific genes
  Syny6803sll0990   Formaldehyde dehydrogenase (glutathione dependent)
  Syny6803srl7009   trnR tRNA Arg (UCU)
  Syny6803slr1331   Processing protease
  Syny6803slr1332   fabF beta ketoacyl acyl carrier protein synthase
  Syny6803sll0337   Sensor histidine kinase
  Syny6803sll0335   Hypothetical
  Syny6803sll0789   Response regulator (OmpR)
  Syny6803sll0788   Hypothetical protein
  Syny6803sll0576   Putative epimerase/hydratase
23
Goals of Computational Biology
Enable biologists to
  • Combine knowledge, data, and tools to

Analysis and Prediction
  • conduct complex analyses.
  • simulate complex phenomena.
  • propose abstract models.
  • refine specific models.

24
Cyclodyn Experimental Design
Continuous Culture Turbidostat
Light Levels
Sampling mRNA/cDNA
Time
25
Light
26
  • P700 apoprotein subunit Ia (psaA)
  • ATP synthase subunit a (atpI)
  • photosystem I subunit III (psaF)
  • photosystem II D2 protein (psbD2)
  • sensory transduction histidine kinase
  • phycocyanin b subunit (cpcB)
  • phycocyanin a subunit (cpcA)
  • allophycocyanin a chain (apcA)
  • photosystem II D1 protein (psbA2)
  • phycocyanin associated linker protein (cpcC)
  • OmpR subfamily
  • hypothetical protein (atp1)
  • photosystem II CP43 protein (psbC)
  • photosystem II D1 protein (psbA3)
  • 50S ribosomal protein L10 (rpl10)
27
Verbal Model
When "awake" (day), the cell regulates its
photosystem (PS) genes to match photosynthetic
output to energy demands. When the available
light exceeds its needs, the PS is down-regulated,
producing an "M" pattern of expression over the
day. At night the cell "sleeps", and expression
drops again.
Graphical Model
28
Computable Model
(photosynthesis isa process with
  inputs (chloroplast-inside.water everywhere.light
          chloroplast-outside.nadph chloroplast-outside.adp
          chloroplast-outside.pi)
  outputs (chloroplast-outside.atp chloroplast-outside.nadph
           everywhere.o2)
  implemented-by photosystem)
(photosystem composition (psii antenna-array atpase pq-pool))
(light-absorption isa process with
  inputs (everywhere.light)
  outputs (chlorophyll.energy)
  function absorption
  implemented-by chlorophyll)
(light-energy-concentration isa process with
  outputs psii.energy
  driver chlorophyll.energy
  function concentration
  implemented-by antenna-array)
(psii-water-breakdown isa process with
  inputs (chloroplast-inside.water)
  driver psii.energy
  outputs (psii.e- psii.e- chloroplast-inside.h chloroplast-inside.o2)
  function molecular-splitting
  implemented-by psii)
(psii-pq-reduction isa process with
  inputs (psii.e- chloroplast-membrane.h
          chloroplast-membrane.plastoquinone)
  outputs (chloroplast-membrane.plastoquinol)
  function reduction
  implemented-by psii
  inhibited-by dcmu)
30
Explanation by Pathway Tracing
(track-object 'chloroplast-inside.water)
Tracking CHLOROPLAST-INSIDE.WATER
  -> PHOTOSYNTHESIS
Tracking CHLOROPLAST-OUTSIDE.ATP
Tracking CHLOROPLAST-OUTSIDE.NADPH
Tracking EVERYWHERE.O2
  -> PSII-WATER-BREAKDOWN
Tracking PSII.E-
  -> PSII-PQ-REDUCTION
Tracking CHLOROPLAST-MEMBRANE.PLASTOQUINOL
  -> E-FUNNLING-PSII-TO-PSI
Tracking PSI.E-
  -> PSI-NADPH-FORMATION
Tracking CHLOROPLAST-INSIDE.H
  -> ATP-FORMATION
Tracking CHLOROPLAST-INSIDE.O2
  -> O2-DIFFUSSION
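A trace like the one above can be produced by following each object into the processes that consume it, and then following those processes' outputs in turn. The Python sketch below illustrates the idea on a three-process subset of the photosynthesis model. The dictionary layout, the `track_object` function, and the breadth-first visiting order are assumptions made for illustration, not the workbench's implementation.

```python
from collections import deque

# Subset of the process model; each process lists what it consumes
# (inputs plus driver) and what it emits. The layout is an assumption.
PROCESSES = {
    "photosynthesis": {
        "inputs": ["chloroplast-inside.water", "everywhere.light"],
        "outputs": ["chloroplast-outside.atp", "chloroplast-outside.nadph",
                    "everywhere.o2"],
    },
    "psii-water-breakdown": {
        "inputs": ["chloroplast-inside.water", "psii.energy"],
        "outputs": ["psii.e-", "chloroplast-inside.h",
                    "chloroplast-inside.o2"],
    },
    "psii-pq-reduction": {
        "inputs": ["psii.e-", "chloroplast-membrane.h",
                   "chloroplast-membrane.plastoquinone"],
        "outputs": ["chloroplast-membrane.plastoquinol"],
    },
}


def track_object(start, processes):
    """Return (object, process) steps reachable from `start`, breadth-first."""
    steps = []
    seen = {start}
    queue = deque([start])
    while queue:
        obj = queue.popleft()
        for name, proc in processes.items():
            if obj in proc["inputs"]:
                steps.append((obj, name))
                for out in proc["outputs"]:
                    if out not in seen:  # visit each object only once
                        seen.add(out)
                        queue.append(out)
    return steps


trace = track_object("chloroplast-inside.water", PROCESSES)
for obj, process in trace:
    print(f"Tracking {obj.upper()} -> {process.upper()}")
```

Because each object is visited once, the trace terminates even if the model contains cycles, such as a product that feeds back into an upstream process.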
32
Current representational practice
From GenNav, the NIH Gene Ontology Browser
33
Goal: Replace the Gene Ontology with Process Models
From GenNav, the NIH Gene Ontology Browser
35
BioLingua Frame Browser
NewGO (Computer-Usable Content)
NowGO (Human-Usable Content)
39
Goals of Computational Biology
Enable biologists to
  • Combine knowledge, data, and tools to
  • conduct complex analyses.
  • simulate complex phenomena.
  • propose abstract models.
  • refine specific models.

Model Development
40
Model Development
41
Model Development as Search in Model Space
Generic Biological Processes
Experimental Data
Biological Constraints
Model Space Search
Model Fitting
Best Model and Parameterization
Possible Models
42
Some Generic Biological Processes
[Diagram of generic process templates: PhotoSynthesis, UnControlled
Degradation, Controlled Degradation, Type I Regulation, Type II
Regulation, Translation, Transcription. Labeled quantities: Light,
Photo Protein, Product, Redox Potential, Reactive Oxygen Species,
Control Species, Rate, Protein, RNA.]
43
With no constraints, any genes could be
co-regulated
44
How many regulatory models are there?
n = 300, L = 4
45
How many regulatory models are there?
$2 \cdot \tfrac{1}{2}(n^2 - n) \cdot L = 89700 \cdot 4$
Model identification requires $n^2$ observations!
n = 300, L = 4
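The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes one reading of the slide, which does not define its symbols: n is the number of genes, L is the number of link types, and each unordered pair of distinct genes contributes two directed links. The function name is hypothetical.

```python
def candidate_link_count(n_genes, n_link_types):
    """Count directed, typed links between distinct genes (assumed reading)."""
    unordered_pairs = (n_genes * n_genes - n_genes) // 2  # 1/2 (n^2 - n)
    directed_pairs = 2 * unordered_pairs                  # either direction
    return directed_pairs * n_link_types


directed_pairs = 2 * ((300 * 300 - 300) // 2)
total_links = candidate_link_count(300, 4)
print(directed_pairs, total_links)
```

With n = 300 the directed-pair count reproduces the 89700 shown on the slide, and the n-squared growth is why unconstrained model identification needs so many observations.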
46
Model Development Protocol
Generic Biological Processes
Experimental Data
Biological Constraints
Model Space Search
Model Fitting
Best Model and Parameterization
Possible Models
47
Genomic Analysis
Regulon Architecture (simplified)
Some promoters have high sequence similarity,
suggesting correlated regulation
48
Genomic analysis constrains co-regulation
49
Data regularities further constrain probable
co-regulation
51
Regularities based upon predicted correlations
[Diagram: NBLA, NBLR, DFR]
52
[Diagram: NBLA, NBLR, DFR, with links labeled Correlated and AntiCorrelated]
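One simple way to turn expression data into Correlated and AntiCorrelated labels of this kind is to compute the Pearson correlation between pairs of expression profiles and apply a cutoff. The Python sketch below uses made-up profiles for hypothetical genes, and the 0.8 cutoff is an arbitrary assumption. The slides do not state which statistic or threshold was actually used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def classify(r, cutoff=0.8):
    """Label a correlation; the cutoff value is an assumed parameter."""
    if r >= cutoff:
        return "Correlated"
    if r <= -cutoff:
        return "AntiCorrelated"
    return "Unconstrained"


# Hypothetical expression profiles over four time points.
PROFILES = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

label_ab = classify(pearson(PROFILES["geneA"], PROFILES["geneB"]))
label_ac = classify(pearson(PROFILES["geneA"], PROFILES["geneC"]))
print(label_ab, label_ac)
```

Pairs labeled this way can then be used as constraints: a candidate model that predicts the opposite sign for a strongly correlated pair can be discarded before any fitting.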
53
Goals of Computational Biology
Enable biologists to
  • Combine knowledge, data, and tools to
  • conduct complex analyses.
  • simulate complex phenomena.
  • propose abstract models.
  • refine specific models.

Model Development
54
Model Development Protocol
Generic Biological Processes
Experimental Data
Biological Constraints
Model Space Search
Model Fitting
Best Model and Parameterization
Possible Models
56
Model with fitted parameters
58
Goals of Computational Biology
Enable biologists to
  • Combine knowledge, data, and tools to
  • conduct complex analyses.
  • simulate complex phenomena.
  • propose abstract models.
  • refine specific models.

Model Development
59
Model of Photosynthetic Light Regulation
[Regulatory diagram; nodes: NBLA, NBLR, PBS, DFR, Health, psbA1, RR,
Photosynthetic activity, psbA2, Light, cpcB]
60
Model Development Protocol
Biological Process Knowledge
Preliminary Hypotheses
Experimental Data
Model Neighborhood Search
61
Neighborhood search limits the search
to subsystems thought to be relevant.
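The idea can be illustrated with a small hill-climbing loop that revises one link at a time, but only among links the modeler has marked as editable. Everything in this Python sketch is an assumption made for illustration: the model is a dictionary from (regulator, target) pairs to signs, the score simply counts observed sign relations that the model reproduces, and the gene names are hypothetical. The actual system fits models to expression data rather than counting sign matches.

```python
from itertools import product


def score(model, observed):
    """Count observed sign relations that the model reproduces."""
    return sum(1 for link, sign in observed.items()
               if model.get(link, 0) == sign)


def neighbors(model, editable_links):
    """Yield every model that differs from `model` in one editable link."""
    for link, sign in product(editable_links, (-1, 0, 1)):
        if model.get(link, 0) != sign:
            candidate = dict(model)
            candidate[link] = sign
            yield candidate


def neighborhood_search(model, editable_links, observed):
    """Greedy search restricted to the editable links."""
    current = dict(model)
    current_score = score(current, observed)
    tried = 0
    improved = True
    while improved:
        improved = False
        for candidate in neighbors(current, editable_links):
            tried += 1
            candidate_score = score(candidate, observed)
            if candidate_score > current_score:
                current, current_score = candidate, candidate_score
                improved = True
                break  # restart from the improved model
    return current, current_score, tried


# Hypothetical starting model and observations (+1 activates, -1 represses).
START = {("light", "g1"): 1, ("g1", "g2"): 1, ("g3", "g4"): 1}
OBSERVED = {("light", "g1"): -1, ("g1", "g2"): 1, ("g3", "g4"): -1}
EDITABLE = [("light", "g1"), ("g1", "g2")]

best, best_score, tried = neighborhood_search(START, EDITABLE, OBSERVED)
print(best, best_score, tried)
```

The link from g3 to g4 stays wrong in the result because it lies outside the editable neighborhood. That is the trade-off the slide describes: far fewer candidates are tried, at the cost of leaving untouched any errors outside the subsystems thought to be relevant.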
63
Adding Regulon Constraints
[Regulatory diagram annotated with Regulon Constraints; nodes: NBLA,
NBLR, PBS, DFR, Health, psbA1, RR, Photosynthetic activity, psbA2,
Light, cpcB]
64
Adding Biological Process Knowledge
[Regulatory diagram annotated with Regulon Constraints, Signal Cascade,
and Signal Detection; nodes: energy, damage, NBLA, NBLR, PBS, DFR,
Health, psbA1, RR, Photosynthetic activity, psbA2, Light, cpcB]
65
Model Development Protocol
Biological Process Knowledge
Preliminary Hypotheses
Experimental Data
Model Neighborhood Search
66
Improved Regulatory Model
3000 candidates tried
[Regulatory diagram annotated with Regulon Constraints, Signal Cascade,
and Signal Detection; nodes: energy, damage, NBLA, NBLR, PBS, DFR,
Health, psbA1, RR, Photosynthetic activity, psbA2, Light, cpcB]
67
Goals of Computational Biology
Enable biologists to
  • Combine knowledge, data, and tools to

Analysis and Prediction
  • conduct complex analyses.
  • simulate complex phenomena.
  • propose abstract models.
  • refine specific models.

Model Development
69
BioLingua Computational Biology Workbench
  • Integrates Genomic and Data Analysis Tools
  • Integrates Organism-specific as well as General
    Knowledge
  • Unifies Important Knowledge Bases
  • Integrates Model Development and Refinement
    tools
  • Offers a Flexible Open Programming Methodology
  • Provides Convenient Universal Access (fully
    web-enabled)
  • And 8th-graders can learn it!

70
BioLingua
  • JP Massar
  • Mike Travers
  • Stephen Bay
  • Devaki Bhaya
  • Jeff Elhai
  • Bob Haxo
  • Sumudu Watagala
  • Monica Jain
  • Ashvin Kumar
  • Pat Langley
  • Andrew Pohorille
  • Karl Schweighofer
  • Colin Smith
  • Serdar Uckun

With support from NASA, Franz Inc., and Xanalys
71
Cyclodyn Experiments
  • Rochelle Labiosa
  • Stephen Bay
  • Devaki Bhaya
  • CJ Tu
  • Arthur Grossman
  • Tasha Reddy
  • Kevin Arrigo

With support from NASA, Stanford, and Arthur's and
Kevin's labs
72
Discovery Tools
  • Pat Langley
  • Stephen Bay
  • Andrew Pohorille
  • Lonnie Chrisman
  • Kazumi Saito
  • Dileep George

With support from NASA and NTT
73
Available reports and papers
  • JP Massar, M Travers, J Shrager (in prep.). BioLingua: A new
    paradigm in interactive computational biology. (And online:
    http://aracyc.stanford.edu/jshrager/jeff/mbcs/weblistener/index.html)
  • S Bay, J Shrager, A Pohorille, P Langley (to appear). Revising
    regulatory networks: From expression data to linear causal models.
    J. Biomed. Informatics.
  • K Saito, D George, S Bay, J Shrager (2003). Inducing biological
    models from temporal gene expression data. Proceedings of the 6th
    International Conference on Discovery Systems. Sapporo, Japan.
  • J Shrager (2003). The fiction of function. BioInformatics.
  • L Chrisman, et al. (2003). Incorporating biological knowledge into
    evaluation of causal regulatory hypotheses. Proc. of the Pacific
    Symposium on Biocomputing (PSB2003). Hawaii.
  • R Labiosa, et al. (2003). Diurnal variations in pathways of
    photosynthetic carbon fixation in a freshwater cyanobacterium.
    Presented at the Euro-American Geophysical Society meeting, Nice,
    France.
  • P Langley, J Shrager, K Saito (2002). Computational discovery of
    communicable scientific knowledge. In Magnani, Nersessian, Pizzi
    (Eds), Logical and Computational Aspects of Model-Based Reasoning.
    Dordrecht: Kluwer Academic.
  • J Shrager, P Langley, A Pohorille (2002). Guiding revision of
    regulatory models with expression data. Proc. of the Pacific
    Symposium on BioComputing. World Scientific Press.
  • J Shrager (2001). High throughput discovery: Search and
    interpretation on the path to new drugs. In K. Crowley, et al.
    (Eds.) Design for Science. Hillsdale, NJ: Lawrence Erlbaum.
    pp 325-348.