Title: Domain-specific Languages for Cellular Interactions
Domain-specific Languages for Cellular Interactions
- Bill Harrison
- Department of Computer Science
- University of Missouri at Columbia
This work was partially supported by NIH R01 GM62920-04A1, NIH P20 GM065762-01A1, the Georgia Research Alliance, and the Georgia Cancer Coalition.
- Ph.D. 2001, UIUC
  - Thesis: Modular Compilers and Their Correctness Proofs
  - Thesis advisor: Sam Kamin
- Post-doc, Oregon Graduate Institute (OGI)
  - three years on the Programatica project
  - using the Haskell programming language as a basis for formal methods
- Assistant Professor, University of Missouri-Columbia, since Fall 2003
Systems Biology asks:
- Can static biological structure be related to dynamic biological behavior with mathematical clarity, precision, and rigor?
- Can biological systems be viewed as the sum of their parts?
- Can component-level models be integrated into precise system-level models of biological behavior?
- What techniques from mathematics and computer science apply to this composition problem?
Rhodobacter sphaeroides
- Photosynthetic bacterium
- Seeks out regions of greater light
- Roughly the size of a wavelength of light
  - cannot sense local light differences directly
  - ⇒ performs a random walk
Simulations of Biological Systems
- Simulations provide qualitative feedback, but are not models per se
  - How accurate/faithful is a simulation?
  - What does the feedback mean?
  - Can one reason about the biological phenomenon based on the simulation?
  - Can you identify the biology by inspecting the text of the simulation program?
R. sphaeroides in C
- contains roughly 1,000 LOC
- understanding it requires
  - expertise in C
  - and knowledge of the biological model
  - and of critical system details
    - e.g., how is concurrency implemented?

    bool global_state::register_state(void *apointer) {
      // grow the state table in chunks of 1000 entries when it is full
      // (operators reconstructed from a garbled extraction)
      if (number_of_states >= mother_of_all_states.size())
        mother_of_all_states.resize(number_of_states + 1000);
      mother_of_all_states[number_of_states++] = apointer;
      return true;
    }
R. sphaeroides in C
- Program structure does not reflect the biological model
  - Can you look at the source code and recognize the underlying biology?
- ⇒ difficult to comprehend
  - and to write correctly
  - and to modify
  - and to maintain
  - and to re-use
Systems Biology as Programming Language Design
- The Problem
  - General-purpose programming languages do not have the right vocabulary
    - Biological model: concurrent Markov chains
    - C: classes, pointers, etc.
  - nor are they mathematics
- Our Solution: design small, special-purpose languages with exactly the right vocabulary
  - called Domain-specific Languages (DSLs) [Sheard99, Thiemann01, Leijen01]
  - the mathematical semantics of a DSL gives a formal model of the biology
Language Model of R. sphaeroides
[Figure: the cells cell1 … celln of the model; executing the model produces an animation]
Outline
- Language Design and Domain-specific Languages
  - design, definition, and implementation
- Systems Biology as Language Design
- Case Study: Rhodobacter sphaeroides
  - Design: what are the appropriate abstractions for R. sphaeroides?
  - Definition: how do we specify exactly what R. sphaeroides programs mean?
  - Implementation: how do we run R. sphaeroides programs?
- Conclusions
Cardinal Rule of Language Design
Application programmers should choose languages with the abstractions best suited to their task. Language designers must provide languages with those abstractions.

Domain                    | Central Activities     | Reasonable Language
System Programming        | bit-fiddling           | C
Artificial Intelligence   | list processing        | LISP
System Administration     | text processing, etc.  | Perl
DSLs are small languages with domain abstractions
Example: the Parsec parser DSL
The BNF for the language:
    <Stmt> → <ident> := <Expr>
translates directly into Parsec code:
    assignStmt :: Parser Stmt
    assignStmt = do id <- ident
                    symbol ":="
                    s <- expr
                    return (Assign id s)
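A runnable version of this example, as a minimal sketch using the parsec package's Text.Parsec module. The Expr type, the lexeme/symbol/ident/expr helpers, and the ":=" token are assumptions for illustration (the slide's assignment symbol did not survive extraction); the point is that assignStmt mirrors the BNF rule line for line.

```haskell
-- Sketch only: Expr, the helper parsers, and ":=" are hypothetical.
import Text.Parsec
import Text.Parsec.String (Parser)

data Expr = Var String | Lit Integer
  deriving (Show, Eq)

data Stmt = Assign String Expr
  deriving (Show, Eq)

-- run a parser, then skip any trailing whitespace
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

symbol :: String -> Parser String
symbol s = lexeme (string s)

-- identifiers: a letter followed by letters or digits
ident :: Parser String
ident = lexeme ((:) <$> letter <*> many alphaNum)

-- a deliberately tiny expression language: integer literals or variables
expr :: Parser Expr
expr = (Lit . read <$> lexeme (many1 digit)) <|> (Var <$> ident)

-- <Stmt> → <ident> := <Expr>, transcribed rule for rule
assignStmt :: Parser Stmt
assignStmt = do
  x <- ident
  _ <- symbol ":="
  s <- expr
  return (Assign x s)

-- parse a complete statement (leading whitespace allowed, nothing after it)
parseStmt :: String -> Either ParseError Stmt
parseStmt = parse (spaces *> assignStmt <* eof) ""
```

For example, parseStmt "x := 42" yields Right (Assign "x" (Lit 42)), while "x = 1" is rejected because the grammar requires ":=".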
Why a language and not a library?
- The slogan: what is excluded from a DSL is as important as what is included in it
- Libraries in a general-purpose language still require considerable expertise and self-discipline on the part of the programmer
- Lack of generality in a DSL ⇒ fewer things to go wrong
- A DSL may have desirable properties that a general-purpose language will not
  - e.g., implementation techniques specialized to the DSL that do not apply to general-purpose languages
  - its small size makes rigorous specification tractable
DSL Design
- DSL design for R. sphaeroides
  - What are our domain abstractions?
  - How does this organism behave?
  - What modeling techniques do biologists use to describe this behavior?
Bacterial Commands
- laze
- die
- adjust speed
- grow
- divide
- tumble
The probability of growth varies with light concentration.
Chapman-Kolmogorov Equation
[Equation image not preserved; annotated terms: the probability of being in state m, and the probability of a transition from i to j]
A commonly used framework for modeling biological systems [Bremaud99, Dailey02, Mao02, Shah00]
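For reference, the standard discrete-time form, assuming p_j^{(m)} denotes the probability of being in state j after m steps and P_{i,j} the one-step probability of a transition from i to j (the slide's own notation was lost). The first equation propagates the state distribution by one step; the second is the Chapman-Kolmogorov identity for multi-step transition probabilities.

```latex
p_j^{(m+1)} = \sum_i p_i^{(m)} \, P_{i,j}
\qquad\qquad
P^{(m+n)}_{i,j} = \sum_k P^{(m)}_{i,k} \, P^{(n)}_{k,j}
```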
Chapman-Kolmogorov Equation
Row i of the transition matrix encodes the transition function out of state i of a Markov chain.
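A minimal Haskell sketch of this idea, assuming a finite chain whose transition matrix is stored as a list of rows: row i is exactly the transition function out of state i, so it must be a probability distribution, and one step of the chain is the vector-matrix product from the equation. The two-state run/tumble chain and its numbers are hypothetical; exact Rational arithmetic keeps the checks precise.

```haskell
import Data.List (transpose)
import Data.Ratio ((%))

type Prob = Rational

-- m !! i is row i: the transition function out of state i
type Matrix = [[Prob]]

-- a transition matrix is stochastic when every row is a distribution
isStochastic :: Matrix -> Bool
isStochastic = all (\row -> all (>= 0) row && sum row == 1)

-- one step of the chain: p'_j = sum over i of p_i * P_{i,j}
step :: [Prob] -> Matrix -> [Prob]
step p m = [sum (zipWith (*) p col) | col <- transpose m]

-- hypothetical two-state chain: state 0 = run, state 1 = tumble
example :: Matrix
example =
  [ [9 % 10, 1 % 10]
  , [1 % 2,  1 % 2 ] ]
```

Starting surely in state 0, two steps give the distribution [43 % 50, 7 % 50].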
Bacteria as Markov Chains
- nondeterministic state machines with probabilistic transitions
  - induced by the Chapman-Kolmogorov equation
  - P_{i,j} expressed in terms of environmental factors, organism state, etc.
- executing concurrently
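To make "P_{i,j} in terms of environmental factors" concrete, the sketch below computes one row of transition probabilities from a cell's environment. The Env record, the linear light-to-growth rule, the size threshold for division, and every constant are hypothetical, chosen only so that the row always sums to 1.

```haskell
import Data.Ratio ((%))

data Action = Laze | Grow | Divide | Die
  deriving (Show, Eq)

-- hypothetical cell-level environment: light level and cell size
data Env = Env { light :: Rational, size :: Rational }

-- one row of the transition matrix, computed from the environment.
-- Assumed model: growth rises linearly with light (capped at 1/2),
-- cells of size >= 2 may divide, death has a small fixed probability,
-- and Laze absorbs the remaining probability mass (always >= 24/100).
transitions :: Env -> [(Action, Rational)]
transitions env =
  [ (Grow, pGrow), (Divide, pDivide), (Die, pDie)
  , (Laze, 1 - pGrow - pDivide - pDie) ]
  where
    pGrow   = max 0 (min (1 % 2) (light env / 10))
    pDivide = if size env >= 2 then 1 % 4 else 0
    pDie    = 1 % 100
```

Brighter light shifts probability mass from Laze to Grow, which is the kind of dependence the slide describes.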
Domain Abstractions for R. sphaeroides
- Individual cells: Markov-chain abstraction
  choose
    P1 → Action1
    …
    Pn → Actionn
- Actions: Tumble, Divide, AdjSpeed, Laze, Grow, etc.
- Concurrency: cell1 running in parallel with cell2
- Environmental factors: light, size
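One way to read choose operationally, as a sketch: given a uniform draw u in [0, 1), walk the cumulative probabilities and pick the first action whose interval contains u, so Action_k is selected with probability P_k. The Action type mirrors the slide; the Double weights, the wellFormed check, and leaving the source of u (a random number generator) out of the code are assumptions.

```haskell
data Action = Tumble | Divide | AdjSpeed Double | Laze | Grow
  deriving (Show, Eq)

-- a well-formed choice has nonnegative weights that sum to 1
wellFormed :: [(Double, a)] -> Bool
wellFormed alts =
  all ((>= 0) . fst) alts && abs (sum (map fst alts) - 1) < 1e-9

-- choose [(P1, a1), ..., (Pn, an)] u: inverse-CDF selection.
-- Alternative k owns the interval [P1 + ... + P(k-1), P1 + ... + Pk),
-- so a uniform u lands in it with probability Pk.
choose :: [(Double, a)] -> Double -> Maybe a
choose alts u = go alts 0
  where
    go [] _ = Nothing  -- u fell outside the total probability mass
    go ((p, a) : rest) acc
      | u < acc + p = Just a
      | otherwise   = go rest (acc + p)
```

With alternatives [(0.25, Tumble), (0.25, Grow), (0.5, Laze)], a draw of 0.3 selects Grow and a draw of 0.75 selects Laze.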
Abstract Syntax for CellSys
- choose is our principal domain abstraction
  - it behaves like the Markov-chain transition function
- Cell-level environment variables: light, size
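The slide's grammar did not survive, so the following is only a hypothetical Haskell rendering of an abstract syntax with the ingredients listed above: choose with probabilities given as expressions over cell-level environment variables, the bacterial actions, and parallel composition of cells. Constructor names such as PExp, Seq, Loop, and Par are illustrative, not CellSys's actual definitions. The wellFormed check captures the requirement that each choose behave like a row of a Markov-chain transition matrix.

```haskell
-- Hypothetical abstract syntax; names are illustrative only.
data Var = Light | Size
  deriving (Show, Eq)

-- probability expressions may mention environment variables
data PExp = Const Double | EnvVar Var | Mul PExp PExp | Sub PExp PExp
  deriving Show

data Action = Tumble | Divide | AdjSpeed Double | Laze | Grow | Die
  deriving Show

data Cmd
  = Choose [(PExp, Action)]  -- choose P1 → Action1 ... Pn → Actionn
  | Seq Cmd Cmd              -- hypothetical sequencing
  | Loop Cmd                 -- hypothetical: repeat a behavior forever
  deriving Show

data Sys = Cell Cmd | Par Sys Sys  -- cells composed in parallel
  deriving Show

evalP :: (Var -> Double) -> PExp -> Double
evalP _   (Const c)  = c
evalP env (EnvVar v) = env v
evalP env (Mul a b)  = evalP env a * evalP env b
evalP env (Sub a b)  = evalP env a - evalP env b

-- every choose must denote a probability distribution in the environment
wellFormed :: (Var -> Double) -> Cmd -> Bool
wellFormed env (Choose alts) =
  all (\p -> p >= 0 && p <= 1) ps && abs (sum ps - 1) < 1e-9
  where ps = map (evalP env . fst) alts
wellFormed env (Seq c d) = wellFormed env c && wellFormed env d
wellFormed env (Loop c)  = wellFormed env c

-- for simplicity this sketch checks every cell against one environment
wellFormedSys :: (Var -> Double) -> Sys -> Bool
wellFormedSys env (Cell c)  = wellFormed env c
wellFormedSys env (Par s t) = wellFormedSys env s && wellFormedSys env t

-- example behavior: grow with probability 0.1 * light, otherwise tumble
behavior :: Cmd
behavior = Loop (Choose [(pGrow, Grow), (Sub (Const 1) pGrow, Tumble)])
  where pGrow = Mul (Const 0.1) (EnvVar Light)

envAt :: Double -> Double -> Var -> Double
envAt l _ Light = l
envAt _ s Size  = s
```

Because probabilities are expressions rather than constants, the same program can be well formed in one environment (light 5) and ill formed in another (light 20, where 0.1 * light exceeds 1).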
DSL Definition
- Background: programming languages are collections of effects
  - Java: OO + threads + state
  - LISP: higher-order functions
  - Prolog: backtracking
- Corresponding to each such effect is an algebraic construction called a monad
  - used for the development of modular semantic theories of programming languages [Moggi89]
  - monads may be constructed using monad transformers
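As a small illustration of the monad-transformer idea (not the CellSys definition itself), the sketch below defines a finite-distribution monad for the probabilistic effect and layers the transformers package's StateT over it, giving computations that both update a cell's state and branch probabilistically. The Dist type, the integer "size" state, and the 1/3 growth probability are assumptions.

```haskell
import Control.Monad (replicateM_)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (StateT, get, put, runStateT)

-- the probabilistic effect: a finite list of outcomes with probabilities
newtype Dist a = Dist { runDist :: [(a, Rational)] }

instance Functor Dist where
  fmap f (Dist xs) = Dist [(f x, p) | (x, p) <- xs]

instance Applicative Dist where
  pure x = Dist [(x, 1)]
  Dist fs <*> Dist xs = Dist [(f x, p * q) | (f, p) <- fs, (x, q) <- xs]

instance Monad Dist where
  Dist xs >>= k = Dist [(y, p * q) | (x, p) <- xs, (y, q) <- runDist (k x)]

-- StateT adds the state effect on top: here a hypothetical cell size
type CellM = StateT Int Dist

choice :: [(a, Rational)] -> CellM a
choice = lift . Dist

-- one hypothetical time step: grow by 1 with probability 1/3
tick :: CellM ()
tick = do
  grow <- choice [(True, 1 / 3), (False, 2 / 3)]
  sz <- get
  put (if grow then sz + 1 else sz)

-- merge outcomes that are equal, adding their probabilities
collect :: Eq a => [(a, Rational)] -> [(a, Rational)]
collect [] = []
collect ((x, p) : rest) =
  (x, p + sum [q | (y, q) <- rest, y == x])
    : collect [(y, q) | (y, q) <- rest, y /= x]

-- distribution of sizes after n ticks, starting from size0
sizesAfter :: Int -> Int -> [(Int, Rational)]
sizesAfter n size0 =
  collect [(s, p) | ((_, s), p) <- runDist (runStateT (replicateM_ n tick) size0)]
```

Starting from size 1, two ticks give size 3 with probability 1/9, size 2 with 4/9, and size 1 with 4/9. Swapping in a different base monad or adding another transformer leaves tick unchanged, which is the modularity the slides point to.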
Periodic Table of Effects
- Programming languages are collections of effects captured as monads [Moggi]
- Monads are assembled from constructors (monad transformers)
- Our view: systems are collections of effects captured as monads
  - systems broadly construed:
    - compilers [Harrison00, 98, 01, 02],
    - secure system software [Harrison05, 03], and
    - biology [Harrison04]
Periodic Table of Effects
- Mathematical definitions exist for any language created by combining monad transformers (MTs)
  - CellSys combines StateT, ResT, ProbT, and ReactT
- Such definitions are flexible:
  - modular, extensible, and easily refactored
A DSL definition is similar to a traditional RTS
- In a traditional RTS:
  - threads request services like
    - sending a message
    - output on a device
    - consuming a resource
  - the RTS mediates,
    - ensuring that the threads do not interfere
    - and that the global system state remains consistent
  - and schedules the threads
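The threads-and-scheduler picture can be sketched with a resumption monad transformer, in the spirit of the ResT component mentioned above: a thread either finishes or performs one atomic step and hands back the rest of its computation, and a round-robin scheduler interleaves those steps. This is a minimal sketch; the step and roundRobin names, the use of a State log as the underlying monad, and the example threads are assumptions.

```haskell
import Control.Monad.Trans.State (State, execState, modify)

-- a resumption: either finished (Left) or paused after one step (Right)
newtype ResT m a = ResT { deResT :: m (Either a (ResT m a)) }

instance Monad m => Functor (ResT m) where
  fmap f (ResT m) = ResT (fmap (either (Left . f) (Right . fmap f)) m)

instance Monad m => Applicative (ResT m) where
  pure = ResT . return . Left
  rf <*> rx = rf >>= \f -> fmap f rx

instance Monad m => Monad (ResT m) where
  ResT m >>= k = ResT $ do
    r <- m
    case r of
      Left a   -> deResT (k a)                -- finished: continue with k
      Right r' -> return (Right (r' >>= k))   -- paused: defer k

-- run one action of the underlying monad as a single atomic step
step :: Monad m => m a -> ResT m a
step m = ResT (m >>= \a -> return (Right (pure a)))

-- round-robin: run one step of the first thread, then requeue it
roundRobin :: Monad m => [ResT m ()] -> m ()
roundRobin [] = return ()
roundRobin (t : ts) = do
  r <- deResT t
  case r of
    Left ()  -> roundRobin ts
    Right t' -> roundRobin (ts ++ [t'])

-- hypothetical thread: logs "<name>1" .. "<name>n", one entry per step
cell :: String -> Int -> ResT (State [String]) ()
cell name n = mapM_ (\i -> step (modify (++ [name ++ show i]))) [1 .. n]
```

Running roundRobin [cell "a" 2, cell "b" 3] from an empty log interleaves the steps as a1, b1, a2, b2, b3.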
High-level view of the definition
- In CellSys:
  - cells are threads that also have physical components (size, velocity, etc.)
  - cells request services like
    - consume nutrients
    - move me here
    - I want to divide
  - the GE mediates like an RTS, and also
    - preserves physical integrity
    - updates the global world view
    - performs scheduling
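A hypothetical sketch of the mediation step: each cell issues a request, and a handler for the global environment applies it to the world state while enforcing physical constraints, here that cells never take more nutrients than are available and never occupy the same grid position. The Request, Cell, and World types, the grid positions, and the energy rules are all illustrative assumptions; scheduling is omitted.

```haskell
-- Hypothetical request vocabulary and world model.
data Request = Consume Int | MoveTo (Int, Int) | WantDivide
  deriving Show

data Cell = Cell { cid :: Int, pos :: (Int, Int), energy :: Int }
  deriving (Show, Eq)

data World = World { nutrients :: Int, cells :: [Cell], nextId :: Int }
  deriving Show

-- apply cell i's request to the world, refusing physically invalid ones
handle :: Int -> Request -> World -> World
handle i req w = case req of
  Consume n ->
    let got = max 0 (min n (nutrients w))  -- never more than is available
    in w { nutrients = nutrients w - got
         , cells = update (\c -> c { energy = energy c + got }) }
  MoveTo p
    | occupied p -> w                      -- refuse: cells may not overlap
    | otherwise  -> w { cells = update (\c -> c { pos = p }) }
  WantDivide -> case [c | c <- cells w, cid c == i] of
    [c] | energy c >= 2
        , let (x, y) = pos c
        , not (occupied (x + 1, y)) ->
          -- daughter cell takes half the energy and the adjacent square
          let e = energy c `div` 2
              child = Cell (nextId w) (x + 1, y) e
          in w { cells = child : update (\d -> d { energy = energy d - e })
               , nextId = nextId w + 1 }
    _ -> w                                 -- refuse: too little energy or no room
  where
    update f = [if cid c == i then f c else c | c <- cells w]
    occupied p = any ((== p) . pos) (cells w)
```

Keeping every invariant inside handle means individual cell programs cannot violate physical integrity, mirroring how an RTS protects global state from threads.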
DSL Implementation
- Because CellSys is defined in terms of monad transformers, it may be implemented directly as a Haskell program
  - i.e., the monadic language definition may be transcribed symbol for symbol into Haskell
- The Haskell implementation is easily instrumented to output system snapshots
  - it prints snapshots in POV (Persistence of Vision) format, which are then converted into MPEG
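A minimal sketch of snapshot output, assuming each cell is drawn as a sphere in the z = 0 plane: it emits a POV-Ray scene description (camera, light source, one sphere per cell) for each frame. The CellView type, the colors, the camera placement, and the frame-naming scheme are assumptions; rendering the frames and encoding them as MPEG would be done by external tools.

```haskell
import Text.Printf (printf)

-- hypothetical per-cell view: position in the plane and radius
data CellView = CellView { px :: Double, py :: Double, radius :: Double }

-- one cell as a POV-Ray sphere object
povCell :: CellView -> String
povCell c =
  printf "sphere { <%.2f, %.2f, 0>, %.2f pigment { color rgb <0, 1, 0> } }"
         (px c) (py c) (radius c)

-- one complete scene file per snapshot: camera, light, then the cells
snapshot :: [CellView] -> String
snapshot cs = unlines (header ++ map povCell cs)
  where
    header =
      [ "camera { location <0, 0, -20> look_at <0, 0, 0> }"
      , "light_source { <0, 0, -20> color rgb <1, 1, 1> }" ]

-- zero-padded names keep frames in order for the movie encoder
frameName :: Int -> String
frameName = printf "frame%04d.pov"
```

For example, povCell (CellView 1 2 0.5) produces a sphere centered at <1.00, 2.00, 0> with radius 0.50, and frameName 7 is "frame0007.pov".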
Q: What are appropriate languages for modeling?
- Integrate techniques from programming languages
  - models of concurrency
  - language semantics
    - i.e., precise, mathematical language definitions
  - efficient language implementation
- into a special-purpose language called a Domain-Specific Language
  - abstractions taken directly from biology
    - ⇒ comprehensible by biologists
- DSLs and DSL programs
  - hide technical details irrelevant/uninteresting to biologists
  - are tunable by computer scientists to reflect discovery/refinement
  - execute to provide a reality check by biologists
Bioinformatics: Computer Science and Biology
Computer Science:
- models of concurrency
- efficient implementation
- mathematical models of programs
- reasoning about programs
Biology:
- organism structure and behavior
- modeling techniques
  - cellular automata
  - systems of PDEs
  - numerical techniques
Hard problem: how do you effect a technology transfer from CS → Biology?
Interdisciplinary Process
- CellSys (version 1.0)
- the biologist evaluates the DSL model for accuracy, expressiveness, etc.
- feedback/discussion
- the language expert refactors the language as needed
- CellSys (version 2.0)
Summary
[Diagram combining systems biology, domain-specific languages, and modular monadic semantics]
- a large body of work providing domain abstractions and models
- comprehensibility, reusability, ease of use
- precise description of biological phenomena through DSL semantics
Harrison and Harrison, "Domain Specific Languages for Cellular Interactions," in Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology, 2004.