Title: Luca Cardelli Microsoft Research Cambridge UK http://www.luca.demon.co.uk/BioComputing.htm http://research.microsoft.com/bioinfo
1. Languages for Systems Biology
2. 50 Years of Molecular Cell Biology
- Genes are made of DNA
- Store digital information as sequences of 4 different nucleotides
- Direct protein assembly through RNA and the Genetic Code
- Proteins (>10000) are made of amino acids
- Process signals
- Activate genes
- Move materials
- Catalyze reactions to produce substances
- Control energy production and consumption
- Bootstrapping still a mystery
- DNA, RNA, proteins, membranes are today interdependent. Not clear who came first
- Separation of tasks happened a long time ago
- Not understood, not essential
3. Towards Systems Biology
- Biologists now understand many of the cellular components
- A whole team of biologists will typically study a single protein for years
- When each component and each reaction is understood, the system is understood (?)
- But this has not led to an understanding of how the system works
- Behavior comes from complex chains of interactions between components
- Predictive biology and pharmacology still rare
- Synthetic biology still unreliable
- New approach: try to understand the system
- Experimentally: massive data gathering and data mining (e.g. Genome projects)
- Conceptually: modeling and analyzing networks (i.e. interactions) of components
- What kind of a system?
- Just beyond the basic chemistry of energy and materials processing
- Built right out of digital information (DNA)
- Based on information processing for both survival and evolution
- Can we fix it when it breaks?
4. Size
5. Performance
Pentium II
- 3 million transistors
- 1/4 megabyte of memory
- 100 million operations per second
E. Coli
- 1 million macromolecules
- 1 megabyte of static genetic memory
- 1 million amino-acids per second
DVD
- 4700 megabytes of memory
- 1.385 megabytes per second
Comparison courtesy of Erik Winfree
6. Aims
- Modeling biological systems.
- By adapting paradigms and techniques developed for modeling information-processing systems.
- Because they have some similar features:
- Deep layering of abstractions.
- Complex composition of simpler components.
- Discrete (non-linear) evolution.
- Digital coding of information.
- Reactive information-driven behavior.
- Very high degree of concurrency.
- Emergent behavior (not obvious from part list).
7. EU Commission, Health Research Report on Computational Systems Biology
- General Modelling Requirements
- Research projects should focus on integrated
modelling of several cellular processes leading
to as complete an understanding as possible of
the dynamic behaviour of a cell. Several projects
may be required to develop modules (metabolism,
signalling, trafficking, organelles, cell cycle,
gene expression, replication, cytoskeleton) in
model organisms. This modelling should involve
realistic analysis of experimental data,
including a wide range of data for
transcriptomics, proteomics and functional
genomics, and interactions with cellular pathways
including signal transduction, regulatory
cascades, metabolic pathways etc. It should
involve:
- Coherent, high-quality, quantitative,
heterogeneous and dynamic data sets as a basis
for novel model constructions to advance from
analytical to predictive modelling.
- Experimental functional analysis tools (in-situ
proteomics, protein-protein interactions,
metabolic fluxes, etc.)
8. Methods
- Applying techniques unique to Computing
- Model Construction (writing things down precisely)
- Studying the notations used in systems biology.
- Devising formal languages to reflect them.
- Studying their dynamics (semantics).
- Model Validation (using models for postdiction and prediction)
- Stochastic Simulation
- Stochastic Quantitative concurrent semantics.
- Based on compositional descriptions.
- Program Analysis
- Control flow analysis
- Causality analysis
- Model checking
- Standard, Quantitative, Probabilistic
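The "Stochastic Simulation" item above can be illustrated with a small sketch. The slides do not name an algorithm; the code below assumes Gillespie's direct method, one standard way to simulate a set of chemical reactions stochastically. The function name `gillespie`, the reaction encoding, and the example reaction A -> B are all illustrative assumptions, not taken from the slides.

```python
import math
import random


def gillespie(state, reactions, t_end, rng):
    """Simulate reactions with Gillespie's direct method (an assumed choice).

    state:     dict mapping species name -> molecule count
    reactions: list of (rate_constant, reactants, products), where reactants
               and products are dicts mapping species name -> stoichiometry
    Returns a list of (time, state snapshot) pairs.
    """
    t = 0.0
    state = dict(state)
    trace = [(t, dict(state))]
    while True:
        # Propensity = rate constant times the number of reactant combinations.
        propensities = []
        for rate, reactants, _products in reactions:
            a = float(rate)
            for species, n in reactants.items():
                a *= math.comb(state.get(species, 0), n)
            propensities.append(a)
        total = sum(propensities)
        if total == 0:
            break  # nothing can fire any more
        # Waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick which reaction fires, with probability proportional to propensity.
        _rate, reactants, products = rng.choices(reactions, weights=propensities)[0]
        for species, n in reactants.items():
            state[species] -= n
        for species, n in products.items():
            state[species] = state.get(species, 0) + n
        trace.append((t, dict(state)))
    return trace


# Hypothetical example: ten molecules of A each decay to B at rate 1.0.
decay = [(1.0, {"A": 1}, {"B": 1})]
run = gillespie({"A": 10}, decay, t_end=1e9, rng=random.Random(0))
print(len(run) - 1, "reaction events; final state:", run[-1][1])
```

Because each reaction is described separately and the simulator only combines their propensities, the model stays compositional: adding a reaction to the list does not require changing the others.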
9. Structural Architecture
[Figure: Eukaryotic Cell (10~100 trillion in human body), with membranes everywhere. Labels: nuclear membrane, mitochondria, Golgi, vesicles, E.R., plasma membrane (<10% of all membranes)]
10. Functional Architecture
The Abstract Machines of Systems Biology
The hardware (biochemistry) is fairly well understood. But what is the software that runs on these machines?
[Figure: three abstract machines, their biochemical toolkits, and their interactions]
- Gene Machine (nucleotides): Regulation
- Protein Machine (amino acids): Metabolism, Propulsion, Signal Processing, Molecular Transport
- Membrane Machine (phospholipids): Confinement, Storage, Bulk Transport
- Gene Machine -> Protein Machine: makes proteins, where/when/how much
- Protein Machine -> Gene Machine: signals conditions and events
- Gene Machine -> Membrane Machine: directs membrane construction and protein embedding
- Membrane Machine -> Gene Machine: holds genome(s), confines regulators
- Protein Machine -> Membrane Machine: implements fusion, fission
- Membrane Machine -> Protein Machine: holds receptors, actuators; hosts reactions
Model Integration: different time and space scales
Notations already used in Biology
11. 1. The Protein Machine
Pretty close to the atoms.
cf. BioCalculus (Kitano, Nagasaki), κ-calculus (Danos, Laneve)
Each protein has a structure of binary switches and binding sites. But not all may be always accessible.
[Figure: a protein with on/off switches and binding sites, some of them inaccessible]
- Switching of accessible switches.
- May cause other switches and binding sites to become (in)accessible.
- May be triggered or inhibited by nearby specific proteins in specific states.
- Binding on accessible sites.
- May cause other switches and binding sites to become (in)accessible.
- May be triggered or inhibited by nearby specific proteins in specific states.
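The protein model above (binary switches, binding sites, and accessibility that changes as a result of switching) can be written down as a small data structure. This is a minimal sketch, not the notation of any of the calculi cited on the slide; the class `Protein`, its fields, the `effects` rule table, and the example switch `phos` and site `s1` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Protein:
    name: str
    switches: dict    # switch name -> True (on) / False (off)
    sites: dict       # binding site name -> bound partner name, or None if free
    accessible: set   # names of switches and sites that can currently be used
    # Hypothetical rule table: (switch, new value) -> (names exposed, names hidden)
    effects: dict = field(default_factory=dict)

    def flip(self, switch):
        """Toggle an accessible switch and apply its accessibility effects."""
        if switch not in self.accessible:
            raise ValueError(f"switch {switch!r} is inaccessible")
        self.switches[switch] = not self.switches[switch]
        exposed, hidden = self.effects.get(
            (switch, self.switches[switch]), (set(), set())
        )
        self.accessible |= exposed
        self.accessible -= hidden

    def bind(self, site, partner):
        """Bind a partner on an accessible, free site."""
        if site not in self.accessible:
            raise ValueError(f"site {site!r} is inaccessible")
        if self.sites[site] is not None:
            raise ValueError(f"site {site!r} is already occupied")
        self.sites[site] = partner


# Hypothetical example: turning switch "phos" on exposes binding site "s1".
p = Protein(
    name="P",
    switches={"phos": False},
    sites={"s1": None},
    accessible={"phos"},
    effects={("phos", True): ({"s1"}, set()), ("phos", False): (set(), {"s1"})},
)
p.flip("phos")     # switch on; site s1 becomes accessible
p.bind("s1", "Q")  # now allowed
print(p.switches, p.sites)
```

The sketch leaves out the last bullet of each group (triggering or inhibition by nearby proteins in specific states); that would need a rule that inspects a second protein before allowing `flip` or `bind`.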
12. Molecular Interaction Maps
http://www.cds.caltech.edu/hsauro/index.htm
The p53-Mdm2 and DNA Repair Regulatory Network
JDesigner
Taken from Kurt W. Kohn
13. 2. The Gene Machine
Pretty far from the atoms.
cf. Hybrid Petri Nets (Matsuno, Doi, Nagasaki, Miyano)
[Figure: a gene (stretch of DNA) with a regulatory region and a coding region. Input: positive regulation, negative regulation. Output: transcription]
External Choice: The phage lambda switch
Regulation of a gene (positive and negative)
influences transcription. The regulatory region
has precise DNA sequences, not meant for coding
proteins but for binding regulators.
Transcription produces molecules (RNA or, through
RNA, proteins) that bind to the regulatory regions
of other genes (or that are end-products).
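The description above treats a gene as a gate: regulators bind its regulatory region (input) and transcription yields a product (output) that may regulate other genes. A minimal sketch of that reading follows. It assumes Boolean presence/absence of products, synchronous updates, and the rule that any bound repressor blocks transcription; none of these choices comes from the slides, and the names `Gene`, `step`, and the three-gene cascade are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    product: str                        # molecule produced by transcription
    activators: frozenset = frozenset() # products that regulate positively
    repressors: frozenset = frozenset() # products that regulate negatively
    constitutive: bool = False          # assumed: transcribed with no activator

    def is_transcribed(self, present):
        """Decide transcription from the set of products currently present."""
        if self.repressors & present:
            return False  # assumed rule: any repressor blocks transcription
        return self.constitutive or bool(self.activators & present)


def step(genes, present):
    """One synchronous update: the products of all transcribed genes."""
    return frozenset(g.product for g in genes if g.is_transcribed(present))


# Hypothetical cascade: a is always on, a activates b, b represses c.
genes = [
    Gene("a", constitutive=True),
    Gene("b", activators=frozenset({"a"})),
    Gene("c", repressors=frozenset({"b"}), constitutive=True),
]
present = frozenset()
for i in range(4):
    present = step(genes, present)
    print(i + 1, sorted(present))
```

In this example the product `c` appears at first and is switched off once `b` has been produced, which shows how the output of one gene becomes the input of another.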
Human (and mammalian) Genome Size
- Total: 3Gbp (Giga base pairs), 750MB @ 4bp/Byte (CD)
- Non-repetitive: 1Gbp, 250MB
- In genes: 320Mbp, 80MB
- Coding: 160Mbp, 40MB
- Protein-coding genes: 30,000-40,000
Other genomes
- M.Genitalium (smallest true organism): 580,073bp, 145KB (eBook)
- E.Coli (bacteria): 4Mbp, 1MB (floppy)
- Yeast (eukarya): 12Mbp, 3MB (MP3 song)
- Wheat: 17Gbp, 4.25GB (DVD)
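The sizes above follow from storing 4 base pairs per byte, that is, 2 bits per nucleotide. The sketch below reproduces that arithmetic and shows one way to pack a sequence at that density. The function names and the particular bit assignment (A=0, C=1, G=2, T=3) are assumptions made for illustration; megabytes are taken as decimal (10^6 bytes), which matches the slide's figures.

```python
# Assumed 2-bit code per nucleotide; any fixed assignment gives the same size.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def storage_bytes(base_pairs):
    """Bytes needed at 4 base pairs per byte, rounded up."""
    return -(-base_pairs // 4)


def pack(seq):
    """Pack a DNA string into bytes, 4 nucleotides per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # zero-pad a short final chunk
        out.append(byte)
    return bytes(out)


# Genome sizes in base pairs, as listed on the slide.
genomes = {
    "Human": 3_000_000_000,
    "M.Genitalium": 580_073,
    "E.Coli": 4_000_000,
    "Yeast": 12_000_000,
    "Wheat": 17_000_000_000,
}
for name, bp in genomes.items():
    print(f"{name}: {storage_bytes(bp) / 1e6:.3f} MB")

print(len(pack("ACGTACGTAC")), "bytes for a 10-nucleotide sequence")
```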
14. Gene Regulatory Networks
http://strc.herts.ac.uk/bio/maria/NetBuilder/
NetBuilder
15. 3. The Membrane Machine
Very far from the atoms.
[Figure: membrane operations on compartments P and Q: Mate, Mito, Exo, Endo]
16. Membrane Transport Algorithms
LDL-Cholesterol Degradation
Protein Production and Secretion
Viral Replication
Taken from MCB p.730
17. Process Calculi
- Today we represent, store, and analyze
- Gene sequence data
- Protein structure data
- Metabolic network data
-
- In the long run, how can we represent, store, and analyze biological processes?
- We want to do better than informal circuit diagrams, or huge lists of chemical reactions.
- Scalable, precise, dynamic, highly structured, maintainable representations for systems biology.
- Process Calculi
- General formal framework for the description and
analysis of highly concurrent interacting
processes.
18. Conclusions
- Identifying the architecture
- Emphasis on architecture, not components
- Modeling the system
- Information-oriented language-based models
- Analyzing the model
- Exploiting techniques unique to computing
- Perturbing, predicting, engineering
"The data are accumulating and the computers are
humming, what we are lacking are the words, the
grammar and the syntax of a new language."
D. Bray (TIBS 22(9):325-326, 1997)
"Although the road ahead is long and winding, it
leads to a future where biology and medicine are
transformed into precision engineering."
Hiroaki Kitano