Title: Virtual Seminars on Genomics and Bioinformatics
1Virtual Seminars on Genomics and Bioinformatics
Sharing Knowledge with the World
www.VirtualGenomics.Org
Presenting Val Bykoski, Boston University and Virtek, Inc.
3H. Bolouri E. Davidson
4- Dr. Val Bykoski
- MS in Bioorganic Chemistry from Moscow Technology University
- PhD in Physics and Applied Math from Russian Academy of Sciences
- Institute for Chemical Physics, Moscow
- National Institute for Control Problems, Moscow
- Royal Society, London, Guest Professor lecturing at six UK Universities
- He was awarded the International Norbert Wiener Prize
- Boston University as a Faculty Research Professor
- University of Massachusetts at Lowell (CS and EE Departments)
- University of New Hampshire
- Consultant for Raytheon, Xerox, and EMC
- Published over 100 papers and has 7 patents.
5Understanding Cell Dynamics via Data Analysis
- Val Bykoski
- valb_at_cns.bu.edu
- Boston University and Virtek, Inc.
- Boston University Grid Access Facility
- December 18, 2003
6Plan
- Motivation and Goal
- Existing Approaches to Cell Modeling
- Data Used to Build Models
- How Data Build a Model
- Next-Generation Data-Driven Models
- Cell Infrastructure
- Results
- Future Directions, Conclusion
7Dynamic Genome
- "In the future attention undoubtedly will be centered on the genome, and with greater appreciation of its significance as a highly sensitive organ of the cell, monitoring genomic activities and correcting common errors, sensing the unusual and unexpected events, and responding to them, often by restructuring the genome."
- Barbara McClintock, The Significance of Responses of the Genome to Challenge, Nobel Lecture, December 8, 1983
8Knowledge-gain-and-use Genome
- "A goal for the future would be to determine the extent of knowledge the cell has of itself, and how it utilizes this knowledge in a 'thoughtful' manner when challenged."
- Barbara McClintock, ibid.
9Motivation
- Current models have drawbacks
- sequence-based
- no realistic substrate, no temporal organization
- So, current understanding is limited
- Thus a bottleneck in
- understanding diseases, their dynamics, their diagnosis and cure
- drug discovery technologies
10Goal Understanding Cell Dynamics in Env
- How a cell recognizes stimuli and responds
- How response structures are integrated into genome
- How stress-induced response develops
- How unanticipated challenges are handled
- Next-gen models should provide
- - structured cell substrate
- - mechanisms for spatial and temporal control
- - knowledge-gain-and-use mechanism
11Existing Cell Models Pros and Cons
- Kinetic models (such as eCell, MCell)
- can handle many hundreds of metabolites
- But cell infrastructure is missing:
- realistic cell substrate
- spatial context: where things occur
- temporal context: when things occur
- interrelation context: how things are related
12Existing Models (2)
- Advanced approaches
- - Modules - integrated model units
- - Synthetic biological systems (phiX174, 5386 bp)
- Existing models are unrealistically simple/limited
- - Critical bottleneck in drug discovery efforts
- - Need refinement: substrate and processes
- Next step: building more realistic models using data and model-building automation
13What is Data? Biology is Data
Taken in Context
- To talk, data needs to be related to its biomedical context
- Then it is a piece of logic, a unit to build models
- Datapairs need to be representative
- - to cover a range of contexts, to be biologically reasonable
- Context is the quality of the d-model, its resolving power
- - sample prep, staining/fixing methods, conditions, etc.
- To predict, datapairs need to be generalized
14How Data Build a Model?
- How physics builds its models
- - takes data in their context: datapairs
- - generalizes datapairs: common features
- - as an analytical model yMod(x,T) or
- - as a table with interpolation yInt(x,T)
- Model and Data are equivalent, if taken in their entirety
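The two model forms named above can be contrasted in a minimal sketch. Everything here is an assumption made for illustration: the closed-form law in `y_mod`, the choice of piecewise-linear interpolation for `y_int`, and the sample table `T`. The slides name yMod(x,T) and yInt(x,T) but do not specify either.

```python
from bisect import bisect_left

def y_mod(x):
    """Hypothetical analytical model: a closed-form law y = 2x + 1."""
    return 2.0 * x + 1.0

def y_int(x, table):
    """Table model: piecewise-linear interpolation over datapairs (xi, yi).

    `table` must be sorted by xi. Values of x outside the table range
    are clamped to the nearest endpoint (no extrapolation).
    """
    xs = [p[0] for p in table]
    ys = [p[1] for p in table]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)              # first index with xs[i] >= x
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Datapairs sampled from the analytical law stand in for measurements.
T = [(x, y_mod(x)) for x in (0.0, 1.0, 2.0, 4.0)]
```

With enough datapairs the table reproduces the law between the sampled points, which is one way to read "Model and Data are equivalent": `y_int(3.0, T)` and `y_mod(3.0)` agree here because the assumed law is linear.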
15How Data Build a Model (2)
- Biology cannot afford analytical models, only data-driven ones
- Start with a generic cell/framework/model
- The generic model gets trained by datapairs
- a highly specific model with more data
- a less specific model with less data
- The cell model gets updated with each new datapair (!)
16Building Genome to Balance Inverted Pendulum (how
data can build a model genome)
- Initially a structureless substrate, no model
- Data source
- - a permanently falling down pendulum
- Datapairs <pState, Ctr> or <S, R>
- S and R get associated into an SR-pattern
- Multiple SRs overlap to form a genome, $G = \sum_i S_i R_i^T$
- which recognizes a new S and builds a new R
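The association and overlap steps above resemble a correlation-matrix associative memory, and a minimal sketch under that reading follows. The assumptions are: the genome is the sum of outer products of stimulus and response vectors, a response is recalled as R = G^T S, and pendulum states are coded as one-hot vectors. The state names and push values are hypothetical.

```python
def add_pattern(G, s, r):
    """Overlap one SR-pattern onto the genome in place: G += s r^T."""
    for i, si in enumerate(s):
        for j, rj in enumerate(r):
            G[i][j] += si * rj

def respond(G, s):
    """Build a response to stimulus s: R = G^T s."""
    n_r = len(G[0])
    return [sum(G[i][j] * s[i] for i in range(len(s))) for j in range(n_r)]

# Hypothetical coding: stimulus = one-hot pendulum state, response = push.
LEAN_LEFT, UPRIGHT, LEAN_RIGHT = [1, 0, 0], [0, 1, 0], [0, 0, 1]

G = [[0.0] for _ in range(3)]           # structureless substrate, no model
add_pattern(G, LEAN_LEFT, [-1.0])       # datapair <S, R>
add_pattern(G, LEAN_RIGHT, [+1.0])      # datapair <S, R>
```

A stimulus that was never stored still gets a response, built as a weighted blend of the stored ones: `respond(G, [0.2, 0.0, 0.8])` returns a push of about 0.6.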
17Genome SR-Dynamics Quality of Control
[Figure: panels A and B]
Angle (S) and control (R) time diagrams for inverted pendulum
18Building a Genome to Balance Pendulum
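[Figure: feedback loop between Genome and Env. Env sends a stimulus S to the genome $G = \sum_p S_p R_p^T$; the genome returns the response $R = G^T S = \sum_p R_p (S_p^T S)$ to Env.]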
19So What?
- Online model building
- Datapairs train aCell to balance
- Trial-and-error process, errors ignored
- Only successful SRs overlapped/generalized
- Response structure, a genome, emerges
- Space-variance gets created by data
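The trial-and-error loop above can be illustrated with a deliberately toy sketch. Everything in it is an assumption made for illustration, not the pendulum model from the talk: the five discrete tilt states, the five candidate pushes, the drift rule in `step`, and the success criterion. Every candidate response is tried once in every state, failed trials are discarded, and only successful SR pairs are overlapped into G.

```python
TILTS = [-2, -1, 0, 1, 2]       # hypothetical discretized pendulum states
PUSHES = [-2, -1, 0, 1, 2]      # hypothetical candidate responses

def step(tilt, push):
    """Toy falling dynamics: tilt drifts away from 0 unless pushed back."""
    drift = (tilt > 0) - (tilt < 0)
    return tilt + drift - push

def succeeded(tilt, nxt):
    """A trial succeeds if the pendulum ends closer to upright, or stays upright."""
    return abs(nxt) <= max(abs(tilt) - 1, 0)

def one_hot(tilt):
    return [1.0 if t == tilt else 0.0 for t in TILTS]

def train():
    """Try every response in every state; overlap only the successful SR pairs."""
    G = [0.0] * len(TILTS)      # structureless start: no model
    for tilt in TILTS:
        for push in PUSHES:
            if succeeded(tilt, step(tilt, push)):   # errors are simply ignored
                s = one_hot(tilt)
                for i in range(len(G)):
                    G[i] += s[i] * push
    return G

def respond(G, tilt):
    """Recall the stored push for a tilt state (0 for an unknown state)."""
    s = one_hot(tilt)
    return round(sum(g * si for g, si in zip(G, s)))

def balance(G, tilt, steps=5):
    """Run the trained genome as a controller and record the tilt path."""
    path = [tilt]
    for _ in range(steps):
        tilt = step(tilt, respond(G, tilt))
        path.append(tilt)
    return path

G = train()
```

Starting from the largest tilt, `balance(G, 2, steps=3)` walks the toy pendulum back to upright and holds it there.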
20Next-Generation Data-Driven Cell Models
- Complex models cannot be hand-built (David Waltz)
- Biological models are too complex to be built by hand
- The only choice is a model driven/built by data
- A computer is able to generalize zillions of datapairs
- So, no complexity limit for data-driven models
- And so a model can be built using a more realistic cell architecture
21A Generic Framework to Build Models
- Is a generic cell (like stem cells)
- Its substrate is an aperiodic structure, aCell (Fig.)
- aCell is built from space-variant BIOuniTs, biots
- Biot = Ctr-MGLNAP, Ctr = H-bond Control,
- MGLNAP = MetalGlycoLipoNucleoAminoacidPhosphate,
- Model is a generalized/overlapped set of SRs
- After training, it is a customized model, theCell
22Simple examples of aperiodic structures
Aperiodic structures obtained by interference
23Env Builds and Updates a Cell
- Env is the driving force
- (in phylogenetic timeframe)
- aCell is a container for distributed SR-patterns
- It gets updated incrementally by Env
- - nothing gets built from scratch(!)
- It has the capability to reason, i.e. to smooth responses over the whole genome
- (B. McClintock)
24Cell Reasoning/Interpolation Capability
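- Schematics of interpolation (smoothing) in S-space
[Figure: stimuli space X with stored points x1, x2, ..., xN and a new stimulus x; responses space y with stored points y1, y2, ..., yN and the new output Y = Int(x,T), where T = {(xi, yi)}]
Stimuli (x) and responses (y) spaces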
25Genome Response Dynamics
Genome G, or knowledge base KB
If S?Sk is arbitrary stimulus, as, e.g.,
heat-shock
26Chain Response to a Stimulus
[Figure: (A) chain training/folding; (B) response/unfolding, a → b → c → ...]
Genome may generate chain responses
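A chain response can be sketched as repeated associative recall. This is a minimal illustration, not the scheme drawn on the slide. It assumes that folding means associating each element with its successor, that unfolding means recalling successors starting from a cue, and that symbols are unique and coded as one-hot vectors. The function names are hypothetical.

```python
def train_chain(symbols):
    """Fold a chain: associate each symbol with its successor (S -> R pairs).

    Symbols are assumed unique; each one gets a one-hot code.
    """
    n = len(symbols)
    index = {sym: i for i, sym in enumerate(symbols)}
    G = [[0.0] * n for _ in range(n)]
    for cur, nxt in zip(symbols, symbols[1:]):
        G[index[cur]][index[nxt]] += 1.0    # outer product of one-hot codes
    return G, index

def unfold(G, index, cue, max_steps=10):
    """Unfold a response: recall the successor repeatedly, starting from `cue`."""
    names = {i: sym for sym, i in index.items()}
    chain, i = [cue], index[cue]
    for _ in range(max_steps):
        row = G[i]
        if max(row) <= 0.0:                 # no stored successor: chain ends
            break
        i = row.index(max(row))
        chain.append(names[i])
    return chain

G, index = train_chain(["A", "B", "C", "D"])
```

Cueing with any element replays the rest of the chain: `unfold(G, index, "B")` yields B, C, D.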
27Cell Infrastructure Observations
- structured cell substrate
- dynamic knowledge-driven genome (B.McClintock)
- senses events, gains knowledge, restructures itself
- uses knowledge to build responses
- "in a truly remarkable way"
- Figs. follow
28Cilia cross-section
29Microtubules of various origin
30Cilium architecture
31Cilia cross-section
32What is Generic Cell Substrate
- Aperiodic crystal, aCell = 3D sequence of units
- biot = Ctr-MGLNAP or
- Ctr-MetalGlycoLipoNucleoAminoacidPhosphate
- Ctr is H-bonds: 2 (in AT) or 3 (in GC)
- A generic cell architecture, aCell = polybiot,
- has a rigid 3D core, poly-MGLNAP
- has a 3D plasticity substrate, the H-bond lattice
- DNA, RNAs, proteins, etc. are just poly-X, X
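The H-bond control value can be read off a sequence directly, since Watson-Crick A-T pairs carry 2 hydrogen bonds and G-C pairs carry 3. The sketch below is only an illustration of that count along one DNA strand. The function names are hypothetical, and treating the per-position count as the Ctr value is an assumption.

```python
# Watson-Crick pairing: A-T has 2 hydrogen bonds, G-C has 3.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

def h_bond_profile(strand):
    """Per-position H-bond count of each base pair along one DNA strand."""
    return [H_BONDS[base] for base in strand.upper()]

def total_h_bonds(strand):
    """Total number of H-bonds holding the duplex together."""
    return sum(h_bond_profile(strand))
```

For example, `h_bond_profile("ATGC")` gives the space-variant control sequence 2, 2, 3, 3.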
33aCell Radial Structure
- Control matrix $M_{xyz}(b)$ generates space-variant units
- $M_{xyz}(b) = \mathrm{const}$: equifunctional surfaces (membranes)
- $\nabla M_{xyz}(b) = \mathrm{const}$: orthogonal to equifunctional ones
- Radial polyB $M_r(b)$ generates a radial sequence
[Figure: radial sequence of units b1, b2, b3, ..., bN, labeled DNA xRNAy Proteins ...]
Nucleus <--> Cytoplasm - Cell Wall // Env
34Mutation Control via H-bonds
[Figure: energy Eel diagram of an H-bond, with atom labels N, H, O]
Tautomeric transition in H-bond in AT and GC base pairs (WC-1953), which control complementarity, and thus replication
35Guanine-uracil bp with H-bonds
36Processes Involved in Cell-Env Interaction
- Cell is a phylogenetic history of its Env
- It senses stimuli S, builds SR-patterns and responses R
- It accumulates the set of SR-patterns, a genome
- It generalizes SR-patterns by spatial overlapping
- It builds responses by smoothing the gained SR knowledge, the genome knowledge
37Env Control Inbound and Outbound Signals
- For the aCell substrate, the signals are wavefronts
- They are coupled with the cell's electronic state
- Inbound (S) and Outbound (R) signals interfere to form SR-patterns
- which are distributed spatially over the cell
- SR-pattern maxima move Hs in the H-bond lattice
- H-lattice controls mutations (WC-53) and so expression
38Genome gets built-up incrementally
- Env changes drive incremental genome update via cell electronic state
- It is the first (fastest) line of response
- Cell electronic state, its red-ox metabolic electrons, is a spatial organizer
- It controls the molecular conformation, including the H-bond lattice (Hellmann-Feynman theorem)
39Examples of Simple Electronic States
Benzene Molecule C6 H6
Ground (top) and few excited electronic states
John Bernal "Life is an expression of
potentialities of atomic electronic
states", Origin of Life, 1967
40Novelty-Driven Genome Update
- Spatial overlap of SR-patterns reinforces common features
- SR-patterns that are permanent in time reinforce a central core, a nucleus; it is NOT junk
- Novelty elements go into peripheral layers
- This is novelty-driven reinforcement
- new cell space for new stimuli only
- space-division multiplexing
- old stimuli just reinforce existing SR-patterns
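The novelty-driven rule above can be sketched as a small update procedure. The design is an assumption made for illustration: novelty is judged by cosine similarity against stored stimuli, the threshold value is hypothetical, and "new cell space" is modeled as appending a new outer layer to a list.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class NoveltyGenome:
    """Layers of stored stimuli; index 0 is the oldest (core), the last is outermost."""

    def __init__(self, threshold=0.9):       # hypothetical novelty threshold
        self.threshold = threshold
        self.patterns = []                    # stored stimulus vectors
        self.strength = []                    # reinforcement counts

    def update(self, s):
        """Reinforce the best-matching pattern, or allocate a new outer layer."""
        best, best_sim = None, -1.0
        for i, p in enumerate(self.patterns):
            sim = cosine(p, s)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            self.strength[best] += 1          # old stimulus: reinforce only
            return best
        self.patterns.append(list(s))         # novel stimulus: new cell space
        self.strength.append(1)
        return len(self.patterns) - 1
```

Repeating a known stimulus only raises the strength of its existing layer, so the most frequently seen patterns end up with the largest counts at the lowest indices, loosely mirroring the reinforced central core.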
41Signal Dynamics
- Signal dynamics emerge by reinforcing repeated signal patterns
- from amorphous wavefront dynamics to
- focused signal paths to a developed nervous system
42Summary of Results SR-Dynamics
- Cell dynamics is driven by Env via genome
- Stimulus S triggers the associated response R via an SR-pattern
- Each SR-pattern is a result of genome restructuring to match an S-challenge
- Each SR-pattern is distributed over the cell
- Each SR-pattern integrates many genes
- Each SR-pattern provides a fully-loaded response R to S
43Summary (2) aCell
- Cell is an aperiodic crystal, aCell
- aCell is generic framework to build models
- aCell is based on space-variant BIOuniT, biot
- Datapairs customize the framework into theCell
- Signals are wavefronts and may interfere
- Signal paths emerge in genome building
44Summary (3) Genome Dynamics
- Genome is a spatially overlapped set of SR-patterns, a knowledge base
- Each stimulus S incrementally updates the genome
- Genome can be sliced associatively by changing S
45Summary (4) Hi-speed vs Low-speed Env
- Env controls flexibility/specificity ratio
- In hi-speed, novelty-enriched Env
- - loss of protein components
- - enrichment by hi-speed invasive crystal-like component
- - stress-induced crystallization
- In low-speed, highly specific Env
- - loss of nucleic components
- - enrichment by hi-specific protein components
- - env-driven protein diversification
46Future Directions Next-Gen Methods to Control
Cell Dynamics
- Computer-engineered stimuli and structures
- generic X-shock-like stimuli (dX)
- physical nanosubstrates, optical patterns
- species-sensitive structures
- as drugs and modifiers
47Hg-sensitive optical element (lens)
Multi-layer Hg-sensitive structure computed using
Hg spectrum (US EPA database)
48Response of the Hg element
Selective response peaks match the original Hg
lines
49Hg-specific focusing function
Hg-specific lens focuses only light from Hg lamp
50Open Source Project
- The open source project aCell (aCell.sf.net) is set up.
- Contributors are welcome to join the project
- email valb_at_cns.bu.edu
- Recent publications
- 1. A Control and Expression Framework for Data Analysis in Bioinformatics
- http://www.omg.org/lsr/oibc/abstracts/bykoski.doc
- 2. Emergence of Genetic Code and Cell Infrastructure
- http://midas-10.cs.ndsu.nodak.edu/bio/papers/Genetic_Code_Corr.doc
51Acknowledgments
- This research is a work in progress and was performed over a long period of time.
- Discussions with
- Prof. H.C. Longuet-Higgins, FRS,
- Prof. Herbert Fröhlich, FRS,
- Prof. Simon Shnol of Moscow University,
- Prof. Evgeny Nikitin of Technion University, Israel,
- Prof. Simon Berkovich of Georgetown University,
- Prof. Lev Levitin of Boston University
- are thankfully acknowledged.
- Computer-engineered species-sensitive optical structures have been developed jointly with Prof. Mike Fiddy, UMass/Lowell.
- NSF support for this research is thankfully acknowledged.