Coding for DNA Computing: Combinatorial and Biophysical Aspects

Slides: 40
Provided by: mile171
Learn more at: https://cnls.lanl.gov
1
Coding for DNA Computing: Combinatorial and
Biophysical Aspects
  • Olgica Milenkovic
  • University of Colorado, Boulder
  • Joint Work with Navin Kashyap
  • Queen's University, Kingston

2
LDPC ITERATIVE DECODING
3
Outline
  • The DNA Computing Paradigm
  • Applications
  • Error-Control Coding for DNA Computing
  • Constrained Coding: DNA Secondary and Tertiary
    Structure
  • Statistical Mechanics of DNA/RNA Folding
  • Results and Open Problems

4
Molecular Biology Terminology
  • DNA Double Helix
  • Watson-Crick Complements: A → T, G → C, T → A, C → G
  • RNA: Single-Stranded, T Replaced by U
  • Helix Denaturation (Ambient Temperature Governed)
  • DNA Oligonucleotide Sequences
  • DNA Hybridization
  • DNA Enzymes: Functional Proteins Operating on DNA
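
The Watson-Crick complement rule listed above can be sketched in a few lines of Python. The function names are illustrative and do not come from the slides; the reverse complement is included because two strands hybridize in antiparallel orientation.

```python
# Watson-Crick pairing as listed on the slide: A<->T, G<->C.
WC = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Position-wise Watson-Crick complement of a DNA strand."""
    return "".join(WC[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Complement read in the opposite direction (antiparallel partner)."""
    return complement(strand)[::-1]
```

For example, `complement("CGGATG")` gives `"GCCTAC"`, the edge word used in the Adleman example on the next slide.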

5
DNA Computing: Adleman's Experiment (1994)
The Problem: An Unremarkable Instance of the
Directed Traveling Salesman (Hamiltonian Path) Problem on a Graph
with Seven Nodes (Figures from Adleman, SA 1998)
The Method: Remarkable Oligonucleotide DNA
Hybridization Technique
Miami (CTACGG), NY (ATGCCG)
Route (Edge): Second Half of Codeword for Miami (CGG)
and First Half of Codeword for NY (ATG): CGGATG
Take the Complement of this Word: GCCTAC
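
The edge construction on this slide can be reproduced with a short sketch. The two city codewords are the ones shown; the dictionary and function names are hypothetical, and the sketch assumes even-length codewords that are split exactly in half.

```python
WC = {"A": "T", "T": "A", "G": "C", "C": "G"}

# City codewords from the slide.
CITY = {"Miami": "CTACGG", "NY": "ATGCCG"}


def edge_word(src: str, dst: str) -> str:
    """Second half of the source codeword followed by the first half of the destination."""
    a, b = CITY[src], CITY[dst]
    return a[len(a) // 2:] + b[: len(b) // 2]


def edge_oligo(src: str, dst: str) -> str:
    """Position-wise complement of the edge word, as described on the slide."""
    return "".join(WC[base] for base in edge_word(src, dst))
```

Here `edge_word("Miami", "NY")` is `CGGATG` and `edge_oligo("Miami", "NY")` is `GCCTAC`, matching the slide.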
6
DNA Computing: The Benefits
  • Not a von Neumann Architecture: Stochastic
    Mechanism with Massive Parallelism (1/50th of a
    Teaspoon, 10^14 Paths in 1 s)
  • Extremely Low Power Consumption: 1 Joule for
    2 × 10^19 Operations
  • Storage Capacity: Vol(1 g of DNA) ≈ 1 cm^3,
    Information ≈ 1 trillion CDs
  • 18 Mb/inch of Length (0.35 nm Between Base Pairs)
  • Versatility of Applications, Only Plausible
    Option in Many Cases
  • Drawbacks: First Implementations not Interactive
  • 3-Day Processing Delay
  • VERY LOW RELIABILITY OF COMPUTATION
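
The 18 Mb/inch figure can be checked with simple arithmetic. This sketch assumes 2 bits per base (four possible bases) and reads "Mb" as megabytes; neither assumption is stated on the slide.

```python
NM_PER_INCH = 2.54e7   # 1 inch = 2.54 cm = 2.54e7 nm
RISE_NM = 0.35         # spacing between base pairs, from the slide
BITS_PER_BASE = 2      # assumption: 4 bases -> log2(4) = 2 bits


def bases_per_inch() -> float:
    """Number of base pairs along one inch of double helix."""
    return NM_PER_INCH / RISE_NM


def megabytes_per_inch() -> float:
    """Linear storage density in megabytes (10^6 bytes) per inch."""
    return bases_per_inch() * BITS_PER_BASE / 8 / 1e6
```

Under these assumptions the density comes out slightly above 18 megabytes per inch, consistent with the slide.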

7
Applications of DNA Computers
  • Combinatorial Problems
  • Directed Traveling Salesman (Adleman, 94)
  • 3-SAT (Braich et al., 02)
  • Input: a 20-Variable, 24-Clause Boolean
    Function in 3-Conjunctive Normal Form (3-CNF)
  • For each Variable, two Length-15 DNA Sequences
    Assigned, one Representing the Variable,
    the other Representing its Complement
  • Operon Technology, Alameda, CA,
  • Integrated DNA Technologies, Skokie, IL
  • Non-Attacking Knights (Faulhammer, 00)
  • Configurations of Knights that can be Placed on
    an n × n Chess Board so that no Knight is Attacking
    any other Knight on the Board

8
Novel Designs of DNA Computers
  • DNA Logic and Automata: Interactive Systems
  • DNA Transistors (Stojanovic, Stefanovic 03)
  • DNA Game-Playing Machines (Stojanovic, Stefanovic
    03)
  • MAYA Consists of Nine Wells (Tubes)
    Representing the 3x3 Tic-Tac-Toe Board
  • Tubes Contain Mixtures of Enzymes: Network of 23
    Molecular Logic Gates
  • Human Player has Nine Different DNA Strands,
    each Specific to one Square on the Board. Player
    Selects one Square to Play; DNA Strand
    Representing that Square gets Added to all
    Nine Wells
  • MAYA Analyzes Play Through Biochemical
    Reactions Occurring in Wells

9
Applications of DNA Computers
  • Meet MAYA (Stojanovic, Stefanovic 2003)

Figure: http://www.cs.unm.edu/~bandrews/ttt-applet/
10
Applications of DNA Computers
  • The Killer Application: SMART DRUGS
  • E. Shapiro et al. (Weizmann Institute, Israel),
    Nature, Science 2003
  • Quintana et al. 2002
  • In Vitro DNA-Based Computer Programmed to
    Diagnose Cancer and Order Self-Destruction of
    Cells
  • Identifies RNA Cancer Fingerprint Molecules
  • Cancer Leaves its own Chemical Fingerprint in
    the Body, Including Over-Producing or
    Under-Producing Specific RNA Sequences
  • (Analysis Based on Regulatory Networks of Gene
    Interactions, Shmulevich et al., 2002)
  • (Milenkovic and Vasic, DIMACS 2004, ITW 2004)
  • Software: DNA; Hardware: DNA Enzymes
  • Responds Appropriately by Releasing Short,
    Active DNA Strand
  • Interferes with Tumors by Suppressing Key Cancer
    Genes, Making Diseased Cells Self-Destruct
  • Experiments: Prostate and Lung Cancer Cells

11
Applications of DNA Computers
  • Sensing, Storing, Nano-Scale Mechanics
  • Biosensing: DNA Fingerprinting of
    Bacteria/Viruses, Roco et al. 2004
  • DNA-Based Storage Systems: Mansuripur et al.,
    DIMACS 2004
  • Nucleic Acid Nanostructures and Topology, DNA
    Self-Assembly, DNA Nanoscale Mechanical Devices:
    Seeman et al. 1998-2002

RELIABILITY ISSUES FOR ALL DESCRIBED SYSTEMS
UNRESOLVED
Error-Control Coding; Constrained Coding; Graph
Theory/Combinatorics/Pseudo-Knot Theory;
Statistical Mechanics
12
The Biggest Obstacles
  • DNA Oligonucleotide Secondary and Tertiary
    Structure Formation
  • Unwanted Hybridization

DNA Oligonucleotide Sequences are Chemically
Active, Tend to Assume Thermodynamically Most
Stable Form! DNA Sequences can Bind to Partially
Complementary Sequences as Well!
13
DNA/RNA Secondary and Tertiary Structure
Secondary Structure
Pseudoknots (Tertiary Structure)
Mneimneh, 2003 (Figures from Web Lecture Notes)
14
DNA Hairpins
  • DNA/RNA Hairpin Structures Participate in
    Important Biological Functions
  • Regulation of Gene Expression (Zazopoulos et
    al., 1997)
  • DNA Recombination (Froelich-Ammon et al.,
    1994)
  • Facilitation of Mutagenic Events (Trinh and
    Sinden, 1993): in a Living Cell, after Breaking of
    Intermolecular Pairing in Double-Helix DNA, Loose
    Strands Form a DNA Hairpin
  • Potential Antisense Drug (Tang et al., 1993):
    Injecting into a Living Cell a Hairpin with Nucleic
    Acid Bases Complementary to an mRNA of a Disease
    Gene Blocks its Expression

15
DNA/RNA Knots
RNA Secondary Structure Influences Function of
RNA. Knots are Special Regulators.
Figures: Haslinger, 2001; Craven, 2001
16
Mathematical Formulation
Definition 1 (Haslinger, 2001): A Secondary
Structure S is a Vertex-Labeled Graph on n
Vertices, for which the Adjacency Matrix A has
the following properties:
An Edge (i,j), |i-j| > 1, is Called a
Base-Pairing. A Secondary Structure Can Consist
of the Following Structural Elements:
  • A Stack Consists of Subsequent Base Pairs
    (p-k, q+k), (p-k+1, q+k-1), ..., (p, q);
    k is the Length of the Stack
  • A Loop Consists of all Unpaired Vertices which
    are Immediately Interior to some Terminal Base
    Pair
  • An External Vertex is an Unpaired Vertex which
    does not Belong to a Loop

17
Mathematical Formulation
  • If Definition 1, Part 3 is Violated for a Base
    Pairing, then the Resulting Formation is Referred
    to as a Pseudoknot
  • With Information about Energy of Pairings and
    Additional Measurements Regarding the DNA
    Backbone, Determining Stable Secondary Structures
    Becomes a Purely Combinatorial Problem
  • Secondary Structure Prediction: Dynamic
    Programming Approach, Polynomial Time (Nussinov's
    and Zuker's Algorithms)
  • Pseudoknots: NP-Complete, Except for Special
    Class of H-Knots (Rivas, Eddy 2003)
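
The three parts of Definition 1 did not survive in this transcript. Assuming Part 3 is the usual nesting condition (if (i,j) and (k,l) are base pairs with i < k < j, then i < l < j), a pseudoknot check can be sketched as follows. The function name and the pair-list representation are illustrative.

```python
from itertools import combinations


def is_pseudoknot_free(pairs):
    """Return True if no two base pairs cross.

    pairs: iterable of (i, j) tuples with i < j; each position is
    assumed to take part in at most one pair.
    """
    for (i, j), (k, l) in combinations(sorted(pairs), 2):
        # After sorting, i <= k. The pairs cross when k lies inside
        # (i, j) while its partner l lies outside.
        if i < k < j < l:
            return False
    return True
```

For instance, the nested pairs `[(0, 8), (1, 7), (5, 6)]` pass the check, while `[(0, 5), (3, 8)]` cross and would count as a pseudoknot under this assumption.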

18
Nussinov's Folding Algorithm
Free Energy of Secondary Structure S
Free Energy of Secondary Structure Limited to
Positions i, i+1, ..., j
Figure: Mneimneh, 2003; Bundschuh, 2004. Feynman
Diagrams for RNA Structure Prediction (Eddy,
Rivas 2001). Free Energy Table: Sequence CCCAAATGG
19
Statistical Physics: DNA Ensemble Analysis
Bundschuh, Hwa 2004: Statistics of Secondary
Structures in Ensemble of Long Random DNA
Sequences. Why? Detection of Important Structural
Components in mRNAs and Functional RNAs;
Characterization of the Response of a Long
Oligonucleotide DNA Molecule to Pulling Forces.
Random DNA: Problem of Disordered Systems
(Bundschuh, Hwa 2004)
20
Statistical Physics: DNA Ensemble Analysis
  • Molten Phase: Absence of Disorder

Thermodynamic Ensemble: Large Number of Different
Secondary Structures with Equal Energy.
Stability of Molten Phase: Use N-Replica Method
21
Statistical Physics: DNA Ensemble Analysis
  • Glassy Phase: Few Low-Energy Configurations in
    Thermodynamic Limit
  • Droplet Theory (Huse and Fisher): Large-Scale
    Low-Energy Excitations About Ground State
  • Impose Deformation over a Length Scale L ≫ 1,
    Monitor Minimal Free Energy Cost of Deformation
  • Cost Expected to Scale as L^ω for Large L.
    Positive ω Indicates Deformation Cost Grows with
    Increasing Size. Negative ω Indicates Deformation
    Cost Decays: there is a Large Number of
    Configurations with Low Overlap with Ground
    State, whose Energies are Similar to the Ground
    State Energy in the Thermodynamic Limit
    (Zero-Temperature Behavior not Stable to Thermal
    Fluctuations; No Thermodynamic Glass Phase can
    Exist at any Finite Temperature)
  • Related Analysis: A. Pagnani, G. Parisi, and F.
    Ricci-Tersenghi, 2000/2001

22
The Stability of a Particular Secondary Structure
is a Function of Several Constraints:
1) Number of GC versus AT/GT Base Pairs (Larger Number of
Hydrogen Bonds Forms more Stable Structures)
2) Number of Base Pairs Forming a Stem Region (Presence of a
Long Subsequence and its Reverse Complement Leads to Stabilization)
3) Number of Base Pairs in a Hairpin (More than 15
or less than 4-7 Bases put Stress on the Loop)
4) Number of Unpaired Bases (More Unpaired
Bases Lead to a less Stable Structure)
23
Hybridization Constraints
  • Individual Sequence Constraints (Wood, Tsaftaris,
    etc.)

IP1) The consecutive-bases constraint. Long Runs
of the Same Base are Forbidden.
IP2) The constant GC-content constraint. Introduced to Achieve
Parallelized Operations on DNA Sequences; Assures
Similar Thermodynamic (Melting Temperature)
Characteristics of all Codewords. GC-Content
Usually in the Range of 30-50% of Code Length.
  • Joint Sequence Constraints

JP1) The Hamming distance constraint. Limits
Unwanted Hybridizations between Codewords.
Requirement is that all Distinct Pairs of
Codewords p, q in C be at Hamming Distance at
Least d_min. To Limit Undesired Hybridization
between a Codeword and the Reverse-Complement of
any other Codeword (including itself), the Reverse-
Complement Hamming Distance has to be at Least
d^RC_min.
JP2) The frame-shift constraint. Applies Only to a
Limited Number of Problems. Refers to the Requirement
that a Concatenation of Two or More Codewords
should not Properly Contain Another Codeword.
JP3) The forbidden subsequence
constraint. Specifies that a Class of Substrings
Must not Occur in any Codeword or Concatenation
of Codewords.
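
Constraints IP1, IP2 and JP1 are straightforward to check for a candidate code. The sketch below is illustrative only: the function names, the label "JP1-RC" and the parameter names are not from the slides, and the reverse-complement distance is taken to be the Hamming distance between p and the reverse complement of q.

```python
from itertools import combinations

WC = {"A": "T", "T": "A", "C": "G", "G": "C"}


def hamming(p, q):
    """Number of positions where two equal-length words differ."""
    return sum(a != b for a, b in zip(p, q))


def reverse_complement(q):
    return "".join(WC[b] for b in reversed(q))


def max_run(word):
    """Length of the longest run of one repeated base (word non-empty)."""
    best = run = 1
    for prev, cur in zip(word, word[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def gc_content(word):
    return sum(b in "GC" for b in word)


def violated_constraints(code, d_min, d_rc_min, max_run_len, gc):
    """Return the set of labels of the constraints that the code violates."""
    bad = set()
    if any(max_run(w) > max_run_len for w in code):
        bad.add("IP1")
    if any(gc_content(w) != gc for w in code):
        bad.add("IP2")
    if any(hamming(p, q) < d_min for p, q in combinations(code, 2)):
        bad.add("JP1")
    # Reverse-complement check includes each codeword against itself.
    if any(hamming(p, reverse_complement(q)) < d_rc_min
           for p in code for q in code):
        bad.add("JP1-RC")
    return bad
```

As a usage example, the single word `ACGT` equals its own reverse complement, so any code containing it fails the reverse-complement requirement for every `d_rc_min >= 1`.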
24
Code Construction
PRIOR WORK: Addressed 1/2/3 Requirements. No
Families of Codes Given (Length Limited to
20). No Attempt Whatsoever to Consider Secondary
Structure Constraints. References: Condon et al.
2000-2004; King 2003; Ryakov 2003; Gaborit and
King 2004; Ghrayeb et al. 2004
  • Approach I: Binary Mapping
  • Approach II: Extended, Cyclic Goppa Codes over
    GF(4)
  • Approach III: Hadamard Matrices with Cyclic Core
  • WHY Cyclic? Will Show that Computational
    Complexity for Nussinov's Algorithm is Significantly
    Reduced in this Case

25
Terminology
DNA Code C: Set of Codewords over Alphabet Q.
Minimum Hamming, Reverse and
Reverse-Complement Hamming Distance.
Constant GC-Content Code.
26
Binary Mapping Approach
Example: q = ACGTCC, b(q) = 001011011010, e(q) = 011011,
o(q) = 001100
Code D: [n,k,d], Contains All-Ones Word.
Construction: DNA Code, Number of
Codewords, Length 2n; Hamming, Reverse-
Complement Hamming Distance at Least d
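
The mapping table itself is not in the transcript, but the worked example fixes it: A → 00, T → 01, C → 10, G → 11 is the only assignment that turns ACGTCC into 001011011010. The sketch below assumes that table and takes e(q) and o(q) to be the first and second bit of every base, which reproduces the two strings on the slide. Function names are illustrative.

```python
# Mapping inferred from the slide's example q = ACGTCC.
BITS = {"A": "00", "T": "01", "C": "10", "G": "11"}
BASE = {bits: base for base, bits in BITS.items()}


def b(q):
    """Binary image of a DNA word, two bits per base."""
    return "".join(BITS[x] for x in q)


def even(q):
    """e(q): first bit of every base."""
    return b(q)[0::2]


def odd(q):
    """o(q): second bit of every base."""
    return b(q)[1::2]


def from_components(e, o):
    """Rebuild the DNA word from its even and odd binary components."""
    return "".join(BASE[x + y] for x, y in zip(e, o))
```

Under this mapping the weight of e(q) equals the GC-content of q, and taking the Watson-Crick complement of q flips every bit of o(q) while leaving e(q) unchanged.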
27
Longest Length Codes
Bounds on
(Based on Bounds by Ashikhmin et
al., 2005) Binary Mapping: Subcodes of Simplex
Codes (All-Zero Not Allowed) -- EVEN
Special Subset of Codewords from Melas/Zetterberg
Codes -- ODD
28
Extended Cyclic Goppa Codes
  • Approach
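  • Take a Family of Reversible (the Reverse of every
    Codeword is also a Codeword) Cyclic Codes
  • Eliminate all Self-Reversible Codewords
  • From Each Remaining Pair (Codeword, its Reverse) Retain
    Exactly One Codeword
  • Complement Second Half of Each Codeword

Let L = {α_1, ..., α_n} ⊆ GF(q^m), for q a
Power of a Prime, and Let g(z) be a
Polynomial of Degree δ over GF(q^m)
such that g(z) has no Root in L. The
Goppa Code, Γ(L), consists of all words
c = (c_1, ..., c_n) over GF(q) such that
Σ_{i=1..n} c_i / (z − α_i) ≡ 0 mod g(z).
Γ(L) is a code of length n, dimension k ≥ n − mδ
and minimum distance d ≥ δ + 1.
Zhang et al., 1988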
29
DNA Codes and Goppa Codes
A Reversible Cyclic Code of Dimension k over
GF(q) contains self-reversible
Codewords.
For arbitrary positive integers a,m, there exist
DNA Codes D such that
having the following properties
Choose Constant GC-Content Subset of Codewords
Example:
CGTTC, CAAAT, CTCCA, GCCTT, GGAGA, ACTAA
30
Complex (Generalized) Hadamard Matrices
Matrix H of Dimension n × n over C_m, the
Set of m-th Roots of Unity, with
Property H H* = n I; Exponent Matrix over Z_m
Theorem (Heng et al., 02): Let N = p^k − 1 for p Prime
and a Positive Integer k. Let g(x) = c_0 + c_1 x + c_2 x^2 + ... +
c_{N-k} x^{N-k} be a Monic Polynomial over Z_p, of Degree
N−k, such that g(x) h(x) = x^N − 1 over Z_p, for some
monic irreducible polynomial h(x) in Z_p[x].
Suppose that the vector (0, c_0, c_1, c_2, ..., c_{N-1}),
with c_i = 0 for N−k < i < N, has the property that it
contains each element of Z_p the same number of
times. Then the N cyclic shifts of the vector
(c_0, c_1, c_2, ..., c_{N-1}) form the core of the exponent
matrix of some Hadamard matrix H(p^k, C_p).
Choose p = 3, and Use only One of G/C
For any , there exist DNA codes D with
codewords of length ,
with constant GC-content equal to and
Each Codeword of such a Code is a Cyclic Shift
of a Fixed Generator Codeword g.
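
A small instance of the theorem can be checked numerically. The choice p = 3, k = 2 (so N = 8) with h(x) = x^2 + x + 2 is made here for illustration and is not taken from the slides. The sketch verifies that g(x) h(x) = x^8 − 1 over Z_3 and that the bordered matrix of cyclic shifts behaves like the exponent matrix of a generalized Hadamard matrix: the difference of any two distinct rows contains each element of Z_3 equally often.

```python
from collections import Counter

P = 3
H_POLY = [2, 1, 1]               # h(x) = 2 + x + x^2, lowest degree first
G_POLY = [1, 1, 2, 0, 2, 2, 1]   # g(x) = 1 + x + 2x^2 + 2x^4 + 2x^5 + x^6


def poly_mul(a, b, p):
    """Multiply two polynomials with coefficients in Z_p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out


def exponent_matrix(g, n_core):
    """All-zero border row and column around the cyclic shifts of g."""
    row = g + [0] * (n_core - len(g))          # pad with c_i = 0
    core = [row[-s:] + row[:-s] for s in range(n_core)]
    return [[0] * (n_core + 1)] + [[0] + r for r in core]


def is_generalized_hadamard(E, p):
    """Every pair of distinct rows must have a balanced difference."""
    n = len(E)
    for a in range(n):
        for b in range(a + 1, n):
            counts = Counter((x - y) % p for x, y in zip(E[a], E[b]))
            if any(counts[v] != n // p for v in range(p)):
                return False
    return True
```

Calling `is_generalized_hadamard(exponent_matrix(G_POLY, 8), P)` returns `True` for this instance, giving a 9 × 9 exponent matrix whose eight core rows are cyclic shifts of one generator.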
31
Hadamard and Vienna
Vienna Package: T = 37°C, http://www.tbi.univie.ac.at/~ivo/RNA/
Based on Nussinov's Algorithm. Gives
one Minimum Free Energy Secondary
Structure. MFOLD (Zuker et al., 2000)
32
Why Cyclic Codes?
Let a DNA Code Consist of the Cyclic Shifts of a
Codeword c. Provided that the free-energy
table of c is known, the free-energy tables
of all other codewords can be computed with a
total of O(n^3) operations only. More precisely,
the free-energy table of each successive cyclic shift
can be obtained from the table of the previous shift in O(n^2) steps.
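
One way such an O(n^2) update could work is sketched below, under a simplified Nussinov-style model (energy -1 per Watson-Crick pair, no minimum loop length); the slides do not spell out the actual procedure. After a right cyclic shift, the substring in positions 1..n-1 of the new word is the old word without its last base, so all rows except the first are copied from the old table and only the first row is recomputed.

```python
WC = {"A": "T", "T": "A", "C": "G", "G": "C"}


def fill_entry(seq, E, i, j):
    """Recurrence for one entry; shorter spans inside [i, j] must be filled."""
    best = min(E[i + 1][j], E[i][j - 1])
    if WC[seq[i]] == seq[j]:
        inner = E[i + 1][j - 1] if j - i >= 2 else 0
        best = min(best, inner - 1)
    for k in range(i + 1, j):
        best = min(best, E[i][k] + E[k + 1][j])
    return best


def full_table(seq):
    """Reference O(n^3) computation of the whole table."""
    n = len(seq)
    E = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            E[i][i + span] = fill_entry(seq, E, i, i + span)
    return E


def shifted_table(seq, E):
    """Table of the right cyclic shift of seq, reusing the table E of seq."""
    n = len(seq)
    shifted = seq[-1] + seq[:-1]
    F = [[0] * n for _ in range(n)]
    # Rows 1..n-1: old entries moved one step down and one step right.
    for i in range(1, n):
        for j in range(i, n):
            F[i][j] = E[i - 1][j - 1]
    # Row 0 is the only new row: n entries, O(n) work for each.
    for j in range(1, n):
        F[0][j] = fill_entry(shifted, F, 0, j)
    return shifted, F
```

Applying `shifted_table` n − 1 times walks through all cyclic shifts at O(n^2) cost each, which gives the O(n^3) total quoted on the slide.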
33
C C C A A A T G G
C 0 0 0 0 0 0 -1 -2 -3
C 0 0 0 0 0 0 -1 -2 -2
C 0 0 0 0 0 0 -1 -2 -2
A 0 0 0 0 0 0 -1 -1 -1
A 0 0 0 0 0 0 -1 -1 -1
A 0 0 0 0 0 0 -1 -1 -1
T 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 0 0
G A C A A A G G T
G 0 0 -1 -1 -1 -1 -1 -1 -2
A 0 0 0 0 0 0 -1 -1 -2
C 0 0 0 0 0 0 -1 -1 -1
A 0 0 0 0 0 0 0 0 -1
A 0 0 0 0 0 0 0 0 -1
A 0 0 0 0 0 0 0 0 -1
G 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 0 0
T 0 0 0 0 0 0 0 0 0
d_WC(CCCAAATGG, GCCCAAATG) = 7, d_WC(CCCAAATGG, GGCCCAAAT) = 6
d_WC(GACAAAGGT, TGACAAAGG) = 9, d_WC(GACAAAGGT, GTGACAAAG) = 7
T1 Free Energy: -0.24 kcal/mol; T2: -0.19 kcal/mol
Energies Obtained from Vienna RNA Folding Package (I.
Hofacker)
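
The definition of the Watson-Crick distance d_WC is not preserved in the transcript. Assuming it is the Hamming distance between p and the position-wise complement of q (without reversal), the four values quoted on this slide are reproduced by the sketch below. Function names are illustrative.

```python
WC = {"A": "T", "T": "A", "C": "G", "G": "C"}


def wc_distance(p, q):
    """Assumed d_WC: positions where p differs from the complement of q."""
    return sum(a != WC[b] for a, b in zip(p, q))


def cyclic_shift(word, k):
    """Move the last k symbols of word to the front."""
    k %= len(word)
    return word[-k:] + word[:-k]


def shift_profile(word, max_shift):
    """d_WC between a word and each of its first max_shift cyclic shifts."""
    return [wc_distance(word, cyclic_shift(word, k))
            for k in range(1, max_shift + 1)]
```

With this definition, `shift_profile("CCCAAATGG", 2)` is `[7, 6]` and `shift_profile("GACAAAGGT", 2)` is `[9, 7]`, the same numbers as on the slide.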
34
Why Binary Mapping?
1 1 1 0 0 0 0 1 1
1 0 -1 -1 -1 -2 -2 -3 -4 -4
1 0 0 -1 -1 -2 -2 -3 -3 -4
1 0 0 0 0 -1 -1 -2 -3 -3
0 0 0 0 0 -1 -1 -2 -2 -3
0 0 0 0 0 0 -1 -1 -1 -2
0 0 0 0 0 0 0 -1 -1 -2
0 0 0 0 0 0 0 0 0 -1
1 0 0 0 0 0 0 0 0 -1
1 0 0 0 0 0 0 0 0 0
C C G G C T A A A
35
1 0 1 0 1 0 1 1 0
1 0 0 -1 -1 -2 -2 -3 -3 -4
0 0 0 0 -1 -1 -2 -2 -3 -3
1 0 0 0 0 -1 -1 -2 -2 -3
0 0 0 0 0 0 -1 -1 -2 -2
1 0 0 0 0 0 0 -1 -1 -1
0 0 0 0 0 0 0 0 -1 -2
1 0 0 0 0 0 0 0 -1 -1
1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
1 1 0 0 1 0 1 0 0
1 0 -1 -1 -2 -2 -2 -3 -3 -4
1 0 0 0 -1 -2 -2 -2 -3 -3
0 0 0 0 -1 -1 -1 -2 -2 -3
0 0 0 0 0 0 -1 -1 -2 -2
1 0 0 0 0 0 0 -1 -1 -2
0 0 0 0 0 0 0 0 -1 -1
1 0 0 0 0 0 0 0 0 -1
0 0 0 0 0 0 0 0 0 -1
0 0 0 0 0 0 0 0 0 0
What Type of Sequences Minimize the Entry
E_{1,n}? Cyclic Shifts with a Minimized Set
{i : WC(c_i) = c_{i+k}}, k = 1, 2, ..., m
36
The Cyclic Distance (Binary Case)
  • Known: Peng, 1998

Sequence Weight w n/2, n even
w (n-1)/2, n odd
Achieved: Maximum Length Shift Register (MLSR)
Sequences (Pseudo-Random Sequences in General).
What are the Reversal Distance Properties of MLSR
Sequences?
37
The Watson-Crick Distance
  • Watson-Crick Distance: Plotkin-Type Bound

38
The Free Energy of a DNA Strand (c_1, c_2, ..., c_n) can
be Approximated According to Breslauer's Formula
Much more Accurate
39
Other Coding Problems
  • Generalized deBruijn Sequences
  • Association Schemes for Hamming/RC
    Hamming/Constant GC Content
  • Binary Mapping Approach with Runlength
    Constraints
  • Forbidden Pattern Constraints (Enumeration
    Techniques by Goulden and Jackson)
  • Catalan Numbers
  • b = 1: CN(1) = 1: ( )
  • b = 2: CN(2) = 2: ( ) ( ), ( ( ) )
  • b = 3: CN(3) = 5: ( ) ( ) ( ), ( ( ) ( ) ), ( ( )
    ) ( ), ( ) ( ( ) ), ( ( ( ) ) )
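
The counts above are the Catalan numbers. Reading b as the number of base pairs and each bracket string as one pseudoknot-free pairing pattern (an interpretation, not stated on the slide), both the closed form and the enumeration can be sketched as follows.

```python
from math import comb


def catalan(b):
    """b-th Catalan number, C(2b, b) / (b + 1)."""
    return comb(2 * b, b) // (b + 1)


def matchings(b):
    """All balanced bracket strings with exactly b pairs."""
    if b == 0:
        return [""]
    out = []
    # The first "(" closes around k inner pairs; the rest follow it.
    for k in range(b):
        for inner in matchings(k):
            for rest in matchings(b - 1 - k):
                out.append("(" + inner + ")" + rest)
    return out
```

For b = 1, 2, 3 the enumeration yields 1, 2 and 5 strings, in agreement with CN(1), CN(2) and CN(3) listed above.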