Title: Coding Theory and Protein Synthesis
1. Coding Theory and Protein Synthesis
- Avogadro-Scale Engineering: Form and Function
- November 18-19, 2003
- Elebeoba E. May
- Computational Biology Department
- Sandia National Laboratories
- eemay_at_sandia.gov
Sandia is a multiprogram laboratory operated by
Sandia Corporation, a Lockheed Martin
Company, for the United States Department of
Energy under contract DE-AC04-94AL85000.
2. Agenda
It is the glory of God to conceal a matter; to
search out a matter is the glory of kings.
Proverbs 25:2 (NIV)
- Error Control at Diverse Molecular Scales
- Coding Theory Models of Protein Synthesis
  - Gatlin
  - Yockey
  - May et al.
- Applications of Coding Theory to
  - Genetic Classification
  - Molecular Computation
  - Construction and Control in Protein Synthesis
3. Nucleotides: Did nature select a parity check
code?
D. A. Mac Dónaill: Numerical interpretation of
nucleotides, depicted as positions on a B4
hypercube: (a) even-parity nucleotides, (b)
odd-parity nucleotides. The natural alphabet is
structured as an error-checking code.
D. A. Mac Dónaill, A parity code interpretation of
nucleotide alphabet composition, Chem. Comm.
(2002) 2062-2063, and
http://www.tcd.ie/Chemistry/People/macdonaill/
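The error-checking property of a parity code can be illustrated with a minimal sketch. It enumerates the 16 vertices of a 4-bit hypercube, splits them by parity, and checks that a single bit flip always moves an even-parity word into the odd-parity set. The function names are hypothetical, and no particular assignment of nucleotides to bit patterns is assumed here; Mac Dónaill (2002) gives the actual assignment.

```python
from itertools import combinations, product

def parity(word):
    """Return 0 for an even number of 1 bits, 1 for an odd number."""
    return sum(word) % 2

def hamming(a, b):
    """Number of positions where two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def flip(word, i):
    """Return a copy of word with bit i inverted."""
    return tuple(bit ^ 1 if j == i else bit for j, bit in enumerate(word))

# All 16 vertices of the 4-bit hypercube, split by parity.
vertices = list(product((0, 1), repeat=4))
even_words = [w for w in vertices if parity(w) == 0]
odd_words = [w for w in vertices if parity(w) == 1]

# Smallest distance between two distinct even-parity words.
min_even_distance = min(hamming(a, b) for a, b in combinations(even_words, 2))

# Every single-bit error turns an even-parity word into an odd-parity word,
# so it can be detected (though not corrected).
single_errors_detected = all(
    parity(flip(w, i)) == 1 for w in even_words for i in range(4)
)

print(len(even_words), len(odd_words), min_even_distance, single_errors_detected)
```

Because distinct even-parity words differ in at least two positions, an alphabet drawn only from one parity class cannot be turned into another valid symbol by a single bit error.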
4. Protein: Degeneracy of the genetic code
http://www.people.virginia.edu/rjh9u/code.html
B. Hayes, on how quickly a biochemical puzzle
was reduced to an abstract problem in symbol
manipulation: B. Hayes, The Invention of the
Genetic Code, American Scientist, 1998. (Theory:
physicist George Gamow and coding theorist
Solomon W. Golomb. Experimental evidence:
Marshall W. Nirenberg and J. Heinrich Matthaei,
NIH.)
5. Protein: Information theory and binding sites
T. D. Schneider, Strong minor groove base
conservation in sequence logos implies DNA
distortion or base flipping during replication
and transcription initiation, Nucleic Acids
Research, 2001, Vol. 29, No. 23, 4881-4891
6. Genome: Increased length, increased fidelity
- Mutation Rates
  - RNA viruses: 1 to 0.1
  - DNA microbes: 1/300
  - Higher eukaryotes: 1/300 EfGn
Figure: Comparison of microbial genome base
mutation rate to genome size exhibits power-law
behavior: an inverse relation between genome
size and base mutation rate.
7. G. Battail: increasing the codeword length
results in a decreasing probability of error
Figure: Comparison of higher eukaryotic genome
base mutation rate to genome size: an inverse
relation between genome size and base mutation
rate.
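As one simple illustration of how longer codewords can lower the probability of error, the sketch below computes the decoding error probability of an n-fold repetition code under majority voting on a binary symmetric channel. The repetition code, the channel model, and the flip probability p = 0.1 are assumptions chosen for illustration; they are not Battail's construction and are not fitted to the genomic data on these slides.

```python
from math import comb

def majority_error_probability(n, p):
    """Probability that majority voting over n copies (n odd) decodes wrongly
    when each copy is flipped independently with probability p."""
    if n % 2 == 0:
        raise ValueError("n must be odd so that the vote cannot tie")
    # Decoding fails when more than half of the n copies are flipped.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

p = 0.1                      # assumed per-symbol error probability
lengths = [1, 3, 5, 7, 9]    # codeword lengths to compare
errors = [majority_error_probability(n, p) for n in lengths]

for n, e in zip(lengths, errors):
    print(f"n = {n}: P(error) = {e:.6f}")
```

A repetition code buys this reliability by lowering its rate to 1/n, so it is only the simplest case of the trade-off; it is used here because the calculation is short and exact.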
8. Evidence: Is there evidence of error control
in the protein synthesis process?
- Liebovitch et al. 1996, Rosen and Moore 2003:
computational experiments did not find evidence
for linear block codes
- Approach not comprehensive; did not consider
convolutional coding or noise
- May et al.: looked for an optimal generator for
translation initiation sites
- Highly probable that the encoding model does
not conform to known error control codes.
10. Central Dogma of Genetics: Genetic
Information Transmission
Figure labels: A, Encode, (eukaryotes), Channel,
Decode, B
(http://www-stat.stanford.edu/susan/courses/s166/central.gif)
11. Coding Theory Models of Protein Synthesis
- Gatlin, L. L., Information Theory and the
Living System, 1972.
- Yockey, Hubert, Information Theory and
Molecular Biology, 1992.
12. Coding Theory View of Protein Synthesis (May
et al., JFI 2004)
Figure labels: Genetic Information, Genetic
Encoder, Genetic Channel, Errors, Genetic
Decoder; mRNA with AUG, UAA, and the 3' end
marked.
Principal Hypothesis: If mRNA is viewed as a
noisy encoded signal, it is feasible to use
principles of error control coding theory to
interpret the genetic translation initiation
mechanism.
13. Engineering Communication System
Figure: A, Error Control Encoder, Channel,
Decoder, B
- k-bit information at A: 1-0-0-1
- Encoder output (n-bit information):
111-000-000-111
- Channel output (noise + n-bit information):
111-001-000-110 (Errors!)
- Decoder: 111-001-000-110 to 1-0-0-1
- k-bit information at B: 1-0-0-1
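The encoder, channel, and decoder values on this slide can be reproduced with a minimal sketch. The slide does not name the code; its bit patterns are consistent with a rate-1/3 repetition code decoded by majority vote, which is what is assumed here. The function names are hypothetical.

```python
def encode(bits, n=3):
    """Repeat each information bit n times (rate 1/n repetition code)."""
    return [[b] * n for b in bits]

def decode(blocks):
    """Majority-vote each received block back to one information bit."""
    return [1 if sum(block) > len(block) / 2 else 0 for block in blocks]

def parse(text):
    """Turn a string such as '111-001' into [[1, 1, 1], [0, 0, 1]]."""
    return [[int(c) for c in group] for group in text.split("-")]

message = [1, 0, 0, 1]                 # k-bit information at A
sent = encode(message)                 # 111-000-000-111
received = parse("111-001-000-110")    # channel output shown on the slide

# Count the channel errors by comparing sent and received bit by bit.
channel_errors = sum(
    s != r
    for sent_block, received_block in zip(sent, received)
    for s, r in zip(sent_block, received_block)
)

decoded = decode(received)             # k-bit information at B
print(decoded, channel_errors)
```

Each 3-bit block tolerates one flipped bit, which is why the two channel errors on the slide, falling in different blocks, do not corrupt the decoded message.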
14. Engineering Communication System
Figure: A, Error Control Encoder, Channel,
Decoder (????), B
- k-bit information at A: 1-0-0-1
- Encoder output (n-bit information):
111-000-000-111
- Channel output (noise + n-bit information):
111-001-000-110 (Errors!)
- Decoder (????): 111-001-000-110 to 1-0-0-1
- k-bit information at B: 1-0-0-1
16. Biological Coding Theory
- David Loewenstern et al.
  - Compression for DNA sequence classification
- Leonard Adleman et al.; Lila Kari et al.
  - Molecular computation
  - Encoding for DNA computing
  - Error-control coding
- Thomas Schneider et al.
  - Biological information theory
  - Error-control via sphere packing
Error-Control Coding Based Methods
- Efficient Coding for the Desoxyribonucleic
Channel (S. W. Golomb, 1962)
  - Applied biorthogonal codes to the genetic
  coding problem (the codon to amino acid mapping
  challenge)
- Andrzej K. Konopka (1984)
- Gerard Battail
- Table-Based Convolutional Code for E. coli
Promoter (P. Bermel)
  - Based on the informational content of the
  E. coli promoter, approximates the coding rate
  for the promoter region as 1/9.
  - Developed a possible 1/5 binary code for the
  E. coli promoter region.
17. Coding Theory in RBS Classification
Figure region labels: DB, NRD, AUG, SD
Horizontal axis is position relative to the first
base of the initiation codon. Vertical axis is
the mean of the aligned minimum Hamming distance
values by position, for the 3 sequence groups.
(Hamming distance: the number of positions where
two vectors differ.)
May et al., BioSystems 2004
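The quantity on the vertical axis can be sketched as follows: for each aligned sequence, slide a window along it, take the minimum Hamming distance from the window to any codeword, and then average those minima position by position across the sequences. This is only an illustration of that calculation; the codebook and the three sequences below are made up, and the sketch is not the procedure or data of May et al.

```python
def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def min_distance_profile(seq, codebook):
    """Minimum Hamming distance to any codeword, for each window start."""
    width = len(codebook[0])
    return [
        min(hamming(seq[i:i + width], word) for word in codebook)
        for i in range(len(seq) - width + 1)
    ]

def mean_profile(seqs, codebook):
    """Average the per-sequence profiles position by position."""
    profiles = [min_distance_profile(s, codebook) for s in seqs]
    return [sum(column) / len(column) for column in zip(*profiles)]

# Hypothetical codewords and aligned sequences (each ends at the AUG codon).
codebook = ["AGGAG", "GGAGG"]
aligned = ["UUAGGAGGUAUG", "CCAGGAGCUAUG", "ACGUACGUAAUG"]

profile = mean_profile(aligned, codebook)
for position, value in enumerate(profile):
    print(position, round(value, 3))
```

Positions where the mean minimum distance dips mark regions in which the sequences lie close to some codeword, which is the kind of feature the plot uses to separate sequence groups.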
18. Coding Theory in RBS Classification
19. Coding Theory and Molecular Computation
- Leonard M. Adleman et al.; Lila Kari et al.
  - Molecular computation
  - Encoding for DNA computing
  - Error-control coding
- M. Stojanovic and D. Stefanovic, A
deoxyribozyme-based molecular automaton, Nature
Biotech. 2003
- Can achieve computational robustness using
coding theory
Figure label: ligase
http://www.scs.uiuc.edu/scott/index_files/ligation.gif
20. Construction and control: Quantify and
Optimize Protein Translation
Figure labels: 5' end, Initiation Factors
- Phases of translation: initiation, elongation,
termination
- Initiation is the most time-consuming phase and
affects the overall gene expression level
- A qualitative outline of the initiation process
exists:
  1) 30S subunit and IFs bind to mRNA and
  fMet-tRNA
  2) Ternary complex binds 50S subunit
  3) IFs released prior to elongation
- mRNA is the only variable aspect of translation
initiation.
- Information encoded in mRNA determines
specificity and efficiency
21. Construction and control: Quantify and
Optimize Protein Translation
Figure labels: mRNA Leader Region (UTR), with 5'
and 3' ends marked; Non-random domain; Ribosome
Binding Site; initiation codons AUG, GUG, UUG;
Downstream box
Pairing sequence: 3'..AUUCCUCCACUAG. 5'
Modify E. coli Intergenic
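The sequence on this slide is written 3' to 5', so an mRNA segment that pairs with it is its base-by-base Watson-Crick complement read 5' to 3'. The sketch below derives that complement and scans a leader for the best-matching 6-base stretch. It assumes the slide's sequence is the ribosomal strand that pairs with the mRNA leader (the slide does not label it), it ignores G-U wobble pairing, and the leader sequence and function names are hypothetical.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def pairing_partner(three_to_five):
    """Complement a strand written 3'->5'; the result reads 5'->3'."""
    return "".join(PAIR[base] for base in three_to_five)

def best_match(leader, motif, width=6):
    """Compare every width-long piece of motif with every window of leader.
    Return (matches, leader_position, piece) for the first best placement."""
    best = (-1, -1, "")
    for m in range(len(motif) - width + 1):
        piece = motif[m:m + width]
        for i in range(len(leader) - width + 1):
            score = sum(x == y for x, y in zip(leader[i:i + width], piece))
            if score > best[0]:
                best = (score, i, piece)
    return best

rrna_tail = "AUUCCUCCACUAG"          # from the slide, written 3'->5'
motif = pairing_partner(rrna_tail)   # mRNA-side sequence, 5'->3'
leader = "GCAUUAAGGAGGUAUACCAUG"     # hypothetical leader ending in AUG

result = best_match(leader, motif)
print(motif, result)
```

Scoring exact matches over a fixed window is the crudest possible measure of binding strength; a free-energy model would be the natural refinement if the aim is to quantify and optimize initiation.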
22. Acknowledgments
- Collaborators
  - NCSU: Mladen Vouk, Donald Bitzer, Winser
  Alexander, and Ann Stomp
  - SNL: Anna Johnston, William Hart, Jean-Paul
  Watson, Richard Pryor
  - NIEHS: John Drake (mutagenesis data)
- Support
  - SNL Tier 1 Seniors Council LDRD/DOE
  - NSF, Ford Foundation