Title: Mathematical principles underlying genetic structures
1Mathematical principles underlying genetic
structures
2Overview
- Mathematics of DNA
- The DNA code
- Number of bases, number of amino acids
- Coding properties
- Evolution of sequences
- Evolution of introns, alternative splicing
- Non-coding DNA what is it good for?
- Mathematics for analysing DNA
- Mutual information
- Correlations
3DNA primer
- Bases with complementary pairing
- Cytosine with Guanine
- Adenine with Thymine
- Triplets code for 20 amino acids (includes start codon) plus a stop codon, with some degeneracy (4^3 = 64 > 21).
- Other sequences: binding sites, introns, telomeres, junk DNA.
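The degeneracy figure is a counting argument, which can be sketched in a few lines of Python. The function name `min_codon_length` is illustrative and not from the slides; the only assumption is that 21 meanings (20 amino acids plus one stop signal) must be encoded with a 4-letter alphabet.

```python
from itertools import product

BASES = "ACGT"

def min_codon_length(n_meanings, alphabet_size=len(BASES)):
    """Smallest word length n such that alphabet_size**n >= n_meanings."""
    n = 1
    while alphabet_size ** n < n_meanings:
        n += 1
    return n

# 20 amino acids + 1 stop signal = 21 meanings to encode
n = min_codon_length(21)
codons = ["".join(p) for p in product(BASES, repeat=n)]
print(n, len(codons))  # 3 64
```

Doublets give only 4^2 = 16 words, which is too few, so triplets are the shortest fixed-length code that fits, and the 64 - 21 = 43 spare codons are where the degeneracy comes from.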
4Eukaryotic gene
http://web.indstate.edu/thcme/mwking/gene-regulation.html
5Types of mutations (1)
- Insertions
- attgcctgggtgc -> attcgcctgggtgcc
- I A W V -> I R L G A
- Point mutations
- attgcctgggtgc -> attacctgggtgc
- I A W V -> I T W V
- Deletions
- attgcctgggtgc -> attcctgggtgc
- I A W V -> I P G C
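The amino-acid strings above can be checked by translating each sequence with the standard genetic code. The following is a minimal sketch: the `translate` helper and the table layout are illustrative, and a trailing incomplete codon is simply ignored, which is an assumption about how the slide treats the final base.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third position varies fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate DNA in reading frame 0, dropping any incomplete last codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

print(translate("attgcctgggtgc"))    # IAWV  (original)
print(translate("attcgcctgggtgcc"))  # IRLGA (insertion: frame shift)
print(translate("attacctgggtgc"))    # ITWV  (point mutation: one residue changes)
print(translate("attcctgggtgc"))     # IPGC  (deletion: frame shift)
```

The point mutation changes one residue, while the insertion and the deletion shift the reading frame and change every residue downstream.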
M. Spanò, F. Lillo, S. Miccichè and R. N. Mantegna, Inverted repeats in viral genomes, Fluctuations and Noise Letters, 5(2):L193-L200 (2005).
http://www.ebi.ac.uk/2can/disease/genes5.html
6Types of mutations (2)
- Gene duplication
- attgcctgggtac -> attgcctgggttgcctgggtac
- Repeating elements
- attgcctgggtac -> attgtcttcttcttctcctgggtac
- Flips
- attgcctgggtac -> attgggtccgtac
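These three operations are simple string edits, sketched below in Python. The function names and the cut positions are assumptions chosen only to reproduce the strings on the slide; note that the "flip" shown there reverses the segment in place and does not take the reverse complement.

```python
def duplicate(seq, start, end):
    """Copy seq[start:end] and insert the copy directly after the original."""
    return seq[:end] + seq[start:end] + seq[end:]

def insert_repeat(seq, pos, unit, copies):
    """Insert `copies` tandem repeats of `unit` at position `pos`."""
    return seq[:pos] + unit * copies + seq[pos:]

def flip(seq, start, end):
    """Reverse seq[start:end] in place (plain reversal, as on the slide)."""
    return seq[:start] + seq[start:end][::-1] + seq[end:]

s = "attgcctgggtac"
print(duplicate(s, 1, 10))           # attgcctgggttgcctgggtac
print(insert_repeat(s, 4, "tct", 4)) # attgtcttcttcttctcctgggtac
print(flip(s, 3, 10))                # attgggtccgtac
```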
http://www.ebi.ac.uk/2can/disease/genes5.html
7Types of mutations (3)
- Recombination
- Find sites based on short target sequences; insert different genes / parts of genes.
- Gene/operon shuffles
- b0357 b0358 b0359 b0362 b0363 -> b0362 b0363 b0357 b0358 b0359
http://www.bio.davidson.edu/courses/movies.html
M.D. Ermolaeva, O. White and S.L. Salzberg, Prediction of operons in microbial genomes, Nucleic Acids Res. 29(5):1216-1221 (2001).
9Optimal number of amino acids
M.A. Soto and C.J. Tohá, A hardware interpretation of the evolution of the genetic code, Biosystems, 18:209-215 (1985).
10Optimal number of amino acids
11Optimal no. of bases and amino acids
12Doublet ancestor code
A. Patel, The Triplet Genetic Code had a Doublet Predecessor, arXiv:q-bio.GN/0403036
14Gray code
- A Gray code is an encoding of numbers so that adjacent numbers have a single digit differing by 1.
- List all strings of bits of a given length in a sequence such that each string differs from its successor in only a single bit position.
- Thus the essential property is one of minimal change between the neighbouring bit strings.
- In DNA terms, if we change a codon by a single mutation we expect to end up mapping to the same, or a similar, amino acid.
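The single-bit-change property can be illustrated with the binary reflected Gray code, one common construction. This is a minimal sketch in Python with illustrative helper names; it shows the bit-string property only and is not the codon assignment discussed in the cited paper.

```python
def gray(k):
    """k-th word of the binary reflected Gray code, as an integer."""
    return k ^ (k >> 1)

def gray_sequence(n_bits):
    """All n_bits-long bit strings in Gray-code order."""
    return [format(gray(k), f"0{n_bits}b") for k in range(2 ** n_bits)]

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

words = gray_sequence(3)
print(words)  # ['000', '001', '011', '010', '110', '111', '101', '100']

# Every neighbouring pair differs in exactly one bit position.
print(all(hamming(a, b) == 1 for a, b in zip(words, words[1:])))  # True
```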
R. Swanson, A unifying concept for the amino acid code, Bulletin of Mathematical Biology, 46(2):187-203 (1984).
15DNA Tower of Hanoi
A Gray coding-based assignment of codons to amino acids can be formulated as the traveling salesman problem [1], which is analogous to solving the Tower of Hanoi game.
[1] D. Bošnački, H.M.M. ten Eikelder, P.A.J. Hilbers, Genetic code as a Gray code revisited, The 2003 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences.
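As background, one well-known link between Gray codes and the Tower of Hanoi is that the bit flipped at each step of the binary reflected Gray code names the disk moved at the same step of the optimal Hanoi solution. The Python sketch below checks that correspondence; it is offered as an illustration of the analogy, not as the construction used in [1], and the function names are illustrative.

```python
def gray(k):
    """k-th word of the binary reflected Gray code, as an integer."""
    return k ^ (k >> 1)

def hanoi_disks(n):
    """Disk moved at each step of the optimal n-disk solution (1 = smallest)."""
    if n == 0:
        return []
    smaller = hanoi_disks(n - 1)
    # Move the n-1 smaller disks away, move disk n, move the smaller disks back.
    return smaller + [n] + smaller

def gray_flipped_bits(n):
    """Position (1 = least significant) of the bit changed at each Gray-code step."""
    return [(gray(k) ^ gray(k - 1)).bit_length() for k in range(1, 2 ** n)]

print(hanoi_disks(3))        # [1, 2, 1, 3, 1, 2, 1]
print(gray_flipped_bits(3))  # [1, 2, 1, 3, 1, 2, 1]
```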
23Gene for infidelity
0 = fidelity, 1 = infidelity
24Gene for infidelity
25Information theory
- Entropy
- Mutual information
- Chaitin-Kolmogorov entropy: the algorithmic complexity, i.e. the length of the shortest program that reproduces the original information. It depends on the available software libraries; in the genetic analogy, the set of genes (routines) and proteins (the running operating system).
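The difference between Shannon entropy and algorithmic complexity can be seen on toy sequences. In the Python sketch below the sequences are synthetic, the helper names are illustrative, and the zlib-compressed length is used only as a crude, computable upper-bound proxy for algorithmic complexity, which itself is uncomputable.

```python
import math
import random
import zlib
from collections import Counter

def shannon_entropy(seq):
    """Entropy in bits per symbol of the single-symbol frequency distribution."""
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())

def compressed_size(seq):
    """Bytes after zlib compression: a rough stand-in for description length."""
    return len(zlib.compress(seq.encode("ascii"), 9))

rng = random.Random(0)  # fixed seed so the example is repeatable
periodic = "acgt" * 250
shuffled = "".join(rng.choice("acgt") for _ in range(1000))

# Both use the four bases about equally often, so per-symbol entropy is
# close to 2 bits in each case.
print(shannon_entropy(periodic), shannon_entropy(shuffled))

# The periodic sequence has a far shorter description than the random one.
print(compressed_size(periodic), compressed_size(shuffled))
```

Single-symbol entropy cannot tell the two sequences apart, while the compressed length can, which is the sense in which the shortest-program view captures structure that base frequencies miss.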
26Evolution of introns
- Alternative splicing as a form of compression
- subfunction1 subfunction2 subfunction3; subfunction1 subfunction3
- subfunction1 subfunction2 subfunction3
- Advantages: higher information content -> more complexity -> more adaptable.
- Trouble: how to control this in an efficient, error-free manner, with the required flexibility.
- actggggcttaa
- Introns allow us/cell machinery to efficiently mark the start and end of subfunctions/exons.
- actgtagggggctgtagtaa
- Efficient because we can use the same splicing machinery throughout.
- We can now insert extra information into the genome, e.g. "if nerve cell, include the following exon", and generate novelty, e.g. include exons which can vary with mutation of intron length, or include coded sites for recombination.
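The toy sequences above can be turned into a small splicing example. The Python sketch below assumes that the four-letter string "gtag" stands in for a minimal intron (real introns typically begin with GT and end with AG, with much more in between); the function names are illustrative.

```python
MARKER = "gtag"  # toy intron: a 'gt' start signal immediately followed by 'ag'

def exons(pre_mrna, marker=MARKER):
    """Split a marked-up sequence into its exons."""
    return pre_mrna.split(marker)

def splice(pre_mrna, keep=None, marker=MARKER):
    """Join the exons whose indices are in `keep` (default: all of them)."""
    parts = exons(pre_mrna, marker)
    if keep is None:
        keep = range(len(parts))
    return "".join(parts[i] for i in keep)

marked = "actgtagggggctgtagtaa"
print(exons(marked))                # ['act', 'ggggct', 'taa']
print(splice(marked))               # actggggcttaa  (all exons)
print(splice(marked, keep=[0, 2]))  # acttaa        (middle exon skipped)
```

One marked-up sequence yields two products, which is the compression idea: the shared exons are stored once and the same splicing rule is reused for every boundary.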
M.V. Bell, A.E. Cowper, M.-P. Lefranc, J.I. Bell and G.R. Screaton, Influence of intron length on alternative splicing of CD44, Molecular and Cellular Biology 18(10):5930-594 (1998).
27Non-coding DNA
- We know that, ignoring binding sites, non-coding DNA can fairly easily turn into coding DNA through point mutations, and vice versa.
- Other observations point the same way: the splicing that goes on through sexual reproduction, and somatic mutations in genes "for" MHC complexes, which show that mutation rates can be controlled. Together these suggest that non-coding DNA may actually be a recycling ground for de novo creation of new genes.
- If this were the case, then species which "had to" undergo rapid mutation (i.e. in which non-coding DNA was selected for by virtue of its ability to generate new genes that better equipped the species, thus gaining a higher game theory score than others) would be expected to carry around more non-coding DNA, "just in case" they needed to generate new genes again.
28Correlation and mutual information
29Mutual information - real DNA vs. random sequence
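A comparison of this kind is usually based on the mutual information between bases a fixed distance k apart. The Python sketch below assumes that definition and uses synthetic sequences only (a periodic toy sequence and a seeded random one), since the data behind the original slide is not available; the function name is illustrative.

```python
import math
import random
from collections import Counter

def mutual_information(seq, k):
    """Mutual information (bits) between symbols k positions apart in seq."""
    pairs = Counter(zip(seq, seq[k:]))
    n = sum(pairs.values())
    left, right = Counter(), Counter()
    for (a, b), c in pairs.items():
        left[a] += c   # marginal counts of the first symbol
        right[b] += c  # marginal counts of the second symbol
    # sum over pairs of p(a,b) * log2( p(a,b) / (p(a) p(b)) )
    return sum(c / n * math.log2(c * n / (left[a] * right[b]))
               for (a, b), c in pairs.items())

rng = random.Random(0)  # fixed seed so the example is repeatable
structured = "acgt" * 1000
shuffled = "".join(rng.choice("acgt") for _ in range(4000))

for k in (1, 2, 3):
    print(k, mutual_information(structured, k), mutual_information(shuffled, k))
```

For the periodic sequence each base fully determines its neighbour, so the value stays near 2 bits at every k, while for the random sequence it stays near zero apart from a small positive finite-sample bias.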
30Genomic Signal Processing
31Conclusions
- Mathematics shows us which solutions to evolutionary problems are optimal, and therefore what to expect:
- 4 bases, 20 amino acids.
- Properties of the genetic code.
- Frequencies of genes in a population.
- Genes that allow us to increase the complexity and make systems that are more adaptable (alternative splicing, introns).
- Mathematics allows us to analyse the patterns and structures that form:
- Correlation detection
- Mutual information
- Signal processing
32Evolution directing evolution
- "Not only has life evolved but life has evolved to evolve. That is, correlations within protein structure have evolved, and mechanisms to manipulate these correlations have evolved in tandem. The rates at which the various events within the hierarchy of evolutionary moves occur are not random or arbitrary but are selected by Darwinian evolution."
D.J. Earl and M.W. Deem, PNAS 101(32):11531-11536 (2004)
33Examples
- Vertebrate immune system.
- Bacteria under nutrient starvation conditions.
- Centromere proteins in animals and humans.
- Cytochrome P450 genes.
- CYP19 (cancer)
- CYP2C9 (blood coagulation)