Codons, Genes and Networks - PowerPoint PPT Presentation

1
Codons, Genes and Networks
  • Bioinformatics service
  • Math_at_Bio group of M. Gromov

Andrei Zinovyev
2
Plan of the talk
  • Part I: 7-cluster structure of the genome (codons and genes)
  • Part II: Coding and non-coding DNA scaling laws (genes and networks)

3
Part I: 7-cluster genome structure
  • Dr. Tatyana Popova, R&D Centre in Biberach, Germany
  • Prof. Alexander Gorban, Centre for Mathematical Modelling

4
Genomic sequence as a text in unknown language
..cgtggtgagctgatgctagggacgcacgtggtgagctgatgctaggga
cgacgtggtgagctgatgctagggacgc
5
From text to geometry
cgtggtgagctgatgctagggacgcacgtggtgagctgatgctagggacg
acgtggtgagctgatgctagggacgc 107 cgtggtgagctgatgc
tagggacgcac ggtgagctgatgctagggacgcacact tgagctgatg
ctagggacgcacaattc gtgagctgatgctagggacgcacggtg
gagctgatgctagggacgcacaagtga
Fragment length: 200-400
10000-20000 fragments
$R^N$
6
Method of visualization: principal components analysis
$R^N$
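The two slides above (cut the sequence into fragments, map each fragment to a point in $R^N$, view the cloud with principal components analysis) can be sketched roughly as follows. This is only an illustration under stated assumptions: N = 64 non-overlapping triplet frequencies, a window of 300 letters and a step of 150 are choices made here, and the function names are hypothetical. NumPy is assumed to be available.

```python
import itertools
import numpy as np

# All 64 triplets in a fixed order; each one is a coordinate axis in R^64.
TRIPLETS = ["".join(t) for t in itertools.product("acgt", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}

def triplet_frequencies(fragment):
    """Frequencies of non-overlapping triplets read from the fragment start."""
    counts = np.zeros(64)
    for pos in range(0, len(fragment) - 2, 3):
        idx = INDEX.get(fragment[pos:pos + 3])
        if idx is not None:          # skip triplets with ambiguous letters
            counts[idx] += 1
    total = counts.sum()
    return counts / total if total else counts

def fragment_cloud(sequence, width=300, step=150):
    """Slide a window over the sequence: one 64-dimensional point per fragment."""
    seq = sequence.lower()
    starts = range(0, len(seq) - width + 1, step)
    return np.array([triplet_frequencies(seq[s:s + width]) for s in starts])

def pca_project(points, n_components=2):
    """Project centred points onto their first principal components (via SVD)."""
    centred = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

# Toy usage on a random sequence (illustration only, not genomic data).
rng = np.random.default_rng(0)
toy = "".join(rng.choice(list("acgt"), size=6000))
cloud = fragment_cloud(toy)
projection = pca_project(cloud)
```

On a random sequence the projection shows no structure; the cluster pattern discussed in the talk is a property of real genomes with coding regions.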
7
Caulobacter crescentus
8
First explanation
cgtggtgagctgatgctagggrcgcacgtggtgagctgatgctagggrcg
acgtggtgagctgatgctagggrcgc
9
Basic 7-cluster structure
gtgagctgatgctagggrcgcacgtggtgagc
10
Non-coding parts
Point mutations: insertions, deletions
a
gtgagctgatgctagggr cgcacgaat
11
The flower-like 7-cluster structure is flat
12
Seven classes vs Seven clusters
Georgia Institute of Technology
Stanford
TIGR
Lomsadze A., Ter-Hovhannisyan V., Chernoff Y.O., Borodovsky M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Research, 2005, Vol. 33, No. 20.
Hong-Yu Ou, Feng-Biao Guo and Chun-Ting Zhang (2003). Analysis of nucleotide distribution in the genome of Streptomyces coelicolor A3(2) using the Z curve method. FEBS Letters 540(1-3), 188-194.
Audic S. and Claverie J. Self-identification of protein-coding regions in microbial genomes. Proc Natl Acad Sci U S A, 95(17):10026-31, 1998.
13
Computational gene prediction
14
Mean-field approximation for triplet frequencies
$F_{IJK}$: frequency of triplet $IJK$ ($I, J, K \in \{A, C, G, T\}$): $F_{AAA}, F_{AAT}, F_{AAC}, \ldots, F_{GGC}, F_{GGG}$
64 numbers = position-specific letter frequencies (12 numbers) + correlations
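A minimal sketch of the decomposition on this slide, assuming the mean-field part is the product of the three position-specific letter frequencies and the correlations are whatever remains. The function names and the toy codon list are hypothetical and serve only as illustration.

```python
from collections import Counter
from itertools import product

LETTERS = "acgt"

def codon_frequencies(codons):
    """Observed triplet frequencies F_IJK (64 numbers) from in-frame codons."""
    counts = Counter(codons)
    total = sum(counts.values())
    return {"".join(t): counts["".join(t)] / total
            for t in product(LETTERS, repeat=3)}

def position_frequencies(freqs):
    """Twelve numbers: frequency of each letter at codon positions 1, 2 and 3."""
    pos = [dict.fromkeys(LETTERS, 0.0) for _ in range(3)]
    for triplet, f in freqs.items():
        for k, letter in enumerate(triplet):
            pos[k][letter] += f
    return pos

def mean_field(pos):
    """Mean-field estimate: product of the three position-specific frequencies."""
    return {i + j + k: pos[0][i] * pos[1][j] * pos[2][k]
            for i, j, k in product(LETTERS, repeat=3)}

# Toy usage (eight codons, illustration only).
codons = "atg gat gct agg gtc gca cgc taa".split()
F = codon_frequencies(codons)
P = position_frequencies(F)
M = mean_field(P)
correlations = {t: F[t] - M[t] for t in F}   # the part the 12 numbers miss
```

Both `F` and `M` sum to one, so the correlation terms sum to zero; they only redistribute frequency among the 64 triplets.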
15
Why hexagonal symmetry?
GC-content = $P_C + P_G$
16
Genome codon usage and mean-field approximation
correct frameshift

ggtgaATG gat gct agg gtc gca cgc TAAtgagct
64 frequencies $F_{IJK}$
17
$P^i_J$ are linear functions of GC-content
eubacteria
archaea
18
THE MYSTERY OF TWO STRAIGHT LINES ???
$R^{64}$
$R^{12}$
$F_{IJK} = P^1_I P^2_J P^3_K + \text{correlations}$
19
Codon usage signature
20
19 possible eubacterial signatures
21
Example: palindromic signatures
22
Four symmetry types of the basic 7-cluster structure
23
S. coelicolor (GC 72%)
24
Using branching principal components to analyze
7-clusters genome structures
25
Using branching principal components to analyze
7-clusters genome structures
Streptomyces coelicolor
Fusobacterium nucleatum
Bacillus halodurans
Escherichia coli
26
Web-site
cluster structures in genomic sequences
http://www.ihes.fr/zinovyev/7clusters
27
Papers (type Zinovyev in Google)
  • Gorban A, Zinovyev A. PCA deciphers genome. 2005. Arxiv preprint.
  • Gorban A, Popova T, Zinovyev A. Codon usage trajectories and 7-cluster structure of 143 complete bacterial genomic sequences. 2005. Physica A 353, 365-387.
  • Gorban A, Popova T, Zinovyev A. Four basic symmetry types in the universal 7-cluster structure of microbial genomic sequences. 2005. In Silico Biology 5, 0025.
  • Gorban A, Zinovyev A, Popova T. Seven clusters in genomic triplet distributions. 2003. In Silico Biology 3, 0039.
  • Zinovyev A, Gorban A, Popova T. Self-Organizing Approach for Automated Gene Identification. 2003. Open Systems and Information Dynamics 10 (4).
28
Part II: Coding and non-coding DNA scaling laws
  • Dr. Sebastian Ahnert, Cavendish Laboratory, University of Cambridge
  • Dr. Thomas Fink, Bioinformatics service
29
C-value and G-value paradox
  • Neither genome length nor gene number accounts for the complexity of an organism
  • Drosophila melanogaster (fruit fly): C = 120 Mb
  • Podisma pedestris (mountain grasshopper): C = 1650 Mb

30
Non-linear growth of regulation
The amount of regulation scales non-linearly with the number of genes: every new gene with a new function requires specific regulation, but the regulators also need to be regulated.
[Plot: log number of regulatory genes vs. log number of genes, for bacteria and archaea; slopes 1 and 1.96 marked]
Mattick, J. S. Nature Reviews Genetics 5, 316-323 (2004).
31
Complexity ceiling for prokaryotes
  • Adding a new function $\Delta S$ requires adding a regulatory overhead $\Delta R$; the total increase is
  • $\Delta N = \Delta R + \Delta S$
  • Since $R \sim N^2$, at some point $\Delta R > \Delta S$,
  • i.e. the gain from a new function is too expensive for an organism: it requires too much regulation to be integrated

There is a maximum possible genome length for prokaryotes (10 Mb)
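The step from the quadratic scaling to the ceiling can be made explicit with a short worked derivation. It is a sketch under two assumptions that the slide does not state: that $N = R + S$ holds for the totals as well as for the increments, and that $R = \kappa N^2$ exactly, where $\kappa$ is a hypothetical constant introduced here.

```latex
% Assumptions: N = R + S, and R = \kappa N^2 with a hypothetical constant \kappa.
\begin{align*}
\Delta R &\approx \frac{dR}{dN}\,\Delta N = 2\kappa N\,\Delta N, \\
\Delta S &= \Delta N - \Delta R \approx (1 - 2\kappa N)\,\Delta N, \\
\Delta R > \Delta S &\iff 2\kappa N > 1 - 2\kappa N \iff N > \frac{1}{4\kappa}, \\
\Delta S \to 0 &\quad\text{as}\quad N \to \frac{1}{2\kappa}.
\end{align*}
```

Under these assumptions the regulatory overhead exceeds the functional gain once $N > 1/(4\kappa)$, and no new function can be added at all near $N = 1/(2\kappa)$, which is one way to read the ceiling.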
32
How did eukaryotes bypass this limitation?
  • Presumably, they invented a cheaper (digital)
    regulatory system, based on RNA
  • This regulatory information is stored in the
    non-coding DNA

33
Simple model: accelerated networks
A node is a gene (c genes); an edge is a regulation (n edges)
$n = a c^2$
Connectivity $> k_{max}$: the deficit of regulations is taken from non-coding DNA
Connectivity $< k_{max}$: regulators are only proteins
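A rough sketch of the bookkeeping this model implies. The slide does not define connectivity, so the code assumes it is n / c = a * c, with the ceiling k_max = a * c_max reached at c_max genes; the function name and the numbers in the usage lines are hypothetical.

```python
def regulation_budget(c, a, c_max):
    """Split the n = a * c**2 regulations of a c-gene network by their source.

    Assumption (not stated on the slide): connectivity is n / c = a * c,
    so the ceiling k_max = a * c_max is reached at c = c_max genes.
    """
    n = a * c ** 2                  # total regulations required
    k = n / c                       # regulations per gene (connectivity)
    k_max = a * c_max               # most that protein regulators can supply
    protein = min(k, k_max) * c     # regulations carried by proteins
    deficit = n - protein           # remainder, taken from non-coding DNA
    return {"n": n, "k": k, "protein": protein, "deficit": deficit}

# Toy usage with made-up parameters (illustration only).
below = regulation_budget(c=500, a=0.001, c_max=1000)
above = regulation_budget(c=2000, a=0.001, c_max=1000)
```

Below the ceiling the deficit is zero; above it the deficit grows as a * c * (c - c_max), i.e. quadratically in the number of genes.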
34
How much regulation does the genome need to take from non-coding DNA?
$c_{max}$ (prokaryotic ceiling)
These regulations must be encoded in the non-coding part of the genome, therefore
$N$: non-coding DNA length
$C$: coding DNA length
$C_{prok}$: ceiling for prokaryotes (10 Mb)
$b$: some coefficient
35
Observation: coding length vs non-coding
$b = 1$
Minimum non-coding length needed for the deficit regulation
36
Hypothesis
  • Prokaryotes
  • <Non-coding length> = a <Coding length>
  • a = 5-15% (little constant add-on: promoters, UTRs)
  • 15% ≈ 1/7
  • Eukaryotes
  • $N_{reg} = \frac{b}{2}\,\frac{C}{C_{maxprok}}\,(C - C_{maxprok}) \sim C^2$,
  • $C_{maxprok} = 10$ Mb, $b = 1$
  • This is the amount necessary for regulation, but
    repeats, genome parasites, etc., might make a
    genome much bigger

37
This is only a hypothesis, but
  • Prediction of $N_{reg}$ for human
  • $N_{reg}$ = 87 Mb, 3% of genome length
  • $C$ = 48 Mb, 1.7%
  • $N_{reg} + C$ = 4.7%
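The prediction can be checked against the formula on the Hypothesis slide, read here as N_reg = (b/2) (C / C_maxprok) (C - C_maxprok). That reading of the garbled formula is an assumption, as are the function name and the choice to return zero below the ceiling.

```python
def n_reg(c_mb, c_maxprok_mb=10.0, b=1.0):
    """Minimum regulatory non-coding length in Mb, per the Hypothesis slide.

    N_reg = (b / 2) * (C / C_maxprok) * (C - C_maxprok).
    Returning zero below the ceiling is an assumption made here.
    """
    if c_mb <= c_maxprok_mb:
        return 0.0
    return 0.5 * b * (c_mb / c_maxprok_mb) * (c_mb - c_maxprok_mb)

# Human coding length from the slide: C = 48 Mb.
human = n_reg(48.0)
```

With the round values b = 1 and C_maxprok = 10 Mb this evaluates to about 91 Mb, close to the 87 Mb quoted on the slide; the small difference presumably reflects parameter values that are not preserved in this transcript.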

38
Thank you for your attention
  • Questions?