1
Data are not homogenous: lessons from completed
microbial genomes.
  • James O. McInerney,
  • National University of Ireland, Maynooth,
  • Co. Kildare, Ireland.
  • http://www.may.ie/academic/biology/jmbioinformatics.shtml
  • and
  • The Natural History Museum, Cromwell Road, London
    SW7 5BD, UK.

2
We wish to suggest a structure for the salt of
Deoxyribose nucleic acid (DNA). This structure
has novel features which are of considerable
biological interest.
Watson and Crick, Nature, 1953
3
What was the relationship between DNA and
proteins?
  • Nucleotide to amino acid?
  • Doublet code?
  • Triplet code?
  • Intermediate device?

4
Attempted to solve this problem by combinatorics
  • George Gamow (Big Bang theory)
  • F.H.C. Crick

5
Attempted to solve it biochemically
  • Marshall Nirenberg

6
Universal Genetic code
7
Universal Genetic code
8
Universal Genetic code
9
Universal Genetic code
10
The rate of random fixation of neutral mutations
in evolution (per species per generation) is
equal to the rate of occurrence of neutral
mutations (per gamete per generation)
Kimura, M. Nature 217: 624 (1968)
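The standard argument behind this statement can be written in one line. The symbols are notation introduced here, not taken from the slide: N is the size of a diploid population and u is the neutral mutation rate per gamete per generation. Each generation about 2Nu new neutral mutations arise, and each has a probability of 1/(2N) of eventually drifting to fixation, so the population size cancels:

```latex
k \;=\; \underbrace{2Nu}_{\text{new neutral mutations per generation}}
   \times
   \underbrace{\frac{1}{2N}}_{\text{fixation probability of each}}
   \;=\; u
```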
11
As far as is known, synonymous mutations are
truly neutral with respect to natural selection.
King, J.L. and Jukes, T.H. Non-Darwinian
Evolution. Science 164: 788-798 (1969).
12
Evidence that synonymous codons were not all used
with equal frequency
Fiers et al., 1975. A-protein gene of bacteriophage MS2, Nature 256:
273-278

UUU Phe  6    UCU Ser  5    UAU Tyr  4    UGU Cys  0
UUC Phe 10    UCC Ser  6    UAC Tyr 12    UGC Cys  3
UUA Leu  8    UCA Ser  8    UAA Ter       UGA Ter
UUG Leu  6    UCG Ser 10    UAG Ter       UGG Trp 12
CUU Leu  6    CCU Pro  5    CAU His  2    CGU Arg  7
CUC Leu  9    CCC Pro  5    CAC His  3    CGC Arg  6
CUA Leu  5    CCA Pro  4    CAA Gln  9    CGA Arg  6
CUG Leu  2    CCG Pro  3    CAG Gln  9    CGG Arg  3
AUU Ile  1    ACU Thr 11    AAU Asn  2    AGU Ser  4
AUC Ile  8    ACC Thr  5    AAC Asn 15    AGC Ser  3
AUA Ile  7    ACA Thr  5    AAA Lys  5    AGA Arg  3
AUG Met  7    ACG Thr  6    AAG Lys  9    AGG Arg  4
GUU Val  8    GCU Ala  6    GAU Asp  8    GGU Gly 15
GUC Val  7    GCC Ala 12    GAC Asp  5    GGC Gly  6
GUA Val  7    GCA Ala  7    GAA Glu  5    GGA Gly  2
GUG Val  9    GCG Ala 10    GAG Glu 12    GGG Gly  5
13
Multivariate reduction
  • Attempts to reduce a high-dimensional space to a
    lower-dimensional one.
  • In other words, it tries to simplify the data
    set.
  • Many of the variables might co-vary, so there
    might be only one, or a few, sources of
    variation in the dataset
  • A gene can be represented by a 59-dimensional
    vector (universal code)
  • A genome consists of hundreds (thousands) of
    these genes
  • Variation in the variables (RSCU values) might be
    governed by only a small number of factors
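
To make the 59-dimensional vector concrete, here is a minimal Python sketch of how RSCU (relative synonymous codon usage) is commonly computed: the observed count of a codon divided by the count expected if all codons for that amino acid were used equally. The function and variable names are illustrative and are not taken from GCUA. The isoleucine counts in the example are the ones from the MS2 A-protein table.

```python
from itertools import product

BASES = "UCAG"
# Universal genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Codons with at least one synonymous alternative: everything except
# the three stop codons, AUG (Met) and UGG (Trp).
SYNONYMOUS_CODONS = [
    codon for codon, aa in CODON_TABLE.items()
    if aa != "*" and AMINO_ACIDS.count(aa) > 1
]


def rscu(counts):
    """Return {codon: RSCU} for every synonymous family seen in `counts`."""
    families = {}
    for codon in SYNONYMOUS_CODONS:
        families.setdefault(CODON_TABLE[codon], []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent from this gene
        for c in codons:
            # observed count / (family total / number of synonyms)
            values[c] = counts.get(c, 0) * len(codons) / total
    return values


ile = rscu({"AUU": 1, "AUC": 8, "AUA": 7})
print(len(SYNONYMOUS_CODONS))            # 59
print(ile["AUU"], ile["AUC"], ile["AUA"])  # 0.1875 1.5 1.3125
```

An RSCU of 1.0 means a codon is used exactly as often as expected under equal usage, so values for a family of n synonyms always sum to n.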

14
Location of a hypothetical gene encoded only by
isoleucine codons in its three-dimensional space
(Figure: three-dimensional plot with axes AUU, AUC and AUA, origin marked, axis ticks at 0.66, 1.00, 1.33 and 2.00)
15
Location of a collection of genes encoded only
by isoleucine codons in their three-dimensional
space
(Figure: three-dimensional plot with axes AUU, AUC and AUA, origin marked, axis ticks at 0.66, 1.00, 1.33 and 2.00)
16
GCUA (General Codon Usage Analysis)
McInerney, J.O. (1998) Bioinformatics 14(4)
  • Computer program for analysing codon and amino
    acid usage data.
  • Written in ANSI C programming language
  • Runs on all operating systems (e.g. MacOS, SGI,
    SUN, DEC etc.).
  • Performs usual calculations such as number of
    times a codon is used, RSCU, amino acid usage
    etc.
  • Most important functions are the multivariate
    analysis functions.
  • GCUA performs Correspondence Analysis (CA) and
    Principal Components Analysis (PCA) on both codon
    usage (RSCU) and amino acid usage (aa) data.
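
As an illustration of what a correspondence analysis does, the following Python sketch finds the position of each gene on the first axis only. It is not GCUA's code (GCUA is written in C), and it makes several simplifying assumptions: it works on raw codon counts rather than RSCU values, it uses power iteration in place of a full decomposition, it assumes every row and column has a non-zero total, and the four genes and their counts are invented for the example.

```python
import math


def first_ca_axis(counts, iterations=200):
    """Coordinates of each row (gene) on the first correspondence-analysis axis."""
    n_rows, n_cols = len(counts), len(counts[0])
    total = sum(map(sum, counts))
    row_mass = [sum(row) / total for row in counts]
    col_mass = [sum(row[j] for row in counts) / total for j in range(n_cols)]

    # Standardised residuals: (observed - expected) / sqrt(expected),
    # with observed and expected expressed as proportions of the grand total.
    resid = [
        [(counts[i][j] / total - row_mass[i] * col_mass[j])
         / math.sqrt(row_mass[i] * col_mass[j])
         for j in range(n_cols)]
        for i in range(n_rows)
    ]

    def project(v):
        """Multiply the residual matrix by a column-space vector."""
        return [sum(resid[i][j] * v[j] for j in range(n_cols))
                for i in range(n_rows)]

    # Power iteration for the leading right singular vector of `resid`.
    v = [1.0] + [0.0] * (n_cols - 1)
    for _ in range(iterations):
        rv = project(v)
        w = [sum(resid[i][j] * rv[i] for i in range(n_rows))
             for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Row principal coordinates: projection divided by sqrt(row mass).
    return [x / math.sqrt(m) for x, m in zip(project(v), row_mass)]


# Hypothetical counts of the codons AUU, AUC, AUA in four made-up genes.
genes = {
    "geneA": [10, 1, 5],
    "geneB": [9, 2, 6],
    "geneC": [1, 10, 2],
    "geneD": [2, 9, 1],
}
coords = first_ca_axis(list(genes.values()))
for name, coord in zip(genes, coords):
    print(name, round(coord, 3))
```

Genes with similar codon usage end up close together on the axis. Here geneA and geneB fall on one side of the origin and geneC and geneD on the other. Which side is positive is arbitrary.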

17
GCUA: General Codon Usage Analysis
19
Correspondence analysis of codon usage in
Escherichia coli.
(Figure: genes plotted on Axis 1 and Axis 2)
20
Correspondence analysis of codon usage in
Escherichia coli.
(Figure: genes plotted on Axis 1 and Axis 2; label: Highly-expressed genes)
21
Correspondence analysis of codon usage in
Escherichia coli.
(Figure: genes plotted on Axis 1 and Axis 2; label: "Lowly-expressed" genes)
22
Correspondence analysis of codon usage in
Escherichia coli.
(Figure: genes plotted on Axis 1 and Axis 2; label: Recently-acquired genes)
23
Prokaryotic genome evolution as assessed by
multivariate analysis of codon usage patterns
  • McInerney, J.O. Microbial and Comparative
    Genomics (1997) 2(1).

24
Organisms
  • Haemophilus influenzae (1,830,137 bp)
  • Mycoplasma genitalium (580,070 bp)
  • Methanococcus jannaschii (1,664,976 bp)

25
Haemophilus influenzae
26
Mycoplasma genitalium
27
Methanococcus jannaschii
28
Mycoplasma genitalium genome.
Outer circle: genes on the outer strand.
Second circle: genes on the inner strand.
29
Mycoplasma genitalium
(Figure: genes plotted on Axis 1 and Axis 2)
30
Mycoplasma genitalium
(Figure: genes plotted on Axis 1 and Axis 2; label: Highly-expressed genes)
31
M. genitalium
(Figure: GC3s, 0 to 0.5, plotted against Axis 1)
32
M. genitalium
(Figure: GC3s, 0 to 0.5, plotted against Axis 1; label: Highly-expressed genes; regression line y = 0.003x + 0.233, r = 0.875)
33
M. genitalium
(Figure: Axis 1 plotted against chromosome position)
34
M. genitalium
(Figure: Axis 1 plotted against chromosome position; regression coefficient r = 0.718)
35
Base composition changes in M. genitalium
(Figure: GC3, 0 to 50, plotted against percentage distance from the origin of replication)
36
Replicational and transcriptional selection on
codon usage in Borrelia burgdorferi
McInerney, J.O. (1998). Proceedings of the
National Academy of Sciences USA.
37
Borrelia burgdorferi
38
Lyme disease
39
Lyme disease II - this time it's personal!
42
B. burgdorferi
  • Linear chromosome of 910,725 bp
  • At least 17 linear and circular plasmids with a
    combined size of 530,000 bp
  • Many of the plasmid-borne ORFs are of unknown
    function, or are only hypothesised to encode
    real genes.
  • Loss of significant genes for cellular
    biosynthetic reactions (similar situation to that
    seen in M. genitalium).

43
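Correspondence analysis (CoA) of codon usage in B. burgdorferi
(Figure: genes plotted on Axis 1 and Axis 2)
44
Correspondence analysis (CoA) of codon usage in B. burgdorferi
(Figure: genes plotted on Axis 1 and Axis 2)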
45
B. burgdorferi
              Leading      Lagging                   Leading      Lagging
AA  Codon     N  RSCU      N  RSCU   AA  Codon     N  RSCU      N  RSCU
Phe UUU   12116  1.88   4146  1.64   Ser UCU    5926  2.38   1571  1.50
    UUC     756  0.12    903  0.36       UCC     574  0.23    370  0.35
Leu UUA    7918  2.39   4031  2.49       UCA    2865  1.15   2081  1.98
    UUG    4319  1.30    958  0.59       UCG     532  0.21    196  0.19
Leu CUU    6325  1.91   2360  1.46   Pro CCU    2272  2.03    926  1.39
    CUC     224  0.07    388  0.24       CCC     662  0.59    425  0.64
    CUA     775  0.23   1664  1.03       CCA    1318  1.18   1207  1.81
    CUG     332  0.10    324  0.20       CCG     232  0.21    108  0.16
Ile AUU   12007  2.00   5015  1.20   Thr ACU    2929  1.88   1310  1.06
    AUC     845  0.14   1365  0.33       ACC     806  0.52    655  0.53
    AUA    5165  0.86   6161  1.47       ACA    2118  1.36   2817  2.27
Met AUG    3444  1.00   1692  1.00       ACG     390  0.25    174  0.14
Val GUU    7778  2.59   1187  1.45   Ala GCU    4395  2.08   1337  1.25

              Leading      Lagging                   Leading      Lagging
AA  Codon     N  RSCU      N  RSCU   AA  Codon     N  RSCU      N  RSCU
Tyr UAU    7043  1.77   2570  1.27   Cys UGU    1034  1.54    241  0.87
    UAC     928  0.23   1480  0.73       UGC     311  0.46    312  1.13
ter UAA       3  0.00      0  0.00   ter UGA       0  0.00      0  0.00
ter UAG       2  0.00      0  0.00   Trp UGG     885  1.00    543  1.00
His CAU    1770  1.67    827  1.22   Arg CGU     455  0.40     50  0.13
    CAC     352  0.33    532  0.78       CGC     192  0.17     59  0.16
Gln CAA    2936  1.51   2313  1.83       CGA     348  0.30    122  0.32
    CAG     943  0.49    217  0.17       CGG      99  0.09     20  0.05
Asn AAU   11290  1.80   5574  1.38   Ser AGU    3344  1.35    804  0.77
    AAC    1273  0.20   2495  0.62       AGC    1676  0.67   1272  1.21
Lys AAA   12380  1.42  10585  1.82   Arg AGA    4213  3.67   1761  4.69
    AAG    5102  0.58   1064  0.18       AGG    1585  1.38    241  0.64
Asp GAU    9509  1.75   2565  1.33   Gly GGU    3383  1.30    552  0.51
    GAC    1338  0.25   1303  0.67       GGC    1620  0.62    681  0.63
Glu GAA    7952  1.31   6240  1.76       GGA    3666  1.40   2650  2.44
    GAG    4151  0.69    856  0.24       GGG    1770  0.68    458  0.42
46
B. burgdorferi
              Leading      Lagging
AA  Codon     N  RSCU      N  RSCU
Phe UUU   12116  1.88   4146  1.64

Significantly higher (95% confidence)
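
The slide does not say which test produced the 95% confidence statement. As one plausible way to check a difference like this, the Python sketch below applies a chi-square test of independence to the Phe codon counts (UUU and UUC on each strand) from the B. burgdorferi table. The choice of test is an assumption made here, not the author's stated method.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for the table
        [[a, b],
         [c, d]]
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Phe codon counts from the leading/lagging table.
leading_uuu, leading_uuc = 12116, 756
lagging_uuu, lagging_uuc = 4146, 903

CRITICAL_95 = 3.841  # chi-square critical value, 1 d.f., alpha = 0.05

statistic = chi_square_2x2(leading_uuu, leading_uuc, lagging_uuu, lagging_uuc)
print(round(statistic, 1), statistic > CRITICAL_95)
```

With counts this large the statistic is far above the critical value, so under this test the preference for UUU over UUC differs between the two strands.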
47
B. burgdorferi
  • Sarah French
  • Inserted high expression ribosomal RNA genes
    downstream of an E. coli promoter on a plasmid.
  • One clone had the rRNA gene in the sense
    orientation (on the leading strand) (CF78)
  • The other was in the anti-sense (lagging
    strand) orientation (CF95).

48
Replication fork movement - codirectional
49
Replication fork movement - antisense
50
Positions of replication forks after 6 minutes, as count (%)

        Between origin and rrnB   In rrnB   Beyond rrnB
CF78    3 (9)                     2 (6)     30 (86)
CF95    6 (16)                    26 (68)   6 (16)
51
B. burgdorferi
  • B. burgdorferi has 66% of its genes on the
    leading strands
  • Almost all of the highly-expressed genes are on
    the leading strands
  • There is a significant difference in codon usage
    between the two strands

52
B. burgdorferi
  • Codon selection in B. burgdorferi is the result
    of replicational selection, transcriptional
    selection and mutational bias (not GC mutational
    bias, rather AC mutational bias)

53
Plasmodium falciparum Gametocyte
Courtesy London School of Hygiene and Tropical
Medicine.
54
Malaria vector - Anopheline mosquito
55
Plasmodium lifecycle
56
P. falciparum
  • 14 Chromosomes
  • From 0.65 Mb to 3.4 Mb in size
  • Chromosome 2
  • 947,103 bp in length
  • 210 predicted ORFs
  • 82% AT base composition overall
  • Origin of replication not identified by
    similarity searches
  • Distribution of genes is fairly even on both
    Watson and Crick strands

57
Replication in Plasmodium
  • Two theories
  • Multiple origins of replication.
  • Mathematically derived (Janse et al., 1986, Mol.
    Biochem. Parasitol.).
  • Single origin of replication in each chromosome.
  • Anecdotal - "what we see in some other organisms"

58
Multiple origins theory
  • During microgametogenesis, the entire haploid
    genome is replicated in about 3.2 minutes.
  • If we assume a replication rate of 50
    bases/second, then there is a requirement for at
    least 1300 origins of replication (one every 19.2
    Kb).
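
The arithmetic behind these figures can be sketched in a few lines of Python. Two values below are assumptions, not slide content: two forks per origin (bidirectional replication), which is what makes the spacing come out at 19.2 kb, and a haploid genome of about 25 Mb, back-calculated from 1300 origins at 19.2 kb each.

```python
import math

REPLICATION_TIME_S = 192          # 3.2 minutes, from the slide
FORK_RATE_BP_PER_S = 50           # from the slide
FORKS_PER_ORIGIN = 2              # assumption: bidirectional replication
ASSUMED_GENOME_BP = 25_000_000    # assumption: implied by 1300 x 19.2 kb


def bp_per_origin(time_s, rate_bp_per_s, forks=FORKS_PER_ORIGIN):
    """DNA that one origin can replicate in the available time."""
    return forks * rate_bp_per_s * time_s


def origins_required(genome_bp, span_bp):
    """Minimum number of evenly spaced origins needed to cover the genome."""
    return math.ceil(genome_bp / span_bp)


span = bp_per_origin(REPLICATION_TIME_S, FORK_RATE_BP_PER_S)
print(span)                                       # 19200
print(origins_required(ASSUMED_GENOME_BP, span))  # 1303
```

Changing the assumed genome size or fork rate scales the origin count proportionally, which is why the slide can only give a lower bound.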

62
P. falciparum
A3
C3
63
P. falciparum
G3
T3
64
Implications for single origins of replication.
  • For chromosomes 2 and 3, replication must proceed
    60 times faster than previously thought.
  • If the same holds true for the largest
    chromosomes, then replication must proceed 170
    times faster.

65
Gene recognition
pA = 0.2, pC = 0.4, pG = 0.1, pT = 0.3
S
66
  • Acknowledgements
  • London
  • Prof T. Martin Embley,
  • Dr. Mark Wilkinson,
  • Dr. Robert Hirt
  • Maynooth
  • Chris Creevey,
  • David Fitzpatrick.
