Title: Genome projects and model organisms
1. Genome projects and model organisms
- Level 3 Molecular Evolution and Bioinformatics
- Jim Provan
3. Genome projects
- Completed genomes
  - Eubacteria (inc. Escherichia coli, Bacillus subtilis, Haemophilus influenzae, Synechocystis PCC6803)
  - Archaea (inc. Methanococcus jannaschii, Methanobacterium thermoautotrophicum)
  - Eukarya
    - Saccharomyces cerevisiae
    - Caenorhabditis elegans
    - Homo sapiens
    - Arabidopsis thaliana
- Partially sequenced genomes, e.g. Drosophila melanogaster, Fugu rubripes, Oryza sativa
4. Relationships between model organisms
5. Eubacterial genomes: Bacillus subtilis
- Genome: 4,214,810 bp
- 4100 protein sequences
- Average gene: 890 bp
- Density: 1 gene / 1028 bp
- 89% of total genome is protein-coding
- Protein-coding genes
  - 53% single copy
  - 47% in paralogous gene families
    - Mostly involved in transport
    - Genes are proximal, i.e. have evolved through tandem duplication of single genes
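The density figure above follows directly from the genome size and gene count. The short sketch below reproduces it and also derives a rough coding fraction from the rounded averages. That estimate is an assumption of this sketch (gene count times average gene length), not the method behind the quoted 89%, and it comes out a little lower.

```python
# Figures quoted on the slide for Bacillus subtilis.
GENOME_BP = 4_214_810
N_GENES = 4100
AVG_GENE_BP = 890


def bp_per_gene(genome_bp: int, n_genes: int) -> float:
    """Average stretch of genome per gene, in bp."""
    return genome_bp / n_genes


def coding_fraction(genome_bp: int, n_genes: int, avg_gene_bp: int) -> float:
    """Crude estimate: (gene count x average gene length) / genome size."""
    return n_genes * avg_gene_bp / genome_bp


density = bp_per_gene(GENOME_BP, N_GENES)
estimate = coding_fraction(GENOME_BP, N_GENES, AVG_GENE_BP)

print(f"1 gene / {density:.0f} bp")                  # 1 gene / 1028 bp
print(f"estimated coding fraction: {estimate:.1%}")  # roughly 87%
```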
6. Eubacterial genomes: Bacillus subtilis
- On the basis of homology with genes of known function, 58% of B. subtilis genes could be assigned to functional categories
- The B. subtilis genome contains remnants of 10 prophages, suggesting that horizontal transfer has played a significant role in the evolution of the genome
- Orthologous counterparts in other bacteria
  - 1000 genes (24%) have counterparts in E. coli (Gram -ve)
    - More significantly, 100 operons are conserved as well
  - 800 genes (20%) have orthologues in Synechocystis PCC6803 (cyanobacterium)
7. Eubacterial genomes: Mycoplasmas
- Obligate parasites
- Thought to be derived from Gram +ve bacteria similar to B. subtilis
  - 312 genes of M. genitalium (66%) have homologues in Gram +ve bacteria
- Parasitic lifestyle has led to a dramatic reduction in genome size and content
  - Smallest known genome in a self-replicating organism
8. Eubacterial genomes: Mycoplasmas
- M. genitalium genome
  - Circular chromosome of 580,070 bp
  - Only 470 predicted genes, for DNA replication, transcription and translation, DNA repair, cellular transport and energy metabolism
  - Coding regions comprise 88% of the genome
    - Similar to H. influenzae (85%)
    - Suggests that genome reduction has been due to loss of genes and not to reduction in gene size or increase in gene density
- M. pneumoniae genome
  - Larger than M. genitalium (816 kbp)
  - All M. genitalium genes found in M. pneumoniae
  - Not simply truncated - evidence of genome rearrangements
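The argument that M. genitalium shrank by losing genes rather than by shortening them can be checked roughly from the slide figures. The sketch below assumes the 88% coding fraction is spread evenly over the 470 predicted genes, which is a simplification, and compares the result with the 890 bp average gene quoted earlier for B. subtilis.

```python
# Figures quoted on the slides.
M_GEN_GENOME_BP = 580_070
M_GEN_GENES = 470
M_GEN_CODING_FRACTION = 0.88
B_SUB_AVG_GENE_BP = 890  # B. subtilis average gene, for comparison

# Average stretch of genome per gene (coding plus non-coding).
spacing_bp = M_GEN_GENOME_BP / M_GEN_GENES

# Assumption: coding sequence is shared evenly among the predicted genes.
avg_coding_bp = M_GEN_CODING_FRACTION * M_GEN_GENOME_BP / M_GEN_GENES

print(f"one gene every {spacing_bp:.0f} bp")
print(f"average coding length per gene: {avg_coding_bp:.0f} bp")
print(f"shorter than a B. subtilis gene? {avg_coding_bp < B_SUB_AVG_GENE_BP}")
```

On these numbers the average M. genitalium gene is no shorter than a B. subtilis gene, which is consistent with reduction through gene loss.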
9. Eubacterial genomes: E. coli
- 4288 protein-coding genes
- Average ORF: 317 amino acids
- Very compact: average distance between genes 118 bp
- Numerous paralogous gene families: 38-45% of genes have arisen through duplication
- Homologues
  - H. influenzae (1130 of 1703)
  - Synechocystis (675 of 3168)
  - M. jannaschii (231 of 1738)
  - S. cerevisiae (254 of 5885)
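The homologue counts are easier to compare as fractions. The sketch below assumes each pair reads as (genes with an E. coli homologue, total genes in that genome). The slide does not spell this out, so treat that reading as an assumption.

```python
# (genes with an E. coli homologue, total genes in that genome), as read from the slide.
homologues = {
    "H. influenzae": (1130, 1703),
    "Synechocystis": (675, 3168),
    "M. jannaschii": (231, 1738),
    "S. cerevisiae": (254, 5885),
}

# Fraction of each genome that has a homologue in E. coli.
fractions = {name: shared / total for name, (shared, total) in homologues.items()}

# Rank genomes from most to least similar by this measure.
ranked = sorted(fractions.items(), key=lambda item: item[1], reverse=True)

for name, fraction in ranked:
    print(f"{name:15s} {fraction:.0%}")
```

The fractions fall from about two thirds for another Gram -ve bacterium to a few percent for yeast.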
10. The minimum genome and redundancy
- Minimum set of genes required for survival
  - Replication and transcription
  - Translation (rRNA, ribosomal proteins, tRNAs etc.)
  - Transport proteins to derive nutrients
  - ATP synthesis
- Entire pathways eliminated in Mycoplasma
  - Amino acid biosynthesis (1 gene vs. 68 in H. influenzae)
  - Metabolism (44 genes vs. 228 in H. influenzae)
- Comparison of M. genitalium and H. influenzae has identified a minimum set of 256 genes
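The idea behind a comparative minimal gene set can be sketched as a set intersection: families present in both genomes are candidates for the essential core. The family labels below are hypothetical placeholders, not real annotations, and a plain intersection is a simplification of whatever analysis produced the 256-gene figure.

```python
# Hypothetical toy data: orthologue-family labels invented for illustration.
m_genitalium = {
    "fam_replication",
    "fam_transcription",
    "fam_translation",
    "fam_transport",
    "fam_atp_synthesis",
}
h_influenzae = m_genitalium | {
    "fam_amino_acid_biosynthesis",
    "fam_extra_metabolism",
}

# Families shared by both genomes: candidates for the minimal set.
minimal_set = m_genitalium & h_influenzae

# Families found only in the larger genome: dispensable for the parasite.
lost_in_mycoplasma = h_influenzae - m_genitalium

print(sorted(minimal_set))
print(sorted(lost_in_mycoplasma))
```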
11. Archaeal genomes: M. jannaschii
- Requires no organic nutrients for growth: has all biochemical pathways to use inorganic constituents
- Only 38% of genes could be assigned a known function
- Genes for translation, transcription and DNA replication similar to eukaryote genes
  - DNA polymerase
  - Ribosomal proteins
  - Translation initiation factors
12. Fungal genomes: S. cerevisiae
- First completely sequenced eukaryote genome
- Very compact genome
  - Short intergenic regions
  - Scarcity of introns
  - Lack of repetitive sequences
- Strong evidence of duplication
  - Chromosome segments
  - Single genes
- Redundancy: non-essential genes provide selective advantage
13. Plant genomes: Arabidopsis thaliana
- Contains 25,498 genes from 11,000 families
- Cross-phylum matches
  - Vertebrates: 12%
  - Bacteria / Archaea: 10%
  - Fungi: 8%
- 60% of ESTs have no match in non-plant databases
- Evolution involved whole-genome duplication followed by gene loss and extensive local gene duplications
14. Invertebrate genomes: C. elegans
- Genome less compact than yeast
  - One gene every 7143 bp (2155 bp in yeast)
  - Due mainly to introns in protein-coding genes
- Much more compact than humans (one gene every 50,000 bp)
  - Compactness due mainly to polycistronic arrangement
    - Trans-splicing
    - Co-expression and co-regulation
15. Vertebrate genomes: Fugu rubripes
- Pufferfish genome (400 Mb) only four times larger than that of C. elegans and 7.5 times smaller than the human genome
- Homologous genes in Fugu and mammals show conserved synteny
  - Same exon-intron organisation
  - Introns much smaller
- Useful for identifying conserved essential elements in vertebrate genomes
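The size ratios quoted for Fugu imply genome sizes for C. elegans and human, and combining those with the gene spacings given for C. elegans and human yields the gene counts the slides implicitly assume. The sketch below only works through the slides' own figures; the resulting counts are what those figures imply, not independent estimates.

```python
# Figures quoted on the slides.
FUGU_MB = 400
FUGU_VS_C_ELEGANS = 4      # Fugu is four times larger than C. elegans
HUMAN_VS_FUGU = 7.5        # human is 7.5 times larger than Fugu
C_ELEGANS_BP_PER_GENE = 7143
HUMAN_BP_PER_GENE = 50_000

# Genome sizes implied by the ratios, in Mb.
c_elegans_mb = FUGU_MB / FUGU_VS_C_ELEGANS
human_mb = FUGU_MB * HUMAN_VS_FUGU

# Gene counts implied by genome size divided by gene spacing.
c_elegans_genes = c_elegans_mb * 1_000_000 / C_ELEGANS_BP_PER_GENE
human_genes = human_mb * 1_000_000 / HUMAN_BP_PER_GENE

print(f"C. elegans: {c_elegans_mb:.0f} Mb, ~{c_elegans_genes:,.0f} genes")
print(f"Human:      {human_mb:.0f} Mb, ~{human_genes:,.0f} genes")
```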
16. The genome of the cenancestor
- Availability of complete genome sequences from the three domains of life creates an opportunity for the reconstruction of the complete genome of the common ancestor
- Of the minimal bacterial set (256 genes), 143 have orthologues in yeast (eukaryote)
- Universal translation apparatus suggests that the cenancestor had a fully developed translation system
- Extreme differences in DNA replication apparatus
- Many fundamental metabolic processes are carried out by similar proteins in Archaea and eubacteria
  - Suggests a universal, autotrophic ancestor
  - Not all central metabolism is universal (methanogenesis, photosynthesis etc.)
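One simple way to picture the reconstruction is to treat a gene family as a candidate for the cenancestor only if it is found in all three domains. The presence table below is a hypothetical toy with invented labels, loosely mirroring the points above, and the all-three-domains rule is an assumption of this sketch, not a description of any published method.

```python
DOMAINS = frozenset({"Eubacteria", "Archaea", "Eukarya"})

# Hypothetical toy presence table: illustrative labels, not real annotations.
presence = {
    "translation_apparatus": {"Eubacteria", "Archaea", "Eukarya"},
    "bacterial_type_replication": {"Eubacteria"},
    "methanogenesis": {"Archaea"},
}


def universal_families(table: dict[str, set[str]]) -> list[str]:
    """Return the families present in every domain, sorted by name."""
    return sorted(name for name, found_in in table.items() if DOMAINS <= found_in)


candidates = universal_families(presence)
print(candidates)

# Share of the 256-gene minimal bacterial set with yeast orthologues (slide figures).
yeast_share = 143 / 256
print(f"{yeast_share:.0%} of the minimal set has yeast orthologues")
```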