Title: Genomics in molecular ecology and evolution
1. Genomics in molecular ecology and evolution
2. Genomics: What is it?
- Use of entire DNA sequence to study an organism
What can genomics tell us?
- Evolutionary history of individual genes in the genome (not always the same as the history of the genome)
- Clues about metabolic/physiological capabilities based on gene presence or absence
What can't genomics tell us?
- Function of all the genes in the genome
- When they are expressed
- How they interact with one another
3. Genomes: What do they look like?
Prokaryotes
- Often a single circular genome
- Sometimes multiple chromosomes and/or large plasmids
- A few examples of linear genomes
Eukaryotes
- Diploid: 2 copies of each chromosome
- Multiple different chromosomes
4. Genomes: How Big?
Organism         Genome Size   # of Genes
H. influenzae    1.8 Mb        1700
E. coli          4.7 Mb        4400
Yeast            12 Mb         6300
Fruit Fly        180 Mb        13,600
Human            3000 Mb       30,000
1 Mb = 1 million base pairs
Source: Oak Ridge National Lab Computational Genomics Group
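The table can be turned into gene density (genes per Mb) to show how unevenly genes are packed. This is a minimal sketch using only the rounded figures listed above; the function and variable names are illustrative, not from the original slides.

```python
# Genome sizes (Mb) and approximate gene counts as listed in the table above.
GENOMES = {
    "H. influenzae": (1.8, 1700),
    "E. coli": (4.7, 4400),
    "Yeast": (12, 6300),
    "Fruit Fly": (180, 13600),
    "Human": (3000, 30000),
}


def genes_per_mb(size_mb, n_genes):
    """Return the average number of genes per million base pairs."""
    return n_genes / size_mb


# Gene density for every organism in the table.
density = {
    name: genes_per_mb(size_mb, n_genes)
    for name, (size_mb, n_genes) in GENOMES.items()
}

for name, value in density.items():
    print(f"{name:15s} {value:8.1f} genes per Mb")
```

With these rounded inputs the two bacteria come out above 900 genes per Mb while human comes out at about 10, so genome size grows far faster than gene count.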
5. Eukaryotes have a lot of junk DNA
Why isn't there a direct relationship between genome size and gene content?
Mammalian cells: less than 1% of genomic DNA is coding
[Figure: DNA (ds) with Exon 1, Intron 1, Exon 2, Intron 2, Exon 3 -> transcription -> nuclear RNA (ss) -> splicing -> mRNA (ss) for 1 gene]
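The transcription and splicing steps in the figure can be sketched in a few lines. The sequences below are invented toy strings (an assumption for illustration only, not real gene sequence), and the function names are hypothetical.

```python
# Hypothetical toy gene: three exons separated by two introns.
# The sequences are made up for illustration and are not real gene sequence.
SEGMENTS = [
    ("exon", "AUGGCC"),
    ("intron", "GUAAGUCUAG"),
    ("exon", "GAUUAC"),
    ("intron", "GUCCGACAG"),
    ("exon", "UGGUAA"),
]


def primary_transcript(segments):
    """Nuclear RNA: exons and introns are both still present."""
    return "".join(seq for _, seq in segments)


def splice(segments):
    """mRNA: introns are removed and the exons are joined in order."""
    return "".join(seq for kind, seq in segments if kind == "exon")


nuclear_rna = primary_transcript(SEGMENTS)
mrna = splice(SEGMENTS)

# Fraction of the primary transcript that survives splicing.
exon_fraction = len(mrna) / len(nuclear_rna)

print(nuclear_rna)
print(mrna)
print(f"{exon_fraction:.2f} of the transcript is exon")
```

In real mammalian genes the introns are usually much longer than the exons, so the surviving fraction is typically far smaller than in this toy case.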
6. Prokaryotes
--much more compact genome structure: up to 90% coding
[Figure: DNA (ds) carrying Gene 1, Gene 2, Gene 3]
--much less repetitive DNA
So, genome sequencing started with prokaryotes
7. First complete genome sequence of a free-living organism
1995: Haemophilus influenzae
1,830,137 base pairs (1.8 Mbp), 1743 genes
8. How do you sequence an entire genome?
- Genomic DNA
- Sheared to 3 kb
- Clone library
- Insert ends sequenced to 8X coverage
- Computer assembly of sequence reads
- Finishing and closure: using PCR to close gaps and verify assembly
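To get a feel for what 8X coverage means, the number of reads can be estimated as coverage x genome size / read length. The sketch below uses the H. influenzae genome size from the previous slide; the 500 bp read length is an assumption (the slides do not give one), and the gap estimate assumes reads land uniformly at random (a Poisson model), which real libraries only approximate.

```python
import math

GENOME_BP = 1_830_137   # H. influenzae genome size from the slides
COVERAGE = 8            # target coverage from the slides
READ_BP = 500           # ASSUMPTION: average read length, not given in the slides


def reads_needed(genome_bp, coverage, read_bp):
    """Number of reads whose total length equals coverage x genome size."""
    return math.ceil(coverage * genome_bp / read_bp)


def expected_uncovered_bp(genome_bp, coverage):
    """Expected bases hit by no read, assuming uniformly random read starts.

    Under a Poisson model the chance that a base is missed is exp(-coverage).
    """
    return genome_bp * math.exp(-coverage)


n_reads = reads_needed(GENOME_BP, COVERAGE, READ_BP)
gap_bp = expected_uncovered_bp(GENOME_BP, COVERAGE)

print(f"reads needed: {n_reads}")
print(f"expected uncovered bases: {gap_bp:.0f}")
```

Under these assumptions a few hundred bases are still expected to be missed even at 8X, which is one reason a finishing step with PCR is needed to close gaps.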
9. Since 1995 there has been an explosion in the number of completed genomes
- Bacteria: 106 completed, 319 ongoing
- Archaea: 16 completed, 23 ongoing
- Eukaryotes: 19 completed, 235 ongoing
Why? Advances in sequencing technology: major sequencing centers have enough capacity to complete a bacterial genome in a day!
10. Case study: Escherichia coli
What can we learn from whole genome sequences?
- One of hundreds of microbial species that reside in the mammalian colon
- Often used in water quality studies as an indicator of fecal contamination
- There are over 170 serogroups of E. coli; the majority are not harmful
- BUT...
11. Case study: Escherichia coli
- One particular type of E. coli, called O157:H7, is pathogenic
- Responsible for numerous incidents of food poisoning in the early 1990s
- Many linked to contaminated ground beef
- 1993 outbreak in Seattle: over 400 people affected, 3 deaths
12. What makes O157:H7 different from other E. coli?
Perna et al., 2001
13. O157:H7 contains many more genes
- 1.34 Mbp of DNA
- 1387 additional genes
- 3574 shared genes
- 911 identical proteins
BUT
Relationship between the two strains would be
hard to resolve using a single molecular marker
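A whole-genome comparison of this kind amounts to partitioning each strain's genes into shared and strain-specific sets. The sketch below uses made-up gene identifiers (hypothetical, for illustration only) and then applies the counts from the slide; treating the O157:H7 total as additional plus shared genes is an assumption.

```python
def partition_genes(strain_a, strain_b):
    """Split two gene inventories into shared and strain-specific sets."""
    a, b = set(strain_a), set(strain_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# HYPOTHETICAL gene identifiers, invented for illustration.
pathogen = {"g1", "g2", "g3", "g4", "toxin1", "toxin2"}
lab_strain = {"g1", "g2", "g3", "g4", "g5"}
parts = partition_genes(pathogen, lab_strain)

print(sorted(parts["shared"]))
print(sorted(parts["only_a"]))

# Counts taken from the slide.
ADDITIONAL_GENES = 1387
SHARED_GENES = 3574

# ASSUMPTION: total O157:H7 gene count = additional + shared.
fraction_additional = ADDITIONAL_GENES / (ADDITIONAL_GENES + SHARED_GENES)
print(f"{fraction_additional:.1%} of O157:H7 genes are not shared")
```

A single marker drawn from the shared set would say nothing about the strain-specific genes, which is the point of the slide's closing remark.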
14. Where did the extra genes come from?
Bacteria divide by asexual, clonal reproduction
Point mutations could arise in the course of DNA
replication
But entire new genes must be acquired from other organisms
3 mechanisms for this: phage, plasmids, conjugation
15. Horizontal (lateral) gene transfer explains the E. coli strain differences
Eisen, Nature 2001
16. A significant fraction of many microbial genomes may have been acquired through horizontal transfer
Ochman et al., 1999
17. What does this mean for microbial evolution?
- New traits are acquired in discrete jumps, rather than gradual modification of existing abilities
- Newly acquired capabilities may allow recipient to outcompete relatives without additional genes and/or colonize new environments
- Examples of horizontally transferred genes:
- Virulence factors
- Antibiotic resistance
- Metabolic properties
18. So, why aren't genomes continually growing in size?
- Gene Loss
- DNA is expensive to maintain
- Genes (both in the ancestral genome and newly
acquired) must provide a meaningful function
(enhance fitness) or they will be lost
19. Loss of genes for NO3 and NO2 utilization in surface-dwelling phytoplankton
20. What are the genes doing?
- Function is assigned based on degree of similarity to an already characterized gene in the database
- 2 potential problems with this approach
Transitive catastrophe
Gene A: Assigned function based on mutant phenotype or biochemical characterization of protein product
Gene B: From genome sequence, 70% identity to gene A
Gene C: From genome sequence, 60% identity to gene B
Gene D: From genome sequence, 70% identity to gene C
But: Gene D has only 20% identity to gene A!
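The chain above can be sketched as annotation being copied from each gene's best database hit, followed by an audit against the one experimentally characterized gene. The identities are the ones on the slide; the 50% cutoff and all function names are assumptions for illustration, and the A to C identity is left unknown because the slide does not report it.

```python
# Pairwise percent identities as given on the slide.
PAIRWISE_IDENTITY = {
    frozenset(("A", "B")): 70,
    frozenset(("B", "C")): 60,
    frozenset(("C", "D")): 70,
    frozenset(("A", "D")): 20,
}

THRESHOLD = 50  # ASSUMPTION: hypothetical cutoff for transferring a function


def identity(x, y):
    """Return percent identity for a gene pair, or None if not reported."""
    return PAIRWISE_IDENTITY.get(frozenset((x, y)))


def propagate_annotation(chain, threshold):
    """Copy the first gene's function down the chain, one best hit at a time."""
    annotated = [chain[0]]
    for previous, gene in zip(chain, chain[1:]):
        pct = identity(previous, gene)
        if pct is None or pct < threshold:
            break  # the chain of acceptable hits ends here
        annotated.append(gene)
    return annotated


def audit(annotated, characterized, threshold):
    """Flag genes whose direct identity to the characterized gene is too low."""
    flagged = []
    for gene in annotated:
        if gene == characterized:
            continue
        pct = identity(characterized, gene)
        if pct is not None and pct < threshold:
            flagged.append(gene)
    return flagged


annotated = propagate_annotation(["A", "B", "C", "D"], THRESHOLD)
flagged = audit(annotated, "A", THRESHOLD)

print("annotated by transfer:", annotated)
print("fails direct check against A:", flagged)
```

Every link in the chain passes the cutoff, so gene D inherits gene A's function, yet the direct comparison flags D. Recording which annotations rest on experimental evidence makes this kind of audit possible.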
21. What if there is nothing at all similar in the database?
- Call it a hypothetical gene
- If it has a match, but only to another hypothetical gene: call it a conserved hypothetical
[Figure: pie chart of genes by functional category: DNA Replication and Repair, Energy Metabolism, Nucleotide Metabolism, Lipid Metabolism, Transcription, Amino Acid Metabolism, Translation, Carbohydrate Metabolism, Transport, Cofactor Metabolism, Conserved Hypothetical, Hypothetical, Unassigned]
22. What about eukaryotes?
- Most complete genomes are of model organisms (yeast, mustard plant, fruit fly, worm)
- Japanese puffer fish (Fugu rubripes) has smallest known vertebrate genome (400 Mb)
- Has helped in predicting 1000 previously unrecognized genes in the human genome
23. What about eukaryotes?
- Many more organisms in the pipeline: algae, insects, birds, sea urchin, sand crab, tilapia, zebrafish, Atlantic salmon
- Plans to sequence a complete mitochondrial genome in each of the 146 families of mammals
The frozen zoo at the San Diego Zoo
24. Key Points
- Genomics technology is advancing rapidly
- Enough data to do comparative evolutionary studies in microbes
- Population genomics coming soon
- Genomes are dynamic entities
- Horizontal gene transfer plays an important role in evolution
- Gene loss occurs constantly in the environment
- Genomic analyses cannot tell us everything
- A high percentage of genes are of unknown function
- Even those genes assigned a function need laboratory verification