Introduction to Microbial Genomics

Center for Biological Sequence Analysis, The Technical University of Denmark (DTU) ... Kronborg Castle. Comparative Microbial Genomics group ...

1
Introduction to Microbial Genomics
An Introduction to Genomics
Dave Ussery, Biological Sequence Analysis, DTU
Course 27803, 27 April 2007
2
(No Transcript)
3
Outline
Introduction - where are we?
Comparison of Genome Sizes
The Human Genome Project
A few words about speed of sequencing
DNA repeats
Brief intro. to genome atlases
4
Where are we in the tree of life?
5
rRNA tree
6

(Figure: timeline of Earth history. Axis: time in billions of years ago (Bya), from 4.5 Bya to today. Marked events: Earth formed (4.6 Bya), oldest rocks, oxygen levels rise.)
8
rRNA tree
9
(No Transcript)
10
(No Transcript)
11
What is a genome?
12
ΦX174
gagttttatc gcttccatga cgcagaagtt aacactttcg
gatatttctg atgagtcgaa aaattatctt gataaagcag
gaattactac tgcttgttta cgaattaaat cgaagtggac
tgctggcgga aaatgagaaa attcgaccta tccttgcgca
gctcgagaag ctcttacttt gcgacctttc gccatcaact
aacgattctg tcaaaaactg acgcgttgga tgaggagaag
tggcttaata tgcttggcac gttcgtcaag gactggttta
gatatgagtc acattttgtt catggtagag attctcttgt
tgacatttta aaagagcgtg gattactatc tgagtccgat
gctgttcaac cactaatagg taagaaatca tgagtcaagt
tactgaacaa tccgtacgtt tccagaccgc tttggcctct
attaagctca ttcaggcttc tgccgttttg gatttaaccg
aagatgattt cgattttctg acgagtaaca aagtttggat
tgctactgac cgctctcgtg ctcgtcgctg cgttgaggct
tgcgtttatg gtacgctgga ctttgtggga taccctcgct
ttcctgctcc tgttgagttt attgctgccg tcattgctta
ttatgttcat cccgtcaaca ttcaaacggc ctgtctcatc
atggaaggcg ctgaatttac ggaaaacatt attaatggcg
tcgagcgtcc ggttaaagcc gctgaattgt tcgcgtttac
cttgcgtgta cgcgcaggaa acactgacgt tcttactgac
gcagaagaaa acgtgcgtca aaaattacgt gcggaaggag
tgatgtaatg tctaaaggta aaaaacgttc tggcgctcgc
cctggtcgtc cgcagccgtt gcgaggtact aaaggcaagc
gtaaaggcgc tcgtctttgg tatgtaggtg gtcaacaatt
ttaattgcag gggcttcggc cccttacttg aggataaatt
atgtctaata ttcaaactgg cgccgagcgt atgccgcatg
acctttccca tcttggcttc cttgctggtc agattggtcg
tcttattacc atttcaacta ctccggttat cgctggcgac
tccttcgaga tggacgccgt tggcgctctc cgtctttctc
cattgcgtcg tggccttgct attgactcta ctgtagacat
ttttactttt tatgtccctc atcgtcacgt ttatggtgaa
cagtggatta agttcatgaa ggatggtgtt aatgccactc
ctctcccgac tgttaacact actggttata ttgaccatgc
cgcttttctt ggcacgatta accctgatac caataaaatc
cctaagcatt tgtttcaggg ttatttgaat atctataaca
actattttaa agcgccgtgg atgcctgacc gtaccgaggc
taaccctaat gagcttaatc aagatgatgc tcgttatggt
ttccgttgct gccatctcaa aaacatttgg actgctccgc
ttcctcctga gactgagctt tctcgccaaa tgacgacttc
taccacatct attgacatta tgggtctgca agctgcttat
gctaatttgc atactgacca agaacgtgat tacttcatgc
agcgttacca tgatgttatt tcttcatttg gaggtaaaac
ctcttatgac gctgacaacc gtcctttact tgtcatgcgc
tctaatctct gggcatctgg ctatgatgtt gatggaactg
accaaacgtc gttaggccag ttttctggtc gtgttcaaca
gacctataaa cattctgtgc cgcgtttctt tgttcctgag
catggcacta tgtttactct tgcgcttgtt cgttttccgc
ctactgcgac taaagagatt cagtacctta acgctaaagg
tgctttgact tataccgata ttgctggcga ccctgttttg
tatggcaact tgccgccgcg tgaaatttct atgaaggatg
ttttccgttc tggtgattcg tctaagaagt ttaagattgc
tgagggtcag tggtatcgtt atgcgccttc gtatgtttct
cctgcttatc accttcttga aggcttccca ttcattcagg
aaccgccttc tggtgatttg caagaacgcg tacttattcg
ccaccatgat tatgaccagt gtttccagtc cgttcagttg
ttgcagtgga atagtcaggt taaatttaat gtgaccgttt
atcgcaatct gccgaccact cgcgattcaa tcatgacttc
gtgataaaag attgagtgtg aggttataac gccgaagcgg
taaaaatttt aatttttgcc gctgaggggt tgaccaagcg
aagcgcggta ggttttctgc ttaggagttt aatcatgttt
cagactttta tttctcgcca taattcaaac tttttttctg
ataagctggt tctcacttct gttactccag cttcttcggc
acctgtttta cagacaccta aagctacatc gtcaacgtta
tattttgata gtttgacggt taatgctggt aatggtggtt
ttcttcattg cattcagatg gatacatctg tcaacgccgc
taatcaggtt gtttctgttg gtgctgatat tgcttttgat
gccgacccta aattttttgc ctgtttggtt cgctttgagt
cttcttcggt tccgactacc ctcccgactg cctatgatgt
ttatcctttg aatggtcgcc atgatggtgg ttattatacc
gtcaaggact gtgtgactat tgacgtcctt ccccgtacgc
cgggcaataa cgtttatgtt ggtttcatgg tttggtctaa
ctttaccgct actaaatgcc gcggattggt ttcgctgaat
aagagattat ttgtctccag ccacttaagt gaggtgattt
atgtttggtg ctattgctgg cggtattgct tctgctcttg
ctggtggcgc catgtctaaa ttgtttggag gcggtcaaaa
agccgcctcc ggtggcattc aaggtgatgt gcttgctacc
gataacaata ctgtaggcat gggtgatgct ggtattaaat
ctgccattca aggctctaat gttcctaacc ctgatgaggc
cgcccctagt tttgtttctg gtgctatggc taaagctggt
aaaggacttc ttgaaggtac gttgcaggct ggcacttctg
ccgtttctga taagttgctt gatttggttg gacttggtgg
caagtctgcc gctgataaag gaaaggatac tcgtgattat
cttgctgctg catttcctga gcttaatgct tgggagcgtg
ctggtgctga tgcttcctct gctggtatgg ttgacgccgg
atttgagaat caaaaagagc ttactaaaat gcaactggac
aatcagaaag agattgccga gatgcaaaat gagactcaaa
aagagattgc tggcattcag tcggcgactt cacgccagaa
tacgaaagac caggtatatg cacaaaatga gatgcttgct
tatcaacaga aggagtctac tgctcgcgtt gcgtctatta
tggaaaacac caatctttcc aagcaacagc aggtttccga
gattatgcgc caaatgctta ctcaagctca aacggctggt
cagtatttta ccaatgacca aatcaaagaa atgactcgca
aggttagtgc tgaggttgac ttagttcatc agcaaacgca
gaatcagcgg tatggctctt ctcatattgg cgctactgca
aaggatattt ctaatgtcgt cactgatgct gcttctggtg
tggttgatat ttttcatggt attgataaag ctgttgccga
tacttggaac aatttctgga aagacggtaa agctgatggt
attggctcta atttgtctag gaaataaccg tcaggattga
caccctccca attgtatgtt ttcatgcctc caaatcttgg
aggctttttt atggttcgtt cttattaccc ttctgaatgt
cacgctgatt attttgactt tgagcgtatc gaggctctta
aacctgctat tgaggcttgt ggcatttcta ctctttctca
atccccaatg cttggcttcc ataagcagat ggataaccgc
atcaagctct tggaagagat tctgtctttt cgtatgcagg
gcgttgagtt cgataatggt gatatgtatg ttgacggcca
taaggctgct tctgacgttc gtgatgagtt tgtatctgtt
actgagaagt taatggatga attggcacaa tgctacaatg
tgctccccca acttgatatt aataacacta tagaccaccg
ccccgaaggg gacgaaaaat ggtttttaga gaacgagaag
acggttacgc agttttgccg caagctggct gctgaacgcc
ctcttaagga tattcgcgat gagtataatt accccaaaaa
gaaaggtatt aaggatgagt gttcaagatt gctggaggcc
tccactatga aatcgcgtag aggctttgct attcagcgtt
tgatgaatgc aatgcgacag gctcatgctg atggttggtt
tatcgttttt gacactctca cgttggctga cgaccgatta
gaggcgtttt atgataatcc caatgctttg cgtgactatt
ttcgtgatat tggtcgtatg gttcttgctg ccgagggtcg
caaggctaat gattcacacg ccgactgcta tcagtatttt
tgtgtgcctg agtatggtac agctaatggc cgtcttcatt
tccatgcggt gcactttatg cggacacttc ctacaggtag
cgttgaccct aattttggtc gtcgggtacg caatcgccgc
cagttaaata gcttgcaaaa tacgtggcct tatggttaca
gtatgcccat cgcagttcgc tacacgcagg acgctttttc
acgttctggt tggttgtggc ctgttgatgc taaaggtgag
ccgcttaaag ctaccagtta tatggctgtt ggtttctatg
tggctaaata cgttaacaaa aagtcagata tggaccttgc
tgctaaaggt ctaggagcta aagaatggaa caactcacta
aaaaccaagc tgtcgctact tcccaagaag ctgttcagaa
tcagaatgag ccgcaacttc gggatgaaaa tgctcacaat
gacaaatctg tccacggagt gcttaatcca acttaccaag
ctgggttacg acgcgacgcc gttcaaccag atattgaagc
agaacgcaaa aagagagatg agattgaggc tgggaaaagt
tactgtagcc gacgttttgg cggcgcaacc tgtgacgaca
aatctgctca aatttatgcg cgcttcgata aaaatgattg
gcgtatccaa cctgca
13
Organism                    Genome size (bp)   Genes
ΦX174                                  5,386       9
Escherichia coli                   4,600,000   4,288
Saccharomyces cerevisiae          13,000,000   5,885
Caenorhabditis elegans           100,000,000  14,000
Arabidopsis thaliana             120,000,000  10,000
Drosophila melanogaster          180,000,000  12,000
Homo sapiens                   3,400,000,000  25,000
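One rough way to read these figures is as gene density (genes per megabase). The sketch below uses only the sizes and gene counts listed on the slide; the names GENOMES, genes_per_megabase and density_table are illustrative and not part of the lecture.

```python
# Genome size (bp) and gene count per organism, as listed on the slide.
GENOMES = {
    "PhiX174": (5_386, 9),
    "Escherichia coli": (4_600_000, 4_288),
    "Saccharomyces cerevisiae": (13_000_000, 5_885),
    "Caenorhabditis elegans": (100_000_000, 14_000),
    "Arabidopsis thaliana": (120_000_000, 10_000),
    "Drosophila melanogaster": (180_000_000, 12_000),
    "Homo sapiens": (3_400_000_000, 25_000),
}


def genes_per_megabase(size_bp, genes):
    """Return the number of genes per 1,000,000 bp of genome."""
    return genes / (size_bp / 1_000_000)


def density_table(genomes=GENOMES):
    """Map each organism name to its gene density in genes/Mb."""
    return {name: genes_per_megabase(size, genes)
            for name, (size, genes) in genomes.items()}


if __name__ == "__main__":
    for name, density in density_table().items():
        print(f"{name:<26} {density:10.1f} genes/Mb")
```

With these numbers the density falls from over a thousand genes per megabase in the phage to fewer than ten in humans, so genome size grows much faster than gene count.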
14
(No Transcript)
15
The Human Genome Project
Started more than 20 years ago (1985)
The U.S. government agreed to invest 200,000,000 U.S. dollars per year for 20 years.
3,400,000,000 bp per haploid genome
6,800,000,000 bp per diploid genome
One base per second = 216 years!
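The 216-year figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes a 365-day year and that the figure refers to the diploid genome; the function name years_to_read is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def years_to_read(bases, bases_per_second=1.0):
    """Years needed to read `bases` at a constant rate."""
    return bases / bases_per_second / SECONDS_PER_YEAR


HAPLOID_BP = 3_400_000_000   # from the slide
DIPLOID_BP = 2 * HAPLOID_BP  # 6,800,000,000 bp

if __name__ == "__main__":
    print(f"haploid: {years_to_read(HAPLOID_BP):.0f} years")
    print(f"diploid: {years_to_read(DIPLOID_BP):.0f} years")
```

Under these assumptions the diploid genome takes roughly 216 years at one base per second, and the haploid genome about half that.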
16
Year   Human genes mapped    Years to sequence human genome
1970   none                  not possible
1980   3                     4,000,000 years
1990   12                    1000 years
2000   25,000                draft
2005   30,000                new draft
2007   31,784 Craig 30,384   1000 human genome
chimp, chicken, dog, mouse, pig, rat...
17
Discussion questions
Why is the human genome not finished yet?
Why is the human genome so large?
One base per second = 216 years!
2% or less codes for proteins.
18
Aristotle's ladder of complexity
19
(No Transcript)
20
Database of Genome Sizes (DOGS) ladder of complexity
minerals
21
The C-value paradox
The genome size of an organism is defined as the amount of haploid DNA in a genomic set (e.g., an egg or sperm nucleus). This is also referred to as the "C-value"; the "C" means "constant" or "characteristic", since the size of a genome is usually constant for a given species. The large difference in genome sizes, without any apparent relation to an organism's complexity, is called the C-value paradox.
22
What does all this DNA do?
23
DNA repeats
The approximate size and characteristics of genomes were characterised in the 1960s, in a classic study of the kinetics of DNA reassociation by Britten and Kohne (1968).
They found that the DNA could be divided into four fractions:
1. foldback DNA
2. highly repetitive DNA
3. middle-repetitive DNA
4. single-copy DNA
The repetitive DNA can either be localised to discrete regions, or dispersed.
Britten, R.J., Kohne, D.E., "Repeated sequences in DNA", Science, 161:529-540 (1968).
24
Highly repetitive DNA
Dispersed - e.g., the Alu family
- about 300 bp long
- 500,000 copies in humans (about 5% of the human genome)
- dispersed throughout the chromosomes
Localised highly repetitive sequences
- about 2-10 bp long
- present in millions of copies, often in large blocks (about 6% of the human genome)
- associated with heterochromatin
- usually very high AT content
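The Alu figure can be sanity-checked from the copy number and repeat length given on this slide, together with the 3,400,000,000 bp haploid genome size quoted earlier in the lecture. The helper name repeat_fraction is hypothetical; this is only a back-of-the-envelope sketch.

```python
def repeat_fraction(unit_bp, copies, genome_bp):
    """Fraction of a genome occupied by `copies` repeats of length `unit_bp`."""
    return unit_bp * copies / genome_bp


ALU_LENGTH_BP = 300            # approximate length from the slide
ALU_COPIES = 500_000           # copy number from the slide
HUMAN_HAPLOID_BP = 3_400_000_000

if __name__ == "__main__":
    fraction = repeat_fraction(ALU_LENGTH_BP, ALU_COPIES, HUMAN_HAPLOID_BP)
    print(f"Alu repeats: {ALU_LENGTH_BP * ALU_COPIES:,} bp, {fraction:.1%} of the genome")
```

The result is a little over 4%, which is consistent with the slide's "about 5" percent given that both inputs are rounded.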
25
Localised repetitive DNA
Often, satellite DNA consists of long tandem arrays of repeated sequences, all localised to one or a few discrete regions in the chromosomes. For example, in the kangaroo rat (Dipodomys ordii), more than 50% of the genome consists of three families of repeated sequences:
(AAG)n, where n = 2.24 x 10^9
(TTAGGG)n, where n = 2.2 x 10^9
(ACACAGCGGG)n, where n = 1.2 x 10^9
26
Middle repetitive DNA
makes up more than 40% of the human genome
position varies due to transposable elements
Includes the following types of sequences:
- Dinucleotide repeats - microsatellite DNA
- Trinucleotide repeats - associated with many diseases (e.g., Fragile X, muscular dystrophy)
27
What about bacteria?
28
(No Transcript)
29
(No Transcript)
30
A decade of sequencing prokaryotic genomes.
(Figure: number of genomes published per year)
31
Kronborg Castle
32
(No Transcript)
33
(No Transcript)
34
(No Transcript)
35
What is Bioinformatics?
Bioinformatics is the application of machine learning processes to biological information.
36
bioinformatics, n.
The science of information and information flow
in biological systems, esp. the use of
computational methods in genetics and genomics.
37
(No Transcript)
38
What is Biological Information?


Genome -> Transcriptome -> Proteome
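As a toy illustration of this flow, the sketch below transcribes a short DNA fragment into mRNA and translates it into amino acids. It assumes the standard genetic code; the 30-base fragment is copied from the ΦX174 sequence shown on slide 12, and the function and constant names are illustrative only.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Coding-strand DNA -> mRNA (T replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """mRNA -> one-letter protein string, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# ATG-initiated stretch taken from the slide-12 sequence.
FRAGMENT = "atgagtcaagttactgaacaatccgtacgt"

if __name__ == "__main__":
    mrna = transcribe(FRAGMENT)
    print(mrna)
    print(translate(mrna))
```

Real transcriptomes and proteomes depend on which genes are expressed and when, so this shows only the sequence-level mapping from one layer to the next.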
39
(No Transcript)
40
(No Transcript)
41
The DNA sequence contains information. But what
kind of information?
42
ΦX174
(ΦX174 genome sequence, repeated from slide 12)
43
(No Transcript)
44
(No Transcript)
45
Summary
1. A genome is the sum of all the DNA sequences (chromosomes) in an organism.
2. Genome sizes range about 100,000,000-fold, from small viruses to very large amoebas.
3. Many eukaryotic genomes contain large fractions of repeated DNA sequences, and there is no apparent correlation between the size of an organism's genome and its biological complexity (the C-value paradox).
4. Sequence databases are growing faster than computational power!
46
(No Transcript)