Title: Genome Sequence Determination
1 Genome Sequence Determination
E-mail: cychen_at_cycu.edu.tw
Web site: www.cychen.idv.tw
2 Complete Microbial Genomes
3 Genome what now?
- Sequencing is
  - Determining the full nucleotide sequence of one strain of an organism
  - Making predictions of genes within that sequence
  - Predicting the function of those genes
  - HARD!
- Sequencing requires
  - Time
  - Money
  - People
  - Computers
4 Genome what now?
- Before sequencing
  - Nature of the organism
  - Genetic code
  - Genome size
  - Genome structure
- Sequencing means
  - Bioinformatics
  - Functional assays
  - More
13 Organism Selection
Library Creation
Sequencing
Assembly
Gap Closure
Finishing
Annotation
14 Organism Selection
Library Creation
Sequencing
Assembly
Gap Closure
Finishing
Annotation
Which steps are computationally expensive?
16 Organism Selection
Library Creation
Sequencing
Assembly
Gap Closure
Finishing
Annotation
Which steps have not already been exceptionally well studied?
18 Organism Selection
Library Creation
Sequencing
Assembly
Gap Closure
Finishing
Annotation
Which step has not been subjected to a variety of approaches?
20 Organism Selection
- Nature of the organism: pathogen?
- Genetic code
- Genome size
- Genome structure
21 Vibrio vulnificus
- Strain: YJ016
- Genome size: 5.2 Mb
- Source: Southern Taiwan
- Significance: virulence
- Strategy: whole-genome shotgun sequencing
- Coverage: 10X
22 Organism Selection
- Nature of the organism: pathogen?
- Genetic code: special code?
- Genome size
- Genome structure
23 Genetic Code Tables: http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c
24 Organism Selection
- Nature of the organism: pathogen?
- Genetic code: special code?
- Genome size: how many megabases?
- Genome structure
25 Organism Selection
- Nature of the organism: pathogen?
- Genetic code: special code?
- Genome size: how many megabases?
- Genome structure: linear or circular chromosome? How many?
26 How to sequence a complete genome?
- Sizes of bacterial genomes vary from 0.6 Mb (Mycoplasma genitalium) to 13 Mb (Myxobacteria).
- The read length of a DNA sequencing reaction is only 600 bp (0.0006 Mb), so the genome obviously has to be subdivided.
- Once the genome is subdivided into pieces of a size suitable for sequencing, the individual sequences/fragments have to be put back into their "native" order.
- Overlaps between the fragments are therefore necessary in order to re-assemble the pieces.
- There are two main sequencing strategies:
  1. Whole-genome shotgun sequencing
  2. Ordered shotgun sequencing
27 c: Coverage
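The coverage value c can be related to the number of reads with a little arithmetic. The sketch below assumes the usual definition c = N x L / G (N reads of length L over a genome of size G) and the Lander-Waterman approximation that a fraction e^(-c) of the genome stays unsequenced. Neither formula appears on the slide, and the function names are illustrative only.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average coverage c = N * L / G."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads needed to reach a target coverage (rounded up)."""
    return math.ceil(target_coverage * genome_size / read_length)


def fraction_unsequenced(c: float) -> float:
    """Lander-Waterman estimate (an assumption here): fraction of bases hit by no read."""
    return math.exp(-c)


if __name__ == "__main__":
    # Values taken from the slides: 5.2 Mb genome, 600 bp reads, 10X coverage.
    n = reads_needed(10, 600, 5_200_000)
    print("reads needed:", n)
    print("expected unsequenced fraction at 10X:", fraction_unsequenced(10))
```

Under this model even 10X coverage leaves a small expected unsequenced fraction, and real libraries are not perfectly random, which is one reason a gap-closure step is still needed.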
28
- Two ends overlapped
- Not overlapped
- Plasmid percentage in contigs
30 Library Creation
- Teamwork
- Quality control (QC)
- Timetable
- Budget
- Paper
31 Standard Operating Procedures of a Genome Project
A. Decision
   Mapping
   Protocol 1
   QC
   PCR confirm
   Protocol 2
B. Library
   Protocol 3
   DNA purification
   Protocol 4
   PFG
   FISH
   QC
   PCR confirm
   Protocol 5
   Shotgun library
   Picking
   Print labels
C. Sequencing
   QC
   Protocol 6
   Plasmid DNA
   Sequencing reactions
   Dye primers
   Protocol 7
   QC
   Dye terminator
   Protocol 8
   Gel running
   Protocol 9
   377
   QC
   Protocol 10
   3700
D. Finish
   Protocol 11
   Assemble
   Protocol 12
   Annotation
32 Library (1)
Random shearing of genomic DNA
1. Restriction enzyme
   - Sau3AI (GATC): affected by CG methylase
   - MboI (GATC): affected by dam methylase, not affected by CG methylase
2. Sonication
   - Sonication, Bal31 repair, T4 DNA polymerase, sizing, recovery, ligation
3. GeneMachine: easy sizing by filter
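To get a feel for how often a 4-base site such as GATC occurs, the following sketch locates every GATC in a sequence and reports the fragment lengths of a complete digest of a linear molecule. It is an illustration only: library construction normally relies on partial digestion, methylation sensitivity is ignored, and the function names are made up for this example.

```python
def cut_positions(seq: str, site: str = "GATC") -> list[int]:
    """Start index of every occurrence of the recognition site (overlaps allowed)."""
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    return positions


def fragment_lengths(seq: str, site: str = "GATC") -> list[int]:
    """Fragment lengths after cutting immediately 5' of each site (^GATC)."""
    bounds = [0] + cut_positions(seq, site) + [len(seq)]
    # Skip zero-length pieces, e.g. when the sequence starts with the site.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    demo = "AAGATCTTTTGATCCC"
    print(cut_positions(demo))
    print(fragment_lengths(demo))
    # In random sequence with equal base frequencies a 4-base site is
    # expected once every 4**4 = 256 bp on average.
    print("expected spacing:", 4 ** 4)
```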
34 Library (2): library clones and sequencing clones
Genome: Chromosome I (3.3 Mb) and Chromosome II (1.8 Mb)
Shotgun libraries, each sequenced from both ends:
- Library 1: 2.5-3.5 kb inserts, 7X coverage
- Library 2: 5.5-7.5 kb inserts, 3X coverage
- Library 3: 30 kb inserts (cosmid library), 10X clone coverage, 0.4X sequence coverage
Assemble the reads with the phred/phrap/consed software
- Contig 1, Contig 2, Contig 3
Close the gaps by primer walking, PCR, or re-sequencing
Annotation
35 Library (2): library clones and sequencing clones
- Genome: 5,000,000 bp
- 1000 bp per clone
- 5,000,000 / 1000 = 5000 clones, about 52 x 96-well plates
- 10X redundancy: 52 x 10 x 96-well plates (library clones)
- Both-end sequencing: 2 x 52 x 10 x 96-well plates, about 1000 plates (sequencing clones)
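The plate arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch using the slide's own numbers as defaults; the function name and the choice to round to whole plates at the 1X step (as the slide does with 52) are assumptions.

```python
def plate_budget(genome_bp: int = 5_000_000,
                 bp_per_clone: int = 1000,
                 redundancy: int = 10,
                 reads_per_clone: int = 2,
                 wells_per_plate: int = 96) -> tuple[int, int, int, int]:
    """Return (clones at 1X, plates at 1X, library plates, sequencing plates)."""
    clones_1x = genome_bp // bp_per_clone
    # The slide rounds 5000 / 96 = 52.08 to 52 plates.
    plates_1x = round(clones_1x / wells_per_plate)
    library_plates = plates_1x * redundancy
    # Sequencing both ends of every clone doubles the number of reaction plates.
    sequencing_plates = library_plates * reads_per_clone
    return clones_1x, plates_1x, library_plates, sequencing_plates


if __name__ == "__main__":
    clones, plates, lib_plates, seq_plates = plate_budget()
    print(f"{clones} clones at 1X, {plates} plates at 1X")
    print(f"{lib_plates} library plates, {seq_plates} sequencing plates")
```

With the defaults this gives 1040 sequencing plates, which the slide rounds to about 1000.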
36 Sequencing (1): Timetable
1. Throughput per instrument (one run = one 96-well plate)
   - 377: 2 runs per day
   - 3700: 6 runs per day (POP6) or 8 runs per day (POP5)
   - 3730: 12 runs per day
2. 377 x 2 sets = 4 runs per day; 3700 x 2 sets = 6 x 1 + 8 x 1 = 14 runs per day; total 18 runs per day
3. 1000 plates / 18 = 56 days, i.e. 11 weeks of 5 working days (3 months)
4. Today, 3730 x 4 sets = 48 runs per day; 1000 plates / 48 = about 21 days
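The timetable can be recomputed for any mix of instruments. The sketch below uses the runs-per-day figures quoted on this slide; the dictionary keys and function names are illustrative, and partial days are rounded up to whole days.

```python
import math

# Runs per day per instrument, as quoted on the slide (one run = one 96-well plate).
RUNS_PER_DAY = {"377": 2, "3700_POP6": 6, "3700_POP5": 8, "3730": 12}


def fleet_runs_per_day(fleet: dict[str, int]) -> int:
    """Total daily runs for a fleet given as {instrument: number of machines}."""
    return sum(RUNS_PER_DAY[model] * count for model, count in fleet.items())


def days_needed(plates: int, runs_per_day: int) -> int:
    """Whole days needed to run all plates (rounded up)."""
    return math.ceil(plates / runs_per_day)


old_fleet = {"377": 2, "3700_POP6": 1, "3700_POP5": 1}
new_fleet = {"3730": 4}

if __name__ == "__main__":
    for name, fleet in (("old", old_fleet), ("new", new_fleet)):
        runs = fleet_runs_per_day(fleet)
        print(name, runs, "runs/day,", days_needed(1000, runs), "days for 1000 plates")
```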
37 Sequencing (2): Cost
38
- ABI 377
- ABI 3700
- MegaBACE 4000
- ABI 3730XL
39 The automated production line for sample
preparation at the Whitehead Institute, Center
for Genome Research. The system consists of
custom-designed factory-style conveyor belt
robots that perform all functions from purifying
DNA from bacterial cultures through setting up
and purifying sequencing reactions.
40 Reads vs. Assembled Contigs
41 Reads and Assembled Size
42 How does assembly software work?
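As a rough illustration of the idea behind assembly, the sketch below repeatedly merges the two sequences with the longest exact suffix/prefix overlap until no overlap of at least `min_len` bases remains. This is a toy greedy overlap assembler written for this example, not how phrap or any production assembler works: it ignores sequencing errors, base qualities, reverse-complement reads, and repeats.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads greedily by best overlap; returns the remaining contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge: the rest are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ATGCGTAC", "GTACCTGA", "CTGATTAG", "GGGGCCCC"]
    print(greedy_assemble(reads))
```

Reads that share no overlap with the others end up as separate contigs, which is exactly the situation gap closure has to resolve.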
47 What is Gap Closure?
- What are gaps?
  - Unsequenced regions located between assembly-generated fragments of contiguous sequence (contigs)
- What causes gaps?
  - Host toxicity, secondary structure, ???
- Back to gap closure
  - Producing, purifying, and sequencing, or locating, the missing regions of DNA
48 How Can I Close Gaps?
- Genome Walking
  - Blind PCR extension of contigs
- Multiplex PCR
  - Combinatorial trial of every contig pair
- Read Pair Analysis
  - Use information stored by the assembler to suggest alignments, then PCR
- Comparative Alignment
49 Comparative Alignment (the Bioinformatics Approach)
- Find locations where contigs are homologous to known sequences
- Determine if any contigs share homology in the same region of the same sequence
- Design primers
- Conduct PCR with those primers
- Sequence that product and use that sequence to close the gap
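The second step, finding contigs that hit the same region of the same known sequence, can be sketched as follows. The `Hit` record stands in for already-parsed alignment results (for example from a tabular BLAST report). Its fields, the `max_gap` threshold, and the function name are assumptions made for illustration.

```python
from itertools import combinations
from typing import NamedTuple


class Hit(NamedTuple):
    contig: str   # name of the contig whose end was aligned
    subject: str  # identifier of the known (reference) sequence
    start: int    # alignment start on the subject (start <= end assumed)
    end: int      # alignment end on the subject


def candidate_pairs(hits: list[Hit], max_gap: int = 5000) -> list[tuple[str, str, str, int]]:
    """Pairs of different contigs that hit the same subject within max_gap bases."""
    by_subject: dict[str, list[Hit]] = {}
    for h in hits:
        by_subject.setdefault(h.subject, []).append(h)

    pairs = []
    for subject, group in by_subject.items():
        ordered = sorted(group, key=lambda h: h.start)
        for a, b in combinations(ordered, 2):
            if a.contig == b.contig:
                continue
            gap = b.start - a.end  # estimated gap size on the reference
            if 0 <= gap <= max_gap:
                pairs.append((a.contig, b.contig, subject, gap))
    return pairs


if __name__ == "__main__":
    hits = [
        Hit("contig1", "refA", 100, 900),
        Hit("contig2", "refA", 1200, 2000),
        Hit("contig3", "refB", 50, 500),
        Hit("contig4", "refA", 90000, 91000),
    ]
    for pair in candidate_pairs(hits):
        print(pair)
```

Each reported pair is only a hypothesis that two contigs are neighbours; it still has to be confirmed by PCR with primers designed at the contig ends.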
50 Blast Organism X (cross) - Comparison
- Compares contig ends to the NCBI nr database with BlastN
- Parses all hits and finds biologically possible contig pairs
- Using the flanking sequence and Primer3, designs primers that will produce a PCR product spanning that gap
51 Using the flanking sequence and Primer3, design
primers that produce a PCR product spanning that
gap
TTATGCTATCGAATTCCGACG
GTCTGCAGGTCTTCCGACGTAG
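Primer3 weighs many properties when it picks primers. As a rough illustration of three of the simplest, the sketch below computes length, GC fraction, and a rule-of-thumb melting temperature for the two example primers above. The Wallace rule, Tm = 2(A+T) + 4(G+C), is a crude approximation meant for short oligos and is not what Primer3 uses internally; the function names are illustrative.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Reverse complement, needed when a primer must read from the opposite strand."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    for primer in ("TTATGCTATCGAATTCCGACG", "GTCTGCAGGTCTTCCGACGTAG"):
        print(primer, len(primer), round(gc_fraction(primer), 2), wallace_tm(primer))
```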
54 Information to reduce gaps
- The distance between the two end sequences of a clone
- Cosmid anchors
- Known genes
- Comparison with other genomes
- Good luck
55 Finishing Standards
1. General rules for finishing
   - Phase 1: draft sequence assembled in contigs
   - Phase 2: contigs ordered and linked
   - Phase 3: assembled as one contig with a low error rate (0.01%)
2. Strategy of finishing
   - A. Primer walking
   - B. Re-sequencing individual clones
   - C. PCR and sequencing
   - D. Screening new clones
   - E. Subcloning
   - F. Deletion and sequencing
   - G. Changing the sequencing chemistry
   - H. Restriction map
   - I. End sequencing
56 Shotgun sequencing
- Analogy: shredding several copies of Essential Cell Biology, then putting them back together via overlapping phrases
- Really only good for small genomes
- 1995: used for the genome of Haemophilus influenzae
- Problem: repetitive nucleotide sequences, which make up a large part of vertebrate genomes (analogy: phrases like "the human genome" and the difficulties they cause)
57 Repetitive sequences make correct assembly
difficult
58 Multiple Genes
59 Timeline of large-scale genomic analyses. Shown
are selected components of work on several
non-vertebrate model organisms (red), the mouse
(blue) and the human (green) from 1990; earlier
projects are described in the text. SNPs, single
nucleotide polymorphisms; ESTs, expressed
sequence tags.
60 Science, Vol. 277, pp. 1453-1462, 1997
63 Timeline, 1998-2003
- Set up genome center
- NLBL mapped over 300 clones
- YMGRC/NHRI
64 Vibrio vulnificus
- Strain: YJ016
- Genome size: 5.2 Mb
- Source: Southern Taiwan
- Significance: virulence
- Strategy: whole-genome shotgun sequencing
- Coverage: 10X
65 http://genome.nhri.org.tw/vv/
66 Vibrio vulnificus
67 Global features of the Vibrio vulnificus YJ016 genome
68 GC content of V. vulnificus Chromosomes 1 and 2
- Chromosome 1
- Chromosome 2
69 GC skew of V. vulnificus Chromosomes 1 and 2
- Chromosome 1
- Chromosome 2
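Plots like these are typically computed in sliding windows along each chromosome. The sketch below assumes the common definitions GC content = (G + C) / length and GC skew = (G - C) / (G + C); the slides do not state which definitions or window sizes were used, so the names and defaults here are illustrative.

```python
from typing import Callable


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C); 0.0 when the window contains no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def windowed(seq: str, window: int, step: int,
             func: Callable[[str], float]) -> list[float]:
    """Apply func to each full window of the given size, moving by step."""
    return [func(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]


if __name__ == "__main__":
    demo = "GGGGCCCCATATGGGC"
    print(windowed(demo, 4, 4, gc_content))
    print(windowed(demo, 4, 4, gc_skew))
```

In many bacterial chromosomes the sign of the GC skew changes near the replication origin and terminus, which is why the skew is usually plotted per chromosome.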
70 Comparison of the similarity between the V. vulnificus and V. cholerae genomes
71 Circular representation of the Vibrio vulnificus YJ016 genome
- Chromosome 1: 3.3 Mb
- Chromosome 2: 1.85 Mb
- Plasmid pYJ016: 48.5 kb
72 Comparison of predicted genes of V. vulnificus YJ016, V. cholerae El Tor N16961, and E. coli K12
75 Some more technological approaches (some of which really work!)
- Sequencing by hybridization (annealing)
- Sequencing by ligase-edited annealing
- Pyrosequencing
Note: there are also higher-tech versions of classic Sanger sequencing in the works (see http://www.helicosbio.com)
79 Several companies are pursuing massively parallel (i.e., cheaper) new DNA sequencing strategies, including some that involve single-molecule analyses. Some of the main players are given below:
- 454 Life Sciences (http://www.454.com/enabling-technology/the-system.asp)
- Solexa (now part of Illumina) (http://www.illumina.com/pages.ilmn?ID=203)
- Helicos BioSciences (http://www.helicosbio.com)
- VisiGen Biotechnologies (http://www.visigenbio.com/technology.html)
84 Solexa sequencing technology
87 Thank you