Title: Genome Characterization
1Genome Characterization
- DNA sequence-ULTIMATE Map
- DNA sequencing-methods
- Assembly/sequencing
Assigned reading Ch 13, Service 2006 review paper
BIO520 Bioinformatics Jim Lund
2DNA Sequence Project Size/Type
- 500 bases: 1 locus (EST, STS)
- 2500 bases: whole cDNA/EST
- 10 kbp: Gene, virus
- 150 kbp: BAC, big virus
- 3 Mbp (simple or with repeats): Bacterial genome, YAC-size
- 3 Gbp: Human, mouse
- 31 Gbp: Salamander
3Genome sizes
- Nematode (Caenorhabditis elegans) 100 Mb
- Thale cress (Arabidopsis thaliana) 160 Mb
- Fruit fly (Drosophila melanogaster) 180 Mb
- Puffer fish (Takifugu rubripes) 400 Mb
- Rice (Oryza sativa) 490 Mb
- Human (Homo sapiens) 3.5 Gb
- Leopard frog (Rana pipiens) 6.5 Gb
- Onion (Allium cepa) 16.4 Gb
- Mountain grasshopper (Podisma pedestris) 16.5 Gb
- Tiger salamander (Ambystoma tigrinum) 31 Gb
- Easter lily (Lilium longiflorum) 34 Gb
- Marbled lungfish (Protopterus aethiopicus) 130 Gb
4DNA Sequencing Methods
- Chain termination/Dideoxy/Sanger
- Fluorescence paradigm, ABI
- Main method
- Sequencing by hybridization (SBH)
- Chips Affymetrix (Lander, et al)
- Other formats
- Hyseq
5Sequence by Hybridization
- High throughput, highly parallel
- Many different formats
- Superb for limited regions of genome
GAGCTACGTGACACGTCAGTCCCAG
GAGCTACGTGACCCGTCAGTCCCAG
GAGCTACGTGACGCGTCAGTCCCAG
GAGCTACGTGACTCGTCAGTCCCAG
[Figure labels: A, C, T, G, T]
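The four 25-base probes on this slide differ only at the central base (position 13). A minimal sketch of how that base could be called, assuming an idealized model in which the probe with the fewest mismatches to the sample gives the strongest signal. Real arrays compare hybridization intensities and probes bind the complementary strand; the function names here are hypothetical.

```python
# Probes from the slide: identical except for the central base.
PROBES = [
    "GAGCTACGTGACACGTCAGTCCCAG",
    "GAGCTACGTGACCCGTCAGTCCCAG",
    "GAGCTACGTGACGCGTCAGTCCCAG",
    "GAGCTACGTGACTCGTCAGTCCCAG",
]


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def call_center_base(target, probes=PROBES):
    """Return the central base of the probe that best matches the target.

    Assumption: fewest mismatches stands in for strongest hybridization.
    """
    best = min(probes, key=lambda p: mismatches(p, target))
    return best[len(best) // 2]


sample = "GAGCTACGTGACGCGTCAGTCCCAG"
print(call_center_base(sample))
```

Because only one position is interrogated per probe set, this style of assay suits resequencing a known, limited region rather than reading unknown sequence.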
6Dideoxy/Chain Terminator/Sanger
- Template
- Primer
- Extension Chemistry
- polymerase
- termination
- labeling
- Separation
- Detection
7Chain Terminator Basics
TGCA
Extend
dN : ddN = 100 : 1
Ladder: n, n+1, ...
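A rough sketch of why a 100:1 ratio of dN to ddN gives a ladder of fragment lengths. It assumes the polymerase incorporates a terminator with probability 1/(ratio + 1) at every position, independent of sequence, which is a simplification; the function names are hypothetical.

```python
def termination_fraction(n, ratio=100.0):
    """Fraction of extension products that terminate exactly at position n.

    Assumes a terminator is incorporated with probability p = 1 / (ratio + 1)
    at each step, so lengths follow a geometric distribution.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    p = 1.0 / (ratio + 1.0)
    return p * (1.0 - p) ** (n - 1)


def fraction_longer_than(n, ratio=100.0):
    """Fraction of products still extending after n positions."""
    p = 1.0 / (ratio + 1.0)
    return (1.0 - p) ** n


for n in (1, 100, 500, 1000):
    print(n, termination_fraction(n), fraction_longer_than(n))
```

Under this model the signal per ladder rung falls off steadily with length, which is one reason a single read stops being usable after several hundred bases.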
8Electrophoresis
Sequencing Reaction products
Polyacrylamide Gel Electrophoresis (PAGE)
9Separation
- Gel Electrophoresis
- Capillary Electrophoresis
- suited to automation
- rapid (2 hrs vs 12 hrs)
- re-usable
- simple temperature control
- 96 well format
migration ∝ 1/log N
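If migration distance scales as 1/log N for a fragment of N bases, the gap between neighbouring fragments shrinks as N grows. A small sketch of that consequence, assuming a proportionality constant of 1 and the natural logarithm (neither is given on the slide):

```python
import math


def relative_migration(n):
    """Relative migration of an n-base fragment, taken as 1 / ln(n)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return 1.0 / math.log(n)


def spacing(n):
    """Difference in relative migration between fragments of n and n + 1 bases."""
    return relative_migration(n) - relative_migration(n + 1)


for n in (50, 100, 500, 1000):
    print(n, spacing(n))
```

The spacing at 1000 bases is far smaller than at 100 bases, so single-base resolution is lost for long fragments.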
10Paradigm Instrument
- Applied Biosystems
- http://www.appliedbiosystems.com/
- ABI3730XL (2002, 96 samples, 1000 base reads, $350,000, higher sensitivity, lower reagent cost, $1/reaction)
- 700 Kbp / 24 hours.
- 384 capillary sequencers
- 5700 sequences / 24 hr day
- 2.8 Mbp / 24 hours.
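A back-of-the-envelope check on these throughput figures. The 500-base read length and the 3 Gbp genome at 10X coverage are assumptions chosen for illustration, and the function names are hypothetical.

```python
def bases_per_day(reads_per_day, read_length):
    """Raw bases produced per instrument per day."""
    return reads_per_day * read_length


def instrument_days(genome_bp, coverage, bp_per_day):
    """Whole instrument-days needed to reach the given coverage (rounded up)."""
    total = genome_bp * coverage
    return -(-total // bp_per_day)  # integer ceiling division


# 5700 reads/day at an assumed 500 bases each is close to the quoted 2.8 Mbp/day.
print(bases_per_day(5700, 500))

# Assumed target: 3 Gbp genome at 10X coverage on one 2.8 Mbp/day instrument.
print(instrument_days(3_000_000_000, 10, 2_800_000))
```

A single instrument would need decades at this rate, which is why genome centers ran many sequencers in parallel.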
11 384-well capillary sequencing
Results are shown as an electropherogram showing
a peak for each base. From the peak heights and
widths, a Phred score is assigned to each
individual base. A high Phred score indicates a
high certainty as to the identity of that
particular base.
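The slide does not give the formula, but the standard Phred definition relates the score Q to the estimated error probability P by Q = -10 log10(P). A minimal conversion sketch (function names are hypothetical):

```python
import math


def phred_to_error_prob(q):
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an estimated error probability p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    print(q, phred_to_error_prob(q))
```

A score of 20 corresponds to an estimated 1 error in 100 calls, and each additional 10 points is a tenfold drop in error probability.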
12Sample Output
13Limitations/Challenges
- 1 trace: 1000 bases or less
- How do we cover a genome?
- DIVIDE AND CONQUER: assemble these short sequence fragments.
14Trace Editing
- EditView (ABI PRISM)
- Mac
- Chromas (free/pay versions)
- Windows
- Consed
- UNIX
- VectorNTI Contig Express
15Sequencing Strategies
- Ordered
- Divide and Conquer
- Random Sequence
- Brute Force
Sequencing → Assembly → Finishing → Annotation
Big projects often mix methods.
16Random Method
- Shear DNA (nebulize)
- finish ends, ligate into vector
- Produce template
- Sequence to 8X-10X coverage
- Sequence both ends of templates.
- Read length (500 bases typical)
- Accuracy (99% good)
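Why 8X to 10X? A sketch using the Poisson (Lander-Waterman style) model, which is not named on the slide and is brought in here as an assumption: if reads land uniformly at random, the expected fraction of bases left uncovered at coverage c is e^(-c). The 3 Mbp genome in the example is also assumed.

```python
import math


def reads_needed(genome_bp, coverage, read_length=500):
    """Number of reads of the given length needed to reach the target coverage."""
    return math.ceil(genome_bp * coverage / read_length)


def fraction_uncovered(coverage):
    """Expected fraction of bases with no read, assuming uniformly random reads."""
    return math.exp(-coverage)


genome_bp = 3_000_000  # assumed bacterial-size genome
for c in (1, 4, 8, 10):
    print(c, reads_needed(genome_bp, c), fraction_uncovered(c) * genome_bp)
```

At 1X more than a third of the genome is expected to be missed, while at 8X to 10X the expected number of uncovered bases in a genome of this size drops to roughly a thousand or fewer. Cloning bias and repeats make real projects worse than this model suggests.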
17Assembly Problem
CONTIG
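A toy illustration of the assembly problem: merge reads that overlap end to end until no overlaps remain. This is a simple greedy sketch, not the algorithm used by production assemblers; it ignores sequencing errors, reverse complements, and repeats, and the example sequence and function names are made up.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair with the largest overlap.

    Sequences that overlap nothing are returned as separate contigs.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


# Three 12-base reads taken from a made-up 24-base sequence, overlapping by 6.
reads = ["ATGGCGTACCTT", "TACCTTAGCAAG", "AGCAAGGATCCG"]
print(greedy_assemble(reads))
```

Reads that share no overlap with the rest stay as their own contigs, which is how gaps in coverage show up in an assembly.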
18Contigs, Islands
contigs
Island
19Random
20Continuing rapid improvement in sequencing technology
21
- 1990s: Human genome (3 Gbp), $300 million (just sequencing)
- Current: Mammalian genome (3 Gbp), $30 million
- Goal: $100,000 genome, 300X cheaper (and faster)
Current best technology
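The cost figures on this slide can be turned into a per-base cost. This sketch treats the amounts as US dollars and the genome as 3 Gbp in every case; both are assumptions, and the function name is hypothetical.

```python
GENOME_BP = 3_000_000_000  # assumed 3 Gbp genome for all three cases

# Total sequencing cost per genome, read as US dollars (assumption).
COSTS = {
    "1990s": 300_000_000,
    "current": 30_000_000,
    "goal": 100_000,
}


def cost_per_base(total_cost, genome_bp=GENOME_BP):
    """Cost per base of genome for a given total cost."""
    return total_cost / genome_bp


for label, cost in COSTS.items():
    print(label, cost_per_base(cost))

# Ratio between the current cost and the goal.
print(COSTS["current"] // COSTS["goal"])
```

The ratio of the current figure to the goal comes out at 300, matching the "300X cheaper" on the slide.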