Title: DNA Sequencing
1. DNA Sequencing
- Basic Techniques
- Project Design
- Process Improvements
2. Project Size/Type
- 500 bases
- 2500 bases
- 10 kbp
- 150 kbp
- 3 Mbp
- simple
- repeats
- BIG
- 1 locus EST,STS
- whole cDNA/EST
- gene, virus
- BAC, big virus
- bacterial genome
- YAC-size
- HUMAN, etc.
3. DNA Sequencing Methods
- Chain termination/Dideoxy/Sanger
- fluorescence paradigm, ABI, HOOD
- Sequencing by hybridization
- chips Affymetrix (Lander, et al)
- other formats
- Hyseq (Church, et al)
- Lark
4. Dideoxy/Chain Terminator/Sanger
- Template
- Primer
- Extension Chemistry
- polymerase
- termination
- labeling
- Separation
- Detection
5. Chain Terminator Basics
TGCA
Extend
dN:ddN = 100:1
Ladder: n, n+1, ...
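The ladder can be illustrated numerically. This is a minimal sketch, assuming an idealized polymerase that picks a ddN over a dN purely by their 100:1 ratio, so every extension step terminates with probability 1/101. The function names are illustrative only.

```python
def termination_fraction(n, dn_to_ddn=100.0):
    """Fraction of extension products that terminate exactly at base n (1-based)."""
    p_stop = 1.0 / (dn_to_ddn + 1.0)  # chance a ddN is incorporated at any one step
    return (1.0 - p_stop) ** (n - 1) * p_stop


def fraction_longer_than(n, dn_to_ddn=100.0):
    """Fraction of products that are still extending after base n."""
    p_stop = 1.0 / (dn_to_ddn + 1.0)
    return (1.0 - p_stop) ** n


if __name__ == "__main__":
    for n in (1, 100, 500):
        print(n, termination_fraction(n), fraction_longer_than(n))
```

Under this assumption the band intensity falls off geometrically with length, which is one reason the dN:ddN ratio has to be tuned to the read length wanted.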
6. Electrophoresis
7. Template Preparation
- ssDNA vectors
- M13
- pUC
- PCR
- dsDNA (+/- PCR)
8. Primers
- Universal primers
- cheap, reliable, easy, fast, parallel
- BULK sequencing
- Custom primers
- expensive, slow, one-at-a-time
- ADAPTABLE
Primer Label
Dye Terminator
9. Extension Chemistry
100% termination, Accurate, Even signal
- Polymerase
- Sequenase
- Thermostable (Cycle Sequencing)
- Terminators
- Dye labels (Big Dye)
- spectrally different, high fluorescence
- (mass labels??)
- ddA,C,G,T with primer labels
10. Separation
- Gel Electrophoresis
- Capillary Electrophoresis
- suited to automation
- rapid (2 hrs vs 12 hrs)
- re-usable
- simple temperature control
- 96 well format
migration ∝ 1/log N
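The migration rule explains why long fragments are hard to resolve. This is a rough sketch, assuming N is the fragment length in bases and that migration distance is exactly proportional to 1/log10(N). Both the base of the logarithm and the arbitrary units are assumptions made for illustration.

```python
import math


def relative_migration(n_bases):
    """Relative migration distance, assuming migration is proportional to 1/log10(N)."""
    return 1.0 / math.log10(n_bases)


def band_spacing(n_bases):
    """Distance between the bands for fragments of N and N+1 bases (arbitrary units)."""
    return relative_migration(n_bases) - relative_migration(n_bases + 1)


if __name__ == "__main__":
    for n in (50, 200, 500, 1000):
        print(n, band_spacing(n))
```

The spacing between neighbouring bands shrinks steadily as N grows, so single-base resolution is eventually lost and read length is capped.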
11. Paradigm Instrument
- Applied Biosystems
- ABI3700 (early 1999)
- 1500 samples/day!
- http://www2.perkin-elmer.com/ga/3700/features.html
- ABI377 (gel) and ABI310 (capillary)
12. Alternate Instruments
- Molecular Dynamics, Beckman Coulter
- ALF, LiCor
- infrared detection
Not Complete List
13. Sample Output
14. Trace Editing
- EditView
- Mac
- Chromas
- WinNT
- Consed
- UNIX
15. Project Goals
- de novo sequence
- Chain terminators
- repetitive sequencing
- Sequencing by hybridization
- Chip technology, e.g.
16. Sequencing Strategies
- Random Sequence
- Brute Force
- Ordered
- Divide and Conquer
Sequencing → Assembly → Finishing → Annotation
Mix to Suit
17. Random Method
- Shear DNA (nebulize)
- finish ends, ligate into vector
- Produce template
- Sequence to target coverage
- read length (500 bases typical)
- accuracy (99% good)
Assemble Contigs
18. Random
19. Poisson Statistics
L = read length, N = number of reads, G = genome size
$P_0 = e^{-LN/G}$
20. Poisson-2
Gap Length = $P_0 \times G$
21. Poisson-3
Gap Number = $P_0 \times N$ (assume 500-base reads)
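The three Poisson formulas can be evaluated together. This is a small sketch using the slide symbols L, N and G. The function name and dictionary keys are illustrative, not part of the original deck.

```python
import math


def poisson_gap_stats(read_length, n_reads, genome_size):
    """Evaluate P0 = exp(-L*N/G), gap length = P0*G and gap number = P0*N."""
    coverage = read_length * n_reads / genome_size  # L*N/G, the fold coverage
    p0 = math.exp(-coverage)                        # chance a given base is never read
    return {
        "coverage": coverage,
        "p0": p0,
        "gap_length": p0 * genome_size,
        "gap_number": p0 * n_reads,
    }


if __name__ == "__main__":
    # 500-base reads, 80,000 reads, 4 Mbp genome (10x coverage)
    print(poisson_gap_stats(500, 80_000, 4_000_000))
```

With these inputs the expected gap number comes out at roughly 3.6, in line with the figure of about 4 gaps quoted for a 4 Mbp genome.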
22. 4 Mbp Genome
- 10x Coverage
- 80,000 reads at 500 bases/read
- 4 gaps
- 400 bases in gaps
55 instrument days on ABI3700
23. 3000 Mbp Genome: HUMAN
50000 instrument days on ABI3700
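The instrument-day estimates follow from simple arithmetic. This is a back-of-envelope sketch, assuming 10x coverage, 500-base reads, and the 1500 samples/day quoted for the ABI3700. The helper names are illustrative.

```python
import math


def reads_needed(genome_size, coverage, read_length=500):
    """Number of reads required to reach the target fold coverage."""
    return math.ceil(genome_size * coverage / read_length)


def instrument_days(n_reads, samples_per_day=1500):
    """Days of run time on one instrument, ignoring failed runs and downtime."""
    return n_reads / samples_per_day


if __name__ == "__main__":
    for name, size in (("4 Mbp", 4_000_000), ("3000 Mbp", 3_000_000_000)):
        n = reads_needed(size, 10)
        print(name, n, instrument_days(n))
```

This gives about 53 days for the 4 Mbp genome and 40,000 days for the 3000 Mbp genome. The rounder slide figures of 55 and 50,000 presumably allow for failed reads and reruns, though that is an assumption here.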
24. Automation
25. Costs
- Raw cost $0.01/base
- Semi-finished $0.10 per base
- finished $0.30 per base
- High-quality Genome Project
- $0.50/base
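The per-base rates scale directly to project budgets. This is a minimal sketch that treats the figures above as currency-per-base rates. The genome sizes are the 4 Mbp and 3000 Mbp examples used elsewhere in the deck, and the function names are illustrative.

```python
def project_cost(genome_size_bases, cost_per_base):
    """Total cost of sequencing a genome at a flat per-base rate."""
    return genome_size_bases * cost_per_base


def cost_per_consensus_base(coverage, raw_cost_per_base=0.01):
    """Raw-read cost spread over one consensus base at the given fold coverage."""
    return coverage * raw_cost_per_base


if __name__ == "__main__":
    print(project_cost(4_000_000, 0.50))       # bacterial-size genome, high-quality rate
    print(project_cost(3_000_000_000, 0.50))   # human-size genome, high-quality rate
    print(cost_per_consensus_base(10))         # 10x raw reads per consensus base
```

Ten raw bases at 0.01 each come to about 0.10 per consensus base, which may be how the semi-finished figure arises.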
26. Ordered Methods
Primer Walking
Nested Deletion
27. Limitations
- Slow, Expensive
- Expertise Needed
- especially nested deletion
- Repeat Problems
- especially primer walking
28. Finishing
- GOALS
- >95% coverage on BOTH strands
- every base covered 3X
- resolve ambiguities
- Finish when random no longer productive (3-10X range)
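The finishing goals can be checked mechanically once reads are placed on a contig. This is a hedged sketch, assuming each read is given as a hypothetical (start, end, strand) tuple with 0-based, half-open coordinates. That layout is an illustration, not the format of any particular assembler.

```python
def coverage_report(length, reads, min_depth=3):
    """Summarize both-strand coverage and list positions below the depth target."""
    depth = [0] * length
    fwd = [False] * length
    rev = [False] * length
    for start, end, strand in reads:
        for i in range(max(start, 0), min(end, length)):
            depth[i] += 1
            if strand == "+":
                fwd[i] = True
            else:
                rev[i] = True
    both = sum(1 for i in range(length) if fwd[i] and rev[i])
    shallow = [i for i in range(length) if depth[i] < min_depth]
    return {"both_strand_fraction": both / length, "shallow_positions": shallow}


if __name__ == "__main__":
    reads = [(0, 10, "+"), (0, 10, "-"), (0, 8, "+")]
    print(coverage_report(10, reads))
```

Positions listed as shallow, or a both-strand fraction at or below 0.95, would be candidates for directed finishing reads.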
29. Finish-How
- Identify gaps, ambiguities
- Extend from end of contigs
- specific primers
- subclones, etc.
- Resolve ambiguities
- consensus or resequence
- specific primers, different chemistry
30. Assembly Methods
- Strip out vector
- Mask known repeats
- Trim off unreliable data
- Find Matches (500 x 500 x many!!)
- how long (and what ktuple)
- how perfect (reliability index)
- where to look? (ends only vs entire)
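The "Find Matches" step is usually made tractable with a k-tuple index rather than all-against-all alignment. This is a simplified sketch of that idea, not the method of phrap or any other named assembler. It compares reads on one strand only and ignores sequencing errors, and the function names and default k are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Set of all k-tuples present in a read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_overlaps(reads, k=12, min_shared=1):
    """Map (i, j) read-index pairs to the number of distinct k-tuples they share."""
    index = defaultdict(set)
    for idx, seq in enumerate(reads):
        for kmer in kmers(seq, k):
            index[kmer].add(idx)
    shared = defaultdict(int)
    for members in index.values():
        for a, b in combinations(sorted(members), 2):
            shared[(a, b)] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}


if __name__ == "__main__":
    reads = ["ACGTACGTTAGC", "GTTAGCCATGGA", "TTTTTTTTTTTT"]
    print(candidate_overlaps(reads, k=6))
```

Only the candidate pairs returned here would then go on to a full alignment. A longer k gives fewer spurious candidates but tolerates fewer read errors, which is the trade-off behind the "how long" question above.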
31. Assembly Programs
- PHRAP FAMILY
- phrap, kangaroo, phrapo,
- GAP4, TIGRAssembler,...
- GCG
- gelstart, gelenter, gelmerge, gelassemble, geldisassemble - thinly veiled vi editor
- SeqWeb.
32. Assembly Improvements: Repeat Problems
- Multiple fragment sizes in 1 project
- Use length/distance info
33. Project Management
- Editing and Assembly
- RepeatMasker
- Phred/Phrap
- Consed
- Databases
- ACeDB
- A C. elegans database
- Oracle
34. Annotation
- ORFs
- GRAIL, PowerBLAST
- Repeats
- Other Regions
Submit to GenBank ... HTGS (level 1, 2, 3) ... nr
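The ORF item can be illustrated with a naive scan. This is a toy sketch that looks for ATG-to-stop stretches in the three forward frames only. It is not how GRAIL or PowerBLAST work, and the minimum length threshold is an arbitrary assumption.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) half-open spans of ATG...stop ORFs on the forward strand."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATGA"))
```

A real annotation pass would also scan the reverse complement and apply a much longer length cutoff.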
35. Sequencing by Hybridization
Hybridize labeled query DNA
CHIP OLIGOS (20-mers)
...gaactAatact... ...gaactCatact... ...gaactGatact... ...gaactTatact...
site 1
...gaactaAtact... ...gaactaCtact... ...gaactaGtact... ...gaactaTtact...
site 2
GAACTATGTACT
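The four-probes-per-site layout can be mimicked in a few lines. This is an idealized sketch: hybridization is modelled as an exact substring match, and probes are written on the same strand as the query rather than as complements. The 5-base flank and the example sequences are assumptions chosen to echo the probes shown above.

```python
BASES = "ACGT"


def probes_for_site(reference, pos, flank=5):
    """Four probes spanning the site, one per possible base at pos (needs pos >= flank)."""
    left = reference[pos - flank:pos]
    right = reference[pos + 1:pos + flank + 1]
    return {b: left + b + right for b in BASES}


def call_base(query, reference, pos, flank=5):
    """Call the base whose probe matches the query; 'N' if none or several match."""
    probes = probes_for_site(reference, pos, flank)
    hits = [b for b, probe in probes.items() if probe in query]
    return hits[0] if len(hits) == 1 else "N"


if __name__ == "__main__":
    reference = "CCGAACTAATACTGG"
    query = "CCGAACTCATACTGG"  # differs from the reference at position 7
    print(probes_for_site(reference, 7))
    print(call_base(query, reference, 7))
```

In this model a second variant inside the flank makes all four probes fail, which hints at why closely spaced differences are hard for chip-based resequencing.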
36. Modern Sequencing Challenges
- Heterozygous DNAs
- germline differences
- somatic variation
- Massive sequencing
- population studies
- genome scans
- Minimal sample preparation
- Doctor's Office
Chips, Quantitative Seq Automation Miniaturization
37. Physical Mapping: Genome Characterization
- Genome fragmentation and cloning
- vectors, etc.
- Physical map assembly
- hybridization
- fingerprinting