Title: Sequencing by Ligation on Polony Beads
1 Sequencing by Ligation on Polony Beads
Molecular Genomic Imaging Center (CEGS)
Harvard / Wash U
George Church, Rob Mitra
Greg Porreca, Jay Shendure
Personal Genomics, Stem Cells, ELSI
with Nick Reppas, Kun Zhang, Shawn Douglas, Mike Wang, Abraham Rosenbaum, Agencourt
Synthetic Biology
2 Polymerase colony
- 2 vs. 1 immobilized primer
- in situ polonies vs. emulsion PCR beads
- single-molecule vs. multi-molecule detection
- dNTP extension (SBE) vs. ligation (SBL) (>3X, error 1e-6, 1/10 cost of ABI, E. coli)
- Single chromosomes: haplotyping (Zhang)
- Single cells: full sequence (Zhang, Martiny)
- Single RNA molecules: RNA splicing (Zhu, Varma)
Shendure, Porreca, Mitra, Church
3 Polony Sequencing Overview
- 1. In vitro construction of a complex, mate-paired library
- 2. Template amplification to one micron beads by emulsion PCR
- 3. Cyclic Array Sequencing by Ligation (SBL)
4 In vitro construction of a complex, mate-paired library
[Figure: library construct. Labels: 1 kb genomic fragment; MmeI; Fisseq-F, Fisseq-R; Left, Right; T30; Tag 1, Tag 2; Mid; Seq1, Seq2]
common sequences: 43 bp, 32 bp, 25 bp
paired genomic tags (17 to 18 bp each)
Total 134-136 bp amplicon
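As a quick consistency check on the lengths listed on this slide, the three common-sequence segments (43, 32, and 25 bp) plus two genomic tags of 17 to 18 bp each add up to the stated 134-136 bp. The sketch below assumes the amplicon consists of exactly those five pieces; the constant and function names are illustrative, not from the slides.

```python
# Assumption: the amplicon is the three common sequences plus two genomic tags.
COMMON_SEGMENTS_BP = (43, 32, 25)   # common-sequence lengths listed on the slide
TAG_LENGTH_RANGE_BP = (17, 18)      # each paired genomic tag is 17 to 18 bp
N_TAGS = 2                          # one tag from each end of the fragment


def amplicon_length_range(common=COMMON_SEGMENTS_BP,
                          tag_range=TAG_LENGTH_RANGE_BP,
                          n_tags=N_TAGS):
    """Return (min, max) total amplicon length in bp."""
    fixed = sum(common)
    shortest = fixed + n_tags * tag_range[0]
    longest = fixed + n_tags * tag_range[1]
    return shortest, longest


print(amplicon_length_range())  # (134, 136)
```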
5 Template Amplification
- Emulsion PCR to 1 micron beads
- Dressman et al. PNAS '03
6 Enrichment by Hybridization
7 One of 750 megapixel frames of gel-immobilized 1.0 micron beads, 0.3 micron pixels, 4 colors
8 Sequencing by Ligation (SBL) with fluorescent combinatorial 9-mers
Probe                   Excitation (nm)  Emission (nm)
5'-Cy5-nnnnAnnnn-3'     647              700
5'-Cy3-nnnnGnnnn-3'     555              605
5'-TR-nnnnCnnnn-3'      572              630
5'-Cy3Cy5-nnnnTnnnn-3'  555              700
5'PO4
ACUCAUC (3')TAGAGT???
?????????????TGAGTAG(5')
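The dye-to-base assignment of the four 9-mer pools can be written as a lookup table. The following is a minimal sketch, not the authors' base-calling software: it assumes one dye label is observed per bead per ligation cycle, and that the template base is the complement of the probe's central base. The dictionary keys, function names, and example input are illustrative.

```python
# Dye label -> base at the central (query) position of the ligated 9-mer,
# as listed on the slide: Cy5=A, Cy3=G, TR=C, Cy3Cy5=T.
DYE_TO_PROBE_BASE = {
    "Cy5": "A",
    "Cy3": "G",
    "TR": "C",
    "Cy3Cy5": "T",
}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def probe_base(dye):
    """Base carried by the probe pool labeled with this dye."""
    return DYE_TO_PROBE_BASE[dye]


def template_base(dye):
    """Assumption: the probe base pairs with the template, so the
    template base is the complement of the probe base."""
    return COMPLEMENT[probe_base(dye)]


def decode_cycles(dyes):
    """Turn a per-cycle list of observed dyes into probe-strand bases."""
    return "".join(probe_base(d) for d in dyes)


# Hypothetical observation for one bead over four cycles.
print(decode_cycles(["Cy5", "TR", "Cy3Cy5", "Cy3"]))  # ACTG
```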
9 Why low error rates?
Goal of Resequencing → Discovery of Uncommon Variation
Consensus Accuracy   False Positives (E. coli)  False Positives (Human)
1E-3                 4,000                      3,000,000
1E-4 (BERMUDA/ABI)   400                        300,000
1E-6 (Polony-SBL)    4                          3,000
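The false-positive counts on this slide are consistent with multiplying the consensus error rate by the genome size. A short sketch of that arithmetic, assuming genome sizes of about 4e6 bp for E. coli and 3e9 bp for human (the sizes implied by the slide's numbers); the names are illustrative.

```python
# Approximate genome sizes implied by the slide's figures (assumption).
GENOME_SIZE_BP = {"E. coli": 4e6, "Human": 3e9}


def expected_false_positives(error_rate, genome_size_bp):
    """Expected number of wrong consensus calls across a genome."""
    return error_rate * genome_size_bp


for rate in (1e-3, 1e-4, 1e-6):
    counts = {name: round(expected_false_positives(rate, size))
              for name, size in GENOME_SIZE_BP.items()}
    print(rate, counts)
```

At 1E-6 the expected false positives for E. coli drop to single digits, which is the point of the slide: rare true variants are only discoverable when the error count is small relative to them.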
10 Genome engineering: Select for cross-feeding
[Figure panels: First Passage, Second Passage]
Δtrp/ΔtyrA pair of genomes shows the best co-growth (syntrophs)
Reppas, Lin
11 Co-evolution of cross-feeding: Trp- Tyr- genome pair
12 860,000 independent mate-pairing events
1 kb genomic fragment
980 ± 96 bp
13 Aberrations in mate-pair distance indicative of rearrangements
1,974,001 (MG1655)
1,978,000 (MG1655)
confirmed 776 bp deletion via tandem 8 bp repeats
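One way to picture how an aberrant mate-pair distance flags a rearrangement: compare each pair's mapped distance on the reference with the expected insert-size distribution. This is a hedged sketch, not the authors' pipeline. It uses the 980 and 96 bp figures quoted elsewhere in the deck (interpreted here as mean and standard deviation), a 3-standard-deviation cutoff chosen for illustration, and made-up coordinates.

```python
from dataclasses import dataclass

MEAN_BP = 980.0   # mate-pair distance quoted in the deck
SD_BP = 96.0      # spread quoted in the deck, assumed to be a standard deviation


@dataclass
class MatePair:
    left_pos: int    # reference coordinate of the first tag
    right_pos: int   # reference coordinate of the second tag

    @property
    def distance(self):
        return abs(self.right_pos - self.left_pos)


def is_aberrant(pair, mean=MEAN_BP, sd=SD_BP, n_sd=3.0):
    """Flag pairs whose mapped distance is more than n_sd deviations from the mean.
    The 3-SD default is an illustrative choice, not a value from the slides."""
    return abs(pair.distance - mean) > n_sd * sd


# Hypothetical pairs: one typical, one spanning a deletion in the sequenced
# strain, so its tags map farther apart on the reference than expected.
pairs = [
    MatePair(1_974_100, 1_975_080),   # distance 980
    MatePair(1_975_900, 1_977_660),   # distance 1760
]
flagged = [p for p in pairs if is_aberrant(p)]
print([p.distance for p in flagged])  # [1760]
```

In practice a single outlier pair would not be trusted; several independent pairs agreeing on the same shifted distance would be needed before calling a deletion.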
14 Base-calling Tetrahedron
[Figure: tetrahedron with vertices C, A, T, G]
Fluorescent SBL data quality measured by distance
to the 4 vertices.
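The tetrahedron idea can be sketched as follows, under assumptions that go beyond the slide: each bead's four channel intensities are normalized to sum to 1, which places the point inside a regular tetrahedron whose vertices are the four pure-color signals; the call is the nearest vertex and the distance to it serves as a quality measure (smaller is better). The channel order and function names are illustrative.

```python
import math

BASES = ("A", "C", "G", "T")  # channel order is an assumption


def normalize(intensities):
    """Scale four non-negative channel intensities so they sum to 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("need a positive total intensity")
    return [x / total for x in intensities]


def call_base(intensities):
    """Return (base, distance) for the nearest tetrahedron vertex."""
    point = normalize(intensities)
    best_base, best_dist = None, math.inf
    for i, base in enumerate(BASES):
        # Vertex i is the pure signal in channel i.
        vertex = [1.0 if j == i else 0.0 for j in range(len(BASES))]
        d = math.dist(point, vertex)
        if d < best_dist:
            best_base, best_dist = base, d
    return best_base, best_dist


base, dist = call_base([900, 50, 30, 20])  # hypothetical intensities
print(base, round(dist, 3))
```

A bead with mixed signal, for example equal intensity in two channels, sits midway along a tetrahedron edge and gets a large distance, so it can be filtered out as low quality.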
15 Raw Error Rate
[Plot labels: Q40, Q30, Q20]
Mean accuracy 99.5%. Best 50% of base-calls are 99.9% accurate.
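Assuming the Q20/Q30/Q40 labels are Phred-scale quality values (Q = -10 log10 of the error probability), the accuracies on this slide convert as in the sketch below. The function names are illustrative.

```python
import math


def phred_to_error(q):
    """Error probability for a Phred-scale quality value."""
    return 10 ** (-q / 10)


def accuracy_to_phred(accuracy):
    """Phred-scale quality for a per-base accuracy between 0 and 1."""
    return -10 * math.log10(1 - accuracy)


for q in (20, 30, 40):
    print(f"Q{q}: error probability {phred_to_error(q):g}")

print(round(accuracy_to_phred(0.995), 1))  # mean accuracy from the slide
print(round(accuracy_to_phred(0.999), 1))  # best half of base-calls
```

On this scale the 99.5% mean accuracy corresponds to roughly Q23, and the 99.9% accuracy of the best half of base-calls to about Q30.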
16 Consensus error rates
17 Mutation Discovery in Engineered, Evolved Trp- Strain
Position   Type        Gene    Location    ABI Confirmation  Comments
986,334    T > G       ompF    TATA box    ✓                 Only in evolved strain
931,960    8 bp del    lrp     frameshift  ✓                 Only in evolved strain
1,976,500  776 bp del  insB_5  IS element  ✓                 MG1655 heterogeneity
3,957,960  C > T       ppiC    5' UTR      ✓                 MG1655 heterogeneity
4,654,533  T > C       cI      Glu > Glu   ✓                 λ heterogeneity
4,647,960  T > C       ORF61   Lys > Gly   ✓                 λ heterogeneity
985,797    T > G       ompF    Glu > Ala   (in progress)
454,864    T > C       tig     Gly > Gly   (in progress)
4,648,691  G > A       exo     Phe > Phe   (in progress)
18 Cost comparison projection
                 ABI    2004      Jun 2005   2006   >2007
bp/expt          -      2e7       3e7        3e8    60e9
Complexity (bp)  -      74        4e6        3e9    6e9
Avg Fold Cov     8      3e5       6          0.1    10
Pix per bp       -      300       1724       333    1
Read-length      900    14 (SBE)  25 (pair)  35     42
$ / Q20 kb       8e-1   -         8e-2       4e-2   1e-5
$ / 1X 3e9 b     2e6    -         2e5        5e4    1e2 (2e3)
Indel Error      5e-3   0.6       1e-3       1e-3   1e-3
Subst Error      4e-3   4e-6      1e-3       1e-3   1e-3
3X Cons Err      1e-4   -         1e-6       3e-7   1e-7
Kb / min         0.8    360       27         1e3    1e6
Pix / sec        -      2e5       2e6        6e6    2e7
Enz $/mg         -      8         8          8      0.4
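Reading the table column by column, the Avg Fold Cov row appears to be bp/expt divided by Complexity (bp), to within rounding. The sketch below checks that relationship; the column assignment of the values is an interpretation of the flattened slide text, and the names are illustrative.

```python
# column label: (bp per experiment, complexity in bp, fold coverage as stated)
# Values are read from the slide; the column assignment is an interpretation.
COLUMNS = {
    "2004": (2e7, 74, 3e5),
    "Jun 2005": (3e7, 4e6, 6),
    "2006": (3e8, 3e9, 0.1),
    ">2007": (60e9, 6e9, 10),
}


def fold_coverage(bp_per_expt, complexity_bp):
    """Average fold coverage: bases sequenced per base of target."""
    return bp_per_expt / complexity_bp


for label, (bp, complexity, stated) in COLUMNS.items():
    computed = fold_coverage(bp, complexity)
    print(f"{label}: computed {computed:.3g}, stated {stated:g}")
```

The computed values (about 2.7e5, 7.5, 0.1, and 10) agree with the stated ones to within the one-significant-figure rounding used on the slide.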
19 Challenges in $2000 genome
                 >2007      Challenge
bp/expt          60e9       20X of 3e9, 10X diploid
Complexity (bp)  6e9        Automated 96-well libraries
Avg Fold Cov     10         (Currently align .4 pix .1 micron)
Pix per bp       1          Sensitivity align CCD slide?
Read-length      42         Is 34 enough? (next slide)
$ / Q20 kb       1e-5       (20X 3e9)
$ / 1X 3e9 b     1e2 (2e3)  Need haplotyping too? (slide after next)
Indel Error      1e-3
Subst Error      1e-3
3X Cons Err      1e-7
Kb / min         1e6
Pix / sec        2e7        Current camera is 3e7, but stage is 2e6
Enz $/mg         0.4        Realized for many recombinant proteins
20 Human Resequencing with Mate-Paired 17 bp Tags (simulation)
Assume paired 17-mers (i.e. read full tag length) with 750-1150 bp distance distribution (980 ± 96 bp observed)

Exact Matching (34/34)                         Zero  Unique  Multiple
Paired, no substitutions                       ----  94.4    5.6
Paired, one substitution                       98.3  0.5     1.3
Unpaired, no substitutions                     98.8  0.3     0.9

Single Substitution or Exact (33/34 or 34/34)  Zero  Unique  Multiple
Paired, no substitutions                       ----  90.4    9.7
Paired, one substitution                       ----  92.8    7.2
Unpaired, no substitutions                     96.0  1.5     2.5
21 Single chromosome molecule haplotypes
GM10835
rs3778973 rs1557917 rs39284 rs10500042 rs4717028
C G C G C
T A T A T
153Mb
TT137 CT2 (TC1)
CC131
22 Amplifying and sequencing whole genomes from single cells
Escherichia, Prochlorococcus
Zhang, Martiny, Chisholm, Church, unpub.
No template control
φ29 real-time amplification
Affymetrix quantitation of 2 independent amplifications
23 Polymerase colony
- 2 vs. 1 immobilized primer
- in situ polonies vs. emulsion PCR beads
- single-molecule vs. multi-molecule detection
- dNTP extension (SBE) vs. ligation (SBL) (>3X, error 1e-6, 1/10 cost of ABI, E. coli)
- Single chromosomes: haplotyping (Zhang)
- Single cells: full sequence (Zhang, Martiny)
- Single RNA molecules: RNA splicing (Zhu, Varma)
Shendure, Porreca, Mitra, Church
24 Roundtable I
- Shared Resources, STTR
- Polymerase libraries: NEB, MJR, ABI, Fuller
- CCDs: spectra, cost, pixels, sensitivity, speed
- software
- Cancer Genome: 12500 NCAB, clonal? enrichment, MRD, accuracy, read length
- Cost estimates: distribute template spreadsheet