Sequencing shRNA libraries with DNA Sudoku

Slides: 26
Provided by: wat897

Transcript and Presenter's Notes


1
Sequencing shRNA libraries using DNA Sudoku
  • Yaniv Erlich
  • Hannon Lab

2
Preparing DNA libraries
Introduction Naïve Solutions Chinese Pooling Analysis Results
Programmable microarray
Cloning into plasmids
Transformation
Array single colonies
3
The problem
Input: 40,000 bacterial colonies
Output: the sequence of the shRNA inserts
Insert type
4
Motivation
  • Filtering the correct fragments
  • Balanced representation
  • Subset selection.

5
Clone-by-clone sequencing
Clone-by-clone sequencing: sequence each clone on a capillary platform.
Caveat: cost 40,000
Conclusion: use next-generation sequencing.
6
Naïve next-gen
Solexa
Pooling
??
Conclusion: we need to add a source-clone identifier (barcode).
7
Naive barcoding
Solexa
Pooling
Barcoding
  • Caveats
  • Order 40,000 barcodes, each 95 nt long.
  • 40,000 PCR reactions.

Barcode Sequence
214 AGTGC..
8106 CTCAA..
30010 TTTCG..
88 TTGAA..
Conclusion: we need fewer barcodes.
8
Naive Pooling(1)
[Figure: source array with rows A to F and columns 1 to 8; each row and each column is pooled under its own barcode]
Case 1
Which specimen appears in both barcode 5 and B?
Genotype Barcode
ACACA 5
ACACA B
Specimen 13!
erlich_at_cshl.edu
9
Naive Pooling(2)
Case 2
Genotype Barcode
ACGTT 1
ACGTT D
ACGTT E
ACGTT 2
Is ACGTT associated with specimens 25 (D,1) and 34 (E,2)?
Or maybe with specimens 26 (D,2) and 33 (E,1)?
Ambiguity!
Conclusion: we should deal with shRNA duplicates.
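The two cases can be reproduced with a small sketch of row/column decoding. The row-major, 1-based specimen numbering is an assumption inferred from the slides (specimen 13 sits at B,5 and specimen 25 at D,1); the function names are illustrative only.

```python
ROWS = "ABCDEF"   # row pools of the 6 x 8 source array
N_COLS = 8        # column pools 1..8

def specimen_number(row, col):
    """Row-major, 1-based numbering (assumed from the slide examples)."""
    return ROWS.index(row) * N_COLS + col

def candidates(row_pools, col_pools):
    """Every specimen consistent with the pools a sequence was seen in."""
    return sorted(specimen_number(r, c) for r in row_pools for c in col_pools)

# Case 1: ACACA seen in row pool B and column pool 5.
print(candidates("B", [5]))       # [13]

# Case 2: ACGTT seen in row pools D, E and column pools 1, 2.
print(candidates("DE", [1, 2]))   # [25, 26, 33, 34]
```

With a duplicated insert, the row and column pools alone cannot tell the pair (25, 34) from the pair (26, 33).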
10
Lessons learned for the desired scheme
Features of the required encoding scheme:
Compactness: using a small set of barcodes.
Dealing with duplicates: every specimen should be resolved without ambiguity.
Experimental overhead: while reducing the number of barcodes, we should also pay attention to the resources allocated to the pooling itself.
Simple: this is not a computer program. Encoding is done by a robot and chemistry, so keep it simple.
11
Overview of our solution
Chinese Pooling
Barcoding
Paired-end (PE) sequencing
Decoding
12
The pooling design
  • Combinatorial pooling using the Chinese Remainder Theorem (CRT).

"I have never done anything 'useful'. No
discovery of mine has made, or is likely to make,
directly or indirectly, for good or ill, the
least difference to the amenity of the world." (G. H.
Hardy, A Mathematician's Apology, 1940)
13
Chinese remainder riddle
An old woman goes to market, and a horse steps on
her basket and crushes the eggs. The rider offers
to pay for the damage and asks her how many eggs
she had brought. She does not remember the exact
number, but when she had taken them out 3 at a
time, there was one egg left. The same happened
when she took them out 2 and 5 at a time, but
when she took them 7 at a time they came out
even. What is the smallest number of eggs she
could have had?
  • The Chinese Remainder Theorem says:
  • There is a one-to-one correspondence between n
    (0 ≤ n < 2·3·5·7) and the residues.
  • There is an easy algorithm to solve the equation
    system.

Answer: 91 eggs
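One way to see the "easy algorithm" at work is the sketch below, which solves the system one congruence at a time. It uses the moduli 2, 3, 5 and 7 from the range 0 ≤ n < 2·3·5·7; the function name `crt` is illustrative, and `pow(M, -1, m)` needs Python 3.8 or later.

```python
def crt(residues, moduli):
    """Smallest n >= 0 with n % m == r for each pair (r, m).

    Assumes the moduli are pairwise coprime.
    """
    n, M = 0, 1  # n solves the congruences handled so far, modulo M
    for r, m in zip(residues, moduli):
        # Choose t so that n + M*t leaves residue r modulo m.
        t = ((r - n) * pow(M, -1, m)) % m
        n += M * t
        M *= m
    return n

# One egg left over when counting by 2, 3 and 5; none when counting by 7.
print(crt([1, 1, 1, 0], [2, 3, 5, 7]))  # 91
```

Because the correspondence is one-to-one below 2·3·5·7 = 210, 91 is the only solution in that range.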
14
Pooling construction with modular equations
Destination well (different plates)
Specimen
Pooling window
One-to-One correspondence
15
Example of Chinese pooling
Source array
03/06/09
16
Chinese Remainder Pooling Design
  • Inputs: N (number of specimens in the
    experiment)
  • Weight W (pooling effort)
  • Algorithm:
  • 1. Find W numbers x1, x2, ..., xW such that they are
  • Bigger than
  • Pairwise coprime
  • For instance 5, 8, 9 but not 5, 6, 9 (6 and 9 share the factor 3)
  • 2. Generate W modular equations
  • 3. Construct the pooling design from the modular
    equations
  • Output: pooling design

The Chinese Remainder Theorem asserts: (1) Two
specimens will meet in no more than one
pool. (2) The number of pools
Number of barcodes (bc)
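The three algorithm steps can be sketched as follows. Two points are assumptions, because the slide formulas did not survive in this transcript: the lower bound on the windows is taken to be the square root of N (which makes any two windows multiply to at least N), and specimen n is sent to pool n mod x_i of window i. The greedy search and all function names are illustrative, not the authors' procedure.

```python
from itertools import combinations
from math import gcd, isqrt

def choose_windows(n_specimens, weight):
    """Step 1: pick `weight` pairwise-coprime integers above sqrt(N) (assumed bound)."""
    windows = []
    candidate = isqrt(n_specimens) + 1  # smallest integer above sqrt(N)
    while len(windows) < weight:
        if all(gcd(candidate, x) == 1 for x in windows):
            windows.append(candidate)
        candidate += 1
    return windows

def pooling_design(n_specimens, windows):
    """Steps 2 and 3: specimen n goes to pool (i, n mod x_i) for each window i."""
    return {n: [(i, n % x) for i, x in enumerate(windows)]
            for n in range(n_specimens)}

def max_shared_pools(design):
    """Largest number of pools that any two specimens have in common."""
    return max(len(set(a) & set(b))
               for a, b in combinations(design.values(), 2))

windows = choose_windows(100, 3)
design = pooling_design(100, windows)
print(windows)                   # [11, 12, 13]
print(sum(windows))              # 36 pools in total
print(max_shared_pools(design))  # 1
```

In this toy case, 100 specimens are covered by 36 pools, and no two specimens share more than one pool.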
17
How good is our method?
18
Barcode reduction
IEEE Transactions on Information Theory (1964)
Proved, from purely combinatorial constraints, that the
theoretical lower bound on the number of barcodes
is
Our method is very close to the theoretical lower
bound
19
How good is our method?
20
Dealing with duplicates - simulation
[Plot: probability of correct decoding (y-axis, 0.99 marked) versus duplicate size (x-axis)]
40,000 specimens with only 384 barcodes
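A toy version of such a simulation is sketched below. It is not the authors' decoder or their parameters: it assumes 100 specimens, the illustrative windows 11, 12 and 13, the pool rule n mod x_i, and a brute-force decoder that reports every specimen whose pools were all observed.

```python
import random

def pools_of(specimen, windows):
    """Pools that hold a specimen: one (window index, residue) per window."""
    return {(i, specimen % x) for i, x in enumerate(windows)}

def decode(observed, n_specimens, windows):
    """Every specimen whose pools are all among the observed pools."""
    return {s for s in range(n_specimens)
            if pools_of(s, windows) <= observed}

def success_rate(n_specimens, windows, dup_size, trials=200, seed=0):
    """Fraction of trials in which a duplicated insert is decoded exactly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        # The same insert sits in `dup_size` randomly chosen specimens.
        truth = set(rng.sample(range(n_specimens), dup_size))
        observed = set().union(*(pools_of(s, windows) for s in truth))
        correct += decode(observed, n_specimens, windows) == truth
    return correct / trials

for dup_size in range(1, 7):
    print(dup_size, success_rate(100, [11, 12, 13], dup_size))
```

With W windows, a specimen outside the true set can match each true specimen in at most one pool, so duplicate sizes up to W - 1 always decode correctly in this sketch; larger sizes can start to produce false candidates.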
21
How good is our method?
  • W = 5
  • 5 lanes of Solexa
  • One week and a half of robotics

22
How good is our method?
23
Real results
  • Arabidopsis shRNA library with 17,000 shRNA
    fragments
  • Picked 40,320 bacterial colonies
  • Sequenced 3,000 colonies with capillary sequencing
    for comparison.
  • Decoded 20,500 bacterial colonies with correct
    inserts
  • 96% of the assignments were correct.
  • 8,000 unique fragments of the library.

24
Future directions
  • Developing a more advanced decoder using a
    machine-learning approach
  • 2-stage algorithm

25
Acknowledgements
Greg Hannon
Oron Navon and Roy Ronen
Ken Chang
Michelle Rooks
Assaf Gordon
DNA Sudoku