Microarray Synthesis through Multiple-Use PCR Primer Design

1
Microarray Synthesis through Multiple-Use PCR
Primer Design
  • Research Proficiency Examination
  • Rohan Fernandes

2
Biology Background: PCR
  • PCR animation (From the Dolan DNA Learning
    Center, CSHL)
  • Applications of PCR include
  • Genetic Fingerprinting.
  • Medical Diagnostics.
  • DNA Sequencing.

3
What are Microarrays?
4
What are Microarrays?
  • A grid with different DNA probes in each
    location.
  • Allows one to test a given sample for expression
    of multiple genes.
  • Can compare gene expression by using different
    colored fluorescent markers in two samples.

5
Genomic Data
  • Sequences are known for more than 800 organisms!
  • 100 free-living species have been sequenced
    already.
  • But we know very little about the biology of most
    of these organisms.
  • Exploiting full-genome sequence data requires
    investigators to have inexpensive custom
    microarrays.

6
Why Microarrays?
  • Microarray technology has revolutionized our
    understanding of gene expression.
  • Applications include
  • Cell cycle analysis.
  • Response of cells to environmental stress.
  • Impact of gene knockouts.

7
A Primer Design True Story!!
  • Project for Futcher and Leatherwood to design PCR
    primers for microarray synthesis.
  • Strict criteria for primer length, melting
    temperature, and self-similarity were specified.
  • Designed primers for 5827 and 5012 genes of
    S. cerevisiae and S. pombe, respectively.
  • PCR with a sample set of primers designed for
    96 genes each of S. pombe and S. cerevisiae was
    100% successful.

8
The 110,000 Dollar Problem
  • Good primer design can be crucial in synthesizing
    microarray DNA.
  • $110,000 out of a total budget of $220,000 for
    microarray synthesis was spent on PCR primers
    alone.
  • We propose an alternative method of PCR primer
    design to reduce costs.

9
Efficiency of PCR
  • Usually, PCR primers are designed to occur
    uniquely in the genome.
  • However, efficiency of PCR falls exponentially as
    length of product increases.
  • PCR becomes ineffective for product sizes beyond
    1200 bases.

10
Exploiting PCR Efficiency Drop-off
  • Amplification is significant only if primers
    hybridize near each other.
  • We can reuse primers to amplify several genes,
    provided each primer pair is unique.
  • We can save thousands of primers through reuse!

11
Who can benefit?
  • The total cost of PCR primers may dissuade
    investigators of less studied organisms from
    using microarrays.
  • Our technique can reduce costs enough to make
    microarrays more attractive to less funded
    researchers.

12
What is the potential win?
  • Let (n,m) be the (number of genes, minimum number
    of primers required to amplify them).
  • m primers can result in m(m-1)/2 unique primer
    pairs.
  • About √(2n) primers may be sufficient instead of 2n.
  • Conventional primer design requires 12,000
    primers for 6,000 genes, but about 110 might
    suffice.
  • In practice this lower bound will be unreachable,
    but there will still be a large win.
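
The pair-counting bound on this slide can be checked with a few lines of Python. This is a minimal sketch, assuming every gene needs its own distinct primer pair and ignoring all biological constraints; the function name `min_primers_for_genes` is hypothetical and not from the presentation.

```python
import math

def min_primers_for_genes(n_genes: int) -> int:
    """Smallest m such that m*(m-1)/2 >= n_genes (one distinct pair per gene)."""
    # floor(sqrt(2n)) is never above the answer, so it is a safe starting point.
    m = max(2, math.isqrt(2 * n_genes))
    while m * (m - 1) // 2 < n_genes:
        m += 1
    return m

print(min_primers_for_genes(6_000))   # 111
print(min_primers_for_genes(20_000))  # 201
```

For 6,000 genes the exact count is 111, which agrees with the rounded figure of about 110 obtained from √(2n).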

13
Potential Win? (Example)
  • Consider the cost of building a spotted
    microarray for a 20,000 gene organism.
  • Conventional techniques will require us to use
    40,000 primers.
  • Cost: $160,000 at $4 a primer.
  • If 3,000 primers suffice, the cost is only $12,000.
  • The best case is overoptimistic, but realistic
    wins are still impressive.
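
The arithmetic behind this example is simple enough to restate as code. The sketch below only reproduces the figures on the slide (20,000 genes, 4 dollars per primer, 3,000 reused primers); the helper name `primer_cost` is an assumption made for illustration.

```python
def primer_cost(num_primers: int, price_per_primer: float = 4.0) -> float:
    """Total cost in dollars of ordering num_primers primers."""
    return num_primers * price_per_primer

conventional = primer_cost(2 * 20_000)  # two unique primers per gene
reused = primer_cost(3_000)             # hypothetical multiple-use primer set
difference = conventional - reused

print(conventional, reused, difference)  # 160000.0 12000.0 148000.0
```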

14
Cost of Split Addressing
  • What is the probability that two random strings
    will occur in a long random string in a certain
    order and with no more than a certain gap?

15
Split Addressing (Contd)
16
Split Addressing Conclusion
  • Total length of primers required to ensure
    uniqueness of hybridization increases only very
    slowly with the length of the genome.
  • The penalty for genome-scale lengths and
    realistic PCR gap lengths amounts to only an
    additional 3-4 bases of primer over ungapped
    matching.
  • These results support the potential of
    multiple-use primers.

17
Minimum Primer Set Problem
18
Budgeted Primer Set Problem
19
Hardness of problems
  • The Minimum Primer Set problem is NP-hard and
    hard to approximate to within a logarithmic
    factor.
  • The Budgeted Primer Set problem is NP-hard and
    seems to be related to the densest k-subgraph
    problem.
  • Approximation bounds for the densest k-subgraph
    problem are not encouraging.

20
Reduction Gadget
21
Reduction from Set Cover to Minimum Primer Set
  • (S, X) is a set cover instance.
  • S → U, X → W. Connect a vertex in U to a vertex in W
    iff the corresponding set in S contains the
    corresponding element of X.
  • Label (color) each edge by the name of the
    element vertex at its end.
  • An MPS solution will include all element vertices
    and the minimum number of set vertices that cover
    all elements. Q.E.D.
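
The reduction can be tried on a toy instance. The sketch below assumes that a Minimum Primer Set solution is a smallest vertex set that keeps, for every color, at least one edge with both endpoints selected; that reading is inferred from the slides, since the formal definition slide did not survive in the transcript. The names `set_cover_to_mps` and `brute_force_mps` are hypothetical, and the brute-force search is only meant for tiny inputs.

```python
from itertools import combinations

def set_cover_to_mps(sets):
    """Build colored edges (set_vertex, element_vertex, color) from a set cover instance."""
    return [(("set", name), ("elem", x), x)
            for name, members in sets.items() for x in members]

def brute_force_mps(edges):
    """Smallest vertex set keeping one fully selected edge of every color."""
    vertices = sorted({v for u, w, _ in edges for v in (u, w)})
    colors = {c for _, _, c in edges}
    for size in range(len(vertices) + 1):
        for chosen in combinations(vertices, size):
            kept = set(chosen)
            covered = {c for u, w, c in edges if u in kept and w in kept}
            if covered == colors:
                return kept
    return None

# Toy instance: the only cover of size 2 is {A, C}.
sets = {"A": {1, 2}, "B": {2, 3}, "C": {3, 4}}
solution = brute_force_mps(set_cover_to_mps(sets))
chosen_sets = {name for kind, name in solution if kind == "set"}
print(len(solution), sorted(chosen_sets))  # 6 ['A', 'C']
```

All four element vertices are forced into the solution, so its size is the number of elements plus the size of a minimum set cover.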

22
A Heuristic to approximate MPS
  • Based on a greedy heuristic for finding the densest
    subgraph.
  • Each edge is weighted with the value of (1/number
    of edges bearing that color).
  • Vertex weight is set to the sum of adjoining edge
    weights.
  • The algorithm proceeds by removing the
    minimum-weight vertex whose removal does not
    eliminate any color.
  • The algorithm terminates when no more vertices can
    be eliminated.
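
The steps on this slide can be sketched as follows. This is an illustrative reading rather than the author's implementation: it recomputes all weights after every removal, breaks ties by vertex name, and makes no attempt to match the stated running time. The function name `greedy_mps` and the `(u, v, color)` edge format are assumptions.

```python
from collections import Counter

def greedy_mps(edges):
    """edges: iterable of (u, v, color). Returns the set of vertices that are kept."""
    edges = list(edges)
    vertices = {x for u, v, _ in edges for x in (u, v)}
    while True:
        color_count = Counter(c for _, _, c in edges)
        # Edge weight is 1 / (number of edges of that color);
        # vertex weight is the sum of the weights of its incident edges.
        weight = {x: 0.0 for x in vertices}
        for u, v, c in edges:
            weight[u] += 1.0 / color_count[c]
            weight[v] += 1.0 / color_count[c]
        # A vertex is removable if every color it touches survives elsewhere.
        removable = []
        for x in vertices:
            lost = Counter(c for u, v, c in edges if x in (u, v))
            if all(lost[c] < color_count[c] for c in lost):
                removable.append(x)
        if not removable:
            return vertices
        victim = min(removable, key=lambda x: (weight[x], str(x)))
        vertices.remove(victim)
        edges = [(u, v, c) for u, v, c in edges if victim not in (u, v)]

example = [("a", "b", "red"), ("b", "c", "red"), ("c", "d", "blue")]
print(sorted(greedy_mps(example)))  # ['b', 'c', 'd']
```

In the example only vertex "a" can be dropped: removing any other vertex would eliminate the last red or blue edge.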

23
Example Run of Algorithm (1)
  • Initially graph with vertex weights.

Color Edges Weight
Blue 2 1/2
Green 1 1/1
Red 3 1/3
24
Example Run of Algorithm (2)
  • After removing minimum weighted vertex.

Color Edges Weight
Blue 1 1/1
Green 1 1/1
Red 3 1/3
25
Example Run of Algorithm (3)
  • Final graph.

Color Edges Weight
Blue 1 1/1
Green 1 1/1
Red 1 1/1
26
Performance of Heuristic
  • O(V·(V+E+C)) time and O(V+E+C)
    space.
  • This heuristic is too slow. It is quadratic in
    V, hence very slow on large data sets.
  • For our largest dataset this heuristic produced a
    solution in two days, as opposed to 25 minutes for
    the next heuristic.

27
A Linear-time Heuristic
  • We select an edge of each color that has maximum
    colored adjacency to form our seed graph.
  • We switch an edge for a color if that saves us
    any vertices in the seed graph.
  • If there are no savings but no additional
    vertices, we switch edges with probability p = 1/2.
  • Repeat the above steps until the number of
    vertices is constant.
  • Eliminate all colors whose edges are not
    isolated.
  • Repeat the above steps for the remaining graph
    until the number of vertices is constant. Merge
    the graphs obtained.
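
A simplified sketch of the seed-and-switch part of this procedure is given below. Several details are assumptions, because the slide does not spell them out: "colored adjacency" of an edge is taken to be the number of distinct colors touching its endpoints, the retry phase with isolated colored edges is omitted, and the code recounts vertices naively, so it does not reproduce the linear running time. The names `local_search_mps` and `edges_by_color` are hypothetical.

```python
import random

def local_search_mps(edges_by_color, seed=0, max_rounds=100):
    """edges_by_color: {color: [(u, v), ...]}. Returns {color: chosen edge}."""
    rng = random.Random(seed)
    # Record which colors touch each vertex (used for the assumed seed rule).
    colors_at = {}
    for color, edges in edges_by_color.items():
        for u, v in edges:
            colors_at.setdefault(u, set()).add(color)
            colors_at.setdefault(v, set()).add(color)

    def adjacency(edge):
        u, v = edge
        return len(colors_at[u] | colors_at[v])

    def vertex_count(choice):
        return len({x for edge in choice.values() for x in edge})

    # Seed graph: one edge per color with maximum colored adjacency.
    chosen = {c: max(es, key=adjacency) for c, es in edges_by_color.items()}

    for _ in range(max_rounds):
        before = vertex_count(chosen)
        for color, edges in edges_by_color.items():
            for candidate in edges:
                if candidate == chosen[color]:
                    continue
                trial = dict(chosen)
                trial[color] = candidate
                diff = vertex_count(trial) - vertex_count(chosen)
                # Switch if it saves vertices; on a tie, switch with p = 1/2.
                if diff < 0 or (diff == 0 and rng.random() < 0.5):
                    chosen = trial
        if vertex_count(chosen) == before:
            break
    return chosen

graph = {"red": [("a", "b"), ("b", "c")], "blue": [("c", "d")], "green": [("b", "d")]}
result = local_search_mps(graph)
print(sorted({x for edge in result.values() for x in edge}))  # ['b', 'c', 'd']
```

A switch is never accepted if it adds vertices, so the vertex count cannot increase between rounds.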

28
Selecting Seed Edges
29
Replacing Seed Edges
30
Retrying with Isolated Colored Edges
31
Preparation of Experimental Data Sets
  • Candidate primer sets for S. cerevisiae and
    S. pombe were prepared using Primer3.
  • Primer length range: 8-12 bases.
  • PCR product size range: 300-1200 bases.
  • For each gene at most 10,000 pairs of primers
    were selected.
  • Three melting temperature ranges were selected
    for each of S. cerevisiae and S. pombe.

32
Degenerate Data Sets
  • A degenerate primer is a mix of two or more
    primers usually differing in a small number of
    bases.
  • Degenerate primers can make resulting colored
    graph more dense by merging primers.
  • Created degenerate data sets by merging primers
    differing in at most one base.
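
Merging two primers that differ in at most one base can be illustrated with the standard IUPAC two-base ambiguity codes. The sketch below handles a single pair of plain A/C/G/T primers only; how the presentation grouped primers across a whole data set is not stated, and the function name `merge_if_close` is hypothetical.

```python
# IUPAC codes for two-base ambiguities.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

def merge_if_close(p: str, q: str):
    """Return a degenerate primer covering p and q, or None if they differ in more than one base."""
    if len(p) != len(q):
        return None
    diffs = [i for i, (a, b) in enumerate(zip(p, q)) if a != b]
    if len(diffs) > 1:
        return None
    if not diffs:
        return p  # identical primers need no degeneracy
    i = diffs[0]
    return p[:i] + IUPAC[frozenset((p[i], q[i]))] + p[i + 1:]

print(merge_if_close("ACGTACGT", "ACGTACGA"))  # ACGTACGW
print(merge_if_close("AAAA", "CCAA"))          # None
```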

33
Summary of Results (Non-degenerate)
Yeast       T_m    Amplified Genes  Lower Bound  Cost (1)  Cost (2)  Savings (1)  Savings (2)
Cerevisiae  47-57  3775             3065         5483      5511      2067         2039
Cerevisiae  42-52  2700             1344         3130      3232      2270         2168
Cerevisiae  40-50  5313             1241         4753      5157      5863         5469
Pombe       45-55  3583             2622         4987      5058      2179         2108
Pombe       43-53  4232             1988         4799      4951      3665         3513
Pombe       40-50  3400             1380         3651      3852      3149         2948
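
The first row of the table is consistent with reading "savings" as the conventional primer count (two per amplified gene) minus the number of primers in the heuristic's solution. That interpretation is inferred from the numbers and is not stated on the slide; the helper name `savings` is hypothetical.

```python
def savings(amplified_genes: int, primers_used: int) -> int:
    """Primers saved relative to two unique primers per amplified gene."""
    return 2 * amplified_genes - primers_used

# First row (Cerevisiae, T_m 47-57): 3775 genes, costs 5483 and 5511.
print(savings(3775, 5483))  # 2067
print(savings(3775, 5511))  # 2039
```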
34
Summary of Results (Degenerate)
Yeast       T_m    Amplified Genes  Lower Bound  Cost (1)  Cost (2)  Savings (1)  Savings (2)
Cerevisiae  47-57  3775             1221         3638      3940      3912         3610
Cerevisiae  42-52  2700             475          2105      2481      3295         2919
Pombe       45-55  3583             1050         3283      3598      3883         3568
35
Future Work
  • Using longer primers would enable more efficient
    PCR.
  • Increasing order of degeneracy would give a more
    dense colored graph and potentially greater
    savings.
  • Combining the above two ideas is the focus of our
    current work.
  • Consider the use of existing software
    architecture to solve other primer design
    problems.

36
Acknowledgements
  • Thanks to Steven Skiena, Bruce Futcher and Janet
    Leatherwood.
  • Sponsored by NSF Grant CCR-9988112.