
Title: in Silico Primer Design and Simulation for Targeted High Throughput Sequencing


1
  • in Silico Primer Design and Simulation for
    Targeted High Throughput Sequencing

I519 FALL 2010 Adam Thomas, Kanishka
Jain, Tulip Nandu
2
BACKGROUND
  • Major Milestones
  • Molecular structure of DNA
  • Human Genome Project
  • High-Throughput Sequencing (HTS)
  • HTS transformed common experiments on single
    genes to entire genomes
  • Low cost
  • Multiple samples in every run (e.g. the 454
    sequencer can sequence 400-600 Mb per run)

3
BACKGROUND
  • A primer is a short strand of nucleotides that
    serves as the starting point of DNA synthesis.
  • Approximately 20-25 nucleotides long.
  • Determines which region of the DNA strand is
    amplified.
  • Complementary to the template DNA strand.
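
As a rough illustration of "complement of the DNA strand", the sketch below derives a forward and a reverse primer from the two ends of a template. The function names and the fixed-length choice are illustrative assumptions, not part of the presented tool; real primer design (e.g. with Primer3) also weighs melting temperature, GC content and secondary structure.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_pair(template, length=20):
    """Pick naive primers from the two ends of a 5'->3' template.

    Hypothetical helper: the forward primer copies the first `length`
    bases, the reverse primer is the reverse complement of the last
    `length` bases, so that it anneals to the given strand.
    """
    if len(template) < 2 * length:
        raise ValueError("template is too short for two primers")
    forward = template[:length]
    reverse = reverse_complement(template[-length:])
    return forward, reverse
```

For example, `reverse_complement("ATGC")` gives `"GCAT"`.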

4
PCR
  • Polymerase Chain Reaction
  • Technique to amplify a small region of DNA
  • 3 step process
  • Denaturation,
  • Annealing and
  • Extension.
  • Process repeated for approximately 30 to 40
    cycles.
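
To give a sense of why 30 to 40 cycles suffice, the following sketch computes the idealized copy number after a given number of cycles. The `efficiency` parameter is an assumption added for illustration (1.0 means perfect doubling); real reactions plateau as reagents are consumed.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized PCR yield: each cycle multiplies copies by (1 + efficiency).

    efficiency=1.0 models perfect doubling per cycle; lower values
    model a less efficient reaction. Plateau effects are ignored.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# One template molecule after 30 perfect cycles: 2**30 copies.
copies_after_30 = pcr_copies(1, 30)
```

With perfect doubling, a single template yields 2^30 (about 10^9) copies after 30 cycles.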

5
PCR
  • Denaturation

Heat (approx. 90 °C) separates the double strand
into two single strands
6
PCR
  • Annealing

Primers bind to the individual strands (occurs at
45 to 60 °C)
7
PCR
  • Extension

Temperature is raised to 72 °C and the Taq DNA
polymerase enzyme is used to replicate DNA strands
8
PCR
  • End of First Cycle

Process repeated for approximately 30 to 40
cycles.
9
CURRENT PROCESS
10
CURRENT PROCESS
  • Primer3 is used to design primers for PCR.
  • The primers then need to be validated. Validation
    is performed by simulation, alignment and
    re-assembly.
  • MetaSim is used to simulate PCR to create
    expected amplicons.
  • CAP3 is used for re-assembly of simulated
    sequences.
  • BLASTing the simulated sequences against the
    original sequence gives a fairly accurate measure
    of how well the primers will perform.

11
ISSUES FACED WITH CURRENT PROCESS
  • Each tool uses different file inputs and outputs.
  • Users have to manually convert file formats to
    use in each tool.
  • No existing tool integrates all of these
    functions and supports high-throughput
    analysis.

12
GOAL
  • Integrate the whole process involved in a
    high-throughput sequencing experiment and keep
    track of the parameters that are entered or
    changed.

13
OBJECTIVES
  • A way to visualize the primers and amplicons in
    relation to the genome and be able to edit the
    primers manually and see how that affects the
    simulation.
  • Optimization of the high-throughput process by
    minimizing the number of reads needed by the 454
    process while still being able to assemble the
    sequence.
  • Validation of the simulated amplicon reads to see
    whether the simulation matches expectations, and
    to rectify any problems found.

14
PROPOSED SOLUTION
15
PROPOSED SOLUTION
  • Solution can be broken into two major components
  • Creation of overlapping amplicons when amplified
    by PCR
  • Validation of primer effectiveness in silico.
  • Automation of the above mentioned solution.

16
VISUALIZATION TOOL
  • GBrowse
  • Popular and open source.
  • Well defined plugin architecture.
  • Plugin to design primers using Primer3 already
    available.

17
PRIMER DESIGN
  • A PrimerDesign.pm plugin already exists for
    GBrowse. It designs primers using Primer3.
  • Originally designed to amplify only one specific
    region of DNA with as few primers as possible and
    no overlapping amplicons.
  • Tweaked to take two additional input parameters:
    Amplicon Overlap and Max Amplicon Length.
  • Once primers are created using GBrowse, the
    primers are output in Feature File Format
    (FFF)
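
The effect of the two added parameters can be sketched as a simple tiling of the target region. This is only an illustration of the idea: the function name and the 1-based inclusive coordinates are assumptions, and the slides do not describe how the modified plugin actually places amplicons (Primer3 still has to find acceptable primers near each window boundary).

```python
def tile_amplicons(region_start, region_end, max_amplicon_length, overlap):
    """Split a region into windows of at most `max_amplicon_length`
    bases, where consecutive windows share `overlap` bases.

    Coordinates are 1-based and inclusive (an assumption).
    """
    if region_start > region_end:
        raise ValueError("region_start must not exceed region_end")
    if not 0 <= overlap < max_amplicon_length:
        raise ValueError("overlap must be smaller than the amplicon length")

    step = max_amplicon_length - overlap
    windows = []
    start = region_start
    while True:
        # Clip the last window to the end of the region.
        end = min(start + max_amplicon_length - 1, region_end)
        windows.append((start, end))
        if end == region_end:
            break
        start += step
    return windows


windows = tile_amplicons(1, 1000, max_amplicon_length=400, overlap=100)
```

For a 1000 bp region with a maximum amplicon length of 400 and an overlap of 100, this yields the windows (1, 400), (301, 700) and (601, 1000).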

18
PRIMER VALIDATION - SIMULATION
  • Simulation performed using MetaSim.
  • MetaSim
  • Generates sets of synthetic reads or mate-pairs
    based on adaptable sequencing error models (e.g.
    for Sanger chemistry, Roche's 454 and Illumina,
    formerly Solexa).
  • Can be controlled via graphical user interface or
    in command line mode.

19
SIMULATION
  • Function written in Perl to invoke MetaSim in
    its command line mode.
  • Algorithm
  • Read FFF file. Extract primer coordinates.
  • Extract sequence from the original sequence.
  • Run MetaSim simulation using command line
    options.
  • Each extracted sequence yields its own FASTA
    file containing multiple simulated reads.
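
The first two steps of the algorithm above (reading coordinates and extracting sequence) might look roughly like the sketch below. The presented function was written in Perl; this Python version is only an illustration. It assumes a simplified feature line of the form `type name start..end`, which may not match the full FFF syntax, and it stops before the MetaSim call because the slides do not give the command line options used.

```python
import re

# Matches coordinates written as "start..end" (assumed notation).
COORDINATES = re.compile(r"(\d+)\.\.(\d+)")


def parse_feature_lines(lines):
    """Return (name, start, end) tuples from simplified feature lines."""
    features = []
    for line in lines:
        line = line.strip()
        # Skip blanks, comments and [section] headers.
        if not line or line.startswith("#") or line.startswith("["):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        match = COORDINATES.fullmatch(fields[2])
        if match is None:
            continue
        start, end = int(match.group(1)), int(match.group(2))
        features.append((fields[1], min(start, end), max(start, end)))
    return features


def extract_region(reference, start, end):
    """Extract a 1-based, inclusive region from the reference sequence."""
    return reference[start - 1:end]


def to_fasta(name, sequence, width=60):
    """Format one sequence as a FASTA record with wrapped lines."""
    body = "\n".join(sequence[i:i + width]
                     for i in range(0, len(sequence), width))
    return f">{name}\n{body}\n"
```

Each record produced by `to_fasta` could then be written to its own file and handed to the simulator.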

20
ASSEMBLY
  • Perl function written to invoke CAP3 using its
    command line interface.
  • Each file generated from the MetaSim simulation
    is input into CAP3 which then assembles the
    contigs.

21
ASSEMBLY
  • CAP3.
  • Input simulated sequences as FASTA file.
  • CAP3 is a sequence assembly program that allows
    users to assemble a set of short reads into
    contigs.
  • Takes as input a file of sequence reads in FASTA
    format.
  • If a read name contains a dot (.), CAP3 requires
    that the names of reads sequenced from the same
    subclone contain the same substring up to the
    first dot.
  • Can be invoked using a command line interface.
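
A minimal sketch of driving CAP3 from a script follows. It is written in Python rather than the Perl used in the project, and the helper names are hypothetical. It relies on CAP3's default behavior of writing its results next to the input file with a `.cap.*` suffix, and assumes a `cap3` executable is on the PATH.

```python
import subprocess
from pathlib import Path


def cap3_command(reads_fasta, executable="cap3"):
    """Build the CAP3 command line for one FASTA file of reads."""
    return [executable, str(reads_fasta)]


def cap3_outputs(reads_fasta):
    """Paths of the main files CAP3 writes next to its input."""
    base = str(reads_fasta)
    return {
        "contigs": Path(base + ".cap.contigs"),
        "singlets": Path(base + ".cap.singlets"),
        "ace": Path(base + ".cap.ace"),
    }


def run_cap3(reads_fasta, executable="cap3"):
    """Run CAP3 and return the path of the assembled contigs file."""
    # CAP3 prints its alignment report to standard output.
    subprocess.run(cap3_command(reads_fasta, executable),
                   capture_output=True, text=True, check=True)
    return cap3_outputs(reads_fasta)["contigs"]


def subclone_key(read_name):
    """Part of a read name up to the first dot, which CAP3 uses to
    recognize reads sequenced from the same subclone."""
    return read_name.split(".", 1)[0]
```

Calling `run_cap3("amp1.fasta")` would be expected to leave the contigs in `amp1.fasta.cap.contigs`.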

22
BLAST
  • Assembled contigs are then BLASTed against the
    original sequence to validate.
  • GBrowse accepts the assembled sequence and BLASTs
    against the original sequence.
  • This plugin requires 4 steps:
  • Exporting assembled contigs and original sequence
    from GBrowse.
  • Creating a BLAST database.
  • BLASTing the contigs against the sequence.
  • Importing result back into GBrowse.
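
The two middle steps (creating the database and running the search) can be sketched as follows. The slides do not say which BLAST distribution was used; this example assumes NCBI BLAST+ (`makeblastdb` and `blastn`) and tabular output (`-outfmt 6`). The function names and file names are hypothetical, and the export from and import into GBrowse are not covered.

```python
def makeblastdb_command(reference_fasta, db_name):
    """Command to build a nucleotide BLAST database (BLAST+ assumed)."""
    return ["makeblastdb", "-in", reference_fasta,
            "-dbtype", "nucl", "-out", db_name]


def blastn_command(contigs_fasta, db_name, result_path):
    """Command to search contigs against the database, tabular output."""
    return ["blastn", "-query", contigs_fasta, "-db", db_name,
            "-outfmt", "6", "-out", result_path]


# Column order of the default BLAST+ tabular format (-outfmt 6).
TABULAR_FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch",
                  "gapopen", "qstart", "qend", "sstart", "send",
                  "evalue", "bitscore")


def parse_tabular_hit(line):
    """Turn one tab-separated hit line into a dict with typed values."""
    values = line.rstrip("\n").split("\t")
    hit = dict(zip(TABULAR_FIELDS, values))
    for key in ("length", "mismatch", "gapopen",
                "qstart", "qend", "sstart", "send"):
        hit[key] = int(hit[key])
    for key in ("pident", "evalue", "bitscore"):
        hit[key] = float(hit[key])
    return hit
```

The command lists can be passed to `subprocess.run(..., check=True)`; a contig that aligns to the original sequence with high identity over its full length suggests the primers recover that region.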

23
DEMO
24
QUESTIONS