An Empirical Study of Choosing Efficient Discriminative Seeds for Oligonucleotide Design

1
An Empirical Study of Choosing Efficient
Discriminative Seeds for Oligonucleotide Design
  • Won-Hyong Chung and Seong-Bae Park
  • Dept. of Computer Engineering
  • Kyungpook National University, South Korea

2
Motivation
  • Issues for designing oligonucleotides
  • To minimize the cross-hybridizations
  • To minimize the computing time
  • Seeding (or indexing) has been widely used to
    address these issues by pre-screening unreliable
    sequence regions before calculating
    cross-hybridizations.
  • Although many types of seeding methods have been
    proposed, no measure has yet been proposed for
    evaluating how adequate and efficient the seeds
    are in oligonucleotide design.

3
Difference between alignment and oligonucleotide
design
  • Alignment
  • To find all possible alignments with sufficiently
    high scores.
  • Sensitivity is important, while specificity is
    usually guaranteed by the seed's own specificity.
  • Oligonucleotide design
  • To find optimal oligonucleotides to differentiate
    target sequences from the others.
  • Specificity should be considered as well as
    sensitivity for checking cross-hybridization.

4
Objectives
  • We propose novel measures for evaluating seeds,
    based on discriminability and efficiency.
  • We examine five seeding methods in
    oligonucleotide design.
  • continuous, spaced, transition-constrained, BLAT,
    and Vector seeds
  • We provide a software package, SeedChooser, which
    lets users obtain adequate seeds under their own
    experimental conditions.

5
What is a Seed?
  • Seeding process
  • Filtering step: short fixed-length common words
    that are found in both the query and target
    sequences are selected.
  • Extension step: the selected words are extended
    to the size of the oligonucleotide and checked
    for cross-hybridization.
  • Seed: the filtering template for the fixed-length
    words
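
The two steps above can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the seed is a plain continuous k-length word, and a simple mismatch count stands in for a real cross-hybridization check.

```python
from collections import defaultdict


def index_words(seq, k):
    """Map every k-length word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def filter_step(oligo, target, k):
    """Return (oligo_pos, target_pos) pairs where a k-length word is shared."""
    target_index = index_words(target, k)
    hits = []
    for q in range(len(oligo) - k + 1):
        for t in target_index.get(oligo[q:q + k], []):
            hits.append((q, t))
    return hits


def extension_step(oligo, target, hit):
    """Extend a hit to an oligo-sized target window aligned with the oligo.

    Returns (window_start, mismatch_count), or None if the window would
    run off either end of the target. The mismatch count is a hypothetical
    stand-in for a proper cross-hybridization measure.
    """
    q, t = hit
    start = t - q
    if start < 0 or start + len(oligo) > len(target):
        return None
    window = target[start:start + len(oligo)]
    mismatches = sum(a != b for a, b in zip(oligo, window))
    return start, mismatches


oligo = "ACGTACGTAC"
target = "TTACGTACGTACTT"
hits = filter_step(oligo, target, k=4)
checked = [extension_step(oligo, target, h) for h in hits]
```

Only the regions that survive the filtering step reach the more expensive extension check, which is where the time saving comes from.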

6
Seeding methods (1/2)
  • Continuous seed: a seed that finds k-length exact
    matches
  • BLAST employs an 11-bp seed: 11111111111
  • Spaced seed: allows a "don't care" letter, labeled
    0, in the seed
  • An 18-bp seed containing 11 match positions,
    101101100111001011, is used in PatternHunter.
  • Transition-constrained seed: adds a transition
    (A <-> G, C <-> T) letter, @, to the seed
  • YASS uses such a seed, 1110@10010@1010111; it
    spans 18 bp, with 10 match positions and 2
    transition positions.
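
As a minimal sketch of how these three templates could be applied, the Python below checks whether two equal-length words agree under a seed written with the symbols used on this slide (1 = must match, 0 = don't care, @ = match or transition). The function names are hypothetical and are not taken from BLAST, PatternHunter, or YASS.

```python
# Transitions are purine<->purine (A, G) and pyrimidine<->pyrimidine (C, T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def seed_hit(seed, a, b):
    """True if words a and b agree under the seed template."""
    if not (len(seed) == len(a) == len(b)):
        raise ValueError("seed and both words must have the same length")
    for s, x, y in zip(seed, a, b):
        if s == "1" and x != y:
            return False  # required match position differs
        if s == "@" and x != y and (x, y) not in TRANSITIONS:
            return False  # only a match or a transition is allowed here
        # s == "0": don't care, anything is accepted
    return True


def seed_weight(seed):
    """Number of positions that must match exactly."""
    return seed.count("1")


continuous = seed_hit("1111", "ACGT", "ACGA")   # last base differs
spaced = seed_hit("1101", "ACGT", "ACTT")       # difference at the 0 position
transition = seed_hit("11@1", "ACAT", "ACGT")   # A <-> G at the @ position
transversion = seed_hit("11@1", "ACAT", "ACCT")  # A vs C is not a transition
```

The weights quoted on the slide can be checked the same way: `seed_weight` gives 11 for the PatternHunter seed and 10 for the YASS seed, and both templates span 18 positions.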

7
Seeding methods (2/2)
  • BLAT seed: a continuous seed allowing one or two
    mismatches at any position of the seed.
  • Vector seed: a generalized seed combining the
    ideas of the BLAT seed and the spaced seed.
  • BLAT seeds and Vector seeds allow some mismatches
    at any position.
  • They greatly increase sensitivity but spend much
    more computing time than the previous seeds.
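
A rough Python sketch of mismatch-tolerant hits follows. It is a simplified reading of the slide, not the full definition of either seed type: `vector_like_hit` (a hypothetical name) just counts mismatches at the 1 positions of a spaced template. The `neighborhood_size` helper is an added illustration of one way the extra cost can arise: if near matches are found by probing an exact-word index, each word has to be looked up under every variant within the mismatch budget.

```python
from math import comb


def mismatches(a, b, seed=None):
    """Count mismatches; with a seed, only positions marked '1' are counted."""
    if seed is None:
        seed = "1" * len(a)
    if not (len(seed) == len(a) == len(b)):
        raise ValueError("seed and both words must have the same length")
    return sum(1 for s, x, y in zip(seed, a, b) if s == "1" and x != y)


def blat_hit(a, b, max_mismatches=1):
    """Continuous seed that tolerates a few mismatches anywhere."""
    return mismatches(a, b) <= max_mismatches


def vector_like_hit(seed, a, b, max_mismatches=1):
    """Spaced template that also tolerates a few mismatches (simplified)."""
    return mismatches(a, b, seed) <= max_mismatches


def neighborhood_size(k, max_mismatches, alphabet_size=4):
    """Number of k-length words within the mismatch budget of a given word."""
    return sum(
        comb(k, i) * (alphabet_size - 1) ** i
        for i in range(max_mismatches + 1)
    )
```

Under these assumptions, an 11-bp word has `neighborhood_size(11, 1)` = 34 variants with at most one mismatch and `neighborhood_size(11, 2)` = 529 with at most two, compared with a single lookup for an exact continuous seed.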

8
The Issues of seeds for oligo design
  • An ideal seed should filter out, as fast as
    possible, all regions that have no possibility of
    being chosen as an oligo.

  • Discriminability: a seed should find as many
    oligos as possible, and a seed should avoid
    finding non-oligo regions.
  • Efficiency: a seed should minimize the cost of
    indexing to find oligos.
  • Efficient Discriminability: both combined.

9
Discriminability
The discriminability is a balance between
precision and recall to minimize both false
positives and false negatives.
  • Parameter: alpha
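
The formula on this slide did not survive in the transcript. As a sketch of what a precision/recall balance with one parameter can look like, the Python below uses the standard weighted F-measure, with the weight named alpha to match the slide. This choice is an assumption, and the authors' actual definition may differ.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def weighted_f(precision, recall, alpha=1.0):
    """Standard weighted F-measure (assumed form, not taken from the slides).

    alpha = 1 gives the harmonic mean of precision and recall;
    alpha > 1 weights recall more, alpha < 1 weights precision more.
    """
    denom = alpha ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + alpha ** 2) * precision * recall / denom


# Hypothetical counts: 8 true oligos found, 2 false hits, 8 oligos missed.
p, r = precision_recall(tp=8, fp=2, fn=8)
score = weighted_f(p, r, alpha=1.0)
```

With alpha set to 1, false positives and false negatives are penalized symmetrically through the harmonic mean.
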
10
Efficiency
  • The efficiency is the proportion of useful
    regions filtered by a seed.
  • the duplication ratio of generated indices
  • the average number of indices in each oligo

  • Parameters: beta, gamma
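
The two quantities listed on this slide can be computed as in the Python sketch below. How they are combined with beta and gamma into a single efficiency score is not recoverable from the transcript, so the sketch stops at the raw quantities. Defining the duplication ratio as one minus the fraction of distinct indices is an assumption, and the function names are hypothetical.

```python
def seed_indices(oligo, seed):
    """Index words produced by sliding a seed along one oligo.

    Only the bases under '1' positions of the seed are kept in each index.
    """
    keep = [i for i, s in enumerate(seed) if s == "1"]
    windows = len(oligo) - len(seed) + 1
    return ["".join(oligo[p + i] for i in keep) for p in range(windows)]


def index_statistics(oligos, seed):
    """Return (duplication_ratio, average_indices_per_oligo)."""
    all_indices = []
    for oligo in oligos:
        all_indices.extend(seed_indices(oligo, seed))
    total = len(all_indices)
    if total == 0:
        return 0.0, 0.0
    # Assumed definition: share of generated indices that repeat an earlier one.
    duplication_ratio = 1 - len(set(all_indices)) / total
    return duplication_ratio, total / len(oligos)


dup, avg = index_statistics(["ACGTAC", "ACGTTT"], "101")
windows_50mer = len(seed_indices("A" * 50, "101101100111001011"))
```

In this sketch, a seed spanning 18 positions yields 50 - 18 + 1 = 33 index words per 50mer oligo, so longer seeds generate fewer indices per oligo.
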
11
Efficient discriminability
12
Experiments
  • Empirically chosen seeds were evaluated by three
    measures: discriminability, efficiency, and
    efficient discriminability.
  • We tested the seeds for designing 50mer oligos.
  • The parameters were set to 1 for the evaluation.
  • Simulated data set
  • A set of random sequences which are generated by
    OligoGenerator in SeedChooser.
  • Biological data set
  • Ecologically important genes involved in the
    nitrogen and carbon cycles.
  • nirS: nitrite reductase gene set
  • pmoA: methane monooxygenase gene set

13
Discriminability of the five seeding methods
14
Efficiency of the five seeding methods
15
Efficient Discriminability of the five seeding
methods
16
Evaluation results of pmoA data set
17
Evaluation results of nirS data set
18
SeedChooser: Seed Evaluation and Recommendation
Tools
  • SeedChooser: recommends the best seeds according
    to the evaluation parameters. It uses a genetic
    algorithm to find the best seeds.
  • SeedEvaluator: evaluates a set of seeds according
    to the parameters.
  • OligoGenerator: generates a set of oligos for
    the desired experimental conditions.
  • SeedChooser homepage
  • http://ml.knu.ac.kr/whchung/seedchooser.html

19
CONCLUSION
  • We proposed a novel measure for evaluating seeds
    in oligo design, based on discriminability and
    efficiency.
  • The spaced seed was generally preferred to the
    other seeding methods.
  • Our study can be applied to oligo design
    programs to improve their performance by
    suggesting experiment-specific seeds.
  • We expect that our study will be helpful for
    other genomic tasks.

20
Supplementary materials
21
[Figure: diagram with labels P0, T0, P1, T1, P2, T2, T3]
  • T1, T2 and T3 are the target sequences.
  • P1 and P2 are the matched oligos for an oligo P0.
  • S1, S2 and S3 are the seed indices for S0
    produced by a seed.

22
Relations of precision, recall and
discriminability
23
Discriminability according to values of alpha
24
Efficiency according to values of beta and gamma
25
Efficient Discriminability for 70mer Oligos