SIMDDS - PowerPoint PPT Presentation

Provided by: jian4
Learn more at: https://www.cs.uml.edu
2
Major Application: Finding Homologies
(C) Mark Gerstein, Yale University
bioinfo.mbb.yale.edu/mbb452a
3
AutoSimS
  • Local two-sequence alignment is the basis of
    sequence analysis, and perhaps the most widely
    used tool in computational molecular biology [1]
  • The parameters of the most popular local sequence
    alignment tools, including BLAST and FASTA, are
    set in one of two ways:
      • Default: set for the average case, which may
        not be appropriate for the sequences being
        examined
      • Custom: manual settings may be difficult to
        choose and usually require fine tuning through
        several manual trials
  • AutoSimS (Automated Sequence Similarity Search)
    contains three modules:
      • A modified version of SIM/DDS (Similarity /
        DNA-DNA Sequence) [2, 3] for finding similar
        regions
      • Adaptive simulated annealing (ASA) [4] for
        optimizing the parameters of SIM/DDS
      • An AI decision-making system (not implemented)
        for guiding the adaptive simulated annealing

4
(SIM/DDS)
Similarity / DNA-DNA Sequence
  • Integrates features from Smith-Waterman, BLAST,
    FASTA and Haste (Hash-Accelerated Search) [5]
  • Rated as one of the fastest and least
    space-consuming (linear space complexity) tools
    for universal sequence alignment [6]
  • Provides tradeoffs between sensitivity and speed
    through more than a dozen parameters
  • Our modified SIM/DDS introduces more cutoffs
  • Increases flexibility of control
  • Sequence filtering
  • Word masking
  • Reduces the impact of short, exact matches
  • Allows adjusting sensitivity for weak similarity
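
To make the Smith-Waterman ingredient and the "linear space" remark concrete, here is a minimal sketch of local alignment scoring that keeps only two rows of the dynamic-programming table. It is not the SIM/DDS code: the function name, the scoring values and the linear gap penalty are assumptions chosen for illustration, and it returns only the best score, not the aligned regions.

```python
def local_alignment_score(a: str, b: str, match: int = 1,
                          mismatch: int = -1, gap: int = -1) -> int:
    """Best Smith-Waterman local alignment score using two DP rows.

    Scoring values are illustrative assumptions, not SIM/DDS defaults.
    """
    prev = [0] * (len(b) + 1)  # scores for the previous row of the table
    best = 0
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Memory grows with the length of one sequence only, because each row depends just on the row before it. Recovering the aligned regions themselves in linear space takes additional machinery that this sketch leaves out.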

5
(ASA)
Adaptive Simulated Annealing
  • Uses global and statistical optimization
    techniques that are able to handle complex,
    non-linear search spaces
  • Several improvements over the original simulated
    annealing technique:
      • Computational complexity: exponential
        temperature schedule for annealing
      • Completeness: decreases the chance of missing
        optima
      • Generality: more options to better fit the
        problem to be solved
  • Most attractive feature: individual
    consideration given to the parameter range,
    annealing-time-dependent sensitivity, and
    probability density distribution of each
    parameter
  • Provides up to 100 options
  • Facilitates incorporation into the AutoSimS model
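
The per-parameter treatment can be sketched as follows. The schedule and the sampling formula follow the description of ASA published by Ingber [4], as far as we understand it: each parameter has its own temperature T(k) = T0 exp(-c k^(1/D)) and its own range [lo, hi]. The function names are hypothetical and this is not the interface of the ASA package.

```python
import math
import random


def asa_temperature(t0: float, c: float, k: int, dims: int) -> float:
    """Per-parameter schedule T(k) = T0 * exp(-c * k**(1/D))."""
    return t0 * math.exp(-c * k ** (1.0 / dims))


def asa_step(temp: float, rng: random.Random) -> float:
    """Draw y in [-1, 1]; a small temperature concentrates y near 0."""
    u = rng.random()
    sign = 1.0 if u >= 0.5 else -1.0
    return sign * temp * ((1.0 + 1.0 / temp) ** abs(2.0 * u - 1.0) - 1.0)


def asa_propose(x: float, lo: float, hi: float, temp: float,
                rng: random.Random) -> float:
    """Propose a new value scaled by the parameter's own range.

    Assumes lo <= x <= hi and temp > 0; resamples until the
    candidate falls inside [lo, hi].
    """
    while True:
        candidate = x + asa_step(temp, rng) * (hi - lo)
        if lo <= candidate <= hi:
            return candidate
```

Early in a run the proposals can span the whole range of a parameter; as that parameter's temperature falls, they cluster around the current value.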

6
AutoSimS Model
[Diagram; labels recoverable from the slide: User
Preferences; AI Decision-Making Module (not
implemented); Sequence Data; Data Selection;
Knowledge Base; Modified SIM/DDS; Parameters;
Parameter Search; Set of possible parameters with
exponential probability; Parameter Evaluation;
Exponential Annealing; Value of objective function;
ASA; Preferred similarity regions]
7
Summary of Model
  • ASA works as a wrapper program that selects
    parameters for SIM/DDS
  • Given a properly specified search space,
    objective function and successor heuristics, all
    to be determined by the AI decision-making
    system, ASA is used to find the optimal parameter
    setting of the modified SIM/DDS program. This
    leads to finding better similar regions
  • Even though this information currently has to
    be given to ASA manually, we find it easier to do
    so and let ASA tune the parameters of SIM/DDS
    than to tune SIM/DDS's parameters by hand
  • Adding the AI decision-making module will make
    AutoSimS nearly autonomous by automatically
    providing most of the information ASA needs
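
The wrapper relationship can be sketched as a function that hides the aligner behind a "parameters in, objective value out" interface, which is all an optimizer needs to see. Everything named here is hypothetical: `SearchSpace`, `make_objective`, the `word_size` parameter and the stand-in `fake_aligner` are illustrations, not part of SIM/DDS or ASA.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


@dataclass
class SearchSpace:
    """Allowed range for each tunable aligner parameter."""
    bounds: Dict[str, Tuple[float, float]]

    def contains(self, params: Params) -> bool:
        return all(lo <= params[name] <= hi
                   for name, (lo, hi) in self.bounds.items())


def make_objective(run_aligner: Callable[[Params], List[float]],
                   score: Callable[[List[float]], float],
                   space: SearchSpace) -> Callable[[Params], float]:
    """Wrap an aligner so the optimizer only sees params -> value."""
    def objective(params: Params) -> float:
        if not space.contains(params):
            return float("-inf")  # out-of-range settings are never preferred
        return score(run_aligner(params))
    return objective


def fake_aligner(params: Params) -> List[float]:
    """Stand-in for a real aligner run; returns made-up region scores."""
    return [10.0 - abs(params["word_size"] - 8.0), 5.0]
```

With `space = SearchSpace({"word_size": (4.0, 16.0)})`, calling `make_objective(fake_aligner, max, space)` gives a function that an annealing loop can evaluate repeatedly without knowing anything about the aligner.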

8
Results
  • AHSC (Average of High-Scoring Chain Scores) may
    be used as an ASA objective function to find
    parameters yielding highly similar regions
  • We find that close-to-optimal parameter settings
    are difficult to find manually, and that many
    different parameter settings yield
    close-to-optimal search results
  • An automatic search for parameters may be
    effective
  • Adaptive simulated annealing may be a preferred
    search technique
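
The slides do not spell out how AHSC is computed. Read literally, it is the arithmetic mean of the scores of the high-scoring chains reported by one run. The sketch below makes that assumption explicit; the optional `top_k` cutoff is a hypothetical addition for deciding which chains count as "high-scoring".

```python
from statistics import mean
from typing import Optional, Sequence


def ahsc(chain_scores: Sequence[float], top_k: Optional[int] = None) -> float:
    """Average of high-scoring chain scores (assumed definition).

    With top_k set, only the top_k best chains are averaged.
    An empty result scores 0.0 so that it is never preferred.
    """
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")
    if not chain_scores:
        return 0.0
    ranked = sorted(chain_scores, reverse=True)
    kept = ranked if top_k is None else ranked[:top_k]
    return mean(kept)
```

Used as an ASA objective, a higher value means that the chains found under a given parameter setting score better on average.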

Three runs of our modified SIM/DDS program, using
parameters selected by adaptive simulated
annealing for a 100- and 200-letter pair of DNA
sequences, yield similar results but with
different parameter settings.

ASA settings:
  • Annealing schedule: T = 20 exp(-0.005 t) if
    t < 100, and 0 otherwise
  • Acceptance function: exp(ΔE / T)
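
The annealing schedule and acceptance function quoted above can be read as the following toy loop. Two things are assumptions: the objective is maximized and ΔE is the candidate value minus the current value, so exp(ΔE / T) is below 1 for a worse move; and the `objective` and `neighbor` callables are hypothetical stand-ins for a SIM/DDS run and a parameter perturbation.

```python
import math
import random


def temperature(t: int) -> float:
    """T = 20 * exp(-0.005 * t) for t < 100, and 0 otherwise."""
    return 20.0 * math.exp(-0.005 * t) if t < 100 else 0.0


def accept(delta_e: float, temp: float, rng: random.Random) -> bool:
    """Accept improvements; accept a worse move with prob exp(dE / T)."""
    if delta_e >= 0:
        return True
    if temp <= 0:
        return False  # at zero temperature only improvements pass
    return rng.random() < math.exp(delta_e / temp)


def anneal(objective, start, neighbor, steps: int = 200, seed: int = 0):
    """Return the best point seen and its objective value."""
    rng = random.Random(seed)
    current, current_e = start, objective(start)
    best, best_e = current, current_e
    for t in range(steps):
        cand = neighbor(current, rng)
        cand_e = objective(cand)
        if accept(cand_e - current_e, temperature(t), rng):
            current, current_e = cand, cand_e
            if current_e > best_e:
                best, best_e = current, current_e
    return best, best_e
```

After step 100 the temperature is 0, so the loop behaves like plain hill climbing for the remaining steps.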
9
Future Work
  • Implement the AI decision-making system,
    including the decision analysis and knowledge
    base system
  • Experiment on a large number of different types
    of molecular biological sequences to determine
    the proper annealing temperature schedules and
    successor heuristics and/or their parameters
  • Parallelize AutoSimS
  • Incorporate core ideas from more efficient, very
    large-scale sequence comparison techniques, such
    as LSH (Locality-Sensitive Hashing) [7]
  • Generate statistical estimates for the local
    alignment score distributions [1], which will be
    used in AutoSimS's decision-making system
  • Explore different ASA objective functions, which
    may improve results

10
Conclusion
  • ASA's ability to fit complex functions, i.e.
    nonlinear search spaces and multiple variables,
    allows it to find a suitable set of parameters
    for SIM/DDS
  • Incorporating the AI decision-making system
    into our ASA-SIM/DDS program should enhance our
    ability to achieve almost autonomous two-sequence
    similarity analysis with high-volume throughput
    and acceptable performance
  • Our use of simulated annealing to find a
    suitable set of parameters can be adapted to
    other bioinformatics analysis programs, such as
    alignment and clustering

11
References
[1] Altschul, S. F., Bundschuh, R., Olsen, R. and
    Hwa, T., The Estimation of Statistical Parameters
    for Local Alignment Score Distributions. Nucleic
    Acids Research, Vol. 29, No. 2, 351-361, 2001
[2] Jiang, T., Xu, Y. and Zhang, M. Q., Current
    Topics in Computational Molecular Biology. MIT
    Press, 2002
[3] Huang, X. and Miller, W., A Time-Efficient,
    Linear-Space Local Similarity Algorithm. Advances
    in Applied Mathematics 12, 337-357, 1991
[4] Ingber, L., Simulated Annealing: Practice versus
    Theory. Mathl. Comput. Modelling, Vol. 18,
    No. 11, 29-57, 1993
[5] Borkowski, J. A., Smith, C. P. and Huang, X.,
    PFP - A Flexible Integrated Filtering and Masking
    Tool, Paracel Inc., Pasadena, CA
[6] Tech Topics, Michigan Technological University,
    Nov. 3, 1995, Vol. XXVIII, No. 9
[7] Buhler, J., Efficient Large-Scale Sequence
    Comparison by Locality-Sensitive Hashing.
    Bioinformatics 17(5), 419-428, 2001