Bioinformatics at USDA-ARS Livestock Issues Research Unit - PowerPoint PPT Presentation

Slides: 21
Provided by: drscot9

Transcript and Presenter's Notes

Title: Bioinformatics at USDA-ARS Livestock Issues Research Unit


1
Bioinformatics at USDA-ARS Livestock Issues
Research Unit
  • Scot E. Dowd, Joaquin Zaragoza
  • Mel Oliver and Paxton Payton

2
Projects
  • Future: interactive neural-network-based models
    to describe and predict gene expression in
    livestock and pathogens
  • Present: various projects in various states,
    leading to the future
  • Molecular Modeling
  • Gene Finding
  • Distributed BLAST
  • Whole Genome Comparison
  • Functional Genomics and pathways
  • Pathway or system targeted Microarray design

3
Functional Genomics
  • Functional Genomics/Gene Ontology: a controlled
    vocabulary
  • Define, annotate, categorize, and describe large
    genetic datasets (e.g. EST, mRNA)
  • We have developed a custom curated database for
    functional domain BLAST (regular BLAST and
    RPS-BLAST using KOG, COG, Pfam, hmmr, and SMART
    domains)
  • Ultimately this will become a comprehensive .NET
    suite of analyses for microarray design, from new
    sequence all the way to result visualization.

4
Ontology
  • Annotation: propagation of error in definitions
  • Ca

5
BLAST need for speed (II)
  • We are working with roughly 5,000-100,000 queries
    against 1 GB databases
  • One query takes a fairly fast PC 3 minutes to
    complete
  • Dual 3.2 GHz Xeon
  • 6 GB RAM
  • RAID 0, SCSI-320 HD
  • Other methods: MPI-BLAST, WU-BLAST, threaded
    BLAST, SGE-BLAST, commercial TURBO BLAST, DNAstar,
    etc.
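
To put the numbers on this slide in perspective, the following back-of-the-envelope sketch estimates how long the stated workload would take if every query ran one after another on a single machine. It assumes the 3 minutes per query quoted above holds for every query, which is a simplification; the function name `serial_days` is made up for illustration.

```python
def serial_days(n_queries, minutes_per_query=3.0):
    """Days needed to run n_queries back to back on one machine."""
    return n_queries * minutes_per_query / (60 * 24)


# Workload range quoted on the slide: 5,000 to 100,000 queries.
for n in (5_000, 100_000):
    print(f"{n:>7,} queries: about {serial_days(n):.1f} days")
```

Under that assumption the range works out to roughly 10 days at the low end and about 208 days at the high end, which is the motivation for distributing the work.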

6
BLAST ALGORITHM
  • Cgtcgctcgctgtaagtac: the query, e.g. a
    1000-letter word
  • Altschul, S. F., Gish, W., Miller, W., Myers, E.
    W. and Lipman, D. J. (1990) Basic local
    alignment search tool. Journal of Molecular
    Biology 215, 403-410.
  • Question: which database sequence is most similar
    to my query?
  • Databases: one of ours is 60 GB worth of letters
  • BLAST generates statistics based upon similarity
    and substitution probabilities. In simplest form,
    purine to purine is better than purine to
    pyrimidine
  • Slide along a 4 GB database, find a word match,
    and try to extend it
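
The "find a word match and try to extend" idea can be sketched in a few lines. This is a toy illustration, not the BLAST implementation: real BLAST adds neighborhood words, statistical significance (E-values), and gapped extension. The scores (+2 match, -1 purine/purine or pyrimidine/pyrimidine mismatch, -2 purine/pyrimidine mismatch), the drop-off threshold, and all function names are assumptions chosen only to mirror the bullets above.

```python
PURINES = {"A", "G"}


def score(a, b):
    """Assumed toy scores: match +2, same-class mismatch -1, cross-class -2."""
    if a == b:
        return 2
    same_class = (a in PURINES) == (b in PURINES)
    return -1 if same_class else -2


def find_seeds(query, subject, w):
    """Yield (query_pos, subject_pos) for every exact word match of length w."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            yield i, j


def extend(query, subject, i, j, w, drop=4):
    """Ungapped extension of a seed; stop when the score falls `drop` below best."""
    best = sum(score(query[i + k], subject[j + k]) for k in range(w))
    right = left = 0
    run, k = best, 0
    while i + w + k < len(query) and j + w + k < len(subject):
        run += score(query[i + w + k], subject[j + w + k])
        k += 1
        if run > best:
            best, right = run, k
        elif best - run > drop:
            break
    run, k = best, 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        run += score(query[i - k - 1], subject[j - k - 1])
        k += 1
        if run > best:
            best, left = run, k
        elif best - run > drop:
            break
    # (score, query start, subject start, aligned length)
    return best, i - left, j - left, w + left + right


def best_hit(query, subject, w=4):
    """Best-scoring extended seed, or None if no word of length w matches."""
    hits = [extend(query, subject, i, j, w) for i, j in find_seeds(query, subject, w)]
    return max(hits) if hits else None
```

For example, `best_hit("ACGTACGT", "TTACGTACGTTT")` finds the full 8-letter query starting at subject position 2.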

7
  • BLASTX as an example: translate the query into 6
    reading frames, then search the database with
    these 6 sequences using a word size of 3.
  • Time to BLAST
  • Up to a point, decreased time correlates with the
    number of slaves available
  • Average test machines (2.4 GHz / 1 GB RAM / SATA-150)
  • e.g. 90 sequences on 13 CPUs in 3 min vs 90
    sequences on 1 CPU in 38.5 min (about a 12.8-fold
    speedup); 350 MB database, gigabit LAN
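
The six-frame translation mentioned in the BLASTX bullet on this slide can be sketched as follows. This is a minimal illustration using the standard genetic code; the function names `translate` and `six_frames` are made up here, and unknown codons (for example ones containing N) are mapped to "X" as an assumption.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons from position 0; a trailing partial codon is dropped."""
    dna = dna.upper()
    return "".join(CODON.get(dna[p:p + 3], "X") for p in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return the three forward and three reverse-complement translations."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames
```

Each of the six protein strings would then be searched against a protein database, which is part of why a BLASTX query costs more than a plain nucleotide search.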

8
(No Transcript)
9
.NET Distributed BLAST
  • Take advantage of unused laboratory compute
    resources
  • Provide an easy, powerful tool for distributing
    BLAST
  • Target environment: Windows LAN
  • Current open source distributed BLAST
    applications:
  • Require a server-class master or a version of UNIX
  • Are difficult to set up: configuring databases,
    compiling, and submitting jobs
  • Have no large-job fault tolerance

10
W.ND BLAST: A Bioinformatician promoting Windows?
  • .NET C#
  • First tests: Condor, MPI, a ported remote shell
  • Contractor
  • Project Manager
  • Database formatter
  • Worker machines
  • Job leasing
  • Output processing: HT backend apps

11
Gotta GUI
12
Database formatter
13
(No Transcript)
14
(No Transcript)
15
(No Transcript)
16
Functionality
  • Network bandwidth would eventually be limited
  • Fault tolerant to worker failure
  • Resume upon reboot if Contractor fails
  • No statistical problems with search results
  • Complete BLAST database on each worker node if
    resources allow
  • Easy to install, a breeze to use

17
.NET Distributed BLAST
  • Queue at each node
  • Contractor allows a maximum of two query
    sequences in each node's queue
  • Ensures the application waits a minimal amount of
    time between completion and the next job
  • Thread per node
  • Makes use of .NET asynchronous delegates (AD);
    scalability ???
  • Thread invokes BLAST on the remote node
  • Upon completion, the remote node sends a finished
    message to the Contractor
  • The Contractor collects the results and performs a
    validity check
  • Once the results are verified, the remote worker
    starts BLAST on the queued sequence and the
    Contractor prepares and allocates a future job
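
The dispatch scheme on this slide (one thread per node, at most two sequences queued per node, a validity check before the next job is handed out) can be sketched roughly as below. The original tool is written in .NET C# with asynchronous delegates; this Python version only mimics the control flow. `run_blast` and `is_valid` are hypothetical stand-ins, and the node queue lives inside the Contractor thread here rather than on a remote machine.

```python
import queue
import threading

QUEUE_DEPTH = 2  # from the slide: at most two query sequences per node queue


def run_blast(node, seq):
    """Hypothetical stand-in for invoking BLAST on a remote node."""
    return f"{node}:{seq[::-1]}"


def is_valid(result):
    """Hypothetical validity check on a returned result."""
    return bool(result)


def dispatch(sequences, nodes):
    """Run every sequence on some node and return {sequence: result}."""
    pending = queue.Queue()
    for seq in sequences:
        pending.put(seq)
    results = {}
    lock = threading.Lock()

    def serve(node):
        node_queue = queue.Queue(maxsize=QUEUE_DEPTH)
        while True:
            # Top up this node's queue so a job is always waiting for it.
            while not node_queue.full():
                try:
                    node_queue.put_nowait(pending.get_nowait())
                except queue.Empty:
                    break
            if node_queue.empty():
                return  # nothing left for this node to do
            seq = node_queue.get()
            result = run_blast(node, seq)
            if is_valid(result):
                with lock:
                    results[seq] = result
            else:
                pending.put(seq)  # hand the sequence back for another attempt

    threads = [threading.Thread(target=serve, args=(n,)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Keeping a second sequence queued at each node is what lets a worker start its next search immediately instead of idling while the Contractor validates the previous result.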

18
.NET Distributed BLAST
  • Fault tolerance, revisited
  • Task migration handled through application-level
    checkpointing
  • When a worker encounters a fault or crashes, the
    Contractor redirects the failed node's sequence
    to another worker node.
  • Minimal loss of time
  • Integrating QoS functionality (currently in the
    works)
  • Decrease priority when the workstation is in use,
    based upon a remote system call checking CPU,
    memory, etc.
  • GUI allows increasing or decreasing priority
    (rev gauges and throttles)
  • Storage requirement limitations: redirect query
    to another database source (working with the
    10-connection limitation in XP Pro)
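
The redirect-on-failure behavior described on this slide might look something like the sketch below. It is an illustration under assumptions, not the tool's actual logic: workers are modeled as plain callables, a crash is modeled as a `WorkerDown` exception, and the `done` dictionary stands in for the application-level checkpoint of completed sequences. All names are hypothetical.

```python
from collections import deque


class WorkerDown(Exception):
    """Raised by a simulated worker that has crashed."""


def make_worker(fail_on=()):
    """Build a simulated worker that crashes on the given sequences."""
    def worker(seq):
        if seq in fail_on:
            raise WorkerDown(seq)
        return f"hits-for-{seq}"
    return worker


def run_with_failover(sequences, workers):
    """Assign sequences round-robin; reassign a sequence if its worker fails."""
    todo = deque(sequences)
    alive = dict(workers)      # worker name -> callable
    done, reassigned = {}, []  # `done` plays the role of the checkpoint
    turn = 0
    while todo:
        if not alive:
            raise RuntimeError("no workers left")
        seq = todo.popleft()
        names = sorted(alive)
        name = names[turn % len(names)]
        turn += 1
        try:
            done[seq] = alive[name](seq)
        except WorkerDown:
            del alive[name]        # stop sending work to the failed node
            todo.appendleft(seq)   # only the in-flight sequence is lost
            reassigned.append((seq, name))
    return done, reassigned
```

Only the sequence that was in flight on the failed worker is repeated, which is consistent with the "minimal loss of time" claim.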

19
Future Directions
  • Quality of Service
  • Allow Contractor to set priority for application
  • Contractor Fault Tolerance
  • Large Network Optimization
  • Sub Contractors
  • Asynchronous delegate thread limit: ewww... kewl,
    WEB SERVICE!
  • Shadow (Sub) Contractors: network load balancing

20
  • The End!
  • Questions?
  • Suggestions?
  • Advice?
  • Even Criticism?