A GT3 based BLAST grid service for biomedical research
Micha Bayer1, Aileen Campbell2, Davy Virdee2
1 National e-Science Centre, e-Science Hub, Kelvin Building, University of Glasgow, Glasgow G12 8QQ
2 Edikt, National e-Science Centre, e-Science Institute, 15 South College Street, Edinburgh EH8 9AA
  • Overview
  • BLAST is a well-known program for biological sequence comparison
  • used to compare query sequences to a set of target sequences in order to find similar sequences in the target set
  • can be extremely compute intensive
  • we present a parallel implementation of BLAST delivered via a GT3 grid service
  • part of the BRIDGES project, a UK e-Science project aimed at providing a grid-based environment for research into the genetic causes of hypertension (http://www.brc.dcs.gla.ac.uk/projects/bridges/)

Scheduler Algorithm
  parse the input and count the no. of query sequences
  poll the resources and establish the total no. of idle nodes
  set the number of sub-jobs to be run equal to the total no. of idle nodes
  calculate the no. of sequences to be run per sub-job, n = no. of query sequences / no. of idle nodes
  while there are sequences left
    save n sequences to a sub-job input file
  if the number of idle nodes is 0
    make up a small number of sub-jobs (currently hardcoded to 5) and distribute these evenly into queues across the resources
  else
    for each resource, send i sub-jobs to the resource as separate threads, where i is the number of idle nodes on that resource
  when the results are complete, save them to file in the original input file order and return this to the user
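As a rough illustration of the partitioning step in the algorithm above, the Java sketch below splits a list of query sequences into one batch per idle node, falling back to a small fixed number of sub-jobs when none are idle. The class and method names are illustrative only and are not taken from the BRIDGES code.

import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch of the partitioning step described above: split the
 * query sequences into one sub-job batch per idle node. Names are
 * hypothetical, not taken from the BRIDGES source.
 */
public class PartitionSketch {

    /** Split the query sequences into roughly equal sub-job batches. */
    static List<List<String>> partition(List<String> querySequences, int idleNodes) {
        // fall back to a small fixed number of sub-jobs when no nodes are idle
        int subJobs = (idleNodes > 0) ? idleNodes : 5;
        // no. of sequences per sub-job, n = no. of query sequences / no. of sub-jobs
        int n = (int) Math.ceil((double) querySequences.size() / subJobs);

        List<List<String>> batches = new ArrayList<List<String>>();
        for (int start = 0; start < querySequences.size(); start += n) {
            int end = Math.min(start + n, querySequences.size());
            batches.add(new ArrayList<String>(querySequences.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> queries = new ArrayList<String>();
        for (int i = 0; i < 10; i++) {
            queries.add(">seq" + i + "\nACGTACGT");
        }
        // e.g. 3 idle nodes -> 3 sub-jobs of up to 4 sequences each
        System.out.println(partition(queries, 3));
    }
}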
  • Parallel BLAST
  • to achieve maximum performance in a grid context, we have parallelised BLAST
  • multiple query sequences are partitioned into sub-jobs on the basis of the number of idle compute nodes available and are then processed on those nodes in batches (each batch runs a standard BLAST invocation; see the sketch after this list)
  • we have provided our own Java-based scheduler which distributes sub-jobs across an array of resources
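Each sub-job ultimately runs a standard BLAST executable over its own slice of the query set. The hypothetical wrapper below shows one way a sub-job could shell out to the legacy NCBI blastall binary; the file paths, database location and program choice are placeholders, and the actual BRIDGES wrappers hand sub-jobs to OpenPBS or Condor rather than running them locally.

import java.io.File;
import java.io.IOException;

/**
 * Hypothetical sub-job wrapper: runs legacy NCBI blastall over one sub-job
 * input file. Paths and database name are placeholders.
 */
public class SubJobRunner {

    public static void main(String[] args) throws IOException, InterruptedException {
        File input = new File("subjob_0.fasta");   // sequences saved by the scheduler
        File output = new File("subjob_0.out");

        // blastall -p <program> -d <database> -i <query file> -o <output file>
        ProcessBuilder pb = new ProcessBuilder(
                "blastall", "-p", "blastp",
                "-d", "/data/blastdb/swissprot",    // target database held on the execute node
                "-i", input.getPath(),
                "-o", output.getPath());
        pb.redirectErrorStream(true);

        int exitCode = pb.start().waitFor();
        System.out.println("blastall finished with exit code " + exitCode);
    }
}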
  • System Architecture
  • the grid service uses the GT3.0.2 core only
  • we have provided our own wrappers for the OpenPBS client side and the Condor submission components
  • a scheduler component examines the input, polls the resources for available processors and farms out subtasks to the resources
  • details of the resources (i.e. clusters) are held in a single XML config file, so adding new resources is easy (a possible layout is sketched after this list)
  • target databases are located on the execute nodes or on the cluster master node to minimise stage-in time; these need updating regularly
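The poster does not give the schema of the resource config file, so the sketch below assumes a simple layout, with one <resource> element per cluster, purely for illustration, and reads it with the standard JAXP DOM API. The element and attribute names are assumptions, not the actual BRIDGES format.

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/**
 * Reads a hypothetical resources.xml of the form:
 *
 *   <resources>
 *     <resource name="ScotGRID" type="openpbs" host="..." maxNodes="250"/>
 *     <resource name="NeSC Condor pool" type="condor" host="..." maxNodes="25"/>
 *   </resources>
 *
 * The element and attribute names are assumptions, not the BRIDGES schema.
 */
public class ResourceConfigReader {

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File("resources.xml"));

        NodeList resources = doc.getElementsByTagName("resource");
        for (int i = 0; i < resources.getLength(); i++) {
            Element r = (Element) resources.item(i);
            System.out.printf("%s (%s) on %s, up to %s nodes%n",
                    r.getAttribute("name"), r.getAttribute("type"),
                    r.getAttribute("host"), r.getAttribute("maxNodes"));
        }
    }
}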
  • Client Side
  • users of the service range from expert to low computer literacy
  • the delivery mechanism chosen was therefore the BRIDGES web portal (see below)
  • a Java-based graphical client to the service is downloaded via Java Web Start (a minimal launch descriptor is sketched after this list)
  • this allows for easy, centralised updates
  • it also provides a good opportunity to explore client-side Globus
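For context on the Web Start route, a minimal JNLP launch descriptor might look like the sketch below; the codebase URL, jar name and main class are placeholders rather than the actual BRIDGES deployment.

<?xml version="1.0" encoding="UTF-8"?>
<!-- hypothetical launch descriptor; codebase, jar and main class are placeholders -->
<jnlp spec="1.0+" codebase="http://example.org/bridges/webstart" href="blastclient.jnlp">
  <information>
    <title>BRIDGES BLAST client</title>
    <vendor>BRIDGES project</vendor>
  </information>
  <security>
    <all-permissions/>
  </security>
  <resources>
    <j2se version="1.4+"/>
    <jar href="blastclient.jar"/>
  </resources>
  <application-desc main-class="BlastClient"/>
</jnlp>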
  • Design Issues
  • no suitable metaschedulers were available at the time the system was designed, so we had to write our own
  • the system only uses the GT3 core as a thin layer between the client side and the scheduler, since full GT3 was due to be replaced by WSRF; this minimises the future porting effort
  • Summary
  • We have constructed a parallelised BLAST service that farms out multiple query sequences as sub-jobs to a pool of resources.
  • Our scheduler runs over OpenPBS and Condor resources via our own Java wrappers.
  • Client-side delivery is through a Java GUI delivered via a web portal and Java Web Start.
  • Compute Resources Used
  • ScotGRID compute cluster at Glasgow Univ.: a 250-processor Linux cluster
  • Condor pool at the National e-Science Centre, Glasgow Univ.: 25 desktop machines, single processors
  • Contact / Further Information
  • BRIDGES website and portal at http://www.brc.dcs.gla.ac.uk/projects/bridges/
  • email contact: michab@dcs.gla.ac.uk