The Center for Computational Genomics and Bioinformatics

Slides: 14
Provided by: Comp752

1
The Center for Computational Genomics and
Bioinformatics
  • Christopher Dwan
  • Mike Karo
  • Tim Kunau

2
Outline
  • Perspective
  • Processing tasks and requirements
  • Computational solutions
  • Interesting issues

3
Funding chart

4
(No Transcript)

5
The Bioinformatics component
  • Pipeline data processing and storage
  • 100Kb data
  • <5 sec processing time
  • 10,000 / month
  • The problem: interface (batch dependency
    management)
  • Similarity search
  • Search against one or more 10 GB databases
  • The problem: data movement and memory
  • (much easier on dedicated resources)
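
The batch dependency problem named on this slide can be illustrated with a minimal sketch. The stage names below are hypothetical and are not taken from the presentation; the point is only that each stage must wait for the stages it depends on, while independent stages can be submitted to the batch system together. The sketch uses the standard-library `graphlib` module (Python 3.9 or later) rather than any particular scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline stages (assumed for illustration only).
# Each key maps to the set of stages that must finish before it can start.
PIPELINE = {
    "base_call": set(),
    "quality_report": {"base_call"},
    "vector_trim": {"base_call"},
    "similarity_search": {"vector_trim"},
    "load_database": {"vector_trim", "similarity_search"},
}


def batches(dependencies):
    """Yield lists of stages whose prerequisites have all completed."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        yield ready
        # A real system would mark stages done as batch jobs finish;
        # here every submitted stage is assumed to succeed immediately.
        sorter.done(*ready)


plan = list(batches(PIPELINE))
for step, stages in enumerate(plan, start=1):
    print(step, stages)
```

Each printed step is a group of stages that could run in parallel. A production version would also have to handle failed jobs and resubmission, which this sketch leaves out.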

6
The bioinformatics component
  • Unigene assembly
  • Traditional long-running, big-memory compute problem
  • Comes at the end of the other two types
  • The problem: algorithms
  • Clustering / Pattern Discovery
  • Conference driven
  • Causes us to redo the other tasks

7
The bioinformatics component
  • Data warehouses
  • Mirroring and cross checking other public
    resources
  • Local Oracle implementation of public databases
    for local users (GenBank / Swiss-Prot / Medicago)
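
One way to picture the cross-checking of a local mirror against a public resource is a checksum comparison per record. This is a sketch under assumptions: the record identifiers and sequences are made up, both sides are represented as plain dictionaries rather than an Oracle table or a public download, and nothing here reflects how the Center actually performed its checks.

```python
import hashlib


def digest(sequence):
    """Checksum of a sequence, ignoring whitespace and letter case."""
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def cross_check(local, public):
    """Compare two {record_id: sequence} mappings and report differences."""
    local_ids, public_ids = set(local), set(public)
    return {
        "missing": sorted(public_ids - local_ids),  # not mirrored yet
        "extra": sorted(local_ids - public_ids),    # withdrawn upstream?
        "changed": sorted(
            rid for rid in local_ids & public_ids
            if digest(local[rid]) != digest(public[rid])
        ),
    }


# Toy data with hypothetical record identifiers.
local_mirror = {"A1": "acgt", "B2": "ACGTT"}
public_copy = {"A1": "ACGT\n", "B2": "ACGTA", "C3": "GG"}

report = cross_check(local_mirror, public_copy)
print(report)
```

Normalizing before hashing keeps formatting differences (line wrapping, case) from being reported as content changes.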

8
The bioinformatics component
  • Microarray data
  • Image data (1 MB per image) requires processing
    and storage
  • Unknown normalization, errors, etc. require that
    we simply keep all the raw data.
  • Web-based display of results
  • Visualization

9
Computational resources
  • 100-CPU opportunistic Condor flock
  • Not dedicated
  • Configuration can change without warning
  • No permanent local data storage
  • Machines sit on desks.
  • Flocking with Madison, CS dept., other labs
  • Reciprocity can hurt a LOT.
  • Server farms
  • Intel / Alpha
  • Hard to find money to buy dedicated machines,
    especially on single-organism projects.

10
Software and user issues
  • An intuitive interface to parallel and batch
    systems gives uninformed users a great deal of
    power.
  • Tools from outside: poor scalability
  • Tools from inside: poor portability

11
Heuristic algorithms
  • Many bioinformatics tools are heuristic rather
    than complete searches.
  • These searches can return different results on
    different machines (dynamic thresholds, 32- vs.
    64-bit math, ...)
  • How do we tell "different" from "erroneous"?
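
The 32- vs. 64-bit point can be made concrete with a small, hedged sketch. It accumulates the same ten partial scores once in 64-bit arithmetic and once with every intermediate result rounded to 32 bits, then applies a fixed cutoff. The scores, the cutoff, and the tolerance are invented for illustration; 32-bit rounding is emulated with the standard `struct` module, not measured on different hardware.

```python
import math
import struct


def to_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def total64(values):
    """Accumulate in ordinary 64-bit arithmetic."""
    total = 0.0
    for v in values:
        total += v
    return total


def total32(values):
    """Accumulate with every intermediate result rounded to 32 bits."""
    total = 0.0
    for v in values:
        total = to_float32(total + to_float32(v))
    return total


# Hypothetical partial scores and cutoff (assumed values).
partial_scores = [0.1] * 10
CUTOFF = 1.0

score64 = total64(partial_scores)
score32 = total32(partial_scores)
passes64 = score64 >= CUTOFF  # False: the 64-bit sum falls just below 1.0
passes32 = score32 >= CUTOFF  # True: the 32-bit sum lands just above 1.0

# One possible test for "different, not erroneous": the two scores agree
# to within a tolerance chosen for 32-bit rounding error.
explained_by_rounding = math.isclose(score64, score32, rel_tol=1e-6)
print(passes64, passes32, explained_by_rounding)
```

The same hit is accepted on one "machine" and rejected on the other, yet the underlying scores agree to about one part in ten million. A tolerance check like this can separate rounding noise from a genuinely wrong result, though choosing the tolerance is itself a judgment call.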

12
Thank you
  • The Condor team at Madison
  • Sanger Centre

13
Collaborations are the key
  • Christopher Dwan: cdwan@ahc.umn.edu
  • Mike Karo: mek@ahc.umn.edu
  • Tim Kunau: kunau@ahc.umn.edu