High performance computing in biology: challenges and perspectives

1
High performance computing in biology: challenges
and perspectives
PNNL-SA-61363
  • SciDAC 2008
  • Christopher Oehmen
  • Pacific Northwest National Laboratory

2
Computational biology: computing at many scales
  • Biological processes occur simultaneously at many
    temporal and spatial scales
  • Molecules form working units and control systems
  • Cells encapsulate information and function
  • Tissues and communities can work in concert: emerging
    behaviors linked to survival

3
Genes
  • Challenge is to characterize cells by their
    genetic signals
  • Genes turned on indicate processes that are
    activated
  • Dynamic, complex signals result from and
    contribute to many combined effects

4
Proteins and proteomics
  • Proteins: working molecules in cells
  • Understand what processes a cell is using by
    taking a snapshot of molecular activity
  • Mass spectrometry: a good way to characterize all
    proteins at once
  • The key is effectively identifying proteins or
    fragments that give rise to MS peaks.
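
The peak-identification step can be illustrated with a minimal sketch. It assumes the observed peaks are already converted to neutral monoisotopic masses and that a list of candidate peptides is available; the function names `peptide_mass` and `match_peaks` and the 10 ppm tolerance are illustrative choices, not part of the original presentation or of any PNNL tool.

```python
# Monoisotopic amino acid residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def match_peaks(peaks, candidates, tol_ppm=10.0):
    """Map each observed mass to the candidate peptides within tol_ppm."""
    matches = {}
    for peak in peaks:
        matches[peak] = [
            pep for pep in candidates
            if abs(peptide_mass(pep) - peak) / peak * 1e6 <= tol_ppm
        ]
    return matches


# Hypothetical observation and candidates, for illustration only.
hits = match_peaks([146.0691], ["GA", "AG", "GG"])
```

Here `hits` maps the single observed mass to both `"GA"` and `"AG"`, which have identical masses. Mass alone cannot separate such candidates, which is one reason effective identification is the hard part.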

5
Modeling and simulation
  • Using theory from chemistry and physics to
    predict behavior of biomolecular systems.
  • Insight into complex or unmeasurable behaviors
  • Parameters, boundary conditions, etc. for
    higher level simulations
  • Can help predict protein structure from sequence

6
Molecular interactions and relationships
  • Graphs used in many different ways in biology
  • Can we use graph theory to produce meaningful
    visual metaphors and analysis techniques?
  • Biological graphs can be very complex;
    simplifying approximations often obscure the
    underlying biology
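
As a small illustration of graph analysis on interaction data, the sketch below builds an undirected graph from interaction pairs and finds its connected components with a breadth-first search. The protein labels `P1` to `P5` and the edge list are made up for the example, and the helper names are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (a, b) interaction pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def connected_components(graph):
    """Return the connected components, each as a sorted list of nodes."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in sorted(graph[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


# Hypothetical interaction pairs, for illustration only.
edges = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
graph = build_graph(edges)
components = connected_components(graph)
degree = {node: len(neighbors) for node, neighbors in graph.items()}
```

Components and node degree are among the simplest summaries of such a graph. Real biological networks are far larger and denser, which is where the complexity noted above comes from.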

7
Multicell tissues and communities
  • Understanding emerging behavior requires
    integration across many levels
  • Control systems
  • Signals
  • Coordinated Responses
  • ...
  • Needs computing at terascale, petascale and
    beyond

8
HPC for genome analysis
  • Joint Genome Institute (JGI)
  • National Center for Biotechnology Information
    (NCBI)
  • The Institute For Genomic Research (TIGR-JCVI)
  • ...

Online tools
Data resources
9
Genome sequencing
[Figure: timeline labeled 1995, 2001, today, with the labels consortium, institute, single-investigator project. Columns: cost/genome, base pairs/day, genomes. Values as extracted: M, 100K, 1K-1M, 100M-1B, few, thousands.]
10
More genomes, more analysis!
11
But what I really want to do with my genomes is...
  • Human health
  • Drug design
  • Biomarkers
  • Energy
  • Engineered systems for renewable energy
  • Carbon management
  • Defense
  • Rapid identification
  • Forensic analysis

How can we enable users to develop and refine
hypotheses in real time?
12
Conventional interface to HPC
Quality of solution related to theory used,
reliability of input and solution domain,
accuracy and precision of mathematical method.
Calculation driven by theoretical formulation and
observations. Generally the hypothesis being
evaluated maps well to the output of the
calculation.
13
Biology needs a different interface to HPC
DATA MATH
Calculations are often driven by data and a mix of
statistical methods and equations from chemistry,
physics, and other fields. The driving question is
often a high-level hypothesis that is not easily
mapped to the output.
14
Spectrum of HPC coupling
  • Remote HPC services (low degree of coupling): website/web
    services interface to external, shared large-scale
    facilities; may require manual data entry,
    password/authentication, and limits on priority for
    large-scale tasks
  • Local HPC services: website/web services interface to
    internal, dedicated facilities; user has more control
    over resource allocation
  • Local HPC integrated workflow (high degree of coupling):
    application-level access to dedicated HPC; user doesn't
    even have to know HPC is being used

15
Taking advantage of local HPC resources
  • Web services, websites for launching parallel jobs

Local dedicated cluster
Users or applications can integrate output into
the workflow
Polygraph and ScalaBLAST are both implemented using
this model at PNNL
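
A minimal sketch of the idea that an application can hand work to parallel resources and fold the output back into a workflow. This is not how Polygraph or ScalaBLAST work: a local thread pool stands in for the dedicated cluster, and `gc_content` and `analyze` are hypothetical names with a toy analysis chosen only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(sequence):
    """Toy per-sequence analysis: fraction of G and C bases."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence.upper()) / len(sequence)


def analyze(sequences, workers=4):
    """Run the analysis on every sequence in parallel, preserving order.

    The caller sees an ordinary function call; the worker pool (a
    stand-in for a dedicated cluster) is hidden behind it.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gc_content, sequences))


# The workflow consumes the results like any other list.
results = analyze(["GGCC", "ATAT", "GATC"])
```

The point of the design is the interface: the calling code passes data in and gets results back without managing job submission itself.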
16
More tightly couple HPC resources to biological
workflows
  • Case study: multiple whole-genome analysis
    workflow

[Figure: workflow diagram with the labels Genome 1, Genome 2, Genome 3, ..., Genome n; High performance hardware, software; Post-processing; Visualization]
17
More tightly couple HPC resources to biological
workflows
  • Case study: proteomics workflow

[Figure: proteomics workflow diagram with the labels Instrument output; Bioinformatics Resource Manager (BRM); 1. HPC Polygraph; 2. PQuad; 3. Public tools]
18
Challenges for HPC users in biology
  • Biological computing is different from most HPC
  • Integer mathematics more important than floating
    point arithmetic
  • Memory or memory-latency bound
  • Data-driven, data-intensive
  • We need different kinds of hardware
  • Mathematical challenges
  • Biological data is growing exponentially
  • A 1% false positive rate is unacceptable when you
    have 1 billion items
  • Often want to understand space of good solutions
    instead of 1 optimal solution
  • Scalable applications must continually evolve
    with new mathematical theory
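
The false positive point is simple arithmetic, sketched below. It assumes the rate on the slide is 1% (the percent sign appears to have been lost in extraction); the budget of 100 tolerated false positives is an arbitrary figure chosen for illustration.

```python
def expected_false_positives(n_items, false_positive_rate):
    """Expected number of false positives when every item is tested."""
    return n_items * false_positive_rate


def max_rate_for_budget(n_items, allowed_false_positives):
    """Largest per-item false positive rate that stays within a budget."""
    return allowed_false_positives / n_items


n_items = 1_000_000_000

# At 1%, about ten million items are flagged incorrectly.
fp_at_one_percent = expected_false_positives(n_items, 0.01)

# To tolerate only 100 false positives (an illustrative budget),
# the per-item rate must fall to about one in ten million.
rate_for_100 = max_rate_for_budget(n_items, 100)
```

At this scale, a rate that sounds small still produces more false hits than anyone could review by hand, which is why the mathematics has to evolve along with the data volume.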

19
More challenges...
  • Policy/permissions
  • Most HPC systems operate using a multiuser, batch
    model
  • Tightly coupling HPC into biological workflows
    means applications will need more immediate
    access to compute cycles, NOT batch mode
    operation
  • Local HPC systems not normally available for
    anonymous users

20
Summary
  • The biological sciences are just scratching the surface
    of how high performance computing might be used.
  • HPC solutions in biology will need to accommodate
    exponentially growing datasets with evolving
    mathematics
  • Biology will likely continue to provide science
    drivers for novel architectures that prioritize
    integer mathematics and memory bandwidth/capacity
  • HPC can maximize its value to biology by more
    tightly coupling with analytical pipelines and
    workflows, but this will require different access
    and user models

21
Acknowledgments
  • Support provided by the Data Intensive Computing
    For Complex Biological Systems funded by the
    Office of Advanced Scientific Computing Research,
    and under the LDRD Program at Pacific Northwest
    National Laboratory.