High Performance Computing for Disease Surveillance

1
High Performance Computing for Disease
Surveillance
  • David Bauer
  • Brandon Higgs
  • Mojdeh Mohtashemi

The MITRE Corporation, 7525 Colshire Drive, McLean, VA 22102
2
Goal: Apply HPC to ScanStat
  • Improve the performance of the ScanStat algorithm by
    distributing it across multiple processors

The expected number of cases is defined for (s, t),
where s is a spatial cluster and t is the time
span;
then, for all cases of outbreak N
3
HPC: Amdahl's Law
  • Amdahl's law quantifies the improvement in
    runtime (SpeedUp) that can be achieved using
    multiple processors

where F is the percentage of an algorithm that is
sequential (i.e., cannot be parallelized)
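
The formula on this slide did not carry over into the transcript. In its standard form, with F written as a fraction between 0 and 1 and P denoting the number of processors (the symbol P is an assumption here, not taken from the slide), Amdahl's law reads:

```latex
\[
\text{SpeedUp}(P) = \frac{1}{F + \dfrac{1 - F}{P}},
\qquad
\lim_{P \to \infty} \text{SpeedUp}(P) = \frac{1}{F}
\]
```

For example, F = 0.1 caps the SpeedUp at 10 no matter how many processors are added.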
4
ScanStat Details
  • Data source: San Francisco Dept. of Public Health,
    Tuberculosis Program
  • Spatial blocks may be census tracts or individual
    addresses
  • 76 census tracts (CT) form 441 centroids (rectangular)
  • 392 individuals form 4,234 centroids
    (rectangular)
  • Time window: 4-72 weeks, spanning a 10-year period
  • Based on case counts

5
Parallel ScanStat: census tracts
  • Analysis of ScanStat workload suggests most
    centroids complete quickly
  • A few centroids require almost 2 hours to compute
  • 441 centroids
  • 10-year time span
  • Single CPU: 28 hrs

6
Parallel ScanStat: census tracts
  • SpeedUp from parallelization is limited by the
    longest-running centroid
  • 440 centroids computed in 0.5 hr
  • The longest centroid bounds the running time
    (7k seconds)
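
The effect described on this slide can be reproduced with a small scheduling simulation. The sketch below is not the authors' code: it assumes whole centroids are the unit of parallel work, hands each one to the next idle worker (longest first), and, because the slides give no per-centroid timings, spreads the remaining single-CPU time evenly over the 440 short centroids. Only the 28 hr total, the 7k-second longest centroid, and the centroid count come from the slides; the function and variable names are hypothetical.

```python
import heapq


def simulate_makespan(durations, workers):
    """Wall-clock time when each task is handed to the next idle worker.

    Tasks are dispatched longest-first (a common heuristic, assumed here).
    """
    idle_at = [0.0] * workers  # min-heap: time at which each worker becomes idle
    for d in sorted(durations, reverse=True):
        start = heapq.heappop(idle_at)      # earliest-idle worker takes the task
        heapq.heappush(idle_at, start + d)
    return max(idle_at)


# Figures from the slides: 441 centroids, 28 h on a single CPU,
# longest centroid roughly 7k seconds.
TOTAL_S = 28 * 3600.0
LONGEST_S = 7_000.0
N_SHORT = 440

# ASSUMPTION: the other 440 centroids get equal durations; the slides only
# say that most centroids "complete quickly".
short_s = (TOTAL_S - LONGEST_S) / N_SHORT
durations = [LONGEST_S] + [short_s] * N_SHORT

for workers in (1, 4, 16, 64, 256):
    makespan = simulate_makespan(durations, workers)
    print(f"{workers:4d} workers: {makespan:9.0f} s, "
          f"speedup {TOTAL_S / makespan:5.2f}")
```

Under these assumptions, 64 or more workers all finish in the 7,000 s taken by the longest centroid, so the speedup levels off at 100,800 / 7,000 = 14.4 regardless of how many processors are added.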

7
Parallel ScanStat: individual addresses
  • Analysis of ScanStat workload suggests most
    individual addresses also complete quickly
  • A few addresses require over 3 hours to compute
  • 4,234 centroids
  • 10-year time span
  • Single CPU: 74 hrs

8
Parallel ScanStat: individual addresses
  • SpeedUp from parallelization is limited by the
    longest-running centroid
  • 4,233 centroids completed in less than 0.5 hr
  • The longest centroid bounds the running time
    (11.8k seconds)
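
As a rough worked check on this bound, assume that centroids are the unit of parallel work, so no parallel run can finish before the longest centroid does. The best achievable SpeedUp is then the single-CPU time divided by the longest centroid's time. Using the slide figures (74 hrs on one CPU, about 11.8k seconds for the longest centroid, both rounded):

```latex
\[
\text{SpeedUp}_{\max}
\approx \frac{74 \times 3600\ \text{s}}{11{,}800\ \text{s}}
= \frac{266{,}400}{11{,}800}
\approx 22.6
\]
```

Since the inputs are rounded, the result is approximate; the point is that adding processors beyond this level cannot shorten the run unless the longest centroid itself is split up.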

9
Future Work
  • Parallelize the ScanStat algorithm in the time domain
  • Move ScanStat to a distributed computing
    environment to:
      • Reduce HPC environment costs
      • Potentially increase computing power
  • Apply to a decision support problem with
    real-time constraints
  • Interested in finding collaboration partners