Title: High Performance Computing for Disease Surveillance
- David Bauer
- Brandon Higgs
- Mojdeh Mohtashemi
The MITRE Corporation, 7525 Colshire Drive, McLean, VA 22102
Goal: Apply HPC to ScanStat
- Improve the performance of the ScanStat algorithm by distributing its computation across multiple processors
The expected number of cases is defined for each pair (s, t), where s is a spatial cluster and t is a time span, and is then evaluated over all N cases of the outbreak.
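The formula for the expected count did not survive extraction from the slide. As a hedged illustration only, the sketch below assumes ScanStat follows Kulldorff's space-time permutation model. Under that model the expected count for location s in period t is (all cases at s) × (all cases in t) / N, where N is the total number of cases. The expected count for a candidate cluster is the sum over its cells. The function names are hypothetical.

```python
def expected_counts(counts):
    """Expected cases per (location, period) cell, assuming the
    space-time permutation model:
        mu[s][t] = (total cases at s) * (total cases in t) / N
    counts[s][t] is the observed number of cases at location s in period t."""
    n_total = sum(sum(row) for row in counts)
    if n_total == 0:
        raise ValueError("no cases observed")
    loc_totals = [sum(row) for row in counts]           # cases per location
    time_totals = [sum(col) for col in zip(*counts)]    # cases per period
    return [[loc * t / n_total for t in time_totals] for loc in loc_totals]


def expected_in_cluster(mu, locations, periods):
    """Expected cases in a space-time cylinder: a set of locations
    observed over a set of time periods."""
    return sum(mu[s][t] for s in locations for t in periods)


counts = [[1, 0],
          [1, 2]]
mu = expected_counts(counts)
print(mu)                                                       # [[0.5, 0.5], [1.5, 1.5]]
print(expected_in_cluster(mu, locations=[1], periods=[0, 1]))   # 3.0
```

By construction the expected counts over all cells sum to N, so observed and expected totals agree for the whole study area.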
HPC: Amdahl's Law
- Amdahl's Law quantifies the improvement in runtime (SpeedUp) that can be achieved using multiple processors:

$$\text{SpeedUp} = \frac{1}{F + \frac{1 - F}{P}}$$

where F is the fraction of an algorithm that is sequential (i.e., cannot be parallelized) and P is the number of processors
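A short sketch of Amdahl's Law, showing how the sequential fraction F caps the achievable SpeedUp no matter how many processors P are added. The function name is illustrative.

```python
def amdahl_speedup(serial_fraction, processors):
    """Amdahl's Law: SpeedUp = 1 / (F + (1 - F) / P).
    serial_fraction (F) is the share of runtime, between 0 and 1, that
    cannot be parallelized; processors (P) is the number of processors."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


# With 5% sequential work, SpeedUp approaches 1 / 0.05 = 20 as P grows.
for p in (1, 8, 64, 1024):
    print(p, round(amdahl_speedup(0.05, p), 2))
```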
ScanStat Details
- Data source: San Francisco Department of Public Health, Tuberculosis Program
- Spatial blocks may be census tracts or individual addresses
  - 76 census tracts (CTs) form 441 centroids (rectangular)
  - 392 individuals form 4,234 centroids (rectangular)
- Time windows of 4 to 72 weeks, spanning a 10-year period
- Based on case counts
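The sketch below gives a sense of the temporal search space. It counts the candidate time windows of 4 to 72 weeks that must be scanned for one spatial cluster. It assumes windows advance in one-week steps and that the 10-year period is about 520 weeks. Neither detail comes from the slides.

```python
def count_time_windows(total_weeks, min_len, max_len):
    """Number of contiguous windows whose length (in weeks) lies in
    [min_len, max_len] within a study period of total_weeks weeks,
    assuming a window may start at any whole week."""
    if min_len < 1 or max_len < min_len:
        raise ValueError("need 1 <= min_len <= max_len")
    return sum(total_weeks - length + 1
               for length in range(min_len, min(max_len, total_weeks) + 1))


# Hypothetical study period: 10 years taken as 520 weeks.
per_cluster = count_time_windows(520, 4, 72)
print(per_cluster)  # 33327
```

Every candidate spatial cluster needs its own scan over these windows. The total work therefore grows with both the window range and the number of centroids.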
Parallel ScanStat: Census Tracts
- Analysis of the ScanStat workload suggests that most centroids complete quickly
- A few centroids require almost 2 hours to compute
- 441 centroids
- 10-year time span
- Single CPU: 28 hours
Parallel ScanStat: Census Tracts
- SpeedUp from parallelization is limited by the longest-running centroid
- 440 centroids are computed in 0.5 hours
- The longest-running centroid bounds the total running time (about 7,000 seconds)
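The sketch below shows why one slow centroid bounds the whole run. It simulates greedy dynamic scheduling with the longest tasks handed out first, each to whichever worker frees up earliest. The task durations are hypothetical: one 7,000-second centroid plus 440 short ones. This is an illustrative scheduling model, not the scheduler used in the study.

```python
import heapq


def makespan(durations, workers):
    """Simulate greedy longest-first scheduling: each task goes to the
    worker that becomes free earliest. Returns when the last worker finishes."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    finish_times = [0.0] * workers          # min-heap of worker free times
    for d in sorted(durations, reverse=True):
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + d)
    return max(finish_times)


# Hypothetical workload: one long centroid plus 440 short ones (seconds).
tasks = [7000] + [200] * 440
for w in (1, 16, 64, 256):
    print(w, makespan(tasks, w))
```

Once there are enough workers to absorb the short centroids, the makespan stays at 7,000 seconds. No schedule can finish before its longest single task.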
Parallel ScanStat: Individual Addresses
- Analysis of the ScanStat workload suggests that most individual addresses also complete quickly
- A few addresses require over 3 hours to compute
- 4,234 centroids
- 10-year time span
- Single CPU: 74 hours
Parallel ScanStat: Individual Addresses
- SpeedUp from parallelization is limited by the longest-running centroid
- 4,233 centroids are completed in less than 0.5 hours
- The longest-running centroid bounds the total running time (about 11,800 seconds)
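Using the single-CPU runtimes and longest-centroid times reported on these slides, the best possible SpeedUp from per-centroid parallelism is the serial runtime divided by the longest centroid's runtime. This sketch assumes each centroid takes the same time in parallel as it does serially.

```python
def max_speedup(serial_hours, longest_task_seconds):
    """Upper bound on SpeedUp when work is split only by centroid:
    the parallel run cannot finish before its longest centroid."""
    return serial_hours * 3600 / longest_task_seconds


print(round(max_speedup(28, 7_000), 1))    # census tracts: 14.4
print(round(max_speedup(74, 11_800), 1))   # individual addresses: 22.6
```

Under this assumption, no number of processors can push SpeedUp past about 14.4 for census tracts or 22.6 for individual addresses unless the longest centroids themselves are split.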
Future Work
- Parallelize the ScanStat algorithm in the time domain
- Move ScanStat to a distributed computing environment to:
  - Reduce HPC environment costs
  - Potentially increase computing power
- Apply ScanStat to a decision-support problem with real-time constraints
- We are interested in finding collaboration partners
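Parallelizing in the time domain would remove the bound set by the longest centroid. A minimal sketch works under one assumption: each candidate time window of a centroid can be scored independently, and the cluster of interest is the window with the highest score. The sketch splits one centroid's windows into chunks, finds the best window in each chunk as a separate task, and then takes the maximum of the per-chunk winners. `toy_score` is a hypothetical stand-in for ScanStat's cluster score, and one-week window steps are assumed. Threads are used only to keep the example self-contained. CPU-bound Python code would need a process pool or MPI for real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def time_windows(total_weeks, min_len, max_len):
    """All (start_week, length) windows, assuming one-week steps."""
    return [(start, length)
            for length in range(min_len, max_len + 1)
            for start in range(total_weeks - length + 1)]


def chunk(items, n_chunks):
    """Split items into n_chunks nearly equal contiguous slices."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def best_window(score, windows):
    """Highest-scoring window in one chunk (the serial work of one task)."""
    return max(windows, key=score)


def parallel_best_window(score, windows, n_tasks):
    """Scan one centroid's windows as independent tasks, then reduce
    the per-task winners with a final max."""
    if not windows:
        raise ValueError("no windows to scan")
    pieces = [c for c in chunk(windows, n_tasks) if c]
    with ThreadPoolExecutor(max_workers=len(pieces)) as pool:
        winners = list(pool.map(lambda c: best_window(score, c), pieces))
    return max(winners, key=score)


def toy_score(window):
    """Hypothetical score peaking at a 30-week window starting in week 200."""
    start, length = window
    return -abs(start - 200) - abs(length - 30)


windows = time_windows(520, 4, 72)
print(parallel_best_window(toy_score, windows, n_tasks=8))  # (200, 30)
```

Because the chunks preserve window order and the final step is a plain `max`, the parallel result matches a serial scan of the same windows.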