Spatiotemporal Cluster Detection in ESSENCE Biosurveillance Systems

1
Spatiotemporal Cluster Detection in ESSENCE
Biosurveillance Systems
  • Panelist: Howard Burkom
  • National Security Technology Department,
    Johns Hopkins University Applied Physics
    Laboratory
  • DIMACS Working Group
  • Workshop on Analytical Methods for
    Surveillance of Multidimensional Data
    Streams
  • Rutgers University, Piscataway, NJ
  • February 19, 2004

2
Problem/Data Context of ESSENCE Surveillance
Systems
Absentee Rates
Sales of OTC Remedies
Hospital ER Admissions
Physician Office Visits
Normalization, Analysis, Fusion
Counts/Clusters of Statistical Significance
Who? What? Where? When?
Epidemiological Significance
3
Applying Statistical Process Control to Multiple
Data Streams
  • Multiplicity from intertwined effects: multiple
    data sources, regions, strata (syndrome groups,
    product groups)
  • Multiple univariate methods
  • Critical issue: use individual detector outputs
    without getting overwhelmed by multiple testing
  • Low power for anomalies spread over inputs
  • Multivariate methods
  • Critical issue: need modifications to reduce
    alerts due to irrelevant changes in data
    relationships
  • Need to retain power in individual source data

4
Significance Assessment: Multiple Univariate
Alerting Algorithms
  • Bonferroni bound: replace $\alpha$ by $\alpha/N$
  • Alert based on individual outputs (conservative)
  • Edgington's consensus method (1972)
  • Combined probability from algebraic combination
    of N individual p-values
  • Z-score approximation:
  • $(\mathrm{mean}(p\text{-values}) - 0.5) / (0.2887 / \sqrt{N})$
  • Bayes Belief Net
  • Originated effort to add sensor data,
    intelligence info,
  • Recently applied to separate algorithm outputs
  • Can weight each type of information based on
    training data and/or intuition
  • Configurable to soften thresholds for evidence
    accrual
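
The two univariate combination rules on this slide can be sketched in a few lines. This is an illustrative sketch only, not the ESSENCE implementation: the function names and the example p-values are hypothetical, and the normal approximation is rough for small N. The constant 0.2887 is the standard deviation of a Uniform(0, 1) variable, $\sqrt{1/12}$.

```python
import math
from statistics import NormalDist


def bonferroni_alerts(p_values, alpha=0.05):
    """Flag each detector whose p-value falls below alpha / N."""
    n = len(p_values)
    return [p < alpha / n for p in p_values]


def edgington_z(p_values):
    """Z-score approximation from the slide.

    Under the null hypothesis each p-value is Uniform(0, 1), with mean 0.5
    and standard deviation sqrt(1/12) ~= 0.2887, so the mean of N
    independent p-values has standard deviation 0.2887 / sqrt(N).
    """
    n = len(p_values)
    mean_p = sum(p_values) / n
    return (mean_p - 0.5) / (math.sqrt(1.0 / 12.0) / math.sqrt(n))


def edgington_combined_p(p_values):
    """One-sided combined p-value: small when the inputs are jointly small."""
    return NormalDist().cdf(edgington_z(p_values))


# Hypothetical detector outputs: each mildly unusual, none extreme.
p_values = [0.04, 0.06, 0.03, 0.08]
print(bonferroni_alerts(p_values))      # per-detector alerts at alpha / N
print(edgington_combined_p(p_values))   # consensus p-value
```

With these made-up inputs no single detector passes the Bonferroni bound, while the consensus p-value is small, which is the "low power for anomalies spread over inputs" issue the previous slide raises.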

5
Multivariate Alerting Strategies
  • Variants of Hotelling's $T^2$
  • $\mu$: vector mean estimated from current baseline
  • $S$: estimate of covariance matrix calculated
    from baseline
  • $X$: multivariate (filtered?) data from test
    interval
  • $T^2$ statistic: $T^2 = (X - \mu)^\top S^{-1} (X - \mu)$
    (Ye et al., 2002)
  • Neighbor-regression preconditioning strategy of
    Hawkins: removal of covariance effects
  • MEWMA (Lowry), MCUSUM (Crosier,
    Pignatiello/Runger)
  • Numerous strategies, adaptations to Poisson data
  • But which is appropriate for multivariate
    syndromic data streams?
  • Can EWMA/Shewhart (or CUSUM/Shewhart) encompass
    both point-source bioweapon epicurve and
    seasonal endemic -> epidemic outbreak?
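
As a concrete reading of the $T^2$ definition above, the following minimal sketch estimates $\mu$ and $S$ from a baseline window and scores one test-day vector. It uses only the standard library; the helper names, the baseline counts, and the test vector are all hypothetical, and no filtering or alert threshold is applied.

```python
def mean_vector(rows):
    """Column-wise mean of the baseline rows (the estimate of mu)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows, mu):
    """Sample covariance matrix S of the baseline (divisor n - 1)."""
    n, d = len(rows), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        dev = [row[i] - mu[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += dev[i] * dev[j]
    return [[c / (n - 1) for c in line] for line in cov]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def hotelling_t2(baseline, x):
    """T^2 = (x - mu)^T S^{-1} (x - mu), with mu and S from the baseline."""
    mu = mean_vector(baseline)
    s = covariance_matrix(baseline, mu)
    dev = [xi - mi for xi, mi in zip(x, mu)]
    return sum(d * y for d, y in zip(dev, solve(s, dev)))


# Hypothetical daily counts for three data streams over an 8-day baseline.
baseline = [
    [20, 15, 30], [22, 14, 33], [19, 16, 29], [21, 15, 31],
    [23, 17, 34], [20, 13, 30], [18, 15, 28], [22, 16, 32],
]
print(hotelling_t2(baseline, [24, 17, 36]))
```

Solving $S y = (X - \mu)$ rather than inverting $S$ explicitly is a design choice made here for brevity and numerical stability; it does not change the statistic.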

6
Detection Challenge: faint rise in all 3 data
sets
[Figure: Respiratory Syndrome Data Counts; series: Military Dx, Military Rx, Civilian Dx]
7
Detection Challenge: faint rise in all 3 data
sets
[Figure: Respiratory Syndrome Data Counts]
Lowry's MEWMA: Day 4 alert at each false-alarm (FA) rate
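
For readers unfamiliar with MEWMA, the recursion behind a chart like this one can be sketched as follows. It follows the standard form $Z_t = \lambda (X_t - \mu) + (1 - \lambda) Z_{t-1}$ with the time-dependent covariance factor $\frac{\lambda}{2 - \lambda}\,[1 - (1 - \lambda)^{2t}]$. To stay short, the sketch assumes uncorrelated streams (a diagonal covariance matrix), which is a simplification and not what the slide's analysis necessarily used; the function name, the data, and the smoothing weight are hypothetical, and the alert threshold (normally chosen to match a target false-alarm rate) is left out.

```python
def mewma_statistics(observations, mu, variances, lam=0.2):
    """Return the MEWMA statistic for each time step.

    Assumes the streams are uncorrelated, so the covariance of Z_t is
    diagonal: scale_t * variances[i], where
    scale_t = lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)).
    """
    z = [0.0] * len(mu)
    stats = []
    for t, x in enumerate(observations, start=1):
        # Exponentially weighted running deviation from the baseline mean.
        z = [lam * (xi - mi) + (1 - lam) * zi for xi, mi, zi in zip(x, mu, z)]
        scale = lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        stats.append(sum(zi * zi / (scale * v) for zi, v in zip(z, variances)))
    return stats


# Hypothetical standardized counts: a faint, simultaneous rise in 3 streams.
observations = [
    [0.1, -0.2, 0.0],
    [0.8, 0.9, 0.7],
    [1.0, 1.1, 0.9],
    [1.2, 1.0, 1.3],
]
for day, stat in enumerate(mewma_statistics(observations, [0.0] * 3, [1.0] * 3), 1):
    print(day, round(stat, 2))
```

Because the statistic accumulates small deviations across both time and streams, a rise too faint to trip any single-stream Shewhart rule can still push it upward over a few days.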
8
Scan Statistics for Biosurveillance
Scarlet Fever Outbreak Study
Analysis of Claims Data in National Capital Area
ICD-9 codes for scarlet fever: 034, 034.1
10 cases, 5 days: p = 0.013
15 cases, 12 days: p = 0.002
11 cases, 7 days: p < 0.001
9
Surveillance combining outpatient visits, OTC
anti-flu sales, school absenteeism
10
Practical Issues in Spatiotemporal Monitoring and
Evaluation
  • Control needed for mismatched scales, variances
    among data sources
  • To retain power in individual sources, gain
    combined sensitivity
  • Difficult to assess delays, relative scales of
    effects among separate sources, in both
    background and signal
  • Simulation much harder to validate
  • If distance matrix is used, it should reflect
    proximity according to the epidemiological case
    definition
  • Modifications to reflect plausible demographic
    behaviors
  • The importance of significance testing grows
    with the number of sources, especially for
    subregions where expected counts are low
  • More sources => more small spurious clusters

11
Finding Clusters with Multiple Data Sources
  • For candidate cluster $J_1$, the Kulldorff
    likelihood ratio is
  • $LR(J_1) = (O_1/E_1)^{O_1} \, ((N - O_1)/(N - E_1))^{N - O_1}$
  • where $O_1$ = observed number of cases inside $J_1$,
  • $E_1$ = expected number of cases inside $J_1$,
  • $N$ = total case count
  • Extension by treating multiple sources as
    covariates
  • $O_1 = \sum_k O_{1k}$, $E_1 = \sum_k E_{1k}$, $N = \sum_k N_k$,
    for sources $k = 1, \ldots, K$
  • "adjusted" method: problem of adding sources
    with mismatched scales, variances
  • Alternate multisource approach: stratified
    scan statistic
  • $\sum_k \log LR(J_{1k})$, $k = 1, \ldots, K$
  • reduces chances for a noisy source to overwhelm
    others
  • can cost power to detect faint signal spread
    over sources
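
The difference between the adjusted and stratified statistics can be made concrete with a small sketch. It evaluates one candidate cluster only (no search over candidate clusters and no Monte Carlo significance testing), and it adopts the common convention of scoring a cluster as zero unless observed cases exceed expected cases, which the slide does not state. The function names and the per-source counts are hypothetical.

```python
import math


def kulldorff_log_lr(observed, expected, total):
    """log of (O/E)^O * ((N-O)/(N-E))^(N-O) for one candidate cluster.

    Returns 0.0 unless observed > expected (assumed convention: only
    excesses of cases count as clusters).
    """
    if observed <= expected:
        return 0.0
    inside = observed * math.log(observed / expected)
    rest = total - observed
    outside = rest * math.log(rest / (total - expected)) if rest > 0 else 0.0
    return inside + outside


def adjusted_log_lr(sources):
    """Pool the counts over sources first, then compute a single log LR."""
    observed = sum(o for o, _, _ in sources)
    expected = sum(e for _, e, _ in sources)
    total = sum(n for _, _, n in sources)
    return kulldorff_log_lr(observed, expected, total)


def stratified_log_lr(sources):
    """Compute a log LR per source, then sum over sources."""
    return sum(kulldorff_log_lr(o, e, n) for o, e, n in sources)


# Hypothetical (observed, expected, total) counts for one candidate cluster
# in three sources; the third source has a much larger scale than the others.
sources = [(30, 20.0, 1000), (12, 8.0, 300), (260, 250.0, 10000)]
print(adjusted_log_lr(sources))
print(stratified_log_lr(sources))
```

In the pooled version the large-scale source dominates the sums, whereas the stratified version lets each source contribute on its own scale, at the cost of discarding sources whose individual excess is zero or negative.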

12
FROC Performance Assessment: Adjusted vs.
Stratified Multisource Scan Statistics
[Figure axes: Prob. Signal-Based Significant Cluster; Prob. Random Background Significant Cluster]