Title: A Wavelet-based Anomaly Detector for Disease Outbreaks
1A Wavelet-based Anomaly Detector for Disease Outbreaks
- Thomas Lotze
- Galit Shmueli
- University of Maryland College Park
- Sean Murphy
- Howard Burkom
- Johns Hopkins University Applied Physics Lab
2Outline
- Motivation
- Wavelet method
- Difficulties
- Preconditioning
- Results
3Related Work
- Bakshi
- Wavelets in Chemical SPC
- Zhang
- Baseline wavelets
- Normalize syndromic baseline
- Goldenberg et al.
- Wavelets in syndromic surveillance
4Motivation
- Detecting disease outbreaks
- Bioterrorist attacks
- Virulent diseases
- Early detection saves lives!
- Syndromic Data will show outbreaks
- Anomaly detection to find outbreaks faster
5Wavelets
- Models a series as a sum of wavelets
- Wavelets are at different scales
- Wavelets are local (change over time)
7Difficulties
- Holidays
- Non-stationary
- Day of week
- Seasonal
- Noisy
- Outbreaks are not labeled
- Outbreak pattern not known in advance
8Preconditioning
- Differs from Goldenberg et al.
- Replace holidays
- One week previous
- Day-of-week
- Ratio to moving average
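The holiday step above can be sketched in a few lines. This is only an illustration: the function name `replace_holidays`, the `holiday_idx` argument, and the toy counts are assumptions, not taken from the slides. Each holiday count is overwritten with the value from the same weekday one week earlier.

```python
def replace_holidays(counts, holiday_idx, lag=7):
    """Return a copy of `counts` with each holiday replaced by the value `lag` days earlier.

    `holiday_idx` is a set of positions flagged as holidays (hypothetical input).
    Holidays in the first `lag` days are left unchanged because no earlier value exists.
    """
    out = list(counts)
    for i in sorted(holiday_idx):
        if i - lag >= 0:
            # Reads from `out`, so a holiday exactly one week after another
            # holiday picks up the already-replaced value.
            out[i] = out[i - lag]
    return out


# Toy daily counts; index 8 is treated as a holiday with an unusually low count.
counts = [100, 110, 105, 98, 102, 60, 55, 101, 20, 104, 99, 100, 61, 54]
cleaned = replace_holidays(counts, holiday_idx={8})
```

The day-of-week step would then be applied to `cleaned` rather than to the raw series.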
9Evaluation: Simulated Outbreaks
- Real data from 5 cities, respiratory (Resp) and gastrointestinal (GI) syndromes
- Simulated outbreak patterns inserted
- Specific pattern of additional syndromes over several days
- Size is normalized by standard deviation of recent days
- Inserted at different starting points within the sample data
- Average detection rates vs. false alarm rates can be determined to create ROC curves
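A minimal sketch of this evaluation loop, under stated assumptions: the 7-day window for "recent days", the `magnitude` multiplier, the example shape, and all function names are hypothetical. The first function scales an outbreak shape by the standard deviation of the days just before the insertion point. The second turns detector scores into (false alarm rate, detection rate) pairs, one per threshold, which is what an ROC curve plots.

```python
import statistics


def inject_outbreak(counts, start, shape, magnitude=1.0, window=7):
    """Add `shape` to `counts` beginning at `start`.

    The shape is scaled by `magnitude` times the sample standard deviation
    of the `window` days before `start` (the window length is an assumption).
    """
    if start < window:
        raise ValueError("need at least `window` days before the outbreak start")
    sd = statistics.stdev(counts[start - window:start])
    out = [float(c) for c in counts]
    for k, s in enumerate(shape):
        if start + k < len(out):
            out[start + k] += magnitude * sd * s
    return out


def roc_points(outbreak_scores, clean_scores, thresholds):
    """Return one (false_alarm_rate, detection_rate) pair per threshold."""
    points = []
    for t in thresholds:
        detect = sum(s > t for s in outbreak_scores) / len(outbreak_scores)
        false_alarm = sum(s > t for s in clean_scores) / len(clean_scores)
        points.append((false_alarm, detect))
    return points


counts = [10, 12, 14, 12, 10, 12, 14, 12, 12, 12, 12]
injected = inject_outbreak(counts, start=7, shape=[1, 2, 1])
points = roc_points([3, 4], [1, 2, 5, 0], thresholds=[2.5])
```

Repeating the injection over many start positions and averaging the detection rate at each false alarm rate gives the averaged curves described above.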
10Results
- Comparable to Holt-Winters
- Not amazing
11Results
- Preconditioning is important
- Detection is much better when preconditioned
12Results
- Easier to detect on some days than others
- Days with low counts
- Daily preconditioning not sufficient
13Summary
- Wavelets are a fairly good detection method
- Preconditioning is very important
- Day-of-week not fully accounted for
14Questions?
- More details on wavelets method?
- Difficulties?
- Other outbreak signals?
- Future work?
- Will Microsoft survive Bill Gates' stepping down?
15Bonus: More on Wavelets
- Level 1
- Run the data through a low-pass filter. This gives the approximation coefficients
- Run the data through a high-pass filter. This gives the detail coefficients
- Down-sample
- Reconstruct approximation and detail by up-sampling and running reconstruction filters.
- Level 2 and on
- Repeat the steps by applying them to the previous level approximation coefficients.
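The level-1 steps above can be sketched with the Haar filters. The function names and the sign convention for the detail coefficients are assumptions made for illustration. Filtering and down-sampling are folded into one pass over non-overlapping pairs, and the inverse step does the up-sampling and reconstruction filtering.

```python
import math


def haar_step(x):
    """One Haar analysis level: low-pass and high-pass filter, then down-sample by 2."""
    if len(x) % 2:
        raise ValueError("even length expected")
    r = 1 / math.sqrt(2)
    approx = [r * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]  # low-pass
    detail = [r * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]  # high-pass
    return approx, detail


def haar_inverse_step(approx, detail):
    """Up-sample and apply the Haar reconstruction filters."""
    r = 1 / math.sqrt(2)
    x = []
    for a, d in zip(approx, detail):
        x.extend([r * (a + d), r * (a - d)])
    return x


x = [4.0, 6.0, 10.0, 12.0]
approx, detail = haar_step(x)
zeros = [0.0] * len(approx)
A1 = haar_inverse_step(approx, zeros)   # reconstructed approximation
D1 = haar_inverse_step(zeros, detail)   # reconstructed detail
rebuilt = [a + d for a, d in zip(A1, D1)]
```

For level 2 and beyond, `haar_step` would be called again on `approx` from the previous level.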
16Bonus: Wavelets on Cough Medication Sales
- Haar wavelet
- Decomposition filters: h = [1/sqrt(2), 1/sqrt(2)], g = [1/sqrt(2), -1/sqrt(2)]
- Downsample, upsample
- Reconstruction filters: h = [1/sqrt(2), 1/sqrt(2)], g = [-1/sqrt(2), 1/sqrt(2)]
- In general: s = a5 + d1 + d2 + ... + d5
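Reading the last line of this slide as s = a5 + d1 + d2 + ... + d5 (an interpretation of the garbled transcript), the additive structure can be checked numerically. The sketch below relies on a property of the Haar wavelet: the reconstructed level-j approximation is the mean of each block of 2^j samples, and each detail is the difference between consecutive approximations. The function names and the toy series are assumptions.

```python
def block_means(x, size):
    """Replace every non-overlapping block of `size` samples by its mean."""
    out = []
    for start in range(0, len(x), size):
        block = x[start:start + size]
        out.extend([sum(block) / len(block)] * len(block))
    return out


def haar_components(s, levels):
    """Return (a_J, [d_1, ..., d_J]) as full-length reconstructed components."""
    if len(s) % (2 ** levels):
        raise ValueError("length must be a multiple of 2**levels")
    prev = [float(v) for v in s]
    details = []
    for j in range(1, levels + 1):
        approx = block_means(s, 2 ** j)
        details.append([p - a for p, a in zip(prev, approx)])
        prev = approx
    return prev, details


# Toy series of 32 days with a repeating 7-day pattern (illustrative only).
s = [100 + (i % 7) * 3 for i in range(32)]
a5, details = haar_components(s, levels=5)
rebuilt = [a5[i] + sum(d[i] for d in details) for i in range(len(s))]
```

With 32 samples and 5 levels, `a5` is constant at the overall mean, and the five detail series carry everything else.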
17Bonus: Wavelet Prediction
- Additional details
- 5 level decomposition
- Can be performed with more or fewer
- SWT (stationary wavelet transform): fill in holes
- Perform a decomposition for every possible position
- Series are no longer independent
- Edge issue
- Prediction is not possible at all time steps
- Solution: construct wavelets backwards from most recent observations
18Bonus: Ratio-to-Moving-Average
- Way of normalizing day-of-week effects
- 1. Determine moving averages
- a(i) = (x(i-3) + x(i-2) + ... + x(i+3)) / 7
- 2. Determine ratio (raw seasonal) for each day
- r(i) = x(i) / a(i)
- 3. Determine avg. ratio for each day of the week
- r(Mon) = sum(r(i) : i is Mon) / count(i is Mon)
- 4. Normalize ratios to sum to 1
- r'(Mon) = r(Mon) / (r(Mon) + ... + r(Sun))
- 5. Divide each day by its ratio
- x'(i) = x(i) / r(Mon), for each i that is a Monday (likewise for the other days)
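The five steps can be sketched as follows. The function name, the `first_weekday` argument, and the toy series are assumptions. One further assumption is labeled in the code: ratios normalized to sum to 1 average about 1/7, so the last step divides by 7 times the normalized ratio to keep the adjusted series on the original scale. Dividing by the normalized ratio alone would inflate the series roughly sevenfold.

```python
def ratio_to_moving_average(x, first_weekday=0):
    """Return (adjusted series, normalized weekday ratios).

    `first_weekday` is the weekday index (0 = Mon) of x[0]; hypothetical argument.
    Weekdays whose counts are all zero are not handled in this sketch.
    """
    n = len(x)
    sums = [0.0] * 7
    counts = [0] * 7
    for i in range(3, n - 3):
        a = sum(x[i - 3:i + 4]) / 7          # step 1: centered 7-day moving average
        if a > 0:
            day = (first_weekday + i) % 7
            sums[day] += x[i] / a            # step 2: raw ratio r(i)
            counts[day] += 1
    if 0 in counts:
        raise ValueError("need at least one usable ratio for every weekday")
    r = [s / c for s, c in zip(sums, counts)]   # step 3: average ratio per weekday
    total = sum(r)
    r_norm = [v / total for v in r]             # step 4: ratios sum to 1
    # Step 5. ASSUMPTION: the factor 7 is added here so the mean level is preserved.
    adjusted = [x[i] / (7 * r_norm[(first_weekday + i) % 7]) for i in range(n)]
    return adjusted, r_norm


# Six weeks of a pure weekly pattern around a level of 100 (illustrative only).
pattern = [1.2, 1.1, 1.0, 1.0, 0.9, 0.9, 0.9]
x = [100 * pattern[i % 7] for i in range(42)]
adjusted, r_norm = ratio_to_moving_average(x)
```

On this toy series the adjusted values come out flat at about 100, since the weekly pattern is the only structure present.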
19Bonus: Possible Extensions
- Multivariate wavelets
- Each day-of-week as a separate series
- Different wavelet shapes
- Different wavelet scale basis
- Different preconditioning
- Different sizes, lengths of outbreaks
- Don't normalize outbreak by standard deviation of recent days
- Show when outbreaks are harder to detect
- Estimate confidence based on experience
- Boosting
20Bonus: Wavelet Prediction
- Decompose into timescales
- Use AR or EWMA to predict for each timescale
- Reconstruct prediction from predicted timescales
- Monitor deviations from prediction
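The four steps above might look like the sketch below. It is not the authors' implementation. It assumes a backward-looking Haar-style decomposition (each scale is a trailing mean over the last 2^j days, so only past observations are used), EWMA rather than AR as the per-scale predictor, and hypothetical names and parameter values (`levels=3`, `alpha=0.3`).

```python
def trailing_mean(x, t, width):
    """Mean of the `width` observations ending at index t."""
    return sum(x[t - width + 1:t + 1]) / width


def decompose_at(x, t, levels):
    """Backward-looking components of x[t]: [d1, ..., dJ, aJ]; they sum to x[t]."""
    comps = []
    prev = float(x[t])
    for j in range(1, levels + 1):
        a = trailing_mean(x, t, 2 ** j)
        comps.append(prev - a)   # detail at scale j
        prev = a
    comps.append(prev)           # coarsest approximation
    return comps


def ewma_forecast(series, alpha):
    """One-step-ahead EWMA forecast: the final smoothed level."""
    level = series[0]
    for v in series[1:]:
        level = alpha * v + (1 - alpha) * level
    return level


def predict_next(x, levels=3, alpha=0.3):
    """Forecast each timescale separately and sum the forecasts."""
    first = 2 ** levels - 1      # earliest index with a full trailing window
    if len(x) <= first:
        raise ValueError("series too short for this many levels")
    rows = [decompose_at(x, t, levels) for t in range(first, len(x))]
    return sum(ewma_forecast(list(scale), alpha) for scale in zip(*rows))


def deviation(history, observed, levels=3, alpha=0.3):
    """Observed count minus the wavelet-based prediction."""
    return observed - predict_next(history, levels, alpha)


history = [50.0] * 20
excess = deviation(history, observed=80.0)
```

An alarm rule would compare `excess` against a threshold. Choosing that threshold is the trade-off between detection rate and false alarm rate.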
21Bonus: Alternative Preconditioning
- Regression using day-of-week predictors
- 7-day differencing
- Holt-Winters as preconditioner
- Seasonal preconditioning
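Two of these alternatives are simple enough to sketch directly; the function names and toy data are assumptions. Seven-day differencing subtracts the count from the same weekday one week earlier. A regression whose only predictors are day-of-week indicators has fitted values equal to the weekday means, so its residuals can be computed without a regression library.

```python
def seven_day_difference(x):
    """y[i] = x[i] - x[i-7]; the first 7 days have no difference and are dropped."""
    return [x[i] - x[i - 7] for i in range(7, len(x))]


def day_of_week_residuals(x):
    """Residuals from a regression on day-of-week indicators only.

    With no other predictors, the fitted value for each day is the mean of
    all observations sharing its weekday (position modulo 7).
    """
    groups = [[] for _ in range(7)]
    for i, v in enumerate(x):
        groups[i % 7].append(v)
    means = [sum(g) / len(g) if g else 0.0 for g in groups]
    return [v - means[i % 7] for i, v in enumerate(x)]


# Four weeks of a fixed weekly pattern, with 30 extra counts added on day 17.
weekly = [120, 110, 100, 100, 90, 60, 55]
x = [weekly[i % 7] for i in range(28)]
x[17] += 30
diffs = seven_day_difference(x)
resid = day_of_week_residuals(x)
```

Differencing shows the extra counts twice: as +30 on day 17 and as -30 one week later. The regression residuals spread part of the excess across the weekday mean.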
22Bonus: Other Outbreak Signals
- Normalized by total size
- Lognormal, exponential, step
- Spike is much easier than the others
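The signal shapes named above can be generated as daily weights that each sum to 1, which is one way to read "normalized by total size". The function names and the parameter values (`rate`, `mu`, `sigma`, the number of days) are hypothetical and not taken from the slides.

```python
import math


def normalize(weights):
    """Scale the weights so that the total outbreak size is 1."""
    total = sum(weights)
    return [w / total for w in weights]


def spike():
    """All extra cases arrive on a single day."""
    return [1.0]


def step(days):
    """The same number of extra cases on each day."""
    return normalize([1.0] * days)


def exponential(days, rate=0.5):
    """Extra cases grow exponentially from day to day."""
    return normalize([math.exp(rate * k) for k in range(days)])


def lognormal(days, mu=1.0, sigma=0.5):
    """Extra cases follow a lognormal density evaluated at days 1..days."""
    def pdf(t):
        z = (math.log(t) - mu) / sigma
        return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2 * math.pi))
    return normalize([pdf(t) for t in range(1, days + 1)])


shapes = {
    "spike": spike(),
    "step": step(5),
    "exponential": exponential(5),
    "lognormal": lognormal(10),
}
```

For the same total size, the spike puts all of its mass on one day, while the other shapes spread it thinly over several days.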