Title: Elastic Burst Detection: Applications
1 Elastic Burst Detection: Applications
- Discovering intervals with an unusually large number of events.
  - In astrophysics, the sky is constantly observed for high-energy particles. When a particular astrophysical event happens, a shower of high-energy particles arrives in addition to the background noise.
  - In finance, stocks with unusually high trading volumes should attract the notice of traders (or perhaps regulators).
- Challenge: to discover not only the time of the burst, but also its duration, which may vary widely.
  - In astrophysics, a burst of high-energy particles associated with a special event might last for a few milliseconds, a few hours, or even a few days.
2 Burst Detection: Problem Statement
- Problem: Given a time series of positive numbers $x_1, x_2, \ldots, x_n$ and a threshold function $f(w)$, $w = 1, 2, \ldots, n$, find the subsequences of any size whose sums are above the thresholds:
  - all $0 < w < n$, $0 < m < n - w$, such that $x_m + x_{m+1} + \cdots + x_{m+w-1} > f(w)$
- Brute-force search: $O(n^2)$ time
- Our shift wavelet tree (SWT): $O(n + k)$ time
  - $k$ is the size of the output, i.e. the number of windows with bursts
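
The brute-force search in the problem statement can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `brute_force_bursts`, the 0-based start index, and the choice to scan every window that fits in the series are assumptions made for the example. Prefix sums make each window sum a constant-time lookup, so the whole scan takes time quadratic in n.

```python
from itertools import accumulate


def brute_force_bursts(x, f):
    """Return (start, w) for every window x[start:start+w] whose sum exceeds f(w).

    `start` is 0-based. Every window length 1..n is scanned (an assumption
    made for this sketch).
    """
    n = len(x)
    prefix = [0, *accumulate(x)]  # prefix[i] = x[0] + ... + x[i-1]
    bursts = []
    for w in range(1, n + 1):
        threshold = f(w)
        for start in range(n - w + 1):
            if prefix[start + w] - prefix[start] > threshold:
                bursts.append((start, w))
    return bursts


if __name__ == "__main__":
    series = [1, 1, 9, 8, 1, 1]
    # Hypothetical threshold: a window bursts when its average exceeds 6.
    print(brute_force_bursts(series, lambda w: 6 * w))
```

With this series and threshold, the only bursts are the two single points 9 and 8 and the length-2 window that contains both.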
3 Burst Detection: Data Structure and Algorithm
- Lemma 1: any subsequence s is included in one window w in the SWT.
- Lemma 2: if Sum(s) > threshold, then Sum(w) > threshold, because all values are positive (no false negatives: every burst raises an alarm at its covering window).
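
The two lemmas suggest a filter-then-verify search, sketched below under stated assumptions. The layout used here (level j holds windows of length 2^j that start every 2^(j-1) points, so that any subsequence of length at most 2^(j-1) + 1 fits inside one of them) is one common way to build such overlapping windows and is assumed for illustration; the slides do not spell it out. The name `swt_bursts` is hypothetical, and the detailed search inside an alarmed window is a plain scan, so this sketch shows the filtering idea rather than the stated running time.

```python
from itertools import accumulate


def swt_bursts(x, f):
    """Find every (start, w) with sum(x[start:start+w]) > f(w).

    Overlapping windows act as a filter: a detailed search runs only
    inside windows whose total exceeds the smallest threshold they cover.
    Assumes all values in x are positive.
    """
    n = len(x)
    prefix = [0, *accumulate(x)]

    def window_sum(a, b):
        # Sum of x[a:b], with b clipped to the end of the series.
        return prefix[min(b, n)] - prefix[a]

    bursts = set()
    # Level 0: single points are checked directly.
    for start in range(n):
        if x[start] > f(1):
            bursts.add((start, 1))

    level, lo = 1, 2  # lo = smallest subsequence length handled at this level
    while lo <= n:
        size, shift = 2 ** level, 2 ** (level - 1)
        hi = min(shift + 1, n)  # longest length guaranteed to fit in one window
        alarm = min(f(w) for w in range(lo, hi + 1))
        for left in range(0, n, shift):
            if window_sum(left, left + size) > alarm:
                # Alarm raised: verify each candidate inside this window.
                right = min(left + size, n)
                for w in range(lo, hi + 1):
                    for start in range(left, right - w + 1):
                        if window_sum(start, start + w) > f(w):
                            bursts.add((start, w))
        lo, level = hi + 1, level + 1
    return sorted(bursts)


if __name__ == "__main__":
    print(swt_bursts([1, 1, 9, 8, 1, 1], lambda w: 6 * w))
```

Because a covering window's sum is at least the sum of any subsequence inside it, no burst can slip past the alarm test; windows that raise an alarm without containing a burst only cost extra verification work.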
4 StatStream: Motivation
- Stock price streams
  - The New York Stock Exchange (NYSE)
  - 50,000 securities (streams); 100,000 ticks (trade and quote)
- Pairs Trading, a.k.a. Correlation Trading
- Query: which pairs of stocks were correlated with a value of over 0.9 for the last three hours?
XYZ and ABC have been correlated with a correlation of 0.95 for the last three hours. Now XYZ and ABC become less correlated as XYZ goes up and ABC goes down. They should converge back later. I will sell XYZ and buy ABC.
5 StatStream: Goal
- Given tens of thousands of high-speed time series data streams, detect high-value correlations, both synchronized and time-lagged, over sliding windows in real time.
- Real time:
  - high update frequency of the data stream
  - fixed response time, online
6 StatStream: Algorithm
- Naive algorithm
  - $N$: number of streams
  - $w$: size of sliding window
  - Space $O(N)$ and time $O(N^2 w)$ vs. space $O(N^2)$ and time $O(N^2)$.
  - Suppose that the streams are updated every second.
  - With a Pentium 4 PC, the exact computing method can only monitor 700 streams with a delay of 2 minutes.
- Our approach
  - Using the Discrete Fourier Transform to approximate correlation
  - Using a grid structure to filter out unlikely pairs
  - Our approach can monitor 10,000 streams with a delay of 2 minutes.
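
The naive, exact method can be sketched as below: recompute the Pearson correlation of every pair of streams over the most recent w points, which costs time proportional to N^2 * w per refresh. The names `pearson` and `correlated_pairs`, the dictionary input format, and the decision to treat a constant window as uncorrelated are assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant window counts as uncorrelated
    return cov / sqrt(var_a * var_b)


def correlated_pairs(streams, w, threshold=0.9):
    """Return the name pairs whose correlation over the last w points exceeds threshold.

    `streams` maps a stream name to its list of values (hypothetical format).
    """
    result = []
    for (name_a, a), (name_b, b) in combinations(streams.items(), 2):
        if pearson(a[-w:], b[-w:]) > threshold:
            result.append((name_a, name_b))
    return result


if __name__ == "__main__":
    streams = {
        "XYZ": [1, 2, 3, 4, 5],
        "ABC": [2, 4, 6, 8, 11],
        "DEF": [5, 4, 3, 2, 1],
    }
    print(correlated_pairs(streams, w=4))
```

Every pair is visited on every refresh, which is why the pair count, quadratic in the number of streams, dominates the cost as streams are added.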
7 StatStream: Stream synoptic data structure
- Three-level time interval hierarchy
  - Time point, basic window, sliding window
- Basic window (the key to our technique)
  - The computation for basic window $i$ must finish by the end of basic window $i+1$.
  - The basic window time is the system response time.
- Digests
  - Basic window digests: sum, DFT coefficients
  - Sliding window digests: sum, DFT coefficients
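
The basic-window idea can be illustrated with a simplified digest. The sketch below keeps only running sums (of x, y, x^2, y^2 and xy) per basic window and combines them into a sliding-window correlation for one pair of streams; it deliberately leaves out the DFT coefficients mentioned above, so it shows the window bookkeeping rather than the approximation. The class name `PairDigest` and its parameters are hypothetical.

```python
from collections import deque
from math import sqrt


class PairDigest:
    """Sliding-window correlation of two streams built from basic-window sums."""

    def __init__(self, basic_size, windows_per_slide):
        self.basic_size = basic_size
        # A full deque drops its oldest digest automatically on append.
        self.digests = deque(maxlen=windows_per_slide)
        self.current = [0.0] * 5  # sums of x, y, x^2, y^2, x*y
        self.count = 0

    def update(self, x, y):
        """Add one time point; close the basic window when it is full."""
        for i, value in enumerate((x, y, x * x, y * y, x * y)):
            self.current[i] += value
        self.count += 1
        if self.count == self.basic_size:
            self.digests.append(tuple(self.current))
            self.current = [0.0] * 5
            self.count = 0

    def correlation(self):
        """Correlation over the completed basic windows, or None if unavailable."""
        if len(self.digests) < self.digests.maxlen:
            return None  # sliding window not yet full
        sx, sy, sxx, syy, sxy = (sum(col) for col in zip(*self.digests))
        n = self.basic_size * len(self.digests)
        var_x = sxx - sx * sx / n
        var_y = syy - sy * sy / n
        if var_x <= 0 or var_y <= 0:
            return None  # constant stream: correlation undefined
        return (sxy - sx * sy / n) / sqrt(var_x * var_y)


if __name__ == "__main__":
    digest = PairDigest(basic_size=2, windows_per_slide=2)
    for x, y in [(1, 2), (2, 4), (3, 6), (4, 8)]:
        digest.update(x, y)
    print(digest.correlation())
```

The reported value changes only when a basic window closes, which matches the point that the basic window time is the system response time.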