Title: High Performance Discovery from Time Series Streams

1. High Performance Discovery from Time Series Streams
Dennis Shasha, joint work with Yunyue Zhu
yunyue@cs.nyu.edu, shasha@cs.nyu.edu
Courant Institute, New York University
2. Overall Outline
- Data mining, both classical and activist
- Algorithmic tools for time series
- Surprise.
3. Goal of This Work
- Time series are important in so many applications: biology, medicine, finance, music, physics, ...
- A few fundamental operations occur all the time: burst detection, correlation, pattern matching.
- Do them fast to make data exploration faster, real-time, and more fun.
- Extend functionality for music and science.
4. StatStream (VLDB 2002) Example
- Stock price streams
  - The New York Stock Exchange (NYSE)
  - 50,000 securities (streams); 100,000 ticks (trade and quote)
- Pairs Trading, a.k.a. Correlation Trading
  - Query: which pairs of stocks were correlated with a value of over 0.9 for the last three hours?
- "XYZ and ABC have been correlated with a correlation of 0.95 for the last three hours. Now XYZ and ABC become less correlated as XYZ goes up and ABC goes down. They should converge back later. I will sell XYZ and buy ABC."
5. Online Detection of High Correlation
- Given tens of thousands of high-speed time series data streams, detect high-value correlation, including synchronized and time-lagged, over sliding windows in real time.
- Real time:
  - high update frequency of the data stream
  - fixed response time, online
8. StatStream Algorithm
- Naive algorithm:
  - N = number of streams; w = size of sliding window
  - space O(N) and time O(N²w), vs. space O(N²) and time O(N²)
  - Suppose that the streams are updated every second.
  - With a Pentium 4 PC, the exact computing method can only monitor 700 streams with a delay of 2 minutes.
- Our approach:
  - Use the Discrete Fourier Transform to approximate correlation.
  - Use a grid structure to filter out unlikely pairs.
  - Our approach can monitor 10,000 streams with a delay of 2 minutes.
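The DFT idea can be sketched in a few lines of pure Python (a toy illustration, not the StatStream implementation; the window and digest sizes are arbitrary). Normalize each window so that the inner product equals the Pearson correlation, then approximate that inner product from the first few DFT coefficients via Parseval's theorem:

```python
import cmath, math, random

def normalize(x):
    """Zero-mean, unit-norm version of x, so dot(x, y) = Pearson correlation."""
    m = sum(x) / len(x)
    d = [v - m for v in x]
    norm = math.sqrt(sum(v * v for v in d))
    return [v / norm for v in d]

def dft_digest(x, n):
    """First n DFT coefficients of x (the 'digest')."""
    b = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / b) for t in range(b))
            for k in range(n)]

def approx_corr(dx, dy, b):
    """Approximate the inner product (= correlation of normalized series)
    from digests via Parseval's theorem; for real signals the mirror
    frequencies are conjugates, hence the factor of 2."""
    s = (dx[0] * dy[0].conjugate()).real
    s += 2 * sum((dx[k] * dy[k].conjugate()).real for k in range(1, len(dx)))
    return s / b

random.seed(0)
b, n = 120, 8  # basic window of 120 points, digest of 8 coefficients
trend = [math.sin(2 * math.pi * t / b) for t in range(b)]
x = normalize([trend[t] + 0.1 * random.gauss(0, 1) for t in range(b)])
y = normalize([trend[t] + 0.1 * random.gauss(0, 1) for t in range(b)])
exact = sum(a * c for a, c in zip(x, y))
approx = approx_corr(dft_digest(x, n), dft_digest(y, n), b)
print(round(exact, 3), round(approx, 3))
```

Because most of a smooth stream's energy sits in the low frequencies, a handful of coefficients approximates the correlation well; white-noise-like streams break this assumption, which is what motivates the sketches of slide 23.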
9. StatStream: Stream Synoptic Data Structure
- Three-level time interval hierarchy: time point, basic window, sliding window.
- Basic window (the key to our technique):
  - The computation for basic window i must finish by the end of basic window i+1.
  - The basic window time is the system response time.
- Digests:
  - per basic window: sum, DFT coefficients
  - per sliding window: sum, DFT coefficients
14. Synchronized Correlation Uses Basic Windows
- Inner product of aligned basic windows of stream x and stream y.
- The inner product within a sliding window is the sum of the inner products over all the basic windows in the sliding window.
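This decomposition is what makes basic windows useful: each basic window's contribution can be computed once and reused as the window slides. A minimal sketch with illustrative sizes:

```python
def basic_window_products(x, y, b):
    """Inner product of each aligned basic window of length b."""
    return [sum(xi * yi for xi, yi in zip(x[i:i + b], y[i:i + b]))
            for i in range(0, len(x), b)]

# Toy streams: one sliding window = 3 basic windows of length 2.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 1.0, 0.0, 1.0, 2.0, 3.0]
parts = basic_window_products(x, y, 2)
whole = sum(xi * yi for xi, yi in zip(x, y))
print(parts, sum(parts), whole)  # the parts sum to the whole
```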
18. Approximate Synchronized Correlation
- Approximate with an orthogonal function family (e.g. DFT).
- Inner product of the time series ≈ inner product of the digests.
- The time and space complexity is reduced from O(b) to O(n):
  - b = size of basic window
  - n = size of the digests (n << b)
  - e.g. 120 time points reduce to 4 digests
19. Approximate Lagged Correlation
- Inner product with unaligned windows.
- The time complexity is reduced from O(b) to O(n²), as opposed to O(n) for synchronized correlation. Reason: terms for different frequencies are non-zero in the lagged case.
20. Grid Structure (to avoid checking all pairs)
- The DFT coefficients yield a vector.
- High correlation ⇒ closeness in the vector space.
- We can use a grid structure and look in the neighborhood; this will return a superset of the highly correlated pairs.
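A minimal sketch of the grid filter (the 2-D digests and the cell width are made-up demo values). Any two vectors within distance `cell` of each other necessarily fall in the same or an adjacent cell, so scanning neighborhoods returns a superset of the close, i.e. highly correlated, pairs:

```python
from collections import defaultdict
from itertools import product
import math

def grid_candidates(points, cell):
    """Bucket digest vectors into a grid of width `cell`; report all pairs
    in the same or an adjacent cell. Pairs closer than `cell` are never missed."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[tuple(int(math.floor(v / cell)) for v in p)].append(idx)
    cands = set()
    for key, members in grid.items():
        for offset in product((-1, 0, 1), repeat=len(key)):
            nkey = tuple(k + o for k, o in zip(key, offset))
            for i in members:
                for j in grid.get(nkey, ()):
                    if i < j:
                        cands.add((i, j))
    return cands

pts = [(0.0, 0.0), (0.05, 0.02), (0.9, 0.9), (0.92, 0.88)]
cand = grid_candidates(pts, cell=0.1)
close = {(i, j) for i in range(len(pts)) for j in range(i + 1, len(pts))
         if math.dist(pts[i], pts[j]) < 0.1}
print(sorted(cand), sorted(close))
```

Only the candidate pairs need an exact correlation check; everything else is filtered out without being touched.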
21. Empirical Study: Speed
Our algorithm is parallelizable.
22. Empirical Study: Precision
- Approximation errors:
  - a larger digest, a larger sliding window, and a smaller basic window give better approximation
- The approximation errors are small for the stock data.
23. Sketches: Random Projection
- Goal: correlation between time series of stock returns.
- Since most stock price time series are close to random walks, their return time series are close to white noise.
- DFT/DWT can't capture approximately-white-noise series because there is no clear trend (too many frequency components).
- Solution: sketches (a form of random landmark).
  - Sketch pool: a matrix of random variables drawn from a stable distribution.
  - Sketches: the random projection of all time series to lower dimensions by multiplication with the same matrix.
- The Euclidean distance (and hence correlation) between time series is approximated by the distance between their sketches, with a probabilistic guarantee.
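A toy version of the sketch idea using Gaussian entries (a stable distribution); the lengths n and k are arbitrary demo sizes, and the essential point is that every series is multiplied by the same shared pool:

```python
import math, random

random.seed(1)

def sketch(x, R):
    """Project x onto each random vector in R (the shared sketch pool)."""
    return [sum(r * v for r, v in zip(row, x)) for row in R]

def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

n, k = 256, 64  # original length, sketch length (demo sizes)
# Gaussian pool; the 1/sqrt(k) scaling preserves Euclidean distance in expectation.
R = [[random.gauss(0, 1) / math.sqrt(k) for _ in range(n)] for _ in range(k)]

x = [random.gauss(0, 1) for _ in range(n)]  # white-noise-like series
y = [random.gauss(0, 1) for _ in range(n)]
d_true = dist(x, y)
d_sketch = dist(sketch(x, R), sketch(y, R))
print(round(d_true, 2), round(d_sketch, 2))
```

Unlike the DFT digest, this works even when the energy is spread over all frequencies, at the price of a probabilistic rather than deterministic guarantee (the Johnson-Lindenstrauss concentration).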
24. Burst Detection
25. Burst Detection: Applications
- Discovering intervals with unusually large numbers of events.
- In astrophysics, the sky is constantly observed for high-energy particles. When a particular astrophysical event happens, a shower of high-energy particles arrives in addition to the background noise. The shower might last milliseconds or days.
- In telecommunications, if the number of packets lost within a certain time period exceeds some threshold, it might indicate a network anomaly. The exact duration is unknown.
- In finance, stocks with unusually high trading volumes should attract the notice of traders (or perhaps regulators).
26. Bursts Across Different Window Sizes in Gamma Rays
- Challenge: discover not only the time of the burst, but also its duration.
27. Elastic Burst Detection: Problem Statement
- Problem: given a time series of positive numbers x_1, x_2, ..., x_n, and a threshold function f(w), w = 1, 2, ..., n, find the subsequences of any size whose sums are above the thresholds:
  - all 0 < w < n, 0 < m < n-w, such that x_m + x_(m+1) + ... + x_(m+w-1) ≥ f(w)
- Brute-force search: O(n²) time.
- Our shifted wavelet tree (SWT): O(n + k) time.
  - k is the size of the output, i.e. the number of windows with bursts.
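For reference, the quadratic brute-force baseline that SWT improves on can be written with prefix sums so each window sum is O(1) (the threshold function below is made up for the demo):

```python
def elastic_bursts_bruteforce(x, f):
    """O(n^2) baseline: report every (start, width) whose sum reaches f(width).
    The SWT structure prunes most of these checks; this is just the
    reference behavior it must reproduce."""
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)  # prefix[i] = sum of x[0:i]
    hits = []
    for w in range(1, n + 1):
        for m in range(n - w + 1):
            if prefix[m + w] - prefix[m] >= f(w):
                hits.append((m, w))
    return hits

x = [1, 1, 9, 1, 1, 1]
hits = elastic_bursts_bruteforce(x, lambda w: 5 + 2 * w)
print(hits)
```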
28. Burst Detection: Data Structure and Algorithm
- Define the threshold for a node covering windows of size 2^k to be the threshold for a window of size 2^(k-1) + 1.
29. Burst Detection: Example
30. Burst Detection: Example (true alarm vs. false alarm)
31. False Alarms (require extra work, but cause no errors)
32. Empirical Study: Gamma Ray Burst
33. Extension to Other Aggregates
- SWT can be used for any aggregate that is monotonic.
- SUM, COUNT, and MAX are monotonically increasing:
  - the alarm condition is aggregate ≥ threshold
- MIN is monotonically decreasing:
  - the alarm condition is aggregate ≤ threshold
- Spread = MAX - MIN
- Application in finance:
  - Stocks with a burst of trading or quote (bid/ask) volume (Hammer!)
  - Stock prices with high spread
34. Empirical Study: Stock Price Spread Burst
35. Extension to High Dimensions
36. Elastic Burst in Two Dimensions
- Population distribution in the US
37. How to Find the Threshold for Elastic Bursts?
- Suppose that the moving sum of a time series is a random variable from a normal distribution.
- Let the number of bursts in the time series within sliding window size w be So(w) and its expectation be Se(w).
- Se(w) can be computed from the historical data.
- Given a threshold probability p, we set the threshold of bursts f(w) for window size w such that Pr[So(w) ≥ f(w)] ≤ p.
38. Find the Threshold for Elastic Bursts
- F(x) is the normal CDF, so it is symmetric around 0.
- Therefore the threshold for a given window size can be read off the inverse CDF as F^-1(p).
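With Python's stdlib `NormalDist`, the probabilistic threshold rule can be sketched directly (the mean and standard deviation of the moving sum below are hypothetical values one would estimate from historical data):

```python
from statistics import NormalDist

def burst_threshold(mean_w, std_w, p):
    """Threshold f(w) for one window size w, assuming the moving sum of
    that window size is ~ Normal(mean_w, std_w) estimated from history:
    choose f(w) so that Pr[sum >= f(w)] = p, i.e. f(w) = F^-1(1 - p)."""
    return NormalDist(mu=mean_w, sigma=std_w).inv_cdf(1.0 - p)

# Hypothetical historical estimates for one window size:
f_w = burst_threshold(mean_w=100.0, std_w=10.0, p=0.001)
print(round(f_w, 2))
```

Repeating this for each window size w yields the whole threshold function f(w) from a single probability p.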
39. Summary
- Able to detect bursts of many different durations in essentially linear time.
- Can be used both for time series and for spatial searching.
- Can specify thresholds either with absolute numbers or with a probability of a hit.
- The algorithm is simple to implement and has low constants (code is available).
- OK, it's embarrassingly simple.
40. With a Little Help From My Warped Correlation
- Karen's humming: match
- Dennis's humming: match
- "What would you do if I sang out of tune?"
- Yunyue's humming: match
41. Related Work in Query by Humming
- Traditional method: string matching [Ghias et al. 95, McNab et al. 97, Uitdenbogerd and Zobel 99]
  - Music represented by a string of pitch directions: U, D, S (degenerated interval)
  - The hum query is segmented into discrete notes, then into a string of pitch directions
  - Edit distance between the hum query and the music score
  - Problem:
    - very hard to segment the hum query
    - partial solution: users are asked to hum articulately
- New method: matching directly from audio [Mazzoni and Dannenberg 00]
  - Problem: slowed down by DTW
42. Time Series Representation of Query
- An example hum query (waveform figure: "Segment this!")
- Note segmentation is hard!
43. How to Deal With Poor Hum Queries?
- No absolute pitch
  - Solution: the average pitch is subtracted
- Incorrect tempo
  - Solution: uniform time warping
- Inaccurate pitch intervals
  - Solution: return the k-nearest neighbors
- Local timing variations
  - Solution: dynamic time warping
44. Dynamic Time Warping
- Euclidean distance: sum of point-by-point distances.
- DTW distance: allows stretching or squeezing of the time axis locally.
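A minimal textbook DTW (not the paper's optimized variant): the dynamic program lets one point of either series match a run of points in the other, which is exactly the local stretching and squeezing described above.

```python
def dtw(x, y):
    """Classic O(len(x)*len(y)) dynamic time warping with
    absolute-difference point cost."""
    inf = float("inf")
    n, m = len(x), len(y)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # stretch x
                                 d[i][j - 1],      # stretch y
                                 d[i - 1][j - 1])  # one-to-one match
    return d[n][m]

a = [0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 1.0, 1.0, 2.0, 1.0, 0.0]  # same shape, locally stretched
print(dtw(a, b))  # 0.0: the warp absorbs the stretch
```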
45. Envelope Transform Using Piecewise Aggregate Approximation (PAA) [Keogh VLDB 02]
46. Envelope Transform Using Piecewise Aggregate Approximation (PAA)
- Advantage of tighter envelopes:
  - still no false negatives, and fewer false positives
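A sketch of PAA and the envelope transform under simplifying assumptions (series length divisible by the number of segments; the DTW envelope given as pointwise lower/upper bound arrays):

```python
def paa(x, segments):
    """Piecewise Aggregate Approximation: the mean of each of
    `segments` equal-length pieces of x."""
    step = len(x) // segments
    return [sum(x[i:i + step]) / step for i in range(0, len(x), step)]

def paa_envelope(lower, upper, segments):
    """Envelope transform: per-segment min of the lower bound and max of
    the upper bound, so the transformed envelope still contains the
    transformed series (no false negatives)."""
    step = len(lower) // segments
    lo = [min(lower[i:i + step]) for i in range(0, len(lower), step)]
    hi = [max(upper[i:i + step]) for i in range(0, len(upper), step)]
    return lo, hi

x = [1.0, 3.0, 2.0, 4.0, 6.0, 5.0, 7.0, 9.0]
print(paa(x, 4))  # [2.0, 3.0, 5.5, 8.0]
```

Taking min/max per segment (rather than the mean) is what keeps the transform lower-bounding; tightening the envelope before this step reduces the false positives without risking false negatives.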
47. Container-Invariant Envelope Transform
- Container-invariant: a transformation T for envelopes such that the transformed envelope of y still contains every transformed series that the original envelope of y contains.
- Theorem: if a transformation is container-invariant and lower-bounding, then the distance between a transformed time series x and the transformed envelope of y lower-bounds their DTW distance.
48. The Vision
- The ability to match time series quickly may open up entire new application areas, e.g. fast reaction to external events, music by humming, and so on.
- Main problems: accuracy, excessive specification.
- Reference (advert): High Performance Discovery in Time Series (Springer, 2004).