Title: High Performance Discovery from Time Series Streams

1. High Performance Discovery from Time Series Streams
Dennis Shasha, joint work with Yunyue Zhu
yunyue@cs.nyu.edu, shasha@cs.nyu.edu
Courant Institute, New York University
2. Overall Outline
- Data mining, both classical and activist
- Algorithmic tools for time series
- Surprise.
3. Goal of This Work
- Time series are important in so many applications: biology, medicine, finance, music, physics, ...
- A few fundamental operations occur all the time: burst detection, correlation, pattern matching.
- Do them fast to make data exploration faster, real-time, and more fun.
- Extend functionality for music and science.
4. StatStream (VLDB 2002) Example
- Stock price streams
  - The New York Stock Exchange (NYSE)
  - 50,000 securities (streams); 100,000 ticks (trade and quote)
- Pairs Trading, a.k.a. Correlation Trading
  - Query: which pairs of stocks were correlated with a value of over 0.9 for the last three hours?
- "XYZ and ABC have been correlated with a correlation of 0.95 for the last three hours. Now XYZ and ABC become less correlated as XYZ goes up and ABC goes down. They should converge back later. I will sell XYZ and buy ABC."
5. Online Detection of High Correlation
- Given tens of thousands of high-speed time series data streams, detect high-value correlation, including synchronized and time-lagged, over sliding windows in real time.
- Real time:
  - high update frequency of the data stream
  - fixed response time, online
8. StatStream Algorithm
- Naive algorithm:
  - N = number of streams; w = size of sliding window
  - space O(N) and time O(N²w), vs. space O(N²) and time O(N²)
  - Suppose that the streams are updated every second.
  - With a Pentium 4 PC, the exact computing method can only monitor 700 streams with a delay of 2 minutes.
- Our approach:
  - Use the Discrete Fourier Transform to approximate correlation.
  - Use a grid structure to filter out unlikely pairs.
  - Our approach can monitor 10,000 streams with a delay of 2 minutes.
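The DFT idea can be sketched in a few lines of pure Python (a toy illustration, not the StatStream implementation; the window and digest sizes are arbitrary). Normalize each window so that the inner product equals the Pearson correlation, then approximate that inner product from the first few DFT coefficients via Parseval's theorem:

```python
import cmath, math, random

def normalize(x):
    """Zero-mean, unit-norm version of x, so dot(x, y) = Pearson correlation."""
    m = sum(x) / len(x)
    d = [v - m for v in x]
    norm = math.sqrt(sum(v * v for v in d))
    return [v / norm for v in d]

def dft_digest(x, n):
    """First n DFT coefficients of x (the 'digest')."""
    b = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / b) for t in range(b))
            for k in range(n)]

def approx_corr(dx, dy, b):
    """Approximate the inner product (= correlation of normalized series)
    from digests via Parseval's theorem; for real signals the mirror
    frequencies are conjugates, hence the factor of 2."""
    s = (dx[0] * dy[0].conjugate()).real
    s += 2 * sum((dx[k] * dy[k].conjugate()).real for k in range(1, len(dx)))
    return s / b

random.seed(0)
b, n = 120, 8  # basic window of 120 points, digest of 8 coefficients
trend = [math.sin(2 * math.pi * t / b) for t in range(b)]
x = normalize([trend[t] + 0.1 * random.gauss(0, 1) for t in range(b)])
y = normalize([trend[t] + 0.1 * random.gauss(0, 1) for t in range(b)])
exact = sum(a * c for a, c in zip(x, y))
approx = approx_corr(dft_digest(x, n), dft_digest(y, n), b)
print(round(exact, 3), round(approx, 3))
```

Because most of a smooth stream's energy sits in the low frequencies, a handful of coefficients approximates the correlation well; white-noise-like streams break this assumption, which is what motivates the sketches of slide 23.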
9. StatStream: Stream Synoptic Data Structure
- Three-level time interval hierarchy: time point, basic window, sliding window.
- Basic window (the key to our technique):
  - The computation for basic window i must finish by the end of basic window i+1.
  - The basic window time is the system response time.
- Digests:
  - per basic window: sum, DFT coefficients
  - per sliding window: sum, DFT coefficients
14. Synchronized Correlation Uses Basic Windows
- Inner product of aligned basic windows of stream x and stream y.
- The inner product within a sliding window is the sum of the inner products over all the basic windows in the sliding window.
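This decomposition is what makes basic windows useful: each basic window's contribution can be computed once and reused as the window slides. A minimal sketch with illustrative sizes:

```python
def basic_window_products(x, y, b):
    """Inner product of each aligned basic window of length b."""
    return [sum(xi * yi for xi, yi in zip(x[i:i + b], y[i:i + b]))
            for i in range(0, len(x), b)]

# Toy streams: one sliding window = 3 basic windows of length 2.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 1.0, 0.0, 1.0, 2.0, 3.0]
parts = basic_window_products(x, y, 2)
whole = sum(xi * yi for xi, yi in zip(x, y))
print(parts, sum(parts), whole)  # the parts sum to the whole
```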
18. Approximate Synchronized Correlation
- Approximate with an orthogonal function family (e.g. DFT).
- Inner product of the time series ≈ inner product of the digests.
- The time and space complexity is reduced from O(b) to O(n):
  - b = size of basic window
  - n = size of the digests (n << b)
  - e.g. 120 time points reduce to 4 digests
19. Approximate Lagged Correlation
- Inner product with unaligned windows.
- The time complexity is reduced from O(b) to O(n²), as opposed to O(n) for synchronized correlation. Reason: terms for different frequencies are non-zero in the lagged case.
20. Grid Structure (to avoid checking all pairs)
- The DFT coefficients yield a vector.
- High correlation ⇒ closeness in the vector space.
- We can use a grid structure and look in the neighborhood; this will return a superset of the highly correlated pairs.
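A minimal sketch of the grid filter (the 2-D digests and the cell width are made-up demo values). Any two vectors within distance `cell` of each other necessarily fall in the same or an adjacent cell, so scanning neighborhoods returns a superset of the close, i.e. highly correlated, pairs:

```python
from collections import defaultdict
from itertools import product
import math

def grid_candidates(points, cell):
    """Bucket digest vectors into a grid of width `cell`; report all pairs
    in the same or an adjacent cell. Pairs closer than `cell` are never missed."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[tuple(int(math.floor(v / cell)) for v in p)].append(idx)
    cands = set()
    for key, members in grid.items():
        for offset in product((-1, 0, 1), repeat=len(key)):
            nkey = tuple(k + o for k, o in zip(key, offset))
            for i in members:
                for j in grid.get(nkey, ()):
                    if i < j:
                        cands.add((i, j))
    return cands

pts = [(0.0, 0.0), (0.05, 0.02), (0.9, 0.9), (0.92, 0.88)]
cand = grid_candidates(pts, cell=0.1)
close = {(i, j) for i in range(len(pts)) for j in range(i + 1, len(pts))
         if math.dist(pts[i], pts[j]) < 0.1}
print(sorted(cand), sorted(close))
```

Only the candidate pairs need an exact correlation check; everything else is filtered out without being touched.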
21. Empirical Study: Speed
Our algorithm is parallelizable.
22. Empirical Study: Precision
- Approximation errors:
  - a larger digest, a larger sliding window, and a smaller basic window give better approximation
- The approximation errors are small for the stock data.
23. Sketches: Random Projection
- Goal: correlation between time series of stock returns.
- Since most stock price time series are close to random walks, their return time series are close to white noise.
- DFT/DWT can't capture approximately-white-noise series because there is no clear trend (too many frequency components).
- Solution: sketches (a form of random landmark).
  - Sketch pool: a matrix of random variables drawn from a stable distribution.
  - Sketches: the random projection of all time series to lower dimensions by multiplication with the same matrix.
- The Euclidean distance (and hence correlation) between time series is approximated by the distance between their sketches, with a probabilistic guarantee.
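A toy version of the sketch idea using Gaussian entries (a stable distribution); the lengths n and k are arbitrary demo sizes, and the essential point is that every series is multiplied by the same shared pool:

```python
import math, random

random.seed(1)

def sketch(x, R):
    """Project x onto each random vector in R (the shared sketch pool)."""
    return [sum(r * v for r, v in zip(row, x)) for row in R]

def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

n, k = 256, 64  # original length, sketch length (demo sizes)
# Gaussian pool; the 1/sqrt(k) scaling preserves Euclidean distance in expectation.
R = [[random.gauss(0, 1) / math.sqrt(k) for _ in range(n)] for _ in range(k)]

x = [random.gauss(0, 1) for _ in range(n)]  # white-noise-like series
y = [random.gauss(0, 1) for _ in range(n)]
d_true = dist(x, y)
d_sketch = dist(sketch(x, R), sketch(y, R))
print(round(d_true, 2), round(d_sketch, 2))
```

Unlike the DFT digest, this works even when the energy is spread over all frequencies, at the price of a probabilistic rather than deterministic guarantee (the Johnson-Lindenstrauss concentration).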
24. Burst Detection
25. Burst Detection: Applications
- Discovering intervals with unusually large numbers of events.
- In astrophysics, the sky is constantly observed for high-energy particles. When a particular astrophysical event happens, a shower of high-energy particles arrives in addition to the background noise. The shower might last milliseconds or days.
- In telecommunications, if the number of packets lost within a certain time period exceeds some threshold, it might indicate a network anomaly. The exact duration is unknown.
- In finance, stocks with unusually high trading volumes should attract the notice of traders (or perhaps regulators).
26. Bursts Across Different Window Sizes in Gamma Rays
- Challenge: discover not only the time of the burst, but also its duration.
27. Elastic Burst Detection: Problem Statement
- Problem: given a time series of positive numbers x_1, x_2, ..., x_n, and a threshold function f(w), w = 1, 2, ..., n, find the subsequences of any size whose sums are above the thresholds:
  - all 0 < w < n, 0 < m < n-w, such that x_m + x_(m+1) + ... + x_(m+w-1) ≥ f(w)
- Brute-force search: O(n²) time.
- Our shifted wavelet tree (SWT): O(n + k) time.
  - k is the size of the output, i.e. the number of windows with bursts.
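For reference, the quadratic brute-force baseline that SWT improves on can be written with prefix sums so each window sum is O(1) (the threshold function below is made up for the demo):

```python
def elastic_bursts_bruteforce(x, f):
    """O(n^2) baseline: report every (start, width) whose sum reaches f(width).
    The SWT structure prunes most of these checks; this is just the
    reference behavior it must reproduce."""
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)  # prefix[i] = sum of x[0:i]
    hits = []
    for w in range(1, n + 1):
        for m in range(n - w + 1):
            if prefix[m + w] - prefix[m] >= f(w):
                hits.append((m, w))
    return hits

x = [1, 1, 9, 1, 1, 1]
hits = elastic_bursts_bruteforce(x, lambda w: 5 + 2 * w)
print(hits)
```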
28. Burst Detection: Data Structure and Algorithm
- Define the threshold for a node covering windows of size 2^k to be the threshold for a window of size 2^(k-1) + 1.
29. Burst Detection: Example
30. Burst Detection: Example (true alarm vs. false alarm)
31. False Alarms (require extra work, but cause no errors)
32. Empirical Study: Gamma Ray Burst
33. Extension to Other Aggregates
- SWT can be used for any aggregate that is monotonic.
- SUM, COUNT, and MAX are monotonically increasing:
  - the alarm condition is aggregate ≥ threshold
- MIN is monotonically decreasing:
  - the alarm condition is aggregate ≤ threshold
- Spread = MAX - MIN
- Application in finance:
  - Stocks with a burst of trading or quote (bid/ask) volume (Hammer!)
  - Stock prices with high spread
34. Empirical Study: Stock Price Spread Burst
35. Extension to High Dimensions
36. Elastic Burst in Two Dimensions
- Population distribution in the US
37. How to Find the Threshold for Elastic Bursts?
- Suppose that the moving sum of a time series is a random variable from a normal distribution.
- Let the number of bursts in the time series within sliding window size w be So(w) and its expectation be Se(w).
- Se(w) can be computed from the historical data.
- Given a threshold probability p, we set the threshold of bursts f(w) for window size w such that Pr[So(w) ≥ f(w)] ≤ p.
38. Find the Threshold for Elastic Bursts
- F(x) is the normal CDF, so it is symmetric around 0.
- Therefore the threshold for a given window size can be read off the inverse CDF as F^-1(p).
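With Python's stdlib `NormalDist`, the probabilistic threshold rule can be sketched directly (the mean and standard deviation of the moving sum below are hypothetical values one would estimate from historical data):

```python
from statistics import NormalDist

def burst_threshold(mean_w, std_w, p):
    """Threshold f(w) for one window size w, assuming the moving sum of
    that window size is ~ Normal(mean_w, std_w) estimated from history:
    choose f(w) so that Pr[sum >= f(w)] = p, i.e. f(w) = F^-1(1 - p)."""
    return NormalDist(mu=mean_w, sigma=std_w).inv_cdf(1.0 - p)

# Hypothetical historical estimates for one window size:
f_w = burst_threshold(mean_w=100.0, std_w=10.0, p=0.001)
print(round(f_w, 2))
```

Repeating this for each window size w yields the whole threshold function f(w) from a single probability p.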
39. Summary
- Able to detect bursts of many different durations in essentially linear time.
- Can be used both for time series and for spatial searching.
- Can specify thresholds either with absolute numbers or with a probability of a hit.
- The algorithm is simple to implement and has low constants (code is available).
- OK, it's embarrassingly simple.
40. With a Little Help From My Warped Correlation
- Karen's humming: match
- Dennis's humming: match
- "What would you do if I sang out of tune?"
- Yunyue's humming: match
41. Related Work in Query by Humming
- Traditional method: string matching [Ghias et al. 95, McNab et al. 97, Uitdenbogerd and Zobel 99]
  - Music represented by a string of pitch directions: U, D, S (degenerated interval)
  - The hum query is segmented into discrete notes, then into a string of pitch directions
  - Edit distance between the hum query and the music score
  - Problem:
    - very hard to segment the hum query
    - partial solution: users are asked to hum articulately
- New method: matching directly from audio [Mazzoni and Dannenberg 00]
  - Problem: slowed down by DTW
42. Time Series Representation of Query
- An example hum query (waveform figure: "Segment this!")
- Note segmentation is hard!
43. How to Deal With Poor Hum Queries?
- No absolute pitch
  - Solution: the average pitch is subtracted
- Incorrect tempo
  - Solution: uniform time warping
- Inaccurate pitch intervals
  - Solution: return the k-nearest neighbors
- Local timing variations
  - Solution: dynamic time warping
44. Dynamic Time Warping
- Euclidean distance: sum of point-by-point distances.
- DTW distance: allows stretching or squeezing of the time axis locally.
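A minimal textbook DTW (not the paper's optimized variant): the dynamic program lets one point of either series match a run of points in the other, which is exactly the local stretching and squeezing described above.

```python
def dtw(x, y):
    """Classic O(len(x)*len(y)) dynamic time warping with
    absolute-difference point cost."""
    inf = float("inf")
    n, m = len(x), len(y)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # stretch x
                                 d[i][j - 1],      # stretch y
                                 d[i - 1][j - 1])  # one-to-one match
    return d[n][m]

a = [0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 1.0, 1.0, 2.0, 1.0, 0.0]  # same shape, locally stretched
print(dtw(a, b))  # 0.0: the warp absorbs the stretch
```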
45. Envelope Transform Using Piecewise Aggregate Approximation (PAA) [Keogh VLDB 02]
46. Envelope Transform Using Piecewise Aggregate Approximation (PAA)
- Advantage of tighter envelopes:
  - still no false negatives, and fewer false positives
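A sketch of PAA and the envelope transform under simplifying assumptions (series length divisible by the number of segments; the DTW envelope given as pointwise lower/upper bound arrays):

```python
def paa(x, segments):
    """Piecewise Aggregate Approximation: the mean of each of
    `segments` equal-length pieces of x."""
    step = len(x) // segments
    return [sum(x[i:i + step]) / step for i in range(0, len(x), step)]

def paa_envelope(lower, upper, segments):
    """Envelope transform: per-segment min of the lower bound and max of
    the upper bound, so the transformed envelope still contains the
    transformed series (no false negatives)."""
    step = len(lower) // segments
    lo = [min(lower[i:i + step]) for i in range(0, len(lower), step)]
    hi = [max(upper[i:i + step]) for i in range(0, len(upper), step)]
    return lo, hi

x = [1.0, 3.0, 2.0, 4.0, 6.0, 5.0, 7.0, 9.0]
print(paa(x, 4))  # [2.0, 3.0, 5.5, 8.0]
```

Taking min/max per segment (rather than the mean) is what keeps the transform lower-bounding; tightening the envelope before this step reduces the false positives without risking false negatives.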
47. Container-Invariant Envelope Transform
- Container-invariant: a transformation T for envelopes such that the transformed envelope of y still contains every transformed series that the original envelope of y contains.
- Theorem: if a transformation is container-invariant and lower-bounding, then the distance between a transformed time series x and the transformed envelope of y lower-bounds their DTW distance.
48. The Vision
- The ability to match time series quickly may open up entire new application areas, e.g. fast reaction to external events, music by humming, and so on.
- Main problems: accuracy, excessive specification.
- Reference (advert): High Performance Discovery in Time Series (Springer, 2004).