Title: High Performance Algorithms for Multiple Streaming Time Series
Slide 1: High Performance Algorithms for Multiple Streaming Time Series
- Xiaojian Zhao
- Advisor: Dennis Shasha
- Department of Computer Science
- Courant Institute of Mathematical Sciences
- New York University
- Jan. 10, 2006
Slide 2: Roadmap
- Motivation
- Incremental Uncooperative Time Series Correlation
- Incremental Matching Pursuit (MP) (optional)
- Future Work and Conclusion
Slide 3: Motivation (1)
- Financial time series streams are watched closely by millions of traders.
- Which pairs of stocks were correlated with a value of over 0.9 for the last three hours? Report this information every half hour. (Incremental pairwise correlation)
- How to form a portfolio consisting of a small set of stocks which replicates the market? Update it every hour. (Incremental matching pursuit)
Slide 4: Motivation (2)
- As processors speed up, one might think algorithmic efficiency no longer matters.
- True if problem sizes stay the same. But they don't.
- As processors speed up, sensors improve:
  - Satellites spew out more data each day
  - Magnetic resonance imagers give higher resolution images, etc.
Slide 5: High performance incremental algorithms
- Incremental Uncooperative Time Series Correlation
  - Monitor and report the correlation information among all time series incrementally (e.g. every half hour)
  - Improve the efficiency from quadratic to super-linear
- Incremental Matching Pursuit (MP)
  - Monitor and report the approximation vectors of matching pursuit incrementally (e.g. every hour)
  - Improve the efficiency significantly
Slide 6: Incremental Uncooperative Time Series Correlation
Slide 7: Problem statement
- Detect and report the correlation incrementally and rapidly
- Extend the algorithm into a general engine
- Apply it in practical application domains
Slide 8: Online detection of high correlation
Slide 9: Pearson correlation and Euclidean distance
- Normalized Euclidean distance <-> Pearson correlation
- Normalization: x_hat_i = (x_i - mean(x)) / ||x - mean(x)||, and likewise for y
- dist^2(x_hat, y_hat) = 2 (1 - correlation(x, y))
- From now on, we will not differentiate between correlation and Euclidean distance
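To make the identity concrete, here is a minimal NumPy check, assuming normalization to zero mean and unit norm (the function name is illustrative):

```python
import numpy as np

def normalize(x):
    """Shift to zero mean and scale to unit norm."""
    x = x - x.mean()
    return x / np.linalg.norm(x)

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
y = rng.standard_normal(256)

corr = np.corrcoef(x, y)[0, 1]                 # Pearson correlation
d2 = np.sum((normalize(x) - normalize(y))**2)  # squared Euclidean distance

assert np.isclose(d2, 2 * (1 - corr))          # dist^2 = 2(1 - correlation)
```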
Slide 10: Naïve approach: pairwise correlation
- Given a group of time series, compute the pairwise correlation
- Time: O(W * N^2), where
  - N = number of streams
  - W = window size (e.g. the past one hour)
Let's see high performance algorithms!
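As a baseline, a direct NumPy version of the quadratic scan might look like this (the 0.9 threshold echoes the motivating query; the function name is illustrative):

```python
import numpy as np

def naive_pairwise(X, threshold=0.9):
    """X: N x W array (N streams over a window of W points).
    Returns highly correlated pairs; O(W * N^2) work."""
    C = np.corrcoef(X)                        # all N^2 correlations
    i, j = np.where(np.triu(C, k=1) > threshold)
    return list(zip(i, j))
```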
Slide 11: Technical review
- Framework: GEMINI
- Tools: data reduction techniques
  - Deterministic (orthogonal) vs. randomized
  - Fourier Transform, Wavelet Transform, and Random Projection
- Target: various data
  - Cooperative vs. uncooperative
Slide 12: GEMINI Framework
Data reduction, e.g. DFT, DWT, SVD
Faloutsos, C., Ranganathan, M., and Manolopoulos, Y. Fast subsequence matching in time-series databases. SIGMOD 1994.
Slide 13: GEMINI: an example
- Objective: find the nearest neighbor (L2-norm) of each time series.
- Compute the Fourier Transform of each series; e.g. X and Y yield two coefficient vectors Xf = (a1, a2, ..., ak) and Yf = (b1, b2, ..., bk)
- Original distance vs. coefficient distance (Parseval's Theorem): the orthonormal transform preserves L2 distances, so the distance over a prefix of coefficients is a lower bound on the true distance
- Because, for some data types, energy concentrates in the first few frequency components, the coefficient distance can work as a very good filter and at the same time guarantee no false negatives
- The coefficient vectors may be stored in a tree or grid structure
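A small sketch of this filtering step in NumPy (function name and the choice k = 8 are illustrative):

```python
import numpy as np

def dft_prefix_distance(x, y, k=8):
    """Lower bound on ||x - y||_2 from the first k DFT coefficients.
    With the orthonormal DFT, Parseval gives ||x - y|| = ||Xf - Yf||,
    so truncating to k coefficients can only shrink the distance:
    a filter with no false negatives."""
    Xf = np.fft.fft(x, norm="ortho")[:k]
    Yf = np.fft.fft(y, norm="ortho")[:k]
    return np.linalg.norm(Xf - Yf)
```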
Slide 14: DFT on random walk
Slide 15: Review: DFT/DWT vs. Random Projection
- Fourier Transform, Wavelet Transform, and SVD
  - A set of orthogonal basis vectors (deterministic)
  - Based on Parseval's Theorem
- Random Projection
  - A set of random basis vectors (non-deterministic)
  - Based on the Johnson-Lindenstrauss (JL) Lemma
[Figure: orthogonal basis vs. random basis]
Slide 16: Review: Random Projection Intuition
- You are walking in a sparse forest and you are lost.
- You have an outdated cell phone without a GPS (no latitude/altitude).
- You want to know if you are close to your friend.
- You identify yourself as 100 meters from Best Buy, 200 meters from a silver building, etc.
- If your friend is at similar distances from several of these landmarks, you might be close to one another.
- Random projections are analogous to these distances to landmarks.
Slide 17: Random Projection
[Figure: each sketch entry is the inner product of the time series with one random vector]
Sketch: the vector of outputs returned by random projection
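A minimal NumPy sketch of this step, assuming the +1/-1 random vectors used later in the talk (function name and default sizes are illustrative):

```python
import numpy as np

def sketch(x, k=30, seed=0):
    """Project a length-W time series onto k random +/-1 vectors.
    The length-k sketch approximately preserves pairwise L2
    distances after scaling by 1/sqrt(k)."""
    rng = np.random.default_rng(seed)   # same seed -> same basis for all series
    R = rng.choice([-1.0, 1.0], size=(k, x.shape[0]))
    return R @ x / np.sqrt(k)
```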
Slide 18: Review: Sketch Guarantees
- Johnson-Lindenstrauss (JL) Lemma
  - For any 0 < eps < 1 and any integer n, let k be a positive integer such that k >= 4 (eps^2/2 - eps^3/3)^(-1) ln n
  - Then for any set V of n points in R^d, there is a map f: R^d -> R^k such that for all u, v in V:
    (1 - eps) ||u - v||^2 <= ||f(u) - f(v)||^2 <= (1 + eps) ||u - v||^2
  - Further, this map can be found in randomized polynomial time
- W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemp. Math., 26:189-206, 1984.
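As a worked example of the bound (a small helper, not from the talk):

```python
import math

def jl_dim(n, eps):
    """Smallest k satisfying the JL bound k >= 4 ln n / (eps^2/2 - eps^3/3)."""
    return math.ceil(4 * math.log(n) / (eps**2 / 2 - eps**3 / 3))

# e.g. 10,000 points preserved within 10% distortion:
# jl_dim(10_000, 0.1) -> about 7,900 dimensions; a looser eps shrinks k fast
```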
Slide 19: Empirical study: sketch approximation
Time series length = 256 and sketch size = 30
Slide 20: Empirical study: sketch approximation
Slide 21: Empirical study: sketch distance / real distance
[Figure: ratio of sketch distance to real distance for sketch sizes 30, 80, and 1000]
Slide 22: Data classification
- Cooperative
  - Time series exhibiting a fundamental degree of regularity, allowing them to be represented by the first few coefficients in the spectral space with little loss of information
  - Example: stock prices (random walk)
  - Tools: Fourier Transform, Wavelet Transform, SVD
- Uncooperative
  - Time series whose energy is not concentrated in only a few frequency components
  - Example: stock returns (r_t = (p_t - p_{t-1}) / p_{t-1})
  - Tool: Random Projection
Slide 23: DFT on random walk and white noise
[Figure: DFT energy spectra; random walk is cooperative, white noise is uncooperative]
Slide 24: Approximation Power: SVD Distance vs. Sketch Distance
- Note: SVD is superior to DFT and DWT in approximation power.
- But all of them are bad for uncooperative data.
- Here sketch size = 32 and SVD coefficient number = 30
Slide 25: Our new algorithm
- The big picture of the system
- Structured random vectors (new)
- Computing sketches by structured convolution (new)
- Optimizing in the parameter space (new)
- Empirical study
- Richard Cole, Dennis Shasha, and Xiaojian Zhao. Fast Window Correlations Over Uncooperative Time Series. SIGKDD 2005.
Slide 26: Big Picture
[Figure: pipeline. Time series 1..n --(data reduction: random projection)--> sketches 1..n --(filtering: grid structure)--> correlated pairs]
Slide 27: Our objective, revisited
- Monitor and report the correlation periodically, e.g. every half hour
- We chose Random Projection as the means to reduce the data dimension
- Each time series is examined over a time window.
- This time window slides forward as time goes on.
Slide 28: Definitions: sliding window and basic window
[Figure: stocks 1..n along a time axis; each sliding window (sw) of 8 time points is partitioned into basic windows (bw) of 2 time points]
Sliding window size = 8; basic window size = 2
Example: every half hour (bw), report the correlation of the last three hours (sw)
Slide 29: Random vector and naïve random projection
- Choose sw random numbers at random to form a random vector R = (r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12)
- The inner product starts from each data point:
  - xsk1 = (x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12) . R
  - xsk2 = (x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13) . R
  - xsk3 = (x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14) . R
  - ...
- We improve on this in two ways:
  - Partition a random vector of length sw into several basic windows
  - Use convolution instead of inner products
Slide 30: How to construct a random vector
- Construct a random vector of +1/-1 entries of length sw.
- Suppose the sliding window size = 12 and the basic window size = 4
- The random vector within a basic window is Rbw = (1, 1, -1, 1)
- A control vector b = (1, -1, 1)
- The final complete random vector for a sliding window is then
  R = (Rbw, -Rbw, Rbw) = (1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, 1)
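A small sketch of this construction (function name is illustrative); with sw = 12 and bw = 4 it reproduces the shape of the example above:

```python
import numpy as np

def structured_random_vector(sw, bw, seed=0):
    """Build a length-sw +/-1 random vector from one random basic-window
    vector Rbw (length bw) and a control vector b (length sw // bw):
    R = (b[0]*Rbw, b[1]*Rbw, ...). Only bw + sw//bw random bits are needed."""
    rng = np.random.default_rng(seed)
    Rbw = rng.choice([-1.0, 1.0], size=bw)
    b = rng.choice([-1.0, 1.0], size=sw // bw)
    return np.concatenate([bi * Rbw for bi in b]), Rbw, b
```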
Slide 31: Naive algorithm and the hope for improvement
r = (1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, 1)
x = (x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12)
Dot product:
xsk = r . x = x1 + x2 - x3 + x4 - x5 - x6 + x7 - x8 + x9 + x10 - x11 + x12
When new data points arrive, this operation is done again:
r = (1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, 1)
x = (x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15, x16)
xsk = r . x = x5 + x6 - x7 + x8 - x9 - x10 + x11 - x12 + x13 + x14 - x15 + x16
- There is redundancy in the second dot product, given the first one.
- We will eliminate the repeated computation to save time.
Slide 32: Our algorithm
- All the operations are over basic windows
- Pad Rbw with bw - 1 zeros, then convolve it with each basic window of data Xbw:
conv1 = (1, 1, -1, 1, 0, 0, 0) * (x1, x2, x3, x4)
conv2 = (1, 1, -1, 1, 0, 0, 0) * (x5, x6, x7, x8)
conv3 = (1, 1, -1, 1, 0, 0, 0) * (x9, x10, x11, x12)
[Animation: the convolution slides the basic window across the padded random vector, producing in turn x4, x4 + x3, x2 + x3 - x4, x1 + x2 - x3 + x4, x1 - x2 + x3, x2 - x1, x1]
Slide 33: Our algorithm: example
First convolution:  x4, x4 + x3, x2 + x3 - x4, x1 + x2 - x3 + x4, x1 - x2 + x3, x2 - x1, x1
Second convolution: x8, x8 + x7, x6 + x7 - x8, x5 + x6 - x7 + x8, x5 - x6 + x7, x6 - x5, x5
Third convolution:  x12, x12 + x11, x10 + x11 - x12, x9 + x10 - x11 + x12, x9 - x10 + x11, x10 - x9, x9
- xsk1 = (x1 + x2 - x3 + x4) - (x5 + x6 - x7 + x8) + (x9 + x10 - x11 + x12)
- xsk2 = (x2 + x3 - x4 + x5) - (x6 + x7 - x8 + x9) + (x10 + x11 - x12 + x13)
Slide 34: Our algorithm: example
First sliding window:
sk1 = (x1 + x2 - x3 + x4), sk5 = (x5 + x6 - x7 + x8), sk9 = (x9 + x10 - x11 + x12)
xsk1 = (x1 + x2 - x3 + x4) - (x5 + x6 - x7 + x8) + (x9 + x10 - x11 + x12), with b = (1, -1, 1)
Second sliding window:
sk2 = (x2 + x3 - x4) + (x5), sk6 = (x6 + x7 - x8) + (x9), sk10 = (x10 + x11 - x12) + (x13)
Summing up: xsk2 = (x2 + x3 - x4 + x5) - (x6 + x7 - x8 + x9) + (x10 + x11 - x12 + x13), again with b = (1, -1, 1)
Note that (sk1, sk5, sk9) . (b1, b2, b3) is an inner product
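A runnable Python rendering of the pointwise scheme for one random vector. np.correlate stands in for the zero-padded convolution of slide 32; the FFT-based convolution the talk uses for the O(log bw) bound is omitted for clarity, and all names are illustrative:

```python
import numpy as np

def pointwise_sketches(x, Rbw, b):
    """Sketch of every sliding-window position, using one correlation
    per basic window instead of one full dot product per point.
    x: 1-D series (len(x) % bw == 0 assumed), Rbw: length-bw +/-1
    vector, b: control vector (one entry per basic window)."""
    bw, nb = len(Rbw), len(b)
    sw = bw * nb
    blocks = x.reshape(-1, bw)
    # full cross-correlation of each block with Rbw: length 2*bw - 1
    c = np.array([np.correlate(blk, Rbw, mode="full") for blk in blocks])
    out = []
    for start in range(len(x) - sw + 1):
        m, o = divmod(start, bw)          # block index, offset within block
        total = 0.0
        for j in range(nb):               # one chunk per basic window
            chunk = c[m + j][bw - 1 + o]  # tail of block m+j
            if o > 0:
                chunk += c[m + j + 1][o - 1]  # head of block m+j+1
            total += b[j] * chunk
        out.append(total)
    return np.array(out)
```

With Rbw = (1, 1, -1, 1) and b = (1, -1, 1), the first two outputs reproduce xsk1 and xsk2 from the example above.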
Slide 35: Basic window version
- If time series are highly correlated between two consecutive data points, we may instead compute the sketch only once per basic window.
- That is, we update the sketch for each time series only when the data of a complete basic window arrive. No convolution, only an inner product.
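A minimal sketch of this cheaper variant; the class name and buffering scheme are assumptions:

```python
import numpy as np
from collections import deque

class BasicWindowSketch:
    """Update one sketch entry once per basic window: each new basic
    window costs one inner product with Rbw; the chunk products of
    older basic windows are reused."""
    def __init__(self, Rbw, b):
        self.Rbw, self.b = Rbw, b
        self.chunks = deque(maxlen=len(b))   # Rbw . (basic window) history

    def update(self, new_bw):
        self.chunks.append(float(np.dot(self.Rbw, new_bw)))
        if len(self.chunks) == len(self.b):  # full sliding window available
            return float(np.dot(self.b, np.array(self.chunks)))
        return None
```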
Slide 36: Overview of our new algorithm
- The projection of a sliding window is decomposed into operations over basic windows
- Each basic window is convolved (or inner-producted) with each random vector only once
- We may provide sketches starting from each data point, or starting from the beginning of each basic window.
- There is no redundancy.
Slide 37: Performance comparison
- Naïve algorithm
  - For each datum and random vector:
    - O(sw) integer additions
- Pointwise version
  - Asymptotically, for each datum and random vector:
    - O(sw/bw) integer additions
    - O(log bw) floating point operations (using the FFT to compute convolutions)
- Basic window version
  - Asymptotically, for each datum and random vector:
    - O(sw/bw^2) integer additions
Slide 38: Big picture revisited
[Figure: pipeline. Time series 1..n --(random projection)--> sketches 1..n --(filtering: grid structure)--> correlated pairs]
So far we have reduced the data dimension efficiently. Next, how can the sketches be used as a filter?
Slide 39: How to use the sketch distance as a filter
- Naive method: compute the sketch distance between every pair
- Pairs that are close in sketch distance are likely to be close in the original distance (JL Lemma)
- Finally, any close pair is double-checked against the original data.
Slide 40: Use the sketch distance as a filter
- But we do not use this naive method. Why? It is expensive.
- We would still have to do the pairwise comparison between each pair of stocks, which is O(N^2 * k), where k is the size of the sketches, e.g. typically 30, 40, etc.
- Let's see our new strategy
Slide 41: Our method: sketch unit distance
Given the sketches of two time series, partition each sketch vector into distance chunks and compare the pair chunk by chunk.
If at least a fraction f of the distance chunks place the pair within c times the distance threshold, we may say the pair is within the threshold, where f = 30%, 40%, 50%, or 60% and c = 0.8, 0.9, or 1.1.
Slide 42: Further: sketch groups
We may group sketch entries and compute one distance per sketch group. This is reminiscent of a grid structure.
For example: if at least a fraction f of the sketch groups say two series are close, we may say the pair is close.
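A sketch of such a group-distance filter under the parameter ranges above. The exact form of the per-group test is an assumption; the talk tunes g, c, and f as described on slide 44:

```python
import numpy as np

def sketch_group_filter(xsk, ysk, d, g=2, c=1.1, f=0.5):
    """Declare a pair 'possibly close' if at least a fraction f of the
    sketch groups (size g) lie within c * d of each other. Survivors
    are verified against the raw data, so false positives are cheap
    while the fraction test keeps false negatives rare."""
    groups = len(xsk) // g
    xg = xsk[:groups * g].reshape(groups, g)
    yg = ysk[:groups * g].reshape(groups, g)
    dist = np.linalg.norm(xg - yg, axis=1)   # one distance per group
    return np.mean(dist <= c * d) >= f
```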
Slide 43: Grid structure
- To avoid checking all pairs, we can use a grid structure and look only in the neighborhood; this returns a superset of the highly correlated pairs.
- The pairs labeled as potential are double-checked using the raw data vectors.
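A toy grid lookup over the first few sketch coordinates; the cell size and dimensionality are illustrative, not the talk's settings:

```python
import numpy as np
from collections import defaultdict
from itertools import product

def grid_candidates(sketches, cell=1.0, dims=2):
    """Hash the first `dims` sketch entries of each series into a grid of
    side `cell`; only pairs in the same or adjacent cells become
    candidate pairs (a superset of the truly close pairs)."""
    grid = defaultdict(list)
    for i, s in enumerate(sketches):
        grid[tuple(np.floor(s[:dims] / cell).astype(int))].append(i)
    cands = set()
    for key, ids in grid.items():
        for off in product((-1, 0, 1), repeat=dims):   # neighborhood cells
            for j in grid.get(tuple(k + o for k, o in zip(key, off)), []):
                for i in ids:
                    if i < j:
                        cands.add((i, j))
    return cands
```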
Slide 44: Optimization in parameter space
- How do we choose the parameters g, c, f, N?
  - N: total number of sketches
  - g: group size
  - c: the factor of distance
  - f: the fraction of groups which are necessary to claim that two time series are close enough
- We will choose the best setting to apply to the practical data. But how? An engineering problem:
  - Combinatorial Design (CD)
  - Bootstrapping
Now, let's put it all together.
Slide 45: Inner product with random vectors
[Figure: inner products with random vectors r1, r2, r3, r4, r5, r6]
Slide 46: [Figure only, no transcript]
Slide 47: Empirical study: various data sources
- Cstr: continuous stirred tank reactor
- Fortal_ecg: cutaneous potential recordings of a pregnant woman
- Steamgen: model of a steam generator at Abbott Power Plant in Champaign, IL
- Winding: data from a test setup of an industrial winding process
- Evaporator: data from an industrial evaporator
- Wind: daily average wind speeds for 1961-1978 at 12 synoptic meteorological stations in the Republic of Ireland
- Spot_exrates: spot foreign currency exchange rates
- EEG: electroencephalogram
Slide 48: Empirical study: performance comparison
Sliding window = 3616, basic window = 32, and sketch size = 60
Slide 49: Section conclusion
- How to perform data reduction over uncooperative time series efficiently, in contrast to the well-established methods for cooperative data
- How to cope with medium-size sketch vectors systematically
  - Sketch vector partitioning, grid structure
- Parameter space optimization by combinatorial design and bootstrapping
- Many of these ideas can be extended to other applications
Slide 50: Incremental Matching Pursuit (MP)
Slide 51: Problem Statement
- Imagine a scenario where a group of representative stocks is chosen to form an index, e.g. the Standard and Poor's (S&P) 500.
Target vector: the sum of all the stock vectors, weighted by their capitalizations.
Candidate pool: all the stock price vectors in the market.
Objective: find in the candidate pool a small group of vectors representing the target vector.
Slide 52: Vanilla Matching Pursuit (MP)
- Greedily select a linear combination of vectors from a dictionary to approximate a target vector:
  1. Set i = 1
  2. Search the pool V and find the vector vi whose angle with respect to the target vector vt is minimal (i.e., whose cosine is maximal)
  3. Compute the residue r = vt - ci * vi, where ci = <vt, vi> / <vi, vi>; add vi to the result: VA = VA + {vi}
  4. If ||r|| < the error tolerance, terminate and return VA
  5. Else set i = i + 1 and vt = r; go back to step 2
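A compact NumPy rendering of these five steps; the signature, tolerance, and iteration cap are illustrative:

```python
import numpy as np

def matching_pursuit(vt, V, tol=1e-3, max_iter=50):
    """Greedy MP: repeatedly pick the pool vector most parallel to the
    current residue and subtract its projection.
    vt: target vector; V: pool as a (num_vectors x dim) array.
    Returns the chosen indices and their coefficients."""
    r = vt.astype(float).copy()
    chosen, coeffs = [], []
    norms = np.linalg.norm(V, axis=1)
    for _ in range(max_iter):
        cosines = np.abs(V @ r) / (norms * np.linalg.norm(r) + 1e-12)
        i = int(np.argmax(cosines))          # smallest angle with residue
        c = (V[i] @ r) / (V[i] @ V[i])       # projection coefficient
        r = r - c * V[i]                     # update residue
        chosen.append(i); coeffs.append(c)
        if np.linalg.norm(r) < tol:
            break
    return chosen, coeffs
```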
Slide 53: [Figure: geometry of matching pursuit; the target vt is approximated by greedily chosen pool vectors v1, v2, v3]
Slide 54: The incremental setting
- Time granularity revisited:
  Basic window = a sequence of unit time points
  Sliding window = several consecutive basic windows
  The sliding window slides once per basic window
- Recomputing the representative vectors entirely for each sliding window is wasteful, since there may be a trend between consecutive sliding windows
Xiaojian Zhao, Xin Zhang, Tyler Neylon, and Dennis Shasha. Incremental Methods for Simple Problems in Time Series: algorithms and experiments. IDEAS 2005.
Slide 55: First idea: reuse vectors
- The representative vectors may change only slightly, in both their components and their order
- True only if the basic window is sufficiently small, e.g. 2 or 3 time points
- However, any newly introduced representative vector may alter the entire tail of the approximation path
- The relative importance of the same representative vector may differ a lot from one sliding window to the next
Slide 56: Two insightful observations
- The representative vectors are likely to remain the same across a few sliding windows, though their order may change
- The vector of angles stays quite consistent, i.e. (cos θ1, cos θ2, cos θ3, ...)
  - Here cos θi is the cosine of the angle between the ith residue and the vector selected at that round.
  - An example is (0.9, 0.8, 0.7, 0.7, 0.6, 0.6, 0.6, ...)
Slide 57: Angle space exploration (cos θ)
- Whenever a vector is found whose cos θ is larger than some threshold, choose that vector.
- If there is no such vector, the vector with the largest cos θ is selected as the representative vector at this round.
Slide 58: Second idea: cache good vectors
- The representative vectors appearing in the last several sliding windows form a cache C
- The search for a representative vector starts from C; if no suitable vector is found there, fall back to the whole pool V
- Works well in practice.
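Combining the angle threshold (slide 57) with the cache (slide 58), a hypothetical incremental search might look like the sketch below; the threshold value and fallback policy are assumptions:

```python
import numpy as np

def incremental_mp(vt, V, cache, thresh=0.6, tol=1e-3, max_iter=50):
    """Incremental MP sketch: at each round, scan the cached indices
    first and take the first vector whose cosine with the residue
    exceeds `thresh`; otherwise fall back to a full scan of pool V
    and take the best vector."""
    r = vt.astype(float).copy()
    chosen = []
    for _ in range(max_iter):
        pick = None
        for i in cache:                      # try cached vectors first
            cos = abs(V[i] @ r) / (np.linalg.norm(V[i]) * np.linalg.norm(r))
            if cos > thresh:
                pick = i
                break
        if pick is None:                     # full scan of the pool
            cosines = np.abs(V @ r) / (np.linalg.norm(V, axis=1)
                                       * np.linalg.norm(r))
            pick = int(np.argmax(cosines))
        c = (V[pick] @ r) / (V[pick] @ V[pick])
        r = r - c * V[pick]                  # update residue
        chosen.append(pick)
        if np.linalg.norm(r) < tol:
            break
    return chosen
```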
Slide 59: Empirical study: time comparison
Slide 60: Empirical study: approximation power comparison
Slide 61: Future Work and Conclusion
Slide 62: Future work: Anomaly Detection
- Measure the relative distance of each point from its nearest neighbors
- Our approach may serve as a monitor by reporting those points far from any normal points
Slide 63: Conclusion
- Motivation
- Introduce the concept of cooperative vs. uncooperative time series
- Propose a set of strategies dealing with different data (random projection, structured convolution, combinatorial design, bootstrapping, grid structure)
- Explore various incremental schemes
  - Filter away obvious irrelevancies
  - Reuse previous results
- Future Work
Slide 64: Thanks a lot!