Title: High Performance Algorithms for Multiple Streaming Time Series
Slide 1: High Performance Algorithms for Multiple Streaming Time Series
- Xiaojian Zhao
- Advisor: Dennis Shasha
- Department of Computer Science
- Courant Institute of Mathematical Sciences
- New York University
- Jan. 10, 2006
Slide 2: Roadmap
- Motivation
- Incremental Uncooperative Time Series Correlation
- Incremental Matching Pursuit (MP) (optional)
- Future Work and Conclusion
Slide 3: Motivation (1)
- Financial time series streams are watched closely by millions of traders.
- Which pairs of stocks were correlated with a value of over 0.9 for the last three hours? Report this information every half hour. (Incremental pairwise correlation)
- How to form a portfolio consisting of a small set of stocks which replicates the market? Update it every hour. (Incremental matching pursuit)
Slide 4: Motivation (2)
- As processors speed up, one might think algorithmic efficiency no longer matters.
- True if problem sizes stay the same. But they don't.
- As processors speed up, sensors improve:
  - Satellites spew out more data each day
  - Magnetic resonance imagers give higher resolution images, etc.
Slide 5: High performance incremental algorithms
- Incremental Uncooperative Time Series Correlation
  - Monitor and report the correlation information among all time series incrementally (e.g. every half hour)
  - Improve the efficiency from quadratic to super-linear
- Incremental Matching Pursuit (MP)
  - Monitor and report the approximation vectors of matching pursuit incrementally (e.g. every hour)
  - Improve the efficiency significantly
Slide 6: Incremental Uncooperative Time Series Correlation
Slide 7: Problem statement
- Detect and report the correlation incrementally and rapidly
- Extend the algorithm into a general engine
- Apply it in practical application domains
Slide 8: Online detection of high correlation
Slide 9: Pearson correlation and Euclidean distance
- Normalized Euclidean distance <-> Pearson correlation
- Normalization: x_hat_i = (x_i - mean(x)) / ||x - mean(x)||, and likewise for y
- dist^2(x_hat, y_hat) = 2 (1 - correlation(x, y))
- From now on, we will not differentiate between correlation and Euclidean distance
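To make the identity concrete, here is a minimal NumPy check, assuming normalization to zero mean and unit norm (the function name is illustrative):

```python
import numpy as np

def normalize(x):
    """Shift to zero mean and scale to unit norm."""
    x = x - x.mean()
    return x / np.linalg.norm(x)

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
y = rng.standard_normal(256)

corr = np.corrcoef(x, y)[0, 1]                 # Pearson correlation
d2 = np.sum((normalize(x) - normalize(y))**2)  # squared Euclidean distance

assert np.isclose(d2, 2 * (1 - corr))          # dist^2 = 2(1 - correlation)
```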
Slide 10: Naïve approach: pairwise correlation
- Given a group of time series, compute the pairwise correlation
- Time: O(W * N^2), where
  - N = number of streams
  - W = window size (e.g. the past one hour)
Let's see high performance algorithms!
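As a baseline, a direct NumPy version of the quadratic scan might look like this (the 0.9 threshold echoes the motivating query; the function name is illustrative):

```python
import numpy as np

def naive_pairwise(X, threshold=0.9):
    """X: N x W array (N streams over a window of W points).
    Returns highly correlated pairs; O(W * N^2) work."""
    C = np.corrcoef(X)                        # all N^2 correlations
    i, j = np.where(np.triu(C, k=1) > threshold)
    return list(zip(i, j))
```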
Slide 11: Technical review
- Framework: GEMINI
- Tools: data reduction techniques
  - Deterministic (orthogonal) vs. randomized
  - Fourier Transform, Wavelet Transform, and Random Projection
- Target: various data
  - Cooperative vs. uncooperative
Slide 12: GEMINI Framework
Data reduction, e.g. DFT, DWT, SVD
Faloutsos, C., Ranganathan, M., and Manolopoulos, Y. Fast subsequence matching in time-series databases. SIGMOD 1994.
Slide 13: GEMINI: an example
- Objective: find the nearest neighbor (L2-norm) of each time series.
- Compute the Fourier Transform of each series; e.g. X and Y yield two coefficient vectors Xf = (a1, a2, ..., ak) and Yf = (b1, b2, ..., bk)
- Original distance vs. coefficient distance (Parseval's Theorem): the orthonormal transform preserves L2 distances, so the distance over a prefix of coefficients is a lower bound on the true distance
- Because, for some data types, energy concentrates in the first few frequency components, the coefficient distance can work as a very good filter and at the same time guarantee no false negatives
- The coefficient vectors may be stored in a tree or grid structure
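A small sketch of this filtering step in NumPy (function name and the choice k = 8 are illustrative):

```python
import numpy as np

def dft_prefix_distance(x, y, k=8):
    """Lower bound on ||x - y||_2 from the first k DFT coefficients.
    With the orthonormal DFT, Parseval gives ||x - y|| = ||Xf - Yf||,
    so truncating to k coefficients can only shrink the distance:
    a filter with no false negatives."""
    Xf = np.fft.fft(x, norm="ortho")[:k]
    Yf = np.fft.fft(y, norm="ortho")[:k]
    return np.linalg.norm(Xf - Yf)
```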
Slide 14: DFT on random walk
Slide 15: Review: DFT/DWT vs. Random Projection
- Fourier Transform, Wavelet Transform, and SVD
  - A set of orthogonal basis vectors (deterministic)
  - Based on Parseval's Theorem
- Random Projection
  - A set of random basis vectors (non-deterministic)
  - Based on the Johnson-Lindenstrauss (JL) Lemma
[Figure: orthogonal basis vs. random basis]
Slide 16: Review: Random Projection Intuition
- You are walking in a sparse forest and you are lost.
- You have an outdated cell phone without a GPS (no latitude/altitude).
- You want to know if you are close to your friend.
- You identify yourself as 100 meters from Best Buy, 200 meters from a silver building, etc.
- If your friend is at similar distances from several of these landmarks, you might be close to one another.
- Random projections are analogous to these distances to landmarks.
Slide 17: Random Projection
[Figure: each sketch entry is the inner product of the time series with one random vector]
Sketch: the vector of outputs returned by random projection
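A minimal NumPy sketch of this step, assuming the +1/-1 random vectors used later in the talk (function name and default sizes are illustrative):

```python
import numpy as np

def sketch(x, k=30, seed=0):
    """Project a length-W time series onto k random +/-1 vectors.
    The length-k sketch approximately preserves pairwise L2
    distances after scaling by 1/sqrt(k)."""
    rng = np.random.default_rng(seed)   # same seed -> same basis for all series
    R = rng.choice([-1.0, 1.0], size=(k, x.shape[0]))
    return R @ x / np.sqrt(k)
```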
Slide 18: Review: Sketch Guarantees
- Johnson-Lindenstrauss (JL) Lemma
  - For any 0 < eps < 1 and any integer n, let k be a positive integer such that k >= 4 (eps^2/2 - eps^3/3)^(-1) ln n
  - Then for any set V of n points in R^d, there is a map f: R^d -> R^k such that for all u, v in V:
    (1 - eps) ||u - v||^2 <= ||f(u) - f(v)||^2 <= (1 + eps) ||u - v||^2
  - Further, this map can be found in randomized polynomial time
- W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemp. Math., 26:189-206, 1984.
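As a worked example of the bound (a small helper, not from the talk):

```python
import math

def jl_dim(n, eps):
    """Smallest k satisfying the JL bound k >= 4 ln n / (eps^2/2 - eps^3/3)."""
    return math.ceil(4 * math.log(n) / (eps**2 / 2 - eps**3 / 3))

# e.g. 10,000 points preserved within 10% distortion:
# jl_dim(10_000, 0.1) -> about 7,900 dimensions; a looser eps shrinks k fast
```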
Slide 19: Empirical study: sketch approximation
Time series length = 256 and sketch size = 30
Slide 20: Empirical study: sketch approximation
Slide 21: Empirical study: sketch distance / real distance
[Figure: ratio of sketch distance to real distance for sketch sizes 30, 80, and 1000]
Slide 22: Data classification
- Cooperative
  - Time series exhibiting a fundamental degree of regularity, allowing them to be represented by the first few coefficients in the spectral space with little loss of information
  - Example: stock prices (random walk)
  - Tools: Fourier Transform, Wavelet Transform, SVD
- Uncooperative
  - Time series whose energy is not concentrated in only a few frequency components
  - Example: stock returns (r_t = (p_t - p_{t-1}) / p_{t-1})
  - Tool: Random Projection
Slide 23: DFT on random walk and white noise
[Figure: DFT energy spectra; random walk is cooperative, white noise is uncooperative]
Slide 24: Approximation Power: SVD Distance vs. Sketch Distance
- Note: SVD is superior to DFT and DWT in approximation power.
- But all of them are bad for uncooperative data.
- Here sketch size = 32 and SVD coefficient number = 30
Slide 25: Our new algorithm
- The big picture of the system
- Structured random vectors (new)
- Computing sketches by structured convolution (new)
- Optimizing in the parameter space (new)
- Empirical study
- Richard Cole, Dennis Shasha, and Xiaojian Zhao. Fast Window Correlations Over Uncooperative Time Series. SIGKDD 2005.
Slide 26: Big Picture
[Figure: pipeline. Time series 1..n --(data reduction: random projection)--> sketches 1..n --(filtering: grid structure)--> correlated pairs]
Slide 27: Our objective, revisited
- Monitor and report the correlation periodically, e.g. every half hour
- We chose Random Projection as the means to reduce the data dimension
- Each time series is examined over a time window.
- This time window slides forward as time goes on.
Slide 28: Definitions: sliding window and basic window
[Figure: stocks 1..n along a time axis; each sliding window (sw) of 8 time points is partitioned into basic windows (bw) of 2 time points]
Sliding window size = 8; basic window size = 2
Example: every half hour (bw), report the correlation of the last three hours (sw)
Slide 29: Random vector and naïve random projection
- Choose sw random numbers at random to form a random vector R = (r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12)
- The inner product starts from each data point:
  - xsk1 = (x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12) . R
  - xsk2 = (x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13) . R
  - xsk3 = (x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14) . R
  - ...
- We improve on this in two ways:
  - Partition a random vector of length sw into several basic windows
  - Use convolution instead of inner products
Slide 30: How to construct a random vector
- Construct a random vector of +1/-1 entries of length sw.
- Suppose the sliding window size = 12 and the basic window size = 4
- The random vector within a basic window is Rbw = (1, 1, -1, 1)
- A control vector b = (1, -1, 1)
- The final complete random vector for a sliding window is then
  R = (Rbw, -Rbw, Rbw) = (1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, 1)
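A small sketch of this construction (function name is illustrative); with sw = 12 and bw = 4 it reproduces the shape of the example above:

```python
import numpy as np

def structured_random_vector(sw, bw, seed=0):
    """Build a length-sw +/-1 random vector from one random basic-window
    vector Rbw (length bw) and a control vector b (length sw // bw):
    R = (b[0]*Rbw, b[1]*Rbw, ...). Only bw + sw//bw random bits are needed."""
    rng = np.random.default_rng(seed)
    Rbw = rng.choice([-1.0, 1.0], size=bw)
    b = rng.choice([-1.0, 1.0], size=sw // bw)
    return np.concatenate([bi * Rbw for bi in b]), Rbw, b
```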
Slide 31: Naive algorithm and the hope for improvement
r = (1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, 1)
x = (x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12)
Dot product:
xsk = r . x = x1 + x2 - x3 + x4 - x5 - x6 + x7 - x8 + x9 + x10 - x11 + x12
When new data points arrive, this operation is done again:
r = (1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, 1)
x = (x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15, x16)
xsk = r . x = x5 + x6 - x7 + x8 - x9 - x10 + x11 - x12 + x13 + x14 - x15 + x16
- There is redundancy in the second dot product, given the first one.
- We will eliminate the repeated computation to save time.
Slide 32: Our algorithm
- All the operations are over basic windows
- Pad Rbw with bw - 1 zeros, then convolve it with each basic window of data Xbw:
conv1 = (1, 1, -1, 1, 0, 0, 0) * (x1, x2, x3, x4)
conv2 = (1, 1, -1, 1, 0, 0, 0) * (x5, x6, x7, x8)
conv3 = (1, 1, -1, 1, 0, 0, 0) * (x9, x10, x11, x12)
[Animation: the convolution slides the basic window across the padded random vector, producing in turn x4, x4 + x3, x2 + x3 - x4, x1 + x2 - x3 + x4, x1 - x2 + x3, x2 - x1, x1]
Slide 33: Our algorithm: example
First convolution:  x4, x4 + x3, x2 + x3 - x4, x1 + x2 - x3 + x4, x1 - x2 + x3, x2 - x1, x1
Second convolution: x8, x8 + x7, x6 + x7 - x8, x5 + x6 - x7 + x8, x5 - x6 + x7, x6 - x5, x5
Third convolution:  x12, x12 + x11, x10 + x11 - x12, x9 + x10 - x11 + x12, x9 - x10 + x11, x10 - x9, x9
- xsk1 = (x1 + x2 - x3 + x4) - (x5 + x6 - x7 + x8) + (x9 + x10 - x11 + x12)
- xsk2 = (x2 + x3 - x4 + x5) - (x6 + x7 - x8 + x9) + (x10 + x11 - x12 + x13)
Slide 34: Our algorithm: example
First sliding window:
sk1 = (x1 + x2 - x3 + x4), sk5 = (x5 + x6 - x7 + x8), sk9 = (x9 + x10 - x11 + x12)
xsk1 = (x1 + x2 - x3 + x4) - (x5 + x6 - x7 + x8) + (x9 + x10 - x11 + x12), with b = (1, -1, 1)
Second sliding window:
sk2 = (x2 + x3 - x4) + (x5), sk6 = (x6 + x7 - x8) + (x9), sk10 = (x10 + x11 - x12) + (x13)
Summing up: xsk2 = (x2 + x3 - x4 + x5) - (x6 + x7 - x8 + x9) + (x10 + x11 - x12 + x13), again with b = (1, -1, 1)
Note that (sk1, sk5, sk9) . (b1, b2, b3) is an inner product
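A runnable Python rendering of the pointwise scheme for one random vector. np.correlate stands in for the zero-padded convolution of slide 32; the FFT-based convolution the talk uses for the O(log bw) bound is omitted for clarity, and all names are illustrative:

```python
import numpy as np

def pointwise_sketches(x, Rbw, b):
    """Sketch of every sliding-window position, using one correlation
    per basic window instead of one full dot product per point.
    x: 1-D series (len(x) % bw == 0 assumed), Rbw: length-bw +/-1
    vector, b: control vector (one entry per basic window)."""
    bw, nb = len(Rbw), len(b)
    sw = bw * nb
    blocks = x.reshape(-1, bw)
    # full cross-correlation of each block with Rbw: length 2*bw - 1
    c = np.array([np.correlate(blk, Rbw, mode="full") for blk in blocks])
    out = []
    for start in range(len(x) - sw + 1):
        m, o = divmod(start, bw)          # block index, offset within block
        total = 0.0
        for j in range(nb):               # one chunk per basic window
            chunk = c[m + j][bw - 1 + o]  # tail of block m+j
            if o > 0:
                chunk += c[m + j + 1][o - 1]  # head of block m+j+1
            total += b[j] * chunk
        out.append(total)
    return np.array(out)
```

With Rbw = (1, 1, -1, 1) and b = (1, -1, 1), the first two outputs reproduce xsk1 and xsk2 from the example above.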
Slide 35: Basic window version
- If time series are highly correlated between two consecutive data points, we may instead compute the sketch only once per basic window.
- That is, we update the sketch for each time series only when the data of a complete basic window arrive. No convolution, only an inner product.
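A minimal sketch of this cheaper variant; the class name and buffering scheme are assumptions:

```python
import numpy as np
from collections import deque

class BasicWindowSketch:
    """Update one sketch entry once per basic window: each new basic
    window costs one inner product with Rbw; the chunk products of
    older basic windows are reused."""
    def __init__(self, Rbw, b):
        self.Rbw, self.b = Rbw, b
        self.chunks = deque(maxlen=len(b))   # Rbw . (basic window) history

    def update(self, new_bw):
        self.chunks.append(float(np.dot(self.Rbw, new_bw)))
        if len(self.chunks) == len(self.b):  # full sliding window available
            return float(np.dot(self.b, np.array(self.chunks)))
        return None
```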
Slide 36: Overview of our new algorithm
- The projection of a sliding window is decomposed into operations over basic windows
- Each basic window is convolved (or inner-producted) with each random vector only once
- We may provide sketches starting from each data point, or starting from the beginning of each basic window.
- There is no redundancy.
Slide 37: Performance comparison
- Naïve algorithm
  - For each datum and random vector:
    - O(sw) integer additions
- Pointwise version
  - Asymptotically, for each datum and random vector:
    - O(sw/bw) integer additions
    - O(log bw) floating point operations (using the FFT to compute convolutions)
- Basic window version
  - Asymptotically, for each datum and random vector:
    - O(sw/bw^2) integer additions
Slide 38: Big picture revisited
[Figure: pipeline. Time series 1..n --(random projection)--> sketches 1..n --(filtering: grid structure)--> correlated pairs]
So far we have reduced the data dimension efficiently. Next, how can the sketches be used as a filter?
Slide 39: How to use the sketch distance as a filter
- Naive method: compute the sketch distance between every pair
- Pairs that are close in sketch distance are likely to be close in the original distance (JL Lemma)
- Finally, any close pair is double-checked against the original data.
Slide 40: Use the sketch distance as a filter
- But we do not use this naive method. Why? It is expensive.
- We would still have to do the pairwise comparison between each pair of stocks, which is O(N^2 * k), where k is the size of the sketches, e.g. typically 30, 40, etc.
- Let's see our new strategy
Slide 41: Our method: sketch unit distance
Given the sketches of two time series, partition each sketch vector into distance chunks and compare the pair chunk by chunk.
If at least a fraction f of the distance chunks place the pair within c times the distance threshold, we may say the pair is within the threshold, where f = 30%, 40%, 50%, or 60% and c = 0.8, 0.9, or 1.1.
Slide 42: Further: sketch groups
We may group sketch entries and compute one distance per sketch group. This is reminiscent of a grid structure.
For example: if at least a fraction f of the sketch groups say two series are close, we may say the pair is close.
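A sketch of such a group-distance filter under the parameter ranges above. The exact form of the per-group test is an assumption; the talk tunes g, c, and f as described on slide 44:

```python
import numpy as np

def sketch_group_filter(xsk, ysk, d, g=2, c=1.1, f=0.5):
    """Declare a pair 'possibly close' if at least a fraction f of the
    sketch groups (size g) lie within c * d of each other. Survivors
    are verified against the raw data, so false positives are cheap
    while the fraction test keeps false negatives rare."""
    groups = len(xsk) // g
    xg = xsk[:groups * g].reshape(groups, g)
    yg = ysk[:groups * g].reshape(groups, g)
    dist = np.linalg.norm(xg - yg, axis=1)   # one distance per group
    return np.mean(dist <= c * d) >= f
```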
Slide 43: Grid structure
- To avoid checking all pairs, we can use a grid structure and look only in the neighborhood; this returns a superset of the highly correlated pairs.
- The pairs labeled as potential are double-checked using the raw data vectors.
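A toy grid lookup over the first few sketch coordinates; the cell size and dimensionality are illustrative, not the talk's settings:

```python
import numpy as np
from collections import defaultdict
from itertools import product

def grid_candidates(sketches, cell=1.0, dims=2):
    """Hash the first `dims` sketch entries of each series into a grid of
    side `cell`; only pairs in the same or adjacent cells become
    candidate pairs (a superset of the truly close pairs)."""
    grid = defaultdict(list)
    for i, s in enumerate(sketches):
        grid[tuple(np.floor(s[:dims] / cell).astype(int))].append(i)
    cands = set()
    for key, ids in grid.items():
        for off in product((-1, 0, 1), repeat=dims):   # neighborhood cells
            for j in grid.get(tuple(k + o for k, o in zip(key, off)), []):
                for i in ids:
                    if i < j:
                        cands.add((i, j))
    return cands
```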
Slide 44: Optimization in parameter space
- How do we choose the parameters g, c, f, N?
  - N: total number of sketches
  - g: group size
  - c: the factor of distance
  - f: the fraction of groups which are necessary to claim that two time series are close enough
- We will choose the best setting to apply to the practical data. But how? An engineering problem:
  - Combinatorial Design (CD)
  - Bootstrapping
Now, let's put it all together.
Slide 45: Inner product with random vectors
[Figure: inner products with random vectors r1, r2, r3, r4, r5, r6]
Slide 46: [Figure only, no transcript]
Slide 47: Empirical study: various data sources
- Cstr: continuous stirred tank reactor
- Fortal_ecg: cutaneous potential recordings of a pregnant woman
- Steamgen: model of a steam generator at Abbott Power Plant in Champaign, IL
- Winding: data from a test setup of an industrial winding process
- Evaporator: data from an industrial evaporator
- Wind: daily average wind speeds for 1961-1978 at 12 synoptic meteorological stations in the Republic of Ireland
- Spot_exrates: spot foreign currency exchange rates
- EEG: electroencephalogram
Slide 48: Empirical study: performance comparison
Sliding window = 3616, basic window = 32, and sketch size = 60
Slide 49: Section conclusion
- How to perform data reduction over uncooperative time series efficiently, in contrast to the well-established methods for cooperative data
- How to cope with medium-size sketch vectors systematically
  - Sketch vector partitioning, grid structure
- Parameter space optimization by combinatorial design and bootstrapping
- Many of these ideas can be extended to other applications
Slide 50: Incremental Matching Pursuit (MP)
Slide 51: Problem Statement
- Imagine a scenario where a group of representative stocks is chosen to form an index, e.g. the Standard and Poor's (S&P) 500.
Target vector: the sum of all the stock vectors, weighted by their capitalizations.
Candidate pool: all the stock price vectors in the market.
Objective: find in the candidate pool a small group of vectors representing the target vector.
Slide 52: Vanilla Matching Pursuit (MP)
- Greedily select a linear combination of vectors from a dictionary to approximate a target vector:
  1. Set i = 1
  2. Search the pool V and find the vector vi whose angle with respect to the target vector vt is minimal (i.e., whose cosine is maximal)
  3. Compute the residue r = vt - ci * vi, where ci = <vt, vi> / <vi, vi>; add vi to the result: VA = VA + {vi}
  4. If ||r|| < the error tolerance, terminate and return VA
  5. Else set i = i + 1 and vt = r; go back to step 2
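A compact NumPy rendering of these five steps; the signature, tolerance, and iteration cap are illustrative:

```python
import numpy as np

def matching_pursuit(vt, V, tol=1e-3, max_iter=50):
    """Greedy MP: repeatedly pick the pool vector most parallel to the
    current residue and subtract its projection.
    vt: target vector; V: pool as a (num_vectors x dim) array.
    Returns the chosen indices and their coefficients."""
    r = vt.astype(float).copy()
    chosen, coeffs = [], []
    norms = np.linalg.norm(V, axis=1)
    for _ in range(max_iter):
        cosines = np.abs(V @ r) / (norms * np.linalg.norm(r) + 1e-12)
        i = int(np.argmax(cosines))          # smallest angle with residue
        c = (V[i] @ r) / (V[i] @ V[i])       # projection coefficient
        r = r - c * V[i]                     # update residue
        chosen.append(i); coeffs.append(c)
        if np.linalg.norm(r) < tol:
            break
    return chosen, coeffs
```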
Slide 53: [Figure: geometry of matching pursuit; the target vt is approximated by greedily chosen pool vectors v1, v2, v3]
Slide 54: The incremental setting
- Time granularity revisited:
  Basic window = a sequence of unit time points
  Sliding window = several consecutive basic windows
  The sliding window slides once per basic window
- Recomputing the representative vectors entirely for each sliding window is wasteful, since there may be a trend between consecutive sliding windows
Xiaojian Zhao, Xin Zhang, Tyler Neylon, and Dennis Shasha. Incremental Methods for Simple Problems in Time Series: algorithms and experiments. IDEAS 2005.
Slide 55: First idea: reuse vectors
- The representative vectors may change only slightly, in both their components and their order
- True only if the basic window is sufficiently small, e.g. 2 or 3 time points
- However, any newly introduced representative vector may alter the entire tail of the approximation path
- The relative importance of the same representative vector may differ a lot from one sliding window to the next
Slide 56: Two insightful observations
- The representative vectors are likely to remain the same across a few sliding windows, though their order may change
- The vector of angles stays quite consistent, i.e. (cos θ1, cos θ2, cos θ3, ...)
  - Here cos θi is the cosine of the angle between the ith residue and the vector selected at that round.
  - An example is (0.9, 0.8, 0.7, 0.7, 0.6, 0.6, 0.6, ...)
Slide 57: Angle space exploration (cos θ)
- Whenever a vector is found whose cos θ is larger than some threshold, choose that vector.
- If there is no such vector, the vector with the largest cos θ is selected as the representative vector at this round.
Slide 58: Second idea: cache good vectors
- The representative vectors appearing in the last several sliding windows form a cache C
- The search for a representative vector starts from C; if no suitable vector is found there, fall back to the whole pool V
- Works well in practice.
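Combining the angle threshold (slide 57) with the cache (slide 58), a hypothetical incremental search might look like the sketch below; the threshold value and fallback policy are assumptions:

```python
import numpy as np

def incremental_mp(vt, V, cache, thresh=0.6, tol=1e-3, max_iter=50):
    """Incremental MP sketch: at each round, scan the cached indices
    first and take the first vector whose cosine with the residue
    exceeds `thresh`; otherwise fall back to a full scan of pool V
    and take the best vector."""
    r = vt.astype(float).copy()
    chosen = []
    for _ in range(max_iter):
        pick = None
        for i in cache:                      # try cached vectors first
            cos = abs(V[i] @ r) / (np.linalg.norm(V[i]) * np.linalg.norm(r))
            if cos > thresh:
                pick = i
                break
        if pick is None:                     # full scan of the pool
            cosines = np.abs(V @ r) / (np.linalg.norm(V, axis=1)
                                       * np.linalg.norm(r))
            pick = int(np.argmax(cosines))
        c = (V[pick] @ r) / (V[pick] @ V[pick])
        r = r - c * V[pick]                  # update residue
        chosen.append(pick)
        if np.linalg.norm(r) < tol:
            break
    return chosen
```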
Slide 59: Empirical study: time comparison
Slide 60: Empirical study: approximation power comparison
Slide 61: Future Work and Conclusion
Slide 62: Future work: Anomaly Detection
- Measure the relative distance of each point from its nearest neighbors
- Our approach may serve as a monitor by reporting those points far from any normal points
Slide 63: Conclusion
- Motivation
- Introduce the concept of cooperative vs. uncooperative time series
- Propose a set of strategies dealing with different data (random projection, structured convolution, combinatorial design, bootstrapping, grid structure)
- Explore various incremental schemes
  - Filter away obvious irrelevancies
  - Reuse previous results
- Future Work
Slide 64: Thanks a lot!