Title: High Performance Correlation Techniques For Time Series

1
High Performance Correlation Techniques For Time
Series
  • Xiaojian Zhao
  • Department of Computer Science
  • Courant Institute of Mathematical Sciences
  • New York University
  • 25 Oct. 2004

2
Roadmap
  • Section 1: Introduction
  • Motivation
  • Problem Statement
  • Section 2: Background
  • GEMINI framework
  • Random Projection
  • Grid Structure
  • Some Definitions
  • Naive method and Yunyue's approach
  • Section 3: Sketch-based StatStream
  • Efficient Sketch Computation
  • Sketch technique as a filter
  • Parameter selection
  • Grid structure
  • System Integration
  • Section 4: Empirical Study
  • Section 5: Future Work
  • Section 6: Conclusion

3
Section 1: Introduction
4
Motivation
  • Stock price streams
  • The New York Stock Exchange (NYSE)
  • 50,000 securities (streams); 100,000 ticks (trade and quote)
  • Pairs Trading, a.k.a. Correlation Trading
  • Query: which pairs of stocks were correlated with a value of over 0.9 for the last three hours?

XYZ and ABC have been correlated with a
correlation of 0.95 for the last three hours. Now
XYZ and ABC become less correlated as XYZ goes up
and ABC goes down. They should converge back
later. I will sell XYZ and buy ABC
5
Online Detection of High Correlation
6
Why speed is important
  • As processors speed up, one might think algorithmic efficiency no longer matters.
  • True if problem sizes stayed the same, but they don't.
  • As processors speed up, sensors improve: satellites spew out a terabyte a day, magnetic resonance imagers give higher-resolution images, etc.

7
Problem Statement
  • Detect and report correlations rapidly and accurately
  • Expand the algorithm into a general engine
  • Apply it in many practical application domains

8
Big Picture
[Diagram: time series 1..n → Random Projection → sketch 1..n → Grid structure → correlated pairs]
9
Section 2: Background
10
GEMINI framework
DFT, DWT, etc.
Faloutsos, C., Ranganathan, M., & Manolopoulos, Y. (1994). Fast subsequence matching in time-series databases. In Proceedings of the ACM SIGMOD Int'l Conference on Management of Data, Minneapolis, MN, May 25-27, pp. 419-429.
11
Goals of GEMINI framework
  • High performance
  • Operations on synopses save time, e.g., in distance computation
  • Guarantee of no false negatives
  • The feature space shrinks the original distances in the raw-data space

12
Random Projection Intuition
  • You are walking in a sparse forest and you are
    lost.
  • You have an outdated cell phone without a GPS.
  • You want to know if you are close to your friend.
  • You identify yourself at 100 meters from the
    pointy rock and 200 meters from the giant oak
    etc.
  • If your friend is at similar distances from
    several of these landmarks, you might be close to
    one another.
  • The sketches are the set of distances to
    landmarks.

13
How to make Random Projection
  • Sketch pool: a list of random vectors drawn from a stable distribution (like the landmarks)
  • Project the time series onto the space spanned by these random vectors
  • The Euclidean distance (correlation) between time series is approximated by the distance between their sketches, with a probabilistic guarantee.
  • W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemp. Math., 26:189-206, 1984.
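The landmark intuition can be made concrete with a few lines of NumPy. This is our illustrative sketch, not the talk's code; the Gaussian projection matrix and the sizes (50 windows of length 256, sketches of length 30) are assumptions chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, k = 50, 256, 30              # 50 windows of length 256 -> sketches of length 30
X = rng.standard_normal((n, d))    # stand-in for time-series windows

# Each sketch entry is the inner product with one random vector (a "landmark");
# scaling by 1/sqrt(k) makes sketch distances estimate raw distances directly.
P = rng.standard_normal((d, k)) / np.sqrt(k)
S = X @ P

d_raw = np.linalg.norm(X[0] - X[1])
d_sketch = np.linalg.norm(S[0] - S[1])
ratio = d_sketch / d_raw           # close to 1 with high probability
```

The guarantee is probabilistic: any single pair may deviate, but with k large enough all pairwise distances are preserved up to a small factor.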

14
Random Projection
[Diagram: raw time series X and Y are mapped, via inner products with random vectors (the "rocks" and "buildings" landmarks), to sketches of relative distances; X's current position → X's relative distances, Y's current position → Y's relative distances.]
15
Sketch Guarantees
  • Note: sketches do not approximate individual time-series windows; they help make comparisons.
  • Johnson-Lindenstrauss Lemma: For any 0 < ε < 1 and any integer n, let k be a positive integer such that k ≥ 4(ε²/2 − ε³/3)⁻¹ ln n. Then for any set V of n points in Rᵈ, there is a map f: Rᵈ → Rᵏ such that for all u, v ∈ V, (1 − ε)‖u − v‖² ≤ ‖f(u) − f(v)‖² ≤ (1 + ε)‖u − v‖².
  • Further, this map can be found in randomized polynomial time.

16
Sketches = Random Projection
  • Why do we use sketches (random projections)?
  • To reduce the dimensionality!
  • For example: if the original time series x has length 256, we may represent it with a sketch vector of length 30.
  • A first step toward removing the curse of dimensionality.

17
Achlioptas's lemma
  • Dimitris Achlioptas proved that:

Let P be an arbitrary set of n points in Rᵈ, represented as an n × d matrix A. Given ε, β > 0, let k₀ = (4 + 2β)/(ε²/2 − ε³/3) · log n. For integer k ≥ k₀, let R be a d × k random matrix with entries R(i, j) = r_ij, where the r_ij are independent random variables from either one of the two probability distributions shown on the next slide.
Idea from Dimitris Achlioptas, "Database-friendly Random Projections," Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems.
18
Achlioptas's lemma
r_ij = +1 with probability 1/2, −1 with probability 1/2; or r_ij = +√3 with probability 1/6, 0 with probability 2/3, −√3 with probability 1/6.
Let E = (1/√k) A R, and let f map the i-th row of A to the i-th row of E. With probability at least 1 − n^(−β), for all u, v ∈ P: (1 − ε)‖u − v‖² ≤ ‖f(u) − f(v)‖² ≤ (1 + ε)‖u − v‖².
19
Definition: Sketch Distance
Note: DFT and DWT distances are analogous. For those measures, the difference between the original vectors is approximated by the difference between the first Fourier/wavelet coefficients of those vectors.
20
Empirical Study: Sketch Approximation
21
Empirical Study: sketch distance vs. real distance
[Plot: ratio of sketch distance to real distance for sketch sizes 30, 80, and 1000.]
22
Grid Structure
23
Correlation and Distance
  • There is relationship between Euclidean
    distance and Pearson correlation
  • Normalization
  • dist22(1- correlation)
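The distance-correlation relation on this slide can be verified numerically. A minimal sketch (our example data; it assumes the normalization shifts each window to mean 0 and scales it to unit norm):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
y = 0.7 * x + 0.3 * rng.standard_normal(256)

def normalize(v):
    # Shift to mean 0 and scale to unit norm.
    v = v - v.mean()
    return v / np.linalg.norm(v)

corr = np.corrcoef(x, y)[0, 1]
dist2 = np.sum((normalize(x) - normalize(y)) ** 2)
# dist2 equals 2 * (1 - corr) up to floating-point error,
# so a correlation threshold maps directly to a distance threshold.
```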

24
How to compute the correlation efficiently?
  • Goal: find the most highly correlated stream pairs over sliding windows
  • Naive method
  • StatStream method
  • Our method

25
Naïve Approach
  • Space and time cost
  • Space O(N), time O(N² · sw)
  • N: number of streams
  • sw: size of the sliding window
  • Let's look at the StatStream approach

26
Definitions: Sliding window and basic window
[Diagram: stocks 1..n along the time axis; each sliding window (size 8) is divided into basic windows (size 2); "time point" marks a single datum.]
27
StatStream Idea
  • Use the Discrete Fourier Transform (DFT) to approximate correlation, as in the GEMINI approach discussed earlier.
  • Every two minutes (the basic window size), update the DFT for each time series over the last hour (the sliding window size).
  • Use a grid structure to filter out unlikely pairs.

28
StatStream: stream synoptic data structure
Sliding window
29
Section 3: Sketch-based StatStream
30
Problem not yet solved
  • DFT approximates price-like data very well, but gives a poor approximation for returns: (today's price − yesterday's price) / yesterday's price.
  • Returns are more like white noise, which contains all frequency components.
  • DFT uses the first n (e.g., 10) coefficients to approximate data, which is insufficient in the case of white noise.
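The price-versus-return contrast is easy to demonstrate. A small sketch of the idea (our illustration with synthetic data, not the talk's experiment): random-walk "prices" concentrate spectral energy in low frequencies, while white-noise "returns" spread it out, so truncating to the first few DFT coefficients loses far more of a return series.

```python
import numpy as np

rng = np.random.default_rng(6)
returns = rng.standard_normal(256)   # white-noise-like returns
prices = np.cumsum(returns)          # random-walk-like prices

def energy_in_first_k(x, k=10):
    # Fraction of spectral energy captured by the first k DFT coefficients.
    f = np.fft.rfft(x - x.mean())
    p = np.abs(f) ** 2
    return p[:k].sum() / p.sum()

e_price = energy_in_first_k(prices)    # close to 1: DFT truncation works well
e_return = energy_in_first_k(returns)  # small: DFT truncation works poorly
```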

31
(No Transcript)
32
Big Picture Revisited
[Diagram: time series 1..n → Random Projection (inner product between data vector and random vector) → sketch 1..n → Grid structure → correlated pairs]
33
How to compute the sketch efficiently
  • We will not compute the inner product at each data point, because that computation is expensive.
  • A new strategy, from joint work with Richard Cole, is used to compute the sketch.
  • Here the random variables will be drawn from {+1, −1} with equal probability.
34
How to construct the random vector
  • Given a time series, compute its sketch for a window of size sw = 12.
  • Partition it into smaller basic windows of size bw = 4.
  • The random vector within a basic window is R, and a control vector is b.
  • b determines whether each basic window is multiplied by +1 or −1 (why? wait).
  • A complete random vector may look like:

Here R = (1, 1, −1, 1), b = (1, −1, 1):
r = (1, 1, −1, 1, −1, −1, 1, −1, 1, 1, −1, 1)
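The construction above is just a signed tiling of R, which a short sketch makes explicit (using the slide's example R and b):

```python
import numpy as np

R = np.array([1, 1, -1, 1])   # per-basic-window random vector (slide's example)
b = np.array([1, -1, 1])      # control vector: sign of each basic window

# Composite random vector for the whole sliding window (sw = bw * len(b) = 12):
r = np.concatenate([bi * R for bi in b])
# r == [1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, 1], matching the slide.
```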
35
Naive algorithm and hope for improvement
r = (1, 1, −1, 1, −1, −1, 1, −1, 1, 1, −1, 1); x = (x1, x2, ..., x12)
Dot product: xsk = r · x = x1 + x2 − x3 + x4 − x5 − x6 + x7 − x8 + x9 + x10 − x11 + x12
With each new data point's arrival, such operations are repeated:
r = (1, 1, −1, 1, −1, −1, 1, −1, 1, 1, −1, 1); x = (x5, x6, ..., x16)
xsk = r · x = x5 + x6 − x7 + x8 − x9 − x10 + x11 − x12 + x13 + x14 − x15 + x16
  • The second dot product is largely redundant given the first one.
  • We eliminate the repeated computation to save time.

36
Our algorithm (Pointwise version)
  • Convolve each basic window with the corresponding random vector after padding with bw zeros.

[Animation: sliding (1, 1, −1, 1, 0, 0, 0, 0) across (x1, x2, x3, x4) produces the partial sums x4; x3 + x4; x2 + x3 − x4; x1 + x2 − x3 + x4; x1 − x2 + x3; x2 − x1; x1.]
37
Our algorithm example
First convolution: (x4, x3 + x4, x2 + x3 − x4, x1 + x2 − x3 + x4, x1 − x2 + x3, x2 − x1, x1)
Second convolution: (x8, x7 + x8, x6 + x7 − x8, x5 + x6 − x7 + x8, x5 − x6 + x7, x6 − x5, x5)
Third convolution: (x12, x11 + x12, x10 + x11 − x12, x9 + x10 − x11 + x12, x9 − x10 + x11, x10 − x9, x9)
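These sliding partial sums are exactly what an off-the-shelf convolution produces. A minimal NumPy sketch (the concrete values in x are arbitrary stand-ins for x1..x4):

```python
import numpy as np

x = np.array([3.0, 1.0, 4.0, 1.0])    # stands in for (x1, x2, x3, x4)
R = np.array([1.0, 1.0, -1.0, 1.0])   # the basic-window random vector

# Sliding dot products of R against x form a cross-correlation,
# which equals convolution with the reversed vector.
sums = np.convolve(x, R[::-1])

x1, x2, x3, x4 = x
expected = [x1, x2 - x1, x1 - x2 + x3, x1 + x2 - x3 + x4,
            x2 + x3 - x4, x3 + x4, x4]
```

Each output entry is R's dot product with one alignment of the basic window, so every partial sum needed by later sliding windows comes out of a single convolution.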


38
Our algorithm example
First sliding window:
sk1 = x1 + x2 − x3 + x4; sk5 = x5 + x6 − x7 + x8; sk9 = x9 + x10 − x11 + x12
xsk1 = (x1 + x2 − x3 + x4) − (x5 + x6 − x7 + x8) + (x9 + x10 − x11 + x12), with b = (1, −1, 1)
Second sliding window:
sk2 = x2 + x3 − x4 + x5; sk6 = x6 + x7 − x8 + x9; sk10 = x10 + x11 − x12 + x13
Then sum up: xsk2 = (x2 + x3 − x4 + x5) − (x6 + x7 − x8 + x9) + (x10 + x11 − x12 + x13), with b = (1, −1, 1)
(sk1, sk5, sk9) · (b1, b2, b3) is an inner product.
39
Our algorithm
  • The projection of a sliding window is decomposed
    into operations over basic windows
  • Each basic window is convolved with each random
    vector only once
  • We may provide the sketches incrementally
    starting from each data point.
  • There is no redundancy.
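The whole decomposition can be sketched end to end: one convolution per random vector gives every basic-window dot product, and each sliding window's sketch is recovered as a short inner product with b. This is our illustrative sketch of the scheme (the stream data and sizes are arbitrary), verified against the naive dot product:

```python
import numpy as np

sw, bw = 12, 4
R = np.array([1.0, 1.0, -1.0, 1.0])        # per-basic-window random vector
b = np.array([1.0, -1.0, 1.0])             # control vector over the 3 basic windows
r = np.concatenate([bi * R for bi in b])   # composite length-12 random vector

rng = np.random.default_rng(2)
x = rng.standard_normal(20)                # a stream longer than one sliding window

# One convolution yields R's dot product with every length-bw window of x.
c = np.convolve(x, R[::-1], mode='valid')

for i in range(len(x) - sw + 1):
    fast = b @ c[i::bw][:len(b)]           # reuse basic-window partial results
    naive = r @ x[i:i + sw]                # recompute the full dot product
    assert np.isclose(fast, naive)
```

The convolution is shared across all sliding-window offsets, which is where the redundancy of the naive method disappears.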

40
Jump by a basic window (basic window version)
  • Alternatively, if the time series are highly correlated between two consecutive data points, we may compute the sketch only once per basic window.
  • That is, we update the sketch for each time series only when the data of a complete basic window arrive.

41
Online Version
  • We take the basic window version as an example.
  • Review: to have the same baseline, we normalize the time series within each sliding window.
  • Challenge: the normalization of the time series changes with each basic window.

42
Online Version
  • Its incremental nature requires an update of the average and variance whenever a new basic window arrives.
  • Do we have to recompute the normalization, and thus the sketch, whenever a new basic window arrives?
  • Of course not. Otherwise our algorithm would degrade into the trivial computation.

43
Online Version
  • Then how? After mathematical manipulation, we claim that we need only store and maintain the following quantities:

44
Performance comparison
  • Naïve algorithm
  • For each datum and random vector: O(sw) integer additions
  • Pointwise version
  • Asymptotically, for each datum and random vector:
    (1) O(sw/bw) integer additions
    (2) O(log bw) floating-point operations (using the FFT to compute convolutions)
  • Basic window version
  • Asymptotically, for each basic window and random vector:
    (1) O(sw/bw) integer additions
    (2) O(bw) floating-point operations

45
Sketch distance filter quality
  • We may use the sketch distance to filter out unlikely data pairs
  • How accurate is it?
  • How does it compare to the DFT and DWT distances in terms of approximation ability?

46
Empirical Study: sketch compared to DFT and DWT distance
  • Data length = 256
  • DFT: the first 14 DFT coefficients are used in the distance computation
  • DWT: the db2 wavelet is used, with coefficient size = 16
  • Sketch: 64 random vectors are used

47
Empirical Comparison: DFT, DWT and Sketch
48
Empirical Comparison: DFT, DWT and Sketch
49
Use the sketch distance as a filter
  • We may compute the sketch distance
  • c could be 1.2 or larger, to reduce the number of false negatives
  • Finally, every candidate pair is double-checked against the raw data

50
Use the sketch distance as a filter
  • But we will not use it this way. Why? It is expensive:
  • We would still have to do a pairwise comparison between every pair of stocks, which is O(N²k), where k is the size of the sketches

51
Sketch unit distance
Given sketches, if f of the distance chunks are close enough, we may say the pair is close, where f ∈ {30%, 40%, 50%, 60%} and c ∈ {0.8, 0.9, 1.1}.
52
Further: sketch groups
We may compute distances over sketch groups (the grid structure). For example, if f of the sketch groups are close enough, we may say the pair is close, where f ∈ {30%, 40%, 50%, 60%} and c ∈ {0.8, 0.9, 1.1}.
53
Optimization in Parameter Space
  • Next: how to choose the parameters g, c, f, N?

N: total number of sketches; g: group size; c: the distance factor; f: the fraction of groups that must agree to claim that two time series are close enough
54
Optimization in Parameter Space
  • Essentially, we prepare several groups of good parameter candidates and choose the best one to apply to the practical data
  • But how do we select the good candidates?
  • Combinatorial Design (CD)
  • Bootstrapping

55
Combinatorial Design
  • Covers the pairwise combinations of all the parameters
  • Informally: each parameter value meets each value of every other parameter in some parameter group
  • P: {P1, P2, P3}
  • Q: {Q1, Q2, Q3, Q4}
  • R: {R1, R2}
  • All combinations: |P| × |Q| × |R| = 24 groups
  • Combinatorial Design: 12 groups

http://www.cs.nyu.edu/cs/faculty/shasha/papers/comb.html
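The 24-versus-12 count can be checked directly. A small sketch of one valid pairwise design for these three parameters (our construction for illustration; the slides do not specify which 12 groups they use):

```python
from itertools import product

P = ["P1", "P2", "P3"]
Q = ["Q1", "Q2", "Q3", "Q4"]
R = ["R1", "R2"]

# All combinations: 3 * 4 * 2 = 24 groups.
full = list(product(P, Q, R))

# A pairwise design with only 12 rows: take every (P, Q) combination once
# and alternate R so every (P, R) and (Q, R) pair also appears somewhere.
design = [(P[i], Q[j], R[(i + j) % 2]) for i in range(3) for j in range(4)]

# Verify all-pairs coverage: values from any two parameters co-occur in some row.
pairs_pq = {(p, q) for p, q, _ in design}
pairs_pr = {(p, r) for p, _, r in design}
pairs_qr = {(q, r) for _, q, r in design}
assert pairs_pq == set(product(P, Q))
assert pairs_pr == set(product(P, R))
assert pairs_qr == set(product(Q, R))
```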
56
Combinatorial Design
  • Much smaller test space compared to that of all
    parameter combinations
  • We will further reduce the test space by taking
    advantage of continuity of recall and precision
    in parameter space.

57
Combinatorial Design
  • We employ a coarse-to-fine strategy

N ∈ {30, 36, 40, 60}; g ∈ {1, 2, 3}; c ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3}; f ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1}
When good parameters are located, their local neighbors are searched further for better solutions.
58
Bootstrapping
  • Choose parameters with stable performance on both sample data and real data
  • A sample set with 2,000,000 pairs
  • From it, choose 20,000 samples with replacement, 100 times
  • Compute the recall and precision each time

59
Bootstrapping
  • 100 recalls and precisions
  • Compute the mean and standard deviation of the recalls and precisions
  • Criterion for good parameters:

Mean(recall) − std(recall) > Threshold(recall)
Mean(precision) − std(precision) > Threshold(precision)
  • If there are no such parameters, enlarge the replacement sample size
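The bootstrap loop on this slide can be sketched in a few lines. The per-pair outcome arrays below are synthetic stand-ins (hit/miss rates chosen arbitrarily), and the thresholds 0.9 and 0.8 are assumed values for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n_pairs = 200_000
# Synthetic ground truth and filter output for one parameter setting:
truly_close = rng.random(n_pairs) < 0.01
reported = truly_close & (rng.random(n_pairs) < 0.95)       # a few misses
reported |= (~truly_close) & (rng.random(n_pairs) < 0.001)  # a few false alarms

recalls, precisions = [], []
for _ in range(100):
    idx = rng.integers(0, n_pairs, size=20_000)   # sample with replacement
    t, r = truly_close[idx], reported[idx]
    hits = (t & r).sum()
    recalls.append(hits / max(t.sum(), 1))
    precisions.append(hits / max(r.sum(), 1))

recalls, precisions = np.array(recalls), np.array(precisions)
# Accept the parameter setting only if it is good *and* stable:
good = (recalls.mean() - recalls.std() > 0.9) and \
       (precisions.mean() - precisions.std() > 0.8)
```

Subtracting one standard deviation before comparing to the threshold is what penalizes parameter settings whose performance fluctuates across resamples.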

60
Parameter Selection
61
Preferred data distributions
  • The distribution of the data affects the performance of our algorithm (recall the price vs. return discussion)
  • The ideal data distribution:
  • Generally, the less human intervention, the better
  • The green data give much better results.

where C is a small constant
62
Empirical Study Various data types
  • Cstr: continuous stirred-tank reactor
  • Fortal_ecg: cutaneous potential recordings of a pregnant woman
  • Steamgen: model of a steam generator at Abbott Power Plant in Champaign, IL
  • Winding: data from a test setup of an industrial winding process
  • Evaporator: data from an industrial evaporator
  • Wind: daily average wind speeds for 1961-1978 at 12 synoptic meteorological stations in the Republic of Ireland
  • Spot_exrates: spot foreign currency exchange rates
  • EEG: electroencephalogram

63
Empirical Study: Data distribution
64
Grid Structure
  • Critical: the largest value
  • Useful in the normalization needed to fit the sketches into the grid structure
  • Our small lemma:
65
Grid Structure
  • High correlation ⇒ closeness in the vector space
  • To avoid checking all pairs:
  • We can use a grid structure and look in the neighborhood; this returns a superset of the highly correlated pairs.
  • Pairs labeled as potential are double-checked using the raw data vectors.
  • Pruning power: the percentage of pairs filtered out as impossible to be close.
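A minimal sketch of the neighborhood lookup (our illustration, not the paper's implementation; the cell side a, the 2-d sketches, and the point count are all arbitrary choices):

```python
import numpy as np
from collections import defaultdict

a = 0.5                                     # grid cell side = distance threshold
rng = np.random.default_rng(4)
sketches = rng.standard_normal((100, 2))    # 2-d sketch vectors for illustration

# Hash each sketch into an integer grid cell.
grid = defaultdict(list)
for i, s in enumerate(sketches):
    grid[tuple(np.floor(s / a).astype(int))].append(i)

def candidates(i):
    # Points in the same or adjacent cells; a superset of all points
    # within distance a of sketches[i] (coordinates differ by < a, so
    # cell indices differ by at most 1).
    cell = np.floor(sketches[i] / a).astype(int)
    out = set()
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            out.update(grid[(cell[0] + dx, cell[1] + dy)])
    out.discard(i)
    return out
```

Only the candidate pairs are then double-checked against raw data; everything outside the neighborhood is pruned without any distance computation.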

66
Inner product with random vectors
[Diagram: inner products with random vectors r1, r2, r3, r4, r5, r6.]
67
(No Transcript)
68
System Integration
  • By combining the sketch scheme with the grid
    structure, we can
  • Reduce dimensionality
  • Eliminate unnecessary pair comparisons
  • The performance can be improved substantially

69
Empirical Study: Speed
Sliding window = 3616, basic window = 32, sketch size = 60
70
Empirical Study: Breakdown
71
Empirical Study: Breakdown
72
The Pruning Power of the Grid Structure
73
Visualization
74
Other applications
  • Cointegration Test
  • Matching Pursuit
  • Anomaly Detection

75
Cointegration Test
  • Forms a stationary series from a linear combination of several non-stationary time series
  • Models a long-run relationship, as opposed to correlation
  • StatStream may be applied to test the stationarity condition of cointegration

76
Matching Pursuit
  • Decompose a signal into a group of non-orthogonal sub-components
  • Test the correlation among atoms in a dictionary
  • Expedite the component selection

77
Anomaly Detection
  • Measure the relative distance of each point from its nearest neighbors
  • StatStream may serve as a monitor by reporting points far from all normal points

78
Conclusion
  • Introduction
  • GEMINI Framework
  • Random Projection
  • Statstream Review
  • Efficient Sketch Computation
  • Parameter Selection
  • Grid Structure
  • System Integration
  • Empirical Study
  • Future work

79
Thanks a lot!
80
Recall and Precision
A: query ball; B: returned result; C: their intersection
  • Recall = C/A
  • Precision = C/B