1
Time-series
  • Dr. János Abonyi
  • University of Veszprém
  • abonyij@fmt.vein.hu
  • www.fmt.vein.hu/softcomp/dw

2
Source
  • A Tutorial on Indexing and Mining Time Series
    Data, ICDM '01,
  • The 2001 IEEE International Conference on Data
    Mining, November 29, San Jose
  • By Dr. Eamonn Keogh

3
Outline of Tutorial
  • Introduction, Motivation
  • The Utility of Similarity Measurements
  • Properties of distance measures
  • The Minkowski metrics
  • Preprocessing the data
  • Time warped measures
  • Weighted measures
  • Indexing Time Series
  • Spatial Access Methods and the curse of
    dimensionality
  • The GEMINI Framework
  • Dimensionality reduction
  • Discrete Fourier Transform
  • Discrete Wavelet Transform
  • Singular Value Decomposition
  • Piecewise Linear Approximation
  • Symbolic Approximation
  • Piecewise Aggregate Approximation
  • Adaptive Piecewise Constant Approximation
  • Empirical Comparison

4
What are Time Series?
25.1750 25.2250 25.2500 25.2500
25.2750 25.3250 25.3500 25.3500
25.4000 25.4000 25.3250 25.2250
25.2000 25.1750 .. .. 24.6250
24.6750 24.6750 24.6250 24.6250
24.6250 24.6750 24.7500
A time series is a collection of observations
made sequentially in time.
Note that virtually all similarity measurements,
indexing and dimensionality reduction techniques
discussed in this tutorial can be used with other
data types.
5
Time Series are Ubiquitous! I
  • People measure things...
  • The president's approval rating.
  • Their blood pressure.
  • The annual rainfall in Riverside.
  • The value of their Yahoo stock.
  • The number of web hits per second.
  • ...and things change over time.

Thus time series occur in virtually every
medical, scientific and business domain.
6
Time Series are Ubiquitous! II
A random sample of 4,000 graphics from 15 of the
world's newspapers published from 1974 to 1989
found that more than 75% of all graphics were
time series (Tufte, 1983).
7
Time Series Similarity
Defining the similarity between two time series
is at the heart of most time series data mining
applications/tasks: classification, clustering,
rule discovery (rules with a given support s and
confidence c, e.g. s = 0.5, c = 0.3), and query
by content (matching a query template Q against
the data).
Thus time series similarity will be the primary
focus of this tutorial.
8
The Utility of Similarity Search (In the Context
of Time Series)
  • Classification: Do other genes express
    themselves like this gene?
  • Aach, J. and Church, G.M. (2001). Aligning gene
    expression time series with time warping
    algorithms. Bioinformatics 17:495-508.
  • Clustering: Grouping robot experiences.
  • Oates, Tim; Schmill, Matthew D. and Cohen, Paul
    R. A Method for Clustering the Experiences of a
    Mobile Robot that Accords with Human Judgments.
    In AAAI 2000.
  • Association Rules: "Peak followed by plateau
    implies a downward trend" with a confidence of
    0.4 and a support of 0.2.
  • Das, et al. (1998). Rule discovery from time
    series.
  • Exploratory Data Analysis: Understanding the
    data by interacting with it.
  • van Wijk, J.J. and van Selow, E. (1999). Cluster and
    Calendar-based Visualization of Time Series Data.
    IEEE InfoVis'99.

9
Why is Working With Time Series so Difficult?
Part I
Answer: How do we work with very large databases?
  • 1 hour of EKG data: 1 gigabyte.
  • Typical weblog: 5 gigabytes per week.
  • Space Shuttle Database: 158 gigabytes and
    growing.
  • Macho Database: 2 terabytes, updated with 3
    gigabytes per day.

Since most of the data lives on disk (or tape),
we need a representation of the data we can
efficiently manipulate.
10
Why is Working With Time Series so Difficult?
Part II
Answer: We are dealing with subjective notions of
similarity.
The definition of similarity depends on the
user, the domain and the task at hand. We need to
be able to handle this subjectivity.
11
Why is working with time series so difficult?
Part III
  • Answer: Miscellaneous data handling problems.
  • Differing data formats.
  • Differing sampling rates.
  • Noise, missing values, etc.

12
The similarity matching problem can come in two
flavors I
1. Whole Matching

Given a Query Q (template), a reference database C
and a distance measure, find the Ci that best
matches Q.

(Figure: Q compared against the ten sequences
C1..C10 in database C; C6 is the best match.)
13
The similarity matching problem can come in two
flavors II
2. Subsequence Matching

Given a Query Q (template), a reference database C
and a distance measure, find the location that
best matches Q.

(Figure: the best matching subsection of a long
sequence in database C.)
Note that we can always convert subsequence
matching to whole matching by sliding a window
across the long sequence, and copying the window
contents.
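As a hedged illustration (the function and
variable names below are ours, not the
tutorial's), this conversion is a few lines of
Python:

```python
import numpy as np

def sliding_windows(sequence, window_len):
    """Copy every window of a long sequence, converting a
    subsequence-matching problem into a whole-matching one."""
    sequence = np.asarray(sequence, dtype=float)
    return np.array([sequence[i:i + window_len]
                     for i in range(len(sequence) - window_len + 1)])

# Hypothetical usage: best matching location for a query Q in series C.
# windows = sliding_windows(C, len(Q))
# best_start = np.argmin(np.linalg.norm(windows - Q, axis=1))
```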
14
After all that background we might have forgotten
what we are doing and why we care! So here is a
simple motivator and review...
You go to the doctor because of chest pains. Your
ECG looks strange. Your doctor wants to search a
database to find similar ECGs, in the hope that
they will offer clues about your condition...
  • Two questions
  • How do we define similar?
  • How do we search quickly?

15
Defining Distance Measures
  • Definition: Let O1 and O2 be two objects from the
    universe of possible objects. The distance
    (dissimilarity) is denoted by D(O1,O2).
  • What properties should a distance measure have?
  • D(A,B) = D(B,A)  (Symmetry)
  • D(A,A) = 0  (Constancy of Self-Similarity)
  • D(A,B) = 0 iff A = B  (Positivity)
  • D(A,B) ≤ D(A,C) + D(B,C)  (Triangular Inequality)

16
Intuitions behind desirable distance measure
properties
D(A,B) = D(B,A) (Symmetry): Otherwise you could
claim "Alex looks like Bob, but Bob looks nothing
like Alex."
D(A,A) = 0 (Constancy of Self-Similarity):
Otherwise you could claim "Alex looks more like
Bob, than Bob does."
D(A,B) = 0 iff A = B (Positivity): Otherwise there
are objects in your database that are different,
but you cannot tell them apart.
D(A,B) ≤ D(A,C) + D(B,C) (Triangular Inequality):
Otherwise you could claim "Alex is very like Bob,
and Alex is very like Carl, but Bob is very
unlike Carl."
17
Why is the Triangular Inequality so Important?
Virtually all techniques to index data require
the triangular inequality to hold.
Suppose I am looking for the closest point to Q
in a database of 3 objects. Further suppose that
the triangular inequality holds, and that we have
precompiled a table of distances between all the
items in the database. I find a and calculate
that it is 2 units from Q; it becomes my
best-so-far. I find b and calculate that it is
7.81 units away from Q. I don't have to calculate
the distance from Q to c! I know:

$D(Q,b) \leq D(Q,c) + D(b,c)$
$D(Q,b) - D(b,c) \leq D(Q,c)$
$7.81 - 2.30 \leq D(Q,c)$
$5.51 \leq D(Q,c)$

So I know that c is at least 5.51 units away, but
my best-so-far is only 2 units away.
(Figure: the query Q and the database points a,
b, c in the plane.)
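To make the pruning concrete, here is a minimal
Python sketch, assuming a precompiled table
pairwise[i][j] of distances between database
items (the helper names are illustrative, not
from the tutorial):

```python
def one_nn_pruned(Q, database, dist, pairwise):
    """1-NN search that skips items ruled out by the triangular
    inequality: D(Q,c) >= |D(Q,b) - D(b,c)| for any seen item b."""
    best_i, best_d = None, float('inf')
    seen = {}  # index -> already-computed D(Q, item)
    for i, item in enumerate(database):
        lower_bound = max((abs(d_qb - pairwise[b][i])
                           for b, d_qb in seen.items()), default=0.0)
        if lower_bound >= best_d:
            continue  # item cannot beat the best-so-far; skip the distance call
        seen[i] = d = dist(Q, item)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d
```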
18
The Minkowski Metrics
(Figure: the two time series Q and C.)

The Minkowski distance of order p is

$D(Q,C) = \left( \sum_{i=1}^{n} |q_i - c_i|^p \right)^{1/p}$

p = 1: Manhattan (Rectilinear, City Block)
p = 2: Euclidean
p = ∞: Max (Supremum, "sup")
19
Euclidean Distance Metric
Given two time series Q = q_1,...,q_n and C =
c_1,...,c_n, their Euclidean distance is defined as

$D(Q,C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2}$
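Transcribed directly into Python (an illustrative
sketch, not code from the tutorial):

```python
import numpy as np

def euclidean_distance(Q, C):
    """Euclidean distance between two equal-length time series."""
    Q, C = np.asarray(Q, dtype=float), np.asarray(C, dtype=float)
    return np.sqrt(np.sum((Q - C) ** 2))
```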
20
Preprocessing the data before distance
calculations
  • If we naively try to measure the distance between
    two raw time series, we may get very
    unintuitive results.
  • This is because Euclidean distance is very
    sensitive to some distortions in the data. For
    most problems these distortions are not
    meaningful, and thus we can and should remove
    them.
  • In the next 4 slides I will discuss the 4 most
    common distortions, and how to remove them.
  • Offset Translation
  • Amplitude Scaling
  • Linear Trend
  • Noise

21
Transformation I: Offset Translation

Q = Q - mean(Q)
C = C - mean(C)

(Figure: D(Q,C) before, and after subtracting each
series' mean.)
22
Transformation II: Amplitude Scaling

Q = (Q - mean(Q)) / std(Q)
C = (C - mean(C)) / std(C)

(Figure: the two series before and after
normalization, and the resulting D(Q,C).)
23
Transformation III: Linear Trend

The intuition behind removing linear trend is
this: fit the best fitting straight line to the
time series, then subtract that line from the
time series.

(Figure: the series after removing offset
translation and amplitude scaling, and again
after also removing the linear trend.)
24
Transformation IV: Noise

Q = smooth(Q)
C = smooth(C)

The intuition behind removing noise is this:
average each datapoint's value with its
neighbors.

(Figure: D(Q,C) after smoothing.)
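The four transformations can be combined into one
preprocessing step. The sketch below is one
possible implementation; the tutorial does not
prescribe a particular detrending or smoothing
method, so the least-squares line fit and the
moving-average window here are assumptions:

```python
import numpy as np

def preprocess(ts, smooth_window=5):
    """Remove offset, amplitude scaling, linear trend and noise."""
    ts = np.asarray(ts, dtype=float)
    ts = (ts - ts.mean()) / ts.std()          # offset + amplitude (z-normalize)
    x = np.arange(len(ts))
    slope, intercept = np.polyfit(x, ts, 1)   # best fitting straight line
    ts = ts - (slope * x + intercept)         # remove linear trend
    kernel = np.ones(smooth_window) / smooth_window
    return np.convolve(ts, kernel, mode='same')  # average with neighbors
```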
25
A Quick Experiment to Demonstrate the Utility of
Preprocessing the Data
(Figure: two dendrograms of the same nine
sequences. One is built using Euclidean distance
on the raw data; the other using Euclidean
distance on the same data after removing noise,
linear trend, offset translation and amplitude
scaling.)

The instances are from Cylinder-Bell-Funnel, with
small, random amounts of trend, offset and
scaling added.
26
Summary of Preprocessing
The raw time series may have distortions which
we should remove before clustering,
classification etc. Of course, sometimes the
distortions are the most interesting thing about
the data; the above is only a general rule. We
should keep these problems in mind as we consider
the high-level representations of time series
which we will encounter later (Fourier
transforms, wavelets etc.), since these
representations often allow us to handle
distortions in elegant ways.
27
Dynamic Time Warping
Fixed Time Axis: Sequences are aligned one to
one.
Warped Time Axis: Nonlinear alignments are
possible.
Note: We will first see the utility of DTW, then
see how it is calculated.
28
Utility of Dynamic Time Warping Example, Data
Mining
Power-Demand Time Series. Each sequence
corresponds to a week's demand for power in a
Dutch research facility in 1997 [van Selow 1999].
(Wednesday was a national holiday.)
29
Hierarchical clustering with Euclidean
Distance <Group Average Linkage>.

The two 5-day weeks are correctly grouped. Note
however, that the three 4-day weeks are not
clustered together, and the two 3-day weeks
are also not clustered together.

(Figure: dendrogram of the seven weeks.)
30
Hierarchical clustering with Dynamic Time
Warping <Group Average Linkage>.

The two 5-day weeks are correctly grouped. The
three 4-day weeks are clustered together. The
two 3-day weeks are also clustered together.
31
Time taken to create hierarchical clustering of
power-demand time series.
  • Time to create dendrogram
  • using Euclidean Distance: 1.2 seconds
  • Time to create dendrogram
  • using Dynamic Time Warping: 3.40 hours

32
Computing the Dynamic Time Warp Distance I
Note that the input sequences can be of different
lengths: Q has length n and C has length p.

(Figure: Q and C arranged along the two axes of
an n-by-p matrix.)
33
Computing the Dynamic Time Warp Distance II
(Figure: the n-by-p search matrix for Q and C,
with one warping path highlighted.)

Every possible mapping from Q to C can be
represented as a warping path in the search
matrix. We simply want to find the cheapest
one. Although there are exponentially many
such paths, we can find one in only quadratic
time using dynamic programming:

$\gamma(i,j) = d(q_i,c_j) + \min\{\gamma(i-1,j-1),\ \gamma(i-1,j),\ \gamma(i,j-1)\}$
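The recurrence translates directly into a
quadratic-time dynamic program. A sketch (the
tutorial leaves the local cost d(q_i, c_j)
unspecified; squared difference is assumed here):

```python
import numpy as np

def dtw_distance(Q, C):
    """Dynamic time warping via the recurrence above;
    Q and C may have different lengths."""
    n, p = len(Q), len(C)
    gamma = np.full((n + 1, p + 1), np.inf)
    gamma[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, p + 1):
            cost = (Q[i - 1] - C[j - 1]) ** 2   # d(q_i, c_j), an assumption
            gamma[i, j] = cost + min(gamma[i - 1, j - 1],
                                     gamma[i - 1, j],
                                     gamma[i, j - 1])
    return np.sqrt(gamma[n, p])
```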
34
Fast Approximations to Dynamic Time Warp Distance
I
Simple Idea: Approximate the time series with
some compressed or downsampled representation,
and do DTW on the new representation. How well
does this work...
35
Fast Approximations to Dynamic Time Warp Distance
II
(Figure: DTW on the raw series took 22.7 sec; on
the compressed representation, 1.3 sec.)

...strong visual evidence suggests it works
well. Good experimental evidence for the utility
of the approach on clustering, classification and
query-by-content problems has also been
demonstrated.
36
Weighted Distance Measures I
Intuition: For some queries, different parts of
the sequence are more important.
Weighting features is a well-known technique in
the machine learning community to improve
classification and the quality of clustering.
37
Weighted Distance Measures II
(Figure: the unweighted distance D(Q,C) versus
the weighted distance D(Q,C,W). The height of the
histogram W indicates the relative importance of
that part of the query.)
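In code, the weight vector simply scales each
term of the Euclidean sum (an illustrative
sketch):

```python
import numpy as np

def weighted_euclidean(Q, C, W):
    """D(Q,C,W): W[i] is the relative importance of position i."""
    Q, C, W = (np.asarray(a, dtype=float) for a in (Q, C, W))
    return np.sqrt(np.sum(W * (Q - C) ** 2))
```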
38
Weighted Distance Measures III: How do we set the
weights?
One possibility: Relevance Feedback.
Definition: Relevance Feedback is the
reformulation of a search query in response to
feedback provided by the user for the results of
previous versions of the query.

(Figure: the feedback loop: Search -> Display
Results -> Gather Feedback -> Update Weights ->
Search -> ...)
39
Relevance Feedback for Time Series
(Figure: the original query and its weight
vector. Initially, all weights are the same.)
Note: In this example we are using a piecewise
linear approximation of the data. We will learn
more about this representation later.
40
The initial query is executed, and the five best
matches are shown (in the dendrogram).
One by one the 5 best matching sequences will
appear, and the user will rate them from
very bad (-3) to very good (+3).
41
Based on the user feedback, both the shape and
the weight vector of the query are changed.
The new query can be executed. The hope is that
the query shape and weights will converge to the
optimal query.
Two papers consider relevance feedback for time
series, e.g.: L. Wu, C. Faloutsos, K. Sycara, T. Payne:
FALCON: Feedback Adaptive Loop for Content-Based
Retrieval. VLDB 2000: 297-306.
42
Motivating Example Revisited...
You go to the doctor because of chest pains. Your
ECG looks strange. Your doctor wants to search a
database to find similar ECGs, in the hope that
they will offer clues about your condition...
  • Two questions
  • How do we define similar?
  • How do we search quickly?

43
Indexing Time Series
We have seen techniques for assessing the
similarity of two time series. However, we have
not addressed the problem of finding the best
match to a query Q in a large database...
The obvious solution, to retrieve and examine
every item (sequential scanning), simply does not
scale to large datasets. We need some way to
index the data...
44
We can project a time series of length n into
n-dimensional space: the first value in C becomes
the X-axis coordinate, the second value the
Y-axis coordinate, etc. One advantage of doing
this is that we have abstracted away the details
of time series; now all query processing can be
imagined as finding points in space...
45
Interesting Sidebar: The Minkowski Metrics have
simple geometric interpretations...
...we can project the query time series Q into the
same n-dimensional space and simply look for the
nearest points.

(Figure: the neighborhood of Q under the
Euclidean, Weighted Euclidean, Manhattan and Max
metrics.)

...the problem is that we have to look at every
point to find the nearest neighbor...
46
We can group clusters of datapoints with boxes,
called Minimum Bounding Rectangles (MBR). We can
further recursively group MBRs into larger MBRs.
47
...these nested MBRs are organized as a tree
(called a spatial access tree or a
multidimensional tree). Examples include the
R-tree, Hybrid-Tree etc.

(Figure: the root holds R10, R11, R12; these
point to R1..R9; the leaves are data nodes
containing points.)
48
If we project a query into n-dimensional space,
how many additional (nonempty) MBRs must we
examine before we are guaranteed to find the best
match?
For the one dimensional case, the answer is
clearly 2...
49
For the two dimensional case, the answer is 8...
50
For the three dimensional case, the answer is
26...
More generally, in n-dimensional space we must
examine 3^n - 1 MBRs. This is known as the curse
of dimensionality.

n = 21 → 3^21 - 1 = 10,460,353,202 MBRs
51
Spatial Access Methods
We can use Spatial Access Methods like the R-Tree
to index our data, but... The performance of
R-Trees degrades exponentially with the number of
dimensions. Somewhere above 6-20 dimensions the
R-Tree degrades to linear scanning. Often we
want to index time series with hundreds, perhaps
even thousands of features.
52
GEMINI: GEneric Multimedia INdexIng
(Christos Faloutsos)
  • Establish a distance metric from a domain expert.
  • Produce a dimensionality reduction technique that
    reduces the dimensionality of the data from n to
    N, where N can be efficiently handled by your
    favorite SAM.
  • Produce a distance measure defined on the
    N-dimensional representation of the data, and prove
    that it obeys D_indexspace(A,B) ≤ D_true(A,B),
    i.e. the lower bounding lemma.
  • Plug into an off-the-shelf SAM.
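The logic of a GEMINI query, stripped of any
particular SAM, can be sketched as follows (the
function names are ours; the key property is that
d_index lower-bounds d_true, so the candidate set
may contain false alarms but no false
dismissals):

```python
def gemini_range_query(query, database, reduce, d_index, d_true, eps):
    """Range query under the GEMINI framework (illustrative sketch)."""
    q_reduced = reduce(query)
    # Filter in the reduced space: may admit false alarms, never
    # drops a true answer, because d_index <= d_true.
    candidates = [x for x in database
                  if d_index(q_reduced, reduce(x)) <= eps]
    # Refine: remove false alarms with the true distance.
    return [x for x in candidates if d_true(query, x) <= eps]
```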

53
We have 6 objects in 3-D space. We issue a query
to find all objects within 1 unit of the point
(-3, 0, -2)...
(Figure: the six objects A, B, C, D, E, F in 3-D
space; only E lies within 1 unit of the query
point.)
54
Consider what would happen if we issued the same
query after reducing the dimensionality to 2,
assuming the dimensionality reduction technique
obeys the lower bounding lemma...
The query successfully finds the object E.
(Figure: the same six objects projected into 2-D;
the query region still contains E.)
55
Example of a dimensionality reduction technique
in which the lower bounding lemma is satisfied
Informally, it's OK if objects appear closer in
the dimensionality-reduced space than in the
true space.
Note that because of the dimensionality
reduction, object F appears to be less than one
unit from the query (it is a false alarm). This
is OK so long as it does not happen too much,
since we can always retrieve it, then test it in
the true, 3-dimensional space. This would leave
us with just E, the correct answer.
(Figure: the 2-D reduced space; F now falls
inside the query region as a false alarm, while E
remains inside.)
56
Example of a dimensionality reduction technique
in which the lower bounding lemma is not
satisfied
Informally, some objects appear further apart in
the dimensionality-reduced space than in the true
space.
Note that because of the dimensionality
reduction, object E appears to be more than one
unit from the query (it is a false
dismissal). This is unacceptable: we have
failed to find the true answer set to our query.
(Figure: the 2-D reduced space; E now falls
outside the query region, a false dismissal.)
57
The examples on the previous slides illustrate
why the lower bounding lemma is so
important. Now all we have to do is to find a
dimensionality reduction technique that obeys the
lower bounding lemma, and we can index our time
series!
58
Notation for Dimensionality Reduction
For the future discussion of dimensionality
reduction we will assume that:
M is the number of time series in our database.
n is the original dimensionality of the data
(i.e. the length of the time series).
N is the reduced dimensionality of the data.
CRatio = N/n is the compression ratio.
59
An Example of a Dimensionality Reduction
Technique I
The graphic shows a time series with 128
points. The raw data used to produce the graphic
is also reproduced as a column of numbers (just
the first 30 or so points are shown).
(Figure: the time series C, plotted over its 128
points, with the raw values alongside.)

n = 128
60
An Example of a Dimensionality Reduction
Technique II
We can decompose the data into 64 pure sine waves
using the Discrete Fourier Transform (just the
first few sine waves are shown). The Fourier
Coefficients are reproduced as a column of
numbers (just the first 30 or so coefficients are
shown). Note that at this stage we have not done
dimensionality reduction, we have merely changed
the representation...
(Figure: C decomposed into its first few sine
waves, with the column of Fourier coefficients
alongside.)
61
An Example of a Dimensionality Reduction
Technique III
Fourier Coefficients:
1.5698 1.0485 0.7160 0.8406
0.3709 0.4670 0.2667 0.1928
0.1635 0.1602 0.0992 0.1282
0.1438 0.1416 0.1400 0.1412
0.1530 0.0795 0.1013 0.1150
0.1801 0.1082 0.0812 0.0347
0.0052 0.0017 0.0002 ...

Truncated Fourier Coefficients (the first 8 kept):
1.5698 1.0485 0.7160 0.8406
0.3709 0.4670 0.2667 0.1928

n = 128, N = 8, Cratio = 1/16

(Figure: C and its reconstruction C' from the
truncated coefficients.)

...however, note that the first few sine waves
tend to be the largest (equivalently, the
magnitude of the Fourier coefficients tends to
decrease as you move down the column). We can
therefore truncate most of the small coefficients
with little effect.
We have discarded 15/16 of the data.
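With NumPy's FFT this truncation is a one-liner
each way (a sketch; keeping 4 complex
coefficients stores 8 numbers, matching N = 8 in
the example):

```python
import numpy as np

def dft_reduce(ts, n_coeffs=4):
    """Keep only the first few complex Fourier coefficients."""
    return np.fft.rfft(np.asarray(ts, dtype=float))[:n_coeffs]

def dft_reconstruct(coeffs, n):
    """Rebuild a length-n series from the truncated coefficients."""
    full = np.zeros(n // 2 + 1, dtype=complex)
    full[:len(coeffs)] = coeffs
    return np.fft.irfft(full, n)
```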
62
An Example of a Dimensionality Reduction
Technique IIII
Sorted Truncated Fourier Coefficients:
1.5698 1.0485 0.7160 0.8406
0.2667 0.1928 0.1438 0.1416

(Figure: C and its reconstruction C' from the
eight largest coefficients.)

Instead of taking the first few coefficients, we
could take the best coefficients. This can help
greatly in terms of approximation quality, but
makes indexing hard (impossible?). Note this
applies also to Wavelets.
63
(Figure: the same time series approximated by six
representations: DFT, DWT, SVD, APCA, PAA and
PLA.)

DFT: Agrawal, Faloutsos & Swami. FODO 1993;
Faloutsos, Ranganathan & Manolopoulos. SIGMOD 1994
DWT: Chan & Fu. ICDE 1999
SVD: Korn, Jagadish & Faloutsos. SIGMOD 1997
PAA: Keogh, Chakrabarti, Pazzani & Mehrotra. KAIS
2000; Yi & Faloutsos. VLDB 2000
APCA: Keogh, Chakrabarti, Pazzani & Mehrotra.
SIGMOD 2001
PLA: Morinaka, Yoshikawa, Amagasa & Uemura. PAKDD
2001
64
Discrete Fourier Transform I
Basic Idea: Represent the time series as a linear
combination of sines and cosines, but keep only
the first N/2 coefficients. Why N/2
coefficients? Because each sine wave requires 2
numbers, for the phase (w) and amplitude (A, B).
(Figure: X and its DFT approximation X', with the
first ten basis waves numbered 0..9. Portrait:
Jean Fourier, 1768-1830.)
Excellent free Fourier Primer: Hagit Shatkay, "The
Fourier Transform - a Primer", Technical Report
CS-95-37, Department of Computer Science, Brown
University, 1995.
http://www.ncbi.nlm.nih.gov/CBBresearch/Postdocs/Shatkay/
65
Discrete Fourier Transform II
  • Pros and Cons of DFT as a time series
    representation.
  • Good ability to compress most natural signals.
  • Fast, off-the-shelf DFT algorithms exist:
    O(n log n).
  • (Weakly) able to support time warped queries.
  • Difficult to deal with sequences of different
    lengths.
  • Cannot support weighted distance measures.

(Figure: X, X' and the basis waves 0..9, as
before.)

Note: The related transform DCT uses only cosine
basis functions. It does not seem to offer any
particular advantage over DFT.
66
Discrete Wavelet Transform I
Basic Idea: Represent the time series as a linear
combination of Wavelet basis functions, but keep
only the first N coefficients. Although there
are many different types of wavelets, researchers
in time series mining/indexing generally use Haar
wavelets. Haar wavelets seem to be as powerful
as the other wavelets for most problems and are
very easy to code.
Alfred Haar 1885-1933
Excellent free Wavelets Primer: Stollnitz, E.,
DeRose, T. and Salesin, D. (1995). Wavelets for
computer graphics: A primer. IEEE Computer
Graphics and Applications.
67
X = <8, 4, 1, 3>

h1 = 4 = mean(8,4,1,3)
h2 = 2 = mean(8,4) - h1
h3 = 2 = (8-4)/2
h4 = -1 = (1-3)/2

I have converted a raw time series X = <8, 4, 1,
3> into the Haar Wavelet representation H = <4,
2, 2, -1>. We can convert the Haar representation
back to the raw signal with no loss of
information...

(Figure: step plots of X and of its
reconstruction from h1..h4.)
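The decomposition generalizes to any
power-of-two-length series: repeatedly replace
the series by pairwise averages, recording
pairwise half-differences as detail
coefficients. A sketch that reproduces the
example above:

```python
def haar_transform(x):
    """Haar wavelet decomposition for a power-of-two-length series."""
    x = [float(v) for v in x]
    details = []
    while len(x) > 1:
        averages = [(a + b) / 2 for a, b in zip(x[0::2], x[1::2])]
        halves   = [(a - b) / 2 for a, b in zip(x[0::2], x[1::2])]
        details = halves + details   # finer-scale details go last
        x = averages
    return x + details               # overall mean first

print(haar_transform([8, 4, 1, 3]))  # [4.0, 2.0, 2.0, -1.0]
```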
68
Discrete Wavelet Transform II
We have only considered one type of wavelet, but
there are many others. Are the other wavelets
better for indexing?
YES: I. Popivanov & R. Miller. Similarity Search
Over Time Series Data Using Wavelets. ICDE 2002.
NO: K. Chan & A. Fu. Efficient Time Series
Matching by Wavelets. ICDE 1999.
I consider this an open question...
69
Discrete Wavelet Transform III
  • Pros and Cons of Wavelets as a time series
    representation.
  • Good ability to compress stationary signals.
  • Fast linear time algorithms for DWT exist.
  • Able to support some interesting non-Euclidean
    similarity measures.
  • Signals must have a length n = 2^some_integer.
  • Works best if N is 2^some_integer. Otherwise
    wavelets approximate the left side of the signal
    at the expense of the right side.
  • Cannot support weighted distance measures.

70
Singular Value Decomposition I
Basic Idea: Represent the time series as a linear
combination of eigenwaves, but keep only the
first N coefficients. SVD is similar to the
Fourier and Wavelet approaches in that we
represent the data in terms of a linear
combination of shapes (in this case
eigenwaves). SVD differs in that the eigenwaves
are data dependent. SVD has been successfully
used in the text processing community (where it
is known as Latent Semantic Indexing) for many
years.
Good free SVD Primer: Singular Value
Decomposition - A Primer. Sonia Leach.
(Figure: X and its SVD approximation X'.
Portraits: James Joseph Sylvester 1814-1897,
Camille Jordan 1838-1921, Eugenio Beltrami
1835-1899.)
71
Singular Value Decomposition II
How do we create the eigenwaves?
We have previously seen that we can regard time
series as points in high dimensional space. We
can rotate the axes such that axis 1 is aligned
with the direction of maximum variance, axis 2 is
aligned with the direction of maximum variance
orthogonal to axis 1, etc. Since the first few
eigenwaves contain most of the variance of the
signal, the rest can be truncated with little
loss.
(Figure: X and its SVD approximation X'.)
This process can be achieved by factoring an M by
n matrix of time series into 3 other matrices
(the familiar factorization $A = U \Sigma V^T$),
and truncating the new matrices at size N.
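A sketch with NumPy (the mean-centering and
reconstruction details are our assumptions; the
tutorial describes only the idea):

```python
import numpy as np

def svd_reduce(X, N):
    """Reduce M time series (rows of X, length n) to N coefficients
    each, using the top-N data-dependent eigenwaves."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    eigenwaves = Vt[:N]                    # directions of max variance
    coeffs = (X - mean) @ eigenwaves.T     # M x N reduced representation
    X_approx = coeffs @ eigenwaves + mean  # truncated reconstruction
    return coeffs, X_approx
```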
72
Singular Value Decomposition III
  • Pros and Cons of SVD as a time series
    representation.
  • Optimal linear dimensionality reduction
    technique.
  • The eigenvalues tell us something about the
    underlying structure of the data.
  • Computationally very expensive:
  • Time: O(Mn^2)
  • Space: O(Mn)
  • An insertion into the database requires
    recomputing the SVD.
  • Cannot support weighted distance measures or
    non-Euclidean measures.

(Figure: X and its SVD approximation X'.)

Note: There has been some promising research into
mitigating SVD's time and space complexity.
73
Piecewise Linear Approximation I
Basic Idea: Represent the time series as a
sequence of straight lines. Lines could be
connected, in which case we are allowed N/2
lines. If lines are disconnected, we are
allowed only N/3 lines. Personal experience on
dozens of datasets suggests disconnected is
better. Also, only disconnected allows a lower
bounding Euclidean approximation.
(Figure: X and its piecewise linear approximation
X'. Portrait: Karl Friedrich Gauss, 1777-1855.)

  • Connected lines: each line segment has
  • length
  • left_height
  • (right_height can be inferred by looking at the
    next segment)
  • Disconnected lines: each line segment has
  • length
  • left_height
  • right_height
74
Piecewise Linear Approximation II

  • How do we obtain the Piecewise Linear
    Approximation?
  • The optimal solution is O(n^2 N), which is too
    slow for data mining.
  • A vast body of work on faster heuristic solutions
    to the problem can be classified into the
    following classes:
  • Top-Down: O(n^2 N)
  • Bottom-Up: O(n(1/CRatio))
  • Sliding Window: O(n(1/CRatio))
  • Other (genetic algorithms, randomized
    algorithms, B-spline wavelets, MDL etc.)
  • A recent extensive empirical evaluation of all
    approaches suggests that Bottom-Up is the best
    approach overall (see the sketch below).

(Figure: X and its piecewise linear approximation
X'.)
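A minimal sketch of the Bottom-Up heuristic (a
naive version that recomputes merge costs on
every pass; real implementations cache them, and
the least-squares line-fit cost used here is an
assumption):

```python
import numpy as np

def fit_cost(ts, lo, hi):
    """Squared residual of the best straight line through ts[lo:hi]."""
    x = np.arange(lo, hi)
    slope, intercept = np.polyfit(x, ts[lo:hi], 1)
    return float(np.sum((ts[lo:hi] - (slope * x + intercept)) ** 2))

def bottom_up_pla(ts, max_segments):
    """Merge the cheapest adjacent pair of segments until only
    max_segments remain. Assumes len(ts) is even."""
    ts = np.asarray(ts, dtype=float)
    bounds = [(i, i + 2) for i in range(0, len(ts), 2)]  # tiny segments
    while len(bounds) > max_segments:
        costs = [fit_cost(ts, bounds[i][0], bounds[i + 1][1])
                 for i in range(len(bounds) - 1)]
        i = int(np.argmin(costs))
        bounds[i:i + 2] = [(bounds[i][0], bounds[i + 1][1])]
    return bounds
```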
75
Piecewise Linear Approximation III
  • Pros and Cons of PLA as a time series
    representation.
  • Good ability to compress natural signals.
  • Fast linear time algorithms for PLA exist.
  • Able to support some interesting non-Euclidean
    similarity measures, including weighted measures,
    relevance feedback, fuzzy queries...
  • Already widely accepted in some communities
    (i.e., biomedical).
  • Not (currently) indexable by any data structure
    (but does allow fast sequential scanning).

(Figure: X and its PLA approximation X'.)
76
Symbolic Approximation

Basic Idea: Convert the time series into an
alphabet of discrete symbols. Use string indexing
techniques to manage the data. Potentially an
interesting idea, but all the papers thus far are
very ad hoc.

(Figure: X and its symbolic approximation X' =
C U U C D C U D.)
  • Pros and Cons of Symbolic Approximation as a time
    series representation.
  • Potentially, we could take advantage of a wealth
    of techniques from the very mature field of
    string processing.
  • There is no known technique to allow the support
    of Euclidean queries.
  • It is not clear how we should discretize the
    time series (discretize the values, the slope,
    shapes? How big an alphabet? etc.)

Key: C = Constant, U = Up, D = Down
77
Piecewise Aggregate Approximation I

Basic Idea: Represent the time series as a
sequence of box basis functions. Note that each
box is the same length.

(Figure: X and its PAA approximation X'.)

Given the reduced dimensionality representation,
we can calculate the approximate Euclidean
distance as

$DR(\bar{Q},\bar{C}) = \sqrt{\tfrac{n}{N}} \sqrt{\sum_{i=1}^{N} (\bar{q}_i - \bar{c}_i)^2}$

This measure is provably lower bounding.
Independently introduced by two sets of authors:
Keogh, Chakrabarti, Pazzani & Mehrotra, KAIS
(2000); Byoung-Kee Yi & Christos Faloutsos, VLDB
(2000).
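Both the representation and its lower-bounding
distance are a few lines (a sketch; assumes N
divides n evenly):

```python
import numpy as np

def paa(ts, N):
    """PAA: mean of each of N equal-length frames."""
    return np.asarray(ts, dtype=float).reshape(N, -1).mean(axis=1)

def paa_lower_bound(q_bar, c_bar, n):
    """DR(Q,C): provably lower-bounds the true Euclidean distance."""
    N = len(q_bar)
    return np.sqrt(n / N) * np.sqrt(np.sum((q_bar - c_bar) ** 2))
```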
78
Piecewise Aggregate Approximation II
  • Pros and Cons of PAA as a time series
    representation.
  • Extremely fast to calculate.
  • As efficient as other approaches (empirically).
  • Supports queries of arbitrary lengths.
  • Can support any Minkowski metric.
  • Supports non-Euclidean measures.
  • Supports weighted Euclidean distance.
  • Simple! Intuitive!
  • If visualized directly, looks aesthetically
    unpleasing.

(Figure: X and its PAA approximation X'.)
79
Adaptive Piecewise Constant Approximation I

Basic Idea: Generalize PAA to allow the piecewise
constant segments to have arbitrary lengths.
Note that we now need 2 coefficients to represent
each segment: its value and its length.

(Figure: X and its APCA approximation, with
segments <cv1, cr1>, <cv2, cr2>, <cv3, cr3>,
<cv4, cr4>.)

The intuition is this: many signals have little
detail in some places, and high detail in other
places. APCA can adaptively fit itself to the
data, achieving better approximation.
80
Adaptive Piecewise Constant Approximation II

The high quality of APCA had been noted by
many researchers. However, it was believed that
the representation could not be indexed because
some coefficients represent values, and some
represent lengths. However, an indexing method
was discovered! (SIGMOD 2001 best paper award.)
Unfortunately, it is non-trivial to understand
and implement.

(Figure: X and its APCA segments <cv1, cr1> ...
<cv4, cr4>.)
81
Adaptive Piecewise Constant Approximation
  • Pros and Cons of APCA as a time series
    representation.
  • Fast to calculate: O(n).
  • More efficient than other approaches (on some
    datasets).
  • Supports queries of arbitrary lengths.
  • Supports non-Euclidean measures.
  • Supports weighted Euclidean distance.
  • Supports fast exact queries, and even faster
    approximate queries on the same data structure.
  • Somewhat complex implementation.
  • If visualized directly, looks aesthetically
    unpleasing.

(Figure: X and its APCA segments <cv1, cr1> ...
<cv4, cr4>.)
82
Comparison of all dimensionality reduction
techniques
  • We can compare the time it takes to build the
    index, for different sizes of databases, and
    different query lengths.
  • We can compare the indexing efficiency. How long
    does it take to find the best answer to our
    query? It turns out that the fairest way to
    measure this is to count the number of times we
    have to retrieve an item from disk.
  • We can simply compare features. Does approach X
    allow weighted queries, queries of arbitrary
    lengths, is it simple to implement...

83
The time needed to build the index
Black topped histogram bars (in SVD) indicate
experiments abandoned at 1,000 seconds.
84
The fraction of the data that must be retrieved
from disk to answer a one nearest neighbor query
Dataset is Stock Market Data
85
The fraction of the data that must be retrieved
from disk to answer a one nearest neighbor query
Dataset is mixture of many structured datasets,
like ECGs
86
Directions for Future Research
  • Time series in 2, 3, K dimensions.
  • Transforming other problems into time series
    problems.
  • Weighted Distance Measures.
  • Relevance Feedback (User interfaces).
  • Random Projections.
  • Approximations to SVD.
  • Case studies

87
Questions?
  • Thanks to the people with whom I have co-authored
    Time Series Papers
  • Michael Pazzani
  • Sharad Mehrotra
  • Kaushik Chakrabarti
  • Selina Chu
  • David Hart
  • Padhraic Smyth

Visit the Time Series Data Mining Archive for
code, datasets, papers and pointers:
http://www.cs.ucr.edu/~eamonn/