Title: Mining Time Series Data
1. Mining Time Series Data
CS240B Notes by Carlo Zaniolo, UCLA CS Dept.
With slides from "A Tutorial on Indexing and Mining Time Series Data", ICDM '01: The 2001 IEEE International Conference on Data Mining, November 29, San Jose.
Dr. Eamonn Keogh, Computer Science & Engineering Department, University of California, Riverside, Riverside, CA 92521. eamonn@cs.ucr.edu
2. Outline
- Introduction, Motivation
- Similarity Measures
- Properties of distance measures
- Preprocessing the data
- Time warped measures
- Indexing Time Series
- Dimensionality reduction
- Discrete Fourier Transform
- Discrete Wavelet Transform
- Singular Value Decomposition
- Piecewise Linear Approximation
- Symbolic Approximation
- Piecewise Aggregate Approximation
- Adaptive Piecewise Constant Approximation
- Summary, Conclusions
3. What are Time Series?
25.1750 25.2250 25.2500 25.2500
25.2750 25.3250 25.3500 25.3500
25.4000 25.4000 25.3250 25.2250
25.2000 25.1750 .. .. 24.6250
24.6750 24.6750 24.6250 24.6250
24.6250 24.6750 24.7500
A time series is a collection of observations
made sequentially in time.
Note that virtually all similarity measurements,
indexing and dimensionality reduction techniques
discussed in this tutorial can be used with other
data types.
4. Time Series are Ubiquitous! I
- People measure things...
- The president's approval rating.
- Their blood pressure.
- The annual rainfall in Riverside.
- The value of their Yahoo stock.
- The number of web hits per second.
- and things change over time.
Thus time series occur in virtually every medical, scientific and business domain.
5. Time Series are Ubiquitous! II
A random sample of 4,000 graphics from 15 of the world's newspapers published from 1974 to 1989 found that more than 75% of all graphics were time series (Tufte, 1983).
6. Time Series Similarity
Defining the similarity between two time series is at the heart of most time series data mining applications/tasks:
- Classification
- Clustering
- Rule Discovery (e.g., a rule with support s = 0.5 and confidence c = 0.3)
- Query by Content (given a query Q as a template)
Thus time series similarity will be the primary focus of this tutorial.
7. Why is Working With Time Series so Difficult? Part I
Answer: How do we work with very large databases?
- 1 hour of EKG data: 1 gigabyte.
- Typical weblog: 5 gigabytes per week.
- Space Shuttle database: 158 gigabytes and growing.
- Macho database: 2 terabytes, updated with 3 gigabytes per day.
Since most of the data lives on disk (or tape), we need a representation of the data we can efficiently manipulate.
8. Why is Working With Time Series so Difficult? Part II
Answer: We are dealing with subjective notions of similarity.
The definition of similarity depends on the user, the domain and the task at hand. We need to be able to handle this subjectivity.
9. Why is Working With Time Series so Difficult? Part III
Answer: Miscellaneous data handling problems.
- Differing data formats.
- Differing sampling rates.
- Noise, missing values, etc.
10. Similarity Matching Problem, Flavor 1: Whole Matching
Given a query Q (template), a reference database C = {C1, ..., C10} and a distance measure, find the Ci that best matches Q.
[Figure: Q compared against the ten database sequences; C6 is the best match.]
11. Similarity Matching Problem, Flavor 2: Subsequence Matching
Given a query Q (template), a reference database C and a distance measure, find the location in C that best matches Q.
[Figure: the best matching subsection of a long sequence in database C.]
Note that we can always convert subsequence matching to whole matching by sliding a window across the long sequence and copying the window contents.
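As a minimal illustration of that conversion (a sketch in Python/NumPy; the function name is ours, not from the tutorial):

import numpy as np

def sliding_windows(long_series, m):
    # Convert subsequence matching to whole matching: extract every
    # length-m window of the long sequence as its own candidate series.
    n = len(long_series)
    return np.array([long_series[i:i + m] for i in range(n - m + 1)])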
12. After all that background we might have forgotten what we are doing and why we care! So here is a simple motivator and review...
You go to the doctor because of chest pains. Your ECG looks strange. Your doctor wants to search a database to find similar ECGs, in the hope that they will offer clues about your condition...
Two questions:
- How do we define similar?
- How do we search quickly?
13. Similarity is Always Subjective (i.e., it depends on the application)
- "All models are wrong, but some are useful..."
This slide was taken from "A Practical Time-Series Tutorial with MATLAB", presented at ECML/PKDD 2005 by Michalis Vlachos.
14. Distance Functions
- Metric
- Euclidean distance
- Correlation
- Triangle inequality: d(x,z) <= d(x,y) + d(y,z)
- Example of pruning with the triangle inequality: assume d(Q, bestMatch) = 20 and d(Q,B) = 150. Then, since d(A,B) = 20, d(Q,A) >= d(Q,B) - d(B,A) = 150 - 20 = 130. We do not need to get A from disk (see the sketch after this list).
- Non-Metric
- Time warping
- LCSS (longest common subsequence)
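To make the pruning idea concrete, here is a minimal sketch in Python/NumPy. It assumes a hypothetical precomputed matrix pairwise of distances between database sequences; the names are illustrative, not from the tutorial.

import numpy as np

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def search_with_pruning(query, database, pairwise):
    # Linear scan that skips candidates via the triangle inequality:
    # d(Q, C_i) >= d(Q, C_last) - d(C_last, C_i), so if that lower bound
    # already exceeds the best distance, C_i need not be read from disk.
    best_dist, best_idx = np.inf, None
    last_idx = None
    for i, candidate in enumerate(database):
        if last_idx is not None and last_dist - pairwise[last_idx][i] > best_dist:
            continue  # cannot beat the best match; skip this candidate
        last_idx, last_dist = i, euclidean(query, candidate)
        if last_dist < best_dist:
            best_dist, best_idx = last_dist, i
    return best_idx, best_dist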
15. Preprocessing the Data Before Distance Calculations
- If we naively try to measure the distance between two raw time series, we may get very unintuitive results.
- This is because Euclidean distance is very sensitive to some distortions in the data. For most problems these distortions are not meaningful, and thus we can and should remove them.
- In the next 4 slides I will discuss the 4 most common distortions, and how to remove them:
- Offset translation
- Amplitude scaling
- Linear trend
- Noise
16. Transformation I: Offset Translation
Q = Q - mean(Q)
C = C - mean(C)
Then compute D(Q, C).
[Figure: the two series before and after subtracting their means.]
17. Transformation II: Amplitude Scaling
Q = (Q - mean(Q)) / std(Q)
C = (C - mean(C)) / std(C)
Then compute D(Q, C).
[Figure: the two series before and after amplitude scaling.]
18. Transformation III: Linear Trend
The intuition behind removing linear trend is this: fit the best-fitting straight line to the time series, then subtract that line from the time series.
[Figure: a series after offset translation and amplitude scaling, and the same series with the linear trend removed.]
19. Transformation IV: Noise
Q = smooth(Q)
C = smooth(C)
Then compute D(Q, C).
The intuition behind removing noise is this: average each datapoint's value with its neighbors.
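The four transformations above are easy to state in code. A minimal NumPy sketch (the function names and the window width w are our own choices, not from the tutorial):

import numpy as np

def remove_offset(x):
    # Transformation I: offset translation
    return x - x.mean()

def rescale_amplitude(x):
    # Transformation II: amplitude scaling (z-normalization)
    return (x - x.mean()) / x.std()

def remove_linear_trend(x):
    # Transformation III: subtract the best-fit straight line
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)

def smooth(x, w=5):
    # Transformation IV: average each point with its neighbors
    return np.convolve(x, np.ones(w) / w, mode='same')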
20. A Quick Experiment to Demonstrate the Utility of Preprocessing the Data
[Figure: two dendrograms over the same nine series. Left: clustered using Euclidean distance on the raw data. Right: clustered using Euclidean distance on the raw data after removing noise, linear trend, offset translation and amplitude scaling.]
21. Summary of Preprocessing
The raw time series may have distortions which we should remove before clustering, classification etc. Of course, sometimes the distortions are the most interesting thing about the data; the above is only a general rule. We should keep these problems in mind as we consider the high-level representations of time series which we will encounter later (Fourier transforms, wavelets etc.), since these representations often allow us to handle distortions in elegant ways.
22. Dynamic Time Warping
Fixed time axis: sequences are aligned one-to-one.
Warped time axis: nonlinear alignments are possible.
Note: We will first see the utility of DTW, then see how it is calculated.
23. Utility of Dynamic Time Warping: Example II, Data Mining
Power-demand time series. Each sequence corresponds to a week's demand for power in a Dutch research facility in 1997 [van Selow 1999]. (In one of the weeks, Wednesday was a national holiday.)
24. Hierarchical Clustering with Euclidean Distance <Group Average Linkage>
The two 5-day weeks are correctly grouped. Note, however, that the three 4-day weeks are not clustered together. Also, the two 3-day weeks are not clustered together.
25. Hierarchical Clustering with Dynamic Time Warping <Group Average Linkage>
The two 5-day weeks are correctly grouped. The three 4-day weeks are clustered together. The two 3-day weeks are also clustered together.
26. Dynamic Time Warping (how does it work?)
The intuition is that we copy an element multiple times so as to achieve a better matching.
Euclidean distance: d = 1 for T1 = [1, 1, 2, 2] and T2 = [1, 2, 2, 2].
Warping distance: d = 0 for T1 = [1, 1, 2, 2] and T2 = [1, 2, 2, 2] (both 1s of T1 align with the single 1 of T2).
27. Computing the Dynamic Time Warp Distance I
Note that the input sequences can be of different lengths: Q has length n, C has length p.
28. Computing the Dynamic Time Warp Distance II
Every possible mapping from Q to C can be represented as a warping path in the n-by-p search matrix. We simply want to find the cheapest one. Although there are exponentially many such paths, we can find the cheapest in only quadratic time using dynamic programming.
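A minimal sketch of that dynamic program in Python/NumPy (a textbook DTW recurrence, not the tutorial's exact code):

import numpy as np

def dtw(q, c):
    # D[i, j] = cost of the cheapest warping path aligning q[:i] with c[:j]
    n, p = len(q), len(c)
    D = np.full((n + 1, p + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, p + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j],      # q[i-1] also maps to an earlier c
                                 D[i, j - 1],      # c[j-1] also maps to an earlier q
                                 D[i - 1, j - 1])  # one-to-one step
    return np.sqrt(D[n, p])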
29. Complexity of Time Warping
Time taken to create hierarchical clustering of the power-demand time series:
- Time to create the dendrogram using Euclidean distance: 1.2 seconds
- Time to create the dendrogram using dynamic time warping: 3.40 hours
How to speed it up:
- Approach 1: Complexity is O(n^2). We can reduce it to O(n) simply by restricting the warping path.
- Approach 2: Approximate the time series with some compressed or downsampled representation, and do DTW on the new representation (see the sketch below).
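Both speed-ups can be sketched on top of the dtw() function above; the band width w and the downsampling factor are illustrative assumptions, not values from the tutorial.

import numpy as np

def dtw_banded(q, c, w):
    # Approach 1: only allow cells with |i - j| <= w (a Sakoe-Chiba style
    # band), reducing the filled cells from O(n^2) toward O(n * w).
    n, p = len(q), len(c)
    D = np.full((n + 1, p + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(p, i + w) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, p])

def dtw_approx(q, c, factor=8):
    # Approach 2: downsample both series, then warp the short versions
    # (uses the dtw() sketch from the previous slide).
    return dtw(q[::factor], c[::factor])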
30. Fast Approximations to Dynamic Time Warp Distance II
[Figure: DTW on the raw data took 22.7 sec; on the approximation, 1.3 sec.]
...strong visual evidence suggests it works well. Good experimental evidence of the utility of the approach on clustering, classification and query-by-content problems has also been demonstrated.
31. Weighted Distance Measures I
Intuition: For some queries, different parts of the sequence are more important.
Weighting features is a well-known technique in the machine learning community to improve classification and the quality of clustering; a sketch of a weighted distance follows.
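A minimal sketch of such a measure, assuming one weight per time point (our own formulation of the standard weighted Euclidean distance):

import numpy as np

def weighted_euclidean(q, c, w):
    # Points with larger w_i contribute more to the distance.
    # Setting all weights to 1 recovers the ordinary Euclidean distance.
    return float(np.sqrt(np.sum(w * (q - c) ** 2)))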
32. Relevance Feedback for Time Series
The original query, and its weight vector. Initially, all weights are the same.
Note: In this example we are using a piecewise linear approximation of the data. We will learn more about this representation later.
33. The initial query is executed, and the five best matches are shown (in the dendrogram).
One by one, the 5 best matching sequences will appear, and the user will rank them from very bad (-3) to very good (+3).
34. Based on the user feedback, both the shape and the weight vector of the query are changed.
The new query can be executed. The hope is that the query shape and weights will converge to the optimal query.
Two papers consider relevance feedback for time series, among them: L. Wu, C. Faloutsos, K. Sycara, T. Payne. FALCON: Feedback Adaptive Loop for Content-Based Retrieval. VLDB 2000: 297-306.
35. Motivating Example Revisited...
You go to the doctor because of chest pains. Your ECG looks strange. Your doctor wants to search a database to find similar ECGs, in the hope that they will offer clues about your condition...
Two questions:
- How do we define similar?
- How do we search quickly?
36. Indexing Time Series
- We have seen techniques for assessing the similarity of two time series.
- However, we have not addressed the problem of finding the best match to a query in a large database...
- We need some way to index the data...
- This topic is extensively discussed in the literature; we will not discuss it here for lack of time, and it might not be applicable to data streams.
37. Compression / Dimensionality Reduction
- Project all sequences into a new space, and search this space instead.
38. An Example of a Dimensionality Reduction Technique
The graphic shows a time series C with n = 128 points. The raw data used to produce the graphic is also reproduced as a column of numbers (just the first 30 or so points are shown).
39. Dimensionality Reduction (cont.)
We can decompose the data into 64 pure sine waves using the Discrete Fourier Transform (just the first few sine waves are shown). The Fourier coefficients are reproduced as a column of numbers (just the first 30 or so coefficients are shown). Note that at this stage we have not done dimensionality reduction; we have merely changed the representation...
40. An Example of a Dimensionality Reduction Technique III
Fourier coefficients:
1.5698 1.0485 0.7160 0.8406
0.3709 0.4670 0.2667 0.1928
0.1635 0.1602 0.0992 0.1282
0.1438 0.1416 0.1400 0.1412
0.1530 0.0795 0.1013 0.1150
0.1801 0.1082 0.0812 0.0347
0.0052 0.0017 0.0002 ...
Truncated Fourier coefficients:
1.5698 1.0485 0.7160 0.8406
0.3709 0.4670 0.2667 0.1928
n = 128, N = 8, compression ratio = 1/16
...however, note that the first few sine waves tend to be the largest (equivalently, the magnitudes of the Fourier coefficients tend to decrease as you move down the column). We can therefore truncate most of the small coefficients with little effect. We have discarded 15/16 of the data.
41. An Example of a Dimensionality Reduction Technique IV
Sorted truncated Fourier coefficients:
1.5698 1.0485 0.7160 0.8406
0.2667 0.1928 0.1438 0.1416
Instead of taking the first few coefficients, we could take the best coefficients. This can help greatly in terms of approximation quality, but makes indexing hard (impossible?). Note this applies also to wavelets.
42. Compressed Representations
43. Discrete Fourier Transform I
Basic idea: Represent the time series as a linear combination of sines and cosines, but keep only the first n/2 coefficients. Why n/2 coefficients? Because each sine wave requires 2 numbers, for the phase (w) and amplitude (A, B).
[Figure: a series X, its DFT approximation X', and the first several sinusoidal basis functions. Portrait: Jean Fourier, 1768-1830.]
Excellent free Fourier primer: Hagit Shatkay, "The Fourier Transform - a Primer", Technical Report CS-95-37, Department of Computer Science, Brown University, 1995. http://www.ncbi.nlm.nih.gov/CBBresearch/Postdocs/Shatkay/
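A minimal sketch of this representation with NumPy's FFT (each complex coefficient carries the amplitude and phase of one sinusoid; the function names are ours):

import numpy as np

def dft_reduce(x, n_coeffs):
    # Keep only the first n_coeffs complex Fourier coefficients.
    return np.fft.rfft(x)[:n_coeffs]

def dft_reconstruct(coeffs, n):
    # Zero-pad back to full length and invert, giving the smooth X'.
    F = np.zeros(n // 2 + 1, dtype=complex)
    F[:len(coeffs)] = coeffs
    return np.fft.irfft(F, n)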
44. Discrete Fourier Transform II
Pros and cons of DFT as a time series representation:
- Good ability to compress most natural signals.
- Fast, off-the-shelf DFT algorithms exist: O(n log n).
- (Weakly) able to support time-warped queries.
- Difficult to deal with sequences of different lengths.
- Cannot support weighted distance measures.
Note: The related transform DCT uses only cosine basis functions. It does not seem to offer any particular advantages over DFT.
45. Discrete Wavelet Transform I
Basic idea: Represent the time series as a linear combination of wavelet basis functions, but keep only the first N coefficients. Although there are many different types of wavelets, researchers in time series mining/indexing generally use Haar wavelets. Haar wavelets seem to be as powerful as the other wavelets for most problems and are very easy to code.
[Portrait: Alfred Haar, 1885-1933.]
Excellent free wavelets primer: Stollnitz, E., DeRose, T., & Salesin, D. (1995). Wavelets for computer graphics: A primer. IEEE Computer Graphics and Applications.
46. X = [8, 4, 1, 3]
h1 = 4 = mean(8, 4, 1, 3)
h2 = 2 = mean(8, 4) - h1
h3 = 2 = (8 - 4) / 2
h4 = -1 = (1 - 3) / 2
I have converted a raw time series X = [8, 4, 1, 3] into the Haar wavelet representation H = [4, 2, 2, -1]. We can convert the Haar representation back to the raw signal with no loss of information...
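The averaging-and-differencing scheme above generalizes to any power-of-two length. A minimal sketch (our own code, not the tutorial's):

import numpy as np

def haar(x):
    # Repeatedly replace the series by pairwise averages, recording the
    # pairwise half-differences as detail coefficients.
    x = np.asarray(x, dtype=float)
    details = []
    while len(x) > 1:
        avg = (x[0::2] + x[1::2]) / 2
        diff = (x[0::2] - x[1::2]) / 2
        details = list(diff) + details   # finer details go later in H
        x = avg
    return [x[0]] + details

# haar([8, 4, 1, 3]) returns [4.0, 2.0, 2.0, -1.0], matching H above.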
47. Discrete Wavelet Transform III
Pros and cons of wavelets as a time series representation:
- Good ability to compress stationary signals.
- Fast linear-time algorithms for DWT exist.
- Able to support some interesting non-Euclidean similarity measures.
- Works best if N is 2^some_integer (a power of two); otherwise wavelets approximate the left side of the signal at the expense of the right side.
- Cannot support weighted distance measures.
Open question: We have only considered one type of wavelet; there are many others. Are the other wavelets better for indexing? YES: I. Popivanov, R. Miller. Similarity Search Over Time Series Data Using Wavelets. ICDE 2002. NO: K. Chan and A. Fu. Efficient Time Series Matching by Wavelets. ICDE 1999. Obviously, this question is still open...
48. Singular Value Decomposition
Basic idea: Represent the time series as a linear combination of eigenwaves, but keep only the first N coefficients. SVD is similar to the Fourier and wavelet approaches in that we represent the data in terms of a linear combination of shapes (in this case, eigenwaves). SVD differs in that the eigenwaves are data dependent. SVD has been successfully used in the text processing community (where it is known as Latent Semantic Indexing) for many years, but it is computationally expensive.
Good free SVD primer: "Singular Value Decomposition - A Primer", Sonia Leach.
[Figure: X and its SVD approximation X'. Portraits: James Joseph Sylvester, 1814-1897; Camille Jordan, 1838-1921; Eugenio Beltrami, 1835-1899.]
49. Singular Value Decomposition (cont.)
How do we create the eigenwaves?
We have previously seen that we can regard time series as points in high-dimensional space. We can rotate the axes such that axis 1 is aligned with the direction of maximum variance, axis 2 is aligned with the direction of maximum variance orthogonal to axis 1, etc. Since the first few eigenwaves contain most of the variance of the signal, the rest can be truncated with little loss.
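A minimal sketch of building eigenwaves with NumPy's SVD, assuming the database is a matrix X with one time series per row (the names are ours):

import numpy as np

def svd_reduce(X, N):
    # Center the data, factor it, and keep the top-N right singular
    # vectors: these are the data-dependent "eigenwaves".
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigenwaves = Vt[:N]
    coords = Xc @ eigenwaves.T   # N-dimensional representation of each series
    return eigenwaves, coords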
50. Piecewise Linear Approximation I
Basic idea: Represent the time series as a sequence of straight lines. Lines could be connected, in which case we are allowed N/2 lines; if lines are disconnected, we are allowed only N/3 lines. Personal experience on dozens of datasets suggests disconnected is better. Also, only disconnected allows a lower-bounding Euclidean approximation.
- Each connected line segment has: length, left_height (the right_height can be inferred by looking at the next segment).
- Each disconnected line segment has: length, left_height, right_height.
[Figure: X and its PLA approximation X'. Portrait: Karl Friedrich Gauss, 1777-1855.]
51. Piecewise Linear Approximation II
How do we obtain the piecewise linear approximation?
- The optimal solution is O(n^2 N), which is too slow for data mining.
- A vast body of work on faster heuristic solutions to the problem can be classified into the following classes (CRatio denotes the compression ratio):
- Top-Down: O(n^2 N)
- Bottom-Up: O(n / CRatio)
- Sliding Window: O(n / CRatio)
- Other (genetic algorithms, randomized algorithms, B-spline wavelets, MDL, etc.)
- A recent extensive empirical evaluation of all approaches suggests that Bottom-Up is the best approach overall; a sketch follows.
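A minimal sketch of the Bottom-Up heuristic (our own simplified version: repeatedly merge the adjacent pair of segments whose combined best-fit line has the smallest residual error):

import numpy as np

def fit_error(x, lo, hi):
    # Residual error of the least-squares line over x[lo:hi].
    if hi - lo < 3:
        return 0.0               # two points are fit exactly
    t = np.arange(lo, hi)
    resid = np.polyfit(t, x[lo:hi], 1, full=True)[1]
    return float(resid[0]) if len(resid) else 0.0

def bottom_up_pla(x, max_segments):
    # Start from many tiny segments, then greedily merge the cheapest
    # adjacent pair until only max_segments remain.
    bounds = list(range(0, len(x), 2)) + [len(x)]
    while len(bounds) - 1 > max_segments:
        costs = [fit_error(x, bounds[i], bounds[i + 2])
                 for i in range(len(bounds) - 2)]
        i = int(np.argmin(costs))
        del bounds[i + 1]        # merge segments i and i+1
    return bounds                # segment boundaries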
52. Piecewise Linear Approximation III
Pros and cons of PLA as a time series representation:
- Good ability to compress natural signals.
- Fast linear-time algorithms for PLA exist.
- Able to support some interesting non-Euclidean similarity measures, including weighted measures, relevance feedback, fuzzy queries...
- Already widely accepted in some communities (e.g., biomedical).
- Not (currently) indexable by any data structure (but does allow fast sequential scanning).
53. Symbolic Approximation
Basic idea: Convert the time series into an alphabet of discrete symbols, and use string indexing techniques to manage the data. Potentially an interesting idea, but all the papers thus far are very ad hoc.
[Figure: X and its symbolic approximation X' = C U U C D C U D. Key: C = Constant, U = Up, D = Down.]
Pros and cons of symbolic approximation as a time series representation:
- Potentially, we could take advantage of a wealth of techniques from the very mature field of string processing.
- There is no known technique to allow the support of Euclidean queries.
- It is not clear how we should discretize the time series (discretize the values, the slope, shapes? how big an alphabet? etc.); one simple slope-based possibility is sketched below.
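As one concrete (and admittedly ad hoc) possibility, here is a sketch that discretizes the slope of equal-length segments into the C/U/D alphabet used above; the segment count and flatness threshold are arbitrary assumptions:

import numpy as np

def symbolize(x, n_segments=8, flat=0.1):
    # Fit a line to each segment; call it Constant, Up or Down by slope.
    # Assumes len(x) >= 2 * n_segments so each piece has >= 2 points.
    pieces = np.array_split(np.asarray(x, dtype=float), n_segments)
    out = []
    for p in pieces:
        slope = np.polyfit(np.arange(len(p)), p, 1)[0]
        out.append('C' if abs(slope) < flat else ('U' if slope > 0 else 'D'))
    return ''.join(out)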
54. Piecewise Aggregate Approximation I
Basic idea: Represent the time series as a sequence of box basis functions. Note that each box is the same length.
Given the reduced-dimensionality representation, we can calculate the approximate Euclidean distance as
DR(Q, C) = sqrt(n / N) * sqrt(sum over i = 1..N of (qbar_i - cbar_i)^2)
Independently introduced by two sets of authors: Keogh, Chakrabarti, Pazzani & Mehrotra, KAIS (2000); Byoung-Kee Yi & Christos Faloutsos, VLDB (2000).
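A minimal sketch of PAA and of the approximate distance above (function names are ours):

import numpy as np

def paa(x, N):
    # Mean of each of N equal-length frames.
    return np.array([frame.mean()
                     for frame in np.array_split(np.asarray(x, dtype=float), N)])

def paa_distance(q_bar, c_bar, n):
    # DR(Q, C) = sqrt(n/N) * sqrt(sum_i (qbar_i - cbar_i)^2);
    # this never overestimates the true Euclidean distance.
    N = len(q_bar)
    return float(np.sqrt(n / N) * np.sqrt(np.sum((q_bar - c_bar) ** 2)))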
55. Piecewise Aggregate Approximation II
Pros and cons of PAA as a time series representation:
- Extremely fast to calculate.
- As efficient as other approaches (empirically).
- Supports queries of arbitrary lengths.
- Can support any Minkowski metric.
- Supports non-Euclidean measures.
- Supports weighted Euclidean distance.
- Simple! Intuitive!
- If visualized directly, looks aesthetically unpleasing.
56. Adaptive Piecewise Constant Approximation I
Basic idea: Generalize PAA to allow the piecewise constant segments to have arbitrary lengths. Note that we now need 2 coefficients to represent each segment: its value and its length. The representation is a sequence of pairs <cv1, cr1>, <cv2, cr2>, <cv3, cr3>, <cv4, cr4>.
The intuition is this: many signals have little detail in some places, and high detail in other places. APCA can adaptively fit itself to the data, achieving a better approximation; a greedy sketch follows.
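For illustration only, a greedy bottom-up sketch that produces such <value, length> pairs by merging adjacent constant segments; the published APCA algorithm instead starts from a truncated Haar transform.

import numpy as np

def apca(x, max_segments):
    # Greedily merge the adjacent pair of constant segments whose merged
    # segment has the smallest squared error around its mean.
    x = np.asarray(x, dtype=float)
    bounds = list(range(0, len(x), 2)) + [len(x)]
    def err(lo, hi):
        seg = x[lo:hi]
        return float(np.sum((seg - seg.mean()) ** 2))
    while len(bounds) - 1 > max_segments:
        costs = [err(bounds[i], bounds[i + 2]) for i in range(len(bounds) - 2)]
        i = int(np.argmin(costs))
        del bounds[i + 1]
    # <cv_i, cr_i>: the mean value and right endpoint of each segment.
    return [(float(x[lo:hi].mean()), hi) for lo, hi in zip(bounds[:-1], bounds[1:])]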
57. Adaptive Piecewise Constant Approximation II
The high quality of APCA had been noted by many researchers. However, it was believed that the representation could not be indexed, because some coefficients represent values and some represent lengths. However, an indexing method was discovered! (SIGMOD 2001 best paper award.) Unfortunately, it is non-trivial to understand and implement.
58. Adaptive Piecewise Constant Approximation
Pros and cons of APCA as a time series representation:
- Fast to calculate: O(n).
- More efficient than other approaches (on some datasets).
- Supports queries of arbitrary lengths.
- Supports non-Euclidean measures.
- Supports weighted Euclidean distance.
- Supports fast exact queries, and even faster approximate queries, on the same data structure.
- Somewhat complex implementation.
- If visualized directly, looks aesthetically unpleasing.
59. Conclusion
This is just an introduction, with many unavoidable omissions:
- There are dozens of papers that offer new distance measures.
- Hidden Markov models do have a sound basis, but don't scale well.
- Time series analysis remains a hot area of research, and the most recent papers have not been discussed here.