Title: General problem
2General problem
Retrieval of time-series similar to a given
pattern.
3Example Stock charts
Database of time-series
4Example Stock charts
Database of time-series
Pattern
5Example Stock charts
Database of time-series
Pattern
Retrieval results
6Example Stock charts
Database of time-series
Pattern
Retrieval results
.92
.87
.86
.84
7Example Electrocardiogram
Database of time-series
8Example Electrocardiogram
Database of time-series
Pattern
9Example Electrocardiogram
Database of time-series
Pattern
Retrieval results
.91
.87
.98
1.0
10Outline
- Previous work
- Important points
- Indexing and retrieval
- Empirical results
- Conclusions
11Outline
- Previous work
- Important points
- Indexing and retrieval
- Empirical results
- Conclusions
Contributions
12Criteria for retrieval methods
Gunopulos 2000
- Work for erratic time-series
- Accept any pattern
- Find inexact matches
- Work when some points are missing
- Work on streaming data
13Outline
- Previous work
- Important points
- Indexing and retrieval
- Empirical results
- Conclusions
14Previous work
- Feature choice
- Similarity metrics
- Indexing and retrieval
15Previous work Feature choice
- Discrete Fourier transforms
- Alphabets
- Statistical features
- Subsets of points
16Previous work Similarity metrics
- Euclidean distance
- Bounding rectangles
- Envelope count
- Aggregate similarity
17Previous work Indexing and retrieval
- Advanced techniques
- B-trees
- R-trees
- KD-trees
- VP-trees
- Grids
- Applied techniques
- Linear search with compression
18Outline
- Previous work
- Important points
- Indexing and retrieval
- Empirical results
- Conclusions
19Important points
Choose important maxima and minima, and discard
the other points.
20Important points
Choose important maxima and minima, and discard
the other points.
Example
Original series
21Important points
Choose important maxima and minima, and discard
the other points.
Example
Original series
22Important points
Choose important maxima and minima, and discard
the other points.
Example
Compressed series
Original series
23Definition of important points
Important minimum
24Definition of important points
Important minimum
- a_m is the minimum among a_i, ..., a_j
25Definition of important points
Important minimum
- a_m is the minimum among a_i, ..., a_j
26Definition of important points
Important minimum
- a_m is the minimum among a_i, ..., a_j
- R is a knob that determines compression rate
27Definition of important points
Important maximum
- a_m is the maximum among a_i, ..., a_j
- R is a knob that determines compression rate
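The definition can be checked directly with a brute-force sketch. The transcript gives only the "minimum among a_i, ..., a_j" condition and says that R controls the compression rate. The ratio conditions used here (a_i/a_m >= R and a_j/a_m >= R for a minimum, a_m/a_i >= R and a_m/a_j >= R for a maximum) are an assumption about how R enters the definition, as are the function names, the requirement that all values are positive, and R > 1.

```python
def is_important_minimum(a, m, R):
    """Check whether a[m] is an important minimum of the positive series a.

    Assumed definition: there are i <= m <= j such that a[m] is the minimum
    of a[i..j], a[i] / a[m] >= R and a[j] / a[m] >= R, with R > 1.
    """
    def found(step):
        # Walk away from m while a[m] remains the minimum of the segment.
        k = m + step
        while 0 <= k < len(a) and a[k] >= a[m]:
            if a[k] >= R * a[m]:
                return True
            k += step
        return False

    return found(-1) and found(1)


def is_important_maximum(a, m, R):
    """Mirror image of is_important_minimum (same assumptions)."""
    def found(step):
        # Walk away from m while a[m] remains the maximum of the segment.
        k = m + step
        while 0 <= k < len(a) and a[k] <= a[m]:
            if a[m] >= R * a[k]:
                return True
            k += step
        return False

    return found(-1) and found(1)
```

A larger R makes both tests harder to pass, so fewer points survive and the compression rate goes up. This sketch ignores the end points of the series, which can never satisfy the condition on both sides.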
28Compression example
Original series
29Compression example
Original series
Compressed series
30Compression example
Original series
Compressed series
31Compression example
Original series
Compressed series
32Compression algorithm
- Linear time
- Constant memory
- Accepts streaming data
- For a series with n values, compression time is
0.0133 · n milliseconds (300 MHz PC,
Visual Basic 6.0).
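The three properties above (linear time, constant memory, streaming input) can be illustrated with a one-pass generator. This is a minimal sketch, not the presenter's implementation: the function name, the ratio test with R > 1, the restriction to positive values, and the choice to always keep the first and last points are all assumptions.

```python
def important_points(values, R):
    """Yield (index, value) for the important extrema of a positive series.

    One pass over any iterable, constant extra memory.
    Assumes R > 1 and strictly positive values.
    """
    emitted = -1          # index of the most recently yielded point
    trend = 0             # 0: undecided, +1: seeking a maximum, -1: seeking a minimum
    lo = hi = last = None
    for point in enumerate(values):
        i, v = point
        last = point
        if lo is None:    # first value: keep it and start tracking candidates
            lo = hi = point
            yield point
            emitted = i
            continue
        if trend >= 0 and v > hi[1]:
            hi = point    # better candidate maximum
        if trend <= 0 and v < lo[1]:
            lo = point    # better candidate minimum
        if trend <= 0 and v >= R * lo[1]:
            # The series rose by a factor of R: the candidate minimum is confirmed.
            if lo[0] != emitted:
                yield lo
                emitted = lo[0]
            trend, hi = 1, point
        elif trend >= 0 and hi[1] >= R * v:
            # The series fell by a factor of R: the candidate maximum is confirmed.
            if hi[0] != emitted:
                yield hi
                emitted = hi[0]
            trend, lo = -1, point
    if last is not None and last[0] != emitted:
        yield last        # keep the final value
```

Because the function is a generator over an arbitrary iterable, it can consume values as they arrive, for example `list(important_points(stream, 1.5))` for a finite stream.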
33Outline
- Previous work
- Important points
- Indexing and retrieval
- Empirical results
- Conclusions
34Retrieval
- Retrieval of time-series similar to a given pattern.
- Intuition
- Find a prominent feature in the pattern
- Find candidate segments with a similar feature
- Compare similarity of candidates to the pattern
35Example Stock charts
Database of time-series
36Example Stock charts
Database of time-series
37Example Stock charts
Database of time-series
Pattern
38Example Stock charts
Database of time-series
Pattern
39Example Stock charts
Database of time-series
Pattern
40Example Stock charts
Database of time-series
Pattern
Retrieval results
.92
.87
.86
.84
41Algorithm
- Identify the prominent leg in the pattern
- Retrieve similar legs from the database
- Identify corresponding candidate segments
- For each candidate segment, compute its similarity to the pattern
- Output the candidates whose similarity is above the threshold
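The five steps above can be sketched end to end on already-compressed series. Everything specific here is an assumption made for illustration: the function names, the tolerance factor `C` used to decide which legs count as similar, the scaled similarity formula, and the default threshold. The slides do not define the similarity metric, so the one below is only a placeholder, and the leg lookup is a plain linear scan.

```python
def legs(points):
    """Return (start_index, end_point_ratio) for each leg of a compressed series."""
    return [(i, points[i + 1] / points[i]) for i in range(len(points) - 1)]


def similarity(xs, ys):
    """Placeholder metric in [0, 1] for positive values; ys is rescaled to start at xs[0]."""
    scale = xs[0] / ys[0]
    terms = [1 - abs(x - y * scale) / (x + y * scale) for x, y in zip(xs, ys)]
    return sum(terms) / len(terms)


def retrieve(pattern, database, C=1.5, threshold=0.9):
    """Return (score, series_name, start_index) for matching segments, best first."""
    # Step 1: the prominent leg of the pattern (greatest end-point ratio).
    p_start, p_ratio = max(legs(pattern), key=lambda leg: leg[1])
    results = []
    for name, series in database.items():
        # Step 2: legs whose ratio is within a factor of C of the prominent leg.
        for s_start, s_ratio in legs(series):
            if not (p_ratio / C <= s_ratio <= p_ratio * C):
                continue
            # Step 3: align the pattern so the two legs coincide.
            begin = s_start - p_start
            end = begin + len(pattern)
            if begin < 0 or end > len(series):
                continue
            # Steps 4 and 5: score the candidate and apply the threshold.
            score = similarity(pattern, series[begin:end])
            if score >= threshold:
                results.append((score, name, begin))
    return sorted(results, reverse=True)
```

For instance, `retrieve([1.0, 2.0, 1.5], {"A": [3.0, 2.0, 4.0, 3.0, 5.0]})` finds the segment of `"A"` starting at index 1, which has the same shape as the pattern at twice the scale.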
42Important details
- Use compressed pattern and compressed sequences in the retrieval process
- The prominent feature is the leg having the greatest ratio of right end to left end
- All legs in the database are indexed by their prominence, using a binary search tree
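An index of legs keyed by prominence can be sketched as follows. The slides call for a binary search tree; this sketch swaps in a sorted list searched with Python's `bisect` module, which supports the same range query but has slower insertion. The class name, the method names, and the tolerance factor `C` are hypothetical.

```python
import bisect


class LegIndex:
    """Legs of compressed series, ordered by end-point ratio (prominence)."""

    def __init__(self):
        self._keys = []   # prominences in ascending order
        self._legs = []   # (series_id, leg_start) entries aligned with _keys

    def add_series(self, series_id, points):
        """Index every leg between consecutive important points."""
        for i in range(len(points) - 1):
            ratio = points[i + 1] / points[i]
            pos = bisect.bisect_left(self._keys, ratio)
            self._keys.insert(pos, ratio)
            self._legs.insert(pos, (series_id, i))

    def similar(self, ratio, C):
        """Return legs whose prominence lies in [ratio / C, ratio * C]."""
        lo = bisect.bisect_left(self._keys, ratio / C)
        hi = bisect.bisect_right(self._keys, ratio * C)
        return self._legs[lo:hi]
```

Widening `C` returns more candidate legs, which is one way to trade retrieval speed for accuracy.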
43Alternative versions
- Different prominence definitions
- Different similarity metrics
- The end-point ratio prominence usually gives the
best empirical results.
44Extended legs
Similar sequence
45Indexing on extended legs
- Advantage: More accurate retrieval
- Disadvantage: Larger index, more memory
- If a compressed sequence has n legs
- Worst case: n^2/2 extended legs
- Average case: Θ(n · lg n) extended legs
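To get a feel for the gap between the two bounds, the counts can be compared numerically. This is rough arithmetic only: the average-case function assumes a constant factor of 1, which the slide does not give, and both function names are made up for this sketch.

```python
import math


def worst_case_extended_legs(n):
    """Upper bound from the slide: n^2 / 2 extended legs for n legs."""
    return n * n / 2


def average_case_extended_legs(n):
    """Order-of-magnitude estimate n * lg n (constant factor of 1 is an assumption)."""
    return n * math.log2(n)
```

For a sequence with 1,024 legs, the worst case is 524,288 extended legs, while the average-case estimate is 10,240.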
46Outline
- Previous work
- Important points
- Indexing and retrieval
- Empirical results
- Conclusions
47Data sets
- Stock charts
- Air and sea temperatures
- Wind speeds
- Electroencephalograms
- Electrocardiograms
48Data sets
- Stock charts: 60,000 points
- Air and sea temperatures: 445,000 points
- Wind speeds: 79,000 points
- Electroencephalograms: 17,000 points
- Electrocardiograms: 2,000 points
49Patterns
Compressed patterns with 4 to 27 legs
Examples
50Retrieval time
Retrieval time: 0.07 · m · k milliseconds
- m: legs in a pattern
- k: candidates
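The formula can be turned into a small estimator. The constant 0.07 is the measured value quoted on this slide and is specific to the presenter's setup; the function name is hypothetical.

```python
def retrieval_time_ms(m, k, ms_per_leg_candidate=0.07):
    """Estimated retrieval time in milliseconds for a pattern with m legs and k candidates."""
    return ms_per_leg_candidate * m * k
```

For example, a pattern with 10 legs checked against 20 candidates comes to roughly 14 milliseconds under this model.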
51Retrieval accuracy Stock charts
20 candidates, C = 3
10 candidates, C = 2
5 candidates, C = 1.5
1 candidate, C = 1.1
52Retrieval accuracy Wind speeds
20 candidates, C = 1.5
10 candidates, C = 1.2
5 candidates, C = 1.1
53Retrieval candidate quality
Found matches among ten best

Data set                               Candidates
                                       5    10   20
Stock charts (5,400 legs)              4    4    7
Air and sea temperatures (5,500 legs)  4    5    6
Wind speeds (10,500 legs)              3    7    9
54Outline
- Previous work
- Important points
- Indexing and retrieval
- Empirical results
- Conclusions
55Criteria for retrieval methods
Gunopulos 2000
- Work for erratic time-series
- Accept any pattern
- Find inexact matches
- Work when some points are missing
- Work on streaming data
56Criteria for retrieval methods
Gunopulos 2000
- Work for erratic time-series
- Accept any pattern
- Find inexact matches
- Work when some points are missing
- Work on streaming data
57Criteria for retrieval methods
Gunopulos 2000
- Work for erratic time-series
- Accept any pattern
- Find inexact matches
- Work when some points are missing
- Work on streaming data
58Criteria for retrieval methods
Gunopulos 2000
- Work for erratic time-series
- Accept any pattern
- Find inexact matches
- Work when some points are missing
- Work on streaming data
59Criteria for retrieval methods
Gunopulos 2000
- Work for erratic time-series
- Accept any pattern
- Find inexact matches
- Work when some points are missing
- Work on streaming data
60Criteria for retrieval methods
Gunopulos 2000
- Work for erratic time-series
- Accept any pattern
- Find inexact matches
- Work when some points are missing
- Work on streaming data
61Main results
- Compression
- Fast compression procedure
- Preserves similarity
- Retrieval
- Works with compressed data
- Controlled trade-off between speed and accuracy