General problem - PowerPoint PPT Presentation

Slides: 62
Provided by: kevinb51
Learn more at: http://www.cs.cmu.edu
Transcript and Presenter's Notes

Title: General problem


2
General problem
Retrieval of time-series similar to a given
pattern.
6
Example: Stock charts
Database of time-series
Pattern
Retrieval results: .92, .87, .86, .84
9
Example: Electrocardiogram
Database of time-series
Pattern
Retrieval results: .91, .87, .98, 1.0
11
Outline
  • Previous work
  • Important points
  • Indexing and retrieval
  • Empirical results
  • Conclusions


Contributions
12
Criteria for retrieval methods
Gunopulos 2000
  • Work for erratic time-series
  • Accept any pattern
  • Find inexact matches
  • Work when some points are missing
  • Work on streaming data

14
Previous work
  • Feature choice
  • Similarity metrics
  • Indexing and retrieval

15
Previous work: Feature choice
  • Discrete Fourier transforms
  • Alphabets
  • Statistical features
  • Subsets of points

16
Previous work: Similarity metrics
  • Euclidean distance
  • Bounding rectangles
  • Envelope count
  • Aggregate similarity

17
Previous work: Indexing and retrieval
  • Advanced techniques
      • B-trees
      • R-trees
      • KD-trees
      • VP-trees
      • Grids
  • Applied techniques
      • Linear search with compression

22
Important points
Choose important maxima and minima, and discard
the other points.
Example
Compressed series
Original series
26
Definition of important points
Important minimum
  • a_m is the minimum among a_i, ..., a_j, for some i ≤ m ≤ j
  • a_i/a_m ≥ R and a_j/a_m ≥ R
  • R is a knob that determines compression rate

27
Definition of important points
Important maximum
  • a_m is the maximum among a_i, ..., a_j, for some i ≤ m ≤ j
  • a_m/a_i ≥ R and a_m/a_j ≥ R
  • R is a knob that determines compression rate
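
The two definitions can be checked directly on a small series. The following is a minimal Python sketch, not the presenter's code: the function names are made up for illustration, and it assumes a series of positive values and R > 1. For a point a[m], it scans outward on each side and succeeds as soon as it meets a value that differs from a[m] by a factor of at least R, provided a[m] is still the extremum of the span scanned so far.

```python
from typing import Iterable, Sequence


def _side_ok(a: Sequence[float], m: int, indices: Iterable[int],
             R: float, minimum: bool) -> bool:
    """Scan away from m along `indices` (hypothetical helper).

    Returns True once a point differs from a[m] by a factor of at least R
    while a[m] is still the extremum of the scanned span.
    """
    for i in indices:
        if minimum:
            if a[i] < a[m]:
                return False      # a[m] is no longer the minimum of the span
            if a[i] / a[m] >= R:
                return True
        else:
            if a[i] > a[m]:
                return False      # a[m] is no longer the maximum of the span
            if a[m] / a[i] >= R:
                return True
    return False


def is_important_minimum(a: Sequence[float], m: int, R: float) -> bool:
    """Direct test of the important-minimum definition (positive values assumed)."""
    return (_side_ok(a, m, range(m - 1, -1, -1), R, True)
            and _side_ok(a, m, range(m + 1, len(a)), R, True))


def is_important_maximum(a: Sequence[float], m: int, R: float) -> bool:
    """Direct test of the important-maximum definition (positive values assumed)."""
    return (_side_ok(a, m, range(m - 1, -1, -1), R, False)
            and _side_ok(a, m, range(m + 1, len(a)), R, False))


series = [10, 12, 8, 9, 15, 14, 7, 11]
print(is_important_minimum(series, 2, 1.25))  # True: 12/8 and 15/8 both reach R
print(is_important_maximum(series, 1, 1.25))  # False: 12/10 is below R
```

With R = 1.25, the value 8 at index 2 is an important minimum, while the local maximum 12 at index 1 is discarded because it exceeds its left neighbour by a factor of only 1.2. A larger R discards more points, which is how R controls the compression rate.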

31
Compression example
Original series
Compressed series
32
Compression algorithm
  • Linear time
  • Constant memory
  • Accepts streaming data
  • For a series with n values, compression time is
    0.0133 · n milliseconds (300 MHz PC,
    Visual Basic 6.0).
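
One way to obtain these properties is a single pass that alternates between searching for a minimum and searching for a maximum. The Python generator below is a sketch under stated assumptions, not the original Visual Basic procedure: it assumes positive values and R > 1, the name `important_points` is hypothetical, and always keeping the first and last points is a boundary rule chosen for this sketch. It stores only a few points at a time, so memory use does not grow with the length of the series, and it reads its input as a stream.

```python
from typing import Iterable, Iterator, Tuple

Point = Tuple[int, float]  # (index in the original series, value)


def important_points(stream: Iterable[float], R: float) -> Iterator[Point]:
    """Yield important extrema of a positive series in one pass (sketch)."""
    if R <= 1:
        raise ValueError("R must be greater than 1")
    points = enumerate(stream)
    first = next(points, None)
    if first is None:
        return
    yield first                      # assumption: the first point is always kept
    lo = hi = cand = last = first
    mode = "start"                   # direction unknown until a change of factor R
    for point in points:
        last = point
        value = point[1]
        if mode == "start":
            if value < lo[1]:
                lo = point
            if value > hi[1]:
                hi = point
            if value / lo[1] >= R:       # rose by factor R: look for a maximum next
                mode, cand = "max", point
            elif hi[1] / value >= R:     # fell by factor R: look for a minimum next
                mode, cand = "min", point
        elif mode == "max":
            if value > cand[1]:
                cand = point
            elif cand[1] / value >= R:   # drop confirms cand as an important maximum
                yield cand
                mode, cand = "min", point
        else:
            if value < cand[1]:
                cand = point
            elif value / cand[1] >= R:   # rise confirms cand as an important minimum
                yield cand
                mode, cand = "max", point
    if last[0] != first[0]:
        yield last                   # assumption: the last point is always kept


series = [10, 12, 8, 9, 15, 14, 7, 11]
print(list(important_points(series, 1.25)))
# [(0, 10), (2, 8), (4, 15), (6, 7), (7, 11)]
```

Each extremum is emitted only after a later value confirms it, so the output lags the input by a bounded amount of state, not by a buffer of the series.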

34
Retrieval
  • Retrieval of time-series similar to a given
    pattern.
  • Intuition
      • Find a prominent feature in the pattern
      • Find candidate segments with a similar feature
      • Compare similarity of candidates to the pattern

40
Example: Stock charts
Database of time-series
Pattern
Retrieval results: .92, .87, .86, .84
41
Algorithm
  • Identify the prominent leg in the pattern
  • Retrieve similar legs from the database
  • Identify corresponding candidate segments
  • For each candidate segment, compute its
    similarity to the pattern
  • Output the candidates whose similarity is above
    the threshold
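
The five steps can be sketched in Python as follows. This is an illustration under assumptions, not the presented system: the slides do not define the similarity measure or how a candidate segment is aligned with the pattern, so `similarity`, `tolerance`, `threshold`, and the alignment rule (line up the candidate leg with the pattern's prominent leg) are all hypothetical. Series are assumed to be already compressed and positive, and a plain scan stands in for the index lookup.

```python
from typing import List, Sequence, Tuple


def leg_ratios(series: Sequence[float]) -> List[float]:
    """Ratio of right end to left end for each leg of a compressed series."""
    return [right / left for left, right in zip(series, series[1:])]


def similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Hypothetical measure: mean closeness of corresponding leg ratios (1.0 = same)."""
    return sum(min(a, b) / max(a, b) for a, b in zip(p, q)) / len(p)


def retrieve(pattern: Sequence[float], database: Sequence[Sequence[float]],
             tolerance: float = 1.2, threshold: float = 0.9
             ) -> List[Tuple[float, int, int]]:
    """Return (similarity, series id, start leg) for segments above the threshold."""
    p = leg_ratios(pattern)
    k = max(range(len(p)), key=p.__getitem__)        # step 1: prominent leg
    results = []
    for sid, series in enumerate(database):
        s = leg_ratios(series)
        for j, ratio in enumerate(s):
            # step 2: keep legs whose ratio is within the tolerance factor
            if not (p[k] / tolerance <= ratio <= p[k] * tolerance):
                continue
            start = j - k                            # step 3: align the segment
            if start < 0 or start + len(p) > len(s):
                continue
            score = similarity(p, s[start:start + len(p)])   # step 4
            if score >= threshold:                           # step 5
                results.append((score, sid, start))
    return sorted(results, reverse=True)


database = [[10, 8, 15, 7, 11], [5, 10, 9, 18]]
for score, sid, start in retrieve([4, 7.6, 6], database):
    print(f"series {sid}, leg {start}: {score:.2f}")
# series 1, leg 0: 0.91
```

In this toy run three database legs resemble the pattern's prominent leg, but one cannot be extended to a full segment and another scores about 0.79, so only one candidate passes the 0.9 threshold.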

42
Important details
  • Use compressed pattern and compressed sequences
    in the retrieval process
  • The prominent feature is the leg having the
    greatest ratio of right end to left end
  • All legs in the database are indexed by their
    prominence, using a binary search tree
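
A leg index of this kind might look like the Python sketch below. It is an assumption-laden illustration: a sorted list queried with the standard `bisect` module stands in for the binary search tree, prominence is taken to be the ratio of a leg's right end to its left end as stated above, and the names `legs_of`, `build_index`, `similar_legs`, and the `tolerance` factor are hypothetical.

```python
import bisect
from typing import List, Sequence, Tuple

Leg = Tuple[float, int, int]  # (prominence, series id, index of the leg's left end)


def legs_of(compressed: Sequence[float], series_id: int) -> List[Leg]:
    """All legs of one compressed series, keyed by right-end / left-end ratio."""
    pairs = zip(compressed, compressed[1:])
    return [(right / left, series_id, k) for k, (left, right) in enumerate(pairs)]


def build_index(database: Sequence[Sequence[float]]) -> List[Leg]:
    """Sorted list of every leg in the database (stand-in for a search tree)."""
    index: List[Leg] = []
    for sid, series in enumerate(database):
        index.extend(legs_of(series, sid))
    index.sort()
    return index


def similar_legs(index: List[Leg], prominence: float, tolerance: float) -> List[Leg]:
    """Legs whose prominence lies within a factor `tolerance` of the given one."""
    low = bisect.bisect_left(index, (prominence / tolerance,))
    high = bisect.bisect_right(index, (prominence * tolerance, float("inf")))
    return index[low:high]


database = [[10, 8, 15, 7, 11], [5, 10, 9, 18]]
index = build_index(database)
prominent = max(legs_of([4, 7.6, 6], -1))        # pattern leg with the greatest ratio
matches = similar_legs(index, prominent[0], 1.1)
print([(sid, k) for _, sid, k in matches])       # [(0, 1), (1, 0), (1, 2)]
```

Because the legs are kept in sorted order, a range query costs two binary searches plus the size of the answer, which is the same behaviour a balanced binary search tree would give.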

43
Alternative versions
  • Different prominence definitions
  • Different similarity metrics
  • The end-point ratio prominence usually gives the
    best empirical results.

44
Extended legs
Similar sequence
45
Indexing on extended legs
  • Advantage: More accurate retrieval
  • Disadvantage: Larger index, more memory
  • If a compressed sequence has n legs:
      • Worst case: n²/2 extended legs
      • Average case: Θ(n · lg n) extended legs

48
Data sets
  • Stock charts: 60,000 points
  • Air and sea temperatures: 445,000 points
  • Wind speeds: 79,000 points
  • Electroencephalograms: 17,000 points
  • Electrocardiograms: 2,000 points
49
Patterns
Compressed patterns with 4 to 27 legs
Examples
50
Retrieval time
Retrieval time: 0.07 · m · k milliseconds,
where m is the number of legs in a pattern and k is the number of candidates
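
To get a feel for these figures, the reported timing formulas can be evaluated for concrete sizes. The short Python sketch below only restates the constants given in this presentation (0.0133 ms per value for compression, 0.07 ms per pattern leg and candidate for retrieval); the function names are made up, and the numbers say nothing about other hardware.

```python
def compression_ms(n: int) -> float:
    """Reported compression time for a series of n values, in milliseconds."""
    return 0.0133 * n


def retrieval_ms(m: int, k: int) -> float:
    """Reported retrieval time for a pattern of m legs and k candidates, in ms."""
    return 0.07 * m * k


# A 60,000-point series, and the largest pattern size (27 legs) with 20 candidates.
print(f"{compression_ms(60_000):.0f} ms")   # 798 ms
print(f"{retrieval_ms(27, 20):.1f} ms")     # 37.8 ms
```

Retrieval cost grows with the number of candidates k, so k is the parameter that trades speed against accuracy.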
51
Retrieval accuracy: Stock charts
20 candidates C 3
10 C 2
5 C 1.5
1 C 1.1
52
Retrieval accuracy: Wind speeds
20 candidates C 1.5
10 C 1.2
5 C 1.1
53
Retrieval candidate quality
Found matches among ten best, by number of candidates

Data set                                5    10    20
Stock charts (5,400 legs)               4     4     7
Air and sea temperatures (5,500 legs)   4     5     6
Wind speeds (10,500 legs)               3     7     9
60
Criteria for retrieval methods
Gunopulos 2000
  • Work for erratic time-series
  • Accept any pattern
  • Find inexact matches
  • Work when some points are missing
  • Work on streaming data

[Three marks appear beside the criteria on this slide; which criteria they mark is not recoverable.]

61
Main results
  • Compression
      • Fast compression procedure
      • Preserves similarity
  • Retrieval
      • Works with compressed data
      • Controlled trade-off between speed and accuracy