Identifying Patterns in Time Series Data - PowerPoint PPT Presentation

1 / 22
About This Presentation
Title:

Identifying Patterns in Time Series Data

Description:

Atomic Wedgie. Preparation: All patterns are clustered by similarity ... Atomic Wedgie (cont') Usage: ... Atomic Wedgie Optimization. Optimization: Summation ... – PowerPoint PPT presentation

Number of Views:51
Avg rating:3.0/5.0
Slides: 23
Provided by: cacsLou
Category:

less

Transcript and Presenter's Notes

Title: Identifying Patterns in Time Series Data


1
Identifying Patterns in Time Series Data
Daniel Lewis 04/06/06
2
Time Series Data
  • Definition
  • An ordered set of m real-valued variables
  • How can patterns that occur in time series data
    be located?

3
Comparing Time Series
  • Euclidean Distance

4
Adaptive Piecewise Constant Approximation
  • Segments of time series data are represented by 2
    values, the mean value of all points in the
    segment and the right endpoint of the segment
  • Allows large queries to be quickly compared, but
    APCA representations must be created first.

5
Adaptive Piecewise Constant Approximation (cont')
6
APCA (cont')
  • Data Compression allows for faster search
  • Can be used for indexing large series
  • Can handle large queries
  • But what if we want to identify patterns in
    streaming time series data?

7
Pattern Recognition
  • For a given query q and a time series s, if the
    Euclidean distance between q and s lt r, the
    series match
  • r user defined, application specific threshold

8
Detecting Patterns in Streaming Data
  • Brute Force Method
  • P1 ,.., Pn Set of patterns of length k
  • S Input Stream
  • For every possible substring of length k in s,
    calculate the distance between the substring and
    all n patterns
  • This method is obviously extremely costly in the
    case of a large pattern set and a large input
    stream
  • O(n(S - k))

9
Speeding Up Pattern Identification
  • Early Abandoning
  • If at any point error gt r 2, we can stop
    computation

10
Wedge Creation
  • Combine multiple patterns into a wedge
  • Define Upper Limit
  • Ui max( C1i , .. , Cki)
  • Define Lower Limit
  • Li min( C1i , .. , Cki)
  • This produces a wedge such that

11
Wedge Creation (cont')

12
Wedge Comparison
  • Distance Between a Query and a Wedge
  • If distance gt r, then the distance between the
    query and all component patterns gt r, allowing
    you to eliminate multiple possible matches with a
    single comparison

13
Hierarchical Wedges
  • The usefulness of any wedge is determined by the
    similarity of the patterns used in its
    construction.
  • More similar patterns create smaller, more useful
    wedges
  • Patterns can be combined in a tree-like pattern
    to produce a hierarchy of wedges

14
Hierarchical Wedges (cont')
15
Atomic Wedgie
  • Preparation
  • All patterns are clustered by similarity
  • The most similar patterns are combined into
    wedges
  • The resulting wedges are combined to form larger,
    less specific wedges

16
Atomic Wedgie (cont')
  • Usage
  • When streaming data arrives, each substring of
    length k is first compared to the largest wedge,
    if dist gt r, comparison stops, else, the distance
    is compared against the two component wedges,
    eliminating any branches where the distance
    exceeds r.
  • Eventually, all branches are eliminated or a
    single (atomic) pattern is matched

17
Atomic Wedgie Optimization
  • Optimization
  • Summation is order independent
  • Large sections are less likely to increase error
    than small sections
  • Thus, if error is summed starting with the
    smallest sections first, the requirements for
    early abandon are more likely to be met earlier

18
Atomic Wedgie (cont')
  • Advantages
  • If wedges are well formed, large speed increases
    can occur
  • A large number of similar possible patterns can
    be analyzed quickly
  • Disadvantages
  • If wedges are poorly formed, the time required
    will exceed the Brute Force Method
  • Dissimilar patterns are not handled well

19
Special Considerations
  • The choice of r (similarity threshold) is of
    great importance
  • if r is too large, a substring can match too many
    patterns to be useful
  • if r is too small, too little matching may occur
  • Good Choice
  • r average distance from any pattern to its
    nearest neighbor

20
Atomic Wedgie Results
21
References
  • Chakrabarti, K., Keogh, E., Mehrotra, S., and
    Pazzani, M., Locally Adaptive Dimensionality
    Reduction for Indexing Large Time Series
    Databases, ACM Transactions on Database Systems,
    Vol 27, 2002.
  • Wei, L., Keogh, E., Van Herle, H., Mafra-Neto,
    A., Atomic Wedgie Efficient Query Filtering
    for Streaming Times Series, Data Mining, Fifth
    IEEE International Conference on 27-30 Nov.
    2005 Page(s)490 - 497

22
Questions?
Write a Comment
User Comments (0)
About PowerShow.com