Mining Asynchronous Periodic Patterns in Time Series Data

1
Mining Asynchronous Periodic Patterns in Time Series Data
Appearing in IEEE Transactions on Knowledge and Data Engineering 2001
  • Jiong Yang, Wei Wang, Philip S. Yu

Presented by Chih-chieh Hung 2004.12.1
2
Outline
  • Introduction
  • General Model
  • Algorithm
  • Experimental Results
  • Conclusion

3
Introduction
  • Synchronous vs. asynchronous

4
Introduction
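  • Two parameters are used to qualify valid patterns
    and the symbol subsequences containing them.
  • min_rep: the minimum number of contiguous
    repetitions; it ensures the periodicity is real.
  • max_dis: the maximum disturbance allowed between
    valid segments; it bounds changes in system behavior.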

min_rep = 3, max_dis = 3
[Figure: a symbol sequence with three valid segments marked]
5
Main Difficulties
  • Composing the longest valid subsequence
  • Discovering all periodic patterns
  • The period is not always available a priori.

7
General Model (1 of 2)
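  • Set of features: L
  • Don't-care character: *
  • Pattern: a sequence over L ∪ {*}
  • Period: l, the length of the pattern
  • i-pattern: a pattern with exactly i symbols from L
  • Example: L = {a, b, c, d, e}, s = (a, *, c, d, e)
  • s is a 4-pattern of period 5.
  • Generalization and specialization: replacing symbols
    of a pattern with * gives a generalization of it;
    the original pattern is a specialization of the
    result.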
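
As a rough illustration of the definitions on this slide, a pattern can be held as a tuple with "*" as the don't-care character. The helper names `order` and `is_generalization` are illustrative assumptions, not names from the paper, and `is_generalization` here also accepts a pattern as a generalization of itself.

```python
WILDCARD = "*"  # the don't-care character


def order(pattern):
    """Number of non-* symbols, i.e. the i of an i-pattern."""
    return sum(1 for symbol in pattern if symbol != WILDCARD)


def is_generalization(general, specific):
    """True if `general` equals `specific` except that some symbols are *."""
    if len(general) != len(specific):
        return False
    return all(g == WILDCARD or g == s for g, s in zip(general, specific))


s = ("a", "*", "c", "d", "e")
print(order(s), len(s))  # order and period of s
print(is_generalization(("a", "*", "*", "d", "e"), s))
```

Here `s` is the 4-pattern of period 5 from the slide, and `("a", "*", "*", "d", "e")` is one of its generalizations.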
8
General Model (2 of 2)
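  • Match: a run of l consecutive symbols that agrees
    with the pattern at every position other than *
  • Valid segment: at least min_rep contiguous matches
    of the pattern
  • Example setting: min_rep = 2
  • Valid subsequence: a series of valid segments of
    the same pattern in which the disturbance between
    successive segments is at most max_dis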

( , , , ) is a match.
[Figure: three valid segments, valid_seg1, valid_seg2, and valid_seg3; each gap between successive segments is ≤ max_dis]
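
The notions of match and valid segment can be sketched in a few lines. This is a minimal illustration, assuming a valid segment is a maximal run of at least min_rep back-to-back matches; the function names `is_match` and `valid_segments` are hypothetical.

```python
WILDCARD = "*"


def is_match(pattern, window):
    """True if `window` agrees with `pattern` at every non-* position."""
    return len(pattern) == len(window) and all(
        p == WILDCARD or p == w for p, w in zip(pattern, window)
    )


def valid_segments(seq, pattern, min_rep):
    """Return (start, repetitions) for each maximal run of contiguous
    matches that has at least `min_rep` repetitions."""
    period = len(pattern)
    covered = set()  # match positions already counted in an earlier run
    segments = []
    for start in range(len(seq) - period + 1):
        if start in covered:
            continue
        reps, pos = 0, start
        while pos + period <= len(seq) and is_match(pattern, seq[pos:pos + period]):
            covered.add(pos)
            reps += 1
            pos += period
        if reps >= min_rep:
            segments.append((start, reps))
    return segments


print(valid_segments("abcadcaecxxafc", ("a", "*", "c"), min_rep=2))
```

With min_rep = 2 the three back-to-back matches at the start form a valid segment, while the isolated match near the end does not.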
10
Algorithm
  • Algorithm Overview
  • Distance-based Pruning
  • Single Pattern Verification
  • Complex Pattern Verification

11
Distance-Based Pruning
  • If DC_{d,l} < min_rep - 1, then d cannot participate
    in any valid pattern of period l.

min_rep = 3, max_dis = 3
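
The pruning rule can be sketched as follows. This is only an illustration, assuming the count for a pair (d, l) is the number of pairs of occurrences of d that lie exactly l positions apart, and that periods are capped by a hypothetical `max_period` argument; neither function name comes from the paper.

```python
from collections import defaultdict


def distance_counts(seq, max_period):
    """Count, for each (symbol, distance) pair, how many pairs of
    occurrences of the symbol are exactly `distance` positions apart."""
    positions = defaultdict(list)
    for index, symbol in enumerate(seq):
        positions[symbol].append(index)

    counts = defaultdict(int)
    for symbol, where in positions.items():
        for a in range(len(where)):
            for b in range(a + 1, len(where)):
                distance = where[b] - where[a]
                if distance > max_period:
                    break  # positions are sorted, later ones are farther
                counts[(symbol, distance)] += 1
    return counts


def candidate_pairs(seq, max_period, min_rep):
    """Keep only (symbol, period) pairs that survive the pruning rule."""
    counts = distance_counts(seq, max_period)
    return {pair for pair, count in counts.items() if count >= min_rep - 1}


print(candidate_pairs("abacadbe", max_period=3, min_rep=3))
```

The rationale is that min_rep repetitions of d at period l need min_rep occurrences spaced l apart, which gives at least min_rep - 1 pairs at distance l.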
12
Single Pattern Verification
  • If a pair of symbol d and period l has passed the
    distance-based pruning, then Algorithm SB is
    employed to discover the subsequence with the
    most repetitions of (d, *, ..., *) with period l.
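
For checking small examples by hand, the goal of this step can be written as a brute-force dynamic program. This is not Algorithm SB, only a reference sketch under stated assumptions: a repetition must fit entirely inside the sequence, segments may not overlap, and the disturbance is the number of positions between the end of one valid segment and the start of the next. The function name is hypothetical.

```python
def longest_valid_subsequence(seq, d, period, min_rep, max_dis):
    """Most repetitions of (d, *, ..., *) in any valid subsequence."""
    n = len(seq)
    # best_end[e]: most repetitions of a valid subsequence whose last
    # segment ends just before position e (0 if there is none).
    best_end = [0] * (n + 1)
    # run[p]: contiguous repetitions ending with the one starting at p.
    run = [0] * n
    answer = 0
    for p in range(n - period + 1):
        if seq[p] != d:
            continue
        run[p] = (run[p - period] if p >= period else 0) + 1
        end = p + period
        # Try every valid length for the last segment ending at `end`.
        for reps in range(min_rep, run[p] + 1):
            start = end - reps * period
            lowest = max(0, start - max_dis)
            previous = max(best_end[lowest:start + 1])
            best_end[end] = max(best_end[end], reps + previous)
        answer = max(answer, best_end[end])
    return answer


print(longest_valid_subsequence("dxdxdxyydxdx", "d", 2, min_rep=2, max_dis=3))
```

In the example, a segment of three repetitions is followed after two stray symbols by a segment of two repetitions, so the two segments chain only when max_dis is at least 2.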

13
Extendibility and Dominance
  • At position i, X is longer than Y.
  • But Y should be kept, because at position j,
    Y-V is longer than X.
14
Algorithm SB
  • Phase A (segment validation): at least 1 instance
    of (d, *, ..., *) found; number of repetitions
    < min_rep
  • Phase B (valid segment growth): the segment becomes
    valid (number of repetitions ≥ min_rep)
  • Phase C (extension): the valid segment may end or
    extend

15
Ambiguity in Phase Transition
Phase B to Phase C
Phase C to Phase A
16
Pruning Issue: Dominance
  • After recognizing D7, X overtakes Y
  • K is a good point to check the dominance.

17
Extendible Principles
  • Mark the subsequences that end prior to position
    i-1 as in Phase C.
  • If the number of repetitions of a subsequence in
    Phase A reaches min_rep, mark it as in Phase B.
  • The subsequence with the most repetitions among
    those in Phases B and C is identified and used to
    update the longest valid subsequence for
    (d, *, ..., *).
  • The dominating subsequence in Phase C is extended.

18
Three Data Structures
  • longest_seq: the longest valid subsequence that is
    known (at position i) not to be extendible
  • ongoing_seq: a set of subsequences that are
    currently being extended (whether valid or not)
  • valid_seq: a set of subsequences that may be
    extendible

19
Example (1 of 3)
  • 8th occurrence of d1 at position 14

[Figure: state before and after]
20
Example (2 of 3)
  • 9th occurrence of d1 at position 16

[Figure: state before and after]
21
Example (3 of 3)
  • 10th occurrence of d1 at position 17

[Figure: state before and after]
22
Complex Pattern Verification
  • Symbol Property: if a pattern P is valid, then all
    of its generalizations are also valid.
  • Segment Property: if D is a valid segment for
    pattern P, then D is also a valid segment of all
    generalizations of P.
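
The Symbol Property gives a pruning test for candidate complex patterns: if any generalization of a candidate is not valid, the candidate cannot be valid either. The sketch below checks only the one-step generalizations (one symbol replaced by *). The function names and the set `known_valid` are illustrative assumptions, and patterns that start with * are allowed here for simplicity.

```python
WILDCARD = "*"


def one_step_generalizations(pattern):
    """Patterns obtained by replacing exactly one non-* symbol with *,
    skipping the result that would consist of * only."""
    result = []
    for i, symbol in enumerate(pattern):
        if symbol == WILDCARD:
            continue
        general = pattern[:i] + (WILDCARD,) + pattern[i + 1:]
        if any(s != WILDCARD for s in general):
            result.append(general)
    return result


def may_be_valid(candidate, known_valid):
    """False as soon as one generalization is known not to be valid."""
    return all(g in known_valid for g in one_step_generalizations(candidate))


known_valid = {("a", "b", "*"), ("a", "*", "c")}
print(may_be_valid(("a", "b", "c"), known_valid))
```

In the example, `("*", "b", "c")` is missing from `known_valid`, so the candidate `("a", "b", "c")` can be discarded without scanning the sequence.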

24
Experimental Results
  • The model is applied to a real trace of the web
    access log of http://www.scour.net.
  • All hits over 100 days are collected.
  • The total number of accesses is over 170 million.
  • Time division and labeling:
  • 0-4999: A
  • 5000-9999: B
  • ...
  • The summarized sequence consists of 14400
    occurrences of 71 symbols.

[Figure: time axis with ticks 000, 010, 020, 030, 040, 050]
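
The labeling step can be sketched as a simple binning function. The slide lists only the first two ranges, so the bin width of 5000 for all later ranges and the label format beyond the 26th bin are assumptions made for illustration; `label_hits` is a hypothetical name.

```python
def label_hits(hits, bin_width=5000):
    """Map a hit count to a symbol: 0-4999 -> A, 5000-9999 -> B, ..."""
    index = hits // bin_width
    if index < 26:
        return chr(ord("A") + index)
    # Assumption: the source does not say how later bins are named.
    return f"S{index}"


counts = [1200, 4999, 5000, 9999]
print([label_hits(c) for c in counts])
```

Applying such a function to the hit count of every time interval turns the access log into the symbol sequence that the mining algorithm consumes.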
25
Experimental Results
  • Parameters: min_rep = 4, max_dis = 200.
  • 212 patterns are discovered.
  • Examples:
  • (b, b, b) on weekdays between 3am and 8:30am.
  • (c, c, c) on weekdays between 11am and 5pm.

27
Conclusion
  • The paper proposes a flexible model for mining
    asynchronous periodic patterns.
  • Two parameters are used to ensure pattern
    periodicity (min_rep) and to bound disturbances in
    system behavior (max_dis).
  • The proposed algorithm first generates potential
    periods, then validates candidate patterns and
    locates the longest valid subsequence. Finally,
    complex patterns are composed.