Discovering partial periodic pattern on discrete spatiotemporal data

1
Discovering partial periodic pattern on discrete
spatio-temporal data
  • Huiping Cao
  • Sep. 26, 2003

2
Outline
  • Background
  • Problem definition
  • Solution
  • Experiments
  • Future work
  • References

3
Background
  • More and more spatio-temporal data is generated as
    mobile computing devices develop
  • Most existing methods support queries on this
    kind of data efficiently by using indexes
  • We try to find periodic patterns in
    the data to facilitate such queries (the same
    motivation).

4
Related Work
  • Partial periodic patterns discovered from
    spatio-temporal data are location
    series that appear periodically and frequently.
  • Existing work on periodic pattern mining
  • either assumes that the periods are given in
    advance by the user,
  • or cannot efficiently find the periods
    automatically

5
Pre-handling of data
  • The continuous spatio-temporal data sequence is
    converted to a discrete symbol sequence
  • The discrete symbols are defined in advance, e.g., the
    names of districts in the real world.
  • (x, y) sequence: (20,20), (21,20), (21,21)
  • Discrete symbol sequence: A A B
  • where A and B are predefined by the user
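The conversion above can be sketched as a lookup from coordinates to user-defined regions. This is only an illustration: the slides do not say how the regions A and B are shaped, so the rectangles in `REGIONS`, the `discretize` function, and the `"?"` fallback symbol are all assumptions chosen to reproduce the A A B example.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical regions: symbol -> (x_min, y_min, x_max, y_max).
# The bounds are made up so that the slide's three points map to A, A, B.
Region = Tuple[float, float, float, float]
REGIONS: Dict[str, Region] = {
    "A": (0.0, 0.0, 40.0, 20.5),
    "B": (0.0, 20.5, 40.0, 40.0),
}


def discretize(points: Iterable[Tuple[float, float]],
               regions: Dict[str, Region] = REGIONS,
               unknown: str = "?") -> List[str]:
    """Map each (x, y) sample to the first region that contains it."""
    symbols = []
    for x, y in points:
        for name, (x0, y0, x1, y1) in regions.items():
            # Regions are half-open: the minimum edge is inside, the maximum is not.
            if x0 <= x < x1 and y0 <= y < y1:
                symbols.append(name)
                break
        else:
            # No region matched; emit the placeholder symbol.
            symbols.append(unknown)
    return symbols


print(discretize([(20, 20), (21, 20), (21, 21)]))  # ['A', 'A', 'B']
```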

6
Problem definition
  • Given a discrete value sequence $S = D_1, D_2, \ldots, D_n$,
    where the sampling rate is fixed.
  • Partial pattern $s = s_1 \ldots s_p$. Here, each $s_i$ is
    defined over $(2^L \setminus \{\emptyset\}) \cup \{*\}$, where $L$ is the
    underlying set of features and $*$ refers to the
    don't-care character.

7
Problem definition
  • $|s|$: pattern length
  • L-length of $s$: the number of $s_i$ that contain
    letters from $L$.
  • Sub-pattern of a pattern $s$: a pattern $s' = s'_1 \ldots s'_p$
    such that $|s'| = |s|$ and $s'_i \subseteq s_i$ for
    every position $i$ where $s'_i \neq *$.
  • E.g., $s$ = a{a,c}*de
  • $|s| = 5$
  • L-length is 4 (also called a 4-pattern)
  • a{a,c}*** and *c*de are both sub-patterns of $s$
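These definitions can be checked mechanically. The sketch below assumes the example pattern on this slide reads a{a,c}*de, with `*` as the don't-care character and braces around letter sets. The list-of-sets representation and the helper names `parse`, `l_length`, and `is_subpattern` are illustrative choices, not part of the presentation.

```python
from typing import FrozenSet, List, Optional

# One entry per position: a set of letters, or None for the don't-care '*'.
Pattern = List[Optional[FrozenSet[str]]]


def parse(text: str) -> Pattern:
    """Parse a string such as 'a{a,c}*de' into a Pattern."""
    out: Pattern = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "*":
            out.append(None)
        elif ch == "{":
            j = text.index("}", i)          # closing brace of this letter set
            out.append(frozenset(text[i + 1:j].split(",")))
            i = j
        else:
            out.append(frozenset(ch))
        i += 1
    return out


def l_length(s: Pattern) -> int:
    """Number of positions that hold letters rather than '*'."""
    return sum(1 for item in s if item is not None)


def is_subpattern(sub: Pattern, s: Pattern) -> bool:
    """Same length, and every non-'*' position of sub is a subset of s there."""
    if len(sub) != len(s):
        return False
    return all(a is None or (b is not None and a <= b) for a, b in zip(sub, s))


s = parse("a{a,c}*de")
print(len(s), l_length(s))                    # 5 4
print(is_subpattern(parse("a{a,c}***"), s))   # True
print(is_subpattern(parse("*c*de"), s))       # True
```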

8
Problem definition
  • A pattern $s = s_1 \ldots s_p$ is true in a period
    segment if,
  • for each position $i$, either $s_i$ is $*$ or all the
    letters in $s_i$ occur in the $i$th set of
    features in the segment.
  • E.g., pattern a*b is true in segment acb, but
    not true in bcb
  • frequency_count(s) in sequence $S = D_1, D_2, \ldots, D_n$:
  • $\text{frequency\_count}(s) = |\{\, i \mid 0 \le i < m \text{ and } s \text{ is
    true in } D_{i|s|+1}, \ldots, D_{i|s|+|s|} \,\}|$

9
Problem definition
  • support(s) = frequency_count(s) / m
  • $m$: the maximum number of periods of length $|s|$
    contained in the sequence ($m|s| \le n < (m+1)|s|$).
  • E.g., in a{b,c}baebaced, freq_count(a*b) = 2 and
    sup(a*b) = 2/3
  • Frequent partial periodic pattern $s$:
  • $\text{sup}(s) \ge \text{min\_sup}$, where min_sup is a user-specified
    threshold
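The truth test, frequency count, and support can be sketched in a few lines. The example assumes the series on this slide reads a{b,c}baebaced and the pattern reads a*b, with `*` as don't care. The `parse` helper and the set-based representation are illustrative, not taken from the slides.

```python
from fractions import Fraction


def parse(text):
    """Parse 'a{b,c}*' style text into a list of frozensets (None for '*')."""
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch == "*":
            out.append(None)
        elif ch == "{":
            j = text.index("}", i)
            out.append(frozenset(text[i + 1:j].split(",")))
            i = j
        else:
            out.append(frozenset(ch))
        i += 1
    return out


def is_true(pattern, segment):
    """s is true if every non-'*' position is contained in the segment's feature set."""
    return all(p is None or (d is not None and p <= d)
               for p, d in zip(pattern, segment))


def frequency_count(pattern, series):
    size = len(pattern)
    m = len(series) // size                 # number of complete period segments
    return sum(is_true(pattern, series[i * size:(i + 1) * size]) for i in range(m))


def support(pattern, series):
    m = len(series) // len(pattern)
    if m == 0:
        raise ValueError("series is shorter than one period")
    return Fraction(frequency_count(pattern, series), m)


series = parse("a{b,c}baebaced")            # n = 10, so m = 3 for period 3
pattern = parse("a*b")
print(is_true(pattern, parse("acb")), is_true(pattern, parse("bcb")))  # True False
print(frequency_count(pattern, series), support(pattern, series))      # 2 2/3
```

A pattern would then be reported as frequent when `support(pattern, series)` is at least the user-specified threshold.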

10
Problem definition
  • Input
  • A discrete data sequence, S
  • Minimum support threshold, min_sup
  • Time window, w
  • Goal
  • Find the periods automatically within window w
  • Discover all the frequent patterns for one
    or more of those periods

11
Solution
  • Step 1
  • Scan the sequence and construct a memory-based
    structure, the abbreviated list table, to find the
    potential periods.
  • Create disk-based inverted lists for the typical
    data points in the sequence
  • Step 2
  • Find all the frequent patterns, taking advantage
    of the disk-based inverted lists obtained from the
    first step and the max sub-pattern tree

12
Step 1
  • Abbreviated list table
  • For each value v and each possible period p ($1 \le p \le w$),
    count the occurrences of v at positions 0, 1,
    ..., p-1
  • Example.

13
Example
  • E.g.
  • S = ABAAACCAAE
  • min_sup = 0.8
  • w = 5

14
Example (cont.)
  • Possible periods:
  • 2, 4, 5
  • F1 (frequent 1-patterns):
  • p = 2: A*
  • p = 4: A***, ***A
  • p = 5: **A**, ***A*
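The example can be reproduced with a small in-memory sketch of the counting idea: for every period p up to w, count each value at each offset within the period, then keep the (offset, value) pairs whose count reaches min_sup × m. It assumes the sequence reads S = ABAAACCAAE and that only complete segments are counted. The function names are hypothetical, and the sketch holds the whole sequence in memory and does w updates per symbol, so it does not claim to match the slides' actual table layout or cost.

```python
from collections import defaultdict


def abbreviated_list_table(seq, w):
    """counts[p][(offset, value)] = occurrences of value at that offset,
    counted over the complete length-p segments only."""
    n = len(seq)
    counts = {p: defaultdict(int) for p in range(1, w + 1)}
    for i, v in enumerate(seq):
        for p in range(1, w + 1):
            if i < (n // p) * p:            # skip the incomplete trailing segment
                counts[p][(i % p, v)] += 1
    return counts


def frequent_1_patterns(seq, w, min_sup):
    """Return {period: [1-patterns]} for periods with at least one frequent 1-pattern."""
    n = len(seq)
    result = {}
    for p, table in abbreviated_list_table(seq, w).items():
        m = n // p                          # number of complete segments
        hits = sorted((off, v) for (off, v), c in table.items() if c / m >= min_sup)
        if hits:
            result[p] = ["*" * off + v + "*" * (p - off - 1) for off, v in hits]
    return result


f1 = frequent_1_patterns("ABAAACCAAE", w=5, min_sup=0.8)
print(sorted(f1))   # [2, 4, 5]
print(f1)           # {2: ['A*'], 4: ['A***', '***A'], 5: ['**A**', '***A*']}
```

Periods 1 and 3 drop out because no single value reaches 80% of the segments at any offset (A appears in 6 of 10 positions for p = 1, and in 2 of 3 segments at every offset for p = 3).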

15
Analysis on step 1
  • Time complexity: O(n),
  • where n is the sequence length
  • Space complexity: $O(D w^2)$
  • Exact space: $D \cdot w(w+1)/2$
  • D: domain size
  • w: window
  • Suppose w = 1000; then $w^2$ is about 1M
  • The absolute value is acceptable
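The space figure can be derived by counting table entries, assuming the table keeps one counter per (value, period, offset) triple, which is one reading of the structure described on the earlier slide:

```latex
\[
\text{entries} \;=\; D \sum_{p=1}^{w} p \;=\; \frac{D\,w(w+1)}{2} \;=\; O(D\,w^{2}),
\qquad
w = 1000 \;\Rightarrow\; \frac{w(w+1)}{2} = 500\,500 \ \text{entries per value.}
\]
```

Under that assumption, the exact count per value is roughly half of the $w^2 \approx 10^6$ upper bound quoted on the slide.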

16
Analysis on step 1 (cont.)
  • Compared with the circular autocorrelation method, the abbreviated list table
  • generates F1 at the same time
  • does not need n to be known in advance
  • avoids generating useless periods
  • E.g.,
  • S = A*A*A**AA* (* = don't care), min_sup = 0.8
  • Bitmap of A: 1010100110
  • f(0)·f(4) = (1010100110)·(0110101010) = 3
  • 3 > 2 = 10/4 × 0.8, so circular autocorrelation reports p = 4 as frequent
  • However, p = 4 is not frequent
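The arithmetic in this example can be checked with a short sketch. It only reproduces the dot product of the bitmap with its circular shift by 4 and the threshold as read from the slide (10/4 × 0.8). It does not implement the full method from the cited PKDD paper, and `circular_autocorrelation` is a hypothetical helper name.

```python
def circular_autocorrelation(bits, lag):
    """Dot product of a 0/1 vector with itself circularly shifted right by `lag`."""
    n = len(bits)
    return sum(bits[i] * bits[(i - lag) % n] for i in range(n))


bitmap_a = [int(c) for c in "1010100110"]

# The shifted copy used on the slide: the last four bits wrap around to the front.
shifted = bitmap_a[-4:] + bitmap_a[:-4]
print("".join(map(str, shifted)))           # 0110101010

score = circular_autocorrelation(bitmap_a, 4)
threshold = len(bitmap_a) / 4 * 0.8         # the slide's 10/4 x 0.8
print(score, score > threshold)             # 3 True
```

The score clears the threshold, which is why a test based on autocorrelation alone would propose p = 4 as a candidate period here.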

17
Step 2
  • Construct the max sub-pattern tree by scanning the
    disk-based inverted lists
  • This accesses the disk at lower cost
  • E.g.,
  • Domain: A, B, C, D, E, F, G, H
  • The symbols that appear in F1 are A and C
  • Only the inverted lists of A and C need to be scanned;
    the lists of the other symbols are never accessed
  • Traverse the max sub-pattern tree to get the frequent
    patterns
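A minimal in-memory sketch of the inverted-list idea follows. The presentation stores the lists on disk; here they are plain Python lists, the sequence ABAAACCAAE is reused from the earlier example purely for illustration, and both function names are hypothetical. The point is that the per-segment hits are rebuilt from the lists of the symbols in F1 alone, so the positions of all other symbols are never read.

```python
from collections import defaultdict


def build_inverted_lists(seq):
    """Map each symbol to the sorted list of positions where it occurs."""
    lists = defaultdict(list)
    for i, v in enumerate(seq):
        lists[v].append(i)
    return dict(lists)


def segment_hits(lists, symbols, p, n):
    """Rebuild, per complete segment, the (offset, symbol) pairs for `symbols` only."""
    m = n // p
    hits = defaultdict(set)
    for v in symbols:                       # lists of other symbols are never touched
        for i in lists.get(v, []):
            if i < m * p:
                hits[i // p].add((i % p, v))
    return dict(hits)


seq = "ABAAACCAAE"
lists = build_inverted_lists(seq)
print(lists["A"], lists["C"])               # [0, 2, 3, 4, 7, 8] [5, 6]
print(segment_hits(lists, ["A", "C"], p=5, n=len(seq)))
```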

18
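Step 2 (cont.)
  • [Figure: max sub-pattern tree. Root abd (count 1); children ab (count 1), bd (count 1), ad (count 0); edges labeled a, b, d]
  • F1 = {a, b, d}
  • S = tbydi abbdd abccc
  • sup(abd) = 1
  • sup(bd) = 1 + 1 = 2
  • sup(ad) = 0 + 1 = 1
  • sup(ab) = 1 + 1 = 2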
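The counts in this example can be reproduced with a flat dictionary in place of the tree. The sketch assumes the sequence is three period-5 segments (tbydi, abbdd, abccc) and that the frequent letters a, b, d sit at positions 1, 2, and 4 of the period. That placement is inferred from the segments, not stated on the slide. Each segment is reduced to the set of F1 positions it matches (its hit), and a sub-pattern's support is its own hit count plus the counts of every hit that contains it, which is the same sum the tree traversal produces.

```python
from collections import Counter


def hit_counts(segments, f1):
    """f1 maps a 0-based position to its frequent letter.
    Count how often each set of matched F1 positions occurs."""
    counts = Counter()
    for seg in segments:
        hit = frozenset(pos for pos, letter in f1.items() if seg[pos] == letter)
        if hit:
            counts[hit] += 1
    return counts


def support(positions, counts):
    """Own count plus the counts of all hits that are super-patterns."""
    target = frozenset(positions)
    return sum(c for hit, c in counts.items() if target <= hit)


segments = "tbydi abbdd abccc".split()
f1 = {0: "a", 1: "b", 3: "d"}               # assumed positions of a, b, d
counts = hit_counts(segments, f1)

print(support({0, 1, 3}, counts))  # abd -> 1
print(support({1, 3}, counts))     # bd  -> 2
print(support({0, 3}, counts))     # ad  -> 1
print(support({0, 1}, counts))     # ab  -> 2
```

A real max sub-pattern tree stores the same counts in nodes linked by the dropped letter, so shared ancestors are visited by following edges instead of scanning every recorded hit.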

19
Analysis
  • Advantages
  • Finds periods more efficiently than
    the circular autocorrelation method (see Experiments)
  • Mines frequent patterns more efficiently (see
    Experiments)
  • Disadvantage
  • The inverted lists use as much space as the sequence itself

20
Experiments
  • Data: 24192 data points
  • min_sup = 0.7
  • Varying window

21
Experiments (cont.)
  • Window = 24
  • min_sup = 0.7
  • Varying data volume

22
Experiments (cont.)
  • Data: 24192 data points
  • Window = 48
  • Varying min_sup

23
Experiments (cont.)
  • Window = 24
  • min_sup = 0.7
  • Varying data volume

24
Experiments (cont.)
  • Data: 67200 data points
  • min_sup = 0.7
  • Varying window

25
Experiments (cont.)
  • Data: 67200 data points
  • Window = 48
  • Varying min_sup

26
Future work
  • Finding new kinds of patterns
  • Storing patterns more efficiently
  • Using patterns to facilitate queries

27
References
  • J. Han, G. Dong, Y. Yin. Efficient Mining of
    Partial Periodic Patterns in Time Series
    Database. In ICDE '99.
  • C. Berberidis, I. Vlahavas, W. G. Aref, et al. On
    the Discovery of Weak Periodicities in Large Time
    Series. In PKDD '02.
  • L. H. Yang, M. L. Lee, W. Hsu. Efficient Mining of
    XML Query Patterns for Caching. In VLDB '04.

28
Suggestions & Questions