Mining time-delayed associations from event datasets


1
Mining time-delayed associations from event
datasets
  • Loo, Kin Kong
  • 13 July, 2006

2
Contents
  • Motivation
  • Definition of time-delayed association
  • A simple algorithm
  • Improving the simple algorithm
  • Experiments
  • Conclusion

3
The Butterfly Effect...
  • A butterfly's wings might create tiny changes in
    the atmosphere that ultimately cause a tornado to
    appear - from Wikipedia

http://www.wrh.noaa.gov/hnx/newslet/summer05/xwind.jpg
http://www.clipartxp.com/Butterflies/04_Butterfly.jpg

4
Road network
  • Network of roads and roundabouts

5
Road network
  • Alerts issued by a network monitoring system

6
Episode
  • Proposed by Mannila et al. in [Mannila95]
  • An episode is a partially ordered collection of
    events occurring together
  • Episodes can be described as directed acyclic
    graphs

[Figure: example episodes drawn as directed acyclic graphs over event types X, Y and Z]

7
Minimal Occurrences
  • A time interval $[t_s, t_e)$ is a minimal occurrence
    of an episode $\alpha$ if
      • $\alpha$ occurs in $[t_s, t_e)$
      • $\alpha$ does not occur in any proper sub-interval
        $[t_s', t_e')$ of $[t_s, t_e)$

8
Time-delayed associations
  • A time-delayed association r between two event
    types is of the form $I \xrightarrow{[u,v]} J$, where
      • I and J are two event types
      • $0 < u \le v$
  • Informally: if an event of type I occurs at time t,
    then it is likely that an event of type J occurs
    within the interval $[t+u, t+v]$

9
Definition
  • Let D be the event dataset
  • $A_1, \ldots, A_m$ are event attributes with domains
    $D_1, \ldots, D_m$
  • An event e is of the form $(A_1, \ldots, A_m, t)$, where
    t is the time at which e occurred, OR e is of the
    form $(E, t)$, where E is an event type
  • Notation: $E_e$ is the event type of the event e, and
    $t_e$ is the time when e occurred
  • Let $\mathcal{E}$ be the set of all possible event types

10
Definition (Contd)
  • A time-delayed association relates two event
    types, in the form $I \xrightarrow{[u,v]} J$ such that
    $0 < u \le v$
  • An event i is a match of $I \xrightarrow{[u,v]} J$ if
      • $E_i = I$
      • $\exists j$ such that $E_j = J$ and
        $t_i + u \le t_j \le t_i + v$
  • The event j here is a consequence to which i
    corresponds
  • Support = (number of distinct matches) / $|D|$
  • Confidence = (number of distinct matches) /
    (number of occurrences of I)
  • Length = 2
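
The definitions above can be checked on a toy dataset. The following Python sketch is illustrative only: the function names, the (type, time) tuple representation and the sample events are assumptions, not taken from the slides.

```python
from bisect import bisect_left, bisect_right

def find_matches(events, I, J, u, v):
    """Map each match i of I -[u,v]-> J to the list of its consequences j.

    `events` is a list of (event_type, time) pairs; events are referred to
    by their position in that list (an assumed representation).
    """
    # Positions of all J events, ordered by time, for binary search.
    j_pos = sorted((k for k, (typ, _) in enumerate(events) if typ == J),
                   key=lambda k: events[k][1])
    j_times = [events[k][1] for k in j_pos]
    mapping = {}
    for i, (typ, t) in enumerate(events):
        if typ != I:
            continue
        lo = bisect_left(j_times, t + u)    # first J event with time >= t + u
        hi = bisect_right(j_times, t + v)   # one past the last with time <= t + v
        if lo < hi:
            mapping[i] = j_pos[lo:hi]
    return mapping

def support(events, mapping):
    """Number of distinct matches divided by |D|."""
    return len(mapping) / len(events)

def confidence(events, mapping, I):
    """Number of distinct matches divided by the number of occurrences of I."""
    n_I = sum(1 for typ, _ in events if typ == I)
    return len(mapping) / n_I if n_I else 0.0

# Hypothetical sample data.
events = [("A", 0), ("B", 1), ("A", 2), ("B", 3), ("B", 4), ("C", 5), ("A", 6)]
ab = find_matches(events, "A", "B", u=1, v=2)
print(ab)                           # {0: [1], 2: [3, 4]}
print(support(events, ab))          # 2 of the 7 events are matches
print(confidence(events, ab, "A"))  # 2 of the 3 A events are matches
```

The A event at time 6 is not a match because no B event falls in the window [7, 8].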

11
Complex association
  • Treating a time-delayed association as a complex
    event type, it can be extended by associating it
    with another event type
  • Let R be the complex event type representing
    $I \xrightarrow{[u,v]} J$
  • A time-delayed association between R and K is of
    the form $R \xrightarrow{[u,v]} K$
      • K is an ordinary event type
      • u and v are the same as in $I \xrightarrow{[u,v]} J$

12
Complex association (Contd)
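  • An event i is a match of $R \xrightarrow{[u,v]} K$ if
      • i is a match of $I \xrightarrow{[u,v]} J$
      • $\exists j$ such that j is a consequence of i, and
        $\exists k$ such that $E_k = K$ and
        $t_j + u \le t_k \le t_j + v$
  • Support = (number of distinct matches) / $|D|$
  • Confidence = $supp(R \xrightarrow{[u,v]} K) / supp(I \xrightarrow{[u,v]} J)$
  • Length = 1 + length($I \xrightarrow{[u,v]} J$)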
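
A small Python sketch of the extension step may make the complex-association definition concrete. The helper names (`window_events`, `simple_mapping`, `extend_mapping`), the dictionary representation and the sample data are assumptions made for illustration.

```python
def window_events(events, etype, lo, hi):
    """Positions of events of type `etype` whose time lies in [lo, hi]."""
    return [k for k, (typ, t) in enumerate(events)
            if typ == etype and lo <= t <= hi]

def simple_mapping(events, I, J, u, v):
    """Match -> consequences for the length-2 association I -[u,v]-> J."""
    mapping = {}
    for i, (typ, t) in enumerate(events):
        if typ == I:
            cons = window_events(events, J, t + u, t + v)
            if cons:
                mapping[i] = cons
    return mapping

def extend_mapping(events, mapping, K, u, v):
    """Given the mapping of R, build the mapping of R -[u,v]-> K.

    A match i of R stays a match if at least one of its consequences j
    is followed by a K event within [t_j + u, t_j + v].
    """
    extended = {}
    for i, cons in mapping.items():
        ks = set()
        for j in cons:
            t_j = events[j][1]
            ks.update(window_events(events, K, t_j + u, t_j + v))
        if ks:
            extended[i] = sorted(ks)
    return extended

# Hypothetical sample data.
events = [("A", 0), ("B", 1), ("A", 2), ("B", 3), ("B", 4), ("C", 5), ("A", 6)]
ab = simple_mapping(events, "A", "B", u=1, v=2)
abc = extend_mapping(events, ab, "C", u=1, v=2)
print(abc)                 # {2: [5]}
print(len(abc) / len(ab))  # confidence = supp((A->B)->C) / supp(A->B) = 0.5
```

Because |D| cancels in the ratio of supports, the confidence reduces to a ratio of match counts.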

13
Problem statement
  • Given an event dataset D and time constraints u, v
    such that $0 < u \le v$, find all time-delayed
    associations with support not smaller than $\rho_s$
    and confidence not smaller than $\rho_c$.

14
MQ Mappings
  • Matches and their corresponding consequences can
    be arranged in a table-like structure

[Figure: events of types A, B, C and D on a time axis from 0 to 15, with matches and consequences arranged as a table]

15
From MQ mappings to MQ mapping
[Figure: the same events on a time axis from 0 to 15, showing the mapping for (A → B) → C]

16
Algorithm BRUTE-FORCE
17
Room for improvement
  • Every possible permutation is evaluated (until
    one is found to be infrequent)!
  • A very large number of MQ mappings are generated →
    too much for main memory → intermediate results
    may need to be swapped to secondary storage
  • Hence, we need
      • Good pruning strategies
      • Good cache management
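
The BRUTE-FORCE pseudocode itself is not part of this transcript. The Python sketch below is therefore only an assumption about what such an exhaustive search could look like: it extends every chain with every event type and stops extending a chain once it becomes infrequent. All names and the sample data are hypothetical.

```python
def extend(events, mapping, K, u, v):
    """Match -> consequences after appending event type K to a chain."""
    out = {}
    for i, cons in mapping.items():
        ks = sorted({k for j in cons
                     for k, (typ, t) in enumerate(events)
                     if typ == K and events[j][1] + u <= t <= events[j][1] + v})
        if ks:
            out[i] = ks
    return out

def exhaustive_search(events, u, v, min_sup, min_conf):
    """Enumerate chains (T0, T1, ..., Tn), read as ((T0 -> T1) -> ...) -> Tn."""
    types = sorted({typ for typ, _ in events})
    n = len(events)
    results = []
    # Seed: every event of type I is its own "consequence".
    frontier = [((I,), {i: [i] for i, (typ, _) in enumerate(events) if typ == I})
                for I in types]
    while frontier:
        next_frontier = []
        for chain, mapping in frontier:
            for K in types:
                ext = extend(events, mapping, K, u, v)
                if len(ext) / n < min_sup:
                    continue  # infrequent: do not extend this chain further
                next_frontier.append((chain + (K,), ext))
                # Confidence is relative to the chain being extended.
                if len(ext) / len(mapping) >= min_conf:
                    results.append(chain + (K,))
        frontier = next_frontier
    return results

# Hypothetical sample data.
events = [("A", 0), ("B", 1), ("C", 2), ("A", 3), ("B", 4), ("C", 5)]
found = exhaustive_search(events, u=1, v=1, min_sup=0.3, min_conf=0.9)
print(found)  # [('A', 'B'), ('B', 'C'), ('A', 'B', 'C')]
```

Every intermediate mapping is kept alive for one more level, which illustrates why memory use grows quickly and why pruning and caching matter.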

18
Pruning strategy
  • The Apriori property does not hold for time-delayed
    associations
  • (A → B) → C being frequent does not imply that
    A → C is frequent
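
A tiny counterexample shows why. In the Python sketch below (function name, representation and data are assumptions for illustration), every A is followed by a B one time unit later and every B by a C one time unit later, so with [u, v] = [1, 1] the chain (A → B) → C has matches while A → C has none.

```python
def chain_matches(events, chain, u, v):
    """Matches of the left-nested association ((T0 -> T1) -> T2) ... over `chain`."""
    # frontier[i]: positions of the latest consequences reachable from match i.
    frontier = {i: {i} for i, (typ, _) in enumerate(events) if typ == chain[0]}
    for nxt in chain[1:]:
        step = {}
        for i, ends in frontier.items():
            reached = {k for k, (typ, t) in enumerate(events)
                       if typ == nxt
                       and any(events[j][1] + u <= t <= events[j][1] + v
                               for j in ends)}
            if reached:
                step[i] = reached
        frontier = step
    return set(frontier)

# Hypothetical sample data.
events = [("A", 0), ("B", 1), ("C", 2), ("A", 3), ("B", 4), ("C", 5)]
abc = chain_matches(events, ["A", "B", "C"], u=1, v=1)
ac = chain_matches(events, ["A", "C"], u=1, v=1)
print(len(abc), len(ac))  # 2 0
```

The C events lie two time units after the A events, outside the window [t+1, t+1], so A → C cannot be used to prune (A → B) → C.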

19
Multiplicity of consequences
  • We define the multiplicity of a consequence as
    the number of distinct matches corresponding to
    the consequence
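
As a minimal sketch, multiplicities can be counted directly from a match-to-consequences table. The dictionary layout and the sample mapping below are assumptions for illustration, not data from the slides.

```python
from collections import Counter

def multiplicities(mapping):
    """Count, for each consequence, the distinct matches that map to it."""
    counts = Counter()
    for match, consequences in mapping.items():
        for q in set(consequences):  # guard against duplicate entries
            counts[q] += 1
    return counts

# Hypothetical mapping: match position -> consequence positions.
mq = {0: [1], 2: [3, 4], 5: [4]}
mult = multiplicities(mq)
print(dict(mult))  # {1: 1, 3: 1, 4: 2}
```

Consequence 4 has multiplicity 2 because both match 2 and match 5 correspond to it.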

[Figure: multiplicities of consequences, illustrated for (A → B) → C]

20
GlobalK
  • Sort all multiplicities in descending order
  • Find the minimal k such that the sum of the top-k
    multiplicities is not less than $\rho_s \times |D|$
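
The computation of k can be sketched in a few lines of Python. The function name `global_k` and the convention of passing the threshold as an absolute count (the support threshold times |D|, rounded up) are assumptions made here.

```python
def global_k(mults, min_count):
    """Smallest k whose top-k multiplicities sum to at least `min_count`.

    Returns None when even all multiplicities together fall short.
    """
    total = 0
    for k, m in enumerate(sorted(mults, reverse=True), start=1):
        total += m
        if total >= min_count:
            return k
    return None

# Hypothetical multiplicities.
print(global_k([1, 3, 2, 1], 3))  # 1  (the largest multiplicity alone reaches 3)
print(global_k([1, 3, 2, 1], 4))  # 2  (3 + 2 >= 4)
print(global_k([1, 1], 5))        # None
```

How k is then used for pruning is not spelled out in this transcript. One plausible reading: if fewer than k consequences can possibly be extended by a new event type, the extended association cannot reach the support threshold.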

[Figure: GlobalK example for (A → B) → C; labels: $\rho_s \times |D| = 3$, $k = 1$, "Infrequent"]

21
SectTop
  • GlobalK does not make use of temporal information
  • We can divide the whole history into a number of
    segments and keep information for each segment
  • For each segment, a vector containing the
    multiplicities of its consequences in descending
    order is kept
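
The transcript does not say how the per-segment vectors are combined, so the following Python sketch rests on an assumption: for each segment, at most n consequences can be extended, where n is known per segment, and the segment contributes at most the sum of its top-n multiplicities. Function names and numbers are hypothetical.

```python
def top_n_bound(sorted_mults, n):
    """Sum of the n largest multiplicities (input sorted in descending order)."""
    return sum(sorted_mults[:n])

def sect_top_bound(segment_mults, segment_counts):
    """Upper bound on the number of matches, accumulated segment by segment."""
    return sum(top_n_bound(mults, n)
               for mults, n in zip(segment_mults, segment_counts))

# Hypothetical data: two segments with descending multiplicity vectors.
segments = [[3, 1], [2, 2, 1]]
extendable = [0, 2]  # assumed number of extendable consequences per segment

sectional = sect_top_bound(segments, extendable)
all_sorted = sorted((m for seg in segments for m in seg), reverse=True)
overall = top_n_bound(all_sorted, sum(extendable))
print(sectional, overall)  # 4 5
```

With a required count of 5, the bound over the whole history (5) cannot rule the candidate out, while the segment-wise bound (4) can, which is the benefit of using temporal information.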

22
SectTop (Contd)
[Figure: SectTop example; labels: $\rho_s \times |D| = 3$, "Cannot be frequent"]

23
Cache replacement strategies
  • Common cache replacement strategies
      • FIFO: the oldest data in the cache are replaced
      • LRU: data that have not been referenced for the
        longest time are replaced
      • LFU: data that are least frequently referenced
        are replaced
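
As an illustration of one of these policies, here is a minimal LRU cache in Python built on `collections.OrderedDict`. The class, its method names and the string keys are assumptions for this sketch; the slides do not describe the actual cache implementation.

```python
from collections import OrderedDict

class LRUCache:
    """Keeps at most `capacity` entries; evicts the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest reference first, newest last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        """Store a value; return the evicted (key, value) pair, or None."""
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            # The caller could write the evicted entry to secondary storage.
            return self.items.popitem(last=False)
        return None

cache = LRUCache(capacity=2)
cache.put("A->B", {0: [1]})
cache.put("A->C", {2: [5]})
cache.get("A->B")                      # touch A->B, so A->C is now the oldest
evicted = cache.put("B->C", {1: [2]})
print(evicted)                         # ('A->C', {2: [5]})
```

Replacing `move_to_end` on access with no reordering would give FIFO behaviour; LFU would need a reference counter per entry instead.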

24
Candidate generation
25
Experiment
  • Opening and closing prices of 33 HSI constituents
    over around 1400 trading days (since Jan 2000)
    were obtained for the experiments
  • The prices are transformed to events as follows
  • Hence, 33 data points are generated each day,
    each data point of the form (0008B, date)

[Figure: definition of event types A, B and C in terms of open price and closing price]

26
Experiment - Effectiveness of pruning strategies
  • Number of candidates actually enumerated

[Figure: two plots with $\rho_s$ on the horizontal axis]

27
Experiment - Effectiveness of pruning strategies
(Contd)
  • Case when [u, v] = [?, 1], at high and low $\rho_s$

28
Experiment - Effectiveness of pruning strategies
(Contd)
  • Case when [u, v] = [?, 2], at high and low $\rho_s$

29
Experiment - Cache replacement strategy and
candidate generation
  • LRU, [u, v] = [?, 1], at high $\rho_s$ (0.6), with and
    without the pruning strategy

30
Experiment - Cache replacement strategy and
candidate generation
  • LFU, [u, v] = [?, 1], at high $\rho_s$ (0.6), with and
    without the pruning strategy

31
Experiment - Cache replacement strategy and
candidate generation
  • LRU, [u, v] = [?, 1], at low $\rho_s$ (0.3), with and
    without the pruning strategy

32
Conclusion
  • Time-delayed association rules offer an
    alternative tool for sequential analysis of event
    datasets
  • Although the Apriori property does not hold in
    the model of time-delayed association, we can
    still derive good pruning strategies to trim the
    search space for frequent associations
  • Because of the large volume of intermediate data
    generated, it is essential to choose the cache
    replacement strategy well
  • The way candidates are generated can play a
    crucial role in the performance of the algorithm

33
Reference
  • [Mannila95] Mannila, H., Toivonen, H., and Verkamo, A.I.
    1995. Discovering frequent episodes in sequences.
    In Proceedings of the First International
    Conference on Knowledge Discovery and Data Mining
    (KDD 95). Montreal, Canada, pp. 210–215.

34
Thank you.