Title: Mining time-delayed associations from event datasets
- Loo, Kin Kong
- 13 July, 2006
Contents
- Motivation
- Definition of time-delayed association
- A simple algorithm
- Improving the simple algorithm
- Experiments
- Conclusion
The Butterfly Effect...
- "A butterfly's wings might create tiny changes in the atmosphere that ultimately cause a tornado to appear" (from Wikipedia)
[Images: http://www.wrh.noaa.gov/hnx/newslet/summer05/xwind.jpg and http://www.clipartxp.com/Butterflies/04_Butterfly.jpg]
Road network
- Network of roads and roundabouts
Road network (Contd)
- Alerts issued by a network monitoring system
Episodes
- Proposed by Mannila et al. [Mannila95]
- An episode is a partially ordered collection of events occurring together
- Episodes can be described as directed acyclic graphs
[Figure: example episodes over event types X, Y and Z, drawn as directed acyclic graphs]
Minimal occurrences
- A time interval $[t_s, t_e)$ is a minimal occurrence of an episode $\alpha$ if
  - $\alpha$ occurs in $[t_s, t_e)$, and
  - $\alpha$ does not occur in any proper sub-interval $[t_s', t_e')$ of $[t_s, t_e)$
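To make the definition concrete, here is a minimal sketch for the special case of a serial (totally ordered) episode. It assumes events are `(event_type, time)` pairs with integer times and that the events of an occurrence have strictly increasing times; `minimal_occurrences` is a hypothetical name. Each occurrence is reported by the times of its first and last events, so `(start, end)` stands for the half-open interval $[start, end + 1)$.

```python
from typing import List, Sequence, Tuple

Event = Tuple[str, int]  # (event type, integer timestamp)


def minimal_occurrences(events: Sequence[Event],
                        episode: Sequence[str]) -> List[Tuple[int, int]]:
    """Minimal occurrences of a serial episode as (first time, last time) pairs."""
    if not episode:
        return []
    events = sorted(events, key=lambda e: e[1])
    candidates = set()
    for end_type, end_time in events:
        if end_type != episode[-1]:
            continue
        # Match the remaining types backwards, always taking the latest
        # eligible event; this gives the latest possible start for this end.
        cursor = end_time
        matched = True
        for wanted in reversed(episode[:-1]):
            times = [t for typ, t in events if typ == wanted and t < cursor]
            if not times:
                matched = False
                break
            cursor = max(times)
        if matched:
            candidates.add((cursor, end_time))
    # An occurrence is minimal if it contains no other candidate interval.
    return sorted(
        (s, e) for s, e in candidates
        if not any((s2, e2) != (s, e) and s <= s2 and e2 <= e
                   for s2, e2 in candidates)
    )
```

For example, `minimal_occurrences([("A", 1), ("B", 4), ("B", 6)], ["A", "B"])` returns `[(1, 4)]`: the occurrence ending at time 6 contains the one ending at time 4, so it is not minimal.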
Time-delayed associations
- A time-delayed association $r$ between two event types is of the form $I \xrightarrow{[u,v]} J$, where
  - $I$ and $J$ are two event types, and
  - $0 < u \le v$
- Informally: if an event of type $I$ occurs at time $t$, then an event of type $J$ is likely to occur within the interval $[t + u, t + v]$
Definition
- Let $D$ be the event dataset
- $A_1, \ldots, A_m$ are event attributes with domains $D_1, \ldots, D_m$
- An event $e$ is of the form $(A_1, \ldots, A_m, t)$, where $t$ is the time at which $e$ occurred, or alternatively of the form $(E, t)$, where $E$ is an event type
- Notation: $E_e$ is the event type of event $e$, and $t_e$ is the time at which $e$ occurred
- Let $\mathcal{E}$ be the set of all possible event types
Definition (Contd)
- A time-delayed association relates two event types and is of the form $I \xrightarrow{[u,v]} J$ such that $0 < u \le v$
- An event $i$ is a match to $I \xrightarrow{[u,v]} J$ if
  - $E_i = I$, and
  - $\exists j$ such that $E_j = J$ and $t_i + u \le t_j \le t_i + v$
- Such an event $j$ is a consequence corresponding to $i$
- Support = (number of distinct matches) / $|D|$
- Confidence = (number of distinct matches) / (number of occurrences of $I$)
- Length = 2
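The support and confidence definitions can be checked with a short sketch. It assumes events are `(event_type, time)` pairs and treats each $I$ event as a distinct potential match; `association_stats` is a hypothetical name. Sorting the $J$ times lets each window test $t_i + u \le t_j \le t_i + v$ be done with two binary searches.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Event = Tuple[str, int]  # (event type E_e, time t_e)


def association_stats(events: List[Event], i_type: str, j_type: str,
                      u: int, v: int) -> Tuple[float, float]:
    """Support and confidence of I -[u,v]-> J over the dataset `events`."""
    if not 0 < u <= v:
        raise ValueError("time constraints must satisfy 0 < u <= v")
    # Sorted times of all J events, so each window test is a binary search.
    j_times = sorted(t for typ, t in events if typ == j_type)
    occurrences_of_i = 0
    matches = 0
    for typ, t in events:
        if typ != i_type:
            continue
        occurrences_of_i += 1
        # i is a match if at least one J event lies in [t + u, t + v].
        if bisect_left(j_times, t + u) < bisect_right(j_times, t + v):
            matches += 1
    support = matches / len(events)
    confidence = matches / occurrences_of_i if occurrences_of_i else 0.0
    return support, confidence
```

With events A@0, B@1, A@2, C@3, A@5, B@6 and $u = v = 1$, two of the three A events are matches to $A \xrightarrow{[1,1]} B$, giving support 2/6 and confidence 2/3.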
Complex association
- A time-delayed association can be treated as a complex event type, which can then be extended and associated with another event type
- Let $R$ be the complex event type representing $I \xrightarrow{[u,v]} J$
- A time-delayed association between $R$ and $K$ is of the form $R \xrightarrow{[u,v]} K$, where
  - $K$ is an ordinary event type, and
  - $u$ and $v$ are the same as in $I \xrightarrow{[u,v]} J$
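Complex association (Contd)
- An event $i$ is a match to $R \xrightarrow{[u,v]} K$ if
  - $i$ is a match to $I \xrightarrow{[u,v]} J$, and
  - $\exists j$ such that $j$ is a consequence of $i$ and $\exists k$ such that $E_k = K$ and $t_j + u \le t_k \le t_j + v$
- Support = (number of distinct matches) / $|D|$
- Confidence = $\mathrm{supp}(R \xrightarrow{[u,v]} K) / \mathrm{supp}(I \xrightarrow{[u,v]} J)$
- Length = 1 + length($I \xrightarrow{[u,v]} J$)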
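The following sketch applies these definitions to a length-3 association $(I \xrightarrow{[u,v]} J) \xrightarrow{[u,v]} K$. It assumes `(event_type, time)` pairs; `complex_stats` is a hypothetical name. Because matching depends only on times, the consequences of $i$ are represented by the sorted times of the $J$ events in its window.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Event = Tuple[str, int]  # (event type, time)


def complex_stats(events: List[Event], i_type: str, j_type: str, k_type: str,
                  u: int, v: int) -> Tuple[float, float, int]:
    """Support, confidence and length of (I -[u,v]-> J) -[u,v]-> K."""
    j_times = sorted(t for typ, t in events if typ == j_type)
    k_times = sorted(t for typ, t in events if typ == k_type)

    def has_in(times: List[int], lo: int, hi: int) -> bool:
        # True if some element of the sorted list `times` lies in [lo, hi].
        return bisect_left(times, lo) < bisect_right(times, hi)

    n_ij = n_rk = 0
    for typ, t_i in events:
        if typ != i_type:
            continue
        lo, hi = bisect_left(j_times, t_i + u), bisect_right(j_times, t_i + v)
        if lo == hi:
            continue  # i is not a match to I -> J
        n_ij += 1
        # i matches R -> K if some consequence j has a K event in [t_j + u, t_j + v].
        if any(has_in(k_times, t_j + u, t_j + v) for t_j in j_times[lo:hi]):
            n_rk += 1
    support = n_rk / len(events)
    # Both supports share the denominator |D|, so the ratio of match counts
    # equals supp(R -> K) / supp(I -> J).
    confidence = n_rk / n_ij if n_ij else 0.0
    length = 1 + 2  # 1 + length(I -> J)
    return support, confidence, length
```

With events A@0, B@1, C@2, A@4, B@5, A@8, B@9, C@10 and $u = v = 1$, all three A events match $A \xrightarrow{[1,1]} B$ but only two are followed by a C, giving support 2/8, confidence 2/3 and length 3.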
Problem statement
- Given an event dataset $D$ and time constraints $u, v$ such that $0 < u \le v$, find all time-delayed associations with support not smaller than $\theta_s$ and confidence not smaller than $\theta_c$.
MQ mappings
- Matches and their corresponding consequences can
be arranged in a table-like structure
[Figure: example event sequence of types A, B, C and D along a time axis from 0 to 15]
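The slide's example table did not survive extraction. As a minimal sketch, assuming an MQ mapping for $I \xrightarrow{[u,v]} J$ can be held as a dictionary from each match (an event index) to the indices of its consequences, it could be built as follows. The dictionary layout and the name `build_mq_mapping` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple

Event = Tuple[str, int]          # (event type, time)
MQMapping = Dict[int, List[int]]  # match index -> consequence indices (hypothetical layout)


def build_mq_mapping(events: List[Event], i_type: str, j_type: str,
                     u: int, v: int) -> MQMapping:
    """Map each match of I -[u,v]-> J to the J events that are its consequences."""
    # J events as (time, index) pairs sorted by time for binary search.
    j_pairs = sorted((t, idx) for idx, (typ, t) in enumerate(events) if typ == j_type)
    j_times = [t for t, _ in j_pairs]
    mapping: MQMapping = {}
    for idx, (typ, t) in enumerate(events):
        if typ != i_type:
            continue
        lo = bisect_left(j_times, t + u)
        hi = bisect_right(j_times, t + v)
        if lo < hi:  # at least one consequence, so this event is a match
            mapping[idx] = [j for _, j in j_pairs[lo:hi]]
    return mapping
```

On events A@0, B@1, A@1, B@2, B@3 with $[u, v] = [1, 2]$, this returns `{0: [1, 3], 2: [3, 4]}`. Event 3 (B@2) is a consequence of both matches, which is why a table rather than a one-to-one pairing is needed.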
From MQ mappings to MQ mapping
[Figure: the same event sequence (time 0 to 15), with the MQ mapping of $A \xrightarrow{[u,v]} B$ extended to that of $(A \xrightarrow{[u,v]} B) \xrightarrow{[u,v]} C$]
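One reading of this slide, labelled here as an assumption because the figure did not survive, is that extending $A \xrightarrow{[u,v]} B$ by $C$ keeps only the matches whose consequences reach a C event, and those C events become the new consequences. A sketch under that assumption, with a match-to-consequence-indices dictionary as a hypothetical representation and `extend_mapping` as a hypothetical name:

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple

Event = Tuple[str, int]        # (event type, time)
Mapping = Dict[int, List[int]]  # match index -> consequence indices


def extend_mapping(events: List[Event], mapping: Mapping, k_type: str,
                   u: int, v: int) -> Mapping:
    """Derive the mapping of R -[u,v]-> K from the mapping of R."""
    k_pairs = sorted((t, idx) for idx, (typ, t) in enumerate(events) if typ == k_type)
    k_times = [t for t, _ in k_pairs]
    extended: Mapping = {}
    for match, consequences in mapping.items():
        new_consequences = set()
        for j in consequences:
            t_j = events[j][1]
            # K events in [t_j + u, t_j + v] (assumed to become new consequences).
            lo = bisect_left(k_times, t_j + u)
            hi = bisect_right(k_times, t_j + v)
            new_consequences.update(idx for _, idx in k_pairs[lo:hi])
        if new_consequences:  # the match survives the extension
            extended[match] = sorted(new_consequences)
    return extended
```

Given events A@0, B@1, C@2, A@4, B@5, the mapping `{0: [1], 3: [4]}` of $A \xrightarrow{[1,1]} B$ becomes `{0: [2]}`: only the first match has a C event one time unit after its consequence. The surviving keys are always a subset of the original keys, matching the definition of a complex association.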
Algorithm BRUTE-FORCE
Room for improvement
- Every possible permutation is evaluated (until one is found to be infrequent)
- A very large number of MQ mappings is generated, too many to fit in main memory, so intermediate results may need to be swapped to secondary storage
- Hence, we need
  - Good pruning strategies
  - Good cache management
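The BRUTE-FORCE pseudocode itself did not survive extraction. The behaviour described above (every ordered combination of event types is evaluated, and a chain stops growing once it is infrequent) can be sketched as a depth-first search. This is not the original algorithm. It assumes that candidates grow one event type at a time, that the $K$ events of an extension become its new consequences, and that a hypothetical `max_len` cap bounds the depth (without it, a periodic event type could extend forever). `brute_force` and `_reach` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

Event = Tuple[str, int]         # (event type, time)
Mapping = Dict[int, List[int]]  # match index -> consequence indices


def _reach(events: Sequence[Event], starts: List[int], target: str,
           u: int, v: int) -> List[int]:
    """Indices of `target` events lying in [t_s + u, t_s + v] for some start s."""
    return sorted({idx for idx, (typ, t) in enumerate(events)
                   if typ == target
                   and any(events[s][1] + u <= t <= events[s][1] + v for s in starts)})


def brute_force(events: Sequence[Event], u: int, v: int, theta_s: float,
                theta_c: float, max_len: int = 3):
    """Enumerate chains (T1 -> T2) -> ... -> Tn that share the same [u, v]."""
    n = len(events)
    types = sorted({typ for typ, _ in events})
    results = []

    def grow(chain: List[str], mapping: Mapping, parent_count: int) -> None:
        count = len(mapping)  # number of distinct matches
        # Matches of an extension are a subset of its parent's matches, so
        # once a chain is infrequent, no extension of it can be frequent.
        if count == 0 or count / n < theta_s:
            return
        if count / parent_count >= theta_c:
            results.append((tuple(chain), count / n, count / parent_count))
        if len(chain) == max_len:
            return
        for k in types:
            extended: Mapping = {}
            for i, cons in mapping.items():
                new_cons = _reach(events, cons, k, u, v)
                if new_cons:
                    extended[i] = new_cons  # K events become new consequences (assumption)
            grow(chain + [k], extended, count)

    for i_type in types:
        occurrences = sum(1 for typ, _ in events if typ == i_type)
        for j_type in types:
            mapping: Mapping = {}
            for idx, (typ, _) in enumerate(events):
                if typ == i_type:
                    cons = _reach(events, [idx], j_type, u, v)
                    if cons:
                        mapping[idx] = cons
            # A length-2 association's confidence divides by the occurrences of I.
            grow([i_type, j_type], mapping, occurrences)
    return results
```

Every mapping along the current search path is held in memory at once, which gives a small-scale view of the memory pressure described on this slide.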
Pruning strategy
- The Apriori property does not hold for time-delayed associations
- $(A \xrightarrow{[u,v]} B) \xrightarrow{[u,v]} C$ being frequent does not imply that $A \xrightarrow{[u,v]} C$ is frequent
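A small counterexample shows why. With $u = v = 1$, the chain $(A \xrightarrow{[1,1]} B) \xrightarrow{[1,1]} C$ looks for C two time units after each A, whereas $A \xrightarrow{[1,1]} C$ looks for C one time unit after each A, so the two windows never overlap. The dataset and threshold below are made up for illustration, and the brute-force scans are deliberately simple.

```python
from typing import List, Tuple

Event = Tuple[str, int]  # (event type, time)


def supp_pair(events: List[Event], i: str, j: str, u: int, v: int) -> float:
    """Support of i -[u,v]-> j (quadratic scan, for illustration only)."""
    matches = sum(1 for a, ta in events
                  if a == i and any(b == j and ta + u <= tb <= ta + v
                                    for b, tb in events))
    return matches / len(events)


def supp_triple(events: List[Event], i: str, j: str, k: str,
                u: int, v: int) -> float:
    """Support of (i -[u,v]-> j) -[u,v]-> k per the complex-association definition."""
    matches = 0
    for a, ta in events:
        if a != i:
            continue
        consequences = [tb for b, tb in events if b == j and ta + u <= tb <= ta + v]
        if any(c == k and tj + u <= tc <= tj + v
               for tj in consequences for c, tc in events):
            matches += 1
    return matches / len(events)


events = [("A", 0), ("B", 1), ("C", 2), ("A", 10), ("B", 11), ("C", 12)]
theta_s = 0.3
print(supp_triple(events, "A", "B", "C", 1, 1) >= theta_s)  # True (support 2/6)
print(supp_pair(events, "A", "C", 1, 1) >= theta_s)         # False (support 0)
```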
Multiplicity of consequences
- We define the multiplicity of a consequence as
the number of distinct matches corresponding to
the consequence
[Figure: multiplicities of the consequences of $(A \xrightarrow{[u,v]} B) \xrightarrow{[u,v]} C$]
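If an MQ mapping is held as a dictionary from each match to its list of consequences (an assumed representation, with `multiplicities` as a hypothetical name), the multiplicity of every consequence is a simple count:

```python
from collections import Counter
from typing import Dict, Hashable, List


def multiplicities(mapping: Dict[Hashable, List[Hashable]]) -> Dict[Hashable, int]:
    """Number of distinct matches corresponding to each consequence."""
    counts: Counter = Counter()
    for consequences in mapping.values():
        # set() ensures a match is counted at most once per consequence.
        counts.update(set(consequences))
    return dict(counts)
```

For the mapping `{0: [1, 3], 2: [3, 4]}`, consequence 3 is shared by both matches, so the result is `{1: 1, 3: 2, 4: 1}`.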
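GlobalK
- Sort all multiplicities in descending order
- Find the minimal $k$ such that the sum of the top-$k$ multiplicities is not less than $\theta_s \cdot |D|$
[Figure: GlobalK applied to $(A \xrightarrow{[u,v]} B) \xrightarrow{[u,v]} C$ with $\theta_s \cdot |D| = 3$, giving $k = 1$; the candidate is marked infrequent]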
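A sketch of the $k$ computation follows (`global_k` is a hypothetical name). Each match of an extension is counted in the multiplicity of at least one consequence it extends through. So if even the sum of all multiplicities falls short of $\theta_s \cdot |D|$, the extension cannot be frequent, and otherwise at least $k$ consequences must be extendable. The slides do not spell out how $k$ is then compared against the data, so that step is not shown.

```python
from typing import Iterable, Optional


def global_k(multiplicities: Iterable[int], theta_s: float,
             n_events: int) -> Optional[int]:
    """Minimal k whose top-k multiplicities sum to at least theta_s * |D|.

    Returns None when even all multiplicities together fall short,
    i.e. no extension of this association can be frequent.
    """
    threshold = theta_s * n_events
    if threshold <= 0:
        return 0
    total = 0
    for k, m in enumerate(sorted(multiplicities, reverse=True), start=1):
        total += m
        if total >= threshold:
            return k
    return None
```

For multiplicities `[1, 3, 2, 1]` with $\theta_s = 0.5$ and $|D| = 6$ (threshold 3), the largest multiplicity alone reaches the threshold, so `global_k` returns 1.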
SectTop
- GlobalK does not make use of temporal information
- We can divide the whole history into a number of segments and keep information for each segment
- For each segment, a vector containing the multiplicities of its consequences in descending order is kept
SectTop (Contd)
[Figure: SectTop example with $\theta_s \cdot |D| = 3$; the candidate cannot be frequent]
Cache replacement strategies
- Common cache replacement strategies:
  - FIFO: the oldest data in the cache are replaced
  - LRU: the data that have not been referenced for the longest time are replaced
  - LFU: the data that are least frequently referenced are replaced
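The three policies differ only in how they pick the entry to evict. Below is a minimal sketch of a fixed-capacity cache supporting all three. `MappingCache` and its methods are hypothetical names. `put` returns the evicted key so that a caller could, for instance, write that intermediate result to secondary storage.

```python
from collections import Counter, OrderedDict
from typing import Any, Hashable, Optional


class MappingCache:
    """Fixed-capacity cache with a FIFO, LRU or LFU replacement policy."""

    def __init__(self, capacity: int, policy: str = "LRU") -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        if policy not in ("FIFO", "LRU", "LFU"):
            raise ValueError("policy must be 'FIFO', 'LRU' or 'LFU'")
        self.capacity = capacity
        self.policy = policy
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()  # insertion or recency order
        self._refs: Counter = Counter()  # reference counts, used by LFU

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        self._refs[key] += 1
        if self.policy == "LRU":
            self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> Optional[Hashable]:
        """Insert or update an entry; return the evicted key, if any."""
        if key in self._data:
            self._data[key] = value
            self._refs[key] += 1
            if self.policy == "LRU":
                self._data.move_to_end(key)
            return None
        evicted = None
        if len(self._data) >= self.capacity:
            evicted = self._choose_victim()
            del self._data[evicted]
            del self._refs[evicted]
        self._data[key] = value
        self._refs[key] = 1
        return evicted

    def _choose_victim(self) -> Hashable:
        if self.policy == "LFU":
            # Least frequently referenced; ties go to the earliest inserted.
            return min(self._data, key=lambda k: self._refs[k])
        # FIFO: the front is the oldest insertion.
        # LRU: the front is the least recently used, since hits move keys to the end.
        return next(iter(self._data))
```

For example, with capacity 2, inserting `a` and `b`, then reading `b` twice and `a` once, then inserting `c` evicts `b` under LRU (it was used least recently) but `a` under LFU (it was referenced less often).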
Candidate generation
Experiment
- Opening and closing prices of 33 HSI constituents over about 1400 trading days (since January 2000) were obtained for the experiments
- The prices are transformed into events as follows
[Figure: the relative positions of a stock's open and closing prices on a day determine whether that day's event for the stock is of Type A, Type B or Type C]
- Hence, 33 data points are generated each day, each of the form (0008B, date), e.g. stock 0008 with an event of Type B
Experiment - Effectiveness of pruning strategies
- Number of candidates actually enumerated
[Figure: two plots of the number of candidates enumerated against $\theta_s$]
Experiment - Effectiveness of pruning strategies (Contd)
- Case when $[u, v] = [?, 1]$, at high and low $\theta_s$
Experiment - Effectiveness of pruning strategies (Contd)
- Case when $[u, v] = [?, 2]$, at high and low $\theta_s$
Experiment - Cache replacement strategy and candidate generation
- LRU, $[u, v] = [?, 1]$, at high $\theta_s$ (0.6), with and without the pruning strategy
Experiment - Cache replacement strategy and candidate generation
- LFU, $[u, v] = [?, 1]$, at high $\theta_s$ (0.6), with and without the pruning strategy
Experiment - Cache replacement strategy and candidate generation
- LRU, $[u, v] = [?, 1]$, at low $\theta_s$ (0.3), with and without the pruning strategy
Conclusion
- Time-delayed association rules offer an alternative tool for the sequential analysis of event datasets
- Although the Apriori property does not hold in the time-delayed association model, good pruning strategies can still be derived to trim the search space for frequent associations
- Because of the large volume of intermediate data generated, a well-chosen cache replacement strategy is essential
- The way candidates are generated can play a crucial role in the performance of the algorithm
Reference
- [Mannila95] Mannila, H., Toivonen, H., and Verkamo, A.I. 1995. Discovering frequent episodes in sequences. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD 95), Montreal, Canada, pp. 210-215.
Thank you.