Title: Discovering Calendar-based Temporal Association Rules
1 Discovering Calendar-based Temporal Association Rules
TIME '01, 8th International Symposium on Temporal Representation and Reasoning
- SHOU Yu Tao
- May. 21st, 2003
2 Outline of the Presentation
- Background
- Temporal Association Rule Mining w.r.t. Precise Match
- Temporal Association Rule Mining w.r.t. Fuzzy Match
- Experiments
- Conclusions
- References
- Q & A
3 Background
- Temporal association rule: an association rule along with its temporal interval.
- E.g., turkey → pumpkin pie is a temporal association rule along with the temporal interval "within the week before Thanksgiving".
- Why are we interested in temporal association rule mining?
- We may discover different association rules for different time intervals. Some association rules may hold during some intervals but not during others.
- → This may lead to useful information.
4 Calendar Schema
- A relational schema R = (f_n : D_n, f_{n-1} : D_{n-1}, ..., f_1 : D_1) together with a constraint valid
- f_i: a calendar unit name such as year, month, etc.
- D_i: a finite subset of the positive integers.
- The constraint valid: a Boolean function on D_n × ... × D_1 specifying which combinations of the values in D_n × ... × D_1 are valid.
5 Calendar Schema
- E.g., a calendar schema (year : {1995, 1996, ..., 2002}, month : {1, 2, ..., 12}, day : {1, 2, ..., 31}) with the constraint valid that evaluates to True only if the combination gives a valid date.
- E.g., a combination that names a nonexistent date, such as February 31, is not valid.
- Simply stated, a calendar schema is determined by a hierarchy of calendar concepts.
- E.g., (year, month, day)
6 Calendar Pattern
- Defines a set of time intervals based on the calendar schema.
- E.g., <2000, *, 16> is a calendar pattern based on the calendar schema (year, month, day), corresponding to the time intervals consisting of all the 16th days of all months in year 2000.
- Time intervals or periodic cycles can be easily described by calendar patterns with appropriate calendar schemas.
- E.g., the periodic cycle "every seven days" can be expressed by a calendar pattern <*, d>, where 1 ≤ d ≤ 7, based on the calendar schema (week, day), depending on which day the cycle starts.
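The way a calendar pattern stands for a set of time intervals can be sketched in a few lines of Python. This is only an illustration, not code from the presentation: patterns are modeled as tuples, the string "*" plays the role of the wild-card, and the helper names `covers` and `is_basic` are assumptions made for this example. The sample pattern encodes "all the 16th days of all months in year 2000" on an assumed (year, month, day) schema.

```python
WILDCARD = "*"  # assumed encoding of the wild-card symbol

def is_basic(pattern):
    """A pattern with no wild-card denotes a single basic time interval."""
    return WILDCARD not in pattern

def covers(pattern, basic):
    """True if each field of `pattern` is a wild-card or equals the field of `basic`."""
    return len(pattern) == len(basic) and all(
        p == WILDCARD or p == b for p, b in zip(pattern, basic)
    )

# Assumed schema (year, month, day): the 16th day of every month in 2000.
sixteenths_2000 = (2000, WILDCARD, 16)

print(covers(sixteenths_2000, (2000, 3, 16)))  # True: same year and day
print(covers(sixteenths_2000, (2001, 3, 16)))  # False: wrong year
```

Representing the wild-card as an ordinary value keeps the cover test a field-by-field comparison, which is all the later update steps need.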
7 Problem Formulation
- Given a calendar schema R, a set T of timestamped transactions, and (optionally) a match ratio, we want to discover all interesting association rules w.r.t.
- Precise match
- Fuzzy match
- Assumption: we are not interested in association rules that hold only during basic time intervals. Such rules reveal little interesting information in terms of time.
- E.g., if the calendar schema is (year, month, day), we do not look for association rules that hold only on a single day.
- Basic time interval: a calendar pattern with no wild-card symbol.
8 Problem Formulation
- Temporal association rule w.r.t. precise match
- Given a calendar schema R and a set T of timestamped transactions, a temporal association rule (r, e) holds if and only if the association rule r holds in each basic time interval t covered by the star calendar pattern e.
- Star calendar pattern: a calendar pattern with at least one wild-card symbol.
- E.g., given the calendar schema (year, month, Thursday), we may have a temporal association rule (turkey → pumpkin pie, <*, 11, 4>) that holds w.r.t. precise match. The rule means that the association rule (turkey → pumpkin pie) holds on all Thanksgiving days, that is, on the 4th Thursday in November of every year.
9 Problem Formulation
- Temporal association rule w.r.t. fuzzy match
- Given a calendar schema R, a set T of timestamped transactions, and a match ratio m, a temporal association rule (r, e) holds if and only if the association rule r holds in at least 100m% of the basic time intervals t covered by the star calendar pattern e.
- E.g., given the calendar schema (year, month, Thursday) and match ratio m = 0.8, we may have a temporal association rule (turkey → pumpkin pie, <*, 11, 4>) that holds w.r.t. fuzzy match. This means that the association rule (turkey → pumpkin pie) holds on at least 80% of Thanksgiving days.
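The difference between the two match notions can be sketched as follows, assuming we already know, for each basic time interval covered by a star pattern, whether the association rule holds there. The function names and the sample data are hypothetical and serve only to illustrate the definitions.

```python
def holds_precise(rule_holds):
    """Precise match: the rule must hold in every covered basic time interval."""
    return all(rule_holds)

def holds_fuzzy(rule_holds, m):
    """Fuzzy match: the rule must hold in at least a fraction m of the intervals."""
    flags = list(rule_holds)
    # Small tolerance guards against floating-point error in m * len(flags).
    return sum(flags) >= m * len(flags) - 1e-9

# Hypothetical outcomes for five Thanksgiving days (True = rule held that day).
thanksgivings = [True, True, False, True, True]

print(holds_precise(thanksgivings))     # False: one day fails
print(holds_fuzzy(thanksgivings, 0.8))  # True: 4 of 5 days is exactly 80%
```

Precise match is the special case of fuzzy match with m = 1.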
10 Temporal Association Rule Mining
- Two sub-problems:
- Finding all large itemsets for all the star calendar patterns on the given calendar schema (based on Apriori [AS94]): the crux of the discovery of temporal association rules.
- Generating temporal association rules from the large itemsets and their calendar patterns: the same as the traditional association rule generation approach [AS94].
11 Outline of the Algorithm (for both precise and fuzzy match)
[Algorithm listing not recovered; the slide's annotations follow.]
- Critical step: with fewer candidate large itemsets, less time is needed for Phase II.
- The same as the traditional approach.
- The same here.
12 Phase III for Precise Match
- After the basic time interval e0 is processed in pass k, the large k-itemsets for all the calendar patterns e that cover e0 are updated as follows:
- If Lk(e) is updated for the first time (i.e., Lk(e) = NULL), let Lk(e) = Lk(e0)
- Else Lk(e) = Lk(e) ∩ Lk(e0)
- E.g.
- Given calendar patterns (1995, *, 1) and (*, 2, *) with L2(1995, *, 1) = {AB, DE} and L2(*, 2, *) = {AB, BC, DE}, suppose that after processing the basic time interval (1995, 2, 1) we get L2(1995, 2, 1) = {AC, BC, DE}
- → L2(1995, *, 1) = {DE}
- → L2(*, 2, *) = {BC, DE}
- So after all the basic time intervals are processed, the set of large k-itemsets for each calendar pattern has been discovered.
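The precise-match update rule amounts to a running set intersection per star pattern. The sketch below replays the example with Python sets; the function name `update_precise` and the dictionary layout are assumptions for illustration, and itemsets are written as short strings such as "AB".

```python
def update_precise(large_by_pattern, pattern, large_e0):
    """Update Lk(pattern) with Lk(e0): copy on first update, intersect afterwards."""
    current = large_by_pattern.get(pattern)
    if current is None:
        # First update: Lk(e) = Lk(e0)
        large_by_pattern[pattern] = set(large_e0)
    else:
        # Later updates: Lk(e) = Lk(e) ∩ Lk(e0)
        large_by_pattern[pattern] = current & set(large_e0)

# State before processing the basic time interval (1995, 2, 1).
L2 = {
    (1995, "*", 1): {"AB", "DE"},
    ("*", 2, "*"): {"AB", "BC", "DE"},
}
L2_e0 = {"AC", "BC", "DE"}  # large 2-itemsets found in (1995, 2, 1)

for e in list(L2):
    update_precise(L2, e, L2_e0)

print(L2[(1995, "*", 1)])  # {'DE'}
print(L2[("*", 2, "*")])
```

Because an itemset survives only if it is large in every basic interval seen so far, each Lk(e) can only shrink as more intervals are processed.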
13 Phase III for Fuzzy Match
- Associate a counter c_update with each candidate itemset for each star calendar pattern.
- Counters are initially set to 1.
- When Lk(e0) is used to update Lk(e) in Phase III, the counters of the itemsets in Lk(e) that are also in Lk(e0) are incremented by 1.
- Suppose there are N basic time intervals in total covered by e and this is the n-th update to Lk(e); an itemset cannot be large for e if its counter c_update does not satisfy c_update + (N - n) ≥ mN.
14 Phase III for Fuzzy Match
- Example
- Calendar schema R = (week, day)
- Fuzzy match ratio m = 0.8
- Consider a star calendar pattern e, and suppose it covers only 5 basic time intervals (N = 5).
- This is the 3rd time that L2(e) is updated (n = 3).
- So we only keep the itemsets with c_update ≥ mN - (N - n) = 2.
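The counter-based pruning for fuzzy match can be sketched as below. This is a minimal illustration under stated assumptions rather than the presenters' implementation: the names `min_counter` and `update_fuzzy` are hypothetical, an itemset that appears for the first time is assumed to enter with counter 1, and the counter values in the usage example are made up. Only N = 5, n = 3, and m = 0.8 come from the example.

```python
def min_counter(N, n, m):
    """Smallest c_update that can still reach m*N after the n-th of N updates."""
    return m * N - (N - n)

def update_fuzzy(counters, large_e0, N, n, m):
    """Apply the n-th update to the counters kept for one star pattern e.

    counters: dict mapping itemset -> c_update
    large_e0: large itemsets of the basic time interval just processed
    """
    for itemset in large_e0:
        # Assumption: an itemset seen for the first time enters with counter 1.
        counters[itemset] = counters.get(itemset, 0) + 1
    threshold = min_counter(N, n, m) - 1e-9  # tolerance for float rounding
    # Keep only itemsets that can still reach the required m*N intervals.
    return {s: c for s, c in counters.items() if c >= threshold}

# N = 5, n = 3, m = 0.8 as in the example; the counters are hypothetical.
before = {"AB": 2, "AC": 1, "BD": 1}
after = update_fuzzy(before, {"AB", "BD"}, N=5, n=3, m=0.8)
print(after)
```

Here the threshold is 0.8 · 5 − (5 − 3) = 2, so "AC", whose counter stays at 1, is dropped: even if it were large in both remaining intervals it could reach only 3 of the required 4.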
15 Candidate Generation (Phase I)
- Direct-Apriori: a naïve approach to generating candidate itemsets is to treat each basic time interval individually and directly apply Apriori's candidate generation approach.
- Applies to both precise match and fuzzy match.
16 Candidate Generation for Precise Match: Temporal AprioriGen
- Since we are not interested in the large itemsets for basic time intervals, if a candidate in Ck(e0) cannot be large for any of the star calendar patterns that cover the basic time interval e0, we simply ignore it.
- So, we can generate the candidates Ck(e0) (k > 1) as follows: for each 1-star pattern e that covers e0, compute aprioriGen(Lk-1(e)), and take the union of these sets as Ck(e0).
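17 Candidate Generation for Precise Match: Temporal AprioriGen
- Example
- Consider the calendar schema R = (week : {1, ..., 5}, day : {1, ..., 7}). For a basic time interval e0 and the two 1-star patterns e1 and e2 that cover it, suppose we already have L2(e0) = {AB, AC, AD, AE, BC, BD, CD, CE}, L2(e1) = {AB, AC, AD, BC, BD, CE}, and L2(e2) = {AB, AC, AD, BD, CD}.
- By using temporal aprioriGen: C3(e1) = {ABC, ABD}, C3(e2) = {ABD, ACD}
- C3(e0) = C3(e1) ∪ C3(e2) = {ABC, ABD, ACD}
- By using Direct-Apriori:
- C3(e0) = {ABC, ABD, ACD, ACE, BCD}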
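The two candidate sets in this example can be reproduced with a small sketch of Apriori's candidate generation. The code is an illustration under assumptions: `apriori_gen` keeps exactly the k-itemsets whose (k-1)-subsets are all large, which yields the same candidates as the join-and-prune step of [AS94]; the names `e0`, `e1`, and `e2` are placeholders for the basic time interval and the two 1-star patterns that cover it.

```python
from itertools import combinations

def apriori_gen(large_prev):
    """Return the k-itemsets all of whose (k-1)-subsets are in `large_prev`."""
    large_prev = {frozenset(s) for s in large_prev}
    if not large_prev:
        return set()
    k = len(next(iter(large_prev))) + 1
    # Join step: unite pairs of (k-1)-itemsets that differ in one item.
    joined = {a | b for a in large_prev for b in large_prev if len(a | b) == k}
    # Prune step: drop candidates with a (k-1)-subset that is not large.
    return {
        c for c in joined
        if all(frozenset(sub) in large_prev for sub in combinations(c, k - 1))
    }

def fmt(itemsets):
    """Render itemsets as sorted strings such as 'ABD' for easy comparison."""
    return sorted("".join(sorted(s)) for s in itemsets)

L2_e0 = ["AB", "AC", "AD", "AE", "BC", "BD", "CD", "CE"]  # basic interval e0
L2_e1 = ["AB", "AC", "AD", "BC", "BD", "CE"]              # 1-star pattern e1
L2_e2 = ["AB", "AC", "AD", "BD", "CD"]                    # 1-star pattern e2

direct = apriori_gen(L2_e0)                          # Direct-Apriori
temporal = apriori_gen(L2_e1) | apriori_gen(L2_e2)   # temporal aprioriGen

print(fmt(direct))    # ['ABC', 'ABD', 'ACD', 'ACE', 'BCD']
print(fmt(temporal))  # ['ABC', 'ABD', 'ACD']
```

Temporal aprioriGen drops ACE and BCD because neither can be large for a star pattern that covers e0, so Phase II has fewer candidates to count.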
18 Candidate Generation for Precise Match: Horizontal Pruning
- If an itemset l in Ck(e0) does not appear in any of the tentative Lk(e1), where e1 is a 1-star pattern that covers e0, then l cannot be large for any star pattern e that covers e0. Therefore, we drop l from Ck(e0).
19 Candidate Generation for Precise Match: Horizontal Pruning
- Example
- Suppose that when the basic time interval e0 is being processed, we already have, for the 1-star patterns e1 and e2 that cover e0,
- L3(e1) = {ABD}
- L3(e2) = {ABD, ACD}
- We get C3(e0) = {ABC, ABD, ACD} after using temporal aprioriGen; we can further prune it by
- C3(e0) = C3(e0) ∩ (L3(e1) ∪ L3(e2))
- = {ABD, ACD}
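Horizontal pruning for precise match is a single set operation, sketched below with the numbers of this example. The function name `horizontal_prune` is hypothetical, and itemsets are written as plain strings.

```python
def horizontal_prune(candidates_e0, tentative_large_sets):
    """Keep candidates of e0 that appear in at least one tentative Lk(e1)."""
    keep = set().union(*tentative_large_sets)
    return set(candidates_e0) & keep

C3_e0 = {"ABC", "ABD", "ACD"}          # from temporal aprioriGen
L3_1star = [{"ABD"}, {"ABD", "ACD"}]   # tentative L3 of the covering 1-star patterns

print(sorted(horizontal_prune(C3_e0, L3_1star)))  # ['ABD', 'ACD']
```

ABC is dropped because it is already missing from every tentative L3 of a covering 1-star pattern, and under precise match those sets can only shrink.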
20 Candidate Generation for Fuzzy Match: Temporal AprioriGen
Temporal AprioriGen for precise match cannot be applied directly to the fuzzy match problem, because an itemset may be large for a star calendar pattern e even if it is not large for any 1-star pattern covered by e.
For example, consider a schema R = (week, day) and fuzzy match ratio m = 0.8.
[Slide figure not recovered: it marks the calendar patterns for which an itemset is large and the one for which it is not.]
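21 Candidate Generation for Fuzzy Match: Temporal AprioriGen
- Change temporal aprioriGen to apply to fuzzy match as follows: for each 1-star pattern e that covers e0, compute LT = Lk-1(e) ∩ Lk-1(e0) and aprioriGen(LT), and take the union of these sets as Ck(e0).
- Change the part underlined in blue on the slide to Lk-1(e) when memory is the critical resource.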
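22 Candidate Generation for Fuzzy Match: Temporal AprioriGen
- Example
- Suppose that, for a basic time interval e0 and the patterns e1, e2, and e3 that cover it, we already have
- L2(e0) = {AB, AC, AD, AE, BD, CD, CE}
- L2(e1) = {AB, AC, AD, BC, BD, CE}
- L2(e2) = {AB, AC, AD, BD, CD}
- L2(e3) = {AB, AD, BD, CE, AC, AE}
- LT = L2(e1) ∩ L2(e0) = {AB, AC, AD, BD, CE}
- C3(e1) = aprioriGen(LT) = {ABD}
- Similarly, we can get C3(e2) = {ABD, ACD} and C3(e3) = {ABD, ACE}
- → C3(e0) = C3(e1) ∪ C3(e2) ∪ C3(e3) = {ABD, ACD, ACE}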
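The fuzzy-match variant intersects each pattern's large itemsets with those of the basic time interval before generating candidates. The sketch below replays the first two patterns of this example; it is illustrative only. `apriori_gen` is a compact stand-in for Apriori's candidate generation, `fuzzy_candidates` is a hypothetical name, and `e0`, `e1`, `e2` are placeholder labels for the basic time interval and two of the patterns that cover it.

```python
from itertools import combinations

def apriori_gen(large_prev):
    """Return the k-itemsets all of whose (k-1)-subsets are in `large_prev`."""
    large_prev = {frozenset(s) for s in large_prev}
    if not large_prev:
        return set()
    k = len(next(iter(large_prev))) + 1
    joined = {a | b for a in large_prev for b in large_prev if len(a | b) == k}
    return {
        c for c in joined
        if all(frozenset(sub) in large_prev for sub in combinations(c, k - 1))
    }

def fuzzy_candidates(large_e, large_e0):
    """aprioriGen applied to LT = L(e) ∩ L(e0)."""
    lt = set(large_e) & set(large_e0)
    return apriori_gen(lt)

def fmt(itemsets):
    """Render itemsets as sorted strings such as 'ABD'."""
    return sorted("".join(sorted(s)) for s in itemsets)

L2_e0 = {"AB", "AC", "AD", "AE", "BD", "CD", "CE"}
L2_e1 = {"AB", "AC", "AD", "BC", "BD", "CE"}
L2_e2 = {"AB", "AC", "AD", "BD", "CD"}

print(fmt(fuzzy_candidates(L2_e1, L2_e0)))  # ['ABD']
print(fmt(fuzzy_candidates(L2_e2, L2_e0)))  # ['ABD', 'ACD']
```

The intersection matters because, under fuzzy match, an itemset can be large for a pattern without being large in the interval currently being processed.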
23 Candidate Generation for Fuzzy Match: Horizontal Pruning
- The pruning idea is to discard the candidate itemsets that cannot be large for calendar pattern e even if they are large for the basic time interval e0.
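24 Candidate Generation for Fuzzy Match: Horizontal Pruning
- Example
- Suppose that, for a basic time interval e0 and the patterns e1, e2, and e3 that cover it, we already have
- C3(e0) = {ABD, ACD, ACE}
- L3(e1), L3(e2), and L3(e3) have been updated once.
- C3(e1) = {ABD, ABE}, C3(e2) = {ABD, ACD}, C3(e3) = {ABD}
- Then C3(e0) can be pruned as
- C3(e0) = C3(e0) ∩ (C3(e1) ∪ C3(e2) ∪ C3(e3))
- = {ABD, ACD}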
25 Experiments
- Real data set
- The data file consists of homepage request records, each of which contains attribute values describing the request and the person who sent it.
- The records are from Jan 30 to Mar 31, 2000.
- Calendar schema used: R = (week, day, timeofday), where timeofday has three values covering 0am-8am, daytime (8am-4pm), and evening (4pm-12am).
- The data set contains 777,480 transactions, with 23.4 items per transaction on average.
26 Experiments
27 Experiments
- Synthetic data set
- Extends the data generator proposed in [AS94] to incorporate temporal features.
28 Experiments
- Synthetic data set: results
29 Conclusions
- Developed a new representation mechanism for temporal association rules on the basis of calendars, and identified two classes of interesting temporal association rules: w.r.t. precise match and w.r.t. fuzzy match.
- The representation requires less prior knowledge, and the resulting time intervals are easier to understand.
- Extended the Apriori algorithm and developed two optimization techniques to discover both classes of temporal association rules.
- Experiments show that the optimization techniques are effective.
30 Possible Future Work
- The approach requires that, for a calendar schema (f_n, f_{n-1}, ..., f_1), each calendar unit of f_i is uniquely contained in a unit of f_{i+1}, where 0 < i < n.
- E.g., (year, month, week) is NOT allowed, because a week may not be contained in a unique month.
- Consider temporal patterns in other data mining problems, such as clustering.
31 References
- Y. Li, P. Ning, X. S. Wang, and S. Jajodia. Discovering calendar-based temporal association rules. In the Eighth International Symposium on Temporal Representation and Reasoning (TIME '01).
- [AS94] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. VLDB '94.
- S. Ramaswamy, S. Mahajan, and A. Silberschatz. On the discovery of interesting patterns in association rules. VLDB '98.
32 Questions and Answers
?