Title: Temporal Data Mining
1 Temporal Data Mining
- Claudio Bettini, X. Sean Wang, and Sushil Jajodia
- Presented by Zhuang Liu
2 Outline
- What is Data Mining?
- Formal Problem Definition
- TAG (Timed Automaton with Granularity)
- A Naive Solution
- Techniques for Improving Performance
- Experimental Results
3 What is Data Mining
- Data Mining
- The non-trivial extraction of implicit, previously unknown, and potentially useful information from data
- Common Data Mining Techniques
- Association-rule mining
- Sequential mining (Temporal mining)
- Clustering
- Classification
- Outlier detection
4 Temporal Data Mining
- Finding time-related frequent patterns (frequent sub-sequences)
- For example, which pairs of events frequently occur one week after another?
- A simple example: a user may be interested in finding all events that frequently follow within 2 business days of a rise of the IBM stock price.
5 Definition
- Event Type (E)
- e.g. deposit to an account
- e.g. price increase of a specific stock
- Event e
- An event e is a pair e = (E, t), where E is an event type and t is a positive integer, called the timestamp of e.
- Event Sequence
- An event sequence is a finite set of events.
- Each event (E, t) appearing in an event sequence represents the occurrence of event type E at time t.
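The definitions above can be sketched as a small data structure. This is only an illustration: the names `Event`, `make_sequence`, and `occurrences` are hypothetical and do not come from the slides.

```python
from typing import FrozenSet, List, NamedTuple


class Event(NamedTuple):
    """An event e = (E, t): an event type plus a positive integer timestamp."""
    event_type: str
    timestamp: int


def make_sequence(*events: Event) -> FrozenSet[Event]:
    """Build an event sequence (a finite set of events), validating timestamps."""
    for e in events:
        if e.timestamp <= 0:
            raise ValueError(f"timestamp must be a positive integer: {e}")
    return frozenset(events)


def occurrences(sequence: FrozenSet[Event], event_type: str) -> List[int]:
    """Return the sorted timestamps at which `event_type` occurs."""
    return sorted(e.timestamp for e in sequence if e.event_type == event_type)


# Hypothetical sample data, for illustration only.
sequence = make_sequence(
    Event("IBM-rise", 3),
    Event("HP-rise", 5),
    Event("IBM-rise", 9),
)
```

A frozen set mirrors the "finite set of events" wording; a time-sorted list would serve equally well when the order of traversal matters.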
6 Granularity
- A granularity is a mapping $\mu$ from the set of positive integers to subsets of the time domain such that, for all positive integers $i$ and $j$ with $i < j$:
- (1) $\mu(i) \neq \emptyset$ and $\mu(j) \neq \emptyset$ imply that each number in $\mu(i)$ is less than all the numbers in $\mu(j)$, and
- (2) $\mu(i) = \emptyset$ implies $\mu(j) = \emptyset$.
- Examples: year, month, week, day, business-day, business-week, etc.
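7 TCG
- A temporal constraint with granularity (TCG) $[m, n]\,\mu$ is a binary relation on positive integers. For positive integers $t_1$ and $t_2$, $(t_1, t_2)$ satisfies $[m, n]\,\mu$ iff
- (1) $t_1 \le t_2$,
- (2) $\lceil t_1 \rceil^{\mu}$ and $\lceil t_2 \rceil^{\mu}$ are both defined, and
- (3) $m \le \lceil t_2 \rceil^{\mu} - \lceil t_1 \rceil^{\mu} \le n$.
- Here $\lceil t \rceil^{\mu}$ denotes the index of the granule of $\mu$ that contains $t$; it is undefined if no granule contains $t$.
- Example TCGs: [0,0] day, [0,2] hour, [1,1] month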
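A minimal sketch of how a TCG check could be coded. It assumes, purely for illustration, that timestamps count hours from a Monday at 00:00 and that a granularity is represented as a function returning the granule index of a timestamp (or `None` where it is undefined). The function names are hypothetical.

```python
from typing import Callable, Optional

# A granularity maps a timestamp to the index of its granule, or None.
Granularity = Callable[[int], Optional[int]]

HOURS_PER_DAY = 24


def hour(t: int) -> Optional[int]:
    return t  # assumption: one timestamp unit is one hour


def day(t: int) -> Optional[int]:
    return t // HOURS_PER_DAY


def business_day(t: int) -> Optional[int]:
    """Index of the business day containing t; undefined on weekends."""
    week_index, weekday = divmod(t // HOURS_PER_DAY, 7)  # weekday 0 = Monday (assumed)
    if weekday >= 5:
        return None
    return 5 * week_index + weekday


def satisfies(t1: int, t2: int, m: int, n: int, granule_of: Granularity) -> bool:
    """Check whether (t1, t2) satisfies the TCG [m, n] under the given granularity."""
    if t1 > t2:                                  # condition (1)
        return False
    g1, g2 = granule_of(t1), granule_of(t2)
    if g1 is None or g2 is None:                 # condition (2)
        return False
    return m <= g2 - g1 <= n                     # condition (3)


friday_10am = 4 * HOURS_PER_DAY + 10
saturday_1am = 5 * HOURS_PER_DAY + 1
monday_9am = 7 * HOURS_PER_DAY + 9
```

With these assumptions, `satisfies(23, 25, 0, 0, day)` is false because the two timestamps fall on different days, while the same pair satisfies [0,2] hour. Friday to the following Monday satisfies [1,1] business-day because the weekend contributes no granules.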
8 Event Structure
- An event structure (with granularities) is a rooted directed acyclic graph $S = (W, A, \Gamma)$, where $W$ is a finite set of event variables, $A \subseteq W \times W$, and $\Gamma$ is a mapping from $A$ to the finite sets of TCGs.
- Complex event type derived from S
- Each variable is associated with a specific event type.
- Complex event matching S
- Each variable is associated with a distinct event such that the event timestamps satisfy the time constraints.
9 Example of Event Structure
- Assigning the event types IBM-rise, IBM-earnings-report, HP-rise, and IBM-fall to x0, x1, x2, and x3, respectively, gives a complex event type. This complex event type states that the IBM earnings were reported one business day after the IBM stock rose, and in the same or the next week the IBM stock fell, while the HP stock rose within 5 business days after the same rise of the IBM stock and within 8 hours before the same fall of the IBM stock.
Figure 1: An event structure. Edge constraints as described above: x0 to x1 [1,1] b-day; x1 to x3 [0,1] week; x0 to x2 [0,5] b-day; x2 to x3 [0,8] hours.
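The example structure can be written down as a mapping from arcs to TCGs, with a check that a proposed complex event matches it. This is a sketch under stated assumptions: timestamps count hours from a Monday at 00:00, the arc-to-constraint assignment follows the prose description of Figure 1, and all function and variable names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Granularity = Callable[[int], Optional[int]]
TCG = Tuple[int, int, Granularity]            # (m, n, granularity)
Structure = Dict[Tuple[str, str], List[TCG]]  # arc -> set of TCGs on it


def hour(t: int) -> Optional[int]:
    return t


def week(t: int) -> Optional[int]:
    return t // (7 * 24)


def b_day(t: int) -> Optional[int]:
    week_index, weekday = divmod(t // 24, 7)  # weekday 0 = Monday (assumed)
    return None if weekday >= 5 else 5 * week_index + weekday


def satisfies(t1: int, t2: int, m: int, n: int, granule_of: Granularity) -> bool:
    g1, g2 = granule_of(t1), granule_of(t2)
    return t1 <= t2 and g1 is not None and g2 is not None and m <= g2 - g1 <= n


STRUCTURE: Structure = {
    ("x0", "x1"): [(1, 1, b_day)],
    ("x1", "x3"): [(0, 1, week)],
    ("x0", "x2"): [(0, 5, b_day)],
    ("x2", "x3"): [(0, 8, hour)],
}
COMPLEX_TYPE = {"x0": "IBM-rise", "x1": "IBM-earnings-report",
                "x2": "HP-rise", "x3": "IBM-fall"}


def matches(structure: Structure, complex_type: Dict[str, str],
            assignment: Dict[str, Tuple[str, int]]) -> bool:
    """True if the events assigned to the variables form a matching complex event."""
    events = list(assignment.values())
    if len(set(events)) != len(events):       # events must be distinct
        return False
    if any(assignment[v][0] != etype for v, etype in complex_type.items()):
        return False
    return all(
        satisfies(assignment[a][1], assignment[b][1], m, n, g)
        for (a, b), tcgs in structure.items()
        for (m, n, g) in tcgs
    )


# Monday 10:00, Tuesday 09:00, Thursday 10:00, Thursday 15:00 (hypothetical data).
good = {"x0": ("IBM-rise", 10), "x1": ("IBM-earnings-report", 33),
        "x2": ("HP-rise", 82), "x3": ("IBM-fall", 87)}
# Same events, but the fall happens on Friday 15:00, 29 hours after the HP rise.
bad = dict(good, x3=("IBM-fall", 111))
```

Here `matches(STRUCTURE, COMPLEX_TYPE, good)` holds, while `bad` fails the [0,8] hours constraint between x2 and x3.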
10 Formal Problem Definition
- An event-mining problem is a quadruple $(S, \gamma, E_0, \rho)$, where $S$ is an event structure, $\gamma$ is the minimum confidence value, $E_0$ is an event type (the reference type), and $\rho$ is a partial mapping that assigns a set of event types to some of the variables (except the root).
- Solving an event-mining problem means finding all complex event types such that each occurs frequently in the input sequence and is derived from S by assigning $E_0$ to the root and a specific event type to each of the other variables.
- Example: $(S, 0.8, \text{IBM-rise}, \rho)$
11 TAG
- Timed Automaton with Granularities
- A basic component used to test whether a candidate complex event type occurs frequently in an event sequence.
- A timed automaton with granularities is a 6-tuple $(\Sigma, S, S_0, C, T, F)$, where
- (1) $\Sigma$ is a finite set of input letters,
- (2) $S$ is a finite set of states,
- (3) $S_0 \subseteq S$ is a set of start states,
- (4) $C$ is a finite set of clocks,
- (5) $T \subseteq S \times S \times \Sigma \times 2^{C} \times \Phi(C)$ is a set of transitions,
- (6) $F \subseteq S$ is a set of accepting states.
12 TAG
- $\Phi(C)$ is the set of all formulas called clock constraints.
- A transition $\langle s, s', e, \lambda, \delta \rangle$ represents a transition from state $s$ to state $s'$ on input symbol $e$. The set $\lambda \subseteq C$ gives the clocks to be reset with this transition, and $\delta$ is a clock constraint over $C$.
- A TAG is essentially a standard finite automaton with some modifications.
- Each TAG maintains a set of clocks.
- Both the input symbol and the clock values determine the next state.
- A run is an accepting run if its last state is in the set F. An event sequence is accepted by a TAG if there exists an accepting run.
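The following is a simplified sketch of how a TAG run could be simulated, not the construction used by the authors. Assumptions made here for illustration: each clock is tied to one granularity and its value is the number of granules elapsed since its last reset; a transition whose `symbol` is `None` fires on any input (a convenience for skipping irrelevant events); timestamps count hours. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple

Granularity = Callable[[int], Optional[int]]
ClockValues = Dict[str, Optional[int]]


@dataclass(frozen=True)
class Transition:
    source: str
    target: str
    symbol: Optional[str]                  # None = any input symbol (sketch convenience)
    resets: FrozenSet[str]                 # clocks reset by this transition
    guard: Callable[[ClockValues], bool]   # clock constraint


def clock_values(clocks: Dict[str, Granularity], reset_at: Dict[str, int],
                 now: int) -> ClockValues:
    """Granules elapsed since each clock's last reset (None if undefined)."""
    values: ClockValues = {}
    for name, granule_of in clocks.items():
        g_now, g_reset = granule_of(now), granule_of(reset_at[name])
        values[name] = None if g_now is None or g_reset is None else g_now - g_reset
    return values


def accepts(transitions: List[Transition], start_states: Set[str], accepting: Set[str],
            clocks: Dict[str, Granularity], events: List[Tuple[str, int]]) -> bool:
    """True if some run over `events` ends in an accepting state."""
    if not events:
        return any(s in accepting for s in start_states)
    first_time = events[0][1]
    configs = [(s, {c: first_time for c in clocks}) for s in start_states]
    for symbol, now in events:
        next_configs = []
        for state, reset_at in configs:
            values = clock_values(clocks, reset_at, now)
            for tr in transitions:
                if tr.source != state or tr.symbol not in (None, symbol):
                    continue
                if tr.guard(values):
                    updated = dict(reset_at)
                    updated.update({c: now for c in tr.resets})
                    next_configs.append((tr.target, updated))
        configs = next_configs
    return any(state in accepting for state, _ in configs)


def day(t: int) -> Optional[int]:
    return t // 24


always = lambda values: True
within_two_days = lambda values: values["x"] is not None and 0 <= values["x"] <= 2

# Hypothetical automaton: an IBM-rise followed by an HP-rise within [0,2] day.
TRANSITIONS = [
    Transition("s0", "s1", "IBM-rise", frozenset({"x"}), always),
    Transition("s1", "s2", "HP-rise", frozenset(), within_two_days),
    Transition("s1", "s1", None, frozenset(), always),
    Transition("s2", "s2", None, frozenset(), always),
]
CLOCKS = {"x": day}
```

The simulation keeps every reachable configuration, so it accepts as soon as one accepting run exists, which matches the acceptance condition stated on the slide.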
13 A Naïve Solution
- Consider all the event types that occur in the given event sequence, and consider all the complex types derived from the given event structure, one from each assignment of these event types to the variables. Each of these complex types is called a candidate complex type for the event-mining problem.
- For each candidate complex type, start the corresponding TAG at every occurrence of E0. That is, for each occurrence of E0 in the event sequence, use the rest of the event sequence as the input to one copy of the TAG. Comparing the number of TAGs that reach a final state with the number of occurrences of E0 yields all the solutions of the event-mining problem.
- The number of candidate types is exponential in the number of variables in the event structure: with n event types in the sequence and k non-root variables, there are up to n^k candidates. This is too costly.
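The naive procedure can be sketched as follows. The matcher passed in as `matches_from` stands in for running one copy of the TAG on the rest of the sequence; it, the toy matcher `within_two_units`, and the sample data are assumptions made for this illustration. A candidate is treated as frequent when its confidence is strictly greater than the threshold, which is also an assumption.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Iterator, List, Set, Tuple

Event = Tuple[str, int]            # (event type, timestamp)
Candidate = Dict[str, str]         # variable -> event type


def candidate_types(variables: List[str], root: str, root_type: str,
                    event_types: Set[str]) -> Iterator[Candidate]:
    """Enumerate every assignment of event types to the non-root variables."""
    others = [v for v in variables if v != root]
    for combo in product(sorted(event_types), repeat=len(others)):
        yield {root: root_type, **dict(zip(others, combo))}


def naive_mining(events: List[Event], candidates: Iterable[Candidate],
                 root_type: str, min_conf: float,
                 matches_from: Callable[[Candidate, Event, List[Event]], bool]
                 ) -> List[Candidate]:
    """Keep the candidates that match at more than min_conf of the root occurrences."""
    events = sorted(events, key=lambda e: e[1])
    roots = [i for i, (etype, _) in enumerate(events) if etype == root_type]
    frequent = []
    for cand in candidates:
        hits = sum(1 for i in roots if matches_from(cand, events[i], events[i + 1:]))
        if roots and hits / len(roots) > min_conf:
            frequent.append(cand)
    return frequent


def within_two_units(cand: Candidate, root_event: Event, rest: List[Event]) -> bool:
    """Toy matcher: x1's type occurs at most 2 time units after the root."""
    return any(etype == cand["x1"] and 0 <= t - root_event[1] <= 2 for etype, t in rest)


EVENTS = [("A", 1), ("B", 2), ("A", 5), ("B", 6), ("C", 7), ("A", 10)]
```

With three event types and three non-root variables, `candidate_types` already yields 27 candidates, which illustrates the n^k growth that makes the naive approach costly.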
14 Techniques to improve performance
- The performance of this algorithm can be improved by
- (1) identifying possible inconsistencies in the given event structure before starting the process,
- (2) reducing the length of the sequence,
- (3) reducing the number of times an automaton has to be started,
- (4) reducing the number of different automata to be started,
- (5) applying the naïve algorithm to the reduced problem.
15 Recognition of Inconsistent Event Structures
- An event structure is consistent if there exists a complex event that matches it.
- An inconsistent event structure should be discarded before the mining process starts.
- Determining the consistency of an event structure is computationally hard in general.
- Approximate polynomial-time algorithms are therefore used to check the consistency of event structures.
16 Recognition of Inconsistent Event Structures
- If one of the constraints implied by the given ones is the empty one, i.e. unsatisfiable, the whole event structure is inconsistent.
- A TCG $[m', n']\,\nu$ is logically implied by a TCG $[m, n]\,\mu$ if each pair (x, y) satisfying the second constraint also satisfies the first one.
- For example, a TCG [1,2] b-week can be converted into [3,18] day or [0,1] month, while it cannot be converted into [2,3] week-end or [1,3] week, since the resulting constraints are not implied by [1,2] b-week.
17 Reduction of the Event Sequence
- We can reduce the event sequence by exploiting the granularities.
- For example, if a discovery problem is defined on the sub-structure excluding variable x3, the input event sequence can be reduced by discarding any event that does not occur on a business day.
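A minimal sketch of this filtering step, assuming every constraint of the (sub-)structure uses a single granularity and that timestamps count hours from a Monday at 00:00. The names `reduce_sequence` and `b_day` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Event = Tuple[str, int]
Granularity = Callable[[int], Optional[int]]


def b_day(t: int) -> Optional[int]:
    """Business-day index of timestamp t; undefined (None) on weekends."""
    week_index, weekday = divmod(t // 24, 7)  # weekday 0 = Monday (assumed)
    return None if weekday >= 5 else 5 * week_index + weekday


def reduce_sequence(events: List[Event], granule_of: Granularity) -> List[Event]:
    """Drop events whose timestamp lies outside every granule of the granularity."""
    return [(etype, t) for etype, t in events if granule_of(t) is not None]


# Monday 10:00, Saturday 03:00, Sunday 12:00, Tuesday 09:00 (hypothetical data).
EVENTS = [("IBM-rise", 10), ("HP-rise", 123), ("IBM-fall", 156), ("HP-rise", 177 - 144)]
```

An event that falls in no granule can never satisfy a TCG in that granularity, so removing it cannot change the set of matching complex events under the stated assumption.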
18 Reduction of the occurrences of the root
- The basic idea is to remove those occurrences of the reference type that cannot be the root of a complex event matching the given structure.
- For some occurrences of the reference type in the sequence, a constraint may be unsatisfiable.
- Consider all the non-empty sets of explicit and implicit constraints between the root and each non-root node, and check whether one of the constraints cannot be satisfied.
- For example, if no event occurs in the sequence on the business day after an IBM-rise event, that particular reference event can be discarded. (No automaton is started for it.)
19 Reduction of the occurrences of the root
- Let $N$ be the number of occurrences of the reference event type in the sequence.
- Let $N'$ be the number of occurrences of reference events for which one of the constraints is unsatisfiable. These reference events are certainly not the root of a complex event matching the given event structure.
- If $N'/N > 1 - \gamma$, where $\gamma$ is the minimum confidence value, there cannot be any frequent complex event type, and the empty set should be returned to the user.
- Otherwise, remove these occurrences of the reference type and change $\gamma$ into $\gamma' = (\gamma \cdot N) / (N - N')$.
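The threshold adjustment can be checked with a short calculation. The sketch below assumes the early-exit test is a strict inequality and uses a hypothetical function name; the numbers in the usage note are made up for illustration.

```python
from typing import Optional


def adjusted_confidence(n_total: int, n_discarded: int, min_conf: float) -> Optional[float]:
    """Return the confidence threshold to use after discarding root occurrences.

    n_total     -- occurrences of the reference type in the sequence
    n_discarded -- occurrences that cannot be the root of any match
    Returns None when no complex event type can be frequent any more.
    """
    if n_total <= 0 or not 0 <= n_discarded <= n_total:
        raise ValueError("need 0 <= n_discarded <= n_total and n_total > 0")
    # Equivalent to n_discarded / n_total > 1 - min_conf, without the division.
    if n_discarded > (1 - min_conf) * n_total:
        return None
    return (min_conf * n_total) / (n_total - n_discarded)
```

For example, with 100 occurrences, 20 of them discarded, and a minimum confidence of 0.7, the new threshold is 70 / 80 = 0.875: the same 70 matches are still required, but they are now counted against 80 remaining roots. With 40 discarded, at most 60 matches remain possible, so nothing can reach 0.7 and the function returns `None`.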
20 Reduction of the Candidate Type
- This step is based on the following property: if a complex event type occurs frequently, then any of its sub-types must also occur frequently.
- In other words, if an assignment to two variables is not frequent, no candidate complex event type that includes this assignment will be frequent, so these types can be removed from the set of candidates.
- For each subset $W'$ of $W$, the induced approximated sub-structure of $W'$ is $(W', A', \Gamma')$, where $A'$ consists of all pairs $(X, Y) \in W' \times W'$ such that there is a path from X to Y in S and there is at least one constraint on (X, Y).
21 Reduction of the Candidate Type
- Finding the solutions to the induced discovery problems is straightforward and cheap in time complexity. Indeed, for a sub-structure consisting of the root and one other variable X, the induced sub-structure gives the distance from the root to X (in effect, two distances: the minimum distance and the maximum distance).
- For each occurrence of E0, this distance translates into a window, i.e., a period of time during which the event for X must appear.
- The approach then extends to sub-structures with more than one non-root variable, where these variables form a chain in S.
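The window idea for a single non-root variable can be sketched as below. For simplicity the window is given directly in timestamp units rather than derived from TCGs, and a type is kept when its confidence is strictly greater than the threshold; both choices, along with the function name and sample data, are assumptions of this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Event = Tuple[str, int]  # (event type, timestamp)


def frequent_types_in_window(events: List[Event], root_type: str,
                             min_dist: int, max_dist: int,
                             min_conf: float) -> Set[str]:
    """Event types that appear within [min_dist, max_dist] after the root
    for more than min_conf of the root occurrences."""
    roots = [t for etype, t in events if etype == root_type]
    hits: Dict[str, int] = defaultdict(int)
    for t0 in roots:
        seen = {
            etype for etype, t in events
            if min_dist <= t - t0 <= max_dist and (etype, t) != (root_type, t0)
        }
        for etype in seen:       # count each type at most once per root occurrence
            hits[etype] += 1
    if not roots:
        return set()
    return {etype for etype, count in hits.items() if count / len(roots) > min_conf}


EVENTS = [("A", 1), ("B", 2), ("A", 5), ("B", 6), ("C", 7), ("A", 10)]
```

Only the event types returned for a variable need to be considered when building candidate complex types, so every type pruned here removes all candidates that would have assigned it to that variable.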
22 Experimental Results
- Closing prices of 439 stocks over 517 trading days
- Price changes are partitioned into 7 categories: (-∞, -5%), (-5%, -3%), (-3%, 0%), (0%, 0%), (0%, 3%), (3%, 5%), (5%, ∞)
- The total number of event types is 2978. The number of events is 181089.
- The reference event type (assigned to the root X0) is a drop of the IBM stock of less than 3%. The minimum confidence value is 0.7. No event types are assigned in advance to the other variables.
23 Experimental Results cont.
24 Experimental Results cont.
- This experiment focuses on Step 4, namely the reduction of the candidate complex event types by using sub-structures.
- The results show that applying the heuristics significantly reduces the number of candidate complex event types.
25 Experimental Results cont.
The two frequent event combinations discovered in the experiment
26 References
- C. Bettini, X. S. Wang, S. Jajodia, and J.-L. Lin. Discovering Temporal Relationships with Multiple Granularities in Time Sequences. IEEE Transactions on Knowledge and Data Engineering, Vol. 10 (2), 1998.
- C. Bettini, X. S. Wang, and S. Jajodia. A General Framework for Time Granularity and its Application to Temporal Reasoning. Annals of Mathematics and Artificial Intelligence, Vol. 22 (1-2), pages 29-58, Baltzer Science Publishers, 1998.
- C. Bettini, X. S. Wang, and S. Jajodia. Testing complex temporal relationships involving multiple granularities and its application to data mining. In Proceedings of the Fifteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS'96), pages 68-78, Montreal, Canada, June 1996.
- C. Bettini, X. S. Wang, and S. Jajodia. Mining temporal relationships with multiple granularities in time sequences. Data Engineering Bulletin, 21:32-38, 1998.