Temporal Data Mining - PowerPoint PPT Presentation



1
Temporal Data Mining
  • Claudio Bettini, X. Sean Wang and
    Sushil Jajodia
  • Presented by Zhuang Liu

2
Outline
  • What is Data Mining?
  • Formal Problem Definition
  • TAG (Timed Automaton with Granularity)
  • A Naive Solution
  • Techniques for Improving Performance
  • Experimental Results

3
What is Data Mining
  • Data Mining
  • The non-trivial extraction of implicit, previously
    unknown, and potentially useful
    information from data
  • Common Data Mining Techniques
  • Association-rule mining
  • Sequential mining (Temporal mining)
  • Clustering
  • Classification
  • Outlier detection

4
Temporal Data Mining
  • Finding time-related frequent patterns (frequent
    sub-sequences)
  • e.g. which pairs of events frequently occur one
    week apart
  • A simple example: a user may be interested in
    finding all those events that frequently follow
    within 2 business days of a rise of the IBM stock
    price.

5
Definition
  • Event Type (E)
  • e.g. deposit to an account
  • e.g. price increase of a specific stock
  • Event e
  • An event e is a pair (E, t), where E is an
    event type and t is a positive integer, called
    the timestamp of e.
  • Event Sequence
  • An event sequence is a finite set of events.
  • Each event (E, t) appearing in an event
    sequence represents the occurrence of event type
    E at time t.
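
The definitions above can be sketched in Python as follows. This is only an illustration: the names `Event` and `make_sequence` are hypothetical, and sorting the sequence by timestamp is a convenience assumed here, not something the slide requires.

```python
from typing import NamedTuple


class Event(NamedTuple):
    """An event is a pair (E, t): an event type and a positive integer timestamp."""
    etype: str
    t: int


def make_sequence(events):
    """Validate the timestamps and return the events ordered by time.

    The slide defines an event sequence as a finite set of events; keeping
    it sorted is an assumption made here to simplify later scans.
    """
    for e in events:
        if e.t <= 0:
            raise ValueError(f"timestamp must be a positive integer: {e}")
    return sorted(set(events), key=lambda e: (e.t, e.etype))


seq = make_sequence([
    Event("IBM-fall", 30),
    Event("IBM-rise", 5),
    Event("HP-rise", 12),
])
```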

6
Granularity
  • A granularity is a mapping µ from the set of the
    positive integers to subsets of the time domain
    such that for all positive integers i and j with
    i < j:
  • (1) µ(i) ≠ ∅ and µ(j) ≠ ∅ imply that each number
    in µ(i) is less than all the numbers in µ(j), and
  • (2) µ(i) = ∅ implies µ(j) = ∅.
  • Examples: year, month, week, day, business-day,
    business-week, etc.
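
As a rough illustration of a granularity as a mapping from positive integers to sets of time points, the Python sketch below assumes that timestamps count hours starting at 1 and that hour 1 is the first hour of a Monday. The function names `day`, `business_day` and `granule_index` are hypothetical.

```python
def day(i):
    """Granule i of 'day': the set of hour timestamps covered by day i."""
    return set(range((i - 1) * 24 + 1, i * 24 + 1))


def business_day(i):
    """Granule i of 'business-day': like 'day', but skipping Saturday and Sunday."""
    week, offset = divmod(i - 1, 5)
    return day(week * 7 + offset + 1)


def granule_index(t, granularity, limit=10_000):
    """Return the index i with t in granularity(i), or None if t falls in a gap."""
    for i in range(1, limit):
        granule = granularity(i)
        if t in granule:
            return i
        if min(granule) > t:  # granules are ordered, so t cannot appear later
            return None
    return None
```

Under these assumptions hour 130 lies on the first Saturday, so it belongs to a `day` granule but to no `business_day` granule.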

7
TCG
  • A temporal constraint with granularity (TCG)
    [m, n]µ is a binary relation on positive integers.
    For positive integers t1 and t2, (t1, t2)
    satisfies [m, n]µ iff
  • (1) t1 ≤ t2,
  • (2) ⌈t1⌉µ and ⌈t2⌉µ are both defined, and
  • (3) m ≤ ⌈t2⌉µ − ⌈t1⌉µ ≤ n,
  • where ⌈t⌉µ is the index of the granule of µ that
    contains t.
  • Examples of TCGs: [0,0]day, [0,2]hour, [1,1]month
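
The three conditions of a TCG can be checked mechanically. The sketch below assumes hourly timestamps starting at 1, with hour 1 on a Monday; the index functions and the `satisfies` helper are hypothetical names introduced for illustration. Each index function plays the role of the granule index of t, returning None where it is undefined.

```python
def hour_index(t):
    return t


def day_index(t):
    return (t - 1) // 24 + 1


def week_index(t):
    return (t - 1) // (24 * 7) + 1


def bday_index(t):
    """Business-day index, or None on Saturdays and Sundays."""
    week, weekday = divmod(day_index(t) - 1, 7)
    if weekday >= 5:
        return None
    return week * 5 + weekday + 1


GRANULARITIES = {
    "hour": hour_index,
    "day": day_index,
    "week": week_index,
    "b-day": bday_index,
}


def satisfies(t1, t2, m, n, granularity):
    """True iff (t1, t2) satisfies the TCG [m, n] over the given granularity."""
    index = GRANULARITIES[granularity]
    g1, g2 = index(t1), index(t2)
    if t1 > t2 or g1 is None or g2 is None:
        return False
    return m <= g2 - g1 <= n
```

For instance, `satisfies(23, 25, 1, 1, "day")` holds even though the two timestamps are only two hours apart, because they fall on consecutive days.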

8
Event Structure
  • An event structure (with granularities) is a
    rooted directed acyclic graph (W, A, G), where W is
    a finite set of event variables, A ⊆ W × W, and G
    is a mapping from A to finite sets of TCGs.
  • Complex event type derived from S:
  • each variable is associated with a specific
    event type.
  • Complex event matching S:
  • each variable is associated with a distinct
    event such that the event timestamps satisfy the
    time constraints.
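
A minimal sketch of the matching test, assuming hourly timestamps starting at 1. The three-variable structure `ARCS` is a made-up example (it is not the structure of Figure 1), and `index`, `satisfies` and `matches` are hypothetical helper names.

```python
def index(t, granularity):
    """Granule index of timestamp t (hours counted from 1)."""
    if granularity == "hour":
        return t
    if granularity == "day":
        return (t - 1) // 24 + 1
    raise ValueError(granularity)


def satisfies(t1, t2, tcg):
    m, n, granularity = tcg
    return t1 <= t2 and m <= index(t2, granularity) - index(t1, granularity) <= n


# Hypothetical event structure: each arc maps to a finite set (list) of TCGs.
ARCS = {
    ("x0", "x1"): [(1, 1, "day")],
    ("x0", "x2"): [(0, 2, "day")],
    ("x1", "x2"): [(0, 8, "hour")],
}


def matches(arcs, types, events):
    """types: variable -> event type; events: variable -> (event type, timestamp)."""
    if len(set(events.values())) != len(events):
        return False  # variables must be bound to distinct events
    if any(events[v][0] != types[v] for v in types):
        return False  # each event must have the type assigned to its variable
    return all(
        satisfies(events[a][1], events[b][1], tcg)
        for (a, b), tcgs in arcs.items()
        for tcg in tcgs
    )
```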

9
Example of Event Structure
  • Assigning the event types IBM-rise,
    IBM-earnings-report, HP-rise, and IBM-fall to x0,
    x1, x2, and x3, respectively, yields a complex
    event type. This complex event type describes that
    the IBM earnings were reported one business day
    after the IBM stock rose, and in the same or the
    next week the IBM stock fell, while the HP stock
    rose within 5 business days after the same rise of
    the IBM stock and within 8 hours before the same
    fall of the IBM stock.

[Figure 1: An event structure, with arcs labeled by the TCGs [1,1]b-day, [0,1]week, [0,8]hour, and [0,5]b-day]

10
Formal Problem Definition
  • An event-mining problem is a quadruple
    (S, γ, E0, ρ),
  • where S is an event structure, γ is the
    minimum confidence value, E0 is an event type
    (the reference type), and ρ is a partial mapping
    which assigns a set of event types to some of the
    variables (except the root).
  • An event-mining problem is the problem of finding
    all complex event types such that each occurs
    frequently in the input sequence and is derived
    from S by assigning E0 to the root and a specific
    event type to each of the other variables.
  • Example: (S, 0.8, IBM-rise, ρ)

11
TAG
  • Timed Automaton with Granularities
  • A basic component to test whether a candidate
    complex event type occurs frequently in an event
    sequence.
  • A timed automaton with granularities is a 6-tuple
    A = (Σ, S, S0, C, T, F), where
  • (1) Σ is a finite set of input letters,
  • (2) S is a finite set of states,
  • (3) S0 ⊆ S is a set of start states,
  • (4) C is a finite set of clocks,
  • (5) T ⊆ S × S × Σ × 2^C × Φ(C) is a set of
    transitions,
  • (6) F ⊆ S is a set of accepting states.

12
TAG
  • Φ(C) is the set of all the formulas called clock
    constraints.
  • A transition (s, s', e, λ, δ) represents a
    transition from state s to state s' on input
    symbol e. The set λ ⊆ C gives the clocks to be
    reset with this transition, and δ is a clock
    constraint over C.
  • A TAG is essentially a standard finite automaton
    with some modifications:
  • Each TAG maintains a set of clocks.
  • Both the input symbol and the clock values
    determine the next state.
  • A run is an accepting run if its last state is in
    the set F. An event sequence is accepted by a TAG
    if there exists an accepting run.
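
To make the roles of clocks, resets and clock constraints concrete, here is a small Python sketch of a nondeterministic automaton run. It is a simplification under several assumptions: timestamps count hours from 1, each clock is tied to one granularity and measures the difference in granule indices since its last reset, a clock that was never reset reads 0, and the automaton may skip any input event (as if every state had a self-loop). The automaton below, with its states and transitions, is a made-up example accepting an IBM-rise followed by an IBM-fall one or two days later.

```python
def day_index(t):
    return (t - 1) // 24 + 1


# Each clock is tied to a granularity (a function from timestamp to granule index).
CLOCKS = {"c_day": day_index}

# Transitions: (source, target, input symbol, clocks to reset, clock constraint).
TRANSITIONS = [
    ("s0", "s1", "IBM-rise", {"c_day"}, lambda v: True),
    ("s1", "s2", "IBM-fall", set(), lambda v: 1 <= v["c_day"] <= 2),
]
START = {"s0"}
ACCEPTING = {"s2"}


def accepts(sequence):
    """sequence: list of (event type, timestamp) pairs in time order."""
    # A configuration is (state, {clock: granule index at its last reset}).
    configs = {(s, frozenset((c, None) for c in CLOCKS)) for s in START}
    for symbol, t in sequence:
        successors = set(configs)  # assumption: any event may be skipped
        for state, frozen in configs:
            resets = dict(frozen)
            values = {
                c: 0 if r is None else CLOCKS[c](t) - r
                for c, r in resets.items()
            }
            for src, dst, sym, to_reset, constraint in TRANSITIONS:
                if src == state and sym == symbol and constraint(values):
                    updated = dict(resets)
                    for c in to_reset:
                        updated[c] = CLOCKS[c](t)
                    successors.add((dst, frozenset(updated.items())))
        configs = successors
    return any(state in ACCEPTING for state, _ in configs)
```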

13
A Naïve Solution
  • Consider all the event types that occur in the
    given event sequence, and consider all the
    complex types derived from the given event
    structure, one from each assignment of these
    event types to the variables. Each of these
    complex types is called a candidate complex type
    for the event-mining problem.
  • For each candidate complex type, start the
    corresponding TAG at every occurrence of E0. That
    is, for each occurrence of E0 in the event
    sequence, use the rest of the event sequence as
    the input to one copy of the TAG. By comparing the
    number of TAGs reaching a final state with the
    number of occurrences of E0, all the solutions
    of the event-mining problem are derived.
  • The number of candidate types is exponential in
    the number of variables in the event structure.
    Too costly.
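
The brute-force enumeration can be sketched as below. Note the substitutions: instead of running a TAG per root occurrence, this sketch checks the constraints directly, and it uses a made-up structure with a root x0 and two children constrained only in days. Hourly timestamps starting at 1 and a "frequency at least gamma" threshold are also assumptions, as are all function names.

```python
from itertools import product


def day_index(t):
    return (t - 1) // 24 + 1


# Hypothetical structure: arcs from the root x0, each with one TCG [m, n]day.
ARCS = {("x0", "x1"): (1, 1), ("x0", "x2"): (0, 2)}
NON_ROOT = ["x1", "x2"]


def occurs_at(root_i, assignment, sequence):
    """Does the candidate type occur with the event at position root_i as root?"""
    root_t = sequence[root_i][1]
    options = []
    for var in NON_ROOT:
        m, n = ARCS[("x0", var)]
        options.append([
            i for i, (etype, t) in enumerate(sequence)
            if i != root_i and etype == assignment[var] and t >= root_t
            and m <= day_index(t) - day_index(root_t) <= n
        ])
    # Each variable must be bound to a distinct event.
    return any(len(set(choice)) == len(choice) for choice in product(*options))


def naive_mine(sequence, e0, gamma):
    """Try every assignment of event types to the non-root variables."""
    types = sorted({etype for etype, _ in sequence})
    roots = [i for i, (etype, _) in enumerate(sequence) if etype == e0]
    solutions = {}
    for combo in product(types, repeat=len(NON_ROOT)):
        assignment = dict(zip(NON_ROOT, combo))
        hits = sum(occurs_at(i, assignment, sequence) for i in roots)
        if roots and hits / len(roots) >= gamma:
            solutions[combo] = hits / len(roots)
    return solutions
```

With k event types and v non-root variables the outer loop runs k^v times, which is the cost the slide warns about.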

14
Techniques to improve performance
  • The performance of this algorithm can be improved
    by the following steps:
  • (1) identifying possible inconsistencies in the
    given event structure before starting the
    process,
  • (2) reducing the length of the sequence,
  • (3) reducing the number of times an automaton has
    to be started,
  • (4) reducing the number of different automata to
    be started,
  • (5) applying the naïve algorithm to what remains.

15
Recognition of Inconsistent Event Structures
  • An event structure is consistent if there exists a
    complex event that matches that event structure.
  • If an event structure is inconsistent, it should
    be discarded even before the mining process
    starts.
  • It is difficult to determine the consistency of
    event structures.
  • Use approximate polynomial-time algorithms to
    check the consistency of event structures.

16
Recognition of Inconsistent Event Structures
  • If one of the constraints implied by the given
    ones is the empty one, i.e. unsatisfiable, the
    whole event structure is inconsistent.
  • A TCG [m', n']ν is logically implied by a TCG
    [m, n]µ if each pair (x, y) satisfying the second
    constraint also satisfies the first one.
  • For example, the TCG [1,2]b-week can be converted
    into [3,18]day or [0,1]month, while it cannot
    be converted into [2,3]week-end or [1,3]week,
    since the resulting constraints are not implied
    by [1,2]b-week.

17
Reduction of the Event Sequence
  • We can reduce the event sequence by
  • exploiting the granularities.
  • For example, if a discovery problem is defined on
    the sub-structure excluding variable x3, the
    input event sequence can be reduced by discarding
    any event that does not occur on a business day.
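
A sketch of this filtering step, assuming hourly timestamps starting at 1 with hour 1 on a Monday. The helper names are hypothetical.

```python
def is_business_day(t):
    """True if hour t falls on Monday to Friday under the assumed epoch."""
    weekday = ((t - 1) // 24) % 7  # 0 = Monday, ..., 6 = Sunday
    return weekday < 5


def reduce_sequence(sequence):
    """Drop events that can never satisfy a business-day constraint."""
    return [(etype, t) for etype, t in sequence if is_business_day(t)]
```

This is only safe when every constraint in the structure is expressed in business days; an hour-based constraint, such as the one on x3, could still be satisfied by a weekend event.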

18
Reduction of the occurrences of the root
  • The basic idea is to remove those occurrences of
    reference types which cannot be the root of a
    complex event matching the given structure.
  • It is possible that for some occurrences of the
    reference types in the sequence, a constraint is
    unsatisfiable.
  • Consider all the non-empty sets of explicit and
    implicit constraints on the pair of the root and
    each non-root node. Check if one of the
    constraints cannot be satisfied.
  • For example, if no event occurs in the sequence
    on the business day following an IBM-rise event,
    this particular reference event can be discarded.
    (No automaton is started for it.)

19
Reduction of the occurrences of the root
  • Let N be the number of occurrences of the
    reference event type in the sequence.
  • Let N' be the number of occurrences of reference
    events for which one of the constraints is
    unsatisfiable. These are reference events that
    are certainly not the root of a complex event
    satisfying the given event structure.
  • If N'/N > 1 − γ, where γ is the minimum confidence
    value, there cannot be any frequent complex event
    type and the empty set should be returned to the
    user.
  • Otherwise, remove these occurrences of the
    reference type and replace γ with
    γ' = (γ · N) / (N − N').
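
The confidence adjustment is simple arithmetic and can be checked with a short sketch. The function name and its argument names are hypothetical; `n_total` stands for the number of reference occurrences, `n_dead` for those that cannot be a root, and `gamma` for the minimum confidence.

```python
def adjust_confidence(n_total, n_dead, gamma):
    """Return the confidence to use after dropping n_dead root occurrences.

    Returns None when too many roots were dropped for any complex event
    type to be frequent.
    """
    if not 0 < gamma <= 1:
        raise ValueError("gamma must be in (0, 1]")
    if n_dead / n_total > 1 - gamma:
        return None
    return gamma * n_total / (n_total - n_dead)
```

For example, with 100 occurrences, 20 of them discarded, and a minimum confidence of 0.7, the remaining 80 occurrences must be matched with confidence 70/80 = 0.875 to reach the original threshold.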

20
Reduction of the Candidate Type
  • Based on the property that if a complex event type
    occurs frequently, then each of its sub-types
    also occurs frequently.
  • In other words, if one assignment to two
    variables is not frequent, any candidate complex
    event type including this assignment cannot be
    frequent, so it can be removed from the set of
    candidate complex event types.
  • For each subset W' of W, the induced approximated
    sub-structure of W' is (W', A', G'), where A'
    consists of all pairs (X, Y) ∈ W' × W' such that
    there is a path from X to Y in S and there is at
    least one constraint on (X, Y).
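
The pruning idea can be sketched as follows, assuming the infrequent assignments for the two-variable sub-structures (root plus one variable) have already been found. The function name and the shape of `infrequent` are assumptions made for illustration.

```python
from itertools import product


def prune_candidates(variables, event_types, infrequent):
    """Enumerate candidate assignments, skipping known-infrequent sub-types.

    infrequent: set of (variable, event type) pairs whose two-variable
    sub-type (root plus that variable) is already known to be infrequent.
    """
    allowed = {
        v: [e for e in event_types if (v, e) not in infrequent]
        for v in variables
    }
    return [
        dict(zip(variables, combo))
        for combo in product(*(allowed[v] for v in variables))
    ]
```

With two variables and three event types there are 9 unpruned candidates; ruling out one type for x1 and two for x2 leaves only 2.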

21
Reduction of the Candidate Type
  • To find the solutions to the induced discovery
    problems is rather straightforward and simple in
    time complexity. Indeed, the induced
    sub-structure gives the distance from the root to
    the variable (in effect, two distances, namely
    the minimum distance and the maximum distance).
  • For each occurrence of E0 , this distance
    translates into a window, i.e., a period of time
    during which the event for X must appear.
  • Extend the sub-structure to more than one
    non-root variable. These variables form a chain
    in S.
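
For a single non-root variable X, the window idea can be sketched like this. The sketch assumes hourly timestamps starting at 1, a distance [m, n] expressed in days, and a "frequency at least gamma" threshold; the function name is hypothetical.

```python
from collections import Counter


def day_index(t):
    return (t - 1) // 24 + 1


def frequent_types_in_window(sequence, e0, m, n, gamma):
    """Event types that fall in the [m, n]-day window of enough E0 occurrences."""
    roots = [(i, t) for i, (etype, t) in enumerate(sequence) if etype == e0]
    counts = Counter()
    for root_i, root_t in roots:
        # Types seen at least once inside this root's window.
        seen = {
            etype for i, (etype, t) in enumerate(sequence)
            if i != root_i and t >= root_t
            and m <= day_index(t) - day_index(root_t) <= n
        }
        counts.update(seen)
    return {
        etype: count / len(roots)
        for etype, count in counts.items()
        if count / len(roots) >= gamma
    }
```

One pass over the windows yields the frequency of every event type for X at once, rather than one automaton per candidate type.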

22
Experimental Results
  • Closing prices of 439 stocks for 517 trading days
  • Price changes are partitioned into 7 categories:
    (-∞, -5%), (-5%, -3%), (-3%, 0%), [0%, 0%], (0%,
    3%), (3%, 5%), (5%, ∞)
  • The total number of event types is 2978. The
    number of events is 181089.
  • The reference event type for X0 is a drop of the
    IBM stock of less than 3%. The minimum confidence
    value is 0.7. No event types are assigned to the
    other variables.
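
A sketch of how closing prices could be turned into such events. The transcript does not preserve which interval endpoints are inclusive, so the boundary handling below is an assumption, as are the category labels, the function names, and the use of the trading-day number as timestamp.

```python
def category(change_pct):
    """Map a daily percentage change to one of 7 categories (boundaries assumed)."""
    if change_pct == 0:
        return "flat"
    if change_pct <= -5:
        return "drop>=5"
    if change_pct <= -3:
        return "drop3-5"
    if change_pct < 0:
        return "drop<3"
    if change_pct < 3:
        return "rise<3"
    if change_pct < 5:
        return "rise3-5"
    return "rise>=5"


def price_events(ticker, closes):
    """Turn a list of closing prices into (event type, trading day) pairs."""
    events = []
    for day, (prev, cur) in enumerate(zip(closes, closes[1:]), start=2):
        change_pct = 100 * (cur - prev) / prev
        events.append((f"{ticker}-{category(change_pct)}", day))
    return events
```

Under this encoding the reference type of the experiment would correspond to the event type `IBM-drop<3`.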

23
Experimental Results cont.
24
Experimental Results cont.
  • This experiment focuses on Step 4, namely
    reduction of the candidate complex event types by
    using sub-structures.
  • The result shows that after using heuristics the
    number of candidate complex event types reduces
    significantly.

25
Experimental Results cont.
The two frequent event combinations discovered in
the experiment
26
References
  • C. Bettini, X. S. Wang, S. Jajodia, and J.-L.
    Lin. "Discovering Temporal Relationships with
    Multiple Granularities in Time Sequences". IEEE
    Transactions on Knowledge and Data Engineering,
    Vol. 10 (2), 1998.
  • C. Bettini, X. S. Wang, and S. Jajodia. "A General
    Framework for Time Granularity and its
    Application to Temporal Reasoning". Annals of
    Mathematics and Artificial Intelligence, Vol. 22
    (1-2), pages 29-58, Baltzer Science Publishers,
    1998.
  • C. Bettini, X. S. Wang, and S. Jajodia. "Testing
    complex temporal relationships involving multiple
    granularities and its application to data
    mining". In Proceedings of the Fifteenth ACM
    SIGACT-SIGMOD-SIGART Symposium on Principles of
    Database Systems (PODS'96), pages 68-78,
    Montreal, Canada, June 1996.
  • C. Bettini, X. S. Wang, and S. Jajodia. "Mining
    temporal relationships with multiple
    granularities in time sequences". Data
    Engineering Bulletin, Vol. 21, pages 32-38, 1998.

27
  • Thank you
  • Questions?