TAR: Temporal Association Rules on Evolving Numerical Attributes

1
TAR: Temporal Association Rules on Evolving
Numerical Attributes
  • Wei Wang, Jiong Yang, and Richard Muntz
  • Speaker: Sarah Chan
  • CSIS DB Seminar
  • May 7, 2003

2
Presentation Outline
  • Introduction
  • Problem Definition
  • Mining Algorithms
  • Performance Evaluation
  • Conclusions

3
Introduction
  • Association rule (AR) mining
  • X ⇒ Y (X and Y are itemsets)
  • Existence of X implies existence of Y
  • Earlier work focused on binary attributes and
    intra-transaction relationships
  • E.g. ham ⇒ bread means: a customer who buys
    ham is likely to buy bread as well
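As a minimal sketch of what a rule such as "ham implies bread" measures, the snippet below computes the standard support and confidence of an association rule over a few baskets. The baskets and the function names are made up for illustration and do not come from the talk.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Estimate of P(rhs | lhs): support(lhs and rhs) / support(lhs)."""
    both = set(lhs) | set(rhs)
    return support(transactions, both) / support(transactions, lhs)


# Hypothetical market baskets (binary attributes: an item is present or not).
baskets = [{"ham", "bread"}, {"ham", "bread", "milk"}, {"ham"}, {"milk"}]
ham_support = support(baskets, {"ham"})
ham_bread_conf = confidence(baskets, {"ham"}, {"bread"})
```

Here three of the four baskets contain ham and two of those three also contain bread, which is the sense in which a ham buyer is "likely" to buy bread.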

4
Introduction
  • Such rules cannot describe relationships such as:
  • If the price of item A falls below 1, then monthly
    sales of item B rise by a margin between 10K and
    20K.
  • People between 35 and 45 with a salary between 80K
    and 120K are likely to buy a house whose price is
    between 300K and 400K within 2 years of marriage.
  • Goal: to mine ARs involving numerical attributes
    and temporal evolution

5
Problem Definition
  • Each object has a set of numerical attributes
  • Database: a sequence of snapshots S1, S2, ..., St
    of objects
  • Evolution: temporal changes of the values of some
    attribute of some object
  • E.g. evolution of the salary attr. over 3 snapshots:
  • (salary ∈ [40000, 45000]) → (salary ∈
    [47500, 55000]) → (salary ∈ [60000, 70000])
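One way to make the salary example concrete is to store an evolution as one interval per snapshot and check a sequence of observed values against it. This is only a sketch: treating the intervals as closed on both ends is an assumption, and `follows_evolution` is an illustrative name.

```python
def follows_evolution(values, evolution):
    """True if the j-th value lies in the j-th (lo, hi) interval, for every j."""
    if len(values) != len(evolution):
        return False
    return all(lo <= v <= hi for v, (lo, hi) in zip(values, evolution))


# The three-snapshot salary evolution from the slide (closed intervals assumed).
salary_evolution = [(40000, 45000), (47500, 55000), (60000, 70000)]

fits = follows_evolution([42000, 50000, 65000], salary_evolution)
misses = follows_evolution([42000, 46000, 65000], salary_evolution)
```

The second sequence fails because 46000 falls outside the interval for the second snapshot.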

6
Problem Definition
  • TARs (on evolving numerical attributes): ARs that
    capture correlations among attr. evolutions
  • Scope of paper: only considers correlations of
    simultaneous evolutions (i.e. attr. evolutions
    over the same set of snapshots)

7
Mining Quantitative ARs
  • Srikant and Agrawal (SIGMOD '96)
  • Divide the domain of each quantitative attr. into
    intervals
  • Combine adjacent intervals as long as their support is
    less than the max-sup threshold
  • Set of items: original and combined intervals
  • Apply a traditional AR mining algorithm
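The interval-combining step can be sketched as follows. This is a simplified reading of the bullets above, not the full procedure from the cited paper: it assumes the base intervals are already given as per-interval record counts, keeps every base interval as an item, and adds a merged range of adjacent intervals only while its total count stays below a hypothetical `max_count` threshold.

```python
def candidate_ranges(base_counts, max_count):
    """Return (i, j) pairs meaning 'base intervals i..j merged into one item'."""
    items = []
    n = len(base_counts)
    for i in range(n):
        total = 0
        for j in range(i, n):
            total += base_counts[j]
            # A single base interval is always kept; a merged range is kept
            # only while its support stays below the max-sup threshold.
            if j > i and total >= max_count:
                break
            items.append((i, j))
    return items


# Four base intervals holding 10, 20, 30 and 40 records; max-sup of 50 records.
items = candidate_ranges([10, 20, 30, 40], max_count=50)
```

Each returned pair then acts as one binary item for an ordinary association rule miner.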

8
Mining Quantitative ARs
  • BitOp (Lent et al., ICDE '97)
  • Rule form: A ∧ B ⇒ C
    (A, B quantitative; C categorical)
  • Partition the attribute domains
    into a 2-D grid
  • For each value of attr. C:
  • Examine the data in each grid cell to see if the AR
    applies
  • Represent the result by a bit in a 2-D bitmap
  • Combine ARs with adjacent LHS attr. values to
    form a clustered AR
  • Smoothing to cover small holes in a big cluster
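The bitmap step can be illustrated with a small sketch. It assumes each record has already been mapped to a grid cell (a bin index for each of the two quantitative attributes) and that a cell's bit is set when the rule's confidence in that cell reaches a threshold; using confidence alone is a simplifying assumption, and `rule_bitmap` is an illustrative name rather than anything from BitOp itself. Clustering and smoothing are not shown.

```python
def rule_bitmap(records, target, bins_a, bins_b, min_conf):
    """records: (a_bin, b_bin, c_value) triples.

    Returns a bins_b x bins_a grid of 0/1 bits; a bit is 1 when the cell is
    non-empty and at least min_conf of its records have c_value == target.
    """
    total = [[0] * bins_a for _ in range(bins_b)]
    hits = [[0] * bins_a for _ in range(bins_b)]
    for a, b, c in records:
        total[b][a] += 1
        if c == target:
            hits[b][a] += 1
    return [
        [int(total[b][a] > 0 and hits[b][a] / total[b][a] >= min_conf)
         for a in range(bins_a)]
        for b in range(bins_b)
    ]


# Hypothetical binned records: (bin of A, bin of B, value of C).
records = [(0, 0, "x"), (0, 0, "x"), (1, 0, "y"), (1, 1, "x")]
bitmap = rule_bitmap(records, target="x", bins_a=2, bins_b=2, min_conf=0.6)
```

Adjacent set bits in such a grid are what the clustering step would merge into one clustered rule.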

[Figure: 2-D bitmap grid with columns a1 to a6 and rows b1 to b4; some cells are marked x]

9
Mining TARs
  • SR algorithm (based on Srikant and Agrawal, 1996)
  • Map numerical attribute evolutions to binary
    attrs.
  • Apply any traditional AR mining algorithm
  • Transform binary attr. values in rules to
    numerical ranges
  • Complexity
  • For a numerical attr. quantized into b intervals:
  • Need O(b^2) items to represent all possible
    sub-ranges
  • For t snapshots, need O(b^(2t)) items to encode all
    possible evolutions
  • Huge number of items, very inefficient
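A quick count shows where these bounds come from: b base intervals give b(b+1)/2 contiguous sub-ranges, and an evolution picks one sub-range per snapshot. The function names below are illustrative only.

```python
def num_subranges(b):
    """Number of contiguous sub-ranges over b base intervals: b(b+1)/2 = O(b^2)."""
    return b * (b + 1) // 2


def num_evolutions(b, t):
    """One sub-range per snapshot over t snapshots: (b(b+1)/2)^t = O(b^(2t))."""
    return num_subranges(b) ** t


subranges_10 = num_subranges(10)       # 55
evolutions_10_3 = num_evolutions(10, 3)  # 55 ** 3 = 166375
```

Even 10 intervals and 3 snapshots already yield 166375 candidate items for a single attribute.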

10
Mining TARs
  • LE algorithm (based on BitOp)
  • Quantize domains
  • Map each possible evolution of RHS attr. into an
    item
  • For each rule form, generate clustered rules for
    each possible value of each possible RHS attr.
  • Complexity
  • For a RHS attr. quantized to b intervals,
    consider its evolution over t snapshots
  • There could be b^(2t) distinct evolutions
  • Total no. of possible evolutions increases
    exponentially with no. of attrs. and no. of
    snapshots

11
Mining TARs
  • TAR algorithm

12
The Model: Evolution and Its Space
  • Given attr. Ai and m snapshots
  • Evolution E(Ai): (Ai ∈ [l1, u1]) → (Ai ∈ [l2,
    u2]) → ... → (Ai ∈ [lm, um])
  • Length of evolution: m
  • Evolution space of Ai: m-dimensional space (jth
    dimension associated with the value of Ai at the jth
    snapshot)

13
The Model: Evolution and Its Space
  • E.g. E1: (salary ∈ [40000, 45000]) → (salary ∈
    [47500, 55000]) → (salary ∈ [60000, 70000])

14
The Model: Evolution Conj. and Its Space
  • Given n attrs. A1, A2, ..., An (evolutions of length m)
  • Evolution conjunction: E(A1) ∧ E(A2) ∧ ... ∧ E(An)
  • Evolution space: (n × m)-dimensional space (each
    dimension associated with the value of one attr. at
    one snapshot)

15
The Model: TAR
  • TAR: X ⇔ Y (X and Y are evolution conjunctions)
  • Symmetric relationship
  • Assumption: Y only contains the evolution of one
    attr.
  • E(A1) ∧ E(A2) ∧ ... ∧ E(Ak-1) ∧ E(Ak+1) ∧ ... ∧ E(An) ⇔ E(Ak)

16
The Model: Window
  • Window
  • Subsequence of m consecutive snapshots
  • For t available snapshots S1, S2, ..., St, there
    are t - m + 1 windows of width m
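The window count can be checked with a short sketch that slides a width-m window over a list of snapshots. The snapshot labels are placeholders.

```python
def windows(snapshots, m):
    """All subsequences of m consecutive snapshots (empty if m exceeds t)."""
    return [snapshots[i:i + m] for i in range(len(snapshots) - m + 1)]


snaps = ["S1", "S2", "S3", "S4", "S5"]   # t = 5
width3 = windows(snaps, 3)               # t - m + 1 = 3 windows
```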

17
The Model: Object History
  • Object history of an object o over a window W:
  • The sequence of changes of o over W
  • Follows an evolution E(Ai) iff, for each snapshot
    in the window, the value of Ai in the object
    history falls into the corr. interval specified in
    E(Ai)
  • Follows an evolution conjunction E(A1) ∧ E(A2) ∧
    ... ∧ E(An) iff it follows every evolution in it
  • o satisfies the TAR X ⇔ Y iff it has an object
    history that follows both X and Y
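These definitions translate fairly directly into code. The sketch below assumes an object is stored as one {attribute: value} dict per snapshot and that an evolution conjunction is a dict mapping each attribute to its list of closed (lo, hi) intervals; both layouts and all names are illustrative assumptions.

```python
def follows(history, conjunction):
    """history: {attr: [v1..vm]}; conjunction: {attr: [(lo, hi), ...]}."""
    return all(
        len(history[a]) == len(intervals)
        and all(lo <= v <= hi for v, (lo, hi) in zip(history[a], intervals))
        for a, intervals in conjunction.items()
    )


def satisfies(object_snapshots, x, y, m):
    """True if some width-m window gives an object history following X and Y."""
    t = len(object_snapshots)
    for start in range(t - m + 1):
        window = object_snapshots[start:start + m]
        history = {a: [snap[a] for snap in window] for a in window[0]}
        if follows(history, x) and follows(history, y):
            return True
    return False


# One hypothetical object observed over three snapshots.
obj = [
    {"salary": 40000, "dist": 5},
    {"salary": 50000, "dist": 12},
    {"salary": 52000, "dist": 13},
]
x = {"salary": [(38000, 42000), (48000, 55000)]}
y = {"dist": [(0, 10), (10, 20)]}
y_tight = {"dist": [(0, 3), (10, 20)]}
```

With window width 2, the first window of `obj` follows both `x` and `y`, while no window follows `x` together with `y_tight`.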

18
The Model: TAR as Hypercube
  • Each object history can be mapped to a point in the
    evolution space of the involved attributes
  • TAR: a hypercube in this space, which contains
    the set of object histories satisfying the rule
  • Support, density, and strength thresholds:
    constraints on the number and distribution of object
    histories in the hypercube
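The geometric view can be sketched by flattening each object history into a point and counting the points that fall inside a rule's hypercube. The attribute order, the closed intervals and the toy numbers are assumptions made for illustration.

```python
def to_point(history, attrs):
    """Flatten {attr: [v1..vm]} into one point with n*m coordinates."""
    return tuple(v for a in attrs for v in history[a])


def in_cube(point, cube):
    """cube: one (lo, hi) interval per dimension, in the same order as the point."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(point, cube))


def support_count(points, cube):
    """Number of object histories that fall inside the hypercube."""
    return sum(in_cube(p, cube) for p in points)


# Hypothetical histories over n = 2 attributes and m = 2 snapshots.
histories = [
    {"salary": [40, 50], "dist": [5, 12]},
    {"salary": [41, 52], "dist": [6, 15]},
    {"salary": [80, 81], "dist": [2, 2]},
]
points = [to_point(h, ["salary", "dist"]) for h in histories]
rule_cube = [(38, 42), (48, 55), (0, 10), (10, 20)]
count = support_count(points, rule_cube)
```

The first two histories land inside the cube, so this rule has a support count of 2 out of 3 histories.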

19
The Model: Rule Set
  • Rule set <rmin, rmax>: set of all rules r s.t. r
    is a specialization of rmax and a generalization
    of rmin
  • Each rule set can summarize a large no. of valid
    rules
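Membership in a rule set can be sketched with interval containment, assuming a rule is a list of (lo, hi) intervals (one per dimension of the evolution space) and that "specialization" means every interval is contained in the corresponding interval of the more general rule. That reading and the names below are assumptions.

```python
def is_specialization(r, g):
    """True if every interval of r lies inside the matching interval of g."""
    return all(glo <= rlo and rhi <= ghi
               for (rlo, rhi), (glo, ghi) in zip(r, g))


def in_rule_set(r, rmin, rmax):
    """r belongs to the rule set bounded by rmin (inner) and rmax (outer)."""
    return is_specialization(rmin, r) and is_specialization(r, rmax)


rmin = [(2, 3), (2, 3)]
rmax = [(0, 5), (1, 6)]
inside = in_rule_set([(1, 4), (2, 5)], rmin, rmax)
outside = in_rule_set([(3, 4), (2, 5)], rmin, rmax)
```

Two boxes thus stand in for every rule nested between them, which is why one rule set can summarize many valid rules.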

20
Mining TARs: TAR algorithm
  • Find density-based (subspace) clusters
  • Find all valid rule sets

21
Mining TARs: TAR algorithm
  • Find density-based (subspace) clusters
  • Create base intervals for each attribute
  • Form base cubes from base intervals n1, m1
  • Bottom-up clustering algorithm
  • Density of an evolution cube: the object history
    concentration of the sparsest base cube in it
  • The Apriori property holds on density
  • Find all valid rule sets
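The density definition and its Apriori-style monotonicity can be sketched as below. The sketch assumes equal-width base intervals in every dimension and measures "concentration" as a raw count of points per base cube (the talk may normalize it differently); the function names are illustrative.

```python
from collections import Counter
from itertools import product


def base_cube_counts(points, lows, width):
    """Count points per base cube, keyed by a tuple of base-interval indices."""
    return Counter(
        tuple(int((x - lo) // width) for x, lo in zip(p, lows)) for p in points
    )


def density(counts, index_ranges):
    """Density of a cube = count of its sparsest base cube.

    index_ranges gives, per dimension, the inclusive (first, last) indices of
    the base intervals that the cube spans.
    """
    cells = product(*(range(a, b + 1) for a, b in index_ranges))
    return min(counts.get(cell, 0) for cell in cells)


pts = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (1.2, 1.9)]
counts = base_cube_counts(pts, lows=(0, 0), width=1.0)
d_small = density(counts, [(0, 0), (0, 0)])   # a single base cube
d_wide = density(counts, [(0, 1), (0, 0)])    # two base cubes side by side
d_full = density(counts, [(0, 1), (0, 1)])    # all four base cubes
```

Because the density is a minimum over the base cubes inside a cube, enlarging a cube can never raise its density, which is what lets a bottom-up search discard any cube that contains a too-sparse sub-cube.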

22
Mining TARs: TAR algorithm
  • Find density-based (subspace) clusters
  • Find all valid rule sets
  • Make use of the strength and support metrics
  • For a rule X ⇔ Y:
  • strength = Sup(X ∧ Y) / (Sup(X) × Sup(Y))
  • Strength is used to prune the search space
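As a worked example of the strength formula, the sketch below assumes supports are fractions of the N object histories, so the ratio can be computed from raw counts as n_xy * N / (n_x * n_y). The counts are made up.

```python
def strength(n_xy, n_x, n_y, n_total):
    """Sup(X and Y) / (Sup(X) * Sup(Y)), with each support taken as count / n_total."""
    return (n_xy * n_total) / (n_x * n_y)


# Hypothetical counts: 100 histories, 20 follow X, 25 follow Y, 10 follow both.
s = strength(n_xy=10, n_x=20, n_y=25, n_total=100)
independent = strength(n_xy=5, n_x=20, n_y=25, n_total=100)
```

A value of 1 means X and Y co-occur exactly as often as independence would predict, and values above 1 indicate positive correlation. The formula is unchanged when X and Y are swapped.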

23
Pruning with the Strength Threshold
  • Property 1
  • For any rule r, ∃ a base rule br_i
  • which is a specialization of r and
  • with strength ≥ that of r.
  • Implication
  • Only have to examine rules which are
    generalizations of BR (set of base rules) whose
    strength ≥ thres.

24
Pruning with the Strength Threshold
  • Property 2
  • For any two rules r and r' where
  • r is a specialization of r', and
  • strength of r < strength of r',
  • ∃ another base rule br_i which is
  • a specialization of r' but not r and
  • strength of br_i > strength of r.
  • Implication
  • Can skip rules which are generalizations of r
    but which do not contain any other base rule in
    BR.

25
Finding Rule Sets from Each Cluster
  • Find BR
  • For each subset of BR, explore
  • corr. search region from rule r
  • (min. bounding box of rules in BR)
  • If strength of r < thres., ignore region
  • min-rule
  • If sup of r ≥ thres., min-rule ← r
  • If sup of r < thres., search for its valid
    generalizations within region.
  • Stop when strength < thres.
  • max-rule
  • Search similarly until a rule is found s.t. all
    of its generalizations either violate strength
    requirement or another base rule is included
  • There can be multiple max-rules for a min-rule
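The starting rule of each search region is described as a minimum bounding box of base rules. A minimal sketch of that one step, assuming each rule is a list of (lo, hi) intervals with one entry per dimension (the min-rule and max-rule searches themselves are not shown):

```python
def bounding_box(rules):
    """Smallest hypercube that encloses every rule in the given list."""
    dims = range(len(rules[0]))
    return [
        (min(r[d][0] for r in rules), max(r[d][1] for r in rules))
        for d in dims
    ]


# Two hypothetical base rules in a 2-dimensional evolution space.
base_rules = [[(0, 1), (2, 3)], [(3, 4), (1, 2)]]
start_rule = bounding_box(base_rules)
```

Every base rule in the chosen subset is a specialization of this box, so it is the most specific rule that still covers all of them.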

26
Performance Evaluation
  • 300MHz CPU with 128MB memory
  • Three synthetic datasets
  • 100,000 objects with 5 attributes
  • 100 snapshots
  • Embedded 500 rules of length 5 or less
  • User-specified thresholds
  • Density: 2 (2 times the average density)
  • Support: 5
  • Strength: 1.3

27
Performance Evaluation
[Figure: Recall]
  • Precision: 100% for all algorithms
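For reference, precision and recall against a set of embedded rules can be computed as in the sketch below. It assumes mined and embedded rules can be matched by identifier, which is a simplification; the talk does not say how matches were decided, and the rule ids are placeholders.

```python
def precision_recall(found, embedded):
    """Precision = hits / |found|, recall = hits / |embedded| (both non-empty)."""
    found, embedded = set(found), set(embedded)
    hits = len(found & embedded)
    return hits / len(found), hits / len(embedded)


# Hypothetical rule ids: every mined rule is genuine, one embedded rule is missed.
precision, recall = precision_recall({"r1", "r2", "r3"}, {"r1", "r2", "r3", "r4"})
```

Full precision with recall below 1 means an algorithm reports no spurious rules but misses some of the embedded ones.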

28
Performance Evaluation
  • Observations
  • TAR is faster than SR and LE
  • Strength is used to prune the search space in TAR
  • Search a smaller set of candidate rules
  • Response time of TAR increases at a slower pace
    w.r.t. number of base intervals

29
Performance Evaluation
  • Real dataset
  • 20,000 objects (persons)
  • 5 attributes: age, title, salary, family status
    (single, married, head of household), distance
    between the person's house and a major city
  • 10 snapshots (one per year)
  • No. of base intervals: 100; support: 3; density: 2;
    strength: 1.3

30
Performance Evaluation
  • Performance of TAR alg. on real dataset
  • Time taken: 260s to mine 347 rule sets
  • Examples of TARs
  • People receiving a salary raise tend to move
    further away from the city center.
  • If people with a salary in the range 70K to 100K
    get a raise, the raise will likely
    be between 7K and 15K.

31
Conclusions
  • A TAR model is proposed to represent correlations
    among numerical attribute evolutions.
  • A novel approach to mine TARs by first
    discovering clusters and then efficiently
    constructing rule sets is introduced.
  • Empirical evaluation shows TAR algorithm
    outperforms alternative algs. by a large margin.

32
References
  • W. Wang, J. Yang, and R. Muntz. TAR: Temporal
    association rules on evolving numerical
    attributes, ICDE '01.
  • R. Srikant and R. Agrawal. Mining quantitative
    association rules in large relational tables,
    SIGMOD '96.
  • B. Lent, A. Swami, and J. Widom. Clustering
    association rules, ICDE '97.
  • R. Agrawal, J. Gehrke, D. Gunopulos, and P.
    Raghavan. Automatic subspace clustering of high
    dimensional data for data mining applications,
    SIGMOD '98.