Online Association Rule Mining - PowerPoint PPT Presentation

Provided by: magenta

1
Online Association Rule Mining
  • Data Engineering Lab.

2
Abstract
  • computes large itemsets online
  • needs at most two scans of the transaction sequence
  • at any time during the first scan, the user is free
    to change the support threshold
  • suitable for a transaction sequence that is
  • read from a network
  • too large to be stored locally for a rescan
  • uses a new forward-pruning technique

3
Introduction
  • existing algorithms are offline
  • they take a user-specified support threshold as given
  • they often need several scans of the transaction sequence
  • in general, the user does not know an appropriate
    support threshold in advance
  • a poorly chosen threshold results in useless or misleading rules
  • requirements for an online algorithm
  • continuous feedback
  • user controllable during processing
  • a deterministic and accurate result

4
Sketch of the Algorithm
  • CARMA uses two distinct algorithms
  • PhaseI
  • continuously constructs a lattice of all
    potentially large itemsets
  • for each itemset v in the lattice (see figure 1):
  • count(v): the number of occurrences of v since v
    was inserted in the lattice
  • firstTrans(v): the index of the transaction at
    which v was inserted in the lattice
  • maxMissed(v): an upper bound on the number of
    occurrences of v before v was inserted in the lattice

5
(No Transcript)
6
  • for any itemset v in the lattice, after i transactions:
  • minSupport(v) = count(v)/i
  • maxSupport(v) = (maxMissed(v) + count(v))/i
  • at the end of the transaction sequence, the
    lattice contains a superset of all large itemsets
  • PhaseII
  • initially removes all small itemsets
  • determines the precise number of occurrences
    of each remaining itemset
  • continuously removes small itemsets using the
    forward-pruning technique
  • ends up with the set of all large itemsets along
    with their support
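
The per-itemset bookkeeping and the two support bounds at the top of this slide can be illustrated with a minimal Python sketch. The `Node` record and the `support_interval` helper are hypothetical names chosen for this example; they do not come from the slides.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Node:
    """Bookkeeping for one itemset v in the lattice (illustrative names)."""
    count: int        # occurrences of v since v was inserted
    first_trans: int  # index of the transaction at which v was inserted
    max_missed: int   # upper bound on occurrences of v before insertion


def support_interval(node: Node, i: int) -> Tuple[float, float]:
    """Return (minSupport, maxSupport) of the itemset after i transactions."""
    if i <= 0:
        raise ValueError("i must be a positive transaction index")
    min_support = node.count / i
    max_support = (node.max_missed + node.count) / i
    return min_support, max_support


# An itemset inserted at transaction 3, seen twice since then,
# and possibly missed once before it was inserted.
lo, hi = support_interval(Node(count=2, first_trans=3, max_missed=1), i=5)
print(lo, hi)  # 0.4 0.6
```

The true support of the itemset lies somewhere in the returned interval; the interval collapses to a single value once maxMissed reaches 0.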

7
support lattice and support sequence
  • support lattice
  • contains all itemsets v with support_i(v) ≥ σ
  • a superset of all large itemsets in the first i
    transactions w.r.t. support threshold σ
  • support sequence
  • for each transaction processed, the user can
    specify an arbitrary support threshold
  • a sequence of support thresholds σ = (σ_1, σ_2, ...)
  • ⌈σ⌉_i: the ceiling of σ up to i
  • the least monotone decreasing sequence which is,
    up to i, pointwise greater than or equal to σ, and 0
    otherwise
  • avg_i(σ): the running average of σ up to i
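
The ceiling and the running average can be made concrete with a short Python sketch. It assumes 1-based indices as on the slides and reads "least monotone decreasing sequence" as the sequence of suffix maxima max(σ_j, ..., σ_i) for j ≤ i. The function names `ceiling_up_to` and `avg_up_to` are hypothetical.

```python
from typing import List


def ceiling_up_to(sigma: List[float], i: int) -> List[float]:
    """Ceiling of sigma up to i: suffix maxima for positions 1..i, 0 afterwards."""
    out = [0.0] * len(sigma)
    running_max = 0.0
    # Walk backwards from position i so that each entry is max(sigma_j..sigma_i).
    for j in range(i, 0, -1):
        running_max = max(running_max, sigma[j - 1])
        out[j - 1] = running_max
    return out


def avg_up_to(seq: List[float], i: int) -> float:
    """Running average of the first i entries of seq (0 for i = 0)."""
    return sum(seq[:i]) / i if i > 0 else 0.0


sigma = [0.3, 0.5, 0.4]
print(ceiling_up_to(sigma, 2))  # [0.5, 0.5, 0.0]
print(ceiling_up_to(sigma, 3))  # [0.5, 0.5, 0.4]
print(round(avg_up_to(ceiling_up_to(sigma, 3), 3), 3))  # 0.467
```

Raising the threshold at step 2 lifts the ceiling at step 1 as well, which is why the ceiling, and not σ itself, is averaged.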

8
(No Transcript)
9
PhaseI Algorithm
  • V: support lattice
  • initialize
  • V to {∅}
  • count(∅) = 0, firstTrans(∅) = 0, maxMissed(∅) = 0
  • to maintain the lattice for each transaction t_i
  • increment
  • insert
  • prune

10
  • Increment
  • increment count(v) for all itemsets v ∈ V that
    are contained in t_i
  • Insert
  • insert a subset v of t_i in V
  • if and only if all proper subsets w of v are already
    contained in V
  • and are potentially large (i.e. maxSupport(w) ≥ σ_i)
  • firstTrans(v) = i, count(v) = 1
  • maxMissed(v) =
    min{ max{ ⌊(i-1)·avg_{i-1}(⌈σ⌉_{i-1})⌋, |v|-1 },
         maxMissed(w) + count(w) | w ⊂ v }

11
Function PhaseI( transaction sequence (t_1, ..., t_n),
                 support sequence σ = (σ_1, ..., σ_n) ) : support lattice
support lattice V
begin
  V = {∅}, maxMissed(∅) = 0, firstTrans(∅) = 0, count(∅) = 0
  for i from 1 to n do
    Increment: for all v ∈ V with v ⊆ t_i do count(v)++ od
    Insert: for all v ⊆ t_i with v ∉ V do
      if ∀w ⊂ v : w ∈ V and maxSupport(w) ≥ σ_i then
        V = V ∪ {v}
        maxMissed(v) =
          min{ max{ ⌊(i-1)·avg_{i-1}(⌈σ⌉_{i-1})⌋, |v|-1 },
               maxMissed(w) + count(w) | w ⊂ v }
        firstTrans(v) = i
        count(v) = 1
      fi
    od
    Prune: remove v ∈ V from V only if maxSupport(v) < σ_i
           if v ∈ V is removed, remove all supersets as well
  od
  return V
end
12
  • example
  • T = ({a, b, c}, {a, b}, {b, d})
  • σ = (0.3, 0.5, 0.4)
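
The PhaseI steps (increment, insert, prune) can be sketched in Python and run on this example. This is one reading of the slides, not the authors' code, and it rests on several assumptions: the example groups into the three transactions {a, b, c}, {a, b}, {b, d}; "already contained in V" means present before the current insert step; pruning is done after every transaction; and the maxMissed bound reads min{ max{ ⌊(i-1)·avg_{i-1}(⌈σ⌉_{i-1})⌋, |v|-1 }, maxMissed(w) + count(w) | w ⊂ v }. The names `phase1`, `ceiling_sum`, `proper_subsets` and `max_support` are hypothetical.

```python
from itertools import combinations
from math import floor


def ceiling_sum(sigma, i):
    """Sum of the ceiling of sigma up to i, i.e. i * avg_i of the ceiling."""
    total, running_max = 0.0, 0.0
    for j in range(i, 0, -1):  # suffix maxima max(sigma_j, ..., sigma_i)
        running_max = max(running_max, sigma[j - 1])
        total += running_max
    return total


def proper_subsets(v):
    """Yield every proper subset of v, including the empty set."""
    items = sorted(v)
    for k in range(len(items)):
        for combo in combinations(items, k):
            yield frozenset(combo)


def max_support(node, i):
    return (node["maxMissed"] + node["count"]) / i


def phase1(transactions, sigma):
    """Build the support lattice as a dict: frozenset -> bookkeeping record."""
    V = {frozenset(): {"count": 0, "firstTrans": 0, "maxMissed": 0}}
    for i, t in enumerate(transactions, start=1):
        t, s_i = frozenset(t), sigma[i - 1]
        # Increment: count every lattice itemset contained in t_i.
        for v, node in V.items():
            if v <= t:
                node["count"] += 1
        # Insert: only itemsets whose proper subsets were all in V before
        # this step (assumption) and are potentially large.
        known = set(V)
        bound = floor(ceiling_sum(sigma, i - 1))
        # Enumerating all subsets is exponential; fine for a small sketch only.
        for k in range(1, len(t) + 1):
            for combo in combinations(sorted(t), k):
                v = frozenset(combo)
                if v in V:
                    continue
                subs = list(proper_subsets(v))
                if all(w in known and max_support(V[w], i) >= s_i for w in subs):
                    missed = min(V[w]["maxMissed"] + V[w]["count"] for w in subs)
                    V[v] = {"count": 1, "firstTrans": i,
                            "maxMissed": min(max(bound, len(v) - 1), missed)}
        # Prune: drop itemsets that cannot be large, together with supersets.
        for v in [v for v, node in V.items() if max_support(node, i) < s_i]:
            for u in [u for u in V if v <= u]:
                del V[u]
    return V


T = [{"a", "b", "c"}, {"a", "b"}, {"b", "d"}]
sigma = [0.3, 0.5, 0.4]
lattice = phase1(T, sigma)
for v in sorted(lattice, key=lambda s: (len(s), sorted(s))):
    print(sorted(v), lattice[v])
```

Under these assumptions the run ends with ∅, {a}, {b}, {d} and {a, b} in the lattice. {c} is pruned at the third transaction because its maxSupport of 1/3 falls below 0.4, while {d} survives with maxSupport 2/3 even though its true support is 1/3; resolving such cases is the job of the second scan.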

13
PhaseII Algorithm
  • preliminary PhaseII
  • V: the support lattice computed by PhaseI
  • uses the last user-specified support threshold σ_n as
    the pruning threshold
  • initially removes all trivially small itemsets
    from V
  • rescans the transaction sequence (ft = firstTrans(v))
  • count(v)++, maxMissed(v)--
  • if v ⊆ t_i and ft > i
  • maxMissed(w) = count(v) - count(w)
  • if maxSupport(w) > maxSupport(v), for w ⊃ v
  • stops as soon as the current transaction index is
    past firstTrans for all itemsets in the lattice
  • the lattice then contains all large itemsets with
    their precise support
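
The preliminary PhaseII rescan described above can be sketched in Python as follows. This is one reading of the slide, not the authors' code. It assumes the lattice is a dict mapping frozensets to records with `count`, `firstTrans` and `maxMissed`, that maxSupport is taken over all n transactions, and that the maxMissed(w) adjustment is applied when i reaches firstTrans(v). Forward pruning is left out. The name `phase2_preliminary` and the hand-written starting lattice are hypothetical.

```python
def max_support(node, n):
    return (node["maxMissed"] + node["count"]) / n


def phase2_preliminary(V, transactions, sigma_n):
    """Rescan until every itemset's firstTrans has been reached."""
    n = len(transactions)
    # Initial prune: copy the records and drop trivially small itemsets.
    V = {v: dict(node) for v, node in V.items()
         if max_support(node, n) >= sigma_n}
    i = 0
    while any(i < node["firstTrans"] for node in V.values()):
        i += 1
        t = frozenset(transactions[i - 1])
        for v in list(V):
            node = V[v]
            ft = node["firstTrans"]
            # Adjust: an occurrence before insertion moves from "possibly
            # missed" to "counted".
            if v <= t and ft > i:
                node["count"] += 1
                node["maxMissed"] -= 1
            if ft == i:
                # Everything before ft has now been rescanned for v.
                node["maxMissed"] = 0
                for w, w_node in V.items():
                    if v < w and max_support(w_node, n) > max_support(node, n):
                        w_node["maxMissed"] = node["count"] - w_node["count"]
            # Prune against the last user-specified threshold.
            if max_support(node, n) < sigma_n:
                del V[v]
    return V


# Hand-written lattice for three transactions (illustrative values).
lattice = {
    frozenset(): {"count": 3, "firstTrans": 0, "maxMissed": 0},
    frozenset("a"): {"count": 2, "firstTrans": 1, "maxMissed": 0},
    frozenset("b"): {"count": 3, "firstTrans": 1, "maxMissed": 0},
    frozenset("ab"): {"count": 1, "firstTrans": 2, "maxMissed": 1},
    frozenset("d"): {"count": 1, "firstTrans": 3, "maxMissed": 1},
}
T = [{"a", "b", "c"}, {"a", "b"}, {"b", "d"}]
result = phase2_preliminary(lattice, T, sigma_n=0.4)
for v in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(v), result[v])
```

On this input {a, b} ends with count 2 and maxMissed 0, so its support is known exactly, and {d} is removed once the rescan reaches its firstTrans and its maxSupport drops to 1/3.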

14
  • Forward Pruning
  • remove some small singleton set v and all its
    descendants from V before firstTrans(v), even if
    maxSupport(v) ≥ σ_n
  • if |v| = 1, v does not occur in t_1, ..., t_i, and
    the Forward Prune condition of Function PhaseII holds
  • then prune v and all its descendants from V
  • Theorem 2
  • if v is a singleton set which does not occur in the
    first i transactions and the same condition holds
  • then support_n(v) < σ_n

15
Function PhaseII( support lattice V,
                  transaction sequence (t_1, ..., t_n),
                  support sequence σ ) : support lattice
integer ft, i = 0
begin
  InitialPrune: V = V \ {v ∈ V | maxSupport(v) < σ_n}
  Rescan: while ∃v ∈ V : i < firstTrans(v) do
    i++
    for all v ∈ V do
      ft = firstTrans(v)
      Adjust: if v ⊆ t_i and ft > i then
        count(v)++, maxMissed(v)--
      fi
      if ft = i then
        maxMissed(v) = 0
        for all w ∈ V : v ⊂ w and
            maxSupport(w) > maxSupport(v) do
          maxMissed(w) = count(v) - count(w)
        od
      fi
16
      Prune: if maxSupport(v) < σ_n then V = V \ {v} fi
      Forward Prune: if |v| = 1 and v does not occur in t_1, ..., t_i
          and ⌈n·σ_n⌉ - count(v) + ⌊i·avg_i(⌈σ⌉_i)⌋
              > ⌊(ft-1)·avg_{ft-1}(⌈σ⌉_{ft-1})⌋ then
        V = V \ {w ∈ V | v ⊆ w}
      fi
    od
  od
  return V
end
17
CARMA
Function CARMA( transaction sequence T,
                support sequence σ ) : support lattice
support lattice V
begin
  V = PhaseI( T, σ )
  V = PhaseII( V, T, σ )
  return V
end
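
For small inputs, the result CARMA is meant to return (all large itemsets with their support) can be cross-checked against exhaustive counting. The sketch below is that brute-force baseline, not CARMA itself: it counts every subset of every transaction and applies a single fixed threshold. The name `large_itemsets_bruteforce` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def large_itemsets_bruteforce(transactions, threshold):
    """Return {itemset: support} for every non-empty itemset with support >= threshold."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        # Count every non-empty subset of the transaction (exponential in |t|).
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    n = len(transactions)
    return {v: c / n for v, c in counts.items() if c / n >= threshold}


T = [{"a", "b", "c"}, {"a", "b"}, {"b", "d"}]
result = large_itemsets_bruteforce(T, 0.4)
print(sorted(sorted(v) for v in result))  # [['a'], ['a', 'b'], ['b']]
```

This baseline needs memory for every itemset that ever occurs, which is exactly the cost the lattice bookkeeping of PhaseI and PhaseII is designed to avoid.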