Title: Online Association Rule Mining
- Data Engineering Lab.
Abstract
- the algorithm computes large itemsets online
- it needs at most two scans of the transaction sequence
- at any time during the first scan, the user is free to change the support threshold
- it is suited to a transaction sequence that
  - arrives from a network, or
  - is too large to be stored locally for a rescan
- it uses a new forward-pruning technique
Introduction
- existing algorithms are offline
  - the user must specify the support threshold in advance
  - they often scan the data several times
- in general, the user does not know an appropriate support threshold in advance
  - a poor choice results in useless or misleading rules
- for an algorithm to be online, it should provide
  - continuous feedback
  - user control during processing
  - a deterministic and accurate result
Sketch of the Algorithm
- CARMA uses two distinct algorithms, PhaseI and PhaseII
- PhaseI
  - continuously constructs a lattice of all potentially large itemsets
  - for each itemset v in the lattice, it maintains (see figure 1)
    - count(v): the number of occurrences of v since v was inserted in the lattice
    - firstTrans(v): the index of the transaction at which v was inserted in the lattice
    - maxMissed(v): an upper bound on the number of occurrences of v before v was inserted in the lattice
- for any itemset v in the lattice, after the i-th transaction:
  - $\mathrm{minSupport}(v) = \mathrm{count}(v)/i$
  - $\mathrm{maxSupport}(v) = (\mathrm{maxMissed}(v) + \mathrm{count}(v))/i$
  - hence $\mathrm{minSupport}(v) \le \mathrm{support}_i(v) \le \mathrm{maxSupport}(v)$
- at the end of the transaction sequence, the lattice contains a superset of all large itemsets
- PhaseII
  - initially removes all small itemsets
  - rescans the transaction sequence to determine the precise number of occurrences
  - continuously removes all small itemsets using the forward-pruning technique
  - ends up with the set of all large itemsets along with their support
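A minimal Python sketch of the per-itemset bookkeeping above. Only the two formulas come from the slides; the class name `LatticeEntry` and its field names are hypothetical, and the methods simply evaluate minSupport and maxSupport after i transactions.

```python
from dataclasses import dataclass


@dataclass
class LatticeEntry:
    """Bookkeeping PhaseI keeps for one itemset v (hypothetical class name)."""
    count: int        # occurrences of v since v was inserted in the lattice
    first_trans: int  # index of the transaction at which v was inserted
    max_missed: int   # upper bound on occurrences of v before it was inserted

    def min_support(self, i: int) -> float:
        # Only the occurrences actually counted: a lower bound on support_i(v).
        return self.count / i

    def max_support(self, i: int) -> float:
        # Counted occurrences plus the worst case for missed ones: an upper bound.
        return (self.max_missed + self.count) / i


# Hypothetical entry after i = 10 transactions.
entry = LatticeEntry(count=4, first_trans=3, max_missed=2)
print(entry.min_support(10), entry.max_support(10))
```

For this hypothetical entry, the true support after 10 transactions is only known to lie between 0.4 and 0.6.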
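Support Lattice and Support Sequence
- support lattice
  - contains all itemsets v with $\mathrm{support}_i(v) \ge s$
  - i.e. it is a superset of all large itemsets in the first i transactions w.r.t. support threshold s
- support sequence
  - for each transaction processed, the user can specify an arbitrary support threshold
  - this gives a sequence of support thresholds $\sigma = (\sigma_1, \sigma_2, \ldots)$
  - $\lceil\sigma\rceil_i$, the ceiling of $\sigma$ up to i: the least monotone decreasing sequence which is pointwise greater than or equal to $\sigma$ up to i, and 0 otherwise; equivalently $\lceil\sigma\rceil_i(j) = \max_{j \le k \le i} \sigma_k$ for $j \le i$
  - $\mathrm{avg}_i(\sigma) = \frac{1}{i}\sum_{j=1}^{i} \sigma_j$, the running average of $\sigma$ up to i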
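The ceiling and the running average can be computed straight from their definitions. The following is a minimal Python sketch, not taken from the paper: the helper names `ceiling` and `avg` are hypothetical, and thresholds are assumed to be exact `fractions.Fraction` values so that the floors taken later are not skewed by binary rounding.

```python
from fractions import Fraction
from math import floor
from typing import List, Sequence


def ceiling(sigma: Sequence[Fraction], i: int) -> List[Fraction]:
    """Entries 1..i of the ceiling of sigma up to i (all later entries are 0).

    The least monotone decreasing sequence that is pointwise >= sigma on
    positions 1..i is the suffix maximum c_j = max(sigma_j, ..., sigma_i).
    """
    result: List[Fraction] = [Fraction(0)] * i
    running_max = Fraction(0)
    for j in range(i - 1, -1, -1):  # walk from position i back to position 1
        running_max = max(running_max, sigma[j])
        result[j] = running_max
    return result


def avg(seq: Sequence[Fraction], i: int) -> Fraction:
    """Running average of the first i entries (taken as 0 for i = 0)."""
    return sum(seq[:i], Fraction(0)) / i if i > 0 else Fraction(0)


sigma = [Fraction(3, 10), Fraction(1, 2), Fraction(2, 5)]
c3 = ceiling(sigma, 3)
print(c3, 3 * avg(c3, 3), floor(3 * avg(c3, 3)))
```

For $\sigma = (0.3, 0.5, 0.4)$ this gives $\lceil\sigma\rceil_3 = (0.5, 0.5, 0.4)$ and $3\,\mathrm{avg}_3(\lceil\sigma\rceil_3) = 1.4$, whose floor is 1. The value chosen for $\mathrm{avg}_0$ does not matter, since the slides only use it multiplied by $i - 1 = 0$.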
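PhaseI Algorithm
- V: the support lattice
- initialization
  - $V = \{\emptyset\}$
  - $\mathrm{count}(\emptyset) = 0$, $\mathrm{firstTrans}(\emptyset) = 0$, $\mathrm{maxMissed}(\emptyset) = 0$
- for each transaction $t_i$, the lattice is maintained in three steps
  - increment
  - insert
  - prune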
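- Increment
  - increment count(v) for all itemsets $v \in V$ that are contained in $t_i$
- Insert
  - insert a subset v of $t_i$ into V if and only if all proper subsets w of v are already contained in V and are potentially large, i.e. $\mathrm{maxSupport}(w) \ge \sigma_i$
  - the new itemset gets
    - $\mathrm{firstTrans}(v) = i$ and $\mathrm{count}(v) = 1$
    - $\mathrm{maxMissed}(v) = \min\big(\{\max\{\lfloor (i-1)\,\mathrm{avg}_{i-1}(\lceil\sigma\rceil_{i-1}) \rfloor,\ |v|-1\}\} \cup \{\mathrm{maxMissed}(w) + \mathrm{count}(w) \mid w \subset v\}\big)$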
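The maxMissed initialization can be sketched as a small Python helper following the formula on this slide. Everything else is an assumption: the name `initial_max_missed`, the lattice passed as a dict from each itemset to a (count, maxMissed) pair, and exact `Fraction` thresholds. The hypothetical helper `ceiling_sum(sigma, i)` computes $i\,\mathrm{avg}_i(\lceil\sigma\rceil_i)$ as a plain sum of suffix maxima.

```python
from fractions import Fraction
from itertools import combinations
from math import floor
from typing import Dict, FrozenSet, Sequence, Tuple

Itemset = FrozenSet[str]


def ceiling_sum(sigma: Sequence[Fraction], i: int) -> Fraction:
    """i * avg_i(ceil(sigma)_i): the sum of the suffix maxima of sigma_1..sigma_i."""
    total, running_max = Fraction(0), Fraction(0)
    for j in range(i - 1, -1, -1):
        running_max = max(running_max, sigma[j])
        total += running_max
    return total


def initial_max_missed(v: Itemset, i: int, sigma: Sequence[Fraction],
                       lattice: Dict[Itemset, Tuple[int, int]]) -> int:
    """maxMissed(v) for an itemset v inserted at transaction i (1-based).

    lattice maps every proper subset w of v to (count(w), maxMissed(w));
    the Insert condition guarantees that all of them are present.
    """
    bound = max(floor(ceiling_sum(sigma, i - 1)), len(v) - 1)
    for size in range(len(v)):  # sizes 0..|v|-1: every proper subset, incl. the empty set
        for items in combinations(sorted(v), size):
            count_w, missed_w = lattice[frozenset(items)]
            # v cannot have occurred more often than its subset w
            bound = min(bound, missed_w + count_w)
    return bound


sigma = [Fraction(3, 10), Fraction(1, 2), Fraction(2, 5)]
# Hypothetical lattice state while processing t_2 = {a, b}, after Increment.
state = {frozenset(): (2, 0), frozenset("a"): (2, 0), frozenset("b"): (2, 0)}
print(initial_max_missed(frozenset("ab"), 2, sigma, state))
```

In this hypothetical state, inserting {a, b} at i = 2 yields maxMissed = 1: the floor term is $\lfloor 0.3 \rfloor = 0$, so $|v| - 1 = 1$ wins the max, and every subset bound is 2.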
Function PhaseI( transaction sequence (t_1, …, t_n), support sequence σ = (σ_1, …, σ_n) ): support lattice
  support lattice V
begin
  V := {∅}; maxMissed(∅) := 0; firstTrans(∅) := 0; count(∅) := 0
  for i from 1 to n do
    Increment:
      for all v ∈ V with v ⊆ t_i do count(v)++ od
    Insert:
      for all v ⊆ t_i with v ∉ V do
        if ∀w ⊂ v: w ∈ V and maxSupport(w) ≥ σ_i then
          V := V ∪ {v}
          maxMissed(v) := min({max{⌊(i-1)·avg_{i-1}(⌈σ⌉_{i-1})⌋, |v| - 1}} ∪ {maxMissed(w) + count(w) | w ⊂ v})
          firstTrans(v) := i
          count(v) := 1
        fi
      od
    Prune:
      remove v ∈ V from V only if maxSupport(v) < σ_i;
      if v ∈ V is removed, remove all its supersets as well
  od
  return V
end
- example
  - $T = (\{a, b, c\}, \{a, b\}, \{b, d\})$
  - $\sigma = (0.3, 0.5, 0.4)$
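To make the PhaseI pseudocode concrete, here is a minimal, unoptimized Python sketch run on this example. It is an illustration under assumptions the slides do not fix: the lattice is a dict mapping each itemset to a hypothetical `[count, firstTrans, maxMissed]` list; Insert checks each candidate against V as it stood before the current transaction's insertions; Prune runs after every transaction; thresholds are exact `Fraction` values. It enumerates every subset of each transaction, so it is exponential in transaction length and only meant for small examples.

```python
from fractions import Fraction
from itertools import combinations
from math import floor
from typing import Dict, FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def ceiling_sum(sigma: Sequence[Fraction], i: int) -> Fraction:
    """i * avg_i(ceil(sigma)_i): the sum of the suffix maxima of sigma_1..sigma_i."""
    total, running_max = Fraction(0), Fraction(0)
    for j in range(i - 1, -1, -1):
        running_max = max(running_max, sigma[j])
        total += running_max
    return total


def proper_subsets(v: Itemset) -> List[Itemset]:
    """All subsets of v except v itself, including the empty set."""
    items = sorted(v)
    return [frozenset(c) for k in range(len(items)) for c in combinations(items, k)]


def phase_one(transactions: Sequence[Itemset],
              sigma: Sequence[Fraction]) -> Dict[Itemset, List[int]]:
    """Return the support lattice V as {itemset: [count, firstTrans, maxMissed]}."""
    V: Dict[Itemset, List[int]] = {frozenset(): [0, 0, 0]}

    def max_support(v: Itemset, i: int) -> Fraction:
        count, _, missed = V[v]
        return Fraction(missed + count, i)

    for i, t in enumerate(transactions, start=1):
        s = sigma[i - 1]
        # Increment: count every lattice itemset contained in t_i.
        for v, entry in V.items():
            if v <= t:
                entry[0] += 1
        # Insert: candidates are checked against V as it stood before this step.
        before = set(V)
        for size in range(1, len(t) + 1):
            for items in combinations(sorted(t), size):
                v = frozenset(items)
                if v in before:
                    continue
                subs = proper_subsets(v)
                if all(w in before and max_support(w, i) >= s for w in subs):
                    missed = min([max(floor(ceiling_sum(sigma, i - 1)), len(v) - 1)]
                                 + [V[w][2] + V[w][0] for w in subs])
                    V[v] = [1, i, missed]
        # Prune: drop small itemsets (never the empty set) and all their supersets.
        small = [v for v in V if v and max_support(v, i) < s]
        for v in small:
            for w in [w for w in V if v <= w]:
                del V[w]
    return V


T = [frozenset("abc"), frozenset("ab"), frozenset("bd")]
sigma = [Fraction(3, 10), Fraction(1, 2), Fraction(2, 5)]
V = phase_one(T, sigma)
for itemset, (count, first, missed) in V.items():
    print(sorted(itemset), count, first, missed)
```

Under these assumptions the sketch ends with V = {∅, {a}, {b}, {a, b}, {d}}: {c} is pruned at the third transaction because its maxSupport is 1/3 < 0.4, while {d} is kept with count 1, firstTrans 3 and maxMissed 1.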
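PhaseII Algorithm
- preliminary PhaseII
  - V: the support lattice computed by PhaseI
  - uses the last user-specified support threshold $\sigma_n$ as the pruning threshold
  - initially removes all trivially small itemsets from V
  - rescans the transaction sequence; for the current transaction $t_i$ and each $v \in V$ with $ft = \mathrm{firstTrans}(v)$:
    - if $v \subseteq t_i$ and $ft > i$: increment count(v) and decrement maxMissed(v)
    - if $ft = i$: set $\mathrm{maxMissed}(v) = 0$, and set $\mathrm{maxMissed}(w) = \mathrm{count}(v) - \mathrm{count}(w)$ for every $w \supset v$ with $\mathrm{maxSupport}(w) > \mathrm{maxSupport}(v)$
  - stops as soon as the current transaction index has reached firstTrans(v) for all itemsets v in the lattice
  - the resulting lattice contains all large itemsets with their precise support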
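- Forward Pruning
  - removes some small singleton sets v, together with all their descendants, from V before reaching firstTrans(v), even if $\mathrm{maxSupport}(v) \ge \sigma_n$
  - if $|v| = 1$, v does not occur in $t_1, \ldots, t_i$, and $\lceil n\,\sigma_n \rceil - \mathrm{count}(v) + \lfloor i\,\mathrm{avg}_i(\lceil\sigma\rceil_i) \rfloor > \lfloor (ft-1)\,\mathrm{avg}_{ft-1}(\lceil\sigma\rceil_{ft-1}) \rfloor$ with $ft = \mathrm{firstTrans}(v)$, then prune v and all its descendants from V
- Theorem 2
  - if v is a singleton set which does not occur in the first i transactions and the inequality above holds, then $\mathrm{support}_n(v) < \sigma_n$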
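The Theorem 2 test needs only a few integers. Below is a minimal Python sketch with a hypothetical name, `forward_prunable`, assuming one threshold per transaction (so n = len(sigma)) and exact `Fraction` thresholds; the caller is assumed to have already checked that the singleton does not occur in $t_1, \ldots, t_i$.

```python
from fractions import Fraction
from math import ceil, floor
from typing import Sequence


def ceiling_sum(sigma: Sequence[Fraction], i: int) -> Fraction:
    """i * avg_i(ceil(sigma)_i): the sum of the suffix maxima of sigma_1..sigma_i."""
    total, running_max = Fraction(0), Fraction(0)
    for j in range(i - 1, -1, -1):
        running_max = max(running_max, sigma[j])
        total += running_max
    return total


def forward_prunable(count_v: int, first_trans: int, i: int,
                     sigma: Sequence[Fraction]) -> bool:
    """Theorem 2 test for a singleton v known not to occur in t_1..t_i.

    count_v is count(v), first_trans is firstTrans(v); sigma holds one
    threshold per transaction, so n = len(sigma) and sigma_n = sigma[-1].
    """
    n = len(sigma)
    lhs = ceil(n * sigma[-1]) - count_v + floor(ceiling_sum(sigma, i))
    rhs = floor(ceiling_sum(sigma, first_trans - 1))
    return lhs > rhs


sigma = [Fraction(3, 10), Fraction(1, 2), Fraction(2, 5)]
print(forward_prunable(1, 3, 1, sigma), forward_prunable(1, 3, 2, sigma))
```

For a singleton with count 1 and firstTrans 3 under $\sigma = (0.3, 0.5, 0.4)$, the test fails after $t_1$ (1 > 1 is false) and succeeds after $t_2$ (2 > 1), because by then the singleton cannot have occurred before its firstTrans at all.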
Function PhaseII( support lattice V, transaction sequence (t_1, …, t_n), support sequence σ ): support lattice
  integer ft, i := 0
begin
  InitialPrune:
    V := V \ {v ∈ V | maxSupport(v) < σ_n}
  Rescan:
    while ∃v ∈ V: i < firstTrans(v) do
      i++
      for all v ∈ V do
        ft := firstTrans(v)
        Adjust:
          if v ⊆ t_i and ft > i then
            count(v)++, maxMissed(v)--
          fi
          if ft = i then
            maxMissed(v) := 0
            for all w ∈ V with v ⊂ w and maxSupport(w) > maxSupport(v) do
              maxMissed(w) := count(v) - count(w)
            od
          fi
        Prune:
          if maxSupport(v) < σ_n then V := V \ {v} fi
        Forward Prune:
          if |v| = 1 and v does not occur in t_1, …, t_i and
             ⌈n·σ_n⌉ - count(v) + ⌊i·avg_i(⌈σ⌉_i)⌋ > ⌊(ft - 1)·avg_{ft-1}(⌈σ⌉_{ft-1})⌋ then
            V := V \ {w ∈ V | v ⊆ w}
          fi
      od
    od
  return V
end
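A minimal Python sketch of the PhaseII pseudocode, under stated assumptions: the lattice is a dict from itemsets to a hypothetical `[count, firstTrans, maxMissed]` list and is refined in place; thresholds are exact `Fraction` values; the forward-pruning check keeps a set of the items seen so far instead of re-reading $t_1, \ldots, t_i$; Prune removes only v, as on the slide. The input lattice in the usage lines is written out by hand for the slide example and is hypothetical.

```python
from fractions import Fraction
from math import ceil, floor
from typing import Dict, FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def ceiling_sum(sigma: Sequence[Fraction], i: int) -> Fraction:
    """i * avg_i(ceil(sigma)_i): the sum of the suffix maxima of sigma_1..sigma_i."""
    total, running_max = Fraction(0), Fraction(0)
    for j in range(i - 1, -1, -1):
        running_max = max(running_max, sigma[j])
        total += running_max
    return total


def phase_two(V: Dict[Itemset, List[int]], transactions: Sequence[Itemset],
              sigma: Sequence[Fraction]) -> Dict[Itemset, List[int]]:
    """Refine V = {itemset: [count, firstTrans, maxMissed]} in place and return it."""
    n = len(transactions)
    threshold = n * sigma[n - 1]  # sigma_n expressed as an occurrence count

    def max_occ(v: Itemset) -> int:  # n * maxSupport(v)
        return V[v][2] + V[v][0]

    # InitialPrune: drop trivially small itemsets.
    for v in [v for v in V if max_occ(v) < threshold]:
        del V[v]
    i = 0
    seen: set = set()  # items occurring in t_1..t_i
    # Rescan until i has reached firstTrans(v) for every v still in V.
    while any(i < entry[1] for entry in V.values()):
        i += 1
        t = transactions[i - 1]
        seen |= t
        for v in list(V):
            if v not in V:  # already removed earlier in this pass
                continue
            entry = V[v]
            ft = entry[1]
            # Adjust: count occurrences that PhaseI missed before firstTrans(v).
            if v <= t and ft > i:
                entry[0] += 1
                entry[2] -= 1
            if ft == i:
                entry[2] = 0  # count(v) is now exact
                for w in V:
                    if v < w and max_occ(w) > entry[0]:
                        V[w][2] = entry[0] - V[w][0]
            # Prune
            if max_occ(v) < threshold:
                del V[v]
                continue
            # Forward prune (Theorem 2): v and all its supersets leave V.
            if (len(v) == 1 and v.isdisjoint(seen)
                    and ceil(threshold) - entry[0] + floor(ceiling_sum(sigma, i))
                    > floor(ceiling_sum(sigma, ft - 1))):
                for w in [w for w in V if v <= w]:
                    del V[w]
    return V


T = [frozenset("abc"), frozenset("ab"), frozenset("bd")]
sigma = [Fraction(3, 10), Fraction(1, 2), Fraction(2, 5)]
# Hypothetical PhaseI lattice for this example: [count, firstTrans, maxMissed].
V = {frozenset(): [3, 0, 0], frozenset("a"): [2, 1, 0], frozenset("b"): [3, 1, 0],
     frozenset("ab"): [1, 2, 1], frozenset("d"): [1, 3, 1]}
result = phase_two(V, T, sigma)
for itemset, (count, _, _) in result.items():
    print(sorted(itemset), count)
```

With this input, {d} is removed by forward pruning while the second transaction is processed, so the rescan stops at i = 2, one transaction before firstTrans({d}). The result holds ∅, {a}, {b} and {a, b} with exact counts 3, 2, 3 and 2. Because the function mutates its argument, calling it twice on the same dict would double-count.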
CARMA
Function CARMA( transaction sequence T, support sequence σ ): support lattice
  support lattice V
begin
  V := PhaseI( T, σ )
  V := PhaseII( V, T, σ )
  return V
end