Title: MAIDS: Mining Alarming Incidents in Data Streams
1. MAIDS: Mining Alarming Incidents in Data Streams
- A discussion on the MAIDS project
- August 22, 2003
2. FP-Growth (1): FP-Tree Construction
TID  Items bought              (Ordered) frequent items
100  f, a, c, d, g, i, m, p    f, c, a, m, p
200  a, b, c, f, l, m, o       f, c, a, b, m
300  b, f, h, j, o, w          f, b
400  b, c, k, s, p             c, b, p
500  a, f, c, e, l, p, m, n    f, c, a, m, p
min_support = 3
- Scan DB once, find frequent 1-itemsets (single items)
- Sort frequent items in frequency-descending order: the f-list
- Scan DB again, construct the FP-tree
F-list = f-c-a-b-m-p
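The two database scans above can be sketched in Python. This is a minimal illustration, not the original implementation: the class and function names are hypothetical and items are single characters. Because f and c (and likewise a, b, m, p) tie in frequency, the tie order is arbitrary, so the sketch accepts an explicit f-list to reproduce the slide's order f-c-a-b-m-p.

```python
from collections import Counter


class FPNode:
    """One FP-tree node: an item, its count, a parent link and children keyed by item."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fptree(transactions, min_support, flist=None):
    """Two-scan FP-tree construction; returns (root, header table, f-list)."""
    # Scan 1: support of every single item (each transaction counted once per item)
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_support}
    if flist is None:
        # frequency-descending order; ties broken alphabetically (an arbitrary choice)
        flist = sorted(frequent, key=lambda item: (-frequent[item], item))
    else:
        flist = [item for item in flist if item in frequent]
    rank = {item: r for r, item in enumerate(flist)}

    root = FPNode(None, None)
    header = {item: [] for item in flist}  # item -> its nodes (the node-links)
    # Scan 2: insert each transaction's frequent items in f-list order, sharing prefixes
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.__getitem__):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header, flist


DB = [list("facdgimp"), list("abcflmo"), list("bfhjow"), list("bcksp"), list("afcelpmn")]
root, header, flist = build_fptree(DB, min_support=3, flist="fcabmp")
```

Summing the counts along each item's node-links gives back its support (f:4, c:4, a:3, b:3, m:3, p:3). Without an explicit f-list, the alphabetical tie-break yields c-f-a-b-m-p instead, which builds a differently shaped but equally valid tree.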
3. FP-Growth (2): FP-Tree Mining
- Start at the frequent-item header table of the FP-tree
- Traverse the FP-tree by following the node-links of each frequent item p
- Accumulate all transformed prefix paths of item p to form p's conditional pattern base
Conditional pattern bases
item  conditional pattern base
c     f:3
a     fc:3
b     fca:1, f:1, c:1
m     fca:2, fcab:1
p     fcam:2, cb:1
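Following an item's node-links and climbing parent pointers to the root yields its conditional pattern base. The sketch below (names hypothetical) rebuilds the slide-2 tree with the given f-list and collects each prefix path root-first, weighted by the count of the node it was collected from; it reproduces the conditional pattern bases in the table above.

```python
class FPNode:
    """FP-tree node with an item, a count, a parent link and children keyed by item."""

    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}


def build_fptree(transactions, flist):
    """Insert each transaction's f-list items in f-list order; return (root, header)."""
    rank = {item: r for r, item in enumerate(flist)}
    root, header = FPNode(None, None), {item: [] for item in flist}
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.__getitem__):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])  # extend the node-link chain
            node = node.children[item]
            node.count += 1
    return root, header


def conditional_pattern_base(header, item):
    """Prefix paths (root first) of every node holding `item`, weighted by its count."""
    base = []
    for node in header[item]:  # follow the node-links of `item`
        path, parent = [], node.parent
        while parent.item is not None:  # climb until the root (whose item is None)
            path.append(parent.item)
            parent = parent.parent
        if path:  # a node directly under the root contributes no prefix
            base.append((tuple(reversed(path)), node.count))
    return base


DB = [list("facdgimp"), list("abcflmo"), list("bfhjow"), list("bcksp"), list("afcelpmn")]
root, header = build_fptree(DB, "fcabmp")
bases = {item: conditional_pattern_base(header, item) for item in "cabmp"}
```

For example, `bases["m"]` is `[(('f', 'c', 'a'), 2), (('f', 'c', 'a', 'b'), 1)]`, i.e. fca:2 and fcab:1.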
4. Mining Frequent Patterns for Stream Data
- Frequent pattern mining is valuable in stream applications
  - e.g., network intrusion mining (Dokas et al. '02)
- Mining precise frequent patterns in stream data is unrealistic
  - Even storing them in a compressed form, such as an FP-tree, is unrealistic
- How can frequent patterns be mined with good approximation?
  - Approximate frequent patterns (Manku & Motwani, VLDB'02)
  - Major idea: do not trace an item until it first becomes frequent
  - Adv: guaranteed error bound
  - Disadv: a large set of traces must be kept
- Our comments
  - Keep only the currently frequent patterns? Then no changes can be detected
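The approximate approach cited above (Manku & Motwani, VLDB'02) is best known as Lossy Counting. A minimal single-item sketch follows; the paper also extends the idea to itemsets, which is not shown. The stream is cut into buckets of width ⌈1/ε⌉, each tracked item keeps a count f and a maximum undercount Δ, and at every bucket boundary entries with f + Δ ≤ (current bucket id) are pruned, so only items that may still become frequent stay traced. Function names are hypothetical.

```python
import math


def lossy_count(stream, epsilon):
    """Lossy Counting over single items; returns ({item: [f, delta]}, N)."""
    width = math.ceil(1 / epsilon)  # bucket width w
    entries = {}
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)  # id of the current bucket
        if item in entries:
            entries[item][0] += 1
        else:
            # delta bounds how many occurrences may have been missed before tracking began
            entries[item] = [1, bucket - 1]
        if n % width == 0:  # bucket boundary: prune entries that cannot be frequent
            for key in [k for k, (f, delta) in entries.items() if f + delta <= bucket]:
                del entries[key]
    return entries, n


def frequent_items(entries, n, support, epsilon):
    """Items whose estimated count reaches (support - epsilon) * N: no false negatives."""
    return {k for k, (f, _) in entries.items() if f >= (support - epsilon) * n}


stream = ["a"] * 50 + list("bcdef") * 10
entries, n = lossy_count(stream, epsilon=0.1)
```

This is where the error bound comes from: every retained count f undercounts the true frequency by at most εN. With ε = 0.1 and support 0.3 on this 100-item stream (a occurs 50 times, b to f 10 times each), only a is reported.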
5. Our Approach to Frequent Stream Patterns
- Approach 1: Mining only itemsets of interest
  - Identify the items of interest in the stream environment
  - Keep precise/compressed history in a tilted-time window
  - Mine using the FP-tree and related fast mining methods
- Approach 2: Mining approximate itemsets (with error bounds)
  - C. Giannella, J. Han, J. Pei, X. Yan and P. S. Yu, "Mining Frequent Patterns in Data Streams at Multiple Time Granularities," Next Gen. Data Mining, MIT Press, 2003
  - Keep pattern-trees in the tilted-time window frame (using a tree-sharing method)
  - Mine the evolution and dramatic changes of frequent patterns
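A tilted-time window keeps recent history at fine granularity and older history at coarser granularity. The sketch below assumes a natural tilted-time window with hypothetical capacities of 4 quarter-hours, 24 hours and 31 days; when a level is full, its units are summed into one unit of the next coarser level, and the coarsest level simply forgets its oldest unit. The class name and capacities are illustrative; other schemes, such as logarithmic tilted-time windows, also exist.

```python
class TiltedWindow:
    """A natural tilted-time window: level i holds at most caps[i] counts.

    With the hypothetical caps (4, 24, 31), level 0 holds quarter-hour counts,
    level 1 hourly counts and level 2 daily counts.
    """

    def __init__(self, caps=(4, 24, 31)):
        self.caps = caps
        self.levels = [[] for _ in caps]  # each level lists its counts oldest first

    def push(self, count, level=0):
        """Record the count of the newest unit at `level`."""
        slots = self.levels[level]
        if len(slots) == self.caps[level]:
            if level + 1 < len(self.caps):
                # a full level is rolled up into one unit of the next coarser level
                self.push(sum(slots), level + 1)
                slots.clear()
            else:
                slots.pop(0)  # coarsest level: the oldest unit is forgotten
        slots.append(count)

    def total(self):
        """Sum of all counts still held in the window."""
        return sum(sum(slots) for slots in self.levels)


window = TiltedWindow()
for quarter_count in [1, 2, 3, 4, 5]:
    window.push(quarter_count)
```

After these five pushes, `window.levels` is `[[5], [10], []]`: the first four quarters have been merged into one hour with count 10, while the newest quarter stays at full resolution.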
6. FP-tree: Tilted-Time Window in Each Tree Node
- Each node in the FP-tree has a tilted-time window
- Merge counts when time crosses a window boundary
- Easy to trace object evolution
- Hard to derive patterns for all objects within one period
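A sketch of this design, assuming a shared item order and a hypothetical natural tilted-time window: every node accumulates a count for the ongoing period, and at each period boundary all nodes push that count into their own windows. Nodes created in a later period are padded with zero counts so that all windows merge at the same boundaries. Class and method names are hypothetical.

```python
class TiltedWindow:
    """Natural tilted-time window; level i keeps at most caps[i] counts (oldest first)."""

    def __init__(self, caps):
        self.caps = caps
        self.levels = [[] for _ in caps]

    def push(self, count, level=0):
        slots = self.levels[level]
        if len(slots) == self.caps[level]:
            if level + 1 < len(self.caps):
                self.push(sum(slots), level + 1)  # merge counts across the boundary
                slots.clear()
            else:
                slots.pop(0)  # coarsest level: drop the oldest unit
        slots.append(count)


class Node:
    def __init__(self, caps, elapsed_periods):
        self.children = {}
        self.current = 0  # count accumulated in the ongoing period
        self.window = TiltedWindow(caps)
        for _ in range(elapsed_periods):
            self.window.push(0)  # align with nodes that existed in earlier periods


class TiltedFPTree:
    """Hypothetical FP-tree whose every node carries its own tilted-time window."""

    def __init__(self, order, caps=(4, 24, 31)):
        self.rank = {item: r for r, item in enumerate(order)}
        self.caps = caps
        self.periods = 0
        self.root = Node(caps, 0)

    def add(self, transaction):
        node = self.root
        for item in sorted((i for i in set(transaction) if i in self.rank),
                           key=self.rank.__getitem__):
            if item not in node.children:
                node.children[item] = Node(self.caps, self.periods)
            node = node.children[item]
            node.current += 1

    def close_period(self):
        """Time crosses a period boundary: move every node's count into its window."""
        stack = list(self.root.children.values())
        while stack:
            node = stack.pop()
            node.window.push(node.current)
            node.current = 0
            stack.extend(node.children.values())
        self.periods += 1

    def history(self, path):
        """Tilted-time history of the node reached by `path` (items in order)."""
        node = self.root
        for item in path:
            node = node.children[item]
        return node.window.levels


DB = [list("facdgimp"), list("abcflmo"), list("bfhjow"), list("bcksp"), list("afcelpmn")]
tree = TiltedFPTree("fcabmp", caps=(2, 2))  # tiny caps so that merging is visible
for t in DB[:2]:
    tree.add(t)
tree.close_period()
for t in DB[2:]:
    tree.add(t)
tree.close_period()
```

Reading one node's window, e.g. `tree.history("fc")`, traces that path's evolution directly, which is the strength noted on the slide; collecting all patterns of one period, by contrast, requires visiting every node.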
7. FP-tree: A Tree in Each Tilted-Time Window Slot
- Each time slot has an FP-tree (for that time period)
- Merge FP-trees when time crosses a window boundary
- Hard to trace object evolution
- Easy to derive patterns within one period
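In this alternative design each time slot keeps its own FP-tree. A minimal sketch, assuming a nested-dict tree representation and that all slot trees insert items in one shared, fixed order (the hypothetical `ORDER` below): merging two adjacent slots into one coarser slot then amounts to summing counts along identical paths. If each slot used its own f-list, the paths would not line up and the trees could not be merged this simply.

```python
# Hypothetical representation: a tree is {item: [count, children_tree]}.
def build(transactions, order):
    """Build the FP-tree of one time slot, inserting items in the shared `order`."""
    rank = {item: r for r, item in enumerate(order)}
    tree = {}
    for t in transactions:
        node = tree
        for item in sorted((i for i in set(t) if i in rank), key=rank.__getitem__):
            entry = node.setdefault(item, [0, {}])
            entry[0] += 1
            node = entry[1]
    return tree


def merge(a, b):
    """Merge two slot trees into one coarser-slot tree by summing counts path by path."""
    merged = {}
    for item in a.keys() | b.keys():
        count_a, kids_a = a.get(item, (0, {}))
        count_b, kids_b = b.get(item, (0, {}))
        merged[item] = [count_a + count_b, merge(kids_a, kids_b)]
    return merged


ORDER = "fcabmp"
DB = [list("facdgimp"), list("abcflmo"), list("bfhjow"), list("bcksp"), list("afcelpmn")]
slot1, slot2 = build(DB[:2], ORDER), build(DB[2:], ORDER)
combined = merge(slot1, slot2)
```

The merged tree equals the tree built from both slots' transactions at once. Each slot's tree already holds that period's patterns, but tracing one itemset over time means looking it up in every slot tree.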
8. Frequent-Pattern Growth Approach
- Depth-first growth of patterns using local frequent items in projected databases: a divide-and-conquer approach
  - FPGrowth (Han et al. @ SIGMOD'00)
  - Tree-Projection (Agarwal et al. @ J. P. D. Comp.'01)
  - Opportunistic Projection (OP) (Liu et al. @ KDD'02)
  - Mining closed itemsets: CLOSET (Pei et al. @ DMKD'00), CLOSET+ (Wang et al. @ KDD'03)
- Mine only frequent 1-itemsets in each projected DB, and grow patterns in the corresponding projected DBs
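The divide-and-conquer principle in the last bullet can be sketched with plain transaction lists instead of FP-trees: find the locally frequent single items of the current projected database and, for each one, recurse on the transactions that contain it, keeping only items that come later in a fixed order so that each pattern is generated once. This is a projection-based illustration of pattern growth, not FP-growth's conditional-tree implementation; the function name is hypothetical.

```python
from collections import Counter


def pattern_growth(db, min_support, prefix=()):
    """Yield (itemset, support) for every frequent itemset by recursive projection."""
    # local frequent 1-itemsets of this (projected) database
    counts = Counter(item for t in db for item in set(t))
    local = sorted(item for item, c in counts.items() if c >= min_support)
    for k, item in enumerate(local):
        pattern = prefix + (item,)
        yield frozenset(pattern), counts[item]
        # project on `item`: keep transactions containing it, and only items later
        # in the order, so that every itemset is generated exactly once
        later = set(local[k + 1:])
        projected = [[i for i in t if i in later] for t in db if item in t]
        yield from pattern_growth(projected, min_support, pattern)


DB = [list("facdgimp"), list("abcflmo"), list("bfhjow"), list("bcksp"), list("afcelpmn")]
patterns = dict(pattern_growth(DB, 3))
```

On the five-transaction example from slide 2 with min_support = 3, this finds 18 frequent itemsets, including {f, c, a, m} with support 3.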
9. Query Item-Based Mining of FP-tree
- Most queries are interested in item-centered patterns
- Item-based mining of the FP-tree: extraction and mining
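Item-centered mining can be sketched by restricting the database to transactions that contain the query item, mining that projection, and adding the query item back to every pattern found. For brevity the sketch projects plain transaction lists; on an FP-tree, the same projection would come from following the query item's node-links to its conditional pattern base. Function names are hypothetical.

```python
from collections import Counter


def mine(db, min_support, prefix=()):
    """Projection-based pattern growth: yield (itemset, support) pairs."""
    counts = Counter(i for t in db for i in set(t))
    local = sorted(i for i, c in counts.items() if c >= min_support)
    for k, item in enumerate(local):
        pattern = prefix + (item,)
        yield frozenset(pattern), counts[item]
        later = set(local[k + 1:])  # only later items, so each itemset appears once
        projected = [[i for i in t if i in later] for t in db if item in t]
        yield from mine(projected, min_support, pattern)


def item_centered_patterns(db, query, min_support):
    """All frequent patterns that contain `query`, mapped to their supports."""
    # extraction: transactions containing the query item, with the item itself removed
    projected = [[i for i in t if i != query] for t in db if query in t]
    if len(projected) < min_support:
        return {}
    result = {frozenset([query]): len(projected)}
    # mining: every frequent pattern of the projection, extended by the query item
    for pattern, support in mine(projected, min_support):
        result[pattern | {query}] = support
    return result


DB = [list("facdgimp"), list("abcflmo"), list("bfhjow"), list("bcksp"), list("afcelpmn")]
m_patterns = item_centered_patterns(DB, "m", 3)
```

For query item m on the slide-2 database with min_support = 3, this returns 8 patterns (m, fm, cm, am, fcm, fam, cam, fcam), each with support 3, without mining patterns that do not involve m.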
10. www.cs.uiuc.edu/hanj