Title: Mining frequent patterns in a database
1Mining frequent patterns in a database
- Advisor: Anthony J. T. Lee
- Presenter: Chun-sheng Wang
2Outline
- Introduce the basic concept of data mining
- Topic 1: Mining frequent itemsets with multi-dimensional constraints
- Topic 2: Mining inter-transactional patterns
- Topic 3: Mining inter-sequence patterns
- Topic 4: Mining closed patterns
- Topic 5: Mining patterns in a time-series database
3Basic Concept of Data Mining
- Data mining is the task of discovering knowledge from a large amount of data.
- One of the fundamental data mining problems is to mine frequent patterns.
- Frequent pattern mining discovers all the patterns whose support in the database is no less than a user-specified threshold.
4Applications
- Association rules
- Financial data analysis
- Web traversal analysis
- Weather forecasting
- Bioinformatic data mining
- Multimedia data mining
- Traversal path analysis
5Association Rules
- An association rule is of the form X → Y, where X and Y are both frequent itemsets in the given database and X ∩ Y = ∅.
- The support of X → Y is the percentage of transactions in the given database that contain both X and Y, i.e., P(X ∪ Y).
- The confidence of X → Y is the percentage of transactions in the given database containing X that also contain Y, i.e., P(Y | X).
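The two measures can be computed directly from their definitions. The sketch below is only an illustration: the four-transaction database `DB` and the function names are assumptions, not part of the slides.

```python
# Hypothetical database of four transactions (an assumption for illustration).
DB = [{"c"}, {"a", "c", "d"}, {"b", "c"}, {"a", "b", "c", "d"}]

def support(itemset, db):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db) / len(db)

def confidence(x, y, db):
    """Fraction of the transactions containing X that also contain Y."""
    return support(set(x) | set(y), db) / support(x, db)

# Rule d -> c on the hypothetical database.
rule_support = support({"d", "c"}, DB)          # P(X ∪ Y)
rule_confidence = confidence({"d"}, {"c"}, DB)  # P(Y | X)
print(rule_support, rule_confidence)
```

On this database the rule d → c is contained in two of the four transactions, and every transaction containing d also contains c.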
6Frequent Itemsets Mining
- Frequent itemsets:
- (a):2, (b):2, (c):4, (d):2
- (ac):2, (ad):2, (bc):2, (cd):2
- (acd):2
- d → c
- support = 50%
- confidence = 100%
7Sequential Patterns
- A sequence is an ordered list of itemsets, denoted by <s1 s2 ... sl>, where sj is an itemset.
- sj is also called an element of the sequence, and is denoted as (x1 x2 ... xm), where xk is an item.
- The support of a sequence α in a sequence database is the number of sequences containing α.
- A sequence α is called a sequential pattern if its support ≥ min-support.
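The containment test behind the support count can be sketched as follows. This is a minimal illustration: the sequence database `SEQ_DB` is hypothetical, and sequences are modelled as Python lists of sets.

```python
def contains(sequence, pattern):
    """True if `pattern` is a subsequence of `sequence`.

    Both are lists of itemsets (sets). Each pattern element must be a subset
    of a distinct sequence element, and the order must be preserved.
    """
    pos = 0
    for element in pattern:
        # Greedily take the earliest sequence element that covers `element`.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def seq_support(pattern, db):
    """Number of sequences in `db` that contain `pattern`."""
    return sum(contains(s, pattern) for s in db)

# Hypothetical sequence database (an assumption for illustration).
SEQ_DB = [
    [{"c"}, {"a", "b"}],
    [{"c"}, {"a", "b"}],
    [{"a"}],
    [{"a"}],
]

sup_a = seq_support([{"a"}], SEQ_DB)
sup_c_ab = seq_support([{"c"}, {"a", "b"}], SEQ_DB)
sup_a_b = seq_support([{"a"}, {"b"}], SEQ_DB)
```

With min-support 2, <(a)> and <(c),(ab)> are sequential patterns in this database, while <(a),(b)> is not, because a and b never occur in separate, ordered elements.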
8Sequential Patterns Mining
- Frequent patterns:
- <(a)>:4, <(b)>:2, <(c)>:2
- <(ab)>:2, <(c),(a)>:2
- <(c),(b)>:2
- <(c),(ab)>:2
9Algorithm of Mining Frequent Itemsets
- Apriori
- Candidate generation-and-test
- Level-wise: it iteratively generates candidate k-itemsets (Ck) from the previously found frequent (k-1)-itemsets (Lk-1), and then checks the supports of the candidates to form the frequent k-itemsets (Lk).
- Lk-1 → Ck → Lk
10Apriori example
- L1: (a):2, (b):2, (c):4, (d):2
- C2: (ab), (ac), (ad), (bc), (bd), (cd)
- L2: (ac):2, (ad):2, (bc):2, (cd):2
- C3: (acd)
- L3: (acd):2
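The level-wise loop can be sketched in Python as follows. This is a simplified illustration, not the original Apriori implementation: the database `DB` is an assumption, chosen to be consistent with the counts in the example above, and the join step is a plain pairwise union rather than the prefix-based join of the original paper.

```python
from itertools import combinations

# Assumed database, consistent with the counts in the example (min support 2).
DB = [{"c"}, {"a", "c", "d"}, {"b", "c"}, {"a", "b", "c", "d"}]

def apriori(db, min_sup):
    """Return {frozenset: support count} for all frequent itemsets."""
    def count(candidates):
        return {c: sum(c <= t for t in db) for c in candidates}

    items = {frozenset([i]) for t in db for i in t}
    level = {c: n for c, n in count(items).items() if n >= min_sup}  # L1
    frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # Test: count supports and keep the frequent candidates (Lk).
        level = {c: n for c, n in count(candidates).items() if n >= min_sup}
        frequent.update(level)
        k += 1
    return frequent

FREQUENT = apriori(DB, min_sup=2)
```

On the assumed database, the pruning step removes (abc) and (bcd) because (ab) and (bd) are infrequent, which leaves (acd) as the only candidate in C3.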
11Algorithm of Mining Frequent Itemsets (contd)
- FP-growth
- The method constructs a compressed frequent pattern tree, called an FP-tree.
- A divide-and-conquer strategy recursively decomposes the mining task into a set of smaller tasks in conditional databases, and concatenates the suffix itemset with the frequent itemsets generated from a conditional FP-tree.
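A compact sketch of this strategy is given below. It is an illustration under assumptions, not the authors' code: the database `DB` is hypothetical, the names `Node`, `build_tree` and `fp_growth` are made up for the example, and the single-path optimization of the original FP-growth is omitted.

```python
from collections import defaultdict

class Node:
    """One FP-tree node: an item, a link to its parent, a count, and children."""
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(transactions, min_sup):
    """Build an FP-tree from (items, count) pairs; return (header table, item counts)."""
    counts = defaultdict(int)
    for items, c in transactions:
        for i in set(items):
            counts[i] += c
    freq = {i: c for i, c in counts.items() if c >= min_sup}
    root, header = Node(None, None), defaultdict(list)
    for items, c in transactions:
        # Keep frequent items only, most frequent first, so prefixes are shared.
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for i in ordered:
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += c
    return header, freq

def fp_growth(transactions, min_sup, suffix=frozenset()):
    """Return {frozenset: support count}, mined by recursive decomposition."""
    header, freq = build_tree(transactions, min_sup)
    result = {}
    for item, nodes in header.items():
        pattern = suffix | {item}  # concatenate the suffix with the new item
        result[pattern] = freq[item]
        # Conditional database: the prefix path of every node holding `item`.
        conditional = []
        for n in nodes:
            path, p = [], n.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                conditional.append((path, n.count))
        result.update(fp_growth(conditional, min_sup, pattern))
    return result

# Hypothetical database (an assumption for illustration), min support 2.
DB = [{"c"}, {"a", "c", "d"}, {"b", "c"}, {"a", "b", "c", "d"}]
PATTERNS = fp_growth([(t, 1) for t in DB], min_sup=2)
```

No candidate itemsets are generated: each recursive call works only on the prefix paths that end in the current suffix item.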
12(Figure: FP-tree example; surviving labels T and Td)
13Algorithm of Mining Sequential Patterns-PrefixSpan
- It first finds the length-1 sequential patterns in the target database, and then partitions the database into smaller projected databases, one for the prefix given by each sequential pattern found so far.
- The sequential patterns can be mined by constructing the corresponding projected databases, each of which can be mined recursively.
- It preserves the element order in the mining process.
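The projection idea can be sketched for a simplified case. The code below assumes every element of a sequence is a single item (the full algorithm also handles multi-item elements), and the database `SEQS` is hypothetical.

```python
from collections import Counter

def prefixspan(db, min_sup, prefix=()):
    """Return {pattern tuple: support} for sequences of single items.

    Simplification (an assumption of this sketch): each sequence is a tuple
    of items, so itemset elements with several items are not handled.
    """
    # Count each item once per (projected) sequence.
    counts = Counter(item for seq in db for item in set(seq))
    patterns = {}
    for item, sup in counts.items():
        if sup < min_sup:
            continue
        new_prefix = prefix + (item,)
        patterns[new_prefix] = sup
        # Projected database: what follows the first occurrence of `item`.
        projected = [seq[seq.index(item) + 1:] for seq in db if item in seq]
        patterns.update(prefixspan(projected, min_sup, new_prefix))
    return patterns

# Hypothetical sequence database (an assumption for illustration).
SEQS = [("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b")]
SEQ_PATTERNS = prefixspan(SEQS, min_sup=2)
```

Because each projected database keeps only the suffixes after the prefix, the order of elements is preserved and no candidate sequences need to be generated.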
15Topic 1 Mining Frequent Itemsets with
Multi-dimensional Constraints
- Frequent itemset mining often generates a very large number of frequent itemsets.
- Only a subset of the frequent itemsets and association rules is of interest to users.
- Users need additional post-processing to find the useful ones.
- Constraint-based mining pushes user-specified constraints deep inside the mining process to improve performance.
- With multi-dimensional items, constraints can be imposed on multiple dimensional attributes.
16Topic 1 Mining Frequent Itemsets with
Multi-dimensional Constraints
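Multi-dimensional Constraints
- Item table columns: itemID, a1, a2, ..., am
- Item ik has attribute values (k1, k2, ..., km)
- Itemset A has attribute values iA = (A1, A2, ..., Am), where A1 = A.a1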
17Topic 1 Mining Frequent Itemsets with
Multi-dimensional Constraints
- Multi-dimensional constraints can be categorized according to constraint properties:
- anti-monotone, monotone, convertible and inconvertible
- They can also be classified according to the number of sub-constraints included:
- Single constraint against multiple dimensions,
- Ex: max(S.cost) ≤ min(S.price)
- Conjunction and/or disjunction of multiple sub-constraints,
- Ex: (C1: S.cost ? v1) ? (C2: S.price ? v2)
18Anti-monotone and Monotone
- Anti-monotone: A constraint Ca is anti-monotone if and only if whenever an itemset S violates Ca, so does any superset of S.
- Ex: min(S) ≥ v
- Monotone: A constraint Cm is monotone if and only if whenever an itemset S satisfies Cm, so does any superset of S.
- Ex: max(S) ≥ v
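Both definitions can be checked by brute force on a small universe of values. The sketch below is only an illustration: the values in `VALUES` and the threshold `V` are hypothetical, and an exhaustive check on one sample is a sanity test, not a proof.

```python
from itertools import combinations

def subsets(values):
    """All non-empty subsets of `values`, as tuples."""
    return [s for k in range(1, len(values) + 1)
            for s in combinations(values, k)]

def is_anti_monotone(constraint, values):
    """Whenever S violates the constraint, every superset of S violates it too."""
    all_sets = subsets(values)
    return all(not constraint(t)
               for s in all_sets if not constraint(s)
               for t in all_sets if set(s) <= set(t))

def is_monotone(constraint, values):
    """Whenever S satisfies the constraint, every superset of S satisfies it too."""
    all_sets = subsets(values)
    return all(constraint(t)
               for s in all_sets if constraint(s)
               for t in all_sets if set(s) <= set(t))

VALUES = [3, 5, 8, 12]  # hypothetical attribute values of four items
V = 6                   # hypothetical threshold v

def min_ge_v(s):
    return min(s) >= V

def max_ge_v(s):
    return max(s) >= V

checks = (
    is_anti_monotone(min_ge_v, VALUES),  # adding items can only lower min(S)
    is_monotone(max_ge_v, VALUES),       # adding items can only raise max(S)
    is_monotone(min_ge_v, VALUES),
    is_anti_monotone(max_ge_v, VALUES),
)
```

An anti-monotone constraint lets a miner stop extending an itemset as soon as it is violated; a monotone constraint needs no further checking once it is satisfied.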
19Topic 1 Mining Frequent Itemsets with
Multi-dimensional Constraints
- We extend constraints to multi-dimensional itemsets and develop algorithms that mine frequent itemsets with multi-dimensional constraints by extending CFG (Constrained Frequent Pattern Growth).
- Overview of our algorithm:
- Phase 1: Frequency check
- Phase 2: Constraint check
- Phase 3: Conditional database construction
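The three phases can be sketched as a small recursive pattern-growth routine. This is not the authors' algorithm, only a minimal illustration under assumptions: it handles a single anti-monotone constraint, max(S.cost) ≤ min(S.price), the `COST` and `PRICE` values are hypothetical, and the four transactions follow the database of the example slide.

```python
from collections import Counter

# Hypothetical attribute values per item (assumptions for illustration).
COST = {"a": 10, "b": 40, "c": 20, "d": 15}
PRICE = {"a": 30, "b": 50, "c": 25, "d": 35}
DB = [{"c"}, {"c", "a", "d"}, {"c", "b"}, {"c", "a", "b", "d"}]

def c_am(itemset):
    """Anti-monotone constraint: max(S.cost) <= min(S.price)."""
    return max(COST[i] for i in itemset) <= min(PRICE[i] for i in itemset)

def mine(db, min_sup, constraint, suffix=()):
    """Return {frozenset: support} of frequent itemsets that satisfy `constraint`."""
    counts = Counter(i for t in db for i in t)
    # Phase 1: frequency check.
    frequent = sorted(i for i, c in counts.items() if c >= min_sup)
    results = {}
    for idx, item in enumerate(frequent):
        pattern = suffix + (item,)
        # Phase 2: constraint check. A violation is final for all supersets,
        # so the branch is pruned.
        if not constraint(pattern):
            continue
        results[frozenset(pattern)] = counts[item]
        # Phase 3: conditional database of `item`, restricted to the items
        # that precede it in the order, so each itemset is generated once.
        allowed = set(frequent[:idx])
        conditional = [t & allowed for t in db if item in t]
        conditional = [t for t in conditional if t]
        results.update(mine(conditional, min_sup, constraint, pattern))
    return results

CONSTRAINED = mine(DB, min_sup=2, constraint=c_am)
```

With these hypothetical values, (bc) is frequent but violates the constraint (cost 40 of b exceeds price 25 of c), so it is pruned inside the mining process instead of being filtered afterwards.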
20Example: Cam ≡ max(S.cost) ≤ min(S.price)
- Database: {c}, {c,a,d}, {c,b}, {c,a,b,d}
- Frequent items: c, a, b, d
- d-conditional database: {c,a}, {c,a}
- Frequent items: c, a
- a,d-conditional database
21Results
22- Anthony J. T. Lee, Wan-chuen Lin and Chun-sheng
Wang, Mining association rules with
multi-dimensional constraints, The Journal of
Systems and Software, Vol. 79, No. 1, pp. 79-92,
2006.
23Topic 2 Mining Inter-transactional Patterns
- A transaction could be the items bought by the same customer, the events that happened on the same day, and so on.
- Intra-transactional association rules: associations among items within the same transaction.
- Ex: buy(X, diapers) → buy(X, beer) [support = 80%]
- Inter-transactional association rules: association relations across different transactions.
- Ex: If the prices of IBM and SUN go up, Microsoft's will most likely (80%) increase the next day.
24Topic 2 Mining Inter-transactional Patterns
- Extended transaction: denoted by
- z(t2-t1) = {u1(t2-t1), u2(t2-t1), ..., un(t2-t1)}
- Extended item: denoted as u1(t2-t1)
- Reference point: t1
- Megatransaction: y = z1(0) ∪ z2(t2-t1) ∪ ... ∪ zk(tk-t1)
- Maxspan: tk-t1 < maxspan in y
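The construction of megatransactions can be sketched as follows. The code is an illustration under assumptions: the database `STOCKS` is hypothetical (an item means "the price of that stock went up on that day"), an extended item u(d) is modelled as the pair `(u, d)`, and offsets are limited to 0 ≤ d < maxspan.

```python
def megatransactions(db, maxspan):
    """Build one megatransaction per reference point.

    db: {time: set of items}. Returns {t1: set of extended items (item, offset)},
    where offset = t - t1 and 0 <= offset < maxspan.
    """
    megas = {}
    for t1 in sorted(db):
        megas[t1] = {(u, t - t1)
                     for t, items in db.items() if 0 <= t - t1 < maxspan
                     for u in items}
    return megas

def support(pattern, megas):
    """Number of megatransactions containing every extended item of `pattern`."""
    return sum(pattern <= m for m in megas.values())

# Hypothetical daily data (an assumption for illustration).
STOCKS = {1: {"IBM", "SUN"}, 2: {"MS"}, 3: {"IBM", "SUN"}, 4: {"MS"}, 5: {"IBM"}}
MEGAS = megatransactions(STOCKS, maxspan=2)

body = {("IBM", 0), ("SUN", 0)}    # IBM and SUN go up on the reference day
rule = body | {("MS", 1)}          # and MS goes up one day later
rule_confidence = support(rule, MEGAS) / support(body, MEGAS)
```

Once the megatransactions are built, an inter-transactional pattern is simply an itemset of extended items, so support and confidence are counted as in the intra-transactional case.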
25Example
26ITP-miner Example
27Results
28Topic 3 Mining Inter-sequence Patterns
Transaction ID   Transaction Time   Sequence
1                1                  <c(ab)d(ad)>
2                2                  < >
3                3                  <dd(ac)bd>
4                4                  <bc>
5                5                  <(bc)cb>
6                6                  <e(ac)bac>
7                7                  <b(ab)cc>
8                8                  <ab>
9                9                  <acc>
10               10                 <ceacc(ce)>
29Application Example
- The degree of daily stock price movement can also be viewed as a sequence.
- The inter-transaction association rule:
- If stock A and stock B go up on day 1, stock C and stock D are very likely to go up on day 3.
- Rule: Δ0(AB) → Δ2(CD)
- The inter-sequence association rule:
- If stock B goes up more than stock A on day 1, stock D is very likely to go up more than stock C on day 3.
- Rule: Δ0<BA> → Δ2<DC>
30Example
The database
31ISP-Miner Algorithm
32ISP-Miner Algorithm
33Results
34Topic 4 Mining Closed Patterns
- Frequent itemsets:
- (a):2, (b):2, (c):4, (d):2
- (ac):2, (ad):2, (bc):2, (cd):2
- (acd):2
- Closed itemsets (frequent itemsets with no proper superset of the same support):
- (c):4, (bc):2, (acd):2
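The closed itemsets can be derived from the frequent itemsets by checking each one against its supersets. The sketch below uses the counts listed on this slide; the function name `closed_itemsets` is an assumption, and a real miner would avoid this quadratic post-filter.

```python
# Frequent itemsets and their support counts, as listed on the slide.
FREQUENT = {
    frozenset("a"): 2, frozenset("b"): 2, frozenset("c"): 4, frozenset("d"): 2,
    frozenset("ac"): 2, frozenset("ad"): 2, frozenset("bc"): 2, frozenset("cd"): 2,
    frozenset("acd"): 2,
}

def closed_itemsets(frequent):
    """Keep the itemsets that have no proper superset with the same support."""
    return {s: n for s, n in frequent.items()
            if not any(s < t and n == m for t, m in frequent.items())}

CLOSED = closed_itemsets(FREQUENT)
```

Checking only the frequent supersets is sufficient, because any superset with the same support as a frequent itemset is itself frequent.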
35Topic 5 Mining Closed Patterns in a Time-series
Database
- A time-series segment is an ordered and continuous list of the form t1, t2, ..., tm describing the property of the subject over time.
- Step 1: Find the frequent segments and points in each segment-set. (suffix tree construction)
- Step 2: Use those frequent segment-sets to find the associations among them. (suffix tree closed patterns mining)
36Topic 5 Mining Closed Patterns in a Time-series
Database
37Step 1 Frequent Segment Discovery (Suffix Tree
Approach)
38Thank You!
39References
- R. Agrawal and R. Srikant, Fast algorithms for mining association rules, In Proceedings of Int. Conf. on Very Large Data Bases, pages 487-499, Santiago, Chile, September 1994.
- H. Lu, L. Feng, and J. Han, Beyond intratransaction association analysis: mining multidimensional inter-transaction association rules, ACM Transactions on Information Systems, 18(4), pages 423-454, October 2000.
- J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach, IEEE Transactions on Knowledge and Data Engineering, 16(10), pages 1424-1440, 2004.
- A. K. H. Tung, H. Lu, J. Han, and L. Feng, Efficient mining of intertransaction association rules, IEEE Transactions on Knowledge and Data Engineering, 15(1), pages 43-56, January 2003.
- J. Han, J. Pei, Y. Yin and R. Mao, Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach, Data Mining and Knowledge Discovery, 8(1):53-87, 2004.
- Anthony J. T. Lee, Wan-chuen Lin and Chun-sheng Wang, Mining association rules with multi-dimensional constraints, The Journal of Systems and Software, Vol. 79, No. 1, pp. 79-92, 2006.