Mining frequent patterns in a database

1
Mining frequent patterns in a database
  • Advisor: Anthony J. T. Lee
  • Presenter: Chun-sheng Wang

2
Outline
  • Introduce the basic concept of data mining
  • Topic 1: Mining frequent itemsets with
    multi-dimensional constraints
  • Topic 2: Mining inter-transactional patterns
  • Topic 3: Mining inter-sequence patterns
  • Topic 4: Mining closed patterns
  • Topic 5: Mining patterns in a time-series database

3
Basic Concept of Data Mining
  • Data mining is the task of discovering knowledge
    from a large amount of data.
  • One of the fundamental data mining problems is
    mining frequent patterns.
  • Frequent pattern mining discovers all patterns
    whose support in the database is at least a
    user-specified threshold.

4
Applications
  • Association rules
  • Financial data analysis
  • Web traversal analysis
  • Weather forecasting
  • Bioinformatic data mining
  • Multimedia data mining
  • Traversal path analysis

5
Association Rules
  • An association rule is of the form X → Y, where X
    and Y are both frequent itemsets in the given
    database and X ∩ Y = ∅.
  • The support of X → Y is the percentage of
    transactions in the given database that contain
    both X and Y, i.e., P(X ∪ Y).
  • The confidence of X → Y is the percentage of
    transactions in the given database containing X
    that also contain Y, i.e., P(Y | X).
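
The two definitions above can be sketched in a few lines of Python. This is only an illustration: the four-transaction database and the function names `support` and `confidence` are assumptions, chosen so that the counts agree with the frequent-itemset example in this deck.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical database (an assumption, not taken verbatim from a slide).
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset("c"),
    frozenset("acd"),
    frozenset("bc"),
    frozenset("abcd"),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Estimate P(Y | X) as support(X and Y together) / support(X)."""
    both = frozenset(x) | frozenset(y)
    return support(both, transactions) / support(x, transactions)


if __name__ == "__main__":
    print("support(d -> c)    =", support("cd", TRANSACTIONS))
    print("confidence(d -> c) =", confidence("d", "c", TRANSACTIONS))
    print("confidence(c -> d) =", confidence("c", "d", TRANSACTIONS))
```

Note that support is symmetric in X and Y while confidence is not, which is why the direction of a rule matters.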

6
Frequent Itemsets Mining
  • Frequent itemsets (itemset:support count)
  • (a):2, (b):2, (c):4, (d):2
  • (ac):2, (ad):2, (bc):2, (cd):2
  • (acd):2
  • Rule: d → c
  • support = 50%
  • confidence = 100%

7
Sequential Patterns
  • A sequence is an ordered list of itemsets,
    denoted by <s1 s2 ... sl>, where sj is an itemset.
  • sj is also called an element of the sequence and
    is denoted by (x1 x2 ... xm), where xk is an item.
  • The support of a sequence α in a sequence
    database is the number of sequences containing α.
  • A sequence α is called a sequential pattern if
    its support ≥ min-support.
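
As a rough illustration of "a sequence containing α", the sketch below tests whether a pattern is a subsequence of a sequence (each pattern element must be a subset of a later and later element) and counts support. The helper names `seq`, `contains`, `sequence_support`, and the toy database are assumptions made for this example.

```python
from typing import FrozenSet, List, Sequence

Element = FrozenSet[str]


def seq(*elements: str) -> List[Element]:
    """Build a sequence from strings, e.g. seq("c", "ab") is <(c)(ab)>."""
    return [frozenset(e) for e in elements]


def contains(sequence: Sequence[Element], pattern: Sequence[Element]) -> bool:
    """True if the elements of `pattern` map, in order, to supersets in `sequence`."""
    pos = 0
    for element in sequence:
        # Greedily match the next unmatched pattern element.
        if pos < len(pattern) and pattern[pos] <= element:
            pos += 1
    return pos == len(pattern)


def sequence_support(database: List[List[Element]], pattern: Sequence[Element]) -> int:
    """Number of sequences in `database` that contain `pattern`."""
    return sum(contains(s, pattern) for s in database)


# Hypothetical sequence database (an assumption for illustration only).
DATABASE = [seq("c", "ab"), seq("c", "ab", "d"), seq("a"), seq("a")]

if __name__ == "__main__":
    print(sequence_support(DATABASE, seq("c", "ab")))
    print(sequence_support(DATABASE, seq("a")))
```

Greedy left-to-right matching is enough here because matching a pattern element as early as possible never rules out a later match.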

8
Sequential Patterns Mining
  • Frequent patterns (pattern:support count)
  • <(a)>:4, <(b)>:2, <(c)>:2
  • <(ab)>:2, <(c),(a)>:2
  • <(c),(b)>:2
  • <(c),(ab)>:2

9
Algorithm of Mining Frequent Itemsets
  • Apriori
  • Candidate generation-and-test
  • Level-wise: it iteratively generates candidate
    k-itemsets from previously found frequent
    (k-1)-itemsets, and then checks the supports of
    the candidates to form frequent k-itemsets.
  • Lk-1 → Ck → Lk

10
Apriori example
  • L1 = {(a):2, (b):2, (c):4, (d):2}
  • C2 = {(ab), (ac), (ad), (bc), (bd), (cd)}
  • L2 = {(ac):2, (ad):2, (bc):2, (cd):2}
  • C3 = {(acd)}
  • L3 = {(acd):2}
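
The level-wise loop can be sketched as follows. This is a minimal, unoptimized version of the generate-and-test idea, not the original Apriori implementation; the function name `apriori` and the four-transaction database are assumptions, with the database chosen so that it yields the L1, C2, L2, C3, and L3 sets listed above.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def apriori(transactions: List[FrozenSet[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return {frequent itemset: support count} using level-wise search."""

    def count(candidates: Iterable[FrozenSet[str]]) -> Dict[FrozenSet[str], int]:
        # Test step: scan the database and keep candidates that are frequent.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    frequent = count({frozenset([i]) for t in transactions for i in t})  # L1
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)  # L(k-1)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        frequent = count(candidates)  # Lk
        result.update(frequent)
        k += 1
    return result


# Hypothetical database (an assumption consistent with the example counts).
TRANSACTIONS = [frozenset("c"), frozenset("acd"), frozenset("bc"), frozenset("abcd")]

if __name__ == "__main__":
    for itemset, n in sorted(apriori(TRANSACTIONS, 2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print("".join(sorted(itemset)), n)
```

The prune step is what keeps C3 down to (acd) alone: (abc) and (bcd) are dropped because (ab) and (bd) are not in L2.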

11
Algorithm of Mining Frequent Itemsets (contd)
  • FP-growth
  • The method constructs a compressed frequent
    pattern tree, called FP-tree.
  • A divide-and-conquer strategy is used to
    recursively decompose the mining task into a set
    of smaller tasks in conditional databases, and to
    concatenate the suffix itemset with the frequent
    itemsets generated from a conditional FP-tree.
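
The divide-and-conquer step can be illustrated without the tree itself. The sketch below is a simplification, not FP-growth proper: it keeps each conditional database as a plain list of transactions instead of a compressed FP-tree, but it follows the same recursion of growing a suffix itemset from conditional databases. The name `pattern_growth` and the sample data are assumptions.

```python
from collections import Counter
from typing import Dict, FrozenSet, List


def pattern_growth(transactions: List[FrozenSet[str]], min_support: int,
                   suffix: FrozenSet[str] = frozenset()) -> Dict[FrozenSet[str], int]:
    """Return {itemset: support count} by recursively mining conditional databases."""
    counts = Counter(item for t in transactions for item in t)
    frequent_items = sorted(i for i, n in counts.items() if n >= min_support)
    result: Dict[FrozenSet[str], int] = {}
    for idx, item in enumerate(frequent_items):
        # Concatenate the suffix with the newly found frequent item.
        pattern = suffix | {item}
        result[pattern] = counts[item]
        # Conditional database of `item`: transactions that contain it, restricted
        # to frequent items earlier in the order so each itemset is found once.
        allowed = set(frequent_items[:idx])
        conditional = [t & allowed for t in transactions if item in t]
        conditional = [t for t in conditional if t]
        result.update(pattern_growth(conditional, min_support, pattern))
    return result


# Hypothetical database (an assumption for illustration only).
TRANSACTIONS = [frozenset("c"), frozenset("acd"), frozenset("bc"), frozenset("abcd")]

if __name__ == "__main__":
    for itemset, n in pattern_growth(TRANSACTIONS, 2).items():
        print("".join(sorted(itemset)), n)
```

A real FP-tree would additionally share common prefixes among transactions, which is where the compression comes from.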

12
(Figure: T and Td)
13
Algorithm of Mining Sequential Patterns: PrefixSpan
  • It first finds the length-1 sequential patterns in
    the target database, then partitions the database
    into smaller projected databases, one for each
    sequential pattern found, using that pattern as
    the prefix.
  • The sequential patterns can be mined by
    constructing corresponding projected databases,
    each of which can be mined recursively.
  • It preserves the element order in the mining
    process.
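
A much-simplified sketch of the prefix-projection idea follows. It assumes every element of a sequence is a single item, so it omits the itemset-element handling of the full PrefixSpan algorithm; the function name `prefixspan` and the toy database are likewise assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def prefixspan(database: List[List[str]], min_support: int,
               prefix: Tuple[str, ...] = ()) -> Dict[Tuple[str, ...], int]:
    """Return {sequential pattern: support count} for single-item elements."""
    # Count each item at most once per (projected) sequence.
    counts = Counter(item for s in database for item in set(s))
    patterns: Dict[Tuple[str, ...], int] = {}
    for item, n in sorted(counts.items()):
        if n < min_support:
            continue
        new_prefix = prefix + (item,)
        patterns[new_prefix] = n
        # Projected database: what remains after the first occurrence of `item`,
        # which keeps the original element order intact.
        projected = [s[s.index(item) + 1:] for s in database if item in s]
        patterns.update(prefixspan(projected, min_support, new_prefix))
    return patterns


# Hypothetical sequence database (an assumption for illustration only).
DATABASE = [list("cab"), list("cabd"), list("a"), list("a")]

if __name__ == "__main__":
    for pattern, n in prefixspan(DATABASE, 2).items():
        print("<" + " ".join(pattern) + ">", n)
```

Each recursive call only sees suffixes of the sequences that contain the current prefix, so the databases shrink as the patterns grow.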

14
(No Transcript)
15
Topic 1: Mining Frequent Itemsets with
Multi-dimensional Constraints
  • Frequent itemset mining often generates a very
    large number of frequent itemsets.
  • Only a subset of the frequent itemsets and
    association rules is of interest to users.
  • Users need additional post-processing to find
    the useful ones.
  • Constraint-based mining pushes user-specified
    constraints deep inside the mining process to
    improve performance.
  • With multi-dimensional items, constraints can be
    imposed on multiple dimensional attributes.

16
Topic 1: Mining Frequent Itemsets with
Multi-dimensional Constraints
Multi-dimensional Constraints
  • Item table columns: itemID, a1, a2, ..., am
  • Item ik: (k1, k2, ..., km)
  • Itemset A: (A1, A2, ..., Am), where A1 = A.a1
17
Topic 1: Mining Frequent Itemsets with
Multi-dimensional Constraints
  • Multi-dimensional constraints can be categorized
    according to constraint properties:
  • anti-monotone, monotone, convertible, and
    inconvertible
  • They can also be classified according to the
    number of sub-constraints included:
  • Single constraint against multiple dimensions
  • Ex: max(S.cost) ≤ min(S.price)
  • Conjunction and/or disjunction of multiple
    sub-constraints
  • Ex: (C1: S.cost ? v1) ? (C2: S.price ? v2)

18
Anti-monotone and Monotone
  • Anti-monotone: a constraint Ca is anti-monotone
    if and only if whenever an itemset S violates Ca,
    so does any superset of S.
  • Ex: min(S) ≥ v
  • Monotone: a constraint Cm is monotone if and only
    if whenever an itemset S satisfies Cm, so does
    any superset of S.
  • Ex: max(S) ≥ v
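
The two definitions can be checked by brute force on a small item universe. The sketch below enumerates every pair of itemsets S ⊆ T and tests the defining implication. It demonstrates the property on one hypothetical set of item values and is not a proof; the names `VALUE`, `is_anti_monotone`, and `is_monotone` are assumptions.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, List

# Hypothetical item values (an assumption for illustration only).
VALUE: Dict[str, int] = {"a": 10, "b": 20, "c": 30}

Constraint = Callable[[FrozenSet[str]], bool]


def all_itemsets(items: Iterable[str]) -> List[FrozenSet[str]]:
    """Every non-empty subset of `items`."""
    pool = sorted(items)
    return [frozenset(c) for r in range(1, len(pool) + 1)
            for c in combinations(pool, r)]


def is_anti_monotone(constraint: Constraint, items: Iterable[str]) -> bool:
    """Whenever S violates the constraint, every superset T must violate it too."""
    sets = all_itemsets(items)
    return all(not constraint(t)
               for s in sets if not constraint(s)
               for t in sets if s <= t)


def is_monotone(constraint: Constraint, items: Iterable[str]) -> bool:
    """Whenever S satisfies the constraint, every superset T must satisfy it too."""
    sets = all_itemsets(items)
    return all(constraint(t)
               for s in sets if constraint(s)
               for t in sets if s <= t)


def min_at_least_20(s: FrozenSet[str]) -> bool:
    return min(VALUE[i] for i in s) >= 20


def max_at_least_20(s: FrozenSet[str]) -> bool:
    return max(VALUE[i] for i in s) >= 20


if __name__ == "__main__":
    print(is_anti_monotone(min_at_least_20, VALUE), is_monotone(min_at_least_20, VALUE))
    print(is_anti_monotone(max_at_least_20, VALUE), is_monotone(max_at_least_20, VALUE))
```

Adding items can only lower a minimum and raise a maximum, which is why the minimum-based constraint can be used to prune supersets while the maximum-based one cannot.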

19
Topic 1: Mining Frequent Itemsets with
Multi-dimensional Constraints
  • We extend constraints to multi-dimensional
    itemsets and develop algorithms that mine
    frequent itemsets with multi-dimensional
    constraints by extending CFG (Constrained
    Frequent Pattern Growth).
  • Overview of our algorithm:
  • Phase 1: Frequency check
  • Phase 2: Constraint check
  • Phase 3: Conditional database construction
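
One way the three phases could fit together for an anti-monotone constraint is sketched below. It is an illustration under assumptions, not the authors' algorithm: conditional databases are plain transaction lists, the constraint is max(S.cost) <= min(S.price), and the cost and price values in `ATTRS` as well as the transactions are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical two-dimensional items: item -> (cost, price).
ATTRS: Dict[str, Tuple[int, int]] = {"a": (3, 6), "b": (7, 9), "c": (2, 5), "d": (4, 8)}


def satisfies(itemset: FrozenSet[str]) -> bool:
    """Anti-monotone constraint max(S.cost) <= min(S.price)."""
    return max(ATTRS[i][0] for i in itemset) <= min(ATTRS[i][1] for i in itemset)


def constrained_growth(transactions: List[FrozenSet[str]], min_support: int,
                       suffix: FrozenSet[str] = frozenset()) -> Dict[FrozenSet[str], int]:
    """Return {itemset: support count} for frequent itemsets that satisfy the constraint."""
    # Phase 1: frequency check.
    counts = Counter(i for t in transactions for i in t)
    frequent = sorted(i for i, n in counts.items() if n >= min_support)
    result: Dict[FrozenSet[str], int] = {}
    for idx, item in enumerate(frequent):
        pattern = suffix | {item}
        # Phase 2: constraint check. A violating pattern is dropped together
        # with all of its supersets, which anti-monotonicity makes safe.
        if not satisfies(pattern):
            continue
        result[pattern] = counts[item]
        # Phase 3: conditional database construction.
        allowed = set(frequent[:idx])
        conditional = [t & allowed for t in transactions if item in t]
        conditional = [t for t in conditional if t]
        result.update(constrained_growth(conditional, min_support, pattern))
    return result


# Hypothetical database (an assumption for illustration only).
TRANSACTIONS = [frozenset("c"), frozenset("acd"), frozenset("bc"), frozenset("abcd")]

if __name__ == "__main__":
    for itemset, n in constrained_growth(TRANSACTIONS, 2).items():
        print("".join(sorted(itemset)), n)
```

With these made-up values, (bc) is frequent but is pruned in phase 2 because the cost of b exceeds the price of c, so no conditional database is built for it.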

20
Example: Cam ≡ max(S.cost) ≤ min(S.price)

  • Database: c; c,a,d; c,b; c,a,b,d
  • Frequent items: c, a, b, d
  • d-conditional database: c,a; c,a
  • Frequent items: c, a
  • a,d-conditional database


21
Results
22
  • Anthony J. T. Lee, Wan-chuen Lin and Chun-sheng
    Wang, Mining association rules with
    multi-dimensional constraints, The Journal of
    Systems and Software, Vol. 79, No. 1, pp. 79-92,
    2006.

23
Topic 2: Mining Inter-transactional Patterns
  • A transaction could be the items bought by the
    same customer, the events that happened on the
    same day, and so on.
  • Intra-transactional association rules:
    associations among items within the same
    transaction.
  • Ex: buy(X, diapers) => buy(X, beer)
    [support = 80%]
  • Inter-transactional association rules:
    association relations across different
    transactions.
  • Ex: If the prices of IBM and SUN go up,
    Microsoft's will most likely (80%) increase the
    next day.

24
Topic 2: Mining Inter-transactional Patterns
  • Extended transaction: denoted by
  • z(t2-t1) = {u1(t2-t1), u2(t2-t1), ..., un(t2-t1)}
  • Extended item: denoted as u1(t2-t1)
  • Reference point: t1
  • Megatransaction: y = z1(0) ∪ z2(t2-t1) ∪ ...
    ∪ zk(tk-t1)
  • Maxspan: tk-t1 ≤ maxspan in y
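
A small sketch of how megatransactions might be assembled with a sliding window follows. It assumes each transaction time serves once as the reference point and that offsets from 0 up to and including maxspan are allowed; that inclusive bound, the `(item, offset)` encoding of extended items, and the sample data are all assumptions.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

ExtendedItem = Tuple[str, int]  # (item, offset from the reference point)


def megatransactions(database: Dict[int, Set[str]],
                     maxspan: int) -> List[FrozenSet[ExtendedItem]]:
    """Build one megatransaction per reference point t1 in the database."""
    result = []
    for t1 in sorted(database):
        # Union of the extended transactions z(t - t1) for t1 <= t <= t1 + maxspan.
        mega = frozenset(
            (item, t - t1)
            for t in range(t1, t1 + maxspan + 1)
            for item in database.get(t, ())
        )
        result.append(mega)
    return result


# Hypothetical database: transaction time -> items (illustration only).
DATABASE: Dict[int, Set[str]] = {1: {"a", "b"}, 2: {"c"}, 4: {"a"}}

if __name__ == "__main__":
    for mega in megatransactions(DATABASE, maxspan=1):
        print(sorted(mega))
```

Once the database is rewritten this way, an ordinary frequent-itemset miner run over the megatransactions finds patterns that span several transactions.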

25
Example
26
ITP-miner Example
27
Results
28
Topic 3: Mining Inter-sequence Patterns
  • Inter-sequence model

Transaction ID   Transaction Time   Sequence
1                1                  <c(ab)d(ad)>
2                2                  < >
3                3                  <dd(ac)bd>
4                4                  <bc>
5                5                  <(bc)cb>
6                6                  <e(ac)bac>
7                7                  <b(ab)cc>
8                8                  <ab>
9                9                  <acc>
10               10                 <ceacc(ce)>
29
Application Example
  • The degree of daily stock price movement can also
    be viewed as a sequence.
  • The inter-transaction association rule:
  • If stock A and stock B go up on day 1, stock C
    and stock D are very likely to go up on day 3.
  • Rule: Δ0(AB) → Δ2(CD)
  • The inter-sequence association rule:
  • If stock B goes up more than stock A on day 1,
    stock D is very likely to go up more than stock C
    on day 3.
  • Rule: Δ0<BA> → Δ2<DC>

30
Example
The database
  • min_support = 3
  • maxspan = 1

31
ISP-Miner Algorithm
32
ISP-Miner Algorithm
33
Results
34
Topic 4: Mining Closed Patterns
  • Frequent itemsets
  • (a):2, (b):2, (c):4, (d):2
  • (ac):2, (ad):2, (bc):2, (cd):2
  • (acd):2
  • Closed itemsets
  • (c):4, (bc):2, (acd):2
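
The step from frequent itemsets to closed itemsets can be sketched as a filter: an itemset is closed when no proper superset has the same support. The code below applies that test to the support counts listed on this slide; the function name `closed_itemsets` is an assumption, and the quadratic scan is for illustration rather than efficiency.

```python
from typing import Dict, FrozenSet


def closed_itemsets(frequent: Dict[FrozenSet[str], int]) -> Dict[FrozenSet[str], int]:
    """Keep itemsets that have no proper superset with the same support count."""
    # Checking frequent supersets is enough: a superset with equal support
    # is itself frequent, so it appears in `frequent`.
    return {
        s: n for s, n in frequent.items()
        if not any(s < t and n == m for t, m in frequent.items())
    }


# Support counts as listed on the slide.
FREQUENT: Dict[FrozenSet[str], int] = {
    frozenset("a"): 2, frozenset("b"): 2, frozenset("c"): 4, frozenset("d"): 2,
    frozenset("ac"): 2, frozenset("ad"): 2, frozenset("bc"): 2, frozenset("cd"): 2,
    frozenset("acd"): 2,
}

if __name__ == "__main__":
    for itemset, n in closed_itemsets(FREQUENT).items():
        print("".join(sorted(itemset)), n)
```

The three closed itemsets carry the same information as all nine frequent itemsets, since the support of any frequent itemset equals that of its smallest closed superset.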

35
Topic 5: Mining Closed Patterns in a Time-series
Database
  • A time-series segment is an ordered and
    continuous list of the form t1, t2, ..., tm
    describing the property of the subject over time.
  • Step 1: Find the frequent segments and points in
    each segment-set (suffix tree construction).
  • Step 2: Use those frequent segment-sets to find
    the associations among them (suffix tree closed
    pattern mining).

36
Topic 5: Mining Closed Patterns in a Time-series
Database
37
Step 1: Frequent Segment Discovery (Suffix Tree
Approach)
38
Thank You!
39
References
  • R. Agrawal and R. Srikant. Fast algorithms for
    mining association rules. In Proceedings of the
    Int. Conf. on Very Large Data Bases, pages
    487-499, Santiago, Chile, September 1994.
  • H. Lu, L. Feng, and J. Han. Beyond
    intratransaction association analysis: mining
    multidimensional inter-transaction association
    rules. ACM Transactions on Information Systems,
    18(4), pages 423-454, October 2000.
  • J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H.
    Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. Mining
    sequential patterns by pattern-growth: the
    PrefixSpan approach. IEEE Transactions on
    Knowledge and Data Engineering, 16(10), pages
    1424-1440, 2004.
  • A. K. H. Tung, H. Lu, J. Han, and L. Feng.
    Efficient mining of intertransaction association
    rules. IEEE Transactions on Knowledge and Data
    Engineering, 15(1), pages 43-56, January 2003.
  • J. Han, J. Pei, Y. Yin, and R. Mao. Mining
    frequent patterns without candidate generation: a
    frequent-pattern tree approach. Data Mining and
    Knowledge Discovery, 8(1), pages 53-87, 2004.
  • A. J. T. Lee, W.-C. Lin, and C.-S. Wang. Mining
    association rules with multi-dimensional
    constraints. The Journal of Systems and Software,
    79(1), pages 79-92, 2006.