Graduate Course: Data Mining


1
Graduate Course: Data Mining
  • Jun-Ki Min

2
Data Mining
  • Knowledge discovery in databases
  • Association Rule
  • A ⇒ B
  • Transactions containing the items in A tend to
    also contain the items in B
  • Confidence
  • The percentage of transactions containing B among
    the transactions containing A
  • Support
  • The percentage of transactions that contain both
    A and B
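
The two measures above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `support` and `confidence` and the toy grocery database are assumptions made for the example.

```python
from fractions import Fraction

def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return Fraction(hits, len(transactions))

def confidence(transactions, a, b):
    """Among the transactions containing A, the fraction that also contain B."""
    # Assumes at least one transaction contains A (otherwise division by zero).
    return support(transactions, set(a) | set(b)) / support(transactions, a)

# Hypothetical toy database, chosen only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(transactions, {"bread", "milk"}))       # 1/2
print(confidence(transactions, {"bread"}, {"milk"}))  # 2/3
```

Here the rule bread ⇒ milk has support 2/4 (two of four transactions contain both items) and confidence 2/3 (two of the three transactions containing bread also contain milk).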

3
  • Fast Algorithms for Mining Association Rules

4
Problem Statement
  • I = {i1, i2, ..., im} // set of items
  • general association rule
  • X ⇒ Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅
  • confidence c if c% of transactions in D that
    contain X also contain Y
  • support s if s% of transactions in D contain X ∪ Y
  • Given a set of transactions D, the problem of
    mining association rules is to generate all
    association rules whose support and confidence
    are at least minsup and minconf, respectively

5
Problem Decomposition
  • Find all sets of items (large itemsets) whose
    transaction support is at least minsup
  • Use the large itemsets to generate the desired rules.
    For each large itemset l, find all non-empty
    subsets of l. For every such subset a, output a
    rule of the form a ⇒ (l − a) if the ratio of
    support(l) to support(a) is at least minconf.

6
Discovering Large Itemsets
  • Requires multiple passes over the data
  • In the 1st pass, find all large itemsets of size
    one.
  • In each subsequent pass, start with a seed set of
    itemsets found to be large in the previous pass,
    use it to generate candidate itemsets, then
    compute the support of each candidate.
  • Anti-monotonic property
  • if sup(A) ≥ minSup, then sup(A') ≥ minSup for
    every A' ⊆ A

7
Apriori Algorithm
  • L1 = {large 1-itemsets}
  • for (k = 2; Lk-1 ≠ ∅; k++) do
  • Ck = apriori-gen(Lk-1) // new candidates
  • forall transactions t ∈ D do
  • Ct = subset(Ck, t) // candidates contained in t
  • forall candidates c ∈ Ct do
  • c.count++
  • end
  • Lk = {c ∈ Ck | c.count ≥ minsup}
  • end
  • Answer = ∪k Lk
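
The loop above can be sketched in Python as follows. This is an illustrative sketch rather than the presenter's code: the names `apriori` and `naive_candidates` are assumptions, minsup is given as an absolute count, and `naive_candidates` is a simplified stand-in for apriori-gen that only unions pairs of large (k-1)-itemsets. It may produce more candidates than apriori-gen would, but it finds the same large itemsets.

```python
from itertools import combinations

def naive_candidates(prev_large, k):
    """Stand-in for apriori-gen: size-k unions of two large (k-1)-itemsets."""
    return {a | b for a, b in combinations(prev_large, 2) if len(a | b) == k}

def apriori(transactions, min_count, gen=naive_candidates):
    """Return {itemset: support count} for every large itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {c: n for c, n in counts.items() if n >= min_count}
    answer = dict(large)

    k = 2
    while large:                                  # stop when L(k-1) is empty
        candidates = gen(set(large), k)
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:                    # one full scan of D per pass
            for c in candidates:
                if c <= t:                        # candidate contained in t
                    counts[c] += 1
        large = {c: n for c, n in counts.items() if n >= min_count}
        answer.update(large)
        k += 1
    return answer

# The four transactions used in the deck's worked example.
D = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = apriori(D, min_count=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

On this database the sketch reports nine large itemsets, the largest being {B,C,E} with count 2. Note that every pass rescans all of D.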

8
AprioriGen
  • insert into Ck
  • select p.item1, p.item2, ..., p.itemk-1, q.itemk-1
  • from Lk-1 p, Lk-1 q
  • where p.item1 = q.item1, ..., p.itemk-2 =
    q.itemk-2, p.itemk-1 < q.itemk-1
  • forall itemsets c ∈ Ck do
  • forall (k-1)-subsets s of c do
  • if (not (s ∈ Lk-1)) then
  • delete c from Ck
  • Join step: using Lk-1, generate candidate
    supersets of size k
  • Prune step: for each c ∈ Ck, if any (k-1)-item
    subset of c is not in Lk-1, remove c from Ck
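
The join and prune steps can be sketched as below. The sketch assumes each itemset is stored as a tuple of items in sorted order, which is what makes the "first k-2 items equal, last item smaller" join condition work. The function name `apriori_gen` mirrors the slide, but the Python signature is an assumption.

```python
from itertools import combinations

def apriori_gen(prev_large):
    """Candidate k-itemsets from the large (k-1)-itemsets (sorted tuples)."""
    prev = set(prev_large)

    # Join step: combine p and q that agree on their first k-2 items
    # and satisfy p.item(k-1) < q.item(k-1).
    candidates = set()
    for p in prev:
        for q in prev:
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                candidates.add(p + (q[-1],))

    # Prune step: drop c if any (k-1)-subset of c is not in L(k-1).
    return {
        c for c in candidates
        if all(s in prev for s in combinations(c, len(c) - 1))
    }

# L2 from the deck's worked example.
L2 = {("A", "C"), ("B", "C"), ("B", "E"), ("C", "E")}
print(apriori_gen(L2))

# Hypothetical input showing the prune step: A,B,C is joined but then
# removed, because its subset (B, C) is not large.
print(apriori_gen({("A", "B"), ("A", "C")}))
```

For the example L2, only (B,C) and (B,E) share a first item, so the join yields B,C,E; it survives pruning because (C,E) is also in L2.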

9
Example
  • Item set I = {A, B, C, D, E}
  • min_sup = 0.4 (i.e., ≥ 2 transactions)
  • Database D

TID  Items
100  A,C,D
200  B,C,E
300  A,B,C,E
400  B,E
10
  • Pass 1

C1                   L1
itemset  support     itemset  support
A        2/4         A        2/4
B        3/4         B        3/4
C        3/4         C        3/4
D        1/4         E        3/4
E        3/4
11
  • Pass 2

C2 (candidates)  C2 (counted)        L2
itemset          itemset  support    itemset  support
A,B              A,B      1/4        A,C      2/4
A,C              A,C      2/4        B,C      2/4
A,E              A,E      1/4        B,E      3/4
B,C              B,C      2/4        C,E      2/4
B,E              B,E      3/4
C,E              C,E      2/4
12
  • Pass 3
  • sup({B,C,E}) = 2 and sup({B,C}) = 2
  • Thus, the rule {B,C} ⇒ {E} has confidence 100%

C3 (candidates)  C3 (counted)        L3
itemset          itemset  support    itemset  support
B,C,E            B,C,E    2/4        B,C,E    2/4
13
AprioriTid
  • The principle of Apriori is simple
  • But each time the itemset length grows by 1, the
    whole database must be scanned again.
  • AprioriTid uses an index instead
  • As the passes proceed, the size of the index C̄k
    shrinks.

14
AprioriTid Algorithm
  • L1 = {large 1-itemsets}
  • C̄1 = database D
  • for (k = 2; Lk-1 ≠ ∅; k++) do begin
  • Ck = apriori-gen(Lk-1) // new candidates
  • C̄k = ∅
  • forall entries t ∈ C̄k-1 do begin // (1)
  • // determine candidate itemsets in Ck contained
  • // in the transaction with identifier t.TID
  • Ct = {c ∈ Ck | (c − c[k]) ∈ t.set-of-itemsets ∧
    (c − c[k-1]) ∈ t.set-of-itemsets} // (2)
  • forall candidates c ∈ Ct do
  • c.count++
  • if (Ct ≠ ∅) then C̄k += <t.TID, Ct>
  • end
  • Lk = {c ∈ Ck | c.count ≥ min_sup}
  • end
  • Answer = ∪k Lk
  • c[k] denotes the k-th item of c; each entry of C̄k
    has the form <TID, set-of-itemsets>
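
A compact Python sketch of the algorithm above follows. It is an illustration under stated assumptions, not the presenter's code: itemsets are sorted tuples, min_sup is an absolute count, the names `apriori_tid` and `c_bar` are hypothetical, and a small `apriori_gen` is included only to keep the block self-contained. The raw transactions are read once to build C̄1; every later pass reads only the previous C̄.

```python
from itertools import combinations

def apriori_gen(prev):
    """Join + prune over large (k-1)-itemsets stored as sorted tuples."""
    joined = {p + (q[-1],) for p in prev for q in prev
              if p[:-1] == q[:-1] and p[-1] < q[-1]}
    return {c for c in joined
            if all(s in prev for s in combinations(c, len(c) - 1))}

def apriori_tid(transactions, min_count):
    """transactions: {tid: set of items}. Returns {itemset tuple: count}."""
    # C-bar-1: every transaction rewritten as a set of 1-itemsets.
    c_bar = {tid: {(i,) for i in items} for tid, items in transactions.items()}
    counts = {}
    for sets in c_bar.values():
        for c in sets:
            counts[c] = counts.get(c, 0) + 1
    large = {c: n for c, n in counts.items() if n >= min_count}
    answer = dict(large)

    while large:
        candidates = apriori_gen(set(large))
        counts = dict.fromkeys(candidates, 0)
        next_c_bar = {}
        for tid, sets in c_bar.items():          # scan C-bar(k-1), not D
            # c is contained in the transaction iff both (c - c[k]) and
            # (c - c[k-1]) were among its itemsets in the previous pass.
            ct = {c for c in candidates
                  if c[:-1] in sets and c[:-2] + c[-1:] in sets}
            for c in ct:
                counts[c] += 1
            if ct:                               # drop empty entries
                next_c_bar[tid] = ct
        c_bar = next_c_bar
        large = {c: n for c, n in counts.items() if n >= min_count}
        answer.update(large)
    return answer

# The four transactions used in the deck's worked example.
D = {100: {"A", "C", "D"}, 200: {"B", "C", "E"},
     300: {"A", "B", "C", "E"}, 400: {"B", "E"}}
result = apriori_tid(D, min_count=2)
for itemset in sorted(result, key=lambda c: (len(c), c)):
    print(itemset, result[itemset])
```

On the example database the index shrinks from four entries to two by the third pass (only TIDs 200 and 300 still contain a candidate), which is the effect the slides describe.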

15
Example

C̄1                       L1                  C2
TID  Set-of-itemsets     itemset  support    itemset  support
100  {A}, {C}, {D}       A        2/4        A,B      1/4
200  {B}, {C}, {E}       B        3/4        A,C      2/4
300  {A}, {B}, {C}, {E}  C        3/4        A,E      1/4
400  {B}, {E}            E        3/4        B,C      2/4
                                             B,E      3/4
                                             C,E      2/4
16

C̄2                                                L2                  C3
TID  Set-of-itemsets                              itemset  support    itemset  support
100  {A,C}                                        A,C      2/4        B,C,E    2/4
200  {B,C}, {B,E}, {C,E}                          B,C      2/4
300  {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}     B,E      3/4
400  {B,E}                                        C,E      2/4
17
Example

C̄3                    L3
TID  Set-of-itemsets  itemset  support
200  {B,C,E}          B,C,E    2/4
300  {B,C,E}
18
AprioriHybrid
  • Apriori and AprioriTid use the same candidate
    generation procedure and therefore count the same
    itemsets.
  • In the later passes, the number of candidate
    itemsets shrinks. However, Apriori still examines
    every transaction in the database. AprioriTid, on
    the other hand, reads only the index C̄k.
  • Thus, AprioriHybrid runs Apriori in the initial
    passes and then, once C̄k is small enough to fit
    in memory, switches to AprioriTid in order to
    reduce disk I/O.