Huffman Codes and Association Rules (II) - PowerPoint PPT Presentation

1
Huffman Codes and Association Rules (II)
Lecture 15
  • Prof. Sin-Min Lee
  • Department of Computer Science

2
Huffman Code Example
  • Given the symbols A B C D E with frequencies
  • 3 1 2 4 6
  • Sorting in increasing order of frequency
    (smallest to largest), this becomes
  • B C A D E
  • 1 2 3 4 6

3
Huffman Code Example Step 1
  • Because B and C have the lowest frequencies, they
    are merged first. The new node has weight 1 + 2 = 3

4
Huffman Code Example Step 2
  • Reorder the nodes in increasing order of weight
    again. This gives us
  • BC A D E
  • 3 3 4 6

5
Huffman Code Example Step 3
  • Merging the two lowest-weight nodes again gives a
    combined node of weight 3 + 3 = 6

6
Huffman Code Example Step 4
  • From the initial BC A D E ordering we get (the
    merged node may be written ABC or BCA, and E and
    the merged node may appear in either order)
  • D E ABC
  • 4 6 6
  • D E BCA
  • 4 6 6
  • D ABC E
  • 4 6 6
  • D BCA E
  • 4 6 6

7
Huffman Code Example Step 5
  • Merging D (weight 4) with the combined node
    (weight 6) from the previous step, we get
  • D E BCA            E DBCA
  • 4 6 6              6 10
  • D ABC E            DABC E
  • 4 6 6              10 6

8
Huffman Code Example Step 6
  • Likewise, for the other orderings of the previous
    step, we get
  • BCA D E            E DBCA
  • 6 4 6              6 10
  • ABC D E            E DABC
  • 6 4 6              6 10

9
Huffman Code Example Step 7
  • After the previous step, we're supposed to map a
    1 to each right branch and a 0 to each left
    branch. The resulting codes are read off the tree

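The merge-the-two-smallest procedure walked through above can be sketched in Python. This is a minimal illustration using the standard heapq module; the tie-breaking order and helper names are my own, so the exact bit patterns may differ from the slides' figure, but the code lengths are optimal either way.

```python
import heapq

# The slides' symbols and frequencies: A:3 B:1 C:2 D:4 E:6.
freqs = {'A': 3, 'B': 1, 'C': 2, 'D': 4, 'E': 6}

# Heap entries: (weight, tiebreak, tree); a tree is a symbol
# or a (left, right) pair of subtrees.
heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freqs.items()))]
heapq.heapify(heap)
counter = len(heap)
while len(heap) > 1:
    w1, _, t1 = heapq.heappop(heap)   # two lowest-weight nodes...
    w2, _, t2 = heapq.heappop(heap)
    heapq.heappush(heap, (w1 + w2, counter, (t1, t2)))  # ...are merged
    counter += 1

def codes(tree, prefix=''):
    """Assign 0 to left branches and 1 to right branches."""
    if isinstance(tree, str):
        return {tree: prefix or '0'}
    left, right = tree
    return {**codes(left, prefix + '0'), **codes(right, prefix + '1')}

huffman = codes(heap[0][2])
```

With these frequencies the weighted code length comes out to 35 bits total, matching the merge order in the slides (B and C first, then the 3-weight nodes, and so on).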
10
Example
  • Items: {milk, coke, pepsi, beer, juice}.
  • Support threshold: 3 baskets.
  • B1 = {m, c, b}   B2 = {m, p, j}   B3 = {m, b}
  • B4 = {c, j}      B5 = {m, p, b}   B6 = {m, c, b, j}
  • B7 = {c, b, j}   B8 = {b, c}
  • Frequent itemsets: {m}, {c}, {b}, {j}, {m, b},
    {c, b}, {j, c}.

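Counting supports over these eight baskets can be done directly. The following sketch (the `support` helper name is my own, not from the slides) recovers the frequent itemsets listed above with a support threshold of 3:

```python
from itertools import combinations

# The eight baskets from the slide, as sets of item abbreviations.
baskets = [
    {'m', 'c', 'b'}, {'m', 'p', 'j'}, {'m', 'b'}, {'c', 'j'},
    {'m', 'p', 'b'}, {'m', 'c', 'b', 'j'}, {'c', 'b', 'j'}, {'b', 'c'},
]

def support(itemset):
    """Number of baskets containing every item of the itemset."""
    return sum(1 for basket in baskets if itemset <= basket)

items = sorted(set().union(*baskets))
# The slide's frequent itemsets are all singletons and pairs,
# so checking sizes 1 and 2 suffices here.
frequent = [frozenset(c)
            for k in (1, 2)
            for c in combinations(items, k)
            if support(frozenset(c)) >= 3]
```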
11
Association Rules
  • Association rule R: Itemset1 ⇒ Itemset2
  • Itemset1 and Itemset2 are disjoint, and Itemset2
    is non-empty
  • Meaning: if a transaction includes Itemset1, then
    it also has Itemset2
  • Examples
  • {A, B} ⇒ {E, C}
  • {A} ⇒ {B, C}

12
Example
  • B1 = {m, c, b}   B2 = {m, p, j}
  • B3 = {m, b}      B4 = {c, j}
  • B5 = {m, p, b}   B6 = {m, c, b, j}
  • B7 = {c, b, j}   B8 = {b, c}
  • An association rule: {m, b} ⇒ {c}.
  • Confidence = 2/4 = 50%.


17
From Frequent Itemsets to Association Rules
  • Q: Given the frequent set {A, B, E}, what are the
    possible association rules?
  • A ⇒ B, E
  • A, B ⇒ E
  • A, E ⇒ B
  • B ⇒ A, E
  • B, E ⇒ A
  • E ⇒ A, B
  • __ ⇒ A, B, E (empty rule), or true ⇒ A, B, E

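The enumeration above (every split with a non-empty consequent, including the empty antecedent) can be sketched as follows; `candidate_rules` is a hypothetical helper name, not from the slides:

```python
from itertools import combinations

def candidate_rules(itemset):
    """All rules X => S - X from a frequent itemset S,
    including the empty antecedent ("true => S")."""
    items = sorted(itemset)
    rules = []
    for k in range(len(items)):          # antecedent sizes 0 .. |S|-1
        for lhs in combinations(items, k):
            rhs = tuple(i for i in items if i not in lhs)
            rules.append((lhs, rhs))     # rhs is non-empty by construction
    return rules
```

For {A, B, E} this yields exactly the seven rules listed on the slide.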
18
Classification vs Association Rules
  • Classification Rules
  • Focus on one target field
  • Specify class in all cases
  • Measure: accuracy
  • Association Rules
  • Many target fields
  • Applicable in some cases
  • Measures: support, confidence, lift

19
Rule Support and Confidence
  • Suppose R: I ⇒ J is an association rule
  • sup(R) = sup(I ∪ J) is the support count
  • the support of the itemset I ∪ J (I and J together)
  • conf(R) = sup(I ∪ J) / sup(I) is the confidence of R
  • the fraction of transactions with I that also
    have J
  • Association rules with minimum support and
    confidence are sometimes called strong rules

20
Association Rules Example
  • Q: Given the frequent set {A, B, E}, what
    association rules have minsup = 2 and minconf = 50%?
  • A, B ⇒ E : conf = 2/4 = 50%
  • A, E ⇒ B : conf = 2/2 = 100%
  • B, E ⇒ A : conf = 2/2 = 100%
  • E ⇒ A, B : conf = 2/2 = 100%
  • Don't qualify:
  • A ⇒ B, E : conf = 2/6 = 33% < 50%
  • B ⇒ A, E : conf = 2/7 = 28% < 50%
  • __ ⇒ A, B, E : conf = 2/9 = 22% < 50%

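A quick way to check these numbers: using the support counts implied by the slide's arithmetic, the following sketch keeps exactly the four qualifying rules. The `sup` table and the `confidence` helper are illustrative, not from the slides.

```python
# Support counts implied by the slide's fractions, e.g. sup({A,B}) = 4;
# sup({}) = 9 is the total number of transactions (the empty-rule case).
sup = {
    frozenset('ABE'): 2, frozenset('AB'): 4, frozenset('AE'): 2,
    frozenset('BE'): 2, frozenset('A'): 6, frozenset('B'): 7,
    frozenset('E'): 2, frozenset(): 9,
}

def confidence(lhs, rhs):
    """conf(lhs => rhs) = sup(lhs U rhs) / sup(lhs)."""
    return sup[frozenset(lhs) | frozenset(rhs)] / sup[frozenset(lhs)]

qualifying = [(lhs, rhs) for lhs, rhs in [
    ('AB', 'E'), ('AE', 'B'), ('BE', 'A'), ('E', 'AB'),
    ('A', 'BE'), ('B', 'AE'), ('', 'ABE'),
] if confidence(lhs, rhs) >= 0.5]
```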
21
Find Strong Association Rules
  • A rule has the parameters minsup and minconf
  • sup(R) ≥ minsup and conf(R) ≥ minconf
  • Problem
  • Find all association rules with the given minsup
    and minconf
  • First, find all frequent itemsets

22
Finding Frequent Itemsets
  • Start by finding one-item sets (easy)
  • Q: How?
  • A: Simply count the frequencies of all items

23
Finding itemsets next level
  • Apriori algorithm (Agrawal & Srikant)
  • Idea: use one-item sets to generate two-item
    sets, two-item sets to generate three-item sets,
    and so on
  • If (A B) is a frequent itemset, then (A) and (B)
    have to be frequent itemsets as well!
  • In general: if X is a frequent k-itemset, then
    all (k-1)-item subsets of X are also frequent
  • Compute the k-itemsets by merging (k-1)-itemsets

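The merge step, together with the subset check that this property justifies, can be sketched as below. The helper name `apriori_gen` echoes the literature, but this is an illustrative sketch rather than the paper's exact pseudocode.

```python
from itertools import combinations

def apriori_gen(prev_frequent):
    """Generate candidate k-itemsets from frequent (k-1)-itemsets:
    join pairs sharing their first k-2 items, then prune any
    candidate that has an infrequent (k-1)-subset."""
    prev = sorted(sorted(s) for s in prev_frequent)
    prev_set = {frozenset(s) for s in prev}
    k = len(prev[0]) + 1
    candidates = set()
    for a, b in combinations(prev, 2):
        if a[:-1] == b[:-1]:                      # join step
            cand = frozenset(a) | frozenset(b)
            # prune step: all (k-1)-subsets must be frequent
            if all(frozenset(s) in prev_set
                   for s in combinations(sorted(cand), k - 1)):
                candidates.add(cand)
    return candidates
```

For example, from L2 = {1 3}, {2 3}, {2 5}, {3 5} only {2 3} and {2 5} share a prefix, giving the single candidate {2 3 5}, whose 2-subsets are all frequent.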
25
Finding Association Rules
  • A typical question: find all association rules
    with support ≥ s and confidence ≥ c.
  • Note: the support of an association rule is the
    support of the set of items it mentions.
  • Hard part: finding the high-support (frequent)
    itemsets.
  • Checking the confidence of association rules
    involving those sets is relatively easy.

26
Naïve Algorithm
  • A simple way to find frequent pairs is:
  • Read the file once, counting in main memory the
    occurrences of each pair.
  • Expand each basket of n items into its
    n(n-1)/2 pairs.
  • Fails if the number of items squared exceeds main
    memory.

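A minimal sketch of this naïve pair counting (the helper name is my own): one pass over the baskets, expanding each basket of n items into its n(n-1)/2 pairs.

```python
from collections import Counter
from itertools import combinations

def count_pairs(baskets):
    """Count occurrences of every item pair across all baskets,
    keeping every count in main memory."""
    counts = Counter()
    for basket in baskets:
        # combinations() emits each unordered pair exactly once.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts
```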
28
[Diagram: C1 —Filter→ L1 —Construct→ C2 —Filter→ L2 —Construct→ C3; the first pass produces L1, the second pass produces L2]
29
Agrawal & Srikant '94
"Fast Algorithms for Mining Association Rules", by
Rakesh Agrawal and Ramakrishnan Srikant, IBM
Almaden Research Center
39
Database D
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

C1 (candidate itemsets present, by TID)
TID   Set-of-itemsets
100   {1}, {3}, {4}
200   {2}, {3}, {5}
300   {1}, {2}, {3}, {5}
400   {2}, {5}

L1
Itemset   Support
{1}       2
{2}       3
{3}       3
{5}       3

C2 (candidate 2-itemsets)
{1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}

C2 (candidates present, by TID)
TID   Set-of-itemsets
100   {1 3}
200   {2 3}, {2 5}, {3 5}
300   {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
400   {2 5}

L2
Itemset   Support
{1 3}     2
{2 3}     3
{2 5}     3
{3 5}     2

C3 (candidate 3-itemsets)
{2 3 5}

C3 (candidates present, by TID)
TID   Set-of-itemsets
200   {2 3 5}
300   {2 3 5}

L3
Itemset   Support
{2 3 5}   2
43
  • Dynamic Programming Approach
  • We want proofs of the principle of optimality and
    of overlapping subproblems
  • Principle of Optimality
  • The optimal solution for Lk includes the optimal
    solution for Lk-1
  • Proof by contradiction
  • Overlapping Subproblems
  • Lemma: every subset of a frequent itemset is a
    frequent itemset
  • Proof by contradiction

56
The Apriori Algorithm Example
  • Consider a database, D, consisting of 9
    transactions.
  • Suppose the minimum support count required is 2
    (i.e. min_sup = 2/9 ≈ 22%).
  • Let the minimum confidence required be 70%.
  • We first have to find the frequent itemsets
    using the Apriori algorithm.
  • Then, association rules will be generated using
    min. support and min. confidence.

TID    List of Items
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3
57
Step 1: Generating 1-itemset Frequent Pattern

Scan D for the count of each candidate, then compare
each candidate's support count with the minimum
support count:

C1 = L1 (every candidate satisfies minimum support)
Itemset   Sup. Count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2

  • In the first iteration of the algorithm, each
    item is a member of the set of candidates.
  • The set of frequent 1-itemsets, L1, consists of
    the candidate 1-itemsets satisfying minimum
    support.

58
Step 2: Generating 2-itemset Frequent Pattern

Generate the C2 candidates from L1, scan D for the
count of each candidate, then compare the support
counts with the minimum support count:

C2 (candidates generated from L1)
{I1, I2}, {I1, I3}, {I1, I4}, {I1, I5}, {I2, I3},
{I2, I4}, {I2, I5}, {I3, I4}, {I3, I5}, {I4, I5}

C2 (after scanning D)
Itemset     Sup. Count
{I1, I2}    4
{I1, I3}    4
{I1, I4}    1
{I1, I5}    2
{I2, I3}    4
{I2, I4}    2
{I2, I5}    2
{I3, I4}    0
{I3, I5}    1
{I4, I5}    0

L2 (candidates with support ≥ 2)
Itemset     Sup. Count
{I1, I2}    4
{I1, I3}    4
{I1, I5}    2
{I2, I3}    4
{I2, I4}    2
{I2, I5}    2
59
Step 2: Generating 2-itemset Frequent Pattern
(Cont.)
  • To discover the set of frequent 2-itemsets, L2,
    the algorithm uses L1 Join L1 to generate a
    candidate set of 2-itemsets, C2.
  • Next, the transactions in D are scanned and the
    support count for each candidate itemset in C2 is
    accumulated (as shown in the middle table).
  • The set of frequent 2-itemsets, L2, is then
    determined, consisting of those candidate
    2-itemsets in C2 having minimum support.
  • Note: we haven't used the Apriori Property yet.

60
Step 3: Generating 3-itemset Frequent Pattern

Scan D for the count of each candidate, then compare
the support counts with the minimum support count:

C3 (after join and prune, with counts)
Itemset         Sup. Count
{I1, I2, I3}    2
{I1, I2, I5}    2

L3
Itemset         Sup. Count
{I1, I2, I3}    2
{I1, I2, I5}    2

  • The generation of the set of candidate
    3-itemsets, C3, involves use of the Apriori
    Property.
  • In order to find C3, we compute L2 Join L2.
  • C3 = L2 Join L2 = {{I1, I2, I3}, {I1, I2, I5},
    {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5},
    {I2, I4, I5}}.
  • Now the Join step is complete, and the Prune step
    will be used to reduce the size of C3. The Prune
    step helps avoid heavy computation due to a large
    Ck.

61
Step 3 Generating 3-itemset Frequent Pattern
Cont.
  • Based on the Apriori property that all subsets of
    a frequent itemset must also be frequent, we can
    determine that the four latter candidates cannot
    possibly be frequent. How?
  • For example, let's take {I1, I2, I3}. Its 2-item
    subsets are {I1, I2}, {I1, I3} and {I2, I3}.
    Since all 2-item subsets of {I1, I2, I3} are
    members of L2, we keep {I1, I2, I3} in C3.
  • Let's take another example, {I2, I3, I5}, which
    shows how the pruning is performed. Its 2-item
    subsets are {I2, I3}, {I2, I5} and {I3, I5}.
  • BUT {I3, I5} is not a member of L2 and hence is
    not frequent, violating the Apriori Property.
    Thus we have to remove {I2, I3, I5} from C3.
  • Therefore, C3 = {{I1, I2, I3}, {I1, I2, I5}}
    after checking all members of the Join result for
    pruning.
  • Now the transactions in D are scanned in order to
    determine L3, consisting of those candidate
    3-itemsets in C3 having minimum support.

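The pruning walkthrough above can be checked mechanically: keep a joined 3-itemset only if all of its 2-item subsets appear in L2. The variable names below are illustrative.

```python
from itertools import combinations

# L2 from the example (frequent 2-itemsets).
L2 = {frozenset(s) for s in [('I1', 'I2'), ('I1', 'I3'), ('I1', 'I5'),
                             ('I2', 'I3'), ('I2', 'I4'), ('I2', 'I5')]}

# Result of the Join step, L2 Join L2.
join_result = [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5'), ('I1', 'I3', 'I5'),
               ('I2', 'I3', 'I4'), ('I2', 'I3', 'I5'), ('I2', 'I4', 'I5')]

# Prune step: every 2-item subset of a surviving candidate must be in L2.
C3 = [c for c in join_result
      if all(frozenset(s) in L2 for s in combinations(c, 2))]
```

Only {I1, I2, I3} and {I1, I2, I5} survive, as the slide derives.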
62
Step 4 Generating 4-itemset Frequent Pattern
  • The algorithm uses L3 Join L3 to generate a
    candidate set of 4-itemsets, C4. Although the
    join results in {I1, I2, I3, I5}, this itemset
    is pruned since its subset {I2, I3, I5} is not
    frequent.
  • Thus C4 = ∅, and the algorithm terminates, having
    found all of the frequent itemsets. This
    completes our Apriori Algorithm.
  • What's Next?
  • These frequent itemsets will be used to generate
    strong association rules (where strong
    association rules satisfy both minimum support
    and minimum confidence).

63
Step 5 Generating Association Rules from
Frequent Itemsets
  • Procedure
  • For each frequent itemset l, generate all
    nonempty subsets of l.
  • For every nonempty subset s of l, output the rule
    s ⇒ (l - s) if
  • support_count(l) / support_count(s) ≥ min_conf,
    where min_conf is the minimum confidence
    threshold.
  • Back To Example
  • We had L = {{I1}, {I2}, {I3}, {I4}, {I5},
    {I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4},
    {I2, I5}, {I1, I2, I3}, {I1, I2, I5}}.
  • Let's take l = {I1, I2, I5}.
  • Its nonempty subsets are {I1, I2}, {I1, I5},
    {I2, I5}, {I1}, {I2}, {I5}.

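This procedure, applied to l = {I1, I2, I5} with the example's support counts, can be sketched as follows (the `sc` table mirrors the counts computed in the earlier steps; the names are illustrative):

```python
from itertools import combinations

# Support counts from the example (sc = support_count).
sc = {frozenset(s): c for s, c in [
    (('I1',), 6), (('I2',), 7), (('I5',), 2),
    (('I1', 'I2'), 4), (('I1', 'I5'), 2), (('I2', 'I5'), 2),
    (('I1', 'I2', 'I5'), 2),
]}

l = frozenset(('I1', 'I2', 'I5'))
strong = []
for k in (1, 2):                      # nonempty proper subsets of l
    for s in combinations(sorted(l), k):
        s = frozenset(s)
        conf = sc[l] / sc[s]          # conf(s => l - s)
        if conf >= 0.7:               # min_conf = 70%
            strong.append((s, l - s, conf))
```

This keeps exactly the three rules selected on the next two slides (R2, R3 and R6).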
64
Step 5 Generating Association Rules from
Frequent Itemsets Cont.
  • Let the minimum confidence threshold be, say, 70%.
  • The resulting association rules are shown below,
    each listed with its confidence.
  • R1: I1 ∧ I2 ⇒ I5
  • Confidence = sc{I1, I2, I5} / sc{I1, I2} = 2/4 = 50%
  • R1 is rejected.
  • R2: I1 ∧ I5 ⇒ I2
  • Confidence = sc{I1, I2, I5} / sc{I1, I5} = 2/2 = 100%
  • R2 is selected.
  • R3: I2 ∧ I5 ⇒ I1
  • Confidence = sc{I1, I2, I5} / sc{I2, I5} = 2/2 = 100%
  • R3 is selected.

65
Step 5 Generating Association Rules from
Frequent Itemsets Cont.
  • R4: I1 ⇒ I2 ∧ I5
  • Confidence = sc{I1, I2, I5} / sc{I1} = 2/6 = 33%
  • R4 is rejected.
  • R5: I2 ⇒ I1 ∧ I5
  • Confidence = sc{I1, I2, I5} / sc{I2} = 2/7 = 29%
  • R5 is rejected.
  • R6: I5 ⇒ I1 ∧ I2
  • Confidence = sc{I1, I2, I5} / sc{I5} = 2/2 = 100%
  • R6 is selected.
  • In this way, we have found three strong
    association rules.

66
Example
Simple algorithm (large itemset ABCDE):
  ACDE ⇒ B    ABCE ⇒ D
Rules with minsup:
  CDE ⇒ AB    BCE ⇒ AD    ABE ⇒ CD    ADE ⇒ BC
  ACD ⇒ BE    ACE ⇒ BD    ABC ⇒ DE
Fast algorithm (large itemset ABCDE):
  ACDE ⇒ B    ABCE ⇒ D    ACE ⇒ BD