Title: Data Mining and Its Applications to Image Processing
1 Data Mining and Its Applications to Image Processing
- Chang, Chin-Chen
- Lin, Chih-Yang
Department of Computer Science and Information
Engineering, National Chung Cheng University
2 The Fields of Data Mining
- Mining Association Rules
- Sequential Mining
- Clustering (Declustering)
- Classification
3 Outline
- Part I: Design and Analysis of Data Mining Algorithms
- Part II: Data Mining Applications to Image Processing
4 Part I: Design and Analysis of Data Mining Algorithms
- 1. Perfect Hashing Schemes for Mining Association Rules (or for Mining Traversal Patterns)
5 Mining Association Rules
- Mining Association Rules
- Support
- Obtain Large Itemset
- Confidence
- Generate Association Rules
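For reference, the two measures named above have standard definitions. The notation below (D for the transaction database, T for a transaction, X and Y for disjoint itemsets) is assumed here for illustration and is not taken from the slides:

```latex
\mathrm{sup}(X) = \bigl|\{\, T \in D : X \subseteq T \,\}\bigr|,
\qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{sup}(X \cup Y)}{\mathrm{sup}(X)}
```

An itemset is large when its support reaches the minimum support, and a rule is generated from a large itemset when its confidence reaches the minimum confidence. With the counts in the Apriori example that follows (B: 3, B E: 3), conf(B => E) = 3/3 = 1.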
6 Apriori (minimum support = 2)
Database D
TID  Items
100  A C D
200  B C E
300  A B C E
400  B E
C1
Itemset  Sup.
A        2
B        3
C        3
D        1
E        3
L1
Itemset  Sup.
A        2
B        3
C        3
E        3
C2
Itemset
A B
A C
A E
B C
B E
C E
C2 with support counts
Itemset  Sup.
A B      1
A C      2
A E      1
B C      2
B E      3
C E      2
L2
Itemset  Sup.
A C      2
B C      2
B E      3
C E      2
C3
Itemset
B C E
C3 with support counts
Itemset  Sup.
B C E    2
L3
Itemset  Sup.
B C E    2
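The level-wise procedure in this example can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: the function name `apriori`, the use of `frozenset` keys, and the subset-based pruning step are choices made here.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return {k: {itemset: support count}} for every large k-itemset."""
    # Pass 1: count single items (C1) and keep the large ones (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_sup}

    large = {}
    k = 1
    while level:
        large[k] = level
        k += 1
        prev = set(level)
        # Build Ck from L(k-1): union two large (k-1)-itemsets and keep the
        # result only if every (k-1)-subset of it is itself large.
        candidates = set()
        for a, b in combinations(prev, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in prev for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Scan the database to count each candidate, then filter to Lk.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {s: c for s, c in counts.items() if c >= min_sup}
    return large

# The four-transaction database D from the example.
D = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
large = apriori(D, min_sup=2)
for k in sorted(large):
    print(k, sorted(("".join(sorted(s)), c) for s, c in large[k].items()))
```

Every pass rescans the whole database and counts every candidate, which is the cost the hashing schemes in the following slides try to reduce.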
7 Apriori Cont.
- Disadvantages
- Inefficient
- Produces many useless candidates
8 DHP
- Prune useless candidates in advance
- Reduce database size at each iteration
9 Min sup = 2
Database D
TID  Items
100  A C D
200  B C E
300  A B C E
400  B E
C1
Itemset  Count
A        2
B        3
C        3
D        1
E        3
L1
A, B, C, E
Making a hash table
100  {A C}
200  {B C}, {B E}, {C E}
300  {A B}, {A C}, {A E}, {B C}, {B E}, {C E}
400  {B E}
H({x, y}) = ((order of x) × 10 + (order of y)) mod 7
A E B E
C E B C B E A C
C E B C B E A B A C
Hash table H2
Hash address                            0 1 2 3 4 5 6
Number of items hashed to each bucket   3 0 2 0 3 1 2
Bit vector                              1 0 1 0 1 0 1
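The hashing step can be sketched as follows. This is an illustration under stated assumptions: the item ordering A=1, B=2, C=3, D=4, E=5 is assumed (the slide does not spell it out), infrequent items are dropped before the 2-itemsets are formed, and the names `h2`, `build_h2` and `candidate_pairs` are made up for this sketch. Individual bucket counts depend on those assumptions.

```python
from itertools import combinations

# Assumed ordering of the items; the slide only says "order of x".
ORDER = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

def h2(x, y, n_buckets=7):
    """Hash a 2-itemset {x, y}: ((order of x) * 10 + (order of y)) mod 7."""
    return (ORDER[x] * 10 + ORDER[y]) % n_buckets

def build_h2(transactions, large_items, min_sup, n_buckets=7):
    """Count 2-itemsets per bucket and derive the bit vector."""
    counts = [0] * n_buckets
    for t in transactions:
        kept = sorted(i for i in t if i in large_items)  # drop infrequent items
        for x, y in combinations(kept, 2):
            counts[h2(x, y, n_buckets)] += 1
    bit_vector = [1 if c >= min_sup else 0 for c in counts]
    return counts, bit_vector

def candidate_pairs(large_items, bit_vector):
    """Keep a pair as a candidate only if its bucket bit is set."""
    return [
        (x, y)
        for x, y in combinations(sorted(large_items), 2)
        if bit_vector[h2(x, y, len(bit_vector))]
    ]

D = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
L1 = {"A", "B", "C", "E"}
counts, bits = build_h2(D, L1, min_sup=2)
print(bits)                       # bit vector over buckets 0..6
print(candidate_pairs(L1, bits))  # pairs that survive the hash filter
```

A pair whose bucket holds fewer than the minimum support cannot itself be large, so it is pruned before any counting pass. Collisions can still let an infrequent pair through, which motivates the perfect hashing schemes discussed next.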
10 Perfect Hashing Schemes (PHS) for Mining Association Rules
11 Motivation
- Apriori and DHP produce Ci from Li-1, which may be the bottleneck
- Collisions in DHP
- Designing a perfect hashing function for every transaction database is a thorny problem
12 Definition
- Definition. A join operation joins two different (k-1)-itemsets, $p_1 p_2 \cdots p_{k-1}$ and $q_1 q_2 \cdots q_{k-1}$, to produce a k-itemset, where $p_2 = q_1,\ p_3 = q_2,\ \ldots,\ p_{k-2} = q_{k-3},\ p_{k-1} = q_{k-2}$.
- Example: ABC, BCD
- 3-itemsets of ABCD: ABC, ABD, ACD, BCD
- Only one pair satisfies the join definition
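The join condition can be checked mechanically. The sketch below represents an itemset as an ordered string and assumes the joined k-itemset is the first itemset followed by the last item of the second, as in the ABC, BCD example. The function name `join` is chosen here for illustration.

```python
from itertools import permutations

def join(p, q):
    """Join two different (k-1)-itemsets if p2..p(k-1) equals q1..q(k-2).

    Returns the joined k-itemset, or None when the pair is not joinable.
    """
    if p != q and p[1:] == q[:-1]:
        return p + q[-1]
    return None

# All 3-itemsets contained in ABCD, tried in every ordered pairing.
subsets = ["ABC", "ABD", "ACD", "BCD"]
joinable = [(p, q) for p, q in permutations(subsets, 2) if join(p, q) is not None]
print(joinable)            # the only pair that satisfies the definition
print(join("ABC", "BCD"))  # the resulting 4-itemset
```

Because exactly one ordered pair of (k-1)-subsets produces a given k-itemset, each candidate is generated once, with no duplicates to remove.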
13 Algorithm
- PHS (Perfect Hashing and Data Shrinking)
14 Example 1 (sup = 2)
L1
Itemset Sup.
B 3
C 3
D 2
E 3
15 Example 2 (sup = 2)
Decode: AD -> (BC)(CE) -> BCE
16 Problem on Hash Table
- Consider a database that contains p transactions, each made up of unique items and of equal length N, with a minimum support of 1.
- At iteration k, the number of candidate k-itemsets is [formula missing]
- The number of buckets required in the next pass is [formula missing], where m [definition missing]
- While the actual number of the next candidates is [formula missing]
- Loading density: [formula missing]
17 How to Improve the Loading Density
- Two-level perfect hash scheme (partial hash)
18 Experiments
19 Experiments
20 Experiments
21 Part II: Data Mining Applications to Image Processing
- 1. A Prediction Scheme for Image Vector Quantization Based on Mining Association Rules
- 2. Reversible Steganography for VQ-compressed Images Using Clustering and Relocation
- 3. A Reversible Steganographic Method Using SMVQ Approach Based on Declustering
22 A Prediction Scheme for Image Vector Quantization Based on Mining Association Rules
23 Vector Quantization (VQ)
Image encoding and decoding techniques
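As a reminder of what encoding and decoding mean here, the sketch below shows plain full-search VQ: each image block is replaced by the index of its nearest codeword, and decoding is a table lookup. The tiny codebook and blocks are made-up values and the function names are chosen for illustration. A real codebook would be trained (for example with the LBG algorithm) and use 16-dimensional codewords for 4 × 4 blocks.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vq_encode(blocks, codebook):
    """Full search: map each block to the index of its closest codeword."""
    return [
        min(range(len(codebook)), key=lambda i: squared_distance(block, codebook[i]))
        for block in blocks
    ]

def vq_decode(indices, codebook):
    """Rebuild the blocks by looking each index up in the codebook."""
    return [codebook[i] for i in indices]

# Hypothetical 4-dimensional codebook and image blocks.
codebook = [(0, 0, 0, 0), (100, 100, 100, 100), (200, 200, 200, 200)]
blocks = [(10, 5, 0, 3), (190, 210, 200, 205), (90, 110, 100, 95)]

indices = vq_encode(blocks, codebook)
print(indices)
print(vq_decode(indices, codebook))
```

Only the index table is stored or transmitted. The schemes in the following slides work on that index table, either to predict indices or to hide data in them.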
24 SMVQ (cont.)
Codebook
State Codebook
25 Framework of the Proposed Method
v/10
(Quantized)
26 Condition
Horizontal, Vertical, Diagonal, Association Rules
If X → y, there is no such rule X' → y', where X' ⊂ X and y' = y.
27 The Prediction Strategy
28 Example
Rules DB
Query
Result
? may be 5, 1, 8, or 10. How to decide?
29 Example cont.
The weight of 5 = 4 × 90% + 4 × 90% + 5 × 100% = 12.2
The weight of 1 = 3 × 85% + 2 × 95% = 4.45
The weight of 8 = 4 × 70% = 2.8
The weight of 10 = 3 × 75% = 2.25
{5, 1} is called the consequence list; its size is determined by the user
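The weighting step can be sketched as below, reading each term on the slide as an integer factor multiplied by a percentage (for instance 4 × 90% + 4 × 90% + 5 × 100% = 12.2). The slide does not label the two factors, so the names `factor` and `confidence` are placeholders, as is the function name `rank_consequences`.

```python
def rank_consequences(matches, list_size):
    """Sum factor * confidence per predicted value and keep the top entries.

    matches: iterable of (predicted_value, factor, confidence) triples,
             one per matching rule.
    """
    weights = {}
    for value, factor, confidence in matches:
        weights[value] = weights.get(value, 0.0) + factor * confidence
    ranked = sorted(weights, key=weights.get, reverse=True)
    return weights, ranked[:list_size]

# The matching rules of the example, one triple per term in the sums.
matches = [
    (5, 4, 0.90), (5, 4, 0.90), (5, 5, 1.00),
    (1, 3, 0.85), (1, 2, 0.95),
    (8, 4, 0.70),
    (10, 3, 0.75),
]

weights, consequence_list = rank_consequences(matches, list_size=2)
print({value: round(w, 2) for value, w in weights.items()})
print(consequence_list)
```

With a list size of 2 the two heaviest candidates are kept, which matches the consequence list of the example. A larger list raises the chance of containing the true index at the cost of more bits to signal the chosen position.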
30 Experiments
Reconstructed image by the proposed method
Reconstructed image by full-search VQ
Original Image
31 Experiments cont.
Performance comparison of various methods
32 Experiments cont.
Overfitting problem
33 Advantages
- Mining association rules can be applied to image prediction successfully
- Broader spatial correlation is considered than in SMVQ
- More efficient than SMVQ, since no Euclidean distances need to be calculated
34 Reversible Steganography for VQ-compressed Images Using Clustering and Relocation
35 Flowchart of the Proposed Method
X
36 Construction of the Hit Map
[Figure: sorted codebook and the corresponding hit map]
37 Clustering Codebook
Assume that the size of the codebook is 15: cw0, cw1, ..., cw14
Clustering
C1 = {cw0, cw1, cw3, cw6, cw8, cw10}
C2 = {cw4, cw14}
C3 = {cw2, cw5, cw9}
C4 = {cw12}
C5 = {cw7, cw11, cw13}
38 Relocation
U
L cw14
Assume that the size of the state codebook is 4
cw0, cw1 cw3, cw6 cw8, cw10
cw2, cw5 cw9
cw4, cw14
cw7, cw11 cw13
cw12
39 Embedding
Only the codewords in G0 can embed the secret bits
The codewords in G1 should be replaced with the
codewords in G2
cw14 cw12 cw1
cw2 cw6 cw3
cw10 cw8 cw6
Secret bits 1011
cw4 cw12 cw0
cw2 cw6 cw5
cw3 cw8 cw1
40 Extraction and Reversibility
cw4 cw12 cw0
cw2 cw6 cw5
cw3 cw8 cw1
recover
cw14 cw12 cw1
cw2 cw6 cw3
cw10 cw8 cw6
Secret bits
41 Experiments
Method                  Measure         Lena   Pepper  Sailboat  Baboon
Modified Tian's method  PSNR (dB)       26.92  26.45   25.05     22.70
Modified Tian's method  Payload (bits)  2777   3375    3283      2339
MFCVQ                   PSNR (dB)       28.03  26.43   26.60     24.04
MFCVQ                   Payload (bits)  5892   5712    5176      1798
Proposed method         PSNR (dB)       30.23  29.15   28.00     24.04
Proposed method         Payload (bits)  8707   8421    7601      3400
12 hit maps (600 bits), 250 clusters
42 Experiments
Tian's method
MFCVQ
43 Single hit map
Multiple hit maps without clustering
Using clustering and multiple hit maps
44 Experiments
Using Lena as the cover image
45 A Reversible Steganographic Method Using SMVQ Approach Based on Declustering
46 Find the most dissimilar pairs (De-clustering)
CW1 - CW8
CW2 - CW9
CW3 - CW10
CW4 - CW11
CW5 - CW12
CW6 - CW13
CW7 - CW14
0
1
Dissimilar
47 Embedding Using Side-Match
Dissimilar pair: CW1, CW8
Assume X = CW1
V0 = ((U13 + L4)/2, U14, U15, U16, L8, L12, L16)
V1 = (X1, X2, X3, X4, X5, X9, X13) of CW1
V8 = (X1, X2, X3, X4, X5, X9, X13) of CW8
d1 = Euclidean_Distance(V0, V1)
d8 = Euclidean_Distance(V0, V8)
If (d1 < d8), then block X is replaceable; otherwise, block X is non-replaceable
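The replaceability test can be sketched as below. It assumes 4 × 4 blocks stored as 16 values in row-major order, so that pixels 13 to 16 are the bottom row of the upper block U and pixels 4, 8, 12, 16 are the right column of the left block L. The function names are chosen for this sketch and are not taken from the slides.

```python
import math

def side_vector(upper, left):
    """V0: predicted border of the current block from its upper and left neighbours."""
    u = lambda n: upper[n - 1]   # 1-based pixel access, as on the slide
    l = lambda n: left[n - 1]
    return [(u(13) + l(4)) / 2, u(14), u(15), u(16), l(8), l(12), l(16)]

def border_vector(codeword):
    """Top row and left column of a codeword: X1, X2, X3, X4, X5, X9, X13."""
    return [codeword[n - 1] for n in (1, 2, 3, 4, 5, 9, 13)]

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_replaceable(upper, left, codeword, partner):
    """True when the block's own codeword fits the border better than its dissimilar partner."""
    v0 = side_vector(upper, left)
    d_own = euclidean_distance(v0, border_vector(codeword))
    d_partner = euclidean_distance(v0, border_vector(partner))
    return d_own < d_partner

# Hypothetical flat blocks: neighbours near 100, CW1 near 102, CW8 near 10.
upper = [100] * 16
left = [100] * 16
cw1 = [102] * 16
cw8 = [10] * 16
print(is_replaceable(upper, left, cw1, cw8))  # block coded with CW1
print(is_replaceable(upper, left, cw8, cw1))  # block coded with CW8
```

The decoder can repeat the same comparison using only already-decoded neighbours, which is what lets it tell the original codeword from its partner later on.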
48 A secret message: 1 0 1 0 1 0 0 1 0 1 1 1 1 0 0 0
[Figure: secret bits, index table, and embedding result]
If (d6 < d13)
Bit 1: CW1, CW2, CW3, CW4, CW5, CW6, CW7, CW15
Bit 0: CW8, CW9, CW10, CW11, CW12, CW13, CW14, CW0
Embedding result: 6
49 A secret message: 1 0 1 0 1 0 0 1 0 1 1 1 1 0 0 0
[Figure: secret bits, index table, and embedding result]
If (d2 < d9)
Bit 1: CW1, CW2, CW3, CW4, CW5, CW6, CW7, CW15
Bit 0: CW8, CW9, CW10, CW11, CW12, CW13, CW14, CW0
Embedding result: 6, 9
50 A secret message: 1 0 1 0 1 0 0 1 0 1 1 1 1 0 0 0
[Figure: secret bits, index table, and embedding result]
If (d12 > d5)
Bit 1: CW1, CW2, CW3, CW4, CW5, CW6, CW7, CW15
Bit 0: CW8, CW9, CW10, CW11, CW12, CW13, CW14, CW0
Embedding result: 6, 9, (15, 12)
CW15: embed 1
51 A secret message: 1 0 1 0 1 0 0 1 0 1 1 1 1 0 0 0
[Figure: secret bits, index table, and embedding result]
If (d9 > d2)
Bit 1: CW1, CW2, CW3, CW4, CW5, CW6, CW7, CW15
Bit 0: CW8, CW9, CW10, CW11, CW12, CW13, CW14, CW0
Embedding result: 6, 9, (15, 12), (0, 9)
CW0: embed 0
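The four embedding steps above can be condensed into a short sketch. It encodes one reading of the slides: CW1 to CW7 carry bit 1 and their partners CW8 to CW14 carry bit 0; a replaceable block is written as whichever member of its pair matches the secret bit; a non-replaceable block is written as an escape codeword (CW15 for bit 1, CW0 for bit 0) followed by its original index. The `dist(pos, cw)` callback stands in for the side-match distance and, like the other names, is hypothetical. Original indices are assumed to lie in 1..14.

```python
# Dissimilar pairs CWi <-> CW(i+7), and the bit each codeword represents.
PARTNER = {i: i + 7 for i in range(1, 8)}
PARTNER.update({i + 7: i for i in range(1, 8)})
BIT_OF = {i: 1 if i <= 7 else 0 for i in range(1, 15)}
ESCAPE = {1: 15, 0: 0}  # escape codewords for non-replaceable blocks

def embed(indices, bits, dist):
    """Embed one secret bit per block of the index table.

    dist(pos, cw) is assumed to return the side-match distance between the
    neighbours of block `pos` and codeword `cw`.
    """
    stego = []
    for pos, (idx, bit) in enumerate(zip(indices, bits)):
        if dist(pos, idx) < dist(pos, PARTNER[idx]):
            # Replaceable: pick the pair member that represents the bit.
            stego.append(idx if BIT_OF[idx] == bit else PARTNER[idx])
        else:
            # Non-replaceable: escape codeword carries the bit, index follows.
            stego.extend([ESCAPE[bit], idx])
    return stego

# Distances chosen to reproduce the comparisons shown on the slides:
# d6 < d13, d2 < d9, d12 > d5, d9 > d2.
table = {(0, 6): 1, (0, 13): 5, (1, 2): 1, (1, 9): 4,
         (2, 12): 7, (2, 5): 3, (3, 9): 6, (3, 2): 2}
dist = lambda pos, cw: table[(pos, cw)]

stego = embed([6, 2, 12, 9], [1, 0, 1, 0], dist)
print(stego)
```

Every block carries one bit, and only non-replaceable blocks cost an extra index. That is why the number of non-replaceable blocks matters for the size of the output.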
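52 Extraction and Recovery
[Figure: steganographic index table, extracted secret bits, and recovered index table]
Steganographic index table: 6, 9, (15, 12), (0, 9)
If (d6 < d13)
Bit 1: CW1, CW2, CW3, CW4, CW5, CW6, CW7, CW15
Bit 0: CW8, CW9, CW10, CW11, CW12, CW13, CW14, CW0
Extracted secret bits: 1
Recovery: 6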
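53 Extraction and Recovery
[Figure: steganographic index table, extracted secret bits, and recovered index table]
Steganographic index table: 6, 9, (15, 12), (0, 9)
If (d9 > d2)
Bit 1: CW1, CW2, CW3, CW4, CW5, CW6, CW7, CW15
Bit 0: CW8, CW9, CW10, CW11, CW12, CW13, CW14, CW0
Extracted secret bits: 1, 0
Recovery: 6, 2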
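54 Extraction and Recovery
[Figure: steganographic index table, extracted secret bits, and recovered index table]
Steganographic index table: 6, 9, (15, 12), (0, 9)
Bit 1: CW1, CW2, CW3, CW4, CW5, CW6, CW7, CW15
Bit 0: CW8, CW9, CW10, CW11, CW12, CW13, CW14, CW0
Extracted secret bits: 1, 0, 1
Recovery: 6, 2, 12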
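55 Extraction and Recovery
[Figure: steganographic index table, extracted secret bits, and recovered index table]
Steganographic index table: 6, 9, (15, 12), (0, 9)
Bit 1: CW1, CW2, CW3, CW4, CW5, CW6, CW7, CW15
Bit 0: CW8, CW9, CW10, CW11, CW12, CW13, CW14, CW0
Extracted secret bits: 1, 0, 1, 0
Recovery: 6, 2, 12, 9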
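Extraction and recovery mirror the embedding rules, and can be sketched as follows. The sketch assumes the same conventions as the slides suggest: CW1 to CW7 mean bit 1, CW8 to CW14 mean bit 0, and CW15 or CW0 is an escape codeword whose following index is the untouched original. The `dist(pos, cw)` callback is a hypothetical stand-in for the side-match distance computed from already-recovered neighbours.

```python
# Dissimilar pairs CWi <-> CW(i+7), and the bit each codeword represents.
PARTNER = {i: i + 7 for i in range(1, 8)}
PARTNER.update({i + 7: i for i in range(1, 8)})
BIT_OF = {i: 1 if i <= 7 else 0 for i in range(1, 15)}

def extract(stego, dist):
    """Return (secret bits, recovered index table) from a stego index stream."""
    bits, recovered = [], []
    stream = iter(stego)
    pos = 0  # position of the block currently being recovered
    for idx in stream:
        if idx in (0, 15):
            # Escape codeword: it carries the bit, the next index is the original.
            bits.append(1 if idx == 15 else 0)
            recovered.append(next(stream))
        else:
            # Pair member: its group gives the bit; the member that better
            # matches the neighbours is taken as the original index.
            bits.append(BIT_OF[idx])
            partner = PARTNER[idx]
            recovered.append(idx if dist(pos, idx) < dist(pos, partner) else partner)
        pos += 1
    return bits, recovered

# Distances chosen to reproduce the comparisons shown on the slides:
# d6 < d13 for the first block, d9 > d2 for the second.
table = {(0, 6): 1, (0, 13): 5, (1, 9): 4, (1, 2): 1}
dist = lambda pos, cw: table[(pos, cw)]

bits, recovered = extract([6, 9, 15, 12, 0, 9], dist)
print(bits)
print(recovered)
```

Because the original codeword of a replaceable block is, by construction, the closer of the pair, the decoder restores the index table exactly, which is what makes the method reversible.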
56 Find Dissimilar Pairs
PCA projection
57 Improve Embedding Capacity
Partition into more groups
58 Experiments
Codebook size: 512
Codeword size: 16
The number of original image blocks: 128 × 128 = 16384
The number of non-replaceable blocks: 139
59 Experiments
Codebook size: 512
Codeword size: 16
The number of original image blocks: 128 × 128 = 16384
The number of non-replaceable blocks: 458
60 Experiments
Embedding capacity (bits)
Method                             Lena    Baboon
Tian's method                      2,777   2,339
MFCVQ                              5,892   1,798
Chang et al.'s method              10,111  4,588
Proposed method (3 groups)         16,129  16,129
Proposed method (9 groups)         45,075  36,609
Proposed method (17 groups)        55,186  39,014
Time comparison (image: Lena)
Method                 Setting                         Time (sec)
Tian's method          -                               0.55
MFCVQ                  -                               1.36
Chang et al.'s method  Size of the state codebook: 4   14.59
Chang et al.'s method  Size of the state codebook: 8   29.80
Chang et al.'s method  Size of the state codebook: 16  58.8
Chang et al.'s method  Size of the state codebook: 32  161.2
Proposed method        Number of groups: 3             0.11
Proposed method        Number of groups: 5             0.13
Proposed method        Number of groups: 9             0.14
Proposed method        Number of groups: 17            0.19
61 Future Research Directions
- Extend the proposed reversible steganographic methods to other image formats
- Apply perfect hashing schemes to other applications
62 Thanks all