An integer programming approach for frequent itemset hiding - PowerPoint PPT Presentation

An integer programming approach for frequent itemset hiding. Aris Gkoulalas-Divanis, Vassilios S. Verykios. CIKM 06.


1
An integer programming approach for frequent
itemset hiding
  • Aris Gkoulalas-Divanis
  • Vassilios S. Verykios
  • CIKM 06

2
outline
  • Introduction
  • Basic definitions
  • Methodology
  • Experimental results
  • Conclusions

3
introduction
  • The approach is based on the notion of distance between
    the original database and the sanitized database
  • Goal: minimize this distance using integer
    programming, while hiding the sensitive itemsets
    and minimally affecting the non-sensitive itemsets

4
Basic definitions
  • the support count of itemsets in bitmap
    representation

a b c
1 0 1
1 1 1
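
As a minimal sketch of the bitmap representation, the support count of an itemset is the number of rows whose bits are 1 for every item of the itemset. The two rows are the ones shown on the slide; the function name `support_count` is illustrative and not taken from the presentation.

```python
# Bitmap database: one row per transaction, one column per item (a, b, c).
ITEMS = ["a", "b", "c"]
DATABASE = [
    [1, 0, 1],  # transaction 1 contains a and c
    [1, 1, 1],  # transaction 2 contains a, b and c
]

def support_count(itemset, database=DATABASE, items=ITEMS):
    """Count the transactions that contain every item of `itemset`."""
    columns = [items.index(item) for item in itemset]
    return sum(all(row[col] == 1 for col in columns) for row in database)

print(support_count("ac"))  # 2
print(support_count("ab"))  # 1
```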
Objective: maximize the number of 1s left in D′
Non-sensitive itemsets should satisfy their constraint (remain frequent) in D′
Sensitive itemsets should satisfy their constraint (become infrequent) in D′
5
(cont.)
  • Solving this problem is NP-hard; there are 2^m − 1
    inequalities (m transaction lists)

6
(cont.)
  • SI = {e, ae, bc} (sensitive itemsets)
  • S = {e, bc} (minimal sensitive itemsets)
  • SS = {e, ae, bc, ce, abc, …}: the set of all sensitive
    itemsets and their supersets
  • Ideal case: F′ = F − SS; the sanitized database D′
    contains all the frequent itemsets of D except
    the sensitive ones

7
(cont.)
  • Negative border
  • Positive border

8
Border revision
[Figure: lattice of all itemsets over the items A, B, C, D, from null up to ABCD]
9
Problem size minimization
  • C: the total set of affected itemsets
  • L_C: the set of solutions of the corresponding
    inequalities
  • If the inequality of C1 can be removed
    without affecting the global solution of the
    system, then C2 covers C1

10
(cont.)
  • Corollary: any itemset belonging to the positive
    border of F − SS covers all its subsets
  • ⇒ B+(F′) covers all itemsets of F′
  • B-(F) covers all itemsets of
  • Ideal solution: L_C

11
(cont.)
12
example
  • F = {A, B, C, D, AB, AC, AD, CD, ACD}
  • SI = {AB}, S = {AB}
  • F′ = {A, B, C, D, AC, AD, CD, ACD}
  • B+(F′) = {B, ACD}

B: frequent
ACD: frequent
AB: infrequent
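
The borders in this example can be checked with a short sketch. It takes the revised set of frequent itemsets from the slide (F without AB) and assumes the usual definitions: the positive border holds the maximal frequent itemsets, and the negative border holds the infrequent itemsets whose proper subsets are all frequent. The function names are illustrative.

```python
from itertools import combinations

ITEMS = "ABCD"
# Frequent itemsets after hiding AB, as listed on the slide.
F_REVISED = {frozenset(x) for x in
             ["A", "B", "C", "D", "AC", "AD", "CD", "ACD"]}

def positive_border(frequent):
    """Maximal frequent itemsets: no proper superset is frequent."""
    return {x for x in frequent if not any(x < y for y in frequent)}

def negative_border(frequent, items):
    """Infrequent itemsets whose proper non-empty subsets are all frequent.

    Checking the subsets one item smaller is enough here because
    `frequent` is assumed to be downward closed.
    """
    border = set()
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if candidate in frequent:
                continue
            subsets = (frozenset(s) for s in combinations(combo, size - 1))
            if size == 1 or all(s in frequent for s in subsets):
                border.add(candidate)
    return border

def show(itemsets):
    """Render a collection of itemsets as a sorted list of strings."""
    return sorted("".join(sorted(x)) for x in itemsets)

print(show(positive_border(F_REVISED)))         # ['ACD', 'B']
print(show(negative_border(F_REVISED, ITEMS)))  # ['AB', 'BC', 'BD']
```

The positive border matches the {B, ACD} on the slide, and AB lands in the negative border, in line with the slide marking it as infrequent.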
13
Constraint satisfaction problem
  • A solution of a CSP is a complete assignment of
    values to the variables that satisfies all the
    constraints
  • In CSP we usually wish to maximize or minimize an
    objective function subject to a number of
    constraints
  • To solve this problem we use binary integer
    programming (BIP), which transforms the CSP into an
    optimization problem
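
To illustrate the kind of optimization problem involved, here is a minimal sketch that replaces the BIP solver with exhaustive search over the binary variables, which is only feasible for toy inputs. The 4-transaction database, the threshold and all names are hypothetical; itemsets are given as tuples of column indices.

```python
from itertools import combinations

def support(itemset, db):
    """Number of transactions whose bits are 1 for every column in itemset."""
    return sum(all(row[col] for col in itemset) for row in db)

def sanitize(db, keep_frequent, sensitive, min_count):
    """Flip as few 1s to 0 as possible so that every sensitive itemset
    drops below min_count while every itemset in keep_frequent stays at
    or above it. Returns (sanitized database, number of flipped bits)."""
    ones = [(n, m) for n, row in enumerate(db)
            for m, bit in enumerate(row) if bit]
    # Try 0 removals, then 1, then 2, ...: the first feasible candidate
    # keeps the maximum number of 1s.
    for removed in range(len(ones) + 1):
        for cells in combinations(ones, removed):
            candidate = [list(row) for row in db]
            for n, m in cells:
                candidate[n][m] = 0
            hidden = all(support(s, candidate) < min_count for s in sensitive)
            kept = all(support(x, candidate) >= min_count
                       for x in keep_frequent)
            if hidden and kept:
                return candidate, removed
    return None, None

# Hypothetical database over items a, b, c (columns 0, 1, 2).
DB = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]]
SENSITIVE = [(0, 1)]                            # hide ab
KEEP = [(0,), (1,), (2,), (0, 2), (1, 2)]       # a, b, c, ac, bc stay frequent

sanitized, distance = sanitize(DB, KEEP, SENSITIVE, min_count=2)
print(distance)   # 1
print(sanitized)  # [[1, 1, 1], [0, 1, 0], [1, 0, 1], [0, 1, 1]]
```

A real BIP formulation would hand the same objective and constraints to a solver instead of enumerating candidates.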

14
Binary integer problem
15
Experimental results
  • 10,000 transactions, 10 items, msup = 0.1

16
conclusions
  • Defined a new metric to quantify the distance between
    the initial database D and its sanitized version
    D′
  • It has the benefit of being exact when an ideal solution
    can be identified

17
Exact knowledge hiding through database extension
  • Aris Gkoulalas-Divanis
  • Vassilios S. Verykios
  • TKDE 08

18
introduction
  • The goal of the hiding algorithm is to create a
    minimal extension Dx to the original database Do

D
19
(cont.)
  • S = {e, ae, bc}

20
methodology
  • P = |D|, N = |Do|, Q = |Dx|
  • Ex.: e = 4, ae = 3, bc = 4

21
(cont.)
  • The distance between Do and D is measured based
    on the extension Dx

(minimize)
22
(cont.)
  • Optimal solution set C
  • S = {e, ae, bc}, mfreq = 0.3, Q = 4
  • C = {e, f, bc, bd, ab, acd}
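
The value Q = 4 can be reproduced with a short sketch, under two assumptions that the slides do not state: the numbers 4, 3, 4 given earlier for e, ae, bc are their supports in Do, and Do has N = 10 transactions. If the Q added transactions do not support an itemset with support s, the itemset becomes infrequent once s / (N + Q) < mfreq, which gives the lower bound computed below. The function name is illustrative.

```python
from fractions import Fraction
from math import floor

def lower_bound_q(max_sensitive_support, n_original, mfreq):
    """Smallest Q >= 0 with max_sensitive_support / (n_original + Q) < mfreq.

    Fractions avoid floating-point rounding at the strict inequality.
    """
    threshold = Fraction(str(mfreq))
    q = floor(Fraction(max_sensitive_support) / threshold - n_original) + 1
    return max(q, 0)

# Assumed supports of the sensitive itemsets in Do, and assumed N = 10.
supports = {"e": 4, "ae": 3, "bc": 4}
print(lower_bound_q(max(supports.values()), n_original=10, mfreq=0.3))  # 4
```

The itemset with the highest support drives the bound, since every other sensitive itemset needs at most as many added transactions.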

23
Safety margin
  • The lower bound of Q may, under certain circumstances,
    be insufficient to allow for the identification
    of an exact solution
  • Safety margin (SM): expands the size Q of Dx; it
    can be predefined or computed dynamically
  • Ex.: S = {abc}: only 1 transaction is insufficient to
    provide an exact solution

24
(cont.)
  • Null transactions are due to either:
  • (i) an unnecessarily large safety
    margin: these should be removed from Dx
  • (ii) a large value of Q essential for
    proper hiding: these need to be validated, since Q denotes the
    lower bound on the number of transactions needed to
    ensure proper hiding

25
(cont.)
  • To ensure the minimum size of Dx, the hiding
    algorithm keeps only k null transactions
  • Qinv: number of null transactions
  • V = Q + SM − Qinv
  • Ex.: S = {abc}, Q = 1, SM = 3
  • k = max(Q − V, 0) = max(1 − 3, 0) = 0
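
The arithmetic of this example can be sketched as follows, assuming that k is computed as max(Q − V, 0) with V = Q + SM − Qinv, and that Qinv = 1 so that V = 3. Both the formula for k and the value of Qinv are read off the example rather than stated on the slide. Note that max(1 − 3, 0) evaluates to 0.

```python
def null_transactions_to_keep(q, safety_margin, q_inv):
    """Number k of null transactions kept in Dx (assumed formula).

    v is the number of transactions of Dx that are not null; null
    transactions are kept only if v falls short of the lower bound q.
    """
    v = q + safety_margin - q_inv
    return max(q - v, 0)

# Q = 1 and SM = 3 are from the slide; q_inv = 1 is an assumption.
print(null_transactions_to_keep(q=1, safety_margin=3, q_inv=1))  # 0
```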

Null transaction
26
Experimental results
27
(cont.)
28
conclusions
  • Use a minimal extension to the original database
  • It has the benefit of being exact when an ideal solution
    can be identified