Title: An Experimental Study of Association Rule Hiding Techniques
1An Experimental Study of Association Rule Hiding Techniques
- Emmanuel Pontikakis
- pontikak_at_ceid.upatras.gr
- Dept. of Computer Engineering and Informatics
- University of Patras
- Patra, Greece
- Vassilios Verykios
- verykios_at_cti.gr
- Dept. of Computer and Communication Engineering
- University of Thessaly
- Volos, Greece
- Computer Technology Institute
- Research Unit 3
- Athens, Greece
2Outline
- Introduction
- Related Work
- Distortion-based Techniques
- Blocking-based Techniques
- Comparison and Analysis
- Conclusions
3Introduction
Database
Changed Database
4Related Work
- Association Rule Hiding
- Blocking-based Technique (Saygin, Verykios, Clifton)
- Distortion-based (Sanitization) Technique (Oliveira, Zaiane, Verykios, Dasseni)
6Distortion-based Techniques
Sample Database
A B C D
1 1 1 0
1 0 1 1
0 0 0 1
1 1 1 0
1 0 1 1
Rule A → C has Support(A → C) = 80% and Confidence(A → C) = 100%
Distorted Database
A B C D
1 1 1 0
1 0 0 1
0 0 0 1
1 1 1 0
1 0 0 1
Rule A → C now has Support(A → C) = 40% and Confidence(A → C) = 50%
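The support and confidence figures quoted for rule A → C can be checked with a minimal sketch. The list-of-dicts representation and the helper names (`make_db`, `count`, `support`, `confidence`) are illustrative assumptions, not part of the original slides.

```python
ITEMS = ("A", "B", "C", "D")

def make_db(rows):
    """Turn rows of 0/1 values into a list of {item: value} transactions."""
    return [dict(zip(ITEMS, row)) for row in rows]

# The two tables from the slide.
sample = make_db([(1, 1, 1, 0), (1, 0, 1, 1), (0, 0, 0, 1), (1, 1, 1, 0), (1, 0, 1, 1)])
distorted = make_db([(1, 1, 1, 0), (1, 0, 0, 1), (0, 0, 0, 1), (1, 1, 1, 0), (1, 0, 0, 1)])

def count(db, itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in db if all(t[i] == 1 for i in itemset))

def support(db, itemset):
    """Fraction of transactions that contain every item in itemset."""
    return count(db, itemset) / len(db)

def confidence(db, lhs, rhs):
    """count(lhs and rhs) / count(lhs); 0.0 when lhs never occurs."""
    lhs_count = count(db, lhs)
    return count(db, lhs + rhs) / lhs_count if lhs_count else 0.0

for name, db in (("sample", sample), ("distorted", distorted)):
    print(name, support(db, ("A", "C")), confidence(db, ("A",), ("C",)))
```

Flipping C from 1 to 0 in the second and fifth transactions lowers the support of {A, C} while leaving the count of A unchanged, which is why the confidence drops as well.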
7Side Effects
Before Hiding Process | After Hiding Process | Side Effect
Rule Ri had conf(Ri) > MCT | Rule Ri now has conf(Ri) < MCT | Rule Eliminated (Undesirable Side Effect)
Rule Ri had conf(Ri) < MCT | Rule Ri now has conf(Ri) > MCT | Ghost Rule (Undesirable Side Effect)
Large Itemset I had sup(I) > MST | Itemset I now has sup(I) < MST | Itemset Eliminated (Undesirable Side Effect)
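The table above can be read as a small classification procedure. The sketch below assumes MCT and MST denote the minimum confidence and minimum support thresholds; the function names and returned labels are hypothetical.

```python
def classify_rule(conf_before, conf_after, mct):
    """Label the effect of the hiding process on a non-sensitive rule."""
    if conf_before > mct and conf_after < mct:
        return "rule eliminated"
    if conf_before < mct and conf_after > mct:
        return "ghost rule"
    return "no side effect"

def classify_itemset(sup_before, sup_after, mst):
    """Label the effect of the hiding process on a large itemset."""
    if sup_before > mst and sup_after < mst:
        return "itemset eliminated"
    return "no side effect"

# Illustrative values only, not experimental results.
print(classify_rule(0.9, 0.5, mct=0.7))
print(classify_rule(0.6, 0.8, mct=0.7))
print(classify_itemset(0.8, 0.4, mst=0.5))
```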
8Distortion-based Techniques
- Challenges/Goals
- To minimize the undesirable side effects that the hiding process causes to non-sensitive rules.
- To minimize the number of 1s that must be deleted from the database.
- Algorithms must run in time linear in the size of the database.
9Our Proposal: Weight-based Sorting Distortion Algorithm (WSDA)
- High Level Description
- Input
- Initial Database
- Set of Sensitive Rules
- Safety Margin (for example 10%)
- Output
- Sanitized Database
- Sensitive Rules no longer hold in the Database
10WSDA Algorithm
- High Level Description
- 1st step
- Retrieve the set of transactions that support the sensitive rule RS
- For each sensitive rule RS, find the number N1 of transactions in which one item that supports the rule will be deleted
11WSDA Algorithm
- High Level Description
- 2nd step
- For each rule Ri in the database that has items in common with RS, compute a weight w that denotes how strong Ri is
- For each transaction that supports RS, compute a priority Pi that denotes how many strong rules this transaction supports
12WSDA Algorithm
- High Level Description
- 3rd step
- Sort the transactions that support RS in ascending order of their priority value Pi
- 4th step
- For the first N1 transactions, hide an item that is contained in RS
13WSDA Algorithm
- High Level Description
- 5th step
- Update confidence and support values for other
rules in the database
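The five WSDA steps can be sketched as follows. The slides do not give the formulas for N1, the weight w, or the priority Pi, so this sketch makes explicit assumptions: the weight of a rule is its confidence, the priority of a transaction is the sum of the weights of the related rules it supports, N1 is the smallest number of deletions that pushes the confidence of RS below MCT minus the safety margin, and the hidden item is the first item of the right-hand side. All function and parameter names are hypothetical.

```python
import math

def holds(t, items):
    """True if transaction t contains every item in items."""
    return all(t[i] == 1 for i in items)

def confidence(db, lhs, rhs):
    lhs_count = sum(1 for t in db if holds(t, lhs))
    both = sum(1 for t in db if holds(t, lhs + rhs))
    return both / lhs_count if lhs_count else 0.0

def wsda_sketch(db, sensitive, other_rules, mct, safety_margin):
    lhs, rhs = sensitive
    db = [dict(t) for t in db]  # work on a copy
    # Step 1: transactions supporting RS and the number N1 to modify.
    supporting = [k for k, t in enumerate(db) if holds(t, lhs + rhs)]
    lhs_count = sum(1 for t in db if holds(t, lhs))
    target = mct - safety_margin
    # Assumed rule: smallest n1 with (len(supporting) - n1) / lhs_count < target.
    n1 = math.floor(len(supporting) - target * lhs_count) + 1
    n1 = max(0, min(n1, len(supporting)))
    # Step 2: weights of rules sharing items with RS (assumed: confidence),
    # and per-transaction priorities (assumed: sum of supported weights).
    related = [r for r in other_rules if set(r[0] + r[1]) & set(lhs + rhs)]
    weights = {r: confidence(db, r[0], r[1]) for r in related}
    priority = {k: sum(w for r, w in weights.items() if holds(db[k], r[0] + r[1]))
                for k in supporting}
    # Step 3: ascending priority, so transactions backing few strong rules come first.
    ordered = sorted(supporting, key=lambda k: priority[k])
    # Step 4: hide one item of RS in the first N1 transactions.
    for k in ordered[:n1]:
        db[k][rhs[0]] = 0
    # Step 5: the caller recomputes support and confidence on the returned copy.
    return db

items = ("A", "B", "C", "D")
rows = [(1, 1, 1, 0), (1, 0, 1, 1), (0, 0, 0, 1), (1, 1, 1, 0), (1, 0, 1, 1)]
sample = [dict(zip(items, r)) for r in rows]
others = [(("B",), ("C",)), (("D",), ("C",))]  # hypothetical non-sensitive rules
sanitized = wsda_sketch(sample, (("A",), ("C",)), others, mct=0.7, safety_margin=0.1)
print([tuple(t[i] for i in items) for t in sanitized])
print(confidence(sanitized, ("A",), ("C",)))
```

With these assumed inputs the sketch modifies the second and fifth transactions, which matches the distorted database shown on the earlier slide; other weighting schemes could select different transactions.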
14Experimental Results of WSDA algorithm
15Experimental Results of WSDA algorithm
17Quality of Data
- Sometimes it is dangerous to delete items from the database (e.g., in medical databases), because the false data may create undesirable effects.
- So we have to hide the rules by adding uncertainty to the database without distorting it.
18Blocking-based Techniques
Initial Database
A B C D
1 1 1 0
1 0 1 1
0 0 0 1
1 1 1 0
1 0 1 1
New Database
A B C D
1 1 1 0
1 0 ? 1
? 0 0 1
1 1 1 0
1 0 1 1
Support and confidence become marginal (ranges rather than single values). In the New Database, 60% ≤ conf(A → C) ≤ 100%
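The confidence range for A → C in the New Database can be verified by brute force: replace every ? with 0 or 1 in all possible ways and take the smallest and largest confidence. This is a sketch for small examples only (the number of completions grows exponentially with the number of ?s); the helper names are assumptions.

```python
from itertools import product

ITEMS = ("A", "B", "C", "D")
blocked = [dict(zip(ITEMS, row)) for row in [
    (1, 1, 1, 0), (1, 0, "?", 1), ("?", 0, 0, 1), (1, 1, 1, 0), (1, 0, 1, 1)]]

def completions(db):
    """Yield every database obtained by replacing each '?' with 0 or 1."""
    unknowns = [(k, item) for k, t in enumerate(db)
                for item, v in t.items() if v == "?"]
    for values in product((0, 1), repeat=len(unknowns)):
        filled = [dict(t) for t in db]
        for (k, item), v in zip(unknowns, values):
            filled[k][item] = v
        yield filled

def confidence(db, lhs, rhs):
    """Confidence of lhs -> rhs, or None when lhs never occurs."""
    lhs_count = sum(1 for t in db if all(t[i] == 1 for i in lhs))
    both = sum(1 for t in db if all(t[i] == 1 for i in lhs + rhs))
    return both / lhs_count if lhs_count else None

def confidence_interval(db, lhs, rhs):
    """Smallest and largest confidence over all completions of db."""
    values = [c for c in (confidence(d, lhs, rhs) for d in completions(db))
              if c is not None]
    return min(values), max(values)

print(confidence_interval(blocked, ("A",), ("C",)))
```

The minimum arises when the blocked C was 0 and the blocked A was 1; the maximum when the blocked C was 1 and the blocked A was 0.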
19Modification of Association Rule Definition
- The confidence and support of a rule A → B become marginal, i.e., ranges rather than single values
- sup(A → B) ∈ [minsup(A → B), maxsup(A → B)]
- conf(A → B) ∈ [minconf(A → B), maxconf(A → B)]
- minsup(A → B)
- maxsup(A → B)
20Modification of Association Rule Definition
- minconf(A → B)
- maxconf(A → B)
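The formulas for these four quantities did not survive on the slides, so the following is only a sketch under stated assumptions: a transaction counts toward the minimum support of an itemset when all its items are certainly 1, and toward the maximum support when all its items are 1 or ?. The confidence bounds below are conservative ratios of those counts (valid bounds, though not necessarily tight) and may differ from the authors' exact definitions.

```python
def min_count(db, itemset):
    """Transactions in which every item of itemset is certainly 1."""
    return sum(1 for t in db if all(t[i] == 1 for i in itemset))

def max_count(db, itemset):
    """Transactions in which every item of itemset is 1 or unknown."""
    return sum(1 for t in db if all(t[i] in (1, "?") for i in itemset))

def support_bounds(db, itemset):
    return min_count(db, itemset) / len(db), max_count(db, itemset) / len(db)

def confidence_bounds(db, lhs, rhs):
    """Assumed bounds: min_count(AB)/max_count(A) and max_count(AB)/min_count(A)."""
    lo_den, hi_den = max_count(db, lhs), min_count(db, lhs)
    lo = min_count(db, lhs + rhs) / lo_den if lo_den else 0.0
    hi = min(1.0, max_count(db, lhs + rhs) / hi_den) if hi_den else 1.0
    return lo, hi

items = ("A", "B", "C", "D")
new_db = [dict(zip(items, r)) for r in [
    (1, 1, 1, 0), (1, 0, "?", 1), ("?", 0, 0, 1), (1, 1, 1, 0), (1, 0, 1, 1)]]
print(support_bounds(new_db, ("A", "C")))
print(confidence_bounds(new_db, ("A",), ("C",)))
```

On the blocked example database these bounds give a confidence range of 60% to 100% for A → C, in line with the figure quoted on the blocking slide.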
21Negative Border Rules Set (NBRS) Definition
- When a rule R has either
- sup(R) > MST AND conf(R) < MCT
- OR
- sup(R) < MST AND conf(R) > MCT,
- then we say that R belongs to NBRS.
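The NBRS membership condition translates directly into a predicate. The function name `in_nbrs` and the example values are hypothetical.

```python
def in_nbrs(sup, conf, mst, mct):
    """True when exactly one of support and confidence clears its threshold."""
    return (sup > mst and conf < mct) or (sup < mst and conf > mct)

# Illustrative (support, confidence) pairs, not taken from the experiments.
rules = {"R1": (0.6, 0.5), "R2": (0.3, 0.9), "R3": (0.6, 0.9), "R4": (0.2, 0.4)}
nbrs = sorted(name for name, (s, c) in rules.items() if in_nbrs(s, c, mst=0.5, mct=0.7))
print(nbrs)
```

Rules in this set are the ones closest to becoming ghost rules, since only one of their two measures has to cross its threshold.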
22Side Effects Definition Modification in Blocking-based Techniques
Before Hiding Process | After Hiding Process | Side Effect
Rule Ri had conf(Ri) > MCT | Rule Ri now has minconf(Ri) < MCT | Rule Eliminated (Undesirable Side Effect)
Rule Ri had conf(Ri) < MCT | Rule Ri now has maxconf(Ri) > MCT | Ghost Rule (Desirable Side Effect)
Large Itemset I had sup(I) > MST | Itemset I now has minsup(I) < MST | Itemset Eliminated (Undesirable Side Effect)
Itemset I had sup(I) < MST | Itemset I now has maxsup(I) > MST | Ghost Itemset (Desirable Side Effect)
23Privacy Breaches Definitions
- If an item i, some of whose values are hidden by ?s, is contained in a sensitive rule, a privacy breach occurs if the adversary can infer this with confidence c.
- For a rule R with maxconf(R) > MCT, a privacy breach occurs if it can be estimated, with confidence c, that R is either a sensitive or a ghost rule.
- For a blocked item i in a specific transaction T, a privacy breach occurs if the adversary can estimate with confidence c that its original value is either 0 or 1.
24Blocking-Based Techniques
- Goals that an algorithm has to achieve
- To insert a relatively small number of ?s while significantly reducing the confidence of sensitive rules.
- To minimize the undesirable side effects (rules and itemsets lost) by selecting the items in the appropriate transactions to change, and to maximize the desirable side effects.
- To modify the database in such a way that an adversary cannot recover its original values.
25Our Proposal: Blocking Algorithm (BA)
- High Level Description
- 1st step
- For each sensitive rule RS (with left itemset IL and right itemset IR), compute how many 0s and 1s have to be blocked in order to reduce the confidence of RS.
- 2nd step
- Find the set of transactions TR that support RS, and the set of transactions TLpR that partially support RS (they partially support the left itemset and do not support the right itemset).
- For each transaction in TR, find the rules Rcommon that have at least one item in common with IR; for each transaction in TLpR, find the rules R'common ∈ NBRS that have at least one item in common with IL.
- Assign a weight w to each Rcommon and a weight w' to each R'common.
- Assign a priority value PT to each transaction T in TR such that PT is large if T has many Rcommon rules with large w, and a priority value P'T to each transaction T in TLpR such that P'T is small if T has many R'common rules with large w'.
26Blocking Algorithm
- High Level Description
- 3rd step
- Sort the transactions T ∈ TR starting from those with the lowest priority PTi, and sort the transactions T ∈ TLpR starting from those with the highest priority.
- 4th step
- For the first N1 sorted transactions T ∈ TR, block an item i ∈ IR, and for the first N0 sorted transactions T ∈ TLpR, block an item i ∈ IL.
- 5th step
- Update the values minconf(Ri) and minsup(Ri) for all other rules that have been affected.
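Steps 3 and 4 of the Blocking Algorithm can be sketched as below. The counts N1 and N0 and the priority values are taken as inputs, because the slides do not give their formulas; the priorities in the example are hypothetical numbers chosen for illustration. The sketch also assumes that a transaction belongs to the partially supporting set when at least one item of the left itemset is 0 and the right itemset is not fully present.

```python
def blocking_sketch(db, lhs, rhs, n1, n0, priority):
    """Block n1 ones of the right itemset and n0 zeros of the left itemset."""
    db = [dict(t) for t in db]  # work on a copy
    # Transactions that fully support the sensitive rule.
    tr = [k for k, t in enumerate(db) if all(t[i] == 1 for i in lhs + rhs)]
    # Assumed reading of "partially support": some left item is 0, right side absent.
    tlpr = [k for k, t in enumerate(db)
            if any(t[i] == 0 for i in lhs) and not all(t[i] == 1 for i in rhs)]
    # Step 3: lowest priority first for tr, highest priority first for tlpr.
    tr.sort(key=lambda k: priority[k])
    tlpr.sort(key=lambda k: priority[k], reverse=True)
    # Step 4: replace a 1 of the right itemset, or a 0 of the left itemset, with '?'.
    for k in tr[:n1]:
        db[k][rhs[0]] = "?"
    for k in tlpr[:n0]:
        item = next(i for i in lhs if db[k][i] == 0)
        db[k][item] = "?"
    return db

items = ("A", "B", "C", "D")
initial = [dict(zip(items, r)) for r in [
    (1, 1, 1, 0), (1, 0, 1, 1), (0, 0, 0, 1), (1, 1, 1, 0), (1, 0, 1, 1)]]
hypothetical_priority = {0: 2.0, 1: 0.5, 2: 1.0, 3: 2.0, 4: 1.5}
new_db = blocking_sketch(initial, ("A",), ("C",), n1=1, n0=1,
                         priority=hypothetical_priority)
print([tuple(t[i] for i in items) for t in new_db])
```

With these inputs one 1 (item C in the second transaction) and one 0 (item A in the third transaction) are blocked, reproducing the New Database shown on the earlier blocking slide.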
27Blocking-Based Techniques
- Main Problems of blocking technique
- The maximum confidence of a sensitive rule cannot be reduced.
- An adversary can infer the hidden values by applying a smart inference technique if the blocking algorithm does not add enough uncertainty to the database.
- Both 0s and 1s must be hidden, because if only 1s were hidden the adversary could simply replace all the ?s with 1s and easily restore the initial database.
- Many ?s must be inserted if we don't want an adversary to infer the hidden data.
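The point about hiding both 0s and 1s can be illustrated with a minimal sketch of the naive attack. The function `naive_restore` and the two example databases are illustrative assumptions.

```python
def naive_restore(db):
    """Adversary's guess: every blocked value was originally a 1."""
    return [{i: (1 if v == "?" else v) for i, v in t.items()} for t in db]

items = ("A", "B", "C", "D")

def make_db(rows):
    return [dict(zip(items, r)) for r in rows]

original = make_db([(1, 1, 1, 0), (1, 0, 1, 1), (0, 0, 0, 1), (1, 1, 1, 0), (1, 0, 1, 1)])
# Only 1s blocked: the attack recovers the original database exactly.
only_ones = make_db([(1, 1, 1, 0), (1, 0, "?", 1), (0, 0, 0, 1), (1, 1, 1, 0), (1, 0, "?", 1)])
# A 0 and a 1 blocked: the same attack produces a wrong database.
mixed = make_db([(1, 1, 1, 0), (1, 0, "?", 1), ("?", 0, 0, 1), (1, 1, 1, 0), (1, 0, 1, 1)])

print(naive_restore(only_ones) == original)
print(naive_restore(mixed) == original)
```

Blocking a mix of 0s and 1s defeats this particular guess, but as the slide notes, a smarter inference technique may still succeed unless enough uncertainty is added.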
28Experimental Results of Blocking Algorithm
29Experimental Results of Blocking Algorithm (2)
30Experimental Results of Blocking Algorithm (3)
32Comparison and Analysis
Criterion | Distortion-based Techniques | Blocking-based Techniques
Privacy Breaches | No privacy breaches | Many kinds of privacy breaches
Simplicity of algorithms | Simpler | More complicated
Database Modification | Database contains false information | Many ?s must be inserted in the Database
34Conclusions
- There are open research problems in the Blocking Technique
- A) What techniques must be used in order to reduce the privacy breaches?
- B) In what other ways can we prevent an adversary from inferring the association rules in the database?
- C) Applying a chi-square test to the final database may reveal some correlations between the items.
35References
- Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal, and Johannes Gehrke, Privacy Preserving Mining of Association Rules, SIGKDD 2002, Edmonton, Alberta, Canada.
- Murat Kantarcioglu and Chris Clifton, Privacy Preserving Distributed Mining of Association Rules on Horizontally Partitioned Data, In Proceedings of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (2002), 24-31.
- Jaideep Vaidya and Chris Clifton, Privacy Preserving Association Rule Mining in Vertically Partitioned Data, In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2002), 639-644.
36References
- Stanley R. M. Oliveira and Osmar R. Zaïane, Algorithms for Balancing Privacy and Knowledge Discovery in Association Rule Mining, In Proc. of the Seventh International Database Engineering and Applications Symposium (IDEAS'03), pp. 54-63, Hong Kong, July 16-18, 2003.
- Yucel Saygin, Vassilios Verykios, and Chris Clifton, Using Unknowns to Prevent Discovery of Association Rules, SIGMOD Record 30 (2001), no. 4, 45-54.
- Vassilios S. Verykios, Ahmed K. Elmagarmid, Elisa Bertino, Yucel Saygin, and Elena Dasseni, Association Rule Hiding, IEEE Transactions on Knowledge and Data Engineering (2003).