An Experimental Study of Association Rule Hiding Techniques

1
An Experimental Study of Association Rule Hiding
Techniques
  • Emmanuel Pontikakis
  • pontikak_at_ceid.upatras.gr
  • Dept. of Computer Engineering and Informatics
  • University of Patras
  • Patra, Greece
  • Vassilios Verykios
  • verykios_at_cti.gr
  • Dept. of Computer and Communication Engineering
  • University of Thessaly
  • Volos, Greece
  • Computer Technology Institute
  • Research Unit 3
  • Athens, Greece

2
Outline
  • Introduction - Related Work
  • Distortion-based Techniques
  • Blocking-based Techniques
  • Comparison and Analysis
  • Conclusions

3
Introduction
Database
Changed Database
4
Related Work
  • Association Rule Hiding
  • Blocking-based Technique (Saygin, Verykios,
    Clifton)
  • Distortion-based (Sanitization) Technique
    (Oliveira, Zaiane, Verykios, Dasseni)

5
Outline
  • Introduction - Related Work
  • Distortion-based Techniques
  • Blocking-based Techniques
  • Comparison and Analysis
  • Conclusions

6
Distortion-based Techniques
Sample Database
A B C D
1 1 1 0
1 0 1 1
0 0 0 1
1 1 1 0
1 0 1 1
Distorted Database
A B C D
1 1 1 0
1 0 0 1
0 0 0 1
1 1 1 0
1 0 0 1
Rule A → C has Support(A → C) = 80%, Confidence(A → C) = 100%
Rule A → C now has Support(A → C) = 40%, Confidence(A → C) = 50%
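The support and confidence figures on this slide can be checked with a few lines of Python. This is a minimal sketch: the helper names (`parse`, `count`, `support`, `confidence`) are illustrative and do not come from the presentation.

```python
COLUMNS = "ABCD"

def parse(rows):
    """Turn rows such as '1 0 1 1' into dicts like {'A': 1, 'B': 0, ...}."""
    return [dict(zip(COLUMNS, map(int, r.split()))) for r in rows]

def count(db, items):
    """Number of transactions in which every listed item equals 1."""
    return sum(1 for row in db if all(row[i] == 1 for i in items))

def support(db, items):
    """Fraction of transactions that contain all the listed items."""
    return count(db, items) / len(db)

def confidence(db, lhs, rhs):
    """count(lhs and rhs) / count(lhs) for the rule lhs -> rhs."""
    return count(db, lhs + rhs) / count(db, lhs)

sample = parse(["1 1 1 0", "1 0 1 1", "0 0 0 1", "1 1 1 0", "1 0 1 1"])
distorted = parse(["1 1 1 0", "1 0 0 1", "0 0 0 1", "1 1 1 0", "1 0 0 1"])

for name, db in (("sample", sample), ("distorted", distorted)):
    print(name, support(db, ["A", "C"]), confidence(db, ["A"], ["C"]))
```

Deleting item C from two transactions lowers the count of transactions containing both A and C from 4 to 2 while the count for A alone stays at 4, which is why the confidence halves.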
7
Side Effects
Before Hiding Process | After Hiding Process | Side Effect
Rule Ri had conf(Ri) > MCT | Rule Ri now has conf(Ri) < MCT | Rule Eliminated (Undesirable Side Effect)
Rule Ri had conf(Ri) < MCT | Rule Ri now has conf(Ri) > MCT | Ghost Rule (Undesirable Side Effect)
Large Itemset I had sup(I) > MST | Itemset I now has sup(I) < MST | Itemset Eliminated (Undesirable Side Effect)
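The three rows of the table translate directly into threshold comparisons. The sketch below assumes MCT and MST are the minimum confidence and minimum support thresholds; the function names and return strings are illustrative only.

```python
def rule_side_effect(conf_before, conf_after, mct):
    """Classify what happened to a non-sensitive rule after hiding."""
    if conf_before > mct and conf_after < mct:
        return "rule eliminated"
    if conf_before < mct and conf_after > mct:
        return "ghost rule"
    return None  # the rule stayed on the same side of the threshold

def itemset_side_effect(sup_before, sup_after, mst):
    """Classify what happened to a large itemset after hiding."""
    if sup_before > mst and sup_after < mst:
        return "itemset eliminated"
    return None

print(rule_side_effect(0.90, 0.50, mct=0.70))
print(rule_side_effect(0.60, 0.75, mct=0.70))
print(itemset_side_effect(0.40, 0.20, mst=0.30))
```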
8
Distortion-based Techniques
  • Challenges/Goals
  • To minimize the undesirable Side Effects that the
    hiding process causes to non-sensitive rules.
  • To minimize the number of 1s that must be
    deleted in the database.
  • Algorithms must run in time linear in the size
    of the database.

9
Our Proposal: Weight-based Sorting Distortion
Algorithm (WSDA)
  • High Level Description
  • Input
  • Initial Database
  • Set of Sensitive Rules
  • Safety Margin (for example, 10%)
  • Output
  • Sanitized Database
  • Sensitive Rules no longer hold in the Database

10
WSDA Algorithm
  • High Level Description
  • 1st step
  • Retrieve the set of transactions which support
    sensitive rule RS
  • For each sensitive rule RS, find the number N1 of
    transactions in which one item that supports the
    rule will be deleted
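The slide does not give a formula for N1. One plausible way to obtain it, assumed here for illustration, is to delete an item of the rule's right-hand side from supporting transactions until the confidence falls below MCT minus the safety margin. The function name and the percentage-based parameters are hypothetical.

```python
def transactions_to_modify(rule_count, lhs_count, mct_pct, margin_pct):
    """Smallest n such that (rule_count - n) / lhs_count is below
    (mct_pct - margin_pct) percent.

    Assumption: the deleted item belongs to the right-hand side, so the
    left-hand-side count stays fixed while the rule count drops by one
    per modified transaction.
    """
    target_pct = mct_pct - margin_pct
    n = 0
    # Integer arithmetic avoids floating-point rounding at the threshold.
    while n < rule_count and 100 * (rule_count - n) >= target_pct * lhs_count:
        n += 1
    return n

# Rule A -> C in the five-row sample database: 4 transactions support the
# rule and 4 support A. MCT = 70% is an assumed value; the margin is 10%.
print(transactions_to_modify(rule_count=4, lhs_count=4, mct_pct=70, margin_pct=10))
```

Under these assumed thresholds the result is 2, which agrees with the two deletions shown in the distorted sample database.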

11
WSDA Algorithm
  • High Level Description
  • 2nd step
  • For each rule Ri in the database that has items
    in common with RS, compute a weight w that denotes
    how strong Ri is
  • For each transaction that supports RS, compute a
    priority Pi that denotes how many strong rules
    this transaction supports

12
WSDA Algorithm
  • High Level Description
  • 3rd step
  • Sort the N1 transactions in ascending order
    according to their priority value Pi
  • 4th step
  • For the first N1 transactions hide an item that
    is contained in RS

13
WSDA Algorithm
  • High Level Description
  • 5th step
  • Update confidence and support values for other
    rules in the database
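The five steps can be sketched end to end as follows. Several details are assumptions because the slides leave them open: the weight of a rule is taken to be its confidence, a transaction's priority is the sum of the weights of the related rules it supports, and the deleted item is an item of the right-hand side. All names are illustrative.

```python
def count(db, items):
    """Number of transactions (sets of items) that contain all of `items`."""
    return sum(1 for t in db if items <= t)

def conf(db, rule):
    lhs, rhs = rule
    n_lhs = count(db, lhs)
    return count(db, lhs | rhs) / n_lhs if n_lhs else 0.0

def supports(t, rule):
    lhs, rhs = rule
    return (lhs | rhs) <= t

def wsda_sketch(db, rules, sensitive, n1):
    s_items = sensitive[0] | sensitive[1]
    # Step 1: transactions that support the sensitive rule.
    supporting = [i for i, t in enumerate(db) if supports(t, sensitive)]
    # Step 2: weights for rules sharing items with RS (assumed: confidence),
    # then a priority per supporting transaction (assumed: sum of weights).
    related = [r for r in rules if r != sensitive and (r[0] | r[1]) & s_items]
    weight = {r: conf(db, r) for r in related}
    priority = {i: sum(w for r, w in weight.items() if supports(db[i], r))
                for i in supporting}
    # Step 3: ascending priority, so weakly connected transactions come first.
    ordered = sorted(supporting, key=lambda i: priority[i])
    # Step 4: delete one item of RS (assumed: a right-hand-side item).
    victim = min(sensitive[1])
    new_db = [set(t) for t in db]
    for i in ordered[:n1]:
        new_db[i].discard(victim)
    # Step 5: recompute confidence for every rule on the sanitized database.
    return new_db, {r: conf(new_db, r) for r in rules}

db = [{"A", "B", "C"}, {"A", "C", "D"}, {"D"}, {"A", "B", "C"}, {"A", "C", "D"}]
a_to_c = (frozenset("A"), frozenset("C"))
b_to_c = (frozenset("B"), frozenset("C"))
d_to_a = (frozenset("D"), frozenset("A"))
new_db, new_conf = wsda_sketch(db, [a_to_c, b_to_c, d_to_a], a_to_c, n1=2)
print(new_db, new_conf[a_to_c], new_conf[b_to_c])
```

With these assumed weights, C is removed from the two transactions that do not support B → C, so the sensitive rule drops to 50% confidence while B → C keeps its original confidence.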

14
Experimental Results of WSDA algorithm
15
Experimental Results of WSDA algorithm
16
Outline
  • Introduction - Related Work
  • Distortion-based Techniques
  • Blocking-based Techniques
  • Comparison and Analysis
  • Conclusions

17
Quality of Data
  • Sometimes it is dangerous to delete items from
    the database (e.g., in medical databases), because
    the resulting false data may have undesirable
    effects.
  • So, we have to hide the rules in the database by
    adding uncertainty without distorting the
    database.

18
Blocking-based Techniques
Initial Database
A B C D
1 1 1 0
1 0 1 1
0 0 0 1
1 1 1 0
1 0 1 1
New Database
A B C D
1 1 1 0
1 0 ? 1
? 0 0 1
1 1 1 0
1 0 1 1
Support and confidence become marginal. In the New
Database, 60% ≤ conf(A → C) ≤ 100%
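The interval for conf(A → C) can be reproduced by counting, for each transaction, whether it certainly or only possibly supports the rule. The counting scheme below is an assumption chosen for illustration (the presentation's own formulas are not reproduced in this transcript), and the function names are hypothetical.

```python
COLUMNS = "ABCD"

def parse(rows):
    """Rows such as '1 0 ? 1' become dicts with values '0', '1' or '?'."""
    return [dict(zip(COLUMNS, r.split())) for r in rows]

def conf_interval(db, lhs, rhs):
    """Return (minconf, maxconf) for the rule lhs -> rhs when '?' is unknown."""
    min_both = max_both = min_lhs_only = max_lhs_only = 0
    for row in db:
        left = [row[i] for i in lhs]
        right = [row[i] for i in rhs]
        if all(v == "1" for v in left + right):
            min_both += 1          # certainly supports the rule
        if all(v in "1?" for v in left + right):
            max_both += 1          # may support the rule
        if all(v == "1" for v in left) and any(v == "0" for v in right):
            min_lhs_only += 1      # certainly supports lhs but not rhs
        if all(v in "1?" for v in left) and any(v in "0?" for v in right):
            max_lhs_only += 1      # may support lhs but not rhs
    minconf = min_both / (min_both + max_lhs_only) if min_both else 0.0
    maxconf = max_both / (max_both + min_lhs_only) if max_both else 0.0
    return minconf, maxconf

new_db = parse(["1 1 1 0", "1 0 ? 1", "? 0 0 1", "1 1 1 0", "1 0 1 1"])
print(conf_interval(new_db, ["A"], ["C"]))
```

Three transactions certainly contain both A and C, one more may contain them, and two may contain A without C, which gives the 60% lower bound and the 100% upper bound.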
19
Modification of Association Rule Definition
  • The confidence and support of a rule A → B
    become marginal
  • sup(A → B) ∈ [minsup(A → B), maxsup(A → B)]
  • conf(A → B) ∈ [minconf(A → B), maxconf(A → B)]
  • minsup(A → B)
  • maxsup(A → B)

20
Modification of Association Rule Definition
  • minconf(A → B)
  • maxconf(A → B)

21
Negative Border Rules Set (NBRS) Definition
  • When a rule R has either
  • sup(R) > MST AND conf(R) < MCT
  • OR
  • sup(R) < MST AND conf(R) > MCT,
  • then we say that R belongs to NBRS.
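The NBRS definition is a simple predicate on a rule's support and confidence. A minimal sketch, with an illustrative function name and made-up threshold values:

```python
def in_nbrs(sup, conf, mst, mct):
    """True when exactly one of the two thresholds is exceeded,
    following the definition on this slide."""
    return (sup > mst and conf < mct) or (sup < mst and conf > mct)

print(in_nbrs(sup=0.40, conf=0.55, mst=0.30, mct=0.70))  # enough support, low confidence
print(in_nbrs(sup=0.40, conf=0.90, mst=0.30, mct=0.70))  # the rule already holds
```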

22
Side Effects Definition Modification in
Blocking-based Techniques
Before Hiding Process | After Hiding Process | Side Effect
Rule Ri had conf(Ri) > MCT | Rule Ri now has minconf(Ri) < MCT | Rule Eliminated (Undesirable Side Effect)
Rule Ri had conf(Ri) < MCT | Rule Ri now has maxconf(Ri) > MCT | Ghost Rule (Desirable Side Effect)
Large Itemset I had sup(I) > MST | Itemset I now has minsup(I) < MST | Itemset Eliminated (Undesirable Side Effect)
Itemset I had sup(I) < MST | Itemset I now has maxsup(I) > MST | Ghost Itemset (Desirable Side Effect)
23
Privacy Breaches Definitions
  • If an item i, some of whose values are hidden by
    ?s, is contained in a sensitive rule, a privacy
    breach occurs if the adversary can infer this
    with confidence c.
  • For a rule R with maxconf(R) > MCT, a privacy
    breach occurs if it can be estimated, with
    confidence c, that R is either a sensitive or a
    ghost rule.
  • For a blocked item i in a specific transaction T,
    a privacy breach occurs if the adversary can
    estimate with confidence c that its original
    value is either 0 or 1.

24
Blocking-Based Techniques
  • Goals that an algorithm has to achieve
  • To insert a relatively small number of ?s while
    significantly reducing the confidence of sensitive
    rules.
  • To minimize the undesirable side effects (rules
    and itemsets lost) by selecting the items in the
    appropriate transactions to change, and maximize
    the desirable side effects.
  • To modify the database in a way that an adversary
    cannot recover the original values of the
    database.

25
Our Proposal: Blocking Algorithm (BA)
  • High Level Description
  • 1st step
  • For each sensitive rule RS (with left itemset IL
    and right itemset IR), compute how many 0s and 1s
    have to be blocked in order to reduce the
    confidence of RS.
  • 2nd step
  • Find the set of transactions TR that support RS
    and the set of transactions TLpR that partially
    support RS (they partially support the left
    itemset and do not support the right itemset).
  • For each transaction in TR, find the rules Rcommon
    that have at least one item in common with IR;
    for each transaction in TLpR, find the rules
    Rcommon ∈ NBRS that have at least one item in
    common with IL.
  • Assign a weight w to each Rcommon and a weight
    w to each Rcommon ∈ NBRS.
  • Assign a priority value PT to each transaction
    T ∈ TR such that PT is large if T has many
    Rcommon rules with large w, and a priority value
    PT to each transaction T ∈ TLpR such that PT is
    small if T has many Rcommon rules with large w.

26
Blocking Algorithm
  • High Level Description
  • 3rd step
  • Sort the transactions T ∈ TR starting from those
    with the lowest PT, and sort the transactions
    T ∈ TLpR starting from those with the highest PT.
  • 4th step
  • For the first N1 sorted T ∈ TR, block an item
    i ∈ IR, and for the first N0 sorted T ∈ TLpR,
    block an item i ∈ IL.
  • 5th step
  • Update the values minconf(Ri) and minsup(Ri) for
    all other rules that have been affected.
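Steps 3 and 4 can be sketched as two sorted passes over the database. The slides do not define the weights, so the priority values are passed in as a plain dictionary with made-up numbers. As a further assumption, TLpR is taken to be the transactions in which at least one item of IL is 0 and the right itemset is not fully supported. Step 5 is omitted, and all names are illustrative.

```python
COLUMNS = "ABCD"

def parse(rows):
    return [dict(zip(COLUMNS, r.split())) for r in rows]

def blocking_sketch(db, il, ir, priority, n1, n0):
    new_db = [dict(row) for row in db]
    # Transactions that fully support the rule IL -> IR.
    tr = [i for i, row in enumerate(db)
          if all(row[x] == "1" for x in il + ir)]
    # Assumed reading of TLpR: some IL item is 0 and IR is not fully supported.
    tlpr = [i for i, row in enumerate(db)
            if any(row[x] == "0" for x in il)
            and not all(row[x] == "1" for x in ir)]
    # Steps 3 and 4 for TR: lowest priority first, block a 1 taken from IR.
    for i in sorted(tr, key=lambda i: priority[i])[:n1]:
        new_db[i][ir[0]] = "?"
    # Steps 3 and 4 for TLpR: highest priority first, block a 0 taken from IL.
    for i in sorted(tlpr, key=lambda i: priority[i], reverse=True)[:n0]:
        zero_item = next(x for x in il if db[i][x] == "0")
        new_db[i][zero_item] = "?"
    return new_db

initial = parse(["1 1 1 0", "1 0 1 1", "0 0 0 1", "1 1 1 0", "1 0 1 1"])
priority = {0: 2.0, 1: 1.0, 2: 1.0, 3: 2.0, 4: 3.0}  # hypothetical values
blocked = blocking_sketch(initial, ["A"], ["C"], priority, n1=1, n0=1)
for row in blocked:
    print(" ".join(row[c] for c in COLUMNS))
```

With these hypothetical priorities, one 1 (item C in the second transaction) and one 0 (item A in the third transaction) are replaced by ?.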

27
Blocking-Based Techniques
  • Main Problems of blocking technique
  • The maximum confidence of a sensitive rule cannot
    be reduced.
  • If the blocking algorithm does not add enough
    uncertainty to the database, an adversary who
    applies a smart inference technique can infer the
    hidden values.
  • Both 0s and 1s must be hidden: if only 1s were
    hidden, the adversary could simply replace all
    the ?s with 1s and easily restore the initial
    database.
  • Many ?s must be inserted if we do not want an
    adversary to infer hidden data.
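The point about hiding both 0s and 1s can be illustrated with a toy attack. In this sketch rows are stored as strings and the blocked cells are chosen by hand. The function names are illustrative.

```python
def block(rows, cells):
    """Replace the (row, column) cells listed in `cells` with '?'."""
    out = [list(r) for r in rows]
    for r, c in cells:
        out[r][c] = "?"
    return ["".join(r) for r in out]

def guess_all_ones(rows):
    """The naive attack: assume every '?' was originally a 1."""
    return [r.replace("?", "1") for r in rows]

original = ["1110", "1011", "0001", "1110", "1011"]

only_ones = block(original, [(1, 2), (4, 0)])  # both blocked cells held a 1
mixed = block(original, [(1, 2), (2, 0)])      # cell (2, 0) held a 0

print(guess_all_ones(only_ones) == original)
print(guess_all_ones(mixed) == original)
```

When only 1s are blocked, the naive guess restores the database exactly; once a 0 is blocked as well, the same guess produces a wrong value.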

28
Experimental Results of Blocking Algorithm
29
Experimental Results of Blocking Algorithm (2)
30
Experimental Results of Blocking Algorithm (3)
31
Outline
  • Introduction - Related Work
  • Distortion-based Techniques
  • Blocking-based Techniques
  • Comparison and Analysis
  • Conclusions

32
Comparison and Analysis
Criterion | Distortion-based Techniques | Blocking-based Techniques
Privacy Breaches | No privacy breaches | Many kinds of privacy breaches
Simplicity of algorithms | Simpler | More complicated
Database Modification | Database contains false information | Many ?s must be inserted in the Database
33
Outline
  • Introduction - Related Work
  • Distortion-based Techniques
  • Blocking-based Techniques
  • Comparison and Analysis
  • Conclusions

34
Conclusions
  • There are open research problems in the blocking
    technique:
  • A) What techniques must be used in order to
    reduce the privacy breaches?
  • B) In what other ways can we prevent an adversary
    from inferring the association rules in the
    database?
  • C) Applying a chi-square test to the final
    database may reveal some correlations between the
    items.
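As an illustration of point C, the Pearson chi-square statistic for two binary items can be computed from a 2x2 contingency table. The sketch below skips transactions in which either item is blocked, which is an assumption rather than something the slides prescribe. With only five transactions the statistic is not statistically meaningful, so the numbers only show the mechanics.

```python
COLUMNS = "ABCD"

def parse(rows):
    return [dict(zip(COLUMNS, r.split())) for r in rows]

def chi_square_2x2(db, x, y):
    """Pearson chi-square statistic for items x and y (1 degree of freedom).

    Rows where x or y is '?' are ignored. Returns None when a row or
    column total of the contingency table is zero.
    """
    n11 = n10 = n01 = n00 = 0
    for row in db:
        a, b = row[x], row[y]
        if "?" in (a, b):
            continue
        if a == "1" and b == "1":
            n11 += 1
        elif a == "1":
            n10 += 1
        elif b == "1":
            n01 += 1
        else:
            n00 += 1
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return None
    return n * (n11 * n00 - n10 * n01) ** 2 / denom

initial = parse(["1 1 1 0", "1 0 1 1", "0 0 0 1", "1 1 1 0", "1 0 1 1"])
blocked = parse(["1 1 1 0", "1 0 ? 1", "? 0 0 1", "1 1 1 0", "1 0 1 1"])

print(chi_square_2x2(initial, "A", "C"))
print(chi_square_2x2(blocked, "A", "C"))
```

A statistic above 3.841 (the 5% critical value for one degree of freedom) would usually be read as evidence of correlation. In the blocked database of this toy example the remaining known rows all contain both items, so the statistic cannot be computed at all.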

35
References
  • Alexandre Evfimievski, Ramakrishnan Srikant,
    Rakesh Agrawal, and Johannes Gehrke, Privacy
    Preserving Mining of Association Rules, SIGKDD
    2002, Edmonton, Alberta, Canada.
  • Murat Kantarcioglu and Chris Clifton, Privacy
    Preserving Distributed Mining of Association
    Rules on Horizontally Partitioned Data, In
    Proceedings of the ACM SIGMOD Workshop on
    Research Issues in Data Mining and Knowledge
    Discovery (2002), 24-31.
  • Jaideep Vaidya and Chris Clifton, Privacy
    Preserving Association Rule Mining in Vertically
    Partitioned Data, In the 8th ACM SIGKDD
    International Conference on Knowledge Discovery
    and Data Mining (2002), 639-644.

36
References
  • Stanley R. M. Oliveira and Osmar R. Zaïane,
    Algorithms for Balancing Privacy and Knowledge
    Discovery in Association Rule Mining, In Proc.
    of the Seventh International Database Engineering
    and Applications Symposium (IDEAS'03), pp. 54-63,
    Hong Kong, July 16-18, 2003.
  • Yucel Saygin, Vassilios Verykios, and Chris
    Clifton, Using Unknowns to Prevent Discovery of
    Association Rules, SIGMOD Record 30 (2001), no.
    4, 45-54.
  • Vassilios S. Verykios, Ahmed K. Elmagarmid, Elisa
    Bertino, Yucel Saygin, and Elena Dasseni,
    Association Rule Hiding, IEEE Transactions on
    Knowledge and Data Engineering (2003).