Secure Incremental Maintenance of Distributed Association Rules - PowerPoint PPT Presentation

Slides: 38
Provided by: dept74

1
Secure Incremental Maintenance of Distributed
Association Rules
2
Agenda
  • Introduction
  • Secure Technologies
  • Problem Definition
  • Our algorithm
  • Experiments
  • Conclusions

3
Introduction
  • Association Rules
  • A means to identify patterns and trends
  • Secure Distributed Association Rules
  • Privacy is a concern
  • Restricted usage of some information
  • Maintenance of environment
  • Association rules with more sites
  • Use past results to reduce workload

4
Secure Data Mining
  • Approach 1: Data Obfuscation
  • Association rules from modified data
  • Simple algorithms, but may produce false rules
  • Approach 2: Secure Protocols
  • Complex communication
  • Difficult and costly algorithms, but produce accurate
    rules
  • Balance between cost and privacy

5
Secure Technologies
  • Secure Sum
  • There are n sites
  • Each site holds a private number
  • Compute the sum of a group of sites
  • Secure Union
  • There are n sites
  • Each site holds a private set of items
  • Compute the union of sets

6
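Secure Sum Example
  • Upper bound 40; Site 1 holds the random number R = 28
  • Private values: Site 1 holds 10, Site 2 holds 11, Site 3 holds 8
  • Site 1 sends (10 + 28) mod 40 = 38 to Site 2
  • Site 2 sends (38 + 11) mod 40 = 9 to Site 3
  • Site 3 sends (9 + 8) mod 40 = 17 to Site 1
  • Site 1 computes (17 − 28) mod 40 = 29, the sum
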
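
The ring protocol in the example above can be sketched in a few lines. This is only an illustration, not the presenters' code: the function name `secure_sum` and the list-based "message passing" are assumptions, and the sketch assumes the true sum is smaller than the upper bound.

```python
import random

def secure_sum(values, upper_bound, r=None):
    """Simulate a ring-based secure sum (hypothetical helper).

    values[0] belongs to the initiating site, which also holds the mask r.
    Returns the sum and the list of messages passed around the ring.
    """
    if r is None:
        r = random.randrange(upper_bound)  # initiator's private random mask
    messages = []
    running = (values[0] + r) % upper_bound  # initiator masks its own value
    messages.append(running)
    for v in values[1:]:
        # each later site only ever sees a masked running total
        running = (running + v) % upper_bound
        messages.append(running)
    total = (running - r) % upper_bound  # initiator removes the mask
    return total, messages

total, messages = secure_sum([10, 11, 8], upper_bound=40, r=28)
print(total, messages)  # 29 [38, 9, 17]
```

With a uniformly random R, each intermediate message is uniformly distributed modulo the upper bound, which is why a single site learns nothing from the message it receives as long as sites do not collude.
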
7
Secure Technologies
  • Secure Comparison
  • Two sites
  • A site holds a number a, another holds a number b
  • Check whether a > b without letting anyone know the
    values of a and b

8
Problem Definition
  • There are n old sites
  • The association rules for these sites are known
  • There are r new sites
  • The association rules must be updated for the new
    environment
  • Privacy must be maintained as well

9
Privacy? What to protect?
  • Different requirements in different situations
  • Basic requirements
  • Protect individual transactions
  • Protect individual site information
  • Local large itemsets, counts for itemsets
  • Secure Multi-party computation
  • The process does not reveal any other useful
    information except the information that can be
    derived from own input and the final result

10
Algorithms
  • Secure Incremental Maintenance of Distributed
    Association Rules (SIMDAR)
  • Mining association rules with basic privacy level
  • More Secure Incremental Maintenance of
    Distributed Association Rules (MSIMDAR)
  • Mining association rules under the definition of
    Secure Multiparty computation

11
SIMDAR: What do we know? (Assumptions)
  • Original large itemsets Lk are available
  • Total count for each old large itemset is known
  • All sites follow a semi-honest model
  • They follow the rules, but may try to guess
    others' information based on the received data
    (intermediate messages)
  • No collusion among any sites
  • Sites do not exchange intermediate information

12
Algorithm - SIMDAR
  • To find the large itemsets
  • Generate the candidate sets
  • Count on the candidates
  • Summing counts
  • Check for large itemset
  • Check if an association rule holds
  • Easy with counts available

13
Generate the candidates
  • C1 = I
  • For Ck,
  • Each new site generates its own candidate set
    from its own locally large and globally large
    (k-1)-itemsets
  • Secure Union to find the candidate sets from the
    new sites
  • Union with Lk
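
The candidate-generation steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names `apriori_gen` and `candidates_k` are hypothetical, itemsets are modeled as `frozenset`s, the standard Apriori join-and-prune step is assumed for per-site generation, and a plain set union stands in for the Secure Union protocol (which would hide which site contributed which candidate).

```python
from itertools import combinations

def apriori_gen(large_prev):
    """Join (k-1)-itemsets that differ in one item, then prune any
    candidate that has a (k-1)-subset outside large_prev."""
    large_prev = set(large_prev)
    candidates = set()
    for a in large_prev:
        for b in large_prev:
            union = a | b
            if len(union) != len(a) + 1:
                continue
            subsets = combinations(union, len(a))
            if all(frozenset(sub) in large_prev for sub in subsets):
                candidates.add(union)
    return candidates

def candidates_k(site_large_prev, old_large_k):
    """site_large_prev: one set of large (k-1)-itemsets per new site.
    old_large_k: the previously known large k-itemsets Lk."""
    per_site = [apriori_gen(s) for s in site_large_prev]
    merged = set().union(*per_site)  # plain union; stand-in for Secure Union
    return merged | set(old_large_k)

site1 = {frozenset("A"), frozenset("B"), frozenset("C")}
site2 = {frozenset("B"), frozenset("D")}
old_l2 = {frozenset("AB"), frozenset("CD")}
result = candidates_k([site1, site2], old_l2)
print(sorted("".join(sorted(c)) for c in result))
# ['AB', 'AC', 'BC', 'BD', 'CD']
```
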

14
Summing on candidates
  • Partition into 2 groups
  • Pk: candidates in Lk
  • Qk: candidates not in Lk
  • For Pk, we already have the original count, so just add
    the counts from the new sites using Secure Sum (no scan
    of old sites)

15
Summing Count for Qk
  • Counts are first summed over the new sites
  • If the itemset is large in new sites, send to old
    sites for scan
  • Otherwise, prune away
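
The split into Pk and Qk and the pruning rule can be illustrated with a short sketch. The pruning is sound because an itemset outside Lk is not large over the old sites, so if it is also not large over the new sites it cannot be large overall. The function name `update_counts`, the dictionary layout, and the use of an ordinary `sum` in place of Secure Sum are all assumptions made for illustration.

```python
def update_counts(candidates, old_counts, new_site_counts,
                  new_db_size, min_support):
    """old_counts: total count of every old large itemset (members of Lk).
    new_site_counts: one {itemset: count} dict per new site.
    Returns (updated totals for Pk, Qk itemsets the old sites must scan)."""
    updated = {}
    scan_at_old_sites = []
    for x in candidates:
        # stand-in for Secure Sum over the new sites
        new_total = sum(site.get(x, 0) for site in new_site_counts)
        if x in old_counts:
            # Pk: old total is known, no scan of old sites needed
            updated[x] = old_counts[x] + new_total
        elif new_total >= min_support * new_db_size:
            # Qk and large in the new sites: old sites must count it
            scan_at_old_sites.append(x)
        # otherwise: Qk and not large in the new sites, pruned
    return updated, scan_at_old_sites

old_counts = {"AB": 30}
new_sites = [{"AB": 4, "AC": 6, "AD": 1}, {"AB": 3, "AC": 5, "AD": 2}]
print(update_counts(["AB", "AC", "AD"], old_counts, new_sites,
                    new_db_size=20, min_support=0.5))
# ({'AB': 37}, ['AC'])
```
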

16
Information Protected by SIMDAR
  • Individual transaction
  • We never access the individual transactions of
    other sites
  • Large Itemset of specific site
  • They are input to Secure Union
  • Count of each Itemset on each site
  • They are input to Secure Sum

17
MSIMDAR for a Higher Privacy Level
  • Final result: global association rules
  • Input: site database
  • Other information should be protected
  • Cannot reveal large itemsets?
  • Costly checking
  • We treat the large itemsets as part of the result

18
MSIMDAR
  • Target: global large itemsets and association
    rules
  • Useful information revealed by SIMDAR
  • Total Counts of itemsets
  • Original results of large itemset to new sites
  • New Candidates at new sites to old sites
  • Add fake itemset to hide the actual supported
    itemsets

19
MSIMDAR
  • Hiding the total count of an itemset
  • Do we really need to find out the total count?
  • Protect the large itemsets of the original
    results
  • Use a more complex protocol

20
MSIMDAR: Adding
  • Total excess count
  • X.excess = X.count − s × DB
  • Instead of summing X.count_i, we sum the excess
    count X.excess_i
  • Even if it is revealed, we cannot know the count or the
    database size
  • Checking for large itemsets after Secure Sum
  • Sa (the first site) holds the random key Rx
  • Sb (the last site) holds (X.count − s × DB + Rx)
  • Secure Comparison between Sa and Sb
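
A small sketch of the excess-count check follows. It assumes the excess is the count minus s times the database size, that the key Rx is added as a mask, and that "large" means a non-negative total excess. The names `masked_excess` and `is_large` are hypothetical, a plain comparison stands in for the Secure Comparison protocol, and modular wrap-around is omitted for brevity.

```python
from fractions import Fraction

def masked_excess(counts, db_sizes, s, rx):
    """Value held by the last site Sb: the sum of the local excess
    counts (count_i - s * db_i) plus the random key rx held by Sa."""
    running = rx  # Sa starts the ring with its random key
    for count, db in zip(counts, db_sizes):
        running += count - s * db  # each site adds its own excess
    return running

def is_large(masked, rx):
    # Stand-in for Secure Comparison: in the protocol neither Sa nor Sb
    # reveals its number; here the two values are simply compared.
    return masked >= rx

s = Fraction(1, 4)   # minimum support (assumed value)
rx = 1000            # Sa's random key (assumed value)
db_sizes = [100, 50, 50]

print(is_large(masked_excess([30, 12, 10], db_sizes, s, rx), rx))  # True
print(is_large(masked_excess([10, 12, 10], db_sizes, s, rx), rx))  # False
```

Only the sign of the total excess is learned; the total count and the total database size stay hidden behind Rx.
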

21
Storage
  • We can reuse the counts later, and we will need them in the
    future
  • Checking for association rules requires counting
    information
  • Prepare for next update

22
Storage
  • Commonly used method
  • Each site holds its own information
  • Count for each itemset
  • Database size
  • Need to calculate the total count each time

23
Storage
  • We first sum the total database size DB using
    Secure Sum
  • Su (first site) holds the Secure Sum key Rt
  • Sv (last site) gets the masked sum DB + Rt
  • For each itemset X, we also store
  • The protecting key Rx
  • The protected excess count X.excess + Rx

24
Reusing the count
  • Checking association rules
  • A.count − c × B.count > 0
  • Can be derived from six stored numbers
  • N1 + (−1)N2 + (−c)N3 + (c)N4 + (1−c)sN5
    + (c−1)sN6 > 0
  • N1 = A.excess + Ra
  • N2 = Ra
  • N3 = B.excess + Rb
  • N4 = Rb
  • N5 = DB + Rt
  • N6 = Rt
  • Secure sum and secure comparison
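
The algebra behind the six stored numbers can be checked numerically. The sketch below assumes X.excess = X.count − s × DB and that each key is added as a mask (N1 = A.excess + Ra, N2 = Ra, and so on); the signs in `rule_margin` are derived from those assumptions. The function name and all sample values are hypothetical. In the protocol the combination would be evaluated with Secure Sum and Secure Comparison, so no single site would see all six numbers as this check does.

```python
from fractions import Fraction

def rule_margin(n1, n2, n3, n4, n5, n6, c, s):
    """Linear combination of the six stored numbers.
    Equals A.count - c * B.count when the masks cancel."""
    return (n1 - n2
            - c * n3 + c * n4
            + (1 - c) * s * n5 - (1 - c) * s * n6)

# Sample values (assumed for illustration only)
a_count, b_count, db = 40, 60, 200
s, c = Fraction(1, 10), Fraction(1, 2)  # support and confidence thresholds
ra, rb, rt = 7, 11, 13                  # random keys

n1 = (a_count - s * db) + ra  # protected excess of A
n2 = ra
n3 = (b_count - s * db) + rb  # protected excess of B
n4 = rb
n5 = db + rt                  # protected total database size
n6 = rt

margin = rule_margin(n1, n2, n3, n4, n5, n6, c, s)
print(margin, a_count - c * b_count)  # 10 10
```
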

25
Avoiding new sites knowing past results
  • Generating the candidates is similar, except that an
    old site joins the Secure Union process
  • For counting, two old sites join
  • Define
  • Pk = Lk ∩ Ck
  • Qk = Ck − Pk
  • Note that the new sites should not be able to
    distinguish Pk from Qk

26
Adding counts in new site
27
Adding for Pk
  • [Diagram: old sites A and B and the new sites; labels: Protected excess, Random Key, Sum, Secure Compare]
28
Adding for Qk
  • [Diagram: old sites A and B and the new sites; labels: 0, 0, Sum, Secure Compare]
29
New site pruning
  • New sites send the count to an old site to
    continue
  • For Pk, we get the final excess count
  • The comparison tells whether the itemset is large across all
    sites
  • For Qk, we get the excess count in the new sites
  • The comparison tells whether the itemset is large in the new
    sites

30
Experiments
  • 3 programs
  • With privacy but no maintenance (SEC)
  • No Privacy but maintenance (MAN)
  • With privacy and maintenance (MSIMDAR)
  • Environment
  • P4 1.7GHz under Linux
  • Each site is simulated by an individual computer
  • Measure
  • CPU time

31
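DB size
  • [Chart; not preserved in the transcript]

32
Support
  • [Chart; not preserved in the transcript]

33
Ratio
  • [Chart; surviving values: Total, 123, 96, 69, 312]

34
Ratio
  • [Chart; not preserved in the transcript]
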
35
Analysis
  • Processing time at new sites is much longer
  • About 3 to 5 times that of old sites
  • Cost overhead due to secure algorithm
  • At old sites, 10% of total cost on average
  • At new sites, 6% of total cost on average
  • Both decrease proportionally as the db size
    increases

36
Conclusion
  • We have proposed algorithms to solve the
    maintenance problem at different privacy levels
  • All give a more efficient solution than
    simply ignoring the past results
  • The number of sites is most likely to
    increase
  • The load on old sites will be low relative to
    new sites
  • High entrance cost but low maintenance cost

37
End