Information Sharing across Private Databases

Slides: 43
Provided by: IBMU305
Title: Information Sharing across Private Databases


1
Information Sharing across Private Databases
  • Rakesh Agrawal
  • Alexandre Evfimievski
  • Ramakrishnan Srikant
  • IBM Almaden Research Center

2
Today's Information Sharing Systems
[Figure: two architectures, Centralized and Federated (via a Mediator), with arrows labeled Q and R]
  • Assumption: Information in each database can be
    freely shared.

3
Selective Document Sharing
  • R is shopping for technology.
  • S has intellectual property it may want to
    license.
  • First find the specific technologies where there
    is a match, and then reveal further information
    about those.

[Figure labels: R Shopping List, S Technology List]
Example 2: Government agencies sharing information on
a need-to-know basis.
4
Medical Research
  • Validate a hypothesized link between an adverse reaction to a
    drug and a specific DNA sequence.
  • Researchers should not learn anything beyond the 4
    counts.

[Figure labels: DNA Sequences, Mayo Clinic, Drug Reactions]
5
Minimal Necessary Information Sharing
  • Compute queries across databases so that no more
    information than necessary is revealed.
  • Need is driven by several trends:
  • End-to-end integration of information systems
    across companies.
  • Simultaneously compete and cooperate.
  • Security: need-to-know information sharing.
  • Privacy: legislation and stated privacy policies.

6
Talk Outline
  • Motivation
  • Problem Definition
  • Protocols
  • Cost Analysis
  • Conclusions

7
Current Techniques
  • Trusted Third Party
  • Has to be completely trusted, both with respect to intent
    and competence against security breaches.
  • Secure Multi-Party Computation
  • Given two parties with inputs x and y, compute
    f(x,y) such that the parties learn only f(x,y)
    and nothing else.
  • Can be solved by building a combinatorial circuit
    and simulating that circuit [Yao86].
  • Cost makes them impractical for database-size
    problems.

8
Our Security Model
  • No third party.
  • Main parties directly execute a protocol, which
    is designed to guarantee that they do not learn
    any more than they would have learnt had they
    given the data to a trusted third party and got
    back the answer.
  • Honest-but-curious behavior: Parties follow the
    protocol properly, except that they can record
    all computations and received messages, and analyze
    them to learn additional information.

9
Problem Statement (Ideal)
  • Given
  • Two parties R (receiver) and S (sender)
  • Databases DR and DS
  • Query Q spanning the tables in DR and DS
  • Compute the answer to Q and return it to R
    without revealing any additional information to
    either party.

Anything R can learn from the answer to the query
is fair game! Example: If Q = VR ∩ VS, then for
all v ∈ VR - VS, R knows v ∉ VS.
10
Problem Statement (Minimal Sharing)
  • Given
  • Two parties R (receiver) and S (sender)
  • Databases DR and DS
  • Query Q spanning the tables in DR and DS
  • Additional (pre-specified) categories of
    information I
  • Compute the answer to Q and return it to R
    without revealing any additional information to
    either party, except for the information
    contained in I

11
Protocols
  • Protocols for four key operations:
  • Intersection, Equijoin, Intersection Size, and
    Equijoin Size
  • Notation:
  • TR, TS: tables in DR and DS, respectively.
  • VR, VS: sets of distinct values in TR and TS,
    respectively.
  • Additional Information I:
  • For intersection, intersection size, and equijoin,
    I = {|VS|, |VR|}.
  • For equijoin size, I also includes the
    distribution of duplicates and some subset of
    information in VS ∩ VR.

12
Related Work
  • [NP99]: Protocols for the list intersection problem.
  • Oblivious evaluation of n polynomials of degree n
    each.
  • Oblivious evaluation of n^2 polynomials.
  • [HFH99]: Find people with common preferences,
    without revealing the preferences.
  • Intersection protocols are similar to ours, but
    do not provide proofs of security.
  • Private Information Retrieval
  • Privacy Preserving Data Mining

13
Talk Outline
  • Motivation
  • Problem Definition
  • Protocols
  • Intersection
  • Intersection Size and Equijoin Size
  • Joins
  • Proof Methodology
  • Cost Analysis
  • Conclusions

14
A Simple, but Incorrect, Intersection Protocol
  • R and S agree to use encryption function fe (with
    key e).
  • S sends fe(VS) to R, where fe(VS) is shorthand for
    {fe(x) | x ∈ VS}.
  • R computes VR ∩ VS = {v ∈ VR | fe(v) ∈ fe(VS)}.
  • Problem: For any element x, R can check whether
    fe(x) is in fe(VS).
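The flaw can be made concrete with a short sketch. It assumes HMAC-SHA256 as a stand-in for the shared keyed function fe; the slides do not name a specific function, and the values and guesses below are made up for illustration.

```python
import hashlib
import hmac

def f(key: bytes, value: str) -> str:
    """Deterministic keyed function standing in for f_e (an assumption)."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()

shared_key = b"key-known-to-both-R-and-S"   # e, agreed on by R and S
V_S = {"alice", "bob", "carol"}              # S's private values (illustrative)
V_R = {"bob", "dave"}                        # R's private values (illustrative)

# S sends f_e(V_S) to R.
encrypted_V_S = {f(shared_key, v) for v in V_S}

# R computes the intersection as intended.
intersection = {v for v in V_R if f(shared_key, v) in encrypted_V_S}

# The flaw: R holds the key, so R can test any guess x, not only values in V_R.
guesses = ["alice", "erin", "carol"]
leaked = {x for x in guesses if f(shared_key, x) in encrypted_V_S}

print("intersection:", intersection)
print("learned by guessing:", leaked)
```

Because R can evaluate fe on arbitrary inputs, membership in VS is exposed for every value R is willing to guess, which is more than the intersection.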
15
Intersection Protocol: Intuition
  • Still want to encrypt the values in VR and VS and
    compare the encrypted values.
  • However, want an encryption function such that it
    can only be jointly computed by R and S, not
    separately.

16
Commutative Encryption
  • Pair of encryption functions f and g such that
  • f(g(v)) = g(f(v))
  • Assuming the Decisional Diffie-Hellman (DDH)
    hypothesis,
  • fe(x) = x^e mod p
  • where
  • p is a safe prime number, i.e., both p and q = (p-1)/2
    are primes,
  • Dom f is the set of all quadratic residues modulo p, and
  • encryption key e ∈ {1, 2, ..., q-1}
  • is a commutative encryption.
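A minimal sketch of this construction, assuming the toy safe prime p = 1019 (q = 509) and arbitrary example keys. Real deployments would use a modulus of the size discussed in the cost analysis; the helper names are illustrative.

```python
P = 1019            # toy safe prime: Q = (P - 1) // 2 = 509 is also prime
Q = (P - 1) // 2

def f(e: int, x: int) -> int:
    """f_e(x) = x^e mod p."""
    return pow(x, e, P)

def is_quadratic_residue(x: int) -> bool:
    """Euler's criterion: a nonzero x is a quadratic residue mod P iff x^Q = 1 (mod P)."""
    return pow(x, Q, P) == 1

e_R, e_S = 123, 456          # example keys from {1, ..., Q - 1}
x = pow(7, 2, P)             # squaring any nonzero residue lands in Dom f

assert is_quadratic_residue(x)
assert is_quadratic_residue(f(e_S, x))            # f_e maps Dom f into Dom f
assert f(e_R, f(e_S, x)) == f(e_S, f(e_R, x))     # the two encryptions commute
```

The commutativity check holds for every x in Dom f and every pair of keys, since both sides equal x^(e_R * e_S) mod p.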

17
Commutative Encryption (2)
  • The powers commute:
  • (x^d mod p)^e mod p = x^(de) mod p = (x^e mod p)^d mod
    p
  • DDH hypothesis: The distribution of <g^a, g^b, g^(ab)>
    is computationally indistinguishable from the
    distribution of <g^a, g^b, g^c>, where a, b, c ∈r Dom
    f (∈r: chosen uniformly at random).
  • Implication: <x, x^e, y, y^e> is also
    indistinguishable from
  • <x, x^e, y, z>, where x, y, z ∈r Dom f.
  • Note: DDH does not hold if the adversary can select
    a, b, c.

18
Intersection Protocol
  • Secret keys: R holds eR, S holds eS.
  • Inputs: R holds VR, S holds VS.
  • S computes feS(VS).
  • To satisfy DDH, we apply feS on h(VS), where h is
    a hash function, not directly on VS.
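One way to realize h, sketched under assumptions: SHA-256 followed by squaring modulo p, so that the output always lies in Dom f (the quadratic residues). The slides do not specify the hash construction; the toy modulus 1019 and the helper names are illustrative, and at this size collisions between distinct values are likely.

```python
import hashlib

P = 1019            # toy safe prime; Q = 509 is also prime
Q = (P - 1) // 2

def h(value: str) -> int:
    """Hash a value to a quadratic residue mod P (one possible construction)."""
    digest = int.from_bytes(hashlib.sha256(value.encode()).digest(), "big")
    base = digest % (P - 1) + 1       # in {1, ..., P - 1}, never 0
    return pow(base, 2, P)            # squaring lands in the quadratic residues

def f(e: int, x: int) -> int:
    """f_e(x) = x^e mod p."""
    return pow(x, e, P)

e_S = 321                             # example key for S
V_S = ["alice", "bob", "carol"]       # illustrative values
encrypted_V_S = sorted(f(e_S, h(v)) for v in V_S)   # what S would send: f_eS(h(V_S))
```

Hashing first means the encrypted inputs behave like random elements of Dom f rather than values the adversary selected, which is the setting the DDH assumption covers.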
19
Intersection Protocol
  • S sends feS(VS) to R.
  • R computes feR(feS(VS)), which by the commutative
    property equals feS(feR(VS)).
20
Intersection Protocol
  • R sends feR(VR) to S.
  • S returns the pairs <y, feS(y)> for y ∈ feR(VR).
  • Since R knows <x, feR(x)>, R obtains
    <x, feS(feR(x))> for x ∈ VR.
  • R compares these with feS(feR(VS)) from the previous
    step: x is in VR ∩ VS when feS(feR(x)) appears in
    feS(feR(VS)).
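The three protocol slides can be put together in one sketch. Both parties are simulated in a single process, so this only illustrates the message flow, not a networked implementation. Assumptions: a demo-sized safe prime found at startup (the cost analysis assumes 1024-bit values), SHA-256 squared mod p as the hash h, and illustrative function names.

```python
import hashlib
import secrets

def is_prime(n: int) -> bool:
    """Trial division; adequate for the small demo modulus used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def find_safe_prime(start: int) -> int:
    """Return the first safe prime p = 2q + 1 with q >= start."""
    q = start
    while not (is_prime(q) and is_prime(2 * q + 1)):
        q += 1
    return 2 * q + 1

P = find_safe_prime(2**31)      # demo-sized modulus (assumption)
Q = (P - 1) // 2

def h(value: str) -> int:
    """Hash into the quadratic residues mod P (assumed construction)."""
    digest = int.from_bytes(hashlib.sha256(value.encode()).digest(), "big")
    return pow(digest % (P - 1) + 1, 2, P)

def f(e: int, x: int) -> int:
    """Commutative encryption f_e(x) = x^e mod p."""
    return pow(x, e, P)

def new_key() -> int:
    return secrets.randbelow(Q - 1) + 1     # uniform in {1, ..., Q - 1}

def private_intersection(V_R: set, V_S: set) -> set:
    e_R, e_S = new_key(), new_key()         # each key stays with its owner
    # S -> R: f_eS(h(V_S)), sorted so the order carries no information.
    Y_S = sorted(f(e_S, h(v)) for v in V_S)
    # R: encrypts again, obtaining f_eR(f_eS(h(V_S))).
    Z_S = {f(e_R, y) for y in Y_S}
    # R -> S: f_eR(h(V_R)); R remembers the pairs <x, f_eR(h(x))>.
    Y_R = {x: f(e_R, h(x)) for x in V_R}
    # S -> R: pairs <y, f_eS(y)> for each y received.
    pairs = {y: f(e_S, y) for y in Y_R.values()}
    # R: x is in the intersection when f_eS(f_eR(h(x))) appears in Z_S.
    return {x for x, y in Y_R.items() if pairs[y] in Z_S}

result = private_intersection({"bob", "dave", "erin"}, {"alice", "bob", "carol", "erin"})
print(sorted(result))
```

R learns which of its own values matched; S sees only encrypted values under a key it does not hold.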
21
Intersection Size Protocol
  • R sends feR(VR) to S; S sends feS(VS) to R.
  • R computes feR(feS(VS)).
  • S computes feS(feR(VR)) and returns it to R
    without pairing each result with its input.
  • R cannot map z ∈ feR(feS(VR)) back to x ∈ VR.
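A sketch of the size-only variant, again simulating both parties in one process. Assumptions: a demo-sized safe prime found at startup, SHA-256 squared mod p as the hash, and sorting as the way S breaks the link between its replies and R's inputs; the names are illustrative.

```python
import hashlib
import secrets

def is_prime(n: int) -> bool:
    """Trial division; adequate for the small demo modulus used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def find_safe_prime(start: int) -> int:
    """Return the first safe prime p = 2q + 1 with q >= start."""
    q = start
    while not (is_prime(q) and is_prime(2 * q + 1)):
        q += 1
    return 2 * q + 1

P = find_safe_prime(2**31)      # demo-sized modulus (assumption)
Q = (P - 1) // 2

def h(value: str) -> int:
    """Hash into the quadratic residues mod P (assumed construction)."""
    digest = int.from_bytes(hashlib.sha256(value.encode()).digest(), "big")
    return pow(digest % (P - 1) + 1, 2, P)

def f(e: int, x: int) -> int:
    """Commutative encryption f_e(x) = x^e mod p."""
    return pow(x, e, P)

def intersection_size(V_R: set, V_S: set) -> int:
    e_R = secrets.randbelow(Q - 1) + 1
    e_S = secrets.randbelow(Q - 1) + 1
    Y_R = sorted(f(e_R, h(v)) for v in V_R)     # R -> S
    Y_S = sorted(f(e_S, h(v)) for v in V_S)     # S -> R
    Z_S = {f(e_R, y) for y in Y_S}              # R computes f_eR(f_eS(V_S))
    # S double-encrypts R's values and returns them re-sorted, so R cannot
    # tell which reply corresponds to which of its own inputs.
    Z_R = sorted(f(e_S, y) for y in Y_R)        # S -> R
    return sum(1 for z in Z_R if z in Z_S)      # R counts the matches

size = intersection_size({"bob", "dave", "erin"}, {"alice", "bob", "carol", "erin"})
print(size)
```

The only difference from the intersection protocol is that S's replies are unpaired, so R can count matches but cannot name them.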
22
Equijoin Size Protocol
  • Same as intersection size protocol, but allows
    duplicates.
  • Can reveal some subset of information in VR ∩ VS
    based on the distribution of duplicates.
  • If each element in VR ∩ VS has the same number of
    duplicates in VR, does not reveal any additional
    information beyond the join size and the
    distribution of duplicates in VS.
  • If each element in VR ∩ VS has a unique number of
    duplicates in VR, reveals VR ∩ VS and the number
    of duplicates in VS for elements in VR ∩ VS.

23
Equijoin Protocol: Intuition
  • R needs some extra information ext(v) for values
    v ∈ VR ∩ VS.
  • ext(v): information about the other attributes in
    TS for those records where TS.A = v.
  • S has a second secret key e'S.
  • For each value v ∈ VS,
  • S generates an encryption key κ = fe'S(v), and
  • encrypts ext(v) using encryption function K with
    key κ.
  • S allows R to learn fe'S(v) only for v ∈ VR.
  • K need not be a commutative encryption.

24
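Join Protocol
  • Secret keys: R holds eR; S holds eS and e'S.
  • R sends feR(VR) to S.
  • S returns <y, feS(y), fe'S(y)> for y ∈ feR(VR), so R
    has <x, feS(feR(x)), fe'S(feR(x))> for x ∈ VR.
  • R removes its own key: feR^-1(feS(feR(x))) =
    feR^-1(feR(feS(x))) = feS(x), and likewise for e'S.
  • R now has <x, feS(x), fe'S(x)> for x ∈ VR.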
25
Join Protocol
  • R holds <x, feS(x), fe'S(x)> for x ∈ VR.
  • S sends <feS(v), K(fe'S(v), ext(v))> for v ∈ VS.
  • K: encryption function; encrypts ext(v)
    using fe'S(v) as the encryption key.
  • R obtains <x, feS(x), fe'S(x), K(fe'S(x), ext(x))> for x
    ∈ VR ∩ VS.
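The join flow can be sketched end to end, with both parties simulated in one process. Assumptions: a demo-sized safe prime found at startup, SHA-256 squared mod p as the hash, and a toy XOR keystream cipher standing in for K (the slides only require that K be some encryption function). The inverse exponent uses Python 3.8+ `pow(e, -1, q)`; all names are illustrative.

```python
import hashlib
import secrets

def is_prime(n: int) -> bool:
    """Trial division; adequate for the small demo modulus used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def find_safe_prime(start: int) -> int:
    """Return the first safe prime p = 2q + 1 with q >= start."""
    q = start
    while not (is_prime(q) and is_prime(2 * q + 1)):
        q += 1
    return 2 * q + 1

P = find_safe_prime(2**31)      # demo-sized modulus (assumption)
Q = (P - 1) // 2

def h(value: str) -> int:
    """Hash into the quadratic residues mod P (assumed construction)."""
    digest = int.from_bytes(hashlib.sha256(value.encode()).digest(), "big")
    return pow(digest % (P - 1) + 1, 2, P)

def f(e: int, x: int) -> int:
    """Commutative encryption f_e(x) = x^e mod p."""
    return pow(x, e, P)

def K(key: int, data: bytes) -> bytes:
    """Toy cipher standing in for K: XOR with a SHA-256 keystream (its own inverse)."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(f"{key}:{counter}".encode()).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def equijoin(V_R: set, T_S: dict) -> dict:
    """T_S maps each value v in V_S to ext(v); returns {x: ext(x)} for matches."""
    e_R, e_S, e_S2 = (secrets.randbelow(Q - 1) + 1 for _ in range(3))
    Y_R = {x: f(e_R, h(x)) for x in V_R}                          # R -> S
    triples = {y: (f(e_S, y), f(e_S2, y)) for y in Y_R.values()}  # S -> R
    e_R_inv = pow(e_R, -1, Q)                                     # inverse exponent mod q
    # R strips its own key: <x, f_eS(h(x)), f_e'S(h(x))>
    R_view = {x: (f(e_R_inv, triples[y][0]), f(e_R_inv, triples[y][1]))
              for x, y in Y_R.items()}
    # S -> R: <f_eS(h(v)), K(f_e'S(h(v)), ext(v))> for every v in V_S
    S_msgs = {f(e_S, h(v)): K(f(e_S2, h(v)), ext) for v, ext in T_S.items()}
    # R matches on f_eS and decrypts with the key f_e'S it learned for its own x.
    return {x: K(kappa, S_msgs[tag]) for x, (tag, kappa) in R_view.items()
            if tag in S_msgs}

joined = equijoin({"bob", "dave"}, {"alice": b"record-A", "bob": b"patent-17"})
print(joined)
```

R can decrypt ext(v) only for values it already holds, because the decryption key fe'S(h(v)) is handed out solely in response to R's own encrypted inputs.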
26
Proof Methodology
  • Consider two distributions:
  • S's view of the protocol.
  • A simulation of S's view that only uses what S is
    supposed to have at the end of the protocol,
  • e.g., VS, VS ∩ VR, and |VR| for intersection.
  • If for any VS and VR, these two distributions are
    computationally indistinguishable, then the
    protocol is secure.
  • i.e., S cannot learn anything else from the
    protocol.

27
Proof Methodology (2)
  • Simulation only uses the knowledge S is supposed
    to have at the end of the protocol.
  • Distinguisher can also use the inputs of R, i.e.,
    VR, but not R's secret keys.
  • Implication: S doesn't learn anything from the
    protocol even if S (correctly) guesses some of
    R's inputs.

28
Proofs
  • We prove (for each protocol) that if the two
    distributions can be distinguished, the DDH
    hypothesis is false.
  • Easy to come up with protocols that look okay
    but are flawed.
  • Proof of security is important for real-world
    acceptance and use.
  • The proofs are also fun!

29
Talk Outline
  • Motivation
  • Problem Statement
  • Protocols
  • Cost Analysis
  • Conclusions

30
Cost Analysis: Operations
  • Cost is dominated by exponentiations.
  • Let Ce = cost of x^e mod p
  • x, e, p are all 1024-bit integers
  • Roughly 0.02 seconds on a Pentium 3 (in 2001)
    [NP01], or 2 × 10^5 per hour
  • Intersection: 2 (|VR| + |VS|) Ce
  • Join: (2 |VR| + 5 |VS|) Ce
  • Algorithms are trivially parallelizable.

31
Selective Document Sharing: Implementation
  • For each pair of documents dR ∈ DR and dS ∈ DS:
  • R and S execute the intersection protocol to get
    |dR|, |dS|, and |dR ∩ dS|.
  • Then compute similarity function f between the
    documents.
  • Note: This protocol also reveals to R, for each
    document dR ∈ DR, the size of dR ∩ dS for each
    dS ∈ DS.
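As an illustration of the last step, the sketch below scores document pairs from the three sizes alone. The slides do not define the similarity function f; the formula |dR ∩ dS| / (|dR| + |dS|), the threshold, and the sample sizes are all assumptions chosen for the example.

```python
def similarity(size_r: int, size_s: int, size_common: int) -> float:
    """Assumed similarity: shared words relative to the combined document sizes."""
    return size_common / (size_r + size_s)

# Hypothetical protocol outputs: (|dR|, |dS|, |dR ∩ dS|) per document pair.
sizes = {
    ("r1", "s1"): (1000, 1000, 400),
    ("r1", "s2"): (1000, 800, 20),
}

THRESHOLD = 0.1   # hypothetical cut-off for declaring a match

matches = [pair for pair, (n_r, n_s, n_common) in sizes.items()
           if similarity(n_r, n_s, n_common) > THRESHOLD]
print(matches)
```

Only the pairs that clear the threshold would move on to the "reveal further information" stage described at the start of the talk.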

32
Selective Document Sharing: Cost Analysis
  • If
  • |DR| = 10 documents, |DS| = 100 documents,
  • each document has 1000 words,
  • 10 parallel processors,
  • then
  • 2 hours computation time
  • 35 minutes communication time (on T1 line).
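The computation figure can be checked with a few lines of arithmetic. The sketch assumes one protocol run per document pair, every word counted as a distinct value, and the rate of 2 × 10^5 exponentiations per hour and the cost 2 (|VR| + |VS|) Ce quoted in the operations slide; the variable names are illustrative.

```python
EXPONENTIATIONS_PER_HOUR = 200_000      # 2 x 10^5 per hour, per processor

def intersection_exponentiations(n_r: int, n_s: int) -> int:
    """Cost of one intersection run: 2 (|VR| + |VS|) exponentiations."""
    return 2 * (n_r + n_s)

docs_r, docs_s = 10, 100
words_per_doc = 1000
processors = 10

pairs = docs_r * docs_s                                                   # 1000 document pairs
total = pairs * intersection_exponentiations(words_per_doc, words_per_doc)  # 4,000,000
hours = total / EXPONENTIATIONS_PER_HOUR / processors

print(f"{total} exponentiations, about {hours} hours on {processors} processors")
```

Under these assumptions the estimate comes out at 2 hours, matching the computation time on the slide.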

33
Medical Research: Implementation
  • Let
  • VR: set of ids in R's database that took the
    drug.
  • V'R: subset of VR with adverse reaction.
  • VS: set of ids in S's database.
  • V'S: subset of VS with DNA sequence.
  • Execute intersection size protocol 4 times:
  • (VR - V'R) ∩ (VS - V'S), (VR - V'R) ∩ V'S,
  • V'R ∩ (VS - V'S), V'R ∩ V'S
  • Modified version of protocol that sends results
    directly to researchers.
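To show which four counts the researchers end up with, the sketch below computes them with plain set operations on made-up ids. In the actual setting each count would come from one run of the intersection size protocol, so neither party sees the other's ids; the variable names and sample data are illustrative.

```python
V_R = {1, 2, 3, 4, 5, 6}            # ids in R's database that took the drug
V_R_adverse = {2, 3}                # V'R: subset with adverse reaction
V_S = {1, 2, 3, 4, 5, 6, 7, 8}      # ids in S's database
V_S_dna = {2, 5, 7}                 # V'S: subset with the DNA sequence

# Each entry stands in for one intersection size protocol run.
counts = {
    ("no reaction", "no sequence"): len((V_R - V_R_adverse) & (V_S - V_S_dna)),
    ("no reaction", "sequence"):    len((V_R - V_R_adverse) & V_S_dna),
    ("reaction", "no sequence"):    len(V_R_adverse & (V_S - V_S_dna)),
    ("reaction", "sequence"):       len(V_R_adverse & V_S_dna),
}

for cell, n in counts.items():
    print(cell, n)
```

The four numbers form a 2 × 2 contingency table over the patients present in both databases, which is all the hypothesis test needs.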

34
Medical Research: Cost Analysis
  • If |VR| = |VS| = 1 million ids, and 10 parallel
    processors:
  • 4 hours computation time.
  • 1.5 hours communication time.

35
Talk Outline
  • Motivation
  • Problem Statement
  • Protocols
  • Cost Analysis
  • Conclusions

36
Summary
  • Identified information sharing across private
    databases as a new area for database research.
  • Developed novel protocols for intersection,
    intersection size, and equijoin, and proved that
    these protocols disclose minimal information.
  • Also gave protocol for equijoin size. This
    protocol reveals some information about which
    tuples joined, based on the distribution of
    duplicates.
  • Showed how new applications can be built using
    these protocols.

37
Future Work
  • What is the tradeoff between the additional
    information disclosed and efficiency?
  • Will we be able to obtain much faster protocols
    if we are willing to disclose additional
    information?
  • Can we formalize models of minimal disclosure and
    discover corresponding protocols for higher-level
    database operations?

38
Backup
39
System Components
Cryptographic Protocol
Secure Communication
Libraries (incl. Encryption Primitives)
Database
Operating System
40
Lemma 1
  • For polynomial m, the distribution of the 2 × m
    tuple
  • is indistinguishable from the distribution of the
    tuple
  • where

41
Lemma 2
  • For polynomial m and n, the distribution of the 2
    × n tuple
  • is indistinguishable from the distribution of the
    tuple
  • where

42
Lemma 3
  • For polynomial m and n, the distribution of the 3
    × n tuple
  • is indistinguishable from the distribution of the
    tuple
  • where

43
Limitations
  • Multiple Queries
  • Schema Discovery and Heterogeneity