Transcript and Presenter's Notes

Title: Privacy Preserving Data Mining


1
Secure Multiparty Computation Basic
Cryptographic Methods
Li Xiong CS573 Data Privacy and Security
2
The Love Game (AKA the AND game)
He loves me, he loves me not
She loves me, she loves me not
Want to know if both parties are interested in
each other, but do not want to reveal unrequited
love.
Input 1: I love you.  Input 0: I love you... as a friend.
Must compute F(X,Y) = X AND Y, giving F(X,Y) to both players.
Can we reveal the answer without revealing the
inputs?
3
The Spoiled Children Problem (AKA the
Millionaires' Problem, Yao 1982)
Who has more toys? Who cares?
Pearl wants to know whether she has more toys
than Gersh. Pearl doesn't want to tell Gersh
anything, and Gersh doesn't want Pearl to know
how many toys he has, though he is willing for
Pearl to find out who has more toys.
Can we give Pearl the information she wants, and
nothing else, without giving Gersh any
information at all?
4
Secure Multiparty Computation
  • A set of parties with private inputs
  • Parties wish to jointly compute a function of
    their inputs so that certain security properties
    (like privacy and correctness) are preserved
  • Properties must be ensured even if some of the
    parties maliciously attack the protocol
  • Examples
  • Secure elections
  • Auctions
  • Privacy preserving data mining

5
Application to Private Data Mining
  • The setting
  • Data is distributed at different sites
  • These sites may be third parties (e.g.,
    hospitals, government bodies) or may be the
    individual him or herself
  • The aim
  • Compute the data mining algorithm on the data so
    that nothing but the output is learned
  • Privacy ≠ Security (why?)

6
Privacy and Secure Computation
  • Privacy ≠ Security
  • Secure computation only deals with the process of
    computing the function
  • It does not ask whether or not the function
    should be computed
  • A two-stage process
  • Decide that the function/algorithm should be computed: an issue of privacy
  • Apply secure computation techniques to compute it securely: an issue of security

7
Outline
  • Secure multiparty computation
  • Problem and security definitions
  • Feasibility results for secure computation
  • Basic cryptographic tools and general
    constructions

8
Heuristic Approach to Security
  1. Build a protocol
  2. Try to break the protocol
  3. Fix the break
  4. Return to (2)

9
Another Heuristic Tactic
  • Design a protocol
  • Provide a list of attacks that (provably) cannot
    be carried out on the protocol
  • Reason that the list is complete
  • Problem: often, the list is not complete

10
A Rigorous Approach
  • Provide an exact problem definition
  • Adversarial power
  • Network model
  • Meaning of security
  • Prove that the protocol is secure

11
Secure Multiparty Computation
  • A set of parties with private inputs wish to
    compute some joint function of their inputs.
  • Parties wish to preserve some security
    properties. e.g., privacy and correctness.
  • Example secure election protocol
  • Security must be preserved in the face of
    adversarial behavior by some of the participants,
    or by an external party.

12
Defining Security
  • Components of ANY security definition
  • Adversarial power
  • Network model
  • Type of network
  • Existence of trusted help
  • Stand-alone versus composition
  • Security guarantees
  • It is crucial that all the above are explicitly
    and clearly defined.

13
Security Requirements
  • Consider a secure auction (with secret bids)
  • An adversary may wish to learn the bids of all parties; to prevent this, require privacy
  • An adversary may wish to win with a lower bid than the highest; to prevent this, require correctness

14
Defining Security
  • Option 1: analyze security concerns for each specific problem
  • Auctions: privacy and correctness
  • Contract signing: fairness
  • Problems
  • How do we know that all concerns are covered?
  • Definitions are application dependent and need to
    be redefined from scratch for each task

15
Defining Security: Option 2
  • The real/ideal model paradigm for defining security [GMW, GL, Be, MR, Ca]
  • Ideal model: parties send inputs to a trusted party, who computes the function for them
  • Real model: parties run a real protocol with no trusted help
  • A protocol is secure if any attack on a real
    protocol can be carried out in the ideal model
  • Since no attacks can be carried out in the ideal
    model, security is implied

16
The Real Model
(Figure: the two parties, holding inputs x and y, run the protocol directly with each other; each obtains the protocol output.)
17
The Ideal Model
(Figure: each party sends its input, x or y, to a trusted party, which returns f1(x,y) to party 1 and f2(x,y) to party 2.)
18
The Security Definition
(Figure: the REAL protocol interaction is compared with the IDEAL computation via a trusted party.)
19
Properties of the Definition
  • Privacy
  • The ideal-model adversary cannot learn more about the honest party's input than what is revealed by the function output
  • Thus, the same is true of the real-model
    adversary
  • Correctness
  • In the ideal model, the function is always
    computed correctly
  • Thus, the same is true in the real model
  • Others
  • For example, fairness, independence of inputs

20
Why This Approach?
  • General: it captures all applications
  • The specifics of an application are defined by its functionality; security is defined as above
  • The security guarantees achieved are easily
    understood (because the ideal model is easily
    understood)
  • We can be confident that we did not miss any
    security requirements

21
Adversary Model
  • Computational power
  • Probabilistic polynomial-time versus all-powerful
  • Adversarial behaviour
  • Semi-honest: follows the protocol instructions
  • Malicious: arbitrary actions
  • Corruption behaviour
  • Static: the set of corrupted parties is fixed at the onset
  • Adaptive: can choose to corrupt parties at any time during the computation
  • Number of corruptions
  • Honest majority versus unlimited corruptions

22
Outline
  • Secure multiparty computation
  • Defining security
  • Feasibility results for secure computation
  • Basic cryptographic tools and general
    constructions

23
Feasibility A Fundamental Theorem
  • Any multiparty functionality can be securely
    computed
  • For any number of corrupted parties, security with abort is achieved, assuming enhanced trapdoor permutations [Yao, GMW]
  • With an honest majority, full security is achieved, assuming private channels only [BGW, CCD]

24
Outline
  • Secure multiparty computation
  • Defining security
  • Feasibility results for secure computation
  • Basic cryptographic tools and general
    constructions

25
Public-key encryption
  • Let (G,E,D) be a public-key encryption scheme
  • G is a key-generation algorithm: (pk,sk) ← G
  • pk: public key
  • sk: secret key
  • Terms
  • Plaintext: the original text, denoted m
  • Ciphertext: the encrypted text, denoted c
  • Encryption: c = Epk(m)
  • Decryption: m = Dsk(c)
  • Concept of a one-way function: knowing c, pk, and the function Epk, it is still computationally intractable to find m.
  • Different implementations are available, e.g., RSA (a toy sketch follows below)

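To make the (G, E, D) interface concrete, here is a minimal textbook-RSA sketch (the helper names and tiny fixed primes are our own choices; there is no padding, so this is for intuition only and not secure):

```python
# Toy textbook RSA illustrating the (G, E, D) interface.
# Tiny primes, no padding -- illustration only, NOT secure.
from math import gcd

def keygen():                        # G: (pk, sk) <- G
    p, q = 1009, 1013                # small fixed primes (toy choice)
    n, phi = p * q, (p - 1) * (q - 1)
    e = 65537
    while gcd(e, phi) != 1:
        e += 2
    d = pow(e, -1, phi)              # modular inverse (Python 3.8+)
    return (n, e), (n, d)            # pk = (n, e), sk = (n, d)

def encrypt(pk, m):                  # E: c = E_pk(m)
    n, e = pk
    return pow(m, e, n)

def decrypt(sk, c):                  # D: m = D_sk(c)
    n, d = sk
    return pow(c, d, n)

pk, sk = keygen()
assert decrypt(sk, encrypt(pk, 42)) == 42
```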
26
Construction paradigms
  • Passively-secure computation for two parties
  • Use oblivious transfer to securely select a value
  • Passively-secure computation with shares
  • Use a secret-sharing scheme such that data can be reconstructed from some shares
  • From passively-secure protocols to
    actively-secure protocols
  • Use zero-knowledge proofs to force parties to
    behave in a way consistent with the
    passively-secure protocol

27
1-out-of-2 Oblivious Transfer (OT)
  • Inputs
  • Sender has two messages m0 and m1
  • Receiver has a single bit σ ∈ {0,1}
  • Outputs
  • Sender receives nothing
  • Receiver obtains mσ and learns nothing of m1-σ

28
Semi-Honest OT
  • Let (G,E,D) be a public-key encryption scheme
  • G is a key-generation algorithm: (pk,sk) ← G
  • Encryption: c = Epk(m)
  • Decryption: m = Dsk(c)
  • Assume that a public key can be sampled without knowledge of its secret key
  • Oblivious key generation: pk ← OG
  • El-Gamal encryption has this property

29
Semi-Honest OT
  • Protocol for Oblivious Transfer
  • Receiver (with input σ):
  • Receiver chooses one key pair (pk,sk) and one additional public key pk' (oblivious of its secret key).
  • Receiver sets pkσ = pk, pk1-σ = pk'
  • Note: the receiver can decrypt for pkσ but not for pk1-σ
  • Receiver sends pk0, pk1 to the sender
  • Sender (with input m0, m1):
  • Sends the receiver c0 = Epk0(m0), c1 = Epk1(m1)
  • Receiver:
  • Decrypts cσ using sk and obtains mσ (a minimal sketch of this protocol follows below).

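Below is a runnable sketch of this semi-honest OT, using textbook ElGamal over a tiny group to get the oblivious key sampling mentioned above (helper names and parameters are ours; with such small parameters nothing here is actually secure):

```python
# Toy semi-honest 1-out-of-2 OT from textbook ElGamal over a tiny group.
# Illustration only: with such a small group, discrete logs are trivial.
import random

P, G = 23, 5                                 # small prime and a generator of Z_23^*

def keygen():                                # (pk, sk) <- G
    x = random.randrange(1, P - 1)
    return pow(G, x, P), x                   # pk = g^x, sk = x

def oblivious_keygen():                      # pk <- OG: key with no known secret key
    return random.randrange(2, P)            # random group element

def encrypt(pk, m):                          # ElGamal: (g^r, m * pk^r)
    r = random.randrange(1, P - 1)
    return pow(G, r, P), (m * pow(pk, r, P)) % P

def decrypt(sk, ct):                         # m = c2 / c1^sk
    c1, c2 = ct
    return (c2 * pow(c1, P - 1 - sk, P)) % P

# Receiver, with choice bit sigma: real key in slot sigma, oblivious key in the other.
sigma = 1
pk_real, sk = keygen()
pks = [None, None]
pks[sigma], pks[1 - sigma] = pk_real, oblivious_keygen()

# Sender, with messages m0, m1 (encoded as nonzero elements mod 23).
m0, m1 = 7, 13
c0, c1 = encrypt(pks[0], m0), encrypt(pks[1], m1)

# Receiver decrypts only the chosen ciphertext.
assert decrypt(sk, c1 if sigma else c0) == (m1 if sigma else m0)
```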
30
Security Proof
  • Intuition
  • Senders view consists only of two public keys
    pk0 and pk1. Therefore, it doesnt learn anything
    about that value of ?.
  • The receiver only knows one secret-key and so can
    only learn one message
  • Note this assumes semi-honest behavior. A
    malicious receiver can choose two keys together
    with their secret keys.

31
Generalization
  • Can define 1-out-of-k oblivious transfer
  • Protocol remains the same
  • Choose k-1 public keys for which the secret key
    is unknown
  • Choose 1 public-key and secret-key pair

32
General GMW Construction
  • For simplicity consider two-party case
  • Let f be the function that the parties wish to
    compute
  • Represent f as an arithmetic circuit with
    addition and multiplication gates
  • Aim: compute gate-by-gate, revealing only random shares each time

33
Random Shares Paradigm
  • Let a be some value
  • Party 1 holds a random value a1
  • Party 2 holds a⊕a1 (for bit values, ⊕ is addition mod 2)
  • Note that without knowing a1, a⊕a1 is just a random value revealing nothing of a (see the bit-sharing sketch after this list).
  • We say that the parties hold random shares of a.
  • The computation will be such that all
    intermediate values are random shares (and so
    they reveal nothing).

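A minimal sketch of this sharing for a single bit (function names are our own):

```python
# Random XOR-sharing of a bit a: party 1 holds a1, party 2 holds a ^ a1.
import random

def share(a):
    a1 = random.randrange(2)      # uniformly random bit
    return a1, a ^ a1             # (party 1's share, party 2's share)

a1, a2 = share(1)
assert a1 ^ a2 == 1               # shares reconstruct a; each share alone is uniform
```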
34
Circuit Computation
  • Stage 1 each party randomly shares its input
    with the other party
  • Stage 2 compute gates of circuit as follows
  • Given random shares of the input wires, compute random shares of the output wires
  • Stage 3 combine shares of the output wires in
    order to obtain actual output

(Figure: a Boolean circuit over Alice's inputs and Bob's inputs, built from gates such as NOT.)
35
Addition Gates
  • Input wires to gate have values a and b
  • Party 1 has shares a1 and b1
  • Party 2 has shares a2 and b2
  • Note: a1⊕a2 = a and b1⊕b2 = b
  • To compute random shares of the output c = a⊕b:
  • Party 1 locally computes c1 = a1⊕b1
  • Party 2 locally computes c2 = a2⊕b2
  • Note: c1⊕c2 = a1⊕a2⊕b1⊕b2 = a⊕b = c (see the sketch below)

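A sketch of the (purely local) addition-gate step, continuing the XOR-sharing above (names are ours):

```python
# Local evaluation of an addition (XOR) gate on shares: no interaction needed.
def add_gate(a_i, b_i):
    return a_i ^ b_i                          # each party XORs its own two shares

a1, a2, b1, b2 = 1, 0, 0, 1                   # shares of a = 1 and b = 1
c1, c2 = add_gate(a1, b1), add_gate(a2, b2)
assert c1 ^ c2 == (a1 ^ a2) ^ (b1 ^ b2)       # shares of c = a + b (mod 2)
```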
36
Multiplication Gates
  • Input wires to gate have values a and b
  • Party 1 has shares a1 and b1
  • Party 2 has shares a2 and b2
  • Wish to compute c = ab = (a1⊕a2)·(b1⊕b2)
  • Party 1 knows its concrete share values.
  • Party 2's values are unknown to Party 1, but there are only 4 possibilities (corresponding to a2b2 = 00, 01, 10, 11)

37
Multiplication (cont)
  • Party 1 prepares a table as follows
  • Row 1 corresponds to Party 2's input 00
  • Row 2 corresponds to Party 2's input 01
  • Row 3 corresponds to Party 2's input 10
  • Row 4 corresponds to Party 2's input 11
  • Let r be a random bit chosen by Party 1
  • Row 1 contains the value a·b⊕r when a2=0, b2=0
  • Row 2 contains the value a·b⊕r when a2=0, b2=1
  • Row 3 contains the value a·b⊕r when a2=1, b2=0
  • Row 4 contains the value a·b⊕r when a2=1, b2=1

38
Concrete Example
  • Assume a1=0, b1=1
  • Assume r=1

Row   Party 2's shares   Output value
1     a2=0, b2=0         (0⊕0)·(1⊕0)⊕1 = 1
2     a2=0, b2=1         (0⊕0)·(1⊕1)⊕1 = 1
3     a2=1, b2=0         (0⊕1)·(1⊕0)⊕1 = 0
4     a2=1, b2=1         (0⊕1)·(1⊕1)⊕1 = 1
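The short snippet below reproduces this table (arithmetic mod 2; variable names are ours):

```python
# Reproduce Party 1's table for a1 = 0, b1 = 1, r = 1 (arithmetic mod 2).
a1, b1, r = 0, 1, 1
for a2 in (0, 1):
    for b2 in (0, 1):
        row = ((a1 ^ a2) & (b1 ^ b2)) ^ r     # (a1+a2)(b1+b2) + r  mod 2
        print(f"a2={a2}, b2={b2}: {row}")
# Prints 1, 1, 0, 1 -- matching rows 1-4 above.
```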
39
The Gate Protocol
  • The parties run a 1-out-of-4 oblivious transfer
    protocol
  • Party 1 plays the sender: message i is row i of the table.
  • Party 2 plays the receiver: it inputs 1 if a2=0 and b2=0, 2 if a2=0 and b2=1, and so on
  • Output
  • Party 2 receives c2 = c⊕r; this is its output
  • Party 1 outputs c1 = r
  • Note: c1 and c2 are random shares of c, as required (see the sketch below)

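A sketch of the whole multiplication-gate step. For brevity the 1-out-of-4 OT is simulated as a plain table lookup; in the real protocol Party 2 would learn only its chosen row, and Party 1 nothing, via the OT (names are ours):

```python
# Multiplication gate on XOR shares via Party 1's 4-row table.
import random

def party1_table(a1, b1, r):
    # Rows for (a2, b2) in the order 00, 01, 10, 11.
    return [((a1 ^ a2) & (b1 ^ b2)) ^ r for a2 in (0, 1) for b2 in (0, 1)]

def mult_gate(a1, b1, a2, b2):
    r = random.randrange(2)                   # Party 1's random output share
    table = party1_table(a1, b1, r)
    c2 = table[2 * a2 + b2]                   # Party 2's 1-out-of-4 OT choice (simulated)
    return r, c2                              # (c1, c2)

a1, a2, b1, b2 = 1, 0, 1, 1                   # shares of a = 1, b = 0
c1, c2 = mult_gate(a1, b1, a2, b2)
assert c1 ^ c2 == (a1 ^ a2) & (b1 ^ b2)       # shares of c = a * b
```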
40
Summary
  • By computing each gate this way, at the end the parties hold shares of the output wires.
  • The function output is then obtained by simply sending the output-wire shares to each other.

41
Security
  • Reduction to the oblivious transfer protocol
  • Assuming security of the OT protocol, parties
    only see random values until the end. Therefore,
    simulation is straightforward.
  • Note: correctness relies heavily on semi-honest behavior (otherwise a party can modify its shares).

42
Outline
  • Secure multiparty computation
  • Defining security
  • Feasibility results for secure computation
  • Basic cryptographic tools and general
    constructions
  • Coming up
  • Applications in privacy preserving distributed
    data mining
  • Randomized response protocols

43
A real-world problem and some simple solutions
  • Bob comes to Ron (a manager) with a complaint
    about a sensitive matter, asking Ron to keep his
    identity confidential
  • A few months later, Moshe (another manager) tells
    Ron that someone has complained to him, also with
    a confidentiality request, about the same matter
  • Ron and Moshe would like to determine whether the
    same person has complained to each of them
    without giving information to each other about
    their identities

(Comparing Information Without Leaking It, Fagin et al., 1996)
44
References
  • Secure Multiparty Computation for Privacy-Preserving Data Mining, Lindell and Pinkas, 2008
  • Chapter 7: General Cryptographic Protocols (7.1 Overview), Foundations of Cryptography, Volume 2, Oded Goldreich
  • http://www.wisdom.weizmann.ac.il/~oded/foc-vol2.html
  • Comparing Information Without Leaking It, Fagin et al., 1996

45
Slides credits
  • Tutorial on Secure Multi-Party Computation, Lindell
  • www.cs.biu.ac.il/~lindell/research-statements/tutorial-secure-computation.ppt
  • Introduction to Secure Multi-Party Computation, Vitaly Shmatikov, UT Austin
  • www.cs.utexas.edu/~shmat/courses/cs380s_fall08/16smc.ppt

46
Remark
  • The semi-honest model is often used as a tool for
    obtaining security against malicious parties.
  • In many (most?) settings, security against
    semi-honest adversaries does not suffice.
  • In some settings, it may suffice.
  • One example: hospitals that wish to share data.

47
Malicious Adversaries
  • The above protocol is not secure against
    malicious adversaries
  • A malicious adversary may learn more than it
    should.
  • A malicious adversary can cause the honest party
    to receive incorrect output.
  • We need to be able to extract a malicious adversary's input and send it to the trusted party.

48
Tool Zero Knowledge
  • Problem setting: a prover wishes to prove a statement to a verifier so that:
  • Zero knowledge: the verifier will learn nothing beyond the fact that the statement is correct
  • Soundness: the prover will not be able to convince the verifier of a wrong statement
  • Zero knowledge is proven using simulation.

49
Illustrative Example
  • Prover has two colored cards that he claims are
    of different color
  • The verifier is color blind and wants a proof
    that the colors are different.
  • Idea 1: use a machine to measure the light waves and determine the colors. But then the verifier will learn what the colors are.

50
Example (continued)
  • Protocol
  • Verifier writes color1 and color2 on the back of
    the cards and shows the prover
  • Verifier holds out one card so that the prover
    only sees the front
  • The prover then says whether it is color1 or color2
  • Soundness: if the cards are actually the same color, the prover fails with probability ½ in each round. By repeating many times, we obtain a good soundness bound.
  • Zero knowledge: the verifier can simulate the interaction by itself, holding out a card and saying the color it already knows (see the sketch below).

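A small simulation of the soundness bound, modeling a cheating prover (identical cards) as a pure guesser (the function name and trial counts are ours):

```python
# A cheating prover must guess which card was held out, so it survives each
# round with probability 1/2 and n rounds with probability 2^-n.
import random

def cheating_prover_passes(rounds):
    return all(random.randrange(2) == random.randrange(2) for _ in range(rounds))

trials = 100_000
hits = sum(cheating_prover_passes(10) for _ in range(trials))
print(hits / trials)   # close to 2**-10, i.e., roughly 0.001
```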
51
Zero Knowledge
  • Fundamental theorem [GMW]: zero-knowledge proofs exist for all languages in NP
  • Observation: given a commitment to the input and random tape, and given the series of incoming messages, the correctness of the next message in a protocol is an NP statement.
  • Therefore, it can be proved in zero-knowledge.

52
Protocol Compilation
  • Given any protocol, construct a new protocol as
    follows
  • Both parties commit to inputs
  • Both parties generate uniform random tape
  • Parties send messages to each other, each message
    is proved correct with respect to the original
    protocol, with zero-knowledge proofs.

53
Resulting Protocol
  • Theorem: if the initial protocol was secure
    against semi-honest adversaries, then the
    compiled protocol is secure against malicious
    adversaries.
  • Proof
  • Show that even malicious adversaries are limited
    to semi-honest behavior.
  • Show that the additional messages from the
    compilation all reveal nothing.

54
Summary
  • GMW paradigm
  • First, construct a protocol for semi-honest adversaries.
  • Then, compile it so that it is secure also
    against malicious adversaries
  • There are many other ways to construct secure protocols, some of them significantly more efficient.
  • Efficient protocols against semi-honest
    adversaries are far easier to obtain than for
    malicious adversaries.

55
Useful References
  • Oded Goldreich. Foundations of Cryptography, Volume 1: Basic Tools. Cambridge University Press.
  • Computational hardness, pseudorandomness, zero
    knowledge
  • Oded Goldreich. Foundations of Cryptography, Volume 2: Basic Applications. Cambridge University Press.
  • Chapter on secure computation
  • Papers: an endless list (I would rather not go on record here, but am very happy to personally refer people).