Title: Poorvi Vora
1A model for data revelation
- Poorvi Vora
- Dept. of Computer Science
- George Washington University
2Security frameworks
- Binary
- Divide the world into trusted and untrusted parties
- Provides complete revelation of information or complete protection
- E.g. multiparty computation, encrypted data
3Even a statistic or aggregate reveals private
information
- Secure multiparty computation reveals
- f(x1, x2, ..., xn)
- And nothing more.
- Yet, this reveals information about all xi
- Thus, typical security assurances not enough
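A minimal sketch of the point above, assuming f is a plain sum and that all parties but one pool what they know. The function names and the input values are hypothetical; the "secure" computation is only a stand-in that returns the output and nothing else.

```python
def secure_sum(values):
    """Stand-in for an ideal secure multiparty computation of f = sum.

    Only the output is revealed. The choice f = sum is an assumption
    made for this illustration.
    """
    return sum(values)


def infer_remaining_input(output, known_inputs):
    """What colluding parties learn about the one input they do not hold."""
    return output - sum(known_inputs)


inputs = [52, 61, 47, 58]      # hypothetical private values x1..x4
result = secure_sum(inputs)    # the only value the protocol reveals

# Parties 1..3 pool their own inputs and recover x4 exactly.
recovered_x4 = infer_remaining_input(result, inputs[:3])
print(result, recovered_x4)
```

Even without collusion, the output narrows the range of every xi; collusion is just the extreme case.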
4What is privacy
- Control over information
- Extent of information revelation
- Tensions between
- Access to aggregate information for community
- Vs.
- Individual control
- Reputation vs. prejudice
5Individual control requires more than binary
security of personal information
- Information is often given up for something in return
- Safeway card
- Monthly charge to be kept out of phone books
- Information for community statistics
- Health statistics
- Collaborative filtering/personalization in
virtual communities
6A model: introduce uncertainty; maximum uncertainty (i.e. secrecy) corresponds to crypto protocols
- Alice and Bob determine
- a binary data point from Alice's personal information, x
- a probability of truth, p
- a return, y
- Alice reveals a variable z = x with probability p
- Bob provides, in return, y
- z exists in the ether as Alice's value x with probability p
- This is not mutually exclusive with cryptographic protection (p = 0.5 is cryptographic)
- Used in public health community for twenty-odd years
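The protocol above can be sketched as a randomized-response survey. This is an illustration under stated assumptions, not the author's implementation: it assumes that when Alice does not tell the truth she reports the flipped bit, and the population, the seed, and the function names are all hypothetical.

```python
import random


def randomized_response(x, p, rng):
    """Return z = x with probability p, and the flipped bit otherwise."""
    return x if rng.random() < p else 1 - x


def estimate_proportion(responses, p):
    """Estimate the fraction of true 1s (pi) from the noisy answers.

    E[z] = p*pi + (1 - p)*(1 - pi), so pi = (mean(z) - (1 - p)) / (2p - 1).
    Undefined at p = 0.5, where z carries no information about x.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 reveals nothing; pi cannot be estimated")
    mean_z = sum(responses) / len(responses)
    return (mean_z - (1 - p)) / (2 * p - 1)


rng = random.Random(0)                    # fixed seed for repeatability
true_bits = [1] * 3000 + [0] * 7000       # hypothetical population, pi = 0.3
p = 0.8                                   # probability of truth
answers = [randomized_response(x, p, rng) for x in true_bits]
pi_hat = estimate_proportion(answers, p)
print(round(pi_hat, 3))
```

Bob recovers a usable community statistic while each individual answer stays deniable; pushing p toward 0.5 trades accuracy of the statistic for individual protection.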
7Outcome
- Protocol is a mathematical game between Alice and Bob
- Optimal situation not when no information is revealed, but when Alice gets maximum benefit for her information
- Think about this: should women in Africa test for HIV when they will certainly not obtain any treatment for it?
8An analogy
- The protocol is a communication channel
- The sender is Alice, the receiver (malicious?) Bob
- The probability of error is the probability of a lie
9Security properties of randomization
- Repeated queries
- Error → 0 as n → ∞
- And n → ∞ as Error → 0
- Cost to attacker increases without bound if error not bounded above zero
- This is a repetition code over channel
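A small sketch of the repeated-query attack, assuming Bob asks the same binary question n times, each answer is independently truthful with probability p, and Bob decides by majority vote. The function name and the value p = 0.7 are illustrative choices.

```python
from math import comb


def majority_error(p, n):
    """Probability that a majority vote over n independent answers is wrong.

    Each answer is truthful with probability p; n is assumed odd so that
    there are no ties.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of queries to avoid ties")
    q = 1 - p  # probability of a lie on any single query
    # The vote is wrong when more than half of the answers are lies.
    return sum(comb(n, k) * q**k * p**(n - k) for k in range(n // 2 + 1, n + 1))


for n in (1, 5, 25, 101):
    print(n, majority_error(0.7, n))
```

For any p above 0.5 the error shrinks as n grows, but it only reaches zero in the limit, which is the unbounded cost the slide refers to.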
10Other attacks
- Query 1: Graying?
- Query 2: Balding?
- Query 3: Weight?
- Query 4: Sports?
- Really asking about age and gender
- How does one characterize all such attacks?
- What can one say about security wrt such attacks?
11An analogy
- The protocol is a communication channel
- The sender is Alice, the receiver (malicious?) Bob
- The probability of error is the probability of a lie
- The attributes that Bob wants to determine form the message
12A simple attack
- Query 1: Female?
- Query 2: Over 40?
- Query 3: Losing Calcium?
- Query 3 checks answers to Query 1 and 2
- Is a parity-check bit
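To make the parity-check idea concrete, here is a sketch that treats the third answer as the exclusive-or of the first two. That is an idealizing assumption borrowed from a textbook single parity-check code; the real-world correlation in the slide is looser. Function names and values are hypothetical.

```python
def encode(x1, x2):
    """Append a parity bit: the codeword is (x1, x2, x1 XOR x2)."""
    return (x1, x2, x1 ^ x2)


def is_consistent(z):
    """True when the three received answers satisfy the parity check."""
    z1, z2, z3 = z
    return (z1 ^ z2 ^ z3) == 0


truth = encode(1, 0)                            # hypothetical true answers
one_lie = (truth[0], 1 - truth[1], truth[2])    # Alice lies on query 2 only

print(is_consistent(truth), is_consistent(one_lie))
```

A single parity bit flags an odd number of lies but cannot say which answer was false, and two lies pass unnoticed; stronger codes need more related queries.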
13An analogy
- All attacks are communication over channel
- Good attacks are codes
- What Bob queries is a codeword bit
- What he receives is the transmitted codeword that
he decodes
14Shannon's theorems apply
- In fact, assuming
- any functions of Alice's data points as queries (adaptive, related queries)
- and error probability → 0 as n → ∞
- The number of queries required per bit of entropy
- is asymptotically tightly bounded below by the inverse of the channel capacity
- Above this bound, error tends exponentially to 0
- Below it, error increases exponentially with n
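The bound can be made numeric under one extra assumption: that the protocol behaves as a binary symmetric channel in which each answer is truthful with probability p. The function names below are illustrative, and the sketch only evaluates the standard capacity formula 1 - H(p).

```python
from math import log2


def binary_entropy(p):
    """H(p) in bits, with the convention 0*log2(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def channel_capacity(p):
    """Capacity in bits per query of a binary symmetric channel
    whose answers are truthful with probability p."""
    return 1.0 - binary_entropy(p)


def min_queries_per_bit(p):
    """Asymptotic lower bound on queries per bit of entropy: 1 / capacity."""
    c = channel_capacity(p)
    return float("inf") if c == 0.0 else 1.0 / c


for p in (0.5, 0.6, 0.8, 0.95):
    print(p, channel_capacity(p), min_queries_per_bit(p))
```

At p = 0.5 the capacity is zero and no number of queries suffices, which matches the earlier remark that p = 0.5 is the cryptographic case.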
15Questions
- How does one determine the entropy of a particular data set, or a general data set?
- What kinds of attacks are computationally feasible?
- This was a very powerful attacker. What are reasonable limits on the attacker's abilities?
- Result in itself, independent of model.
- Partly published at Int. Symp. Info. Theory, 2003
- Journal paper in review, at website
16Value-free model
- Human rights aspects covered through crypto protocols
- Necessary health information and community information can be gathered
- Consumer behaviour treated through this game
- Criticism: very adversarial model
17Another application: anonymous delivery (Crowds, Reiter and Rubin / Lucent and AT&T)
- At node i+1, node i is more likely than any other to be the sender
- Receiver: node i+1
- Message: sending node
- Received symbol: node i
- Channel characteristic: probability that true sender is node i, probability that other nodes are senders
- Traffic analysis/data mining: correlations among senders (communication across channel, less efficient than some error-correcting code)
- [Figure: network of nodes A, B, C, D, E; N nodes, pf = probability of forwarding]
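The claim that the observed predecessor is the likeliest sender can be checked with a small simulation. This is a simplified model of Crowds-style forwarding, stated as assumptions: the sender always forwards to a uniformly chosen node, each later node forwards again with probability pf and otherwise delivers the message, and a single observing node records who handed it the message. All names and parameter values are hypothetical.

```python
import random


def observed_predecessor(n, pf, sender, observer, rng):
    """Run one path; return the node the observer sees as its predecessor,
    or None if the path never reaches the observer."""
    current = sender
    nxt = rng.randrange(n)          # the first hop is always forwarded
    while True:
        if nxt == observer:
            return current
        current = nxt
        if rng.random() >= pf:      # deliver to the destination instead
            return None
        nxt = rng.randrange(n)


def sender_hit_rate(n, pf, trials, seed=0):
    """Fraction of observed messages whose predecessor is the true sender."""
    rng = random.Random(seed)
    observer = 0
    hits = seen = 0
    for _ in range(trials):
        sender = rng.randrange(1, n)    # any node except the observer
        pred = observed_predecessor(n, pf, sender, observer, rng)
        if pred is not None:
            seen += 1
            if pred == sender:
                hits += 1
    return hits / seen if seen else float("nan")


n, pf = 10, 0.75
rate = sender_hit_rate(n, pf, trials=50_000)
print(rate, 1 / (n - 1))    # hit rate vs. a uniform guess among the others
```

Comparing the printed hit rate with the uniform-guess baseline shows how much the received symbol (node i) tells the receiver about the message (the sending node).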
18An example of model use: to measure the value of information (with Yu-An Sun and Sumit Joshi)
- Auction bids reveal much about an individual's profile
- Consider the Vickrey sealed second highest bid auction
- Optimal strategy: to bid one's valuation
- Bids (and hence valuations) can be protected with secure multiparty computation
- But, bids allow determination of market demand (efficient markets)
- Need for an aggregate value, not well-defined at the moment of the auction
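For reference, a minimal sketch of the ordinary Vickrey auction that the variably private version starts from. The bidder names, the bid values, and the tie-breaking rule are assumptions made for the example.

```python
def vickrey_outcome(bids):
    """Return (winner, price) for a sealed-bid second-price auction.

    `bids` maps a bidder name to a bid; at least two bidders are assumed,
    and ties are broken by the order in which bidders appear.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]    # the winner pays the second-highest bid
    return winner, price


bids = {"alice": 120, "bob": 95, "carol": 110}    # hypothetical valuations
print(vickrey_outcome(bids))
```

Because truthful bidding is optimal here, whoever sees the sealed bids sees every bidder's valuation exactly, which is the revelation the slides set out to make adjustable.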
19Variably Private Vickrey Bidding Round: introduce uncertainty
- The seller announces a minimum sale price and a maximum randomization setting.
- Each bidder submits a sealed interval containing her bid. The size of the interval is her choice.
- In the running with high end, committed to low
20Variably Private Vickrey Revealing Round
- Bidders not in the running will reveal no more information on their valuations.
- Largest of the others will reveal which half of their interval contains valuation
21Sale Price
Seller gets
Buyer pays
Divided among all bidders proportional to the
interval width
22Properties?
- Provides various demand statistics
- In general, accuracy of future bid estimation lower for more uncertainty
- Allows for bidder to vary uncertainty, and pay for it
- Allows seller to obtain more than regular Vickrey, depending on how much information is valued
- Bidder with highest valuation still wins auction as long as she can tolerate revealing her valuation to the extent required.
23Summary
- A model that we hope will
- Provide choices not currently typically available to users
- Extend the security framework to include problems like those in statistical databases
- Provide a means of measuring uncertainty in situations where there is some uncertainty, not none or complete
- Include other leakage from security-related protocols such as anonymous delivery and ciphers
- Be useful for measuring the economic value of information