Foundations of Privacy Lecture 10 presentation

1
Foundations of Privacy: Lecture 10

Lecturer: Moni Naor

2
Recap of lecture two weeks ago

Continually changing data
Counters
How to combine expert advice
Multi-counter and the list update problem
Pan Privacy

3
What if the data is dynamic?

Want to handle situations where the data keeps
changing
Not all data is available at the time of
sanitization

[Figure: Curator/Sanitizer]
4
Google Flu Trends
We've found that certain search terms are good
indicators of flu activity. Google Flu Trends
uses aggregated Google search data to estimate
current flu activity around the world in near
real-time.
5
Example of Utility: Google Flu Trends
6
What if the data is dynamic?

Want to handle situations where the data keeps
changing
Not all data is available at the time of
sanitization
Issues:
When does the algorithm make an output?
What does the adversary get to examine?
How do we define an individual whom we should protect? (D + Me)
Efficiency measures of the sanitizer

7
Data Streams
Data is a stream of items.
Sanitizer sees each item and updates internal state.
Produces output either on-the-fly or at the end.
[Figure: Data Stream -> Sanitizer -> output]
8
Three new issues/concepts

Continual Observation
The adversary gets to examine the output of the
sanitizer all the time
Pan Privacy
The adversary gets to examine the internal state
of the sanitizer. Once? Several times? All the
time?
User vs. Event Level Protection
Are the items singletons or are they related?

9
Randomized Response

Randomized Response Technique [Warner 1965]
Method for polling stigmatizing questions
Idea: lie with known probability.
Specific answers are deniable
Aggregate results are still valid
The data is never stored in the clear

"Trust no-one"
Popular in DB literature [Mishra and Sandler].
[Figure: each user's bit (1, 0, 1) passes through noise before it is reported]
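
A minimal sketch of randomized response, assuming each participant reports the true bit with a known probability p_truth and the flipped bit otherwise. The function names and the value p_truth = 0.75 are illustrative choices, not taken from the slides.

```python
import random

def randomized_response(bit, p_truth, rng):
    """Report the true bit with probability p_truth, otherwise its flip."""
    return bit if rng.random() < p_truth else 1 - bit

def estimate_fraction_of_ones(reports, p_truth):
    """Debias the observed mean using the known lying probability."""
    observed = sum(reports) / len(reports)
    # E[observed] = p_truth * f + (1 - p_truth) * (1 - f); solve for f.
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)

rng = random.Random(0)
true_bits = [1] * 3000 + [0] * 7000          # true fraction of 1s is 0.3
reports = [randomized_response(b, 0.75, rng) for b in true_bits]
estimate = estimate_fraction_of_ones(reports, 0.75)
```

Any single report is deniable, yet the aggregate estimate concentrates around the true fraction as the number of participants grows.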

10
The Dynamic Privacy Zoo
Petting
User-Level Continual Observation Pan Private
Differentially Private
Continual Observation
Pan Private
Randomized Response
User level Private
11
Continual Output Observation
Data is a stream of items.
Sanitizer sees each item, updates internal state.
Produces an output observable to the adversary.
[Figure: Sanitizer -> Output]
12
Continual Observation

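Alg: algorithm working on a stream of data
Mapping prefixes of data streams to outputs
Step $i$ output: $\sigma_i$
Alg is $\varepsilon$-differentially private against continual
observation if for all
adjacent data streams $S$ and $S'$
for all prefixes $t$ and outputs $\sigma_1 \sigma_2 \ldots \sigma_t$

Adjacent data streams: can get from one to the
other by changing one element
$S = acgtbxcde$, $S' = acgtbycde$
$e^{-\varepsilon} \le \frac{\Pr[\mathrm{Alg}(S) = \sigma_1 \sigma_2 \ldots \sigma_t]}{\Pr[\mathrm{Alg}(S') = \sigma_1 \sigma_2 \ldots \sigma_t]} \le e^{\varepsilon} \approx 1 + \varepsilon$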
13
The Counter Problem
0/1 input stream: 011001000100000011000000100101
Goal: a publicly observable counter, approximating the total number of 1s so far
Continual output: each time period, output total number of 1s
Want to hide individual increments while providing reasonable accuracy
14
Counters w. Continual Output Observation
Data is a stream of 0/1.
Sanitizer sees each $x_i$, updates internal state.
Produces a value observable to the adversary.
[Figure: 0/1 input stream -> Sanitizer -> Output (counter values 1, 1, 1, 2)]
15
Counters w. Continual Output Observation
Continual output: each time period, output total number of 1s
Initial idea: at each time period, on input $x_i \in \{0, 1\}$
Update counter by input $x_i$
Add independent Laplace noise with magnitude $1/\varepsilon$
Privacy: since each increment is protected by Laplace noise, differentially private whether $x_i$ is 0 or 1
Accuracy: noise cancels out, error $\tilde{O}(\sqrt{T})$
For sparse streams this error is too high.
$T$: total number of time periods
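
The initial idea can be sketched as follows. This is an illustration under assumptions: the helper `laplace` (Laplace noise as a difference of two exponentials) and the function name `naive_counter` are ours, and passing eps = infinity is used only to switch noise off for checking.

```python
import random

def laplace(rng, scale):
    """Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def naive_counter(stream, eps, rng):
    """Each step adds x_i plus fresh Laplace(1/eps) noise to the published counter."""
    outputs, counter = [], 0.0
    for x in stream:
        counter += x + laplace(rng, 1 / eps)
        outputs.append(counter)
    return outputs

noisy = naive_counter([0, 1, 1, 0, 0, 1], eps=1.0, rng=random.Random(0))
```

Because T independent noise terms accumulate in the counter, the error grows like the square root of T even when the stream contains very few 1s.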
16
Why So Inaccurate?

Operate essentially as in randomized response
No utilization of the state
Problem: we do the same operations when the
stream is sparse as when it is dense
Want to act differently when the stream is dense
The times where the counter is updated are
potential leakage

17
Delayed Updates
Main idea: update output value only when there is a large gap between actual count and output
Have a good way of outputting the value of the counter once: the actual counter + noise
Maintain:
Actual count $A_t$ (+ noise)
Current output $\mathrm{out}_t$ (+ noise)
$D$: update threshold
18
Delayed Output Counter

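$\mathrm{Out}_t$: current output
$A_t$: count since last update
$D_t$: noisy threshold
If $A_t - D_t > \text{fresh noise}$ then
$\mathrm{Out}_{t+1} \leftarrow \mathrm{Out}_t + A_t + \text{fresh noise}$
$A_{t+1} \leftarrow 0$
$D_{t+1} \leftarrow D + \text{fresh noise}$
Noise: independent Laplace noise with magnitude $1/\varepsilon$
Accuracy:
For threshold $D$: w.h.p. update about $N/D$ times
Total error: $(N/D)^{1/2} \cdot \text{noise} + D + \text{noise} + \text{noise}$
Set $D = N^{1/3}$ $\Rightarrow$ accuracy $\approx N^{1/3}$

[Figure label: delay]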
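
A sketch of the delayed output counter, assuming the test is "count since last update minus noisy threshold exceeds fresh noise". The names `delayed_counter` and `laplace` are illustrative, and eps = infinity is used only to obtain a noise-free run that can be checked by hand.

```python
import random

def laplace(rng, scale):
    """Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def delayed_counter(stream, eps, D, rng):
    scale = 1 / eps
    out, pending = 0.0, 0                    # Out_t and A_t
    threshold = D + laplace(rng, scale)      # D_t, a noisy copy of D
    outputs = []
    for x in stream:
        pending += x
        if pending - threshold > laplace(rng, scale):
            out += pending + laplace(rng, scale)   # noisy update value
            pending = 0
            threshold = D + laplace(rng, scale)    # redraw the threshold
        outputs.append(out)
    return outputs

exact = delayed_counter([1] * 10, float("inf"), 3, random.Random(0))
noisy = delayed_counter([1] * 10, 1.0, 3, random.Random(0))
```

In the noise-free run the output jumps only when four increments have accumulated, which is the delay the slide refers to.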
19
Privacy of Delayed Output
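$\mathrm{Out}_{t+1} \leftarrow \mathrm{Out}_t + A_t + \text{fresh noise}$
$A_t - D_t > \text{fresh noise}$, $D_{t+1} \leftarrow D + \text{fresh noise}$

Protect: update time and update value
For any two adjacent sequences
101101110001
101101010001
Can pair up noise vectors
$\eta_1 \eta_2 \ldots \eta_{k-1}\, \eta_k\, \eta_{k+1} \ldots$
$\eta_1 \eta_2 \ldots \eta_{k-1}\, \eta'_k\, \eta_{k+1} \ldots$
Identical in all locations except one:
$\eta'_k = \eta_k + 1$

where $k$ is the location of the first update after the difference occurred
[Figure labels: thresholds $D_t$ and $D'_t$]
Prob $\approx e^{\varepsilon}$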
20
Dynamic from Static
Accumulator measured when stream is in the time
frame

Run many accumulators in parallel
each accumulator counts number of 1's in a fixed
segment of time plus noise.
Value of the output counter at any point in time
sum of the accumulators of few segments
Accuracy depends on number of segments in
summation and the accuracy of accumulators
Privacy depends on the number of accumulators
that a point influences

Idea: apply conversion of static algorithms into
dynamic ones [Bentley-Saxe 1980]
Only finished segments used
[Figure label: $x_t$]
21
The Segment Construction
Based on the bit representation: each point $t$ is in $\lceil \log t \rceil$ segments
$\sum_{i=1}^{t} x_i$: sum of at most $\log t$ accumulators
By setting $\varepsilon' \approx \varepsilon/\log T$ can get the desired privacy
Accuracy: with all but negligible in $T$ probability the error at every step $t$ is at most $O((\log^{1.5} T)/\varepsilon)$
[Figure label: canceling]
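
The segment construction can be sketched as follows, assuming dyadic segments aligned to the binary representation of t and Laplace noise of scale (number of levels)/eps on every accumulator. The names `segment_counter` and `laplace` are illustrative. For simplicity the sketch recomputes segment sums from stored prefix sums, which a streaming implementation would avoid.

```python
import random

def laplace(rng, scale):
    """Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def segment_counter(stream, eps, rng):
    T = len(stream)
    levels = max(1, T.bit_length())          # each item lies in <= levels segments
    scale = levels / eps                     # per-accumulator noise
    prefix = [0]
    for x in stream:
        prefix.append(prefix[-1] + x)
    noisy = {}                               # (level, index) -> noisy segment sum
    outputs = []
    for t in range(1, T + 1):
        total, start, remaining = 0.0, 0, t
        # Split [0, t) into finished dyadic segments, largest first.
        for level in reversed(range(levels)):
            size = 1 << level
            if remaining >= size:
                key = (level, start // size)
                if key not in noisy:         # noise is drawn once per segment
                    true_sum = prefix[start + size] - prefix[start]
                    noisy[key] = true_sum + laplace(rng, scale)
                total += noisy[key]
                start += size
                remaining -= size
        outputs.append(total)
    return outputs

exact = segment_counter([1, 0, 1, 1, 0, 1, 0, 0], float("inf"), random.Random(0))
```

Each output sums at most `levels` accumulators, so the noise in any single output stays polylogarithmic in T rather than growing with the number of steps.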
22
Synthetic Counter

Can make the counter synthetic
Monotone
Each round counter goes up by at most 1
Apply to any monotone function

23
Lower Bound on Accuracy

Theorem: additive inaccuracy of $\log T$ is
essential for $\varepsilon$-differential privacy, even for
$\varepsilon = 1$
Consider the stream $0^T$ compared to a collection of
$T/b$ streams of the form $S_j = 0^{jb} 1^b 0^{T-(j+1)b}$
$S_j = 000000001111000000000000$

[Figure label: the block of 1s has length $b$]
Call an output sequence correct if it is a $b/3$
approximation for all points in time
24
Lower Bound on Accuracy
$S_j = 000000001111000000000000$

Important properties
For any output: ratio of probabilities under
stream $S_j$ and $0^T$ should be at least $e^{-\varepsilon b}$
Hybrid argument from differential privacy
Any output sequence is correct for at most one $S_j$ or
$0^T$
Say probability of a good output sequence is at
least $\gamma$

Good output sequence: $b/3$ approximation for all points in time
Good for $S_j$:
Prob under $0^T$ at least $\gamma e^{-\varepsilon b}$
$b = \tfrac{1}{2} \log T$, $\gamma = 1/2$
$(T/b) \cdot \gamma e^{-\varepsilon b} > 1 - \gamma$
contradiction
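
The arithmetic behind the contradiction can be spelled out as follows. This is a worked sketch under assumptions: $\gamma$ denotes the probability of a good output sequence (the symbol is our choice), $G_j$ and $G_0$ are hypothetical names for the sets of output sequences that are good for $S_j$ and for $0^T$, and the logarithm is taken to be natural. Since the sets are disjoint, their probabilities under $0^T$ sum to at most 1.

```latex
\begin{align*}
\sum_{j} \Pr[A(0^T) \in G_j] + \Pr[A(0^T) \in G_0] \le 1
  &\;\Longrightarrow\; \frac{T}{b}\,\gamma e^{-\varepsilon b} \le 1 - \gamma, \\
\varepsilon = 1,\; b = \tfrac12 \ln T,\; \gamma = \tfrac12:\quad
\frac{T}{b}\,\gamma e^{-\varepsilon b}
  &= \frac{2T}{\ln T}\cdot\frac12\cdot T^{-1/2}
   = \frac{\sqrt{T}}{\ln T} \;\ge\; \frac{e}{2} \;>\; \frac12 = 1 - \gamma .
\end{align*}
```

The bound $\sqrt{T}/\ln T \ge e/2$ holds for every $T > 1$ (the minimum is at $T = e^2$), so the two lines cannot both hold.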
25
Hybrid Proof

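Want to show that for any event $B$:
$\frac{\Pr[A(0^T) \in B]}{\Pr[A(S_j) \in B]} \ge e^{-\varepsilon b}$

Let $S_j^i = 0^{jb} 1^i 0^{T-jb-i}$, so $S_j^0 = 0^T$ and $S_j^b = S_j$
For adjacent hybrids: $\frac{\Pr[A(S_j^i) \in B]}{\Pr[A(S_j^{i+1}) \in B]} \ge e^{-\varepsilon}$
$\frac{\Pr[A(S_j^0) \in B]}{\Pr[A(S_j^b) \in B]} = \frac{\Pr[A(S_j^0) \in B]}{\Pr[A(S_j^1) \in B]} \cdots \frac{\Pr[A(S_j^{b-1}) \in B]}{\Pr[A(S_j^b) \in B]} \ge e^{-\varepsilon b}$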
26
What shall we do with the counter?

Privacy-preserving counting is a basic building
block in more complex environments
General characterizations and transformations:
Event-level pan-private continual-output algorithm
for any low sensitivity function
Following expert advice privately:
Track experts over time, choose who to follow
Need to track how many times each expert was correct

27
Following Expert Advice
[Hannan 1957], [Littlestone, Warmuth 1989]

n experts, in every time period each gives 0/1
advice
pick which expert to follow
then learn correct answer, say in 0/1
Goal: over time, competitive with best expert in
hindsight

Expert 1: 1 1 1 0 1
Expert 2: 0 1 1 0 0
Expert 3: 0 0 1 1 1
Correct:  0 1 1 0 0
28
Following Expert Advice
n experts, in every time period each gives 0/1 advice
pick which expert to follow
then learn correct answer, say in 0/1
Goal: over time, competitive with best expert in hindsight
Goal: # mistakes of chosen experts $\approx$ # mistakes made by best expert in hindsight
Want $1 + o(1)$ approximation

Expert 1: 1 1 1 0 1
Expert 2: 0 1 1 0 0
Expert 3: 0 0 1 1 1
Correct:  0 1 1 0 0
29
Following Expert Advice, Privately

n experts, in every time period each gives 0/1
advice
pick which expert to follow
then learn correct answer, say in 0/1
Goal: over time, competitive with best expert in
hindsight
New concern:
protect privacy of experts' opinions and outcomes
User-level privacy: lower bound, no non-trivial
algorithm
Event-level privacy: counting gives
$(1 + o(1))$-competitive

[Callout on user-level privacy: was the expert consulted at all?]
30
Algorithm for Following Expert Advice

Follow perturbed leader [Kalai, Vempala]:
For each expert keep perturbed # of mistakes
Follow expert with lowest perturbed count
Idea: use counter, count mistakes in a privacy-preserving way
Problem: not every perturbation works;
need counter with well-behaved noise distribution
Theorem [Follow the Privacy-Perturbed Leader]:
For n experts, over T time periods, # mistakes is
within $\approx \mathrm{poly}(\log n, \log T, 1/\varepsilon)$ of best expert
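
A simplified sketch of following the perturbed leader. It assumes fresh Laplace noise is added to each expert's mistake count at every step; the slides instead obtain the perturbation from a private counter, which this sketch does not implement. The function names are illustrative, and the advice table is the one from the example slide.

```python
import random

def laplace(rng, scale):
    """Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def follow_perturbed_leader(advice, correct, noise_scale, rng):
    """advice[i][t] is expert i's 0/1 advice at time t; returns (our mistakes, per-expert mistakes)."""
    n = len(advice)
    mistakes = [0] * n
    total = 0
    for t, truth in enumerate(correct):
        perturbed = [mistakes[i] + laplace(rng, noise_scale) for i in range(n)]
        leader = min(range(n), key=lambda i: perturbed[i])   # lowest perturbed count
        if advice[leader][t] != truth:
            total += 1
        for i in range(n):                                   # learn the answer, update counts
            if advice[i][t] != truth:
                mistakes[i] += 1
    return total, mistakes

advice = [[1, 1, 1, 0, 1], [0, 1, 1, 0, 0], [0, 0, 1, 1, 1]]
correct = [0, 1, 1, 0, 0]
ours, per_expert = follow_perturbed_leader(advice, correct, 0, random.Random(0))
```

With the noise switched off the algorithm errs only in the first round, before it has learned that Expert 2 is the best expert in hindsight.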

31
List Update Problem

There are n distinct elements $A = \{a_1, a_2, \ldots, a_n\}$
Have to maintain them in a list (some
permutation)
Given a request sequence $r_1, r_2, \ldots$
Each $r_i \in A$
For request $r_i$: cost is how far $r_i$ is in the
current permutation
Can rearrange list between requests
Want to minimize total cost for request sequence
Sequence not known in advance

[Callout on privacy: for each request $r_i$, cannot tell whether $r_i$ is in
the sequence or not]
Our goal: do it while providing privacy for the
request sequence, assuming list order is public
32
List Update Problem

In general: cost can be very high
First problem to be analyzed in the competitive
framework by Sleator and Tarjan (1985)
Compared to the best algorithm that knows the
sequence in advance
Best algorithms:
2-competitive deterministic
Better randomized: 1.5
Assume free rearrangements between requests
Bad news: cannot be better than $(1/\varepsilon)$-competitive
if we want to keep privacy

Cannot act until $1/\varepsilon$ requests to an element appear
33
Lower bound for Deterministic Algorithms

Bad schedule: always ask for the last element in
the list
Cost of online: $n \cdot t$
Cost of best fixed list: sort the list according
to popularity
Average cost: $\tfrac{1}{2} n$
Total cost: $\tfrac{1}{2} n t$

34
List Update Problem Static Optimality

A more modest performance goal: compete with the
best algorithm that fixes the permutation in
advance
[Blum, Chawla, Kalai]: can be $1 + o(1)$ competitive w.r.t.
best static algorithm (probabilistic)
BCK algorithm based on number of times each
element has been requested.
Algorithm:
Start with random weights $r_i$ in range $[1, c]$
At all times $w_i = r_i + c_i$
$c_i$ is # of times element $a_i$ was requested.
At any point in time: arrange elements according
to weights

35
Privacy with Static Optimality

Algorithm:
Start with random weights $r_i$ in range $[1, c]$
At any point in time $w_i = r_i + c_i$
$c_i$ is # of times element $a_i$ was requested.
Arrange elements according to weights
Privacy: from privacy of counters
list depends on counters plus randomness
Accuracy: can show that BCK proof can be modified
to handle approximate counts as well
What about efficiency?

Run with private counter
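
A sketch of the weight-based list described above, under assumptions: weights are taken to be w_i = r_i + c_i with r_i uniform in [1, c_max], the list is ordered by decreasing weight, and the cost of a request is its 1-based position. The counts here are exact; the private variant would read them from a privacy-preserving counter instead. All names are illustrative.

```python
import random

def weighted_list(elements, requests, c_max, rng):
    r = {a: rng.uniform(1, c_max) for a in elements}   # random initial weights
    counts = {a: 0 for a in elements}                  # c_i: requests seen so far

    def arrange():
        # Heaviest element first.
        return sorted(elements, key=lambda a: -(r[a] + counts[a]))

    order = arrange()
    total_cost = 0
    for req in requests:
        total_cost += order.index(req) + 1             # cost = position in the list
        counts[req] += 1
        order = arrange()                              # free rearrangement
    return order, total_cost

order, cost = weighted_list(["a", "b", "c"], ["b"] * 10, 2, random.Random(0))
```

After a couple of requests the popular element outweighs the random initial weights and moves to the front, so later requests for it cost 1.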
36
The multi-counter problem

How to run n counters for T time steps
In each round few counters are incremented
Identity of incremented counter is kept private
Work per increment logarithmic in n and T
Idea arrange the n counters in a binary tree
with n leaves
Output counters associated with leaves
For each internal node maintain a counter
corresponding to sum of leaves in subtree

37
The multi-counter problem

Idea arrange the n counters in a binary tree
with n leaves
Output counters associated with leaves
For each internal node maintain
Counter corresponding to sum of leaves in subtree
Register with number of increments since last
output update
When a leaf counter is updated
All log n nodes to root are incremented
Internal state of root updated.
If output of parent node updated, internal state
of children updated

(internal, output)
Determines when to update subtree
38
Tree of Counters
(counter, register)
Output counter
39
The multi-counter problem

Work per increment:
log n increments + number of counters that need to
update
Amortized complexity is O(n log n / k)
k: number of times we expect to increment a
counter until output is updated
Privacy: each increment of a leaf counter affects
log n counters
Accuracy: we have introduced some delay:
After $t \ge k \log n$ increments all nodes on the path
have been updated

40
Pan-Privacy
think of the children

In privacy literature: data curator trusted
In reality:
even well-intentioned curator subject to mission
creep, subpoena, security breach
Pro baseball anonymous drug tests
Facebook policies to protect users from
application developers
Google accounts hacked
Goal: curator accumulates statistical
information, but never stores sensitive data
about individuals
Pan-privacy: algorithm private inside and out
internal state is privacy-preserving.

41
Randomized Response Warner 1965

Method for polling stigmatizing questions
Idea: participants lie with known probability.
Specific answers are deniable
Aggregate results are still valid
Data never stored in the clear; popular in DB
literature [MiSa06]

Strong guarantee: no trust in curator
Makes sense when each user's data appears only once, otherwise limited utility
New idea: curator aggregates statistical information, but never stores sensitive data about individuals
[Figure: User Data bits (1, 0, 1) each pass through noise to give the User Response]
42
Aggregation Without Storing Sensitive Data?

Streaming algorithms: small storage
Information stored can still be sensitive
My data: many appearances, arbitrarily
interleaved with those of others
Pan-Private Algorithm
Private inside and out
Even internal state completely hides the
appearance pattern of any individual: presence,
absence, frequency, etc.

User level
43
Pan-Privacy Model
Data is a stream of items, each item belongs to a user
Data of different users interleaved arbitrarily
Curator sees items, updates internal state, output at stream end
Can also consider multiple intrusions
Pan-Privacy: for every possible behavior of a user
in the stream, the joint distribution of the internal
state at any single point in time and the final
output is differentially private
44
Adjacency User Level

Universe $U$ of users whose data is in the stream, $x \in U$
Streams are $x$-adjacent if they have the same projections of users
onto $U \setminus \{x\}$
Example: axbxcxdxxxex and abcdxe are $x$-adjacent
Both project to abcde
Notion of corresponding locations in $x$-adjacent
streams
$U$-adjacent: $\exists\, x \in U$ for which they are
$x$-adjacent
Simply adjacent, if $U$ is understood
Note: streams of different lengths can be adjacent
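
User-level adjacency can be checked mechanically. The sketch below assumes a stream is a sequence of user identifiers; the function names are illustrative.

```python
def project_out(stream, x):
    """Remove every appearance of user x from the stream."""
    return [item for item in stream if item != x]

def x_adjacent(s1, s2, x):
    """Two streams are x-adjacent if they agree once user x is removed."""
    return project_out(s1, x) == project_out(s2, x)

def adjacent(s1, s2, universe):
    """U-adjacent: x-adjacent for at least one user x in the universe."""
    return any(x_adjacent(s1, s2, x) for x in universe)

same = x_adjacent("axbxcxdxxxex", "abcdxe", "x")
```

The two example streams have lengths 12 and 6 yet are adjacent, since user x may appear any number of times.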

45
Example Stream Density or Distinct Elements

Universe $U$ of users, estimate how many distinct
users in $U$ appear in data stream
Application: # distinct users who searched for
"flu"
Ideas that don't work:
Naive: keep list of users that appeared (bad
privacy and space)
Streaming:
Track random sub-sample of users (bad privacy)
Hash each user, track minimal hash (bad privacy)

46
Pan-Private Density Estimator
Inspired by randomized response.
Store for each user $x \in U$ a single bit $b_x$
Initially all $b_x$: 0 w.p. $\tfrac{1}{2}$, 1 w.p. $\tfrac{1}{2}$ (Distribution $D_0$)
When encountering $x$ redraw $b_x$: 0 w.p. $\tfrac{1}{2} - \varepsilon$, 1 w.p. $\tfrac{1}{2} + \varepsilon$ (Distribution $D_1$)
Final output: [(fraction of 1s in table $- \tfrac{1}{2}$)$/\varepsilon$] + noise
Pan-Privacy:
If user never appeared: entry drawn from $D_0$
If user appeared any # of times: entry drawn from $D_1$
$D_0$ and $D_1$ are $4\varepsilon$-differentially private
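
A sketch of the density estimator, assuming users are integers 0..|U|-1. The scale of the final output noise, 1/(eps^2 |U|), is our assumption (the slide only says "+ noise"), and the function names are illustrative.

```python
import random

def laplace(rng, scale):
    """Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def pan_private_density(universe_size, stream, eps, rng):
    # Initial state: every bit drawn from D0 (uniform).
    bits = [1 if rng.random() < 0.5 else 0 for _ in range(universe_size)]
    for x in stream:
        # Redraw from D1 on every appearance, so repeats leave no extra trace.
        bits[x] = 1 if rng.random() < 0.5 + eps else 0
    fraction_of_ones = sum(bits) / universe_size
    # Assumed noise scale for the published output.
    noise = laplace(rng, 1 / (eps * eps * universe_size))
    return (fraction_of_ones - 0.5) / eps + noise

rng = random.Random(1)
stream = list(range(10000)) * 2              # half of the users, each appearing twice
estimate = pan_private_density(20000, stream, 0.2, rng)
```

The table never records who appeared: an intruder sees bits that are uniform or slightly biased towards 1, and the two cases are hard to tell apart for any single user.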
47
Pan-Private Density Estimator
Inspired by randomized response.
Store for each user $x \in U$ a single bit $b_x$
Initially all $b_x$: 0 w.p. $\tfrac{1}{2}$, 1 w.p. $\tfrac{1}{2}$
When encountering $x$ redraw $b_x$: 0 w.p. $\tfrac{1}{2} - \varepsilon$, 1 w.p. $\tfrac{1}{2} + \varepsilon$
Final output: [(fraction of 1s in table $- \tfrac{1}{2}$)$/\varepsilon$] + noise
Improved accuracy and storage:
Multiplicative accuracy using hashing
Small storage using sub-sampling
48
Pan-Private Density Estimator
Theorem [density estimation streaming algorithm]:
$\varepsilon$ pan-privacy, multiplicative error $\alpha$,
space is $\mathrm{poly}(1/\alpha, 1/\varepsilon)$
49
Density Estimation with Multiple Intrusions

If intrusions are announced, can handle multiple
intrusions; accuracy degrades exponentially in
# of intrusions
Can we do better?
Theorem [multiple intrusion lower bounds]:
If there are either
Two unannounced intrusions (for finite-state
algorithms)
Non-stop intrusions (for any algorithm)
then additive accuracy cannot be better than $\Omega(n)$

50
What other statistics have pan-private algorithms?
Density: # of users appeared at least once
Incidence counts: # of users appearing k times exactly
Cropped means: mean, over users, of min(t, # appearances)
Heavy-hitters: users appearing at least k times
51
Counters and Pan Privacy

Is the counter algorithm pan private?
No: the internal counts accurately reflect what
happened since last update
Easy to correct: store them together with noise
Add $(1/\varepsilon)$-Laplacian noise to all accumulators
Both at storage and when added
At most doubles the noise

[Figure: accumulator = count + noise]
52
Continual Intrusion

Consider multiple intrusions
Most desirable: resistance to continual intrusion
Adversary can continually examine the internal
state of the algorithm
Implies also continual observation
Something can be done: randomized response
But:
Theorem: any counter that is $\varepsilon$-pan-private under
continual observation and with $m$ intrusions must
have additive error $\Omega(\sqrt{m})$ with constant
probability.

53
Proof of lower bound

Two distributions:
$I_0$: all 0 stream
$I_1$: $x_i = 0$ with probability $1 - 1/(k\sqrt{n})$
and $x_i = 1$ with probability $1/(k\sqrt{n})$.
Let $D_b$ be the distribution on states when running
$I_b$
Claim: statistical distance between $D_0$ and $D_1$ is
small
Key point: can represent transition probabilities
as
$Q^0_s(x) = \tfrac{1}{2} C(x) + \tfrac{1}{2} C'(x)$
$Q^1_s(x) = \left(\tfrac{1}{2} - \tfrac{1}{k\sqrt{n}}\right) C(x) + \left(\tfrac{1}{2} + \tfrac{1}{k\sqrt{n}}\right) C'(x)$

Randomized Response is the best we can do
54
Pan Privacy under Continual Observation
Definition: for all $U$-adjacent streams $S$ and $S'$, the joint
distribution on internal state at any single
location and sequence of all outputs is
differentially private.
[Figure: Output and Internal state]
55
A General Transformation

Transform any static algorithm $A$ to continual
output, maintain:
Pan-privacy
Storage size
Hit in accuracy low for large classes of
algorithms
Main idea: delayed updates
Update output value only rarely, when there is a large gap between $A$'s current
estimate and output

56
Theorem General Transformation
Transform any algorithm $A$ for monotone function $f$
with error $\alpha$, sensitivity $\mathrm{sens}_A$ (max output difference on adjacent streams), maximum value $N$
New algorithm has $\varepsilon$-privacy under continual
observation, maintains $A$'s pan-privacy and storage
Error is $\tilde{O}(\alpha + \sqrt{N} \cdot \mathrm{sens}_A/\varepsilon)$
57
General Transformation Main Idea
[Figure: input a0bcbbde -> A -> out]

Assume $A$ is a pan-private estimator for monotone
$f \le N$
If $|A_t - \mathrm{out}_{t-1}| > D$ then $\mathrm{out}_t \leftarrow A_t$
For threshold $D$: w.h.p. update about $N/D$ times

58
General Transformation Main Idea
[Figure: input a0bcbbde -> A -> out]

Assume $A$ is a pan-private estimator for monotone
$f \le N$
$A$'s output may not be monotonic
If $|A_t - \mathrm{out}_{t-1}| > D$ then $\mathrm{out}_t \leftarrow A_t$
What about privacy? Update times, update values
For threshold $D$: w.h.p. update about $N/D$ times
Quit if # updates exceeds Bound $\approx N/D$

59
General Transformation Privacy
If $|A_t - \mathrm{out}_{t-1}| > D$ then $\mathrm{out}_t \leftarrow A_t$
What about privacy? Update times, update values
Add noise:
Noisy threshold test $\Rightarrow$ privacy-preserving update times
Noisy update $\Rightarrow$ privacy-preserving update values
60
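General Transformation Privacy
Error: $\tilde{O}[D + (\mathrm{sens}_A \cdot N)/(D \varepsilon)]$

If $|A_t - \mathrm{out}_{t-1}| + \text{noise} > D$
then $\mathrm{out}_t \leftarrow A_t + \text{noise}$
Scale noise(s) to $\mathrm{Bound} \cdot \mathrm{sens}_A/\varepsilon$
Yields $(\varepsilon, \delta)$-diff. privacy: $\Pr[z \mid S] \le e^{\varepsilon} \cdot \Pr[z \mid S'] + \delta$
Proof pairs noise vectors that are "far" from
causing quitting on $S$ with noise vectors for
which $S'$ has the exact same update times
Few noise vectors are bad; paired vectors are $\varepsilon$-private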
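
A sketch of the delayed-update wrapper, assuming the test compares the absolute gap between A's estimate and the last output. Here `estimates` stands in for the successive outputs of the underlying estimator A, and `noise_scale` is whatever the caller chooses (the slide suggests Bound times sens_A divided by epsilon). All names are illustrative.

```python
import random

def laplace(rng, scale):
    """Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def delayed_updates(estimates, D, bound, noise_scale, rng):
    out, updates, outputs = 0.0, 0, []
    for a_t in estimates:
        # Noisy threshold test: hides the exact update time.
        if abs(a_t - out) + laplace(rng, noise_scale) > D:
            if updates == bound:
                break                                  # quit: too many updates
            out = a_t + laplace(rng, noise_scale)      # noisy update value
            updates += 1
        outputs.append(out)
    return outputs

exact = delayed_updates([0, 1, 2, 3, 4, 5, 6], D=2, bound=10, noise_scale=0, rng=random.Random(0))
capped = delayed_updates([0, 1, 2, 3, 4, 5, 6], D=2, bound=1, noise_scale=0, rng=random.Random(0))
```

With noise switched off the output changes only when the estimate has drifted by more than D, and the run stops once the update budget is spent.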

61
Theorem General Transformation

Transform any algorithm $A$ for monotone function $f$
with error $\alpha$, sensitivity $\mathrm{sens}_A$, maximum value $N$
New algorithm:
satisfies $\varepsilon$-privacy with continual observation,
maintains $A$'s pan-privacy and storage
Error is $\tilde{O}(\alpha + \sqrt{N} \cdot \mathrm{sens}_A/\varepsilon)$
Extends from monotone to "stable" functions
Loose characterization of functions that can be
computed privately under continual observation
without pan-privacy

62
What other statistics have pan-private algorithms?

Pan-private streaming algorithms for:
Stream density / number of distinct elements
t-cropped mean: mean, over users, of
min(t, # appearances)
Fraction of users appearing k times exactly
Fraction of heavy-hitters, users appearing at
least k times

63
Incidence Counting

Universe $X$ of users. Given $k$, estimate what
fraction of users in $X$ appear exactly $k$ times in
data stream
Difficulty: can't track an individual's # of
appearances
Idea: keep track of noisy # of appearances
However: can't accurately track whether an
individual appeared 0, k or 100k times.
Different approach: follows "count-min" [CM05]
idea from streaming literature

User level privacy!
64
Incidence Counting a la Count-Min

Use pan-private algorithm that gets input:
hash function $h: Z \to M$ (for small range $M$)
target val
Outputs fraction of users with $h(\text{\# appearances}) = \text{val}$
Given this, estimate k-incidence as fraction of
users with
$h(\text{\# appearances}) = h(k)$
Concern: might we over-estimate? (hash
collisions)
Accuracy: if $h$ has low collision probability, then with
some probability collisions are few and the estimate
is accurate.
Repeat to amplify (output minimal estimate)

65
Putting it together

Hash by choosing small random prime $p$:
$h(z) = z \pmod{p}$
Pan-private modular incidence counter:
Gets $p$ and val, estimates fraction of users with
# appearances $\equiv$ val $\pmod{p}$
Space is $\mathrm{poly}(p)$, but small $p$ suffices
Theorem [k-incidence counting streaming
algorithm]:
$\varepsilon$ pan-privacy, multiplicative error $\alpha$, upper
bound $N$ on number of appearances.
Space is $\mathrm{poly}(1/\alpha, 1/\varepsilon, \log N)$

66
t -Incidence Estimator

Let $R = \{1, 2, \ldots, r\}$ be the smallest range of
integers containing at least $4 \log N/\alpha$ distinct
prime numbers.
Choose at random $L$ distinct primes $p_1, p_2, \ldots, p_L$
Run the modular incidence counter with these $L$ primes.
When a value $x \in M$ appears, update each of the $L$
modular counters
For any desired $t$: for each $i \in [L]$
Let $f_i$ be the estimate of the $i$-th modular incidence counter for $t \pmod{p_i}$
Output the (noisy) minimum of these fractions

67
Pan-Private Modular Incidence Counter

For every user $x$, keep counter $c_x \in \{0, \ldots, p-1\}$
Increase counter (mod $p$) every time user appears
If initially 0: no privacy, but perfect accuracy
If initially random: perfect privacy, but no
accuracy
Initialize using a distribution slightly biased
towards 0:
$\Pr[c_x = i] \propto e^{-\varepsilon i/(p-1)}$
Privacy: a user's # of appearances has only a small
effect on the distribution of $c_x$

[Figure: initial distribution over counter values 0 to p-1]
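
A sketch of the per-user modular counter, assuming the initial value i is drawn with probability proportional to exp(-eps * i / (p - 1)). The class and function names are illustrative.

```python
import math
import random

def initial_distribution(p, eps):
    """Probabilities over {0, ..., p-1}, slightly biased towards 0."""
    weights = [math.exp(-eps * i / (p - 1)) for i in range(p)]
    total = sum(weights)
    return [w / total for w in weights]

class ModularIncidenceCounter:
    def __init__(self, users, p, eps, rng):
        probs = initial_distribution(p, eps)
        self.p = p
        # One noisy counter per user, never initialised to an exact 0.
        self.state = {x: rng.choices(range(p), weights=probs)[0] for x in users}

    def observe(self, x):
        """User x appeared once more: increment its counter mod p."""
        self.state[x] = (self.state[x] + 1) % self.p

probs = initial_distribution(5, 0.5)
counter = ModularIncidenceCounter(["u"], 5, 0.5, random.Random(0))
start = counter.state["u"]
counter.observe("u")
```

The most and least likely initial values differ by a factor of exactly e^eps, which is why shifting a counter by the user's appearances changes its distribution only slightly.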
68
Modular Incidence Counter Accuracy

For $j \in \{0, \ldots, p-1\}$:
$o_j$ is # users with observed "noisy" count $j$
$t_j$ is # users that truly appear $j$ times (mod $p$)
$o_j \approx \sum_{k=0}^{p-1} t_{j-k \pmod{p}} \cdot e^{-\varepsilon k/(p-1)}$
Using the observed $o_j$'s:
Get $p$ (approx.) equations in $p$ variables (the $t_k$'s)
Solve using linear programming
Solution is close to true counts
69
Pan-private Algorithms
Continual Observation
Density: # of users appeared at least once
Incidence counts: # of users appearing k times exactly
Cropped means: mean, over users, of min(t, # appearances)
Heavy-hitters: users appearing at least k times
70
The Dynamic Privacy Zoo
Petting
Continual Pan Privacy
Differentially Private Outputs
Privacy under Continual Observation
Pan Privacy
Sketch vs. Stream
User level Privacy
