Title: Foundations of Privacy Lecture 10
  2Recap of lecture two weeks ago
- Continually changing data
- Counters 
- How to combine expert advice 
- Multi-counter and the list update problem 
- Pan Privacy 
3What if the data is dynamic?
- Want to handle situations where the data keeps 
 changing
- Not all data is available at the time of 
 sanitization
[Figure: the curator/sanitizer]
 4Google Flu Trends
We've found that certain search terms are good 
indicators of flu activity. Google Flu Trends 
uses aggregated Google search data to estimate 
current flu activity around the world in near 
real-time. 
 5Example of Utility: Google Flu Trends
 6What if the data is dynamic?
- Want to handle situations where the data keeps 
 changing
- Not all data is available at the time of 
 sanitization
- Issues:
- When does the algorithm make an output?
- What does the adversary get to examine?
- How do we define the individual whom we should protect? (D + Me)
- Efficiency measures of the sanitizer
7Data Streams
Data is a stream of items. The sanitizer sees each item and updates its internal state. It produces output either on-the-fly or at the end.
[Figure: data stream → sanitizer → output]
 8Three new issues/concepts
- Continual Observation: the adversary gets to examine the output of the sanitizer all the time
- Pan Privacy: the adversary gets to examine the internal state of the sanitizer. Once? Several times? All the time?
- User vs. Event Level Protection: are the items singletons, or are they related?
9Randomized Response
- Randomized Response Technique [Warner 1965]
- Method for polling stigmatizing questions
- Idea: lie with known probability
- Specific answers are deniable
- Aggregate results are still valid
- The data is never stored in the plain: trust no one
Popular in the DB literature [Mishra and Sandler].
[Figure: each user's bit (1, 0, 1) passes through independent noise before it is recorded]
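The "lie with known probability" idea can be sketched in a few lines. This is only an illustration: the function names and the truth probability of 0.75 are assumptions, not taken from the slides.

```python
import random

def randomized_response(bit, p_truth, rng):
    """Report the true bit with probability p_truth, otherwise the flipped bit."""
    return bit if rng.random() < p_truth else 1 - bit

def estimate_fraction(reports, p_truth):
    """Unbiased estimate of the true fraction of 1s from the noisy reports."""
    observed = sum(reports) / len(reports)
    # E[observed] = (1 - p) + f * (2p - 1); solve for the true fraction f.
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)

rng = random.Random(0)
true_bits = [1] * 3000 + [0] * 7000           # hypothetical population, 30% ones
reports = [randomized_response(b, 0.75, rng) for b in true_bits]
estimate = estimate_fraction(reports, 0.75)   # close to 0.3 in aggregate
```

Each individual report is deniable, yet the aggregate can still be recovered because the lying probability is known.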
 10The Dynamic Privacy Zoo
[Figure: the "petting zoo" of privacy notions: Differentially Private, Continual Observation, Pan Private, User-Level Private, Randomized Response, User-Level Continual Observation Pan Private]
 11Continual Output Observation
Data is a stream of items. The sanitizer sees each item and updates its internal state. It produces an output observable to the adversary.
[Figure: sanitizer → output]
 12Continual Observation
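- Alg: algorithm working on a stream of data
- Mapping prefixes of data streams to outputs
- Step $i$: output $\sigma_i$
- Alg is $\varepsilon$-differentially private against continual observation if, for all adjacent data streams $S$ and $S'$, and for all prefixes $t$ and outputs $\sigma_1 \sigma_2 \dots \sigma_t$:
$$e^{-\varepsilon} \le \frac{\Pr[\mathrm{Alg}(S) = \sigma_1 \sigma_2 \dots \sigma_t]}{\Pr[\mathrm{Alg}(S') = \sigma_1 \sigma_2 \dots \sigma_t]} \le e^{\varepsilon} \approx 1 + \varepsilon$$
Adjacent data streams: can get from one to the other by changing one element, e.g. $S$ = acgtbxcde and $S'$ = acgtbycde.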
 13The Counter Problem
0/1 input stream: 011001000100000011000000100101
Goal: a publicly observable counter, approximating the total number of 1s so far.
Continual output: in each time period, output the total number of 1s.
Want to hide individual increments while providing reasonable accuracy.
 14Counters w. Continual Output Observation
Data is a stream of 0/1. The sanitizer sees each $x_i$ and updates its internal state. It produces a value observable to the adversary.
[Figure: 0/1 input stream 100100110001 → sanitizer → outputs 1, 1, 1, 2]
 15Counters w. Continual Output Observation
Continual output: in each time period, output the total number of 1s.
Initial idea: at each time period, on input $x_i \in \{0, 1\}$:
- Update counter by input $x_i$
- Add independent Laplace noise with magnitude $1/\varepsilon$
Privacy: since each increment is protected by Laplace noise, the output is differentially private whether $x_i$ is 0 or 1.
Accuracy: noise cancels out, error $\tilde{O}(\sqrt{T})$, where $T$ is the total number of time periods. For sparse streams this error is too high.
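A minimal sketch of this initial idea, assuming a Laplace sampler built from two exponentials. The helper names and the sparse test stream are illustrative only.

```python
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def naive_noisy_counter(stream, epsilon, rng):
    """Running count where every step adds fresh Laplace(1/epsilon) noise."""
    outputs, counter = [], 0.0
    for x in stream:
        counter += x + laplace(1 / epsilon, rng)
        outputs.append(counter)
    return outputs

rng = random.Random(1)
stream = [1 if i % 100 == 0 else 0 for i in range(10_000)]  # sparse: 100 ones
outputs = naive_noisy_counter(stream, epsilon=1.0, rng=rng)
final_error = abs(outputs[-1] - sum(stream))
```

On this sparse stream the true count is only 100, while the accumulated noise has standard deviation on the order of $\sqrt{T}$, so the error can easily be as large as the count itself.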
 16Why So Inaccurate?
- Operate essentially as in randomized response 
- No utilization of the state 
- Problem: we do the same operations when the stream is sparse as when it is dense
- Want to act differently when the stream is dense 
- The times where the counter is updated are 
 potential leakage
17Delayed Updates
Main idea: update the output value only when there is a large gap between the actual count and the output.
Have a good way of outputting the value of the counter once the actual count plus noise is far enough from the output.
Maintain:
- Actual count $A_t$ (+ noise)
- Current output $out_t$ (+ noise)
$D$: update threshold
 18Delayed Output Counter
- $Out_t$: current output
- $A_t$: count since last update
- $D_t$: noisy threshold
- If $A_t - D_t >$ fresh noise then
- $Out_{t+1} \leftarrow Out_t + A_t +$ fresh noise
- $A_{t+1} \leftarrow 0$
- $D_{t+1} \leftarrow D +$ fresh noise
- Noise: independent Laplace noise with magnitude $1/\varepsilon$
- Accuracy:
- For threshold $D$: w.h.p. update about $N/D$ times
- Total error: $(N/D)^{1/2} \cdot$ noise $+ D +$ noise $+$ noise, where the $D$ term is due to the delay
- Set $D = N^{1/3}$ $\Rightarrow$ accuracy $\approx N^{1/3}$
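The update rule above can be sketched as follows. This is a simplified illustration with assumed names and parameters; it shows the mechanics of the noisy threshold test and noisy update, not a verified private implementation.

```python
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def delayed_output_counter(stream, epsilon, threshold, rng):
    """Publish a count that changes only when the pending count passes a noisy threshold."""
    scale = 1 / epsilon
    out = 0.0                                  # Out_t: current public output
    pending = 0                                # A_t: count since last update
    noisy_threshold = threshold + laplace(scale, rng)   # D_t
    outputs = []
    for x in stream:
        pending += x
        if pending - noisy_threshold > laplace(scale, rng):  # noisy threshold test
            out += pending + laplace(scale, rng)             # noisy update value
            pending = 0
            noisy_threshold = threshold + laplace(scale, rng)
        outputs.append(out)
    return outputs

rng = random.Random(2)
dense = delayed_output_counter([1] * 1000, epsilon=1.0, threshold=100, rng=rng)
quiet = delayed_output_counter([0] * 1000, epsilon=1.0, threshold=100, rng=rng)
updates = len(set(dense)) - 1   # number of distinct published values after the initial 0
```

With threshold 100 and 1000 increments, the output changes roughly N/D = 10 times, and an all-zero stream produces no updates at all, which is the behavior the naive counter lacks on sparse inputs.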
 19Privacy of Delayed Output
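Update rule: if $A_t - D_t >$ fresh noise, then $Out_{t+1} \leftarrow Out_t + A_t +$ fresh noise and $D_{t+1} \leftarrow D +$ fresh noise.
- Protect: update time and update value
- For any two adjacent sequences, e.g.
- 101101110001
- 101101010001
- Can pair up noise vectors:
- $\eta_1 \eta_2 \dots \eta_{k-1}\, \eta_k\, \eta_{k+1} \dots$
- $\eta_1 \eta_2 \dots \eta_{k-1}\, \eta'_k\, \eta_{k+1} \dots$
- Identical in all locations except one
- $|\eta'_k - \eta_k| = 1$, where $k$ is the location of the first update after the difference occurred (noisy thresholds $D_t$ and $D'_t$)
Probabilities of paired noise vectors are within a factor of $\approx e^{\varepsilon}$.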
 20Dynamic from Static
Idea: apply the conversion of static algorithms into dynamic ones [Bentley-Saxe 1980]
- Run many accumulators in parallel:
- each accumulator counts the number of 1's in a fixed segment of time, plus noise (the accumulator is measured when the stream is in its time frame)
- Value of the output counter at any point in time: sum of the accumulators of a few segments (only finished segments are used)
- Accuracy: depends on the number of segments in the summation and the accuracy of the accumulators
- Privacy: depends on the number of accumulators that a point $x_t$ influences
 21The Segment Construction
Based on the bit representation: each point $t$ is in $\lceil \log t \rceil$ segments.
$\sum_{i=1}^{t} x_i$: sum of at most $\log t$ accumulators.
By setting the per-accumulator parameter $\varepsilon' = \varepsilon / \log T$, can get the desired privacy.
Accuracy: with all but negligible in $T$ probability, the error at every step $t$ is at most $O((\log^{1.5} T)/\varepsilon)$, thanks to the noise canceling.
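One way to realize the segment construction is the following sketch, which keeps one noisy accumulator per bit position of $t$. The function names and the choice to scale noise by the number of levels are assumptions made for illustration.

```python
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def tree_counter(stream, epsilon, rng):
    """Continual counter: the output at time t sums one noisy segment per set bit of t."""
    levels = max(1, len(stream).bit_length())
    scale = levels / epsilon          # each item falls in at most `levels` segments
    exact = [0] * levels              # exact sum of the finished segment at each level
    noisy = [0.0] * levels            # its published noisy version
    outputs = []
    for t, x in enumerate(stream, start=1):
        i = (t & -t).bit_length() - 1          # index of the lowest set bit of t
        exact[i] = sum(exact[:i]) + x          # merge lower segments into level i
        for j in range(i):                     # lower levels start over
            exact[j], noisy[j] = 0, 0.0
        noisy[i] = exact[i] + laplace(scale, rng)
        outputs.append(sum(noisy[j] for j in range(levels) if (t >> j) & 1))
    return outputs

rng = random.Random(4)
stream = [1, 0, 1, 1, 0, 0, 1, 0]
almost_exact = tree_counter(stream, epsilon=1e9, rng=rng)   # negligible noise
noisy_run = tree_counter([1] * 1024, epsilon=1.0, rng=rng)
```

With a huge epsilon the outputs match the true prefix sums, which is a quick sanity check that only finished segments are being combined.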
 22Synthetic Counter
- Can make the counter synthetic 
- Monotone 
- Each round counter goes up by at most 1 
- Apply to any monotone function
23Lower Bound on Accuracy 
- Theorem: additive inaccuracy of $\log T$ is essential for $\varepsilon$-differential privacy, even for $\varepsilon = 1$
- Consider the stream $0^T$ compared to a collection of $T/b$ streams of the form $S_j = 0^{jb} 1^b 0^{T-(j+1)b}$
- $S_j$ = 000000001111000000000000 (a block of $b$ ones)
Call an output sequence correct if it is a $b/3$ approximation for all points in time.
 24Lower Bound on Accuracy
$S_j$ = 000000001111000000000000
- Important properties:
- For any output: the ratio of probabilities under stream $S_j$ and $0^T$ should be at least $e^{-\varepsilon b}$ (hybrid argument from differential privacy)
- Any output sequence is correct (a $b/3$ approximation for all points in time) for at most one $S_j$ or $0^T$
- Say the probability of a good output sequence is at least $\gamma$
Outputs good for $S_j$ have probability at least $\gamma e^{-\varepsilon b}$ under $0^T$.
Setting $b = \frac{1}{2} \log T$ and $\gamma = 1/2$ gives $(T/b) \cdot \gamma e^{-\varepsilon b} > 1 - \gamma$: contradiction.
 25Hybrid Proof
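- Want to show that for any event $B$:
$$\frac{\Pr[A(0^T) \in B]}{\Pr[A(S_j) \in B]} \ge e^{-\varepsilon b}$$
Let $S_j^i = 0^{jb} 1^i 0^{T-jb-i}$, so that $S_j^0 = 0^T$ and $S_j^b = S_j$. Neighboring hybrids satisfy
$$\frac{\Pr[A(S_j^i) \in B]}{\Pr[A(S_j^{i+1}) \in B]} \ge e^{-\varepsilon}$$
and therefore
$$\frac{\Pr[A(S_j^0) \in B]}{\Pr[A(S_j^b) \in B]} = \frac{\Pr[A(S_j^0) \in B]}{\Pr[A(S_j^1) \in B]} \cdots \frac{\Pr[A(S_j^{b-1}) \in B]}{\Pr[A(S_j^b) \in B]} \ge e^{-\varepsilon b}$$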
 26What shall we do with the counter?
- Privacy-preserving counting is a basic building block in more complex environments
- General characterizations and transformations: event-level pan-private continual-output algorithm for any low sensitivity function
- Following expert advice privately: track experts over time, choose who to follow. Need to track how many times each expert was correct
27Following Expert Advice
[Hannan 1957] [Littlestone-Warmuth 1989]
- n experts, in every time period each gives 0/1 advice
- pick which expert to follow
- then learn correct answer, say in 0/1
- Goal: over time, competitive with best expert in hindsight
Time      1  2  3  4  5
Expert 1  1  1  1  0  1
Expert 2  0  1  1  0  0
Expert 3  0  0  1  1  1
Correct   0  1  1  0  0
 28Following Expert Advice
- n experts, in every time period each gives 0/1 advice
- pick which expert to follow
- then learn correct answer, say in 0/1
- Goal: over time, competitive with best expert in hindsight
Goal: # mistakes of chosen experts $\approx$ # mistakes made by best expert in hindsight. Want $1+o(1)$ approximation.
 29Following Expert Advice, Privately
- n experts, in every time period each gives 0/1 
 advice
- pick which expert to follow 
- then learn correct answer, say in 0/1 
- Goal: over time, competitive with best expert in hindsight
- New concern: protect privacy of experts' opinions and outcomes
- User-level privacy (was the expert consulted at all?): lower bound, no non-trivial algorithm
- Event-level privacy: counting gives $1+o(1)$-competitive
 30Algorithm for Following Expert Advice
- Follow perturbed leader [Kalai-Vempala]: for each expert keep perturbed # of mistakes; follow expert with lowest perturbed count
- Idea: use counter, count privacy-preserving # of mistakes
- Problem: not every perturbation works; need counter with well-behaved noise distribution
- Theorem [Follow the Privacy-Perturbed Leader]: for $n$ experts, over $T$ time periods, # mistakes is within $\approx$ poly$(\log n, \log T, 1/\varepsilon)$ of best expert
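The selection rule "follow the expert with the lowest perturbed count" can be illustrated with a toy sketch. It perturbs the exact mistake counts with fresh Laplace noise at every step, which is an assumption made for brevity: it is not the private counter the theorem requires and carries no privacy guarantee. The advice table is the one from the earlier slide.

```python
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def follow_perturbed_leader(advice, outcomes, noise_scale, rng):
    """advice[t][i] is expert i's 0/1 advice at time t; returns our number of mistakes."""
    n = len(advice[0])
    mistakes = [0] * n
    our_mistakes = 0
    for expert_bits, truth in zip(advice, outcomes):
        # Toy perturbation only; a real instantiation would read private counters.
        perturbed = [m + laplace(noise_scale, rng) for m in mistakes]
        leader = min(range(n), key=perturbed.__getitem__)
        our_mistakes += int(expert_bits[leader] != truth)
        for i, bit in enumerate(expert_bits):
            mistakes[i] += int(bit != truth)
    return our_mistakes

# Rows are time periods; columns are Expert 1, Expert 2, Expert 3.
advice = [(1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 0, 1), (1, 0, 1)]
outcomes = [0, 1, 1, 0, 0]
rng = random.Random(5)
result = follow_perturbed_leader(advice, outcomes, noise_scale=1e-6, rng=rng)
```

With almost no noise the rule quickly locks onto Expert 2, who is never wrong in this table, so at most two early mistakes are possible.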
31List Update Problem
- There are $n$ distinct elements $A = \{a_1, a_2, \dots, a_n\}$
- Have to maintain them in a list: some permutation
- Given a request sequence $r_1, r_2, \dots$
- Each $r_i \in A$
- For request $r_i$: cost is how far $r_i$ is in the current permutation
- Can rearrange list between requests
- Want to minimize total cost for request sequence
- Sequence not known in advance
Our goal: do it while providing privacy for the request sequence, assuming list order is public (for each request $r_i$: cannot tell whether $r_i$ is in the sequence or not)
 32List Update Problem
- In general cost can be very high 
- First problem to be analyzed in the competitive 
 framework by Sleator and Tarjan (1985)
- Compared to the best algorithm that knows the 
 sequence in advance
- Best algorithms:
- 2-competitive deterministic
- Better randomized: $\approx 1.5$
- Assume free rearrangements between requests
- Bad news: cannot be better than $(1/\varepsilon)$-competitive if we want to keep privacy (cannot act until $1/\varepsilon$ requests to an element appear)
 33Lower bound for Deterministic Algorithms
- Bad schedule: always ask for the last element in the list
- Cost of online: $n \cdot t$
- Cost of best fixed list: sort the list according to popularity
- Average cost: at most about $\frac{1}{2} n$
- Total cost: at most about $\frac{1}{2} n t$
34List Update Problem Static Optimality
- A more modest performance goal: compete with the best algorithm that fixes the permutation in advance
- [Blum-Chawla-Kalai]: can be $1+o(1)$ competitive w.r.t. the best static algorithm (probabilistic)
- BCK algorithm: based on the number of times each element has been requested
- Algorithm:
- Start with random weights $r_i$ in range $[1, c]$
- At all times $w_i = r_i + c_i$
- $c_i$ is the # of times element $a_i$ was requested
- At any point in time: arrange elements according to weights
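The weight rule can be sketched as below. Two points are assumptions for illustration: the random offsets are drawn uniformly as real numbers, and the list is kept in decreasing order of weight so frequently requested elements sit near the front. Note that `offsets` plays the role of the slide's random weights $r_i$, which are distinct from the requests.

```python
import random

def bck_serve(elements, requests, c, rng):
    """Serve requests with the list ordered by weight = random offset + request count."""
    offsets = {a: rng.uniform(1, c) for a in elements}   # random initial weights
    counts = {a: 0 for a in elements}                    # c_i: requests seen so far

    def current_order():
        return sorted(elements, key=lambda a: offsets[a] + counts[a], reverse=True)

    total_cost = 0
    for request in requests:
        total_cost += current_order().index(request) + 1  # cost = position in list
        counts[request] += 1
    return total_cost, current_order()

rng = random.Random(3)
requests = ["c"] * 10 + ["a"] * 4 + ["c"] * 10
cost, order = bck_serve(["a", "b", "c", "d"], requests, c=2.0, rng=rng)
```

Because the order depends on the data only through the counts, replacing `counts` with privacy-preserving counters is the natural route to a private variant.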
35Privacy with Static Optimality
- Algorithm:
- Start with random weights $r_i$ in range $[1, c]$
- At any point in time $w_i = r_i + c_i$
- $c_i$ is the # of times element $a_i$ was requested (run with a private counter)
- Arrange elements according to weights
- Privacy: from privacy of counters
- list depends on counters plus randomness
- Accuracy: can show that the BCK proof can be modified to handle approximate counts as well
- What about efficiency?
 36The multi-counter problem
- How to run n counters for T time steps 
- In each round few counters are incremented 
- Identity of incremented counter is kept private 
- Work per increment logarithmic in n and T 
- Idea: arrange the $n$ counters in a binary tree with $n$ leaves
- Output counters associated with leaves 
- For each internal node maintain a counter 
 corresponding to sum of leaves in subtree
37The multi-counter problem
- Idea: arrange the $n$ counters in a binary tree with $n$ leaves
- Output counters associated with leaves
- For each internal node maintain an (internal, output) pair:
- Counter corresponding to sum of leaves in subtree
- Register with number of increments since last output update (determines when to update subtree)
- When a leaf counter is updated:
- All $\log n$ nodes to root are incremented
- Internal state of root updated
- If output of parent node updated, internal state of children updated
 38Tree of Counters
[Figure: binary tree of counters; each internal node holds a (counter, register) pair and the leaves hold the output counters]
 39The multi-counter problem
- Work per increment:
- $\log n$ increments + number of counters that need to update
- Amortized complexity is $O(n \log n / k)$
- $k$: number of times we expect to increment a counter until its output is updated
- Privacy: each increment of a leaf counter affects $\log n$ counters
- Accuracy: we have introduced some delay
- After $t \ge k \log n$ increments all nodes on the path have been updated
40Pan-Privacy
think of the children 
- In privacy literature: data curator trusted
- In reality:
- even well-intentioned curator subject to mission creep, subpoena, security breach
- Pro baseball anonymous drug tests
- Facebook policies to protect users from application developers
- Google accounts hacked
- Goal: curator accumulates statistical information, but never stores sensitive data about individuals
- Pan-privacy: algorithm private inside and out
- internal state is privacy-preserving
41Randomized Response Warner 1965
- Method for polling stigmatizing questions
- Idea: participants lie with known probability
- Specific answers are deniable
- Aggregate results are still valid
- Data never stored in the clear; popular in DB literature [MiSa06]
Strong guarantee: no trust in curator. Makes sense when each user's data appears only once, otherwise limited utility.
New idea: curator aggregates statistical information, but never stores sensitive data about individuals.
[Figure: user data (1, 0, 1) plus noise gives the user response]
 42Aggregation Without Storing Sensitive Data?
- Streaming algorithms: small storage
- Information stored can still be sensitive
- "My data": many appearances, arbitrarily interleaved with those of others
- Pan-Private Algorithm:
- Private inside and out
- Even internal state completely hides the appearance pattern of any individual (user level): presence, absence, frequency, etc.
 43Pan-Privacy Model
Data is a stream of items; each item belongs to a user. Data of different users is interleaved arbitrarily. The curator sees items, updates its internal state, and produces output at stream end.
Pan-Privacy: for every possible behavior of a user in the stream, the joint distribution of the internal state at any single point in time and the final output is differentially private.
Can also consider multiple intrusions.
 44Adjacency User Level
- Universe $U$ of users whose data is in the stream; $x \in U$
- Streams are $x$-adjacent if they have the same projections of users onto $U \setminus \{x\}$
- Example: axbxcxdxxxex and abcdxe are $x$-adjacent
- Both project to abcde
- Notion of "corresponding locations" in $x$-adjacent streams
- $U$-adjacent: $\exists x \in U$ for which they are $x$-adjacent
- Simply "adjacent", if $U$ is understood
- Note: streams of different lengths can be adjacent
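The adjacency definition can be checked mechanically. The sketch below uses hypothetical helper names and represents a stream as a string with one character per item.

```python
def project_out(stream, x):
    """Project the stream onto U \\ {x}: drop every appearance of user x."""
    return [u for u in stream if u != x]

def x_adjacent(s1, s2, x):
    """Two streams are x-adjacent if they agree once user x is removed."""
    return project_out(s1, x) == project_out(s2, x)

def adjacent(s1, s2, universe):
    """U-adjacent: some user x in the universe makes the streams x-adjacent."""
    return any(x_adjacent(s1, s2, x) for x in universe)

same_projection = x_adjacent("axbxcxdxxxex", "abcdxe", "x")   # both project to abcde
not_adjacent = adjacent("abc", "cba", "abcx")
```

The example from the slide also shows why adjacent streams may have different lengths: user x appears seven times in one stream and once in the other.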
45Example: Stream Density or # Distinct Elements
- Universe $U$ of users, estimate how many distinct users in $U$ appear in the data stream
- Application: # distinct users who searched for "flu"
- Ideas that don't work:
- Naïve: keep list of users that appeared (bad privacy and space)
- Streaming:
- Track random sub-sample of users (bad privacy)
- Hash each user, track minimal hash (bad privacy)
46Pan-Private Density Estimator
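Inspired by randomized response. Store for each user $x \in U$ a single bit $b_x$.
- Initially: all $b_x$ are drawn from distribution $D_0$: 0 w.p. $\frac{1}{2}$, 1 w.p. $\frac{1}{2}$
- When encountering $x$: redraw $b_x$ from distribution $D_1$: 0 w.p. $\frac{1}{2} - \varepsilon$, 1 w.p. $\frac{1}{2} + \varepsilon$
- Final output: [(fraction of 1s in table $- \frac{1}{2}$)$/\varepsilon$] + noise
Pan-Privacy:
- If user never appeared: entry drawn from $D_0$
- If user appeared any # of times: entry drawn from $D_1$
- $D_0$ and $D_1$ are $4\varepsilon$-differentially private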
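A minimal sketch of this estimator, with one stored bit per user. The class name, the explicit per-user table, and the `output_noise_scale` parameter are assumptions; the slides do not specify the scale of the final output noise.

```python
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

class PanPrivateDensity:
    def __init__(self, universe, epsilon, rng):
        self.epsilon = epsilon
        self.rng = rng
        # D0: every user's bit starts out uniformly random.
        self.bits = {x: int(rng.random() < 0.5) for x in universe}

    def observe(self, x):
        # D1: redraw the bit, 1 with probability 1/2 + epsilon.
        self.bits[x] = int(self.rng.random() < 0.5 + self.epsilon)

    def estimate(self, output_noise_scale):
        frac_ones = sum(self.bits.values()) / len(self.bits)
        # E[frac_ones] = 1/2 + epsilon * density, so invert and add output noise.
        return (frac_ones - 0.5) / self.epsilon + laplace(output_noise_scale, self.rng)

rng = random.Random(6)
estimator = PanPrivateDensity(range(20_000), epsilon=0.25, rng=rng)
for _ in range(3):                      # users 0..4999 each appear three times
    for user in range(5_000):
        estimator.observe(user)
density = estimator.estimate(output_noise_scale=0.001)   # true density is 0.25
```

A user's bit has the same distribution whether the user appeared once or many times, which is why the stored state reveals so little about any individual.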
 47Pan-Private Density Estimator
Inspired by randomized response. Store for each user $x \in U$ a single bit $b_x$.
- Initially: all $b_x$ are 0 w.p. $\frac{1}{2}$, 1 w.p. $\frac{1}{2}$
- When encountering $x$: redraw $b_x$ as 0 w.p. $\frac{1}{2} - \varepsilon$, 1 w.p. $\frac{1}{2} + \varepsilon$
- Final output: [(fraction of 1s in table $- \frac{1}{2}$)$/\varepsilon$] + noise
Improved accuracy and storage:
- Multiplicative accuracy using hashing
- Small storage using sub-sampling
 48Pan-Private Density Estimator
Theorem [density estimation streaming algorithm]: $\varepsilon$ pan-privacy, multiplicative error $\alpha$, space is poly$(1/\alpha, 1/\varepsilon)$
 49Density Estimation with Multiple Intrusions
- If intrusions are announced, can handle multiple intrusions; accuracy degrades exponentially in the # of intrusions
- Can we do better?
- Theorem [multiple intrusion lower bounds]:
- If there are either
- Two unannounced intrusions (for finite-state algorithms)
- Non-stop intrusions (for any algorithm)
- then additive accuracy cannot be better than $\Omega(n)$
50What other statistics have pan-private algorithms?
- Density: # of users appeared at least once
- Incidence counts: # of users appearing k times exactly
- Cropped means: mean, over users, of min(t, # appearances)
- Heavy-hitters: users appearing at least k times
 51Counters and Pan Privacy
- Is the counter algorithm pan private? 
- No: the internal counts accurately reflect what happened since the last update
- Easy to correct: store them together with noise
- Add $(1/\varepsilon)$-Laplacian noise to all accumulators
- Both at storage and when added
- At most doubles the noise
[Figure: an accumulator stores count + noise]
 52Continual Intrusion
- Consider multiple intrusions 
- Most desirable resistance to continual intrusion 
- Adversary can continually examine the internal 
 state of the algorithm
- Implies also continual observation 
- Something can be done: randomized response
- But:
- Theorem: any counter that is $\varepsilon$-pan-private under continual observation and with $m$ intrusions must have additive error $\Omega(\sqrt{m})$ with constant probability.
53Proof of lower bound
- Two distributions 
- $I_0$: all 0 stream
- $I_1$: $x_i = 0$ with probability $1 - 1/(k\sqrt{n})$ and $x_i = 1$ with probability $1/(k\sqrt{n})$
- Let $D_b$ be the distribution on states when running on $I_b$
- Claim: statistical distance between $D_0$ and $D_1$ is small
- Key point: can represent transition probabilities as
- $Q^0_s(x) = \frac{1}{2} C(x) + \frac{1}{2} C'(x)$
- $Q^1_s(x) = (\frac{1}{2} - \frac{1}{k\sqrt{n}}) C(x) + (\frac{1}{2} + \frac{1}{k\sqrt{n}}) C'(x)$
Randomized response is the best we can do.
 54Pan Privacy under Continual Observation
Definition: for all $U$-adjacent streams $S$ and $S'$, the joint distribution on the internal state at any single location and the sequence of all outputs is differentially private.
[Figure: timeline marking the sequence of outputs and one internal state]
 55A General Transformation
- Transform any static algorithm $A$ to continual output, maintain:
- Pan-privacy
- Storage size
- Hit in accuracy: low for large classes of algorithms
- Main idea: delayed updates. Update output value only rarely, when there is a large gap between $A$'s current estimate and the output
56Theorem General Transformation
Transform any algorithm $A$ for monotone function $f$ with error $\alpha$, sensitivity $sens_A$ (max output difference on adjacent streams), and maximum value $N$.
The new algorithm has $\varepsilon$-privacy under continual observation and maintains $A$'s pan-privacy and storage.
Error is $\tilde{O}(\alpha + \sqrt{N \cdot sens_A / \varepsilon})$
 57General Transformation Main Idea
[Figure: input stream a0bcbbde → A → out]
- Assume $A$ is a pan-private estimator for monotone $f \le N$
- If $A_t - out_{t-1} > D$ then $out_t \leftarrow A_t$
- For threshold $D$: w.h.p. update about $N/D$ times
58General Transformation Main Idea
[Figure: input stream a0bcbbde → A → out]
- Assume $A$ is a pan-private estimator for monotone $f \le N$
- $A$'s output may not be monotonic
- If $A_t - out_{t-1} > D$ then $out_t \leftarrow A_t$
- What about privacy? Update times, update values
- For threshold $D$: w.h.p. update about $N/D$ times
- Quit if # updates exceeds Bound $\approx N/D$
59General Transformation Privacy
If $A_t - out_{t-1} > D$ then $out_t \leftarrow A_t$
What about privacy? Update times, update values
Add noise:
- Noisy threshold test $\Rightarrow$ privacy-preserving update times
- Noisy update $\Rightarrow$ privacy-preserving update values
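 60General Transformation Privacy
Error: $\tilde{O}[D + (sens_A \cdot N)/(D \varepsilon)]$
- If $A_t - out_{t-1} + noise > D$
- then $out_t \leftarrow A_t + noise$
- Scale noise(s) to $\approx Bound \cdot sens_A / \varepsilon$
- Yields $(\varepsilon, \delta)$-diff. privacy: $\Pr[z \mid S] \le e^{\varepsilon} \Pr[z \mid S'] + \delta$
- Proof: pairs noise vectors that are "far" from causing quitting on $S$ with noise vectors for which $S'$ has the exact same update times
- Few noise vectors are bad; paired vectors are "$\varepsilon$-private"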
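The choice of threshold can be worked out from the error expression on this slide. The derivation below reads that expression as $D + sens_A \cdot N / (D\varepsilon)$, which is an assumption about a garbled formula, and it ignores logarithmic factors and the error $\alpha$ of $A$ itself.

```latex
\begin{align*}
\mathrm{Err}(D) &\approx \underbrace{D}_{\text{delay}}
  + \underbrace{\frac{N}{D} \cdot \frac{sens_A}{\varepsilon}}_{\text{noise scale } \approx\, Bound \cdot sens_A/\varepsilon},
  \qquad Bound \approx N/D \\
\frac{d}{dD}\,\mathrm{Err}(D) &= 1 - \frac{sens_A \cdot N}{D^2 \varepsilon} = 0
  \;\Longrightarrow\; D^{*} = \sqrt{\frac{sens_A \cdot N}{\varepsilon}} \\
\mathrm{Err}(D^{*}) &\approx 2\sqrt{\frac{sens_A \cdot N}{\varepsilon}}
\end{align*}
```

Under this reading, a larger threshold means fewer updates and hence less noise per update, at the price of a longer delay; the two terms balance at $D^{*}$.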
61Theorem General Transformation
- Transform any algorithm $A$ for monotone function $f$
- with error $\alpha$, sensitivity $sens_A$, maximum value $N$
- New algorithm:
- satisfies $\varepsilon$-privacy with continual observation,
- maintains $A$'s pan-privacy and storage
- Error is $\tilde{O}(\alpha + \sqrt{N \cdot sens_A / \varepsilon})$
- Extends from monotone to "stable" functions
- Loose characterization of functions that can be computed privately under continual observation without pan-privacy
62What other statistics have pan-private algorithms?
- Pan-private streaming algorithms for 
- Stream density / number of distinct elements 
- t-cropped mean: mean, over users, of min(t, # appearances)
- Fraction of users appearing k times exactly 
- Fraction of heavy-hitters, users appearing at 
 least k times
63Incidence Counting
- Universe $X$ of users. Given $k$, estimate what fraction of users in $X$ appear exactly $k$ times in the data stream
- Difficulty: can't track an individual's # of appearances (user-level privacy!)
- Idea: keep track of noisy # of appearances
- However: can't accurately track whether an individual appeared 0, k or 100k times
- Different approach: follows the count-min [CM05] idea from the streaming literature
 64Incidence Counting a la Count-Min
- Use pan-private algorithm that gets as input:
- hash function $h: \mathbb{Z} \to M$ (for small range $M$)
- target val
- Outputs fraction of users with $h(\#\,\text{appearances}) =$ val
- Given this, estimate $k$-incidence as the fraction of users with
- $h(\#\,\text{appearances}) = h(k)$
- Concern: might we over-estimate? (hash collisions)
- Accuracy: if $h$ has low collision probability, then with some probability collisions are few and the estimate is accurate
- Repeat to amplify (output minimal estimate)
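The hash-and-take-the-minimum idea can be illustrated with exact, non-private counts, using "count modulo a prime" as the hash. Everything here (names, data, primes) is a hypothetical example; the pan-private version would replace the exact per-user counts with noisy modular counters.

```python
from collections import Counter

def modular_fraction(stream, universe, p, val):
    """Fraction of users whose number of appearances is congruent to val mod p."""
    counts = Counter(stream)
    hits = sum(1 for u in universe if counts[u] % p == val)
    return hits / len(universe)

def estimate_k_incidence(stream, universe, k, primes):
    """Collisions can only inflate each estimate, so output the minimum over primes."""
    return min(modular_fraction(stream, universe, p, k % p) for p in primes)

universe = list(range(10))
stream = [0] * 3 + [1] * 8 + [2] * 3      # users 0 and 2 appear 3 times, user 1 appears 8 times
one_prime = estimate_k_incidence(stream, universe, 3, [5])       # 8 collides with 3 mod 5
two_primes = estimate_k_incidence(stream, universe, 3, [5, 7])   # mod 7 separates them
```

With the single prime 5 the estimate over-counts because 8 and 3 collide, while adding the prime 7 and taking the minimum recovers the true fraction of 0.2.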
65Putting it together
- Hash by choosing a small random prime $p$: $h(z) = z \pmod p$
- Pan-private modular incidence counter: gets $p$ and val, estimates the fraction of users with # appearances $\equiv$ val (mod $p$); space is poly($p$), but small $p$ suffices
- Theorem [$k$-incidence counting streaming algorithm]:
- $\varepsilon$ pan-privacy, multiplicative error $\alpha$, upper bound $N$ on number of appearances
- Space is poly$(1/\alpha, 1/\varepsilon, \log N)$
66t -Incidence Estimator
- Let $R = \{1, 2, \dots, r\}$ be the smallest range of integers containing at least $4 \log N / \alpha$ distinct prime numbers
- Choose at random $L$ distinct primes $p_1, p_2, \dots, p_L$
- Run the modular incidence counter with these $L$ primes
- When a value $x \in M$ appears: update each of the $L$ modular counters
- For any desired $t$: for each $i \in [L]$
- Let $f_i$ be the output of the $i$-th modular incidence counter for the value $t \pmod{p_i}$
- Output the (noisy) minimum of these fractions
67Pan-Private Modular Incidence Counter
- For every user $x$, keep counter $c_x \in \{0, \dots, p-1\}$. Increase counter (mod $p$) every time the user appears
- If initially 0: no privacy, but perfect accuracy
- If initially random: perfect privacy, but no accuracy
- Initialize using a distribution slightly biased towards 0:
- $\Pr[c_x = i] \propto e^{-\varepsilon i/(p-1)}$ for $i = 0, \dots, p-1$
- Privacy: a user's # of appearances has only a small effect on the distribution of $c_x$
 68Modular Incidence Counter Accuracy
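- For $j \in \{0, \dots, p-1\}$:
- $o_j$ is the # of users with observed "noisy" count $j$
- $t_j$ is the true # of users that truly appear $j$ times (mod $p$)
- $o_j \approx \sum_{k=0}^{p-1} t_{j-k \pmod p} \cdot e^{-\varepsilon k/(p-1)}$, up to the normalization of the initial distribution
- Using the observed $o_j$'s: get $p$ (approx.) equations in $p$ variables (the $t_k$'s). Solve using linear programming
- Solution is close to true counts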
 69Pan-private Algorithms
Continual Observation
- Density: # of users appeared at least once
- Incidence counts: # of users appearing k times exactly
- Cropped means: mean, over users, of min(t, # appearances)
- Heavy-hitters: users appearing at least k times
 70The Dynamic Privacy Zoo
[Figure: the "petting zoo" of privacy notions: Differentially Private Outputs, Privacy under Continual Observation, Pan Privacy, Continual Pan Privacy, User-Level Privacy, Sketch vs. Stream]