Title: Sketching and Streaming Entropy via Approximation Theory
1 Sketching and Streaming Entropy via Approximation Theory
Nick Harvey (MSR/Waterloo), Jelani Nelson (MIT), Krzysztof Onak (MIT)
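2 Streaming Model
- A stream of $m$ updates, e.g. "Increment $x_4$", "Increment $x_1$"
- The updates define a vector $x \in \mathbb{Z}^n$
- Goal: compute statistics, e.g. $\|x\|_1$, $\|x\|_2$
- Trivial solution: store $x$ (or store all updates), using $O(n \log m)$ space
- Goal: compute using $O(\mathrm{polylog}(nm))$ space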
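To make the model concrete, here is a minimal Python sketch of the trivial solution. It assumes each update is a hypothetical `(index, delta)` pair, so "Increment $x_4$" becomes `(4, 1)`. It stores $x$ exactly, which is the $O(n \log m)$-space baseline that small-space streaming algorithms try to beat.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def trivial_stream_stats(updates: Iterable[Tuple[int, int]]) -> Tuple[int, float]:
    """Store x exactly (the trivial baseline) and return (||x||_1, ||x||_2).

    Each update is a hypothetical (index, delta) pair, e.g. (4, 1) for "Increment x_4".
    """
    x = defaultdict(int)  # one exact counter per coordinate that has been touched
    for i, delta in updates:
        x[i] += delta
    l1 = sum(abs(v) for v in x.values())
    l2 = sum(v * v for v in x.values()) ** 0.5
    return l1, l2


# "Increment x_4", "Increment x_1", "Increment x_4" gives x_4 = 2 and x_1 = 1,
# so ||x||_1 = 3 and ||x||_2 = sqrt(5).
l1, l2 = trivial_stream_stats([(4, 1), (1, 1), (4, 1)])
print(l1, l2)
```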
3 Streaming Algorithms (a very brief introduction)
- Fact [Alon-Matias-Szegedy 99, Bar-Yossef et al. 02, Indyk-Woodruff 05, Bhuvanagiri et al. 06, Indyk 06, Li 08, Li 09]: can compute a $(1 \pm \varepsilon)$-approximation of $F_p = \sum_i |x_i|^p$ using
  - $O(\varepsilon^{-2} \log^c n)$ bits of space (if $0 \le p \le 2$)
  - $O(\varepsilon^{-O(1)} n^{1-2/p} \log^{O(1)} n)$ bits (if $2 < p \le \infty$)
- Another Fact: mostly optimal [Alon-Matias-Szegedy 99, Bar-Yossef et al. 02, Saks-Sun 02, Chakrabarti-Khot-Sun 03, Indyk-Woodruff 03, Woodruff 04]
  - Proofs using communication complexity and information theory
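As one concrete instance of the $p = 2$ case, here is a minimal Python sketch in the style of the Alon-Matias-Szegedy "tug-of-war" estimator. Assumptions: the random signs come from a random cubic polynomial modulo the Mersenne prime $2^{61} - 1$ (a 4-wise independent family), and the group sizes are arbitrary illustrative choices, not the constants from the papers above. Each counter stores $Z = \sum_i s(i) x_i$ with $s(i) \in \{-1, +1\}$, so $E[Z^2] = F_2$; averaging within groups and taking the median across groups concentrates the estimate.

```python
import random
import statistics

PRIME = 2 ** 61 - 1  # Mersenne prime defining the hash field (an assumed choice)


class AMSF2Sketch:
    """Tug-of-war sketch for F_2 = sum_i x_i^2 (Alon-Matias-Szegedy style)."""

    def __init__(self, groups: int = 9, per_group: int = 64, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.groups = groups
        self.per_group = per_group
        count = groups * per_group
        # One random cubic polynomial per counter: a 4-wise independent hash family.
        self.coeffs = [[rng.randrange(PRIME) for _ in range(4)] for _ in range(count)]
        self.counters = [0] * count

    def _sign(self, j: int, i: int) -> int:
        """Sign s_j(i) in {-1, +1} taken from the low bit of the hash value."""
        a, b, c, d = self.coeffs[j]
        h = (((a * i + b) * i + c) * i + d) % PRIME
        return 1 if h & 1 else -1

    def update(self, i: int, delta: int = 1) -> None:
        """Process the stream update x_i += delta."""
        for j in range(len(self.counters)):
            self.counters[j] += self._sign(j, i) * delta

    def estimate(self) -> float:
        """Median over groups of the mean of Z^2 within each group."""
        squares = [z * z for z in self.counters]
        means = [
            statistics.fmean(squares[g * self.per_group:(g + 1) * self.per_group])
            for g in range(self.groups)
        ]
        return statistics.median(means)


sketch = AMSF2Sketch(seed=1)
for i in range(20):
    sketch.update(i, i + 1)  # builds x_i = i + 1, so F_2 = 1^2 + ... + 20^2 = 2870
print(sketch.estimate())  # a randomized estimate of 2870
```

The space used is the $9 \times 64$ counters plus their hash coefficients, independent of $n$.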
4 Practical Motivation
- General goal: dealing with massive data sets
  - Internet traffic, large databases
- Network monitoring and anomaly detection
  - Stream consists of Internet packets
  - $x_i$ = number of packets sent to port $i$
  - Under typical conditions, $x$ is very concentrated
  - Under a port scan attack, $x$ is less concentrated
  - Can detect by estimating empirical entropy [Lakhina et al. 05, Xu et al. 05, Zhao et al. 07]
5 Entropy
- Probability distribution $a = (a_1, a_2, \ldots, a_n)$
- Entropy $H(a) = -\sum_i a_i \lg(a_i)$
- Examples
  - $a = (1/n, 1/n, \ldots, 1/n)$: $H(a) = \lg(n)$
  - $a = (0, \ldots, 0, 1, 0, \ldots, 0)$: $H(a) = 0$
- Small when concentrated, LARGE when not
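The definition translates directly into code. Below is a minimal Python sketch using the standard convention $0 \lg 0 = 0$, plus a hypothetical helper `empirical_entropy` that normalizes a nonnegative count vector $x$ (such as packets per port) into the distribution $x / \|x\|_1$ before applying $H$.

```python
import math
from typing import Sequence


def shannon_entropy(a: Sequence[float]) -> float:
    """H(a) = -sum_i a_i lg(a_i) in bits, with the convention 0 lg 0 = 0."""
    return sum(-ai * math.log2(ai) for ai in a if ai > 0)


def empirical_entropy(counts: Sequence[int]) -> float:
    """Entropy of the empirical distribution x / ||x||_1 of a nonnegative count vector."""
    total = sum(counts)
    return shannon_entropy([c / total for c in counts])


print(shannon_entropy([1 / 8] * 8))        # uniform on n = 8 items: 3.0 (= lg 8)
print(shannon_entropy([0, 0, 1, 0]))       # point mass: 0.0
print(empirical_entropy([1000, 1, 1, 1]))  # concentrated counts: small entropy
```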
6 Streaming Algorithms for Entropy
- How much space to estimate $H(x)$?
  - [Guha-McGregor-Venkatasubramanian 06], [Chakrabarti-Do Ba-Muthu 06], [Bhuvanagiri-Ganguly 06]
  - [Chakrabarti-Cormode-McGregor 07]:
    - multiplicative $(1 \pm \varepsilon)$ approx: $O(\varepsilon^{-2} \log^2 m)$ bits
    - additive $\varepsilon$ approx: $O(\varepsilon^{-2} \log^4 m)$ bits
    - $\Omega(\varepsilon^{-2})$ lower bound for both
- Our contributions:
  - Additive $\varepsilon$ or multiplicative $(1 \pm \varepsilon)$ approximation
  - $\tilde{O}(\varepsilon^{-2} \log^3 m)$ bits, and can handle deletions
  - Can sketch entropy in the same space
7 First Idea
- If you can estimate $F_p$ for $p \approx 1$,
- then you can estimate $H(x)$
- Why? Rényi entropy
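8 Review of Rényi
- Definition: $H_p(x) = \frac{1}{1-p} \lg\left(\sum_i x_i^p\right)$
- Convergence to Shannon: $\lim_{p \to 1} H_p(x) = H(x)$
[Figure: $H_p(x)$ plotted against $p$ (axis marks at 0, 1, 2), with portraits of Alfred Rényi and Claude Shannon]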
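A minimal Python sketch of both definitions, applied to an illustrative count vector normalized to $\hat{x} = x / \|x\|_1$ and using $H_p(x) = \frac{1}{1-p} \lg \sum_i \hat{x}_i^p$. Evaluating $H_p$ for $p$ closer and closer to 1 shows the convergence to Shannon entropy; at $p = 1$ itself the Rényi formula is $0/0$, which is why the limit is needed.

```python
import math
from typing import List, Sequence


def normalize(x: Sequence[float]) -> List[float]:
    """Return xhat = x / ||x||_1, dropping zero entries."""
    total = sum(x)
    return [xi / total for xi in x if xi > 0]


def renyi_entropy(x: Sequence[float], p: float) -> float:
    """H_p(x) = lg(sum_i xhat_i^p) / (1 - p) in bits, for p != 1."""
    return math.log2(sum(q ** p for q in normalize(x))) / (1 - p)


def shannon_entropy(x: Sequence[float]) -> float:
    """H(x) = -sum_i xhat_i lg(xhat_i) in bits."""
    return -sum(q * math.log2(q) for q in normalize(x))


counts = [5, 3, 2, 1, 1]  # illustrative data (an assumption)
for p in (2.0, 1.1, 1.01, 1.001):
    print(p, renyi_entropy(counts, p))
print("Shannon:", shannon_entropy(counts))
```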
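9 Overview of Algorithm
- Set $p = 1.01$ and let $\hat{x} = x / \|x\|_1$
- Compute $\tilde{F}_p = (1 \pm \varepsilon) \sum_i \hat{x}_i^p$ (using Li's compressed counting)
- Set $\tilde{H} = \frac{\lg(\tilde{F}_p)}{1 - p}$
- Analysis: so $|\tilde{H} - H_p(x)| = \frac{|\lg(1 \pm \varepsilon)|}{p - 1} = O(\varepsilon / (p - 1))$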
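The error propagation can be checked numerically. In this minimal Python sketch, exact moments stand in for Li's compressed-counting estimates (an assumption), and a $(1 \pm \varepsilon)$ error is simulated by scaling the exact moment; the function names are illustrative, not from the paper. With $p = 1.01$, a relative error $\varepsilon$ in $\tilde{F}_p$ becomes an additive error of about $\varepsilon / ((p-1) \ln 2) \approx 144 \varepsilon$ in $\tilde{H}$.

```python
import math
from typing import Sequence


def normalized_moment(x: Sequence[float], p: float) -> float:
    """F_p(xhat) = sum_i xhat_i^p, where xhat = x / ||x||_1."""
    total = sum(x)
    return sum((xi / total) ** p for xi in x if xi > 0)


def renyi_from_moment(fp: float, p: float) -> float:
    """H = lg(F_p(xhat)) / (1 - p), applied to an exact or estimated moment."""
    return math.log2(fp) / (1 - p)


def propagated_error_bound(eps: float, p: float) -> float:
    """Worst case of |lg(1 +/- eps)| / (p - 1), for 0 < eps < 1 and p > 1."""
    return max(abs(math.log2(1 + eps)), abs(math.log2(1 - eps))) / (p - 1)


counts = [5, 3, 2, 1, 1]  # illustrative data (an assumption)
p, eps = 1.01, 1e-3
exact = renyi_from_moment(normalized_moment(counts, p), p)
for sign in (+1, -1):
    # Simulated sketch output: the exact moment scaled by (1 + eps) or (1 - eps).
    noisy = renyi_from_moment((1 + sign * eps) * normalized_moment(counts, p), p)
    print(sign, abs(noisy - exact), propagated_error_bound(eps, p))
```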
10 Making the tradeoff
- How quickly does $H_p(x)$ converge to $H(x)$?
- Theorem: Let $x$ be a distribution with $\min_i x_i \ge 1/m$.
  - Multiplicative approximation: Let . Then
  - Additive approximation: Let . Then
- Plugging in: $O(\varepsilon^{-3} \log^4 m)$ bits of space suffice for additive $\varepsilon$ approximation
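The tradeoff can be seen on a concrete example. This minimal Python sketch uses exact entropies for an illustrative count vector (it does not implement the theorem's bounds) and tabulates, for several $p > 1$, the bias $H(x) - H_p(x)$ against the worst-case effect $|\lg(1 - \varepsilon)| / (p - 1)$ of a $(1 \pm \varepsilon)$ moment estimate. Moving $p$ toward 1 shrinks the first term and inflates the second; `tradeoff_table` is a hypothetical helper name.

```python
import math
from typing import List, Sequence, Tuple


def normalize(x: Sequence[float]) -> List[float]:
    total = sum(x)
    return [xi / total for xi in x if xi > 0]


def renyi_bits(q: Sequence[float], p: float) -> float:
    """H_p(q) = lg(sum_i q_i^p) / (1 - p) for a distribution q and p != 1."""
    return math.log2(sum(qi ** p for qi in q)) / (1 - p)


def shannon_bits(q: Sequence[float]) -> float:
    return -sum(qi * math.log2(qi) for qi in q)


def tradeoff_table(
    x: Sequence[float], eps: float, ps: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Rows (p, bias, noise) with bias = H - H_p and noise = |lg(1 - eps)| / (p - 1)."""
    q = normalize(x)
    h = shannon_bits(q)
    rows = []
    for p in ps:
        bias = h - renyi_bits(q, p)  # >= 0, since H_p is nonincreasing in p
        noise = abs(math.log2(1 - eps)) / (p - 1)  # worst-case (1 +/- eps) effect
        rows.append((p, bias, noise))
    return rows


ps = [1.5, 1.2, 1.1, 1.05, 1.02, 1.01]
rows = tradeoff_table([5, 3, 2, 1, 1], eps=1e-3, ps=ps)  # illustrative data
for p, bias, noise in rows:
    print(f"p = {p:<5} bias = {bias:.4f} noise = {noise:.4f} total = {bias + noise:.4f}")
best_p = min(rows, key=lambda row: row[1] + row[2])[0]
print("best p on this grid:", best_p)
```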
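11 Proof: A trick worth remembering
- Let $f: \mathbb{R} \to \mathbb{R}$ and $g: \mathbb{R} \to \mathbb{R}$ be such that $\lim_{x \to a} f(x) = \lim_{x \to a} g(x) = 0$; by l'Hôpital's rule, $\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}$
- It actually says more! It says $\frac{f(x)}{g(x)}$ converges to the limit at least as fast as $\frac{f'(x)}{g'(x)}$ does.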
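For the Rényi entropy of a distribution $q$, take $f(p) = \lg \sum_i q_i^p$ and $g(p) = 1 - p$, so $H_p(q) = f(p)/g(p)$ and both vanish at $p = 1$. By Cauchy's mean value theorem, $H_p(q) = f'(\xi)/g'(\xi)$ for some $\xi$ between 1 and $p$. The ratio $D(p) = f'(p)/g'(p) = -\sum_i q_i^p \lg q_i / \sum_i q_i^p$ is nonincreasing in $p$ and equals $H(q)$ at $p = 1$, so for $p > 1$ we get $0 \le H(q) - H_p(q) \le H(q) - D(p)$. The minimal Python sketch below checks this numerically on an illustrative distribution; it is a sanity check, not the paper's proof.

```python
import math
from typing import Sequence


def renyi(q: Sequence[float], p: float) -> float:
    """H_p(q) = f(p) / g(p) with f(p) = lg(sum_i q_i^p) and g(p) = 1 - p."""
    return math.log2(sum(qi ** p for qi in q)) / (1 - p)


def derivative_ratio(q: Sequence[float], p: float) -> float:
    """D(p) = f'(p) / g'(p) = -sum_i q_i^p lg(q_i) / sum_i q_i^p."""
    weights = [qi ** p for qi in q]
    return -sum(w * math.log2(qi) for w, qi in zip(weights, q)) / sum(weights)


q = [0.5, 0.25, 0.125, 0.125]  # illustrative distribution with H(q) = 1.75 bits
h = derivative_ratio(q, 1.0)  # D(1) is exactly the Shannon entropy H(q)
for p in (1.5, 1.1, 1.01):
    gap_ratio = h - renyi(q, p)  # distance of f/g from the limit
    gap_derivs = h - derivative_ratio(q, p)  # distance of f'/g' from the limit
    print(p, gap_ratio, gap_derivs)
```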
12 Improvements
- Status: additive $\varepsilon$ approx using $O(\varepsilon^{-3} \log^4 m)$ bits
- How to reduce space further?
  - Interpolate with multiple points $H_{p_1}(x), H_{p_2}(x), \ldots$
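13 Analyzing Interpolation
- Let $f(z)$ be a $C^{k+1}$ function
- Interpolate $f$ with a polynomial $q$ with $q(z_i) = f(z_i)$, $0 \le i \le k$
- Fact: $f(y) - q(y) = \frac{f^{(k+1)}(\xi)}{(k+1)!} \prod_{i=0}^{k} (y - z_i)$ for some $\xi \in [a, b]$, where $y, z_i \in [a, b]$
- Our case: set $f(z) = H_{1+z}(x)$
- Goal: analyze $f^{(k+1)}(z)$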
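A minimal Python sketch of the interpolation step, assuming exact (noise-free) values of $f(z) = H_{1+z}(x)$ for an illustrative distribution: evaluate $f$ at $k + 1 = 5$ equally spaced points $z_i < 0$ (an arbitrary choice, so $p = 1 + z_i < 1$), form the Lagrange interpolant $q$, and extrapolate to $q(0)$, which approximates $f(0) = H(x)$. By the standard interpolation error formula, the error at 0 is proportional to $\prod_i |z_i|$, which is why points near 0 help.

```python
import math
from typing import Sequence


def lagrange_eval(nodes: Sequence[float], values: Sequence[float], y: float) -> float:
    """Evaluate at y the unique polynomial of degree <= k through (nodes[i], values[i])."""
    total = 0.0
    for i, (zi, fi) in enumerate(zip(nodes, values)):
        basis = 1.0  # Lagrange basis polynomial l_i evaluated at y
        for j, zj in enumerate(nodes):
            if j != i:
                basis *= (y - zj) / (zi - zj)
        total += fi * basis
    return total


def renyi(q: Sequence[float], p: float) -> float:
    """H_p(q) = lg(sum_i q_i^p) / (1 - p) in bits, for p != 1."""
    return math.log2(sum(qi ** p for qi in q)) / (1 - p)


q = [0.5, 0.25, 0.125, 0.125]  # illustrative distribution, H(q) = 1.75 bits
nodes = [-0.2 + 0.045 * i for i in range(5)]  # -0.2, -0.155, ..., -0.02 (assumed choice)
values = [renyi(q, 1 + z) for z in nodes]  # f(z_i) = H_{1 + z_i}(q)
print(lagrange_eval(nodes, values, 0.0))  # close to 1.75
```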
14 Bounding Derivatives
- Rényi derivatives are messy to analyze
- Switch to Tsallis entropy $f(z) = S_{1+z}(x)$, where $S_p(x) = \frac{1 - \sum_i x_i^p}{p - 1}$
- Can prove Tsallis also converges to Shannon
- Fact: when $a = -O(1/(k \log m))$ and $b = 0$, we can set $k = \log(1/\varepsilon) + \log\log m$
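A minimal Python sketch of Tsallis entropy $S_p(q) = (1 - \sum_i q_i^p)/(p - 1)$ for a distribution $q$. Its $p \to 1$ limit is Shannon entropy measured with the natural logarithm (nats), so the sketch compares against that; divide by $\ln 2$ for bits. One reason the derivatives are easier to control: expanding $q_i^{1+z} = q_i e^{z \ln q_i}$ gives the explicit power series $S_{1+z}(q) = -\sum_{n \ge 1} \frac{z^{n-1}}{n!} \sum_i q_i (\ln q_i)^n$, which the hypothetical helper `tsallis_series` truncates as a cross-check.

```python
import math
from typing import Sequence


def tsallis_entropy(q: Sequence[float], p: float) -> float:
    """S_p(q) = (1 - sum_i q_i^p) / (p - 1) for a distribution q and p != 1."""
    return (1 - sum(qi ** p for qi in q if qi > 0)) / (p - 1)


def shannon_nats(q: Sequence[float]) -> float:
    """Shannon entropy with natural logarithms; divide by ln 2 to get bits."""
    return -sum(qi * math.log(qi) for qi in q if qi > 0)


def tsallis_series(q: Sequence[float], z: float, terms: int = 30) -> float:
    """Truncated series S_{1+z}(q) = -sum_{n>=1} z^(n-1) / n! * sum_i q_i (ln q_i)^n."""
    total = 0.0
    for n in range(1, terms + 1):
        moment = sum(qi * math.log(qi) ** n for qi in q if qi > 0)
        total -= z ** (n - 1) * moment / math.factorial(n)
    return total


q = [0.5, 0.25, 0.125, 0.125]  # illustrative distribution
print(tsallis_entropy(q, 1 + 1e-6), shannon_nats(q))  # nearly equal
print(tsallis_entropy(q, 1 - 0.05), tsallis_series(q, -0.05))  # nearly equal
```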
15 Key Ingredient: Noisy Interpolation
- We don't have $f(z_i)$; we have $f(z_i) \pm \varepsilon$
- How to interpolate in the presence of noise?
- Idea: we pick our $z_i$ very carefully
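Why the choice of points matters can be seen directly: if each value is off by at most $\varepsilon$, the interpolant at $y$ is off by at most $\varepsilon \sum_i |\ell_i(y)|$, where $\ell_i$ are the Lagrange basis polynomials, and noise whose signs match $\ell_i(y)$ attains that bound. This minimal Python sketch computes the worst-case amplification at $y = 0$ for an illustrative set of equally spaced points; `amplification` is a hypothetical helper name.

```python
from typing import Sequence


def lagrange_basis(nodes: Sequence[float], i: int, y: float) -> float:
    """l_i(y) = prod_{j != i} (y - z_j) / (z_i - z_j)."""
    value = 1.0
    for j, zj in enumerate(nodes):
        if j != i:
            value *= (y - zj) / (nodes[i] - zj)
    return value


def interpolate(nodes: Sequence[float], values: Sequence[float], y: float) -> float:
    """Value at y of the interpolating polynomial, in Lagrange form."""
    return sum(v * lagrange_basis(nodes, i, y) for i, v in enumerate(values))


def amplification(nodes: Sequence[float], y: float) -> float:
    """Worst-case factor by which +/- eps errors at the nodes are magnified at y."""
    return sum(abs(lagrange_basis(nodes, i, y)) for i in range(len(nodes)))


nodes = [-0.2 + 0.045 * i for i in range(5)]  # illustrative equally spaced points
eps = 1e-3
# Adversarial noise: +eps or -eps, matching the sign of each basis polynomial at 0.
noise = [eps if lagrange_basis(nodes, i, 0.0) > 0 else -eps for i in range(len(nodes))]
print(interpolate(nodes, noise, 0.0), eps * amplification(nodes, 0.0))  # equal
```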
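16 Chebyshev Polynomials
- Rogosinski's Theorem: if $q(x)$ has degree $k$ and $|q(\beta_j)| \le 1$ for $0 \le j \le k$, where $\beta_j = \cos(j\pi/k)$, then $|q(x)| \le |T_k(x)|$ for $|x| > 1$
- Map $[-1, 1]$ onto the interpolation interval $[z_0, z_k]$
- Choose $z_j$ to be the image of $\beta_j$, $j = 0, \ldots, k$
- Let $\tilde{q}(z)$ interpolate $f(z_j) \pm \varepsilon$ and $q(z)$ interpolate $f(z_j)$
- $r(z) = (\tilde{q}(z) - q(z))/\varepsilon$ satisfies Rogosinski's conditions!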
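A minimal Python sketch of the construction, assuming an illustrative interval $[z_0, z_k] = [-1, -0.1]$ and $k = 6$: place the nodes at the images of $\beta_j = \cos(j\pi/k)$ and compare the worst-case noise amplification at $z = 0$, namely $\sum_j |\ell_j(0)|$, with $|T_k|$ at the preimage of 0. For these nodes the two agree, which is Rogosinski's bound applied to $r(z)$ and attained by the worst sign pattern; equally spaced nodes on the same interval do no better. The helper names are hypothetical.

```python
import math
from typing import List, Sequence


def chebyshev_t(k: int, t: float) -> float:
    """T_k(t) via the recurrence T_0 = 1, T_1 = t, T_{n+1} = 2 t T_n - T_{n-1}."""
    prev, cur = 1.0, t
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, cur = cur, 2 * t * cur - prev
    return cur


def chebyshev_nodes(k: int, z0: float, zk: float) -> List[float]:
    """Images of beta_j = cos(j pi / k), j = 0..k, under the affine map [-1, 1] -> [z0, zk]."""
    return [z0 + (math.cos(j * math.pi / k) + 1) * (zk - z0) / 2 for j in range(k + 1)]


def preimage(z: float, z0: float, zk: float) -> float:
    """Inverse of the affine map [-1, 1] -> [z0, zk]."""
    return 2 * (z - z0) / (zk - z0) - 1


def amplification(nodes: Sequence[float], y: float) -> float:
    """sum_j |l_j(y)| for the Lagrange basis polynomials l_j on the given nodes."""
    total = 0.0
    for i, zi in enumerate(nodes):
        basis = 1.0
        for j, zj in enumerate(nodes):
            if j != i:
                basis *= (y - zj) / (zi - zj)
        total += abs(basis)
    return total


k, z0, zk = 6, -1.0, -0.1  # illustrative parameters (an assumption)
cheb = chebyshev_nodes(k, z0, zk)
equi = [z0 + j * (zk - z0) / k for j in range(k + 1)]
t0 = preimage(0.0, z0, zk)  # greater than 1, i.e. outside [-1, 1]
print(amplification(cheb, 0.0), abs(chebyshev_t(k, t0)), amplification(equi, 0.0))
```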
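17 Tradeoff in Choosing $z_k$
- $T_k$ grows quickly once leaving $[z_0, z_k]$
- $z_k$ close to 0 $\Rightarrow$ $T_k(\mathrm{preimage}(0))$ still small
- But $z_k$ close to 0 $\Rightarrow$ high space complexity
- Just how close do we need 0 and $z_k$ to be?
[Figure: number line showing $z_0$, $z_k$, and 0]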
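The first half of the tradeoff is easy to tabulate. This minimal Python sketch fixes an illustrative $z_0 = -0.1$ and $k = 8$ and computes the amplification $|T_k(\mathrm{preimage}(0))|$ as $z_k$ moves toward 0. It does not model the other half, the space cost of estimating moments at points close to 0.

```python
def chebyshev_t(k: int, t: float) -> float:
    """T_k(t) via T_0 = 1, T_1 = t, T_{n+1} = 2 t T_n - T_{n-1}."""
    prev, cur = 1.0, t
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, cur = cur, 2 * t * cur - prev
    return cur


def noise_amplification(k: int, z0: float, zk: float) -> float:
    """|T_k(t0)|, where t0 is the preimage of 0 under the affine map [-1, 1] -> [z0, zk]."""
    t0 = 2 * (0.0 - z0) / (zk - z0) - 1  # greater than 1 whenever z0 < zk < 0
    return abs(chebyshev_t(k, t0))


k, z0 = 8, -0.1  # illustrative parameters (an assumption)
gaps = [-0.05, -0.01, -0.001, -0.0001]
amps = [noise_amplification(k, z0, zk) for zk in gaps]
for zk, amp in zip(gaps, amps):
    print(f"z_k = {zk:>8}: amplification {amp:.3f}")
```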
18 The Magic of Chebyshev
- [Paturi 92]: $T_k(1 + 1/k^c) \le e^{4k^{1-(c/2)}}$. Set $c = 2$.
- Suffices to set $z_k = -O(1/(k^3 \log m))$
- Translates to $\tilde{O}(\varepsilon^{-2} \log^3 m)$ space
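The bound can be spot-checked numerically. This minimal Python sketch compares $T_k(1 + 1/k^c)$, computed by the three-term recurrence, against $e^{4k^{1-c/2}}$ for small $k$ and a few values of $c$. With $c = 2$ the right-hand side is the constant $e^4$, so the noise amplification stays bounded as long as 0 sits beyond $z_k$ by about a $1/k^2$ fraction of the interval's half-length; with a half-length of $O(1/(k \log m))$ this gives $z_k = -O(1/(k^3 \log m))$.

```python
import math


def chebyshev_t(k: int, t: float) -> float:
    """T_k(t) via T_0 = 1, T_1 = t, T_{n+1} = 2 t T_n - T_{n-1}."""
    prev, cur = 1.0, t
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, cur = cur, 2 * t * cur - prev
    return cur


def paturi_bound(k: int, c: float) -> float:
    """Right-hand side e^(4 k^(1 - c/2)) of the bound on T_k(1 + 1/k^c)."""
    return math.exp(4 * k ** (1 - c / 2))


for c in (1.0, 2.0, 3.0):
    worst = max(chebyshev_t(k, 1 + 1 / k ** c) / paturi_bound(k, c) for k in range(1, 61))
    print(f"c = {c}: max ratio T_k / bound over k = 1..60 is {worst:.4f}")
print(chebyshev_t(60, 1 + 1 / 60 ** 2), math.exp(4))  # c = 2: about 2.2 versus e^4
```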
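19 The Final Algorithm (additive approximation)
- Set $k = \lg(1/\varepsilon) + \lg\lg(m)$
- $z_j = \frac{k^2 \cos(j\pi/k) - (k^2 + 1)}{9k^3 \lg(m)}$ for $0 \le j \le k$
- Estimate $\tilde{S}_{1+z_j} = \left(1 - \tilde{F}_{1+z_j} / (\tilde{F}_1)^{1+z_j}\right) / z_j$ for $0 \le j \le k$
- Interpolate the degree-$k$ polynomial $\tilde{q}$ with $\tilde{q}(z_j) = \tilde{S}_{1+z_j}$
- Output $\tilde{q}(0)$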
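A minimal end-to-end Python sketch of these steps, under loudly stated assumptions: the exact moments $F_1$ and $F_{1+z_j}$ of an insertion-only count vector stand in for the small-space $(1 \pm \varepsilon)$ estimates the real algorithm uses, $m$ is taken to be the stream length $\sum_i x_i \ge 2$, $k$ is rounded up to an integer, and the function names are hypothetical. Since Tsallis entropy tends to Shannon entropy in nats, $\tilde{q}(0)$ is compared with the natural-log entropy.

```python
import math
from typing import List, Sequence


def interpolation_points(k: int, m: int) -> List[float]:
    """z_j = (k^2 cos(j pi / k) - (k^2 + 1)) / (9 k^3 lg m) for j = 0..k (all negative)."""
    denom = 9 * k ** 3 * math.log2(m)
    return [(k * k * math.cos(j * math.pi / k) - (k * k + 1)) / denom for j in range(k + 1)]


def tsallis_from_moments(f1: float, f_1pz: float, z: float) -> float:
    """S_{1+z} = (1 - F_{1+z} / F_1^{1+z}) / z."""
    return (1 - f_1pz / f1 ** (1 + z)) / z


def lagrange_eval(nodes: Sequence[float], values: Sequence[float], y: float) -> float:
    """Value at y of the degree-k polynomial through the k + 1 points (nodes[i], values[i])."""
    total = 0.0
    for i, (zi, fi) in enumerate(zip(nodes, values)):
        basis = 1.0
        for j, zj in enumerate(nodes):
            if j != i:
                basis *= (y - zj) / (zi - zj)
        total += fi * basis
    return total


def estimate_entropy_nats(counts: Sequence[int], eps: float) -> float:
    m = sum(counts)  # assumption: m is the length of an insertion-only stream
    if m < 2:
        raise ValueError("need a stream of length m >= 2")
    k = max(1, math.ceil(math.log2(1 / eps) + math.log2(math.log2(m))))
    zs = interpolation_points(k, m)
    f1 = float(m)  # F_1 = sum_i x_i for nonnegative counts
    # Stand-in for the sketches: exact moments F_{1+z_j} = sum_i x_i^(1+z_j).
    values = [tsallis_from_moments(f1, sum(c ** (1 + z) for c in counts if c > 0), z) for z in zs]
    return lagrange_eval(zs, values, 0.0)  # output q(0)


counts = [5, 3, 2, 1, 1]  # illustrative data, m = 12
exact_nats = -sum((c / 12) * math.log(c / 12) for c in counts)
print(estimate_entropy_nats(counts, eps=0.01), exact_nats)
```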
20 Multiplicative Approximation
- How to get a multiplicative approximation?
  - An additive approximation is multiplicative, unless $H(x)$ is small
  - $H(x)$ small $\Rightarrow$ some coordinate of $x$ is large [CCM 07]
- Suppose and define
- We combine $(1 \pm \varepsilon)$-approximations of $RF_1$ and $RF_{1+z_j}$ to get a $(1 \pm \varepsilon)$-approximation of $f(z_j)$
- Question: how do we get a $(1 \pm \varepsilon)$-approximation of $RF_p$?
- Two different approaches:
  - A general approach (for any $p$, and negative frequencies)
  - An approach exploiting $p \approx 1$, only for nonnegative frequencies (better by a $\log(m)$ factor)
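Two ingredients of this slide can be sketched in Python. First, why an additive approximation is already multiplicative when no coordinate dominates: Shannon entropy is at least the min-entropy $\lg(1/\max_i q_i)$, so $\max_i q_i \le 1/2$ forces $H(q) \ge 1$, and then $|\tilde{H} - H| \le \varepsilon$ implies $|\tilde{H} - H| \le \varepsilon H$. Second, a helper for residual moments, assuming (our reading, not stated on the slide) that $RF_p$ denotes $F_p$ with the single largest $|x_i|$ removed. Both helper names are hypothetical.

```python
import math
from typing import Sequence


def shannon_bits(q: Sequence[float]) -> float:
    return -sum(qi * math.log2(qi) for qi in q if qi > 0)


def min_entropy_bits(q: Sequence[float]) -> float:
    """lg(1 / max_i q_i), a lower bound on Shannon entropy."""
    return -math.log2(max(q))


def residual_moment(x: Sequence[float], p: float) -> float:
    """Assumed definition of RF_p: sum_i |x_i|^p with the largest |x_i| left out."""
    magnitudes = sorted((abs(v) for v in x), reverse=True)
    return sum(v ** p for v in magnitudes[1:])


for q in ([0.5, 0.25, 0.25], [0.9, 0.05, 0.05], [0.2] * 5):
    # Whenever max(q) <= 1/2, min_entropy_bits(q) >= 1, so an additive eps-error
    # is also a multiplicative (1 +/- eps)-error.
    print(q, shannon_bits(q), min_entropy_bits(q))
print(residual_moment([5, -3, 2], 2))  # 3^2 + 2^2 = 13
```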
21 Questions / Thoughts
- For what other problems can we use this "generalize-then-interpolate" strategy?
  - Some non-streaming problems too?
- The power of moments?
- The power of residual moments? CountMin [CM 05] + CountSketch [CCF 02] $\Rightarrow$ HSS [Ganguly et al.]
- WANTED: faster moment estimation (some progress in [Cormode-Ganguly 07])