Optimal Space Lower Bounds for All Frequency Moments

1
Optimal Space Lower Bounds for All Frequency Moments
  • David Woodruff
  • MIT
  • dpwood_at_mit.edu

2
The Streaming Model
  • Stream of elements $a_1, \ldots, a_q$, each in $\{1, \ldots, m\}$
  • Want to compute statistics on the stream
  • Elements arranged in adversarial order
  • Algorithms given one pass over the stream
  • Goal: minimum-space algorithm

3
Frequency Moments
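  • $q$ = stream size, $m$ = universe size
  • $f_i$ = # occurrences of item $i$
  • Define the k-th frequency moment $F_k = \sum_{i=1}^{m} f_i^k$ (with $0^0 = 0$, so $F_0$ counts distinct items)
  • Applications
  • $F_0$ = # distinct elements in stream, $F_1 = q$
  • $F_2$ = repeat rate
  • Compute self-joins in databases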

4
The Best Deterministic Algorithm
  • Trivial algorithm for $F_k$
  • Store/update the frequency $f_i$ of each item $i$
  • Space: $m$ items $i$, $\log q$ bits for each $f_i$; total space $O(m \log q)$
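A minimal sketch of the trivial algorithm, assuming items are integers in $1..m$; the function name `trivial_frequency_moment` is hypothetical. It keeps one counter per universe element (each needs about $\log q$ bits), which is where the $O(m \log q)$ space comes from.

```python
from typing import Iterable, List


def trivial_frequency_moment(stream: Iterable[int], m: int, k: int) -> int:
    """Exact F_k using one counter per universe element (hypothetical helper)."""
    freq: List[int] = [0] * (m + 1)  # freq[i] = f_i; index 0 is unused
    for a in stream:
        freq[a] += 1  # update f_a on each arrival
    if k == 0:
        # F_0 uses the convention 0^0 = 0, i.e. it counts distinct items
        return sum(1 for f in freq[1:] if f > 0)
    return sum(f ** k for f in freq[1:])


stream = [1, 2, 2, 3, 3, 3]
print(trivial_frequency_moment(stream, m=3, k=0))  # 3 distinct items
print(trivial_frequency_moment(stream, m=3, k=1))  # 6 = q
print(trivial_frequency_moment(stream, m=3, k=2))  # 14 = 1 + 4 + 9
```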

Can we do better?
  • Negative result [AMS96]: any algorithm computing $F_k$ exactly must use $\Omega(m)$ space.

5
Approximating Fk
  • Negative result [AMS96]: any deterministic algorithm that outputs $x$ with $|F_k - x| < \epsilon F_k$ must use $\Omega(m)$ space.

What about randomized approximation algorithms?
  • Randomized algorithm A $\epsilon$-approximates $F_k$ if A outputs $x$ with $\Pr[|F_k - x| < \epsilon F_k] > 2/3$

6
Previous Work
  • Upper bounds: can $\epsilon$-approximate $F_0$ [BJKST02], $F_2$ [AMS96], and $F_k$ for $k > 2$ [CK04] with space …, respectively
  • Lower bounds
  • [AMS96] $\forall k$, $\epsilon$-approximating $F_k$ needs $\Omega(\log m)$ space
  • [IW03] $\epsilon$-approximating $F_0$ requires … space if …
  • Questions: does the bound hold for $k \neq 0$?
  • Does it hold for $F_0$ for smaller $\epsilon$?

7
First Result
  • Optimal lower bound: $\forall k \neq 1$ and any $\epsilon = \Omega(m^{-1/2})$, any $\epsilon$-approximator for $F_k$ must use $\Omega(\epsilon^{-2})$ bits of space.
  • $F_1 = q$ computed trivially in $\log q$ space
  • $F_k$ computed in $O(m \log q)$ space, so need $\epsilon = \Omega(m^{-1/2})$
  • Technique: reduction from a 2-party protocol for computing the Hamming distance $\Delta(x,y)$

8
Idea Behind Lower Bounds
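[Figure: Alice holds $x \in \{0,1\}^m$ and runs a $(1 \pm \epsilon)$ $F_k$ algorithm A on stream $s(x)$, then sends the internal state S of A to Bob, who holds $y \in \{0,1\}^m$ and continues running A on stream $s(y)$.]
  • Compute $(1 \pm \epsilon) F_k(s(x) \circ s(y))$ w.p. $> 2/3$
  • Idea: if this decides $f(x,y)$ w.p. $> 2/3$, then the space used by A (the size of the message S) is at least the randomized 1-way communication complexity of $f(\cdot,\cdot)$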
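A minimal sketch of the simulation in the figure, assuming a toy streaming algorithm whose internal state is an exact frequency table; `ExactFkAlgorithm` and `one_way_protocol` are hypothetical names. The exact table is not a space-efficient $(1 \pm \epsilon)$ approximator; it only shows that Alice's single message to Bob is A's state, so the message is no longer than A's space.

```python
import copy
from collections import Counter
from typing import Iterable, List, Optional


class ExactFkAlgorithm:
    """Hypothetical stand-in for streaming algorithm A (state = exact frequencies)."""

    def __init__(self, k: int, state: Optional[Counter] = None) -> None:
        self.k = k
        self.state: Counter = state if state is not None else Counter()

    def process(self, stream: Iterable[int]) -> None:
        for a in stream:
            self.state[a] += 1  # update f_a

    def estimate(self) -> int:
        if self.k == 0:
            return len(self.state)  # every stored item has f_i >= 1
        return sum(f ** self.k for f in self.state.values())


def one_way_protocol(s_x: List[int], s_y: List[int], k: int) -> int:
    # Alice runs A on her stream s(x) ...
    alice = ExactFkAlgorithm(k)
    alice.process(s_x)
    # ... and her only message to Bob is A's internal state S.
    message = copy.deepcopy(alice.state)
    # Bob resumes A from S on his stream s(y); nothing is sent back to Alice.
    bob = ExactFkAlgorithm(k, message)
    bob.process(s_y)
    return bob.estimate()  # F_k of s(x) followed by s(y)


print(one_way_protocol([1, 2], [2, 3], k=2))  # 6 = 1 + 4 + 1
```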

9
Randomized 1-way comm. complexity
  • Boolean function $f: X \times Y \to \{0,1\}$
  • Alice has $x \in X$, Bob $y \in Y$. Bob wants $f(x,y)$
  • Only 1 message sent: it must be from Alice to Bob
  • Comm. cost of a protocol = expected length of the longest message sent over all inputs.
  • $\delta$-error randomized 1-way comm. complexity of $f$, $R_\delta(f)$, is the comm. cost of the optimal protocol computing $f$ w.p. $\geq 1 - \delta$

How do we lower bound $R_\delta(f)$?
10
The VC Dimension [KNR]
  • $F = \{f : X \to \{0,1\}\}$, a family of Boolean functions
  • Each $f \in F$ is a length-$|X|$ bitstring
  • For $S \subseteq X$, the shatter coefficient $SC(f_S)$ of $S$ is $|\{f|_S : f \in F\}|$ = # distinct bitstrings when $F$ is restricted to $S$
  • $SC(F, p) = \max_{S \subseteq X,\, |S| = p} SC(f_S)$
  • If $SC(f_S) = 2^{|S|}$, $S$ is shattered by $F$
  • VC dimension of $F$, $VCD(F)$ = size of the largest $S$ shattered by $F$
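A brute-force sketch of these definitions for a small finite family, assuming $X = \{0, \ldots, |X|-1\}$ and each $f$ given as a length-$|X|$ bitstring; the helper names are hypothetical. It is exponential in $|X|$ and only meant to make $SC(f_S)$, $SC(F, p)$ and $VCD(F)$ concrete.

```python
from itertools import combinations
from typing import Sequence, Tuple

Bitstring = Sequence[int]  # f in F as a length-|X| bitstring


def shatter_coefficient(family: Sequence[Bitstring], S: Tuple[int, ...]) -> int:
    """SC(f_S): number of distinct bitstrings when the family is restricted to S."""
    return len({tuple(f[x] for x in S) for f in family})


def sc(family: Sequence[Bitstring], p: int) -> int:
    """SC(F, p): maximum of SC(f_S) over all S subset of X with |S| = p."""
    domain_size = len(family[0])
    return max(shatter_coefficient(family, S) for S in combinations(range(domain_size), p))


def vc_dimension(family: Sequence[Bitstring]) -> int:
    """Size of the largest S with SC(f_S) = 2^|S|."""
    domain_size = len(family[0])
    d = 0
    for p in range(1, domain_size + 1):
        if sc(family, p) < 2 ** p:
            break  # subsets of shattered sets are shattered, so no larger S works
        d = p
    return d


# Threshold functions on X = {0, 1, 2}: f_j(x) = 1 iff x >= j
thresholds = [[1, 1, 1], [0, 1, 1], [0, 0, 1], [0, 0, 0]]
print(vc_dimension(thresholds))  # 1: no pair {x < x'} gets the pattern (1, 0)
print(sc(thresholds, 2))  # 3
```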

11
Shatter Coefficient Theorem
  • Notation: for $f: X \times Y \to \{0,1\}$, define
  • $f_X = \{f_x(y) : Y \to \{0,1\} \mid x \in X\}$,
  • where $f_x(y) = f(x,y)$
  • Theorem [BJKS]: for every $f: X \times Y \to \{0,1\}$ and every $p \geq VCD(f_X)$,
  • $R_{1/3}(f) = \Omega(\log(SC(f_X, p)))$

12
Hamming Distance Decision Problem (HDDP)
Set $t = \Theta(1/\epsilon^2)$
Alice: $x \in \{0,1\}^t$    Bob: $y \in \{0,1\}^t$
Promise problem:
  $\Delta(x,y) \leq t/2 - \sqrt{t} \Rightarrow f(x,y) = 0$, OR
  $\Delta(x,y) > t/2 \Rightarrow f(x,y) = 1$
  • We will lower bound $R_{1/3}(f)$ via $SC(f_X, t)$, but first, a critical lemma
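A small sketch of the promise problem, assuming the two cases map to outputs 0 (close) and 1 (far) and returning `None` for inputs outside the promise; the function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def hamming(x: Sequence[int], y: Sequence[int]) -> int:
    return sum(a != b for a, b in zip(x, y))


def hddp(x: Sequence[int], y: Sequence[int]) -> Optional[int]:
    """HDDP on t-bit strings: 0 if Delta <= t/2 - sqrt(t), 1 if Delta > t/2."""
    t = len(x)
    d = hamming(x, y)
    if d <= t / 2 - math.sqrt(t):
        return 0
    if d > t / 2:
        return 1
    return None  # the promise excludes t/2 - sqrt(t) < Delta <= t/2


zero = [0] * 16
print(hddp(zero, [1] * 4 + [0] * 12))  # 0: Delta = 4 <= 8 - 4
print(hddp(zero, [1] * 9 + [0] * 7))  # 1: Delta = 9 > 8
```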

13
Main Lemma
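[Figure: a set $S \subseteq \{0,1\}^n$ split into $T$ and $S - T$, with a separator $y$.]
  • Show $\exists S \subseteq \{0,1\}^n$ with $|S| = n$ s.t.
  • there exist $2^{\Omega(n)}$ good sets $T \subseteq S$ so that
  • $\exists$ a separator $y \in \{0,1\}^n$ s.t.
  • $\forall t \in T$, $\Delta(y,t) \leq n/2 - c\sqrt{n}$ for some $c > 0$
  • $\forall t \in S - T$, $\Delta(y,t) > n/2$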
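A sketch of what it means for $y$ to separate a set $T$ inside $S$, checking both conditions by brute force; `is_separator` is a hypothetical name and the choices $n = 4$, $c = 0.5$ are only illustrative (the lemma itself is asymptotic in $n$).

```python
import math
from typing import Iterable, Sequence, Tuple

Vec = Tuple[int, ...]


def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    return sum(a != b for a, b in zip(u, v))


def is_separator(y: Vec, T: Iterable[Vec], S: Iterable[Vec], c: float) -> bool:
    """True iff every t in T is close to y and every t in S - T is far from y."""
    n = len(y)
    T_set = set(T)
    close = all(hamming(y, t) <= n / 2 - c * math.sqrt(n) for t in T_set)
    far = all(hamming(y, t) > n / 2 for t in S if t not in T_set)
    return close and far


S = [(1, 1, 1, 1), (1, 1, 1, 0), (0, 0, 0, 0), (0, 0, 0, 1)]  # |S| = n = 4
T = [(1, 1, 1, 1), (1, 1, 1, 0)]
print(is_separator((1, 1, 1, 1), T, S, c=0.5))  # True: distances 0, 1 vs 4, 3
print(is_separator((1, 1, 0, 0), T, S, c=0.5))  # False: (1,1,1,1) is at distance 2
```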

14
Lemma Solves HDDP Complexity
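  • Theorem: $R_{1/3}(f) = \Omega(t) = \Omega(\epsilon^{-2})$.
  • Proof:
  • Alice gets $y_T$ for a random good set $T$, applying the main lemma with $n = t$.
  • Bob gets a random $s \in S$
  • Let $f: \{y_T\}_T \times S \to \{0,1\}$.
  • Main lemma $\Rightarrow SC(f_X, t) = 2^{\Omega(t)}$ (each good $T$ gives a distinct bitstring on $S$)
  • BJKS $\Rightarrow R_{1/3}(f) = \Omega(t) = \Omega(\epsilon^{-2})$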

15
Back to Frequency Moments
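Idea: use an $\epsilon$-approximator for $F_k$ in a protocol to solve HDDP
Alice: $y \in \{0,1\}^t$    Bob: $s \in S \subseteq \{0,1\}^t$
The $i$-th universe element is included exactly once in auxiliary stream $a_y$ (resp. $a_s$) if and only if $y_i = 1$ (resp. $s_i = 1$).
[Figure: Alice runs the $F_k$ algorithm on $a_y$ and sends its state to Bob, who continues it on $a_s$.]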
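A minimal sketch of the auxiliary-stream construction, assuming universe elements are numbered from 1; `auxiliary_stream` is a hypothetical name. In the concatenated stream, element $i$ occurs twice when $y_i = s_i = 1$, once when the bits differ, and not at all otherwise.

```python
from collections import Counter
from typing import List, Sequence


def auxiliary_stream(bits: Sequence[int]) -> List[int]:
    """Universe element i (1-indexed) appears exactly once iff bits[i-1] == 1."""
    return [i for i, b in enumerate(bits, start=1) if b == 1]


y = [1, 1, 0, 1, 0]
s = [1, 0, 0, 1, 1]
a_y = auxiliary_stream(y)  # [1, 2, 4]
a_s = auxiliary_stream(s)  # [1, 4, 5]
freq = Counter(a_y + a_s)  # elements 1 and 4 occur twice; 2 and 5 once; 3 never
```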
16
Solving HDDP with Fk
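  • Alice/Bob compute an $\epsilon$-approximation to $F_k(a_y \circ a_s)$
  • $F_k(a_y \circ a_s) = 2^k \cdot wt(y \wedge s) + 1^k \cdot \Delta(y,s)$
  • For $k \neq 1$: since $wt(y) + wt(s) = 2\,wt(y \wedge s) + \Delta(y,s)$, this equals $2^{k-1}(wt(y) + wt(s)) + (1 - 2^{k-1})\,\Delta(y,s)$ with $1 - 2^{k-1} \neq 0$, so Bob can solve for $\Delta(y,s)$ once he knows $wt(y)$
  • Alice also transmits $wt(y)$ in $\log m$ space.

Conclusion: $\epsilon$-approximating $F_k(a_y \circ a_s)$ decides HDDP, so the space for $F_k$ is $\Omega(t) = \Omega(\epsilon^{-2})$.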
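Solving $F_k(a_y \circ a_s) = 2^k\,wt(y \wedge s) + \Delta(y,s)$ for $\Delta$, using $wt(y) + wt(s) = 2\,wt(y \wedge s) + \Delta(y,s)$, gives $\Delta(y,s) = \big(2^{k-1}(wt(y)+wt(s)) - F_k\big)/(2^{k-1} - 1)$ for $k \neq 1$. The sketch below checks this with exact $F_k$ values; the helper names are hypothetical, and with an $\epsilon$-approximate $F_k$ the recovered $\Delta$ would only be approximate.

```python
from collections import Counter
from typing import List, Sequence


def auxiliary_stream(bits: Sequence[int]) -> List[int]:
    return [i for i, b in enumerate(bits, start=1) if b == 1]


def frequency_moment(stream: Sequence[int], k: int) -> int:
    counts = Counter(stream)
    if k == 0:
        return len(counts)  # number of distinct elements
    return sum(f ** k for f in counts.values())


def recover_hamming(fk_value: float, wt_y: int, wt_s: int, k: int) -> float:
    """Delta(y, s) = (2^(k-1) (wt(y) + wt(s)) - F_k) / (2^(k-1) - 1), for k != 1."""
    if k == 1:
        raise ValueError("for k = 1 the coefficient 1 - 2^(k-1) vanishes")
    c = 2.0 ** (k - 1)
    return (c * (wt_y + wt_s) - fk_value) / (c - 1)


y = [1, 1, 0, 1, 0]
s = [1, 0, 0, 1, 1]  # Delta(y, s) = 2 (positions 2 and 5)
stream = auxiliary_stream(y) + auxiliary_stream(s)
for k in (0, 2, 3):
    print(k, recover_hamming(frequency_moment(stream, k), sum(y), sum(s), k))  # 2.0
```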
17
But How to Prove Main Lemma?
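  • Recall: show $\exists S \subseteq \{0,1\}^n$ with $|S| = n$ s.t.
  • there exist $2^{\Omega(n)}$ sets $T \subseteq S$ so that
  • $\exists$ a separator $y \in \{0,1\}^n$ s.t.
  • (1) $\forall t \in T$, $\Delta(y,t) \leq n/2 - c\sqrt{n}$ for some $c > 0$
  • (2) $\forall t \in S - T$, $\Delta(y,t) > n/2$
  • Use the probabilistic method
  • For $S$, choose $n$ random elements in $\{0,1\}^n$
  • Key step: show the probability that an arbitrary $T \subseteq S$ satisfies (1),(2) is $> 2^{-zn}$ for a constant $z < 1$.
  • Hence the expected # of such $T$ is at least $\binom{n}{n/2} 2^{-zn} = 2^{(1-z)n - O(\log n)} = 2^{\Omega(n)}$ (counting only $T$ with $|T| = n/2$)
  • So there exists $S$ with $2^{\Omega(n)}$ such $T$
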
18
Proving the Main Lemma
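  • Let $T = \{t_1, \ldots, t_{n/2}\} \subseteq S$ be arbitrary
  • Let $y_i = \mathrm{majority}(t_{1,i}, \ldots, t_{n/2,i})$ for all $i \in [n]$
  • What is the probability $p$ that both
  • (1) $\forall t \in T$, $\Delta(y,t) \leq n/2 - c\sqrt{n}$ for some $c > 0$
  • (2) $\forall t \in S - T$, $\Delta(y,t) > n/2$
  • For (1), let $p_1 = \Pr[\forall t \in T,\ \Delta(y,t) \leq n/2 - c\sqrt{n}]$
  • For (2), let $p_2 = \Pr[\forall t \in S - T,\ \Delta(y,t) > n/2] \approx 2^{-n/2}$
  • By independence ($y$ depends only on $T$, and the elements of $S - T$ are independent of $T$), $p = p_1 \cdot p_2$.
  • It remains to lower bound $p_1$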
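An exact computation of the probability for condition (2), assuming $n$ is even, ties $\Delta = n/2$ count as "not far", and the $n/2$ elements of $S - T$ are independent and uniform; the helper names are hypothetical. It suggests the factor is $2^{-n/2}$ up to a $2^{-O(\sqrt{n})}$ correction.

```python
from math import comb, log2, sqrt


def prob_far(n: int) -> float:
    """Pr[Delta(y, t) > n/2] for t uniform in {0,1}^n and any fixed y."""
    return sum(comb(n, d) for d in range(n // 2 + 1, n + 1)) / 2 ** n


def prob_condition_2(n: int) -> float:
    """Pr[all n/2 independent uniform t satisfy Delta(y, t) > n/2]."""
    return prob_far(n) ** (n // 2)


for n in (10, 100, 1000):
    # exponent gap relative to -n/2: negative, of order -sqrt(n)
    print(n, log2(prob_condition_2(n)) + n / 2)
```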

19
The Matrix Problem
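  • WLOG, assume $y = 1^n$ (recall $y$ is the majority word; flipping coordinate $i$ in $y$ and in every $t$ preserves all distances)
  • Want a lower bound on $\Pr[\forall t \in T,\ \Delta(y,t) \leq n/2 - c\sqrt{n}]$
  • Equivalent to a matrix problem, since $\Delta(1^n, t) = n - wt(t)$:

[Figure: an $n/2 \times n$ binary matrix whose rows are $t_1, t_2, \ldots, t_{n/2}$.]
Given a random $n/2 \times n$ binary matrix with each column majority 1, what is the probability that each row has at least $n/2 + c\sqrt{n}$ 1s?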
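A Monte Carlo sketch of the matrix problem for tiny $n$, assuming $n/2$ is odd so column majorities have no ties; the constants $n = 10$, $c = 0.3$ and the helper names are illustrative only. Columns are independent given the per-column conditioning, so each is drawn by rejection sampling. This only estimates the quantity; it is not the talk's analysis.

```python
import random
from math import sqrt
from typing import List


def majority_one_column(rows: int, rng: random.Random) -> List[int]:
    """Uniform 0/1 column of length `rows`, conditioned on a strict majority of 1s."""
    while True:  # rejection sampling
        col = [rng.randint(0, 1) for _ in range(rows)]
        if 2 * sum(col) > rows:
            return col


def estimate_matrix_probability(n: int, c: float, trials: int, seed: int = 0) -> float:
    rows = n // 2
    if n % 2 or rows % 2 == 0:
        raise ValueError("assume n is even and n/2 is odd (no majority ties)")
    rng = random.Random(seed)
    threshold = n / 2 + c * sqrt(n)
    hits = 0
    for _ in range(trials):
        cols = [majority_one_column(rows, rng) for _ in range(n)]
        row_weights = [sum(col[r] for col in cols) for r in range(rows)]
        if all(w >= threshold for w in row_weights):
            hits += 1
    return hits / trials


print(estimate_matrix_probability(n=10, c=0.3, trials=2000))
```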
20
Bipartite Graphs
  • Matrix problem $\Leftrightarrow$ bipartite graph counting problem
  • How many bipartite graphs exist on $n/2$ by $n$ vertices s.t. each left vertex has degree $> n/2 + c\sqrt{n}$ and each right vertex has degree $> n/4$?

21
Second Result
  • Bipartite graph count: a probabilistic argument shows at least $2^{n^2/2 - zn/2 - n}$ such bipartite graphs for a constant $z < 1$.
  • The analysis generalizes to show that the # of bipartite graphs on $m \times n$ vertices with each left vertex having degree $> n/2$ and each right vertex degree $> m/2$ is $> 2^{mn - zm - n}$.
  • Previously known count: $2^{mn - m - n}$ [MW, personal comm.]
  • Follows easily from a correlation inequality of Kleitman.
  • Our proof uses correlation inequalities, but a more involved analysis.
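A brute-force sketch for tiny odd $m, n$, counting $m \times n$ 0/1 matrices (bipartite graphs) with every row sum $> n/2$ and every column sum $> m/2$, and comparing against $2^{mn - m - n}$. Odd sizes are assumed because then a uniform row (or column) has a strict majority of 1s with probability exactly 1/2, which is where the $2^{-m}$ and $2^{-n}$ factors of the Kleitman-based bound come from. `count_bipartite` is a hypothetical name.

```python
from itertools import product


def count_bipartite(m: int, n: int) -> int:
    """# of m x n 0/1 matrices with all row sums > n/2 and all column sums > m/2."""
    # Enumerate only rows that already meet the left-degree condition.
    good_rows = [r for r in product((0, 1), repeat=n) if 2 * sum(r) > n]
    count = 0
    for rows in product(good_rows, repeat=m):
        if all(2 * sum(row[j] for row in rows) > m for j in range(n)):
            count += 1
    return count


for m, n in ((3, 3), (3, 5), (5, 3)):
    print(m, n, count_bipartite(m, n), 2 ** (m * n - m - n))
```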

22
Summary
  • Results
  • Optimal lower bound: $\forall k \neq 1$ and any $\epsilon = \Omega(m^{-1/2})$, any $\epsilon$-approximator for $F_k$ must use $\Omega(\epsilon^{-2})$ bits of space.
  • Bipartite graph count: the # of bipartite graphs on $m \times n$ vertices with each left vertex having degree $> n/2$ and each right vertex degree $> m/2$ is at least $2^{mn - zm - n}$ for a constant $z < 1$.