Title: Extractors: Optimal Up to Constant Factors
1. Extractors Optimal Up to Constant Factors
- Avi Wigderson
- IAS, Princeton
- Hebrew U., Jerusalem
- Joint work with Chi-Jen Lu, Omer Reingold, and Salil Vadhan.
- To appear in STOC '03.
2. Original Motivation [B84, SV84, V85, VV85, CG85, V87, CW89, Z90-91]
- Randomization is pervasive in CS: algorithm design, cryptography, distributed computing, ...
- Typically we assume a perfect random source: unbiased, independent random bits.
- Can we use a weak random source instead?
- (Randomness) Extractors convert weak random sources into almost perfect randomness.
3. Extractors [Nisan-Zuckerman '93]
[Diagram: a k-source of length n enters EXT, which outputs m almost-uniform bits.]
- X has min-entropy k (i.e., X is a k-source) if for all x, Pr[X = x] ≤ 2^-k (i.e., no heavy elements).
4. Extractors [Nisan-Zuckerman '93]
[Diagram: a k-source of length n enters EXT, which outputs m bits ε-close to uniform.]
- X has min-entropy k (i.e., X is a k-source) if for all x, Pr[X = x] ≤ 2^-k (i.e., no heavy elements).
- Measure of closeness: statistical difference (a.k.a. variation distance, a.k.a. half the L1-norm). (A small numeric sketch follows.)
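As a concrete illustration of these two definitions (not part of the talk): a minimal Python sketch for finite distributions given as {outcome: probability} dictionaries.

import math

def min_entropy(dist):
    # H_min(X) = -log2(max_x Pr[X = x]); X is a k-source iff H_min(X) >= k.
    return -math.log2(max(dist.values()))

def statistical_difference(p, q):
    # Half the L1 distance between p and q (a.k.a. variation distance).
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

biased = {'00': 0.5, '01': 0.25, '10': 0.125, '11': 0.125}   # a 1-source on 2 bits
uniform = {x: 0.25 for x in ('00', '01', '10', '11')}
print(min_entropy(biased))                      # 1.0
print(statistical_difference(biased, uniform))  # 0.25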
5. Applications of Extractors
- Derandomization of error reduction in BPP [Sip88, GZ97, MV99, STV99]
- Derandomization of space-bounded algorithms [NZ93, INW94, RR99, GW02]
- Distributed and network algorithms [WZ95, Zuc97, RZ98, Ind02]
- Hardness of approximation [Zuc93, Uma99, MU01]
- Cryptography [CDHKS00, MW00, Lu02, Vad03]
- Data structures [Ta02]
6. Unifying Role of Extractors
- Extractors are intimately related to:
- Hash functions [ILL89, SZ94, GW94]
- Expander graphs [NZ93, WZ93, GW94, RVW00, TUZ01, CRVW02]
- Samplers [G97, Z97]
- Pseudorandom generators [Trevisan '99, ...]
- Error-correcting codes [T99, TZ01, TZS01, SU01, U02]
- ⇒ Extractors unify the theory of pseudorandomness.
7. Extractors as Graphs
[Diagram: a (k,ε)-extractor Ext : {0,1}^n × {0,1}^d → {0,1}^m drawn as a bipartite graph; labels: Sampling, Hashing, Amplification, Coding, Expanders.]
- Discrepancy: for every B ⊆ {0,1}^m, for all but 2^k of the x ∈ {0,1}^n, | |Γ(x) ∩ B|/2^d − |B|/2^m | < ε.
8. Extractors - Parameters
[Diagram: a k-source of length n and a (short) seed of d random bits enter EXT, which outputs m bits ε-close to uniform.]
- Goals: minimize d, maximize m.
- Non-constructive optimal bounds [Sip88, NZ93, RT97]:
- Seed length d = log(n−k) + 2 log(1/ε) + O(1).
- Output length m = k + d − 2 log(1/ε) − O(1).
9. Extractors - Parameters
[Diagram: a k-source of length n and a (short) seed of d random bits enter EXT, which outputs m bits ε-close to uniform.]
- Goals: minimize d, maximize m.
- Non-constructive optimal bounds [Sip88, NZ93, RT97], simplified for constant ε:
- Seed length d = log n + O(1).
- Output length m = k + d − O(1). (A quick check follows.)
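A quick check (a reading aid, not a new claim) that for constant ε the exact bounds of the previous slide give the simplified form above:

  d = \log(n-k) + 2\log(1/\varepsilon) + O(1) \;\le\; \log n + O(1), \qquad
  m = k + d - 2\log(1/\varepsilon) - O(1) \;=\; k + d - O(1).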
10. Explicit Constructions
- A large body of work: ..., NZ93, WZ93, GW94, SZ94, SSZ95, Zuc96, Ta96, Ta98, Tre99, RRV99a, RRV99b, ISW00, RSW00, RVW00, TUZ01, TZS01, SU01 (+ those I forgot).
- Some results hold only for particular values of k (small k and large k are easier). A very useful example [Zuc96]: k = Ω(n), d = O(log n), m = .99k.
- For general k, previous constructions optimize either the seed length or the output length. Previous records [RSW00]:
- d = O(log n · poly(loglog n)), m = .99k
- d = O(log n), m = k/log k
11. This Work
- Main result: for any k, d = O(log n), m = .99k.
- Other results (mainly for general ε).
- Technical contributions:
- New condensers with constant seed length.
- Augmenting the win-win repeated-condensing paradigm of [RSW00] with error reduction à la [RRV99].
- A general construction of mergers [Ta-Shma '96] from locally decodable error-correcting codes.
12. Condensers [RR99, RSW00, TUZ01]
A (k, k', ε)-condenser:
[Diagram: a k-source of length n enters Con, which outputs a (k', ε)-source of length n' < n.]
- Lossless condenser if k' = k (in this case, denote it a (k, ε)-condenser).
13. Repeated Condensing [RSW00]
[Diagram: a k-source of length n is condensed to a source of length n/2 that is ε0-close to a k-source; after t halving steps we get a source of length O(k) that is t·ε0-close to a k-source.]
(A schematic sketch of the loop follows.)
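A schematic Python rendering of the loop (illustration only; condense_half is a hypothetical placeholder for a lossless condenser that halves the length while adding seed error eps0, and seeds is the list of fresh seeds):

import math

def repeated_condensing(source, n, k, eps0, condense_half, seeds):
    # condense_half(source, seed): placeholder for one condensing step.
    length, error = n, 0.0
    t = math.ceil(math.log2(n / k))             # t = log(n/k) halving steps
    for step in range(t):
        source = condense_half(source, seeds[step])
        length, error = length // 2, error + eps0   # errors accumulate additively
    return source, length, error                # length = O(k), error <= t * eps0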
14. 1st Challenge: Error Accumulation
- Number of steps: t = log(n/k).
- Final error > t·ε0 ⇒ need ε0 < 1/t.
- Condenser seed length > log(1/ε0) > log t.
- ⇒ Extractor seed length > t·log t, which may be as large as log n · loglog n (this partially explains the seed length of [RSW00]). (The arithmetic is spelled out below.)
- Solution idea: start with a constant ε0, and combine repeated condensing with error reduction (à la [RRV99]) to prevent error accumulation.
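Spelling out the arithmetic behind these bullets (a reading aid, not from the talk):

  t = \log\frac{n}{k}, \qquad \text{final error} > t\,\varepsilon_0 \;\Rightarrow\; \varepsilon_0 < \frac{1}{t} \;\Rightarrow\; \text{seed per step} \ge \log\frac{1}{\varepsilon_0} > \log t,

  \text{total seed} \;>\; t \log t \;=\; \log\frac{n}{k}\cdot\log\log\frac{n}{k},

which can be as large as log n · loglog n (e.g. when k = polylog n).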
15. Error Reduction
- Con0 has error ε, condensation ρ, seed length d.
- Con' has condensation 2ρ and seed length 2d.
- Hope: error ε².
- This works only if the error comes from the seeds!
16. Parallel Composition
- Con0: seed error ε, condensation ρ, seed length d, source error ε', entropy loss Δ = k + d − k'.
- Con': seed error O(ε²), condensation 2ρ, seed length d + O(log(1/ε)), source error ε', entropy loss Δ + 1.
17. Serial Composition
- Con0: seed error ε, condensation ρ, seed length d, source error ε', entropy loss Δ = k + d − k'.
- Con': seed error O(ε), condensation ρ², seed length 2d, source error ε'(1 + 1/ρ), entropy loss 2Δ.
(A side-by-side summary of both compositions follows.)
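The two compositions side by side (reconstructed from the two slides above; recall the entropy loss is Δ = k + d − k', and the explicit formulas for Con' appear on slide 21):

  \text{Parallel:}\quad \varepsilon \to O(\varepsilon^2),\quad \rho \to 2\rho,\quad d \to d + O(\log 1/\varepsilon),\quad \varepsilon' \to \varepsilon',\quad \Delta \to \Delta + 1.

  \text{Serial:}\quad\; \varepsilon \to O(\varepsilon),\quad \rho \to \rho^2,\quad d \to 2d,\quad \varepsilon' \to \varepsilon'(1 + 1/\rho),\quad \Delta \to 2\Delta.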
18. Repeated Condensing Revisited
- Start with Con0 with constant seed error ε (say, 1/18), constant condensation ρ, and constant seed length d (source error ε'0, entropy loss Δ0).
- Alternate parallel and serial composition loglog(n/k) + O(1) times.
- ⇒ Con with seed error ε condenses to O(k) bits, with optimal seed length d = O(log(n/k))
- (source error polylog(n/k) · ε'0, entropy loss O(log(n/k)) · Δ0).
- Home? Not so fast...
19. 2nd Challenge: No Such Explicit Lossless Condensers
- Previous condensers with constant seed length:
- A (k, ε)-condenser with n' = n − O(1) [CRVW02].
- A (k, Ω(k), ε)-condenser with n' = n/100 [Vad03], for k = Ω(n).
- Here: a (k, Ω(k), ε)-condenser with n' = n/100 for any k (to be seen later).
- Still not lossless! The challenge persists.
20. Win-Win Condensers [RSW00]
- Assume Con is a (k, Ω(k), ε)-condenser. Then for every k-source X, we are in one of two good cases:
- Con(X,Y) contains almost k bits of randomness ⇒ Con is almost lossless.
- X still has some randomness even conditioned on Con(X,Y).
- ⇒ (Con(X,Y), X) is a block source [CG85]. Good extractors for block sources are already known (based on [NZ93], ...).
- ⇒ Ext(Con(X,Y), X) is uniform on Ω(k) bits.
21. Win-Win under Composition
- More generally, (Con, Som) is a win-win condenser if for every k-source X, either:
- Con(X,Y) is lossless, or
- Som(X,Y) is somewhere random: a list of b sources, one of which is uniform on Ω(k) bits.
- Parallel composition, generalized (see the sketch after this list):
- Con'(X, Y1∘Y2) = Con(X,Y1) ∘ Con(X,Y2)
- Som'(X, Y1∘Y2) = Som(X,Y1) ∪ Som(X,Y2)
- Serial composition, generalized:
- Con'(X, Y1∘Y2) = Con(Con(X,Y1), Y2)
- Som'(X, Y1∘Y2) = Som(X,Y1) ∪ Som(Con(X,Y1), Y2)
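A schematic Python sketch of the two rules above (the function names and the representation of a win-win condenser as a pair (con, som) are my own, just to make the bookkeeping concrete; '+' denotes concatenation of outputs/lists):

def parallel(con, som):
    # Con'(x, y1∘y2) = Con(x,y1) ∘ Con(x,y2); Som'(x, y1∘y2) = Som(x,y1) ∪ Som(x,y2)
    def con2(x, y1, y2):
        return con(x, y1) + con(x, y2)
    def som2(x, y1, y2):
        return som(x, y1) + som(x, y2)
    return con2, som2

def serial(con, som):
    # Con'(x, y1∘y2) = Con(Con(x,y1), y2); Som'(x, y1∘y2) = Som(x,y1) ∪ Som(Con(x,y1), y2)
    def con2(x, y1, y2):
        return con(con(x, y1), y2)
    def som2(x, y1, y2):
        return som(x, y1) + som(con(x, y1), y2)
    return con2, som2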
22. Partial Summary
- We give a (k, Ω(k), ε)-condenser with constant seed length and n' = n/100 (still to be seen).
- This implies a lossless win-win condenser with constant seed length.
- Iterate repeated condensing and (seed-)error reduction loglog(n/k) + O(1) times.
- We get a win-win condenser (Con, Som) where:
- Con condenses to O(k) bits, and
- Som produces a short list of t sources, one of which is a block source (t can be made as small as log^(c) n).
23. 3rd Challenge: Mergers [Ta-Shma '96]
- Now we have a somewhere random source X1, X2, ..., Xt (one of the Xi is random).
- t can be made as small as log^(c) n.
- An extractor for such a source is called a merger [TaS96].
- Previous constructions: mergers with seed length d = O(log t · log n) [TaS96].
- Here: mergers with seed length d = O(log t), and mergers with seed length d = O(t) (both independent of n).
24. New Mergers from LDCs
- Example: mergers from Hadamard codes.
- Input: a somewhere k-source X = X1, X2, ..., Xt (one of the Xi is a k-source).
- Seed: Y is t bits. Define Con(X, y) = ⊕_{i ∈ y} Xi.
- Claim: with probability ½, Con(X, Y) has entropy k/2.
- Proof idea:
- Assume w.l.o.g. that X1 is a k-source.
- For every y, Con(X, y) ⊕ Con(X, y ⊕ e1) = X1.
- ⇒ At least one of Con(X, y) and Con(X, y ⊕ e1) contains entropy ≥ k/2. (A concrete sketch follows.)
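A concrete Python sketch of this Hadamard merger (illustration only; blocks are equal-length bit strings and the seed y is a list of t bits):

def hadamard_merge(blocks, seed_bits):
    # blocks: list of t equal-length bit strings; seed_bits: list of t bits (0/1 ints).
    n = len(blocks[0])
    out = [0] * n
    for block, bit in zip(blocks, seed_bits):
        if bit:                                   # include block i iff y_i = 1
            out = [o ^ int(b) for o, b in zip(out, block)]
    return ''.join(map(str, out))                 # Con(X, y) = XOR of the selected blocks

# Identity behind the claim: for any seed y, XOR-ing the two outputs
# hadamard_merge(X, y) and hadamard_merge(X, y with bit 1 flipped) gives back X_1.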
25. Old Debt: The Condenser
- We promised a (k, Ω(k), ε)-condenser with constant seed length and n' = n/100.
- Two new simple ways of obtaining them:
- Based on any error-correcting code (gives the best parameters; influenced by [RSW00, Vad03]).
- Based on the new mergers [Ran Raz].
- Mergers ⇒ Condensers: let X be a k-source. For any constant t, X = X1, X2, ..., Xt is (close to) a somewhere k/t-source.
- ⇒ The Hadamard merger is also a condenser with the desired parameters. (See the usage sketch below.)
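Continuing the sketch from slide 24 (hypothetical helper hadamard_merge as defined there; assumes t divides n):

def condense_via_merger(x, t, seed_bits):
    # Split the n-bit source x into t equal blocks and merge them; the output
    # has length n/t, i.e. the map condenses by a factor of t.
    n = len(x)
    blocks = [x[i * (n // t):(i + 1) * (n // t)] for i in range(t)]
    return hadamard_merge(blocks, seed_bits)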
26. Some Open Problems
- Improved dependence on ε. A possible direction: mergers for t blocks with seed length f(t) + O(log(n/ε)).
- Getting the right constants:
- d = log n + O(1).
- m = k + d − O(1).
- Possible directions:
- Lossless condensers with constant seed length
- Lossless mergers
- Better locally decodable codes.
27. New Mergers from LDCs
- More generally: view the somewhere k-source X = X1, X2, ..., Xt ∈ (Σ^n)^t as a t × n matrix.
- Encode each column with a code C : Σ^t → Σ^u.
- Output a random row of the encoded matrix.
- d = log u (independent of n). (See the sketch below.)
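A schematic Python sketch of this general construction (encode_column is a placeholder for any code C : Σ^t → Σ^u; plugging in the Hadamard code recovers the merger of slide 24):

import random

def ldc_merge(blocks, encode_column, u):
    # blocks: the somewhere k-source as t sequences of length n over alphabet Sigma.
    t, n = len(blocks), len(blocks[0])
    columns = [[blocks[i][j] for i in range(t)] for j in range(n)]  # n columns of t symbols
    encoded = [encode_column(col) for col in columns]               # each column -> u symbols
    row = random.randrange(u)      # the seed: log(u) random bits, independent of n
    return [encoded[j][row] for j in range(n)]                      # one row of the u x n matrix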
28. New Mergers from LDCs
- C : Σ^t → Σ^u is a (q, δ) locally decodable erasure code if:
- for any fraction δ of non-erased codeword symbols S,
- and for any message position i,
- the i-th message symbol can be recovered using q codeword symbols from S.
- Using such a C, the above merger essentially turns a somewhere k-source into a k/q-source with probability at least 1 − δ. (A Hadamard example follows.)
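For concreteness (not from the talk): the Hadamard code is 2-query locally decodable, which is the prototype behind the merger of slide 24; here is a minimal sketch of its encoder and plain (non-erasure) local decoder:

def hadamard_encode(msg_bits):
    # Codeword position y holds the inner product <msg, y> over GF(2).
    t = len(msg_bits)
    return [sum(m & ((y >> i) & 1) for i, m in enumerate(msg_bits)) % 2
            for y in range(2 ** t)]

def decode_bit(codeword, i, y):
    # 2-query local decoding: msg[i] = C(msg)[y] XOR C(msg)[y ^ e_i], for any y.
    return codeword[y] ^ codeword[y ^ (1 << i)]

msg = [1, 0, 1]
cw = hadamard_encode(msg)
assert all(decode_bit(cw, i, y) == msg[i] for i in range(len(msg)) for y in range(len(cw)))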
29. New Mergers from LDCs
- Hadamard mergers with smaller error:
- Seed length d = O(t · log(1/ε)). Transform a somewhere k-source X1, X2, ..., Xt into a (k/2, ε)-source.
- Reed-Muller mergers with smaller error:
- Seed length d = O(t^δ · log(1/ε)) for a constant δ > 0. Transform a somewhere k-source X1, X2, ..., Xt into an (Ω(δk), ε)-source.
- Note: the seed length doesn't depend on n!
- Efficient enough to obtain the desired extractors.
30. Error Reduction
- Main extractor: any k, m = .99k, d = O(log n).
- Caveat: constant ε.
- Error reduction for extractors [RRV99] is not efficient enough for this case.
- Our new mergers + [RRV99, RSW00] give improved error reduction.
- ⇒ We get various new extractors for general ε:
- Any k, m = .99k, d = O(log n), ε = exp(−log n / log^(c) n).
- Any k, m = .99k, d = O(log n + (loglog n)² · log(1/ε)).
31. Source vs. Seed Error (cont.)
- Defining bad inputs:
[Diagram: Con maps {0,1}^n to {0,1}^n'; X is a (k+a)-source; heavy outputs < ε·2^k ⇒ bad inputs < 2^k (a 2^-a fraction of X).]
32. Source vs. Seed Error: Conclusion
- More formally: for a (k, k', ε)-condenser Con and every (k + log(1/ε'))-source X, there is a set G of good pairs (x, y) such that:
- For a 1 − ε' fraction of the x's, Pr[(x, Y) ∈ G] > 1 − 2ε.
- Con(X, Y) conditioned on (X, Y) ∈ G is a (k' − log(1/ε'))-source.
- ⇒ We can differentiate the source error ε' from the seed error ε.
- Source error is free in seed length!
- ⇒ (Con(x, y1), Con(x, y2)) has source error ε' and seed error O(ε²).
- We don't even need independent y1, y2 (an expander edge suffices).