Title: Recent Results in Non-Asymptotic Shannon Theory
1. Recent Results in Non-Asymptotic Shannon Theory
- Dror Baron
- Supported by AFOSR, DARPA, NSF, ONR, and Texas Instruments
- Joint work with M. A. Khojastepour, R. G. Baraniuk, and S. Sarvotham
2. "We may someday see the end of wireline"
- S. Cherry, "Edholm's law of bandwidth," IEEE Spectrum, vol. 41, no. 7, July 2004, pp. 58-60
3. But will there ever be enough data rate?
- R. Lucky, 1989: "We are not very good at predicting uses until the actual service becomes available. I am not worried; we will think of something when it happens."
- There will always be new applications that gobble up more data rate!
4. How much can we improve wireless?
- Spectrum is a limited natural resource
- Information theory says we need lots of power for high data rates, even with infinite bandwidth!
- Solution: transmit more power, BUT
- Limited by environmental concerns
- Will batteries support all that power?
- Sooner or later, wireless rates will hit a wall!
5. Where can we improve?
- Algorithms and hardware gains
- Power-efficient computation
- Efficient power amplifiers
- Advances in batteries
- Directional antennas
- Communication gains
- Channel coding
- Source coding
- Better source and channel models
6. Where will the last dB of communication gains come from?
- Network information theory (Shannon theory)
7. Traditional point-to-point information theory
[Diagram: source → Encoder → Channel → Decoder]
- Single source
- Single transmitter
- Single receiver
- Single communication stream
- Most aspects are well-understood
8. Network information theory
[Diagram: multiple Encoders → Channel → Decoder]
- Network of
- Multiple sources
- Multiple transmitters
- Multiple receivers
- Multiple communication streams
- Few results
- My goal: understand the various costs of network information theory
9. What costs has information theory overlooked?
10. Channel coding has arrived
[Diagram: Encoder → Channel → Decoder]
- Turbo codes [Berrou et al., 1993]
- 0.5 dB gap to capacity (rate R below capacity)
- BER ≈ 10^-5
- Block length n ≈ 6.5×10^4
- Regular LDPC codes [Gallager, 1963]
- Irregular LDPC codes [Richardson et al., 2001]
- 0.13 dB gap to capacity
- BER ≈ 10^-6
- n ≈ 10^6
11. Distributed source coding has also arrived
[Diagram: x → Encoder → Decoder; side information y available at the decoder]
- Encoder for x based on syndrome of channel code
- Decoder for x has correlated side information y
- Various types of channel codes can be used
- Slepian-Wolf via LDPC codes [Xiong et al., 2004]
- H(X|Y) ≈ 0.47
- R ≈ 0.5 (rate above Slepian-Wolf limit)
- BER ≈ 10^-6
- Block length n ≈ 10^5
12. Hey! Did you notice those block lengths?
- Information theory provides results in the asymptotic regime
- Channel coding: ∀ε>0, rate R = C-ε achievable with error probability δ→0 as n→∞
- Slepian-Wolf coding: ∀ε>0, rate R = H(X|Y)+ε achievable with error probability δ→0 as n→∞
- Best practical results achieved for n ≈ 10^5
- Do those results require large n?
13. But we live in a finite world
- Real-world data doesn't always have n ≈ 10^6
- IP packets
- Emails, text messages
- Sensornet applications (small battery → small n)
- How do those methods perform for n ≈ 10^4? 10^3?
- How quickly can we approach the performance limits of information theory?
14. And we don't know the statistics either!
- Lossless coding (single source)
- Length-n input x ~ Bernoulli(p)
- Encode with wrong parameter q
- K-L divergence penalty with variable-rate codes
- Performance loss (minor bitrate penalty)
- Channel coding, distributed source coding
- Encode with wrong parameter q
- Fixed-rate codes based on joint typicality
- Typical set Tq for q is smaller than Tp for p
- As n→∞, Pr(error)→1
- Performance collapse!
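A minimal numerical sketch of both effects, assuming illustrative parameter values p = 0.11, q = 0.08 and a typicality slack of 0.01 (none of these numbers come from the talk): the variable-rate penalty is the K-L divergence D(p||q) per symbol, while a fixed-rate code built around the q-typical set misses almost every Bernoulli(p) sequence as n grows.

    import numpy as np

    def kl_bernoulli(p, q):
        """K-L divergence D(p||q) between Bernoulli(p) and Bernoulli(q), in bits."""
        return p * np.log2(p / q) + (1 - p) * np.log2((1 - p) / (1 - q))

    p, q = 0.11, 0.08                      # true and assumed parameters (hypothetical)
    print("variable-rate penalty:", kl_bernoulli(p, q), "bits/symbol")

    rng = np.random.default_rng(0)
    delta = 0.01                           # typicality slack around q
    for n in (10**3, 10**4, 10**5):
        freq = rng.binomial(n, p, size=2000) / n   # empirical statistics of Bernoulli(p) sequences
        in_Tq = np.abs(freq - q) <= delta          # does the sequence fall inside the q-typical set?
        print(f"n = {n}: fraction of p-sequences inside T_q = {in_Tq.mean():.3f}")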
15. Main challenges
- How quickly can we approach the performance limits of information theory?
- Will address for channel coding and Slepian-Wolf
- What can we do when the source statistics are unknown?
- Will address for Slepian-Wolf
16. But first... what does the prior art indicate?
17. Underlying problem
- Shannon, 1958: "This inverse problem is perhaps the more natural in applications: given a required level of probability of error, how long must the code be?"
- Motivation may have been phone and space communication
- Small probability of codeword error ε
- Wireless paradigm
- Given k bits, what are the minimal channel resources to attain probability of error ε?
- Can retransmit packets, so a relatively large ε is acceptable
- n depends on packet length
- Need to characterize R(n,ε)
18. Error exponents
- Fix rate R
- Bounds on probability of error
- Random coding: Pr[error] ≤ 2^(-n·Er(R))
- Sphere packing: Pr[error] ≥ 2^(-n·Esp(R)+o(n))
- Er(R) = Esp(R) for R near C
19. Error exponents
- Fix rate R
- Bounds on probability of error
- Random coding: Pr[error] ≤ 2^(-n·Er(R))
- Sphere packing: Pr[error] ≥ 2^(-n·Esp(R)+o(n))
- Er(R) = Esp(R) for R near C
- Shannon's regime
- "This inverse problem is perhaps the more natural in applications: given a required level of probability of error, how long must the code be?"
- Fix R
- E(R) = O(1)
- -log(ε) = O(n): good for small ε
20. Error exponents
- Fix rate R
- Bounds on probability of error
- Random coding: Pr[error] ≤ 2^(-n·Er(R))
- Sphere packing: Pr[error] ≥ 2^(-n·Esp(R)+o(n))
- Er(R) = Esp(R) for R near C
- Wireless paradigm
- Given k bits, what are the minimal channel resources to attain probability of error ε?
- Fix ε
- nE(R) = O(1)
- The o(n) term dominates
- Bounds diverge
21. Error exponents fail for R = C - ε/n^0.5
22. How quickly can we approach the channel capacity? (known statistics)
23. Binary symmetric channel (BSC) setup
[Diagram: s → Encoder f → x = f(s) → BSC with noise z ~ Bernoulli(n,p) → y → Decoder g → ŝ = g(y)]
- s ∈ {1,…,M}: input message
- x, y, and z: binary length-n sequences
- z ~ Bernoulli(n,p) implies crossover probability p
- Code (f,g,n,M,ε) includes
- Encoder x = f(s), with f: {1,…,M} → {0,1}^n
- Rate R = log(M)/n
- Channel y = x ⊕ z
- Decoder g reconstructs s by ŝ = g(y)
- Error probability Pr[g(y) ≠ s] ≤ ε
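A tiny simulation of this channel model with an illustrative crossover probability; the encoder and decoder (the interesting part) are deliberately omitted, and x stands in for an arbitrary codeword.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 10**4, 0.1
    x = rng.integers(0, 2, n)              # stand-in codeword x = f(s)
    z = (rng.random(n) < p).astype(int)    # BSC noise z ~ Bernoulli(p)
    y = x ^ z                              # channel output y = x xor z
    print("empirical crossover rate:", (y != x).mean())   # ≈ p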
24. Non-asymptotic capacity
25. Key to solution: packing typical sets
- Need to encode typical set TZ for z
- Code needs to cover z ∈ TZ
- Need Pr(z ∈ TZ) ≈ 1-ε
- Probability ε of codeword error
- What about rate?
- Output space: 2^n possible sequences
- Can't pack more than 2^n/|TZ| sets into the output space
- M ≤ 2^n/|TZ|
- Minimal-cardinality Tmin covers probability 1-ε
- C_NA ≈ 1 - log(|Tmin|)/n
[Figure: typical sets of size |TZ| packed into the 2^n output space]
26. What's the cardinality of Tmin?
- Consider empirical statistics n_z = Σ_i z_i, P_Z = n_z/n
- Minimal Tmin has the form Tmin = {z : P_Z ≤ p + Δ(ε)}
- Determine Δ(ε) with the central limit theorem (CLT)
- E[P_Z] = p, Var(P_Z) = p(1-p)/n
- P_Z ≈ N(p, p(1-p)/n)
- Non-asymptotic: Δ(ε) ∝ [p(1-p)]^0.5/n^0.5
- CLT ⇒ Δ(ε)
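A short sketch of this recipe under the interpretation above: pick the threshold on n_z with the Gaussian quantile so that Pr(z ∈ Tmin) ≈ 1-ε, count |Tmin| exactly, and convert to a rate. Parameter values are illustrative, and norm.ppf(1-ε) plays the role of the Gaussian quantile.

    from math import comb, log2
    from scipy.stats import norm          # norm.ppf = Gaussian quantile function

    def packing_rate(n, p, eps):
        """1 - log2|Tmin|/n for the BSC, with the Tmin threshold chosen via the CLT."""
        tau = int(n * p + norm.ppf(1 - eps) * (n * p * (1 - p)) ** 0.5)
        card = sum(comb(n, k) for k in range(tau + 1))    # |Tmin| = #{z : n_z <= tau}
        return 1 - log2(card) / n

    p, eps = 0.1, 1e-2
    C = 1 + p * log2(p) + (1 - p) * log2(1 - p)           # BSC capacity 1 - H2(p)
    for n in (10**3, 10**4):
        print(f"n = {n}: packing rate ≈ {packing_rate(n, p, eps):.4f}  (C = {C:.4f})")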
27. Tight non-asymptotic capacity
- Theorem
- C_NA(n,ε) = C - K(ε)/n^0.5 + o(n^-0.5)
- K(ε) = Φ^-1(ε)·[p(1-p)]^0.5·log((1-p)/p)
- Gap to capacity is K(ε)/n^0.5 + o(n^-0.5)
- Note: o(n^-0.5) is asymptotically negligible w.r.t. K/n^0.5
- Tightened Wolfowitz bounds up to o(n^-0.5)
- Gap to capacity of LDPC codes is 2-3x greater
- We know how quickly we can approach C
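A hedged sketch of the closed-form expression, dropping the o(n^-0.5) term and reading Φ^-1(ε) as the upper-tail Gaussian quantile norm.ppf(1-ε) so that K(ε) > 0 (parameter values are illustrative):

    from math import log2, sqrt
    from scipy.stats import norm

    def bsc_capacity(p):
        return 1 + p * log2(p) + (1 - p) * log2(1 - p)    # C = 1 - H2(p)

    def c_na(n, p, eps):
        """Non-asymptotic capacity C - K(eps)/sqrt(n), with the o(1/sqrt(n)) term dropped."""
        K = norm.ppf(1 - eps) * sqrt(p * (1 - p)) * log2((1 - p) / p)
        return bsc_capacity(p) - K / sqrt(n)

    p, eps = 0.1, 1e-2
    for n in (10**3, 10**4, 10**5, 10**6):
        print(f"n = {n}: C_NA ≈ {c_na(n, p, eps):.4f}  (C = {bsc_capacity(p):.4f})")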
28. Non-asymptotic capacity of BSC
29. Gaussian channel results
[Diagram: s → Encoder f → x = f(s) → additive noise z ~ N(0,σ²) → y → Decoder g → ŝ = g(y)]
- Continuous channel
- Power constraint Σ_i (x_i)² ≤ nP
- Shannon 1958 derived C_NA(n,ε) for the Gaussian channel via cone packing (non-i.i.d. codebook)
- Information spectrum bounds on probabilities of error indicate Gaussian codebooks are sub-optimal
- i.i.d. codebooks aren't good enough!
30. Excess power of Gaussian channel
31. How quickly can we approach the Slepian-Wolf limit? (known statistics)
32. But first... Slepian-Wolf review
33. Slepian-Wolf setup
[Diagram: x → Encoder fX → fX(x), y → Encoder fY → fY(y); joint Decoder outputs gX(fX(x),fY(y)) and gY(fX(x),fY(y))]
- x and y are correlated length-n sequences
- Code (fX,fY,gX,gY,n,MX,MY,εX,εY) includes
- Encoders fX(x) ∈ {1,…,MX}, fY(y) ∈ {1,…,MY}
- Rates RX = log(MX)/n, RY = log(MY)/n
- Decoder g reconstructs x and y by gX(fX(x),fY(y)) and gY(fX(x),fY(y))
- Error probabilities Pr[gX(fX(x),fY(y)) ≠ x] ≤ εX and Pr[gY(fX(x),fY(y)) ≠ y] ≤ εY
34. Slepian-Wolf theorem
- Theorem [Slepian & Wolf, 1973]
- RX ≥ H(X|Y) (conditional entropy)
- RY ≥ H(Y|X)
- RX + RY ≥ H(X,Y) (joint entropy)
[Figure: Slepian-Wolf rate region in the (RX,RY) plane, bounded by RX = H(X|Y), RY = H(Y|X), and RX+RY = H(X,Y), with corner points at (H(X), H(Y|X)) and (H(X|Y), H(Y))]
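A minimal check of these three conditions for a binary pair, assuming the joint pmf is available as a 2x2 array; the pmf below encodes y ~ Bernoulli(0.5) and x = y ⊕ z with z ~ Bernoulli(0.1), purely as an illustration.

    import numpy as np

    def entropies(pxy):
        """H(X), H(Y), H(X,Y), H(X|Y), H(Y|X) in bits for a joint pmf pxy[i,j] = Pr(x=i, y=j)."""
        h = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))
        hx, hy, hxy = h(pxy.sum(axis=1)), h(pxy.sum(axis=0)), h(pxy.ravel())
        return hx, hy, hxy, hxy - hy, hxy - hx

    def in_slepian_wolf_region(rx, ry, pxy):
        hx, hy, hxy, hx_y, hy_x = entropies(pxy)
        return rx >= hx_y and ry >= hy_x and rx + ry >= hxy

    q = 0.1
    pxy = np.array([[0.5 * (1 - q), 0.5 * q],
                    [0.5 * q, 0.5 * (1 - q)]])
    print(in_slepian_wolf_region(1.0, 0.5, pxy))    # True: (1.0, 0.5) lies in the region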
35. Slepian-Wolf with binary symmetric correlation structure (known statistics)
36. Binary symmetric correlation setup
- x, y, and z are length-n Bernoulli sequences
- Correlation channel: x = y ⊕ z, with z independent of y
- Bernoulli parameters p, q ∈ [0, 0.5), r = p(1-q) + (1-p)q
- Code (f,g,n,M,ε) includes
- Encoder f(x) ∈ {1,…,M}
- Rate R = log(M)/n
- Decoder g(f(x),y) ∈ {0,1}^n
- Error probability Pr[g(f(x),y) ≠ x] ≤ ε
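A quick simulation of this correlation structure, using the parameter values from the numerical example later in the deck (p = 0.3, q = 0.1):

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, q = 10**5, 0.3, 0.1
    y = (rng.random(n) < p).astype(int)    # y ~ Bernoulli(p)
    z = (rng.random(n) < q).astype(int)    # correlation noise z ~ Bernoulli(q), independent of y
    x = y ^ z                              # x = y xor z, so x ~ Bernoulli(r)

    r = p * (1 - q) + (1 - p) * q
    print("empirical P_X:", x.mean(), " vs r =", r)
    print("empirical P_Y:", y.mean(), " vs p =", p)
    print("fraction of positions where x != y:", (x != y).mean(), " vs q =", q)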
37. Relation to general Slepian-Wolf setup
- x, y, and z are Bernoulli
- Correlation z independent of y
- x = y ⊕ z ⇒ H(X|Y) = H(Z)
- Focus on encoding x at rate approaching H(Z)
- Neglect well-known encoding of y at rate RY ≈ H(Y)
[Figure: our setup is the corner point (RX, RY) = (H(Z), H(Y)) of the Slepian-Wolf region; the RX-axis intercept H(X|Y) equals H(Z)]
38. Non-asymptotic Slepian-Wolf rate
- Definition: RNA(n,ε) = min{log(M)/n : ∃ code (f,g,n,M,ε)}
- Prior art [Wolfowitz, 1978]
- Converse result: RNA(n,ε) ≥ H(X|Y) + KC(ε)/n^0.5
- Achievable result: RNA(n,ε) ≤ H(X|Y) + KA(ε)/n^0.5
- Bounds are loose: KA(ε) > KC(ε)
- Can we tighten Wolfowitz's bounds?
39. Tight non-asymptotic rate
- Theorem
- RNA(n,ε) = H(Z) + K(ε)/n^0.5 + o(n^-0.5)
- K(ε) = Φ^-1(ε)·[q(1-q)]^0.5·log((1-q)/q)
- Redundancy rate is K(ε)/n^0.5 + o(n^-0.5)
- Note: o(n^-0.5) decays faster than K/n^0.5
- Tightened Wolfowitz bounds up to o(n^-0.5)
- We know how quickly we can approach H(Z) with known statistics
[Figure: RNA(n,ε) vs. n; the tight bound decays toward H(X|Y)]
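The source-coding mirror image of the channel-coding sketch above, again dropping the o(n^-0.5) term and reading Φ^-1(ε) as the upper-tail Gaussian quantile so the redundancy is positive (illustrative parameters):

    from math import log2, sqrt
    from scipy.stats import norm

    def h2(q):
        return -q * log2(q) - (1 - q) * log2(1 - q)       # H(Z) for z ~ Bernoulli(q)

    def r_na(n, q, eps):
        """Non-asymptotic Slepian-Wolf rate H(Z) + K(eps)/sqrt(n), o(1/sqrt(n)) term dropped."""
        K = norm.ppf(1 - eps) * sqrt(q * (1 - q)) * log2((1 - q) / q)
        return h2(q) + K / sqrt(n)

    q, eps = 0.1, 1e-2
    for n in (10**3, 10**4, 10**5):
        print(f"n = {n}: R_NA ≈ {r_na(n, q, eps):.4f}  (H(Z) = {h2(q):.4f})")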
40. What can we do when the source statistics are unknown? (universality)
41. Universal setup
- Unknown Bernoulli parameters p, q, r
- Encoder observes x and n_y = Σ_i y_i
- Communication of n_y requires log(n) bits
- Variable rate used
- Need distribution for n_z
- Distribution depends on n_x and n_y (not on x itself)
- Codebook size M_{n_x,n_y}
42. Distribution of n_z
- CLT was key to the solution with known statistics
- How can we apply the CLT when q is unknown?
- Consider a numerical example
- p = 0.3, q = 0.1, r = p(1-q) + (1-p)q
- P_X ≈ r, P_Y ≈ p, P_Z ≈ q (empirical ≈ true)
- We plot Pr(n_z|n_x,n_y) as a function of n_z ∈ {0,…,n}
43. Pr(n_z|n_x,n_y) for n = 10^2 [plot]
44. Pr(n_z|n_x,n_y) for n = 10^3 [plot]
45. Pr(n_z|n_x,n_y) for n = 10^4 [plot]
46. Pr(n_z|n_x,n_y) for n = 10^4 [plot]
47. Universal rate
- Theorem
- RNA(n,ε) = H(P_Z) + K'(ε)/n^0.5 + o(n^-0.5)
- K'(ε) = f(P_Y)·K(ε)
- f(P_Y) = 2[P_Y(1-P_Y)]^0.5/(1-2P_Y)
- f(P_Y) → ∞ as P_Y → 0.5; f(P_Y) → 0 as P_Y → 0
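A quick look at how the multiplier f(P_Y) scales the known-statistics redundancy constant K(ε) from the previous sketch (illustrative q, ε, and n; the blow-up near P_Y = 0.5 is visible in the last rows):

    from math import log2, sqrt
    from scipy.stats import norm

    def f_py(py):
        """Universal redundancy multiplier f(P_Y) = 2*sqrt(P_Y(1-P_Y))/(1-2*P_Y)."""
        return 2 * sqrt(py * (1 - py)) / (1 - 2 * py)

    q, eps, n = 0.1, 1e-2, 10**4
    K = norm.ppf(1 - eps) * sqrt(q * (1 - q)) * log2((1 - q) / q)
    for py in (0.05, 0.1, 0.2, 0.3, 0.4, 0.45, 0.49):
        print(f"P_Y = {py:4.2f}: f(P_Y) = {f_py(py):6.2f}, "
              f"universal redundancy ≈ {f_py(py) * K / sqrt(n):.4f} bits/symbol")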
48. Why is f(P_Y) small when P_Y is small?
- Known statistics ⇒ Var(n_z) = nq(1-q) regardless of empirical statistics
- P_Y → 0 ⇒ can estimate n_z with small variance
- Universal scheme outperforms known statistics when P_Y is small
- Key issue: variable-rate coding (universal) beats fixed-rate coding (known statistics)
- Can cut down expected redundancy (known statistics) by communicating n_y to the encoder
- log(n) bits for n_y will save O(n^0.5) bits
49. Redundancy for P_Y ≈ 0.5
- f(P_Y) blows up as P_Y approaches 0.5
- Redundancy is O(n^-0.5) with an enormous constant
- Another scheme has O(n^-1/3) redundancy
- Better performance for P_Y > 0.5 - O(n^-1/6)
- Universal redundancy can be huge!
- Ongoing research: improving on the O(n^-1/3) redundancy
50. Numerical example
- n = 10^4
- q = 0.1
- Slepian-Wolf requires n·H2(q) ≈ 4690 bits
- Non-asymptotic approach (known statistics) with ε = 10^-2 requires n·RNA(n,ε) ≈ 4907 bits
- Universal approach with P_Y = 0.3 requires 5224 bits
- With P_Y = 0.4 we need 5863 bits
- In practice, the penalty for universality is huge!
51. Summary
- Network information theory (Shannon theory) may enable increased wireless data rates
- Practical channel codes and distributed source codes approach the limits but rely on large n
- How quickly can we approach the performance limits of information theory?
- C_NA = C - K(ε)/n^0.5 + o(n^-0.5)
- R_NA = H(Z) + K(ε)/n^0.5 + o(n^-0.5)
- Gap to capacity of LDPC codes is 2-3x greater
52. Universality
- What can we do when the source statistics are unknown? (Slepian-Wolf)
- P_Y small: universal scheme can outperform known statistics
- P_Y ≈ 0.5: redundancy above H(P_Z) is O(n^-1/3) and can be huge!
- Universal channel coding with feedback for the BSC
- Capacity-achieving code requires P_Y ≈ 0.5
- Universality with the current scheme costs O(n^-1/3)
[Diagram: Encoder → Channel → Decoder with a feedback link]
53. Further directions
- Gaussian channel (briefly discussed)
- Shannon 1958 derived C_NA(n,ε) for the Gaussian channel with cone packing (non-i.i.d. codebook)
- Gaussian codebooks are sub-optimal!
- Other channels
- C_NA(n,ε) ≥ C - KA(ε)/n^0.5 via information spectrum
- Gaussian codebook distribution sub-optimal
- Must consider non-i.i.d. codebook constructions
- Penalties for finite n and unknown statistics exist everywhere in Shannon theory!!
- www.dsp.rice.edu