Transcript and Presenter's Notes

Title: Erasure Correcting Codes


1
Erasure Correcting Codes
  • In The Real World
  • Udi Wieder

Incorporates presentations made by Michael Luby
and Michael Mitzenmacher.
2
Based On...
  • Practical Loss-Resilient Codes
  • Michael Luby, Michael Mitzenmacher, Amin
    Shokrollahi, Dan Spielman, Volker Stemann
  • STOC 97
  • Analysis of Random Processes via And-Or Tree
    Evaluation
  • Michael Luby, Michael Mitzenmacher, Amin
    Shokrollahi
  • SODA 98
  • LT Codes
  • Michael Luby
  • FOCS 2002
  • Online Codes
  • Petar Maymounkov

3
Probabilistic Channels
[Diagram: two channels. The binary erasure channel: a bit (0 or 1) arrives intact with probability 1-p and is replaced by an erasure '?' with probability p. The binary symmetric channel: a bit arrives intact with probability 1-p and is flipped with probability p.]
4
Erasure Codes
[Diagram: content of n packets is encoded into cn packets; after transmission with losses, n received packets suffice to decode the original content.]
5
Performance Measures
  • Time Overhead
  • The time to encode and decode, expressed as a
    multiple of the encoding length.
  • Reception Efficiency
  • The ratio of the number of packets in the message
    to the number of packets needed to decode.
    Optimal is 1.

6
Known Codes
  • Random Linear Codes (Elias)
  • A linear code of minimum distance d is capable of
    correcting any pattern of d-1 or fewer erasures.
  • Achieves capacity of the channel with high
    probability, i.e. can be used to transmit over the
    erasure channel at any rate R < 1-p.
  • Decoding time O(n^3). Unacceptable.
  • Reed-Solomon Codes
  • Optimal reception efficiency with probability 1.
  • Decoding and encoding in quadratic time. (About
    one minute to encode 1 MB.)

7
Tornado Codes
Practical Loss-Resilient Codes - Michael Luby,
Michael Mitzenmacher, Amin Shokrollahi, Dan
Spielman, Volker Stemann (1997)
Analysis of Random Processes via And-Or Tree
Evaluation - Michael Luby, Michael Mitzenmacher,
Amin Shokrollahi (1998)
8
Low Density Parity Check Codes
  • Introduced in the early 60s by Gallager and
    reinvented many times since.

[Diagram: a bipartite graph between message bits (a-f) and check bits (g-l); each check bit is the XOR of its message-bit neighbors.]
The time to encode is proportional to the number
of edges.
9
Encoding Process.
Standard Loss-Resilient Code.
[Diagram: a cascade of bipartite graphs; a message of length k is connected to successively smaller layers of check bits, giving a code of rate 1-ε.]
10
Decoding Rule
  • Given the value of a check bit and all but one of
    the message bits on which it depends, set the
    missing message bit to be the XOR of the check
    bit and its known message bits.
  • XOR the recovered message bit into all of its
    remaining check-bit neighbors.
  • Delete the message bit and all edges incident to
    it from the graph.
  • Decoding ends (successfully) when all edges are
    deleted. (A sketch of this peeling process follows.)
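
A minimal sketch of this peeling rule in Python; the representation (checks as index sets plus XOR values) and all names are illustrative assumptions, not from the slides:

```python
def peel_decode(k, checks, check_vals, received):
    """Recover erased message bits by iterated peeling.

    k          -- number of message bits
    checks     -- list of sets; checks[j] holds the indices of the
                  message bits XORed into check bit j
    check_vals -- list of check-bit values (0/1)
    received   -- dict {index: bit} of message bits that survived erasure
    """
    bits = dict(received)              # message bits recovered so far
    residual = [set(c) for c in checks]
    vals = list(check_vals)
    # Fold every already-known message bit into its checks.
    for i, b in received.items():
        for j, c in enumerate(residual):
            if i in c:
                vals[j] ^= b
                c.discard(i)
    progress = True
    while progress and len(bits) < k:
        progress = False
        for j, c in enumerate(residual):
            if len(c) == 1:            # check with one missing neighbor
                i = c.pop()
                bits[i] = vals[j]      # missing bit = XOR of check and known bits
                # XOR the newly recovered bit into its other checks,
                # deleting the corresponding edges.
                for jj, cc in enumerate(residual):
                    if i in cc:
                        vals[jj] ^= bits[i]
                        cc.discard(i)
                progress = True
    return bits if len(bits) == k else None   # None: decoding stalled
```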

11
Decoding Process
[Diagram: a received word a, ?, c, d, ?, f and its check bits (also partly erased); a check bit with exactly one missing neighbor recovers it via XOR.]
12
Decoding Process
[Diagram: the decoding example continued; the remaining erasures are filled in as new degree-one checks appear.]
13
Regular Graphs
[Diagram: a regular bipartite graph; every message bit has degree 3, every check bit has degree 6, and the edges are connected via a random permutation.]
14
3-6 Regular Graph Analysis
[Diagram: the and-or tree rooted at a message bit, alternating left (message) and right (check) levels.]
Let x be the probability that a message bit is not recovered at a given iteration, and let δ be the erasure probability. A degree-6 check bit can recover its remaining neighbor only if its other 5 neighbors are all recovered, which happens with probability (1-x)^5. A degree-3 message bit stays unrecovered if it was erased and neither of its other 2 check neighbors can recover it, so after one iteration:
Pr[not recovered] = δ(1 - (1-x)^5)^2
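
The recursion can be iterated numerically; a small sketch (the threshold it exhibits is a property of the iteration, not a figure from the slides):

```python
def density_evolution(delta, iters=200):
    """Iterate x <- delta * (1 - (1-x)**5)**2 for the (3,6)-regular graph.

    delta is the erasure probability; the fixed point is (near) zero
    exactly when decoding succeeds asymptotically.
    """
    x = delta                      # initially every erased bit is unrecovered
    for _ in range(iters):
        x = delta * (1 - (1 - x) ** 5) ** 2
    return x

# The iteration converges to 0 for small delta and stalls above a threshold:
for delta in (0.40, 0.42, 0.44):
    print(delta, density_evolution(delta))
```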
15
Decoding to Completion (sketch)
  • Most message bits are roots of trees.
  • Concentration results (edge exposure martingale)
    prove that all but a small fraction of message
    bits are decoded with high probability.
  • The remaining bits are decoded due to expansion.
    (The original graph is a good expander on small
    sets.)
  • If a set of size s and average degree a has more
    than as/2 neighbors, then a unique neighbor exists
    and decoding continues.

16
Efficiency
Rate 1/2, erasure probability 0.5, 1K packets.

Encoding time (sec):
size   | Reed-Solomon | Tornado
250 KB |          4.6 |    0.06
500 KB |           19 |    0.12
1 MB   |           93 |    0.26
2 MB   |          442 |    0.53
4 MB   |         1717 |    1.06
8 MB   |         6994 |    2.13
16 MB  |        30802 |    4.33

Decoding time (sec):
size   | Reed-Solomon | Tornado
250 KB |         2.06 |    0.06
500 KB |          8.4 |    0.09
1 MB   |         40.5 |    0.14
2 MB   |          199 |    0.19
4 MB   |          800 |    0.40
8 MB   |         3166 |    0.87
16 MB  |        13829 |    1.75
17
LT Codes
LT Codes - Michael Luby (FOCS 2002)
18
Rateless Codes
  • A different model of transmission.
  • The sender sends an infinite sequence of encoding
    symbols.
  • Time complexity: the average time to generate an
    encoding symbol.
  • Erasures are independent of content.
  • The receiver may decode once it has received
    enough symbols.
  • Reception efficiency.
  • The Digital Fountain approach.

19
Applications
  • Unreliable channels.
  • In Tornado codes a small rate implies big graphs
    and therefore a lot of memory (proportional to
    the size of the encoding).
  • Multi-source download.
  • Downloading from different servers requires no
    coordination.
  • Efficient exchange of data between users requires
    a small rate at the source.
  • Multicast without feedback (say, over the
    Internet).
  • Rateless codes are the natural notion.

20
Trivial Examples - Repetition
  • Each time unit, send a random symbol of the code.
  • Advantage: encoding complexity O(1).
  • Disadvantage: by the coupon-collector bound, about
    k + k ln(k/δ) code symbols are needed to cover all
    k content symbols with failure probability at
    most δ.
  • Example: k = 100,000, δ = 10^-6. Then
    ln(k/δ) = ln(10^11) ≈ 25, so roughly 25k extra
    symbols: a reception overhead of about 2400%
    (terrible).

21
Trivial Examples - Reed-Solomon
  • Each time unit, send an evaluation of the
    polynomial at a random point.
  • Advantage: decoding is possible once any k symbols
    are received.
  • Disadvantage: large time complexity for encoding
    and decoding.

22
Parameters of LT Codes
  • Encoding time complexity: O(ln n) per symbol.
  • Decoding time complexity: O(n ln n).
  • Reception overhead: asymptotically zero (unlike
    Tornado codes).
  • Failure probability: very small (smaller than
    Tornado).

23
LT encoding
[Diagram: degree 2 is drawn; 2 random content symbols are chosen and XORed to produce the encoding symbol.]
24
LT encoding
[Diagram: degree 1 is drawn; 1 random content symbol is chosen and copied as the encoding symbol.]
25
LT encoding
[Diagram: degree 4 is drawn; 4 random content symbols are chosen and XORed to produce the encoding symbol.]
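
A minimal LT-encoder sketch in Python under these rules; sample_degree stands in for whatever degree distribution is used (the Soliton distributions appear later), and symbols are modeled as ints XORed together:

```python
import random

def lt_encode_symbol(content, sample_degree, rng=random):
    """Generate one LT encoding symbol from the list `content`.

    Returns (neighbors, value): the chosen content indices and the
    XOR of the corresponding symbols (degree 1 is just a copy).
    `sample_degree` must return a degree in 1..len(content).
    """
    d = sample_degree()                             # draw a degree
    neighbors = rng.sample(range(len(content)), d)  # d distinct random symbols
    value = 0
    for i in neighbors:
        value ^= content[i]                         # XOR them together
    return neighbors, value
```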
26
LT encoding properties
  • Encoding symbols generated independently of each
    other
  • Any number of encoding symbols can be generated
    on the fly
  • Reception overhead independent of loss patterns
  • The success of the decoding process depends only
    on the degree distribution of received encoding
    symbols.
  • The degree distribution on received encoding
    symbols is the same as the degree distribution on
    generated encoding symbols.

27
LT decoding
Content (unknown)
  1. Collect enough encoding symbols and set up the
    graph between encoding symbols and the content
    symbols to be recovered.
  2. Identify an encoding symbol of degree 1. STOP if
    none exists.
  3. Copy the value of the encoding symbol into its
    unique neighbor, XOR the value of the newly
    recovered content symbol into its other
    encoding-symbol neighbors, and delete the edges
    emanating from the content symbol.
  4. Go to Step 2.
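
The same peeling idea specialized to LT decoding; a sketch assuming each received symbol is the (neighbor list, value) pair produced by the encoder sketch above:

```python
def lt_decode(k, symbols):
    """Peel LT symbols: `symbols` is a list of (neighbor_list, value) pairs.

    Returns the k recovered content symbols, or None if decoding stalls
    (the ripple empties) before everything is recovered.
    """
    residual = [[set(nbrs), val] for nbrs, val in symbols]
    content = {}
    progress = True
    while progress and len(content) < k:
        progress = False
        for nbrs, val in residual:
            if len(nbrs) == 1:              # Step 2: degree-1 symbol found
                i = next(iter(nbrs))
                content[i] = val            # Step 3: copy into unique neighbor
                for sym in residual:        # XOR it out of every other symbol
                    if i in sym[0]:
                        sym[0].discard(i)   # delete the edge
                        sym[1] ^= val
                progress = True             # Step 4: keep looping
    return [content[i] for i in range(k)] if len(content) == k else None
```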
28
Releasing an encoding symbol
[Diagram: an encoding symbol of degree i is released by the recovery of the x-th content symbol when i-2 of its neighbors lie among the first x-1 recovered content symbols, one neighbor is the x-th recovered symbol, and the one remaining neighbor lies among the k-x unrecovered content symbols; that remaining neighbor can now be recovered by the encoding symbol.]
29
The Ripple
  • Definition: at each decoding step, the ripple is
    the set of encoding symbols that have been
    released at some previous decoding step but whose
    one remaining content symbol has not yet been
    recovered.

[Diagram: x recovered and k-x unrecovered content symbols; each encoding symbol in the ripple points to one unrecovered content symbol, and two ripple symbols pointing to the same content symbol form a collision.]
30
Successful Decoding
  • Decoding succeeds iff the ripple never becomes
    empty.
  • If the ripple is small:
  • Small chance of encoding-symbol collisions ⇒
    small reception overhead.
  • Large risk of the ripple becoming empty due to
    random fluctuations.
  • If the ripple is large:
  • Large chance of encoding-symbol collisions ⇒
    large reception overhead.
  • Small risk of the ripple becoming empty due to
    random fluctuations.

31
LT codes idea
  • Control the release of encoding symbols over the
    entire decoding process so that the ripple is
    never empty but also never too large:
  • Very few encoding-symbol collisions.
  • Very little reception overhead.

32
Release probability
  • Definition: the release probability q(i,x) is the
    probability that an encoding symbol of degree i is
    released at decoding step x.
  • Proposition:
  • For i = 1: q(1,0) = 1, and q(1,x) = 0 for all
    x ≥ 1.
  • For i > 1 and x = i-1, ..., k-1:
    q(i,x) = C(x-1, i-2) · (k-x) / C(k, i),
    where C(n, m) is the binomial coefficient; this
    counts the neighbor configurations in the diagram
    on the next slide.
33
Release probability
[Diagram: as before, the degree-i encoding symbol has i-2 neighbors among the first x-1 recovered content symbols, one neighbor equal to the x-th recovered content symbol, and one neighbor among the k-x unrecovered symbols; i.e., the encoding symbol is released at decoding step x.]
34
Release distributions for specific degrees
[Plot: the release distributions q(i, x) over x for degrees i = 2, 3, 4, 10, 20, with k = 1000.]
35
Overall release probability
  • Definition: at each decoding step x, r(x) is the
    overall probability that an encoding symbol is
    released at decoding step x, with respect to a
    specific degree distribution p(·).
  • Proposition: r(x) = Σ_i p(i) · q(i,x).
36
Uniform release question
  • Question: is there a degree distribution such
    that the overall release distribution is uniform
    over x?
  • Why is this interesting?
  • One encoding symbol is released for each content
    symbol decoded.
  • The ripple will tend to stay small ⇒ minimal
    reception overhead.
  • The ripple will tend not to become empty ⇒
    decoding will succeed.

37
Uniform release answer: YES!
  • The Ideal Soliton Distribution:
    p(1) = 1/k, and p(i) = 1/(i(i-1)) for i = 2, ..., k.
38
Ideal Soliton Distribution
[Plot: the Ideal Soliton Distribution p(i) for k = 1000.]
39
A simple way to choose from the Ideal SD
Choose A uniformly from the interval (0,1).
If A ≥ 1/k, then degree = ⌈1/A⌉; else degree = 1.
[Diagram: the interval (0,1) marked at 1/k, ..., 1/6, 1/5, 1/4, 1/3, 1/2, 1; the subinterval (1/i, 1/(i-1)) has length 1/(i(i-1)) and maps to degree i (shown for degrees 2-6), while (0, 1/k) maps to degree 1.]
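
That rule as a runnable sketch; isd_degree is a hypothetical helper name, and it plugs directly into the lt_encode_symbol sketch above:

```python
import math
import random

def isd_degree(k, rng=random):
    """Sample a degree from the Ideal Soliton Distribution.

    P(1) = 1/k and P(i) = 1/(i*(i-1)) for i = 2..k, via the
    interval trick: A in (1/i, 1/(i-1)) maps to degree i.
    """
    a = rng.random()
    if a < 1.0 / k:
        return 1
    return min(k, math.ceil(1.0 / a))  # min() guards float edge cases near 1/k

# Sanity check: empirical degree frequencies vs. the formula.
from collections import Counter
k, n = 1000, 100_000
counts = Counter(isd_degree(k) for _ in range(n))
for i in (1, 2, 3):
    expected = 1 / k if i == 1 else 1 / (i * (i - 1))
    print(i, counts[i] / n, expected)
```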
40
Ideal SD theorem
  • Ideal SD Theorem: the overall release
    distribution is exactly uniform, i.e., r(x) = 1/k
    for all x = 0, ..., k-1.
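
A small numerical check of this theorem, treating the release-probability formula reconstructed on the earlier slide as the assumption being tested:

```python
from math import comb

def q(i, x, k):
    """Release probability for a degree-i symbol at decoding step x."""
    if i == 1:
        return 1.0 if x == 0 else 0.0
    if i - 1 <= x <= k - 1:
        return comb(x - 1, i - 2) * (k - x) / comb(k, i)
    return 0.0

def r(x, k):
    """Overall release probability under the Ideal Soliton Distribution."""
    p = lambda i: 1 / k if i == 1 else 1 / (i * (i - 1))
    return sum(p(i) * q(i, x, k) for i in range(1, k + 1))

k = 30
print(all(abs(r(x, k) - 1 / k) < 1e-12 for x in range(k)))  # True
```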

41
Overall release distribution for Ideal SD
[Plot: r(x) is flat at 1/k across all decoding steps x, for k = 1000.]
42
In expected value
  • Optimal recovery with respect to the Ideal SD:
  • Receive exactly k encoding symbols.
  • Exactly one encoding symbol is released before any
    decoding step; it recovers one content symbol.
  • At each decoding step a content symbol is
    recovered; it releases exactly one new encoding
    symbol, which in turn recovers exactly one more
    content symbol.
  • The ripple size is always exactly 1.
  • Performance analysis:
  • No reception overhead.
  • Average degree: H(k) ≈ ln k (the harmonic sum), so
    decoding performs about k ln k symbol operations.

43
When taking into account random fluctuations
  • The Ideal Soliton Distribution fails miserably.
  • Expected behavior is not equal to actual behavior,
    because of the variance.
  • The ripple is very likely to become empty.
  • Decoding fails with very high probability (even
    with high reception overhead).

44
Robust Soliton Distribution design
  • Need to ensure that the ripple never empties.
  • At the beginning of the decoding process:
  • The ISD ripple is not large enough to withstand
    random fluctuations.
  • RSD: boost p(1) to about c/√k so that the expected
    ripple size at the beginning is about c·√k.
  • At the end of the decoding process:
  • The ISD's expected rate of adding to the ripple is
    not large enough to compensate for collisions
    towards the end of the decoding process, when the
    ripple is large relative to the number of
    unrecovered content symbols.
  • RSD: boost p(i) for higher degrees i so that the
    expected ripple growth at the end of the decoding
    process is higher (a concrete sketch of the
    construction follows).
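
For concreteness, a sketch of the Robust Soliton Distribution along the lines of Luby's LT-codes paper: the ideal part ρ is mixed with a boost τ that adds mass at low degrees plus a spike near degree k/S, then normalized. Parameters c and δ are tuning knobs; the exact constants follow the paper's definition as best I recall it:

```python
import math

def robust_soliton(k, c=0.1, delta=0.5):
    """Robust Soliton Distribution (per Luby's LT-codes paper).

    Returns a list mu where mu[i] is the probability of degree i
    (index 0 unused). S = c * ln(k/delta) * sqrt(k) is the target
    ripple size.
    """
    S = c * math.log(k / delta) * math.sqrt(k)
    # Ideal part rho.
    rho = [0.0] * (k + 1)
    rho[1] = 1.0 / k
    for i in range(2, k + 1):
        rho[i] = 1.0 / (i * (i - 1))
    # Boost tau: extra mass at low degrees and a spike at ~k/S.
    tau = [0.0] * (k + 1)
    pivot = int(round(k / S))
    for i in range(1, min(pivot, k + 1)):
        tau[i] = S / (i * k)
    if 1 <= pivot <= k:
        tau[pivot] = S * math.log(S / delta) / k
    beta = sum(rho) + sum(tau)          # normalizing constant
    return [(r + t) / beta for r, t in zip(rho, tau)]

mu = robust_soliton(10_000)
print(mu[1], max(range(1, 10_001), key=lambda i: mu[i]))
```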

45
LT Codes: Bottom Line
  • Using the Robust Soliton Distribution:
  • The number of symbols needed to recover the data
    with probability 1-δ is k + O(√k · ln²(k/δ)).
  • The average degree of an encoding symbol is
    O(ln(k/δ)).

46
Online Codes
We are out of time
  • Online Codes
  • Petar Maymounkov