Title: Source Coding
1 Source Coding
- Data Compression
- June 2009
- A.J. Han Vinck
2 DATA COMPRESSION
- NO LOSS of information: exact reproduction
- (low compression ratio, e.g. 1:4)
- general problem statement:
- find a means for spending as little time as possible on packing as much data as possible into as little space as possible, with no loss of information
3 GENERAL IDEA
- represent likely symbols with short-length binary words, where "likely" is derived from
- - prediction of the next symbol in the source output:
- q-ue  q-ua  q-ui  q-uo
- q → ? (only a few continuations are likely, so two bits suffice)
- q-00  q-01  q-10  q-11
- - context between the source symbols, words, sounds; context in pictures
4 Why compress?
- - Lossless compression often reduces file size by 40 to 80%.
- - More economical to transport and store
- - Most Internet content is compressed for transmission
- - Compression before encryption can make code-breaking difficult
- - Conserve battery power and storage space on mobile devices
- - Compression and decompression can be hardwired
5 Some history
- 1948 Shannon-Fano coding
- 1952 Huffman coding
- reduced redundancy in symbol coding
- demonstrably optimal fixed-to-variable length symbol coding
- 1977 Lempel-Ziv coding
- first major dictionary method
- maps repeated word patterns to code words
6 MODEL KNOWLEDGE
- best performance → exact prediction!
- exact prediction → no new information!
- no new information → no message to transmit
7 Example: no prediction
- source → coder C
- message:  0   1   2   3   4   5   6   7
- code:    000 001 010 011 100 101 110 111
- representation length: 3 bits/message
8 Example: with prediction
- ENCODE THE DIFFERENCE
- difference:   -1    0   +1
- probability: 0.25  0.5  0.25
- code:         00    1    01
- (figure: source → predictor P → coder C → code)
- average length $L = 0.25 \cdot 2 + 0.5 \cdot 1 + 0.25 \cdot 2 = 1.5$ bit/difference symbol
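A minimal Python sketch of this difference coder (the sample sequence, and the assumption that successive samples differ by at most 1, are ours for illustration):

```python
# Encode differences between consecutive samples with the slide's code
# {-1: '00', 0: '1', +1: '01'}; the likelier difference gets the shorter word.
DIFF_CODE = {-1: '00', 0: '1', 1: '01'}

def encode_differences(samples):
    """Concatenate codewords for the differences s[k] - s[k-1]."""
    return ''.join(DIFF_CODE[cur - prev]
                   for prev, cur in zip(samples, samples[1:]))

# 5 differences (0, +1, 0, -1, 0) cost 7 bits here, i.e. 1.4 bit/difference.
print(encode_differences([3, 3, 4, 4, 3, 3]))  # -> '1011001'
```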
9 binary tree codes
the relation between source symbols and codewords:
A → 11, B → 10, C → 0
(figure: code tree with branches labeled 1 and 0)
General Properties
- every node has two successors: leaves and/or nodes
- the path to a leaf gives the connected codeword
- source letters are only assigned to leaves,
i.e. no codeword is the prefix of another codeword
10 tree codes
Tree codes are prefix codes and uniquely decodable, i.e. a string of codewords can be uniquely decomposed into the individual codewords. Non-prefix codes may also be uniquely decodable; example: A → 1, B → 10, C → 100 (every codeword starts with a 1, so the 1s mark the codeword boundaries).
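A small Python sketch of decoding with the prefix code of the previous slide (function and variable names are ours):

```python
# Greedy left-to-right decoding with the prefix code A -> 11, B -> 10, C -> 0.
# Because no codeword is a prefix of another, the first match is always right.
DECODE = {'11': 'A', '10': 'B', '0': 'C'}

def decode(bits):
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in DECODE:      # a complete codeword has been read
            out.append(DECODE[buf])
            buf = ''
    assert buf == '', 'bit string ended in the middle of a codeword'
    return ''.join(out)

print(decode('11100'))  # -> 'ABC'
```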
11 binary tree codes
The average codeword length is $L = \sum_i P(i) \, n_i$.
Property: an optimal code has minimum L.
Homework: show that $L = \sum (\text{node probabilities})$, summed over the intermediate nodes of the code tree.
12 Tree encoding (1)
- for data / text the compression should be lossless → no errors
- STEP 1: assign messages to nodes of a code tree (figure); this gives the lengths below
- symbol  P(i)    length n_i   n_i · P(i)
- a       0.5        3           1.5
- b       0.25       3           0.75
- c       0.125      2           0.25
- d       0.0625     2           0.125
- e       0.0625     2           0.125
- AVERAGE CODEWORD LENGTH: 2.75 bit/source symbol
13 Tree encoding (2)
- STEP 2: OPTIMIZE THE ASSIGNMENT (MINIMIZE the average length)
- symbol  P(i)    codeword   n_i · P(i)
- e       0.0625    1111        0.25
- d       0.0625    1110        0.25
- c       0.125     110         0.375
- b       0.25      10          0.5
- a       0.5       0           0.5
- AVERAGE CODEWORD LENGTH: 1.875 bit/source symbol!
14 Kraft inequality
- Prefix codes with M code words satisfy the Kraft inequality
- $\sum_{k=1}^{M} 2^{-n_k} \le 1$,
- where $n_k$ is the code word length for message k.
- Proof: let $n_M$ be the longest codeword length;
- then, in a code tree of depth $n_M$, the terminal node for message k eliminates $2^{n_M - n_k}$
- from the total number $2^{n_M}$ of available nodes at depth $n_M$.
15 example
(figure: code tree of depth 4 — a terminal node at depth 3 eliminates 2 nodes, at depth 2 eliminates 4, at depth 1 eliminates 8)
Homework: can we replace "≤" by "=" in the Kraft inequality?
16 Kraft inequality
- Suppose that the length specification of M code words satisfies the Kraft inequality:
- $\sum_{i=1}^{n} N_i \, 2^{-i} \le 1$,
- where $N_i$ is the number of code words of length i, and n is the largest length.
- Then, we can construct a prefix code with the specified lengths.
- Note that multiplying the inequality by $2^j$ gives, for every level $j \le n$, $\sum_{i=1}^{j} N_i \, 2^{j-i} \le 2^j$.
17 Kraft inequality
- From this, $N_j \le 2^j - N_1 2^{j-1} - N_2 2^{j-2} - \dots - N_{j-1} \cdot 2$.
- Interpretation: at every level fewer nodes are used than are available!
- E.g. for level 3, we have $2^3 = 8$ nodes minus the nodes cancelled by levels 1 and 2.
18 performance
- Suppose that we select the code word lengths as $n_k = \lceil -\log_2 P(k) \rceil$.
- Then, a prefix code exists, since
- $\sum_k 2^{-n_k} \le \sum_k 2^{\log_2 P(k)} = \sum_k P(k) = 1$,
- with average length
- $L = \sum_k P(k) \, n_k < \sum_k P(k)\,(-\log_2 P(k) + 1) = H(U) + 1$.
19 Lower bound for prefix codes
- We show that $L \ge H(U)$.
- We write $H(U) - L = \sum_k P(k) \log_2 \frac{2^{-n_k}}{P(k)} \le \log_2 \sum_k 2^{-n_k} \le 0$,
- where the first step uses Jensen's inequality and the second the Kraft inequality.
- Equality can be established for $P(k) = 2^{-n_k}$.
20 Huffman coding (1)
The average codeword length is $L = \sum_i P(i) \, n_i$.
Property: an optimal code has minimum L.
Property: for an optimal code the two least probable codewords
- have the same length,
- are the longest (by manipulating the assignment),
- differ only in the last code digit.
Homework: proof.
21 Huffman Coding optimality (2)
Given a code C with average length L and M symbols, construct a code C′
(for C′ the codewords for the two least probable symbols differ only in the last digit):
1. replace the 2 least probable symbols $c_M$ and $c_{M-1}$ in C by a single symbol $c'_{M-1}$ with probability $P'(M-1) = P(M) + P(M-1)$;
2. to minimize L, we then have to minimize the average length L′ of C′.
22 Huffman Coding (JPEG, MPEG, MP3)
- 1. take together the two smallest probabilities P(i) + P(j)
- 2. replace symbols i and j by a new symbol with this probability
- 3. go to 1, until one symbol of probability 1.00 is left
- Example (merges: 0.1 + 0.1 = 0.2; 0.2 + 0.25 = 0.45; 0.3 + 0.25 = 0.55; 0.45 + 0.55 = 1.00):
- probability  code
- 0.3          11
- 0.25         10
- 0.25         01
- 0.1          001
- 0.1          000
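A compact Python sketch of this repeated-merge procedure (heap-based; symbol names and tie-breaking are our choices), run on the example probabilities:

```python
# Huffman: repeatedly merge the two smallest probabilities (steps 1-2 above),
# then read the codewords off the resulting tree (one bit per branch).
import heapq
from itertools import count

def huffman(probs):
    order = count()                  # tie-breaker so the heap never compares trees
    heap = [(p, next(order), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)    # the two smallest probabilities
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(order), (left, right)))
    code = {}
    def walk(node, prefix):          # leaves get the accumulated path as codeword
        if isinstance(node, tuple):
            walk(node[0], prefix + '0')
            walk(node[1], prefix + '1')
        else:
            code[node] = prefix or '0'
    walk(heap[0][2], '')
    return code

print(huffman({'u1': 0.3, 'u2': 0.25, 'u3': 0.25, 'u4': 0.1, 'u5': 0.1}))
# one optimal assignment: lengths 2, 2, 2, 3, 3 -> average 2.2 bit/symbol
```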
23 Properties
- ADVANTAGES
- uniquely decodable code
- smallest average codeword length
- DISADVANTAGES
- LARGE tables give complexity
- variable word length
- sensitive to channel errors
24 Conclusion Huffman
- Tree coding (Huffman) is not universal!
- it is only valid for one particular type of source!
- For COMPUTER DATA, data reduction should be
- lossless → no errors at reproduction
- universal → effective for different types of data
25 Some comments
- The Huffman code is not unique, but the efficiency is the same!
- For code alphabets larger than binary a small modification is necessary (where?)
26 Performance Huffman
- Using the probability distribution for the source U, a prefix code exists with average length
- $L < H(U) + 1$.
- Since Huffman is optimum, this bound is also true for Huffman codes.
- Problem if $H(U) \to 0$!
- Improvements can be made when we take J symbols together; then
- $J \cdot H(U) \le L_J < J \cdot H(U) + 1$
- and
- $H(U) \le L = L_J / J < H(U) + 1/J$.
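A worked instance of why blocking helps (the 0.1-bit entropy is an assumed value for illustration, not from the slides): a binary symbol coded on its own always costs at least 1 bit, however small H(U) is, while blocks of J = 10 symbols already guarantee

```latex
% assumed: a binary source with H(U) = 0.1 bit
H(U) \;\le\; \frac{L_J}{J} \;<\; H(U) + \frac{1}{J} = 0.1 + \frac{1}{10} = 0.2 \ \text{bit/symbol},
\quad \text{versus } L \ge 1 \text{ bit/symbol without blocking.}
```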
27 Example
28 Example
- s1: Pr(s1) = 0.1
- s2: Pr(s2) = 0.25
- s3: Pr(s3) = 0.2
- s4: Pr(s4) = 0.45
- (figure: Huffman tree with intermediate node probabilities 0.3, 0.55 and root 1)
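Reading the merge order off the tree (0.1 + 0.2 = 0.3, 0.3 + 0.25 = 0.55, 0.55 + 0.45 = 1) gives codeword lengths 3, 2, 3, 1 for s1…s4, so the bound of slide 26 can be checked directly:

```latex
L = 0.45 \cdot 1 + 0.25 \cdot 2 + 0.1 \cdot 3 + 0.2 \cdot 3 = 1.85 \ \text{bit/symbol},
\qquad
H(U) = -\sum_i \Pr(s_i)\,\log_2 \Pr(s_i) \approx 1.815,
```

so indeed $H(U) \le L < H(U) + 1$.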
29 Encoding idea: Lempel-Ziv-Welch (LZW)
Assume we have just read a segment w from the text, and a is the next symbol.
- If wa is not in the dictionary:
- Write the index of w in the output file.
- Add wa to the dictionary, and set w ← a.
- If wa is in the dictionary:
- Process the next symbol with segment wa.
30 Encoding example
- initial dictionary: address 0 = a, address 1 = b, address 2 = c
- String: a a b a a c a b c a b c b
- segment read                                        output   update
- a a          aa not in dictionary: output loc(a)       0     aa → 3
- a a b        ab not in dictionary: output loc(a)       0     ab → 4
- a a b a      ba not in dictionary: output loc(b)       1     ba → 5
- a a b a a c  aa in dictionary, aac not: output loc(aa) 3     aac → 6
- a a b a a c a          ca not: output loc(c)           2     ca → 7
- a a b a a c a b c      ab in dictionary, abc not       4     abc → 8
- a a b a a c a b c a b  ca in dictionary, cab not       7     cab → 9
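A minimal LZW encoder in Python following the rule of slide 29 (names are ours); on the example string it reproduces the outputs above, plus the final codes for the tail:

```python
# LZW encoding: grow segment w while w+a is in the dictionary; otherwise
# emit the index of w, add w+a at the next address, and restart from a.
def lzw_encode(text, alphabet=('a', 'b', 'c')):
    dictionary = {s: i for i, s in enumerate(alphabet)}
    w, out = '', []
    for a in text:
        if w + a in dictionary:
            w += a                                  # keep extending the segment
        else:
            out.append(dictionary[w])               # write the index of w
            dictionary[w + a] = len(dictionary)     # add wa at the next address
            w = a
    out.append(dictionary[w])                       # flush the last segment
    return out

print(lzw_encode('aabaacabcabcb'))  # -> [0, 0, 1, 3, 2, 4, 7, 1, 2]
```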
31 UNIVERSAL (LZW) decoder
- 1. Start with the basic symbol set.
- 2. Read a code c from the compressed file.
- - The address c in the dictionary determines the segment w.
- - Write w in the output file.
- 3. Add wa to the dictionary, where a is the first letter of the next segment.
32 Decoding example
- initial dictionary: address 0 = a, address 1 = b, address 2 = c
- decoded so far     input   update (completed by the first letter of the next segment)
- a                    0     ?
- a a                  0     aa → 3
- a a b                1     ab → 4
- a a b aa             3     ba → 5
- a a b aa c           2     aac → 6
- a a b aa c ab        4     ca → 7
- a a b aa c ab ca     7     abc → 8
33 Conclusion (LZW)
- IDEA: TRY to copy long parts of the source output
- if the dictionary overflows:
- throw the least-recently used entry away, in both encoder and decoder
- universal
- lossless
Homework: encode/decode the sequence 1001010110011... Try to solve the problem that occurs!
34 Some history
- used in GIF, TIFF, the V.42bis modem compression standard, PostScript Level 2
- 1977 LZ algorithm published by Abraham Lempel and Jacob Ziv
- 1984 LZ-Welch algorithm published in IEEE Computer
- Sperry patent transferred to Unisys (1986)
- GIF file format required use of the LZW algorithm
35 references
J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression," IEEE Transactions on Information Theory, May 1977.
T. Welch, "A Technique for High-Performance Data Compression," IEEE Computer, June 1984.
36 Summary of operations
- ENCODING   output     update   location
- W1 A       loc(W1)    W1A      N
- W2 F       loc(W2)    W2F      N+1
- W3 X       loc(W3)    W3X      N+2
- DECODING   input      update   location
-            loc(W1)    W1 ?
-            loc(W2)    W2 ?     W1A → N
-            loc(W3)    W3 ?     W2F → N+1
- (? = the still-unknown next letter; each entry is completed by the first letter of the following segment)
37 Problem and solution
- ENCODING        output    update   location
- W1 A            loc(W1)   W1A      N
- W2 = W1A, F     loc(W2)   W2F      N+1
- DECODING        input             update   location
-                 loc(W1)           W1 ?
-                 loc(W2 = W1A)              W1A → N
- Here the decoder receives loc(W2) = N before entry N is completed. Since W2 = W1A, the ? can be solved: W2 is updated at location N as W1A, so the unknown letter A is the first letter of W1.
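A matching LZW decoder sketch in Python, including the fix for exactly this case (an index that points at the entry still being built must decode to W1 plus the first letter of W1; names are ours):

```python
# LZW decoding (slides 31-32). The special case below is the W2 = W1A
# situation of this slide: the received index is not in the dictionary yet.
def lzw_decode(codes, alphabet=('a', 'b', 'c')):
    dictionary = {i: s for i, s in enumerate(alphabet)}
    w = dictionary[codes[0]]
    out = [w]
    for c in codes[1:]:
        if c in dictionary:
            entry = dictionary[c]
        else:                       # index just created by the encoder:
            entry = w + w[0]        # W2 = W1 A  implies  A = first letter of W1
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[0]   # complete the open entry
        w = entry
    return ''.join(out)

print(lzw_decode([0, 0, 1, 3, 2, 4, 7, 1, 2]))  # -> 'aabaacabcabcb'
```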
38 Shannon-Fano coding
Suppose that we have a source with M symbols. Every symbol $u_i$ occurs with probability $P(u_i)$. We try to encode symbol $u_i$ with $n_i = \lceil -\log_2 P(u_i) \rceil$ bits. Then the average representation length is $L = \sum_i P(u_i) \, n_i < H(U) + 1$.
39 code realization
Define $Q(u_i) = \sum_{k < i} P(u_k)$, the cumulative probability of the symbols preceding $u_i$ (so $Q(u_0) = 0$), with the symbols ordered by non-increasing probability.
40 continued
The codeword for $u_i$ is the binary expansion of $Q(u_i)$ of length $n_i$.
Property: the code is a prefix code with the promised length.
Proof: let $i \ge k+1$; then $Q(u_i) - Q(u_k) \ge P(u_k) \ge 2^{-n_k}$.
41 continued
- The binary radix-2 representations of $Q(u_i)$ and $Q(u_k)$ therefore differ at least once within the first $n_k$ positions.
- The codewords for $u_i$ and $u_k$ have lengths $n_i \ge n_k$ (the probabilities are non-increasing).
- Hence the truncated representation of $Q(u_k)$ can never be a prefix of the codeword for $u_i$.
42 example
P(u0, u1, u2, u3, u4, u5, u6, u7) = (5/16, 3/16, 1/8, 1/8, 3/32, 1/16, 1/16, 1/32)
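A Python sketch of the construction of slides 38-40 on this example (exact arithmetic with Fraction; names are ours):

```python
# Shannon-Fano code: n_i = ceil(-log2 P(u_i)); the codeword is the first
# n_i bits of the binary expansion of Q(u_i) = P(u_0) + ... + P(u_{i-1}).
from fractions import Fraction
import math

def shannon_code(probs):
    code, Q = [], Fraction(0)
    for p in probs:
        n = math.ceil(-math.log2(p))    # the promised length
        bits, q = '', Q
        for _ in range(n):              # n bits of the binary expansion of Q
            q *= 2
            bits += '1' if q >= 1 else '0'
            q -= int(q)
        code.append(bits)
        Q += p                          # cumulative probability
    return code

probs = [Fraction(5, 16), Fraction(3, 16), Fraction(1, 8), Fraction(1, 8),
         Fraction(3, 32), Fraction(1, 16), Fraction(1, 16), Fraction(1, 32)]
print(shannon_code(probs))
# -> ['00', '010', '100', '101', '1100', '1101', '1110', '11111']
```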
43 Enumerative coding
Suppose there are pn ones in a long sequence of length n. According to Shannon we need about $n \, h(p)$ bits to represent every such sequence, where h(p) is the binary entropy function. How do we realize the encoding and decoding?
44 Enumerative coding
- Solution: use lexicographical ordering.
- Example: 2 ones in a sequence of length 6
- index   sequence
- 14      1 1 0 0 0 0
- ...
- 9       0 1 1 0 0 0
- 8       0 1 0 1 0 0
- 7       0 1 0 0 1 0
- 6       0 1 0 0 0 1
- 5       0 0 1 1 0 0
- 4       0 0 1 0 1 0
- 3       0 0 1 0 0 1
- 2       0 0 0 1 1 0
- 1       0 0 0 1 0 1
- 0       0 0 0 0 1 1
Encode: map a sequence to the number of sequences with lower lexicographical order. Decode: reconstruct the sequence from this index.
45 Enumerative encoding
Example: the index for sequence 0 1 0 1 0 0 is 8:
- there are 6 sequences with prefix 0 0 (length-4 remainder with 2 ones): all lexicographically smaller;
- there are 2 sequences with prefix 0 1 0 0 (length-2 remainder with 1 one): also smaller;
- index = 6 + 2 = 8.
46 Enumerative decoding
Given a sequence of length 6 with 2 ones: what is the sequence for index 8?
- There are 10 sequences with prefix 0 (length 5, 2 ones); since 8 < 10, the sequence starts with 0.
- There are 6 sequences with prefix 00 (length 4, 2 ones); since 8 ≥ 6, the sequence starts with 01. (01 → 6)
- There are 3 sequences with prefix 010 (length 3, 1 one); starting with 011 would give index ≥ 6 + 3 = 9, hence the sequence starts with 010 and not 011. (010 → 6)
- There are 2 sequences with prefix 0100 (length 2, 1 one); since 8 ≥ 6 + 2, the sequence starts with 0101, and the rest is zeros: 010100 → 8.
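A Python sketch of both directions (math.comb counts the sequences exactly as in the two walk-throughs above; names are ours):

```python
# Enumerative coding for fixed-weight binary words: a 1 at position j makes
# all words that put a 0 there (with the same prefix) lexicographically
# smaller, and there are comb(remaining positions, remaining ones) of them.
from math import comb

def enum_encode(word):
    ones, index = word.count(1), 0
    for j, bit in enumerate(word):
        if bit == 1:
            index += comb(len(word) - 1 - j, ones)  # words with a 0 here
            ones -= 1
    return index

def enum_decode(index, n, ones):
    word = []
    for j in range(n):
        c = comb(n - 1 - j, ones)       # words that put a 0 at position j
        if index < c:
            word.append(0)
        else:
            word.append(1)
            index -= c
            ones -= 1
    return word

print(enum_encode([0, 1, 0, 1, 0, 0]))  # -> 8, as on slide 45
print(enum_decode(8, 6, 2))             # -> [0, 1, 0, 1, 0, 0]
```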
47 Enumerative encoding performance
The number of bits per n source outputs for pn ones is $\lceil \log_2 \binom{n}{pn} \rceil \approx n \, h(p)$.
Asymptotically the efficiency → h(p) bits per source output.
Note added: for words of length n,
- encode first the number of ones in the block with $\log_2(n+1)$ bits,
- then do the enumerative encoding with h(p) bits per source output.
The contribution $(\log_2(n+1))/n$ disappears for large n!
48 David A. Huffman
In 1951 David A. Huffman and his classmates in an
electrical engineering graduate course on
information theory were given the choice of a
term paper or a final exam. For the term paper,
Huffman's professor, Robert M. Fano, had assigned
what at first appeared to be a simple problem.
Students were asked to find the most efficient
method of representing numbers, letters or other
symbols using a binary code. Besides being a
nimble intellectual exercise, finding such a code
would enable information to be compressed for
transmission over a computer network or for
storage in a computer's memory. Huffman worked
on the problem for months, developing a number of
approaches, but none that he could prove to be
the most efficient. Finally, he despaired of ever
reaching a solution and decided to start studying
for the final. Just as he was throwing his notes
in the garbage, the solution came to him. "It
was the most singular moment of my life," Huffman
says. "There was the absolute lightning of
sudden realization."
49 The inventors
LZW (Lempel-Ziv-Welch) is an implementation of a
lossless data compression algorithm created by
Lempel and Ziv. It was published by Terry Welch
in 1984 as an improved version of the LZ78
dictionary coding algorithm developed by Abraham
Lempel and Jacob Ziv.
50 Intuitive Lempel-Ziv (be careful!)
- A source generates independent symbols 0 and 1 with
- $p(1) = 1 - p(0) = p$.
- Then:
- there are roughly $2^{nh(p)}$ typical sequences of length n,
- every typical sequence has probability $p(t) \approx 2^{-nh(p)}$.
- We expect that in a binary sequence of length $N \approx 2^{nh(p)}$, every typical sequence occurs once
- (with very high probability).
51 Intuitive Lempel-Ziv (be careful!)
- Idea for the algorithm:
- a. Start with an initial sequence of length N and generate a string of length n
- (which is typical with high probability).
- b. Transmit its starting position in the string of length N with $\log_2 N$ bits;
- if it is not present, transmit the n bits as they occur.
- c. Delete the first n bits of the initial sequence and append the newly generated n bits.
- Go back to a, unless the end of the source sequence is reached.
52 Intuitive Lempel-Ziv (be careful!)
- EFFICIENCY: the new n bits are typical with probability $1 - \varepsilon$, where $\varepsilon \to 0$
- - if non-typical, transmit 0, followed by the n bits
- - if typical, transmit 1, followed by $\log_2 N$ bits for the position in the block
- hence, the average number of bits per source output is
- $\approx \varepsilon + (1 - \varepsilon)\frac{\log_2 N}{n} + \frac{1}{n} \to h(p)$ for large n and $\varepsilon \to 0$!
- NOTE:
- - if p changes, we can adapt N and n, or choose some worst-case value in advance
- - the typical words can also be stored in a memory; the algorithm then outputs the location of the new word. Every time a new word is entered into the memory, one word is deleted.
- Why is this not a good solution?