Variable-Length Codes: Huffman Codes

1
Chapter 4
  • Variable-Length Codes: Huffman Codes

2
Outline
  • 4.1 Introduction
  • 4.2 Unique Decoding
  • 4.3 Instantaneous Codes
  • 4.4 Construction of Instantaneous Codes
  • 4.5 The Kraft Inequality
  • 4.6 Huffman Codes

3
4.1 Introduction
  • Consider the problem of efficient coding of messages to be sent over a noiseless channel.
  • Goal: maximize the number of messages that can be sent in a given period of time;
  • equivalently, transmit a message in the shortest possible time;
  • i.e., make the codewords as short as possible.

4
4.2 Unique Decoding
  • Source symbols (alphabet): s1, . . . , sq
  • Code alphabet: C1, C2, . . . , Cr
  • X is a random variable
  • X ∈ {s1, . . . , sq} with probabilities p1, . . . , pq
  • X is observed over and over again, i.e., it generates a sequence of symbols from s1, . . . , sq
  • Ex: s1 → 000
  •     s2 → 111

[Diagram: the encoder maps each source symbol si to a string of code symbols Ci Cj . . . Ck]
5
  • The collection of all codewords is called a code.
  • Our objective: minimize the average codeword length.
  • Unique decodability: the received message must have a single, unique possible interpretation.
  • Ex: Source alphabet {s1, s2, s3, s4}, code alphabet {0, 1}
  •     s1 → 0
  •     s2 → 01
  •     s3 → 11
  •     s4 → 00
  • Then 0011 can be read as s4 s3 or as s1 s1 s3,
  • so this code does not satisfy unique decodability.
6
  • Ex:
  •     s1 → 0
  •     s2 → 010
  •     s3 → 01
  •     s4 → 10
  • Then 010 can be read as s1 s4, as s2, or as s3 s1, so it also does not satisfy unique decodability.
  • Ex:
  •     s1 → 0
  •     s2 → 01
  •     s3 → 011
  •     s4 → 111
  • This is a uniquely decodable code.
7
  • Definition:
  • The nth extension of a code is simply all possible concatenations of n symbols of the original source code.
  • For unique decodability, no two encoded concatenations can be the same, even for different extensions.
  • Every finite sequence of code characters then corresponds to at most one message:
  • every distinct sequence of source symbols has a corresponding encoded sequence that is unique. (A brute-force test of this definition is sketched below.)
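This definition can be tested by brute force: encode every sequence of up to n source symbols and look for two distinct sequences with the same encoding. A minimal Python sketch (the bound n and the function name are this sketch's own; passing up to a finite n is necessary evidence but not conclusive in general), reusing the codes from slides 5 and 6:

```python
from itertools import product

def is_uniquely_decodable_upto(code, n):
    """Check extensions up to order n: no two distinct concatenations
    of codewords may produce the same encoded string."""
    seen = {}
    for length in range(1, n + 1):
        for symbols in product(code, repeat=length):
            encoded = "".join(code[s] for s in symbols)
            if encoded in seen and seen[encoded] != symbols:
                return False, encoded   # two different parses collide
            seen[encoded] = symbols
    return True, None

# Slide 5's code: 00 parses as s4 and as s1 s1 (and 0011 as s4 s3 or s1 s1 s3).
bad = {"s1": "0", "s2": "01", "s3": "11", "s4": "00"}
print(is_uniquely_decodable_upto(bad, 4))   # (False, '00')

good = {"s1": "0", "s2": "01", "s3": "011", "s4": "111"}
print(is_uniquely_decodable_upto(good, 4))  # (True, None) up to this bound
```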

8
4.3 Instantaneous Codes
  • Decision (decoding) tree for the code s1 → 0, s2 → 10, s3 → 110, s4 → 111:

    Initial state
    ├─ 0 → s1
    └─ 1 ─┬─ 0 → s2
          └─ 1 ─┬─ 0 → s3
                └─ 1 → s4
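The decoding tree maps directly onto nested dictionaries, and decoding then examines each received bit exactly once, as the next slide notes. A minimal sketch, assuming the code shown above (helper names are this sketch's own):

```python
def build_tree(code):
    """Build a binary decision tree (nested dicts) from a prefix code."""
    root = {}
    for symbol, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = symbol   # terminal state: a source symbol
    return root

def decode(tree, bits):
    """Walk the tree bit by bit, emitting a symbol at each terminal state."""
    out, node = [], tree
    for bit in bits:
        node = node[bit]
        if not isinstance(node, dict):   # reached a leaf
            out.append(node)
            node = tree                  # back to the initial state
    return out

code = {"s1": "0", "s2": "10", "s3": "110", "s4": "111"}
print(decode(build_tree(code), "0110111"))  # ['s1', 's3', 's4']
```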
9
  • Note that each bit of the received stream is examined only once, and the terminal states of this tree are the four source symbols s1, s2, s3 and s4.
  • Definition: A code is instantaneous if it is decodable without lookahead (i.e., a word can be recognized as soon as it is complete).
  • When a complete codeword is received, the receiver knows this immediately and does not have to look further before deciding which source symbol was received.
  • A code is instantaneous iff no codeword si is a prefix of another codeword sj (a mechanical check is sketched below).
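The prefix condition is easy to verify. A minimal sketch (after sorting, any codeword that is a prefix of another is immediately followed by one of its extensions, so comparing neighbors suffices):

```python
def is_instantaneous(codewords):
    """A code is instantaneous iff no codeword is a prefix of another."""
    words = sorted(codewords)
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))

print(is_instantaneous(["0", "10", "110", "111"]))  # True
print(is_instantaneous(["0", "01", "011", "111"]))  # False: 0 prefixes 01
```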

10
  • The existence of a decoding tree ⇔ instantaneous decodability.
  • Ex: Let n be a positive integer. A comma code is a code with codewords
  •     1, 01, 001, 0001, . . . , 0^(n-1)1, 0^n
  • The symbol 1 acts as a comma marking the end of a codeword.
  • Because a comma code is prefix-free, it is an instantaneous code.
11
  • Ex: s1 → 0, s2 → 01, s3 → 011, s4 → 111
  • Not an instantaneous code, but it is still a uniquely decodable code.
  • E.g., 01111111 must be read to the end before the first word can be resolved; it parses as s2 s4 s4.
  • So it is better to use a comma code, e.g., s1 → 1, s2 → 01, s3 → 001, s4 → 000.
  • I.C. ⊂ U.D., and an instantaneous code is better than a merely uniquely decodable one.
12
4.4 Construction of Instantaneous Codes
  • Given five symbols si in the source alphabet S.
  • C1: s1 → 0, s2 → 10, s3 → 110, s4 → 1110, s5 → 1111
  • C2: s1 → 00, s2 → 01, s3 → 10, s4 → 110, s5 → 111
  • Both C1 and C2 are instantaneous codes; which one is better?
  • Answer: it depends on the frequencies of occurrence of the symbols (see the sketch below).
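The comparison becomes concrete once probabilities are fixed. A minimal sketch (the two probability vectors are assumptions for illustration, not from the slides):

```python
def average_length(codewords, probs):
    """Expected codeword length: sum of p_i * l_i."""
    return sum(p * len(w) for w, p in zip(codewords, probs))

c1 = ["0", "10", "110", "1110", "1111"]
c2 = ["00", "01", "10", "110", "111"]

skewed = [0.5, 0.3, 0.1, 0.05, 0.05]   # assumed: very uneven frequencies
print(average_length(c1, skewed), average_length(c2, skewed))    # 1.8 vs 2.1 -> C1 wins
uniform = [0.2] * 5                     # assumed: equal frequencies
print(average_length(c1, uniform), average_length(c2, uniform))  # 2.8 vs 2.4 -> C2 wins
```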
13
4.5 The Kraft Inequality
  • Theorem: A necessary and sufficient condition for the existence of an instantaneous code S of q symbols si (i = 1, …, q) with encoded words of lengths l1 ≤ l2 ≤ … ≤ lq is
  •     Σ_{i=1}^{q} r^(−l_i) ≤ 1
  • where r is the radix (number of symbols) of the alphabet of the encoded symbols.

14
  • Thm: An instantaneous code with word lengths n1, n2, . . ., nM exists iff
  •     Σ_{i=1}^{M} D^(−n_i) ≤ 1
  • where D is the size of the code alphabet.
  • (⇒) For simplicity, we assume D = 2 and induct on the height H of the decoding tree.
  • (1) When H = 1: n1 = 1 and n2 = 1 (s1 → 0, s2 → 1) is OK for a tree of height 1, and 2^(−1) + 2^(−1) = 1 ≤ 1.
15
  • (2) Suppose every tree of height H ≤ h is OK, i.e., its Kraft sum K satisfies K ≤ 1.
  • When H = h + 1, the two subtrees below the root are trees of height ≤ h with Kraft sums K′ ≤ 1 and K′′ ≤ 1; each codeword gains one leading digit, so K = (1/2)K′ + (1/2)K′′ ≤ 1.
  • By the induction method, the inequality is true for every height.
16
  • Another proof (⇒): Let C = {c1, c2, …, cM} with codeword lengths l1, …, lM, and let L = max li.
  • If x = ci y1 y2 … y(L−li), where the yj are any code symbols, then x cannot be in C, because ci is a prefix of x. For each ci, x has r^(L−li) possibilities.
  • Since no ci is a prefix of another, these sets of length-L words are disjoint, so counting words of length L gives Σ_i r^(L−li) ≤ r^L, i.e., Σ_i r^(−li) ≤ 1.
17
  • (⇐) If there are ω1 words of length 1, then ω1 ≤ r.
  • If there are ω2 words of length 2, then ω2 ≤ r² − ω1·r (each word of length 1 blocks r words of length 2).
  • Similarly ω3 ≤ r³ − ω1·r² − ω2·r, and in general
  •     ω1·r^(n−1) + ω2·r^(n−2) + … + ωn ≤ r^n
  • If the last inequality is satisfied, then all the earlier ones hold, and the codewords can be assigned level by level (a greedy construction is sketched below).
  • Dividing by r^n, this is exactly Kraft's inequality: Σ_k ωk·r^(−k) ≤ 1.
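The chain of inequalities is constructive: assigning codewords greedily in increasing order of length always succeeds when the Kraft sum is at most 1. A minimal sketch for small radix (the function name and interface are this sketch's own):

```python
def kraft_construct(lengths, r=2):
    """Greedy (canonical) construction of an instantaneous code with the
    given lengths; returns None when the Kraft inequality is violated."""
    if sum(r ** -l for l in lengths) > 1:
        return None
    words, next_code, prev_len = [], 0, 0
    for l in sorted(lengths):
        next_code *= r ** (l - prev_len)   # pad the counter to the new length
        digits, x = [], next_code
        for _ in range(l):                 # write next_code in base r, l digits
            digits.append(str(x % r))      # (digits as characters: r <= 10 here)
            x //= r
        words.append("".join(reversed(digits)))
        next_code += 1
        prev_len = l
    return words

print(kraft_construct([1, 2, 3, 3]))  # ['0', '10', '110', '111']
print(kraft_construct([1, 1, 2]))     # None: 1/2 + 1/2 + 1/4 > 1
```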
18
  • Note: A code may obey the Kraft inequality and still not be instantaneous.
  • Ex: 0, 01, 011, 111 has Kraft sum 2^(−1) + 2^(−2) + 2^(−3) + 2^(−3) = 1, but it is not an I.C.: 0 is a prefix of 01 (a quick check is sketched below).
  • Ex: Binary block codes (error-correcting codes): 2^k codewords, each of length n, give Kraft sum 2^k · 2^(−n) = 2^(k−n) ≤ 1.
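A quick numeric check of the first example: its Kraft sum is exactly 1, yet 0 is a prefix of 01. A minimal sketch:

```python
def kraft_sum(codewords, r=2):
    """Sum of r**(-len(w)) over all codewords."""
    return sum(r ** -len(w) for w in codewords)

code = ["0", "01", "011", "111"]
print(kraft_sum(code))  # 1.0 -> the Kraft inequality holds
print(any(b.startswith(a) for a in code for b in code if a != b))
# True -> some codeword prefixes another, so the code is not instantaneous
```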
19
  • Ex: Comma code over a code alphabet of D symbols:
  •     length 1: 1 codeword (it must be the comma itself)
  •     length 2: D − 1
  •     length 3: (D − 1)²
  •     . . .
  •     length k: D(D − 1)^(k−1) (at the maximum length the comma is no longer needed)
  • The Kraft sum is 1·D^(−1) + (D−1)·D^(−2) + (D−1)²·D^(−3) + … + D(D−1)^(k−1)·D^(−k) = 1.
  • The Kraft inequality can be extended to any uniquely decodable code.
20
  • McMillan Inequality
  • Thm: A uniquely decodable code with word lengths l1, l2, …, lq exists iff Σ_{i=1}^{q} r^(−l_i) ≤ 1 (r is the size of the code alphabet).
  • (⇐) Trivial, because an I.C. is one kind of U.D. code.
  • (⇒) Let K = Σ_{i=1}^{q} r^(−l_i) and consider
  •     K^n = Σ_{k=n}^{nl} N_k·r^(−k)
  • where l is the length of the longest codeword and N_k is the number of sequences of n codewords whose encodings have total length k (in code symbols of radix r).

21
  • N_k ≤ r^k (the number of distinct sequences of length k in radix r), so every term N_k·r^(−k) ≤ 1 and K^n ≤ nl.
  • If K > 1, we can find an n s.t. K^n > nl, since K^n grows exponentially while nl grows only linearly — a contradiction.
  • Hence K ≤ 1.
22
4.6 Huffman Codes
  • Lemma: If a code C is optimal within the class of instantaneous codes, then C is optimal within the entire class of U.D. codes.
  • pf: Suppose C′ is a U.D. code and C′ has a smaller average codeword length than C.
  • Let n1, n2, . . . , nM be the codeword lengths of C′. They satisfy the Kraft inequality (McMillan), so there exists an I.C. with these same lengths.
  • So C is not optimal in I.C. — a contradiction.
23
  • Optimal Codes
  • Given a binary I.C. C with codeword lengths n1, …, nM associated with probabilities p1, …, pM.
  • For convenience, let p1 ≥ p2 ≥ … ≥ pM−1 ≥ pM
  • (and ni ≤ ni+1 ≤ … ≤ ni+r if pi = pi+1 = … = pi+r)
  • Then if C is optimal within the class of I.C., C must have the following properties:

24
  • (a) More probable symbols have shorter codewords,
  •     i.e., if pj > pk then nj ≤ nk.
  • (b) The 2 least probable symbols have codewords of equal length, i.e., nM−1 = nM.
  • (c) Among the codewords of length nM, there are 2 codewords that agree in all digits except the last one.
  • Ex: x1 → 0, x2 → 100, x3 → 101, x4 → 1101, x5 → 1110
  • This does not satisfy (c); it has to be, e.g., x4 → 1101, x5 → 1100.
25
  • pf:
  • (a) If pj > pk and nj > nk, then we can construct a better code C′ by interchanging codewords j and k.
  • (b) From (a), if pM−1 > pM then nM−1 ≤ nM; by the assumed ordering the same holds when pM−1 = pM. If nM−1 < nM, the last digit of wM can be dropped without violating the prefix condition, giving an I.C. better than the original one; hence nM−1 = nM.
  • (c) If condition (c) is not true, we may drop the last digit of all such codewords to obtain a better code.
  • Huffman coding: construction of optimal (instantaneous) codes.

26
  • Let x1, …, xM be an array of symbols with probabilities p1, …, pM (p1 ≥ p2 ≥ … ≥ pM)
  • (1) Combine xM−1, xM into xM−1,M with probability pM−1 + pM
  • (2) Assume we can construct an optimal code C2 for x1, x2, …, xM−2, xM−1,M
  • (3) Now construct a code C1 for x1, …, xM as follows:
  • The codewords associated with x1, …, xM−2 in C1 are exactly the same as the corresponding codewords of C2
  • Let wM−1,M be the codeword of xM−1,M in C2
  • The codewords for xM−1, xM in C1 are
  •     wM−1,M 0 → xM−1 and
  •     wM−1,M 1 → xM

27
  • Claim: C1 is an optimal code for the set of probabilities p1, …, pM.
  • Ex: successive reductions, combining the two least probable symbols at each step:

x1 0.3, x2 0.25, x3 0.2, x4 0.1, x5 0.1, x6 0.05
x1 0.3, x2 0.25, x3 0.2, x5,6 0.15, x4 0.1
x1 0.3, x2 0.25, x4,5,6 0.25, x3 0.2
x3,4,5,6 0.45, x1 0.3, x2 0.25
x1,2 0.55, x3,4,5,6 0.45
28
Working back, each combined symbol splits by appending 0 / 1:
x1,2 → 0, x3,4,5,6 → 1
x1 → 00, x2 → 01, x3 → 10, x4,5,6 → 11
x4 → 110, x5,6 → 111
x5 → 1110, x6 → 1111
(A heap-based sketch of this whole procedure follows.)
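The reduce-and-split procedure of slides 26–28 is the classical Huffman algorithm. A minimal heap-based sketch (tie-breaking among equal probabilities is arbitrary, so the codewords may differ from those above, but the average length is the same):

```python
import heapq
from itertools import count

def huffman(probs):
    """Binary Huffman code: repeatedly merge the two least probable
    entries (slide 26, step 1), then split codewords back out (step 3)."""
    ticket = count()   # unique tie-breaker keeps heap entries comparable
    heap = [(p, next(ticket), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)   # two least probable entries
        p2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(ticket), (a, b)))
    code = {}
    def assign(node, prefix):
        if isinstance(node, tuple):      # internal node: append 0 / 1
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            code[node] = prefix or "0"   # lone-symbol edge case
    assign(heap[0][2], "")
    return code

probs = {"x1": 0.3, "x2": 0.25, "x3": 0.2, "x4": 0.1, "x5": 0.1, "x6": 0.05}
c = huffman(probs)
print(c)
print(sum(p * len(c[s]) for s, p in probs.items()))  # 2.4, the optimal average
```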
29
  • pf:
  • We assume that C1 is not optimal.
  • Let C1′ be an optimal instantaneous code for x1, …, xM, with codewords w1, w2, …, wM of lengths n1, n2, …, nM.
  • If there are only two symbols of maximum length in a tree, they must have their last decision node in common, and they must be the two least probable symbols. Before we reduce a tree, these two symbols contribute nM(pM + pM−1) to the average length; after the reduction they contribute (nM − 1)(pM + pM−1).
  • So reduction shrinks the average code length by exactly (pM + pM−1).
  • Average length of C1 > Average length of C1′ --- (1)
  • After reducing both codes,
  • Average length of C2 > Average length of C2′
  • (both sides of (1) minus (pM + pM−1))
  • But C2 is optimal — a contradiction.

30
  • If there are more than two symbols of the maximum length, we can use the following proposition:
  • Symbols whose codewords have the same length may be interchanged without changing the average code length.
  • So we may move the two least probable symbols onto two maximum-length codewords that agree except in the last digit, and proceed as before.
  • Huffman encoding is therefore not unique.

31
  • Ex:
  • Code I: p1 = 0.4 → 00, p2 = 0.2 → 10, p3 = 0.2 → 11, p4 = 0.1 → 010, p5 = 0.1 → 011
  • Average length L = 0.4·2 + 0.2·2 + 0.2·2 + 0.1·3 + 0.1·3 = 2.2
  • Or Code II: p1 = 0.4 → 1, p2 = 0.2 → 01, p3 = 0.2 → 000, p4 = 0.1 → 0010, p5 = 0.1 → 0011
  • Average length L = 0.4·1 + 0.2·2 + 0.2·3 + 0.1·4 + 0.1·4 = 2.2
32
  • Which encoding is better?
  • Var(I) = 0.4(2−2.2)² + 0.2(2−2.2)² + 0.2(2−2.2)² + 0.1(3−2.2)² + 0.1(3−2.2)² = 0.16 (Good!)
  • Var(II) = 0.4(1−2.2)² + 0.2(2−2.2)² + 0.2(3−2.2)² + 0.1(4−2.2)² + 0.1(4−2.2)² = 1.36
  • Code I, with the smaller variance, is preferred.
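A smaller variance gives a steadier output rate when symbols are emitted at a fixed clock, which is why code I is preferred. A minimal sketch reproducing both figures:

```python
def avg_and_var(lengths, probs):
    """Mean and variance of the codeword length under the distribution."""
    mean = sum(p * l for p, l in zip(probs, lengths))
    var = sum(p * (l - mean) ** 2 for p, l in zip(probs, lengths))
    return mean, var

probs = [0.4, 0.2, 0.2, 0.1, 0.1]
print(avg_and_var([2, 2, 2, 3, 3], probs))  # (2.2, 0.16)  code I
print(avg_and_var([1, 2, 3, 4, 4], probs))  # (2.2, 1.36)  code II
```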