Variable-Length Codes: Huffman Codes

1
Chapter 4
  • Variable-Length Codes: Huffman Codes

2
Outline
  • 4.1 Introduction
  • 4.2 Unique Decoding
  • 4.3 Instantaneous Codes
  • 4.4 Construction of Instantaneous Codes
  • 4.5 The Kraft Inequality
  • 4.6 Huffman Codes

3
4.1 Introduction
  • Consider the problem of efficient coding of messages to be sent over a noiseless channel.
  • Goal: maximize the number of messages that can be sent in a given period of time;
  • equivalently, transmit a message in the shortest possible time;
  • i.e., make the codewords as short as possible.

4
4.2 Unique Decoding
  • Source symbols (alphabet): s1, . . . , sq
  • Code alphabet: C1, C2, . . . , Cr
  • X is a random variable
  • X ∈ {s1, . . . , sq} with probabilities p1, . . . , pq
  • X is observed over and over again, i.e., it generates a sequence of symbols from s1, . . . , sq
  • Ex: s1 → 000
  •     s2 → 111

[Diagram: the encoder maps each source symbol si to a string of code symbols Ci Cj . . . Ck]
5
  • The collection of all codewords is called a code.
  • Our objective: minimize the average codeword length.
  • Unique decodability: the received message must have a single, unique possible interpretation.
  • Ex: Source alphabet {s1, s2, s3, s4}, code alphabet {0, 1}
  •     s1 → 0
  •     s2 → 01
  •     s3 → 11
  •     s4 → 00
  • Then 0011 can be read as s4 s3 or as s1 s1 s3,
  • so this code does not satisfy unique decodability.
6
  • Ex:
  •     s1 → 0
  •     s2 → 010
  •     s3 → 01
  •     s4 → 10
  • Then 010 can be read as s1 s4, as s2, or as s3 s1, so it also does not satisfy unique decodability.
  • Ex:
  •     s1 → 0
  •     s2 → 01
  •     s3 → 011
  •     s4 → 111
  • This is a uniquely decodable code.
7
  • Definition:
  • The nth extension of a code is simply all possible concatenations of n symbols of the original source code.
  • For unique decodability, no two encoded concatenations can be the same, even for different extensions.
  • Every finite sequence of code characters then corresponds to at most one message:
  • every distinct sequence of source symbols has a corresponding encoded sequence that is unique. (A brute-force test of this definition is sketched below.)
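This definition can be tested by brute force: encode every sequence of up to n source symbols and look for two distinct sequences with the same encoding. A minimal Python sketch (the bound n and the function name are this sketch's own; passing up to a finite n is necessary evidence but not conclusive in general), reusing the codes from slides 5 and 6:

```python
from itertools import product

def is_uniquely_decodable_upto(code, n):
    """Check extensions up to order n: no two distinct concatenations
    of codewords may produce the same encoded string."""
    seen = {}
    for length in range(1, n + 1):
        for symbols in product(code, repeat=length):
            encoded = "".join(code[s] for s in symbols)
            if encoded in seen and seen[encoded] != symbols:
                return False, encoded   # two different parses collide
            seen[encoded] = symbols
    return True, None

# Slide 5's code: 00 parses as s4 and as s1 s1 (and 0011 as s4 s3 or s1 s1 s3).
bad = {"s1": "0", "s2": "01", "s3": "11", "s4": "00"}
print(is_uniquely_decodable_upto(bad, 4))   # (False, '00')

good = {"s1": "0", "s2": "01", "s3": "011", "s4": "111"}
print(is_uniquely_decodable_upto(good, 4))  # (True, None) up to this bound
```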

8
4.3 Instantaneous Codes
  • Decision (decoding) tree for the code s1 → 0, s2 → 10, s3 → 110, s4 → 111:

    Initial state
    ├─ 0 → s1
    └─ 1 ─┬─ 0 → s2
          └─ 1 ─┬─ 0 → s3
                └─ 1 → s4
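The decoding tree maps directly onto nested dictionaries, and decoding then examines each received bit exactly once, as the next slide notes. A minimal sketch, assuming the code shown above (helper names are this sketch's own):

```python
def build_tree(code):
    """Build a binary decision tree (nested dicts) from a prefix code."""
    root = {}
    for symbol, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = symbol   # terminal state: a source symbol
    return root

def decode(tree, bits):
    """Walk the tree bit by bit, emitting a symbol at each terminal state."""
    out, node = [], tree
    for bit in bits:
        node = node[bit]
        if not isinstance(node, dict):   # reached a leaf
            out.append(node)
            node = tree                  # back to the initial state
    return out

code = {"s1": "0", "s2": "10", "s3": "110", "s4": "111"}
print(decode(build_tree(code), "0110111"))  # ['s1', 's3', 's4']
```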
9
  • Note that each bit of the received stream is examined only once, and the terminal states of this tree are the four source symbols s1, s2, s3 and s4.
  • Definition: A code is instantaneous if it is decodable without lookahead (i.e., a word can be recognized as soon as it is complete).
  • When a complete codeword is received, the receiver knows this immediately and does not have to look further before deciding which source symbol was received.
  • A code is instantaneous iff no codeword si is a prefix of another codeword sj (a mechanical check is sketched below).
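The prefix condition is easy to verify. A minimal sketch (after sorting, any codeword that is a prefix of another is immediately followed by one of its extensions, so comparing neighbors suffices):

```python
def is_instantaneous(codewords):
    """A code is instantaneous iff no codeword is a prefix of another."""
    words = sorted(codewords)
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))

print(is_instantaneous(["0", "10", "110", "111"]))  # True
print(is_instantaneous(["0", "01", "011", "111"]))  # False: 0 prefixes 01
```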

10
  • The existence of a decoding tree ⇔ instantaneous decodability.
  • Ex: Let n be a positive integer. A comma code is a code with codewords
  •     1, 01, 001, 0001, . . . , 0^(n-1)1, 0^n
  • The symbol 1 acts as a comma marking the end of a codeword.
  • Because a comma code is prefix-free, it is an instantaneous code.
11
  • Ex: s1 → 0, s2 → 01, s3 → 011, s4 → 111
  • Not an instantaneous code, but it is still a uniquely decodable code.
  • E.g., 01111111 must be read to the end before the first word can be resolved; it parses as s2 s4 s4.
  • So it is better to use a comma code, e.g., s1 → 1, s2 → 01, s3 → 001, s4 → 000.
  • I.C. ⊂ U.D., and an instantaneous code is better than a merely uniquely decodable one.
12
4.4 Construction of Instantaneous Codes
  • Given five symbols si in the source alphabet S.
  • C1: s1 → 0, s2 → 10, s3 → 110, s4 → 1110, s5 → 1111
  • C2: s1 → 00, s2 → 01, s3 → 10, s4 → 110, s5 → 111
  • Both C1 and C2 are instantaneous codes; which one is better?
  • Answer: it depends on the frequencies of occurrence of the symbols (see the sketch below).
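The comparison becomes concrete once probabilities are fixed. A minimal sketch (the two probability vectors are assumptions for illustration, not from the slides):

```python
def average_length(codewords, probs):
    """Expected codeword length: sum of p_i * l_i."""
    return sum(p * len(w) for w, p in zip(codewords, probs))

c1 = ["0", "10", "110", "1110", "1111"]
c2 = ["00", "01", "10", "110", "111"]

skewed = [0.5, 0.3, 0.1, 0.05, 0.05]   # assumed: very uneven frequencies
print(average_length(c1, skewed), average_length(c2, skewed))    # 1.8 vs 2.1 -> C1 wins
uniform = [0.2] * 5                     # assumed: equal frequencies
print(average_length(c1, uniform), average_length(c2, uniform))  # 2.8 vs 2.4 -> C2 wins
```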
13
4.5 The Kraft Inequality
  • Theorem: A necessary and sufficient condition for the existence of an instantaneous code S of q symbols si (i = 1, …, q) with encoded words of lengths l1 ≤ l2 ≤ … ≤ lq is
  •     Σ_{i=1}^{q} r^(−l_i) ≤ 1
  • where r is the radix (number of symbols) of the alphabet of the encoded symbols.

14
  • Thm: An instantaneous code with word lengths n1, n2, . . ., nM exists iff
  •     Σ_{i=1}^{M} D^(−n_i) ≤ 1
  • where D is the size of the code alphabet.
  • (⇒) For simplicity, we assume D = 2 and induct on the height H of the decoding tree.
  • (1) When H = 1: n1 = 1 and n2 = 1 (s1 → 0, s2 → 1) is OK for a tree of height 1, and 2^(−1) + 2^(−1) = 1 ≤ 1.
15
  • (2) Suppose every tree of height H ≤ h is OK, i.e., its Kraft sum K satisfies K ≤ 1.
  • When H = h + 1, the two subtrees below the root are trees of height ≤ h with Kraft sums K′ ≤ 1 and K′′ ≤ 1; each codeword gains one leading digit, so K = (1/2)K′ + (1/2)K′′ ≤ 1.
  • By the induction method, the inequality is true for every height.
16
  • Another proof (⇒): Let C = {c1, c2, …, cM} with codeword lengths l1, …, lM, and let L = max li.
  • If x = ci y1 y2 … y(L−li), where the yj are any code symbols, then x cannot be in C, because ci is a prefix of x. For each ci, x has r^(L−li) possibilities.
  • Since no ci is a prefix of another, these sets of length-L words are disjoint, so counting words of length L gives Σ_i r^(L−li) ≤ r^L, i.e., Σ_i r^(−li) ≤ 1.
17
  • (⇐) If there are ω1 words of length 1, then ω1 ≤ r.
  • If there are ω2 words of length 2, then ω2 ≤ r² − ω1·r (each word of length 1 blocks r words of length 2).
  • Similarly ω3 ≤ r³ − ω1·r² − ω2·r, and in general
  •     ω1·r^(n−1) + ω2·r^(n−2) + … + ωn ≤ r^n
  • If the last inequality is satisfied, then all the earlier ones hold, and the codewords can be assigned level by level (a greedy construction is sketched below).
  • Dividing by r^n, this is exactly Kraft's inequality: Σ_k ωk·r^(−k) ≤ 1.
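The chain of inequalities is constructive: assigning codewords greedily in increasing order of length always succeeds when the Kraft sum is at most 1. A minimal sketch for small radix (the function name and interface are this sketch's own):

```python
def kraft_construct(lengths, r=2):
    """Greedy (canonical) construction of an instantaneous code with the
    given lengths; returns None when the Kraft inequality is violated."""
    if sum(r ** -l for l in lengths) > 1:
        return None
    words, next_code, prev_len = [], 0, 0
    for l in sorted(lengths):
        next_code *= r ** (l - prev_len)   # pad the counter to the new length
        digits, x = [], next_code
        for _ in range(l):                 # write next_code in base r, l digits
            digits.append(str(x % r))      # (digits as characters: r <= 10 here)
            x //= r
        words.append("".join(reversed(digits)))
        next_code += 1
        prev_len = l
    return words

print(kraft_construct([1, 2, 3, 3]))  # ['0', '10', '110', '111']
print(kraft_construct([1, 1, 2]))     # None: 1/2 + 1/2 + 1/4 > 1
```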
18
  • Note: A code may obey the Kraft inequality and still not be instantaneous.
  • Ex: 0, 01, 011, 111 has Kraft sum 2^(−1) + 2^(−2) + 2^(−3) + 2^(−3) = 1, but it is not an I.C.: 0 is a prefix of 01 (a quick check is sketched below).
  • Ex: Binary block codes (error-correcting codes): 2^k codewords, each of length n, give Kraft sum 2^k · 2^(−n) = 2^(k−n) ≤ 1.
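A quick numeric check of the first example: its Kraft sum is exactly 1, yet 0 is a prefix of 01. A minimal sketch:

```python
def kraft_sum(codewords, r=2):
    """Sum of r**(-len(w)) over all codewords."""
    return sum(r ** -len(w) for w in codewords)

code = ["0", "01", "011", "111"]
print(kraft_sum(code))  # 1.0 -> the Kraft inequality holds
print(any(b.startswith(a) for a in code for b in code if a != b))
# True -> some codeword prefixes another, so the code is not instantaneous
```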
19
  • Ex: Comma code over a code alphabet of D symbols:
  •     length 1: 1 codeword (it must be the comma itself)
  •     length 2: D − 1
  •     length 3: (D − 1)²
  •     . . .
  •     length k: D(D − 1)^(k−1) (at the maximum length the comma is no longer needed)
  • The Kraft sum is 1·D^(−1) + (D−1)·D^(−2) + (D−1)²·D^(−3) + … + D(D−1)^(k−1)·D^(−k) = 1.
  • The Kraft inequality can be extended to any uniquely decodable code.
20
  • McMillan Inequality
  • Thm: A uniquely decodable code with word lengths l1, l2, …, lq exists iff Σ_{i=1}^{q} r^(−l_i) ≤ 1 (r is the size of the code alphabet).
  • (⇐) Trivial, because an I.C. is one kind of U.D. code.
  • (⇒) Let K = Σ_{i=1}^{q} r^(−l_i) and consider
  •     K^n = Σ_{k=n}^{nl} N_k·r^(−k)
  • where l is the length of the longest codeword and N_k is the number of sequences of n codewords whose encodings have total length k (in code symbols of radix r).

21
  • N_k ≤ r^k (the number of distinct sequences of length k in radix r), so every term N_k·r^(−k) ≤ 1 and K^n ≤ nl.
  • If K > 1, we can find an n s.t. K^n > nl, since K^n grows exponentially while nl grows only linearly — a contradiction.
  • Hence K ≤ 1.
22
4.6 Huffman Codes
  • Lemma: If a code C is optimal within the class of instantaneous codes, then C is optimal within the entire class of U.D. codes.
  • pf: Suppose C′ is a U.D. code and C′ has a smaller average codeword length than C.
  • Let n1, n2, . . . , nM be the codeword lengths of C′. They satisfy the Kraft inequality (McMillan), so there exists an I.C. with these same lengths.
  • So C is not optimal in I.C. — a contradiction.
23
  • Optimal Codes
  • Given a binary I.C. C with codeword lengths n1, …, nM associated with probabilities p1, …, pM.
  • For convenience, let p1 ≥ p2 ≥ … ≥ pM−1 ≥ pM
  • (and ni ≤ ni+1 ≤ … ≤ ni+r if pi = pi+1 = … = pi+r)
  • Then if C is optimal within the class of I.C., C must have the following properties:

24
  • (a) More probable symbols have shorter codewords,
  •     i.e., if pj > pk then nj ≤ nk.
  • (b) The 2 least probable symbols have codewords of equal length, i.e., nM−1 = nM.
  • (c) Among the codewords of length nM, there are 2 codewords that agree in all digits except the last one.
  • Ex: x1 → 0, x2 → 100, x3 → 101, x4 → 1101, x5 → 1110
  • This does not satisfy (c); it has to be, e.g., x4 → 1101, x5 → 1100.
25
  • pf:
  • (a) If pj > pk and nj > nk, then we can construct a better code C′ by interchanging codewords j and k.
  • (b) From (a), if pM−1 > pM then nM−1 ≤ nM; by the assumed ordering the same holds when pM−1 = pM. If nM−1 < nM, the last digit of wM can be dropped without violating the prefix condition, giving an I.C. better than the original one; hence nM−1 = nM.
  • (c) If condition (c) is not true, we may drop the last digit of all such codewords to obtain a better code.
  • Huffman coding: construction of optimal (instantaneous) codes.

26
  • Let x1, …, xM be an array of symbols with probabilities p1, …, pM (p1 ≥ p2 ≥ … ≥ pM)
  • (1) Combine xM−1, xM into xM−1,M with probability pM−1 + pM
  • (2) Assume we can construct an optimal code C2 for x1, x2, …, xM−2, xM−1,M
  • (3) Now construct a code C1 for x1, …, xM as follows:
  • The codewords associated with x1, …, xM−2 in C1 are exactly the same as the corresponding codewords of C2
  • Let wM−1,M be the codeword of xM−1,M in C2
  • The codewords for xM−1, xM in C1 are
  •     wM−1,M 0 → xM−1 and
  •     wM−1,M 1 → xM

27
  • Claim: C1 is an optimal code for the set of probabilities p1, …, pM.
  • Ex: successive reductions, combining the two least probable symbols at each step:

x1 0.3, x2 0.25, x3 0.2, x4 0.1, x5 0.1, x6 0.05
x1 0.3, x2 0.25, x3 0.2, x5,6 0.15, x4 0.1
x1 0.3, x2 0.25, x4,5,6 0.25, x3 0.2
x3,4,5,6 0.45, x1 0.3, x2 0.25
x1,2 0.55, x3,4,5,6 0.45
28
Working back, each combined symbol splits by appending 0 / 1:
x1,2 → 0, x3,4,5,6 → 1
x1 → 00, x2 → 01, x3 → 10, x4,5,6 → 11
x4 → 110, x5,6 → 111
x5 → 1110, x6 → 1111
(A heap-based sketch of this whole procedure follows.)
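The reduce-and-split procedure of slides 26–28 is the classical Huffman algorithm. A minimal heap-based sketch (tie-breaking among equal probabilities is arbitrary, so the codewords may differ from those above, but the average length is the same):

```python
import heapq
from itertools import count

def huffman(probs):
    """Binary Huffman code: repeatedly merge the two least probable
    entries (slide 26, step 1), then split codewords back out (step 3)."""
    ticket = count()   # unique tie-breaker keeps heap entries comparable
    heap = [(p, next(ticket), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)   # two least probable entries
        p2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(ticket), (a, b)))
    code = {}
    def assign(node, prefix):
        if isinstance(node, tuple):      # internal node: append 0 / 1
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            code[node] = prefix or "0"   # lone-symbol edge case
    assign(heap[0][2], "")
    return code

probs = {"x1": 0.3, "x2": 0.25, "x3": 0.2, "x4": 0.1, "x5": 0.1, "x6": 0.05}
c = huffman(probs)
print(c)
print(sum(p * len(c[s]) for s, p in probs.items()))  # 2.4, the optimal average
```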
29
  • pf:
  • We assume that C1 is not optimal.
  • Let C1′ be an optimal instantaneous code for x1, …, xM, with codewords w1, w2, …, wM of lengths n1, n2, …, nM.
  • If there are only two symbols of maximum length in a tree, they must have their last decision node in common, and they must be the two least probable symbols. Before we reduce a tree, these two symbols contribute nM(pM + pM−1) to the average length; after the reduction they contribute (nM − 1)(pM + pM−1).
  • So reduction shrinks the average code length by exactly (pM + pM−1).
  • Average length of C1 > Average length of C1′ --- (1)
  • After reducing both codes,
  • Average length of C2 > Average length of C2′
  • (both sides of (1) minus (pM + pM−1))
  • But C2 is optimal — a contradiction.

30
  • If there are more than two symbols of the maximum length, we can use the following proposition:
  • Symbols whose codewords have the same length may be interchanged without changing the average code length.
  • So we may move the two least probable symbols onto two maximum-length codewords that agree except in the last digit, and proceed as before.
  • Huffman encoding is therefore not unique.

31
  • Ex:
  • Code I: p1 = 0.4 → 00, p2 = 0.2 → 10, p3 = 0.2 → 11, p4 = 0.1 → 010, p5 = 0.1 → 011
  • Average length L = 0.4·2 + 0.2·2 + 0.2·2 + 0.1·3 + 0.1·3 = 2.2
  • Or Code II: p1 = 0.4 → 1, p2 = 0.2 → 01, p3 = 0.2 → 000, p4 = 0.1 → 0010, p5 = 0.1 → 0011
  • Average length L = 0.4·1 + 0.2·2 + 0.2·3 + 0.1·4 + 0.1·4 = 2.2
32
  • Which encoding is better?
  • Var(I) = 0.4(2−2.2)² + 0.2(2−2.2)² + 0.2(2−2.2)² + 0.1(3−2.2)² + 0.1(3−2.2)² = 0.16 (Good!)
  • Var(II) = 0.4(1−2.2)² + 0.2(2−2.2)² + 0.2(3−2.2)² + 0.1(4−2.2)² + 0.1(4−2.2)² = 1.36
  • Code I, with the smaller variance, is preferred.
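A smaller variance gives a steadier output rate when symbols are emitted at a fixed clock, which is why code I is preferred. A minimal sketch reproducing both figures:

```python
def avg_and_var(lengths, probs):
    """Mean and variance of the codeword length under the distribution."""
    mean = sum(p * l for p, l in zip(probs, lengths))
    var = sum(p * (l - mean) ** 2 for p, l in zip(probs, lengths))
    return mean, var

probs = [0.4, 0.2, 0.2, 0.1, 0.1]
print(avg_and_var([2, 2, 2, 3, 3], probs))  # (2.2, 0.16)  code I
print(avg_and_var([1, 2, 3, 4, 4], probs))  # (2.2, 1.36)  code II
```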