Variable Length Coding (Compression) - PowerPoint PPT Presentation

Slides: 23
Provided by: mkur
Learn more at: http://mason.gmu.edu

1
IT-101 Section 001
Introduction to Information Technology
  • Lecture 6

2
  • Overview
  • Chapter 7 Compression
  • Introduction
  • Entropy
  • Huffman coding
  • Universal coding

3
Introduction
World Wide Web not World Wide Wait
  • Compression techniques can significantly reduce
    the bandwidth and memory required for sending,
    receiving, and storing data.
  • Most computers are equipped with modems that
    compress or decompress all information leaving or
    entering via the phone line.
  • With a mutually recognized system (e.g. WinZip)
    the amount of data can be significantly
    diminished.
  • Examples of compression techniques
  • Compressing BINARY DATA STREAMS
  • Variable length coding (e.g. Huffman coding)
  • Universal Coding (e.g. WinZip)
  • IMAGE-SPECIFIC COMPRESSION (we will see that
    images are well suited for compression)
  • GIF and JPEG
  • VIDEO COMPRESSION
  • MPEG

4
Why can we compress information?
REDUNDANCY
  • Compression is possible because information
    usually contains redundancies, or information
    that is often repeated.
  • For example, two still images from a video
    sequence of images are often similar. This fact
    can be exploited by transmitting only the changes
    from one image to the next.
  • For example, a line of data often contains
    redundancies:
    "Ask not what your country can do for you - ask
    what you can do for your country."
  • File compression programs remove this redundancy.

5
FREQUENCY
  • Some characters occur more frequently than
    others.
  • It's possible to represent frequently occurring
    characters with a smaller number of bits during
    transmission.
  • This may be accomplished by a variable length
    code, as opposed to a fixed length code like
    ASCII.
  • An example of a simple variable length code is
    Morse Code.
  • E occurs more frequently than Z so we
    represent E with a shorter length code

E: .    T: -    Z: - - . .    Q: - - . -
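A quick way to see that some characters occur more often than others is to count them. The sketch below is only an illustration: it uses the quotation from the redundancy slide as sample text, and the variable names are not from the lecture.

```python
from collections import Counter

# Sample text: the quotation used on the redundancy slide.
text = ("Ask not what your country can do for you - "
        "ask what you can do for your country.")

# Count letters only, ignoring case, spaces, and punctuation.
counts = Counter(ch for ch in text.lower() if ch.isalpha())

# Most frequent letters first.
for letter, n in counts.most_common(5):
    print(letter, n)
```

In this sample "o" appears far more often than "k", and "z" does not appear at all, so a code that spends fewer bits on "o" would save space here.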
6
Information Theory
  • Variable length coding exploits the fact that
    some information occurs more frequently than
    others.
  • The mathematical theory behind this concept is
    known as INFORMATION THEORY
  • Claude E. Shannon developed modern Information
    Theory at Bell Labs in 1948.
  • He saw the relationship between the probability
    of appearance of a transmitted signal and its
    information content.
  • This realization enabled the development of
    compression techniques.

7
A Little Probability
  • Shannon (and others) found that information can
    be related to probability.
  • An event has a probability of 1 (or 100%) if we
    are certain this event will occur.
  • An event has a probability of 0 (or 0%) if we
    are certain this event will not occur.
  • The probability that an event will occur takes on
    values anywhere from 0 to 1.
  • Consider a coin toss: heads and tails each have a
    probability of .50
  • In two tosses, the probability of tossing two
    heads is
  • 1/2 x 1/2 = 1/4 or .25
  • In three tosses, the probability of tossing all
    tails is
  • 1/2 x 1/2 x 1/2 = 1/8 or .125
  • We compute probability this way because the
    result of each toss is independent of the
    results of other tosses.
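The coin-toss arithmetic above can be checked with a few lines of Python. This is a minimal sketch assuming a fair coin and independent tosses; the function name is hypothetical.

```python
from fractions import Fraction

def prob_all_same(p_single, tosses):
    """Probability that `tosses` independent tosses all show one chosen face."""
    return p_single ** tosses

half = Fraction(1, 2)
print(prob_all_same(half, 2))  # two heads in two tosses
print(prob_all_same(half, 3))  # three tails in three tosses
```

Exact fractions are used so the results read as 1/4 and 1/8 rather than rounded decimals.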

8
Entropy
  • If the probability of a binary event is .5 (like
    a coin), then, on average, you need one bit to
    represent the result of this event.
  • As the probability of a binary event moves away
    from .5 in either direction, the number of bits
    you need, on average, to represent the result
    decreases.
  • The figure expresses that unless an event is
    totally random, you can convey the information of
    the event in fewer bits, on average, than it
    might first appear.
  • Let's do an example...

As part of information theory, Shannon
developed the concept of ENTROPY.
[Figure: average bits needed vs. probability of an event]
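The slides do not state the formula behind the entropy curve. Assuming it is the standard binary entropy function, H(p) = -p log2(p) - (1 - p) log2(1 - p), the curve can be sketched as follows. The function name is illustrative.

```python
import math

def binary_entropy(p):
    """Average bits needed per outcome of a two-outcome event with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.5, 0.8, 0.9, 1.0):
    print(p, round(binary_entropy(p), 3))
```

The value is 1 bit at p = .5 and falls toward 0 as p approaches 0 or 1, which matches the shape described on the slide.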
9
Example from text..
A MEN'S SPECIALTY STORE
  • The probability of male patrons is .8
  • The probability of female patrons is .2
  • Assume for this example, groups of two enter the
    store. Calculate the probabilities of different
    pairings
  • Event A, Male-Male. P(MM) = .8 x .8 = .64
  • Event B, Male-Female. P(MF) = .8 x .2 = .16
  • Event C, Female-Male. P(FM) = .2 x .8 = .16
  • Event D, Female-Female. P(FF) = .2 x .2 = .04
  • We could assign the longest codes to the most
    infrequent events while maintaining unique
    decodability.
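The four pair probabilities can be reproduced by multiplying the individual probabilities. This sketch assumes, as the example does, that the two patrons in a pair are independent; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Probabilities from the slide, as exact fractions.
p = {"M": Fraction(8, 10), "F": Fraction(2, 10)}

# Independent patrons: the probability of a pair is the product.
pairs = {a + b: p[a] * p[b] for a, b in product("MF", repeat=2)}

for name, prob in pairs.items():
    print(name, float(prob))
```

The four values add up to 1, which is a useful check that no pairing was missed.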

10
Example (cont..)
  • Let's assign a unique string of bits to each
    event based on the probability of that event
    occurring.
  • Event  Name           Code
    A      Male-Male      0
    B      Male-Female    10
    C      Female-Male    110
    D      Female-Female  111
  • Given a received code of 01010110100, determine
    the events
  • The above example has used a variable length code.
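To see what this variable length code saves, the sketch below computes the average number of bits per pair and compares it with a fixed length code. The 2-bit fixed code is an assumption made for comparison (four events need 2 bits each); it is not stated on the slide.

```python
# Code table and probabilities from the store example.
codes = {"A": "0", "B": "10", "C": "110", "D": "111"}
probs = {"A": 0.64, "B": 0.16, "C": 0.16, "D": 0.04}

# Expected number of bits per pair of patrons.
avg_bits = sum(probs[e] * len(codes[e]) for e in codes)

# A fixed length code for four events needs 2 bits per event.
fixed_bits = 2

print(f"variable: {avg_bits:.2f} bits, fixed: {fixed_bits} bits")
```

The average works out to 1.56 bits per pair, so the variable length code is shorter on average even though two of its code words are 3 bits long.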

11
Variable Length Coding
Takes advantage of the probabilistic nature of
information.
  • Unlike fixed length codes like ASCII, variable
    length codes:
  • Assign the longest codes to the most infrequent
    events.
  • Assign the shortest codes to the most frequent
    events.
  • Each code word must be uniquely identifiable
    regardless of length.
  • Examples of Variable Length Coding
  • Morse Code
  • Huffman Coding

If we have total uncertainty about the
information we are conveying, fixed length codes
are preferred.
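One common way to make every code word uniquely identifiable is to require that no code word is the beginning (prefix) of another. The slides do not name this property, so treat the check below as an illustrative sketch; the function name is hypothetical.

```python
def is_prefix_free(codewords):
    """Return True if no code word is a prefix of a different code word."""
    words = sorted(set(codewords))
    # After sorting, a word that is a prefix of another word is
    # immediately followed by a word that starts with it.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

print(is_prefix_free(["0", "10", "110", "111"]))  # code from the store example
print(is_prefix_free(["0", "01", "11"]))          # "0" is a prefix of "01"
```

The first code passes the check; the second does not, because a receiver that sees 0 cannot tell whether the code word has ended.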
12
Morse Code
  • Characters represented by patterns of dots and
    dashes.
  • More frequently used letters use short code
    symbols.
  • Short pauses are used to separate the letters.
  • Represent Hello using Morse Code
  • H: . . . .
  • E: .
  • L: . - . .
  • L: . - . .
  • O: - - -
  • Hello: . . . .   .   . - . .   . - . .   - - -
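The same lookup can be written as a small table. This sketch includes only the four letters needed for the example and uses a space to stand for the short pause between letters; the names are illustrative.

```python
# Only the letters needed for this example.
MORSE = {"H": "....", "E": ".", "L": ".-..", "O": "---"}

def to_morse(word):
    """Encode a word, separating letters with a space (the short pause)."""
    return " ".join(MORSE[ch] for ch in word.upper())

print(to_morse("Hello"))  # .... . .-.. .-.. ---
```

Without the pauses the dots and dashes would run together, which is why Morse code needs a separator between letters.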

13
Huffman Coding
The Huffman coding procedure finds the optimum,
uniquely decodable, variable length code
associated with a set of events, given their
probabilities of occurrence.
  • Creates a Binary Code Tree
  • Nodes connected by branches with leaves
  • Top node is the root
  • Two branches from each node

14
Huffman Coding
  • A: 0
  • B: 10
  • C: 110
  • D: 111
  • Given the adjacent Huffman code tree, decode the
    following sequence: 11010001110
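Decoding with a code tree means starting at the root, following one branch per bit, and writing down an event each time a leaf is reached. The tree figure is not reproduced in this transcript, so the nested structure below is an assumption rebuilt from the four code words; the names are illustrative.

```python
# Code tree for A=0, B=10, C=110, D=111 as nested pairs (branch 0, branch 1).
TREE = ("A", ("B", ("C", "D")))

def decode_with_tree(bits, tree=TREE):
    """Walk from the root, one branch per bit; emit an event at each leaf."""
    events, node = [], tree
    for bit in bits:
        node = node[int(bit)]
        if isinstance(node, str):  # reached a leaf
            events.append(node)
            node = tree            # restart at the root
    if node is not tree:
        raise ValueError("bit string ends in the middle of a code word")
    return events

print(decode_with_tree("11010001110"))  # ['C', 'B', 'A', 'A', 'D', 'A']
```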

15
Huffman Code Construction
  • First list all events in descending order of
    probability.
  • Pair the two events with lowest probabilities and
    add their probabilities.

.3 Event A
.3 Event B
.13 Event C
.12 Event D
.1 Event E
.05 Event F
[Tree figure: Event E (.1) and Event F (.05) are paired into a node of 0.15]
16
Huffman Code Construction
  • Repeat for the pair with the next lowest
    probabilities.

[Tree figure: Event C (.13) and Event D (.12) are paired into a node of 0.25, alongside the 0.15 node]
17
Huffman Code Construction
  • Repeat for the pair with the next lowest
    probabilities.

[Tree figure: the 0.15 and 0.25 nodes are paired into a node of 0.4]
18
Huffman Code Construction
  • Repeat for the pair with the next lowest
    probabilities.

[Tree figure: Event A (.3) and Event B (.3) are paired into a node of 0.6, alongside the 0.4 node]
19
Huffman Code Construction
  • Repeat for the last pair and add 0s to the left
    branches and 1s to the right branches.

[Tree figure: the 0.6 and 0.4 nodes are joined at the root, and every branch is labeled 0 or 1]
Probability  Event    Code
.3           Event A  00
.3           Event B  01
.13          Event C  100
.12          Event D  101
.1           Event E  110
.05          Event F  111
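The construction steps on these slides (repeatedly pair the two lowest probabilities and add them) can be sketched with a priority queue. This is one possible implementation, not the lecture's own: the function name is hypothetical, probabilities are given in percent to keep the arithmetic exact, and which branch gets 0 or 1 is a convention, so the bit patterns may differ from the slide while the code lengths should agree.

```python
import heapq
from itertools import count

def huffman_codes(weights):
    """Build a code by repeatedly merging the two lightest nodes."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(w, next(tie), {event: ""}) for event, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, low = heapq.heappop(heap)
        w2, _, high = heapq.heappop(heap)
        # Prefix one more bit onto every code under the two merged nodes.
        merged = {e: "0" + c for e, c in low.items()}
        merged.update({e: "1" + c for e, c in high.items()})
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]

# Probabilities from the slides, in percent.
weights = {"A": 30, "B": 30, "C": 13, "D": 12, "E": 10, "F": 5}
codes = huffman_codes(weights)
average = sum(weights[e] * len(codes[e]) for e in weights) / 100

for event in sorted(codes):
    print(event, codes[event])
print("average bits per event:", average)
```

Events A and B receive 2-bit codes and the other four receive 3-bit codes, for an average of 2.4 bits per event, compared with 3 bits for a fixed length code covering six events.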
20
Exercise
  • Given the code we just constructed
  • Event A 00, Event B 01
  • Event C 100, Event D 101
  • Event E 110, Event F 111
  • How can you decode the string
    0000111010110001000000111?
  • Starting from the leftmost bit, find the shortest
    bit pattern that matches one of the codes in the
    list. The first bit is 0, but we don't have an
    event represented by 0. We do have one
    represented by 00, which is event A. Continue
    applying this procedure.
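The procedure in the exercise can be sketched directly: add one bit at a time to a pattern and emit an event as soon as the pattern matches a code word. The exercise string is read here as one continuous run of 25 bits, which assumes the gap in the transcript is only a line break; the names are illustrative.

```python
CODES = {"00": "A", "01": "B", "100": "C", "101": "D", "110": "E", "111": "F"}

def decode(bits, table=CODES):
    """Grow a pattern bit by bit; emit an event at the first (shortest) match."""
    events, pattern = [], ""
    for bit in bits:
        pattern += bit
        if pattern in table:
            events.append(table[pattern])
            pattern = ""
    if pattern:
        raise ValueError(f"leftover bits: {pattern!r}")
    return "".join(events)

print(decode("0000111010110001000000111"))  # AAFBBCBAAAF
```

Taking the shortest match is safe here because no code word in the table is the beginning of another code word.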

21
Universal Coding
  • Huffman has its limits
  • You must know a priori the probability of the
    characters or symbols you are encoding.
  • What if a document is one of a kind?
  • Universal Coding schemes do not require a
    knowledge of the statistics of the events to be
    coded.
  • Universal Coding is based on the realization that
    any stream of data contains some repetition.
  • Lempel-Ziv coding is one form of Universal Coding
    presented in the text.
  • Compression results from reusing frequently
    occurring strings.
  • Works better for long data streams. Inefficient
    for short strings.
  • Used by WinZip to compress information.
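The point about long versus short data can be tried with Python's standard zlib module. This is only an illustration: zlib implements DEFLATE, which combines a Lempel-Ziv stage with Huffman coding, and is used here as a stand-in rather than as the exact scheme in the text. The sample data is made up for the demonstration.

```python
import zlib

short = b"hi"
long_repetitive = b"ask what you can do for your country. " * 200

for data in (short, long_repetitive):
    packed = zlib.compress(data)
    print(len(data), "->", len(packed), "bytes")
    assert zlib.decompress(packed) == data  # compression is lossless
```

The two-byte input grows because of format overhead, while the long repetitive input shrinks to a small fraction of its original size.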

22
Lempel-Ziv Coding
  • The basis for Lempel-Ziv coding is the idea that
    we can achieve compression of a string by always
    coding a series of zeroes and ones as some
    previous string (prefix string) plus one new bit.
    Compression results from reusing frequently
    occurring strings.
  • We will not go through Lempel-Ziv coding in
    detail.
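Although the lecture skips the details, the idea of "a previous string plus one new bit" can be sketched as follows. This is an LZ78-style parse written as an assumption about what the text describes, not the textbook's exact algorithm; the function names and the sample bit string are illustrative.

```python
def lz_parse(bits):
    """Split bits into phrases, each a previously seen phrase plus one new bit.

    Returns (prefix_index, new_bit) pairs; index 0 means the empty prefix.
    """
    seen = {"": 0}  # phrase -> index
    output, phrase = [], ""
    for bit in bits:
        if phrase + bit in seen:
            phrase += bit  # keep extending while the phrase is already known
        else:
            output.append((seen[phrase], bit))
            seen[phrase + bit] = len(seen)
            phrase = ""
    if phrase:  # trailing bits that repeat an already known phrase
        output.append((seen[phrase[:-1]], phrase[-1]))
    return output

def lz_unparse(pairs):
    """Rebuild the bit string from (prefix_index, new_bit) pairs."""
    phrases = [""]
    for index, bit in pairs:
        phrases.append(phrases[index] + bit)
    return "".join(phrases[1:])

pairs = lz_parse("1011010100010")
print(pairs)
print(lz_unparse(pairs))
```

For this input the phrases are 1, 0, 11, 01, 010, 00, 10. Each pair refers back to an earlier phrase by number, and on long inputs those references are what produce the compression.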