Data Compression Basics - PowerPoint PPT Presentation

About This Presentation

Title:

Data Compression Basics

Description:

Motivation of Data Compression. Lossless and Lossy Compression Techniques. Static Lossless Compression: Huffman Coding. Correctness of Huffman Coding: prefix property.

Slides: 21

Provided by: hat2

Transcript and Presenter's Notes

1
Data Compression Basics: Huffman Coding

Motivation of Data Compression.
Lossless and Lossy Compression Techniques.
Static Lossless Compression: Huffman Coding.
Correctness of Huffman Coding: prefix property.

2
Why Data Compression?

Data storage and transmission cost money. This
cost increases with the amount of data involved.
It can be reduced by processing the data
so that it takes less memory and less
transmission time.
Data transmission can be made faster by using better
transmission media or by compressing the data.
Data compression algorithms reduce the size of
the given data while preserving its content.
Examples:
. Huffman coding
. Run-Length coding
. Lempel-Ziv coding

3
Lossless and Lossy Compression Techniques

Data compression techniques are broadly
classified into lossless and lossy.
Lossless techniques enable exact reconstruction
of the original document from the compressed
information while lossy techniques do not.
Run-length, Huffman and Lempel-Ziv are lossless
while JPEG and MPEG are lossy techniques.
Lossy techniques usually achieve higher
compression ratios than lossless ones, but only
the latter reproduce the original data exactly.
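
The "exact reconstruction" property of a lossless technique can be checked directly. The sketch below is only an illustration: it uses Python's standard zlib module (a DEFLATE compressor, which is not one of the techniques named on this slide but is lossless), and the sample text is made up.

```python
import zlib

# Made-up, highly repetitive sample data (an assumption for illustration).
original = b"see the sea, see the seals " * 40

compressed = zlib.compress(original)    # lossless compression
restored = zlib.decompress(compressed)  # reconstruction from the compressed form

print(len(original), "bytes before,", len(compressed), "bytes after")
print("exact reconstruction:", restored == original)
```

A lossy technique would fail the final equality check by design, since it discards information in exchange for a smaller output.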

4
Lossless and Lossy Compression Techniques (cont'd)

Lempel-Ziv maps variable-length input strings to
fixed-length codes, while Huffman coding does the
exact opposite: it maps fixed-length input symbols
to variable-length codes.
Lossless techniques are classified into static
and adaptive.
In a static scheme, like Huffman coding, the data
is first scanned to obtain statistical
information before compression begins.
Adaptive models like Lempel-Ziv begin with an
initial statistical distribution of the text
symbols but modify this distribution as each
character or word is encoded.
Adaptive schemes fit the text more closely but
static schemes involve fewer computations and are
faster.

5
Introduction to Huffman Coding

What is the likelihood that all symbols in a
message to be transmitted have the same number of
occurrences?
Huffman coding assigns codewords of different
lengths to characters based on their frequency of
occurrence in the given message: the more frequent
a character, the shorter its codeword.
The string to be transmitted is first analysed to
find the relative frequencies of its constituent
characters.
The coding process generates a binary tree, the
Huffman code tree, with branches labeled with
bits (0 and 1).
The Huffman tree must be sent with the compressed
information to enable the receiver to decode the
message.
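
The frequency analysis described on this slide can be sketched in a few lines. This is a minimal illustration, assuming the message is held in memory as a string; the message "tennessee" is a made-up example, not one from the slides.

```python
from collections import Counter

message = "tennessee"  # hypothetical message, chosen only for illustration

counts = Counter(message)     # occurrences of each character
total = sum(counts.values())  # message length in characters
relative = {ch: n / total for ch, n in counts.items()}

# List characters from most to least frequent.
for ch, n in counts.most_common():
    print(ch, n, round(relative[ch], 3))
```

These counts are the only input the tree-building step needs.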

6
Example 1: Huffman Coding

Example 1: Information to be transmitted over the
internet contains the following characters with
their associated frequencies, as shown in the
table below.
Use the Huffman technique to answer the following
questions:
Build the Huffman code tree for the message.
Use the Huffman tree to find the codeword for
each character.
If the data consists of only these characters,
what is the total number of bits to be
transmitted? What is the percentage saving
compared with sending the data as 8-bit ASCII
values without compression?
Verify that your computed Huffman codewords are
correct.

Character   a   e   l   n   o   s   t
Frequency   45  65  13  45  18  22  53

7
Example 1: Huffman Coding (Solution)

Solution: The Huffman coding process uses a
priority queue of binary trees, ordered by
frequency.
We begin by filling the priority queue with
one-node binary trees each containing a frequency
count and the symbol with that frequency.
The initial priority queue is built by arranging
the one-node binary trees in increasing order of
frequency.
The tree with the lowest frequency is designated
as the front of the queue.
At each step, the priority queue is manipulated
as outlined next.

8
Example 1: Huffman Coding (Solution)

The priority queue is manipulated as follows:
1. Dequeue two trees from the front of the queue.
2. Construct a new binary tree from the two trees
as follows:
   a. Use the two dequeued trees as the left and
   right subtrees of the new tree.
   b. Give the new tree the priority that is the sum
   of the priorities of its left and right subtrees.
3. Enqueue the new tree with that priority.
4. Repeat steps 1 to 3 until only one tree is
in the priority queue.
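
The queue manipulation above can be sketched with Python's heapq module standing in for the priority queue. This is an illustrative sketch, not the presenter's implementation: the names build_huffman_tree and code_lengths are hypothetical, trees are represented as nested (left, right) tuples, and a running counter is used as a tie-breaker. Because ties may be broken differently from the slides, the left/right arrangement (and so the exact bit patterns) can differ, but the codeword lengths are the same.

```python
import heapq
from itertools import count

def build_huffman_tree(freqs):
    """Return the Huffman tree as nested (left, right) tuples; leaves are symbols."""
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # step 1: dequeue two trees
        f2, _, right = heapq.heappop(heap)
        merged = (left, right)               # step 2a: new tree from the two
        heapq.heappush(heap, (f1 + f2, next(tie), merged))  # steps 2b and 3
    return heap[0][2]                        # step 4: one tree remains

def code_lengths(tree, depth=0):
    """Map each symbol to its depth in the tree, i.e. its codeword length."""
    if not isinstance(tree, tuple):
        return {tree: depth}
    left, right = tree
    lengths = code_lengths(left, depth + 1)
    lengths.update(code_lengths(right, depth + 1))
    return lengths

# Frequencies from the table in Example 1.
freqs = {"a": 45, "e": 65, "l": 13, "n": 45, "o": 18, "s": 22, "t": 53}
tree = build_huffman_tree(freqs)
lengths = code_lengths(tree)
print(lengths)
```

For the Example 1 frequencies this gives 2-bit codewords for e and t, 3-bit codewords for a, n and s, and 4-bit codewords for l and o.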

9
Example 1: Huffman Coding (Step 1)

Initial queue (front first):
l 13, o 18, s 22, n 45, a 45, t 53, e 65

10
Example 1 Solution (cont'd)

Queue (front first) after merging l and o:
s 22, 31 (l + o), n 45, a 45, t 53, e 65

11
Example 1 Solution (cont'd)

Queue (front first) after merging s and the 31 tree:
n 45, a 45, 53 (s + 31 (l + o)), t 53, e 65

12
Example 1 Solution (cont'd)

Queue (front first) after merging n and a:
53 (s + 31 (l + o)), t 53, e 65, 90 (n + a)

13
Example 1 Solution (cont'd)

Queue (front first) after merging the 53 tree and t:
e 65, 90 (n + a), 106 (53 (s + 31 (l + o)) + t)

14
Example 1 Solution (cont'd)

Queue (front first) after merging e and the 90 tree:
106 (53 (s + 31 (l + o)) + t), 155 (e + 90 (n + a))

15
Example 1 Solution (cont'd)

Final Huffman tree after merging the 106 and 155
trees (left subtree listed first):
261
  106
    53
      s
      31
        l
        o
    t
  155
    e
    90
      n
      a

16
Example 1 Solution (cont'd)

Huffman tree with each arc labeled 0 (left) or
1 (right):
261
  0: 106
    0: 53
      0: s
      1: 31
        0: l
        1: o
    1: t
  1: 155
    0: e
    1: 90
      0: n
      1: a
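
Reading codewords off the labeled tree can be sketched as a simple traversal. The sketch below hard-codes the tree from this example as nested (left, right) tuples and assumes that 0 labels a left arc and 1 a right arc; assign_codes is a hypothetical helper name.

```python
def assign_codes(tree, prefix=""):
    """Collect the bits on the path from the root to each leaf."""
    if isinstance(tree, str):  # a leaf holds a single character
        return {tree: prefix}
    left, right = tree
    codes = assign_codes(left, prefix + "0")          # left arc adds a 0
    codes.update(assign_codes(right, prefix + "1"))   # right arc adds a 1
    return codes

# Tree from Example 1: root 261 = (106, 155), 106 = (53, t),
# 53 = (s, 31), 31 = (l, o), 155 = (e, 90), 90 = (n, a).
example_tree = ((("s", ("l", "o")), "t"), ("e", ("n", "a")))

codes = assign_codes(example_tree)
for ch in sorted(codes):
    print(ch, codes[ch])
```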
18
Example 1 Solution (cont'd)

The sequence of zeros and ones on the arcs
in the path from the root to each terminal node
is the desired code:

Character   a    e    l     n    o     s    t
Codeword    111  10   0010  110  0011  000  01

If we assume the message consists of only the
characters a, e, l, n, o, s and t, then the number of
bits transmitted will be
2×65 + 2×53 + 3×45 + 3×45 + 3×22 + 4×18 + 4×13 = 696 bits.
If the message is sent uncompressed with an 8-bit
ASCII representation for the characters, we have
261×8 = 2088 bits, i.e. we saved about 67% of the
transmission time.
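
The bit counts for this example can be re-derived mechanically. The sketch below takes the frequencies and codewords of Example 1 as given and, like the slide, ignores the cost of transmitting the Huffman tree itself.

```python
# Frequencies and codewords from Example 1.
freqs = {"a": 45, "e": 65, "l": 13, "n": 45, "o": 18, "s": 22, "t": 53}
codes = {"a": "111", "e": "10", "l": "0010", "n": "110",
         "o": "0011", "s": "000", "t": "01"}

# Each character costs (frequency x codeword length) bits when compressed.
compressed_bits = sum(freqs[ch] * len(codes[ch]) for ch in freqs)
ascii_bits = 8 * sum(freqs.values())  # 8 bits per character, uncompressed
saving = 1 - compressed_bits / ascii_bits

print(compressed_bits, ascii_bits, f"{saving:.1%}")  # prints: 696 2088 66.7%
```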
19
Example 1 Solution: The Prefix Property

Data encoded using Huffman coding is uniquely
decodable. This is because Huffman codes satisfy
an important property called the prefix property.
This property guarantees that no codeword is a
prefix of another Huffman codeword.
For example, 10 and 101 cannot simultaneously be
valid Huffman codewords because the first is a
prefix of the second.
Thus, any bitstream produced with a given Huffman
code can be decoded in only one way.
We can see by inspection that the codewords we
generated (shown in the preceding slide) satisfy
the prefix property.
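
The inspection can also be done programmatically. The sketch below checks the prefix property for the Example 1 codewords and then uses it to decode a bitstream one bit at a time. The helper names is_prefix_free and decode, and the sample word "lesson", are assumptions made for illustration.

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of a different codeword."""
    words = list(codewords)
    return not any(a != b and b.startswith(a) for a in words for b in words)

def decode(bits, codes):
    """Decode a bit string; the prefix property makes the first match the only match."""
    inverse = {cw: ch for ch, cw in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # a complete codeword has been read
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError(f"leftover bits {current!r} do not form a codeword")
    return "".join(out)

# Codewords from Example 1.
codes = {"a": "111", "e": "10", "l": "0010", "n": "110",
         "o": "0011", "s": "000", "t": "01"}

print(is_prefix_free(codes.values()))  # the Example 1 code
print(is_prefix_free(["10", "101"]))   # the counterexample from this slide

encoded = "".join(codes[ch] for ch in "lesson")  # made-up sample word
print(encoded, decode(encoded, codes))
```

The decoder never needs to look ahead or backtrack: as soon as the bits read so far match a codeword, that codeword cannot be the start of a longer one.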

20
Exercises

Using the Huffman tree constructed in this
session, decode the following sequence of bits,
if possible. Otherwise, where does the decoding
fail?
10100010111010001000010011
Using the Huffman tree constructed in this
session, write the bit sequences that encode the
messages
test, state, telnet, notes
Mention one disadvantage of a lossless
compression scheme and one disadvantage of a
lossy compression scheme.
Write a Java program that implements the Huffman
coding algorithm.