Greedy Algorithms (Huffman Coding)

Slides: 33

Provided by: webCsWpi99

Learn more at: http://web.cs.wpi.edu

Transcript and Presenter's Notes

1
Greedy Algorithms (Huffman Coding)
2
Huffman Coding

A technique to compress data effectively
Usually achieves between 20% and 90% compression
Lossless compression
No information is lost
When you decompress, you get the original file back

3
Huffman Coding Applications
[Diagram: Original file → Huffman coding → Compressed file]

Saving space
Store compressed files instead of original files
Transmitting files or data
Send compressed data to save transmission time
and power
Encryption and decryption
Cannot read the compressed file without knowing
the key

4
Main Idea: Frequency-Based Encoding

Assume that only 6 characters appear in this file:
E, A, C, T, K, N
The frequencies are:

Character Frequency
E 10,000
A 4,000
C 300
T 200
K 100
N 100

Option 1 (No compression)
Each character = 1 byte (8 bits)
Total file size = 14,700 × 8 = 117,600 bits
Option 2 (Fixed-size compression)
We have 6 characters, so we need
3 bits to encode them
Total file size = 14,700 × 3 = 44,100 bits

Character Fixed Encoding
E 000
A 001
C 010
T 100
K 110
N 111
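
The arithmetic for Options 1 and 2 can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative and not part of the slides, and the frequency table is copied from the slide above.

```python
import math

# Frequencies from the slide's table.
freq = {"E": 10_000, "A": 4_000, "C": 300, "T": 200, "K": 100, "N": 100}

total_chars = sum(freq.values())                 # 14,700 characters in the file
bits_needed = math.ceil(math.log2(len(freq)))    # smallest width that can label 6 symbols

uncompressed_bits = total_chars * 8              # Option 1: one byte per character
fixed_bits = total_chars * bits_needed           # Option 2: fixed-size codes

print(bits_needed, uncompressed_bits, fixed_bits)  # 3 117600 44100
```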
5
Main Idea: Frequency-Based Encoding (Cont'd)

Assume that only 6 characters appear in this file:
E, A, C, T, K, N
The frequencies are:

Character Frequency
E 10,000
A 4,000
C 300
T 200
K 100
N 100

Option 3 (Huffman compression)
Variable-length compression
Assign shorter codes to more frequent characters
and longer codes to less frequent characters

Char. Huffman Encoding
E 0
A 10
C 110
T 1110
K 11110
N 11111
Total file size = (10,000 × 1) + (4,000 × 2) + (300 × 3) + (200 × 4) + (100 × 5) + (100 × 5) = 20,700 bits
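
The total for Option 3 is a weighted sum: each character's frequency times the length of its code. A minimal sketch, assuming the frequency and code tables from this slide (the variable names are illustrative):

```python
# Frequencies and Huffman codes copied from the slide's tables.
freq = {"E": 10_000, "A": 4_000, "C": 300, "T": 200, "K": 100, "N": 100}
codes = {"E": "0", "A": "10", "C": "110", "T": "1110", "K": "11110", "N": "11111"}

# Each character contributes (frequency x code length) bits.
huffman_bits = sum(freq[ch] * len(codes[ch]) for ch in freq)

print(huffman_bits)  # 20700
```

Compared with the 44,100 bits of the fixed 3-bit encoding, the variable-length table uses less than half the space on this file.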
6
Huffman Coding

A variable-length coding for characters
More frequent characters → shorter codes
Less frequent characters → longer codes
Unlike ASCII coding, where all characters
have the same code length (8 bits)
Two main questions
How to assign codes (encoding process)?
How to decode, i.e., regenerate the original file
from the compressed file (decoding process)?

7
Decoding for fixed-length codes is much easier
010001100110111000
Character Fixed-length Encoding
E 000
A 001
C 010
T 100
K 110
N 111
Divide into groups of 3 bits
010 001 100 110 111 000
Decode
C A T K N E
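
The divide-and-look-up procedure on this slide can be sketched in Python as follows. The function name `decode_fixed` is hypothetical; the code table and the bit string are the ones shown on the slide.

```python
# Fixed-length table from the slide.
FIXED_CODES = {"E": "000", "A": "001", "C": "010", "T": "100", "K": "110", "N": "111"}

def decode_fixed(bits, codes, width=3):
    """Split `bits` into chunks of `width` bits and look each chunk up."""
    lookup = {code: ch for ch, code in codes.items()}   # invert: code -> character
    chunks = [bits[i:i + width] for i in range(0, len(bits), width)]
    return "".join(lookup[chunk] for chunk in chunks)

decoded = decode_fixed("010001100110111000", FIXED_CODES)
print(decoded)  # CATKNE
```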
8
Decoding for variable-length codes is not that
easy
000001
Character Variable-length Encoding
E 0
A 00
C 001

The string 000001 could be read as E E E C, E A C, or A E C
Huffman encoding is guaranteed to avoid this
ambiguity: every encoded string has a single decoding
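
To see the ambiguity concretely, the sketch below enumerates every way a bit string can be split into codewords. The helper `all_decodings` is a hypothetical name used only for this illustration; the ambiguous table is the one on this slide, and the second table is the first three Huffman codes from Slide 5.

```python
def all_decodings(bits, codes):
    """Return every way to split `bits` into codewords from `codes`."""
    if not bits:
        return [""]                       # one way to decode nothing
    results = []
    for ch, code in codes.items():
        if bits.startswith(code):
            rest = all_decodings(bits[len(code):], codes)
            results.extend(ch + tail for tail in rest)
    return results

ambiguous = {"E": "0", "A": "00", "C": "001"}       # table from this slide
prefix_free = {"E": "0", "A": "10", "C": "110"}     # codes from Slide 5

ambiguous_result = all_decodings("000001", ambiguous)
unique_result = all_decodings("010110", prefix_free)

print(sorted(ambiguous_result))  # ['AEC', 'EAC', 'EEEC']
print(unique_result)             # ['EAC']
```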
9
Huffman Algorithm

Step 1: Get Frequencies
Scan the file to be compressed and count the
occurrences of each character
Sort the characters based on their frequency
Step 2: Build Tree & Assign Codes
Build a Huffman-code tree (binary tree)
Traverse the tree to assign codes
Step 3: Encode (Compress)
Scan the file again and replace each character by
its code
Step 4: Decode (Decompress)
The Huffman tree is the key to decompressing the file

10
Step 1: Get Frequencies
Input File
Eerie eyes seen near lake.
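
Counting and sorting the characters of this input can be sketched with the standard library's `collections.Counter`. Variable names are illustrative; note that uppercase `E` and lowercase `e` are counted as different characters, and the space and the period count too.

```python
from collections import Counter

text = "Eerie eyes seen near lake."

freq = Counter(text)                                       # character -> number of occurrences
by_frequency = sorted(freq.items(), key=lambda item: item[1])

for ch, n in by_frequency:
    print(repr(ch), n)
```

On this input there are 12 distinct characters and 26 characters in total; `e` occurs 8 times and the space 4 times.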
11
Step 2: Build Huffman Tree & Assign Codes

It is a binary tree in which each character is a
leaf node
Initially, each node is a separate root
At each step
Select the two roots with the smallest frequencies and
connect them to a new parent (break ties
arbitrarily). This is the greedy choice
The parent gets the sum of the frequencies of its
two child nodes
Repeat until only one root remains
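
The greedy loop above maps naturally onto a min-heap. The following is a minimal sketch, not the presenter's implementation: the function name `build_huffman_tree` and the tuple-based node representation are assumptions made for illustration. Ties are broken by insertion order here, so the resulting tree may differ in shape from the one drawn on the slides while having the same total cost.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(freq):
    """Repeatedly merge the two lightest roots until one root remains.

    A leaf is a single-character string; an internal node is a
    (left, right) tuple. Assumes `freq` is non-empty.
    """
    tiebreak = count()   # unique sequence number so trees are never compared
    heap = [(weight, next(tiebreak), ch) for ch, weight in freq.items()]
    heapq.heapify(heap)  # initially every character is its own root
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # smallest frequency
        w2, _, right = heapq.heappop(heap)   # second smallest
        # The new parent carries the sum of its children's frequencies.
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    weight, _, root = heap[0]
    return weight, root

root_weight, root = build_huffman_tree(Counter("Eerie eyes seen near lake."))
print(root_weight)  # 26
```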

12
Example
Each char. has a leaf node with its frequency
13-23
Find the smallest two frequencies and replace them
with their parent (one merge per slide)
24
Now we have a single root. This is the Huffman tree
25
Let's Analyze the Huffman Tree

All characters are at the leaf nodes
The number at the root = the number of characters in the
file
High-frequency chars (e.g., e) are near the
root
Low-frequency chars are far from the root
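
These observations can be checked numerically. The sketch below tracks only the depth of each character rather than building an explicit tree: every time two roots are merged, all characters beneath them move one level further from the root. The variable names are illustrative, and ties are broken by insertion order, which may not match the slides.

```python
import heapq
from collections import Counter

text = "Eerie eyes seen near lake."
freq = Counter(text)

# Each heap entry: (weight, unique id, characters under this root).
heap = [(w, i, [ch]) for i, (ch, w) in enumerate(freq.items())]
heapq.heapify(heap)
depth = dict.fromkeys(freq, 0)
next_id = len(heap)

while len(heap) > 1:
    w1, _, s1 = heapq.heappop(heap)
    w2, _, s2 = heapq.heappop(heap)
    for ch in s1 + s2:
        depth[ch] += 1          # one level further from the eventual root
    heapq.heappush(heap, (w1 + w2, next_id, s1 + s2))
    next_id += 1

root_weight = heap[0][0]
total_bits = sum(freq[ch] * depth[ch] for ch in freq)

print(root_weight == len(text))          # True: the root counts every character
print(depth["e"] == min(depth.values())) # True: the most frequent char is nearest the root
print(total_bits)                        # 84
```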

26
Let's Assign Codes

Traverse the tree
Any left edge → add label 0
Any right edge → add label 1
The code for each character is its root-to-leaf
label sequence
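
The traversal can be sketched as a short recursive function. The name `assign_codes` and the nested-tuple tree are assumptions for illustration: a leaf is a one-character string and an internal node is a `(left, right)` pair. The example tree is hand-built to match the six-character code table from Slide 5, not the tree for the "Eerie" example.

```python
def assign_codes(node, prefix="", table=None):
    """Walk the tree, appending 0 for a left edge and 1 for a right edge."""
    if table is None:
        table = {}
    if isinstance(node, str):            # leaf: the path so far is the code
        table[node] = prefix or "0"      # a one-leaf tree still needs one bit
    else:
        left, right = node
        assign_codes(left, prefix + "0", table)
        assign_codes(right, prefix + "1", table)
    return table

# Hand-built tree: E is the left child of the root, A the left child of the next node, etc.
tree = ("E", ("A", ("C", ("T", ("K", "N")))))
codes = assign_codes(tree)
print(codes)
# {'E': '0', 'A': '10', 'C': '110', 'T': '1110', 'K': '11110', 'N': '11111'}
```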

27-28
Let's Assign Codes (continued)
[Figure: the Huffman tree with each edge labeled 0 or 1]

29
Huffman Algorithm

Step 1: Get Frequencies
Scan the file to be compressed and count the
occurrences of each character
Sort the characters based on their frequency
Step 2: Build Tree & Assign Codes
Build a Huffman-code tree (binary tree)
Traverse the tree to assign codes
Step 3: Encode (Compress)
Scan the file again and replace each character by
its code
Step 4: Decode (Decompress)
The Huffman tree is the key to decompressing the file

30
Step 3: Encode (Compress) the File
Input File
Eerie eyes seen near lake.

Encoded output (beginning): 0000 10 1100 0001 10 ...
Notice that no code is a prefix of any other code
→ This ensures the decoding is unique (unlike
Slide 8)
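
Encoding and the prefix property can both be sketched in a few lines. The helpers `huffman_codes` and `is_prefix_free` are hypothetical names, and this compact variant builds the code table directly while merging instead of building a tree first. Because ties are broken by insertion order, the individual codes may differ from the ones on the slide, but the total length of the encoded file is the same for any Huffman code on this input. The sketch assumes the text contains at least two distinct characters.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a code table by merging the two lightest groups at each step."""
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Going up one level prepends a bit: 0 for the left group, 1 for the right.
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def is_prefix_free(codes):
    """True if no codeword is a prefix of another (checked on sorted neighbours)."""
    words = sorted(codes.values())
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

text = "Eerie eyes seen near lake."
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)   # replace each character by its code

print(len(encoded), len(text) * 8)                         # 84 208
print(is_prefix_free(codes))                               # True
print(is_prefix_free({"E": "0", "A": "00", "C": "001"}))   # False (table from Slide 8)
```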
31
Step 4: Decode (Decompress)
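
Decoding walks the tree from the root, one bit at a time, and emits a character each time a leaf is reached. A minimal sketch, assuming the same nested-tuple representation as a stand-in for the Huffman tree (a leaf is a one-character string, an internal node is a `(left, right)` pair). The function name `decode` is illustrative, and the tree and code table are the six-character example from Slide 5.

```python
def decode(bits, root):
    """Follow 0 = left, 1 = right from the root; emit a character at each leaf."""
    out = []
    node = root
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):     # reached a leaf
            out.append(node)
            node = root               # restart at the root for the next codeword
    if node is not root:
        raise ValueError("bit string ended in the middle of a codeword")
    return "".join(out)

codes = {"E": "0", "A": "10", "C": "110", "T": "1110", "K": "11110", "N": "11111"}
tree = ("E", ("A", ("C", ("T", ("K", "N")))))   # tree corresponding to `codes`

message = "CATKNE"
encoded = "".join(codes[ch] for ch in message)
decoded = decode(encoded, tree)
print(decoded)  # CATKNE
```

No separators between codewords are needed: because no code is a prefix of another, reaching a leaf marks the end of a codeword unambiguously.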