Adaptive Dictionary: - PowerPoint PPT Presentation

Slides: 39

Provided by: Joseph513

Learn more at: http://user.engineering.uiowa.edu

Transcript and Presenter's Notes

Title: Adaptive Dictionary:

1
Adaptive Dictionary

LZ and LZW

2
Adaptive Dictionary

In 1977 and 1978, Jacob Ziv and Abraham Lempel
published two papers that produced a family of
compression schemes still widely used today
(1977) LZ77 or LZ1
(1978) LZ78 or LZ2
These techniques, and their variations, are used in
data compression
File compression in UNIX (compress)
Image compression (GIF, Graphics Interchange
Format)
Compression over modems (V.42bis)

3
LZ77

This algorithm is based on a portion of the
previously encoded sequence
The encoder examines the input sequence through a
sliding window
The sliding window consists of two parts
Search Buffer
Look-ahead Buffer

4
LZ77

Encoding Process
Move a pointer back through the search buffer until
it finds a match with the first symbol to be encoded
Offset: the distance from the pointer to the symbol
to be encoded
The encoder then compares the symbols following the
pointer with the consecutive symbols in the
look-ahead buffer to see how far the match extends
Length: the number of consecutive symbols, starting
at the pointer, that match consecutive symbols in
the look-ahead buffer
The encoder stores the longest match found so far and
continues back through the search buffer in order
to possibly find a longer match

5
LZ77

Encoding Process (contd)
Once the search is complete, the encoder encodes
the information to be sent as a triple
<o, l, c>
o = offset
l = length
c = codeword of the symbol following the match in
the look-ahead buffer
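
The search described on these slides can be sketched in Python as follows. This is a minimal illustration, not the presenter's code: the function name, the `start` parameter (which treats everything before that position as already encoded) and the use of plain characters in place of codewords C(x) are all assumptions made for the sketch.

```python
def lz77_encode(data, search_size, lookahead_size, start=0):
    """Return a list of (offset, length, next_symbol) triples.

    Hypothetical helper: symbols before `start` are assumed to have been
    encoded already and only serve as search-buffer contents.
    """
    triples = []
    pos = start  # index of the next symbol to be encoded
    while pos < len(data):
        best_offset, best_length = 0, 0
        # Leave one symbol of the look-ahead buffer for the third element c.
        max_length = min(lookahead_size, len(data) - pos) - 1
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            # The comparison may run past `pos`, i.e. into the look-ahead buffer.
            while (length < max_length
                   and data[pos - offset + length] == data[pos + length]):
                length += 1
            if length > best_length:  # keep the longest match found so far
                best_offset, best_length = offset, length
        triples.append((best_offset, best_length, data[pos + best_length]))
        pos += best_length + 1  # shift the window past the match and c
    return triples


# Sequence and buffer sizes from the worked example in this presentation
# (S = 7, look-ahead buffer = 6, first seven symbols already encoded).
print(lz77_encode("cabracadabrarrarrad", 7, 6, start=7))
```

With the settings shown, the sketch yields the three triples derived by hand in the example: (0, 0, 'd'), (7, 4, 'r') and (3, 5, 'd').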

6
LZ77

Encoding Process (contd)
Note: The third element (c) is included
in the triple to handle the situation
where no match is found in the search buffer
(i.e. l = 0)
This may seem inefficient, sending a triple when
we only need to encode c; however, this
situation is uncommon because of the actual size of
the search buffers (in practice the search
buffers are much larger than those in the examples
in this presentation)
The reason why this is done will become clear
with an example

7
LZ77

Encoding Process (contd)
Let S represent the size of the search buffer
Let W represent the size of the entire window
Let A represent the size of the alphabet
Using fixed-length codes, the triple is encoded
using
⌈log2(S)⌉ + ⌈log2(W)⌉ + ⌈log2(A)⌉
bits
Note: ⌈x⌉ is the ceiling function
⌈3.5⌉ = 4 (ceiling function)
⌊3.5⌋ = 3 (floor function)
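
As a quick check of the bit count, the sum of the three ceiling terms can be evaluated directly. This is a small sketch: the function name is hypothetical, and the alphabet size of 256 (8-bit symbols) is an assumption, since the slides do not fix A.

```python
from math import ceil, log2


def triple_bits(search_size, window_size, alphabet_size):
    """Bits per fixed-length triple: one ceiling term each for o, l and c."""
    return (ceil(log2(search_size))
            + ceil(log2(window_size))
            + ceil(log2(alphabet_size)))


# S = 7 and W = 13 as in this presentation's example; A = 256 is assumed.
print(triple_bits(7, 13, 256))  # 3 + 4 + 8 = 15
```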

8
LZ77

Encoding Process (contd)
The second term, log2(W), may seem a bit
strange. It may, at first, seem as though the
second term should be log2(S). However, the
match may extend into the look-ahead buffer, so
its length can exceed S. This will become clear
in an example
There are 3 cases to consider in this algorithm
No match in the search buffer
There is a match within the search buffer
The match extends inside the look-ahead buffer
The following example outlines each of these cases

9
LZ77

Encoding Process (contd)
Example
Let W = 13, S = 7 (which implies a look-ahead
buffer of size 6)
Suppose the sequence to be encoded is
cabracadabrarrarrad
Window (search buffer | look-ahead buffer):
cabraca | dabrar
It can be seen that there is no match in the
search buffer for d. Thus, we transmit the
triple <0, 0, C(d)>
Shift the window by 1 symbol

10
LZ77

Encoding Process (contd)
Example (contd)
Window (search buffer | look-ahead buffer):
abracad | abrarr
A match is found at o = 2, l = 1
Another match is found at o = 4, l = 1
Another match is found at o = 7, l = 4
Thus, we encode the triple as <7, 4, C(r)>
Shift the window by 5 symbols

11
LZ77

Encoding Process (contd)
Example (contd)
Window (search buffer | look-ahead buffer):
adabrar | rarrad
A match is found at o = 1, l = 1
Another match is found at o = 3, l = 3 if we do
not look further into the look-ahead buffer
However, if we let the match run on into the
look-ahead buffer, we can extend its length to 5
This resolves the question regarding the second
term, log2(W), in the number of bits needed to
encode the triple

12
LZ77

Encoding Process (contd)
Example (contd)
Window (search buffer | look-ahead buffer):
adabrar | rarrad
Thus, we encode the triple as <3, 5, C(d)>
If we were continuing to encode symbols, we would
then shift the window by 6 symbols
Decoding Process
The decoding process is best understood by an
example

13
LZ77

Decoding Process (contd)
Example
Assume we have already decoded the sequence
cabraca and have received the triples
(1) <0, 0, C(d)>
(2) <7, 4, C(r)>
(3) <3, 5, C(d)>
Initially start at
(0) cabraca

14
LZ77

Decoding Process (contd)
Example (contd)
(0) start: cabraca
(1) <0, 0, C(d)>: no match, append d -> cabracad
(2) <7, 4, C(r)>: copy abra, append r -> cabracadabrar
(3) <3, 5, C(d)>: copy rarra, append d -> cabracadabrarrarrad

15
LZ77

Decoding Process (contd)
Example (contd)
(2) <7, 4, C(r)>

16
LZ77

Decoding Process (contd)
Example (contd)
(3) <3, 5, C(d)>
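
The decoding steps can be sketched in a few lines of Python. This is an illustrative sketch under assumptions: the function name and the `prefix` parameter (the part of the sequence already decoded) are hypothetical, and plain characters stand in for the codewords C(x). The symbol-by-symbol copy is what allows a match to overlap the symbols being produced, as in triple (3).

```python
def lz77_decode(triples, prefix=""):
    """Rebuild the sequence from (offset, length, next_symbol) triples."""
    out = list(prefix)
    for offset, length, symbol in triples:
        start = len(out) - offset
        for i in range(length):
            # Copy one symbol at a time: when length > offset, later copies
            # read symbols that this same loop has just appended.
            out.append(out[start + i])
        out.append(symbol)
    return "".join(out)


# The three triples from this presentation, starting from the decoded prefix.
print(lz77_decode([(0, 0, "d"), (7, 4, "r"), (3, 5, "d")], prefix="cabraca"))
# cabracadabrarrarrad
```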

17
LZ77 - SUMMARY

In General
The algorithm is a simple adaptive scheme that
requires no prior knowledge of the source and
seems to require no assumptions about it
Lempel and Ziv showed that asymptotically the
performance of this algorithm approaches the best
that could be obtained by using a scheme that had
full knowledge about the statistics of the source
This may be true asymptotically; however, in
practice there are ways to improve LZ77
There is a hidden assumption that patterns
recur close together. We shall see that this
assumption is removed in LZ78

18
LZ77 - SUMMARY

Variations
Efficient encoding of triples
With added complexity we could drop the
assumption that the triples are fixed length
PKZIP, Zip, LHarc, PNG, gzip and ARJ all use LZ77
followed by a variable-length coder
Varying the size of the search and look-ahead
buffers
Increasing the size of the search buffer will
require more effective search strategies
Such strategies can be implemented more effectively
if the contents of the search buffer are stored
in a manner conducive to fast searches

19
LZ77 - SUMMARY

Variations
Eliminate encoding every item as a triple
This can be done using a flag bit
The flag bit removes the need for
the triple. The data can now be encoded as
either a single-symbol codeword or a pair
representing the match. For example
Flag = 1 → single-symbol codeword
Flag = 0 → pair <o, l> giving the offset and
length of the match
This is referred to as LZSS
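
A rough Python sketch of the flag-bit idea follows. It is not a reference LZSS implementation: the function names, the tuple layout of the tokens and the `min_match` threshold (matches shorter than this are sent as single symbols) are assumptions made for illustration; the slides only specify the flag convention.

```python
def lzss_encode(data, search_size, lookahead_size, min_match=2):
    """Return tokens: (1, symbol) for a literal, (0, offset, length) for a match.

    `min_match` is an assumed threshold, not taken from the slides.
    """
    tokens = []
    pos = 0
    while pos < len(data):
        best_offset, best_length = 0, 0
        max_length = min(lookahead_size, len(data) - pos)
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            while (length < max_length
                   and data[pos - offset + length] == data[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = offset, length
        if best_length >= min_match:
            tokens.append((0, best_offset, best_length))  # flag 0: pair <o, l>
            pos += best_length
        else:
            tokens.append((1, data[pos]))  # flag 1: single symbol
            pos += 1
    return tokens


def lzss_decode(tokens):
    """Invert lzss_encode by replaying literals and back-references."""
    out = []
    for token in tokens:
        if token[0] == 1:
            out.append(token[1])
        else:
            _, offset, length = token
            start = len(out) - offset
            for i in range(length):
                out.append(out[start + i])
    return "".join(out)


print(lzss_encode("aaaaaa", 7, 6))  # [(1, 'a'), (0, 1, 5)]
```

Because no symbol has to be attached to a match, a match may use the whole look-ahead buffer here, whereas the triple form must reserve one position for c.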

20
LZ78

Updates to LZ77
LZ78 drops the LZ77 assumption that patterns
recur close together
LZ77 uses the recent past of the sequence as the
dictionary for encoding
However, this means that any pattern that recurs
over a period longer than that covered by the
coder window will not be captured

21
LZ78

Updates to LZ77
In such a case the pattern would have been captured
if the search window had been just one symbol longer.
As it is, no match is found, so each symbol is
encoded as a single symbol
LZSS: additional 1-bit overhead per symbol
LZ77: a full triple encoded for each single symbol
Thus, this problem actually causes
an expansion instead of a compression

22
LZ78

Solution to this problem
LZ78 replaces the search buffer with an explicit
dictionary
Note: care must be taken that the encoder and the
decoder build the dictionary identically
Now, the data is encoded as a pair
<i, c>
i = the index of the dictionary entry that is the
longest match to the input
c = codeword for the character following the
matched portion of the input
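
The pair encoding can be sketched in Python as below. This is a minimal sketch under assumptions: the function names are hypothetical, index 0 is used to mean "no match", plain characters stand in for codewords, and the input string is an arbitrary illustration rather than the example from these slides. The decoder rebuilds the same dictionary from the pairs alone, which is the point of the note about building it identically on both sides.

```python
def lz78_encode(data):
    """Return a list of (index, symbol) pairs; index 0 means 'no match'."""
    dictionary = {}  # phrase -> index, indices start at 1
    pairs = []
    phrase = ""
    for symbol in data:
        if phrase + symbol in dictionary:
            phrase += symbol  # keep extending the match
        else:
            pairs.append((dictionary.get(phrase, 0), symbol))
            dictionary[phrase + symbol] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: send its prefix plus last symbol.
        pairs.append((dictionary.get(phrase[:-1], 0), phrase[-1]))
    return pairs


def lz78_decode(pairs):
    """Rebuild the input, growing the same dictionary as the encoder."""
    phrases = [""]  # phrases[i] is entry i; entry 0 is the empty phrase
    out = []
    for index, symbol in pairs:
        phrase = phrases[index] + symbol
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


print(lz78_encode("abababa"))  # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]
```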

23
LZ78

Example
Let us encode the following word
The character b with a slash represents a space

24
LZ78

Example (contd)

25
LZ78

Example (contd)
Problems
The dictionary grows indefinitely
To resolve this problem there are two options
Pruning
However, added complexity is required in order to
keep track of the most frequently used dictionary
elements
Switching to a static dictionary at some stage
This limits the performance of the algorithm

26
LZW

Variation of LZ78
Terry Welch proposed a method for removing the
necessity of encoding the pair <i, c> and only
encoding the index
The dictionary must be primed with the source
alphabet
This variation is known as LZW
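
A short Python sketch of the encoder side of this idea is given below. The function name, the 1-based indexing and the sample input are assumptions for illustration, and the sketch presumes that every input symbol appears in the supplied alphabet.

```python
def lzw_encode(data, alphabet):
    """Return a list of dictionary indices only (no symbol is transmitted).

    Assumes `alphabet` lists distinct symbols and covers every symbol in `data`.
    """
    # Prime the dictionary with the source alphabet, indices starting at 1.
    dictionary = {symbol: i + 1 for i, symbol in enumerate(alphabet)}
    indices = []
    phrase = ""
    for symbol in data:
        if phrase + symbol in dictionary:
            phrase += symbol  # keep extending the match
        else:
            indices.append(dictionary[phrase])
            dictionary[phrase + symbol] = len(dictionary) + 1
            phrase = symbol  # the unmatched symbol starts the next phrase
    if phrase:
        indices.append(dictionary[phrase])
    return indices


print(lzw_encode("abababa", "ab"))  # [1, 2, 3, 5]
```

Unlike the <i, c> pair, the symbol that ended a match is not sent; it becomes the first symbol of the next phrase, which is why the dictionary has to contain every single symbol from the start.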

27
LZW