CHAPTER 3 Fundamentals of Lossy Image Compression

1
CHAPTER 3: Fundamentals of Lossy Image Compression
2
Lossy Compression System

Lossy compression of images deals with
compression processes where decompression yields
an imperfect reconstruction of the original image
data.
There is always a bound on the minimum bit rate
of the compressed bit stream.
Image data tend to have a high degree of spatial
redundancy.
Within such a system, compression is achieved by
exploiting both the spatial redundancies within
the image and the perceptual characteristics of
the human visual system so that the loss due to
compression may not be discernible to the viewer.

3
Sample-Based Coding

There are two classes of lossy compression
schemes for images
sample-based coding
block-based coding
Spatial domain block coding
Transform-domain block coding
In sample-based coding, the image samples are
compressed on a sample-by-sample basis. The
samples can be either in the spatial domain or in
the frequency domain.
Differential pulse code modulation (DPCM)

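[Figure: DPCM encoder and decoder block diagrams. Signals: xij (input sample), eij (prediction residual), qij (quantized residual), Pij (prediction). Blocks: Quantizer, Predictor (one in the encoder, one in the decoder).]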
4
Quantizer

If the image is highly correlated, Pij will track
xij, and eij will consequently be quite small.
The residue signal eij is quantized. The
quantizer maps several of its inputs into a
single output. This process is irreversible and
is the main cause of information loss.
For a uniform quantizer, the quantization process
can be expressed as
[Equation for qij; not preserved in this transcript]
Since the variance of eij is lower than the
variance of xij, quantizing eij will not
introduce significant distortion. Furthermore, the
lower variance corresponds to lower entropy and
thus to higher compression.
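The DPCM loop and the quantizer described above can be sketched in a few lines. This is only an illustration under stated assumptions: the predictor is simply the previously reconstructed sample, the step size STEP = 4 is a made-up value, and the rounding rule is one common form of uniform quantizer, not necessarily the one on the slide.

```python
import math

STEP = 4  # hypothetical quantizer step size (assumption)

def quantize(e, step=STEP):
    """Uniform quantizer: index of the multiple of `step` nearest to e."""
    return math.floor(e / step + 0.5)

def dequantize(q, step=STEP):
    """Map a quantizer index back to a residual value."""
    return q * step

def dpcm_encode(samples, step=STEP):
    pred = 0  # assumed initial prediction
    indices = []
    for x in samples:
        e = x - pred                # residual eij = xij - Pij
        q = quantize(e, step)       # qij, the only lossy step
        indices.append(q)
        pred += dequantize(q, step) # predict from the decoder's reconstruction
    return indices

def dpcm_decode(indices, step=STEP):
    pred = 0
    out = []
    for q in indices:
        pred += dequantize(q, step)
        out.append(pred)
    return out

row = [100, 102, 105, 104, 108, 110, 109, 111]  # a correlated scan line
codes = dpcm_encode(row)
recon = dpcm_decode(codes)
max_err = max(abs(a - b) for a, b in zip(row, recon))
print(codes, recon, max_err)
```

Because the encoder predicts from the reconstructed samples rather than the originals, the reconstruction error never exceeds half a step and does not accumulate along the row.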
5
Block-Based Coding

In spatial-domain block coding, the pixels are
grouped into blocks, and the blocks are then
compressed in the spatial domain.
In transform-domain block coding, the pixels are
grouped into blocks, and the blocks are then
transformed to another domain, such as the
frequency domain.
The motivation for transform coding is a more
compact representation of the data.
Some of the most commonly used transforms include
the discrete Fourier transform (DFT), the
discrete cosine transform (DCT), the discrete
sine transform (DST), the discrete Hadamard
transform (DHT), and the Karhunen-Loeve transform
(KLT).

6
Compaction Efficiency for Various Image Transforms
7
Compaction Efficiency for Various Image
Transforms (Cont.)

The KLT basis is the most efficient in terms of
compaction efficiency, since all the energy is
compacted into the top left corner.
It packs the most energy into the fewest
elements of Y.
It minimizes the total entropy of the sequence,
and
It completely decorrelates the elements of X.
The KLT has several implementation-related
deficiencies
The basis functions are image dependent. The
other basis functions (DFT, DCT, DST, and DHT)
are image independent.
The compaction efficiency of the DCT basis is close
to that produced by the KLT. Therefore, it is
widely used in image and video compression
standards.

8
Basic Transformation Forms
9
Transform Coding

Spatial image data (image or motion-compensated
residual image) are transformed into a different
representation, the transform domain.
This makes the image data easier to compress.
Techniques
Discrete cosine transform (DCT)
Usually applied to small regular blocks of the image,
e.g. 8 x 8 squares.
JPEG, H.26x, MPEG-x
Discrete wavelet transform (DWT)
Usually applied to larger image sections, e.g.
tiles, or to the complete image
JPEG 2000, MPEG-4 still texture

10
Blocks

Process the data in blocks of 8 x 8 samples
Convert Red-Green-Blue into Luminance (greyscale)
and Chrominance (Blue colour difference and Red
colour difference)
Use half resolution for Chrominance (because the eye
is more sensitive to greyscale than to colour)
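The colour conversion and chrominance subsampling steps can be sketched as follows. The slide does not say which conversion matrix is used, so the full-range BT.601 coefficients used by JPEG/JFIF are an assumption here, as is halving the chrominance resolution in both directions by averaging 2 x 2 neighbourhoods. Rounding and clamping to 0..255 are left out, and the plane dimensions are assumed to be even.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel to luminance Y and colour differences Cb, Cr."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b  # blue colour difference
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b  # red colour difference
    return y, cb, cr

def subsample_2x2(plane):
    """Halve the resolution of a plane by averaging each 2 x 2 neighbourhood."""
    h, w = len(plane), len(plane[0])
    return [[(plane[i][j] + plane[i][j + 1]
              + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

# A neutral grey pixel carries no colour difference.
grey = rgb_to_ycbcr(128, 128, 128)

# A 4 x 4 chrominance plane shrinks to 2 x 2; luminance would stay 4 x 4.
cb_plane = [[1, 3, 5, 7],
            [1, 3, 5, 7],
            [2, 2, 6, 6],
            [2, 2, 6, 6]]
cb_small = subsample_2x2(cb_plane)
print(grey, cb_small)
```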

11
Discrete Cosine Transform

Transform each block of 8 x 8 samples into a
block of 8 x 8 spatial frequency coefficients

12
Discrete Cosine Transform
13
An Example of Energy Compaction
14
Two-Dimensional DCT (1974)
15
Discrete Cosine Transform

Any 8 x 8 block of pixels
can be represented as a
sum of 64 basis patterns
(black and white patterns)
Output of the DCT is the
set of weights for these
basis patterns (the DCT
coefficients)
multiply each basis pattern
by its weight and add them
together
result is the original image
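The "weights times basis patterns" view can be checked numerically. The sketch below assumes the orthonormal DCT-II scaling (1/sqrt(N) for the zero-frequency pattern, sqrt(2/N) otherwise); the test block is arbitrary made-up data and the function names are illustrative.

```python
import math

N = 8

def alpha(k):
    """Orthonormal scale factor (assumed normalisation)."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def basis(u, v):
    """The 8 x 8 basis pattern with vertical frequency u and horizontal frequency v."""
    return [[alpha(u) * alpha(v)
             * math.cos((2 * i + 1) * u * math.pi / (2 * N))
             * math.cos((2 * j + 1) * v * math.pi / (2 * N))
             for j in range(N)] for i in range(N)]

def dct2(block):
    """Forward DCT: each weight is the inner product of the block with one pattern."""
    return [[sum(block[i][j] * b[i][j] for i in range(N) for j in range(N))
             for b in (basis(u, v) for v in range(N))]
            for u in range(N)]

def idct2(weights):
    """Inverse DCT: multiply each pattern by its weight and add them together."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            b = basis(u, v)
            for i in range(N):
                for j in range(N):
                    out[i][j] += weights[u][v] * b[i][j]
    return out

block = [[(i * 8 + j * 3) % 256 for j in range(N)] for i in range(N)]
weights = dct2(block)
rebuilt = idct2(weights)
max_err = max(abs(block[i][j] - rebuilt[i][j])
              for i in range(N) for j in range(N))
print(max_err)  # only floating-point rounding error remains
```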

16
Discrete Cosine Transform

Most image blocks only contain a few significant
coefficients (usually the lowest frequencies)
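A small experiment illustrates this concentration in the low frequencies. The block below is a made-up smooth gradient, the DCT uses the orthonormal scaling (an assumption), and the threshold of 1.0 for "significant" is arbitrary.

```python
import math

N = 8

# Orthonormal 8-point DCT matrix: A[k][n] = s(k) * cos((2n + 1) k pi / 16).
A = [[math.sqrt((1 if k == 0 else 2) / N)
      * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)] for k in range(N)]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct2(block):
    """2-D DCT as the matrix product A * block * A^T."""
    return matmul(matmul(A, block), transpose(A))

# Hypothetical smooth block: brightness rises gently down and to the right.
block = [[100 + 5 * i + 3 * j for j in range(N)] for i in range(N)]
coeffs = dct2(block)

total_energy = sum(c * c for row in coeffs for c in row)
corner_energy = sum(coeffs[u][v] ** 2 for u in range(2) for v in range(2))
corner_share = corner_energy / total_energy
significant = sum(1 for row in coeffs for c in row if abs(c) >= 1.0)
print(corner_share, significant)
```

For this block almost all of the energy sits in the four lowest-frequency coefficients, and only a handful of the 64 coefficients exceed the threshold. Real image blocks with edges or texture spread their energy more widely.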

17
Hardware Architectures of Discrete Cosine
Transform
18
Hardware/Software Tradeoff

For low-end applications, a software
implementation is powerful enough.
For high-end applications, a hardware
approach must be used.
For mid-range applications, either a software or a
hardware approach is possible, depending on the
target design platform.

19
DCT Algorithm Classification

Direct 2-D Method
The 2-D transforms, DCT and IDCT, are applied
directly to the N x N input data items.
Row-Column Method
The 2-D transform can be carried out with two
passes of 1-D transforms.
The separability property of the 2-D DCT/IDCT allows
the transform to be applied along one dimension
(rows) and then along the other (columns).
Requires 2N instances of the N-point 1-D DCT to
implement an N x N 2-D DCT.
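The equivalence of the two methods can be checked with a short sketch. It assumes the orthonormal DCT-II scaling and uses an arbitrary made-up block; the 1-D transform is the plain O(N^2) sum, not a fast algorithm.

```python
import math

def scale(k, n):
    return math.sqrt((1 if k == 0 else 2) / n)

def dct_1d(x):
    """N-point 1-D DCT computed directly from the definition."""
    n = len(x)
    return [scale(k, n) * sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                              for i in range(n))
            for k in range(n)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct_2d_row_column(block):
    """Row pass, transpose, column pass; also counts the 1-D transforms used."""
    calls = 0
    row_pass = []
    for row in block:
        row_pass.append(dct_1d(row))
        calls += 1
    col_pass = []
    for col in transpose(row_pass):
        col_pass.append(dct_1d(col))
        calls += 1
    return transpose(col_pass), calls

def dct_2d_direct(block):
    """Direct 2-D method: every output is a sum over all N x N inputs."""
    n = len(block)
    return [[scale(u, n) * scale(v, n)
             * sum(block[i][j]
                   * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                   * math.cos((2 * j + 1) * v * math.pi / (2 * n))
                   for i in range(n) for j in range(n))
             for v in range(n)] for u in range(n)]

block = [[(7 * i + 13 * j + i * j) % 256 for j in range(8)] for i in range(8)]
rc, calls = dct_2d_row_column(block)
direct = dct_2d_direct(block)
max_diff = max(abs(rc[u][v] - direct[u][v]) for u in range(8) for v in range(8))
print(calls, max_diff)
```

For an 8 x 8 block the row-column version uses 16 one-dimensional transforms, matching the 2N figure above, and agrees with the direct method up to rounding error.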

20
Straightforward Approach

Carry out the computation as full matrix-vector
multiplications
1-D transform requires N x N multiplications and
N x (N-1) additions
2-D transform requires N^4 multiplications and N x
N x (N x N - 1) additions
Although it requires the largest number of operations,
this method is very regular.
Most suitable for vector processors or deeply
pipelined architectures for high PE utilization
1-D fast algorithm: O(N log N)
2-D fast algorithm: O(N^2 log N)
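The operation counts above are easy to tabulate. The sketch below evaluates them for N = 8; the row-column figure is derived here by combining 2N one-dimensional transforms with the straightforward N x N count for each, and the function names are illustrative only.

```python
def direct_1d(n):
    """(multiplications, additions) for one N-point matrix-vector DCT."""
    return n * n, n * (n - 1)

def direct_2d(n):
    """(multiplications, additions) for the direct N x N 2-D method."""
    return n ** 4, n * n * (n * n - 1)

def row_column(n):
    """2N straightforward 1-D transforms: N for the rows, N for the columns."""
    mults, adds = direct_1d(n)
    return 2 * n * mults, 2 * n * adds

N = 8
print(direct_1d(N))   # (64, 56)
print(direct_2d(N))   # (4096, 4032)
print(row_column(N))  # (1024, 896)
```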

21
1-D DCT Definition
22
4-Point DCT (N = 4)
23
4-Point DCT Matrix Form
24
4-Point DCT
25
4-Point DCT
16 multiplications reduced to 6
26
Butterfly First DCT Stage
P0 = x(0) + x(3)    M0 = x(0) - x(3)
P1 = x(1) + x(2)    M1 = x(1) - x(2)

Reversed input order
27
Butterfly Second Stage
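X(0) = (P0 + P1) * c2    X(1) = M0 * c1 + M1 * c3
X(2) = (P0 - P1) * c2    X(3) = M0 * c3 - M1 * c1
[Figure: second-stage butterfly from P0, M0, P1, M1 to X(0), X(1), X(2), X(3), with coefficient multipliers such as c1]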
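The two butterfly stages can be checked against the full matrix form. This sketch assumes ck = cos(k*pi/8) and leaves out the common scale factor sqrt(2/N) in both versions, so only the structure is compared; the input vector is arbitrary.

```python
import math

# Assumed coefficient definition: ck = cos(k * pi / 8).
C1, C2, C3 = (math.cos(k * math.pi / 8) for k in (1, 2, 3))

def dct4_butterfly(x):
    """4-point DCT with 6 multiplications (common scale factor omitted)."""
    # First stage: sums and differences of mirrored inputs.
    p0, m0 = x[0] + x[3], x[0] - x[3]
    p1, m1 = x[1] + x[2], x[1] - x[2]
    # Second stage: 6 multiplications in total.
    return [(p0 + p1) * C2,
            m0 * C1 + m1 * C3,
            (p0 - p1) * C2,
            m0 * C3 - m1 * C1]

def dct4_matrix(x):
    """Reference matrix form with 16 multiplications (same scale omitted)."""
    out = []
    for k in range(4):
        ck = math.sqrt(0.5) if k == 0 else 1.0
        out.append(ck * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 8)
                            for n in range(4)))
    return out

x = [1.0, 2.0, 3.0, 4.0]
fast = dct4_butterfly(x)
ref = dct4_matrix(x)
max_diff = max(abs(a - b) for a, b in zip(fast, ref))
print(fast, max_diff)
```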
28
4-Point DCT
[Figure: complete 4-point DCT flow graph with intermediate nodes P0, M0, P1, M1]
29
8-Point DCT
30
Row-Column Method Example

A. Madisetti and A. N. Willson Jr., "A 100 MHz
2-D 8 x 8 DCT/IDCT Processor for HDTV
Applications," IEEE Transactions on Circuits and
Systems for Video Technology, vol. 5, no. 2,
pp. 158-165, Apr. 1995.

31
Description of Algorithms
32
Description of Algorithms (Cont.)

A straightforward implementation requires N^4
multiplications each for the evaluation of the DCT and
the IDCT.
Decomposition into a triple matrix product results in
a reduction in computational complexity to 2N^3
multiplications.
Since 2N^3 multiplications must be performed in N^2
clock cycles (or input sample periods), the
computational requirement of such an
implementation is 2N multiplies per input sample.
For N = 8 and an input sample rate of 100 MHz, the
computation requirement is 1.6 GOPS, where each
operation is a multiply-accumulate.
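The arithmetic behind the 1.6 GOPS figure can be reproduced directly. N = 8 is taken from the 8 x 8 block size of the cited processor; the variable names are illustrative.

```python
N = 8                    # block size of the 8 x 8 DCT
sample_rate_hz = 100e6   # 100 MHz input sample rate

mults_direct = N ** 4                # straightforward 2-D evaluation
mults_triple_product = 2 * N ** 3    # Y = AX followed by Z = Y A^T
samples_per_block = N ** 2           # one input sample per clock cycle

mults_per_sample = mults_triple_product / samples_per_block
ops_per_second = mults_per_sample * sample_rate_hz

print(mults_direct, mults_triple_product)  # 4096 1024
print(mults_per_sample)                    # 16.0, i.e. 2N
print(ops_per_second / 1e9)                # 1.6 (GOPS)
```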

33
Row-Column Method

Basic concept
2-D DCT = 1-D DCT (row) -> 1-D DCT (column)
Each 1-D DCT unit must be capable of computing N
multiplies per input sample.

Y = AX
Z = YA^T
[Figure: X -> 1-D DCT/IDCT (DCT for row) -> Transpose Memory -> 1-D DCT/IDCT (DCT for column) -> Z]
34
Row-Column Method (Cont.)

Let us first consider the computation of the triple
matrix product Z = AXA^T for the DCT or Z = A^TXA
for the IDCT. This is computed as Y = AX and Z =
YA^T for the DCT, and Y = A^TX and Z = YA for the
IDCT.

35
Computation of the DCT

Even rows of A are even-symmetric and odd rows
are odd-symmetric.

36
Matrix Decomposition

Reduces an 8 x 8 matrix computation to two 4 x 4
matrix computations.
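The row symmetry of A is what makes this split possible: sums of mirrored inputs feed the even-indexed outputs, differences feed the odd-indexed ones. The sketch below illustrates the idea for one 8-point 1-D DCT with the orthonormal scaling (an assumption); it is not a description of the processor's datapath.

```python
import math

N = 8

# Orthonormal 8-point DCT matrix A.
A = [[math.sqrt((1 if k == 0 else 2) / N)
      * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)] for k in range(N)]

def dct8_full(x):
    """Full 8 x 8 matrix-vector product: 64 multiplications."""
    return [sum(A[k][n] * x[n] for n in range(N)) for k in range(N)]

def dct8_even_odd(x):
    """Two 4 x 4 matrix-vector products: 32 multiplications."""
    half = N // 2
    sums = [x[n] + x[N - 1 - n] for n in range(half)]
    diffs = [x[n] - x[N - 1 - n] for n in range(half)]
    y = [0.0] * N
    for r in range(half):
        # Even rows are even-symmetric, so they act on the sums.
        y[2 * r] = sum(A[2 * r][n] * sums[n] for n in range(half))
        # Odd rows are odd-symmetric, so they act on the differences.
        y[2 * r + 1] = sum(A[2 * r + 1][n] * diffs[n] for n in range(half))
    return y

x = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
full = dct8_full(x)
split = dct8_even_odd(x)
max_diff = max(abs(a - b) for a, b in zip(full, split))
print(max_diff)
```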

37
Computation of the IDCT
38
System Architecture
39
System Architecture (Cont.)
[Figure: system architecture, with signals X, Y and Z]
40
Architecture of Data Reorder Unit (DRU)
[Figure: DRU block diagram, with control signal INSEL]
41
Data Flow of DRU
[Figure: DRU data flow, with sequences X(3) X(2) X(1) X(0), Y(3) Y(2) Y(1) Y(0) and x0 x1 x2 x3]
42
Data Flow of DRU (Cont.)
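[Figure: DRU data flow over the first four clock cycles. Inputs X0 X1 X2 X3 and X7 X6 X5 X4, intermediate orderings X0 X6 X2 X4 and X7 X1 X5 X3, sums X0+X7, X1+X6, X2+X5, X3+X4 and differences X0-X7, X1-X6, X2-X5, X3-X4.]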
43
Data Flow of DRU (Cont.)
The next four clock cycles
44
ACF Matrix-Vector Multiplication
45
ACF Matrix-Vector Multiplier
[Figure: ACF matrix-vector multiplier. Labels: input xe, broadcasting to the a, c and f multipliers (Mult a, Mult c, Mult f); accumulators ACC 0 to ACC 3; MUX 4:1; output Ye; Timing and Control.]
46
BDEG Matrix-Vector Multiplication
47
BDEG Matrix-Vector Multiplier
48
Hardwired Multiplier
Signed Digit Representation of the DCT
Coefficients
49
Accumulator
50
Transpose Memory
51
Transpose Memory (Cont.)
52
Finite Wordlength Analysis
53
Implementation Results
54
1-D Approach with DA
55
DCT Algorithm
56
DCT Algorithm (Cont.)
57
DCT Algorithm (Cont.)
58
Block Diagram
59
Input Data Format Converter
60
PreAdd and Postadd
61
DA-Based DCT Core
62
DA-Based DCT Core (Cont.)
63
DA-Based DCT Core (Cont.)
64
Transpose Memory
65
1-D Approach with Systolic Array

IEEE Transactions on Circuits and Systems for
Video Technology, vol. 5, no. 2, pp. 150-157,
Apr. 1995.

66
DCT Algorithm
67
Three Steps
68
Systolic Array
69
Systolic Array (Cont.)
70
Features of 1-D Approach with Systolic Array
71
Direct 2-D DCT Architecture
72
Direct 2-D DCT Architecture
73
Data Flow Graph
