April 30th, 2003
Parallel Design of JPEG2000 Image Compression
Xiuzhen Huang
CS Department UC Santa Barbara
Outline
- Introduction to image compression
- JPEG2000 compression scheme
- Parallel implementation of JPEG2000
- On distributed-memory multiprocessors
- On shared-memory multiprocessors
- Conclusion
Introduction to Image Compression
Why do we need image compression?
File size of a small digital photo (1280 × 800 pixels) without compression:
1280 × 800 × 3 (RGB) = 3,072,000 bytes ≈ 3 M bytes
To speed up image transmission over the Internet and reduce image storage space, we need compression.
Introduction to Image Compression
Original picture: 3 M bytes
JPEG2000 compression: 19 K bytes
- Compression ratio > 150 times!
- No noticeable difference in picture quality
JPEG2000 International Standard
JPEG2000, the new international standard for image compression, is much more efficient than the old JPEG international standard. For the same compression ratio (bit rate, file size), the JPEG2000 picture has much better quality.
[Figure: original picture compared with JPEG and JPEG2000 at a compression ratio of 50:1; the JPEG image shows strong blockiness]
JPEG2000 International Standard
JPEG2000 has a much higher computational complexity than JPEG, especially for larger pictures.
A parallel implementation is needed to reduce compression time.
JPEG2000 Compression Scheme
Major steps of JPEG2000 image compression:
Input image → Wavelet transform → Blockwise partition → Coding of each block → Binary compressed data
- The wavelet transform uses most of the image compression time (> 80%), so the parallel implementation should focus on the wavelet transform.
JPEG2000 Compression Scheme
Brief Introduction to Wavelet Transform
Step 1: Horizontal wavelet transform of an image
for each row do 1-D wavelet transform end
What is a 1-D wavelet transform?
JPEG2000 Compression Scheme
A simple example: 1-D Haar wavelet transform of one array of image data
- Low-pass filter [1, 1]/2 (average of neighboring pixels), down-sampled by 2: low-frequency coefficients, stored in the first half of the output
- High-pass filter [1, -1]/2 (difference of neighboring pixels), down-sampled by 2: high-frequency coefficients, stored in the second half of the output
[Figure: horizontal wavelet transform of each row, splitting the image into a Low half and a High half]
JPEG2000 Compression Scheme
Wavelet Transform
Step 2: Vertical transform of the image
for each column of the new image do 1-D wavelet transform end
JPEG2000 Compression Scheme
[Figure: the horizontal wavelet transform of each row produces Low and High halves; the vertical wavelet transform of each column then produces four subbands: Low Low, High Low, Low High, High High]
Parallel Design of JPEG2000 Compression
Two Parallel Computing Architectures
Distributed-Memory Multiprocessors
- Each processor has its own memory module.
- Processors communicate with each other over a high-speed network.
- Programming tool: MPI (Message Passing Interface)
Shared-Memory Multiprocessors
- Have a single address space.
- Allow processors to communicate through variables stored in the shared address space.
- Programming tool: OpenMP
Parallel Implementation of JPEG2000
Compression on Distributed-Memory Multiprocessors
Parallel Design of JPEG2000 Compression-DMP
Traditional Approach
- The image is first divided into n regions by rows.
- Each processor performs the 1-D horizontal wavelet transform on its region.
- Then the new image is divided into n regions by columns.
- Each processor performs the 1-D vertical wavelet transform on its region.
This approach requires intensive data transmission among processors and therefore has a very high network communication cost.
Parallel Design of JPEG2000 Compression-DMP
Tiling Approach
- The JPEG2000 international standard supports tile-based image compression.
- A large image is divided into several tiles, and each image tile is compressed independently.
[Figure: an image divided into nine tiles assigned to processors P1 to P9]
Parallel Design of JPEG2000 Compression-DMP
MPI is chosen for the parallel implementation of JPEG2000 because the JPEG2000 software is written in C, which MPI supports. The basic framework is:
Parallel Design of JPEG2000 Compression-DMP
[Figure: compression time (sec) versus number of processors for a 512 × 512 image, with tile sizes 32 and 256]
The figure shows the compression time for different tile sizes. For each tile size, the compression time decreases as the number of processors increases. Smaller tiles incur a larger computation overhead.
Parallel Design of JPEG2000 Compression-DMP
Note
- There is a jump between one process and two processes.
- When there is only one process, JPEG2000 compression is sequential.
- When two or more processes are involved in the program, Process 1 is responsible for collecting data, while the others are responsible for processing different tiles and sending the processed data back to Process 1.
Parallel Implementation of JPEG2000
Compression on Shared-Memory Multiprocessors
Parallel Design of JPEG2000 Compression-SMP
A problem with the tile-based approach
[Figure: images compressed by JPEG, JPEG2000, and JPEG2000 with relatively small tiles]
Each tile is compressed independently, which causes discontinuities across tile edges, also called blockiness.
Parallel Design of JPEG2000 Compression-SMP
- Another parallel architecture is the shared-memory multiprocessor.
- The excellent price-performance ratio of Intel-based SMPs makes such systems very popular in many data processing applications.
- There are also many programming tools available for shared-memory multiprocessors, such as OpenMP and Java Threads.
Parallel Design of JPEG2000 Compression-SMP
- In an SMP, we do not need to worry about data communication over a network, because the data is in shared memory. So there is no need for tile partitioning.
- Therefore, we can use the traditional data partitioning approach for the horizontal and vertical wavelet transforms.
Parallel Design of JPEG2000 Compression-SMP
- JPEG2000 image compression is implemented on a 4-processor SMP system by applying OpenMP directly.
- The speedup in the wavelet transform is only about 1.6 times, whereas it should be close to 4 times.
- Why?
Parallel Design of JPEG2000 Compression-SMP
It was found that the vertical wavelet transform takes more than 10 times as long as the horizontal transform, even though both transforms perform the same number of operations.
[Figure: run time of the vertical vs. horizontal wavelet transform]
Parallel Design of JPEG2000 Compression-SMP
Cache Miss Problem
- In computer memory, the image data is stored line by line in raster-scan order (from left to right, from top to bottom).
- Each contiguous block of image data is brought into the cache from memory for the wavelet transform.
- In the horizontal wavelet transform, as the filter window moves, the data for the next step is often already in the cache, so there are few cache misses.
Parallel Design of JPEG2000 Compression-SMP
Cache Miss Problem
- In the vertical wavelet transform, the filtering is done in the vertical direction, but the data is brought into the cache horizontally. So cache misses are very frequent.
Solution
Perform the vertical transform on several columns at the same time, instead of column by column, to make full use of the data already in the cache.
This significantly reduces cache misses.
Parallel Design of JPEG2000 Compression-SMP
The vertical transform is sped up by about 10 times.
[Figure: run time of the original vs. improved vertical transform]
Parallel Design of JPEG2000 Compression-SMP
Using the improved vertical wavelet transform, the overall speedup of the wavelet transform is now close to the number of processors.
Conclusion
- Gave a brief review of JPEG2000 image compression.
- Discussed two approaches for the parallel implementation of JPEG2000 image compression: on distributed-memory multiprocessors and on shared-memory multiprocessors.
Question?