1 Design and Implementation of FPGA-based systolic
array for LZ Data Compression
By Mohamed Ahmed Abd El Ghany Ahmed
2006
2 Overview
- Introduction to Data Compression
- Data Compression Methods
- Systolic Array Operation in LZ
- Proposed design (Design-P)
- FPGA Implementation
- Testing Application
- Software simulation
- Conclusions
3 Introduction to Data Compression
- Data compression is the process of converting an
input data stream into another data stream with a
reduced size.
- Benefits of data compression
- Reduction of data storage requirements
- Reduction of data transfer cost
4 Data Compression Methods
Lossy Data Compression
- The decompressed data are some approximation of
the original data
- Transform coding schemes
- Vector Quantization schemes
- Sub-band coding schemes
Lossless Data Compression
- The decompressed data must always be identical to
the original data
- Run-Length Encoding
- Statistical Methods
- Dictionary Methods
5 Lempel-Ziv Algorithms
- LZ77 and its variants: LZSS, LZH
- LZ78 and its variants: LZW, LZMW
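6 LZSS Idea
Window = dictionary + lookahead buffer
Before encoding
- Already shifted out: a b
- Dictionary: c b b a c d e
- Lookahead buffer: b b a d e
- Still to come: a a . . . .
Output codeword (1, Ip, Lmax) = (1, 2, 3)
(1 = match flag, Ip = position of the longest match, Lmax = its length)
After shifting by Lmax (3)
- Already shifted out: b b
- Dictionary: a c d e b b a
- Lookahead buffer: d e a a e
- Still to come: f g . . . .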
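The match search in this example can be sketched in a few lines of Python. This is only an illustration, not the hardware method: the function name `longest_match` is hypothetical, the position is counted from 1 because that reproduces the codeword (1, 2, 3) shown on the slide, and matches are assumed to stay inside the dictionary.

```python
def longest_match(dictionary, lookahead):
    """Return (Ip, Lmax) for the longest prefix of `lookahead` found in `dictionary`.

    Ip is 1-based (an assumption made to reproduce the slide); (0, 0) means no match.
    """
    best_pos, best_len = 0, 0
    for start in range(len(dictionary)):
        length = 0
        # Extend the match while symbols agree and both buffers have symbols left.
        while (length < len(lookahead)
               and start + length < len(dictionary)
               and dictionary[start + length] == lookahead[length]):
            length += 1
        if length > best_len:  # keep the first (leftmost) longest match
            best_pos, best_len = start + 1, length
    return best_pos, best_len


dictionary = list("cbbacde")
lookahead = list("bbade")
ip, lmax = longest_match(dictionary, lookahead)
codeword = (1, ip, lmax)
print(codeword)
```

The brute-force loop visits every dictionary position in turn; the systolic array performs the same comparisons in a pipelined fashion.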
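7 Codeword length Lc
$L_c = \lceil \log_2(\text{dictionary length}) \rceil + \lceil \log_2(\text{lookahead buffer length}) \rceil + 1$ bits
In the example, $L_c = \lceil \log_2 7 \rceil + \lceil \log_2 5 \rceil + 1 = 3 + 3 + 1 = 7$ bits
b b a (3 bytes = 24 bits) is compressed to the codeword (1, 2, 3) (7 bits)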
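The codeword-length arithmetic can be checked with a short sketch. It assumes each logarithm is rounded up to a whole number of bits, which is what makes the example come out at 7 bits; the function name `codeword_length_bits` is hypothetical.

```python
def codeword_length_bits(dictionary_length, lookahead_length):
    """Bits per match codeword: position field + length field + 1 flag bit."""
    # (k - 1).bit_length() equals ceil(log2(k)) for k >= 1, without floating point.
    position_bits = (dictionary_length - 1).bit_length()
    length_bits = (lookahead_length - 1).bit_length()
    return position_bits + length_bits + 1


lc = codeword_length_bits(7, 5)   # slide example: dictionary of 7, lookahead of 5
raw_bits = 3 * 8                  # the three matched symbols "b b a" as bytes
print(lc, raw_bits)
```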
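8 Non-Match Case
Window = dictionary + lookahead buffer
Before encoding
- Already shifted out: a b
- Dictionary: c b b a c d e
- Lookahead buffer: f a c d e
- Still to come: a a . . . .
Output codeword (0, S), where S is the first symbol of the
lookahead buffer: (0, f)
After shifting by 1
- Already shifted out: b c
- Dictionary: b b a c d e f
- Lookahead buffer: a c d e a
- Still to come: a . . . .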
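The two cases (match and non-match) differ only in the codeword emitted and in how far the window is shifted. The sketch below illustrates that rule; the helper names `make_codeword` and `shift_window` are hypothetical, and it assumes that any match of length 1 or more is emitted as a match codeword, which the slides do not state.

```python
def make_codeword(match_pos, match_len, lookahead):
    """(0, S) when nothing matched, otherwise (1, Ip, Lmax)."""
    if match_len == 0:
        return (0, lookahead[0])
    return (1, match_pos, match_len)


def shift_window(dictionary, lookahead, incoming, match_len):
    """Shift the window by Lmax symbols, or by 1 in the non-match case."""
    shift = max(1, match_len)
    window = dictionary + lookahead + incoming[:shift]
    n_dict = len(dictionary)
    new_dictionary = window[shift:shift + n_dict]
    new_lookahead = window[shift + n_dict:]
    return new_dictionary, new_lookahead, incoming[shift:]


# Non-match example: 'f' does not occur in the dictionary.
dictionary, lookahead, incoming = list("cbbacde"), list("facde"), list("a")
codeword = make_codeword(0, 0, lookahead)
new_dictionary, new_lookahead, remaining = shift_window(
    dictionary, lookahead, incoming, 0)
print(codeword, "".join(new_dictionary), "".join(new_lookahead))
```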
9 Systolic Array Operation in LZ
Dictionary (length n-Ls): X0 X1 X2 X3 X4 X5
Lookahead buffer (length Ls): Y0 Y1 Y2 (the same symbols as X6 X7 X8)
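A behavioural sketch of what the array has to compute for this buffer layout: one match length per dictionary position, from which the longest is selected. This is a software model under assumptions, not the hardware: the names `match_lengths` and `best_match` are hypothetical, positions are 0-based, and a comparison starting near the end of the dictionary is allowed to run on into the lookahead symbols.

```python
def match_lengths(x, ls):
    """For a buffer x of n symbols whose last ls symbols are the lookahead
    buffer Y, return the match length L_i for every dictionary position i."""
    n = len(x)
    y = x[n - ls:]                      # Y0 .. Y(ls-1)
    lengths = []
    for i in range(n - ls):             # one candidate start per dictionary symbol
        length = 0
        while length < ls and x[i + length] == y[length]:
            length += 1
        lengths.append(length)
    return lengths


def best_match(lengths):
    """Pick the longest match; ties go to the lowest position."""
    lmax = max(lengths)
    return lengths.index(lmax), lmax


buffer = list("abcabdabc")              # n = 9, Ls = 3, so Y = "abc"
lengths = match_lengths(buffer, 3)
print(lengths, best_match(lengths))
```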
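10 Interleaved Design (Design-i)
[Figure: processing elements PE0, PE1, PE2 with inputs Y0, Y1, Y2 and delay elements D; the input sequence X8 X7 X6 X5 X4 X3 X2 X1 X0 is fed in interleaved form (X7 X4 X6 X3 X5 X2 X4 X1 X3 X0); output Li]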
11 The Match Results Block
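12 Proposed Design (Design-P)
[Figure: processing elements PE0, PE1, PE2 with inputs Y0, Y1, Y2 and delay elements D; input stream X7 ... X2 X1 X0; the PE outputs E0, E1, E2 feed the L-encoder, which produces Li]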
13 Design-P PE
[Figure panels: Design-P PE, Design-i PE]
14 L-Encoder
[Figure: L-encoder with inputs E0, E1, E2 and outputs Li0, Li1]
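One plausible reading of the L-encoder, given as a sketch only: assume Ej is 1 when PEj found its symbols equal, and that the two output bits (Li1, Li0) encode in binary how many consecutive PEs matched, starting from PE0. The Boolean equations below are derived from that assumption and are not taken from the slides.

```python
from itertools import product


def leading_matches(e):
    """Number of consecutive 1s at the start of (E0, E1, E2)."""
    count = 0
    for bit in e:
        if not bit:
            break
        count += 1
    return count


def l_encoder(e0, e1, e2):
    """Assumed gate-level form: Li1 = E0.E1, Li0 = E0.(not E1 + E2)."""
    li1 = e0 & e1
    li0 = e0 & ((1 - e1) | e2)
    return li1, li0


# Exhaustive check of the assumed equations against the behavioural count.
for e0, e1, e2 in product((0, 1), repeat=3):
    li1, li0 = l_encoder(e0, e1, e2)
    assert 2 * li1 + li0 == leading_matches((e0, e1, e2))
```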
15 MRB of Design-P
[Figure panels: MRB of Design-P, MRB of Design-2i]
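16 Parallel Compression
[Figure: two arrays, each with processing elements PE0, PE1, PE2, delay elements D, inputs Y0, Y1, Y2, PE outputs E0, E1, E2 and an L-encoder; input symbols X0 to X8; the first array outputs LI and the second outputs LII]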
17 LZ Compression Chip
[Figure: block diagram with the blocks FIFO, SALZC component and host controller; signals: Input sequence, Xi, Yi, Li, Control, Control_FIFO, Code word]
18 First-in-First-out (FIFO)
[Figure: block RAM addressed by Write_counter (Write_address) and Read_counter (read_address); signals: Input_sequence, controls]
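A FIFO built from a memory and two address counters can be modelled as a circular buffer. The sketch below is a software analogy only: the class name `Fifo` and the `occupancy` field are hypothetical additions used to detect the full and empty conditions, and the list `ram` stands in for the block RAM.

```python
class Fifo:
    """Circular buffer with separate write and read address counters."""

    def __init__(self, depth):
        self.ram = [None] * depth      # stands in for the block RAM
        self.write_counter = 0         # next write address
        self.read_counter = 0          # next read address
        self.occupancy = 0             # hypothetical helper: symbols stored

    def write(self, symbol):
        if self.occupancy == len(self.ram):
            raise OverflowError("FIFO full")
        self.ram[self.write_counter] = symbol
        self.write_counter = (self.write_counter + 1) % len(self.ram)
        self.occupancy += 1

    def read(self):
        if self.occupancy == 0:
            raise IndexError("FIFO empty")
        symbol = self.ram[self.read_counter]
        self.read_counter = (self.read_counter + 1) % len(self.ram)
        self.occupancy -= 1
        return symbol


fifo = Fifo(4)
for s in "abc":
    fifo.write(s)
first_two = [fifo.read(), fifo.read()]
fifo.write("d")
fifo.write("e")                         # write address wraps around to 0
rest = [fifo.read(), fifo.read(), fifo.read()]
print(first_two, rest)
```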
19 The implementation results of Design-P and
Design-i
Resource cells give the number used and, in parentheses, the percentage of the device.
Design                          Maximum frequency   BRAMs     4-input LUTs   Slice flip-flops   Slices
Available on device             200 MHz             14        4704           4704               2352
Design-P (n = 512, Ls = 16)     113.766 MHz         1 (7%)    398 (8%)       401 (8%)           302 (12%)
Design-2i (n = 512, Ls = 16)    79.815 MHz          1 (7%)    619 (13%)      500 (10%)          459 (19%)
Design-P (n = 1024, Ls = 16)    104.308 MHz         2 (14%)   419 (8%)       408 (8%)           310 (13%)
Design-2i (n = 1024, Ls = 16)   79.700 MHz          2 (14%)   650 (13%)      511 (10%)          471 (20%)
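Each resource in the results is reported as a pair of numbers: a percentage and a count. The short check below shows how the two relate, assuming the unlabeled first row (14, 4704, 4704, 2352) gives the totals available on the device and that percentages are truncated to whole numbers; both assumptions reproduce the reported values for n = 512.

```python
# Assumed device totals, taken from the first (unlabeled) row of the results.
DEVICE_TOTAL = {"BRAMs": 14, "LUTs": 4704, "flip_flops": 4704, "slices": 2352}

# Resource counts reported for n = 512, Ls = 16.
DESIGN_P_512 = {"BRAMs": 1, "LUTs": 398, "flip_flops": 401, "slices": 302}
DESIGN_2I_512 = {"BRAMs": 1, "LUTs": 619, "flip_flops": 500, "slices": 459}


def utilization_percent(used):
    """Truncated percentage of the device consumed by each resource."""
    return {name: 100 * used[name] // DEVICE_TOTAL[name] for name in used}


print(utilization_percent(DESIGN_P_512))
print(utilization_percent(DESIGN_2I_512))
```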
20 I/O Interface of LZ Compression Chip
[Figure: LZ compression chip with ports Data input (8 bits), codeword (16 bits), Codeword ready, Control signals (6) and end]
21 Testing Application
22 Data Flow of Testing Application
[Figure: PC -> data stream -> LZ compression chip -> compressed data]
23 Decompression Architecture
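As a rough software illustration of decompression, codewords of the forms (1, Ip, Lmax) and (0, S) can be decoded by keeping the same sliding dictionary as the compressor. This sketch says nothing about the hardware architecture: the function name `lzss_decode` is hypothetical, Ip is assumed to be 1-based, matches are assumed to lie entirely inside a non-empty dictionary, and the initial dictionary contents are assumed to be known to the decoder.

```python
def lzss_decode(codewords, initial_dictionary):
    """Rebuild the symbol stream from (0, S) and (1, Ip, Lmax) codewords."""
    dictionary = list(initial_dictionary)
    n_dict = len(dictionary)
    output = []
    for codeword in codewords:
        if codeword[0] == 0:
            symbols = [codeword[1]]                 # literal symbol S
        else:
            _, ip, lmax = codeword
            start = ip - 1                          # assumed 1-based position
            symbols = dictionary[start:start + lmax]
        output.extend(symbols)
        # Slide the dictionary so it always holds the n_dict latest symbols.
        dictionary = (dictionary + symbols)[-n_dict:]
    return output, dictionary


decoded, final_dictionary = lzss_decode([(1, 2, 3)], "cbbacde")
print("".join(decoded), "".join(final_dictionary))
```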
24 The Compression Rate (Rc)
$R_c = \mathrm{clk} \times \dfrac{L_s \times w}{n - L_s + 1}$
- Example
- The dictionary size (n) = 1k
- Ls = 16
- w = 8
- clk = 104.308 MHz
Rc ≈ 13 Mbit per second
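The example can be reproduced numerically. The sketch assumes the formula reads Rc = clk x (Ls x w) / (n - Ls + 1), with n = 1024 for "1k"; this reading gives the 13 Mbit per second quoted above. The function name is hypothetical.

```python
def compression_rate_bits_per_s(clk_hz, n, ls, w):
    """Rc = clk * (Ls * w) / (n - Ls + 1), in bits per second."""
    return clk_hz * ls * w / (n - ls + 1)


rc = compression_rate_bits_per_s(clk_hz=104.308e6, n=1024, ls=16, w=8)
print(f"Rc = {rc / 1e6:.1f} Mbit/s")
```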
25 Software Simulation
Data Sets
- Calgary corpus
- Silesia corpus
26 Experiments on the Calgary corpus
27 Experiments on the Silesia corpus
28 Conclusions
- The proposed implementation is area- and speed-efficient.
The compression rate is increased by more than 40% and the
design area is decreased by more than 30%.
- The prototype is implemented on a Xilinx Spartan-II FPGA.
- The chip can be incorporated into real-time systems so that
data can be compressed and decompressed on the fly.
29 Future Work
- Studying the effect of combining the proposed
architecture for LZ data compression and elliptic
curve cryptography in a single chip.
- Studying fast string-matching techniques to
accelerate the compression process.
- By modifying the host controller and including,
e.g., dictionaries, the chip can be used for
other string-matching-based LZ algorithms, such
as LZ78 and LZW.
30 Thanks