Title: Overview
1 High Speed Energy Efficient Architecture for Finite Ridgelet Transform
Shrutisagar Chandrasekaran and Abbes Amira
Overview, September 2004
2 Outline
- Research Objectives
- Introduction
- Discrete Ridgelet Transform
- Finite Radon Transform
- Discrete Wavelet Transform
- FRIT Architecture
- FPGA Implementations and Results
- Conclusions
- Future Work and Acknowledgements
3 Research Objectives
- To evaluate and model the power consumption of FPGA-based designs at various levels of abstraction, and to develop and implement strategies for low-power, energy-efficient design
- To develop a high-level framework for FPGA-based implementation of matrix algorithms used in image and signal processing applications, such as the Ridgelet transform, matrix multiplication, SVD, DCT and DWT
- To efficiently implement the Finite Ridgelet Transform (FRIT) on FPGA using Handel-C, for satellite-based onboard image compression, within the ongoing framework development
4 Research Objectives
[Framework diagram; labels recovered from the figure:]
- Roles: Application User, System Architect
- Estimating performance measures (power, area, maximum frequency, etc.)
- Capturing platform features at a higher level
- Blocks: RLC, CSC, DWT
- Algorithms: MM, DCT (1D, 2D), FFT (1D, 2D), DWT (1D, 2D), FRAT (1D, 2D), FRIT (1D, 2D), SVD, QR
- Design formats: VHDL, Handel-C, Schematic, Hybrid, EDIF, Bitstream
- FPGA flow: Compilation, Implementation, Configuration, Reconfiguration
5 Introduction
- Discrete Wavelet Transforms (DWT) have become powerful tools in a wide range of applications, including:
  - Image/video compression (JPEG2000, MPEG-4)
  - Aerospace applications (data denoising, satellite/astronomical image compression and analysis)
  - Image/video enhancement and segmentation
  - Telecommunication
- The advantage of the DWT over existing transforms such as the Discrete Fourier Transform (DFT) and the Discrete Cosine Transform (DCT) is that the DWT performs a multiresolution analysis of a signal, with localization in both time and frequency
6 Introduction
- The wavelet transform has many limitations when it comes to representing straight lines and edges in images
- To overcome the weakness of wavelets in higher dimensions, Candès recently proposed the Ridgelet transform, which deals effectively with line singularities in 2-D
- However, the complexity of its implementation remains a heavy burden on standard microprocessors when large amounts of data have to be processed
- Therefore, VLSI/FPGA implementations of the Ridgelet transform are needed for real-time applications
7 The Finite Ridgelet Transform
- The FRIT provides a sparse representation for functions defined on the continuum plane
- The transform represents edges and other singularities along curves more efficiently
- The basic idea is to map a line singularity in the two-dimensional (2-D) domain into a point by means of the Radon transform
- A one-dimensional (1-D) wavelet transform is then applied to deal with the point singularity in the Radon domain
8 The Finite Ridgelet Transform
9 The Finite Ridgelet Transform
- The two fundamental building blocks of the FRIT are the FRAT and the DWT
- The FRAT pseudocode is mapped onto hardware after energy and speed optimisations, including parallelism and pipelining
- Experimental results in Matlab have shown that simple lower-order wavelets yield better compression (lower entropy) when transforming from the FRAT domain to the FRIT domain
- The Haar wavelet gives better results than the CDF2.2 and other higher-order wavelets in terms of minimising the entropy in the Ridgelet domain
10 The Finite Ridgelet Transform
- The Radon transform is able to transform two-dimensional images with lines into a domain of possible line parameters, where each line in the image gives a peak positioned at the corresponding line parameters
- Numerous discretisations of the Radon transform have been devised to approximate the continuous formulae
- However, most of them were not designed to be invertible transforms for digital images
- Alternatively, the Finite Radon Transform (FRAT) theory (meaning a transform for finite-length signals) originated
11 The Finite Radon Transform
- The FRAT is defined as summations of image pixels over a certain set of lines
- L_{k,l} denotes the set of points that make up a line on the lattice Z_p^2, where k indexes the direction and l the offset
- To compute the kth Radon projection, i.e., the kth row of the output array, all pixels of the original image are passed once, using p histogrammers (accumulators), one for every element in the row
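The defining equations did not survive on this slide. The block below is a reconstruction inferred from the FRAT pseudocode on the next slide, not the authors' original notation. The literature often writes the line set with the roles of i and j exchanged and includes a 1/sqrt(p) normalisation, which the pseudocode omits.

```latex
\[
\mathrm{FRAT}_f(k,l) \;=\; \sum_{(i,j)\,\in\,L_{k,l}} f(i,j),
\qquad 0 \le k \le p,\quad 0 \le l \le p-1,
\]
\[
L_{k,l} \;=\; \{\, (i,j) : i \equiv l + kj \pmod{p},\; j \in Z_p \,\},
\quad 0 \le k \le p-1,
\qquad
L_{p,l} \;=\; \{\, (i,l) : i \in Z_p \,\}.
\]
```

With this convention every pixel belongs to exactly one line per direction k, so each of the p + 1 projections sums to the total of the image block.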
12 The Finite Radon Transform
FRAT Pseudocode

    for k = 0:(p-1)
        n = k
        for j = 0:(p-1)
            n = n - k
            if n < 0
                n = n + p
            end
            l = n - 1
            for i = 0:(p-1)
                l = l + 1
                if l >= p
                    l = l - p
                end
                FRAT(k,l) = FRAT(k,l) + f(i,j)
            end
        end
    end
    for j = 0:(p-1)
        for i = 0:(p-1)
            FRAT(p,j) = FRAT(p,j) + f(i,j)
        end
    end
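The FRAT pseudocode above can be sketched in software as follows. This is a minimal Python rendering, not the authors' Handel-C design. The function name `frat` is hypothetical, and the wrap-around counters n and l of the pseudocode are replaced by the equivalent closed form l = (i - k*j) mod p.

```python
def frat(f, p):
    """Finite Radon transform of a p x p block f[i][j], p prime.

    Returns a (p + 1) x p array of unnormalised sums, following the
    pseudocode's indexing: projection k accumulates pixel (i, j) into
    bin l = (i - k*j) mod p, and projection p sums over i for each j.
    """
    out = [[0] * p for _ in range(p + 1)]
    for k in range(p):
        for j in range(p):
            for i in range(p):
                l = (i - k * j) % p  # closed form of the n/l counter updates
                out[k][l] += f[i][j]
    for j in range(p):
        for i in range(p):
            out[p][j] += f[i][j]
    return out


block = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
for row in frat(block, 3):
    print(row)
```

Each projection visits every pixel exactly once, so every output row sums to the block total (45 for this example). That makes a convenient sanity check for a hardware test bench.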
13 Discrete Wavelet Transform
- The work by Daubechies and Mallat led to the discrete filter-based interpretation of wavelets
- Wavelets can be implemented as a set of filter banks comprising a high-pass and a low-pass filter, each followed by down-sampling by two
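As an illustration of the filter-bank view, the sketch below applies a low-pass and a high-pass filter and keeps every second output. The Haar taps, the sliding dot-product form and the absence of boundary extension are assumptions made for brevity. The function name is hypothetical.

```python
import math


def analysis_filter_bank(x, lowpass, highpass):
    """One analysis stage: filter with each set of taps, then down-sample by two.

    Taps are applied as a sliding dot product over fully overlapping
    positions only, stepping by two samples (filtering and down-sampling
    fused into one loop).
    """
    def filter_and_downsample(taps):
        return [
            sum(taps[k] * x[n + k] for k in range(len(taps)))
            for n in range(0, len(x) - len(taps) + 1, 2)
        ]

    return filter_and_downsample(lowpass), filter_and_downsample(highpass)


# Orthonormal Haar taps (sign convention of the high-pass filter varies).
HAAR_LOW = [1 / math.sqrt(2), 1 / math.sqrt(2)]
HAAR_HIGH = [1 / math.sqrt(2), -1 / math.sqrt(2)]

approx, detail = analysis_filter_bank([4, 6, 10, 12], HAAR_LOW, HAAR_HIGH)
print(approx, detail)
```

With orthonormal taps the signal energy is preserved across the two sub-bands, which the check below the block relies on.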
14 Discrete Wavelet Transform
- Though it is the simplest wavelet, the Haar DWT gives the best performance in terms of entropy reduction
- The integer-to-integer lifting version of the Haar DWT is used to ensure that it is fully invertible
- The transform is performed in place to reduce the number and size of on-chip buffers
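A minimal sketch of an in-place integer-to-integer Haar lifting step (often called the S-transform) is shown below. The slides do not give the exact lifting equations used, so the predict/update form here is an assumption. So are the function names and the rule that an unpaired trailing sample is left unchanged.

```python
def haar_lift_forward(x):
    """One-level integer Haar lifting, in place.

    After the call, even positions hold approximations
    s = floor((a + b) / 2) and odd positions hold details d = b - a.
    """
    for i in range(0, len(x) - 1, 2):
        x[i + 1] -= x[i]       # predict: detail = odd - even
        x[i] += x[i + 1] >> 1  # update: even + floor(detail / 2)
    return x


def haar_lift_inverse(x):
    """Undo haar_lift_forward by running the lifting steps in reverse."""
    for i in range(0, len(x) - 1, 2):
        x[i] -= x[i + 1] >> 1
        x[i + 1] += x[i]
    return x


data = [5, 7, 3, 0, 9, 9, 2]
coeffs = haar_lift_forward(list(data))
print(coeffs)  # [6, 2, 1, -3, 9, 0, 2]
print(haar_lift_inverse(list(coeffs)) == data)  # True
```

Because each step only adds or subtracts an integer that the inverse can recompute exactly, reconstruction is lossless, and no buffer beyond the input vector itself is needed.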
15 FRIT Architecture
- Once the Radon and wavelet transforms have been implemented, the Ridgelet transform is straightforward
- Each output of the Radon projection, i.e., each row of the Radon-transformed image, is simply passed through the wavelet transform
- A dual output buffer configuration is used so that the FRAT and the DWT can be performed simultaneously on the chip
- The in-place lifting DWT is performed in the second output buffer, which contains the FRAT vectors
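The composition described above can be sketched in software as follows: compute the FRAT of a block, then run a 1-D Haar lifting step on each projection. This is a sequential illustration only and does not model the dual-buffer hardware. The function names, the single decomposition level and the handling of the odd-length rows (last sample passed through) are assumptions not stated in the slides.

```python
def frat(f, p):
    """Unnormalised finite Radon transform of a p x p block, p prime."""
    out = [[0] * p for _ in range(p + 1)]
    for k in range(p):
        for j in range(p):
            for i in range(p):
                out[k][(i - k * j) % p] += f[i][j]
    for j in range(p):
        for i in range(p):
            out[p][j] += f[i][j]
    return out


def haar_lift(row):
    """One-level in-place integer Haar lifting; unpaired last sample is kept."""
    for i in range(0, len(row) - 1, 2):
        row[i + 1] -= row[i]
        row[i] += row[i + 1] >> 1
    return row


def frit(f, p):
    """FRIT sketch: wavelet-transform each of the p + 1 FRAT projections."""
    return [haar_lift(row) for row in frat(f, p)]


block = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
for row in frit(block, 3):
    print(row)
```

In the architecture on the slide the two stages overlap in time through the double output buffer. Here they simply run one after the other.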
16 FRIT Architecture
- One input pixel is processed on each clock cycle
- No clock edges are wasted in buffering the input tile
- Fully pipelined input section
- The controller has (p + 1) counters which generate the address and read/write status of the output vectors
- Double-buffered output section to perform the DWT in parallel
17 FRIT Architecture
- The p + 1 FRAT vectors are decomposed in parallel, where p is the block size
- A lifting architecture is used to perform the 1-D Haar wavelet transform
- In-place decomposition is performed to reduce the internal buffer size

Core latency: p^2
Total latency including memory access: p^2 + p + 1
Input buffer size: -
Output buffer size: 2 x p^2
Can be pipelined: Yes
18 FRIT Architecture
19 FPGA Implementations and Results
- In order to verify the performance of the proposed architectures, designs have been prototyped on the Celoxica RC1000 board containing the Xilinx XCV2000E FPGA
- Available on-chip logic resources include:
  - Slices: 19,200
  - CLB array: 80 x 120
  - Block RAM: 655,360 bits
  - Distributed RAM: 614,400 bits
- The RC1000 has 4 memory banks which communicate with the host by means of DMA transfers
20 FPGA Implementations and Results
- The design has also been synthesised on the Radiation Hardened QPro Virtex-II FPGA, as it is the preferred Xilinx FPGA for deployment onboard satellites
- Industry-first radiation-hardened platform FPGA solution
- Guaranteed total ionizing dose to 200 krad(Si) and latch-up immune to LET > 160 MeV-cm^2/mg; SEU rate < 1.5E-6 upsets per device-day achievable with the recommended redundancy implementation
- Certified to the MIL-PRF-38535 standard
- Guaranteed over the full military temperature range (-55 °C to +125 °C)
21 FPGA Implementations and Results
Design Flow
22 FPGA Implementations and Results
- Handel-C adds constructs to ANSI-C to enable DK to directly implement hardware
- Fully synthesizable hardware programming language based on ANSI-C
- Implements a C algorithm directly in an optimized FPGA design, or outputs RTL from C

Handel-C additions for hardware:
Parallelism, Timing, Interfaces, Clocks, Macro pre-processor, RAM/ROM, Shared expression, Communications, Handel-C libraries, FP library, Bit manipulation

Majority of ANSI-C constructs supported by DK:
Control statements (if, switch, case, etc.), Integer arithmetic, Functions, Pointers, Basic types (structures, arrays, etc.), #define, #include

Software-only ANSI-C constructs:
Recursion, Side effects, Standard libraries, Malloc
23 FPGA Implementations and Results
FRAT Implementation
- An empirical study has shown that the choice of a block size p = 7 gives the best balance of power and performance for the FRAT
24 FPGA Implementations and Results
- Comparison of performance metrics of the FRAT sub-block with existing work

Architecture                  Max Throughput (FPS)   Max Energy per Frame (mJ)   Max Frequency (MHz)
Architecture 2 [1]            225                    1.84                        81.92
Proposed Architecture         317                    1.38                        96.18
Software Implementation [2]   0.037                  -                           -

[1] C. A. Rahman and W. Badawy, Architectures the Finite Radon Transform, IEE Electronics Letters, Vol. 40, No. 15, July 2004
[2] Implemented using Matlab on a 1.8 GHz Pentium 4 workstation equipped with 1 GB DDR RAM
25 FPGA Implementations and Results
- Various performance metrics of the FRIT implemented on the Virtex-E and the QPro Virtex-II FPGAs

Performance Metric        Virtex-E   QPro Virtex-II
Area Occupied (slices)    504        497
Max Frequency (MHz)       43.57      56.67
Max Power (mW)            301.65     196.91
Energy/Frame (mJ)         1.934      1.40
Max Throughput (FPS)      156        202
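The slides do not state how the energy-per-frame figures were derived. One plausible reading, offered here only as an assumption, is maximum power divided by maximum throughput. The short sketch below applies that relation to the Virtex-E column, and the function name is hypothetical.

```python
def energy_per_frame_mj(power_mw, throughput_fps):
    """Energy per frame in millijoules: mW divided by frames per second."""
    return power_mw / throughput_fps


# Virtex-E figures from the table: 301.65 mW at 156 frames per second.
virtex_e = energy_per_frame_mj(301.65, 156)
print(round(virtex_e, 3))  # 1.934
```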
26 FPGA Implementations and Results
- The FRIT achieves the best results in terms of reducing the entropy of the image
- This means that better compression can be achieved

Entropy   Source 1   Source 2   Source 3
Source    4.3430     4.9129     3.3197
FRAT      3.6730     4.2748     2.1200

          Source 1            Source 2            Source 3
Entropy   Haar     cdf2.2     Haar     cdf2.2     Haar     cdf2.2
DWT       3.5472   3.6836     3.8725   3.9136     2.7530   3.0387
FRIT      3.1115   3.3753     3.4554   3.6639     2.0525   2.4051
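The entropy figures above are presumably in bits per symbol. The slides do not say how they were computed, so the sketch below simply shows the standard first-order Shannon entropy of a list of integer coefficients, as one way such a figure could be obtained. The function name is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """First-order Shannon entropy, in bits per symbol, of a sequence."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(shannon_entropy([0, 0, 1, 1]))  # 1.0
print(shannon_entropy([0, 1, 2, 3]))  # 2.0
```

A transform that concentrates the coefficient histogram into fewer distinct values lowers this figure, which is why lower entropy is read as better compressibility.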
27 FPGA Implementations and Results
28 FPGA Implementations and Results
29 FPGA Implementations and Results
30 Conclusions
- The Ridgelet transform was recently introduced to overcome the weakness of wavelet transforms
- An architecture and its efficient FPGA implementation for the Finite Ridgelet Transform have been proposed
- The implementations have been carried out for different input image sources
- The implementation results show that the proposed implementation outperforms existing work in terms of both area and system speed
31 Future Work and Acknowledgments
- Develop a complete on-chip compression engine for satellite images
- Explore the effect of algorithmic, architectural and RTL-level optimisations to minimise power consumption

Acknowledgments
Celoxica (Mr. Roger Gook) and EPSRC for supporting this work