Title: Reconfigurable Computing
1 Reconfigurable Computing (EN2911X, Fall 07)
Lecture 09: RC Principles: Software (2/4)
Prof. Sherief Reda, Division of Engineering, Brown University
http://ic.engin.brown.edu
2 Behavioral code optimization
- Tree-height reduction applies to arithmetic expression trees. It splits an expression into two-operand expressions so that independent operations can be evaluated in parallel.
- The idea is to balance the expression tree as much as possible.
- If we have n operations, what is the best height that can be achieved?
- Example: x = a + b * c + d
[Figure: two expression trees computing x from the operands a, b, c, d, before and after height reduction]
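A minimal C sketch of the balancing idea, assuming the example expression is x = a + b * c + d (the operators are an assumption, since they do not survive in the slide text). The function names are illustrative. `min_tree_height` assumes the operations can be reordered freely, so that n two-operand operations over n + 1 operands fit in a tree of height ceil(log2(n + 1)).

```c
/* Serial evaluation: each operation waits for the previous one (height 3). */
int eval_serial(int a, int b, int c, int d) {
    int t1 = b * c;   /* level 1 */
    int t2 = a + t1;  /* level 2, depends on t1 */
    return t2 + d;    /* level 3, depends on t2 */
}

/* Balanced evaluation: b * c and a + d are independent (height 2). */
int eval_balanced(int a, int b, int c, int d) {
    int t1 = b * c;   /* level 1 */
    int t2 = a + d;   /* level 1, can run in parallel with t1 */
    return t1 + t2;   /* level 2 */
}

/* Smallest height h such that a binary tree of height h has at least
 * n + 1 leaves, i.e. 2^h >= n + 1. */
int min_tree_height(unsigned n) {
    int h = 0;
    unsigned long long leaves = 1;  /* a tree of height h has at most 2^h leaves */
    while (leaves < (unsigned long long)n + 1) {
        leaves *= 2;
        h++;
    }
    return h;
}
```

Both functions return the same value; only the dependency chain differs, which is what matters once each operation becomes a hardware unit.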
3 Tree-height reduction
Exploiting the distributive property at the
expense of adding an operation
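The slide's own example expression is not preserved, so the following C sketch uses a hypothetical one, x = a * (b * c * d + e), to illustrate the trade-off. Distributing a over the sum adds one multiplication but shortens the longest dependency chain from 4 to 3 levels.

```c
/* Factored form: 4 operations, height 4. */
int eval_factored(int a, int b, int c, int d, int e) {
    int t1 = b * c;   /* level 1 */
    int t2 = t1 * d;  /* level 2 */
    int t3 = t2 + e;  /* level 3 */
    return a * t3;    /* level 4 */
}

/* Distributed form a*b*c*d + a*e: 5 operations, height 3. */
int eval_distributed(int a, int b, int c, int d, int e) {
    int t1 = a * b;   /* level 1 */
    int t2 = c * d;   /* level 1 */
    int t3 = a * e;   /* level 1 */
    int t4 = t1 * t2; /* level 2 */
    return t4 + t3;   /* level 3 */
}
```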
4 Constant and variable propagation
- Constant propagation consists of detecting constant operands and pre-computing the value of the operation with that operand. The result might be a constant, which can in turn be propagated to other operations as input.
- Example: a = 0; b = a + 1; c = 2 * b
- Replaced by: a = 0; b = 1; c = 2
- Variable propagation consists of detecting copies of a variable and using the right-hand side in place of the left-hand side in the following references.
- Example: a = x; b = a + 1; c = 2 * a
- Replaced by: a = x; b = x + 1; c = 2 * x
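The two propagation examples can be sketched in C as before/after pairs. The operators (+ and *) are assumed from the usual textbook form of these examples, and the struct and function names are illustrative only.

```c
struct prop_result { int a, b, c; };

/* Constant propagation, before: b and c are computed from a. */
struct prop_result constprop_before(void) {
    struct prop_result r;
    r.a = 0;
    r.b = r.a + 1;
    r.c = 2 * r.b;
    return r;
}

/* After: every value is known at compile time, no operators remain. */
struct prop_result constprop_after(void) {
    struct prop_result r = { 0, 1, 2 };
    return r;
}

/* Variable propagation, before: b and c depend on the copy a. */
struct prop_result varprop_before(int x) {
    struct prop_result r;
    r.a = x;
    r.b = r.a + 1;
    r.c = 2 * r.a;
    return r;
}

/* After: b and c read x directly, so they no longer wait for the copy. */
struct prop_result varprop_after(int x) {
    struct prop_result r;
    r.a = x;
    r.b = x + 1;
    r.c = 2 * x;
    return r;
}
```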
5 CSE and DCE
- Common sub-expression elimination (CSE) avoids unnecessary computations.
- Example: a = x + y; b = a + 1; c = x + y
- Can be replaced by: a = x + y; b = a + 1; c = a
- Dead code elimination (DCE). Dead code consists of all operations that cannot be reached, or whose result is never referenced elsewhere.
- Example: a = x; b = x + 1; c = 2 * x
- The first assignment can be removed if a is never subsequently referenced.
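A C sketch of both transformations, assuming the operators in the examples are + and *. The names are illustrative. In the dead-code pair, returning b + c is only a device to make b and c live so that a is the sole dead value.

```c
struct cse_result { int a, b, c; };

/* Before CSE: x + y is computed twice. */
struct cse_result cse_before(int x, int y) {
    struct cse_result r;
    r.a = x + y;
    r.b = r.a + 1;
    r.c = x + y;   /* same sub-expression as r.a */
    return r;
}

/* After CSE: the second adder is replaced by a copy of a. */
struct cse_result cse_after(int x, int y) {
    struct cse_result r;
    r.a = x + y;
    r.b = r.a + 1;
    r.c = r.a;
    return r;
}

/* Before DCE: a is assigned but its value is never used. */
int dce_before(int x) {
    int a = x;     /* dead assignment */
    int b = x + 1;
    int c = 2 * x;
    (void)a;       /* only silences the unused-variable warning */
    return b + c;
}

/* After DCE: the dead assignment is gone, the result is unchanged. */
int dce_after(int x) {
    int b = x + 1;
    int c = 2 * x;
    return b + c;
}
```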
6 Operator strength reduction and code motion
- Operator strength reduction means reducing the cost of implementing an operator by using a weaker one (one that uses less hardware, or is simpler and faster).
- Example: a = x^2; b = 3 * x
- Replaced by: a = x * x; t = x << 1; b = x + t
- Code motion often applies to loop invariants, i.e., quantities that are computed inside an iterative construct but whose values do not change from iteration to iteration.
- Example: for (i = 1; i < a * b)
- Replaced by: t = a * b; for (i = 1; i < t)
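A short C sketch of both ideas. It uses unsigned operands because left-shifting a negative signed value is undefined in C. The loop bound a * b and the increment i++ are assumptions filled in to make the loop runnable, and the function names are illustrative.

```c
/* Strength reduction: 3 * x needs a multiplier ... */
unsigned triple_mul(unsigned x) {
    return 3u * x;
}

/* ... while x + (x << 1) needs only wiring (the shift) and an adder. */
unsigned triple_shift_add(unsigned x) {
    unsigned t = x << 1;  /* t = 2 * x */
    return x + t;
}

/* Code motion, before: a * b appears in the loop test, so it is
 * conceptually re-evaluated on every iteration. */
unsigned iterations_before(unsigned a, unsigned b) {
    unsigned count = 0;
    for (unsigned i = 1; i < a * b; i++)
        count++;
    return count;
}

/* After: the invariant product is computed once, outside the loop. */
unsigned iterations_after(unsigned a, unsigned b) {
    unsigned count = 0;
    unsigned t = a * b;
    for (unsigned i = 1; i < t; i++)
        count++;
    return count;
}
```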
7 Control-flow-based transformations
- Control-flow transformations are typically used to create more opportunities for data-flow transformations.
- Model expansion consists of locally flattening the model call hierarchy: the called model disappears, being absorbed into the calling one.
- A possible benefit is that the scope of some optimization techniques is enlarged, potentially yielding a better circuit.
- Example
- x a b y ab z func(x, y)
- where func(p, q) t q-ppq return t
- ? By expanding func, we get
- x a b y a b z a bab
- ? CSE x ab y ab z a-by
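Model expansion corresponds to function inlining in C. The sketch below uses a hypothetical callee, func(p, q) = q - p, which is not necessarily the function on the slide, along with assumed definitions x = a + b and y = a * b.

```c
/* Hypothetical callee, chosen for illustration only. */
static int func(int p, int q) {
    int t = q - p;
    return t;
}

/* Before expansion: z is computed through a call. */
int z_with_call(int a, int b) {
    int x = a + b;
    int y = a * b;
    return func(x, y);
}

/* After expansion: the body of func is substituted at the call site,
 * so later passes can optimize across the former call boundary. */
int z_expanded(int a, int b) {
    int x = a + b;
    int y = a * b;
    return y - x;
}
```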
8 Conditional expansion
- A conditional construct can always be transformed into a parallel construct with a test at the end.
- Conditional expansion can increase the performance of the circuit when the conditional clause depends on some late-arriving signal.
- However, it can preclude the possibility of hardware sharing.
- Example: if (C) then x = A else x = B. Expanded: compute A and B in parallel, then x = C ? A : B
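A C sketch of the expansion, where A and B stand for two arbitrary expressions (here p + q and p - q, chosen only for illustration). In hardware the expanded form maps to two units followed by a multiplexer driven by C, which is why the two branches can no longer share one unit.

```c
/* Sequential form: the branch is selected first, then one side is computed. */
int select_sequential(int c, int p, int q) {
    int x;
    if (c)
        x = p + q;   /* A */
    else
        x = p - q;   /* B */
    return x;
}

/* Expanded form: both sides are computed, the test only selects a result. */
int select_expanded(int c, int p, int q) {
    int A = p + q;   /* does not wait for c */
    int B = p - q;   /* does not wait for c */
    return c ? A : B;
}
```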
9 Loop expansion
- In loop expansion, or unrolling, a loop is replaced by as many instances of its body as the number of iterations. The benefit is in expanding the scope of other transformations.
- Example: x = 0; for (i = 1; i < 12; i++) x = x + a[i]
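A C sketch of full unrolling, assuming the loop body is x = x + a[i] for i from 1 to 11. Once the loop is gone, the eleven additions form a single expression to which tree-height reduction can be applied.

```c
/* Rolled form: one adder reused over 11 iterations. */
int sum_loop(const int a[12]) {
    int x = 0;
    for (int i = 1; i < 12; i++)
        x = x + a[i];
    return x;
}

/* Fully unrolled form: one instance of the body per iteration. */
int sum_unrolled(const int a[12]) {
    int x = 0;
    x = x + a[1];
    x = x + a[2];
    x = x + a[3];
    x = x + a[4];
    x = x + a[5];
    x = x + a[6];
    x = x + a[7];
    x = x + a[8];
    x = x + a[9];
    x = x + a[10];
    x = x + a[11];
    return x;
}
```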
10 Putting concepts to work: hardware acceleration using custom instructions
- We studied the concepts of HW/SW partitioning and code optimizations for high-level synthesis.
- We will apply these concepts with the help of the Nios II soft-core processor.
- Difference between soft and hard processors:
- A hard processor is one that is implemented as a dedicated, predefined (hardwired) block.
- As opposed to physically embedding a processor into the FPGA fabric, it is possible to configure a group of logic blocks to act as a soft processor.
- What are the advantages and disadvantages of each?
11 The Nios II soft processor
- 32-bit soft processor from Altera
- 82 instructions
- Up to 256 custom instructions
- Optional multiply and divide, depending on the flavor
- Comes in three flavors (numbers are for Cyclone II implementations):
- Economy: emphasizes minimum size; 700 LEs and 17 DMIPS
- Standard: balances performance and size; 1400 LEs and 54 DMIPS
- Fast: best performance; 1800 LEs and 92 DMIPS
12 Creating Nios-based systems using SOPC Builder and programming them using the Nios II IDE
[Figure: SOPC Builder and Nios II IDE]
13 Accelerating applications within the Nios II environment
[Figure: system block diagram. Labels: Nios II processor, custom instructions, Avalon bus, Avalon component, accelerator, peripherals, memory (SRAM or on-chip)]
14 Using custom instructions to accelerate applications
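From C, a custom instruction can be invoked through the `__builtin_custom_*` functions of the Nios II GCC port. The sketch below is only an illustration: the bit-reversal operation, the name `bitrev`, and the opcode index 0 are all assumptions, and in a real system the index would come from the generated system headers. A software fallback keeps the code runnable on a workstation.

```c
#include <stdint.h>

/* Hypothetical opcode index of the custom instruction (assumption). */
#define BITREV_CI_N 0

/* Software version: reverses the 32 bits of x one bit per iteration. */
uint32_t bitrev_sw(uint32_t x) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Uses the custom instruction when compiled for Nios II,
 * and the software loop everywhere else. */
uint32_t bitrev(uint32_t x) {
#if defined(__NIOS2__)
    /* int result, one int operand */
    return (uint32_t)__builtin_custom_ini(BITREV_CI_N, (int)x);
#else
    return bitrev_sw(x);
#endif
}
```

Bit reversal is a typical candidate: in hardware it is pure wiring, while in software it takes a loop of shifts and masks.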
15 HW assignment: accelerate C code for palindrome detection
- A palindrome is a sequence of units (e.g., a string) that reads the same in either direction.
- Examples:
- Racecar
- Dennis sinned
- 425524
- The HW is to write a C routine that detects whether a number is a palindrome, then use it in a C program that counts the palindromic numbers between 0 and 1 billion. The count can be computed statically, but the HW asks you to write a C program for the Nios II processor that computes the count using the routine you coded.
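One possible software-only starting point, as a sketch rather than a reference solution: reverse the decimal digits of the number and compare. The function names are illustrative, and the assignment does not prescribe this approach.

```c
#include <stdint.h>

/* Returns 1 if the decimal representation of n reads the same in both
 * directions. The reversed value is kept in 64 bits so that it cannot
 * overflow for any 32-bit input. */
int is_palindrome(uint32_t n) {
    uint64_t reversed = 0;
    uint32_t rest = n;
    while (rest > 0) {
        reversed = reversed * 10 + rest % 10;  /* append the last digit */
        rest /= 10;                            /* drop the last digit */
    }
    return reversed == n;
}

/* Counts the palindromes in the range [0, limit). */
uint32_t count_palindromes(uint32_t limit) {
    uint32_t count = 0;
    for (uint32_t n = 0; n < limit; n++) {
        if (is_palindrome(n))
            count++;
    }
    return count;
}
```

The divisions and modulo operations in the inner loop dominate the runtime, especially on a processor without a hardware divider, which makes the digit reversal a natural candidate for a custom instruction.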
16 HW assignment: accelerate C code for palindrome detection
- After you write your program, report the runtime and then accelerate the program using custom instructions designed using the reconfigurable logic.
- You are required to report the runtime before and after the acceleration. It would also be good to try your program on a general-purpose workstation and report the runtime.
- You have to report the count of palindromes you found together with the runtimes. Here are my runtimes:
- Optional: 2.4 GHz Xeon workstation: 355 seconds
- Required: Nios II (software only): 27400 seconds
- Required: Nios II (software + custom instructions): 105 seconds
- Grades: 15/20 if you get all parts working correctly; 16/20 if your runtime is between 500 and 1000 seconds; 17/20 if between 200 and 500; 18/20 if between 100 and 200; and 20/20 if your runtime is < 100 seconds.