Reconfigurable Computing - PowerPoint PPT Presentation

1
Reconfigurable Computing (EN2911X, Fall07)
Lecture 09: RC Principles: Software (2/4)
Prof. Sherief Reda, Division of Engineering, Brown University
http://ic.engin.brown.edu
2
Behavioral code optimizing
  • Tree-height reduction applies to arithmetic
    expression trees. It splits an expression into
    two-operand expressions so that independent ones
    can be evaluated in parallel
  • The idea is to attempt to balance the expression
    tree as much as possible
  • If we have n operations, what is the best height
    that can be achieved?
  • Example: x = a + b * c + d
[Figure: expression trees for x before and after tree-height reduction; leaves a, b, c, d]
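As a rough illustration of the question above: n two-operand operations combine n + 1 operands, so the tree cannot be shallower than ceil(log2(n + 1)) levels. The C sketch below assumes the slide's example reads x = a + b * c + d (the operators are a reconstruction), and all function names are hypothetical. It contrasts a chained evaluation of height 3 with a rebalanced one of height 2.

```c
/* Minimum height of an expression tree with n_ops two-operand operations.
   Such a tree has n_ops + 1 leaves, so we need the smallest h with
   2^h >= n_ops + 1, i.e. ceil(log2(n_ops + 1)). */
unsigned min_tree_height(unsigned n_ops)
{
    unsigned height = 0;
    unsigned long long leaves = 1;  /* leaves a tree of this height can hold */
    while (leaves < (unsigned long long)n_ops + 1) {
        leaves *= 2;
        height++;
    }
    return height;
}

/* Chained form: every operation waits for the previous one (height 3). */
int x_chained(int a, int b, int c, int d)
{
    int t1 = b * c;   /* level 1 */
    int t2 = a + t1;  /* level 2 */
    return t2 + d;    /* level 3 */
}

/* Rebalanced form: b * c and a + d do not depend on each other (height 2). */
int x_balanced(int a, int b, int c, int d)
{
    int t1 = b * c;   /* level 1 */
    int t2 = a + d;   /* level 1, could run in parallel with t1 in hardware */
    return t1 + t2;   /* level 2 */
}
```

Both functions return the same value for inputs that do not overflow; only the dependency depth differs, which is what a high-level synthesis tool would exploit.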
3
Tree-height reduction
Exploiting the distributive property at the
expense of adding an operation
  • x = a(bcd + e)
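A minimal sketch of this trade-off, assuming the example reads x = a(bcd + e) with the "+" reconstructed. The function names are hypothetical. The factored form needs 4 operations in a chain of 4 levels; distributing a gives 5 operations but only 3 levels.

```c
/* Factored form: 4 operations, height 4. */
int x_factored(int a, int b, int c, int d, int e)
{
    int t1 = b * c;   /* level 1 */
    int t2 = t1 * d;  /* level 2 */
    int t3 = t2 + e;  /* level 3 */
    return a * t3;    /* level 4 */
}

/* Distributed form: 5 operations (one extra multiply), height 3. */
int x_distributed(int a, int b, int c, int d, int e)
{
    int t1 = a * b;   /* level 1 */
    int t2 = c * d;   /* level 1 */
    int t3 = a * e;   /* level 1 */
    int t4 = t1 * t2; /* level 2 */
    return t4 + t3;   /* level 3 */
}
```

The three level-1 products are independent, so in hardware they can be computed at the same time at the cost of one more multiplier.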

4
Constant and variable propagation
  • Constant propagation consists of detecting
    constant operands and pre-computing the value of
    the operation with that operand. The result might
    be a constant, which can in turn be propagated to
    other operations as input
  • Example
  • a = 0; b = a + 1; c = 2 * b
  • Replaced by → a = 0; b = 1; c = 2
  • Variable propagation consists of detecting the
    copies of a variable and using the right-hand
    side in the following references in place of the
    left-hand side
  • Example
  • a = x; b = a + 1; c = 2 * a
  • Replaced by → a = x; b = x + 1; c = 2 * x
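The two propagation steps can be sketched in C as before/after pairs. This assumes the examples read a = 0; b = a + 1; c = 2 * b and a = x; b = a + 1; c = 2 * a; the function names are hypothetical.

```c
/* Constant propagation, before: b and c are computed at run time. */
int const_before(void)
{
    int a = 0;
    int b = a + 1;
    int c = 2 * b;
    return c;
}

/* Constant propagation, after: every value is known at compile time. */
int const_after(void)
{
    int c = 2;  /* a = 0 gives b = 1, which gives c = 2 */
    return c;
}

/* Variable propagation, before: b and c read the copy a. */
int var_before(int x)
{
    int a = x;
    int b = a + 1;
    int c = 2 * a;
    return b + c;
}

/* Variable propagation, after: b and c read x directly,
   so they no longer depend on the copy. */
int var_after(int x)
{
    int b = x + 1;
    int c = 2 * x;
    return b + c;
}
```

After variable propagation the copy a = x is often left without readers, which is what makes it a candidate for dead code elimination.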

5
CSE and DCE
  • Common Sub-expression Elimination (CSE) avoids
    unnecessary computations.
  • Example
  • a = x + y; b = a + 1; c = x + y
  • Can be replaced by → a = x + y; b = a + 1; c = a
  • Dead code elimination (DCE). Dead code consists
    of all operations that cannot be reached, or
    whose result is never referenced elsewhere.
  • Example
  • a = x; b = x + 1; c = 2 * x
  • The first assignment can be removed if it is
    never subsequently referenced
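A small C sketch of both transformations, assuming the examples read a = x + y; b = a + 1; c = x + y and a = x; b = x + 1; c = 2 * x. The function names are hypothetical.

```c
/* CSE, before: x + y is computed twice. */
int cse_before(int x, int y)
{
    int a = x + y;
    int b = a + 1;
    int c = x + y;  /* same sub-expression as a */
    return a + b + c;
}

/* CSE, after: the second adder is gone, c reuses a. */
int cse_after(int x, int y)
{
    int a = x + y;
    int b = a + 1;
    int c = a;
    return a + b + c;
}

/* DCE, before: a is assigned but never used in the result. */
int dce_before(int x)
{
    int a = x;      /* dead: no later operation reads a */
    int b = x + 1;
    int c = 2 * x;
    (void)a;        /* only silences the unused-variable warning */
    return b + c;
}

/* DCE, after: the dead assignment is removed. */
int dce_after(int x)
{
    int b = x + 1;
    int c = 2 * x;
    return b + c;
}
```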

6
Operator strength reduction & code motion
  • Operator strength reduction means reducing the
    cost of implementing an operator by using a
    weaker one (one that uses less hardware, or is
    simpler and faster)
  • Example
  • a = x^2; b = 3 * x
  • Replaced by → a = x * x; t = x << 1; b = x + t
  • Code motion often applies to loop invariants,
    i.e., quantities that are computed inside an
    iterative construct but whose values do not
    change from iteration to iteration.
  • Example
  • for (i = 1; i <= a * b)
  • Replaced by → t = a * b; for (i = 1; i <= t)
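A minimal C sketch of both ideas, under the assumption that the examples read b = 3 * x and a loop bound of a * b. Unsigned operands are used so that the left shift is well defined; the function names and the iteration-counting loop body are hypothetical.

```c
/* Strength reduction, before: a general multiplier. */
unsigned triple_mul(unsigned x)
{
    return 3 * x;
}

/* Strength reduction, after: a shift and an adder replace the multiplier. */
unsigned triple_shift_add(unsigned x)
{
    unsigned t = x << 1;  /* 2 * x */
    return x + t;         /* x + 2 * x */
}

/* Code motion, before: a * b is re-evaluated on every loop test. */
unsigned iterations_naive(unsigned a, unsigned b)
{
    unsigned i, n = 0;
    for (i = 1; i <= a * b; i++)
        n++;
    return n;
}

/* Code motion, after: the loop-invariant product is computed once. */
unsigned iterations_hoisted(unsigned a, unsigned b)
{
    unsigned i, n = 0;
    unsigned t = a * b;
    for (i = 1; i <= t; i++)
        n++;
    return n;
}
```

An optimizing C compiler will often do both rewrites on its own; the point here is what the transformed data path looks like when it is mapped to hardware.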

7
Control-flow-based transformations
  • Control-flow transformations are typically
    utilized to create more opportunities for
    data-flow transformations to be exercised
  • Model expansion consists of locally flattening
    the model call hierarchy, so the called model
    disappears, absorbed into the calling one.
  • A possible benefit is that the scope of
    application of some optimization techniques is
    enlarged, potentially yielding a better circuit
  • Example
  • x a b y ab z func(x, y)
  • where func(p, q) t q-ppq return t
  • ? By expanding func, we get
  • x a b y a b z a bab
  • ? CSE x ab y ab z a-by
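To make the idea concrete, here is a small C sketch of model expansion (inlining). It uses a hypothetical helper, diff, chosen for illustration rather than taken from the slide, and hypothetical caller names.

```c
/* Hypothetical called model: returns q - p. */
static int diff(int p, int q)
{
    int t = q - p;
    return t;
}

/* Before expansion: the caller keeps a separate call to diff. */
int z_with_call(int a, int b)
{
    int x = a + b;
    int y = a * b;
    return diff(x, y);
}

/* After expansion: the body of diff is substituted into the caller,
   so x, y and the subtraction sit in one scope where transformations
   such as CSE or tree-height reduction can see all of them. */
int z_expanded(int a, int b)
{
    int x = a + b;
    int y = a * b;
    return y - x;
}
```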

8
Conditional expansion
  • A conditional construct can always be transformed
    into a parallel construct with a test at the end.
  • Conditional expansion can increase the
    performance of the circuit when the conditional
    clause depends on some late-arriving signal.
  • However, it can preclude the possibility of
    hardware sharing
  • if (C) then x = A else x = B → compute A and B in
    parallel, x = C ? A : B
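A rough C sketch of conditional expansion follows. The expressions p + q and p - q stand in for A and B and are assumptions, as are the function names. In software both versions run sequentially; the second form simply mirrors the hardware structure of two parallel units feeding a multiplexer.

```c
/* Before: the condition decides which expression is evaluated. */
int x_branch(int c, int p, int q)
{
    int x;
    if (c)
        x = p + q;  /* A */
    else
        x = p - q;  /* B */
    return x;
}

/* After: A and B are both computed, then the test selects one.
   In hardware the adder and subtractor work in parallel and a late
   arriving c only drives the final multiplexer. */
int x_expanded(int c, int p, int q)
{
    int A = p + q;
    int B = p - q;
    return c ? A : B;
}
```

The cost noted on the slide is visible here: because A and B are live at the same time, the adder and the subtractor cannot share one functional unit.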

9
Loop expansion
  • In loop expansion, or unrolling, a loop is
    replaced by as many instances of the body as the
    number of iterations. The benefit is in expanding
    the scope of other transformations
  • Example
  • x = 0; for (i = 1; i <= 12; i++) x = x + a[i]
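A minimal sketch of full unrolling in C, assuming the slide's loop sums a[1] through a[12] (index 0 is unused). The function names are hypothetical, and the pairwise grouping in the unrolled version is one possible follow-up, not something the slide prescribes.

```c
/* Rolled form: 12 dependent additions, one per iteration. */
int sum_rolled(const int a[13])
{
    int i, x = 0;
    for (i = 1; i <= 12; i++)
        x = x + a[i];
    return x;
}

/* Fully unrolled form: the loop control disappears, and the additions
   can then be regrouped (tree-height reduction) into independent pairs. */
int sum_unrolled(const int a[13])
{
    int s1 = a[1] + a[2];
    int s2 = a[3] + a[4];
    int s3 = a[5] + a[6];
    int s4 = a[7] + a[8];
    int s5 = a[9] + a[10];
    int s6 = a[11] + a[12];
    return ((s1 + s2) + (s3 + s4)) + (s5 + s6);
}
```

The unrolled version uses 11 additions arranged in 4 levels instead of 12 sequential ones, which illustrates how unrolling widens the scope for the other transformations.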
10
Putting concepts to work: hardware acceleration
using custom instructions
  • We studied the concepts of HW/SW partitioning and
    code optimizations for high-level synthesis
  • We will apply these concepts with the help of the
    Nios-II soft core processor
  • Difference between Soft and Hard processors
  • A hard processor is one that is implemented as a
    dedicated, predefined (hardwired) block
  • As opposed to physically embedding a processor
    into the FPGA fabric, it is possible to configure
    a group of logic blocks to act as a soft
    processor
  • What are the advantages and disadvantages of each?

11
The Nios II soft processor
  • 32-bit soft processor from Altera
  • 82 instructions
  • Up to 256 custom instructions
  • Optional multiply and divide depending on the
    flavor
  • Comes in three flavors (numbers are for Cyclone II
    implementations)
  • Economy: emphasizes minimum size; 700 LEs and 17
    DMIPS
  • Standard: performance/size balance; 1400 LEs and
    54 DMIPS
  • Fast: best performance; 1800 LEs and 92 DMIPS

12
Creating Nios-based systems using SOPC Builder and
programming them using the IDE
[Figure labels: SOPC Builder, Nios II IDE]
13
Accelerating applications within the Nios II
environment
[Figure: system block diagram; labels: custom instructions, Avalon component, accelerator peripherals, Avalon bus, Nios II processor, memory (SRAM or on-chip)]
14
Using custom instructions to accelerate
applications
15
HW assignment: Accelerate C code for palindrome
detection
  • A palindrome is a sequence of units (a string)
    that has the property of reading the same in
    either direction
  • Examples
  • Racecar
  • Dennis sinned
  • 425524
  • The HW is to write a C routine that detects
    whether a number is a palindrome, then use it in
    a C program that counts the numeric palindromes
    between 0 and 1 billion. The count can be
    computed statically, but the HW asks you to
    write a C program for the Nios II processor that
    computes the count using the routine you coded
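One possible software-only starting point is sketched below in portable C. It is an assumption about how the routine could look, not the course's reference solution: the names is_palindrome and count_palindromes are hypothetical, and the check reverses the decimal digits and compares the result with the original number.

```c
#include <stdint.h>

/* Returns 1 if the decimal digits of n read the same in both directions.
   The reversed value is kept in 64 bits because reversing a 10-digit
   32-bit number can exceed the 32-bit range. */
int is_palindrome(uint32_t n)
{
    uint64_t reversed = 0;
    uint32_t rest = n;
    while (rest > 0) {
        reversed = reversed * 10 + rest % 10;  /* append the last digit */
        rest /= 10;                            /* drop the last digit */
    }
    return reversed == n;
}

/* Counts the palindromes in the half-open range [0, limit). */
uint32_t count_palindromes(uint32_t limit)
{
    uint32_t n, count = 0;
    for (n = 0; n < limit; n++)
        count += (uint32_t)is_palindrome(n);
    return count;
}
```

Calling count_palindromes(1000000000u) covers 0 up to, but not including, 1 billion; since 1,000,000,000 is not itself a palindrome, including it would not change the count. The repeated division and modulo by 10 dominate this routine, which makes the digit reversal a natural candidate for a custom instruction.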

16
HW assignment: Accelerate C code for palindrome
detection
  • After you write your program, report the runtime
    and then accelerate the program using custom
    instructions designed using the reconfigurable
    logic
  • You are required to report the runtime before and
    after the acceleration. It would also be good to
    try your program on a general-purpose workstation
    and report the runtime.
  • You have to report the count of palindromes you
    found together with the runtimes. Here are my
    runtimes.
  • Optional: 2.4 GHz Xeon workstation: 355 seconds
  • Required: Nios II (just software): 27400 seconds
  • Required: Nios II (software + custom
    instructions): 105 seconds
  • Grades: 15/20 if you get all parts working
    correctly, 16/20 if your runtime is between
    500-1000 seconds, 17/20 if your runtime is
    between 200-500, 18/20 if your runtime is 100-200,
    and 20/20 if your runtime is < 100
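For the workstation measurement, a timing harness along these lines may help. It is a sketch that uses the standard C clock() function, which reports CPU time on a hosted system; the Nios II target may need a different timing facility. The names time_call, speedup and sum_to are hypothetical, and sum_to is only a stand-in workload.

```c
#include <time.h>

/* Stand-in workload: sum of 1..n. Replace with the palindrome counter. */
unsigned long sum_to(unsigned long n)
{
    unsigned long i, s = 0;
    for (i = 1; i <= n; i++)
        s += i;
    return s;
}

/* Runs work(arg) once, stores its result, and returns the CPU time
   it took in seconds. */
double time_call(unsigned long (*work)(unsigned long), unsigned long arg,
                 unsigned long *result)
{
    clock_t start = clock();
    *result = work(arg);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Speedup of the accelerated run relative to the baseline run. */
double speedup(double before_seconds, double after_seconds)
{
    return before_seconds / after_seconds;
}
```

With the runtimes quoted above, speedup(27400.0, 105.0) is roughly 261, and speedup(355.0, 105.0) is roughly 3.4 relative to the Xeon workstation.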