Energy Efficient Hardware Synthesis of Polynomial Expressions, 18th International Conference on VLSI Design

1
Energy Efficient Hardware Synthesis of Polynomial
Expressions, 18th International Conference on
VLSI Design
Anup Hosangadi, Ryan Kastner (ECE Department, UCSB)
Farzan Fallah (Advanced CAD Research, Fujitsu Labs
of America)
2
Outline
  • Introduction
  • Related Work
  • Problem formulation
  • Algorithms for optimizing polynomials
  • Experimental results
  • Conclusions

3
Introduction
  • Embedded system applications need to compute
    polynomial expressions
  • Continuous functions can be approximated by
    Taylor Series
  • Adaptive (polynomial) filters
  • Polynomial interpolation/extrapolation
    in Computer Graphics
  • Encryption

4
Introduction
  • Commonly occurring computations implemented in
    hardware
  • More flexibility than processor architecture
  • NPAs (hardware accelerators) in PICO project
  • Custom Instructions (Tensilica)
  • Up to 100 times improvement over processor
    implementation (Kastner et al., TODAES02)
  • Develop techniques for reducing power consumption

5
Related Work (Behavioral transforms)
  • Power consumption depends on many factors
  • Reducing number of operations
  • Hardware (Nguyen and Chatterjee, TVLSI00)
  • Software (I. Hong et al., TODAES99)
  • Voltage reduction after speedup transformations
  • Retiming, Pipelining, Algebraic restructuring
  • (Chandrakasan et al., TCAD95)

6
Related Work
  • Scheduling and resource allocation
  • Shutting down unused resources (Monteiro et al.,
    DAC 96)
  • Allocation of registers, functional units and
    interconnects (A. Raghunathan et al., ICCD94)
  • Multiple Vdd scheduling
  • Assigning supply voltage to each operation in
    CDFG (M. Chang and M. Pedram, TVLSI97)

7
Related Work
  • Switching power is proportional to number of
    operations
  • Multiplications are expensive in embedded systems
  • On average, 40 times more power than addition at 5 V
    (V. Krishna et al., VLSI Design 1999)
  • Careful optimization of expressions is therefore
    necessary to save power
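
To make the 40:1 ratio concrete, the sketch below weights operation counts by it. This is only an illustrative model: `weighted_cost` is a hypothetical helper, the unit is "the energy of one addition", and the counts plugged in are the ones this deck reports for its motivating example (16 multiplications and 4 additions/subtractions before optimization, 7 and 3 after).

```python
MULT_TO_ADD_ENERGY = 40  # ratio quoted on this slide (average, at 5 V)


def weighted_cost(mults, adds, m=MULT_TO_ADD_ENERGY):
    """Rough energy estimate in units of one addition (illustrative model only)."""
    return m * mults + adds


before = weighted_cost(16, 4)  # unoptimized motivating example
after = weighted_cost(7, 3)    # after the algebraic technique
print(before, after)
```

Under this model the additions barely register, which is why the rest of the deck concentrates on removing multiplications.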

8
Reducing operations in polynomial expressions
  • No good tool for polynomials
  • Designers rely on hand optimized libraries
  • Conventional compiler techniques (CSE and value
    numbering) are not suited to polynomials
  • Horner form is the most popular representation
  • a_n x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0 =
    (((a_n x + a_{n-1})x + a_{n-2})x + ... + a_1)x + a_0
  • Not good for multivariate polynomials
  • Only a single polynomial expression at a time
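
For reference, Horner evaluation of a single-variable polynomial can be sketched as follows. The function name `horner` and the coefficient ordering (highest degree first) are choices made for this example, not something the slides specify.

```python
def horner(coeffs, x):
    """Evaluate a_n x^n + ... + a_1 x + a_0 with coeffs = [a_n, ..., a_0].

    Returns (value, number_of_multiplications).
    """
    value = coeffs[0]
    mults = 0
    for a in coeffs[1:]:
        value = value * x + a  # one multiplication and one addition per step
        mults += 1
    return value, mults


# 2x^3 - 3x^2 + 5 evaluated at x = 4
value, mults = horner([2, -3, 0, 5], 4)
print(value, mults)
```

A degree-n polynomial needs n multiplications this way, but the scheme works on one variable and one polynomial at a time, which is the limitation the last two bullets point out.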

9
Comparison with Horner form
  • Quartic-spline polynomial (3-D graphics)
  • P = zu^4 + 4avu^3 + 6bu^2v^2 + 4uv^3w + qv^4
  • Horner form (from Maple™)
  • P = zu^4 + (4au^3 + (6bu^2 + (4uw + qv)v)v)v
  • (17 multiplications)
  • Proposed algebraic method
  • d1 = v^2, d2 = 4v
  • P = u^3(uz + a d2) + d1(q d1 + u(w d2 + 6bu))
  • (11 multiplications)
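
The three forms of the quartic-spline polynomial can be checked against each other numerically. The sketch below assumes d1 = v^2 and d2 = 4v, which is the definition of d2 under which the factored form expands back to P; the function names are hypothetical and the check uses random integer samples rather than a symbolic proof.

```python
import random


def p_expanded(u, v, z, a, b, w, q):
    return z*u**4 + 4*a*v*u**3 + 6*b*u**2*v**2 + 4*u*v**3*w + q*v**4


def p_horner(u, v, z, a, b, w, q):
    return z*u**4 + (4*a*u**3 + (6*b*u**2 + (4*u*w + q*v)*v)*v)*v


def p_factored(u, v, z, a, b, w, q):
    d1 = v*v
    d2 = 4*v  # assumption: the value of d2 that makes the form expand to P
    return u**3*(u*z + a*d2) + d1*(q*d1 + u*(w*d2 + 6*b*u))


random.seed(0)
samples = [[random.randint(-9, 9) for _ in range(7)] for _ in range(100)]
all_agree = all(
    p_expanded(*s) == p_horner(*s) == p_factored(*s) for s in samples
)
print(all_agree)
```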

10
Related Work (Polynomial Expressions)
  • Expression Factorization (M.A. Breuer, JACM69)
  • Allows only one kind of operator at a time
  • Using Symbolic Algebra (M.A. Peymandoust, De
    Micheli)
  • Mapping polynomial datapaths to libraries
    (DAC01)
  • Low power embedded software (DATE02)
  • Results depend heavily on set of library elements
  • e.g. (a^2 - b^2) = (a + b)(a - b) iff (a + b) or (a - b)
    is a library element
  • Manipulates only a single expression at a time

F1 = A B C D    F2 = A P D
=> Extract (A D)
11
Motivating Example
  • Consider set of expressions
    (16 multiplications and 4 additions/subtractions)
  • Using CSE
    (12 multiplications and 4 additions/subtractions)
12
Motivational Example
  • Using Horner transform
    (12 multiplications and 4 additions/subtractions)
  • Using our algebraic technique
    (7 multiplications and 3 additions/subtractions)
13
Introduction to algebraic technique for
redundancy elimination
  • Algebraic techniques in multi-level logic
    synthesis (MLLS)
  • Decomposition, factoring reduce number of
    literals
  • Distill and Condense use Rectangle Covering
    methods
  • Polynomial Expressions (Our Technique)
  • Factoring and single term common subexpressions
    reduce the number of multiplications
  • Multiple term common subexpressions reduce the
    number of additions and possibly multiplications
  • Key Differences (Generalization to handle higher
    orders)
  • Kernelling techniques
  • Finding single cube intersections

14
Introduction to our technique (Outline)
  • Find a subset of all possible subexpressions
    (kernel generation)
  • Transformation of Polynomial Expressions
  • Problem formulation
  • Extract multiple term common subexpressions and
    factors
  • Extract single term common factors

15
Introduction to our technique
  • Terminology
  • Literal: a variable or a constant, e.g. a, b, 2, 3.14
  • Cube: product of literals, e.g. 3a^2b, -2a^3b^2c
  • SOP: sum of cubes, e.g. 3a^2b - 2a^3b^2c
  • Cube-free expression: no literal or cube can
    divide all the cubes of the expression
  • Kernel: a cube-free sub-expression of an
    expression, e.g. 3 - 2abc
  • Co-kernel: a cube that is used to divide an
    expression to get a kernel, e.g. a^2b
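
The terminology can be exercised on the slide's own example. The sketch below assumes the SOP is 3a^2b - 2a^3b^2c (the sign between the two cubes is taken from the cube examples), stores each term as an exponent tuple over (a, b, c) mapped to its coefficient, and keeps constants as coefficients rather than literals for brevity. `divide_by_cube` and `is_cube_free` are hypothetical helper names.

```python
VARS = ("a", "b", "c")


def divide_by_cube(expr, cube):
    """Divide every term of expr by cube; None if some term is not divisible."""
    result = {}
    for exps, coef in expr.items():
        if any(e < c for e, c in zip(exps, cube)):
            return None
        result[tuple(e - c for e, c in zip(exps, cube))] = coef
    return result


def is_cube_free(expr):
    """True if no variable divides every term (constant factors ignored here)."""
    return all(min(exps[k] for exps in expr) == 0 for k in range(len(VARS)))


sop = {(2, 1, 0): 3, (3, 2, 1): -2}      # 3a^2b - 2a^3b^2c
co_kernel = (2, 1, 0)                    # a^2b
kernel = divide_by_cube(sop, co_kernel)  # 3 - 2abc
print(kernel, is_cube_free(kernel), is_cube_free(sop))
```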

16
Introduction to our Technique
  • Matrix Representation of Polynomial Expressions
  • F = x^3y + xy^2z is represented by
        x  y  z
        3  1  0
        1  2  1
  • Each row represents a product term
  • Each column represents a variable/constant
  • Each element (i,j) represents power of variable j
    in term i
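
A minimal sketch of building this matrix in code, assuming F = x^3y + xy^2z with both coefficients equal to 1 (so no constant column is needed). The helper name `to_matrix` and the dict-per-term input format are choices made for this example.

```python
def to_matrix(terms, variables):
    """terms: list of {variable: power} dicts, one per product term."""
    return [[term.get(v, 0) for v in variables] for term in terms]


F = [{"x": 3, "y": 1}, {"x": 1, "y": 2, "z": 1}]  # x^3y + xy^2z
matrix = to_matrix(F, ["x", "y", "z"])
for row in matrix:
    print(row)
```

Unlike the 0/1 matrices of logic synthesis, entries here are exponents, which is what lets the later algorithms handle higher-order terms.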

17
Generation of Kernels (example)
  • P1 = x^3y + x^2y^2z, L = {x, y, z}
  • Divide by x
  • Ft = P1/x = x^2y + xy^2z

18
Generation of Kernels (example)
  • Ft = P1/x = x^2y + xy^2z
  • C = biggest cube dividing all cubes of Ft
  • C has exponent row (1 1 0) over (x, y, z), i.e. C = xy

19
Generation of Kernels (example)
  • Obtain Kernel
  • F1 = Ft/C = (x^2y + xy^2z)/(xy) = (x + yz)
  • Obtain Co-Kernel
  • D1 = x(xy) = x^2y
  • No kernels within F1. Go back to P1
  • P1 = x^3y + x^2y^2z
  • Divide now by next variable y
  • Ft = x^3 + x^2yz
  • C = x^2
  • But (x < y) ∈ C, i.e. C contains x, which precedes y in L
  • Stop here, to avoid repeating the same kernel Ft/C =
    (x + yz)
  • No more kernels extracted
  • Record kernel F1 = P1 with co-kernel 1
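
The walk-through above can be sketched as a short recursive routine. This is a reconstruction from the steps on these slides, not the authors' code: coefficients are ignored, terms are exponent tuples over the ordered literal list L = (x, y, z), and the names `kernels`, `divide`, and `largest_common_cube` are hypothetical. The final step on the slide (recording P1 itself with co-kernel 1) is left to the caller.

```python
VARS = ("x", "y", "z")  # ordered literal list L


def largest_common_cube(terms):
    """Biggest cube dividing all terms: column-wise minimum exponent."""
    return tuple(min(t[k] for t in terms) for k in range(len(VARS)))


def divide(terms, cube):
    """Divide each term by cube (callers guarantee divisibility)."""
    return [tuple(e - c for e, c in zip(t, cube)) for t in terms]


def kernels(terms, start=0, cokernel=(0, 0, 0), out=None):
    """Collect (co-kernel, kernel) pairs following the steps on the slides."""
    if out is None:
        out = []
    for j in range(start, len(VARS)):
        with_j = [t for t in terms if t[j] > 0]
        if len(with_j) < 2:
            continue  # literal j cannot lead to a multi-term kernel
        unit = tuple(1 if k == j else 0 for k in range(len(VARS)))
        ft = divide(with_j, unit)
        c = largest_common_cube(ft)
        if any(c[k] > 0 for k in range(j)):
            continue  # C contains an earlier literal: kernel already found
        f1 = divide(ft, c)
        d1 = tuple(a + b + u for a, b, u in zip(cokernel, c, unit))
        out.append((d1, f1))
        kernels(f1, j, d1, out)  # look for kernels within F1
    return out


P1 = [(3, 1, 0), (2, 2, 1)]  # x^3y + x^2y^2z
found = kernels(P1)
print(found)
```

For P1 this yields the single pair (x^2y, x + yz): dividing by y is skipped because C = x^2 contains x, exactly as in the example.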

20
Concept of kernels and co-kernels
  • Theorem: Two expressions f and g can have a
    multiple term common subexpression iff there are
    2 kernels Kf and Kg having a multiple term
    intersection
  • Detection of multiple term common subexpressions
    by intersection of sets of kernels
  • Each co-kernel/kernel pair represents a
    possible factorization
  • e.g. x^3y + x^2y^2z = x^2y(x + yz)
  • Set of kernels: a subset of all possible
    subexpressions
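
The intersection test in the theorem can be illustrated with kernels stored as sets of cubes. In this sketch, cubes are exponent tuples over (x, y, z); the kernel x + yz of P1 comes from the example above, the same kernel for P2 is assumed to come from its 4x + 4yz part with co-kernel 4, and `kernels_other` is a made-up third expression included only to show a rejected single-term overlap.

```python
X, YZ, Z = (1, 0, 0), (0, 1, 1), (0, 0, 1)


def multi_term_intersections(kernels_f, kernels_g):
    """Every intersection of a kernel of f with a kernel of g having 2+ cubes."""
    found = set()
    for kf in kernels_f:
        for kg in kernels_g:
            shared = frozenset(kf) & frozenset(kg)
            if len(shared) >= 2:
                found.add(shared)
    return found


kernels_p1 = [{X, YZ}]     # x + yz, co-kernel x^2y
kernels_p2 = [{X, YZ}]     # x + yz, co-kernel 4 (assumed)
kernels_other = [{X, Z}]   # hypothetical kernel x + z

shared_12 = multi_term_intersections(kernels_p1, kernels_p2)
shared_1o = multi_term_intersections(kernels_p1, kernels_other)
print(shared_12, shared_1o)
```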

21
All Kernels and Co Kernels
Which kernels to choose?
22
Kernel Cube Matrix (KCM)
  • One row for each Kernel generated
  • One column for each distinct kernel cube
  • Each non-zero element represents a term

(KCM figure; annotated term: x^3y)
23
Finding Kernel Intersections (Distill Algorithm)
  • Each kernel intersection or factor appears as a
    rectangle
  • Rectangle: set of rows and columns such that all
    elements are 1
  • Value of a rectangle: weighted sum of the energy
    savings of the different operations
  • Goal: maximum valued rectangular covering of KCM
  • Greedy heuristic covering by prime rectangles

24
Modeling value function of a rectangle
  • Formula for weighted sum of energy savings on
    selection of a rectangle
  • R = # of rows, C = # of columns
  • M(Ri) = # of multiplications in row
    (co-kernel) i
  • M(Ci) = # of multiplications in column
    (kernel-cube) i
  • m = ratio of average energy consumption of
    multiplication to addition in the target
    library

Value = (formula not preserved in this transcript)
25
Distill Algorithm
26
Distill Algorithm
4xy - x^2y = xy d2, d2 = 4 - x
Saves 2 multiplications, Value = 80
Remove covered terms
27
Distill Algorithm
  • Distill algorithm exits after no more kernel
    intersections can be found

P1 = x^2y d1      d1 = x + yz
P2 = 4d1 ± xyz    d2 = 4 - x    (sign of the xyz term not recoverable)
P3 = xy d2
Can further optimize by finding single cube
intersections
28
Finding single cube intersections (Condense
algorithm)
  • Form Cube Literal Matrix (CLM)
  • One row for each cube
  • One column for each literal
  • E.g. 2 cubes: F1 = a^4b^3c and F2 = a^2b^4c^2
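
A sketch of the CLM for these two cubes, with the largest cube common to both read off as the column-wise minimum. The helper name `cube_literal_matrix` and the dict input format are assumptions of this example.

```python
def cube_literal_matrix(cubes, literals):
    """One row per cube, one column per literal; entries are exponents."""
    return [[cube.get(lit, 0) for lit in literals] for cube in cubes]


F1 = {"a": 4, "b": 3, "c": 1}  # a^4b^3c
F2 = {"a": 2, "b": 4, "c": 2}  # a^2b^4c^2
clm = cube_literal_matrix([F1, F2], ["a", "b", "c"])

# Largest cube dividing every row: minimum exponent in each column.
common = [min(col) for col in zip(*clm)]
print(clm, common)
```

Here the common cube is a^2b^3c, so F1 = a^2 (a^2b^3c) and F2 = bc (a^2b^3c).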

29
Finding single cube intersections (Condense
algorithm)
  • Each (single term) common subexpression appears
    as a rectangle.
  • Rectangle: set of rows and columns where all
    elements are non-zero
  • Value of a rectangle is number of multiplications
    saved by selecting it
  • C = cube corresponding to the rectangle
  • Value = (Rows - 1) × ((Σ Ci) - 1), with Rows the
    number of rows and Ci the exponents in C
  • Maximum valued rectangular covering will give
    minimum number of multiplications
  • Use greedy iterative covering by prime rectangles
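
The value of a single-cube rectangle can be cross-checked by counting multiplications directly. The sketch assumes Value = (rows - 1) × (sum of exponents in C - 1): computing C once costs (sum - 1) multiplications, and each row then saves that many. It reuses the two cubes a^4b^3c and a^2b^4c^2 with C = a^2b^3c; `cube_mults` and `rectangle_value` are hypothetical names.

```python
def cube_mults(exps):
    """Multiplications to build a cube from its literals: (sum of powers) - 1."""
    return max(sum(exps) - 1, 0)


def rectangle_value(num_rows, cube_exps):
    """Multiplications saved by extracting the cube shared by num_rows cubes."""
    return (num_rows - 1) * (sum(cube_exps) - 1)


rows = [[4, 3, 1], [2, 4, 2]]  # a^4b^3c, a^2b^4c^2
cube = [2, 3, 1]               # a^2b^3c

before = sum(cube_mults(r) for r in rows)
remainders = [[e - c for e, c in zip(r, cube)] for r in rows]
# After extraction: build the cube once, then multiply it into each remainder.
after = cube_mults(cube) + sum(sum(rem) for rem in remainders)
saved = before - after
print(before, after, saved, rectangle_value(len(rows), cube))
```

The same function gives 2 for the cube xy shared by three cubes, i.e. `rectangle_value(3, [1, 1])`.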

30
Cube Literal Matrix (Condense Algorithm)
CLM for our example after Distill algorithm
Save 2 multiplications by extracting xy
C = xy
31
Condense Algorithm
Extracting xy
No more favorable cube intersections found
32
Final Implementation
  • Total 7 multiplications, 3 additions/subtractions
  • Savings of 5 multiplications, 1
    addition/subtraction compared to CSE
  • Impossible to obtain such results using
    conventional techniques
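
The operation counts can be reproduced by instrumenting arithmetic. Everything below is a reconstruction: the original expressions are inferred from the factored forms in the Distill slides, the sign of the xyz term in P2 is assumed to be + (the counts do not depend on it), and `d3` is a hypothetical name for the extracted cube xy. `Counted` is a helper written for this sketch.

```python
class Counted:
    """Number wrapper that tallies multiplications and additions/subtractions."""
    mults = 0
    adds = 0

    def __init__(self, value):
        self.value = value

    @staticmethod
    def _val(other):
        return other.value if isinstance(other, Counted) else other

    def __mul__(self, other):
        Counted.mults += 1
        return Counted(self.value * Counted._val(other))
    __rmul__ = __mul__

    def __add__(self, other):
        Counted.adds += 1
        return Counted(self.value + Counted._val(other))
    __radd__ = __add__

    def __sub__(self, other):
        Counted.adds += 1
        return Counted(self.value - Counted._val(other))

    def __rsub__(self, other):
        Counted.adds += 1
        return Counted(Counted._val(other) - self.value)


def count(fn, *args):
    """Run fn on counted inputs; return (values, multiplications, add/subs)."""
    Counted.mults = Counted.adds = 0
    results = fn(*(Counted(a) for a in args))
    return [r.value for r in results], Counted.mults, Counted.adds


def original(x, y, z):
    p1 = x*x*x*y + x*x*y*y*z
    p2 = 4*x + 4*y*z + x*y*z  # sign of xyz is an assumption
    p3 = 4*x*y - x*x*y
    return p1, p2, p3


def optimized(x, y, z):
    d1 = x + y*z
    d2 = 4 - x
    d3 = x*y                  # hypothetical name for the extracted cube
    return x*d3*d1, 4*d1 + d3*z, d3*d2


vals_o, mults_o, adds_o = count(original, 2, 3, 5)
vals_n, mults_n, adds_n = count(optimized, 2, 3, 5)
print(vals_o, mults_o, adds_o)
print(vals_n, mults_n, adds_n)
```

Both versions return the same three values, and the tallies match the 16/4 and 7/3 figures quoted in the deck.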

33
Experimental setup
  • Polynomials used in Computer graphics and Signal
    Processing
  • 1.0 µ technology library, characterized for power
    consumption
  • Synthesized using Synopsys Design Compiler™
  • Min Hardware constraints (1 adder 1 multiplier)
  • Med Hardware constraints (Max 4 multipliers)

34
Experimental setup
  • Estimated power using Synopsys Power Compiler™
    for random inputs, using RTL Simulator (VCS™)
  • Compared energy consumption with CSE and Horner
    form
  • Compared energy after voltage scaling

35
Results (Comparing operations)
36
Results (Min Hardware constraints)
37
Results (Med Hardware constraints)
38
Conclusions
  • Technique to reduce number of operations in
    polynomial expressions
  • Large savings in energy consumption observed over
    CSE and Horner methods
  • Need to consider scheduling and resource
    allocation to obtain further improvements

39
Conclusions
  • Thank you!!
  • Questions ???

40
  • Extra slides

41
Finding Kernel Intersections (Distill Algorithm)
  • Worst case scenario for Distill algorithm
  • Number of prime rectangles exponential in number
    of rows/columns
  • Heuristic methods to find best prime rectangle
  • In practice polynomial expressions are not so
    large