Title: Factoring and Eliminating Common Subexpressions in Polynomial Expressions, International Conference on Computer Aided Design (ICCAD), 2004

1
Factoring and Eliminating Common Subexpressions
in Polynomial Expressions
International Conference on Computer Aided Design (ICCAD), 2004

Farzan Fallah
Advanced CAD Research
Fujitsu Labs. of America

Anup Hosangadi, Ryan Kastner
ECE Department, UCSB
2
Outline

Introduction
Related Work
Algebraic techniques for redundancy elimination
Experimental results
Conclusions

3
Introduction

Embedded system applications need to compute
polynomial expressions
Continuous functions can be approximated by
polynomials to desired degree of accuracy
Adaptive signal processing (polynomial filters)
Polynomial interpolation/extrapolation in
Computer Graphics
Encryption

4
Introduction

Multiplications are expensive in embedded systems
No good optimization tool exists for reducing the complexity of polynomials
Designers rely on hand-optimized libraries
Conventional optimization techniques
CSE and value numbering are not suited to polynomials
Horner form is the most popular representation
a_n x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0 = (((a_n x + a_{n-1})x + a_{n-2})x + ... + a_1)x + a_0
Not good for multivariate polynomials
Handles only a single polynomial expression at a time
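
The Horner rewrite on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than code from the presentation; the function name `horner` and the highest-degree-first coefficient order are assumptions made here.

```python
from typing import Sequence

def horner(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a_n x^n + ... + a_1 x + a_0 using n multiplications.

    coeffs is ordered highest degree first: [a_n, ..., a_1, a_0]
    (an ordering chosen for this sketch) and must be non-empty.
    """
    result = coeffs[0]
    for a in coeffs[1:]:
        # One multiplication and one addition per remaining coefficient.
        result = result * x + a
    return result

# 2x^2 + 3x + 4 at x = 2
print(horner([2, 3, 4], 2))  # 18
```

The loop handles one variable and one polynomial at a time, which is the limitation the slide points out.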

5
Introduction

Quartic-spline polynomial (3-D graphics)
P = zu^4 + 4avu^3 + 6bu^2v^2 + 4uv^3w + qv^4
Horner form (from Maple)
P = zu^4 + (4au^3 + (6bu^2 + (4uw + qv)v)v)v
(17 multiplications)
Proposed algebraic method
d1 = v^2, d2 = 4v
P = u^3(uz + a d2) + d1(q d1 + u(w d2 + 6bu))
(11 multiplications)
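
A quick numeric check that the three forms of the quartic-spline polynomial agree. This is a sketch, not the authors' tool. It assumes d2 = 4v, which is the choice that makes the factored form expand back to P (the term u^3 * a * d2 has to equal 4avu^3). The function names are illustrative.

```python
import random

def expanded(u, v, a, b, w, z, q):
    # P written term by term.
    return z*u**4 + 4*a*v*u**3 + 6*b*u**2*v**2 + 4*u*v**3*w + q*v**4

def horner_form(u, v, a, b, w, z, q):
    # Horner form in v, as quoted on the slide.
    return z*u**4 + (4*a*u**3 + (6*b*u**2 + (4*u*w + q*v)*v)*v)*v

def factored(u, v, a, b, w, z, q):
    d1 = v * v
    d2 = 4 * v  # assumption of this sketch, see the note above
    return u**3 * (u*z + a*d2) + d1 * (q*d1 + u * (w*d2 + 6*b*u))

# Integer inputs keep the comparison exact.
random.seed(0)
for _ in range(100):
    args = [random.randint(-9, 9) for _ in range(7)]
    assert expanded(*args) == horner_form(*args) == factored(*args)
print("all three forms agree")
```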

6
Related Work

Expression factorization (M. A. Breuer, JACM '69)
Allows only one kind of operator at a time
Symbolic algebra techniques
(A. Peymandoust and G. De Micheli, DAC '01)
Used for mapping DSP datapaths (polynomials) to
library elements
Results depend upon exponential library search
e.g. a^2 - b^2 = (a + b)(a - b) iff (a + b) or (a - b) is in the library
Manipulates only one expression at a time

F1 = A + B + C + D, F2 = A + P + D
=> Extract (A + D)
7
Motivating Example

Consider a set of expressions
Naive implementation: 16 multiplications, 4 additions/subtractions
Using CSE: 12 multiplications, 4 additions/subtractions

8
Motivating Example

Using our algebraic techniques
Total: 7 multiplications, 3 additions/subtractions
Savings of 5 multiplications and 1 addition/subtraction compared to CSE
Impossible to obtain such results using
conventional techniques

9
Introduction to algebraic techniques for
redundancy elimination

Algebraic techniques in multi-level logic
synthesis (MLLS)
Decomposition and factoring reduce the number of literals
Distill and Condense use rectangle-covering methods
Polynomial expressions (our technique)
Factoring and single-term common subexpressions reduce the number of multiplications
Multiple-term common subexpressions reduce the number of additions and possibly multiplications
Key differences (generalization to handle higher orders)
Kernelling techniques
Finding single cube intersections

10
Introduction to our technique (Outline)

Find a subset of all possible subexpressions
(kernel generation)
Transformation of Polynomial Expressions
Problem formulation
Extract multiple term common subexpressions and
factors
Extract single term common factors

11
Introduction to our technique

Terminology
Literal: a variable or a constant, e.g. a, b, 2, 3.14
Cube: product of literals, e.g. 3a^2b, -2a^3b^2c
SOP: sum of cubes, e.g. 3a^2b - 2a^3b^2c
Cube-free expression: no literal or cube can divide all the cubes of the expression
Kernel: a cube-free sub-expression of an expression, e.g. 3 - 2abc
Co-kernel: a cube that is used to divide an expression to get a kernel, e.g. a^2b
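
The terminology can be made concrete with a small sketch. The representation is an assumption of this example, not the authors' data structure: a cube is a pair (integer coefficient, {variable: exponent}), and an SOP is a list of cubes. Dividing the example SOP by the co-kernel a^2b should leave the cube-free kernel 3 - 2abc.

```python
from functools import reduce
from math import gcd
from typing import Dict, List, Tuple

Cube = Tuple[int, Dict[str, int]]  # (coefficient, {variable: exponent})

def is_cube_free(sop: List[Cube]) -> bool:
    """True if no variable and no constant other than 1 divides every cube."""
    common_vars = set.intersection(
        *({v for v, e in exps.items() if e > 0} for _, exps in sop))
    common_const = reduce(gcd, (abs(coeff) for coeff, _ in sop))
    return not common_vars and common_const == 1

def divide(sop: List[Cube], cokernel: Dict[str, int]) -> List[Cube]:
    """Divide every cube by `cokernel`, which is assumed to divide them all."""
    result = []
    for coeff, exps in sop:
        quotient = {v: e - cokernel.get(v, 0) for v, e in exps.items()}
        result.append((coeff, {v: e for v, e in quotient.items() if e > 0}))
    return result

# SOP 3a^2b - 2a^3b^2c, co-kernel a^2b
sop = [(3, {"a": 2, "b": 1}), (-2, {"a": 3, "b": 2, "c": 1})]
kernel = divide(sop, {"a": 2, "b": 1})
print(kernel)                # [(3, {}), (-2, {'a': 1, 'b': 1, 'c': 1})]
print(is_cube_free(sop))     # False: a and b divide every cube
print(is_cube_free(kernel))  # True
```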

12
Introduction to our Technique

Matrix Representation of Polynomial Expressions
F = x^3y + xy^2z is represented by
Each row represents a product term
Each column represents a variable/constant
Each element (i,j) represents power of variable j
in term i
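
A minimal sketch of the matrix described above, for F = x^3y + xy^2z. The helper name `exponent_matrix` and the dictionary input format are assumptions of this example. Both coefficients are 1 here, so no constant column is shown.

```python
from typing import Dict, List, Sequence

def exponent_matrix(terms: Sequence[Dict[str, int]],
                    variables: Sequence[str]) -> List[List[int]]:
    """One row per product term, one column per variable;
    entry (i, j) is the power of variable j in term i."""
    return [[term.get(v, 0) for v in variables] for term in terms]

# F = x^3 y + x y^2 z
F = [{"x": 3, "y": 1}, {"x": 1, "y": 2, "z": 1}]
matrix = exponent_matrix(F, ["x", "y", "z"])
for row in matrix:
    print(row)
# [3, 1, 0]
# [1, 2, 1]
```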

13
Generation of Kernels (example)

P1 = x^3y + x^2y^2z, L = {x, y, z}
Divide by x
Ft = P1/x = x^2y + xy^2z

14
Generation of Kernels (example)

Ft = P1/x = x^2y + xy^2z
C = biggest cube dividing all cubes of Ft

C = [1 1 0] (exponents of x, y, z), i.e. C = xy
15
Generation of Kernels (example)

Obtain kernel
F1 = Ft/C = (x^2y + xy^2z)/(xy) = (x + yz)
Obtain co-kernel
D1 = x(xy) = x^2y
No kernels within F1. Go back to P1
P1 = x^3y + x^2y^2z
Divide now by the next variable, y
Ft = x^3 + x^2yz
C = x^2
But x ∈ C and x < y (x precedes y in L)
Stop here, to avoid repeating the same kernel Ft/C = (x + yz)
No more kernels extracted
Record kernel F1 = P1 with co-kernel 1
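
The walk-through above can be sketched as a recursive procedure. This is an illustrative reconstruction from the slides, not the authors' implementation. It assumes a term is stored as (coefficient, exponent tuple) over the ordered variable list L, and that only variables (not constants) take part in cube division. The final step on the slide, recording the whole expression with co-kernel 1, is left to the caller.

```python
from typing import List, Optional, Tuple

Term = Tuple[int, Tuple[int, ...]]  # (coefficient, exponents in the order of L)
Poly = List[Term]

def biggest_cube(poly: Poly) -> Tuple[int, ...]:
    """Largest cube dividing every term: the column-wise minimum exponent."""
    n = len(poly[0][1])
    return tuple(min(t[1][k] for t in poly) for k in range(n))

def divide(poly: Poly, cube: Tuple[int, ...]) -> Poly:
    return [(c, tuple(e - d for e, d in zip(exps, cube))) for c, exps in poly]

def kernels(poly: Poly, start: int = 0,
            cokernel: Optional[Tuple[int, ...]] = None,
            out: Optional[list] = None) -> list:
    """Collect (kernel, co-kernel) pairs, visiting variables in order."""
    n = len(poly[0][1])
    cokernel = cokernel if cokernel is not None else (0,) * n
    out = out if out is not None else []
    for j in range(start, n):
        with_j = [t for t in poly if t[1][j] > 0]
        if len(with_j) < 2:
            continue                      # variable j cannot yield a kernel
        unit = tuple(1 if k == j else 0 for k in range(n))
        ft = divide(with_j, unit)         # Ft = P / L[j]
        c = biggest_cube(ft)
        if any(c[k] > 0 for k in range(j)):
            continue                      # an earlier variable is in C: already found
        f1 = divide(ft, c)                # kernel
        d1 = tuple(a + b + u for a, b, u in zip(cokernel, c, unit))
        out.append((f1, d1))
        kernels(f1, j, d1, out)           # look for kernels within F1
    return out

# P1 = x^3 y + x^2 y^2 z with L = (x, y, z)
P1 = [(1, (3, 1, 0)), (1, (2, 2, 1))]
for kernel, co in kernels(P1):
    print("kernel", kernel, "co-kernel", co)
```

For P1 this finds the single pair from the slides: kernel x + yz with co-kernel x^2y. The division by y is skipped because its biggest cube x^2 contains the earlier variable x.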

16
Concept of kernels and co-kernels

Theorem: two expressions f and g can have a multiple-term common subexpression iff there are two kernels Kf and Kg having a multiple-term intersection
Detection of multiple-term common subexpressions by intersection of sets of kernels
Each co-kernel/kernel pair represents a possible factorization
e.g. x^3y + x^2y^2z = x^2y(x + yz)
The set of kernels is a subset of all possible subexpressions

17
All Kernels and Co Kernels
Which kernels to choose?
18
Kernel Cube Matrix (KCM)

One row for each Kernel generated
One column for each distinct kernel cube
Each non-zero element represents a term

19
Finding Kernel Intersections (Distill Algorithm)

Each kernel intersection or factor appears as a
rectangle
Rectangle: set of rows and columns such that all elements are 1
Value of a rectangle: weighted sum of the number of operations saved
Goal: maximum-valued rectangular covering of KCM
Greedy heuristic: covering by prime rectangles
Prime rectangle: rectangle not covered by any other rectangle
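
To illustrate the rectangle definitions, here is a brute-force sketch that lists the prime rectangles of a small 0/1 matrix. The matrix is a hypothetical stand-in, not the KCM from the presentation, and exhaustive enumeration over column subsets is only practical for tiny inputs. It is not the greedy heuristic itself.

```python
from itertools import combinations
from typing import List, Set, Tuple

Rect = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (row indices, column indices)

def prime_rectangles(m: List[List[int]]) -> Set[Rect]:
    """Enumerate rectangles of 1s that no larger rectangle covers."""
    n_rows, n_cols = len(m), len(m[0])
    found: Set[Rect] = set()
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # All rows that have a 1 in every chosen column.
            rows = tuple(r for r in range(n_rows) if all(m[r][c] for c in cols))
            if not rows:
                continue
            # Grow the column set to every column that is all 1s on those rows.
            closed = tuple(c for c in range(n_cols) if all(m[r][c] for r in rows))
            found.add((rows, closed))
    return found

# Hypothetical 3x3 matrix, for illustration only.
kcm = [[1, 1, 0],
       [1, 1, 1],
       [0, 1, 1]]
for rect in sorted(prime_rectangles(kcm)):
    print(rect)
```

A greedy cover would repeatedly pick the highest-valued rectangle from such a list, remove the terms it covers, and repeat.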

20
Finding Kernel Intersections (Distill Algorithm)

Formula for value of a rectangle
R = number of rows
C = number of columns
M(Ri) = number of multiplications in row (co-kernel) i
M(Ci) = number of multiplications in column (kernel-cube) i
m = ratio of weights of multiplication to addition
Value = [formula shown on the slide, not preserved in this transcript]

Formula calculates savings in operation count
21
Distill Algorithm
22
Distill Algorithm
4xy - x^2y = xy d2, d2 = 4 - x: saves 2 multiplications
multiplications
Remove covered terms
23
Distill Algorithm

Distill algorithm exits after no more kernel
intersections can be found

P1 = x^2y d1, d1 = x + yz
P2 = 4 d1 + xyz, d2 = 4 - x
P3 = xy d2
Can further optimize by finding single cube
intersections
24
Finding single cube intersections (Condense
Algorithm)

Need an algorithm for finding single term common
subexpressions
Consider two single term expressions
F1 = a^4b^3c
F2 = a^2b^4c^2
Form Cube Variable Incidence Matrix (CIM)

One row for each product term, one column for each variable
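
A small sketch of the CIM for the two terms above. The variable order (a, b, c) and the helper names are assumptions of this example. The column-wise minimum gives the largest cube shared by both terms.

```python
from typing import Dict, List, Sequence

def build_cim(terms: Sequence[Dict[str, int]],
              variables: Sequence[str]) -> List[List[int]]:
    """One row per product term, one column per variable."""
    return [[t.get(v, 0) for v in variables] for t in terms]

variables = ["a", "b", "c"]
F1 = {"a": 4, "b": 3, "c": 1}   # a^4 b^3 c
F2 = {"a": 2, "b": 4, "c": 2}   # a^2 b^4 c^2
cim = build_cim([F1, F2], variables)
print(cim)                       # [[4, 3, 1], [2, 4, 2]]

# Largest common cube: the minimum exponent in each column.
common = [min(col) for col in zip(*cim)]
print(dict(zip(variables, common)))  # {'a': 2, 'b': 3, 'c': 1}, i.e. a^2 b^3 c
```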
25
Finding single cube intersections (Condense
algorithm)

Each (single term) common subexpression appears
as a rectangle.
Rectangle: set of rows and columns where all elements are non-zero
Value of a rectangle is the number of multiplications saved by selecting it
C: cube corresponding to the rectangle
Value = (Rows - 1) × (Σ C[i] - 1), where C[i] is the exponent of variable i in C
Maximum valued rectangular covering will give
minimum number of multiplications
Use greedy iterative covering by prime rectangles

26
Finding single cube intersections (Condense
algorithm)
Extract d1 = a^2b^3c: F1 = a^4b^3c becomes a^2 d1, F2 = a^2b^4c^2 becomes bc d1
Extract d2 = bc
27
Finding single cube intersections (Condense
algorithm)
d3 = a^2
28
Finding single cube intersections (Condense
algorithm)

Final CIM
Final Implementation (7 multiplications)
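
The final CIM is not preserved in this transcript. The sketch below is one reconstruction consistent with the cubes named on the preceding slides (d1 = a^2b^3c, d2 = bc, d3 = a^2). It uses 7 multiplications and reproduces both terms, but it should be read as an assumption rather than the authors' exact result.

```python
def direct(a, b, c):
    # Straightforward evaluation of the two original terms.
    f1 = a**4 * b**3 * c
    f2 = a**2 * b**4 * c**2
    return f1, f2

def condensed(a, b, c):
    d2 = b * c              # 1 multiplication
    d3 = a * a              # 1 multiplication
    d1 = d3 * b * b * d2    # 3 multiplications: a^2 b^3 c
    f1 = d3 * d1            # 1 multiplication:  a^4 b^3 c
    f2 = d2 * d1            # 1 multiplication:  a^2 b^4 c^2
    return f1, f2           # 7 multiplications in total

for a, b, c in [(2, 3, 5), (-1, 4, 7), (3, 0, 2)]:
    assert direct(a, b, c) == condensed(a, b, c)
print("condensed form matches the original terms")
```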

29
Distill Algorithm

Distill algorithm exits after no more kernel
intersections can be found

P1 = x^2y d1, d1 = x + yz
P2 = 4 d1 + xyz, d2 = 4 - x
P3 = xy d2
Can further optimize by finding single cube
intersections
30
Cube Literal Matrix (Condense Algorithm)
CIM for our example after Distill algorithm
Save 2 multiplications by extracting xy
31
Condense Algorithm
Extracting xy
No more favorable cube intersections found
32
Final Implementation

Total: 7 multiplications, 3 additions/subtractions
Savings of 5 multiplications and 1 addition/subtraction compared to CSE
Impossible to obtain such results using
conventional techniques
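
The operation counts for the running example can be checked with a small counting wrapper. The three polynomials used here are reconstructed by expanding the factored forms on the Distill and Condense slides: P1 = x^3y + x^2y^2z, P2 = 4x + 4yz + xyz, P3 = 4xy - x^2y. The sign of the xyz term in P2 is an assumption, because the transcript dropped the operator. The `Num` class and the function names are illustrative.

```python
class Num:
    """Wraps a number and counts the multiplications and additions applied."""
    mults = 0
    adds = 0

    def __init__(self, value):
        self.value = value

    @staticmethod
    def _raw(other):
        return other.value if isinstance(other, Num) else other

    def __mul__(self, other):
        Num.mults += 1
        return Num(self.value * Num._raw(other))
    __rmul__ = __mul__

    def __add__(self, other):
        Num.adds += 1
        return Num(self.value + Num._raw(other))
    __radd__ = __add__

    def __sub__(self, other):
        Num.adds += 1
        return Num(self.value - Num._raw(other))

    def __rsub__(self, other):
        Num.adds += 1
        return Num(Num._raw(other) - self.value)

def count(fn, *args):
    """Run fn on wrapped inputs; return (values, multiplications, add/subs)."""
    Num.mults = Num.adds = 0
    result = fn(*(Num(a) for a in args))
    return [r.value for r in result], Num.mults, Num.adds

def naive(x, y, z):
    p1 = x*x*x*y + x*x*y*y*z
    p2 = 4*x + 4*y*z + x*y*z     # sign of x*y*z assumed
    p3 = 4*x*y - x*x*y
    return p1, p2, p3

def factored(x, y, z):
    d1 = x + y*z
    d2 = 4 - x
    d3 = x*y
    return x*d3*d1, 4*d1 + d3*z, d3*d2

print(count(naive, 2, 3, 5))     # ([204, 98, 12], 16, 4)
print(count(factored, 2, 3, 5))  # ([204, 98, 12], 7, 3)
```

Under these assumptions the counts match the 16 and 4 quoted for the naive implementation and the 7 and 3 quoted for the final one.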

33
Optimization of sin(x)
34
Optimization of sin(x)
35
Optimization of sin(x)
36
Optimization of sin(x)

Final Implementation
X = x * x
sin(x) = x(1 + (-S3 + (S5 - S7 X)X)X)
Total: 5 multiplications and 3 additions/subtractions
Same as the GNU C hand-optimized form
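
A sketch of the final sin(x) form. The constants are not defined in this transcript, so the code assumes the Taylor coefficients S3 = 1/3!, S5 = 1/5! and S7 = 1/7!, with the signs placed so that the expression expands to x - x^3/3! + x^5/5! - x^7/7!.

```python
import math

# Assumed Taylor coefficients (not given in the transcript).
S3 = 1 / 6       # 1/3!
S5 = 1 / 120     # 1/5!
S7 = 1 / 5040    # 1/7!

def sin_poly(x: float) -> float:
    """Degree-7 approximation: 5 multiplications, 3 additions/subtractions."""
    X = x * x                                        # 1 multiplication
    return x * (1 + (-S3 + (S5 - S7 * X) * X) * X)   # 4 multiplications

for x in (0.0, 0.1, 0.5, 1.0):
    print(x, sin_poly(x), math.sin(x))
```

The approximation is only accurate near zero, since it truncates the series after the x^7 term.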

37
Experimental Setup (Sequential processor)

Signal processing and multimedia applications
MP3 decoder, Mesa (graphics), Adaptive filter,
FFT, FIR
Taylor series approximation of trigonometric
functions
Optimizations on arithmetic subgraphs from
Dataflow graphs (DFGs)
Polynomials from computer graphics
Multivariate polynomial approximation
Compared number of operations with CSE and Horner
form
Estimated savings in clock cycles on ARM core

38
Experimental results (comparing number of
operations from different methods)
Average run time 0.45s for our technique
39
Experimental results (Improvement over CSE and
Horner)
40
Conclusions

Development of new algebraic technique for
optimizing polynomial expressions
Currently used for minimizing number of
arithmetic operations using greedy rectangular
covering
Results better than conventional techniques

41
Future Work

Develop and implement optimal algorithms to
compare results with our greedy heuristic
Optimization for delay, energy
Impact of optimizations on stability

42
Thank You

Questions?

Extra slides

44
Finding Kernel Intersections (Distill Algorithm)

Worst case scenario for Distill algorithm
Number of prime rectangles exponential in number
of rows/columns
Heuristic methods to find best prime rectangle
In practice polynomial expressions are not so
large
