Title: Factoring and Eliminating Common Subexpressions in Polynomial Expressions
1 Factoring and Eliminating Common Subexpressions in Polynomial Expressions
International Conference on Computer Aided Design (ICCAD), 2004
- Anup Hosangadi, Ryan Kastner: ECE Department, UCSB
- Farzan Fallah: Advanced CAD Research, Fujitsu Labs. of America
2 Outline
- Introduction
- Related Work
- Algebraic techniques for redundancy elimination
- Experimental results
- Conclusions
3 Introduction
- Embedded system applications need to compute polynomial expressions
- Continuous functions can be approximated by polynomials to a desired degree of accuracy
- Adaptive signal processing (polynomial filters)
- Polynomial interpolation/extrapolation in computer graphics
- Encryption
4 Introduction
- Multiplications are expensive in embedded systems
- No good optimization tool exists for reducing the complexity of polynomials
  - Designers rely on hand-optimized libraries
- Conventional optimization techniques
  - CSE and value numbering are not suited to polynomials
- Horner form is the most popular representation
  - a_n x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0 = (((a_n x + a_{n-1})x + a_{n-2})x + ... + a_1)x + a_0
  - Not good for multivariate polynomials
  - Handles only a single polynomial expression at a time
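Horner's rule for a single-variable polynomial can be sketched in Python as follows. This is an illustrative sketch only: the function names `horner` and `naive` and the coefficient ordering (highest degree first) are assumptions, not part of the presentation.

```python
def horner(coeffs, x):
    """Evaluate a_n*x^n + ... + a_1*x + a_0 using n multiplications.

    coeffs is ordered from the highest-degree coefficient a_n down to a_0.
    """
    result = coeffs[0]
    for a in coeffs[1:]:
        # One multiplication and one addition per remaining coefficient.
        result = result * x + a
    return result


def naive(coeffs, x):
    """Evaluate the same polynomial term by term, for comparison."""
    n = len(coeffs) - 1
    return sum(a * x ** (n - i) for i, a in enumerate(coeffs))


# 2x^3 - 3x^2 + 5 at x = 4
print(horner([2, -3, 0, 5], 4))  # 85
print(naive([2, -3, 0, 5], 4))   # 85
```

The nested form handles one variable and one polynomial at a time, which is the limitation the slide points out.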
5 Introduction
- Quartic-spline polynomial (3-D graphics)
  - P = zu^4 + 4avu^3 + 6bu^2v^2 + 4uv^3w + qv^4
- Horner form (from Maple™)
  - P = zu^4 + (4au^3 + (6bu^2 + (4uw + qv)v)v)v
  - (17 multiplications)
- Proposed algebraic method
  - d1 = v^2, d2 = 4v
  - P = u^3(uz + a·d2) + d1(q·d1 + u(w·d2 + 6bu))
  - (11 multiplications)
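A quick way to check that the three forms of the quartic-spline polynomial agree is to evaluate them on integer inputs. The sketch below is only a sanity check, not the authors' tool. It assumes d1 = v^2 and d2 = 4v, which are the values that make the factored form expand back to P; the function names are hypothetical.

```python
def expanded(u, v, w, z, a, b, q):
    """P written as a plain sum of products."""
    return z*u**4 + 4*a*v*u**3 + 6*b*u**2*v**2 + 4*u*v**3*w + q*v**4


def horner_form(u, v, w, z, a, b, q):
    """P in the Horner form, nested in v."""
    return z*u**4 + (4*a*u**3 + (6*b*u**2 + (4*u*w + q*v)*v)*v)*v


def factored(u, v, w, z, a, b, q):
    """P after extracting the common subexpressions d1 and d2."""
    d1 = v * v   # assumed: d1 = v^2
    d2 = 4 * v   # assumed: d2 = 4v
    return u**3 * (u*z + a*d2) + d1 * (q*d1 + u*(w*d2 + 6*b*u))


args = (2, 3, 5, 7, 11, 13, 17)  # arbitrary integer test point
print(expanded(*args) == horner_form(*args) == factored(*args))  # True
```

Integer inputs keep the comparison exact, so any mismatch would indicate an algebraic error rather than rounding.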
6 Related Work
- Expression factorization (M. A. Breuer, JACM 1969)
  - Allows only one kind of operator at a time
- Symbolic algebra techniques (A. Peymandoust, De Micheli, DAC 2001)
  - Used for mapping DSP datapaths (polynomials) to library elements
  - Results depend upon an exponential library search
  - e.g. a^2 - b^2 = (a + b)(a - b) iff (a + b) or (a - b) is in the library
  - Manipulates only one expression at a time
F1 = A + B + C + D, F2 = A + P + D
=> Extract (A + D)
7 Motivating Example
- Consider a set of expressions
- Naïve implementation: 16 multiplications, 4 additions/subtractions
- Using CSE: 12 multiplications, 4 additions/subtractions
8 Motivating Example
- Using our algebraic techniques
- Total: 7 multiplications, 3 additions/subtractions
- Savings of 5 multiplications and 1 addition/subtraction compared to CSE
- Impossible to obtain such results using conventional techniques
9 Introduction to algebraic techniques for redundancy elimination
- Algebraic techniques in multi-level logic synthesis (MLLS)
  - Decomposition and factoring reduce the number of literals
  - Distill and Condense use rectangle covering methods
- Polynomial expressions (our technique)
  - Factoring and single-term common subexpressions reduce the number of multiplications
  - Multiple-term common subexpressions reduce the number of additions and possibly multiplications
- Key differences (generalization to handle higher orders)
  - Kernelling techniques
  - Finding single cube intersections
10 Introduction to our technique (Outline)
- Find a subset of all possible subexpressions (kernel generation)
- Transformation of polynomial expressions
- Problem formulation
- Extract multiple-term common subexpressions and factors
- Extract single-term common factors
11 Introduction to our technique
- Terminology
  - Literal: a variable or a constant, e.g. a, b, 2, 3.14
  - Cube: product of literals, e.g. 3a^2b, -2a^3b^2c
  - SOP: sum of cubes, e.g. 3a^2b - 2a^3b^2c
  - Cube-free expression: no literal or cube can divide all the cubes of the expression
  - Kernel: a cube-free subexpression of an expression, e.g. 3 - 2abc
  - Co-kernel: a cube that is used to divide an expression to get a kernel, e.g. a^2b
12 Introduction to our technique
- Matrix representation of polynomial expressions
- F = x^3y + xy^2z is represented by a matrix in which
  - Each row represents a product term
  - Each column represents a variable/constant
  - Each element (i, j) represents the power of variable j in term i
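The matrix representation can be sketched in Python as a list of exponent rows. This is a minimal illustration: the helper name `to_matrix` and the dictionary encoding of terms are assumptions, and the sketch uses variable columns only (no column for constants).

```python
def to_matrix(terms, variables):
    """Build one row per product term and one column per variable.

    terms is a list of dicts mapping a variable name to its power;
    a variable missing from a term has power 0.
    """
    return [[term.get(v, 0) for v in variables] for term in terms]


# F = x^3*y + x*y^2*z
F = [{"x": 3, "y": 1}, {"x": 1, "y": 2, "z": 1}]
matrix = to_matrix(F, ["x", "y", "z"])
for row in matrix:
    print(row)
# [3, 1, 0]
# [1, 2, 1]
```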
13 Generation of Kernels (example)
- P1 = x^3y + x^2y^2z, L = {x, y, z}
- Divide by x
- Ft = P1/x = x^2y + xy^2z
14 Generation of Kernels (example)
- Ft = P1/x = x^2y + xy^2z
- C = biggest cube dividing all cubes of Ft
- In matrix form, C has the exponent row (1, 1, 0), so C = xy
15 Generation of Kernels (example)
- Obtain kernel
  - F1 = Ft/C = (x^2y + xy^2z)/(xy) = (x + yz)
- Obtain co-kernel
  - D1 = x(xy) = x^2y
- No kernels within F1. Go back to P1
  - P1 = x^3y + x^2y^2z
- Divide now by the next variable, y
  - Ft = x^3 + x^2yz
  - C = x^2
  - But C contains x, and x < y in the literal order L
  - Stop here, to avoid repeating the same kernel Ft/C = (x + yz)
- No more kernels extracted
- Record kernel F1 = P1 with co-kernel 1
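The division steps in this example can be sketched in Python roughly as follows. This is a minimal sketch, not the authors' implementation: the function name `kernels`, the exponent-tuple encoding of terms, and carrying constants as plain coefficients (instead of treating them as literals) are assumptions made for illustration. It returns only the pairs found by division; recording the original expression with co-kernel 1 is left to the caller.

```python
def kernels(expr, nvars, start=0, cokernel=None):
    """Return (co_kernel, kernel) pairs found by repeated division.

    expr is a list of (coefficient, exponents) terms, where exponents is a
    tuple with one power per variable, in the literal order L.
    """
    if cokernel is None:
        cokernel = (0,) * nvars
    found = []
    for j in range(start, nvars):
        with_j = [(c, e) for c, e in expr if e[j] > 0]
        if len(with_j) < 2:
            continue  # literal j must appear in at least two terms
        # Ft: divide the terms containing literal j by that literal once.
        ft = [(c, e[:j] + (e[j] - 1,) + e[j + 1:]) for c, e in with_j]
        # C: biggest cube dividing all terms of Ft (column-wise minimum).
        cube = tuple(min(e[k] for _, e in ft) for k in range(nvars))
        if any(cube[k] > 0 for k in range(j)):
            continue  # C contains an earlier literal: kernel already seen
        kernel = [(c, tuple(p - q for p, q in zip(e, cube))) for c, e in ft]
        ck = tuple(cokernel[k] + cube[k] + (1 if k == j else 0)
                   for k in range(nvars))
        found.append((ck, kernel))
        found.extend(kernels(kernel, nvars, j, ck))
    return found


# P1 = x^3*y + x^2*y^2*z over L = (x, y, z)
P1 = [(1, (3, 1, 0)), (1, (2, 2, 1))]
for ck, k in kernels(P1, 3):
    print(ck, k)
# (2, 1, 0) [(1, (1, 0, 0)), (1, (0, 1, 1))]
```

The single pair printed corresponds to co-kernel x^2y and kernel x + yz; the division by y is skipped because its cube x^2 contains the earlier literal x.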
16 Concept of kernels and co-kernels
- Theorem: two expressions f and g can have a multiple-term common subexpression iff there are two kernels K_f and K_g having a multiple-term intersection
- Detection of multiple-term common subexpressions by intersection of sets of kernels
- Each co-kernel/kernel pair represents a possible factorization
  - e.g. x^3y + x^2y^2z = x^2y(x + yz)
- The set of kernels is a subset of all possible subexpressions
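That a co-kernel/kernel pair is a factorization can be checked by multiplying the pair back out. The sketch below does this on exponent tuples over (x, y, z); the helper name `expand` and the tuple encoding are assumptions for illustration, and coefficients are taken to be 1.

```python
def expand(cokernel, kernel):
    """Multiply a cube by every term of a kernel by adding exponents."""
    return [tuple(c + k for c, k in zip(cokernel, term)) for term in kernel]


cokernel = (2, 1, 0)             # x^2*y
kernel = [(1, 0, 0), (0, 1, 1)]  # x + y*z
print(expand(cokernel, kernel))  # [(3, 1, 0), (2, 2, 1)]
```

The result is x^3y + x^2y^2z, the expression from the kernel-generation example.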
17 All Kernels and Co-Kernels
Which kernels to choose?
18 Kernel Cube Matrix (KCM)
- One row for each kernel generated
- One column for each distinct kernel cube
- Each non-zero element represents a term, e.g. x^3y
19 Finding Kernel Intersections (Distill Algorithm)
- Each kernel intersection or factor appears as a rectangle
- Rectangle: set of rows and columns such that all elements are 1
- Value of a rectangle: weighted sum of the number of operations saved
- Goal: maximum-valued rectangular covering of the KCM
- Greedy heuristic: covering by prime rectangles
- Prime rectangle: rectangle not covered by any other rectangle
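For a small 0/1 matrix, the prime rectangles can be enumerated by brute force, as in the sketch below. This only illustrates the definitions of rectangle and prime rectangle; it is not the Distill heuristic itself. The function name `prime_rectangles` and the example matrix `kcm` are hypothetical and are not taken from the presentation.

```python
from itertools import combinations


def prime_rectangles(matrix):
    """Return all prime rectangles of a 0/1 matrix as (rows, cols) pairs."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    rects = set()
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # Every row that is 1 in all of the chosen columns.
            rows = tuple(r for r in range(n_rows)
                         if all(matrix[r][c] for c in cols))
            if not rows:
                continue
            # Grow to every column that is 1 in all of those rows, so the
            # rectangle cannot be extended by another row or column.
            full_cols = tuple(c for c in range(n_cols)
                              if all(matrix[r][c] for r in rows))
            rects.add((rows, full_cols))
    return sorted(rects)


kcm = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
for rect in prime_rectangles(kcm):
    print(rect)
# ((0, 1), (0, 1))
# ((0, 1, 2), (1,))
# ((1,), (0, 1, 2))
# ((1, 2), (1, 2))
```

The enumeration visits every column subset, so its cost grows exponentially with the number of columns; a greedy heuristic avoids this by picking one good prime rectangle at a time.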
20 Finding Kernel Intersections (Distill Algorithm)
- Formula for the value of a rectangle
  - R = number of rows
  - C = number of columns
  - M(Ri) = number of multiplications in row (co-kernel) i
  - M(Ci) = number of multiplications in column (kernel cube) i
  - m = ratio of the weight of a multiplication to that of an addition
- Value: (formula shown on the slide; not preserved in the text)
- The formula calculates the savings in operation count
21 Distill Algorithm
22 Distill Algorithm
4xy - x^2y = xy·d2, d2 = 4 - x (saves 2 multiplications)
Remove covered terms
23 Distill Algorithm
- Distill algorithm exits after no more kernel intersections can be found
- d1 = x + yz, d2 = 4 - x
- P1 = x^2y·d1
- P2 = 4·d1 + xyz
- P3 = xy·d2
- Can further optimize by finding single cube intersections
24 Finding single cube intersections (Condense Algorithm)
- Need an algorithm for finding single-term common subexpressions
- Consider two single-term expressions
  - F1 = a^4b^3c
  - F2 = a^2b^4c^2
- Form the Cube Variable Incidence Matrix (CIM)
  - One row for each product term
  - One column for each variable
25 Finding single cube intersections (Condense Algorithm)
- Each (single-term) common subexpression appears as a rectangle
- Rectangle: set of rows and columns where all elements are non-zero
- Value of a rectangle is the number of multiplications saved by selecting it
  - C = cube corresponding to the rectangle
  - Value = Rows × ((Σ C_i) - 1)
- Maximum-valued rectangular covering will give the minimum number of multiplications
- Use greedy iterative covering by prime rectangles
26 Finding single cube intersections (Condense Algorithm)
- a^4b^3c = a^2·d1
- a^2b^4c^2 = bc·d1
- Common cube: a^2b^3c
- d1 = a^2b^3c
- d2 = bc
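The common cube of the two terms is the column-wise minimum of their rows in the CIM, and what is left of each term is obtained by subtracting it. The sketch below shows this for F1 = a^4b^3c and F2 = a^2b^4c^2; the helper names `common_cube` and `divide` are hypothetical.

```python
def common_cube(rows):
    """Largest cube dividing every term: column-wise minimum exponent."""
    return [min(col) for col in zip(*rows)]


def divide(row, cube):
    """Exponents left over after dividing one term by the cube."""
    return [e - c for e, c in zip(row, cube)]


# Columns are the variables a, b, c.
cim = [
    [4, 3, 1],  # F1 = a^4 b^3 c
    [2, 4, 2],  # F2 = a^2 b^4 c^2
]
cube = common_cube(cim)
remainders = [divide(row, cube) for row in cim]
print(cube)        # [2, 3, 1]             -> a^2 b^3 c
print(remainders)  # [[2, 0, 0], [0, 1, 1]] -> a^2 and bc
```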
27 Finding single cube intersections (Condense Algorithm)
d3 = a^2
28 Finding single cube intersections (Condense Algorithm)
- Final CIM
- Final implementation (7 multiplications)
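One implementation consistent with the extracted cubes d1 = a^2b^3c, d2 = bc and d3 = a^2 is sketched below with an explicit multiplication counter. The final CIM itself is not in the text, so the way d1 is built from d3, d2 and b^2 is an assumption; it is chosen because it reaches the stated total of 7 multiplications. The names `final_implementation` and `mul` are hypothetical.

```python
def final_implementation(a, b, c):
    """Compute F1 = a^4 b^3 c and F2 = a^2 b^4 c^2, counting multiplications."""
    count = 0

    def mul(p, q):
        nonlocal count
        count += 1
        return p * q

    d3 = mul(a, a)                    # d3 = a^2              (1)
    d2 = mul(b, c)                    # d2 = bc               (1)
    d1 = mul(mul(d3, d2), mul(b, b))  # d1 = a^2 b^3 c        (3, assumed form)
    f1 = mul(d3, d1)                  # F1 = a^4 b^3 c        (1)
    f2 = mul(d1, d2)                  # F2 = a^2 b^4 c^2      (1)
    return f1, f2, count


print(final_implementation(2, 3, 5))  # (2160, 8100, 7)
```

Evaluating the two terms directly would take 7 multiplications each, 14 in total.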
29 Distill Algorithm
- Distill algorithm exits after no more kernel intersections can be found
- d1 = x + yz, d2 = 4 - x
- P1 = x^2y·d1
- P2 = 4·d1 + xyz
- P3 = xy·d2
- Can further optimize by finding single cube intersections
30 Cube Literal Matrix (Condense Algorithm)
CIM for our example after the Distill algorithm
Saves 2 multiplications by extracting xy
31 Condense Algorithm
Extracting xy
No more favorable cube intersections found
32 Final Implementation
- Total: 7 multiplications, 3 additions/subtractions
- Savings of 5 multiplications and 1 addition/subtraction compared to CSE
- Impossible to obtain such results using conventional techniques
33 Optimization of sin(x)
34 Optimization of sin(x)
35 Optimization of sin(x)
36 Optimization of sin(x)
- Final implementation
  - X = x·x
  - sin(x) = x(1 + (-S3 + (S5 - S7·X)X)X)
- Total: 5 multiplications and 3 additions/subtractions
- Same as the GNU C hand-optimized form
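The final form can be tried out with a short Python sketch. It assumes that S3, S5 and S7 are the magnitudes of the Taylor coefficients 1/3!, 1/5! and 1/7!; the presentation does not define them in the surviving text, and the function name `sin_poly` is hypothetical.

```python
import math

# Assumed Taylor-series coefficient magnitudes: 1/3!, 1/5!, 1/7!
S3 = 1 / 6
S5 = 1 / 120
S7 = 1 / 5040


def sin_poly(x):
    """Degree-7 approximation of sin(x): 5 multiplications, 3 add/sub."""
    X = x * x                                       # 1 multiplication
    return x * (1 + (-S3 + (S5 - S7 * X) * X) * X)  # 4 multiplications


for x in (0.1, 0.5, 1.0):
    print(x, sin_poly(x), math.sin(x))
```

The approximation is only accurate near zero, since it truncates the series after the x^7 term.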
37 Experimental Setup (Sequential processor)
- Signal processing and multimedia applications
- MP3 decoder, Mesa (graphics), adaptive filter, FFT, FIR
- Taylor series approximation of trigonometric functions
- Optimizations on arithmetic subgraphs from dataflow graphs (DFGs)
- Polynomials from computer graphics
- Multivariate polynomial approximation
- Compared number of operations with CSE and Horner form
- Estimated savings in clock cycles on an ARM core
38 Experimental results (comparing number of operations from different methods)
Average run time: 0.45 s for our technique
39 Experimental results (Improvement over CSE and Horner)
40 Conclusions
- Development of a new algebraic technique for optimizing polynomial expressions
- Currently used for minimizing the number of arithmetic operations using greedy rectangular covering
- Results better than conventional techniques
41 Future Work
- Develop and implement optimal algorithms to compare results with our greedy heuristic
- Optimization for delay and energy
- Impact of optimizations on stability
42 Thank You
44 Finding Kernel Intersections (Distill Algorithm)
- Worst-case scenario for the Distill algorithm
  - Number of prime rectangles is exponential in the number of rows/columns
- Heuristic methods to find the best prime rectangle
- In practice, polynomial expressions are not so large