1
Towards the Automatic Exploration of
Arithmetic-Circuit Architectures

Ajay K. Verma and Paolo Ienne
Processor Architecture Laboratory (LAP)
Centre for Advanced Digital Systems (CSDA)
Ecole Polytechnique Fédérale de Lausanne (EPFL)

2
Example: Plenty of Different Adders
[Figure: three adder implementations with delay and area of 0.49 ns / 691 µm², 0.41 ns / 385 µm², and 0.34 ns / 534 µm².]
Problem: How can we automatically obtain the best-fitting
implementation in this space?
3
Typical Synthesis Methods

Write all the expressions in sum of product form.
Find all the Kernels and Cokernels of the
expressions.
Formulate the problem as a Rectangle Cover
Problem.
Use heuristics to solve the Rectangle Cover
Problem.

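x = af + abc + def + bcde = (a + de)(bc + f)

Matrix for the rectangle cover (each nonzero entry is the index of the
product term of x formed by the row and column labels):

        a    de   bc   f
  a     0    0    2    1
  de    0    0    4    3
  bc    2    4    0    0
  f     1    3    0    0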
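
The factoring in this example can be reproduced with a minimal sketch of algebraic (weak) division on sum-of-products expressions. The representation (a set of cubes, each cube a frozenset of single-letter literals) and the helper names `cubes`, `multiply` and `weak_divide` are assumptions made for illustration; they are not taken from the presentation.

```python
from itertools import product

def cubes(*terms):
    """Build a sum-of-products expression as a set of cubes (frozensets of literals)."""
    return {frozenset(t) for t in terms}

def multiply(p, q):
    """Algebraic product: pairwise union of the cubes of p and q."""
    return {a | b for a, b in product(p, q)}

def weak_divide(f, d):
    """Algebraic division: return (quotient, remainder) with f = d*quotient + remainder."""
    # For each cube of the divisor, collect what is left of the cubes of f containing it.
    partial = [{c - dc for c in f if dc <= c} for dc in d]
    quotient = set.intersection(*partial)
    remainder = f - multiply(d, quotient)
    return quotient, remainder

x = cubes("af", "abc", "def", "bcde")
q, r = weak_divide(x, cubes("bc", "f"))
print(sorted("".join(sorted(c)) for c in q), r)
```

Dividing x by (bc + f) leaves the quotient (a + de) and an empty remainder, which is the rectangle covering all four product terms in the matrix.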
4
Limitations of Typical Methods

All expressions should be in sum of product form.
Arithmetic expressions are XOR-intensive.
Kernel extraction is based on algebraic
factoring.
Expressions … and … are considered
independent (hence unable to explore all common
subexpressions).
The Rectangle Cover Problem is solved with heuristics,
which therefore cannot guarantee optimal results.

5
Related Work

Multi-level optimization and Boolean division
Classic problem: Brayton82, Brayton90, DeMicheli94, …
Boolean division improvements: Chang99, …
Optimization of specific arithmetic circuits
Final adders for multipliers: Lee91
Column compressors: Stelling98
Carry-save addition: Verma04
Symbolic algebra
Various applications to EDA: Peymandoust01

6
Outline

Problem formulation.
Introduction of a different division.
Core problem: finding Common Sub-Expressions (CSEs).
Enumeration of all possible CSEs.
Pruning the search space.
Results and analysis.

7
Problem Statement

Pareto-optimal implementation: an implementation
such that no other implementation is better in both
area and critical-path delay.

Given a set of Boolean expressions, generate all
their Pareto-optimal implementations.
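
As a minimal sketch of this definition, the filter below keeps only the implementations that no other implementation dominates. It is applied to the three adder data points quoted on the earlier example slide; the function name `pareto_front` and the (delay, area) tuple layout are illustrative assumptions.

```python
def pareto_front(impls):
    """Return the implementations that no other implementation dominates.

    Each implementation is a (delay_ns, area_um2) pair; smaller is better.
    """
    def dominates(p, q):
        # p dominates q if it is no worse in both metrics and is a different point.
        return p != q and p[0] <= q[0] and p[1] <= q[1]

    return [q for q in impls if not any(dominates(p, q) for p in impls)]

adders = [(0.49, 691), (0.41, 385), (0.34, 534)]
front = pareto_front(adders)
print(front)
```

The 0.49 ns adder is dropped because the 0.41 ns adder is both faster and smaller; the other two remain because each wins on one metric.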
8
Gröbner Bases and Division

A well-known method for multivariate polynomial division
using the remainder theorem.

reduce (f, g)
Algebraic factoring
9
Gröbner Bases for Boolean Algebra

Boolean algebra does not form a ring under the
operations AND and OR.
Neither of the two operations is invertible.
But Boolean algebra forms a ring over the field
GF(2) under the operations AND and XOR.
Operation XOR is self-invertible.
Reed-Muller form has no NOT operation.
Reed-Muller form of an expression is unique.
Expected size of an expression in Reed-Muller
form is smaller than the expected size in sum of
product form.
Issue: the AND operator is idempotent.
Reduce the final expression with respect to (x² − x)
for all underlying variables x.
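
The properties listed on this slide can be illustrated with a small sketch. A Reed-Muller form is stored as a set of monomials (each a frozenset of variable names, with the empty frozenset standing for the constant 1). XOR is then symmetric difference, and AND takes the union of variables, which applies the reduction x² = x automatically. The function names and the truth-table indexing convention are assumptions for illustration.

```python
def reed_muller(truth_table, names):
    """Return the Reed-Muller (XOR-of-ANDs) form of a Boolean function.

    truth_table[i] is the output when bit k of i is the value of names[k].
    """
    coeffs = list(truth_table)
    n = len(names)
    for k in range(n):                      # butterfly (Moebius) transform over GF(2)
        for i in range(len(coeffs)):
            if i & (1 << k):
                coeffs[i] ^= coeffs[i ^ (1 << k)]
    return {frozenset(names[k] for k in range(n) if i >> k & 1)
            for i, c in enumerate(coeffs) if c}

def rm_and(p, q):
    """AND of two Reed-Muller forms; the union of variables enforces x*x = x."""
    result = set()
    for a in p:
        for b in q:
            result ^= {a | b}               # equal monomials cancel (XOR is self-inverse)
    return result

or_ab = reed_muller([0, 1, 1, 1], "ab")     # a OR b   ->  a xor b xor ab
not_a = reed_muller([1, 0], "a")            # NOT a    ->  1 xor a (no NOT needed)
a_xor_b = {frozenset("a"), frozenset("b")}
print(or_ab, not_a, rm_and(a_xor_b, a_xor_b) == a_xor_b)
```

Because the transform maps each truth table to exactly one coefficient vector, two expressions for the same function always yield the same form.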

10
Two Theorems and Their Consequences

Theorem 1: In any Pareto-optimal implementation
of E1 and E2 that uses S as a Common
Sub-Expression, the implementation of S must be
Pareto-optimal.

The problem has a dynamic programming structure.
11
Two Theorems and Their Consequences

Theorem 2: If there are m Pareto-optimal
implementations of E1 and n Pareto-optimal
implementations of E2 which use Sk as the
implementation of their CSE, then by considering
only (m + n) combinations of these
implementations we can find all Pareto-optimal
implementations using Sk.

[Figure: area versus delay plot of implementations of E1 and E2, with points labelled 1 (20), 1 (30), 2 (16), 8 (38), 3 (25), 4 (15), 8 (22), 5 (14), 10 (20), 7 (13), 8 (35), 9 (12).]
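
The following sketch shows why a linear number of combinations can be enough. It assumes the bound is m + n candidate pairs, and a simple cost model that is not stated explicitly in the transcript: the area of a combination is the sum of the two areas, its delay is the maximum of the two delays, and the cost of the shared Sk is a common constant that is left out. The (delay, area) pairs are illustrative.

```python
def pareto(points):
    """Keep the (delay, area) points that are not dominated by any other point."""
    front = []
    for delay, area in sorted(set(points)):
        if not front or area < front[-1][1]:
            front.append((delay, area))
    return front

def combine(front1, front2):
    """Combine two Pareto fronts using at most len(front1) + len(front2) candidates."""
    limits = sorted({d for d, _ in front1} | {d for d, _ in front2})
    candidates = []
    for limit in limits:
        fit1 = [p for p in front1 if p[0] <= limit]
        fit2 = [p for p in front2 if p[0] <= limit]
        if fit1 and fit2:
            # Under a given delay limit, only the smallest option of each side matters.
            best1 = min(fit1, key=lambda p: p[1])
            best2 = min(fit2, key=lambda p: p[1])
            candidates.append((max(best1[0], best2[0]), best1[1] + best2[1]))
    return pareto(candidates)

e1 = [(1, 20), (2, 16), (4, 15), (5, 14), (7, 13), (9, 12)]   # assumed front of E1
e2 = [(1, 30), (3, 25), (8, 22), (10, 20)]                    # assumed front of E2
merged = combine(e1, e2)
exhaustive = pareto([(max(d1, d2), a1 + a2) for d1, a1 in e1 for d2, a2 in e2])
print(merged == exhaustive, len(merged))
```

With these numbers, pairing the delay-8 implementation of E2 with the delay-2 implementation of E1 gives (8, 38), which is worse than pairing it with the delay-7 implementation, (8, 35); the exhaustive m × n search and the linear scan return the same front.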
12
Hence, Two Independent Problems

Problem 1: Given two Boolean expressions E1 and
E2, find all possible Common Sub-Expressions
between them.
Problem 2: Find all Pareto-optimal
implementations of a single Boolean expression E.

13
Problem 1: Enumerating CSEs
The nodes of the DAG correspond to all partial
implementations of the two expressions with some
sharing between them.
14
Replacing Partial Occurrences Can Be Useful
Partial occurrences can also be replaced by a new
variable (e.g., s = x ⊕ y ⇒ x = (x ⊕ y) ⊕ y = s ⊕ y).
Kernel extraction algorithms cannot be used.
Replacing partial occurrences too: 3 XOR gates, 2 AND gates
Without replacing partial occurrences: 3 XOR gates, 3 AND gates
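
Replacing a partial occurrence relies on XOR being self-inverse: if s = x ⊕ y, then x = s ⊕ y, so x can be rewritten in terms of s even where x ⊕ y does not appear as a whole. The sketch below substitutes x by s ⊕ y in a hypothetical expression and checks the result by exhaustive evaluation. Expressions are sets of monomials (frozensets of variable names); all helper names and the example expression are assumptions, not taken from the slides.

```python
from itertools import product

def var(name):
    return {frozenset([name])}

def rm_and(p, q):
    """AND of two XOR-of-ANDs forms; equal monomials cancel, x*x = x."""
    out = set()
    for a in p:
        for b in q:
            out ^= {a | b}
    return out

def substitute(poly, name, replacement):
    """Replace variable `name` by the polynomial `replacement` in `poly`."""
    result = set()
    for mono in poly:
        term = {frozenset()}                      # start from the constant 1
        for v in mono:
            term = rm_and(term, replacement if v == name else var(v))
        result ^= term
    return result

def evaluate(poly, env):
    """XOR over all monomials of the AND of their variables."""
    return sum(all(env[v] for v in mono) for mono in poly) % 2

e = {frozenset("xz"), frozenset("yz"), frozenset("x")}    # xz xor yz xor x (hypothetical)
rewritten = substitute(e, "x", var("s") ^ var("y"))       # uses x = s xor y
print(sorted("".join(sorted(m)) for m in rewritten))
```

Here the yz terms cancel and the result is sz ⊕ s ⊕ y, which no longer mentions x.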
15
t-Reductions Are Necessary
t = bd
t-reductions preserve the minimum delay in at least
one path.
16
Pruning the Enumeration DAG

The size of the DAG can be as large as O ((n m)
2m), where n is the number of variables and m is
the size of the Boolean expressions.
Enumerating the whole DAG is computationally
infeasible.
Pruning criteria:
Recognizing node equivalence (width reduction).
Merging some reductions into a single one (height
reduction).
Delaying certain reductions (branch reduction).

17
Pruning Based on Node Equivalence (Width
Reduction)

s5 ? abcd, s6 ? abcd, s7 ? s1cd s5 s6 ?
s7
18
Neutral t-Reductions Should Be Applied
Immediately (Height Reduction)
Neutral t-reduction: a t-reduction which does not kill any
s-reduction.

Recognition of neutral t-reductions:
Find all reductions which are killed by this
reduction.
Check if any of them is an s-reduction.

[Figure labels: s-reduction, t-reduction, Normalization, Normalization]
19
Nonneutral t-Reductions Should Be Delayed (Branch
Reduction)

The only purpose of t-reductions is to preserve
the minimum delay in at least one path.
If there exists at least one s-reduction which
preserves the minimum delay, any t-reduction at
the current node can be avoided.
Computing the minimum delay corresponding to a
Boolean expression is NP-hard.
Not all instances of Boolean expressions are hard:
for some, the minimum delay is easy to compute.
E.g., the minimum delay of a Boolean expression
A1 A2 An can be computed using a
two-greedy approach, where the Ai are product terms
with disjoint sets of variables.
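
One plausible reading of the two-greedy approach is a greedy tree construction applied at two levels: first inside each product term, then across the product terms. The sketch below assumes two-input gates with unit delay, all primary inputs arriving at time 0, and no sharing between terms (they have disjoint variables). These assumptions and the function names are illustrative and are not stated in the slides.

```python
import heapq

def tree_delay(arrivals):
    """Output delay of a tree of two-input, unit-delay gates over the given signals.

    Greedy rule: always combine the two signals that arrive earliest.
    """
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        first = heapq.heappop(heap)
        second = heapq.heappop(heap)
        heapq.heappush(heap, max(first, second) + 1)
    return heap[0]

def expression_delay(term_sizes):
    """Delay of an expression whose product terms have the given numbers of variables."""
    # Level 1: one gate tree per product term, inputs arriving at time 0.
    term_delays = [tree_delay([0] * k) for k in term_sizes]
    # Level 2: one gate tree combining the product terms.
    return tree_delay(term_delays)

print(expression_delay([1, 2, 4]))
```

For terms with 1, 2 and 4 variables, the terms are ready at times 0, 1 and 2. Combining the two earliest first gives a total delay of 3, whereas combining the two latest first would give 4.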

20
Two Independent Problems

Problem 1: Given two Boolean expressions E1 and
E2, find all possible Common Sub-Expressions
between them.
Problem 2: Find all Pareto-optimal
implementations of a single Boolean expression E.

21
Problem 2: Special Case of the General Problem

All the Pareto-optimal implementations of a
single expression can be evaluated using DAG
enumeration.
s- and t-reductions can be defined in a similar
way.
If the corresponding expression occurs more than
once, then it is an s-reduction; otherwise it is a
t-reduction.
If there are no s-reductions in the DAG, then all
implementations will have the same area.
The Pareto-optimal implementation will then correspond
to the one with minimum delay and can be computed
using a two-greedy strategy.

22
Experimental Setup
E1 = f(x1, x2, …), E2 = g(x1, x2, …), E3 = h(x1, x2, …)
Conversion into Reed-Muller form
Logic synthesis
E1 = f1(x1, x2, …), E2 = g1(x1, x2, …), E3 = h1(x1, x2, …)
CSE enumeration
(E11, E21, E31), (E12, E22, E32), …
Logic synthesis
Artisan Standard Cells, UMC CMOS Technology 0.13 µm
23
Results
[Figures: results for a 6-bit adder, a 5-bit adder, multi-input addition, and a 4 × 3-bit multiplier.]
24
There Is Scope for Further Pruning
Area and Delay for all 6-bit adders generated by
our algorithm
Without any pruning it is impossible to handle
expressions with more than five variables.
25
but the Enumeration Algorithm Finds Interesting
Non-trivial Relations!
4x4-bit multiplier better than our best
manually-designed multiplier?!
Idea: Exploit complex dependencies among the
partial-product bits of a multiplier
26
Conclusions

We have exploited a new form of division which is
better than algebraic division and still less
complex than Boolean division.
Key to a better exploitation of Common
Sub-Expressions (CSEs).
We have introduced a CSE enumeration algorithm
which discovers all architectures. Unfortunately,
it is still very slow.
More effective pruning strategies are required,
especially based on the inferiority of some
implementations still explored.
Despite the runtime limitations, this exploration
algorithm has already made it possible to study
innovative architectures.
Exploit dependency among input bits in the
compressors of multipliers.