Title: Generating Fast Multipliers using Clever Circuits
1 Generating Fast Multipliers using Clever Circuits
- Mary Sheeran
- Chalmers University of Technology
- Research funded by SRC in an Intel-custom
project, and by Vetenskapsrådet
2 Using a functional language to describe hardware
- Gives a style of circuit description and analysis
- Emphasises connection patterns
- User writes circuit generators
3 Interleave
[Figure: two copies of f, interleaved]
  ilv f = unriffle ->- two f ->- riffle
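The interleave pattern can be tried out on plain lists. This is a minimal sketch, not the Lava library itself: the definitions of `->-`, `two`, `riffle` and `unriffle` below are assumptions modelled on the usual connection patterns, and they expect even-length lists.

```haskell
-- Serial composition: feed the output of f into g (assumed meaning of ->-).
infixl 5 ->-
(->-) :: (a -> b) -> (b -> c) -> a -> c
(f ->- g) x = g (f x)

-- Split a list into its two halves.
halve :: [a] -> ([a], [a])
halve xs = splitAt (length xs `div` 2) xs

-- Apply f to each half separately and join the results.
two :: ([a] -> [b]) -> [a] -> [b]
two f xs = f l ++ f r
  where (l, r) = halve xs

-- Perfect shuffle: interleave the two halves of the list.
riffle :: [a] -> [a]
riffle xs = concat (zipWith (\a b -> [a, b]) l r)
  where (l, r) = halve xs

-- Inverse shuffle: even-indexed elements first, then odd-indexed ones.
unriffle :: [a] -> [a]
unriffle xs = [x | (x, i) <- ixs, even i] ++ [x | (x, i) <- ixs, odd i]
  where ixs = zip xs [0 :: Int ..]

-- Interleave, as on the slide.
ilv :: ([a] -> [b]) -> [a] -> [b]
ilv f = unriffle ->- two f ->- riffle
```

For example, `ilv reverse [1,2,3,4,5,6,7,8]` reverses the odd-position and even-position sublists independently and gives `[7,8,5,6,3,4,1,2]`.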
4 Butterfly
[Figure: butterfly containing two copies of bfly (n-1) circ]
5 Defining Butterfly
  bfly 0 circ = id
  bfly n circ = ilvN (n-1) circ ->- two (bfly (n-1) circ)
Annotation on two (bfly (n-1) circ): two copies of the smaller butterfly circuit
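The butterfly definition can be run on lists as a rough check. In this sketch `ilvN n` is assumed to mean n nested applications of `ilv` (the slides do not define it), the wiring combinators are list models rather than Lava's own, and `sort2` is a hypothetical two-input sorter used as the component.

```haskell
infixl 5 ->-
(->-) :: (a -> b) -> (b -> c) -> a -> c
(f ->- g) x = g (f x)

halve :: [a] -> ([a], [a])
halve xs = splitAt (length xs `div` 2) xs

two :: ([a] -> [b]) -> [a] -> [b]
two f xs = f l ++ f r
  where (l, r) = halve xs

riffle :: [a] -> [a]
riffle xs = concat (zipWith (\a b -> [a, b]) l r)
  where (l, r) = halve xs

unriffle :: [a] -> [a]
unriffle xs = [x | (x, i) <- ixs, even i] ++ [x | (x, i) <- ixs, odd i]
  where ixs = zip xs [0 :: Int ..]

ilv :: ([a] -> [b]) -> [a] -> [b]
ilv f = unriffle ->- two f ->- riffle

-- Assumption: ilvN n f nests ilv n times around f.
ilvN :: Int -> ([a] -> [a]) -> [a] -> [a]
ilvN 0 f = f
ilvN n f = ilv (ilvN (n - 1) f)

-- Butterfly on 2^n inputs, following the slide.
bfly :: Int -> ([a] -> [a]) -> [a] -> [a]
bfly 0 _    = id
bfly n circ = ilvN (n - 1) circ ->- two (bfly (n - 1) circ)

-- Hypothetical component: a two-input sorter (other sizes pass through).
sort2 :: Ord a => [a] -> [a]
sort2 [a, b] = [min a b, max a b]
sort2 xs     = xs
```

With `sort2` as the component, `bfly 3 sort2 [1,3,5,7,8,6,4,2]` gives `[1,2,3,4,5,6,7,8]`: on this input, which rises and then falls, the butterfly acts as a merger.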
6 Butterfly Layout on an FPGA
7 BUT
High-performance data paths are in reality NOT regular!
They start out regular and become less so as design proceeds, ending with analogue design of each instance of each cell!
It's all in the wires
8 Shadow Values
[Figure labels: gen. bfly; bafter; (98 comparators)]
- Info. about what is bigger/smaller is updated by components (dynamic)
- Only necessary sub-sorters included
9 Clever Circuits
- Decide what component to be, based on shadow values produced when a particular component is used
- Try it and see during generation
10 Clever circuits give control over
- Presence or absence of components (Charme03)
- Shape of circuit wiring (this paper)
- Circuit topology (next paper)
11 Multiplication
       11010
     x 01001
  ----------
       11010
      00000
     00000
    11010
   00000
  ----------
  0011101010
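The long multiplication above (26 x 9 = 234) can be reproduced with a short sketch. All names here are illustrative, not taken from the slides; bits are held least significant bit first.

```haskell
import Data.Char (digitToInt)

-- A binary number as a list of 0/1 digits, least significant bit first.
type Bits = [Int]

-- Read an msb-first digit string such as "11010".
fromString :: String -> Bits
fromString = reverse . map digitToInt

-- Value of an lsb-first bit list.
toInt :: Bits -> Int
toInt = foldr (\b acc -> b + 2 * acc) 0

-- One partial product row per multiplier bit, shifted left by that bit's weight.
partialProducts :: Bits -> Bits -> [Bits]
partialProducts as bs =
  [ replicate i 0 ++ map (* b) as | (i, b) <- zip [0 ..] bs ]

-- Sum the partial products to obtain the product.
multiply :: Bits -> Bits -> Int
multiply as bs = sum (map toInt (partialProducts as bs))
```

`multiply (fromString "11010") (fromString "01001")` evaluates to 234, which is the value of the result row 0011101010.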
12 Multiplication
Partial product rows, msb first:
- 1 1 0 1 0
- 0 0 0 0 0
- 0 0 0 0 0
- 1 1 0 1 0
- 0 0 0 0 0
13 Multiplication
Partial product rows, lsb first:
- 0 1 0 1 1
- 0 0 0 0 0
- 0 0 0 0 0
- 0 1 0 1 1
- 0 0 0 0 0
14 Structure of multiplier
15
  multBin comps (as,bs) = p1:ss
    where
      ([p1]:[p2,p3]:ps) = prods_by_weight (as,bs)
      is = redArray comps ps
      ss = binaryAdder ([p2,p3]:is)

  redArray comps ps = is
    where
      (is,[]) = row (compress comps) ([],ps)
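The slides do not show `prods_by_weight`. One plausible reading, sketched here under that assumption, is that it forms every product bit a_i AND b_j and groups the bits into columns by weight i + j. The names `prodsByWeight` and `columnValue` are hypothetical.

```haskell
-- Group the partial-product bits a_i && b_j into columns of equal weight i + j.
-- Both operands are lsb first. (Assumed behaviour of the slide's prods_by_weight.)
prodsByWeight :: ([Bool], [Bool]) -> [[Bool]]
prodsByWeight (as, bs) =
  [ [ a && b | (i, a) <- zip [0 ..] as, (j, b) <- zip [0 ..] bs, i + j == w ]
  | w <- [0 .. length as + length bs - 2] ]

-- Numeric value represented by the columns: each True bit in column w counts 2^w.
columnValue :: [[Bool]] -> Int
columnValue cols =
  sum [ 2 ^ w * length (filter id col) | (w, col) <- zip [0 :: Int ..] cols ]
```

For two 5-bit operands the column heights come out as 1, 2, 3, 4, 5, 4, 3, 2, 1. The first two columns hold one and two bits, which is why a reduction array would only need to start at the third column.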
16 Reduction tree for multiplier
[Figure: reduction tree feeding the Fast Adder; labels 5, 4, 4, 3, 3, 2 and carries]
17
- Will concentrate on the reduction tree (a row of compress cells)
- Assume partial products already generated (e.g. using AND gates). May also include recoding to reduce size of tree (cf. Booth)
18 Compress (diff = 2)
[Figure labels: n-2, 2]
19 diff > 2, diff < 2
[Figure labels: k, k, wcell, hcell, k2, k-1]
20
  compress comps (as,bs)
    | (diff > 2)  = (compress comps -|- hcell comps) (as,bs)
    | (diff == 2) = column (fcell comps) (as,bs)
    | (diff < 2)  = (compress comps -|- wcell comps) (as,bs)
    where diff = length bs - length as
22 Possible fcell
[Figure: fullAdd cell with carry output c and sum output s]
halfAdd cells similar. Gives standard array multiplier. Not great!
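The slides use fullAdd and halfAdd without showing them. A minimal sketch at the Boolean level, using the standard gate-level definitions, could look as follows; the tuple shapes and the (sum, carry) result order are assumptions made for illustration.

```haskell
-- Half adder: sum is XOR, carry is AND. Result is (sum, carry).
halfAdd :: (Bool, Bool) -> (Bool, Bool)
halfAdd (a, b) = (a /= b, a && b)

-- Full adder built from two half adders and an OR gate. Result is (sum, carry).
fullAdd :: (Bool, (Bool, Bool)) -> (Bool, Bool)
fullAdd (cin, (a, b)) = (s, c1 || c2)
  where
    (s1, c1) = halfAdd (a, b)
    (s, c2)  = halfAdd (cin, s1)

-- Helper for checking: numeric value of a (sum, carry) pair.
pairValue :: (Bool, Bool) -> Int
pairValue (s, c) = fromEnum s + 2 * fromEnum c
```

For every input combination, `pairValue (fullAdd (cin, (a, b)))` equals the number of inputs that are True, so each cell turns three bits of one weight into a sum bit of that weight and a carry of the next.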
23 Only need to vary wiring! Make it explicit
[Figure labels: iC, s3, cc, iS]
24 Dadda-like
[Figure: fullAdd cell with carry output c and sum output s]
  toEnd (a,as) = as ++ [a]
Excellent log-depth reduction tree, but known for irregularity and difficult layout
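The log-depth claim can be made concrete with the standard Dadda height sequence, in which each stage may reduce a column from height floor(1.5 d) to height d. This is textbook background rather than something shown on the slides, and the names below are illustrative.

```haskell
-- Standard Dadda target heights: 2, 3, 4, 6, 9, 13, 19, ...
daddaHeights :: [Int]
daddaHeights = iterate (\d -> (3 * d) `div` 2) 2

-- Number of reduction stages needed to bring n rows of partial products down to 2.
daddaStages :: Int -> Int
daddaStages n = length (takeWhile (< n) daddaHeights)
```

Under this model a 16-bit multiplier needs `daddaStages 16`, i.e. 6 stages (heights 13, 9, 6, 4, 3, 2), and doubling the word length adds only a couple of stages.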
25 Picture by Henrik Eriksson, Chalmers
26 Delay model for half adder
  halfAddI (as, bs, ac, bc) [a,b] = [s,cout]
    where
      s    = max (as+a) (bs+b)
      cout = max (ac+a) (bc+b)
- as is the delay between the a input and the sum output, etc.
  hI as = halfAddI (10,10,5,5) as
  fI as = fullAddI (20,20,10,10,10,10) as
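The delay model runs as ordinary Haskell over arrival times. `halfAddI` below follows the slide; `fullAddI` is not shown in the slides, so its parameter order (three input-to-sum delays, then three input-to-carry delays) is an assumption made by analogy.

```haskell
-- Half adder delay model: inputs and outputs are arrival times.
-- (as, bs) are delays from a and b to the sum; (ac, bc) from a and b to the carry.
halfAddI :: (Int, Int, Int, Int) -> [Int] -> [Int]
halfAddI (as, bs, ac, bc) [a, b] = [s, cout]
  where
    s    = max (as + a) (bs + b)
    cout = max (ac + a) (bc + b)
halfAddI _ _ = error "halfAddI expects exactly two inputs"

-- Full adder delay model. ASSUMED parameter order:
-- (a->s, b->s, c->s, a->cout, b->cout, c->cout).
fullAddI :: (Int, Int, Int, Int, Int, Int) -> [Int] -> [Int]
fullAddI (as, bs, cs, ac, bc, cc) [a, b, c] = [s, cout]
  where
    s    = maximum [as + a, bs + b, cs + c]
    cout = maximum [ac + a, bc + b, cc + c]
fullAddI _ _ = error "fullAddI expects exactly three inputs"

-- The instances used on the slide.
hI :: [Int] -> [Int]
hI = halfAddI (10, 10, 5, 5)

fI :: [Int] -> [Int]
fI = fullAddI (20, 20, 10, 10, 10, 10)
```

With all inputs arriving at time 0, `hI [0,0]` gives `[10,5]`: the sum is ready after 10 units and the carry after 5.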
27 Checking gate delay
  dDadG n
    = simulate (redArray (hI,fI,toEnd,toEnd,id,sep2,sep3)) (ppzs n)
Annotations: gate delay models; wiring cells (allow inclusion of wiring delay)
  Main> dDadG 16
  0,10,5,20,20,30,30,40,40,50,50,50,50,60,60,70,70,70,
  70,70,70,80,70,80,80,90,90,90,90,90,90,90,90,90,90,90,
  80,90,80,80,70,80,70,80,70,70,60,70,60,60,50,60,50,50,
  40,20,0,20
28 Promising, but we can do better!
- Choose what wiring cells to use dynamically, during circuit generation, rather than in advance
- Base choice on delay behaviour of both wires and components
29 Idea
Harden the wiring during circuit generation using clever circuits. Shadow values estimate delay through wires and cells.
30
  cswap ((a,x),(b,y))
    = if (x > y) then ((b,y),(a,x)) else ((a,x),(b,y))
31
  cleverInsert = row cswap ->- apr
- forms necessary wiring based on context (delays on shadow wires)
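The effect of cleverInsert can be seen on a list of wires tagged with shadow delays. In this sketch `row`, `apr` and `->-` are list models assumed to behave like the usual Lava connection patterns, and representing a wire as a (name, delay) pair is purely illustrative.

```haskell
infixl 5 ->-
(->-) :: (a -> b) -> (b -> c) -> a -> c
(f ->- g) x = g (f x)

-- Row of cells: a carry value is threaded from left to right through the list.
row :: ((c, a) -> (b, c)) -> (c, [a]) -> ([b], c)
row _ (c, [])     = ([], c)
row f (c, a : as) = (b : bs, c'')
  where
    (b, c')   = f (c, a)
    (bs, c'') = row f (c', as)

-- Append the final carry on the right.
apr :: ([a], a) -> [a]
apr (as, a) = as ++ [a]

-- Emit the wire with the smaller shadow delay, pass the slower one along.
cswap :: Ord d => ((s, d), (s, d)) -> ((s, d), (s, d))
cswap ((a, x), (b, y)) =
  if x > y then ((b, y), (a, x)) else ((a, x), (b, y))

-- Insert a new wire into a list of wires, guided by the delays.
cleverInsert :: Ord d => ((s, d), [(s, d)]) -> [(s, d)]
cleverInsert = row cswap ->- apr
```

Given wires already ordered by delay, `cleverInsert (("n",25), [("a",10),("b",20),("c",30)])` places the new wire between b and c. The wiring pattern therefore depends on the delays seen during generation, not on a choice fixed in advance.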
32 Structure of circuit generator remains unchanged
  adapt (hAdd, fAdd, cc) (d,pds)
    = mmark pds ->-
      redArray (hAdd // hIB,
                fAdd // fIB,
                cInsert,
                cInsert,
                cc // cross d,
                sep2,
                sep3) ->- unmark
[Annotation: Haskell level, circuit level]
33 Result (multiplication)
- Simple parameterised description of fast adaptive multiplier.
- Like Three Dimensional Method except that wire length, and not only gate delay, is taken into account in choosing which connections to make
- Promises to perform well (better than modified Dadda and TDM)
34 Result (multiplication)
- Adaptation to incoming delay profile can be arranged (clever circuits again).
- Can also easily adapt description to take account of limitations on cross-cell tracks (see paper)
- Much remains to be done (e.g. insertion of buffers, fine delay modelling, transistor sizing, other layouts, the rest of the multiplier...).
36 Result (general)
- Non-standard interpretation used after generation (as we have long done) and now also to guide synthesis.
- Circuit generators short and sweet and LOOK LIKE circuit descriptions. High degree of parameterisation. Application areas? Module generation for full custom / SoC / FPGA
- Ideas are completely compatible with Intel's IDV system (see talk by Greg Spirakis at this conference)
37 Result (general)
- Clever circuits are a good idiom. Can control choice of components, wiring and topology. Greatly increase expressive power of the connection patterns approach.
- Gives a way to allow non-functional properties to influence design (even early on)
- Vital as we move to deep sub-micron
- Separation of concerns becoming less and less possible
38 Formal Verification??
- Have verified small-sized versions of multipliers (Bjesse, Synopsys)
- Should verify generators (see Hunt's seminal work)
- Investigating generation of FOL for verification of Haskell programs (Cover project at Chalmers)
39 What next?
- Want to go the whole hog and generate layouts for high performance arithmetic circuits from Wired
- Need help with the formal verification of generators
- And it is time to return to refinement