Recent Developments in Theory and Implementation of Parallel Prefix Adders

1
Recent Developments in Theory and
Implementation of Parallel Prefix Adders
  • Neil Burgess
  • Division of Electronics
  • Cardiff School of Engineering
  • Cardiff University

2
Motivation
  • Parallel Prefix Adders (e.g. Kogge-Stone) mostly
    ignored for deep submicron VLSI
  • large fan-out points
  • wide wiring channels
  • Recent insights can remove both and do...
  • absolute difference
  • late increment
  • media processing

3
Structure of Presentation
  • Parallel Prefix Adder theory
  • Kogge-Stone, Ladner-Fisher
  • New log-depth prefix trees
  • Knowles family of adders
  • New applications of prefix adders
  • late operations, media adder

4
I. Parallel Prefix Adder theory
5
Prefix adder structure
6
Prefix Equations - 1
  • $g(i) = a(i) \wedge b(i)$: carry generate
  • $p(i) = a(i) \oplus b(i)$: carry propagate
  • $k(i) = \neg a(i) \wedge \neg b(i)$: carry kill
  • g(i), p(i), k(i) are mutually exclusive
  • Use any two: $\neg g(i)$ and $k(i)$ (NAND and NOR)
  • p(i) needed as well: $s(i) = p(i) \oplus c(i)$

7
Prefix Equations - 2
  • Generate and Not Kill signals are combined to
    form Group Signals
  • $G_x^z$, $\neg K_x^z$: interpretation
  • 0, 0: $c(x+1) = 0$
  • 0, 1: $c(x+1) = c(z)$
  • 1, 0: Don't care
  • 1, 1: $c(x+1) = 1$

8
Prefix Equations - Interpretation
  • Group signals yield carry signals
  • Tree outputs: $c(i+1) = G_i^0$
  • Tree inputs: $G_i^i = g(i)$, $\neg K_i^i = \neg k(i)$
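
The relations on the last three slides can be sketched in Python. This is a minimal sketch, not the presenter's implementation: the combining rule in `combine` (G = G_hi OR (¬K_hi AND G_lo), ¬K = ¬K_hi AND ¬K_lo) is the usual prefix operator and is an assumption here, since the transcript does not spell it out. All function names are illustrative. The group signals are accumulated serially for clarity; a prefix tree would compute the same values in logarithmic depth.

```python
def bits(x, w):
    """Little-endian list of the w low bits of x."""
    return [(x >> i) & 1 for i in range(w)]


def combine(hi, lo):
    """Assumed prefix operator on (G, notK) pairs; hi is the more significant group."""
    g_hi, nk_hi = hi
    g_lo, nk_lo = lo
    return (g_hi | (nk_hi & g_lo), nk_hi & nk_lo)


def add_via_group_signals(a, b, w):
    """Add two w-bit numbers using c(i+1) = G_i^0 and s(i) = p(i) XOR c(i)."""
    A, B = bits(a, w), bits(b, w)
    g = [x & y for x, y in zip(A, B)]    # carry generate
    nk = [x | y for x, y in zip(A, B)]   # NOT carry kill
    p = [x ^ y for x, y in zip(A, B)]    # carry propagate
    carries = [0]                        # c(0) = 0: no carry-in
    group = None
    for i in range(w):
        leaf = (g[i], nk[i])             # tree input (G_i^i, notK_i^i)
        group = leaf if group is None else combine(leaf, group)
        carries.append(group[0])         # c(i+1) = G_i^0
    s = sum((p[i] ^ carries[i]) << i for i in range(w))
    return s, carries[w]                 # w-bit sum and carry-out
```

For example, `add_via_group_signals(9, 8, 4)` returns the 4-bit sum together with the carry-out c(4).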

9
Prefix Equations - characteristics
  • Associative
  • sub-terms may be pre-computed in parallel

10
Prefix equations - characteristics
  • Idempotent
  • sub-terms may be overlapped
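
Both characteristics can be checked exhaustively. The sketch below assumes the usual prefix operator on (G, ¬K) pairs, which the transcript does not state explicitly, and uses illustrative names. It tests every combination of values, including the don't-care pair (1, 0).

```python
from itertools import product


def combine(hi, lo):
    """Assumed prefix operator on (G, notK) pairs; hi is the more significant group."""
    g_hi, nk_hi = hi
    g_lo, nk_lo = lo
    return (g_hi | (nk_hi & g_lo), nk_hi & nk_lo)


PAIRS = list(product((0, 1), repeat=2))  # every (G, notK) value


def is_associative():
    """(x . y) . z == x . (y . z), so sub-terms may be pre-computed in parallel."""
    return all(combine(combine(x, y), z) == combine(x, combine(y, z))
               for x, y, z in product(PAIRS, repeat=3))


def is_idempotent():
    """x . x == x for every pair."""
    return all(combine(x, x) == x for x in PAIRS)


def overlap_is_harmless():
    """(x . y) . (y . z) == x . y . z, so the shared sub-term y may appear twice."""
    return all(combine(combine(x, y), combine(y, z)) == combine(x, combine(y, z))
               for x, y, z in product(PAIRS, repeat=3))
```

The third check is the property that lets two group signals share bits when they are combined.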

[Figure: prefix trees over inputs g(0),k(0), g(1),k(1), g(2),k(2), built from GK cells and producing c(1), c(2), c(3)]

11
4-bit Ladner-Fisher prefix tree
  • 1 sub-term pre-computed
  • Logarithmic depth
  • Fan-out 2 in 2nd row (laterally)

12
8-bit Ladner-Fisher prefix tree
  • Log depth; lateral fan-out 4 in 3rd row
  • No exploitation of idempotency

13
16-bit Ladner-Fisher prefix tree
  • Log depth with large fan-out in final row
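
A minimal sketch of a tree with the fan-out pattern described on these slides (lateral fan-out doubling on each row). The prefix operator in `combine` is the usual one and is assumed, as are all names. The function also reports the cells and the lateral fan-out per row, so the counts can be compared with the figures quoted elsewhere in the slides (12 cells for 8 bits, 32 cells for 16 bits).

```python
def combine(hi, lo):
    """Assumed prefix operator on (G, notK) pairs; hi is the more significant group."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def ladner_fisher(g, nk):
    """Carries for a power-of-two width, plus cells and lateral fan-out per row."""
    w = len(g)
    node = list(zip(g, nk))                   # node[i]: group ending at bit i
    cells, fanouts = [], []
    half = 1
    while half < w:
        new = node[:]
        count = 0
        for i in range(w):
            if (i // half) % 2 == 1:          # cell sits in the upper half of its block
                src = (i // half) * half - 1  # top of the lower half feeds the whole upper half
                new[i] = combine(node[i], node[src])
                count += 1
        node = new
        cells.append(count)
        fanouts.append(half)                  # each source drives `half` cells in this row
        half *= 2
    carries = [0] + [G for G, _ in node]      # c(0) = 0, c(i+1) = G_i^0
    return carries, cells, fanouts


def lf_add(a, b, w):
    """Full (w+1)-bit sum of two w-bit numbers using the tree above."""
    A = [(a >> i) & 1 for i in range(w)]
    B = [(b >> i) & 1 for i in range(w)]
    g = [x & y for x, y in zip(A, B)]
    nk = [x | y for x, y in zip(A, B)]
    carries, _, _ = ladner_fisher(g, nk)
    low = sum(((A[i] ^ B[i]) ^ carries[i]) << i for i in range(w))
    return low + (carries[w] << w)
```

The per-row fan-out list grows as 1, 2, 4, 8 for 16 bits, which is the buffering problem the slides point to.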

14
4-bit Kogge-Stone prefix graph
  • Fan-out 1 (laterally)
  • 1 extra cell
  • parallel wires in 2nd row

15
8-bit Kogge-Stone prefix graph
  • More cells and wiring than Ladner-Fisher

16
16-bit Kogge-Stone prefix graph
  • Low fan-out but wider wiring channels
  • No exploitation of idempotency
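
A minimal Kogge-Stone sketch for comparison, again assuming the usual prefix operator and using illustrative names. Every cell on a row reads a source a fixed distance away (1, 2, 4, ...), so lateral fan-out stays at 1 while the cell count rises to the 17 (8-bit) and 49 (16-bit) quoted later in the slides.

```python
def combine(hi, lo):
    """Assumed prefix operator on (G, notK) pairs; hi is the more significant group."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def kogge_stone(g, nk):
    """Carries for a power-of-two width, plus the number of cells on each row."""
    w = len(g)
    node = list(zip(g, nk))               # node[i]: group ending at bit i
    cells = []
    dist = 1
    while dist < w:
        new = node[:]
        for i in range(dist, w):          # every bit with a neighbour `dist` away gets a cell
            new[i] = combine(node[i], node[i - dist])
        cells.append(w - dist)
        node = new
        dist *= 2                         # lateral wire length doubles per row
    carries = [0] + [G for G, _ in node]  # c(0) = 0, c(i+1) = G_i^0
    return carries, cells


def ks_add(a, b, w):
    """Full (w+1)-bit sum of two w-bit numbers using the graph above."""
    A = [(a >> i) & 1 for i in range(w)]
    B = [(b >> i) & 1 for i in range(w)]
    g = [x & y for x, y in zip(A, B)]
    nk = [x | y for x, y in zip(A, B)]
    carries, _ = kogge_stone(g, nk)
    low = sum(((A[i] ^ B[i]) ^ carries[i]) << i for i in range(w))
    return low + (carries[w] << w)
```

In this graph the two groups combined by a cell are always adjacent and never overlap, which is why idempotency goes unused.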

17
Black cells and grey cells
  • Carries, $c(i) = G_{i-1}^0$; $K_{i-1}^0$ terms not needed
  • G-only cells are called, and coloured, grey

18
The story so far
  • Parallel prefix adders available in VLSI
  • Log-depth adders possible
  • high fan-outs (1,2,4,8): low cell count
  • low fan-outs (1,1,1,1): high cell count
  • Problematic in VLSI (buffering, area)
  • Idempotency of the prefix operator not exploited

19
II. Knowles Family of Adders
20
Log-depth prefix trees
  • In VLSI
  • L-F trees require too much buffering → delay
  • K-S trees require too much area (wire flux)
  • Fan-outs characterised as
  • 1,2,4,8 Ladner-Fisher
  • 1,1,1,1 Kogge-Stone

21
Knowles insight
  • Use other fan-out schemes
  • 5 possible 8-bit log-depth prefix trees
  • 1,1,1: 17 cells (Kogge-Stone)
  • 1,1,2: 17 cells (uses idempotency)
  • 1,1,4: 14 cells (no idempotency)
  • 1,2,2: 14 cells (no idempotency)
  • 1,2,4: 12 cells (Ladner-Fisher)

22
Knowles 8-bit prefix trees
  • All trees are log-depth

23
Tree construction rules
  • Levels are labelled 0,1,2...
  • Fan-out at jth level, $2^k$, satisfies $2^k \le 2^j$
  • Fan-out at jth level $\le$ fan-out at (j+1)th level
  • Lateral wire length at jth level is $2^j$
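
The construction rules can be turned into a short enumeration. The sketch below (function name illustrative) lists every fan-out sequence that is a power of two no larger than $2^j$ at level j and never decreases from one level to the next. For 8 bits it yields the five trees on the "Knowles insight" slide, and for 16 bits it yields fourteen, matching the fourteen entries on the 16-bit slides.

```python
from itertools import product


def knowles_fanouts(width):
    """All per-level fan-out tuples allowed by the rules, for a power-of-two width."""
    levels = width.bit_length() - 1  # log2(width)
    # Level j may use fan-out 2^k for any k with 2^k <= 2^j.
    choices = [[2 ** k for k in range(j + 1)] for j in range(levels)]
    return [seq for seq in product(*choices)
            if all(seq[j] <= seq[j + 1] for j in range(levels - 1))]


if __name__ == "__main__":
    for seq in knowles_fanouts(8):
        print(",".join(map(str, seq)))
```

The same function can be called with width 32 to list candidates such as 1,2,2,4,4 and 1,1,2,2,2.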

24
Knowles 16-bit trees - I
  • 1,1,1,1: 49 cells | 1,1,1,8: 42 cells
  • 1,1,1,2: 49 cells | 1,2,2,2: 42 cells
  • 1,1,1,4: 49 cells | 1,1,4,4: 40 cells
  • 1,1,2,2: 49 cells | 1,1,4,8: 36 cells
  • 1,1,2,4: 49 cells | 1,2,2,8: 36 cells
  • 1,1,2,8: 42 cells | 1,2,4,4: 36 cells
  • 1,2,2,4: 42 cells | 1,2,4,8: 32 cells

25
Knowles 16-bit trees - II
  • 1,1,1,1 | 1,1,1,8
  • 1,1,1,2 (Idempotent) | 1,2,2,2
  • 1,1,1,4 (Idempotent) | 1,1,4,4
  • 1,1,2,2 (Idempotent) | 1,1,4,8
  • 1,1,2,4 (Idempotent) | 1,2,2,8
  • 1,1,2,8 (Idempotent) | 1,2,4,4
  • 1,2,2,4 (Idempotent) | 1,2,4,8

26
Knowles 16-bit trees - III
  • 1,1,1,1 | 1,1,1,8 (R)
  • 1,1,1,2 (I) | 1,2,2,2 (R)
  • 1,1,1,4 (I) | 1,1,4,4 (R)
  • 1,1,2,2 (I) | 1,1,4,8 (R)
  • 1,1,2,4 (I) | 1,2,2,8 (R)
  • 1,1,2,8 (R, I) | 1,2,4,4 (R)
  • 1,2,2,4 (R, I) | 1,2,4,8 (R)

27
Quick way of spotting R, I
  • Define span(l) as distance from start of wire to
    first cell in lth level
  • $\mathrm{span}(l) = 2^l - \mathrm{fanout}(l) + 1$
  • tree characteristics
  • R if $\mathrm{span}(j) \ge \mathrm{span}(k)$ for $j < k$
  • I if span(i) span(j) span(k) for i < j < k

28
Examples of R and I spotting
  • fanout(l) → span(l): characteristic
  • 1,1,1,1 → 1,2,4,8: neither R nor I
  • 1,1,2,2 → 1,2,3,7: I only
  • 1,2,2,2 → 1,1,3,7: R only
  • 1,2,2,4 → 1,1,3,5: R and I
  • Are R and I adders best?
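
The span values in these examples follow from span(l) = 2^l - fanout(l) + 1, which the sketch below computes. The R test reads the comparison on the previous slide as "some earlier span is greater than or equal to a later one". That reading is an assumption about a symbol that is garbled in the transcript, although it does reproduce the R labels on all fourteen 16-bit trees. No test for I is given, because its defining relation cannot be recovered from the transcript. Function names are illustrative.

```python
def spans(fanouts):
    """span(l) = 2^l - fanout(l) + 1 for each level l."""
    return [2 ** l - f + 1 for l, f in enumerate(fanouts)]


def is_R(fanouts):
    """Assumed reading of the R rule: span(j) >= span(k) for some j < k."""
    s = spans(fanouts)
    return any(s[j] >= s[k]
               for j in range(len(s))
               for k in range(j + 1, len(s)))


if __name__ == "__main__":
    for f in [(1, 1, 1, 1), (1, 1, 2, 2), (1, 2, 2, 2), (1, 2, 2, 4)]:
        print(f, spans(f), "R" if is_R(f) else "not R")
```

The same formula gives 1,1,3,5,13 for the 32-bit 1,2,2,4,4 tree.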

29
VLSI design of prefix adders
  • Adders laid out as rectangular array of prefix
    cells (and gaps)
  • Assume cells measure 10 µm × 4 µm
  • 2 cells per significance → 20 µm / bit
  • Key design parameters
  • buffering (area and delay)
  • wiring channels (area)

30
16-bit adder example
  • Assumptions
  • Maximum fan-out without buffering
  • 3 cells and 80 µm wire (4 cell widths)
  • Maximum fan-out with buffering
  • 9 cells and 240 µm wire (12 cell widths)
  • Employ 1,2,2,4 architecture

31
1,2,2,4 prefix adder layout
32
Area vs Time for 32-bit adders
[Figure: plot with axes Area and Delay, showing K-S 1,1,1,1,1; 1,1,2,2,2; 1,2,2,4,4 (spans 1,1,3,5,13); and L-F 1,2,4,8,16]

33
32-bit prefix tree adders
  • Exploitable trade-off between adder delay and
    area
  • Kogge-Stone adder 16% faster than Ladner-Fisher
    but 66% larger
  • 1,2,2,4,4 adder 8% faster than Ladner-Fisher
    but only 3% larger
  • buffering also trades off speed for area

34
III. New applications of prefix adders
35
Other addition operations
  • Late increment
  • Mod $2^w - 1$ addition for Reed-Solomon coding
  • floating-point rounding
  • Late complement
  • absolute difference for video motion estimation
  • sign-magnitude addition
  • Typically use 2 adders and a MUX

36
Increments in prefix trees
  • Row of prefix cells = late +1 operation
  • Ladner-Fisher comprises many late +1s
  • 1 × 8-bit, 2 × 4-bit, 4 × 2-bit, 8 × 1-bit

37
Late increment tree
  • Adder returns A + B if inc = 0
  • Adder returns A + B + 1 if inc = 1

38
Late increment logic
  • Late Carry lc(i) set high if
  • $c(i) = 1$, or
  • $inc = 1$ and $(a(n), b(n)) \ne (0,0)\ \forall n : 0 \le n < i$
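
The late-carry condition can be sketched as lc(i) = c(i) OR (inc AND ¬K over bits 0..i-1), since "(a(n), b(n)) ≠ (0,0) for all n below i" means that no lower bit kills the carry. The sketch below is illustrative rather than the presenter's circuit. It accumulates the group signals serially, assumes the usual prefix recurrence, and treats the empty group below bit 0 as (G, ¬K) = (0, 1).

```python
def late_increment_add(a, b, inc, w):
    """Return (a + b + inc) mod 2^w using late carries lc(i)."""
    A = [(a >> i) & 1 for i in range(w)]
    B = [(b >> i) & 1 for i in range(w)]
    g = [x & y for x, y in zip(A, B)]    # carry generate
    nk = [x | y for x, y in zip(A, B)]   # NOT carry kill
    p = [x ^ y for x, y in zip(A, B)]    # carry propagate
    G, NK = 0, 1                         # assumed value for the empty group below bit 0
    total = 0
    for i in range(w):
        c = G                            # c(i) = G_{i-1}^0, the ordinary carry
        lc = c | (inc & NK)              # late carry: ordinary carry, or inc rippling through
        total |= (p[i] ^ lc) << i        # s(i) = p(i) XOR lc(i)
        G, NK = g[i] | (nk[i] & G), nk[i] & NK
    return total
```

With `inc = 0` this reduces to the ordinary sum, so one tree serves both results.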

[Figure: late-carry logic combining $c(i) = G_{i-1}^0$, inc and $\neg K_{i-1}^0$ to form lc(i), with s(i) formed from p(i)]

39
Late complement theory
  • In 2's-complement, $\neg N = -(N+1)$
  • $A + \neg B = A - B - 1$
  • late increment then yields $A - B$
  • $\neg(A + \neg B) = -(A - B - 1 + 1) = B - A$
  • Absolute difference readily available

40
Absolute difference logic
  • If $c(w) = 0$, result negative
  • if $c(w) = 0$, invert all the bits
  • else always perform late increment with $\neg K_{i-1}^0$
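
The selection rule above can be checked arithmetically. This sketch uses ordinary integer operations in place of the prefix tree, purely to illustrate the identities. The function name and the unsigned w-bit interpretation of A and B are assumptions.

```python
def absolute_difference(a, b, w):
    """|a - b| for unsigned w-bit a and b, via A + NOT B and the carry-out c(w)."""
    mask = (1 << w) - 1
    t = a + (~b & mask)       # A + NOT B, congruent to A - B - 1 mod 2^w
    carry_out = t >> w        # c(w)
    t &= mask
    if carry_out == 0:        # A - B - 1 is negative, i.e. A <= B
        return ~t & mask      # invert all the bits: B - A
    return (t + 1) & mask     # late increment: A - B
```

Both branches start from the same sum A + ¬B, which is why a single adder with late operations can replace the two-adders-and-a-MUX arrangement.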

41
Summary of late ops
  • Available on all prefix adders
  • Extra delay: 1 gate delay + buffering
  • Extra hardware: ≈ w black cells
  • This technique used in floating-point units
  • late increment for rounding
  • late complement for true subtraction

42
Media (packed) arithmetic
  • Fundamental strategy
  • Use full wordlength hardware for
  • multiple sub-wordlength computations
  • Examples
  • 32-bit adder → 4 × 8-bit adders
  • 32-bit multiplier → 2 × 16-bit multipliers

43
Partitioning an adder
  • Criteria
  • support carries propagating within sub-adders
  • prevent carries propagating between sub-adders
  • Solutions
  • put AND gates on carry chains → slower adder
  • put dummy 0s on operand bits → larger adder
  • Use prefix adder!!

44
Packed prefix adder - 1
  • Force $\neg k(n) = 0$ at partition points
  • prevents carries propagating across bit n
  • exploits don't care condition $(g, \neg k) = (1, 0)$
  • Implementation
  • change $\neg k(n)$ gate to (2,1) OR-AND gate
  • delay-neutral modification

45
Packed prefix adder - 2
  • Force $c(n) = G_{n-1}^0 = 0$ at partition points
  • prevents $c(n) \to s(n)$ errors
  • Implementation
  • insert AND gates (off critical path) or
  • change $G_{n-1}^0$ gate to (2,1,1) complex gate
  • BUT need $G_{n-1}^0$ signal for sub-adder overflows
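
The two forcing steps can be sketched together. The code below assumes that a partition point n is the least significant bit of each sub-adder above the first, and it accumulates the carries serially rather than through a tree. The function name and the `sub` parameter are illustrative. Forcing ¬k(n) = 0 makes the carry out of bit n equal to g(n) regardless of what arrives from below, and forcing c(n) = 0 keeps s(n) correct.

```python
def packed_add(a, b, w=32, sub=8):
    """Add a and b as independent sub-words of `sub` bits packed into w bits."""
    A = [(a >> i) & 1 for i in range(w)]
    B = [(b >> i) & 1 for i in range(w)]
    g = [x & y for x, y in zip(A, B)]   # carry generate
    nk = [x | y for x, y in zip(A, B)]  # NOT carry kill
    p = [x ^ y for x, y in zip(A, B)]   # carry propagate (left untouched)
    points = set(range(sub, w, sub))    # assumed partition points: LSB of each upper sub-adder
    for n in points:
        nk[n] = 0                       # force NOT k(n) = 0: nothing propagates across bit n
    G = 0                               # running G_{i-1}^0
    total = 0
    for i in range(w):
        c = 0 if i in points else G     # force c(n) = 0 so s(n) ignores the lower sub-adder
        total |= (p[i] ^ c) << i
        G = g[i] | (nk[i] & G)          # at a partition point this collapses to g(n)
    return total
```

For example, `packed_add(0x01FF80FF, 0x01010180)` adds four 8-bit lanes independently, while `sub=32` leaves no partition points and gives an ordinary 32-bit addition.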

46
Packed prefix adder - 3
  • Sub-adder carries complete early
  • Extraneous cells automatically do nothing

47
Last Slide
  • Recent developments in prefix adders
  • new family of log-depth trees
  • late operations
  • packed arithmetic for media processing
  • Future possibilities
  • systematic exploitation of idempotency
  • trees with reduced buffering
  • combine packed arithmetic/late ops

48
ANY QUESTIONS OR COMMENTS?