Ch. 8: Design of Fast Arithmetic Units



Provided by: sungg


1
Ch. 8 Design of Fast Arithmetic Units

Fast addition
  Carry lookahead adder
  Carry save adder
Fast multiplication
  Combinational and sequential multipliers
  Multiplier recoding (Booth, modified-Booth, etc.)
Pipelining
  Basic concepts and methods for pipelining
  Pipelined multiply-accumulate unit example
  Verilog implementation

2
Basic Adder Design

Logic equations for a 1-bit full adder
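$\mathit{sum} = \bar{a}\,\bar{b}\,\mathit{cin} + \bar{a}\,b\,\overline{\mathit{cin}} + a\,\bar{b}\,\overline{\mathit{cin}} + a\,b\,\mathit{cin} = a \oplus b \oplus \mathit{cin}$
$\mathit{cout} = a\,b + a\,\mathit{cin} + b\,\mathit{cin}$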
Implementation
Figure 8.1 (p. 301)
(a) One 3-input EX-OR gate for the sum and three
AND gates and an OR gate for the carry-out (cout)
(b) A ripple-carry adder design used to implement
a multiple-bit adder
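
The two logic equations and the ripple-carry structure can be checked with a small software model. This is a minimal sketch in Python, not the book's Verilog; the function names (`full_adder`, `ripple_carry_add`, `to_bits`, `from_bits`) are hypothetical helpers chosen for illustration, and bit lists are assumed to be stored LSB first.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, cout) for the bits a, b, cin."""
    s = a ^ b ^ cin                          # one 3-input EX-OR
    cout = (a & b) | (a & cin) | (b & cin)   # three ANDs feeding one OR
    return s, cout

def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (index 0 = LSB); returns (sum_bits, cout)."""
    sum_bits, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # carry ripples to the next stage
        sum_bits.append(s)
    return sum_bits, carry

def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

s, c = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s), c)  # 11 + 6 = 17: 4-bit sum 1 with carry-out 1
```

Each stage must wait for the carry from the stage below it, which is why the delay of this design grows linearly with the number of bits.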

3
Carry Lookahead Adder

Carry Lookahead
Carry chain of a ripple-carry adder is broken
by computing the carry-in input of a full adder
module directly from the primary inputs
Uses the concepts of a generate term (g) and a
propagate term (p)
$g_i = a_i b_i$
if $g_i = 1$, carry is generated independent of cin
$p_i = a_i + b_i$
if $p_i = 1$, cin is propagated to the output carry

4
Carry-in input for the ith full adder is computed
directly from the primary inputs ($a_i$, $b_i$, cin),
or more accurately, from ($g_i$, $p_i$, cin)
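
As an illustration of computing every carry from (g, p, cin) alone, the sketch below expands the standard recurrence $c_{i+1} = g_i + p_i c_i$ into a flat sum of products, so no carry depends on another carry. It is a Python model under stated assumptions: the OR form of the propagate term is assumed, and `cla_carries` and `cla_add` are hypothetical names.

```python
def cla_carries(a_bits, b_bits, cin):
    """Return carries c[0..n] (LSB first), each computed only from g, p and cin."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate terms
    p = [a | b for a, b in zip(a_bits, b_bits)]   # propagate terms (OR form assumed)
    carries = [cin]
    for i in range(len(g)):
        c = 0
        for j in range(i + 1):            # product term g[j] * p[j+1] * ... * p[i]
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            c |= term
        term = cin                        # product term cin * p[0] * ... * p[i]
        for k in range(i + 1):
            term &= p[k]
        carries.append(c | term)
    return carries

def cla_add(a, b, n, cin=0):
    """n-bit addition using the lookahead carries; returns (sum, cout)."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    c = cla_carries(a_bits, b_bits, cin)
    total = sum((a_bits[i] ^ b_bits[i] ^ c[i]) << i for i in range(n))
    return total, c[n]

print(cla_add(11, 6, 4))  # (1, 1): 17 as a 4-bit sum plus carry-out
```

The widest product term for the final carry ANDs cin with all n propagate terms, which is where the large gate fan-in of a direct implementation comes from.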
5
CLA Implementation

A direct implementation of the carry lookahead
logic equations requires AND and OR gates with up
to (n+1) inputs for an n-bit CLA
This type of hardware implementation is not
practical (for large n) for most types of
technologies used to create logic gate circuits
Alternative designs
A circuit with k m-bit CLA stages (m = n/k) with
ripples (of the carry bit) between adjacent
stages
A hierarchical circuit with k m-bit CLA segments
and a tree of carry lookahead generation logic
blocks used to generate the carries into each CLA
segment
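
The first alternative (k m-bit CLA stages with a carry rippling between them) can be sketched as follows. This is a Python model under assumptions: `cla_stage` and `block_cla_add` are hypothetical names, n is assumed to be a multiple of m, and inside a stage the loop evaluates the carry recurrence that hardware would flatten into lookahead terms.

```python
def cla_stage(a, b, cin, m):
    """One m-bit CLA stage on integers a, b < 2**m; returns (sum, cout)."""
    carry, total = cin, 0
    for i in range(m):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, p = ai & bi, ai | bi
        total |= (ai ^ bi ^ carry) << i
        carry = g | (p & carry)   # hardware flattens this into lookahead terms
    return total, carry

def block_cla_add(a, b, n, m, cin=0):
    """n-bit adder built from k = n // m stages; only stage carries ripple."""
    assert n % m == 0, "sketch assumes n is a multiple of m"
    mask, total, carry = (1 << m) - 1, 0, cin
    for stage in range(n // m):
        shift = stage * m
        s, carry = cla_stage((a >> shift) & mask, (b >> shift) & mask, carry, m)
        total |= s << shift
    return total, carry

print(block_cla_add(0xBEEF, 0x1234, n=16, m=4))  # (53539, 0), i.e. 0xD123
```

With m = 4 and n = 16, the carry ripples through only 4 stage boundaries instead of 16 bit positions, while each stage needs gates with at most m + 1 inputs.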

6
Carry-Save Adder (CSA)

Carry-save adder (CSA) structure is useful for
applications requiring a large number of
additions
Three n-bit numbers can be added to produce two
n-bit results
$sum_i$ values result in one n-bit result
$cout_i$ values result in the second n-bit result
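
A carry-save adder is a row of independent full adders, so it can be modelled with bitwise operations on whole words. The sketch below is a minimal Python illustration; `carry_save_add` is a hypothetical name, and the carry word is returned unshifted, so it carries one bit position more weight than the sum word.

```python
def carry_save_add(x, y, z):
    """Reduce three numbers to two: x + y + z == sum_vec + (carry_vec << 1)."""
    sum_vec = x ^ y ^ z                        # sum_i of every full adder
    carry_vec = (x & y) | (x & z) | (y & z)    # cout_i of every full adder
    return sum_vec, carry_vec

s, c = carry_save_add(5, 6, 7)
print(s, c, s + (c << 1))  # 4 7 18
```

No carry moves sideways inside the CSA, so its delay is one full-adder delay regardless of n; a conventional adder is only needed once, to combine the two final words.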

7
Multiplication

Multiplication of an n-bit multiplier A with
an n-bit multiplicand B produces a 2n-bit product
P
$A = a_{n-1}\, a_{n-2} \ldots a_1\, a_0$
$B = b_{n-1}\, b_{n-2} \ldots b_1\, b_0$

$a_0 B$
$+\ 2^1 a_1 B$
. . .
$+\ 2^{n-2} a_{n-2} B$
$+\ 2^{n-1} a_{n-1} B$ (for 2's complement: $-\ 2^{n-1} a_{n-1} B$)
= 2n-bit product result
For multiplication of 2's complement signed
numbers, this final operation is a subtraction
instead of an addition. Why?
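
One way to see why the last step becomes a subtraction: in n-bit 2's complement, the most significant bit $a_{n-1}$ has weight $-2^{n-1}$, so its partial product enters the sum with a minus sign. The Python sketch below sums the partial products both ways; `multiply_unsigned` and `multiply_signed` are hypothetical names, and Python's unbounded integers stand in for the 2n-bit result register.

```python
def multiply_unsigned(a, b, n):
    """Sum of the n partial products 2**i * a_i * B for unsigned a, b."""
    return sum(((a >> i) & 1) * (b << i) for i in range(n))

def multiply_signed(a, b, n):
    """Same sum for 2's complement a, b in [-2**(n-1), 2**(n-1) - 1]."""
    a_bits = a & ((1 << n) - 1)            # n-bit 2's complement pattern of a
    total = 0
    for i in range(n - 1):                 # bits 0 .. n-2 carry positive weight
        total += ((a_bits >> i) & 1) * (b << i)
    # the sign bit carries weight -2**(n-1): subtract its partial product
    total -= ((a_bits >> (n - 1)) & 1) * (b << (n - 1))
    return total

print(multiply_unsigned(13, 11, 4))  # 143
print(multiply_signed(-3, 5, 4))     # -15
```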
8
Combinational Multiplier

Also referred to as a parallel multiplier
Implement multiplication using a 2-dimensional
array of 1-bit full adders
Basic structure is the same as the addition array
shown on the previous page
Can also be implemented using a linear array of
carry-save adders as shown in Figure 8.3 (p. 308)
Each partial product $P_i = 2^i a_i B$
At first CSA level, add three partial products
At all subsequent CSA levels, add one more
partial product
A CLA (or other type of 2:1, i.e., 2-input/1-output,
adder) is required after the final CSA level
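
The linear CSA arrangement described above can be sketched in a few lines. This is a Python model, not the circuit of Figure 8.3: `csa` and `csa_array_multiply` are hypothetical names, operands are assumed unsigned with n >= 3, and the carry word is shifted left inside `csa` so that the two outputs can be added directly.

```python
def csa(x, y, z):
    """Carry-save level: returns (sum_vec, carry_vec) with x + y + z == sum + carry."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1

def csa_array_multiply(a, b, n):
    """Unsigned n-bit multiply using a linear array of carry-save levels."""
    assert n >= 3, "sketch assumes at least three partial products"
    pp = [((a >> i) & 1) * (b << i) for i in range(n)]   # P_i = 2**i * a_i * B
    s, c = csa(pp[0], pp[1], pp[2])     # first level: three partial products
    for p in pp[3:]:                    # each later level: one more partial product
        s, c = csa(s, c, p)
    return s + c                        # final 2:1 adder (e.g. a CLA)

print(csa_array_multiply(13, 11, 4))  # 143
```

For n partial products this uses n - 2 carry-save levels, followed by the single carry-propagating addition at the end.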

9
Sequential Multiplier

Compute the sums as a sequence of separate steps
Main benefit: only requires one 2:1 adder
Much lower hardware cost for large n
Use a register to store the partial products
Use two registers to store the multiplier and
multiplicand
Requires a state machine (with the corresponding
control logic) to control the sequence of
additions used
Block diagram shown in Figure 8.4 (p. 309)
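
A shift-and-add loop is one common way to realize this sequence of steps. The Python sketch below is an assumption about the general scheme, not a transcription of Figure 8.4: the variable names mirror the three registers, one loop iteration stands for one state-machine step, and operands are assumed unsigned.

```python
def sequential_multiply(a, b, n):
    """Unsigned n-bit multiply in n steps using one adder; returns (product, steps)."""
    multiplier, multiplicand, product = a, b, 0   # the three registers
    steps = 0
    for _ in range(n):                 # one state-machine step per multiplier bit
        if multiplier & 1:
            product += multiplicand    # the single 2:1 adder
        multiplier >>= 1               # expose the next multiplier bit
        multiplicand <<= 1             # weight the next partial product by 2
        steps += 1
    return product, steps

print(sequential_multiply(13, 11, 4))  # (143, 4)
```

The hardware saving is paid for in time: the result takes n steps instead of one pass through a combinational array.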

10
Fast Multiplication

Many different types of methods possible
Most methods are based on recoding
The multiplier bits are recoded (expressed using
a different notational syntax) in order to reduce
the total number of adds/subtracts required
Note: addition of the partial product $P_i = 2^i a_i B$
is not required if $a_i = 0$
Booth recoding
Given a string of 1-bits in the multiplier A,
addition of $P_i$ is not required for a bit i in the
middle of the 1-bit string

11
Booth Recoding

Ex: A = 01110 implies that $P_3$, $P_2$, and $P_1$ have to
be added
However, $P_4 - P_1 = P_3 + P_2 + P_1$ since $(2^4 - 2^1) B$
$= (16 - 2) B = (2^3 + 2^2 + 2^1) B$
$A' = 1\,0\,0\,\bar{1}\,0$ is the Booth-recoded multiplier
1 denotes an add and $\bar{1}$ denotes a subtract
Also referred to as differential recoding since
A' is the 1-dimensional differentiation of A
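
Booth recoding can be sketched as taking the difference of neighbouring multiplier bits: digit i of the recoded multiplier is $a_{i-1} - a_i$, with $a_{-1} = 0$. The Python below is a minimal illustration; `booth_recode` and `booth_multiply` are hypothetical names, digits are stored LSB first, and -1 stands for the subtract digit.

```python
def booth_recode(a, n):
    """Return n Booth digits (LSB first), each in {-1, 0, 1}."""
    digits, prev = [], 0               # prev plays the role of a_{i-1}, a_{-1} = 0
    for i in range(n):
        bit = (a >> i) & 1
        digits.append(prev - bit)      # +1 at the end of a 1-string, -1 at its start
        prev = bit
    return digits

def booth_multiply(a, b, n):
    """Multiply using only the nonzero recoded digits (a taken as n-bit 2's complement)."""
    return sum(d * (b << i) for i, d in enumerate(booth_recode(a, n)) if d != 0)

digits = booth_recode(0b01110, 5)
print(digits[::-1])                    # MSB first: [1, 0, 0, -1, 0]
print(booth_multiply(0b01110, 3, 5))   # 14 * 3 = 42
print(sum(d != 0 for d in booth_recode(0b01010101, 8)))  # 8 adds/subtracts
```

The last line shows the weakness of plain Booth recoding: for an alternating multiplier such as 01010101, every digit becomes nonzero, giving 8 operations where the unrecoded multiplier needs only 4 additions.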
12
Modified-Booth Recoding

Given A = 01010101, $A' = 1\,\bar{1}\,1\,\bar{1}\,1\,\bar{1}\,1\,\bar{1}$
More adds/subtracts for A' than A!
Identify isolated 1s and 0s as well as strings
of 1s and strings of 0s
Table 8.1 (p. 311) shows a modified-Booth
recoding table and Table 8.2 shows the
corresponding add/subtract operations
Other recoding methods also possible

13
Multiply-Accumulate

Multiply-Accumulate (MAC) is a common operation
in digital signal processing applications
$D = A + B \times C$
$A = A + B \times C$
To implement a MAC, we simply have to perform one
more addition on top of the additions of the
partial products
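
The "one more addition" can be pictured as feeding the accumulated value in as one extra operand next to the partial products. The Python sketch below assumes the accumulate form, in which a running value is updated with each new product; `mac` and `dot_product` are hypothetical names and operands are assumed unsigned n-bit values.

```python
def mac(acc, b, c, n):
    """Return acc + b * c by summing the n partial products of b * c plus acc."""
    operands = [((c >> i) & 1) * (b << i) for i in range(n)]  # partial products
    operands.append(acc)               # the one extra operand for the accumulate
    total = 0
    for op in operands:
        total += op
    return total

def dot_product(xs, ys, n):
    """Typical DSP use: accumulate a sum of products with repeated MACs."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y, n)
    return acc

print(dot_product([1, 3, 5], [2, 4, 6], 4))  # 2 + 12 + 30 = 44
```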

14
Pipelining

A common method used to speed up digital logic
hardware
Works well if data can be worked on a little
bit at a time, and there is a lot of data
Types of pipelines
Instruction Pipelines
Used to speed up fetch-execute operation of CPUs
Arithmetic Pipelines
Used to speed up arithmetic operations
E.g., the vector operation A B C

15
Structure of a Simple Pipeline
16

Speedup
Speedup(pipeline) = Time(no pipeline) / Time(pipeline)
Space-Time Diagram
Efficiency
Ratio of numbered blocks to total number of
space-time blocks
What is the efficiency of an ideal m-stage
pipeline operating on N data items?
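
The question above can be worked through with the usual idealized model, sketched here in Python. The assumptions are loud ones: every stage takes exactly one clock cycle, there are no stalls, and the non-pipelined unit needs m cycles per item; `pipeline_metrics` is a hypothetical name. Under those assumptions the pipeline needs m + N - 1 cycles, and of the m(m + N - 1) space-time blocks, N·m are occupied.

```python
from fractions import Fraction

def pipeline_metrics(m, N):
    """Ideal m-stage pipeline on N items: returns (cycles, speedup, efficiency)."""
    cycles_pipelined = m + N - 1          # m cycles to fill, then one result per cycle
    cycles_unpipelined = m * N            # each item passes through all m stages alone
    speedup = Fraction(cycles_unpipelined, cycles_pipelined)
    busy_blocks = N * m                   # each item occupies each stage once
    total_blocks = m * cycles_pipelined   # every stage exists for every cycle
    efficiency = Fraction(busy_blocks, total_blocks)
    return cycles_pipelined, speedup, efficiency

print(pipeline_metrics(4, 100))  # (103, Fraction(400, 103), Fraction(100, 103))
```

In this model the efficiency is N / (m + N - 1), which approaches 1 (and the speedup approaches m) as N grows much larger than m.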

17
Example Pipeline Structure
Linear Pipeline Structure for Floating-point Multiplication
18
Example Timing Diagram
After 4 clock cycles, there is one output result
every clock cycle
19
Pipelined MAC Unit Design

Code: refer to pp. 323-328
Test Bench: pp. 329-332
Simulation Output: Figure 8.9 (p. 329)
20
Verilog Code for a Pipeline

Two major methods
Write one large always block and describe each
pipeline segment in a functional manner
(perhaps using multiple calls to subroutines)
Method used for pipelined MAC
Write one always block for each segment
Be careful of interactions between always
blocks
All always blocks execute concurrently
In order to transfer data from one segment to the
next, we MUST use pipeline registers
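
The role of the pipeline registers can be illustrated with a small software model. This is a Python sketch, not Verilog and not the book's MAC design: it assumes a hypothetical 2-stage pipeline (stage 1 multiplies, stage 2 accumulates), and `run_mac_pipeline` is an invented name. Each loop iteration is one clock edge: all next values are computed from the current register contents and then committed together, which mimics concurrently executing segments that communicate only through registers.

```python
def run_mac_pipeline(pairs):
    """Simulate a 2-stage multiply-then-accumulate pipeline; returns (acc, trace)."""
    prod_reg = None                    # pipeline register between stage 1 and stage 2
    acc = 0                            # accumulator register at the output of stage 2
    trace = []
    stream = list(pairs) + [None]      # one extra cycle to drain the pipeline
    for item in stream:
        # Both stages read the CURRENT register values ...
        next_acc = acc + prod_reg if prod_reg is not None else acc   # stage 2
        next_prod = item[0] * item[1] if item is not None else None  # stage 1
        # ... and all registers update together at the clock edge.
        acc, prod_reg = next_acc, next_prod
        trace.append(acc)
    return acc, trace

print(run_mac_pipeline([(1, 2), (3, 4), (5, 6)]))  # (44, [0, 2, 14, 44])
```

The trace shows the one-cycle latency introduced by the register: the first product reaches the accumulator one clock after it is computed. If stage 2 read the freshly computed product instead of the registered one, the two segments would no longer be independent within a clock cycle.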