Ch. 8: Design of Fast Arithmetic Units - PowerPoint PPT Presentation

Description:

(b) A ripple-carry adder design used to implement a multiple-bit adder. 6/15/09 ' ... Carry chain of a ripple-carry adder is 'broken' by computing the carry-in input ... – PowerPoint PPT presentation

Slides: 21
Provided by: sungg
1
Ch. 8: Design of Fast Arithmetic Units
  • Fast addition
  • Carry lookahead adder
  • Carry save adder
  • Fast multiplication
  • Combinational and sequential multipliers
  • Multiplier recoding (Booth, modified-Booth, etc.)
  • Pipelining
  • Basic concepts and methods for pipelining
  • Pipelined multiply-accumulate unit example
  • Verilog implementation

2
Basic Adder Design
  • Logic equations for a 1-bit full adder
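  • $\text{sum} = \bar{a}\,\bar{b}\,c_{in} + \bar{a}\,b\,\bar{c}_{in} + a\,\bar{b}\,\bar{c}_{in} + a\,b\,c_{in} = a \oplus b \oplus c_{in}$
  • $c_{out} = a\,b + a\,c_{in} + b\,c_{in}$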
  • Implementation
  • Figure 8.1 (p. 301)
  • (a) One 3-input EX-OR gate for the sum and three
    AND gates and an OR gate for the carry-out (cout)
  • (b) A ripple-carry adder design used to implement
    a multiple-bit adder
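
The full-adder equations and the ripple-carry structure can be modelled at the bit level. The following is a minimal Python sketch, not the circuit from Figure 8.1; the function names and the LSB-first bit-list convention are assumptions made for illustration.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, cout) for bits a, b, cin."""
    s = a ^ b ^ cin                          # one 3-input EX-OR
    cout = (a & b) | (a & cin) | (b & cin)   # three ANDs feeding one OR
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (index 0 = LSB); returns (sum_bits, cout)."""
    sum_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # carry ripples into the next stage
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, n):
    """Unsigned integer -> list of n bits, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """List of bits, LSB first -> unsigned integer."""
    return sum(bit << i for i, bit in enumerate(bits))
```

For example, `ripple_carry_add(to_bits(11, 4), to_bits(6, 4))` yields sum bits that decode to 1 with a carry-out of 1, i.e. 17. Each stage has to wait for the carry of the stage before it, which is the delay the later slides try to remove.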

3
Carry Lookahead Adder
  • Carry Lookahead
  • Carry chain of a ripple-carry adder is broken
    by computing the carry-in input of a full adder
    module directly from the primary inputs
  • Uses the concepts of a generate term (g) and a
    propagate term (p)
  • $g_i = a_i \cdot b_i$
  • if $g_i = 1$, a carry is generated independent of $c_{in}$
  • $p_i = a_i + b_i$
  • if $p_i = 1$, $c_{in}$ is propagated to the output carry

4
Carry-in input for the ith full adder is computed
directly from the primary inputs ($a_i$, $b_i$, $c_{in}$),
or more accurately, from ($g_i$, $p_i$, $c_{in}$)
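
To make the lookahead idea concrete, here is a hedged Python sketch that builds every carry as a flat sum of products of generate and propagate terms, so no carry depends on a previously computed carry. It assumes the inclusive-OR form of the propagate term; the function names and the LSB-first bit lists are illustrative choices, not taken from the slides.

```python
def lookahead_carries(a_bits, b_bits, cin=0):
    """Return [c0, c1, ..., cn] for bit lists with index 0 = LSB."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate terms
    p = [a | b for a, b in zip(a_bits, b_bits)]   # propagate terms (OR form assumed)
    carries = [cin]
    for i in range(len(g)):
        # c[i+1] = g[i] + p[i]g[i-1] + ... + p[i]..p[1]g[0] + p[i]..p[0]cin
        c = 0
        for j in range(i, -1, -1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]                      # AND of g[j] with p[j+1..i]
            c |= term
        cin_term = cin
        for k in range(i + 1):
            cin_term &= p[k]                      # the widest AND: i+2 inputs
        carries.append(c | cin_term)
    return carries


def cla_add(a_bits, b_bits, cin=0):
    """Add two bit lists using the lookahead carries; returns (sum_bits, cout)."""
    carries = lookahead_carries(a_bits, b_bits, cin)
    sum_bits = [a ^ b ^ c for a, b, c in zip(a_bits, b_bits, carries)]
    return sum_bits, carries[-1]
```

The loops only enumerate product terms; in hardware each carry would be one two-level AND-OR network. The number of terms, and the width of the widest AND, grows with the bit position.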
5
CLA Implementation
  • A direct implementation of the carry lookahead
    logic equations requires AND and OR gates with up
    to (n+1) inputs for an n-bit CLA
  • This type of hardware implementation is not
    practical (for large n) for most types of
    technologies used to create logic gate circuits
  • Alternative designs
  • A circuit with k m-bit CLA stages (k = n/m) with
    the carry bit rippling between adjacent
    stages
  • A hierarchical circuit with k m-bit CLA segments
    and a tree of carry lookahead generation logic
    blocks used to generate the carries into each CLA
    segment

6
Carry-Save Adder (CSA)
  • Carry-save adder (CSA) structure is useful for
    applications requiring a large number of
    additions
  • Three n-bit numbers can be added to produce two
    n-bit results
  • The $\text{sum}_i$ values form one n-bit result
  • The $c_{out,i}$ values form the second n-bit result
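
As an illustration of the three-to-two reduction, the sketch below treats Python integers as bit vectors. The function name is hypothetical, and shifting the carry word left by one position is a modelling choice that gives each carry bit its weight of 2^(i+1).

```python
def carry_save_add(x, y, z):
    """Reduce three non-negative integers to a (sum_word, carry_word) pair."""
    sum_word = x ^ y ^ z                        # per-bit sum, no carry propagation
    carry_word = (x & y) | (x & z) | (y & z)    # per-bit carry-out
    return sum_word, carry_word << 1            # carry bit i has weight 2^(i+1)
```

For instance, `carry_save_add(5, 6, 7)` returns `(4, 14)`, and 4 + 14 = 18 = 5 + 6 + 7. Every bit position is handled independently, so the delay of one CSA level does not depend on n; a conventional adder is only needed once, to combine the final pair.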

7
Multiplication
  • Multiplication of an n-bit multiplier A with
    an n-bit multiplicand B produces a 2n-bit product
    P
  • $A = a_{n-1} a_{n-2} \ldots a_1 a_0$
  • $B = b_{n-1} b_{n-2} \ldots b_1 b_0$
  • The 2n-bit product is the sum of the shifted partial products:
    $P = a_0 B + 2^1 a_1 B + \cdots + 2^{n-2} a_{n-2} B + 2^{n-1} a_{n-1} B$
  • For multiplication of 2's complement signed
    numbers, the final operation is a subtraction
    instead of an addition ($-2^{n-1} a_{n-1} B$). Why?
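
One way to see why the last row is subtracted: in 2's complement the most significant bit carries a weight of $-2^{n-1}$, so $A = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$. The Python sketch below checks this numerically; the function name and the use of Python's unbounded integers in place of a 2n-bit datapath are assumptions for illustration.

```python
def signed_multiply(a, b, n):
    """Multiply n-bit two's complement a (multiplier) by integer b (multiplicand)."""
    assert -(1 << (n - 1)) <= a < (1 << (n - 1))
    a_bits = [(a >> i) & 1 for i in range(n)]    # two's complement bits, LSB first
    product = 0
    for i in range(n - 1):
        product += a_bits[i] * (b << i)           # add 2^i * a_i * B
    product -= a_bits[n - 1] * (b << (n - 1))     # sign bit has weight -2^(n-1)
    return product
```

With n = 4, `signed_multiply(-3, 5, 4)` sees the bit pattern 1101 for the multiplier, adds 5 and 20, then subtracts 40, giving -15.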
8
Combinational Multiplier
  • Also referred to as a parallel multiplier
  • Implement multiplication using a 2-dimensional
    array of 1-bit full adders
  • Basic structure is the same as the addition array
    shown on the previous page
  • Can also be implemented using a linear array of
    carry-save adders as shown in Figure 8.3 (p. 308)
  • Each partial product $P_i = 2^i a_i B$
  • At first CSA level, add three partial products
  • At all subsequent CSA levels, add one more
    partial product
  • A CLA (or other type of 2:1, i.e. 2-input/1-output,
    adder) is required after the final CSA level
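
The linear CSA arrangement can be sketched as follows. This is an illustrative Python model for unsigned operands, not a transcription of Figure 8.3; the names are hypothetical and the final `s + c` stands in for the CLA.

```python
def csa(x, y, z):
    """Carry-save adder on integers: three inputs in, (sum, carry) pair out."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def csa_array_multiply(a, b, n):
    """Multiply unsigned n-bit a by unsigned b with a linear chain of CSAs (n >= 3)."""
    partials = [((a >> i) & 1) * (b << i) for i in range(n)]   # P_i = 2^i * a_i * B
    s, c = csa(partials[0], partials[1], partials[2])           # first level: three P_i
    for p in partials[3:]:
        s, c = csa(s, c, p)                                     # later levels: one more P_i
    return s + c                                                # final 2:1 adder
```

An n-bit multiplier therefore passes through n - 2 CSA levels before the single carry-propagating addition at the end.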

9
Sequential Multiplier
  • Compute the sums as a sequence of separate steps
  • Main benefit: only requires one 2:1 adder
  • Much lower hardware cost for large n
  • Use a register to store the partial products
  • Use two registers to store the multiplier and
    multiplicand
  • Requires a state machine (with the corresponding
    control logic) to control the sequence of
    additions used
  • Block diagram shown in Figure 8.4 (p. 309)
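
A minimal Python sketch of the sequential scheme is shown below. The register arrangement (multiplier shifted right, multiplicand shifted left) is one common choice and is assumed here; it is not necessarily the organization in Figure 8.4. The loop plays the role of the state machine.

```python
def sequential_multiply(a, b, n):
    """Shift-and-add multiply of unsigned n-bit a and b, one addition per step."""
    multiplier = a          # register holding A, shifted right each step
    multiplicand = b        # register holding B, shifted left each step
    product = 0             # register accumulating the partial products
    trace = []
    for _ in range(n):      # one iteration = one state-machine step
        if multiplier & 1:
            product += multiplicand     # the single 2:1 adder
        multiplicand <<= 1
        multiplier >>= 1
        trace.append(product)
    return product, trace
```

For example, `sequential_multiply(13, 11, 4)` returns `(143, [11, 11, 55, 143])`: the trace shows the product register after each of the four steps.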

10
Fast Multiplication
  • Many different types of methods possible
  • Most methods are based on recoding
  • The multiplier bits are recoded (expressed using
    a different notational syntax) in order to reduce
    the total number of adds/subtracts required
  • Note: addition of the partial product $P_i = 2^i a_i B$
    is not required if $a_i = 0$
  • Booth recoding
  • Given a string of 1-bits in the multiplier A,
    addition of Pi is not required for a bit i in the
    middle of the 1-bit string

11
Booth Recoding
  • Ex.: $A = 01110$ implies that $P_3$, $P_2$, and $P_1$ have to
    be added
  • However, $P_4 - P_1 = P_3 + P_2 + P_1$ (writing $P_i$ for $2^i B$), since
    $(2^4 - 2^1) B = (16 - 2) B = (2^3 + 2^2 + 2^1) B$
  • $A' = 1\,0\,0\,\bar{1}\,0$ is the Booth-recoded multiplier
  • $1$ denotes an add and $\bar{1}$ denotes a subtract
  • Also referred to as differential recoding since
    $A'$ is the 1-dimensional differentiation of A
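
The recoding rule can be written as a difference of neighbouring bits, $d_i = a_{i-1} - a_i$ with $a_{-1} = 0$, which is why it is called differential. The sketch below is a hedged Python illustration; the function names are hypothetical, and digits are returned LSB first with -1 standing for the subtract digit.

```python
def booth_recode(a, n):
    """Booth-recode the n-bit multiplier a; returns digits in {-1, 0, 1}, LSB first."""
    digits = []
    prev = 0                                  # implicit a_(-1) = 0
    for i in range(n):
        bit = (a >> i) & 1
        digits.append(prev - bit)             # d_i = a_(i-1) - a_i
        prev = bit
    return digits


def booth_multiply(a, b, n):
    """Multiply using the recoded digits: +1 adds 2^i * B, -1 subtracts it."""
    return sum(d * (b << i) for i, d in enumerate(booth_recode(a, n)))
```

For the multiplier 01110, `booth_recode(0b01110, 5)` gives `[0, -1, 0, 0, 1]`, i.e. one subtract at bit 1 and one add at bit 4 instead of three adds. Note that this recoding reads the n-bit pattern as a 2's complement value, so a leading 1 is interpreted as a negative multiplier.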
12
Modified-Booth Recoding
  • Given $A = 01010101$, the Booth-recoded multiplier is
    $A' = 1\bar{1}1\bar{1}1\bar{1}1\bar{1}$
  • More adds/subtracts for $A'$ than for $A$!
  • Identify isolated 1s and 0s as well as strings
    of 1s and strings of 0s
  • Table 8.1 (p. 311) shows a modified-Booth
    recoding table and Table 8.2 shows the
    corresponding add/subtract operations
  • Other recoding methods also possible

13
Multiply-Accumulate
  • Multiply-Accumulate (MAC) is a common operation
    in digital signal processing applications
  • $D = A + B \times C$
  • $A = A + B \times C$ (accumulating form)
  • To implement a MAC, we simply perform one
    more addition on top of the additions of the
    partial products

14
Pipelining
  • A common method used to speed up digital logic
    hardware
  • Works well if data can be worked on a little
    bit at a time, and there is a lot of data
  • Types of pipelines
  • Instruction Pipelines
  • Used to speed up fetch-execute operation of CPUs
  • Arithmetic Pipelines
  • Used to speed up arithmetic operations
  • E.g., a vector operation on A, B, and C

15
Structure of a Simple Pipeline
16
  • Speedup
  • Speedup(pipeline) = Time(no pipeline) / Time(pipeline)
  • Space-Time Diagram
  • Efficiency
  • Ratio of numbered (i.e. occupied) blocks to the total
    number of space-time blocks
  • What is the efficiency of an ideal m-stage
    pipeline operating on N data items?

17
Example Pipeline Structure
Linear Pipeline Structure for Floating-point Multiplication
18
Example Timing Diagram
After 4 clock cycles, there is one output result
every clock cycle
19
Pipelined MAC Unit
  • Design code: refer to pp. 323-328
  • Test bench: pp. 329-332
  • Simulation output: Figure 8.9 (p. 329)
20
Verilog Code for a Pipeline
  • Two major methods
  • Write one large always block and describe each
    pipeline segment in a functional manner
    (perhaps using multiple calls to subroutines)
  • Method used for pipelined MAC
  • Write one always block for each segment
  • Be careful of interactions between always
    blocks
  • All always blocks execute concurrently
  • In order to transfer data from one segment to the
    next, we MUST use pipeline registers
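
The role of the pipeline registers can be illustrated with a small clock-by-clock simulation. The Python sketch below is a stand-in for Verilog, not the book's MAC code: the two-segment split (multiply, then add), the function name, and the register names are all assumptions. Every segment is evaluated from the old register values before any register is updated, which mimics concurrently executing always blocks with nonblocking assignments.

```python
def run_mac_pipeline(items):
    """Simulate a 2-segment pipeline computing a * b + c for each (a, b, c)."""
    reg1 = None                 # pipeline register between segment 1 and segment 2
    reg2 = None                 # output register of segment 2
    outputs = []
    stream = list(items) + [None]           # one extra cycle to drain the pipeline
    for item in stream:                     # one iteration = one clock edge
        # Evaluate every segment from the OLD register values first...
        next_reg1 = None if item is None else (item[0] * item[1], item[2])
        next_reg2 = None if reg1 is None else reg1[0] + reg1[1]
        # ...then update all registers together.
        reg1, reg2 = next_reg1, next_reg2
        if reg2 is not None:
            outputs.append(reg2)
    return outputs
```

For example, `run_mac_pipeline([(2, 3, 4), (5, 6, 7)])` returns `[10, 37]`. If segment 2 read the freshly written `next_reg1` instead of the old `reg1`, the two segments would collapse into one combinational step, which is the kind of interaction between always blocks the slide warns about.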