Chap. 4: Datapath Design

Slides: 49

Provided by: sungg

Transcript and Presenter's Notes

1
Chap. 4 Datapath Design

Discusses the design of arithmetic units
Basic computer arithmetic methods
4.1. Addition, subtraction, multiplication, and division
All other arithmetic functions can be approximated using these four operations
4.2. Arithmetic Logic Units (ALUs)
4.3. Floating-point and pipeline processing

2
Unsigned Binary Addition

Decimal addition with a fixed number of digits: 3 + 4 = 7, 8 + 9 = 7 (with overflow 10)
Half Adder
Binary 1-digit adder: 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 0
Full Adder
Binary 1-digit adder with carry-in and carry-out: 1 + 1 = 0 (cout = 1), 1 + 1 (cin = 1) = 1 (cout = 1)

modulo addition
3
Half Adder (HA) Implementation

Inputs: x and y. Output: sum.
x y sum
0 0 0
0 1 1
1 0 1
1 1 0
sum = x'y + xy' = x EX-OR y
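The half-adder table above can be checked with a minimal Python sketch. The function name half_adder is hypothetical, and the carry output (x AND y) is the standard half-adder carry, which the table on this slide does not list.

```python
def half_adder(x: int, y: int) -> tuple[int, int]:
    """Return (sum, carry) for two 1-bit inputs."""
    # Sum-of-products form: x'y + xy'
    sop = ((1 - x) & y) | (x & (1 - y))
    # The same function written as an exclusive-OR
    assert sop == x ^ y
    carry = x & y  # assumption: standard carry output, not shown on the slide
    return sop, carry


if __name__ == "__main__":
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, half_adder(x, y)[0])
```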

4
Full Adder (FA) Implementation

Inputs: x, y, cin. Outputs: sum, cout.
x y cin sum cout
0 0 0   0   0
0 0 1   1   0
0 1 0   1   0
0 1 1   0   1
1 0 0   1   0
1 0 1   0   1
1 1 0   0   1
1 1 1   1   1
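As a rough illustration, the full-adder table can be regenerated from the usual sum and carry equations (sum = x EX-OR y EX-OR cin, cout = majority of the three inputs). The names full_adder and truth_table are hypothetical helpers for this sketch.

```python
from itertools import product


def full_adder(x: int, y: int, cin: int) -> tuple[int, int]:
    """Return (sum, cout) for three 1-bit inputs."""
    s = x ^ y ^ cin
    cout = (x & y) | (y & cin) | (x & cin)
    return s, cout


def truth_table() -> list[tuple[int, int, int, int, int]]:
    """Rows of (x, y, cin, sum, cout) in binary counting order."""
    return [(x, y, cin, *full_adder(x, y, cin))
            for x, y, cin in product((0, 1), repeat=3)]


if __name__ == "__main__":
    for row in truth_table():
        print(*row)
```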

5
Lee 2000
6
Simple Adder Designs

Serial Binary Adder
data enters serially, summed data exits serially
Fig. 4.2 (p. 225)
Parallel Adder
Fig. 4.3 (p. 226): n-bit ripple-carry adder (RCA)
Fig. 4.4 (p. 226): n-bit adder-subtracter
Fast Parallel Adder
based on carry lookahead
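The ripple-carry adder and adder-subtracter mentioned above can be modeled in a few lines. This is a sketch under assumptions, not a transcription of Figs. 4.3 and 4.4: bits are stored least significant first, and subtraction is done the usual two's-complement way (invert y and set the carry-in to 1). The names ripple_carry_add and add_sub are hypothetical.

```python
def ripple_carry_add(x_bits, y_bits, cin=0):
    """Add two equal-length bit lists (LSB first); return (sum_bits, cout)."""
    sum_bits = []
    carry = cin
    for x, y in zip(x_bits, y_bits):
        sum_bits.append(x ^ y ^ carry)
        # The carry ripples from one full adder to the next
        carry = (x & y) | (y & carry) | (x & carry)
    return sum_bits, carry


def add_sub(x, y, n, sub=0):
    """n-bit adder-subtracter: computes x + y (sub=0) or x - y (sub=1), modulo 2**n."""
    x_bits = [(x >> i) & 1 for i in range(n)]
    # XOR with the control bit inverts y when subtracting
    y_bits = [((y >> i) & 1) ^ sub for i in range(n)]
    s_bits, cout = ripple_carry_add(x_bits, y_bits, cin=sub)
    return sum(b << i for i, b in enumerate(s_bits)), cout
```

For example, add_sub(5, 3, 4) gives the 4-bit sum 8 with no carry-out, and add_sub(5, 3, 4, sub=1) gives the difference 2.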

7
Carry Lookahead Addition

Generates each carry-out signal using only primary input signals (carries do not ripple through earlier stages)
Key observations
ci is generated, regardless of the values of any other carry values, if (xi AND yi) is equal to 1
ci is propagated, depending on the value of ci-1, if (xi EX-OR yi) is equal to 1
NOTE: we can also use (xi OR yi) for the propagate term
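The generate and propagate observations can be sketched as follows. Each carry ci is expanded into a function of the primary inputs only: ci = gi OR (pi AND gi-1) OR ... OR (pi AND ... AND p0 AND cin). The function names are hypothetical, bits are LSB first, and the software loops merely stand in for what would be flattened two-level logic in hardware.

```python
def lookahead_carries(x_bits, y_bits, cin=0):
    """Return the list of carries c_i, each computed from primary inputs only."""
    g = [x & y for x, y in zip(x_bits, y_bits)]  # generate terms
    p = [x ^ y for x, y in zip(x_bits, y_bits)]  # propagate terms
    carries = []
    for i in range(len(g)):
        c = g[i]
        prod = p[i]  # running product p_i AND p_(i-1) AND ...
        for j in range(i - 1, -1, -1):
            c |= prod & g[j]
            prod &= p[j]
        c |= prod & cin
        carries.append(c)
    return carries


def cla_add(x, y, n, cin=0):
    """n-bit addition using lookahead carries; returns (sum, cout)."""
    x_bits = [(x >> i) & 1 for i in range(n)]
    y_bits = [(y >> i) & 1 for i in range(n)]
    c = lookahead_carries(x_bits, y_bits, cin)
    carry_in = [cin] + c[:-1]  # carry into bit i is c_(i-1)
    s = sum((x_bits[i] ^ y_bits[i] ^ carry_in[i]) << i for i in range(n))
    return s, c[-1]
```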

8
(No Transcript)
9
10
11
Multiplication

Combinational Multiplier
Typically uses an array of CSA (carry save adder)
modules
Trades off space (hardware) for time (calculation
speed)
Sequential Multiplier
Executes a sequence of add-and-shift operations
Tries to minimize the number of add-and-shifts required
Advantage: can use existing registers and ALU
Disadvantage: slower than combinational version

12
Multiplication H/W

Based on paper-and-pencil method of repeated
shift-and-add operations

13
Observations

Multiplication of single binary digits is just an AND operation
Multiplication of two n-bit numbers can be accomplished with (n-1) additions
Can use array of AND gates, HAs, and FAs
Figs. 4.17, 4.18, 4.19 (pp. 242-243) --> CSA
Question: Where is most of the delay in this design?
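The first two observations can be illustrated with a small sketch: every bit product is an AND, the n shifted rows of bit products are the partial products, and summing n rows takes n-1 additions. This models the arithmetic only, not the CSA array of Figs. 4.17 to 4.19; array_multiply is a hypothetical name.

```python
def array_multiply(x, y, n):
    """Multiply two unsigned n-bit numbers; return (product, number_of_additions)."""
    rows = []
    for i in range(n):
        xi = (x >> i) & 1
        # Row i: each bit of y ANDed with x_i, weighted by 2**(i + j)
        row = sum((((y >> j) & 1) & xi) << (i + j) for j in range(n))
        rows.append(row)
    total = rows[0]
    additions = 0
    for row in rows[1:]:
        total += row
        additions += 1
    return total, additions
```

With n = 4, array_multiply(13, 11, 4) returns the product 143 after 3 additions.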

14
Sequential Multiplication

Use one parallel adder, a set of registers
(capable of shifting), and control logic
Use the ASM design method to design this circuit
Multiplier recoding can be used to reduce the
number of adds and subtracts required
Booth's Algorithm, Booth Multiplier
Modified Booth Multiplier
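A minimal sketch of the sequential scheme for unsigned operands: one adder, shiftable registers, and a loop standing in for the control logic. The register names A (accumulator), Q (multiplier), and M (multiplicand) are assumptions made for this example, and no recoding is applied.

```python
def sequential_multiply(x, y, n):
    """Unsigned n-bit shift-and-add multiplication; returns the 2n-bit product."""
    mask = (1 << n) - 1
    a, q, m = 0, x & mask, y & mask
    for _ in range(n):
        if q & 1:
            a += m  # a may temporarily need n+1 bits (adder carry-out)
        # Shift the combined (A, Q) register pair right by one bit
        q = (q >> 1) | ((a & 1) << (n - 1))
        a >>= 1
    return (a << n) | q
```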

15
Multiplication with Signed Numbers

Case 1: multiplier X and multiplicand Y are positive
Case 2: X is positive and Y is negative
sign-extend the partial products during shifting
the extension bit is the msb (most significant bit) of the partial product
Case 3: X is negative and Y is positive
add one final step of subtracting Y from the partial product
Case 4: both X and Y are negative
apply methods for both Case 2 and Case 3
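The four cases can be folded into one loop, sketched here for n-bit two's-complement operands. Python's right shift on negative integers sign-extends, which plays the role of the sign extension in Case 2, and the last iteration subtracts Y instead of adding it when the sign bit of X is 1 (Case 3). The function name signed_multiply is hypothetical.

```python
def signed_multiply(x, y, n):
    """Multiply n-bit two's-complement values x (multiplier) and y (multiplicand)."""
    p = 0  # partial product, kept as a signed Python integer
    for i in range(n):
        if (x >> i) & 1:
            if i == n - 1:
                p -= y << n  # sign bit of X has negative weight: subtract Y
            else:
                p += y << n
        p >>= 1  # arithmetic shift right: the msb is replicated
    return p
```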

16
(No Transcript)
17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
(No Transcript)
23
Booth's Algorithm

Suppose X = 0111 1110. What is X in base 10?
X = 64 + 32 + 16 + 8 + 4 + 2 = 126
X = 128 - 2 = 126
This works in general (refer to p. 239)
A run of 1s can be replaced by 1 add and 1 subtract
X can be recoded as X = 1 0 0 0 0 0 -1 0, where 1 denotes add and -1 denotes subtract
Called differentiating recoding
Algorithm shown in Figs. 4.15, 4.16 (pp. 240-241)
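The recoding can be sketched by scanning the multiplier from the least significant bit and emitting digit x(i-1) - x(i), with an implied 0 to the right of bit 0. This is an illustration of the recoding idea, not the register-level algorithm of Figs. 4.15 and 4.16; booth_recode and booth_multiply are hypothetical names.

```python
def booth_recode(x, n):
    """Return the Booth digits (each -1, 0, or +1) of n-bit x, LSB first."""
    digits = []
    prev = 0  # implied bit to the right of bit 0
    for i in range(n):
        bit = (x >> i) & 1
        digits.append(prev - bit)
        prev = bit
    return digits


def booth_multiply(x, y, n):
    """Multiply two's-complement n-bit x by y using one add or subtract per nonzero digit."""
    return sum(d * (y << i) for i, d in enumerate(booth_recode(x, n)))


if __name__ == "__main__":
    # Digits printed MSB first for X = 0111 1110
    print(booth_recode(0b01111110, 8)[::-1])
```

For X = 0111 1110 the digits, read MSB first, are 1 0 0 0 0 0 -1 0, so a multiplication by 126 needs one add and one subtract instead of six adds.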

24
Division

Sequential Divider
Executes a sequence of subtract-and-shift operations
Tries to minimize the number of subtract-and-shifts required
Advantage: can use existing registers and ALU
Disadvantage: slower than combinational version
Combinational Divider
Uses an array of 1-bit subtracter modules
Trades off space (hardware) for time (calculation speed)

25
Sequential Division H/W

Based on paper-and-pencil method of repeated subtract operations
(Note: each quotient bit needs to be guessed)
Two basic methods available
Restoring division: restore partial remainder if guess is wrong
Nonrestoring division: change next subtract step to addition if guess is wrong
More advanced methods based on other guessing methods
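Both basic methods can be sketched for unsigned n-bit dividends and a positive divisor. In each step the quotient bit is guessed to be 1 by subtracting the divisor; a negative partial remainder means the guess was wrong. The function names are hypothetical, and the final correction step in the nonrestoring version is the usual one for producing a non-negative remainder.

```python
def restoring_divide(dividend, divisor, n):
    """Return (quotient, remainder); a wrong guess is undone by adding the divisor back."""
    r, q = 0, 0
    for i in reversed(range(n)):
        r = 2 * r + ((dividend >> i) & 1)  # shift in the next dividend bit
        r -= divisor                       # guess: quotient bit is 1
        if r < 0:
            r += divisor                   # wrong guess: restore
            q = q << 1
        else:
            q = (q << 1) | 1
    return q, r


def nonrestoring_divide(dividend, divisor, n):
    """Return (quotient, remainder); a wrong guess turns the next subtract into an add."""
    r, q = 0, 0
    for i in reversed(range(n)):
        bit = (dividend >> i) & 1
        if r >= 0:
            r = 2 * r + bit - divisor
        else:
            r = 2 * r + bit + divisor
        q = (q << 1) | (1 if r >= 0 else 0)
    if r < 0:
        r += divisor  # final correction
    return q, r
```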

26
Paper-and-pencil Division Method
27
28
29
30
31
32
33
34
Arithmetic Logic Unit (ALU)

Uses of the ALU
process arithmetic and logical instructions
address calculations
act as a data conduit (route data between two
points)
ALU Design Techniques
many advanced transistor-level design techniques
used to achieve fast ALU designs
gate-level designs can be flattened for better
performance
basic ALU design is fairly simple

35
Design of One Bit of ALU

ALU can be designed as an adder that can
conditionally perform other functions based on
the selection of control inputs
ALU designed as a chain of identical 1-bit adders
may not be efficient for large numbers of bits
Adder functions
sum = x EX-OR y EX-OR cin
cout = (x AND y) OR (y AND cin) OR (x AND cin)
Alternative ALU designs shown in Sec. 4.2
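A one-bit ALU slice built around the adder functions might look like the sketch below. The operation names ("ADD", "AND", "OR", "XOR") and the string-valued control input are assumptions for illustration only; they are not taken from Sec. 4.2.

```python
def alu_bit(x, y, cin, op):
    """One ALU slice: returns (result_bit, cout) for the selected operation."""
    if op == "ADD":
        return x ^ y ^ cin, (x & y) | (y & cin) | (x & cin)
    if op == "AND":
        return x & y, 0
    if op == "OR":
        return x | y, 0
    if op == "XOR":
        return x ^ y, 0
    raise ValueError(f"unknown operation: {op}")


def alu(x, y, n, op):
    """Chain n identical slices (LSB first); returns (result, cout)."""
    carry, result = 0, 0
    for i in range(n):
        bit, carry = alu_bit((x >> i) & 1, (y >> i) & 1, carry, op)
        result |= bit << i
    return result, carry
```

Chaining the slices this way is the ripple organization, which is why it becomes slow as the number of bits grows.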

36
(Figure: inputs x and y)
37
Floating-Point Arithmetic

IEEE Standard for floating-point numbers based on draft proposed by Kahan et al. in 1979.
X = (FX, EX), where FX = mantissa, EX = exponent
Multiplication: multiply mantissas, add exponents
Division: divide mantissas, subtract exponents
Addition: shift one mantissa to align the exponents, then add
Subtraction: shift one mantissa to align the exponents, then subtract
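The rules above can be tried out on a toy representation. This sketch assumes base 2, so that X = FX * 2**EX, with integer non-negative operand mantissas held in a tuple (fx, ex). It omits normalization, rounding, and every other detail of the IEEE format, and all function names are hypothetical.

```python
def fp_value(x):
    """Numeric value of (fx, ex) under the base-2 assumption."""
    fx, ex = x
    return fx * 2 ** ex


def fp_mul(x, y):
    """Multiply mantissas, add exponents."""
    return x[0] * y[0], x[1] + y[1]


def _align(x, y):
    """Shift the mantissa with the smaller exponent right until the exponents match."""
    (fx, ex), (fy, ey) = x, y
    if ex < ey:
        fx >>= ey - ex  # low-order bits shifted out are lost
    else:
        fy >>= ex - ey
    return fx, fy, max(ex, ey)


def fp_add(x, y):
    fx, fy, e = _align(x, y)
    return fx + fy, e


def fp_sub(x, y):
    fx, fy, e = _align(x, y)
    return fx - fy, e
```

For example, fp_add((12, 0), (3, 2)) shifts 12 right by two places and returns (6, 2), which is 24 = 12 + 12.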

38
(No Transcript)
39
Floating-Point Addition Process (Assuming Positive Numbers)
40
ASM Method Step 1: Pseudocode
41
Floating-Point Addition Units

Similar algorithm shown in Fig. 4.42
Example of algorithm execution shown in Fig. 4.43
Floating-point addition unit for IBM System/360
shown in Fig. 4.44

42
Pipeline Processing: Basic Structure
43

Speedup
Speedup(pipeline) = Time(no pipeline) / Time(pipeline)
Space-Time Diagram
Efficiency
Ratio of numbered (busy) blocks to total number of space-time blocks
What is the efficiency of an ideal m-stage pipeline operating on N data items?
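One way to work out the closing question, assuming an ideal pipeline in which every stage takes one clock cycle and there are no stalls: the pipeline needs m + N - 1 cycles, the unpipelined unit needs m * N cycles, and the space-time diagram has m * N busy blocks out of m * (m + N - 1). The function name pipeline_metrics is hypothetical.

```python
from fractions import Fraction


def pipeline_metrics(m, n_items):
    """Return (cycles, speedup, efficiency) for an ideal m-stage pipeline."""
    t_pipe = m + n_items - 1  # m cycles to fill, then one result per cycle
    t_seq = m * n_items       # no overlap between data items
    speedup = Fraction(t_seq, t_pipe)
    # Busy space-time blocks divided by all space-time blocks
    efficiency = Fraction(m * n_items, m * t_pipe)
    return t_pipe, speedup, efficiency
```

Under these assumptions the efficiency simplifies to N / (m + N - 1), which approaches 1 as N grows, while the speedup approaches m.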

44
Example Pipeline Structure
Linear Pipeline Structure for Floating-point Multiplication
45
Example Timing Diagram
After 4 clock cycles, there is one output result
every clock cycle
46
Categorization of Pipeline Structures

Based on Function
Instruction pipeline
Arithmetic pipeline (e.g., multiplier pipeline)
Based on Structure
Linear / Nonlinear
Static / Dynamic (multi-function)
Scalar / Vector

47
Simple Instruction Pipelines

Static linear pipeline of about 2-8 stages
Difficulties with simple static linear pipelines
Variations in instruction execution times
Variations in instruction lengths
Different number of accesses to memory to fetch
instruction
Cannot quickly determine location of next
instruction
Thus, instruction sets should be designed so that
the resulting architectures are easily pipelined
Set of fixed-length, similar-complexity
instructions

48
Instruction Pipeline Control

ASM Chart Method
Changes ASM chart to fetch the next instruction
while the current instruction is being executed.
Pipelined Control Signals
Control logic generates control signals in the
first stage
Control signals are pipelined along with the
instructions
