Title: Structure of Computer Systems
1. Structure of Computer Systems
- Course 3
- The Arithmetical and Logical Unit
2. ALU - Arithmetical and Logical Unit
- Purpose: computes arithmetical and logical operations
- Arithmetical operations
  - basic operations: add, subtract, multiply, divide, modulo
  - special functions: exponential, logarithm, sine, cosine, tangent, arctangent, etc.
- Logical operations
  - AND, OR, NOT, inclusive OR, exclusive OR
- Types of arithmetic units
  - integer arithmetic
  - floating point arithmetic (e.g. Intel's co-processor)
  - signal processing arithmetic (e.g. with saturation, MMX)
  - parallel arithmetic (MMX: integer, SSE2: floating point)
3. Addition
- most used operation
- all the other arithmetic operations are based on addition
  - subtraction: adding the complement
  - multiplication: repeated addition
  - division: repeated subtraction and addition
- efficient implementation of the addition operation
  - directly influences all the other operations
  - efficiency: speed and cost (complexity)
4. Addition
- Basic (full) adder unit: one-bit adder
- inputs: $x_i$, $y_i$, $C_{i-1}$ (carry in)
- outputs:
  - $S_i = x_i \oplus y_i \oplus C_{i-1}$
  - $C_i = x_i y_i + (x_i \oplus y_i) C_{i-1}$
- delay: 3 × gate_delay
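The two output equations of the full adder can be checked with a few lines of Python. This is only an illustrative sketch; the function name `full_adder` is an assumption, not something taken from the course.

```python
def full_adder(x: int, y: int, c_in: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum bit, carry-out bit)."""
    p = x ^ y                      # xi XOR yi
    s = p ^ c_in                   # Si = xi XOR yi XOR Ci-1
    c_out = (x & y) | (p & c_in)   # Ci = xi*yi + (xi XOR yi)*Ci-1
    return s, c_out


# Print the complete truth table (8 rows).
for x in (0, 1):
    for y in (0, 1):
        for c_in in (0, 1):
            print(x, y, c_in, full_adder(x, y, c_in))
```

For every input combination the pair (carry, sum) read as a 2-bit number equals x + y + c_in.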
5. n-bit adder with ripple carry
- n-bit adder = n × (1-bit full adder)
- delay: n × 3 × gate_delay
- example:
  - n = 32, gate_delay = 10 ns (TTL gate) =>
  - delay = 32 × 3 × 10 ns = 960 ns ≈ 1000 ns => fclk_max = 1/1000 ns = 10^6 Hz = 1 MHz !!!
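A ripple-carry adder and its worst-case delay can be sketched as follows. The function names and the default width are assumptions made for the illustration; the delay model is the one from the slide (3 gate delays per stage).

```python
def full_adder(x: int, y: int, c_in: int) -> tuple[int, int]:
    p = x ^ y
    return p ^ c_in, (x & y) | (p & c_in)


def ripple_carry_add(a: int, b: int, n: int = 32, c_in: int = 0) -> tuple[int, int]:
    """Add two n-bit unsigned integers bit by bit; returns (sum, carry_out)."""
    result, carry = 0, c_in
    for i in range(n):  # the carry ripples from bit 0 up to bit n-1
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry


def worst_case_delay_ns(n: int, gate_delay_ns: float, gates_per_stage: int = 3) -> float:
    """Delay model from the slide: n stages, 3 gate delays each."""
    return n * gates_per_stage * gate_delay_ns


print(ripple_carry_add(0xFFFFFFFF, 1))   # (0, 1): the carry crosses all 32 stages
delay = worst_case_delay_ns(32, 10)      # 960 ns
print(delay, "ns ->", 1e3 / delay, "MHz maximum clock")
```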
6. Subtraction
- subtraction: adding the two's complement of the second number
- n-bit adder and subtractor
  - Add/Sub = 0 => addition
  - Add/Sub = 1 => subtraction
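A minimal sketch of the adder/subtractor, assuming the usual construction in which the Add/Sub signal inverts the second operand (XOR gates) and also drives the carry-in of the adder. The circuit drawing is not part of this text, so this wiring is an assumption, and the function name is hypothetical.

```python
def add_sub(x: int, y: int, sub: int, n: int = 8) -> tuple[int, int]:
    """n-bit adder/subtractor. sub = 0: x + y, sub = 1: x - y.

    Returns (n-bit result, carry out of the adder).
    """
    mask = (1 << n) - 1
    y_in = (y ^ (mask if sub else 0)) & mask  # invert y when subtracting
    total = (x & mask) + y_in + sub           # carry-in = Add/Sub completes the two's complement
    return total & mask, total >> n


print(add_sub(5, 3, 0))  # (8, 0)
print(add_sub(5, 3, 1))  # (2, 1)
print(add_sub(3, 5, 1))  # (254, 0): 254 = FEh is -2 in 8-bit two's complement
```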
7. Sequence of steps for adding
Step  BUS  SEL  LD_A/  LD_B/  Add/Sub  Wr_m/  Result
1     X    1    0      1      -        1      A <- X
2     Y    0    1      0      0        1      B <- Y
3     -    0    0      1      0        1      A <- X + Y
4     Z    -    1      1      -        0      Z <- X + Y
8. Improving the Adder: Carry Look-ahead Adder
- Issue: the delay time of the carry
- Solution: direct generation of the carry => carry look-ahead adder
  - $C_i = x_i y_i + (x_i \oplus y_i) C_{i-1} = g_i + p_i C_{i-1}$
  - where $g_i = x_i y_i$ is the carry generator
  - and $p_i = x_i \oplus y_i$ is the carry propagator
  - $C_0 = x_0 y_0 + (x_0 \oplus y_0) C_{-1} = g_0 + p_0 C_{-1}$
  - $C_1 = x_1 y_1 + (x_1 \oplus y_1) C_0 = g_1 + p_1 C_0 = g_1 + p_1 (g_0 + p_0 C_{-1}) = g_1 + p_1 g_0 + p_1 p_0 C_{-1}$
  - $C_2 = x_2 y_2 + (x_2 \oplus y_2) C_1 = g_2 + p_2 C_1 = g_2 + p_2 (g_1 + p_1 (g_0 + p_0 C_{-1})) = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 C_{-1}$
  - ......
  - $C_i = f(g_0, g_1, \ldots, g_i, p_0, p_1, \ldots, p_i, C_{-1}) = f(x_0, x_1, \ldots, x_i, y_0, y_1, \ldots, y_i, C_{-1})$
- Conclusion: $C_i$ is obtained directly by combining ONLY input signals
- Drawbacks
  - the circuit's complexity grows rapidly (faster than linearly) with the number of bits (n)
  - it requires gates with a lot of input signals
- ideal delay: 2 × gate_delay
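The expanded carry equations can be evaluated directly in software. The sketch below builds every carry as a two-level sum of products over the g, p and C-1 signals only, without using any previously computed carry. The function name `cla_carries` and the default width of 4 bits are assumptions for this illustration.

```python
def cla_carries(x: int, y: int, n: int = 4, c_in: int = 0) -> list[int]:
    """Return [C0, ..., Cn-1], each computed only from g, p and C-1."""
    g = [((x >> i) & (y >> i)) & 1 for i in range(n)]  # gi = xi AND yi
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]  # pi = xi XOR yi
    carries = []
    for i in range(n):
        # Ci = gi + pi*g(i-1) + ... + pi*...*p1*g0 + pi*...*p0*C-1
        c = 0
        for j in range(i, -1, -1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            c |= term
        last = c_in
        for k in range(i + 1):
            last &= p[k]
        carries.append(c | last)
    return carries


print(cla_carries(0b1111, 0b0001))  # [1, 1, 1, 1]
```

The nested loops make the growth in product terms visible: the equation for Ci has i + 2 terms, and the longest term has i + 2 factors.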
9. Carry Look-ahead Adder - CLU
- generates a result in a shorter time
- a CLU is feasible for 4 bits, because the number of gate inputs is limited
- it can be extended by putting together 4-bit adders
10. Carry Look-ahead Adder
- extension from 4 bits to 16 bits
- Generators and propagators for blocks of bits from i to k
  - Group generate: $G_{i,k}$
  - Group propagate: $P_{i,k}$
- For a block of 4 bits
  - $G_{0,3} = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$
  - $P_{0,3} = p_3 p_2 p_1 p_0$
- Using this notation we obtain the block carries $C_3$, $C_7$, $C_{11}$, $C_{15}$
  - $C_3 = G_{0,3} + P_{0,3} C_{-1}$
  - $C_7 = G_{4,7} + P_{4,7} C_3 = G_{4,7} + P_{4,7} (G_{0,3} + P_{0,3} C_{-1})$
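The group signals and block carries can be sketched as below. The function names are assumptions; note that the loop evaluates the block carries one after another for brevity, whereas the look-ahead unit in hardware would use the expanded form of each equation.

```python
def block_gp(x: int, y: int, lo: int) -> tuple[int, int]:
    """Group generate and group propagate for the 4-bit block starting at bit lo."""
    g = [((x >> (lo + i)) & (y >> (lo + i))) & 1 for i in range(4)]
    p = [((x >> (lo + i)) ^ (y >> (lo + i))) & 1 for i in range(4)]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P


def block_carries(x: int, y: int, c_in: int = 0) -> list[int]:
    """Return [C3, C7, C11, C15] for a 16-bit addition."""
    carries, c = [], c_in
    for lo in (0, 4, 8, 12):
        G, P = block_gp(x, y, lo)
        c = G | (P & c)  # e.g. C7 = G4,7 + P4,7 * C3
        carries.append(c)
    return carries


print(block_carries(0x00F0, 0x0010))  # [0, 1, 0, 0]
```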
11. Carry Look-ahead Adder
- 16-bit carry look-ahead adder made of:
  - 4 units of 4-bit carry look-ahead adders
  - one 4-bit carry look-ahead unit
12. Carry select adder
- Uses extra hardware to speed up the addition
- Avoids a complex carry look-ahead unit
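13. Serial adder
- Adding two sequences of bits with a 1-bit adder
[Figure: serial adder built from shift registers for A (An-1 ... A2 A1 A0), B (Bn-1 ... B2 B1 B0) and S (Sn-1 ... S2 S1 S0), a 1-bit adder with inputs Ai, Bi, Ci-1 and outputs Si, Ci, and a D flip-flop clocked by Clk that holds the carry between steps]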
14. BCD adder
- adding numbers in BCD (binary coded decimal) representation
- a correction is needed
  - if the resulting digit is not a valid decimal digit (greater than 9)
  - if a carry is generated to the next group of 4 bits (to the next decimal digit)
- solution: add 6 (in both cases)
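The add-6 correction can be illustrated with a short sketch. The function names and the packed-BCD convention (one decimal digit per 4-bit group, so 0x0478 stands for decimal 478) are assumptions made for this example.

```python
def bcd_add_digit(a: int, b: int, c_in: int = 0) -> tuple[int, int]:
    """Add two BCD digits (0..9) with a 4-bit binary adder plus correction."""
    s = a + b + c_in        # plain binary addition
    carry = s >> 4          # carry out of the 4-bit group
    s &= 0xF
    if s > 9 or carry:      # invalid digit, or carry to the next digit: add 6
        s = (s + 6) & 0xF
        carry = 1
    return s, carry


def bcd_add(x: int, y: int, digits: int = 4) -> int:
    """Add two packed-BCD numbers digit by digit (the final carry is dropped)."""
    result, carry = 0, 0
    for i in range(digits):
        d, carry = bcd_add_digit((x >> 4 * i) & 0xF, (y >> 4 * i) & 0xF, carry)
        result |= d << (4 * i)
    return result


print(hex(bcd_add(0x0478, 0x0295)))  # 0x773, i.e. 478 + 295 = 773
```

In this example both correction cases occur: 8 + 5 = 13 is not a decimal digit, and 7 + 9 + 1 = 17 produces a carry out of the 4-bit group.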
15. Multiplication
      1100   (12)
    × 1010   (10)
    ------
      0000
     1100
    0000
   1100
   -------
   1111000   = 78h = 120
- Issues
  - we need a 2n-bit adder
  - partial products must be placed in different positions
Modified multiplication
- Solution: shift the partial result to the right and put the product in the same place
  Multiplier bit   Accumulator (AC)   Operation
                   00000000
  0                0000000 0          shift right
  1                   + 1100          add
                   0001100 0          partial product
                   000110 00          shift right
  0                00011 000          shift right
  1                 + 1100            add
                   1111 000           final product
- Advantages
  - we need just an n-bit adder
  - partial products are kept in the same place
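16. Multiplication
[Figure: multiplication circuit. Labels: operands X and Y, adder on (n+1) bits, register Q with bits QS, Qn-1 ... Q1, Q0, and a command unit with a Test input and Write, Clear and Shift command outputs]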
17. Multiplication algorithm
- Step 1: Write the operands in registers (B <- X, Q <- Y), clear the accumulator (A <- 0)
- Step 2: Complement the negative numbers
- Step 3: Test Q0
  - If Q0 = 0, shift A and Q to the right
  - If Q0 = 1, add B to A (A <- A + B) and shift A and Q to the right
- Step 4: Go to step 3 until Yn-1 arrives in Q0. No shift is needed after the last step
- Step 5: $A_S = B_S \oplus Q_S$ (sign of the result)
- Step 6: If $A_S = 1$, complement the result
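The register-level steps above can be sketched in Python. This is a simplified illustration under stated assumptions: the operand magnitudes fit in n bits, the sign is handled separately as in steps 2, 5 and 6, and (unlike the slide) the pair A:Q is shifted after every step, including the last one, so that the 2n-bit product ends up exactly in A:Q. The function name is hypothetical.

```python
def shift_add_multiply(x: int, y: int, n: int = 8) -> int:
    """Multiply two signed integers whose magnitudes fit in n bits."""
    negative = (x < 0) != (y < 0)   # result sign = XOR of the operand signs
    b, q, a = abs(x), abs(y), 0     # B <- |X|, Q <- |Y|, A <- 0
    mask = (1 << n) - 1
    for _ in range(n):
        if q & 1:                   # test Q0
            a += b                  # A <- A + B (A may need n+1 bits here)
        # shift the pair A:Q one position to the right
        q = ((q >> 1) | ((a & 1) << (n - 1))) & mask
        a >>= 1
    product = (a << n) | q          # 2n-bit product held in A:Q
    return -product if negative else product


print(shift_add_multiply(12, 10, n=4))   # 120
print(shift_add_multiply(-12, 10, n=4))  # -120
```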
18. Multiplication with the Booth algorithm
- Improvements
  - Multiplies numbers in two's complement: no initial and final complementation is needed
  - For long sequences of 0s and 1s only shift operations are needed
    - For 0s it is obvious from the previous method
    - For a sequence of 1s
      - Examples: $1111 = 10000 - 1$
      - $11\ldots1111 = 100\ldots000 - 1$
  - A sequence of 1s can be changed into a sequence of 0s
  - Only transitions from 0 to 1 or from 1 to 0 need an add or subtract operation, as follows
- If two consecutive bits of the multiplier (current bit, previously examined bit) are:
  - 0 and 0: shift the partial result to the right
  - 0 and 1: add the multiplicand and shift the partial result to the right
  - 1 and 0: subtract the multiplicand and shift the partial result to the right
  - 1 and 1: shift the partial result to the right
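A minimal sketch of the Booth rules, assuming n-bit two's complement operands, an extra bit Q-1 initialised to 0, and an accumulator A kept as an unbounded signed Python integer (so the sign extension of the arithmetic shift comes for free). The function name is hypothetical.

```python
def booth_multiply(x: int, y: int, n: int = 8) -> int:
    """Multiply x (multiplicand) by y (multiplier), both n-bit two's complement."""
    mask = (1 << n) - 1
    a, q, q_prev = 0, y & mask, 0   # A <- 0, Q <- Y, Q-1 <- 0
    for _ in range(n):
        pair = (q & 1, q_prev)      # (current bit, previously examined bit)
        if pair == (0, 1):          # end of a run of 1s: add the multiplicand
            a += x
        elif pair == (1, 0):        # start of a run of 1s: subtract it
            a -= x
        # arithmetic shift right of A:Q:Q-1 (Python's >> keeps the sign of a)
        q_prev = q & 1
        q = ((q >> 1) | ((a & 1) << (n - 1))) & mask
        a >>= 1
    return (a << n) | q             # signed 2n-bit product


print(booth_multiply(3, -2, n=4))   # -6
print(booth_multiply(-7, -5, n=4))  # 35
```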
19. Division
- Multiple solutions
- Compare and subtract
  - Hard to compare on different positions
- Subtract and restore the partial result (if necessary)
  - Subtract the second operand from the most significant part of the first operand
  - If the result is positive then it is OK (quotient gets a 1)
  - Else restore the result by adding back the second operand (quotient gets a 0)
  - Drawback: some steps require 2 arithmetical operations (subtraction and addition)
- Subtract without restoring the partial result
  - try to subtract B from the partial remainder: R <- R - B
  - If a wrong subtraction was made in the previous step, the correction is made in the next step by adding the second operand instead of subtracting it
  - With correction: $((R - B) + B) \cdot 2 - B = 2R - B$
  - Without correction: $(R - B) \cdot 2 + B = 2R - B$
  - (multiplying by 2 means shifting one position to the left)
  - Advantage: in a step at most one subtraction or addition is needed
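The method without restoring can be sketched as follows for unsigned operands. The function name, the register layout (a 2n-bit dividend in A:Q, an n-bit divisor in B) and the final remainder correction are assumptions of this illustration; the caller is expected to pass a non-zero divisor and a dividend whose upper half is smaller than the divisor.

```python
def nonrestoring_divide(dividend: int, divisor: int, n: int = 8) -> tuple[int, int]:
    """Unsigned non-restoring division; returns (quotient, remainder)."""
    mask = (1 << n) - 1
    a, q, b = dividend >> n, dividend & mask, divisor
    for _ in range(n):
        msb = q >> (n - 1)
        q = (q << 1) & mask
        if a >= 0:
            a = ((a << 1) | msb) - b   # previous step was fine: subtract
        else:
            a = ((a << 1) | msb) + b   # previous subtraction was wrong: add instead
        if a >= 0:
            q |= 1                     # quotient bit is 1 for a non-negative remainder
    if a < 0:
        a += b                         # one final correction of the remainder
    return q, a


print(nonrestoring_divide(120, 10, n=4))  # (12, 0)
print(nonrestoring_divide(100, 7, n=4))   # (14, 2)
```

Each loop iteration performs exactly one addition or one subtraction, which is the advantage stated above.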
20. Division circuit for the second method (restoring the partial result)
21. Division algorithm with restoring the partial result
- Step 1: Load the first operand in A and Q; load the second operand in B
- Step 2: Write $A_S \oplus B_S$ in $Q_S$
  - If $A_S = 1$, complement A, Q
  - If $B_S = 1$, complement B
- Step 3: Tests
  - A ≥ B: overflow
  - B = 0: division by 0
  - A = 0 and Q < B: result 0
- Step 4: Shift A, Q to the left and put 0 in Q0
- Step 5: Subtract B from A and put the result in A
  - if $A_S = 0$ (positive remainder), shift A, Q to the left and put 1 in Q0
  - else ($A_S = 1$, negative remainder), add B to A, shift A, Q to the left and put 0 in Q0
- Step 6: Go to step 5 n times
- Step 7: Rounding the result: compare the remainder in A with B and add 1 to Q if rounding up is required
- Step 8: If $Q_S = 1$, complement register Q
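A sketch of restoring division for unsigned operands is given below. It follows a common textbook ordering (shift, trial subtraction, restore if negative) rather than reproducing the exact step order of the slide, and it leaves out the sign handling and the rounding step. The function name and the exceptions raised are assumptions of this example.

```python
def restoring_divide(dividend: int, divisor: int, n: int = 8) -> tuple[int, int]:
    """Divide a 2n-bit unsigned dividend by an n-bit unsigned divisor."""
    if divisor == 0:
        raise ZeroDivisionError("division by 0")
    mask = (1 << n) - 1
    a, q, b = dividend >> n, dividend & mask, divisor  # A:Q <- dividend, B <- divisor
    if a >= b:
        raise OverflowError("quotient does not fit in n bits")
    for _ in range(n):
        # shift A:Q one position to the left
        a = (a << 1) | (q >> (n - 1))
        q = (q << 1) & mask
        a -= b              # trial subtraction
        if a >= 0:
            q |= 1          # positive remainder: quotient bit 1
        else:
            a += b          # negative remainder: restore, quotient bit stays 0
    return q, a             # (quotient, remainder)


print(restoring_divide(120, 10, n=4))  # (12, 0)
```

The overflow test corresponds to the check A ≥ B in step 3: if the upper half of the dividend is not smaller than the divisor, the quotient needs more than n bits.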
22. Multiplication with look-up tables
- Principle: all the results are pre-computed and stored in a non-volatile memory
- Multiplication is a simple read from the memory
- The operands form the address of the location where the result is stored
- Problem: the memory must have $2^{2n}$ locations
- Examples
  - 8×8 bits => 16 address lines => $2^{16}$ = 64K locations
  - 16×16 bits => 32 address lines => $2^{32}$ = 4G locations (TOO MUCH)
- Solution
  - Multiply 8×8 bits in multiple steps to obtain multiplication on 16, 32 or 64 bits
- Example
  - $X = X_{15,8} \cdot 2^{8} + X_{7,0}$, $Y = Y_{15,8} \cdot 2^{8} + Y_{7,0}$
  - $P = X \cdot Y = X_{7,0} Y_{7,0} + X_{15,8} Y_{7,0} \cdot 2^{8} + X_{7,0} Y_{15,8} \cdot 2^{8} + X_{15,8} Y_{15,8} \cdot 2^{16}$
- Observation: multiplications by $2^{8}$ and $2^{16}$ are achieved by placing the result in the proper binary position; also, the first and the last partial products may be combined in a single 32-bit register with no addition required
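The decomposition into four 8×8 look-ups can be sketched as follows. The table here is an ordinary Python list standing in for the non-volatile memory; the names `LUT`, `mul8` and `mul16` are assumptions for this illustration.

```python
# 64K-entry table addressed by the two 8-bit operands: address = (a << 8) | b
LUT = [a * b for a in range(256) for b in range(256)]


def mul8(a: int, b: int) -> int:
    """8x8-bit multiplication as a single memory read."""
    return LUT[(a << 8) | b]


def mul16(x: int, y: int) -> int:
    """16x16-bit multiplication built from four 8x8 look-ups."""
    xl, xh = x & 0xFF, x >> 8
    yl, yh = y & 0xFF, y >> 8
    p0 = mul8(xl, yl)          # X7,0  * Y7,0
    p1 = mul8(xh, yl)          # X15,8 * Y7,0,  weight 2^8
    p2 = mul8(xl, yh)          # X7,0  * Y15,8, weight 2^8
    p3 = mul8(xh, yh)          # X15,8 * Y15,8, weight 2^16
    acc = (p3 << 16) | p0      # P0 and P3 occupy disjoint bits: no addition needed
    acc += p1 << 8
    acc += p2 << 8
    return acc


print(mul16(1234, 5678) == 1234 * 5678)  # True
```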
23. Multiplication with look-up table
[Figure: look-up table multiplier circuit. Control signals: WrX, WrY, Sel0, Sel1, Sel2, WrP0, WrP1,2, WrP3, WrAcc]
24. Multiplication with look-up table
Step  WrX  WrY  WrP0  WrP1,2  WrP3  WrAcc  Sel0  Sel1  Sel2  Description
1     1    1    0     0       0     0      0     0     0     Load operands
2     0    0    1     0       0     0      0     0     0     Write P0
3     0    0    0     0       1     0      1     1     0     Write P3
4     0    0    0     1       0     0      1     0     0     Write P1
5     0    0    0     0       0     1      0     1     0     Acc <- P0 + P3 + P1
6     0    0    0     1       0     0      0     1     0     Write P2
7     0    0    0     0       0     1      0     0     1     Acc <- Acc + P2
- Multiplication with a look-up table requires only 7 steps instead of 16-20
- it can be further optimized
25. Arithmetical operations in floating point (FP) representation
- Floating point representation of a number
  - Used in case of very big or very small numbers
  - 3 fields for representation
    - Sign
    - Exponent: magnitude of the number
    - Mantissa: some significant figures (digits) of the number
- IT IS NOT THE REPRESENTATION OF REAL NUMBERS from mathematics !!!!!
- A lot of anomalies and precision problems
  - Operating with numbers having different magnitudes may generate errors caused by rounding
    - $M + m - M = 0$, but $M - M + m = m$ (evaluated left to right, with M much larger than m)
  - Numbers with fractional parts in most cases have no exact FP representation
    - Example: 0.3 has no exact representation in floating point
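Both anomalies are easy to reproduce with Python's built-in `float` type (IEEE 754 double precision). The values chosen for M and m are arbitrary examples.

```python
from decimal import Decimal

M, m = 1e20, 1.0
print((M + m) - M)       # 0.0: m is lost when added to the much larger M
print((M - M) + m)       # 1.0: same operands, different order, different result

print(0.1 + 0.2 == 0.3)  # False
print(Decimal(0.3))      # the exact binary value stored for the literal 0.3
```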
26. Floating point adder/subtractor
27. Adding floating point numbers
- Step 1: Load the operands
- Step 2: Compare the exponents (5 cases)
  - ex = ey: add the mantissas and copy the exponent
  - ex > ey and (ex - ey) < number of bits in the mantissa: the mantissa my is aligned by shifting it ex - ey positions to the right
  - ex >> ey and (ex - ey) ≥ number of bits in the mantissa: X is copied to the result (Y is too small); go to step 4
  - ex < ey and (ey - ex) < number of bits in the mantissa: the mantissa mx is aligned by shifting it ey - ex positions to the right, then the mantissas are added
  - ex << ey and (ey - ex) ≥ number of bits in the mantissa: Y is copied to the result (X is too small); go to step 4
- Step 3: Add the mantissas
- Step 4: Realign the result if necessary. Shift the resulting mantissa to the right or to the left until the integer part is 0 and the first bit after the binary point is 1; at the same time increment or decrement the exponent in accordance with the shifting operation
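The alignment and normalisation steps can be sketched with a toy format. Everything about the format is an assumption made for illustration: positive numbers only, an 8-bit mantissa stored as an integer (value = mantissa / 2^8 × 2^exponent), and truncation instead of rounding when bits are shifted out.

```python
MANT_BITS = 8  # assumed mantissa width of the toy format


def to_float(m: int, e: int) -> float:
    """Value represented by the pair (mantissa, exponent)."""
    return m / (1 << MANT_BITS) * 2.0 ** e


def fp_add(mx: int, ex: int, my: int, ey: int) -> tuple[int, int]:
    """Add two positive toy FP numbers; returns (mantissa, exponent)."""
    # Step 2: compare the exponents and align the smaller operand
    if ex >= ey:
        shift, e = ex - ey, ex
        if shift >= MANT_BITS:
            return mx, ex          # Y is too small: X is the result
        my >>= shift
    else:
        shift, e = ey - ex, ey
        if shift >= MANT_BITS:
            return my, ey          # X is too small: Y is the result
        mx >>= shift
    m = mx + my                    # Step 3: add the mantissas
    # Step 4: realign until the mantissa has the form 0.1xxxxxxx
    while m >= (1 << MANT_BITS):                 # integer part is not 0
        m, e = m >> 1, e + 1
    while m and m < (1 << (MANT_BITS - 1)):      # first fraction bit is 0
        m, e = m << 1, e - 1
    return m, e


# 6 = 0.75 * 2^3 -> (192, 3); 1 = 0.5 * 2^1 -> (128, 1)
print(to_float(*fp_add(192, 3, 128, 1)))  # 7.0
print(fp_add(192, 3, 192, 3))             # (192, 4), i.e. 12
print(fp_add(192, 3, 128, -10))           # (192, 3): the small operand is ignored
```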
28. Multiplication and division in floating point representation
- Multiplication
  - Add the exponents
  - Multiply the mantissas
  - Adjust the result (shift the mantissa to the left and decrement the exponent if necessary)
- Division
  - Subtract the exponents
  - Divide the mantissas
  - Adjust the result (if necessary)
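These steps can be tried out with `math.frexp`, which splits a Python float into a mantissa with 0.5 ≤ |m| < 1 and an integer exponent, and `math.ldexp`, which recombines them. The function names below are assumptions; the sketch ignores overflow, underflow and special values.

```python
import math


def fp_multiply(x: float, y: float) -> float:
    mx, ex = math.frexp(x)        # x = mx * 2**ex
    my, ey = math.frexp(y)
    m, e = mx * my, ex + ey       # multiply the mantissas, add the exponents
    if m != 0 and abs(m) < 0.5:   # adjust: shift left, decrement the exponent
        m, e = m * 2, e - 1
    return math.ldexp(m, e)


def fp_divide(x: float, y: float) -> float:
    mx, ex = math.frexp(x)
    my, ey = math.frexp(y)
    m, e = mx / my, ex - ey       # divide the mantissas, subtract the exponents
    if abs(m) >= 1:               # adjust: shift right, increment the exponent
        m, e = m / 2, e + 1
    return math.ldexp(m, e)


print(fp_multiply(3.0, 5.0))  # 15.0
print(fp_divide(15.0, 4.0))   # 3.75
```

For multiplication the product of two mantissas in [0.5, 1) lies in [0.25, 1), so at most one left shift is needed; for division the quotient lies in (0.5, 2), so at most one right shift is needed.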
29. Addition and subtraction with saturation
- Idea: if there is an overflow or underflow after an addition or subtraction, the result should be the maximum or the minimum possible value
- Example
  - unsigned 8-bit representation
      Normal addition (wraparound)         With saturation
      80h + 90h = 10h (error, overflow)    80h + 90h = FFh (maximum value)
      80h - 90h = F0h (underflow)          80h - 90h = 00h (minimum value)
  - signed (two's complement) 8-bit representation
      Normal addition (wraparound)         With saturation
      70h + 20h = 90h (error, negative)    70h + 20h = 7Fh (maximum value)
      80h - 20h = 60h (error, positive)    80h - 20h = 80h (minimum value)
      (-128 - 32 wraps to 96)
- Used in case of
  - signal processing
  - multimedia processing
- Typical signal processing operation: amplification
  - $U_e = U_i \cdot A$
  - Supply: +10 V / -10 V, $U_i$ = 0.05 V, A = 100 => $U_e$ = 5 V
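The difference between wraparound and saturation can be reproduced with the values from the examples. The helper names are assumptions; signed operands are passed as ordinary Python integers in the range -128..127.

```python
def wrap_u8(v: int) -> int:
    """Keep only the low 8 bits (normal wraparound behaviour)."""
    return v & 0xFF


def add_sat_u8(a: int, b: int) -> int:
    return min(a + b, 0xFF)          # clamp to the maximum value


def sub_sat_u8(a: int, b: int) -> int:
    return max(a - b, 0x00)          # clamp to the minimum value


def add_sat_s8(a: int, b: int) -> int:
    return max(-128, min(127, a + b))


def sub_sat_s8(a: int, b: int) -> int:
    return max(-128, min(127, a - b))


print(hex(wrap_u8(0x80 + 0x90)), hex(add_sat_u8(0x80, 0x90)))  # 0x10 0xff
print(hex(wrap_u8(0x80 - 0x90)), hex(sub_sat_u8(0x80, 0x90)))  # 0xf0 0x0
print(add_sat_s8(0x70, 0x20))   # 127  (7Fh)
print(sub_sat_s8(-128, 0x20))   # -128 (80h)
```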
30. Addition and subtraction with saturation
- Add and subtract with saturation for unsigned 8-bit representation
- the result is selected with a multiplexer
  - Carry (C) = 0 => result correct
  - C = 1 and addition => overflow, result = FFh
  - C = 1 and subtraction => underflow, result = 00h
- homework: do it for two's complement

C  Add/Sub  Operation    Result           S1  S0
0  0        addition     Correct (X+Y)    1   X
0  1        subtraction  Correct (X-Y)    1   X
1  0        addition     Overflow (FFh)   0   1
1  1        subtraction  Underflow (00h)  0   0

Karnaugh maps for the multiplexer select signals (rows: C, columns: Add/Sub)

S0     Add/Sub=0  Add/Sub=1
C=0    X          X
C=1    1          0

S1     Add/Sub=0  Add/Sub=1
C=0    1          1
C=1    0          0
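Reading the two maps gives S1 = NOT C and S0 = NOT Add/Sub (using the don't-care entries). The sketch below models the multiplexer selection with these equations; the function name is an assumption, and C is modelled as the carry for addition and the borrow for subtraction, as in the table.

```python
def sat_add_sub_u8(x: int, y: int, sub: int) -> int:
    """Unsigned 8-bit add/subtract with saturation, selected by S1 and S0."""
    raw = x - y if sub else x + y
    c = 1 if (raw > 0xFF or raw < 0) else 0   # C: carry (add) or borrow (subtract)
    s1 = 1 - c                                # S1 = NOT C
    s0 = 1 - sub                              # S0 = NOT Add/Sub (don't care when C = 0)
    if s1:
        return raw & 0xFF                     # S1 = 1: adder output is correct
    return 0xFF if s0 else 0x00               # S1 = 0: S0 picks FFh or 00h


print(hex(sat_add_sub_u8(0x80, 0x90, 0)))  # 0xff
print(hex(sat_add_sub_u8(0x80, 0x90, 1)))  # 0x0
print(hex(sat_add_sub_u8(0x10, 0x20, 0)))  # 0x30
```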