Title: ECE 4120 FloPt 1
1. Floating-Point Computer Arithmetic
- ECE 4120 Fundamentals of Computer Design
- Dr. Roger L. Haggard, Associate Professor
- Department of Electrical and Computer Engineering
- Tennessee Technological University
- Spring 2004
2. Floating-Point Number Systems (FPNS) - Introduction
- Can represent both integers and fractions, with a much wider range of values
- N bits still represent at most $2^N$ values, but they are interpreted differently
- Scientific notation is a base-10 FPNS
- Example: $-2.34 \times 10^{12}$, also written -2.34 E12
- Form: Mantissa $\times$ Base$^{\text{Exponent}}$
- Mantissa: sign -, magnitude 2.34, radix 10
- Base: 10 (usually assumed, not written with the number)
- Exponent: sign +, magnitude 12, radix 10 (the radix can be assumed)
- Therefore every number must contain:
- mantissa sign and magnitude (- and 2.34)
- exponent sign and magnitude (+ and 12)
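3. FPNS Basics (1)
- For computers, binary values are most efficient
- $V_{FPN} = \text{Mantissa} \times \text{Base}^{\text{Exponent}}$
- The mantissa is usually in sign/magnitude (S/M) form; the exponent usually uses an excess code
- Therefore $V_{FPN} = (-1)^{SIGN} \times V_M \times r_b^{V_E}$
- $V_M$: value of the mantissa, in base $r_b$, with $m$ digits
- $r_b$: base of the system
- $V_E$: value of the exponent, in base $r_e$, with $e$ digits
- $V_{M,MIN} \le V_M \le V_{M,MAX}$ and $V_{E,MIN} \le V_E \le V_{E,MAX}$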
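4. FPNS Basics (2)
- The mantissa is a fixed-point number; assume a value of $p$ (the number of digits to the right of the radix point)
- Normalization: convert the number so that the msd $\ne 0$, giving maximum precision for the number
- Usually $p = m$, so the mantissa is a pure fraction
- Therefore $V_{M,MIN} = 0.100\ldots0 = 1/r_b$
- Therefore $V_{M,MAX} = 0.111\ldots1 = 1 - r_b^{-m}$ (digits shown for $r_b = 2$)
- Number of legal mantissas: $NLM = (r_b - 1) \times r_b^{m-1}$ (possible first digits $\times$ other digits)
- Number of legal exponents: $NLE = r_e^{e}$ (code-dependent)
- Number of representable values: $NRV = NLM \times NLE \times 2$ (signed!)
- Min FP value: $V_{MIN} = V_{M,MIN} \times r_b^{V_{E,MIN}}$
- Max FP value: $V_{MAX} = V_{M,MAX} \times r_b^{V_{E,MAX}}$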
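5. 7-Bit Example 1
- FPNS with $r_b = 2$, $r_e = 2$, $m = 4$, $e = 2$, $p = m = 4$
- Mantissa in S/M form, exponent unsigned, mantissa normalized (msd = 1)
- So value = (sign) $0.mmmm_2 \times 2^{ee}$
- $V_{M,MIN} = 0.1000_2 = 1/2$
- $V_{M,MAX} = 0.1111_2 = 15/16$
- $NLM = 8$
- $V_{E,MIN} = 00_2 = 0$
- $V_{E,MAX} = 11_2 = 3$
- $NLE = 4$
- $NRV = 8 \times 4 \times 2 = 64$
- $V_{MAX} = 0.1111_2 \times 2^{11_2} = 15/16 \times 8 = 7\tfrac{1}{2}$
- $V_{MIN} = 0.1000_2 \times 2^{00_2} = 1/2 \times 1 = 1/2$
- If $V_E = 3$, the step between adjacent values is $\Delta r = 0.0001_2 \times 2^3 = 1/2$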
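The counts in this example can be checked by brute force. The following is a minimal sketch, assuming the format described above (sign bit, 2-bit unsigned exponent, 4-bit normalized fractional mantissa); the names `legal_values`, `R_B`, `M_DIGITS` and `E_DIGITS` are illustrative and not part of the slides.

```python
from fractions import Fraction

R_B, M_DIGITS, E_DIGITS = 2, 4, 2   # r_b, m and e from the 7-bit example

def legal_values():
    """Enumerate every normalized value (sign) 0.mmmm x 2^ee of the example format."""
    values = []
    for sign in (0, 1):
        for exp in range(R_B ** E_DIGITS):            # unsigned exponent 0..3
            # Normalized mantissas have msd = 1: 1000..1111 as 4-bit integers.
            for mant in range(R_B ** (M_DIGITS - 1), R_B ** M_DIGITS):
                vm = Fraction(mant, R_B ** M_DIGITS)  # 0.mmmm as an exact fraction
                values.append((-1) ** sign * vm * R_B ** exp)
    return values

vals = legal_values()
positives = sorted(v for v in vals if v > 0)
print(len(vals))                      # NRV  -> 64
print(positives[0])                   # VMIN -> 1/2
print(positives[-1])                  # VMAX -> 15/2
print(positives[-1] - positives[-2])  # step at VE = 3 -> 1/2
```

Exact fractions are used so that the comparison with 1/2 and 7 1/2 does not depend on floating-point rounding in the host language.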
6. 7-Bit Example 2
7. IEEE Floating Point Standard (Std 754)
- $r_b = 2$, $r_e = 2$, $m = 24$, $e = 8$, $p = m - 1 = 23$
- Mantissa in S/M form; exponent unsigned, excess 127
- The MSB of the mantissa is hidden, not stored
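8. IEEE Floating Point Standard (Std 754)
- $V_{M,MIN} = 1.00\ldots0_2 = 1$
- $V_{M,MAX} = 1.11\ldots1_2 = 2 - 2^{-23}$
- $NLM = 2^{23}$
- $V_{E,MIN} = 1 - 127 = -126$
- $V_{E,MAX} = 254 - 127 = 127$
- $NLE = 2^8 - 2 = 254$
- $NRV = 2^{23} \times 254 \times 2 \approx 4.26 \times 10^9$
- $V_{MIN} = 1.00\ldots0_2 \times 2^{-126} \approx 1.2 \times 10^{-38}$
- $V_{MAX} = 1.11\ldots1_2 \times 2^{127} \approx 3.4 \times 10^{38}$
- Precision: $2^{23} \approx 8 \times 10^6$, or about 7 significant decimal digits ($\Delta r = 2^{-23} \times 2^{V_E}$)
- Gradual underflow: $V < 2^{-126}$ (denormalized) when the stored exponent $S_E = 0$ and the stored mantissa bits $\ne 0$ (hidden bit = 0)
- Error reporting:
- NaN (Not a Number), for example 0/0, $\infty/\infty$, (NaN op X): $S_E = 255$, stored mantissa bits $\ne 0$
- Infinity, when $V \ge 2 \times 2^{127}$: $S_E = 255$, stored mantissa bits $= 0$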
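The limits above can be cross-checked against the host machine's own single-precision encoding. This is a small sketch, assuming Python's standard `struct` module for reinterpreting bit patterns; the helper name `bits_to_float` is made up for this illustration.

```python
import math
import struct

def bits_to_float(bits):
    """Reinterpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

v_min = 2.0 ** -126                       # 1.00...0 x 2^-126
v_max = (2 - 2.0 ** -23) * 2.0 ** 127     # 1.11...1 x 2^127
nrv = 2 ** 23 * 254 * 2                   # NLM x NLE x 2

print(v_min == bits_to_float(0x00800000))      # smallest normalized pattern -> True
print(v_max == bits_to_float(0x7F7FFFFF))      # largest normalized pattern  -> True
print(nrv)                                     # -> 4261412864
print(math.isinf(bits_to_float(0x7F800000)))   # SE = 255, mantissa bits = 0  -> True
print(math.isnan(bits_to_float(0x7FC00000)))   # SE = 255, mantissa bits != 0 -> True
print(bits_to_float(0x00000001) < v_min)       # SE = 0: denormalized value   -> True
```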
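9. IEEE FPNS Conversion Example
- Convert the IEEE value $C0500000_{16}$ to its decimal value
- Bit fields (S | E | M): 1 | 10000000 | 10100000000000000000000
- $S = 1$ (negative)
- $S_E = 128$, so $V_E = 128 - 127 = 1$
- $V_M = 1.1010\ldots0_2 = 1.625_{10}$ (hidden bit restored)
- $V = (-1)^1 \times 1.625 \times 2^1 = -3.250_{10}$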
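The same conversion can be written as a few lines of field extraction. This is only a sketch for normalized values; the function name `decode_ieee_single` and the returned tuple layout are assumptions made for the illustration.

```python
def decode_ieee_single(word):
    """Split a 32-bit word into S, SE, VE, VM and the resulting value.

    Only normalized numbers (stored exponent 1..254) are handled here.
    """
    s = (word >> 31) & 0x1          # sign bit
    se = (word >> 23) & 0xFF        # stored (excess-127) exponent
    frac = word & 0x7FFFFF          # 23 stored mantissa bits
    if not 1 <= se <= 254:
        raise ValueError("only normalized values are handled in this sketch")
    ve = se - 127                   # remove the excess
    vm = 1 + frac / 2 ** 23         # restore the hidden bit
    return s, se, ve, vm, (-1) ** s * vm * 2.0 ** ve

print(decode_ieee_single(0xC0500000))   # -> (1, 128, 1, 1.625, -3.25)
```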
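10. IEEE FPNS Addition
Floating Point Add (Positive Operands)
Decimal example: $1.2 \times 10^2 + 2.4 \times 10^3$
- Align the smaller value: $1.2 \times 10^2 = 0.12 \times 10^3$ (Shift Right 1)
- Add: $0.12 \times 10^3 + 2.4 \times 10^3 = 2.52 \times 10^3$
Binary example: $1.01_2 \times 2^3 + 1.11_2 \times 2^2$
- Align the smaller value: $1.11_2 \times 2^2 = 0.111_2 \times 2^3$ (SR1)
- Add: $1.010_2 \times 2^3 + 0.111_2 \times 2^3 = 10.001_2 \times 2^3$
- Post-normalize (SR1): $1.0001_2 \times 2^4$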
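11. IEEE FPNS Addition Hardware Diagram
[Diagram: floating-point adder datapath. The 32-bit data bus (31:0) loads the 8-bit exponents EA and EB and the 23-bit stored mantissas MA and MB (24 bits with the hidden 1). A Compare block produces D from the exponents; Align MA and Align MB shift right by n (SR(n)). A 24-bit adder (inputs A, B, Cin; outputs F, Cout) produces M24 and M23-0; M24 selects the increment in Adjust E and the SR(1) in Normalize M. EY (8 bits), MY (23 bits) and SY (0) return to the data bus through 3S (tri-state) registers.]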
12. IEEE FPNS Addition Algorithm Version 1 (1)
- Add positive normalized operands
- Omit gradual underflow, NaN, and infinity
13. Version 1 (2)
1. Load A and B
    EA ← DB(30:23)
    MA ← 1, DB(22:0)        (hidden bit = MSB)
    EB ← DB(30:23)
    MB ← 1, DB(22:0)        (hidden bit = MSB)
2. Align
    D ← EA - EB
    If D < 0 then           --- A smaller
        E ← EB
        Shift Right MA by |D| → MA
    Else (D ≥ 0)            --- B smaller
        E ← EA
        Shift Right MB by D → MB
    Endif
14. Version 1 (3)
3. Add
    M ← MA + MB
4. Normalize
    If M24 = 1 then
        Shift Right M by 1
        E ← E + 1
    Endif
5. Store Y
    MY ← M
    EY ← E
    SY ← 0
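The five Version 1 steps map directly onto integer operations on 32-bit patterns. The sketch below is an illustration under the same restrictions as the slides (positive, normalized operands; no NaN, infinity, or gradual underflow); it truncates shifted-out bits and does not check for exponent overflow. The names `fp_add_v1`, `to_bits` and `to_float` are made up for this example.

```python
import struct

def to_bits(x):
    """Single-precision bit pattern of a Python float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def to_float(bits):
    """Python float for a single-precision bit pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fp_add_v1(a_bits, b_bits):
    """Add two positive normalized operands following the Version 1 steps."""
    # 1. Load A and B: 8-bit exponent, 24-bit mantissa with the hidden bit as MSB.
    ea, ma = (a_bits >> 23) & 0xFF, (a_bits & 0x7FFFFF) | 0x800000
    eb, mb = (b_bits >> 23) & 0xFF, (b_bits & 0x7FFFFF) | 0x800000
    # 2. Align: shift the mantissa of the operand with the smaller exponent.
    d = ea - eb
    if d < 0:            # A smaller
        e = eb
        ma >>= -d
    else:                # B smaller (or equal exponents)
        e = ea
        mb >>= d
    # 3. Add the aligned mantissas (the sum may need 25 bits).
    m = ma + mb
    # 4. Normalize: a carry into bit 24 means one shift right.
    if m & (1 << 24):
        m >>= 1
        e += 1
    # 5. Store Y with SY = 0, dropping the hidden bit again.
    return (e << 23) | (m & 0x7FFFFF)

# 1.01 x 2^3 + 1.11 x 2^2 = 10 + 7
print(to_float(fp_add_v1(to_bits(10.0), to_bits(7.0))))   # -> 17.0
```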
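15. IEEE FPNS Addition Algorithm Version 1S (1)
- More efficient than Version 1 because it handles a special case
- Special case: the exponents differ by 24 or more, so the smaller operand's 24-bit mantissa would be shifted out entirely and can simply be set to 0
- All steps except step 2 (Align) are the same as in Version 1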
16. Version 1S (2)
2. Align
    D ← EA - EB
    If D ≥ 24 then              -- B very small
        MB ← 0
        E ← EA
    Elseif D ≥ 0 then           -- B smaller
        Shift Right MB by D → MB
        E ← EA
    Elseif D ≤ -24 then         -- A very small
        MA ← 0
        E ← EB
    Else (D < 0)                -- A smaller
        Shift Right MA by |D| → MA
        E ← EB
    Endif
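As a sketch, the Version 1S align step can be isolated as a small function on the exponent and mantissa fields. The name `align_v1s` and its argument order are assumptions for this illustration; mantissas are 24-bit integers with the hidden bit already restored.

```python
def align_v1s(ea, ma, eb, mb):
    """Return (e, ma, mb) after the Version 1S align step."""
    d = ea - eb
    if d >= 24:          # B very small: its mantissa would shift out completely
        mb, e = 0, ea
    elif d >= 0:         # B smaller (or equal exponents)
        mb, e = mb >> d, ea
    elif d <= -24:       # A very small
        ma, e = 0, eb
    else:                # A smaller
        ma, e = ma >> -d, eb
    return e, ma, mb

# 1.01 x 2^3 and 1.11 x 2^2 (stored exponents 130 and 129)
print(align_v1s(130, 0xA00000, 129, 0xE00000) == (130, 0xA00000, 0x700000))   # -> True
```

In hardware the benefit is that a large shift never has to be performed; in this software sketch the special cases give the same result as the plain shift and only mirror the structure of the algorithm.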
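17. IEEE FPNS Addition/Subtraction
Solve $A - B$ with $A = 1.10_2 \times 2^2$ and $B = 1.11_2 \times 2^2$ (2 extra bits are needed)
- $A = +1.10_2 \times 2^2$, in 2's complement: 001.10
- $-B = -1.11_2 \times 2^2$, in 2's complement: 110.01
- Sum: $001.10 + 110.01 = 111.11$, so $Y = -0.01_2 \times 2^2$
- Negate the mantissa and set the sign: $-000.01_2 \times 2^2$
- Normalize (SL2): $-1.00_2 \times 2^0$, $S = 1$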
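18. IEEE FPNS Addition/Subtraction Hardware
[Diagram: floating-point add/subtract datapath. The 32-bit data bus (31:0) loads the sign bits SA and SB, the 8-bit exponents EA and EB, and the 23-bit stored mantissas MA and MB (24 bits with the hidden 1). Compare produces D; Align MA and Align MB use SR(n). Two 0 bits ("00") extend each mantissa for a 26-bit Add/Sub unit (inputs A, B, A/S; output F). Sign Logic takes SA, SB, the Add/Sub operation and Msign and produces SY. An Absolute Value block passes 25 bits to Normalize M (SR(1) or SL(n)), with Adjust E on the exponent path. EY (8 bits), MY (23 bits) and SY return to the data bus.]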
19. IEEE FPNS Addition/Subtraction Algorithm Version 2 (1)
FP ADD/SUB (overall)
1. Load A and B
2. Align
3. Add or Sub Mantissas
4. Normalize
5. Store Y
20. Version 2 (2)
1. Load A and B
    SA ← DB(31)
    EA ← DB(30:23)
    MA ← 1, DB(22:0)        (hidden bit = MSB)
    SB ← DB(31)
    EB ← DB(30:23)
    MB ← 1, DB(22:0)        (hidden bit = MSB)
2. Align
    Same as the Version 1S Align step
21. Version 2 (3)
5. Store Y
    MY ← M
    EY ← E
    SY ← S
Steps 3 and 4 are discussed on the following pages.
22. Version 2 (4)
Possible Add/Sub Combinations
Therefore the operation is basically $A + B$, $A - B$, $-(A + B)$, or $-(A - B)$
23. Version 2 (5)
3. Add/Sub Mantissas
    CASE
        (Sub, SA = 0, SB = 1) or (Add, SA = 0, SB = 0):  M ← MA + MB, S ← 0
        (Sub, SA = 0, SB = 0) or (Add, SA = 0, SB = 1):  M ← MA - MB, S ← 0
        (Sub, SA = 1, SB = 1) or (Add, SA = 1, SB = 0):  M ← MA - MB, S ← 1
        (Sub, SA = 1, SB = 0) or (Add, SA = 1, SB = 1):  M ← MA + MB, S ← 1
    END CASE
    If MSIGN = 1 then           -- negative mantissa
        M ← -M                  -- make positive (abs value)
        S ← NOT S               -- change sign
    End If
24. Version 2 (6)
4. Normalize (bits of M are numbered 24, 23 ... 0)
    If M = 0 then
        E ← 0
    Else If M24 = 1 then        -- for example 11.xxx becomes 01.1xxx
        Shift Right M by 1
        E ← E + 1
    Else
        While M23 = 0 do        -- for example 00.01xx becomes 01.xx
            Shift Left M by 1
            E ← E - 1
        End While
    End If
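Putting the Version 2 steps together gives the following sketch of signed add/subtract on 32-bit patterns. It is an illustration, not the slides' hardware: Python's signed integers stand in for the 26-bit 2's-complement result and its MSIGN bit, the four-way CASE is folded into "flip SB for a subtract, then compare signs", shifted-out bits are truncated, and exponent overflow/underflow, NaN, infinity and denormalized values are not handled. The names `fp_addsub_v2`, `to_bits` and `to_float` are hypothetical.

```python
import struct

def to_bits(x):
    """Single-precision bit pattern of a Python float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def to_float(bits):
    """Python float for a single-precision bit pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fp_addsub_v2(a_bits, b_bits, subtract=False):
    """Compute A + B (or A - B) on normalized operands, following Version 2."""
    # 1. Load A and B.
    sa, ea, ma = (a_bits >> 31) & 1, (a_bits >> 23) & 0xFF, (a_bits & 0x7FFFFF) | 0x800000
    sb, eb, mb = (b_bits >> 31) & 1, (b_bits >> 23) & 0xFF, (b_bits & 0x7FFFFF) | 0x800000
    # 2. Align (same as the Version 1S align step).
    d = ea - eb
    if d >= 24:
        mb, e = 0, ea
    elif d >= 0:
        mb, e = mb >> d, ea
    elif d <= -24:
        ma, e = 0, eb
    else:
        ma, e = ma >> -d, eb
    # 3. Add/Sub mantissas. A subtract flips the sign of B; equal signs then
    #    mean MA + MB, different signs mean MA - MB, and S starts as SA.
    effective_sb = sb ^ int(subtract)
    m = ma + mb if sa == effective_sb else ma - mb
    s = sa
    if m < 0:                    # negative mantissa: take abs value, change sign
        m, s = -m, s ^ 1
    # 4. Normalize.
    if m == 0:
        e = 0
    elif m & (1 << 24):          # 1x.xxx: shift right once
        m >>= 1
        e += 1
    else:
        while not m & (1 << 23): # 0.0..1xx: shift left until bit 23 is set
            m <<= 1
            e -= 1
    # 5. Store Y.
    return (s << 31) | (e << 23) | (m & 0x7FFFFF)

# 1.10 x 2^2 - 1.11 x 2^2 = 6 - 7
print(to_float(fp_addsub_v2(to_bits(6.0), to_bits(7.0), subtract=True)))   # -> -1.0
```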
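25. IEEE Floating Point Multiplication Examples (simpler than addition!)
Ex. 1: $(6.0 \times 10^2) \times (4.0 \times 10^3) = 24.0 \times 10^5$ ⇒ multiply mantissas, add exponents
Ex. 2: $(1.01_2 \times 2^3) \times (1.10_2 \times 2^4)$, no alignment needed
- Partial products of $1.01 \times 1.10$: 000, 101, 101 (each shifted one place left of the previous one)
- Product: $01.1110_2 \times 2^7$
- No normalization needed: $1.11_2 \times 2^7$ (Rounding?)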
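26. IEEE Floating Point Multiplication Examples
Ex. 3: $(1.11_2 \times 2^1) \times (1.11_2 \times 2^5)$, no alignment!
- Partial products of $1.11 \times 1.11$: 111, 111, 111 (each shifted one place left of the previous one)
- Product: $11.0001_2 \times 2^6$
- Normalize (SR1): $01.10001_2 \times 2^7$
- Result: $1.10_2 \times 2^7$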
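The multiplication recipe (XOR the signs, add the exponents, multiply the mantissas, normalize with at most one shift right) can be sketched on 32-bit patterns as follows. This is an illustration only: the 48-bit product is truncated rather than rounded, and exponent overflow/underflow, zero, NaN and infinity are not handled. The names `fp_mul`, `to_bits` and `to_float` are made up for this example.

```python
import struct

def to_bits(x):
    """Single-precision bit pattern of a Python float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def to_float(bits):
    """Python float for a single-precision bit pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fp_mul(a_bits, b_bits):
    """Multiply two normalized single-precision operands (truncating)."""
    sa, ea, ma = (a_bits >> 31) & 1, (a_bits >> 23) & 0xFF, (a_bits & 0x7FFFFF) | 0x800000
    sb, eb, mb = (b_bits >> 31) & 1, (b_bits >> 23) & 0xFF, (b_bits & 0x7FFFFF) | 0x800000
    s = sa ^ sb                  # sign logic (XOR)
    e = ea + eb - 127            # excess-127 add: remove one copy of the bias
    p = ma * mb                  # 48-bit integer product, radix point after bit 46
    if p & (1 << 47):            # product is 1x.xxx: normalize with SR1
        p >>= 1
        e += 1
    m = p >> 23                  # reduce to a 24-bit mantissa by truncation
    return (s << 31) | (e << 23) | (m & 0x7FFFFF)

# Ex. 2: (1.01 x 2^3) x (1.10 x 2^4) = 10 x 24
print(to_float(fp_mul(to_bits(10.0), to_bits(24.0))))   # -> 240.0
# Ex. 3: (1.11 x 2^1) x (1.11 x 2^5) = 3.5 x 56
print(to_float(fp_mul(to_bits(3.5), to_bits(56.0))))    # -> 196.0
```

With 24-bit mantissas both slide examples are exact, so the "Rounding?" question from the 3-bit worked example does not arise here.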
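27. IEEE Floating Point Multiplication Hardware
[Diagram: floating-point multiplier datapath. EA and EB (8 bits each) feed Add (Excess 127); SA and SB feed Sign Logic (XOR); MA and MB (24 bits each) feed an Integer Multiplier producing P (annotated "HOW?"). Adjust (Increment) on the exponent and Normalize (SR(1)) on the mantissa produce EY (8 bits), SY and MY (24 bits).]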
28. IEEE Floating Point Division Examples (similar to Multiplication)
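29. IEEE Floating Point Division Examples
Ex. 3: $(1.00_2 \times 2^4) \div (1.11_2 \times 2^2)$
- Divide mantissas by long division: $1.0000_2 \div 1.11_2$ gives $Q = 0.10_2$; a trial subtraction that goes negative is marked (failed), and the remainder R is ignored
- Check with integers: $16 \div 7 = 2$ r 2
- Subtract exponents: $4 - 2 = 2$, so the raw result is $0.10_2 \times 2^2$
- ⇒ Normalize (SL1): $1.00_2 \times 2^1$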
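Division follows the same pattern with a subtract on the exponents and at most one shift left to normalize. The sketch below assumes normalized operands and a nonzero divisor, ignores the remainder as in the example, and shifts in a 0 on SL1 (so the lowest result bit can be lost; a design that keeps a guard bit would avoid that). The names `fp_div`, `to_bits` and `to_float` are hypothetical.

```python
import struct

def to_bits(x):
    """Single-precision bit pattern of a Python float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def to_float(bits):
    """Python float for a single-precision bit pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fp_div(a_bits, b_bits):
    """Divide two normalized single-precision operands (remainder ignored)."""
    sa, ea, ma = (a_bits >> 31) & 1, (a_bits >> 23) & 0xFF, (a_bits & 0x7FFFFF) | 0x800000
    sb, eb, mb = (b_bits >> 31) & 1, (b_bits >> 23) & 0xFF, (b_bits & 0x7FFFFF) | 0x800000
    s = sa ^ sb                  # sign logic (XOR)
    e = ea - eb + 127            # excess-127 subtract: add the bias back
    q = (ma << 23) // mb         # integer divide; quotient has 23 fraction bits
    if not q & (1 << 23):        # quotient is 0.1xxx: normalize with SL1
        q <<= 1
        e -= 1
    return (s << 31) | (e << 23) | (q & 0x7FFFFF)

print(to_float(fp_div(to_bits(21.0), to_bits(7.0))))    # -> 3.0
print(to_float(fp_div(to_bits(240.0), to_bits(24.0))))  # -> 10.0
```

For the slide's operands (16 and 7) the 24-bit result is close to 16/7 rather than exactly 2, because far more quotient bits are kept than in the 3-bit worked example.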
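30. IEEE Floating Point Division Hardware
[Diagram: floating-point divider datapath. EA and EB feed Subtract (Excess 127); SA and SB feed Sign Logic (XOR); MA and MB feed an Integer Divider producing Q and R (annotated "HOW?"). Adjust (Decrement) on the exponent and Normalize (SL(1)) on the mantissa produce EY, SY and MY. Note on the diagram: separate normalization and exponent handling are required if R is needed.]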
31. Floating Point Extra Bit Errors
- Bit shifting for Align and Normalize can create wider words
- These must be reduced to a standard-width result
- Reduction creates error and bias, depending on the method:
- Truncation
- Rounding
- Others
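32. Extra Bit Errors - Examples
4-bit Addition Example: $0.1101_2 \times 2^0 + 0.1001_2 \times 2^3$
- Align (SR3): $0.1101_2 \times 2^0 = 0.00011010_2 \times 2^3$
- Full-width sum: $0.00011010_2 + 0.1001_2 = 0.10101010_2$ (times $2^3$)
- With a 4-bit add: $0.0001_2 + 0.1001_2 = 0.1010_2$ (times $2^3$)
Reducing the width causes a small error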
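The size of that error depends on the reduction method. The sketch below repeats the 4-bit addition with an 8-bit intermediate sum and compares truncation with round-half-up; the helper names `reduce_width` and `value_of` are made up for this illustration, and a rounded result that overflows the target width would need one more normalize step, which is not shown.

```python
def reduce_width(value, extra_bits, mode="truncate"):
    """Drop the lowest `extra_bits` (>= 1) bits of a non-negative integer mantissa."""
    if mode == "truncate":
        return value >> extra_bits
    if mode == "round":
        # Add half of the last kept bit, then truncate (round half up).
        return (value + (1 << (extra_bits - 1))) >> extra_bits
    raise ValueError(f"unknown mode: {mode}")

def value_of(mantissa, bits, exponent):
    """Value of a pure-fraction mantissa with `bits` bits times 2^exponent."""
    return mantissa / 2 ** bits * 2 ** exponent

a = 0b11010000 >> 3          # .1101 x 2^0 aligned to 2^3 in 8 bits: .00011010
b = 0b10010000               # .1001 x 2^3
wide_sum = a + b             # .10101010 x 2^3

exact = value_of(wide_sum, 8, 3)
truncated = value_of(reduce_width(wide_sum, 4, "truncate"), 4, 3)
rounded = value_of(reduce_width(wide_sum, 4, "round"), 4, 3)
print(exact, truncated, rounded)   # -> 5.3125 5.0 5.5
```

Truncation always errs toward zero for positive values (a bias), while rounding halves the worst-case error and spreads it in both directions.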
33. Extra Bit Errors - Examples
4-bit Subtraction Example
- We must usually consider:
- Increased ALU / register width
- Rounding method
34. Floating Point Status
- Separate from the fixed-point status bits
- Extra information available:
- Overflow: exponent too large (add, mult)
- Underflow: exponent too small (mult, div); the result is 0
- Zero: the result is zero (mult by 0, dividing 0, adding 0s, subtracting equal values)
- Sign: sign of the result (not the MSB)
- NaN: not a legal number (0/0, $\infty/\infty$); invalid result
- Inexact: due to rounding