Title: ECE 4120 FloPt 1


1
Floating-Point Computer Arithmetic
  • ECE 4120 Fundamentals of Computer Design
  • Dr. Roger L. Haggard, Associate Professor
  • Department of Electrical and Computer Engineering
  • Tennessee Technological University
  • Spring 2004

2
Floating-Point Number Systems - Introduction
  • Can represent both integers and fractions with
    a much wider range of values
  • N bits still represent at most $2^N$ values, but
    they are interpreted differently
  • Scientific notation is a base-10 FPNS:
    $-2.34 \times 10^{+12} = -2.34\,\mathrm{E}{+}12$
  • General form: $\text{Mantissa} \times \text{Base}^{\text{Exponent}}$
  • Mantissa: sign $-$, magnitude 2.34, radix 10
  • Base: 10 (can be assumed)
  • Exponent: sign $+$, magnitude 12, radix 10
  • The radices are usually assumed, not written with the number
  • ∴ Every number must contain
  • mantissa sign and magnitude: $-$ / 2.34
  • exponent sign and magnitude: $+$ / 12

3
FPNS Basics (1)
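  • For computers, binary values are most efficient
  • $V_{FPN} = \text{Mantissa} \times \text{Base}^{\text{Exponent}}$
  • The mantissa is usually sign/magnitude (S/M); the exponent
    usually uses an excess code
  • ∴ $V_{FPN} = (-1)^{SIGN} \times V_M \times r_b^{V_E}$
  • $V_M$ = value of mantissa, in base $r_b$, m digits
  • $r_b$ = base of system
  • $V_E$ = value of exponent, in base $r_e$, e digits
  • $V_{M,MIN} \le V_M \le V_{M,MAX}$ and $V_{E,MIN} \le V_E \le V_{E,MAX}$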

4
FPNS Basics (2)
  • Mantissa is fixed point: assume a value of p, the number of
    digits to the right of the radix point
  • Normalization: convert the number so that the msd ≠ 0,
    giving maximum precision for the number
  • Usually p = m, so the mantissa is a pure fraction
  • ∴ $V_{M,MIN} = 0.1000 = 1/r_b$
  • ∴ $V_{M,MAX} = 0.1111 = 1 - r_b^{-m}$
  • Number of legal mantissas: $NLM = (r_b - 1) \times r_b^{m-1}$
    ($r_b - 1$ possible first digits, $r_b^{m-1}$ combinations of the other digits)
  • Number of legal exponents: $NLE = r_e^{e}$
    (code-dependent)
  • Number of representable values: $NRV = NLM \times NLE \times 2$
    (signed!)
  • Min FP value: $V_{MIN} = V_{M,MIN} \times r_b^{V_{E,MIN}}$
  • Max FP value: $V_{MAX} = V_{M,MAX} \times r_b^{V_{E,MAX}}$
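
The counting formulas on this slide can be checked numerically. The following is a minimal Python sketch, assuming a pure-fraction mantissa (p = m), that every exponent code is legal, and that exponent values run consecutively upward from a chosen minimum. The function name `fpns_summary` and its `ve_min` parameter are illustrative and not part of the course material.

```python
from fractions import Fraction

def fpns_summary(rb, re, m, e, ve_min=0):
    """Counts and range of a normalized FPNS with a pure-fraction mantissa."""
    nlm = (rb - 1) * rb ** (m - 1)        # nonzero first digit, any other digits
    nle = re ** e                         # assumption: every exponent code is legal
    ve_max = ve_min + nle - 1             # assumption: consecutive exponent values
    vm_min = Fraction(1, rb)              # 0.1000...
    vm_max = 1 - Fraction(1, rb ** m)     # largest m-digit fraction
    return {
        "NLM": nlm,
        "NLE": nle,
        "NRV": nlm * nle * 2,             # factor 2 for the mantissa sign
        "VMIN": vm_min * Fraction(rb) ** ve_min,
        "VMAX": vm_max * Fraction(rb) ** ve_max,
    }

summary = fpns_summary(rb=2, re=2, m=4, e=2)
print(summary)
```

Exact fractions are used instead of floats so that the minimum and maximum values can be compared with hand calculations without rounding.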

5
7-Bit Example 1
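  • FPNS with $r_b = 2$, $r_e = 2$, $m = 4$, $e = 2$, $p = m = 4$
  • Mantissa is S/M; exponent is unsigned; normalized, msd = 1
  • so value = (sign) $0.mmmm \times 2^{ee}$
  • $V_{M,MIN} = 0.1000_2 = 1/2$
  • $V_{M,MAX} = 0.1111_2 = 15/16$
  • $NLM = 8$
  • $V_{E,MIN} = 00_2 = 0$
  • $V_{E,MAX} = 11_2 = 3$
  • $NLE = 4$
  • $NRV = 8 \times 4 \times 2 = 64$
  • $V_{MAX} = 0.1111_2 \times 2^{11_2} = 15/16 \times 8 = 7\tfrac{1}{2}$
  • $V_{MIN} = 0.1000_2 \times 2^{00_2} = 1/2 \times 1 = 1/2$
  • If $V_E = 3$, the resolution is $\Delta r = 0.0001_2 \times 2^3 = 1/2$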
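
The numbers in this example can also be confirmed by brute force. The sketch below enumerates every legal pattern of the 7-bit system, assuming the value is (sign) 0.mmmm × 2^ee with an unsigned exponent and a mantissa whose most significant digit is 1. The helper name `legal_values` is hypothetical.

```python
from fractions import Fraction

def legal_values():
    """All values of the 7-bit normalized system: (sign) 0.mmmm x 2^ee."""
    values = []
    for sign in (1, -1):
        for ee in range(4):                 # unsigned 2-bit exponent: 0..3
            for mmmm in range(8, 16):       # msd = 1, so 1000..1111
                values.append(sign * Fraction(mmmm, 16) * 2 ** ee)
    return values

vals = legal_values()
positive = sorted(v for v in vals if v > 0)
print(len(vals), positive[0], positive[-1])
# Spacing between the two largest values (both use exponent 3)
print(positive[-1] - positive[-2])
```

Because the mantissa is normalized, each value has exactly one representation, so the list length equals the number of representable values.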

6
7 Bit Example 2
7
IEEE Floating Point Standard (Std 754)
  • $r_b = 2$, $r_e = 2$, $m = 24$, $e = 8$, $p = m - 1 = 23$
  • Mantissa is S/M; exponent is unsigned, excess 127
  • MSB hidden, not stored

8
IEEE Floating Point Standard (Std 754)
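  • $V_{M,MIN} = 1.00\ldots0_2 = 1$
  • $V_{M,MAX} = 1.11\ldots1_2 = 2 - 2^{-23}$
  • $V_{E,MIN} = 1 - 127 = -126$
  • $V_{E,MAX} = 254 - 127 = +127$
  • $NLM = 2^{23}$
  • $NLE = 2^8 - 2 = 254$
  • $NRV = 2^{23} \times 254 \times 2 \approx 4.3 \times 10^9$
  • $V_{MIN} = 1.00\ldots0_2 \times 2^{-126} \approx 1.2 \times 10^{-38}$
  • $V_{MAX} = 1.11\ldots1_2 \times 2^{127} \approx 3.4 \times 10^{38}$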
  • $2^{23} \approx 8 \times 10^6$, i.e. about 7 significant decimal
    digits ($\Delta r = 2^{-23} \times 2^{V_E}$)
  • Gradual underflow
  • $V < 2^{-126}$ (denormalized) when $S_E = 0$, $V_M \ne 0$
    (Hidden bit = 0)
  • Error reporting
  • NaN (Not a Number): 0/0, $\infty/\infty$, (NaN op X);
    $S_E = 255$, $V_M \ne 0$
  • Infinity when $V > 2 \times 2^{127}$: $S_E = 255$, $V_M = 0$
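
The special encodings listed above depend only on the stored exponent field and on whether the stored mantissa field is zero. A minimal sketch of that classification for a 32-bit word follows; the function name `classify` and the returned labels are illustrative choices.

```python
def classify(word):
    """Classify a 32-bit IEEE single-precision bit pattern by its fields."""
    se = (word >> 23) & 0xFF        # stored (excess-127) exponent
    vm = word & 0x7FFFFF            # stored mantissa field (hidden bit excluded)
    if se == 255:
        return "NaN" if vm != 0 else "infinity"
    if se == 0:
        return "denormalized" if vm != 0 else "zero"
    return "normalized"

for w in (0x7F800000, 0x7FC00000, 0x00000001, 0x00000000, 0xC0500000):
    print(f"{w:08X}", classify(w))
```

The sign bit is ignored here because it does not affect which class a pattern belongs to.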

9
IEEE FPNS Conversion Example
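  • Convert IEEE value $\mathrm{C0500000}_{16}$ to its decimal
    value
  • In binary: 1 | 10000000 | 1010...0 (sign | exponent | mantissa)
  • $S = 1$ (negative)
  • $S_E = 128$, so $V_E = 128 - 127 = +1$
  • $V_M = 1.1010\ldots0_2 = 1.625_{10}$
  • $V = (-1)^1 \times 1.625 \times 2^1 = -3.250_{10}$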
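
The same conversion can be done mechanically by extracting the three fields. The sketch below handles normalized values only (it assumes the stored exponent is between 1 and 254) and compares its result with Python's `struct` module, which interprets the same bit pattern. The name `decode_ieee32` is hypothetical.

```python
import struct

def decode_ieee32(word):
    """Decode a normalized 32-bit IEEE pattern as (-1)^S x VM x 2^VE."""
    s = word >> 31                      # sign bit
    se = (word >> 23) & 0xFF            # stored exponent, excess 127
    frac = word & 0x7FFFFF              # 23 stored mantissa bits
    ve = se - 127
    vm = 1 + frac / 2 ** 23             # restore the hidden leading 1
    return (-1) ** s * vm * 2.0 ** ve

value = decode_ieee32(0xC0500000)
reference = struct.unpack(">f", struct.pack(">I", 0xC0500000))[0]
print(value, reference)
```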
10
IEEE FPNS Addition
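Floating Point Add (Positive Operands)
  • ∴ Align the smaller value before adding
  • Decimal: $1.2 \times 10^2 + 2.4 \times 10^3$; shift right 1 so that
    $1.2 \times 10^2 = 0.12 \times 10^3$, then
    $0.12 \times 10^3 + 2.4 \times 10^3 = 2.52 \times 10^3$
  • Binary: $1.01 \times 2^3 + 1.11 \times 2^2$; SR1 gives
    $1.11 \times 2^2 = 0.111 \times 2^3$, then
    $1.01 \times 2^3 + 0.111 \times 2^3 = 10.001 \times 2^3$
  • Post normalize (SR1): $10.001 \times 2^3 = 1.0001 \times 2^4$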
11
IEEE FPNS Addition Hardware Diagram
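[Figure: addition datapath. The 32-bit Databus (31:0) loads the 23-bit stored mantissas MA and MB and the 8-bit exponents EA and EB. Compare forms D; Align MA and Align MB shift right by n (SR n) on 24-bit mantissas; an adder (A, B, Cin, Cout, F) produces M24 and M23-0; Normalize M (SR 1) and Adjust E (select increment for normalize) form the result; 3S Reg blocks drive SY, EY (8 bits), and MY (23 bits) back onto the Databus.]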
12
IEEE FPNS Addition Algorithm Version 1 (1)
  • Add positive normalized operands
  • Omit gradual underflow, NaN, infinity

13
Version 1 (2)
1. Load A & B
   EA = DB(30:23); MA = 1, DB(22:0)   -- hidden bit = MSB
   EB = DB(30:23); MB = 1, DB(22:0)   -- hidden bit = MSB
2. Align
   D = EA - EB
   If D < 0 then                      -- A smaller
     E = EB
     Shift Right MA by |D| → MA
   Else (D ≥ 0)                       -- B smaller
     E = EA
     Shift Right MB by D → MB
   Endif
14
Version 1 (3)
3. Add
   M = MA + MB
4. Normalize
   If M24 = 1 then
     Shift Right M by 1
     E = E + 1
   Endif
5. Store Y
   MY = M; EY = E; SY = 0
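
The five steps of Version 1 (load, align, add, normalize, store) can be sketched in Python on 32-bit words. This is a minimal illustration under the same restrictions as the slides: positive normalized operands, no gradual underflow, NaN, or infinity. It also truncates bits shifted out during alignment and does not check for exponent overflow, so it agrees with true IEEE addition only when the result is exactly representable. The names `f2w`, `w2f`, and `fp_add_v1` are hypothetical helpers.

```python
import struct

def f2w(x):
    """Float -> 32-bit IEEE word."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def w2f(w):
    """32-bit IEEE word -> float."""
    return struct.unpack(">f", struct.pack(">I", w))[0]

def fp_add_v1(a, b):
    # 1. Load A and B, restoring the hidden bit as the mantissa MSB
    ea, ma = (a >> 23) & 0xFF, (a & 0x7FFFFF) | 0x800000
    eb, mb = (b >> 23) & 0xFF, (b & 0x7FFFFF) | 0x800000
    # 2. Align the operand with the smaller exponent
    d = ea - eb
    if d < 0:
        e, ma = eb, ma >> -d
    else:
        e, mb = ea, mb >> d
    # 3. Add mantissas (result may need 25 bits)
    m = ma + mb
    # 4. Normalize if bit 24 is set
    if m >> 24:
        m, e = m >> 1, e + 1
    # 5. Store Y (sign is 0; hidden bit is dropped)
    return (e << 23) | (m & 0x7FFFFF)

print(w2f(fp_add_v1(f2w(10.0), f2w(7.0))))   # 1.01 x 2^3 + 1.11 x 2^2
```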
15
IEEE FPNS Addition Algorithm Version 1S (1)
  • More efficient than Version 1 because it handles a special case
  • Special case: exponents differ by 24 or more
  • All steps except 2 (align) are the same as
    Version 1

16
Version 1S (2)
2. Align
   D = EA - EB
   If D ≥ 24 then                     -- B very small
     MB = 0
     E = EA
   Elseif D ≥ 0 then                  -- B smaller
     Shift Right MB by D → MB
     E = EA
   Elseif D ≤ -24 then                -- A very small
     MA = 0
     E = EB
   Else (D < 0)                       -- A smaller
     Shift Right MA by |D| → MA
     E = EB
   Endif
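
The Version 1S align step can be sketched as a small function on the exponent and 24-bit mantissa fields. In software a shift by 24 or more already yields zero, so the special case here only mirrors the hardware shortcut of skipping the shifter. The name `align_v1s` and its argument order are assumptions made for illustration.

```python
def align_v1s(ea, ma, eb, mb):
    """Return (E, MA, MB) after alignment; mantissas are 24-bit integers."""
    d = ea - eb
    if d >= 24:              # B very small: every bit would be shifted out
        return ea, ma, 0
    if d >= 0:               # B smaller (or equal exponents)
        return ea, ma, mb >> d
    if d <= -24:             # A very small
        return eb, 0, mb
    return eb, ma >> -d, mb  # A smaller

print(align_v1s(129, 0xE00000, 130, 0xA00000))
```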
17
IEEE FPNS Addition/Subtraction
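  • Solve $A - B$ with $A = 1.10 \times 2^2$ and $B = 1.11 \times 2^2$; need 2 extra bits
  • 2's complement operands: $A = 001.10$ (+), $-B = 110.01$ (−)
  • Sum: $Y = 001.10 + 110.01 = 111.11$, i.e. $-0.01 \times 2^2$
  • Negate mantissa and sign: $-000.01 \times 2^2$
  • Normalize (SL2): $-1.00 \times 2^0$, $S = 1$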
18
IEEE FPNS Addition/Subtraction Hardware
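[Figure: addition/subtraction datapath. The 32-bit Databus (31:0) loads SA, SB, the 8-bit exponents EA and EB, and the 23-bit stored mantissas MA and MB. Compare forms D; Align MA and Align MB shift right by n (SR n) on 24-bit mantissas; Sign Logic takes the Add/Sub op; a 26-bit Add/Sub unit (00 prepended to each mantissa) produces F = A ± B and Msign; Absolute Value yields a 25-bit magnitude; Normalize M (SR 1 or SL n) and Adjust E produce MY (23 bits) and EY (8 bits); SY, EY, and MY return to the Databus.]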
19
IEEE FPNS Addition/Subtraction Algorithm Version 2 (1)
FP ADD/SUB (overall)
1. Load A & B
2. Align
3. Add or Sub Mantissas
4. Normalize
5. Store Y
20
Version 2 (2)
1. Load A & B
   SA = DB(31); EA = DB(30:23); MA = 1, DB(22:0)   -- hidden bit = MSB
   SB = DB(31); EB = DB(30:23); MB = 1, DB(22:0)   -- hidden bit = MSB
2. Align
   Same as V1S Align
21
Version 2 (3)
5. Store Y
   MY = M; EY = E; SY = S
Steps 3 and 4 are discussed on the following pages
22
Version 2 (4)
Possible Add/Sub Combinations
∴ Basically, A + B or A − B or −(A + B) or −(A − B)
23
Version 2 (5)
3. Add/Sub Mantissas
   CASE
     (Sub & SA = 0 & SB = 1) or (Add & SA = 0 & SB = 0):  M = MA + MB, S = 0
     (Sub & SA = 0 & SB = 0) or (Add & SA = 0 & SB = 1):  M = MA − MB, S = 0
     (Sub & SA = 1 & SB = 1) or (Add & SA = 1 & SB = 0):  M = MA − MB, S = 1
     (Sub & SA = 1 & SB = 0) or (Add & SA = 1 & SB = 1):  M = MA + MB, S = 1
   END CASE
   If MSIGN = 1 then                  -- negative mantissa
     M = −M                           -- make positive (abs value)
     S = NOT S                        -- change sign
   End If
24
Version 2 (6)
4. Normalize (M bit positions: 24 23 ----- 0)
   If M = 0 then
     E = 0
   Else If M24 = 1 then               -- 11.xxx
     Shift Right M by 1               -- 01.1xxx
     E = E + 1
   Else
     While M23 = 0 do                 -- 00.01xx
       Shift Left M by 1
       E = E − 1
     End While                        -- 01.xx
   End If
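
Steps 3 and 4 of Version 2 can be sketched as two small functions operating on already aligned 24-bit mantissas. This is an illustrative reading of the CASE table and the normalize loop, not a full adder: exponent underflow during the left shifts is not checked, in line with the slides omitting gradual underflow. The names `add_sub_mantissas` and `normalize` are hypothetical.

```python
def add_sub_mantissas(sub, sa, sb, ma, mb):
    """Step 3: return (M, S) for aligned 24-bit mantissas MA and MB."""
    same_sign = sa == (sb ^ int(sub))   # a subtract flips B's effective sign
    m = ma + mb if same_sign else ma - mb
    s = sa
    if m < 0:                           # MSIGN = 1: take |M| and flip S
        m, s = -m, s ^ 1
    return m, s

def normalize(m, e):
    """Step 4: return (M, E) with bit 23 of M set, or (0, 0) for a zero result."""
    if m == 0:
        return 0, 0
    if m >> 24:                         # 1x.xxx: shift right by 1
        return m >> 1, e + 1
    while not (m >> 23) & 1:            # 00.01xx: shift left until bit 23 is 1
        m, e = m << 1, e - 1
    return m, e

# A = 1.10 x 2^2, B = 1.11 x 2^2, compute A - B (stored exponent 129 = 2 + 127)
m, s = add_sub_mantissas(True, 0, 0, 0xC00000, 0xE00000)
print(s, normalize(m, 129))
```

Collapsing the four CASE rows into a single same-sign test works because a subtract only flips the effective sign of B; the result sign starts as SA and is inverted when the mantissa difference is negative.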
25
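IEEE Floating Point Multiplication Examples (simpler than addition!)
  • Ex. 1: $(6.0 \times 10^2) \times (4.0 \times 10^3) = 24.0 \times 10^5$
    ⇒ Mult. Mantissas, Add Exponents
  • Ex. 2: $(1.01 \times 2^3) \times (1.10 \times 2^4)$, no alignment needed
  • Partial products of 1.01 × 1.10: 000, 101, 101 (each shifted one
    place left of the previous)
  • Product: $1.1110 \times 2^7$, no normalization needed
  • Result: $1.11 \times 2^7$ (Rounding?)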
26
IEEE Floating Point Multiplication Examples
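  • Ex. 3: $(1.11 \times 2^1) \times (1.11 \times 2^5)$, no alignment!
  • Partial products of 1.11 × 1.11: 111, 111, 111 (each shifted one
    place left of the previous)
  • Product: $11.0001 \times 2^6$
  • Normalize (SR1): $01.10001 \times 2^7$
  • Result: $1.10 \times 2^7$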
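
The multiply-mantissas, add-exponents rule can be sketched on 32-bit words as follows. The sketch assumes normalized, finite operands, truncates the product to 24 bits instead of rounding, and does not check for exponent overflow or underflow. The names `f2w`, `w2f`, `fields`, and `fp_mul` are hypothetical helpers. Note that with a 24-bit mantissa the products of Ex. 2 and Ex. 3 are exact, whereas the 3-bit mantissa used on the slides loses the low bits.

```python
import struct

def f2w(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]

def w2f(w):
    return struct.unpack(">f", struct.pack(">I", w))[0]

def fields(w):
    """Return (sign, stored exponent, 24-bit mantissa with hidden bit)."""
    return w >> 31, (w >> 23) & 0xFF, (w & 0x7FFFFF) | 0x800000

def fp_mul(a, b):
    sa, ea, ma = fields(a)
    sb, eb, mb = fields(b)
    s = sa ^ sb                  # sign logic (XOR)
    e = ea + eb - 127            # excess-127 add: remove one copy of the bias
    p = ma * mb                  # 48-bit product with 46 fraction bits
    if p >> 47:                  # product in [2, 4): normalize with SR1
        p, e = p >> 1, e + 1
    m = p >> 23                  # truncate to 24 bits (no rounding)
    return (s << 31) | (e << 23) | (m & 0x7FFFFF)

print(w2f(fp_mul(f2w(10.0), f2w(24.0))))   # Ex. 2: 1.01 x 2^3 times 1.10 x 2^4
print(w2f(fp_mul(f2w(3.5), f2w(56.0))))    # Ex. 3: 1.11 x 2^1 times 1.11 x 2^5
```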
27
IEEE Floating Point Multiplication Hardware
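[Figure: multiplication datapath. EA and EB (8 bits each) feed Add (Excess 127); SA and SB feed Sign Logic (XOR); MA and MB (24 bits each) feed an Integer Multiplier producing P (annotated "HOW?"); Normalize (SR 1) with Adjust (Increment) of the exponent; outputs EY (8 bits), SY, and MY (24 bits).]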
28
IEEE Floating Point Division Examples (similar to Multiplication)
29
IEEE Floating Point Division Examples
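  • Ex. 3: $(1.00 \times 2^4) \div (1.11 \times 2^2)$ (in decimal, $16 \div 7 = 2$ r 2)
  • Divide mantissas, subtract exponents: long division of 1.0000 by 1.11
    uses trial subtractions of 0111; a trial that would go negative is
    marked (failed) and gives a 0 quotient bit
  • $Q = 0.10 \times 2^2$; R is ignored
  • ⇒ Normalize (SL1): $1.00 \times 2^1$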
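
Division follows the same pattern as multiplication, with an integer divide of the mantissas and a subtraction of the exponents. The sketch below assumes normalized, finite, nonzero operands, ignores the remainder (so the quotient is truncated), and does not check the exponent range. The names `f2w`, `w2f`, `fields`, and `fp_div` are hypothetical helpers.

```python
import struct

def f2w(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]

def w2f(w):
    return struct.unpack(">f", struct.pack(">I", w))[0]

def fields(w):
    """Return (sign, stored exponent, 24-bit mantissa with hidden bit)."""
    return w >> 31, (w >> 23) & 0xFF, (w & 0x7FFFFF) | 0x800000

def fp_div(a, b):
    sa, ea, ma = fields(a)
    sb, eb, mb = fields(b)
    s = sa ^ sb                  # sign logic (XOR)
    e = ea - eb + 127            # excess-127 subtract: restore the bias
    q = (ma << 24) // mb         # integer divide; quotient has 24 fraction bits
    if q >> 24:                  # quotient in [1, 2): drop the extra fraction bit
        q >>= 1
    else:                        # quotient in (1/2, 1): keeping all bits is the
        e -= 1                   # SL1 normalization, so decrement the exponent
    return (s << 31) | (e << 23) | (q & 0x7FFFFF)

print(w2f(fp_div(f2w(7.5), f2w(2.5))))
print(w2f(fp_div(f2w(1.125), f2w(1.5))))
```

Computing one extra quotient bit up front means the left-shift case loses no precision; a hardware divider that produces only 24 bits would shift in a zero instead.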
30
IEEE Floating Point Division Hardware
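[Figure: division datapath. EA and EB feed Subtract (Excess 127); SA and SB feed Sign Logic (XOR); MA and MB feed an Integer Divider producing Q and R (annotated "HOW?"); Normalize (SL 1) with Adjust (Decrement) of the exponent, with the note "separate norm. and exp. if R needed"; outputs EY, SY, and MY.]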
31
Floating Point Extra Bit Errors
  • Bit Shifting for Align and Normalize can create
    wider words
  • Must be reduced to standard width result
  • Reduction creates error and bias, depending on
    method
  • Truncation
  • Rounding
  • Others

32
Extra Bit Errors - Examples
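4-bit Addition Example
  • Operands: $.1101 \times 2^0 + .1001 \times 2^3$
  • Align (SR3): $.1101 \times 2^0 = .00011010 \times 2^3$
  • Full-width sum: $(.00011010 + .10010000) \times 2^3 = .10101010 \times 2^3$
  • If 4-bit Add: $(.0001 + .1001) \times 2^3 = .1010 \times 2^3$
  • Reducing width causes a small error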
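
The size of that error depends on the reduction method. The sketch below compares truncation with a simple round-half-up on the 8-bit sum from this example; both helper names are hypothetical, and round-half-up is only one of several possible rounding methods.

```python
from fractions import Fraction

def truncate(bits, drop):
    """Discard the low `drop` bits."""
    return bits >> drop

def round_half_up(bits, drop):
    """Add half an LSB of the kept field, then discard the low bits.
    The result can carry into one extra bit, which would need renormalizing."""
    return (bits + (1 << (drop - 1))) >> drop

full = 0b00011010 + 0b10010000          # 8 fraction bits, scaled by 2^3
exact = Fraction(full, 2 ** 8) * 8
for name, kept in (("truncate", truncate(full, 4)),
                   ("round", round_half_up(full, 4))):
    approx = Fraction(kept, 2 ** 4) * 8
    print(name, format(kept, "04b"), approx, exact - approx)
```

Truncation always errs toward zero for positive values, which is the bias the previous slide refers to; rounding spreads the error in both directions.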
33
Extra Bit Errors - Examples
4-bit Subtraction Example
  • We must usually consider
  • Increased ALU / Reg width
  • Rounding method

34
Floating Point Status
  • Separate from the fixed point status bits
  • Extra information available
  • Overflow: exponent too large (add, mult)
  • Underflow: exponent too small (mult, div) → 0
  • Zero: (mult by 0, div by 0, add 0s, sub )
  • Sign: sign of result (not the MSB)
  • NaN: not a legal number (0/0, $\infty/\infty$), Invalid
    Result
  • Inexact: due to rounding