Title: Floating-Point Arithmetic
1 Floating-Point Arithmetic
if ((A + A) - A != A) SelfDestruct()
Reading: Study Chapter 3.5, Skim 3.6-3.8
2 What is the problem?
- Many numeric applications require numbers over a VERY large range (e.g. nanoseconds to centuries)
- Most scientific applications require real numbers (e.g. π)
- But so far we only have integers.
- We COULD implement the fractions explicitly (e.g. ½, 1023/102934)
- We COULD use bigger integers
- Floating point is a better answer for most applications.
3 Recall Scientific Notation
- Let's start our discussion of floating point by recalling scientific notation from high school
- Numbers are represented in parts:
  - 42      =  4.200 x 10^1
  - 1024    =  1.024 x 10^3
  - -0.0625 = -6.250 x 10^-2
- Arithmetic is done in pieces (see the C sketch after this slide):
      1024     1.024 x 10^3
    -   42   - 0.042 x 10^3
       982     0.982 x 10^3
             = 9.820 x 10^2
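To make "arithmetic in pieces" concrete, here is a minimal C sketch (the struct sci type and sci_add helper are made-up names, not part of the chapter) that adds two base-10 scientific-notation values by aligning the smaller exponent to the larger one, adding the significands, and renormalizing:

#include <stdio.h>
#include <math.h>

/* Toy base-10 scientific notation: value = sig x 10^exp. */
struct sci { double sig; int exp; };

static struct sci sci_add(struct sci a, struct sci b) {
    if (a.exp < b.exp) { struct sci t = a; a = b; b = t; }   /* a has the larger exponent */
    b.sig *= pow(10.0, b.exp - a.exp);                       /* align b to a's exponent   */
    struct sci r = { a.sig + b.sig, a.exp };                 /* add the significands      */
    while (fabs(r.sig) >= 10.0) { r.sig /= 10.0; r.exp++; }  /* renormalize into [1, 10)  */
    while (r.sig != 0.0 && fabs(r.sig) < 1.0) { r.sig *= 10.0; r.exp--; }
    return r;
}

int main(void) {
    struct sci a = {  1.024, 3 };            /*  1.024 x 10^3 = 1024 */
    struct sci b = { -4.200, 1 };            /* -4.200 x 10^1 = -42  */
    struct sci r = sci_add(a, b);
    printf("%.3f x 10^%d\n", r.sig, r.exp);  /* prints 9.820 x 10^2  */
    return 0;
}

Floating-point hardware performs the same alignment and renormalization, just in base 2 with a fixed number of significand bits.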
4 Multiplication in Scientific Notation
- Is straightforward:
  - Multiply together the significands
  - Add the exponents
  - Normalize if required
- Examples:
        1024      1.024 x 10^3
    x 0.0625    x 6.250 x 10^-2
          64      6.400 x 10^1

          42      4.200 x 10^1
    x 0.0625    x 6.250 x 10^-2
       2.625     26.250 x 10^-1
              =   2.625 x 10^0   (Normalized)

In multiplication, how far is the most you will ever normalize? In addition?
5 FP Binary Scientific Notation
- IEEE single-precision floating-point format: a sign bit S, an 8-bit exponent field E, and a 23-bit fraction field F
- Example: 42 = 0x42280000 in hexadecimal
    S = 0,  E = 10000100,  F = 01010000000000000000000
- Exponent: unsigned, bias-127, 8-bit integer
    E = Exponent + 127
    E = 10000100 (132), so Exponent = 132 - 127 = 5
- Significand: unsigned fixed-binary-point fraction with a hidden one
    Significand = 1 + 0.01010000000000000000000 = 1.3125
- Putting it all together (a decoding sketch in C follows this slide):
    N = (-1)^S x (1 + F) x 2^(E - 127) = (-1)^0 x (1.3125) x 2^5 = 42
- Field summary:
    F = Significand (Mantissa) - 1
    E = Exponent + 127
    S = Sign Bit
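As a sanity check on the field decoding above, a small C sketch; decode_normalized is a made-up helper name, and it assumes a normalized input (E is neither 0 nor 255), so zeros, denormals, infinities, and NaNs are deliberately ignored here:

#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* Evaluate (-1)^S x (1 + F) x 2^(E-127) from a single-precision bit pattern. */
static float decode_normalized(uint32_t bits) {
    int      s = (bits >> 31) & 0x1;                 /* sign bit              */
    int      e = (bits >> 23) & 0xff;                /* biased 8-bit exponent */
    uint32_t f = bits & 0x007fffff;                  /* 23 fraction bits      */
    float significand = 1.0f + (float)f / (float)(1 << 23);  /* hidden one    */
    return (s ? -1.0f : 1.0f) * ldexpf(significand, e - 127);
}

int main(void) {
    printf("%f\n", decode_normalized(0x42280000));   /* prints 42.000000 */
    return 0;
}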
6 Example Numbers
- One: Sign = +, Exponent = 0, Significand = 1.0
    (-1)^0 x (1.0) x 2^0 = 1
    S = 0, E = 0 + 127, F = 1.0 - 1
    0 01111111 00000000000000000000000 = 0x3f800000
- One-half: Sign = +, Exponent = -1, Significand = 1.0
    (-1)^0 x (1.0) x 2^-1 = 1/2
    S = 0, E = -1 + 127, F = 1.0 - 1
    0 01111110 00000000000000000000000 = 0x3f000000
- Minus Two: Sign = -, Exponent = 1, Significand = 1.0
    1 10000000 00000000000000000000000 = 0xc0000000
(The snippet after this slide prints these bit patterns directly.)
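The three encodings above are easy to confirm on any machine with IEEE 754 single precision; this small sketch (bits_of is a made-up helper) reinterprets a float's bit pattern as an integer and prints it in hex:

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <inttypes.h>

static uint32_t bits_of(float x) {
    uint32_t b;
    memcpy(&b, &x, sizeof b);        /* reinterpret the float's bits */
    return b;
}

int main(void) {
    printf("1.0f  -> 0x%08" PRIx32 "\n", bits_of(1.0f));    /* 0x3f800000 */
    printf("0.5f  -> 0x%08" PRIx32 "\n", bits_of(0.5f));    /* 0x3f000000 */
    printf("-2.0f -> 0x%08" PRIx32 "\n", bits_of(-2.0f));   /* 0xc0000000 */
    return 0;
}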
7 Zeros
- How do you represent 0?
  - Zero: Sign = ?, Exponent = ?, Significand = ?
  - Here's where the hidden 1 comes back to bite you
  - Hint: Zero is small. What's the smallest number you can generate?
    E = Exponent + 127 = 0, Exponent = -127, Significand = 1.0
    (-1)^0 x (1.0) x 2^-127 = 5.87747 x 10^-39
- IEEE Convention
  - When E = 0 (Exponent = -127), we'll interpret numbers differently:
    0 00000000 00000000000000000000000 = 0    (not 1.0 x 2^-127)
    1 00000000 00000000000000000000000 = -0   (not -1.0 x 2^-127)

Yes, there are 2 zeros (a quick check follows this slide). Setting E = 0 is also used to represent a few other small numbers besides 0. In all of these numbers there is no hidden one assumed in F, and they are called the unnormalized (denormalized) numbers. WARNING: If you rely on these values, you are skating on thin ice!
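Both zeros really exist, and they compare equal even though their bit patterns differ; a quick check, assuming IEEE 754 single-precision floats:

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <inttypes.h>

int main(void) {
    float pz = 0.0f, nz = -0.0f;
    uint32_t pb, nb;
    memcpy(&pb, &pz, sizeof pb);
    memcpy(&nb, &nz, sizeof nb);
    printf("+0.0f bits: 0x%08" PRIx32 "\n", pb);   /* 0x00000000 */
    printf("-0.0f bits: 0x%08" PRIx32 "\n", nb);   /* 0x80000000 */
    printf("+0.0f == -0.0f: %d\n", pz == nz);      /* 1: the two zeros compare equal */
    return 0;
}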
8 Infinities
- IEEE floating point also reserves the largest possible exponent to represent unrepresentably large numbers
  - Positive Infinity: S = 0, E = 255, F = 0
    0 11111111 00000000000000000000000 = +∞
    0x7f800000
  - Negative Infinity: S = 1, E = 255, F = 0
    1 11111111 00000000000000000000000 = -∞
    0xff800000
- Other bit patterns with E = 255 (F ≠ 0) are used to represent exceptions or Not-A-Number (NaN): √-1, -∞ x 42, 0/0, ∞/∞, log(-5)
- The standard does, however, attempt to handle a few special cases:
    1/0 = +∞, -1/0 = -∞, log(0) = -∞
(The snippet below generates a few of these values.)
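A short sketch, assuming IEEE 754 semantics and C99's isinf/isnan, of how these special values arise and behave:

#include <stdio.h>
#include <math.h>

int main(void) {
    float zero = 0.0f;
    float pos_inf =  1.0f / zero;      /* +infinity, bit pattern 0x7f800000 */
    float neg_inf = -1.0f / zero;      /* -infinity, bit pattern 0xff800000 */
    float not_a_number = zero / zero;  /* NaN: E = 255, F != 0              */

    printf("%f  %f  %f\n", pos_inf, neg_inf, not_a_number);
    printf("isinf(pos_inf) = %d\n", isinf(pos_inf));
    printf("isnan(not_a_number) = %d\n", isnan(not_a_number));
    printf("not_a_number == not_a_number: %d\n",
           not_a_number == not_a_number);          /* 0: NaN compares unequal to itself */
    return 0;
}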
9 Low-End of the IEEE Spectrum
[Figure: the number line near zero, marked at 0, 2^-bias, 2^(1-bias), and 2^(2-bias). Normal numbers with the hidden bit cover the region at and above 2^-bias; between 0 and 2^-bias lies the "denorm gap".]

The gap between 0 and the next representable normalized number is much larger than the gaps between nearby representable numbers. The IEEE standard uses denormalized numbers to fill in the gap, making the distances between numbers near 0 more alike.

[Figure: the same number line with denormalized numbers filling the gap below 2^-bias; these values carry p-1 or fewer bits of precision, while normalized numbers carry p bits of precision.]
- Denormalized numbers have a hidden 0 and a fixed exponent of -126:
    X = (-1)^S x 2^-126 x (0.F)
- NOTE: Zero is represented using 0 for the exponent and 0 for the mantissa. Either +0 or -0 can be represented, based on the sign bit.
(The snippet below shows how denormals shrink the gap.)
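To see the gap-filling in actual numbers, a small sketch assuming IEEE 754 single precision with denormal support (FLT_MIN and ldexpf are standard C):

#include <stdio.h>
#include <float.h>
#include <math.h>

int main(void) {
    /* Smallest normalized single: 1.0 x 2^-126. */
    printf("smallest normalized   = %g\n", FLT_MIN);              /* ~1.17549e-38 */
    /* Smallest denormalized single: 2^-23 x 2^-126 = 2^-149. */
    printf("smallest denormalized = %g\n", ldexpf(1.0f, -149));   /* ~1.4013e-45  */
    /* Denormals fill the space between 0 and FLT_MIN in steps of 2^-149,
       instead of leaving one big jump of size 2^-126. */
    return 0;
}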
10 Floating Point AIN'T NATURAL
- It is CRUCIAL for computer scientists to know that floating-point arithmetic is NOT the arithmetic you learned since childhood
- 1.0 is NOT EQUAL to 10 x 0.1 (Why?)
  - 1.0 x 10.0 == 10.0
  - 0.1 x 10.0 != 1.0
  - 0.1 decimal = 1/16 + 1/32 + 1/256 + 1/512 + 1/4096 + ...
                = 0.0 0011 0011 0011 0011 0011 ... in binary
  - In decimal, 1/3 is a repeating fraction: 0.333333...
  - If you quit at some fixed number of digits, then 3 x 1/3 != 1
- Floating-point arithmetic IS NOT associative
  - x + (y + z) is not necessarily equal to (x + y) + z
- Addition may not even result in a change
  - (x + 1) MAY == x
(See the demonstration sketch below.)
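A minimal demonstration of these surprises in C. The results assume IEEE 754 doubles; note that the single product 0.1 x 10.0 can happen to round back to exactly 1.0, so the sketch accumulates 0.1 ten times to expose the error instead:

#include <stdio.h>

int main(void) {
    /* Summing 0.1 ten times does not give exactly 1.0. */
    double sum = 0.0;
    for (int i = 0; i < 10; i++) sum += 0.1;
    printf("sum == 1.0? %d   (sum = %.17g)\n", sum == 1.0, sum);   /* 0 */

    /* Addition is not associative. */
    double x = 1e20, y = -1e20, z = 1.0;
    printf("(x + y) + z = %g\n", (x + y) + z);    /* 1 */
    printf("x + (y + z) = %g\n", x + (y + z));    /* 0: z is lost against y */

    /* Adding 1 may not change a large value at all. */
    printf("(x + 1) == x? %d\n", (x + 1.0) == x); /* 1 */
    return 0;
}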
11 Floating Point Disasters
- Scud Missiles get through, 28 die
  - In 1991, during the 1st Gulf War, a Patriot missile defense system let a Scud get through; it hit a barracks and killed 28 people. The problem was due to a floating-point error when taking the difference of a converted and scaled integer. (Source: Robert Skeel, "Round-off error cripples Patriot Missile", SIAM News, July 1992.)
- $7B Rocket crashes (Ariane 5)
  - When the first ESA Ariane 5 was launched on June 4, 1996, it lasted only 39 seconds; then the rocket veered off course and self-destructed. An inertial system produced a floating-point exception while trying to convert a 64-bit floating-point number to an integer. Ironically, the same code was used in the Ariane 4, but the larger values were never generated (http://www.around.com/ariane.html).
- Intel Ships and Denies Bugs
  - In 1994, Intel shipped its first Pentium processors with a floating-point divide bug. The bug was due to bad look-up tables used to speed up quotient calculations. After months of denials, Intel adopted a no-questions-asked replacement policy, costing $300M. (http://www.intel.com/support/processors/pentium/fdiv/)
12 Floating-Point Multiplication
Step 1: Multiply significands, add exponents
    ER = E1 + E2 - 127
    (ExponentR + 127 = (Exponent1 + 127) + (Exponent2 + 127) - 127)
Step 2: Normalize the result (the product of two significands in [1,2) lies in [1,4)); at most we shift right one bit and fix the exponent

[Datapath: the two operands' S, E, F fields feed a 24-by-24 multiplier for the significands and an adder with "Subtract 127" for the exponents; control drives a mux ("Shift Right by 1") and an "Add 1" to the exponent for normalization, followed by rounding, producing the result's S, E, F. A bit-level C sketch follows.]
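The two steps map directly onto integer operations on the bit patterns. Below is a rough C sketch (fp_mul is a made-up name; it assumes both inputs are normalized, finite, and nonzero, truncates instead of rounding, and ignores overflow and underflow, so it illustrates the datapath rather than being a complete multiplier):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

static uint32_t fp_mul(uint32_t a, uint32_t b) {
    uint32_t sign = (a ^ b) & 0x80000000u;           /* sign of the product      */
    int32_t  ea = (a >> 23) & 0xff;                  /* biased exponents         */
    int32_t  eb = (b >> 23) & 0xff;
    uint64_t fa = (a & 0x007fffffu) | 0x00800000u;   /* 24-bit significands with */
    uint64_t fb = (b & 0x007fffffu) | 0x00800000u;   /* the hidden one restored  */

    /* Step 1: multiply significands (24 by 24), add exponents, subtract one bias. */
    uint64_t prod = fa * fb;                         /* 48-bit product           */
    int32_t  er   = ea + eb - 127;

    /* Step 2: the significand product lies in [1,4); if it is >= 2,
       shift right one bit and add 1 to the exponent. */
    if (prod & (1ull << 47)) { prod >>= 1; er += 1; }

    uint32_t frac = (uint32_t)(prod >> 23) & 0x007fffffu;  /* drop the hidden one */
    return sign | ((uint32_t)er << 23) | frac;
}

int main(void) {
    float a = 42.0f, b = 0.0625f, r;
    uint32_t ab, bb, rb;
    memcpy(&ab, &a, sizeof ab);
    memcpy(&bb, &b, sizeof bb);
    rb = fp_mul(ab, bb);
    memcpy(&r, &rb, sizeof r);
    printf("42 x 0.0625 = %f\n", r);   /* prints 2.625000 */
    return 0;
}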
13 Floating-Point Addition
14 MIPS Floating Point
- Floating-point co-processor
  - 32 floating-point registers
  - separate from the 32 general-purpose registers
  - 32 bits wide each
  - use an even-odd pair for double precision
- add.d fd, fs, ft : fd = fs + ft in double precision
- add.s fd, fs, ft : fd = fs + ft in single precision
- sub.d, sub.s, mul.d, mul.s, div.d, div.s, abs.d, abs.s
- l.d fd, address : load a double from address
- l.s, s.d, s.s
- Conversion instructions
- Compare instructions
- Branch (bc1t, bc1f)
15 Chapter Three Summary
- Computer arithmetic is constrained by limited precision
- Bit patterns have no inherent meaning, but standards do exist:
  - two's complement
  - IEEE 754 floating point
- Computer instructions determine the meaning of the bit patterns
- Performance and accuracy are important, so there are many complexities in real machines (i.e., algorithms and implementation)
- Accurate numerical computing requires methods quite different from those of the math you learned in grade school.