Title: Floating Point Errors
1. Floating Point Errors
Example: base B = 2, exponent E = 128
Even though 10^-7 can be represented accurately by
itself, it cannot be accurately added to a much
larger number
2. Roundoff Errors
Often, simple decimal fractions do not have a
finite binary representation.
For example, 1/10 does not have a finite binary
representation: 1/10 = 0.0001100110011..._2
(repeating binary)
Rational fractions can be exactly expressed with
a finite number of bits if and only if they can
be expressed as q = p/2^n, with p and n being
integers
Proof: Assume we can express q with a finite
number of bits:
  q = (0.a1 a2 ... an)_2 = (a1*2^(n-1) + ... + an)/2^n = p/2^n
3. Roundoff Errors
For example, 1/10 cannot be written as
p/2^n: 1/10 = 0.0001100110011..._2 (repeating
binary)
Infinite binary expansions cannot be stored in a
finite-sized computer
4. Roundoff Errors and Floating Point Arithmetic
  program main
  implicit none
  integer i
  real x, dx
  x  = 0.0
  dx = 0.1
  do i = 1, 100
    x = x + dx
    write(*,*) ' i = ', i, ' x = ', x
  end do
  end program main
Selected output:
  i =  10   x = 1.00000012
  i =  20   x = 2.00000024
  i =  30   x = 2.99999928
  i =  40   x = 3.99999833
  i =  50   x = 4.99999762
  i =  60   x = 5.99999666
  i =  70   x = 6.99999571
  i =  80   x = 7.99999475
  i =  90   x = 8.99999809
  i = 100   x = 10.0000019
Errors are introduced by the truncation
(rounding) of the number dx = 0.1 in its binary
representation
5. Roundoff Errors and Floating Point Arithmetic
What is the error introduced by the rounding?
Let's suppose we write the number x in normalized
binary notation (taking x > 0 for simplicity):
  x = (0.a1 a2 a3 ...)_2 * 2^e,  with a1 = 1
If the mantissa has 23 bits, a nearby machine
number x- can be found by dropping the excess bits
a24, a25, ...:
  x- = (0.a1 a2 ... a23)_2 * 2^e
6. Roundoff Errors and Floating Point Arithmetic
Another machine number x+ can be obtained by
rounding up. It is found by adding one unit to
a23 in the expression for x-:
  x+ = ((0.a1 a2 ... a23)_2 + 2^-23) * 2^e
Let's assume that x lies closer to x-; then
  |x - x-| <= (x+ - x-)/2 = 2^(e-24)
7. Roundoff Errors and Floating Point Arithmetic
In this case, the relative error is bounded by
  |x - x-| / |x| <= 2^(e-24) / 2^(e-1) = 2^-t
(using |x| >= 2^(e-1), since a1 = 1), where
ε = 2^-t is called the machine precision and
t is the number of bits of the mantissa.
For our 32-bit example t is typically 23 → ε =
2^-23 ≈ 1.2×10^-7
8. Floating Point Accuracy (32 Bit)
Typically with 32 bits (4 bytes) we can represent
numbers in the ranges -3.402823×10^38 to
-1.175494×10^-38, 0.0, and 1.175494×10^-38
to 3.402823×10^38
The precision is related to the number of bits of
the mantissa, and amounts to about 6-9 decimal digits
We can represent the number 1.23456 to full
precision, but we cannot represent 1.23456789
9. Floating Point Accuracy (64 Bit)
Typically with 64 bits (8 bytes) we can represent
numbers in the ranges -1.797693×10^308 to
-2.225073×10^-308, 0.0, and 2.225073×10^-308
to 1.797693×10^308
The precision is about 15-17 decimal digits
10. Roundoff Errors and Floating Point Arithmetic
We have seen that the relative error can be
expressed as
  (rd(x) - x) / x = ε,  with |ε| <= 2^-t
With this, rd(x) can be estimated as
  rd(x) = x (1 + ε)
11. Errors in Floating Point Arithmetic
Floating point operations on modern hardware are
defined as
  fl(x + y) = (x + y)(1 + ε1)
  fl(x - y) = (x - y)(1 + ε2)
  fl(x * y) = (x * y)(1 + ε3)
  fl(x / y) = (x / y)(1 + ε4)
where x and y are machine numbers, the εi denote
the error associated with each operation, and
|εi| <= ε
12. Example of Floating Point Arithmetic
  !Example 1.3
  !Taken from Stoer and Bulirsch p8
  real*4 a,b,c
  real*4 x,y
  a = 0.000023371258
  b = 33.678429
  c = -33.677811
  x = a+(b+c)
  y = (a+b)+c
  print*,' '
  print*,'a+(b+c) = ',x
  print*,'(a+b)+c = ',y
  print*,'Correct answer is a+b+c = 0.000641371258'
  print*,''
  end
13. Error Propagation in Floating Point Arithmetic
In the last example, we calculated y = a+b+c in
two ways:
1st way: y = a+(b+c)
2nd way: y = (a+b)+c
1st way:
  fl(b+c) = (b+c)(1 + ε1)
  fl(a + fl(b+c)) = (a + (b+c)(1 + ε1))(1 + ε2)
14. Error Propagation in Floating Point Arithmetic
Relative error (neglecting higher order terms in
ε):
  1st way:  εy ≈ (b+c)/(a+b+c) ε1 + ε2
  2nd way:  εy ≈ (a+b)/(a+b+c) ε1 + ε2
Depending on whether |b+c| or |a+b| is the
smaller of the two, it is better to calculate
a+(b+c) rather than (a+b)+c
15. Input Errors and Condition Number
How do errors in the input data affect the
results of an algorithm?
Let's assume that the algorithm φ takes a vector
of real numbers x = (x1, x2, x3, ..., xn) into
an output vector y = (y1, y2, y3, ..., yn)
Let's assume that the φi have continuous 1st
derivatives.
16. Input Errors and Condition Number
Let's define the relative input and output errors
as
  εxj = Δxj / xj,   εyi = Δyi / yi
Expand φ(x) in a Taylor series and neglect higher
order terms:
  εyi ≈ Σj (xj/φi)(∂φi/∂xj) εxj
The factors (xj/φi)(∂φi/∂xj) are called the
condition numbers.
17. Input Errors and Condition Number
If any one of the condition numbers has a large
absolute value → φ is ill-conditioned
Otherwise → φ is well-conditioned
18. Input Errors and Condition Number
Example: Consider the following algorithm:
  y = φ(a, b) = a + b
Let's assume that the values of a and b have some
errors εa and εb
The 2 condition numbers are
  a/(a+b)  and  b/(a+b)
19. Input Errors and Condition Number
With these two condition numbers the relative
error εy becomes
  εy = (a/(a+b)) εa + (b/(a+b)) εb
Case 1: a·b > 0
Both condition numbers are bounded by 1, so εy
remains of the same order as εa and εb
→ Algorithm is well-conditioned
Case 2: a ≈ -b
The condition numbers become large
→ Algorithm is ill-conditioned