A gentle introduction to floating point arithmetic - PowerPoint PPT Presentation

1 / 15

About This Presentation

Title:

A gentle introduction to floating point arithmetic

Description:

This series can converge to either 5, 6, and 100. Depends on the value of x0, x1. If x0 = 11/2, x1 = 61/11, the series should be converged to 6 ... – PowerPoint PPT presentation

Number of Views:71

Avg rating:3.0/5.0

Slides: 16

Provided by: chho3

Category:

more less

Transcript and Presenter's Notes

Title: A gentle introduction to floating point arithmetic

1
A gentle introduction to floating point arithmetic

Ho Chun Hok (cho_at_doc.ic.ac.uk)
Custom Computing Group Seminar25 Nov 2005

2
IEEE 754 floating point standard
vsize
esize
fsize
Let v

Normal numbers (when exponent gt 0 and lt max
exponent)
v (-1)s x 2exponent x (1.fraction)
Subnormal numbers (when exponent 0)
v (-1)s x 2exponent x (0.fraction)
Special numbers (when exponent max exponent)
Infinity, Nan (not a number)
precisions
Single esize 8, fsize 23, vsize 32
Double esize 11, fsize 52, vsize 64
Double extended, vsize gt 64
Operations
, -, x, /, sqrt, f2i, i2f, compare, f2d, d2f
Rounding
Nearest even, inf, -inf, towards 0

3
What IEEE 754 standard supposes to be

Approximation to real number with expected error
the epsilon can of any real number can be
determined when mapping to floating point number
Results of all operations can be correctly
rounded in case of inexact result
Ensure some math properties hold (in general),
xyyx, -(-a) a, agtb cgt0 ? ac gt bc,
x0 x, yy gt 0
All exception can be detected
Using exception flags
Same results across different machines

4
How to ensure the standard?

Processor?
Rounding numbers in different mode
Gradual underflow
Raise exceptions
Operating System?
Handle exception
Handle function which may not be supported in
hardware (what if a processor cannot handle
subnormal number)
Keep track of the floating point unit state,
(precision, rounding mode)
Programming Language?
Well-defined semantic for floating point (yes, we
have infamous JAVA language)
Compiler?
Preserve the semantic defined in that language
Programmer?
read What Every Computer Scientist Should Know
About Floating-Point Arithmetic

5
Case study 1

int main (void)
double ref,index
double tmp
int i
ref (double) 169.0/ (double) 170.0
for(i0ilt250i)
indexi
if(ref (double) (index/(index1.0)) )
break
printf("id\n", i)
return 0

6
Visual C compiler, running on P-M

Same result on lulu (pentium 3) and irina (Xeon)

7
GCC, running on Pentium 4 (skokie)
8
VCC fld qword ptr ebp-10h fadd
qword ptr __real_at_8_at_3fff8000000000000000
(00426028) fdivr qword ptr ebp-10h fcomp
qword ptr ebp-8 fnstsw ax test
ah,40h je main5Fh (0040106f) jmp
main61h (00401071)

gcc
fld1
faddp st,st(1)
fldl 0xfffffff0(ebp)
fdivp st,st(1)
fldl 0xfffffff8(ebp)
fxch st(1)
fucompp
fnstsw ax
and 0x45,ah
cmp 0x40,ah
je 0x80483d2 ltmain102gt

VCC use normal stack (ebp) (64-bits) to store the
result, and compare with a 64bit double precision
value GCC use advanced FPU register stack (st)
(80-bits) to store the result, and compare with a
64bit double precision value
9
Case study 1

Its compiler issue
Using more precision to calculate the
intermediate result is a good idea
Compiler should convert the 80-bit floating point
number to 64-bit before comparison
And its programmer issue too
Equality test between FP variables is dangerous
We can detect the problem before it hurts.
It is not easy to compliance with the standard

10
Case study 2

Calculate
When x is large, result 0, rather than
Beware, even everything compliance with standard,
the standard cannot guarantee the result is
always correct
Again, programmer should detect this before it
hurts
Define routine to trap the exception
Exceptions are not errors as long as they are
handled correctly

11
Case Study 3

Jean-Michel Mullers Recurrence

12
Using double extended precision (80-bits)

x2 5.590164e00
x3 5.633431e00
x4 5.674649e00
x5 5.713329e00
x6 5.749121e00
x7 5.781811e00
x8 5.811314e00
x9 5.837660e00
x10 5.861018e00

x11 5.882514e00
x12 5.918471e00
x13 6.240859e00
x14 1.115599e01
x15 5.279838e01
x16 9.469105e01
x17 9.966651e01
x18 9.998007e01
x19 9.999881e01

Converge to 100.0, it seems correct
13
Case Study 3

This series can converge to either 5, 6, and 100
Depends on the value of x0, x1
If x0 11/2, x1 61/11, the series should be
converged to 6
Little round off error may affect the result
dramatically
We can calculate the result analytically by
substituting
In general, its very difficult to detect this
error

14
Case Study 4

Table makers dilemma
If we want n-digit accuracy for elementary
function like sine, cosine, we (in most case)
need to calculate the digit up to n2 digit
What if the last 2-digit is 10?
We can calculate last 3 digit
What if the last 3-digit is 100?
We can calculate last 4 digit
What if
The result of most elementary function library
(e.g. libm) is not correctly rounded in some
cases

15
Conclusion

We have a standard representation for floating
point number
Comforting the standard requires collaboration
between different parties
Even if we have standard-compliance platform,
cautions must be taken when underflow,
overflow, and make sure the algorithm is
numerically stable
When using elementary function, dont expect the
result can be comparable between different
machine
Elementary function is NOT included in the IEEE
standard
Floating point, when use properly, can do
something serious