A gentle introduction to floating point arithmetic

1
A gentle introduction to floating point arithmetic
  • Ho Chun Hok (cho_at_doc.ic.ac.uk)
  • Custom Computing Group Seminar, 25 Nov 2005

2
IEEE 754 floating point standard
[Figure: bit layout of a value v with a sign bit s, an exponent field of esize bits and a fraction field of fsize bits, vsize bits in total]
  • Normal numbers (when 0 < exponent < max exponent)
    • v = (-1)^s x 2^exponent x (1.fraction)
  • Subnormal numbers (when exponent = 0)
    • v = (-1)^s x 2^exponent x (0.fraction)
  • Special numbers (when exponent = max exponent)
    • Infinity, NaN (not a number)
  • Precisions
    • Single: esize = 8, fsize = 23, vsize = 32
    • Double: esize = 11, fsize = 52, vsize = 64
    • Double extended: vsize > 64
  • Operations
    • +, -, x, /, sqrt, f2i, i2f, compare, f2d, d2f
  • Rounding
    • Nearest even, towards +inf, towards -inf, towards 0

3
What the IEEE 754 standard is supposed to be
  • An approximation to real numbers with an expected
    error
    • the error (epsilon) of any real number can be
      determined when it is mapped to a floating
      point number
  • Results of all operations are correctly rounded
    when the result is inexact
  • Some math properties are guaranteed to hold (in
    general):
    • x+y = y+x, -(-a) = a,
      a >= b and c >= 0 => ac >= bc,
      x+0 = x, y*y >= 0
  • All exceptions can be detected
    • Using exception flags
  • Same results across different machines

4
How to ensure compliance with the standard?
  • Processor?
    • Round numbers in different modes
    • Gradual underflow
    • Raise exceptions
  • Operating System?
    • Handle exceptions
    • Handle functions that may not be supported in
      hardware (what if a processor cannot handle
      subnormal numbers?)
    • Keep track of the floating point unit state
      (precision, rounding mode)
  • Programming Language?
    • Well-defined semantics for floating point (yes,
      we have the infamous Java language)
  • Compiler?
    • Preserve the semantics defined in that language
  • Programmer?
    • Read "What Every Computer Scientist Should Know
      About Floating-Point Arithmetic"
5
Case study 1
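    #include <stdio.h>

    int main(void)
    {
        double ref, index;
        double tmp;   /* unused in this excerpt */
        int i;

        ref = (double) 169.0 / (double) 170.0;
        /* look for the i whose quotient i/(i+1) equals ref exactly */
        for (i = 0; i < 250; i++) {
            index = i;
            if (ref == (double) (index / (index + 1.0)))
                break;
        }
        printf("i=%d\n", i);
        return 0;
    }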

6
Visual C++ compiler, running on Pentium M
  • Same result on lulu (Pentium 3) and irina (Xeon)

7
GCC, running on Pentium 4 (skokie)

8
  • VCC
        fld     qword ptr [ebp-10h]
        fadd    qword ptr [__real@8@3fff8000000000000000 (00426028)]
        fdivr   qword ptr [ebp-10h]
        fcomp   qword ptr [ebp-8]
        fnstsw  ax
        test    ah,40h
        je      main+5Fh (0040106f)
        jmp     main+61h (00401071)
  • gcc
        fld1
        faddp   %st,%st(1)
        fldl    0xfffffff0(%ebp)
        fdivp   %st,%st(1)
        fldl    0xfffffff8(%ebp)
        fxch    %st(1)
        fucompp
        fnstsw  %ax
        and     $0x45,%ah
        cmp     $0x40,%ah
        je      0x80483d2 <main+102>

  • VCC uses the normal stack (ebp, 64 bits) to store
    the result, and compares it with a 64-bit double
    precision value
  • GCC uses the FPU register stack (st, 80 bits) to
    store the result, and compares it with a 64-bit
    double precision value

9
Case study 1
  • It's a compiler issue
    • Using more precision to calculate the
      intermediate result is a good idea
    • But the compiler should convert the 80-bit
      floating point number to 64 bits before the
      comparison
  • And it's a programmer issue too
    • Equality tests between FP variables are
      dangerous
    • We can detect the problem before it hurts
  • It is not easy to comply with the standard

10
Case study 2
  • Calculate [expression not preserved in the
    transcript]
  • When x is large, the result is 0, rather than
    [value not preserved in the transcript]
  • Beware: even if everything complies with the
    standard, the standard cannot guarantee that the
    result is always correct
  • Again, the programmer should detect this before
    it hurts
    • Define a routine to trap the exception
    • Exceptions are not errors as long as they are
      handled correctly

11
Case Study 3
  • Jean-Michel Muller's recurrence

12
Using double extended precision (80 bits)
  • x2 = 5.590164e+00
  • x3 = 5.633431e+00
  • x4 = 5.674649e+00
  • x5 = 5.713329e+00
  • x6 = 5.749121e+00
  • x7 = 5.781811e+00
  • x8 = 5.811314e+00
  • x9 = 5.837660e+00
  • x10 = 5.861018e+00
  • x11 = 5.882514e+00
  • x12 = 5.918471e+00
  • x13 = 6.240859e+00
  • x14 = 1.115599e+01
  • x15 = 5.279838e+01
  • x16 = 9.469105e+01
  • x17 = 9.966651e+01
  • x18 = 9.998007e+01
  • x19 = 9.999881e+01

It converges to 100.0, which seems correct

13
Case Study 3
  • This series can converge to 5, 6 or 100
    • It depends on the values of x0 and x1
    • If x0 = 11/2 and x1 = 61/11, the series should
      converge to 6
  • A little round-off error may affect the result
    dramatically
  • We can calculate the result analytically by
    substituting [expression not preserved in the
    transcript]
  • In general, it's very difficult to detect this
    error

14
Case Study 4
  • Table maker's dilemma
    • If we want n-digit accuracy for an elementary
      function like sine or cosine, we (in most
      cases) need to calculate the result up to n+2
      digits
    • What if the last 2 digits are 10?
    • We can calculate the last 3 digits
    • What if the last 3 digits are 100?
    • We can calculate the last 4 digits
    • What if ...
  • The results of most elementary function libraries
    (e.g. libm) are not correctly rounded in some
    cases

15
Conclusion
  • We have a standard representation for floating
    point numbers
  • Conforming to the standard requires collaboration
    between different parties
  • Even on a standard-compliant platform, care must
    be taken with underflow and overflow, and the
    algorithm must be numerically stable
  • When using elementary functions, don't expect the
    results to be comparable between different
    machines
    • Elementary functions are NOT included in the
      IEEE standard
  • Floating point, when used properly, can do
    something serious