Title: Standardization and Testing of Mathematical Functions
1Standardization and Testing of Mathematical
Functions
- Victor Kuliamin
- Institute for System Programming, Russian Academy
of Sciences - Moscow
2Mathematical Modeling
- Astrophysics
- Geosciences
- Biosciences
- Social Sciences
- Confidence?
3Mathematical Libraries
- Floating-point numbers arithmetics
- Basic math functions
- Elementary
- Special
- Specialized libraries
- Linear algebra
- Number theory
- Numeric calculus
- Dynamic systems
- Optimization
sqrt, pow, exp, log, sin, atan, cosh,
erf, tgamma, j0, y1,
4IEEE 754 Floating-point Numbers
k1
n-1
0
1
k
n, k
0
0
1
1
1
1
1
1
0
1
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
S
E
M
sign
exponent
mantissa
B 2(k1) 1
- Normal E gt 0 E lt 2k 1 X
(1)S2(EB)(1M/2(nk1)) - Denormal E 0 X (1)S2(B1)(M/2(n
k1)) -
- Exceptional E 2k 1
- M 0 ?, ?
- M ? 0 NaN
2(1)1.1012 13/16 0,8125
0, -0
1/0 ?, (1)/0 ?
n 32, k 8 float (single precision) n
64, k 11 double n 79, k 15
extended double n 128, k 15 quadruple
0/0 NaN
1/2(n-k-1) 1 ulp
5IEEE 754 computations
- Correct rounding 4 rounding modes
- to ?
- to ?
- to 0
- to the nearest
- Exception flags
- INVALID Incorrect arguments (NaN result)
- DIVISION-BY-ZERO Infinite result (precise 8)
- OVERFLOW Too big result (approximate 8)
- UNDERFLOW Too small (or denormal) result
- INEXACT Inexact result
0
6Tested Libraries (ISO C, POSIX)
ID Processor arch Library OS
x86 i686 glibc 2.5 Linux Fedora
ia64 ia64 glibc 2.4 Linux Debian
x86_64 x86_64 glibc 2.3.4 Linux RHEL
s390 s390 glibc 2.4 Linux Debian
ppc64 ppc64 glibc 2.7 Linux Debian
ppc32 ppc32 glibc 2.3.5 Linux SLES
sparc UltraSparc III Solaris libc Solaris 10
VC8 x86_64 MS Visual C 2005 Windows XP
VC6 i686 MS Visual C 6.0 Windows XP
7TEST RESULTS DETAILS
rint(262144.25)? 262144
logb(2-1074) -1022
expm1(2.2250738585072e-308) 5.421010862427522e-2
0
to nearest
to 8
to 0
exp(553.8042397037792) -1.710893968937284e239
to 8
exp(-6.453852113757105e-02) 2.255531908873594e1
5
sin(33.63133354799544) 7.99995094799809616e22
erf(3.296656889776298) 8.035526204864467e8
cosh(627.9957549410666) -1.453242606709252e272
cos(917.2279304172412) -13.44757421002838
acos(-1.0) -3.141592653589794
erfc(-5.179813474865007) -3.419501182737284e287
sinh(29.22104351584205) -1.139998423128585e12
sin(- 1.793463141525662e-76) 9.801714032956058e-
2
Exact
6-210 ulp errors
Errors in exceptional cases
210-220 ulp errors
1 ulp errors
Errors for denormals
gt220 ulp errors
Unsupported
2-5 ulp errors
Completely buggy
8Differences in sin() implementations
9Differences in j0() implementations
10Test results similar implementations
Unsupported
11Interoperability base
- Standards
- IEEE 754 (Floating-point arithmetics)FP numbers,
basic operations - ISO 9899 (C language and libraries)56 real 16
complex functions - IEEE 1003.1 (POSIX)63 real 22 complex
functions - ISO 10697.1-3 (Language independent
arithmetics)Elementary real and complex functions
12Standard Requirements IEEE 754
- type conversions, , , , /, sqrt, remainder,
fma (2008) - Correctly rounded results
- 4 rounding modes
- Infinite results in overflow and precise infinity
cases - In overflow rounding to 0 returns the biggest
finite number - NaN results outside of function domain (and for
NaN args) - Exception flags
- INVALID, DIVISION-BY-ZERO, OVERFLOW, UNDERFLOW,
INEXACT
13Standard Requirements ISO C POSIX
- ISO/IEC 9899 (C language) 54 real functions
- Exact values sin(0) 0, log(1) 0,
- DIVISION-BY-ZERO flag log(0), atanh(1),
pow(0,x), ?(-n) - NaN results and INVALID flag outside of domains
- IEEE 1003.1 (POSIX) 63 real 22 complex
- All IEEE 754 flags (except for INEXACT) for real
functions - errno settingDomain error INVALID or
DIVISION-BY-ZERORange error OVERFLOW or
UNDERFLOW - If x is denormalf(x) x for each f(x)x in 0
(sin, asin, sinh, expm1) - In overflow HUGE_VAL should be returned(value of
HUGE_VAL unspecified)
glibc 8 MSVCRT max double
(1.797693134862316e308) Solaris libc max
float (3.402823466385289e38)
Inconsistency with rounding modes
Source of non-interoperability
14Standard Requirements ISO 10697
- Real and complex elementary functions (no erf,
gamma, j0, y1, ) - Only symmetric rounding modes (no rounding to ?
or to ?) - Preservation of sign
- Preservation of monotonicity
- Inaccuracy 0.5-2.0 ulp
- Evenness and oddity
- Exact values cosh(0) 1, log(1) 0,
- Asymptotics near 0 cos(x) 1, sin(x) x,
- Relations expm1 lt exp, cosh gt sinh, atan lt
?( p/2 ) ,
for sin, cos, tan small arguments only
15Summary of Requirements
- Domain boundaries and poles ( flags)
- Exact values, limits and asymptotics
- Preservation of sign and monotonicity
- SymmetriesEvenness, periodicity, others
?(1x) x?(x) - Relations and range boundaries
- Precision
- Correct rounding (according to mode)
- Computational accuracy
- Interoperability and portabilityof libraries and
applications - Feasible ia64 (Intel), crlibm (INRIA)
16inconsistencies
- Correct rounding
- Oddity (sym. with x, 1/x)
POSIX HUGE_VAL instead of 8
POSIX f(x) x for denormal x and f(x)x in 0
Range boundaries
17Standardization proposal
- Extension of IEEE 754 on all library functions
- Correctly rounded results for 4 modes
- Except for ones contradicting to range boundaries
- Infinite results in overflow and precise infinity
cases - In overflow rounding to 0 returns the biggest
finite number - NaN results outside of function domain (and for
NaN args) - Exception flags
- INVALID (and EDOM for errno) Incorrect
arguments - DIVISION-BY-ZERO (and ERANGE for errno)
Infinite result - OVERFLOW (and ERANGE for errno) Too big
result - UNDERFLOW (and ERANGE for errno) Too small
result ( dnr) - INEXACT Inexact result
18Test Data sources
- Bit structure of FP numbers
- Boundaries
- 0, -0, ?, -?, NaN
- Least and greatest positive and negative, normal
and denormal - Mantissa patterns
- FFFFFFFFFFFFF16 FFFFF1111000016
555550000FFFF16 - Both arguments and values of a function
- Intervals of uniform function behavior
- Points hard to compute correctly rounded result
rint(262144.25)? 262144 010000010001000000000000
0000000100000000000000000000000000000000 x10000010
001xxxxxxxxxxxxxxxxxx01000000000000000000000000000
00000
19Intervals and their boundaries
- Neighbourhoods of 0, 8
- Poles and overflow points
- Zeroes and extremes
- Tangents and asymtotics horizontal and diagonal
max
0
20Table Maker Dilemma (TMD)
- Rounding to the nearest
- f x.xxxxxxxxxx011111111...1xx...
- f x.xxxxxxxxxx100000000...0xx...
- Rounding to 0, ?, -?
- f x.xxxxxxxxxx00000000...0xx...
- f x.xxxxxxxxxx11111111...1xx...
0,5 ulp
!
?
- tan(1.11011111111111111111111111111111111111111111
0001111122-22) - 1.111000000000000000000000000000000000000000
0101010001 0 178 01022-22 - sin(1.11100000000000000000000000000000000000000111
0000100022-19) - 1.110111111111111111111111111111111111110000
0010111000 067 1110122-19 - j1(1.100000000000000000000000000000000000000000000
000001122-23) - 1.0111111111111111111111111111111111111111111
111101000 094 1100122-22
21Number of Hard Points
Eval. ? 0, ?, -? ? N
54 0.5 0 1
53 1 1 2
52 2 4 4
51 4 6 6
50 8 10 12
49 16 19 21
48 32 32 37
47 64 70 67
46 128 142 106
45 256 280 239
44 512 547 518
43 1024 1073 996
42 2048 2103 1985
41 4096 4187 4040
40 8192 8325 8142
- Probabilistic evaluation
- Uniform independent bits distribution
- Total N 2(n-k-1) values
- N2-m have m consecutive equal bits
- Real data for sin on exponent -16
22Hard Points Calculation
Feasible only for single precision numbers
- Exhaustive search
- Continued fractions (Kahan, 1983)
- Dyadic method (Tang, 1989 Kahan, 1994)
- Reduced search (Lefevre, 1997)
- Lattice reduction (Gonnet, 2002 Stehle, Lefevre,
Zimmermann, 2003) - Integer secants method (2007)
X Np X M2m 2(n k 1) lt M lt 2(n
k) ? p (2mM)/N
sqrt(N2m) M ½ 2(n-k-1) lt M, N lt 2(n-k)
? 2(m2)N (2M 1)2 j ? (2M 1)2
j (mod 2(m2))
F(x) f(x) ax b c1x2 c2x3 c3x4
F(x) c1(G(x) )2, G(x) x d1x2 d2x3
G(x) y ? x H(y), H is the reversed
series xm H(sqrt(m/c12z)) ? F(xm) axm b
m/2z
33864178045159811206438920823311565991202393932998
38035242121518428537554064774221620930267583474709
60206804568602636298927181441186370849986972132271
59466226343020116976329729079225588927108306160340
38541342154669787134871905353772776431251615694251
273653 p/2 1.011010101100010110110010011000
1011001010000111111110 1857 01122849 sin(1.0110
1010110001011011001001100010110010100001111111112
2849) 1.1111111111111111111111111111111111111111
1111111111 169011022-1
- j 15
- sqrt(1.0010010101100101011001011100101011011100101
1111101002) - 1.0001001000001111100110011001111010011001001101
110100 0 150 0002
2z
23Test suite composition
- Hard points
- double
- Some hard points with 48 additional bits can be
found in crlibm tests http//lipforge.ens-lyon.fr
/projects/crlibm - Calculated (some) hard points with 40
additional bits for sqrt, cbrt, sin, asin, cos,
acos, tan, atan, sinh, asinh, cosh, tanh, atanh,
exp, log, exp2, expm1, log1p, erf, erfc, j0, j1 - float (single precision)
- All hard points with 17 additional bits for
sqrt, cbrt, exp, sin, cos - extended double
- All with 53 additional bits for sqrt, some for
sin, exp - Test suites developed
- double all 37 single real variable POSIX
functions - Correct values calculated by Maple and MPFR
 sqrt exp sin atan lgamma j1
Boundary 20 20 20 20 20 20
Intrevals 106 1622 3674 4242 11680 24538
Patterns 141009 138451 331744 155008 121502 109036
Hard points 170170 28587 62342 95512 0 29436
Other 84820 0 4616 0 229 5664
Total 396125 168680 402396 254782 133431 168694
Â
24Conclusion
- No adequate standards for math librariesSeveral
standards, sometimes inconsistent, highly
incomplete - Correct rounding is needed for interoperability
- Test suites are useful even without standard
- Complete set of hard points for some function
- Multiple variable functions
25- Contact
- E-mail kuliamin_at_ispras.ru
- Web www.ispras.ru/kuliamin
- Thank you!
- Questions?