- Sep 18, 2018
-
-
Szabolcs Nagy authored
checkint in pow is not supposed to be used with 0, inf or nan inputs.
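A minimal sketch of why the restriction matters, assuming checkint classifies an exponent by inspecting the bits of its IEEE 754 double representation. The names `asuint64` and `checkint_sketch` are illustrative here and not necessarily the library's own definitions.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a double as a 64-bit unsigned integer. */
static inline uint64_t asuint64(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Hypothetical sketch: returns 0 if y is not an integer, 1 if y is an odd
   integer and 2 if y is an even integer.
   Precondition: y is finite and nonzero. For y == 0 the biased exponent is 0
   and the function would report "not an integer"; for inf and nan the
   exponent is 0x7ff and it would report "even". */
static inline int checkint_sketch(uint64_t iy)
{
    int e = (int)((iy >> 52) & 0x7ff);
    if (e < 0x3ff)
        return 0;                 /* |y| < 1 */
    if (e > 0x3ff + 52)
        return 2;                 /* |y| >= 2^53: always an even integer */
    uint64_t unit = 1ULL << (0x3ff + 52 - e);   /* bit with weight 1 */
    if (iy & (unit - 1))
        return 0;                 /* fractional bits are set */
    if (iy & unit)
        return 1;                 /* lowest integer bit is set */
    return 2;
}
```

Callers are expected to filter out 0, inf and nan before calling, e.g. `checkint_sketch(asuint64(3.0))` classifies 3.0 as odd.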
-
- Sep 05, 2018
-
-
Szabolcs Nagy authored
Add comments with enough detail so the log lookup tables can be recreated.
-
- Aug 17, 2018
-
-
Simon Tatham authored
This program was originally developed for a much earlier version of Julia, and there have been a lot of changes to Julia semantics since then. But 1.0 is intended to be stable, so updating once to work with that should mean that no further updates are needed for a long time.
Changes relevant to this script include new API calls for making arrays, evaluating Julia source-code expressions from strings, and parsing strings as numbers; scope and semantics changes requiring extra escaping in the @debug macro and some explicit 'global' to allow for-loops to update variables outside themselves; a syntax change for tuple types; and replacing 'beginswith' on strings with 'startswith'.
There are a couple of remaining inconveniences with this version of the code.
Firstly, the standard Julia 1.0 interpreter will consume a "--" option terminator on its command line even if it appears after the script name, so commands of the form 'julia remez.jl -- -1 1 ...' that worked in Julia 0.2 will no longer work. Adding an extra "--" before the script name works around this ('julia -- remez.jl -- ...') because the first "--" is seen by the Julia interpreter and the second goes to the script. But unfortunately that "--" can't be put in the #! line as well as the 'env', because of #! command line semantics. So users may have to work around this by explicitly invoking the interpreter.
Secondly, Julia 1.0 has moved some mathematical functions (e.g. erf and gamma) out of its core library into the SpecialFunctions package, so any pre-existing command lines that used those functions will now need to qualify them with a package name, and be run on a Julia installation which has that package installed.
Because Julia 0.4.x is still common (Ubuntu 16.04 and 18.04 both provide 0.4.5), I've included some backwards-compatibility code so that the script still runs on that version as well.
-
Wilco Dijkstra authored
If math.h doesn't set FP_FAST_FMA correctly, ensure HAVE_FAST_FMA is set on AArch64.
-
- Aug 08, 2018
-
-
Wilco Dijkstra authored
Improve comments. Use TOINT_INTRINSICS rather than HAVE_FAST_ROUND.
-
- Jul 30, 2018
-
-
Szabolcs Nagy authored
These are no longer maintained and only kept for WANT_SINGLEPREC build, which is useful for microcontrollers with single precision fpu only. Removed all tests that are not testing code in the default build.
-
Szabolcs Nagy authored
Don't use target flags when building host tools. The example config.mk.dist is updated accordingly.
-
Szabolcs Nagy authored
rtest is the only binary that is built for the host and has additional dependencies: mpfr and mpc. To avoid build issues only build it if the randomized tests are run. New build target is introduced for randomized testing, so make check works without rtest. Tests are no longer copied to the build directory and the runtest.sh tool is removed as it is not very useful.
-
Szabolcs Nagy authored
If a lot of benchmarks are run and the output is not a tty, the results only showed up after the stdio buffer was full. With fflush, if the output is redirected and mathbench is killed, the results printed before the kill are still available.
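The idea can be sketched as follows. The helper name and output format are hypothetical and are not taken from mathbench itself.

```c
#include <stdio.h>

/* Hypothetical helper: print one benchmark result line and flush right away,
   so the line reaches a pipe or file even if the process is killed later.
   Returns the number of characters written, or -1 on error. */
static int report_result_sketch(FILE *out, const char *name, double ns_per_call)
{
    int n = fprintf(out, "%-10s %8.2f ns/call\n", name, ns_per_call);
    if (n < 0)
        return -1;
    if (fflush(out) != 0)   /* push the stdio buffer to the OS now */
        return -1;
    return n;
}
```

Flushing once per result line costs little compared with a benchmark run, and it makes the behaviour independent of whether stdout is line buffered (tty) or fully buffered (pipe or file).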
-
- Jul 10, 2018
-
-
Szabolcs Nagy authored
The PREFER_FLOAT_COMPARISON setting was not correct, as it could raise spurious exceptions. Fixing it is easy: just use ISLESS(x, y) instead of abstop12(x) < abstop12(y), with an appropriate non-signaling definition for ISLESS. However, this setting does not seem very useful (there is only a minor performance difference on various architectures), so remove this option for now.
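For illustration, the two styles of comparison might look like this. This is a sketch under assumptions: `abstop12_sketch` assumes abstop12 takes the top 12 bits of a float with the sign cleared, and the C99 `isless` macro stands in for a non-signaling ISLESS.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit unsigned integer. */
static inline uint32_t asuint(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Assumed definition: exponent plus top 3 mantissa bits, sign cleared. */
static inline uint32_t abstop12_sketch(float x)
{
    return (asuint(x) >> 20) & 0x7ff;
}

/* Integer comparison: coarse, but never touches the FP status flags. */
static inline int abs_less_int_sketch(float x, float y)
{
    return abstop12_sketch(x) < abstop12_sketch(y);
}

/* Floating-point comparison through isless, which is specified not to raise
   the invalid exception when an operand is a quiet NaN (unlike plain <). */
static inline int less_quiet_sketch(float x, float y)
{
    return isless(x, y);
}
```

The integer form only compares at the granularity of the top 12 bits, so the two are not interchangeable in general; thresholds have to be chosen with that in mind.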
-
Szabolcs Nagy authored
There was a typo and the arguments were not explained clearly.
-
- Jul 04, 2018
-
-
Wilco Dijkstra authored
Use const sincos_t for clarity instead of making the typedef const. Use __inv_pi4 and __sincosf_table to avoid namespace issues with static linking.
-
Szabolcs Nagy authored
The roundtoint and converttoint internal functions are only called with small values, so a 32-bit result is enough for converttoint, and since it is a signed int conversion the natural return type is int32_t. The original idea was to help the compiler by keeping the result in uint64_t, which makes it clear that no sign extension is needed and that there is no accidental undefined or implementation-defined signed int arithmetic. But it turns out gcc does a good job with inlining, so changing the type has no overhead and the semantics of the conversion are less surprising this way. Since we want to allow the asuint64 (x + 0x1.8p52) style conversion, the top bits were never usable, and the existing code ensures that only the bottom 32 bits of the conversion result are used.
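A portable version of the asuint64 (x + 0x1.8p52) style conversion could be sketched as below. The `_sketch` functions are illustrative, assume round-to-nearest mode and small inputs (well below 2^31 in magnitude), and are not the library's definitions.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a double as a 64-bit unsigned integer. */
static inline uint64_t asuint64(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Adding 0x1.8p52 moves x into [2^52, 2^53), where the spacing of doubles
   is exactly 1, so the addition itself rounds x to an integer. */
static inline double roundtoint_sketch(double x)
{
    volatile double y = x + 0x1.8p52;  /* volatile: keep the rounding step */
    return y - 0x1.8p52;
}

/* After the shift, the low mantissa bits hold the rounded integer in two's
   complement form; only the bottom 32 bits are meaningful. The narrowing
   to int32_t relies on the usual modulo behaviour of gcc and clang. */
static inline int32_t converttoint_sketch(double x)
{
    volatile double y = x + 0x1.8p52;
    return (int32_t)(uint32_t)asuint64(y);
}
```

For example, `converttoint_sketch(-3.7)` yields -4 and `roundtoint_sketch(2.5)` yields 2.0 (ties round to even in nearest mode).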
-
Szabolcs Nagy authored
Rewrote some documentation text and fixed a GNU style issue based on feedback from Joseph Myers on libc-alpha mailing list.
-
- Jul 03, 2018
-
-
Szabolcs Nagy authored
The !HAVE_FAST_FMA code path split r = z/c - 1 into r = rhi + rlo such that when z = 1-tiny and c = 1 then rlo and rhi could have much larger magnitude than r which later caused large rounding errors. So do a nearest rounding instead of truncation at the split.
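The difference between the two splits can be sketched on z alone. This assumes the split keeps the top 32 bits of the double representation; the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

static inline uint64_t asuint64(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

static inline double asdouble(uint64_t u)
{
    double x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Truncating split: drop the low 32 bits of the representation. */
static inline double split_hi_trunc_sketch(double z)
{
    return asdouble(asuint64(z) & (~0ULL << 32));
}

/* Nearest split: add half of the dropped range first, so the high part is
   rounded to nearest and the low part z - zhi stays as small as possible. */
static inline double split_hi_nearest_sketch(double z)
{
    return asdouble((asuint64(z) + (1ULL << 31)) & (~0ULL << 32));
}
```

With z = 1 - 0x1p-53 and c = 1, truncation gives zhi = 1 - 0x1p-21, so rhi = zhi - 1 and rlo = z - zhi both have magnitude near 2^-21 while r is only -2^-53. Rounding to nearest gives zhi = 1, so rhi = 0 and rlo = r.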
-
- Jun 29, 2018
-
-
Szabolcs Nagy authored
Explain the semantics of internal functions.
-
Szabolcs Nagy authored
Whitespace changes only.
-
- Jun 27, 2018
-
-
Szabolcs Nagy authored
Extracted 16000 samples from several exp call traces. First half is quite generic, second half has large number of zeros.
-
- Jun 25, 2018
-
-
Szabolcs Nagy authored
The pow error bound was miscalculated: it is slightly below 0.54 ULP when using fma and slightly above it without fma. If 2^-400 < |pow(x,y)| < 2^400 then the error is less than 0.52 ULP with fma.
-
- Jun 22, 2018
-
-
Szabolcs Nagy authored
-
Szabolcs Nagy authored
The log part of pow got rewritten to use a slightly different algorithm. This improves precision and throughput while keeping the same table size. Near 1 cases are no longer special cased, so there is a slight performance regression in that case. When the fma instruction is not available, this algorithm is expected to have slightly worse performance. Worst-case error improved from 0.67 ULP to 0.57 ULP. On Cortex-A72 I see: throughput near 1: 7% worse; latency near 1: 2% worse; throughput general: 8% better; latency general: 2% better.
-
- Jun 20, 2018
-
-
Szabolcs Nagy authored
uint64_t works too, but the correct type is uint32_t.
-
Wilco Dijkstra authored
Add trace for sinf/cosf/sincosf with easy, hard and medium cases extracted from a trace of 1.4 million calls.
-
Wilco Dijkstra authored
Improve support for large traces by reading all of the trace and splitting it into smaller parts. Also remove the B arrays which are not required.
-
Wilco Dijkstra authored
Use int32_t rather than int since we require it to be 32 bits here.
-
Szabolcs Nagy authored
The sign needs to be fixed even in nearest rounding mode.
-
- Jun 15, 2018
-
-
Szabolcs Nagy authored
The last multiplication in exp and exp2 could underflow when it was not contracted into an fma. Changed the thresholds so the problematic cases end up in the specialcase code path (which handles underflow correctly). The initial check now only looks at the exponent bits, which performs slightly better on aarch64. The overflow threshold can be tight for exp2 but was left loose in exp, so the specialcase handling got updated accordingly. Added comments about this issue and the assumptions exp_inline is making in pow.
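An exponent-only range check of this kind might be sketched as follows. The thresholds 0x1p-54 and 512.0 and the function names are illustrative assumptions, not values stated in this log.

```c
#include <stdint.h>
#include <string.h>

static inline uint64_t asuint64(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Sign bit and biased exponent of a double. */
static inline uint32_t top12(double x)
{
    return (uint32_t)(asuint64(x) >> 52);
}

/* Returns nonzero when |x| is tiny, huge, inf or nan, i.e. when the biased
   exponent lies outside [exponent(lo), exponent(hi)). A single unsigned
   subtraction and compare covers both ends because values below the lower
   bound wrap around to large unsigned numbers. */
static inline int exp_is_special_sketch(double x)
{
    uint32_t abstop = top12(x) & 0x7ff;            /* clear the sign bit */
    uint32_t lo = top12(0x1p-54);                  /* assumed lower bound */
    uint32_t hi = top12(512.0);                    /* assumed upper bound */
    return abstop - lo >= hi - lo;
}
```

Because only exponent bits are compared, the thresholds are necessarily powers of two, which is one reason a bound may end up looser than the exact overflow or underflow limit.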
-
Szabolcs Nagy authored
The portable function prototype was wrong.
-
- Jun 13, 2018
-
-
Szabolcs Nagy authored
Previously only uniform random and linear (equidistant) input generators were supported. Now with "-g trace" numbers are read from stdin, or with "-f file" they are read from a file, so previously recorded call traces can be benchmarked. Only the first 8000 numbers are used, and if there are fewer, the numbers are repeated to fill up the benchmark array.
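The fill rule described above (use at most the first n values, repeat a shorter trace cyclically) can be sketched as below. The function name and signature are hypothetical.

```c
#include <stddef.h>

/* Hypothetical sketch: fill dst[0..n-1] from a recorded trace. Only the
   first n trace values are used; a shorter trace is repeated cyclically.
   Returns the number of distinct trace values used, or 0 if the trace is
   empty (dst is left untouched in that case). */
static size_t fill_from_trace_sketch(double *dst, size_t n,
                                     const double *trace, size_t ntrace)
{
    if (ntrace == 0)
        return 0;
    size_t used = ntrace < n ? ntrace : n;
    for (size_t i = 0; i < n; i++)
        dst[i] = trace[i % used];
    return used;
}
```

With a benchmark array of 8000 entries, a trace of 3000 numbers would be repeated two full times plus a partial third pass.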
-
Szabolcs Nagy authored
Add optimization barriers to prevent constant folding and code moving transformations. Use opt_barrier_ and force_eval_ consistently to work around fenv semantics breaking optimizations.
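A portable way to write such barriers is through volatile objects, sketched below. The `_sketch` names are illustrative; a real implementation may use inline asm on some targets instead.

```c
/* Hide the value of x from the optimizer: the compiler must assume the
   returned value is unknown, so expressions using it cannot be constant
   folded or hoisted across the barrier. */
static inline double opt_barrier_double_sketch(double x)
{
    volatile double y = x;
    return y;
}

/* Force x to be computed even though its value is otherwise unused, so any
   floating-point exception raised while computing it is not dropped. */
static inline void force_eval_double_sketch(double x)
{
    volatile double y = x;
    (void)y;
}

/* Example: compute tiny * tiny at run time. The barrier prevents the
   compiler from folding the product to 0 at compile time, which would
   lose the underflow exception. */
static inline double underflow_sketch(void)
{
    double tiny = opt_barrier_double_sketch(0x1p-600);
    double r = tiny * tiny;
    force_eval_double_sketch(r);
    return r;
}
```

The volatile store and load cost a round trip through memory, so barriers of this kind are best kept out of hot paths.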
-
Szabolcs Nagy authored
To make the errno tests meaningful, mathtest needs to be compiled with errno support; otherwise (math_errhandling & MATH_ERRNO) can be 0.
-
- Jun 12, 2018
-
-
Szabolcs Nagy authored
The glibc math.h header can redirect math symbols (e.g. with the -ffinite-math-only cflag) to _finite symbols which have the same semantics as the original ones except they may omit some error checks. With these aliases libmathlib can be a drop-in replacement for glibc libm at link time. It's not a full replacement, but it can interpose glibc libm calls using LD_PRELOAD in case of an already dynamically linked binary or using -lmathlib before -lm at link time. This hopefully will make it easy to try libmathlib with a glibc based toolchain. Unfortunately, with static linking the glibc _finite symbol definitions may conflict with the libmathlib definitions if other glibc internal symbols are used from the same object file where the _finite symbol is defined. To work around this, add glibc internal symbols as hidden aliases so the conflicting glibc object files don't get linked. This is a bit fragile since glibc can change its internal symbols and how they are used, but it only affects static linking, which always has this problem when symbols are interposed.
-
Szabolcs Nagy authored
Update mathlib.h to use GNU style declarations and add missing pow.
-
Szabolcs Nagy authored
FP_FAST_FMA is the standard way to decide if fma is fast. Unfortunately, in practice it can be defined even if fma is not inlined (e.g. with -fno-builtin-fma), and it does not really help if the libc has a single instruction implementation: the call overhead is too much. Most of the time it is the correct check though, and without configure time checks this is the closest we can get.
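A compile-time selection along these lines might look like the sketch below. It assumes HAVE_FAST_FMA may also be predefined by the build, and `mul_add_sketch` is a hypothetical helper for illustration.

```c
#include <math.h>

/* Assumption: the build may predefine HAVE_FAST_FMA; otherwise fall back to
   the standard FP_FAST_FMA macro, which <math.h> defines when fma is
   expected to be about as fast as a separate multiply and add. */
#ifndef HAVE_FAST_FMA
# ifdef FP_FAST_FMA
#  define HAVE_FAST_FMA 1
# else
#  define HAVE_FAST_FMA 0
# endif
#endif

/* Compute a * b + c, using a fused multiply-add only when it is fast. */
static inline double mul_add_sketch(double a, double b, double c)
{
#if HAVE_FAST_FMA
    return fma(a, b, c);     /* single rounding */
#else
    return a * b + c;        /* two roundings unless the compiler contracts */
#endif
}
```

The two branches can differ in the last bit of the result, so code that depends on exact error bounds usually needs a separate algorithm for each case rather than a drop-in substitution like this.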
-
- Jun 11, 2018
-
-
Szabolcs Nagy authored
Simple microbenchmark to measure the latency and throughput of math functions.
-
Szabolcs Nagy authored
The algorithm is exp(y * log(x)), where log(x) is computed with about 1.8*2^-66 relative error, returning the result in two doubles, and the exp part uses the same algorithm (and lookup tables) as exp, but takes the input as two doubles and a sign (to handle negative bases with odd integer exponent). There is a separate code path when fma is not available, but the worst case error is about 0.67 ULP in both cases. The lookup table and consts for log are 4224 bytes, the code is 1196 bytes. The non-nearest rounding error is less than 1 ULP. Improvements on Cortex-A72 compared to current glibc master: latency: 1.8x; throughput: 2.5x.
-
- Jun 06, 2018
-
-
Szabolcs Nagy authored
We don't have implementations for these functions anymore. Directed tests are kept; they just test the host libc implementation for now.
-
Szabolcs Nagy authored
A similar algorithm is used as in log, but there are more operations (and more error) due to the 1/ln2 multiplier. There is a separate code path when the fma instruction is not available for computing x/c - 1 precisely, for which the table size is doubled, and to compute (x/c - 1)/ln2 precisely. The worst case error is 0.547 ULP (0.55 without fma), the read only global data size is 1168 bytes (2192 without fma). The non-nearest rounding error is less than 1 ULP. Improvements on Cortex-A72 compared to current glibc master: log latency: 2.04x; log throughput: 1.87x.
-
- Jun 05, 2018
-
-
Szabolcs Nagy authored
-
Szabolcs Nagy authored
Accidentally published with wrong name.
-