Skip to content
  1. Sep 18, 2018
  2. Sep 05, 2018
  3. Aug 17, 2018
    • Simon Tatham's avatar
      remez.jl: update to work with Julia 1.0. · 757f906d
      Simon Tatham authored
      This program was originally developed for a much earlier version of
      Julia, and there have been a lot of changes to Julia semantics since
      then. But 1.0 is intended to be stable, so updating once to work with
      that should mean that no further updates are needed for a long time.
      
      Changes relevant to this script include new API calls for making
      arrays, evaluating Julia source-code expressions from strings, and
      parsing strings as numbers; scope and semantics changes requiring
      extra escaping in the @debug macro and some explicit 'global' to allow
      for-loops to update variables outside themselves; a syntax change for
      tuple types; and replacing 'beginswith' on strings with 'startswith'.
      
      There are a couple of remaining inconveniences with this version of
      the code. Firstly, the standard Julia 1.0 interpreter will consume a
      "--" option terminator on its command line even if it appears after
      the script name, so commands of the form 'julia remez.jl -- -1 1 ...'
      that worked in Julia 0.2 will no longer work. Adding an extra "--"
      before the script name works around this ('julia -- remez.jl -- ...')
      because the first "--" is seen by the Julia interpreter and the second
      goes to the script. But unfortunately that "--" can't be put in the #!
      line as well as the 'env', because of #! command line semantics. So
      users may have to work around this by explicitly invoking the
      interpreter.
      
      Secondly, Julia 1.0 has moved some mathematical functions (e.g. erf
      and gamma) out of its core library into the SpecialFunctions package,
      so any pre-existing command lines that used those functions will now
      need to qualify them with a package name, and be run on a Julia
      installation which has that package installed.
      
      Because Julia 0.4.x is still common (Ubuntu 16.04 and 18.04 both
      provide 0.4.5), I've included some backwards-compatibility code so
      that the script still runs on that version as well.
      757f906d
    • Wilco Dijkstra's avatar
      Ensure HAVE_FAST_FMA is set on AArch64 · 5175759c
      Wilco Dijkstra authored
      If math.h doesn't set FP_FAST_FMA correctly, ensure HAVE_FAST_FMA is set on AArch64.
      5175759c
  4. Aug 08, 2018
  5. Jul 30, 2018
    • Szabolcs Nagy's avatar
      Don't build tanf rredf and funder by default · 41ed0e60
      Szabolcs Nagy authored
      These are no longer maintained and only kept for WANT_SINGLEPREC build,
      which is useful for microcontrollers with single precision fpu only.
      
      Removed all tests that are not testing code in the default build.
      41ed0e60
    • Szabolcs Nagy's avatar
      Add separate HOST_CFLAGS, HOST_LDFLAGS and HOST_LDLIBS make variables · a202746b
      Szabolcs Nagy authored
      Don't use target flags when building host tools.  The example config.mk.dist
      is updated accordingly.
      a202746b
    • Szabolcs Nagy's avatar
      Don't build rtest by default · c5a8042e
      Szabolcs Nagy authored
      rtest is the only binary that is built for the host and has additional
      dependencies: mpfr and mpc.  To avoid build issues only build it if
      the randomized tests are run.  New build target is introduced for
      randomized testing, so make check works without rtest.
      
      Tests are no longer copied to the build directory and the runtest.sh
      tool is removed as it is not very useful.
      c5a8042e
    • Szabolcs Nagy's avatar
      Flush stdout after each benchmark in mathbench · 5f82bffe
      Szabolcs Nagy authored
      If lot of benchmarks are run and the output is not a tty then the results
      only showed up after the stdio buffer was full.  With fflush if the
      output is redirected and mathbench is killed the results before the kill
      are still available.
      5f82bffe
  6. Jul 10, 2018
    • Szabolcs Nagy's avatar
      Remove float compare option from sincosf · fce09974
      Szabolcs Nagy authored
      PREFER_FLOAT_COMPARISON setting was not correct as it could raise
      spurious exceptions.  Fixing it is easy: just use ISLESS(x, y) instead
      of abstop12(x) < abstop12(y) with appropriate non-signaling definition
      for ISLESS.  However it seems this setting is not very useful (there is
      only minor performance difference on various architectures), so remove
      this option for now.
      fce09974
    • Szabolcs Nagy's avatar
      Fix the documentation comments for log_inline in pow · ae8bc7d0
      Szabolcs Nagy authored
      There was a typo and the arguments were not explained clearly.
      ae8bc7d0
  7. Jul 04, 2018
    • Wilco Dijkstra's avatar
      Fix namespace issues in sincosf · 3262ef23
      Wilco Dijkstra authored
      Use const sincos_t for clarity instead of making the typedef const.
      Use __inv_pi4 and __sincosf_table to avoid namespace issues with
      static linking.
      3262ef23
    • Szabolcs Nagy's avatar
      Change the return type of converttoint and document the semantics · dd178df0
      Szabolcs Nagy authored
      The roundtoint and converttoint internal functions are only called with small
      values, so 32 bit result is enough for converttoint and it is a signed int
      conversion so the natural return type is int32_t.
      
      The original idea was to help the compiler keeping the result in uint64_t,
      then it's clear that no sign extension is needed and there is no accidental
      undefined or implementation defined signed int arithmetics.
      
      But it turns out gcc does a good job with inlining so changing the type has
      no overhead and the semantics of the conversion is less surprising this way.
      Since we want to allow the asuint64 (x + 0x1.8p52) style conversion, the top
      bits were never usable and the existing code ensures that only the bottom
      32 bits of the conversion result are used.
      dd178df0
    • Szabolcs Nagy's avatar
      More detailed documentation comments · bc4b9012
      Szabolcs Nagy authored
      Rewrote some documentation text and fixed a GNU style issue based on
      feedback from Joseph Myers on libc-alpha mailing list.
      bc4b9012
  8. Jul 03, 2018
    • Szabolcs Nagy's avatar
      Fix large ulp error in pow without fma very near 1.0 · 2105bad5
      Szabolcs Nagy authored
      The !HAVE_FAST_FMA code path split r = z/c - 1 into r = rhi + rlo such
      that when z = 1-tiny and c = 1 then rlo and rhi could have much larger
      magnitude than r which later caused large rounding errors.
      
      So do a nearest rounding instead of truncation at the split.
      2105bad5
  9. Jun 29, 2018
  10. Jun 27, 2018
    • Szabolcs Nagy's avatar
      Add exp trace · 4aa92161
      Szabolcs Nagy authored
      Extracted 16000 samples from several exp call traces.
      First half is quite generic, second half has large number of zeros.
      4aa92161
  11. Jun 25, 2018
    • Szabolcs Nagy's avatar
      Fix pow error bound · 5049bfaf
      Szabolcs Nagy authored
      The pow error bound was miscalculated, it is slightly below 0.54 ULP
      when using fma and slightly above it without fma.
      
      If 2^-400 < |pow(x,y)| < 2^400 then the error is less than 0.52 ULP
      with fma.
      5049bfaf
  12. Jun 22, 2018
    • Szabolcs Nagy's avatar
      Fix gnu code style in pow.c · 76fd080f
      Szabolcs Nagy authored
      76fd080f
    • Szabolcs Nagy's avatar
      Improve pow implementation · a6230320
      Szabolcs Nagy authored
      The log part of pow got rewritten to use a slightly different algorithm.
      This improves precision and throughput while keeps the same table size.
      
      Near 1 cases are no longer special cased, there is a slight performance
      regression in that case.  And when the fma instruction is not available
      this algorithm is expected to have slightly worse performance.
      
      Worst-case error improved from 0.67 ULP to 0.57 ULP.
      
      On Cortex-A72 i see
      thruput near 1:  7% worse
      latency near 1:  2% worse
      thruput general: 8% better
      latency general: 2% better
      a6230320
  13. Jun 20, 2018
  14. Jun 15, 2018
    • Szabolcs Nagy's avatar
      Fix spurious underflow in exp without fma · 2117b832
      Szabolcs Nagy authored
      The last multiplication in exp and exp2 could underflow when it was not
      contracted into an fma.  Changed the thresholds so the problematic cases
      end up in the specialcase code path (which handles underflow correctly).
      
      The initial check now only looks at the exponent bits which has slightly
      better performance on aarch64.  The overflow threshold can be tight for
      exp2, but was let loose in exp so the specialcase handling got updated
      accordingly.
      
      Added comments about this issue and the assumptions exp_inline is making
      in pow.
      2117b832
    • Szabolcs Nagy's avatar
      Fix the opt_barrier_ function prototypes · f6717402
      Szabolcs Nagy authored
      The portable function prototype was wrong.
      f6717402
  15. Jun 13, 2018
    • Szabolcs Nagy's avatar
      Add trace support to mathbench · 9159cf25
      Szabolcs Nagy authored
      Previously only uniform random and linear (equidistant) input generators
      were supported.  Now with "-g trace" numbers are read from stdin or with
      "-f file" they are read from file, so previously recorded call traces
      can be benchmarked.
      
      Only the first 8000 numbers are used and if there are less then the
      numbers are repeated to fill up the benchmark array.
      9159cf25
    • Szabolcs Nagy's avatar
      Fix floating-point exceptions with clang · 5fa69e1d
      Szabolcs Nagy authored
      Add optimization barriers to prevent constant folding and code moving
      transformations.  Use opt_barrier_ and force_eval_ consistently to work
      around fenv semantics breaking optimizations.
      5fa69e1d
    • Szabolcs Nagy's avatar
      Always compile mathtest with -fmath-errno · 21f63567
      Szabolcs Nagy authored
      To make the errno tests meaningful mathtest needs to be compiled with
      errno support otherwise (math_errhandling & MATH_ERRNO) can be 0.
      21f63567
  16. Jun 12, 2018
    • Szabolcs Nagy's avatar
      Add _finite symbol aliases for better glibc compatibility · b7d568d7
      Szabolcs Nagy authored
      The glibc math.h header can redirect math symbols (e.g. with the
      -ffinite-math-only cflag) to _finite symbols which have the same
      semantics as the original ones except they may omit some error checks.
      
      With these aliases libmathlib can be a drop in replacement for glibc
      libm at link time.  It's not a full replacement, but it can interpose
      glibc libm calls using LD_PRELOAD in case of an already dynamic linked
      binary or using -lmathlib before -lm at link time.  This hopefully will
      make it easy to try libmathlib with a glibc based toolchain.
      
      Unfortunately with static linking the glibc _finite symbol definitions
      may conflict with the libmathlib definitions if other glibc internal
      symbols are used from the same object file where the _finite symbol is
      defined.  To work this around add glibc internal symbols as hidden
      aliases so the conflicting glibc object files don't get linked.
      This is a bit fragile since glibc can change its internal symbols and
      how they are used, but it only affects static linking, which always
      has this problem when symbols are interposed.
      b7d568d7
    • Szabolcs Nagy's avatar
      Add pow to mathlib.h · 875334a2
      Szabolcs Nagy authored
      Update mathlib.h to use GNU style declarations and add missing pow.
      875334a2
    • Szabolcs Nagy's avatar
      More portable default setting for HAVE_FAST_FMA · d7494f25
      Szabolcs Nagy authored
      FP_FAST_FMA is the standard way to decide if fma is fast, unfortunately
      in practice it can be defined even if fma is not inlined (e.g. with
      -fno-builtin-fma), and it does not really help if the libc has a single
      instruction implementation: the call overhead is too much.  Most of the
      time it is the correct check though and without configure time checks
      this is the closest we can get.
      d7494f25
  17. Jun 11, 2018
    • Szabolcs Nagy's avatar
      Add benchmark code · 764b4bfb
      Szabolcs Nagy authored
      Simple microbenchmark to measure the latency and throughput of math
      functions.
      764b4bfb
    • Szabolcs Nagy's avatar
      Add new pow implementation · ed0ecfff
      Szabolcs Nagy authored
      The algorithm is exp(y * log(x)), where log(x) is computed with about
      1.8*2^-66 relative error, returning the result in two doubles, and the
      exp part uses the same algorithm (and lookup tables) as exp, but takes
      the input as two doubles and a sign (to handle negative bases with odd
      integer exponent).
      
      There is separate code path when fma is not available but the worst
      case error is about 0.67 ULP in both cases.  The lookup table and
      consts for log are 4224 bytes, the code is 1196 bytes.  The non-nearest
      rounding error is less than 1 ULP.
      
      Improvements on Cortex-A72 compared to current glibc master:
      latency: 1.8x
      thruput: 2.5x
      ed0ecfff
  18. Jun 06, 2018
    • Szabolcs Nagy's avatar
      Remove sin, cos, tan random testing · f79ee89a
      Szabolcs Nagy authored
      We don't have implmentations for these functions anymore.  Directed
      tests are kept, they just test the host libc implementation for now.
      f79ee89a
    • Szabolcs Nagy's avatar
      Add new log2 implementation · d69e5045
      Szabolcs Nagy authored
      Similar algorithm is used as in log, but there are more operations
      (and more error) due to the 1/ln2 multiplier.
      
      There is separate code path when fma instruction is not available for
      computing x/c - 1 precisely, for which the table size is doubled,
      and to compute (x/c - 1)/ln2 precisely.
      
      The worst case error is 0.547 ULP (0.55 without fma), the read only
      global data size is 1168 bytes (2192 without fma).  The non-nearest
      rounding error is less than 1 ULP.
      
      Improvements on Cortex-A72 compared to current glibc master:
      log latency: 2.04x
      log thruput: 1.87x
      d69e5045
  19. Jun 05, 2018
Loading