Commits · d8114c3bdf5f7cab17dffad0751d0bbf46f4530e · Android-smartphones / realme / realme 5pro / kaderbava / external_arm-optimized-routines

Jul 22, 2019

Szabolcs Nagy authored Jul 18, 2019

Simple script to visualize the output of the ulp test tool, e.g.

$ build/bin/ulp -e .0001 log 0.5 2.0 2345678 | math/tools/plot.py

d8114c3b

Reorganize the directory layout · 0af9fce0

Szabolcs Nagy authored Jul 18, 2019

To allow subprojects other than math, the build system and directory
layout is changed: all math related code, tools and tests are under
the math directory now, new subprojects should be similarly self-
contained.

The top level Makefile design is still kept, but the subproject build
directories provide their own Dir.mk with the build rules for the
subproject. The user interface of config.mk is kept for now, in the
future subproject specific flags and make variables may be added for
finer grained control.

0af9fce0

Jul 18, 2019

Remove math/single and rem_pio2 · 4f408e74

Szabolcs Nagy authored Jul 18, 2019

math/single contained code for systems without double precision fpu
and rem_pio2 is not used currently and likely will be designed
differently when double precision trigonometric functions are added.

4f408e74

Add simple ULP error check code · 3a1d8e6f

Szabolcs Nagy authored Jul 18, 2019

Check ULP error by random sampling and comparing against a higher
precision implmenetation.

This is similar to the randomized tests in the mathtest code, but it
runs on the target only and is much faster to allow exhaustive ULP
checks for single precision functions. It also supports non-nearest
rounding modes.

The ULP error is reported in an unconventional way: instead of the
difference between the observed rounded result and accurate result, the
minimum error is reported that makes the accurate result round to the
observed result. This is more useful for comparing errors across
different rounding modes. In nearest-rounding mode usually 0.5 has to
be added to the reported error to get the conventional ULP error.

The code optionally depends on mpfr. On targets where double has the
same format as long double, mpfr is required for testing double
precision functions. By default there is no dependency on mpfr on the
target, to use mpfr add -DUSE_MPFR to the CFLAGS and -lmpfr to LDLIBS.

ucheck is a new make target for running ulp error checks.

Typical usage and output of the new ulp tool:

$ build/bin/ulp -e .001 exp 1.0 2.0 12345
exp(0x1.79ef3658a63c9p+0) got 0x1.181caa32757a7p+2 want 0x1.181caa32757a6p+2 +0.499708 ulp err 0.000291756
exp(0x1.9c8a65340f80cp+0) got 0x1.40a8032e5f576p+2 want 0x1.40a8032e5f575p+2 +0.498903 ulp err 0.00109668
FAIL exp in [0x1p+0;0x1p+1] round n errlim 0.001 maxerr 0.00109668 +0.5 cnt 12345 cnt1 6 0.0486027% cnt2 0 0% cntfail 1 0.00810045%

Floating-point exceptions are not guaranteed to be reported accurately
and can be turned off by -f.

The implementation is generic over the argument types which complicates
the code, but at least the difficult inner loop logic is not repeated
many times this way.

To add support for a new function foo, the fun array needs to be updated
with an entry for foo. (This usually requires the functions foo, fool
and mpfr_foo to be defined.)

3a1d8e6f

Add scripts that can regenerate polynomial coefficients · d1db92e0
Szabolcs Nagy authored Jul 18, 2019
```
Scripts are known to work with sollya 6.0.
```
d1db92e0
Update the copyright year. · ded9791f
Szabolcs Nagy authored Jul 18, 2019

ded9791f

Jul 01, 2019

__ieee754_rem_pio2: store 32 more bits of 2/pi · c7a7debc

Simon Tatham authored May 03, 2019

When range-reducing a value of the maximum double precision exponent,
it was possible to read one word past the end of the const data array.
The error introduced was negligible, but of course overrunning an
array is a bug regardless.

c7a7debc

Dec 10, 2018

Change the powf overflow handling · 6e60567b

Szabolcs Nagy authored Dec 10, 2018

This fix is slightly more correct than the previous one (which introduced
a 1ulp error when rounding toward zero near the overflow limit) and has
slightly smaller code size.

6e60567b

Dec 07, 2018

More consistent excess precision handling · 04884bd0

Szabolcs Nagy authored Dec 07, 2018

The current code aims to support FLT_EVAL_METHOD!=0 targets (such as
i386 with x87 fpu or m68k) assuming appropriate narrowing eval functions
are defined for them. But the narrowing eval functions were not used
consistently: the return statement may not guarantee narrowing (e.g.
that was the C99 behaviour which got changed in C11 annex F) so we
should use the narrowing eval_as_ functions at return statements too.

Results should be correct if narrowing only happens at eval_as_ calls.
On most targets this change has no effect because eval_as_ is a noop.
Most math implementations that care about excess precision already
compile in a mode that narrows at returns so this change is not
necessary for them, just better documents the assumptions.

04884bd0

Fix powf overflow handling in non-nearest rounding mode · 75b8d8c6

Szabolcs Nagy authored Dec 07, 2018

The threshold value at which powf overflows depends on the rounding mode
and the current check did not take this into account. So when the result
was rounded away from zero it could become infinity without setting
errno to ERANGE.

Example: pow(0x1.7ac7cp+5, 23) is 0x1.fffffep+127 + 0.1633ulp

If the result goes above 0x1.fffffep+127 + 0.5ulp then errno is set,
which is fine in nearest rounding mode, but

  powf(0x1.7ac7cp+5, 23) is inf in upward rounding mode
  powf(-0x1.7ac7cp+5, 23) is -inf in downward rounding mode

and the previous implementation did not set errno in these cases.

This special case is fixed without affecting the common code path by
checking the rounding mode and using appropriate threshold value.
Arithmetics is used to check the rounding mode to avoid introducing a
stack frame when calling fegetround.

Unfortunately the current test system does not support rounding modes.
(Found by comparing against musl powf behaviour on the libc-test test
suite, unfortunately that test system does not support errno.)

75b8d8c6

Nov 22, 2018
- Relicence the project under the MIT License · 11253b0b
  Szabolcs Nagy authored Nov 12, 2018
  
  11253b0b
Sep 18, 2018
- Fix the documentation comment of checkint · e875f40f
  Szabolcs Nagy authored Sep 18, 2018
```
checkint in pow is not supposed to be used with 0, inf or nan inputs.
```
  e875f40f
Sep 05, 2018
- Document the log table generation method · 71d27280
  Szabolcs Nagy authored Sep 04, 2018
```
Add comments with enough detail so the log lookup tables can be recreated.
```
  71d27280
Aug 17, 2018

remez.jl: update to work with Julia 1.0. · 757f906d

Simon Tatham authored Aug 15, 2018

This program was originally developed for a much earlier version of
Julia, and there have been a lot of changes to Julia semantics since
then. But 1.0 is intended to be stable, so updating once to work with
that should mean that no further updates are needed for a long time.

Changes relevant to this script include new API calls for making
arrays, evaluating Julia source-code expressions from strings, and
parsing strings as numbers; scope and semantics changes requiring
extra escaping in the @debug macro and some explicit 'global' to allow
for-loops to update variables outside themselves; a syntax change for
tuple types; and replacing 'beginswith' on strings with 'startswith'.

There are a couple of remaining inconveniences with this version of
the code. Firstly, the standard Julia 1.0 interpreter will consume a
"--" option terminator on its command line even if it appears after
the script name, so commands of the form 'julia remez.jl -- -1 1 ...'
that worked in Julia 0.2 will no longer work. Adding an extra "--"
before the script name works around this ('julia -- remez.jl -- ...')
because the first "--" is seen by the Julia interpreter and the second
goes to the script. But unfortunately that "--" can't be put in the #!
line as well as the 'env', because of #! command line semantics. So
users may have to work around this by explicitly invoking the
interpreter.

Secondly, Julia 1.0 has moved some mathematical functions (e.g. erf
and gamma) out of its core library into the SpecialFunctions package,
so any pre-existing command lines that used those functions will now
need to qualify them with a package name, and be run on a Julia
installation which has that package installed.

Because Julia 0.4.x is still common (Ubuntu 16.04 and 18.04 both
provide 0.4.5), I've included some backwards-compatibility code so
that the script still runs on that version as well.

757f906d

Ensure HAVE_FAST_FMA is set on AArch64 · 5175759c
Wilco Dijkstra authored Aug 17, 2018
```
If math.h doesn't set FP_FAST_FMA correctly, ensure HAVE_FAST_FMA is set on AArch64.
```
5175759c

Aug 08, 2018
- Improve sincosf comments · b2fc9892
  Wilco Dijkstra authored Aug 08, 2018
```
Improve comments. Use TOINT_INTRINSICS rather than HAVE_FAST_ROUND.
```
  b2fc9892
Jul 30, 2018

Don't build tanf rredf and funder by default · 41ed0e60

Szabolcs Nagy authored Jul 27, 2018

These are no longer maintained and only kept for WANT_SINGLEPREC build,
which is useful for microcontrollers with single precision fpu only.

Removed all tests that are not testing code in the default build.

41ed0e60

Add separate HOST_CFLAGS, HOST_LDFLAGS and HOST_LDLIBS make variables · a202746b
Szabolcs Nagy authored Jul 27, 2018
```
Don't use target flags when building host tools.  The example config.mk.dist
is updated accordingly.
```
a202746b

Don't build rtest by default · c5a8042e

Szabolcs Nagy authored Jul 11, 2018

rtest is the only binary that is built for the host and has additional
dependencies: mpfr and mpc.  To avoid build issues only build it if
the randomized tests are run.  New build target is introduced for
randomized testing, so make check works without rtest.

Tests are no longer copied to the build directory and the runtest.sh
tool is removed as it is not very useful.

c5a8042e

Flush stdout after each benchmark in mathbench · 5f82bffe

Szabolcs Nagy authored Jul 11, 2018

If lot of benchmarks are run and the output is not a tty then the results
only showed up after the stdio buffer was full.  With fflush if the
output is redirected and mathbench is killed the results before the kill
are still available.

5f82bffe

Jul 10, 2018

Remove float compare option from sincosf · fce09974

Szabolcs Nagy authored Jul 09, 2018

PREFER_FLOAT_COMPARISON setting was not correct as it could raise
spurious exceptions. Fixing it is easy: just use ISLESS(x, y) instead
of abstop12(x) < abstop12(y) with appropriate non-signaling definition
for ISLESS. However it seems this setting is not very useful (there is
only minor performance difference on various architectures), so remove
this option for now.

fce09974

Fix the documentation comments for log_inline in pow · ae8bc7d0
Szabolcs Nagy authored Jul 09, 2018
```
There was a typo and the arguments were not explained clearly.
```
ae8bc7d0

Jul 04, 2018

Fix namespace issues in sincosf · 3262ef23

Wilco Dijkstra authored Jul 04, 2018

Use const sincos_t for clarity instead of making the typedef const.
Use __inv_pi4 and __sincosf_table to avoid namespace issues with
static linking.

3262ef23

Change the return type of converttoint and document the semantics · dd178df0

Szabolcs Nagy authored Jul 04, 2018

The roundtoint and converttoint internal functions are only called with small
values, so 32 bit result is enough for converttoint and it is a signed int
conversion so the natural return type is int32_t.

The original idea was to help the compiler keeping the result in uint64_t,
then it's clear that no sign extension is needed and there is no accidental
undefined or implementation defined signed int arithmetics.

But it turns out gcc does a good job with inlining so changing the type has
no overhead and the semantics of the conversion is less surprising this way.
Since we want to allow the asuint64 (x + 0x1.8p52) style conversion, the top
bits were never usable and the existing code ensures that only the bottom
32 bits of the conversion result are used.

dd178df0

More detailed documentation comments · bc4b9012

Szabolcs Nagy authored Jul 03, 2018

Rewrote some documentation text and fixed a GNU style issue based on
feedback from Joseph Myers on libc-alpha mailing list.

bc4b9012

Jul 03, 2018

Fix large ulp error in pow without fma very near 1.0 · 2105bad5

Szabolcs Nagy authored Jul 03, 2018

The !HAVE_FAST_FMA code path split r = z/c - 1 into r = rhi + rlo such
that when z = 1-tiny and c = 1 then rlo and rhi could have much larger
magnitude than r which later caused large rounding errors.

So do a nearest rounding instead of truncation at the split.

2105bad5

Jun 29, 2018
- Add documentation comments to internal functions · 58ce45c0
  Szabolcs Nagy authored Jun 29, 2018
```
Explain the semantics of internal functions.
```
  58ce45c0
- Fix GNU style issues · 5e838911
  Szabolcs Nagy authored Jun 29, 2018
```
Whitespace changes only.
```
  5e838911
Jun 27, 2018

Add exp trace · 4aa92161

Szabolcs Nagy authored Jun 20, 2018

Extracted 16000 samples from several exp call traces.
First half is quite generic, second half has large number of zeros.

4aa92161

Jun 25, 2018

Fix pow error bound · 5049bfaf

Szabolcs Nagy authored Jun 25, 2018

The pow error bound was miscalculated, it is slightly below 0.54 ULP
when using fma and slightly above it without fma.

If 2^-400 < |pow(x,y)| < 2^400 then the error is less than 0.52 ULP
with fma.

5049bfaf

Jun 22, 2018

Fix gnu code style in pow.c · 76fd080f
Szabolcs Nagy authored Jun 22, 2018

76fd080f

Improve pow implementation · a6230320

Szabolcs Nagy authored Jun 21, 2018

The log part of pow got rewritten to use a slightly different algorithm.
This improves precision and throughput while keeps the same table size.

Near 1 cases are no longer special cased, there is a slight performance
regression in that case.  And when the fma instruction is not available
this algorithm is expected to have slightly worse performance.

Worst-case error improved from 0.67 ULP to 0.57 ULP.

On Cortex-A72 i see
thruput near 1:  7% worse
latency near 1:  2% worse
thruput general: 8% better
latency general: 2% better

a6230320

Jun 20, 2018
- Fix the type of sign_bias in pow · db6e4e96
  Szabolcs Nagy authored Jun 18, 2018
```
uint64_t works too, but the correct type is uint32_t.
```
  db6e4e96
- Add trace for sinf/cosf/sincosf · 942b2123
  Wilco Dijkstra authored Jun 15, 2018
```
Add trace for sinf/cosf/sincosf with easy, hard and medium cases
extracted from a trace of 1.4 million calls.
```
  942b2123
- Improve trace support · 2127ba77
  Wilco Dijkstra authored Jun 15, 2018
```
Improve support for large traces by reading all of the trace
and splitting it into smaller parts.

Also remove the B arrays which are not required.
```
  2127ba77
- Use int32_t in reduce_fast in sincosf.h · f3af42d9
  Wilco Dijkstra authored Jun 20, 2018
```
Use int32_t rather than int since we require it to be 32 bits here.
```
  f3af42d9
- Fix the sign of 0 in pow even with !WANT_ROUNDING · e00696aa
  Szabolcs Nagy authored Jun 19, 2018
```
The sign needs to be fixed even in nearest rounding mode.
```
  e00696aa
Jun 15, 2018

Fix spurious underflow in exp without fma · 2117b832

Szabolcs Nagy authored Jun 14, 2018

The last multiplication in exp and exp2 could underflow when it was not
contracted into an fma.  Changed the thresholds so the problematic cases
end up in the specialcase code path (which handles underflow correctly).

The initial check now only looks at the exponent bits which has slightly
better performance on aarch64.  The overflow threshold can be tight for
exp2, but was let loose in exp so the specialcase handling got updated
accordingly.

Added comments about this issue and the assumptions exp_inline is making
in pow.

2117b832

Fix the opt_barrier_ function prototypes · f6717402
Szabolcs Nagy authored Jun 14, 2018
```
The portable function prototype was wrong.
```
f6717402

Jun 13, 2018

Add trace support to mathbench · 9159cf25

Szabolcs Nagy authored Jun 12, 2018

Previously only uniform random and linear (equidistant) input generators
were supported.  Now with "-g trace" numbers are read from stdin or with
"-f file" they are read from file, so previously recorded call traces
can be benchmarked.

Only the first 8000 numbers are used and if there are less then the
numbers are repeated to fill up the benchmark array.

9159cf25