Commits · a1b6ffb361553f7d40cf9491ce017dcbf51a6505 · Android-smartphones / realme / realme 5pro / kaderbava / external_arm-optimized-routines

Jan 24, 2023

pl/math: Fix a copyright notice for consistency · a1b6ffb3

Szabolcs Nagy authored Jan 24, 2023

The (c) is not strictly required, but it was only missing from one file.

a1b6ffb3

Update copyright years · 1eb5d7c2

Szabolcs Nagy authored Jan 24, 2023

Scripted copyright year updates based on git committer date.

1eb5d7c2

Wilco Dijkstra authored Jan 24, 2023

Improve SVE memcpy by copying 2 vectors. This avoids a check on vector length
and improves performance of random memcpy.

92864946

Jan 23, 2023

pl/math: Reduce order of single-precision tan polynomial · a7b60220

Joe Ramsay authored Jan 23, 2023

For both vector and scalar routines we reduce the order from 6 to
5. For vector routines, this requires reducing RangeVal as for large
values the tan polynomial is not quite accurate enough. However the
cotan polynomial is used in the inaccurate region in the scalar
routine, so this does not need to change.

Accuracy of scalar routine is unchanged. Accuracy in both vector
routines is now 3.45 ULP, with the same worst-case.

a7b60220

Jan 19, 2023

pl/math: Add vector/Neon tan · 76c2badb

Joe Ramsay authored Jan 19, 2023

New routine uses a similar technique to the single-precision Neon
routine, but with an extra reduction to pi/8 using the double-angle
formula. It is accurate to 3.5 ULP.

76c2badb

Jan 10, 2023

string: Compile memcpy-sve.S for aarch64 if compiler supports it · cd28f3c4

Jake Weinstein authored Dec 21, 2022

This is a partial revert of b7e368fb. If SVE assembly is guarded by
__ARM_FEATURE_SVE, it cannot build when SVE is not enabled by the build
system. This is ok on AOR, but because Android (bionic) uses ifuncs to
select the appropriate assembly at runtime, these need to compile
regardless of whether the target actually supports the instructions.

Check for AArch64 and GCC >= 8 or Clang >= 5 so that SVE is not used on
compilers that do not support it. This condition will always be true on
future builds of Android for AArch64.

cd28f3c4

string: Optimize strcpy · 7c1d7a24

Wilco Dijkstra authored Jan 10, 2023

Optimize strcpy main loop - large strings are ~22% faster.

7c1d7a24

string: Improve strrchr-mte · 10589b2c

Wilco Dijkstra authored Jan 10, 2023

Use shrn for narrowing the mask which simplifies code. Unroll the
strchr search loop which improves performance on large strings.

10589b2c

Jan 09, 2023

pl/math: Fix benchmark entries for SVE bivariate functions · 268217a1

Pierre Blanchard authored Jan 09, 2023

Variant was wrongly set in structures used to benchmark SVE functions.
Before this change only half of the lanes were set as expected.
Also reformat for ease of reading.

268217a1

Jan 06, 2023

pl/math: Update copyright years · f0f80b8a

Joe Ramsay authored Jan 06, 2023

All files in pl/math updated to 2023.

f0f80b8a

Jan 05, 2023

Rewrite two abs masks as literals · 0f87f607

Joe Ramsay authored Jan 05, 2023

These were technically undefined behaviour - they have been rewritten
without the shift so that their type is unsigned int by default.

0f87f607

pl/math: Add vector/Neon acosh · 49d28324

Joe Ramsay authored Jan 05, 2023

New routine is based on a vector implementation from log1p, which has
been reused (with some modification for improved accuracy close to 0)
from Neon atanh. Accurate to 3.5 ULP.

49d28324

pl/math: Add vector/Neon acoshf · 0866a19c

Joe Ramsay authored Jan 05, 2023

New routine uses inlined log1pf helper, and is accurate to 3.1 ULP
(2.8 ULP if fp exceptions are enabled).

0866a19c

pl/math: Add scalar & vector/Neon tanh · 47c03a91

Joe Ramsay authored Jan 05, 2023

New routines use the same algorithm, reliant on a modified version of
expm1, and are accurate to 3 ULP.

47c03a91

Dec 30, 2022

pl/math: Add vector/SVE log2 · 2364ce53

Pierre Blanchard authored Dec 30, 2022

The new SVE implementation is a direct port of Neon log2, and is
accurate to 2.58 ULPs.
Update error threshold and comments for Neon log2 too, new
approximate argmax but same threshold.

2364ce53

pl/math: Add vector/SVE log2f · 08482af8

Pierre Blanchard authored Dec 30, 2022

New SVE routine is an SVE port of the Neon algorithm
and is accurate to 2.48 ULPs.

08482af8

Dec 22, 2022

pl/math: Add scalar & vector/Neon atanh · 0d000be2

Joe Ramsay authored Dec 22, 2022

New routines are both based on existing log1p routines. Scalar is
accurate to 3 ULP, Neon to 3.5 ULP. Both set fp exceptions correctly
regardless of build config.

0d000be2

pl/math: Add scalar atan and set fenv in Neon atan · 2015eee4

Joe Ramsay authored Dec 22, 2022

The simplest way to set fenv in Neon atan is by using a scalar
fallback for under/overflow cases. However, this routine did not have a
scalar counterpart, so we add a new one, based on the same algorithm
and polynomial as the vector variants, and accurate to 2.5 ULP. This
is now used as the fallback for all lanes, when any lane of the Neon
input is special.

2015eee4

pl/math: Fix fp exceptions in Neon sinhf and sinh · 0a9270a2

Joe Ramsay authored Dec 22, 2022

Both routines previously relied on the vector expm1(f) routine exposed
by the library, which depended on WANT_SIMD_EXCEPT for its fenv
behaviour. However, both routines were expected to always trigger fp
exceptions correctly. To remedy this, both routines now use an inlined
helper for expm1 (reused from vector tanhf in the case of sinhf), and
special-case small input as well as large when WANT_SIMD_EXCEPT is
enabled.

0a9270a2

Dec 20, 2022

Correct exit code from runulp.sh · 3bfa7bd4

Joe Ramsay authored Dec 20, 2022

The pipe prevented FAILs and PASSs being counted properly - the while
read loop has been rewritten without a pipe, as it was prior to the
changes here.

fenv checking is temporarily disabled in Neon sinh and sinhf, as they
do not get it right. This will be re-enabled once they have been
fixed.

3bfa7bd4

pl/math: Update ULP threshold for SVE erf · 7ab15c5f

Pierre Blanchard authored Dec 20, 2022

Updated comment and test threshold.

7ab15c5f

pl/math: Add scalar atanf and set fenv in Neon atanf · f312cb80

Joe Ramsay authored Dec 20, 2022

The simplest way to set fenv in Neon atanf is by using a scalar
fallback for under/overflow cases. However, this routine did not have a
scalar counterpart, so we add a new one, based on the same algorithm
and polynomial as the vector variants, and accurate to 2.9 ULP. This
is now used as the fallback for all lanes, when any lane of the Neon
input is special.

f312cb80

pl/math: Add scalar & vector/Neon cbrt · 04e91eca

Joe Ramsay authored Dec 20, 2022

New routines use the same algorithm, with simplified argument
reduction and recombination in the vector variant. Both are accurate
to 2 ULP.

04e91eca

Dec 19, 2022

pl/math: Update ULP threshold for Neon asinh · a5fc3ed5

Joe Ramsay authored Dec 19, 2022

New max observed - updated filenames, comments and runulp threshold.

a5fc3ed5

pl/math: Replace WANT_ERRNO with WANT_SIMD_EXCEPT for Neon fenv · d05594e6

Joe Ramsay authored Dec 19, 2022

We were previously misusing the WANT_ERRNO build flag. This is now
replaced everywhere appropriate with WANT_SIMD_EXCEPT. A small number
of vector routines get fp exceptions right with no modification - the
tests have been updated to track this.

d05594e6

pl/math: Improve vector/Neon log2f · 0976cbd2

Pierre Blanchard authored Dec 19, 2022

A new implementation based on the same approach as
Neon logf, that is accurate to 2.48 ULPs.

Flags set correctly regardless of WANT_ERRNO.

0976cbd2

Dec 15, 2022

pl/math: Move test intervals to routine source files · 202e4631

Joe Ramsay authored Dec 15, 2022

To conclude the work on simplifying the runulp.sh script, a new macro
has been introduced to specify the intervals in which a routine should
be tested in the routine source. This is eventually consumed by
runulp.sh.

202e4631

pl/math: Move fenv expectations out of runulp.sh · d748e152

Joe Ramsay authored Dec 15, 2022

Introduces a new macro, similar to how ULP thresholds are now
handled, that emits a list of routines which are expected to
correctly trigger fenv exceptions, to be consumed by runulp.sh.
All scalar routines are expected to do so. A small number of Neon
routines are also expected to, dependent on WANT_ERRNO.

d748e152

pl/math: Move ULP limits to routine source files · ecb1c6f6

Joe Ramsay authored Dec 15, 2022

Introduces a new set of macros and Make rules for mechanically
generating a list of ULP limits for each routine, to be consumed
by runulp.sh. This removes the need to maintain long lists of
thresholds in runulp.sh.

ecb1c6f6

pl/math: Auto-generate mathbench and ulp headers · 1bca1a54

Joe Ramsay authored Dec 15, 2022

Instead of maintaining three separate lists of routines, which
are cumbersome and prone to merge conflicts, we provide a new
macro, PL_SIG, which by some preprocessor machinery outputs the
lists in the required format (macro formats have been changed
very slightly to make the generation simpler). Only routines with
simple signatures are handled - binary functions still need
mathbench wrappers defined manually. As well, routines with
non-standard references (i.e. powi/powk) still need entries and
wrappers manually defined.

1bca1a54

Dec 13, 2022

pl/math: Set fenv flags in Neon log1p · 3c0af1a7

Joe Ramsay authored Dec 13, 2022

New behaviour is hidden behind WANT_ERRNO config option.

3c0af1a7

pl/math: Set fenv flags in Neon tanf · 80abd605

Joe Ramsay authored Dec 13, 2022

New behaviour is hidden behind WANT_ERRNO config option.

80abd605

pl/math: Set fenv flags in Neon log2f · 0d1cc42c

Joe Ramsay authored Dec 13, 2022

Flags set correctly regardless of WANT_ERRNO.

0d1cc42c

pl/math: Update ULP threshold for SVE atan2 · bfa600d2

Pierre Blanchard authored Dec 12, 2022

Test threshold fixed.

bfa600d2

pl/math: Set fenv flags in Neon log1pf · b490d4ce

Joe Ramsay authored Dec 13, 2022

New behaviour is hidden behind WANT_ERRNO config option.

b490d4ce

Dec 09, 2022

pl/math: Add polynomial helpers · bc7cc9d2

Joe Ramsay authored Dec 09, 2022

Add macros for simplifying polynomial evaluation using either Horner,
pairwise Horner or Estrin. Several routines have been modified to use
the new helpers. Readability is improved slightly, and we expect that
this will make prototyping new routines simpler.

bc7cc9d2

pl/math/test: Simplify runulp.sh · 132d2f5d

Joe Ramsay authored Dec 09, 2022

Small simplification - pl routines do not support different rounding
modes, so there is no need to support them in runulp.sh. As a result
we can also remove Ldir.

132d2f5d

Dec 08, 2022

pl/math: Fix fenv in asinh · 08d6ff3d

Joe Ramsay authored Dec 08, 2022

Special lanes were not being properly masked when a lane was
tiny. This is now fixed.

08d6ff3d

pl/math: Fix vector/SVE erf · 61056f3c

Pierre Blanchard authored Dec 08, 2022

Fixing a bug that resulted in potentially random results
in boring domain by saturating index at an appropriate value.

61056f3c

Dec 07, 2022

math: Set fenv exceptions for several Neon routines · a5e45e4e

Joe Ramsay authored Dec 07, 2022

In most cases, we mask lanes which should not trigger exceptions with
a neutral value, then let the existing special-case handler fix them
up later. For exp and exp2 we replace the more complex special-case
handler with a simple scalar fallback. All new behaviour is tested in
runulp.sh, with a new option to pass -f to the run line. We also
extend the fenv testing to Neon log and logf, which already triggered
exceptions correctly. New behaviour is mostly hidden behind a new
config setting, WANT_SIMD_EXCEPT.

a5e45e4e