- Jan 24, 2023
Szabolcs Nagy authored
The (c) is not strictly required, but it was only missing from one file.
-
Szabolcs Nagy authored
Scripted copyright year updates based on git committer date.
-
Wilco Dijkstra authored
Improve SVE memcpy by copying 2 vectors. This avoids a check on vector length and improves performance of random memcpy.
-
- Jan 23, 2023
Joe Ramsay authored
For both vector and scalar routines we reduce the polynomial order from 6 to 5. For the vector routines this requires reducing RangeVal, as for large values the tan polynomial is not quite accurate enough. However, the scalar routine uses the cotan polynomial in the inaccurate region, so it does not need to change. Accuracy of the scalar routine is unchanged. Accuracy of both vector routines is now 3.45 ULP, with the same worst case.
-
- Jan 19, 2023
Joe Ramsay authored
New routine uses a similar technique to the single-precision Neon routine, but with an extra reduction to pi/8 using the double-angle formula. It is accurate to 3.5 ULP.
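The entry does not name the function; assuming it is tan, the extra reduction relies on the double-angle identity tan(r) = 2t / (1 - t^2) with t = tan(r/2), so after reducing x to r in [-pi/4, pi/4] the polynomial only has to cover [-pi/8, pi/8]. The scalar C sketch below uses a truncated Taylor series in place of the library's fitted polynomial, and a naive single-constant reduction; `tan_poly` and `tan_sketch` are hypothetical names.

```c
/* Truncated Taylor series for tan(y), adequate for |y| <= pi/8.
   A real routine would use a fitted (minimax) polynomial instead. */
static double
tan_poly (double y)
{
  double y2 = y * y;
  double p = 1382.0 / 155925.0;
  p = p * y2 + 62.0 / 2835.0;
  p = p * y2 + 17.0 / 315.0;
  p = p * y2 + 2.0 / 15.0;
  p = p * y2 + 1.0 / 3.0;
  return y + y * y2 * p; /* y + y^3/3 + 2y^5/15 + ... + 1382y^11/155925 */
}

/* Hypothetical scalar sketch: reduce x to r in [-pi/4, pi/4], evaluate
   tan(r/2), then recover tan(r) with the double-angle formula.
   The single-constant reduction loses accuracy for large |x| and very
   close to odd multiples of pi/2. */
double
tan_sketch (double x)
{
  const double pi_2 = 0x1.921fb54442d18p0; /* pi/2 rounded to double. */
  long long n = (long long) (x / pi_2 + (x >= 0 ? 0.5 : -0.5));
  double r = x - (double) n * pi_2; /* |r| <= pi/4. */
  double t = tan_poly (0.5 * r);    /* tan(r/2), with |r/2| <= pi/8. */
  double num = 2.0 * t;             /* tan(r) = 2t / (1 - t^2). */
  double den = 1.0 - t * t;
  /* Odd quadrants: tan(x) = -cot(r) = -(1 - t^2) / 2t. */
  return (n % 2 != 0) ? -den / num : num / den;
}
```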
-
- Jan 10, 2023
Jake Weinstein authored
This is a partial revert of b7e368fb. If SVE assembly is guarded by __ARM_FEATURE_SVE, it cannot be built unless SVE is enabled by the build system. This is fine for AOR, but Android (bionic) uses ifuncs to select the appropriate assembly at runtime, so these routines need to compile regardless of whether the target actually supports the instructions. Instead, check for AArch64 and GCC >= 8 or Clang >= 5, so that SVE is not used with compilers that do not support it. This condition will always be true on future builds of Android for AArch64.
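`__ARM_FEATURE_SVE` is only defined when the compiler is targeting SVE, so guarding on it drops the SVE routines from builds that choose them at run time. A guard on architecture and compiler version could look like the sketch below; `HAVE_SVE_ASM` and `sve_asm_available` are hypothetical names, not the library's. Clang also defines `__GNUC__` (as 4), so it has to be tested first.

```c
/* Hypothetical guard: compile SVE code on AArch64 whenever the compiler
   is new enough to assemble it, whether or not SVE is enabled. */
#if defined(__aarch64__) && defined(__clang__)
#  define HAVE_SVE_ASM (__clang_major__ >= 5)
#elif defined(__aarch64__) && defined(__GNUC__)
#  define HAVE_SVE_ASM (__GNUC__ >= 8)
#else
#  define HAVE_SVE_ASM 0
#endif

/* Reports whether SVE routines would be compiled in. Whether they are
   actually used is still decided at run time (e.g. by an ifunc resolver). */
int
sve_asm_available (void)
{
  return HAVE_SVE_ASM;
}
```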
-
Wilco Dijkstra authored
Optimize strcpy main loop - large strings are ~22% faster.
-
Wilco Dijkstra authored
Use shrn for narrowing the mask which simplifies code. Unroll the strchr search loop which improves performance on large strings.
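Assuming the usual Neon sequence (a `cmeq` byte comparison followed by `shrn` by 4), narrowing turns the 128-bit comparison result into a 64-bit scalar mask with four bits per byte, so the position of the first match is a count of trailing zeros divided by 4. The portable C model below only mimics those instructions; `narrow_mask` and `first_match` are hypothetical names, and `__builtin_ctzll` is a GCC/Clang builtin.

```c
#include <stdint.h>

/* Scalar model of shrn #4 applied to 16 comparison bytes (0xff or 0x00),
   viewed as eight little-endian 16-bit lanes. */
static uint64_t
narrow_mask (const uint8_t cmp[16])
{
  uint64_t mask = 0;
  for (int j = 0; j < 8; j++)
    {
      uint16_t lane = (uint16_t) (cmp[2 * j] | (cmp[2 * j + 1] << 8));
      /* Shift right by 4 and keep the low 8 bits: byte 2j supplies the
         low nibble, byte 2j+1 the high nibble. */
      uint8_t narrowed = (uint8_t) (lane >> 4);
      mask |= (uint64_t) narrowed << (8 * j);
    }
  return mask;
}

/* Index of the first byte equal to c in a 16-byte chunk, or -1. */
int
first_match (const uint8_t chunk[16], uint8_t c)
{
  uint8_t cmp[16];
  for (int i = 0; i < 16; i++)
    cmp[i] = chunk[i] == c ? 0xff : 0x00; /* Like cmeq. */
  uint64_t mask = narrow_mask (cmp);
  if (mask == 0)
    return -1;
  /* Each byte owns one nibble of the mask. */
  return __builtin_ctzll (mask) / 4;
}
```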
-
- Jan 09, 2023
Pierre Blanchard authored
Variant was wrongly set in structures used to benchmark SVE functions. Before this change only half of the lanes were set as expected. Also reformat for ease of reading.
-
- Jan 06, 2023
Joe Ramsay authored
All files in pl/math updated to 2023.
-
- Jan 05, 2023
Joe Ramsay authored
These were technically undefined behaviour - they have been rewritten without the shift so that their type is unsigned int by default.
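The entry does not show the affected expressions. Assuming they were mask constants such as `1 << 31`, the shift overflows a 32-bit signed `int`, which is undefined behaviour in C, whereas a hexadecimal literal too large for `int` already has type `unsigned int`. The sketch assumes a 32-bit `int`; `SIGN_MASK`, `IS_UNSIGNED_INT` and `clear_sign` are hypothetical.

```c
#include <stdint.h>

/* Undefined behaviour with a 32-bit int: 1 is a signed int, and
   shifting it into the sign bit overflows.
     #define SIGN_MASK (1 << 31)  */

/* Well defined: this hexadecimal literal does not fit in int but does
   fit in unsigned int, so that is its type. */
#define SIGN_MASK 0x80000000

/* C11 _Generic reports the type the constant actually has. */
#define IS_UNSIGNED_INT(x) _Generic ((x), unsigned int: 1, default: 0)

/* Clear the sign bit of a binary32 bit pattern. */
uint32_t
clear_sign (uint32_t bits)
{
  return bits & ~SIGN_MASK;
}
```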
-
Joe Ramsay authored
New routine is based on a vector implementation of log1p, reused (with some modification for improved accuracy close to 0) from Neon atanh. Accurate to 3.5 ULP.
-
Joe Ramsay authored
New routine uses inlined log1pf helper, and is accurate to 3.1 ULP (2.8 ULP if fp exceptions are enabled).
-
Joe Ramsay authored
New routines use the same algorithm, reliant on a modified version of expm1, and are accurate to 3 ULP.
-
- Dec 30, 2022
Pierre Blanchard authored
The new SVE implementation is a direct port of Neon log2, and is accurate to 2.58 ULPs. Also update the error threshold and comments for Neon log2: the approximate argmax is new, but the threshold is the same.
-
Pierre Blanchard authored
New SVE routine is an SVE port of the Neon algorithm and is accurate to 2.48 ULPs.
-
- Dec 22, 2022
Joe Ramsay authored
New routines are both based on existing log1p routines. Scalar is accurate to 3 ULP, Neon to 3.5 ULP. Both set fp exceptions correctly regardless of build config.
-
Joe Ramsay authored
The simplest way to set fenv correctly in Neon atan is to use a scalar fallback for under/overflow cases. However, this routine did not have a scalar counterpart, so we add a new one, based on the same algorithm and polynomial as the vector variants and accurate to 2.5 ULP. When any lane of the Neon input is special, this is now used as the fallback for all lanes.
-
Joe Ramsay authored
Both routines previously relied on the vector expm1(f) routine exposed by the library, whose fenv behaviour depended on WANT_SIMD_EXCEPT; however, both routines were expected to always trigger fp exceptions correctly. To remedy this, both routines now use an inlined helper for expm1 (reused from vector tanhf in the case of sinhf), and special-case small as well as large input when WANT_SIMD_EXCEPT is enabled.
-
- Dec 20, 2022
Joe Ramsay authored
The pipe prevented FAILs and PASSes from being counted properly: because the while read loop ran in a subshell, its counter updates were lost. The loop has been rewritten without a pipe, as it was prior to the changes here. fenv checking is temporarily disabled in Neon sinh and sinhf, as they do not get it right; it will be re-enabled once they have been fixed.
-
Pierre Blanchard authored
Updated comment and test threshold.
-
Joe Ramsay authored
The simplest way to set fenv correctly in Neon atanf is to use a scalar fallback for under/overflow cases. However, this routine did not have a scalar counterpart, so we add a new one, based on the same algorithm and polynomial as the vector variants and accurate to 2.9 ULP. When any lane of the Neon input is special, this is now used as the fallback for all lanes.
-
Joe Ramsay authored
New routines use the same algorithm, with simplified argument reduction and recombination in the vector variant. Both are accurate to 2 ULP.
-
- Dec 19, 2022
Joe Ramsay authored
New max observed - updated filenames, comments and runulp threshold.
-
Joe Ramsay authored
We were previously misusing the WANT_ERRNO build flag. This is now replaced everywhere appropriate with WANT_SIMD_EXCEPT. A small number of vector routines get fp exceptions right with no modification - the tests have been updated to track this.
-
Pierre Blanchard authored
A new implementation, based on the same approach as Neon logf, which is accurate to 2.48 ULPs. Flags are set correctly regardless of WANT_ERRNO.
-
- Dec 15, 2022
Joe Ramsay authored
To conclude the work on simplifying the runulp.sh script, a new macro has been introduced to specify the intervals in which a routine should be tested in the routine source. This is eventually consumed by runulp.sh.
-
Joe Ramsay authored
Introduces a new macro, similar to how ULP thresholds are now handled, that emits a list of routines which are expected to correctly trigger fenv exceptions, to be consumed by runulp.sh. All scalar routines are expected to do so. A small number of Neon routines are also expected to, depending on WANT_ERRNO.
-
Joe Ramsay authored
Introduces a new set of macros and Make rules for mechanically generating a list of ULP limits for each routine, to be consumed by runulp.sh. This removes the need to maintain long lists of thresholds in runulp.sh.
-
Joe Ramsay authored
Instead of maintaining three separate lists of routines, which are cumbersome and prone to merge conflicts, we provide a new macro, PL_SIG, from which some preprocessor machinery outputs the lists in the required format (macro formats have been changed very slightly to make the generation simpler). Only routines with simple signatures are handled: binary functions still need mathbench wrappers defined manually, and routines with non-standard references (e.g. powi/powk) still need entries and wrappers defined manually.
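The entry does not show PL_SIG itself. One common way to generate several lists from a single description is an X-macro, sketched below in C: one list macro is expanded with different per-entry macros to produce an enum, a table and a text listing. `ROUTINE_LIST`, the `AS_*` helpers and the intervals are hypothetical and do not reproduce the library's macro or its output format.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical single list of routines with their test intervals. */
#define ROUTINE_LIST(X) \
  X (sinf, -3.1, 3.1)   \
  X (cosf, -3.1, 3.1)   \
  X (expf, -9.9, 9.9)

/* Expansion 1: an enum, which also yields the number of routines. */
#define AS_ENUM(name, lo, hi) ID_##name,
enum { ROUTINE_LIST (AS_ENUM) NUM_ROUTINES };

/* Expansion 2: a table of names and intervals, e.g. for a benchmark. */
struct routine { const char *name; double lo, hi; };
#define AS_ENTRY(name, lo, hi) { #name, lo, hi },
static const struct routine routines[] = { ROUTINE_LIST (AS_ENTRY) };

/* Expansion 3: one text line per routine, for a script to consume. */
void
print_intervals (FILE *out)
{
#define AS_LINE(name, lo, hi) fprintf (out, "%s %g %g\n", #name, lo, hi);
  ROUTINE_LIST (AS_LINE)
#undef AS_LINE
}

/* Look up a routine's entry by name; returns NULL if it is not listed. */
const struct routine *
find_routine (const char *name)
{
  for (int i = 0; i < NUM_ROUTINES; i++)
    if (strcmp (routines[i].name, name) == 0)
      return &routines[i];
  return NULL;
}
```

Adding a routine then means adding one line to the list, rather than editing each generated list separately.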
-
- Dec 13, 2022
Joe Ramsay authored
New behaviour is hidden behind the WANT_ERRNO config option.
-
Joe Ramsay authored
New behaviour is hidden behind the WANT_ERRNO config option.
-
Joe Ramsay authored
Flags set correctly regardless of WANT_ERRNO.
-
Pierre Blanchard authored
Test threshold fixed.
-
Joe Ramsay authored
New behaviour is hidden behind the WANT_ERRNO config option.
-
- Dec 09, 2022
Joe Ramsay authored
Add macros for simplifying polynomial evaluation using either Horner, pairwise Horner or Estrin. Several routines have been modified to use the new helpers. Readability is improved slightly, and we expect that this will make prototyping new routines simpler.
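Horner, pairwise Horner and Estrin evaluate the same polynomial but arrange the operations differently: Horner is one serial chain of multiply-adds, pairwise Horner runs Horner in x^2 over pairs of terms, and Estrin combines independent pairs in a tree of powers x^2, x^4, trading a few extra multiplications for shorter dependency chains. The scalar C functions below are illustrative only and are not the library's macros; real code would typically use fused multiply-add for each step.

```c
/* All three evaluate p(x) = c[0] + c[1] x + ... + c[n-1] x^(n-1). */

/* Horner: one dependency chain of n-1 multiply-adds. */
double
horner (double x, const double *c, int n)
{
  double p = c[n - 1];
  for (int i = n - 2; i >= 0; i--)
    p = p * x + c[i];
  return p;
}

/* Pairwise Horner: Horner in x^2 over pairs (c[2i] + c[2i+1] x),
   roughly halving the chain length. n must be even. */
double
pairwise_horner (double x, const double *c, int n)
{
  double x2 = x * x;
  double p = c[n - 2] + c[n - 1] * x;
  for (int i = n - 4; i >= 0; i -= 2)
    p = p * x2 + (c[i] + c[i + 1] * x);
  return p;
}

/* Estrin for degree 7 (8 coefficients): the four pairs are independent,
   then combined with x^2 and finally x^4. */
double
estrin7 (double x, const double c[8])
{
  double x2 = x * x, x4 = x2 * x2;
  double p01 = c[0] + c[1] * x, p23 = c[2] + c[3] * x;
  double p45 = c[4] + c[5] * x, p67 = c[6] + c[7] * x;
  double p03 = p01 + p23 * x2, p47 = p45 + p67 * x2;
  return p03 + p47 * x4;
}
```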
-
Joe Ramsay authored
Small simplification - pl routines do not support different rounding modes, so there is no need to support them in runulp.sh. As a result we can also remove Ldir.
-
- Dec 08, 2022
Joe Ramsay authored
Special lanes were not being properly masked when a lane was tiny. This is now fixed.
-
Pierre Blanchard authored
Fix a bug that could produce random results in the boring domain, by saturating the index at an appropriate value.
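The entry does not show the lookup involved. Assuming a table indexed by a value scaled from the input, an index that keeps growing with |x| can run past the end of the table in the boring domain and read whatever memory follows, which would explain random results; saturating (clamping) the index keeps every access in bounds. `TABLE_SIZE`, `table`, `table_index` and `lookup` are hypothetical.

```c
#include <stdint.h>

#define TABLE_SIZE 8

/* Hypothetical table of precomputed values. */
static const double table[TABLE_SIZE]
    = { 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7 };

/* Map x to a table index, saturating at the last entry so large inputs
   cannot read past the end of the table. */
static uint32_t
table_index (double x, double scale)
{
  double i = x * scale;
  if (!(i < TABLE_SIZE - 1)) /* Also catches NaN. */
    return TABLE_SIZE - 1;
  if (i < 0)
    return 0;
  return (uint32_t) i;
}

double
lookup (double x)
{
  return table[table_index (x, 4.0)];
}
```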
-
- Dec 07, 2022
Joe Ramsay authored
In most cases, we mask lanes which should not trigger exceptions with a neutral value, then let the existing special-case handler fix them up later. For exp and exp2 we replace the more complex special-case handler with a simple scalar fallback. All new behaviour is tested in runulp.sh, with a new option to pass -f to the run line. We also extend the fenv testing to Neon log and logf, which already triggered exceptions correctly. New behaviour is mostly hidden behind a new config setting, WANT_SIMD_EXCEPT.
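The masking pattern can be sketched with a simulated four-lane vector in plain C. Everything here is hypothetical: the operation is a simple reciprocal rather than exp or log, a zero input stands in for a special lane (it would raise FE_DIVBYZERO), and in a real routine the fast path would be a polynomial approximation rather than a plain division. Special lanes are replaced with 1.0, a neutral input for which the fast path raises nothing, and are then recomputed by the scalar operation, which raises whatever the scalar routine would.

```c
#define LANES 4

/* Scalar fallback: behaves exactly like the scalar operation,
   including raising FE_DIVBYZERO for x == 0. */
static double
scalar_recip (double x)
{
  return 1.0 / x;
}

void
v_recip (const double x[LANES], double y[LANES])
{
  int special[LANES];
  int any_special = 0;
  double xm[LANES];

  /* Mask special lanes with a neutral input (1.0). */
  for (int i = 0; i < LANES; i++)
    {
      special[i] = (x[i] == 0.0);
      any_special |= special[i];
      xm[i] = special[i] ? 1.0 : x[i];
    }

  /* Fast path on the masked input: cannot divide by zero. */
  for (int i = 0; i < LANES; i++)
    y[i] = 1.0 / xm[i];

  /* Fix up special lanes with the scalar routine. */
  if (any_special)
    for (int i = 0; i < LANES; i++)
      if (special[i])
        y[i] = scalar_recip (x[i]);
}
```

In the library, per the entry above, this kind of behaviour is mostly gated by WANT_SIMD_EXCEPT, since the masking and fix-up cost extra work on every call.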
-