  1. Jan 24, 2023
    • v23.01 release · 56e3bf05
      Szabolcs Nagy authored
      * Project changes
        * All files are now under a new dual license (MIT OR Apache-2.0 WITH
          LLVM-exception, at the user's election).
        * Added MAINTAINERS file describing who maintains the subdirectories.
        * Added README.contributors files documenting contribution
          requirements.
        * Added new pl/ subdirectory for Arm's Performance Library related
          routines.
      * String routine changes
        * Added memset benchmark.
        * Improved strlen and memcpy benchmarks.
        * Added SVE memcpy.
        * Updated arm string functions to support M-profile PACBTI.
        * Merged the MTE and generic versions of strcmp, strncmp, strcpy and
          stpcpy into one implementation.
        * Optimized memcmp, memchr-mte, memrchr, strchr-mte, strchrnul-mte,
          strrchr-mte, strlen, strlen-mte, strnlen, strcpy.
      * Math routine changes
        * Fixed constants in sinf, cosf and sincosf to be compile time
          computed even with gcc-12 -frounding-math.
        * Fixed an invalid shift in logf.
        * Added support for floating-point exceptions in vector math routines
          when WANT_SIMD_EXCEPT is set.
    • pl/math: Fix a copyright notice for consistency · a1b6ffb3
      Szabolcs Nagy authored
      The (c) is not strictly required, but it was only missing from one file.
    • Update copyright years · 1eb5d7c2
      Szabolcs Nagy authored
      Scripted copyright year updates based on git committer date.
    • string: Improve SVE memcpy · 92864946
      Wilco Dijkstra authored
      Improve SVE memcpy by copying 2 vectors. This avoids a check on vector length
      and improves performance of random memcpy.
  2. Jan 23, 2023
    • pl/math: Reduce order of single-precision tan polynomial · a7b60220
      Joe Ramsay authored
      For both vector and scalar routines we reduce the order from 6 to
      5. In the vector routines this requires reducing RangeVal, because
      the tan polynomial is not quite accurate enough for large values. The
      scalar routine already uses the cotan polynomial in that region, so it
      does not need to change.
      
      The accuracy of the scalar routine is unchanged. Both vector routines
      are now accurate to 3.45 ULP, with the same worst case.
  3. Jan 19, 2023
    • pl/math: Add vector/Neon tan · 76c2badb
      Joe Ramsay authored
      The new routine uses a similar technique to the single-precision Neon
      routine, but with an extra reduction to pi/8 using the double-angle
      formula. It is accurate to 3.5 ULP.
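      The double-angle step can be illustrated with a scalar C sketch. It
      reduces x to r in [-pi/4, pi/4], evaluates tan(h) for h = r/2 in
      [-pi/8, pi/8] and rebuilds tan(r) from tan(2h) = 2 tan(h) / (1 - tan^2(h)),
      using -cot(r) in odd quadrants. The polynomial is a truncated Taylor
      series chosen for illustration (an assumption, not the library's
      coefficients), and the naive reduction is only adequate for moderate |x|.

      ```c
      #include <math.h>

      /* Hypothetical helper: Taylor series of tan(h) up to h^15, adequate for
         |h| <= pi/8. The library uses its own polynomial instead. */
      static double tan_poly(double h)
      {
          static const double c[] = {
              1.0 / 3, 2.0 / 15, 17.0 / 315, 62.0 / 2835,
              1382.0 / 155925, 21844.0 / 6081075, 929569.0 / 638512875,
          };
          double h2 = h * h;
          double p = c[6];
          for (int i = 5; i >= 0; i--)   /* Horner in h^2 */
              p = p * h2 + c[i];
          return h + h * h2 * p;         /* tan(h) ~ h + h^3 * P(h^2) */
      }

      /* Sketch of tan(x): reduce to r in [-pi/4, pi/4], halve to h in
         [-pi/8, pi/8], then rebuild tan(r), or -cot(r) in odd quadrants. */
      double tan_sketch(double x)
      {
          static const double pi_2 = 1.57079632679489661923;
          double n = nearbyint(x / pi_2);  /* nearest multiple of pi/2 */
          double r = x - n * pi_2;         /* naive: moderate |x| only */
          double t = tan_poly(0.5 * r);    /* t = tan(r/2) */
          double num = 2.0 * t;            /* tan(r) = 2t / (1 - t^2) */
          double den = 1.0 - t * t;
          /* Odd n: tan(r + pi/2) = -cot(r) = -den / num. */
          return fmod(n, 2.0) != 0.0 ? -den / num : num / den;
      }
      ```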
  4. Jan 10, 2023
    • string: Compile memcpy-sve.S for aarch64 if compiler supports it · cd28f3c4
      Jake Weinstein authored
      This is a partial revert of b7e368fb. If the SVE assembly is guarded by
      __ARM_FEATURE_SVE, it is not built unless the build system enables SVE.
      This is fine for AOR, but Android (bionic) uses ifuncs to select the
      appropriate assembly at run time, so these files need to compile
      regardless of whether the target actually supports the instructions.

      Check for AArch64 and GCC >= 8 or Clang >= 5 so that SVE is not used
      on compilers that do not support it. This condition will always be
      true for future builds of Android for AArch64.
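      One way to express this compiler check with the C preprocessor is
      sketched below; the commit's exact condition is not shown here, so the
      macro SVE_ASM_BUILDABLE and the function name are hypothetical. Clang
      also defines __GNUC__ (as 4), so Clang has to be tested before the GCC
      version.

      ```c
      /* Hypothetical guard: AArch64 with Clang >= 5 or GCC >= 8. */
      #if defined(__aarch64__) && \
          ((defined(__clang__) && __clang_major__ >= 5) || \
           (!defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 8))
      #define SVE_ASM_BUILDABLE 1
      #else
      #define SVE_ASM_BUILDABLE 0
      #endif

      /* Reports whether SVE assembly would be assembled in this build. It says
         nothing about whether the CPU running the code implements SVE. */
      int sve_asm_buildable(void)
      {
          return SVE_ASM_BUILDABLE;
      }
      ```

      Whether the SVE code actually runs stays a run-time decision, which is
      what bionic's ifuncs provide.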
    • string: Optimize strcpy · 7c1d7a24
      Wilco Dijkstra authored
      Optimize strcpy main loop - large strings are ~22% faster.
    • string: Improve strrchr-mte · 10589b2c
      Wilco Dijkstra authored
      Use shrn for narrowing the mask which simplifies code. Unroll the
      strchr search loop which improves performance on large strings.
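      The shrn narrowing can be modelled in portable C. A byte compare (cmeq)
      yields 0xff or 0x00 per byte; shifting each 16-bit lane right by 4 and
      narrowing (shrn #4) packs the 16 results into a 64-bit value with 4 bits
      per input byte, so the position of a match follows from a count of
      leading or trailing zeros. In this sketch a scalar loop stands in for the
      two vector instructions, and __builtin_clzll assumes GCC or Clang.

      ```c
      #include <stdint.h>

      /* Scalar model of the mask that cmeq + shrn #4 produce for a 16-byte
         chunk: byte i of the chunk maps to bits 4*i .. 4*i+3. */
      static uint64_t nibble_mask(const unsigned char chunk[16], unsigned char c)
      {
          uint64_t mask = 0;
          for (int i = 0; i < 16; i++)
              if (chunk[i] == c)
                  mask |= (uint64_t)0xf << (4 * i);
          return mask;
      }

      /* Index of the last byte equal to c in the chunk, or -1 if none.
         The highest set bit of a match at index i is 4*i + 3. */
      int last_match_index(const unsigned char chunk[16], unsigned char c)
      {
          uint64_t mask = nibble_mask(chunk, c);
          if (mask == 0)
              return -1;
          return (63 - __builtin_clzll(mask)) / 4;
      }
      ```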
  5. Jan 09, 2023
  6. Jan 06, 2023
  7. Jan 05, 2023
  8. Dec 30, 2022
  9. Dec 22, 2022
    • pl/math: Add scalar & vector/Neon atanh · 0d000be2
      Joe Ramsay authored
      Both new routines are based on the existing log1p routines. The scalar
      routine is accurate to 3 ULP and the Neon routine to 3.5 ULP. Both set
      fp exceptions correctly regardless of build configuration.
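      The link to log1p is the identity
      atanh(x) = 0.5 * log((1 + x) / (1 - x)) = 0.5 * log1p(2x / (1 - x)).
      A minimal scalar sketch, calling the C library's log1p rather than the
      library's own implementation:

      ```c
      #include <math.h>

      /* atanh via log1p. Using log1p keeps accuracy for small |x|, where
         (1 + x) / (1 - x) is close to 1. Special cases fall out naturally:
         x = 1 gives +inf, x = -1 gives -inf, |x| > 1 gives NaN. */
      double atanh_sketch(double x)
      {
          return 0.5 * log1p(2.0 * x / (1.0 - x));
      }
      ```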
    • pl/math: Add scalar atan and set fenv in Neon atan · 2015eee4
      Joe Ramsay authored
      The simplest way to set fenv in Neon atan is to use a scalar fallback
      for under/overflow cases. However, this routine did not have a scalar
      counterpart, so we add a new one, based on the same algorithm and
      polynomial as the vector variants and accurate to 2.5 ULP. It is now
      used as the fallback for all lanes when any lane of the Neon input is
      special.
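      The fallback policy (if any lane is special, recompute every lane with
      the scalar routine) can be sketched with plain arrays standing in for a
      4-lane Neon vector. The special-case threshold and all names are
      hypothetical, and atanf from the C library plays both the fast path and
      the scalar fallback, so only the control flow is meaningful.

      ```c
      #include <math.h>
      #include <stdbool.h>

      #define LANES 4 /* models one 4-lane single-precision Neon vector */

      /* Hypothetical special-case test: tiny inputs (where a fast path may not
         raise underflow as the scalar routine does) and inf/NaN. */
      static bool is_special(float x)
      {
          return fabsf(x) < 0x1p-63f || !isfinite(x);
      }

      /* Stand-in for the vector fast path; a real routine evaluates its own
         polynomial here. atanf is reused only to keep the sketch short. */
      static void fast_path(const float in[LANES], float out[LANES])
      {
          for (int i = 0; i < LANES; i++)
              out[i] = atanf(in[i]);
      }

      /* If any lane is special, every lane is recomputed by the scalar
         routine, so fp exceptions come from scalar code. Returns true when
         the fallback was taken. */
      bool atan_lanes(const float in[LANES], float out[LANES])
      {
          bool any_special = false;
          for (int i = 0; i < LANES; i++)
              if (is_special(in[i]))
                  any_special = true;

          if (!any_special) {
              fast_path(in, out);
              return false;
          }
          for (int i = 0; i < LANES; i++)
              out[i] = atanf(in[i]); /* scalar fallback, lane by lane */
          return true;
      }
      ```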
    • pl/math: Fix fp exceptions in Neon sinhf and sinh · 0a9270a2
      Joe Ramsay authored
      Both routines previously relied on the vector expm1(f) routine exposed
      by the library, whose fenv behaviour depended on WANT_SIMD_EXCEPT, even
      though both routines were expected to always trigger fp exceptions
      correctly. To remedy this, both routines now use an inlined helper for
      expm1 (reused from vector tanhf in the case of sinhf), and special-case
      small as well as large inputs when WANT_SIMD_EXCEPT is enabled.
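      sinh can be built on expm1 through the identity
      sinh(x) = (t + t / (t + 1)) / 2 with t = expm1(|x|), restoring the sign
      afterwards because sinh is odd. This sketch calls the C library's expm1
      rather than an inlined helper, and omits the small- and large-input
      special-casing that the commit describes.

      ```c
      #include <math.h>

      /* With t = e^a - 1: e^a = t + 1 and e^-a = 1 / (t + 1), so
         sinh(a) = ((t + 1) - 1 / (t + 1)) / 2 = (t + t / (t + 1)) / 2.
         expm1 avoids the cancellation in e^a - e^-a for small a.
         No special cases: for |x| above about 709.78, expm1 overflows and
         this returns NaN, so a real routine handles large inputs separately. */
      double sinh_sketch(double x)
      {
          double t = expm1(fabs(x));            /* t = e^|x| - 1 >= 0 */
          double s = 0.5 * (t + t / (t + 1.0)); /* sinh(|x|) */
          return copysign(s, x);                /* sinh is odd */
      }
      ```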
  10. Dec 20, 2022
    • Correct exit code from runulp.sh · 3bfa7bd4
      Joe Ramsay authored
      The pipe prevented FAILs and PASSes from being counted properly, so the
      while read loop has been rewritten without a pipe, as it was before
      these changes.

      fenv checking is temporarily disabled in Neon sinh and sinhf, as they
      do not yet raise exceptions correctly. It will be re-enabled once they
      have been fixed.
    • pl/math: Update ULP threshold for SVE erf · 7ab15c5f
      Pierre Blanchard authored
      Updated comment and test threshold.
    • pl/math: Add scalar atanf and set fenv in Neon atanf · f312cb80
      Joe Ramsay authored
      The simplest way to set fenv in Neon atanf is to use a scalar fallback
      for under/overflow cases. However, this routine did not have a scalar
      counterpart, so we add a new one, based on the same algorithm and
      polynomial as the vector variants and accurate to 2.9 ULP. It is now
      used as the fallback for all lanes when any lane of the Neon input is
      special.
    • pl/math: Add scalar & vector/Neon cbrt · 04e91eca
      Joe Ramsay authored
      The new routines use the same algorithm, with simplified argument
      reduction and recombination in the vector variant. Both are accurate
      to 2 ULP.
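      The usual exponent-based argument reduction for cbrt is sketched below
      in scalar C: write |x| = m * 2^e with m in [0.5, 1), split e = 3q + r,
      approximate cbrt(m), then recombine with cbrt(2^r) and 2^q. The linear
      initial guess and Newton iterations are assumptions for illustration;
      the library's routines use their own approximation and recombination.

      ```c
      #include <math.h>

      /* cbrt(|x|) = cbrt(m) * cbrt(2^r) * 2^q, where |x| = m * 2^e and
         e = 3q + r with r in {0, 1, 2}. */
      double cbrt_sketch(double x)
      {
          static const double cbrt_2r[3] = {
              1.0, 1.2599210498948731648, 1.5874010519681994748, /* cbrt(2^r) */
          };
          if (x == 0.0 || !isfinite(x))
              return x; /* +-0, +-inf and NaN are returned unchanged */

          int e;
          double m = frexp(fabs(x), &e); /* |x| = m * 2^e, 0.5 <= m < 1 */
          int r = ((e % 3) + 3) % 3;     /* non-negative remainder */
          int q = (e - r) / 3;           /* exact: e - r is a multiple of 3 */

          /* Linear guess through (0.5, cbrt(0.5)) and (1, 1), then Newton's
             method for y^3 = m: y <- (2y + m / y^2) / 3 (quadratic). */
          double y = 0.5874010519681994 + 0.4125989480318006 * m;
          for (int i = 0; i < 4; i++)
              y = (2.0 * y + m / (y * y)) / 3.0;

          return copysign(ldexp(y * cbrt_2r[r], q), x);
      }
      ```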
  11. Dec 19, 2022
  12. Dec 15, 2022
    • pl/math: Move test intervals to routine source files · 202e4631
      Joe Ramsay authored
      To conclude the work on simplifying the runulp.sh script, a new macro
      has been introduced to specify, in the routine source, the intervals in
      which a routine should be tested. These intervals are eventually
      consumed by runulp.sh.
    • pl/math: Move fenv expectations out of runulp.sh · d748e152
      Joe Ramsay authored
      Introduces a new macro, analogous to the new handling of ULP
      thresholds, which emits the list of routines expected to trigger fenv
      exceptions correctly, for consumption by runulp.sh. All scalar routines
      are expected to do so, as are a small number of Neon routines,
      depending on WANT_ERRNO.
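      Whether a routine triggers an exception can be checked with the
      standard C99 <fenv.h> interface: clear the flags, call the routine, then
      test which flags are set. A minimal sketch; the helper name is
      hypothetical and this is not how the library's ulp tool is written.

      ```c
      #include <fenv.h>
      #include <math.h>
      #include <stdbool.h>

      /* Code that inspects fp status flags should enable FENV_ACCESS. GCC
         ignores this pragma, so values also go through volatile objects to
         stop the compiler folding or discarding the call. */
      #pragma STDC FENV_ACCESS ON

      /* Returns true if f(x) raises every flag in `expected`. */
      bool raises(double (*f)(double), double x, int expected)
      {
          volatile double in = x;
          feclearexcept(FE_ALL_EXCEPT);
          volatile double out = f(in);
          (void)out;
          return fetestexcept(expected) == expected;
      }
      ```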
    • pl/math: Move ULP limits to routine source files · ecb1c6f6
      Joe Ramsay authored
      Introduces a new set of macros and Make rules for mechanically
      generating a list of ULP limits for each routine, to be consumed
      by runulp.sh. This removes the need to maintain long lists of
      thresholds in runulp.sh.
    • pl/math: Auto-generate mathbench and ulp headers · 1bca1a54
      Joe Ramsay authored
      Instead of maintaining three separate lists of routines, which are
      cumbersome and prone to merge conflicts, we provide a new macro,
      PL_SIG, which uses preprocessor machinery to output the lists in the
      required format (macro formats have been changed very slightly to make
      the generation simpler). Only routines with simple signatures are
      handled; binary functions still need mathbench wrappers defined
      manually. Similarly, routines with non-standard references (i.e.
      powi/powk) still need entries and wrappers defined manually.
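      A classic X-macro shows the general idea of generating several lists
      from one source of truth. PL_SIG itself is not reproduced here, so the
      names below are hypothetical and this is not necessarily how PL_SIG
      works: a single list of unary routines expands into both a benchmark
      table and a list of names for an accuracy-test tool.

      ```c
      #include <math.h>
      #include <stddef.h>
      #include <string.h>

      /* Single list of routines with a simple double(double) signature.
         The entries are illustrative, not the library's list. */
      #define UNARY_ROUTINES(X) X(sin) X(cos) X(atanh) X(cbrt)

      typedef double (*unary_fn)(double);

      /* Expansion 1: benchmark table of (name, function pointer). */
      struct bench_entry {
          const char *name;
          unary_fn fn;
      };
      #define BENCH_ENTRY(f) {#f, f},
      static const struct bench_entry bench_table[] = {UNARY_ROUTINES(BENCH_ENTRY)};

      /* Expansion 2: routine names known to the accuracy-test tool. */
      #define ULP_NAME(f) #f,
      static const char *const ulp_names[] = {UNARY_ROUTINES(ULP_NAME)};

      #define COUNT(a) (sizeof(a) / sizeof((a)[0]))

      /* Returns 1 if both generated tables list the same routines in order. */
      int tables_agree(void)
      {
          if (COUNT(bench_table) != COUNT(ulp_names))
              return 0;
          for (size_t i = 0; i < COUNT(bench_table); i++)
              if (strcmp(bench_table[i].name, ulp_names[i]) != 0)
                  return 0;
          return 1;
      }

      /* Looks up a benchmark entry by name; returns NULL if not listed. */
      unary_fn find_bench(const char *name)
      {
          for (size_t i = 0; i < COUNT(bench_table); i++)
              if (strcmp(bench_table[i].name, name) == 0)
                  return bench_table[i].fn;
          return NULL;
      }
      ```

      Adding a routine means adding one X(...) entry; both tables follow.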
  13. Dec 13, 2022
  14. Dec 09, 2022
    • pl/math: Add polynomial helpers · bc7cc9d2
      Joe Ramsay authored
      Add macros for simplifying polynomial evaluation using either Horner,
      pairwise Horner or Estrin. Several routines have been modified to use
      the new helpers. Readability is improved slightly, and we expect that
      this will make prototyping new routines simpler.
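      The three evaluation schemes are sketched below for a degree-7
      polynomial p(x) = c[0] + c[1] x + ... + c[7] x^7. The function names are
      hypothetical (the commit adds macros whose names are not given here).
      All three compute the same polynomial; they differ in the length of the
      chain of dependent multiply-adds.

      ```c
      /* Horner: 7 dependent multiply-adds. */
      double horner7(double x, const double c[8])
      {
          double p = c[7];
          for (int i = 6; i >= 0; i--)
              p = p * x + c[i];
          return p;
      }

      /* Pairwise Horner: even and odd coefficients form two independent
         Horner chains in x^2, combined as p(x) = E(x^2) + x * O(x^2). */
      double pw_horner7(double x, const double c[8])
      {
          double x2 = x * x;
          double e = c[6], o = c[7];
          for (int i = 4; i >= 0; i -= 2) {
              e = e * x2 + c[i];
              o = o * x2 + c[i + 1];
          }
          return e + x * o;
      }

      /* Estrin: combine coefficient pairs with x, then pairs of pairs with
         x^2, then the two halves with x^4 (about log2(degree) levels). */
      double estrin7(double x, const double c[8])
      {
          double x2 = x * x, x4 = x2 * x2;
          double p01 = c[0] + c[1] * x, p23 = c[2] + c[3] * x;
          double p45 = c[4] + c[5] * x, p67 = c[6] + c[7] * x;
          double p03 = p01 + p23 * x2, p47 = p45 + p67 * x2;
          return p03 + p47 * x4;
      }
      ```

      Horner uses the fewest operations; Estrin has the shortest dependency
      chain, which tends to matter more on wide out-of-order or SIMD hardware.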
    • pl/math/test: Simplify runulp.sh · 132d2f5d
      Joe Ramsay authored
      Small simplification: pl routines do not support different rounding
      modes, so runulp.sh does not need to handle them. As a result, Ldir can
      also be removed.
  15. Dec 08, 2022