  1. Nov 13, 2020
    • math: fix spurious underflow in erff and erf · 94b4be60
      Szabolcs Nagy authored
      The code relied on the final x + c*x to be done via an fma, otherwise
      the intermediate c*x could underflow for tiny (almost subnormal) x.
      
      Use explicit fmaf like elsewhere (this code is not expected to be
      fast when fma is not inlined, but at least it should be correct).
    • math: fix erf tests in directed rounding modes · 15a1d62b
      Szabolcs Nagy authored
      erf has errors larger than 1 ULP in directed rounding modes, so
      increase the error threshold to 1.4 ULP in the test script.
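For context, an error of 1.4 ULP means the result is off by 1.4 times the spacing between adjacent floating-point values at the reference result. A minimal sketch of that measurement follows. It is shown for single precision so that a `double` can serve as the higher-precision reference; `ulp_errorf` is a hypothetical helper written for illustration and is not the test script's actual logic.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: error of a float result `got`, measured in float
   ULPs, against a higher-precision reference value `want`.  */
double ulp_errorf(float got, double want)
{
    assert(isfinite(want));
    int e = -125;                     /* exponent used for zero and subnormals */
    if (want != 0.0) {
        (void)frexp(want, &e);        /* |want| = m * 2^e with 0.5 <= m < 1 */
        if (e < -125)
            e = -125;                 /* below FLT_MIN the spacing stays 2^-149 */
    }
    double ulp = ldexp(1.0, e - 24);  /* float spacing in [2^(e-1), 2^e) */
    return fabs((double)got - want) / ulp;
}
```

A test harness built on such a helper would compare the returned value against a threshold such as 1.4 rather than requiring an exact match.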
    • math: add scalar erf · 00e5afdb
      Pierre Blanchard authored
      Only tested in round-to-nearest mode. The expected worst-case error
      is 1.01 ULP near x=1.25. Benchmarked over random x in [-6,6], it
      can increase performance by > 2x (> 3.5x for throughput) on big
      out-of-order cores compared to the implementation in glibc 2.28.
      
      Includes data for erfc too, but this patch only adds erf.
  2. Nov 05, 2020
  3. Oct 29, 2020
    • math: add scalar erff. · 1a79237d
      Pierre Blanchard authored
      In round-to-nearest mode the maximum error is 1.09 ULP.
      
      Compared to glibc-2.28 erff: throughput is about 2.2x better,
      latency is about 1.5x better on some AArch64 cores (on random
      input in [-4,4]).
      
      There are further optimization and quality improvement opportunities.
  4. Aug 14, 2020
    • v20.08 release · 0f4ae0c5
      Szabolcs Nagy authored
      * Bug fixes
        * strcmp-mte nul check
        * strncmp-mte with large size
        * arm memcpy with large size (CVE-2020-6096)
      * String routines performance improvements
        * strlen
        * memmove with backward copy
      * Benchmarking code for strings and memory routines
        * strlen
    • string: Benchmark unaligned memmove · ded1e17e
      Wilco Dijkstra authored
      Add benchmarking of forward and backward unaligned memmoves.
    • string: Improve backwards memmove performance · cf3b6b37
      Wilco Dijkstra authored
      On some microarchitectures performance of the backwards memmove improves
      if the stores use STR with decreasing addresses.
  5. Jul 29, 2020
  6. Jul 01, 2020
    • string: Optimize strlen · 224cb5f6
      Wilco Dijkstra authored
      Optimize strlen using a mix of scalar and SIMD code. On modern
      microarchitectures, large strings are 55% faster than the current
      version and 35% faster than strlen-mte. On the random strlen
      benchmark, the speedups are 3.4% and 40% respectively.
  7. Jun 23, 2020
  8. Jun 12, 2020
  9. Jun 01, 2020
    • string: Fix issue in strcmp-mte NUL check · f08b12e8
      Wilco Dijkstra authored
      Improve the previous fix: if a string is immediately preceded by a NUL byte
      and its first byte is 0x1, the NUL check may mistake that first byte for a
      NUL byte. Instead of removing bytes outside the string via a shift, force
      them to be non-NUL.
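The false positive described in this commit comes from the borrow in the usual word-at-a-time zero-byte test. The following C sketch illustrates the idea on a 64-bit word. It is not the assembly from the commit: the helper names are hypothetical, and byte 0 of the word is assumed to hold the lowest address (a little-endian view).

```c
#include <assert.h>
#include <stdint.h>

#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* Classic test: the result is nonzero iff some byte of w is zero.  The
   lowest flagged byte is always a real zero, but a 0x01 byte directly
   above a zero byte is also flagged because the subtraction borrows.  */
static uint64_t zero_byte_mask(uint64_t w)
{
    return (w - ONES) & ~w & HIGHS;
}

/* Flawed approach: compute the mask first, then shift out the `skip`
   low bytes that precede the string.  */
uint64_t nul_mask_shift(uint64_t w, unsigned skip)
{
    assert(skip < 8);
    return zero_byte_mask(w) >> (8 * skip);
}

/* Fixed approach: force the bytes before the string to be non-NUL,
   then run the test on the whole word.  */
uint64_t nul_mask_force(uint64_t w, unsigned skip)
{
    assert(skip < 8);
    uint64_t pad = (UINT64_C(1) << (8 * skip)) - 1;  /* 0xff in each skipped byte */
    return zero_byte_mask(w | pad);
}
```

With the word `0x6161616161610100` and `skip` set to 1, the byte before the string is NUL and the first string byte is 0x01. The shift variant reports a NUL at the first string byte, while the forcing variant reports none.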
  10. May 29, 2020
    • v20.05 release · ef907c7a
      Szabolcs Nagy authored
      * New functionality (64-bit Arm)
        * string: Optimized MTE variants of strlen, strnlen, strchr,
          strchrnul, strrchr, memchr, memrchr, strcpy, stpcpy, strcmp,
          strncmp
        * string: Changes to support BTI
        * string: New optimized memrchr, strnlen
      * Performance improvements (Neoverse N1)
        * strchr/strchrnul: 21% improvement on long strings
        * strrchr: 11% improvement
        * strnlen: 130% improvement on long strings, 50% on short strings
      * Benchmark and tests
        * string: New memcpy benchmark
        * string: Cleanup testsuite and improve test coverage
    • string: Fix issue in strcmp-mte · 304137d8
      Wilco Dijkstra authored
      Ensure NUL bytes before unaligned strings are correctly ignored.
  11. May 28, 2020
    • string: Improve strcmp-mte performance · 27bb6b2b
      Wilco Dijkstra authored
      Improve strcmp performance. On various microarchitectures the speedup is 65%
      on large unaligned strings and 21% on large (mutually) aligned strings.
      On small unaligned strings the speedup is 12%.
    • string: Improve memcpy benchmark · 2525af9b
      Wilco Dijkstra authored
      Print results in bytes/ns. Add medium and large copy benchmark.
    • string: Add MTE support to string tests. · 4d55c2d3
      Branislav Rankov authored
      Set tags for every test case so that boundaries are as narrow as
      possible. There is no handling of tag faults, so the test will
      crash if there is an MTE problem.
      
      The implementations that are not compatible are excluded, including
      the standard symbols that may come from an MTE-incompatible libc.
  12. May 22, 2020
  13. May 20, 2020
  14. May 18, 2020
    • string: Improve strrchr-mte performance · a99a1a96
      Wilco Dijkstra authored
      Improve strrchr performance by using a fast strchr loop to find the first
      match. On various microarchitectures the speedup is 30-80% on large strings
      and 32% on small strings.
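The idea of locating the first match with a fast strchr-style search before tracking later matches can be sketched in portable C. This is only an illustration of the approach, not the assembly from the commit: `strrchr_sketch` is a hypothetical name, and continuing the scan with repeated `strchr` calls is an assumption made to keep the example short.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: find the last occurrence of c in s by first using a
   fast forward search for the first match, then looking for later ones.  */
char *strrchr_sketch(const char *s, int c)
{
    assert(s != NULL);
    char *last = strchr(s, c);          /* fast search for the first match */
    if (last == NULL || (char)c == '\0')
        return last;                    /* no match, or matched the terminator */
    for (char *p; (p = strchr(last + 1, c)) != NULL; )
        last = p;                       /* remember the most recent match */
    return last;
}
```

Strings with no match return after a single fast pass, which is where a cheaper first-match loop would be expected to help most.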
  15. May 13, 2020
  16. May 12, 2020
  17. May 01, 2020
    • string: add a setting to disable GNU Property Notes · e1127946
      Szabolcs Nagy authored
      GNU Property Notes are only supported in recent tooling and older
      tools may warn about them, so it makes sense to remove these notes
      on a system where BTI is not supported anyway.
      
      The actual BTI instructions should be kept in place to avoid
      disturbing code layout.
      
      -DWANT_GNU_PROPERTY=0 removes the .note.gnu.property section
      from assembly files (ideally it would be based on the compiler
      default setting, but there is no feature test macro for BTI and
      PAC-RET).
    • string: Further improve strchrnul-mte performance · fa69d42a
      Wilco Dijkstra authored
      Remove 2 more instructions, resulting in a 9.8% speedup on medium-sized
      strings (16-32).
      
      The BTI patch changed ENTRY so the loops got misaligned; this fixes
      that regression.
    • string: Further improve strchr-mte performance · 7bb8464f
      Wilco Dijkstra authored
      Remove 2 more instructions, resulting in a 6.8% speedup on medium-sized
      strings (16-32).
      
      The BTI patch changed ENTRY so the loops got misaligned; this fixes
      that regression.