  1. Feb 10, 2022
  2. Dec 21, 2021
    • math: fix constant in sinf and cosf · 074e8357
      Szabolcs Nagy authored
      gcc-12 -frounding-math started using runtime rounding mode for
      converting double constants to float, so abstop12(pio4) is no longer
      a compile-time constant (this is required by ISO C). Use float pio4f
      instead to make the generated code the same as before and avoid
      regressions on gcc-12.
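The effect described in this commit can be sketched in C as follows. This is a minimal reconstruction, not the committed source: the helper bodies, the constant values, and the `is_small_old` / `is_small_new` wrappers are assumptions made for illustration. In the old form the double constant is converted to float at the call site, which `-frounding-math` may defer to run time; in the new form the constant is already a float.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit integer. */
static inline uint32_t asuint(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Top 12 bits of the representation with the sign bit masked off. */
static inline uint32_t abstop12(float x)
{
    return (asuint(x) >> 20) & 0x7ff;
}

/* Assumed values of pi/4; the names follow the commit message. */
static const double pio4 = 0x1.921FB54442D18p-1;
static const float pio4f = 0x1.921FB6p-1f;

/* Old form: pio4 is converted from double to float when passed to
   abstop12, and that conversion depends on the rounding mode. */
static int is_small_old(float y)
{
    return abstop12(y) < abstop12(pio4);
}

/* New form: no conversion is needed, so the threshold can be folded
   to a constant at compile time. */
static int is_small_new(float y)
{
    return abstop12(y) < abstop12(pio4f);
}
```

In the default round-to-nearest mode both forms give the same answer; the difference is only in whether the compiler is allowed to evaluate the threshold ahead of time.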
  3. Oct 29, 2021
    • string: Optimize memcmp · 7a9fd160
      Wilco Dijkstra authored
      Rewrite memcmp to improve performance. On small and medium inputs
      performance is typically 25% better. Large inputs use a SIMD loop
      processing 64 bytes per iteration, which is 50% faster than the
      previous version.
  4. Oct 04, 2021
  5. Feb 17, 2021
    • v21.02 release · 6798b507
      Szabolcs Nagy authored
      * String routine changes
        * Added AArch64 ILP32 ABI support.
        * Fixed SVE strnlen return value.
        * Added MTE related __mtag_tag_region.
        * Added MTE related __mtag_tag_zero_region.
        * Minor code cleanups.
    • Update copyright years · e823e3ab
      Szabolcs Nagy authored
      Scripted copyright year updates based on git committer date.
  6. Feb 12, 2021
    • string: add __mtag_tag_zero_region · fcad5b82
      Szabolcs Nagy authored
      Add optimized __mtag_tag_zero_region(dst, len) operation to AOR. It tags
      the memory according to the tag of the dst pointer then memsets it to 0
      and returns dst. It requires MTE support. The memory remains untagged if
      tagging is not enabled for it. The dst must be 16 bytes aligned and len
      must be a multiple of 16.
      
      Similar to __mtag_tag_region, but uses the zeroing instructions.
    • string: add __mtag_tag_region · f8d6aece
      Szabolcs Nagy authored
      Add optimized __mtag_tag_region(dst, len) operation to AOR. It tags the
      given memory region according to the tag of the dst pointer and returns
      dst. It requires MTE support. The memory remains untagged if tagging is
      not enabled for it. The dst must be 16 bytes aligned and len must be a
      multiple of 16.
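The contract of the two tagging operations described above can be illustrated with a small software model. This is only a sketch under stated assumptions: it does not use MTE instructions, the `struct arena` shadow-tag array and every `model_*` name are hypothetical, and addresses are modelled as offsets into the arena with the 4-bit tag held in bits 59:56. It also ignores the case where tagging is not enabled for the memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRANULE 16u      /* one tag covers 16 bytes */
#define ARENA_SIZE 256u  /* hypothetical arena size for the model */

struct arena {
    unsigned char mem[ARENA_SIZE];
    uint8_t tags[ARENA_SIZE / GRANULE]; /* one shadow tag per granule */
};

/* Extract the 4-bit tag held in bits 59:56 of a modelled address. */
static unsigned ptr_tag(uint64_t p)
{
    return (unsigned)((p >> 56) & 0xf);
}

/* Build a modelled tagged address from an arena offset and a tag. */
static uint64_t with_tag(uint64_t off, unsigned tag)
{
    return (off & ~(0xfULL << 56)) | ((uint64_t)(tag & 0xf) << 56);
}

/* Model of __mtag_tag_region: tag every granule in [dst, dst+len)
   with the tag of dst and return dst. */
static uint64_t model_tag_region(struct arena *a, uint64_t dst, size_t len)
{
    uint64_t off = dst & ~(0xffULL << 56); /* strip the top byte */
    assert(off % GRANULE == 0 && len % GRANULE == 0);
    assert(off <= ARENA_SIZE && len <= ARENA_SIZE - off);
    for (size_t g = off / GRANULE; g < (off + len) / GRANULE; g++)
        a->tags[g] = (uint8_t)ptr_tag(dst);
    return dst;
}

/* Model of __mtag_tag_zero_region: as above, and also zero the bytes. */
static uint64_t model_tag_zero_region(struct arena *a, uint64_t dst, size_t len)
{
    model_tag_region(a, dst, len);
    memset(a->mem + (dst & ~(0xffULL << 56)), 0, len);
    return dst;
}
```

The asserts encode the stated preconditions (16-byte alignment of dst, len a multiple of 16); the real routines do not necessarily check them.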
  7. Jan 08, 2021
  8. Jan 04, 2021
  9. Dec 17, 2020
  10. Nov 16, 2020
  11. Nov 13, 2020
    • math: fix spurious underflow in erff and erf · 94b4be60
      Szabolcs Nagy authored
      The code relied on the final x + c*x to be done via an fma, otherwise
      the intermediate c*x could underflow for tiny (almost subnormal) x.
      
      Use explicit fmaf like elsewhere (this code is not expected to be
      fast when fma is not inlined, but at least it should be correct).
    • math: fix erf tests in directed rounding modes · 15a1d62b
      Szabolcs Nagy authored
      erf has larger than 1 ULP errors in directed rounding modes,
      increase the error threshold to 1.4 ULP in the test script.
    • math: add scalar erf · 00e5afdb
      Pierre Blanchard authored
      Only tested in round-to-nearest mode. The expected worst case error
      is 1.01 ULP near x=1.25. Benchmarked over random x in [-6,6], it
      can increase performance by > 2x (> 3.5x for throughput) on big
      out-of-order cores compared to the implementation in glibc 2.28.
      
      Includes data for erfc too, but this patch only adds erf.
  12. Nov 05, 2020
  13. Oct 29, 2020
    • math: add scalar erff. · 1a79237d
      Pierre Blanchard authored
      In round-to-nearest mode the maximum error is 1.09 ULP.
      
      Compared to glibc-2.28 erff: throughput is about 2.2x better,
      latency is about 1.5x better on some AArch64 cores (on random
      input in [-4,4]).
      
      There are further optimization and quality improvement opportunities.
  14. Aug 14, 2020
    • v20.08 release · 0f4ae0c5
      Szabolcs Nagy authored
      * Bug fixes
        * strcmp-mte nul check
        * strncmp-mte with large size
        * arm memcpy with large size (CVE-2020-6096)
      * String routines performance improvements
        * strlen
        * memmove with backward copy
      * Benchmarking code for strings and memory routines
        * strlen
    • string: Benchmark unaligned memmove · ded1e17e
      Wilco Dijkstra authored
      Add benchmarking of forward and backward unaligned memmoves.
    • string: Improve backwards memmove performance · cf3b6b37
      Wilco Dijkstra authored
      On some microarchitectures performance of the backwards memmove improves
      if the stores use STR with decreasing addresses.
  15. Jul 29, 2020
  16. Jul 01, 2020
    • string: Optimize strlen · 224cb5f6
      Wilco Dijkstra authored
      Optimize strlen using a mix of scalar and SIMD code. On modern
      microarchitectures large strings are 55% faster than the current version
      and 35% faster than strlen-mte. On the random strlen benchmark the
      speedups are 3.4% and 40% respectively.
  17. Jun 23, 2020
  18. Jun 12, 2020
  19. Jun 01, 2020
    • string: Fix issue in strcmp-mte NUL check · f08b12e8
      Wilco Dijkstra authored
      Improve the previous fix: if a string is immediately preceded by a NUL byte
      and its first byte is 0x1, the NUL check may mistake that first byte for a NUL.
      Instead of removing bytes outside the string via a shift, force them to be
      non-NUL.
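The failure mode can be sketched in C, assuming the common word-at-a-time zero-byte test on a little-endian 64-bit word. The actual routine is written in assembly, so the function names and the `skip` parameter here are hypothetical. The test `(x - 0x01..01) & ~x & 0x80..80` flags the lowest zero byte exactly, but the borrow from that byte can also flag a 0x01 byte directly above it. Masking the flags of the bytes before the string therefore leaves a false hit, while forcing those bytes to be non-NUL before the test avoids the borrow altogether.

```c
#include <assert.h>
#include <stdint.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Nonzero if some byte of x is zero. Only the lowest flagged byte is
   reliable: a borrow can also flag a 0x01 byte above a zero byte. */
static uint64_t has_zero(uint64_t x)
{
    return (x - ONES) & ~x & HIGHS;
}

/* Discard the flags of the `skip` low-order bytes that precede the
   string (little-endian). A borrow-induced flag can survive this. */
static uint64_t nul_mask_by_shift(uint64_t word, unsigned skip)
{
    assert(skip < 8);
    return has_zero(word) & (~0ULL << (8 * skip));
}

/* Force the `skip` low-order bytes to 0xff before testing, so they
   can neither be flagged nor generate a borrow. */
static uint64_t nul_mask_by_forcing(uint64_t word, unsigned skip)
{
    assert(skip < 8);
    uint64_t pad = ~(~0ULL << (8 * skip));
    return has_zero(word | pad);
}
```

For the word `0x6161616161610100` with `skip = 1` (a NUL just before the string, then 0x01 as its first byte), the shift-style mask reports a NUL at byte 1, whereas the forcing variant reports none.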
  20. May 29, 2020
    • v20.05 release · ef907c7a
      Szabolcs Nagy authored
      * New functionality (64-bit Arm)
        * string: Optimized MTE variants of strlen, strnlen, strchr,
          strchrnul, strrchr, memchr, memrchr, strcpy, stpcpy, strcmp,
          strncmp
        * string: Changes to support BTI
        * string: New optimized memrchr, strnlen
      * Performance improvements (Neoverse N1)
        * strchr/strchrnul: 21% improvement on long strings
        * strrchr: 11% improvement
        * strnlen: 130% improvement on long strings, 50% on short strings
      * Benchmark and tests
        * string: New memcpy benchmark
        * string: Cleanup testsuite and improve test coverage
    • string: Fix issue in strcmp-mte · 304137d8
      Wilco Dijkstra authored
      Ensure NUL bytes before unaligned strings are correctly ignored.
  21. May 28, 2020
    • string: Improve strcmp-mte performance · 27bb6b2b
      Wilco Dijkstra authored
      Improve strcmp performance. On various microarchitectures the speedup is 65%
      on large unaligned strings and 21% on large (mutually) aligned strings.
      On small unaligned strings the speedup is 12%.
    • string: Improve memcpy benchmark · 2525af9b
      Wilco Dijkstra authored
      Print results in bytes/ns. Add medium and large copy benchmark.
    • string: Add MTE support to string tests. · 4d55c2d3
      Branislav Rankov authored
      Set tags for every test case so that boundaries are as narrow as
      possible. There is no handling of tag faults, so the test will
      crash if there is an MTE problem.
      
      The implementations that are not compatible are excluded, including
      the standard symbols that may come from an MTE-incompatible libc.
  22. May 22, 2020