- Feb 10, 2022
-
-
Szabolcs Nagy authored
The outgoing license was MIT only. The new dual license allows using the code under the Apache-2.0 WITH LLVM-exception license as well.
-
Wilco Dijkstra authored
Merge the MTE and non-MTE versions of strcpy and stpcpy since the MTE versions are faster.
-
Wilco Dijkstra authored
Merge the MTE and non-MTE versions of strcmp and strncmp since the MTE versions are faster.
-
Wilco Dijkstra authored
Add an initial SVE memcpy implementation. Copies of up to 32 bytes use SVE vectors, which improves the random memcpy benchmark significantly.
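A minimal C sketch of the small-copy idea, using SVE ACLE intrinsics; the actual routine is hand-written assembly, the names here are illustrative, and the code assumes it is built with SVE enabled (e.g. -march=armv8-a+sve):

```c
#include <arm_sve.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch only: copy up to two SVE vectors (at least 32 bytes) with
   predicated loads/stores, so no byte-by-byte tail loop is needed.
   Inactive predicate lanes are not accessed, so short copies do not
   over-read; the src + vl address for the second vector is computed
   but only dereferenced for active lanes.  */
static void
copy_small_sve (uint8_t *dst, const uint8_t *src, size_t n)
{
  uint64_t vl = svcntb ();                           /* bytes per SVE vector */
  svbool_t p0 = svwhilelt_b8_u64 (0, (uint64_t) n);  /* lanes 0..n-1        */
  svbool_t p1 = svwhilelt_b8_u64 (vl, (uint64_t) n); /* lanes vl..n-1       */
  svuint8_t v0 = svld1_u8 (p0, src);
  svuint8_t v1 = svld1_u8 (p1, src + vl);
  svst1_u8 (p0, dst, v0);
  svst1_u8 (p1, dst + vl, v1);
}
```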
-
- Dec 21, 2021
-
-
Szabolcs Nagy authored
gcc-12 with -frounding-math started using the runtime rounding mode when converting double constants to float, so abstop12(pio4) is no longer a compile-time constant (this behaviour is required by ISO C). Use the float constant pio4f instead so the generated code is the same as before and gcc-12 does not regress.
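A hedged illustration of the issue; abstop12 and the constants are reconstructed here for the example and may differ in detail from the actual math sources:

```c
#include <stdint.h>
#include <string.h>

/* Reconstructed for illustration: top 12 bits of |x|.  */
static inline uint32_t
abstop12 (float x)
{
  uint32_t u;
  memcpy (&u, &x, sizeof u);
  return (u >> 20) & 0x7ff;
}

static const double pio4  = 0x1.921fb54442d18p-1; /* pi/4 as a double */
static const float  pio4f = 0x1.921fb6p-1f;       /* pi/4 as a float  */

int
small_argument (float x)
{
  /* Under gcc-12 -frounding-math, abstop12 (pio4) would convert the double
     constant to float at run time in the current rounding mode, so it is no
     longer a compile-time constant.  Comparing against abstop12 (pio4f)
     keeps the threshold constant and the generated code unchanged.  */
  return abstop12 (x) < abstop12 (pio4f);
}
```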
-
- Oct 29, 2021
-
-
Wilco Dijkstra authored
Rewrite memcmp to improve performance. On small and medium inputs performance is typically 25% better. Large inputs use a SIMD loop processing 64 bytes per iteration, which is 50% faster than the previous version.
-
- Oct 04, 2021
-
-
Wilco Dijkstra authored
Improve the memcpy benchmark. Double the number of random tests and the memory size. Add separate tests using a direct call to memcpy to compare with an indirect call to the GLIBC memcpy. Add a test for small aligned and unaligned memcpy.
-
Wilco Dijkstra authored
Increase the number of iterations of the random test. Minor code cleanup.
-
Wilco Dijkstra authored
Add a randomized memset benchmark using string length and alignment distribution based on SPEC2017.
-
- Feb 17, 2021
-
-
Szabolcs Nagy authored
* String routine changes
  * Added AArch64 ILP32 ABI support.
  * Fixed SVE strnlen return value.
  * Added MTE related __mtag_tag_region.
  * Added MTE related __mtag_tag_zero_region.
  * Minor code cleanups.
-
Szabolcs Nagy authored
Scripted copyright year updates based on git committer date.
-
- Feb 12, 2021
-
-
Szabolcs Nagy authored
Add an optimized __mtag_tag_zero_region(dst, len) operation to AOR. It tags the memory according to the tag of the dst pointer, then memsets it to 0 and returns dst. It requires MTE support. The memory remains untagged if tagging is not enabled for it. dst must be 16-byte aligned and len must be a multiple of 16. Similar to __mtag_tag_region, but uses the zeroing instructions.
-
Szabolcs Nagy authored
Add an optimized __mtag_tag_region(dst, len) operation to AOR. It tags the given memory region according to the tag of the dst pointer and returns dst. It requires MTE support. The memory remains untagged if tagging is not enabled for it. dst must be 16-byte aligned and len must be a multiple of 16.
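A short usage sketch covering both routines above; the prototypes are written out as implied by the two descriptions, and the helper function is hypothetical:

```c
#include <stddef.h>

/* Prototypes as implied by the descriptions above (assumed here for
   illustration; the exact declarations live in the AOR string headers).  */
void *__mtag_tag_region (void *dst, size_t len);
void *__mtag_tag_zero_region (void *dst, size_t len);

/* Hypothetical allocator helper: retag a chunk with the tag carried in ptr.
   Per the descriptions above: ptr must be 16-byte aligned, len must be a
   multiple of 16, and if MTE tagging is not enabled for the memory it simply
   remains untagged.  */
static void *
retag_chunk (void *ptr, size_t len, int zero)
{
  if (zero)
    return __mtag_tag_zero_region (ptr, len); /* tag and zero in one pass */
  return __mtag_tag_region (ptr, len);        /* tag only                 */
}
```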
-
- Jan 08, 2021
-
-
Wilco Dijkstra authored
Clean up spurious .text and .arch directives. Use ENTRY rather than ENTRY_ALIGN.
-
- Jan 04, 2021
-
-
Richard Henderson authored
The error report was copied from the seekchar test above, and needs adjustment to match the gating IF.
-
Richard Henderson authored
There were nops before the beginning of the function to place the main loop on a 64-byte boundary, but the addition of BTI and instructions for ILP32 has corrupted that. As per review, drop 64-byte alignment entirely, and use the default 16-byte alignment from ENTRY.
-
Richard Henderson authored
These nops were placed to align code to 16-byte boundaries, but then the addition of BTI and ILP32 has corrupted that.
-
Richard Henderson authored
The comment on the eos-not-found path says that it returns the maximum string length, but it actually uses the current string length. This results in returned values larger than expected.
-
- Dec 17, 2020
-
-
Kinsey Moore authored
This adds sanitization of padding bits for pointers and size_t types, as required by the Arm AAPCS64 for the AArch64 ILP32 ABI.
-
- Nov 16, 2020
-
-
Szabolcs Nagy authored
* New math routines
  * Scalar erff and erf using fma.
-
- Nov 13, 2020
-
-
Szabolcs Nagy authored
The code relied on the final x + c*x being contracted into an fma; otherwise the intermediate c*x could underflow for tiny (almost subnormal) x. Use an explicit fmaf as elsewhere (this code is not expected to be fast when fma is not inlined, but at least it should be correct).
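A minimal sketch of the pattern, with illustrative names rather than the routine's actual variables:

```c
#include <math.h>

/* With plain arithmetic the intermediate product c*x can underflow for tiny,
   almost subnormal x.  Writing the step as an explicit fmaf keeps c*x in full
   precision inside the fused operation and rounds only once.  */
static inline float
poly_tail (float x, float c)
{
  /* was: return x + c * x;   -- intermediate c*x may underflow */
  return fmaf (c, x, x);      /* computes c*x + x with a single rounding */
}
```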
-
Szabolcs Nagy authored
erf has errors larger than 1 ULP in directed rounding modes; increase the error threshold to 1.4 ULP in the test script.
-
Pierre Blanchard authored
Only tested in round-to-nearest mode. The expected worst-case error is 1.01 ULP near x=1.25. Benchmarked over random x in [-6,6]; performance improves by more than 2x (more than 3.5x for throughput) on big out-of-order cores compared to the implementation in glibc 2.28. Includes data for erfc too, but this patch only adds erf.
-
- Nov 05, 2020
-
-
Szabolcs Nagy authored
Make the formatting consistent with other copyright notices. (This helps with automatic license header checks.)
-
Szabolcs Nagy authored
This was incorrect in the previous commit.
-
- Oct 29, 2020
-
-
Pierre Blanchard authored
In round-to-nearest mode the maximum error is 1.09 ULP. Compared to glibc 2.28 erff, throughput is about 2.2x better and latency is about 1.5x better on some AArch64 cores (on random input in [-4,4]). There are further optimization and quality-improvement opportunities.
-
- Aug 14, 2020
-
-
Szabolcs Nagy authored
* Bug fixes
  * strcmp-mte nul check
  * strncmp-mte with large size
  * arm memcpy with large size (CVE-2020-6096)
* String routines performance improvements
  * strlen
  * memmove with backward copy
* Benchmarking code for strings and memory routines
  * strlen
-
Wilco Dijkstra authored
Add benchmarking of forward and backward unaligned memmoves.
-
Wilco Dijkstra authored
On some microarchitectures performance of the backwards memmove improves if the stores use STR with decreasing addresses.
-
- Jul 29, 2020
-
-
Adhemerval Zanella authored
This fix is similar to the one done in glibc (beea361050).
-
- Jul 01, 2020
-
-
Wilco Dijkstra authored
Optimize strlen using a mix of scalar and SIMD code. On modern microarchitectures large strings are 55% faster than the current version and 35% faster than strlen-mte. On the random strlen benchmark the speedup is 3.4% and 40% respectively.
-
- Jun 23, 2020
-
-
Wilco Dijkstra authored
Add a strlen benchmark with a random latency test and small/medium throughput tests.
-
- Jun 12, 2020
-
-
Wilco Dijkstra authored
If limit is near SIZE_MAX it can overflow in the mutually aligned path. Fix this by clamping limit to SIZE_MAX.
-
- Jun 01, 2020
-
-
Wilco Dijkstra authored
Improve the previous fix: if a string is immediately preceded by a NUL byte and its first byte is 0x1, the NUL check may mistake it for a NUL byte. Instead of removing the bytes outside the string via a shift, force them to be non-NUL.
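A rough C sketch of the stated approach (the routine itself is assembly; a little-endian 8-byte word is assumed, and the function name is illustrative):

```c
#include <stdint.h>

/* Force the bytes that precede the string in the first aligned word to be
   non-NUL, instead of shifting them out, so bytes outside the string cannot
   be mistaken for a terminating NUL by the NUL check.  */
static uint64_t
force_leading_nonzero (uint64_t word, unsigned lead_bytes /* 0..7 */)
{
  uint64_t mask =
    lead_bytes == 0 ? 0 : ((uint64_t) -1 >> (64 - 8 * lead_bytes));
  return word | mask;   /* leading bytes become 0xff, never NUL */
}
```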
-
- May 29, 2020
-
-
Szabolcs Nagy authored
* New functionality (64-bit Arm)
  * string: Optimized MTE variants of strlen, strnlen, strchr, strchrnul, strrchr, memchr, memrchr, strcpy, stpcpy, strcmp, strncmp
  * string: Changes to support BTI
  * string: New optimized memrchr, strnlen
* Performance improvements (Neoverse N1)
  * strchr/strchrnul: 21% improvement on long strings
  * strrchr: 11% improvement
  * strnlen: 130% improvement on long strings, 50% on short strings
* Benchmark and tests
  * string: New memcpy benchmark
  * string: Cleanup testsuite and improve test coverage
-
Wilco Dijkstra authored
Ensure nul bytes before unaligned strings are correctly ignored.
-
- May 28, 2020
-
-
Wilco Dijkstra authored
Improve strcmp performance. On various microarchitectures the speedup is 65% on large unaligned strings and 21% on large (mutually) aligned strings. On small unaligned strings the speedup is 12%.
-
Wilco Dijkstra authored
Print results in bytes/ns. Add medium and large copy benchmarks.
-
Branislav Rankov authored
Set tags for every test case so that the boundaries are as narrow as possible. There is no handling of tag faults, so the test will crash if there is an MTE problem. Implementations that are not compatible are excluded, including standard symbols that may come from an MTE-incompatible libc.
-
- May 22, 2020
-
-
Wilco Dijkstra authored
Clean up code and improve test coverage.
-