- Nov 13, 2020
Szabolcs Nagy authored
The code relied on the final x + c*x being computed with an fma; otherwise the intermediate c*x could underflow for tiny (almost subnormal) x. Use an explicit fmaf, as elsewhere (this code is not expected to be fast when fma is not inlined, but at least it should be correct).
-
Szabolcs Nagy authored
erf has errors larger than 1 ULP in directed rounding modes, so increase the error threshold in the test script to 1.4 ULP.
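For readers unfamiliar with the metric, a minimal sketch of how an error in ULP can be measured, assuming a double-precision reference value that is accurate enough for single-precision results. `ulp_error_f` is a hypothetical helper and not the project's test tool, which may define the ULP at boundaries differently:

```c
#include <math.h>

/* Error of the float result y, in units of the float ULP at the reference
   value r (computed in higher precision, e.g. double).  */
static double ulp_error_f(float y, double r)
{
  int e;
  double ulp;

  if (r == 0.0)
    ulp = 0x1p-149; /* smallest subnormal float */
  else
    {
      frexp(r, &e); /* |r| = m * 2^e with 0.5 <= m < 1 */
      /* Floats in [2^(e-1), 2^e) have 24-bit significands, so their
         spacing is 2^(e-24); below the normal range it stays 2^-149.  */
      ulp = ldexp(1.0, e - 24 < -149 ? -149 : e - 24);
    }
  return fabs((double) y - r) / ulp;
}
```

With this definition a correctly rounded result is within 0.5 ULP, so a 1.4 ULP threshold leaves room for the extra rounding error observed in directed rounding modes.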
-
Pierre Blanchard authored
Only tested in round-to-nearest mode. The expected worst-case error is 1.01 ULP near x=1.25. Benchmarked over random x in [-6,6], it can be more than 2x faster (more than 3.5x for throughput) on big out-of-order cores than the implementation in glibc 2.28. Includes data for erfc too, but this patch only adds erf.
-
- Nov 05, 2020
Szabolcs Nagy authored
Make the formatting consistent with other copyright notices. (This helps me do automatic license header checks.)
-
Szabolcs Nagy authored
This was incorrect in the previous commit.
-
- Oct 29, 2020
Pierre Blanchard authored
In round-to-nearest mode the maximum error is 1.09 ULP. Compared to glibc-2.28 erff: throughput is about 2.2x better, latency is about 1.5x better on some AArch64 cores (on random input in [-4,4]). There are further optimization and quality improvement opportunities.
-
- Aug 14, 2020
Szabolcs Nagy authored
* Bug fixes
  * strcmp-mte nul check
  * strncmp-mte with large size
  * arm memcpy with large size (CVE-2020-6096)
* String routines performance improvements
  * strlen
  * memmove with backward copy
* Benchmarking code for strings and memory routines
  * strlen
-
Wilco Dijkstra authored
Add benchmarking of forward and backward unaligned memmoves.
-
Wilco Dijkstra authored
On some microarchitectures performance of the backwards memmove improves if the stores use STR with decreasing addresses.
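A portable C sketch of the backward copy order, assuming dst > src with possibly overlapping ranges. `memmove_backward` is a hypothetical name and the 16-byte block size is arbitrary; the real routine is assembly and its exact load/store scheduling is not shown here. Each block is fully loaded before it is stored, and successive stores go to decreasing addresses:

```c
#include <stddef.h>
#include <string.h>

/* Copy n bytes from src to dst when dst > src and the ranges may overlap.
   Walk from the end towards the start so each store goes to a lower
   address than the previous one, and no source byte is overwritten before
   it has been read.  */
static void *memmove_backward(void *dst, const void *src, size_t n)
{
  unsigned char *d = (unsigned char *) dst + n;
  const unsigned char *s = (const unsigned char *) src + n;
  unsigned char tmp[16];

  while (n >= 16)
    {
      s -= 16;
      d -= 16;
      n -= 16;
      memcpy(tmp, s, 16); /* load the whole block first ...  */
      memcpy(d, tmp, 16); /* ... then store it at a lower address */
    }
  while (n > 0) /* remaining head bytes, still moving downwards */
    {
      *--d = *--s;
      n--;
    }
  return dst;
}
```

Since d > s throughout, every store lands above the bytes that are still to be read, which is why the backward direction is safe for this kind of overlap.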
-
- Jul 29, 2020
Adhemerval Zanella authored
This fix is similar to the one done in glibc (beea361050).
-
- Jul 01, 2020
Wilco Dijkstra authored
Optimize strlen using a mix of scalar and SIMD code. On modern micro architectures large strings are 55% faster than the current version, and 35% faster than strlen-mte. On the random strlen benchmark the speedup is 3.4% and 40% respectively.
-
- Jun 23, 2020
Wilco Dijkstra authored
Add strlen benchmark with a random latency and small/medium throughput tests.
-
- Jun 12, 2020
Wilco Dijkstra authored
If limit is near SIZE_MAX it can overflow in the mutually aligned path. Fix this by clamping limit to SIZE_MAX.
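Clamping in this way amounts to a saturating addition. A C sketch, assuming the aligned path adjusts the limit by adding an alignment offset; `limit_add_saturating` is a hypothetical name, not the routine's actual code:

```c
#include <stddef.h>
#include <stdint.h>

/* Add an alignment adjustment to a strncmp-style byte limit without
   wrapping: if the unsigned sum overflows, clamp it to SIZE_MAX.  */
static size_t limit_add_saturating(size_t limit, size_t offset)
{
  size_t sum = limit + offset; /* unsigned addition wraps modulo 2^N */
  return sum < limit ? SIZE_MAX : sum;
}
```

A limit of SIZE_MAX is effectively unbounded, since no string can be that long, so the clamp does not change the result of the comparison.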
-
- Jun 01, 2020
Wilco Dijkstra authored
Improve the previous fix: if a string is immediately preceded by a NUL byte and its first byte is 0x01, the NUL check may mistake that first byte for a NUL byte. Instead of removing the bytes outside the string via a shift, force them to be non-NUL.
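This kind of false positive can be reproduced with the classic word-at-a-time zero-byte test, whose borrow can propagate from a 0x00 byte into a 0x01 byte just above it. A minimal sketch, assuming a little-endian 64-bit word and a string that starts `offset` bytes into it; `zero_bytes` and `mask_before_start` are hypothetical helpers, and the routines themselves are written in assembly, not this C:

```c
#include <stdint.h>

#define REP8_01 0x0101010101010101ULL
#define REP8_80 0x8080808080808080ULL

/* Nonzero iff some byte of x is zero.  The lowest flagged byte is exact;
   higher flags may be false positives caused by borrow propagation
   (e.g. a 0x01 byte just above a 0x00 byte).  */
static uint64_t zero_bytes(uint64_t x)
{
  return (x - REP8_01) & ~x & REP8_80;
}

/* Little-endian: byte i of the word is at address base + i.  Force the
   offset (0 <= offset < 8) bytes that precede the string start to 0xff,
   so they can neither be flagged nor cause a borrow into later bytes.  */
static uint64_t mask_before_start(uint64_t word, unsigned offset)
{
  if (offset == 0)
    return word; /* avoid an undefined shift by 64 */
  return word | (~0ULL >> (64 - 8 * offset));
}
```

For the word with bytes 00 01 'a' 'b' ... (a NUL just before the string, then 0x01), `zero_bytes` flags both of the first two bytes, so shifting off only the first flag would still report a NUL at the string start; after `mask_before_start(word, 1)` nothing is flagged.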
-
- May 29, 2020
Szabolcs Nagy authored
* New functionality (64-bit Arm)
  * string: Optimized MTE variants of strlen, strnlen, strchr, strchrnul, strrchr, memchr, memrchr, strcpy, stpcpy, strcmp, strncmp
  * string: Changes to support BTI
  * string: New optimized memrchr, strnlen
* Performance improvements (Neoverse N1)
  * strchr/strchrnul: 21% improvement on long strings
  * strrchr: 11% improvement
  * strnlen: 130% improvement on long strings, 50% on short strings
* Benchmark and tests
  * string: New memcpy benchmark
  * string: Clean up the testsuite and improve test coverage
-
Wilco Dijkstra authored
Ensure nul bytes before unaligned strings are correctly ignored.
-
- May 28, 2020
Wilco Dijkstra authored
Improve strcmp performance. On various micro architectures the speedup is 65% on large unaligned strings and 21% on large (mutually) aligned strings. On small unaligned strings the speedup is 12%.
-
Wilco Dijkstra authored
Print results in bytes/ns. Add medium and large copy benchmark.
-
Branislav Rankov authored
Set tags for every test case so that the boundaries are as narrow as possible. There is no handling of tag faults, so the test will crash if there is an MTE problem. Implementations that are not compatible are excluded, including the standard symbols, which may come from an MTE-incompatible libc.
-
- May 22, 2020
Wilco Dijkstra authored
Clean up code and improve test coverage.
-
Wilco Dijkstra authored
Clean up code and improve test coverage.
-
Wilco Dijkstra authored
Clean up code and improve test coverage.
-
Wilco Dijkstra authored
Clean up stpcpy test and improve test coverage.
-
Wilco Dijkstra authored
Clean up strcpy test and improve test coverage.
-
Wilco Dijkstra authored
Clean up strnlen test and improve test coverage.
-
Wilco Dijkstra authored
Clean up strlen test and improve test coverage.
-
- May 20, 2020
Wilco Dijkstra authored
Add optimized MTE-compatible strcpy-mte and stpcpy-mte. On various micro architectures the speedup over the non-MTE version is 53% on large strings and 20-60% on small strings.
-
- May 18, 2020
Wilco Dijkstra authored
Improve strrchr performance by using a fast strchr loop to find the first match. On various micro architectures the speedup is 30-80% on large strings and 32% on small strings.
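A simplified C illustration of building strrchr on top of strchr; `strrchr_via_strchr` is a hypothetical name. Unlike the commit, which uses the fast strchr-style loop only until the first match, this sketch simply calls strchr again after every match and remembers the most recent one:

```c
#include <string.h>

/* Return a pointer to the last occurrence of c in s, or NULL.  */
static char *strrchr_via_strchr(const char *s, int c)
{
  const char *last = NULL;
  const char *p;

  /* Searching for the terminator: strchr already finds the only match,
     and continuing past it would run off the end of the string.  */
  if ((char) c == '\0')
    return strchr(s, c);

  /* strchr finds each next match; keep the most recent one.  */
  for (p = strchr(s, c); p != NULL; p = strchr(p + 1, c))
    last = p;
  return (char *) last;
}
```

Until the first match there is nothing to remember, so the search can use the cheaper strchr loop; that is the part of the work the commit speeds up.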
-
- May 13, 2020
Szabolcs Nagy authored
Add GNU property notes to asm files in asmdefs.h instead of adding the END_FILE macro to each file. The WANT_GNU_PROPERTY macro can still be used to opt out of the notes.
-
- May 12, 2020
Wilco Dijkstra authored
Add new memrchr test.
-
Wilco Dijkstra authored
Add optimized MTE-compatible memrchr. This walks the input backwards using the same algorithm as memchr-mte.
-
Wilco Dijkstra authored
Improve strlen performance by using a much simpler SIMD implementation. On various micro architectures the speedup is 11% on large strings and 63% on small strings.
-
Wilco Dijkstra authored
Improve memchr test coverage and clean up code.
-
Wilco Dijkstra authored
Improve strnlen test coverage and clean up code.
-
Wilco Dijkstra authored
Improve memchr performance by using a more efficient termination test. On various micro architectures the speedup is 16% on large strings and 46% on small strings.
-
Wilco Dijkstra authored
Improve strnlen performance by using a much simpler SIMD implementation. On modern micro architectures the speedup is 2.3x on large strings and 1.5x on small strings.
-
Szabolcs Nagy authored
Use the GNU style consistently in the string test code. Added clang-format guard comments where necessary so the code can be reformatted using the clang-format tool and the GNU style settings from gcc contrib/clang-format.
-
- May 01, 2020
Szabolcs Nagy authored
GNU Property Notes are only supported in recent tooling and older tools may warn about them, so it makes sense to remove these notes on a system where BTI is not supported anyway. The actual BTI instructions should be kept in place to avoid disturbing code layout. -DWANT_GNU_PROPERTY=0 removes the .note.gnu.property section from assembly files (ideally it would be based on the compiler default setting, but there is no feature test macro for BTI and PAC-RET).
-
Wilco Dijkstra authored
Remove 2 more instructions, resulting in a 9.8% speedup of medium sized strings (16-32). The BTI patch changed ENTRY so the loops got misaligned, this fixes that regression.
-
Wilco Dijkstra authored
Remove 2 more instructions, resulting in a 6.8% speedup of medium sized strings (16-32). The BTI patch changed ENTRY so the loops got misaligned, this fixes that regression.
-