- Aug 14, 2020
-
-
Wilco Dijkstra authored
Add benchmarking of forward and backward unaligned memmoves.
-
Wilco Dijkstra authored
On some microarchitectures performance of the backwards memmove improves if the stores use STR with decreasing addresses.
-
- Jul 29, 2020
-
-
Adhemerval Zanella authored
This fix is similar to the one done in glibc (beea361050).
-
- Jul 01, 2020
-
-
Wilco Dijkstra authored
Optimize strlen using a mix of scalar and SIMD code. On modern micro architectures large strings are 55% faster than the current version, and 35% faster than strlen-mte. On the random strlen benchmark the speedup is 3.4% and 40% respectively.
-
- Jun 23, 2020
-
-
Wilco Dijkstra authored
Add strlen benchmark with a random latency and small/medium throughput tests.
-
- Jun 12, 2020
-
-
Wilco Dijkstra authored
If limit is near SIZE_MAX it can overflow in the mutually aligned path. Fix this by clamping limit to SIZE_MAX.
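A minimal C sketch of the clamping idea, for illustration only. The actual fix is in assembly, and the commit does not say which adjustment overflows; the helper name `add_clamped` and the notion that an alignment offset is added to the limit are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the repository): add an adjustment to
   limit, saturating at SIZE_MAX instead of wrapping around to a small
   value. */
static inline size_t
add_clamped (size_t limit, size_t offset)
{
  /* limit + offset would wrap exactly when limit > SIZE_MAX - offset. */
  if (limit > SIZE_MAX - offset)
    return SIZE_MAX;
  return limit + offset;
}
```

Without the check, a limit such as SIZE_MAX - 2 plus an offset of 7 would wrap to 4, and the comparison would stop far too early.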
-
- Jun 01, 2020
-
-
Wilco Dijkstra authored
Improve the previous fix: if a string is immediately preceded by a NUL byte and its first byte is 0x1, the NUL check may mistake that first byte for a NUL byte. Instead of removing bytes outside the string via a shift, force them to be non-NUL.
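The effect can be reproduced with the classic word-at-a-time NUL test. The sketch below is an illustration in C, not the repository's assembly: it assumes a little-endian 64-bit word, `skip` in the range 0-7, and a shift-based variant that discards the syndrome bits of the leading bytes. All function names are hypothetical.

```c
#include <stdint.h>

#define ONES  0x0101010101010101ULL /* 0x01 in every byte.  */
#define HIGHS 0x8080808080808080ULL /* 0x80 in every byte.  */

/* Nonzero iff some byte of x is zero.  Only the lowest flagged byte is
   guaranteed to be a real NUL: the borrow out of a zero byte can also
   flag a 0x01 byte directly above it.  */
static inline uint64_t
nul_syndrome (uint64_t x)
{
  return (x - ONES) & ~x & HIGHS;
}

/* Shift variant: test the whole word, then shift out the flags that
   belong to the `skip` bytes before the start of the string.  */
static inline uint64_t
syndrome_shift (uint64_t word, unsigned skip)
{
  return nul_syndrome (word) >> (8 * skip);
}

/* Forcing variant: set the `skip` leading bytes to 0xFF first, so they
   can neither match nor generate a borrow.  */
static inline uint64_t
syndrome_force (uint64_t word, unsigned skip)
{
  uint64_t mask = skip ? (~0ULL >> (64 - 8 * skip)) : 0;
  return nul_syndrome (word | mask);
}
```

For the word 0x4141414141410100 with `skip` = 1 (a NUL byte before the string, then 0x01 as its first byte), the shift variant still reports a NUL at the first byte of the string, while the forcing variant reports none.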
-
- May 29, 2020
-
-
Szabolcs Nagy authored
* New functionality (64-bit Arm)
  * string: Optimized MTE variants of strlen, strnlen, strchr, strchrnul, strrchr, memchr, memrchr, strcpy, stpcpy, strcmp, strncmp
  * string: Changes to support BTI
  * string: New optimized memrchr, strnlen
* Performance improvements (Neoverse N1)
  * strchr/strchrnul: 21% improvement on long strings
  * strrchr: 11% improvement
  * strnlen: 130% improvement on long strings, 50% on short strings
* Benchmark and tests
  * string: New memcpy benchmark
  * string: Cleanup testsuite and improve test coverage
-
Wilco Dijkstra authored
Ensure nul bytes before unaligned strings are correctly ignored.
-
- May 28, 2020
-
-
Wilco Dijkstra authored
Improve strcmp performance. On various micro architectures the speedup is 65% on large unaligned strings and 21% on large (mutually) aligned strings. On small unaligned strings the speedup is 12%.
-
Wilco Dijkstra authored
Print results in bytes/ns. Add medium and large copy benchmark.
-
Branislav Rankov authored
Set tags for every test case so that boundaries are as narrow as possible. There is no handling of tag faults, so the test will crash if there is an MTE problem. The implementations that are not compatible are excluded, including the standard symbols that may come from an MTE-incompatible libc.
-
- May 22, 2020
-
-
Wilco Dijkstra authored
Clean up code and improve test coverage.
-
Wilco Dijkstra authored
Clean up code and improve test coverage.
-
Wilco Dijkstra authored
Clean up code and improve test coverage.
-
Wilco Dijkstra authored
Clean up stpcpy test and improve test coverage.
-
Wilco Dijkstra authored
Clean up strcpy test and improve test coverage.
-
Wilco Dijkstra authored
Clean up strnlen test and improve test coverage.
-
Wilco Dijkstra authored
Clean up strlen test and improve test coverage.
-
- May 20, 2020
-
-
Wilco Dijkstra authored
Add optimized MTE-compatible strcpy-mte and stpcpy-mte. On various micro architectures the speedup over the non-MTE version is 53% on large strings and 20-60% on small strings.
-
- May 18, 2020
-
-
Wilco Dijkstra authored
Improve strrchr performance by using a fast strchr loop to find the first match. On various micro architectures the speedup is 30-80% on large strings and 32% on small strings.
-
- May 13, 2020
-
-
Szabolcs Nagy authored
Add GNU property notes to asm files in asmdefs.h instead of adding the END_FILE macro to each file. The WANT_GNU_PROPERTY macro can still be used to opt out of the notes.
-
- May 12, 2020
-
-
Wilco Dijkstra authored
Add new memrchr test.
-
Wilco Dijkstra authored
Add optimized MTE-compatible memrchr. This walks the input backwards using the same algorithm as memchr-mte.
-
Wilco Dijkstra authored
Improve strlen performance by using a much simpler SIMD implementation. On various micro architectures the speedup is 11% on large strings and 63% on small strings.
-
Wilco Dijkstra authored
Improve memchr test coverage and clean up code.
-
Wilco Dijkstra authored
Improve strnlen test coverage and clean up code.
-
Wilco Dijkstra authored
Improve memchr performance by using a more efficient termination test. On various micro architectures the speedup is 16% on large strings and 46% on small strings.
-
Wilco Dijkstra authored
Improve strnlen performance by using a much simpler SIMD implementation. On modern micro architectures the speedup is 2.3x on large strings and 1.5x on small strings.
-
Szabolcs Nagy authored
Use the GNU style consistently in the string test code. Added clang-format guard comments where necessary so the code can be reformatted using the clang-format tool and GNU style settings from gcc contrib/clang-format.
-
- May 01, 2020
-
-
Szabolcs Nagy authored
GNU Property Notes are only supported in recent tooling and older tools may warn about them, so it makes sense to remove these notes on a system where BTI is not supported anyway. The actual BTI instructions should be kept in place to avoid disturbing code layout. -DWANT_GNU_PROPERTY=0 removes the .note.gnu.property section from assembly files (ideally it would be based on the compiler default setting, but there is no feature test macro for BTI and PAC-RET).
-
Wilco Dijkstra authored
Remove 2 more instructions, resulting in a 9.8% speedup on medium-sized strings (16-32). The BTI patch changed ENTRY so the loops got misaligned; this fixes that regression.
-
Wilco Dijkstra authored
Remove 2 more instructions, resulting in a 6.8% speedup on medium-sized strings (16-32). The BTI patch changed ENTRY so the loops got misaligned; this fixes that regression.
-
- Apr 30, 2020
-
-
Branislav Rankov authored
Reading outside the range of the string is only allowed within 16-byte aligned granules when MTE is enabled. This implementation is based on string/aarch64/strncmp.S. Change the case where the strings are misaligned: align the pointers down and ignore bytes before the start of the string. Carry the part that is not compared to the next comparison. Testing done: string/test/strncmp.c on big endian, little endian and with MTE support. Booted nanodroid with MTE enabled. Benchmarked on Pixel4.
-
Branislav Rankov authored
Reading outside the range of the string is only allowed within 16-byte aligned granules when MTE is enabled. This implementation is based on string/aarch64/strcmp.S. Change the case where the strings are misaligned: align the pointers down and ignore bytes before the start of the string. Carry the part that is not compared to the next comparison. Testing done: optimized-routines/string/test/strcmp.c on big and little endian. Booted nanodroid with MTE enabled. Bionic string tests with MTE enabled. Benchmark results: ran both bionic benchmarks and glibc benchmarks on Pixel4. Cores A76 and A55.
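The align-down step used by the MTE strcmp and strncmp variants above can be sketched in C as follows. This is an illustration only: the real routines are written in assembly, and the names `GRANULE`, `align_down_granule` and `granule_offset` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Granule size within which out-of-range reads are allowed, as stated
   in the commit messages above.  */
#define GRANULE ((uintptr_t) 16)

/* Round an address down to the start of its 16-byte granule.  A load
   from this address never touches a granule the string does not
   already occupy.  */
static inline uintptr_t
align_down_granule (uintptr_t addr)
{
  return addr & ~(GRANULE - 1);
}

/* Number of bytes between the granule start and the string start.
   These leading bytes must be ignored by the comparison.  */
static inline size_t
granule_offset (uintptr_t addr)
{
  return (size_t) (addr & (GRANULE - 1));
}
```

For a string starting at address 0x1007, the aligned load starts at 0x1000 and the first 7 bytes are ignored.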
-
Tamas Zsoldos authored
This change adds the landing pads to the start of functions implemented in assembly, by adding them to the ENTRY macro. To avoid skipping a landing pad when using an alias, every ENTRY_ALIAS use must precede the corresponding ENTRY. Furthermore, the GNU property note is added to the assembly files. Since none of the functions save LR to stack, both BTI and PAC support are indicated. Paddings before __strncmp_aarch64 and __strnlen_aarch64 were adjusted.
-
Gabor Kertesz authored
Reading outside the range of the string is only allowed within 16-byte aligned granules when MTE is enabled. This implementation is based on string/aarch64/strrchr.S. Testing done: optimized-routines/string/test/strrchr.c. Booted nanodroid with MTE enabled. Bionic string tests with MTE enabled. Big endian with Qemu: qemu-aarch64_be.
-
Wilco Dijkstra authored
Improve strchrnul performance by using more efficient termination tests. On various micro architectures the speedup is 20% on large strings and 26% on small strings.
-
Wilco Dijkstra authored
Improve strchr performance by using a more efficient termination test. On various micro architectures the speedup is 19% on large strings and 19% on small strings.
-
Wilco Dijkstra authored
Improve strrchr performance by using a more efficient termination test. On various micro architectures the speedup is 11% on large strings.
-