- May 29, 2020
-
-
Szabolcs Nagy authored
* New functionality (64-bit Arm)
  * string: Optimized MTE variants of strlen, strnlen, strchr, strchrnul, strrchr, memchr, memrchr, strcpy, stpcpy, strcmp, strncmp
  * string: Changes to support BTI
  * string: New optimized memrchr, strnlen
* Performance improvements (Neoverse N1)
  * strchr/strchrnul: 21% improvement on long strings
  * strrchr: 11% improvement
  * strnlen: 130% improvement on long strings, 50% on short strings
* Benchmark and tests
  * string: New memcpy benchmark
  * string: Cleanup testsuite and improve test coverage
-
Wilco Dijkstra authored
Ensure nul bytes before unaligned strings are correctly ignored.
-
- May 28, 2020
-
-
Wilco Dijkstra authored
Improve strcmp performance. On various micro architectures the speedup is 65% on large unaligned strings and 21% on large (mutually) aligned strings. On small unaligned strings the speedup is 12%.
-
Wilco Dijkstra authored
Print results in bytes/ns. Add medium and large copy benchmark.
-
Branislav Rankov authored
Set tags for every test case so that boundaries are as narrow as possible. There is no handling of tag faults, so the test will crash if there is an MTE problem. Implementations that are not MTE-compatible are excluded, including the standard symbols that may come from an MTE-incompatible libc.
-
- May 22, 2020
-
-
Wilco Dijkstra authored
Clean up code and improve test coverage.
-
Wilco Dijkstra authored
Clean up code and improve test coverage.
-
Wilco Dijkstra authored
Clean up code and improve test coverage.
-
Wilco Dijkstra authored
Cleanup stpcpy test and improve test coverage.
-
Wilco Dijkstra authored
Cleanup strcpy test and improve test coverage.
-
Wilco Dijkstra authored
Cleanup strnlen test and improve test coverage.
-
Wilco Dijkstra authored
Cleanup strlen test and improve test coverage.
-
- May 20, 2020
-
-
Wilco Dijkstra authored
Add optimized MTE-compatible strcpy-mte and stpcpy-mte. On various micro architectures the speedup over the non-MTE version is 53% on large strings and 20-60% on small strings.
-
- May 18, 2020
-
-
Wilco Dijkstra authored
Improve strrchr performance by using a fast strchr loop to find the first match. On various micro architectures the speedup is 30-80% on large strings and 32% on small strings.
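The idea of finding the first match with a fast strchr-style search before tracking later matches can be sketched in portable C. This is only an illustration under assumptions, not the SIMD assembly from the commit: `sketch_strrchr` is a hypothetical name, and the standard `strchr` stands in for the fast loop.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: strrchr built on a strchr-style search. */
char *sketch_strrchr(const char *s, int c)
{
    /* Fast path: a single strchr call finds the first match, if any. */
    char *last = strchr(s, c);
    if (last == NULL || (char)c == '\0')
        return last; /* no match, or the match is the terminator itself */

    /* Slower path: keep searching after each match and remember the
       most recent one until no further match exists. */
    for (char *next; (next = strchr(last + 1, c)) != NULL; )
        last = next;
    return last;
}
```

When the character does not occur at all, which is a common case, only the fast search runs and the tracking loop is never entered.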
-
- May 13, 2020
-
-
Szabolcs Nagy authored
Add GNU property notes to asm files in asmdefs.h instead of adding the END_FILE macro to each file. The WANT_GNU_PROPERTY macro can still be used to opt out of the notes.
-
- May 12, 2020
-
-
Wilco Dijkstra authored
Add new memrchr test.
-
Wilco Dijkstra authored
Add optimized MTE-compatible memrchr. This walks the input backwards using the same algorithm as memchr-mte.
-
Wilco Dijkstra authored
Improve strlen performance by using a much simpler SIMD implementation. On various micro architectures the speedup is 11% on large strings and 63% on small strings.
-
Wilco Dijkstra authored
Improve memchr test coverage and cleanup code.
-
Wilco Dijkstra authored
Improve strnlen test coverage and cleanup code.
-
Wilco Dijkstra authored
Improve memchr performance by using a more efficient termination test. On various micro architectures the speedup is 16% on large strings and 46% on small strings.
-
Wilco Dijkstra authored
Improve strnlen performance by using a much simpler SIMD implementation. On modern micro architectures the speedup is 2.3x on large strings and 1.5x on small strings.
-
Szabolcs Nagy authored
Use the GNU style consistently in the string test code. Added clang-format guard comments where necessary so the code can be reformatted using the clang-format tool and the GNU style settings from gcc contrib/clang-format.
-
- May 01, 2020
-
-
Szabolcs Nagy authored
GNU Property Notes are only supported in recent tooling and older tools may warn about them, so it makes sense to remove these notes on a system where BTI is not supported anyway. The actual BTI instructions should be kept in place to avoid disturbing code layout. -DWANT_GNU_PROPERTY=0 removes the .note.gnu.property section from assembly files (ideally it would be based on the compiler default setting, but there is no feature test macro for BTI and PAC-RET).
-
Wilco Dijkstra authored
Remove 2 more instructions, resulting in a 9.8% speedup on medium-sized strings (16-32). The BTI patch changed ENTRY so the loops got misaligned; this fixes that regression.
-
Wilco Dijkstra authored
Remove 2 more instructions, resulting in a 6.8% speedup on medium-sized strings (16-32). The BTI patch changed ENTRY so the loops got misaligned; this fixes that regression.
-
- Apr 30, 2020
-
-
Branislav Rankov authored
Reading outside the range of the string is only allowed within 16-byte aligned granules when MTE is enabled. This implementation is based on string/aarch64/strncmp.S. Change the case when the strings are misaligned: align the pointers down and ignore bytes before the start of the string. Carry the part that is not compared to the next comparison. Testing done: string/test/strncmp.c on big endian, little endian and with MTE support. Booted nanodroid with MTE enabled. Benchmarked on Pixel4.
-
Branislav Rankov authored
Reading outside the range of the string is only allowed within 16-byte aligned granules when MTE is enabled. This implementation is based on string/aarch64/strcmp.S. Change the case when the strings are misaligned: align the pointers down and ignore bytes before the start of the string. Carry the part that is not compared to the next comparison. Testing done: optimized-routines/string/test/strcmp.c on big and little endian. Booted nanodroid with MTE enabled. Bionic string tests with MTE enabled. Benchmark results: ran both bionic benchmarks and glibc benchmarks on Pixel4. Cores A76 and A55.
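The align-down technique described in the strncmp and strcmp commits above can be illustrated with a small portable C sketch. It is not the assembly from the repository: `sketch_strlen_granule` and `GRANULE` are hypothetical names, and strlen is used instead of strcmp to keep the example short. The function loads whole 16-byte aligned granules and skips the bytes that lie before the start of the string, so a NUL byte in front of an unaligned string is ignored. In portable C this is only valid when each touched granule lies inside the same object, which the caller must guarantee here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRANULE 16u /* assumed MTE tag granule size in bytes */

/* Hypothetical helper: length of s, reading memory only in whole,
   16-byte aligned granules. The caller must ensure every granule
   that overlaps the string is readable. */
size_t sketch_strlen_granule(const char *s)
{
    uintptr_t addr = (uintptr_t)s;
    size_t skip = addr % GRANULE;                 /* bytes before the start of s */
    const unsigned char *g = (const unsigned char *)(addr - skip); /* align down */

    for (;;) {
        unsigned char chunk[GRANULE];
        memcpy(chunk, g, GRANULE);                /* whole-granule load */

        /* In the first granule, start at 'skip' so that NUL bytes
           before the string are ignored. */
        for (size_t i = skip; i < GRANULE; i++)
            if (chunk[i] == 0)
                return (size_t)((uintptr_t)(g + i) - addr);

        g += GRANULE;                             /* next aligned granule */
        skip = 0;                                 /* later granules count fully */
    }
}
```

Calling it on a string placed at offset 5 inside a zero-filled, 16-byte aligned buffer returns the string length even though bytes 0 to 4 of the first granule are NUL.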
-
Tamas Zsoldos authored
This change adds the landing pads to the start of functions implemented in assembly by adding them to the ENTRY macro. To avoid skipping the landing pad when using an alias, every ENTRY_ALIAS use must precede the corresponding ENTRY. Furthermore, the GNU property note is added to the assembly files. Since none of the functions save LR to the stack, both BTI and PAC support are indicated. The padding before __strncmp_aarch64 and __strnlen_aarch64 was adjusted.
-
Gabor Kertesz authored
Reading outside the range of the string is only allowed within 16-byte aligned granules when MTE is enabled. This implementation is based on string/aarch64/strrchr.S. Testing done: optimized-routines/string/test/strrchr.c. Booted nanodroid with MTE enabled. Bionic string tests with MTE enabled. Big endian with Qemu: qemu-aarch64_be.
-
Wilco Dijkstra authored
Improve strchrnul performance by using more efficient termination tests. On various micro architectures the speedup is 20% on large strings and 26% on small strings.
-
Wilco Dijkstra authored
Improve strchr performance by using a more efficient termination test. On various micro architectures the speedup is 19% on large strings and 19% on small strings.
-
Wilco Dijkstra authored
Improve strrchr performance by using a more efficient termination test. On various micro architectures the speedup is 11% on large strings.
-
Wilco Dijkstra authored
Improve strchrnul performance by using a more efficient termination test. On various micro architectures the speedup is 21% on large strings.
-
Wilco Dijkstra authored
Improve strchr performance by using a more efficient termination test. On various micro architectures the speedup is 21% on large strings.
-
Szabolcs Nagy authored
Don't stop at first failing test and allow running tests in parallel.
-
- Apr 29, 2020
-
-
Szabolcs Nagy authored
Use matching and null characters in the padding area around the string. Remove large input tests.
-
- Apr 24, 2020
-
-
Szabolcs Nagy authored
Tests printed too much output for a broken string function, and the output was not entirely useful. Added a new header file with some common logic for printing buffers nicely. In str* tests, len now means string length (not buffer size, which was confusing).
-
- Apr 08, 2020
-
-
Gabor Kertesz authored
Reading outside the range of the string is only allowed within 16-byte aligned granules when MTE is enabled. This implementation is based on string/aarch64/strchr-mte.S and string/aarch64/strchrnul.S. Testing done: optimized-routines/string/test/strchrnul.c. Booted nanodroid with MTE enabled. Bionic string tests with MTE enabled. Big endian with Qemu: qemu-aarch64_be.
-
Gabor Kertesz authored
The previously used LDR instruction resulted in little-endian behavior. Using LD1 instead results in a byte-by-byte load. Testing done: optimized-routines/string/test/memchr.c. Big endian test: qemu-aarch64_be. Booted nanodroid with MTE enabled. Bionic string tests with MTE enabled.
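The difference between a whole-register load and a byte-by-byte load can be shown with a rough C analogy. This is an assumption-laden sketch, not the AArch64 code: `load_word` and `load_bytes` are hypothetical helpers, and 8 bytes are used instead of a 16-byte vector. A whole-word load places memory bytes in the value according to the host byte order, while an element-wise load keeps memory byte i in lane i on any host.

```c
#include <stdint.h>
#include <string.h>

/* Whole-word load: which memory byte ends up in the low bits depends
   on the host byte order. */
uint64_t load_word(const unsigned char *p)
{
    uint64_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Byte-by-byte load: lane i always holds memory byte i, regardless of
   the host byte order. */
void load_bytes(const unsigned char *p, unsigned char lanes[8])
{
    for (int i = 0; i < 8; i++)
        lanes[i] = p[i];
}
```

Code that searches the loaded data for the first matching byte in memory order can index the lanes directly, whereas code working on the whole-word value has to account for the byte order of the host.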
-