Commit 3377796f authored by Krzysztof Koch, committed by Szabolcs Nagy

aarch64: Combine memcpy and memmove implementations

Modify integer and SIMD versions of memcpy to handle overlaps correctly.

Make __memmove_aarch64 and __memmove_aarch64_simd alias to
__memcpy_aarch64 and __memcpy_aarch64_simd respectively.

memcpy and memmove can share all of their code without a noticeable
performance penalty. This works because the check for overlapping
source and destination buffers now runs after the code that handles
small and medium copies, which are overlap-safe anyway.
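
The control flow described above can be sketched in C. This is only an
illustration, not the assembly from this commit: the name sketch_copy,
the 128-byte threshold and the byte loops are assumptions chosen for
readability, where the real routines use register pairs and their own
size classes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a glibc symbol. The 128-byte threshold is an
   assumption made for this sketch. */
void *sketch_copy(void *dstv, const void *srcv, size_t n)
{
    unsigned char *dst = dstv;
    const unsigned char *src = srcv;

    /* Small and medium copies: load every byte before storing any.
       Because all loads finish first, overlap cannot corrupt the data,
       so no overlap check is needed on this path. */
    if (n <= 128) {
        unsigned char tmp[128];
        for (size_t i = 0; i < n; i++)
            tmp[i] = src[i];
        for (size_t i = 0; i < n; i++)
            dst[i] = tmp[i];
        return dstv;
    }

    /* Large copies: only now test for overlap. The unsigned difference
       dst - src is below n exactly when dst lies inside [src, src + n),
       the one case where a forward copy would overwrite unread source. */
    if ((uintptr_t)dst - (uintptr_t)src < n) {
        for (size_t i = n; i > 0; i--)      /* copy backward */
            dst[i - 1] = src[i - 1];
    } else {
        for (size_t i = 0; i < n; i++)      /* copy forward */
            dst[i] = src[i];
    }
    return dstv;
}
```

Since the overlap test sits only on the large-copy path, the small and
medium paths cost the same whether the caller asked for memcpy or
memmove, which is why one body can serve both entry points.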

Benchmarking shows that keeping two versions of memcpy is necessary
because newer platforms favor aligning src over dst for large copies.
Using NEON registers also gives a small speedup. However, aligning dst
and using general-purpose registers works best for older platforms.
Consequently, memcpy.S and memcpy_simd.S contain memcpy code which is
identical except for the registers used and src vs dst alignment.
parent 709020ed