aarch64: Combine memcpy and memmove implementations

Modify the integer and SIMD versions of memcpy to handle overlaps
correctly. Make __memmove_aarch64 and __memmove_aarch64_simd aliases
of __memcpy_aarch64 and __memcpy_aarch64_simd respectively.

The memcpy and memmove implementations can share all of their code
without a noticeable performance penalty. This works because the check
for overlap between the source and destination buffers now comes after
the code that handles small and medium copies, which are overlap-safe
anyway.

Benchmarking shows that two versions of memcpy are still needed. Newer
platforms favor aligning src over dst for large copies, and using NEON
registers also gives a small speedup. Older platforms work best with
dst aligned and general-purpose registers. Consequently, memcpy.S and
memcpy_simd.S contain memcpy code that is identical except for the
registers used and src vs dst alignment.
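
The ordering described above can be illustrated with a minimal C sketch. It is not the assembly from this change: the function name sketch_copy, the SKETCH_SMALL_MAX cutoff, and the byte-by-byte loops are all assumptions made for illustration. It only shows why short copies need no overlap check (all loads finish before any store) and how a single unsigned comparison can choose the copy direction for long ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff for this sketch only; the real code has its own
   size classes for small and medium copies. */
#define SKETCH_SMALL_MAX 64

/* Hypothetical helper showing the control flow, not the real routine. */
void *sketch_copy(void *dstin, const void *srcin, size_t count)
{
    unsigned char *dst = dstin;
    const unsigned char *src = srcin;

    assert(count == 0 || (dst != NULL && src != NULL));

    if (count <= SKETCH_SMALL_MAX) {
        /* Short copy: read every source byte before writing any
           destination byte, so overlap is harmless and no check is
           needed on this path. */
        unsigned char tmp[SKETCH_SMALL_MAX];
        memcpy(tmp, src, count);
        memcpy(dst, tmp, count);
        return dstin;
    }

    /* Long copy: only now test for overlap. The unsigned difference
       dst - src is below count exactly when dst lies inside
       [src, src + count), the case where a forward copy would
       overwrite source bytes that have not been read yet. */
    if ((uintptr_t)dst - (uintptr_t)src < count) {
        for (size_t i = count; i > 0; i--)   /* copy backwards */
            dst[i - 1] = src[i - 1];
    } else {
        for (size_t i = 0; i < count; i++)   /* copy forwards */
            dst[i] = src[i];
    }
    return dstin;
}
```

Because the short path never branches on overlap, callers that only need memcpy semantics pay for the check on long copies alone, which is where its cost is easiest to hide.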