Enable decode input reads in 64 bit chunks

This enables reading bigger chunks of data in the DEFLATE decoder on
aarch64.

Instead of performing two 32-bit loads (e.g. ldrb w22, [x9]), with the
second one written to the higher lane of the register (e.g. w23), memcpy
performs a single 64-bit load into the same register (e.g.
ldr x22, [x9]).

This also halves the number of follow-up operations (adds and shifts),
improving decompression performance.

For JavaScript content the gain was close to 14% on big cores (A72) and
9% on little cores (A53).

Bug: 812499
Change-Id: I010604ee62e72a769ce2a7912afb7e334adefacf
Reviewed-on: https://chromium-review.googlesource.com/c/1447042
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Adenilson Cavalcanti <cavalcantii@chromium.org>
Commit-Queue: Adenilson Cavalcanti <cavalcantii@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#628091}
Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
Cr-Mirrored-Commit: e2aef12cf002ca3577b9bfea3f2a89eed5379a4f
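
For illustration only, a minimal sketch of the load pattern described in this message. It is not the code from this change: the function names (load64_bytewise, load64_memcpy, host_is_little_endian) are hypothetical, and it assumes a little-endian target such as aarch64, where a fixed-size memcpy into a uint64_t is typically compiled down to a single 64-bit load.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 when the host stores the least
 * significant byte first (the case on aarch64 in its usual configuration). */
static int host_is_little_endian(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Hypothetical baseline: assemble a 64-bit value one byte at a time,
 * least significant byte first. Each iteration costs a load, a shift
 * and an OR. */
static uint64_t load64_bytewise(const unsigned char *in) {
    uint64_t value = 0;
    for (unsigned i = 0; i < 8; i++) {
        value |= (uint64_t)in[i] << (8 * i);
    }
    return value;
}

/* Hypothetical wide read: copy 8 bytes into a uint64_t. memcpy keeps the
 * access free of alignment and strict-aliasing problems; optimizing
 * compilers commonly lower a fixed 8-byte memcpy to one 64-bit load. */
static uint64_t load64_memcpy(const unsigned char *in) {
    uint64_t value;
    memcpy(&value, in, sizeof(value));
    return value;
}
```

On a little-endian host both helpers return the same value for the same 8 input bytes. The memcpy form reads host byte order, so a big-endian target would need a byte swap to match.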