Commits · 94b4be6009322a584cd3f7b74f22565a162bddfa · Android-smartphones / realme / realme 5pro / kaderbava / external_arm-optimized-routines

Nov 13, 2020

math: fix spurious underflow in erff and erf · 94b4be60

Szabolcs Nagy authored Nov 13, 2020

The code relied on the final x + c*x to be done via an fma, otherwise
the intermediate c*x could underflow for tiny (almost subnormal) x.

Use explicit fmaf like elsewhere (this code is not expected to be
fast when fma is not inlined, but at least it should be correct).


math: fix erf tests in directed rounding modes · 15a1d62b

Szabolcs Nagy authored Nov 13, 2020

erf has larger than 1 ULP errors in directed rounding modes,
increase the error threshold to 1.4 ULP in the test script.


math: add scalar erf · 00e5afdb

Pierre Blanchard authored Nov 05, 2020

Only tested in round-to-nearest mode. The expected worst case error
is 1.01 ULP near x=1.25. Benchmarked over random x in [-6,6] and
can increase performance by > 2x (> 3.5x for throughput) on big ooo
cores compared to the implementation in glibc 2.28.

Includes data for erfc too, but this patch only adds erf.


Nov 05, 2020

networking: Fix the copyright notice in chksum.c · fc3fcc91

Szabolcs Nagy authored Nov 04, 2020

Make the formatting consistent with other copyright notices.
(This helps me doing automatic license header checks.)

math: Fix copyright header in erff.tst · 935f9a4a

Szabolcs Nagy authored Nov 04, 2020

This was incorrect in the previous commit.

Oct 29, 2020

math: add scalar erff. · 1a79237d

Pierre Blanchard authored Oct 29, 2020

In round-to-nearest mode the maximum error is 1.09 ULP.

Compared to glibc-2.28 erff: throughput is about 2.2x better,
latency is about 1.5x better on some AArch64 cores (on random
input in [-4,4]).

There are further optimization and quality improvement opportunities.


Aug 14, 2020

v20.08 release · 0f4ae0c5

Szabolcs Nagy authored Aug 14, 2020

* Bug fixes
  * strcmp-mte nul check
  * strncmp-mte with large size
  * arm memcpy with large size (CVE-2020-6096)
* String routines performance improvements
  * strlen
  * memmove with backward copy
* Benchmarking code for strings and memory routines
  * strlen


string: Benchmark unaligned memmove · ded1e17e

Wilco Dijkstra authored Aug 14, 2020

Add benchmarking of forward and backward unaligned memmoves.

string: Improve backwards memmove performance · cf3b6b37

Wilco Dijkstra authored Aug 14, 2020

On some microarchitectures performance of the backwards memmove improves
if the stores use STR with decreasing addresses.
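
As a rough C illustration of a backward copy whose stores go to decreasing addresses (the real routine is AArch64 assembly; `copy_backward` is a hypothetical byte-wise helper, not the optimized code):

```c
#include <assert.h>
#include <stddef.h>

/* Copies n bytes from src to dst starting at the end, so every store
   goes to a lower address than the previous one.  This order is safe
   when the regions overlap and dst is above src.  */
static void *
copy_backward (void *dst, const void *src, size_t n)
{
  unsigned char *d = (unsigned char *) dst + n;
  const unsigned char *s = (const unsigned char *) src + n;

  while (n--)
    *--d = *--s;
  return dst;
}

/* Usage: shift "abcdef" up by two bytes inside the same buffer.  */
static void
demo (void)
{
  char buf[9] = "abcdefgh";
  copy_backward (buf + 2, buf, 6);
  assert (buf[2] == 'a' && buf[7] == 'f');
}
```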


Jul 29, 2020

string: Fix CVE-2020-6096 for arm memcpy · 77ac889d

Adhemerval Zanella authored Jul 27, 2020

This fix is similar to the one done in glibc (beea361050).

Jul 01, 2020

string: Optimize strlen · 224cb5f6

Wilco Dijkstra authored Jul 01, 2020

Optimize strlen using a mix of scalar and SIMD code. On modern micro
architectures large strings are 55% faster than the current version,
and 35% faster than strlen-mte.  On the random strlen benchmark the
speedup is 3.4% and 40% respectively.


Jun 23, 2020

string: Add strlen benchmark · bb88c18f

Wilco Dijkstra authored Jun 23, 2020

Add strlen benchmark with a random latency and small/medium throughput tests.

Jun 12, 2020

string: Fix overflow issue in strncmp-mte · d4f3f7f0

Wilco Dijkstra authored Jun 12, 2020

If limit is near SIZE_MAX it can overflow in the mutually aligned path.
Fix this by clamping limit to SIZE_MAX.
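
The clamping idea can be sketched in C as a saturating add. That the overflow comes from adding an alignment offset to the limit is an assumption made for illustration, and `add_clamped` is a hypothetical helper, not the assembly fix itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds offset to limit, saturating at SIZE_MAX instead of wrapping.
   Unsigned addition wraps modulo SIZE_MAX + 1, so a wrapped sum is
   always smaller than either operand.  */
static size_t
add_clamped (size_t limit, size_t offset)
{
  size_t sum = limit + offset;
  return sum < limit ? SIZE_MAX : sum;
}

/* Usage: a limit near SIZE_MAX stays at SIZE_MAX rather than becoming
   a small number.  */
static void
demo (void)
{
  assert (add_clamped (SIZE_MAX - 2, 7) == SIZE_MAX);
  assert (add_clamped (10, 7) == 17);
}
```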


Jun 01, 2020

string: Fix issue in strcmp-mte NUL check · f08b12e8

Wilco Dijkstra authored Jun 01, 2020

Improve the previous fix - if a string is immediately preceded by a NUL byte
and the first byte is 0x1, it may be confused by the NUL check as a NUL byte.
Instead of removing bytes outside the string via a shift, force them to be
non-NUL.
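
A C sketch of how such a false positive can arise and how forcing the leading bytes to non-NUL avoids it. This assumes the check is the common word-at-a-time test `(w - 0x01..01) & ~w & 0x80..80` on a little-endian 64-bit word; the actual routine is AArch64 assembly and may differ in detail, and both function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define ONES  UINT64_C (0x0101010101010101)
#define HIGHS UINT64_C (0x8080808080808080)

/* Sets the top bit of each byte that looks like NUL.  A borrow from a
   real NUL byte can also flag the next higher byte if it holds 0x01.  */
static uint64_t
zero_byte_mask (uint64_t w)
{
  return (w - ONES) & ~w & HIGHS;
}

/* Same test, but the skip low-order bytes (the ones that precede the
   string start on little-endian) are first forced to 0xff, so they can
   neither match nor generate a borrow.  */
static uint64_t
zero_byte_mask_from (uint64_t w, unsigned skip)
{
  assert (skip < 8);
  uint64_t pad = skip ? UINT64_MAX >> (64 - 8 * skip) : 0;
  return zero_byte_mask (w | pad);
}

/* Usage: byte 0 is a NUL before the string, byte 1 is 0x01.  */
static void
demo (void)
{
  uint64_t w = UINT64_C (0x6161616161610100);
  assert (zero_byte_mask (w) == UINT64_C (0x8080)); /* byte 1 misflagged */
  assert (zero_byte_mask_from (w, 1) == 0);         /* no false match */
}
```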


May 29, 2020

v20.05 release · ef907c7a

Szabolcs Nagy authored May 29, 2020

* New functionality (64-bit Arm)
  * string: Optimized MTE variants of strlen, strnlen, strchr,
    strchrnul, strrchr, memchr, memrchr, strcpy, stpcpy, strcmp,
    strncmp
  * string: Changes to support BTI
  * string: New optimized memrchr, strnlen
* Performance improvements (Neoverse N1)
  * strchr/strchrnul: 21% improvement on long strings
  * strrchr: 11% improvement
  * strnlen: 130% improvement on long strings, 50% on short strings
* Benchmark and tests
  * string: New memcpy benchmark
  * string: Cleanup testsuite and improve test coverage


string: Fix issue in strcmp-mte · 304137d8

Wilco Dijkstra authored May 29, 2020

Ensure nul bytes before unaligned strings are correctly ignored.

May 28, 2020

string: Improve strcmp-mte performance · 27bb6b2b

Wilco Dijkstra authored May 28, 2020

Improve strcmp performance. On various micro architectures the speedup is 65%
on large unaligned strings and 21% on large (mutually) aligned strings.
On small unaligned strings the speedup is 12%.


string: Improve memcpy benchmark · 2525af9b

Wilco Dijkstra authored May 28, 2020

Print results in bytes/ns. Add medium and large copy benchmark.

string: Add MTE support to string tests. · 4d55c2d3

Branislav Rankov authored May 28, 2020

Set tags for every test case so that boundaries are as narrow as
possible. There is no handling of tag faults, so the test will
crash if there is an MTE problem.

The implementations that are not compatible are excluded, including
the standard symbols that may come from an MTE-incompatible libc.


May 22, 2020

string: Cleanup strchrnul test · 09b21d98

Wilco Dijkstra authored May 22, 2020

Clean up code and improve test coverage.

string: Cleanup strchr test · 620b09f1

Wilco Dijkstra authored May 22, 2020

Clean up code and improve test coverage.

string: Cleanup strrchr test · f5edabeb

Wilco Dijkstra authored May 22, 2020

Clean up code and improve test coverage.

string: Cleanup stpcpy test · e3b6fdf1

Wilco Dijkstra authored May 22, 2020

Cleanup stpcpy test and improve test coverage.

string: Cleanup strcpy test · 76203e7e

Wilco Dijkstra authored May 22, 2020

Cleanup strcpy test and improve test coverage.

string: Cleanup strnlen test · edfa34d2

Wilco Dijkstra authored May 22, 2020

Cleanup strnlen test and improve test coverage.

string: Cleanup strlen test · 833a1ea7

Wilco Dijkstra authored May 21, 2020

Cleanup strlen test and improve test coverage.

May 20, 2020

string: Add optimized strcpy-mte and stpcpy-mte · 0c9a5f3e

Wilco Dijkstra authored May 20, 2020

Add optimized MTE-compatible strcpy-mte and stpcpy-mte. On various micro
architectures the speedup over the non-MTE version is 53% on large strings
and 20-60% on small strings.


May 18, 2020

string: Improve strrchr-mte performance · a99a1a96

Wilco Dijkstra authored May 18, 2020

Improve strrchr performance by using a fast strchr loop to find the first
match. On various micro architectures the speedup is 30-80% on large strings
and 32% on small strings.
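
The two-phase structure can be sketched in portable C. This is only an illustration of the idea (a fast search for the first match, then a scan that tracks the last one); `strrchr_sketch` is a hypothetical name and the real routine is SIMD assembly:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Phase 1: use strchr to find the first match, with no bookkeeping.
   Phase 2: only if there was a match, scan the rest of the string and
   remember the last position that matches.  */
static char *
strrchr_sketch (const char *s, int c)
{
  const char ch = (char) c;
  const char *last = strchr (s, ch);

  if (last == NULL)
    return NULL;
  for (const char *p = last; *p != '\0'; p++)
    if (*p == ch)
      last = p;
  return (char *) last;
}

/* Usage: the last '/' in "a/b/c" is at index 3.  */
static void
demo (void)
{
  const char *path = "a/b/c";
  assert (strrchr_sketch (path, '/') == path + 3);
}
```

Strings with no match at all never enter the second loop, which is where the cheap first phase pays off.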


May 13, 2020

string: cleaner handling of GNU property notes · 98e4d6a5

Szabolcs Nagy authored May 12, 2020

Add GNU property notes to asm files in asmdefs.h instead of adding
the END_FILE macro to each file.

The WANT_GNU_PROPERTY macro can still be used to opt out of the
notes.


May 12, 2020

string: Add memrchr test · 875cc5fd

Wilco Dijkstra authored May 12, 2020

Add new memrchr test.

string: Add optimized memrchr · ad3f8def

Wilco Dijkstra authored May 12, 2020

Add optimized MTE-compatible memrchr. This walks the input backwards
using the same algorithm as memchr-mte.
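
For reference, the semantics of a backward walk look like this in plain C. This is a byte-at-a-time sketch under the hypothetical name `memrchr_sketch`, not the SIMD algorithm the commit adds:

```c
#include <assert.h>
#include <stddef.h>

/* Returns a pointer to the last byte equal to c within the first n
   bytes of s, or NULL if there is none.  The scan starts at the end
   and never reads outside [s, s + n).  */
static void *
memrchr_sketch (const void *s, int c, size_t n)
{
  const unsigned char *p = (const unsigned char *) s + n;

  while (n--)
    if (*--p == (unsigned char) c)
      return (void *) p;
  return NULL;
}

/* Usage: the last 'b' in "abcabc" is at index 4.  */
static void
demo (void)
{
  const char buf[] = "abcabc";
  assert (memrchr_sketch (buf, 'b', 6) == buf + 4);
}
```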


string: Improve strlen-mte performance · 2fdbac97

Wilco Dijkstra authored May 12, 2020

Improve strlen performance by using a much simpler SIMD implementation.
On various micro architectures the speedup is 11% on large strings and
63% on small strings.


string: Cleanup memchr test · 04957075

Wilco Dijkstra authored May 12, 2020

Improve memchr test coverage and cleanup code.

string: Cleanup strnlen test · e7517100

Wilco Dijkstra authored May 12, 2020

Improve strnlen test coverage and cleanup code.

string: Improve memchr-mte performance · cbbc5965

Wilco Dijkstra authored May 12, 2020

Improve memchr performance by using a more efficient termination test.
On various micro architectures the speedup is 16% on large strings and
46% on small strings.


string: Improve strnlen performance · 6b23ea83

Wilco Dijkstra authored May 12, 2020

Improve strnlen performance by using a much simpler SIMD implementation.
On modern micro architectures the speedup is 2.3x on large strings and
1.5x on small strings.


string: format tests according to GNU style · 0b7d1aeb

Szabolcs Nagy authored May 12, 2020

Use the GNU style consistently in the string test code.

Added clang-format guard comments where necessary so the
code can be reformatted using the clang-format tool and
GNU style settings from gcc contrib/clang-format.


May 01, 2020

string: add a setting to disable GNU Property Notes · e1127946

Szabolcs Nagy authored May 01, 2020

GNU Property Notes are only supported in recent tooling and older
tools may warn about them, so it makes sense to remove these notes
on a system where BTI is not supported anyway.

The actual BTI instructions should be kept in place to avoid
disturbing code layout.

-DWANT_GNU_PROPERTY=0 removes the .note.gnu.property section
from assembly files (ideally it would be based on the compiler
default setting, but there is no feature test macro for BTI and
PAC-RET).
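
A sketch of how such an opt-out setting is typically wired up in a shared header. The default of 1 and the helper `gnu_property_notes_enabled` are assumptions for illustration, not a copy of the project's header:

```c
/* Emit the notes unless the build overrides the setting, for example
   with -DWANT_GNU_PROPERTY=0 on the compiler command line.  The default
   value here is an assumption.  */
#ifndef WANT_GNU_PROPERTY
# define WANT_GNU_PROPERTY 1
#endif

/* Hypothetical helper that reports the compile-time choice.  In an
   assembly file the same #if would guard the directives that create
   the .note.gnu.property section.  */
static int
gnu_property_notes_enabled (void)
{
#if WANT_GNU_PROPERTY
  return 1;
#else
  return 0;
#endif
}
```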


string: Further improve strchrnul-mte performance · fa69d42a

Wilco Dijkstra authored May 01, 2020

Remove 2 more instructions, resulting in a 9.8% speedup of medium
sized strings (16-32).

The BTI patch changed ENTRY so the loops got misaligned; this fixes
that regression.


string: Further improve strchr-mte performance · 7bb8464f

Wilco Dijkstra authored May 01, 2020

Remove 2 more instructions, resulting in a 6.8% speedup of medium
sized strings (16-32).

The BTI patch changed ENTRY so the loops got misaligned; this fixes
that regression.
