Commits · dbb919dc12bbca2ea0f3426758773bd99212033d · Android-smartphones / realme / realme 5pro / kaderbava / external_arm-optimized-routines

Jan 03, 2020

Add the Assignment Agreement v1.1 document · dbb919dc

Szabolcs Nagy authored Jan 02, 2020

This Assignment Agreement has to be filled in, signed and sent to
optimized-routines-assignment@arm.com by Contributors before their
contributions can be accepted into optimized-routines.

dbb919dc

Jan 02, 2020
- string: Use L(name) for labels · 833e8609
  Wilco Dijkstra authored Jan 02, 2020
```
Use L(name) for all assembler labels.
```
  833e8609
- string: Use asmdefs.h, ENTRY and END · 31b560bc
  Wilco Dijkstra authored Jan 02, 2020
```
Cleanup string functions to use asmdefs.h, ENTRY and END instead of
defining macros in each file.
```
  31b560bc
Dec 10, 2019

aarch64: Combine memcpy and memmove implementations · 3377796f

Krzysztof Koch authored Dec 09, 2019

Modify integer and SIMD versions of memcpy to handle overlaps correctly.

Make __memmove_aarch64 and __memmove_aarch64_simd alias to
__memcpy_aarch64 and __memcpy_aarch64_simd respectively.

Complete sharing of code between memcpy and memmove implementations is
possible without noticeable performance penalty. This is thanks to
moving the source and destination buffer overlap detection after
the code for handling small and medium copies which are overlap-safe
anyway.

Benchmarking shows that keeping two versions of memcpy is necessary
because newer platforms favor aligning src over destination for large
copies. Using NEON registers also gives a small speedup. However,
aligning dst and using general-purpose registers works best for older
platforms. Consequently, memcpy.S and memcpy_simd.S contain memcpy
code which is identical except for the registers used and src vs dst
alignment.

3377796f
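As a rough illustration of the structure commit 3377796f describes, the C sketch below runs the overlap test only after the small and medium cases have been handled. It is not the project's assembly: `sketch_memmove` and `MEDIUM_MAX` are hypothetical names, the 128-byte bound is borrowed from commit fec28a72, and the bounce buffer only stands in for loading every byte into registers before storing any.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bound for "small and medium" copies (assumption). */
enum { MEDIUM_MAX = 128 };

void *sketch_memmove(void *dstv, const void *srcv, size_t n)
{
    unsigned char *dst = dstv;
    const unsigned char *src = srcv;

    assert(dstv != NULL && srcv != NULL);

    if (n <= MEDIUM_MAX) {
        /* All bytes are read before any byte is written, so this path is
           overlap-safe and needs no overlap check at all. */
        unsigned char tmp[MEDIUM_MAX];
        memcpy(tmp, src, n);
        memcpy(dst, tmp, n);
        return dstv;
    }

    /* Large copies: overlap detection happens only here.  The unsigned
       difference is below n exactly when dst lies inside [src, src + n),
       which is the case where a forward copy would clobber unread bytes. */
    if ((uintptr_t)dst - (uintptr_t)src < n) {
        for (size_t i = n; i > 0; i--)
            dst[i - 1] = src[i - 1];    /* copy backwards */
    } else {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];            /* copy forwards */
    }
    return dstv;
}
```

With this shape a memcpy entry point can simply alias the same routine, since non-overlapping inputs take exactly the same paths.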

Nov 27, 2019
- Update the readme · 709020ed
  Szabolcs Nagy authored Nov 27, 2019
```
Mention releases.
```
  709020ed
Nov 26, 2019

aarch64: Add SIMD version of memcpy · 6d3ae5fc

Krzysztof Koch authored Nov 25, 2019

Create a new memcpy implementation for targets with the NEON extension.

__memcpy_aarch64_simd has been tested on a range of modern
microarchitectures. It turned out to be faster than __memcpy_aarch64 on
all of them, with a performance improvement of 3-11% depending on the
platform.

6d3ae5fc

aarch64: Use common header file in memcpy.S · 015c9519

Krzysztof Koch authored Nov 25, 2019

Include asmdefs.h in memcpy.S to avoid duplicate macro definitions.

Add macro for defining labels in asmdefs.h.

Change the default routine entry point alignment to 64 bytes.

Define a new macro which allows controlling the entry point alignment.

Add include guard to asmdefs.h.

015c9519
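The shared macros mentioned in commit 015c9519 (and used by the later string cleanups) could look roughly like the header below. This is a sketch from the commit descriptions only: the include guard name, the `ENTRY_ALIGN` name and the choice of `.p2align` are assumptions, and the real asmdefs.h may differ in detail.

```c
/* asmdefs.h-style helpers, meant to be included from preprocessed .S files. */
#ifndef ASMDEFS_SKETCH_H
#define ASMDEFS_SKETCH_H

/* Open a global function; `alignment` is log2 of the byte alignment. */
#define ENTRY_ALIGN(name, alignment) \
  .global name;                      \
  .type name, %function;             \
  .p2align alignment;                \
  name:

/* Default entry point alignment: 2^6 = 64 bytes. */
#define ENTRY(name) ENTRY_ALIGN(name, 6)

/* Close the function and record its size. */
#define END(name) .size name, .-name;

/* File-local label: L(loop) expands to .Lloop. */
#define L(l) .L##l

#endif
```

A routine would then be written as `ENTRY (name)`, its body with `L(label):` style labels, and `END (name)`, instead of repeating the directives in every file.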

Makefile tweak for better subproject handling · dec9ffea

Szabolcs Nagy authored Nov 26, 2019

Don't include the makefile fragments of subprojects that aren't built.

With this the build fails more reasonably when SUBS is set incorrectly.

dec9ffea

Nov 22, 2019

Build system refactoring · 1fd2aaae

Szabolcs Nagy authored Nov 20, 2019

Reorganise the makefiles so subprojects can be more separately used and
maintained. Still kept the single toplevel Makefile and config.mk.

Subproject Dir.mk is expected to provide all-X, check-X, clean-X and
install-X targets where X is the subproject name and it may use generic
make variables set in config.mk, like CFLAGS_ALL and CC, or subproject
specific variables like X-cflags.

1fd2aaae

Nov 19, 2019

string: Use .d rather than .2d for element mov instructions · 4f4e530b

George Steed authored Nov 19, 2019

Use .d rather than .2d for element mov instructions in string routines
so the assembly compiles with clang too.

4f4e530b

Nov 06, 2019

math: add WANT_VMATH feature macro · 1f3b1638

Szabolcs Nagy authored Nov 06, 2019

When defined as 0 the vector math code is not built and not tested.

1f3b1638

math: allow errno setting even if !(math_errhandling&MATH_ERRNO) · 675721a4

Szabolcs Nagy authored Nov 06, 2019

The math_errhandling checks are incorrect in general: it is defined
by the libc math.h which is not appropriate for optimized-routines
provided functions that we are testing.

However even if we want to test a libc implementation, ISO C allows
the setting of errno even if !(math_errhandling&MATH_ERRNO), so
relax the checks.

675721a4
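One way to read the relaxed check in commit 675721a4 is sketched below. The function name, its parameters and the exact acceptance rule are assumptions made for illustration; the test code in the repository may decide differently. `errhandling` is passed in rather than read from `math_errhandling` so that both configurations can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <math.h>

/* Returns 1 if the observed errno value `got` is acceptable when the test
   case expects `want` (0 meaning "no error expected"). */
int errno_acceptable(int errhandling, int want, int got)
{
    /* Math functions only report domain and range errors through errno. */
    assert(want == 0 || want == EDOM || want == ERANGE);

    if (errhandling & MATH_ERRNO)
        return got == want;          /* errno reporting is promised */

    /* errno reporting is not promised, but ISO C still permits it:
       accept an untouched errno as well as the expected value. */
    return got == 0 || got == want;
}
```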

math: fix unused function warnings in mathbench.c · 80922e8b

Szabolcs Nagy authored Nov 06, 2019

Vector functions are only used on aarch64, so only define them there.

  math/test/mathbench.c:95:1: warning: '__v_dummyf' defined but not used [-Wunused-function]

80922e8b

math: fix missing attributes warnings · 2c6c2405

Szabolcs Nagy authored Nov 06, 2019

gcc-9 started warning if alias symbols have different attributes:

math/expf.c: At top level:
math/expf.c:89:21: warning: '__expf_finite' specifies less restrictive attributes than its target 'expf': 'leaf', 'nothrow', 'pure' [-Wmissing-attributes]

so copy the attributes when creating the aliases.

2c6c2405
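The fix in commit 2c6c2405 can be sketched with GCC's `copy` attribute, which was introduced in GCC 9 alongside the warning. The macro names and the `demo_double` function below are hypothetical stand-ins, not the identifiers used in the repository, and the `alias` attribute assumes an ELF target.

```c
#include <assert.h>

/* The `copy` attribute exists since GCC 9; elsewhere expand to nothing. */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 9
# define ATTRIBUTE_COPY(f) __attribute__((copy(f)))
#else
# define ATTRIBUTE_COPY(f)
#endif

/* Declare `a` as an alias of `f` that inherits the attributes of `f`. */
#define STRONG_ALIAS(f, a) \
  extern __typeof(f) a __attribute__((alias(#f))) ATTRIBUTE_COPY(f)

/* Hypothetical target function carrying attributes the alias should share. */
int demo_double(int x) __attribute__((const, nothrow));
int demo_double(int x) { return 2 * x; }

STRONG_ALIAS(demo_double, demo_double_alias);

/* Both names resolve to the same code. */
void check_alias(void)
{
    assert(demo_double_alias(21) == demo_double(21));
}
```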

math: fix unused variable warnings · 17cd8af3

Szabolcs Nagy authored Nov 06, 2019

Compilers (incorrectly) warn about unused volatile variables:

  math/math_config.h: In function 'force_eval_float':
  math/math_config.h:188:18: warning: unused variable 'y' [-Wunused-variable]

silence them.

17cd8af3

math: move definitions in internal header · 9e6e14e2

Szabolcs Nagy authored Nov 06, 2019

Compiler checks and related macros need to be done earlier so they are
usable for the static inline functions.

9e6e14e2

Fix building outside the source directory · 0eb42807

Szabolcs Nagy authored Nov 06, 2019

Fix the Makefile so the documented mechanism in the README still works.

0eb42807

Nov 05, 2019

Add vector exp2f · 69170e15

Szabolcs Nagy authored Oct 14, 2019

Same design as in expf. Worst-case error of __v_exp2f and __v_exp2f_1u
is 1.96 and 0.88 ulp respectively.

It is not clear if round/convert instructions are better or +- Shift.
For expf the latter, for exp2f the former seems more consistently
faster, but both options are kept in the code for now.

69170e15
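The "+- Shift" alternative mentioned in commit 69170e15 is presumably the usual trick of adding and then subtracting a large constant so that the float addition itself performs the rounding. A minimal sketch, assuming the default round-to-nearest mode; the function name and the constant `0x1.8p23f` are illustrative choices rather than values quoted from the commit:

```c
#include <assert.h>

/* Round x to the nearest integer (ties to even) without a dedicated
   round/convert instruction.  Only valid for |x| < 2^22. */
float round_by_shift(float x)
{
    const float shift = 0x1.8p23f;   /* 1.5 * 2^23 */

    assert(x > -0x1p22f && x < 0x1p22f);

    /* In [2^23, 2^24) adjacent floats are 1 apart, so the sum is rounded
       to an integer.  volatile keeps the compiler from simplifying
       (x + shift) - shift or evaluating it in higher precision. */
    volatile float z = x + shift;
    return z - shift;
}
```

In a vector routine the intermediate `z` is also useful on its own, because its low mantissa bits hold the rounded integer.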

math: fix runulp.sh · 65464ec6

Szabolcs Nagy authored Nov 05, 2019

Use heredoc instead of pipe when iterating over test cases to avoid
creating a subshell that would break the PASS/FAIL accounting.

65464ec6

aarch64: Increase small and medium cases for memcpy · fec28a72

Krzysztof Koch authored Oct 25, 2019

Increase the upper bound on medium cases from 96 to 128 bytes.
Now, up to 128 bytes are copied unrolled.

Increase the upper bound on small cases from 16 to 32 bytes so that
copies of 17-32 bytes are not impacted by the larger medium case.

fec28a72
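The size classes after commit fec28a72 can be summarised in a few lines of C. The enum and function names are hypothetical; only the 32-byte and 128-byte bounds come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bounds in bytes after the change (previously 16 and 96). */
enum { SMALL_MAX = 32, MEDIUM_MAX = 128 };
static_assert(SMALL_MAX < MEDIUM_MAX, "size classes must be ordered");

enum copy_class { COPY_SMALL, COPY_MEDIUM, COPY_LARGE };

/* Pick the code path a copy of n bytes would take. */
enum copy_class classify_copy(size_t n)
{
    if (n <= SMALL_MAX)
        return COPY_SMALL;    /* 0..32 bytes */
    if (n <= MEDIUM_MAX)
        return COPY_MEDIUM;   /* 33..128 bytes, copied unrolled */
    return COPY_LARGE;        /* everything else goes through the loop */
}
```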

Oct 17, 2019

Add -Werror=implicit-function-declaration · 433a3b1f

Szabolcs Nagy authored Oct 17, 2019

Implicit function declaration is always a bug, but compilers don't
turn it into an error by default for historical reasons, so add it
to the default config.

433a3b1f

fix the build of s_powf.o on non-aarch64 targets · 3d7ecfe3

Szabolcs Nagy authored Oct 17, 2019

This is a simple fix to the v_powf code, but in general the vector
code may not work on arbitrary targets even when compiled with
scalar types (s_powf.c), so in the long term maybe all s_* should
be disabled for non-aarch64 targets (requires test system and header
changes too).

3d7ecfe3

Oct 14, 2019

Add vector log · d984098b

Szabolcs Nagy authored Aug 29, 2019

Worst-case error is 1.67 ulp, the polynomial was generated by sollya.
Uses a 128 entry (2KB) lookup table. Special cases fall back to scalar
log call.

d984098b

Add vector sin and cos · a2f717ef

Szabolcs Nagy authored Aug 09, 2019

Worst-case error is 3.5 ulp, the polynomial was generated by sollya.
For large (>2^23) and special inputs the code falls back to scalar
sin and cos.

a2f717ef

Add vector powf · ba75d0a0

Szabolcs Nagy authored Aug 09, 2019

Essentially the scalar powf algorithm is used for each element in the
vector just inlined for better scheduling and simpler special case
handling. The log polynomial is smaller as less accuracy is enough.

Worst-case error is 2.6 ulp.

ba75d0a0

Add vector sinf and cosf · c5cba852

Szabolcs Nagy authored Aug 09, 2019

The polynomials were produced by searching the coefficient space using
heuristics and ideas from https://arxiv.org/abs/1508.03211

The worst-case error is 1.886 ulp, large inputs (> 2^20) and other
special cases use scalar sinf and cosf.

c5cba852

Add vector logf · c280e49d

Szabolcs Nagy authored Aug 09, 2019

The polynomial was produced by searching the coefficient space using
heuristics and ideas from https://arxiv.org/abs/1508.03211

The worst-case error is 3.34 ulp, subnormal range inputs and other
special cases use scalar logf.

c280e49d

Add vector exp, expf and related vector math support code · 7a1f4cfd

Szabolcs Nagy authored Jul 18, 2019

Vector math routines are added to the same libmathlib library as scalar
ones. The difficulty is that they are not always available, the external
abi depends on the compiler version used for the build. Currently only
aarch64 AdvSIMD is supported, there are 4 new sets of symbols:

  __s_foo is a scalar function with identical result to the vector one,
  __v_foo is a vector function using the base PCS,
  __vn_foo uses the vector PCS and
  _ZGV*_foo is the vector ABI symbol alias of __vn_foo

for a scalar math function foo.

The test and benchmark code got extended to handle vector functions.

Vector functions aim for < 5 ulp worst case error, only support nearest
rounding mode and don't support floating-point exceptions. Vector
functions may call scalar functions to handle special cases, but for a
single value they should return the same result independently of values
in other vector lanes or the position of the value in the vector.

The __v_expf and __v_expf_1u polynomials were produced by searching the
coefficient space with some heuristics and ideas from
https://arxiv.org/abs/1508.03211
Their worst case error is 1.95 and 0.866 ulp respectively.

The exp polynomial was produced by sollya, it uses a 128 element (1KB)
lookup table and has 2.38 ulp worst case error.

7a1f4cfd
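The lane-independence rule from commit 7a1f4cfd can be illustrated in portable C, with a four-element struct standing in for an AdvSIMD register. Everything here is a toy: `s_foo` and `v_foo` only play the roles of `__s_foo` and `__v_foo`, and the polynomial and the clamping of out-of-range inputs are invented for the example.

```c
#include <assert.h>

enum { LANES = 4 };
typedef struct { float lane[LANES]; } vec4f;
static_assert(sizeof(vec4f) == LANES * sizeof(float), "lanes are contiguous");

/* Toy polynomial standing in for a real approximation on |x| <= 1. */
static float poly(float x) { return 1.0f + x + 0.5f * x * x; }

/* Special lanes: out of range, or NaN (which fails both comparisons). */
static int is_special(float x) { return !(x >= -1.0f && x <= 1.0f); }

/* Scalar counterpart with the same result as the vector version. */
float s_foo(float x)
{
    if (x != x)    return x;             /* NaN propagates */
    if (x > 1.0f)  return poly(1.0f);    /* toy special-case handling */
    if (x < -1.0f) return poly(-1.0f);
    return poly(x);
}

/* Vector version: fast path on every lane, scalar call for special lanes. */
vec4f v_foo(vec4f x)
{
    vec4f r;
    int any_special = 0;

    for (int i = 0; i < LANES; i++) {
        r.lane[i] = poly(x.lane[i]);
        any_special |= is_special(x.lane[i]);
    }
    if (any_special) {
        /* Only the special lanes are replaced, so a regular lane never
           depends on what its neighbours contain. */
        for (int i = 0; i < LANES; i++)
            if (is_special(x.lane[i]))
                r.lane[i] = s_foo(x.lane[i]);
    }
    return r;
}
```

Rotating the inputs through the lanes should rotate the outputs in the same way, which is the property the commit message asks for.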

math: more robust mathbench_libc · a88f3f60

Szabolcs Nagy authored Oct 14, 2019

Not all symbols referenced by mathbench may be available in libc so
link to libmathlib too to resolve the missing symbols.

a88f3f60

math: update the plot script · 0a51e645

Szabolcs Nagy authored Oct 14, 2019

Fix it to be python3 compatible and plot the exact and approximated
values too.

0a51e645

Prevent fenv access breaking optimizations of the ulp tool · 60463383

Szabolcs Nagy authored Oct 14, 2019

The ulp tool compares output of a math function to a larger precision
implementation of the same function.

But when the input argument is converted to a larger precision number
the signaling nan property is lost, so ensure that the conversion
happens inside the critical region where fenv exceptions are checked
and then the conversion itself will raise the invalid exception, which
is the correct behaviour in most cases.

The volatile barrier is not perfect and the snan behaviour is not
always signaling, but this should give more reliable results in most
cases than before.

60463383
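The comparison that commit 60463383 refers to, a float result checked against a higher-precision reference, can be sketched as below. This is a simplified illustration and not the ulp tool's code: the function names are hypothetical, the reference is assumed to be a finite nonzero double within the normal float range, and the signaling NaN and fenv handling discussed in the commit is left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Size of one float ulp at the magnitude of `want`, i.e. 2^(e - 23)
   where e is the binary exponent of `want`. */
static double float_ulp_at(double want)
{
    uint64_t bits, ulp_bits;
    double ulp;

    memcpy(&bits, &want, sizeof bits);
    int e = (int)((bits >> 52) & 0x7ff) - 1023;   /* unbiased exponent */
    assert(e >= -126 && e <= 127);                /* normal float range only */

    ulp_bits = (uint64_t)(e - 23 + 1023) << 52;   /* encode 2^(e - 23) */
    memcpy(&ulp, &ulp_bits, sizeof ulp);
    return ulp;
}

/* Error of the float result `got` against the reference `want`, in ulps. */
double ulp_error_f(float got, double want)
{
    double diff = (double)got - want;
    if (diff < 0)
        diff = -diff;
    return diff / float_ulp_at(want);
}
```

A correctly rounded result scores at most 0.5 under this measure, which is the scale on which the worst-case errors in the vector math commits are stated.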

Oct 08, 2019
- Support running the math tests without fenv checks · ef987d37
  Szabolcs Nagy authored Oct 08, 2019
```
fenv support is not reliable in clang so provide a mechanism to
disable fenv status checks and only check the result values.
```
  ef987d37
- Support separate CFLAGS per subproject · 0705df19
  Szabolcs Nagy authored Sep 11, 2019
```
Users may want different CFLAGS for math and string subprojects, expose
a mechanism for this in config.mk.
```
  0705df19
- Use CFLAGS_SHARED for shared libraries · 1e0c8023
  Szabolcs Nagy authored Aug 06, 2019
```
Allows optimizing the code in shared libraries differently.
Has significant effect on literal loads in simd code.
```
  1e0c8023
- fix exit status of ulp and runulp.sh · 8dcd0638
  Szabolcs Nagy authored Oct 08, 2019
```
Make ulp and runulp.sh fail on error.
```
  8dcd0638
- Document subproject make targets in the README · 39dc36a8
  Szabolcs Nagy authored Oct 08, 2019
  
  39dc36a8
- Document debian build deps in the README · 32fe1c5b
  Szabolcs Nagy authored Oct 08, 2019
  
  32fe1c5b
- fix mathtest lineno accounting · 614d783b
  Szabolcs Nagy authored Oct 08, 2019
```
Only increment once per fgets.
```
  614d783b
- fix the exit status of mathtest on failure · db49c4bc
  Szabolcs Nagy authored Oct 08, 2019
```
Make mathtest fail on error so make check fails too.
```
  db49c4bc
Aug 29, 2019

string: print passing test cases · 5e3d7561

Szabolcs Nagy authored Aug 29, 2019

Without printing anything on success it is unclear if the right set
of functions got hooked up in the test code.

5e3d7561