- Nov 26, 2019
-
-
Krzysztof Koch authored
Create a new memcpy implementation for targets with the NEON extension. __memcpy_aarch64_simd has been tested on a range of modern microarchitectures. It turned out to be faster than __memcpy_aarch64 on all of them, with a performance improvement of 3-11% depending on the platform.
-
Krzysztof Koch authored
Include asmdefs.h in memcpy.S to avoid duplicate macro definitions. Add macro for defining labels in asmdefs.h. Change the default routine entry point alignment to 64 bytes. Define a new macro which allows controlling the entry point alignment. Add include guard to asmdefs.h.
-
Szabolcs Nagy authored
Don't include the makefile fragments of subprojects that aren't built. With this the build fails more reasonably when SUBS is set incorrectly.
-
- Nov 22, 2019
-
-
Szabolcs Nagy authored
Reorganise the makefiles so subprojects can be used and maintained more independently. The single toplevel Makefile and config.mk are kept. Each subproject Dir.mk is expected to provide all-X, check-X, clean-X and install-X targets, where X is the subproject name. It may use generic make variables set in config.mk, like CFLAGS_ALL and CC, or subproject specific variables like X-cflags.
-
- Nov 19, 2019
-
-
George Steed authored
Use .d rather than .2d for element mov instructions in string routines so the assembly compiles with clang too.
-
- Nov 06, 2019
-
-
Szabolcs Nagy authored
When defined as 0 the vector math code is not built and not tested.
-
Szabolcs Nagy authored
The math_errhandling checks are incorrect in general: it is defined by the libc math.h which is not appropriate for optimized-routines provided functions that we are testing. However even if we want to test a libc implementation, ISO C allows the setting of errno even if !(math_errhandling&MATH_ERRNO), so relax the checks.
-
Szabolcs Nagy authored
Vector functions are only used on aarch64, so only define them there.
math/test/mathbench.c:95:1: warning: '__v_dummyf' defined but not used [-Wunused-function]
-
Szabolcs Nagy authored
gcc-9 started warning if alias symbols have different attributes:
math/expf.c: At top level:
math/expf.c:89:21: warning: '__expf_finite' specifies less restrictive attributes than its target 'expf': 'leaf', 'nothrow', 'pure' [-Wmissing-attributes]
so copy the attributes when creating the aliases.
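A minimal sketch of copying attributes onto an alias, assuming a GCC-compatible compiler on an ELF target. The macro names `attribute_copy` and `strong_alias` and the function `sketch_twice` are illustrative and are not taken from the commit itself. The `copy` attribute is only applied on GCC 9 or later.

```c
/* Assumption: GCC-compatible compiler on an ELF target; the alias
   attribute is not available on every platform.  */
#if defined(__GNUC__) && __GNUC__ >= 9 && !defined(__clang__)
# define attribute_copy(f) __attribute__ ((copy (f)))
#else
# define attribute_copy(f)
#endif

/* Declare symbol a as an alias of f, copying f's attributes when the
   compiler supports it so the alias is not less restrictive.  */
#define strong_alias(f, a) \
  extern __typeof (f) a __attribute__ ((alias (#f))) attribute_copy (f);

/* Hypothetical stand-in for a real math routine.  */
__attribute__ ((const)) float
sketch_twice (float x)
{
  return x + x;
}

/* The alias resolves to the same code as sketch_twice.  */
strong_alias (sketch_twice, sketch_twice_finite)
```

Without the `copy` attribute, gcc-9 would be expected to report the alias as carrying less restrictive attributes than its target.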
-
Szabolcs Nagy authored
Compilers (incorrectly) warn about unused volatile variables:
math/math_config.h: In function 'force_eval_float':
math/math_config.h:188:18: warning: unused variable 'y' [-Wunused-variable]
silence them.
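One way to silence such a warning is to mark the volatile variable as unused. The sketch below assumes GCC/Clang attribute syntax. The `UNUSED` macro name is an assumption, and the body of `force_eval_float` is a guess at the shape of the function named in the warning, not a copy of the project's code.

```c
/* Assumption: GCC/Clang-style attributes; other compilers get an
   empty macro.  */
#if defined(__GNUC__)
# define UNUSED __attribute__ ((unused))
#else
# define UNUSED
#endif

/* Force the evaluation of x (for example for its floating-point
   exception side effects) by storing it to a volatile object.
   The store is the point; y is never read, hence UNUSED.  */
static inline void
force_eval_float (float x)
{
  volatile float y UNUSED = x;
}
```

A caller would write something like `force_eval_float (x + 0x1p120f);` when only the side effects of the expression matter.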
-
Szabolcs Nagy authored
Compiler checks and related macros need to be done earlier so they are usable for the static inline functions.
-
Szabolcs Nagy authored
Fix the Makefile so the documented mechanism in the README still works.
-
- Nov 05, 2019
-
-
Szabolcs Nagy authored
Same design as in expf. Worst-case error of __v_exp2f and __v_exp2f_1u is 1.96 and 0.88 ulp respectively. It is not clear if round/convert instructions are better or +- Shift. For expf the latter, for exp2f the former seems more consistently faster, but both options are kept in the code for now.
-
Szabolcs Nagy authored
Use heredoc instead of pipe when iterating over test cases to avoid creating a subshell that would break the PASS/FAIL accounting.
-
Krzysztof Koch authored
Increase the upper bound on medium cases from 96 to 128 bytes. Now, up to 128 bytes are copied unrolled. Increase the upper bound on small cases from 16 to 32 bytes so that copies of 17-32 bytes are not impacted by the larger medium case.
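The resulting size classes can be sketched in C as follows. This is only an illustration: the real routine is assembly, the names `copy_class` and `classify_copy` are hypothetical, and treating both bounds as inclusive is an assumption based on the "17-32 bytes" and "up to 128 bytes" wording.

```c
#include <stddef.h>

/* Hypothetical size classes for a copy of n bytes.  */
enum copy_class { COPY_SMALL, COPY_MEDIUM, COPY_LARGE };

static enum copy_class
classify_copy (size_t n)
{
  if (n <= 32)		/* small case: was 16 before the change */
    return COPY_SMALL;
  if (n <= 128)		/* medium case, copied unrolled: was 96 */
    return COPY_MEDIUM;
  return COPY_LARGE;	/* everything else takes the large path */
}
```

Under these assumptions a 32-byte copy stays in the small class, while 33 to 128 bytes fall into the unrolled medium class.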
-
- Oct 17, 2019
-
-
Szabolcs Nagy authored
Implicit function declaration is always a bug, but compilers don't turn it into an error by default for historical reasons, so add it to the default config.
-
Szabolcs Nagy authored
This is a simple fix to the v_powf code, but in general the vector code may not work on arbitrary targets even when compiled with scalar types (s_powf.c), so in the long term maybe all s_* should be disabled for non-aarch64 targets (this requires test system and header changes too).
-
- Oct 14, 2019
-
-
Szabolcs Nagy authored
Worst-case error is 1.67 ulp, the polynomial was generated by sollya. Uses a 128 entry (2KB) lookup table. Special cases fall back to scalar log call.
-
Szabolcs Nagy authored
Worst-case error is 3.5 ulp, the polynomial was generated by sollya. For large (>2^23) and special inputs the code falls back to scalar sin and cos.
-
Szabolcs Nagy authored
Essentially the scalar powf algorithm is used for each element in the vector just inlined for better scheduling and simpler special case handling. The log polynomial is smaller as less accuracy is enough. Worst-case error is 2.6 ulp.
-
Szabolcs Nagy authored
The polynomials were produced by searching the coefficient space using heuristics and ideas from https://arxiv.org/abs/1508.03211 The worst-case error is 1.886 ulp, large inputs (> 2^20) and other special cases use scalar sinf and cosf.
-
Szabolcs Nagy authored
The polynomial was produced by searching the coefficient space using heuristics and ideas from https://arxiv.org/abs/1508.03211 The worst-case error is 3.34 ulp, subnormal range inputs and other special cases use scalar logf.
-
Szabolcs Nagy authored
Vector math routines are added to the same libmathlib library as scalar ones. The difficulty is that they are not always available: the external ABI depends on the compiler version used for the build.
Currently only aarch64 AdvSIMD is supported. There are 4 new sets of symbols: __s_foo is a scalar function with identical result to the vector one, __v_foo is a vector function using the base PCS, __vn_foo uses the vector PCS and _ZGV*_foo is the vector ABI symbol alias of __vn_foo for a scalar math function foo.
The test and benchmark code got extended to handle vector functions.
Vector functions aim for < 5 ulp worst case error, only support nearest rounding mode and don't support floating-point exceptions. Vector functions may call scalar functions to handle special cases, but for a single value they should return the same result independently of values in other vector lanes or the position of the value in the vector.
The __v_expf and __v_expf_1u polynomials were produced by searching the coefficient space with some heuristics and ideas from https://arxiv.org/abs/1508.03211 Their worst case error is 1.95 and 0.866 ulp respectively. The exp polynomial was produced by sollya, it uses a 128 element (1KB) lookup table and has 2.38 ulp worst case error.
-
Szabolcs Nagy authored
Not all symbols referenced by mathbench may be available in libc so link to libmathlib too to resolve the missing symbols.
-
Szabolcs Nagy authored
Fix it to be python3 compatible and plot the exact and approximated values too.
-
Szabolcs Nagy authored
The ulp tool compares output of a math function to a larger precision implementation of the same function. But when the input argument is converted to a larger precision number the signaling nan property is lost, so ensure that the conversion happens inside the critical region where fenv exceptions are checked and then the conversion itself will raise the invalid exception, which is the correct behaviour in most cases. The volatile barrier is not perfect and the snan behaviour is not always signaling, but this should give more reliable results in most cases than before.
-
- Oct 08, 2019
-
-
Szabolcs Nagy authored
fenv support is not reliable in clang so provide a mechanism to disable fenv status checks and only check the result values.
-
Szabolcs Nagy authored
Users may want different CFLAGS for math and string subprojects, expose a mechanism for this in config.mk.
-
Szabolcs Nagy authored
Allows optimizing the code in shared libraries differently. Has significant effect on literal loads in simd code.
-
Szabolcs Nagy authored
Make ulp and runulp.sh fail on error.
-
Szabolcs Nagy authored
-
Szabolcs Nagy authored
-
Szabolcs Nagy authored
Only increment once per fgets.
-
Szabolcs Nagy authored
Make mathtest fail on error so make check fails too.
-
- Aug 29, 2019
-
-
Szabolcs Nagy authored
Without printing anything on success it is unclear if the right set of functions got hooked up in the test code.
-
- Aug 28, 2019
-
-
Szabolcs Nagy authored
Arm state code can be called from thumb code, so don't hide the symbol.
-
Szabolcs Nagy authored
To allow libmathlib.a to be a drop-in replacement for libc functions on 32bit arm, we should provide long double symbols otherwise the libc long double implementation may pull in double symbols that can conflict with libmathlib.a in case of static linking. Using wrappers instead of alias to avoid type declaration conflicts.
-
Adhemerval Zanella authored
The only difference is changing the symbol name from strrchr to __strrchr_aarch64_sve.
-
Adhemerval Zanella authored
-
Adhemerval Zanella authored
The only difference is changing the symbol name from strnlen to __strnlen_aarch64_sve.
-