Commit graph

595 commits

Author SHA1 Message Date
Ivan Shapovalov 46a59eb882 .travis.yml: do not build for osx twice, there's no gcc there 2017-01-24 02:55:09 +03:00
Ivan Shapovalov 064ba4abd4 treewide: silence type mismatch warnings in *printf() 2017-01-24 02:55:09 +03:00
Ivan Shapovalov 519ccbd273 Tester: always fail on OpenCL and CLBlast internal errors
These errors are self-evident and enough to fail the test even if there is
no clBLAS reference to compare error codes with.
2017-01-24 02:55:09 +03:00
Ivan Shapovalov 1b8e816333 FillCache: perform compilation for each precision separately
Thus do not prevent filling cache for float if the device does not support
e. g. double.
2017-01-24 02:43:00 +03:00
Ivan Shapovalov 6ad11665a1 Routine: fix semi-warm routine construction (when binary is in cache)
There was a missing return statement in the semi-warm path that made
CLBlast to continue to cold path after a cache hit.
2017-01-24 02:43:00 +03:00
Ivan Shapovalov a9914ee3a8 src/clpp11.hpp: check pointers before clRelease*()
This is to avoid spurious "induced" errors on destruction, if construction
failed for some reason.
2017-01-24 02:42:59 +03:00
Ivan Shapovalov 8e1c084c93 src/clpp11.hpp: do not store program source/binary in Program
The stored source/binary does not seem to serve any purpose, yet its
presence makes Program a heavy (not pure refcounted) object, which is
undesired esp. because it is copied from the cache in the hot path.
2017-01-24 02:42:59 +03:00
Ivan Shapovalov ee4124dcbc samples: add CL_USE_DEPRECATED_OPENCL_1_*_APIS where needed 2017-01-24 02:42:59 +03:00
Ivan Shapovalov 1a1e863ab3 treewide: include clpp11.hpp first to silence deprecation warnings
Otherwise, cl.h gets included through clblast.h before clpp11.hpp.
2017-01-20 17:32:42 +03:00
Ivan Shapovalov 43c7707173 Routine: use PrecisionSupported<>() instead of duplicating the check 2017-01-20 17:20:45 +03:00
Cedric Nugteren 2e4f6e1609 Added tuning results for NVIDIA GTX 1080 and Intel Core i7-4790K 2017-01-19 19:42:31 +01:00
Cedric Nugteren ff2bf985a3 Updated the link to cl.hpp in the Khronos registry for the samples 2017-01-07 13:57:23 +01:00
Cedric Nugteren 69ca271a8c Always enables cl_khr_fp64 when running double-precision, not just for OpenCL 1.1 or lower 2017-01-07 13:31:29 +01:00
Cedric Nugteren 32b850b12b Added tuning results for the AMD Turks GPU and the Intel Core i7-2670QM CPU 2017-01-03 20:30:56 +01:00
Cedric Nugteren 6b533dda1c Fixed a bug when using offsets in the direct GEMM kernels 2016-12-18 11:54:32 +01:00
Cedric Nugteren 26e0177431 Made Intel GPUs always use the indirect version of the GEMM kernel 2016-11-29 20:47:20 +01:00
Cedric Nugteren 2cf7d8429a Updated to version 0.10.0 2016-11-27 13:34:18 +01:00
Cedric Nugteren 39c49bf4f9 Made it possible to use the command-line environmental variables for each executable and without re-running CMake 2016-11-27 11:00:29 +01:00
Cedric Nugteren 8cfcda52a8 Merge branch 'better_defaults' into development 2016-11-27 09:48:11 +01:00
Cedric Nugteren 080e1be684 Improved the default parameters for cases with non-common parameters across all devices 2016-11-26 16:38:17 +01:00
Cedric Nugteren cb398f0e42 Merge pull request #125 from CNugteren/netlib_blas_api
Netlib CBLAS API for CLBlast
2016-11-24 19:35:59 +01:00
Cedric Nugteren 2ff3f77392 Made the Netlib SGEMM example also optionally compiled 2016-11-23 22:07:11 +01:00
Cedric Nugteren 792cc8359f Fixed a vector-size related bug in the CLBlast Netlib API 2016-11-23 22:00:20 +01:00
Cedric Nugteren fa42befcc1 Made compilation of the Netlib CBLAS API conditional 2016-11-23 21:33:35 +01:00
Cedric Nugteren 654b41bb2b Fixed a bug in the HSCAL routine 2016-11-23 21:29:16 +01:00
Cedric Nugteren 26ca071480 Minor changes to ensure full compatibility with the Netlib CBLAS API 2016-11-22 08:41:52 +01:00
Cedric Nugteren eefe0df435 Made functions with scalar-buffers as output properly return values 2016-11-20 21:36:57 +01:00
Cedric Nugteren 88ba1f4db9 Added performance results for the Skylake ULT GT2 GPU 2016-11-20 20:36:56 +01:00
Cedric Nugteren d8af24e388 Now correctly tests for validaty of the B matrix in the TRMM routine 2016-11-20 16:27:54 +01:00
Cedric Nugteren 90eb8738c4 Forced OpenCL 1.1 compilation and disabled a deprecation warning 2016-11-20 16:27:02 +01:00
Cedric Nugteren 2f0697564f Fixed a bug in the TRMM routine caused by overwriting input data before consuming everything 2016-11-20 15:05:42 +01:00
Cedric Nugteren 4c9585a349 Generating FP16 performance graphs now uses FP32 as a reference for comparison 2016-11-19 22:21:07 +01:00
Cedric Nugteren 6eeb1180fd Changed the GEMM kernel selection parameters for Skylake GPUs to always favour the regular kernel 2016-11-19 22:15:33 +01:00
Cedric Nugteren 60fa2322ca Added a proper half-precision reference for testing of xomatcopy 2016-11-17 22:20:16 +01:00
Cedric Nugteren 29aab3019e Fixed a bug in the error margins; relaxed the error margins for half-precision 2016-11-17 22:19:36 +01:00
Cedric Nugteren 746d688e07 Updated the tuning results for the Intel Skylake ULT GT2 GPU 2016-11-15 22:42:04 +01:00
Cedric Nugteren bb14a5880e Added an example and documentation for the Netlib CBLAS API 2016-10-25 20:37:33 +02:00
Cedric Nugteren 8ae8ab06a2 Renamed the include and source files of the Netlib CBLAS API 2016-10-25 20:33:10 +02:00
Cedric Nugteren 140121ef91 Removed the clblast namespace from the Netlib C API source file to ensure proper linking 2016-10-25 20:21:50 +02:00
Cedric Nugteren 729862e873 Fixed some issues preventing the Netlib CBLAS API from linking correctly 2016-10-25 19:56:42 +02:00
Cedric Nugteren 926aca53a0 Made the Netlib CBLAS API use the same enums with prefixes as the regular C API of CLBlast 2016-10-25 19:45:57 +02:00
Cedric Nugteren 59183b7d79 Sets the proper sizes for the buffers for the Netlib CBLAS API 2016-10-25 19:21:49 +02:00
Cedric Nugteren f96fd372bc Added initial version of a Netlib CBLAS implementation. TODO: Set correct buffer sizes 2016-10-25 14:28:52 +02:00
Cedric Nugteren 3b65eace0a Merge branch 'development' into netlib_blas_api
Conflicts:
	scripts/generator/generator.py
	scripts/generator/generator/routine.py
2016-10-25 09:34:24 +02:00
Cedric Nugteren 0f5bf35ebe Updated list of acknowledgments and thanks 2016-10-24 19:54:45 +02:00
Cedric Nugteren ec687afa75 Added tuning results for GeForce GTX TITAN Black 2016-10-24 19:49:10 +02:00
Cedric Nugteren 76d5d2ccfc Fixed a bug in the transpose-matrix function 2016-10-23 20:49:55 +02:00
Cedric Nugteren 43f4f02399 Added an initial version of contributing guidelines 2016-10-23 16:56:51 +02:00
Cedric Nugteren b8d4a9b9d0 Removed PUBLIC_API from the C++ exception classes 2016-10-23 16:09:59 +02:00
Cedric Nugteren 66f5c9d9b8 Added a fix for compilation under Visual Studio 2013 related to the new exception classes 2016-10-23 15:55:03 +02:00