Ivan Shapovalov
8e1c084c93
src/clpp11.hpp: do not store program source/binary in Program
The stored source/binary does not seem to serve any purpose, yet its
presence makes Program a heavy (not purely refcounted) object, which is
undesirable, especially because it is copied from the cache in the hot path.
2017-01-24 02:42:59 +03:00
Ivan Shapovalov
ee4124dcbc
samples: add CL_USE_DEPRECATED_OPENCL_1_*_APIS where needed
2017-01-24 02:42:59 +03:00
Ivan Shapovalov
1a1e863ab3
treewide: include clpp11.hpp first to silence deprecation warnings
Otherwise, cl.h gets included through clblast.h before clpp11.hpp.
2017-01-20 17:32:42 +03:00
Ivan Shapovalov
43c7707173
Routine: use PrecisionSupported<>() instead of duplicating the check
2017-01-20 17:20:45 +03:00
Cedric Nugteren
a5fd2323b6
Added prototype for the TRSV routine
2017-01-20 11:30:32 +01:00
Cedric Nugteren
a2c0a9c551
Set number of decimals for floating-point printing for error reporting
2017-01-20 11:13:44 +01:00
Cedric Nugteren
2e4f6e1609
Added tuning results for NVIDIA GTX 1080 and Intel Core i7-4790K
2017-01-19 19:42:31 +01:00
Cedric Nugteren
df9a77d74d
Added first version of the TRSM routine based on the diagonal invert kernel
2017-01-18 21:29:59 +01:00
Cedric Nugteren
4b3ffd9989
Added a first version of the diagonal block invert routine in preparation for TRSM
2017-01-15 17:30:00 +01:00
Cedric Nugteren
4a4be0c3a5
Prints additional information in verbose/debug mode
2017-01-15 17:17:40 +01:00
Cedric Nugteren
ff2bf985a3
Updated the link to cl.hpp in the Khronos registry for the samples
2017-01-07 13:57:23 +01:00
Cedric Nugteren
69ca271a8c
Always enables cl_khr_fp64 when running double-precision, not just for OpenCL 1.1 or lower
2017-01-07 13:31:29 +01:00
Cedric Nugteren
32b850b12b
Added tuning results for the AMD Turks GPU and the Intel Core i7-2670QM CPU
2017-01-03 20:30:56 +01:00
Cedric Nugteren
681a465b35
Prepared for the addition of the TRSM triangular solver kernel
2016-12-18 12:30:16 +01:00
Cedric Nugteren
6b533dda1c
Fixed a bug when using offsets in the direct GEMM kernels
2016-12-18 11:54:32 +01:00
Cedric Nugteren
26e0177431
Made Intel GPUs always use the indirect version of the GEMM kernel
2016-11-29 20:47:20 +01:00
Cedric Nugteren
2cf7d8429a
Updated to version 0.10.0
2016-11-27 13:34:18 +01:00
Cedric Nugteren
39c49bf4f9
Made it possible to use the command-line environment variables for each executable without re-running CMake
2016-11-27 11:00:29 +01:00
Cedric Nugteren
8cfcda52a8
Merge branch 'better_defaults' into development
2016-11-27 09:48:11 +01:00
Cedric Nugteren
080e1be684
Improved the default parameters for cases with non-common parameters across all devices
2016-11-26 16:38:17 +01:00
Cedric Nugteren
cb398f0e42
Merge pull request #125 from CNugteren/netlib_blas_api
Netlib CBLAS API for CLBlast
2016-11-24 19:35:59 +01:00
Cedric Nugteren
2ff3f77392
Made the Netlib SGEMM example also optionally compiled
2016-11-23 22:07:11 +01:00
Cedric Nugteren
792cc8359f
Fixed a vector-size related bug in the CLBlast Netlib API
2016-11-23 22:00:20 +01:00
Cedric Nugteren
fa42befcc1
Made compilation of the Netlib CBLAS API conditional
2016-11-23 21:33:35 +01:00
Cedric Nugteren
654b41bb2b
Fixed a bug in the HSCAL routine
2016-11-23 21:29:16 +01:00
Cedric Nugteren
26ca071480
Minor changes to ensure full compatibility with the Netlib CBLAS API
2016-11-22 08:41:52 +01:00
Cedric Nugteren
eefe0df435
Made functions with scalar-buffers as output properly return values
2016-11-20 21:36:57 +01:00
Cedric Nugteren
88ba1f4db9
Added performance results for the Skylake ULT GT2 GPU
2016-11-20 20:36:56 +01:00
Cedric Nugteren
d8af24e388
Now correctly tests for validity of the B matrix in the TRMM routine
2016-11-20 16:27:54 +01:00
Cedric Nugteren
90eb8738c4
Forced OpenCL 1.1 compilation and disabled a deprecation warning
2016-11-20 16:27:02 +01:00
Cedric Nugteren
2f0697564f
Fixed a bug in the TRMM routine caused by overwriting input data before consuming everything
2016-11-20 15:05:42 +01:00
Cedric Nugteren
4c9585a349
Generating FP16 performance graphs now uses FP32 as a reference for comparison
2016-11-19 22:21:07 +01:00
Cedric Nugteren
6eeb1180fd
Changed the GEMM kernel selection parameters for Skylake GPUs to always favour the regular kernel
2016-11-19 22:15:33 +01:00
Cedric Nugteren
60fa2322ca
Added a proper half-precision reference for testing of xomatcopy
2016-11-17 22:20:16 +01:00
Cedric Nugteren
29aab3019e
Fixed a bug in the error margins; relaxed the error margins for half-precision
2016-11-17 22:19:36 +01:00
Cedric Nugteren
746d688e07
Updated the tuning results for the Intel Skylake ULT GT2 GPU
2016-11-15 22:42:04 +01:00
Cedric Nugteren
bb14a5880e
Added an example and documentation for the Netlib CBLAS API
2016-10-25 20:37:33 +02:00
Cedric Nugteren
8ae8ab06a2
Renamed the include and source files of the Netlib CBLAS API
2016-10-25 20:33:10 +02:00
Cedric Nugteren
140121ef91
Removed the clblast namespace from the Netlib C API source file to ensure proper linking
2016-10-25 20:21:50 +02:00
Cedric Nugteren
729862e873
Fixed some issues preventing the Netlib CBLAS API from linking correctly
2016-10-25 19:56:42 +02:00
Cedric Nugteren
926aca53a0
Made the Netlib CBLAS API use the same enums with prefixes as the regular C API of CLBlast
2016-10-25 19:45:57 +02:00
Cedric Nugteren
59183b7d79
Sets the proper sizes for the buffers for the Netlib CBLAS API
2016-10-25 19:21:49 +02:00
Cedric Nugteren
f96fd372bc
Added initial version of a Netlib CBLAS implementation. TODO: Set correct buffer sizes
2016-10-25 14:28:52 +02:00
Cedric Nugteren
3b65eace0a
Merge branch 'development' into netlib_blas_api
Conflicts:
scripts/generator/generator.py
scripts/generator/generator/routine.py
2016-10-25 09:34:24 +02:00
Cedric Nugteren
0f5bf35ebe
Updated list of acknowledgments and thanks
2016-10-24 19:54:45 +02:00
Cedric Nugteren
ec687afa75
Added tuning results for GeForce GTX TITAN Black
2016-10-24 19:49:10 +02:00
Cedric Nugteren
76d5d2ccfc
Fixed a bug in the transpose-matrix function
2016-10-23 20:49:55 +02:00
Cedric Nugteren
43f4f02399
Added an initial version of contributing guidelines
2016-10-23 16:56:51 +02:00
Cedric Nugteren
b8d4a9b9d0
Removed PUBLIC_API from the C++ exception classes
2016-10-23 16:09:59 +02:00
Cedric Nugteren
66f5c9d9b8
Added a fix for compilation under Visual Studio 2013 related to the new exception classes
2016-10-23 15:55:03 +02:00