Cedric Nugteren
|
4b3ffd9989
|
Added a first version of the diagonal block invert routine in preparation of TRSM
|
2017-01-15 17:30:00 +01:00 |
|
Cedric Nugteren
|
4a4be0c3a5
|
Prints additional information in verbose/debug mode
|
2017-01-15 17:17:40 +01:00 |
|
Cedric Nugteren
|
ff2bf985a3
|
Updated the link to cl.hpp in the Khronos registry for the samples
|
2017-01-07 13:57:23 +01:00 |
|
Cedric Nugteren
|
69ca271a8c
|
Always enables cl_khr_fp64 when running double-precision, not just for OpenCL 1.1 or lower
|
2017-01-07 13:31:29 +01:00 |
|
Cedric Nugteren
|
32b850b12b
|
Added tuning results for the AMD Turks GPU and the Intel Core i7-2670QM CPU
|
2017-01-03 20:30:56 +01:00 |
|
Cedric Nugteren
|
681a465b35
|
Prepared for the addition of the TRSM triangular solver kernel
|
2016-12-18 12:30:16 +01:00 |
|
Cedric Nugteren
|
6b533dda1c
|
Fixed a bug when using offsets in the direct GEMM kernels
|
2016-12-18 11:54:32 +01:00 |
|
Cedric Nugteren
|
26e0177431
|
Made Intel GPUs always use the indirect version of the GEMM kernel
|
2016-11-29 20:47:20 +01:00 |
|
Cedric Nugteren
|
e52f9a9ff2
|
Merge pull request #127 from CNugteren/development
Update to version 0.10.0
|
2016-11-27 15:59:21 +01:00 |
|
Cedric Nugteren
|
2cf7d8429a
|
Updated to version 0.10.0
|
2016-11-27 13:34:18 +01:00 |
|
Cedric Nugteren
|
39c49bf4f9
|
Made it possible to use the command-line environment variables for each executable and without re-running CMake
|
2016-11-27 11:00:29 +01:00 |
|
Cedric Nugteren
|
8cfcda52a8
|
Merge branch 'better_defaults' into development
|
2016-11-27 09:48:11 +01:00 |
|
Cedric Nugteren
|
080e1be684
|
Improved the default parameters for cases with non-common parameters across all devices
|
2016-11-26 16:38:17 +01:00 |
|
Cedric Nugteren
|
cb398f0e42
|
Merge pull request #125 from CNugteren/netlib_blas_api
Netlib CBLAS API for CLBlast
|
2016-11-24 19:35:59 +01:00 |
|
Cedric Nugteren
|
2ff3f77392
|
Made the Netlib SGEMM example also optionally compiled
|
2016-11-23 22:07:11 +01:00 |
|
Cedric Nugteren
|
792cc8359f
|
Fixed a vector-size related bug in the CLBlast Netlib API
|
2016-11-23 22:00:20 +01:00 |
|
Cedric Nugteren
|
fa42befcc1
|
Made compilation of the Netlib CBLAS API conditional
|
2016-11-23 21:33:35 +01:00 |
|
Cedric Nugteren
|
654b41bb2b
|
Fixed a bug in the HSCAL routine
|
2016-11-23 21:29:16 +01:00 |
|
Cedric Nugteren
|
26ca071480
|
Minor changes to ensure full compatibility with the Netlib CBLAS API
|
2016-11-22 08:41:52 +01:00 |
|
Cedric Nugteren
|
eefe0df435
|
Made functions with scalar-buffers as output properly return values
|
2016-11-20 21:36:57 +01:00 |
|
Cedric Nugteren
|
88ba1f4db9
|
Added performance results for the Skylake ULT GT2 GPU
|
2016-11-20 20:36:56 +01:00 |
|
Cedric Nugteren
|
d8af24e388
|
Now correctly tests for validity of the B matrix in the TRMM routine
|
2016-11-20 16:27:54 +01:00 |
|
Cedric Nugteren
|
90eb8738c4
|
Forced OpenCL 1.1 compilation and disabled a deprecation warning
|
2016-11-20 16:27:02 +01:00 |
|
Cedric Nugteren
|
2f0697564f
|
Fixed a bug in the TRMM routine caused by overwriting input data before consuming everything
|
2016-11-20 15:05:42 +01:00 |
|
Cedric Nugteren
|
4c9585a349
|
Generating FP16 performance graphs now uses FP32 as a reference for comparison
|
2016-11-19 22:21:07 +01:00 |
|
Cedric Nugteren
|
6eeb1180fd
|
Changed the GEMM kernel selection parameters for Skylake GPUs to always favour the regular kernel
|
2016-11-19 22:15:33 +01:00 |
|
Cedric Nugteren
|
60fa2322ca
|
Added a proper half-precision reference for testing of xomatcopy
|
2016-11-17 22:20:16 +01:00 |
|
Cedric Nugteren
|
29aab3019e
|
Fixed a bug in the error margins; relaxed the error margins for half-precision
|
2016-11-17 22:19:36 +01:00 |
|
Cedric Nugteren
|
746d688e07
|
Updated the tuning results for the Intel Skylake ULT GT2 GPU
|
2016-11-15 22:42:04 +01:00 |
|
Cedric Nugteren
|
bb14a5880e
|
Added an example and documentation for the Netlib CBLAS API
|
2016-10-25 20:37:33 +02:00 |
|
Cedric Nugteren
|
8ae8ab06a2
|
Renamed the include and source files of the Netlib CBLAS API
|
2016-10-25 20:33:10 +02:00 |
|
Cedric Nugteren
|
140121ef91
|
Removed the clblast namespace from the Netlib C API source file to ensure proper linking
|
2016-10-25 20:21:50 +02:00 |
|
Cedric Nugteren
|
729862e873
|
Fixed some issues preventing the Netlib CBLAS API from linking correctly
|
2016-10-25 19:56:42 +02:00 |
|
Cedric Nugteren
|
926aca53a0
|
Made the Netlib CBLAS API use the same enums with prefixes as the regular C API of CLBlast
|
2016-10-25 19:45:57 +02:00 |
|
Cedric Nugteren
|
59183b7d79
|
Sets the proper sizes for the buffers for the Netlib CBLAS API
|
2016-10-25 19:21:49 +02:00 |
|
Cedric Nugteren
|
f96fd372bc
|
Added initial version of a Netlib CBLAS implementation. TODO: Set correct buffer sizes
|
2016-10-25 14:28:52 +02:00 |
|
Cedric Nugteren
|
3b65eace0a
|
Merge branch 'development' into netlib_blas_api
Conflicts:
scripts/generator/generator.py
scripts/generator/generator/routine.py
|
2016-10-25 09:34:24 +02:00 |
|
Cedric Nugteren
|
0f5bf35ebe
|
Updated list of acknowledgments and thanks
|
2016-10-24 19:54:45 +02:00 |
|
Cedric Nugteren
|
ec687afa75
|
Added tuning results for GeForce GTX TITAN Black
|
2016-10-24 19:49:10 +02:00 |
|
Cedric Nugteren
|
76d5d2ccfc
|
Fixed a bug in the transpose-matrix function
|
2016-10-23 20:49:55 +02:00 |
|
Cedric Nugteren
|
43f4f02399
|
Added an initial version of contributing guidelines
|
2016-10-23 16:56:51 +02:00 |
|
Cedric Nugteren
|
b8d4a9b9d0
|
Removed PUBLIC_API from the C++ exception classes
|
2016-10-23 16:09:59 +02:00 |
|
Cedric Nugteren
|
66f5c9d9b8
|
Added a fix for compilation under Visual Studio 2013 related to the new exception classes
|
2016-10-23 15:55:03 +02:00 |
|
Cedric Nugteren
|
fda39ffd86
|
Fixed the CMakeLists.txt for Visual Studio compilation
|
2016-10-23 14:34:46 +02:00 |
|
Cedric Nugteren
|
de0420dffa
|
Minor clean-up of the CMakeLists file
|
2016-10-22 16:38:42 +02:00 |
|
Cedric Nugteren
|
c925fe463f
|
Added tuning results for the AMD Tonga GPU
|
2016-10-22 16:25:31 +02:00 |
|
Cedric Nugteren
|
a670c4c4bf
|
All enums in the C API are now prefixed with CLBlast to avoid potential name clashes with other projects
|
2016-10-22 16:14:56 +02:00 |
|
Cedric Nugteren
|
4a5516aa78
|
Added extra error codes to reflect the more detailed error reporting of OpenCL functions
|
2016-10-22 15:46:29 +02:00 |
|
Cedric Nugteren
|
b0ff11acf0
|
Moved files around a bit; created a utilities subfolder
|
2016-10-22 15:36:48 +02:00 |
|
Cedric Nugteren
|
9afbbc9ef9
|
Added documentation for the better exception handling
|
2016-10-22 15:23:18 +02:00 |
|