Commit graph

741 commits

Author SHA1 Message Date
Cedric Nugteren a2c0a9c551 Set number of decimals for floating-point printing for error reporting 2017-01-20 11:13:44 +01:00
Cedric Nugteren 2e4f6e1609 Added tuning results for NVIDIA GTX 1080 and Intel Core i7-4790K 2017-01-19 19:42:31 +01:00
Cedric Nugteren df9a77d74d Added first version of the TRSM routine based on the diagonal invert kernel 2017-01-18 21:29:59 +01:00
Cedric Nugteren 4b3ffd9989 Added a first version of the diagonal block invert routine in preparation of TRSM 2017-01-15 17:30:00 +01:00
Cedric Nugteren 4a4be0c3a5 Prints additional information in verbose/debug mode 2017-01-15 17:17:40 +01:00
Cedric Nugteren ff2bf985a3 Updated the link to cl.hpp in the Khronos registry for the samples 2017-01-07 13:57:23 +01:00
Cedric Nugteren 69ca271a8c Always enables cl_khr_fp64 when running double-precision, not just for OpenCL 1.1 or lower 2017-01-07 13:31:29 +01:00
Cedric Nugteren 32b850b12b Added tuning results for the AMD Turks GPU and the Intel Core i7-2670QM CPU 2017-01-03 20:30:56 +01:00
Cedric Nugteren 681a465b35 Prepared for the addition of the TRSM triangular solver kernel 2016-12-18 12:30:16 +01:00
Cedric Nugteren 6b533dda1c Fixed a bug when using offsets in the direct GEMM kernels 2016-12-18 11:54:32 +01:00
Cedric Nugteren 26e0177431 Made Intel GPUs always use the indirect version of the GEMM kernel 2016-11-29 20:47:20 +01:00
Cedric Nugteren e52f9a9ff2 Merge pull request #127 from CNugteren/development (Update to version 0.10.0) 2016-11-27 15:59:21 +01:00
Cedric Nugteren 2cf7d8429a Updated to version 0.10.0 2016-11-27 13:34:18 +01:00
Cedric Nugteren 39c49bf4f9 Made it possible to use the command-line environmental variables for each executable and without re-running CMake 2016-11-27 11:00:29 +01:00
Cedric Nugteren 8cfcda52a8 Merge branch 'better_defaults' into development 2016-11-27 09:48:11 +01:00
Cedric Nugteren 080e1be684 Improved the default parameters for cases with non-common parameters across all devices 2016-11-26 16:38:17 +01:00
Cedric Nugteren cb398f0e42 Merge pull request #125 from CNugteren/netlib_blas_api (Netlib CBLAS API for CLBlast) 2016-11-24 19:35:59 +01:00
Cedric Nugteren 2ff3f77392 Made the Netlib SGEMM example also optionally compiled 2016-11-23 22:07:11 +01:00
Cedric Nugteren 792cc8359f Fixed a vector-size related bug in the CLBlast Netlib API 2016-11-23 22:00:20 +01:00
Cedric Nugteren fa42befcc1 Made compilation of the Netlib CBLAS API conditional 2016-11-23 21:33:35 +01:00
Cedric Nugteren 654b41bb2b Fixed a bug in the HSCAL routine 2016-11-23 21:29:16 +01:00
Cedric Nugteren 26ca071480 Minor changes to ensure full compatibility with the Netlib CBLAS API 2016-11-22 08:41:52 +01:00
Cedric Nugteren eefe0df435 Made functions with scalar-buffers as output properly return values 2016-11-20 21:36:57 +01:00
Cedric Nugteren 88ba1f4db9 Added performance results for the Skylake ULT GT2 GPU 2016-11-20 20:36:56 +01:00
Cedric Nugteren d8af24e388 Now correctly tests for validity of the B matrix in the TRMM routine 2016-11-20 16:27:54 +01:00
Cedric Nugteren 90eb8738c4 Forced OpenCL 1.1 compilation and disabled a deprecation warning 2016-11-20 16:27:02 +01:00
Cedric Nugteren 2f0697564f Fixed a bug in the TRMM routine caused by overwriting input data before consuming everything 2016-11-20 15:05:42 +01:00
Cedric Nugteren 4c9585a349 Generating FP16 performance graphs now uses FP32 as a reference for comparison 2016-11-19 22:21:07 +01:00
Cedric Nugteren 6eeb1180fd Changed the GEMM kernel selection parameters for Skylake GPUs to always favour the regular kernel 2016-11-19 22:15:33 +01:00
Cedric Nugteren 60fa2322ca Added a proper half-precision reference for testing of xomatcopy 2016-11-17 22:20:16 +01:00
Cedric Nugteren 29aab3019e Fixed a bug in the error margins; relaxed the error margins for half-precision 2016-11-17 22:19:36 +01:00
Cedric Nugteren 746d688e07 Updated the tuning results for the Intel Skylake ULT GT2 GPU 2016-11-15 22:42:04 +01:00
Cedric Nugteren bb14a5880e Added an example and documentation for the Netlib CBLAS API 2016-10-25 20:37:33 +02:00
Cedric Nugteren 8ae8ab06a2 Renamed the include and source files of the Netlib CBLAS API 2016-10-25 20:33:10 +02:00
Cedric Nugteren 140121ef91 Removed the clblast namespace from the Netlib C API source file to ensure proper linking 2016-10-25 20:21:50 +02:00
Cedric Nugteren 729862e873 Fixed some issues preventing the Netlib CBLAS API from linking correctly 2016-10-25 19:56:42 +02:00
Cedric Nugteren 926aca53a0 Made the Netlib CBLAS API use the same enums with prefixes as the regular C API of CLBlast 2016-10-25 19:45:57 +02:00
Cedric Nugteren 59183b7d79 Sets the proper sizes for the buffers for the Netlib CBLAS API 2016-10-25 19:21:49 +02:00
Cedric Nugteren f96fd372bc Added initial version of a Netlib CBLAS implementation. TODO: Set correct buffer sizes 2016-10-25 14:28:52 +02:00
Cedric Nugteren 3b65eace0a Merge branch 'development' into netlib_blas_api (conflicts: scripts/generator/generator.py, scripts/generator/generator/routine.py) 2016-10-25 09:34:24 +02:00
Cedric Nugteren 0f5bf35ebe Updated list of acknowledgments and thanks 2016-10-24 19:54:45 +02:00
Cedric Nugteren ec687afa75 Added tuning results for GeForce GTX TITAN Black 2016-10-24 19:49:10 +02:00
Cedric Nugteren 76d5d2ccfc Fixed a bug in the transpose-matrix function 2016-10-23 20:49:55 +02:00
Cedric Nugteren 43f4f02399 Added an initial version of contributing guidelines 2016-10-23 16:56:51 +02:00
Cedric Nugteren b8d4a9b9d0 Removed PUBLIC_API from the C++ exception classes 2016-10-23 16:09:59 +02:00
Cedric Nugteren 66f5c9d9b8 Added a fix for compilation under Visual Studio 2013 related to the new exception classes 2016-10-23 15:55:03 +02:00
Cedric Nugteren fda39ffd86 Fixed the CMakeLists.txt for Visual Studio compilation 2016-10-23 14:34:46 +02:00
Cedric Nugteren de0420dffa Minor clean-up of the CMakeLists file 2016-10-22 16:38:42 +02:00
Cedric Nugteren c925fe463f Added tuning results for the AMD Tonga GPU 2016-10-22 16:25:31 +02:00
Cedric Nugteren a670c4c4bf All enums in the C API are now prefixed with CLBlast to avoid potential name clashes with other projects 2016-10-22 16:14:56 +02:00