Commit graph

117 commits

Author SHA1 Message Date
Cedric Nugteren 7b8f8fce68 Added initial naive version of the batched GEMM routine based on the direct GEMM kernel 2017-03-11 16:02:45 +01:00
Cedric Nugteren d754586b49 Added proper testing of the alpha parameter; finalized the batched AXPY implementation 2017-03-10 20:49:59 +01:00
Cedric Nugteren d6f1b5fca3 Added L2 error computation and checking for half-precision tests 2017-02-27 21:49:20 +01:00
Cedric Nugteren 00281dad26 Fixed half-precision bugs in HTBMV/HTPMV/HTRMV/HSYR2K/HTRMM related to incorrect constants 2017-02-27 21:00:04 +01:00
Cedric Nugteren ea6790665d Merge branch 'development' into triangular_solvers 2017-02-26 14:51:45 +01:00
Cedric Nugteren ccac957f17 Added documentation for the TRSV and TRSM routines 2017-02-25 13:02:15 +01:00
Cedric Nugteren fef11a208c Added documentation for the OverrideParameters function 2017-02-18 11:02:57 +01:00
Cedric Nugteren fd471e380c Updated the changelog for PR131 and PR132 2017-01-24 20:34:09 +01:00
Cedric Nugteren ff2bf985a3 Updated the link to cl.hpp in the Khronos registry for the samples 2017-01-07 13:57:23 +01:00
Cedric Nugteren 69ca271a8c Always enables cl_khr_fp64 when running double-precision, not just for OpenCL 1.1 or lower 2017-01-07 13:31:29 +01:00
Cedric Nugteren 32b850b12b Added tuning results for the AMD Turks GPU and the Intel Core i7-2670QM CPU 2017-01-03 20:30:56 +01:00
Cedric Nugteren 6b533dda1c Fixed a bug when using offsets in the direct GEMM kernels 2016-12-18 11:54:32 +01:00
Cedric Nugteren 2cf7d8429a Updated to version 0.10.0 2016-11-27 13:34:18 +01:00
Cedric Nugteren 39c49bf4f9 Made it possible to use the command-line environmental variables for each executable and without re-running CMake 2016-11-27 11:00:29 +01:00
Cedric Nugteren cb398f0e42 Merge pull request #125 from CNugteren/netlib_blas_api
Netlib CBLAS API for CLBlast
2016-11-24 19:35:59 +01:00
Cedric Nugteren 2f0697564f Fixed a bug in the TRMM routine caused by overwriting input data before consuming everything 2016-11-20 15:05:42 +01:00
Cedric Nugteren bb14a5880e Added an example and documentation for the Netlib CBLAS API 2016-10-25 20:37:33 +02:00
Cedric Nugteren a670c4c4bf All enums in the C API are now prefixed with CLBlast to avoid potential name clashes with other projects 2016-10-22 16:14:56 +02:00
Cedric Nugteren 9afbbc9ef9 Added documentation for the better exception handling 2016-10-22 15:23:18 +02:00
Cedric Nugteren db17b1fbe9 Fixed a bug in the SYRK/SYR2K/HERK/HER2K routines that would occur with specific tuning parameters 2016-10-22 10:41:02 +02:00
Cedric Nugteren 53deed298f Added documentation and minor refactoring for the recent support of static library compilation 2016-10-15 17:11:08 +02:00
Cedric Nugteren ebb505b783 Added tuning results for Intel HD Graphics IvyBridge GPU 2016-10-13 12:18:28 +02:00
Cedric Nugteren 8a9d3cdf37 Added support for compiling the library, the client, and the samples under MSVC 2013 2016-10-10 22:45:39 +02:00
Cedric Nugteren b698e45478 Added first tuning results for the single-kernel direct GEMM implementation 2016-10-06 21:13:14 +02:00
Cedric Nugteren d59e5c570b Added an option to run tuned kernels multiple times to average execution times; requires CLTune 2.5.0 2016-09-27 21:03:24 +02:00
Cedric Nugteren db5772e521 Updated to version 8.0 of the CLCudaAPI header 2016-09-27 20:56:49 +02:00
Cedric Nugteren e3076d26cc Added more relaxed error checking for the half-precision tests 2016-09-27 19:42:58 +02:00
Cedric Nugteren d595a8ed7e Fixed a bug waiting for an invalid event in case of a non-succesfull CLBlast call in the tests and samples 2016-09-22 20:47:22 +02:00
Cedric Nugteren b1929d8ce7 It is now possible to set the OpenCL compiler options through an environmental variable 2016-09-21 21:22:16 +02:00
Cedric Nugteren 4b94afda94 Updated to version 0.9.0 2016-09-13 19:20:39 +02:00
Cedric Nugteren b30b26b89e The GEMM kernel no longer adds beta*C in case beta is zero; this would cause problems if C contains NaNs 2016-09-04 17:21:16 +02:00
Cedric Nugteren 8d6a6a5bbf Merge branch 'database_defaults' into development 2016-08-22 19:31:36 +02:00
Cedric Nugteren 00979faab4 Updated the changelog; refactored the database-get-bests code a bit 2016-08-21 20:16:06 +02:00
Cedric Nugteren 7eeef74338 Merge branch 'development' of github.com:CNugteren/CLBlast into development
Conflicts:
	README.md
2016-08-20 12:59:21 +02:00
Cedric Nugteren 6eca53ee23 Merge branch 'master' of https://github.com/dvasschemacq/CLBlast into dvasschemacq-master
Conflicts:
	src/kernels/level1/xaxpy.opencl
	src/kernels/level2/xgemv.opencl
	src/kernels/level2/xgemv_fast.opencl
	src/kernels/level2/xger.opencl
	src/kernels/level2/xher.opencl
	src/kernels/level2/xher2.opencl
	src/kernels/level3/xgemm_part2.opencl
2016-08-20 12:50:31 +02:00
Cedric Nugteren 35623cd98d Minor update regarding the previous CMake export/install target changes 2016-07-28 20:45:09 +02:00
Cedric Nugteren 40a72259eb Fixe a bug in the new XgemvFastRot kernel related to local memory size 2016-07-23 16:58:11 +02:00
Cedric Nugteren c87e877bf2 Now passing alpha/beta to the kernel as arguments as before fp16 support; in case of fp16 arguments are cast on host and in kernel 2016-07-10 20:32:01 +02:00
Cedric Nugteren 9caa7ca5b9 Cache now compares cl_context instead of a pointer to a context; added verbose print statements to the cache 2016-07-08 20:57:58 +02:00
Cedric Nugteren 77325b8974 Added an option to the performance clients to do a warm-up run before timing 2016-07-06 21:25:55 +02:00
Cedric Nugteren 9683b50c55 Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp) 2016-07-03 20:30:47 +02:00
Cedric Nugteren 7cf2f8c268 Fixed some memory leaks related to events not properly cleaned-up 2016-07-02 15:34:55 +02:00
Cedric Nugteren b330ab0866 Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library 2016-06-30 10:49:17 +02:00
Cedric Nugteren cd74aaac52 Updated to version 6.0 of the CLCudaAPI header 2016-06-29 19:42:49 +02:00
Cedric Nugteren 56483347e8 Prepared the changelog for the next release 2016-06-28 22:33:13 +02:00
Cedric Nugteren 577f0ee117 Updated to version 0.8.0 2016-06-28 21:32:00 +02:00
Cedric Nugteren 7eeb790824 Added Appveyor Windows CI support 2016-06-27 12:47:39 +02:00
Cedric Nugteren 61203453aa Renamed all C++ source files to .cpp to match the .hpp extension better 2016-06-19 13:55:49 +02:00
Cedric Nugteren 52ccaf5b25 Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing 2016-06-16 18:07:46 +02:00
Cedric Nugteren 995a528cec Improved API documentation and added documentation for level-2 and level-3 routines 2016-06-13 20:17:26 +02:00