Commit graph

97 commits

Author SHA1 Message Date
Cedric Nugteren 7a3b695db7 Added half precision tuning results for supporting kernels (pad, copy, transpose, padtranspose) 2016-05-16 12:45:10 +02:00
Cedric Nugteren 4b6bdd83a2 Added header with conversions from and to half-precision floating-point 2016-05-15 20:13:57 +02:00
Cedric Nugteren 5e1b2e021f Set kernel arguments for AXPY as constant memory buffers, making it possible to transfer half-precision values as well 2016-05-14 18:06:00 +02:00
Cedric Nugteren 120c31a30f Initial experimental version of the half-precision HAXPY routine 2016-05-13 20:49:34 +02:00
Cedric Nugteren f2ba75890c Initial changes in preparation for half-precision fp16 support 2016-05-12 19:56:21 +02:00
Cedric Nugteren 435729a43e Added tuning results for AMD Hawaii (R9 290X) 2016-05-02 20:20:23 +02:00
Cedric Nugteren 27d0ac7f38 Added tuning results for AMD Pitcairn (R9 270X) 2016-05-01 19:33:50 +02:00
Cedric Nugteren c94b628318 Updated tuning database for reduction/dot kernels based on the new tuner; partially repopulated the database 2016-05-01 19:17:04 +02:00
Cedric Nugteren bee2f943ec Changed the index buffer of IxAMAX routines to unsigned int for proper buffersize checking 2016-05-01 14:03:37 +02:00
Cedric Nugteren 9602c150aa Added a program cache (per-context) next to the per-device binary cache 2016-05-01 12:56:08 +02:00
Cedric Nugteren e113ff0852 Added non-absolute minimum counterpart IxMIN of the BLAS routine IxAMAX 2016-04-30 09:49:39 +02:00
Cedric Nugteren d9b21d7f49 Fixed the cache to store binaries instead of OpenCL programs 2016-04-28 21:14:17 +02:00
Cedric Nugteren d7ddbdeb1f Added non-absolute counterparts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX 2016-04-27 18:07:30 +02:00
Cedric Nugteren 82be8f211c Moved all cache-related functions to a separate file; added a ClearCompiledProgramCache function to clear the cache 2016-04-27 16:02:13 +02:00
Cedric Nugteren 226e834d0a Added a '-verbose' option to the test binaries to report errors in more detail if needed 2016-04-27 14:38:30 +02:00
cnugteren 16a048f1ac Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines 2016-04-20 22:12:51 -06:00
cnugteren 8be99de82d Added support for the SASUM/DASUM/ScASUM/DzASUM routines 2016-04-14 19:58:26 -06:00
cnugteren a61724ece5 Fixed the way the defaults are calculated in the database; added warning for non-matching tuner arguments 2016-04-11 22:27:44 -06:00
cnugteren 1d3d38a261 Events are now properly implemented using an event waiting list, and the user is asked to wait for event completion 2016-04-09 22:22:24 -06:00
cnugteren 1a82861a90 Added support for testing (performance and correctness) against a CPU BLAS library 2016-04-02 11:58:00 -07:00
cnugteren 8c3c6db7d0 Merge branch 'level1_routines' into development 2016-03-30 21:37:56 -07:00
Cedric Nugteren 6e5f558746 Made event an optional argument in the CLBlast C++ API 2016-03-30 16:13:26 -07:00
Cedric Nugteren 6f561abada Added missing newline to the end of the public API file 2016-03-30 16:13:22 -07:00
Cedric Nugteren 2429ad5025 Fixed the passing of OpenCL events to CLBlast functions 2016-03-30 16:12:53 -07:00
Cedric Nugteren aaa687ca98 Added preliminary support for the xNRM2 routines 2016-03-28 23:00:44 +02:00
Cedric Nugteren 49822c8ead Fixed the C-api export to be able to properly build a DLL on Windows 2016-03-23 20:49:28 +01:00
Cedric Nugteren d935695417 Added __declspec(dllexport) to create a DLL on Windows 2016-03-19 11:09:09 +01:00
Cedric Nugteren 918797735d Made the library thread-safe by guarding the kernel cache with a mutex 2016-03-14 22:55:22 +01:00
Cedric Nugteren 88c551cdea Added tuning results for the newest xGER family kernels 2016-03-12 16:23:58 +01:00
Cedric Nugteren 83c6a51765 Added tuning results for the ARM Mali-T628 GPU 2016-03-12 15:10:35 +01:00
Cedric Nugteren 306bf67660 Added preliminary support for xHPR2 and xSPR2 routines 2016-03-06 15:48:11 +01:00
Cedric Nugteren 60da54da5d Added preliminary support for xHER2 and xSYR2 routines 2016-03-02 21:18:01 +01:00
Cedric Nugteren fa79720557 Added tuning results for Intel Iris Pro and AMD R9 M370X 2016-02-28 16:47:52 +01:00
Cedric Nugteren e3545215a5 Added support for xHER, xHPR, xSYR, and xSPR routines 2016-02-28 14:16:48 +01:00
Cedric Nugteren cef78c7356 Fixed a compilation issue under AppleClang 2016-02-28 14:14:50 +01:00
Cedric Nugteren 9f682aa66b Set a proper default precision for the CLBlast clients 2016-02-20 14:41:53 +01:00
Cedric Nugteren 6dc44da07b Added support for xGERU and xGERC routines 2016-02-20 14:15:41 +01:00
Cedric Nugteren 8854a73127 Added XGER routine, kernel, and tuner 2016-02-20 12:40:01 +01:00
Cedric Nugteren 6f4b34f813 Added tuning parameters for various devices using the new database script 2016-02-07 16:41:09 +01:00
Cedric Nugteren 00be6f7530 Added dictionary with short and long OpenCL vendor names to fix issues with Intel having multiple names 2016-02-07 11:59:30 +01:00
CNugteren fbf071ba62 Fixed a linker error in the performance client under GCC 2016-02-06 10:53:44 +01:00
Cedric Nugteren 310d05d187 Updated to version 4.0 of the CLCudaAPI header 2016-01-30 11:52:21 +01:00
Cedric Nugteren 276e772a2c Added first auto-generated database headers from the Python database; only K40 and Iris supported now 2016-01-30 11:43:21 +01:00
CNugteren 9bf6be8426 Added alpha and beta to tuner meta-data 2015-10-23 11:01:44 +02:00
CNugteren f74c9a5640 Routine names are now all default arguments defined in the header 2015-10-12 08:35:58 +02:00
CNugteren 2b56c2c603 Added TRMV/TBMV/TPMV routines 2015-09-26 16:58:03 +02:00
CNugteren 04d28b0420 Made buffer copying a const-method for the source 2015-09-26 16:48:11 +02:00
CNugteren de6547a92b Added SBMV and SPMV routines 2015-09-19 18:01:19 +02:00
CNugteren 80da67d28b Added the HPMV routine 2015-09-19 17:40:38 +02:00
CNugteren c32c4a9739 Added infrastructure for packed matrices 2015-09-19 17:37:42 +02:00