Commit graph

240 commits

Author SHA1 Message Date
Cedric Nugteren 6e5f558746 Made event an optional argument in the CLBlast C++ API 2016-03-30 16:13:26 -07:00
Cedric Nugteren 6f561abada Added missing newline to the end of the public API file 2016-03-30 16:13:22 -07:00
Cedric Nugteren 2429ad5025 Fixed properly passing of OpenCL events to CLBlast functions 2016-03-30 16:12:53 -07:00
Cedric Nugteren 49822c8ead Fixed the C-api export to be able to properly build a DLL on Windows 2016-03-23 20:49:28 +01:00
Cedric Nugteren 706c6987c6 Fixed compilation of the two SGEMM samples 2016-03-23 20:31:25 +01:00
Cedric Nugteren d935695417 Added __declspec(dllexport) to create a DLL on Windows 2016-03-19 11:09:09 +01:00
Cedric Nugteren 918797735d Made the library thread-safe by guarding the kernel cache with a mutex 2016-03-14 22:55:22 +01:00
Cedric Nugteren fda335ddf2 Prepared the changelog for the next release 2016-03-13 11:09:02 +01:00
Cedric Nugteren bf4bd072e2 Updated to version 0.6.0 2016-03-13 11:02:40 +01:00
Cedric Nugteren dd74450a83 Updated Travis to reflect the changes in the Khronos website 2016-03-13 10:55:16 +01:00
Cedric Nugteren de7e68e872 Updated the README file 2016-03-13 10:48:42 +01:00
Cedric Nugteren e6acf13296 Updated Travis script to take into account the missing OpenCL packages 2016-03-13 10:47:53 +01:00
Cedric Nugteren 99d309598d Updated Travis script to fix the fglrx=2:8.960-0ubuntu1 issue 2016-03-13 10:21:33 +01:00
Cedric Nugteren 88c551cdea Added tuning results for the newest xGER family kernels 2016-03-12 16:23:58 +01:00
Cedric Nugteren 801218ba10 Added performance graphs for Intel Iris and Radeon M370X 2016-03-12 16:04:23 +01:00
Cedric Nugteren 83c6a51765 Added tuning results for the ARM Mali-T628 GPU 2016-03-12 15:10:35 +01:00
Cedric Nugteren f4c09220c1 Fixed a bug in the GER-family of routines due to incorrect division of the workgroup size 2016-03-06 16:43:28 +01:00
Cedric Nugteren fb58129afb Made testing against clBLAS in the client binaries truely optional (was partly implemented before) 2016-03-06 16:34:26 +01:00
Cedric Nugteren 7468e2ba9d Adjusted the correctness-test error margins 2016-03-06 16:32:38 +01:00
Cedric Nugteren c93cd2fc2d Merge branch 'rank2_update_routines' into development 2016-03-06 15:48:51 +01:00
Cedric Nugteren 306bf67660 Added preliminary support for xHPR2 and xSPR2 routines 2016-03-06 15:48:11 +01:00
Cedric Nugteren 60da54da5d Added preliminary support for xHER2 and xSYR2 routines 2016-03-02 21:18:01 +01:00
Cedric Nugteren fa79720557 Added tuning results for Intel Iris Pro and AMD R9 M370X 2016-02-28 16:47:52 +01:00
Cedric Nugteren 3c27edb087 Updated the changelog with newly supported level-2 routines 2016-02-28 16:37:49 +01:00
Cedric Nugteren 610a31283b Merge branch 'ger_routines' into development 2016-02-28 16:31:31 +01:00
Cedric Nugteren 4a56822dcc Fixed a couple of correctness bugs in the Xher kernels 2016-02-28 15:49:59 +01:00
Cedric Nugteren e3545215a5 Added support for xHER, xHPR, xSYR, and xSPR routines 2016-02-28 14:16:48 +01:00
Cedric Nugteren cef78c7356 Fixed a compilation issue under AppleClang 2016-02-28 14:14:50 +01:00
Cedric Nugteren 9f682aa66b Set a proper default precision for the CLBlast clients 2016-02-20 14:41:53 +01:00
Cedric Nugteren 6dc44da07b Added support for xGERU and xGERC routines 2016-02-20 14:15:41 +01:00
Cedric Nugteren 8854a73127 Added XGER routine, kernel, and tuner 2016-02-20 12:40:01 +01:00
Cedric Nugteren c457a70aa1 Updated the changelog 2016-02-10 21:32:09 +01:00
CNugteren fadd76207f Fixed warnings under MSVC 2016-02-08 20:44:05 +01:00
Cedric Nugteren bf84463ab2 Separated the GEMM kernel in two parts to reduce string length for MSVC 2016-02-08 20:06:02 +01:00
Cedric Nugteren 38c56bbde2 Split-up the XGEMV kernel in two parts 2016-02-08 19:43:34 +01:00
Cedric Nugteren 6f4b34f813 Added tuning parameters for various devices using the new database script 2016-02-07 16:41:09 +01:00
Cedric Nugteren 165a94c200 Various fixes to the database script 2016-02-07 16:39:37 +01:00
Cedric Nugteren 00be6f7530 Added dictionary with short and long OpenCL vendor names to fix issues with Intel having multiple names 2016-02-07 11:59:30 +01:00
Cedric Nugteren c76f1d9dbb Made the tuning database an optional external download 2016-02-07 10:59:51 +01:00
CNugteren 704a729f5c Made the database script compatible with Python 3 2016-02-06 13:11:36 +01:00
CNugteren b7900652b2 Reduced the maximum workgroup-size for GEMV kernels further 2016-02-06 13:07:19 +01:00
Cedric Nugteren bb985f010b Changed the order of tuners in the alltuners target 2016-02-06 12:48:42 +01:00
CNugteren 40346bb3a5 Reduced unrolling factor in xgemv kernel to reduce compilation times 2016-02-06 12:09:21 +01:00
CNugteren fbf071ba62 Fixed a linker error in the performance client under GCC 2016-02-06 10:53:44 +01:00
CNugteren 9622d3be22 Fixes for compilation under Visual Studio 2016-01-30 14:57:49 +01:00
Cedric Nugteren 44fb40e5c4 Prepared for MSVC support 2016-01-30 11:54:29 +01:00
Cedric Nugteren f573fe6bb3 Fixed a bug in the graph scripts (thanks to Victor Pakhomov) 2016-01-30 11:53:54 +01:00
Cedric Nugteren 310d05d187 Updated to version 4.0 of the CLCudaAPI header 2016-01-30 11:52:21 +01:00
Cedric Nugteren deb58709c8 Merge branch 'tuning_database' into development 2016-01-30 11:46:45 +01:00
Cedric Nugteren 276e772a2c Added first auto-generated database headers from the Python database; only K40 and Iris supported now 2016-01-30 11:43:21 +01:00