Commit graph

407 commits

Author SHA1 Message Date
cnugteren c2cfee76c4 Properly set warning flags for Clang 2016-04-04 08:39:13 -07:00
cnugteren 90e237b97a Removed redundant queue synchronisation statements 2016-04-04 08:38:31 -07:00
cnugteren 2981ca4d3c Merge branch 'cpu_blas' into development 2016-04-03 16:08:48 -07:00
cnugteren c4ab9bda63 Updated the documentation in light of the support for a reference CPU BLAS library 2016-04-03 16:07:25 -07:00
cnugteren cf841d1840 Added support for detection of CPU BLAS libraries OpenBLAS, BLIS and Accelerate on OS X 2016-04-03 15:51:03 -07:00
cnugteren 1a82861a90 Added support for testing (performance and correctness) against a CPU BLAS library 2016-04-02 11:58:00 -07:00
cnugteren 5c83217cf2 Added a wrapper for CBLAS libraries for performance/correctness testing 2016-04-01 22:36:39 -07:00
cnugteren a2056f2216 Create a first version of CPU BLAS detection in CMake 2016-03-31 22:22:29 -07:00
cnugteren 8217b01702 Updated the documentation 2016-03-31 20:20:32 -07:00
cnugteren 8c3c6db7d0 Merge branch 'level1_routines' into development 2016-03-30 21:37:56 -07:00
cnugteren 5409f349a1 Fixed the nrm2 kernel for complex data-types 2016-03-30 21:32:04 -07:00
cnugteren 6578102ae9 CMake now downloads the cl.hpp header from the Khronos website when building the samples 2016-03-30 16:24:38 -07:00
Cedric Nugteren c1df786764 Added prototypes for the xROTM and xROTMG routines 2016-03-30 16:13:37 -07:00
Cedric Nugteren 6ecc0d089c Added prototypes for the xROT and xROTG functions 2016-03-30 16:13:32 -07:00
Cedric Nugteren 6e5f558746 Made event an optional argument in the CLBlast C++ API 2016-03-30 16:13:26 -07:00
Cedric Nugteren 6f561abada Added missing newline to the end of the public API file 2016-03-30 16:13:22 -07:00
Cedric Nugteren 2429ad5025 Fixed properly passing of OpenCL events to CLBlast functions 2016-03-30 16:12:53 -07:00
Cedric Nugteren aaa687ca98 Added preliminary support for the xNRM2 routines 2016-03-28 23:00:44 +02:00
Cedric Nugteren 1d5a702d9d Added prototypes for ScNRM2/DzNRM2 routines 2016-03-25 10:30:38 +01:00
Cedric Nugteren 3876096c30 Added prototypes for SNRM2/DNRM2 routines 2016-03-25 10:00:40 +01:00
Cedric Nugteren 49822c8ead Fixed the C-api export to be able to properly build a DLL on Windows 2016-03-23 20:49:28 +01:00
Cedric Nugteren 706c6987c6 Fixed compilation of the two SGEMM samples 2016-03-23 20:31:25 +01:00
Cedric Nugteren d935695417 Added __declspec(dllexport) to create a DLL on Windows 2016-03-19 11:09:09 +01:00
Cedric Nugteren 918797735d Made the library thread-safe by guarding the kernel cache with a mutex 2016-03-14 22:55:22 +01:00
Cedric Nugteren fda335ddf2 Prepared the changelog for the next release 2016-03-13 11:09:02 +01:00
Cedric Nugteren bf4bd072e2 Updated to version 0.6.0 2016-03-13 11:02:40 +01:00
Cedric Nugteren dd74450a83 Updated Travis to reflect the changes in the Khronos website 2016-03-13 10:55:16 +01:00
Cedric Nugteren de7e68e872 Updated the README file 2016-03-13 10:48:42 +01:00
Cedric Nugteren e6acf13296 Updated Travis script to take into account the missing OpenCL packages 2016-03-13 10:47:53 +01:00
Cedric Nugteren 99d309598d Updated Travis script to fix the fglrx=2:8.960-0ubuntu1 issue 2016-03-13 10:21:33 +01:00
Cedric Nugteren 88c551cdea Added tuning results for the newest xGER family kernels 2016-03-12 16:23:58 +01:00
Cedric Nugteren 801218ba10 Added performance graphs for Intel Iris and Radeon M370X 2016-03-12 16:04:23 +01:00
Cedric Nugteren 83c6a51765 Added tuning results for the ARM Mali-T628 GPU 2016-03-12 15:10:35 +01:00
Cedric Nugteren f4c09220c1 Fixed a bug in the GER-family of routines due to incorrect division of the workgroup size 2016-03-06 16:43:28 +01:00
Cedric Nugteren fb58129afb Made testing against clBLAS in the client binaries truly optional (was partly implemented before) 2016-03-06 16:34:26 +01:00
Cedric Nugteren 7468e2ba9d Adjusted the correctness-test error margins 2016-03-06 16:32:38 +01:00
Cedric Nugteren c93cd2fc2d Merge branch 'rank2_update_routines' into development 2016-03-06 15:48:51 +01:00
Cedric Nugteren 306bf67660 Added preliminary support for xHPR2 and xSPR2 routines 2016-03-06 15:48:11 +01:00
Cedric Nugteren 60da54da5d Added preliminary support for xHER2 and xSYR2 routines 2016-03-02 21:18:01 +01:00
Cedric Nugteren fa79720557 Added tuning results for Intel Iris Pro and AMD R9 M370X 2016-02-28 16:47:52 +01:00
Cedric Nugteren 3c27edb087 Updated the changelog with newly supported level-2 routines 2016-02-28 16:37:49 +01:00
Cedric Nugteren 610a31283b Merge branch 'ger_routines' into development 2016-02-28 16:31:31 +01:00
Cedric Nugteren 4a56822dcc Fixed a couple of correctness bugs in the Xher kernels 2016-02-28 15:49:59 +01:00
Cedric Nugteren e3545215a5 Added support for xHER, xHPR, xSYR, and xSPR routines 2016-02-28 14:16:48 +01:00
Cedric Nugteren cef78c7356 Fixed a compilation issue under AppleClang 2016-02-28 14:14:50 +01:00
Cedric Nugteren 9f682aa66b Set a proper default precision for the CLBlast clients 2016-02-20 14:41:53 +01:00
Cedric Nugteren 6dc44da07b Added support for xGERU and xGERC routines 2016-02-20 14:15:41 +01:00
Cedric Nugteren 8854a73127 Added XGER routine, kernel, and tuner 2016-02-20 12:40:01 +01:00
Cedric Nugteren c457a70aa1 Updated the changelog 2016-02-10 21:32:09 +01:00
CNugteren fadd76207f Fixed warnings under MSVC 2016-02-08 20:44:05 +01:00