cnugteren
|
5c83217cf2
|
Added a wrapper for CBLAS libraries for performance/correctness testing
|
2016-04-01 22:36:39 -07:00 |
|
cnugteren
|
a2056f2216
|
Create a first version of CPU BLAS detection in CMake
|
2016-03-31 22:22:29 -07:00 |
|
cnugteren
|
8217b01702
|
Updated the documentation
|
2016-03-31 20:20:32 -07:00 |
|
cnugteren
|
8c3c6db7d0
|
Merge branch 'level1_routines' into development
|
2016-03-30 21:37:56 -07:00 |
|
cnugteren
|
5409f349a1
|
Fixed the nrm2 kernel for complex data-types
|
2016-03-30 21:32:04 -07:00 |
|
cnugteren
|
6578102ae9
|
CMake now downloads the cl.hpp header from the Khronos website when building the samples
|
2016-03-30 16:24:38 -07:00 |
|
Cedric Nugteren
|
c1df786764
|
Added prototypes for the xROTM and xROTMG routines
|
2016-03-30 16:13:37 -07:00 |
|
Cedric Nugteren
|
6ecc0d089c
|
Added prototypes for the xROT and xROTG functions
|
2016-03-30 16:13:32 -07:00 |
|
Cedric Nugteren
|
6e5f558746
|
Made event an optional argument in the CLBlast C++ API
|
2016-03-30 16:13:26 -07:00 |
|
Cedric Nugteren
|
6f561abada
|
Added missing newline to the end of the public API file
|
2016-03-30 16:13:22 -07:00 |
|
Cedric Nugteren
|
2429ad5025
|
Fixed the passing of OpenCL events to CLBlast functions
|
2016-03-30 16:12:53 -07:00 |
|
Cedric Nugteren
|
aaa687ca98
|
Added preliminary support for the xNRM2 routines
|
2016-03-28 23:00:44 +02:00 |
|
Cedric Nugteren
|
1d5a702d9d
|
Added prototypes for ScNRM2/DzNRM2 routines
|
2016-03-25 10:30:38 +01:00 |
|
Cedric Nugteren
|
3876096c30
|
Added prototypes for SNRM2/DNRM2 routines
|
2016-03-25 10:00:40 +01:00 |
|
Cedric Nugteren
|
49822c8ead
|
Fixed the C-api export to be able to properly build a DLL on Windows
|
2016-03-23 20:49:28 +01:00 |
|
Cedric Nugteren
|
706c6987c6
|
Fixed compilation of the two SGEMM samples
|
2016-03-23 20:31:25 +01:00 |
|
Cedric Nugteren
|
d935695417
|
Added __declspec(dllexport) to create a DLL on Windows
|
2016-03-19 11:09:09 +01:00 |
|
Cedric Nugteren
|
918797735d
|
Made the library thread-safe by guarding the kernel cache with a mutex
|
2016-03-14 22:55:22 +01:00 |
|
Cedric Nugteren
|
fda335ddf2
|
Prepared the changelog for the next release
|
2016-03-13 11:09:02 +01:00 |
|
Cedric Nugteren
|
bf4bd072e2
|
Updated to version 0.6.0
|
2016-03-13 11:02:40 +01:00 |
|
Cedric Nugteren
|
dd74450a83
|
Updated Travis to reflect the changes in the Khronos website
|
2016-03-13 10:55:16 +01:00 |
|
Cedric Nugteren
|
de7e68e872
|
Updated the README file
|
2016-03-13 10:48:42 +01:00 |
|
Cedric Nugteren
|
e6acf13296
|
Updated Travis script to take into account the missing OpenCL packages
|
2016-03-13 10:47:53 +01:00 |
|
Cedric Nugteren
|
99d309598d
|
Updated Travis script to fix the fglrx=2:8.960-0ubuntu1 issue
|
2016-03-13 10:21:33 +01:00 |
|
Cedric Nugteren
|
88c551cdea
|
Added tuning results for the newest xGER family kernels
|
2016-03-12 16:23:58 +01:00 |
|
Cedric Nugteren
|
801218ba10
|
Added performance graphs for Intel Iris and Radeon M370X
|
2016-03-12 16:04:23 +01:00 |
|
Cedric Nugteren
|
83c6a51765
|
Added tuning results for the ARM Mali-T628 GPU
|
2016-03-12 15:10:35 +01:00 |
|
Cedric Nugteren
|
f4c09220c1
|
Fixed a bug in the GER family of routines due to incorrect division of the workgroup size
|
2016-03-06 16:43:28 +01:00 |
|
Cedric Nugteren
|
fb58129afb
|
Made testing against clBLAS in the client binaries truly optional (was partly implemented before)
|
2016-03-06 16:34:26 +01:00 |
|
Cedric Nugteren
|
7468e2ba9d
|
Adjusted the correctness-test error margins
|
2016-03-06 16:32:38 +01:00 |
|
Cedric Nugteren
|
c93cd2fc2d
|
Merge branch 'rank2_update_routines' into development
|
2016-03-06 15:48:51 +01:00 |
|
Cedric Nugteren
|
306bf67660
|
Added preliminary support for xHPR2 and xSPR2 routines
|
2016-03-06 15:48:11 +01:00 |
|
Cedric Nugteren
|
60da54da5d
|
Added preliminary support for xHER2 and xSYR2 routines
|
2016-03-02 21:18:01 +01:00 |
|
Cedric Nugteren
|
fa79720557
|
Added tuning results for Intel Iris Pro and AMD R9 M370X
|
2016-02-28 16:47:52 +01:00 |
|
Cedric Nugteren
|
3c27edb087
|
Updated the changelog with newly supported level-2 routines
|
2016-02-28 16:37:49 +01:00 |
|
Cedric Nugteren
|
610a31283b
|
Merge branch 'ger_routines' into development
|
2016-02-28 16:31:31 +01:00 |
|
Cedric Nugteren
|
4a56822dcc
|
Fixed a couple of correctness bugs in the Xher kernels
|
2016-02-28 15:49:59 +01:00 |
|
Cedric Nugteren
|
e3545215a5
|
Added support for xHER, xHPR, xSYR, and xSPR routines
|
2016-02-28 14:16:48 +01:00 |
|
Cedric Nugteren
|
cef78c7356
|
Fixed a compilation issue under AppleClang
|
2016-02-28 14:14:50 +01:00 |
|
Cedric Nugteren
|
9f682aa66b
|
Set a proper default precision for the CLBlast clients
|
2016-02-20 14:41:53 +01:00 |
|
Cedric Nugteren
|
6dc44da07b
|
Added support for xGERU and xGERC routines
|
2016-02-20 14:15:41 +01:00 |
|
Cedric Nugteren
|
8854a73127
|
Added XGER routine, kernel, and tuner
|
2016-02-20 12:40:01 +01:00 |
|
Cedric Nugteren
|
c457a70aa1
|
Updated the changelog
|
2016-02-10 21:32:09 +01:00 |
|
CNugteren
|
fadd76207f
|
Fixed warnings under MSVC
|
2016-02-08 20:44:05 +01:00 |
|
Cedric Nugteren
|
bf84463ab2
|
Separated the GEMM kernel into two parts to reduce string length for MSVC
|
2016-02-08 20:06:02 +01:00 |
|
Cedric Nugteren
|
38c56bbde2
|
Split up the XGEMV kernel into two parts
|
2016-02-08 19:43:34 +01:00 |
|
Cedric Nugteren
|
6f4b34f813
|
Added tuning parameters for various devices using the new database script
|
2016-02-07 16:41:09 +01:00 |
|
Cedric Nugteren
|
165a94c200
|
Various fixes to the database script
|
2016-02-07 16:39:37 +01:00 |
|
Cedric Nugteren
|
00be6f7530
|
Added dictionary with short and long OpenCL vendor names to fix issues with Intel having multiple names
|
2016-02-07 11:59:30 +01:00 |
|
Cedric Nugteren
|
c76f1d9dbb
|
Made the tuning database an optional external download
|
2016-02-07 10:59:51 +01:00 |
|