Commit graph

141 commits

Author SHA1 Message Date
Cedric Nugteren 65c492edf6 Added OpenCL properties printing to the diagnostics helper 2017-09-22 21:35:32 +02:00
Cedric Nugteren 0802e3d84c Added tuning results for Intel Core i7 6770HQ 2017-09-16 21:19:06 +02:00
Cedric Nugteren 4e317f5e85 Improved compilation time of the tuner database 2017-09-16 18:02:37 +02:00
Cedric Nugteren 0d13d814c2 Added architecture layer in the tuning database for better performance on unseen devices 2017-09-14 21:27:33 +02:00
Cedric Nugteren 28462aa050 Removed an assumption that the 'default' tuning parameters have to be stored last; this is no longer needed 2017-09-04 17:39:57 +02:00
Cedric Nugteren 161fd8514d Merge branch 'master' into im_to_col 2017-08-24 21:15:14 +02:00
Cedric Nugteren 4d9d03ba51 Completed im2col implementation 2017-08-24 21:11:12 +02:00
Cedric Nugteren da28cc5e93 Minor updates after merging in the PSO addition to the tuners 2017-08-21 20:14:02 +02:00
Cedric Nugteren eb896838b1 Updated to version 1.0.1 (bugfix release) 2017-08-08 20:35:49 +02:00
Cedric Nugteren 1155c068e9 Updated to version 1.0.0 2017-07-30 20:54:21 +02:00
Cedric Nugteren b7473f50df Added status badges for correctness tests; updated list of contributors; fixed minor typos 2017-07-24 20:14:47 +02:00
Cedric Nugteren 4cf516cfec Fixed an if-statement in the direct GEMM kernel causing a bug with specific sets of input parameters 2017-06-30 21:57:41 +02:00
Cedric Nugteren ce528a9d39 Fixed and suppresses several warnings for MSVC 2017-06-26 21:38:04 +02:00
Cedric Nugteren 615a7fdc81 Fixes some compilation issues related to the database structure change 2017-06-21 23:07:47 +02:00
Cedric Nugteren 33ed1e5a06 Added tuning results for GeForce GT 650M (thanks to bzcheeseman) 2017-06-01 22:52:08 +02:00
Cedric Nugteren f151e56daa Added the IxAMIN routines: absolute minimum version of IxAMAX 2017-05-12 20:01:33 -07:00
Cedric Nugteren 86e8df60f1 Fixed a bug in the TRSM routine; tests now pass 2017-05-12 17:43:56 -07:00
Cedric Nugteren 81d9ed3946 Removed the included performance reports; README now redirects to the new external website 2017-05-12 13:18:10 -07:00
Cedric Nugteren 71933c3411 Added tuning results for the AMD Radeon Fiji GPU 2017-05-11 22:53:52 -07:00
Cedric Nugteren 97955fc221 Minor naming fixes to the benchmark script 2017-05-11 22:12:16 -07:00
Cedric Nugteren e9d2a2f54c Updated to version 0.11.0 2017-05-02 20:29:59 +02:00
Cedric Nugteren e3bb58f602 Finalized support for performance testing against cuBLAS 2017-04-16 17:53:51 +02:00
Cedric Nugteren 300531b869 Updated the changelog with the Apple CPU override 2017-04-10 07:21:34 +02:00
Cedric Nugteren fa5c4b00b7 Replaced the R graph scripts with Python/Matplotlib benchmark scripts 2017-03-26 15:36:34 +02:00
Cedric Nugteren 7b8f8fce68 Added initial naive version of the batched GEMM routine based on the direct GEMM kernel 2017-03-11 16:02:45 +01:00
Cedric Nugteren d754586b49 Added proper testing of the alpha parameter; finalized the batched AXPY implementation 2017-03-10 20:49:59 +01:00
Cedric Nugteren d6f1b5fca3 Added L2 error computation and checking for half-precision tests 2017-02-27 21:49:20 +01:00
Cedric Nugteren 00281dad26 Fixed half-precision bugs in HTBMV/HTPMV/HTRMV/HSYR2K/HTRMM related to incorrect constants 2017-02-27 21:00:04 +01:00
Cedric Nugteren ea6790665d Merge branch 'development' into triangular_solvers 2017-02-26 14:51:45 +01:00
Cedric Nugteren ccac957f17 Added documentation for the TRSV and TRSM routines 2017-02-25 13:02:15 +01:00
Cedric Nugteren fef11a208c Added documentation for the OverrideParameters function 2017-02-18 11:02:57 +01:00
Cedric Nugteren fd471e380c Updated the changelog for PR131 and PR132 2017-01-24 20:34:09 +01:00
Cedric Nugteren ff2bf985a3 Updated the link to cl.hpp in the Khronos registry for the samples 2017-01-07 13:57:23 +01:00
Cedric Nugteren 69ca271a8c Always enables cl_khr_fp64 when running double-precision, not just for OpenCL 1.1 or lower 2017-01-07 13:31:29 +01:00
Cedric Nugteren 32b850b12b Added tuning results for the AMD Turks GPU and the Intel Core i7-2670QM CPU 2017-01-03 20:30:56 +01:00
Cedric Nugteren 6b533dda1c Fixed a bug when using offsets in the direct GEMM kernels 2016-12-18 11:54:32 +01:00
Cedric Nugteren 2cf7d8429a Updated to version 0.10.0 2016-11-27 13:34:18 +01:00
Cedric Nugteren 39c49bf4f9 Made it possible to use the command-line environmental variables for each executable and without re-running CMake 2016-11-27 11:00:29 +01:00
Cedric Nugteren cb398f0e42 Merge pull request #125 from CNugteren/netlib_blas_api (Netlib CBLAS API for CLBlast) 2016-11-24 19:35:59 +01:00
Cedric Nugteren 2f0697564f Fixed a bug in the TRMM routine caused by overwriting input data before consuming everything 2016-11-20 15:05:42 +01:00
Cedric Nugteren bb14a5880e Added an example and documentation for the Netlib CBLAS API 2016-10-25 20:37:33 +02:00
Cedric Nugteren a670c4c4bf All enums in the C API are now prefixed with CLBlast to avoid potential name clashes with other projects 2016-10-22 16:14:56 +02:00
Cedric Nugteren 9afbbc9ef9 Added documentation for the better exception handling 2016-10-22 15:23:18 +02:00
Cedric Nugteren db17b1fbe9 Fixed a bug in the SYRK/SYR2K/HERK/HER2K routines that would occur with specific tuning parameters 2016-10-22 10:41:02 +02:00
Cedric Nugteren 53deed298f Added documentation and minor refactoring for the recent support of static library compilation 2016-10-15 17:11:08 +02:00
Cedric Nugteren ebb505b783 Added tuning results for Intel HD Graphics IvyBridge GPU 2016-10-13 12:18:28 +02:00
Cedric Nugteren 8a9d3cdf37 Added support for compiling the library, the client, and the samples under MSVC 2013 2016-10-10 22:45:39 +02:00
Cedric Nugteren b698e45478 Added first tuning results for the single-kernel direct GEMM implementation 2016-10-06 21:13:14 +02:00
Cedric Nugteren d59e5c570b Added an option to run tuned kernels multiple times to average execution times; requires CLTune 2.5.0 2016-09-27 21:03:24 +02:00
Cedric Nugteren db5772e521 Updated to version 8.0 of the CLCudaAPI header 2016-09-27 20:56:49 +02:00