Commit graph

161 commits

Author SHA1 Message Date
Grigori Fursin 35e2e6c3a4 changing "wb" to "w" when saving json file (text mode) - compatibility for Python 3 2017-05-24 15:08:34 +02:00
Cedric Nugteren f151e56daa Added the IxAMIN routines: absolute minimum version of IxAMAX 2017-05-12 20:01:33 -07:00
Cedric Nugteren 97955fc221 Minor naming fixes to the benchmark script 2017-05-11 22:12:16 -07:00
Cedric Nugteren 67d4bbff66 Added an option to the database script to remove tuning results from the database 2017-04-23 17:59:16 +02:00
Cedric Nugteren 1c33af6eab Re-added Titan X (Pascal) tuning results based on more averaging when tuning 2017-04-23 17:58:56 +02:00
Cedric Nugteren 957aaae6ca Merge branch 'development' into benchmarking 2017-04-21 21:59:48 +02:00
Cedric Nugteren cc9ad7b33b Removed the words SUMMARY from the title of the benchmark script when benchmarking the summary 2017-04-21 21:34:44 +02:00
Cedric Nugteren 4d34083039 Updated the settings for the batched benchmarks 2017-04-20 22:19:29 +02:00
Cedric Nugteren 409a5a2ad0 Fixed a namespace clash with CUDA FP16 for the half-datatype 2017-04-17 16:47:15 +02:00
Cedric Nugteren 3ec14df60e Added proper handling of mismatched arguments in the database script 2017-04-17 15:00:45 +02:00
Cedric Nugteren 3e2faa5db8 Set proper settings for the benchmarks of batched routines 2017-04-16 20:40:15 +02:00
Cedric Nugteren 2673f50518 Merge branch 'development' into benchmarking 2017-04-16 19:41:14 +02:00
Cedric Nugteren 063ef729e1 Added settings for benchmarking batched routines 2017-04-16 16:55:49 +02:00
Cedric Nugteren c88ad94338 Added a benchmark-all script to run multiple benchmarks automatically 2017-04-14 22:02:47 +02:00
Cedric Nugteren 5203402c41 Tuned the num-runs settings for the benchmarks 2017-04-14 21:22:02 +02:00
Cedric Nugteren 56b2f46fbf Added output-folder for benchmarking and removed the requirement on X 2017-04-14 20:32:28 +02:00
Cedric Nugteren 8833ae51be Made the number of runs a benchmark-specific setting in the benchmark scripts 2017-04-14 20:16:51 +02:00
Cedric Nugteren f7f8ec644f Fixed CUDA malloc and cuBLAS handles: cuBLAS as a performance-reference now works 2017-04-13 21:31:27 +02:00
Cedric Nugteren f24c142948 Made compilation of the cuBLAS wrapper work properly 2017-04-11 21:50:18 +02:00
Cedric Nugteren 22b3ea9256 Merge branch 'development' into cublas_reference
Conflicts:
	scripts/generator/generator.py
2017-04-10 20:11:45 +02:00
Cedric Nugteren 2d45c37676 Removed const-vector-of-const-objects from the database class to remain according to the C++11 standard 2017-04-10 07:40:27 +02:00
Cedric Nugteren 52dd7433ca Completed the cuBLAS wrapper 2017-04-06 20:56:28 +02:00
Cedric Nugteren 674ff96fdf Added a first version of a cuBLAS wrapper (WIP) 2017-04-05 21:27:25 +02:00
Cedric Nugteren eb1fda2729 In-lined the float2 and double2 types to avoid collision with CUDA's definitions 2017-04-03 21:44:35 +02:00
Cedric Nugteren 0f96e9d2f9 Various tweaks to the new benchmark script 2017-04-02 14:53:55 +02:00
Cedric Nugteren 1ee71fdc80 Tuned the plots for a tight-layout for in papers and presentations 2017-04-01 14:00:46 +02:00
Cedric Nugteren fa5c4b00b7 Replaced the R graph scripts with Python/Matplotlib benchmark scripts 2017-03-26 15:36:34 +02:00
Cedric Nugteren 49e04c7fce Added API and test infrastructure for the batched GEMM routine 2017-03-10 21:24:35 +01:00
Cedric Nugteren fa0a9c689f Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes 2017-03-08 20:10:20 +01:00
Cedric Nugteren b114ea49a9 Added first naive version of the batched AXPY routine 2017-03-05 15:06:14 +01:00
Cedric Nugteren f9a520b3af Prepared generator for batched routines; added batched AXPY routine interface 2017-03-05 10:38:38 +01:00
Cedric Nugteren dde67ac79e Minor fix to the generator script 2017-02-26 14:53:58 +01:00
Cedric Nugteren ea6790665d Merge branch 'development' into triangular_solvers 2017-02-26 14:51:45 +01:00
Cedric Nugteren b7310036ed Removed half-precision support from the TRSM routine; too unstable 2017-02-26 12:56:21 +01:00
Cedric Nugteren fef11a208c Added documentation for the OverrideParameters function 2017-02-18 11:02:57 +01:00
Cedric Nugteren 3d10690c83 Added missing documentation for the fill and clear cache functions 2017-02-18 10:32:32 +01:00
Cedric Nugteren cda449a5c3 Added a C interface to the OverrideParameters function; added some in-line comments to the API 2017-02-16 21:14:48 +01:00
Cedric Nugteren 08bfb75a9d Added input-sanity checks for the OverrideParameters function 2017-02-16 21:12:50 +01:00
Cedric Nugteren cdb3bb7166 Added first version of the OverrideParameters function 2017-02-13 20:53:06 +01:00
Cedric Nugteren c248f900c0 Merge branch 'development' into triangular_solvers 2017-02-05 22:18:59 +01:00
Ivan Shapovalov 1b8e816333 FillCache: perform compilation for each precision separately
Thus do not prevent filling cache for float if the device does not support
e. g. double.
2017-01-24 02:43:00 +03:00
Cedric Nugteren a5fd2323b6 Added prototype for the TRSV routine 2017-01-20 11:30:32 +01:00
Cedric Nugteren 32b850b12b Added tuning results for the AMD Turks GPU and the Intel Core i7-2670QM CPU 2017-01-03 20:30:56 +01:00
Cedric Nugteren 681a465b35 Prepared for the addition of the TRSM triangular solver kernel 2016-12-18 12:30:16 +01:00
Cedric Nugteren 39c49bf4f9 Made it possible to use the command-line environmental variables for each executable and without re-running CMake 2016-11-27 11:00:29 +01:00
Cedric Nugteren 080e1be684 Improved the default parameters for cases with non-common parameters across all devices 2016-11-26 16:38:17 +01:00
Cedric Nugteren cb398f0e42 Merge pull request #125 from CNugteren/netlib_blas_api
Netlib CBLAS API for CLBlast
2016-11-24 19:35:59 +01:00
Cedric Nugteren 792cc8359f Fixed a vector-size related bug in the CLBlast Netlib API 2016-11-23 22:00:20 +01:00
Cedric Nugteren 26ca071480 Minor changes to ensure full compatibility with the Netlib CBLAS API 2016-11-22 08:41:52 +01:00
Cedric Nugteren eefe0df435 Made functions with scalar-buffers as output properly return values 2016-11-20 21:36:57 +01:00