Cedric Nugteren
|
67d4bbff66
|
Added an option to the database script to remove tuning results from the database
|
2017-04-23 17:59:16 +02:00 |
|
Cedric Nugteren
|
1c33af6eab
|
Re-added Titan X (Pascal) tuning results based on more averaging when tuning
|
2017-04-23 17:58:56 +02:00 |
|
Cedric Nugteren
|
957aaae6ca
|
Merge branch 'development' into benchmarking
|
2017-04-21 21:59:48 +02:00 |
|
Cedric Nugteren
|
cc9ad7b33b
|
Removed the word SUMMARY from the title of the benchmark script when benchmarking the summary
|
2017-04-21 21:34:44 +02:00 |
|
Cedric Nugteren
|
4d34083039
|
Updated the settings for the batched benchmarks
|
2017-04-20 22:19:29 +02:00 |
|
Cedric Nugteren
|
409a5a2ad0
|
Fixed a namespace clash with CUDA FP16 for the half-datatype
|
2017-04-17 16:47:15 +02:00 |
|
Cedric Nugteren
|
3ec14df60e
|
Added proper handling of mismatched arguments in the database script
|
2017-04-17 15:00:45 +02:00 |
|
Cedric Nugteren
|
3e2faa5db8
|
Set proper settings for the benchmarks of batched routines
|
2017-04-16 20:40:15 +02:00 |
|
Cedric Nugteren
|
2673f50518
|
Merge branch 'development' into benchmarking
|
2017-04-16 19:41:14 +02:00 |
|
Cedric Nugteren
|
063ef729e1
|
Added settings for benchmarking batched routines
|
2017-04-16 16:55:49 +02:00 |
|
Cedric Nugteren
|
c88ad94338
|
Added a benchmark-all script to run multiple benchmarks automatically
|
2017-04-14 22:02:47 +02:00 |
|
Cedric Nugteren
|
5203402c41
|
Tuned the num-runs settings for the benchmarks
|
2017-04-14 21:22:02 +02:00 |
|
Cedric Nugteren
|
56b2f46fbf
|
Added output-folder for benchmarking and removed the requirement on X
|
2017-04-14 20:32:28 +02:00 |
|
Cedric Nugteren
|
8833ae51be
|
Made the number of runs a benchmark-specific setting in the benchmark scripts
|
2017-04-14 20:16:51 +02:00 |
|
Cedric Nugteren
|
f7f8ec644f
|
Fixed CUDA malloc and cuBLAS handles: cuBLAS as a performance-reference now works
|
2017-04-13 21:31:27 +02:00 |
|
Cedric Nugteren
|
f24c142948
|
Made compilation of the cuBLAS wrapper work properly
|
2017-04-11 21:50:18 +02:00 |
|
Cedric Nugteren
|
22b3ea9256
|
Merge branch 'development' into cublas_reference
Conflicts:
scripts/generator/generator.py
|
2017-04-10 20:11:45 +02:00 |
|
Cedric Nugteren
|
2d45c37676
|
Removed const-vector-of-const-objects from the database class to remain compliant with the C++11 standard
|
2017-04-10 07:40:27 +02:00 |
|
Cedric Nugteren
|
52dd7433ca
|
Completed the cuBLAS wrapper
|
2017-04-06 20:56:28 +02:00 |
|
Cedric Nugteren
|
674ff96fdf
|
Added a first version of a cuBLAS wrapper (WIP)
|
2017-04-05 21:27:25 +02:00 |
|
Cedric Nugteren
|
eb1fda2729
|
In-lined the float2 and double2 types to avoid collision with CUDA's definitions
|
2017-04-03 21:44:35 +02:00 |
|
Cedric Nugteren
|
0f96e9d2f9
|
Various tweaks to the new benchmark script
|
2017-04-02 14:53:55 +02:00 |
|
Cedric Nugteren
|
1ee71fdc80
|
Tuned the plots to use a tight layout for papers and presentations
|
2017-04-01 14:00:46 +02:00 |
|
Cedric Nugteren
|
fa5c4b00b7
|
Replaced the R graph scripts with Python/Matplotlib benchmark scripts
|
2017-03-26 15:36:34 +02:00 |
|
Cedric Nugteren
|
49e04c7fce
|
Added API and test infrastructure for the batched GEMM routine
|
2017-03-10 21:24:35 +01:00 |
|
Cedric Nugteren
|
fa0a9c689f
|
Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes
|
2017-03-08 20:10:20 +01:00 |
|
Cedric Nugteren
|
b114ea49a9
|
Added first naive version of the batched AXPY routine
|
2017-03-05 15:06:14 +01:00 |
|
Cedric Nugteren
|
f9a520b3af
|
Prepared generator for batched routines; added batched AXPY routine interface
|
2017-03-05 10:38:38 +01:00 |
|
Cedric Nugteren
|
dde67ac79e
|
Minor fix to the generator script
|
2017-02-26 14:53:58 +01:00 |
|
Cedric Nugteren
|
ea6790665d
|
Merge branch 'development' into triangular_solvers
|
2017-02-26 14:51:45 +01:00 |
|
Cedric Nugteren
|
b7310036ed
|
Removed half-precision support from the TRSM routine; too unstable
|
2017-02-26 12:56:21 +01:00 |
|
Cedric Nugteren
|
fef11a208c
|
Added documentation for the OverrideParameters function
|
2017-02-18 11:02:57 +01:00 |
|
Cedric Nugteren
|
3d10690c83
|
Added missing documentation for the fill and clear cache functions
|
2017-02-18 10:32:32 +01:00 |
|
Cedric Nugteren
|
cda449a5c3
|
Added a C interface to the OverrideParameters function; added some in-line comments to the API
|
2017-02-16 21:14:48 +01:00 |
|
Cedric Nugteren
|
08bfb75a9d
|
Added input-sanity checks for the OverrideParameters function
|
2017-02-16 21:12:50 +01:00 |
|
Cedric Nugteren
|
cdb3bb7166
|
Added first version of the OverrideParameters function
|
2017-02-13 20:53:06 +01:00 |
|
Cedric Nugteren
|
c248f900c0
|
Merge branch 'development' into triangular_solvers
|
2017-02-05 22:18:59 +01:00 |
|
Ivan Shapovalov
|
1b8e816333
|
FillCache: perform compilation for each precision separately
Thus do not prevent filling the cache for float if the device does not support
e.g. double.
|
2017-01-24 02:43:00 +03:00 |
|
Cedric Nugteren
|
a5fd2323b6
|
Added prototype for the TRSV routine
|
2017-01-20 11:30:32 +01:00 |
|
Cedric Nugteren
|
32b850b12b
|
Added tuning results for the AMD Turks GPU and the Intel Core i7-2670QM CPU
|
2017-01-03 20:30:56 +01:00 |
|
Cedric Nugteren
|
681a465b35
|
Prepared for the addition of the TRSM triangular solver kernel
|
2016-12-18 12:30:16 +01:00 |
|
Cedric Nugteren
|
39c49bf4f9
|
Made it possible to use the command-line environment variables for each executable without re-running CMake
|
2016-11-27 11:00:29 +01:00 |
|
Cedric Nugteren
|
080e1be684
|
Improved the default parameters for cases with non-common parameters across all devices
|
2016-11-26 16:38:17 +01:00 |
|
Cedric Nugteren
|
cb398f0e42
|
Merge pull request #125 from CNugteren/netlib_blas_api
Netlib CBLAS API for CLBlast
|
2016-11-24 19:35:59 +01:00 |
|
Cedric Nugteren
|
792cc8359f
|
Fixed a vector-size related bug in the CLBlast Netlib API
|
2016-11-23 22:00:20 +01:00 |
|
Cedric Nugteren
|
26ca071480
|
Minor changes to ensure full compatibility with the Netlib CBLAS API
|
2016-11-22 08:41:52 +01:00 |
|
Cedric Nugteren
|
eefe0df435
|
Made functions with scalar-buffers as output properly return values
|
2016-11-20 21:36:57 +01:00 |
|
Cedric Nugteren
|
4c9585a349
|
Generating FP16 performance graphs now uses FP32 as a reference for comparison
|
2016-11-19 22:21:07 +01:00 |
|
Cedric Nugteren
|
8ae8ab06a2
|
Renamed the include and source files of the Netlib CBLAS API
|
2016-10-25 20:33:10 +02:00 |
|
Cedric Nugteren
|
140121ef91
|
Removed the clblast namespace from the Netlib C API source file to ensure proper linking
|
2016-10-25 20:21:50 +02:00 |
|