184 commits

Author SHA1 Message Date
Cedric Nugteren 3948cd6551 Made plotting script more resilient to missing data 2017-12-20 20:12:02 +01:00
Cedric Nugteren 0ee81e27b9 Added tuning results for Apple AMD Radeon Pro 580 2017-12-20 19:59:31 +01:00
Cedric Nugteren c680666250 Added try-except to database script parser to skip invalid files 2017-12-20 19:14:04 +01:00
Cedric Nugteren 606990af6f Made the database script properly handle multiple entries for a single device 2017-11-20 21:38:23 +01:00
Cedric Nugteren defad3d1a2 Minor fix to the database script 2017-11-19 18:19:21 +01:00
Cedric Nugteren a3a8b44f59 Some fixes for the new auto-tuner to be compatible with the Python scripts 2017-11-19 16:31:08 +01:00
Cedric Nugteren 33ac2b0175 Improved the way the database defaults are computed 2017-11-06 21:59:45 +01:00
Cedric Nugteren 9b0a435fb0 Integrated the GEMM routine tuner for kernel selection; added first tuning results 2017-11-02 21:47:14 +01:00
Cedric Nugteren 73272ab97d Fixed a bug in database compression/decompression 2017-11-02 21:19:18 +01:00
Cedric Nugteren 54d0c440ce Various fixes to make the host code and sample compile with the CUDA API 2017-10-14 11:43:57 +02:00
Cedric Nugteren cc5b475425 CUDA API now takes context and device in instead of stream 2017-10-12 12:20:43 +02:00
Cedric Nugteren b901809345 Added first (untested) version of a CUDA API 2017-10-11 23:16:57 +02:00
Cedric Nugteren 9224da19ef Fixed the Python generator script w.r.t. the recent change of testing direct/in-direct GEMM kernels separately 2017-10-09 20:06:25 +02:00
Cedric Nugteren df3c9f4a8a Moved non-routine-specific API functions and includes to separate files 2017-10-08 21:52:02 +02:00
Cedric Nugteren 4e317f5e85 Improved compilation time of the tuner database 2017-09-16 18:02:37 +02:00
Cedric Nugteren 0d13d814c2 Added architecture layer in the tuning database for better performance on unseen devices 2017-09-14 21:27:33 +02:00
Cedric Nugteren 14a61d2425 Added database compress and de-compress functions 2017-09-12 22:25:52 +02:00
Cedric Nugteren ebe10d5118 Database now works with new format of clblast_[property] 2017-09-11 20:40:37 +02:00
Cedric Nugteren 20da5e33a8 Split the database files over multiple directories and files; first step towards separate compilation 2017-09-06 21:50:42 +02:00
Cedric Nugteren 84ec50e29d Added interface and stubs for the im2col routine 2017-07-02 12:10:22 +02:00
Cedric Nugteren 1a8ed48a35 Fixed some Clang and MSVC warnings 2017-06-25 11:50:36 +02:00
Cedric Nugteren 615a7fdc81 Fixes some compilation issues related to the database structure change 2017-06-21 23:07:47 +02:00
Cedric Nugteren e44feb8576 Changed the structure of the database to reduce compilation time and save memory 2017-06-20 21:19:26 +02:00
Grigori Fursin 35e2e6c3a4 changing "wb" to "w" when saving json file (text mode) - compatibility for Python 3 2017-05-24 15:08:34 +02:00
Cedric Nugteren f151e56daa Added the IxAMIN routines: absolute minimum version of IxAMAX 2017-05-12 20:01:33 -07:00
Cedric Nugteren 97955fc221 Minor naming fixes to the benchmark script 2017-05-11 22:12:16 -07:00
Cedric Nugteren 67d4bbff66 Added an option to the database script to remove tuning results from the database 2017-04-23 17:59:16 +02:00
Cedric Nugteren 1c33af6eab Re-added Titan X (Pascal) tuning results based on more averaging when tuning 2017-04-23 17:58:56 +02:00
Cedric Nugteren 957aaae6ca Merge branch 'development' into benchmarking 2017-04-21 21:59:48 +02:00
Cedric Nugteren cc9ad7b33b Removed the words SUMMARY from the title of the benchmark script when benchmarking the summary 2017-04-21 21:34:44 +02:00
Cedric Nugteren 4d34083039 Updated the settings for the batched benchmarks 2017-04-20 22:19:29 +02:00
Cedric Nugteren 409a5a2ad0 Fixed a namespace clash with CUDA FP16 for the half-datatype 2017-04-17 16:47:15 +02:00
Cedric Nugteren 3ec14df60e Added proper handling of mismatched arguments in the database script 2017-04-17 15:00:45 +02:00
Cedric Nugteren 3e2faa5db8 Set proper settings for the benchmarks of batched routines 2017-04-16 20:40:15 +02:00
Cedric Nugteren 2673f50518 Merge branch 'development' into benchmarking 2017-04-16 19:41:14 +02:00
Cedric Nugteren 063ef729e1 Added settings for benchmarking batched routines 2017-04-16 16:55:49 +02:00
Cedric Nugteren c88ad94338 Added a benchmark-all script to run multiple benchmarks automatically 2017-04-14 22:02:47 +02:00
Cedric Nugteren 5203402c41 Tuned the num-runs settings for the benchmarks 2017-04-14 21:22:02 +02:00
Cedric Nugteren 56b2f46fbf Added output-folder for benchmarking and removed the requirement on X 2017-04-14 20:32:28 +02:00
Cedric Nugteren 8833ae51be Made the number of runs a benchmark-specific setting in the benchmark scripts 2017-04-14 20:16:51 +02:00
Cedric Nugteren f7f8ec644f Fixed CUDA malloc and cuBLAS handles: cuBLAS as a performance-reference now works 2017-04-13 21:31:27 +02:00
Cedric Nugteren f24c142948 Made compilation of the cuBLAS wrapper work properly 2017-04-11 21:50:18 +02:00
Cedric Nugteren 22b3ea9256 Merge branch 'development' into cublas_reference (conflicts: scripts/generator/generator.py) 2017-04-10 20:11:45 +02:00
Cedric Nugteren 2d45c37676 Removed const-vector-of-const-objects from the database class to remain according to the C++11 standard 2017-04-10 07:40:27 +02:00
Cedric Nugteren 52dd7433ca Completed the cuBLAS wrapper 2017-04-06 20:56:28 +02:00
Cedric Nugteren 674ff96fdf Added a first version of a cuBLAS wrapper (WIP) 2017-04-05 21:27:25 +02:00
Cedric Nugteren eb1fda2729 In-lined the float2 and double2 types to avoid collision with CUDA's definitions 2017-04-03 21:44:35 +02:00
Cedric Nugteren 0f96e9d2f9 Various tweaks to the new benchmark script 2017-04-02 14:53:55 +02:00
Cedric Nugteren 1ee71fdc80 Tuned the plots for a tight-layout for use in papers and presentations 2017-04-01 14:00:46 +02:00
Cedric Nugteren fa5c4b00b7 Replaced the R graph scripts with Python/Matplotlib benchmark scripts 2017-03-26 15:36:34 +02:00