Cedric Nugteren
|
ce069545d4
|
Added CUDA interface to get temporary-buffer size for GEMM routine
|
2018-01-06 10:05:28 +01:00 |
|
Cedric Nugteren
|
44431daecc
|
Added a CUDA version of the GEMM temp-buffer optional argument
|
2018-01-04 19:33:51 +01:00 |
|
Cedric Nugteren
|
af14fff1e9
|
Updated the generator script to automatically generate the temp-buffer code
|
2018-01-04 19:31:57 +01:00 |
|
Cedric Nugteren
|
6d1e30e61f
|
Added interface to compute the required temporary buffer size for GEMM
|
2017-12-28 14:46:45 +01:00 |
|
Cedric Nugteren
|
1e738db6dd
|
Split the database into multiple small compilation units
|
2017-12-27 12:04:22 +01:00 |
|
Cedric Nugteren
|
3948cd6551
|
Made plotting script more resilient to missing data
|
2017-12-20 20:12:02 +01:00 |
|
Cedric Nugteren
|
0ee81e27b9
|
Added tuning results for Apple AMD Radeon Pro 580
|
2017-12-20 19:59:31 +01:00 |
|
Cedric Nugteren
|
c680666250
|
Added try-except to database script parser to skip invalid files
|
2017-12-20 19:14:04 +01:00 |
|
Cedric Nugteren
|
606990af6f
|
Made the database script properly handle multiple entries for a single device
|
2017-11-20 21:38:23 +01:00 |
|
Cedric Nugteren
|
defad3d1a2
|
Minor fix to the database script
|
2017-11-19 18:19:21 +01:00 |
|
Cedric Nugteren
|
a3a8b44f59
|
Some fixes for the new auto-tuner to be compatible with the Python scripts
|
2017-11-19 16:31:08 +01:00 |
|
Cedric Nugteren
|
33ac2b0175
|
Improved the way the database defaults are computed
|
2017-11-06 21:59:45 +01:00 |
|
Cedric Nugteren
|
9b0a435fb0
|
Integrated the GEMM routine tuner for kernel selection; added first tuning results
|
2017-11-02 21:47:14 +01:00 |
|
Cedric Nugteren
|
73272ab97d
|
Fixed a bug in database compression/decompression
|
2017-11-02 21:19:18 +01:00 |
|
Cedric Nugteren
|
54d0c440ce
|
Various fixes to make the host code and sample compile with the CUDA API
|
2017-10-14 11:43:57 +02:00 |
|
Cedric Nugteren
|
cc5b475425
|
CUDA API now takes context and device as input instead of stream
|
2017-10-12 12:20:43 +02:00 |
|
Cedric Nugteren
|
b901809345
|
Added first (untested) version of a CUDA API
|
2017-10-11 23:16:57 +02:00 |
|
Cedric Nugteren
|
9224da19ef
|
Fixed the Python generator script w.r.t. the recent change of testing direct/indirect GEMM kernels separately
|
2017-10-09 20:06:25 +02:00 |
|
Cedric Nugteren
|
df3c9f4a8a
|
Moved non-routine-specific API functions and includes to separate files
|
2017-10-08 21:52:02 +02:00 |
|
Cedric Nugteren
|
4e317f5e85
|
Improved compilation time of the tuner database
|
2017-09-16 18:02:37 +02:00 |
|
Cedric Nugteren
|
0d13d814c2
|
Added architecture layer in the tuning database for better performance on unseen devices
|
2017-09-14 21:27:33 +02:00 |
|
Cedric Nugteren
|
14a61d2425
|
Added database compress and de-compress functions
|
2017-09-12 22:25:52 +02:00 |
|
Cedric Nugteren
|
ebe10d5118
|
Database now works with new format of clblast_[property]
|
2017-09-11 20:40:37 +02:00 |
|
Cedric Nugteren
|
20da5e33a8
|
Split the database files over multiple directories and files; first step towards separate compilation
|
2017-09-06 21:50:42 +02:00 |
|
Cedric Nugteren
|
84ec50e29d
|
Added interface and stubs for the im2col routine
|
2017-07-02 12:10:22 +02:00 |
|
Cedric Nugteren
|
1a8ed48a35
|
Fixed some Clang and MSVC warnings
|
2017-06-25 11:50:36 +02:00 |
|
Cedric Nugteren
|
615a7fdc81
|
Fixes some compilation issues related to the database structure change
|
2017-06-21 23:07:47 +02:00 |
|
Cedric Nugteren
|
e44feb8576
|
Changed the structure of the database to reduce compilation time and save memory
|
2017-06-20 21:19:26 +02:00 |
|
Grigori Fursin
|
35e2e6c3a4
|
changing "wb" to "w" when saving json file (text mode) - compatibility for Python 3
|
2017-05-24 15:08:34 +02:00 |
|
Cedric Nugteren
|
f151e56daa
|
Added the IxAMIN routines: absolute minimum version of IxAMAX
|
2017-05-12 20:01:33 -07:00 |
|
Cedric Nugteren
|
97955fc221
|
Minor naming fixes to the benchmark script
|
2017-05-11 22:12:16 -07:00 |
|
Cedric Nugteren
|
67d4bbff66
|
Added an option to the database script to remove tuning results from the database
|
2017-04-23 17:59:16 +02:00 |
|
Cedric Nugteren
|
1c33af6eab
|
Re-added Titan X (Pascal) tuning results based on more averaging when tuning
|
2017-04-23 17:58:56 +02:00 |
|
Cedric Nugteren
|
957aaae6ca
|
Merge branch 'development' into benchmarking
|
2017-04-21 21:59:48 +02:00 |
|
Cedric Nugteren
|
cc9ad7b33b
|
Removed the word SUMMARY from the title of the benchmark script when benchmarking the summary
|
2017-04-21 21:34:44 +02:00 |
|
Cedric Nugteren
|
4d34083039
|
Updated the settings for the batched benchmarks
|
2017-04-20 22:19:29 +02:00 |
|
Cedric Nugteren
|
409a5a2ad0
|
Fixed a namespace clash with CUDA FP16 for the half-datatype
|
2017-04-17 16:47:15 +02:00 |
|
Cedric Nugteren
|
3ec14df60e
|
Added proper handling of mismatched arguments in the database script
|
2017-04-17 15:00:45 +02:00 |
|
Cedric Nugteren
|
3e2faa5db8
|
Set proper settings for the benchmarks of batched routines
|
2017-04-16 20:40:15 +02:00 |
|
Cedric Nugteren
|
2673f50518
|
Merge branch 'development' into benchmarking
|
2017-04-16 19:41:14 +02:00 |
|
Cedric Nugteren
|
063ef729e1
|
Added settings for benchmarking batched routines
|
2017-04-16 16:55:49 +02:00 |
|
Cedric Nugteren
|
c88ad94338
|
Added a benchmark-all script to run multiple benchmarks automatically
|
2017-04-14 22:02:47 +02:00 |
|
Cedric Nugteren
|
5203402c41
|
Tuned the num-runs settings for the benchmarks
|
2017-04-14 21:22:02 +02:00 |
|
Cedric Nugteren
|
56b2f46fbf
|
Added output-folder for benchmarking and removed the requirement on X
|
2017-04-14 20:32:28 +02:00 |
|
Cedric Nugteren
|
8833ae51be
|
Made the number of runs a benchmark-specific setting in the benchmark scripts
|
2017-04-14 20:16:51 +02:00 |
|
Cedric Nugteren
|
f7f8ec644f
|
Fixed CUDA malloc and cuBLAS handles: cuBLAS as a performance-reference now works
|
2017-04-13 21:31:27 +02:00 |
|
Cedric Nugteren
|
f24c142948
|
Made compilation of the cuBLAS wrapper work properly
|
2017-04-11 21:50:18 +02:00 |
|
Cedric Nugteren
|
22b3ea9256
|
Merge branch 'development' into cublas_reference
Conflicts:
scripts/generator/generator.py
|
2017-04-10 20:11:45 +02:00 |
|
Cedric Nugteren
|
2d45c37676
|
Removed const-vector-of-const-objects from the database class to comply with the C++11 standard
|
2017-04-10 07:40:27 +02:00 |
|
Cedric Nugteren
|
52dd7433ca
|
Completed the cuBLAS wrapper
|
2017-04-06 20:56:28 +02:00 |
|