Cedric Nugteren
54d0c440ce
Various fixes to make the host code and sample compile with the CUDA API
2017-10-14 11:43:57 +02:00
Cedric Nugteren
cc5b475425
CUDA API now takes context and device in instead of stream
2017-10-12 12:20:43 +02:00
Cedric Nugteren
b901809345
Added first (untested) version of a CUDA API
2017-10-11 23:16:57 +02:00
Cedric Nugteren
9224da19ef
Fixed the Python generator script w.r.t. the recent change of testing direct/indirect GEMM kernels separately
2017-10-09 20:06:25 +02:00
Cedric Nugteren
df3c9f4a8a
Moved non-routine-specific API functions and includes to separate files
2017-10-08 21:52:02 +02:00
Cedric Nugteren
4e317f5e85
Improved compilation time of the tuner database
2017-09-16 18:02:37 +02:00
Cedric Nugteren
0d13d814c2
Added architecture layer in the tuning database for better performance on unseen devices
2017-09-14 21:27:33 +02:00
Cedric Nugteren
14a61d2425
Added database compress and de-compress functions
2017-09-12 22:25:52 +02:00
Cedric Nugteren
ebe10d5118
Database now works with new format of clblast_[property]
2017-09-11 20:40:37 +02:00
Cedric Nugteren
20da5e33a8
Split the database files over multiple directories and files; first step towards separate compilation
2017-09-06 21:50:42 +02:00
Cedric Nugteren
84ec50e29d
Added interface and stubs for the im2col routine
2017-07-02 12:10:22 +02:00
Cedric Nugteren
1a8ed48a35
Fixed some Clang and MSVC warnings
2017-06-25 11:50:36 +02:00
Cedric Nugteren
615a7fdc81
Fixes some compilation issues related to the database structure change
2017-06-21 23:07:47 +02:00
Cedric Nugteren
e44feb8576
Changed the structure of the database to reduce compilation time and save memory
2017-06-20 21:19:26 +02:00
Grigori Fursin
35e2e6c3a4
changing "wb" to "w" when saving json file (text mode) - compatibility for Python 3
2017-05-24 15:08:34 +02:00
Cedric Nugteren
f151e56daa
Added the IxAMIN routines: absolute minimum version of IxAMAX
2017-05-12 20:01:33 -07:00
Cedric Nugteren
97955fc221
Minor naming fixes to the benchmark script
2017-05-11 22:12:16 -07:00
Cedric Nugteren
67d4bbff66
Added an option to the database script to remove tuning results from the database
2017-04-23 17:59:16 +02:00
Cedric Nugteren
1c33af6eab
Re-added Titan X (Pascal) tuning results based on more averaging when tuning
2017-04-23 17:58:56 +02:00
Cedric Nugteren
957aaae6ca
Merge branch 'development' into benchmarking
2017-04-21 21:59:48 +02:00
Cedric Nugteren
cc9ad7b33b
Removed the word SUMMARY from the title of the benchmark script when benchmarking the summary
2017-04-21 21:34:44 +02:00
Cedric Nugteren
4d34083039
Updated the settings for the batched benchmarks
2017-04-20 22:19:29 +02:00
Cedric Nugteren
409a5a2ad0
Fixed a namespace clash with CUDA FP16 for the half-datatype
2017-04-17 16:47:15 +02:00
Cedric Nugteren
3ec14df60e
Added proper handling of mismatched arguments in the database script
2017-04-17 15:00:45 +02:00
Cedric Nugteren
3e2faa5db8
Set proper settings for the benchmarks of batched routines
2017-04-16 20:40:15 +02:00
Cedric Nugteren
2673f50518
Merge branch 'development' into benchmarking
2017-04-16 19:41:14 +02:00
Cedric Nugteren
063ef729e1
Added settings for benchmarking batched routines
2017-04-16 16:55:49 +02:00
Cedric Nugteren
c88ad94338
Added a benchmark-all script to run multiple benchmarks automatically
2017-04-14 22:02:47 +02:00
Cedric Nugteren
5203402c41
Tuned the num-runs settings for the benchmarks
2017-04-14 21:22:02 +02:00
Cedric Nugteren
56b2f46fbf
Added output-folder for benchmarking and removed the requirement on X
2017-04-14 20:32:28 +02:00
Cedric Nugteren
8833ae51be
Made the number of runs a benchmark-specific setting in the benchmark scripts
2017-04-14 20:16:51 +02:00
Cedric Nugteren
f7f8ec644f
Fixed CUDA malloc and cuBLAS handles: cuBLAS as a performance-reference now works
2017-04-13 21:31:27 +02:00
Cedric Nugteren
f24c142948
Made compilation of the cuBLAS wrapper work properly
2017-04-11 21:50:18 +02:00
Cedric Nugteren
22b3ea9256
Merge branch 'development' into cublas_reference
Conflicts:
scripts/generator/generator.py
2017-04-10 20:11:45 +02:00
Cedric Nugteren
2d45c37676
Removed const-vector-of-const-objects from the database class to remain compliant with the C++11 standard
2017-04-10 07:40:27 +02:00
Cedric Nugteren
52dd7433ca
Completed the cuBLAS wrapper
2017-04-06 20:56:28 +02:00
Cedric Nugteren
674ff96fdf
Added a first version of a cuBLAS wrapper (WIP)
2017-04-05 21:27:25 +02:00
Cedric Nugteren
eb1fda2729
In-lined the float2 and double2 types to avoid collision with CUDA's definitions
2017-04-03 21:44:35 +02:00
Cedric Nugteren
0f96e9d2f9
Various tweaks to the new benchmark script
2017-04-02 14:53:55 +02:00
Cedric Nugteren
1ee71fdc80
Tuned the plots to use a tight layout for papers and presentations
2017-04-01 14:00:46 +02:00
Cedric Nugteren
fa5c4b00b7
Replaced the R graph scripts with Python/Matplotlib benchmark scripts
2017-03-26 15:36:34 +02:00
Cedric Nugteren
49e04c7fce
Added API and test infrastructure for the batched GEMM routine
2017-03-10 21:24:35 +01:00
Cedric Nugteren
fa0a9c689f
Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes
2017-03-08 20:10:20 +01:00
Cedric Nugteren
b114ea49a9
Added first naive version of the batched AXPY routine
2017-03-05 15:06:14 +01:00
Cedric Nugteren
f9a520b3af
Prepared generator for batched routines; added batched AXPY routine interface
2017-03-05 10:38:38 +01:00
Cedric Nugteren
dde67ac79e
Minor fix to the generator script
2017-02-26 14:53:58 +01:00
Cedric Nugteren
ea6790665d
Merge branch 'development' into triangular_solvers
2017-02-26 14:51:45 +01:00
Cedric Nugteren
b7310036ed
Removed half-precision support from the TRSM routine; too unstable
2017-02-26 12:56:21 +01:00
Cedric Nugteren
fef11a208c
Added documentation for the OverrideParameters function
2017-02-18 11:02:57 +01:00
Cedric Nugteren
3d10690c83
Added missing documentation for the fill and clear cache functions
2017-02-18 10:32:32 +01:00
Cedric Nugteren
cda449a5c3
Added a C interface to the OverrideParameters function; added some in-line comments to the API
2017-02-16 21:14:48 +01:00
Cedric Nugteren
08bfb75a9d
Added input-sanity checks for the OverrideParameters function
2017-02-16 21:12:50 +01:00
Cedric Nugteren
cdb3bb7166
Added first version of the OverrideParameters function
2017-02-13 20:53:06 +01:00
Cedric Nugteren
c248f900c0
Merge branch 'development' into triangular_solvers
2017-02-05 22:18:59 +01:00
Ivan Shapovalov
1b8e816333
FillCache: perform compilation for each precision separately
Thus do not prevent filling cache for float if the device does not support
e.g. double.
2017-01-24 02:43:00 +03:00
Cedric Nugteren
a5fd2323b6
Added prototype for the TRSV routine
2017-01-20 11:30:32 +01:00
Cedric Nugteren
32b850b12b
Added tuning results for the AMD Turks GPU and the Intel Core i7-2670QM CPU
2017-01-03 20:30:56 +01:00
Cedric Nugteren
681a465b35
Prepared for the addition of the TRSM triangular solver kernel
2016-12-18 12:30:16 +01:00
Cedric Nugteren
39c49bf4f9
Made it possible to use the command-line environmental variables for each executable and without re-running CMake
2016-11-27 11:00:29 +01:00
Cedric Nugteren
080e1be684
Improved the default parameters for cases with non-common parameters across all devices
2016-11-26 16:38:17 +01:00
Cedric Nugteren
cb398f0e42
Merge pull request #125 from CNugteren/netlib_blas_api
Netlib CBLAS API for CLBlast
2016-11-24 19:35:59 +01:00
Cedric Nugteren
792cc8359f
Fixed a vector-size related bug in the CLBlast Netlib API
2016-11-23 22:00:20 +01:00
Cedric Nugteren
26ca071480
Minor changes to ensure full compatibility with the Netlib CBLAS API
2016-11-22 08:41:52 +01:00
Cedric Nugteren
eefe0df435
Made functions with scalar-buffers as output properly return values
2016-11-20 21:36:57 +01:00
Cedric Nugteren
4c9585a349
Generating FP16 performance graphs now uses FP32 as a reference for comparison
2016-11-19 22:21:07 +01:00
Cedric Nugteren
8ae8ab06a2
Renamed the include and source files of the Netlib CBLAS API
2016-10-25 20:33:10 +02:00
Cedric Nugteren
140121ef91
Removed the clblast namespace from the Netlib C API source file to ensure proper linking
2016-10-25 20:21:50 +02:00
Cedric Nugteren
729862e873
Fixed some issues preventing the Netlib CBLAS API from linking correctly
2016-10-25 19:56:42 +02:00
Cedric Nugteren
926aca53a0
Made the Netlib CBLAS API use the same enums with prefixes as the regular C API of CLBlast
2016-10-25 19:45:57 +02:00
Cedric Nugteren
59183b7d79
Sets the proper sizes for the buffers for the Netlib CBLAS API
2016-10-25 19:21:49 +02:00
Cedric Nugteren
f96fd372bc
Added initial version of a Netlib CBLAS implementation. TODO: Set correct buffer sizes
2016-10-25 14:28:52 +02:00
Cedric Nugteren
3b65eace0a
Merge branch 'development' into netlib_blas_api
Conflicts:
scripts/generator/generator.py
scripts/generator/generator/routine.py
2016-10-25 09:34:24 +02:00
Cedric Nugteren
a670c4c4bf
All enums in the C API are now prefixed with CLBlast to avoid potential name clashes with other projects
2016-10-22 16:14:56 +02:00
Cedric Nugteren
4a5516aa78
Added extra error codes to reflect the more detailed error reporting of OpenCL functions
2016-10-22 15:46:29 +02:00
Ivan Shapovalov
56f300607b
Routine: get rid of ::SetUp()
Since we now use C++ exceptions inside the implementation (and exceptions
can be thrown from constructors), there is no need for a separate
Routine::SetUp() function.
For this, we also change how the kernel source string is constructed.
The kernel-specific source code is now passed to the Routine ctor via
an initializer_list of C strings to avoid unnecessary data copying
while also working around C1091 of MSVC 2013.
2016-10-22 08:45:27 +03:00
Ivan Shapovalov
b98af44fcf
treewide: use C++ exceptions properly
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.
Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.
However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
Cedric Nugteren
9331442a56
Merge branch 'development' into netlib_blas_api
2016-10-16 11:43:03 +02:00
Cedric Nugteren
0f9311d46a
Fixed an issue with a growing database: the database is now a global variable in a namespace and its container uses const-pointers to the actual data
2016-10-14 20:56:32 +02:00
Cedric Nugteren
39afc9543b
Changed the storage location of the database to a separate GitHub repository
2016-10-10 19:10:12 +02:00
Cedric Nugteren
f563341e7b
Added fresh performance graphs for GeForce 750Ti; removed old GTX480 results
2016-10-10 16:59:28 +02:00
Cedric Nugteren
d7cfb6aa9b
Added benchmark script for small matrix sizes, testing the direct GEMM kernels
2016-10-08 22:05:54 +02:00
Cedric Nugteren
8d5747aa54
Made non-standard types void-pointers in the Netlib BLAS interface
2016-10-05 08:23:54 +02:00
Cedric Nugteren
a17b714c3e
Added first version of Netlib BLAS API header
2016-10-05 00:09:39 +02:00
Cedric Nugteren
aa3dffe356
Added XgemvFastRot and Xgemm 16-bit tuning results: just defaults which are now automatically taken from 32-bit if there are no entries at all
2016-09-12 20:13:38 +02:00
Cedric Nugteren
b5a67f86ec
Complete re-write of the database script. Replaced Pandas with the much faster and more convenient plain JSON/dict data type
2016-09-11 21:29:28 +02:00
Cedric Nugteren
e21f32bc99
Updated database based on exhaustive tuning results for GEMM for the R9 M370X GPU
2016-09-10 14:00:43 +02:00
Cedric Nugteren
3daba70997
Updated the database script to remove duplicate entries: keeps only the best-performing cases for a specific parameters combination
2016-09-10 11:12:09 +02:00
Cedric Nugteren
a2f8350703
Refactored the Python C++ generator script; now conforms to the PEP8 style guide
2016-09-04 21:26:30 +02:00
Cedric Nugteren
521bf6cdfc
Added tuning results for Intel Broadwell 5500 GT2 GPU
2016-09-03 16:43:23 +02:00
Cedric Nugteren
19574b2519
Updated tuning results for Haswell GT2 Mobile GPU; fixed database script to handle duplicate entries of different runs
2016-09-03 12:45:11 +02:00
Cedric Nugteren
0c0f0ac7f9
Also changed the default-default for unknown device types to use the same method as for known device groups
2016-08-21 20:35:20 +02:00
Cedric Nugteren
00979faab4
Updated the changelog; refactored the database-get-bests code a bit
2016-08-21 20:16:06 +02:00
Cedric Nugteren
7d5631b7e4
Updated the database script to calculate the relative best performance of tuning results common for a device/vendor type
2016-08-15 21:01:07 +02:00
Cedric Nugteren
7da6492b36
Improved the speed of the new common-best defaults method for the database generation
2016-08-09 21:06:04 +02:00
Cedric Nugteren
3f5401d4c8
Added a first version of the database's common-best default calculation
2016-08-07 16:25:38 +02:00
Cedric Nugteren
2582f0290a
Moved the XgemvFast and XgemvFastRot tuning database into a separate file
2016-07-25 22:43:49 +02:00
Cedric Nugteren
622682ffe3
Refactored the Python database script: separated functionality into modules, now complies with the PEP8 style, added proper command-line argument parsing, and cleaned up
2016-07-24 16:41:01 +02:00
Cedric Nugteren
9683b50c55
Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp)
2016-07-03 20:30:47 +02:00
Cedric Nugteren
5a690f4e36
Prints the current pandas version and reports the minimum required version
2016-07-02 16:44:13 +02:00
Cedric Nugteren
b330ab0866
Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library
2016-06-30 10:49:17 +02:00