Commit graph

211 commits

Author SHA1 Message Date
Cedric Nugteren bff64917bd Fixed some small issues regarding PR#253 2018-03-03 10:43:12 +01:00
sivagnanamn 1433dc67f1 Added C API for getting GEMM temp buffer size 2018-03-03 03:00:17 +09:00
Cedric Nugteren 13dc26e63d Generated PyCLBlast docstrings 2018-02-25 15:30:57 +01:00
Cedric Nugteren 6710c60935 Some style improvements in the pyclblast code generator 2018-02-25 14:51:58 +01:00
Cedric Nugteren 9699169cdf Added API documentation for two missing C++ functions 2018-02-25 14:44:22 +01:00
Cedric Nugteren e784df0230 Renamed the API documentation 2018-02-24 20:46:44 +01:00
Kirill Mavreshko e300ad3292 Fixed duplication of parameter descriptions by the doc generator 2018-02-21 14:18:45 +05:00
Cedric Nugteren ce5e2a1e00 Prepared PyCLBlast for release as a package on PyPi 2018-02-18 18:01:02 +01:00
Cedric Nugteren a66e24a009 Added all other level 1/2/3 routines to pyclblast 2018-02-18 17:34:10 +01:00
Cedric Nugteren e1bfb40827 Added GEMM to the Python wrapper 2018-02-18 16:33:20 +01:00
Cedric Nugteren eb85f6b514 First agenerated version (clblastXswap only for now) of the pyclblast wrapper 2018-02-14 20:50:47 +01:00
Cedric Nugteren ae66782eab Fixed the XHAD documentation 2018-02-02 21:12:07 +01:00
Cedric Nugteren ef5008f5e4 Created the API and stubs for the HAD (hadamard-product) routines 2018-01-31 20:41:02 +01:00
Cedric Nugteren 180532ea39 Some fixes to the benchmark scripts 2018-01-27 20:06:13 +01:00
Cedric Nugteren ada762f668 Minor displaying improvements to the graph plotting scripts 2018-01-26 20:38:11 +01:00
Cedric Nugteren 3651b51664 Improved the benchmark scripts; added gemmstridedbatched benchmark 2018-01-25 21:24:18 +01:00
Cedric Nugteren b35e3d1e53 Small improvements to benchmarking for cuBLAS 2018-01-14 19:50:27 +01:00
Cedric Nugteren a500f537d8 Added a RetrieveParameters function to inspect tuning parameters 2018-01-11 20:32:06 +01:00
Cedric Nugteren 9fb2c61b25 Added API and tests for new GemmStridedBatched routine 2018-01-07 14:27:15 +01:00
Cedric Nugteren 0c48c6e6c4 Fixed a minor nullptr related issue in the code generator 2018-01-06 19:32:54 +01:00
Cedric Nugteren a7ccce1969 Merge pull request #238 from CNugteren/gemm_api_with_temp_buffer: GEMM API with optional temp buffer 2018-01-06 16:08:27 +01:00
Cedric Nugteren ce069545d4 Added CUDA interface to get temporary-buffer size for GEMM routine 2018-01-06 10:05:28 +01:00
Cedric Nugteren 44431daecc Added a CUDA version of the GEMM temp-buffer optional argument 2018-01-04 19:33:51 +01:00
Cedric Nugteren af14fff1e9 Updated the generator script to automatically generate the temp-buffer code 2018-01-04 19:31:57 +01:00
Cedric Nugteren b4c8e1d9a5 Made plotting script more flexible: extra argument to set the comparison library 2017-12-31 16:02:46 +01:00
Cedric Nugteren 6d1e30e61f Added interface to compute the required temporary buffer size for GEMM 2017-12-28 14:46:45 +01:00
Cedric Nugteren 1e738db6dd Split the database into multiple small compilation units 2017-12-27 12:04:22 +01:00
Cedric Nugteren 3948cd6551 Made plotting script more resilient to missing data 2017-12-20 20:12:02 +01:00
Cedric Nugteren 0ee81e27b9 Added tuning results for Apple AMD Radeon Pro 580 2017-12-20 19:59:31 +01:00
Cedric Nugteren c680666250 Added try-except to database script parser to skip invalid files 2017-12-20 19:14:04 +01:00
Cedric Nugteren 606990af6f Made the database script properly handle multiple entries for a single device 2017-11-20 21:38:23 +01:00
Cedric Nugteren defad3d1a2 Minor fix to the database script 2017-11-19 18:19:21 +01:00
Cedric Nugteren a3a8b44f59 Some fixes for the new auto-tuner to be compatible with the Python scripts 2017-11-19 16:31:08 +01:00
Cedric Nugteren 33ac2b0175 Improved the way the database defaults are computed 2017-11-06 21:59:45 +01:00
Cedric Nugteren 9b0a435fb0 Integrated the GEMM routine tuner for kernel selection; added first tuning results 2017-11-02 21:47:14 +01:00
Cedric Nugteren 73272ab97d Fixed a bug in database compression/decompression 2017-11-02 21:19:18 +01:00
Cedric Nugteren 54d0c440ce Various fixes to make the host code and sample compile with the CUDA API 2017-10-14 11:43:57 +02:00
Cedric Nugteren cc5b475425 CUDA API now takes context and device in instead of stream 2017-10-12 12:20:43 +02:00
Cedric Nugteren b901809345 Added first (untested) version of a CUDA API 2017-10-11 23:16:57 +02:00
Cedric Nugteren 9224da19ef Fixed the Python generator script w.r.t. the recent change of testing direct/in-direct GEMM kernels separately 2017-10-09 20:06:25 +02:00
Cedric Nugteren df3c9f4a8a Moved non-routine-specific API functions and includes to separate files 2017-10-08 21:52:02 +02:00
Cedric Nugteren 4e317f5e85 Improved compilation time of the tuner database 2017-09-16 18:02:37 +02:00
Cedric Nugteren 0d13d814c2 Added architecture layer in the tuning database for better performance on unseen devices 2017-09-14 21:27:33 +02:00
Cedric Nugteren 14a61d2425 Added database compress and de-compress functions 2017-09-12 22:25:52 +02:00
Cedric Nugteren ebe10d5118 Database now works with new format of clblast_[property] 2017-09-11 20:40:37 +02:00
Cedric Nugteren 20da5e33a8 Split the database files over multiple directories and files; first step towards separate compilation 2017-09-06 21:50:42 +02:00
Cedric Nugteren 84ec50e29d Added interface and stubs for the im2col routine 2017-07-02 12:10:22 +02:00
Cedric Nugteren 1a8ed48a35 Fixed some Clang and MSVC warnings 2017-06-25 11:50:36 +02:00
Cedric Nugteren 615a7fdc81 Fixes some compilation issues related to the database structure change 2017-06-21 23:07:47 +02:00
Cedric Nugteren e44feb8576 Changed the structure of the database to reduce compilation time and save memory 2017-06-20 21:19:26 +02:00