Cedric Nugteren | 2d1f6ba7fe | Added convgemm skeleton, test infrastructure, and first reference implementation | 2018-05-06 11:35:34 +02:00
Cedric Nugteren | 2776d76176 | Added interface of batched convolution as GEMM | 2018-05-05 14:06:33 +02:00
Cedric Nugteren | f14e6f87d2 | Updated tuning results for the Skylake ULT GT2 GPU with the new kernel | 2018-04-15 11:45:45 +02:00
Cedric Nugteren | f6a48f05ed | Made it possible to add tuning parameters to the database using the script | 2018-04-10 21:24:36 +02:00
Cedric Nugteren | 3fbbb81137 | Fixed a bug in the compression part of the database script | 2018-04-10 21:18:11 +02:00
Cedric Nugteren | 77ba11f686 | Extended the maximum number of tuning parameters from 14 to 16 | 2018-04-08 18:12:54 +02:00
Cedric Nugteren | cf7965dc68 | Fixed a python3 import error issue with the database script | 2018-04-07 17:40:43 +02:00
kodonell | 173a7eb928 | merged | 2018-03-27 08:55:39 +13:00
kodonell | d16f2d1317 | got the generator thing working | 2018-03-27 08:45:54 +13:00
Cedric Nugteren | 934893972e | Merge pull request #262 from CNugteren/CLBlast-237-tuning-api (CLBlast #237: Tuning API) | 2018-03-11 15:38:33 +01:00
Cedric Nugteren | 0dd1bc6f48 | Made benchmarking script also work for complex numbers | 2018-03-10 17:03:57 +01:00
Cedric Nugteren | 54bbc99273 | Updated the documentation for the tuner API | 2018-03-10 14:52:40 +01:00
Cedric Nugteren | 3d2ef9331b | Fixed a few things for the new tuning API | 2018-03-10 14:35:11 +01:00
Cedric Nugteren | bff64917bd | Fixed some small issues regarding PR#253 | 2018-03-03 10:43:12 +01:00
sivagnanamn | 1433dc67f1 | Added C API for getting GEMM temp buffer size | 2018-03-03 03:00:17 +09:00
Cedric Nugteren | 13dc26e63d | Generated PyCLBlast docstrings | 2018-02-25 15:30:57 +01:00
Cedric Nugteren | 6710c60935 | Some style improvements in the pyclblast code generator | 2018-02-25 14:51:58 +01:00
Cedric Nugteren | 9699169cdf | Added API documentation for two missing C++ functions | 2018-02-25 14:44:22 +01:00
Cedric Nugteren | e784df0230 | Renamed the API documentation | 2018-02-24 20:46:44 +01:00
Kirill Mavreshko | e300ad3292 | Fixed duplication of parameter descriptions by the doc generator | 2018-02-21 14:18:45 +05:00
Cedric Nugteren | ce5e2a1e00 | Prepared PyCLBlast for release as a package on PyPi | 2018-02-18 18:01:02 +01:00
Cedric Nugteren | a66e24a009 | Added all other level 1/2/3 routines to pyclblast | 2018-02-18 17:34:10 +01:00
Cedric Nugteren | e1bfb40827 | Added GEMM to the Python wrapper | 2018-02-18 16:33:20 +01:00
Cedric Nugteren | eb85f6b514 | First generated version (clblastXswap only for now) of the pyclblast wrapper | 2018-02-14 20:50:47 +01:00
Cedric Nugteren | ae66782eab | Fixed the XHAD documentation | 2018-02-02 21:12:07 +01:00
Cedric Nugteren | ef5008f5e4 | Created the API and stubs for the HAD (hadamard-product) routines | 2018-01-31 20:41:02 +01:00
Cedric Nugteren | 180532ea39 | Some fixes to the benchmark scripts | 2018-01-27 20:06:13 +01:00
Cedric Nugteren | ada762f668 | Minor displaying improvements to the graph plotting scripts | 2018-01-26 20:38:11 +01:00
Cedric Nugteren | 3651b51664 | Improved the benchmark scripts; added gemmstridedbatched benchmark | 2018-01-25 21:24:18 +01:00
Cedric Nugteren | b35e3d1e53 | Small improvements to benchmarking for cuBLAS | 2018-01-14 19:50:27 +01:00
Cedric Nugteren | a500f537d8 | Added a RetrieveParameters function to inspect tuning parameters | 2018-01-11 20:32:06 +01:00
Cedric Nugteren | 9fb2c61b25 | Added API and tests for new GemmStridedBatched routine | 2018-01-07 14:27:15 +01:00
Cedric Nugteren | 0c48c6e6c4 | Fixed a minor nullptr related issue in the code generator | 2018-01-06 19:32:54 +01:00
Cedric Nugteren | a7ccce1969 | Merge pull request #238 from CNugteren/gemm_api_with_temp_buffer (GEMM API with optional temp buffer) | 2018-01-06 16:08:27 +01:00
Cedric Nugteren | ce069545d4 | Added CUDA interface to get temporary-buffer size for GEMM routine | 2018-01-06 10:05:28 +01:00
Cedric Nugteren | 44431daecc | Added a CUDA version of the GEMM temp-buffer optional argument | 2018-01-04 19:33:51 +01:00
Cedric Nugteren | af14fff1e9 | Updated the generator script to automatically generate the temp-buffer code | 2018-01-04 19:31:57 +01:00
Cedric Nugteren | b4c8e1d9a5 | Made plotting script more flexible: extra argument to set the comparison library | 2017-12-31 16:02:46 +01:00
Cedric Nugteren | 6d1e30e61f | Added interface to compute the required temporary buffer size for GEMM | 2017-12-28 14:46:45 +01:00
Cedric Nugteren | 1e738db6dd | Split the database into multiple small compilation units | 2017-12-27 12:04:22 +01:00
Cedric Nugteren | 3948cd6551 | Made plotting script more resilient to missing data | 2017-12-20 20:12:02 +01:00
Cedric Nugteren | 0ee81e27b9 | Added tuning results for Apple AMD Radeon Pro 580 | 2017-12-20 19:59:31 +01:00
Cedric Nugteren | c680666250 | Added try-except to database script parser to skip invalid files | 2017-12-20 19:14:04 +01:00
Cedric Nugteren | 606990af6f | Made the database script properly handle multiple entries for a single device | 2017-11-20 21:38:23 +01:00
Cedric Nugteren | defad3d1a2 | Minor fix to the database script | 2017-11-19 18:19:21 +01:00
Cedric Nugteren | a3a8b44f59 | Some fixes for the new auto-tuner to be compatible with the Python scripts | 2017-11-19 16:31:08 +01:00
Cedric Nugteren | 33ac2b0175 | Improved the way the database defaults are computed | 2017-11-06 21:59:45 +01:00
Cedric Nugteren | 9b0a435fb0 | Integrated the GEMM routine tuner for kernel selection; added first tuning results | 2017-11-02 21:47:14 +01:00
Cedric Nugteren | 73272ab97d | Fixed a bug in database compression/decompression | 2017-11-02 21:19:18 +01:00
Cedric Nugteren | 54d0c440ce | Various fixes to make the host code and sample compile with the CUDA API | 2017-10-14 11:43:57 +02:00