Cedric Nugteren
3d0c227fa5
AMAX/AMIN integer testing and bug fixes (#457)
* Fixed a bug in XAMAX/XAMIN routines that caused the increment and offset to be included in the result
* Perform proper integer-output testing in XAMAX tests
* A few changes towards getting it ready for a PR
* Also fix compilation for clBLAS and cuBLAS references
* Fix a bug that would only use the real part of complex numbers in the amax/amin routines
* A few small fixes related to the AMAX tests
2023-05-07 20:02:52 +02:00
Cedric Nugteren
9eca896b05
Fix documentation bug w.r.t. ld values and matrix layout
2023-03-25 20:24:40 +01:00
Cedric Nugteren
396ac0278a
Added CLBLAST_VERSION_MAJOR/MINOR/PATCH defines in headers to store version numbering
2020-05-12 14:43:25 +02:00
Cedric Nugteren
b94e81af10
Added pyclblast bindings for the 3 batched routines
2020-05-10 12:26:25 +02:00
Cedric Nugteren
5f97d64505
Update API documentation
2020-03-08 11:29:47 +01:00
Cedric Nugteren
e0541c41a1
Added fp32 to fp16 conversion function in Python to make haxpy example work
2019-01-23 19:52:01 +01:00
Cedric Nugteren
3937efdcda
Added experimental support for half-precision in pyclblast
2019-01-22 21:13:41 +01:00
Koichi Akabe
032e3b0cc0
Add kernel_mode option to im2col, col2im, and convgemm functions
2018-11-12 10:12:07 +09:00
Cedric Nugteren
6f67525ea6
Changed col2im to append to the existing im-buffer
2018-11-07 19:45:07 +01:00
Cedric Nugteren
d45911b61d
Added groundwork for col2im algorithm plus first non-working version of kernel and test
2018-10-23 20:52:25 +02:00
Cedric Nugteren
83ba3d4b7b
Merge branch 'master' into convgemm_multi_kernel
2018-09-16 20:01:18 +02:00
Cedric Nugteren
fe639455bd
Added an option to compile the Netlib API with static OpenCL device and context
2018-08-05 21:12:39 +02:00
Cedric Nugteren
2dd539f911
Removed complex numbers support for CONVGEMM
2018-07-29 10:37:14 +02:00
Cedric Nugteren
5903820ba2
Merge branch 'master' into CLBlast-267-convgemm
2018-07-29 10:26:34 +02:00
Cedric Nugteren
c459582c4f
Added tuning results for HD Graphics 6000 Broadwell GT3
2018-07-13 21:05:43 +02:00
Cedric Nugteren
a4119531ee
Updated the documentation for convgemm to include data layout (NCHW)
2018-05-09 17:46:27 +02:00
Cedric Nugteren
2d1f6ba7fe
Added convgemm skeleton, test infrastructure, and first reference implementation
2018-05-06 11:35:34 +02:00
Cedric Nugteren
2776d76176
Added interface of batched convolution as GEMM
2018-05-05 14:06:33 +02:00
kodonell
173a7eb928
merged
2018-03-27 08:55:39 +13:00
kodonell
d16f2d1317
got the generator thing working
2018-03-27 08:45:54 +13:00
Cedric Nugteren
54bbc99273
Updated the documentation for the tuner API
2018-03-10 14:52:40 +01:00
Cedric Nugteren
3d2ef9331b
Fixed a few things for the new tuning API
2018-03-10 14:35:11 +01:00
Cedric Nugteren
bff64917bd
Fixed some small issues regarding PR#253
2018-03-03 10:43:12 +01:00
sivagnanamn
1433dc67f1
Added C API for getting GEMM temp buffer size
2018-03-03 03:00:17 +09:00
Cedric Nugteren
13dc26e63d
Generated PyCLBlast docstrings
2018-02-25 15:30:57 +01:00
Cedric Nugteren
6710c60935
Some style improvements in the pyclblast code generator
2018-02-25 14:51:58 +01:00
Cedric Nugteren
9699169cdf
Added API documentation for two missing C++ functions
2018-02-25 14:44:22 +01:00
Cedric Nugteren
e784df0230
Renamed the API documentation
2018-02-24 20:46:44 +01:00
Kirill Mavreshko
e300ad3292
Fixed duplication of parameter descriptions by the doc generator
2018-02-21 14:18:45 +05:00
Cedric Nugteren
ce5e2a1e00
Prepared PyCLBlast for release as a package on PyPI
2018-02-18 18:01:02 +01:00
Cedric Nugteren
a66e24a009
Added all other level 1/2/3 routines to pyclblast
2018-02-18 17:34:10 +01:00
Cedric Nugteren
e1bfb40827
Added GEMM to the Python wrapper
2018-02-18 16:33:20 +01:00
Cedric Nugteren
eb85f6b514
First generated version (clblastXswap only for now) of the pyclblast wrapper
2018-02-14 20:50:47 +01:00
Cedric Nugteren
ae66782eab
Fixed the XHAD documentation
2018-02-02 21:12:07 +01:00
Cedric Nugteren
ef5008f5e4
Created the API and stubs for the HAD (hadamard-product) routines
2018-01-31 20:41:02 +01:00
Cedric Nugteren
a500f537d8
Added a RetrieveParameters function to inspect tuning parameters
2018-01-11 20:32:06 +01:00
Cedric Nugteren
9fb2c61b25
Added API and tests for new GemmStridedBatched routine
2018-01-07 14:27:15 +01:00
Cedric Nugteren
0c48c6e6c4
Fixed a minor nullptr related issue in the code generator
2018-01-06 19:32:54 +01:00
Cedric Nugteren
ce069545d4
Added CUDA interface to get temporary-buffer size for GEMM routine
2018-01-06 10:05:28 +01:00
Cedric Nugteren
44431daecc
Added a CUDA version of the GEMM temp-buffer optional argument
2018-01-04 19:33:51 +01:00
Cedric Nugteren
af14fff1e9
Updated the generator script to automatically generate the temp-buffer code
2018-01-04 19:31:57 +01:00
Cedric Nugteren
6d1e30e61f
Added interface to compute the required temporary buffer size for GEMM
2017-12-28 14:46:45 +01:00
Cedric Nugteren
54d0c440ce
Various fixes to make the host code and sample compile with the CUDA API
2017-10-14 11:43:57 +02:00
Cedric Nugteren
cc5b475425
CUDA API now takes context and device in instead of stream
2017-10-12 12:20:43 +02:00
Cedric Nugteren
b901809345
Added first (untested) version of a CUDA API
2017-10-11 23:16:57 +02:00
Cedric Nugteren
9224da19ef
Fixed the Python generator script w.r.t. the recent change of testing direct/indirect GEMM kernels separately
2017-10-09 20:06:25 +02:00
Cedric Nugteren
df3c9f4a8a
Moved non-routine-specific API functions and includes to separate files
2017-10-08 21:52:02 +02:00
Cedric Nugteren
84ec50e29d
Added interface and stubs for the im2col routine
2017-07-02 12:10:22 +02:00
Cedric Nugteren
1a8ed48a35
Fixed some Clang and MSVC warnings
2017-06-25 11:50:36 +02:00
Cedric Nugteren
615a7fdc81
Fixes some compilation issues related to the database structure change
2017-06-21 23:07:47 +02:00