Koichi Akabe
032e3b0cc0
Add kernel_mode option to im2col, col2im, and convgemm functions
2018-11-12 10:12:07 +09:00
Cedric Nugteren
d45911b61d
Added groundwork for col2im algorithm plus first non-working version of kernel and test
2018-10-23 20:52:25 +02:00
Cedric Nugteren
2dd539f911
Removed complex numbers support for CONVGEMM
2018-07-29 10:37:14 +02:00
Cedric Nugteren
2776d76176
Added interface of batched convolution as GEMM
2018-05-05 14:06:33 +02:00
Cedric Nugteren
bff64917bd
Fixed some small issues regarding PR#253
2018-03-03 10:43:12 +01:00
sivagnanamn
1433dc67f1
Added C API for getting GEMM temp buffer size
2018-03-03 03:00:17 +09:00
Cedric Nugteren
ef5008f5e4
Created the API and stubs for the HAD (hadamard-product) routines
2018-01-31 20:41:02 +01:00
Cedric Nugteren
9fb2c61b25
Added API and tests for new GemmStridedBatched routine
2018-01-07 14:27:15 +01:00
Cedric Nugteren
84ec50e29d
Added interface and stubs for the im2col routine
2017-07-02 12:10:22 +02:00
Cedric Nugteren
f151e56daa
Added the IxAMIN routines: absolute minimum version of IxAMAX
2017-05-12 20:01:33 -07:00
Cedric Nugteren
49e04c7fce
Added API and test infrastructure for the batched GEMM routine
2017-03-10 21:24:35 +01:00
Cedric Nugteren
fa0a9c689f
Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes
2017-03-08 20:10:20 +01:00
Cedric Nugteren
b114ea49a9
Added first naive version of the batched AXPY routine
2017-03-05 15:06:14 +01:00
Cedric Nugteren
f9a520b3af
Prepared generator for batched routines; added batched AXPY routine interface
2017-03-05 10:38:38 +01:00
Cedric Nugteren
ea6790665d
Merge branch 'development' into triangular_solvers
2017-02-26 14:51:45 +01:00
Cedric Nugteren
b7310036ed
Removed half-precision support from the TRSM routine; too unstable
2017-02-26 12:56:21 +01:00
Cedric Nugteren
d6538dfc25
Fixed the naming of the C API of OverrideParameters and fixed the description
2017-02-18 10:59:38 +01:00
Cedric Nugteren
cda449a5c3
Added a C interface to the OverrideParameters function; added some in-line comments to the API
2017-02-16 21:14:48 +01:00
Cedric Nugteren
f96fd372bc
Added initial version of a Netlib CBLAS implementation. TODO: Set correct buffer sizes
2016-10-25 14:28:52 +02:00
Cedric Nugteren
a670c4c4bf
All enums in the C API are now prefixed with CLBlast to avoid potential name clashes with other projects
2016-10-22 16:14:56 +02:00
Cedric Nugteren
4a5516aa78
Added extra error codes to reflect the more detailed error reporting of OpenCL functions
2016-10-22 15:46:29 +02:00
Ivan Shapovalov
b98af44fcf
treewide: use C++ exceptions properly
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.
Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.
However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
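The boundary handling described in the commit message above can be sketched as follows. This is a minimal illustration, not the library's actual code: `StatusCode`, `LibraryError`, `ScaleInternal`, `ScaleCpp`, and `ScaleC` are hypothetical names invented for this example, and the numeric codes are arbitrary. It assumes the internals report failures only by throwing, the C++ entry point translates library errors into codes while letting runtime errors such as `std::bad_alloc` propagate, and the C entry point catches everything.

```cpp
#include <exception>
#include <new>

// Hypothetical status codes for this sketch; values are arbitrary.
enum class StatusCode { kSuccess = 0, kInvalidValue = -30, kUnknownError = -2048 };

// Hypothetical library exception carrying the status code to report.
class LibraryError : public std::exception {
 public:
  explicit LibraryError(StatusCode status) : status_(status) {}
  StatusCode status() const noexcept { return status_; }
  const char* what() const noexcept override { return "library error"; }
 private:
  StatusCode status_;
};

// Internal routine: signals failure only by throwing, never by return value.
void ScaleInternal(int n) {
  if (n < 0) { throw LibraryError(StatusCode::kInvalidValue); }
  if (n > 1000) { throw std::bad_alloc(); }  // simulated allocation failure
}

// C++ entry point: library errors become status codes; C++ runtime errors
// such as std::bad_alloc are deliberately not caught and reach the caller.
StatusCode ScaleCpp(int n) {
  try {
    ScaleInternal(n);
    return StatusCode::kSuccess;
  } catch (const LibraryError& e) {
    return e.status();
  }
}

// C entry point: no exception may cross the C boundary, so everything else
// is caught with catch (...) and mapped to a wild-card error code.
extern "C" int ScaleC(int n) {
  try {
    ScaleInternal(n);
    return static_cast<int>(StatusCode::kSuccess);
  } catch (const LibraryError& e) {
    return static_cast<int>(e.status());
  } catch (...) {
    return static_cast<int>(StatusCode::kUnknownError);
  }
}
```

In this sketch a C++ caller of `ScaleCpp(2000)` sees the `std::bad_alloc` itself, while a C caller of `ScaleC(2000)` only sees the wild-card code.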
Cedric Nugteren
53deed298f
Added documentation and minor refactoring for the recent support of static library compilation
2016-10-15 17:11:08 +02:00
Shehzan Mohammed
0d958bf3b3
Fixes for static lib compilation on Windows
2016-10-14 18:45:34 -04:00
Cedric Nugteren
b330ab0866
Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library
2016-06-30 10:49:17 +02:00
Cedric Nugteren
afe8852eaa
Moved the test-for-valid-buffers function from the Routine class to separate functions in a separate file
2016-06-17 11:29:07 +02:00
Cedric Nugteren
52ccaf5b25
Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing
2016-06-16 18:07:46 +02:00
Cedric Nugteren
9f87455070
Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM
2016-05-25 13:29:53 +02:00
Cedric Nugteren
3e9a07f00a
Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2
2016-05-22 16:59:14 +02:00
Cedric Nugteren
95b828da12
Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV
2016-05-22 15:38:26 +02:00
Cedric Nugteren
803aaf3070
Added level-1 half-precision routines HSWAP/HSCAL/HCOPY/HAXPY/HDOT/HNRM2/HASUM/HSUM/iHAMAX/iHMAX/iHMIN
2016-05-22 14:47:14 +02:00
Cedric Nugteren
120c31a30f
Initial experimental version of the half-precision HAXPY routine
2016-05-13 20:49:34 +02:00
Cedric Nugteren
e113ff0852
Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX
2016-04-30 09:49:39 +02:00
Cedric Nugteren
877aad693f
Added FillCache: a function to pre-compile all kernels for a specific device
2016-04-29 23:33:12 +02:00
Cedric Nugteren
d9b21d7f49
Fixed the cache to store binaries instead of OpenCL programs
2016-04-28 21:14:17 +02:00
Cedric Nugteren
d7ddbdeb1f
Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX
2016-04-27 18:07:30 +02:00
Cedric Nugteren
8075934ca7
Added prototypes for non-BLAS routines: xSUM and IxMAX (non-absolute counterparts of xASUM and IxAMAX)
2016-04-27 17:06:19 +02:00
Cedric Nugteren
82be8f211c
Moved all cache-related functions to a separate file; added a ClearCompiledProgramCache function to clear the cache
2016-04-27 16:02:13 +02:00
Cedric Nugteren
3555cd0436
All CLBlast enum constants now have the same raw values as in the cblas standard
2016-04-27 11:37:55 +02:00
cnugteren
894983fc3c
Added prototype for IxAMAX routines
2016-04-20 21:11:33 -06:00
cnugteren
e0497807e2
Added prototype for xASUM routines
2016-04-13 21:44:49 -06:00
cnugteren
5c83217cf2
Added a wrapper for CBLAS libraries for performance/correctness testing
2016-04-01 22:36:39 -07:00
cnugteren
8c3c6db7d0
Merge branch 'level1_routines' into development
2016-03-30 21:37:56 -07:00
Cedric Nugteren
c1df786764
Added prototypes for the xROTM and xROTMG routines
2016-03-30 16:13:37 -07:00
Cedric Nugteren
6ecc0d089c
Added prototypes for the xROT and xROTG functions
2016-03-30 16:13:32 -07:00
Cedric Nugteren
1d5a702d9d
Added prototypes for ScNRM2/DzNRM2 routines
2016-03-25 10:30:38 +01:00
Cedric Nugteren
3876096c30
Added prototypes for SNRM2/DNRM2 routines
2016-03-25 10:00:40 +01:00
Cedric Nugteren
49822c8ead
Fixed the C-api export to be able to properly build a DLL on Windows
2016-03-23 20:49:28 +01:00
CNugteren
6105ad6f5b
Added interface of all level 2 routines
2015-09-17 17:05:45 +02:00
CNugteren
6307d2e5db
Added script to generate API interface and implementation automatically
2015-09-17 10:14:33 +02:00