Cedric Nugteren
faa2109707
Bump to version 1.6.2 ( #527 )
2024-02-09 21:38:37 +01:00
Cedric Nugteren
e3ce21bb93
Bump to v1.6.1 ( #496 )
2023-07-09 11:24:24 +02:00
Cedric Nugteren
b0b302889c
Update to version 1.6.0 ( #475 )
2023-05-21 20:51:05 +02:00
Cedric Nugteren
0de212a56b
Update to version 1.5.3
2022-09-22 22:07:33 +02:00
Gard Spreemann
3d3492646c
Correct capitalization typo
...
The CLBlastConfig.cmake file was installed to a directory named
CLBLast (notice second capital l), which can cause issues for CMake's
search path when looking for CLBlast on the system.
This commit also fixes other occurrences of the wrong capitalization,
all of it purely cosmetic (i.e. in comments).
2021-04-30 10:27:22 +02:00
Cedric Nugteren
396ac0278a
Added CLBLAST_VERSION_MAJOR/MINOR/PATCH defines in headers to store version numbering
2020-05-12 14:43:25 +02:00
Koichi Akabe
032e3b0cc0
Add kernel_mode option to im2col, col2im, and convgemm functions
2018-11-12 10:12:07 +09:00
Cedric Nugteren
d45911b61d
Added groundwork for col2im algorithm plus first non-working version of kernel and test
2018-10-23 20:52:25 +02:00
Cedric Nugteren
2dd539f911
Removed complex numbers support for CONVGEMM
2018-07-29 10:37:14 +02:00
Cedric Nugteren
2776d76176
Added interface of batched convolution as GEMM
2018-05-05 14:06:33 +02:00
Cedric Nugteren
bff64917bd
Fixed some small issues regarding PR#253
2018-03-03 10:43:12 +01:00
sivagnanamn
1433dc67f1
Added C API for getting GEMM temp buffer size
2018-03-03 03:00:17 +09:00
Cedric Nugteren
ef5008f5e4
Created the API and stubs for the HAD (hadamard-product) routines
2018-01-31 20:41:02 +01:00
Cedric Nugteren
9fb2c61b25
Added API and tests for new GemmStridedBatched routine
2018-01-07 14:27:15 +01:00
Cedric Nugteren
84ec50e29d
Added interface and stubs for the im2col routine
2017-07-02 12:10:22 +02:00
Cedric Nugteren
f151e56daa
Added the IxAMIN routines: absolute minimum version of IxAMAX
2017-05-12 20:01:33 -07:00
Cedric Nugteren
49e04c7fce
Added API and test infrastructure for the batched GEMM routine
2017-03-10 21:24:35 +01:00
Cedric Nugteren
fa0a9c689f
Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes
2017-03-08 20:10:20 +01:00
Cedric Nugteren
b114ea49a9
Added first naive version of the batched AXPY routine
2017-03-05 15:06:14 +01:00
Cedric Nugteren
f9a520b3af
Prepared generator for batched routines; added batched AXPY routine interface
2017-03-05 10:38:38 +01:00
Cedric Nugteren
ea6790665d
Merge branch 'development' into triangular_solvers
2017-02-26 14:51:45 +01:00
Cedric Nugteren
b7310036ed
Removed half-precision support from the TRSM routine; too unstable
2017-02-26 12:56:21 +01:00
Cedric Nugteren
d6538dfc25
Fixed the naming of the C API of OverrideParameters and fixed the description
2017-02-18 10:59:38 +01:00
Cedric Nugteren
cda449a5c3
Added a C interface to the OverrideParameters function; added some in-line comments to the API
2017-02-16 21:14:48 +01:00
Cedric Nugteren
f96fd372bc
Added initial version of a Netlib CBLAS implementation. TODO: Set correct buffer sizes
2016-10-25 14:28:52 +02:00
Cedric Nugteren
a670c4c4bf
All enums in the C API are now prefixed with CLBlast to avoid potential name clashes with other projects
2016-10-22 16:14:56 +02:00
Cedric Nugteren
4a5516aa78
Added extra error codes to reflect the more detailed error reporting of OpenCL functions
2016-10-22 15:46:29 +02:00
Ivan Shapovalov
b98af44fcf
treewide: use C++ exceptions properly
...
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.
Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.
However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
Cedric Nugteren
53deed298f
Added documentation and minor refactoring for the recent support of static library compilation
2016-10-15 17:11:08 +02:00
Shehzan Mohammed
0d958bf3b3
Fixes for static lib compilation on Windows
2016-10-14 18:45:34 -04:00
Cedric Nugteren
b330ab0866
Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library
2016-06-30 10:49:17 +02:00
Cedric Nugteren
afe8852eaa
Moved the test-for-valid-buffers function from the Routine class to separate functions in a separate file
2016-06-17 11:29:07 +02:00
Cedric Nugteren
52ccaf5b25
Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing
2016-06-16 18:07:46 +02:00
Cedric Nugteren
9f87455070
Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM
2016-05-25 13:29:53 +02:00
Cedric Nugteren
3e9a07f00a
Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2
2016-05-22 16:59:14 +02:00
Cedric Nugteren
95b828da12
Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV
2016-05-22 15:38:26 +02:00
Cedric Nugteren
803aaf3070
Added level-1 half-precision routines HSWAP/HSCAL/HCOPY/HAXPY/HDOT/HNRM2/HASUM/HSUM/iHAMAX/iHMAX/iHMIN
2016-05-22 14:47:14 +02:00
Cedric Nugteren
120c31a30f
Initial experimental version of the half-precision HAXPY routine
2016-05-13 20:49:34 +02:00
Cedric Nugteren
e113ff0852
Added non-aboslute minimum counter-part IxMIN of the BLAS routine IxAMAX
2016-04-30 09:49:39 +02:00
Cedric Nugteren
877aad693f
Added FillCache: a function to pre-compile all kernels for a specific device
2016-04-29 23:33:12 +02:00
Cedric Nugteren
d9b21d7f49
Fixed the cache to store binaries instead of OpenCL programs
2016-04-28 21:14:17 +02:00
Cedric Nugteren
d7ddbdeb1f
Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX
2016-04-27 18:07:30 +02:00
Cedric Nugteren
8075934ca7
Added prototypes for non-BLAS routines: xSUM and IxMAX (non-absolute counterparts of xASUM and IxAMAX)
2016-04-27 17:06:19 +02:00
Cedric Nugteren
82be8f211c
Moved all cache-related functions to a separate file; added a ClearCompiledProgramCache function to clear the cache
2016-04-27 16:02:13 +02:00
Cedric Nugteren
3555cd0436
All CLBlast enum constants now have the same raw values as in the cblas standard
2016-04-27 11:37:55 +02:00
cnugteren
894983fc3c
Added prototype for ixAMAX routines
2016-04-20 21:11:33 -06:00
cnugteren
e0497807e2
Added prototype for xASUM routines
2016-04-13 21:44:49 -06:00
cnugteren
5c83217cf2
Added a wrapper for CBLAS libraries for performance/correctness testing
2016-04-01 22:36:39 -07:00
cnugteren
8c3c6db7d0
Merge branch 'level1_routines' into development
2016-03-30 21:37:56 -07:00
Cedric Nugteren
c1df786764
Added prototypes for the xROTM and xROTMG routines
2016-03-30 16:13:37 -07:00