Cedric Nugteren
f9a520b3af
Prepared generator for batched routines; added batched AXPY routine interface
2017-03-05 10:38:38 +01:00
Cedric Nugteren
ea6790665d
Merge branch 'development' into triangular_solvers
2017-02-26 14:51:45 +01:00
Cedric Nugteren
b7310036ed
Removed half-precision support from the TRSM routine; too unstable
2017-02-26 12:56:21 +01:00
Cedric Nugteren
d6538dfc25
Fixed the naming of the C API of OverrideParameters and fixed the description
2017-02-18 10:59:38 +01:00
Cedric Nugteren
cda449a5c3
Added a C interface to the OverrideParameters function; added some in-line comments to the API
2017-02-16 21:14:48 +01:00
Cedric Nugteren
08bfb75a9d
Added input-sanity checks for the OverrideParameters function
2017-02-16 21:12:50 +01:00
Cedric Nugteren
cdb3bb7166
Added first version of the OverrideParameters function
2017-02-13 20:53:06 +01:00
Cedric Nugteren
26ca071480
Minor changes to ensure full compatibility with the Netlib CBLAS API
2016-11-22 08:41:52 +01:00
Cedric Nugteren
eefe0df435
Made functions with scalar-buffers as output properly return values
2016-11-20 21:36:57 +01:00
Cedric Nugteren
8ae8ab06a2
Renamed the include and source files of the Netlib CBLAS API
2016-10-25 20:33:10 +02:00
Cedric Nugteren
729862e873
Fixed some issues preventing the Netlib CBLAS API from linking correctly
2016-10-25 19:56:42 +02:00
Cedric Nugteren
926aca53a0
Made the Netlib CBLAS API use the same enums with prefixes as the regular C API of CLBlast
2016-10-25 19:45:57 +02:00
Cedric Nugteren
f96fd372bc
Added initial version of a Netlib CBLAS implementation. TODO: Set correct buffer sizes
2016-10-25 14:28:52 +02:00
Cedric Nugteren
3b65eace0a
Merge branch 'development' into netlib_blas_api
Conflicts:
scripts/generator/generator.py
scripts/generator/generator/routine.py
2016-10-25 09:34:24 +02:00
Cedric Nugteren
a670c4c4bf
All enums in the C API are now prefixed with CLBlast to avoid potential name clashes with other projects
2016-10-22 16:14:56 +02:00
Cedric Nugteren
4a5516aa78
Added extra error codes to reflect the more detailed error reporting of OpenCL functions
2016-10-22 15:46:29 +02:00
Ivan Shapovalov
b98af44fcf
treewide: use C++ exceptions properly
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.
Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.
However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
Cedric Nugteren
9331442a56
Merge branch 'development' into netlib_blas_api
2016-10-16 11:43:03 +02:00
Cedric Nugteren
53deed298f
Added documentation and minor refactoring for the recent support of static library compilation
2016-10-15 17:11:08 +02:00
Shehzan Mohammed
0d958bf3b3
Fixes for static lib compilation on Windows
2016-10-14 18:45:34 -04:00
Cedric Nugteren
8a9d3cdf37
Added support for compiling the library, the client, and the samples under MSVC 2013
2016-10-10 22:45:39 +02:00
Cedric Nugteren
8d5747aa54
Made non-standard types void-pointers in the Netlib BLAS interface
2016-10-05 08:23:54 +02:00
Cedric Nugteren
a17b714c3e
Added first version of Netlib BLAS API header
2016-10-05 00:09:39 +02:00
Cedric Nugteren
b330ab0866
Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library
2016-06-30 10:49:17 +02:00
Cedric Nugteren
f726fbdc9f
Moved all headers into the source tree, changed headers to .hpp extension
2016-06-18 20:20:13 +02:00
Cedric Nugteren
bacb5d2bb2
Clean-up of the routine class, moved RunKernel to the routine/common file
2016-06-18 18:16:14 +02:00
Cedric Nugteren
7b4c0e1cf0
Removed the template from the Routine base-class
2016-06-18 14:56:55 +02:00
Cedric Nugteren
f9947b4d7f
Removed the precision argument from the routines in favor of a single templated function
2016-06-17 14:30:37 +02:00
Cedric Nugteren
536b7fe4bc
Removed the interface to the cache functions from the Routine class, calls them directly now
2016-06-17 13:57:50 +02:00
Cedric Nugteren
98a95c89fc
Moved the RunKernel and PadCopyTransposeMatrix functions out of the Routine class
2016-06-17 12:32:06 +02:00
Cedric Nugteren
520e28e7a7
Moved the ErrorIn function from the Routine class to the utilities header
2016-06-17 11:41:10 +02:00
Cedric Nugteren
afe8852eaa
Moved the test-for-valid-buffers function from the Routine class to separate functions in a separate file
2016-06-17 11:29:07 +02:00
Cedric Nugteren
52ccaf5b25
Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing
2016-06-16 18:07:46 +02:00
Cedric Nugteren
39b7dbc5e3
Added some constness to variables related to the GEMM routines
2016-06-15 12:34:05 +02:00
Cedric Nugteren
3e78a99355
Moved device vendor and type checks to a common header
2016-06-14 14:30:22 +02:00
Cedric Nugteren
6925003e45
Added global memory synchronisation for better cache performance on ARM Mali GPUs
2016-06-08 10:13:37 +02:00
Cedric Nugteren
137d1d8708
Added tuning parameters for 'GRID K520' and 'HD Graphics Skylake ULT GT2'
2016-06-01 09:39:33 +02:00
Cedric Nugteren
03182f9d07
Added half-precision tests for the clBLAS reference through conversion to single-precision
2016-05-26 23:36:19 +02:00
Cedric Nugteren
9f87455070
Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM
2016-05-25 13:29:53 +02:00
Cedric Nugteren
3e9a07f00a
Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2
2016-05-22 16:59:14 +02:00
Cedric Nugteren
f0cb3fdc81
Fixed tuning results for half-precision; added first results for the xGER kernels
2016-05-22 16:46:05 +02:00
Cedric Nugteren
c8ff3f143f
Prepared the GER kernels and tuner for half-precision support
2016-05-22 16:18:08 +02:00
Cedric Nugteren
95b828da12
Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV
2016-05-22 15:38:26 +02:00
Cedric Nugteren
b6268d0c22
Added first tuning results for the half-precision xGEMV kernels
2016-05-22 15:29:05 +02:00
Cedric Nugteren
88551b4005
Prepared the GEMV kernels and tuner for half-precision support
2016-05-22 15:22:54 +02:00
Cedric Nugteren
803aaf3070
Added level-1 half-precision routines HSWAP/HSCAL/HCOPY/HAXPY/HDOT/HNRM2/HASUM/HSUM/iHAMAX/iHMAX/iHMIN
2016-05-22 14:47:14 +02:00
Cedric Nugteren
3c9e63c054
Added first tuning results for the half-precision xDOT kernels
2016-05-22 14:43:25 +02:00
Cedric Nugteren
489c5d76cf
Merged in latest changes from 0.7.1 release
2016-05-18 21:32:56 +02:00
Cedric Nugteren
7a3b695db7
Added half precision tuning results for supporting kernels (pad, copy, transpose, padtranspose)
2016-05-16 12:45:10 +02:00
Cedric Nugteren
4b6bdd83a2
Added header with conversions from and to half-precision floating-point
2016-05-15 20:13:57 +02:00