Cedric Nugteren
cc5b475425
CUDA API now takes context and device in instead of stream
2017-10-12 12:20:43 +02:00
Cedric Nugteren
b901809345
Added first (untested) version of a CUDA API
2017-10-11 23:16:57 +02:00
Cedric Nugteren
e8f1de0265
Made the half-precision header OpenCL-independent
2017-10-09 18:30:19 +02:00
Cedric Nugteren
84ec50e29d
Added interface and stubs for the im2col routine
2017-07-02 12:10:22 +02:00
Cedric Nugteren
f151e56daa
Added the IxAMIN routines: absolute minimum version of IxAMAX
2017-05-12 20:01:33 -07:00
Cedric Nugteren
409a5a2ad0
Fixed a namespace clash with CUDA FP16 for the half-datatype
2017-04-17 16:47:15 +02:00
Cedric Nugteren
fb6c78ea07
Added a special override database for the Apple CPU implementation on OS X: this makes the test work, it does not focus on good performance
2017-04-07 07:37:30 +02:00
Cedric Nugteren
49e04c7fce
Added API and test infrastructure for the batched GEMM routine
2017-03-10 21:24:35 +01:00
Cedric Nugteren
fa0a9c689f
Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes
2017-03-08 20:10:20 +01:00
Cedric Nugteren
b114ea49a9
Added first naive version of the batched AXPY routine
2017-03-05 15:06:14 +01:00
Cedric Nugteren
f9a520b3af
Prepared generator for batched routines; added batched AXPY routine interface
2017-03-05 10:38:38 +01:00
Cedric Nugteren
ea6790665d
Merge branch 'development' into triangular_solvers
2017-02-26 14:51:45 +01:00
Cedric Nugteren
b7310036ed
Removed half-precision support from the TRSM routine; too unstable
2017-02-26 12:56:21 +01:00
Cedric Nugteren
d6538dfc25
Fixed the naming of the C API of OverrideParameters and fixed the description
2017-02-18 10:59:38 +01:00
Cedric Nugteren
cda449a5c3
Added a C interface to the OverrideParameters function; added some in-line comments to the API
2017-02-16 21:14:48 +01:00
Cedric Nugteren
08bfb75a9d
Added input-sanity checks for the OverrideParameters function
2017-02-16 21:12:50 +01:00
Cedric Nugteren
cdb3bb7166
Added first version of the OverrideParameters function
2017-02-13 20:53:06 +01:00
Cedric Nugteren
26ca071480
Minor changes to ensure full compatibility with the Netlib CBLAS API
2016-11-22 08:41:52 +01:00
Cedric Nugteren
eefe0df435
Made functions with scalar-buffers as output properly return values
2016-11-20 21:36:57 +01:00
Cedric Nugteren
8ae8ab06a2
Renamed the include and source files of the Netlib CBLAS API
2016-10-25 20:33:10 +02:00
Cedric Nugteren
729862e873
Fixed some issues preventing the Netlib CBLAS API from linking correctly
2016-10-25 19:56:42 +02:00
Cedric Nugteren
926aca53a0
Made the Netlib CBLAS API use the same enums with prefixes as the regular C API of CLBlast
2016-10-25 19:45:57 +02:00
Cedric Nugteren
f96fd372bc
Added initial version of a Netlib CBLAS implementation. TODO: Set correct buffer sizes
2016-10-25 14:28:52 +02:00
Cedric Nugteren
3b65eace0a
Merge branch 'development' into netlib_blas_api
...
Conflicts:
scripts/generator/generator.py
scripts/generator/generator/routine.py
2016-10-25 09:34:24 +02:00
Cedric Nugteren
a670c4c4bf
All enums in the C API are now prefixed with CLBlast to avoid potential name clashes with other projects
2016-10-22 16:14:56 +02:00
Cedric Nugteren
4a5516aa78
Added extra error codes to reflect the more detailed error reporting of OpenCL functions
2016-10-22 15:46:29 +02:00
Ivan Shapovalov
b98af44fcf
treewide: use C++ exceptions properly
...
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.
Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.
However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
Cedric Nugteren
9331442a56
Merge branch 'development' into netlib_blas_api
2016-10-16 11:43:03 +02:00
Cedric Nugteren
53deed298f
Added documentation and minor refactoring for the recent support of static library compilation
2016-10-15 17:11:08 +02:00
Shehzan Mohammed
0d958bf3b3
Fixes for static lib compilation on Windows
2016-10-14 18:45:34 -04:00
Cedric Nugteren
8a9d3cdf37
Added support for compiling the library, the client, and the samples under MSVC 2013
2016-10-10 22:45:39 +02:00
Cedric Nugteren
8d5747aa54
Made non-standard types void-pointers in the Netlib BLAS interface
2016-10-05 08:23:54 +02:00
Cedric Nugteren
a17b714c3e
Added first version of Netlib BLAS API header
2016-10-05 00:09:39 +02:00
Cedric Nugteren
b330ab0866
Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library
2016-06-30 10:49:17 +02:00
Cedric Nugteren
f726fbdc9f
Moved all headers into the source tree, changed headers to .hpp extension
2016-06-18 20:20:13 +02:00
Cedric Nugteren
bacb5d2bb2
Clean-up of the routine class, moved RunKernel to the routine/common file
2016-06-18 18:16:14 +02:00
Cedric Nugteren
7b4c0e1cf0
Removed the template from the Routine base-class
2016-06-18 14:56:55 +02:00
Cedric Nugteren
f9947b4d7f
Removed the precision argument from the routines in favor of a single templated function
2016-06-17 14:30:37 +02:00
Cedric Nugteren
536b7fe4bc
Removed the interface to the cache functions from the Routine class, calls them directly now
2016-06-17 13:57:50 +02:00
Cedric Nugteren
98a95c89fc
Moved the RunKernel and PadCopyTransposeMatrix functions out of the Routine class
2016-06-17 12:32:06 +02:00
Cedric Nugteren
520e28e7a7
Moved the ErrorIn function from the Routine class to the utilities header
2016-06-17 11:41:10 +02:00
Cedric Nugteren
afe8852eaa
Moved the test-for-valid-buffers function from the Routine class to separate functions in a separate file
2016-06-17 11:29:07 +02:00
Cedric Nugteren
52ccaf5b25
Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing
2016-06-16 18:07:46 +02:00
Cedric Nugteren
39b7dbc5e3
Added some constness to variables related to the GEMM routines
2016-06-15 12:34:05 +02:00
Cedric Nugteren
3e78a99355
Moved device vendor and type checks to a common header
2016-06-14 14:30:22 +02:00
Cedric Nugteren
6925003e45
Added global memory synchronisation for better cache performance on ARM Mali GPUs
2016-06-08 10:13:37 +02:00
Cedric Nugteren
137d1d8708
Added tuning parameters for 'GRID K520' and 'HD Graphics Skylake ULT GT2'
2016-06-01 09:39:33 +02:00
Cedric Nugteren
03182f9d07
Added half-precision tests for the clBLAS reference through conversion to single-precision
2016-05-26 23:36:19 +02:00
Cedric Nugteren
9f87455070
Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM
2016-05-25 13:29:53 +02:00
Cedric Nugteren
3e9a07f00a
Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2
2016-05-22 16:59:14 +02:00