Commit Graph

23 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)

Author SHA1 Message Date
Cedric Nugteren 503ab74f02 Fixed issue with not performing complex conjugation under certain cases when transposing 2018-07-31 21:49:37 +02:00
Cedric Nugteren 53198121ac Made FillMatrix and FillVector functions take a configurable local workgroup size 2018-05-27 12:03:32 +02:00
Cedric Nugteren 8258321a74 Now stores a shared_ptr to the Program class in the cache 2018-05-01 20:34:48 +02:00
Cedric Nugteren 99a4df88a6 Implemented the in-direct version of the strided-batched GEMM kernel 2018-01-08 21:07:01 +01:00
Cedric Nugteren f94d498a37 Moved compilation function to separate file; removed dependency of tuners of the CLBlast library 2017-11-17 20:57:46 +01:00
Cedric Nugteren 677afd3b96 Factored out the creation of the OpenCL header and the program compilation 2017-11-11 16:14:43 +01:00
Cedric Nugteren 44246053a5 Removed include of clpp11.hpp in places other than utilities.hpp 2017-10-09 19:41:40 +02:00
Cedric Nugteren 0a63621579 Moved functions from the header to the .cpp file to prevent compiling the same code multiple times 2017-08-12 15:59:14 +02:00
Cedric Nugteren 2fd04dae83 Added batched versions of the pad/copy/transpose kernels 2017-03-19 15:57:44 +01:00
Cedric Nugteren ea6790665d Merge branch 'development' into triangular_solvers 2017-02-26 14:51:45 +01:00
Cedric Nugteren df7638c305 Fixed an out-of-bounds memory access when filling a matrix with a constant 2017-02-26 14:31:05 +01:00
Cedric Nugteren 345a5feb9a Split the database into several smaller cached per-kernel databases (in preparation of per-kernel database overrides) 2017-02-12 12:02:39 +01:00
Cedric Nugteren c248f900c0 Merge branch 'development' into triangular_solvers 2017-02-05 22:18:59 +01:00
Cedric Nugteren 7c73ceb095 Added first (incomplete) version of TRSV routine 2017-01-29 17:02:00 +01:00
Ivan Shapovalov 1a1e863ab3 treewide: include clpp11.hpp first to silence deprecation warnings
Otherwise, cl.h gets included through clblast.h before clpp11.hpp.
2017-01-20 17:32:42 +03:00
Cedric Nugteren 4b3ffd9989 Added a first version of the diagonal block invert routine in preparation of TRSM 2017-01-15 17:30:00 +01:00
Cedric Nugteren 76d5d2ccfc Fixed a bug in the transpose-matrix function 2016-10-23 20:49:55 +02:00
Ivan Shapovalov b98af44fcf treewide: use C++ exceptions properly
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.

Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.

However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
Ivan Shapovalov ae3299da30 clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS 2016-07-22 11:15:52 +03:00
Ivan Shapovalov 2dd5ee3f75 clblast::RunKernel, cl::Kernel: take const vector as waitForEvents 2016-07-22 11:15:52 +03:00
Cedric Nugteren 066af4069b Removed an unused variable from the copy-transpose-pad function 2016-07-16 10:56:37 +02:00
Cedric Nugteren c87e877bf2 Now passing alpha/beta to the kernel as arguments as before fp16 support; in case of fp16 arguments are cast on host and in kernel 2016-07-10 20:32:01 +02:00
Cedric Nugteren f726fbdc9f Moved all headers into the source tree, changed headers to .hpp extension 2016-06-18 20:20:13 +02:00