Cedric Nugteren
503ab74f02
Fixed an issue where complex conjugation was not performed in certain cases when transposing
2018-07-31 21:49:37 +02:00
Cedric Nugteren
53198121ac
Made FillMatrix and FillVector functions take a configurable local workgroup size
2018-05-27 12:03:32 +02:00
Cedric Nugteren
8258321a74
Now stores a shared_ptr to the Program class in the cache
2018-05-01 20:34:48 +02:00
Cedric Nugteren
99a4df88a6
Implemented the in-direct version of the strided-batched GEMM kernel
2018-01-08 21:07:01 +01:00
Cedric Nugteren
f94d498a37
Moved the compilation function to a separate file; removed the tuners' dependency on the CLBlast library
2017-11-17 20:57:46 +01:00
Cedric Nugteren
677afd3b96
Factored out the creation of the OpenCL header and the program compilation
2017-11-11 16:14:43 +01:00
Cedric Nugteren
44246053a5
Removed include of clpp11.hpp in places other than utilities.hpp
2017-10-09 19:41:40 +02:00
Cedric Nugteren
0a63621579
Moved functions from the header to the .cpp file to prevent compiling the same code multiple times
2017-08-12 15:59:14 +02:00
Cedric Nugteren
2fd04dae83
Added batched versions of the pad/copy/transpose kernels
2017-03-19 15:57:44 +01:00
Cedric Nugteren
ea6790665d
Merge branch 'development' into triangular_solvers
2017-02-26 14:51:45 +01:00
Cedric Nugteren
df7638c305
Fixed an out-of-bounds memory access when filling a matrix with a constant
2017-02-26 14:31:05 +01:00
Cedric Nugteren
345a5feb9a
Split the database into several smaller cached per-kernel databases (in preparation for per-kernel database overrides)
2017-02-12 12:02:39 +01:00
Cedric Nugteren
c248f900c0
Merge branch 'development' into triangular_solvers
2017-02-05 22:18:59 +01:00
Cedric Nugteren
7c73ceb095
Added first (incomplete) version of TRSV routine
2017-01-29 17:02:00 +01:00
Ivan Shapovalov
1a1e863ab3
treewide: include clpp11.hpp first to silence deprecation warnings
Otherwise, cl.h gets included through clblast.h before clpp11.hpp.
2017-01-20 17:32:42 +03:00
Cedric Nugteren
4b3ffd9989
Added a first version of the diagonal block invert routine in preparation for TRSM
2017-01-15 17:30:00 +01:00
Cedric Nugteren
76d5d2ccfc
Fixed a bug in the transpose-matrix function
2016-10-23 20:49:55 +02:00
Ivan Shapovalov
b98af44fcf
treewide: use C++ exceptions properly
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.
Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.
However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
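
The error-handling scheme described in commit b98af44fcf can be illustrated with a minimal sketch. This is not the library's actual code: `StatusCode`, `LibraryError`, `ScaleInternal`, `Scale`, and `ScaleC` are hypothetical names invented for illustration, and the numeric codes are assumptions. It shows internal code throwing exceptions, the C++ entry point translating only library exceptions into error codes, and the C entry point catching everything and mapping it to a wild-card code.

```cpp
#include <new>
#include <stdexcept>

// Hypothetical status codes (illustrative values, not the library's real enum).
enum class StatusCode { kSuccess = 0, kInvalidArgument = -1, kUnexpectedError = -2 };

// Hypothetical library exception that carries a status code.
class LibraryError : public std::runtime_error {
 public:
  LibraryError(StatusCode status, const char* what)
      : std::runtime_error(what), status_(status) {}
  StatusCode status() const { return status_; }
 private:
  StatusCode status_;
};

// Sizes above this limit simulate an allocation failure in the sketch.
constexpr int kSimulatedAllocationLimit = 1000;

// Internal routine: reports every failure by throwing, never by return code.
void ScaleInternal(int n) {
  if (n < 0) { throw LibraryError(StatusCode::kInvalidArgument, "n must be non-negative"); }
  if (n > kSimulatedAllocationLimit) { throw std::bad_alloc(); }
}

// C++ entry point: translates library exceptions into error codes.
// Other exceptions (such as std::bad_alloc) are deliberately not caught here.
StatusCode Scale(int n) {
  try {
    ScaleInternal(n);
    return StatusCode::kSuccess;
  } catch (const LibraryError& e) {
    return e.status();
  }
}

// C entry point: no exception may cross the C boundary, so anything that
// escapes the C++ entry point is mapped to a wild-card error code.
extern "C" int ScaleC(int n) {
  try {
    return static_cast<int>(Scale(n));
  } catch (...) {
    return static_cast<int>(StatusCode::kUnexpectedError);
  }
}
```

In this sketch, a caller of `Scale` with an oversized `n` would see `std::bad_alloc` propagate, whereas a caller of `ScaleC` would receive the wild-card code instead.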
Ivan Shapovalov
ae3299da30
clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS
2016-07-22 11:15:52 +03:00
Ivan Shapovalov
2dd5ee3f75
clblast::RunKernel, cl::Kernel: take const vector as waitForEvents
2016-07-22 11:15:52 +03:00
Cedric Nugteren
066af4069b
Removed an unused variable from the copy-transpose-pad function
2016-07-16 10:56:37 +02:00
Cedric Nugteren
c87e877bf2
Now passing alpha/beta to the kernel as arguments, as was done before fp16 support; for fp16, the arguments are cast on the host and in the kernel
2016-07-10 20:32:01 +02:00
Cedric Nugteren
f726fbdc9f
Moved all headers into the source tree, changed headers to .hpp extension
2016-06-18 20:20:13 +02:00