60 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)

Author SHA1 Message Date
Cedric Nugteren d94d086d6f TBMV/TPMV/TRSV: Use the minimum x buffer size for copying to a temp buffer (#461) 2023-05-10 12:48:25 +02:00
Cedric Nugteren 4f24d92730 TRMV: Use the minimum x buffer size for copying to a temp buffer (#458) 2023-05-07 20:03:16 +02:00
Cedric Nugteren bf43dbb4ee Made last operation in TRSV and TRSM asynchronous, making the events not null 2018-08-13 22:58:44 +02:00
Cedric Nugteren 3115c15db5 Small refactoring of events in TRSV substitution routine 2018-08-13 22:58:01 +02:00
Cedric Nugteren 5702bff5ad Added error-checking for half-empty local work group sizes; fixed a minor TRSV global worksize issue 2018-05-31 22:37:06 +02:00
Cedric Nugteren 01d254c0b0 Added a check to return 'NotImplemented' error code in case of systems with < 16 LWGS for TRSV and TRSM 2018-05-27 18:38:47 +02:00
Cedric Nugteren 53198121ac Made FillMatrix and FillVector functions take a configurable local workgroup size 2018-05-27 12:03:32 +02:00
Cedric Nugteren b1f52f130c Updated the database to use the new TRSV and Invert tuners 2017-12-23 13:55:22 +01:00
Cedric Nugteren 44f7fa628a Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM 2017-10-27 22:01:15 +02:00
Cedric Nugteren ce369702d8 Added some missing const-ness 2017-04-07 07:34:32 +02:00
Cedric Nugteren 00281dad26 Fixed half-precision bugs in HTBMV/HTPMV/HTRMV/HSYR2K/HTRMM related to incorrect constants 2017-02-27 21:00:04 +01:00
Cedric Nugteren ea6790665d Merge branch 'development' into triangular_solvers 2017-02-26 14:51:45 +01:00
Cedric Nugteren 00eb55a2d4 Fixed a small bug in GEMV: unused kernel in parameter list 2017-02-13 20:48:32 +01:00
Cedric Nugteren c248f900c0 Merge branch 'development' into triangular_solvers 2017-02-05 22:18:59 +01:00
Cedric Nugteren c209dd7af9 Improved substitution kernels a bit; added complex support 2017-02-04 22:48:06 +01:00
Cedric Nugteren fec8c1a806 Completed a first STRSV implementation 2017-02-04 16:04:19 +01:00
Cedric Nugteren a6ba6470aa Added row-major support for TRSV 2017-02-04 14:25:27 +01:00
Cedric Nugteren 7c73ceb095 Added first (incomplete) version of TRSV routine 2017-01-29 17:02:00 +01:00
Ivan Shapovalov 5bcd92f297 Routine, Cache: generalize, reduce amount of copying in fast path
Implement a generalized Cache<K, V>. Two variants are provided: the
first one is based on std::map, using C++14-specific transparent
std::less<> and generalized std::map::find() to allow searching by tuple
of references. The second one is based on std::vector and O(n) lookup,
but remains C++11-compliant.
2017-01-24 11:56:15 +03:00
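The std::map variant described in the 5bcd92f297 message relies on heterogeneous lookup: with the transparent comparator std::less<>, find() accepts a tuple of references, so no owning key has to be built just to search. The following is a minimal sketch of that idea, not the project's actual Cache<K, V>; the names Key, KeyRef, BinaryCache, Lookup, and LookupDemo, as well as the int value type, are assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical owning key: the map stores its own copies of both strings.
using Key = std::tuple<std::string, std::string>;
// Hypothetical lookup key: refers to strings owned by the caller.
using KeyRef = std::tuple<const std::string&, const std::string&>;

// std::less<> (C++14) is transparent, which enables the templated
// std::map::find() overload for any type comparable with Key.
using BinaryCache = std::map<Key, int, std::less<>>;

// Returns a pointer to the cached value, or nullptr when there is no entry.
// The search key is a tuple of references, so no std::string is copied.
inline const int* Lookup(const BinaryCache& cache, const std::string& device,
                         const std::string& routine) {
  const auto it = cache.find(KeyRef{device, routine});
  return it == cache.end() ? nullptr : &it->second;
}

// Usage sketch: one stored entry, one hit, one miss.
inline void LookupDemo() {
  BinaryCache cache;
  cache.emplace(Key{"device-0", "Xtrsv"}, 7);
  const std::string device = "device-0";
  const std::string hit = "Xtrsv";
  const std::string miss = "Xgemv";
  assert(Lookup(cache, device, hit) != nullptr);
  assert(Lookup(cache, device, miss) == nullptr);
}
```

The C++11-compliant variant mentioned in the same message would replace the map with a std::vector of key/value pairs and a linear scan, trading lookup speed for compatibility.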
Cedric Nugteren a5fd2323b6 Added prototype for the TRSV routine 2017-01-20 11:30:32 +01:00
Ivan Shapovalov 56f300607b Routine: get rid of ::SetUp()
Since we now use C++ exceptions inside the implementation (and exceptions
can be thrown from constructors), there is no need for a separate
Routine::SetUp() function.

For this, we also change how the kernel source string is constructed.
The kernel-specific source code is now passed to the Routine ctor via
an initializer_list of C strings to avoid unnecessary data copying
while also working around C1091 of MSVC 2013.
2016-10-22 08:45:27 +03:00
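As a rough illustration of the 56f300607b message, the sketch below assembles one source string from an initializer_list of C strings, so each fragment stays a separate (shorter) literal and is copied only once into the final buffer. The function names JoinSources and JoinSourcesDemo are hypothetical; the real Routine constructor may differ in its details.

```cpp
#include <cassert>
#include <cstring>
#include <initializer_list>
#include <string>

// Concatenates source fragments into a single string. The total length is
// computed first so that the result is allocated only once.
inline std::string JoinSources(std::initializer_list<const char*> parts) {
  std::size_t total = 0;
  for (const char* part : parts) { total += std::strlen(part); }
  std::string source;
  source.reserve(total);
  for (const char* part : parts) { source += part; }
  return source;
}

// Usage sketch: two fragments become one kernel source string.
inline void JoinSourcesDemo() {
  const std::string source = JoinSources({"#define WGS 64\n", "// kernel body\n"});
  assert(source == "#define WGS 64\n// kernel body\n");
}
```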
Ivan Shapovalov b98af44fcf treewide: use C++ exceptions properly
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.

Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.

However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
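The boundary handling described in the b98af44fcf message can be sketched as follows: the C++ entry point converts only the library's own exceptions into status codes and lets other exceptions propagate, while the C entry point catches everything and maps it to a wild-card code. All names here (StatusCode, LibraryError, RunCpp, RunC, BoundaryDemo) and the numeric values are assumptions for illustration, not the project's actual API.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>

// Hypothetical status codes; the values are illustrative only.
enum class StatusCode { kSuccess = 0, kInvalidArgument = -1, kUnexpectedError = -2 };

// Hypothetical library exception that carries a status code.
class LibraryError : public std::exception {
 public:
  explicit LibraryError(StatusCode status) : status_(status) {}
  StatusCode status() const noexcept { return status_; }
  const char* what() const noexcept override { return "library error"; }
 private:
  StatusCode status_;
};

// C++ boundary: library errors become status codes; anything else
// (std::bad_alloc, std::logic_error, ...) is deliberately not caught here.
inline StatusCode RunCpp(const std::function<void()>& body) {
  try {
    body();
    return StatusCode::kSuccess;
  } catch (const LibraryError& e) {
    return e.status();
  }
}

// C boundary: no exception may cross into C code, so everything that
// escapes RunCpp is mapped to a wild-card error code.
inline int RunC(const std::function<void()>& body) {
  try {
    return static_cast<int>(RunCpp(body));
  } catch (...) {
    return static_cast<int>(StatusCode::kUnexpectedError);
  }
}

// Usage sketch: a successful call and an unexpected failure via the C path.
inline void BoundaryDemo() {
  const StatusCode ok = RunCpp([] {});
  assert(ok == StatusCode::kSuccess);
  const int wild = RunC([] { throw std::runtime_error("unexpected"); });
  assert(wild == static_cast<int>(StatusCode::kUnexpectedError));
}
```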
Cedric Nugteren 2582f0290a Moved the XgemvFast and XgemvFastRot tuning database into a separate file 2016-07-25 22:43:49 +02:00
Cedric Nugteren 75fe8235f7 Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enabling better memory performance 2016-07-23 10:20:11 +02:00
Cedric Nugteren c87e877bf2 Now passing alpha/beta to the kernel as arguments as before fp16 support; in case of fp16 arguments are cast on host and in kernel 2016-07-10 20:32:01 +02:00
Cedric Nugteren 61203453aa Renamed all C++ source files to .cpp to match the .hpp extension better 2016-06-19 13:55:49 +02:00
Cedric Nugteren f726fbdc9f Moved all headers into the source tree, changed headers to .hpp extension 2016-06-18 20:20:13 +02:00
Cedric Nugteren 7b4c0e1cf0 Removed the template from the Routine base-class 2016-06-18 14:56:55 +02:00
Cedric Nugteren f9947b4d7f Removed the precision argument from the routines in favor of a single templated function 2016-06-17 14:30:37 +02:00
Cedric Nugteren 536b7fe4bc Removed the interface to the cache functions from the Routine class, calls them directly now 2016-06-17 13:57:50 +02:00
Cedric Nugteren 98a95c89fc Moved the RunKernel and PadCopyTransposeMatrix functions out of the Routine class 2016-06-17 12:32:06 +02:00
Cedric Nugteren afe8852eaa Moved the test-for-valid-buffers function from the Routine class to separate functions in a separate file 2016-06-17 11:29:07 +02:00
Cedric Nugteren 3e9a07f00a Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2 2016-05-22 16:59:14 +02:00
Cedric Nugteren c8ff3f143f Prepared the GER kernels and tuner for half-precision support 2016-05-22 16:18:08 +02:00
Cedric Nugteren 95b828da12 Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV 2016-05-22 15:38:26 +02:00
Cedric Nugteren 88551b4005 Prepared the GEMV kernels and tuner for half-precision support 2016-05-22 15:22:54 +02:00
Cedric Nugteren d9b21d7f49 Fixed the cache to store binaries instead of OpenCL programs 2016-04-28 21:14:17 +02:00
cnugteren 1d3d38a261 Events are now properly implemented using event waiting list and asking the user to wait for event completion 2016-04-09 22:22:24 -06:00
cnugteren 90e237b97a Removed redundant queue synchronisation statements 2016-04-04 08:38:31 -07:00
Cedric Nugteren f4c09220c1 Fixed a bug in the GER-family of routines due to incorrect division of the workgroup size 2016-03-06 16:43:28 +01:00
Cedric Nugteren 306bf67660 Added preliminary support for xHPR2 and xSPR2 routines 2016-03-06 15:48:11 +01:00
Cedric Nugteren 60da54da5d Added preliminary support for xHER2 and xSYR2 routines 2016-03-02 21:18:01 +01:00
Cedric Nugteren 4a56822dcc Fixed a couple of correctness bugs in the Xher kernels 2016-02-28 15:49:59 +01:00
Cedric Nugteren e3545215a5 Added support for xHER, xHPR, xSYR, and xSPR routines 2016-02-28 14:16:48 +01:00
Cedric Nugteren 6dc44da07b Added support for xGERU and xGERC routines 2016-02-20 14:15:41 +01:00
Cedric Nugteren 8854a73127 Added XGER routine, kernel, and tuner 2016-02-20 12:40:01 +01:00
Cedric Nugteren 38c56bbde2 Split-up the XGEMV kernel in two parts 2016-02-08 19:43:34 +01:00
CNugteren 2b56c2c603 Added TRMV/TBMV/TPMV routines 2015-09-26 16:58:03 +02:00
CNugteren de6547a92b Added SBMV and SPMV routines 2015-09-19 18:01:19 +02:00
CNugteren 80da67d28b Added the HPMV routine 2015-09-19 17:40:38 +02:00