92 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)

Author SHA1 Message Date
Cedric Nugteren c0e41b87cb Fixed an issue for unequal MWG and NWG and the new GEMMK == 1 kernel 2018-11-30 20:23:26 +01:00
Cedric Nugteren 9bedaa752d Fixed an MSVC compilation error due to large strings 2018-09-15 17:35:26 +02:00
Cedric Nugteren bf43dbb4ee Made last operation in TRSV and TRSM asynchronous, making the events not null 2018-08-13 22:58:44 +02:00
Cedric Nugteren 7c3431a72a Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when barriers are present 2018-06-01 20:59:44 +02:00
Cedric Nugteren 01d254c0b0 Added a check to return 'NotImplemented' error code in case of systems with < 16 LWGS for TRSV and TRSM 2018-05-27 18:38:47 +02:00
Cedric Nugteren 53198121ac Made FillMatrix and FillVector functions take a configurable local workgroup size 2018-05-27 12:03:32 +02:00
Cedric Nugteren 458e6717a9 Expressed HER2K as two HERK calls 2018-04-18 20:58:29 +02:00
Cedric Nugteren dcce23d938 Expressed SYR2K as two SYRK calls 2018-04-18 20:29:28 +02:00
Cedric Nugteren ef6b1207df Updated HERK and SYRK to follow the GEMM style and functions to make it work with the new kernel 2018-04-17 21:13:28 +02:00
Cedric Nugteren 93610a9cba Fixed some failing tests for GEMM and batched GEMM routines 2018-04-15 12:53:32 +02:00
Cedric Nugteren 0dff7f1ac4 Made GEMM rotation expectations kernel-specific 2018-04-13 22:27:11 +02:00
Cedric Nugteren ad197da08d Fixed the CUDA interface: replaced nullptr with 0 2018-01-06 13:38:44 +01:00
Cedric Nugteren ad1227c4f2 Added optional temp-buffer argument to C++ interface of GEMM 2017-12-30 18:45:06 +01:00
Cedric Nugteren 6d1e30e61f Added interface to compute the required temporary buffer size for GEMM 2017-12-28 14:46:45 +01:00
Cedric Nugteren aaea9474a1 Factored out argument processing from the GEMM routine 2017-12-28 13:56:18 +01:00
Cedric Nugteren 74792ce96c Refactored GEMM code in preparation of separate temp-buffer computation 2017-12-28 11:08:10 +01:00
Cedric Nugteren 4a58efc130 Fixed error C1091 in MSVC 2013 2017-12-10 16:40:59 +01:00
Cedric Nugteren b4d3a50f19 Split GEMM kernel into 4 files instead of 3 due to MSVC 2013 string length limit 2017-12-10 16:09:09 +01:00
Cedric Nugteren 9b0a435fb0 Integrated the GEMM routine tuner for kernel selection; added first tuning results 2017-11-02 21:47:14 +01:00
Cedric Nugteren fa6e5e67f5 Fixed a bug when using the matrix A-offset argument for the TRSM routine 2017-10-27 22:12:30 +02:00
Cedric Nugteren 449577cf07 Reduced TRSM block-size for better numerical stability 2017-10-27 22:07:43 +02:00
Cedric Nugteren d49aae236e Fixed a bug in TRSM routine due to missing event synchronisations after GEMM calls 2017-10-25 20:35:39 +02:00
Cedric Nugteren 86b80cdc98 Fixed a small typo 2017-10-07 18:39:32 +02:00
Cedric Nugteren 375193fe4e Indirect GEMM implementation now uses only 1 larger temporary buffer instead of up to 3 optional ones 2017-10-03 21:55:21 +02:00
Cedric Nugteren e5eb6b1d3a Merge pull request #173 from mcian/PSO_params
Add PSO parameters support and search strategy selection from command…
2017-08-21 20:06:29 +02:00
mcian 0b4aa109f8 Use cltune::SearchMethod enum instead of int values 2017-08-09 16:05:25 +02:00
mcian 99afdcd908 Restore direct GEMM to previous version 2017-07-31 14:06:23 +02:00
Cedric Nugteren 0ea16a0e63 Minor optimization for the direct GEMM kernel: don't ceil m and n unnecessarily high 2017-07-25 20:53:12 +02:00
Cedric Nugteren 3070b502b5 Fixed an overflow bug on 32-bit systems when choosing a GEMM kernel 2017-06-18 20:51:11 +02:00
Cedric Nugteren 8400ee3a09 Fixed a TRSM issue caused by incorrect block size calculation 2017-05-15 22:04:55 +02:00
Cedric Nugteren 86e8df60f1 Fixed a bug in the TRSM routine; tests now pass 2017-05-12 17:43:56 -07:00
Cedric Nugteren 7b8f8fce68 Added initial naive version of the batched GEMM routine based on the direct GEMM kernel 2017-03-11 16:02:45 +01:00
Cedric Nugteren 3fc73851f7 Added proper support for the b_offset argument in TRSM 2017-03-01 21:23:33 +01:00
Cedric Nugteren 00281dad26 Fixed half-precision bugs in HTBMV/HTPMV/HTRMV/HSYR2K/HTRMM related to incorrect constants 2017-02-27 21:00:04 +01:00
Cedric Nugteren e09c26c706 Split up the GEMM kernel further to prevent C1091 in MSVC 2017-02-26 15:03:12 +01:00
Cedric Nugteren df7638c305 Fixed an out-of-bounds memory access when filling a matrix with a constant 2017-02-26 14:31:05 +01:00
Cedric Nugteren 2f2a510c38 Implemented a simple row-major to col-major problem conversion for TRSM 2017-02-24 21:08:44 +01:00
Cedric Nugteren 1e5b5157bc Fixed a few issues with the TRSM routine; some tests still failing 2017-02-22 20:31:33 +01:00
Cedric Nugteren c248f900c0 Merge branch 'development' into triangular_solvers 2017-02-05 22:18:59 +01:00
Ivan Shapovalov 5bcd92f297 Routine, Cache: generalize, reduce amount of copying in fast path
Implement a generalized Cache<K, V>. Two variants are provided: the
first one is based on std::map, using C++14-specific transparent
std::less<> and generalized std::map::find() to allow searching by tuple
of references. The second one is based on std::vector and O(n) lookup,
but remains C++11-compliant.
2017-01-24 11:56:15 +03:00
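The two cache variants described in commit 5bcd92f297 can be sketched roughly as follows. This is an illustration only, not the code from that commit: the names `Key`, `KeyRef`, `MapCache` and `VectorCache` are hypothetical, and the value type is simplified to a string. The first class relies on the C++14 transparent comparator `std::less<>` so that `std::map::find()` accepts a tuple of references; the second uses a linear scan and needs only C++11.

```cpp
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical key types: the stored key owns its parts, while the lookup
// key only refers to them, so a lookup copies nothing.
using Key = std::tuple<std::string, int>;
using KeyRef = std::tuple<const std::string&, const int&>;

// Variant 1 (C++14): std::map with the transparent comparator std::less<>.
class MapCache {
 public:
  void Store(Key key, std::string value) {
    map_.emplace(std::move(key), std::move(value));
  }
  // Returns a pointer to the cached value, or nullptr when absent.
  const std::string* Find(const KeyRef& key) const {
    const auto it = map_.find(key);  // heterogeneous lookup, no Key is built
    return it == map_.end() ? nullptr : &it->second;
  }

 private:
  std::map<Key, std::string, std::less<>> map_;
};

// Variant 2 (C++11): std::vector with an O(n) linear search.
class VectorCache {
 public:
  void Store(Key key, std::string value) {
    entries_.emplace_back(std::move(key), std::move(value));
  }
  const std::string* Find(const KeyRef& key) const {
    for (const auto& entry : entries_) {
      if (entry.first == key) { return &entry.second; }  // tuple == tuple
    }
    return nullptr;
  }

 private:
  std::vector<std::pair<Key, std::string>> entries_;
};
```

In both variants the lookup key must refer to objects that outlive the call to `Find`, since `KeyRef` stores references rather than copies.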
Cedric Nugteren df9a77d74d Added first version of the TRSM routine based on the diagonal invert kernel 2017-01-18 21:29:59 +01:00
Cedric Nugteren 681a465b35 Prepared for the addition of the TRSM triangular solver kernel 2016-12-18 12:30:16 +01:00
Cedric Nugteren 6b533dda1c Fixed a bug when using offsets in the direct GEMM kernels 2016-12-18 11:54:32 +01:00
Cedric Nugteren d8af24e388 Now correctly tests for validity of the B matrix in the TRMM routine 2016-11-20 16:27:54 +01:00
Cedric Nugteren 2f0697564f Fixed a bug in the TRMM routine caused by overwriting input data before consuming everything 2016-11-20 15:05:42 +01:00
Cedric Nugteren 280698d076 Merge pull request #117 from intelfx/exceptions
Convert to use C++ exceptions internally
2016-10-22 15:05:12 +02:00
Cedric Nugteren db17b1fbe9 Fixed a bug in the SYRK/SYR2K/HERK/HER2K routines that would occur with specific tuning parameters 2016-10-22 10:41:02 +02:00
Ivan Shapovalov 56f300607b Routine: get rid of ::SetUp()
Since we now use C++ exceptions inside the implementation (and exceptions
can be thrown from constructors), there is no need for a separate
Routine::SetUp() function.

For this, we also change the way the kernel source string is constructed.
The kernel-specific source code is now passed to the Routine ctor via
an initializer_list of C strings to avoid unnecessary data copying
while also working around C1091 of MSVC 2013.
2016-10-22 08:45:27 +03:00
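The idea of passing the kernel source as a list of C strings (commit 56f300607b) can be sketched as below. The function name `BuildSource` is hypothetical and the real constructor signature is not shown in this log; the point is only that several short string literals, each below the compiler's literal length limit, are joined at run time into one source string.

```cpp
#include <initializer_list>
#include <string>

// Joins several C-string fragments into one source string. Each fragment can
// stay below a per-literal length limit such as the one behind MSVC C1091.
std::string BuildSource(std::initializer_list<const char*> parts) {
  std::string source;
  for (const char* part : parts) {
    source += part;  // the only copy happens here, into the final string
  }
  return source;
}
```

A caller would write something like `BuildSource({part1, part2, part3})`, where each argument is a separate string literal or a pointer to one.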
Ivan Shapovalov b98af44fcf treewide: use C++ exceptions properly
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.

Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.

However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
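The layering described in commit b98af44fcf can be sketched as follows, under stated assumptions: `StatusCode`, `LibraryError`, `ScaleInternal`, `Scale` and `ScaleC` are hypothetical names made up for this illustration, not the library's API. Internal code throws; the C++ entry point converts only the library's own exceptions into error codes and lets `std::bad_alloc` and similar errors propagate; the C entry point catches everything and maps unknown exceptions to a wild-card code.

```cpp
#include <exception>
#include <new>

// Hypothetical status codes for this sketch.
enum class StatusCode { kSuccess = 0, kInvalidArgument = -1, kUnexpectedError = -2 };

// Hypothetical library exception carrying a status code.
class LibraryError : public std::exception {
 public:
  explicit LibraryError(StatusCode status) : status_(status) {}
  StatusCode status() const noexcept { return status_; }
  const char* what() const noexcept override { return "library error"; }

 private:
  StatusCode status_;
};

// Internal routine: reports failures by throwing, never by return code.
void ScaleInternal(int n) {
  if (n < 0) { throw LibraryError(StatusCode::kInvalidArgument); }
  if (n > 1000) { throw std::bad_alloc(); }  // stands in for a runtime failure
}

// C++ entry point: converts library exceptions into error codes. Runtime
// errors such as std::bad_alloc are deliberately not caught here.
StatusCode Scale(int n) {
  try {
    ScaleInternal(n);
    return StatusCode::kSuccess;
  } catch (const LibraryError& e) {
    return e.status();
  }
}

// C entry point: no exception may cross the C boundary, so everything is
// caught and unknown exceptions become a wild-card error code.
extern "C" int ScaleC(int n) {
  try {
    ScaleInternal(n);
    return static_cast<int>(StatusCode::kSuccess);
  } catch (const LibraryError& e) {
    return static_cast<int>(e.status());
  } catch (...) {
    return static_cast<int>(StatusCode::kUnexpectedError);
  }
}
```

With this split, existing callers that check return codes keep working, while the internals can rely on RAII and constructors that throw.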
Cedric Nugteren de77f00e8c Fixed an issue with the length of the GEMM OpenCL string for both MSVC 2013 and 2015 2016-10-10 22:23:33 +02:00