Cedric Nugteren
c0e41b87cb
Fixed an issue for unequal MWG and NWG and the new GEMMK == 1 kernel
2018-11-30 20:23:26 +01:00
Cedric Nugteren
9bedaa752d
Fixed an MSVC compilation error due to large strings
2018-09-15 17:35:26 +02:00
Cedric Nugteren
bf43dbb4ee
Made last operation in TRSV and TRSM asynchronous, making the events not null
2018-08-13 22:58:44 +02:00
Cedric Nugteren
7c3431a72a
Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when barriers are present
2018-06-01 20:59:44 +02:00
Cedric Nugteren
01d254c0b0
Added a check to return 'NotImplemented' error code in case of systems with < 16 LWGS for TRSV and TRSM
2018-05-27 18:38:47 +02:00
Cedric Nugteren
53198121ac
Made FillMatrix and FillVector functions take a configurable local workgroup size
2018-05-27 12:03:32 +02:00
Cedric Nugteren
458e6717a9
Expressed HER2K as two HERK calls
2018-04-18 20:58:29 +02:00
Cedric Nugteren
dcce23d938
Expressed SYR2K as two SYRK calls
2018-04-18 20:29:28 +02:00
Cedric Nugteren
ef6b1207df
Updated HERK and SYRK to follow the GEMM style and functions to make it work with the new kernel
2018-04-17 21:13:28 +02:00
Cedric Nugteren
93610a9cba
Fixed some failing tests for GEMM and batched GEMM routines
2018-04-15 12:53:32 +02:00
Cedric Nugteren
0dff7f1ac4
Made GEMM rotation expectations kernel-specific
2018-04-13 22:27:11 +02:00
Cedric Nugteren
ad197da08d
Fixed the CUDA interface: replaced nullptr with 0
2018-01-06 13:38:44 +01:00
Cedric Nugteren
ad1227c4f2
Added optional temp-buffer argument to C++ interface of GEMM
2017-12-30 18:45:06 +01:00
Cedric Nugteren
6d1e30e61f
Added interface to compute the required temporary buffer size for GEMM
2017-12-28 14:46:45 +01:00
Cedric Nugteren
aaea9474a1
Factored out argument processing from the GEMM routine
2017-12-28 13:56:18 +01:00
Cedric Nugteren
74792ce96c
Refactored GEMM code in preparation for separate temp-buffer computation
2017-12-28 11:08:10 +01:00
Cedric Nugteren
4a58efc130
Fix for error C1091 in MSVC 2013
2017-12-10 16:40:59 +01:00
Cedric Nugteren
b4d3a50f19
Split GEMM kernel into 4 files instead of 3 due to MSVC 2013 string length limit
2017-12-10 16:09:09 +01:00
Cedric Nugteren
9b0a435fb0
Integrated the GEMM routine tuner for kernel selection; added first tuning results
2017-11-02 21:47:14 +01:00
Cedric Nugteren
fa6e5e67f5
Fixed a bug when using the matrix A-offset argument for the TRSM routine
2017-10-27 22:12:30 +02:00
Cedric Nugteren
449577cf07
Reduced TRSM block-size for better numerical stability
2017-10-27 22:07:43 +02:00
Cedric Nugteren
d49aae236e
Fixed a bug in TRSM routine due to missing event synchronisations after GEMM calls
2017-10-25 20:35:39 +02:00
Cedric Nugteren
86b80cdc98
Fixed a small typo
2017-10-07 18:39:32 +02:00
Cedric Nugteren
375193fe4e
GEMM indirect implementation now uses only 1 larger temporary buffer instead of up to 3 optional ones
2017-10-03 21:55:21 +02:00
Cedric Nugteren
e5eb6b1d3a
Merge pull request #173 from mcian/PSO_params
Add PSO parameters support and search strategy selection from command…
2017-08-21 20:06:29 +02:00
mcian
0b4aa109f8
Use cltune::SearchMethod enum instead of int values
2017-08-09 16:05:25 +02:00
mcian
99afdcd908
Restore direct GEMM to previous version
2017-07-31 14:06:23 +02:00
Cedric Nugteren
0ea16a0e63
Minor optimization for the direct GEMM kernel: don't ceil m and n unnecessarily high
2017-07-25 20:53:12 +02:00
Cedric Nugteren
3070b502b5
Fixed an overflow bug on 32-bit systems when choosing a GEMM kernel
2017-06-18 20:51:11 +02:00
Cedric Nugteren
8400ee3a09
Fixed a TRSM issue caused by incorrect block size calculation
2017-05-15 22:04:55 +02:00
Cedric Nugteren
86e8df60f1
Fixed a bug in the TRSM routine; tests now pass
2017-05-12 17:43:56 -07:00
Cedric Nugteren
7b8f8fce68
Added initial naive version of the batched GEMM routine based on the direct GEMM kernel
2017-03-11 16:02:45 +01:00
Cedric Nugteren
3fc73851f7
Added proper support for the b_offset argument in TRSM
2017-03-01 21:23:33 +01:00
Cedric Nugteren
00281dad26
Fixed half-precision bugs in HTBMV/HTPMV/HTRMV/HSYR2K/HTRMM related to incorrect constants
2017-02-27 21:00:04 +01:00
Cedric Nugteren
e09c26c706
Split the GEMM kernel up further to prevent C1091 in MSVC
2017-02-26 15:03:12 +01:00
Cedric Nugteren
df7638c305
Fixed an out-of-bounds memory access when filling a matrix with a constant
2017-02-26 14:31:05 +01:00
Cedric Nugteren
2f2a510c38
Implemented a simple row-major to col-major problem conversion for TRSM
2017-02-24 21:08:44 +01:00
Cedric Nugteren
1e5b5157bc
Fixed a few issues with the TRSM routine; some tests still failing
2017-02-22 20:31:33 +01:00
Cedric Nugteren
c248f900c0
Merge branch 'development' into triangular_solvers
2017-02-05 22:18:59 +01:00
Ivan Shapovalov
5bcd92f297
Routine, Cache: generalize, reduce amount of copying in fast path
Implement a generalized Cache<K, V>. Two variants are provided: the
first one is based on std::map, using C++14-specific transparent
std::less<> and generalized std::map::find() to allow searching by tuple
of references. The second one is based on std::vector and O(n) lookup,
but remains C++11-compliant.
2017-01-24 11:56:15 +03:00
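The std::map variant described in commit 5bcd92f297 relies on C++14 heterogeneous lookup. A minimal sketch of that idea follows; it is not the project's actual class. The names `Key`, `KeyRef`, `Store` and `Find` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical key types: an owning key stored in the map, and a
// non-owning tuple of references used only for lookups.
using Key = std::tuple<int, std::string>;
using KeyRef = std::tuple<const int&, const std::string&>;

// Illustrative cache (an assumption, not the real implementation).
template <typename K, typename V>
class Cache {
 public:
  void Store(K key, V value) {
    map_.emplace(std::move(key), std::move(value));
  }

  // Heterogeneous lookup: U may be any type that can be ordered against K
  // with operator<, for example a tuple of references. No K is constructed.
  template <typename U>
  const V* Find(const U& key) const {
    const auto it = map_.find(key);
    return it == map_.end() ? nullptr : &it->second;
  }

 private:
  // std::less<> (that is, std::less<void>) is transparent, which enables
  // the templated std::map::find overload in C++14.
  std::map<K, V, std::less<>> map_;
};
```

With this sketch, `cache.Find(KeyRef(id, name))` compares the stored tuples against references to existing objects, so the fast path does not copy the string into a temporary key.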
Cedric Nugteren
df9a77d74d
Added first version of the TRSM routine based on the diagonal invert kernel
2017-01-18 21:29:59 +01:00
Cedric Nugteren
681a465b35
Prepared for the addition of the TRSM triangular solver kernel
2016-12-18 12:30:16 +01:00
Cedric Nugteren
6b533dda1c
Fixed a bug when using offsets in the direct GEMM kernels
2016-12-18 11:54:32 +01:00
Cedric Nugteren
d8af24e388
Now correctly tests for validity of the B matrix in the TRMM routine
2016-11-20 16:27:54 +01:00
Cedric Nugteren
2f0697564f
Fixed a bug in the TRMM routine caused by overwriting input data before consuming everything
2016-11-20 15:05:42 +01:00
Cedric Nugteren
280698d076
Merge pull request #117 from intelfx/exceptions
Convert to use C++ exceptions internally
2016-10-22 15:05:12 +02:00
Cedric Nugteren
db17b1fbe9
Fixed a bug in the SYRK/SYR2K/HERK/HER2K routines that would occur with specific tuning parameters
2016-10-22 10:41:02 +02:00
Ivan Shapovalov
56f300607b
Routine: get rid of ::SetUp()
Since we now use C++ exceptions inside the implementation (and exceptions
can be thrown from constructors), there is no need for a separate
Routine::SetUp() function.
For this, we also change how the kernel source string is constructed.
The kernel-specific source code is now passed to the Routine ctor via
an initializer_list of C strings to avoid unnecessary data copying
while also working around C1091 of MSVC 2013.
2016-10-22 08:45:27 +03:00
Ivan Shapovalov
b98af44fcf
treewide: use C++ exceptions properly
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.
Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.
However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
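The layering described in commit b98af44fcf (throw internally, convert to error codes at the API boundary, catch everything only in the C interface) can be sketched roughly as follows. All names here (`StatusCode`, `LibraryError`, `ScaleInternal`, `Scale`, `ScaleC`) are hypothetical stand-ins, not the library's real identifiers.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical status codes; the real library defines its own enumeration.
enum class StatusCode { kSuccess = 0, kInvalidArgument = -1, kUnexpectedError = -2 };

// Hypothetical library exception carrying the status code to report.
class LibraryError : public std::exception {
 public:
  explicit LibraryError(StatusCode status) : status_(status) {}
  StatusCode status() const noexcept { return status_; }
  const char* what() const noexcept override { return "library error"; }

 private:
  StatusCode status_;
};

// Internal code reports failures by throwing.
void ScaleInternal(int n) {
  if (n < 0) { throw LibraryError(StatusCode::kInvalidArgument); }
  // Stands in for a failed internal assertion (a logic error).
  if (n == 0) { throw std::logic_error("unexpected zero size"); }
}

// C++ API boundary: only library exceptions become error codes;
// logic errors and std::bad_alloc propagate to the caller.
StatusCode Scale(int n) {
  try {
    ScaleInternal(n);
    return StatusCode::kSuccess;
  } catch (const LibraryError& e) {
    return e.status();
  }
}

// C API boundary: nothing may escape, so every exception is mapped
// to a wild-card error code.
extern "C" int ScaleC(int n) {
  try {
    return static_cast<int>(Scale(n));
  } catch (...) {
    return static_cast<int>(StatusCode::kUnexpectedError);
  }
}
```

In this sketch, `Scale(-1)` returns `kInvalidArgument`, `Scale(0)` lets the `std::logic_error` propagate, and `ScaleC(0)` returns the wild-card code instead.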
Cedric Nugteren
de77f00e8c
Fixed an issue with the length of the GEMM OpenCL string for both MSVC 2013 and 2015
2016-10-10 22:23:33 +02:00