Commit Graph

231 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)

Author SHA1 Message Date
Cedric Nugteren 9fb2c61b25 Added API and tests for new GemmStridedBatched routine 2018-01-07 14:27:15 +01:00
Cedric Nugteren f1e3b35541 Reduced duplicate code in the batched GEMM implementation 2018-01-06 19:26:11 +01:00
Cedric Nugteren ad197da08d Fixed the CUDA interface: replaced nullptr with 0 2018-01-06 13:38:44 +01:00
Cedric Nugteren ad1227c4f2 Added optional temp-buffer argument to C++ interface of GEMM 2017-12-30 18:45:06 +01:00
Cedric Nugteren 6d1e30e61f Added interface to compute the required temporary buffer size for GEMM 2017-12-28 14:46:45 +01:00
Cedric Nugteren aaea9474a1 Factored out argument processing from the GEMM routine 2017-12-28 13:56:18 +01:00
Cedric Nugteren 74792ce96c Refactored GEMM code in preparation of separate temp-buffer computation 2017-12-28 11:08:10 +01:00
Cedric Nugteren 736399e528 Split the invert kernel in two parts to prevent error C1091 in MSVC 2013 2017-12-23 14:18:07 +01:00
Cedric Nugteren b1f52f130c Updated the database to use the new TRSV and Invert tuners 2017-12-23 13:55:22 +01:00
Cedric Nugteren aa7db4f987 Added TRSV block-size tuner 2017-12-23 13:34:57 +01:00
Cedric Nugteren 4a58efc130 Fixed for error C1091 in MSVC 2013 2017-12-10 16:40:59 +01:00
Cedric Nugteren b4d3a50f19 Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit 2017-12-10 16:09:09 +01:00
Cedric Nugteren f94d498a37 Moved compilation function to separate file; removed dependency of tuners of the CLBlast library 2017-11-17 20:57:46 +01:00
Cedric Nugteren 677afd3b96 Factored out the creation of the OpenCL header and the program compilation 2017-11-11 16:14:43 +01:00
Cedric Nugteren 9b0a435fb0 Integrated the GEMM routine tuner for kernel selection; added first tuning results 2017-11-02 21:47:14 +01:00
Cedric Nugteren fa6e5e67f5 Fixed a bug when using the matrix A-offset argument for the TRSM routine 2017-10-27 22:12:30 +02:00
Cedric Nugteren 449577cf07 Reduced TRSM block-size for better numerical stability 2017-10-27 22:07:43 +02:00
Cedric Nugteren 44f7fa628a Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM 2017-10-27 22:01:15 +02:00
Cedric Nugteren d49aae236e Fixed a bug in TRSM routine due to missing event synchronisations after GEMM calls 2017-10-25 20:35:39 +02:00
Cedric Nugteren b1270f04b8 Made buffers of batched routines read/write (was: read-only) 2017-10-17 19:56:47 +02:00
Cedric Nugteren 44246053a5 Removed include of clpp11.hpp in places other than utilities.hpp 2017-10-09 19:41:40 +02:00
Cedric Nugteren df3c9f4a8a Moved non-routine-specific API functions and includes to separate files 2017-10-08 21:52:02 +02:00
Cedric Nugteren 86b80cdc98 Fixed a small typo 2017-10-07 18:39:32 +02:00
Cedric Nugteren 375193fe4e Gemm in-direct implementation now uses only 1 larger instead of max 3 optional temporary buffers 2017-10-03 21:55:21 +02:00
Cedric Nugteren ae1eeb4d1f Fixed type conversion warnings under MSVC 2013 2017-09-19 19:44:34 +02:00
Cedric Nugteren 297159d5b9 Fixed a bug in im2col: process only valid channel IDs 2017-08-31 21:58:12 +02:00
Cedric Nugteren 6194d43efb Fixed a bug in im2col confusing first and second workgroup size; made im2col kernel 2d instead of 3d 2017-08-31 20:34:10 +02:00
Cedric Nugteren 161fd8514d Merge branch 'master' into im_to_col 2017-08-24 21:15:14 +02:00
Cedric Nugteren 4d9d03ba51 Completed im2col implementation 2017-08-24 21:11:12 +02:00
Cedric Nugteren e5eb6b1d3a Merge pull request #173 from mcian/PSO_params
Add PSO parameters support and search strategy selection from command…
2017-08-21 20:06:29 +02:00
Cedric Nugteren 803ca781f9 First version of im2col kernel, unoptimized but working 2017-08-19 18:25:13 +02:00
Cedric Nugteren 777681dcbd Merge branch 'master' into im_to_col 2017-08-12 20:50:00 +02:00
Cedric Nugteren 0a63621579 Moved functions from the header to the .cpp file to prevent compiling the same code multiple times 2017-08-12 15:59:14 +02:00
mcian 0b4aa109f8 Use cltune::SearchMethod enum instead of int values 2017-08-09 16:05:25 +02:00
mcian 99afdcd908 Restore direct GEMM to previous version 2017-07-31 14:06:23 +02:00
Cedric Nugteren 0ea16a0e63 Minor optimization for the direct GEMM kernel: don't ceil m and n unnecessarily high 2017-07-25 20:53:12 +02:00
Cedric Nugteren f77b48692b Relaxed requirement on a_ld and b_ld for batched GEMM 2017-07-12 21:53:39 +02:00
Cedric Nugteren 84ec50e29d Added interface and stubs for the im2col routine 2017-07-02 12:10:22 +02:00
Cedric Nugteren 3070b502b5 Fixed an overflow bug on 32-bit systems when chosing a GEMM kernel 2017-06-18 20:51:11 +02:00
Cedric Nugteren 8400ee3a09 Fixed an TRSM issue caused by incorrect block size calculation 2017-05-15 22:04:55 +02:00
Cedric Nugteren f151e56daa Added the IxAMIN routines: absolute minimum version of IxAMAX 2017-05-12 20:01:33 -07:00
Cedric Nugteren 86e8df60f1 Fixed a bug in the TRSM routine; tests now pass 2017-05-12 17:43:56 -07:00
Cedric Nugteren 10205d773e Added a new Xaxpy kernel in between the regular and fast version in 2017-04-14 20:16:10 +02:00
Cedric Nugteren ce369702d8 Added some missing const-ness 2017-04-07 07:34:32 +02:00
Cedric Nugteren c27d2f0c1e Added an (optional) non-direct implementation of the batched GEMM routine 2017-03-19 16:04:04 +01:00
Cedric Nugteren 2fd04dae83 Added batched versions of the pad/copy/transpose kernels 2017-03-19 15:57:44 +01:00
Cedric Nugteren 7b8f8fce68 Added initial naive version of the batched GEMM routine based on the direct GEMM kernel 2017-03-11 16:02:45 +01:00
Cedric Nugteren 49e04c7fce Added API and test infrastructure for the batched GEMM routine 2017-03-10 21:24:35 +01:00
Cedric Nugteren 878d93e7dc Implemented a batched version of the AXPY kernel 2017-03-08 20:36:35 +01:00
Cedric Nugteren fa0a9c689f Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes 2017-03-08 20:10:20 +01:00
Cedric Nugteren b114ea49a9 Added first naive version of the batched AXPY routine 2017-03-05 15:06:14 +01:00
Cedric Nugteren 3fc73851f7 Added proper support for the b_offset argument in TRSM 2017-03-01 21:23:33 +01:00
Cedric Nugteren 00281dad26 Fixed half-precision bugs in HTBMV/HTPMV/HTRMV/HSYR2K/HTRMM related to incorrect constants 2017-02-27 21:00:04 +01:00
Cedric Nugteren e09c26c706 Split the GEMM kernel further up to prevent C1091 in MSVC 2017-02-26 15:03:12 +01:00
Cedric Nugteren ea6790665d Merge branch 'development' into triangular_solvers 2017-02-26 14:51:45 +01:00
Cedric Nugteren df7638c305 Fixed an out-of-bounds memory access when filling a matrix with a constant 2017-02-26 14:31:05 +01:00
Cedric Nugteren 2f2a510c38 Implemented a simple row-major to col-major problem conversion for TRSM 2017-02-24 21:08:44 +01:00
Cedric Nugteren 1e5b5157bc Fixed a few issues with the TRSM routine; some tests still failing 2017-02-22 20:31:33 +01:00
Cedric Nugteren 00eb55a2d4 Fixed a small bug in GEMV: unused kernel in parameter list 2017-02-13 20:48:32 +01:00
Cedric Nugteren 345a5feb9a Split the database into several smaller cached per-kernel databases (in preparation of per-kernel database overrides) 2017-02-12 12:02:39 +01:00
Cedric Nugteren c248f900c0 Merge branch 'development' into triangular_solvers 2017-02-05 22:18:59 +01:00
Cedric Nugteren c209dd7af9 Improved substition kernels a bit; added complex support 2017-02-04 22:48:06 +01:00
Cedric Nugteren fec8c1a806 Completed a first STRSV implementation 2017-02-04 16:04:19 +01:00
Cedric Nugteren a6ba6470aa Added row-major support for TRSV 2017-02-04 14:25:27 +01:00
Cedric Nugteren 7c73ceb095 Added first (incomplete) version of TRSV routine 2017-01-29 17:02:00 +01:00
Ivan Shapovalov 5bcd92f297 Routine, Cache: generalize, reduce amount of copying in fast path
Implement a generalized Cache<K, V>. Two variants are provided: the
first one is based on std::map, using C++14-specific transparent
std::less<> and generalized std::map::find() to allow searching by tuple
of references. The second one is based on std::vector and O(n) lookup,
but remains C++11-compliant.
2017-01-24 11:56:15 +03:00
Ivan Shapovalov 1a1e863ab3 treewide: include clpp11.hpp first to silence deprecation warnings
Otherwise, cl.h gets included through clblast.h before clpp11.hpp.
2017-01-20 17:32:42 +03:00
Cedric Nugteren a5fd2323b6 Added prototype for the TRSV routine 2017-01-20 11:30:32 +01:00
Cedric Nugteren df9a77d74d Added first version of the TRSM routine based on the diagonal invert kernel 2017-01-18 21:29:59 +01:00
Cedric Nugteren 4b3ffd9989 Added a first version of the diagonal block invert routine in preparation of TRSM 2017-01-15 17:30:00 +01:00
Cedric Nugteren 681a465b35 Prepared for the addition of the TRSM triangular solver kernel 2016-12-18 12:30:16 +01:00
Cedric Nugteren 6b533dda1c Fixed a bug when using offsets in the direct GEMM kernels 2016-12-18 11:54:32 +01:00
Cedric Nugteren cb398f0e42 Merge pull request #125 from CNugteren/netlib_blas_api
Netlib CBLAS API for CLBlast
2016-11-24 19:35:59 +01:00
Cedric Nugteren 654b41bb2b Fixed a bug in the HSCAL routine 2016-11-23 21:29:16 +01:00
Cedric Nugteren d8af24e388 Now correctly tests for validaty of the B matrix in the TRMM routine 2016-11-20 16:27:54 +01:00
Cedric Nugteren 2f0697564f Fixed a bug in the TRMM routine caused by overwriting input data before consuming everything 2016-11-20 15:05:42 +01:00
Cedric Nugteren 76d5d2ccfc Fixed a bug in the transpose-matrix function 2016-10-23 20:49:55 +02:00
Cedric Nugteren 280698d076 Merge pull request #117 from intelfx/exceptions
Convert to use C++ exceptions internally
2016-10-22 15:05:12 +02:00
Cedric Nugteren db17b1fbe9 Fixed a bug in the SYRK/SYR2K/HERK/HER2K routines that would occur with specific tuning parameters 2016-10-22 10:41:02 +02:00
Ivan Shapovalov 56f300607b Routine: get rid of ::SetUp()
Since we now use C++ exceptions inside the implementation (and exceptions
can be thrown from constructors), there is no need for a separate
Routine::SetUp() function.

For this, we also change the way how the kernel source string is constructed.
The kernel-specific source code is now passed to the Routine ctor via
an initializer_list of C strings to avoid unnecessary data copying
while also working around C1091 of MSVC 2013.
2016-10-22 08:45:27 +03:00
Ivan Shapovalov b98af44fcf treewide: use C++ exceptions properly
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.

Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.

However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
Cedric Nugteren de77f00e8c Fixed an issue with the length of the GEMM OpenCL string for both MSVC 2013 and 2015 2016-10-10 22:23:33 +02:00
Cedric Nugteren a3e67f2be2 Added a kernel selection database to select between the direct and indirect GEMM kernels 2016-10-06 19:51:12 +02:00
Cedric Nugteren c1c4bc5d20 Re-organised GEMM direct kernel and added faster fall-back version for incomplete rectangles 2016-10-03 19:32:01 +02:00
Cedric Nugteren d8827e908c Specialised the GEMM direct kernel in four ways for transposing/non-transposing: NN, NT, TN, TT 2016-10-02 17:59:05 +02:00
Cedric Nugteren 61f489e370 Split the GEMM direct kernel into two files; set the default tuning target to 256-256-256 2016-10-02 15:06:59 +02:00
Cedric Nugteren 73d135c2ce Added a first version of a tuner for the GEMM direct kernel; collapsed MWGD, NWGD and KWGD into one WGD parameter 2016-09-25 14:48:34 +02:00
Cedric Nugteren 669f43aed6 Separated the tuning parameters of the new direct GEMM kernel from the indirect version 2016-09-25 13:52:08 +02:00
Cedric Nugteren 6aa652d6ea Merge branch 'development' into gemm_direct 2016-09-21 21:32:18 +02:00
Cedric Nugteren 4ce584a014 Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC can't handle long strings 2016-09-12 22:13:16 +02:00
Cedric Nugteren 5004a435ff Fixed issues related to the recent changes in the Xgemm infrastructure 2016-07-26 20:59:59 +02:00
Cedric Nugteren 5053f6ebc6 Merge branch 'development' into gemm_direct 2016-07-26 20:53:31 +02:00
Cedric Nugteren 2582f0290a Moved the XgemvFast and XgemvFastRot tuning database into a separate file 2016-07-25 22:43:49 +02:00
Cedric Nugteren 0252df731a Merge branch 'development' into gemv_performance 2016-07-24 17:06:27 +02:00
Cedric Nugteren 75fe8235f7 Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enabling better memory performance 2016-07-23 10:20:11 +02:00
Ivan Shapovalov ae3299da30 clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS 2016-07-22 11:15:52 +03:00
Ivan Shapovalov 2dd5ee3f75 clblast::RunKernel, cl::Kernel: take const vector as waitForEvents 2016-07-22 11:15:52 +03:00
Ivan Shapovalov 1ae71614ac xgemm: do not hardcode kernel requirements for internal matrix layout
Do not hardcode the knowledge about "A and C col-major, B row-major".

This allows for easier reuse of the DoGemm() routine with different
kernels.
2016-07-22 11:15:52 +03:00
Cedric Nugteren 798d32edad Improved the GEMM direct kernel by adding register blocking. Still not fast though 2016-07-17 14:36:51 +02:00
Cedric Nugteren eaa348735e Created infrastructure to support a direct GEMM kernel; added correct but slow reference kernel as a place-holder 2016-07-16 15:18:28 +02:00