Cedric Nugteren
9fb2c61b25
Added API and tests for new GemmStridedBatched routine
2018-01-07 14:27:15 +01:00
Cedric Nugteren
f1e3b35541
Reduced duplicate code in the batched GEMM implementation
2018-01-06 19:26:11 +01:00
Cedric Nugteren
ad197da08d
Fixed the CUDA interface: replaced nullptr with 0
2018-01-06 13:38:44 +01:00
Cedric Nugteren
ad1227c4f2
Added optional temp-buffer argument to C++ interface of GEMM
2017-12-30 18:45:06 +01:00
Cedric Nugteren
6d1e30e61f
Added interface to compute the required temporary buffer size for GEMM
2017-12-28 14:46:45 +01:00
Cedric Nugteren
aaea9474a1
Factored out argument processing from the GEMM routine
2017-12-28 13:56:18 +01:00
Cedric Nugteren
74792ce96c
Refactored GEMM code in preparation of separate temp-buffer computation
2017-12-28 11:08:10 +01:00
Cedric Nugteren
736399e528
Split the invert kernel in two parts to prevent error C1091 in MSVC 2013
2017-12-23 14:18:07 +01:00
Cedric Nugteren
b1f52f130c
Updated the database to use the new TRSV and Invert tuners
2017-12-23 13:55:22 +01:00
Cedric Nugteren
aa7db4f987
Added TRSV block-size tuner
2017-12-23 13:34:57 +01:00
Cedric Nugteren
4a58efc130
Fix for error C1091 in MSVC 2013
2017-12-10 16:40:59 +01:00
Cedric Nugteren
b4d3a50f19
Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit
2017-12-10 16:09:09 +01:00
Cedric Nugteren
f94d498a37
Moved compilation function to separate file; removed dependency of tuners of the CLBlast library
2017-11-17 20:57:46 +01:00
Cedric Nugteren
677afd3b96
Factored out the creation of the OpenCL header and the program compilation
2017-11-11 16:14:43 +01:00
Cedric Nugteren
9b0a435fb0
Integrated the GEMM routine tuner for kernel selection; added first tuning results
2017-11-02 21:47:14 +01:00
Cedric Nugteren
fa6e5e67f5
Fixed a bug when using the matrix A-offset argument for the TRSM routine
2017-10-27 22:12:30 +02:00
Cedric Nugteren
449577cf07
Reduced TRSM block-size for better numerical stability
2017-10-27 22:07:43 +02:00
Cedric Nugteren
44f7fa628a
Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM
2017-10-27 22:01:15 +02:00
Cedric Nugteren
d49aae236e
Fixed a bug in TRSM routine due to missing event synchronisations after GEMM calls
2017-10-25 20:35:39 +02:00
Cedric Nugteren
b1270f04b8
Made buffers of batched routines read/write (was: read-only)
2017-10-17 19:56:47 +02:00
Cedric Nugteren
44246053a5
Removed include of clpp11.hpp in places other than utilities.hpp
2017-10-09 19:41:40 +02:00
Cedric Nugteren
df3c9f4a8a
Moved non-routine-specific API functions and includes to separate files
2017-10-08 21:52:02 +02:00
Cedric Nugteren
86b80cdc98
Fixed a small typo
2017-10-07 18:39:32 +02:00
Cedric Nugteren
375193fe4e
The indirect GEMM implementation now uses a single larger temporary buffer instead of up to 3 optional ones
2017-10-03 21:55:21 +02:00
Cedric Nugteren
ae1eeb4d1f
Fixed type conversion warnings under MSVC 2013
2017-09-19 19:44:34 +02:00
Cedric Nugteren
297159d5b9
Fixed a bug in im2col: process only valid channel IDs
2017-08-31 21:58:12 +02:00
Cedric Nugteren
6194d43efb
Fixed a bug in im2col confusing first and second workgroup size; made im2col kernel 2d instead of 3d
2017-08-31 20:34:10 +02:00
Cedric Nugteren
161fd8514d
Merge branch 'master' into im_to_col
2017-08-24 21:15:14 +02:00
Cedric Nugteren
4d9d03ba51
Completed im2col implementation
2017-08-24 21:11:12 +02:00
Cedric Nugteren
e5eb6b1d3a
Merge pull request #173 from mcian/PSO_params
Add PSO parameters support and search strategy selection from command…
2017-08-21 20:06:29 +02:00
Cedric Nugteren
803ca781f9
First version of im2col kernel, unoptimized but working
2017-08-19 18:25:13 +02:00
Cedric Nugteren
777681dcbd
Merge branch 'master' into im_to_col
2017-08-12 20:50:00 +02:00
Cedric Nugteren
0a63621579
Moved functions from the header to the .cpp file to prevent compiling the same code multiple times
2017-08-12 15:59:14 +02:00
mcian
0b4aa109f8
Use cltune::SearchMethod enum instead of int values
2017-08-09 16:05:25 +02:00
mcian
99afdcd908
Restore direct GEMM to previous version
2017-07-31 14:06:23 +02:00
Cedric Nugteren
0ea16a0e63
Minor optimization for the direct GEMM kernel: don't ceil m and n unnecessarily high
2017-07-25 20:53:12 +02:00
Cedric Nugteren
f77b48692b
Relaxed requirement on a_ld and b_ld for batched GEMM
2017-07-12 21:53:39 +02:00
Cedric Nugteren
84ec50e29d
Added interface and stubs for the im2col routine
2017-07-02 12:10:22 +02:00
Cedric Nugteren
3070b502b5
Fixed an overflow bug on 32-bit systems when choosing a GEMM kernel
2017-06-18 20:51:11 +02:00
Cedric Nugteren
8400ee3a09
Fixed a TRSM issue caused by incorrect block size calculation
2017-05-15 22:04:55 +02:00
Cedric Nugteren
f151e56daa
Added the IxAMIN routines: absolute minimum version of IxAMAX
2017-05-12 20:01:33 -07:00
Cedric Nugteren
86e8df60f1
Fixed a bug in the TRSM routine; tests now pass
2017-05-12 17:43:56 -07:00
Cedric Nugteren
10205d773e
Added a new Xaxpy kernel in between the regular and fast version in
2017-04-14 20:16:10 +02:00
Cedric Nugteren
ce369702d8
Added some missing const-ness
2017-04-07 07:34:32 +02:00
Cedric Nugteren
c27d2f0c1e
Added an (optional) non-direct implementation of the batched GEMM routine
2017-03-19 16:04:04 +01:00
Cedric Nugteren
2fd04dae83
Added batched versions of the pad/copy/transpose kernels
2017-03-19 15:57:44 +01:00
Cedric Nugteren
7b8f8fce68
Added initial naive version of the batched GEMM routine based on the direct GEMM kernel
2017-03-11 16:02:45 +01:00
Cedric Nugteren
49e04c7fce
Added API and test infrastructure for the batched GEMM routine
2017-03-10 21:24:35 +01:00
Cedric Nugteren
878d93e7dc
Implemented a batched version of the AXPY kernel
2017-03-08 20:36:35 +01:00
Cedric Nugteren
fa0a9c689f
Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes
2017-03-08 20:10:20 +01:00
Cedric Nugteren
b114ea49a9
Added first naive version of the batched AXPY routine
2017-03-05 15:06:14 +01:00
Cedric Nugteren
3fc73851f7
Added proper support for the b_offset argument in TRSM
2017-03-01 21:23:33 +01:00
Cedric Nugteren
00281dad26
Fixed half-precision bugs in HTBMV/HTPMV/HTRMV/HSYR2K/HTRMM related to incorrect constants
2017-02-27 21:00:04 +01:00
Cedric Nugteren
e09c26c706
Split the GEMM kernel further up to prevent C1091 in MSVC
2017-02-26 15:03:12 +01:00
Cedric Nugteren
ea6790665d
Merge branch 'development' into triangular_solvers
2017-02-26 14:51:45 +01:00
Cedric Nugteren
df7638c305
Fixed an out-of-bounds memory access when filling a matrix with a constant
2017-02-26 14:31:05 +01:00
Cedric Nugteren
2f2a510c38
Implemented a simple row-major to col-major problem conversion for TRSM
2017-02-24 21:08:44 +01:00
Cedric Nugteren
1e5b5157bc
Fixed a few issues with the TRSM routine; some tests still failing
2017-02-22 20:31:33 +01:00
Cedric Nugteren
00eb55a2d4
Fixed a small bug in GEMV: unused kernel in parameter list
2017-02-13 20:48:32 +01:00
Cedric Nugteren
345a5feb9a
Split the database into several smaller cached per-kernel databases (in preparation of per-kernel database overrides)
2017-02-12 12:02:39 +01:00
Cedric Nugteren
c248f900c0
Merge branch 'development' into triangular_solvers
2017-02-05 22:18:59 +01:00
Cedric Nugteren
c209dd7af9
Improved substitution kernels a bit; added complex support
2017-02-04 22:48:06 +01:00
Cedric Nugteren
fec8c1a806
Completed a first STRSV implementation
2017-02-04 16:04:19 +01:00
Cedric Nugteren
a6ba6470aa
Added row-major support for TRSV
2017-02-04 14:25:27 +01:00
Cedric Nugteren
7c73ceb095
Added first (incomplete) version of TRSV routine
2017-01-29 17:02:00 +01:00
Ivan Shapovalov
5bcd92f297
Routine, Cache: generalize, reduce amount of copying in fast path
Implement a generalized Cache<K, V>. Two variants are provided: the
first one is based on std::map, using C++14-specific transparent
std::less<> and generalized std::map::find() to allow searching by tuple
of references. The second one is based on std::vector and O(n) lookup,
but remains C++11-compliant.
2017-01-24 11:56:15 +03:00
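The first Cache<K, V> variant described in the commit above relies on a C++14 feature: a std::map declared with the transparent comparator std::less<> accepts lookups with any type that compares against the key, so a tuple of references can be searched for without building a key that owns copies. The sketch below illustrates only that idea. The names Key, KeyRef, ProgramCache and Lookup are hypothetical and are not taken from the CLBlast sources, and the int value stands in for whatever the real cache stores.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <tuple>

// Hypothetical owning key: (device name, routine name).
using Key = std::tuple<std::string, std::string>;

// Non-owning view of the same key, used only for lookups.
using KeyRef = std::tuple<const std::string&, const std::string&>;

// std::less<> is transparent, which enables the templated std::map::find().
using ProgramCache = std::map<Key, int, std::less<>>;

// Returns the cached value, or -1 on a miss. No std::string is copied here:
// the tuple of references is compared directly against the stored keys.
inline int Lookup(const ProgramCache& cache,
                  const std::string& device,
                  const std::string& routine) {
  const auto it = cache.find(KeyRef(device, routine));
  return (it != cache.end()) ? it->second : -1;
}
```

A C++11-only build cannot use this lookup, which is presumably why the commit also provides the std::vector variant with O(n) search.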
Ivan Shapovalov
1a1e863ab3
treewide: include clpp11.hpp first to silence deprecation warnings
Otherwise, cl.h gets included through clblast.h before clpp11.hpp.
2017-01-20 17:32:42 +03:00
Cedric Nugteren
a5fd2323b6
Added prototype for the TRSV routine
2017-01-20 11:30:32 +01:00
Cedric Nugteren
df9a77d74d
Added first version of the TRSM routine based on the diagonal invert kernel
2017-01-18 21:29:59 +01:00
Cedric Nugteren
4b3ffd9989
Added a first version of the diagonal block invert routine in preparation of TRSM
2017-01-15 17:30:00 +01:00
Cedric Nugteren
681a465b35
Prepared for the addition of the TRSM triangular solver kernel
2016-12-18 12:30:16 +01:00
Cedric Nugteren
6b533dda1c
Fixed a bug when using offsets in the direct GEMM kernels
2016-12-18 11:54:32 +01:00
Cedric Nugteren
cb398f0e42
Merge pull request #125 from CNugteren/netlib_blas_api
Netlib CBLAS API for CLBlast
2016-11-24 19:35:59 +01:00
Cedric Nugteren
654b41bb2b
Fixed a bug in the HSCAL routine
2016-11-23 21:29:16 +01:00
Cedric Nugteren
d8af24e388
Now correctly tests for validity of the B matrix in the TRMM routine
2016-11-20 16:27:54 +01:00
Cedric Nugteren
2f0697564f
Fixed a bug in the TRMM routine caused by overwriting input data before consuming everything
2016-11-20 15:05:42 +01:00
Cedric Nugteren
76d5d2ccfc
Fixed a bug in the transpose-matrix function
2016-10-23 20:49:55 +02:00
Cedric Nugteren
280698d076
Merge pull request #117 from intelfx/exceptions
Convert to use C++ exceptions internally
2016-10-22 15:05:12 +02:00
Cedric Nugteren
db17b1fbe9
Fixed a bug in the SYRK/SYR2K/HERK/HER2K routines that would occur with specific tuning parameters
2016-10-22 10:41:02 +02:00
Ivan Shapovalov
56f300607b
Routine: get rid of ::SetUp()
Since we now use C++ exceptions inside the implementation (and exceptions
can be thrown from constructors), there is no need for a separate
Routine::SetUp() function.
For this, we also change how the kernel source string is constructed.
The kernel-specific source code is now passed to the Routine ctor via
an initializer_list of C strings to avoid unnecessary data copying
while also working around C1091 of MSVC 2013.
2016-10-22 08:45:27 +03:00
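The commit above passes the kernel source to the Routine constructor as an initializer_list of C strings. A minimal sketch of that pattern follows, assuming the pieces are simply concatenated in order. JoinSources is a hypothetical helper, not the actual Routine constructor. Keeping each piece as its own literal means no single string literal has to exceed the compiler's length limit, which is what error C1091 reports.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <initializer_list>
#include <string>

// Hypothetical helper: concatenates several source fragments into one string.
// The fragments are passed as pointers, so they are copied exactly once,
// into the result.
inline std::string JoinSources(std::initializer_list<const char*> parts) {
  std::size_t total = 0;
  for (const char* part : parts) { total += std::strlen(part); }

  std::string source;
  source.reserve(total);  // single allocation for the full source
  for (const char* part : parts) { source += part; }
  return source;
}
```

A caller would write something like `JoinSources({part1, part2, part3})`, where each part is a separate string literal, typically pulled in from its own file.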
Ivan Shapovalov
b98af44fcf
treewide: use C++ exceptions properly
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.
Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.
However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
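The boundary handling described in the commit above can be sketched as two layers: a C++ entry point that translates only the library's own exceptions into status codes, and a C entry point that additionally catches everything else. All names here (Status, LibraryError, DoScale, Scale, ScaleC) are hypothetical and chosen for illustration. They are not the CLBlast API, and the status values are made up.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical status codes returned by the public API.
enum class Status { kSuccess = 0, kInvalidArgument = -1, kUnexpected = -2 };

// Hypothetical exception type used inside the library.
class LibraryError : public std::exception {
 public:
  explicit LibraryError(Status status) : status_(status) {}
  Status status() const noexcept { return status_; }
  const char* what() const noexcept override { return "library error"; }
 private:
  Status status_;
};

// Internal code: reports failures by throwing, never by return value.
inline void DoScale(int n) {
  if (n < 0) { throw LibraryError(Status::kInvalidArgument); }
  if (n > 1000) { throw std::logic_error("simulated failed assertion"); }
}

// C++ API boundary: converts library errors into status codes. Logic errors
// and errors such as std::bad_alloc are deliberately left to propagate.
inline Status Scale(int n) {
  try {
    DoScale(n);
    return Status::kSuccess;
  } catch (const LibraryError& e) {
    return e.status();
  }
}

// C API boundary: no exception may cross it, so everything else is mapped
// to a wild-card error code.
extern "C" inline int ScaleC(int n) {
  try {
    return static_cast<int>(Scale(n));
  } catch (...) {
    return static_cast<int>(Status::kUnexpected);
  }
}
```

With this split the internal code can rely on RAII for cleanup, while existing callers still see only error codes.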
Cedric Nugteren
de77f00e8c
Fixed an issue with the length of the GEMM OpenCL string for both MSVC 2013 and 2015
2016-10-10 22:23:33 +02:00
Cedric Nugteren
a3e67f2be2
Added a kernel selection database to select between the direct and indirect GEMM kernels
2016-10-06 19:51:12 +02:00
Cedric Nugteren
c1c4bc5d20
Re-organised GEMM direct kernel and added faster fall-back version for incomplete rectangles
2016-10-03 19:32:01 +02:00
Cedric Nugteren
d8827e908c
Specialised the GEMM direct kernel in four ways for transposing/non-transposing: NN, NT, TN, TT
2016-10-02 17:59:05 +02:00
Cedric Nugteren
61f489e370
Split the GEMM direct kernel into two files; set the default tuning target to 256-256-256
2016-10-02 15:06:59 +02:00
Cedric Nugteren
73d135c2ce
Added a first version of a tuner for the GEMM direct kernel; collapsed MWGD, NWGD and KWGD into one WGD parameter
2016-09-25 14:48:34 +02:00
Cedric Nugteren
669f43aed6
Separated the tuning parameters of the new direct GEMM kernel from the indirect version
2016-09-25 13:52:08 +02:00
Cedric Nugteren
6aa652d6ea
Merge branch 'development' into gemm_direct
2016-09-21 21:32:18 +02:00
Cedric Nugteren
4ce584a014
Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC can't handle long strings
2016-09-12 22:13:16 +02:00
Cedric Nugteren
5004a435ff
Fixed issues related to the recent changes in the Xgemm infrastructure
2016-07-26 20:59:59 +02:00
Cedric Nugteren
5053f6ebc6
Merge branch 'development' into gemm_direct
2016-07-26 20:53:31 +02:00
Cedric Nugteren
2582f0290a
Moved the XgemvFast and XgemvFastRot tuning database into a separate file
2016-07-25 22:43:49 +02:00
Cedric Nugteren
0252df731a
Merge branch 'development' into gemv_performance
2016-07-24 17:06:27 +02:00
Cedric Nugteren
75fe8235f7
Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enabling better memory performance
2016-07-23 10:20:11 +02:00
Ivan Shapovalov
ae3299da30
clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS
2016-07-22 11:15:52 +03:00
Ivan Shapovalov
2dd5ee3f75
clblast::RunKernel, cl::Kernel: take const vector as waitForEvents
2016-07-22 11:15:52 +03:00
Ivan Shapovalov
1ae71614ac
xgemm: do not hardcode kernel requirements for internal matrix layout
Do not hardcode the knowledge about "A and C col-major, B row-major".
This allows for easier reuse of the DoGemm() routine with different
kernels.
2016-07-22 11:15:52 +03:00
Cedric Nugteren
798d32edad
Improved the GEMM direct kernel by adding register blocking. Still not fast though
2016-07-17 14:36:51 +02:00
Cedric Nugteren
eaa348735e
Created infrastructure to support a direct GEMM kernel; added correct but slow reference kernel as a place-holder
2016-07-16 15:18:28 +02:00