Tarmo Räntilä
|
21b66ca761
|
Reduce TestMatrix calls for xgemmstridedbatched.
Replace the looped test by a single one with the offset of the last batch.
|
2019-12-09 22:17:24 +02:00 |
Tarmo Räntilä
|
bf50c4e53e
|
Reduce TestMatrix calls for xgemmbatched.
Replace the looped test by a single one with the maximal found offset.
|
2019-12-09 22:13:52 +02:00 |
Cedric Nugteren
|
9a9c24e811
|
Merge pull request #345 from CNugteren/convolution-fixes-and-tuner
Convolution with single kernel
|
2019-01-19 17:56:05 +01:00 |
Cedric Nugteren
|
afcf5dc6eb
|
Added a check to prevent the stride of matrix C being set to 0 for the strided-batched-GEMM routine
|
2019-01-05 10:56:35 +01:00 |
Cedric Nugteren
|
560f7a40f6
|
Added convgemm to the CLBlast database, added initial parameters for Skylake GPU
|
2018-12-31 19:05:34 +01:00 |
Koichi Akabe
|
301dc280df
|
Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel
|
2018-12-18 13:56:00 +09:00 |
Koichi Akabe
|
032e3b0cc0
|
Add kernel_mode option to im2col, col2im, and convgemm functions
|
2018-11-12 10:12:07 +09:00 |
Koichi Akabe
|
0b3d04f709
|
Fix col2im implementation
|
2018-10-30 14:54:55 +09:00 |
Cedric Nugteren
|
d45911b61d
|
Added groundwork for col2im algorithm plus first non-working version of kernel and test
|
2018-10-23 20:52:25 +02:00 |
Cedric Nugteren
|
44b630fc22
|
Some name changes in im2col code
|
2018-10-22 22:12:58 +02:00 |
Cedric Nugteren
|
83ba3d4b7b
|
Merge branch 'master' into convgemm_multi_kernel
|
2018-09-16 20:01:18 +02:00 |
Cedric Nugteren
|
9bedaa752d
|
Fixed an MSVC compilation error due to large strings
|
2018-09-15 17:35:26 +02:00 |
Cedric Nugteren
|
c788e040f7
|
Added xCONVGEMM as im2col plus a batched GEMM kernel
|
2018-09-07 22:02:44 +02:00 |
Cedric Nugteren
|
1c9a741470
|
Merge branch 'master' into CLBlast-267-convgemm
|
2018-06-03 15:53:27 +02:00 |
Cedric Nugteren
|
7c3431a72a
|
Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when barriers are present
|
2018-06-01 20:59:44 +02:00 |
Cedric Nugteren
|
5702bff5ad
|
Added error-checking for half-empty local work group sizes; fixed a minor TRSV global worksize issue
|
2018-05-31 22:37:06 +02:00 |
Cedric Nugteren
|
e609220393
|
Some potential fixes for error -54 when launching TRSV and TRSM kernels
|
2018-05-31 20:09:49 +02:00 |
Cedric Nugteren
|
01d254c0b0
|
Added a check to return 'NotImplemented' error code in case of systems with < 16 LWGS for TRSV and TRSM
|
2018-05-27 18:38:47 +02:00 |
Cedric Nugteren
|
53198121ac
|
Made FillMatrix and FillVector functions take a configurable local workgroup size
|
2018-05-27 12:03:32 +02:00 |
Cedric Nugteren
|
5d87abf780
|
Added method selection option to switch between im2col and single-kernel approach for convgemm
|
2018-05-21 11:28:11 +02:00 |
Cedric Nugteren
|
37cabd4f1f
|
Moved new convgemm kernel to levelx kernel folder
|
2018-05-19 21:05:45 +02:00 |
Cedric Nugteren
|
27b52ac2c8
|
Second version of direct reading from image tensor for convgemm: also with local memory support now
|
2018-05-19 21:02:44 +02:00 |
Cedric Nugteren
|
e057a9186a
|
First version of direct reading from image tensor for convgemm: only for edge cases now
|
2018-05-17 09:23:28 +01:00 |
Cedric Nugteren
|
0cb9580042
|
Created a dedicated convgemm GEMM kernel as a copy of the batched direct gemm kernel
|
2018-05-13 22:10:21 +02:00 |
Cedric Nugteren
|
ad8f1027ab
|
Plugged in the code of strided-batched-gemm into convgemm in preparation of a new kernel
|
2018-05-13 21:01:46 +02:00 |
Cedric Nugteren
|
4e6d30088d
|
Changed temporary convgemm implementation to use batched-strided GEMM
|
2018-05-09 20:38:39 +02:00 |
Cedric Nugteren
|
cc95d4fa03
|
Implemented convolution as im2col + GEMM
|
2018-05-09 17:42:59 +02:00 |
Cedric Nugteren
|
2d1f6ba7fe
|
Added convgemm skeleton, test infrastructure, and first reference implementation
|
2018-05-06 11:35:34 +02:00 |
Cedric Nugteren
|
93610a9cba
|
Fixed some failing tests for GEMM and batched GEMM routines
|
2018-04-15 12:53:32 +02:00 |
Cedric Nugteren
|
0dff7f1ac4
|
Made GEMM rotation expectations kernel-specific
|
2018-04-13 22:27:11 +02:00 |
Cedric Nugteren
|
69ed46c8da
|
Implemented the XHAD Hadamard product routine
|
2018-02-02 21:18:37 +01:00 |
Cedric Nugteren
|
ef5008f5e4
|
Created the API and stubs for the HAD (hadamard-product) routines
|
2018-01-31 20:41:02 +01:00 |
Cedric Nugteren
|
caebe8a9d5
|
Fixed an event synchronisation issue in the batched gemm routines
|
2018-01-26 20:37:04 +01:00 |
Cedric Nugteren
|
bc54411d19
|
Made the batched routines also choose direct/indirect kernel like the main GEMM routine
|
2018-01-18 19:41:02 +01:00 |
Cedric Nugteren
|
99a4df88a6
|
Implemented the indirect version of the strided-batched GEMM kernel
|
2018-01-08 21:07:01 +01:00 |
Cedric Nugteren
|
13f0f6fc6e
|
Implemented direct version of strided-batched GEMM kernel
|
2018-01-07 14:58:45 +01:00 |
Cedric Nugteren
|
9fb2c61b25
|
Added API and tests for new GemmStridedBatched routine
|
2018-01-07 14:27:15 +01:00 |
Cedric Nugteren
|
f1e3b35541
|
Reduced duplicate code in the batched GEMM implementation
|
2018-01-06 19:26:11 +01:00 |
Cedric Nugteren
|
736399e528
|
Split the invert kernel in two parts to prevent error C1091 in MSVC 2013
|
2017-12-23 14:18:07 +01:00 |
Cedric Nugteren
|
aa7db4f987
|
Added TRSV block-size tuner
|
2017-12-23 13:34:57 +01:00 |
Cedric Nugteren
|
b4d3a50f19
|
Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit
|
2017-12-10 16:09:09 +01:00 |
Cedric Nugteren
|
9b0a435fb0
|
Integrated the GEMM routine tuner for kernel selection; added first tuning results
|
2017-11-02 21:47:14 +01:00 |
Cedric Nugteren
|
b1270f04b8
|
Made buffers of batched routines read/write (was: read-only)
|
2017-10-17 19:56:47 +02:00 |
Cedric Nugteren
|
ae1eeb4d1f
|
Fixed type conversion warnings under MSVC 2013
|
2017-09-19 19:44:34 +02:00 |
Cedric Nugteren
|
297159d5b9
|
Fixed a bug in im2col: process only valid channel IDs
|
2017-08-31 21:58:12 +02:00 |
Cedric Nugteren
|
6194d43efb
|
Fixed a bug in im2col confusing first and second workgroup size; made im2col kernel 2d instead of 3d
|
2017-08-31 20:34:10 +02:00 |
Cedric Nugteren
|
4d9d03ba51
|
Completed im2col implementation
|
2017-08-24 21:11:12 +02:00 |
Cedric Nugteren
|
803ca781f9
|
First version of im2col kernel, unoptimized but working
|
2017-08-19 18:25:13 +02:00 |
Cedric Nugteren
|
777681dcbd
|
Merge branch 'master' into im_to_col
|
2017-08-12 20:50:00 +02:00 |
Cedric Nugteren
|
f77b48692b
|
Relaxed requirement on a_ld and b_ld for batched GEMM
|
2017-07-12 21:53:39 +02:00 |