Commit Graph

74 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)

Author SHA1 Message Date
Tarmo Räntilä 21b66ca761 Reduce TestMatrix calls for xgemmstridedbatched.
Replace the looped test by a single one with the offset of the last batch.
2019-12-09 22:17:24 +02:00
Tarmo Räntilä bf50c4e53e Reduce TestMatrix calls for xgemmbatched.
Replace the looped test by a single one with the maximal found offset.
2019-12-09 22:13:52 +02:00
Cedric Nugteren 9a9c24e811
Merge pull request #345 from CNugteren/convolution-fixes-and-tuner
Convolution with single kernel
2019-01-19 17:56:05 +01:00
Cedric Nugteren afcf5dc6eb Added a check to prevent the stride of matrix C being set to 0 for the strided-batched-GEMM routine 2019-01-05 10:56:35 +01:00
Cedric Nugteren 560f7a40f6 Added convgemm to the CLBlast database, added initial parameters for Skylake GPU 2018-12-31 19:05:34 +01:00
Koichi Akabe 301dc280df Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel 2018-12-18 13:56:00 +09:00
Koichi Akabe 032e3b0cc0 Add kernel_mode option to im2col, col2im, and convgemm functions 2018-11-12 10:12:07 +09:00
Koichi Akabe 0b3d04f709 Fix col2im implementation 2018-10-30 14:54:55 +09:00
Cedric Nugteren d45911b61d Added groundwork for col2im algorithm plus first non-working version of kernel and test 2018-10-23 20:52:25 +02:00
Cedric Nugteren 44b630fc22 Some name changes in im2col code 2018-10-22 22:12:58 +02:00
Cedric Nugteren 83ba3d4b7b Merge branch 'master' into convgemm_multi_kernel 2018-09-16 20:01:18 +02:00
Cedric Nugteren 9bedaa752d Fixed an MSVC compilation error due to large strings 2018-09-15 17:35:26 +02:00
Cedric Nugteren c788e040f7 Added xCONVGEMM as im2col plus a batched GEMM kernel 2018-09-07 22:02:44 +02:00
Cedric Nugteren 1c9a741470 Merge branch 'master' into CLBlast-267-convgemm 2018-06-03 15:53:27 +02:00
Cedric Nugteren 7c3431a72a Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when barriers are present 2018-06-01 20:59:44 +02:00
Cedric Nugteren 5702bff5ad Added error-checking for half-empty local work group sizes; fixed a minor TRSV global worksize issue 2018-05-31 22:37:06 +02:00
Cedric Nugteren e609220393 Some potential fixes for error -54 when launching TRSV and TRSM kernels 2018-05-31 20:09:49 +02:00
Cedric Nugteren 01d254c0b0 Added a check to return 'NotImplemented' error code in case of systems with < 16 LWGS for TSRV and TRSM 2018-05-27 18:38:47 +02:00
Cedric Nugteren 53198121ac Made FillMatrix and FillVector functions take a configurable local workgroup size 2018-05-27 12:03:32 +02:00
Cedric Nugteren 5d87abf780 Added method selection option to switch between im2col and single-kernel approach for convgemm 2018-05-21 11:28:11 +02:00
Cedric Nugteren 37cabd4f1f Moved new convgemm kernel to levelx kernel folder 2018-05-19 21:05:45 +02:00
Cedric Nugteren 27b52ac2c8 Second version of direct reading from image tensor for convgemm: also with local memory support now 2018-05-19 21:02:44 +02:00
Cedric Nugteren e057a9186a First version of direct reading from image tensor for convgemm: only for edge cases now 2018-05-17 09:23:28 +01:00
Cedric Nugteren 0cb9580042 Created a dedicated convgemm GEMM kernel as a copy of the batched direct gemm kernel 2018-05-13 22:10:21 +02:00
Cedric Nugteren ad8f1027ab Plugged in the code of strided-batched-gemm into convgemm in preparation of a new kernel 2018-05-13 21:01:46 +02:00
Cedric Nugteren 4e6d30088d Changed temporary convgemm implementation to use batched-strided GEMM 2018-05-09 20:38:39 +02:00
Cedric Nugteren cc95d4fa03 Implemented convolution as im2col + GEMM 2018-05-09 17:42:59 +02:00
Cedric Nugteren 2d1f6ba7fe Added convgemm skeleton, test infrastructure, and first reference implementation 2018-05-06 11:35:34 +02:00
Cedric Nugteren 93610a9cba Fixed some failing tests for GEMM and batched GEMM routines 2018-04-15 12:53:32 +02:00
Cedric Nugteren 0dff7f1ac4 Made GEMM rotation expectations kernel-specific 2018-04-13 22:27:11 +02:00
Cedric Nugteren 69ed46c8da Implemented the XHAD Hadamard product routine 2018-02-02 21:18:37 +01:00
Cedric Nugteren ef5008f5e4 Created the API and stubs for the HAD (hadamard-product) routines 2018-01-31 20:41:02 +01:00
Cedric Nugteren caebe8a9d5 Fixed an event synchronisation issue in the batched gemm routines 2018-01-26 20:37:04 +01:00
Cedric Nugteren bc54411d19 Made the batched routines also chose direct/indirect kernel like the main GEMM routine 2018-01-18 19:41:02 +01:00
Cedric Nugteren 99a4df88a6 Implemented the in-direct version of the strided-batched GEMM kernel 2018-01-08 21:07:01 +01:00
Cedric Nugteren 13f0f6fc6e Implemented direct version of strided-batched GEMM kernel 2018-01-07 14:58:45 +01:00
Cedric Nugteren 9fb2c61b25 Added API and tests for new GemmStridedBatched routine 2018-01-07 14:27:15 +01:00
Cedric Nugteren f1e3b35541 Reduced duplicate code in the batched GEMM implementation 2018-01-06 19:26:11 +01:00
Cedric Nugteren 736399e528 Split the invert kernel in two parts to prevent error C1091 in MSVC 2013 2017-12-23 14:18:07 +01:00
Cedric Nugteren aa7db4f987 Added TRSV block-size tuner 2017-12-23 13:34:57 +01:00
Cedric Nugteren b4d3a50f19 Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit 2017-12-10 16:09:09 +01:00
Cedric Nugteren 9b0a435fb0 Integrated the GEMM routine tuner for kernel selection; added first tuning results 2017-11-02 21:47:14 +01:00
Cedric Nugteren b1270f04b8 Made buffers of batched routines read/write (was: read-only) 2017-10-17 19:56:47 +02:00
Cedric Nugteren ae1eeb4d1f Fixed type conversion warnings under MSVC 2013 2017-09-19 19:44:34 +02:00
Cedric Nugteren 297159d5b9 Fixed a bug in im2col: process only valid channel IDs 2017-08-31 21:58:12 +02:00
Cedric Nugteren 6194d43efb Fixed a bug in im2col confusing first and second workgroup size; made im2col kernel 2d instead of 3d 2017-08-31 20:34:10 +02:00
Cedric Nugteren 4d9d03ba51 Completed im2col implementation 2017-08-24 21:11:12 +02:00
Cedric Nugteren 803ca781f9 First version of im2col kernel, unoptimized but working 2017-08-19 18:25:13 +02:00
Cedric Nugteren 777681dcbd Merge branch 'master' into im_to_col 2017-08-12 20:50:00 +02:00
Cedric Nugteren f77b48692b Relaxed requirement on a_ld and b_ld for batched GEMM 2017-07-12 21:53:39 +02:00