Cedric Nugteren
|
d94d086d6f
|
TBMV/TPMV/TRSV: Use the minimum x buffer size for copying to a temp buffer (#461)
|
2023-05-10 12:48:25 +02:00 |
Cedric Nugteren
|
4f24d92730
|
TRMV: Use the minimum x buffer size for copying to a temp buffer (#458)
|
2023-05-07 20:03:16 +02:00 |
Tarmo Räntilä
|
21b66ca761
|
Reduce TestMatrix calls for xgemmstridedbatched.
Replace the looped test by a single one with the offset of the last batch.
|
2019-12-09 22:17:24 +02:00 |
Tarmo Räntilä
|
bf50c4e53e
|
Reduce TestMatrix calls for xgemmbatched.
Replace the looped test by a single one with the maximal found offset.
|
2019-12-09 22:13:52 +02:00 |
Cedric Nugteren
|
9a9c24e811
|
Merge pull request #345 from CNugteren/convolution-fixes-and-tuner
Convolution with single kernel
|
2019-01-19 17:56:05 +01:00 |
Cedric Nugteren
|
afcf5dc6eb
|
Added a check to prevent the stride of matrix C being set to 0 for the strided-batched-GEMM routine
|
2019-01-05 10:56:35 +01:00 |
Cedric Nugteren
|
560f7a40f6
|
Added convgemm to the CLBlast database, added initial parameters for Skylake GPU
|
2018-12-31 19:05:34 +01:00 |
Koichi Akabe
|
301dc280df
|
Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel
|
2018-12-18 13:56:00 +09:00 |
Cedric Nugteren
|
c0e41b87cb
|
Fixed an issue for unequal MWG and NWG and the new GEMMK == 1 kernel
|
2018-11-30 20:23:26 +01:00 |
Koichi Akabe
|
032e3b0cc0
|
Add kernel_mode option to im2col, col2im, and convgemm functions
|
2018-11-12 10:12:07 +09:00 |
Koichi Akabe
|
0b3d04f709
|
Fix col2im implementation
|
2018-10-30 14:54:55 +09:00 |
Cedric Nugteren
|
d45911b61d
|
Added groundwork for col2im algorithm plus first non-working version of kernel and test
|
2018-10-23 20:52:25 +02:00 |
Cedric Nugteren
|
44b630fc22
|
Some name changes in im2col code
|
2018-10-22 22:12:58 +02:00 |
Cedric Nugteren
|
83ba3d4b7b
|
Merge branch 'master' into convgemm_multi_kernel
|
2018-09-16 20:01:18 +02:00 |
Cedric Nugteren
|
9bedaa752d
|
Fixed an MSVC compilation error due to large strings
|
2018-09-15 17:35:26 +02:00 |
Cedric Nugteren
|
c788e040f7
|
Added xCONVGEMM as im2col plus a batched GEMM kernel
|
2018-09-07 22:02:44 +02:00 |
Cedric Nugteren
|
bf43dbb4ee
|
Made last operation in TRSV and TRSM asynchronous, making the events not null
|
2018-08-13 22:58:44 +02:00 |
Cedric Nugteren
|
3115c15db5
|
Small refactoring of events in TRSV substitution routine
|
2018-08-13 22:58:01 +02:00 |
Cedric Nugteren
|
503ab74f02
|
Fixed issue with not performing complex conjugation under certain cases when transposing
|
2018-07-31 21:49:37 +02:00 |
Cedric Nugteren
|
1c9a741470
|
Merge branch 'master' into CLBlast-267-convgemm
|
2018-06-03 15:53:27 +02:00 |
Cedric Nugteren
|
7c3431a72a
|
Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when barriers are present
|
2018-06-01 20:59:44 +02:00 |
Cedric Nugteren
|
5702bff5ad
|
Added error-checking for half-empty local work group sizes; fixed a minor TRSV global worksize issue
|
2018-05-31 22:37:06 +02:00 |
Cedric Nugteren
|
e609220393
|
Some potential fixes for error -54 when launching TRSV and TRSM kernels
|
2018-05-31 20:09:49 +02:00 |
Cedric Nugteren
|
ff4d5558a6
|
Widened Apple OpenCL check, added way to debug too-large-workgroups issue
|
2018-05-30 22:59:04 +02:00 |
Cedric Nugteren
|
01d254c0b0
|
Added a check to return 'NotImplemented' error code in case of systems with < 16 LWGS for TRSV and TRSM
|
2018-05-27 18:38:47 +02:00 |
Cedric Nugteren
|
53198121ac
|
Made FillMatrix and FillVector functions take a configurable local workgroup size
|
2018-05-27 12:03:32 +02:00 |
Cedric Nugteren
|
5d87abf780
|
Added method selection option to switch between im2col and single-kernel approach for convgemm
|
2018-05-21 11:28:11 +02:00 |
Cedric Nugteren
|
37cabd4f1f
|
Moved new convgemm kernel to levelx kernel folder
|
2018-05-19 21:05:45 +02:00 |
Cedric Nugteren
|
27b52ac2c8
|
Second version of direct reading from image tensor for convgemm: also with local memory support now
|
2018-05-19 21:02:44 +02:00 |
Cedric Nugteren
|
cbcd4ff7e8
|
Merge branch 'master' into CLBlast-267-convgemm
|
2018-05-19 17:54:27 +02:00 |
Cedric Nugteren
|
e057a9186a
|
First version of direct reading from image tensor for convgemm: only for edge cases now
|
2018-05-17 09:23:28 +01:00 |
Cedric Nugteren
|
0cb9580042
|
Created a dedicated convgemm GEMM kernel as a copy of the batched direct gemm kernel
|
2018-05-13 22:10:21 +02:00 |
Cedric Nugteren
|
ad8f1027ab
|
Plugged in the code of strided-batched-gemm into convgemm in preparation of a new kernel
|
2018-05-13 21:01:46 +02:00 |
Cedric Nugteren
|
4e6d30088d
|
Changed temporary convgemm implementation to use batched-strided GEMM
|
2018-05-09 20:38:39 +02:00 |
Cedric Nugteren
|
cc95d4fa03
|
Implemented convolution as im2col + GEMM
|
2018-05-09 17:42:59 +02:00 |
Cedric Nugteren
|
2d1f6ba7fe
|
Added convgemm skeleton, test infrastructure, and first reference implementation
|
2018-05-06 11:35:34 +02:00 |
Cedric Nugteren
|
8258321a74
|
Now stores a shared_ptr to the Program class in the cache
|
2018-05-01 20:34:48 +02:00 |
Cedric Nugteren
|
458e6717a9
|
Expressed HER2K as two HERK calls
|
2018-04-18 20:58:29 +02:00 |
Cedric Nugteren
|
dcce23d938
|
Expressed SYR2K as two SYRK calls
|
2018-04-18 20:29:28 +02:00 |
Cedric Nugteren
|
ef6b1207df
|
Updated HERK and SYRK to follow the GEMM style and functions to make it work with the new kernel
|
2018-04-17 21:13:28 +02:00 |
Cedric Nugteren
|
93610a9cba
|
Fixed some failing tests for GEMM and batched GEMM routines
|
2018-04-15 12:53:32 +02:00 |
Cedric Nugteren
|
0dff7f1ac4
|
Made GEMM rotation expectations kernel-specific
|
2018-04-13 22:27:11 +02:00 |
Cedric Nugteren
|
52791bf355
|
Fixed a failing TRSM test using a CPU with Apple OpenCL
|
2018-03-15 21:09:52 +01:00 |
Cedric Nugteren
|
7a756cbce7
|
Fixed a failing TRSV test using a CPU with Apple OpenCL
|
2018-03-15 20:58:42 +01:00 |
Cedric Nugteren
|
69ed46c8da
|
Implemented the XHAD Hadamard product routine
|
2018-02-02 21:18:37 +01:00 |
Cedric Nugteren
|
ef5008f5e4
|
Created the API and stubs for the HAD (hadamard-product) routines
|
2018-01-31 20:41:02 +01:00 |
Cedric Nugteren
|
caebe8a9d5
|
Fixed an event synchronisation issue in the batched gemm routines
|
2018-01-26 20:37:04 +01:00 |
Cedric Nugteren
|
bc54411d19
|
Made the batched routines also choose direct/indirect kernel like the main GEMM routine
|
2018-01-18 19:41:02 +01:00 |
Cedric Nugteren
|
99a4df88a6
|
Implemented the indirect version of the strided-batched GEMM kernel
|
2018-01-08 21:07:01 +01:00 |
Cedric Nugteren
|
13f0f6fc6e
|
Implemented direct version of strided-batched GEMM kernel
|
2018-01-07 14:58:45 +01:00 |