CLBlast

Commit Graph

Author	SHA1	Message	Date
Cedric Nugteren	3d0c227fa5	AMAX/AMIN integer testing and bug fixes (#457) * Fixed a bug in XAMAX/XMIN routines that caused the increment and offset to be included in the result * Perform proper integer-output testing in XAMAX tests * A few changes towards getting it ready for a PR * Also fix compilation for clBLAS and cuBLAS references * Fix a bug that would only use the real part of complex numbers in the amax/amin routines * A few small fixes related to the AMAX tests	2023-05-07 20:02:52 +02:00
Angus, Alexander	73f49e9b3d	Updated according to feedback from CNugteren	2023-01-17 08:35:29 -08:00
Angus, Alexander	4f394608a2	implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731	2023-01-03 10:56:04 -08:00
Cedric Nugteren	38fa34b432	Fix typo in comment. Resolves https://github.com/CNugteren/CLBlast/issues/440	2022-06-24 09:32:47 +02:00
Justin Graham	ba254d2f50	sum fix	2022-04-22 11:39:38 -05:00
Cedric Nugteren	b46853660e	Made it more likely (but no guarantees) for amax/amin to return the first index	2020-03-08 11:26:49 +01:00
etomzak	9560193a9e	Fix out-of-bounds read/write in XhadFaster. Fix an error in XhadFaster where data would be written beyond the end of zgm. The kernel loop assumed that there was always enough work for each thread to process WPT items, but this was not enforced. It's possible to detect the overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be ~500 (much larger than the normal 127). This commit may improve the performance of XhadFaster, since the kernel was performing 2x work in some cases (once over real data, once over garbage). Courtesy of Codeplay Software Ltd.	2019-09-04 12:55:25 +01:00
Cedric Nugteren	3f9d7bca22	Fixed a bug in the absolute-min index kernel	2019-05-19 14:00:18 +02:00
Cedric Nugteren	9cbffc9b7c	Changed back to cl_intel_subgroups as suggested	2019-05-08 22:01:56 +02:00
Cedric Nugteren	c6ba86cdc3	Enabled avc_motion_estimation extension for Intel subgroup shuffling	2019-05-07 20:47:31 +02:00
Koichi Akabe	301dc280df	Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel	2018-12-18 13:56:00 +09:00
Koichi Akabe	a646d6ca46	Remove unnecessary attribute of inline function	2018-11-19 13:03:50 +09:00
Koichi Akabe	032e3b0cc0	Add kernel_mode option to im2col, col2im, and convgemm functions	2018-11-12 10:12:07 +09:00
Cedric Nugteren	6f67525ea6	Changed col2im to append to the existing im-buffer	2018-11-07 19:45:07 +01:00
Cedric Nugteren	2d32a23293	Added new col2im routine to the documentation	2018-11-01 21:46:19 +01:00
Koichi Akabe	0b3d04f709	Fix col2im implementation	2018-10-30 14:54:55 +09:00
Cedric Nugteren	d45911b61d	Added groundwork for col2im algorithm plus first non-working version of kernel and test	2018-10-23 20:52:25 +02:00
Cedric Nugteren	9a1454496d	Fixed a bug with the pre-processing and the AXPY kernel	2018-10-17 21:15:53 +02:00
Cedric Nugteren	664a238adf	Fixed a bug in the XaxpyFaster kernel for specific parameters	2018-10-15 20:08:29 +02:00
Cedric Nugteren	634b2bc75c	Merge pull request #319 from CNugteren/convgemm_multi_kernel: First im2col+GEMM implementation of convolution	2018-10-14 17:27:45 +02:00
Cedric Nugteren	1736c0cef4	Fixed pre-processor warnings related to the subgroup shuffling	2018-10-10 19:12:42 +02:00
Cedric Nugteren	83ba3d4b7b	Merge branch 'master' into convgemm_multi_kernel	2018-09-16 20:01:18 +02:00
Cedric Nugteren	0f6dd01e51	Fixed an MSVC compilation error due to large strings	2018-09-15 19:58:07 +02:00
Cedric Nugteren	51cc346751	Fixed issues with GEMMK=1 kernel and the pre-processor	2018-09-15 16:50:34 +02:00
Cedric Nugteren	c788e040f7	Added xCONVGEMM as im2col plus a batched GEMM kernel	2018-09-07 22:02:44 +02:00
Cedric Nugteren	5903820ba2	Merge branch 'master' into CLBlast-267-convgemm	2018-07-29 10:26:34 +02:00
Cedric Nugteren	0f0baa561b	Disabled the use of staggered indices on AMD GPUs for the new GEMMK == 1 kernels to improve performance	2018-07-28 14:36:33 +02:00
Cedric Nugteren	03bed8633e	Fixed an issue with AMD GPUs and the new GEMMK == 1 kernel	2018-07-27 23:08:49 +02:00
Tyler Sorensen	0772d63498	moved a two-line macro to a single line	2018-07-16 20:12:30 -04:00
Tyler Sorensen	7709a7308b	Applied feedback from Cedric from first pull request	2018-07-14 19:50:47 -04:00
Tyler Sorensen	7f2e98a140	added inline ptx to support shuffle on Nvidia GPUs	2018-07-11 15:12:22 -04:00
Cedric Nugteren	1c9a741470	Merge branch 'master' into CLBlast-267-convgemm	2018-06-03 15:53:27 +02:00
Cedric Nugteren	e609220393	Some potential fixes for error -54 when launching TRSV and TRSM kernels	2018-05-31 20:09:49 +02:00
Cedric Nugteren	838422fbb1	Further implemented single-kernel approach of convgemm; extended test to capture other parts of the kernel code	2018-05-21 11:47:16 +02:00
Cedric Nugteren	5d87abf780	Added method selection option to switch between im2col and single-kernel approach for convgemm	2018-05-21 11:28:11 +02:00
Cedric Nugteren	37cabd4f1f	Moved new convgemm kernel to levelx kernel folder	2018-05-19 21:05:45 +02:00
Cedric Nugteren	27b52ac2c8	Second version of direct reading from image tensor for convgemm: also with local memory support now	2018-05-19 21:02:44 +02:00
Cedric Nugteren	e057a9186a	First version of direct reading from image tensor for convgemm: only for edge cases now	2018-05-17 09:23:28 +01:00
Cedric Nugteren	0cb9580042	Created a dedicated convgemm GEMM kernel as a copy of the batched direct gemm kernel	2018-05-13 22:10:21 +02:00
Cedric Nugteren	ad8f1027ab	Plugged in the code of strided-batched-gemm into convgemm in preparation of a new kernel	2018-05-13 21:01:46 +02:00
Cedric Nugteren	2965b87dda	Added Intel subgroup shuffle support to the 2D register caching GEMM kernel	2018-04-24 21:32:42 +02:00
Cedric Nugteren	a93fec1026	Fixed issues with the pre-processor	2018-04-08 18:02:44 +02:00
Cedric Nugteren	3519d32ac4	Extended the GEMM tuner to be able to tune the new 'kernel 1'	2018-04-07 17:05:44 +02:00
Cedric Nugteren	381f1fe67a	Fixed a compilation issue for complex datatypes and vload	2018-04-07 16:57:36 +02:00
Cedric Nugteren	2a29dc061c	Fixed a compilation issue for complex datatypes and vload	2018-04-06 21:06:13 +02:00
Cedric Nugteren	eae25f5727	Added first version of 2D register tiling kernel with A and C transposed as well	2018-04-03 21:18:40 +02:00
Cedric Nugteren	1cbe2ea301	Removed arrays as function argument from GEMM kernels for Vivante OpenCL compiler	2018-03-23 20:29:20 +01:00
Cedric Nugteren	52791bf355	Fixed a failing TRSM test using a CPU with Apple OpenCL	2018-03-15 21:09:52 +01:00
Cedric Nugteren	7a756cbce7	Fixed a failing TRSV test using a CPU with Apple OpenCL	2018-03-15 20:58:42 +01:00
Cedric Nugteren	69ed46c8da	Implemented the XHAD Hadamard product routine	2018-02-02 21:18:37 +01:00

196 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)