CLBlast

mirror of https://github.com/CNugteren/CLBlast.git synced 2024-07-07 12:23:46 +02:00

Author	SHA1	Message	Date
Cedric Nugteren	8a19667e75	Merge pull request #372 from trantila/master Reduced number of TestMatrix calls for the batched xgemm routines.	2019-12-15 09:33:53 +01:00
Tarmo Räntilä	21b66ca761	Reduce TestMatrix calls for xgemmstridedbatched. Replace the looped test by a single one with the offset of the last batch.	2019-12-09 22:17:24 +02:00
Tarmo Räntilä	bf50c4e53e	Reduce TestMatrix calls for xgemmbatched. Replace the looped test by a single one with the maximal found offset.	2019-12-09 22:13:52 +02:00
Cedric Nugteren	6ac74008b6	Added notion of fixes in XhadFaster	2019-09-06 19:33:30 +02:00
Cedric Nugteren	701ac9bf76	Merge pull request #368 from etomzak/master Fix out-of-bounds read/write in XhadFaster	2019-09-06 19:30:52 +02:00
etomzak	9560193a9e	Fix out-of-bounds read/write in XhadFaster. Fix an error in XhadFaster where data would be written beyond the end of zgm. The kernel loop assumed that there was always enough work for each thread to process WPT items, but this was not enforced. It's possible to detect the overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be ~500 (much larger than the normal 127). This commit may improve the performance of XhadFaster, since the kernel was performing 2x work in some cases (once over real data, once over garbage). Courtesy of Codeplay Software Ltd.	2019-09-04 12:55:25 +01:00
Cedric Nugteren	ec501055f9	Merge pull request #360 from CNugteren/CLBlast-359-fix-broken-iamin Fixed a bug in the absolute-min index kernel	2019-05-19 22:39:26 +02:00
Cedric Nugteren	3f9d7bca22	Fixed a bug in the absolute-min index kernel	2019-05-19 14:00:18 +02:00
Cedric Nugteren	500d19be4c	Merge pull request #357 from CNugteren/CLBlast-355-intel-shuffle-extension-fix intel shuffle extension fix	2019-05-16 20:12:32 +02:00
Cedric Nugteren	af6a9eedd1	Added a function to set the OpenCL kernel standard, either 1.1 or 1.2	2019-05-11 20:39:00 +02:00
Cedric Nugteren	9cbffc9b7c	Changed back to cl_intel_subgroups as suggested	2019-05-08 22:01:56 +02:00
Cedric Nugteren	c5a82f6978	Added a host-code check to make sure the avc_motion_estimation is available	2019-05-07 20:47:50 +02:00
Cedric Nugteren	c6ba86cdc3	Enabled avc_motion_estimation extension for Intel subgroup shuffling	2019-05-07 20:47:31 +02:00
Cedric Nugteren	774cebaa40	Merge pull request #356 from umar456/osx_assert Remove assert for extention not available in macOS	2019-05-06 09:58:36 +02:00
Umar Arshad	cf4907942c	Remove assert for extention not available in macOS. The cl_nv_device_attribute_query extention is not available on the Apple platform. This caused failures during debug builds at runtime.	2019-05-03 23:28:07 -04:00
Cedric Nugteren	7084311e45	Added tuning parameters for Tesla P100 16GB	2019-02-09 16:31:48 +01:00
Cedric Nugteren	1035e533cd	Added tuning parameters for Xeon E5-2630 v3 and v4	2019-02-09 16:29:30 +01:00
Cedric Nugteren	eff0f9ad1d	Merge pull request #348 from CNugteren/CLBlast-334-pyclblast-half-precision-support PyCLBlast half precision support	2019-01-26 11:04:14 +01:00
Cedric Nugteren	e0541c41a1	Added fp32 to fp16 conversion function in Python to make haxpy example work	2019-01-23 19:52:01 +01:00
Cedric Nugteren	347f0df32f	Added a (non-working) sample of half precision AXPY in Python	2019-01-22 21:14:43 +01:00
Cedric Nugteren	23b9f655fa	Updated pyclblast README, updated to 1.2.0 for half-precision support	2019-01-22 21:14:02 +01:00
Cedric Nugteren	3937efdcda	Added experimental support for half-precision in pyclblast	2019-01-22 21:13:41 +01:00
Cedric Nugteren	9a9c24e811	Merge pull request #345 from CNugteren/convolution-fixes-and-tuner Convolution with single kernel	2019-01-19 17:56:05 +01:00
Cedric Nugteren	11f4c7dd93	Added documentation on the convgemm routine	2019-01-19 15:44:19 +01:00
Cedric Nugteren	c42e48068b	Added a few more initial Intel tuning parameters for convgemm	2019-01-19 15:32:35 +01:00
Cedric Nugteren	afcf5dc6eb	Added a check to prevent the stride of matrix C being set to 0 for the strided-batched-GEMM routine	2019-01-05 10:56:35 +01:00
Cedric Nugteren	560f7a40f6	Added convgemm to the CLBlast database, added initial parameters for Skylake GPU	2018-12-31 19:05:34 +01:00
Cedric Nugteren	d929525039	Added support for the convgemm tuner in the tuner database	2018-12-31 18:49:12 +01:00
Cedric Nugteren	153ac06cf2	Added the forgotten batch dimension to the tuner to get correct kernel executions	2018-12-31 13:19:58 +01:00
Cedric Nugteren	b894993967	Merge pull request #343 from vbkaisetsu/feature/convgemm-single Fix single kernel version of convgemm	2018-12-23 11:11:59 +01:00
Cedric Nugteren	1f41c3c50a	Merge branch 'master' into convolution-fixes-and-tuner	2018-12-22 11:40:19 +01:00
Koichi Akabe	9532f8652c	Update changelog	2018-12-21 11:08:01 +09:00
Koichi Akabe	c0883cf2fe	Update the documentation	2018-12-18 14:08:16 +09:00
Koichi Akabe	a8e6f813dd	Fix the xconvgemm tuner	2018-12-18 14:05:25 +09:00
Cedric Nugteren	1f0cd61824	Added first version of a tuner for the ConvGemm direct kernel	2018-12-18 13:59:26 +09:00
Koichi Akabe	301dc280df	Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel	2018-12-18 13:56:00 +09:00
Cedric Nugteren	9819957768	Merge pull request #342 from vbkaisetsu/fix/im2col-hf-tests Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm	2018-12-17 22:39:53 +01:00
Koichi Akabe	d9db543d75	Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm	2018-12-17 21:57:35 +09:00
Cedric Nugteren	0c9411c844	Updated to version 1.5.0	2018-12-04 20:46:02 +01:00
Cedric Nugteren	09ab5f512f	Updated the roadmap document	2018-12-01 17:20:36 +01:00
Cedric Nugteren	4676ec2921	Added a FAQ document	2018-12-01 17:19:28 +01:00
Cedric Nugteren	cec021ac34	Merge pull request #341 from CNugteren/CLBlast-340-GEMMK1-issue-with-unequal-MWG-NWG Fixed an issue for the GEMMK == 1 kernel	2018-12-01 17:14:47 +01:00
Cedric Nugteren	c0e41b87cb	Fixed an issue for unequal MWG and NWG and the new GEMMK == 1 kernel	2018-11-30 20:23:26 +01:00
Cedric Nugteren	bca1506e87	Merge pull request #335 from vbkaisetsu/patch-1 Remove unnecessary qualifier of inline function	2018-11-19 21:03:27 +01:00
Koichi Akabe	a646d6ca46	Remove unnecessary attribute of inline function	2018-11-19 13:03:50 +09:00
Cedric Nugteren	e0ddfbfa3b	Merge pull request #332 from vbkaisetsu/feature/im2col-col2im-flip Add im2colflip and col2imflip functions	2018-11-17 20:51:11 +01:00
Koichi Akabe	032e3b0cc0	Add kernel_mode option to im2col, col2im, and convgemm functions	2018-11-12 10:12:07 +09:00
Cedric Nugteren	90112618da	Merge pull request #331 from CNugteren/CLBlast-270-col2im Implements col2im routine	2018-11-09 08:06:13 +01:00
Cedric Nugteren	6f67525ea6	Changed col2im to append to the existing im-buffer	2018-11-07 19:45:07 +01:00
Cedric Nugteren	2d32a23293	Added new col2im routine to the documentation	2018-11-01 21:46:19 +01:00

1329 commits