CLBlast

mirror of https://github.com/CNugteren/CLBlast.git synced 2024-07-02 12:26:57 +02:00

Author	SHA1	Message	Date
Cedric Nugteren	5f4b3ffcf7	Merge pull request #383 from CNugteren/CLBlast-382-improve-tuner Move queue creation out of the tuner loop	2020-05-04 20:26:42 +02:00
Cedric Nugteren	bbb2031bf3	Move queue creation out of the tuner loop	2020-05-03 20:30:55 +02:00
Cedric Nugteren	78300ccbea	Merge pull request #378 from CNugteren/CLBlast-377-fix-amax-amin Change amax/amin behaviour	2020-03-15 11:34:31 +01:00
Cedric Nugteren	5f97d64505	Update API documentation	2020-03-08 11:29:47 +01:00
Cedric Nugteren	b46853660e	Made it more likely (but no guarantees) for amax/amin to return the first index	2020-03-08 11:26:49 +01:00
Cedric Nugteren	7fab29304c	Added sample to play around with XAMAX routine	2020-03-08 11:26:18 +01:00
Cedric Nugteren	e3ce88154a	Silenced a new OpenCL warning message	2020-03-08 10:14:59 +01:00
Cedric Nugteren	8433985051	Updated to version 1.5.1	2020-02-18 10:29:40 +01:00
Cedric Nugteren	bf4e4198b7	Merge pull request #376 from CNugteren/fix_tuner_exception_catching Catches all exceptions of the tuners	2020-02-18 10:23:43 +01:00
Cedric Nugteren	49eb490ee1	Catches all exceptions of the tuners	2020-02-17 22:07:51 +01:00
Cedric Nugteren	8a19667e75	Merge pull request #372 from trantila/master Reduced number of TestMatrix calls for the batched xgemm routines.	2019-12-15 09:33:53 +01:00
Tarmo Räntilä	21b66ca761	Reduce TestMatrix calls for xgemmstridedbatched. Replace the looped test by a single one with the offset of the last batch.	2019-12-09 22:17:24 +02:00
Tarmo Räntilä	bf50c4e53e	Reduce TestMatrix calls for xgemmbatched. Replace the looped test by a single one with the maximal found offset.	2019-12-09 22:13:52 +02:00
Cedric Nugteren	6ac74008b6	Added notion of fixes in XhadFaster	2019-09-06 19:33:30 +02:00
Cedric Nugteren	701ac9bf76	Merge pull request #368 from etomzak/master Fix out-of-bounds read/write in XhadFaster	2019-09-06 19:30:52 +02:00
etomzak	9560193a9e	Fix out-of-bounds read/write in XhadFaster Fix an error in XhadFaster where data would be written beyond the end of zgm. The kernel loop assumed that there was always enough work for each thread to process WPT items, but this was not enforced. It's possible to detect the overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be ~500 (much larger than the normal 127). This commit may improve the performance of XhadFaster, since the kernel was performing 2x work in some cases (once over real data, once over garbage). Courtesy of Codeplay Software Ltd.	2019-09-04 12:55:25 +01:00
Cedric Nugteren	ec501055f9	Merge pull request #360 from CNugteren/CLBlast-359-fix-broken-iamin Fixed a bug in the absolute-min index kernel	2019-05-19 22:39:26 +02:00
Cedric Nugteren	3f9d7bca22	Fixed a bug in the absolute-min index kernel	2019-05-19 14:00:18 +02:00
Cedric Nugteren	500d19be4c	Merge pull request #357 from CNugteren/CLBlast-355-intel-shuffle-extension-fix intel shuffle extension fix	2019-05-16 20:12:32 +02:00
Cedric Nugteren	af6a9eedd1	Added a function to set the OpenCL kernel standard, either 1.1 or 1.2	2019-05-11 20:39:00 +02:00
Cedric Nugteren	9cbffc9b7c	Changed back to cl_intel_subgroups as suggested	2019-05-08 22:01:56 +02:00
Cedric Nugteren	c5a82f6978	Added a host-code check to make sure the avc_motion_estimation is available	2019-05-07 20:47:50 +02:00
Cedric Nugteren	c6ba86cdc3	Enabled avc_motion_estimation extension for Intel subgroup shuffling	2019-05-07 20:47:31 +02:00
Cedric Nugteren	774cebaa40	Merge pull request #356 from umar456/osx_assert Remove assert for extension not available in macOS	2019-05-06 09:58:36 +02:00
Umar Arshad	cf4907942c	Remove assert for extension not available in macOS The cl_nv_device_attribute_query extension is not available on the Apple platform. This caused failures during debug builds at runtime.	2019-05-03 23:28:07 -04:00
Cedric Nugteren	7084311e45	Added tuning parameters for Tesla P100 16GB	2019-02-09 16:31:48 +01:00
Cedric Nugteren	1035e533cd	Added tuning parameters for Xeon E5-2630 v3 and v4	2019-02-09 16:29:30 +01:00
Cedric Nugteren	eff0f9ad1d	Merge pull request #348 from CNugteren/CLBlast-334-pyclblast-half-precision-support PyCLBlast half precision support	2019-01-26 11:04:14 +01:00
Cedric Nugteren	e0541c41a1	Added fp32 to fp16 conversion function in Python to make haxpy example work	2019-01-23 19:52:01 +01:00
Cedric Nugteren	347f0df32f	Added a (non-working) sample of half precision AXPY in Python	2019-01-22 21:14:43 +01:00
Cedric Nugteren	23b9f655fa	Updated pyclblast README, updated to 1.2.0 for half-precision support	2019-01-22 21:14:02 +01:00
Cedric Nugteren	3937efdcda	Added experimental support for half-precision in pyclblast	2019-01-22 21:13:41 +01:00
Cedric Nugteren	9a9c24e811	Merge pull request #345 from CNugteren/convolution-fixes-and-tuner Convolution with single kernel	2019-01-19 17:56:05 +01:00
Cedric Nugteren	11f4c7dd93	Added documentation on the convgemm routine	2019-01-19 15:44:19 +01:00
Cedric Nugteren	c42e48068b	Added a few more initial Intel tuning parameters for convgemm	2019-01-19 15:32:35 +01:00
Cedric Nugteren	afcf5dc6eb	Added a check to prevent the stride of matrix C being set to 0 for the strided-batched-GEMM routine	2019-01-05 10:56:35 +01:00
Cedric Nugteren	560f7a40f6	Added convgemm to the CLBlast database, added initial parameters for Skylake GPU	2018-12-31 19:05:34 +01:00
Cedric Nugteren	d929525039	Added support for the convgemm tuner in the tuner database	2018-12-31 18:49:12 +01:00
Cedric Nugteren	153ac06cf2	Added the forgotten batch dimension to the tuner to get correct kernel executions	2018-12-31 13:19:58 +01:00
Cedric Nugteren	b894993967	Merge pull request #343 from vbkaisetsu/feature/convgemm-single Fix single kernel version of convgemm	2018-12-23 11:11:59 +01:00
Cedric Nugteren	1f41c3c50a	Merge branch 'master' into convolution-fixes-and-tuner	2018-12-22 11:40:19 +01:00
Koichi Akabe	9532f8652c	Update changelog	2018-12-21 11:08:01 +09:00
Koichi Akabe	c0883cf2fe	Update the documentation	2018-12-18 14:08:16 +09:00
Koichi Akabe	a8e6f813dd	Fix the xconvgemm tuner	2018-12-18 14:05:25 +09:00
Cedric Nugteren	1f0cd61824	Added first version of a tuner for the ConvGemm direct kernel	2018-12-18 13:59:26 +09:00
Koichi Akabe	301dc280df	Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel	2018-12-18 13:56:00 +09:00
Cedric Nugteren	9819957768	Merge pull request #342 from vbkaisetsu/fix/im2col-hf-tests Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm	2018-12-17 22:39:53 +01:00
Koichi Akabe	d9db543d75	Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm	2018-12-17 21:57:35 +09:00
Cedric Nugteren	0c9411c844	Updated to version 1.5.0	2018-12-04 20:46:02 +01:00
Cedric Nugteren	09ab5f512f	Updated the roadmap document	2018-12-01 17:20:36 +01:00

1439 commits