CLBlast

mirror of https://github.com/CNugteren/CLBlast.git synced 2024-08-21 04:22:27 +02:00

Author	SHA1	Message	Date
Cedric Nugteren	c5a82f6978	Added a host-code check to make sure the avc_motion_estimation is available	2019-05-07 20:47:50 +02:00
Cedric Nugteren	c6ba86cdc3	Enabled avc_motion_estimation extension for Intel subgroup shuffling	2019-05-07 20:47:31 +02:00
Cedric Nugteren	774cebaa40	Merge pull request #356 from umar456/osx_assert Remove assert for extension not available in macOS	2019-05-06 09:58:36 +02:00
Umar Arshad	cf4907942c	Remove assert for extension not available in macOS The cl_nv_device_attribute_query extension is not available on the Apple platform. This caused failures during debug builds at runtime.	2019-05-03 23:28:07 -04:00
Cedric Nugteren	7084311e45	Added tuning parameters for Tesla P100 16GB	2019-02-09 16:31:48 +01:00
Cedric Nugteren	1035e533cd	Added tuning parameters for Xeon E5-2630 v3 and v4	2019-02-09 16:29:30 +01:00
Cedric Nugteren	eff0f9ad1d	Merge pull request #348 from CNugteren/CLBlast-334-pyclblast-half-precision-support PyCLBlast half precision support	2019-01-26 11:04:14 +01:00
Cedric Nugteren	e0541c41a1	Added fp32 to fp16 conversion function in Python to make haxpy example work	2019-01-23 19:52:01 +01:00
Cedric Nugteren	347f0df32f	Added a (non-working) sample of half precision AXPY in Python	2019-01-22 21:14:43 +01:00
Cedric Nugteren	23b9f655fa	Updated pyclblast README, updated to 1.2.0 for half-precision support	2019-01-22 21:14:02 +01:00
Cedric Nugteren	3937efdcda	Added experimental support for half-precision in pyclblast	2019-01-22 21:13:41 +01:00
Cedric Nugteren	9a9c24e811	Merge pull request #345 from CNugteren/convolution-fixes-and-tuner Convolution with single kernel	2019-01-19 17:56:05 +01:00
Cedric Nugteren	11f4c7dd93	Added documentation on the convgemm routine	2019-01-19 15:44:19 +01:00
Cedric Nugteren	c42e48068b	Added a few more initial Intel tuning parameters for convgemm	2019-01-19 15:32:35 +01:00
Cedric Nugteren	afcf5dc6eb	Added a check to prevent the stride of matrix C being set to 0 for the strided-batched-GEMM routine	2019-01-05 10:56:35 +01:00
Cedric Nugteren	560f7a40f6	Added convgemm to the CLBlast database, added initial parameters for Skylake GPU	2018-12-31 19:05:34 +01:00
Cedric Nugteren	d929525039	Added support for the convgemm tuner in the tuner database	2018-12-31 18:49:12 +01:00
Cedric Nugteren	153ac06cf2	Added the forgotten batch dimension to the tuner to get correct kernel executions	2018-12-31 13:19:58 +01:00
Cedric Nugteren	b894993967	Merge pull request #343 from vbkaisetsu/feature/convgemm-single Fix single kernel version of convgemm	2018-12-23 11:11:59 +01:00
Cedric Nugteren	1f41c3c50a	Merge branch 'master' into convolution-fixes-and-tuner	2018-12-22 11:40:19 +01:00
Koichi Akabe	9532f8652c	Update changelog	2018-12-21 11:08:01 +09:00
Koichi Akabe	c0883cf2fe	Update the documentation	2018-12-18 14:08:16 +09:00
Koichi Akabe	a8e6f813dd	Fix the xconvgemm tuner	2018-12-18 14:05:25 +09:00
Cedric Nugteren	1f0cd61824	Added first version of a tuner for the ConvGemm direct kernel	2018-12-18 13:59:26 +09:00
Koichi Akabe	301dc280df	Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel	2018-12-18 13:56:00 +09:00
Cedric Nugteren	9819957768	Merge pull request #342 from vbkaisetsu/fix/im2col-hf-tests Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm	2018-12-17 22:39:53 +01:00
Koichi Akabe	d9db543d75	Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm	2018-12-17 21:57:35 +09:00
Cedric Nugteren	0c9411c844	Updated to version 1.5.0	2018-12-04 20:46:02 +01:00
Cedric Nugteren	09ab5f512f	Updated the roadmap document	2018-12-01 17:20:36 +01:00
Cedric Nugteren	4676ec2921	Added a FAQ document	2018-12-01 17:19:28 +01:00
Cedric Nugteren	cec021ac34	Merge pull request #341 from CNugteren/CLBlast-340-GEMMK1-issue-with-unequal-MWG-NWG Fixed an issue for the GEMMK == 1 kernel	2018-12-01 17:14:47 +01:00
Cedric Nugteren	c0e41b87cb	Fixed an issue for unequal MWG and NWG and the new GEMMK == 1 kernel	2018-11-30 20:23:26 +01:00
Cedric Nugteren	bca1506e87	Merge pull request #335 from vbkaisetsu/patch-1 Remove unnecessary qualifier of inline function	2018-11-19 21:03:27 +01:00
Koichi Akabe	a646d6ca46	Remove unnecessary attribute of inline function	2018-11-19 13:03:50 +09:00
Cedric Nugteren	e0ddfbfa3b	Merge pull request #332 from vbkaisetsu/feature/im2col-col2im-flip Add im2colflip and col2imflip functions	2018-11-17 20:51:11 +01:00
Koichi Akabe	032e3b0cc0	Add kernel_mode option to im2col, col2im, and convgemm functions	2018-11-12 10:12:07 +09:00
Cedric Nugteren	90112618da	Merge pull request #331 from CNugteren/CLBlast-270-col2im Implements col2im routine	2018-11-09 08:06:13 +01:00
Cedric Nugteren	6f67525ea6	Changed col2im to append to the existing im-buffer	2018-11-07 19:45:07 +01:00
Cedric Nugteren	2d32a23293	Added new col2im routine to the documentation	2018-11-01 21:46:19 +01:00
Cedric Nugteren	469c346a8e	Fixed half-precision tests for im2col and col2im	2018-11-01 21:44:21 +01:00
Cedric Nugteren	4215bbe62a	Merge pull request #330 from vbkaisetsu/CLBlast-270-col2im Add col2im function	2018-10-31 10:37:21 +01:00
Koichi Akabe	0b3d04f709	Fix col2im implementation	2018-10-30 14:54:55 +09:00
Cedric Nugteren	441373c8fd	Merge pull request #329 from tholu/patch-1 Update FindOpenCL.cmake	2018-10-29 20:06:01 +01:00
Thomas Lutz	17d045cc41	Update FindOpenCL.cmake Add path to ROCm OpenCL as possible location in cmake script	2018-10-28 21:52:25 +01:00
Cedric Nugteren	d45911b61d	Added groundwork for col2im algorithm plus first non-working version of kernel and test	2018-10-23 20:52:25 +02:00
Cedric Nugteren	44b630fc22	Some name changes in im2col code	2018-10-22 22:12:58 +02:00
Cedric Nugteren	ab0178c56b	Fixed MSVC's compilation error C1061 due to too many for-loops	2018-10-17 21:35:09 +02:00
Cedric Nugteren	9a1454496d	Fixed a bug with the pre-processing and the AXPY kernel	2018-10-17 21:15:53 +02:00
Cedric Nugteren	e33542acdd	Merge pull request #325 from CNugteren/CLBlast-321-axpy-faster-kernel-bug Fixed a bug in the XaxpyFaster kernel for specific parameters	2018-10-16 21:06:57 +02:00
Cedric Nugteren	664a238adf	Fixed a bug in the XaxpyFaster kernel for specific parameters	2018-10-15 20:08:29 +02:00

1318 commits