etomzak
9560193a9e
Fix out-of-bounds read/write in XhadFaster
...
Fix an error in XhadFaster where data would be written beyond the end of zgm.
The kernel loop assumed that there was always enough work for each thread to
process WPT items, but this was not enforced. It's possible to detect the
overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be
~500 (much larger than the normal 127).
This commit may improve the performance of XhadFaster, since the kernel was
performing 2x work in some cases (once over real data, once over garbage).
Courtesy of Codeplay Software Ltd.
2019-09-04 12:55:25 +01:00
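The overflow described in the XhadFaster fix above can be illustrated with a simplified host-side emulation. This is only a sketch, not the actual OpenCL kernel: `kWpt`, `kCanarySize`, `kCanaryValue`, `ElementwiseProduct`, `CountClobbered`, and `RunDemo` are hypothetical names chosen for illustration, and the values are far smaller than the ones the commit mentions. It assumes the thread count is rounded up so that every element is covered, and it shows how an unguarded work-per-thread loop writes past the last valid element while a bounds check keeps the canary region intact.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kWpt = 4;         // hypothetical work-per-thread factor
constexpr std::size_t kCanarySize = 8;  // hypothetical guard region length
constexpr float kCanaryValue = -1.0f;   // hypothetical sentinel value

// Emulates one kernel launch: each "thread" handles up to kWpt elements.
// With guarded == false the loop assumes every thread has kWpt valid items.
void ElementwiseProduct(const std::vector<float>& x, const std::vector<float>& y,
                        std::vector<float>& z, std::size_t n, bool guarded) {
  const std::size_t num_threads = (n + kWpt - 1) / kWpt;  // rounded up
  for (std::size_t tid = 0; tid < num_threads; ++tid) {
    for (std::size_t w = 0; w < kWpt; ++w) {
      const std::size_t id = tid * kWpt + w;
      if (guarded && id >= n) { break; }  // the bounds check that was missing
      z[id] = x[id] * y[id];
    }
  }
}

// Counts canary elements (those past index n) that no longer hold the sentinel.
std::size_t CountClobbered(const std::vector<float>& z, std::size_t n) {
  std::size_t count = 0;
  for (std::size_t i = n; i < z.size(); ++i) {
    if (z[i] != kCanaryValue) { ++count; }
  }
  return count;
}

// Allocates buffers with a canary tail, runs the emulated kernel, and reports
// how many canary elements of the output were overwritten.
std::size_t RunDemo(std::size_t n, bool guarded) {
  std::vector<float> x(n + kCanarySize, 2.0f);
  std::vector<float> y(n + kCanarySize, 3.0f);
  std::vector<float> z(n + kCanarySize, kCanaryValue);
  ElementwiseProduct(x, y, z, n, guarded);
  return CountClobbered(z, n);
}
```

In this sketch the unguarded loop can overrun by at most `kWpt - 1` elements, so a small canary region is enough to catch it. The commit message indicates that the real kernel could write much further past the end, which is why a canary region of about 500 elements was needed to detect it there.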
Cedric Nugteren
ec501055f9
Merge pull request #360 from CNugteren/CLBlast-359-fix-broken-iamin
...
Fixed a bug in the absolute-min index kernel
2019-05-19 22:39:26 +02:00
Cedric Nugteren
3f9d7bca22
Fixed a bug in the absolute-min index kernel
2019-05-19 14:00:18 +02:00
Cedric Nugteren
500d19be4c
Merge pull request #357 from CNugteren/CLBlast-355-intel-shuffle-extension-fix
...
Intel shuffle extension fix
2019-05-16 20:12:32 +02:00
Cedric Nugteren
af6a9eedd1
Added a function to set the OpenCL kernel standard, either 1.1 or 1.2
2019-05-11 20:39:00 +02:00
Cedric Nugteren
9cbffc9b7c
Changed back to cl_intel_subgroups as suggested
2019-05-08 22:01:56 +02:00
Cedric Nugteren
c5a82f6978
Added a host-code check to make sure the avc_motion_estimation is available
2019-05-07 20:47:50 +02:00
Cedric Nugteren
c6ba86cdc3
Enabled avc_motion_estimation extension for Intel subgroup shuffling
2019-05-07 20:47:31 +02:00
Cedric Nugteren
774cebaa40
Merge pull request #356 from umar456/osx_assert
...
Remove assert for extension not available in macOS
2019-05-06 09:58:36 +02:00
Umar Arshad
cf4907942c
Remove assert for extension not available in macOS
...
The cl_nv_device_attribute_query extension is not available on the
Apple platform. This caused failures during debug builds at runtime.
2019-05-03 23:28:07 -04:00
Cedric Nugteren
7084311e45
Added tuning parameters for Tesla P100 16GB
2019-02-09 16:31:48 +01:00
Cedric Nugteren
1035e533cd
Added tuning parameters for Xeon E5-2630 v3 and v4
2019-02-09 16:29:30 +01:00
Cedric Nugteren
eff0f9ad1d
Merge pull request #348 from CNugteren/CLBlast-334-pyclblast-half-precision-support
...
PyCLBlast half precision support
2019-01-26 11:04:14 +01:00
Cedric Nugteren
e0541c41a1
Added fp32 to fp16 conversion function in Python to make haxpy example work
2019-01-23 19:52:01 +01:00
Cedric Nugteren
347f0df32f
Added a (non-working) sample of half precision AXPY in Python
2019-01-22 21:14:43 +01:00
Cedric Nugteren
23b9f655fa
Updated pyclblast README, updated to 1.2.0 for half-precision support
2019-01-22 21:14:02 +01:00
Cedric Nugteren
3937efdcda
Added experimental support for half-precision in pyclblast
2019-01-22 21:13:41 +01:00
Cedric Nugteren
9a9c24e811
Merge pull request #345 from CNugteren/convolution-fixes-and-tuner
...
Convolution with single kernel
2019-01-19 17:56:05 +01:00
Cedric Nugteren
11f4c7dd93
Added documentation on the convgemm routine
2019-01-19 15:44:19 +01:00
Cedric Nugteren
c42e48068b
Added a few more initial Intel tuning parameters for convgemm
2019-01-19 15:32:35 +01:00
Cedric Nugteren
afcf5dc6eb
Added a check to prevent the stride of matrix C being set to 0 for the strided-batched-GEMM routine
2019-01-05 10:56:35 +01:00
Cedric Nugteren
560f7a40f6
Added convgemm to the CLBlast database, added initial parameters for Skylake GPU
2018-12-31 19:05:34 +01:00
Cedric Nugteren
d929525039
Added support for the convgemm tuner in the tuner database
2018-12-31 18:49:12 +01:00
Cedric Nugteren
153ac06cf2
Added the forgotten batch dimension to the tuner to get correct kernel executions
2018-12-31 13:19:58 +01:00
Cedric Nugteren
b894993967
Merge pull request #343 from vbkaisetsu/feature/convgemm-single
...
Fix single kernel version of convgemm
2018-12-23 11:11:59 +01:00
Cedric Nugteren
1f41c3c50a
Merge branch 'master' into convolution-fixes-and-tuner
2018-12-22 11:40:19 +01:00
Koichi Akabe
9532f8652c
Update changelog
2018-12-21 11:08:01 +09:00
Koichi Akabe
c0883cf2fe
Update the documentation
2018-12-18 14:08:16 +09:00
Koichi Akabe
a8e6f813dd
Fix the xconvgemm tuner
2018-12-18 14:05:25 +09:00
Cedric Nugteren
1f0cd61824
Added first version of a tuner for the ConvGemm direct kernel
2018-12-18 13:59:26 +09:00
Koichi Akabe
301dc280df
Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel
2018-12-18 13:56:00 +09:00
Cedric Nugteren
9819957768
Merge pull request #342 from vbkaisetsu/fix/im2col-hf-tests
...
Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm
2018-12-17 22:39:53 +01:00
Koichi Akabe
d9db543d75
Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm
2018-12-17 21:57:35 +09:00
Cedric Nugteren
0c9411c844
Updated to version 1.5.0
2018-12-04 20:46:02 +01:00
Cedric Nugteren
09ab5f512f
Updated the roadmap document
2018-12-01 17:20:36 +01:00
Cedric Nugteren
4676ec2921
Added a FAQ document
2018-12-01 17:19:28 +01:00
Cedric Nugteren
cec021ac34
Merge pull request #341 from CNugteren/CLBlast-340-GEMMK1-issue-with-unequal-MWG-NWG
...
Fixed an issue for the GEMMK == 1 kernel
2018-12-01 17:14:47 +01:00
Cedric Nugteren
c0e41b87cb
Fixed an issue for unequal MWG and NWG and the new GEMMK == 1 kernel
2018-11-30 20:23:26 +01:00
Cedric Nugteren
bca1506e87
Merge pull request #335 from vbkaisetsu/patch-1
...
Remove unnecessary qualifier of inline function
2018-11-19 21:03:27 +01:00
Koichi Akabe
a646d6ca46
Remove unnecessary attribute of inline function
2018-11-19 13:03:50 +09:00
Cedric Nugteren
e0ddfbfa3b
Merge pull request #332 from vbkaisetsu/feature/im2col-col2im-flip
...
Add im2colflip and col2imflip functions
2018-11-17 20:51:11 +01:00
Koichi Akabe
032e3b0cc0
Add kernel_mode option to im2col, col2im, and convgemm functions
2018-11-12 10:12:07 +09:00
Cedric Nugteren
90112618da
Merge pull request #331 from CNugteren/CLBlast-270-col2im
...
Implements col2im routine
2018-11-09 08:06:13 +01:00
Cedric Nugteren
6f67525ea6
Changed col2im to append to the existing im-buffer
2018-11-07 19:45:07 +01:00
Cedric Nugteren
2d32a23293
Added new col2im routine to the documentation
2018-11-01 21:46:19 +01:00
Cedric Nugteren
469c346a8e
Fixed half-precision tests for im2col and col2im
2018-11-01 21:44:21 +01:00
Cedric Nugteren
4215bbe62a
Merge pull request #330 from vbkaisetsu/CLBlast-270-col2im
...
Add col2im function
2018-10-31 10:37:21 +01:00
Koichi Akabe
0b3d04f709
Fix col2im implementation
2018-10-30 14:54:55 +09:00
Cedric Nugteren
441373c8fd
Merge pull request #329 from tholu/patch-1
...
Update FindOpenCL.cmake
2018-10-29 20:06:01 +01:00
Thomas Lutz
17d045cc41
Update FindOpenCL.cmake
...
Add path to ROCm OpenCL as possible location in cmake script
2018-10-28 21:52:25 +01:00