Commit graph

1329 commits

Author SHA1 Message Date
Cedric Nugteren 8a19667e75
Merge pull request #372 from trantila/master
Reduced number of TestMatrix calls for the batched xgemm routines.
2019-12-15 09:33:53 +01:00
Tarmo Räntilä 21b66ca761 Reduce TestMatrix calls for xgemmstridedbatched.
Replace the looped test by a single one with the offset of the last batch.
2019-12-09 22:17:24 +02:00
Tarmo Räntilä bf50c4e53e Reduce TestMatrix calls for xgemmbatched.
Replace the looped test by a single one with the maximal found offset.
2019-12-09 22:13:52 +02:00
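The idea in the two commits above can be sketched as follows. This is a minimal illustration, not CLBlast's code: `fits`, `check_looped`, `check_single`, and `last_strided_offset` are hypothetical names, and the bounds formula (column-major layout, one shared size and leading dimension for every batch, at least one column, non-negative stride) is an assumption. Under those assumptions the required buffer size grows with the offset, so testing only the largest offset gives the same verdict as testing every batch.

```python
def fits(rows, cols, ld, offset, buffer_size):
    """Hypothetical bounds check: does a rows x cols matrix with leading
    dimension ld, starting at `offset`, fit in a buffer of buffer_size elements?"""
    if ld < rows or cols < 1:
        return False
    required = offset + ld * (cols - 1) + rows
    return required <= buffer_size

def check_looped(rows, cols, ld, offsets, buffer_size):
    # Original approach: one check per batch.
    return all(fits(rows, cols, ld, off, buffer_size) for off in offsets)

def check_single(rows, cols, ld, offsets, buffer_size):
    # Reduced approach: only the maximal offset can overflow first.
    return fits(rows, cols, ld, max(offsets), buffer_size)

def last_strided_offset(offset, stride, batch_count):
    # Strided-batched case: with a non-negative stride, the last batch
    # has the largest offset, so it is the only one that needs checking.
    return offset + stride * (batch_count - 1)
```

For the strided-batched routine the list of offsets is not needed at all; passing `last_strided_offset(offset, stride, batch_count)` to `fits` is enough under the same assumptions.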
Cedric Nugteren 6ac74008b6
Added notion of fixes in XhadFaster 2019-09-06 19:33:30 +02:00
Cedric Nugteren 701ac9bf76
Merge pull request #368 from etomzak/master
Fix out-of-bounds read/write in XhadFaster
2019-09-06 19:30:52 +02:00
etomzak 9560193a9e Fix out-of-bounds read/write in XhadFaster
Fix an error in XhadFaster where data would be written beyond the end of zgm.
The kernel loop assumed that there was always enough work for each thread to
process WPT items, but this was not enforced. It's possible to detect the
overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be
~500 (much larger than the normal 127).

This commit may improve the performance of XhadFaster, since the kernel was
performing 2x work in some cases (once over real data, once over garbage).

Courtesy of Codeplay Software Ltd.
2019-09-04 12:55:25 +01:00
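The failure mode described in this commit can be simulated on the host. The sketch below is not the actual XhadFaster kernel: the indexing scheme (`w * num_threads + tid`), the function name, and the sentinel value are assumptions chosen for illustration. It shows how a loop that always processes WPT items per thread writes into the canary region when there is not enough work, and how a bounds guard prevents that.

```python
CANARY = -1.0  # sentinel marking the padding region after the real data

def had_simulated(x, y, n, num_threads, wpt, guarded):
    """Element-wise product z = x * y over the first n elements, simulating
    num_threads threads that each process wpt items (hypothetical indexing)."""
    z = [0.0] * n + [CANARY] * (len(x) - n)
    for tid in range(num_threads):
        for w in range(wpt):
            idx = w * num_threads + tid
            if guarded and idx >= n:
                continue  # not enough work left for this thread: skip
            z[idx] = x[idx] * y[idx]
    return z

# 4 threads x 2 items = 8 writes, but only 6 real elements.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0] * 8
unguarded = had_simulated(x, y, n=6, num_threads=4, wpt=2, guarded=False)
guarded = had_simulated(x, y, n=6, num_threads=4, wpt=2, guarded=True)
```

In the unguarded run the last two entries of the result no longer hold the sentinel, which is the kind of overwrite the canary regions are meant to expose.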
Cedric Nugteren ec501055f9
Merge pull request #360 from CNugteren/CLBlast-359-fix-broken-iamin
Fixed a bug in the absolute-min index kernel
2019-05-19 22:39:26 +02:00
Cedric Nugteren 3f9d7bca22 Fixed a bug in the absolute-min index kernel 2019-05-19 14:00:18 +02:00
Cedric Nugteren 500d19be4c
Merge pull request #357 from CNugteren/CLBlast-355-intel-shuffle-extension-fix
Intel shuffle extension fix
2019-05-16 20:12:32 +02:00
Cedric Nugteren af6a9eedd1 Added a function to set the OpenCL kernel standard, either 1.1 or 1.2 2019-05-11 20:39:00 +02:00
Cedric Nugteren 9cbffc9b7c Changed back to cl_intel_subgroups as suggested 2019-05-08 22:01:56 +02:00
Cedric Nugteren c5a82f6978 Added a host-code check to make sure the avc_motion_estimation is available 2019-05-07 20:47:50 +02:00
Cedric Nugteren c6ba86cdc3 Enabled avc_motion_estimation extension for Intel subgroup shuffling 2019-05-07 20:47:31 +02:00
Cedric Nugteren 774cebaa40
Merge pull request #356 from umar456/osx_assert
Remove assert for extension not available in macOS
2019-05-06 09:58:36 +02:00
Umar Arshad cf4907942c Remove assert for extension not available in macOS
The cl_nv_device_attribute_query extension is not available on the
Apple platform. This caused runtime failures in debug builds.
2019-05-03 23:28:07 -04:00
Cedric Nugteren 7084311e45 Added tuning parameters for Tesla P100 16GB 2019-02-09 16:31:48 +01:00
Cedric Nugteren 1035e533cd Added tuning parameters for Xeon E5-2630 v3 and v4 2019-02-09 16:29:30 +01:00
Cedric Nugteren eff0f9ad1d
Merge pull request #348 from CNugteren/CLBlast-334-pyclblast-half-precision-support
PyCLBlast half precision support
2019-01-26 11:04:14 +01:00
Cedric Nugteren e0541c41a1 Added fp32 to fp16 conversion function in Python to make haxpy example work 2019-01-23 19:52:01 +01:00
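A conversion of this kind can be sketched with the Python standard library alone. This is an illustration, not the function added in the commit: the names `float_to_half_bits` and `half_bits_to_float` are hypothetical, and the sketch relies on the `e` (IEEE 754 binary16) format code of the `struct` module.

```python
import struct

def float_to_half_bits(value):
    """Return the 16-bit IEEE 754 half-precision bit pattern of `value`.
    struct raises OverflowError if the value is too large for fp16."""
    return struct.unpack("<H", struct.pack("<e", value))[0]

def half_bits_to_float(bits):
    """Inverse direction: interpret a 16-bit pattern as a half-precision float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]
```

Passing the scalar as a 16-bit integer is one way to hand a half-precision value to an interface that has no native fp16 type; whether pyclblast does it this way is not stated in the commit.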
Cedric Nugteren 347f0df32f Added a (non-working) sample of half precision AXPY in Python 2019-01-22 21:14:43 +01:00
Cedric Nugteren 23b9f655fa Updated pyclblast README, updated to 1.2.0 for half-precision support 2019-01-22 21:14:02 +01:00
Cedric Nugteren 3937efdcda Added experimental support for half-precision in pyclblast 2019-01-22 21:13:41 +01:00
Cedric Nugteren 9a9c24e811
Merge pull request #345 from CNugteren/convolution-fixes-and-tuner
Convolution with single kernel
2019-01-19 17:56:05 +01:00
Cedric Nugteren 11f4c7dd93 Added documentation on the convgemm routine 2019-01-19 15:44:19 +01:00
Cedric Nugteren c42e48068b Added a few more initial Intel tuning parameters for convgemm 2019-01-19 15:32:35 +01:00
Cedric Nugteren afcf5dc6eb Added a check to prevent the stride of matrix C being set to 0 for the strided-batched-GEMM routine 2019-01-05 10:56:35 +01:00
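The commit does not state why a zero stride for C is rejected. A plausible reason, shown here as an assumption, is that every batch would then write its result to the same location. The function name and the error type in this sketch are hypothetical.

```python
def c_batch_offsets(c_offset, c_stride, batch_count):
    """Return the start offset of each batch's output matrix C.
    Rejects a zero stride, since all batches would then overlap."""
    if c_stride == 0:
        raise ValueError("stride of matrix C must not be 0")
    return [c_offset + b * c_stride for b in range(batch_count)]
```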
Cedric Nugteren 560f7a40f6 Added convgemm to the CLBlast database, added initial parameters for Skylake GPU 2018-12-31 19:05:34 +01:00
Cedric Nugteren d929525039 Added support for the convgemm tuner in the tuner database 2018-12-31 18:49:12 +01:00
Cedric Nugteren 153ac06cf2 Added the forgotten batch dimension to the tuner to get correct kernel executions 2018-12-31 13:19:58 +01:00
Cedric Nugteren b894993967
Merge pull request #343 from vbkaisetsu/feature/convgemm-single
Fix single kernel version of convgemm
2018-12-23 11:11:59 +01:00
Cedric Nugteren 1f41c3c50a Merge branch 'master' into convolution-fixes-and-tuner 2018-12-22 11:40:19 +01:00
Koichi Akabe 9532f8652c Update changelog 2018-12-21 11:08:01 +09:00
Koichi Akabe c0883cf2fe Update the documentation 2018-12-18 14:08:16 +09:00
Koichi Akabe a8e6f813dd Fix the xconvgemm tuner 2018-12-18 14:05:25 +09:00
Cedric Nugteren 1f0cd61824 Added first version of a tuner for the ConvGemm direct kernel 2018-12-18 13:59:26 +09:00
Koichi Akabe 301dc280df Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel 2018-12-18 13:56:00 +09:00
Cedric Nugteren 9819957768
Merge pull request #342 from vbkaisetsu/fix/im2col-hf-tests
Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm
2018-12-17 22:39:53 +01:00
Koichi Akabe d9db543d75 Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm 2018-12-17 21:57:35 +09:00
Cedric Nugteren 0c9411c844 Updated to version 1.5.0 2018-12-04 20:46:02 +01:00
Cedric Nugteren 09ab5f512f Updated the roadmap document 2018-12-01 17:20:36 +01:00
Cedric Nugteren 4676ec2921 Added a FAQ document 2018-12-01 17:19:28 +01:00
Cedric Nugteren cec021ac34
Merge pull request #341 from CNugteren/CLBlast-340-GEMMK1-issue-with-unequal-MWG-NWG
Fixed an issue for the GEMMK == 1 kernel
2018-12-01 17:14:47 +01:00
Cedric Nugteren c0e41b87cb Fixed an issue for unequal MWG and NWG and the new GEMMK == 1 kernel 2018-11-30 20:23:26 +01:00
Cedric Nugteren bca1506e87
Merge pull request #335 from vbkaisetsu/patch-1
Remove unnecessary qualifier of inline function
2018-11-19 21:03:27 +01:00
Koichi Akabe a646d6ca46
Remove unnecessary attribute of inline function 2018-11-19 13:03:50 +09:00
Cedric Nugteren e0ddfbfa3b
Merge pull request #332 from vbkaisetsu/feature/im2col-col2im-flip
Add im2colflip and col2imflip functions
2018-11-17 20:51:11 +01:00
Koichi Akabe 032e3b0cc0 Add kernel_mode option to im2col, col2im, and convgemm functions 2018-11-12 10:12:07 +09:00
Cedric Nugteren 90112618da
Merge pull request #331 from CNugteren/CLBlast-270-col2im
Implements col2im routine
2018-11-09 08:06:13 +01:00
Cedric Nugteren 6f67525ea6 Changed col2im to append to the existing im-buffer 2018-11-07 19:45:07 +01:00
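What "append to the existing im-buffer" means can be illustrated with a reference-style col2im that accumulates into the image instead of overwriting it. This is a simplified sketch, not the CLBlast kernel: it assumes a single channel, no dilation, row-major buffers, and a column layout of (kernel_h * kernel_w) rows by (out_h * out_w) columns; the function name is hypothetical.

```python
def col2im_accumulate(col, im, height, width, kernel_h, kernel_w, pad=0, stride=1):
    """Add the contents of `col` onto the existing values in `im` (in place)."""
    out_h = (height + 2 * pad - kernel_h) // stride + 1
    out_w = (width + 2 * pad - kernel_w) // stride + 1
    for kh in range(kernel_h):
        for kw in range(kernel_w):
            row = kh * kernel_w + kw
            for oh in range(out_h):
                for ow in range(out_w):
                    h = oh * stride - pad + kh
                    w = ow * stride - pad + kw
                    # Positions that fall in the padding have no image pixel.
                    if 0 <= h < height and 0 <= w < width:
                        im[h * width + w] += col[row * out_h * out_w + oh * out_w + ow]
    return im

# A 3x3 image that already holds the value 10 everywhere, 2x2 kernel.
image = [10.0] * 9
columns = [1.0] * (4 * 4)  # 4 kernel positions x 4 output positions
result = col2im_accumulate(columns, image, 3, 3, 2, 2)
```

Because the routine accumulates, a caller that wants plain col2im semantics in this sketch would zero the image buffer first.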
Cedric Nugteren 2d32a23293 Added new col2im routine to the documentation 2018-11-01 21:46:19 +01:00