Cedric Nugteren
4a6c7c37a3
Made sure that the global workgroup size is a multiple of the local size in the tuners
2020-05-10 20:28:23 +02:00
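In OpenCL 1.x, `clEnqueueNDRangeKernel` rejects a local work size that does not evenly divide the global work size. The commit does not say how the tuners enforce this. A common approach, sketched here under that assumption, is to round each global dimension up to the next multiple of the local size. The helper names are hypothetical and are not CLBlast functions.

```python
def round_up_global_size(global_size, local_size):
    """Return the smallest multiple of local_size that is >= global_size.

    Hypothetical helper: one common way to make an OpenCL global work size
    a multiple of the local work size.
    """
    if local_size <= 0:
        raise ValueError("local_size must be positive")
    return ((global_size + local_size - 1) // local_size) * local_size


def round_up_ndrange(global_sizes, local_sizes):
    # Apply the rounding separately to each dimension of the NDRange.
    return [round_up_global_size(g, l) for g, l in zip(global_sizes, local_sizes)]


print(round_up_ndrange([1000, 30], [64, 8]))  # [1024, 32]
```

Rounding up adds padding work-items, so a kernel launched this way typically needs a bounds check such as `if (get_global_id(0) < n)` before it touches memory.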
Cedric Nugteren
69a4b4d4b0
Added logging of local/global workgroup sizes when running the tuners
2020-05-10 20:08:28 +02:00
Cedric Nugteren
9abc416785
Merge pull request #386 from CNugteren/CLBlast-384-pyclblast-missing-routines
PyCLBlast: add missing batched routines
2020-05-10 18:23:41 +02:00
Cedric Nugteren
0870e76fba
Updated PyCLBlast version number
2020-05-10 14:55:03 +02:00
Cedric Nugteren
0b7ce8033c
Added a sample to demonstrate a batched routine
2020-05-10 14:54:50 +02:00
Cedric Nugteren
b94e81af10
Added pyclblast bindings for the 3 batched routines
2020-05-10 12:26:25 +02:00
Cedric Nugteren
5f4b3ffcf7
Merge pull request #383 from CNugteren/CLBlast-382-improve-tuner
Move queue creation out of the tuner loop
2020-05-04 20:26:42 +02:00
Cedric Nugteren
bbb2031bf3
Move queue creation out of the tuner loop
2020-05-03 20:30:55 +02:00
Cedric Nugteren
78300ccbea
Merge pull request #378 from CNugteren/CLBlast-377-fix-amax-amin
Change amax/amin behaviour
2020-03-15 11:34:31 +01:00
Cedric Nugteren
5f97d64505
Update API documentation
2020-03-08 11:29:47 +01:00
Cedric Nugteren
b46853660e
Made it more likely (but no guarantees) for amax/amin to return the first index
2020-03-08 11:26:49 +01:00
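The BLAS convention for amax/amin is to return the first index when several elements tie. In a parallel reduction the final answer depends on how partial results are combined. The sketch below illustrates one general technique: a combine step that prefers the lower index on ties, simulated as a pairwise tree reduction. It is not CLBlast's kernel, which per the commit above only makes the first index more likely. Indices here are 0-based, and the function names are hypothetical.

```python
def combine(a, b):
    """Combine two (abs_value, index) candidates, preferring the lower index on ties."""
    if a[0] > b[0] or (a[0] == b[0] and a[1] < b[1]):
        return a
    return b


def iamax_tree(x):
    """0-based index of the maximum absolute value via a pairwise (tree) reduction.

    Illustration only: mimics how a parallel reduction merges partial
    results, not CLBlast's actual kernel.
    """
    if not x:
        raise ValueError("x must be non-empty")
    candidates = [(abs(v), i) for i, v in enumerate(x)]
    while len(candidates) > 1:
        paired = []
        for j in range(0, len(candidates) - 1, 2):
            paired.append(combine(candidates[j], candidates[j + 1]))
        if len(candidates) % 2 == 1:
            paired.append(candidates[-1])  # odd element carries over unchanged
        candidates = paired
    return candidates[0][1]


print(iamax_tree([1.0, -3.0, 2.0, 3.0]))  # 1
```

Because the tie-break makes `combine` independent of argument order, the result is the first index however the pairs are grouped. A combine that keeps whichever candidate arrives first would not have this property.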
Cedric Nugteren
7fab29304c
Added sample to play around with XAMAX routine
2020-03-08 11:26:18 +01:00
Cedric Nugteren
e3ce88154a
Silenced a new OpenCL warning message
2020-03-08 10:14:59 +01:00
Cedric Nugteren
8433985051
Updated to version 1.5.1
2020-02-18 10:29:40 +01:00
Cedric Nugteren
bf4e4198b7
Merge pull request #376 from CNugteren/fix_tuner_exception_catching
Catches all exceptions of the tuners
2020-02-18 10:23:43 +01:00
Cedric Nugteren
49eb490ee1
Catches all exceptions of the tuners
2020-02-17 22:07:51 +01:00
Cedric Nugteren
8a19667e75
Merge pull request #372 from trantila/master
Reduced number of TestMatrix calls for the batched xgemm routines.
2019-12-15 09:33:53 +01:00
Tarmo Räntilä
21b66ca761
Reduce TestMatrix calls for xgemmstridedbatched.
Replace the looped test by a single one with the offset of the last batch.
2019-12-09 22:17:24 +02:00
Tarmo Räntilä
bf50c4e53e
Reduce TestMatrix calls for xgemmbatched.
Replace the looped test by a single one with the maximal found offset.
2019-12-09 22:13:52 +02:00
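The two commits above replace a per-batch loop of buffer checks with a single check at the batch that starts furthest into the buffer. This assumes every batch has the same matrix dimensions, so that batch is the one most likely to overflow. A minimal sketch of that reasoning follows. The function names and the conservative `ld * cols` size bound are illustrative assumptions, not the test suite's `TestMatrix` code.

```python
def max_batched_offset(offsets):
    """Batched layout: every batch has its own offset, so the largest one
    is the batch that reaches furthest into the buffer."""
    return max(offsets)


def last_strided_offset(offset, stride, batch_count):
    """Strided-batched layout: batch b starts at offset + b * stride, so the
    last batch (b = batch_count - 1) starts furthest in (for stride >= 0)."""
    return offset + (batch_count - 1) * stride


def required_buffer_size(max_offset, ld, cols):
    # Hypothetical, conservative bound: a column-major matrix with leading
    # dimension ld and `cols` columns fits in ld * cols elements.
    return max_offset + ld * cols


# One size check at the furthest batch replaces a check per batch.
print(required_buffer_size(max_batched_offset([0, 128, 64]), 8, 8))  # 192
print(required_buffer_size(last_strided_offset(10, 100, 4), 8, 8))   # 374
```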
Cedric Nugteren
6ac74008b6
Added notion of fixes in XhadFaster
2019-09-06 19:33:30 +02:00
Cedric Nugteren
701ac9bf76
Merge pull request #368 from etomzak/master
Fix out-of-bounds read/write in XhadFaster
2019-09-06 19:30:52 +02:00
etomzak
9560193a9e
Fix out-of-bounds read/write in XhadFaster
Fix an error in XhadFaster where data would be written beyond the end of zgm.
The kernel loop assumed that there was always enough work for each thread to
process WPT items, but this was not enforced. It's possible to detect the
overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be
~500 (much larger than the normal 127).
This commit may improve the performance of XhadFaster, since the kernel was
performing 2x work in some cases (once over real data, once over garbage).
Courtesy of Codeplay Software Ltd.
2019-09-04 12:55:25 +01:00
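The XhadFaster fix concerns work-items that each process WPT elements when the vector length is not a multiple of the total work. The Python simulation below illustrates the general guard pattern. It assumes the element-wise update z = alpha * x * y + beta * z and a strided index layout, and it uses a hypothetical function name. It is not the actual OpenCL kernel.

```python
def hadamard_simulated(alpha, x, y, beta, z, num_threads, wpt):
    """Sequentially simulate a kernel where each of `num_threads` work-items
    handles up to `wpt` elements of z = alpha * x * y + beta * z."""
    n = len(z)
    if num_threads * wpt < n:
        raise ValueError("not enough work-items to cover the vector")
    out = list(z)
    for tid in range(num_threads):
        for w in range(wpt):
            i = w * num_threads + tid  # hypothetical strided index layout
            # Without this guard, work-items with leftover slots would index
            # past the end of z (an IndexError here, out-of-bounds on a GPU).
            if i < n:
                out[i] = alpha * x[i] * y[i] + beta * z[i]
    return out


# 2 work-items x 4 elements each = 8 slots for a length-5 vector.
print(hadamard_simulated(2.0, [1, 2, 3, 4, 5], [1] * 5, 0.0, [0.0] * 5, 2, 4))
# [2.0, 4.0, 6.0, 8.0, 10.0]
```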
Cedric Nugteren
ec501055f9
Merge pull request #360 from CNugteren/CLBlast-359-fix-broken-iamin
Fixed a bug in the absolute-min index kernel
2019-05-19 22:39:26 +02:00
Cedric Nugteren
3f9d7bca22
Fixed a bug in the absolute-min index kernel
2019-05-19 14:00:18 +02:00
Cedric Nugteren
500d19be4c
Merge pull request #357 from CNugteren/CLBlast-355-intel-shuffle-extension-fix
intel shuffle extension fix
2019-05-16 20:12:32 +02:00
Cedric Nugteren
af6a9eedd1
Added a function to set the OpenCL kernel standard, either 1.1 or 1.2
2019-05-11 20:39:00 +02:00
Cedric Nugteren
9cbffc9b7c
Changed back to cl_intel_subgroups as suggested
2019-05-08 22:01:56 +02:00
Cedric Nugteren
c5a82f6978
Added a host-code check to make sure the avc_motion_estimation is available
2019-05-07 20:47:50 +02:00
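A host-side extension check usually inspects the device's extension string, a space-separated list returned for `CL_DEVICE_EXTENSIONS` by `clGetDeviceInfo`. The sketch below assumes that string has already been fetched, and the helper name is hypothetical. It shows why matching whole tokens is safer than a substring search.

```python
def has_extension(extensions, name):
    """Return True if `name` is one of the space-separated tokens in an
    OpenCL extension string (the value reported for CL_DEVICE_EXTENSIONS)."""
    return name in extensions.split()


device_extensions = "cl_khr_fp64 cl_intel_subgroups_short"
print(has_extension(device_extensions, "cl_intel_subgroups"))  # False
print("cl_intel_subgroups" in device_extensions)               # True (substring false positive)
```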
Cedric Nugteren
c6ba86cdc3
Enabled avc_motion_estimation extension for Intel subgroup shuffling
2019-05-07 20:47:31 +02:00
Cedric Nugteren
774cebaa40
Merge pull request #356 from umar456/osx_assert
Remove assert for extension not available in macOS
2019-05-06 09:58:36 +02:00
Umar Arshad
cf4907942c
Remove assert for extension not available in macOS
The cl_nv_device_attribute_query extension is not available on the
Apple platform. This caused runtime failures in debug builds.
2019-05-03 23:28:07 -04:00
Cedric Nugteren
7084311e45
Added tuning parameters for Tesla P100 16GB
2019-02-09 16:31:48 +01:00
Cedric Nugteren
1035e533cd
Added tuning parameters for Xeon E5-2630 v3 and v4
2019-02-09 16:29:30 +01:00
Cedric Nugteren
eff0f9ad1d
Merge pull request #348 from CNugteren/CLBlast-334-pyclblast-half-precision-support
PyCLBlast half precision support
2019-01-26 11:04:14 +01:00
Cedric Nugteren
e0541c41a1
Added fp32 to fp16 conversion function in Python to make haxpy example work
2019-01-23 19:52:01 +01:00
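OpenCL host code passes half-precision data as raw 16-bit patterns (`cl_half` is an unsigned 16-bit type). The commit does not show its conversion function. One standard-library way to do the conversion is Python's `struct` module, whose `'e'` format (Python 3.6+) packs IEEE 754 half precision. The helper names below are hypothetical.

```python
import struct


def float_to_half_bits(value):
    """Convert a Python float to the 16-bit pattern of the nearest IEEE 754
    half-precision value. Raises OverflowError if it is out of fp16 range."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def half_bits_to_float(bits):
    """Inverse: interpret a 16-bit pattern as an fp16 value and widen it."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


print(hex(float_to_half_bits(1.0)))   # 0x3c00
print(hex(float_to_half_bits(-2.0)))  # 0xc000
```

NumPy's `float16` dtype is a common alternative when whole arrays need converting.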
Cedric Nugteren
347f0df32f
Added a (non-working) sample of half precision AXPY in Python
2019-01-22 21:14:43 +01:00
Cedric Nugteren
23b9f655fa
Updated pyclblast README, updated to 1.2.0 for half-precision support
2019-01-22 21:14:02 +01:00
Cedric Nugteren
3937efdcda
Added experimental support for half-precision in pyclblast
2019-01-22 21:13:41 +01:00
Cedric Nugteren
9a9c24e811
Merge pull request #345 from CNugteren/convolution-fixes-and-tuner
Convolution with single kernel
2019-01-19 17:56:05 +01:00
Cedric Nugteren
11f4c7dd93
Added documentation on the convgemm routine
2019-01-19 15:44:19 +01:00
Cedric Nugteren
c42e48068b
Added a few more initial Intel tuning parameters for convgemm
2019-01-19 15:32:35 +01:00
Cedric Nugteren
afcf5dc6eb
Added a check to prevent the stride of matrix C being set to 0 for the strided-batched-GEMM routine
2019-01-05 10:56:35 +01:00
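In a strided-batched layout, each batch's matrix starts at `offset + b * stride`. A zero stride for an input can be a legitimate way to reuse one matrix across all batches. For the output C, however, every batch would write the same memory region. The sketch below illustrates this and a check in the spirit of the commit. Both function names and the error message are hypothetical.

```python
def strided_batch_offsets(offset, stride, batch_count):
    """Start offset of each batch's matrix in a strided-batched layout."""
    return [offset + b * stride for b in range(batch_count)]


def check_c_stride(c_stride):
    """Hypothetical validation: a zero stride for the output matrix C would
    make every batch write the same memory region."""
    if c_stride == 0:
        raise ValueError("c_stride must be non-zero")


print(strided_batch_offsets(0, 0, 3))   # [0, 0, 0]: all batches alias
print(strided_batch_offsets(0, 64, 3))  # [0, 64, 128]
```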
Cedric Nugteren
560f7a40f6
Added convgemm to the CLBlast database, added initial parameters for Skylake GPU
2018-12-31 19:05:34 +01:00
Cedric Nugteren
d929525039
Added support for the convgemm tuner in the tuner database
2018-12-31 18:49:12 +01:00
Cedric Nugteren
153ac06cf2
Added the forgotten batch dimension to the tuner to get correct kernel executions
2018-12-31 13:19:58 +01:00
Cedric Nugteren
b894993967
Merge pull request #343 from vbkaisetsu/feature/convgemm-single
Fix single kernel version of convgemm
2018-12-23 11:11:59 +01:00
Cedric Nugteren
1f41c3c50a
Merge branch 'master' into convolution-fixes-and-tuner
2018-12-22 11:40:19 +01:00
Koichi Akabe
9532f8652c
Update changelog
2018-12-21 11:08:01 +09:00
Koichi Akabe
c0883cf2fe
Update the documentation
2018-12-18 14:08:16 +09:00
Koichi Akabe
a8e6f813dd
Fix the xconvgemm tuner
2018-12-18 14:05:25 +09:00