Cedric Nugteren
cdcfbbc8bc
Merge pull request #398 from baryluk/patch-2
Fix --load_from_disk argument help message
2020-10-04 11:07:09 +02:00
Witold Baryluk
2dfe7c5c23
Fix --load_from_disk argument help message
2020-10-04 08:17:16 +00:00
Witold Baryluk
45fd085395
Fix Python SyntaxWarning
There is no guarantee that all empty string objects are the same object as the `""` literal.
2020-10-04 08:12:50 +00:00
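A minimal illustration of the pitfall this commit describes. It assumes the original code compared a string against `""` using `is`; the actual changed line is not shown in this log, and the helper names below are hypothetical. Python 3.8+ emits `SyntaxWarning: "is" with a literal` for such comparisons. The robust alternatives are value equality or truthiness:

```python
def is_empty(value: str) -> bool:
    # Compare by value, not identity: `value is ""` only works if the
    # interpreter happens to reuse the same empty-string object, which
    # the language does not guarantee (and 3.8+ warns about it).
    return value == ""


def is_empty_truthy(value: str) -> bool:
    # Equivalent idiomatic form for strings: the empty string is falsy.
    return not value


# A string built at runtime is equal to "" but need not be the same object.
built = "".join([])
print(is_empty(built), is_empty_truthy(built))  # prints: True True
```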
Cedric Nugteren
46fb748a96
Merge pull request #396 from CNugteren/CLBlast-395-fix-benchmark-script
Fix a Python 3 bug in the benchmark script
2020-10-03 10:50:43 +02:00
Cedric Nugteren
0abd62a0e7
Fix a Python 3 bug in the benchmark script
2020-10-02 20:32:58 +02:00
Cedric Nugteren
b4cd2b04e9
Added FUNDING.yml file
2020-08-16 10:33:47 +02:00
Cedric Nugteren
41f344d1a6
Merge pull request #392 from 9prady9/fix_Program_getIR
Fix Program::GetIR to handle programs with multiple devices
2020-06-07 19:52:49 +02:00
Pradeep Garigipati
dff65e9217
Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG
2020-06-07 21:13:33 +05:30
Pradeep Garigipati
aec71699f8
Fix Program::GetIR to handle programs with multiple devices
2020-06-05 12:00:45 +05:30
Cedric Nugteren
da0e657d39
Merge pull request #389 from CNugteren/CLBlast-385-version-defines
Added version number defines
2020-05-13 20:28:58 +02:00
Cedric Nugteren
396ac0278a
Added CLBLAST_VERSION_MAJOR/MINOR/PATCH defines in headers to store version numbering
2020-05-12 14:43:25 +02:00
Cedric Nugteren
0826bfe683
Merge pull request #388 from CNugteren/CLBlast-381-gemm-direct-tuner-failure
Fixed tuners global workgroup size
2020-05-11 22:39:48 +02:00
Cedric Nugteren
c369cf1a16
Increase display width of the local/global sizes
2020-05-11 20:26:33 +02:00
Cedric Nugteren
4a6c7c37a3
Made sure that the global workgroup size is a multiple of the local size in the tuners
2020-05-10 20:28:23 +02:00
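The rounding this commit refers to can be sketched as a ceiling division per dimension. This is a hypothetical helper, not the tuner's actual code. It relies on the fact that OpenCL 1.x requires each global work size to be divisible by the corresponding local size, otherwise `clEnqueueNDRangeKernel` fails with `CL_INVALID_WORK_GROUP_SIZE`:

```python
def round_up_global_size(global_size: int, local_size: int) -> int:
    """Round global_size up to the nearest multiple of local_size."""
    if local_size <= 0:
        raise ValueError("local_size must be positive")
    # Ceiling division, then scale back up to a whole number of work-groups.
    return ((global_size + local_size - 1) // local_size) * local_size


def round_up_ndrange(global_sizes, local_sizes):
    # Apply the rounding independently to each NDRange dimension.
    return [round_up_global_size(g, l) for g, l in zip(global_sizes, local_sizes)]


print(round_up_ndrange([1000, 33], [64, 8]))  # prints: [1024, 40]
```

Rounding up launches extra work-items, so a kernel run this way still needs its own bounds checks for indices beyond the real problem size.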
Cedric Nugteren
69a4b4d4b0
Added logging of local/global workgroup sizes when running the tuners
2020-05-10 20:08:28 +02:00
Cedric Nugteren
9abc416785
Merge pull request #386 from CNugteren/CLBlast-384-pyclblast-missing-routines
PyCLBlast: add missing batched routines
2020-05-10 18:23:41 +02:00
Cedric Nugteren
0870e76fba
Updated PyCLBlast version number
2020-05-10 14:55:03 +02:00
Cedric Nugteren
0b7ce8033c
Added a sample to demonstrate a batched routine
2020-05-10 14:54:50 +02:00
Cedric Nugteren
b94e81af10
Added pyclblast bindings for the 3 batched routines
2020-05-10 12:26:25 +02:00
Cedric Nugteren
5f4b3ffcf7
Merge pull request #383 from CNugteren/CLBlast-382-improve-tuner
Move queue creation out of the tuner loop
2020-05-04 20:26:42 +02:00
Cedric Nugteren
bbb2031bf3
Move queue creation out of the tuner loop
2020-05-03 20:30:55 +02:00
Cedric Nugteren
78300ccbea
Merge pull request #378 from CNugteren/CLBlast-377-fix-amax-amin
Change amax/amin behaviour
2020-03-15 11:34:31 +01:00
Cedric Nugteren
5f97d64505
Update API documentation
2020-03-08 11:29:47 +01:00
Cedric Nugteren
b46853660e
Made it more likely (but no guarantees) for amax/amin to return the first index
2020-03-08 11:26:49 +01:00
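Reference BLAS `i?amax` returns the first index of the element with the largest absolute value. The following is a hypothetical Python sketch, not CLBlast's OpenCL kernel, and it uses 0-based indices. It shows how a pairwise (tree) reduction, the shape a parallel reduction typically takes, can keep that rule by breaking ties toward the smaller index:

```python
def _better(a, b):
    # a and b are (abs_value, index) pairs: prefer the larger magnitude,
    # and on a tie prefer the smaller (earlier) index.
    if a[0] != b[0]:
        return a if a[0] > b[0] else b
    return a if a[1] < b[1] else b


def iamax(x):
    """Return the 0-based index of the first element with maximal |x[i]|."""
    if not x:
        return -1  # assumption for this sketch: -1 signals empty input
    # Each "work-item" starts with its own (|value|, index) candidate.
    candidates = [(abs(v), i) for i, v in enumerate(x)]
    # Combine neighbours pairwise until one candidate remains, mimicking
    # how a GPU reduction halves the number of active work-items per step.
    while len(candidates) > 1:
        nxt = [_better(candidates[j], candidates[j + 1])
               for j in range(0, len(candidates) - 1, 2)]
        if len(candidates) % 2 == 1:
            nxt.append(candidates[-1])  # odd one out passes through
        candidates = nxt
    return candidates[0][1]


print(iamax([1.0, -3.0, 3.0, 2.0]))  # prints: 1
```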
Cedric Nugteren
7fab29304c
Added sample to play around with XAMAX routine
2020-03-08 11:26:18 +01:00
Cedric Nugteren
e3ce88154a
Silenced a new OpenCL warning message
2020-03-08 10:14:59 +01:00
Cedric Nugteren
8433985051
Updated to version 1.5.1
2020-02-18 10:29:40 +01:00
Cedric Nugteren
bf4e4198b7
Merge pull request #376 from CNugteren/fix_tuner_exception_catching
Catches all exceptions of the tuners
2020-02-18 10:23:43 +01:00
Cedric Nugteren
49eb490ee1
Catches all exceptions of the tuners
2020-02-17 22:07:51 +01:00
Cedric Nugteren
8a19667e75
Merge pull request #372 from trantila/master
Reduced number of TestMatrix calls for the batched xgemm routines.
2019-12-15 09:33:53 +01:00
Tarmo Räntilä
21b66ca761
Reduce TestMatrix calls for xgemmstridedbatched.
Replace the looped test by a single one with the offset of the last batch.
2019-12-09 22:17:24 +02:00
Tarmo Räntilä
bf50c4e53e
Reduce TestMatrix calls for xgemmbatched.
Replace the looped test by a single one with the maximal found offset.
2019-12-09 22:13:52 +02:00
Cedric Nugteren
6ac74008b6
Added notion of fixes in XhadFaster
2019-09-06 19:33:30 +02:00
Cedric Nugteren
701ac9bf76
Merge pull request #368 from etomzak/master
Fix out-of-bounds read/write in XhadFaster
2019-09-06 19:30:52 +02:00
etomzak
9560193a9e
Fix out-of-bounds read/write in XhadFaster
Fix an error in XhadFaster where data would be written beyond the end of zgm.
The kernel loop assumed that there was always enough work for each thread to
process WPT items, but this was not enforced. It's possible to detect the
overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be
~500 (much larger than the normal 127).
This commit may improve the performance of XhadFaster, since the kernel was
performing 2x work in some cases (once over real data, once over garbage).
Courtesy of Codeplay Software Ltd.
2019-09-04 12:55:25 +01:00
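A hypothetical Python model of the overflow described in this commit message. It is not the actual OpenCL kernel: the real routine also applies scaling factors, and `num_threads` and `wpt` here are illustrative. Each simulated thread handles `wpt` items with a stride of the thread count. When the length is not a multiple of `num_threads * wpt`, the last iterations index past the end unless they are guarded:

```python
def had_product(x, y, z, num_threads, wpt, guarded=True):
    """Simplified element-wise product z[i] = x[i] * y[i] split over threads.

    Assumes num_threads * wpt >= len(z). Returns the sorted out-of-range
    indices an unguarded kernel would have touched (never accessed here).
    """
    n = len(z)
    out_of_bounds = []
    for tid in range(num_threads):      # one iteration per simulated work-item
        for w in range(wpt):            # each work-item handles WPT items
            i = tid + w * num_threads   # strided access across work-items
            if i >= n:
                if not guarded:
                    # An unguarded kernel would read x/y and write z here.
                    out_of_bounds.append(i)
                continue                # the guard: skip work beyond the end
            z[i] = x[i] * y[i]
    return sorted(out_of_bounds)


z = [0.0] * 10
print(had_product([1.0] * 10, [2.0] * 10, z, num_threads=4, wpt=4, guarded=False))
# prints: [10, 11, 12, 13, 14, 15]
```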
Cedric Nugteren
ec501055f9
Merge pull request #360 from CNugteren/CLBlast-359-fix-broken-iamin
Fixed a bug in the absolute-min index kernel
2019-05-19 22:39:26 +02:00
Cedric Nugteren
3f9d7bca22
Fixed a bug in the absolute-min index kernel
2019-05-19 14:00:18 +02:00
Cedric Nugteren
500d19be4c
Merge pull request #357 from CNugteren/CLBlast-355-intel-shuffle-extension-fix
Intel shuffle extension fix
2019-05-16 20:12:32 +02:00
Cedric Nugteren
af6a9eedd1
Added a function to set the OpenCL kernel standard, either 1.1 or 1.2
2019-05-11 20:39:00 +02:00
Cedric Nugteren
9cbffc9b7c
Changed back to cl_intel_subgroups as suggested
2019-05-08 22:01:56 +02:00
Cedric Nugteren
c5a82f6978
Added a host-code check to make sure the avc_motion_estimation extension is available
2019-05-07 20:47:50 +02:00
Cedric Nugteren
c6ba86cdc3
Enabled avc_motion_estimation extension for Intel subgroup shuffling
2019-05-07 20:47:31 +02:00
Cedric Nugteren
774cebaa40
Merge pull request #356 from umar456/osx_assert
Remove assert for extension not available in macOS
2019-05-06 09:58:36 +02:00
Umar Arshad
cf4907942c
Remove assert for extension not available in macOS
The cl_nv_device_attribute_query extension is not available on the
Apple platform. This caused runtime failures in debug builds.
2019-05-03 23:28:07 -04:00
Cedric Nugteren
7084311e45
Added tuning parameters for Tesla P100 16GB
2019-02-09 16:31:48 +01:00
Cedric Nugteren
1035e533cd
Added tuning parameters for Xeon E5-2630 v3 and v4
2019-02-09 16:29:30 +01:00
Cedric Nugteren
eff0f9ad1d
Merge pull request #348 from CNugteren/CLBlast-334-pyclblast-half-precision-support
PyCLBlast half precision support
2019-01-26 11:04:14 +01:00
Cedric Nugteren
e0541c41a1
Added fp32 to fp16 conversion function in Python to make haxpy example work
2019-01-23 19:52:01 +01:00
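This log does not show the conversion function PyCLBlast added. What follows is a standalone sketch of one way to do an fp32 to fp16 conversion with only the standard library; both function names are hypothetical. It uses the `struct` module's `'e'` format (IEEE 754 binary16, Python 3.6+), which rounds to the nearest half value and raises `OverflowError` for values too large for fp16. The result is returned as the raw 16-bit pattern, a common storage format when a host API has no native half type:

```python
import struct


def float32_to_float16_bits(value: float) -> int:
    """Convert a float to the raw 16-bit pattern of the nearest IEEE half."""
    # Pack as little-endian binary16, then reinterpret the 2 bytes as uint16.
    return struct.unpack("<H", struct.pack("<e", value))[0]


def float16_bits_to_float(bits: int) -> float:
    # Inverse conversion, handy for checking results on the host.
    return struct.unpack("<e", struct.pack("<H", bits))[0]


print(hex(float32_to_float16_bits(1.0)))  # prints: 0x3c00
print(float16_bits_to_float(0x3C00))      # prints: 1.0
```

With NumPy available, `array.astype(numpy.float16)` performs the same rounding on whole arrays.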
Cedric Nugteren
347f0df32f
Added a (non-working) sample of half precision AXPY in Python
2019-01-22 21:14:43 +01:00
Cedric Nugteren
23b9f655fa
Updated pyclblast README, updated to 1.2.0 for half-precision support
2019-01-22 21:14:02 +01:00