Jerry James
dc82a1fbc8
Use reference types to prevent unnecessary copying
2021-01-20 10:21:36 -07:00
Cedric Nugteren
70016e8698
Updated to version 1.5.2
2021-01-19 21:19:12 +01:00
Cedric Nugteren
0ee39af5ed
Add tuning results for TITAN RTX
2020-10-10 13:01:12 +02:00
Cedric Nugteren
481d86665f
Add tuning results for Radeon RX Vega
2020-10-10 12:56:28 +02:00
Cedric Nugteren
e6e2519eaa
Merge pull request #400 from baryluk/patch-6
...
Allow single graph / subplot on plot
2020-10-05 21:19:34 +02:00
Witold Baryluk
ea199c3469
Allow single graph / subplot on plot
...
`plt.subplots` tries to be special and returns an array or a non-array depending on the number of subplots.
This is not actually helpful and is, IMHO, bad design.
Make it always an `ndarray`.
The `and not type(axes) is np.ndarray` check is just in case matplotlib decides to make its behavior more uniform. For now, work around it.
Also, no need for `ndarray.flat` really.
Confirmed to work with existing benchmarks (i.e. rows=2, cols=3), and with single graphs (rows=1, cols=1).
2020-10-05 12:11:17 +00:00
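The normalization described in the commit above can be sketched roughly as follows. This is an illustrative sketch, not the code from the benchmark script: the helper name `as_axes_array` is hypothetical, and it relies only on NumPy so that it can run without matplotlib.

```python
import numpy as np


def as_axes_array(axes):
    """Return `axes` as a NumPy array, wrapping a single object if needed.

    Hypothetical helper: `axes` stands for whatever `plt.subplots` returned
    as its second value (one Axes object, or an ndarray of them).
    """
    if isinstance(axes, np.ndarray):
        # Already an array (the multi-subplot case): leave it untouched.
        return axes
    # Single-subplot case: wrap the lone object in a 1-element object array.
    wrapped = np.empty(1, dtype=object)
    wrapped[0] = axes
    return wrapped
```

matplotlib also accepts `squeeze=False` in `plt.subplots`, which always returns a 2D array of axes; whether that would suit the benchmark script is not stated in the commit.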
Cedric Nugteren
3462d7fa85
Merge pull request #399 from baryluk/patch-3
...
Fix a typo in benchmark when running fp 16 vs 32
2020-10-04 16:43:21 +02:00
Witold Baryluk
eb967a0943
Fix a typo in benchmark when running fp 16 vs 32
...
The intention here was to limit the iteration range to common indexes only.
Fix that.
2020-10-04 10:22:00 +00:00
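Limiting an iteration range to the indexes that two result lists have in common might look like the sketch below. The function name and the sample lists are hypothetical and are not taken from the benchmark script.

```python
def common_index_range(first, second):
    """Return the range of indexes that are valid for both sequences."""
    return range(min(len(first), len(second)))


# Hypothetical sample data: the fp32 run produced fewer results than fp16.
fp16_results = [1.0, 2.0, 3.0]
fp32_results = [1.5, 2.5]

# Only indexes 0 and 1 exist in both lists, so only those are paired up.
paired = [(fp16_results[i], fp32_results[i])
          for i in common_index_range(fp16_results, fp32_results)]
```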
Cedric Nugteren
615e5f0ff2
Merge pull request #397 from baryluk/patch-1
...
Fix Python SyntaxWarning
2020-10-04 11:07:56 +02:00
Cedric Nugteren
cdcfbbc8bc
Merge pull request #398 from baryluk/patch-2
...
Fix --load_from_disk argument help message
2020-10-04 11:07:09 +02:00
Witold Baryluk
2dfe7c5c23
Fix --load_from_disk argument help message
2020-10-04 08:17:16 +00:00
Witold Baryluk
45fd085395
Fix Python SyntaxWarning
...
There is no guarantee that all empty string objects are the same object or share an object with the `""` literal.
2020-10-04 08:12:50 +00:00
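The point of the commit above is that identity (`is`) and equality (`==`) are different checks; since Python 3.8, using `is` with a literal triggers a `SyntaxWarning`. A minimal sketch of the equality-based form, with a hypothetical helper name:

```python
def is_empty(text):
    """Return True if `text` is an empty string, compared by value."""
    # `==` compares contents; `is` would compare object identity, which
    # the language does not guarantee for equal strings.
    return text == ""
```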
Cedric Nugteren
46fb748a96
Merge pull request #396 from CNugteren/CLBlast-395-fix-benchmark-script
...
Fix a Python 3 bug in the benchmark script
2020-10-03 10:50:43 +02:00
Cedric Nugteren
0abd62a0e7
Fix a Python 3 bug in the benchmark script
2020-10-02 20:32:58 +02:00
Cedric Nugteren
b4cd2b04e9
Added FUNDING.yml file
2020-08-16 10:33:47 +02:00
Cedric Nugteren
41f344d1a6
Merge pull request #392 from 9prady9/fix_Program_getIR
...
Fix Program::GetIR to handle programs with multiple devices
2020-06-07 19:52:49 +02:00
Pradeep Garigipati
dff65e9217
Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG
2020-06-07 21:13:33 +05:30
Pradeep Garigipati
aec71699f8
Fix Program::GetIR to handle programs with multiple devices
2020-06-05 12:00:45 +05:30
Cedric Nugteren
da0e657d39
Merge pull request #389 from CNugteren/CLBlast-385-version-defines
...
Added version number defines
2020-05-13 20:28:58 +02:00
Cedric Nugteren
396ac0278a
Added CLBLAST_VERSION_MAJOR/MINOR/PATCH defines in headers to store version numbering
2020-05-12 14:43:25 +02:00
Cedric Nugteren
0826bfe683
Merge pull request #388 from CNugteren/CLBlast-381-gemm-direct-tuner-failure
...
Fixed tuners global workgroup size
2020-05-11 22:39:48 +02:00
Cedric Nugteren
c369cf1a16
Increase display width of the local/global sizes
2020-05-11 20:26:33 +02:00
Cedric Nugteren
4a6c7c37a3
Made sure that the global workgroup size is a multiple of the local size in the tuners
2020-05-10 20:28:23 +02:00
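One common way to make a global size a multiple of the local size is to round it up to the next multiple. The sketch below assumes that approach; the commit does not say how the tuners actually adjust the size, and the function name is hypothetical.

```python
def round_up_to_multiple(global_size, local_size):
    """Round `global_size` up to the nearest multiple of `local_size`."""
    if local_size <= 0:
        raise ValueError("local_size must be positive")
    # Integer ceiling division followed by multiplication.
    return ((global_size + local_size - 1) // local_size) * local_size
```

With this sketch, a global size of 100 and a local size of 32 becomes 128, while a size that is already a multiple is left unchanged.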
Cedric Nugteren
69a4b4d4b0
Added logging of local/global workgroup sizes when running the tuners
2020-05-10 20:08:28 +02:00
Cedric Nugteren
9abc416785
Merge pull request #386 from CNugteren/CLBlast-384-pyclblast-missing-routines
...
PyCLBlast: add missing batched routines
2020-05-10 18:23:41 +02:00
Cedric Nugteren
0870e76fba
Updated PyCLBlast version number
2020-05-10 14:55:03 +02:00
Cedric Nugteren
0b7ce8033c
Added a sample to demonstrate a batched routine
2020-05-10 14:54:50 +02:00
Cedric Nugteren
b94e81af10
Added pyclblast bindings for the 3 batched routines
2020-05-10 12:26:25 +02:00
Cedric Nugteren
5f4b3ffcf7
Merge pull request #383 from CNugteren/CLBlast-382-improve-tuner
...
Move queue creation out of the tuner loop
2020-05-04 20:26:42 +02:00
Cedric Nugteren
bbb2031bf3
Move queue creation out of the tuner loop
2020-05-03 20:30:55 +02:00
Cedric Nugteren
78300ccbea
Merge pull request #378 from CNugteren/CLBlast-377-fix-amax-amin
...
Change amax/amin behaviour
2020-03-15 11:34:31 +01:00
Cedric Nugteren
5f97d64505
Update API documentation
2020-03-08 11:29:47 +01:00
Cedric Nugteren
b46853660e
Made it more likely (but no guarantees) for amax/amin to return the first index
2020-03-08 11:26:49 +01:00
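For reference, "return the first index" means that ties between equal absolute values resolve to the lowest index. A sequential sketch of that behaviour is shown below; it is only an illustration with a hypothetical function name, not the OpenCL kernel, where a parallel reduction makes this harder to guarantee.

```python
def first_index_of_abs_max(values):
    """Return the lowest index holding the largest absolute value."""
    if not values:
        raise ValueError("values must not be empty")
    best_index = 0
    for index, value in enumerate(values):
        # Strict comparison: a later element with an equal absolute value
        # never replaces an earlier one.
        if abs(value) > abs(values[best_index]):
            best_index = index
    return best_index
```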
Cedric Nugteren
7fab29304c
Added sample to play around with XAMAX routine
2020-03-08 11:26:18 +01:00
Cedric Nugteren
e3ce88154a
Silenced a new OpenCL warning message
2020-03-08 10:14:59 +01:00
Cedric Nugteren
8433985051
Updated to version 1.5.1
2020-02-18 10:29:40 +01:00
Cedric Nugteren
bf4e4198b7
Merge pull request #376 from CNugteren/fix_tuner_exception_catching
...
Catches all exceptions of the tuners
2020-02-18 10:23:43 +01:00
Cedric Nugteren
49eb490ee1
Catches all exceptions of the tuners
2020-02-17 22:07:51 +01:00
Cedric Nugteren
8a19667e75
Merge pull request #372 from trantila/master
...
Reduced number of TestMatrix calls for the batched xgemm routines.
2019-12-15 09:33:53 +01:00
Tarmo Räntilä
21b66ca761
Reduce TestMatrix calls for xgemmstridedbatched.
...
Replace the looped test by a single one with the offset of the last batch.
2019-12-09 22:17:24 +02:00
Tarmo Räntilä
bf50c4e53e
Reduce TestMatrix calls for xgemmbatched.
...
Replace the looped test by a single one with the maximal found offset.
2019-12-09 22:13:52 +02:00
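The idea of replacing a per-batch loop with one check at the maximal offset can be sketched as below. The function and parameter names are hypothetical, and the sketch assumes every matrix in the batch needs the same number of elements; what `TestMatrix` actually validates is not shown in the commit.

```python
def buffer_fits_all_batches(buffer_size, offsets, elements_per_matrix):
    """Check once, at the largest offset, that every batch fits the buffer.

    If the matrix starting at the largest offset fits, all matrices at
    smaller offsets fit too (assuming equal sizes), so no loop is needed.
    """
    return max(offsets) + elements_per_matrix <= buffer_size
```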
Cedric Nugteren
6ac74008b6
Added notion of fixes in XhadFaster
2019-09-06 19:33:30 +02:00
Cedric Nugteren
701ac9bf76
Merge pull request #368 from etomzak/master
...
Fix out-of-bounds read/write in XhadFaster
2019-09-06 19:30:52 +02:00
etomzak
9560193a9e
Fix out-of-bounds read/write in XhadFaster
...
Fix an error in XhadFaster where data would be written beyond the end of zgm.
The kernel loop assumed that there was always enough work for each thread to
process WPT items, but this was not enforced. It's possible to detect the
overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be
~500 (much larger than the normal 127).
This commit may improve the performance of XhadFaster, since the kernel was
performing 2x work in some cases (once over real data, once over garbage).
Courtesy of Codeplay Software Ltd.
2019-09-04 12:55:25 +01:00
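The class of bug described above, where each thread handles WPT items without checking that the data is that long, can be illustrated with a sequential sketch. The indexing scheme and names are assumptions made for illustration; the actual kernel and its fix are not shown in the commit.

```python
def hadamard_with_bounds(x, y, wpt, num_threads):
    """Element-wise product, simulating threads that each process `wpt` items."""
    n = len(x)
    z = [0.0] * n
    for thread_id in range(num_threads):
        for w in range(wpt):
            # Hypothetical indexing: each thread owns a contiguous chunk.
            index = thread_id * wpt + w
            # Guard: without this check the last thread would run past the
            # end of the data whenever n is not a multiple of wpt.
            if index < n:
                z[index] = x[index] * y[index]
    return z
```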
Cedric Nugteren
ec501055f9
Merge pull request #360 from CNugteren/CLBlast-359-fix-broken-iamin
...
Fixed a bug in the absolute-min index kernel
2019-05-19 22:39:26 +02:00
Cedric Nugteren
3f9d7bca22
Fixed a bug in the absolute-min index kernel
2019-05-19 14:00:18 +02:00
Cedric Nugteren
500d19be4c
Merge pull request #357 from CNugteren/CLBlast-355-intel-shuffle-extension-fix
...
Intel shuffle extension fix
2019-05-16 20:12:32 +02:00
Cedric Nugteren
af6a9eedd1
Added a function to set the OpenCL kernel standard, either 1.1 or 1.2
2019-05-11 20:39:00 +02:00
Cedric Nugteren
9cbffc9b7c
Changed back to cl_intel_subgroups as suggested
2019-05-08 22:01:56 +02:00
Cedric Nugteren
c5a82f6978
Added a host-code check to make sure the avc_motion_estimation is available
2019-05-07 20:47:50 +02:00