Cedric Nugteren
ce44c3adb5
Merge pull request #414 from CNugteren/CLBlast-412-python-runtime-libs-fix
...
Fix Windows paths in pyclblast
2021-02-06 13:30:24 +01:00
Cedric Nugteren
1fa0930d85
Fix Windows paths in pyclblast
2021-02-05 21:52:23 +01:00
Cedric Nugteren
fe93153404
Merge pull request #413 from CNugteren/CLBlast-412-python-runtime-libs
...
Add library dir on Linux for pyclblast
2021-02-04 20:45:40 +01:00
Cedric Nugteren
d57f8065ea
Added second Windows library path
2021-02-04 20:13:02 +01:00
Cedric Nugteren
c78c649844
Add library path for Windows as well
2021-01-30 14:28:11 +01:00
Cedric Nugteren
bbcb357a71
Add library dir on Linux for pyclblast
2021-01-29 20:48:05 +01:00
Cedric Nugteren
07837a5c2d
Update pyclblast package version number
2021-01-21 20:49:31 +01:00
Cedric Nugteren
a5ef06ec57
Merge pull request #410 from jamesjer/master
...
Use reference types to prevent unnecessary copying
2021-01-21 19:56:18 +01:00
Jerry James
dc82a1fbc8
Use reference types to prevent unnecessary copying
2021-01-20 10:21:36 -07:00
Cedric Nugteren
70016e8698
Updated to version 1.5.2
2021-01-19 21:19:12 +01:00
Cedric Nugteren
0ee39af5ed
Add tuning results for TITAN RTX
2020-10-10 13:01:12 +02:00
Cedric Nugteren
481d86665f
Add tuning results for Radeon RX Vega
2020-10-10 12:56:28 +02:00
Cedric Nugteren
e6e2519eaa
Merge pull request #400 from baryluk/patch-6
...
Allow single graph / subplot on plot
2020-10-05 21:19:34 +02:00
Witold Baryluk
ea199c3469
Allow single graph / subplot on plot
...
`plt.subplots` tries to be special and returns either an array or a single object, depending on the number of subplots.
This is not actually helpful and, IMHO, bad design.
Make it always an `ndarray`.
The `and not type(axes) is np.ndarray` check is there just in case matplotlib decides to make its behavior more uniform. For now, work around it.
Also, there is really no need for `ndarray.flat`.
Confirmed to work with existing benchmarks (i.e. rows=2, cols=3) and with single graphs (rows=1, cols=1).
2020-10-05 12:11:17 +00:00
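The workaround described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the benchmark script: the helper name `as_axes_array` is hypothetical, and a plain `object()` stands in for a matplotlib axes object so that the snippet only needs NumPy.

```python
import numpy as np


def as_axes_array(axes):
    """Return `axes` as a NumPy array (hypothetical helper).

    A single, non-array object is wrapped in a one-element array;
    an existing ndarray is returned unchanged.
    """
    if not isinstance(axes, np.ndarray):
        # dtype=object keeps NumPy from trying to interpret the wrapped object.
        axes = np.array([axes], dtype=object)
    return axes


# A stand-in for the single axes object returned when rows=1, cols=1.
single_axes = object()
wrapped = as_axes_array(single_axes)

# A stand-in for the 2x3 grid returned when rows=2, cols=3.
grid_axes = np.empty((2, 3), dtype=object)
unchanged = as_axes_array(grid_axes)
```

With this normalization the calling code can index the result the same way in both cases. matplotlib's `plt.subplots` also documents a `squeeze=False` argument that always returns a 2D array, which may be an alternative to wrapping by hand.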
Cedric Nugteren
3462d7fa85
Merge pull request #399 from baryluk/patch-3
...
Fix a typo in benchmark when running fp 16 vs 32
2020-10-04 16:43:21 +02:00
Witold Baryluk
eb967a0943
Fix a typo in benchmark when running fp 16 vs 32
...
The intention here was to limit the iteration range to common indexes only.
Fix that.
2020-10-04 10:22:00 +00:00
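As an illustration of limiting an iteration range to common indexes, here is a minimal sketch. The function name `common_index_range` and the sample lists are hypothetical and are not taken from the benchmark script.

```python
def common_index_range(first, second):
    """Return the indexes that are valid for both sequences (hypothetical helper)."""
    return range(min(len(first), len(second)))


# Made-up timing lists of different lengths, e.g. one per precision.
fp16_times = [1.0, 2.0, 3.0]
fp32_times = [2.0, 4.0]

# Only indexes 0 and 1 exist in both lists, so index 2 is skipped.
ratios = [fp32_times[i] / fp16_times[i]
          for i in common_index_range(fp16_times, fp32_times)]
```

Iterating over the shorter length avoids an `IndexError` when one result set has fewer entries than the other.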
Cedric Nugteren
615e5f0ff2
Merge pull request #397 from baryluk/patch-1
...
Fix Python SyntaxWarning
2020-10-04 11:07:56 +02:00
Cedric Nugteren
cdcfbbc8bc
Merge pull request #398 from baryluk/patch-2
...
Fix --load_from_disk argument help message
2020-10-04 11:07:09 +02:00
Witold Baryluk
2dfe7c5c23
Fix --load_from_disk argument help message
2020-10-04 08:17:16 +00:00
Witold Baryluk
45fd085395
Fix Python SyntaxWarning
...
There is no guarantee that all empty string objects are the same object, or that they share an object with the `""` literal.
2020-10-04 08:12:50 +00:00
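The point made in the commit above can be sketched as follows. The helper name `is_empty` is hypothetical. The assumption here is that the original code compared a string against the `""` literal with `is`, which Python 3.8 and later flag with a `SyntaxWarning`.

```python
def is_empty(text):
    """Compare by value with `==` rather than by identity with `is`."""
    return text == ""


# An empty string built at runtime. Whether it is the very same object as the
# "" literal is an implementation detail, so only the value is compared.
runtime_empty = "".join([])
result_empty = is_empty(runtime_empty)
result_non_empty = is_empty("abc")
```

Value comparison gives the same answer regardless of how the interpreter chooses to cache or share string objects.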
Cedric Nugteren
46fb748a96
Merge pull request #396 from CNugteren/CLBlast-395-fix-benchmark-script
...
Fix a Python 3 bug in the benchmark script
2020-10-03 10:50:43 +02:00
Cedric Nugteren
0abd62a0e7
Fix a Python 3 bug in the benchmark script
2020-10-02 20:32:58 +02:00
Cedric Nugteren
b4cd2b04e9
Added FUNDING.yml file
2020-08-16 10:33:47 +02:00
Cedric Nugteren
41f344d1a6
Merge pull request #392 from 9prady9/fix_Program_getIR
...
Fix Program::GetIR to handle programs with multiple devices
2020-06-07 19:52:49 +02:00
Pradeep Garigipati
dff65e9217
Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG
2020-06-07 21:13:33 +05:30
Pradeep Garigipati
aec71699f8
Fix Program::GetIR to handle programs with multiple devices
2020-06-05 12:00:45 +05:30
Cedric Nugteren
da0e657d39
Merge pull request #389 from CNugteren/CLBlast-385-version-defines
...
Added version number defines
2020-05-13 20:28:58 +02:00
Cedric Nugteren
396ac0278a
Added CLBLAST_VERSION_MAJOR/MINOR/PATCH defines in headers to store version numbering
2020-05-12 14:43:25 +02:00
Cedric Nugteren
0826bfe683
Merge pull request #388 from CNugteren/CLBlast-381-gemm-direct-tuner-failure
...
Fixed tuners global workgroup size
2020-05-11 22:39:48 +02:00
Cedric Nugteren
c369cf1a16
Increase display width of the local/global sizes
2020-05-11 20:26:33 +02:00
Cedric Nugteren
4a6c7c37a3
Made sure that the global workgroup size is a multiple of the local size in the tuners
2020-05-10 20:28:23 +02:00
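One common way to make a global size a multiple of the local size is to round it up, sketched below. This is an illustrative assumption about the approach, not the tuner's actual C++ code, and the function name `round_up_to_multiple` is hypothetical.

```python
def round_up_to_multiple(global_size, local_size):
    """Round `global_size` up to the next multiple of `local_size` (hypothetical helper)."""
    if local_size <= 0:
        raise ValueError("local_size must be positive")
    # Integer ceiling division followed by a multiplication.
    return ((global_size + local_size - 1) // local_size) * local_size


padded = round_up_to_multiple(100, 32)         # 100 is not a multiple of 32
already_aligned = round_up_to_multiple(64, 32) # 64 already is
```

OpenCL 1.x requires the global work size to be evenly divisible by the local work size in each dimension. When rounding up, the extra work-items typically need a bounds check in the kernel so they do not touch memory outside the intended range.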
Cedric Nugteren
69a4b4d4b0
Added logging of local/global workgroup sizes when running the tuners
2020-05-10 20:08:28 +02:00
Cedric Nugteren
9abc416785
Merge pull request #386 from CNugteren/CLBlast-384-pyclblast-missing-routines
...
PyCLBlast: add missing batched routines
2020-05-10 18:23:41 +02:00
Cedric Nugteren
0870e76fba
Updated PyCLBlast version number
2020-05-10 14:55:03 +02:00
Cedric Nugteren
0b7ce8033c
Added a sample to demonstrate a batched routine
2020-05-10 14:54:50 +02:00
Cedric Nugteren
b94e81af10
Added pyclblast bindings for the 3 batched routines
2020-05-10 12:26:25 +02:00
Cedric Nugteren
5f4b3ffcf7
Merge pull request #383 from CNugteren/CLBlast-382-improve-tuner
...
Move queue creation out of the tuner loop
2020-05-04 20:26:42 +02:00
Cedric Nugteren
bbb2031bf3
Move queue creation out of the tuner loop
2020-05-03 20:30:55 +02:00
Cedric Nugteren
78300ccbea
Merge pull request #378 from CNugteren/CLBlast-377-fix-amax-amin
...
Change amax/amin behaviour
2020-03-15 11:34:31 +01:00
Cedric Nugteren
5f97d64505
Update API documentation
2020-03-08 11:29:47 +01:00
Cedric Nugteren
b46853660e
Made it more likely (but no guarantees) for amax/amin to return the first index
2020-03-08 11:26:49 +01:00
Cedric Nugteren
7fab29304c
Added sample to play around with XAMAX routine
2020-03-08 11:26:18 +01:00
Cedric Nugteren
e3ce88154a
Silenced a new OpenCL warning message
2020-03-08 10:14:59 +01:00
Cedric Nugteren
8433985051
Updated to version 1.5.1
2020-02-18 10:29:40 +01:00
Cedric Nugteren
bf4e4198b7
Merge pull request #376 from CNugteren/fix_tuner_exception_catching
...
Catches all exceptions of the tuners
2020-02-18 10:23:43 +01:00
Cedric Nugteren
49eb490ee1
Catches all exceptions of the tuners
2020-02-17 22:07:51 +01:00
Cedric Nugteren
8a19667e75
Merge pull request #372 from trantila/master
...
Reduced number of TestMatrix calls for the batched xgemm routines.
2019-12-15 09:33:53 +01:00
Tarmo Räntilä
21b66ca761
Reduce TestMatrix calls for xgemmstridedbatched.
...
Replace the looped test by a single one with the offset of the last batch.
2019-12-09 22:17:24 +02:00
Tarmo Räntilä
bf50c4e53e
Reduce TestMatrix calls for xgemmbatched.
...
Replace the looped test by a single one with the maximal found offset.
2019-12-09 22:13:52 +02:00
Cedric Nugteren
6ac74008b6
Added notion of fixes in XhadFaster
2019-09-06 19:33:30 +02:00