Commit graph

1375 commits

Author SHA1 Message Date
Cedric Nugteren ce44c3adb5
Merge pull request #414 from CNugteren/CLBlast-412-python-runtime-libs-fix
Fix Windows paths in pyclblast
2021-02-06 13:30:24 +01:00
Cedric Nugteren 1fa0930d85 Fix Windows paths in pyclblast 2021-02-05 21:52:23 +01:00
Cedric Nugteren fe93153404
Merge pull request #413 from CNugteren/CLBlast-412-python-runtime-libs
Add library dir on Linux for pyclblast
2021-02-04 20:45:40 +01:00
Cedric Nugteren d57f8065ea Added second Windows library path 2021-02-04 20:13:02 +01:00
Cedric Nugteren c78c649844 Add library path for Windows as well 2021-01-30 14:28:11 +01:00
Cedric Nugteren bbcb357a71 Add library dir on Linux for pyclblast 2021-01-29 20:48:05 +01:00
Cedric Nugteren 07837a5c2d Update pyclblast package version number 2021-01-21 20:49:31 +01:00
Cedric Nugteren a5ef06ec57
Merge pull request #410 from jamesjer/master
Use reference types to prevent unnecessary copying
2021-01-21 19:56:18 +01:00
Jerry James dc82a1fbc8 Use reference types to prevent unnecessary copying 2021-01-20 10:21:36 -07:00
Cedric Nugteren 70016e8698 Updated to version 1.5.2 2021-01-19 21:19:12 +01:00
Cedric Nugteren 0ee39af5ed Add tuning results for TITAN RTX 2020-10-10 13:01:12 +02:00
Cedric Nugteren 481d86665f Add tuning results for Radeon RX Vega 2020-10-10 12:56:28 +02:00
Cedric Nugteren e6e2519eaa
Merge pull request #400 from baryluk/patch-6
Allow single graph / subplot on plot
2020-10-05 21:19:34 +02:00
Witold Baryluk ea199c3469
Allow single graph / subplot on plot
`plt.subplots` tries to be special and returns an array or a non-array depending on the number of subplots.

It is not actually helpful, and IMHO bad design.

Make it always `ndarray`.

The `and not type(axes) is np.ndarray` check is just in case matplotlib decides to make its behavior more uniform. For now, work around it.

Also, no need for `ndarray.flat` really.

Confirmed to work with existing benchmarks (i.e. rows=2, cols=3), and with single graphs (rows=1, cols=1).
2020-10-05 12:11:17 +00:00
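
The normalization described in the commit message above could look roughly like the following. This is a minimal sketch, not the code from the commit: the helper name `as_axes_array` is hypothetical, and it assumes only that numpy is installed and that the value returned by `plt.subplots` is either a single axes object or an `ndarray` of them.

```python
import numpy as np


def as_axes_array(axes):
    """Return `axes` as an ndarray, wrapping a lone object in a 1-element array.

    Hypothetical helper: it mirrors the idea of "make it always ndarray",
    it is not taken from the repository.
    """
    if not isinstance(axes, np.ndarray):
        # A single subplot comes back as a bare object, so wrap it.
        axes = np.array([axes], dtype=object)
    return axes


# A stand-in for a single Axes object (rows=1, cols=1).
single = as_axes_array(object())

# A stand-in for a 2x3 grid of Axes objects (rows=2, cols=3).
grid = as_axes_array(np.empty((2, 3), dtype=object))
```

matplotlib also accepts `squeeze=False` in `plt.subplots`, which always returns a 2D array; the sketch above keeps to the wrapping approach the commit message describes.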
Cedric Nugteren 3462d7fa85
Merge pull request #399 from baryluk/patch-3
Fix a typo in benchmark when running fp 16 vs 32
2020-10-04 16:43:21 +02:00
Witold Baryluk eb967a0943
Fix a typo in benchmark when running fp 16 vs 32
The intention here was to limit the iteration range to common indexes only.

Fix that.
2020-10-04 10:22:00 +00:00
Cedric Nugteren 615e5f0ff2
Merge pull request #397 from baryluk/patch-1
Fix Python SyntaxWarning
2020-10-04 11:07:56 +02:00
Cedric Nugteren cdcfbbc8bc
Merge pull request #398 from baryluk/patch-2
Fix --load_from_disk argument help message
2020-10-04 11:07:09 +02:00
Witold Baryluk 2dfe7c5c23
Fix --load_from_disk argument help message 2020-10-04 08:17:16 +00:00
Witold Baryluk 45fd085395
Fix Python SyntaxWarning
There is no guarantee that all empty string objects are the same or share an object with the `""` literal.
2020-10-04 08:12:50 +00:00
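
The point of the commit above can be illustrated as follows. This is a sketch under assumptions: the function name `is_empty` is hypothetical and does not come from the repository. Recent Python 3 versions emit a `SyntaxWarning` when `is` is used with a literal, because `is` tests object identity rather than value.

```python
def is_empty(text):
    """Check for an empty string by value rather than by identity."""
    # `text == ""` compares contents. An identity test against the literal
    # would depend on whether the interpreter happens to reuse one object
    # for every empty string, which the language does not guarantee.
    return text == ""


# An empty string built at runtime rather than written as a literal.
built_at_runtime = "".join([])
result = is_empty(built_at_runtime)
```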
Cedric Nugteren 46fb748a96
Merge pull request #396 from CNugteren/CLBlast-395-fix-benchmark-script
Fix a Python 3 bug in the benchmark script
2020-10-03 10:50:43 +02:00
Cedric Nugteren 0abd62a0e7 Fix a Python 3 bug in the benchmark script 2020-10-02 20:32:58 +02:00
Cedric Nugteren b4cd2b04e9
Added FUNDING.yml file 2020-08-16 10:33:47 +02:00
Cedric Nugteren 41f344d1a6
Merge pull request #392 from 9prady9/fix_Program_getIR
Fix Program::GetIR to handle programs with multiple devices
2020-06-07 19:52:49 +02:00
Pradeep Garigipati dff65e9217 Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG 2020-06-07 21:13:33 +05:30
Pradeep Garigipati aec71699f8
Fix Program::GetIR to handle programs with multiple devices 2020-06-05 12:00:45 +05:30
Cedric Nugteren da0e657d39
Merge pull request #389 from CNugteren/CLBlast-385-version-defines
Added version number defines
2020-05-13 20:28:58 +02:00
Cedric Nugteren 396ac0278a Added CLBLAST_VERSION_MAJOR/MINOR/PATCH defines in headers to store version numbering 2020-05-12 14:43:25 +02:00
Cedric Nugteren 0826bfe683
Merge pull request #388 from CNugteren/CLBlast-381-gemm-direct-tuner-failure
Fixed tuners global workgroup size
2020-05-11 22:39:48 +02:00
Cedric Nugteren c369cf1a16 Increase display width of the local/global sizes 2020-05-11 20:26:33 +02:00
Cedric Nugteren 4a6c7c37a3 Made sure that the global workgroup size is a multiple of the local size in the tuners 2020-05-10 20:28:23 +02:00
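
One common way to satisfy the constraint named in the commit above is to round the global size up to the next multiple of the local size. The sketch below assumes rounding up, which the commit message does not state, and the function name `round_up_to_multiple` is hypothetical. CLBlast itself is C++, so this Python version only illustrates the arithmetic.

```python
def round_up_to_multiple(global_size, local_size):
    """Return the smallest multiple of local_size that is >= global_size."""
    if local_size <= 0:
        raise ValueError("local_size must be positive")
    # Integer ceiling division, then scale back up by the local size.
    return ((global_size + local_size - 1) // local_size) * local_size


# Apply the rounding per dimension of a 2D launch configuration.
global_sizes = [1000, 64]
local_sizes = [32, 8]
padded = [round_up_to_multiple(g, l) for g, l in zip(global_sizes, local_sizes)]
```

Work-items that fall in the padded region then need a bounds check inside the kernel, or the problem size must already be padded to match.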
Cedric Nugteren 69a4b4d4b0 Added logging of local/global workgroup sizes when running the tuners 2020-05-10 20:08:28 +02:00
Cedric Nugteren 9abc416785
Merge pull request #386 from CNugteren/CLBlast-384-pyclblast-missing-routines
PyCLBlast: add missing batched routines
2020-05-10 18:23:41 +02:00
Cedric Nugteren 0870e76fba Updated PyCLBlast version number 2020-05-10 14:55:03 +02:00
Cedric Nugteren 0b7ce8033c Added a sample to demonstrate a batched routine 2020-05-10 14:54:50 +02:00
Cedric Nugteren b94e81af10 Added pyclblast bindings for the 3 batched routines 2020-05-10 12:26:25 +02:00
Cedric Nugteren 5f4b3ffcf7
Merge pull request #383 from CNugteren/CLBlast-382-improve-tuner
Move queue creation out of the tuner loop
2020-05-04 20:26:42 +02:00
Cedric Nugteren bbb2031bf3 Move queue creation out of the tuner loop 2020-05-03 20:30:55 +02:00
Cedric Nugteren 78300ccbea
Merge pull request #378 from CNugteren/CLBlast-377-fix-amax-amin
Change amax/amin behaviour
2020-03-15 11:34:31 +01:00
Cedric Nugteren 5f97d64505 Update API documentation 2020-03-08 11:29:47 +01:00
Cedric Nugteren b46853660e Made it more likely (but no guarantees) for amax/amin to return the first index 2020-03-08 11:26:49 +01:00
Cedric Nugteren 7fab29304c Added sample to play around with XAMAX routine 2020-03-08 11:26:18 +01:00
Cedric Nugteren e3ce88154a Silenced a new OpenCL warning message 2020-03-08 10:14:59 +01:00
Cedric Nugteren 8433985051 Updated to version 1.5.1 2020-02-18 10:29:40 +01:00
Cedric Nugteren bf4e4198b7
Merge pull request #376 from CNugteren/fix_tuner_exception_catching
Catches all exceptions of the tuners
2020-02-18 10:23:43 +01:00
Cedric Nugteren 49eb490ee1 Catches all exceptions of the tuners 2020-02-17 22:07:51 +01:00
Cedric Nugteren 8a19667e75
Merge pull request #372 from trantila/master
Reduced number of TestMatrix calls for the batched xgemm routines.
2019-12-15 09:33:53 +01:00
Tarmo Räntilä 21b66ca761 Reduce TestMatrix calls for xgemmstridedbatched.
Replace the looped test by a single one with the offset of the last batch.
2019-12-09 22:17:24 +02:00
Tarmo Räntilä bf50c4e53e Reduce TestMatrix calls for xgemmbatched.
Replace the looped test by a single one with the maximal found offset.
2019-12-09 22:13:52 +02:00
Cedric Nugteren 6ac74008b6
Added notion of fixes in XhadFaster 2019-09-06 19:33:30 +02:00