Commit Graph

1473 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)

Author SHA1 Message Date
Cedric Nugteren fe93153404
Merge pull request #413 from CNugteren/CLBlast-412-python-runtime-libs
Add library dir on Linux for pyclblast
2021-02-04 20:45:40 +01:00
Cedric Nugteren d57f8065ea Added second Windows library path 2021-02-04 20:13:02 +01:00
Cedric Nugteren c78c649844 Add library path for Windows as well 2021-01-30 14:28:11 +01:00
Cedric Nugteren bbcb357a71 Add library dir on Linux for pyclblast 2021-01-29 20:48:05 +01:00
Cedric Nugteren 07837a5c2d Update pyclblast package version number 2021-01-21 20:49:31 +01:00
Cedric Nugteren a5ef06ec57
Merge pull request #410 from jamesjer/master
Use reference types to prevent unnecessary copying
2021-01-21 19:56:18 +01:00
Jerry James dc82a1fbc8 Use reference types to prevent unnecessary copying 2021-01-20 10:21:36 -07:00
Cedric Nugteren 70016e8698 Updated to version 1.5.2 2021-01-19 21:19:12 +01:00
Cedric Nugteren 0ee39af5ed Add tuning results for TITAN RTX 2020-10-10 13:01:12 +02:00
Cedric Nugteren 481d86665f Add tuning results for Radeon RX Vega 2020-10-10 12:56:28 +02:00
Cedric Nugteren e6e2519eaa
Merge pull request #400 from baryluk/patch-6
Allow single graph / subplot on plot
2020-10-05 21:19:34 +02:00
Witold Baryluk ea199c3469
Allow single graph / subplot on plot
`plt.subplots` tries to be special and returns either an array or a non-array, depending on the number of subplots.

It is not actually helpful, and IMHO bad design.

Make it always `ndarray`.

The `and not type(axes) is np.ndarray` check is there just in case matplotlib decides to make its behavior more uniform. For now, work around it.

Also, no need for `ndarray.flat` really.

Confirmed to work with existing benchmarks (i.e. rows=2, cols=3), and with single graphs (rows=1, cols=1).
2020-10-05 12:11:17 +00:00
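
The workaround described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `normalize_axes` is a hypothetical helper name, and a plain `object()` stands in for a matplotlib axes object so that the sketch needs only NumPy.

```python
import numpy as np


def normalize_axes(axes):
    """Return a 1-D object array of axes, whatever shape was passed in.

    Hypothetical helper: a lone axes object is wrapped in a length-1 array,
    while an existing ndarray (1-D or 2-D grid) is flattened.
    """
    if not isinstance(axes, np.ndarray):
        wrapped = np.empty(1, dtype=object)
        wrapped[0] = axes  # assign explicitly so NumPy does not try to unpack it
        return wrapped
    return axes.ravel()


# A single placeholder "axes" (the rows=1, cols=1 case)
single = object()
print(normalize_axes(single).shape)

# A placeholder 2x3 grid (the rows=2, cols=3 case)
grid = np.empty((2, 3), dtype=object)
print(normalize_axes(grid).shape)
```

matplotlib's `plt.subplots` also documents a `squeeze=False` argument that always returns a 2-D array, which may be an alternative to wrapping the result by hand.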
Cedric Nugteren 3462d7fa85
Merge pull request #399 from baryluk/patch-3
Fix a typo in benchmark when running fp 16 vs 32
2020-10-04 16:43:21 +02:00
Witold Baryluk eb967a0943
Fix a typo in benchmark when running fp 16 vs 32
The intention here was to limit the iteration range to common indexes only.

Fix that.
2020-10-04 10:22:00 +00:00
Cedric Nugteren 615e5f0ff2
Merge pull request #397 from baryluk/patch-1
Fix Python SyntaxWarning
2020-10-04 11:07:56 +02:00
Cedric Nugteren cdcfbbc8bc
Merge pull request #398 from baryluk/patch-2
Fix --load_from_disk argument help message
2020-10-04 11:07:09 +02:00
Witold Baryluk 2dfe7c5c23
Fix --load_from_disk argument help message 2020-10-04 08:17:16 +00:00
Witold Baryluk 45fd085395
Fix Python SyntaxWarning
There is no guarantee that all empty string objects are the same object as the `""` literal.
2020-10-04 08:12:50 +00:00
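
The point of this fix can be illustrated with a short sketch. It assumes the original code compared a string to `""` using `is`, which the commit message implies but does not show; `is_empty` is a hypothetical name used only for illustration.

```python
def is_empty(text):
    """Check for an empty string by value rather than by object identity.

    `text is ""` asks whether both names refer to the very same object,
    which is an interpreter implementation detail; `text == ""` compares
    the contents and is what is almost always intended.
    """
    return text == ""


# Strings built at runtime compare equal to "" regardless of identity.
print(is_empty("".join([])))
print(is_empty("abc"[0:0]))
print(is_empty("x"))
```

For plain strings, `not text` is an equally common way to express the same check.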
Cedric Nugteren 46fb748a96
Merge pull request #396 from CNugteren/CLBlast-395-fix-benchmark-script
Fix a Python 3 bug in the benchmark script
2020-10-03 10:50:43 +02:00
Cedric Nugteren 0abd62a0e7 Fix a Python 3 bug in the benchmark script 2020-10-02 20:32:58 +02:00
Cedric Nugteren b4cd2b04e9
Added FUNDING.yml file 2020-08-16 10:33:47 +02:00
Cedric Nugteren 41f344d1a6
Merge pull request #392 from 9prady9/fix_Program_getIR
Fix Program::GetIR to handle programs with multiple devices
2020-06-07 19:52:49 +02:00
Pradeep Garigipati dff65e9217 Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG 2020-06-07 21:13:33 +05:30
Pradeep Garigipati aec71699f8
Fix Program::GetIR to handle programs with multiple devices 2020-06-05 12:00:45 +05:30
Cedric Nugteren da0e657d39
Merge pull request #389 from CNugteren/CLBlast-385-version-defines
Added version number defines
2020-05-13 20:28:58 +02:00
Cedric Nugteren 396ac0278a Added CLBLAST_VERSION_MAJOR/MINOR/PATCH defines in headers to store version numbering 2020-05-12 14:43:25 +02:00
Cedric Nugteren 0826bfe683
Merge pull request #388 from CNugteren/CLBlast-381-gemm-direct-tuner-failure
Fixed tuners global workgroup size
2020-05-11 22:39:48 +02:00
Cedric Nugteren c369cf1a16 Increase display width of the local/global sizes 2020-05-11 20:26:33 +02:00
Cedric Nugteren 4a6c7c37a3 Made sure that the global workgroup size is a multiple of the local size in the tuners 2020-05-10 20:28:23 +02:00
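
One common way to satisfy this constraint is to round each global size up to the next multiple of the local size. The sketch below assumes rounding up, which the commit message does not specify; the function names are hypothetical and are not taken from the CLBlast tuner sources.

```python
def round_up_to_multiple(global_size, local_size):
    """Smallest multiple of local_size that is >= global_size."""
    if local_size <= 0:
        raise ValueError("local_size must be positive")
    return ((global_size + local_size - 1) // local_size) * local_size


def adjust_global_sizes(global_sizes, local_sizes):
    """Apply the rounding per dimension (assumed: one local size per dimension)."""
    return [round_up_to_multiple(g, l) for g, l in zip(global_sizes, local_sizes)]


print(adjust_global_sizes([100, 7], [32, 8]))  # [128, 8]
```

When the global size is padded like this, a kernel typically needs a bounds check so that the extra work-items do nothing.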
Cedric Nugteren 69a4b4d4b0 Added logging of local/global workgroup sizes when running the tuners 2020-05-10 20:08:28 +02:00
Cedric Nugteren 9abc416785
Merge pull request #386 from CNugteren/CLBlast-384-pyclblast-missing-routines
PyCLBlast: add missing batched routines
2020-05-10 18:23:41 +02:00
Cedric Nugteren 0870e76fba Updated PyCLBlast version number 2020-05-10 14:55:03 +02:00
Cedric Nugteren 0b7ce8033c Added a sample to demonstrate a batched routine 2020-05-10 14:54:50 +02:00
Cedric Nugteren b94e81af10 Added pyclblast bindings for the 3 batched routines 2020-05-10 12:26:25 +02:00
Cedric Nugteren 5f4b3ffcf7
Merge pull request #383 from CNugteren/CLBlast-382-improve-tuner
Move queue creation out of the tuner loop
2020-05-04 20:26:42 +02:00
Cedric Nugteren bbb2031bf3 Move queue creation out of the tuner loop 2020-05-03 20:30:55 +02:00
Cedric Nugteren 78300ccbea
Merge pull request #378 from CNugteren/CLBlast-377-fix-amax-amin
Change amax/amin behaviour
2020-03-15 11:34:31 +01:00
Cedric Nugteren 5f97d64505 Update API documentation 2020-03-08 11:29:47 +01:00
Cedric Nugteren b46853660e Made it more likely (but no guarantees) for amax/amin to return the first index 2020-03-08 11:26:49 +01:00
Cedric Nugteren 7fab29304c Added sample to play around with XAMAX routine 2020-03-08 11:26:18 +01:00
Cedric Nugteren e3ce88154a Silenced a new OpenCL warning message 2020-03-08 10:14:59 +01:00
Cedric Nugteren 8433985051 Updated to version 1.5.1 2020-02-18 10:29:40 +01:00
Cedric Nugteren bf4e4198b7
Merge pull request #376 from CNugteren/fix_tuner_exception_catching
Catches all exceptions of the tuners
2020-02-18 10:23:43 +01:00
Cedric Nugteren 49eb490ee1 Catches all exceptions of the tuners 2020-02-17 22:07:51 +01:00
Cedric Nugteren 8a19667e75
Merge pull request #372 from trantila/master
Reduced number of TestMatrix calls for the batched xgemm routines.
2019-12-15 09:33:53 +01:00
Tarmo Räntilä 21b66ca761 Reduce TestMatrix calls for xgemmstridedbatched.
Replace the looped test by a single one with the offset of the last batch.
2019-12-09 22:17:24 +02:00
Tarmo Räntilä bf50c4e53e Reduce TestMatrix calls for xgemmbatched.
Replace the looped test by a single one with the maximal found offset.
2019-12-09 22:13:52 +02:00
Cedric Nugteren 6ac74008b6
Added notion of fixes in XhadFaster 2019-09-06 19:33:30 +02:00
Cedric Nugteren 701ac9bf76
Merge pull request #368 from etomzak/master
Fix out-of-bounds read/write in XhadFaster
2019-09-06 19:30:52 +02:00
etomzak 9560193a9e Fix out-of-bounds read/write in XhadFaster
Fix an error in XhadFaster where data would be written beyond the end of zgm.
The kernel loop assumed that there was always enough work for each thread to
process WPT items, but this was not enforced. It's possible to detect the
overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be
~500 (much larger than the normal 127).

This commit may improve the performance of XhadFaster, since the kernel was
performing 2x work in some cases (once over real data, once over garbage).

Courtesy of Codeplay Software Ltd.
2019-09-04 12:55:25 +01:00
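
The class of bug described in this commit message can be modelled in plain Python. This is only a sketch under stated assumptions: the indexing scheme (`tid * wpt + w`), the function name, and the use of lists in place of device buffers are all hypothetical, and the actual OpenCL kernel is not reproduced here.

```python
def hadamard_guarded(x, y, num_threads, wpt):
    """Element-wise product where each simulated thread handles up to `wpt` items.

    The `idx < n` guard is the key point: without it, threads whose range
    extends past the end of the data would read and write out of bounds.
    """
    n = len(x)
    z = [0.0] * n
    for tid in range(num_threads):
        for w in range(wpt):
            idx = tid * wpt + w  # hypothetical work distribution
            if idx < n:  # skip work-items that fall beyond the buffer
                z[idx] = x[idx] * y[idx]
    return z


# 2 threads x 2 items cover 4 slots, but only 3 elements exist.
print(hadamard_guarded([1, 2, 3], [4, 5, 6], num_threads=2, wpt=2))  # [4, 10, 18]
```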