Commit Graph

1394 Commits (1f639b7264a7caa4ddc12b2511a6bd1aaef85f34)

Author SHA1 Message Date
Cedric Nugteren 1f639b7264 Remove Tesla T4 tuning results 2021-08-27 20:32:59 +02:00
Cedric Nugteren cb761e375b
Merge pull request #424 from gspr/gspr/prebuilt
Update documentation to reflect CLBlast in Debian & Ubuntu
2021-08-24 13:29:18 +02:00
Gard Spreemann df1eebc120 PPA for older Ubuntus 2021-08-24 12:36:35 +02:00
Gard Spreemann 3b1e14acd6 Let the installation documentation reflect the fact that CLBlast is now in Debian and Ubuntu 2021-08-24 11:27:42 +02:00
Cedric Nugteren 93d6070e27
Merge pull request #423 from CNugteren/new_tuning_results
New tuning results for 1 Intel CPU and 5 NVIDIA GPUs
2021-08-20 08:18:36 +02:00
Cedric Nugteren 2eaabeed10 Added a note on clock frequencies for tuning 2021-08-19 22:38:18 +02:00
Cedric Nugteren c2951b8a2a Updated README and tuning list 2021-08-19 20:37:46 +02:00
Cedric Nugteren 5a9bd270f8 Add tuning results for NVIDIA Tesla V100 2021-08-19 20:34:09 +02:00
Cedric Nugteren adb4b02982 Add tuning results for NVIDIA Tesla T4 2021-08-19 20:31:52 +02:00
Cedric Nugteren dea3b5fadb Add tuning results for NVIDIA Quadro T2000 2021-08-19 20:29:47 +02:00
Cedric Nugteren 521ad117bc Add tuning results for NVIDIA Quadro GV100 2021-08-19 20:27:39 +02:00
Cedric Nugteren e9dec268bc Add tuning results for Intel Core i9-9980HK 2021-08-19 20:25:26 +02:00
Cedric Nugteren e59ea46180 Add tuning results for NVIDIA A100 2021-08-19 20:23:25 +02:00
Cedric Nugteren 6dbd6d96bc
Merge pull request #419 from CNugteren/fix_tuner_out_of_bounds_access
Fix tuner printing issue
2021-05-23 13:39:55 +02:00
Cedric Nugteren 468a4a74eb Fix issue with printing out-of-bounds local/global sizes for level 1 tuners 2021-05-22 20:31:12 +02:00
Cedric Nugteren 856c850113
Merge pull request #417 from gspr/gspr/capitalization-typo
Correct capitalization typo
2021-04-30 12:58:01 +02:00
Gard Spreemann 3d3492646c Correct capitalization typo
The CLBlastConfig.cmake file was installed to a directory named
CLBLast (notice the second capital L), which can cause issues for CMake's
search path when looking for CLBlast on the system.

This commit also fixes other occurrences of the wrong capitalization,
all of it purely cosmetic (i.e. in comments).
2021-04-30 10:27:22 +02:00
Cedric Nugteren ef5176dd96
Merge pull request #416 from JishinMaster/master
set the correct flop count for xgemm
2021-03-15 20:15:02 +01:00
JishinMaster aec45ea637 set the correct flop count for xgemm 2021-03-13 21:48:04 +01:00
Cedric Nugteren ce44c3adb5
Merge pull request #414 from CNugteren/CLBlast-412-python-runtime-libs-fix
Fix Windows paths in pyclblast
2021-02-06 13:30:24 +01:00
Cedric Nugteren 1fa0930d85 Fix Windows paths in pyclblast 2021-02-05 21:52:23 +01:00
Cedric Nugteren fe93153404
Merge pull request #413 from CNugteren/CLBlast-412-python-runtime-libs
Add library dir on Linux for pyclblast
2021-02-04 20:45:40 +01:00
Cedric Nugteren d57f8065ea Added second Windows library path 2021-02-04 20:13:02 +01:00
Cedric Nugteren c78c649844 Add library path for Windows as well 2021-01-30 14:28:11 +01:00
Cedric Nugteren bbcb357a71 Add library dir on Linux for pyclblast 2021-01-29 20:48:05 +01:00
Cedric Nugteren 07837a5c2d Update pyclblast package version number 2021-01-21 20:49:31 +01:00
Cedric Nugteren a5ef06ec57
Merge pull request #410 from jamesjer/master
Use reference types to prevent unnecessary copying
2021-01-21 19:56:18 +01:00
Jerry James dc82a1fbc8 Use reference types to prevent unnecessary copying 2021-01-20 10:21:36 -07:00
Cedric Nugteren 70016e8698 Updated to version 1.5.2 2021-01-19 21:19:12 +01:00
Cedric Nugteren 0ee39af5ed Add tuning results for TITAN RTX 2020-10-10 13:01:12 +02:00
Cedric Nugteren 481d86665f Add tuning results for Radeon RX Vega 2020-10-10 12:56:28 +02:00
Cedric Nugteren e6e2519eaa
Merge pull request #400 from baryluk/patch-6
Allow single graph / subplot on plot
2020-10-05 21:19:34 +02:00
Witold Baryluk ea199c3469
Allow single graph / subplot on plot
`plt.subplots` tries to be special, and return array or not-array depending on a number of subplots.

It is not actually helpful, and IMHO bad design.

Make it always `ndarray`.

The `and not type(axes) is np.ndarray` check is just in case matplotlib decides to make their behavior more uniform. For now, work around it.

Also, no need for `ndarray.flat` really.

Confirmed to work with existing benchmarks (i.e. rows=2, cols=3), and with single graphs (rows=1, cols=1).
2020-10-05 12:11:17 +00:00
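
The workaround described in the commit message above can be sketched as follows. This is only an illustration, not the code from the benchmark script: the helper name `as_axes_array` is hypothetical, and a plain `object()` stands in for a matplotlib `Axes` so the snippet needs only NumPy.

```python
import numpy as np


def as_axes_array(axes):
    """Return `axes` as an ndarray, wrapping a single object if needed.

    Hypothetical helper: it mirrors the idea of always handing the
    plotting loop an array, whatever `plt.subplots` returned.
    """
    if not isinstance(axes, np.ndarray):
        # A single subplot comes back as a bare object; wrap it in a
        # one-element array so it can be iterated like the grid case.
        axes = np.array([axes])
    return axes


single = as_axes_array(object())             # stand-in for one Axes object
grid = as_axes_array(np.empty((2, 3), dtype=object))  # stand-in for a 2x3 grid
```

With real matplotlib, passing `squeeze=False` to `plt.subplots` is another way to always get an array back (a 2D one); the commit message describes wrapping instead.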
Cedric Nugteren 3462d7fa85
Merge pull request #399 from baryluk/patch-3
Fix a typo in benchmark when running fp 16 vs 32
2020-10-04 16:43:21 +02:00
Witold Baryluk eb967a0943
Fix a typo in benchmark when running fp 16 vs 32
The intention here was to limit the iteration range to common indexes only.

Fix that.
2020-10-04 10:22:00 +00:00
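
Limiting an iteration range to the indexes shared by two result sets can be sketched as below. The function and variable names (`common_indices`, `fp16_results`, `fp32_results`) and the sample values are assumptions made for illustration; they are not taken from the benchmark script.

```python
def common_indices(first, second):
    """Return the range of indexes that are valid for both sequences."""
    return range(min(len(first), len(second)))


# Hypothetical data: the two runs produced a different number of results.
fp16_results = [1.0, 2.0, 3.0]
fp32_results = [2.0, 4.0]

# Only indexes present in both lists are visited, so no IndexError occurs.
ratios = [fp16_results[i] / fp32_results[i]
          for i in common_indices(fp16_results, fp32_results)]
```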
Cedric Nugteren 615e5f0ff2
Merge pull request #397 from baryluk/patch-1
Fix Python SyntaxWarning
2020-10-04 11:07:56 +02:00
Cedric Nugteren cdcfbbc8bc
Merge pull request #398 from baryluk/patch-2
Fix --load_from_disk argument help message
2020-10-04 11:07:09 +02:00
Witold Baryluk 2dfe7c5c23 Fix --load_from_disk argument help message 2020-10-04 08:17:16 +00:00
Witold Baryluk 45fd085395
Fix Python SyntaxWarning
There is no guarantee that all empty string objects are the same object, or that they share an object with the `""` literal.
2020-10-04 08:12:50 +00:00
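
The point of this commit message can be illustrated with a short sketch. The helper name `is_empty` is hypothetical. Recent Python versions (3.8 and later) emit a `SyntaxWarning` for `is` comparisons against a literal, because `is` tests object identity rather than value.

```python
def is_empty(text):
    """Check for an empty string by value, not by identity."""
    # `text is ""` would ask whether both names refer to the same object,
    # which the language does not guarantee for equal strings.
    return text == ""


# An empty string produced at runtime rather than written as a literal.
built_at_runtime = "".join([])
```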
Cedric Nugteren 46fb748a96
Merge pull request #396 from CNugteren/CLBlast-395-fix-benchmark-script
Fix a Python 3 bug in the benchmark script
2020-10-03 10:50:43 +02:00
Cedric Nugteren 0abd62a0e7 Fix a Python 3 bug in the benchmark script 2020-10-02 20:32:58 +02:00
Cedric Nugteren b4cd2b04e9 Added FUNDING.yml file 2020-08-16 10:33:47 +02:00
Cedric Nugteren 41f344d1a6
Merge pull request #392 from 9prady9/fix_Program_getIR
Fix Program::GetIR to handle programs with multiple devices
2020-06-07 19:52:49 +02:00
Pradeep Garigipati dff65e9217 Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG 2020-06-07 21:13:33 +05:30
Pradeep Garigipati aec71699f8 Fix Program::GetIR to handle programs with multiple devices 2020-06-05 12:00:45 +05:30
Cedric Nugteren da0e657d39
Merge pull request #389 from CNugteren/CLBlast-385-version-defines
Added version number defines
2020-05-13 20:28:58 +02:00
Cedric Nugteren 396ac0278a Added CLBLAST_VERSION_MAJOR/MINOR/PATCH defines in headers to store version numbering 2020-05-12 14:43:25 +02:00
Cedric Nugteren 0826bfe683
Merge pull request #388 from CNugteren/CLBlast-381-gemm-direct-tuner-failure
Fixed tuners global workgroup size
2020-05-11 22:39:48 +02:00
Cedric Nugteren c369cf1a16 Increase display width of the local/global sizes 2020-05-11 20:26:33 +02:00
Cedric Nugteren 4a6c7c37a3 Made sure that the global workgroup size is a multiple of the local size in the tuners 2020-05-10 20:28:23 +02:00
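
One common way to make a global size a multiple of the local size is to round it up, as in the sketch below. Whether the tuners round up or adjust the size some other way is an assumption here, and the helper name `round_up_to_multiple` is hypothetical.

```python
def round_up_to_multiple(global_size, local_size):
    """Return the smallest multiple of local_size that is >= global_size."""
    if local_size <= 0:
        raise ValueError("local_size must be positive")
    # Integer ceiling division, then scale back up to a multiple.
    return ((global_size + local_size - 1) // local_size) * local_size


# Example: 100 work-items with work-groups of 32 are padded to 128.
padded = round_up_to_multiple(100, 32)
```

When a size is padded like this, a kernel typically needs a bounds check so the extra work-items do nothing.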