Justin Graham
1256f7bfbf
changelog message
2022-05-13 08:45:54 -05:00
Cedric Nugteren
cb43f264cb
Merge pull request #436 from CNugteren/add_tuning_results
...
Add tuning results for 2 AMD GPUs and 1 Qualcomm GPU
2022-04-25 21:42:57 +02:00
Cedric Nugteren
f107162e64
Add tuning results for Adreno 540
2022-04-25 20:36:18 +02:00
Cedric Nugteren
c4163b4b1a
Add tuning results for Radeon RX 6500 XT
2022-04-25 20:33:47 +02:00
Cedric Nugteren
7ec8b2f29b
Add tuning results for Radeon RX 6800 XT
2022-04-25 20:31:55 +02:00
Cedric Nugteren
df0e492d39
Merge pull request #434 from CNugteren/update_test_status_machines
...
Remove old test machines and add new ones
2022-04-25 20:15:07 +02:00
Cedric Nugteren
a7cdf3f0fa
Remove old test machines and add new ones
2022-04-25 20:08:41 +02:00
Justin Graham
ba254d2f50
sum fix
2022-04-22 11:39:38 -05:00
Cedric Nugteren
9e2ccb7f2b
Merge pull request #431 from danyougle/patch-2
...
android.hpp: custom header guard _clang_
2022-04-14 10:26:52 +02:00
danyougle
f3f3c88710
android.hpp: custom header guard of _clang_
...
In order not to have ambiguous definitions, exclude the functions for other compilers
2022-04-13 22:33:12 +02:00
Cedric Nugteren
8d298af10b
Merge pull request #430 from danyougle/patch-1
...
add AMD OCL SDK light path in ENV section
2022-04-13 11:51:47 +02:00
danyougle
6db6ff7107
add AMD OCL SDK light path in ENV section
2022-04-13 10:44:40 +02:00
Cedric Nugteren
4500a03440
Merge pull request #425 from CNugteren/tesla_t4_correctness
...
Tesla T4 tuning parameters
2021-08-27 22:17:30 +02:00
Cedric Nugteren
772dd307ab
Add Quadro T2000 tuning parameters for the Tesla T4
2021-08-27 20:39:59 +02:00
Cedric Nugteren
1f639b7264
Remove Tesla T4 tuning results
2021-08-27 20:32:59 +02:00
Cedric Nugteren
cb761e375b
Merge pull request #424 from gspr/gspr/prebuilt
...
Update documentation to reflect CLBlast in Debian & Ubuntu
2021-08-24 13:29:18 +02:00
Gard Spreemann
df1eebc120
PPA for older Ubuntus
2021-08-24 12:36:35 +02:00
Gard Spreemann
3b1e14acd6
Let the installation documentation reflect the fact that CLBlast is now in Debian and Ubuntu
2021-08-24 11:27:42 +02:00
Cedric Nugteren
93d6070e27
Merge pull request #423 from CNugteren/new_tuning_results
...
New tuning results for 1 Intel CPU and 5 NVIDIA GPUs
2021-08-20 08:18:36 +02:00
Cedric Nugteren
2eaabeed10
Added a note on clock frequencies for tuning
2021-08-19 22:38:18 +02:00
Cedric Nugteren
c2951b8a2a
Updated README and tuning list
2021-08-19 20:37:46 +02:00
Cedric Nugteren
5a9bd270f8
Add tuning results for NVIDIA Tesla V100
2021-08-19 20:34:09 +02:00
Cedric Nugteren
adb4b02982
Add tuning results for NVIDIA Tesla T4
2021-08-19 20:31:52 +02:00
Cedric Nugteren
dea3b5fadb
Add tuning results for NVIDIA Quadro T2000
2021-08-19 20:29:47 +02:00
Cedric Nugteren
521ad117bc
Add tuning results for NVIDIA Quadro GV100
2021-08-19 20:27:39 +02:00
Cedric Nugteren
e9dec268bc
Add tuning results for Intel Core i9-9980HK
2021-08-19 20:25:26 +02:00
Cedric Nugteren
e59ea46180
Add tuning results for NVIDIA A100
2021-08-19 20:23:25 +02:00
Cedric Nugteren
6dbd6d96bc
Merge pull request #419 from CNugteren/fix_tuner_out_of_bounds_access
...
Fix tuner printing issue
2021-05-23 13:39:55 +02:00
Cedric Nugteren
468a4a74eb
Fix issue with printing out-of-bounds local/global sizes for level 1 tuners
2021-05-22 20:31:12 +02:00
Cedric Nugteren
856c850113
Merge pull request #417 from gspr/gspr/capitalization-typo
...
Correct capitalization typo
2021-04-30 12:58:01 +02:00
Gard Spreemann
3d3492646c
Correct capitalization typo
...
The CLBlastConfig.cmake file was installed to a directory named
CLBLast (note the second capital L), which can cause issues for CMake's
search path when looking for CLBlast on the system.
This commit also fixes other occurrences of the wrong capitalization,
all of it purely cosmetic (i.e. in comments).
2021-04-30 10:27:22 +02:00
Cedric Nugteren
ef5176dd96
Merge pull request #416 from JishinMaster/master
...
set the correct flop count for xgemm
2021-03-15 20:15:02 +01:00
JishinMaster
aec45ea637
set the correct flop count for xgemm
2021-03-13 21:48:04 +01:00
Cedric Nugteren
ce44c3adb5
Merge pull request #414 from CNugteren/CLBlast-412-python-runtime-libs-fix
...
Fix Windows paths in pyclblast
2021-02-06 13:30:24 +01:00
Cedric Nugteren
1fa0930d85
Fix Windows paths in pyclblast
2021-02-05 21:52:23 +01:00
Cedric Nugteren
fe93153404
Merge pull request #413 from CNugteren/CLBlast-412-python-runtime-libs
...
Add library dir on Linux for pyclblast
2021-02-04 20:45:40 +01:00
Cedric Nugteren
d57f8065ea
Added second Windows library path
2021-02-04 20:13:02 +01:00
Cedric Nugteren
c78c649844
Add library path for Windows as well
2021-01-30 14:28:11 +01:00
Cedric Nugteren
bbcb357a71
Add library dir on Linux for pyclblast
2021-01-29 20:48:05 +01:00
Cedric Nugteren
07837a5c2d
Update pyclblast package version number
2021-01-21 20:49:31 +01:00
Cedric Nugteren
a5ef06ec57
Merge pull request #410 from jamesjer/master
...
Use reference types to prevent unnecessary copying
2021-01-21 19:56:18 +01:00
Jerry James
dc82a1fbc8
Use reference types to prevent unnecessary copying
2021-01-20 10:21:36 -07:00
Cedric Nugteren
70016e8698
Updated to version 1.5.2
2021-01-19 21:19:12 +01:00
Cedric Nugteren
0ee39af5ed
Add tuning results for TITAN RTX
2020-10-10 13:01:12 +02:00
Cedric Nugteren
481d86665f
Add tuning results for Radeon RX Vega
2020-10-10 12:56:28 +02:00
Cedric Nugteren
e6e2519eaa
Merge pull request #400 from baryluk/patch-6
...
Allow single graph / subplot on plot
2020-10-05 21:19:34 +02:00
Witold Baryluk
ea199c3469
Allow single graph / subplot on plot
...
`plt.subplots` tries to be special and returns either an array or a single object, depending on the number of subplots.
This is not actually helpful and is, IMHO, bad design.
Make the result always an `ndarray`.
The `and not type(axes) is np.ndarray` check is there just in case matplotlib decides to make its behavior more uniform. For now, work around it.
Also, there is no real need for `ndarray.flat`.
Confirmed to work with existing benchmarks (i.e. rows=2, cols=3), and with single graphs (rows=1, cols=1).
2020-10-05 12:11:17 +00:00
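The workaround described in the commit body above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper name `as_axes_array` and the `FakeAxes` stand-in class are hypothetical, and a plain object is used in place of a real matplotlib `Axes` so the snippet only needs NumPy.

```python
import numpy as np


def as_axes_array(axes):
    """Return `axes` as a NumPy array, wrapping a single object if needed.

    Hypothetical helper: with default arguments, `plt.subplots` returns a
    single Axes object for a 1x1 layout and an ndarray for larger layouts.
    """
    if not isinstance(axes, np.ndarray):
        # A single object: wrap it in a one-element object array.
        axes = np.array([axes], dtype=object)
    return axes


class FakeAxes:
    """Stand-in for a matplotlib Axes object (assumption for illustration)."""


# Single-plot case (rows=1, cols=1): a lone object becomes a 1-element array.
single = as_axes_array(FakeAxes())

# Multi-plot case (rows=2, cols=3): an existing ndarray is passed through.
grid = as_axes_array(np.empty((2, 3), dtype=object))
```

As an alternative, matplotlib's `plt.subplots` accepts `squeeze=False`, which makes it return a 2D array of axes regardless of the layout; whether that fits the plotting script here is not stated in the commit.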
Cedric Nugteren
3462d7fa85
Merge pull request #399 from baryluk/patch-3
...
Fix a typo in benchmark when running fp 16 vs 32
2020-10-04 16:43:21 +02:00
Witold Baryluk
eb967a0943
Fix a typo in benchmark when running fp 16 vs 32
...
The intention here was to limit the iteration range to common indexes only.
Fix that.
2020-10-04 10:22:00 +00:00
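The idea of limiting iteration to common indexes can be illustrated with a small sketch. The commit body does not show the actual loop, so the function name `common_indices` and the sample lists below are hypothetical; the point is only that two result lists of different lengths are walked up to the shorter length.

```python
def common_indices(results_a, results_b):
    """Return the index range shared by both result lists (hypothetical helper)."""
    return range(min(len(results_a), len(results_b)))


# Made-up sample data: the two runs produced a different number of results.
fp16 = [1.0, 2.0, 3.0]
fp32 = [1.5, 2.5]

# Only indexes present in both lists are visited, so no IndexError occurs.
ratios = [fp32[i] / fp16[i] for i in common_indices(fp16, fp32)]
```

Python's built-in `zip(fp16, fp32)` stops at the shorter input as well and may read more naturally when the index itself is not needed.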
Cedric Nugteren
615e5f0ff2
Merge pull request #397 from baryluk/patch-1
...
Fix Python SyntaxWarning
2020-10-04 11:07:56 +02:00