Cedric Nugteren
38fa34b432
Fix typo in comment
Resolves https://github.com/CNugteren/CLBlast/issues/440
2022-06-24 09:32:47 +02:00
Cedric Nugteren
d837b64269
Merge pull request #438 from CNugteren/cupp11_api_inconsistency
Fix API inconsistency in cupp11.hpp
2022-05-25 09:14:04 +02:00
Cedric Nugteren
9ab1bf24e2
Fix API inconsistency in cupp11.hpp
The function `CopyToAsync` has an optional event argument in the OpenCL version, which is used in CLBlast. This makes the code not compile at all if CUDA (through `cupp11.hpp`) is used as the backend. This issue was found by a CLBlast user and reported privately by email. This PR should fix that.
2022-05-23 12:45:22 +02:00
Cedric Nugteren
6b358e1be9
Merge pull request #437 from umar456/blas_fix
Add logic to find intel OpenMP on oneMKL.
2022-05-17 08:36:18 +02:00
Cedric Nugteren
1884158128
Merge pull request #432 from justingra/sum-fix
sum fix
2022-05-16 08:38:35 +02:00
Umar Arshad
35a4be231a
Add logic to find intel OpenMP on oneMKL.
2022-05-15 15:37:23 -04:00
Justin Graham
fc238a96c9
dev version
2022-05-13 16:46:28 -05:00
Justin Graham
1256f7bfbf
changelog message
2022-05-13 08:45:54 -05:00
Cedric Nugteren
cb43f264cb
Merge pull request #436 from CNugteren/add_tuning_results
Add tuning results for 2 AMD GPUs and 1 Qualcomm GPU
2022-04-25 21:42:57 +02:00
Cedric Nugteren
f107162e64
Add tuning results for Adreno 540
2022-04-25 20:36:18 +02:00
Cedric Nugteren
c4163b4b1a
Add tuning results for Radeon RX 6500 XT
2022-04-25 20:33:47 +02:00
Cedric Nugteren
7ec8b2f29b
Add tuning results for Radeon RX 6800 XT
2022-04-25 20:31:55 +02:00
Cedric Nugteren
df0e492d39
Merge pull request #434 from CNugteren/update_test_status_machines
Remove old test machines and add new ones
2022-04-25 20:15:07 +02:00
Cedric Nugteren
a7cdf3f0fa
Remove old test machines and add new ones
2022-04-25 20:08:41 +02:00
Justin Graham
ba254d2f50
sum fix
2022-04-22 11:39:38 -05:00
Cedric Nugteren
9e2ccb7f2b
Merge pull request #431 from danyougle/patch-2
android.hpp: custom header guard _clang_
2022-04-14 10:26:52 +02:00
danyougle
f3f3c88710
android.hpp: custom header guard of _clang_
In order not to have ambiguous definitions, exclude the functions for other compilers
2022-04-13 22:33:12 +02:00
Cedric Nugteren
8d298af10b
Merge pull request #430 from danyougle/patch-1
add AMD OCL SDK light path in ENV section
2022-04-13 11:51:47 +02:00
danyougle
6db6ff7107
add AMD OCL SDK light path in ENV section
2022-04-13 10:44:40 +02:00
Cedric Nugteren
4500a03440
Merge pull request #425 from CNugteren/tesla_t4_correctness
Tesla T4 tuning parameters
2021-08-27 22:17:30 +02:00
Cedric Nugteren
772dd307ab
Add Quadro T2000 tuning parameters for the Tesla T4
2021-08-27 20:39:59 +02:00
Cedric Nugteren
1f639b7264
Remove Tesla T4 tuning results
2021-08-27 20:32:59 +02:00
Cedric Nugteren
cb761e375b
Merge pull request #424 from gspr/gspr/prebuilt
Update documentation to reflect CLBlast in Debian & Ubuntu
2021-08-24 13:29:18 +02:00
Gard Spreemann
df1eebc120
PPA for older Ubuntus
2021-08-24 12:36:35 +02:00
Gard Spreemann
3b1e14acd6
Let the installation documentation reflect the fact that CLBlast is now in Debian and Ubuntu
2021-08-24 11:27:42 +02:00
Cedric Nugteren
93d6070e27
Merge pull request #423 from CNugteren/new_tuning_results
New tuning results for 1 Intel CPU and 5 NVIDIA GPUs
2021-08-20 08:18:36 +02:00
Cedric Nugteren
2eaabeed10
Added a note on clock frequencies for tuning
2021-08-19 22:38:18 +02:00
Cedric Nugteren
c2951b8a2a
Updated README and tuning list
2021-08-19 20:37:46 +02:00
Cedric Nugteren
5a9bd270f8
Add tuning results for NVIDIA Tesla V100
2021-08-19 20:34:09 +02:00
Cedric Nugteren
adb4b02982
Add tuning results for NVIDIA Tesla T4
2021-08-19 20:31:52 +02:00
Cedric Nugteren
dea3b5fadb
Add tuning results for NVIDIA Quadro T2000
2021-08-19 20:29:47 +02:00
Cedric Nugteren
521ad117bc
Add tuning results for NVIDIA Quadro GV100
2021-08-19 20:27:39 +02:00
Cedric Nugteren
e9dec268bc
Add tuning results for Intel Core i9-9980HK
2021-08-19 20:25:26 +02:00
Cedric Nugteren
e59ea46180
Add tuning results for NVIDIA A100
2021-08-19 20:23:25 +02:00
Cedric Nugteren
6dbd6d96bc
Merge pull request #419 from CNugteren/fix_tuner_out_of_bounds_access
Fix tuner printing issue
2021-05-23 13:39:55 +02:00
Cedric Nugteren
468a4a74eb
Fix issue with printing out-of-bounds local/global sizes for level 1 tuners
2021-05-22 20:31:12 +02:00
Cedric Nugteren
856c850113
Merge pull request #417 from gspr/gspr/capitalization-typo
Correct capitalization typo
2021-04-30 12:58:01 +02:00
Gard Spreemann
3d3492646c
Correct capitalization typo
The CLBlastConfig.cmake file was installed to a directory named
CLBLast (notice the second capital L), which can cause issues for CMake's
search path when looking for CLBlast on the system.
This commit also fixes other occurrences of the wrong capitalization,
all of it purely cosmetic (i.e. in comments).
2021-04-30 10:27:22 +02:00
Cedric Nugteren
ef5176dd96
Merge pull request #416 from JishinMaster/master
set the correct flop count for xgemm
2021-03-15 20:15:02 +01:00
JishinMaster
aec45ea637
set the correct flop count for xgemm
2021-03-13 21:48:04 +01:00
Cedric Nugteren
ce44c3adb5
Merge pull request #414 from CNugteren/CLBlast-412-python-runtime-libs-fix
Fix Windows paths in pyclblast
2021-02-06 13:30:24 +01:00
Cedric Nugteren
1fa0930d85
Fix Windows paths in pyclblast
2021-02-05 21:52:23 +01:00
Cedric Nugteren
fe93153404
Merge pull request #413 from CNugteren/CLBlast-412-python-runtime-libs
Add library dir on Linux for pyclblast
2021-02-04 20:45:40 +01:00
Cedric Nugteren
d57f8065ea
Added second Windows library path
2021-02-04 20:13:02 +01:00
Cedric Nugteren
c78c649844
Add library path for Windows as well
2021-01-30 14:28:11 +01:00
Cedric Nugteren
bbcb357a71
Add library dir on Linux for pyclblast
2021-01-29 20:48:05 +01:00
Cedric Nugteren
07837a5c2d
Update pyclblast package version number
2021-01-21 20:49:31 +01:00
Cedric Nugteren
a5ef06ec57
Merge pull request #410 from jamesjer/master
Use reference types to prevent unnecessary copying
2021-01-21 19:56:18 +01:00
Jerry James
dc82a1fbc8
Use reference types to prevent unnecessary copying
2021-01-20 10:21:36 -07:00
Cedric Nugteren
70016e8698
Updated to version 1.5.2
2021-01-19 21:19:12 +01:00