Commit Graph

1473 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)

Author SHA1 Message Date
Cedric Nugteren 374eba3ee2 Fix plotting issue with a single row or column 2022-10-13 22:24:35 +02:00
Cedric Nugteren 8aa9f32b23 Fix plotting issue in case of 'inf' values 2022-10-13 22:20:24 +02:00
Cedric Nugteren d55840e16c
Merge pull request #442 from CNugteren/update_version_to_1_5_3
Update to version 1.5.3
2022-09-27 22:45:49 +02:00
Cedric Nugteren e080635019 Fix opencl.hpp download in CMake 2022-09-27 21:11:17 +02:00
Cedric Nugteren 5c608d97cd Properly set OpenCL target to version 2.1 2022-09-27 21:09:35 +02:00
Cedric Nugteren f7db4c5d45 Replace the broken khronos registry link for cl.hpp with a new github link for opencl.hpp 2022-09-22 22:18:58 +02:00
Cedric Nugteren 521eee4bbf Update PyCLBlast version number 2022-09-22 22:09:21 +02:00
Cedric Nugteren 0de212a56b Update to version 1.5.3 2022-09-22 22:07:33 +02:00
Cedric Nugteren 38fa34b432
Fix typo in comment
Resolves https://github.com/CNugteren/CLBlast/issues/440
2022-06-24 09:32:47 +02:00
Cedric Nugteren d837b64269
Merge pull request #438 from CNugteren/cupp11_api_inconsistency
Fix API inconsistency in cupp11.hpp
2022-05-25 09:14:04 +02:00
Cedric Nugteren 9ab1bf24e2
Fix API inconsistency in cupp11.hpp
The function `CopyToAsync` has an optional event argument in the OpenCL version, which is used in CLBlast. This makes the code not compile at all if CUDA (through `cupp11.hpp`) is used as backend. This issue was found by a CLBlast user and reported privately by email. This PR should fix that.
2022-05-23 12:45:22 +02:00
Cedric Nugteren 6b358e1be9
Merge pull request #437 from umar456/blas_fix
Add logic to find intel OpenMP on oneMKL.
2022-05-17 08:36:18 +02:00
Cedric Nugteren 1884158128
Merge pull request #432 from justingra/sum-fix
sum fix
2022-05-16 08:38:35 +02:00
Umar Arshad 35a4be231a Add logic to find intel OpenMP on oneMKL. 2022-05-15 15:37:23 -04:00
Justin Graham fc238a96c9 dev version 2022-05-13 16:46:28 -05:00
Justin Graham 1256f7bfbf changelog message 2022-05-13 08:45:54 -05:00
Cedric Nugteren cb43f264cb
Merge pull request #436 from CNugteren/add_tuning_results
Add tuning results for 2 AMD GPUs and 1 Qualcomm GPU
2022-04-25 21:42:57 +02:00
Cedric Nugteren f107162e64 Add tuning results for Adreno 540 2022-04-25 20:36:18 +02:00
Cedric Nugteren c4163b4b1a Add tuning results for Radeon RX 6500 XT 2022-04-25 20:33:47 +02:00
Cedric Nugteren 7ec8b2f29b Add tuning results for Radeon RX 6800 XT 2022-04-25 20:31:55 +02:00
Cedric Nugteren df0e492d39
Merge pull request #434 from CNugteren/update_test_status_machines
Remove old test machines and add new ones
2022-04-25 20:15:07 +02:00
Cedric Nugteren a7cdf3f0fa Remove old test machines and add new ones 2022-04-25 20:08:41 +02:00
Justin Graham ba254d2f50 sum fix 2022-04-22 11:39:38 -05:00
Cedric Nugteren 9e2ccb7f2b
Merge pull request #431 from danyougle/patch-2
android.hpp: custom header guard _clang_
2022-04-14 10:26:52 +02:00
danyougle f3f3c88710
android.hpp: custom header guard of _clang_
In order not to have ambiguous definitions, exclude the functions for other compilers
2022-04-13 22:33:12 +02:00
Cedric Nugteren 8d298af10b
Merge pull request #430 from danyougle/patch-1
add AMD OCL SDK light path in ENV section
2022-04-13 11:51:47 +02:00
danyougle 6db6ff7107 add AMD OCL SDK light path in ENV section 2022-04-13 10:44:40 +02:00
Cedric Nugteren 4500a03440
Merge pull request #425 from CNugteren/tesla_t4_correctness
Tesla T4 tuning parameters
2021-08-27 22:17:30 +02:00
Cedric Nugteren 772dd307ab Add Quadro T2000 tuning parameters for the Tesla T4 2021-08-27 20:39:59 +02:00
Cedric Nugteren 1f639b7264 Remove Tesla T4 tuning results 2021-08-27 20:32:59 +02:00
Cedric Nugteren cb761e375b
Merge pull request #424 from gspr/gspr/prebuilt
Update documentation to reflect CLBlast in Debian & Ubuntu
2021-08-24 13:29:18 +02:00
Gard Spreemann df1eebc120 PPA for older Ubuntus 2021-08-24 12:36:35 +02:00
Gard Spreemann 3b1e14acd6 Let the installation documentation reflect the fact that CLBlast is now in Debian and Ubuntu 2021-08-24 11:27:42 +02:00
Cedric Nugteren 93d6070e27
Merge pull request #423 from CNugteren/new_tuning_results
New tuning results for 1 Intel CPU and 5 NVIDIA GPUs
2021-08-20 08:18:36 +02:00
Cedric Nugteren 2eaabeed10 Added a note on clock frequencies for tuning 2021-08-19 22:38:18 +02:00
Cedric Nugteren c2951b8a2a Updated README and tuning list 2021-08-19 20:37:46 +02:00
Cedric Nugteren 5a9bd270f8 Add tuning results for NVIDIA Tesla V100 2021-08-19 20:34:09 +02:00
Cedric Nugteren adb4b02982 Add tuning results for NVIDIA Tesla T4 2021-08-19 20:31:52 +02:00
Cedric Nugteren dea3b5fadb Add tuning results for NVIDIA Quadro T2000 2021-08-19 20:29:47 +02:00
Cedric Nugteren 521ad117bc Add tuning results for NVIDIA Quadro GV100 2021-08-19 20:27:39 +02:00
Cedric Nugteren e9dec268bc Add tuning results for Intel Core i9-9980HK 2021-08-19 20:25:26 +02:00
Cedric Nugteren e59ea46180 Add tuning results for NVIDIA A100 2021-08-19 20:23:25 +02:00
Cedric Nugteren 6dbd6d96bc
Merge pull request #419 from CNugteren/fix_tuner_out_of_bounds_access
Fix tuner printing issue
2021-05-23 13:39:55 +02:00
Cedric Nugteren 468a4a74eb Fix issue with printing out-of-bounds local/global sizes for level 1 tuners 2021-05-22 20:31:12 +02:00
Cedric Nugteren 856c850113
Merge pull request #417 from gspr/gspr/capitalization-typo
Correct capitalization typo
2021-04-30 12:58:01 +02:00
Gard Spreemann 3d3492646c Correct capitalization typo
The CLBlastConfig.cmake file was installed to a directory named
CLBLast (notice second capital l), which can cause issues for CMake's
search path when looking for CLBlast on the system.

This commit also fixes other occurrences of the wrong capitalization,
all of it purely cosmetic (i.e. in comments).
2021-04-30 10:27:22 +02:00
Cedric Nugteren ef5176dd96
Merge pull request #416 from JishinMaster/master
set the correct flop count for xgemm
2021-03-15 20:15:02 +01:00
JishinMaster aec45ea637 set the correct flop count for xgemm 2021-03-13 21:48:04 +01:00
Cedric Nugteren ce44c3adb5
Merge pull request #414 from CNugteren/CLBlast-412-python-runtime-libs-fix
Fix Windows paths in pyclblast
2021-02-06 13:30:24 +01:00
Cedric Nugteren 1fa0930d85 Fix Windows paths in pyclblast 2021-02-05 21:52:23 +01:00