Commit Graph

1439 Commits (d94d086d6f92ff1f73bd2a8595a974f6802b3f24)

Author SHA1 Message Date
Cedric Nugteren d94d086d6f
TBMV/TPMV/TRSV: Use the minimum x buffer size for copying to a temp buffer (#461) 2023-05-10 12:48:25 +02:00
Cedric Nugteren 4f24d92730
TRMV: Use the minimum x buffer size for copying to a temp buffer (#458) 2023-05-07 20:03:16 +02:00
Cedric Nugteren 3d0c227fa5
AMAX/AMIN integer testing and bug fixes (#457)
* Fixed a bug in the XAMAX/XAMIN routines that caused the increment and offset to be included in the result

* Perform proper integer-output testing in XAMAX tests

* A few changes towards getting it ready for a PR

* Also fix compilation for clBLAS and cuBLAS references

* Fix a bug that would only use the real part of complex numbers in the amax/amin routines

* A few small fixes related to the AMAX tests
2023-05-07 20:02:52 +02:00
Cedric Nugteren 1573f7d304
Merge pull request #455 from CNugteren/fix_gemm_documentation_bug
Fix documentation bug w.r.t. ld values and matrix layout
2023-03-25 21:25:41 +01:00
Cedric Nugteren 9eca896b05 Fix documentation bug w.r.t. ld values and matrix layout 2023-03-25 20:24:40 +01:00
Cedric Nugteren ab5092dd26
Merge pull request #452 from CNugteren/add_tuning_results_adreno
Add tuning results for 4 devices
2023-01-22 15:52:49 +01:00
Cedric Nugteren c9856758b3 Add tuning results for Intel FPGA emulation device 2023-01-21 21:13:49 +01:00
Cedric Nugteren f4a14daf8d Add tuning results for Radeon Pro 450 2023-01-21 21:11:38 +01:00
Cedric Nugteren 3ca1f5176e Add tuning results for Adreno 740 2023-01-21 21:09:09 +01:00
Cedric Nugteren d11b0c8b01 Add tuning results for Adreno 730 2023-01-21 20:33:49 +01:00
Cedric Nugteren e72f87ae5e
Merge pull request #451 from CodeLinaro/master
CLBlast modifications to address Qualcomm Adreno performance
2023-01-21 20:28:32 +01:00
Angus, Alexander 73f49e9b3d Updated according to feedback from CNugteren 2023-01-17 08:35:29 -08:00
Angus, Alexander ff6a5689df Adreno 730 + 740 CLBlast tuning results 2023-01-12 12:33:48 -08:00
Angus, Alexander 4f394608a2 implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731 2023-01-03 10:56:04 -08:00
Cedric Nugteren 03cffa83c5
Merge pull request #447 from CNugteren/small_plotting_fixes
Fix two small issues in the plotting script
2022-10-14 08:18:00 +02:00
Cedric Nugteren c7d677e4a9 Update changelog 2022-10-13 22:26:26 +02:00
Cedric Nugteren 374eba3ee2 Fix plotting issue with a single row or column 2022-10-13 22:24:35 +02:00
Cedric Nugteren 8aa9f32b23 Fix plotting issue in case of 'inf' values 2022-10-13 22:20:24 +02:00
Cedric Nugteren d55840e16c
Merge pull request #442 from CNugteren/update_version_to_1_5_3
Update to version 1.5.3
2022-09-27 22:45:49 +02:00
Cedric Nugteren e080635019 Fix opencl.hpp download in CMake 2022-09-27 21:11:17 +02:00
Cedric Nugteren 5c608d97cd Properly set OpenCL target to version 2.1 2022-09-27 21:09:35 +02:00
Cedric Nugteren f7db4c5d45 Replace the broken Khronos registry link for cl.hpp with a new GitHub link for opencl.hpp 2022-09-22 22:18:58 +02:00
Cedric Nugteren 521eee4bbf Update PyCLBlast version number 2022-09-22 22:09:21 +02:00
Cedric Nugteren 0de212a56b Update to version 1.5.3 2022-09-22 22:07:33 +02:00
Cedric Nugteren 38fa34b432
Fix typo in comment
Resolves https://github.com/CNugteren/CLBlast/issues/440
2022-06-24 09:32:47 +02:00
Cedric Nugteren d837b64269
Merge pull request #438 from CNugteren/cupp11_api_inconsistency
Fix API inconsistency in cupp11.hpp
2022-05-25 09:14:04 +02:00
Cedric Nugteren 9ab1bf24e2
Fix API inconsistency in cupp11.hpp
The function `CopyToAsync` has an optional event argument in the OpenCL version, which CLBlast uses. As a result, the code does not compile at all when CUDA (through `cupp11.hpp`) is used as the backend. This issue was found by a CLBlast user and reported privately by email. This PR should fix it.
2022-05-23 12:45:22 +02:00
Cedric Nugteren 6b358e1be9
Merge pull request #437 from umar456/blas_fix
Add logic to find Intel OpenMP on oneMKL.
2022-05-17 08:36:18 +02:00
Cedric Nugteren 1884158128
Merge pull request #432 from justingra/sum-fix
sum fix
2022-05-16 08:38:35 +02:00
Umar Arshad 35a4be231a
Add logic to find Intel OpenMP on oneMKL. 2022-05-15 15:37:23 -04:00
Justin Graham fc238a96c9 dev version 2022-05-13 16:46:28 -05:00
Justin Graham 1256f7bfbf changelog message 2022-05-13 08:45:54 -05:00
Cedric Nugteren cb43f264cb
Merge pull request #436 from CNugteren/add_tuning_results
Add tuning results for 2 AMD GPUs and 1 Qualcomm GPU
2022-04-25 21:42:57 +02:00
Cedric Nugteren f107162e64 Add tuning results for Adreno 540 2022-04-25 20:36:18 +02:00
Cedric Nugteren c4163b4b1a Add tuning results for Radeon RX 6500 XT 2022-04-25 20:33:47 +02:00
Cedric Nugteren 7ec8b2f29b Add tuning results for Radeon RX 6800 XT 2022-04-25 20:31:55 +02:00
Cedric Nugteren df0e492d39
Merge pull request #434 from CNugteren/update_test_status_machines
Remove old test machines and add new ones
2022-04-25 20:15:07 +02:00
Cedric Nugteren a7cdf3f0fa Remove old test machines and add new ones 2022-04-25 20:08:41 +02:00
Justin Graham ba254d2f50 sum fix 2022-04-22 11:39:38 -05:00
Cedric Nugteren 9e2ccb7f2b
Merge pull request #431 from danyougle/patch-2
android.hpp: custom header guard _clang_
2022-04-14 10:26:52 +02:00
danyougle f3f3c88710
android.hpp: custom header guard of _clang_
In order not to have ambiguous definitions, exclude the functions for other compilers
2022-04-13 22:33:12 +02:00
Cedric Nugteren 8d298af10b
Merge pull request #430 from danyougle/patch-1
add AMD OCL SDK light path in ENV section
2022-04-13 11:51:47 +02:00
danyougle 6db6ff7107
add AMD OCL SDK light path in ENV section 2022-04-13 10:44:40 +02:00
Cedric Nugteren 4500a03440
Merge pull request #425 from CNugteren/tesla_t4_correctness
Tesla T4 tuning parameters
2021-08-27 22:17:30 +02:00
Cedric Nugteren 772dd307ab Add Quadro T2000 tuning parameters for the Tesla T4 2021-08-27 20:39:59 +02:00
Cedric Nugteren 1f639b7264 Remove Tesla T4 tuning results 2021-08-27 20:32:59 +02:00
Cedric Nugteren cb761e375b
Merge pull request #424 from gspr/gspr/prebuilt
Update documentation to reflect CLBlast in Debian & Ubuntu
2021-08-24 13:29:18 +02:00
Gard Spreemann df1eebc120 PPA for older Ubuntus 2021-08-24 12:36:35 +02:00
Gard Spreemann 3b1e14acd6 Let the installation documentation reflect the fact that CLBlast is now in Debian and Ubuntu 2021-08-24 11:27:42 +02:00
Cedric Nugteren 93d6070e27
Merge pull request #423 from CNugteren/new_tuning_results
New tuning results for 1 Intel CPU and 5 NVIDIA GPUs
2021-08-20 08:18:36 +02:00