Commit Graph

1443 Commits (221121b8407d5538cfed2f53973303f02810d856)

Author SHA1 Message Date
Cedric Nugteren 221121b840
Add Github Actions CI (#464)
This replaces the old Travis CI builds with GitHub Actions that test on both Ubuntu and macOS, with both Clang and GCC. The macOS builds also run the tests and some other programs; on Ubuntu, OpenCL is not working at the moment. Because these builds use new or different compilers, I fixed a few warnings and errors along the way.
2023-05-14 11:25:15 +02:00
Cedric Nugteren 8d2f3540e9
Merge pull request #465 from CNugteren/fix_override_parameters_test
Fix compilation issue in override parameters test
2023-05-11 08:49:26 +02:00
Cedric Nugteren 6e6efb72be Fix compilation issue in override parameters test 2023-05-10 21:31:33 +02:00
Cedric Nugteren 3baf823575
Fixes an issue under Android when the driver was already unloaded (#462) 2023-05-10 17:10:17 +02:00
Cedric Nugteren d94d086d6f
TBMV/TPMV/TRSV: Use the minimum x buffer size for copying to a temp buffer (#461) 2023-05-10 12:48:25 +02:00
Cedric Nugteren 4f24d92730
TRMV: Use the minimum x buffer size for copying to a temp buffer (#458) 2023-05-07 20:03:16 +02:00
Cedric Nugteren 3d0c227fa5
AMAX/AMIN integer testing and bug fixes (#457)
* Fixed a bug in XAMAX/XAMIN routines that caused the increment and offset to be included in the result

* Perform proper integer-output testing in XAMAX tests

* A few changes towards getting it ready for a PR

* Also fix compilation for clBLAS and cuBLAS references

* Fix a bug that would only use the real part of complex numbers in the amax/amin routines

* A few small fixes related to the AMAX tests
2023-05-07 20:02:52 +02:00
Cedric Nugteren 1573f7d304
Merge pull request #455 from CNugteren/fix_gemm_documentation_bug
Fix documentation bug w.r.t. ld values and matrix layout
2023-03-25 21:25:41 +01:00
Cedric Nugteren 9eca896b05 Fix documentation bug w.r.t. ld values and matrix layout 2023-03-25 20:24:40 +01:00
Cedric Nugteren ab5092dd26
Merge pull request #452 from CNugteren/add_tuning_results_adreno
Add tuning results for 4 devices
2023-01-22 15:52:49 +01:00
Cedric Nugteren c9856758b3 Add tuning results for Intel FPGA emulation device 2023-01-21 21:13:49 +01:00
Cedric Nugteren f4a14daf8d Add tuning results for Radeon Pro 450 2023-01-21 21:11:38 +01:00
Cedric Nugteren 3ca1f5176e Add tuning results for Adreno 740 2023-01-21 21:09:09 +01:00
Cedric Nugteren d11b0c8b01 Add tuning results for Adreno 730 2023-01-21 20:33:49 +01:00
Cedric Nugteren e72f87ae5e
Merge pull request #451 from CodeLinaro/master
CLBlast modifications to address Qualcomm Adreno performance
2023-01-21 20:28:32 +01:00
Angus, Alexander 73f49e9b3d Updated according to feedback from CNugteren 2023-01-17 08:35:29 -08:00
Angus, Alexander ff6a5689df Adreno 730 + 740 CLBlast tuning results 2023-01-12 12:33:48 -08:00
Angus, Alexander 4f394608a2 implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731 2023-01-03 10:56:04 -08:00
Cedric Nugteren 03cffa83c5
Merge pull request #447 from CNugteren/small_plotting_fixes
Fix two small issues in the plotting script
2022-10-14 08:18:00 +02:00
Cedric Nugteren c7d677e4a9 Update changelog 2022-10-13 22:26:26 +02:00
Cedric Nugteren 374eba3ee2 Fix plotting issue with a single row or column 2022-10-13 22:24:35 +02:00
Cedric Nugteren 8aa9f32b23 Fix plotting issue in case of 'inf' values 2022-10-13 22:20:24 +02:00
Cedric Nugteren d55840e16c
Merge pull request #442 from CNugteren/update_version_to_1_5_3
Update to version 1.5.3
2022-09-27 22:45:49 +02:00
Cedric Nugteren e080635019 Fix opencl.hpp download in CMake 2022-09-27 21:11:17 +02:00
Cedric Nugteren 5c608d97cd Properly set OpenCL target to version 2.1 2022-09-27 21:09:35 +02:00
Cedric Nugteren f7db4c5d45 Replace the broken khronos registry link for cl.hpp with a new github link for opencl.hpp 2022-09-22 22:18:58 +02:00
Cedric Nugteren 521eee4bbf Update PyCLBlast version number 2022-09-22 22:09:21 +02:00
Cedric Nugteren 0de212a56b Update to version 1.5.3 2022-09-22 22:07:33 +02:00
Cedric Nugteren 38fa34b432
Fix typo in comment
Resolves https://github.com/CNugteren/CLBlast/issues/440
2022-06-24 09:32:47 +02:00
Cedric Nugteren d837b64269
Merge pull request #438 from CNugteren/cupp11_api_inconsistency
Fix API inconsistency in cupp11.hpp
2022-05-25 09:14:04 +02:00
Cedric Nugteren 9ab1bf24e2
Fix API inconsistency in cupp11.hpp
The function `CopyToAsync` has an optional event argument in the OpenCL version, which is used in CLBlast. This prevents the code from compiling at all when CUDA (through `cupp11.hpp`) is used as the backend. The issue was found by a CLBlast user and reported privately by email. This PR should fix that.
2022-05-23 12:45:22 +02:00
Cedric Nugteren 6b358e1be9
Merge pull request #437 from umar456/blas_fix
Add logic to find intel OpenMP on oneMKL.
2022-05-17 08:36:18 +02:00
Cedric Nugteren 1884158128
Merge pull request #432 from justingra/sum-fix
sum fix
2022-05-16 08:38:35 +02:00
Umar Arshad 35a4be231a
Add logic to find intel OpenMP on oneMKL. 2022-05-15 15:37:23 -04:00
Justin Graham fc238a96c9 dev version 2022-05-13 16:46:28 -05:00
Justin Graham 1256f7bfbf changelog message 2022-05-13 08:45:54 -05:00
Cedric Nugteren cb43f264cb
Merge pull request #436 from CNugteren/add_tuning_results
Add tuning results for 2 AMD GPUs and 1 Qualcomm GPU
2022-04-25 21:42:57 +02:00
Cedric Nugteren f107162e64 Add tuning results for Adreno 540 2022-04-25 20:36:18 +02:00
Cedric Nugteren c4163b4b1a Add tuning results for Radeon RX 6500 XT 2022-04-25 20:33:47 +02:00
Cedric Nugteren 7ec8b2f29b Add tuning results for Radeon RX 6800 XT 2022-04-25 20:31:55 +02:00
Cedric Nugteren df0e492d39
Merge pull request #434 from CNugteren/update_test_status_machines
Remove old test machines and add new ones
2022-04-25 20:15:07 +02:00
Cedric Nugteren a7cdf3f0fa Remove old test machines and add new ones 2022-04-25 20:08:41 +02:00
Justin Graham ba254d2f50 sum fix 2022-04-22 11:39:38 -05:00
Cedric Nugteren 9e2ccb7f2b
Merge pull request #431 from danyougle/patch-2
android.hpp: custom header guard _clang_
2022-04-14 10:26:52 +02:00
danyougle f3f3c88710
android.hpp: custom header guard of _clang_
In order not to have ambiguous definitions, exclude the functions for other compilers
2022-04-13 22:33:12 +02:00
Cedric Nugteren 8d298af10b
Merge pull request #430 from danyougle/patch-1
add AMD OCL SDK light path in ENV section
2022-04-13 11:51:47 +02:00
danyougle 6db6ff7107
add AMD OCL SDK light path in ENV section 2022-04-13 10:44:40 +02:00
Cedric Nugteren 4500a03440
Merge pull request #425 from CNugteren/tesla_t4_correctness
Tesla T4 tuning parameters
2021-08-27 22:17:30 +02:00
Cedric Nugteren 772dd307ab Add Quadro T2000 tuning parameters for the Tesla T4 2021-08-27 20:39:59 +02:00
Cedric Nugteren 1f639b7264 Remove Tesla T4 tuning results 2021-08-27 20:32:59 +02:00