Cedric Nugteren
221121b840
Add GitHub Actions CI (#464)
This replaces the old Travis CI builds with GitHub Actions that test on both Ubuntu and macOS, with both Clang and GCC. The macOS builds also run the tests and some other programs; on Ubuntu, OpenCL is not working at the moment. Because these builds use new/different compilers, I fixed a few warnings and errors along the way.
2023-05-14 11:25:15 +02:00
Cedric Nugteren
8d2f3540e9
Merge pull request #465 from CNugteren/fix_override_parameters_test
Fix compilation issue in override parameters test
2023-05-11 08:49:26 +02:00
Cedric Nugteren
6e6efb72be
Fix compilation issue in override parameters test
2023-05-10 21:31:33 +02:00
Cedric Nugteren
3baf823575
Fixes an issue under Android when the driver was already unloaded (#462)
2023-05-10 17:10:17 +02:00
Cedric Nugteren
d94d086d6f
TBMV/TPMV/TRSV: Use the minimum x buffer size for copying to a temp buffer (#461)
2023-05-10 12:48:25 +02:00
Cedric Nugteren
4f24d92730
TRMV: Use the minimum x buffer size for copying to a temp buffer (#458)
2023-05-07 20:03:16 +02:00
Cedric Nugteren
3d0c227fa5
AMAX/AMIN integer testing and bug fixes (#457)
* Fixed a bug in the XAMAX/XAMIN routines that caused the increment and offset to be included in the result
* Perform proper integer-output testing in XAMAX tests
* A few changes towards getting it ready for a PR
* Also fix compilation for clBLAS and cuBLAS references
* Fix a bug that would only use the real part of complex numbers in the amax/amin routines
* A few small fixes related to the AMAX tests
2023-05-07 20:02:52 +02:00
Cedric Nugteren
1573f7d304
Merge pull request #455 from CNugteren/fix_gemm_documentation_bug
Fix documentation bug w.r.t. ld values and matrix layout
2023-03-25 21:25:41 +01:00
Cedric Nugteren
9eca896b05
Fix documentation bug w.r.t. ld values and matrix layout
2023-03-25 20:24:40 +01:00
Cedric Nugteren
ab5092dd26
Merge pull request #452 from CNugteren/add_tuning_results_adreno
Add tuning results for 4 devices
2023-01-22 15:52:49 +01:00
Cedric Nugteren
c9856758b3
Add tuning results for Intel FPGA emulation device
2023-01-21 21:13:49 +01:00
Cedric Nugteren
f4a14daf8d
Add tuning results for Radeon Pro 450
2023-01-21 21:11:38 +01:00
Cedric Nugteren
3ca1f5176e
Add tuning results for Adreno 740
2023-01-21 21:09:09 +01:00
Cedric Nugteren
d11b0c8b01
Add tuning results for Adreno 730
2023-01-21 20:33:49 +01:00
Cedric Nugteren
e72f87ae5e
Merge pull request #451 from CodeLinaro/master
CLBlast modifications to address Qualcomm Adreno performance
2023-01-21 20:28:32 +01:00
Angus, Alexander
73f49e9b3d
Updated according to feedback from CNugteren
2023-01-17 08:35:29 -08:00
Angus, Alexander
ff6a5689df
Adreno 730 + 740 CLBlast tuning results
2023-01-12 12:33:48 -08:00
Angus, Alexander
4f394608a2
Implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731
2023-01-03 10:56:04 -08:00
Cedric Nugteren
03cffa83c5
Merge pull request #447 from CNugteren/small_plotting_fixes
Fix two small issues in the plotting script
2022-10-14 08:18:00 +02:00
Cedric Nugteren
c7d677e4a9
Update changelog
2022-10-13 22:26:26 +02:00
Cedric Nugteren
374eba3ee2
Fix plotting issue with a single row or column
2022-10-13 22:24:35 +02:00
Cedric Nugteren
8aa9f32b23
Fix plotting issue in case of 'inf' values
2022-10-13 22:20:24 +02:00
Cedric Nugteren
d55840e16c
Merge pull request #442 from CNugteren/update_version_to_1_5_3
Update to version 1.5.3
2022-09-27 22:45:49 +02:00
Cedric Nugteren
e080635019
Fix opencl.hpp download in CMake
2022-09-27 21:11:17 +02:00
Cedric Nugteren
5c608d97cd
Properly set OpenCL target to version 2.1
2022-09-27 21:09:35 +02:00
Cedric Nugteren
f7db4c5d45
Replace the broken Khronos registry link for cl.hpp with a new GitHub link for opencl.hpp
2022-09-22 22:18:58 +02:00
Cedric Nugteren
521eee4bbf
Update PyCLBlast version number
2022-09-22 22:09:21 +02:00
Cedric Nugteren
0de212a56b
Update to version 1.5.3
2022-09-22 22:07:33 +02:00
Cedric Nugteren
38fa34b432
Fix typo in comment
Resolves https://github.com/CNugteren/CLBlast/issues/440
2022-06-24 09:32:47 +02:00
Cedric Nugteren
d837b64269
Merge pull request #438 from CNugteren/cupp11_api_inconsistency
Fix API inconsistency in cupp11.hpp
2022-05-25 09:14:04 +02:00
Cedric Nugteren
9ab1bf24e2
Fix API inconsistency in cupp11.hpp
The function `CopyToAsync` has an optional event argument in the OpenCL version but not in the CUDA version, and CLBlast uses that argument. As a result, the code does not compile at all if CUDA (through `cupp11.hpp`) is used as the backend. This issue was found by a CLBlast user and reported privately by email. This PR should fix that.
2022-05-23 12:45:22 +02:00
Cedric Nugteren
6b358e1be9
Merge pull request #437 from umar456/blas_fix
Add logic to find Intel OpenMP on oneMKL.
2022-05-17 08:36:18 +02:00
Cedric Nugteren
1884158128
Merge pull request #432 from justingra/sum-fix
sum fix
2022-05-16 08:38:35 +02:00
Umar Arshad
35a4be231a
Add logic to find Intel OpenMP on oneMKL.
2022-05-15 15:37:23 -04:00
Justin Graham
fc238a96c9
dev version
2022-05-13 16:46:28 -05:00
Justin Graham
1256f7bfbf
changelog message
2022-05-13 08:45:54 -05:00
Cedric Nugteren
cb43f264cb
Merge pull request #436 from CNugteren/add_tuning_results
Add tuning results for 2 AMD GPUs and 1 Qualcomm GPU
2022-04-25 21:42:57 +02:00
Cedric Nugteren
f107162e64
Add tuning results for Adreno 540
2022-04-25 20:36:18 +02:00
Cedric Nugteren
c4163b4b1a
Add tuning results for Radeon RX 6500 XT
2022-04-25 20:33:47 +02:00
Cedric Nugteren
7ec8b2f29b
Add tuning results for Radeon RX 6800 XT
2022-04-25 20:31:55 +02:00
Cedric Nugteren
df0e492d39
Merge pull request #434 from CNugteren/update_test_status_machines
Remove old test machines and add new ones
2022-04-25 20:15:07 +02:00
Cedric Nugteren
a7cdf3f0fa
Remove old test machines and add new ones
2022-04-25 20:08:41 +02:00
Justin Graham
ba254d2f50
sum fix
2022-04-22 11:39:38 -05:00
Cedric Nugteren
9e2ccb7f2b
Merge pull request #431 from danyougle/patch-2
android.hpp: custom header guard _clang_
2022-04-14 10:26:52 +02:00
danyougle
f3f3c88710
android.hpp: custom header guard of _clang_
In order not to have ambiguous definitions, exclude the functions for other compilers
2022-04-13 22:33:12 +02:00
Cedric Nugteren
8d298af10b
Merge pull request #430 from danyougle/patch-1
add AMD OCL SDK light path in ENV section
2022-04-13 11:51:47 +02:00
danyougle
6db6ff7107
add AMD OCL SDK light path in ENV section
2022-04-13 10:44:40 +02:00
Cedric Nugteren
4500a03440
Merge pull request #425 from CNugteren/tesla_t4_correctness
Tesla T4 tuning parameters
2021-08-27 22:17:30 +02:00
Cedric Nugteren
772dd307ab
Add Quadro T2000 tuning parameters for the Tesla T4
2021-08-27 20:39:59 +02:00
Cedric Nugteren
1f639b7264
Remove Tesla T4 tuning results
2021-08-27 20:32:59 +02:00