Cedric Nugteren
036684204e
Github Actions Windows builds with tests ( #472 )
...
* Set CMake CMP0074 policy
* Attempt to use pre-compiled OpenBLAS on Windows CI
* Fix an issue and add some debugging
* Improve FindCBLAS for OpenBLAS on Windows
2023-05-21 17:07:31 +02:00
Cedric Nugteren
63eb127bad
Intel HD Graphics 770 and AMD RX 6600 XT tuning results ( #474 )
...
* Add tuning results for AMD Radeon RX 6600 XT
* Add tuning results for Intel HD Graphics 770
* Update list of tuned devices
2023-05-21 14:25:15 +02:00
Mikko Vedru
0832ed6a16
Documentation: tuning.md: Add a useful link ( #473 )
2023-05-21 08:07:43 +00:00
Cedric Nugteren
ce5e446fbe
Actualize the README and remove the old ROADMAP ( #471 )
2023-05-18 18:10:12 +02:00
Cedric Nugteren
db3bd0a32e
Add Windows builds to Github Actions and fix Windows compilation issue ( #470 )
...
* Add Windows builds to Github Actions CI
* Fix failing Windows builds
2023-05-18 16:58:31 +02:00
Cedric Nugteren
e73f0b5131
Add Github Actions release script ( #469 )
...
* Add first version of release script
* Several fixes for the Windows release job
* Install OpenCL for Windows release
* Fix issue with environment variable
* Set OpenCL root
* Fix zipping in Windows build
2023-05-17 17:10:51 +02:00
Cedric Nugteren
7a3ef92ff2
Add 3 sets of tuning results: RX 5700 XT, 2080 Ti, and 3090 ( #468 )
...
* Add tuning results for AMD Radeon RX 5700 XT
* Add tuning results for NVIDIA GeForce RTX 2080 Ti
* Add tuning results for NVIDIA GeForce RTX 3090
2023-05-17 10:31:47 +02:00
Cedric Nugteren
221121b840
Add Github Actions CI ( #464 )
...
This replaces the old Travis CI builds with Github Actions builds that test on both Ubuntu and macOS, with both Clang and GCC. The macOS builds also run the tests and some other programs; on Ubuntu, OpenCL is not working at the moment. Because these tests use new or different compilers, I fixed a few warnings and errors along the way.
2023-05-14 11:25:15 +02:00
Cedric Nugteren
8d2f3540e9
Merge pull request #465 from CNugteren/fix_override_parameters_test
...
Fix compilation issue in override parameters test
2023-05-11 08:49:26 +02:00
Cedric Nugteren
6e6efb72be
Fix compilation issue in override parameters test
2023-05-10 21:31:33 +02:00
Cedric Nugteren
3baf823575
Fixes an issue under Android when the driver was already unloaded ( #462 )
2023-05-10 17:10:17 +02:00
Cedric Nugteren
d94d086d6f
TBMV/TPMV/TRSV: Use the minimum x buffer size for copying to a temp buffer ( #461 )
2023-05-10 12:48:25 +02:00
Cedric Nugteren
4f24d92730
TRMV: Use the minimum x buffer size for copying to a temp buffer ( #458 )
2023-05-07 20:03:16 +02:00
Cedric Nugteren
3d0c227fa5
AMAX/AMIN integer testing and bug fixes ( #457 )
...
* Fixed a bug in XAMAX/XAMIN routines that caused the increment and offset to be included in the result
* Perform proper integer-output testing in XAMAX tests
* A few changes towards getting it ready for a PR
* Also fix compilation for clBLAS and cuBLAS references
* Fix a bug that would only use the real part of complex numbers in the amax/amin routines
* A few small fixes related to the AMAX tests
2023-05-07 20:02:52 +02:00
Cedric Nugteren
1573f7d304
Merge pull request #455 from CNugteren/fix_gemm_documentation_bug
...
Fix documentation bug w.r.t. ld values and matrix layout
2023-03-25 21:25:41 +01:00
Cedric Nugteren
9eca896b05
Fix documentation bug w.r.t. ld values and matrix layout
2023-03-25 20:24:40 +01:00
Cedric Nugteren
ab5092dd26
Merge pull request #452 from CNugteren/add_tuning_results_adreno
...
Add tuning results for 4 devices
2023-01-22 15:52:49 +01:00
Cedric Nugteren
c9856758b3
Add tuning results for Intel FPGA emulation device
2023-01-21 21:13:49 +01:00
Cedric Nugteren
f4a14daf8d
Add tuning results for Radeon Pro 450
2023-01-21 21:11:38 +01:00
Cedric Nugteren
3ca1f5176e
Add tuning results for Adreno 740
2023-01-21 21:09:09 +01:00
Cedric Nugteren
d11b0c8b01
Add tuning results for Adreno 730
2023-01-21 20:33:49 +01:00
Cedric Nugteren
e72f87ae5e
Merge pull request #451 from CodeLinaro/master
...
CLBlast modifications to address Qualcomm Adreno performance
2023-01-21 20:28:32 +01:00
Angus, Alexander
73f49e9b3d
Updated according to feedback from CNugteren
2023-01-17 08:35:29 -08:00
Angus, Alexander
ff6a5689df
Adreno 730 + 740 CLBlast tuning results
2023-01-12 12:33:48 -08:00
Angus, Alexander
4f394608a2
implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731
2023-01-03 10:56:04 -08:00
Cedric Nugteren
03cffa83c5
Merge pull request #447 from CNugteren/small_plotting_fixes
...
Fix two small issues in the plotting script
2022-10-14 08:18:00 +02:00
Cedric Nugteren
c7d677e4a9
Update changelog
2022-10-13 22:26:26 +02:00
Cedric Nugteren
374eba3ee2
Fix plotting issue with a single row or column
2022-10-13 22:24:35 +02:00
Cedric Nugteren
8aa9f32b23
Fix plotting issue in case of 'inf' values
2022-10-13 22:20:24 +02:00
Cedric Nugteren
d55840e16c
Merge pull request #442 from CNugteren/update_version_to_1_5_3
...
Update to version 1.5.3
2022-09-27 22:45:49 +02:00
Cedric Nugteren
e080635019
Fix opencl.hpp download in CMake
2022-09-27 21:11:17 +02:00
Cedric Nugteren
5c608d97cd
Properly set OpenCL target to version 2.1
2022-09-27 21:09:35 +02:00
Cedric Nugteren
f7db4c5d45
Replace the broken Khronos registry link for cl.hpp with a new GitHub link for opencl.hpp
2022-09-22 22:18:58 +02:00
Cedric Nugteren
521eee4bbf
Update PyCLBlast version number
2022-09-22 22:09:21 +02:00
Cedric Nugteren
0de212a56b
Update to version 1.5.3
2022-09-22 22:07:33 +02:00
Cedric Nugteren
38fa34b432
Fix typo in comment
...
Resolves https://github.com/CNugteren/CLBlast/issues/440
2022-06-24 09:32:47 +02:00
Cedric Nugteren
d837b64269
Merge pull request #438 from CNugteren/cupp11_api_inconsistency
...
Fix API inconsistency in cupp11.hpp
2022-05-25 09:14:04 +02:00
Cedric Nugteren
9ab1bf24e2
Fix API inconsistency in cupp11.hpp
...
The function `CopyToAsync` has an optional event argument in the OpenCL version, which is used in CLBlast. As a result, the code does not compile at all if CUDA (through `cupp11.hpp`) is used as the backend. This issue was found by a CLBlast user and reported privately by email. This PR should fix that.
2022-05-23 12:45:22 +02:00
Cedric Nugteren
6b358e1be9
Merge pull request #437 from umar456/blas_fix
...
Add logic to find Intel OpenMP on oneMKL.
2022-05-17 08:36:18 +02:00
Cedric Nugteren
1884158128
Merge pull request #432 from justingra/sum-fix
...
sum fix
2022-05-16 08:38:35 +02:00
Umar Arshad
35a4be231a
Add logic to find Intel OpenMP on oneMKL.
2022-05-15 15:37:23 -04:00
Justin Graham
fc238a96c9
dev version
2022-05-13 16:46:28 -05:00
Justin Graham
1256f7bfbf
changelog message
2022-05-13 08:45:54 -05:00
Cedric Nugteren
cb43f264cb
Merge pull request #436 from CNugteren/add_tuning_results
...
Add tuning results for 2 AMD GPUs and 1 Qualcomm GPU
2022-04-25 21:42:57 +02:00
Cedric Nugteren
f107162e64
Add tuning results for Adreno 540
2022-04-25 20:36:18 +02:00
Cedric Nugteren
c4163b4b1a
Add tuning results for Radeon RX 6500 XT
2022-04-25 20:33:47 +02:00
Cedric Nugteren
7ec8b2f29b
Add tuning results for Radeon RX 6800 XT
2022-04-25 20:31:55 +02:00
Cedric Nugteren
df0e492d39
Merge pull request #434 from CNugteren/update_test_status_machines
...
Remove old test machines and add new ones
2022-04-25 20:15:07 +02:00
Cedric Nugteren
a7cdf3f0fa
Remove old test machines and add new ones
2022-04-25 20:08:41 +02:00
Justin Graham
ba254d2f50
sum fix
2022-04-22 11:39:38 -05:00