Cedric Nugteren
29e13d5a33
Add tuning results for 5 devices ( #503 )
2023-09-14 21:14:26 +02:00
Cedric Nugteren
afb3d8a604
Fix preprocessor and extend test coverage ( #498 )
...
* Improve coverage of pre-processor test
* Make the preprocessor handle the not-defined() construct
* Update the changelog
2023-08-07 20:32:30 +02:00
Cedric Nugteren
e3ce21bb93
Bump to v1.6.1 ( #496 )
2023-07-09 11:24:24 +02:00
Cedric Nugteren
6762e8480c
Fix a multithreading bug related to storing objects in the cache ( #495 )
2023-07-08 20:08:00 +02:00
Cedric Nugteren
83bd474eda
Add tuning results for 7 devices ( #494 )
2023-07-04 21:09:12 +02:00
Cedric Nugteren
af667c45fe
Add 6 tuning results ( #493 )
...
* Add tuning results for 6 devices
* Add GPU generation names for NVIDIA and AMD GPUs in the documentation
2023-06-24 12:10:08 +02:00
Yubraj Bhoi
28a61c53a6
Fix pointer error in `pyclblast` on ARM ( #490 )
...
* Fix pointer error in `pyclblast` on ARM
Use `ptrdiff_t` instead of `size_t` for pointers.
Fix error in `setup.py`
* Fix ARM pointer error in `pyclblast` generator
Update CHANGELOG file
2023-06-16 09:45:16 +00:00
Cedric Nugteren
2b98c6a28c
Add tuning results for more devices ( #488 )
...
Add tuning results for 13 devices
2023-06-06 21:31:35 +02:00
Cedric Nugteren
ec733402a8
Add tuning results for Radeon RX 6700 XT ( #484 )
2023-06-01 21:51:33 +02:00
Cedric Nugteren
05a26111f7
Add tuning results for 14 devices ( #483 )
2023-05-31 21:06:52 +02:00
Cedric Nugteren
8d1cbde036
Fix folder name of Windows release ( #482 )
2023-05-26 21:21:23 +02:00
Cedric Nugteren
1b66a1149e
Use NMake for Windows builds ( #481 )
2023-05-26 20:11:44 +02:00
Mikko Vedru
26ceab814f
Update README.md with a useful link ( #476 )
...
* Update README.md with a useful link
* Update README.md
Co-authored-by: Cedric Nugteren <web@cedricnugteren.nl>
2023-05-23 09:09:21 +02:00
Cedric Nugteren
107beaac17
Fix issues in Windows release script ( #477 )
2023-05-22 21:05:36 +02:00
Cedric Nugteren
b0b302889c
Update to version 1.6.0 ( #475 )
2023-05-21 20:51:05 +02:00
Cedric Nugteren
036684204e
Github Actions Windows builds with tests ( #472 )
...
* Set CMake CMP0074 policy
* Attempt to use pre-compiled OpenBLAS on Windows CI
* Fix an issue and add some debugging
* Improve FindCBLAS for OpenBLAS on Windows
2023-05-21 17:07:31 +02:00
Cedric Nugteren
63eb127bad
Intel HD Graphics 770 and AMD RX 6600 XT tuning results ( #474 )
...
* Add tuning results for AMD Radeon RX 6600 XT
* Add tuning results for Intel HD Graphics 770
* Update list of tuned devices
2023-05-21 14:25:15 +02:00
Mikko Vedru
0832ed6a16
Documentation: tuning.md: Add a useful link ( #473 )
2023-05-21 08:07:43 +00:00
Cedric Nugteren
ce5e446fbe
Update the README and remove the old ROADMAP ( #471 )
2023-05-18 18:10:12 +02:00
Cedric Nugteren
db3bd0a32e
Add Windows builds to Github Actions and fix Windows compilation issue ( #470 )
...
* Add Windows builds to Github Actions CI
* Fix failing Windows builds
2023-05-18 16:58:31 +02:00
Cedric Nugteren
e73f0b5131
Add Github Actions release script ( #469 )
...
* Add first version of release script
* Several fixes for the Windows release job
* Install OpenCL for Windows release
* Fix issue with environment variable
* Set OpenCL root
* Fix zipping in Windows build
2023-05-17 17:10:51 +02:00
Cedric Nugteren
7a3ef92ff2
Add 3 sets of tuning results: RX 5700 XT, 2080 Ti, and 3090 ( #468 )
...
* Add tuning results for AMD Radeon RX 5700 XT
* Add tuning results for NVIDIA GeForce RTX 2080 Ti
* Add tuning results for NVIDIA GeForce RTX 3090
2023-05-17 10:31:47 +02:00
Cedric Nugteren
221121b840
Add Github Actions CI ( #464 )
...
This replaces the old Travis CI builds with Github Actions that test on both Ubuntu and macOS, with both Clang and GCC. The macOS builds also run the tests and some other programs; on Ubuntu, OpenCL is not working at the moment. Because these tests use new or different compilers, I fixed a few warnings and errors along the way.
2023-05-14 11:25:15 +02:00
Cedric Nugteren
8d2f3540e9
Merge pull request #465 from CNugteren/fix_override_parameters_test
...
Fix compilation issue in override parameters test
2023-05-11 08:49:26 +02:00
Cedric Nugteren
6e6efb72be
Fix compilation issue in override parameters test
2023-05-10 21:31:33 +02:00
Cedric Nugteren
3baf823575
Fixes an issue under Android when the driver was already unloaded ( #462 )
2023-05-10 17:10:17 +02:00
Cedric Nugteren
d94d086d6f
TBMV/TPMV/TRSV: Use the minimum x buffer size for copying to a temp buffer ( #461 )
2023-05-10 12:48:25 +02:00
Cedric Nugteren
4f24d92730
TRMV: Use the minimum x buffer size for copying to a temp buffer ( #458 )
2023-05-07 20:03:16 +02:00
Cedric Nugteren
3d0c227fa5
AMAX/AMIN integer testing and bug fixes ( #457 )
...
* Fixed a bug in XAMAX/XAMIN routines that caused the increment and offset to be included in the result
* Perform proper integer-output testing in XAMAX tests
* A few changes towards getting it ready for a PR
* Also fix compilation for clBLAS and cuBLAS references
* Fix a bug that would only use the real part of complex numbers in the amax/amin routines
* A few small fixes related to the AMAX tests
2023-05-07 20:02:52 +02:00
Cedric Nugteren
1573f7d304
Merge pull request #455 from CNugteren/fix_gemm_documentation_bug
...
Fix documentation bug w.r.t. ld values and matrix layout
2023-03-25 21:25:41 +01:00
Cedric Nugteren
9eca896b05
Fix documentation bug w.r.t. ld values and matrix layout
2023-03-25 20:24:40 +01:00
Cedric Nugteren
ab5092dd26
Merge pull request #452 from CNugteren/add_tuning_results_adreno
...
Add tuning results for 4 devices
2023-01-22 15:52:49 +01:00
Cedric Nugteren
c9856758b3
Add tuning results for Intel FPGA emulation device
2023-01-21 21:13:49 +01:00
Cedric Nugteren
f4a14daf8d
Add tuning results for Radeon Pro 450
2023-01-21 21:11:38 +01:00
Cedric Nugteren
3ca1f5176e
Add tuning results for Adreno 740
2023-01-21 21:09:09 +01:00
Cedric Nugteren
d11b0c8b01
Add tuning results for Adreno 730
2023-01-21 20:33:49 +01:00
Cedric Nugteren
e72f87ae5e
Merge pull request #451 from CodeLinaro/master
...
CLBlast modifications to address Qualcomm Adreno performance
2023-01-21 20:28:32 +01:00
Angus, Alexander
73f49e9b3d
Updated according to feedback from CNugteren
2023-01-17 08:35:29 -08:00
Angus, Alexander
ff6a5689df
Adreno 730 + 740 CLBlast tuning results
2023-01-12 12:33:48 -08:00
Angus, Alexander
4f394608a2
implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731
2023-01-03 10:56:04 -08:00
Cedric Nugteren
03cffa83c5
Merge pull request #447 from CNugteren/small_plotting_fixes
...
Fix two small issues in the plotting script
2022-10-14 08:18:00 +02:00
Cedric Nugteren
c7d677e4a9
Update changelog
2022-10-13 22:26:26 +02:00
Cedric Nugteren
374eba3ee2
Fix plotting issue with a single row or column
2022-10-13 22:24:35 +02:00
Cedric Nugteren
8aa9f32b23
Fix plotting issue in case of 'inf' values
2022-10-13 22:20:24 +02:00
Cedric Nugteren
d55840e16c
Merge pull request #442 from CNugteren/update_version_to_1_5_3
...
Update to version 1.5.3
2022-09-27 22:45:49 +02:00
Cedric Nugteren
e080635019
Fix opencl.hpp download in CMake
2022-09-27 21:11:17 +02:00
Cedric Nugteren
5c608d97cd
Properly set OpenCL target to version 2.1
2022-09-27 21:09:35 +02:00
Cedric Nugteren
f7db4c5d45
Replace the broken khronos registry link for cl.hpp with a new github link for opencl.hpp
2022-09-22 22:18:58 +02:00
Cedric Nugteren
521eee4bbf
Update PyCLBlast version number
2022-09-22 22:09:21 +02:00
Cedric Nugteren
0de212a56b
Update to version 1.5.3
2022-09-22 22:07:33 +02:00