Brian Moore
e0c06a9ac1
CMake: Fix for older versions of cmake ( #536 )
2024-04-02 06:24:30 +00:00
engineer1109
e320c204b1
[CLIENTS]fix cuda_runtime not found and assert len(MARKERS) >= len(y_keys[index]) ( #535 )
2024-03-13 19:25:00 +00:00
gspr
b35737b066
Supply formatting string to fprintf ( #530 )
...
This restores compatibility with -Werror=format-security.
2024-02-11 13:18:20 +01:00
Cedric Nugteren
faa2109707
Bump to version 1.6.2 ( #527 )
2024-02-09 21:38:37 +01:00
Cedric Nugteren
6e2ab6ee96
Add tuning results for 5 devices ( #526 )
2024-02-08 20:33:33 +01:00
Karol Herbst
32ad2a67ae
Add openblas PATH_SUFFIX to FindCBLAS ( #525 )
...
This is needed on e.g. Fedora to find openblas.
2024-02-07 19:11:10 +00:00
vathomass
162783a414
Python module multi-platform setup ( #519 )
...
* Change to pyproject.toml file
* Switch to cmake for building the extension
* Update readme
* Update CHANGELOG
* Add hint for CLBlast discovery
* Update README about detecting the library
* Switch to scikit-build-core for using CMake
2024-01-21 10:58:38 +01:00
Cedric Nugteren
9535155ad8
Add tuning results for 4 devices ( #518 )
2023-11-12 20:47:00 +01:00
vathomass
564629cafd
Fix floating point conversion in Python wrapper ( #515 )
...
* Generator.py: use LF ending when run from windows
* Convert scalar input to float16 in cython
* Index buffer as uints in cython wrapper
* Update CHANGELOG and bump ver. 1.4.0
2023-11-10 20:33:21 +01:00
Lawrence Angrave
bcd294a93a
Compress tar files in release.yml action ( #508 )
2023-10-02 07:38:13 +00:00
Biswapriyo Nath
42264dab42
CMake: Install pkgconfig file with mingw toolchain ( #506 )
2023-09-25 12:18:24 +02:00
Biswapriyo Nath
60e7574113
CMake: Fix DLL install directory in mingw ( #505 )
...
This installs DLL in 'bin' directory instead of 'lib' by default.
In mingw environment, all DLLs and EXEs are installed in 'bin'
and static & import libraries are installed in 'lib' directory.
This does not affect other environment because the 'RUNTIME'
and 'LIBRARY' targets are automatically set by cmake. See
https://cmake.org/cmake/help/latest/command/install.html
Signed-off-by: Biswapriyo Nath <nathbappai@gmail.com>
2023-09-25 08:57:12 +00:00
Cedric Nugteren
29e13d5a33
Add tuning results for 5 devices ( #503 )
2023-09-14 21:14:26 +02:00
Cedric Nugteren
afb3d8a604
Fix preprocessor and extend test coverage ( #498 )
...
* Improve coverage of pre-processor test
* Make the preprocessor handle the not-defined() construct
* Update the changelog
2023-08-07 20:32:30 +02:00
Cedric Nugteren
e3ce21bb93
Bump to v1.6.1 ( #496 )
2023-07-09 11:24:24 +02:00
Cedric Nugteren
6762e8480c
Fix a multithreading bug related to storing objects in the cache ( #495 )
2023-07-08 20:08:00 +02:00
Cedric Nugteren
83bd474eda
Add tuning results for 7 devices ( #494 )
2023-07-04 21:09:12 +02:00
Cedric Nugteren
af667c45fe
Add 6 tuning results ( #493 )
...
* Add tuning results for 6 devices
* Add GPU generation names for NVIDIA and AMD GPUs in the documentation
2023-06-24 12:10:08 +02:00
Yubraj Bhoi
28a61c53a6
Fix pointer error in `pyclblast` on ARM ( #490 )
...
* Fix pointer error in `pyclblast` on ARM
Use `ptrdiff_t` instead of `size_t` for pointers.
Fix error in `setup.py`
* Fix ARM pointer error in `pyclblast` generator
Update CHANGELOG file
2023-06-16 09:45:16 +00:00
Cedric Nugteren
2b98c6a28c
Add tuning results for more devices ( #488 )
...
Add tuning results for 13 devices
2023-06-06 21:31:35 +02:00
Cedric Nugteren
ec733402a8
Add tuning results for Radeon RX 6700 XT ( #484 )
2023-06-01 21:51:33 +02:00
Cedric Nugteren
05a26111f7
Add tuning results for 14 devices ( #483 )
2023-05-31 21:06:52 +02:00
Cedric Nugteren
8d1cbde036
Fix folder name of Windows release ( #482 )
2023-05-26 21:21:23 +02:00
Cedric Nugteren
1b66a1149e
Use NMake for Windows builds ( #481 )
2023-05-26 20:11:44 +02:00
Mikko Vedru
26ceab814f
Update README.md with a useful link ( #476 )
...
* Update README.md with a useful link
* Update README.md
Co-authored-by: Cedric Nugteren <web@cedricnugteren.nl>
2023-05-23 09:09:21 +02:00
Cedric Nugteren
107beaac17
Fix issues in Windows release script ( #477 )
2023-05-22 21:05:36 +02:00
Cedric Nugteren
b0b302889c
Update to version 1.6.0 ( #475 )
2023-05-21 20:51:05 +02:00
Cedric Nugteren
036684204e
Github Actions Windows builds with tests ( #472 )
...
* Set CMake CMP0074 policy
* Attempt to use pre-compiled OpenBLAS on Windows CI
* Fix an issue and add some debugging
* Improve FindCBLAS for OpenBLAS on Windows
2023-05-21 17:07:31 +02:00
Cedric Nugteren
63eb127bad
Intel HD Graphics 770 and AMD RX 6600 XT tuning results ( #474 )
...
* Add tuning results for AMD Radeon RX 6600 XT
* Add tuning results for Intel HD Graphics 770
* Update list of tuned devices
2023-05-21 14:25:15 +02:00
Mikko Vedru
0832ed6a16
Documentation: tuning.md: Add a useful link ( #473 )
2023-05-21 08:07:43 +00:00
Cedric Nugteren
ce5e446fbe
Actualize the README and remove the old ROADMAP ( #471 )
2023-05-18 18:10:12 +02:00
Cedric Nugteren
db3bd0a32e
Add Windows builds to Github Actions and fix Windows compilation issue ( #470 )
...
* Add Windows builds to Github Actions CI
* Fix failing Windows builds
2023-05-18 16:58:31 +02:00
Cedric Nugteren
e73f0b5131
Add Github Actions release script ( #469 )
...
* Add first version of release script
* Several fixes for the Windows release job
* Install OpenCL for Windows release
* Fix issue with environment variable
* Set OpenCL root
* Fix zipping in Windows build
2023-05-17 17:10:51 +02:00
Cedric Nugteren
7a3ef92ff2
Add 3 sets of tuning results: RX 5700 XT, 2080 Ti, and 3090 ( #468 )
...
* Add tuning results for AMD Radeon RX 5700 XT
* Add tuning results for NVIDIA GeForce RTX 2080 Ti
* Add tuning results for NVIDIA GeForce RTX 3090
2023-05-17 10:31:47 +02:00
Cedric Nugteren
221121b840
Add Github Actions CI ( #464 )
...
This replaces the old Travis CI builds with Github Actions that test on both Ubuntu and macOS, with both Clang and GCC. The macOS builds also run the tests and some other programs; on Ubuntu, OpenCL is not working at the moment. Because these tests use new/different compilers, I fixed a few warnings and errors along the way.
2023-05-14 11:25:15 +02:00
Cedric Nugteren
8d2f3540e9
Merge pull request #465 from CNugteren/fix_override_parameters_test
...
Fix compilation issue in override parameters test
2023-05-11 08:49:26 +02:00
Cedric Nugteren
6e6efb72be
Fix compilation issue in override parameters test
2023-05-10 21:31:33 +02:00
Cedric Nugteren
3baf823575
Fixes an issue under Android when the driver was already unloaded ( #462 )
2023-05-10 17:10:17 +02:00
Cedric Nugteren
d94d086d6f
TBMV/TPMV/TRSV: Use the minimum x buffer size for copying to a temp buffer ( #461 )
2023-05-10 12:48:25 +02:00
Cedric Nugteren
4f24d92730
TRMV: Use the minimum x buffer size for copying to a temp buffer ( #458 )
2023-05-07 20:03:16 +02:00
Cedric Nugteren
3d0c227fa5
AMAX/AMIN integer testing and bug fixes ( #457 )
...
* Fixed a bug in XAMAX/XAMIN routines that caused the increment and offset to be included in the result
* Perform proper integer-output testing in XAMAX tests
* A few changes towards getting it ready for a PR
* Also fix compilation for clBLAS and cuBLAS references
* Fix a bug that would only use the real part of complex numbers in the amax/amin routines
* A few small fixes related to the AMAX tests
2023-05-07 20:02:52 +02:00
Cedric Nugteren
1573f7d304
Merge pull request #455 from CNugteren/fix_gemm_documentation_bug
...
Fix documentation bug w.r.t. ld values and matrix layout
2023-03-25 21:25:41 +01:00
Cedric Nugteren
9eca896b05
Fix documentation bug w.r.t. ld values and matrix layout
2023-03-25 20:24:40 +01:00
Cedric Nugteren
ab5092dd26
Merge pull request #452 from CNugteren/add_tuning_results_adreno
...
Add tuning results for 4 devices
2023-01-22 15:52:49 +01:00
Cedric Nugteren
c9856758b3
Add tuning results for Intel FPGA emulation device
2023-01-21 21:13:49 +01:00
Cedric Nugteren
f4a14daf8d
Add tuning results for Radeon Pro 450
2023-01-21 21:11:38 +01:00
Cedric Nugteren
3ca1f5176e
Add tuning results for Adreno 740
2023-01-21 21:09:09 +01:00
Cedric Nugteren
d11b0c8b01
Add tuning results for Adreno 730
2023-01-21 20:33:49 +01:00
Cedric Nugteren
e72f87ae5e
Merge pull request #451 from CodeLinaro/master
...
CLBlast modifications to address Qualcomm Adreno performance
2023-01-21 20:28:32 +01:00
Angus, Alexander
73f49e9b3d
Updated according to feedback from CNugteren
2023-01-17 08:35:29 -08:00