
1477 Commits (master)

Author SHA1 Message Date
Brian Moore e0c06a9ac1
CMake: Fix for older versions of cmake (#536) 2024-04-02 06:24:30 +00:00
engineer1109 e320c204b1
[CLIENTS] Fix cuda_runtime not found and assert len(MARKERS) >= len(y_keys[index]) (#535)
2024-03-13 19:25:00 +00:00
gspr b35737b066
Supply formatting string to fprintf (#530)
This restores compatibility with -Werror=format-security.
2024-02-11 13:18:20 +01:00
Cedric Nugteren faa2109707
Bump to version 1.6.2 (#527) 2024-02-09 21:38:37 +01:00
Cedric Nugteren 6e2ab6ee96
Add tuning results for 5 devices (#526) 2024-02-08 20:33:33 +01:00
Karol Herbst 32ad2a67ae
Add openblas PATH_SUFFIX to FindCBLAS (#525)
This is needed on, e.g., Fedora to find OpenBLAS.
2024-02-07 19:11:10 +00:00
vathomass 162783a414
Python module multi-platform setup (#519)
* Change to pyproject.toml file

* Switch to cmake for building the extension

* Update readme

* Update CHANGELOG

* Add hint for CLBlast discovery

* Update README about detecting the library

* Switch to scikit-build-core for using CMake
2024-01-21 10:58:38 +01:00
Cedric Nugteren 9535155ad8
Add tuning results for 4 devices (#518) 2023-11-12 20:47:00 +01:00
vathomass 564629cafd
Fix floating point conversion in Python wrapper (#515)
* Generator.py: use LF line endings when run from Windows

* Convert scalar input to float16 in cython

* Index buffer as uints in cython wrapper

* Update CHANGELOG and bump ver. 1.4.0
2023-11-10 20:33:21 +01:00
Lawrence Angrave bcd294a93a
Compress tar files in release.yml action (#508) 2023-10-02 07:38:13 +00:00
Biswapriyo Nath 42264dab42
CMake: Install pkgconfig file with mingw toolchain (#506) 2023-09-25 12:18:24 +02:00
Biswapriyo Nath 60e7574113
CMake: Fix DLL install directory in mingw (#505)
This installs DLLs in the 'bin' directory instead of 'lib' by default.
In the mingw environment, all DLLs and EXEs are installed in 'bin',
and static and import libraries are installed in the 'lib' directory.
This does not affect other environments because the 'RUNTIME'
and 'LIBRARY' targets are automatically set by CMake. See
https://cmake.org/cmake/help/latest/command/install.html

Signed-off-by: Biswapriyo Nath <nathbappai@gmail.com>
2023-09-25 08:57:12 +00:00
Cedric Nugteren 29e13d5a33
Add tuning results for 5 devices (#503) 2023-09-14 21:14:26 +02:00
Cedric Nugteren afb3d8a604
Fix preprocessor and extend test coverage (#498)
* Improve coverage of pre-processor test

* Make the preprocessor handle the not-defined() construct

* Update the changelog
2023-08-07 20:32:30 +02:00
Cedric Nugteren e3ce21bb93
Bump to v1.6.1 (#496) 2023-07-09 11:24:24 +02:00
Cedric Nugteren 6762e8480c
Fix a multithreading bug related to storing objects in the cache (#495) 2023-07-08 20:08:00 +02:00
Cedric Nugteren 83bd474eda
Add tuning results for 7 devices (#494) 2023-07-04 21:09:12 +02:00
Cedric Nugteren af667c45fe
Add 6 tuning results (#493)
* Add tuning results for 6 devices

* Add GPU generation names for NVIDIA and AMD GPUs in the documentation
2023-06-24 12:10:08 +02:00
Yubraj Bhoi 28a61c53a6
Fix pointer error in `pyclblast` on ARM (#490)
* Fix pointer error in `pyclblast` on ARM

Use `ptrdiff_t` instead of `size_t` for pointers.
Fix error in `setup.py`

* Fix ARM pointer error in `pyclblast` generator

Update CHANGELOG file
2023-06-16 09:45:16 +00:00
Cedric Nugteren 2b98c6a28c
Add tuning results for more devices (#488)
Add tuning results for 13 devices
2023-06-06 21:31:35 +02:00
Cedric Nugteren ec733402a8
Add tuning results for Radeon RX 6700 XT (#484) 2023-06-01 21:51:33 +02:00
Cedric Nugteren 05a26111f7
Add tuning results for 14 devices (#483) 2023-05-31 21:06:52 +02:00
Cedric Nugteren 8d1cbde036
Fix folder name of Windows release (#482) 2023-05-26 21:21:23 +02:00
Cedric Nugteren 1b66a1149e
Use NMake for Windows builds (#481) 2023-05-26 20:11:44 +02:00
Mikko Vedru 26ceab814f
Update README.md with a useful link (#476)
* Update README.md with a useful link

* Update README.md

Co-authored-by: Cedric Nugteren <web@cedricnugteren.nl>
2023-05-23 09:09:21 +02:00
Cedric Nugteren 107beaac17
Fix issues in Windows release script (#477) 2023-05-22 21:05:36 +02:00
Cedric Nugteren b0b302889c
Update to version 1.6.0 (#475) 2023-05-21 20:51:05 +02:00
Cedric Nugteren 036684204e
Github Actions Windows builds with tests (#472)
* Set CMake CMP0074 policy

* Attempt to use pre-compiled OpenBLAS on Windows CI

* Fix an issue and add some debugging

* Improve FindCBLAS for OpenBLAS on Windows
2023-05-21 17:07:31 +02:00
Cedric Nugteren 63eb127bad
Intel HD Graphics 770 and AMD RX 6600 XT tuning results (#474)
* Add tuning results for AMD Radeon RX 6600 XT

* Add tuning results for Intel HD Graphics 770

* Update list of tuned devices
2023-05-21 14:25:15 +02:00
Mikko Vedru 0832ed6a16
Documentation: tuning.md: Add a useful link (#473) 2023-05-21 08:07:43 +00:00
Cedric Nugteren ce5e446fbe
Actualize the README and remove the old ROADMAP (#471) 2023-05-18 18:10:12 +02:00
Cedric Nugteren db3bd0a32e
Add Windows builds to Github Actions and fix Windows compilation issue (#470)
* Add Windows builds to Github Actions CI

* Fix failing Windows builds
2023-05-18 16:58:31 +02:00
Cedric Nugteren e73f0b5131
Add Github Actions release script (#469)
* Add first version of release script

* Several fixes for the Windows release job

* Install OpenCL for Windows release

* Fix issue with environment variable

* Set OpenCL root

* Fix zipping in Windows build
2023-05-17 17:10:51 +02:00
Cedric Nugteren 7a3ef92ff2
Add 3 sets of tuning results: RX 5700 XT, 2080 Ti, and 3090 (#468)
* Add tuning results for AMD Radeon RX 5700 XT

* Add tuning results for NVIDIA GeForce RTX 2080 Ti

* Add tuning results for NVIDIA GeForce RTX 3090
2023-05-17 10:31:47 +02:00
Cedric Nugteren 221121b840
Add Github Actions CI (#464)
This replaces the old Travis CI builds with Github Actions that test on both Ubuntu and macOS, with both Clang and GCC. The builds on macOS also run the tests and some other programs; on Ubuntu, OpenCL is not working at the moment. Because these tests use new/different compilers, I fixed a few warnings and errors along the way.
2023-05-14 11:25:15 +02:00
Cedric Nugteren 8d2f3540e9
Merge pull request #465 from CNugteren/fix_override_parameters_test
Fix compilation issue in override parameters test
2023-05-11 08:49:26 +02:00
Cedric Nugteren 6e6efb72be Fix compilation issue in override parameters test 2023-05-10 21:31:33 +02:00
Cedric Nugteren 3baf823575
Fixes an issue under Android when the driver was already unloaded (#462) 2023-05-10 17:10:17 +02:00
Cedric Nugteren d94d086d6f
TBMV/TPMV/TRSV: Use the minimum x buffer size for copying to a temp buffer (#461) 2023-05-10 12:48:25 +02:00
Cedric Nugteren 4f24d92730
TRMV: Use the minimum x buffer size for copying to a temp buffer (#458) 2023-05-07 20:03:16 +02:00
Cedric Nugteren 3d0c227fa5
AMAX/AMIN integer testing and bug fixes (#457)
* Fixed a bug in XAMAX/XAMIN routines that caused the increment and offset to be included in the result

* Perform proper integer-output testing in XAMAX tests

* A few changes towards getting it ready for a PR

* Also fix compilation for clBLAS and cuBLAS references

* Fix a bug that would only use the real part of complex numbers in the amax/amin routines

* A few small fixes related to the AMAX tests
2023-05-07 20:02:52 +02:00
Cedric Nugteren 1573f7d304
Merge pull request #455 from CNugteren/fix_gemm_documentation_bug
Fix documentation bug w.r.t. ld values and matrix layout
2023-03-25 21:25:41 +01:00
Cedric Nugteren 9eca896b05 Fix documentation bug w.r.t. ld values and matrix layout 2023-03-25 20:24:40 +01:00
Cedric Nugteren ab5092dd26
Merge pull request #452 from CNugteren/add_tuning_results_adreno
Add tuning results for 4 devices
2023-01-22 15:52:49 +01:00
Cedric Nugteren c9856758b3 Add tuning results for Intel FPGA emulation device 2023-01-21 21:13:49 +01:00
Cedric Nugteren f4a14daf8d Add tuning results for Radeon Pro 450 2023-01-21 21:11:38 +01:00
Cedric Nugteren 3ca1f5176e Add tuning results for Adreno 740 2023-01-21 21:09:09 +01:00
Cedric Nugteren d11b0c8b01 Add tuning results for Adreno 730 2023-01-21 20:33:49 +01:00
Cedric Nugteren e72f87ae5e
Merge pull request #451 from CodeLinaro/master
CLBlast modifications to address Qualcomm Adreno performance
2023-01-21 20:28:32 +01:00
Angus, Alexander 73f49e9b3d Updated according to feedback from CNugteren 2023-01-17 08:35:29 -08:00