Commit Graph

780 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)

Author SHA1 Message Date
Cedric Nugteren 6e2ab6ee96
Add tuning results for 5 devices (#526) 2024-02-08 20:33:33 +01:00
vathomass 162783a414
Python module multi-platform setup (#519)
* Change to pyproject.toml file

* Switch to cmake for building the extension

* Update readme

* Update CHANGELOG

* Add hint for CLBlast discovery

* Update README about detecting the library

* Switch to scikit-build-core for using CMake
2024-01-21 10:58:38 +01:00
Cedric Nugteren 9535155ad8
Add tuning results for 4 devices (#518) 2023-11-12 20:47:00 +01:00
vathomass 564629cafd
Fix floating point conversion in Python wrapper (#515)
* Generator.py: use LF ending when run from windows

* Convert scalar input to float16 in cython

* Index buffer as uints in cython wrapper

* Update CHANGELOG and bump ver. 1.4.0
2023-11-10 20:33:21 +01:00
Cedric Nugteren 29e13d5a33
Add tuning results for 5 devices (#503) 2023-09-14 21:14:26 +02:00
Cedric Nugteren afb3d8a604
Fix preprocessor and extend test coverage (#498)
* Improve coverage of pre-processor test

* Make the preprocessor handle the not-defined() construct

* Update the changelog
2023-08-07 20:32:30 +02:00
Cedric Nugteren 6762e8480c
Fix a multithreading bug related to storing objects in the cache (#495) 2023-07-08 20:08:00 +02:00
Cedric Nugteren 83bd474eda
Add tuning results for 7 devices (#494) 2023-07-04 21:09:12 +02:00
Cedric Nugteren af667c45fe
Add 6 tuning results (#493)
* Add tuning results for 6 devices

* Add GPU generation names for NVIDIA and AMD GPUs in the documentation
2023-06-24 12:10:08 +02:00
Yubraj Bhoi 28a61c53a6
Fix pointer error in `pyclblast` on ARM (#490)
* Fix pointer error in `pyclblast` on ARM

Use `ptrdiff_t` instead of `size_t` for pointers.
Fix error in `setup.py`

* Fix ARM pointer error in `pyclblast` generator

Update CHANGELOG file
2023-06-16 09:45:16 +00:00
Cedric Nugteren 2b98c6a28c
Add tuning results for more devices (#488)
Add tuning results for 13 devices
2023-06-06 21:31:35 +02:00
Cedric Nugteren ec733402a8
Add tuning results for Radeon RX 6700 XT (#484) 2023-06-01 21:51:33 +02:00
Cedric Nugteren 05a26111f7
Add tuning results for 14 devices (#483) 2023-05-31 21:06:52 +02:00
Cedric Nugteren 63eb127bad
Intel HD Graphics 770 and AMD RX 6600 XT tuning results (#474)
* Add tuning results for AMD Radeon RX 6600 XT

* Add tuning results for Intel HD Graphics 770

* Update list of tuned devices
2023-05-21 14:25:15 +02:00
Cedric Nugteren 7a3ef92ff2
Add 3 sets of tuning results: RX 5700 XT, 2080 Ti, and 3090 (#468)
* Add tuning results for AMD Radeon RX 5700 XT

* Add tuning results for NVIDIA GeForce RTX 2080 Ti

* Add tuning results for NVIDIA GeForce RTX 3090
2023-05-17 10:31:47 +02:00
Cedric Nugteren 3baf823575
Fixes an issue under Android when the driver was already unloaded (#462) 2023-05-10 17:10:17 +02:00
Cedric Nugteren d94d086d6f
TBMV/TPMV/TRSV: Use the minimum x buffer size for copying to a temp buffer (#461) 2023-05-10 12:48:25 +02:00
Cedric Nugteren 4f24d92730
TRMV: Use the minimum x buffer size for copying to a temp buffer (#458) 2023-05-07 20:03:16 +02:00
Cedric Nugteren 3d0c227fa5
AMAX/AMIN integer testing and bug fixes (#457)
* Fixed a bug in XAMAX/XMIN routines that caused the increment and offset to be included in the result

* Perform proper integer-output testing in XAMAX tests

* A few changes towards getting it ready for a PR

* Also fix compilation for clBLAS and cuBLAS references

* Fix a bug that would only use the real part of complex numbers in the amax/amin routines

* A few small fixes related to the AMAX tests
2023-05-07 20:02:52 +02:00
Cedric Nugteren c9856758b3 Add tuning results for Intel FPGA emulation device 2023-01-21 21:13:49 +01:00
Cedric Nugteren f4a14daf8d Add tuning results for Radeon Pro 450 2023-01-21 21:11:38 +01:00
Cedric Nugteren 3ca1f5176e Add tuning results for Adreno 740 2023-01-21 21:09:09 +01:00
Cedric Nugteren d11b0c8b01 Add tuning results for Adreno 730 2023-01-21 20:33:49 +01:00
Angus, Alexander 73f49e9b3d Updated according to feedback from CNugteren 2023-01-17 08:35:29 -08:00
Angus, Alexander 4f394608a2 implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731 2023-01-03 10:56:04 -08:00
Cedric Nugteren 521eee4bbf Update PyCLBlast version number 2022-09-22 22:09:21 +02:00
Cedric Nugteren 38fa34b432
Fix typo in comment
Resolves https://github.com/CNugteren/CLBlast/issues/440
2022-06-24 09:32:47 +02:00
Cedric Nugteren 9ab1bf24e2
Fix API inconsistency in cupp11.hpp
The function `CopyToAsync` has an optional event argument in the OpenCL version, which is used in CLBlast. This makes the code not compile at all if CUDA (through `cupp11.hpp`) is used as backend. This issue was found by a CLBlast user and reported privately by email. This PR should fix that.
2022-05-23 12:45:22 +02:00
Cedric Nugteren 1884158128
Merge pull request #432 from justingra/sum-fix
sum fix
2022-05-16 08:38:35 +02:00
Cedric Nugteren f107162e64 Add tuning results for Adreno 540 2022-04-25 20:36:18 +02:00
Cedric Nugteren c4163b4b1a Add tuning results for Radeon RX 6500 XT 2022-04-25 20:33:47 +02:00
Cedric Nugteren 7ec8b2f29b Add tuning results for Radeon RX 6800 XT 2022-04-25 20:31:55 +02:00
Justin Graham ba254d2f50 sum fix 2022-04-22 11:39:38 -05:00
danyougle f3f3c88710
android.hpp: custom header guard of _clang_
In order not to have ambiguous definitions, exclude the functions for other compilers
2022-04-13 22:33:12 +02:00
Cedric Nugteren 772dd307ab Add Quadro T2000 tuning parameters for the Tesla T4 2021-08-27 20:39:59 +02:00
Cedric Nugteren 1f639b7264 Remove Tesla T4 tuning results 2021-08-27 20:32:59 +02:00
Cedric Nugteren 5a9bd270f8 Add tuning results for NVIDIA Tesla V100 2021-08-19 20:34:09 +02:00
Cedric Nugteren adb4b02982 Add tuning results for NVIDIA Tesla T4 2021-08-19 20:31:52 +02:00
Cedric Nugteren dea3b5fadb Add tuning results for NVIDIA Quadro T2000 2021-08-19 20:29:47 +02:00
Cedric Nugteren 521ad117bc Add tuning results for NVIDIA Quadro GV100 2021-08-19 20:27:39 +02:00
Cedric Nugteren e9dec268bc Add tuning results for Intel Core i9-9980HK 2021-08-19 20:25:26 +02:00
Cedric Nugteren e59ea46180 Add tuning results for NVIDIA A100 2021-08-19 20:23:25 +02:00
Cedric Nugteren 468a4a74eb Fix issue with printing out-of-bounds local/global sizes for level 1 tuners 2021-05-22 20:31:12 +02:00
JishinMaster aec45ea637 set the correct flop count for xgemm 2021-03-13 21:48:04 +01:00
Cedric Nugteren 1fa0930d85 Fix Windows paths in pyclblast 2021-02-05 21:52:23 +01:00
Cedric Nugteren d57f8065ea Added second Windows library path 2021-02-04 20:13:02 +01:00
Cedric Nugteren c78c649844 Add library path for Windows as well 2021-01-30 14:28:11 +01:00
Cedric Nugteren bbcb357a71 Add library dir on Linux for pyclblast 2021-01-29 20:48:05 +01:00
Cedric Nugteren 07837a5c2d Update pyclblast package version number 2021-01-21 20:49:31 +01:00
Jerry James dc82a1fbc8 Use reference types to prevent unnecessary copying 2021-01-20 10:21:36 -07:00