Cedric Nugteren
6e2ab6ee96
Add tuning results for 5 devices ( #526 )
2024-02-08 20:33:33 +01:00
vathomass
162783a414
Python module multi-platform setup ( #519 )
...
* Change to pyproject.toml file
* Switch to cmake for building the extension
* Update readme
* Update CHANGELOG
* Add hint for CLBlast discovery
* Update README about detecting the library
* Switch to scikit-build-core for using CMake
2024-01-21 10:58:38 +01:00
Cedric Nugteren
9535155ad8
Add tuning results for 4 devices ( #518 )
2023-11-12 20:47:00 +01:00
vathomass
564629cafd
Fix floating point conversion in Python wrapper ( #515 )
...
* Generator.py: use LF ending when run from Windows
* Convert scalar input to float16 in cython
* Index buffer as uints in cython wrapper
* Update CHANGELOG and bump ver. 1.4.0
2023-11-10 20:33:21 +01:00
Cedric Nugteren
29e13d5a33
Add tuning results for 5 devices ( #503 )
2023-09-14 21:14:26 +02:00
Cedric Nugteren
afb3d8a604
Fix preprocessor and extend test coverage ( #498 )
...
* Improve coverage of pre-processor test
* Make the preprocessor handle the not-defined() construct
* Update the changelog
2023-08-07 20:32:30 +02:00
Cedric Nugteren
6762e8480c
Fix a multithreading bug related to storing objects in the cache ( #495 )
2023-07-08 20:08:00 +02:00
Cedric Nugteren
83bd474eda
Add tuning results for 7 devices ( #494 )
2023-07-04 21:09:12 +02:00
Cedric Nugteren
af667c45fe
Add 6 tuning results ( #493 )
...
* Add tuning results for 6 devices
* Add GPU generation names for NVIDIA and AMD GPUs in the documentation
2023-06-24 12:10:08 +02:00
Yubraj Bhoi
28a61c53a6
Fix pointer error in `pyclblast` on ARM ( #490 )
...
* Fix pointer error in `pyclblast` on ARM
Use `ptrdiff_t` instead of `size_t` for pointers.
Fix error in `setup.py`
* Fix ARM pointer error in `pyclblast` generator
Update CHANGELOG file
2023-06-16 09:45:16 +00:00
Cedric Nugteren
2b98c6a28c
Add tuning results for more devices ( #488 )
...
Add tuning results for 13 devices
2023-06-06 21:31:35 +02:00
Cedric Nugteren
ec733402a8
Add tuning results for Radeon RX 6700 XT ( #484 )
2023-06-01 21:51:33 +02:00
Cedric Nugteren
05a26111f7
Add tuning results for 14 devices ( #483 )
2023-05-31 21:06:52 +02:00
Cedric Nugteren
63eb127bad
Intel HD Graphics 770 and AMD RX 6600 XT tuning results ( #474 )
...
* Add tuning results for AMD Radeon RX 6600 XT
* Add tuning results for Intel HD Graphics 770
* Update list of tuned devices
2023-05-21 14:25:15 +02:00
Cedric Nugteren
7a3ef92ff2
Add 3 sets of tuning results: RX 5700 XT, 2080 Ti, and 3090 ( #468 )
...
* Add tuning results for AMD Radeon RX 5700 XT
* Add tuning results for NVIDIA GeForce RTX 2080 Ti
* Add tuning results for NVIDIA GeForce RTX 3090
2023-05-17 10:31:47 +02:00
Cedric Nugteren
3baf823575
Fixes an issue under Android when the driver was already unloaded ( #462 )
2023-05-10 17:10:17 +02:00
Cedric Nugteren
d94d086d6f
TBMV/TPMV/TRSV: Use the minimum x buffer size for copying to a temp buffer ( #461 )
2023-05-10 12:48:25 +02:00
Cedric Nugteren
4f24d92730
TRMV: Use the minimum x buffer size for copying to a temp buffer ( #458 )
2023-05-07 20:03:16 +02:00
Cedric Nugteren
3d0c227fa5
AMAX/AMIN integer testing and bug fixes ( #457 )
...
* Fixed a bug in XAMAX/XAMIN routines that caused the increment and offset to be included in the result
* Perform proper integer-output testing in XAMAX tests
* A few changes towards getting it ready for a PR
* Also fix compilation for clBLAS and cuBLAS references
* Fix a bug that would only use the real part of complex numbers in the amax/amin routines
* A few small fixes related to the AMAX tests
2023-05-07 20:02:52 +02:00
Cedric Nugteren
c9856758b3
Add tuning results for Intel FPGA emulation device
2023-01-21 21:13:49 +01:00
Cedric Nugteren
f4a14daf8d
Add tuning results for Radeon Pro 450
2023-01-21 21:11:38 +01:00
Cedric Nugteren
3ca1f5176e
Add tuning results for Adreno 740
2023-01-21 21:09:09 +01:00
Cedric Nugteren
d11b0c8b01
Add tuning results for Adreno 730
2023-01-21 20:33:49 +01:00
Angus, Alexander
73f49e9b3d
Updated according to feedback from CNugteren
2023-01-17 08:35:29 -08:00
Angus, Alexander
4f394608a2
implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731
2023-01-03 10:56:04 -08:00
Cedric Nugteren
521eee4bbf
Update PyCLBlast version number
2022-09-22 22:09:21 +02:00
Cedric Nugteren
38fa34b432
Fix typo in comment
...
Resolves https://github.com/CNugteren/CLBlast/issues/440
2022-06-24 09:32:47 +02:00
Cedric Nugteren
9ab1bf24e2
Fix API inconsistency in cupp11.hpp
...
The function `CopyToAsync` has an optional event argument in the OpenCL version, which is used in CLBlast. This makes the code not compile at all if CUDA (through `cupp11.hpp`) is used as the backend. This issue was found by a CLBlast user and reported privately by email. This PR should fix that.
2022-05-23 12:45:22 +02:00
Cedric Nugteren
1884158128
Merge pull request #432 from justingra/sum-fix
...
sum fix
2022-05-16 08:38:35 +02:00
Cedric Nugteren
f107162e64
Add tuning results for Adreno 540
2022-04-25 20:36:18 +02:00
Cedric Nugteren
c4163b4b1a
Add tuning results for Radeon RX 6500 XT
2022-04-25 20:33:47 +02:00
Cedric Nugteren
7ec8b2f29b
Add tuning results for Radeon RX 6800 XT
2022-04-25 20:31:55 +02:00
Justin Graham
ba254d2f50
sum fix
2022-04-22 11:39:38 -05:00
danyougle
f3f3c88710
android.hpp: custom header guard of _clang_
...
In order not to have ambiguous definitions, exclude the functions for other compilers
2022-04-13 22:33:12 +02:00
Cedric Nugteren
772dd307ab
Add Quadro T2000 tuning parameters for the Tesla T4
2021-08-27 20:39:59 +02:00
Cedric Nugteren
1f639b7264
Remove Tesla T4 tuning results
2021-08-27 20:32:59 +02:00
Cedric Nugteren
5a9bd270f8
Add tuning results for NVIDIA Tesla V100
2021-08-19 20:34:09 +02:00
Cedric Nugteren
adb4b02982
Add tuning results for NVIDIA Tesla T4
2021-08-19 20:31:52 +02:00
Cedric Nugteren
dea3b5fadb
Add tuning results for NVIDIA Quadro T2000
2021-08-19 20:29:47 +02:00
Cedric Nugteren
521ad117bc
Add tuning results for NVIDIA Quadro GV100
2021-08-19 20:27:39 +02:00
Cedric Nugteren
e9dec268bc
Add tuning results for Intel Core i9-9980HK
2021-08-19 20:25:26 +02:00
Cedric Nugteren
e59ea46180
Add tuning results for NVIDIA A100
2021-08-19 20:23:25 +02:00
Cedric Nugteren
468a4a74eb
Fix issue with printing out-of-bounds local/global sizes for level 1 tuners
2021-05-22 20:31:12 +02:00
JishinMaster
aec45ea637
set the correct flop count for xgemm
2021-03-13 21:48:04 +01:00
Cedric Nugteren
1fa0930d85
Fix Windows paths in pyclblast
2021-02-05 21:52:23 +01:00
Cedric Nugteren
d57f8065ea
Added second Windows library path
2021-02-04 20:13:02 +01:00
Cedric Nugteren
c78c649844
Add library path for Windows as well
2021-01-30 14:28:11 +01:00
Cedric Nugteren
bbcb357a71
Add library dir on Linux for pyclblast
2021-01-29 20:48:05 +01:00
Cedric Nugteren
07837a5c2d
Update pyclblast package version number
2021-01-21 20:49:31 +01:00
Jerry James
dc82a1fbc8
Use reference types to prevent unnecessary copying
2021-01-20 10:21:36 -07:00