Cedric Nugteren
faa2109707
Bump to version 1.6.2 ( #527 )
2024-02-09 21:38:37 +01:00
Cedric Nugteren
6e2ab6ee96
Add tuning results for 5 devices ( #526 )
2024-02-08 20:33:33 +01:00
vathomass
162783a414
Python module multi-platform setup ( #519 )
* Change to pyproject.toml file
* Switch to cmake for building the extension
* Update readme
* Update CHANGELOG
* Add hint for CLBlast discovery
* Update README about detecting the library
* Switch to scikit-build-core for using CMake
2024-01-21 10:58:38 +01:00
vathomass
564629cafd
Fix floating point conversion in Python wrapper ( #515 )
* Generator.py: use LF ending when run from windows
* Convert scalar input to float16 in cython
* Index buffer as uints in cython wrapper
* Update CHANGELOG and bump ver. 1.4.0
2023-11-10 20:33:21 +01:00
Biswapriyo Nath
60e7574113
CMake: Fix DLL install directory in mingw ( #505 )
This installs DLL in 'bin' directory instead of 'lib' by default.
In mingw environment, all DLLs and EXEs are installed in 'bin'
and static & import libraries are installed in 'lib' directory.
This does not affect other environments because the 'RUNTIME'
and 'LIBRARY' targets are automatically set by cmake. See
https://cmake.org/cmake/help/latest/command/install.html
Signed-off-by: Biswapriyo Nath <nathbappai@gmail.com>
2023-09-25 08:57:12 +00:00
Cedric Nugteren
29e13d5a33
Add tuning results for 5 devices ( #503 )
2023-09-14 21:14:26 +02:00
Cedric Nugteren
afb3d8a604
Fix preprocessor and extend test coverage ( #498 )
* Improve coverage of pre-processor test
* Make the preprocessor handle the not-defined() construct
* Update the changelog
2023-08-07 20:32:30 +02:00
Cedric Nugteren
e3ce21bb93
Bump to v1.6.1 ( #496 )
2023-07-09 11:24:24 +02:00
Cedric Nugteren
6762e8480c
Fix a multithreading bug related to storing objects in the cache ( #495 )
2023-07-08 20:08:00 +02:00
Cedric Nugteren
af667c45fe
Add 6 tuning results ( #493 )
* Add tuning results for 6 devices
* Add GPU generation names for NVIDIA and AMD GPUs in the documentation
2023-06-24 12:10:08 +02:00
Yubraj Bhoi
28a61c53a6
Fix pointer error in `pyclblast` on ARM ( #490 )
* Fix pointer error in `pyclblast` on ARM
Use `ptrdiff_t` instead of `size_t` for pointers.
Fix error in `setup.py`
* Fix ARM pointer error in `pyclblast` generator
Update CHANGELOG file
2023-06-16 09:45:16 +00:00
Cedric Nugteren
05a26111f7
Add tuning results for 14 devices ( #483 )
2023-05-31 21:06:52 +02:00
Cedric Nugteren
b0b302889c
Update to version 1.6.0 ( #475 )
2023-05-21 20:51:05 +02:00
Cedric Nugteren
3baf823575
Fixes an issue under Android when the driver was already unloaded ( #462 )
2023-05-10 17:10:17 +02:00
Cedric Nugteren
d94d086d6f
TBMV/TPMV/TRSV: Use the minimum x buffer size for copying to a temp buffer ( #461 )
2023-05-10 12:48:25 +02:00
Cedric Nugteren
4f24d92730
TRMV: Use the minimum x buffer size for copying to a temp buffer ( #458 )
2023-05-07 20:03:16 +02:00
Cedric Nugteren
3d0c227fa5
AMAX/AMIN integer testing and bug fixes ( #457 )
* Fixed a bug in XAMAX/XAMIN routines that caused the increment and offset to be included in the result
* Perform proper integer-output testing in XAMAX tests
* A few changes towards getting it ready for a PR
* Also fix compilation for clBLAS and cuBLAS references
* Fix a bug that would only use the real part of complex numbers in the amax/amin routines
* A few small fixes related to the AMAX tests
2023-05-07 20:02:52 +02:00
Cedric Nugteren
9eca896b05
Fix documentation bug w.r.t. ld values and matrix layout
2023-03-25 20:24:40 +01:00
Cedric Nugteren
3ca1f5176e
Add tuning results for Adreno 740
2023-01-21 21:09:09 +01:00
Angus, Alexander
73f49e9b3d
Updated according to feedback from CNugteren
2023-01-17 08:35:29 -08:00
Cedric Nugteren
c7d677e4a9
Update changelog
2022-10-13 22:26:26 +02:00
Cedric Nugteren
f7db4c5d45
Replace the broken khronos registry link for cl.hpp with a new github link for opencl.hpp
2022-09-22 22:18:58 +02:00
Cedric Nugteren
0de212a56b
Update to version 1.5.3
2022-09-22 22:07:33 +02:00
Justin Graham
fc238a96c9
dev version
2022-05-13 16:46:28 -05:00
Justin Graham
1256f7bfbf
changelog message
2022-05-13 08:45:54 -05:00
Cedric Nugteren
c2951b8a2a
Updated README and tuning list
2021-08-19 20:37:46 +02:00
Cedric Nugteren
70016e8698
Updated to version 1.5.2
2021-01-19 21:19:12 +01:00
Cedric Nugteren
481d86665f
Add tuning results for Radeon RX Vega
2020-10-10 12:56:28 +02:00
Pradeep Garigipati
dff65e9217
Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG
2020-06-07 21:13:33 +05:30
Cedric Nugteren
396ac0278a
Added CLBLAST_VERSION_MAJOR/MINOR/PATCH defines in headers to store version numbering
2020-05-12 14:43:25 +02:00
Cedric Nugteren
4a6c7c37a3
Made sure that the global workgroup size is a multiple of the local size in the tuners
2020-05-10 20:28:23 +02:00
Cedric Nugteren
0870e76fba
Updated PyCLBlast version number
2020-05-10 14:55:03 +02:00
Cedric Nugteren
5f97d64505
Update API documentation
2020-03-08 11:29:47 +01:00
Cedric Nugteren
e3ce88154a
Silenced a new OpenCL warning message
2020-03-08 10:14:59 +01:00
Cedric Nugteren
8433985051
Updated to version 1.5.1
2020-02-18 10:29:40 +01:00
Cedric Nugteren
49eb490ee1
Catches all exceptions of the tuners
2020-02-17 22:07:51 +01:00
Cedric Nugteren
6ac74008b6
Added notion of fixes in XhadFaster
2019-09-06 19:33:30 +02:00
Cedric Nugteren
3f9d7bca22
Fixed a bug in the absolute-min index kernel
2019-05-19 14:00:18 +02:00
Cedric Nugteren
1035e533cd
Added tuning parameters for Xeon E5-2630 v3 and v4
2019-02-09 16:29:30 +01:00
Koichi Akabe
9532f8652c
Update changelog
2018-12-21 11:08:01 +09:00
Cedric Nugteren
0c9411c844
Updated to version 1.5.0
2018-12-04 20:46:02 +01:00
Cedric Nugteren
4676ec2921
Added a FAQ document
2018-12-01 17:19:28 +01:00
Cedric Nugteren
c0e41b87cb
Fixed an issue for unequal MWG and NWG and the new GEMMK == 1 kernel
2018-11-30 20:23:26 +01:00
Cedric Nugteren
2d32a23293
Added new col2im routine to the documentation
2018-11-01 21:46:19 +01:00
Cedric Nugteren
664a238adf
Fixed a bug in the XaxpyFaster kernel for specific parameters
2018-10-15 20:08:29 +02:00
Cedric Nugteren
634b2bc75c
Merge pull request #319 from CNugteren/convgemm_multi_kernel
First im2col+GEMM implementation of convolution
2018-10-14 17:27:45 +02:00
Cedric Nugteren
115a8f0f3d
Updated changelog regarding tuning API change
2018-10-13 17:49:49 +02:00
Cedric Nugteren
83ba3d4b7b
Merge branch 'master' into convgemm_multi_kernel
2018-09-16 20:01:18 +02:00
Cedric Nugteren
8ac39fa331
Disabled Intel subgroup shuffling for double-precision
2018-09-15 16:53:09 +02:00
Cedric Nugteren
c788e040f7
Added xCONVGEMM as im2col plus a batched GEMM kernel
2018-09-07 22:02:44 +02:00