246 Commits (613ee24ab7f47fe075b6c88d92cdccc1eefea585)

Author SHA1 Message Date
Cedric Nugteren faa2109707
Bump to version 1.6.2 (#527) 2024-02-09 21:38:37 +01:00
Cedric Nugteren 6e2ab6ee96
Add tuning results for 5 devices (#526) 2024-02-08 20:33:33 +01:00
vathomass 162783a414
Python module multi-platform setup (#519)
* Change to pyproject.toml file

* Switch to cmake for building the extension

* Update readme

* Update CHANGELOG

* Add hint for CLBlast discovery

* Update README about detecting the library

* Switch to scikit-build-core for using CMake
2024-01-21 10:58:38 +01:00
vathomass 564629cafd
Fix floating point conversion in Python wrapper (#515)
* Generator.py: use LF ending when run from Windows

* Convert scalar input to float16 in cython

* Index buffer as uints in cython wrapper

* Update CHANGELOG and bump ver. 1.4.0
2023-11-10 20:33:21 +01:00
Biswapriyo Nath 60e7574113
CMake: Fix DLL install directory in mingw (#505)
This installs DLL in 'bin' directory instead of 'lib' by default.
In mingw environment, all DLLs and EXEs are installed in 'bin'
and static & import libraries are installed in 'lib' directory.
This does not affect other environment because the 'RUNTIME'
and 'LIBRARY' targets are automatically set by cmake. See
https://cmake.org/cmake/help/latest/command/install.html

Signed-off-by: Biswapriyo Nath <nathbappai@gmail.com>
2023-09-25 08:57:12 +00:00
Cedric Nugteren 29e13d5a33
Add tuning results for 5 devices (#503) 2023-09-14 21:14:26 +02:00
Cedric Nugteren afb3d8a604
Fix preprocessor and extend test coverage (#498)
* Improve coverage of pre-processor test

* Make the preprocessor handle the not-defined() construct

* Update the changelog
2023-08-07 20:32:30 +02:00
Cedric Nugteren e3ce21bb93
Bump to v1.6.1 (#496) 2023-07-09 11:24:24 +02:00
Cedric Nugteren 6762e8480c
Fix a multithreading bug related to storing objects in the cache (#495) 2023-07-08 20:08:00 +02:00
Cedric Nugteren af667c45fe
Add 6 tuning results (#493)
* Add tuning results for 6 devices

* Add GPU generation names for NVIDIA and AMD GPUs in the documentation
2023-06-24 12:10:08 +02:00
Yubraj Bhoi 28a61c53a6
Fix pointer error in `pyclblast` on ARM (#490)
* Fix pointer error in `pyclblast` on ARM

Use `ptrdiff_t` instead of `size_t` for pointers.
Fix error in `setup.py`

* Fix ARM pointer error in `pyclblast` generator

Update CHANGELOG file
2023-06-16 09:45:16 +00:00
Cedric Nugteren 05a26111f7
Add tuning results for 14 devices (#483) 2023-05-31 21:06:52 +02:00
Cedric Nugteren b0b302889c
Update to version 1.6.0 (#475) 2023-05-21 20:51:05 +02:00
Cedric Nugteren 3baf823575
Fixes an issue under Android when the driver was already unloaded (#462) 2023-05-10 17:10:17 +02:00
Cedric Nugteren d94d086d6f
TBMV/TPMV/TRSV: Use the minimum x buffer size for copying to a temp buffer (#461) 2023-05-10 12:48:25 +02:00
Cedric Nugteren 4f24d92730
TRMV: Use the minimum x buffer size for copying to a temp buffer (#458) 2023-05-07 20:03:16 +02:00
Cedric Nugteren 3d0c227fa5
AMAX/AMIN integer testing and bug fixes (#457)
* Fixed a bug in XAMAX/XAMIN routines that caused the increment and offset to be included in the result

* Perform proper integer-output testing in XAMAX tests

* A few changes towards getting it ready for a PR

* Also fix compilation for clBLAS and cuBLAS references

* Fix a bug that would only use the real part of complex numbers in the amax/amin routines

* A few small fixes related to the AMAX tests
2023-05-07 20:02:52 +02:00
Cedric Nugteren 9eca896b05 Fix documentation bug w.r.t. ld values and matrix layout 2023-03-25 20:24:40 +01:00
Cedric Nugteren 3ca1f5176e Add tuning results for Adreno 740 2023-01-21 21:09:09 +01:00
Angus, Alexander 73f49e9b3d Updated according to feedback from CNugteren 2023-01-17 08:35:29 -08:00
Cedric Nugteren c7d677e4a9 Update changelog 2022-10-13 22:26:26 +02:00
Cedric Nugteren f7db4c5d45 Replace the broken khronos registry link for cl.hpp with a new github link for opencl.hpp 2022-09-22 22:18:58 +02:00
Cedric Nugteren 0de212a56b Update to version 1.5.3 2022-09-22 22:07:33 +02:00
Justin Graham fc238a96c9 dev version 2022-05-13 16:46:28 -05:00
Justin Graham 1256f7bfbf changelog message 2022-05-13 08:45:54 -05:00
Cedric Nugteren c2951b8a2a Updated README and tuning list 2021-08-19 20:37:46 +02:00
Cedric Nugteren 70016e8698 Updated to version 1.5.2 2021-01-19 21:19:12 +01:00
Cedric Nugteren 481d86665f Add tuning results for Radeon RX Vega 2020-10-10 12:56:28 +02:00
Pradeep Garigipati dff65e9217 Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG 2020-06-07 21:13:33 +05:30
Cedric Nugteren 396ac0278a Added CLBLAST_VERSION_MAJOR/MINOR/PATCH defines in headers to store version numbering 2020-05-12 14:43:25 +02:00
Cedric Nugteren 4a6c7c37a3 Made sure that the global workgroup size is a multiple of the local size in the tuners 2020-05-10 20:28:23 +02:00
Cedric Nugteren 0870e76fba Updated PyCLBlast version number 2020-05-10 14:55:03 +02:00
Cedric Nugteren 5f97d64505 Update API documentation 2020-03-08 11:29:47 +01:00
Cedric Nugteren e3ce88154a Silenced a new OpenCL warning message 2020-03-08 10:14:59 +01:00
Cedric Nugteren 8433985051 Updated to version 1.5.1 2020-02-18 10:29:40 +01:00
Cedric Nugteren 49eb490ee1 Catches all exceptions of the tuners 2020-02-17 22:07:51 +01:00
Cedric Nugteren 6ac74008b6
Added notion of fixes in XhadFaster 2019-09-06 19:33:30 +02:00
Cedric Nugteren 3f9d7bca22 Fixed a bug in the absolute-min index kernel 2019-05-19 14:00:18 +02:00
Cedric Nugteren 1035e533cd Added tuning parameters for Xeon E5-2630 v3 and v4 2019-02-09 16:29:30 +01:00
Koichi Akabe 9532f8652c Update changelog 2018-12-21 11:08:01 +09:00
Cedric Nugteren 0c9411c844 Updated to version 1.5.0 2018-12-04 20:46:02 +01:00
Cedric Nugteren 4676ec2921 Added a FAQ document 2018-12-01 17:19:28 +01:00
Cedric Nugteren c0e41b87cb Fixed an issue for unequal MWG and NWG and the new GEMMK == 1 kernel 2018-11-30 20:23:26 +01:00
Cedric Nugteren 2d32a23293 Added new col2im routine to the documentation 2018-11-01 21:46:19 +01:00
Cedric Nugteren 664a238adf Fixed a bug in the XaxpyFaster kernel for specific parameters 2018-10-15 20:08:29 +02:00
Cedric Nugteren 634b2bc75c
Merge pull request #319 from CNugteren/convgemm_multi_kernel
First im2col+GEMM implementation of convolution
2018-10-14 17:27:45 +02:00
Cedric Nugteren 115a8f0f3d Updated changelog regarding tuning API change 2018-10-13 17:49:49 +02:00
Cedric Nugteren 83ba3d4b7b Merge branch 'master' into convgemm_multi_kernel 2018-09-16 20:01:18 +02:00
Cedric Nugteren 8ac39fa331 Disabled Intel subgroup shuffling for double-precision 2018-09-15 16:53:09 +02:00
Cedric Nugteren c788e040f7 Added xCONVGEMM as im2col plus a batched GEMM kernel 2018-09-07 22:02:44 +02:00