Cedric Nugteren
b0b302889c
Update to version 1.6.0 (#475)
2023-05-21 20:51:05 +02:00
Cedric Nugteren
3baf823575
Fixes an issue under Android when the driver was already unloaded (#462)
2023-05-10 17:10:17 +02:00
Cedric Nugteren
d94d086d6f
TBMV/TPMV/TRSV: Use the minimum x buffer size for copying to a temp buffer (#461)
2023-05-10 12:48:25 +02:00
Cedric Nugteren
4f24d92730
TRMV: Use the minimum x buffer size for copying to a temp buffer (#458)
2023-05-07 20:03:16 +02:00
Cedric Nugteren
3d0c227fa5
AMAX/AMIN integer testing and bug fixes (#457)
* Fixed a bug in XAMAX/XAMIN routines that caused the increment and offset to be included in the result
* Perform proper integer-output testing in XAMAX tests
* A few changes towards getting it ready for a PR
* Also fix compilation for clBLAS and cuBLAS references
* Fix a bug that would only use the real part of complex numbers in the amax/amin routines
* A few small fixes related to the AMAX tests
2023-05-07 20:02:52 +02:00
Cedric Nugteren
9eca896b05
Fix documentation bug w.r.t. ld values and matrix layout
2023-03-25 20:24:40 +01:00
Cedric Nugteren
3ca1f5176e
Add tuning results for Adreno 740
2023-01-21 21:09:09 +01:00
Angus, Alexander
73f49e9b3d
Updated according to feedback from CNugteren
2023-01-17 08:35:29 -08:00
Cedric Nugteren
c7d677e4a9
Update changelog
2022-10-13 22:26:26 +02:00
Cedric Nugteren
f7db4c5d45
Replace the broken khronos registry link for cl.hpp with a new github link for opencl.hpp
2022-09-22 22:18:58 +02:00
Cedric Nugteren
0de212a56b
Update to version 1.5.3
2022-09-22 22:07:33 +02:00
Justin Graham
fc238a96c9
dev version
2022-05-13 16:46:28 -05:00
Justin Graham
1256f7bfbf
changelog message
2022-05-13 08:45:54 -05:00
Cedric Nugteren
c2951b8a2a
Updated README and tuning list
2021-08-19 20:37:46 +02:00
Cedric Nugteren
70016e8698
Updated to version 1.5.2
2021-01-19 21:19:12 +01:00
Cedric Nugteren
481d86665f
Add tuning results for Radeon RX Vega
2020-10-10 12:56:28 +02:00
Pradeep Garigipati
dff65e9217
Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG
2020-06-07 21:13:33 +05:30
Cedric Nugteren
396ac0278a
Added CLBLAST_VERSION_MAJOR/MINOR/PATCH defines in headers to store version numbering
2020-05-12 14:43:25 +02:00
Cedric Nugteren
4a6c7c37a3
Made sure that the global workgroup size is a multiple of the local size in the tuners
2020-05-10 20:28:23 +02:00
Cedric Nugteren
0870e76fba
Updated PyCLBlast version number
2020-05-10 14:55:03 +02:00
Cedric Nugteren
5f97d64505
Update API documentation
2020-03-08 11:29:47 +01:00
Cedric Nugteren
e3ce88154a
Silenced a new OpenCL warning message
2020-03-08 10:14:59 +01:00
Cedric Nugteren
8433985051
Updated to version 1.5.1
2020-02-18 10:29:40 +01:00
Cedric Nugteren
49eb490ee1
Catches all exceptions in the tuners
2020-02-17 22:07:51 +01:00
Cedric Nugteren
6ac74008b6
Added notion of fixes in XhadFaster
2019-09-06 19:33:30 +02:00
Cedric Nugteren
3f9d7bca22
Fixed a bug in the absolute-min index kernel
2019-05-19 14:00:18 +02:00
Cedric Nugteren
1035e533cd
Added tuning parameters for Xeon E5-2630 v3 and v4
2019-02-09 16:29:30 +01:00
Koichi Akabe
9532f8652c
Update changelog
2018-12-21 11:08:01 +09:00
Cedric Nugteren
0c9411c844
Updated to version 1.5.0
2018-12-04 20:46:02 +01:00
Cedric Nugteren
4676ec2921
Added a FAQ document
2018-12-01 17:19:28 +01:00
Cedric Nugteren
c0e41b87cb
Fixed an issue for unequal MWG and NWG and the new GEMMK == 1 kernel
2018-11-30 20:23:26 +01:00
Cedric Nugteren
2d32a23293
Added new col2im routine to the documentation
2018-11-01 21:46:19 +01:00
Cedric Nugteren
664a238adf
Fixed a bug in the XaxpyFaster kernel for specific parameters
2018-10-15 20:08:29 +02:00
Cedric Nugteren
634b2bc75c
Merge pull request #319 from CNugteren/convgemm_multi_kernel
First im2col+GEMM implementation of convolution
2018-10-14 17:27:45 +02:00
Cedric Nugteren
115a8f0f3d
Updated changelog regarding tuning API change
2018-10-13 17:49:49 +02:00
Cedric Nugteren
83ba3d4b7b
Merge branch 'master' into convgemm_multi_kernel
2018-09-16 20:01:18 +02:00
Cedric Nugteren
8ac39fa331
Disabled Intel subgroup shuffling for double-precision
2018-09-15 16:53:09 +02:00
Cedric Nugteren
c788e040f7
Added xCONVGEMM as im2col plus a batched GEMM kernel
2018-09-07 22:02:44 +02:00
Cedric Nugteren
9d9f09fce9
Name change of setting to NETLIB_PERSISTENT_OPENCL
2018-08-07 22:41:06 +02:00
Cedric Nugteren
fe639455bd
Added an option to compile the Netlib API with static OpenCL device and context
2018-08-05 21:12:39 +02:00
Cedric Nugteren
503ab74f02
Fixed an issue where complex conjugation was not performed in certain cases when transposing
2018-07-31 21:49:37 +02:00
Cedric Nugteren
fa84ac36f2
The tuners now also check for valid local thread configurations and skip invalid ones completely, saving compilation time
2018-07-28 16:01:03 +02:00
Cedric Nugteren
03bed8633e
Fixed an issue with AMD GPUs and the new GEMMK == 1 kernel
2018-07-27 23:08:49 +02:00
Cedric Nugteren
6a8b9e24f2
Added code to report the average tuning results
2018-07-25 22:28:44 +02:00
Cedric Nugteren
db179a1e40
Updated to CLBlast version 1.4.1
2018-07-14 12:29:06 +02:00
Cedric Nugteren
c459582c4f
Added tuning results for HD Graphics 6000 Broadwell GT3
2018-07-13 21:05:43 +02:00
Cedric Nugteren
7bae54f61f
Updated changelog
2018-07-06 19:39:46 +02:00
Cedric Nugteren
e3eedacbcc
Disabled calls to clReleaseProgram under Windows to avoid segfaults when the OpenCL driver unloads first
2018-06-28 20:35:18 +09:00
Cedric Nugteren
4471b67735
Updated to CLBlast version 1.4.0
2018-06-03 13:18:05 +02:00
Cedric Nugteren
4f594e3931
Added MKL as an alternative for CBLAS for correctness and performance comparisons
2018-06-02 17:57:45 +02:00