Cedric Nugteren
d94d086d6f
TBMV/TPMV/TRSV: Use the minimum x buffer size for copying to a temp buffer ( #461 )
2023-05-10 12:48:25 +02:00
Cedric Nugteren
4f24d92730
TRMV: Use the minimum x buffer size for copying to a temp buffer ( #458 )
2023-05-07 20:03:16 +02:00
Cedric Nugteren
3d0c227fa5
AMAX/AMIN integer testing and bug fixes ( #457 )
...
* Fixed a bug in XAMAX/XMIN routines that caused the increment and offset to be included in the result
* Perform proper integer-output testing in XAMAX tests
* A few changes towards getting it ready for a PR
* Also fix compilation for clBLAS and cuBLAS references
* Fix a bug that would only use the real part of complex numbers in the amax/amin routines
* A few small fixes related to the AMAX tests
2023-05-07 20:02:52 +02:00
Cedric Nugteren
1573f7d304
Merge pull request #455 from CNugteren/fix_gemm_documentation_bug
...
Fix documentation bug w.r.t. ld values and matrix layout
2023-03-25 21:25:41 +01:00
Cedric Nugteren
9eca896b05
Fix documentation bug w.r.t. ld values and matrix layout
2023-03-25 20:24:40 +01:00
Cedric Nugteren
ab5092dd26
Merge pull request #452 from CNugteren/add_tuning_results_adreno
...
Add tuning results for 4 devices
2023-01-22 15:52:49 +01:00
Cedric Nugteren
c9856758b3
Add tuning results for Intel FPGA emulation device
2023-01-21 21:13:49 +01:00
Cedric Nugteren
f4a14daf8d
Add tuning results for Radeon Pro 450
2023-01-21 21:11:38 +01:00
Cedric Nugteren
3ca1f5176e
Add tuning results for Adreno 740
2023-01-21 21:09:09 +01:00
Cedric Nugteren
d11b0c8b01
Add tuning results for Adreno 730
2023-01-21 20:33:49 +01:00
Cedric Nugteren
e72f87ae5e
Merge pull request #451 from CodeLinaro/master
...
CLBlast modifications to address Qualcomm Adreno performance
2023-01-21 20:28:32 +01:00
Angus, Alexander
73f49e9b3d
Updated according to feedback from CNugteren
2023-01-17 08:35:29 -08:00
Angus, Alexander
ff6a5689df
Adreno 730 + 740 CLBlast tuning results
2023-01-12 12:33:48 -08:00
Angus, Alexander
4f394608a2
implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731
2023-01-03 10:56:04 -08:00
Cedric Nugteren
03cffa83c5
Merge pull request #447 from CNugteren/small_plotting_fixes
...
Fix two small issues in the plotting script
2022-10-14 08:18:00 +02:00
Cedric Nugteren
c7d677e4a9
Update changelog
2022-10-13 22:26:26 +02:00
Cedric Nugteren
374eba3ee2
Fix plotting issue with a single row or column
2022-10-13 22:24:35 +02:00
Cedric Nugteren
8aa9f32b23
Fix plotting issue in case of 'inf' values
2022-10-13 22:20:24 +02:00
Cedric Nugteren
d55840e16c
Merge pull request #442 from CNugteren/update_version_to_1_5_3
...
Update to version 1.5.3
2022-09-27 22:45:49 +02:00
Cedric Nugteren
e080635019
Fix opencl.hpp download in CMake
2022-09-27 21:11:17 +02:00
Cedric Nugteren
5c608d97cd
Properly set OpenCL target to version 2.1
2022-09-27 21:09:35 +02:00
Cedric Nugteren
f7db4c5d45
Replace the broken khronos registry link for cl.hpp with a new github link for opencl.hpp
2022-09-22 22:18:58 +02:00
Cedric Nugteren
521eee4bbf
Update PyCLBlast version number
2022-09-22 22:09:21 +02:00
Cedric Nugteren
0de212a56b
Update to version 1.5.3
2022-09-22 22:07:33 +02:00
Cedric Nugteren
38fa34b432
Fix typo in comment
...
Resolves https://github.com/CNugteren/CLBlast/issues/440
2022-06-24 09:32:47 +02:00
Cedric Nugteren
d837b64269
Merge pull request #438 from CNugteren/cupp11_api_inconsistency
...
Fix API inconsistency in cupp11.hpp
2022-05-25 09:14:04 +02:00
Cedric Nugteren
9ab1bf24e2
Fix API inconsistency in cupp11.hpp
...
The function `CopyToAsync` has an optional event argument in the OpenCL version, which is used in CLBlast. This makes the code not compile at all if CUDA (through `cupp11.hpp`) is used as the backend. This issue was found by a CLBlast user and reported privately by email. This PR should fix that.
2022-05-23 12:45:22 +02:00
Cedric Nugteren
6b358e1be9
Merge pull request #437 from umar456/blas_fix
...
Add logic to find intel OpenMP on oneMKL.
2022-05-17 08:36:18 +02:00
Cedric Nugteren
1884158128
Merge pull request #432 from justingra/sum-fix
...
sum fix
2022-05-16 08:38:35 +02:00
Umar Arshad
35a4be231a
Add logic to find intel OpenMP on oneMKL.
2022-05-15 15:37:23 -04:00
Justin Graham
fc238a96c9
dev version
2022-05-13 16:46:28 -05:00
Justin Graham
1256f7bfbf
changelog message
2022-05-13 08:45:54 -05:00
Cedric Nugteren
cb43f264cb
Merge pull request #436 from CNugteren/add_tuning_results
...
Add tuning results for 2 AMD GPUs and 1 Qualcomm GPU
2022-04-25 21:42:57 +02:00
Cedric Nugteren
f107162e64
Add tuning results for Adreno 540
2022-04-25 20:36:18 +02:00
Cedric Nugteren
c4163b4b1a
Add tuning results for Radeon RX 6500 XT
2022-04-25 20:33:47 +02:00
Cedric Nugteren
7ec8b2f29b
Add tuning results for Radeon RX 6800 XT
2022-04-25 20:31:55 +02:00
Cedric Nugteren
df0e492d39
Merge pull request #434 from CNugteren/update_test_status_machines
...
Remove old test machines and add new ones
2022-04-25 20:15:07 +02:00
Cedric Nugteren
a7cdf3f0fa
Remove old test machines and add new ones
2022-04-25 20:08:41 +02:00
Justin Graham
ba254d2f50
sum fix
2022-04-22 11:39:38 -05:00
Cedric Nugteren
9e2ccb7f2b
Merge pull request #431 from danyougle/patch-2
...
android.hpp: custom header guard _clang_
2022-04-14 10:26:52 +02:00
danyougle
f3f3c88710
android.hpp: custom header guard of _clang_
...
In order not to have ambiguous definitions, exclude the functions for other compilers
2022-04-13 22:33:12 +02:00
Cedric Nugteren
8d298af10b
Merge pull request #430 from danyougle/patch-1
...
add AMD OCL SDK light path in ENV section
2022-04-13 11:51:47 +02:00
danyougle
6db6ff7107
add AMD OCL SDK light path in ENV section
2022-04-13 10:44:40 +02:00
Cedric Nugteren
4500a03440
Merge pull request #425 from CNugteren/tesla_t4_correctness
...
Tesla T4 tuning parameters
2021-08-27 22:17:30 +02:00
Cedric Nugteren
772dd307ab
Add Quadro T2000 tuning parameters for the Tesla T4
2021-08-27 20:39:59 +02:00
Cedric Nugteren
1f639b7264
Remove Tesla T4 tuning results
2021-08-27 20:32:59 +02:00
Cedric Nugteren
cb761e375b
Merge pull request #424 from gspr/gspr/prebuilt
...
Update documentation to reflect CLBlast in Debian & Ubuntu
2021-08-24 13:29:18 +02:00
Gard Spreemann
df1eebc120
PPA for older Ubuntus
2021-08-24 12:36:35 +02:00
Gard Spreemann
3b1e14acd6
Let the installation documentation reflect the fact that CLBlast is now in Debian and Ubuntu
2021-08-24 11:27:42 +02:00
Cedric Nugteren
93d6070e27
Merge pull request #423 from CNugteren/new_tuning_results
...
New tuning results for 1 Intel CPU and 5 NVIDIA GPUs
2021-08-20 08:18:36 +02:00