Commit graph

767 commits

Author SHA1 Message Date
Cedric Nugteren 63eb127bad
Intel HD Graphics 770 and AMD RX 6600 XT tuning results (#474)
* Add tuning results for AMD Radeon RX 6600 XT

* Add tuning results for Intel HD Graphics 770

* Update list of tuned devices
2023-05-21 14:25:15 +02:00
Cedric Nugteren 7a3ef92ff2
Add 3 sets of tuning results: RX 5700 XT, 2080 Ti, and 3090 (#468)
* Add tuning results for AMD Radeon RX 5700 XT

* Add tuning results for NVIDIA GeForce RTX 2080 Ti

* Add tuning results for NVIDIA GeForce RTX 3090
2023-05-17 10:31:47 +02:00
Cedric Nugteren 3baf823575
Fixes an issue under Android when the driver was already unloaded (#462) 2023-05-10 17:10:17 +02:00
Cedric Nugteren d94d086d6f
TBMV/TPMV/TRSV: Use the minimum x buffer size for copying to a temp buffer (#461) 2023-05-10 12:48:25 +02:00
Cedric Nugteren 4f24d92730
TRMV: Use the minimum x buffer size for copying to a temp buffer (#458) 2023-05-07 20:03:16 +02:00
Cedric Nugteren 3d0c227fa5
AMAX/AMIN integer testing and bug fixes (#457)
* Fixed a bug in XAMAX/XAMIN routines that caused the increment and offset to be included in the result

* Perform proper integer-output testing in XAMAX tests

* A few changes towards getting it ready for a PR

* Also fix compilation for clBLAS and cuBLAS references

* Fix a bug that would only use the real part of complex numbers in the amax/amin routines

* A few small fixes related to the AMAX tests
2023-05-07 20:02:52 +02:00
Cedric Nugteren c9856758b3 Add tuning results for Intel FPGA emulation device 2023-01-21 21:13:49 +01:00
Cedric Nugteren f4a14daf8d Add tuning results for Radeon Pro 450 2023-01-21 21:11:38 +01:00
Cedric Nugteren 3ca1f5176e Add tuning results for Adreno 740 2023-01-21 21:09:09 +01:00
Cedric Nugteren d11b0c8b01 Add tuning results for Adreno 730 2023-01-21 20:33:49 +01:00
Angus, Alexander 73f49e9b3d Updated according to feedback from CNugteren 2023-01-17 08:35:29 -08:00
Angus, Alexander 4f394608a2 implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731 2023-01-03 10:56:04 -08:00
Cedric Nugteren 521eee4bbf Update PyCLBlast version number 2022-09-22 22:09:21 +02:00
Cedric Nugteren 38fa34b432
Fix typo in comment
Resolves https://github.com/CNugteren/CLBlast/issues/440
2022-06-24 09:32:47 +02:00
Cedric Nugteren 9ab1bf24e2
Fix API inconsistency in cupp11.hpp
The function `CopyToAsync` has an optional event argument in the OpenCL version, which is used in CLBlast. This makes the code not compile at all if CUDA (through `cupp11.hpp`) is used as backend. This issue was found by a CLBlast user and reported privately by email. This PR should fix that.
2022-05-23 12:45:22 +02:00
Cedric Nugteren 1884158128
Merge pull request #432 from justingra/sum-fix
sum fix
2022-05-16 08:38:35 +02:00
Cedric Nugteren f107162e64 Add tuning results for Adreno 540 2022-04-25 20:36:18 +02:00
Cedric Nugteren c4163b4b1a Add tuning results for Radeon RX 6500 XT 2022-04-25 20:33:47 +02:00
Cedric Nugteren 7ec8b2f29b Add tuning results for Radeon RX 6800 XT 2022-04-25 20:31:55 +02:00
Justin Graham ba254d2f50 sum fix 2022-04-22 11:39:38 -05:00
danyougle f3f3c88710
android.hpp: custom header guard of _clang_
In order not to have ambiguous definitions, exclude the functions for other compilers
2022-04-13 22:33:12 +02:00
Cedric Nugteren 772dd307ab Add Quadro T2000 tuning parameters for the Tesla T4 2021-08-27 20:39:59 +02:00
Cedric Nugteren 1f639b7264 Remove Tesla T4 tuning results 2021-08-27 20:32:59 +02:00
Cedric Nugteren 5a9bd270f8 Add tuning results for NVIDIA Tesla V100 2021-08-19 20:34:09 +02:00
Cedric Nugteren adb4b02982 Add tuning results for NVIDIA Tesla T4 2021-08-19 20:31:52 +02:00
Cedric Nugteren dea3b5fadb Add tuning results for NVIDIA Quadro T2000 2021-08-19 20:29:47 +02:00
Cedric Nugteren 521ad117bc Add tuning results for NVIDIA Quadro GV100 2021-08-19 20:27:39 +02:00
Cedric Nugteren e9dec268bc Add tuning results for Intel Core i9-9980HK 2021-08-19 20:25:26 +02:00
Cedric Nugteren e59ea46180 Add tuning results for NVIDIA A100 2021-08-19 20:23:25 +02:00
Cedric Nugteren 468a4a74eb Fix issue with printing out-of-bounds local/global sizes for level 1 tuners 2021-05-22 20:31:12 +02:00
JishinMaster aec45ea637 set the correct flop count for xgemm 2021-03-13 21:48:04 +01:00
Cedric Nugteren 1fa0930d85 Fix Windows paths in pyclblast 2021-02-05 21:52:23 +01:00
Cedric Nugteren d57f8065ea Added second Windows library path 2021-02-04 20:13:02 +01:00
Cedric Nugteren c78c649844 Add library path for Windows as well 2021-01-30 14:28:11 +01:00
Cedric Nugteren bbcb357a71 Add library dir on Linux for pyclblast 2021-01-29 20:48:05 +01:00
Cedric Nugteren 07837a5c2d Update pyclblast package version number 2021-01-21 20:49:31 +01:00
Jerry James dc82a1fbc8 Use reference types to prevent unnecessary copying 2021-01-20 10:21:36 -07:00
Cedric Nugteren 0ee39af5ed Add tuning results for TITAN RTX 2020-10-10 13:01:12 +02:00
Cedric Nugteren 481d86665f Add tuning results for Radeon RX Vega 2020-10-10 12:56:28 +02:00
Pradeep Garigipati dff65e9217 Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG 2020-06-07 21:13:33 +05:30
Pradeep Garigipati aec71699f8
Fix Program::GetIR to handle programs with multiple devices 2020-06-05 12:00:45 +05:30
Cedric Nugteren c369cf1a16 Increase display width of the local/global sizes 2020-05-11 20:26:33 +02:00
Cedric Nugteren 4a6c7c37a3 Made sure that the global workgroup size is a multiple of the local size in the tuners 2020-05-10 20:28:23 +02:00
Cedric Nugteren 69a4b4d4b0 Added logging of local/global workgroup sizes when running the tuners 2020-05-10 20:08:28 +02:00
Cedric Nugteren 0870e76fba Updated PyCLBlast version number 2020-05-10 14:55:03 +02:00
Cedric Nugteren 0b7ce8033c Added a sample to demonstrate a batched routine 2020-05-10 14:54:50 +02:00
Cedric Nugteren b94e81af10 Added pyclblast bindings for the 3 batched routines 2020-05-10 12:26:25 +02:00
Cedric Nugteren bbb2031bf3 Move queue creation out of the tuner loop 2020-05-03 20:30:55 +02:00
Cedric Nugteren b46853660e Made it more likely (but no guarantees) for amax/amin to return the first index 2020-03-08 11:26:49 +01:00
Cedric Nugteren e3ce88154a Silenced a new OpenCL warning message 2020-03-08 10:14:59 +01:00