Cedric Nugteren | 468a4a74eb | Fix issue with printing out-of-bounds local/global sizes for level 1 tuners | 2021-05-22 20:31:12 +02:00
JishinMaster | aec45ea637 | set the correct flop count for xgemm | 2021-03-13 21:48:04 +01:00
Jerry James | dc82a1fbc8 | Use reference types to prevent unnecessary copying | 2021-01-20 10:21:36 -07:00
Cedric Nugteren | c369cf1a16 | Increase display width of the local/global sizes | 2020-05-11 20:26:33 +02:00
Cedric Nugteren | 4a6c7c37a3 | Made sure that the global workgroup size is a multiple of the local size in the tuners | 2020-05-10 20:28:23 +02:00
Cedric Nugteren | 69a4b4d4b0 | Added logging of local/global workgroup sizes when run the tuners | 2020-05-10 20:08:28 +02:00
Cedric Nugteren | bbb2031bf3 | Move queue creation out of the tuner loop | 2020-05-03 20:30:55 +02:00
Cedric Nugteren | 49eb490ee1 | Catches all exceptions of the tuners | 2020-02-17 22:07:51 +01:00
Cedric Nugteren | d929525039 | Added support for the convgemm tuner in the tuner database | 2018-12-31 18:49:12 +01:00
Cedric Nugteren | 153ac06cf2 | Added the forgotten batch dimension to the tuner to get correct kernel executions | 2018-12-31 13:19:58 +01:00
Koichi Akabe | a8e6f813dd | Fix the xconvgemm tuner | 2018-12-18 14:05:25 +09:00
Cedric Nugteren | 1f0cd61824 | Added first version of a tuner for the ConvGemm direct kernel | 2018-12-18 13:59:26 +09:00
Cedric Nugteren | 9bedaa752d | Fixed an MSVC compilation error due to large strings | 2018-09-15 17:35:26 +02:00
Cedric Nugteren | bc47e7e7cc | Added print statements to indicate the 4 stages of GEMM tuning | 2018-07-28 16:08:22 +02:00
Cedric Nugteren | fa84ac36f2 | The tuners now also check for valid local thread configurations and skip invalid ones completely, saving compilation time | 2018-07-28 16:01:03 +02:00
Cedric Nugteren | 6a8b9e24f2 | Added code to report the average tuning results | 2018-07-25 22:28:44 +02:00
Cedric Nugteren | ba0b558e84 | Added an option to run the routine tuner for a single specific GEMM size | 2018-05-19 17:42:11 +02:00
Cedric Nugteren | 76e0079a90 | Fixed compilation issues | 2018-05-19 14:18:23 +02:00
Cedric Nugteren | 66583b3cda | The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run part of alltuners target | 2018-05-19 12:48:59 +02:00
Cedric Nugteren | b855af681f | Added a canary region for overflow detection to the tuners | 2018-05-17 10:45:10 +01:00
Cedric Nugteren | 3519d32ac4 | Extended the GEMM tuner to be able to tune the new 'kernel 1' | 2018-04-07 17:05:44 +02:00
Cedric Nugteren | d86ff75fa5 | Added argument checking for the GEMM tuner: expects m/n to be multiples of MWG/NWG | 2018-03-30 10:23:33 +02:00
Cedric Nugteren | 9fb6550dd0 | Added the OpenCL local memory size constraint to the tuners | 2018-03-22 21:01:02 +01:00
Cedric Nugteren | 7a2371213b | Re-added support for local memory size constraint checking in the tuner | 2018-03-21 22:58:37 +01:00
Cedric Nugteren | 903deaf368 | Fixed an issue for DLL linking under Windows | 2018-03-10 16:45:31 +01:00
Cedric Nugteren | 3d2ef9331b | Fixed a few things for the new tuning API | 2018-03-10 14:35:11 +01:00
Cedric Nugteren | 0bdc51e47c | Completed the API for all tuneable kernels | 2018-03-10 10:54:44 +01:00
Cedric Nugteren | 6397e61746 | Added several more tuner API functions | 2018-03-09 21:40:22 +01:00
Cedric Nugteren | 49cc8b31ff | Fixed compilation issue in Xger tuner | 2018-03-06 20:59:23 +01:00
Cedric Nugteren | 0e1a152023 | First version of the tuning API, added interface for copy-kernel, added sample | 2018-03-06 20:52:12 +01:00
Cedric Nugteren | a1cedf36e3 | Separate kernel tuners in .cpp with main and .hpp with settings | 2018-03-03 16:37:31 +01:00
Cedric Nugteren | 0557694d39 | Fixed several issues in the new invert tuner | 2018-02-20 20:53:13 +01:00
Cedric Nugteren | 19fd263fb2 | Moved some constants from global scope to a function; removed unnecessary includes | 2018-01-25 20:00:43 +01:00
Cedric Nugteren | 6a9d6b5da2 | Changed the default number of runs for the GEMV tuner to fix issues for FP16 | 2018-01-25 19:57:36 +01:00
Cedric Nugteren | c3f9371d16 | Made GEMM routine tuning a bit more generic in preparation of possible separate batched tuning arguments | 2018-01-18 19:41:59 +01:00
Cedric Nugteren | 0e5eaa6eb9 | Factored out the generic parts of the GEMM routine tuner | 2018-01-15 21:32:51 +01:00
Cedric Nugteren | c9b5d614e2 | Fixed a vendor naming bug in the tuners and in the database | 2018-01-06 17:02:58 +01:00
Cedric Nugteren | ef71d8e9b5 | Fixed unused variable warnings showing up with Clang | 2017-12-23 16:07:26 +01:00
Cedric Nugteren | 288766debb | Now calling main TRSV routine again to fix compilation in MSVC | 2017-12-23 14:49:21 +01:00
Cedric Nugteren | 736399e528 | Split the invert kernel in two parts to prevent error C1091 in MSVC 2013 | 2017-12-23 14:18:07 +01:00
Cedric Nugteren | b1f52f130c | Updated the database to use the new TRSV and Invert tuners | 2017-12-23 13:55:22 +01:00
Cedric Nugteren | aa7db4f987 | Added TRSV block-size tuner | 2017-12-23 13:34:57 +01:00
Cedric Nugteren | 07a7012b0d | Added skeleton for a tuner for the invert kernel | 2017-12-19 21:10:48 +01:00
Cedric Nugteren | 249bdaa8e9 | Reformatted tuning code to make compilation faster | 2017-12-18 21:34:07 +01:00
Cedric Nugteren | e2f8068459 | Fixed an issue with the tuner: it was using platform vendor rather than device vendor | 2017-12-17 17:58:06 +01:00
Cedric Nugteren | 7408f6e6eb | Fixed an unnecessary overflow issue on 32-bit systems | 2017-12-17 16:42:54 +01:00
Cedric Nugteren | b4d3a50f19 | Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit | 2017-12-10 16:09:09 +01:00
Cedric Nugteren | c2f08fa346 | Fixed an issue in the tuners to prevent error -14 from persisting (CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST) | 2017-12-10 14:48:13 +01:00
Cedric Nugteren | ca5dbcd2bd | Made the pre-processor run by default for ARM and Qualcomm GPUs | 2017-12-09 15:16:53 +01:00
Cedric Nugteren | 13eb772343 | Integrated pre-processor in compilation flow, default is still disabled | 2017-11-30 21:32:47 +01:00