138 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)

Author SHA1 Message Date
Cedric Nugteren 468a4a74eb Fix issue with printing out-of-bounds local/global sizes for level 1 tuners 2021-05-22 20:31:12 +02:00
JishinMaster aec45ea637 set the correct flop count for xgemm 2021-03-13 21:48:04 +01:00
Jerry James dc82a1fbc8 Use reference types to prevent unnecessary copying 2021-01-20 10:21:36 -07:00
Cedric Nugteren c369cf1a16 Increase display width of the local/global sizes 2020-05-11 20:26:33 +02:00
Cedric Nugteren 4a6c7c37a3 Made sure that the global workgroup size is a multiple of the local size in the tuners 2020-05-10 20:28:23 +02:00
Cedric Nugteren 69a4b4d4b0 Added logging of local/global workgroup sizes when running the tuners 2020-05-10 20:08:28 +02:00
Cedric Nugteren bbb2031bf3 Move queue creation out of the tuner loop 2020-05-03 20:30:55 +02:00
Cedric Nugteren 49eb490ee1 Catches all exceptions of the tuners 2020-02-17 22:07:51 +01:00
Cedric Nugteren d929525039 Added support for the convgemm tuner in the tuner database 2018-12-31 18:49:12 +01:00
Cedric Nugteren 153ac06cf2 Added the forgotten batch dimension to the tuner to get correct kernel executions 2018-12-31 13:19:58 +01:00
Koichi Akabe a8e6f813dd Fix the xconvgemm tuner 2018-12-18 14:05:25 +09:00
Cedric Nugteren 1f0cd61824 Added first version of a tuner for the ConvGemm direct kernel 2018-12-18 13:59:26 +09:00
Cedric Nugteren 9bedaa752d Fixed an MSVC compilation error due to large strings 2018-09-15 17:35:26 +02:00
Cedric Nugteren bc47e7e7cc Added print statements to indicate the 4 stages of GEMM tuning 2018-07-28 16:08:22 +02:00
Cedric Nugteren fa84ac36f2 The tuners now also check for valid local thread configurations and skip invalid ones completely, saving compilation time 2018-07-28 16:01:03 +02:00
Cedric Nugteren 6a8b9e24f2 Added code to report the average tuning results 2018-07-25 22:28:44 +02:00
Cedric Nugteren ba0b558e84 Added an option to run the routine tuner for a single specific GEMM size 2018-05-19 17:42:11 +02:00
Cedric Nugteren 76e0079a90 Fixed compilation issues 2018-05-19 14:18:23 +02:00
Cedric Nugteren 66583b3cda The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run as part of the alltuners target 2018-05-19 12:48:59 +02:00
Cedric Nugteren b855af681f Added a canary region for overflow detection to the tuners 2018-05-17 10:45:10 +01:00
Cedric Nugteren 3519d32ac4 Extended the GEMM tuner to be able to tune the new 'kernel 1' 2018-04-07 17:05:44 +02:00
Cedric Nugteren d86ff75fa5 Added argument checking for the GEMM tuner: expects m/n to be multiples of MWG/NWG 2018-03-30 10:23:33 +02:00
Cedric Nugteren 9fb6550dd0 Added the OpenCL local memory size constraint to the tuners 2018-03-22 21:01:02 +01:00
Cedric Nugteren 7a2371213b Re-added support for local memory size constraint checking in the tuner 2018-03-21 22:58:37 +01:00
Cedric Nugteren 903deaf368 Fixed an issue for DLL linking under Windows 2018-03-10 16:45:31 +01:00
Cedric Nugteren 3d2ef9331b Fixed a few things for the new tuning API 2018-03-10 14:35:11 +01:00
Cedric Nugteren 0bdc51e47c Completed the API for all tuneable kernels 2018-03-10 10:54:44 +01:00
Cedric Nugteren 6397e61746 Added several more tuner API functions 2018-03-09 21:40:22 +01:00
Cedric Nugteren 49cc8b31ff Fixed compilation issue in Xger tuner 2018-03-06 20:59:23 +01:00
Cedric Nugteren 0e1a152023 First version of the tuning API, added interface for copy-kernel, added sample 2018-03-06 20:52:12 +01:00
Cedric Nugteren a1cedf36e3 Separate kernel tuners in .cpp with main and .hpp with settings 2018-03-03 16:37:31 +01:00
Cedric Nugteren 0557694d39 Fixed several issues in the new invert tuner 2018-02-20 20:53:13 +01:00
Cedric Nugteren 19fd263fb2 Moved some constants from global scope to a function; removed unnecessary includes 2018-01-25 20:00:43 +01:00
Cedric Nugteren 6a9d6b5da2 Changed the default number of runs for the GEMV tuner to fix issues for FP16 2018-01-25 19:57:36 +01:00
Cedric Nugteren c3f9371d16 Made GEMM routine tuning a bit more generic in preparation of possible separate batched tuning arguments 2018-01-18 19:41:59 +01:00
Cedric Nugteren 0e5eaa6eb9 Factored out the generic parts of the GEMM routine tuner 2018-01-15 21:32:51 +01:00
Cedric Nugteren c9b5d614e2 Fixed a vendor naming bug in the tuners and in the database 2018-01-06 17:02:58 +01:00
Cedric Nugteren ef71d8e9b5 Fixed unused variable warnings showing up with Clang 2017-12-23 16:07:26 +01:00
Cedric Nugteren 288766debb Now calling main TRSV routine again to fix compilation in MSVC 2017-12-23 14:49:21 +01:00
Cedric Nugteren 736399e528 Split the invert kernel in two parts to prevent error C1091 in MSVC 2013 2017-12-23 14:18:07 +01:00
Cedric Nugteren b1f52f130c Updated the database to use the new TRSV and Invert tuners 2017-12-23 13:55:22 +01:00
Cedric Nugteren aa7db4f987 Added TRSV block-size tuner 2017-12-23 13:34:57 +01:00
Cedric Nugteren 07a7012b0d Added skeleton for a tuner for the invert kernel 2017-12-19 21:10:48 +01:00
Cedric Nugteren 249bdaa8e9 Reformatted tuning code to make compilation faster 2017-12-18 21:34:07 +01:00
Cedric Nugteren e2f8068459 Fixed an issue with the tuner: it was using platform vendor rather than device vendor 2017-12-17 17:58:06 +01:00
Cedric Nugteren 7408f6e6eb Fixed an unnecessary overflow issue on 32-bit systems 2017-12-17 16:42:54 +01:00
Cedric Nugteren b4d3a50f19 Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit 2017-12-10 16:09:09 +01:00
Cedric Nugteren c2f08fa346 Fixed an issue in the tuners to prevent error -14 from persisting (CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST) 2017-12-10 14:48:13 +01:00
Cedric Nugteren ca5dbcd2bd Made the pre-processor run by default for ARM and Qualcomm GPUs 2017-12-09 15:16:53 +01:00
Cedric Nugteren 13eb772343 Integrated pre-processor in compilation flow, default is still disabled 2017-11-30 21:32:47 +01:00