Commit graph

546 commits

Author SHA1 Message Date
Cedric Nugteren a500f537d8 Added a RetrieveParameters function to inspect tuning parameters 2018-01-11 20:32:06 +01:00
Cedric Nugteren 99a4df88a6 Implemented the in-direct version of the strided-batched GEMM kernel 2018-01-08 21:07:01 +01:00
Cedric Nugteren 13f0f6fc6e Implemented direct version of strided-batched GEMM kernel 2018-01-07 14:58:45 +01:00
Cedric Nugteren 9fb2c61b25 Added API and tests for new GemmStridedBatched routine 2018-01-07 14:27:15 +01:00
Cedric Nugteren f1e3b35541 Reduced duplicate code in the batched GEMM implementation 2018-01-06 19:26:11 +01:00
Cedric Nugteren c9b5d614e2 Fixed a vendor naming bug in the tuners and in the database 2018-01-06 17:02:58 +01:00
Cedric Nugteren a7ccce1969
Merge pull request #238 from CNugteren/gemm_api_with_temp_buffer
GEMM API with optional temp buffer
2018-01-06 16:08:27 +01:00
Cedric Nugteren ad197da08d Fixed the CUDA interface: replaced nullptr with 0 2018-01-06 13:38:44 +01:00
Cedric Nugteren e71c037304 Fixed a performance overhead in database creation: it is again a static variable now as it was before 2018-01-06 11:28:04 +01:00
Cedric Nugteren ce069545d4 Added CUDA interface to get temporary-buffer size for GEMM routine 2018-01-06 10:05:28 +01:00
Cedric Nugteren 44431daecc Added a CUDA version of the GEMM temp-buffer optional argument 2018-01-04 19:33:51 +01:00
Cedric Nugteren af14fff1e9 Updated the generator script to automatically generate the temp-buffer code 2018-01-04 19:31:57 +01:00
Cedric Nugteren 7f893a85d9 Revert "Added options to disable parts of the invert kernel to find out where the AMD compiler crashes"
This reverts commit 407ed52cec.
2017-12-31 16:10:40 +01:00
Cedric Nugteren 69226ae828 Changed the invert kernel slightly; added part1a/part1b disable-defines 2017-12-31 14:07:08 +01:00
Cedric Nugteren 7ce415b927 Fixed ifdef's into ifndef's 2017-12-30 21:17:31 +01:00
Cedric Nugteren 407ed52cec Added options to disable parts of the invert kernel to find out where the AMD compiler crashes 2017-12-30 21:07:50 +01:00
Cedric Nugteren ad1227c4f2 Added optional temp-buffer argument to C++ interface of GEMM 2017-12-30 18:45:06 +01:00
Cedric Nugteren 6d1e30e61f Added interface to compute the required temporary buffer size for GEMM 2017-12-28 14:46:45 +01:00
Cedric Nugteren aaea9474a1 Factored out argument processing from the GEMM routine 2017-12-28 13:56:18 +01:00
Cedric Nugteren 74792ce96c Refactored GEMM code in preparation of separate temp-buffer computation 2017-12-28 11:08:10 +01:00
Cedric Nugteren 2b9bf3a9aa Simplified invert kernel a little 2017-12-27 17:03:06 +01:00
Cedric Nugteren 1e738db6dd Split the database into multiple small compilation units 2017-12-27 12:04:22 +01:00
Cedric Nugteren 4a2fc4aa98 Made the database-vector a non-static member 2017-12-26 11:32:05 +01:00
Cedric Nugteren bd540829ea Fixes for the CUDA backend of CLBlast 2017-12-24 12:10:55 +01:00
Cedric Nugteren ef71d8e9b5 Fixed unused variable warnings showing up with Clang 2017-12-23 16:07:26 +01:00
Cedric Nugteren 7aabeb44cc Updated the tuning results for the IvyBridge M GT2 GPU 2017-12-23 15:46:41 +01:00
Cedric Nugteren 2b020d59f9 Added defines to disable OpenCL deprecation warnings 2017-12-23 15:32:22 +01:00
Cedric Nugteren 04bf5437bc Fixed a warning under MSVC 2017-12-23 15:30:08 +01:00
Cedric Nugteren 288766debb Now calling main TRSV routine again to fix compilation in MSVC 2017-12-23 14:49:21 +01:00
Cedric Nugteren 736399e528 Split the invert kernel in two parts to prevent error C1091 in MSVC 2013 2017-12-23 14:18:07 +01:00
Cedric Nugteren b1f52f130c Updated the database to use the new TRSV and Invert tuners 2017-12-23 13:55:22 +01:00
Cedric Nugteren aa7db4f987 Added TRSV block-size tuner 2017-12-23 13:34:57 +01:00
Cedric Nugteren 9dec53ff52 Merge branch 'master' into feature/more_tuners 2017-12-21 20:18:05 +01:00
Cedric Nugteren 0ee81e27b9 Added tuning results for Apple AMD Radeon Pro 580 2017-12-20 19:59:31 +01:00
Cedric Nugteren 07a7012b0d Added skeleton for a tuner for the invert kernel 2017-12-19 21:10:48 +01:00
Cedric Nugteren 249bdaa8e9 Reformatted tuning code to make compilation faster 2017-12-18 21:34:07 +01:00
Cedric Nugteren e2f8068459 Fixed an issue with the tuner: it was using platform vendor rather than device vendor 2017-12-17 17:58:06 +01:00
Cedric Nugteren 69f6591564 Removed all ARM Mali tuning results; re-added Mali-T760 and Mali-T628 results based on kernel pre-processor 2017-12-17 16:59:08 +01:00
Cedric Nugteren 7408f6e6eb Fixed an unnecessary overflow issue on 32-bit systems 2017-12-17 16:42:54 +01:00
Cedric Nugteren 4a58efc130 Fixed for error C1091 in MSVC 2013 2017-12-10 16:40:59 +01:00
Cedric Nugteren b4d3a50f19 Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit 2017-12-10 16:09:09 +01:00
Cedric Nugteren 82467b64c4 Fixed a missing include 2017-12-10 14:49:38 +01:00
Cedric Nugteren c2f08fa346 Fixed an issue in the tuners to prevent error -14 from persisting (CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST) 2017-12-10 14:48:13 +01:00
Cedric Nugteren 9112e587ae Fixed an Android compilation issue 2017-12-10 13:31:57 +01:00
Cedric Nugteren 9f02fb542c Completed kernel modifications for pre-processor of all other kernels 2017-12-09 20:44:21 +01:00
Cedric Nugteren ca5dbcd2bd Made the pre-processor run by default for ARM and Qualcomm GPUs 2017-12-09 15:16:53 +01:00
Cedric Nugteren 02c0d64037 Modified the direct GEMM kernel to support array-to-register promotion 2017-12-09 14:53:10 +01:00
Cedric Nugteren 23e3a85f2c Reformatted GEMM kernel to support array-to-register promotion 2017-12-09 14:09:13 +01:00
Cedric Nugteren d9df62b794 Fixed defines parsing and substituting in pre-processor; fixed some variable names in kernels 2017-12-09 10:49:55 +01:00
Cedric Nugteren 540896476d Added register promotion to the main GEMM kernel 2017-12-07 22:05:29 +01:00