Commit graph

1319 commits

Author SHA1 Message Date
Cedric Nugteren 7f893a85d9 Revert "Added options to disable parts of the invert kernel to find out where the AMD compiler crashes" (This reverts commit 407ed52cec.) 2017-12-31 16:10:40 +01:00
Cedric Nugteren b4c8e1d9a5 Made plotting script more flexible: extra argument to set the comparison library 2017-12-31 16:02:46 +01:00
Cedric Nugteren 69226ae828 Changed the invert kernel slightly; added part1a/part1b disable-defines 2017-12-31 14:07:08 +01:00
Cedric Nugteren 7ce415b927 Fixed ifdef's into ifndef's 2017-12-30 21:17:31 +01:00
Cedric Nugteren 407ed52cec Added options to disable parts of the invert kernel to find out where the AMD compiler crashes 2017-12-30 21:07:50 +01:00
Cedric Nugteren ad1227c4f2 Added optional temp-buffer argument to C++ interface of GEMM 2017-12-30 18:45:06 +01:00
Cedric Nugteren 6d1e30e61f Added interface to compute the required temporary buffer size for GEMM 2017-12-28 14:46:45 +01:00
Cedric Nugteren aaea9474a1 Factored out argument processing from the GEMM routine 2017-12-28 13:56:18 +01:00
Cedric Nugteren 74792ce96c Refactored GEMM code in preparation of separate temp-buffer computation 2017-12-28 11:08:10 +01:00
Cedric Nugteren 936cf2668d Merge pull request #234 from CNugteren/database_compilation_split (Database compilation split) 2017-12-27 20:05:31 +01:00
Cedric Nugteren 0eb9b35481 Added a simple test to check compilation of the invert kernels (issue with AMD APP) 2017-12-27 17:16:08 +01:00
Cedric Nugteren 2b9bf3a9aa Simplified invert kernel a little 2017-12-27 17:03:06 +01:00
Cedric Nugteren 1e738db6dd Split the database into multiple small compilation units 2017-12-27 12:04:22 +01:00
Cedric Nugteren 4a2fc4aa98 Made the database-vector a non-static member 2017-12-26 11:32:05 +01:00
Cedric Nugteren bd540829ea Fixes for the CUDA backend of CLBlast 2017-12-24 12:10:55 +01:00
Cedric Nugteren 8657e90cf8 Fixed linking of the preprocessor test for MSVC 2017-12-24 11:33:47 +01:00
Cedric Nugteren e81eb4f6d4 Added a note that the ArrayFire Jenkins servers are down, being switched to buildbot 2017-12-24 11:32:31 +01:00
Cedric Nugteren ef71d8e9b5 Fixed unused variable warnings showing up with Clang 2017-12-23 16:07:26 +01:00
Cedric Nugteren 7aabeb44cc Updated the tuning results for the IvyBridge M GT2 GPU 2017-12-23 15:46:41 +01:00
Cedric Nugteren 2b020d59f9 Added defines to disable OpenCL deprecation warnings 2017-12-23 15:32:22 +01:00
Cedric Nugteren 04bf5437bc Fixed a warning under MSVC 2017-12-23 15:30:08 +01:00
Cedric Nugteren c6db6f67d7 Merge pull request #232 from CNugteren/feature/more_tuners (First tuners for the TRSV (block size) and TRSM (invert kernel) routines) 2017-12-23 15:26:52 +01:00
Cedric Nugteren 288766debb Now calling main TRSV routine again to fix compilation in MSVC 2017-12-23 14:49:21 +01:00
Cedric Nugteren 736399e528 Split the invert kernel in two parts to prevent error C1091 in MSVC 2013 2017-12-23 14:18:07 +01:00
Cedric Nugteren b1f52f130c Updated the database to use the new TRSV and Invert tuners 2017-12-23 13:55:22 +01:00
Cedric Nugteren aa7db4f987 Added TRSV block-size tuner 2017-12-23 13:34:57 +01:00
Cedric Nugteren 2b007450b9 Fixed AppVeyor issue 2017-12-21 20:38:25 +01:00
Cedric Nugteren d182d69e5a Fixed AppVeyor issue 2017-12-21 20:26:09 +01:00
Cedric Nugteren 9dec53ff52 Merge branch 'master' into feature/more_tuners 2017-12-21 20:18:05 +01:00
Cedric Nugteren 3948cd6551 Made plotting script more resilient to missing data 2017-12-20 20:12:02 +01:00
Cedric Nugteren 0ee81e27b9 Added tuning results for Apple AMD Radeon Pro 580 2017-12-20 19:59:31 +01:00
Cedric Nugteren c680666250 Added try-except to database script parser to skip invalid files 2017-12-20 19:14:04 +01:00
Cedric Nugteren 07a7012b0d Added skeleton for a tuner for the invert kernel 2017-12-19 21:10:48 +01:00
Cedric Nugteren 249bdaa8e9 Reformatted tuning code to make compilation faster 2017-12-18 21:34:07 +01:00
Cedric Nugteren e2f8068459 Fixed an issue with the tuner: it was using platform vendor rather than device vendor 2017-12-17 17:58:06 +01:00
Cedric Nugteren a40d91f68f Merge pull request #230 from CNugteren/kernel_preprocessor (Added an OpenCL kernel preprocessor) 2017-12-17 17:57:28 +01:00
Cedric Nugteren 69f6591564 Removed all ARM Mali tuning results; re-added Mali-T760 and Mali-T628 results based on kernel pre-processor 2017-12-17 16:59:08 +01:00
Cedric Nugteren 7408f6e6eb Fixed an unnecessary overflow issue on 32-bit systems 2017-12-17 16:42:54 +01:00
Cedric Nugteren 35e2b3ed5b Updated the known issues 2017-12-16 12:11:15 +01:00
Cedric Nugteren 4a58efc130 Fixed for error C1091 in MSVC 2013 2017-12-10 16:40:59 +01:00
Cedric Nugteren b4d3a50f19 Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit 2017-12-10 16:09:09 +01:00
Cedric Nugteren 11489e68ef Updated roadmap: completed pre-processor implementation 2017-12-10 16:08:06 +01:00
Cedric Nugteren 82467b64c4 Fixed a missing include 2017-12-10 14:49:38 +01:00
Cedric Nugteren c2f08fa346 Fixed an issue in the tuners to prevent error -14 from persisting (CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST) 2017-12-10 14:48:13 +01:00
Cedric Nugteren 9112e587ae Fixed an Android compilation issue 2017-12-10 13:31:57 +01:00
Cedric Nugteren 9f02fb542c Completed kernel modifications for pre-processor of all other kernels 2017-12-09 20:44:21 +01:00
Cedric Nugteren ca5dbcd2bd Made the pre-processor run by default for ARM and Qualcomm GPUs 2017-12-09 15:16:53 +01:00
Cedric Nugteren 02c0d64037 Modified the direct GEMM kernel to support array-to-register promotion 2017-12-09 14:53:10 +01:00
Cedric Nugteren 23e3a85f2c Reformatted GEMM kernel to support array-to-register promotion 2017-12-09 14:09:13 +01:00
Cedric Nugteren d9df62b794 Fixed defines parsing and substituting in pre-processor; fixed some variable names in kernels 2017-12-09 10:49:55 +01:00