Commit Graph

1001 Commits (7aabeb44ccbaac2b467a585ec11a79a85fdd7e34)

Author SHA1 Message Date
Cedric Nugteren 7aabeb44cc Updated the tuning results for the IvyBridge M GT2 GPU 2017-12-23 15:46:41 +01:00
Cedric Nugteren 2b020d59f9 Added defines to disable OpenCL deprecation warnings 2017-12-23 15:32:22 +01:00
Cedric Nugteren 04bf5437bc Fixed a warning under MSVC 2017-12-23 15:30:08 +01:00
Cedric Nugteren c6db6f67d7 Merge pull request #232 from CNugteren/feature/more_tuners: First tuners for the TRSV (block size) and TRSM (invert kernel) routines 2017-12-23 15:26:52 +01:00
Cedric Nugteren 288766debb Now calling main TRSV routine again to fix compilation in MSVC 2017-12-23 14:49:21 +01:00
Cedric Nugteren 736399e528 Split the invert kernel in two parts to prevent error C1091 in MSVC 2013 2017-12-23 14:18:07 +01:00
Cedric Nugteren b1f52f130c Updated the database to use the new TRSV and Invert tuners 2017-12-23 13:55:22 +01:00
Cedric Nugteren aa7db4f987 Added TRSV block-size tuner 2017-12-23 13:34:57 +01:00
Cedric Nugteren 2b007450b9 Fixed AppVeyor issue 2017-12-21 20:38:25 +01:00
Cedric Nugteren d182d69e5a Fixed AppVeyor issue 2017-12-21 20:26:09 +01:00
Cedric Nugteren 9dec53ff52 Merge branch 'master' into feature/more_tuners 2017-12-21 20:18:05 +01:00
Cedric Nugteren 3948cd6551 Made plotting script more resilient to missing data 2017-12-20 20:12:02 +01:00
Cedric Nugteren 0ee81e27b9 Added tuning results for Apple AMD Radeon Pro 580 2017-12-20 19:59:31 +01:00
Cedric Nugteren c680666250 Added try-except to database script parser to skip invalid files 2017-12-20 19:14:04 +01:00
Cedric Nugteren 07a7012b0d Added skeleton for a tuner for the invert kernel 2017-12-19 21:10:48 +01:00
Cedric Nugteren 249bdaa8e9 Reformatted tuning code to make compilation faster 2017-12-18 21:34:07 +01:00
Cedric Nugteren e2f8068459 Fixed an issue with the tuner: it was using platform vendor rather than device vendor 2017-12-17 17:58:06 +01:00
Cedric Nugteren a40d91f68f Merge pull request #230 from CNugteren/kernel_preprocessor: Added an OpenCL kernel preprocessor 2017-12-17 17:57:28 +01:00
Cedric Nugteren 69f6591564 Removed all ARM Mali tuning results; re-added Mali-T760 and Mali-T628 results based on kernel pre-processor 2017-12-17 16:59:08 +01:00
Cedric Nugteren 7408f6e6eb Fixed an unnecessary overflow issue on 32-bit systems 2017-12-17 16:42:54 +01:00
Cedric Nugteren 35e2b3ed5b Updated the known issues 2017-12-16 12:11:15 +01:00
Cedric Nugteren 4a58efc130 Fixed for error C1091 in MSVC 2013 2017-12-10 16:40:59 +01:00
Cedric Nugteren b4d3a50f19 Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit 2017-12-10 16:09:09 +01:00
Cedric Nugteren 11489e68ef Updated roadmap: completed pre-processor implementation 2017-12-10 16:08:06 +01:00
Cedric Nugteren 82467b64c4 Fixed a missing include 2017-12-10 14:49:38 +01:00
Cedric Nugteren c2f08fa346 Fixed an issue in the tuners to prevent error -14 from persisting (CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST) 2017-12-10 14:48:13 +01:00
Cedric Nugteren 9112e587ae Fixed an Android compilation issue 2017-12-10 13:31:57 +01:00
Cedric Nugteren 9f02fb542c Completed kernel modifications for pre-processor of all other kernels 2017-12-09 20:44:21 +01:00
Cedric Nugteren ca5dbcd2bd Made the pre-processor run by default for ARM and Qualcomm GPUs 2017-12-09 15:16:53 +01:00
Cedric Nugteren 02c0d64037 Modified the direct GEMM kernel to support array-to-register promotion 2017-12-09 14:53:10 +01:00
Cedric Nugteren 23e3a85f2c Reformatted GEMM kernel to support array-to-register promotion 2017-12-09 14:09:13 +01:00
Cedric Nugteren d9df62b794 Fixed defines parsing and substituting in pre-processor; fixed some variable names in kernels 2017-12-09 10:49:55 +01:00
Cedric Nugteren 540896476d Added register promotion to the main GEMM kernel 2017-12-07 22:05:29 +01:00
Cedric Nugteren 0f9637bbac Improved array-to-register promotion, now handling function calls as well 2017-12-05 20:39:49 +01:00
Cedric Nugteren cf4555d1f4 Added GEMM (direct and in-direct) to the pre-processor testing; modified the loops in kernel accordingly 2017-12-03 16:40:36 +01:00
Cedric Nugteren 0a1a3de58a Added basic bracket parsing in defines and loop expressions 2017-12-03 16:39:22 +01:00
Cedric Nugteren 60312e5878 Reformatted transpose kernels for the pre-processor; extended the number of tests 2017-12-03 12:00:37 +01:00
Cedric Nugteren 92842024b0 Improved array to register promotion in the pre-processor 2017-12-03 11:59:38 +01:00
Cedric Nugteren bf7aeb8d5b Improved the pre-processor's handling of defines; added a special nested defines test 2017-11-30 21:43:16 +01:00
Cedric Nugteren 13eb772343 Integrated pre-processor in compilation flow, default is still disabled 2017-11-30 21:32:47 +01:00
Cedric Nugteren 93ffb876c6 Reformatted unrollable kernel loops and added the new promote_to_registers pragma for several kernels 2017-11-29 20:21:08 +01:00
Cedric Nugteren 0dde6af703 Extended the preprocessor tests to include CopyFast and CopyPad 2017-11-29 20:18:36 +01:00
Cedric Nugteren 1d35f65cea Improved the array-to-register promotion in the pre-processor 2017-11-29 19:53:50 +01:00
Cedric Nugteren 426406668e Improved the pre-processor tester, added GEMV and GER kernels 2017-11-28 20:52:47 +01:00
Cedric Nugteren 14047861ce Improved the kernel pre-processor in various ways 2017-11-28 20:52:08 +01:00
Cedric Nugteren 35956f9db1 Added simple implementation of array-to-register promotion 2017-11-27 20:26:30 +01:00
Cedric Nugteren 9c643b293c Improved the for-loop pre-processing 2017-11-26 13:32:48 +01:00
Cedric Nugteren 69aa3b35ed Implemented first simple pre-processor: defines parser and loop unrolling based on assumptions 2017-11-25 17:46:01 +01:00
Cedric Nugteren f01bcded1e Moved string splitting functions; added string character removal function 2017-11-25 17:44:21 +01:00
Cedric Nugteren c0c6d00b12 Added stub for a preprocessor and a corresponding compilation test 2017-11-25 10:24:05 +01:00