| Author | Commit | Message | Date |
|---|---|---|---|
| Cedric Nugteren | d182d69e5a | Fixed AppVeyor issue | 2017-12-21 20:26:09 +01:00 |
| Cedric Nugteren | 9dec53ff52 | Merge branch 'master' into feature/more_tuners | 2017-12-21 20:18:05 +01:00 |
| Cedric Nugteren | 3948cd6551 | Made plotting script more resilient to missing data | 2017-12-20 20:12:02 +01:00 |
| Cedric Nugteren | 0ee81e27b9 | Added tuning results for Apple AMD Radeon Pro 580 | 2017-12-20 19:59:31 +01:00 |
| Cedric Nugteren | c680666250 | Added try-except to database script parser to skip invalid files | 2017-12-20 19:14:04 +01:00 |
| Cedric Nugteren | 07a7012b0d | Added skeleton for a tuner for the invert kernel | 2017-12-19 21:10:48 +01:00 |
| Cedric Nugteren | 249bdaa8e9 | Reformatted tuning code to make compilation faster | 2017-12-18 21:34:07 +01:00 |
| Cedric Nugteren | e2f8068459 | Fixed an issue with the tuner: it was using platform vendor rather than device vendor | 2017-12-17 17:58:06 +01:00 |
| Cedric Nugteren | a40d91f68f | Merge pull request #230 from CNugteren/kernel_preprocessor: Added an OpenCL kernel preprocessor | 2017-12-17 17:57:28 +01:00 |
| Cedric Nugteren | 69f6591564 | Removed all ARM Mali tuning results; re-added Mali-T760 and Mali-T628 results based on kernel pre-processor | 2017-12-17 16:59:08 +01:00 |
| Cedric Nugteren | 7408f6e6eb | Fixed an unnecessary overflow issue on 32-bit systems | 2017-12-17 16:42:54 +01:00 |
| Cedric Nugteren | 35e2b3ed5b | Updated the known issues | 2017-12-16 12:11:15 +01:00 |
| Cedric Nugteren | 4a58efc130 | Fix for error C1091 in MSVC 2013 | 2017-12-10 16:40:59 +01:00 |
| Cedric Nugteren | b4d3a50f19 | Split GEMM kernel into 4 files instead of 3 due to MSVC 2013 string length limit | 2017-12-10 16:09:09 +01:00 |
| Cedric Nugteren | 11489e68ef | Updated roadmap: completed pre-processor implementation | 2017-12-10 16:08:06 +01:00 |
| Cedric Nugteren | 82467b64c4 | Fixed a missing include | 2017-12-10 14:49:38 +01:00 |
| Cedric Nugteren | c2f08fa346 | Fixed an issue in the tuners to prevent error -14 from persisting (CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST) | 2017-12-10 14:48:13 +01:00 |
| Cedric Nugteren | 9112e587ae | Fixed an Android compilation issue | 2017-12-10 13:31:57 +01:00 |
| Cedric Nugteren | 9f02fb542c | Completed kernel modifications for pre-processor of all other kernels | 2017-12-09 20:44:21 +01:00 |
| Cedric Nugteren | ca5dbcd2bd | Made the pre-processor run by default for ARM and Qualcomm GPUs | 2017-12-09 15:16:53 +01:00 |
| Cedric Nugteren | 02c0d64037 | Modified the direct GEMM kernel to support array-to-register promotion | 2017-12-09 14:53:10 +01:00 |
| Cedric Nugteren | 23e3a85f2c | Reformatted GEMM kernel to support array-to-register promotion | 2017-12-09 14:09:13 +01:00 |
| Cedric Nugteren | d9df62b794 | Fixed defines parsing and substituting in pre-processor; fixed some variable names in kernels | 2017-12-09 10:49:55 +01:00 |
| Cedric Nugteren | 540896476d | Added register promotion to the main GEMM kernel | 2017-12-07 22:05:29 +01:00 |
| Cedric Nugteren | 0f9637bbac | Improved array-to-register promotion, now handling function calls as well | 2017-12-05 20:39:49 +01:00 |
| Cedric Nugteren | cf4555d1f4 | Added GEMM (direct and indirect) to the pre-processor testing; modified the loops in kernel accordingly | 2017-12-03 16:40:36 +01:00 |
| Cedric Nugteren | 0a1a3de58a | Added basic bracket parsing in defines and loop expressions | 2017-12-03 16:39:22 +01:00 |
| Cedric Nugteren | 60312e5878 | Reformatted transpose kernels for the pre-processor; extended the number of tests | 2017-12-03 12:00:37 +01:00 |
| Cedric Nugteren | 92842024b0 | Improved array-to-register promotion in the pre-processor | 2017-12-03 11:59:38 +01:00 |
| Cedric Nugteren | bf7aeb8d5b | Improved the pre-processor's handling of defines; added a special nested defines test | 2017-11-30 21:43:16 +01:00 |
| Cedric Nugteren | 13eb772343 | Integrated pre-processor in compilation flow, default is still disabled | 2017-11-30 21:32:47 +01:00 |
| Cedric Nugteren | 93ffb876c6 | Reformatted unrollable kernel loops and added the new promote_to_registers pragma for several kernels | 2017-11-29 20:21:08 +01:00 |
| Cedric Nugteren | 0dde6af703 | Extended the preprocessor tests to include CopyFast and CopyPad | 2017-11-29 20:18:36 +01:00 |
| Cedric Nugteren | 1d35f65cea | Improved the array-to-register promotion in the pre-processor | 2017-11-29 19:53:50 +01:00 |
| Cedric Nugteren | 426406668e | Improved the pre-processor tester, added GEMV and GER kernels | 2017-11-28 20:52:47 +01:00 |
| Cedric Nugteren | 14047861ce | Improved the kernel pre-processor in various ways | 2017-11-28 20:52:08 +01:00 |
| Cedric Nugteren | 35956f9db1 | Added simple implementation of array-to-register promotion | 2017-11-27 20:26:30 +01:00 |
| Cedric Nugteren | 9c643b293c | Improved the for-loop pre-processing | 2017-11-26 13:32:48 +01:00 |
| Cedric Nugteren | 69aa3b35ed | Implemented first simple pre-processor: defines parser and loop unrolling based on assumptions | 2017-11-25 17:46:01 +01:00 |
| Cedric Nugteren | f01bcded1e | Moved string splitting functions; added string character removal function | 2017-11-25 17:44:21 +01:00 |
| Cedric Nugteren | c0c6d00b12 | Added stub for a preprocessor and a corresponding compilation test | 2017-11-25 10:24:05 +01:00 |
| Cedric Nugteren | ebce82e650 | Merge pull request #222 from CNugteren/override_params_from_json: Override params in clients from tuner JSON | 2017-11-25 09:48:27 +01:00 |
| Cedric Nugteren | d7b29d864a | Fixed a Clang compilation error | 2017-11-24 21:41:45 +01:00 |
| Cedric Nugteren | abb4d5ab32 | Added tuning results for ARM Mali T760 GPU | 2017-11-24 21:16:54 +01:00 |
| Cedric Nugteren | a5cef9ef3b | Added missing include file | 2017-11-24 21:11:52 +01:00 |
| Cedric Nugteren | a768b7686b | Added precision check to parameter override for the clients | 2017-11-24 21:09:39 +01:00 |
| Cedric Nugteren | 9527c89c30 | Made parameter override in the clients a command-line argument and added support for multi-kernel routines | 2017-11-22 20:53:20 +01:00 |
| Cedric Nugteren | 8c9ecd9736 | Implemented first version of reading JSON files from disk in the client to override parameters | 2017-11-21 22:05:08 +01:00 |
| Cedric Nugteren | 606990af6f | Made the database script properly handle multiple entries for a single device | 2017-11-20 21:38:23 +01:00 |
| Cedric Nugteren | 0f080bbc6e | Potentially fixed an MSVC 2013 issue with a copy-constructor not being generated | 2017-11-20 20:54:18 +01:00 |