Cedric Nugteren
|
cf4555d1f4
|
Added GEMM (direct and indirect) to the pre-processor testing; modified the loops in the kernels accordingly
|
2017-12-03 16:40:36 +01:00 |
|
Cedric Nugteren
|
0a1a3de58a
|
Added basic bracket parsing in defines and loop expressions
|
2017-12-03 16:39:22 +01:00 |
|
Cedric Nugteren
|
60312e5878
|
Reformatted transpose kernels for the pre-processor; extended the number of tests
|
2017-12-03 12:00:37 +01:00 |
|
Cedric Nugteren
|
92842024b0
|
Improved array to register promotion in the pre-processor
|
2017-12-03 11:59:38 +01:00 |
|
Cedric Nugteren
|
bf7aeb8d5b
|
Improved the pre-processor's handling of defines; added a special nested defines test
|
2017-11-30 21:43:16 +01:00 |
|
Cedric Nugteren
|
13eb772343
|
Integrated pre-processor in compilation flow, default is still disabled
|
2017-11-30 21:32:47 +01:00 |
|
Cedric Nugteren
|
93ffb876c6
|
Reformatted unrollable kernel loops and added the new promote_to_registers pragma for several kernels
|
2017-11-29 20:21:08 +01:00 |
|
Cedric Nugteren
|
0dde6af703
|
Extended the preprocessor tests to include CopyFast and CopyPad
|
2017-11-29 20:18:36 +01:00 |
|
Cedric Nugteren
|
1d35f65cea
|
Improves the array-to-register promotion in the pre-processor
|
2017-11-29 19:53:50 +01:00 |
|
Cedric Nugteren
|
426406668e
|
Improved the pre-processor tester, added GEMV and GER kernels
|
2017-11-28 20:52:47 +01:00 |
|
Cedric Nugteren
|
14047861ce
|
Improved the kernel pre-processor in various ways
|
2017-11-28 20:52:08 +01:00 |
|
Cedric Nugteren
|
35956f9db1
|
Added simple implementation of array-to-register promotion
|
2017-11-27 20:26:30 +01:00 |
|
Cedric Nugteren
|
9c643b293c
|
Improved the for-loop pre-processing
|
2017-11-26 13:32:48 +01:00 |
|
Cedric Nugteren
|
69aa3b35ed
|
Implemented first simple pre-processor: defines parser and loop unrolling based on assumptions
|
2017-11-25 17:46:01 +01:00 |
|
Cedric Nugteren
|
f01bcded1e
|
Moved string splitting functions; added string character removal function
|
2017-11-25 17:44:21 +01:00 |
|
Cedric Nugteren
|
c0c6d00b12
|
Added stub for a preprocessor and a corresponding compilation test
|
2017-11-25 10:24:05 +01:00 |
|
Cedric Nugteren
|
ebce82e650
|
Merge pull request #222 from CNugteren/override_params_from_json
Override params in clients from tuner JSON
|
2017-11-25 09:48:27 +01:00 |
|
Cedric Nugteren
|
d7b29d864a
|
Fixed a Clang compilation error
|
2017-11-24 21:41:45 +01:00 |
|
Cedric Nugteren
|
abb4d5ab32
|
Added tuning results for ARM Mali T760 GPU
|
2017-11-24 21:16:54 +01:00 |
|
Cedric Nugteren
|
a5cef9ef3b
|
Added missing include file
|
2017-11-24 21:11:52 +01:00 |
|
Cedric Nugteren
|
a768b7686b
|
Added precision check to parameter override for the clients
|
2017-11-24 21:09:39 +01:00 |
|
Cedric Nugteren
|
9527c89c30
|
Made parameter override in the clients a command-line argument and added support for multi-kernel routines
|
2017-11-22 20:53:20 +01:00 |
|
Cedric Nugteren
|
8c9ecd9736
|
Implemented first version of reading JSON files from disk in the client to override parameters
|
2017-11-21 22:05:08 +01:00 |
|
Cedric Nugteren
|
606990af6f
|
Made the database script properly handle multiple entries for a single device
|
2017-11-20 21:38:23 +01:00 |
|
Cedric Nugteren
|
0f080bbc6e
|
Potentially fixed an MSVC 2013 issue with a copy-constructor not being generated
|
2017-11-20 20:54:18 +01:00 |
|
Cedric Nugteren
|
e0f3484084
|
Fixes some display issues in the GEMM routine tuner
|
2017-11-20 20:29:52 +01:00 |
|
Cedric Nugteren
|
5467c0cac5
|
Fixed a variety of warnings and an error for MSVC2013 compilation
|
2017-11-19 21:09:24 +01:00 |
|
Cedric Nugteren
|
da76d7ab81
|
Merge pull request #216 from CNugteren/integrated_tuner
Integrated tuner
|
2017-11-19 20:05:15 +01:00 |
|
Cedric Nugteren
|
defad3d1a2
|
Minor fix to the database script
|
2017-11-19 18:19:21 +01:00 |
|
Cedric Nugteren
|
4e0d08c3bc
|
Added compilation timing and better compilation error reporting
|
2017-11-19 16:58:13 +01:00 |
|
Cedric Nugteren
|
a3a8b44f59
|
Some fixes for the new auto-tuner to be compatible with the Python scripts
|
2017-11-19 16:31:08 +01:00 |
|
Cedric Nugteren
|
c6690df896
|
Made the tuners be compiled by default
|
2017-11-19 14:33:25 +01:00 |
|
Cedric Nugteren
|
76d2b7f0b6
|
Revived the GEMM routine tuner; minor formatting changes
|
2017-11-19 12:59:52 +01:00 |
|
Cedric Nugteren
|
8d2f7d53aa
|
Added a library with common tuner sources to speed up compilation
|
2017-11-19 12:59:28 +01:00 |
|
Cedric Nugteren
|
7a54494577
|
Modified the kernel tuners to use the newly integrated auto-tuner
|
2017-11-19 12:58:41 +01:00 |
|
Cedric Nugteren
|
8a5a5e031e
|
Moved some tuning functions from .hpp to .cpp
|
2017-11-17 20:58:36 +01:00 |
|
Cedric Nugteren
|
f94d498a37
|
Moved compilation function to a separate file; removed dependency of the tuners on the CLBlast library
|
2017-11-17 20:57:46 +01:00 |
|
Cedric Nugteren
|
d9cf206979
|
Removed dependency on CLTune
|
2017-11-16 21:28:36 +01:00 |
|
Cedric Nugteren
|
2b8ad70b63
|
Added printing of the best parameters for the new tuner
|
2017-11-16 21:18:29 +01:00 |
|
Cedric Nugteren
|
1b2b46f2f0
|
Added first version of integrated and re-written auto-tuner
|
2017-11-15 22:49:35 +01:00 |
|
Cedric Nugteren
|
0cd78bb6f9
|
Added kernel timing functionality to the utilities
|
2017-11-15 22:47:06 +01:00 |
|
Cedric Nugteren
|
b337bffbaf
|
Added exception handler with catch-all
|
2017-11-15 22:44:44 +01:00 |
|
Cedric Nugteren
|
03ebf14b97
|
Made the exception dispatch function optionally silent
|
2017-11-13 21:11:31 +01:00 |
|
Cedric Nugteren
|
4bac1287f2
|
Moved square-difference utility function for use in the tuners
|
2017-11-13 21:10:44 +01:00 |
|
Cedric Nugteren
|
677afd3b96
|
Factored out the creation of the OpenCL header and the program compilation
|
2017-11-11 16:14:43 +01:00 |
|
Cedric Nugteren
|
c41d219ea4
|
Added tuning results for the GeForce GTX750Ti
|
2017-11-09 21:19:21 +01:00 |
|
Cedric Nugteren
|
5d5e3f93bc
|
Updated to CLBlast version 1.2.0
|
2017-11-08 21:30:06 +01:00 |
|
Cedric Nugteren
|
d24138808b
|
Fixed an FP16 issue in the homatcopy test; added a comment about improper testing of integer returning functions for FP16
|
2017-11-08 21:20:07 +01:00 |
|
Cedric Nugteren
|
b18cc9d3f1
|
Merge pull request #212 from CNugteren/kernel_selection_tuner
GEMM kernel selection tuner
|
2017-11-07 22:20:13 +01:00 |
|
Cedric Nugteren
|
6fe9916231
|
Updated the roadmap
|
2017-11-07 21:35:04 +01:00 |
|