Cedric Nugteren
|
0f9637bbac
|
Improved array-to-register promotion, now handling function calls as well
|
2017-12-05 20:39:49 +01:00 |
|
Cedric Nugteren
|
cf4555d1f4
|
Added GEMM (direct and in-direct) to the pre-processor testing; modified the loops in kernel accordingly
|
2017-12-03 16:40:36 +01:00 |
|
Cedric Nugteren
|
0a1a3de58a
|
Added basic bracket parsing in defines and loop expressions
|
2017-12-03 16:39:22 +01:00 |
|
Cedric Nugteren
|
60312e5878
|
Reformatted transpose kernels for the pre-processor; extended the number of tests
|
2017-12-03 12:00:37 +01:00 |
|
Cedric Nugteren
|
92842024b0
|
Improved array to register promotion in the pre-processor
|
2017-12-03 11:59:38 +01:00 |
|
Cedric Nugteren
|
bf7aeb8d5b
|
Improved the pre-processor's handling of defines; added a special nested defines test
|
2017-11-30 21:43:16 +01:00 |
|
Cedric Nugteren
|
13eb772343
|
Integrated pre-processor in compilation flow, default is still disabled
|
2017-11-30 21:32:47 +01:00 |
|
Cedric Nugteren
|
93ffb876c6
|
Reformatted unrollable kernel loops and added the new promote_to_registers pragma for several kernels
|
2017-11-29 20:21:08 +01:00 |
|
Cedric Nugteren
|
0dde6af703
|
Extended the preprocessor tests to include CopyFast and CopyPad
|
2017-11-29 20:18:36 +01:00 |
|
Cedric Nugteren
|
1d35f65cea
|
Improves the array-to-register promotion in the pre-processor
|
2017-11-29 19:53:50 +01:00 |
|
Cedric Nugteren
|
14047861ce
|
Improved the kernel pre-processor in various ways
|
2017-11-28 20:52:08 +01:00 |
|
Cedric Nugteren
|
35956f9db1
|
Added simple implementation of array-to-register promotion
|
2017-11-27 20:26:30 +01:00 |
|
Cedric Nugteren
|
9c643b293c
|
Improved the for-loop pre-processing
|
2017-11-26 13:32:48 +01:00 |
|
Cedric Nugteren
|
69aa3b35ed
|
Implemented first simple pre-processor: defines parser and loop unrolling based on assumptions
|
2017-11-25 17:46:01 +01:00 |
|
Cedric Nugteren
|
f01bcded1e
|
Moved string splitting functions; added string character removal function
|
2017-11-25 17:44:21 +01:00 |
|
Cedric Nugteren
|
c0c6d00b12
|
Added stub for a preprocessor and a corresponding compilation test
|
2017-11-25 10:24:05 +01:00 |
|
Cedric Nugteren
|
ebce82e650
|
Merge pull request #222 from CNugteren/override_params_from_json
Override params in clients from tuner JSON
|
2017-11-25 09:48:27 +01:00 |
|
Cedric Nugteren
|
abb4d5ab32
|
Added tuning results for ARM Mali T760 GPU
|
2017-11-24 21:16:54 +01:00 |
|
Cedric Nugteren
|
9527c89c30
|
Made parameter override in the clients a command-line argument and added support for multi-kernel routines
|
2017-11-22 20:53:20 +01:00 |
|
Cedric Nugteren
|
0f080bbc6e
|
Potentially fixed an MSVC 2013 issue with a copy-constructor not being generated
|
2017-11-20 20:54:18 +01:00 |
|
Cedric Nugteren
|
e0f3484084
|
Fixes some displaying issues in the GEMM routine tuner
|
2017-11-20 20:29:52 +01:00 |
|
Cedric Nugteren
|
5467c0cac5
|
Fixed a variety of warnings and an error for MSVC2013 compilation
|
2017-11-19 21:09:24 +01:00 |
|
Cedric Nugteren
|
4e0d08c3bc
|
Added compilation timing and better compilation error reporting
|
2017-11-19 16:58:13 +01:00 |
|
Cedric Nugteren
|
a3a8b44f59
|
Some fixes for the new auto-tuner to be compatible with the Python scripts
|
2017-11-19 16:31:08 +01:00 |
|
Cedric Nugteren
|
76d2b7f0b6
|
Revived the GEMM routine tuner; minor formatting changes
|
2017-11-19 12:59:52 +01:00 |
|
Cedric Nugteren
|
7a54494577
|
Modified the kernel tuners to use the newly integrated auto-tuner
|
2017-11-19 12:58:41 +01:00 |
|
Cedric Nugteren
|
8a5a5e031e
|
Moved some tuning functions from .hpp to .cpp
|
2017-11-17 20:58:36 +01:00 |
|
Cedric Nugteren
|
f94d498a37
|
Moved compilation function to separate file; removed the tuners' dependency on the CLBlast library
|
2017-11-17 20:57:46 +01:00 |
|
Cedric Nugteren
|
2b8ad70b63
|
Added printing of the best parameters for the new tuner
|
2017-11-16 21:18:29 +01:00 |
|
Cedric Nugteren
|
1b2b46f2f0
|
Added first version of integrated and re-written auto-tuner
|
2017-11-15 22:49:35 +01:00 |
|
Cedric Nugteren
|
0cd78bb6f9
|
Added kernel timing functionality to the utilities
|
2017-11-15 22:47:06 +01:00 |
|
Cedric Nugteren
|
b337bffbaf
|
Added exception handle with catch-all
|
2017-11-15 22:44:44 +01:00 |
|
Cedric Nugteren
|
03ebf14b97
|
Made the exception dispatch function optionally silent
|
2017-11-13 21:11:31 +01:00 |
|
Cedric Nugteren
|
4bac1287f2
|
Moved square-difference utility function for use in the tuners
|
2017-11-13 21:10:44 +01:00 |
|
Cedric Nugteren
|
677afd3b96
|
Factored out the creation of the OpenCL header and the program compilation
|
2017-11-11 16:14:43 +01:00 |
|
Cedric Nugteren
|
c41d219ea4
|
Added tuning results for the GeForce GTX750Ti
|
2017-11-09 21:19:21 +01:00 |
|
Cedric Nugteren
|
b18cc9d3f1
|
Merge pull request #212 from CNugteren/kernel_selection_tuner
GEMM kernel selection tuner
|
2017-11-07 22:20:13 +01:00 |
|
Cedric Nugteren
|
3ec0be6fb8
|
Added various GEMM routine tuning results
|
2017-11-07 21:34:54 +01:00 |
|
Cedric Nugteren
|
33ac2b0175
|
Improved the way the database defaults are computed
|
2017-11-06 21:59:45 +01:00 |
|
Cedric Nugteren
|
34a33b54cf
|
Changed GEMM routine tuner's scoring to use L2 measure instead for better averaging
|
2017-11-06 20:50:36 +01:00 |
|
Cedric Nugteren
|
9b0a435fb0
|
Integrated the GEMM routine tuner for kernel selection; added first tuning results
|
2017-11-02 21:47:14 +01:00 |
|
Cedric Nugteren
|
73272ab97d
|
Fixed a bug in database compression/decompression
|
2017-11-02 21:19:18 +01:00 |
|
Cedric Nugteren
|
5c90577dfd
|
Added collecting and printing of scores for the kernel-selection tuner
|
2017-10-30 20:39:21 +01:00 |
|
Cedric Nugteren
|
ac5a58cfe5
|
Added platform ID to the binary program cache to prevent issues with multi-platform systems
|
2017-10-29 20:01:30 +01:00 |
|
Cedric Nugteren
|
319762f150
|
Added Android support using the GNU C++ STL library and the GCC toolchain
|
2017-10-29 12:07:07 +01:00 |
|
Cedric Nugteren
|
12b08ae491
|
Merge branch 'master' into android_support
|
2017-10-28 17:32:37 +02:00 |
|
Cedric Nugteren
|
334a26eb12
|
Added initial version of a GEMM kernel selection tuner
|
2017-10-28 17:30:29 +02:00 |
|
Cedric Nugteren
|
bd57dfa435
|
Moved timing function to a separate file
|
2017-10-28 14:12:05 +02:00 |
|
Cedric Nugteren
|
fa6e5e67f5
|
Fixed a bug when using the matrix A-offset argument for the TRSM routine
|
2017-10-27 22:12:30 +02:00 |
|
Cedric Nugteren
|
449577cf07
|
Reduced TRSM block-size for better numerical stability
|
2017-10-27 22:07:43 +02:00 |
|
Cedric Nugteren
|
44f7fa628a
|
Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM
|
2017-10-27 22:01:15 +02:00 |
|
Cedric Nugteren
|
d49aae236e
|
Fixed a bug in TRSM routine due to missing event synchronisations after GEMM calls
|
2017-10-25 20:35:39 +02:00 |
|
Cedric Nugteren
|
472f90501c
|
Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570
|
2017-10-20 18:06:12 +02:00 |
|
Cedric Nugteren
|
363568787e
|
Moved CUmodule code from Kernel to Program class to not require re-compilation every time
|
2017-10-18 18:17:30 +02:00 |
|
Cedric Nugteren
|
9d879c949a
|
Fix an incompatibility with CUDA's FP16 definition
|
2017-10-17 20:29:23 +02:00 |
|
Cedric Nugteren
|
b1270f04b8
|
Made buffers of batched routines read/write (was: read-only)
|
2017-10-17 19:56:47 +02:00 |
|
Cedric Nugteren
|
f349731d54
|
CUDA kernel compilation fixes
|
2017-10-17 19:53:09 +02:00 |
|
Cedric Nugteren
|
0719f14486
|
Made all CUDA kernel launches synchronous; removed exception raising
|
2017-10-16 21:54:42 +02:00 |
|
Cedric Nugteren
|
d62823f067
|
Added a missing OpenCL-to-CUDA function translation
|
2017-10-15 19:53:52 +02:00 |
|
Cedric Nugteren
|
7663cba234
|
Fixes for the CUDA API: first tests pass and the client runs
|
2017-10-15 17:43:20 +02:00 |
|
Cedric Nugteren
|
71049e8d39
|
Added the SM-compute-arch version to the nv compile options
|
2017-10-15 17:41:44 +02:00 |
|
Cedric Nugteren
|
7408da174c
|
Various fixes to make the first CUDA examples work
|
2017-10-15 12:17:35 +02:00 |
|
Cedric Nugteren
|
55a802c63d
|
Fixed a kernel/attribute order bug in the direct GEMM kernels
|
2017-10-14 17:21:34 +02:00 |
|
Cedric Nugteren
|
b06bc01da9
|
Make local memory pointers a define in OpenCL; some fixes to the recently changed transpose kernel code
|
2017-10-14 17:13:54 +02:00 |
|
Cedric Nugteren
|
d9456306e0
|
Made transpose kernel struct init proper according to the C standard
|
2017-10-14 16:48:06 +02:00 |
|
Cedric Nugteren
|
313fc796b2
|
Fixed several (not all) CUDA kernel compilation issues
|
2017-10-14 16:01:12 +02:00 |
|
Cedric Nugteren
|
54d0c440ce
|
Various fixes to make the host code and sample compile with the CUDA API
|
2017-10-14 11:43:57 +02:00 |
|
Cedric Nugteren
|
2d7b648a24
|
Added OpenCL to CUDA translation header for the kernels
|
2017-10-14 10:49:25 +02:00 |
|
Cedric Nugteren
|
cc5b475425
|
CUDA API now takes context and device in instead of stream
|
2017-10-12 12:20:43 +02:00 |
|
Cedric Nugteren
|
b901809345
|
Added first (untested) version of a CUDA API
|
2017-10-11 23:16:57 +02:00 |
|
Cedric Nugteren
|
44246053a5
|
Removed include of clpp11.hpp in places other than utilities.hpp
|
2017-10-09 19:41:40 +02:00 |
|
Cedric Nugteren
|
df3c9f4a8a
|
Moved non-routine-specific API functions and includes to separate files
|
2017-10-08 21:52:02 +02:00 |
|
Cedric Nugteren
|
3598762029
|
Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs
|
2017-10-08 10:29:47 +02:00 |
|
Cedric Nugteren
|
6d3e1212f0
|
Synchronizes clpp11.h with CLCudaAPI 9.0
|
2017-10-07 18:43:29 +02:00 |
|
Cedric Nugteren
|
86b80cdc98
|
Fixed a small typo
|
2017-10-07 18:39:32 +02:00 |
|
Cedric Nugteren
|
375193fe4e
|
GEMM in-direct implementation now uses only 1 larger temporary buffer instead of up to 3 optional ones
|
2017-10-03 21:55:21 +02:00 |
|
Cedric Nugteren
|
6b226028d5
|
Allow OverrideParameters function to work before a kernel was first used
|
2017-10-01 20:32:39 +02:00 |
|
Cedric Nugteren
|
1009303717
|
Merge branch 'additional_tuners'
|
2017-09-30 21:04:32 +02:00 |
|
Cedric Nugteren
|
c151ab1325
|
Refactored the tuning architecture: less duplication now; more defaults
|
2017-09-30 20:26:26 +02:00 |
|
Cedric Nugteren
|
00b5771477
|
Added Android header for compilation with gnustl STL
|
2017-09-26 21:20:01 +02:00 |
|
Cedric Nugteren
|
21af690472
|
Added missing headers
|
2017-09-26 21:17:55 +02:00 |
|
Cedric Nugteren
|
ed980a1df1
|
Updated database override function to work with the new database storage format
|
2017-09-24 15:44:14 +02:00 |
|
Cedric Nugteren
|
255f09843c
|
Made program and binary databases dependent on the routine parameters on top of the name
|
2017-09-23 20:40:38 +02:00 |
|
Cedric Nugteren
|
890281f3e8
|
Made database-caching no longer dependent on device name but on device/platform IDs
|
2017-09-23 17:50:44 +02:00 |
|
Cedric Nugteren
|
ae1eeb4d1f
|
Fixed type conversion warnings under MSVC 2013
|
2017-09-19 19:44:34 +02:00 |
|
Cedric Nugteren
|
1d2ee29cb9
|
Fixed compilation issues of the database for MSVC 2013
|
2017-09-19 19:44:05 +02:00 |
|
Cedric Nugteren
|
a23cd8d13a
|
Updated README with proper AMD device names; fixed device look-up for names of length 50+
|
2017-09-16 21:26:38 +02:00 |
|
Cedric Nugteren
|
0802e3d84c
|
Added tuning results for Intel Core i7 6770HQ
|
2017-09-16 21:19:06 +02:00 |
|
Cedric Nugteren
|
bcf39eb79a
|
Fixed a compilation error and warning under MacOS
|
2017-09-16 18:34:11 +02:00 |
|
Cedric Nugteren
|
163474e171
|
Fixed an issue with the NVIDIA compute capability not being retrieved properly
|
2017-09-16 18:25:23 +02:00 |
|
Cedric Nugteren
|
4e317f5e85
|
Improved compilation time of the tuner database
|
2017-09-16 18:02:37 +02:00 |
|
Cedric Nugteren
|
c21878ecce
|
Added a guard against missing AMD and NVIDIA extensions
|
2017-09-14 21:58:08 +02:00 |
|
Cedric Nugteren
|
0d13d814c2
|
Added architecture layer in the tuning database for better performance on unseen devices
|
2017-09-14 21:27:33 +02:00 |
|
Cedric Nugteren
|
76382ff6c1
|
Added the new vendor-architecture-name hierarchy to the tuners as well
|
2017-09-10 16:34:54 +02:00 |
|
Cedric Nugteren
|
91ea7fcde2
|
Introduced the notion of a device-architecture for the database and added device and architecture name mappings
|
2017-09-08 21:09:05 +02:00 |
|
Cedric Nugteren
|
20da5e33a8
|
Split the database files over multiple directories and files; first step towards separate compilation
|
2017-09-06 21:50:42 +02:00 |
|
Cedric Nugteren
|
8905da259d
|
Fixed a modulo and division issue manifesting on Apple OpenCL for im2col
|
2017-09-05 18:49:23 +02:00 |
|
Cedric Nugteren
|
28462aa050
|
Removed an assumption that the 'default' tuning parameters have to be stored last; this is no longer needed
|
2017-09-04 17:39:57 +02:00 |
|
Cedric Nugteren
|
297159d5b9
|
Fixed a bug in im2col: process only valid channel IDs
|
2017-08-31 21:58:12 +02:00 |
|
Cedric Nugteren
|
6194d43efb
|
Fixed a bug in im2col confusing first and second workgroup size; made im2col kernel 2d instead of 3d
|
2017-08-31 20:34:10 +02:00 |
|