Cedric Nugteren
3d0c227fa5
AMAX/AMIN integer testing and bug fixes (#457)
* Fixed a bug in XAMAX/XMIN routines that caused the increment and offset to be included in the result
* Perform proper integer-output testing in XAMAX tests
* A few changes towards getting it ready for a PR
* Also fix compilation for clBLAS and cuBLAS references
* Fix a bug that would only use the real part of complex numbers in the amax/amin routines
* A few small fixes related to the AMAX tests
2023-05-07 20:02:52 +02:00
Cedric Nugteren
af6a9eedd1
Added a function to set the OpenCL kernel standard, either 1.1 or 1.2
2019-05-11 20:39:00 +02:00
Cedric Nugteren
9cbffc9b7c
Changed back to cl_intel_subgroups as suggested
2019-05-08 22:01:56 +02:00
Cedric Nugteren
c5a82f6978
Added a host-code check to make sure the avc_motion_estimation is available
2019-05-07 20:47:50 +02:00
Koichi Akabe
032e3b0cc0
Add kernel_mode option to im2col, col2im, and convgemm functions
2018-11-12 10:12:07 +09:00
Koichi Akabe
0b3d04f709
Fix col2im implementation
2018-10-30 14:54:55 +09:00
Cedric Nugteren
1c9a741470
Merge branch 'master' into CLBlast-267-convgemm
2018-06-03 15:53:27 +02:00
Cedric Nugteren
c85c385aaf
Added an option in the clients to output timing statistics: minimum, mean, and standard-deviation
2018-05-23 22:36:38 +02:00
Cedric Nugteren
cbcd4ff7e8
Merge branch 'master' into CLBlast-267-convgemm
2018-05-19 17:54:27 +02:00
Cedric Nugteren
b855af681f
Added a canary region for overflow detection to the tuners
2018-05-17 10:45:10 +01:00
Cedric Nugteren
2d1f6ba7fe
Added convgemm skeleton, test infrastructure, and first reference implementation
2018-05-06 11:35:34 +02:00
Cedric Nugteren
2b1e0295e6
Added a define to enable subgroup shuffling if supported by the device
2018-04-24 20:41:15 +02:00
Cedric Nugteren
82467b64c4
Fixed a missing include
2017-12-10 14:49:38 +01:00
Cedric Nugteren
f01bcded1e
Moved string splitting functions; added string character removal function
2017-11-25 17:44:21 +01:00
Cedric Nugteren
9527c89c30
Made parameter override in the clients a command-line argument and added support for multi-kernel routines
2017-11-22 20:53:20 +01:00
Cedric Nugteren
1b2b46f2f0
Added first version of integrated and re-written auto-tuner
2017-11-15 22:49:35 +01:00
Cedric Nugteren
4bac1287f2
Moved square-difference utility function for use in the tuners
2017-11-13 21:10:44 +01:00
Cedric Nugteren
b901809345
Added first (untested) version of a CUDA API
2017-10-11 23:16:57 +02:00
Cedric Nugteren
3598762029
Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs
2017-10-08 10:29:47 +02:00
Cedric Nugteren
bcf39eb79a
Fixed a compilation error and warning under MacOS
2017-09-16 18:34:11 +02:00
Cedric Nugteren
0d13d814c2
Added architecture layer in the tuning database for better performance on unseen devices
2017-09-14 21:27:33 +02:00
Cedric Nugteren
76382ff6c1
Added the new vendor-architecture-name hierarchy to the tuners as well
2017-09-10 16:34:54 +02:00
Cedric Nugteren
91ea7fcde2
Introduced the notion of a device-architecture for the database and added device and architecture name mappings
2017-09-08 21:09:05 +02:00
Cedric Nugteren
161fd8514d
Merge branch 'master' into im_to_col
2017-08-24 21:15:14 +02:00
Cedric Nugteren
a8c26594d9
Made the im2col client properly handle the arguments
2017-08-23 19:54:09 +02:00
Cedric Nugteren
e5eb6b1d3a
Merge pull request #173 from mcian/PSO_params
Add PSO parameters support and search strategy selection from command…
2017-08-21 20:06:29 +02:00
mcian
dfd332524a
Remove multistrategy and related functions
2017-08-21 14:09:11 +02:00
Cedric Nugteren
777681dcbd
Merge branch 'master' into im_to_col
2017-08-12 20:50:00 +02:00
Cedric Nugteren
844e68853e
Moved some utility functions to a test-specific utility compilation-unit
2017-08-12 15:38:17 +02:00
mcian
473e814718
Code refactoring
2017-07-23 14:48:13 +02:00
mcian
8131e68664
Add PSO parameters support and search strategy selection from command line
2017-07-17 12:00:25 +02:00
Cedric Nugteren
97bcf77d4b
First step towards supporting im2col in the test infrastructure
2017-07-16 22:33:49 +02:00
Cedric Nugteren
f7a16d427c
Fixed a compilation issue under MSVC 2013
2017-05-26 22:10:56 +02:00
Cedric Nugteren
409a5a2ad0
Fixed a namespace clash with CUDA FP16 for the half-datatype
2017-04-17 16:47:15 +02:00
Cedric Nugteren
f7f8ec644f
Fixed CUDA malloc and cuBLAS handles: cuBLAS as a performance-reference now works
2017-04-13 21:31:27 +02:00
Cedric Nugteren
b24d364743
Laid the groundwork for cuBLAS comparisons in the clients
2017-04-02 18:06:15 +02:00
Cedric Nugteren
b84d2296b8
Separated host-device and device-host memory copies from execution of the CBLAS reference code, for fair timing and code de-duplication
2017-04-01 13:36:24 +02:00
Cedric Nugteren
d754586b49
Added proper testing of the alpha parameter; finalized the batched AXPY implementation
2017-03-10 20:49:59 +01:00
Cedric Nugteren
fa0a9c689f
Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes
2017-03-08 20:10:20 +01:00
Cedric Nugteren
6aba0bbae7
Minor fixes to the client w.r.t. the addition of the batch count
2017-03-05 16:44:16 +01:00
Cedric Nugteren
cdf354f895
Adjusted the test-infrastructure to support testing of batched-versions of routines
2017-03-05 15:04:16 +01:00
Cedric Nugteren
7f14b11f1e
Changed the way the test-data is generated: now using a single MT generator and distribution for all data
2017-03-05 11:13:47 +01:00
Cedric Nugteren
e993ee077b
Added a proper data-preparation function for the TRSM tests
2017-03-04 15:21:33 +01:00
Cedric Nugteren
e47d95887c
Added PrepareData function for TRSM to create proper test input
2017-02-25 12:23:04 +01:00
Cedric Nugteren
133ebfc834
Added data-preparation function for the TRSV tests and special nan/inf checks in the error checking to make the tests pass
2017-02-19 17:43:26 +01:00
Cedric Nugteren
c248f900c0
Merge branch 'development' into triangular_solvers
2017-02-05 22:18:59 +01:00
Ivan Shapovalov
1a1e863ab3
treewide: include clpp11.hpp first to silence deprecation warnings
Otherwise, cl.h gets included through clblast.h before clpp11.hpp.
2017-01-20 17:32:42 +03:00
Cedric Nugteren
df9a77d74d
Added first version of the TRSM routine based on the diagonal invert kernel
2017-01-18 21:29:59 +01:00
Cedric Nugteren
4b3ffd9989
Added a first version of the diagonal block invert routine in preparation for TRSM
2017-01-15 17:30:00 +01:00
Cedric Nugteren
39c49bf4f9
Made it possible to use the command-line environment variables for each executable and without re-running CMake
2016-11-27 11:00:29 +01:00