Cedric Nugteren
3d0c227fa5
AMAX/AMIN integer testing and bug fixes (#457)
* Fixed a bug in the XAMAX/XAMIN routines that caused the increment and offset to be included in the result
* Perform proper integer-output testing in XAMAX tests
* A few changes towards getting it ready for a PR
* Also fix compilation for clBLAS and cuBLAS references
* Fix a bug that would only use the real part of complex numbers in the amax/amin routines
* A few small fixes related to the AMAX tests
2023-05-07 20:02:52 +02:00
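The two AMAX/AMIN bug fixes above concern what the routine should return. Below is a minimal C++ sketch of the intended semantics. It assumes a 0-based result index (reference Fortran BLAS returns a 1-based one). The index counts logical elements, so the offset and increment only locate the data. A complex element's magnitude is taken as |Re| + |Im|, following the reference BLAS ICAMAX/IZAMAX convention, rather than its real part alone. `Magnitude` and `ReferenceIamax` are hypothetical names, not CLBlast's API.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Magnitude used for the comparison. Real values use |x|; complex values use
// |Re(x)| + |Im(x)|, the reference BLAS convention (not the real part alone).
inline double Magnitude(float value) { return std::fabs(value); }
inline double Magnitude(double value) { return std::fabs(value); }
inline double Magnitude(const std::complex<float>& value) {
  return std::fabs(value.real()) + std::fabs(value.imag());
}
inline double Magnitude(const std::complex<double>& value) {
  return std::fabs(value.real()) + std::fabs(value.imag());
}

// Hypothetical reference helper: returns the 0-based logical index i of the
// first element with the largest magnitude, where element i is stored at
// x[offset + i * inc]. The offset and increment only locate the data; they
// must not appear in the returned index.
template <typename T>
std::size_t ReferenceIamax(const std::vector<T>& x, std::size_t n,
                           std::size_t offset, std::size_t inc) {
  assert(inc > 0 && "this sketch only handles positive increments");
  assert(n == 0 || offset + (n - 1) * inc < x.size());
  std::size_t best_index = 0;
  double best_magnitude = -1.0;
  for (std::size_t i = 0; i < n; ++i) {
    const double magnitude = Magnitude(x[offset + i * inc]);
    if (magnitude > best_magnitude) {  // strict '>' keeps the first maximum
      best_magnitude = magnitude;
      best_index = i;
    }
  }
  return best_index;
}
```

For the complex vector {(3, 0), (2, 2), (0, -3.5)}, this picks index 1 (magnitude 4). A comparison that uses only the real part would wrongly pick index 0. With offset 1 and increment 2 over {9, 1, 9, -7, 9, 2}, the result is the logical index 1, not the storage position 3.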
Koichi Akabe
d9db543d75
Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm
2018-12-17 21:57:35 +09:00
Cedric Nugteren
d45911b61d
Added groundwork for col2im algorithm plus first non-working version of kernel and test
2018-10-23 20:52:25 +02:00
Cedric Nugteren
2dd539f911
Removed complex numbers support for CONVGEMM
2018-07-29 10:37:14 +02:00
Cedric Nugteren
1c9a741470
Merge branch 'master' into CLBlast-267-convgemm
2018-06-03 15:53:27 +02:00
Cedric Nugteren
38318fa39c
Added maximum time reporting to the client statistics
2018-05-27 11:39:51 +02:00
Cedric Nugteren
c85c385aaf
Added an option in the clients to output timing statistics: minimum, mean, and standard-deviation
2018-05-23 22:36:38 +02:00
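The client timing statistics described here can be sketched as follows. This commit computes the minimum, mean, and standard deviation, and commit 38318fa39c later added the maximum. The sketch assumes the per-run times are already collected in milliseconds. It uses the population standard deviation (dividing by n). Whether the clients divide by n or n - 1 is not stated in this log, so treat that choice as an assumption. `TimingStatistics` and `ComputeTimingStatistics` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical summary of repeated timing runs (all values in milliseconds).
struct TimingStatistics {
  double minimum;
  double maximum;
  double mean;
  double standard_deviation;  // population standard deviation (divides by n)
};

TimingStatistics ComputeTimingStatistics(const std::vector<double>& times_ms) {
  assert(!times_ms.empty() && "need at least one timed run");
  const auto extremes = std::minmax_element(times_ms.begin(), times_ms.end());
  const double count = static_cast<double>(times_ms.size());
  const double mean =
      std::accumulate(times_ms.begin(), times_ms.end(), 0.0) / count;
  double squared_deviations = 0.0;
  for (const double time : times_ms) {
    squared_deviations += (time - mean) * (time - mean);
  }
  return {*extremes.first, *extremes.second, mean,
          std::sqrt(squared_deviations / count)};
}
```

For the times {2, 4, 4, 4, 5, 5, 7, 9}, this gives a minimum of 2, a maximum of 9, a mean of 5, and a standard deviation of 2.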
Cedric Nugteren
b608280361
Fixed the performance client for convgemm and added GFLOPS measurements
2018-05-09 19:59:31 +02:00
Cedric Nugteren
2d1f6ba7fe
Added convgemm skeleton, test infrastructure, and first reference implementation
2018-05-06 11:35:34 +02:00
Cedric Nugteren
ef5008f5e4
Created the API and stubs for the HAD (hadamard-product) routines
2018-01-31 20:41:02 +01:00
Cedric Nugteren
b35e3d1e53
Small improvements to benchmarking for cuBLAS
2018-01-14 19:50:27 +01:00
Cedric Nugteren
9fb2c61b25
Added API and tests for new GemmStridedBatched routine
2018-01-07 14:27:15 +01:00
Cedric Nugteren
eb89371d2b
Added a queue argument to the get-size function when running the tests/clients
2018-01-03 20:19:45 +01:00
Cedric Nugteren
9527c89c30
Made parameter override in the clients a command-line argument and added support for multi-kernel routines
2017-11-22 20:53:20 +01:00
Cedric Nugteren
8c9ecd9736
Implemented first version of reading JSON files from disk in the client to override parameters
2017-11-21 22:05:08 +01:00
Cedric Nugteren
a3069a97c3
Prepared test and client infrastructure for use with the CUDA API
2017-10-15 13:56:19 +02:00
Cedric Nugteren
74fd6767b9
GEMM tests now test both the indirect and the direct kernels separately
2017-10-01 20:36:56 +02:00
Cedric Nugteren
a8c26594d9
Made the im2col client properly handle the arguments
2017-08-23 19:54:09 +02:00
Cedric Nugteren
777681dcbd
Merge branch 'master' into im_to_col
2017-08-12 20:50:00 +02:00
Cedric Nugteren
844e68853e
Moved some utility functions to a test-specific utility compilation-unit
2017-08-12 15:38:17 +02:00
Cedric Nugteren
97bcf77d4b
First step towards supporting im2col in the test infrastructure
2017-07-16 22:33:49 +02:00
Cedric Nugteren
409a5a2ad0
Fixed a namespace clash with CUDA FP16 for the half-datatype
2017-04-17 16:47:15 +02:00
Cedric Nugteren
f7f8ec644f
Fixed CUDA malloc and cuBLAS handles: cuBLAS as a performance-reference now works
2017-04-13 21:31:27 +02:00
Cedric Nugteren
eb1fda2729
In-lined the float2 and double2 types to avoid collision with CUDA's definitions
2017-04-03 21:44:35 +02:00
Cedric Nugteren
b24d364743
Laid the groundwork for cuBLAS comparisons in the clients
2017-04-02 18:06:15 +02:00
Cedric Nugteren
b84d2296b8
Separated host-device and device-host memory copies from execution of the CBLAS reference code; for fair timing and code de-duplication
2017-04-01 13:36:24 +02:00
Cedric Nugteren
0610447a7a
Fixed a compilation issue for GCC/MSVC
2017-03-19 17:37:52 +01:00
Cedric Nugteren
068ff32e9f
Fixed a linker issue for Clang
2017-03-12 10:41:18 +01:00
Cedric Nugteren
49e04c7fce
Added API and test infrastructure for the batched GEMM routine
2017-03-10 21:24:35 +01:00
Cedric Nugteren
fa0a9c689f
Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes
2017-03-08 20:10:20 +01:00
Cedric Nugteren
6aba0bbae7
Minor fixes to the client w.r.t. the addition of the batch count
2017-03-05 16:44:16 +01:00
Cedric Nugteren
cdf354f895
Adjusted the test-infrastructure to support testing of batched-versions of routines
2017-03-05 15:04:16 +01:00
Cedric Nugteren
7f14b11f1e
Changed the way the test-data is generated: now using a single MT generator and distribution for all data
2017-03-05 11:13:47 +01:00
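This commit draws all test data from a single Mersenne Twister engine and a single distribution. Combined with the fixed seed introduced in 6178fcd584, every run draws the same sequence of values. Below is a minimal sketch, assuming single-precision data drawn uniformly from a caller-chosen range. `TestDataGenerator`, its constructor arguments, and the seed value in the usage note are hypothetical. The C++ standard fully specifies the `std::mt19937` output sequence, but not the algorithm behind `std::uniform_real_distribution`. The exact values may therefore differ between standard-library implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical test-data source: every buffer is drawn from ONE Mersenne
// Twister engine and ONE distribution, seeded with a fixed value, so the same
// sequence of Generate() calls yields the same data on every run.
class TestDataGenerator {
 public:
  TestDataGenerator(unsigned int seed, float lower, float upper)
      : engine_(seed), distribution_(lower, upper) {
    assert(lower < upper && "the value range must be non-empty");
  }

  // Draws the next `size` values from the shared engine and distribution.
  std::vector<float> Generate(std::size_t size) {
    std::vector<float> data(size);
    for (float& value : data) {
      value = distribution_(engine_);
    }
    return data;
  }

 private:
  std::mt19937 engine_;
  std::uniform_real_distribution<float> distribution_;
};
```

For example, two `TestDataGenerator(42, -2.0f, 2.0f)` instances return identical buffers for identical sequences of `Generate` calls. Sharing one engine means the values each buffer receives depend on the order in which the buffers are generated.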
Cedric Nugteren
f9a520b3af
Prepared generator for batched routines; added batched AXPY routine interface
2017-03-05 10:38:38 +01:00
Cedric Nugteren
b7310036ed
Removed half-precision support from the TRSM routine; too unstable
2017-02-26 12:56:21 +01:00
Cedric Nugteren
c248f900c0
Merge branch 'development' into triangular_solvers
2017-02-05 22:18:59 +01:00
Ivan Shapovalov
1a1e863ab3
treewide: include clpp11.hpp first to silence deprecation warnings
Otherwise, cl.h gets included through clblast.h before clpp11.hpp.
2017-01-20 17:32:42 +03:00
Cedric Nugteren
4b3ffd9989
Added a first version of the diagonal block invert routine in preparation for TRSM
2017-01-15 17:30:00 +01:00
Cedric Nugteren
39c49bf4f9
Made it possible to use the command-line environment variables for each executable without re-running CMake
2016-11-27 11:00:29 +01:00
Cedric Nugteren
b0ff11acf0
Moved files around a bit; created a utilities subfolder
2016-10-22 15:36:48 +02:00
Cedric Nugteren
6178fcd584
Now generates test/client/tuner data using a fixed seed to enable reproducibility of results
2016-09-27 19:55:21 +02:00
Cedric Nugteren
77325b8974
Added an option to the performance clients to do a warm-up run before timing
2016-07-06 21:25:55 +02:00
Cedric Nugteren
69beca90f4
Moved the performance graph scripts to the 'scripts' subfolder
2016-06-27 11:51:57 +02:00
Cedric Nugteren
61203453aa
Renamed all C++ source files to .cpp to match the .hpp extension better
2016-06-19 13:55:49 +02:00
Cedric Nugteren
f726fbdc9f
Moved all headers into the source tree, changed headers to .hpp extension
2016-06-18 20:20:13 +02:00
Cedric Nugteren
52ccaf5b25
Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing
2016-06-16 18:07:46 +02:00
Cedric Nugteren
4612ff3552
Added possibility to run the performance client with half-precision
2016-05-25 14:37:26 +02:00
Cedric Nugteren
489c5d76cf
Merged in latest changes from 0.7.1 release
2016-05-18 21:32:56 +02:00
cnugteren
16a048f1ac
Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines
2016-04-20 22:12:51 -06:00
cnugteren
894983fc3c
Added prototype for ixAMAX routines
2016-04-20 21:11:33 -06:00