Commit graph

94 commits

Author SHA1 Message Date
Koichi Akabe d9db543d75 Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm 2018-12-17 21:57:35 +09:00
Cedric Nugteren d45911b61d Added groundwork for col2im algorithm plus first non-working version of kernel and test 2018-10-23 20:52:25 +02:00
Cedric Nugteren 2dd539f911 Removed complex numbers support for CONVGEMM 2018-07-29 10:37:14 +02:00
Cedric Nugteren 1c9a741470 Merge branch 'master' into CLBlast-267-convgemm 2018-06-03 15:53:27 +02:00
Cedric Nugteren 38318fa39c Added maximum time reporting to the client statistics 2018-05-27 11:39:51 +02:00
Cedric Nugteren c85c385aaf Added an option in the clients to output timing statistics: minimum, mean, and standard-deviation 2018-05-23 22:36:38 +02:00
Cedric Nugteren b608280361 Fixed the performance client for convgemm and added GFLOPS measurements 2018-05-09 19:59:31 +02:00
Cedric Nugteren 2d1f6ba7fe Added convgemm skeleton, test infrastructure, and first reference implementation 2018-05-06 11:35:34 +02:00
Cedric Nugteren ef5008f5e4 Created the API and stubs for the HAD (hadamard-product) routines 2018-01-31 20:41:02 +01:00
Cedric Nugteren b35e3d1e53 Small improvements to benchmarking for cuBLAS 2018-01-14 19:50:27 +01:00
Cedric Nugteren 9fb2c61b25 Added API and tests for new GemmStridedBatched routine 2018-01-07 14:27:15 +01:00
Cedric Nugteren eb89371d2b Added a queue argument to the get-size function when running the tests/clients 2018-01-03 20:19:45 +01:00
Cedric Nugteren 9527c89c30 Made parameter override in the clients a command-line argument and added support for multi-kernel routines 2017-11-22 20:53:20 +01:00
Cedric Nugteren 8c9ecd9736 Implemented first version of reading JSON files from disk in the client to override parameters 2017-11-21 22:05:08 +01:00
Cedric Nugteren a3069a97c3 Prepared test and client infrastructure for use with the CUDA API 2017-10-15 13:56:19 +02:00
Cedric Nugteren 74fd6767b9 GEMM tests now test both the in-direct and the direct kernels seperately 2017-10-01 20:36:56 +02:00
Cedric Nugteren a8c26594d9 Made the im2col client properly handle the arguments 2017-08-23 19:54:09 +02:00
Cedric Nugteren 777681dcbd Merge branch 'master' into im_to_col 2017-08-12 20:50:00 +02:00
Cedric Nugteren 844e68853e Moved some utility functions to a test-specific utility compilation-unit 2017-08-12 15:38:17 +02:00
Cedric Nugteren 97bcf77d4b First step towards supporting im2col in the test infrastructure 2017-07-16 22:33:49 +02:00
Cedric Nugteren 409a5a2ad0 Fixed a namespace clash with CUDA FP16 for the half-datatype 2017-04-17 16:47:15 +02:00
Cedric Nugteren f7f8ec644f Fixed CUDA malloc and cuBLAS handles: cuBLAS as a performance-reference now works 2017-04-13 21:31:27 +02:00
Cedric Nugteren eb1fda2729 In-lined the float2 and double2 types to avoid collision with CUDA's definitions 2017-04-03 21:44:35 +02:00
Cedric Nugteren b24d364743 Layed the groundwork for cuBLAS comparisons in the clients 2017-04-02 18:06:15 +02:00
Cedric Nugteren b84d2296b8 Separated host-device and device-host memory copies from execution of the CBLAS reference code; for fair timing and code de-duplication 2017-04-01 13:36:24 +02:00
Cedric Nugteren 0610447a7a Fixed a compilation issue for GCC/MSVC 2017-03-19 17:37:52 +01:00
Cedric Nugteren 068ff32e9f Fixed a linker issue for Clang 2017-03-12 10:41:18 +01:00
Cedric Nugteren 49e04c7fce Added API and test infrastructure for the batched GEMM routine 2017-03-10 21:24:35 +01:00
Cedric Nugteren fa0a9c689f Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes 2017-03-08 20:10:20 +01:00
Cedric Nugteren 6aba0bbae7 Minor fixes to the client w.r.t. the addition of the batch count 2017-03-05 16:44:16 +01:00
Cedric Nugteren cdf354f895 Adjusted the test-infrastructure to support testing of batched-versions of routines 2017-03-05 15:04:16 +01:00
Cedric Nugteren 7f14b11f1e Changed the way the test-data is generated: now using a single MT generator and distribution for all data 2017-03-05 11:13:47 +01:00
Cedric Nugteren f9a520b3af Prepared generator for batched routines; added batched AXPY routine interface 2017-03-05 10:38:38 +01:00
Cedric Nugteren b7310036ed Removed half-precision support from the TRSM routine; too unstable 2017-02-26 12:56:21 +01:00
Cedric Nugteren c248f900c0 Merge branch 'development' into triangular_solvers 2017-02-05 22:18:59 +01:00
Ivan Shapovalov 1a1e863ab3 treewide: include clpp11.hpp first to silence deprecation warnings
Otherwise, cl.h gets included through clblast.h before clpp11.hpp.
2017-01-20 17:32:42 +03:00
Cedric Nugteren 4b3ffd9989 Added a first version of the diagonal block invert routine in preparation of TRSM 2017-01-15 17:30:00 +01:00
Cedric Nugteren 39c49bf4f9 Made it possible to use the command-line environmental variables for each executable and without re-running CMake 2016-11-27 11:00:29 +01:00
Cedric Nugteren b0ff11acf0 Moved files around a bit; created a utilities subfolder 2016-10-22 15:36:48 +02:00
Cedric Nugteren 6178fcd584 Now generates test/client/tuner data using a fixed seed to enable reproducability of results 2016-09-27 19:55:21 +02:00
Cedric Nugteren 77325b8974 Added an option to the performance clients to do a warm-up run before timing 2016-07-06 21:25:55 +02:00
Cedric Nugteren 69beca90f4 Moved the performance graph scripts to the 'scripts' subfolder 2016-06-27 11:51:57 +02:00
Cedric Nugteren 61203453aa Renamed all C++ source files to .cpp to match the .hpp extension better 2016-06-19 13:55:49 +02:00
Cedric Nugteren f726fbdc9f Moved all headers into the source tree, changed headers to .hpp extension 2016-06-18 20:20:13 +02:00
Cedric Nugteren 52ccaf5b25 Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing 2016-06-16 18:07:46 +02:00
Cedric Nugteren 4612ff3552 Added possibility to run the performance client with half-precision 2016-05-25 14:37:26 +02:00
Cedric Nugteren 489c5d76cf Merged in latest changes from 0.7.1 release 2016-05-18 21:32:56 +02:00
cnugteren 16a048f1ac Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines 2016-04-20 22:12:51 -06:00
cnugteren 894983fc3c Added prototype for ixAMAX routines 2016-04-20 21:11:33 -06:00
cnugteren 8be99de82d Added support for the SASUM/DASUM/ScASUM/DzASUM routines 2016-04-14 19:58:26 -06:00