Cedric Nugteren | 8290ad78b9 | Fixed a few issues with canary region testing | 2018-05-17 12:16:32 +02:00
Cedric Nugteren | 69ed46c8da | Implemented the XHAD Hadamard product routine | 2018-02-02 21:18:37 +01:00
Cedric Nugteren | ef5008f5e4 | Created the API and stubs for the HAD (hadamard-product) routines | 2018-01-31 20:41:02 +01:00
Cedric Nugteren | 9fb2c61b25 | Added API and tests for new GemmStridedBatched routine | 2018-01-07 14:27:15 +01:00
Cedric Nugteren | 00687f8d81 | Prevented half-precision batched routines from failing in the tests | 2018-01-06 19:26:38 +01:00
Cedric Nugteren | ce069545d4 | Added CUDA interface to get temporary-buffer size for GEMM routine | 2018-01-06 10:05:28 +01:00
Cedric Nugteren | 5315b982a9 | Added the temp-buffer to the GEMM testers and clients | 2018-01-03 20:20:31 +01:00
Cedric Nugteren | eb89371d2b | Added a queue argument to the get-size function when running the tests/clients | 2018-01-03 20:19:45 +01:00
Cedric Nugteren | ef71d8e9b5 | Fixed unused variable warnings showing up with Clang | 2017-12-23 16:07:26 +01:00
Cedric Nugteren | 5467c0cac5 | Fixed a variety of warnings and an error for MSVC2013 compilation | 2017-11-19 21:09:24 +01:00
Cedric Nugteren | d24138808b | Fixed an FP16 issue in the homatcopy test; added a comment about improper testing of integer-returning functions for FP16 | 2017-11-08 21:20:07 +01:00
Cedric Nugteren | 9b0a435fb0 | Integrated the GEMM routine tuner for kernel selection; added first tuning results | 2017-11-02 21:47:14 +01:00
Cedric Nugteren | e388f055f7 | Fixed small bug in (unused) invert tester | 2017-10-25 20:35:39 +02:00
Cedric Nugteren | 8431a165d0 | Fixed a small copy-paste typo | 2017-10-15 19:38:48 +02:00
Cedric Nugteren | e6da575fff | Modified test interfaces such that they support either OpenCL or CUDA | 2017-10-15 19:35:21 +02:00
Cedric Nugteren | 7663cba234 | Fixes for the CUDA API: first tests pass and the client runs | 2017-10-15 17:43:20 +02:00
Cedric Nugteren | a3069a97c3 | Prepared test and client infrastructure for use with the CUDA API | 2017-10-15 13:56:19 +02:00
Cedric Nugteren | 74fd6767b9 | GEMM tests now test both the indirect and the direct kernels separately | 2017-10-01 20:36:56 +02:00
Cedric Nugteren | 6194d43efb | Fixed a bug in im2col confusing first and second workgroup size; made im2col kernel 2d instead of 3d | 2017-08-31 20:34:10 +02:00
Cedric Nugteren | a8c26594d9 | Made the im2col client properly handle the arguments | 2017-08-23 19:54:09 +02:00
Cedric Nugteren | 132e62892d | Implemented proper im2col reference function and completed tests | 2017-08-19 16:55:09 +02:00
Cedric Nugteren | 777681dcbd | Merge branch 'master' into im_to_col | 2017-08-12 20:50:00 +02:00
Cedric Nugteren | 844e68853e | Moved some utility functions to a test-specific utility compilation-unit | 2017-08-12 15:38:17 +02:00
Cedric Nugteren | 97bcf77d4b | First step towards supporting im2col in the test infrastructure | 2017-07-16 22:33:49 +02:00
Cedric Nugteren | f77b48692b | Relaxed requirement on a_ld and b_ld for batched GEMM | 2017-07-12 21:53:39 +02:00
Cedric Nugteren | ce528a9d39 | Fixed and suppressed several warnings for MSVC | 2017-06-26 21:38:04 +02:00
Cedric Nugteren | 93c8db7fe7 | Bug-fix in the half-precision test of the amax routine | 2017-05-11 22:19:15 -07:00
Cedric Nugteren | 049d0fc95a | Fixed a compiler warning message | 2017-04-23 10:45:08 +02:00
Cedric Nugteren | f7f8ec644f | Fixed CUDA malloc and cuBLAS handles: cuBLAS as a performance-reference now works | 2017-04-13 21:31:27 +02:00
Cedric Nugteren | f24c142948 | Made compilation of the cuBLAS wrapper work properly | 2017-04-11 21:50:18 +02:00
Cedric Nugteren | 6b625f8915 | Added reference implementations for performance-testing against cuBLAS | 2017-04-10 22:54:14 +02:00
Cedric Nugteren | af9a521042 | Fixes the CUDA wrapper (now actually tested on a system with CUDA) | 2017-04-03 21:46:07 +02:00
Cedric Nugteren | c5461d77e5 | Factored out inclusion of clBLAS and CBLAS from the test-routine files | 2017-04-02 15:24:21 +02:00
Cedric Nugteren | a9c25e9fd2 | Factored out inclusion of clBLAS and CBLAS from the test-routine files | 2017-04-02 15:21:19 +02:00
Cedric Nugteren | b84d2296b8 | Separated host-device and device-host memory copies from execution of the CBLAS reference code; for fair timing and code de-duplication | 2017-04-01 13:36:24 +02:00
Cedric Nugteren | 49e04c7fce | Added API and test infrastructure for the batched GEMM routine | 2017-03-10 21:24:35 +01:00
Cedric Nugteren | 3846f44eaf | Small fix for a file that isn't currently compiled anymore | 2017-03-10 20:53:20 +01:00
Cedric Nugteren | d754586b49 | Added proper testing of the alpha parameter; finalized the batched AXPY implementation | 2017-03-10 20:49:59 +01:00
Cedric Nugteren | fa0a9c689f | Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes | 2017-03-08 20:10:20 +01:00
Cedric Nugteren | 6aba0bbae7 | Minor fixes to the client w.r.t. the addition of the batch count | 2017-03-05 16:44:16 +01:00
Cedric Nugteren | b114ea49a9 | Added first naive version of the batched AXPY routine | 2017-03-05 15:06:14 +01:00
Cedric Nugteren | cdf354f895 | Adjusted the test-infrastructure to support testing of batched-versions of routines | 2017-03-05 15:04:16 +01:00
Cedric Nugteren | 7f14b11f1e | Changed the way the test-data is generated: now using a single MT generator and distribution for all data | 2017-03-05 11:13:47 +01:00
Cedric Nugteren | 37228c9098 | Fixed a missing include for the tests | 2017-03-04 20:45:39 +01:00
Cedric Nugteren | e993ee077b | Added a proper data-preparation function for the TRSM tests | 2017-03-04 15:21:33 +01:00
Cedric Nugteren | a145890aaa | Added a guard against invalid buffer sizes in the prepare-data functions for tests | 2017-02-26 14:37:29 +01:00
Cedric Nugteren | e47d95887c | Added PrepareData function for TRSM to create proper test input | 2017-02-25 12:23:04 +01:00
Cedric Nugteren | 133ebfc834 | Added data-preparation function for the TRSV tests and special nan/inf checks in the error checking to make the tests pass | 2017-02-19 17:43:26 +01:00
Cedric Nugteren | a5fd2323b6 | Added prototype for the TRSV routine | 2017-01-20 11:30:32 +01:00
Cedric Nugteren | df9a77d74d | Added first version of the TRSM routine based on the diagonal invert kernel | 2017-01-18 21:29:59 +01:00