Commit graph

12 commits

Author SHA1 Message Date
Cedric Nugteren 00687f8d81 Prevented half-precision batched routines from failing in the tests 2018-01-06 19:26:38 +01:00
Cedric Nugteren eb89371d2b Added a queue argument to the get-size function when running the tests/clients 2018-01-03 20:19:45 +01:00
Cedric Nugteren e6da575fff Modified test interfaces such that they support either OpenCL or CUDA 2017-10-15 19:35:21 +02:00
Cedric Nugteren ce528a9d39 Fixed and suppresses several warnings for MSVC 2017-06-26 21:38:04 +02:00
Cedric Nugteren f7f8ec644f Fixed CUDA malloc and cuBLAS handles: cuBLAS as a performance-reference now works 2017-04-13 21:31:27 +02:00
Cedric Nugteren 6b625f8915 Added reference implementations for performance-testing against cuBLAS 2017-04-10 22:54:14 +02:00
Cedric Nugteren c5461d77e5 Factored out inclusion of clBLAS and CBLAS from the test-routine files 2017-04-02 15:24:21 +02:00
Cedric Nugteren b84d2296b8 Separated host-device and device-host memory copies from execution of the CBLAS reference code; for fair timing and code de-duplication 2017-04-01 13:36:24 +02:00
Cedric Nugteren d754586b49 Added proper testing of the alpha parameter; finalized the batched AXPY implementation 2017-03-10 20:49:59 +01:00
Cedric Nugteren fa0a9c689f Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes 2017-03-08 20:10:20 +01:00
Cedric Nugteren 6aba0bbae7 Minor fixes to the client w.r.t. the addition of the batch count 2017-03-05 16:44:16 +01:00
Cedric Nugteren b114ea49a9 Added first naive version of the batched AXPY routine 2017-03-05 15:06:14 +01:00