Cedric Nugteren
|
844e68853e
|
Moved some utility functions to a test-specific utility compilation-unit
|
2017-08-12 15:38:17 +02:00 |
|
Cedric Nugteren
|
f77b48692b
|
Relaxed requirement on a_ld and b_ld for batched GEMM
|
2017-07-12 21:53:39 +02:00 |
|
Cedric Nugteren
|
ce528a9d39
|
Fixed and suppressed several warnings for MSVC
|
2017-06-26 21:38:04 +02:00 |
|
Cedric Nugteren
|
93c8db7fe7
|
Bug-fix in the half-precision test of the amax routine
|
2017-05-11 22:19:15 -07:00 |
|
Cedric Nugteren
|
049d0fc95a
|
Fixed a compiler warning message
|
2017-04-23 10:45:08 +02:00 |
|
Cedric Nugteren
|
f7f8ec644f
|
Fixed CUDA malloc and cuBLAS handles: cuBLAS as a performance-reference now works
|
2017-04-13 21:31:27 +02:00 |
|
Cedric Nugteren
|
f24c142948
|
Made compilation of the cuBLAS wrapper work properly
|
2017-04-11 21:50:18 +02:00 |
|
Cedric Nugteren
|
6b625f8915
|
Added reference implementations for performance-testing against cuBLAS
|
2017-04-10 22:54:14 +02:00 |
|
Cedric Nugteren
|
af9a521042
|
Fixes the CUDA wrapper (now actually tested on a system with CUDA)
|
2017-04-03 21:46:07 +02:00 |
|
Cedric Nugteren
|
c5461d77e5
|
Factored out inclusion of clBLAS and CBLAS from the test-routine files
|
2017-04-02 15:24:21 +02:00 |
|
Cedric Nugteren
|
a9c25e9fd2
|
Factored out inclusion of clBLAS and CBLAS from the test-routine files
|
2017-04-02 15:21:19 +02:00 |
|
Cedric Nugteren
|
b84d2296b8
|
Separated host-device and device-host memory copies from execution of the CBLAS reference code; for fair timing and code de-duplication
|
2017-04-01 13:36:24 +02:00 |
|
Cedric Nugteren
|
49e04c7fce
|
Added API and test infrastructure for the batched GEMM routine
|
2017-03-10 21:24:35 +01:00 |
|
Cedric Nugteren
|
3846f44eaf
|
Small fix for a file that isn't currently compiled anymore
|
2017-03-10 20:53:20 +01:00 |
|
Cedric Nugteren
|
d754586b49
|
Added proper testing of the alpha parameter; finalized the batched AXPY implementation
|
2017-03-10 20:49:59 +01:00 |
|
Cedric Nugteren
|
fa0a9c689f
|
Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes
|
2017-03-08 20:10:20 +01:00 |
|
Cedric Nugteren
|
6aba0bbae7
|
Minor fixes to the client w.r.t. the addition of the batch count
|
2017-03-05 16:44:16 +01:00 |
|
Cedric Nugteren
|
b114ea49a9
|
Added first naive version of the batched AXPY routine
|
2017-03-05 15:06:14 +01:00 |
|
Cedric Nugteren
|
cdf354f895
|
Adjusted the test-infrastructure to support testing of batched-versions of routines
|
2017-03-05 15:04:16 +01:00 |
|
Cedric Nugteren
|
7f14b11f1e
|
Changed the way the test-data is generated: now using a single MT generator and distribution for all data
|
2017-03-05 11:13:47 +01:00 |
|
Cedric Nugteren
|
37228c9098
|
Fixed a missing include for the tests
|
2017-03-04 20:45:39 +01:00 |
|
Cedric Nugteren
|
e993ee077b
|
Added a proper data-preparation function for the TRSM tests
|
2017-03-04 15:21:33 +01:00 |
|
Cedric Nugteren
|
a145890aaa
|
Added a guard against invalid buffer sizes in the prepare-data functions for tests
|
2017-02-26 14:37:29 +01:00 |
|
Cedric Nugteren
|
e47d95887c
|
Added PrepareData function for TRSM to create proper test input
|
2017-02-25 12:23:04 +01:00 |
|
Cedric Nugteren
|
133ebfc834
|
Added data-preparation function for the TRSV tests and special nan/inf checks in the error checking to make the tests pass
|
2017-02-19 17:43:26 +01:00 |
|
Cedric Nugteren
|
a5fd2323b6
|
Added prototype for the TRSV routine
|
2017-01-20 11:30:32 +01:00 |
|
Cedric Nugteren
|
df9a77d74d
|
Added first version of the TRSM routine based on the diagonal invert kernel
|
2017-01-18 21:29:59 +01:00 |
|
Cedric Nugteren
|
4b3ffd9989
|
Added a first version of the diagonal block invert routine in preparation of TRSM
|
2017-01-15 17:30:00 +01:00 |
|
Cedric Nugteren
|
681a465b35
|
Prepared for the addition of the TRSM triangular solver kernel
|
2016-12-18 12:30:16 +01:00 |
|
Cedric Nugteren
|
60fa2322ca
|
Added a proper half-precision reference for testing of xomatcopy
|
2016-11-17 22:20:16 +01:00 |
|
Cedric Nugteren
|
d595a8ed7e
|
Fixed a bug waiting for an invalid event in case of a non-successful CLBlast call in the tests and samples
|
2016-09-22 20:47:22 +02:00 |
|
CNugteren
|
2c031f3e1d
|
Made it possible to build the OMATCOPY test and client in case only clBLAS is present
|
2016-06-28 16:36:01 +02:00 |
|
Cedric Nugteren
|
f726fbdc9f
|
Moved all headers into the source tree, changed headers to .hpp extension
|
2016-06-18 20:20:13 +02:00 |
|
Cedric Nugteren
|
52ccaf5b25
|
Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing
|
2016-06-16 18:07:46 +02:00 |
|
Cedric Nugteren
|
03182f9d07
|
Added half-precision tests for the clBLAS reference through conversion to single-precision
|
2016-05-26 23:36:19 +02:00 |
|
cnugteren
|
1acb31896c
|
Fixed an issue with computing the GFLOPS numbers for the xGEMM performance tests for non-square matrices
|
2016-05-08 10:06:06 +02:00 |
|
Cedric Nugteren
|
3555cd0436
|
All CLBlast enum constants now have the same raw values as in the cblas standard
|
2016-04-27 11:37:55 +02:00 |
|
cnugteren
|
16a048f1ac
|
Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines
|
2016-04-20 22:12:51 -06:00 |
|
cnugteren
|
8be99de82d
|
Added support for the SASUM/DASUM/ScASUM/DzASUM routines
|
2016-04-14 19:58:26 -06:00 |
|
cnugteren
|
1a82861a90
|
Added support for testing (performance and correctness) against a CPU BLAS library
|
2016-04-02 11:58:00 -07:00 |
|
Cedric Nugteren
|
aaa687ca98
|
Added preliminary support for the xNRM2 routines
|
2016-03-28 23:00:44 +02:00 |
|
Cedric Nugteren
|
306bf67660
|
Added preliminary support for xHPR2 and xSPR2 routines
|
2016-03-06 15:48:11 +01:00 |
|
Cedric Nugteren
|
60da54da5d
|
Added preliminary support for xHER2 and xSYR2 routines
|
2016-03-02 21:18:01 +01:00 |
|
Cedric Nugteren
|
e3545215a5
|
Added support for xHER, xHPR, xSYR, and xSPR routines
|
2016-02-28 14:16:48 +01:00 |
|
Cedric Nugteren
|
6dc44da07b
|
Added support for xGERU and xGERC routines
|
2016-02-20 14:15:41 +01:00 |
|
Cedric Nugteren
|
8854a73127
|
Added XGER routine, kernel, and tuner
|
2016-02-20 12:40:01 +01:00 |
|
CNugteren
|
2b56c2c603
|
Added TRMV/TBMV/TPMV routines
|
2015-09-26 16:58:03 +02:00 |
|
CNugteren
|
de6547a92b
|
Added SBMV and SPMV routines
|
2015-09-19 18:01:19 +02:00 |
|
CNugteren
|
80da67d28b
|
Added the HPMV routine
|
2015-09-19 17:40:38 +02:00 |
|
CNugteren
|
aebd156869
|
Added the HBMV routine
|
2015-09-19 11:11:34 +02:00 |
|