Cedric Nugteren
221121b840
Add GitHub Actions CI (#464)
This replaces the old Travis CI builds with GitHub Actions that test on both Ubuntu and macOS, with both Clang and GCC. The macOS builds also run the tests and some other programs; on Ubuntu, OpenCL is not working at the moment. Because these tests use new or different compilers, I fixed a few warnings and errors along the way.
2023-05-14 11:25:15 +02:00
JishinMaster
aec45ea637
set the correct flop count for xgemm
2021-03-13 21:48:04 +01:00
Koichi Akabe
d9db543d75
Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm
2018-12-17 21:57:35 +09:00
Koichi Akabe
032e3b0cc0
Add kernel_mode option to im2col, col2im, and convgemm functions
2018-11-12 10:12:07 +09:00
Cedric Nugteren
6f67525ea6
Changed col2im to append to the existing im-buffer
2018-11-07 19:45:07 +01:00
Cedric Nugteren
469c346a8e
Fixed half-precision tests for im2col and col2im
2018-11-01 21:44:21 +01:00
Koichi Akabe
0b3d04f709
Fix col2im implementation
2018-10-30 14:54:55 +09:00
Cedric Nugteren
d45911b61d
Added groundwork for col2im algorithm plus first non-working version of kernel and test
2018-10-23 20:52:25 +02:00
Cedric Nugteren
44b630fc22
Some name changes in im2col code
2018-10-22 22:12:58 +02:00
Cedric Nugteren
83ba3d4b7b
Merge branch 'master' into convgemm_multi_kernel
2018-09-16 20:01:18 +02:00
Cedric Nugteren
bbb4523b7c
Added reference implementation for xCONVGEMM for half-precision
2018-09-07 22:04:08 +02:00
Cedric Nugteren
391e5757bd
Fixed the tests of OMATCOPY to include proper complex conjugation
2018-07-31 21:44:28 +02:00
Cedric Nugteren
b608280361
Fixed the performance client for convgemm and added GFLOPS measurements
2018-05-09 19:59:31 +02:00
Cedric Nugteren
2d1f6ba7fe
Added convgemm skeleton, test infrastructure, and first reference implementation
2018-05-06 11:35:34 +02:00
Cedric Nugteren
69ed46c8da
Implemented the XHAD Hadamard product routine
2018-02-02 21:18:37 +01:00
Cedric Nugteren
ef5008f5e4
Created the API and stubs for the HAD (Hadamard-product) routines
2018-01-31 20:41:02 +01:00
Cedric Nugteren
9fb2c61b25
Added API and tests for new GemmStridedBatched routine
2018-01-07 14:27:15 +01:00
Cedric Nugteren
00687f8d81
Prevented half-precision batched routines from failing in the tests
2018-01-06 19:26:38 +01:00
Cedric Nugteren
eb89371d2b
Added a queue argument to the get-size function when running the tests/clients
2018-01-03 20:19:45 +01:00
Cedric Nugteren
ef71d8e9b5
Fixed unused variable warnings showing up with Clang
2017-12-23 16:07:26 +01:00
Cedric Nugteren
5467c0cac5
Fixed a variety of warnings and an error for MSVC2013 compilation
2017-11-19 21:09:24 +01:00
Cedric Nugteren
d24138808b
Fixed an FP16 issue in the homatcopy test; added a comment about improper testing of integer returning functions for FP16
2017-11-08 21:20:07 +01:00
Cedric Nugteren
e388f055f7
Fixed small bug in (unused) invert tester
2017-10-25 20:35:39 +02:00
Cedric Nugteren
e6da575fff
Modified test interfaces such that they support either OpenCL or CUDA
2017-10-15 19:35:21 +02:00
Cedric Nugteren
6194d43efb
Fixed a bug in im2col confusing first and second workgroup size; made im2col kernel 2d instead of 3d
2017-08-31 20:34:10 +02:00
Cedric Nugteren
a8c26594d9
Made the im2col client properly handle the arguments
2017-08-23 19:54:09 +02:00
Cedric Nugteren
132e62892d
Implemented proper im2col reference function and completed tests
2017-08-19 16:55:09 +02:00
Cedric Nugteren
777681dcbd
Merge branch 'master' into im_to_col
2017-08-12 20:50:00 +02:00
Cedric Nugteren
97bcf77d4b
First step towards supporting im2col in the test infrastructure
2017-07-16 22:33:49 +02:00
Cedric Nugteren
f77b48692b
Relaxed requirement on a_ld and b_ld for batched GEMM
2017-07-12 21:53:39 +02:00
Cedric Nugteren
ce528a9d39
Fixed and suppressed several warnings for MSVC
2017-06-26 21:38:04 +02:00
Cedric Nugteren
049d0fc95a
Fixed a compiler warning message
2017-04-23 10:45:08 +02:00
Cedric Nugteren
f7f8ec644f
Fixed CUDA malloc and cuBLAS handles: cuBLAS as a performance-reference now works
2017-04-13 21:31:27 +02:00
Cedric Nugteren
f24c142948
Made compilation of the cuBLAS wrapper work properly
2017-04-11 21:50:18 +02:00
Cedric Nugteren
6b625f8915
Added reference implementations for performance-testing against cuBLAS
2017-04-10 22:54:14 +02:00
Cedric Nugteren
c5461d77e5
Factored out inclusion of clBLAS and CBLAS from the test-routine files
2017-04-02 15:24:21 +02:00
Cedric Nugteren
a9c25e9fd2
Factored out inclusion of clBLAS and CBLAS from the test-routine files
2017-04-02 15:21:19 +02:00
Cedric Nugteren
b84d2296b8
Separated host-device and device-host memory copies from execution of the CBLAS reference code; for fair timing and code de-duplication
2017-04-01 13:36:24 +02:00
Cedric Nugteren
49e04c7fce
Added API and test infrastructure for the batched GEMM routine
2017-03-10 21:24:35 +01:00
Cedric Nugteren
3846f44eaf
Small fix for a file that isn't currently compiled anymore
2017-03-10 20:53:20 +01:00
Cedric Nugteren
d754586b49
Added proper testing of the alpha parameter; finalized the batched AXPY implementation
2017-03-10 20:49:59 +01:00
Cedric Nugteren
fa0a9c689f
Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes
2017-03-08 20:10:20 +01:00
Cedric Nugteren
6aba0bbae7
Minor fixes to the client w.r.t. the addition of the batch count
2017-03-05 16:44:16 +01:00
Cedric Nugteren
b114ea49a9
Added first naive version of the batched AXPY routine
2017-03-05 15:06:14 +01:00
Cedric Nugteren
cdf354f895
Adjusted the test-infrastructure to support testing of batched-versions of routines
2017-03-05 15:04:16 +01:00
Cedric Nugteren
7f14b11f1e
Changed the way the test-data is generated: now using a single MT generator and distribution for all data
2017-03-05 11:13:47 +01:00
Cedric Nugteren
e993ee077b
Added a proper data-preparation function for the TRSM tests
2017-03-04 15:21:33 +01:00
Cedric Nugteren
e47d95887c
Added PrepareData function for TRSM to create proper test input
2017-02-25 12:23:04 +01:00
Cedric Nugteren
4b3ffd9989
Added a first version of the diagonal block invert routine in preparation of TRSM
2017-01-15 17:30:00 +01:00
Cedric Nugteren
60fa2322ca
Added a proper half-precision reference for testing of xomatcopy
2016-11-17 22:20:16 +01:00