Cedric Nugteren
db3bd0a32e
Add Windows builds to GitHub Actions and fix Windows compilation issue (#470)
* Add Windows builds to GitHub Actions CI
* Fix failing Windows builds
2023-05-18 16:58:31 +02:00
Cedric Nugteren
221121b840
Add GitHub Actions CI (#464)
This replaces the old Travis CI builds with GitHub Actions that test on both Ubuntu and macOS, with both Clang and GCC. The macOS builds also run the tests and some other programs; on Ubuntu, OpenCL is not working at the moment. Because these builds use new or different compilers, I fixed a few warnings and errors along the way.
2023-05-14 11:25:15 +02:00
Cedric Nugteren
6e6efb72be
Fix compilation issue in override parameters test
2023-05-10 21:31:33 +02:00
Cedric Nugteren
3d0c227fa5
AMAX/AMIN integer testing and bug fixes (#457)
* Fixed a bug in the XAMAX/XAMIN routines that caused the increment and offset to be included in the result
* Perform proper integer-output testing in XAMAX tests
* A few changes towards getting it ready for a PR
* Also fix compilation for clBLAS and cuBLAS references
* Fix a bug that would only use the real part of complex numbers in the amax/amin routines
* A few small fixes related to the AMAX tests
2023-05-07 20:02:52 +02:00
JishinMaster
aec45ea637
Set the correct flop count for xgemm
2021-03-13 21:48:04 +01:00
Jerry James
dc82a1fbc8
Use reference types to prevent unnecessary copying
2021-01-20 10:21:36 -07:00
Koichi Akabe
d9db543d75
Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm
2018-12-17 21:57:35 +09:00
Koichi Akabe
032e3b0cc0
Add kernel_mode option to im2col, col2im, and convgemm functions
2018-11-12 10:12:07 +09:00
Cedric Nugteren
6f67525ea6
Changed col2im to append to the existing im-buffer
2018-11-07 19:45:07 +01:00
Cedric Nugteren
469c346a8e
Fixed half-precision tests for im2col and col2im
2018-11-01 21:44:21 +01:00
Koichi Akabe
0b3d04f709
Fix col2im implementation
2018-10-30 14:54:55 +09:00
Cedric Nugteren
d45911b61d
Added groundwork for the col2im algorithm plus a first, non-working version of the kernel and test
2018-10-23 20:52:25 +02:00
Cedric Nugteren
44b630fc22
Some name changes in im2col code
2018-10-22 22:12:58 +02:00
Cedric Nugteren
ab0178c56b
Fixed MSVC's compilation error C1061 caused by too many nested for-loops
2018-10-17 21:35:09 +02:00
Cedric Nugteren
83ba3d4b7b
Merge branch 'master' into convgemm_multi_kernel
2018-09-16 20:01:18 +02:00
Cedric Nugteren
4917b77e13
Added pre-processor test for GEMMK=1 kernel
2018-09-15 16:49:51 +02:00
Cedric Nugteren
b7d8339012
Reduced size of the xCONVGEMM correctness tests
2018-09-07 22:04:24 +02:00
Cedric Nugteren
bbb4523b7c
Added reference implementation for xCONVGEMM for half-precision
2018-09-07 22:04:08 +02:00
Cedric Nugteren
391e5757bd
Fixed the tests of OMATCOPY to include proper complex conjugation
2018-07-31 21:44:28 +02:00
Cedric Nugteren
713d0f96b3
Fixed an error reporting issue related to the canary region
2018-07-31 21:24:21 +02:00
Cedric Nugteren
2dd539f911
Removed complex-number support for CONVGEMM
2018-07-29 10:37:14 +02:00
Cedric Nugteren
1c9a741470
Merge branch 'master' into CLBlast-267-convgemm
2018-06-03 15:53:27 +02:00
Cedric Nugteren
4f594e3931
Added MKL as an alternative for CBLAS for correctness and performance comparisons
2018-06-02 17:57:45 +02:00
Cedric Nugteren
38318fa39c
Added maximum time reporting to the client statistics
2018-05-27 11:39:51 +02:00
Cedric Nugteren
c85c385aaf
Added an option in the clients to output timing statistics: minimum, mean, and standard deviation
2018-05-23 22:36:38 +02:00
Cedric Nugteren
838422fbb1
Further implemented the single-kernel approach for convgemm; extended the test to cover other parts of the kernel code
2018-05-21 11:47:16 +02:00
Cedric Nugteren
cbcd4ff7e8
Merge branch 'master' into CLBlast-267-convgemm
2018-05-19 17:54:27 +02:00
Cedric Nugteren
637e49e134
Fixed a bug in loading xgemm-direct JSON data from disk
2018-05-19 12:48:04 +02:00
Cedric Nugteren
8290ad78b9
Fixed a few issues with canary region testing
2018-05-17 12:16:32 +02:00
Cedric Nugteren
85341836dd
Added a canary region for overflow detection to the correctness tests
2018-05-17 10:45:50 +01:00
Cedric Nugteren
b608280361
Fixed the performance client for convgemm and added GFLOPS measurements
2018-05-09 19:59:31 +02:00
Cedric Nugteren
52e6195628
Split channels/strides testing values off from kernel sizes for more flexibility
2018-05-09 17:23:55 +02:00
Cedric Nugteren
2d1f6ba7fe
Added convgemm skeleton, test infrastructure, and first reference implementation
2018-05-06 11:35:34 +02:00
Cedric Nugteren
93610a9cba
Fixed some failing tests for GEMM and batched GEMM routines
2018-04-15 12:53:32 +02:00
Cedric Nugteren
f4d96e80c3
Fixed a preprocessor test that broke on certain platforms due to an empty kernel string
2018-03-15 20:45:41 +01:00
Cedric Nugteren
69ed46c8da
Implemented the XHAD Hadamard product routine
2018-02-02 21:18:37 +01:00
Cedric Nugteren
ef5008f5e4
Created the API and stubs for the HAD (Hadamard-product) routines
2018-01-31 20:41:02 +01:00
Cedric Nugteren
b35e3d1e53
Small improvements to benchmarking for cuBLAS
2018-01-14 19:50:27 +01:00
Cedric Nugteren
90e8e55acb
Added test for the RetrieveParameters function
2018-01-11 20:34:09 +01:00
Cedric Nugteren
389919faec
Fixed bug in override parameters test
2018-01-11 20:30:45 +01:00
Cedric Nugteren
9fb2c61b25
Added API and tests for new GemmStridedBatched routine
2018-01-07 14:27:15 +01:00
Cedric Nugteren
00687f8d81
Prevented half-precision batched routines from failing in the tests
2018-01-06 19:26:38 +01:00
Cedric Nugteren
ce069545d4
Added CUDA interface to get temporary-buffer size for GEMM routine
2018-01-06 10:05:28 +01:00
Cedric Nugteren
5315b982a9
Added the temp-buffer to the GEMM testers and clients
2018-01-03 20:20:31 +01:00
Cedric Nugteren
eb89371d2b
Added a queue argument to the get-size function when running the tests/clients
2018-01-03 20:19:45 +01:00
Cedric Nugteren
bd540829ea
Fixes for the CUDA backend of CLBlast
2017-12-24 12:10:55 +01:00
Cedric Nugteren
ef71d8e9b5
Fixed unused variable warnings showing up with Clang
2017-12-23 16:07:26 +01:00
Cedric Nugteren
b4d3a50f19
Split the GEMM kernel into 4 files instead of 3 due to the MSVC 2013 string-length limit
2017-12-10 16:09:09 +01:00
Cedric Nugteren
9f02fb542c
Completed the pre-processor kernel modifications for all other kernels
2017-12-09 20:44:21 +01:00
Cedric Nugteren
ca5dbcd2bd
Made the pre-processor run by default for ARM and Qualcomm GPUs
2017-12-09 15:16:53 +01:00