JishinMaster
|
aec45ea637
|
set the correct flop count for xgemm
|
2021-03-13 21:48:04 +01:00 |
|
Jerry James
|
dc82a1fbc8
|
Use reference types to prevent unnecessary copying
|
2021-01-20 10:21:36 -07:00 |
|
Koichi Akabe
|
d9db543d75
|
Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm
|
2018-12-17 21:57:35 +09:00 |
|
Koichi Akabe
|
032e3b0cc0
|
Add kernel_mode option to im2col, col2im, and convgemm functions
|
2018-11-12 10:12:07 +09:00 |
|
Cedric Nugteren
|
6f67525ea6
|
Changed col2im to append to the existing im-buffer
|
2018-11-07 19:45:07 +01:00 |
|
Cedric Nugteren
|
469c346a8e
|
Fixed half-precision tests for im2col and col2im
|
2018-11-01 21:44:21 +01:00 |
|
Koichi Akabe
|
0b3d04f709
|
Fix col2im implementation
|
2018-10-30 14:54:55 +09:00 |
|
Cedric Nugteren
|
d45911b61d
|
Added groundwork for col2im algorithm plus first non-working version of kernel and test
|
2018-10-23 20:52:25 +02:00 |
|
Cedric Nugteren
|
44b630fc22
|
Some name changes in im2col code
|
2018-10-22 22:12:58 +02:00 |
|
Cedric Nugteren
|
ab0178c56b
|
Fixed MSVC's compilation error C1061 due to too many for-loops
|
2018-10-17 21:35:09 +02:00 |
|
Cedric Nugteren
|
83ba3d4b7b
|
Merge branch 'master' into convgemm_multi_kernel
|
2018-09-16 20:01:18 +02:00 |
|
Cedric Nugteren
|
4917b77e13
|
Added pre-processor test for GEMMK=1 kernel
|
2018-09-15 16:49:51 +02:00 |
|
Cedric Nugteren
|
b7d8339012
|
Reduced size of the xCONVGEMM correctness tests
|
2018-09-07 22:04:24 +02:00 |
|
Cedric Nugteren
|
bbb4523b7c
|
Added reference implementation for xCONVGEMM for half-precision
|
2018-09-07 22:04:08 +02:00 |
|
Cedric Nugteren
|
391e5757bd
|
Fixed the tests of OMATCOPY to include proper complex conjugation
|
2018-07-31 21:44:28 +02:00 |
|
Cedric Nugteren
|
713d0f96b3
|
Fixed an error reporting issue related to the canary region
|
2018-07-31 21:24:21 +02:00 |
|
Cedric Nugteren
|
2dd539f911
|
Removed complex numbers support for CONVGEMM
|
2018-07-29 10:37:14 +02:00 |
|
Cedric Nugteren
|
1c9a741470
|
Merge branch 'master' into CLBlast-267-convgemm
|
2018-06-03 15:53:27 +02:00 |
|
Cedric Nugteren
|
4f594e3931
|
Added MKL as an alternative for CBLAS for correctness and performance comparisons
|
2018-06-02 17:57:45 +02:00 |
|
Cedric Nugteren
|
38318fa39c
|
Added maximum time reporting to the client statistics
|
2018-05-27 11:39:51 +02:00 |
|
Cedric Nugteren
|
c85c385aaf
|
Added an option in the clients to output timing statistics: minimum, mean, and standard-deviation
|
2018-05-23 22:36:38 +02:00 |
|
Cedric Nugteren
|
838422fbb1
|
Further implemented single-kernel approach of convgemm; extended test to capture other parts of the kernel code
|
2018-05-21 11:47:16 +02:00 |
|
Cedric Nugteren
|
cbcd4ff7e8
|
Merge branch 'master' into CLBlast-267-convgemm
|
2018-05-19 17:54:27 +02:00 |
|
Cedric Nugteren
|
637e49e134
|
Fixed a bug in loading xgemm-direct JSON data from disk
|
2018-05-19 12:48:04 +02:00 |
|
Cedric Nugteren
|
8290ad78b9
|
Fixed a few issues with canary region testing
|
2018-05-17 12:16:32 +02:00 |
|
Cedric Nugteren
|
85341836dd
|
Added a canary region for overflow detection to the correctness tests
|
2018-05-17 10:45:50 +01:00 |
|
Cedric Nugteren
|
b608280361
|
Fixed the performance client for convgemm and added GFLOPS measurements
|
2018-05-09 19:59:31 +02:00 |
|
Cedric Nugteren
|
52e6195628
|
Split channels/strides testing values off from kernel sizes for more flexibility
|
2018-05-09 17:23:55 +02:00 |
|
Cedric Nugteren
|
2d1f6ba7fe
|
Added convgemm skeleton, test infrastructure, and first reference implementation
|
2018-05-06 11:35:34 +02:00 |
|
Cedric Nugteren
|
93610a9cba
|
Fixed some failing tests for GEMM and batched GEMM routines
|
2018-04-15 12:53:32 +02:00 |
|
Cedric Nugteren
|
f4d96e80c3
|
Fixed breaking preprocessor test on certain platforms due to empty kernel string
|
2018-03-15 20:45:41 +01:00 |
|
Cedric Nugteren
|
69ed46c8da
|
Implemented the XHAD Hadamard product routine
|
2018-02-02 21:18:37 +01:00 |
|
Cedric Nugteren
|
ef5008f5e4
|
Created the API and stubs for the HAD (hadamard-product) routines
|
2018-01-31 20:41:02 +01:00 |
|
Cedric Nugteren
|
b35e3d1e53
|
Small improvements to benchmarking for cuBLAS
|
2018-01-14 19:50:27 +01:00 |
|
Cedric Nugteren
|
90e8e55acb
|
Added test for the RetrieveParameters function
|
2018-01-11 20:34:09 +01:00 |
|
Cedric Nugteren
|
389919faec
|
Fixed bug in override parameters test
|
2018-01-11 20:30:45 +01:00 |
|
Cedric Nugteren
|
9fb2c61b25
|
Added API and tests for new GemmStridedBatched routine
|
2018-01-07 14:27:15 +01:00 |
|
Cedric Nugteren
|
00687f8d81
|
Prevented half-precision batched routines from failing in the tests
|
2018-01-06 19:26:38 +01:00 |
|
Cedric Nugteren
|
ce069545d4
|
Added CUDA interface to get temporary-buffer size for GEMM routine
|
2018-01-06 10:05:28 +01:00 |
|
Cedric Nugteren
|
5315b982a9
|
Added the temp-buffer to the GEMM testers and clients
|
2018-01-03 20:20:31 +01:00 |
|
Cedric Nugteren
|
eb89371d2b
|
Added a queue argument to the get-size function when running the tests/clients
|
2018-01-03 20:19:45 +01:00 |
|
Cedric Nugteren
|
bd540829ea
|
Fixes for the CUDA backend of CLBlast
|
2017-12-24 12:10:55 +01:00 |
|
Cedric Nugteren
|
ef71d8e9b5
|
Fixed unused variable warnings showing up with Clang
|
2017-12-23 16:07:26 +01:00 |
|
Cedric Nugteren
|
b4d3a50f19
|
Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit
|
2017-12-10 16:09:09 +01:00 |
|
Cedric Nugteren
|
9f02fb542c
|
Completed kernel modifications for pre-processor of all other kernels
|
2017-12-09 20:44:21 +01:00 |
|
Cedric Nugteren
|
ca5dbcd2bd
|
Made the pre-processor run by default for ARM and Qualcomm GPUs
|
2017-12-09 15:16:53 +01:00 |
|
Cedric Nugteren
|
d9df62b794
|
Fixed defines parsing and substituting in pre-processor; fixed some variable names in kernels
|
2017-12-09 10:49:55 +01:00 |
|
Cedric Nugteren
|
0f9637bbac
|
Improved array-to-register promotion, now handling function calls as well
|
2017-12-05 20:39:49 +01:00 |
|
Cedric Nugteren
|
cf4555d1f4
|
Added GEMM (direct and indirect) to the pre-processor testing; modified the loops in kernel accordingly
|
2017-12-03 16:40:36 +01:00 |
|
Cedric Nugteren
|
60312e5878
|
Reformatted transpose kernels for the pre-processor; extended the number of tests
|
2017-12-03 12:00:37 +01:00 |
|