Cedric Nugteren | 4f594e3931 | Added MKL as an alternative for CBLAS for correctness and performance comparisons | 2018-06-02 17:57:45 +02:00
Cedric Nugteren | 38318fa39c | Added maximum time reporting to the client statistics | 2018-05-27 11:39:51 +02:00
Cedric Nugteren | c85c385aaf | Added an option in the clients to output timing statistics: minimum, mean, and standard-deviation | 2018-05-23 22:36:38 +02:00
Cedric Nugteren | 637e49e134 | Fixed a bug in loading xgemm-direct JSON data from disk | 2018-05-19 12:48:04 +02:00
Cedric Nugteren | 8290ad78b9 | Fixed a few issues with canary region testing | 2018-05-17 12:16:32 +02:00
Cedric Nugteren | 85341836dd | Added a canary region for overflow detection to the correctness tests | 2018-05-17 10:45:50 +01:00
Cedric Nugteren | 93610a9cba | Fixed some failing tests for GEMM and batched GEMM routines | 2018-04-15 12:53:32 +02:00
Cedric Nugteren | f4d96e80c3 | Fixed breaking preprocessor test on certain platforms due to empty kernel string | 2018-03-15 20:45:41 +01:00
Cedric Nugteren | 69ed46c8da | Implemented the XHAD Hadamard product routine | 2018-02-02 21:18:37 +01:00
Cedric Nugteren | ef5008f5e4 | Created the API and stubs for the HAD (hadamard-product) routines | 2018-01-31 20:41:02 +01:00
Cedric Nugteren | b35e3d1e53 | Small improvements to benchmarking for cuBLAS | 2018-01-14 19:50:27 +01:00
Cedric Nugteren | 90e8e55acb | Added test for the RetrieveParameters function | 2018-01-11 20:34:09 +01:00
Cedric Nugteren | 389919faec | Fixed bug in override parameters test | 2018-01-11 20:30:45 +01:00
Cedric Nugteren | 9fb2c61b25 | Added API and tests for new GemmStridedBatched routine | 2018-01-07 14:27:15 +01:00
Cedric Nugteren | 00687f8d81 | Prevented half-precision batched routines from failing in the tests | 2018-01-06 19:26:38 +01:00
Cedric Nugteren | ce069545d4 | Added CUDA interface to get temporary-buffer size for GEMM routine | 2018-01-06 10:05:28 +01:00
Cedric Nugteren | 5315b982a9 | Added the temp-buffer to the GEMM testers and clients | 2018-01-03 20:20:31 +01:00
Cedric Nugteren | eb89371d2b | Added a queue argument to the get-size function when running the tests/clients | 2018-01-03 20:19:45 +01:00
Cedric Nugteren | bd540829ea | Fixes for the CUDA backend of CLBlast | 2017-12-24 12:10:55 +01:00
Cedric Nugteren | ef71d8e9b5 | Fixed unused variable warnings showing up with Clang | 2017-12-23 16:07:26 +01:00
Cedric Nugteren | b4d3a50f19 | Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit | 2017-12-10 16:09:09 +01:00
Cedric Nugteren | 9f02fb542c | Completed kernel modifications for pre-processor of all other kernels | 2017-12-09 20:44:21 +01:00
Cedric Nugteren | ca5dbcd2bd | Made the pre-processor run by default for ARM and Qualcomm GPUs | 2017-12-09 15:16:53 +01:00
Cedric Nugteren | d9df62b794 | Fixed defines parsing and substituting in pre-processor; fixed some variable names in kernels | 2017-12-09 10:49:55 +01:00
Cedric Nugteren | 0f9637bbac | Improved array-to-register promotion, now handling function calls as well | 2017-12-05 20:39:49 +01:00
Cedric Nugteren | cf4555d1f4 | Added GEMM (direct and in-direct) to the pre-processor testing; modified the loops in kernel accordingly | 2017-12-03 16:40:36 +01:00
Cedric Nugteren | 60312e5878 | Reformatted transpose kernels for the pre-processor; extended the number of tests | 2017-12-03 12:00:37 +01:00
Cedric Nugteren | bf7aeb8d5b | Improved the pre-processor's handling of defines; added a special nested defines test | 2017-11-30 21:43:16 +01:00
Cedric Nugteren | 13eb772343 | Integrated pre-processor in compilation flow, default is still disabled | 2017-11-30 21:32:47 +01:00
Cedric Nugteren | 0dde6af703 | Extended the preprocessor tests to include CopyFast and CopyPad | 2017-11-29 20:18:36 +01:00
Cedric Nugteren | 426406668e | Improved the pre-processor tester, added GEMV and GER kernels | 2017-11-28 20:52:47 +01:00
Cedric Nugteren | f01bcded1e | Moved string splitting functions; added string character removal function | 2017-11-25 17:44:21 +01:00
Cedric Nugteren | c0c6d00b12 | Added stub for a preprocessor and a corresponding compilation test | 2017-11-25 10:24:05 +01:00
Cedric Nugteren | d7b29d864a | Fixed a Clang compilation error | 2017-11-24 21:41:45 +01:00
Cedric Nugteren | a5cef9ef3b | Added missing include file | 2017-11-24 21:11:52 +01:00
Cedric Nugteren | a768b7686b | Added precision check to parameter override for the clients | 2017-11-24 21:09:39 +01:00
Cedric Nugteren | 9527c89c30 | Made parameter override in the clients a command-line argument and added support for multi-kernel routines | 2017-11-22 20:53:20 +01:00
Cedric Nugteren | 8c9ecd9736 | Implemented first version of reading JSON files from disk in the client to override parameters | 2017-11-21 22:05:08 +01:00
Cedric Nugteren | 5467c0cac5 | Fixed a variety of warnings and an error for MSVC2013 compilation | 2017-11-19 21:09:24 +01:00
Cedric Nugteren | 4bac1287f2 | Moved square-difference utility function for use in the tuners | 2017-11-13 21:10:44 +01:00
Cedric Nugteren | d24138808b | Fixed an FP16 issue in the homatcopy test; added a comment about improper testing of integer returning functions for FP16 | 2017-11-08 21:20:07 +01:00
Cedric Nugteren | b18cc9d3f1 | Merge pull request #212 from CNugteren/kernel_selection_tuner: GEMM kernel selection tuner | 2017-11-07 22:20:13 +01:00
Cedric Nugteren | 9b0a435fb0 | Integrated the GEMM routine tuner for kernel selection; added first tuning results | 2017-11-02 21:47:14 +01:00
Cedric Nugteren | 12b08ae491 | Merge branch 'master' into android_support | 2017-10-28 17:32:37 +02:00
Cedric Nugteren | bd57dfa435 | Moved timing function to a separate file | 2017-10-28 14:12:05 +02:00
Cedric Nugteren | e388f055f7 | Fixed small bug in (unused) invert tester | 2017-10-25 20:35:39 +02:00
Cedric Nugteren | 9d879c949a | Fix an incompatibility with CUDA's FP16 definition | 2017-10-17 20:29:23 +02:00
Cedric Nugteren | 8431a165d0 | Fixed a small copy-paste typo | 2017-10-15 19:38:48 +02:00
Cedric Nugteren | e6da575fff | Modified test interfaces such that they support either OpenCL or CUDA | 2017-10-15 19:35:21 +02:00
Cedric Nugteren | 7663cba234 | Fixes for the CUDA API: first tests pass and the client runs | 2017-10-15 17:43:20 +02:00