Commit Graph

1060 Commits (ef5008f5e46c4fe6d3728beff1d3277d02aae099)

Author SHA1 Message Date
Cedric Nugteren ef5008f5e4 Created the API and stubs for the HAD (hadamard-product) routines 2018-01-31 20:41:02 +01:00
Cedric Nugteren 37c5e8f58c Updated to CLBlast version 1.3.0 2018-01-29 20:45:21 +01:00
Cedric Nugteren f12c7fcdf2 Merge branch 'master' of github.com:CNugteren/CLBlast 2018-01-29 20:34:37 +01:00
Cedric Nugteren d1d80ca131 Fixed a compilation error of the kernel-preprocessor test under MSVC 2018-01-29 20:26:25 +01:00
Cedric Nugteren 97e92cb10c Updated the known issues 2018-01-28 14:50:03 +01:00
Cedric Nugteren 180532ea39 Some fixes to the benchmark scripts 2018-01-27 20:06:13 +01:00
Cedric Nugteren ada762f668 Minor displaying improvements to the graph plotting scripts 2018-01-26 20:38:11 +01:00
Cedric Nugteren caebe8a9d5 Fixed an event synchronisation issue in the batched gemm routines 2018-01-26 20:37:04 +01:00
Cedric Nugteren 3651b51664 Improved the benchmark scripts; added gemmstridedbatched benchmark 2018-01-25 21:24:18 +01:00
Cedric Nugteren 19fd263fb2 Moved some constants from global scope to a function; removed unnecessary includes 2018-01-25 20:00:43 +01:00
Cedric Nugteren 6a9d6b5da2 Changed the default number of runs for the GEMV tuner to fix issues for FP16 2018-01-25 19:57:36 +01:00
Cedric Nugteren b2c946c517 Merge pull request #244 from CNugteren/kernel_selection_batched_gemm: Kernel selection for batched GEMM 2018-01-20 10:19:28 +01:00
Cedric Nugteren c3f9371d16 Made GEMM routine tuning a bit more generic in preparation of possible separate batched tuning arguments 2018-01-18 19:41:59 +01:00
Cedric Nugteren bc54411d19 Made the batched routines also choose direct/indirect kernel like the main GEMM routine 2018-01-18 19:41:02 +01:00
Cedric Nugteren 0e5eaa6eb9 Factored out the generic parts of the GEMM routine tuner 2018-01-15 21:32:51 +01:00
Cedric Nugteren b35e3d1e53 Small improvements to benchmarking for cuBLAS 2018-01-14 19:50:27 +01:00
Cedric Nugteren 6d52eb2956 Merge pull request #240 from CNugteren/retrieve_tuning_parameters: Retrieve tuning parameters 2018-01-11 23:09:48 +01:00
Cedric Nugteren 90e8e55acb Added test for the RetrieveParameters function 2018-01-11 20:34:09 +01:00
Cedric Nugteren a500f537d8 Added a RetrieveParameters function to inspect tuning parameters 2018-01-11 20:32:06 +01:00
Cedric Nugteren 389919faec Fixed bug in override parameters test 2018-01-11 20:30:45 +01:00
Cedric Nugteren 9b084d0409 Merge pull request #239 from CNugteren/gemm_strided_batched: GemmStridedBatched 2018-01-11 19:42:50 +01:00
Cedric Nugteren 99a4df88a6 Implemented the indirect version of the strided-batched GEMM kernel 2018-01-08 21:07:01 +01:00
Cedric Nugteren 13f0f6fc6e Implemented direct version of strided-batched GEMM kernel 2018-01-07 14:58:45 +01:00
Cedric Nugteren 9fb2c61b25 Added API and tests for new GemmStridedBatched routine 2018-01-07 14:27:15 +01:00
Cedric Nugteren 0c48c6e6c4 Fixed a minor nullptr related issue in the code generator 2018-01-06 19:32:54 +01:00
Cedric Nugteren 00687f8d81 Prevented half-precision batched routines from failing in the tests 2018-01-06 19:26:38 +01:00
Cedric Nugteren f1e3b35541 Reduced duplicate code in the batched GEMM implementation 2018-01-06 19:26:11 +01:00
Cedric Nugteren c988c2cdd1 Updated changelog and roadmap 2018-01-06 17:16:11 +01:00
Cedric Nugteren c9b5d614e2 Fixed a vendor naming bug in the tuners and in the database 2018-01-06 17:02:58 +01:00
Cedric Nugteren a7ccce1969 Merge pull request #238 from CNugteren/gemm_api_with_temp_buffer: GEMM API with optional temp buffer 2018-01-06 16:08:27 +01:00
Cedric Nugteren ad197da08d Fixed the CUDA interface: replaced nullptr with 0 2018-01-06 13:38:44 +01:00
Cedric Nugteren e71c037304 Fixed a performance overhead in database creation: it is again a static variable now as it was before 2018-01-06 11:28:04 +01:00
Cedric Nugteren ce069545d4 Added CUDA interface to get temporary-buffer size for GEMM routine 2018-01-06 10:05:28 +01:00
Cedric Nugteren 44431daecc Added a CUDA version of the GEMM temp-buffer optional argument 2018-01-04 19:33:51 +01:00
Cedric Nugteren af14fff1e9 Updated the generator script to automatically generate the temp-buffer code 2018-01-04 19:31:57 +01:00
Cedric Nugteren a3925e5060 Updated the ROADMAP 2018-01-03 20:38:48 +01:00
Cedric Nugteren 5315b982a9 Added the temp-buffer to the GEMM testers and clients 2018-01-03 20:20:31 +01:00
Cedric Nugteren eb89371d2b Added a queue argument to the get-size function when running the tests/clients 2018-01-03 20:19:45 +01:00
Cedric Nugteren 8040a4e355 Merge pull request #236 from CNugteren/trsm_compilation: Fixed compilation of TRSM/Invert for AMD APP 2018-01-01 16:10:11 +01:00
Cedric Nugteren ad483123e6 Fixed the issue with AMD's APP compiler not being able to compile the invert kernel 2017-12-31 16:13:13 +01:00
Cedric Nugteren 1511909b6f Revert "Added a simple test to check compilation of the invert kernels (issue with AMD APP)" (this reverts commit 0eb9b35481) 2017-12-31 16:11:35 +01:00
Cedric Nugteren 7f893a85d9 Revert "Added options to disable parts of the invert kernel to find out where the AMD compiler crashes" (this reverts commit 407ed52cec) 2017-12-31 16:10:40 +01:00
Cedric Nugteren b4c8e1d9a5 Made plotting script more flexible: extra argument to set the comparison library 2017-12-31 16:02:46 +01:00
Cedric Nugteren 69226ae828 Changed the invert kernel slightly; added part1a/part1b disable-defines 2017-12-31 14:07:08 +01:00
Cedric Nugteren 7ce415b927 Changed ifdefs into ifndefs 2017-12-30 21:17:31 +01:00
Cedric Nugteren 407ed52cec Added options to disable parts of the invert kernel to find out where the AMD compiler crashes 2017-12-30 21:07:50 +01:00
Cedric Nugteren ad1227c4f2 Added optional temp-buffer argument to C++ interface of GEMM 2017-12-30 18:45:06 +01:00
Cedric Nugteren 6d1e30e61f Added interface to compute the required temporary buffer size for GEMM 2017-12-28 14:46:45 +01:00
Cedric Nugteren aaea9474a1 Factored out argument processing from the GEMM routine 2017-12-28 13:56:18 +01:00
Cedric Nugteren 74792ce96c Refactored GEMM code in preparation of separate temp-buffer computation 2017-12-28 11:08:10 +01:00