Commit graph

675 commits

Author SHA1 Message Date
Cedric Nugteren 1ee71fdc80 Tuned the plots for a tight-layout for in papers and presentations 2017-04-01 14:00:46 +02:00
Cedric Nugteren fa5c4b00b7 Replaced the R graph scripts with Python/Matplotlib benchmark scripts 2017-03-26 15:36:34 +02:00
Cedric Nugteren a98c00a267 Fixed a GCC/MSVC compilation issue 2017-03-20 19:53:55 +01:00
Cedric Nugteren a21d903796 Merge pull request #142 from CNugteren/gemm_batched: Added a first batched version of the GEMM routine 2017-03-19 18:27:40 +01:00
Cedric Nugteren 0610447a7a Fixed a compilation issue for GCC/MSVC 2017-03-19 17:37:52 +01:00
Cedric Nugteren c27d2f0c1e Added an (optional) non-direct implementation of the batched GEMM routine 2017-03-19 16:04:04 +01:00
Cedric Nugteren 2fd04dae83 Added batched versions of the pad/copy/transpose kernels 2017-03-19 15:57:44 +01:00
Cedric Nugteren 11bb30e72b Added the possibility to tune batched kernels 2017-03-14 20:29:51 +01:00
Cedric Nugteren 068ff32e9f Fixed a linker issue for Clang 2017-03-12 10:41:18 +01:00
Cedric Nugteren 7b8f8fce68 Added initial naive version of the batched GEMM routine based on the direct GEMM kernel 2017-03-11 16:02:45 +01:00
Cedric Nugteren 49e04c7fce Added API and test infrastructure for the batched GEMM routine 2017-03-10 21:24:35 +01:00
Cedric Nugteren de3500ed18 Merge pull request #141 from CNugteren/axpy_batched: Added the batched version of the AXPY routine 2017-03-10 21:15:29 +01:00
Cedric Nugteren 3846f44eaf Small fix for a file that isn't currently compiled anymore 2017-03-10 20:53:20 +01:00
Cedric Nugteren d754586b49 Added proper testing of the alpha parameter; finalized the batched AXPY implementation 2017-03-10 20:49:59 +01:00
Cedric Nugteren 92a657290a Fixed a small compilation bug for MSVC related to a floating-point constant 2017-03-10 20:30:10 +01:00
Cedric Nugteren 878d93e7dc Implemented a batched version of the AXPY kernel 2017-03-08 20:36:35 +01:00
Cedric Nugteren fa0a9c689f Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes 2017-03-08 20:10:20 +01:00
Cedric Nugteren 6aba0bbae7 Minor fixes to the client w.r.t. the addition of the batch count 2017-03-05 16:44:16 +01:00
Cedric Nugteren b114ea49a9 Added first naive version of the batched AXPY routine 2017-03-05 15:06:14 +01:00
Cedric Nugteren cdf354f895 Adjusted the test-infrastructure to support testing of batched-versions of routines 2017-03-05 15:04:16 +01:00
Cedric Nugteren 7f14b11f1e Changed the way the test-data is generated: now using a single MT generator and distribution for all data 2017-03-05 11:13:47 +01:00
Cedric Nugteren f9a520b3af Prepared generator for batched routines; added batched AXPY routine interface 2017-03-05 10:38:38 +01:00
Cedric Nugteren 37228c9098 Fixed a missing include for the tests 2017-03-04 20:45:39 +01:00
Cedric Nugteren e9ef037549 Added tuning results for the Radeon HD6750M GPU (Apple OpenCL) 2017-03-04 15:24:55 +01:00
Cedric Nugteren e993ee077b Added a proper data-preparation function for the TRSM tests 2017-03-04 15:21:33 +01:00
Cedric Nugteren 3fc73851f7 Added proper support for the b_offset argument in TRSM 2017-03-01 21:23:33 +01:00
Cedric Nugteren e8d5923d27 Made a double to float cast explicit for MSVC compatibility (C2397) 2017-03-01 20:42:06 +01:00
Cedric Nugteren d6f1b5fca3 Added L2 error computation and checking for half-precision tests 2017-02-27 21:49:20 +01:00
Cedric Nugteren 00281dad26 Fixed half-precision bugs in HTBMV/HTPMV/HTRMV/HSYR2K/HTRMM related to incorrect constants 2017-02-27 21:00:04 +01:00
Cedric Nugteren 4284fcd940 Updated the README documentation 2017-02-26 16:32:53 +01:00
Cedric Nugteren 7de7e7d8ed Merge pull request #138 from CNugteren/triangular_solvers: Added the triangular solvers (TRSV/TRSM) 2017-02-26 16:26:41 +01:00
Cedric Nugteren e09c26c706 Split the GEMM kernel further up to prevent C1091 in MSVC 2017-02-26 15:03:12 +01:00
Cedric Nugteren dde67ac79e Minor fix to the generator script 2017-02-26 14:53:58 +01:00
Cedric Nugteren ea6790665d Merge branch 'development' into triangular_solvers 2017-02-26 14:51:45 +01:00
Cedric Nugteren a145890aaa Added a guard against invalid buffer sizes in the prepare-data functions for tests 2017-02-26 14:37:29 +01:00
Cedric Nugteren df7638c305 Fixed an out-of-bounds memory access when filling a matrix with a constant 2017-02-26 14:31:05 +01:00
Cedric Nugteren b7310036ed Removed half-precision support from the TRSM routine; too unstable 2017-02-26 12:56:21 +01:00
Cedric Nugteren 70d8c4bad7 Improved the correctness tests for complex numbers in case either real or imag is much larger than the other 2017-02-26 10:19:53 +01:00
Cedric Nugteren a433987441 Fixes division in the kernel for inversion of complex numbers 2017-02-26 10:18:45 +01:00
Cedric Nugteren ccac957f17 Added documentation for the TRSV and TRSM routines 2017-02-25 13:02:15 +01:00
Cedric Nugteren 492ee3d0a5 Removed the invert routine from the tests 2017-02-25 12:28:13 +01:00
Cedric Nugteren e47d95887c Added PrepareData function for TRSM to create proper test input 2017-02-25 12:23:04 +01:00
Cedric Nugteren 2f2a510c38 Implemented a simple row-major to col-major problem conversion for TRSM 2017-02-24 21:08:44 +01:00
Cedric Nugteren 1e5b5157bc Fixed a few issues with the TRSM routine; some tests still failing 2017-02-22 20:31:33 +01:00
Cedric Nugteren 133ebfc834 Added data-preparation function for the TRSV tests and special nan/inf checks in the error checking to make the tests pass 2017-02-19 17:43:26 +01:00
Cedric Nugteren 0643a29af5 Added tuning parameters for the AMD RX480 GPU (Ellesmere) 2017-02-18 13:59:10 +01:00
Cedric Nugteren 0ea30263ac Merge pull request #137 from CNugteren/custom_parameters: API to override tuning parameters 2017-02-18 12:34:38 +01:00
Cedric Nugteren 7b2170818f Changed the override-parameters test such that it is compatible with more devices 2017-02-18 11:22:07 +01:00
Cedric Nugteren 2e0951c6dc Fixed small typo in the documentation 2017-02-18 11:05:54 +01:00
Cedric Nugteren fef11a208c Added documentation for the OverrideParameters function 2017-02-18 11:02:57 +01:00