201 commits

Author SHA1 Message Date
Cedric Nugteren 70016e8698 Updated to version 1.5.2 2021-01-19 21:19:12 +01:00
Cedric Nugteren 7fab29304c Added sample to play around with XAMAX routine 2020-03-08 11:26:18 +01:00
Cedric Nugteren 8433985051 Updated to version 1.5.1 2020-02-18 10:29:40 +01:00
Cedric Nugteren 560f7a40f6 Added convgemm to the CLBlast database, added initial parameters for Skylake GPU 2018-12-31 19:05:34 +01:00
Cedric Nugteren 1f0cd61824 Added first version of a tuner for the ConvGemm direct kernel 2018-12-18 13:59:26 +09:00
Cedric Nugteren 0c9411c844 Updated to version 1.5.0 2018-12-04 20:46:02 +01:00
Cedric Nugteren d45911b61d Added groundwork for col2im algorithm plus first non-working version of kernel and test 2018-10-23 20:52:25 +02:00
Cedric Nugteren 83ba3d4b7b Merge branch 'master' into convgemm_multi_kernel 2018-09-16 20:01:18 +02:00
Cedric Nugteren 9d9f09fce9 Name change of setting to NETLIB_PERSISTENT_OPENCL 2018-08-07 22:41:06 +02:00
Cedric Nugteren fe639455bd Added an option to compile the Netlib API with static OpenCL device and context 2018-08-05 21:12:39 +02:00
Cedric Nugteren 5903820ba2 Merge branch 'master' into CLBlast-267-convgemm 2018-07-29 10:26:34 +02:00
Cedric Nugteren f84036948b Renamed AMD SI workaround defines 2018-07-27 20:38:01 +02:00
Cedric Nugteren e8dea34fce Added workaround for weird AMD SI Hainan bug 2018-07-25 22:59:36 +02:00
Cedric Nugteren db179a1e40 Updated to CLBlast version 1.4.1 2018-07-14 12:29:06 +02:00
Cedric Nugteren 1c9a741470 Merge branch 'master' into CLBlast-267-convgemm 2018-06-03 15:53:27 +02:00
Cedric Nugteren 4471b67735 Updated to CLBlast version 1.4.0 2018-06-03 13:18:05 +02:00
Cedric Nugteren bd1715aff9 Fixes for CUDA version of CLBlast 2018-06-03 10:41:57 +02:00
Cedric Nugteren 4f594e3931 Added MKL as an alternative for CBLAS for correctness and performance comparisons 2018-06-02 17:57:45 +02:00
Cedric Nugteren cbcd4ff7e8 Merge branch 'master' into CLBlast-267-convgemm 2018-05-19 17:54:27 +02:00
Cedric Nugteren 76e0079a90 Fixed compilation issues 2018-05-19 14:18:23 +02:00
Cedric Nugteren 66583b3cda The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run part of alltuners target 2018-05-19 12:48:59 +02:00
Cedric Nugteren 2d1f6ba7fe Added convgemm skeleton, test infrastructure, and first reference implementation 2018-05-06 11:35:34 +02:00
Cedric Nugteren 3e3a26e0da Fixes for the CUDA API 2018-04-20 21:50:36 +02:00
Cedric Nugteren 0e1a152023 First version of the tuning API, added interface for copy-kernel, added sample 2018-03-06 20:52:12 +01:00
Cedric Nugteren c5a28cd70b Added CLBlast version numbering to the compiled library 2018-02-11 15:31:21 +01:00
Cedric Nugteren ef5008f5e4 Created the API and stubs for the HAD (hadamard-product) routines 2018-01-31 20:41:02 +01:00
Cedric Nugteren 37c5e8f58c Updated to CLBlast version 1.3.0 2018-01-29 20:45:21 +01:00
Cedric Nugteren d1d80ca131 Fixed a compilation error of the kernel-preprocessor test under MSVC 2018-01-29 20:26:25 +01:00
Cedric Nugteren 0e5eaa6eb9 Factored out the generic parts of the GEMM routine tuner 2018-01-15 21:32:51 +01:00
Cedric Nugteren 90e8e55acb Added test for the RetrieveParameters function 2018-01-11 20:34:09 +01:00
Cedric Nugteren 9fb2c61b25 Added API and tests for new GemmStridedBatched routine 2018-01-07 14:27:15 +01:00
Cedric Nugteren 1e738db6dd Split the database into multiple small compilation units 2017-12-27 12:04:22 +01:00
Cedric Nugteren bd540829ea Fixes for the CUDA backend of CLBlast 2017-12-24 12:10:55 +01:00
Cedric Nugteren 8657e90cf8 Fixed linking of the preprocessor test for MSVC 2017-12-24 11:33:47 +01:00
Cedric Nugteren b1f52f130c Updated the database to use the new TRSV and Invert tuners 2017-12-23 13:55:22 +01:00
Cedric Nugteren aa7db4f987 Added TRSV block-size tuner 2017-12-23 13:34:57 +01:00
Cedric Nugteren 07a7012b0d Added skeleton for a tuner for the invert kernel 2017-12-19 21:10:48 +01:00
Cedric Nugteren c0c6d00b12 Added stub for a preprocessor and a corresponding compilation test 2017-11-25 10:24:05 +01:00
Cedric Nugteren c6690df896 Made the tuners be compiled by default 2017-11-19 14:33:25 +01:00
Cedric Nugteren 8d2f7d53aa Added a library with common tuner sources to speed-up compilation 2017-11-19 12:59:28 +01:00
Cedric Nugteren f94d498a37 Moved compilation function to separate file; removed dependency of tuners of the CLBlast library 2017-11-17 20:57:46 +01:00
Cedric Nugteren d9cf206979 Removed dependency on CLTune 2017-11-16 21:28:36 +01:00
Cedric Nugteren 1b2b46f2f0 Added first version of integrated and re-written auto-tuner 2017-11-15 22:49:35 +01:00
Cedric Nugteren 0cd78bb6f9 Added kernel timing functionality to the utilities 2017-11-15 22:47:06 +01:00
Cedric Nugteren 5d5e3f93bc Updated to CLBlast version 1.2.0 2017-11-08 21:30:06 +01:00
Cedric Nugteren b18cc9d3f1 Merge pull request #212 from CNugteren/kernel_selection_tuner: GEMM kernel selection tuner 2017-11-07 22:20:13 +01:00
Cedric Nugteren 9b0a435fb0 Integrated the GEMM routine tuner for kernel selection; added first tuning results 2017-11-02 21:47:14 +01:00
Cedric Nugteren f24d611e57 Made it possible to compile the CLBlast performance clients for Android with the NDK 2017-10-29 13:02:14 +01:00
Cedric Nugteren 334a26eb12 Added initial version of a GEMM kernel selection tuner 2017-10-28 17:30:29 +02:00
Cedric Nugteren bd57dfa435 Moved timing function to a separate file 2017-10-28 14:12:05 +02:00