Commit graph

169 commits

Author SHA1 Message Date
Cedric Nugteren bd540829ea Fixes for the CUDA backend of CLBlast 2017-12-24 12:10:55 +01:00
Cedric Nugteren 8657e90cf8 Fixed linking of the preprocessor test for MSVC 2017-12-24 11:33:47 +01:00
Cedric Nugteren b1f52f130c Updated the database to use the new TRSV and Invert tuners 2017-12-23 13:55:22 +01:00
Cedric Nugteren aa7db4f987 Added TRSV block-size tuner 2017-12-23 13:34:57 +01:00
Cedric Nugteren 07a7012b0d Added skeleton for a tuner for the invert kernel 2017-12-19 21:10:48 +01:00
Cedric Nugteren c0c6d00b12 Added stub for a preprocessor and a corresponding compilation test 2017-11-25 10:24:05 +01:00
Cedric Nugteren c6690df896 Made the tuners be compiled by default 2017-11-19 14:33:25 +01:00
Cedric Nugteren 8d2f7d53aa Added a library with common tuner sources to speed-up compilation 2017-11-19 12:59:28 +01:00
Cedric Nugteren f94d498a37 Moved compilation function to separate file; removed dependency of the tuners on the CLBlast library 2017-11-17 20:57:46 +01:00
Cedric Nugteren d9cf206979 Removed dependency on CLTune 2017-11-16 21:28:36 +01:00
Cedric Nugteren 1b2b46f2f0 Added first version of integrated and re-written auto-tuner 2017-11-15 22:49:35 +01:00
Cedric Nugteren 0cd78bb6f9 Added kernel timing functionality to the utilities 2017-11-15 22:47:06 +01:00
Cedric Nugteren 5d5e3f93bc Updated to CLBlast version 1.2.0 2017-11-08 21:30:06 +01:00
Cedric Nugteren b18cc9d3f1 Merge pull request #212 from CNugteren/kernel_selection_tuner
GEMM kernel selection tuner
2017-11-07 22:20:13 +01:00
Cedric Nugteren 9b0a435fb0 Integrated the GEMM routine tuner for kernel selection; added first tuning results 2017-11-02 21:47:14 +01:00
Cedric Nugteren f24d611e57 Made it possible to compile the CLBlast performance clients for Android with the NDK 2017-10-29 13:02:14 +01:00
Cedric Nugteren 334a26eb12 Added initial version of a GEMM kernel selection tuner 2017-10-28 17:30:29 +02:00
Cedric Nugteren bd57dfa435 Moved timing function to a separate file 2017-10-28 14:12:05 +02:00
Cedric Nugteren 8579b2b494 Added a DTRSM C++ interface example 2017-10-27 21:53:19 +02:00
Matthias Vogelgesang 34e537a5c1 Use GNUInstallDirs to determine install paths
The GNUInstallDirs module* provides variables that match the install directories
for GNU software and allows users to override them. Without hardcoded paths,
packagers can choose library paths according to distribution policies (e.g.
lib, lib64, lib<arch>, ...).

* https://cmake.org/cmake/help/v3.0/module/GNUInstallDirs.html
2017-10-23 15:54:55 +02:00
Cedric Nugteren 42dcd8fd8a Merge pull request #204 from CNugteren/cuda_api
Cuda API to CLBlast
2017-10-20 12:07:30 +02:00
Cedric Nugteren a3069a97c3 Prepared test and client infrastructure for use with the CUDA API 2017-10-15 13:56:19 +02:00
Cedric Nugteren 48133a0cd1 Added an option to choose whether to override the MSVC flags from /MT to /MD (default ON) 2017-10-14 16:26:35 +02:00
Cedric Nugteren 74d6e0048c Added DAXPY example for the CUDA API 2017-10-14 12:23:35 +02:00
Cedric Nugteren 16b9efd605 Added first untested CUDA sample 2017-10-14 10:50:28 +02:00
Cedric Nugteren b901809345 Added first (untested) version of a CUDA API 2017-10-11 23:16:57 +02:00
Cedric Nugteren df3c9f4a8a Moved non-routine-specific API functions and includes to separate files 2017-10-08 21:52:02 +02:00
Cedric Nugteren f4c4674cf6 Updated to version 1.1.0 2017-09-30 17:19:17 +02:00
Cedric Nugteren 2ef6578961 Added first version of a small CLBlast diagnostics helper 2017-09-19 21:43:35 +02:00
Cedric Nugteren 76382ff6c1 Added the new vendor-architecture-name hierarchy to the tuners as well 2017-09-10 16:34:54 +02:00
Cedric Nugteren 91ea7fcde2 Introduced the notion of a device-architecture for the database and added device and architecture name mappings 2017-09-08 21:09:05 +02:00
Cedric Nugteren 20da5e33a8 Split the database files over multiple directories and files; first step towards separate compilation 2017-09-06 21:50:42 +02:00
Cedric Nugteren 777681dcbd Merge branch 'master' into im_to_col 2017-08-12 20:50:00 +02:00
Cedric Nugteren d30c459c5f Fixed .hpp -> .h typo in CMakeLists 2017-08-12 16:11:23 +02:00
Cedric Nugteren f6b6d7ef4b Properly set the common test utilities in the CMake files 2017-08-12 16:07:28 +02:00
Cedric Nugteren 844e68853e Moved some utility functions to a test-specific utility compilation-unit 2017-08-12 15:38:17 +02:00
Cedric Nugteren d588f28dbe Updated CMakeLists to include header files such that IDEs can locate them 2017-08-11 21:20:40 +02:00
Cedric Nugteren eb896838b1 Updated to version 1.0.1 (bugfix release) 2017-08-08 20:35:49 +02:00
Cedric Nugteren 1155c068e9 Updated to version 1.0.0 2017-07-30 20:54:21 +02:00
Cedric Nugteren b494df1111 Fixes warnings for Clang & AppleClang 2017-07-30 18:52:20 +02:00
Cedric Nugteren 6ceb9b7152 Fixes to AppVeyor and Travis scripts 2017-07-30 18:34:39 +02:00
Cedric Nugteren f2477f6636 Removed spurious warning for Clang < 3.9 2017-07-12 20:58:31 +02:00
Cedric Nugteren 84ec50e29d Added interface and stubs for the im2col routine 2017-07-02 12:10:22 +02:00
Cedric Nugteren 52881f3864 Added batched GEMM example program 2017-06-29 21:15:25 +02:00
Cedric Nugteren 4e51b1e1f8 Moved and inlined some static member variables and disabled spurious clang warnings 2017-06-27 21:05:16 +02:00
Cedric Nugteren ce528a9d39 Fixed and suppressed several warnings for MSVC 2017-06-26 21:38:04 +02:00
Cedric Nugteren a823edb65f Reduced optimization level for the (non-performance critical) host-code to speed-up compilation 2017-06-26 21:36:56 +02:00
Cedric Nugteren e9d2a2f54c Updated to version 0.11.0 2017-05-02 20:29:59 +02:00
Cedric Nugteren e3bb58f602 Finalized support for performance testing against cuBLAS 2017-04-16 17:53:51 +02:00
Cedric Nugteren 0cebcbcc71 Added proper CMake searching for CUDA and cuBLAS 2017-04-03 21:45:18 +02:00