Commit graph

1015 commits

Author SHA1 Message Date
Cedric Nugteren 34a33b54cf Changed GEMM routine tuner's scoring to use L2 measure instead for better averaging 2017-11-06 20:50:36 +01:00
Cedric Nugteren 9b0a435fb0 Integrated the GEMM routine tuner for kernel selection; added first tuning results 2017-11-02 21:47:14 +01:00
Cedric Nugteren 73272ab97d Fixed a bug in database compression/decompression 2017-11-02 21:19:18 +01:00
Cedric Nugteren 5c90577dfd Added collecting and printing of scores for the kernel-selection tuner 2017-10-30 20:39:21 +01:00
Cedric Nugteren 061b1c571b Merge branch 'binary_cache_platform_dependent' 2017-10-30 19:42:35 +01:00
Cedric Nugteren ac5a58cfe5 Added platform ID to the binary program cache to prevent issues with multi-platform systems 2017-10-29 20:01:30 +01:00
Cedric Nugteren 19c53f6dd0 Merge pull request #208 from CNugteren/android_support
Added Android support
2017-10-29 16:45:56 +01:00
Cedric Nugteren f24d611e57 Made it possible to compile the CLBlast performance clients for Android with the NDK 2017-10-29 13:02:14 +01:00
Cedric Nugteren 319762f150 Added Android support using the GNU C++ STL library and the GCC toolchain 2017-10-29 12:07:07 +01:00
Cedric Nugteren 12b08ae491 Merge branch 'master' into android_support 2017-10-28 17:32:37 +02:00
Cedric Nugteren 334a26eb12 Added initial version of a GEMM kernel selection tuner 2017-10-28 17:30:29 +02:00
Cedric Nugteren bd57dfa435 Moved timing function to a separate file 2017-10-28 14:12:05 +02:00
Cedric Nugteren fa6e5e67f5 Fixed a bug when using the matrix A-offset argument for the TRSM routine 2017-10-27 22:12:30 +02:00
Cedric Nugteren 449577cf07 Reduced TRSM block-size for better numerical stability 2017-10-27 22:07:43 +02:00
Cedric Nugteren 44f7fa628a Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM 2017-10-27 22:01:15 +02:00
Cedric Nugteren 8579b2b494 Added a DTRSM C++ interface example 2017-10-27 21:53:19 +02:00
Cedric Nugteren e388f055f7 Fixed small bug in (unused) invert tester 2017-10-25 20:35:39 +02:00
Cedric Nugteren 8cdb5cb4a7 Updated roadmap with links to issues and status 2017-10-25 20:35:39 +02:00
Cedric Nugteren d49aae236e Fixed a bug in TRSM routine due to missing event synchronisations after GEMM calls 2017-10-25 20:35:39 +02:00
Cedric Nugteren 42ac3b4748 Merge pull request #206 from matze/use-gnuinstall-dirs
Use GNUInstallDirs to determine install paths
2017-10-23 20:03:47 +02:00
Matthias Vogelgesang 34e537a5c1 Use GNUInstallDirs to determine install paths
The GNUInstallDirs module* provides variables that match the install directories
for GNU software and allows users to override them. Without hardcoded paths,
packagers can choose library paths according to distribution policies (e.g.
lib, lib64, lib<arch>, ...).

* https://cmake.org/cmake/help/v3.0/module/GNUInstallDirs.html
2017-10-23 15:54:55 +02:00
Cedric Nugteren 5fd1f2fc60 Added first version of a roadmap 2017-10-20 18:21:31 +02:00
Cedric Nugteren 472f90501c Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570 2017-10-20 18:06:12 +02:00
Cedric Nugteren 42dcd8fd8a Merge pull request #204 from CNugteren/cuda_api
Cuda API to CLBlast
2017-10-20 12:07:30 +02:00
Cedric Nugteren 363568787e Moved CUmodule code from Kernel to Program class to not require re-compilation every time 2017-10-18 18:17:30 +02:00
Cedric Nugteren 9d879c949a Fix an incompatibility with CUDA's FP16 definition 2017-10-17 20:29:23 +02:00
Cedric Nugteren b1270f04b8 Made buffers of batched routines read/write (was: read-only) 2017-10-17 19:56:47 +02:00
Cedric Nugteren f349731d54 CUDA kernel compilation fixes 2017-10-17 19:53:09 +02:00
Cedric Nugteren 03760f80eb Added CUDA API documentation 2017-10-16 21:54:42 +02:00
Cedric Nugteren 0719f14486 Made all CUDA kernel launches synchronous; removed exception raising 2017-10-16 21:54:42 +02:00
Cedric Nugteren d62823f067 Added a missing OpenCL-to-CUDA function translation 2017-10-15 19:53:52 +02:00
Cedric Nugteren 8431a165d0 Fixed a small copy-paste typo 2017-10-15 19:38:48 +02:00
Cedric Nugteren e6da575fff Modified test interfaces such that they support either OpenCL or CUDA 2017-10-15 19:35:21 +02:00
Cedric Nugteren 7663cba234 Fixes for the CUDA API: first tests pass and the client runs 2017-10-15 17:43:20 +02:00
Cedric Nugteren 71049e8d39 Added the SM-compute-arch version to the nv compile options 2017-10-15 17:41:44 +02:00
Cedric Nugteren a3069a97c3 Prepared test and client infrastructure for use with the CUDA API 2017-10-15 13:56:19 +02:00
Cedric Nugteren 7408da174c Various fixes to make the first CUDA examples work 2017-10-15 12:17:35 +02:00
Cedric Nugteren 55a802c63d Fixed a kernel/attribute order bug in the direct GEMM kernels 2017-10-14 17:21:34 +02:00
Cedric Nugteren b06bc01da9 Make local memory pointers a define in OpenCL; some fixes to the recently changed transpose kernel code 2017-10-14 17:13:54 +02:00
Cedric Nugteren d9456306e0 Made transpose kernel struct init proper according to the C standard 2017-10-14 16:48:06 +02:00
Cedric Nugteren 48133a0cd1 Added an option to choose whether to override the MSVC flags from /MT to /MD (default ON) 2017-10-14 16:26:35 +02:00
Cedric Nugteren 313fc796b2 Fixed several (not all) CUDA kernel compilation issues 2017-10-14 16:01:12 +02:00
Cedric Nugteren 74d6e0048c Added DAXPY example for the CUDA API 2017-10-14 12:23:35 +02:00
Cedric Nugteren 54d0c440ce Various fixes to make the host code and sample compile with the CUDA API 2017-10-14 11:43:57 +02:00
Cedric Nugteren 16b9efd605 Added first untested CUDA sample 2017-10-14 10:50:28 +02:00
Cedric Nugteren 2d7b648a24 Added OpenCL to CUDA translation header for the kernels 2017-10-14 10:49:25 +02:00
Cedric Nugteren cc5b475425 CUDA API now takes context and device in instead of stream 2017-10-12 12:20:43 +02:00
Cedric Nugteren b901809345 Added first (untested) version of a CUDA API 2017-10-11 23:16:57 +02:00
Cedric Nugteren 9224da19ef Fixed the Python generator script w.r.t. the recent change of testing direct/in-direct GEMM kernels separately 2017-10-09 20:06:25 +02:00
Cedric Nugteren 44246053a5 Removed include of clpp11.hpp in places other than utilities.hpp 2017-10-09 19:41:40 +02:00