1096 commits

Author SHA1 Message Date
Cedric Nugteren 42ac3b4748 Merge pull request #206 from matze/use-gnuinstall-dirs
Use GNUInstallDirs to determine install paths
2017-10-23 20:03:47 +02:00
Matthias Vogelgesang 34e537a5c1 Use GNUInstallDirs to determine install paths
The GNUInstallDirs module* provides variables that match the install directories
for GNU software and allows users to override them. Without hardcoded paths,
packagers can choose library paths according to distribution policies (e.g.
lib, lib64, lib<arch>, ...).

* https://cmake.org/cmake/help/v3.0/module/GNUInstallDirs.html
2017-10-23 15:54:55 +02:00
Cedric Nugteren 5fd1f2fc60 Added first version of a roadmap 2017-10-20 18:21:31 +02:00
Cedric Nugteren 472f90501c Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570 2017-10-20 18:06:12 +02:00
Cedric Nugteren 42dcd8fd8a Merge pull request #204 from CNugteren/cuda_api
Cuda API to CLBlast
2017-10-20 12:07:30 +02:00
Cedric Nugteren 363568787e Moved CUmodule code from Kernel to Program class to not require re-compilation every time 2017-10-18 18:17:30 +02:00
Cedric Nugteren 9d879c949a Fix an incompatibility with CUDA's FP16 definition 2017-10-17 20:29:23 +02:00
Cedric Nugteren b1270f04b8 Made buffers of batched routines read/write (was: read-only) 2017-10-17 19:56:47 +02:00
Cedric Nugteren f349731d54 CUDA kernel compilation fixes 2017-10-17 19:53:09 +02:00
Cedric Nugteren 03760f80eb Added CUDA API documentation 2017-10-16 21:54:42 +02:00
Cedric Nugteren 0719f14486 Made all CUDA kernel launches synchronous; removed exception raising 2017-10-16 21:54:42 +02:00
Cedric Nugteren d62823f067 Added a missing OpenCL-to-CUDA function translation 2017-10-15 19:53:52 +02:00
Cedric Nugteren 8431a165d0 Fixed a small copy-paste typo 2017-10-15 19:38:48 +02:00
Cedric Nugteren e6da575fff Modified test interfaces such that they support either OpenCL or CUDA 2017-10-15 19:35:21 +02:00
Cedric Nugteren 7663cba234 Fixes for the CUDA API: first tests pass and the client runs 2017-10-15 17:43:20 +02:00
Cedric Nugteren 71049e8d39 Added the SM-compute-arch version to the nv compile options 2017-10-15 17:41:44 +02:00
Cedric Nugteren a3069a97c3 Prepared test and client infrastructure for use with the CUDA API 2017-10-15 13:56:19 +02:00
Cedric Nugteren 7408da174c Various fixes to make the first CUDA examples work 2017-10-15 12:17:35 +02:00
Cedric Nugteren 55a802c63d Fixed a kernel/attribute order bug in the direct GEMM kernels 2017-10-14 17:21:34 +02:00
Cedric Nugteren b06bc01da9 Make local memory pointers a define in OpenCL; some fixes to the recently changed transpose kernel code 2017-10-14 17:13:54 +02:00
Cedric Nugteren d9456306e0 Made transpose kernel struct init proper according to the C standard 2017-10-14 16:48:06 +02:00
Cedric Nugteren 48133a0cd1 Added an option to choose whether to override the MSVC flags from /MT to /MD (default ON) 2017-10-14 16:26:35 +02:00
Cedric Nugteren 313fc796b2 Fixed several (not all) CUDA kernel compilation issues 2017-10-14 16:01:12 +02:00
Cedric Nugteren 74d6e0048c Added DAXPY example for the CUDA API 2017-10-14 12:23:35 +02:00
Cedric Nugteren 54d0c440ce Various fixes to make the host code and sample compile with the CUDA API 2017-10-14 11:43:57 +02:00
Cedric Nugteren 16b9efd605 Added first untested CUDA sample 2017-10-14 10:50:28 +02:00
Cedric Nugteren 2d7b648a24 Added OpenCL to CUDA translation header for the kernels 2017-10-14 10:49:25 +02:00
Cedric Nugteren cc5b475425 CUDA API now takes context and device in instead of stream 2017-10-12 12:20:43 +02:00
Cedric Nugteren b901809345 Added first (untested) version of a CUDA API 2017-10-11 23:16:57 +02:00
Cedric Nugteren 9224da19ef Fixed the Python generator script w.r.t. the recent change of testing direct/in-direct GEMM kernels separately 2017-10-09 20:06:25 +02:00
Cedric Nugteren 44246053a5 Removed include of clpp11.hpp in places other than utilities.hpp 2017-10-09 19:41:40 +02:00
Cedric Nugteren e8f1de0265 Made the half-precision header OpenCL-independent 2017-10-09 18:30:19 +02:00
Cedric Nugteren df3c9f4a8a Moved non-routine-specific API functions and includes to separate files 2017-10-08 21:52:02 +02:00
Cedric Nugteren 2bb8402ec1 Merge pull request #198 from CNugteren/cuda_api_preparation
Cuda API preparation
2017-10-08 12:03:15 +02:00
Cedric Nugteren 3598762029 Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs 2017-10-08 10:29:47 +02:00
Cedric Nugteren 6d3e1212f0 Synchronizes clpp11.h with CLCudaAPI 9.0 2017-10-07 18:43:29 +02:00
Cedric Nugteren b2058320d1 Merge pull request #197 from CNugteren/single_temporary_gemm_buffer
Single temporary GEMM buffer
2017-10-07 18:41:46 +02:00
Cedric Nugteren 86b80cdc98 Fixed a small typo 2017-10-07 18:39:32 +02:00
Cedric Nugteren 375193fe4e GEMM in-direct implementation now uses a single larger temporary buffer instead of up to 3 optional ones 2017-10-03 21:55:21 +02:00
Cedric Nugteren 74fd6767b9 GEMM tests now test both the in-direct and the direct kernels separately 2017-10-01 20:36:56 +02:00
Cedric Nugteren 6b226028d5 Allow OverrideParameters function to work before a kernel was first used 2017-10-01 20:32:39 +02:00
Cedric Nugteren 1009303717 Merge branch 'additional_tuners' 2017-09-30 21:04:32 +02:00
Cedric Nugteren c86ba85541 Merge pull request #196 from CNugteren/preparation_for_size_specific_parameters
Preparation for size specific parameters
2017-09-30 21:03:35 +02:00
Cedric Nugteren 29c5283c4b Kernels are now cached based on their routine name and their tuning parameters 2017-09-30 20:29:18 +02:00
Cedric Nugteren 3b7371f81b Merge branch 'master' into preparation_for_size_specific_parameters 2017-09-30 20:26:50 +02:00
Cedric Nugteren c151ab1325 Refactored the tuning architecture: less duplication now; more defaults 2017-09-30 20:26:26 +02:00
Cedric Nugteren ef082bba0d Fixed a minor appveyor artifact issue 2017-09-30 17:33:37 +02:00
Cedric Nugteren f4c4674cf6 Updated to version 1.1.0 2017-09-30 17:19:17 +02:00
Cedric Nugteren 2949e156f5 Added notes for Android compilation of CLBlast 2017-09-26 21:23:53 +02:00
Cedric Nugteren 00b5771477 Added Android header for compilation with gnustl STL 2017-09-26 21:20:01 +02:00