Matthias Vogelgesang
34e537a5c1
Use GNUInstallDirs to determine install paths
...
The GNUInstallDirs module* provides variables that match the install directories
for GNU software and allows users to override them. Because the paths are no
longer hardcoded, packagers can choose library paths according to distribution
policies (e.g. lib, lib64, lib<arch>, ...).
* https://cmake.org/cmake/help/v3.0/module/GNUInstallDirs.html
2017-10-23 15:54:55 +02:00
Cedric Nugteren
5fd1f2fc60
Added first version of a roadmap
2017-10-20 18:21:31 +02:00
Cedric Nugteren
472f90501c
Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570
2017-10-20 18:06:12 +02:00
Cedric Nugteren
42dcd8fd8a
Merge pull request #204 from CNugteren/cuda_api
...
CUDA API to CLBlast
2017-10-20 12:07:30 +02:00
Cedric Nugteren
363568787e
Moved CUmodule code from Kernel to Program class to not require re-compilation every time
2017-10-18 18:17:30 +02:00
Cedric Nugteren
9d879c949a
Fix an incompatibility with CUDA's FP16 definition
2017-10-17 20:29:23 +02:00
Cedric Nugteren
b1270f04b8
Made buffers of batched routines read/write (was: read-only)
2017-10-17 19:56:47 +02:00
Cedric Nugteren
f349731d54
CUDA kernel compilation fixes
2017-10-17 19:53:09 +02:00
Cedric Nugteren
03760f80eb
Added CUDA API documentation
2017-10-16 21:54:42 +02:00
Cedric Nugteren
0719f14486
Made all CUDA kernel launches synchronous; removed exception raising
2017-10-16 21:54:42 +02:00
Cedric Nugteren
d62823f067
Added a missing OpenCL-to-CUDA function translation
2017-10-15 19:53:52 +02:00
Cedric Nugteren
8431a165d0
Fixed a small copy-paste typo
2017-10-15 19:38:48 +02:00
Cedric Nugteren
e6da575fff
Modified test interfaces such that they support either OpenCL or CUDA
2017-10-15 19:35:21 +02:00
Cedric Nugteren
7663cba234
Fixes for the CUDA API: first tests pass and the client runs
2017-10-15 17:43:20 +02:00
Cedric Nugteren
71049e8d39
Added the SM-compute-arch version to the nv compile options
2017-10-15 17:41:44 +02:00
Cedric Nugteren
a3069a97c3
Prepared test and client infrastructure for use with the CUDA API
2017-10-15 13:56:19 +02:00
Cedric Nugteren
7408da174c
Various fixes to make the first CUDA examples work
2017-10-15 12:17:35 +02:00
Cedric Nugteren
55a802c63d
Fixed a kernel/attribute order bug in the direct GEMM kernels
2017-10-14 17:21:34 +02:00
Cedric Nugteren
b06bc01da9
Make local memory pointers a define in OpenCL; some fixes to the recently changed transpose kernel code
2017-10-14 17:13:54 +02:00
Cedric Nugteren
d9456306e0
Made transpose kernel struct init proper according to the C standard
2017-10-14 16:48:06 +02:00
Cedric Nugteren
48133a0cd1
Added an option to choose whether to override the MSVC flags from /MT to /MD (default ON)
2017-10-14 16:26:35 +02:00
Cedric Nugteren
313fc796b2
Fixed several (not all) CUDA kernel compilation issues
2017-10-14 16:01:12 +02:00
Cedric Nugteren
74d6e0048c
Added DAXPY example for the CUDA API
2017-10-14 12:23:35 +02:00
Cedric Nugteren
54d0c440ce
Various fixes to make the host code and sample compile with the CUDA API
2017-10-14 11:43:57 +02:00
Cedric Nugteren
16b9efd605
Added first untested CUDA sample
2017-10-14 10:50:28 +02:00
Cedric Nugteren
2d7b648a24
Added OpenCL to CUDA translation header for the kernels
2017-10-14 10:49:25 +02:00
Cedric Nugteren
cc5b475425
CUDA API now takes a context and a device as input instead of a stream
2017-10-12 12:20:43 +02:00
Cedric Nugteren
b901809345
Added first (untested) version of a CUDA API
2017-10-11 23:16:57 +02:00
Cedric Nugteren
9224da19ef
Fixed the Python generator script w.r.t. the recent change of testing direct/indirect GEMM kernels separately
2017-10-09 20:06:25 +02:00
Cedric Nugteren
44246053a5
Removed include of clpp11.hpp in places other than utilities.hpp
2017-10-09 19:41:40 +02:00
Cedric Nugteren
e8f1de0265
Made the half-precision header OpenCL-independent
2017-10-09 18:30:19 +02:00
Cedric Nugteren
df3c9f4a8a
Moved non-routine-specific API functions and includes to separate files
2017-10-08 21:52:02 +02:00
Cedric Nugteren
2bb8402ec1
Merge pull request #198 from CNugteren/cuda_api_preparation
...
CUDA API preparation
2017-10-08 12:03:15 +02:00
Cedric Nugteren
3598762029
Moved the remaining OpenCL-specific host code to the clpp11.h header where it belongs
2017-10-08 10:29:47 +02:00
Cedric Nugteren
6d3e1212f0
Synchronized clpp11.h with CLCudaAPI 9.0
2017-10-07 18:43:29 +02:00
Cedric Nugteren
b2058320d1
Merge pull request #197 from CNugteren/single_temporary_gemm_buffer
...
Single temporary GEMM buffer
2017-10-07 18:41:46 +02:00
Cedric Nugteren
86b80cdc98
Fixed a small typo
2017-10-07 18:39:32 +02:00
Cedric Nugteren
375193fe4e
GEMM indirect implementation now uses a single larger temporary buffer instead of up to 3 optional temporary buffers
2017-10-03 21:55:21 +02:00
Cedric Nugteren
74fd6767b9
GEMM tests now test both the indirect and the direct kernels separately
2017-10-01 20:36:56 +02:00
Cedric Nugteren
6b226028d5
Allow OverrideParameters function to work before a kernel was first used
2017-10-01 20:32:39 +02:00
Cedric Nugteren
1009303717
Merge branch 'additional_tuners'
2017-09-30 21:04:32 +02:00
Cedric Nugteren
c86ba85541
Merge pull request #196 from CNugteren/preparation_for_size_specific_parameters
...
Preparation for size-specific parameters
2017-09-30 21:03:35 +02:00
Cedric Nugteren
29c5283c4b
Kernels are now cached based on their routine name and their tuning parameters
2017-09-30 20:29:18 +02:00
Cedric Nugteren
3b7371f81b
Merge branch 'master' into preparation_for_size_specific_parameters
2017-09-30 20:26:50 +02:00
Cedric Nugteren
c151ab1325
Refactored the tuning architecture: less duplication now; more defaults
2017-09-30 20:26:26 +02:00
Cedric Nugteren
ef082bba0d
Fixed a minor appveyor artifact issue
2017-09-30 17:33:37 +02:00
Cedric Nugteren
f4c4674cf6
Updated to version 1.1.0
2017-09-30 17:19:17 +02:00
Cedric Nugteren
ed980a1df1
Updated database override function to work with the new database storage format
2017-09-24 15:44:14 +02:00
Cedric Nugteren
255f09843c
Made program and binary databases dependent on the routine parameters in addition to the name
2017-09-23 20:40:38 +02:00
Cedric Nugteren
0d8313708c
Merge branch 'device_name_slow_on_nvidia_gpu'
2017-09-23 18:12:13 +02:00