Cedric Nugteren
|
472f90501c
|
Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570
|
2017-10-20 18:06:12 +02:00 |
|
Cedric Nugteren
|
42dcd8fd8a
|
Merge pull request #204 from CNugteren/cuda_api
Cuda API to CLBlast
|
2017-10-20 12:07:30 +02:00 |
|
Cedric Nugteren
|
363568787e
|
Moved CUmodule code from Kernel to Program class to not require re-compilation every time
|
2017-10-18 18:17:30 +02:00 |
|
Cedric Nugteren
|
9d879c949a
|
Fix an incompatibility with CUDA's FP16 definition
|
2017-10-17 20:29:23 +02:00 |
|
Cedric Nugteren
|
b1270f04b8
|
Made buffers of batched routines read/write (was: read-only)
|
2017-10-17 19:56:47 +02:00 |
|
Cedric Nugteren
|
f349731d54
|
CUDA kernel compilation fixes
|
2017-10-17 19:53:09 +02:00 |
|
Cedric Nugteren
|
03760f80eb
|
Added CUDA API documentation
|
2017-10-16 21:54:42 +02:00 |
|
Cedric Nugteren
|
0719f14486
|
Made all CUDA kernel launches synchronous; removed exception raising
|
2017-10-16 21:54:42 +02:00 |
|
Cedric Nugteren
|
d62823f067
|
Added a missing OpenCL-to-CUDA function translation
|
2017-10-15 19:53:52 +02:00 |
|
Cedric Nugteren
|
8431a165d0
|
Fixed a small copy-paste typo
|
2017-10-15 19:38:48 +02:00 |
|
Cedric Nugteren
|
e6da575fff
|
Modified test interfaces such that they support either OpenCL or CUDA
|
2017-10-15 19:35:21 +02:00 |
|
Cedric Nugteren
|
7663cba234
|
Fixes for the CUDA API: first tests pass and the client runs
|
2017-10-15 17:43:20 +02:00 |
|
Cedric Nugteren
|
71049e8d39
|
Added the SM-compute-arch version to the nv compile options
|
2017-10-15 17:41:44 +02:00 |
|
Cedric Nugteren
|
a3069a97c3
|
Prepared test and client infrastructure for use with the CUDA API
|
2017-10-15 13:56:19 +02:00 |
|
Cedric Nugteren
|
7408da174c
|
Various fixes to make the first CUDA examples work
|
2017-10-15 12:17:35 +02:00 |
|
Cedric Nugteren
|
55a802c63d
|
Fixed a kernel/attribute order bug in the direct GEMM kernels
|
2017-10-14 17:21:34 +02:00 |
|
Cedric Nugteren
|
b06bc01da9
|
Make local memory pointers a define in OpenCL; some fixes to the recently changed transpose kernel code
|
2017-10-14 17:13:54 +02:00 |
|
Cedric Nugteren
|
d9456306e0
|
Made transpose kernel struct init proper according to the C standard
|
2017-10-14 16:48:06 +02:00 |
|
Cedric Nugteren
|
48133a0cd1
|
Added an option to choose whether to override the MSVC flags from /MT to /MD (default ON)
|
2017-10-14 16:26:35 +02:00 |
|
Cedric Nugteren
|
313fc796b2
|
Fixed several (not all) CUDA kernel compilation issues
|
2017-10-14 16:01:12 +02:00 |
|
Cedric Nugteren
|
74d6e0048c
|
Added DAXPY example for the CUDA API
|
2017-10-14 12:23:35 +02:00 |
|
Cedric Nugteren
|
54d0c440ce
|
Various fixes to make the host code and sample compile with the CUDA API
|
2017-10-14 11:43:57 +02:00 |
|
Cedric Nugteren
|
16b9efd605
|
Added first untested CUDA sample
|
2017-10-14 10:50:28 +02:00 |
|
Cedric Nugteren
|
2d7b648a24
|
Added OpenCL to CUDA translation header for the kernels
|
2017-10-14 10:49:25 +02:00 |
|
Cedric Nugteren
|
cc5b475425
|
CUDA API now takes context and device in instead of stream
|
2017-10-12 12:20:43 +02:00 |
|
Cedric Nugteren
|
b901809345
|
Added first (untested) version of a CUDA API
|
2017-10-11 23:16:57 +02:00 |
|
Cedric Nugteren
|
9224da19ef
|
Fixed the Python generator script w.r.t. the recent change of testing direct/in-direct GEMM kernels separately
|
2017-10-09 20:06:25 +02:00 |
|
Cedric Nugteren
|
44246053a5
|
Removed include of clpp11.hpp in places other than utilities.hpp
|
2017-10-09 19:41:40 +02:00 |
|
Cedric Nugteren
|
e8f1de0265
|
Made the half-precision header OpenCL-independent
|
2017-10-09 18:30:19 +02:00 |
|
Cedric Nugteren
|
df3c9f4a8a
|
Moved non-routine-specific API functions and includes to separate files
|
2017-10-08 21:52:02 +02:00 |
|
Cedric Nugteren
|
2bb8402ec1
|
Merge pull request #198 from CNugteren/cuda_api_preparation
Cuda API preparation
|
2017-10-08 12:03:15 +02:00 |
|
Cedric Nugteren
|
3598762029
|
Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs
|
2017-10-08 10:29:47 +02:00 |
|
Cedric Nugteren
|
6d3e1212f0
|
Synchronizes clpp11.h with CLCudaAPI 9.0
|
2017-10-07 18:43:29 +02:00 |
|
Cedric Nugteren
|
b2058320d1
|
Merge pull request #197 from CNugteren/single_temporary_gemm_buffer
Single temporary GEMM buffer
|
2017-10-07 18:41:46 +02:00 |
|
Cedric Nugteren
|
86b80cdc98
|
Fixed a small typo
|
2017-10-07 18:39:32 +02:00 |
|
Cedric Nugteren
|
375193fe4e
|
Gemm in-direct implementation now uses only 1 larger instead of max 3 optional temporary buffers
|
2017-10-03 21:55:21 +02:00 |
|
Cedric Nugteren
|
74fd6767b9
|
GEMM tests now test both the in-direct and the direct kernels separately
|
2017-10-01 20:36:56 +02:00 |
|
Cedric Nugteren
|
6b226028d5
|
Allow OverrideParameters function to work before a kernel was first used
|
2017-10-01 20:32:39 +02:00 |
|
Cedric Nugteren
|
1009303717
|
Merge branch 'additional_tuners'
|
2017-09-30 21:04:32 +02:00 |
|
Cedric Nugteren
|
c86ba85541
|
Merge pull request #196 from CNugteren/preparation_for_size_specific_parameters
Preparation for size specific parameters
|
2017-09-30 21:03:35 +02:00 |
|
Cedric Nugteren
|
29c5283c4b
|
Kernels are now cached based on their routine name and their tuning parameters
|
2017-09-30 20:29:18 +02:00 |
|
Cedric Nugteren
|
3b7371f81b
|
Merge branch 'master' into preparation_for_size_specific_parameters
|
2017-09-30 20:26:50 +02:00 |
|
Cedric Nugteren
|
c151ab1325
|
Refactored the tuning architecture: less duplication now; more defaults
|
2017-09-30 20:26:26 +02:00 |
|
Cedric Nugteren
|
ef082bba0d
|
Fixed a minor appveyor artifact issue
|
2017-09-30 17:33:37 +02:00 |
|
Cedric Nugteren
|
f4c4674cf6
|
Updated to version 1.1.0
|
2017-09-30 17:19:17 +02:00 |
|
Cedric Nugteren
|
2949e156f5
|
Added notes for Android compilation of CLBlast
|
2017-09-26 21:23:53 +02:00 |
|
Cedric Nugteren
|
00b5771477
|
Added Android header for compilation with gnustl STL
|
2017-09-26 21:20:01 +02:00 |
|
Cedric Nugteren
|
21af690472
|
Added missing headers
|
2017-09-26 21:17:55 +02:00 |
|
Cedric Nugteren
|
ed980a1df1
|
Updated database override function to work with the new database storage format
|
2017-09-24 15:44:14 +02:00 |
|
Cedric Nugteren
|
255f09843c
|
Made program and binary databases dependent on the routine parameters on top of the name
|
2017-09-23 20:40:38 +02:00 |
|