CLBlast

mirror of https://github.com/CNugteren/CLBlast.git synced 2024-07-15 19:05:44 +02:00

Author	SHA1	Message	Date
Cedric Nugteren	472f90501c	Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570	2017-10-20 18:06:12 +02:00
Cedric Nugteren	42dcd8fd8a	Merge pull request #204 from CNugteren/cuda_api Cuda API to CLBlast	2017-10-20 12:07:30 +02:00
Cedric Nugteren	363568787e	Moved CUmodule code from Kernel to Program class to not require re-compilation every time	2017-10-18 18:17:30 +02:00
Cedric Nugteren	9d879c949a	Fix an incompatibility with CUDA's FP16 definition	2017-10-17 20:29:23 +02:00
Cedric Nugteren	b1270f04b8	Made buffers of batched routines read/write (was: read-only)	2017-10-17 19:56:47 +02:00
Cedric Nugteren	f349731d54	CUDA kernel compilation fixes	2017-10-17 19:53:09 +02:00
Cedric Nugteren	03760f80eb	Added CUDA API documentation	2017-10-16 21:54:42 +02:00
Cedric Nugteren	0719f14486	Made all CUDA kernel launches synchronous; removed exception raising	2017-10-16 21:54:42 +02:00
Cedric Nugteren	d62823f067	Added a missing OpenCL-to-CUDA function translation	2017-10-15 19:53:52 +02:00
Cedric Nugteren	8431a165d0	Fixed a small copy-paste typo	2017-10-15 19:38:48 +02:00
Cedric Nugteren	e6da575fff	Modified test interfaces such that they support either OpenCL or CUDA	2017-10-15 19:35:21 +02:00
Cedric Nugteren	7663cba234	Fixes for the CUDA API: first tests pass and the client runs	2017-10-15 17:43:20 +02:00
Cedric Nugteren	71049e8d39	Added the SM-compute-arch version to the nv compile options	2017-10-15 17:41:44 +02:00
Cedric Nugteren	a3069a97c3	Prepared test and client infrastructure for use with the CUDA API	2017-10-15 13:56:19 +02:00
Cedric Nugteren	7408da174c	Various fixes to make the first CUDA examples work	2017-10-15 12:17:35 +02:00
Cedric Nugteren	55a802c63d	Fixed a kernel/attribute order bug in the direct GEMM kernels	2017-10-14 17:21:34 +02:00
Cedric Nugteren	b06bc01da9	Make local memory pointers a define in OpenCL; some fixes to the recently changed transpose kernel code	2017-10-14 17:13:54 +02:00
Cedric Nugteren	d9456306e0	Made transpose kernel struct init proper according to the C standard	2017-10-14 16:48:06 +02:00
Cedric Nugteren	48133a0cd1	Added an option to choose whether to override the MSVC flags from /MT to /MD (default ON)	2017-10-14 16:26:35 +02:00
Cedric Nugteren	313fc796b2	Fixed several (not all) CUDA kernel compilation issues	2017-10-14 16:01:12 +02:00
Cedric Nugteren	74d6e0048c	Added DAXPY example for the CUDA API	2017-10-14 12:23:35 +02:00
Cedric Nugteren	54d0c440ce	Various fixes to make the host code and sample compile with the CUDA API	2017-10-14 11:43:57 +02:00
Cedric Nugteren	16b9efd605	Added first untested CUDA sample	2017-10-14 10:50:28 +02:00
Cedric Nugteren	2d7b648a24	Added OpenCL to CUDA translation header for the kernels	2017-10-14 10:49:25 +02:00
Cedric Nugteren	cc5b475425	CUDA API now takes context and device in instead of stream	2017-10-12 12:20:43 +02:00
Cedric Nugteren	b901809345	Added first (untested) version of a CUDA API	2017-10-11 23:16:57 +02:00
Cedric Nugteren	9224da19ef	Fixed the Python generator script w.r.t. the recent change of testing direct/in-direct GEMM kernels separately	2017-10-09 20:06:25 +02:00
Cedric Nugteren	44246053a5	Removed include of clpp11.hpp in places other than utilities.hpp	2017-10-09 19:41:40 +02:00
Cedric Nugteren	e8f1de0265	Made the half-precision header OpenCL-independent	2017-10-09 18:30:19 +02:00
Cedric Nugteren	df3c9f4a8a	Moved non-routine-specific API functions and includes to separate files	2017-10-08 21:52:02 +02:00
Cedric Nugteren	2bb8402ec1	Merge pull request #198 from CNugteren/cuda_api_preparation Cuda API preparation	2017-10-08 12:03:15 +02:00
Cedric Nugteren	3598762029	Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs	2017-10-08 10:29:47 +02:00
Cedric Nugteren	6d3e1212f0	Synchronizes clpp11.h with CLCudaAPI 9.0	2017-10-07 18:43:29 +02:00
Cedric Nugteren	b2058320d1	Merge pull request #197 from CNugteren/single_temporary_gemm_buffer Single temporary GEMM buffer	2017-10-07 18:41:46 +02:00
Cedric Nugteren	86b80cdc98	Fixed a small typo	2017-10-07 18:39:32 +02:00
Cedric Nugteren	375193fe4e	Gemm in-direct implementation now uses only 1 larger instead of max 3 optional temporary buffers	2017-10-03 21:55:21 +02:00
Cedric Nugteren	74fd6767b9	GEMM tests now test both the in-direct and the direct kernels separately	2017-10-01 20:36:56 +02:00
Cedric Nugteren	6b226028d5	Allow OverrideParameters function to work before a kernel was first used	2017-10-01 20:32:39 +02:00
Cedric Nugteren	1009303717	Merge branch 'additional_tuners'	2017-09-30 21:04:32 +02:00
Cedric Nugteren	c86ba85541	Merge pull request #196 from CNugteren/preparation_for_size_specific_parameters Preparation for size specific parameters	2017-09-30 21:03:35 +02:00
Cedric Nugteren	29c5283c4b	Kernels are now cached based on their routine name and their tuning parameters	2017-09-30 20:29:18 +02:00
Cedric Nugteren	3b7371f81b	Merge branch 'master' into preparation_for_size_specific_parameters	2017-09-30 20:26:50 +02:00
Cedric Nugteren	c151ab1325	Refactored the tuning architecture: less duplicate now; more defaults	2017-09-30 20:26:26 +02:00
Cedric Nugteren	ef082bba0d	Fixed a minor appveyor artifact issue	2017-09-30 17:33:37 +02:00
Cedric Nugteren	f4c4674cf6	Updated to version 1.1.0	2017-09-30 17:19:17 +02:00
Cedric Nugteren	2949e156f5	Added notes for Android compilation of CLBlast	2017-09-26 21:23:53 +02:00
Cedric Nugteren	00b5771477	Added Android header for compilation with gnustl STL	2017-09-26 21:20:01 +02:00
Cedric Nugteren	21af690472	Added missing headers	2017-09-26 21:17:55 +02:00
Cedric Nugteren	ed980a1df1	Updated database override function to work with the new database storage format	2017-09-24 15:44:14 +02:00
Cedric Nugteren	255f09843c	Made program and binary databases dependent on the routine parameters on top of the name	2017-09-23 20:40:38 +02:00

1043 commits