CLBlast

Commit Graph

Author	SHA1	Message	Date
Cedric Nugteren	9ab1bf24e2	Fix API inconsistency in cupp11.hpp. The function `CopyToAsync` has an optional event argument in the OpenCL version, which is used in CLBlast. This makes the code not compile at all if CUDA (through `cupp11.hpp`) is used as backend. This issue was found by a CLBlast user and reported privately by email. This PR should fix that.	2022-05-23 12:45:22 +02:00
Tyler Sorensen	7709a7308b	Applied feedback from Cedric from first pull request	2018-07-14 19:50:47 -04:00
Cedric Nugteren	bd1715aff9	Fixes for CUDA version of CLBlast	2018-06-03 10:41:57 +02:00
Cedric Nugteren	ad1227c4f2	Added optional temp-buffer argument to C++ interface of GEMM	2017-12-30 18:45:06 +01:00
Cedric Nugteren	ca5dbcd2bd	Made the pre-processor run by default for ARM and Qualcomm GPUs	2017-12-09 15:16:53 +01:00
Cedric Nugteren	0f080bbc6e	Potentially fixed an MSVC 2013 issue with a copy-constructor not being generated	2017-11-20 20:54:18 +01:00
Cedric Nugteren	a3a8b44f59	Some fixes for the new auto-tuner to be compatible with the Python scripts	2017-11-19 16:31:08 +01:00
Cedric Nugteren	363568787e	Moved CUmodule code from Kernel to Program class to not require re-compilation every time	2017-10-18 18:17:30 +02:00
Cedric Nugteren	9d879c949a	Fix an incompatibility with CUDA's FP16 definition	2017-10-17 20:29:23 +02:00
Cedric Nugteren	0719f14486	Made all CUDA kernel launches synchronous; removed exception raising	2017-10-16 21:54:42 +02:00
Cedric Nugteren	71049e8d39	Added the SM-compute-arch version to the nv compile options	2017-10-15 17:41:44 +02:00
Cedric Nugteren	7408da174c	Various fixes to make the first CUDA examples work	2017-10-15 12:17:35 +02:00
Cedric Nugteren	54d0c440ce	Various fixes to make the host code and sample compile with the CUDA API	2017-10-14 11:43:57 +02:00
Cedric Nugteren	b901809345	Added first (untested) version of a CUDA API	2017-10-11 23:16:57 +02:00

14 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)