CLBlast

mirror of https://github.com/CNugteren/CLBlast.git synced 2024-07-16 03:15:41 +02:00

Cedric Nugteren 44f7fa628a Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM	2017-10-27 22:01:15 +02:00
database	Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570	2017-10-20 18:06:12 +02:00
kernels	CUDA kernel compilation fixes	2017-10-17 19:53:09 +02:00
routines	Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM	2017-10-27 22:01:15 +02:00
tuning	Gemm in-direct implementation now uses only 1 larger instead of max 3 optional temporary buffers	2017-10-03 21:55:21 +02:00
utilities	Various fixes to make the first CUDA examples work	2017-10-15 12:17:35 +02:00
api_common.cpp	Added first (untested) version of a CUDA API	2017-10-11 23:16:57 +02:00
cache.cpp	Made RemoveBySubset from the cache work with references to keys	2017-02-12 11:58:20 +01:00
cache.hpp	Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs	2017-10-08 10:29:47 +02:00
clblast.cpp	Moved non-routine-specific API functions and includes to separate files	2017-10-08 21:52:02 +02:00
clblast_c.cpp	Added interface and stubs for the im2col routine	2017-07-02 12:10:22 +02:00
clblast_cuda.cpp	Various fixes to make the host code and sample compile with the CUDA API	2017-10-14 11:43:57 +02:00
clblast_netlib_c.cpp	Added interface and stubs for the im2col routine	2017-07-02 12:10:22 +02:00
clpp11.hpp	Made buffers of batched routines read/write (was: read-only)	2017-10-17 19:56:47 +02:00
cupp11.hpp	Moved CUmodule code from Kernel to Program class to not require re-compilation every time	2017-10-18 18:17:30 +02:00
cxpp11_common.hpp	Various fixes to make the host code and sample compile with the CUDA API	2017-10-14 11:43:57 +02:00
routine.cpp	Added OpenCL to CUDA translation header for the kernels	2017-10-14 10:49:25 +02:00
routine.hpp	Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs	2017-10-08 10:29:47 +02:00