CLBlast/src
2017-10-27 22:01:15 +02:00
database Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570 2017-10-20 18:06:12 +02:00
kernels CUDA kernel compilation fixes 2017-10-17 19:53:09 +02:00
routines Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM 2017-10-27 22:01:15 +02:00
tuning Gemm in-direct implementation now uses only 1 larger instead of max 3 optional temporary buffers 2017-10-03 21:55:21 +02:00
utilities Various fixes to make the first CUDA examples work 2017-10-15 12:17:35 +02:00
api_common.cpp Added first (untested) version of a CUDA API 2017-10-11 23:16:57 +02:00
cache.cpp Made RemoveBySubset from the cache work with references to keys 2017-02-12 11:58:20 +01:00
cache.hpp Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs 2017-10-08 10:29:47 +02:00
clblast.cpp Moved non-routine-specific API functions and includes to separate files 2017-10-08 21:52:02 +02:00
clblast_c.cpp Added interface and stubs for the im2col routine 2017-07-02 12:10:22 +02:00
clblast_cuda.cpp Various fixes to make the host code and sample compile with the CUDA API 2017-10-14 11:43:57 +02:00
clblast_netlib_c.cpp Added interface and stubs for the im2col routine 2017-07-02 12:10:22 +02:00
clpp11.hpp Made buffers of batched routines read/write (was: read-only) 2017-10-17 19:56:47 +02:00
cupp11.hpp Moved CUmodule code from Kernel to Program class to not require re-compilation every time 2017-10-18 18:17:30 +02:00
cxpp11_common.hpp Various fixes to make the host code and sample compile with the CUDA API 2017-10-14 11:43:57 +02:00
routine.cpp Added OpenCL to CUDA translation header for the kernels 2017-10-14 10:49:25 +02:00
routine.hpp Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs 2017-10-08 10:29:47 +02:00