CLBlast/src
2017-12-23 15:30:08 +01:00
database Updated the database to use the new TRSV and Invert tuners 2017-12-23 13:55:22 +01:00
kernels Split the invert kernel in two parts to prevent error C1091 in MSVC 2013 2017-12-23 14:18:07 +01:00
routines Split the invert kernel in two parts to prevent error C1091 in MSVC 2013 2017-12-23 14:18:07 +01:00
tuning Now calling main TRSV routine again to fix compilation in MSVC 2017-12-23 14:49:21 +01:00
utilities Added TRSV block-size tuner 2017-12-23 13:34:57 +01:00
api_common.cpp Added first (untested) version of a CUDA API 2017-10-11 23:16:57 +02:00
cache.cpp Made RemoveBySubset from the cache work with references to keys 2017-02-12 11:58:20 +01:00
cache.hpp Added platform ID to the binary program cache to prevent issues with multi-platform systems 2017-10-29 20:01:30 +01:00
clblast.cpp Moved non-routine-specific API functions and includes to separate files 2017-10-08 21:52:02 +02:00
clblast_c.cpp Added interface and stubs for the im2col routine 2017-07-02 12:10:22 +02:00
clblast_cuda.cpp Various fixes to make the host code and sample compile with the CUDA API 2017-10-14 11:43:57 +02:00
clblast_netlib_c.cpp Added interface and stubs for the im2col routine 2017-07-02 12:10:22 +02:00
clpp11.hpp Made the pre-processor run by default for ARM and Qualcomm GPUs 2017-12-09 15:16:53 +01:00
cupp11.hpp Made the pre-processor run by default for ARM and Qualcomm GPUs 2017-12-09 15:16:53 +01:00
cxpp11_common.hpp Various fixes to make the host code and sample compile with the CUDA API 2017-10-14 11:43:57 +02:00
kernel_preprocessor.cpp Fixed a warning under MSVC 2017-12-23 15:30:08 +01:00
kernel_preprocessor.hpp Implemented first simple pre-processor: defines parser and loop unrolling based on assumptions 2017-11-25 17:46:01 +01:00
routine.cpp Made the pre-processor run by default for ARM and Qualcomm GPUs 2017-12-09 15:16:53 +01:00
routine.hpp Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs 2017-10-08 10:29:47 +02:00