CLBlast/src
Latest commit: f8fb707fa4 by Cedric Nugteren, 2018-07-23 19:43:03 +02:00
Merge pull request #297 from tyler-utah/master: inline PTX to support subgroup shuffle for Nvidia GPUs
Name | Last commit | Date
database | Added tuning results for Intel i5-4970S | 2018-07-13 21:25:21 +02:00
kernels | moved a two-line macro to a single line | 2018-07-16 20:12:30 -04:00
pyclblast | Updated pyclblast to 1.1.0 and uploaded to PyPi | 2018-03-30 10:38:36 +02:00
routines | Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when barriers are present | 2018-06-01 20:59:44 +02:00
tuning | forgot to add test cases back in, oops | 2018-07-14 22:47:39 -04:00
utilities | Merge pull request #297 from tyler-utah/master | 2018-07-23 19:43:03 +02:00
api_common.cpp | Added a RetrieveParameters function to inspect tuning parameters | 2018-01-11 20:32:06 +01:00
cache.cpp | Now stores a shared_ptr to the Program class in the cache | 2018-05-01 20:34:48 +02:00
cache.hpp | Now stores a shared_ptr to the Program class in the cache | 2018-05-01 20:34:48 +02:00
clblast.cpp | Made GEMM rotation expectations kernel-specific | 2018-04-13 22:27:11 +02:00
clblast_c.cpp | Fixed some small issues regarding PR#253 | 2018-03-03 10:43:12 +01:00
clblast_cuda.cpp | Fixes for the CUDA API | 2018-04-20 21:50:36 +02:00
clblast_netlib_c.cpp | Created the API and stubs for the HAD (hadamard-product) routines | 2018-01-31 20:41:02 +01:00
clpp11.hpp | Applied feedback from Cedric from first pull request | 2018-07-14 19:50:47 -04:00
cupp11.hpp | Applied feedback from Cedric from first pull request | 2018-07-14 19:50:47 -04:00
cxpp11_common.hpp | Various fixes to make the host code and sample compile with the CUDA API | 2017-10-14 11:43:57 +02:00
kernel_preprocessor.cpp | Fixed a warning under MSVC | 2017-12-23 15:30:08 +01:00
kernel_preprocessor.hpp | Implemented first simple pre-processor: defines parser and loop unrolling based on assumptions | 2017-11-25 17:46:01 +01:00
routine.cpp | Eliminate a temporary Program object | 2018-07-06 12:58:20 +01:00
routine.hpp | Now stores a shared_ptr to the Program class in the cache | 2018-05-01 20:34:48 +02:00