database | Added tuning results for Intel i5-4970S | 2018-07-13 21:25:21 +02:00
kernels | moved a two-line macro to a single line | 2018-07-16 20:12:30 -04:00
pyclblast | Updated pyclblast to 1.1.0 and uploaded to PyPi | 2018-03-30 10:38:36 +02:00
routines | Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when barriers are present | 2018-06-01 20:59:44 +02:00
tuning | Added code to report the average tuning results | 2018-07-25 22:28:44 +02:00
utilities | Merge pull request #297 from tyler-utah/master | 2018-07-23 19:43:03 +02:00
api_common.cpp | Added a RetrieveParameters function to inspect tuning parameters | 2018-01-11 20:32:06 +01:00
cache.cpp | Now stores a shared_ptr to the Program class in the cache | 2018-05-01 20:34:48 +02:00
cache.hpp | Now stores a shared_ptr to the Program class in the cache | 2018-05-01 20:34:48 +02:00
clblast.cpp | Made GEMM rotation expectations kernel-specific | 2018-04-13 22:27:11 +02:00
clblast_c.cpp | Fixed some small issues regarding PR#253 | 2018-03-03 10:43:12 +01:00
clblast_cuda.cpp | Fixes for the CUDA API | 2018-04-20 21:50:36 +02:00
clblast_netlib_c.cpp | Created the API and stubs for the HAD (hadamard-product) routines | 2018-01-31 20:41:02 +01:00
clpp11.hpp | Fixed a bug: forgot to initialize the shared pointer for the null kernel | 2018-07-27 20:53:24 +02:00
cupp11.hpp | Applied feedback from Cedric from first pull request | 2018-07-14 19:50:47 -04:00
cxpp11_common.hpp | Various fixes to make the host code and sample compile with the CUDA API | 2017-10-14 11:43:57 +02:00
kernel_preprocessor.cpp | Fixed a warning under MSVC | 2017-12-23 15:30:08 +01:00
kernel_preprocessor.hpp | Implemented first simple pre-processor: defines parser and loop unrolling based on assumptions | 2017-11-25 17:46:01 +01:00
routine.cpp | Eliminate a temporary Program object | 2018-07-06 12:58:20 +01:00
routine.hpp | Now stores a shared_ptr to the Program class in the cache | 2018-05-01 20:34:48 +02:00