Commit Graph

44 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)

Author SHA1 Message Date
Koichi Akabe 032e3b0cc0 Add kernel_mode option to im2col, col2im, and convgemm functions 2018-11-12 10:12:07 +09:00
Cedric Nugteren d45911b61d Added groundwork for col2im algorithm plus first non-working version of kernel and test 2018-10-23 20:52:25 +02:00
Cedric Nugteren 2dd539f911 Removed complex numbers support for CONVGEMM 2018-07-29 10:37:14 +02:00
Cedric Nugteren 2d1f6ba7fe Added convgemm skeleton, test infrastructure, and first reference implementation 2018-05-06 11:35:34 +02:00
Cedric Nugteren 2776d76176 Added interface of batched convolution as GEMM 2018-05-05 14:06:33 +02:00
Cedric Nugteren 0dff7f1ac4 Made GEMM rotation expectations kernel-specific 2018-04-13 22:27:11 +02:00
Cedric Nugteren ef5008f5e4 Created the API and stubs for the HAD (hadamard-product) routines 2018-01-31 20:41:02 +01:00
Cedric Nugteren 9fb2c61b25 Added API and tests for new GemmStridedBatched routine 2018-01-07 14:27:15 +01:00
Cedric Nugteren ce069545d4 Added CUDA interface to get temporary-buffer size for GEMM routine 2018-01-06 10:05:28 +01:00
Cedric Nugteren af14fff1e9 Updated the generator script to automatically generate the temp-buffer code 2018-01-04 19:31:57 +01:00
Cedric Nugteren ad1227c4f2 Added optional temp-buffer argument to C++ interface of GEMM 2017-12-30 18:45:06 +01:00
Cedric Nugteren 6d1e30e61f Added interface to compute the required temporary buffer size for GEMM 2017-12-28 14:46:45 +01:00
Cedric Nugteren df3c9f4a8a Moved non-routine-specific API functions and includes to separate files 2017-10-08 21:52:02 +02:00
Cedric Nugteren 6d3e1212f0 Synchronizes clpp11.h with CLCudaAPI 9.0 2017-10-07 18:43:29 +02:00
Cedric Nugteren 6b226028d5 Allow OverrideParameters function to work before a kernel was first used 2017-10-01 20:32:39 +02:00
Cedric Nugteren ed980a1df1 Updated database override function to work with the new database storage format 2017-09-24 15:44:14 +02:00
Cedric Nugteren 890281f3e8 Made database-caching no longer dependent on device name but on device/platform IDs 2017-09-23 17:50:44 +02:00
Cedric Nugteren 4e317f5e85 Improved compilation time of the tuner database 2017-09-16 18:02:37 +02:00
Cedric Nugteren 0d13d814c2 Added architecture layer in the tuning database for better performance on unseen devices 2017-09-14 21:27:33 +02:00
Cedric Nugteren 20da5e33a8 Split the database files over multiple directories and files; first step towards separate compilation 2017-09-06 21:50:42 +02:00
Cedric Nugteren 84ec50e29d Added interface and stubs for the im2col routine 2017-07-02 12:10:22 +02:00
Cedric Nugteren 615a7fdc81 Fixes some compilation issues related to the database structure change 2017-06-21 23:07:47 +02:00
Kirill Mavreshko 628e1e8cce Fixes inability to run GEMM on multiple identical GPUs (issue #155) 2017-05-26 15:04:19 +05:00
Cedric Nugteren f151e56daa Added the IxAMIN routines: absolute minimum version of IxAMAX 2017-05-12 20:01:33 -07:00
Cedric Nugteren 2d45c37676 Removed const-vector-of-const-objects from the database class to remain according to the C++11 standard 2017-04-10 07:40:27 +02:00
Cedric Nugteren 49e04c7fce Added API and test infrastructure for the batched GEMM routine 2017-03-10 21:24:35 +01:00
Cedric Nugteren fa0a9c689f Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes 2017-03-08 20:10:20 +01:00
Cedric Nugteren b114ea49a9 Added first naive version of the batched AXPY routine 2017-03-05 15:06:14 +01:00
Cedric Nugteren f9a520b3af Prepared generator for batched routines; added batched AXPY routine interface 2017-03-05 10:38:38 +01:00
Cedric Nugteren ea6790665d Merge branch 'development' into triangular_solvers 2017-02-26 14:51:45 +01:00
Cedric Nugteren b7310036ed Removed half-precision support from the TRSM routine; too unstable 2017-02-26 12:56:21 +01:00
Cedric Nugteren cda449a5c3 Added a C interface to the OverrideParameters function; added some in-line comments to the API 2017-02-16 21:14:48 +01:00
Cedric Nugteren 08bfb75a9d Added input-sanity checks for the OverrideParameters function 2017-02-16 21:12:50 +01:00
Cedric Nugteren cdb3bb7166 Added first version of the OverrideParameters function 2017-02-13 20:53:06 +01:00
Cedric Nugteren c248f900c0 Merge branch 'development' into triangular_solvers 2017-02-05 22:18:59 +01:00
Ivan Shapovalov 5bcd92f297 Routine, Cache: generalize, reduce amount of copying in fast path
Implement a generalized Cache<K, V>. Two variants are provided: the
first one is based on std::map, using C++14-specific transparent
std::less<> and generalized std::map::find() to allow searching by tuple
of references. The second one is based on std::vector and O(n) lookup,
but remains C++11-compliant.
2017-01-24 11:56:15 +03:00
Ivan Shapovalov 1b8e816333 FillCache: perform compilation for each precision separately
Thus do not prevent filling cache for float if the device does not support
e. g. double.
2017-01-24 02:43:00 +03:00
Ivan Shapovalov 1a1e863ab3 treewide: include clpp11.hpp first to silence deprecation warnings
Otherwise, cl.h gets included through clblast.h before clpp11.hpp.
2017-01-20 17:32:42 +03:00
Cedric Nugteren a5fd2323b6 Added prototype for the TRSV routine 2017-01-20 11:30:32 +01:00
Cedric Nugteren 681a465b35 Prepared for the addition of the TRSM triangular solver kernel 2016-12-18 12:30:16 +01:00
Ivan Shapovalov 56f300607b Routine: get rid of ::SetUp()
Since we now use C++ exceptions inside the implementation (and exceptions
can be thrown from constructors), there is no need for a separate
Routine::SetUp() function.

For this, we also change the way how the kernel source string is constructed.
The kernel-specific source code is now passed to the Routine ctor via
an initializer_list of C strings to avoid unnecessary data copying
while also working around C1091 of MSVC 2013.
2016-10-22 08:45:27 +03:00
Ivan Shapovalov b98af44fcf treewide: use C++ exceptions properly
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.

Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.

However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
Cedric Nugteren b330ab0866 Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library 2016-06-30 10:49:17 +02:00
Cedric Nugteren 61203453aa Renamed all C++ source files to .cpp to match the .hpp extension better 2016-06-19 13:55:49 +02:00