Commit graph

444 commits

Author SHA1 Message Date
Cedric Nugteren 44f7fa628a Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM 2017-10-27 22:01:15 +02:00
Cedric Nugteren d49aae236e Fixed a bug in TRSM routine due to missing event synchronisations after GEMM calls 2017-10-25 20:35:39 +02:00
Cedric Nugteren 472f90501c Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570 2017-10-20 18:06:12 +02:00
Cedric Nugteren 363568787e Moved CUmodule code from Kernel to Program class to not require re-compilation every time 2017-10-18 18:17:30 +02:00
Cedric Nugteren 9d879c949a Fix an incompatibility with CUDA's FP16 definition 2017-10-17 20:29:23 +02:00
Cedric Nugteren b1270f04b8 Made buffers of batched routines read/write (was: read-only) 2017-10-17 19:56:47 +02:00
Cedric Nugteren f349731d54 CUDA kernel compilation fixes 2017-10-17 19:53:09 +02:00
Cedric Nugteren 0719f14486 Made all CUDA kernel launches synchronous; removed exception raising 2017-10-16 21:54:42 +02:00
Cedric Nugteren d62823f067 Added a missing OpenCL-to-CUDA function translation 2017-10-15 19:53:52 +02:00
Cedric Nugteren 7663cba234 Fixes for the CUDA API: first tests pass and the client runs 2017-10-15 17:43:20 +02:00
Cedric Nugteren 71049e8d39 Added the SM-compute-arch version to the nv compile options 2017-10-15 17:41:44 +02:00
Cedric Nugteren 7408da174c Various fixes to make the first CUDA examples work 2017-10-15 12:17:35 +02:00
Cedric Nugteren 55a802c63d Fixed a kernel/attribute order bug in the direct GEMM kernels 2017-10-14 17:21:34 +02:00
Cedric Nugteren b06bc01da9 Make local memory pointers a define in OpenCL; some fixes to the recently changed transpose kernel code 2017-10-14 17:13:54 +02:00
Cedric Nugteren d9456306e0 Made transpose kernel struct init proper according to the C standard 2017-10-14 16:48:06 +02:00
Cedric Nugteren 313fc796b2 Fixed several (not all) CUDA kernel compilation issues 2017-10-14 16:01:12 +02:00
Cedric Nugteren 54d0c440ce Various fixes to make the host code and sample compile with the CUDA API 2017-10-14 11:43:57 +02:00
Cedric Nugteren 2d7b648a24 Added OpenCL to CUDA translation header for the kernels 2017-10-14 10:49:25 +02:00
Cedric Nugteren cc5b475425 CUDA API now takes context and device in instead of stream 2017-10-12 12:20:43 +02:00
Cedric Nugteren b901809345 Added first (untested) version of a CUDA API 2017-10-11 23:16:57 +02:00
Cedric Nugteren 44246053a5 Removed include of clpp11.hpp in places other than utilities.hpp 2017-10-09 19:41:40 +02:00
Cedric Nugteren df3c9f4a8a Moved non-routine-specific API functions and includes to separate files 2017-10-08 21:52:02 +02:00
Cedric Nugteren 3598762029 Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs 2017-10-08 10:29:47 +02:00
Cedric Nugteren 6d3e1212f0 Synchronizes clpp11.h with CLCudaAPI 9.0 2017-10-07 18:43:29 +02:00
Cedric Nugteren 86b80cdc98 Fixed a small typo 2017-10-07 18:39:32 +02:00
Cedric Nugteren 375193fe4e Gemm in-direct implementation now uses only 1 larger instead of max 3 optional temporary buffers 2017-10-03 21:55:21 +02:00
Cedric Nugteren 6b226028d5 Allow OverrideParameters function to work before a kernel was first used 2017-10-01 20:32:39 +02:00
Cedric Nugteren 1009303717 Merge branch 'additional_tuners' 2017-09-30 21:04:32 +02:00
Cedric Nugteren c151ab1325 Refactored the tuning architecture: less duplicate now; more defaults 2017-09-30 20:26:26 +02:00
Cedric Nugteren ed980a1df1 Updated database override function to work with the new database storage format 2017-09-24 15:44:14 +02:00
Cedric Nugteren 255f09843c Made program and binary databases dependent on the routine parameters on top of the name 2017-09-23 20:40:38 +02:00
Cedric Nugteren 890281f3e8 Made database-caching no longer dependent on device name but on device/platform IDs 2017-09-23 17:50:44 +02:00
Cedric Nugteren ae1eeb4d1f Fixed type conversion warnings under MSVC 2013 2017-09-19 19:44:34 +02:00
Cedric Nugteren 1d2ee29cb9 Fixed compilation issues of the database for MSVC 2013 2017-09-19 19:44:05 +02:00
Cedric Nugteren a23cd8d13a Updated README with proper AMD device names; fixed device look-up for names of length 50+ 2017-09-16 21:26:38 +02:00
Cedric Nugteren 0802e3d84c Added tuning results for Intel Core i7 6770HQ 2017-09-16 21:19:06 +02:00
Cedric Nugteren bcf39eb79a Fixed a compilation error and warning under MacOS 2017-09-16 18:34:11 +02:00
Cedric Nugteren 163474e171 Fixed an issue with the NVIDIA compute capability not being retrieved properly 2017-09-16 18:25:23 +02:00
Cedric Nugteren 4e317f5e85 Improved compilation time of the tuner database 2017-09-16 18:02:37 +02:00
Cedric Nugteren c21878ecce Added a guard against missing AMD and NVIDIA extensions 2017-09-14 21:58:08 +02:00
Cedric Nugteren 0d13d814c2 Added architecture layer in the tuning database for better performance on unseen devices 2017-09-14 21:27:33 +02:00
Cedric Nugteren 76382ff6c1 Added the new vendor-architecture-name hierarchy to the tuners as well 2017-09-10 16:34:54 +02:00
Cedric Nugteren 91ea7fcde2 Introduced the notion of a device-architecture for the database and added device and architecture name mappings 2017-09-08 21:09:05 +02:00
Cedric Nugteren 20da5e33a8 Split the database files over multiple directories and files; first step towards separate compilation 2017-09-06 21:50:42 +02:00
Cedric Nugteren 8905da259d Fixed a modulo and division issue manifesting on Apple OpenCL for im2col 2017-09-05 18:49:23 +02:00
Cedric Nugteren 28462aa050 Removed an assumption that the 'default' tuning parameters have to be stored last; this is no longer needed 2017-09-04 17:39:57 +02:00
Cedric Nugteren 297159d5b9 Fixed a bug in im2col: process only valid channel IDs 2017-08-31 21:58:12 +02:00
Cedric Nugteren 6194d43efb Fixed a bug in im2col confusing first and second workgroup size; made im2col kernel 2d instead of 3d 2017-08-31 20:34:10 +02:00
Cedric Nugteren 54e160cd88 Fixed some things in the tuner: bugs, style, and defaults to random search 2017-08-31 20:28:01 +02:00
Cedric Nugteren 161fd8514d Merge branch 'master' into im_to_col 2017-08-24 21:15:14 +02:00