Commit graph

684 commits

Author SHA1 Message Date
Cedric Nugteren 9c643b293c Improved the for-loop pre-processing 2017-11-26 13:32:48 +01:00
Cedric Nugteren 69aa3b35ed Implemented first simple pre-processor: defines parser and loop unrolling based on assumptions 2017-11-25 17:46:01 +01:00
Cedric Nugteren f01bcded1e Moved string splitting functions; added string character removal function 2017-11-25 17:44:21 +01:00
Cedric Nugteren c0c6d00b12 Added stub for a preprocessor and a corresponding compilation test 2017-11-25 10:24:05 +01:00
Cedric Nugteren ebce82e650
Merge pull request #222 from CNugteren/override_params_from_json
Override params in clients from tuner JSON
2017-11-25 09:48:27 +01:00
Cedric Nugteren abb4d5ab32 Added tuning results for ARM Mali T760 GPU 2017-11-24 21:16:54 +01:00
Cedric Nugteren 9527c89c30 Made parameter override in the clients a command-line argument and added support for multi-kernel routines 2017-11-22 20:53:20 +01:00
Cedric Nugteren 0f080bbc6e Potentially fixed an MSVC 2013 issue with a copy-constructor not being generated 2017-11-20 20:54:18 +01:00
Cedric Nugteren e0f3484084 Fixed some display issues in the GEMM routine tuner 2017-11-20 20:29:52 +01:00
Cedric Nugteren 5467c0cac5 Fixed a variety of warnings and an error for MSVC2013 compilation 2017-11-19 21:09:24 +01:00
Cedric Nugteren 4e0d08c3bc Added compilation timing and better compilation error reporting 2017-11-19 16:58:13 +01:00
Cedric Nugteren a3a8b44f59 Some fixes for the new auto-tuner to be compatible with the Python scripts 2017-11-19 16:31:08 +01:00
Cedric Nugteren 76d2b7f0b6 Revived the GEMM routine tuner; minor formatting changes 2017-11-19 12:59:52 +01:00
Cedric Nugteren 7a54494577 Modified the kernel tuners to use the newly integrated auto-tuner 2017-11-19 12:58:41 +01:00
Cedric Nugteren 8a5a5e031e Moved some tuning functions from .hpp to .cpp 2017-11-17 20:58:36 +01:00
Cedric Nugteren f94d498a37 Moved compilation function to separate file; removed dependency of tuners on the CLBlast library 2017-11-17 20:57:46 +01:00
Cedric Nugteren 2b8ad70b63 Added printing of the best parameters for the new tuner 2017-11-16 21:18:29 +01:00
Cedric Nugteren 1b2b46f2f0 Added first version of integrated and re-written auto-tuner 2017-11-15 22:49:35 +01:00
Cedric Nugteren 0cd78bb6f9 Added kernel timing functionality to the utilities 2017-11-15 22:47:06 +01:00
Cedric Nugteren b337bffbaf Added exception handler with catch-all 2017-11-15 22:44:44 +01:00
Cedric Nugteren 03ebf14b97 Made the exception dispatch function optionally silent 2017-11-13 21:11:31 +01:00
Cedric Nugteren 4bac1287f2 Moved square-difference utility function for use in the tuners 2017-11-13 21:10:44 +01:00
Cedric Nugteren 677afd3b96 Factored out the creation of the OpenCL header and the program compilation 2017-11-11 16:14:43 +01:00
Cedric Nugteren c41d219ea4 Added tuning results for the GeForce GTX750Ti 2017-11-09 21:19:21 +01:00
Cedric Nugteren b18cc9d3f1
Merge pull request #212 from CNugteren/kernel_selection_tuner
GEMM kernel selection tuner
2017-11-07 22:20:13 +01:00
Cedric Nugteren 3ec0be6fb8 Added various GEMM routine tuning results 2017-11-07 21:34:54 +01:00
Cedric Nugteren 33ac2b0175 Improved the way the database defaults are computed 2017-11-06 21:59:45 +01:00
Cedric Nugteren 34a33b54cf Changed GEMM routine tuner's scoring to use L2 measure instead for better averaging 2017-11-06 20:50:36 +01:00
Cedric Nugteren 9b0a435fb0 Integrated the GEMM routine tuner for kernel selection; added first tuning results 2017-11-02 21:47:14 +01:00
Cedric Nugteren 73272ab97d Fixed a bug in database compression/decompression 2017-11-02 21:19:18 +01:00
Cedric Nugteren 5c90577dfd Added collecting and printing of scores for the kernel-selection tuner 2017-10-30 20:39:21 +01:00
Cedric Nugteren ac5a58cfe5 Added platform ID to the binary program cache to prevent issues with multi-platform systems 2017-10-29 20:01:30 +01:00
Cedric Nugteren 319762f150 Added Android support using the GNU C++ STL library and the GCC toolchain 2017-10-29 12:07:07 +01:00
Cedric Nugteren 12b08ae491 Merge branch 'master' into android_support 2017-10-28 17:32:37 +02:00
Cedric Nugteren 334a26eb12 Added initial version of a GEMM kernel selection tuner 2017-10-28 17:30:29 +02:00
Cedric Nugteren bd57dfa435 Moved timing function to a separate file 2017-10-28 14:12:05 +02:00
Cedric Nugteren fa6e5e67f5 Fixed a bug when using the matrix A-offset argument for the TRSM routine 2017-10-27 22:12:30 +02:00
Cedric Nugteren 449577cf07 Reduced TRSM block-size for better numerical stability 2017-10-27 22:07:43 +02:00
Cedric Nugteren 44f7fa628a Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM 2017-10-27 22:01:15 +02:00
Cedric Nugteren d49aae236e Fixed a bug in TRSM routine due to missing event synchronisations after GEMM calls 2017-10-25 20:35:39 +02:00
Cedric Nugteren 472f90501c Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570 2017-10-20 18:06:12 +02:00
Cedric Nugteren 363568787e Moved CUmodule code from Kernel to Program class to not require re-compilation every time 2017-10-18 18:17:30 +02:00
Cedric Nugteren 9d879c949a Fix an incompatibility with CUDA's FP16 definition 2017-10-17 20:29:23 +02:00
Cedric Nugteren b1270f04b8 Made buffers of batched routines read/write (was: read-only) 2017-10-17 19:56:47 +02:00
Cedric Nugteren f349731d54 CUDA kernel compilation fixes 2017-10-17 19:53:09 +02:00
Cedric Nugteren 0719f14486 Made all CUDA kernel launches synchronous; removed exception raising 2017-10-16 21:54:42 +02:00
Cedric Nugteren d62823f067 Added a missing OpenCL-to-CUDA function translation 2017-10-15 19:53:52 +02:00
Cedric Nugteren 7663cba234 Fixes for the CUDA API: first tests pass and the client runs 2017-10-15 17:43:20 +02:00
Cedric Nugteren 71049e8d39 Added the SM-compute-arch version to the nv compile options 2017-10-15 17:41:44 +02:00
Cedric Nugteren 7408da174c Various fixes to make the first CUDA examples work 2017-10-15 12:17:35 +02:00
Cedric Nugteren 55a802c63d Fixed a kernel/attribute order bug in the direct GEMM kernels 2017-10-14 17:21:34 +02:00
Cedric Nugteren b06bc01da9 Make local memory pointers a define in OpenCL; some fixes to the recently changed transpose kernel code 2017-10-14 17:13:54 +02:00
Cedric Nugteren d9456306e0 Made transpose kernel struct init proper according to the C standard 2017-10-14 16:48:06 +02:00
Cedric Nugteren 313fc796b2 Fixed several (not all) CUDA kernel compilation issues 2017-10-14 16:01:12 +02:00
Cedric Nugteren 54d0c440ce Various fixes to make the host code and sample compile with the CUDA API 2017-10-14 11:43:57 +02:00
Cedric Nugteren 2d7b648a24 Added OpenCL to CUDA translation header for the kernels 2017-10-14 10:49:25 +02:00
Cedric Nugteren cc5b475425 CUDA API now takes context and device in instead of stream 2017-10-12 12:20:43 +02:00
Cedric Nugteren b901809345 Added first (untested) version of a CUDA API 2017-10-11 23:16:57 +02:00
Cedric Nugteren 44246053a5 Removed include of clpp11.hpp in places other than utilities.hpp 2017-10-09 19:41:40 +02:00
Cedric Nugteren df3c9f4a8a Moved non-routine-specific API functions and includes to separate files 2017-10-08 21:52:02 +02:00
Cedric Nugteren 3598762029 Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs 2017-10-08 10:29:47 +02:00
Cedric Nugteren 6d3e1212f0 Synchronizes clpp11.h with CLCudaAPI 9.0 2017-10-07 18:43:29 +02:00
Cedric Nugteren 86b80cdc98 Fixed a small typo 2017-10-07 18:39:32 +02:00
Cedric Nugteren 375193fe4e GEMM indirect implementation now uses only 1 larger temporary buffer instead of up to 3 optional ones 2017-10-03 21:55:21 +02:00
Cedric Nugteren 6b226028d5 Allow OverrideParameters function to work before a kernel was first used 2017-10-01 20:32:39 +02:00
Cedric Nugteren 1009303717 Merge branch 'additional_tuners' 2017-09-30 21:04:32 +02:00
Cedric Nugteren c151ab1325 Refactored the tuning architecture: less duplication now; more defaults 2017-09-30 20:26:26 +02:00
Cedric Nugteren 00b5771477 Added Android header for compilation with gnustl STL 2017-09-26 21:20:01 +02:00
Cedric Nugteren 21af690472 Added missing headers 2017-09-26 21:17:55 +02:00
Cedric Nugteren ed980a1df1 Updated database override function to work with the new database storage format 2017-09-24 15:44:14 +02:00
Cedric Nugteren 255f09843c Made program and binary databases dependent on the routine parameters on top of the name 2017-09-23 20:40:38 +02:00
Cedric Nugteren 890281f3e8 Made database-caching no longer dependent on device name but on device/platform IDs 2017-09-23 17:50:44 +02:00
Cedric Nugteren ae1eeb4d1f Fixed type conversion warnings under MSVC 2013 2017-09-19 19:44:34 +02:00
Cedric Nugteren 1d2ee29cb9 Fixed compilation issues of the database for MSVC 2013 2017-09-19 19:44:05 +02:00
Cedric Nugteren a23cd8d13a Updated README with proper AMD device names; fixed device look-up for names of length 50+ 2017-09-16 21:26:38 +02:00
Cedric Nugteren 0802e3d84c Added tuning results for Intel Core i7 6770HQ 2017-09-16 21:19:06 +02:00
Cedric Nugteren bcf39eb79a Fixed a compilation error and warning under MacOS 2017-09-16 18:34:11 +02:00
Cedric Nugteren 163474e171 Fixed an issue with the NVIDIA compute capability not being retrieved properly 2017-09-16 18:25:23 +02:00
Cedric Nugteren 4e317f5e85 Improved compilation time of the tuner database 2017-09-16 18:02:37 +02:00
Cedric Nugteren c21878ecce Added a guard against missing AMD and NVIDIA extensions 2017-09-14 21:58:08 +02:00
Cedric Nugteren 0d13d814c2 Added architecture layer in the tuning database for better performance on unseen devices 2017-09-14 21:27:33 +02:00
Cedric Nugteren 76382ff6c1 Added the new vendor-architecture-name hierarchy to the tuners as well 2017-09-10 16:34:54 +02:00
Cedric Nugteren 91ea7fcde2 Introduced the notion of a device-architecture for the database and added device and architecture name mappings 2017-09-08 21:09:05 +02:00
Cedric Nugteren 20da5e33a8 Split the database files over multiple directories and files; first step towards separate compilation 2017-09-06 21:50:42 +02:00
Cedric Nugteren 8905da259d Fixed a modulo and division issue manifesting on Apple OpenCL for im2col 2017-09-05 18:49:23 +02:00
Cedric Nugteren 28462aa050 Removed an assumption that the 'default' tuning parameters have to be stored last; this is no longer needed 2017-09-04 17:39:57 +02:00
Cedric Nugteren 297159d5b9 Fixed a bug in im2col: process only valid channel IDs 2017-08-31 21:58:12 +02:00
Cedric Nugteren 6194d43efb Fixed a bug in im2col confusing first and second workgroup size; made im2col kernel 2d instead of 3d 2017-08-31 20:34:10 +02:00
Cedric Nugteren 54e160cd88 Fixed some things in the tuner: bugs, style, and defaults to random search 2017-08-31 20:28:01 +02:00
Cedric Nugteren 161fd8514d Merge branch 'master' into im_to_col 2017-08-24 21:15:14 +02:00
Cedric Nugteren 4d9d03ba51 Completed im2col implementation 2017-08-24 21:11:12 +02:00
Cedric Nugteren a8c26594d9 Made the im2col client properly handle the arguments 2017-08-23 19:54:09 +02:00
Cedric Nugteren da28cc5e93 Minor updates after merging in the PSO addition to the tuners 2017-08-21 20:14:02 +02:00
Cedric Nugteren e5eb6b1d3a Merge pull request #173 from mcian/PSO_params
Add PSO parameters support and search strategy selection from command…
2017-08-21 20:06:29 +02:00
mcian dfd332524a Remove multistrategy and related functions 2017-08-21 14:09:11 +02:00
Cedric Nugteren 803ca781f9 First version of im2col kernel, unoptimized but working 2017-08-19 18:25:13 +02:00
Cedric Nugteren 777681dcbd Merge branch 'master' into im_to_col 2017-08-12 20:50:00 +02:00
Cedric Nugteren 0a63621579 Moved functions from the header to the .cpp file to prevent compiling the same code multiple times 2017-08-12 15:59:14 +02:00
Cedric Nugteren 844e68853e Moved some utility functions to a test-specific utility compilation-unit 2017-08-12 15:38:17 +02:00
mcian 4adee60884 Revert the xgemm strategy to default. If the user wants to use multistrategy, they can simply call the function TestHeuristic from main 2017-08-09 16:58:46 +02:00
mcian 0b4aa109f8 Use cltune::SearchMethod enum instead of int values 2017-08-09 16:05:25 +02:00
mcian 99afdcd908 Restore direct GEMM to previous version 2017-07-31 14:06:23 +02:00
Cedric Nugteren 18d832e149 Added tuning results for the Qualcomm Adreno 330 GPU 2017-07-30 18:18:02 +02:00
Cedric Nugteren 0ea16a0e63 Minor optimization for the direct GEMM kernel: don't ceil m and n unnecessarily high 2017-07-25 20:53:12 +02:00
Cedric Nugteren 55861c40ff Merge branch 'relax_gemmbatched_ld_requirements' 2017-07-23 21:04:17 +02:00
mcian 473e814718 Code refactoring 2017-07-23 14:48:13 +02:00
Cedric Nugteren 2d52f9b1d3 Merge pull request #176 from CNugteren/inline_keyword_optional
Made the inline keyword in kernels optional
2017-07-22 10:44:08 +02:00
mcian a36283aaec Add new threshold for ARM 2017-07-17 12:20:46 +02:00
mcian 8131e68664 Add PSO parameters support and search strategy selection from command line 2017-07-17 12:00:25 +02:00
Cedric Nugteren 97bcf77d4b First step towards supporting im2col in the test infrastructure 2017-07-16 22:33:49 +02:00
Cedric Nugteren f77b48692b Relaxed requirement on a_ld and b_ld for batched GEMM 2017-07-12 21:53:39 +02:00
Cedric Nugteren 442c31dd50 Made the inline keyword in kernels optional; currently only enabled for NVIDIA and ARM GPUs 2017-07-08 17:12:16 +02:00
Cedric Nugteren 84ec50e29d Added interface and stubs for the im2col routine 2017-07-02 12:10:22 +02:00
Cedric Nugteren 4cf516cfec Fixed an if-statement in the direct GEMM kernel causing a bug with specific sets of input parameters 2017-06-30 21:57:41 +02:00
Cedric Nugteren 1a8ed48a35 Fixed some Clang and MSVC warnings 2017-06-25 11:50:36 +02:00
Cedric Nugteren 615a7fdc81 Fixes some compilation issues related to the database structure change 2017-06-21 23:07:47 +02:00
Cedric Nugteren e44feb8576 Changed the structure of the database to reduce compilation time and save memory 2017-06-20 21:19:26 +02:00
Cedric Nugteren 48f2682eb7 Added tuning results for the Core i7-920 CPU 2017-06-18 20:53:59 +02:00
Cedric Nugteren 3070b502b5 Fixed an overflow bug on 32-bit systems when choosing a GEMM kernel 2017-06-18 20:51:11 +02:00
Cedric Nugteren 33ed1e5a06 Added tuning results for GeForce GT 650M (thanks to bzcheeseman) 2017-06-01 22:52:08 +02:00
Cedric Nugteren f57e209aab Merge pull request #158 from CNugteren/msvc_compilation_fixes
MSVC compilation fixes
2017-05-27 17:53:30 +02:00
Kirill Mavreshko 64ba590279 Fixed comment describing the order of program cache fields 2017-05-27 10:30:09 +05:00
Cedric Nugteren f7a16d427c Fixed a compilation issue under MSVC 2013 2017-05-26 22:10:56 +02:00
Kirill Mavreshko 628e1e8cce Fixes inability to run GEMM on multiple identical GPUs (issue #155) 2017-05-26 15:04:19 +05:00
Cedric Nugteren 8400ee3a09 Fixed a TRSM issue caused by incorrect block size calculation 2017-05-15 22:04:55 +02:00
Cedric Nugteren 512b83dbad Fixed a missing synchronization barrier in the invert kernel; fixes TRSM tests 2017-05-14 20:27:35 +02:00
Cedric Nugteren f151e56daa Added the IxAMIN routines: absolute minimum version of IxAMAX 2017-05-12 20:01:33 -07:00
Cedric Nugteren 86e8df60f1 Fixed a bug in the TRSM routine; tests now pass 2017-05-12 17:43:56 -07:00
Cedric Nugteren 71933c3411 Added tuning results for the AMD Radeon Fiji GPU 2017-05-11 22:53:52 -07:00
Cedric Nugteren 1df28a15fc Re-added random tuning for GEMM after accidental removal 2017-05-11 22:12:38 -07:00
Cedric Nugteren 1c33af6eab Re-added Titan X (Pascal) tuning results based on more averaging when tuning 2017-04-23 17:58:56 +02:00
Cedric Nugteren 3eea8dc998 Increased the default number of runs for the tuner from 2 up to 10 for fast kernels 2017-04-22 13:56:07 +02:00
Cedric Nugteren 192199c9cb Fixed the direct vs indirect setting for NVIDIA GPUs 2017-04-22 13:43:27 +02:00
Cedric Nugteren e41d204856 Increased the default number of runs for GEMV tuning; updated GEMV tuning results for Iris Pro 2017-04-21 22:12:20 +02:00
Cedric Nugteren d7314d4f8e Tuned the direct versus indirect GEMM kernel trade-off point for NVIDIA GPUs 2017-04-20 22:19:09 +02:00
Cedric Nugteren 409a5a2ad0 Fixed a namespace clash with CUDA FP16 for the half-datatype 2017-04-17 16:47:15 +02:00
Cedric Nugteren 2673f50518 Merge branch 'development' into benchmarking 2017-04-16 19:41:14 +02:00
Cedric Nugteren 10205d773e Added a new Xaxpy kernel in between the regular and fast version in 2017-04-14 20:16:10 +02:00
Cedric Nugteren f7f8ec644f Fixed CUDA malloc and cuBLAS handles: cuBLAS as a performance-reference now works 2017-04-13 21:31:27 +02:00
Cedric Nugteren 22b3ea9256 Merge branch 'development' into cublas_reference
Conflicts:
	scripts/generator/generator.py
2017-04-10 20:11:45 +02:00
Cedric Nugteren 7374c37e2e Fixed a compilation issue under MSVC and GCC 2017-04-10 08:38:24 +02:00
Cedric Nugteren 2d45c37676 Removed const-vector-of-const-objects from the database class to remain compliant with the C++11 standard 2017-04-10 07:40:27 +02:00
Cedric Nugteren fb6c78ea07 Added a special override database for the Apple CPU implementation on OS X: this makes the test work, it does not focus on good performance 2017-04-07 07:37:30 +02:00
Cedric Nugteren d28ee082b0 Uses float2 and double2 for base complex data-types instead of a custom struct; fixes bug on Apple OpenCL 2017-04-07 07:35:15 +02:00
Cedric Nugteren ce369702d8 Added some missing const-ness 2017-04-07 07:34:32 +02:00
Cedric Nugteren b24d364743 Laid the groundwork for cuBLAS comparisons in the clients 2017-04-02 18:06:15 +02:00
Cedric Nugteren b84d2296b8 Separated host-device and device-host memory copies from execution of the CBLAS reference code; for fair timing and code de-duplication 2017-04-01 13:36:24 +02:00
Cedric Nugteren c27d2f0c1e Added an (optional) non-direct implementation of the batched GEMM routine 2017-03-19 16:04:04 +01:00
Cedric Nugteren 2fd04dae83 Added batched versions of the pad/copy/transpose kernels 2017-03-19 15:57:44 +01:00
Cedric Nugteren 11bb30e72b Added the possibility to tune batched kernels 2017-03-14 20:29:51 +01:00
Cedric Nugteren 7b8f8fce68 Added initial naive version of the batched GEMM routine based on the direct GEMM kernel 2017-03-11 16:02:45 +01:00
Cedric Nugteren 49e04c7fce Added API and test infrastructure for the batched GEMM routine 2017-03-10 21:24:35 +01:00
Cedric Nugteren d754586b49 Added proper testing of the alpha parameter; finalized the batched AXPY implementation 2017-03-10 20:49:59 +01:00
Cedric Nugteren 92a657290a Fixed a small compilation bug for MSVC related to a floating-point constant 2017-03-10 20:30:10 +01:00
Cedric Nugteren 878d93e7dc Implemented a batched version of the AXPY kernel 2017-03-08 20:36:35 +01:00
Cedric Nugteren fa0a9c689f Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes 2017-03-08 20:10:20 +01:00
Cedric Nugteren 6aba0bbae7 Minor fixes to the client w.r.t. the addition of the batch count 2017-03-05 16:44:16 +01:00
Cedric Nugteren b114ea49a9 Added first naive version of the batched AXPY routine 2017-03-05 15:06:14 +01:00
Cedric Nugteren cdf354f895 Adjusted the test-infrastructure to support testing of batched-versions of routines 2017-03-05 15:04:16 +01:00
Cedric Nugteren 7f14b11f1e Changed the way the test-data is generated: now using a single MT generator and distribution for all data 2017-03-05 11:13:47 +01:00
Cedric Nugteren f9a520b3af Prepared generator for batched routines; added batched AXPY routine interface 2017-03-05 10:38:38 +01:00
Cedric Nugteren e9ef037549 Added tuning results for the Radeon HD6750M GPU (Apple OpenCL) 2017-03-04 15:24:55 +01:00
Cedric Nugteren e993ee077b Added a proper data-preparation function for the TRSM tests 2017-03-04 15:21:33 +01:00
Cedric Nugteren 3fc73851f7 Added proper support for the b_offset argument in TRSM 2017-03-01 21:23:33 +01:00
Cedric Nugteren 00281dad26 Fixed half-precision bugs in HTBMV/HTPMV/HTRMV/HSYR2K/HTRMM related to incorrect constants 2017-02-27 21:00:04 +01:00
Cedric Nugteren e09c26c706 Split the GEMM kernel further up to prevent C1091 in MSVC 2017-02-26 15:03:12 +01:00
Cedric Nugteren ea6790665d Merge branch 'development' into triangular_solvers 2017-02-26 14:51:45 +01:00
Cedric Nugteren df7638c305 Fixed an out-of-bounds memory access when filling a matrix with a constant 2017-02-26 14:31:05 +01:00
Cedric Nugteren b7310036ed Removed half-precision support from the TRSM routine; too unstable 2017-02-26 12:56:21 +01:00
Cedric Nugteren a433987441 Fixes division in the kernel for inversion of complex numbers 2017-02-26 10:18:45 +01:00
Cedric Nugteren e47d95887c Added PrepareData function for TRSM to create proper test input 2017-02-25 12:23:04 +01:00
Cedric Nugteren 2f2a510c38 Implemented a simple row-major to col-major problem conversion for TRSM 2017-02-24 21:08:44 +01:00
Cedric Nugteren 1e5b5157bc Fixed a few issues with the TRSM routine; some tests still failing 2017-02-22 20:31:33 +01:00
Cedric Nugteren 133ebfc834 Added data-preparation function for the TRSV tests and special nan/inf checks in the error checking to make the tests pass 2017-02-19 17:43:26 +01:00
Cedric Nugteren 0643a29af5 Added tuning parameters for the AMD RX480 GPU (Ellesmere) 2017-02-18 13:59:10 +01:00
Cedric Nugteren d6538dfc25 Fixed the naming of the C API of OverrideParameters and fixed the description 2017-02-18 10:59:38 +01:00
Cedric Nugteren cda449a5c3 Added a C interface to the OverrideParameters function; added some in-line comments to the API 2017-02-16 21:14:48 +01:00
Cedric Nugteren 08bfb75a9d Added input-sanity checks for the OverrideParameters function 2017-02-16 21:12:50 +01:00
Cedric Nugteren cdb3bb7166 Added first version of the OverrideParameters function 2017-02-13 20:53:06 +01:00
Cedric Nugteren 00eb55a2d4 Fixed a small bug in GEMV: unused kernel in parameter list 2017-02-13 20:48:32 +01:00
Cedric Nugteren 345a5feb9a Split the database into several smaller cached per-kernel databases (in preparation of per-kernel database overrides) 2017-02-12 12:02:39 +01:00
Cedric Nugteren faa842b927 Made RemoveBySubset from the cache work with references to keys 2017-02-12 11:58:20 +01:00
Cedric Nugteren 36b942a698 Added an option to remove items from the caches, optionally by a subset of 2 specific key-values only 2017-02-11 14:05:38 +01:00
Cedric Nugteren dc93523204 Added tuning results for Titan X (Pascal version) 2017-02-08 21:14:38 +01:00
Cedric Nugteren c248f900c0 Merge branch 'development' into triangular_solvers 2017-02-05 22:18:59 +01:00
Cedric Nugteren e7cbb5915a Fixed complex version of the TRSV kernel 2017-02-05 14:36:31 +01:00
Cedric Nugteren c209dd7af9 Improved substitution kernels a bit; added complex support 2017-02-04 22:48:06 +01:00
Cedric Nugteren fec8c1a806 Completed a first STRSV implementation 2017-02-04 16:04:19 +01:00
Cedric Nugteren a6ba6470aa Added row-major support for TRSV 2017-02-04 14:25:27 +01:00
Cedric Nugteren 7c73ceb095 Added first (incomplete) version of TRSV routine 2017-01-29 17:02:00 +01:00
Ivan Shapovalov 5fb1da1a0f Database: pass Device instead of Queue for clarity 2017-01-24 12:18:14 +03:00
Ivan Shapovalov 50e758a007 Routine: cache the database instance as well
This does not change much, but will become useful in next commits when
plugin support is introduced.
2017-01-24 11:56:15 +03:00
Ivan Shapovalov 6dc18c1c57 Database: ref-count the internal map for caching 2017-01-24 11:56:15 +03:00
Ivan Shapovalov 5bcd92f297 Routine, Cache: generalize, reduce amount of copying in fast path
Implement a generalized Cache<K, V>. Two variants are provided: the
first one is based on std::map, using C++14-specific transparent
std::less<> and generalized std::map::find() to allow searching by tuple
of references. The second one is based on std::vector and O(n) lookup,
but remains C++11-compliant.
2017-01-24 11:56:15 +03:00
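The std::map-based variant described above relies on C++14 heterogeneous lookup. A minimal sketch, assuming a mutex-guarded cache and a (kernel name, precision) key; the class, `DemoLookup`, and the key layout are hypothetical illustrations, not CLBlast's actual code:

```cpp
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical sketch: std::less<> is transparent, so find() accepts any type
// comparable with Key (e.g. a tuple of references) without building a Key.
template <typename Key, typename Value>
class Cache {
 public:
  // Stores a value under an owning key.
  void Store(Key key, Value value) {
    std::lock_guard<std::mutex> lock(mutex_);  // guards concurrent access
    cache_.emplace(std::move(key), std::move(value));
  }

  // Heterogeneous lookup: the strings referenced by `key` are not copied.
  template <typename LookupKey>
  bool Get(const LookupKey& key, Value* value) const {
    std::lock_guard<std::mutex> lock(mutex_);
    const auto it = cache_.find(key);
    if (it == cache_.end()) { return false; }
    *value = it->second;
    return true;
  }

 private:
  std::map<Key, Value, std::less<>> cache_;
  mutable std::mutex mutex_;
};

// Hypothetical usage: returns the cached value, or -1.0 on a cache miss.
inline double DemoLookup(const std::string& name, int precision) {
  Cache<std::tuple<std::string, int>, double> cache;
  cache.Store(std::make_tuple(std::string("Xgemm"), 32), 1.5);
  double value = -1.0;
  cache.Get(std::tie(name, precision), &value);  // std::tie: tuple of references
  return value;
}
```

The C++11 fallback mentioned in the commit would instead keep a `std::vector` of key/value pairs and scan it linearly, trading O(log n) lookup for not needing transparent comparators.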
Ivan Shapovalov 1b8e816333 FillCache: perform compilation for each precision separately
Thus it does not prevent filling the cache for float if the device does not support,
e.g., double.
2017-01-24 02:43:00 +03:00
Ivan Shapovalov 6ad11665a1 Routine: fix semi-warm routine construction (when binary is in cache)
There was a missing return statement in the semi-warm path that made
CLBlast continue to the cold path after a cache hit.
2017-01-24 02:43:00 +03:00
Ivan Shapovalov a9914ee3a8 src/clpp11.hpp: check pointers before clRelease*()
This is to avoid spurious "induced" errors on destruction, if construction
failed for some reason.
2017-01-24 02:42:59 +03:00
Ivan Shapovalov 8e1c084c93 src/clpp11.hpp: do not store program source/binary in Program
The stored source/binary does not seem to serve any purpose, yet its
presence makes Program a heavy (not pure refcounted) object, which is
undesired esp. because it is copied from the cache in the hot path.
2017-01-24 02:42:59 +03:00
Ivan Shapovalov 1a1e863ab3 treewide: include clpp11.hpp first to silence deprecation warnings
Otherwise, cl.h gets included through clblast.h before clpp11.hpp.
2017-01-20 17:32:42 +03:00
Ivan Shapovalov 43c7707173 Routine: use PrecisionSupported<>() instead of duplicating the check 2017-01-20 17:20:45 +03:00
Cedric Nugteren a5fd2323b6 Added prototype for the TRSV routine 2017-01-20 11:30:32 +01:00
Cedric Nugteren a2c0a9c551 Set number of decimals for floating-point printing for error reporting 2017-01-20 11:13:44 +01:00
Cedric Nugteren 2e4f6e1609 Added tuning results for NVIDIA GTX 1080 and Intel Core i7-4790K 2017-01-19 19:42:31 +01:00
Cedric Nugteren df9a77d74d Added first version of the TRSM routine based on the diagonal invert kernel 2017-01-18 21:29:59 +01:00
Cedric Nugteren 4b3ffd9989 Added a first version of the diagonal block invert routine in preparation of TRSM 2017-01-15 17:30:00 +01:00
Cedric Nugteren 4a4be0c3a5 Prints additional information in verbose/debug mode 2017-01-15 17:17:40 +01:00
Cedric Nugteren 69ca271a8c Always enables cl_khr_fp64 when running double-precision, not just for OpenCL 1.1 or lower 2017-01-07 13:31:29 +01:00
Cedric Nugteren 32b850b12b Added tuning results for the AMD Turks GPU and the Intel Core i7-2670QM CPU 2017-01-03 20:30:56 +01:00
Cedric Nugteren 681a465b35 Prepared for the addition of the TRSM triangular solver kernel 2016-12-18 12:30:16 +01:00
Cedric Nugteren 6b533dda1c Fixed a bug when using offsets in the direct GEMM kernels 2016-12-18 11:54:32 +01:00
Cedric Nugteren 26e0177431 Made Intel GPUs always use the indirect version of the GEMM kernel 2016-11-29 20:47:20 +01:00
Cedric Nugteren 39c49bf4f9 Made it possible to use the command-line environment variables for each executable and without re-running CMake 2016-11-27 11:00:29 +01:00
Cedric Nugteren 080e1be684 Improved the default parameters for cases with non-common parameters across all devices 2016-11-26 16:38:17 +01:00
Cedric Nugteren cb398f0e42 Merge pull request #125 from CNugteren/netlib_blas_api
Netlib CBLAS API for CLBlast
2016-11-24 19:35:59 +01:00
Cedric Nugteren 792cc8359f Fixed a vector-size related bug in the CLBlast Netlib API 2016-11-23 22:00:20 +01:00
Cedric Nugteren 654b41bb2b Fixed a bug in the HSCAL routine 2016-11-23 21:29:16 +01:00
Cedric Nugteren 26ca071480 Minor changes to ensure full compatibility with the Netlib CBLAS API 2016-11-22 08:41:52 +01:00
Cedric Nugteren eefe0df435 Made functions with scalar-buffers as output properly return values 2016-11-20 21:36:57 +01:00
Cedric Nugteren d8af24e388 Now correctly tests for validity of the B matrix in the TRMM routine 2016-11-20 16:27:54 +01:00
Cedric Nugteren 90eb8738c4 Forced OpenCL 1.1 compilation and disabled a deprecation warning 2016-11-20 16:27:02 +01:00
Cedric Nugteren 2f0697564f Fixed a bug in the TRMM routine caused by overwriting input data before consuming everything 2016-11-20 15:05:42 +01:00
Cedric Nugteren 6eeb1180fd Changed the GEMM kernel selection parameters for Skylake GPUs to always favour the regular kernel 2016-11-19 22:15:33 +01:00
Cedric Nugteren 746d688e07 Updated the tuning results for the Intel Skylake ULT GT2 GPU 2016-11-15 22:42:04 +01:00
Cedric Nugteren 8ae8ab06a2 Renamed the include and source files of the Netlib CBLAS API 2016-10-25 20:33:10 +02:00
Cedric Nugteren 140121ef91 Removed the clblast namespace from the Netlib C API source file to ensure proper linking 2016-10-25 20:21:50 +02:00
Cedric Nugteren 729862e873 Fixed some issues preventing the Netlib CBLAS API from linking correctly 2016-10-25 19:56:42 +02:00
Cedric Nugteren 926aca53a0 Made the Netlib CBLAS API use the same enums with prefixes as the regular C API of CLBlast 2016-10-25 19:45:57 +02:00
Cedric Nugteren 59183b7d79 Sets the proper sizes for the buffers for the Netlib CBLAS API 2016-10-25 19:21:49 +02:00
Cedric Nugteren f96fd372bc Added initial version of a Netlib CBLAS implementation. TODO: Set correct buffer sizes 2016-10-25 14:28:52 +02:00
Cedric Nugteren ec687afa75 Added tuning results for GeForce GTX TITAN Black 2016-10-24 19:49:10 +02:00
Cedric Nugteren 76d5d2ccfc Fixed a bug in the transpose-matrix function 2016-10-23 20:49:55 +02:00
Cedric Nugteren b8d4a9b9d0 Removed PUBLIC_API from the C++ exception classes 2016-10-23 16:09:59 +02:00
Cedric Nugteren 66f5c9d9b8 Added a fix for compilation under Visual Studio 2013 related to the new exception classes 2016-10-23 15:55:03 +02:00
Cedric Nugteren c925fe463f Added tuning results for the AMD Tonga GPU 2016-10-22 16:25:31 +02:00
Cedric Nugteren a670c4c4bf All enums in the C API are now prefixed with CLBlast to avoid potential name clashes with other projects 2016-10-22 16:14:56 +02:00
Cedric Nugteren b0ff11acf0 Moved files around a bit; created a utilities subfolder 2016-10-22 15:36:48 +02:00
Cedric Nugteren 9afbbc9ef9 Added documentation for the better exception handling 2016-10-22 15:23:18 +02:00
Cedric Nugteren 280698d076 Merge pull request #117 from intelfx/exceptions
Convert to use C++ exceptions internally
2016-10-22 15:05:12 +02:00
Cedric Nugteren 9b596820d2 Fixed a bug in the SYRK/SYR2K/HERK/HER2K routines that would occur with specific tuning parameters (2) 2016-10-22 10:50:12 +02:00
Cedric Nugteren db17b1fbe9 Fixed a bug in the SYRK/SYR2K/HERK/HER2K routines that would occur with specific tuning parameters 2016-10-22 10:41:02 +02:00
Ivan Shapovalov 56f300607b Routine: get rid of ::SetUp()
Since we now use C++ exceptions inside the implementation (and exceptions
can be thrown from constructors), there is no need for a separate
Routine::SetUp() function.

For this, we also change how the kernel source string is constructed.
The kernel-specific source code is now passed to the Routine ctor via
an initializer_list of C strings to avoid unnecessary data copying
while also working around C1091 of MSVC 2013.
2016-10-22 08:45:27 +03:00
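Passing the kernel source as a list of C-string fragments can be sketched as follows, assuming the fragments are concatenated once in the constructor; this `Routine` class is a hypothetical illustration, not CLBlast's actual one. Each fragment stays a separate literal, which keeps every literal under MSVC 2013's length limit (error C1091):

```cpp
#include <cstring>
#include <initializer_list>
#include <string>

// Hypothetical sketch: the constructor receives source fragments as C strings
// (no intermediate std::string copies) and builds the full kernel source once.
class Routine {
 public:
  explicit Routine(std::initializer_list<const char*> source_fragments) {
    std::size_t total_length = 0;
    for (const char* fragment : source_fragments) { total_length += std::strlen(fragment); }
    source_.reserve(total_length);  // reserve up front so appending does not reallocate
    for (const char* fragment : source_fragments) { source_ += fragment; }
  }
  const std::string& source() const { return source_; }

 private:
  std::string source_;
};
```

Because constructors may throw, a failure while building or compiling the source can be reported from here directly, which is what makes a separate `SetUp()` step unnecessary.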
Ivan Shapovalov b98af44fcf treewide: use C++ exceptions properly
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.

Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.

However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
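The error-handling split described above (internal exceptions, library errors translated at the C++ API boundary, a catch-all in the C interface) can be sketched like this. All names and status values are hypothetical illustrations, not CLBlast's actual declarations:

```cpp
#include <stdexcept>
#include <string>

// Illustrative status codes (hypothetical values).
enum class StatusCode { kSuccess = 0, kInvalidArgument = -1, kUnknownError = -2 };

// Library-specific exception carrying a status code.
class RoutineError : public std::runtime_error {
 public:
  RoutineError(StatusCode status, const std::string& message)
      : std::runtime_error(message), status_(status) {}
  StatusCode status() const { return status_; }

 private:
  StatusCode status_;
};

// Internal code only throws; it never returns error codes.
inline void InternalRoutine(int n) {
  if (n < 0) { throw RoutineError(StatusCode::kInvalidArgument, "n must be non-negative"); }
  if (n == 0) { throw std::logic_error("simulated failed assertion"); }
}

// C++ API: only library errors become status codes; logic errors and
// std::bad_alloc are deliberately left to propagate.
inline StatusCode CppApiCall(int n) {
  try {
    InternalRoutine(n);
    return StatusCode::kSuccess;
  } catch (const RoutineError& e) {
    return e.status();
  }
}

// C API: no exception may escape into C code, so catch everything.
extern "C" int CApiCall(int n) {
  try {
    return static_cast<int>(CppApiCall(n));
  } catch (...) {
    return static_cast<int>(StatusCode::kUnknownError);  // wild-card error code
  }
}
```

Here `CppApiCall(0)` would let the `std::logic_error` escape to the caller, while `CApiCall(0)` maps it to the wild-card code.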
Ivan Shapovalov 5d03d48f7a src/clpp11.hpp: avoid throwing exceptions from std::shared_ptr's Deleter 2016-10-22 07:25:16 +03:00
Ivan Shapovalov 6ac7edd2da src/clpp11.hpp: GetInfoString: avoid reallocation 2016-10-22 07:25:16 +03:00
Ivan Shapovalov 106565fa9a src/clpp11.hpp: reinstate error checking on clGetEventProfilingInfo() 2016-10-22 07:25:15 +03:00
Cedric Nugteren 597974b40d Merge pull request #118 from matze/add-pkg-config
Generate and install pkg-config description
2016-10-21 21:00:07 +02:00
Matthias Vogelgesang 3797d144cc Generate and install pkg-config description 2016-10-21 09:38:25 +02:00
Cedric Nugteren 0f9311d46a Fixed an issue with a growing database: the database is now a global variable in a namespace and its container uses const-pointers to the actual data 2016-10-14 20:56:32 +02:00
Cedric Nugteren ebb505b783 Added tuning results for Intel HD Graphics IvyBridge GPU 2016-10-13 12:18:28 +02:00
Cedric Nugteren c60f6715f8 Removed a spurious #ifdef 2016-10-12 21:49:59 +02:00