Commit Graph

54 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)

Author SHA1 Message Date
Cedric Nugteren 3baf823575
Fixes an issue under Android when the driver was already unloaded (#462) 2023-05-10 17:10:17 +02:00
Angus, Alexander 4f394608a2 implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731 2023-01-03 10:56:04 -08:00
Pradeep Garigipati dff65e9217 Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG 2020-06-07 21:13:33 +05:30
Pradeep Garigipati aec71699f8
Fix Program::GetIR to handle programs with multiple devices 2020-06-05 12:00:45 +05:30
Cedric Nugteren e3ce88154a Silenced a new OpenCL warning message 2020-03-08 10:14:59 +01:00
Cedric Nugteren af6a9eedd1 Added a function to set the OpenCL kernel standard, either 1.1 or 1.2 2019-05-11 20:39:00 +02:00
Umar Arshad cf4907942c Remove assert for extention not available in macOS
The cl_nv_device_attribute_query extention is not available on the
Apple platform. This caused failures during debug builds at runtime.
2019-05-03 23:28:07 -04:00
Cedric Nugteren bf43dbb4ee Made last operation in TRSV and TRSM asynchronous, making the events not null 2018-08-13 22:58:44 +02:00
Cedric Nugteren 2b76bfee97 Fixed a wrong event issue causing error -57 2018-07-29 22:16:27 +02:00
Cedric Nugteren 429ff070f8 Fixed a bug: forgot to initialize the shared pointer for the null kernel 2018-07-27 20:53:24 +02:00
Cedric Nugteren f84036948b Renamed AMD SI workaround defines 2018-07-27 20:38:01 +02:00
Cedric Nugteren e8dea34fce Added workaround for weird AMD SI Hainan bug 2018-07-25 22:59:36 +02:00
Tyler Sorensen 7709a7308b Applied feedback from Cedric from first pull request 2018-07-14 19:50:47 -04:00
Tyler Sorensen 7f2e98a140 added inline ptx to support shuffle on Nvidia GPUs 2018-07-11 15:12:22 -04:00
Cedric Nugteren e3eedacbcc Disabled calls to clReleaseProgram under Windows to avoid segfaults when the OpenCL driver unloads first 2018-06-28 20:35:18 +09:00
Cedric Nugteren 8258321a74 Now stores a shared_ptr to the Program class in the cache 2018-05-01 20:34:48 +02:00
Cedric Nugteren 7b416c8686 Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program 2018-04-26 21:10:17 +02:00
Cedric Nugteren ad1227c4f2 Added optional temp-buffer argument to C++ interface of GEMM 2017-12-30 18:45:06 +01:00
Cedric Nugteren 2b020d59f9 Added defines to disable OpenCL deprecation warnings 2017-12-23 15:32:22 +01:00
Cedric Nugteren ca5dbcd2bd Made the pre-processor run by default for ARM and Qualcomm GPUs 2017-12-09 15:16:53 +01:00
Cedric Nugteren 0f080bbc6e Potentially fixed an MSVC 2013 issue with a copy-constructor not being generated 2017-11-20 20:54:18 +01:00
Cedric Nugteren a3a8b44f59 Some fixed for the new auto-tuner to be compatible with the Python scripts 2017-11-19 16:31:08 +01:00
Cedric Nugteren 319762f150 Added Android support using the GNU C++ STL library and the GCC toolchain 2017-10-29 12:07:07 +01:00
Cedric Nugteren 12b08ae491 Merge branch 'master' into android_support 2017-10-28 17:32:37 +02:00
Cedric Nugteren b1270f04b8 Made buffers of batched routines read/write (was: read-only) 2017-10-17 19:56:47 +02:00
Cedric Nugteren 3598762029 Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs 2017-10-08 10:29:47 +02:00
Cedric Nugteren 6d3e1212f0 Synchronizes clpp11.h with CLCudaAPI 9.0 2017-10-07 18:43:29 +02:00
Cedric Nugteren 21af690472 Added missing headers 2017-09-26 21:17:55 +02:00
Cedric Nugteren 890281f3e8 Made database-caching no longer dependent on device name but on device/platform IDs 2017-09-23 17:50:44 +02:00
Cedric Nugteren 163474e171 Fixed an issue with the NVIDIA compute capability not being retrieved properly 2017-09-16 18:25:23 +02:00
Cedric Nugteren c21878ecce Added a guard against missing AMD and NVIDIA extensions 2017-09-14 21:58:08 +02:00
Cedric Nugteren 76382ff6c1 Added the new vendor-architecture-name hierarchy to the tuners as well 2017-09-10 16:34:54 +02:00
Cedric Nugteren 91ea7fcde2 Introduced the notion of a device-architecture for the database and added device and architecture name mappings 2017-09-08 21:09:05 +02:00
Cedric Nugteren fb6c78ea07 Added a special override database for the Apple CPU implementation on OS X: this makes the test work, it does not focus on good performance 2017-04-07 07:37:30 +02:00
Cedric Nugteren fa0a9c689f Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes 2017-03-08 20:10:20 +01:00
Ivan Shapovalov 5bcd92f297 Routine, Cache: generalize, reduce amount of copying in fast path
Implement a generalized Cache<K, V>. Two variants are provided: the
first one is based on std::map, using C++14-specific transparent
std::less<> and generalized std::map::find() to allow searching by tuple
of references. The second one is based on std::vector and O(n) lookup,
but remains C++11-compliant.
2017-01-24 11:56:15 +03:00
Ivan Shapovalov a9914ee3a8 src/clpp11.hpp: check pointers before clRelease*()
This is to avoid spurious "induced" errors on destruction, if construction
failed for some reason.
2017-01-24 02:42:59 +03:00
Ivan Shapovalov 8e1c084c93 src/clpp11.hpp: do not store program source/binary in Program
The stored source/binary does not seem to serve any purpose, yet its
presence makes Program a heavy (not pure refcounted) object, which is
undesired esp. because it is copied from the cache in the hot path.
2017-01-24 02:42:59 +03:00
Cedric Nugteren 90eb8738c4 Forced OpenCL 1.1 compilation and disabled a deprecation warning 2016-11-20 16:27:02 +01:00
Ivan Shapovalov b98af44fcf treewide: use C++ exceptions properly
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.

Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.

However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
Ivan Shapovalov 5d03d48f7a src/clpp11.hpp: avoid throwing exceptions from std::shared_ptr's Deleter 2016-10-22 07:25:16 +03:00
Ivan Shapovalov 6ac7edd2da src/clpp11.hpp: GetInfoString: avoid reallocation 2016-10-22 07:25:16 +03:00
Ivan Shapovalov 106565fa9a src/clpp11.hpp: reinstate error checking on clGetEventProfilingInfo() 2016-10-22 07:25:15 +03:00
Cedric Nugteren db5772e521 Updated to version 8.0 of the CLCudaAPI header 2016-09-27 20:56:49 +02:00
Ivan Shapovalov ae3299da30 clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS 2016-07-22 11:15:52 +03:00
Ivan Shapovalov 5502c5eec4 cl::Kernel: skip NULL entries in waitForEvents 2016-07-22 11:15:52 +03:00
Ivan Shapovalov 2dd5ee3f75 clblast::RunKernel, cl::Kernel: take const vector as waitForEvents 2016-07-22 11:15:52 +03:00
Cedric Nugteren b33bec4a59 Fixed some more types and type conversions in the clpp11 interface to OpenCL 2016-07-16 11:13:23 +02:00
Gian-Carlo Pascutto e0ba59c0ac Make sure the passed types are large enough.
Make sure all out parameters that are passed to functions such
as clGetDeviceInfo are large enough to contain the replies.
2016-07-13 15:59:02 +02:00
Cedric Nugteren 27854070b4 Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen 2016-07-06 21:50:12 +02:00