CLBlast

Commit Graph

Author	SHA1	Message	Date
Cedric Nugteren	3baf823575	Fixes an issue under Android when the driver was already unloaded (#462)	2023-05-10 17:10:17 +02:00
Angus, Alexander	4f394608a2	implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731	2023-01-03 10:56:04 -08:00
Pradeep Garigipati	dff65e9217	Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG	2020-06-07 21:13:33 +05:30
Pradeep Garigipati	aec71699f8	Fix Program::GetIR to handle programs with multiple devices	2020-06-05 12:00:45 +05:30
Cedric Nugteren	e3ce88154a	Silenced a new OpenCL warning message	2020-03-08 10:14:59 +01:00
Cedric Nugteren	af6a9eedd1	Added a function to set the OpenCL kernel standard, either 1.1 or 1.2	2019-05-11 20:39:00 +02:00
Umar Arshad	cf4907942c	Remove assert for extension not available in macOS. The cl_nv_device_attribute_query extension is not available on the Apple platform. This caused failures during debug builds at runtime.	2019-05-03 23:28:07 -04:00
Cedric Nugteren	bf43dbb4ee	Made last operation in TRSV and TRSM asynchronous, making the events not null	2018-08-13 22:58:44 +02:00
Cedric Nugteren	2b76bfee97	Fixed a wrong event issue causing error -57	2018-07-29 22:16:27 +02:00
Cedric Nugteren	429ff070f8	Fixed a bug: forgot to initialize the shared pointer for the null kernel	2018-07-27 20:53:24 +02:00
Cedric Nugteren	f84036948b	Renamed AMD SI workaround defines	2018-07-27 20:38:01 +02:00
Cedric Nugteren	e8dea34fce	Added workaround for weird AMD SI Hainan bug	2018-07-25 22:59:36 +02:00
Tyler Sorensen	7709a7308b	Applied feedback from Cedric from first pull request	2018-07-14 19:50:47 -04:00
Tyler Sorensen	7f2e98a140	added inline ptx to support shuffle on Nvidia GPUs	2018-07-11 15:12:22 -04:00
Cedric Nugteren	e3eedacbcc	Disabled calls to clReleaseProgram under Windows to avoid segfaults when the OpenCL driver unloads first	2018-06-28 20:35:18 +09:00
Cedric Nugteren	8258321a74	Now stores a shared_ptr to the Program class in the cache	2018-05-01 20:34:48 +02:00
Cedric Nugteren	7b416c8686	Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program	2018-04-26 21:10:17 +02:00
Cedric Nugteren	ad1227c4f2	Added optional temp-buffer argument to C++ interface of GEMM	2017-12-30 18:45:06 +01:00
Cedric Nugteren	2b020d59f9	Added defines to disable OpenCL deprecation warnings	2017-12-23 15:32:22 +01:00
Cedric Nugteren	ca5dbcd2bd	Made the pre-processor run by default for ARM and Qualcomm GPUs	2017-12-09 15:16:53 +01:00
Cedric Nugteren	0f080bbc6e	Potentially fixed an MSVC 2013 issue with a copy-constructor not being generated	2017-11-20 20:54:18 +01:00
Cedric Nugteren	a3a8b44f59	Some fixes for the new auto-tuner to be compatible with the Python scripts	2017-11-19 16:31:08 +01:00
Cedric Nugteren	319762f150	Added Android support using the GNU C++ STL library and the GCC toolchain	2017-10-29 12:07:07 +01:00
Cedric Nugteren	12b08ae491	Merge branch 'master' into android_support	2017-10-28 17:32:37 +02:00
Cedric Nugteren	b1270f04b8	Made buffers of batched routines read/write (was: read-only)	2017-10-17 19:56:47 +02:00
Cedric Nugteren	3598762029	Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs	2017-10-08 10:29:47 +02:00
Cedric Nugteren	6d3e1212f0	Synchronizes clpp11.h with CLCudaAPI 9.0	2017-10-07 18:43:29 +02:00
Cedric Nugteren	21af690472	Added missing headers	2017-09-26 21:17:55 +02:00
Cedric Nugteren	890281f3e8	Made database-caching no longer dependent on device name but on device/platform IDs	2017-09-23 17:50:44 +02:00
Cedric Nugteren	163474e171	Fixed an issue with the NVIDIA compute capability not being retrieved properly	2017-09-16 18:25:23 +02:00
Cedric Nugteren	c21878ecce	Added a guard against missing AMD and NVIDIA extensions	2017-09-14 21:58:08 +02:00
Cedric Nugteren	76382ff6c1	Added the new vendor-architecture-name hierarchy to the tuners as well	2017-09-10 16:34:54 +02:00
Cedric Nugteren	91ea7fcde2	Introduced the notion of a device-architecture for the database and added device and architecture name mappings	2017-09-08 21:09:05 +02:00
Cedric Nugteren	fb6c78ea07	Added a special override database for the Apple CPU implementation on OS X: this makes the test work, it does not focus on good performance	2017-04-07 07:37:30 +02:00
Cedric Nugteren	fa0a9c689f	Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes	2017-03-08 20:10:20 +01:00
Ivan Shapovalov	5bcd92f297	Routine, Cache: generalize, reduce amount of copying in fast path. Implement a generalized Cache<K, V>. Two variants are provided: the first one is based on std::map, using C++14-specific transparent std::less<> and generalized std::map::find() to allow searching by tuple of references. The second one is based on std::vector and O(n) lookup, but remains C++11-compliant.	2017-01-24 11:56:15 +03:00
Ivan Shapovalov	a9914ee3a8	src/clpp11.hpp: check pointers before clRelease*(). This is to avoid spurious "induced" errors on destruction, if construction failed for some reason.	2017-01-24 02:42:59 +03:00
Ivan Shapovalov	8e1c084c93	src/clpp11.hpp: do not store program source/binary in Program. The stored source/binary does not seem to serve any purpose, yet its presence makes Program a heavy (not pure refcounted) object, which is undesired esp. because it is copied from the cache in the hot path.	2017-01-24 02:42:59 +03:00
Cedric Nugteren	90eb8738c4	Forced OpenCL 1.1 compilation and disabled a deprecation warning	2016-11-20 16:27:02 +01:00
Ivan Shapovalov	b98af44fcf	treewide: use C++ exceptions properly. Since the codebase is designed around proper C++ idioms such as RAII, it makes sense to only use C++ exceptions internally instead of mixing exceptions and error codes. The exceptions are now caught at top level to preserve compatibility with the existing error code-based API. Note that we deliberately do not catch C++ runtime errors (such as `std::bad_alloc`) nor logic errors (aka failed assertions) because no actual handling can ever happen for such errors. However, in the C interface we do catch _all_ exceptions (...) and convert them into a wild-card error code.	2016-10-22 08:45:25 +03:00
Ivan Shapovalov	5d03d48f7a	src/clpp11.hpp: avoid throwing exceptions from std::shared_ptr's Deleter	2016-10-22 07:25:16 +03:00
Ivan Shapovalov	6ac7edd2da	src/clpp11.hpp: GetInfoString: avoid reallocation	2016-10-22 07:25:16 +03:00
Ivan Shapovalov	106565fa9a	src/clpp11.hpp: reinstate error checking on clGetEventProfilingInfo()	2016-10-22 07:25:15 +03:00
Cedric Nugteren	db5772e521	Updated to version 8.0 of the CLCudaAPI header	2016-09-27 20:56:49 +02:00
Ivan Shapovalov	ae3299da30	clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS	2016-07-22 11:15:52 +03:00
Ivan Shapovalov	5502c5eec4	cl::Kernel: skip NULL entries in waitForEvents	2016-07-22 11:15:52 +03:00
Ivan Shapovalov	2dd5ee3f75	clblast::RunKernel, cl::Kernel: take const vector as waitForEvents	2016-07-22 11:15:52 +03:00
Cedric Nugteren	b33bec4a59	Fixed some more types and type conversions in the clpp11 interface to OpenCL	2016-07-16 11:13:23 +02:00
Gian-Carlo Pascutto	e0ba59c0ac	Make sure the passed types are large enough. Make sure all out parameters that are passed to functions such as clGetDeviceInfo are large enough to contain the replies.	2016-07-13 15:59:02 +02:00
Cedric Nugteren	27854070b4	Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen	2016-07-06 21:50:12 +02:00

54 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)