Cedric Nugteren
3baf823575
Fixes an issue under Android when the driver was already unloaded ( #462 )
2023-05-10 17:10:17 +02:00
Angus, Alexander
4f394608a2
implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731
2023-01-03 10:56:04 -08:00
Pradeep Garigipati
dff65e9217
Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG
2020-06-07 21:13:33 +05:30
Pradeep Garigipati
aec71699f8
Fix Program::GetIR to handle programs with multiple devices
2020-06-05 12:00:45 +05:30
Cedric Nugteren
e3ce88154a
Silenced a new OpenCL warning message
2020-03-08 10:14:59 +01:00
Cedric Nugteren
af6a9eedd1
Added a function to set the OpenCL kernel standard, either 1.1 or 1.2
2019-05-11 20:39:00 +02:00
Umar Arshad
cf4907942c
Remove assert for extension not available on macOS
The cl_nv_device_attribute_query extension is not available on the
Apple platform. This caused failures during debug builds at runtime.
2019-05-03 23:28:07 -04:00
Cedric Nugteren
bf43dbb4ee
Made last operation in TRSV and TRSM asynchronous, making the events not null
2018-08-13 22:58:44 +02:00
Cedric Nugteren
2b76bfee97
Fixed a wrong event issue causing error -57
2018-07-29 22:16:27 +02:00
Cedric Nugteren
429ff070f8
Fixed a bug: forgot to initialize the shared pointer for the null kernel
2018-07-27 20:53:24 +02:00
Cedric Nugteren
f84036948b
Renamed AMD SI workaround defines
2018-07-27 20:38:01 +02:00
Cedric Nugteren
e8dea34fce
Added workaround for weird AMD SI Hainan bug
2018-07-25 22:59:36 +02:00
Tyler Sorensen
7709a7308b
Applied feedback from Cedric from first pull request
2018-07-14 19:50:47 -04:00
Tyler Sorensen
7f2e98a140
added inline ptx to support shuffle on Nvidia GPUs
2018-07-11 15:12:22 -04:00
Cedric Nugteren
e3eedacbcc
Disabled calls to clReleaseProgram under Windows to avoid segfaults when the OpenCL driver unloads first
2018-06-28 20:35:18 +09:00
Cedric Nugteren
8258321a74
Now stores a shared_ptr to the Program class in the cache
2018-05-01 20:34:48 +02:00
Cedric Nugteren
7b416c8686
Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program
2018-04-26 21:10:17 +02:00
Cedric Nugteren
ad1227c4f2
Added optional temp-buffer argument to C++ interface of GEMM
2017-12-30 18:45:06 +01:00
Cedric Nugteren
2b020d59f9
Added defines to disable OpenCL deprecation warnings
2017-12-23 15:32:22 +01:00
Cedric Nugteren
ca5dbcd2bd
Made the pre-processor run by default for ARM and Qualcomm GPUs
2017-12-09 15:16:53 +01:00
Cedric Nugteren
0f080bbc6e
Potentially fixed an MSVC 2013 issue with a copy-constructor not being generated
2017-11-20 20:54:18 +01:00
Cedric Nugteren
a3a8b44f59
Some fixes for the new auto-tuner to be compatible with the Python scripts
2017-11-19 16:31:08 +01:00
Cedric Nugteren
319762f150
Added Android support using the GNU C++ STL library and the GCC toolchain
2017-10-29 12:07:07 +01:00
Cedric Nugteren
12b08ae491
Merge branch 'master' into android_support
2017-10-28 17:32:37 +02:00
Cedric Nugteren
b1270f04b8
Made buffers of batched routines read/write (was: read-only)
2017-10-17 19:56:47 +02:00
Cedric Nugteren
3598762029
Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs
2017-10-08 10:29:47 +02:00
Cedric Nugteren
6d3e1212f0
Synchronizes clpp11.h with CLCudaAPI 9.0
2017-10-07 18:43:29 +02:00
Cedric Nugteren
21af690472
Added missing headers
2017-09-26 21:17:55 +02:00
Cedric Nugteren
890281f3e8
Made database-caching no longer dependent on device name but on device/platform IDs
2017-09-23 17:50:44 +02:00
Cedric Nugteren
163474e171
Fixed an issue with the NVIDIA compute capability not being retrieved properly
2017-09-16 18:25:23 +02:00
Cedric Nugteren
c21878ecce
Added a guard against missing AMD and NVIDIA extensions
2017-09-14 21:58:08 +02:00
Cedric Nugteren
76382ff6c1
Added the new vendor-architecture-name hierarchy to the tuners as well
2017-09-10 16:34:54 +02:00
Cedric Nugteren
91ea7fcde2
Introduced the notion of a device-architecture for the database and added device and architecture name mappings
2017-09-08 21:09:05 +02:00
Cedric Nugteren
fb6c78ea07
Added a special override database for the Apple CPU implementation on OS X: this makes the test work, it does not focus on good performance
2017-04-07 07:37:30 +02:00
Cedric Nugteren
fa0a9c689f
Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes
2017-03-08 20:10:20 +01:00
Ivan Shapovalov
5bcd92f297
Routine, Cache: generalize, reduce amount of copying in fast path
Implement a generalized Cache<K, V>. Two variants are provided: the
first one is based on std::map, using C++14-specific transparent
std::less<> and generalized std::map::find() to allow searching by tuple
of references. The second one is based on std::vector and O(n) lookup,
but remains C++11-compliant.
2017-01-24 11:56:15 +03:00
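The two cache variants described in the commit above can be sketched roughly as follows. This is only an illustration of the technique, not the code from the commit: the names MapCache, VectorCache, Key and KeyRef, and the choice of an (int, std::string) key with an int value, are assumptions made for the example. The first class relies on the C++14 transparent comparator std::less<> so that find() accepts a tuple of references without copying the string; the second uses a linear scan and needs only C++11.

```cpp
#include <algorithm>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical key types: an owning key and a non-owning "view" of a key.
using Key = std::tuple<int, std::string>;
using KeyRef = std::tuple<const int&, const std::string&>;

// Variant 1 (C++14): std::map with the transparent comparator std::less<>,
// which enables the templated find() overload for heterogeneous lookup.
class MapCache {
 public:
  void Store(int device, const std::string& routine, int value) {
    entries_.emplace(Key(device, routine), value);
  }
  // Searches by a tuple of references, so no std::string is copied.
  bool Find(int device, const std::string& routine, int& out) const {
    const auto it = entries_.find(KeyRef(device, routine));
    if (it == entries_.end()) { return false; }
    out = it->second;
    return true;
  }
 private:
  std::map<Key, int, std::less<>> entries_;
};

// Variant 2 (C++11): std::vector with an O(n) linear search.
class VectorCache {
 public:
  void Store(int device, const std::string& routine, int value) {
    entries_.emplace_back(Key(device, routine), value);
  }
  bool Find(int device, const std::string& routine, int& out) const {
    const KeyRef wanted(device, routine);
    const auto it = std::find_if(
        entries_.begin(), entries_.end(),
        [&wanted](const std::pair<Key, int>& entry) {
          return entry.first == wanted;  // heterogeneous tuple comparison
        });
    if (it == entries_.end()) { return false; }
    out = it->second;
    return true;
  }
 private:
  std::vector<std::pair<Key, int>> entries_;
};
```

Both sketches expose the same Store/Find interface, so a caller could switch between them depending on the available language standard.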
Ivan Shapovalov
a9914ee3a8
src/clpp11.hpp: check pointers before clRelease*()
This is to avoid spurious "induced" errors on destruction, if construction
failed for some reason.
2017-01-24 02:42:59 +03:00
Ivan Shapovalov
8e1c084c93
src/clpp11.hpp: do not store program source/binary in Program
The stored source/binary does not seem to serve any purpose, yet its
presence makes Program a heavy (not pure refcounted) object, which is
undesired esp. because it is copied from the cache in the hot path.
2017-01-24 02:42:59 +03:00
Cedric Nugteren
90eb8738c4
Forced OpenCL 1.1 compilation and disabled a deprecation warning
2016-11-20 16:27:02 +01:00
Ivan Shapovalov
b98af44fcf
treewide: use C++ exceptions properly
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.
Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.
However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
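The layering described in the commit above (exceptions internally, error codes at the API boundary, catch-all only in the C interface) might look roughly like the sketch below. All names here (StatusCode, StatusError, CallCpp, CallC) and the numeric code values are hypothetical and chosen for illustration; they are not taken from the library.

```cpp
#include <exception>
#include <new>

// Hypothetical status codes; the real library defines its own set.
enum class StatusCode {
  kSuccess = 0,
  kInvalidArgument = -1,
  kUnexpectedError = -2  // wild-card code used by the C boundary
};

// Hypothetical library exception carrying a status code.
class StatusError : public std::exception {
 public:
  explicit StatusError(StatusCode code) : code_(code) {}
  StatusCode code() const noexcept { return code_; }
  const char* what() const noexcept override { return "library error"; }
 private:
  StatusCode code_;
};

// C++ boundary: only the library's own exceptions become error codes.
// Anything else (e.g. std::bad_alloc) deliberately propagates to the caller.
template <typename F>
StatusCode CallCpp(F&& body) {
  try {
    body();
    return StatusCode::kSuccess;
  } catch (const StatusError& e) {
    return e.code();
  }
}

// C boundary: no exception may cross into C code, so everything that the
// C++ boundary let through is mapped to the wild-card error code.
template <typename F>
int CallC(F&& body) {
  try {
    return static_cast<int>(CallCpp(body));
  } catch (...) {
    return static_cast<int>(StatusCode::kUnexpectedError);
  }
}
```

With this shape, internal code can throw freely and rely on RAII for cleanup, while callers of either interface still see plain status codes.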
Ivan Shapovalov
5d03d48f7a
src/clpp11.hpp: avoid throwing exceptions from std::shared_ptr's Deleter
2016-10-22 07:25:16 +03:00
Ivan Shapovalov
6ac7edd2da
src/clpp11.hpp: GetInfoString: avoid reallocation
2016-10-22 07:25:16 +03:00
Ivan Shapovalov
106565fa9a
src/clpp11.hpp: reinstate error checking on clGetEventProfilingInfo()
2016-10-22 07:25:15 +03:00
Cedric Nugteren
db5772e521
Updated to version 8.0 of the CLCudaAPI header
2016-09-27 20:56:49 +02:00
Ivan Shapovalov
ae3299da30
clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS
2016-07-22 11:15:52 +03:00
Ivan Shapovalov
5502c5eec4
cl::Kernel: skip NULL entries in waitForEvents
2016-07-22 11:15:52 +03:00
Ivan Shapovalov
2dd5ee3f75
clblast::RunKernel, cl::Kernel: take const vector as waitForEvents
2016-07-22 11:15:52 +03:00
Cedric Nugteren
b33bec4a59
Fixed some more types and type conversions in the clpp11 interface to OpenCL
2016-07-16 11:13:23 +02:00
Gian-Carlo Pascutto
e0ba59c0ac
Make sure the passed types are large enough.
Make sure all out parameters that are passed to functions such
as clGetDeviceInfo are large enough to contain the replies.
2016-07-13 15:59:02 +02:00
Cedric Nugteren
27854070b4
Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen
2016-07-06 21:50:12 +02:00