Commit graph

741 commits

Author SHA1 Message Date
Cedric Nugteren df7638c305 Fixed an out-of-bounds memory access when filling a matrix with a constant 2017-02-26 14:31:05 +01:00
Cedric Nugteren b7310036ed Removed half-precision support from the TRSM routine; too unstable 2017-02-26 12:56:21 +01:00
Cedric Nugteren 70d8c4bad7 Improved the correctness tests for complex numbers in case either real or imag is much larger than the other 2017-02-26 10:19:53 +01:00
Cedric Nugteren a433987441 Fixes division in the kernel for inversion of complex numbers 2017-02-26 10:18:45 +01:00
Cedric Nugteren ccac957f17 Added documentation for the TRSV and TRSM routines 2017-02-25 13:02:15 +01:00
Cedric Nugteren 492ee3d0a5 Removed the invert routine from the tests 2017-02-25 12:28:13 +01:00
Cedric Nugteren e47d95887c Added PrepareData function for TRSM to create proper test input 2017-02-25 12:23:04 +01:00
Cedric Nugteren 2f2a510c38 Implemented a simple row-major to col-major problem conversion for TRSM 2017-02-24 21:08:44 +01:00
Cedric Nugteren 1e5b5157bc Fixed a few issues with the TRSM routine; some tests still failing 2017-02-22 20:31:33 +01:00
Cedric Nugteren 133ebfc834 Added data-preparation function for the TRSV tests and special nan/inf checks in the error checking to make the tests pass 2017-02-19 17:43:26 +01:00
Cedric Nugteren 0643a29af5 Added tuning parameters for the AMD RX480 GPU (Ellesmere) 2017-02-18 13:59:10 +01:00
Cedric Nugteren 0ea30263ac Merge pull request #137 from CNugteren/custom_parameters
API to override tuning parameters
2017-02-18 12:34:38 +01:00
Cedric Nugteren 7b2170818f Changed the override-parameters test such that it is compatible with more devices 2017-02-18 11:22:07 +01:00
Cedric Nugteren 2e0951c6dc Fixed small typo in the documentation 2017-02-18 11:05:54 +01:00
Cedric Nugteren fef11a208c Added documentation for the OverrideParameters function 2017-02-18 11:02:57 +01:00
Cedric Nugteren d6538dfc25 Fixed the naming of the C API of OverrideParameters and fixed the description 2017-02-18 10:59:38 +01:00
Cedric Nugteren 3d10690c83 Added missing documentation for the fill and clear cache functions 2017-02-18 10:32:32 +01:00
Cedric Nugteren cda449a5c3 Added a C interface to the OverrideParameters function; added some in-line comments to the API 2017-02-16 21:14:48 +01:00
Cedric Nugteren 08bfb75a9d Added input-sanity checks for the OverrideParameters function 2017-02-16 21:12:50 +01:00
Cedric Nugteren bdc57221bd Added simple tests for the OverrideParameters function 2017-02-14 21:09:00 +01:00
Cedric Nugteren cdb3bb7166 Added first version of the OverrideParameters function 2017-02-13 20:53:06 +01:00
Cedric Nugteren 00eb55a2d4 Fixed a small bug in GEMV: unused kernel in parameter list 2017-02-13 20:48:32 +01:00
Cedric Nugteren 345a5feb9a Split the database into several smaller cached per-kernel databases (in preparation of per-kernel database overrides) 2017-02-12 12:02:39 +01:00
Cedric Nugteren faa842b927 Made RemoveBySubset from the cache work with references to keys 2017-02-12 11:58:20 +01:00
Cedric Nugteren 36b942a698 Added an option to remove items from the caches, optionally by a subset of 2 specific key-values only 2017-02-11 14:05:38 +01:00
Cedric Nugteren dc93523204 Added tuning results for Titan X (Pascal version) 2017-02-08 21:14:38 +01:00
Cedric Nugteren c248f900c0 Merge branch 'development' into triangular_solvers 2017-02-05 22:18:59 +01:00
Cedric Nugteren e7cbb5915a Fixed complex version of the TRSV kernel 2017-02-05 14:36:31 +01:00
Cedric Nugteren c209dd7af9 Improved substitution kernels a bit; added complex support 2017-02-04 22:48:06 +01:00
Cedric Nugteren fec8c1a806 Completed a first STRSV implementation 2017-02-04 16:04:19 +01:00
Cedric Nugteren a6ba6470aa Added row-major support for TRSV 2017-02-04 14:25:27 +01:00
Cedric Nugteren 7c73ceb095 Added first (incomplete) version of TRSV routine 2017-01-29 17:02:00 +01:00
Cedric Nugteren fd471e380c Updated the changelog for PR131 and PR132 2017-01-24 20:34:09 +01:00
Cedric Nugteren 5e7d140d59 Merge pull request #132 from intelfx/cache
Refactor cache subsystem
2017-01-24 20:16:57 +01:00
Ivan Shapovalov 5fb1da1a0f Database: pass Device instead of Queue for clarity 2017-01-24 12:18:14 +03:00
Ivan Shapovalov 50e758a007 Routine: cache the database instance as well
This does not change much, but will become useful in next commits when
plugin support is introduced.
2017-01-24 11:56:15 +03:00
Ivan Shapovalov 6dc18c1c57 Database: ref-count the internal map for caching 2017-01-24 11:56:15 +03:00
Ivan Shapovalov 5bcd92f297 Routine, Cache: generalize, reduce amount of copying in fast path
Implement a generalized Cache<K, V>. Two variants are provided: the
first one is based on std::map, using C++14-specific transparent
std::less<> and generalized std::map::find() to allow searching by tuple
of references. The second one is based on std::vector and O(n) lookup,
but remains C++11-compliant.
2017-01-24 11:56:15 +03:00
Cedric Nugteren e943fe77d6 Merge pull request #131 from intelfx/misc
Assorted minor fixes
2017-01-24 09:10:35 +01:00
Ivan Shapovalov 46a59eb882 .travis.yml: do not build for osx twice, there's no gcc there 2017-01-24 02:55:09 +03:00
Ivan Shapovalov 064ba4abd4 treewide: silence type mismatch warnings in *printf() 2017-01-24 02:55:09 +03:00
Ivan Shapovalov 519ccbd273 Tester: always fail on OpenCL and CLBlast internal errors
These errors are self-evident and enough to fail the test even if there is
no clBLAS reference to compare error codes with.
2017-01-24 02:55:09 +03:00
Ivan Shapovalov 1b8e816333 FillCache: perform compilation for each precision separately
Thus do not prevent filling the cache for float if the device does not support
e.g. double.
2017-01-24 02:43:00 +03:00
Ivan Shapovalov 6ad11665a1 Routine: fix semi-warm routine construction (when binary is in cache)
There was a missing return statement in the semi-warm path, which made
CLBlast continue to the cold path after a cache hit.
2017-01-24 02:43:00 +03:00
Ivan Shapovalov a9914ee3a8 src/clpp11.hpp: check pointers before clRelease*()
This is to avoid spurious "induced" errors on destruction, if construction
failed for some reason.
2017-01-24 02:42:59 +03:00
Ivan Shapovalov 8e1c084c93 src/clpp11.hpp: do not store program source/binary in Program
The stored source/binary does not seem to serve any purpose, yet its
presence makes Program a heavy (not pure refcounted) object, which is
undesired esp. because it is copied from the cache in the hot path.
2017-01-24 02:42:59 +03:00
Ivan Shapovalov ee4124dcbc samples: add CL_USE_DEPRECATED_OPENCL_1_*_APIS where needed 2017-01-24 02:42:59 +03:00
Ivan Shapovalov 1a1e863ab3 treewide: include clpp11.hpp first to silence deprecation warnings
Otherwise, cl.h gets included through clblast.h before clpp11.hpp.
2017-01-20 17:32:42 +03:00
Ivan Shapovalov 43c7707173 Routine: use PrecisionSupported<>() instead of duplicating the check 2017-01-20 17:20:45 +03:00
Cedric Nugteren a5fd2323b6 Added prototype for the TRSV routine 2017-01-20 11:30:32 +01:00
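
Note on commit 5bcd92f297: its message describes a generalized Cache<K, V> whose first variant uses std::map with the C++14 transparent std::less<> so that lookups can be done with a tuple of references instead of a freshly copied key. The sketch below illustrates only that idea. It is not CLBlast's actual class: the member names Store and Get and the key layout (device name, precision) are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical sketch of a generalized cache (not the CLBlast implementation).
// Key is an owning type; lookups may use any type comparable with Key.
template <typename Key, typename Value>
class Cache {
 public:
  // Stores a value under an owning key. If the key already exists,
  // std::map::emplace keeps the existing entry.
  void Store(Key key, Value value) {
    map_.emplace(std::move(key), std::move(value));
  }

  // Looks up by a possibly non-owning key, e.g. a tuple of references.
  // Returns true and copies the value into 'out' on a hit.
  template <typename KeyRef>
  bool Get(const KeyRef& key, Value* out) const {
    const auto it = map_.find(key);  // heterogeneous find, needs C++14
    if (it == map_.end()) { return false; }
    *out = it->second;
    return true;
  }

 private:
  // std::less<> (that is, std::less<void>) is transparent, which enables
  // the templated std::map::find overload used in Get.
  std::map<Key, Value, std::less<>> map_;
};

// Assumed key layout for illustration: (device name, precision).
using ProgramKey = std::tuple<std::string, std::string>;
```

In use, the fast path builds the lookup key with std::tie, which yields a tuple of references, so no std::string is copied: given `Cache<ProgramKey, int> cache;` and two existing strings `device` and `precision`, the call is `cache.Get(std::tie(device, precision), &value)`. The second variant mentioned in the commit message (std::vector with O(n) lookup) trades lookup speed for C++11 compatibility and is not shown here.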