Commit graph

839 commits

Author SHA1 Message Date
Cedric Nugteren 70d8c4bad7 Improved the correctness tests for complex numbers in case either real or imag is much larger than the other 2017-02-26 10:19:53 +01:00
Cedric Nugteren a433987441 Fixes division in the kernel for inversion of complex numbers 2017-02-26 10:18:45 +01:00
Cedric Nugteren ccac957f17 Added documentation for the TRSV and TRSM routines 2017-02-25 13:02:15 +01:00
Cedric Nugteren 492ee3d0a5 Removed the invert routine from the tests 2017-02-25 12:28:13 +01:00
Cedric Nugteren e47d95887c Added PrepareData function for TRSM to create proper test input 2017-02-25 12:23:04 +01:00
Cedric Nugteren 2f2a510c38 Implemented a simple row-major to col-major problem conversion for TRSM 2017-02-24 21:08:44 +01:00
Cedric Nugteren 1e5b5157bc Fixed a few issues with the TRSM routine; some tests still failing 2017-02-22 20:31:33 +01:00
Cedric Nugteren 133ebfc834 Added data-preparation function for the TRSV tests and special nan/inf checks in the error checking to make the tests pass 2017-02-19 17:43:26 +01:00
Cedric Nugteren 0643a29af5 Added tuning parameters for the AMD RX480 GPU (Ellesmere) 2017-02-18 13:59:10 +01:00
Cedric Nugteren 0ea30263ac Merge pull request #137 from CNugteren/custom_parameters
API to override tuning parameters
2017-02-18 12:34:38 +01:00
Cedric Nugteren 7b2170818f Changed the override-parameters test such that it is compatible with more devices 2017-02-18 11:22:07 +01:00
Cedric Nugteren 2e0951c6dc Fixed small typo in the documentation 2017-02-18 11:05:54 +01:00
Cedric Nugteren fef11a208c Added documentation for the OverrideParameters function 2017-02-18 11:02:57 +01:00
Cedric Nugteren d6538dfc25 Fixed the naming of the C API of OverrideParameters and fixed the description 2017-02-18 10:59:38 +01:00
Cedric Nugteren 3d10690c83 Added missing documentation for the fill and clear cache functions 2017-02-18 10:32:32 +01:00
Cedric Nugteren cda449a5c3 Added a C interface to the OverrideParameters function; added some in-line comments to the API 2017-02-16 21:14:48 +01:00
Cedric Nugteren 08bfb75a9d Added input-sanity checks for the OverrideParameters function 2017-02-16 21:12:50 +01:00
Cedric Nugteren bdc57221bd Added simple tests for the OverrideParameters function 2017-02-14 21:09:00 +01:00
Cedric Nugteren cdb3bb7166 Added first version of the OverrideParameters function 2017-02-13 20:53:06 +01:00
Cedric Nugteren 00eb55a2d4 Fixed a small bug in GEMV: unused kernel in parameter list 2017-02-13 20:48:32 +01:00
Cedric Nugteren 345a5feb9a Split the database into several smaller cached per-kernel databases (in preparation of per-kernel database overrides) 2017-02-12 12:02:39 +01:00
Cedric Nugteren faa842b927 Made RemoveBySubset from the cache work with references to keys 2017-02-12 11:58:20 +01:00
Cedric Nugteren 36b942a698 Added an option to remove items from the caches, optionally by a subset of 2 specific key-values only 2017-02-11 14:05:38 +01:00
Cedric Nugteren dc93523204 Added tuning results for Titan X (Pascal version) 2017-02-08 21:14:38 +01:00
Cedric Nugteren c248f900c0 Merge branch 'development' into triangular_solvers 2017-02-05 22:18:59 +01:00
Cedric Nugteren e7cbb5915a Fixed complex version of the TRSV kernel 2017-02-05 14:36:31 +01:00
Cedric Nugteren c209dd7af9 Improved substition kernels a bit; added complex support 2017-02-04 22:48:06 +01:00
Cedric Nugteren fec8c1a806 Completed a first STRSV implementation 2017-02-04 16:04:19 +01:00
Cedric Nugteren a6ba6470aa Added row-major support for TRSV 2017-02-04 14:25:27 +01:00
Cedric Nugteren 7c73ceb095 Added first (incomplete) version of TRSV routine 2017-01-29 17:02:00 +01:00
Cedric Nugteren fd471e380c Updated the changelog for PR131 and PR132 2017-01-24 20:34:09 +01:00
Cedric Nugteren 5e7d140d59 Merge pull request #132 from intelfx/cache
Refactor cache subsystem
2017-01-24 20:16:57 +01:00
Ivan Shapovalov 5fb1da1a0f Database: pass Device instead of Queue for clarity 2017-01-24 12:18:14 +03:00
Ivan Shapovalov 50e758a007 Routine: cache the database instance as well
This does not change much, but will become useful in next commits when
plugin support is introduced.
2017-01-24 11:56:15 +03:00
Ivan Shapovalov 6dc18c1c57 Database: ref-count the internal map for caching 2017-01-24 11:56:15 +03:00
Ivan Shapovalov 5bcd92f297 Routine, Cache: generalize, reduce amount of copying in fast path
Implement a generalized Cache<K, V>. Two variants are provided: the
first one is based on std::map, using C++14-specific transparent
std::less<> and generalized std::map::find() to allow searching by tuple
of references. The second one is based on std::vector and O(n) lookup,
but remains C++11-compliant.
2017-01-24 11:56:15 +03:00
Cedric Nugteren e943fe77d6 Merge pull request #131 from intelfx/misc
Assorted minor fixes
2017-01-24 09:10:35 +01:00
Ivan Shapovalov 46a59eb882 .travis.yml: do not build for osx twice, there's no gcc there 2017-01-24 02:55:09 +03:00
Ivan Shapovalov 064ba4abd4 treewide: silence type mismatch warnings in *printf() 2017-01-24 02:55:09 +03:00
Ivan Shapovalov 519ccbd273 Tester: always fail on OpenCL and CLBlast internal errors
These errors are self-evident and enough to fail the test even if there is
no clBLAS reference to compare error codes with.
2017-01-24 02:55:09 +03:00
Ivan Shapovalov 1b8e816333 FillCache: perform compilation for each precision separately
Thus do not prevent filling cache for float if the device does not support
e. g. double.
2017-01-24 02:43:00 +03:00
Ivan Shapovalov 6ad11665a1 Routine: fix semi-warm routine construction (when binary is in cache)
There was a missing return statement in the semi-warm path that made
CLBlast to continue to cold path after a cache hit.
2017-01-24 02:43:00 +03:00
Ivan Shapovalov a9914ee3a8 src/clpp11.hpp: check pointers before clRelease*()
This is to avoid spurious "induced" errors on destruction, if construction
failed for some reason.
2017-01-24 02:42:59 +03:00
Ivan Shapovalov 8e1c084c93 src/clpp11.hpp: do not store program source/binary in Program
The stored source/binary does not seem to serve any purpose, yet its
presence makes Program a heavy (not pure refcounted) object, which is
undesired esp. because it is copied from the cache in the hot path.
2017-01-24 02:42:59 +03:00
Ivan Shapovalov ee4124dcbc samples: add CL_USE_DEPRECATED_OPENCL_1_*_APIS where needed 2017-01-24 02:42:59 +03:00
Ivan Shapovalov 1a1e863ab3 treewide: include clpp11.hpp first to silence deprecation warnings
Otherwise, cl.h gets included through clblast.h before clpp11.hpp.
2017-01-20 17:32:42 +03:00
Ivan Shapovalov 43c7707173 Routine: use PrecisionSupported<>() instead of duplicating the check 2017-01-20 17:20:45 +03:00
Cedric Nugteren a5fd2323b6 Added prototype for the TRSV routine 2017-01-20 11:30:32 +01:00
Cedric Nugteren a2c0a9c551 Set number of decimals for floating-point printing for error reporting 2017-01-20 11:13:44 +01:00
Cedric Nugteren 2e4f6e1609 Added tuning results for NVIDIA GTX 1080 and Intel Core i7-4790K 2017-01-19 19:42:31 +01:00