Cedric Nugteren
e47d95887c
Added PrepareData function for TRSM to create proper test input
2017-02-25 12:23:04 +01:00
Cedric Nugteren
2f2a510c38
Implemented a simple row-major to col-major problem conversion for TRSM
2017-02-24 21:08:44 +01:00
Cedric Nugteren
1e5b5157bc
Fixed a few issues with the TRSM routine; some tests still failing
2017-02-22 20:31:33 +01:00
Cedric Nugteren
133ebfc834
Added data-preparation function for the TRSV tests and special nan/inf checks in the error checking to make the tests pass
2017-02-19 17:43:26 +01:00
Cedric Nugteren
0643a29af5
Added tuning parameters for the AMD RX480 GPU (Ellesmere)
2017-02-18 13:59:10 +01:00
Cedric Nugteren
0ea30263ac
Merge pull request #137 from CNugteren/custom_parameters
API to override tuning parameters
2017-02-18 12:34:38 +01:00
Cedric Nugteren
7b2170818f
Changed the override-parameters test such that it is compatible with more devices
2017-02-18 11:22:07 +01:00
Cedric Nugteren
2e0951c6dc
Fixed small typo in the documentation
2017-02-18 11:05:54 +01:00
Cedric Nugteren
fef11a208c
Added documentation for the OverrideParameters function
2017-02-18 11:02:57 +01:00
Cedric Nugteren
d6538dfc25
Fixed the naming of the C API of OverrideParameters and fixed the description
2017-02-18 10:59:38 +01:00
Cedric Nugteren
3d10690c83
Added missing documentation for the fill and clear cache functions
2017-02-18 10:32:32 +01:00
Cedric Nugteren
cda449a5c3
Added a C interface to the OverrideParameters function; added some in-line comments to the API
2017-02-16 21:14:48 +01:00
Cedric Nugteren
08bfb75a9d
Added input-sanity checks for the OverrideParameters function
2017-02-16 21:12:50 +01:00
Cedric Nugteren
bdc57221bd
Added simple tests for the OverrideParameters function
2017-02-14 21:09:00 +01:00
Cedric Nugteren
cdb3bb7166
Added first version of the OverrideParameters function
2017-02-13 20:53:06 +01:00
Cedric Nugteren
00eb55a2d4
Fixed a small bug in GEMV: unused kernel in parameter list
2017-02-13 20:48:32 +01:00
Cedric Nugteren
345a5feb9a
Split the database into several smaller cached per-kernel databases (in preparation of per-kernel database overrides)
2017-02-12 12:02:39 +01:00
Cedric Nugteren
faa842b927
Made RemoveBySubset from the cache work with references to keys
2017-02-12 11:58:20 +01:00
Cedric Nugteren
36b942a698
Added an option to remove items from the caches, optionally by a subset of 2 specific key-values only
2017-02-11 14:05:38 +01:00
Cedric Nugteren
dc93523204
Added tuning results for Titan X (Pascal version)
2017-02-08 21:14:38 +01:00
Cedric Nugteren
c248f900c0
Merge branch 'development' into triangular_solvers
2017-02-05 22:18:59 +01:00
Cedric Nugteren
e7cbb5915a
Fixed complex version of the TRSV kernel
2017-02-05 14:36:31 +01:00
Cedric Nugteren
c209dd7af9
Improved substitution kernels a bit; added complex support
2017-02-04 22:48:06 +01:00
Cedric Nugteren
fec8c1a806
Completed a first STRSV implementation
2017-02-04 16:04:19 +01:00
Cedric Nugteren
a6ba6470aa
Added row-major support for TRSV
2017-02-04 14:25:27 +01:00
Cedric Nugteren
7c73ceb095
Added first (incomplete) version of TRSV routine
2017-01-29 17:02:00 +01:00
Cedric Nugteren
fd471e380c
Updated the changelog for PR131 and PR132
2017-01-24 20:34:09 +01:00
Cedric Nugteren
5e7d140d59
Merge pull request #132 from intelfx/cache
Refactor cache subsystem
2017-01-24 20:16:57 +01:00
Ivan Shapovalov
5fb1da1a0f
Database: pass Device instead of Queue for clarity
2017-01-24 12:18:14 +03:00
Ivan Shapovalov
50e758a007
Routine: cache the database instance as well
This does not change much, but will become useful in the next commits when
plugin support is introduced.
2017-01-24 11:56:15 +03:00
Ivan Shapovalov
6dc18c1c57
Database: ref-count the internal map for caching
2017-01-24 11:56:15 +03:00
Ivan Shapovalov
5bcd92f297
Routine, Cache: generalize, reduce amount of copying in fast path
Implement a generalized Cache<K, V>. Two variants are provided: the
first one is based on std::map, using C++14-specific transparent
std::less<> and generalized std::map::find() to allow searching by tuple
of references. The second one is based on std::vector and O(n) lookup,
but remains C++11-compliant.
2017-01-24 11:56:15 +03:00
Cedric Nugteren
e943fe77d6
Merge pull request #131 from intelfx/misc
Assorted minor fixes
2017-01-24 09:10:35 +01:00
Ivan Shapovalov
46a59eb882
.travis.yml: do not build for osx twice, there's no gcc there
2017-01-24 02:55:09 +03:00
Ivan Shapovalov
064ba4abd4
treewide: silence type mismatch warnings in *printf()
2017-01-24 02:55:09 +03:00
Ivan Shapovalov
519ccbd273
Tester: always fail on OpenCL and CLBlast internal errors
These errors are self-evident and enough to fail the test even if there is
no clBLAS reference to compare error codes with.
2017-01-24 02:55:09 +03:00
Ivan Shapovalov
1b8e816333
FillCache: perform compilation for each precision separately
This way, filling the cache for float is not prevented if the device does
not support e.g. double.
2017-01-24 02:43:00 +03:00
Ivan Shapovalov
6ad11665a1
Routine: fix semi-warm routine construction (when binary is in cache)
There was a missing return statement in the semi-warm path that made
CLBlast continue to the cold path after a cache hit.
2017-01-24 02:43:00 +03:00
Ivan Shapovalov
a9914ee3a8
src/clpp11.hpp: check pointers before clRelease*()
This is to avoid spurious "induced" errors on destruction, if construction
failed for some reason.
2017-01-24 02:42:59 +03:00
Ivan Shapovalov
8e1c084c93
src/clpp11.hpp: do not store program source/binary in Program
The stored source/binary does not seem to serve any purpose, yet its
presence makes Program a heavy (not purely refcounted) object, which is
undesirable, especially because it is copied from the cache in the hot path.
2017-01-24 02:42:59 +03:00
Ivan Shapovalov
ee4124dcbc
samples: add CL_USE_DEPRECATED_OPENCL_1_*_APIS where needed
2017-01-24 02:42:59 +03:00
Ivan Shapovalov
1a1e863ab3
treewide: include clpp11.hpp first to silence deprecation warnings
Otherwise, cl.h gets included through clblast.h before clpp11.hpp.
2017-01-20 17:32:42 +03:00
Ivan Shapovalov
43c7707173
Routine: use PrecisionSupported<>() instead of duplicating the check
2017-01-20 17:20:45 +03:00
Cedric Nugteren
a5fd2323b6
Added prototype for the TRSV routine
2017-01-20 11:30:32 +01:00
Cedric Nugteren
a2c0a9c551
Set number of decimals for floating-point printing for error reporting
2017-01-20 11:13:44 +01:00
Cedric Nugteren
2e4f6e1609
Added tuning results for NVIDIA GTX 1080 and Intel Core i7-4790K
2017-01-19 19:42:31 +01:00
Cedric Nugteren
df9a77d74d
Added first version of the TRSM routine based on the diagonal invert kernel
2017-01-18 21:29:59 +01:00
Cedric Nugteren
4b3ffd9989
Added a first version of the diagonal block invert routine in preparation of TRSM
2017-01-15 17:30:00 +01:00
Cedric Nugteren
4a4be0c3a5
Prints additional information in verbose/debug mode
2017-01-15 17:17:40 +01:00
Cedric Nugteren
ff2bf985a3
Updated the link to cl.hpp in the Khronos registry for the samples
2017-01-07 13:57:23 +01:00