Cedric Nugteren
00281dad26
Fixed half-precision bugs in HTBMV/HTPMV/HTRMV/HSYR2K/HTRMM related to incorrect constants
2017-02-27 21:00:04 +01:00
Cedric Nugteren
4284fcd940
Updated the README documentation
2017-02-26 16:32:53 +01:00
Cedric Nugteren
7de7e7d8ed
Merge pull request #138 from CNugteren/triangular_solvers
...
Added the triangular solvers (TRSV/TRSM)
2017-02-26 16:26:41 +01:00
Cedric Nugteren
e09c26c706
Split the GEMM kernel further up to prevent C1091 in MSVC
2017-02-26 15:03:12 +01:00
Cedric Nugteren
dde67ac79e
Minor fix to the generator script
2017-02-26 14:53:58 +01:00
Cedric Nugteren
ea6790665d
Merge branch 'development' into triangular_solvers
2017-02-26 14:51:45 +01:00
Cedric Nugteren
a145890aaa
Added a guard against invalid buffer sizes in the prepare-data functions for tests
2017-02-26 14:37:29 +01:00
Cedric Nugteren
df7638c305
Fixed an out-of-bounds memory access when filling a matrix with a constant
2017-02-26 14:31:05 +01:00
Cedric Nugteren
b7310036ed
Removed half-precision support from the TRSM routine; too unstable
2017-02-26 12:56:21 +01:00
Cedric Nugteren
70d8c4bad7
Improved the correctness tests for complex numbers in case either real or imag is much larger than the other
2017-02-26 10:19:53 +01:00
Cedric Nugteren
a433987441
Fixes division in the kernel for inversion of complex numbers
2017-02-26 10:18:45 +01:00
Cedric Nugteren
ccac957f17
Added documentation for the TRSV and TRSM routines
2017-02-25 13:02:15 +01:00
Cedric Nugteren
492ee3d0a5
Removed the invert routine from the tests
2017-02-25 12:28:13 +01:00
Cedric Nugteren
e47d95887c
Added PrepareData function for TRSM to create proper test input
2017-02-25 12:23:04 +01:00
Cedric Nugteren
2f2a510c38
Implemented a simple row-major to col-major problem conversion for TRSM
2017-02-24 21:08:44 +01:00
Cedric Nugteren
1e5b5157bc
Fixed a few issues with the TRSM routine; some tests still failing
2017-02-22 20:31:33 +01:00
Cedric Nugteren
133ebfc834
Added data-preparation function for the TRSV tests and special nan/inf checks in the error checking to make the tests pass
2017-02-19 17:43:26 +01:00
Cedric Nugteren
0643a29af5
Added tuning parameters for the AMD RX480 GPU (Ellesmere)
2017-02-18 13:59:10 +01:00
Cedric Nugteren
0ea30263ac
Merge pull request #137 from CNugteren/custom_parameters
...
API to override tuning parameters
2017-02-18 12:34:38 +01:00
Cedric Nugteren
7b2170818f
Changed the override-parameters test such that it is compatible with more devices
2017-02-18 11:22:07 +01:00
Cedric Nugteren
2e0951c6dc
Fixed small typo in the documentation
2017-02-18 11:05:54 +01:00
Cedric Nugteren
fef11a208c
Added documentation for the OverrideParameters function
2017-02-18 11:02:57 +01:00
Cedric Nugteren
d6538dfc25
Fixed the naming of the C API of OverrideParameters and fixed the description
2017-02-18 10:59:38 +01:00
Cedric Nugteren
3d10690c83
Added missing documentation for the fill and clear cache functions
2017-02-18 10:32:32 +01:00
Cedric Nugteren
cda449a5c3
Added a C interface to the OverrideParameters function; added some in-line comments to the API
2017-02-16 21:14:48 +01:00
Cedric Nugteren
08bfb75a9d
Added input-sanity checks for the OverrideParameters function
2017-02-16 21:12:50 +01:00
Cedric Nugteren
bdc57221bd
Added simple tests for the OverrideParameters function
2017-02-14 21:09:00 +01:00
Cedric Nugteren
cdb3bb7166
Added first version of the OverrideParameters function
2017-02-13 20:53:06 +01:00
Cedric Nugteren
00eb55a2d4
Fixed a small bug in GEMV: unused kernel in parameter list
2017-02-13 20:48:32 +01:00
Cedric Nugteren
345a5feb9a
Split the database into several smaller cached per-kernel databases (in preparation of per-kernel database overrides)
2017-02-12 12:02:39 +01:00
Cedric Nugteren
faa842b927
Made RemoveBySubset from the cache work with references to keys
2017-02-12 11:58:20 +01:00
Cedric Nugteren
36b942a698
Added an option to remove items from the caches, optionally by a subset of 2 specific key-values only
2017-02-11 14:05:38 +01:00
Cedric Nugteren
dc93523204
Added tuning results for Titan X (Pascal version)
2017-02-08 21:14:38 +01:00
Cedric Nugteren
c248f900c0
Merge branch 'development' into triangular_solvers
2017-02-05 22:18:59 +01:00
Cedric Nugteren
e7cbb5915a
Fixed complex version of the TRSV kernel
2017-02-05 14:36:31 +01:00
Cedric Nugteren
c209dd7af9
Improved substitution kernels a bit; added complex support
2017-02-04 22:48:06 +01:00
Cedric Nugteren
fec8c1a806
Completed a first STRSV implementation
2017-02-04 16:04:19 +01:00
Cedric Nugteren
a6ba6470aa
Added row-major support for TRSV
2017-02-04 14:25:27 +01:00
Cedric Nugteren
7c73ceb095
Added first (incomplete) version of TRSV routine
2017-01-29 17:02:00 +01:00
Cedric Nugteren
fd471e380c
Updated the changelog for PR131 and PR132
2017-01-24 20:34:09 +01:00
Cedric Nugteren
5e7d140d59
Merge pull request #132 from intelfx/cache
...
Refactor cache subsystem
2017-01-24 20:16:57 +01:00
Ivan Shapovalov
5fb1da1a0f
Database: pass Device instead of Queue for clarity
2017-01-24 12:18:14 +03:00
Ivan Shapovalov
50e758a007
Routine: cache the database instance as well
...
This does not change much, but will become useful in next commits when
plugin support is introduced.
2017-01-24 11:56:15 +03:00
Ivan Shapovalov
6dc18c1c57
Database: ref-count the internal map for caching
2017-01-24 11:56:15 +03:00
Ivan Shapovalov
5bcd92f297
Routine, Cache: generalize, reduce amount of copying in fast path
...
Implement a generalized Cache<K, V>. Two variants are provided: the
first one is based on std::map, using C++14-specific transparent
std::less<> and generalized std::map::find() to allow searching by tuple
of references. The second one is based on std::vector and O(n) lookup,
but remains C++11-compliant.
2017-01-24 11:56:15 +03:00
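A minimal sketch of the first variant described in the commit above: a std::map with the C++14 transparent std::less<> comparator, searched by a tuple of references so that no owning key is built on the lookup path. The class layout, the names (Cache, Store, Get, ProgramKey, DemoLookup) and the (string, int) key are assumptions made for illustration; this is not the code from the commit.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical cache: Key owns its data, KeyRef is a tuple of references
// to the same fields and is used for lookups only.
template <typename Key, typename KeyRef, typename Value>
class Cache {
 public:
  // Stores a value under an owning key (an existing entry is kept as-is).
  void Store(Key key, Value value) {
    map_.emplace(std::move(key), std::move(value));
  }

  // Looks up by a tuple of references. Because std::less<> is transparent,
  // std::map::find accepts KeyRef directly and compares it against the
  // stored Key with the heterogeneous std::tuple operator<.
  bool Get(const KeyRef& key, Value* result) const {
    const auto it = map_.find(key);
    if (it == map_.end()) { return false; }
    *result = it->second;
    return true;
  }

 private:
  std::map<Key, Value, std::less<>> map_;  // transparent comparator (C++14)
};

// Illustrative key: (kernel name, precision). Not taken from the library.
using ProgramKey = std::tuple<std::string, int>;
using ProgramKeyRef = std::tuple<const std::string&, const int&>;

// Stores one entry, then finds it again without copying the key fields.
int DemoLookup() {
  Cache<ProgramKey, ProgramKeyRef, int> cache;
  cache.Store(std::make_tuple(std::string("Xgemm"), 32), 42);

  const std::string name = "Xgemm";
  const int precision = 32;
  int value = 0;
  // std::tie on const objects yields a tuple of const references.
  const bool found = cache.Get(std::tie(name, precision), &value);
  assert(found);

  const std::string other = "Xgemv";
  int unused = 0;
  assert(!cache.Get(std::tie(other, precision), &unused));
  return value;
}
```

The second variant mentioned in the commit would replace the map with a std::vector of key/value pairs and a linear scan, which avoids the C++14 requirement at the cost of O(n) lookup.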
Cedric Nugteren
e943fe77d6
Merge pull request #131 from intelfx/misc
...
Assorted minor fixes
2017-01-24 09:10:35 +01:00
Ivan Shapovalov
46a59eb882
.travis.yml: do not build for osx twice, there's no gcc there
2017-01-24 02:55:09 +03:00
Ivan Shapovalov
064ba4abd4
treewide: silence type mismatch warnings in *printf()
2017-01-24 02:55:09 +03:00
Ivan Shapovalov
519ccbd273
Tester: always fail on OpenCL and CLBlast internal errors
...
These errors are self-evident and enough to fail the test even if there is
no clBLAS reference to compare error codes with.
2017-01-24 02:55:09 +03:00
Ivan Shapovalov
1b8e816333
FillCache: perform compilation for each precision separately
...
Thus do not prevent filling the cache for float if the device does not
support e.g. double.
2017-01-24 02:43:00 +03:00