Cedric Nugteren
560f7a40f6
Added convgemm to the CLBlast database, added initial parameters for Skylake GPU
2018-12-31 19:05:34 +01:00
Cedric Nugteren
ff4d5558a6
Widened Apple OpenCL check, added way to debug too-large-workgroups issue
2018-05-30 22:59:04 +02:00
Cedric Nugteren
a8bb0c9f3c
Added Apple OpenCL TRSV block size override; removed failing old Intel GPU test from README
2018-05-29 21:29:12 +02:00
Cedric Nugteren
e71c037304
Fixed a performance overhead in database creation: it is again a static variable now as it was before
2018-01-06 11:28:04 +01:00
Cedric Nugteren
4a2fc4aa98
Made the database-vector a non-static member
2017-12-26 11:32:05 +01:00
Cedric Nugteren
b1f52f130c
Updated the database to use the new TRSV and Invert tuners
2017-12-23 13:55:22 +01:00
Cedric Nugteren
9b0a435fb0
Integrated the GEMM routine tuner for kernel selection; added first tuning results
2017-11-02 21:47:14 +01:00
Cedric Nugteren
255f09843c
Made program and binary databases dependent on the routine parameters on top of the name
2017-09-23 20:40:38 +02:00
Cedric Nugteren
a23cd8d13a
Updated README with proper AMD device names; fixed device look-up for names of length 50+
2017-09-16 21:26:38 +02:00
Cedric Nugteren
bcf39eb79a
Fixed a compilation error and warning under MacOS
2017-09-16 18:34:11 +02:00
Cedric Nugteren
4e317f5e85
Improved compilation time of the tuner database
2017-09-16 18:02:37 +02:00
Cedric Nugteren
0d13d814c2
Added architecture layer in the tuning database for better performance on unseen devices
2017-09-14 21:27:33 +02:00
Cedric Nugteren
76382ff6c1
Added the new vendor-architecture-name hierarchy to the tuners as well
2017-09-10 16:34:54 +02:00
Cedric Nugteren
91ea7fcde2
Introduced the notion of a device-architecture for the database and added device and architecture name mappings
2017-09-08 21:09:05 +02:00
Cedric Nugteren
20da5e33a8
Split the database files over multiple directories and files; first step towards separate compilation
2017-09-06 21:50:42 +02:00
Cedric Nugteren
28462aa050
Removed an assumption that the 'default' tuning parameters have to be stored last; this is no longer needed
2017-09-04 17:39:57 +02:00
Cedric Nugteren
e44feb8576
Changed the structure of the database to reduce compilation time and save memory
2017-06-20 21:19:26 +02:00
Cedric Nugteren
7374c37e2e
Fixed a compilation issue under MSVC and GCC
2017-04-10 08:38:24 +02:00
Cedric Nugteren
2d45c37676
Removed const-vector-of-const-objects from the database class to remain according to the C++11 standard
2017-04-10 07:40:27 +02:00
Cedric Nugteren
fb6c78ea07
Added a special override database for the Apple CPU implementation on OS X: this makes the test work, it does not focus on good performance
2017-04-07 07:37:30 +02:00
Cedric Nugteren
ea6790665d
Merge branch 'development' into triangular_solvers
2017-02-26 14:51:45 +01:00
Cedric Nugteren
08bfb75a9d
Added input-sanity checks for the OverrideParameters function
2017-02-16 21:12:50 +01:00
Cedric Nugteren
345a5feb9a
Split the database into several smaller cached per-kernel databases (in preparation of per-kernel database overrides)
2017-02-12 12:02:39 +01:00
Cedric Nugteren
c248f900c0
Merge branch 'development' into triangular_solvers
2017-02-05 22:18:59 +01:00
Cedric Nugteren
fec8c1a806
Completed a first STRSV implementation
2017-02-04 16:04:19 +01:00
Ivan Shapovalov
5fb1da1a0f
Database: pass Device instead of Queue for clarity
2017-01-24 12:18:14 +03:00
Ivan Shapovalov
6dc18c1c57
Database: ref-count the internal map for caching
2017-01-24 11:56:15 +03:00
Cedric Nugteren
4b3ffd9989
Added a first version of the diagonal block invert routine in preparation of TRSM
2017-01-15 17:30:00 +01:00
Cedric Nugteren
b0ff11acf0
Moved files around a bit; created a utilities subfolder
2016-10-22 15:36:48 +02:00
Ivan Shapovalov
b98af44fcf
treewide: use C++ exceptions properly
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.
Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.
However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
Cedric Nugteren
0f9311d46a
Fixed an issue with a growing database: the database is now a global variable in a namespace and its container uses const-pointers to the actual data
2016-10-14 20:56:32 +02:00
Cedric Nugteren
fcac81bfef
First fixes towards compilation on Visual Studio 2013
2016-10-10 20:37:45 +02:00
Cedric Nugteren
a3e67f2be2
Added a kernel selection database to select between the direct and indirect GEMM kernels
2016-10-06 19:51:12 +02:00
Cedric Nugteren
669f43aed6
Separated the tuning parameters of the new direct GEMM kernel from the indirect version
2016-09-25 13:52:08 +02:00
Cedric Nugteren
aa3dffe356
Added XgemvFastRot and Xgemm 16-bit tuning results: just defaults which are now automatically taken from 32-bit if there are no entries at all
2016-09-12 20:13:38 +02:00
Cedric Nugteren
de1afe168d
Removed all old tuning results for the XgemvFastRot kernel; re-added for a couple of devices
2016-07-25 22:57:23 +02:00
Cedric Nugteren
2582f0290a
Moved the XgemvFast and XgemvFastRot tuning database into a separate file
2016-07-25 22:43:49 +02:00
Cedric Nugteren
ffa35c623a
Minor improvements after merging in groundwork for custom tuning parameters and kernels
2016-07-24 17:00:21 +02:00
Ivan Shapovalov
e4e1f05079
clblast::Database, clblast::Routine: implement "database overlays" provided by routine implementation
2016-07-22 11:15:52 +03:00
Cedric Nugteren
61203453aa
Renamed all C++ source files to .cpp to match the .hpp extension better
2016-06-19 13:55:49 +02:00