Cedric Nugteren
1e738db6dd
Split the database into multiple small compilation units
2017-12-27 12:04:22 +01:00
Cedric Nugteren
4a2fc4aa98
Made the database-vector a non-static member
2017-12-26 11:32:05 +01:00
Cedric Nugteren
7aabeb44cc
Updated the tuning results for the IvyBridge M GT2 GPU
2017-12-23 15:46:41 +01:00
Cedric Nugteren
b1f52f130c
Updated the database to use the new TRSV and Invert tuners
2017-12-23 13:55:22 +01:00
Cedric Nugteren
0ee81e27b9
Added tuning results for Apple AMD Radeon Pro 580
2017-12-20 19:59:31 +01:00
Cedric Nugteren
69f6591564
Removed all ARM Mali tuning results; re-added Mali-T760 and Mali-T628 results based on kernel pre-processor
2017-12-17 16:59:08 +01:00
Cedric Nugteren
abb4d5ab32
Added tuning results for ARM Mali T760 GPU
2017-11-24 21:16:54 +01:00
Cedric Nugteren
c41d219ea4
Added tuning results for the GeForce GTX750Ti
2017-11-09 21:19:21 +01:00
Cedric Nugteren
3ec0be6fb8
Added various GEMM routine tuning results
2017-11-07 21:34:54 +01:00
Cedric Nugteren
33ac2b0175
Improved the way the database defaults are computed
2017-11-06 21:59:45 +01:00
Cedric Nugteren
9b0a435fb0
Integrated the GEMM routine tuner for kernel selection; added first tuning results
2017-11-02 21:47:14 +01:00
Cedric Nugteren
73272ab97d
Fixed a bug in database compression/decompression
2017-11-02 21:19:18 +01:00
Cedric Nugteren
472f90501c
Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570
2017-10-20 18:06:12 +02:00
Cedric Nugteren
ed980a1df1
Updated database override function to work with the new database storage format
2017-09-24 15:44:14 +02:00
Cedric Nugteren
255f09843c
Made program and binary databases dependent on the routine parameters on top of the name
2017-09-23 20:40:38 +02:00
Cedric Nugteren
1d2ee29cb9
Fixed compilation issues of the database for MSVC 2013
2017-09-19 19:44:05 +02:00
Cedric Nugteren
a23cd8d13a
Updated README with proper AMD device names; fixed device look-up for names of length 50+
2017-09-16 21:26:38 +02:00
Cedric Nugteren
0802e3d84c
Added tuning results for Intel Core i7 6770HQ
2017-09-16 21:19:06 +02:00
Cedric Nugteren
bcf39eb79a
Fixed a compilation error and warning under MacOS
2017-09-16 18:34:11 +02:00
Cedric Nugteren
4e317f5e85
Improved compilation time of the tuner database
2017-09-16 18:02:37 +02:00
Cedric Nugteren
0d13d814c2
Added architecture layer in the tuning database for better performance on unseen devices
2017-09-14 21:27:33 +02:00
Cedric Nugteren
76382ff6c1
Added the new vendor-architecture-name hierarchy to the tuners as well
2017-09-10 16:34:54 +02:00
Cedric Nugteren
91ea7fcde2
Introduced the notion of a device-architecture for the database and added device and architecture name mappings
2017-09-08 21:09:05 +02:00
Cedric Nugteren
20da5e33a8
Split the database files over multiple directories and files; first step towards separate compilation
2017-09-06 21:50:42 +02:00
Cedric Nugteren
28462aa050
Removed an assumption that the 'default' tuning parameters have to be stored last; this is no longer needed
2017-09-04 17:39:57 +02:00
Cedric Nugteren
18d832e149
Added tuning results for the Qualcomm Adreno 330 GPU
2017-07-30 18:18:02 +02:00
mcian
a36283aaec
Add new threshold for ARM
2017-07-17 12:20:46 +02:00
Cedric Nugteren
1a8ed48a35
Fixed some Clang and MSVC warnings
2017-06-25 11:50:36 +02:00
Cedric Nugteren
615a7fdc81
Fixed some compilation issues related to the database structure change
2017-06-21 23:07:47 +02:00
Cedric Nugteren
e44feb8576
Changed the structure of the database to reduce compilation time and save memory
2017-06-20 21:19:26 +02:00
Cedric Nugteren
48f2682eb7
Added tuning results for the Core i7-920 CPU
2017-06-18 20:53:59 +02:00
Cedric Nugteren
33ed1e5a06
Added tuning results for GeForce GT 650M (thanks to bzcheeseman)
2017-06-01 22:52:08 +02:00
Cedric Nugteren
71933c3411
Added tuning results for the AMD Radeon Fiji GPU
2017-05-11 22:53:52 -07:00
Cedric Nugteren
1c33af6eab
Re-added Titan X (Pascal) tuning results based on more averaging when tuning
2017-04-23 17:58:56 +02:00
Cedric Nugteren
192199c9cb
Fixed the direct vs indirect setting for NVIDIA GPUs
2017-04-22 13:43:27 +02:00
Cedric Nugteren
e41d204856
Increased the default number of runs for GEMV tuning; updated GEMV tuning results for Iris Pro
2017-04-21 22:12:20 +02:00
Cedric Nugteren
d7314d4f8e
Tuned the direct versus indirect GEMM kernel trade-off point for NVIDIA GPUs
2017-04-20 22:19:09 +02:00
Cedric Nugteren
7374c37e2e
Fixed a compilation issue under MSVC and GCC
2017-04-10 08:38:24 +02:00
Cedric Nugteren
2d45c37676
Removed const-vector-of-const-objects from the database class to remain compliant with the C++11 standard
2017-04-10 07:40:27 +02:00
Cedric Nugteren
fb6c78ea07
Added a special override database for the Apple CPU implementation on OS X: this makes the test work but does not focus on good performance
2017-04-07 07:37:30 +02:00
Cedric Nugteren
e9ef037549
Added tuning results for the Radeon HD6750M GPU (Apple OpenCL)
2017-03-04 15:24:55 +01:00
Cedric Nugteren
ea6790665d
Merge branch 'development' into triangular_solvers
2017-02-26 14:51:45 +01:00
Cedric Nugteren
0643a29af5
Added tuning parameters for the AMD RX480 GPU (Ellesmere)
2017-02-18 13:59:10 +01:00
Cedric Nugteren
08bfb75a9d
Added input-sanity checks for the OverrideParameters function
2017-02-16 21:12:50 +01:00
Cedric Nugteren
345a5feb9a
Split the database into several smaller cached per-kernel databases (in preparation of per-kernel database overrides)
2017-02-12 12:02:39 +01:00
Cedric Nugteren
dc93523204
Added tuning results for Titan X (Pascal version)
2017-02-08 21:14:38 +01:00
Cedric Nugteren
c248f900c0
Merge branch 'development' into triangular_solvers
2017-02-05 22:18:59 +01:00
Cedric Nugteren
fec8c1a806
Completed a first STRSV implementation
2017-02-04 16:04:19 +01:00
Ivan Shapovalov
5fb1da1a0f
Database: pass Device instead of Queue for clarity
2017-01-24 12:18:14 +03:00
Ivan Shapovalov
6dc18c1c57
Database: ref-count the internal map for caching
2017-01-24 11:56:15 +03:00
Cedric Nugteren
2e4f6e1609
Added tuning results for NVIDIA GTX 1080 and Intel Core i7-4790K
2017-01-19 19:42:31 +01:00
Cedric Nugteren
4b3ffd9989
Added a first version of the diagonal block invert routine in preparation of TRSM
2017-01-15 17:30:00 +01:00
Cedric Nugteren
32b850b12b
Added tuning results for the AMD Turks GPU and the Intel Core i7-2670QM CPU
2017-01-03 20:30:56 +01:00
Cedric Nugteren
26e0177431
Made Intel GPUs always use the indirect version of the GEMM kernel
2016-11-29 20:47:20 +01:00
Cedric Nugteren
080e1be684
Improved the default parameters for cases with non-common parameters across all devices
2016-11-26 16:38:17 +01:00
Cedric Nugteren
6eeb1180fd
Changed the GEMM kernel selection parameters for Skylake GPUs to always favour the regular kernel
2016-11-19 22:15:33 +01:00
Cedric Nugteren
746d688e07
Updated the tuning results for the Intel Skylake ULT GT2 GPU
2016-11-15 22:42:04 +01:00
Cedric Nugteren
ec687afa75
Added tuning results for GeForce GTX TITAN Black
2016-10-24 19:49:10 +02:00
Cedric Nugteren
c925fe463f
Added tuning results for the AMD Tonga GPU
2016-10-22 16:25:31 +02:00
Cedric Nugteren
b0ff11acf0
Moved files around a bit; created a utilities subfolder
2016-10-22 15:36:48 +02:00
Ivan Shapovalov
b98af44fcf
treewide: use C++ exceptions properly
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.
Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.
However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
Cedric Nugteren
0f9311d46a
Fixed an issue with a growing database: the database is now a global variable in a namespace and its container uses const-pointers to the actual data
2016-10-14 20:56:32 +02:00
Cedric Nugteren
ebb505b783
Added tuning results for Intel HD Graphics IvyBridge GPU
2016-10-13 12:18:28 +02:00
Cedric Nugteren
f88c50522d
Fixed an issue with const members of structs in the database
2016-10-10 22:24:05 +02:00
Cedric Nugteren
fcac81bfef
First fixes towards compilation on Visual Studio 2013
2016-10-10 20:37:45 +02:00
Cedric Nugteren
08ee57f494
Updated the tuning results for the GTX 750 Ti GPU
2016-10-10 16:41:41 +02:00
Cedric Nugteren
7c228f6a67
Changed the thresholds for the direct/indirect GEMM kernels for NVIDIA and Intel GPUs
2016-10-10 16:01:02 +02:00
Cedric Nugteren
7baac46e72
Fixed a performance bug for Intel Iris Pro GPUs due to incorrect tuning results
2016-10-08 21:56:06 +02:00
Cedric Nugteren
b698e45478
Added first tuning results for the single-kernel direct GEMM implementation
2016-10-06 21:13:14 +02:00
Cedric Nugteren
a3e67f2be2
Added a kernel selection database to select between the direct and indirect GEMM kernels
2016-10-06 19:51:12 +02:00
Cedric Nugteren
a459920105
Added padding to the local memory of the GEMM direct kernel
2016-10-01 16:58:53 +02:00
Cedric Nugteren
73d135c2ce
Added a first version of a tuner for the GEMM direct kernel; collapsed MWGD, NWGD and KWGD into one WGD parameter
2016-09-25 14:48:34 +02:00
Cedric Nugteren
669f43aed6
Separated the tuning parameters of the new direct GEMM kernel from the indirect version
2016-09-25 13:52:08 +02:00
Cedric Nugteren
aa3dffe356
Added XgemvFastRot and Xgemm 16-bit tuning results: just defaults which are now automatically taken from 32-bit if there are no entries at all
2016-09-12 20:13:38 +02:00
Cedric Nugteren
b5a67f86ec
Complete re-write of the database script. Replaced Pandas with the much faster and more convenient plain JSON/dict data-type
2016-09-11 21:29:28 +02:00
Cedric Nugteren
e21f32bc99
Updated database based on exhaustive tuning results for GEMM for the R9 M370X GPU
2016-09-10 14:00:43 +02:00
Cedric Nugteren
3daba70997
Updated the database script to remove duplicate entries: keeps only the best-performing cases for a specific parameter combination
2016-09-10 11:12:09 +02:00
Cedric Nugteren
521bf6cdfc
Added tuning results for Intel Broadwell 5500 GT2 GPU
2016-09-03 16:43:23 +02:00
Cedric Nugteren
19574b2519
Updated tuning results for Haswell GT2 Mobile GPU; fixed database script to handle duplicate entries of different runs
2016-09-03 12:45:11 +02:00
Cedric Nugteren
0c0f0ac7f9
Also changed the default-default for unknown device types to use the same method as for known device groups
2016-08-21 20:35:20 +02:00
Cedric Nugteren
7d5631b7e4
Updated the database script to calculate the relative best performance of tuning results common for a device/vendor type
2016-08-15 21:01:07 +02:00
Cedric Nugteren
de1afe168d
Removed all old tuning results for the XgemvFastRot kernel; re-added for a couple of devices
2016-07-25 22:57:23 +02:00
Cedric Nugteren
2582f0290a
Moved the XgemvFast and XgemvFastRot tuning database into a separate file
2016-07-25 22:43:49 +02:00
Cedric Nugteren
0252df731a
Merge branch 'development' into gemv_performance
2016-07-24 17:06:27 +02:00
Cedric Nugteren
ffa35c623a
Minor improvements after merging in groundwork for custom tuning parameters and kernels
2016-07-24 17:00:21 +02:00
Cedric Nugteren
7a4f963763
Further improvements to the XgemvFastRot kernel; it now properly enables coalescing
2016-07-23 14:52:32 +02:00
Cedric Nugteren
75fe8235f7
Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enabling better memory performance
2016-07-23 10:20:11 +02:00
Ivan Shapovalov
e4e1f05079
clblast::Database, clblast::Routine: implement "database overlays" provided by routine implementation
2016-07-22 11:15:52 +03:00
Cedric Nugteren
57f09178d8
Added tuning results for AMD Oland and for Intel Graphics HD 530
2016-07-10 11:46:44 +02:00
Cedric Nugteren
9683b50c55
Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp)
2016-07-03 20:30:47 +02:00
Cedric Nugteren
66908ef5cd
Added tuning results for 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile' (thanks to OursDesCavernes)
2016-06-19 14:59:50 +02:00
Cedric Nugteren
61203453aa
Renamed all C++ source files to .cpp to match the .hpp extension better
2016-06-19 13:55:49 +02:00
Cedric Nugteren
f726fbdc9f
Moved all headers into the source tree, changed headers to .hpp extension
2016-06-18 20:20:13 +02:00