Cedric Nugteren
|
3ec0be6fb8
|
Added various GEMM routine tuning results
|
2017-11-07 21:34:54 +01:00 |
Cedric Nugteren
|
33ac2b0175
|
Improved the way the database defaults are computed
|
2017-11-06 21:59:45 +01:00 |
Cedric Nugteren
|
9b0a435fb0
|
Integrated the GEMM routine tuner for kernel selection; added first tuning results
|
2017-11-02 21:47:14 +01:00 |
Cedric Nugteren
|
73272ab97d
|
Fixed a bug in database compression/decompression
|
2017-11-02 21:19:18 +01:00 |
Cedric Nugteren
|
472f90501c
|
Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570
|
2017-10-20 18:06:12 +02:00 |
Cedric Nugteren
|
0802e3d84c
|
Added tuning results for Intel Core i7 6770HQ
|
2017-09-16 21:19:06 +02:00 |
Cedric Nugteren
|
4e317f5e85
|
Improved compilation time of the tuner database
|
2017-09-16 18:02:37 +02:00 |
Cedric Nugteren
|
0d13d814c2
|
Added architecture layer in the tuning database for better performance on unseen devices
|
2017-09-14 21:27:33 +02:00 |
Cedric Nugteren
|
20da5e33a8
|
Split the database files over multiple directories and files; first step towards separate compilation
|
2017-09-06 21:50:42 +02:00 |
Cedric Nugteren
|
18d832e149
|
Added tuning results for the Qualcomm Adreno 330 GPU
|
2017-07-30 18:18:02 +02:00 |
Cedric Nugteren
|
1a8ed48a35
|
Fixed some Clang and MSVC warnings
|
2017-06-25 11:50:36 +02:00 |
Cedric Nugteren
|
615a7fdc81
|
Fixed some compilation issues related to the database structure change
|
2017-06-21 23:07:47 +02:00 |
Cedric Nugteren
|
e44feb8576
|
Changed the structure of the database to reduce compilation time and save memory
|
2017-06-20 21:19:26 +02:00 |
Cedric Nugteren
|
48f2682eb7
|
Added tuning results for the Core i7-920 CPU
|
2017-06-18 20:53:59 +02:00 |
Cedric Nugteren
|
33ed1e5a06
|
Added tuning results for GeForce GT 650M (thanks to bzcheeseman)
|
2017-06-01 22:52:08 +02:00 |
Cedric Nugteren
|
71933c3411
|
Added tuning results for the AMD Radeon Fiji GPU
|
2017-05-11 22:53:52 -07:00 |
Cedric Nugteren
|
1c33af6eab
|
Re-added Titan X (Pascal) tuning results based on more averaging when tuning
|
2017-04-23 17:58:56 +02:00 |
Cedric Nugteren
|
e41d204856
|
Increased the default number of runs for GEMV tuning; updated GEMV tuning results for Iris Pro
|
2017-04-21 22:12:20 +02:00 |
Cedric Nugteren
|
e9ef037549
|
Added tuning results for the Radeon HD6750M GPU (Apple OpenCL)
|
2017-03-04 15:24:55 +01:00 |
Cedric Nugteren
|
ea6790665d
|
Merge branch 'development' into triangular_solvers
|
2017-02-26 14:51:45 +01:00 |
Cedric Nugteren
|
0643a29af5
|
Added tuning parameters for the AMD RX480 GPU (Ellesmere)
|
2017-02-18 13:59:10 +01:00 |
Cedric Nugteren
|
dc93523204
|
Added tuning results for Titan X (Pascal version)
|
2017-02-08 21:14:38 +01:00 |
Cedric Nugteren
|
c248f900c0
|
Merge branch 'development' into triangular_solvers
|
2017-02-05 22:18:59 +01:00 |
Cedric Nugteren
|
fec8c1a806
|
Completed a first STRSV implementation
|
2017-02-04 16:04:19 +01:00 |
Cedric Nugteren
|
2e4f6e1609
|
Added tuning results for NVIDIA GTX 1080 and Intel Core i7-4790K
|
2017-01-19 19:42:31 +01:00 |
Cedric Nugteren
|
4b3ffd9989
|
Added a first version of the diagonal block invert routine in preparation for TRSM
|
2017-01-15 17:30:00 +01:00 |
Cedric Nugteren
|
32b850b12b
|
Added tuning results for the AMD Turks GPU and the Intel Core i7-2670QM CPU
|
2017-01-03 20:30:56 +01:00 |
Cedric Nugteren
|
080e1be684
|
Improved the default parameters for cases with non-common parameters across all devices
|
2016-11-26 16:38:17 +01:00 |
Cedric Nugteren
|
746d688e07
|
Updated the tuning results for the Intel Skylake ULT GT2 GPU
|
2016-11-15 22:42:04 +01:00 |
Cedric Nugteren
|
ec687afa75
|
Added tuning results for GeForce GTX TITAN Black
|
2016-10-24 19:49:10 +02:00 |
Cedric Nugteren
|
c925fe463f
|
Added tuning results for the AMD Tonga GPU
|
2016-10-22 16:25:31 +02:00 |
Cedric Nugteren
|
0f9311d46a
|
Fixed an issue with a growing database: the database is now a global variable in a namespace and its container uses const-pointers to the actual data
|
2016-10-14 20:56:32 +02:00 |
Cedric Nugteren
|
ebb505b783
|
Added tuning results for Intel HD Graphics IvyBridge GPU
|
2016-10-13 12:18:28 +02:00 |
Cedric Nugteren
|
08ee57f494
|
Updated the tuning results for the GTX 750 Ti GPU
|
2016-10-10 16:41:41 +02:00 |
Cedric Nugteren
|
7baac46e72
|
Fixed a performance bug for Intel Iris Pro GPUs due to incorrect tuning results
|
2016-10-08 21:56:06 +02:00 |
Cedric Nugteren
|
b698e45478
|
Added first tuning results for the single-kernel direct GEMM implementation
|
2016-10-06 21:13:14 +02:00 |
Cedric Nugteren
|
a459920105
|
Added padding to the local memory of the GEMM direct kernel
|
2016-10-01 16:58:53 +02:00 |
Cedric Nugteren
|
73d135c2ce
|
Added a first version of a tuner for the GEMM direct kernel; collapsed MWGD, NWGD and KWGD into one WGD parameter
|
2016-09-25 14:48:34 +02:00 |
Cedric Nugteren
|
669f43aed6
|
Separated the tuning parameters of the new direct GEMM kernel from the indirect version
|
2016-09-25 13:52:08 +02:00 |
Cedric Nugteren
|
aa3dffe356
|
Added XgemvFastRot and Xgemm 16-bit tuning results: just defaults which are now automatically taken from 32-bit if there are no entries at all
|
2016-09-12 20:13:38 +02:00 |
Cedric Nugteren
|
b5a67f86ec
|
Complete re-write of the database script. Replaced Pandas with the much faster and more convenient plain JSON/dict data type
|
2016-09-11 21:29:28 +02:00 |
Cedric Nugteren
|
e21f32bc99
|
Updated database based on exhaustive tuning results for GEMM for the R9 M370X GPU
|
2016-09-10 14:00:43 +02:00 |
Cedric Nugteren
|
3daba70997
|
Updated the database script to remove duplicate entries: keeps only the best-performing cases for a specific parameter combination
|
2016-09-10 11:12:09 +02:00 |
Cedric Nugteren
|
521bf6cdfc
|
Added tuning results for Intel Broadwell 5500 GT2 GPU
|
2016-09-03 16:43:23 +02:00 |
Cedric Nugteren
|
19574b2519
|
Updated tuning results for Haswell GT2 Mobile GPU; fixed database script to handle duplicate entries of different runs
|
2016-09-03 12:45:11 +02:00 |
Cedric Nugteren
|
0c0f0ac7f9
|
Also changed the default-default for unknown device types to use the same method as for known device groups
|
2016-08-21 20:35:20 +02:00 |
Cedric Nugteren
|
7d5631b7e4
|
Updated the database script to calculate the relative best performance of tuning results common for a device/vendor type
|
2016-08-15 21:01:07 +02:00 |
Cedric Nugteren
|
de1afe168d
|
Removed all old tuning results for the XgemvFastRot kernel; re-added for a couple of devices
|
2016-07-25 22:57:23 +02:00 |
Cedric Nugteren
|
2582f0290a
|
Moved the XgemvFast and XgemvFastRot tuning database into a separate file
|
2016-07-25 22:43:49 +02:00 |
Cedric Nugteren
|
7a4f963763
|
Further improvements to the XgemvFastRot kernel, which now properly enables coalescing
|
2016-07-23 14:52:32 +02:00 |
Cedric Nugteren
|
75fe8235f7
|
Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enabling better memory performance
|
2016-07-23 10:20:11 +02:00 |
Cedric Nugteren
|
57f09178d8
|
Added tuning results for AMD Oland and for Intel Graphics HD 530
|
2016-07-10 11:46:44 +02:00 |
Cedric Nugteren
|
9683b50c55
|
Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp)
|
2016-07-03 20:30:47 +02:00 |
Cedric Nugteren
|
66908ef5cd
|
Added tuning results for 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile' (thanks to OursDesCavernes)
|
2016-06-19 14:59:50 +02:00 |
Cedric Nugteren
|
f726fbdc9f
|
Moved all headers into the source tree, changed headers to .hpp extension
|
2016-06-18 20:20:13 +02:00 |