Cedric Nugteren
606990af6f
Made the database script properly handle multiple entries for a single device
2017-11-20 21:38:23 +01:00
Cedric Nugteren
0f080bbc6e
Potentially fixed an MSVC 2013 issue with a copy-constructor not being generated
2017-11-20 20:54:18 +01:00
Cedric Nugteren
e0f3484084
Fixed some display issues in the GEMM routine tuner
2017-11-20 20:29:52 +01:00
Cedric Nugteren
5467c0cac5
Fixed a variety of warnings and an error for MSVC2013 compilation
2017-11-19 21:09:24 +01:00
Cedric Nugteren
da76d7ab81
Merge pull request #216 from CNugteren/integrated_tuner
Integrated tuner
2017-11-19 20:05:15 +01:00
Cedric Nugteren
defad3d1a2
Minor fix to the database script
2017-11-19 18:19:21 +01:00
Cedric Nugteren
4e0d08c3bc
Added compilation timing and better compilation error reporting
2017-11-19 16:58:13 +01:00
Cedric Nugteren
a3a8b44f59
Some fixes for the new auto-tuner to be compatible with the Python scripts
2017-11-19 16:31:08 +01:00
Cedric Nugteren
c6690df896
Made the tuners be compiled by default
2017-11-19 14:33:25 +01:00
Cedric Nugteren
76d2b7f0b6
Revived the GEMM routine tuner; minor formatting changes
2017-11-19 12:59:52 +01:00
Cedric Nugteren
8d2f7d53aa
Added a library with common tuner sources to speed up compilation
2017-11-19 12:59:28 +01:00
Cedric Nugteren
7a54494577
Modified the kernel tuners to use the newly integrated auto-tuner
2017-11-19 12:58:41 +01:00
Cedric Nugteren
8a5a5e031e
Moved some tuning functions from .hpp to .cpp
2017-11-17 20:58:36 +01:00
Cedric Nugteren
f94d498a37
Moved compilation function to a separate file; removed the tuners' dependency on the CLBlast library
2017-11-17 20:57:46 +01:00
Cedric Nugteren
d9cf206979
Removed dependency on CLTune
2017-11-16 21:28:36 +01:00
Cedric Nugteren
2b8ad70b63
Added printing of the best parameters for the new tuner
2017-11-16 21:18:29 +01:00
Cedric Nugteren
1b2b46f2f0
Added first version of integrated and re-written auto-tuner
2017-11-15 22:49:35 +01:00
Cedric Nugteren
0cd78bb6f9
Added kernel timing functionality to the utilities
2017-11-15 22:47:06 +01:00
Cedric Nugteren
b337bffbaf
Added exception handler with catch-all
2017-11-15 22:44:44 +01:00
Cedric Nugteren
03ebf14b97
Made the exception dispatch function optionally silent
2017-11-13 21:11:31 +01:00
Cedric Nugteren
4bac1287f2
Moved square-difference utility function for use in the tuners
2017-11-13 21:10:44 +01:00
Cedric Nugteren
677afd3b96
Factored out the creation of the OpenCL header and the program compilation
2017-11-11 16:14:43 +01:00
Cedric Nugteren
c41d219ea4
Added tuning results for the GeForce GTX750Ti
2017-11-09 21:19:21 +01:00
Cedric Nugteren
5d5e3f93bc
Updated to CLBlast version 1.2.0
2017-11-08 21:30:06 +01:00
Cedric Nugteren
d24138808b
Fixed an FP16 issue in the homatcopy test; added a comment about improper testing of integer-returning functions for FP16
2017-11-08 21:20:07 +01:00
Cedric Nugteren
b18cc9d3f1
Merge pull request #212 from CNugteren/kernel_selection_tuner
GEMM kernel selection tuner
2017-11-07 22:20:13 +01:00
Cedric Nugteren
6fe9916231
Updated the roadmap
2017-11-07 21:35:04 +01:00
Cedric Nugteren
3ec0be6fb8
Added various GEMM routine tuning results
2017-11-07 21:34:54 +01:00
Cedric Nugteren
33ac2b0175
Improved the way the database defaults are computed
2017-11-06 21:59:45 +01:00
Cedric Nugteren
34a33b54cf
Changed the GEMM routine tuner's scoring to use an L2 measure for better averaging
2017-11-06 20:50:36 +01:00
Cedric Nugteren
9b0a435fb0
Integrated the GEMM routine tuner for kernel selection; added first tuning results
2017-11-02 21:47:14 +01:00
Cedric Nugteren
73272ab97d
Fixed a bug in database compression/decompression
2017-11-02 21:19:18 +01:00
Cedric Nugteren
5c90577dfd
Added collecting and printing of scores for the kernel-selection tuner
2017-10-30 20:39:21 +01:00
Cedric Nugteren
061b1c571b
Merge branch 'binary_cache_platform_dependent'
2017-10-30 19:42:35 +01:00
Cedric Nugteren
ac5a58cfe5
Added platform ID to the binary program cache to prevent issues with multi-platform systems
2017-10-29 20:01:30 +01:00
Cedric Nugteren
19c53f6dd0
Merge pull request #208 from CNugteren/android_support
Added Android support
2017-10-29 16:45:56 +01:00
Cedric Nugteren
f24d611e57
Made it possible to compile the CLBlast performance clients for Android with the NDK
2017-10-29 13:02:14 +01:00
Cedric Nugteren
319762f150
Added Android support using the GNU C++ STL library and the GCC toolchain
2017-10-29 12:07:07 +01:00
Cedric Nugteren
12b08ae491
Merge branch 'master' into android_support
2017-10-28 17:32:37 +02:00
Cedric Nugteren
334a26eb12
Added initial version of a GEMM kernel selection tuner
2017-10-28 17:30:29 +02:00
Cedric Nugteren
bd57dfa435
Moved timing function to a separate file
2017-10-28 14:12:05 +02:00
Cedric Nugteren
fa6e5e67f5
Fixed a bug when using the matrix A-offset argument for the TRSM routine
2017-10-27 22:12:30 +02:00
Cedric Nugteren
449577cf07
Reduced TRSM block-size for better numerical stability
2017-10-27 22:07:43 +02:00
Cedric Nugteren
44f7fa628a
Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM
2017-10-27 22:01:15 +02:00
Cedric Nugteren
8579b2b494
Added a DTRSM C++ interface example
2017-10-27 21:53:19 +02:00
Cedric Nugteren
e388f055f7
Fixed small bug in (unused) invert tester
2017-10-25 20:35:39 +02:00
Cedric Nugteren
8cdb5cb4a7
Updated roadmap with links to issues and status
2017-10-25 20:35:39 +02:00
Cedric Nugteren
d49aae236e
Fixed a bug in the TRSM routine due to missing event synchronisations after GEMM calls
2017-10-25 20:35:39 +02:00
Cedric Nugteren
42ac3b4748
Merge pull request #206 from matze/use-gnuinstall-dirs
Use GNUInstallDirs to determine install paths
2017-10-23 20:03:47 +02:00
Matthias Vogelgesang
34e537a5c1
Use GNUInstallDirs to determine install paths
The GNUInstallDirs module* provides variables that match the install directories
for GNU software and allows users to override them. Without hardcoded paths,
packagers can choose library paths according to distribution policies (e.g.
lib, lib64, lib<arch>, ...).
* https://cmake.org/cmake/help/v3.0/module/GNUInstallDirs.html
2017-10-23 15:54:55 +02:00