Commit Graph

945 Commits (abb4d5ab324878337853685d1fdb705b913deeb4)

Author SHA1 Message Date
Cedric Nugteren abb4d5ab32 Added tuning results for ARM Mali T760 GPU 2017-11-24 21:16:54 +01:00
Cedric Nugteren 606990af6f Made the database script properly handle multiple entries for a single device 2017-11-20 21:38:23 +01:00
Cedric Nugteren 0f080bbc6e Potentially fixed an MSVC 2013 issue with a copy-constructor not being generated 2017-11-20 20:54:18 +01:00
Cedric Nugteren e0f3484084 Fixes some displaying issues in the GEMM routine tuner 2017-11-20 20:29:52 +01:00
Cedric Nugteren 5467c0cac5 Fixed a variety of warnings and an error for MSVC2013 compilation 2017-11-19 21:09:24 +01:00
Cedric Nugteren da76d7ab81
Merge pull request #216 from CNugteren/integrated_tuner
Integrated tuner
2017-11-19 20:05:15 +01:00
Cedric Nugteren defad3d1a2 Minor fix to the database script 2017-11-19 18:19:21 +01:00
Cedric Nugteren 4e0d08c3bc Added compilation timing and better compilation error reporting 2017-11-19 16:58:13 +01:00
Cedric Nugteren a3a8b44f59 Some fixed for the new auto-tuner to be compatible with the Python scripts 2017-11-19 16:31:08 +01:00
Cedric Nugteren c6690df896 Made the tuners be compiled by default 2017-11-19 14:33:25 +01:00
Cedric Nugteren 76d2b7f0b6 Revived the GEMM routine tuner; minor formatting changes 2017-11-19 12:59:52 +01:00
Cedric Nugteren 8d2f7d53aa Added a library with common tuner sources to speed-up compilation 2017-11-19 12:59:28 +01:00
Cedric Nugteren 7a54494577 Modified the kernel tuners to use the newly integrated auto-tuner 2017-11-19 12:58:41 +01:00
Cedric Nugteren 8a5a5e031e Moved some tuning functions from .hpp to .cpp 2017-11-17 20:58:36 +01:00
Cedric Nugteren f94d498a37 Moved compilation function to separate file; removed dependency of tuners of the CLBlast library 2017-11-17 20:57:46 +01:00
Cedric Nugteren d9cf206979 Removed dependency on CLTune 2017-11-16 21:28:36 +01:00
Cedric Nugteren 2b8ad70b63 Added printing of the best parameters for the new tuner 2017-11-16 21:18:29 +01:00
Cedric Nugteren 1b2b46f2f0 Added first version of integrated and re-written auto-tuner 2017-11-15 22:49:35 +01:00
Cedric Nugteren 0cd78bb6f9 Added kernel timing functionality to the utilities 2017-11-15 22:47:06 +01:00
Cedric Nugteren b337bffbaf Added exception handle with catch-all 2017-11-15 22:44:44 +01:00
Cedric Nugteren 03ebf14b97 Made the exception dispatch function optionally silent 2017-11-13 21:11:31 +01:00
Cedric Nugteren 4bac1287f2 Moved square-difference utility function for use in the tuners 2017-11-13 21:10:44 +01:00
Cedric Nugteren 677afd3b96 Factored out the creation of the OpenCL header and the program compilation 2017-11-11 16:14:43 +01:00
Cedric Nugteren c41d219ea4 Added tuning results for the GeForce GTX750Ti 2017-11-09 21:19:21 +01:00
Cedric Nugteren 5d5e3f93bc Updated to CLBlast version 1.2.0 2017-11-08 21:30:06 +01:00
Cedric Nugteren d24138808b Fixed an FP16 issue in the homatcopy test; added a comment about improper testing of integer returning functions for FP16 2017-11-08 21:20:07 +01:00
Cedric Nugteren b18cc9d3f1
Merge pull request #212 from CNugteren/kernel_selection_tuner
GEMM kernel selection tuner
2017-11-07 22:20:13 +01:00
Cedric Nugteren 6fe9916231 Updated the roadmap 2017-11-07 21:35:04 +01:00
Cedric Nugteren 3ec0be6fb8 Added various GEMM routine tuning results 2017-11-07 21:34:54 +01:00
Cedric Nugteren 33ac2b0175 Improved the way the database defaults are computed 2017-11-06 21:59:45 +01:00
Cedric Nugteren 34a33b54cf Changed GEMM routine tuner's scoring to use L2 measure instead for better averaging 2017-11-06 20:50:36 +01:00
Cedric Nugteren 9b0a435fb0 Integrated the GEMM routine tuner for kernel selection; added first tuning results 2017-11-02 21:47:14 +01:00
Cedric Nugteren 73272ab97d Fixed a bug in database compression/decompression 2017-11-02 21:19:18 +01:00
Cedric Nugteren 5c90577dfd Added collecting and printing of scores for the kernel-selection tuner 2017-10-30 20:39:21 +01:00
Cedric Nugteren 061b1c571b Merge branch 'binary_cache_platform_dependent' 2017-10-30 19:42:35 +01:00
Cedric Nugteren ac5a58cfe5 Added platform ID to the binary program cache to prevent issues with multi-platform systems 2017-10-29 20:01:30 +01:00
Cedric Nugteren 19c53f6dd0
Merge pull request #208 from CNugteren/android_support
Added Android support
2017-10-29 16:45:56 +01:00
Cedric Nugteren f24d611e57 Made it possible to compile the CLBlast performance clients for Android with the NDK 2017-10-29 13:02:14 +01:00
Cedric Nugteren 319762f150 Added Android support using the GNU C++ STL library and the GCC toolchain 2017-10-29 12:07:07 +01:00
Cedric Nugteren 12b08ae491 Merge branch 'master' into android_support 2017-10-28 17:32:37 +02:00
Cedric Nugteren 334a26eb12 Added initial version of a GEMM kernel selection tuner 2017-10-28 17:30:29 +02:00
Cedric Nugteren bd57dfa435 Moved timing function to a separate file 2017-10-28 14:12:05 +02:00
Cedric Nugteren fa6e5e67f5 Fixed a bug when using the matrix A-offset argument for the TRSM routine 2017-10-27 22:12:30 +02:00
Cedric Nugteren 449577cf07 Reduced TRSM block-size for better numerical stability 2017-10-27 22:07:43 +02:00
Cedric Nugteren 44f7fa628a Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM 2017-10-27 22:01:15 +02:00
Cedric Nugteren 8579b2b494 Added a DTRSM C++ interface example 2017-10-27 21:53:19 +02:00
Cedric Nugteren e388f055f7 Fixed small bug in (unused) invert tester 2017-10-25 20:35:39 +02:00
Cedric Nugteren 8cdb5cb4a7 Updated roadmap with links to issues and status 2017-10-25 20:35:39 +02:00
Cedric Nugteren d49aae236e Fixed a bug in TRSM routine due to missing event synchronisations after GEMM calls 2017-10-25 20:35:39 +02:00
Cedric Nugteren 42ac3b4748 Merge pull request #206 from matze/use-gnuinstall-dirs
Use GNUInstallDirs to determine install paths
2017-10-23 20:03:47 +02:00