Commit Graph

1473 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)

Author SHA1 Message Date
Cedric Nugteren 60da54da5d Added preliminary support for xHER2 and xSYR2 routines 2016-03-02 21:18:01 +01:00
Cedric Nugteren fa79720557 Added tuning results for Intel Iris Pro and AMD R9 M370X 2016-02-28 16:47:52 +01:00
Cedric Nugteren 3c27edb087 Updated the changelog with newly supported level-2 routines 2016-02-28 16:37:49 +01:00
Cedric Nugteren 610a31283b Merge branch 'ger_routines' into development 2016-02-28 16:31:31 +01:00
Cedric Nugteren 4a56822dcc Fixed a couple of correctness bugs in the Xher kernels 2016-02-28 15:49:59 +01:00
Cedric Nugteren e3545215a5 Added support for xHER, xHPR, xSYR, and xSPR routines 2016-02-28 14:16:48 +01:00
Cedric Nugteren cef78c7356 Fixed a compilation issue under AppleClang 2016-02-28 14:14:50 +01:00
Cedric Nugteren 9f682aa66b Set a proper default precision for the CLBlast clients 2016-02-20 14:41:53 +01:00
Cedric Nugteren 6dc44da07b Added support for xGERU and xGERC routines 2016-02-20 14:15:41 +01:00
Cedric Nugteren 8854a73127 Added XGER routine, kernel, and tuner 2016-02-20 12:40:01 +01:00
Cedric Nugteren c457a70aa1 Updated the changelog 2016-02-10 21:32:09 +01:00
CNugteren fadd76207f Fixed warnings under MSVC 2016-02-08 20:44:05 +01:00
Cedric Nugteren bf84463ab2 Separated the GEMM kernel in two parts to reduce string length for MSVC 2016-02-08 20:06:02 +01:00
Cedric Nugteren 38c56bbde2 Split-up the XGEMV kernel in two parts 2016-02-08 19:43:34 +01:00
Cedric Nugteren 6f4b34f813 Added tuning parameters for various devices using the new database script 2016-02-07 16:41:09 +01:00
Cedric Nugteren 165a94c200 Various fixes to the database script 2016-02-07 16:39:37 +01:00
Cedric Nugteren 00be6f7530 Added dictionary with short and long OpenCL vendor names to fix issues with Intel having multiple names 2016-02-07 11:59:30 +01:00
Cedric Nugteren c76f1d9dbb Made the tuning database an optional external download 2016-02-07 10:59:51 +01:00
CNugteren 704a729f5c Made the database script compatible with Python 3 2016-02-06 13:11:36 +01:00
CNugteren b7900652b2 Reduced the maximum workgroup-size for GEMV kernels further 2016-02-06 13:07:19 +01:00
Cedric Nugteren bb985f010b Changed the order of tuners in the alltuners target 2016-02-06 12:48:42 +01:00
CNugteren 40346bb3a5 Reduced unrolling factor in xgemv kernel to reduce compilation times 2016-02-06 12:09:21 +01:00
CNugteren fbf071ba62 Fixed a linker error in the performance client under GCC 2016-02-06 10:53:44 +01:00
CNugteren 9622d3be22 Fixes for compilation under Visual Studio 2016-01-30 14:57:49 +01:00
Cedric Nugteren 44fb40e5c4 Prepared for MSVC support 2016-01-30 11:54:29 +01:00
Cedric Nugteren f573fe6bb3 Fixed a bug in the graph scripts (thanks to Victor Pakhomov) 2016-01-30 11:53:54 +01:00
Cedric Nugteren 310d05d187 Updated to version 4.0 of the CLCudaAPI header 2016-01-30 11:52:21 +01:00
Cedric Nugteren deb58709c8 Merge branch 'tuning_database' into development
This merge is necessary,
2016-01-30 11:46:45 +01:00
Cedric Nugteren 276e772a2c Added first auto-generated database headers from the Python database; only K40 and Iris supported now 2016-01-30 11:43:21 +01:00
Cedric Nugteren 76c9148030 Minor improvements to the database script, including proper file paths 2016-01-24 17:56:27 +01:00
Cedric Nugteren f0b3091cdb Added Python function to compute defaults for a particular device/vendor combination 2016-01-24 17:35:31 +01:00
Cedric Nugteren e4e3663e61 Updated FindOpenCL for Intel Linux OpenCL paths 2016-01-23 16:09:07 +01:00
CNugteren 09c94b17cf Added tuning data for Tesla K40 2015-10-28 21:20:42 +01:00
CNugteren c0d469718a Now sets local memory size in xgemv tuner properly 2015-10-28 21:19:59 +01:00
CNugteren bb4e78f737 Added initial tuning database with Intel Iris data 2015-10-25 16:49:59 +01:00
CNugteren ccd1a5c7cc Updated tuning database script according to the new JSON format 2015-10-25 16:49:29 +01:00
CNugteren 179ad0666d Fixed an arguments-related bug in the GEMV tuner 2015-10-25 16:48:26 +01:00
CNugteren a2d5d7770e Moved the tuner database script to a separate folder 2015-10-25 16:27:14 +01:00
CNugteren 9bf6be8426 Added alpha and beta to tuner meta-data 2015-10-23 11:01:44 +02:00
CNugteren 3f616366bd Prepared the changelog for the next release 2015-10-17 15:57:04 +02:00
Cedric Nugteren 4678fd371d Merge pull request #29 from CNugteren/development
Update to version 0.5.0
2015-10-17 15:54:55 +02:00
CNugteren 92404035e8 Updated to version 0.5.0 2015-10-17 15:48:13 +02:00
CNugteren afb3e64fd3 Travis now also builds the development branch 2015-10-17 15:42:45 +02:00
Cedric Nugteren 653feca564 Merge pull request #28 from CNugteren/kernels_reorganization
Kernels re-organization level-3
2015-10-17 15:30:06 +02:00
CNugteren 0d4091fdfb Added guards for routine-specific level-3 pad kernels 2015-10-13 08:29:45 +02:00
CNugteren f74c9a5640 Routine names are now all default arguments defined in the header 2015-10-12 08:35:58 +02:00
CNugteren 54a8723f8c Moved level3 kernel files to a subfolder 2015-10-12 08:28:40 +02:00
Cedric Nugteren 92b4b0d1fe Merge pull request #27 from CNugteren/level2_matrix_vector
Added many level-2 matrix-vector routines
2015-09-26 17:02:34 +02:00
CNugteren 2b56c2c603 Added TRMV/TBMV/TPMV routines 2015-09-26 16:58:03 +02:00
CNugteren 04d28b0420 Made buffer copying a const-method for the source 2015-09-26 16:48:11 +02:00