Cedric Nugteren | 7468e2ba9d | Adjusted the correctness-test error margins | 2016-03-06 16:32:38 +01:00
Cedric Nugteren | c93cd2fc2d | Merge branch 'rank2_update_routines' into development | 2016-03-06 15:48:51 +01:00
Cedric Nugteren | 306bf67660 | Added preliminary support for xHPR2 and xSPR2 routines | 2016-03-06 15:48:11 +01:00
Cedric Nugteren | 60da54da5d | Added preliminary support for xHER2 and xSYR2 routines | 2016-03-02 21:18:01 +01:00
Cedric Nugteren | fa79720557 | Added tuning results for Intel Iris Pro and AMD R9 M370X | 2016-02-28 16:47:52 +01:00
Cedric Nugteren | 3c27edb087 | Updated the changelog with newly supported level-2 routines | 2016-02-28 16:37:49 +01:00
Cedric Nugteren | 610a31283b | Merge branch 'ger_routines' into development | 2016-02-28 16:31:31 +01:00
Cedric Nugteren | 4a56822dcc | Fixed a couple of correctness bugs in the Xher kernels | 2016-02-28 15:49:59 +01:00
Cedric Nugteren | e3545215a5 | Added support for xHER, xHPR, xSYR, and xSPR routines | 2016-02-28 14:16:48 +01:00
Cedric Nugteren | cef78c7356 | Fixed a compilation issue under AppleClang | 2016-02-28 14:14:50 +01:00
Cedric Nugteren | 9f682aa66b | Set a proper default precision for the CLBlast clients | 2016-02-20 14:41:53 +01:00
Cedric Nugteren | 6dc44da07b | Added support for xGERU and xGERC routines | 2016-02-20 14:15:41 +01:00
Cedric Nugteren | 8854a73127 | Added XGER routine, kernel, and tuner | 2016-02-20 12:40:01 +01:00
Cedric Nugteren | c457a70aa1 | Updated the changelog | 2016-02-10 21:32:09 +01:00
CNugteren | fadd76207f | Fixed warnings under MSVC | 2016-02-08 20:44:05 +01:00
Cedric Nugteren | bf84463ab2 | Separated the GEMM kernel in two parts to reduce string length for MSVC | 2016-02-08 20:06:02 +01:00
Cedric Nugteren | 38c56bbde2 | Split up the XGEMV kernel in two parts | 2016-02-08 19:43:34 +01:00
Cedric Nugteren | 6f4b34f813 | Added tuning parameters for various devices using the new database script | 2016-02-07 16:41:09 +01:00
Cedric Nugteren | 165a94c200 | Various fixes to the database script | 2016-02-07 16:39:37 +01:00
Cedric Nugteren | 00be6f7530 | Added dictionary with short and long OpenCL vendor names to fix issues with Intel having multiple names | 2016-02-07 11:59:30 +01:00
Cedric Nugteren | c76f1d9dbb | Made the tuning database an optional external download | 2016-02-07 10:59:51 +01:00
CNugteren | 704a729f5c | Made the database script compatible with Python 3 | 2016-02-06 13:11:36 +01:00
CNugteren | b7900652b2 | Reduced the maximum workgroup-size for GEMV kernels further | 2016-02-06 13:07:19 +01:00
Cedric Nugteren | bb985f010b | Changed the order of tuners in the alltuners target | 2016-02-06 12:48:42 +01:00
CNugteren | 40346bb3a5 | Reduced unrolling factor in xgemv kernel to reduce compilation times | 2016-02-06 12:09:21 +01:00
CNugteren | fbf071ba62 | Fixed a linker error in the performance client under GCC | 2016-02-06 10:53:44 +01:00
CNugteren | 9622d3be22 | Fixes for compilation under Visual Studio | 2016-01-30 14:57:49 +01:00
Cedric Nugteren | 44fb40e5c4 | Prepared for MSVC support | 2016-01-30 11:54:29 +01:00
Cedric Nugteren | f573fe6bb3 | Fixed a bug in the graph scripts (thanks to Victor Pakhomov) | 2016-01-30 11:53:54 +01:00
Cedric Nugteren | 310d05d187 | Updated to version 4.0 of the CLCudaAPI header | 2016-01-30 11:52:21 +01:00
Cedric Nugteren | deb58709c8 | Merge branch 'tuning_database' into development | 2016-01-30 11:46:45 +01:00
Cedric Nugteren | 276e772a2c | Added first auto-generated database headers from the Python database; only K40 and Iris supported now | 2016-01-30 11:43:21 +01:00
Cedric Nugteren | 76c9148030 | Minor improvements to the database script, including proper file paths | 2016-01-24 17:56:27 +01:00
Cedric Nugteren | f0b3091cdb | Added Python function to compute defaults for a particular device/vendor combination | 2016-01-24 17:35:31 +01:00
Cedric Nugteren | e4e3663e61 | Updated FindOpenCL for Intel Linux OpenCL paths | 2016-01-23 16:09:07 +01:00
CNugteren | 09c94b17cf | Added tuning data for Tesla K40 | 2015-10-28 21:20:42 +01:00
CNugteren | c0d469718a | Now sets local memory size in xgemv tuner properly | 2015-10-28 21:19:59 +01:00
CNugteren | bb4e78f737 | Added initial tuning database with Intel Iris data | 2015-10-25 16:49:59 +01:00
CNugteren | ccd1a5c7cc | Updated tuning database script according to the new JSON format | 2015-10-25 16:49:29 +01:00
CNugteren | 179ad0666d | Fixed an arguments-related bug in the GEMV tuner | 2015-10-25 16:48:26 +01:00
CNugteren | a2d5d7770e | Moved the tuner database script to a separate folder | 2015-10-25 16:27:14 +01:00
CNugteren | 9bf6be8426 | Added alpha and beta to tuner meta-data | 2015-10-23 11:01:44 +02:00
CNugteren | 3f616366bd | Prepared the changelog for the next release | 2015-10-17 15:57:04 +02:00
CNugteren | 92404035e8 | Updated to version 0.5.0 | 2015-10-17 15:48:13 +02:00
CNugteren | afb3e64fd3 | Travis now also builds the development branch | 2015-10-17 15:42:45 +02:00
Cedric Nugteren | 653feca564 | Merge pull request #28 from CNugteren/kernels_reorganization (Kernels re-organization level-3) | 2015-10-17 15:30:06 +02:00
CNugteren | 0d4091fdfb | Added guards for routine-specific level-3 pad kernels | 2015-10-13 08:29:45 +02:00
CNugteren | f74c9a5640 | Routine names are now all default arguments defined in the header | 2015-10-12 08:35:58 +02:00
CNugteren | 54a8723f8c | Moved level3 kernel files to a subfolder | 2015-10-12 08:28:40 +02:00
Cedric Nugteren | 92b4b0d1fe | Merge pull request #27 from CNugteren/level2_matrix_vector (Added many level-2 matrix-vector routines) | 2015-09-26 17:02:34 +02:00