Cedric Nugteren
|
066af4069b
|
Removed an unused variable from the copy-transpose-pad function
|
2016-07-16 10:56:37 +02:00 |
Cedric Nugteren
|
c87e877bf2
|
Now passing alpha/beta to the kernel as arguments, as before fp16 support; in the case of fp16, arguments are cast on the host and in the kernel
|
2016-07-10 20:32:01 +02:00 |
Cedric Nugteren
|
27854070b4
|
Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen
|
2016-07-06 21:50:12 +02:00 |
Cedric Nugteren
|
76b20cfe0c
|
Fixes for the AppVeyor Windows build
|
2016-06-27 14:44:08 +02:00 |
Cedric Nugteren
|
61203453aa
|
Renamed all C++ source files to .cpp to match the .hpp extension better
|
2016-06-19 13:55:49 +02:00 |
Cedric Nugteren
|
f726fbdc9f
|
Moved all headers into the source tree, changed headers to .hpp extension
|
2016-06-18 20:20:13 +02:00 |
Cedric Nugteren
|
bacb5d2bb2
|
Clean-up of the routine class, moved RunKernel to the routine/common file
|
2016-06-18 18:16:14 +02:00 |
Cedric Nugteren
|
7b4c0e1cf0
|
Removed the template from the Routine base-class
|
2016-06-18 14:56:55 +02:00 |
Cedric Nugteren
|
f9947b4d7f
|
Removed the precision argument from the routines in favor of a single templated function
|
2016-06-17 14:30:37 +02:00 |
Cedric Nugteren
|
536b7fe4bc
|
Removed the interface to the cache functions from the Routine class, calls them directly now
|
2016-06-17 13:57:50 +02:00 |
Cedric Nugteren
|
98a95c89fc
|
Moved the RunKernel and PadCopyTransposeMatrix functions out of the Routine class
|
2016-06-17 12:32:06 +02:00 |
Cedric Nugteren
|
afe8852eaa
|
Moved the test-for-valid-buffers function from the Routine class to separate functions in a separate file
|
2016-06-17 11:29:07 +02:00 |
Cedric Nugteren
|
52ccaf5b25
|
Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing
|
2016-06-16 18:07:46 +02:00 |
Cedric Nugteren
|
39b7dbc5e3
|
Added some constness to variables related to the GEMM routines
|
2016-06-15 12:34:05 +02:00 |
Cedric Nugteren
|
b894611ad1
|
Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) and renamed files and functions appropriately
|
2016-06-14 18:17:58 +02:00 |
Cedric Nugteren
|
9f87455070
|
Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM
|
2016-05-25 13:29:53 +02:00 |
Cedric Nugteren
|
3e9a07f00a
|
Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2
|
2016-05-22 16:59:14 +02:00 |
Cedric Nugteren
|
c8ff3f143f
|
Prepared the GER kernels and tuner for half-precision support
|
2016-05-22 16:18:08 +02:00 |
Cedric Nugteren
|
95b828da12
|
Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV
|
2016-05-22 15:38:26 +02:00 |
Cedric Nugteren
|
88551b4005
|
Prepared the GEMV kernels and tuner for half-precision support
|
2016-05-22 15:22:54 +02:00 |
Cedric Nugteren
|
f70ded34f3
|
Added half-precision support for all level 1 routines
|
2016-05-22 14:26:19 +02:00 |
Cedric Nugteren
|
489c5d76cf
|
Merged in latest changes from 0.7.1 release
|
2016-05-18 21:32:56 +02:00 |
Cedric Nugteren
|
af2ac62212
|
Prepared GEMM and supporting kernels and tuners for half-precision support
|
2016-05-16 12:37:24 +02:00 |
Cedric Nugteren
|
5e1b2e021f
|
Set kernel arguments for AXPY as constant memory buffers, making it possible to transfer half-precision values as well
|
2016-05-14 18:06:00 +02:00 |
Cedric Nugteren
|
120c31a30f
|
Initial experimental version of the half-precision HAXPY routine
|
2016-05-13 20:49:34 +02:00 |
Cedric Nugteren
|
bee2f943ec
|
Changed the index buffer of IxAMAX routines to unsigned int for proper buffer-size checking
|
2016-05-01 14:03:37 +02:00 |
Cedric Nugteren
|
d9b21d7f49
|
Fixed the cache to store binaries instead of OpenCL programs
|
2016-04-28 21:14:17 +02:00 |
cnugteren
|
16a048f1ac
|
Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines
|
2016-04-20 22:12:51 -06:00 |
cnugteren
|
8be99de82d
|
Added support for the SASUM/DASUM/ScASUM/DzASUM routines
|
2016-04-14 19:58:26 -06:00 |
cnugteren
|
1d3d38a261
|
Events are now properly implemented using an event waiting list and asking the user to wait for event completion
|
2016-04-09 22:22:24 -06:00 |
cnugteren
|
90e237b97a
|
Removed redundant queue synchronisation statements
|
2016-04-04 08:38:31 -07:00 |
Cedric Nugteren
|
aaa687ca98
|
Added preliminary support for the xNRM2 routines
|
2016-03-28 23:00:44 +02:00 |
Cedric Nugteren
|
f4c09220c1
|
Fixed a bug in the GER-family of routines due to incorrect division of the workgroup size
|
2016-03-06 16:43:28 +01:00 |
Cedric Nugteren
|
306bf67660
|
Added preliminary support for xHPR2 and xSPR2 routines
|
2016-03-06 15:48:11 +01:00 |
Cedric Nugteren
|
60da54da5d
|
Added preliminary support for xHER2 and xSYR2 routines
|
2016-03-02 21:18:01 +01:00 |
Cedric Nugteren
|
4a56822dcc
|
Fixed a couple of correctness bugs in the Xher kernels
|
2016-02-28 15:49:59 +01:00 |
Cedric Nugteren
|
e3545215a5
|
Added support for xHER, xHPR, xSYR, and xSPR routines
|
2016-02-28 14:16:48 +01:00 |
Cedric Nugteren
|
6dc44da07b
|
Added support for xGERU and xGERC routines
|
2016-02-20 14:15:41 +01:00 |
Cedric Nugteren
|
8854a73127
|
Added XGER routine, kernel, and tuner
|
2016-02-20 12:40:01 +01:00 |
Cedric Nugteren
|
bf84463ab2
|
Separated the GEMM kernel in two parts to reduce string length for MSVC
|
2016-02-08 20:06:02 +01:00 |
Cedric Nugteren
|
38c56bbde2
|
Split up the XGEMV kernel in two parts
|
2016-02-08 19:43:34 +01:00 |
Cedric Nugteren
|
276e772a2c
|
Added first auto-generated database headers from the Python database; only K40 and Iris supported now
|
2016-01-30 11:43:21 +01:00 |
CNugteren
|
f74c9a5640
|
Routine names are now all default arguments defined in the header
|
2015-10-12 08:35:58 +02:00 |
CNugteren
|
54a8723f8c
|
Moved level3 kernel files to a subfolder
|
2015-10-12 08:28:40 +02:00 |
CNugteren
|
2b56c2c603
|
Added TRMV/TBMV/TPMV routines
|
2015-09-26 16:58:03 +02:00 |
CNugteren
|
de6547a92b
|
Added SBMV and SPMV routines
|
2015-09-19 18:01:19 +02:00 |
CNugteren
|
80da67d28b
|
Added the HPMV routine
|
2015-09-19 17:40:38 +02:00 |
CNugteren
|
aebd156869
|
Added the HBMV routine
|
2015-09-19 11:11:34 +02:00 |
CNugteren
|
93dddda63e
|
Improved the organization and performance of level 2 routines
|
2015-09-18 17:46:41 +02:00 |
CNugteren
|
4507ba4997
|
Added first version of banded matrix-vector multiplication
|
2015-09-18 15:25:20 +02:00 |
CNugteren
|
a2e726d3bd
|
Added xDOT/xDOTU/xDOTC dot-product routines
|
2015-09-14 16:57:00 +02:00 |
CNugteren
|
ff0c54c386
|
Added the XSWAP, XSCAL and XCOPY level-1 routines
|
2015-08-22 17:11:20 +02:00 |
CNugteren
|
75517353d5
|
Re-organized level1 xaxpy kernel
|
2015-08-22 14:33:48 +02:00 |
CNugteren
|
75b4d92ac3
|
Added distinguished names for GEMV inherited HEMV/SYMV
|
2015-08-04 08:15:39 +02:00 |
CNugteren
|
938ca2707f
|
Added HEMV routine
|
2015-07-31 17:35:42 +02:00 |
CNugteren
|
b89517a2e7
|
Added SYMV routine
|
2015-07-31 17:13:41 +02:00 |
CNugteren
|
f7199b831f
|
Now using the new Claduc C++11 OpenCL header
|
2015-07-27 07:18:06 +02:00 |
CNugteren
|
48e2e96f1b
|
Kernel caching is now based on a routine's name
|
2015-07-19 16:24:14 +02:00 |
CNugteren
|
4e499a67c1
|
The kernel source string is now a routine's member variable
|
2015-07-19 13:44:37 +02:00 |
CNugteren
|
b526623fc7
|
Skips pre/post processing kernels if not needed
|
2015-07-15 22:12:38 +02:00 |
CNugteren
|
0dc85845f7
|
Updated interface of the PadCopyTransposeMatrix method
|
2015-07-13 08:41:26 +02:00 |
CNugteren
|
aa852bbe67
|
Added subfolders for the level1/2/3 routines
|
2015-07-12 16:57:09 +02:00 |
CNugteren
|
b5d39d9d0c
|
Added the HEMM routine, tester, and client
|
2015-07-12 15:11:50 +02:00 |
CNugteren
|
b02876d6e9
|
Added the HER2K routine, tester, and client
|
2015-07-10 20:59:20 +02:00 |
CNugteren
|
919bba3eaf
|
Added the HERK routine, tester, and client
|
2015-07-10 07:19:59 +02:00 |
CNugteren
|
5578d5ab28
|
Added option to set the imaginary part of the diagonal to zero
|
2015-07-08 07:25:18 +02:00 |
CNugteren
|
d9ea0c47c6
|
Added the TRMM routine, tester, and client
|
2015-07-02 07:16:04 +02:00 |
CNugteren
|
b8d81a60d6
|
Fixed typos in SYMM
|
2015-07-01 09:38:04 +02:00 |
CNugteren
|
7c8d16147a
|
Added the SYR2K routine, tester, and client
|
2015-06-26 08:12:56 +02:00 |
CNugteren
|
57c705dbf2
|
Clarified comment
|
2015-06-25 20:38:34 +02:00 |
CNugteren
|
60a88aac86
|
Added the SYRK routine, tester, and client
|
2015-06-24 07:50:18 +02:00 |
CNugteren
|
20eb3506d6
|
Added a condition to update only lower/upper triangular parts in the un-pad kernels
|
2015-06-23 08:09:07 +02:00 |
CNugteren
|
682c01a80c
|
Now returns program from database by reference
|
2015-06-18 18:44:14 +02:00 |
CNugteren
|
7e176ccac9
|
Added support for conjugate transpose in GEMV
|
2015-06-16 08:42:52 +02:00 |
CNugteren
|
8f01c644b5
|
Added support for complex conjugate transpose
|
2015-06-16 07:43:19 +02:00 |
CNugteren
|
294a3e3d41
|
Split the three variations of the GEMV kernel for maximal tuning freedom
|
2015-06-14 11:15:53 +02:00 |
CNugteren
|
ab0064dab7
|
Fixed number of threads launched for GEMV
|
2015-06-14 10:08:56 +02:00 |
CNugteren
|
9aa2989447
|
Fixed number of threads launched for AXPY
|
2015-06-14 10:08:23 +02:00 |
CNugteren
|
4b3e3dcfe0
|
Added a fast GEMV kernel with vector loads, no tail, and fewer if-statements
|
2015-06-13 20:46:01 +02:00 |
CNugteren
|
e522d1a74e
|
Added initial version of GEMV including tester and performance client
|
2015-06-13 11:01:20 +02:00 |
CNugteren
|
bc5a341dfe
|
Initial commit of preview version
|
2015-05-30 12:30:43 +02:00 |