| Author | Commit | Message | Date |
|---|---|---|---|
| Cedric Nugteren | 8854a73127 | Added XGER routine, kernel, and tuner | 2016-02-20 12:40:01 +01:00 |
| Cedric Nugteren | bf84463ab2 | Separated the GEMM kernel in two parts to reduce string length for MSVC | 2016-02-08 20:06:02 +01:00 |
| Cedric Nugteren | 38c56bbde2 | Split-up the XGEMV kernel in two parts | 2016-02-08 19:43:34 +01:00 |
| CNugteren | b7900652b2 | Reduced the maximum workgroup-size for GEMV kernels further | 2016-02-06 13:07:19 +01:00 |
| CNugteren | 40346bb3a5 | Reduced unrolling factor in xgemv kernel to reduce compilation times | 2016-02-06 12:09:21 +01:00 |
| CNugteren | c0d469718a | Now sets local memory size in xgemv tuner properly | 2015-10-28 21:19:59 +01:00 |
| CNugteren | 179ad0666d | Fixed an arguments-related bug in the GEMV tuner | 2015-10-25 16:48:26 +01:00 |
| CNugteren | 54a8723f8c | Moved level3 kernel files to a subfolder | 2015-10-12 08:28:40 +02:00 |
| CNugteren | 4507ba4997 | Added first version of banded matrix-vector multiplication | 2015-09-18 15:25:20 +02:00 |
| CNugteren | a2e726d3bd | Added xDOT/xDOTU/xDOTC dot-product routines | 2015-09-14 16:57:00 +02:00 |
| CNugteren | 2a383f3450 | Added extra temporary buffer to tuners in preparation of Xdot routines | 2015-09-14 15:53:34 +02:00 |
| CNugteren | 75517353d5 | Re-organized level1 xaxpy kernel | 2015-08-22 14:33:48 +02:00 |
| CNugteren | dbdb58c600 | Refactored the tuners, added JSON output | 2015-08-09 15:50:41 +02:00 |
| CNugteren | 4dcecfe934 | Added workgroup shuffle option to transpose kernel for AMD GPUs | 2015-07-22 07:31:16 +02:00 |
| CNugteren | 4e499a67c1 | The kernel source string is now a routine's member variable | 2015-07-19 13:44:37 +02:00 |
| CNugteren | 7e176ccac9 | Added support for conjugate transpose in GEMV | 2015-06-16 08:42:52 +02:00 |
| CNugteren | af78a04eca | Updated the tuners to set the conjugate argument | 2015-06-16 07:50:45 +02:00 |
| CNugteren | 294a3e3d41 | Split the three variations of the GEMV kernel for maximal tuning freedom | 2015-06-14 11:15:53 +02:00 |
| CNugteren | 4b3e3dcfe0 | Added a fast GEMV kernel with vector loads, no tail, and fewer if-statements | 2015-06-13 20:46:01 +02:00 |
| CNugteren | 9b66883e9c | Improved GEMV kernel with local memory and a tunable WPT | 2015-06-13 14:10:07 +02:00 |
| CNugteren | e522d1a74e | Added initial version of GEMV including tester and performance client | 2015-06-13 11:01:20 +02:00 |
| CNugteren | 85c1db9322 | Added initial naive version of Xgemv kernel | 2015-06-10 08:44:30 +02:00 |
| CNugteren | bc5a341dfe | Initial commit of preview version | 2015-05-30 12:30:43 +02:00 |