Cedric Nugteren
b114ea49a9
Added first naive version of the batched AXPY routine
2017-03-05 15:06:14 +01:00
Cedric Nugteren
3fc73851f7
Added proper support for the b_offset argument in TRSM
2017-03-01 21:23:33 +01:00
Cedric Nugteren
00281dad26
Fixed half-precision bugs in HTBMV/HTPMV/HTRMV/HSYR2K/HTRMM related to incorrect constants
2017-02-27 21:00:04 +01:00
Cedric Nugteren
e09c26c706
Split the GEMM kernel further up to prevent C1091 in MSVC
2017-02-26 15:03:12 +01:00
Cedric Nugteren
ea6790665d
Merge branch 'development' into triangular_solvers
2017-02-26 14:51:45 +01:00
Cedric Nugteren
df7638c305
Fixed an out-of-bounds memory access when filling a matrix with a constant
2017-02-26 14:31:05 +01:00
Cedric Nugteren
2f2a510c38
Implemented a simple row-major to col-major problem conversion for TRSM
2017-02-24 21:08:44 +01:00
Cedric Nugteren
1e5b5157bc
Fixed a few issues with the TRSM routine; some tests still failing
2017-02-22 20:31:33 +01:00
Cedric Nugteren
00eb55a2d4
Fixed a small bug in GEMV: unused kernel in parameter list
2017-02-13 20:48:32 +01:00
Cedric Nugteren
345a5feb9a
Split the database into several smaller cached per-kernel databases (in preparation of per-kernel database overrides)
2017-02-12 12:02:39 +01:00
Cedric Nugteren
c248f900c0
Merge branch 'development' into triangular_solvers
2017-02-05 22:18:59 +01:00
Cedric Nugteren
c209dd7af9
Improved substitution kernels a bit; added complex support
2017-02-04 22:48:06 +01:00
Cedric Nugteren
fec8c1a806
Completed a first STRSV implementation
2017-02-04 16:04:19 +01:00
Cedric Nugteren
a6ba6470aa
Added row-major support for TRSV
2017-02-04 14:25:27 +01:00
Cedric Nugteren
7c73ceb095
Added first (incomplete) version of TRSV routine
2017-01-29 17:02:00 +01:00
Ivan Shapovalov
5bcd92f297
Routine, Cache: generalize, reduce amount of copying in fast path
Implement a generalized Cache<K, V>. Two variants are provided: the
first one is based on std::map, using C++14-specific transparent
std::less<> and generalized std::map::find() to allow searching by tuple
of references. The second one is based on std::vector and O(n) lookup,
but remains C++11-compliant.
2017-01-24 11:56:15 +03:00
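The std::map variant described in commit 5bcd92f297 could look roughly like the following. This is only a minimal sketch under assumptions: the names CacheSketch, Store, Find and DemoCacheLookup are hypothetical and are not taken from the CLBlast sources. It shows how the transparent std::less<> comparator lets std::map::find() accept a tuple of references, so no owning key has to be built for a lookup.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical sketch of a generalized cache; not the actual CLBlast class.
template <typename Key, typename Value>
class CacheSketch {
 public:
  // Stores a value under an owning key (an existing entry is left unchanged).
  void Store(Key key, Value value) {
    entries_.emplace(std::move(key), std::move(value));
  }

  // Looks up by any type that is comparable with Key via operator<,
  // for example a tuple of references, so no owning Key is constructed.
  template <typename Lookup>
  const Value *Find(const Lookup &key) const {
    const auto it = entries_.find(key);  // C++14 heterogeneous find()
    return (it == entries_.end()) ? nullptr : &it->second;
  }

 private:
  // std::less<> (that is, std::less<void>) is the transparent comparator.
  std::map<Key, Value, std::less<>> entries_;
};

// Usage: the stored key owns a std::string, the lookup key only refers to one.
inline bool DemoCacheLookup() {
  using Key = std::tuple<std::string, int>;
  CacheSketch<Key, std::string> cache;
  cache.Store(Key{"Xgemm", 32}, "binary-blob");

  const std::string name = "Xgemm";
  const int precision = 32;
  const int other_precision = 64;
  const auto *hit = cache.Find(std::tie(name, precision));
  const auto *miss = cache.Find(std::tie(name, other_precision));
  assert(hit != nullptr);
  return *hit == "binary-blob" && miss == nullptr;
}
```

The std::vector variant mentioned in the same commit would trade this for a linear scan with an equality comparison, which avoids the C++14 requirement.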
Ivan Shapovalov
1a1e863ab3
treewide: include clpp11.hpp first to silence deprecation warnings
Otherwise, cl.h gets included through clblast.h before clpp11.hpp.
2017-01-20 17:32:42 +03:00
Cedric Nugteren
a5fd2323b6
Added prototype for the TRSV routine
2017-01-20 11:30:32 +01:00
Cedric Nugteren
df9a77d74d
Added first version of the TRSM routine based on the diagonal invert kernel
2017-01-18 21:29:59 +01:00
Cedric Nugteren
4b3ffd9989
Added a first version of the diagonal block invert routine in preparation of TRSM
2017-01-15 17:30:00 +01:00
Cedric Nugteren
681a465b35
Prepared for the addition of the TRSM triangular solver kernel
2016-12-18 12:30:16 +01:00
Cedric Nugteren
6b533dda1c
Fixed a bug when using offsets in the direct GEMM kernels
2016-12-18 11:54:32 +01:00
Cedric Nugteren
cb398f0e42
Merge pull request #125 from CNugteren/netlib_blas_api
Netlib CBLAS API for CLBlast
2016-11-24 19:35:59 +01:00
Cedric Nugteren
654b41bb2b
Fixed a bug in the HSCAL routine
2016-11-23 21:29:16 +01:00
Cedric Nugteren
d8af24e388
Now correctly tests for validity of the B matrix in the TRMM routine
2016-11-20 16:27:54 +01:00
Cedric Nugteren
2f0697564f
Fixed a bug in the TRMM routine caused by overwriting input data before consuming everything
2016-11-20 15:05:42 +01:00
Cedric Nugteren
76d5d2ccfc
Fixed a bug in the transpose-matrix function
2016-10-23 20:49:55 +02:00
Cedric Nugteren
280698d076
Merge pull request #117 from intelfx/exceptions
Convert to use C++ exceptions internally
2016-10-22 15:05:12 +02:00
Cedric Nugteren
db17b1fbe9
Fixed a bug in the SYRK/SYR2K/HERK/HER2K routines that would occur with specific tuning parameters
2016-10-22 10:41:02 +02:00
Ivan Shapovalov
56f300607b
Routine: get rid of ::SetUp()
Since we now use C++ exceptions inside the implementation (and exceptions
can be thrown from constructors), there is no need for a separate
Routine::SetUp() function.
For this, we also change the way the kernel source string is constructed.
The kernel-specific source code is now passed to the Routine ctor via
an initializer_list of C strings to avoid unnecessary data copying
while also working around C1091 of MSVC 2013.
2016-10-22 08:45:27 +03:00
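As an illustration of the constructor change described in commit 56f300607b, a routine class might accept its kernel source as several C strings and join them once. This is a sketch under assumptions: RoutineSketch, source() and DemoRoutineConstruction are hypothetical names, and the error handling shown is only one plausible use of throwing from a constructor.

```cpp
#include <cassert>
#include <initializer_list>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a routine base class; names are illustrative only.
class RoutineSketch {
 public:
  // Each element is a separate string literal, so no single literal needs to
  // exceed a compiler's string-length limit; the parts are joined once here.
  explicit RoutineSketch(std::initializer_list<const char *> source_parts) {
    for (const char *part : source_parts) { source_string_ += part; }
    // Failures are reported by throwing, so no separate SetUp() step is needed.
    if (source_string_.empty()) {
      throw std::invalid_argument("no kernel source given");
    }
  }

  const std::string &source() const { return source_string_; }

 private:
  std::string source_string_;
};

// Usage: build from two literals, then check that an empty list is rejected.
inline bool DemoRoutineConstruction() {
  const RoutineSketch routine{"__kernel void A() {}\n", "__kernel void B() {}\n"};
  assert(!routine.source().empty());

  bool threw = false;
  const std::initializer_list<const char *> no_parts = {};
  try {
    RoutineSketch empty_routine(no_parts);
  } catch (const std::invalid_argument &) {
    threw = true;
  }
  return threw &&
         routine.source() == "__kernel void A() {}\n__kernel void B() {}\n";
}
```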
Ivan Shapovalov
b98af44fcf
treewide: use C++ exceptions properly
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.
Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.
However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
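The top-level translation described in commit b98af44fcf can be sketched as follows. Everything here is an assumption made for illustration: StatusCode, LibraryError, CallCpp and CallC are hypothetical names, not the CLBlast API. The C++ boundary converts only the library's own exceptions into error codes, while the C boundary additionally catches everything else and maps it to a wild-card code.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical status codes; the real library defines its own set.
enum class StatusCode { kSuccess = 0, kInvalidArgument = -1, kUnexpectedError = -2 };

// Hypothetical library exception that carries a status code.
class LibraryError : public std::exception {
 public:
  explicit LibraryError(StatusCode status) : status_(status) {}
  StatusCode status() const noexcept { return status_; }
  const char *what() const noexcept override { return "library error"; }

 private:
  StatusCode status_;
};

// C++ API boundary: translate only the library's own exceptions.
// Other exceptions (bad_alloc, logic errors) deliberately propagate.
template <typename Body>
StatusCode CallCpp(Body &&body) {
  try {
    body();
    return StatusCode::kSuccess;
  } catch (const LibraryError &e) {
    return e.status();
  }
}

// C API boundary: nothing may escape, so all remaining exceptions are
// converted into a single wild-card error code.
template <typename Body>
int CallC(Body &&body) noexcept {
  try {
    return static_cast<int>(CallCpp(body));
  } catch (...) {
    return static_cast<int>(StatusCode::kUnexpectedError);
  }
}

// Usage: one successful call, one library failure, one unexpected failure.
inline bool DemoErrorTranslation() {
  const auto ok = CallCpp([] {});
  const auto bad = CallCpp([] { throw LibraryError(StatusCode::kInvalidArgument); });
  const int wild = CallC([] { throw std::logic_error("bug"); });
  assert(ok == StatusCode::kSuccess);
  return bad == StatusCode::kInvalidArgument &&
         wild == static_cast<int>(StatusCode::kUnexpectedError);
}
```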
Cedric Nugteren
de77f00e8c
Fixed an issue with the length of the GEMM OpenCL string for both MSVC 2013 and 2015
2016-10-10 22:23:33 +02:00
Cedric Nugteren
a3e67f2be2
Added a kernel selection database to select between the direct and indirect GEMM kernels
2016-10-06 19:51:12 +02:00
Cedric Nugteren
c1c4bc5d20
Re-organised GEMM direct kernel and added faster fall-back version for incomplete rectangles
2016-10-03 19:32:01 +02:00
Cedric Nugteren
d8827e908c
Specialised the GEMM direct kernel in four ways for transposing/non-transposing: NN, NT, TN, TT
2016-10-02 17:59:05 +02:00
Cedric Nugteren
61f489e370
Split the GEMM direct kernel into two files; set the default tuning target to 256-256-256
2016-10-02 15:06:59 +02:00
Cedric Nugteren
73d135c2ce
Added a first version of a tuner for the GEMM direct kernel; collapsed MWGD, NWGD and KWGD into one WGD parameter
2016-09-25 14:48:34 +02:00
Cedric Nugteren
669f43aed6
Separated the tuning parameters of the new direct GEMM kernel from the indirect version
2016-09-25 13:52:08 +02:00
Cedric Nugteren
6aa652d6ea
Merge branch 'development' into gemm_direct
2016-09-21 21:32:18 +02:00
Cedric Nugteren
4ce584a014
Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC can't handle long strings
2016-09-12 22:13:16 +02:00
Cedric Nugteren
5004a435ff
Fixed issues related to the recent changes in the Xgemm infrastructure
2016-07-26 20:59:59 +02:00
Cedric Nugteren
5053f6ebc6
Merge branch 'development' into gemm_direct
2016-07-26 20:53:31 +02:00
Cedric Nugteren
2582f0290a
Moved the XgemvFast and XgemvFastRot tuning database into a separate file
2016-07-25 22:43:49 +02:00
Cedric Nugteren
0252df731a
Merge branch 'development' into gemv_performance
2016-07-24 17:06:27 +02:00
Cedric Nugteren
75fe8235f7
Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enabling better memory performance
2016-07-23 10:20:11 +02:00
Ivan Shapovalov
ae3299da30
clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS
2016-07-22 11:15:52 +03:00
Ivan Shapovalov
2dd5ee3f75
clblast::RunKernel, cl::Kernel: take const vector as waitForEvents
2016-07-22 11:15:52 +03:00
Ivan Shapovalov
1ae71614ac
xgemm: do not hardcode kernel requirements for internal matrix layout
Do not hardcode the knowledge about "A and C col-major, B row-major".
This allows for easier reuse of the DoGemm() routine with different
kernels.
2016-07-22 11:15:52 +03:00
Cedric Nugteren
798d32edad
Improved the GEMM direct kernel by adding register blocking. Still not fast though
2016-07-17 14:36:51 +02:00
Cedric Nugteren
eaa348735e
Created infrastructure to support a direct GEMM kernel; added correct but slow reference kernel as a place-holder
2016-07-16 15:18:28 +02:00
Cedric Nugteren
066af4069b
Removed an unused variable from the copy-transpose-pad function
2016-07-16 10:56:37 +02:00
Cedric Nugteren
c87e877bf2
Now passing alpha/beta to the kernel as arguments again, as was done before fp16 support; for fp16, the arguments are cast on the host and in the kernel
2016-07-10 20:32:01 +02:00
Cedric Nugteren
27854070b4
Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen
2016-07-06 21:50:12 +02:00
Cedric Nugteren
76b20cfe0c
Fixes for the AppVeyor Windows build
2016-06-27 14:44:08 +02:00
Cedric Nugteren
61203453aa
Renamed all C++ source files to .cpp to match the .hpp extension better
2016-06-19 13:55:49 +02:00
Cedric Nugteren
f726fbdc9f
Moved all headers into the source tree, changed headers to .hpp extension
2016-06-18 20:20:13 +02:00
Cedric Nugteren
bacb5d2bb2
Clean-up of the routine class, moved RunKernel to the routine/common file
2016-06-18 18:16:14 +02:00
Cedric Nugteren
7b4c0e1cf0
Removed the template from the Routine base-class
2016-06-18 14:56:55 +02:00
Cedric Nugteren
f9947b4d7f
Removed the precision argument from the routines in favor of a single templated function
2016-06-17 14:30:37 +02:00
Cedric Nugteren
536b7fe4bc
Removed the interface to the cache functions from the Routine class, calls them directly now
2016-06-17 13:57:50 +02:00
Cedric Nugteren
98a95c89fc
Moved the RunKernel and PadCopyTransposeMatrix functions out of the Routine class
2016-06-17 12:32:06 +02:00
Cedric Nugteren
afe8852eaa
Moved the test-for-valid-buffers function from the Routine class to separate functions in a separate file
2016-06-17 11:29:07 +02:00
Cedric Nugteren
52ccaf5b25
Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing
2016-06-16 18:07:46 +02:00
Cedric Nugteren
39b7dbc5e3
Added some constness to variables related to the GEMM routines
2016-06-15 12:34:05 +02:00
Cedric Nugteren
b894611ad1
Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) and renamed files and functions appropriately
2016-06-14 18:17:58 +02:00
Cedric Nugteren
9f87455070
Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM
2016-05-25 13:29:53 +02:00
Cedric Nugteren
3e9a07f00a
Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2
2016-05-22 16:59:14 +02:00
Cedric Nugteren
c8ff3f143f
Prepared the GER kernels and tuner for half-precision support
2016-05-22 16:18:08 +02:00
Cedric Nugteren
95b828da12
Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV
2016-05-22 15:38:26 +02:00
Cedric Nugteren
88551b4005
Prepared the GEMV kernels and tuner for half-precision support
2016-05-22 15:22:54 +02:00
Cedric Nugteren
f70ded34f3
Added half-precision support for all level 1 routines
2016-05-22 14:26:19 +02:00
Cedric Nugteren
489c5d76cf
Merged in latest changes from 0.7.1 release
2016-05-18 21:32:56 +02:00
Cedric Nugteren
af2ac62212
Prepared GEMM and supporting kernels and tuners for half-precision support
2016-05-16 12:37:24 +02:00
Cedric Nugteren
5e1b2e021f
Set kernel arguments for AXPY as constant memory buffers, making it possible to transfer half-precision values as well
2016-05-14 18:06:00 +02:00
Cedric Nugteren
120c31a30f
Initial experimental version of the half-precision HAXPY routine
2016-05-13 20:49:34 +02:00
Cedric Nugteren
bee2f943ec
Changed the index buffer of IxAMAX routines to unsigned int for proper buffersize checking
2016-05-01 14:03:37 +02:00
Cedric Nugteren
d9b21d7f49
Fixed the cache to store binaries instead of OpenCL programs
2016-04-28 21:14:17 +02:00
cnugteren
16a048f1ac
Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines
2016-04-20 22:12:51 -06:00
cnugteren
8be99de82d
Added support for the SASUM/DASUM/ScASUM/DzASUM routines
2016-04-14 19:58:26 -06:00
cnugteren
1d3d38a261
Events are now properly implemented using an event waiting list; the user is asked to wait for event completion
2016-04-09 22:22:24 -06:00
cnugteren
90e237b97a
Removed redundant queue synchronisation statements
2016-04-04 08:38:31 -07:00
Cedric Nugteren
aaa687ca98
Added preliminary support for the xNRM2 routines
2016-03-28 23:00:44 +02:00
Cedric Nugteren
f4c09220c1
Fixed a bug in the GER-family of routines due to incorrect division of the workgroup size
2016-03-06 16:43:28 +01:00
Cedric Nugteren
306bf67660
Added preliminary support for xHPR2 and xSPR2 routines
2016-03-06 15:48:11 +01:00
Cedric Nugteren
60da54da5d
Added preliminary support for xHER2 and xSYR2 routines
2016-03-02 21:18:01 +01:00
Cedric Nugteren
4a56822dcc
Fixed a couple of correctness bugs in the Xher kernels
2016-02-28 15:49:59 +01:00
Cedric Nugteren
e3545215a5
Added support for xHER, xHPR, xSYR, and xSPR routines
2016-02-28 14:16:48 +01:00
Cedric Nugteren
6dc44da07b
Added support for xGERU and xGERC routines
2016-02-20 14:15:41 +01:00
Cedric Nugteren
8854a73127
Added XGER routine, kernel, and tuner
2016-02-20 12:40:01 +01:00
Cedric Nugteren
bf84463ab2
Separated the GEMM kernel in two parts to reduce string length for MSVC
2016-02-08 20:06:02 +01:00
Cedric Nugteren
38c56bbde2
Split-up the XGEMV kernel in two parts
2016-02-08 19:43:34 +01:00
Cedric Nugteren
276e772a2c
Added first auto-generated database headers from the Python database; only K40 and Iris supported now
2016-01-30 11:43:21 +01:00
CNugteren
f74c9a5640
Routine names are now all default arguments defined in the header
2015-10-12 08:35:58 +02:00
CNugteren
54a8723f8c
Moved level3 kernel files to a subfolder
2015-10-12 08:28:40 +02:00
CNugteren
2b56c2c603
Added TRMV/TBMV/TPMV routines
2015-09-26 16:58:03 +02:00
CNugteren
de6547a92b
Added SBMV and SPMV routines
2015-09-19 18:01:19 +02:00
CNugteren
80da67d28b
Added the HPMV routine
2015-09-19 17:40:38 +02:00
CNugteren
aebd156869
Added the HBMV routine
2015-09-19 11:11:34 +02:00
CNugteren
93dddda63e
Improved the organization and performance of level 2 routines
2015-09-18 17:46:41 +02:00
CNugteren
4507ba4997
Added first version of banded matrix-vector multiplication
2015-09-18 15:25:20 +02:00
CNugteren
a2e726d3bd
Added xDOT/xDOTU/xDOTC dot-product routines
2015-09-14 16:57:00 +02:00
CNugteren
ff0c54c386
Added the XSWAP, XSCAL and XCOPY level-1 routines
2015-08-22 17:11:20 +02:00
CNugteren
75517353d5
Re-organized level1 xaxpy kernel
2015-08-22 14:33:48 +02:00
CNugteren
75b4d92ac3
Added distinguished names for GEMV inherited HEMV/SYMV
2015-08-04 08:15:39 +02:00
CNugteren
938ca2707f
Added HEMV routine
2015-07-31 17:35:42 +02:00
CNugteren
b89517a2e7
Added SYMV routine
2015-07-31 17:13:41 +02:00
CNugteren
f7199b831f
Now using the new Claduc C++11 OpenCL header
2015-07-27 07:18:06 +02:00
CNugteren
48e2e96f1b
Kernel caching is now based on a routine's name
2015-07-19 16:24:14 +02:00
CNugteren
4e499a67c1
The kernel source string is now a routine's member variable
2015-07-19 13:44:37 +02:00
CNugteren
b526623fc7
Skips pre/post processing kernels if not needed
2015-07-15 22:12:38 +02:00
CNugteren
0dc85845f7
Updated interface of the PadCopyTransposeMatrix method
2015-07-13 08:41:26 +02:00
CNugteren
aa852bbe67
Added subfolders for the level1/2/3 routines
2015-07-12 16:57:09 +02:00
CNugteren
b5d39d9d0c
Added the HEMM routine, tester, and client
2015-07-12 15:11:50 +02:00
CNugteren
b02876d6e9
Added the HER2K routine, tester, and client
2015-07-10 20:59:20 +02:00
CNugteren
919bba3eaf
Added the HERK routine, tester, and client
2015-07-10 07:19:59 +02:00
CNugteren
5578d5ab28
Added option to set the imaginary part of the diagonal to zero
2015-07-08 07:25:18 +02:00
CNugteren
d9ea0c47c6
Added the TRMM routine, tester, and client
2015-07-02 07:16:04 +02:00
CNugteren
b8d81a60d6
Fixed typos in SYMM
2015-07-01 09:38:04 +02:00
CNugteren
7c8d16147a
Added the SYR2K routine, tester, and client
2015-06-26 08:12:56 +02:00
CNugteren
57c705dbf2
Clarified comment
2015-06-25 20:38:34 +02:00
CNugteren
60a88aac86
Added the SYRK routine, tester, and client
2015-06-24 07:50:18 +02:00
CNugteren
20eb3506d6
Added a condition to update only lower/upper triangular parts in the un-pad kernels
2015-06-23 08:09:07 +02:00
CNugteren
682c01a80c
Now returns program from database by reference
2015-06-18 18:44:14 +02:00
CNugteren
7e176ccac9
Added support for conjugate transpose in GEMV
2015-06-16 08:42:52 +02:00
CNugteren
8f01c644b5
Added support for complex conjugate transpose
2015-06-16 07:43:19 +02:00
CNugteren
294a3e3d41
Split the three variations of the GEMV kernel for maximal tuning freedom
2015-06-14 11:15:53 +02:00
CNugteren
ab0064dab7
Fixed number of threads launched for GEMV
2015-06-14 10:08:56 +02:00
CNugteren
9aa2989447
Fixed number of threads launched for AXPY
2015-06-14 10:08:23 +02:00
CNugteren
4b3e3dcfe0
Added a fast GEMV kernel with vector loads, no tail, and fewer if-statements
2015-06-13 20:46:01 +02:00
CNugteren
e522d1a74e
Added initial version of GEMV including tester and performance client
2015-06-13 11:01:20 +02:00
CNugteren
bc5a341dfe
Initial commit of preview version
2015-05-30 12:30:43 +02:00