Cedric Nugteren
080e1be684
Improved the default parameters for cases with non-common parameters across all devices
2016-11-26 16:38:17 +01:00
Cedric Nugteren
cb398f0e42
Merge pull request #125 from CNugteren/netlib_blas_api
Netlib CBLAS API for CLBlast
2016-11-24 19:35:59 +01:00
Cedric Nugteren
792cc8359f
Fixed a vector-size related bug in the CLBlast Netlib API
2016-11-23 22:00:20 +01:00
Cedric Nugteren
654b41bb2b
Fixed a bug in the HSCAL routine
2016-11-23 21:29:16 +01:00
Cedric Nugteren
26ca071480
Minor changes to ensure full compatibility with the Netlib CBLAS API
2016-11-22 08:41:52 +01:00
Cedric Nugteren
eefe0df435
Made functions with scalar-buffers as output properly return values
2016-11-20 21:36:57 +01:00
Cedric Nugteren
d8af24e388
Now correctly tests for validity of the B matrix in the TRMM routine
2016-11-20 16:27:54 +01:00
Cedric Nugteren
90eb8738c4
Forced OpenCL 1.1 compilation and disabled a deprecation warning
2016-11-20 16:27:02 +01:00
Cedric Nugteren
2f0697564f
Fixed a bug in the TRMM routine caused by overwriting input data before consuming everything
2016-11-20 15:05:42 +01:00
Cedric Nugteren
6eeb1180fd
Changed the GEMM kernel selection parameters for Skylake GPUs to always favour the regular kernel
2016-11-19 22:15:33 +01:00
Cedric Nugteren
746d688e07
Updated the tuning results for the Intel Skylake ULT GT2 GPU
2016-11-15 22:42:04 +01:00
Cedric Nugteren
8ae8ab06a2
Renamed the include and source files of the Netlib CBLAS API
2016-10-25 20:33:10 +02:00
Cedric Nugteren
140121ef91
Removed the clblast namespace from the Netlib C API source file to ensure proper linking
2016-10-25 20:21:50 +02:00
Cedric Nugteren
729862e873
Fixed some issues preventing the Netlib CBLAS API from linking correctly
2016-10-25 19:56:42 +02:00
Cedric Nugteren
926aca53a0
Made the Netlib CBLAS API use the same enums with prefixes as the regular C API of CLBlast
2016-10-25 19:45:57 +02:00
Cedric Nugteren
59183b7d79
Sets the proper sizes for the buffers for the Netlib CBLAS API
2016-10-25 19:21:49 +02:00
Cedric Nugteren
f96fd372bc
Added initial version of a Netlib CBLAS implementation. TODO: Set correct buffer sizes
2016-10-25 14:28:52 +02:00
Cedric Nugteren
ec687afa75
Added tuning results for GeForce GTX TITAN Black
2016-10-24 19:49:10 +02:00
Cedric Nugteren
76d5d2ccfc
Fixed a bug in the transpose-matrix function
2016-10-23 20:49:55 +02:00
Cedric Nugteren
b8d4a9b9d0
Removed PUBLIC_API from the C++ exception classes
2016-10-23 16:09:59 +02:00
Cedric Nugteren
66f5c9d9b8
Added a fix for compilation under Visual Studio 2013 related to the new exception classes
2016-10-23 15:55:03 +02:00
Cedric Nugteren
c925fe463f
Added tuning results for the AMD Tonga GPU
2016-10-22 16:25:31 +02:00
Cedric Nugteren
a670c4c4bf
All enums in the C API are now prefixed with CLBlast to avoid potential name clashes with other projects
2016-10-22 16:14:56 +02:00
Cedric Nugteren
b0ff11acf0
Moved files around a bit; created a utilities subfolder
2016-10-22 15:36:48 +02:00
Cedric Nugteren
9afbbc9ef9
Added documentation for the better exception handling
2016-10-22 15:23:18 +02:00
Cedric Nugteren
280698d076
Merge pull request #117 from intelfx/exceptions
Convert to use C++ exceptions internally
2016-10-22 15:05:12 +02:00
Cedric Nugteren
9b596820d2
Fixed a bug in the SYRK/SYR2K/HERK/HER2K routines that would occur with specific tuning parameters (2)
2016-10-22 10:50:12 +02:00
Cedric Nugteren
db17b1fbe9
Fixed a bug in the SYRK/SYR2K/HERK/HER2K routines that would occur with specific tuning parameters
2016-10-22 10:41:02 +02:00
Ivan Shapovalov
56f300607b
Routine: get rid of ::SetUp()
Since we now use C++ exceptions inside the implementation (and exceptions
can be thrown from constructors), there is no need for a separate
Routine::SetUp() function.
For this, we also change how the kernel source string is constructed.
The kernel-specific source code is now passed to the Routine ctor via
an initializer_list of C strings to avoid unnecessary data copying
while also working around C1091 of MSVC 2013.
2016-10-22 08:45:27 +03:00
Ivan Shapovalov
b98af44fcf
treewide: use C++ exceptions properly
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.
Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.
However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
2016-10-22 08:45:25 +03:00
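The layering described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not CLBlast's actual code: `Status`, `LibraryError`, `ScaleImpl`, `Scale`, and `c_scale` are all hypothetical names, and the allocation failure is simulated.

```cpp
#include <exception>
#include <new>

// Hypothetical status codes (assumption: not CLBlast's real enum values).
enum class Status { kSuccess = 0, kInvalidArgument = -1, kUnexpected = -2 };

// Hypothetical library-specific exception that carries a status code.
class LibraryError : public std::exception {
 public:
  explicit LibraryError(Status status) : status_(status) {}
  Status status() const noexcept { return status_; }
  const char* what() const noexcept override { return "library error"; }
 private:
  Status status_;
};

// Internal implementation: reports failures by throwing, never by return code.
void ScaleImpl(int n) {
  if (n < 0) { throw LibraryError(Status::kInvalidArgument); }
  if (n > 1000000) { throw std::bad_alloc(); }  // simulated allocation failure
  // ... the actual work would go here ...
}

// C++ entry point: translates library errors into status codes at top level.
// Other exceptions (e.g. std::bad_alloc) are deliberately left to propagate.
Status Scale(int n) {
  try {
    ScaleImpl(n);
    return Status::kSuccess;
  } catch (const LibraryError& e) {
    return e.status();
  }
}

// C entry point: no exception may cross the C boundary, so everything else
// is caught and mapped to a single wild-card error code.
extern "C" int c_scale(int n) {
  try {
    return static_cast<int>(Scale(n));
  } catch (...) {
    return static_cast<int>(Status::kUnexpected);
  }
}
```

In this sketch a C++ caller still sees `std::bad_alloc` as an exception, while a C caller only ever receives an integer code.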
Ivan Shapovalov
5d03d48f7a
src/clpp11.hpp: avoid throwing exceptions from std::shared_ptr's Deleter
2016-10-22 07:25:16 +03:00
Ivan Shapovalov
6ac7edd2da
src/clpp11.hpp: GetInfoString: avoid reallocation
2016-10-22 07:25:16 +03:00
Ivan Shapovalov
106565fa9a
src/clpp11.hpp: reinstate error checking on clGetEventProfilingInfo()
2016-10-22 07:25:15 +03:00
Cedric Nugteren
597974b40d
Merge pull request #118 from matze/add-pkg-config
Generate and install pkg-config description
2016-10-21 21:00:07 +02:00
Matthias Vogelgesang
3797d144cc
Generate and install pkg-config description
2016-10-21 09:38:25 +02:00
Cedric Nugteren
0f9311d46a
Fixed an issue with a growing database: the database is now a global variable in a namespace and its container uses const-pointers to the actual data
2016-10-14 20:56:32 +02:00
Cedric Nugteren
ebb505b783
Added tuning results for Intel HD Graphics IvyBridge GPU
2016-10-13 12:18:28 +02:00
Cedric Nugteren
c60f6715f8
Removed a spurious #ifdef
2016-10-12 21:49:59 +02:00
Cedric Nugteren
ad2b6ecea2
Fixed missing line ending
2016-10-12 21:10:22 +02:00
Cedric Nugteren
8a9d3cdf37
Added support for compiling the library, the client, and the samples under MSVC 2013
2016-10-10 22:45:39 +02:00
Cedric Nugteren
f88c50522d
Fixed an issue with const members of structs in the database
2016-10-10 22:24:05 +02:00
Cedric Nugteren
de77f00e8c
Fixed an issue with the length of the GEMM OpenCL string for both MSVC 2013 and 2015
2016-10-10 22:23:33 +02:00
Cedric Nugteren
fcac81bfef
First fixes towards compilation on Visual Studio 2013
2016-10-10 20:37:45 +02:00
Cedric Nugteren
08ee57f494
Updated the tuning results for the GTX 750 Ti GPU
2016-10-10 16:41:41 +02:00
Cedric Nugteren
7c228f6a67
Changed the thresholds for the direct/indirect GEMM kernels for NVIDIA and Intel GPUs
2016-10-10 16:01:02 +02:00
Cedric Nugteren
7baac46e72
Fixed a performance bug for Intel Iris Pro GPUs due to incorrect tuning results
2016-10-08 21:56:06 +02:00
Cedric Nugteren
b698e45478
Added first tuning results for the single-kernel direct GEMM implementation
2016-10-06 21:13:14 +02:00
Cedric Nugteren
a3e67f2be2
Added a kernel selection database to select between the direct and indirect GEMM kernels
2016-10-06 19:51:12 +02:00
Cedric Nugteren
7052a00a3e
Fixed a const-correctness issue with complex conjugation in the GEMM direct kernel
2016-10-03 20:13:19 +02:00
Cedric Nugteren
ca0c075de2
Added functions to load from off-chip to local memory without vector loads for the GEMM direct kernels
2016-10-03 20:09:15 +02:00