Ivan Shapovalov
b98af44fcf
treewide: use C++ exceptions properly
Since the codebase is designed around proper C++ idioms such as RAII, it
makes sense to only use C++ exceptions internally instead of mixing
exceptions and error codes. The exceptions are now caught at top level
to preserve compatibility with the existing error code-based API.
Note that we deliberately do not catch C++ runtime errors (such as
`std::bad_alloc`) nor logic errors (aka failed assertions) because no
actual handling can ever happen for such errors.
However, in the C interface we do catch _all_ exceptions (...) and
convert them into a wild-card error code.
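The layering described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `StatusCode`, `LibraryError`, `ScaleInternal`, `Scale` and `c_scale` are all hypothetical names, and the numeric codes are invented. Internal code only throws, the C++ entry point converts the library's own exceptions back into error codes, and the C entry point additionally catches everything else.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <string>

// Hypothetical status codes standing in for the existing error-code API.
enum class StatusCode { kSuccess = 0, kInvalidDimension = -1, kUnknownError = -2 };

// Hypothetical library exception that remembers which status code it maps to.
class LibraryError : public std::exception {
 public:
  LibraryError(StatusCode status, std::string message)
      : status_(status), message_(std::move(message)) {}
  StatusCode status() const noexcept { return status_; }
  const char *what() const noexcept override { return message_.c_str(); }
 private:
  StatusCode status_;
  std::string message_;
};

// Internal routine: reports failure by throwing, never by returning a code.
void ScaleInternal(std::size_t n, float alpha, float *x) {
  if (n == 0 || x == nullptr) {
    throw LibraryError(StatusCode::kInvalidDimension, "empty or null vector");
  }
  for (std::size_t i = 0; i < n; ++i) { x[i] *= alpha; }
}

// C++ entry point: catches only the library's own exceptions.
// std::bad_alloc and failed assertions are deliberately left to propagate.
StatusCode Scale(std::size_t n, float alpha, float *x) {
  try {
    ScaleInternal(n, alpha, x);
    return StatusCode::kSuccess;
  } catch (const LibraryError &e) {
    return e.status();
  }
}

// C entry point: no exception may cross the C boundary, so anything that
// escapes the C++ layer becomes a wild-card error code.
extern "C" int c_scale(std::size_t n, float alpha, float *x) {
  try {
    return static_cast<int>(Scale(n, alpha, x));
  } catch (...) {
    return static_cast<int>(StatusCode::kUnknownError);
  }
}
```

Keeping the `catch (...)` only in the C wrapper means C++ callers still see allocation failures as exceptions, while C callers always get an integer back.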
2016-10-22 08:45:25 +03:00
Ivan Shapovalov
5d03d48f7a
src/clpp11.hpp: avoid throwing exceptions from std::shared_ptr's Deleter
2016-10-22 07:25:16 +03:00
Ivan Shapovalov
6ac7edd2da
src/clpp11.hpp: GetInfoString: avoid reallocation
2016-10-22 07:25:16 +03:00
Ivan Shapovalov
106565fa9a
src/clpp11.hpp: reinstate error checking on clGetEventProfilingInfo()
2016-10-22 07:25:15 +03:00
Cedric Nugteren
53deed298f
Added documentation and minor refactoring for the recent support of static library compilation
2016-10-15 17:11:08 +02:00
Cedric Nugteren
a63f57297b
Merge pull request #115 from shehzan10/development
Fixes for static lib compilation on Windows
2016-10-15 09:48:03 +02:00
Shehzan Mohammed
0d958bf3b3
Fixes for static lib compilation on Windows
2016-10-14 18:45:34 -04:00
Cedric Nugteren
c0482ace6c
Fixed a bug where clblas.h couldn't be found for the performance tests (clients)
2016-10-14 22:11:35 +02:00
Cedric Nugteren
0f9311d46a
Fixed an issue with a growing database: the database is now a global variable in a namespace and its container uses const-pointers to the actual data
2016-10-14 20:56:32 +02:00
Cedric Nugteren
3386ad49c4
Set proper flags for the verbose mode (debug flags)
2016-10-14 20:54:05 +02:00
Cedric Nugteren
99a620f9a1
Merge pull request #112 from shehzan10/static
Add option to build shared or static library
2016-10-14 10:06:44 +02:00
Shehzan Mohammed
56f07e42b1
Add option to build shared or static library
2016-10-13 12:03:44 -04:00
Cedric Nugteren
ebb505b783
Added tuning results for Intel HD Graphics IvyBridge GPU
2016-10-13 12:18:28 +02:00
Cedric Nugteren
541415374f
Merge pull request #108 from CNugteren/msvc2013
Support for Visual Studio 2013
2016-10-13 08:34:07 +02:00
Cedric Nugteren
c60f6715f8
Removed a spurious #ifdef
2016-10-12 21:49:59 +02:00
Cedric Nugteren
ad2b6ecea2
Fixed missing line ending
2016-10-12 21:10:22 +02:00
Cedric Nugteren
8a9d3cdf37
Added support for compiling the library, the client, and the samples under MSVC 2013
2016-10-10 22:45:39 +02:00
Cedric Nugteren
f88c50522d
Fixed an issue with const members of structs in the database
2016-10-10 22:24:05 +02:00
Cedric Nugteren
de77f00e8c
Fixed an issue with the length of the GEMM OpenCL string for both MSVC 2013 and 2015
2016-10-10 22:23:33 +02:00
Cedric Nugteren
fcac81bfef
First fixes towards compilation on Visual Studio 2013
2016-10-10 20:37:45 +02:00
Cedric Nugteren
39afc9543b
Changed the storage location of the database to a separate GitHub repository
2016-10-10 19:10:12 +02:00
Cedric Nugteren
71f5c0c145
Changed the license to MIT
2016-10-10 18:07:17 +02:00
Cedric Nugteren
42ee4abbbc
Updated the performance graphs for Intel Iris Pro GPU and AMD Radeon M370X GPU
2016-10-10 18:07:05 +02:00
Cedric Nugteren
f563341e7b
Added fresh performance graphs for GeForce 750Ti; removed old GTX480 results
2016-10-10 16:59:28 +02:00
Cedric Nugteren
08ee57f494
Updated the tuning results for the GTX 750 Ti GPU
2016-10-10 16:41:41 +02:00
Cedric Nugteren
2194dee217
Merge branch 'gemm_direct' into development
2016-10-10 16:05:18 +02:00
Cedric Nugteren
7c228f6a67
Changed the thresholds for the direct/indirect GEMM kernels for NVIDIA and Intel GPUs
2016-10-10 16:01:02 +02:00
Cedric Nugteren
d7cfb6aa9b
Added benchmark script for small matrix sizes, testing the direct GEMM kernels
2016-10-08 22:05:54 +02:00
Cedric Nugteren
7baac46e72
Fixed a performance bug for Intel Iris Pro GPUs due to incorrect tuning results
2016-10-08 21:56:06 +02:00
Cedric Nugteren
b698e45478
Added first tuning results for the single-kernel direct GEMM implementation
2016-10-06 21:13:14 +02:00
Cedric Nugteren
a3e67f2be2
Added a kernel selection database to select between the direct and indirect GEMM kernels
2016-10-06 19:51:12 +02:00
Cedric Nugteren
7052a00a3e
Fixed a const-correctness issue with complex conjugation in the GEMM direct kernel
2016-10-03 20:13:19 +02:00
Cedric Nugteren
ca0c075de2
Added functions to load from off-chip to local memory without vector loads for the GEMM direct kernels
2016-10-03 20:09:15 +02:00
Cedric Nugteren
c1c4bc5d20
Re-organised GEMM direct kernel and added faster fall-back version for incomplete rectangles
2016-10-03 19:32:01 +02:00
Cedric Nugteren
243cef73db
Set the default number of runs for all kernels to at least 2 runs
2016-10-02 21:23:23 +02:00
Cedric Nugteren
d8827e908c
Specialised the GEMM direct kernel in four ways for transposing/non-transposing: NN, NT, TN, TT
2016-10-02 17:59:05 +02:00
Cedric Nugteren
61f489e370
Split the GEMM direct kernel into two files; set the default tuning target to 256-256-256
2016-10-02 15:06:59 +02:00
Cedric Nugteren
a459920105
Added padding to the local memory of the GEMM direct kernel
2016-10-01 16:58:53 +02:00
Cedric Nugteren
ecc704cc76
Added a default num-runs setting to the tuner; the GEMM direct kernel now averages over 10 runs by default
2016-10-01 16:55:21 +02:00
Cedric Nugteren
a9d35cf04c
Merge branch 'development' into gemm_direct
2016-10-01 13:45:08 +02:00
Cedric Nugteren
d59e5c570b
Added an option to run tuned kernels multiple times to average execution times; requires CLTune 2.5.0
2016-09-27 21:03:24 +02:00
Cedric Nugteren
db5772e521
Updated to version 8.0 of the CLCudaAPI header
2016-09-27 20:56:49 +02:00
Cedric Nugteren
adc058440c
Fixed the local memory size computation for the GEMM tuners
2016-09-27 20:03:55 +02:00
Cedric Nugteren
6178fcd584
Now generates test/client/tuner data using a fixed seed to enable reproducibility of results
2016-09-27 19:55:21 +02:00
Cedric Nugteren
e3076d26cc
Added more relaxed error checking for the half-precision tests
2016-09-27 19:42:58 +02:00
Cedric Nugteren
a2bfae3c46
Merge pull request #103 from dividiti/link_clblas_with_pthread
Link clBLAS together with pthread
2016-09-27 08:53:08 +02:00
Anton Lokhmotov
c484bb26b6
Use cross-platform thread lib idiom instead of *nix-specific pthread.
2016-09-26 21:04:28 +00:00
Anton Lokhmotov
c20a5bb7ca
Link clBLAS together with pthread.
2016-09-26 10:30:18 +00:00
Cedric Nugteren
73d135c2ce
Added a first version of a tuner for the GEMM direct kernel; collapsed MWGD, NWGD and KWGD into one WGD parameter
2016-09-25 14:48:34 +02:00
Cedric Nugteren
669f43aed6
Separated the tuning parameters of the new direct GEMM kernel from the indirect version
2016-09-25 13:52:08 +02:00