Cedric Nugteren
d24138808b
Fixed an FP16 issue in the homatcopy test; added a comment about improper testing of integer returning functions for FP16
2017-11-08 21:20:07 +01:00
Cedric Nugteren
b18cc9d3f1
Merge pull request #212 from CNugteren/kernel_selection_tuner
GEMM kernel selection tuner
2017-11-07 22:20:13 +01:00
Cedric Nugteren
6fe9916231
Updated the roadmap
2017-11-07 21:35:04 +01:00
Cedric Nugteren
3ec0be6fb8
Added various GEMM routine tuning results
2017-11-07 21:34:54 +01:00
Cedric Nugteren
33ac2b0175
Improved the way the database defaults are computed
2017-11-06 21:59:45 +01:00
Cedric Nugteren
34a33b54cf
Changed GEMM routine tuner's scoring to use L2 measure instead for better averaging
2017-11-06 20:50:36 +01:00
Cedric Nugteren
9b0a435fb0
Integrated the GEMM routine tuner for kernel selection; added first tuning results
2017-11-02 21:47:14 +01:00
Cedric Nugteren
73272ab97d
Fixed a bug in database compression/decompression
2017-11-02 21:19:18 +01:00
Cedric Nugteren
5c90577dfd
Added collecting and printing of scores for the kernel-selection tuner
2017-10-30 20:39:21 +01:00
Cedric Nugteren
061b1c571b
Merge branch 'binary_cache_platform_dependent'
2017-10-30 19:42:35 +01:00
Cedric Nugteren
ac5a58cfe5
Added platform ID to the binary program cache to prevent issues with multi-platform systems
2017-10-29 20:01:30 +01:00
Cedric Nugteren
19c53f6dd0
Merge pull request #208 from CNugteren/android_support
Added Android support
2017-10-29 16:45:56 +01:00
Cedric Nugteren
f24d611e57
Made it possible to compile the CLBlast performance clients for Android with the NDK
2017-10-29 13:02:14 +01:00
Cedric Nugteren
319762f150
Added Android support using the GNU C++ STL library and the GCC toolchain
2017-10-29 12:07:07 +01:00
Cedric Nugteren
12b08ae491
Merge branch 'master' into android_support
2017-10-28 17:32:37 +02:00
Cedric Nugteren
334a26eb12
Added initial version of a GEMM kernel selection tuner
2017-10-28 17:30:29 +02:00
Cedric Nugteren
bd57dfa435
Moved timing function to a separate file
2017-10-28 14:12:05 +02:00
Cedric Nugteren
fa6e5e67f5
Fixed a bug when using the matrix A-offset argument for the TRSM routine
2017-10-27 22:12:30 +02:00
Cedric Nugteren
449577cf07
Reduced TRSM block-size for better numerical stability
2017-10-27 22:07:43 +02:00
Cedric Nugteren
44f7fa628a
Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM
2017-10-27 22:01:15 +02:00
Cedric Nugteren
8579b2b494
Added a DTRSM C++ interface example
2017-10-27 21:53:19 +02:00
Cedric Nugteren
e388f055f7
Fixed small bug in (unused) invert tester
2017-10-25 20:35:39 +02:00
Cedric Nugteren
8cdb5cb4a7
Updated roadmap with links to issues and status
2017-10-25 20:35:39 +02:00
Cedric Nugteren
d49aae236e
Fixed a bug in TRSM routine due to missing event synchronisations after GEMM calls
2017-10-25 20:35:39 +02:00
Cedric Nugteren
42ac3b4748
Merge pull request #206 from matze/use-gnuinstall-dirs
Use GNUInstallDirs to determine install paths
2017-10-23 20:03:47 +02:00
Matthias Vogelgesang
34e537a5c1
Use GNUInstallDirs to determine install paths
The GNUInstallDirs module* provides variables that match the install directories
for GNU software and allows users to override them. Without hardcoded paths,
packagers can choose library paths according to distribution policies (e.g.
lib, lib64, lib<arch>, ...).
* https://cmake.org/cmake/help/v3.0/module/GNUInstallDirs.html
2017-10-23 15:54:55 +02:00
Cedric Nugteren
5fd1f2fc60
Added first version of a roadmap
2017-10-20 18:21:31 +02:00
Cedric Nugteren
472f90501c
Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570
2017-10-20 18:06:12 +02:00
Cedric Nugteren
42dcd8fd8a
Merge pull request #204 from CNugteren/cuda_api
Cuda API to CLBlast
2017-10-20 12:07:30 +02:00
Cedric Nugteren
363568787e
Moved CUmodule code from Kernel to Program class to not require re-compilation every time
2017-10-18 18:17:30 +02:00
Cedric Nugteren
9d879c949a
Fix an incompatibility with CUDA's FP16 definition
2017-10-17 20:29:23 +02:00
Cedric Nugteren
b1270f04b8
Made buffers of batched routines read/write (was: read-only)
2017-10-17 19:56:47 +02:00
Cedric Nugteren
f349731d54
CUDA kernel compilation fixes
2017-10-17 19:53:09 +02:00
Cedric Nugteren
03760f80eb
Added CUDA API documentation
2017-10-16 21:54:42 +02:00
Cedric Nugteren
0719f14486
Made all CUDA kernel launches synchronous; removed exception raising
2017-10-16 21:54:42 +02:00
Cedric Nugteren
d62823f067
Added a missing OpenCL-to-CUDA function translation
2017-10-15 19:53:52 +02:00
Cedric Nugteren
8431a165d0
Fixed a small copy-paste typo
2017-10-15 19:38:48 +02:00
Cedric Nugteren
e6da575fff
Modified test interfaces such that they support either OpenCL or CUDA
2017-10-15 19:35:21 +02:00
Cedric Nugteren
7663cba234
Fixes for the CUDA API: first tests pass and the client runs
2017-10-15 17:43:20 +02:00
Cedric Nugteren
71049e8d39
Added the SM-compute-arch version to the nv compile options
2017-10-15 17:41:44 +02:00
Cedric Nugteren
a3069a97c3
Prepared test and client infrastructure for use with the CUDA API
2017-10-15 13:56:19 +02:00
Cedric Nugteren
7408da174c
Various fixes to make the first CUDA examples work
2017-10-15 12:17:35 +02:00
Cedric Nugteren
55a802c63d
Fixed a kernel/attribute order bug in the direct GEMM kernels
2017-10-14 17:21:34 +02:00
Cedric Nugteren
b06bc01da9
Make local memory pointers a define in OpenCL; some fixes to the recently changed transpose kernel code
2017-10-14 17:13:54 +02:00
Cedric Nugteren
d9456306e0
Made transpose kernel struct init proper according to the C standard
2017-10-14 16:48:06 +02:00
Cedric Nugteren
48133a0cd1
Added an option to choose whether to override the MSVC flags from /MT to /MD (default ON)
2017-10-14 16:26:35 +02:00
Cedric Nugteren
313fc796b2
Fixed several (not all) CUDA kernel compilation issues
2017-10-14 16:01:12 +02:00
Cedric Nugteren
74d6e0048c
Added DAXPY example for the CUDA API
2017-10-14 12:23:35 +02:00
Cedric Nugteren
54d0c440ce
Various fixes to make the host code and sample compile with the CUDA API
2017-10-14 11:43:57 +02:00
Cedric Nugteren
16b9efd605
Added first untested CUDA sample
2017-10-14 10:50:28 +02:00