Cedric Nugteren
fe639455bd
Added an option to compile the Netlib API with static OpenCL device and context
2018-08-05 21:12:39 +02:00
Cedric Nugteren
f84036948b
Renamed AMD SI workaround defines
2018-07-27 20:38:01 +02:00
Cedric Nugteren
e8dea34fce
Added workaround for weird AMD SI Hainan bug
2018-07-25 22:59:36 +02:00
Cedric Nugteren
db179a1e40
Updated to CLBlast version 1.4.1
2018-07-14 12:29:06 +02:00
Cedric Nugteren
4471b67735
Updated to CLBlast version 1.4.0
2018-06-03 13:18:05 +02:00
Cedric Nugteren
bd1715aff9
Fixes for CUDA version of CLBlast
2018-06-03 10:41:57 +02:00
Cedric Nugteren
4f594e3931
Added MKL as an alternative for CBLAS for correctness and performance comparisons
2018-06-02 17:57:45 +02:00
Cedric Nugteren
76e0079a90
Fixed compilation issues
2018-05-19 14:18:23 +02:00
Cedric Nugteren
66583b3cda
The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run as part of the alltuners target
2018-05-19 12:48:59 +02:00
Cedric Nugteren
3e3a26e0da
Fixes for the CUDA API
2018-04-20 21:50:36 +02:00
Cedric Nugteren
0e1a152023
First version of the tuning API, added interface for copy-kernel, added sample
2018-03-06 20:52:12 +01:00
Cedric Nugteren
c5a28cd70b
Added CLBlast version numbering to the compiled library
2018-02-11 15:31:21 +01:00
Cedric Nugteren
ef5008f5e4
Created the API and stubs for the HAD (hadamard-product) routines
2018-01-31 20:41:02 +01:00
Cedric Nugteren
37c5e8f58c
Updated to CLBlast version 1.3.0
2018-01-29 20:45:21 +01:00
Cedric Nugteren
d1d80ca131
Fixed a compilation error of the kernel-preprocessor test under MSVC
2018-01-29 20:26:25 +01:00
Cedric Nugteren
0e5eaa6eb9
Factored out the generic parts of the GEMM routine tuner
2018-01-15 21:32:51 +01:00
Cedric Nugteren
90e8e55acb
Added test for the RetrieveParameters function
2018-01-11 20:34:09 +01:00
Cedric Nugteren
9fb2c61b25
Added API and tests for new GemmStridedBatched routine
2018-01-07 14:27:15 +01:00
Cedric Nugteren
1e738db6dd
Split the database into multiple small compilation units
2017-12-27 12:04:22 +01:00
Cedric Nugteren
bd540829ea
Fixes for the CUDA backend of CLBlast
2017-12-24 12:10:55 +01:00
Cedric Nugteren
8657e90cf8
Fixed linking of the preprocessor test for MSVC
2017-12-24 11:33:47 +01:00
Cedric Nugteren
b1f52f130c
Updated the database to use the new TRSV and Invert tuners
2017-12-23 13:55:22 +01:00
Cedric Nugteren
aa7db4f987
Added TRSV block-size tuner
2017-12-23 13:34:57 +01:00
Cedric Nugteren
07a7012b0d
Added skeleton for a tuner for the invert kernel
2017-12-19 21:10:48 +01:00
Cedric Nugteren
c0c6d00b12
Added stub for a preprocessor and a corresponding compilation test
2017-11-25 10:24:05 +01:00
Cedric Nugteren
c6690df896
Made the tuners be compiled by default
2017-11-19 14:33:25 +01:00
Cedric Nugteren
8d2f7d53aa
Added a library with common tuner sources to speed up compilation
2017-11-19 12:59:28 +01:00
Cedric Nugteren
f94d498a37
Moved compilation function to separate file; removed dependency of the tuners on the CLBlast library
2017-11-17 20:57:46 +01:00
Cedric Nugteren
d9cf206979
Removed dependency on CLTune
2017-11-16 21:28:36 +01:00
Cedric Nugteren
1b2b46f2f0
Added first version of integrated and re-written auto-tuner
2017-11-15 22:49:35 +01:00
Cedric Nugteren
0cd78bb6f9
Added kernel timing functionality to the utilities
2017-11-15 22:47:06 +01:00
Cedric Nugteren
5d5e3f93bc
Updated to CLBlast version 1.2.0
2017-11-08 21:30:06 +01:00
Cedric Nugteren
b18cc9d3f1
Merge pull request #212 from CNugteren/kernel_selection_tuner
GEMM kernel selection tuner
2017-11-07 22:20:13 +01:00
Cedric Nugteren
9b0a435fb0
Integrated the GEMM routine tuner for kernel selection; added first tuning results
2017-11-02 21:47:14 +01:00
Cedric Nugteren
f24d611e57
Made it possible to compile the CLBlast performance clients for Android with the NDK
2017-10-29 13:02:14 +01:00
Cedric Nugteren
334a26eb12
Added initial version of a GEMM kernel selection tuner
2017-10-28 17:30:29 +02:00
Cedric Nugteren
bd57dfa435
Moved timing function to a separate file
2017-10-28 14:12:05 +02:00
Cedric Nugteren
8579b2b494
Added a DTRSM C++ interface example
2017-10-27 21:53:19 +02:00
Matthias Vogelgesang
34e537a5c1
Use GNUInstallDirs to determine install paths
The GNUInstallDirs module* provides variables that match the install directories
for GNU software and allows users to override them. Without hardcoded paths,
packagers can choose library paths according to distribution policies (e.g.
lib, lib64, lib<arch>, ...).
* https://cmake.org/cmake/help/v3.0/module/GNUInstallDirs.html
2017-10-23 15:54:55 +02:00
Cedric Nugteren
42dcd8fd8a
Merge pull request #204 from CNugteren/cuda_api
CUDA API to CLBlast
2017-10-20 12:07:30 +02:00
Cedric Nugteren
a3069a97c3
Prepared test and client infrastructure for use with the CUDA API
2017-10-15 13:56:19 +02:00
Cedric Nugteren
48133a0cd1
Added an option to choose whether to override the MSVC flags from /MT to /MD (default ON)
2017-10-14 16:26:35 +02:00
Cedric Nugteren
74d6e0048c
Added DAXPY example for the CUDA API
2017-10-14 12:23:35 +02:00
Cedric Nugteren
16b9efd605
Added first untested CUDA sample
2017-10-14 10:50:28 +02:00
Cedric Nugteren
b901809345
Added first (untested) version of a CUDA API
2017-10-11 23:16:57 +02:00
Cedric Nugteren
df3c9f4a8a
Moved non-routine-specific API functions and includes to separate files
2017-10-08 21:52:02 +02:00
Cedric Nugteren
f4c4674cf6
Updated to version 1.1.0
2017-09-30 17:19:17 +02:00
Cedric Nugteren
2ef6578961
Added first version of a small CLBlast diagnostics helper
2017-09-19 21:43:35 +02:00
Cedric Nugteren
76382ff6c1
Added the new vendor-architecture-name hierarchy to the tuners as well
2017-09-10 16:34:54 +02:00
Cedric Nugteren
91ea7fcde2
Introduced the notion of a device-architecture for the database and added device and architecture name mappings
2017-09-08 21:09:05 +02:00