Cedric Nugteren
0c9411c844
Updated to version 1.5.0
2018-12-04 20:46:02 +01:00
Cedric Nugteren
d45911b61d
Added groundwork for col2im algorithm plus first non-working version of kernel and test
2018-10-23 20:52:25 +02:00
Cedric Nugteren
83ba3d4b7b
Merge branch 'master' into convgemm_multi_kernel
2018-09-16 20:01:18 +02:00
Cedric Nugteren
9d9f09fce9
Name change of setting to NETLIB_PERSISTENT_OPENCL
2018-08-07 22:41:06 +02:00
Cedric Nugteren
fe639455bd
Added an option to compile the Netlib API with static OpenCL device and context
2018-08-05 21:12:39 +02:00
Cedric Nugteren
5903820ba2
Merge branch 'master' into CLBlast-267-convgemm
2018-07-29 10:26:34 +02:00
Cedric Nugteren
f84036948b
Renamed AMD SI workaround defines
2018-07-27 20:38:01 +02:00
Cedric Nugteren
e8dea34fce
Added workaround for weird AMD SI Hainan bug
2018-07-25 22:59:36 +02:00
Cedric Nugteren
db179a1e40
Updated to CLBlast version 1.4.1
2018-07-14 12:29:06 +02:00
Cedric Nugteren
1c9a741470
Merge branch 'master' into CLBlast-267-convgemm
2018-06-03 15:53:27 +02:00
Cedric Nugteren
4471b67735
Updated to CLBlast version 1.4.0
2018-06-03 13:18:05 +02:00
Cedric Nugteren
bd1715aff9
Fixes for CUDA version of CLBlast
2018-06-03 10:41:57 +02:00
Cedric Nugteren
4f594e3931
Added MKL as an alternative for CBLAS for correctness and performance comparisons
2018-06-02 17:57:45 +02:00
Cedric Nugteren
cbcd4ff7e8
Merge branch 'master' into CLBlast-267-convgemm
2018-05-19 17:54:27 +02:00
Cedric Nugteren
76e0079a90
Fixed compilation issues
2018-05-19 14:18:23 +02:00
Cedric Nugteren
66583b3cda
The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run part of alltuners target
2018-05-19 12:48:59 +02:00
Cedric Nugteren
2d1f6ba7fe
Added convgemm skeleton, test infrastructure, and first reference implementation
2018-05-06 11:35:34 +02:00
Cedric Nugteren
3e3a26e0da
Fixes for the CUDA API
2018-04-20 21:50:36 +02:00
Cedric Nugteren
0e1a152023
First version of the tuning API, added interface for copy-kernel, added sample
2018-03-06 20:52:12 +01:00
Cedric Nugteren
c5a28cd70b
Added CLBlast version numbering to the compiled library
2018-02-11 15:31:21 +01:00
Cedric Nugteren
ef5008f5e4
Created the API and stubs for the HAD (hadamard-product) routines
2018-01-31 20:41:02 +01:00
Cedric Nugteren
37c5e8f58c
Updated to CLBlast version 1.3.0
2018-01-29 20:45:21 +01:00
Cedric Nugteren
d1d80ca131
Fixed a compilation error of the kernel-preprocessor test under MSVC
2018-01-29 20:26:25 +01:00
Cedric Nugteren
0e5eaa6eb9
Factored out the generic parts of the GEMM routine tuner
2018-01-15 21:32:51 +01:00
Cedric Nugteren
90e8e55acb
Added test for the RetrieveParameters function
2018-01-11 20:34:09 +01:00
Cedric Nugteren
9fb2c61b25
Added API and tests for new GemmStridedBatched routine
2018-01-07 14:27:15 +01:00
Cedric Nugteren
1e738db6dd
Split the database into multiple small compilation units
2017-12-27 12:04:22 +01:00
Cedric Nugteren
bd540829ea
Fixes for the CUDA backend of CLBlast
2017-12-24 12:10:55 +01:00
Cedric Nugteren
8657e90cf8
Fixed linking of the preprocessor test for MSVC
2017-12-24 11:33:47 +01:00
Cedric Nugteren
b1f52f130c
Updated the database to use the new TRSV and Invert tuners
2017-12-23 13:55:22 +01:00
Cedric Nugteren
aa7db4f987
Added TRSV block-size tuner
2017-12-23 13:34:57 +01:00
Cedric Nugteren
07a7012b0d
Added skeleton for a tuner for the invert kernel
2017-12-19 21:10:48 +01:00
Cedric Nugteren
c0c6d00b12
Added stub for a preprocessor and a corresponding compilation test
2017-11-25 10:24:05 +01:00
Cedric Nugteren
c6690df896
Made the tuners be compiled by default
2017-11-19 14:33:25 +01:00
Cedric Nugteren
8d2f7d53aa
Added a library with common tuner sources to speed-up compilation
2017-11-19 12:59:28 +01:00
Cedric Nugteren
f94d498a37
Moved compilation function to separate file; removed dependency of tuners of the CLBlast library
2017-11-17 20:57:46 +01:00
Cedric Nugteren
d9cf206979
Removed dependency on CLTune
2017-11-16 21:28:36 +01:00
Cedric Nugteren
1b2b46f2f0
Added first version of integrated and re-written auto-tuner
2017-11-15 22:49:35 +01:00
Cedric Nugteren
0cd78bb6f9
Added kernel timing functionality to the utilities
2017-11-15 22:47:06 +01:00
Cedric Nugteren
5d5e3f93bc
Updated to CLBlast version 1.2.0
2017-11-08 21:30:06 +01:00
Cedric Nugteren
b18cc9d3f1
Merge pull request #212 from CNugteren/kernel_selection_tuner
GEMM kernel selection tuner
2017-11-07 22:20:13 +01:00
Cedric Nugteren
9b0a435fb0
Integrated the GEMM routine tuner for kernel selection; added first tuning results
2017-11-02 21:47:14 +01:00
Cedric Nugteren
f24d611e57
Made it possible to compile the CLBlast performance clients for Android with the NDK
2017-10-29 13:02:14 +01:00
Cedric Nugteren
334a26eb12
Added initial version of a GEMM kernel selection tuner
2017-10-28 17:30:29 +02:00
Cedric Nugteren
bd57dfa435
Moved timing function to a separate file
2017-10-28 14:12:05 +02:00
Cedric Nugteren
8579b2b494
Added a DTRSM C++ interface example
2017-10-27 21:53:19 +02:00
Matthias Vogelgesang
34e537a5c1
Use GNUInstallDirs to determine install paths
The GNUInstallDirs module* provides variables that match the install directories
for GNU Software and allows users to override them. Without hardcoding paths
packagers can choose library paths according to distribution policies (i.e.
lib, lib64, lib<arch>, ...).
* https://cmake.org/cmake/help/v3.0/module/GNUInstallDirs.html
2017-10-23 15:54:55 +02:00
Cedric Nugteren
42dcd8fd8a
Merge pull request #204 from CNugteren/cuda_api
Cuda API to CLBlast
2017-10-20 12:07:30 +02:00
Cedric Nugteren
a3069a97c3
Prepared test and client infrastructure for use with the CUDA API
2017-10-15 13:56:19 +02:00
Cedric Nugteren
48133a0cd1
Added an option to choose whether to override the MSVC flags from /MT to /MD (default ON)
2017-10-14 16:26:35 +02:00