| Author | Commit | Message | Date |
|---|---|---|---|
| Cedric Nugteren | 7fab29304c | Added sample to play around with XAMAX routine | 2020-03-08 11:26:18 +01:00 |
| Cedric Nugteren | 8433985051 | Updated to version 1.5.1 | 2020-02-18 10:29:40 +01:00 |
| Cedric Nugteren | 560f7a40f6 | Added convgemm to the CLBlast database, added initial parameters for Skylake GPU | 2018-12-31 19:05:34 +01:00 |
| Cedric Nugteren | 1f0cd61824 | Added first version of a tuner for the ConvGemm direct kernel | 2018-12-18 13:59:26 +09:00 |
| Cedric Nugteren | 0c9411c844 | Updated to version 1.5.0 | 2018-12-04 20:46:02 +01:00 |
| Cedric Nugteren | d45911b61d | Added groundwork for col2im algorithm plus first non-working version of kernel and test | 2018-10-23 20:52:25 +02:00 |
| Cedric Nugteren | 83ba3d4b7b | Merge branch 'master' into convgemm_multi_kernel | 2018-09-16 20:01:18 +02:00 |
| Cedric Nugteren | 9d9f09fce9 | Name change of setting to NETLIB_PERSISTENT_OPENCL | 2018-08-07 22:41:06 +02:00 |
| Cedric Nugteren | fe639455bd | Added an option to compile the Netlib API with static OpenCL device and context | 2018-08-05 21:12:39 +02:00 |
| Cedric Nugteren | 5903820ba2 | Merge branch 'master' into CLBlast-267-convgemm | 2018-07-29 10:26:34 +02:00 |
| Cedric Nugteren | f84036948b | Renamed AMD SI workaround defines | 2018-07-27 20:38:01 +02:00 |
| Cedric Nugteren | e8dea34fce | Added workaround for weird AMD SI Hainan bug | 2018-07-25 22:59:36 +02:00 |
| Cedric Nugteren | db179a1e40 | Updated to CLBlast version 1.4.1 | 2018-07-14 12:29:06 +02:00 |
| Cedric Nugteren | 1c9a741470 | Merge branch 'master' into CLBlast-267-convgemm | 2018-06-03 15:53:27 +02:00 |
| Cedric Nugteren | 4471b67735 | Updated to CLBlast version 1.4.0 | 2018-06-03 13:18:05 +02:00 |
| Cedric Nugteren | bd1715aff9 | Fixes for CUDA version of CLBlast | 2018-06-03 10:41:57 +02:00 |
| Cedric Nugteren | 4f594e3931 | Added MKL as an alternative for CBLAS for correctness and performance comparisons | 2018-06-02 17:57:45 +02:00 |
| Cedric Nugteren | cbcd4ff7e8 | Merge branch 'master' into CLBlast-267-convgemm | 2018-05-19 17:54:27 +02:00 |
| Cedric Nugteren | 76e0079a90 | Fixed compilation issues | 2018-05-19 14:18:23 +02:00 |
| Cedric Nugteren | 66583b3cda | The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run part of alltuners target | 2018-05-19 12:48:59 +02:00 |
| Cedric Nugteren | 2d1f6ba7fe | Added convgemm skeleton, test infrastructure, and first reference implementation | 2018-05-06 11:35:34 +02:00 |
| Cedric Nugteren | 3e3a26e0da | Fixes for the CUDA API | 2018-04-20 21:50:36 +02:00 |
| Cedric Nugteren | 0e1a152023 | First version of the tuning API, added interface for copy-kernel, added sample | 2018-03-06 20:52:12 +01:00 |
| Cedric Nugteren | c5a28cd70b | Added CLBlast version numbering to the compiled library | 2018-02-11 15:31:21 +01:00 |
| Cedric Nugteren | ef5008f5e4 | Created the API and stubs for the HAD (hadamard-product) routines | 2018-01-31 20:41:02 +01:00 |
| Cedric Nugteren | 37c5e8f58c | Updated to CLBlast version 1.3.0 | 2018-01-29 20:45:21 +01:00 |
| Cedric Nugteren | d1d80ca131 | Fixed a compilation error of the kernel-preprocessor test under MSVC | 2018-01-29 20:26:25 +01:00 |
| Cedric Nugteren | 0e5eaa6eb9 | Factored out the generic parts of the GEMM routine tuner | 2018-01-15 21:32:51 +01:00 |
| Cedric Nugteren | 90e8e55acb | Added test for the RetrieveParameters function | 2018-01-11 20:34:09 +01:00 |
| Cedric Nugteren | 9fb2c61b25 | Added API and tests for new GemmStridedBatched routine | 2018-01-07 14:27:15 +01:00 |
| Cedric Nugteren | 1e738db6dd | Split the database into multiple small compilation units | 2017-12-27 12:04:22 +01:00 |
| Cedric Nugteren | bd540829ea | Fixes for the CUDA backend of CLBlast | 2017-12-24 12:10:55 +01:00 |
| Cedric Nugteren | 8657e90cf8 | Fixed linking of the preprocessor test for MSVC | 2017-12-24 11:33:47 +01:00 |
| Cedric Nugteren | b1f52f130c | Updated the database to use the new TRSV and Invert tuners | 2017-12-23 13:55:22 +01:00 |
| Cedric Nugteren | aa7db4f987 | Added TRSV block-size tuner | 2017-12-23 13:34:57 +01:00 |
| Cedric Nugteren | 07a7012b0d | Added skeleton for a tuner for the invert kernel | 2017-12-19 21:10:48 +01:00 |
| Cedric Nugteren | c0c6d00b12 | Added stub for a preprocessor and a corresponding compilation test | 2017-11-25 10:24:05 +01:00 |
| Cedric Nugteren | c6690df896 | Made the tuners be compiled by default | 2017-11-19 14:33:25 +01:00 |
| Cedric Nugteren | 8d2f7d53aa | Added a library with common tuner sources to speed-up compilation | 2017-11-19 12:59:28 +01:00 |
| Cedric Nugteren | f94d498a37 | Moved compilation function to separate file; removed dependency of tuners of the CLBlast library | 2017-11-17 20:57:46 +01:00 |
| Cedric Nugteren | d9cf206979 | Removed dependency on CLTune | 2017-11-16 21:28:36 +01:00 |
| Cedric Nugteren | 1b2b46f2f0 | Added first version of integrated and re-written auto-tuner | 2017-11-15 22:49:35 +01:00 |
| Cedric Nugteren | 0cd78bb6f9 | Added kernel timing functionality to the utilities | 2017-11-15 22:47:06 +01:00 |
| Cedric Nugteren | 5d5e3f93bc | Updated to CLBlast version 1.2.0 | 2017-11-08 21:30:06 +01:00 |
| Cedric Nugteren | b18cc9d3f1 | Merge pull request #212 from CNugteren/kernel_selection_tuner (GEMM kernel selection tuner) | 2017-11-07 22:20:13 +01:00 |
| Cedric Nugteren | 9b0a435fb0 | Integrated the GEMM routine tuner for kernel selection; added first tuning results | 2017-11-02 21:47:14 +01:00 |
| Cedric Nugteren | f24d611e57 | Made it possible to compile the CLBlast performance clients for Android with the NDK | 2017-10-29 13:02:14 +01:00 |
| Cedric Nugteren | 334a26eb12 | Added initial version of a GEMM kernel selection tuner | 2017-10-28 17:30:29 +02:00 |
| Cedric Nugteren | bd57dfa435 | Moved timing function to a separate file | 2017-10-28 14:12:05 +02:00 |
| Cedric Nugteren | 8579b2b494 | Added a DTRSM C++ interface example | 2017-10-27 21:53:19 +02:00 |