Cedric Nugteren | d24138808b | Fixed an FP16 issue in the homatcopy test; added a comment about improper testing of integer returning functions for FP16 | 2017-11-08 21:20:07 +01:00
|
Cedric Nugteren | b18cc9d3f1 | Merge pull request #212 from CNugteren/kernel_selection_tuner: GEMM kernel selection tuner | 2017-11-07 22:20:13 +01:00
|
Cedric Nugteren | 9b0a435fb0 | Integrated the GEMM routine tuner for kernel selection; added first tuning results | 2017-11-02 21:47:14 +01:00
|
Cedric Nugteren | f24d611e57 | Made it possible to compile the CLBlast performance clients for Android with the NDK | 2017-10-29 13:02:14 +01:00
|
Cedric Nugteren | 319762f150 | Added Android support using the GNU C++ STL library and the GCC toolchain | 2017-10-29 12:07:07 +01:00
|
Cedric Nugteren | 12b08ae491 | Merge branch 'master' into android_support | 2017-10-28 17:32:37 +02:00
|
Cedric Nugteren | 5fd1f2fc60 | Added first version of a roadmap | 2017-10-20 18:21:31 +02:00
|
Cedric Nugteren | 472f90501c | Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570 | 2017-10-20 18:06:12 +02:00
|
Cedric Nugteren | 03760f80eb | Added CUDA API documentation | 2017-10-16 21:54:42 +02:00
|
Cedric Nugteren | f4c4674cf6 | Updated to version 1.1.0 | 2017-09-30 17:19:17 +02:00
|
Cedric Nugteren | 2949e156f5 | Added notes for Android compilation of CLBlast | 2017-09-26 21:23:53 +02:00
|
Cedric Nugteren | a23cd8d13a | Updated README with proper AMD device names; fixed device look-up for names of length 50+ | 2017-09-16 21:26:38 +02:00
|
Cedric Nugteren | 4d9d03ba51 | Completed im2col implementation | 2017-08-24 21:11:12 +02:00
|
Cedric Nugteren | 18d832e149 | Added tuning results for the Qualcomm Adreno 330 GPU | 2017-07-30 18:18:02 +02:00
|
Cedric Nugteren | b7473f50df | Added status badges for correctness tests; updated list of contributors; fixed minor typos | 2017-07-24 20:14:47 +02:00
|
Cedric Nugteren | b8df03e5bc | Added CLBlast paper and presentation references in README | 2017-06-25 20:45:14 +02:00
|
Cedric Nugteren | 48f2682eb7 | Added tuning results for the Core i7-920 CPU | 2017-06-18 20:53:59 +02:00
|
Cedric Nugteren | 33ed1e5a06 | Added tuning results for GeForce GT 650M (thanks to bzcheeseman) | 2017-06-01 22:52:08 +02:00
|
Cedric Nugteren | f151e56daa | Added the IxAMIN routines: absolute minimum version of IxAMAX | 2017-05-12 20:01:33 -07:00
|
Cedric Nugteren | 81d9ed3946 | Removed the included performance reports; README now redirects to the new external website | 2017-05-12 13:18:10 -07:00
|
Cedric Nugteren | 71933c3411 | Added tuning results for the AMD Radeon Fiji GPU | 2017-05-11 22:53:52 -07:00
|
Cedric Nugteren | d67455fdb8 | Fixes the build-status table in the README | 2017-05-11 22:22:10 -07:00
|
Cedric Nugteren | b0f3659121 | The master branch is now the main 'development' branch | 2017-05-03 19:49:15 +02:00
|
Cedric Nugteren | e3bb58f602 | Finalized support for performance testing against cuBLAS | 2017-04-16 17:53:51 +02:00
|
Cedric Nugteren | fa5c4b00b7 | Replaced the R graph scripts with Python/Matplotlib benchmark scripts | 2017-03-26 15:36:34 +02:00
|
Cedric Nugteren | 7b8f8fce68 | Added initial naive version of the batched GEMM routine based on the direct GEMM kernel | 2017-03-11 16:02:45 +01:00
|
Cedric Nugteren | e9ef037549 | Added tuning results for the Radeon HD6750M GPU (Apple OpenCL) | 2017-03-04 15:24:55 +01:00
|
Cedric Nugteren | 4284fcd940 | Updated the README documentation | 2017-02-26 16:32:53 +01:00
|
Cedric Nugteren | ea6790665d | Merge branch 'development' into triangular_solvers | 2017-02-26 14:51:45 +01:00
|
Cedric Nugteren | b7310036ed | Removed half-precision support from the TRSM routine; too unstable | 2017-02-26 12:56:21 +01:00
|
Cedric Nugteren | ccac957f17 | Added documentation for the TRSV and TRSM routines | 2017-02-25 13:02:15 +01:00
|
Cedric Nugteren | 0643a29af5 | Added tuning parameters for the AMD RX480 GPU (Ellesmere) | 2017-02-18 13:59:10 +01:00
|
Cedric Nugteren | 2e0951c6dc | Fixed small typo in the documentation | 2017-02-18 11:05:54 +01:00
|
Cedric Nugteren | fef11a208c | Added documentation for the OverrideParameters function | 2017-02-18 11:02:57 +01:00
|
Cedric Nugteren | dc93523204 | Added tuning results for Titan X (Pascal version) | 2017-02-08 21:14:38 +01:00
|
Cedric Nugteren | 2e4f6e1609 | Added tuning results for NVIDIA GTX 1080 and Intel Core i7-4790K | 2017-01-19 19:42:31 +01:00
|
Cedric Nugteren | 32b850b12b | Added tuning results for the AMD Turks GPU and the Intel Core i7-2670QM CPU | 2017-01-03 20:30:56 +01:00
|
Cedric Nugteren | 2cf7d8429a | Updated to version 0.10.0 | 2016-11-27 13:34:18 +01:00
|
Cedric Nugteren | 39c49bf4f9 | Made it possible to use the command-line environmental variables for each executable and without re-running CMake | 2016-11-27 11:00:29 +01:00
|
Cedric Nugteren | fa42befcc1 | Made compilation of the Netlib CBLAS API conditional | 2016-11-23 21:33:35 +01:00
|
Cedric Nugteren | bb14a5880e | Added an example and documentation for the Netlib CBLAS API | 2016-10-25 20:37:33 +02:00
|
Cedric Nugteren | 0f5bf35ebe | Updated list of acknowledgments and thanks | 2016-10-24 19:54:45 +02:00
|
Cedric Nugteren | ec687afa75 | Added tuning results for GeForce GTX TITAN Black | 2016-10-24 19:49:10 +02:00
|
Cedric Nugteren | 43f4f02399 | Added an initial version of contributing guidelines | 2016-10-23 16:56:51 +02:00
|
Cedric Nugteren | c925fe463f | Added tuning results for the AMD Tonga GPU | 2016-10-22 16:25:31 +02:00
|
Cedric Nugteren | c8d0e41e84 | Added the possibility to supply the env-variable CLBLAST_TEST_ARGUMENTS to specify options for the make alltest or ctest targets | 2016-10-20 23:05:16 +02:00
|
Cedric Nugteren | 53deed298f | Added documentation and minor refactoring for the recent support of static library compilation | 2016-10-15 17:11:08 +02:00
|
Cedric Nugteren | ebb505b783 | Added tuning results for Intel HD Graphics IvyBridge GPU | 2016-10-13 12:18:28 +02:00
|
Cedric Nugteren | 8a9d3cdf37 | Added support for compiling the library, the client, and the samples under MSVC 2013 | 2016-10-10 22:45:39 +02:00
|
Cedric Nugteren | d59e5c570b | Added an option to run tuned kernels multiple times to average execution times; requires CLTune 2.5.0 | 2016-09-27 21:03:24 +02:00
|