Commit graph

143 commits

Author SHA1 Message Date
Cedric Nugteren 11f4c7dd93 Added documentation on the convgemm routine 2019-01-19 15:44:19 +01:00
Cedric Nugteren 4676ec2921 Added a FAQ document 2018-12-01 17:19:28 +01:00
Cedric Nugteren 2d32a23293 Added new col2im routine to the documentation 2018-11-01 21:46:19 +01:00
Cedric Nugteren d749c4af72 Added note about AMD southern islands GPU issue and the required workaround 2018-07-31 20:55:56 +02:00
Cedric Nugteren 123f38a8ab Added Beignet 1.2.1 requirement to the README for IvyBridge GPUs 2018-07-31 20:52:00 +02:00
Cedric Nugteren a8bb0c9f3c Added Apple OpenCL TRSV block size override; removed failing old Intel GPU test from README 2018-05-29 21:29:12 +02:00
Cedric Nugteren e3022e562f Updated README with IWOCL talk and GPU zoo acknowledgment 2018-05-17 12:50:28 +02:00
Umar Arshad 1659ae5432 Update ci links to use domain names and build names instead of IP/id: Updates the README badges to point to the domain name instead of IP addresses. Also updates the names of the builds to the name of the build instead of the id of the build. 2018-05-08 20:24:40 -04:00
Cedric Nugteren 8b381480f8 Updated README with new badges and paper citation 2018-05-01 20:51:10 +02:00
Cedric Nugteren 49b02ec194 Added initial glossary 2018-03-10 17:02:38 +01:00
Cedric Nugteren 86455841d1 Added badge for OSX-Intel-CPU builds 2018-03-10 16:49:36 +01:00
Cedric Nugteren 269bddbf34 Fixed the buildbot badges in the README 2018-03-03 13:12:09 +01:00
Cedric Nugteren 1aef354577 Updated documentation and build badges 2018-03-03 10:57:06 +01:00
Cedric Nugteren ced830539e Split the documentation and updated where needed 2018-02-24 21:11:28 +01:00
Cedric Nugteren 69ed46c8da Implemented the XHAD Hadamard product routine 2018-02-02 21:18:37 +01:00
Cedric Nugteren 97e92cb10c Updated the known issues 2018-01-28 14:50:03 +01:00
Cedric Nugteren b4c8e1d9a5 Made plotting script more flexible: extra argument to set the comparison library 2017-12-31 16:02:46 +01:00
Cedric Nugteren e81eb4f6d4 Added a note that the ArrayFire Jenkins servers are down, being switched to buildbot 2017-12-24 11:32:31 +01:00
Cedric Nugteren 0ee81e27b9 Added tuning results for Apple AMD Radeon Pro 580 2017-12-20 19:59:31 +01:00
Cedric Nugteren 35e2b3ed5b Updated the known issues 2017-12-16 12:11:15 +01:00
Cedric Nugteren abb4d5ab32 Added tuning results for ARM Mali T760 GPU 2017-11-24 21:16:54 +01:00
Cedric Nugteren d9cf206979 Removed dependency on CLTune 2017-11-16 21:28:36 +01:00
Cedric Nugteren c41d219ea4 Added tuning results for the GeForce GTX750Ti 2017-11-09 21:19:21 +01:00
Cedric Nugteren 5d5e3f93bc Updated to CLBlast version 1.2.0 2017-11-08 21:30:06 +01:00
Cedric Nugteren d24138808b Fixed an FP16 issue in the homatcopy test; added a comment about improper testing of integer returning functions for FP16 2017-11-08 21:20:07 +01:00
Cedric Nugteren b18cc9d3f1 Merge pull request #212 from CNugteren/kernel_selection_tuner: GEMM kernel selection tuner 2017-11-07 22:20:13 +01:00
Cedric Nugteren 9b0a435fb0 Integrated the GEMM routine tuner for kernel selection; added first tuning results 2017-11-02 21:47:14 +01:00
Cedric Nugteren f24d611e57 Made it possible to compile the CLBlast performance clients for Android with the NDK 2017-10-29 13:02:14 +01:00
Cedric Nugteren 319762f150 Added Android support using the GNU C++ STL library and the GCC toolchain 2017-10-29 12:07:07 +01:00
Cedric Nugteren 12b08ae491 Merge branch 'master' into android_support 2017-10-28 17:32:37 +02:00
Cedric Nugteren 5fd1f2fc60 Added first version of a roadmap 2017-10-20 18:21:31 +02:00
Cedric Nugteren 472f90501c Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570 2017-10-20 18:06:12 +02:00
Cedric Nugteren 03760f80eb Added CUDA API documentation 2017-10-16 21:54:42 +02:00
Cedric Nugteren f4c4674cf6 Updated to version 1.1.0 2017-09-30 17:19:17 +02:00
Cedric Nugteren 2949e156f5 Added notes for Android compilation of CLBlast 2017-09-26 21:23:53 +02:00
Cedric Nugteren a23cd8d13a Updated README with proper AMD device names; fixed device look-up for names of length 50+ 2017-09-16 21:26:38 +02:00
Cedric Nugteren 4d9d03ba51 Completed im2col implementation 2017-08-24 21:11:12 +02:00
Cedric Nugteren 18d832e149 Added tuning results for the Qualcomm Adreno 330 GPU 2017-07-30 18:18:02 +02:00
Cedric Nugteren b7473f50df Added status badges for correctness tests; updated list of contributors; fixed minor typos 2017-07-24 20:14:47 +02:00
Cedric Nugteren b8df03e5bc Added CLBlast paper and presentation references in README 2017-06-25 20:45:14 +02:00
Cedric Nugteren 48f2682eb7 Added tuning results for the Core i7-920 CPU 2017-06-18 20:53:59 +02:00
Cedric Nugteren 33ed1e5a06 Added tuning results for GeForce GT 650M (thanks to bzcheeseman) 2017-06-01 22:52:08 +02:00
Cedric Nugteren f151e56daa Added the IxAMIN routines: absolute minimum version of IxAMAX 2017-05-12 20:01:33 -07:00
Cedric Nugteren 81d9ed3946 Removed the included performance reports; README now redirects to the new external website 2017-05-12 13:18:10 -07:00
Cedric Nugteren 71933c3411 Added tuning results for the AMD Radeon Fiji GPU 2017-05-11 22:53:52 -07:00
Cedric Nugteren d67455fdb8 Fixes the build-status table in the README 2017-05-11 22:22:10 -07:00
Cedric Nugteren b0f3659121 The master branch is now the main 'development' branch 2017-05-03 19:49:15 +02:00
Cedric Nugteren e3bb58f602 Finalized support for performance testing against cuBLAS 2017-04-16 17:53:51 +02:00
Cedric Nugteren fa5c4b00b7 Replaced the R graph scripts with Python/Matplotlib benchmark scripts 2017-03-26 15:36:34 +02:00
Cedric Nugteren 7b8f8fce68 Added initial naive version of the batched GEMM routine based on the direct GEMM kernel 2017-03-11 16:02:45 +01:00