Commit graph

675 commits

Author SHA1 Message Date
Cedric Nugteren 9331442a56 Merge branch 'development' into netlib_blas_api 2016-10-16 11:43:03 +02:00
Cedric Nugteren 53deed298f Added documentation and minor refactoring for the recent support of static library compilation 2016-10-15 17:11:08 +02:00
Cedric Nugteren a63f57297b Merge pull request #115 from shehzan10/development: Fixes for static lib compilation on Windows 2016-10-15 09:48:03 +02:00
Shehzan Mohammed 0d958bf3b3 Fixes for static lib compilation on Windows 2016-10-14 18:45:34 -04:00
Cedric Nugteren c0482ace6c Fixed a bug where clblas.h couldn't be found for the performance tests (clients) 2016-10-14 22:11:35 +02:00
Cedric Nugteren 0f9311d46a Fixed an issue with a growing database: the database is now a global variable in a namespace and its container uses const-pointers to the actual data 2016-10-14 20:56:32 +02:00
Cedric Nugteren 3386ad49c4 Set proper flags for the verbose mode (debug flags) 2016-10-14 20:54:05 +02:00
Cedric Nugteren 99a620f9a1 Merge pull request #112 from shehzan10/static: Add option to build shared or static library 2016-10-14 10:06:44 +02:00
Shehzan Mohammed 56f07e42b1 Add option to build shared or static library 2016-10-13 12:03:44 -04:00
Cedric Nugteren ebb505b783 Added tuning results for Intel HD Graphics IvyBridge GPU 2016-10-13 12:18:28 +02:00
Cedric Nugteren 541415374f Merge pull request #108 from CNugteren/msvc2013: Support for Visual Studio 2013 2016-10-13 08:34:07 +02:00
Cedric Nugteren c60f6715f8 Removed a spurious #ifdef 2016-10-12 21:49:59 +02:00
Cedric Nugteren ad2b6ecea2 Fixed missing line ending 2016-10-12 21:10:22 +02:00
Cedric Nugteren 8a9d3cdf37 Added support for compiling the library, the client, and the samples under MSVC 2013 2016-10-10 22:45:39 +02:00
Cedric Nugteren f88c50522d Fixed an issue with const members of structs in the database 2016-10-10 22:24:05 +02:00
Cedric Nugteren de77f00e8c Fixed an issue with the length of the GEMM OpenCL string for both MSVC 2013 and 2015 2016-10-10 22:23:33 +02:00
Cedric Nugteren fcac81bfef First fixes towards compilation on Visual Studio 2013 2016-10-10 20:37:45 +02:00
Cedric Nugteren 39afc9543b Changed the storage location of the database to a separate Github repository 2016-10-10 19:10:12 +02:00
Cedric Nugteren 71f5c0c145 Changed the license to MIT 2016-10-10 18:07:17 +02:00
Cedric Nugteren 42ee4abbbc Updated the performance graphs for Intel Iris Pro GPU and AMD Radeon M370X GPU 2016-10-10 18:07:05 +02:00
Cedric Nugteren f563341e7b Added fresh performance graphs for GeForce 750Ti; removed old GTX480 results 2016-10-10 16:59:28 +02:00
Cedric Nugteren 08ee57f494 Updated the tuning results for the GTX 750 Ti GPU 2016-10-10 16:41:41 +02:00
Cedric Nugteren 2194dee217 Merge branch 'gemm_direct' into development 2016-10-10 16:05:18 +02:00
Cedric Nugteren 7c228f6a67 Changed the thresholds for the direct/indirect GEMM kernels for NVIDIA and Intel GPUs 2016-10-10 16:01:02 +02:00
Cedric Nugteren d7cfb6aa9b Added benchmark script for small matrix sizes, testing the direct GEMM kernels 2016-10-08 22:05:54 +02:00
Cedric Nugteren 7baac46e72 Fixed a performance bug for Intel Iris Pro GPUs due to incorrect tuning results 2016-10-08 21:56:06 +02:00
Cedric Nugteren b698e45478 Added first tuning results for the single-kernel direct GEMM implementation 2016-10-06 21:13:14 +02:00
Cedric Nugteren a3e67f2be2 Added a kernel selection database to select between the direct and indirect GEMM kernels 2016-10-06 19:51:12 +02:00
Cedric Nugteren 8d5747aa54 Made non-standard types void-pointers in the Netlib BLAS interface 2016-10-05 08:23:54 +02:00
Cedric Nugteren a17b714c3e Added first version of Netlib BLAS API header 2016-10-05 00:09:39 +02:00
Cedric Nugteren 7052a00a3e Fixed a const-correctness issue with complex conjugation in the GEMM direct kernel 2016-10-03 20:13:19 +02:00
Cedric Nugteren ca0c075de2 Added functions to load from off-chip to local memory without vector loads for the GEMM direct kernels 2016-10-03 20:09:15 +02:00
Cedric Nugteren c1c4bc5d20 Re-organised GEMM direct kernel and added faster fall-back version for incomplete rectangles 2016-10-03 19:32:01 +02:00
Cedric Nugteren 243cef73db Set the default number of runs for all kernels to at least 2 runs 2016-10-02 21:23:23 +02:00
Cedric Nugteren d8827e908c Specialised the GEMM direct kernel in four ways for transposing/non-transposing: NN, NT, TN, TT 2016-10-02 17:59:05 +02:00
Cedric Nugteren 61f489e370 Split the GEMM direct kernel into two files; set the default tuning target to 256-256-256 2016-10-02 15:06:59 +02:00
Cedric Nugteren a459920105 Added padding to the local memory of the GEMM direct kernel 2016-10-01 16:58:53 +02:00
Cedric Nugteren ecc704cc76 Added default num-runs to the tuner adding averaging over 10 runs as a default for the GEMM direct kernel 2016-10-01 16:55:21 +02:00
Cedric Nugteren a9d35cf04c Merge branch 'development' into gemm_direct 2016-10-01 13:45:08 +02:00
Cedric Nugteren d59e5c570b Added an option to run tuned kernels multiple times to average execution times; requires CLTune 2.5.0 2016-09-27 21:03:24 +02:00
Cedric Nugteren db5772e521 Updated to version 8.0 of the CLCudaAPI header 2016-09-27 20:56:49 +02:00
Cedric Nugteren adc058440c Fixed the local memory size computation for the GEMM tuners 2016-09-27 20:03:55 +02:00
Cedric Nugteren 6178fcd584 Now generates test/client/tuner data using a fixed seed to enable reproducibility of results 2016-09-27 19:55:21 +02:00
Cedric Nugteren e3076d26cc Added more relaxed error checking for the half-precision tests 2016-09-27 19:42:58 +02:00
Cedric Nugteren a2bfae3c46 Merge pull request #103 from dividiti/link_clblas_with_pthread: Link clBLAS together with pthread 2016-09-27 08:53:08 +02:00
Anton Lokhmotov c484bb26b6 Use cross-platform thread lib idiom instead of *nix-specific pthread. 2016-09-26 21:04:28 +00:00
Anton Lokhmotov c20a5bb7ca Link clBLAS together with pthread. 2016-09-26 10:30:18 +00:00
Cedric Nugteren 73d135c2ce Added a first version of a tuner for the GEMM direct kernel; collapsed MWGD, NWGD and KWGD into one WGD parameter 2016-09-25 14:48:34 +02:00
Cedric Nugteren 669f43aed6 Separated the tuning parameters of the new direct GEMM kernel from the indirect version 2016-09-25 13:52:08 +02:00
Cedric Nugteren 140dc12854 Added a first version of the direct version of GEMM with local memory 2016-09-25 11:38:35 +02:00