496 commits

Author SHA1 Message Date
Cedric Nugteren b698e45478 Added first tuning results for the single-kernel direct GEMM implementation 2016-10-06 21:13:14 +02:00
Cedric Nugteren a3e67f2be2 Added a kernel selection database to select between the direct and indirect GEMM kernels 2016-10-06 19:51:12 +02:00
Cedric Nugteren 7052a00a3e Fixed a const-correctness issue with complex conjugation in the GEMM direct kernel 2016-10-03 20:13:19 +02:00
Cedric Nugteren ca0c075de2 Added functions to load from off-chip to local memory without vector loads for the GEMM direct kernels 2016-10-03 20:09:15 +02:00
Cedric Nugteren c1c4bc5d20 Re-organised GEMM direct kernel and added faster fall-back version for incomplete rectangles 2016-10-03 19:32:01 +02:00
Cedric Nugteren 243cef73db Set the default number of runs for all kernels to at least 2 runs 2016-10-02 21:23:23 +02:00
Cedric Nugteren d8827e908c Specialised the GEMM direct kernel in four ways for transposing/non-transposing: NN, NT, TN, TT 2016-10-02 17:59:05 +02:00
Cedric Nugteren 61f489e370 Split the GEMM direct kernel into two files; set the default tuning target to 256-256-256 2016-10-02 15:06:59 +02:00
Cedric Nugteren a459920105 Added padding to the local memory of the GEMM direct kernel 2016-10-01 16:58:53 +02:00
Cedric Nugteren ecc704cc76 Added a default num-runs setting to the tuner; the GEMM direct kernel now averages over 10 runs by default 2016-10-01 16:55:21 +02:00
Cedric Nugteren a9d35cf04c Merge branch 'development' into gemm_direct 2016-10-01 13:45:08 +02:00
Cedric Nugteren d59e5c570b Added an option to run tuned kernels multiple times to average execution times; requires CLTune 2.5.0 2016-09-27 21:03:24 +02:00
Cedric Nugteren db5772e521 Updated to version 8.0 of the CLCudaAPI header 2016-09-27 20:56:49 +02:00
Cedric Nugteren adc058440c Fixed the local memory size computation for the GEMM tuners 2016-09-27 20:03:55 +02:00
Cedric Nugteren 6178fcd584 Now generates test/client/tuner data using a fixed seed to enable reproducibility of results 2016-09-27 19:55:21 +02:00
Cedric Nugteren e3076d26cc Added more relaxed error checking for the half-precision tests 2016-09-27 19:42:58 +02:00
Cedric Nugteren a2bfae3c46 Merge pull request #103 from dividiti/link_clblas_with_pthread (Link clBLAS together with pthread) 2016-09-27 08:53:08 +02:00
Anton Lokhmotov c484bb26b6 Use cross-platform thread lib idiom instead of *nix-specific pthread. 2016-09-26 21:04:28 +00:00
Anton Lokhmotov c20a5bb7ca Link clBLAS together with pthread. 2016-09-26 10:30:18 +00:00
Cedric Nugteren 73d135c2ce Added a first version of a tuner for the GEMM direct kernel; collapsed MWGD, NWGD and KWGD into one WGD parameter 2016-09-25 14:48:34 +02:00
Cedric Nugteren 669f43aed6 Separated the tuning parameters of the new direct GEMM kernel from the indirect version 2016-09-25 13:52:08 +02:00
Cedric Nugteren 140dc12854 Added a first version of the direct version of GEMM with local memory 2016-09-25 11:38:35 +02:00
Cedric Nugteren 8a5ce05022 Fix another issue with the packaging in the AppVeyor script 2016-09-25 10:32:12 +02:00
Cedric Nugteren 08abb7dfa4 Fix an issue with the packaging in the AppVeyor script 2016-09-25 10:20:47 +02:00
Cedric Nugteren a594067758 Updated AppVeyor script to fix an issue with changes in the latest AppVeyor servers 2016-09-25 10:10:42 +02:00
Cedric Nugteren c712fd4cb1 Merge pull request #101 from dividiti/add_ref_includes_to_test_correctness_common (Add path to ref library header when building tests.) 2016-09-24 15:26:08 +02:00
Anton Lokhmotov 750f185ba9 Add path to ref library header when building tests. 2016-09-24 11:46:34 +00:00
Cedric Nugteren d595a8ed7e Fixed a bug that waited for an invalid event after an unsuccessful CLBlast call in the tests and samples 2016-09-22 20:47:22 +02:00
Cedric Nugteren 6aa652d6ea Merge branch 'development' into gemm_direct 2016-09-21 21:32:18 +02:00
Cedric Nugteren b1929d8ce7 It is now possible to set the OpenCL compiler options through an environment variable 2016-09-21 21:22:16 +02:00
Cedric Nugteren 63003a1429 Merge branch 'master' into development 2016-09-21 20:57:23 +02:00
Cedric Nugteren d13a98272b Merge pull request #100 from gpu/master (Fixed link in README.md) 2016-09-20 21:47:15 +02:00
Marco Hutter 9b0f6238b3 Fixed link in README.md (The GitHub link could be https://github.com/gpu (without "s"), but the website should be OK, too) 2016-09-20 18:03:57 +02:00
Cedric Nugteren f07ac22f5b Merge pull request #99 from CNugteren/development (Update to version 0.9.0) 2016-09-13 21:14:51 +02:00
Cedric Nugteren 4b94afda94 Updated to version 0.9.0 2016-09-13 19:20:39 +02:00
Cedric Nugteren 48ab0428cb Renamed the DEFAULT_DEVICE and DEFAULT_PLATFORM env variables to be in line with recent usages of CLBLAST_DEVICE and CLBLAST_PLATFORM 2016-09-13 19:08:49 +02:00
Cedric Nugteren d7305346ca Merge pull request #98 from intelfx/no-ignored-attributes (CMakeLists.txt: use -Wno-ignored-attributes to silence unfixable warnings) 2016-09-13 17:58:12 +02:00
Ivan Shapovalov 9095537a6a CMakeLists.txt: use -Wno-ignored-attributes to silence unfixable warnings 2016-09-13 16:12:30 +03:00
Cedric Nugteren 4ce584a014 Split the XGEMM kernel up further: now in 3 parts. This is done because MSVC can't handle long strings 2016-09-12 22:13:16 +02:00
Cedric Nugteren 9fb7a0efe1 Merge branch 'database_rewrite' into development 2016-09-12 20:16:18 +02:00
Cedric Nugteren aa3dffe356 Added XgemvFastRot and Xgemm 16-bit tuning results: just defaults which are now automatically taken from 32-bit if there are no entries at all 2016-09-12 20:13:38 +02:00
Cedric Nugteren b5a67f86ec Complete re-write of the database script. Replaced Pandas with the much faster and more convenient plain JSON/dict data type 2016-09-11 21:29:28 +02:00
Cedric Nugteren 94163970ae Merge branch 'xgemm_tuner_exhaustive' into development 2016-09-10 14:01:21 +02:00
Cedric Nugteren e21f32bc99 Updated database based on exhaustive tuning results for GEMM for the R9 M370X GPU 2016-09-10 14:00:43 +02:00
Cedric Nugteren 3daba70997 Updated the database script to remove duplicate entries: keeps only the best-performing cases for a specific parameter combination 2016-09-10 11:12:09 +02:00
Cedric Nugteren 55038d3c91 Split GEMM tuning in two parts: a small set of tuning parameters which is explored exhaustively and a larger set which is explored randomly 2016-09-06 20:30:06 +02:00
Cedric Nugteren a2f8350703 Refactored the Python C++ generator script; now conforms to the PEP8 style guide 2016-09-04 21:26:30 +02:00
Cedric Nugteren b30b26b89e The GEMM kernel no longer adds beta*C in case beta is zero; this would cause problems if C contains NaNs 2016-09-04 17:21:16 +02:00
Cedric Nugteren 521bf6cdfc Added tuning results for Intel Broadwell 5500 GT2 GPU 2016-09-03 16:43:23 +02:00
Cedric Nugteren 19574b2519 Updated tuning results for Haswell GT2 Mobile GPU; fixed database script to handle duplicate entries of different runs 2016-09-03 12:45:11 +02:00