Commit graph

506 commits

Author SHA1 Message Date
Cedric Nugteren fcac81bfef First fixes towards compilation on Visual Studio 2013 2016-10-10 20:37:45 +02:00
Cedric Nugteren 39afc9543b Changed the storage location of the database to a separate Github repository 2016-10-10 19:10:12 +02:00
Cedric Nugteren 71f5c0c145 Changed the license to MIT 2016-10-10 18:07:17 +02:00
Cedric Nugteren 42ee4abbbc Updated the performance graphs for Intel Iris Pro GPU and AMD Radeon M370X GPU 2016-10-10 18:07:05 +02:00
Cedric Nugteren f563341e7b Added fresh performance graphs for GeForce 750Ti; removed old GTX480 results 2016-10-10 16:59:28 +02:00
Cedric Nugteren 08ee57f494 Updated the tuning results for the GTX 750 Ti GPU 2016-10-10 16:41:41 +02:00
Cedric Nugteren 2194dee217 Merge branch 'gemm_direct' into development 2016-10-10 16:05:18 +02:00
Cedric Nugteren 7c228f6a67 Changed the thresholds for the direct/indirect GEMM kernels for NVIDIA and Intel GPUs 2016-10-10 16:01:02 +02:00
Cedric Nugteren d7cfb6aa9b Added benchmark script for small matrix sizes, testing the direct GEMM kernels 2016-10-08 22:05:54 +02:00
Cedric Nugteren 7baac46e72 Fixed a performance bug for Intel Iris Pro GPUs due to incorrect tuning results 2016-10-08 21:56:06 +02:00
Cedric Nugteren b698e45478 Added first tuning results for the single-kernel direct GEMM implementation 2016-10-06 21:13:14 +02:00
Cedric Nugteren a3e67f2be2 Added a kernel selection database to select between the direct and indirect GEMM kernels 2016-10-06 19:51:12 +02:00
Cedric Nugteren 7052a00a3e Fixed a const-correctness issue with complex conjugation in the GEMM direct kernel 2016-10-03 20:13:19 +02:00
Cedric Nugteren ca0c075de2 Added functions to load from off-chip to local memory without vector loads for the GEMM direct kernels 2016-10-03 20:09:15 +02:00
Cedric Nugteren c1c4bc5d20 Re-organised GEMM direct kernel and added faster fall-back version for incomplete rectangles 2016-10-03 19:32:01 +02:00
Cedric Nugteren 243cef73db Set the default number of runs for all kernels to at least 2 runs 2016-10-02 21:23:23 +02:00
Cedric Nugteren d8827e908c Specialised the GEMM direct kernel in four ways for transposing/non-transposing: NN, NT, TN, TT 2016-10-02 17:59:05 +02:00
Cedric Nugteren 61f489e370 Split the GEMM direct kernel into two files; set the default tuning target to 256-256-256 2016-10-02 15:06:59 +02:00
Cedric Nugteren a459920105 Added padding to the local memory of the GEMM direct kernel 2016-10-01 16:58:53 +02:00
Cedric Nugteren ecc704cc76 Added default num-runs to the tuner, adding averaging over 10 runs as a default for the GEMM direct kernel 2016-10-01 16:55:21 +02:00
Cedric Nugteren a9d35cf04c Merge branch 'development' into gemm_direct 2016-10-01 13:45:08 +02:00
Cedric Nugteren d59e5c570b Added an option to run tuned kernels multiple times to average execution times; requires CLTune 2.5.0 2016-09-27 21:03:24 +02:00
Cedric Nugteren db5772e521 Updated to version 8.0 of the CLCudaAPI header 2016-09-27 20:56:49 +02:00
Cedric Nugteren adc058440c Fixed the local memory size computation for the GEMM tuners 2016-09-27 20:03:55 +02:00
Cedric Nugteren 6178fcd584 Now generates test/client/tuner data using a fixed seed to enable reproducibility of results 2016-09-27 19:55:21 +02:00
Cedric Nugteren e3076d26cc Added more relaxed error checking for the half-precision tests 2016-09-27 19:42:58 +02:00
Cedric Nugteren a2bfae3c46 Merge pull request #103 from dividiti/link_clblas_with_pthread (Link clBLAS together with pthread) 2016-09-27 08:53:08 +02:00
Anton Lokhmotov c484bb26b6 Use cross-platform thread lib idiom instead of *nix-specific pthread. 2016-09-26 21:04:28 +00:00
Anton Lokhmotov c20a5bb7ca Link clBLAS together with pthread. 2016-09-26 10:30:18 +00:00
Cedric Nugteren 73d135c2ce Added a first version of a tuner for the GEMM direct kernel; collapsed MWGD, NWGD and KWGD into one WGD parameter 2016-09-25 14:48:34 +02:00
Cedric Nugteren 669f43aed6 Separated the tuning parameters of the new direct GEMM kernel from the indirect version 2016-09-25 13:52:08 +02:00
Cedric Nugteren 140dc12854 Added a first version of the direct version of GEMM with local memory 2016-09-25 11:38:35 +02:00
Cedric Nugteren 8a5ce05022 Fix another issue with the packaging in the AppVeyor script 2016-09-25 10:32:12 +02:00
Cedric Nugteren 08abb7dfa4 Fix an issue with the packaging in the AppVeyor script 2016-09-25 10:20:47 +02:00
Cedric Nugteren a594067758 Updated AppVeyor script to fix an issue with changes in the latest AppVeyor servers 2016-09-25 10:10:42 +02:00
Cedric Nugteren c712fd4cb1 Merge pull request #101 from dividiti/add_ref_includes_to_test_correctness_common (Add path to ref library header when building tests.) 2016-09-24 15:26:08 +02:00
Anton Lokhmotov 750f185ba9 Add path to ref library header when building tests. 2016-09-24 11:46:34 +00:00
Cedric Nugteren d595a8ed7e Fixed a bug waiting for an invalid event in case of an unsuccessful CLBlast call in the tests and samples 2016-09-22 20:47:22 +02:00
Cedric Nugteren 6aa652d6ea Merge branch 'development' into gemm_direct 2016-09-21 21:32:18 +02:00
Cedric Nugteren b1929d8ce7 It is now possible to set the OpenCL compiler options through an environment variable 2016-09-21 21:22:16 +02:00
Cedric Nugteren 63003a1429 Merge branch 'master' into development 2016-09-21 20:57:23 +02:00
Cedric Nugteren d13a98272b Merge pull request #100 from gpu/master (Fixed link in README.md) 2016-09-20 21:47:15 +02:00
Marco Hutter 9b0f6238b3 Fixed link in README.md. The GitHub link could be https://github.com/gpu (without "s"), but the website should be OK, too 2016-09-20 18:03:57 +02:00
Cedric Nugteren f07ac22f5b Merge pull request #99 from CNugteren/development (Update to version 0.9.0) 2016-09-13 21:14:51 +02:00
Cedric Nugteren 4b94afda94 Updated to version 0.9.0 2016-09-13 19:20:39 +02:00
Cedric Nugteren 48ab0428cb Renamed the DEFAULT_DEVICE and DEFAULT_PLATFORM env variables to be in line with recent usages of CLBLAST_DEVICE and CLBLAST_PLATFORM 2016-09-13 19:08:49 +02:00
Cedric Nugteren d7305346ca Merge pull request #98 from intelfx/no-ignored-attributes (CMakeLists.txt: use -Wno-ignored-attributes to silence unfixable warnings) 2016-09-13 17:58:12 +02:00
Ivan Shapovalov 9095537a6a CMakeLists.txt: use -Wno-ignored-attributes to silence unfixable warnings 2016-09-13 16:12:30 +03:00
Cedric Nugteren 4ce584a014 Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC can't handle long strings 2016-09-12 22:13:16 +02:00
Cedric Nugteren 9fb7a0efe1 Merge branch 'database_rewrite' into development 2016-09-12 20:16:18 +02:00