1190 commits

Author SHA1 Message Date
Cedric Nugteren e3eedacbcc Disabled calls to clReleaseProgram under Windows to avoid segfaults when the OpenCL driver unloads first 2018-06-28 20:35:18 +09:00
Cedric Nugteren 4471b67735 Updated to CLBlast version 1.4.0 2018-06-03 13:18:05 +02:00
Cedric Nugteren fee8df153c Added list of tuners to be run by 'alltuners' target 2018-06-03 10:42:15 +02:00
Cedric Nugteren bd1715aff9 Fixes for CUDA version of CLBlast 2018-06-03 10:41:57 +02:00
Cedric Nugteren 4f594e3931 Added MKL as an alternative for CBLAS for correctness and performance comparisons 2018-06-02 17:57:45 +02:00
Cedric Nugteren 7c3431a72a Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when barriers are present 2018-06-01 20:59:44 +02:00
Cedric Nugteren 5702bff5ad Added error-checking for half-empty local work group sizes; fixed a minor TRSV global worksize issue 2018-05-31 22:37:06 +02:00
Cedric Nugteren e609220393 Some potential fixes for error -54 when launching TRSV and TRSM kernels 2018-05-31 20:09:49 +02:00
Cedric Nugteren ff4d5558a6 Widened Apple OpenCL check, added way to debug too-large-workgroups issue 2018-05-30 22:59:04 +02:00
Cedric Nugteren a8bb0c9f3c Added Apple OpenCL TRSV block size override; removed failing old Intel GPU test from README 2018-05-29 21:29:12 +02:00
Cedric Nugteren 6616a59774
Merge pull request #287 from CNugteren/apple-opencl-limitations-fixes
Apple opencl limitations for TRSV/TRSM now return not-implemented status
2018-05-27 20:54:27 +02:00
Cedric Nugteren b3f6950af3
Merge pull request #286 from CNugteren/runtime_statistics_in_client
Runtime statistics in client
2018-05-27 19:27:52 +02:00
Cedric Nugteren 01d254c0b0 Added a check to return 'NotImplemented' error code in case of systems with < 16 LWGS for TRSV and TRSM 2018-05-27 18:38:47 +02:00
Cedric Nugteren 53198121ac Made FillMatrix and FillVector functions take a configurable local workgroup size 2018-05-27 12:03:32 +02:00
Cedric Nugteren 38318fa39c Added maximum time reporting to the client statistics 2018-05-27 11:39:51 +02:00
Cedric Nugteren c85c385aaf Added an option in the clients to output timing statistics: minimum, mean, and standard-deviation 2018-05-23 22:36:38 +02:00
Cedric Nugteren 8e28a7699d
Merge pull request #285 from CNugteren/size_specific_routine_tuner
Added an option to run the routine tuner for a single specific GEMM size
2018-05-19 21:06:14 +02:00
Cedric Nugteren ba0b558e84 Added an option to run the routine tuner for a single specific GEMM size 2018-05-19 17:42:11 +02:00
Cedric Nugteren 507d7bc729
Merge pull request #284 from CNugteren/routine_tuners_read_kernel_json_from_disk
Routine tuners read kernel JSON from disk
2018-05-19 17:06:37 +02:00
Cedric Nugteren 76e0079a90 Fixed compilation issues 2018-05-19 14:18:23 +02:00
Cedric Nugteren 66583b3cda The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run part of alltuners target 2018-05-19 12:48:59 +02:00
Cedric Nugteren 637e49e134 Fixed a bug in loading xgemm-direct JSON data from disk 2018-05-19 12:48:04 +02:00
Cedric Nugteren 0326c7d559
Merge pull request #283 from CNugteren/canary_buffer_overflow_protection
Canary buffer overflow protection
2018-05-18 21:32:20 +02:00
Cedric Nugteren 60d057c7fd Merge branch 'master' into canary_buffer_overflow_protection 2018-05-18 21:30:11 +02:00
Cedric Nugteren a133563582
Merge pull request #282 from CNugteren/CLBlast-276-program-release-improvements
Better cache behaviour of OpenCL programs
2018-05-17 20:26:53 +02:00
Cedric Nugteren a65772cd30 Updated the roadmap 2018-05-17 12:52:23 +02:00
Cedric Nugteren e3022e562f Updated README with IWOCL talk and GPU zoo acknowledgment 2018-05-17 12:50:28 +02:00
Cedric Nugteren ad57a45039 Added documentation on some details of the GEMM implementation 2018-05-17 12:50:03 +02:00
Cedric Nugteren 8290ad78b9 Fixed a few issues with canary region testing 2018-05-17 12:16:32 +02:00
Cedric Nugteren 85341836dd Added a canary region for overflow detection to the correctness tests 2018-05-17 10:45:50 +01:00
Cedric Nugteren b855af681f Added a canary region for overflow detection to the tuners 2018-05-17 10:45:10 +01:00
Cedric Nugteren fa4fee4fee
Merge pull request #279 from umar456/ci_links
Update ci links to use domain names and build names instead of IP/id
2018-05-09 18:56:25 +02:00
Umar Arshad 1659ae5432 Update ci links to use domain names and build names instead of IP/id
Updates the README badges to point to the domain name instead of
IP addresses. Also updates the names of the builds to the name
of the build instead of the id of the build.
2018-05-08 20:24:40 -04:00
Cedric Nugteren 8b381480f8 Updated README with new badges and paper citation 2018-05-01 20:51:10 +02:00
Cedric Nugteren 8258321a74 Now stores a shared_ptr to the Program class in the cache 2018-05-01 20:34:48 +02:00
Cedric Nugteren b2248a17ae
Merge pull request #277 from CNugteren/CLBlast-257-intel-subgroups
Intel subgroup shuffling
2018-04-29 15:48:35 +02:00
Cedric Nugteren 9f22bc232b Updated the changelog 2018-04-29 15:06:44 +02:00
Cedric Nugteren 0022107b2a Updated the roadmap 2018-04-29 15:06:33 +02:00
Cedric Nugteren 7b416c8686 Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program 2018-04-26 21:10:17 +02:00
Cedric Nugteren 2965b87dda Added Intel subgroup shuffle support to the 2D register caching GEMM kernel 2018-04-24 21:32:42 +02:00
Cedric Nugteren 2b1e0295e6 Added a define to enable subgroup shuffling if supported by the device 2018-04-24 20:41:15 +02:00
Cedric Nugteren 5d46a3193e
Merge pull request #274 from CNugteren/CLBlast-228-2d-register-gemm-kernel
Added 2D-register-caching GEMM kernel
2018-04-21 21:15:44 +02:00
Cedric Nugteren 3e3a26e0da Fixes for the CUDA API 2018-04-20 21:50:36 +02:00
Cedric Nugteren 458e6717a9 Expressed HER2K as two HERK calls 2018-04-18 20:58:29 +02:00
Cedric Nugteren dcce23d938 Expressed SYR2K as two SYRK calls 2018-04-18 20:29:28 +02:00
Cedric Nugteren ef6b1207df Updated HERK and SYRK to follow the GEMM style and functions to make it work with the new kernel 2018-04-17 21:13:28 +02:00
Cedric Nugteren 93610a9cba Fixed some failing tests for GEMM and batched GEMM routines 2018-04-15 12:53:32 +02:00
Cedric Nugteren f14e6f87d2 Updated tuning results for the Skylake ULT GT2 GPU with the new kernel 2018-04-15 11:45:45 +02:00
Cedric Nugteren 0dff7f1ac4 Made GEMM rotation expectations kernel-specific 2018-04-13 22:27:11 +02:00
Cedric Nugteren 0f49dd24e5 Updated database with defaults of GEMMK=0 and KREG=1 2018-04-10 21:26:18 +02:00