Commit graph

1196 commits

Author SHA1 Message Date
Tyler Sorensen 36093429fd restored some of the changed tuning files for xgemm 2018-07-11 15:31:51 -04:00
Tyler Sorensen 7f2e98a140 added inline ptx to support shuffle on Nvidia GPUs 2018-07-11 15:12:22 -04:00
Cedric Nugteren 7bae54f61f Updated changelog 2018-07-06 19:39:46 +02:00
Cedric Nugteren 49e06d20ab
Merge pull request #296 from alycm/CLBlast-291-eliminate-temporary-program
Eliminate a temporary Program object
2018-07-06 19:35:10 +02:00
Alastair Murray 25661b2d6f Eliminate a temporary Program object
This was causing a crash for me because the temporary Program destructor called
clReleaseProgram on the cl_program with Program, and then clBuildProgram was
called on the same cl_program (belonging to the Program owned by the
shared_ptr, but it's the same cl_program).
2018-07-06 12:58:20 +01:00
Cedric Nugteren 43e3f27254
Merge pull request #295 from CNugteren/CLBlast-292-no-cl-program-release-windows
Disabled calls to clReleaseProgram under Windows
2018-06-28 21:22:12 +09:00
Cedric Nugteren e3eedacbcc Disabled calls to clReleaseProgram under Windows to avoid segfaults when the OpenCL driver unloads first 2018-06-28 20:35:18 +09:00
Cedric Nugteren 4471b67735 Updated to CLBlast version 1.4.0 2018-06-03 13:18:05 +02:00
Cedric Nugteren fee8df153c Added list of tuners to be run by 'alltuners' target 2018-06-03 10:42:15 +02:00
Cedric Nugteren bd1715aff9 Fixes for CUDA version of CLBlast 2018-06-03 10:41:57 +02:00
Cedric Nugteren 4f594e3931 Added MKL as an alternative for CBLAS for correctness and performance comparisons 2018-06-02 17:57:45 +02:00
Cedric Nugteren 7c3431a72a Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when barriers are present 2018-06-01 20:59:44 +02:00
Cedric Nugteren 5702bff5ad Added error-checking for half-empty local work group sizes; fixed a minor TRSV global worksize issue 2018-05-31 22:37:06 +02:00
Cedric Nugteren e609220393 Some potential fixes for error -54 when launching TRSV and TRSM kernels 2018-05-31 20:09:49 +02:00
Cedric Nugteren ff4d5558a6 Widened Apple OpenCL check, added way to debug too-large-workgroups issue 2018-05-30 22:59:04 +02:00
Cedric Nugteren a8bb0c9f3c Added Apple OpenCL TRSV block size override; removed failing old Intel GPU test from README 2018-05-29 21:29:12 +02:00
Cedric Nugteren 6616a59774
Merge pull request #287 from CNugteren/apple-opencl-limitations-fixes
Apple opencl limitations for TRSV/TRSM now return not-implemented status
2018-05-27 20:54:27 +02:00
Cedric Nugteren b3f6950af3
Merge pull request #286 from CNugteren/runtime_statistics_in_client
Runtime statistics in client
2018-05-27 19:27:52 +02:00
Cedric Nugteren 01d254c0b0 Added a check to return 'NotImplemented' error code in case of systems with < 16 LWGS for TSRV and TRSM 2018-05-27 18:38:47 +02:00
Cedric Nugteren 53198121ac Made FillMatrix and FillVector functions take a configurable local workgroup size 2018-05-27 12:03:32 +02:00
Cedric Nugteren 38318fa39c Added maximum time reporting to the client statistics 2018-05-27 11:39:51 +02:00
Cedric Nugteren c85c385aaf Added an option in the clients to output timing statistics: minimum, mean, and standard-deviation 2018-05-23 22:36:38 +02:00
Cedric Nugteren 8e28a7699d
Merge pull request #285 from CNugteren/size_specific_routine_tuner
Added an option to run the routine tuner for a single specific GEMM size
2018-05-19 21:06:14 +02:00
Cedric Nugteren ba0b558e84 Added an option to run the routine tuner for a single specific GEMM size 2018-05-19 17:42:11 +02:00
Cedric Nugteren 507d7bc729
Merge pull request #284 from CNugteren/routine_tuners_read_kernel_json_from_disk
Routine tuners read kernel JSON from disk
2018-05-19 17:06:37 +02:00
Cedric Nugteren 76e0079a90 Fixed compilation issues 2018-05-19 14:18:23 +02:00
Cedric Nugteren 66583b3cda The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run part of alltuners target 2018-05-19 12:48:59 +02:00
Cedric Nugteren 637e49e134 Fixed a bug in loading xgemm-direct JSON data from disk 2018-05-19 12:48:04 +02:00
Cedric Nugteren 0326c7d559
Merge pull request #283 from CNugteren/canary_buffer_overflow_protection
Canary buffer overflow protection
2018-05-18 21:32:20 +02:00
Cedric Nugteren 60d057c7fd
Merge branch 'master' into canary_buffer_overflow_protection 2018-05-18 21:30:11 +02:00
Cedric Nugteren a133563582
Merge pull request #282 from CNugteren/CLBlast-276-program-release-improvements
Better cache behaviour of OpenCL programs
2018-05-17 20:26:53 +02:00
Cedric Nugteren a65772cd30 Updated the roadmap 2018-05-17 12:52:23 +02:00
Cedric Nugteren e3022e562f Updated README with IWOCL talk and GPU zoo acknowledgment 2018-05-17 12:50:28 +02:00
Cedric Nugteren ad57a45039 Added documentation on some details of the GEMM implementation 2018-05-17 12:50:03 +02:00
Cedric Nugteren 8290ad78b9 Fixed a few issues with canary region testing 2018-05-17 12:16:32 +02:00
Cedric Nugteren 85341836dd Added a canary region for overflow detection to the correctness tests 2018-05-17 10:45:50 +01:00
Cedric Nugteren b855af681f Added a canary region for overflow detection to the tuners 2018-05-17 10:45:10 +01:00
Cedric Nugteren fa4fee4fee
Merge pull request #279 from umar456/ci_links
Update ci links to use doman names and build names instead of IP/id
2018-05-09 18:56:25 +02:00
Umar Arshad 1659ae5432 Update ci links to use doman names and build names instead of IP/id
Updates the README badges to point to the domain name instead of
IP addresses. Also updates the names of the builds to the name
of the build instead of the id of the build.
2018-05-08 20:24:40 -04:00
Cedric Nugteren 8b381480f8 Updated README with new badges and paper citation 2018-05-01 20:51:10 +02:00
Cedric Nugteren 8258321a74 Now stores a shared_ptr to the Program class in the cache 2018-05-01 20:34:48 +02:00
Cedric Nugteren b2248a17ae
Merge pull request #277 from CNugteren/CLBlast-257-intel-subgroups
Intel subgroup shuffling
2018-04-29 15:48:35 +02:00
Cedric Nugteren 9f22bc232b Updated the changelog 2018-04-29 15:06:44 +02:00
Cedric Nugteren 0022107b2a Updated the roadmap 2018-04-29 15:06:33 +02:00
Cedric Nugteren 7b416c8686 Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program 2018-04-26 21:10:17 +02:00
Cedric Nugteren 2965b87dda Added Intel subgroup shuffle support to the 2D register caching GEMM kernel 2018-04-24 21:32:42 +02:00
Cedric Nugteren 2b1e0295e6 Added a define to enable subgroup shuffling if supported by the device 2018-04-24 20:41:15 +02:00
Cedric Nugteren 5d46a3193e
Merge pull request #274 from CNugteren/CLBlast-228-2d-register-gemm-kernel
Added 2D-register-caching GEMM kernel
2018-04-21 21:15:44 +02:00
Cedric Nugteren 3e3a26e0da Fixes for the CUDA API 2018-04-20 21:50:36 +02:00
Cedric Nugteren 458e6717a9 Expressed HER2K as two HERK calls 2018-04-18 20:58:29 +02:00