Commit graph

1395 commits

Author SHA1 Message Date
Cedric Nugteren 6616a59774
Merge pull request #287 from CNugteren/apple-opencl-limitations-fixes
Apple opencl limitations for TRSV/TRSM now return not-implemented status
2018-05-27 20:54:27 +02:00
Cedric Nugteren b3f6950af3
Merge pull request #286 from CNugteren/runtime_statistics_in_client
Runtime statistics in client
2018-05-27 19:27:52 +02:00
Cedric Nugteren 01d254c0b0 Added a check to return 'NotImplemented' error code in case of systems with < 16 LWGS for TRSV and TRSM 2018-05-27 18:38:47 +02:00
Cedric Nugteren 53198121ac Made FillMatrix and FillVector functions take a configurable local workgroup size 2018-05-27 12:03:32 +02:00
Cedric Nugteren 38318fa39c Added maximum time reporting to the client statistics 2018-05-27 11:39:51 +02:00
Cedric Nugteren c85c385aaf Added an option in the clients to output timing statistics: minimum, mean, and standard-deviation 2018-05-23 22:36:38 +02:00
Cedric Nugteren 838422fbb1 Further implemented single-kernel approach of convgemm; extended test to capture other parts of the kernel code 2018-05-21 11:47:16 +02:00
Cedric Nugteren 5d87abf780 Added method selection option to switch between im2col and single-kernel approach for convgemm 2018-05-21 11:28:11 +02:00
Cedric Nugteren 8e28a7699d
Merge pull request #285 from CNugteren/size_specific_routine_tuner
Added an option to run the routine tuner for a single specific GEMM size
2018-05-19 21:06:14 +02:00
Cedric Nugteren 37cabd4f1f Moved new convgemm kernel to levelx kernel folder 2018-05-19 21:05:45 +02:00
Cedric Nugteren 27b52ac2c8 Second version of direct reading from image tensor for convgemm: also with local memory support now 2018-05-19 21:02:44 +02:00
Cedric Nugteren cbcd4ff7e8 Merge branch 'master' into CLBlast-267-convgemm 2018-05-19 17:54:27 +02:00
Cedric Nugteren ba0b558e84 Added an option to run the routine tuner for a single specific GEMM size 2018-05-19 17:42:11 +02:00
Cedric Nugteren 507d7bc729
Merge pull request #284 from CNugteren/routine_tuners_read_kernel_json_from_disk
Routine tuners read kernel JSON from disk
2018-05-19 17:06:37 +02:00
Cedric Nugteren 76e0079a90 Fixed compilation issues 2018-05-19 14:18:23 +02:00
Cedric Nugteren 66583b3cda The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run part of alltuners target 2018-05-19 12:48:59 +02:00
Cedric Nugteren 637e49e134 Fixed a bug in loading xgemm-direct JSON data from disk 2018-05-19 12:48:04 +02:00
Cedric Nugteren 0326c7d559
Merge pull request #283 from CNugteren/canary_buffer_overflow_protection
Canary buffer overflow protection
2018-05-18 21:32:20 +02:00
Cedric Nugteren 60d057c7fd Merge branch 'master' into canary_buffer_overflow_protection 2018-05-18 21:30:11 +02:00
Cedric Nugteren a133563582
Merge pull request #282 from CNugteren/CLBlast-276-program-release-improvements
Better cache behaviour of OpenCL programs
2018-05-17 20:26:53 +02:00
Cedric Nugteren a65772cd30 Updated the roadmap 2018-05-17 12:52:23 +02:00
Cedric Nugteren e3022e562f Updated README with IWOCL talk and GPU zoo acknowledgment 2018-05-17 12:50:28 +02:00
Cedric Nugteren ad57a45039 Added documentation on some details of the GEMM implementation 2018-05-17 12:50:03 +02:00
Cedric Nugteren 8290ad78b9 Fixed a few issues with canary region testing 2018-05-17 12:16:32 +02:00
Cedric Nugteren 85341836dd Added a canary region for overflow detection to the correctness tests 2018-05-17 10:45:50 +01:00
Cedric Nugteren b855af681f Added a canary region for overflow detection to the tuners 2018-05-17 10:45:10 +01:00
Cedric Nugteren e057a9186a First version of direct reading from image tensor for convgemm: only for edge cases now 2018-05-17 09:23:28 +01:00
Cedric Nugteren 0cb9580042 Created a dedicated convgemm GEMM kernel as a copy of the batched direct gemm kernel 2018-05-13 22:10:21 +02:00
Cedric Nugteren ad8f1027ab Plugged in the code of strided-batched-gemm into convgemm in preparation of a new kernel 2018-05-13 21:01:46 +02:00
Cedric Nugteren 4e6d30088d Changed temporary convgemm implementation to use batched-strided GEMM 2018-05-09 20:38:39 +02:00
Cedric Nugteren b608280361 Fixed the performance client for convgemm and added GFLOPS measurements 2018-05-09 19:59:31 +02:00
Cedric Nugteren fa4fee4fee
Merge pull request #279 from umar456/ci_links
Update ci links to use domain names and build names instead of IP/id
2018-05-09 18:56:25 +02:00
Cedric Nugteren a4119531ee Updated the documentation for convgemm to include data layout (NCHW) 2018-05-09 17:46:27 +02:00
Cedric Nugteren cc95d4fa03 Implemented convolution as im2col + GEMM 2018-05-09 17:42:59 +02:00
Cedric Nugteren 52e6195628 Split channels/strides testing values off from kernel sizes for more flexibility 2018-05-09 17:23:55 +02:00
Umar Arshad 1659ae5432 Update ci links to use domain names and build names instead of IP/id
Updates the README badges to point to the domain name instead of
IP addresses. Also updates the names of the builds to the name
of the build instead of the id of the build.
2018-05-08 20:24:40 -04:00
Cedric Nugteren 2d1f6ba7fe Added convgemm skeleton, test infrastructure, and first reference implementation 2018-05-06 11:35:34 +02:00
Cedric Nugteren 2776d76176 Added interface of batched convolution as GEMM 2018-05-05 14:06:33 +02:00
Cedric Nugteren 8b381480f8 Updated README with new badges and paper citation 2018-05-01 20:51:10 +02:00
Cedric Nugteren 8258321a74 Now stores a shared_ptr to the Program class in the cache 2018-05-01 20:34:48 +02:00
Cedric Nugteren b2248a17ae
Merge pull request #277 from CNugteren/CLBlast-257-intel-subgroups
Intel subgroup shuffling
2018-04-29 15:48:35 +02:00
Cedric Nugteren 9f22bc232b Updated the changelog 2018-04-29 15:06:44 +02:00
Cedric Nugteren 0022107b2a Updated the roadmap 2018-04-29 15:06:33 +02:00
Cedric Nugteren 7b416c8686 Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program 2018-04-26 21:10:17 +02:00
Cedric Nugteren 2965b87dda Added Intel subgroup shuffle support to the 2D register caching GEMM kernel 2018-04-24 21:32:42 +02:00
Cedric Nugteren 2b1e0295e6 Added a define to enable subgroup shuffling if supported by the device 2018-04-24 20:41:15 +02:00
Cedric Nugteren 5d46a3193e
Merge pull request #274 from CNugteren/CLBlast-228-2d-register-gemm-kernel
Added 2D-register-caching GEMM kernel
2018-04-21 21:15:44 +02:00
Cedric Nugteren 3e3a26e0da Fixes for the CUDA API 2018-04-20 21:50:36 +02:00
Cedric Nugteren 458e6717a9 Expressed HER2K as two HERK calls 2018-04-18 20:58:29 +02:00
Cedric Nugteren dcce23d938 Expressed SYR2K as two SYRK calls 2018-04-18 20:29:28 +02:00