Commit graph

1322 commits

Author SHA1 Message Date
Cedric Nugteren 8290ad78b9 Fixed a few issues with canary region testing 2018-05-17 12:16:32 +02:00
Cedric Nugteren 85341836dd Added a canary region for overflow detection to the correctness tests 2018-05-17 10:45:50 +01:00
Cedric Nugteren b855af681f Added a canary region for overflow detection to the tuners 2018-05-17 10:45:10 +01:00
Cedric Nugteren e057a9186a First version of direct reading from image tensor for convgemm: only for edge cases now 2018-05-17 09:23:28 +01:00
Cedric Nugteren 0cb9580042 Created a dedicated convgemm GEMM kernel as a copy of the batched direct gemm kernel 2018-05-13 22:10:21 +02:00
Cedric Nugteren ad8f1027ab Plugged in the code of strided-batched-gemm into convgemm in preparation of a new kernel 2018-05-13 21:01:46 +02:00
Cedric Nugteren 4e6d30088d Changed temporary convgemm implementation to use batched-strided GEMM 2018-05-09 20:38:39 +02:00
Cedric Nugteren b608280361 Fixed the performance client for convgemm and added GFLOPS measurements 2018-05-09 19:59:31 +02:00
Cedric Nugteren fa4fee4fee Merge pull request #279 from umar456/ci_links: Update ci links to use doman names and build names instead of IP/id 2018-05-09 18:56:25 +02:00
Cedric Nugteren a4119531ee Updated the documentation for convgemm to include data layout (NCHW) 2018-05-09 17:46:27 +02:00
Cedric Nugteren cc95d4fa03 Implemented convolution as im2col + GEMM 2018-05-09 17:42:59 +02:00
Cedric Nugteren 52e6195628 Split channels/strides testing values off from kernel sizes for more flexibility 2018-05-09 17:23:55 +02:00
Umar Arshad 1659ae5432 Update ci links to use doman names and build names instead of IP/id: Updates the README badges to point to the domain name instead of IP addresses. Also updates the names of the builds to the name of the build instead of the id of the build. 2018-05-08 20:24:40 -04:00
Cedric Nugteren 2d1f6ba7fe Added convgemm skeleton, test infrastructure, and first reference implementation 2018-05-06 11:35:34 +02:00
Cedric Nugteren 2776d76176 Added interface of batched convolution as GEMM 2018-05-05 14:06:33 +02:00
Cedric Nugteren 8b381480f8 Updated README with new badges and paper citation 2018-05-01 20:51:10 +02:00
Cedric Nugteren 8258321a74 Now stores a shared_ptr to the Program class in the cache 2018-05-01 20:34:48 +02:00
Cedric Nugteren b2248a17ae Merge pull request #277 from CNugteren/CLBlast-257-intel-subgroups: Intel subgroup shuffling 2018-04-29 15:48:35 +02:00
Cedric Nugteren 9f22bc232b Updated the changelog 2018-04-29 15:06:44 +02:00
Cedric Nugteren 0022107b2a Updated the roadmap 2018-04-29 15:06:33 +02:00
Cedric Nugteren 7b416c8686 Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program 2018-04-26 21:10:17 +02:00
Cedric Nugteren 2965b87dda Added Intel subgroup shuffle support to the 2D register caching GEMM kernel 2018-04-24 21:32:42 +02:00
Cedric Nugteren 2b1e0295e6 Added a define to enable subgroup shuffling if supported by the device 2018-04-24 20:41:15 +02:00
Cedric Nugteren 5d46a3193e Merge pull request #274 from CNugteren/CLBlast-228-2d-register-gemm-kernel: Added 2D-register-caching GEMM kernel 2018-04-21 21:15:44 +02:00
Cedric Nugteren 3e3a26e0da Fixes for the CUDA API 2018-04-20 21:50:36 +02:00
Cedric Nugteren 458e6717a9 Expressed HER2K as two HERK calls 2018-04-18 20:58:29 +02:00
Cedric Nugteren dcce23d938 Expressed SYR2K as two SYRK calls 2018-04-18 20:29:28 +02:00
Cedric Nugteren ef6b1207df Updated HERK and SYRK to follow the GEMM style and functions to make it work with the new kernel 2018-04-17 21:13:28 +02:00
Cedric Nugteren 93610a9cba Fixed some failing tests for GEMM and batched GEMM routines 2018-04-15 12:53:32 +02:00
Cedric Nugteren f14e6f87d2 Updated tuning results for the Skylake ULT GT2 GPU with the new kernel 2018-04-15 11:45:45 +02:00
Cedric Nugteren 0dff7f1ac4 Made GEMM rotation expectations kernel-specific 2018-04-13 22:27:11 +02:00
Cedric Nugteren 0f49dd24e5 Updated database with defaults of GEMMK=0 and KREG=1 2018-04-10 21:26:18 +02:00
Cedric Nugteren f6a48f05ed Made it possible to add tuning parameters to the database using the script 2018-04-10 21:24:36 +02:00
Cedric Nugteren 3fbbb81137 Fixed a bug in the compression part of the database script 2018-04-10 21:18:11 +02:00
Cedric Nugteren 77ba11f686 Extended the maximum number of tuning parameters from 14 to 16 2018-04-08 18:12:54 +02:00
Cedric Nugteren a93fec1026 Fixed issues with the pre-processor 2018-04-08 18:02:44 +02:00
Cedric Nugteren 7cbc6b7495 Merge branch 'master' into CLBlast-228-2d-register-gemm-kernel 2018-04-07 17:51:40 +02:00
Cedric Nugteren 16f7f49683 Added tuning results for NVIDIA GeForce 970 2018-04-07 17:48:25 +02:00
Cedric Nugteren 9596e46d01 Added tuning results for NVIDIA GeForce 920MX 2018-04-07 17:44:32 +02:00
Cedric Nugteren cf7965dc68 Fixed a python3 import error issue with the database script 2018-04-07 17:40:43 +02:00
Cedric Nugteren 048fe90e57 Added tuning results for Intel HD Graphics 620 2018-04-07 17:33:57 +02:00
Cedric Nugteren 3519d32ac4 Extended the GEMM tuner to be able to tune the new 'kernel 1' 2018-04-07 17:05:44 +02:00
Cedric Nugteren 381f1fe67a Fixed a compilation issue for complex datatypes and vload 2018-04-07 16:57:36 +02:00
Cedric Nugteren 2a29dc061c Fixed a compilation issue for complex datatypes and vload 2018-04-06 21:06:13 +02:00
Cedric Nugteren eae25f5727 Added first version of 2D register tiling kernel with A and C transposed as well 2018-04-03 21:18:40 +02:00
Cedric Nugteren 63996eb68b Updated pyclblast to 1.1.0 and uploaded to PyPi 2018-03-30 10:38:36 +02:00
Cedric Nugteren 4de220a7a2 Merge pull request #255 from kodonnell/py_override: Adding override parameters to pyclblast 2018-03-30 10:28:00 +02:00
Cedric Nugteren d86ff75fa5 Added argument checking for the GEMM tuner: expects m/n to be multiples of MWG/NWG 2018-03-30 10:23:33 +02:00
Cedric Nugteren 7e69c422af Updated the roadmap 2018-03-30 10:05:16 +02:00
Cedric Nugteren bb0889fa7a Merge branch 'CLBlast-227-vivante-compiler-errors' 2018-03-30 09:22:09 +02:00