Commit Graph

1156 Commits (8258321a74f5b44a559c91bb0adb1358d22da801)

Author SHA1 Message Date
Cedric Nugteren 8258321a74 Now stores a shared_ptr to the Program class in the cache 2018-05-01 20:34:48 +02:00
Cedric Nugteren b2248a17ae Merge pull request #277 from CNugteren/CLBlast-257-intel-subgroups (Intel subgroup shuffling) 2018-04-29 15:48:35 +02:00
Cedric Nugteren 9f22bc232b Updated the changelog 2018-04-29 15:06:44 +02:00
Cedric Nugteren 0022107b2a Updated the roadmap 2018-04-29 15:06:33 +02:00
Cedric Nugteren 7b416c8686 Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program 2018-04-26 21:10:17 +02:00
Cedric Nugteren 2965b87dda Added Intel subgroup shuffle support to the 2D register caching GEMM kernel 2018-04-24 21:32:42 +02:00
Cedric Nugteren 2b1e0295e6 Added a define to enable subgroup shuffling if supported by the device 2018-04-24 20:41:15 +02:00
Cedric Nugteren 5d46a3193e Merge pull request #274 from CNugteren/CLBlast-228-2d-register-gemm-kernel (Added 2D-register-caching GEMM kernel) 2018-04-21 21:15:44 +02:00
Cedric Nugteren 3e3a26e0da Fixes for the CUDA API 2018-04-20 21:50:36 +02:00
Cedric Nugteren 458e6717a9 Expressed HER2K as two HERK calls 2018-04-18 20:58:29 +02:00
Cedric Nugteren dcce23d938 Expressed SYR2K as two SYRK calls 2018-04-18 20:29:28 +02:00
Cedric Nugteren ef6b1207df Updated HERK and SYRK to follow the GEMM style and functions to make it work with the new kernel 2018-04-17 21:13:28 +02:00
Cedric Nugteren 93610a9cba Fixed some failing tests for GEMM and batched GEMM routines 2018-04-15 12:53:32 +02:00
Cedric Nugteren f14e6f87d2 Updated tuning results for the Skylake ULT GT2 GPU with the new kernel 2018-04-15 11:45:45 +02:00
Cedric Nugteren 0dff7f1ac4 Made GEMM rotation expectations kernel-specific 2018-04-13 22:27:11 +02:00
Cedric Nugteren 0f49dd24e5 Updated database with defaults of GEMMK=0 and KREG=1 2018-04-10 21:26:18 +02:00
Cedric Nugteren f6a48f05ed Made it possible to add tuning parameters to the database using the script 2018-04-10 21:24:36 +02:00
Cedric Nugteren 3fbbb81137 Fixed a bug in the compression part of the database script 2018-04-10 21:18:11 +02:00
Cedric Nugteren 77ba11f686 Extended the maximum number of tuning parameters from 14 to 16 2018-04-08 18:12:54 +02:00
Cedric Nugteren a93fec1026 Fixed issues with the pre-processor 2018-04-08 18:02:44 +02:00
Cedric Nugteren 7cbc6b7495 Merge branch 'master' into CLBlast-228-2d-register-gemm-kernel 2018-04-07 17:51:40 +02:00
Cedric Nugteren 16f7f49683 Added tuning results for NVIDIA GeForce 970 2018-04-07 17:48:25 +02:00
Cedric Nugteren 9596e46d01 Added tuning results for NVIDIA GeForce 920MX 2018-04-07 17:44:32 +02:00
Cedric Nugteren cf7965dc68 Fixed a python3 import error issue with the database script 2018-04-07 17:40:43 +02:00
Cedric Nugteren 048fe90e57 Added tuning results for Intel HD Graphics 620 2018-04-07 17:33:57 +02:00
Cedric Nugteren 3519d32ac4 Extended the GEMM tuner to be able to tune the new 'kernel 1' 2018-04-07 17:05:44 +02:00
Cedric Nugteren 381f1fe67a Fixed a compilation issue for complex datatypes and vload 2018-04-07 16:57:36 +02:00
Cedric Nugteren 2a29dc061c Fixed a compilation issue for complex datatypes and vload 2018-04-06 21:06:13 +02:00
Cedric Nugteren eae25f5727 Added first version of 2D register tiling kernel with A and C transposed as well 2018-04-03 21:18:40 +02:00
Cedric Nugteren 63996eb68b Updated pyclblast to 1.1.0 and uploaded to PyPi 2018-03-30 10:38:36 +02:00
Cedric Nugteren 4de220a7a2 Merge pull request #255 from kodonnell/py_override (Adding override parameters to pyclblast) 2018-03-30 10:28:00 +02:00
Cedric Nugteren d86ff75fa5 Added argument checking for the GEMM tuner: expects m/n to be multiples of MWG/NWG 2018-03-30 10:23:33 +02:00
Cedric Nugteren 7e69c422af Updated the roadmap 2018-03-30 10:05:16 +02:00
Cedric Nugteren bb0889fa7a Merge branch 'CLBlast-227-vivante-compiler-errors' 2018-03-30 09:22:09 +02:00
kodonell 173a7eb928 merged 2018-03-27 08:55:39 +13:00
kodonell d16f2d1317 got the generator thing working 2018-03-27 08:45:54 +13:00
kodonell f07c2a29b8 moved override_parameters example out of sgemm example 2018-03-27 08:30:58 +13:00
kodonell 58e70c56f1 tidying up pyclblast override_parameters api, and added example 2018-03-26 08:51:55 +13:00
Cedric Nugteren 1cbe2ea301 Removed arrays as function argument from GEMM kernels for Vivante OpenCL compiler 2018-03-23 20:29:20 +01:00
Cedric Nugteren a97d8a0197 Merge pull request #269 from CNugteren/CLBlast-266-local-mem-constraint (CLBlast #266 local mem constraint) 2018-03-22 22:42:33 +01:00
Cedric Nugteren 9fb6550dd0 Added the OpenCL local memory size constraint to the tuners 2018-03-22 21:01:02 +01:00
Cedric Nugteren 7a2371213b Re-added support for local memory size constraint checking in the tuner 2018-03-21 22:58:37 +01:00
Cedric Nugteren 52791bf355 Fixed a failing TRSM test using a CPU with Apple OpenCL 2018-03-15 21:09:52 +01:00
Cedric Nugteren 7a756cbce7 Fixed a failing TRSV test using a CPU with Apple OpenCL 2018-03-15 20:58:42 +01:00
Cedric Nugteren f4d96e80c3 Fixed breaking preprocessor test on certain platforms due to empty kernel string 2018-03-15 20:45:41 +01:00
Cedric Nugteren 9ff6cd7547 Added queue-finish commands to PyCLBlast samples and tests 2018-03-15 20:37:48 +01:00
Cedric Nugteren 934893972e Merge pull request #262 from CNugteren/CLBlast-237-tuning-api (CLBlast #237: Tuning API) 2018-03-11 15:38:33 +01:00
Cedric Nugteren bcf1208431 Added basic tests for PyCLBlast 2018-03-11 15:32:36 +01:00
Cedric Nugteren 0dd1bc6f48 Made benchmarking script also work for complex numbers 2018-03-10 17:03:57 +01:00
Cedric Nugteren 49b02ec194 Added initial glossary 2018-03-10 17:02:38 +01:00