Commit graph

1395 commits

Author SHA1 Message Date
Cedric Nugteren ef6b1207df Updated HERK and SYRK to follow the GEMM style and functions to make it work with the new kernel 2018-04-17 21:13:28 +02:00
Cedric Nugteren 93610a9cba Fixed some failing tests for GEMM and batched GEMM routines 2018-04-15 12:53:32 +02:00
Cedric Nugteren f14e6f87d2 Updated tuning results for the Skylake ULT GT2 GPU with the new kernel 2018-04-15 11:45:45 +02:00
Cedric Nugteren 0dff7f1ac4 Made GEMM rotation expectations kernel-specific 2018-04-13 22:27:11 +02:00
Cedric Nugteren 0f49dd24e5 Updated database with defaults of GEMMK=0 and KREG=1 2018-04-10 21:26:18 +02:00
Cedric Nugteren f6a48f05ed Made it possible to add tuning parameters to the database using the script 2018-04-10 21:24:36 +02:00
Cedric Nugteren 3fbbb81137 Fixed a bug in the compression part of the database script 2018-04-10 21:18:11 +02:00
Cedric Nugteren 77ba11f686 Extended the maximum number of tuning parameters from 14 to 16 2018-04-08 18:12:54 +02:00
Cedric Nugteren a93fec1026 Fixed issues with the pre-processor 2018-04-08 18:02:44 +02:00
Cedric Nugteren 7cbc6b7495 Merge branch 'master' into CLBlast-228-2d-register-gemm-kernel 2018-04-07 17:51:40 +02:00
Cedric Nugteren 16f7f49683 Added tuning results for NVIDIA GeForce 970 2018-04-07 17:48:25 +02:00
Cedric Nugteren 9596e46d01 Added tuning results for NVIDIA GeForce 920MX 2018-04-07 17:44:32 +02:00
Cedric Nugteren cf7965dc68 Fixed a python3 import error issue with the database script 2018-04-07 17:40:43 +02:00
Cedric Nugteren 048fe90e57 Added tuning results for Intel HD Graphics 620 2018-04-07 17:33:57 +02:00
Cedric Nugteren 3519d32ac4 Extended the GEMM tuner to be able to tune the new 'kernel 1' 2018-04-07 17:05:44 +02:00
Cedric Nugteren 381f1fe67a Fixed a compilation issue for complex datatypes and vload 2018-04-07 16:57:36 +02:00
Cedric Nugteren 2a29dc061c Fixed a compilation issue for complex datatypes and vload 2018-04-06 21:06:13 +02:00
Cedric Nugteren eae25f5727 Added first version of 2D register tiling kernel with A and C transposed as well 2018-04-03 21:18:40 +02:00
Cedric Nugteren 63996eb68b Updated pyclblast to 1.1.0 and uploaded to PyPi 2018-03-30 10:38:36 +02:00
Cedric Nugteren 4de220a7a2 Merge pull request #255 from kodonnell/py_override (Adding override parameters to pyclblast) 2018-03-30 10:28:00 +02:00
Cedric Nugteren d86ff75fa5 Added argument checking for the GEMM tuner: expects m/n to be multiples of MWG/NWG 2018-03-30 10:23:33 +02:00
Cedric Nugteren 7e69c422af Updated the roadmap 2018-03-30 10:05:16 +02:00
Cedric Nugteren bb0889fa7a Merge branch 'CLBlast-227-vivante-compiler-errors' 2018-03-30 09:22:09 +02:00
kodonell 173a7eb928 merged 2018-03-27 08:55:39 +13:00
kodonell d16f2d1317 got the generator thing working 2018-03-27 08:45:54 +13:00
kodonell f07c2a29b8 moved override_parameters example out of sgemm example 2018-03-27 08:30:58 +13:00
kodonell 58e70c56f1 tidying up pyclblast override_parameters api, and added example 2018-03-26 08:51:55 +13:00
Cedric Nugteren 1cbe2ea301 Removed arrays as function argument from GEMM kernels for Vivante OpenCL compiler 2018-03-23 20:29:20 +01:00
Cedric Nugteren a97d8a0197 Merge pull request #269 from CNugteren/CLBlast-266-local-mem-constraint (CLBlast #266 local mem constraint) 2018-03-22 22:42:33 +01:00
Cedric Nugteren 9fb6550dd0 Added the OpenCL local memory size constraint to the tuners 2018-03-22 21:01:02 +01:00
Cedric Nugteren 7a2371213b Re-added support for local memory size constraint checking in the tuner 2018-03-21 22:58:37 +01:00
Cedric Nugteren 52791bf355 Fixed a failing TRSM test using a CPU with Apple OpenCL 2018-03-15 21:09:52 +01:00
Cedric Nugteren 7a756cbce7 Fixed a failing TRSV test using a CPU with Apple OpenCL 2018-03-15 20:58:42 +01:00
Cedric Nugteren f4d96e80c3 Fixed breaking preprocessor test on certain platforms due to empty kernel string 2018-03-15 20:45:41 +01:00
Cedric Nugteren 9ff6cd7547 Added queue-finish commands to PyCLBlast samples and tests 2018-03-15 20:37:48 +01:00
Cedric Nugteren 934893972e Merge pull request #262 from CNugteren/CLBlast-237-tuning-api (CLBlast #237: Tuning API) 2018-03-11 15:38:33 +01:00
Cedric Nugteren bcf1208431 Added basic tests for PyCLBlast 2018-03-11 15:32:36 +01:00
Cedric Nugteren 0dd1bc6f48 Made benchmarking script also work for complex numbers 2018-03-10 17:03:57 +01:00
Cedric Nugteren 49b02ec194 Added initial glossary 2018-03-10 17:02:38 +01:00
Cedric Nugteren 86455841d1 Added badge for OSX-Intel-CPU builds 2018-03-10 16:49:36 +01:00
Cedric Nugteren 903deaf368 Fixed an issue for DLL linking under Windows 2018-03-10 16:45:31 +01:00
Cedric Nugteren e7dccfa3cc Fixed an issue for DLL linking under Windows 2018-03-10 14:57:36 +01:00
Cedric Nugteren 54bbc99273 Updated the documentation for the tuner API 2018-03-10 14:52:40 +01:00
Cedric Nugteren 3d2ef9331b Fixed a few things for the new tuning API 2018-03-10 14:35:11 +01:00
Cedric Nugteren 0bdc51e47c Completed the API for all tuneable kernels 2018-03-10 10:54:44 +01:00
kodonell c6056da0c8 ok, device id working 2018-03-10 22:21:30 +13:00
Cedric Nugteren 6397e61746 Added several more tuner API functions 2018-03-09 21:40:22 +01:00
kodonell 54a4b871b3 initial add of override parameters to pyclblast - cython not complaining, but segfault 2018-03-09 15:27:33 +13:00
Cedric Nugteren 49cc8b31ff Fixed compilation issue in Xger tuner 2018-03-06 20:59:23 +01:00
Cedric Nugteren 0e1a152023 First version of the tuning API, added interface for copy-kernel, added sample 2018-03-06 20:52:12 +01:00