736 commits

Author SHA1 Message Date
Cedric Nugteren 1fa0930d85 Fix Windows paths in pyclblast 2021-02-05 21:52:23 +01:00
Cedric Nugteren d57f8065ea Added second Windows library path 2021-02-04 20:13:02 +01:00
Cedric Nugteren c78c649844 Add library path for Windows as well 2021-01-30 14:28:11 +01:00
Cedric Nugteren bbcb357a71 Add library dir on Linux for pyclblast 2021-01-29 20:48:05 +01:00
Cedric Nugteren 07837a5c2d Update pyclblast package version number 2021-01-21 20:49:31 +01:00
Jerry James dc82a1fbc8 Use reference types to prevent unnecessary copying 2021-01-20 10:21:36 -07:00
Cedric Nugteren 0ee39af5ed Add tuning results for TITAN RTX 2020-10-10 13:01:12 +02:00
Cedric Nugteren 481d86665f Add tuning results for Radeon RX Vega 2020-10-10 12:56:28 +02:00
Pradeep Garigipati dff65e9217 Add a cautionary note in Program::GetIR and mention the fix in CHANGELOG 2020-06-07 21:13:33 +05:30
Pradeep Garigipati aec71699f8 Fix Program::GetIR to handle programs with multiple devices 2020-06-05 12:00:45 +05:30
Cedric Nugteren c369cf1a16 Increase display width of the local/global sizes 2020-05-11 20:26:33 +02:00
Cedric Nugteren 4a6c7c37a3 Made sure that the global workgroup size is a multiple of the local size in the tuners 2020-05-10 20:28:23 +02:00
Cedric Nugteren 69a4b4d4b0 Added logging of local/global workgroup sizes when running the tuners 2020-05-10 20:08:28 +02:00
Cedric Nugteren 0870e76fba Updated PyCLBlast version number 2020-05-10 14:55:03 +02:00
Cedric Nugteren 0b7ce8033c Added a sample to demonstrate a batched routine 2020-05-10 14:54:50 +02:00
Cedric Nugteren b94e81af10 Added pyclblast bindings for the 3 batched routines 2020-05-10 12:26:25 +02:00
Cedric Nugteren bbb2031bf3 Move queue creation out of the tuner loop 2020-05-03 20:30:55 +02:00
Cedric Nugteren b46853660e Made it more likely (but no guarantees) for amax/amin to return the first index 2020-03-08 11:26:49 +01:00
Cedric Nugteren e3ce88154a Silenced a new OpenCL warning message 2020-03-08 10:14:59 +01:00
Cedric Nugteren 49eb490ee1 Catches all exceptions of the tuners 2020-02-17 22:07:51 +01:00
Tarmo Räntilä 21b66ca761 Reduce TestMatrix calls for xgemmstridedbatched.
Replace the looped test by a single one with the offset of the last batch.
2019-12-09 22:17:24 +02:00
Tarmo Räntilä bf50c4e53e Reduce TestMatrix calls for xgemmbatched.
Replace the looped test by a single one with the maximal found offset.
2019-12-09 22:13:52 +02:00
etomzak 9560193a9e Fix out-of-bounds read/write in XhadFaster
Fix an error in XhadFaster where data would be written beyond the end of zgm.
The kernel loop assumed that there was always enough work for each thread to
process WPT items, but this was not enforced. It's possible to detect the
overflow with the "canary" buffer regions, but for SHAD, kCanarySize must be
~500 (much larger than the normal 127).

This commit may improve the performance of XhadFaster, since the kernel was
performing 2x work in some cases (once over real data, once over garbage).

Courtesy of Codeplay Software Ltd.
2019-09-04 12:55:25 +01:00
Cedric Nugteren 3f9d7bca22 Fixed a bug in the absolute-min index kernel 2019-05-19 14:00:18 +02:00
Cedric Nugteren af6a9eedd1 Added a function to set the OpenCL kernel standard, either 1.1 or 1.2 2019-05-11 20:39:00 +02:00
Cedric Nugteren 9cbffc9b7c Changed back to cl_intel_subgroups as suggested 2019-05-08 22:01:56 +02:00
Cedric Nugteren c5a82f6978 Added a host-code check to make sure the avc_motion_estimation is available 2019-05-07 20:47:50 +02:00
Cedric Nugteren c6ba86cdc3 Enabled avc_motion_estimation extension for Intel subgroup shuffling 2019-05-07 20:47:31 +02:00
Umar Arshad cf4907942c Remove assert for extension not available in macOS
The cl_nv_device_attribute_query extension is not available on the
Apple platform. This caused failures during debug builds at runtime.
2019-05-03 23:28:07 -04:00
Cedric Nugteren 7084311e45 Added tuning parameters for Tesla P100 16GB 2019-02-09 16:31:48 +01:00
Cedric Nugteren 1035e533cd Added tuning parameters for Xeon E5-2630 v3 and v4 2019-02-09 16:29:30 +01:00
Cedric Nugteren e0541c41a1 Added fp32 to fp16 conversion function in Python to make haxpy example work 2019-01-23 19:52:01 +01:00
Cedric Nugteren 347f0df32f Added a (non-working) sample of half precision AXPY in Python 2019-01-22 21:14:43 +01:00
Cedric Nugteren 23b9f655fa Updated pyclblast README, updated to 1.2.0 for half-precision support 2019-01-22 21:14:02 +01:00
Cedric Nugteren 3937efdcda Added experimental support for half-precision in pyclblast 2019-01-22 21:13:41 +01:00
Cedric Nugteren 9a9c24e811 Merge pull request #345 from CNugteren/convolution-fixes-and-tuner
Convolution with single kernel
2019-01-19 17:56:05 +01:00
Cedric Nugteren c42e48068b Added a few more initial Intel tuning parameters for convgemm 2019-01-19 15:32:35 +01:00
Cedric Nugteren afcf5dc6eb Added a check to prevent the stride of matrix C being set to 0 for the strided-batched-GEMM routine 2019-01-05 10:56:35 +01:00
Cedric Nugteren 560f7a40f6 Added convgemm to the CLBlast database, added initial parameters for Skylake GPU 2018-12-31 19:05:34 +01:00
Cedric Nugteren d929525039 Added support for the convgemm tuner in the tuner database 2018-12-31 18:49:12 +01:00
Cedric Nugteren 153ac06cf2 Added the forgotten batch dimension to the tuner to get correct kernel executions 2018-12-31 13:19:58 +01:00
Koichi Akabe a8e6f813dd Fix the xconvgemm tuner 2018-12-18 14:05:25 +09:00
Cedric Nugteren 1f0cd61824 Added first version of a tuner for the ConvGemm direct kernel 2018-12-18 13:59:26 +09:00
Koichi Akabe 301dc280df Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel 2018-12-18 13:56:00 +09:00
Cedric Nugteren c0e41b87cb Fixed an issue for unequal MWG and NWG and the new GEMMK == 1 kernel 2018-11-30 20:23:26 +01:00
Koichi Akabe a646d6ca46 Remove unnecessary attribute of inline function 2018-11-19 13:03:50 +09:00
Koichi Akabe 032e3b0cc0 Add kernel_mode option to im2col, col2im, and convgemm functions 2018-11-12 10:12:07 +09:00
Cedric Nugteren 6f67525ea6 Changed col2im to append to the existing im-buffer 2018-11-07 19:45:07 +01:00
Cedric Nugteren 2d32a23293 Added new col2im routine to the documentation 2018-11-01 21:46:19 +01:00
Koichi Akabe 0b3d04f709 Fix col2im implementation 2018-10-30 14:54:55 +09:00