Commit Graph

1473 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)

Author SHA1 Message Date
Cedric Nugteren ec501055f9
Merge pull request #360 from CNugteren/CLBlast-359-fix-broken-iamin
Fixed a bug in the absolute-min index kernel
2019-05-19 22:39:26 +02:00
Cedric Nugteren 3f9d7bca22 Fixed a bug in the absolute-min index kernel 2019-05-19 14:00:18 +02:00
Cedric Nugteren 500d19be4c
Merge pull request #357 from CNugteren/CLBlast-355-intel-shuffle-extension-fix
intel shuffle extension fix
2019-05-16 20:12:32 +02:00
Cedric Nugteren af6a9eedd1 Added a function to set the OpenCL kernel standard, either 1.1 or 1.2 2019-05-11 20:39:00 +02:00
Cedric Nugteren 9cbffc9b7c Changed back to cl_intel_subgroups as suggested 2019-05-08 22:01:56 +02:00
Cedric Nugteren c5a82f6978 Added a host-code check to make sure the avc_motion_estimation is available 2019-05-07 20:47:50 +02:00
Cedric Nugteren c6ba86cdc3 Enabled avc_motion_estimation extension for Intel subgroup shuffling 2019-05-07 20:47:31 +02:00
Cedric Nugteren 774cebaa40
Merge pull request #356 from umar456/osx_assert
Remove assert for extension not available in macOS
2019-05-06 09:58:36 +02:00
Umar Arshad cf4907942c Remove assert for extension not available in macOS
The cl_nv_device_attribute_query extension is not available on the
Apple platform. This caused failures during debug builds at runtime.
2019-05-03 23:28:07 -04:00
Cedric Nugteren 7084311e45 Added tuning parameters for Tesla P100 16GB 2019-02-09 16:31:48 +01:00
Cedric Nugteren 1035e533cd Added tuning parameters for Xeon E5-2630 v3 and v4 2019-02-09 16:29:30 +01:00
Cedric Nugteren eff0f9ad1d
Merge pull request #348 from CNugteren/CLBlast-334-pyclblast-half-precision-support
PyCLBlast half precision support
2019-01-26 11:04:14 +01:00
Cedric Nugteren e0541c41a1 Added fp32 to fp16 conversion function in Python to make haxpy example work 2019-01-23 19:52:01 +01:00
Cedric Nugteren 347f0df32f Added a (non-working) sample of half precision AXPY in Python 2019-01-22 21:14:43 +01:00
Cedric Nugteren 23b9f655fa Updated pyclblast README, updated to 1.2.0 for half-precision support 2019-01-22 21:14:02 +01:00
Cedric Nugteren 3937efdcda Added experimental support for half-precision in pyclblast 2019-01-22 21:13:41 +01:00
Cedric Nugteren 9a9c24e811
Merge pull request #345 from CNugteren/convolution-fixes-and-tuner
Convolution with single kernel
2019-01-19 17:56:05 +01:00
Cedric Nugteren 11f4c7dd93 Added documentation on the convgemm routine 2019-01-19 15:44:19 +01:00
Cedric Nugteren c42e48068b Added a few more initial Intel tuning parameters for convgemm 2019-01-19 15:32:35 +01:00
Cedric Nugteren afcf5dc6eb Added a check to prevent the stride of matrix C being set to 0 for the strided-batched-GEMM routine 2019-01-05 10:56:35 +01:00
Cedric Nugteren 560f7a40f6 Added convgemm to the CLBlast database, added initial parameters for Skylake GPU 2018-12-31 19:05:34 +01:00
Cedric Nugteren d929525039 Added support for the convgemm tuner in the tuner database 2018-12-31 18:49:12 +01:00
Cedric Nugteren 153ac06cf2 Added the forgotten batch dimension to the tuner to get correct kernel executions 2018-12-31 13:19:58 +01:00
Cedric Nugteren b894993967
Merge pull request #343 from vbkaisetsu/feature/convgemm-single
Fix single kernel version of convgemm
2018-12-23 11:11:59 +01:00
Cedric Nugteren 1f41c3c50a Merge branch 'master' into convolution-fixes-and-tuner 2018-12-22 11:40:19 +01:00
Koichi Akabe 9532f8652c Update changelog 2018-12-21 11:08:01 +09:00
Koichi Akabe c0883cf2fe Update the documentation 2018-12-18 14:08:16 +09:00
Koichi Akabe a8e6f813dd Fix the xconvgemm tuner 2018-12-18 14:05:25 +09:00
Cedric Nugteren 1f0cd61824 Added first version of a tuner for the ConvGemm direct kernel 2018-12-18 13:59:26 +09:00
Koichi Akabe 301dc280df Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel 2018-12-18 13:56:00 +09:00
Cedric Nugteren 9819957768
Merge pull request #342 from vbkaisetsu/fix/im2col-hf-tests
Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm
2018-12-17 22:39:53 +01:00
Koichi Akabe d9db543d75 Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm 2018-12-17 21:57:35 +09:00
Cedric Nugteren 0c9411c844 Updated to version 1.5.0 2018-12-04 20:46:02 +01:00
Cedric Nugteren 09ab5f512f Updated the roadmap document 2018-12-01 17:20:36 +01:00
Cedric Nugteren 4676ec2921 Added a FAQ document 2018-12-01 17:19:28 +01:00
Cedric Nugteren cec021ac34
Merge pull request #341 from CNugteren/CLBlast-340-GEMMK1-issue-with-unequal-MWG-NWG
Fixed an issue for the GEMMK == 1 kernel
2018-12-01 17:14:47 +01:00
Cedric Nugteren c0e41b87cb Fixed an issue for unequal MWG and NWG and the new GEMMK == 1 kernel 2018-11-30 20:23:26 +01:00
Cedric Nugteren bca1506e87
Merge pull request #335 from vbkaisetsu/patch-1
Remove unnecessary qualifier of inline function
2018-11-19 21:03:27 +01:00
Koichi Akabe a646d6ca46 Remove unnecessary attribute of inline function 2018-11-19 13:03:50 +09:00
Cedric Nugteren e0ddfbfa3b
Merge pull request #332 from vbkaisetsu/feature/im2col-col2im-flip
Add im2colflip and col2imflip functions
2018-11-17 20:51:11 +01:00
Koichi Akabe 032e3b0cc0 Add kernel_mode option to im2col, col2im, and convgemm functions 2018-11-12 10:12:07 +09:00
Cedric Nugteren 90112618da
Merge pull request #331 from CNugteren/CLBlast-270-col2im
Implements col2im routine
2018-11-09 08:06:13 +01:00
Cedric Nugteren 6f67525ea6 Changed col2im to append to the existing im-buffer 2018-11-07 19:45:07 +01:00
Cedric Nugteren 2d32a23293 Added new col2im routine to the documentation 2018-11-01 21:46:19 +01:00
Cedric Nugteren 469c346a8e Fixed half-precision tests for im2col and col2im 2018-11-01 21:44:21 +01:00
Cedric Nugteren 4215bbe62a
Merge pull request #330 from vbkaisetsu/CLBlast-270-col2im
Add col2im function
2018-10-31 10:37:21 +01:00
Koichi Akabe 0b3d04f709 Fix col2im implementation 2018-10-30 14:54:55 +09:00
Cedric Nugteren 441373c8fd
Merge pull request #329 from tholu/patch-1
Update FindOpenCL.cmake
2018-10-29 20:06:01 +01:00
Thomas Lutz 17d045cc41
Update FindOpenCL.cmake
Add path to ROCm OpenCL as possible location in cmake script
2018-10-28 21:52:25 +01:00
Cedric Nugteren d45911b61d Added groundwork for col2im algorithm plus first non-working version of kernel and test 2018-10-23 20:52:25 +02:00