Author | Commit | Message | Date
Cedric Nugteren | 3937efdcda | Added experimental support for half-precision in pyclblast | 2019-01-22 21:13:41 +01:00
Cedric Nugteren | 9a9c24e811 | Merge pull request #345 from CNugteren/convolution-fixes-and-tuner: Convolution with single kernel | 2019-01-19 17:56:05 +01:00
Cedric Nugteren | 11f4c7dd93 | Added documentation on the convgemm routine | 2019-01-19 15:44:19 +01:00
Cedric Nugteren | c42e48068b | Added a few more initial Intel tuning parameters for convgemm | 2019-01-19 15:32:35 +01:00
Cedric Nugteren | afcf5dc6eb | Added a check to prevent the stride of matrix C being set to 0 for the strided-batched-GEMM routine | 2019-01-05 10:56:35 +01:00
Cedric Nugteren | 560f7a40f6 | Added convgemm to the CLBlast database, added initial parameters for Skylake GPU | 2018-12-31 19:05:34 +01:00
Cedric Nugteren | d929525039 | Added support for the convgemm tuner in the tuner database | 2018-12-31 18:49:12 +01:00
Cedric Nugteren | 153ac06cf2 | Added the forgotten batch dimension to the tuner to get correct kernel executions | 2018-12-31 13:19:58 +01:00
Cedric Nugteren | b894993967 | Merge pull request #343 from vbkaisetsu/feature/convgemm-single: Fix single kernel version of convgemm | 2018-12-23 11:11:59 +01:00
Cedric Nugteren | 1f41c3c50a | Merge branch 'master' into convolution-fixes-and-tuner | 2018-12-22 11:40:19 +01:00
Koichi Akabe | 9532f8652c | Update changelog | 2018-12-21 11:08:01 +09:00
Koichi Akabe | c0883cf2fe | Update the documentation | 2018-12-18 14:08:16 +09:00
Koichi Akabe | a8e6f813dd | Fix the xconvgemm tuner | 2018-12-18 14:05:25 +09:00
Cedric Nugteren | 1f0cd61824 | Added first version of a tuner for the ConvGemm direct kernel | 2018-12-18 13:59:26 +09:00
Koichi Akabe | 301dc280df | Fix xconvgemm kernel and enable ConvGemmMethod::kSingleKernel | 2018-12-18 13:56:00 +09:00
Cedric Nugteren | 9819957768 | Merge pull request #342 from vbkaisetsu/fix/im2col-hf-tests: Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm | 2018-12-17 22:39:53 +01:00
Koichi Akabe | d9db543d75 | Fix half-float+kernel_mode test cases of im2col, col2im, and convgemm | 2018-12-17 21:57:35 +09:00
Cedric Nugteren | 0c9411c844 | Updated to version 1.5.0 | 2018-12-04 20:46:02 +01:00
Cedric Nugteren | 09ab5f512f | Updated the roadmap document | 2018-12-01 17:20:36 +01:00
Cedric Nugteren | 4676ec2921 | Added a FAQ document | 2018-12-01 17:19:28 +01:00
Cedric Nugteren | cec021ac34 | Merge pull request #341 from CNugteren/CLBlast-340-GEMMK1-issue-with-unequal-MWG-NWG: Fixed an issue for the GEMMK == 1 kernel | 2018-12-01 17:14:47 +01:00
Cedric Nugteren | c0e41b87cb | Fixed an issue for unequal MWG and NWG and the new GEMMK == 1 kernel | 2018-11-30 20:23:26 +01:00
Cedric Nugteren | bca1506e87 | Merge pull request #335 from vbkaisetsu/patch-1: Remove unnecessary qualifier of inline function | 2018-11-19 21:03:27 +01:00
Koichi Akabe | a646d6ca46 | Remove unnecessary attribute of inline function | 2018-11-19 13:03:50 +09:00
Cedric Nugteren | e0ddfbfa3b | Merge pull request #332 from vbkaisetsu/feature/im2col-col2im-flip: Add im2colflip and col2imflip functions | 2018-11-17 20:51:11 +01:00
Koichi Akabe | 032e3b0cc0 | Add kernel_mode option to im2col, col2im, and convgemm functions | 2018-11-12 10:12:07 +09:00
Cedric Nugteren | 90112618da | Merge pull request #331 from CNugteren/CLBlast-270-col2im: Implements col2im routine | 2018-11-09 08:06:13 +01:00
Cedric Nugteren | 6f67525ea6 | Changed col2im to append to the existing im-buffer | 2018-11-07 19:45:07 +01:00
Cedric Nugteren | 2d32a23293 | Added new col2im routine to the documentation | 2018-11-01 21:46:19 +01:00
Cedric Nugteren | 469c346a8e | Fixed half-precision tests for im2col and col2im | 2018-11-01 21:44:21 +01:00
Cedric Nugteren | 4215bbe62a | Merge pull request #330 from vbkaisetsu/CLBlast-270-col2im: Add col2im function | 2018-10-31 10:37:21 +01:00
Koichi Akabe | 0b3d04f709 | Fix col2im implementation | 2018-10-30 14:54:55 +09:00
Cedric Nugteren | 441373c8fd | Merge pull request #329 from tholu/patch-1: Update FindOpenCL.cmake | 2018-10-29 20:06:01 +01:00
Thomas Lutz | 17d045cc41 | Update FindOpenCL.cmake: Add path to ROCm OpenCL as possible location in cmake script | 2018-10-28 21:52:25 +01:00
Cedric Nugteren | d45911b61d | Added groundwork for col2im algorithm plus first non-working version of kernel and test | 2018-10-23 20:52:25 +02:00
Cedric Nugteren | 44b630fc22 | Some name changes in im2col code | 2018-10-22 22:12:58 +02:00
Cedric Nugteren | ab0178c56b | Fixed MSVC's compilation error C1061 due to too many for-loops | 2018-10-17 21:35:09 +02:00
Cedric Nugteren | 9a1454496d | Fixed a bug with the pre-processing and the AXPY kernel | 2018-10-17 21:15:53 +02:00
Cedric Nugteren | e33542acdd | Merge pull request #325 from CNugteren/CLBlast-321-axpy-faster-kernel-bug: Fixed a bug in the XaxpyFaster kernel for specific parameters | 2018-10-16 21:06:57 +02:00
Cedric Nugteren | 664a238adf | Fixed a bug in the XaxpyFaster kernel for specific parameters | 2018-10-15 20:08:29 +02:00
Cedric Nugteren | 634b2bc75c | Merge pull request #319 from CNugteren/convgemm_multi_kernel: First im2col+GEMM implementation of convolution | 2018-10-14 17:27:45 +02:00
Cedric Nugteren | ff7bee93d3 | Merge pull request #324 from CNugteren/CLBlast-315-tuning-api-improvements: Made tuning API more flexible | 2018-10-14 17:26:13 +02:00
Cedric Nugteren | 115a8f0f3d | Updated changelog regarding tuning API change | 2018-10-13 17:49:49 +02:00
Cedric Nugteren | 46c50cdd7e | Made tuning API more flexible: disregards any extra parameter values | 2018-10-13 17:47:29 +02:00
Cedric Nugteren | 8676b62178 | Updated the documentation for GEMV tuning | 2018-10-13 17:43:51 +02:00
Cedric Nugteren | e47b8f7f62 | Merge pull request #323 from CNugteren/CLBlast-322-fix-preprocessor-warnings: Fixed pre-processor warnings related to the subgroup shuffling | 2018-10-11 09:33:04 +02:00
Cedric Nugteren | 1736c0cef4 | Fixed pre-processor warnings related to the subgroup shuffling | 2018-10-10 19:12:42 +02:00
Cedric Nugteren | 83ba3d4b7b | Merge branch 'master' into convgemm_multi_kernel | 2018-09-16 20:01:18 +02:00
Cedric Nugteren | c163868e18 | Merge pull request #318 from CNugteren/CLBlast-315-preprocessor-gemmk1-issue: Fixed pre-processor issues with the new GEMMK=1 kernel | 2018-09-15 21:47:04 +02:00
Cedric Nugteren | 0f6dd01e51 | Fixed an MSVC compilation error due to large strings | 2018-09-15 19:58:07 +02:00