Koichi Akabe
a646d6ca46
Remove unnecessary attribute of inline function
2018-11-19 13:03:50 +09:00
Cedric Nugteren
e0ddfbfa3b
Merge pull request #332 from vbkaisetsu/feature/im2col-col2im-flip
Add im2colflip and col2imflip functions
2018-11-17 20:51:11 +01:00
Koichi Akabe
032e3b0cc0
Add kernel_mode option to im2col, col2im, and convgemm functions
2018-11-12 10:12:07 +09:00
Cedric Nugteren
90112618da
Merge pull request #331 from CNugteren/CLBlast-270-col2im
Implements col2im routine
2018-11-09 08:06:13 +01:00
Cedric Nugteren
6f67525ea6
Changed col2im to append to the existing im-buffer
2018-11-07 19:45:07 +01:00
Cedric Nugteren
2d32a23293
Added new col2im routine to the documentation
2018-11-01 21:46:19 +01:00
Cedric Nugteren
469c346a8e
Fixed half-precision tests for im2col and col2im
2018-11-01 21:44:21 +01:00
Cedric Nugteren
4215bbe62a
Merge pull request #330 from vbkaisetsu/CLBlast-270-col2im
Add col2im function
2018-10-31 10:37:21 +01:00
Koichi Akabe
0b3d04f709
Fix col2im implementation
2018-10-30 14:54:55 +09:00
Cedric Nugteren
441373c8fd
Merge pull request #329 from tholu/patch-1
Update FindOpenCL.cmake
2018-10-29 20:06:01 +01:00
Thomas Lutz
17d045cc41
Update FindOpenCL.cmake
Add path to ROCm OpenCL as possible location in cmake script
2018-10-28 21:52:25 +01:00
Cedric Nugteren
d45911b61d
Added groundwork for col2im algorithm plus first non-working version of kernel and test
2018-10-23 20:52:25 +02:00
Cedric Nugteren
44b630fc22
Some name changes in im2col code
2018-10-22 22:12:58 +02:00
Cedric Nugteren
ab0178c56b
Fixed MSVC's compilation error C1061 due to too many for-loops
2018-10-17 21:35:09 +02:00
Cedric Nugteren
9a1454496d
Fixed a bug with the pre-processing and the AXPY kernel
2018-10-17 21:15:53 +02:00
Cedric Nugteren
e33542acdd
Merge pull request #325 from CNugteren/CLBlast-321-axpy-faster-kernel-bug
Fixed a bug in the XaxpyFaster kernel for specific parameters
2018-10-16 21:06:57 +02:00
Cedric Nugteren
664a238adf
Fixed a bug in the XaxpyFaster kernel for specific parameters
2018-10-15 20:08:29 +02:00
Cedric Nugteren
634b2bc75c
Merge pull request #319 from CNugteren/convgemm_multi_kernel
First im2col+GEMM implementation of convolution
2018-10-14 17:27:45 +02:00
Cedric Nugteren
ff7bee93d3
Merge pull request #324 from CNugteren/CLBlast-315-tuning-api-improvements
Made tuning API more flexible
2018-10-14 17:26:13 +02:00
Cedric Nugteren
115a8f0f3d
Updated changelog regarding tuning API change
2018-10-13 17:49:49 +02:00
Cedric Nugteren
46c50cdd7e
Made tuning API more flexible: disregards any extra parameter values
2018-10-13 17:47:29 +02:00
Cedric Nugteren
8676b62178
Updated the documentation for GEMV tuning
2018-10-13 17:43:51 +02:00
Cedric Nugteren
e47b8f7f62
Merge pull request #323 from CNugteren/CLBlast-322-fix-preprocessor-warnings
Fixed pre-processor warnings related to the subgroup shuffling
2018-10-11 09:33:04 +02:00
Cedric Nugteren
1736c0cef4
Fixed pre-processor warnings related to the subgroup shuffling
2018-10-10 19:12:42 +02:00
Cedric Nugteren
83ba3d4b7b
Merge branch 'master' into convgemm_multi_kernel
2018-09-16 20:01:18 +02:00
Cedric Nugteren
c163868e18
Merge pull request #318 from CNugteren/CLBlast-315-preprocessor-gemmk1-issue
Fixed pre-processor issues with the new GEMMK=1 kernel
2018-09-15 21:47:04 +02:00
Cedric Nugteren
0f6dd01e51
Fixed an MSVC compilation error due to large strings
2018-09-15 19:58:07 +02:00
Cedric Nugteren
91dbd580ab
Added a kernel-parameter pair table to document the tuning API
2018-09-15 18:47:31 +02:00
Cedric Nugteren
9bedaa752d
Fixed an MSVC compilation error due to large strings
2018-09-15 17:35:26 +02:00
Cedric Nugteren
8ac39fa331
Disabled Intel subgroup shuffling for double-precision
2018-09-15 16:53:09 +02:00
Cedric Nugteren
51cc346751
Fixed issues with GEMMK=1 kernel and the pre-processor
2018-09-15 16:50:34 +02:00
Cedric Nugteren
4917b77e13
Added pre-processor test for GEMMK=1 kernel
2018-09-15 16:49:51 +02:00
Cedric Nugteren
b7d8339012
Reduced size of the xCONVGEMM correctness tests
2018-09-07 22:04:24 +02:00
Cedric Nugteren
bbb4523b7c
Added reference implementation for xCONVGEMM for half-precision
2018-09-07 22:04:08 +02:00
Cedric Nugteren
c788e040f7
Added xCONVGEMM as im2col plus a batched GEMM kernel
2018-09-07 22:02:44 +02:00
Cedric Nugteren
23e855d643
Merge pull request #316 from ranocha/patch-1
Add Julia Wrapper
2018-09-03 18:03:54 +02:00
Hendrik Ranocha
faed209f30
Add Julia Wrapper
I've written a wrapper of CLBlast in Julia which can be found [here](https://github.com/JuliaGPU/CLBlast.jl). It is published and available using the Julia package manager.
2018-09-03 15:57:16 +02:00
Cedric Nugteren
c2c1e5fa95
Merge pull request #312 from CNugteren/CLBlast-311-missing-event-in-trsv-trsm
Missing events in TRSV and TRSM
2018-08-14 22:52:36 +02:00
Cedric Nugteren
bf43dbb4ee
Made last operation in TRSV and TRSM asynchronous, making the events not null
2018-08-13 22:58:44 +02:00
Cedric Nugteren
3115c15db5
Small refactoring of events in TRSV substitution routine
2018-08-13 22:58:01 +02:00
Cedric Nugteren
dd1fa7cc81
Merge pull request #310 from CNugteren/CLBlast-307-netlib-api-static-opencl-vars
Netlib API with optional static OpenCL variables
2018-08-09 21:37:47 +02:00
Cedric Nugteren
9d9f09fce9
Name change of setting to NETLIB_PERSISTENT_OPENCL
2018-08-07 22:41:06 +02:00
Cedric Nugteren
fe639455bd
Added an option to compile the Netlib API with static OpenCL device and context
2018-08-05 21:12:39 +02:00
Cedric Nugteren
2bea758165
Merge pull request #309 from CNugteren/CLBlast-306-omatcopy-conjugate
Fixes bug in conjugate transpose not being executed
2018-08-02 08:35:32 +02:00
Cedric Nugteren
bed10d2731
Merge pull request #308 from CNugteren/CLBlast-301-weird-AMD-Hainan-bug
Added workaround for AMD Southern Islands GPU issue
2018-07-31 21:49:53 +02:00
Cedric Nugteren
503ab74f02
Fixed issue with not performing complex conjugation under certain cases when transposing
2018-07-31 21:49:37 +02:00
Cedric Nugteren
391e5757bd
Fixed the tests of OMATCOPY to include proper complex conjugation
2018-07-31 21:44:28 +02:00
Cedric Nugteren
713d0f96b3
Fixed an error reporting issue related to the canary region
2018-07-31 21:24:21 +02:00
Cedric Nugteren
d749c4af72
Added note about AMD southern islands GPU issue and the required workaround
2018-07-31 20:55:56 +02:00
Cedric Nugteren
123f38a8ab
Added Beignet 1.2.1 requirement to the README for IvyBridge GPUs
2018-07-31 20:52:00 +02:00