Commit Graph

1473 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)

Author SHA1 Message Date
Cedric Nugteren 44b630fc22 Some name changes in im2col code 2018-10-22 22:12:58 +02:00
Cedric Nugteren ab0178c56b Fixed MSVC's compilation error C1061 due to too many for-loops 2018-10-17 21:35:09 +02:00
Cedric Nugteren 9a1454496d Fixed a bug with the pre-processing and the AXPY kernel 2018-10-17 21:15:53 +02:00
Cedric Nugteren e33542acdd Merge pull request #325 from CNugteren/CLBlast-321-axpy-faster-kernel-bug: Fixed a bug in the XaxpyFaster kernel for specific parameters 2018-10-16 21:06:57 +02:00
Cedric Nugteren 664a238adf Fixed a bug in the XaxpyFaster kernel for specific parameters 2018-10-15 20:08:29 +02:00
Cedric Nugteren 634b2bc75c Merge pull request #319 from CNugteren/convgemm_multi_kernel: First im2col+GEMM implementation of convolution 2018-10-14 17:27:45 +02:00
Cedric Nugteren ff7bee93d3 Merge pull request #324 from CNugteren/CLBlast-315-tuning-api-improvements: Made tuning API more flexible 2018-10-14 17:26:13 +02:00
Cedric Nugteren 115a8f0f3d Updated changelog regarding tuning API change 2018-10-13 17:49:49 +02:00
Cedric Nugteren 46c50cdd7e Made tuning API more flexible: disregards any extra parameter values 2018-10-13 17:47:29 +02:00
Cedric Nugteren 8676b62178 Updated the documentation for GEMV tuning 2018-10-13 17:43:51 +02:00
Cedric Nugteren e47b8f7f62 Merge pull request #323 from CNugteren/CLBlast-322-fix-preprocessor-warnings: Fixed pre-processor warnings related to the subgroup shuffling 2018-10-11 09:33:04 +02:00
Cedric Nugteren 1736c0cef4 Fixed pre-processor warnings related to the subgroup shuffling 2018-10-10 19:12:42 +02:00
Cedric Nugteren 83ba3d4b7b Merge branch 'master' into convgemm_multi_kernel 2018-09-16 20:01:18 +02:00
Cedric Nugteren c163868e18 Merge pull request #318 from CNugteren/CLBlast-315-preprocessor-gemmk1-issue: Fixed pre-processor issues with the new GEMMK=1 kernel 2018-09-15 21:47:04 +02:00
Cedric Nugteren 0f6dd01e51 Fixed an MSVC compilation error due to large strings 2018-09-15 19:58:07 +02:00
Cedric Nugteren 91dbd580ab Added a kernel-parameter pair table to document the tuning API 2018-09-15 18:47:31 +02:00
Cedric Nugteren 9bedaa752d Fixed an MSVC compilation error due to large strings 2018-09-15 17:35:26 +02:00
Cedric Nugteren 8ac39fa331 Disabled Intel subgroup shuffling for double-precision 2018-09-15 16:53:09 +02:00
Cedric Nugteren 51cc346751 Fixed issues with GEMMK=1 kernel and the pre-processor 2018-09-15 16:50:34 +02:00
Cedric Nugteren 4917b77e13 Added pre-processor test for GEMMK=1 kernel 2018-09-15 16:49:51 +02:00
Cedric Nugteren b7d8339012 Reduced size of the xCONVGEMM correctness tests 2018-09-07 22:04:24 +02:00
Cedric Nugteren bbb4523b7c Added reference implementation for xCONVGEMM for half-precision 2018-09-07 22:04:08 +02:00
Cedric Nugteren c788e040f7 Added xCONVGEMM as im2col plus a batched GEMM kernel 2018-09-07 22:02:44 +02:00
Cedric Nugteren 23e855d643 Merge pull request #316 from ranocha/patch-1: Add Julia Wrapper 2018-09-03 18:03:54 +02:00
Hendrik Ranocha faed209f30 Add Julia Wrapper: I've written a wrapper of CLBlast in Julia which can be found [here](https://github.com/JuliaGPU/CLBlast.jl). It is published and available using the Julia package manager. 2018-09-03 15:57:16 +02:00
Cedric Nugteren c2c1e5fa95 Merge pull request #312 from CNugteren/CLBlast-311-missing-event-in-trsv-trsm: Missing events in TRSV and TRSM 2018-08-14 22:52:36 +02:00
Cedric Nugteren bf43dbb4ee Made last operation in TRSV and TRSM asynchronous, making the events not null 2018-08-13 22:58:44 +02:00
Cedric Nugteren 3115c15db5 Small refactoring of events in TRSV substitution routine 2018-08-13 22:58:01 +02:00
Cedric Nugteren dd1fa7cc81 Merge pull request #310 from CNugteren/CLBlast-307-netlib-api-static-opencl-vars: Netlib API with optional static OpenCL variables 2018-08-09 21:37:47 +02:00
Cedric Nugteren 9d9f09fce9 Name change of setting to NETLIB_PERSISTENT_OPENCL 2018-08-07 22:41:06 +02:00
Cedric Nugteren fe639455bd Added an option to compile the Netlib API with static OpenCL device and context 2018-08-05 21:12:39 +02:00
Cedric Nugteren 2bea758165 Merge pull request #309 from CNugteren/CLBlast-306-omatcopy-conjugate: Fixes bug in conjugate transpose not being executed 2018-08-02 08:35:32 +02:00
Cedric Nugteren bed10d2731 Merge pull request #308 from CNugteren/CLBlast-301-weird-AMD-Hainan-bug: Added workaround for AMD Southern Islands GPU issue 2018-07-31 21:49:53 +02:00
Cedric Nugteren 503ab74f02 Fixed issue with not performing complex conjugation under certain cases when transposing 2018-07-31 21:49:37 +02:00
Cedric Nugteren 391e5757bd Fixed the tests of OMATCOPY to include proper complex conjugation 2018-07-31 21:44:28 +02:00
Cedric Nugteren 713d0f96b3 Fixed an error reporting issue related to the canary region 2018-07-31 21:24:21 +02:00
Cedric Nugteren d749c4af72 Added note about AMD southern islands GPU issue and the required workaround 2018-07-31 20:55:56 +02:00
Cedric Nugteren 123f38a8ab Added Beignet 1.2.1 requirement to the README for IvyBridge GPUs 2018-07-31 20:52:00 +02:00
Cedric Nugteren bf24421a34 Updated the tuning results for Intel IvyBridge M GT2 2018-07-31 20:49:41 +02:00
Cedric Nugteren 38bdb248cd Merge pull request #305 from CNugteren/CLBlast-303-tuner-check-local-size: Tuners now check for valid local thread size 2018-07-30 21:13:30 +02:00
Cedric Nugteren 2b76bfee97 Fixed a wrong event issue causing error -57 2018-07-29 22:16:27 +02:00
Cedric Nugteren 2dd539f911 Removed complex numbers support for CONVGEMM 2018-07-29 10:37:14 +02:00
Cedric Nugteren 5903820ba2 Merge branch 'master' into CLBlast-267-convgemm 2018-07-29 10:26:34 +02:00
Cedric Nugteren bc47e7e7cc Added print statements to indicate the 4 stages of GEMM tuning 2018-07-28 16:08:22 +02:00
Cedric Nugteren fa84ac36f2 The tuners now also check for valid local thread configurations and skip invalid ones completely, saving compilation time 2018-07-28 16:01:03 +02:00
Cedric Nugteren dda1e567f8 Merge pull request #304 from CNugteren/CLBlast-300-fix-staggered-indices-AMD-GEMMK1: Fix staggered indices on AMD GPUs for GEMMK == 1 kernel 2018-07-28 15:29:16 +02:00
Cedric Nugteren 0f0baa561b Disabled the use of staggered indices on AMD GPUs for the new GEMMK == 1 kernels to improve performance 2018-07-28 14:36:33 +02:00
Cedric Nugteren 03bed8633e Fixed an issue with AMD GPUs and the new GEMMK == 1 kernel 2018-07-27 23:08:49 +02:00
Cedric Nugteren 429ff070f8 Fixed a bug: forgot to initialize the shared pointer for the null kernel 2018-07-27 20:53:24 +02:00
Cedric Nugteren f84036948b Renamed AMD SI workaround defines 2018-07-27 20:38:01 +02:00