Cedric Nugteren
9a1454496d
Fixed a bug with the pre-processing and the AXPY kernel
2018-10-17 21:15:53 +02:00
Cedric Nugteren
664a238adf
Fixed a bug in the XaxpyFaster kernel for specific parameters
2018-10-15 20:08:29 +02:00
Cedric Nugteren
634b2bc75c
Merge pull request #319 from CNugteren/convgemm_multi_kernel
First im2col+GEMM implementation of convolution
2018-10-14 17:27:45 +02:00
Cedric Nugteren
46c50cdd7e
Made tuning API more flexible: disregards any extra parameter values
2018-10-13 17:47:29 +02:00
Cedric Nugteren
1736c0cef4
Fixed pre-processor warnings related to the subgroup shuffling
2018-10-10 19:12:42 +02:00
Cedric Nugteren
83ba3d4b7b
Merge branch 'master' into convgemm_multi_kernel
2018-09-16 20:01:18 +02:00
Cedric Nugteren
0f6dd01e51
Fixed an MSVC compilation error due to large strings
2018-09-15 19:58:07 +02:00
Cedric Nugteren
9bedaa752d
Fixed an MSVC compilation error due to large strings
2018-09-15 17:35:26 +02:00
Cedric Nugteren
8ac39fa331
Disabled Intel subgroup shuffling for double-precision
2018-09-15 16:53:09 +02:00
Cedric Nugteren
51cc346751
Fixed issues with GEMMK=1 kernel and the pre-processor
2018-09-15 16:50:34 +02:00
Cedric Nugteren
c788e040f7
Added xCONVGEMM as im2col plus a batched GEMM kernel
2018-09-07 22:02:44 +02:00
Cedric Nugteren
bf43dbb4ee
Made last operation in TRSV and TRSM asynchronous, making the events not null
2018-08-13 22:58:44 +02:00
Cedric Nugteren
3115c15db5
Small refactoring of events in TRSV substitution routine
2018-08-13 22:58:01 +02:00
Cedric Nugteren
9d9f09fce9
Name change of setting to NETLIB_PERSISTENT_OPENCL
2018-08-07 22:41:06 +02:00
Cedric Nugteren
fe639455bd
Added an option to compile the Netlib API with static OpenCL device and context
2018-08-05 21:12:39 +02:00
Cedric Nugteren
2bea758165
Merge pull request #309 from CNugteren/CLBlast-306-omatcopy-conjugate
Fixes bug in conjugate transpose not being executed
2018-08-02 08:35:32 +02:00
Cedric Nugteren
bed10d2731
Merge pull request #308 from CNugteren/CLBlast-301-weird-AMD-Hainan-bug
Added workaround for AMD Southern Islands GPU issue
2018-07-31 21:49:53 +02:00
Cedric Nugteren
503ab74f02
Fixed an issue where complex conjugation was not performed in certain cases when transposing
2018-07-31 21:49:37 +02:00
Cedric Nugteren
bf24421a34
Updated the tuning results for Intel IvyBridge M GT2
2018-07-31 20:49:41 +02:00
Cedric Nugteren
2b76bfee97
Fixed a wrong event issue causing error -57
2018-07-29 22:16:27 +02:00
Cedric Nugteren
2dd539f911
Removed complex numbers support for CONVGEMM
2018-07-29 10:37:14 +02:00
Cedric Nugteren
5903820ba2
Merge branch 'master' into CLBlast-267-convgemm
2018-07-29 10:26:34 +02:00
Cedric Nugteren
bc47e7e7cc
Added print statements to indicate the 4 stages of GEMM tuning
2018-07-28 16:08:22 +02:00
Cedric Nugteren
fa84ac36f2
The tuners now also check for valid local thread configurations and skip invalid ones completely, saving compilation time
2018-07-28 16:01:03 +02:00
Cedric Nugteren
0f0baa561b
Disabled the use of staggered indices on AMD GPUs for the new GEMMK == 1 kernels to improve performance
2018-07-28 14:36:33 +02:00
Cedric Nugteren
03bed8633e
Fixed an issue with AMD GPUs and the new GEMMK == 1 kernel
2018-07-27 23:08:49 +02:00
Cedric Nugteren
429ff070f8
Fixed a bug: forgot to initialize the shared pointer for the null kernel
2018-07-27 20:53:24 +02:00
Cedric Nugteren
f84036948b
Renamed AMD SI workaround defines
2018-07-27 20:38:01 +02:00
Cedric Nugteren
e8dea34fce
Added workaround for weird AMD SI Hainan bug
2018-07-25 22:59:36 +02:00
Cedric Nugteren
6a8b9e24f2
Added code to report the average tuning results
2018-07-25 22:28:44 +02:00
Cedric Nugteren
f8fb707fa4
Merge pull request #297 from tyler-utah/master
inline PTX to support subgroup shuffle for Nvidia GPUs
2018-07-23 19:43:03 +02:00
Tyler Sorensen
0772d63498
moved a two-line macro to a single line
2018-07-16 20:12:30 -04:00
Tyler Sorensen
f4e5b1c14c
forgot to add test cases back in, oops
2018-07-14 22:47:39 -04:00
Tyler Sorensen
7709a7308b
Applied feedback from Cedric from first pull request
2018-07-14 19:50:47 -04:00
Cedric Nugteren
f72620f474
Added tuning results for Intel i5-4970S
2018-07-13 21:25:21 +02:00
Cedric Nugteren
3621639b63
Added device-name removal code to handle POCL naming convention
2018-07-13 21:20:27 +02:00
Cedric Nugteren
08b1417956
Added tuning results for GeForce GTX 1070 Ti
2018-07-13 21:07:32 +02:00
Cedric Nugteren
c459582c4f
Added tuning results for HD Graphics 6000 Broadwell GT3
2018-07-13 21:05:43 +02:00
Tyler Sorensen
36093429fd
restored some of the changed tuning files for xgemm
2018-07-11 15:31:51 -04:00
Tyler Sorensen
7f2e98a140
added inline ptx to support shuffle on Nvidia GPUs
2018-07-11 15:12:22 -04:00
Alastair Murray
25661b2d6f
Eliminate a temporary Program object
This was causing a crash for me: the temporary Program's destructor called
clReleaseProgram on its cl_program, and clBuildProgram was then called on that
same cl_program, which is also held by the Program owned by the shared_ptr.
2018-07-06 12:58:20 +01:00
Cedric Nugteren
e3eedacbcc
Disabled calls to clReleaseProgram under Windows to avoid segfaults when the OpenCL driver unloads first
2018-06-28 20:35:18 +09:00
Cedric Nugteren
1c9a741470
Merge branch 'master' into CLBlast-267-convgemm
2018-06-03 15:53:27 +02:00
Cedric Nugteren
bd1715aff9
Fixes for CUDA version of CLBlast
2018-06-03 10:41:57 +02:00
Cedric Nugteren
7c3431a72a
Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when barriers are present
2018-06-01 20:59:44 +02:00
Cedric Nugteren
5702bff5ad
Added error-checking for half-empty local work group sizes; fixed a minor TRSV global worksize issue
2018-05-31 22:37:06 +02:00
Cedric Nugteren
e609220393
Some potential fixes for error -54 when launching TRSV and TRSM kernels
2018-05-31 20:09:49 +02:00
Cedric Nugteren
ff4d5558a6
Widened Apple OpenCL check, added way to debug too-large-workgroups issue
2018-05-30 22:59:04 +02:00
Cedric Nugteren
a8bb0c9f3c
Added Apple OpenCL TRSV block size override; removed failing old Intel GPU test from README
2018-05-29 21:29:12 +02:00
Cedric Nugteren
6616a59774
Merge pull request #287 from CNugteren/apple-opencl-limitations-fixes
Apple OpenCL limitations for TRSV/TRSM now return not-implemented status
2018-05-27 20:54:27 +02:00