Cedric Nugteren
0f0baa561b
Disabled the use of staggered indices on AMD GPUs for the new GEMMK == 1 kernels to improve performance
2018-07-28 14:36:33 +02:00
Cedric Nugteren
03bed8633e
Fixed an issue with AMD GPUs and the new GEMMK == 1 kernel
2018-07-27 23:08:49 +02:00
Cedric Nugteren
429ff070f8
Fixed a bug: forgot to initialize the shared pointer for the null kernel
2018-07-27 20:53:24 +02:00
Cedric Nugteren
f84036948b
Renamed AMD SI workaround defines
2018-07-27 20:38:01 +02:00
Cedric Nugteren
e8dea34fce
Added workaround for weird AMD SI Hainan bug
2018-07-25 22:59:36 +02:00
Cedric Nugteren
6a8b9e24f2
Added code to report the average tuning results
2018-07-25 22:28:44 +02:00
Cedric Nugteren
f8fb707fa4
Merge pull request #297 from tyler-utah/master
inline PTX to support subgroup shuffle for Nvidia GPUs
2018-07-23 19:43:03 +02:00
Tyler Sorensen
0772d63498
moved a two-line macro to a single line
2018-07-16 20:12:30 -04:00
Tyler Sorensen
f4e5b1c14c
forgot to add test cases back in, oops
2018-07-14 22:47:39 -04:00
Tyler Sorensen
7709a7308b
Applied feedback from Cedric from first pull request
2018-07-14 19:50:47 -04:00
Cedric Nugteren
db179a1e40
Updated to CLBlast version 1.4.1
2018-07-14 12:29:06 +02:00
Cedric Nugteren
f72620f474
Added tuning results for Intel i5-4970S
2018-07-13 21:25:21 +02:00
Cedric Nugteren
3621639b63
Added device-name removal code to handle POCL naming convention
2018-07-13 21:20:27 +02:00
Cedric Nugteren
08b1417956
Added tuning results for GeForce GTX 1070 Ti
2018-07-13 21:07:32 +02:00
Cedric Nugteren
c459582c4f
Added tuning results for HD Graphics 6000 Broadwell GT3
2018-07-13 21:05:43 +02:00
Tyler Sorensen
36093429fd
restored some of the changed tuning files for xgemm
2018-07-11 15:31:51 -04:00
Tyler Sorensen
7f2e98a140
added inline ptx to support shuffle on Nvidia GPUs
2018-07-11 15:12:22 -04:00
Cedric Nugteren
7bae54f61f
Updated changelog
2018-07-06 19:39:46 +02:00
Cedric Nugteren
49e06d20ab
Merge pull request #296 from alycm/CLBlast-291-eliminate-temporary-program
Eliminate a temporary Program object
2018-07-06 19:35:10 +02:00
Alastair Murray
25661b2d6f
Eliminate a temporary Program object
This was causing a crash for me because the temporary Program's destructor called
clReleaseProgram on its cl_program, and then clBuildProgram was called on the
same cl_program (it belongs to the Program owned by the shared_ptr, but it is
the same cl_program).
2018-07-06 12:58:20 +01:00
Cedric Nugteren
43e3f27254
Merge pull request #295 from CNugteren/CLBlast-292-no-cl-program-release-windows
Disabled calls to clReleaseProgram under Windows
2018-06-28 21:22:12 +09:00
Cedric Nugteren
e3eedacbcc
Disabled calls to clReleaseProgram under Windows to avoid segfaults when the OpenCL driver unloads first
2018-06-28 20:35:18 +09:00
Cedric Nugteren
1c9a741470
Merge branch 'master' into CLBlast-267-convgemm
2018-06-03 15:53:27 +02:00
Cedric Nugteren
4471b67735
Updated to CLBlast version 1.4.0
2018-06-03 13:18:05 +02:00
Cedric Nugteren
fee8df153c
Added list of tuners to be run by 'alltuners' target
2018-06-03 10:42:15 +02:00
Cedric Nugteren
bd1715aff9
Fixes for CUDA version of CLBlast
2018-06-03 10:41:57 +02:00
Cedric Nugteren
4f594e3931
Added MKL as an alternative for CBLAS for correctness and performance comparisons
2018-06-02 17:57:45 +02:00
Cedric Nugteren
7c3431a72a
Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when barriers are present
2018-06-01 20:59:44 +02:00
Cedric Nugteren
5702bff5ad
Added error-checking for half-empty local work group sizes; fixed a minor TRSV global worksize issue
2018-05-31 22:37:06 +02:00
Cedric Nugteren
e609220393
Some potential fixes for error -54 when launching TRSV and TRSM kernels
2018-05-31 20:09:49 +02:00
Cedric Nugteren
ff4d5558a6
Widened Apple OpenCL check, added way to debug too-large-workgroups issue
2018-05-30 22:59:04 +02:00
Cedric Nugteren
a8bb0c9f3c
Added Apple OpenCL TRSV block size override; removed failing old Intel GPU test from README
2018-05-29 21:29:12 +02:00
Cedric Nugteren
6616a59774
Merge pull request #287 from CNugteren/apple-opencl-limitations-fixes
Apple OpenCL limitations for TRSV/TRSM now return not-implemented status
2018-05-27 20:54:27 +02:00
Cedric Nugteren
b3f6950af3
Merge pull request #286 from CNugteren/runtime_statistics_in_client
Runtime statistics in client
2018-05-27 19:27:52 +02:00
Cedric Nugteren
01d254c0b0
Added a check to return 'NotImplemented' error code in case of systems with < 16 LWGS for TRSV and TRSM
2018-05-27 18:38:47 +02:00
Cedric Nugteren
53198121ac
Made FillMatrix and FillVector functions take a configurable local workgroup size
2018-05-27 12:03:32 +02:00
Cedric Nugteren
38318fa39c
Added maximum time reporting to the client statistics
2018-05-27 11:39:51 +02:00
Cedric Nugteren
c85c385aaf
Added an option in the clients to output timing statistics: minimum, mean, and standard-deviation
2018-05-23 22:36:38 +02:00
Cedric Nugteren
838422fbb1
Further implemented single-kernel approach of convgemm; extended test to capture other parts of the kernel code
2018-05-21 11:47:16 +02:00
Cedric Nugteren
5d87abf780
Added method selection option to switch between im2col and single-kernel approach for convgemm
2018-05-21 11:28:11 +02:00
Cedric Nugteren
8e28a7699d
Merge pull request #285 from CNugteren/size_specific_routine_tuner
Added an option to run the routine tuner for a single specific GEMM size
2018-05-19 21:06:14 +02:00
Cedric Nugteren
37cabd4f1f
Moved new convgemm kernel to levelx kernel folder
2018-05-19 21:05:45 +02:00
Cedric Nugteren
27b52ac2c8
Second version of direct reading from image tensor for convgemm: also with local memory support now
2018-05-19 21:02:44 +02:00
Cedric Nugteren
cbcd4ff7e8
Merge branch 'master' into CLBlast-267-convgemm
2018-05-19 17:54:27 +02:00
Cedric Nugteren
ba0b558e84
Added an option to run the routine tuner for a single specific GEMM size
2018-05-19 17:42:11 +02:00
Cedric Nugteren
507d7bc729
Merge pull request #284 from CNugteren/routine_tuners_read_kernel_json_from_disk
Routine tuners read kernel JSON from disk
2018-05-19 17:06:37 +02:00
Cedric Nugteren
76e0079a90
Fixed compilation issues
2018-05-19 14:18:23 +02:00
Cedric Nugteren
66583b3cda
The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run part of alltuners target
2018-05-19 12:48:59 +02:00
Cedric Nugteren
637e49e134
Fixed a bug in loading xgemm-direct JSON data from disk
2018-05-19 12:48:04 +02:00
Cedric Nugteren
0326c7d559
Merge pull request #283 from CNugteren/canary_buffer_overflow_protection
Canary buffer overflow protection
2018-05-18 21:32:20 +02:00