Cedric Nugteren
03bed8633e
Fixed an issue with AMD GPUs and the new GEMMK == 1 kernel
2018-07-27 23:08:49 +02:00
Cedric Nugteren
6a8b9e24f2
Added code to report the average tuning results
2018-07-25 22:28:44 +02:00
Cedric Nugteren
f8fb707fa4
Merge pull request #297 from tyler-utah/master
...
inline PTX to support subgroup shuffle for Nvidia GPUs
2018-07-23 19:43:03 +02:00
Tyler Sorensen
0772d63498
moved a two-line macro to a single line
2018-07-16 20:12:30 -04:00
Tyler Sorensen
f4e5b1c14c
forgot to add test cases back in, oops
2018-07-14 22:47:39 -04:00
Tyler Sorensen
7709a7308b
Applied feedback from Cedric from first pull request
2018-07-14 19:50:47 -04:00
Cedric Nugteren
db179a1e40
Updated to CLBlast version 1.4.1
2018-07-14 12:29:06 +02:00
Cedric Nugteren
f72620f474
Added tuning results for Intel i5-4970S
2018-07-13 21:25:21 +02:00
Cedric Nugteren
3621639b63
Added device-name removal code to handle POCL naming convention
2018-07-13 21:20:27 +02:00
Cedric Nugteren
08b1417956
Added tuning results for GeForce GTX 1070 Ti
2018-07-13 21:07:32 +02:00
Cedric Nugteren
c459582c4f
Added tuning results for HD Graphics 6000 Broadwell GT3
2018-07-13 21:05:43 +02:00
Tyler Sorensen
36093429fd
restored some of the changed tuning files for xgemm
2018-07-11 15:31:51 -04:00
Tyler Sorensen
7f2e98a140
added inline ptx to support shuffle on Nvidia GPUs
2018-07-11 15:12:22 -04:00
Cedric Nugteren
7bae54f61f
Updated changelog
2018-07-06 19:39:46 +02:00
Cedric Nugteren
49e06d20ab
Merge pull request #296 from alycm/CLBlast-291-eliminate-temporary-program
...
Eliminate a temporary Program object
2018-07-06 19:35:10 +02:00
Alastair Murray
25661b2d6f
Eliminate a temporary Program object
...
This was causing a crash for me because the temporary Program's destructor
called clReleaseProgram on the cl_program it wrapped, and then clBuildProgram
was called on that same cl_program (by then belonging to the Program owned by
the shared_ptr, but it's the same cl_program).
2018-07-06 12:58:20 +01:00
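The crash described in the commit above can be illustrated with a minimal sketch. It is not CLBlast code: `FakeHandle`, `FakeRelease`, this `Program` class, and both helper functions are hypothetical stand-ins, and the reference count is a plain integer that is never freed so the sketch stays well-defined. It only shows the general hazard of a temporary RAII wrapper releasing a handle that a longer-lived wrapper still uses.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a cl_program: only a reference count.
// It is never deallocated here, so the count can be inspected safely.
struct FakeHandle {
  int refcount = 1;  // one reference, as right after creation
};

// Hypothetical stand-in for clReleaseProgram: drops one reference.
void FakeRelease(FakeHandle* handle) { --handle->refcount; }

// Hypothetical RAII wrapper that releases its handle on destruction
// and never retains it on construction.
class Program {
 public:
  explicit Program(FakeHandle* handle) : handle_(handle) {
    assert(handle != nullptr);
  }
  ~Program() { FakeRelease(handle_); }
  Program(const Program&) = delete;
  Program& operator=(const Program&) = delete;

  // True while the underlying handle still holds at least one reference.
  bool IsAlive() const { return handle_->refcount > 0; }

 private:
  FakeHandle* handle_;
};

// Problematic pattern: a short-lived wrapper shares the raw handle with
// the wrapper owned by the shared_ptr, and releases it when it dies.
bool AliveAfterTemporary(FakeHandle* raw) {
  auto owned = std::make_shared<Program>(raw);
  {
    Program temporary(raw);  // second wrapper, same handle, no retain
  }                          // destructor releases the handle here
  return owned->IsAlive();   // the long-lived wrapper now sees a dead handle
}

// Pattern without the temporary: only one wrapper ever owns the handle.
bool AliveWithoutTemporary(FakeHandle* raw) {
  auto owned = std::make_shared<Program>(raw);
  return owned->IsAlive();
}
```

With a fresh `FakeHandle` passed to each function, `AliveAfterTemporary` reports that the handle is no longer alive, while `AliveWithoutTemporary` reports that it is. In a real OpenCL program the corresponding situation would be a build call on an already released `cl_program`.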
Cedric Nugteren
43e3f27254
Merge pull request #295 from CNugteren/CLBlast-292-no-cl-program-release-windows
...
Disabled calls to clReleaseProgram under Windows
2018-06-28 21:22:12 +09:00
Cedric Nugteren
e3eedacbcc
Disabled calls to clReleaseProgram under Windows to avoid segfaults when the OpenCL driver unloads first
2018-06-28 20:35:18 +09:00
Cedric Nugteren
4471b67735
Updated to CLBlast version 1.4.0
2018-06-03 13:18:05 +02:00
Cedric Nugteren
fee8df153c
Added list of tuners to be run by 'alltuners' target
2018-06-03 10:42:15 +02:00
Cedric Nugteren
bd1715aff9
Fixes for CUDA version of CLBlast
2018-06-03 10:41:57 +02:00
Cedric Nugteren
4f594e3931
Added MKL as an alternative for CBLAS for correctness and performance comparisons
2018-06-02 17:57:45 +02:00
Cedric Nugteren
7c3431a72a
Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when barriers are present
2018-06-01 20:59:44 +02:00
Cedric Nugteren
5702bff5ad
Added error-checking for half-empty local work group sizes; fixed a minor TRSV global worksize issue
2018-05-31 22:37:06 +02:00
Cedric Nugteren
e609220393
Some potential fixes for error -54 when launching TRSV and TRSM kernels
2018-05-31 20:09:49 +02:00
Cedric Nugteren
ff4d5558a6
Widened Apple OpenCL check, added way to debug too-large-workgroups issue
2018-05-30 22:59:04 +02:00
Cedric Nugteren
a8bb0c9f3c
Added Apple OpenCL TRSV block size override; removed failing old Intel GPU test from README
2018-05-29 21:29:12 +02:00
Cedric Nugteren
6616a59774
Merge pull request #287 from CNugteren/apple-opencl-limitations-fixes
...
Apple opencl limitations for TRSV/TRSM now return not-implemented status
2018-05-27 20:54:27 +02:00
Cedric Nugteren
b3f6950af3
Merge pull request #286 from CNugteren/runtime_statistics_in_client
...
Runtime statistics in client
2018-05-27 19:27:52 +02:00
Cedric Nugteren
01d254c0b0
Added a check to return 'NotImplemented' error code in case of systems with < 16 LWGS for TRSV and TRSM
2018-05-27 18:38:47 +02:00
Cedric Nugteren
53198121ac
Made FillMatrix and FillVector functions take a configurable local workgroup size
2018-05-27 12:03:32 +02:00
Cedric Nugteren
38318fa39c
Added maximum time reporting to the client statistics
2018-05-27 11:39:51 +02:00
Cedric Nugteren
c85c385aaf
Added an option in the clients to output timing statistics: minimum, mean, and standard-deviation
2018-05-23 22:36:38 +02:00
Cedric Nugteren
8e28a7699d
Merge pull request #285 from CNugteren/size_specific_routine_tuner
...
Added an option to run the routine tuner for a single specific GEMM size
2018-05-19 21:06:14 +02:00
Cedric Nugteren
ba0b558e84
Added an option to run the routine tuner for a single specific GEMM size
2018-05-19 17:42:11 +02:00
Cedric Nugteren
507d7bc729
Merge pull request #284 from CNugteren/routine_tuners_read_kernel_json_from_disk
...
Routine tuners read kernel JSON from disk
2018-05-19 17:06:37 +02:00
Cedric Nugteren
76e0079a90
Fixed compilation issues
2018-05-19 14:18:23 +02:00
Cedric Nugteren
66583b3cda
The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run part of alltuners target
2018-05-19 12:48:59 +02:00
Cedric Nugteren
637e49e134
Fixed a bug in loading xgemm-direct JSON data from disk
2018-05-19 12:48:04 +02:00
Cedric Nugteren
0326c7d559
Merge pull request #283 from CNugteren/canary_buffer_overflow_protection
...
Canary buffer overflow protection
2018-05-18 21:32:20 +02:00
Cedric Nugteren
60d057c7fd
Merge branch 'master' into canary_buffer_overflow_protection
2018-05-18 21:30:11 +02:00
Cedric Nugteren
a133563582
Merge pull request #282 from CNugteren/CLBlast-276-program-release-improvements
...
Better cache behaviour of OpenCL programs
2018-05-17 20:26:53 +02:00
Cedric Nugteren
a65772cd30
Updated the roadmap
2018-05-17 12:52:23 +02:00
Cedric Nugteren
e3022e562f
Updated README with IWOCL talk and GPU zoo acknowledgment
2018-05-17 12:50:28 +02:00
Cedric Nugteren
ad57a45039
Added documentation on some details of the GEMM implementation
2018-05-17 12:50:03 +02:00
Cedric Nugteren
8290ad78b9
Fixed a few issues with canary region testing
2018-05-17 12:16:32 +02:00
Cedric Nugteren
85341836dd
Added a canary region for overflow detection to the correctness tests
2018-05-17 10:45:50 +01:00
Cedric Nugteren
b855af681f
Added a canary region for overflow detection to the tuners
2018-05-17 10:45:10 +01:00
Cedric Nugteren
fa4fee4fee
Merge pull request #279 from umar456/ci_links
...
Update ci links to use domain names and build names instead of IP/id
2018-05-09 18:56:25 +02:00
Umar Arshad
1659ae5432
Update ci links to use domain names and build names instead of IP/id
...
Updates the README badges to point to the domain name instead of
IP addresses. Also updates the names of the builds to the name
of the build instead of the id of the build.
2018-05-08 20:24:40 -04:00