Cedric Nugteren
c459582c4f
Added tuning results for HD Graphics 6000 Broadwell GT3
2018-07-13 21:05:43 +02:00
Cedric Nugteren
7bae54f61f
Updated changelog
2018-07-06 19:39:46 +02:00
Cedric Nugteren
49e06d20ab
Merge pull request #296 from alycm/CLBlast-291-eliminate-temporary-program
...
Eliminate a temporary Program object
2018-07-06 19:35:10 +02:00
Alastair Murray
25661b2d6f
Eliminate a temporary Program object
...
This was causing a crash for me because the temporary Program's destructor called
clReleaseProgram on its cl_program, and then clBuildProgram was called on the
same cl_program (this time through the Program owned by the shared_ptr, but it
is the same underlying cl_program).
2018-07-06 12:58:20 +01:00
Cedric Nugteren
43e3f27254
Merge pull request #295 from CNugteren/CLBlast-292-no-cl-program-release-windows
...
Disabled calls to clReleaseProgram under Windows
2018-06-28 21:22:12 +09:00
Cedric Nugteren
e3eedacbcc
Disabled calls to clReleaseProgram under Windows to avoid segfaults when the OpenCL driver unloads first
2018-06-28 20:35:18 +09:00
Cedric Nugteren
4471b67735
Updated to CLBlast version 1.4.0
2018-06-03 13:18:05 +02:00
Cedric Nugteren
fee8df153c
Added list of tuners to be run by 'alltuners' target
2018-06-03 10:42:15 +02:00
Cedric Nugteren
bd1715aff9
Fixes for CUDA version of CLBlast
2018-06-03 10:41:57 +02:00
Cedric Nugteren
4f594e3931
Added MKL as an alternative for CBLAS for correctness and performance comparisons
2018-06-02 17:57:45 +02:00
Cedric Nugteren
7c3431a72a
Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when barriers are present
2018-06-01 20:59:44 +02:00
Cedric Nugteren
5702bff5ad
Added error-checking for half-empty local work group sizes; fixed a minor TRSV global worksize issue
2018-05-31 22:37:06 +02:00
Cedric Nugteren
e609220393
Some potential fixes for error -54 when launching TRSV and TRSM kernels
2018-05-31 20:09:49 +02:00
Cedric Nugteren
ff4d5558a6
Widened Apple OpenCL check, added way to debug too-large-workgroups issue
2018-05-30 22:59:04 +02:00
Cedric Nugteren
a8bb0c9f3c
Added Apple OpenCL TRSV block size override; removed failing old Intel GPU test from README
2018-05-29 21:29:12 +02:00
Cedric Nugteren
6616a59774
Merge pull request #287 from CNugteren/apple-opencl-limitations-fixes
...
Apple OpenCL limitations for TRSV/TRSM now return not-implemented status
2018-05-27 20:54:27 +02:00
Cedric Nugteren
b3f6950af3
Merge pull request #286 from CNugteren/runtime_statistics_in_client
...
Runtime statistics in client
2018-05-27 19:27:52 +02:00
Cedric Nugteren
01d254c0b0
Added a check to return 'NotImplemented' error code in case of systems with < 16 LWGS for TRSV and TRSM
2018-05-27 18:38:47 +02:00
Cedric Nugteren
53198121ac
Made FillMatrix and FillVector functions take a configurable local workgroup size
2018-05-27 12:03:32 +02:00
Cedric Nugteren
38318fa39c
Added maximum time reporting to the client statistics
2018-05-27 11:39:51 +02:00
Cedric Nugteren
c85c385aaf
Added an option in the clients to output timing statistics: minimum, mean, and standard-deviation
2018-05-23 22:36:38 +02:00
Cedric Nugteren
8e28a7699d
Merge pull request #285 from CNugteren/size_specific_routine_tuner
...
Added an option to run the routine tuner for a single specific GEMM size
2018-05-19 21:06:14 +02:00
Cedric Nugteren
ba0b558e84
Added an option to run the routine tuner for a single specific GEMM size
2018-05-19 17:42:11 +02:00
Cedric Nugteren
507d7bc729
Merge pull request #284 from CNugteren/routine_tuners_read_kernel_json_from_disk
...
Routine tuners read kernel JSON from disk
2018-05-19 17:06:37 +02:00
Cedric Nugteren
76e0079a90
Fixed compilation issues
2018-05-19 14:18:23 +02:00
Cedric Nugteren
66583b3cda
The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run as part of the 'alltuners' target
2018-05-19 12:48:59 +02:00
Cedric Nugteren
637e49e134
Fixed a bug in loading xgemm-direct JSON data from disk
2018-05-19 12:48:04 +02:00
Cedric Nugteren
0326c7d559
Merge pull request #283 from CNugteren/canary_buffer_overflow_protection
...
Canary buffer overflow protection
2018-05-18 21:32:20 +02:00
Cedric Nugteren
60d057c7fd
Merge branch 'master' into canary_buffer_overflow_protection
2018-05-18 21:30:11 +02:00
Cedric Nugteren
a133563582
Merge pull request #282 from CNugteren/CLBlast-276-program-release-improvements
...
Better cache behaviour of OpenCL programs
2018-05-17 20:26:53 +02:00
Cedric Nugteren
a65772cd30
Updated the roadmap
2018-05-17 12:52:23 +02:00
Cedric Nugteren
e3022e562f
Updated README with IWOCL talk and GPU zoo acknowledgment
2018-05-17 12:50:28 +02:00
Cedric Nugteren
ad57a45039
Added documentation on some details of the GEMM implementation
2018-05-17 12:50:03 +02:00
Cedric Nugteren
8290ad78b9
Fixed a few issues with canary region testing
2018-05-17 12:16:32 +02:00
Cedric Nugteren
85341836dd
Added a canary region for overflow detection to the correctness tests
2018-05-17 10:45:50 +01:00
Cedric Nugteren
b855af681f
Added a canary region for overflow detection to the tuners
2018-05-17 10:45:10 +01:00
Cedric Nugteren
fa4fee4fee
Merge pull request #279 from umar456/ci_links
...
Update ci links to use domain names and build names instead of IP/id
2018-05-09 18:56:25 +02:00
Umar Arshad
1659ae5432
Update ci links to use domain names and build names instead of IP/id
...
Updates the README badges to point to the domain name instead of
IP addresses. Also labels each build by its build name instead of
its build id.
2018-05-08 20:24:40 -04:00
Cedric Nugteren
8b381480f8
Updated README with new badges and paper citation
2018-05-01 20:51:10 +02:00
Cedric Nugteren
8258321a74
Now stores a shared_ptr to the Program class in the cache
2018-05-01 20:34:48 +02:00
Cedric Nugteren
b2248a17ae
Merge pull request #277 from CNugteren/CLBlast-257-intel-subgroups
...
Intel subgroup shuffling
2018-04-29 15:48:35 +02:00
Cedric Nugteren
9f22bc232b
Updated the changelog
2018-04-29 15:06:44 +02:00
Cedric Nugteren
0022107b2a
Updated the roadmap
2018-04-29 15:06:33 +02:00
Cedric Nugteren
7b416c8686
Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program
2018-04-26 21:10:17 +02:00
Cedric Nugteren
2965b87dda
Added Intel subgroup shuffle support to the 2D register caching GEMM kernel
2018-04-24 21:32:42 +02:00
Cedric Nugteren
2b1e0295e6
Added a define to enable subgroup shuffling if supported by the device
2018-04-24 20:41:15 +02:00
Cedric Nugteren
5d46a3193e
Merge pull request #274 from CNugteren/CLBlast-228-2d-register-gemm-kernel
...
Added 2D-register-caching GEMM kernel
2018-04-21 21:15:44 +02:00
Cedric Nugteren
3e3a26e0da
Fixes for the CUDA API
2018-04-20 21:50:36 +02:00
Cedric Nugteren
458e6717a9
Expressed HER2K as two HERK calls
2018-04-18 20:58:29 +02:00
Cedric Nugteren
dcce23d938
Expressed SYR2K as two SYRK calls
2018-04-18 20:29:28 +02:00