Cedric Nugteren
bd1715aff9
Fixes for CUDA version of CLBlast
2018-06-03 10:41:57 +02:00
Cedric Nugteren
4f594e3931
Added MKL as an alternative for CBLAS for correctness and performance comparisons
2018-06-02 17:57:45 +02:00
Cedric Nugteren
7c3431a72a
Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when barriers are present
2018-06-01 20:59:44 +02:00
Cedric Nugteren
5702bff5ad
Added error-checking for half-empty local work group sizes; fixed a minor TRSV global worksize issue
2018-05-31 22:37:06 +02:00
Cedric Nugteren
e609220393
Some potential fixes for error -54 when launching TRSV and TRSM kernels
2018-05-31 20:09:49 +02:00
Cedric Nugteren
ff4d5558a6
Widened Apple OpenCL check, added way to debug too-large-workgroups issue
2018-05-30 22:59:04 +02:00
Cedric Nugteren
a8bb0c9f3c
Added Apple OpenCL TRSV block size override; removed failing old Intel GPU test from README
2018-05-29 21:29:12 +02:00
Cedric Nugteren
6616a59774
Merge pull request #287 from CNugteren/apple-opencl-limitations-fixes
Apple OpenCL limitations for TRSV/TRSM now return not-implemented status
2018-05-27 20:54:27 +02:00
Cedric Nugteren
b3f6950af3
Merge pull request #286 from CNugteren/runtime_statistics_in_client
Runtime statistics in client
2018-05-27 19:27:52 +02:00
Cedric Nugteren
01d254c0b0
Added a check to return the 'NotImplemented' error code on systems with an LWGS below 16 for TRSV and TRSM
2018-05-27 18:38:47 +02:00
Cedric Nugteren
53198121ac
Made FillMatrix and FillVector functions take a configurable local workgroup size
2018-05-27 12:03:32 +02:00
Cedric Nugteren
38318fa39c
Added maximum time reporting to the client statistics
2018-05-27 11:39:51 +02:00
Cedric Nugteren
c85c385aaf
Added an option in the clients to output timing statistics: minimum, mean, and standard-deviation
2018-05-23 22:36:38 +02:00
Cedric Nugteren
8e28a7699d
Merge pull request #285 from CNugteren/size_specific_routine_tuner
Added an option to run the routine tuner for a single specific GEMM size
2018-05-19 21:06:14 +02:00
Cedric Nugteren
ba0b558e84
Added an option to run the routine tuner for a single specific GEMM size
2018-05-19 17:42:11 +02:00
Cedric Nugteren
507d7bc729
Merge pull request #284 from CNugteren/routine_tuners_read_kernel_json_from_disk
Routine tuners read kernel JSON from disk
2018-05-19 17:06:37 +02:00
Cedric Nugteren
76e0079a90
Fixed compilation issues
2018-05-19 14:18:23 +02:00
Cedric Nugteren
66583b3cda
The GEMM routine tuner now loads kernel JSON tuning results from disk if available; it is now run as part of the alltuners target
2018-05-19 12:48:59 +02:00
Cedric Nugteren
637e49e134
Fixed a bug in loading xgemm-direct JSON data from disk
2018-05-19 12:48:04 +02:00
Cedric Nugteren
0326c7d559
Merge pull request #283 from CNugteren/canary_buffer_overflow_protection
Canary buffer overflow protection
2018-05-18 21:32:20 +02:00
Cedric Nugteren
60d057c7fd
Merge branch 'master' into canary_buffer_overflow_protection
2018-05-18 21:30:11 +02:00
Cedric Nugteren
a133563582
Merge pull request #282 from CNugteren/CLBlast-276-program-release-improvements
Better cache behaviour of OpenCL programs
2018-05-17 20:26:53 +02:00
Cedric Nugteren
a65772cd30
Updated the roadmap
2018-05-17 12:52:23 +02:00
Cedric Nugteren
e3022e562f
Updated README with IWOCL talk and GPU zoo acknowledgment
2018-05-17 12:50:28 +02:00
Cedric Nugteren
ad57a45039
Added documentation on some details of the GEMM implementation
2018-05-17 12:50:03 +02:00
Cedric Nugteren
8290ad78b9
Fixed a few issues with canary region testing
2018-05-17 12:16:32 +02:00
Cedric Nugteren
85341836dd
Added a canary region for overflow detection to the correctness tests
2018-05-17 10:45:50 +01:00
Cedric Nugteren
b855af681f
Added a canary region for overflow detection to the tuners
2018-05-17 10:45:10 +01:00
Cedric Nugteren
fa4fee4fee
Merge pull request #279 from umar456/ci_links
Update CI links to use domain names and build names instead of IP/id
2018-05-09 18:56:25 +02:00
Umar Arshad
1659ae5432
Update CI links to use domain names and build names instead of IP/id
Updates the README badges to point to the domain name instead of
IP addresses. Also updates the names of the builds to the name
of the build instead of the id of the build.
2018-05-08 20:24:40 -04:00
Cedric Nugteren
8b381480f8
Updated README with new badges and paper citation
2018-05-01 20:51:10 +02:00
Cedric Nugteren
8258321a74
Now stores a shared_ptr to the Program class in the cache
2018-05-01 20:34:48 +02:00
Cedric Nugteren
b2248a17ae
Merge pull request #277 from CNugteren/CLBlast-257-intel-subgroups
Intel subgroup shuffling
2018-04-29 15:48:35 +02:00
Cedric Nugteren
9f22bc232b
Updated the changelog
2018-04-29 15:06:44 +02:00
Cedric Nugteren
0022107b2a
Updated the roadmap
2018-04-29 15:06:33 +02:00
Cedric Nugteren
7b416c8686
Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program
2018-04-26 21:10:17 +02:00
Cedric Nugteren
2965b87dda
Added Intel subgroup shuffle support to the 2D register caching GEMM kernel
2018-04-24 21:32:42 +02:00
Cedric Nugteren
2b1e0295e6
Added a define to enable subgroup shuffling if supported by the device
2018-04-24 20:41:15 +02:00
Cedric Nugteren
5d46a3193e
Merge pull request #274 from CNugteren/CLBlast-228-2d-register-gemm-kernel
Added 2D-register-caching GEMM kernel
2018-04-21 21:15:44 +02:00
Cedric Nugteren
3e3a26e0da
Fixes for the CUDA API
2018-04-20 21:50:36 +02:00
Cedric Nugteren
458e6717a9
Expressed HER2K as two HERK calls
2018-04-18 20:58:29 +02:00
Cedric Nugteren
dcce23d938
Expressed SYR2K as two SYRK calls
2018-04-18 20:29:28 +02:00
Cedric Nugteren
ef6b1207df
Updated HERK and SYRK to follow the GEMM style and reuse its functions so that they work with the new kernel
2018-04-17 21:13:28 +02:00
Cedric Nugteren
93610a9cba
Fixed some failing tests for GEMM and batched GEMM routines
2018-04-15 12:53:32 +02:00
Cedric Nugteren
f14e6f87d2
Updated tuning results for the Skylake ULT GT2 GPU with the new kernel
2018-04-15 11:45:45 +02:00
Cedric Nugteren
0dff7f1ac4
Made GEMM rotation expectations kernel-specific
2018-04-13 22:27:11 +02:00
Cedric Nugteren
0f49dd24e5
Updated database with defaults of GEMMK=0 and KREG=1
2018-04-10 21:26:18 +02:00
Cedric Nugteren
f6a48f05ed
Made it possible to add tuning parameters to the database using the script
2018-04-10 21:24:36 +02:00
Cedric Nugteren
3fbbb81137
Fixed a bug in the compression part of the database script
2018-04-10 21:18:11 +02:00
Cedric Nugteren
77ba11f686
Extended the maximum number of tuning parameters from 14 to 16
2018-04-08 18:12:54 +02:00