Cedric Nugteren
6616a59774
Merge pull request #287 from CNugteren/apple-opencl-limitations-fixes
...
Apple OpenCL limitations for TRSV/TRSM now return not-implemented status
2018-05-27 20:54:27 +02:00
Cedric Nugteren
b3f6950af3
Merge pull request #286 from CNugteren/runtime_statistics_in_client
...
Runtime statistics in client
2018-05-27 19:27:52 +02:00
Cedric Nugteren
01d254c0b0
Added a check to return 'NotImplemented' error code in case of systems with < 16 LWGS for TRSV and TRSM
2018-05-27 18:38:47 +02:00
Cedric Nugteren
53198121ac
Made FillMatrix and FillVector functions take a configurable local workgroup size
2018-05-27 12:03:32 +02:00
Cedric Nugteren
38318fa39c
Added maximum time reporting to the client statistics
2018-05-27 11:39:51 +02:00
Cedric Nugteren
c85c385aaf
Added an option in the clients to output timing statistics: minimum, mean, and standard-deviation
2018-05-23 22:36:38 +02:00
Cedric Nugteren
838422fbb1
Further implemented single-kernel approach of convgemm; extended test to capture other parts of the kernel code
2018-05-21 11:47:16 +02:00
Cedric Nugteren
5d87abf780
Added method selection option to switch between im2col and single-kernel approach for convgemm
2018-05-21 11:28:11 +02:00
Cedric Nugteren
8e28a7699d
Merge pull request #285 from CNugteren/size_specific_routine_tuner
...
Added an option to run the routine tuner for a single specific GEMM size
2018-05-19 21:06:14 +02:00
Cedric Nugteren
37cabd4f1f
Moved new convgemm kernel to levelx kernel folder
2018-05-19 21:05:45 +02:00
Cedric Nugteren
27b52ac2c8
Second version of direct reading from image tensor for convgemm: also with local memory support now
2018-05-19 21:02:44 +02:00
Cedric Nugteren
cbcd4ff7e8
Merge branch 'master' into CLBlast-267-convgemm
2018-05-19 17:54:27 +02:00
Cedric Nugteren
ba0b558e84
Added an option to run the routine tuner for a single specific GEMM size
2018-05-19 17:42:11 +02:00
Cedric Nugteren
507d7bc729
Merge pull request #284 from CNugteren/routine_tuners_read_kernel_json_from_disk
...
Routine tuners read kernel JSON from disk
2018-05-19 17:06:37 +02:00
Cedric Nugteren
76e0079a90
Fixed compilation issues
2018-05-19 14:18:23 +02:00
Cedric Nugteren
66583b3cda
The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run part of alltuners target
2018-05-19 12:48:59 +02:00
Cedric Nugteren
637e49e134
Fixed a bug in loading xgemm-direct JSON data from disk
2018-05-19 12:48:04 +02:00
Cedric Nugteren
0326c7d559
Merge pull request #283 from CNugteren/canary_buffer_overflow_protection
...
Canary buffer overflow protection
2018-05-18 21:32:20 +02:00
Cedric Nugteren
60d057c7fd
Merge branch 'master' into canary_buffer_overflow_protection
2018-05-18 21:30:11 +02:00
Cedric Nugteren
a133563582
Merge pull request #282 from CNugteren/CLBlast-276-program-release-improvements
...
Better cache behaviour of OpenCL programs
2018-05-17 20:26:53 +02:00
Cedric Nugteren
a65772cd30
Updated the roadmap
2018-05-17 12:52:23 +02:00
Cedric Nugteren
e3022e562f
Updated README with IWOCL talk and GPU zoo acknowledgment
2018-05-17 12:50:28 +02:00
Cedric Nugteren
ad57a45039
Added documentation on some details of the GEMM implementation
2018-05-17 12:50:03 +02:00
Cedric Nugteren
8290ad78b9
Fixed a few issues with canary region testing
2018-05-17 12:16:32 +02:00
Cedric Nugteren
85341836dd
Added a canary region for overflow detection to the correctness tests
2018-05-17 10:45:50 +01:00
Cedric Nugteren
b855af681f
Added a canary region for overflow detection to the tuners
2018-05-17 10:45:10 +01:00
Cedric Nugteren
e057a9186a
First version of direct reading from image tensor for convgemm: only for edge cases now
2018-05-17 09:23:28 +01:00
Cedric Nugteren
0cb9580042
Created a dedicated convgemm GEMM kernel as a copy of the batched direct gemm kernel
2018-05-13 22:10:21 +02:00
Cedric Nugteren
ad8f1027ab
Plugged in the code of strided-batched-gemm into convgemm in preparation of a new kernel
2018-05-13 21:01:46 +02:00
Cedric Nugteren
4e6d30088d
Changed temporary convgemm implementation to use batched-strided GEMM
2018-05-09 20:38:39 +02:00
Cedric Nugteren
b608280361
Fixed the performance client for convgemm and added GFLOPS measurements
2018-05-09 19:59:31 +02:00
Cedric Nugteren
fa4fee4fee
Merge pull request #279 from umar456/ci_links
...
Update ci links to use domain names and build names instead of IP/id
2018-05-09 18:56:25 +02:00
Cedric Nugteren
a4119531ee
Updated the documentation for convgemm to include data layout (NCHW)
2018-05-09 17:46:27 +02:00
Cedric Nugteren
cc95d4fa03
Implemented convolution as im2col + GEMM
2018-05-09 17:42:59 +02:00
Cedric Nugteren
52e6195628
Split channels/strides testing values off from kernel sizes for more flexibility
2018-05-09 17:23:55 +02:00
Umar Arshad
1659ae5432
Update ci links to use domain names and build names instead of IP/id
...
Updates the README badges to point to the domain name instead of
IP addresses. Also updates the names of the builds to the name
of the build instead of the id of the build.
2018-05-08 20:24:40 -04:00
Cedric Nugteren
2d1f6ba7fe
Added convgemm skeleton, test infrastructure, and first reference implementation
2018-05-06 11:35:34 +02:00
Cedric Nugteren
2776d76176
Added interface of batched convolution as GEMM
2018-05-05 14:06:33 +02:00
Cedric Nugteren
8b381480f8
Updated README with new badges and paper citation
2018-05-01 20:51:10 +02:00
Cedric Nugteren
8258321a74
Now stores a shared_ptr to the Program class in the cache
2018-05-01 20:34:48 +02:00
Cedric Nugteren
b2248a17ae
Merge pull request #277 from CNugteren/CLBlast-257-intel-subgroups
...
Intel subgroup shuffling
2018-04-29 15:48:35 +02:00
Cedric Nugteren
9f22bc232b
Updated the changelog
2018-04-29 15:06:44 +02:00
Cedric Nugteren
0022107b2a
Updated the roadmap
2018-04-29 15:06:33 +02:00
Cedric Nugteren
7b416c8686
Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program
2018-04-26 21:10:17 +02:00
Cedric Nugteren
2965b87dda
Added Intel subgroup shuffle support to the 2D register caching GEMM kernel
2018-04-24 21:32:42 +02:00
Cedric Nugteren
2b1e0295e6
Added a define to enable subgroup shuffling if supported by the device
2018-04-24 20:41:15 +02:00
Cedric Nugteren
5d46a3193e
Merge pull request #274 from CNugteren/CLBlast-228-2d-register-gemm-kernel
...
Added 2D-register-caching GEMM kernel
2018-04-21 21:15:44 +02:00
Cedric Nugteren
3e3a26e0da
Fixes for the CUDA API
2018-04-20 21:50:36 +02:00
Cedric Nugteren
458e6717a9
Expressed HER2K as two HERK calls
2018-04-18 20:58:29 +02:00
Cedric Nugteren
dcce23d938
Expressed SYR2K as two SYRK calls
2018-04-18 20:29:28 +02:00