Cedric Nugteren
b3f6950af3
Merge pull request #286 from CNugteren/runtime_statistics_in_client
Runtime statistics in client
2018-05-27 19:27:52 +02:00
Cedric Nugteren
38318fa39c
Added maximum time reporting to the client statistics
2018-05-27 11:39:51 +02:00
Cedric Nugteren
c85c385aaf
Added an option in the clients to output timing statistics: minimum, mean, and standard-deviation
2018-05-23 22:36:38 +02:00
Cedric Nugteren
8e28a7699d
Merge pull request #285 from CNugteren/size_specific_routine_tuner
Added an option to run the routine tuner for a single specific GEMM size
2018-05-19 21:06:14 +02:00
Cedric Nugteren
ba0b558e84
Added an option to run the routine tuner for a single specific GEMM size
2018-05-19 17:42:11 +02:00
Cedric Nugteren
507d7bc729
Merge pull request #284 from CNugteren/routine_tuners_read_kernel_json_from_disk
Routine tuners read kernel JSON from disk
2018-05-19 17:06:37 +02:00
Cedric Nugteren
76e0079a90
Fixed compilation issues
2018-05-19 14:18:23 +02:00
Cedric Nugteren
66583b3cda
The GEMM routine tuner now loads kernel JSON tuning results from disk if available; it now runs as part of the alltuners target
2018-05-19 12:48:59 +02:00
Cedric Nugteren
637e49e134
Fixed a bug in loading xgemm-direct JSON data from disk
2018-05-19 12:48:04 +02:00
Cedric Nugteren
0326c7d559
Merge pull request #283 from CNugteren/canary_buffer_overflow_protection
Canary buffer overflow protection
2018-05-18 21:32:20 +02:00
Cedric Nugteren
60d057c7fd
Merge branch 'master' into canary_buffer_overflow_protection
2018-05-18 21:30:11 +02:00
Cedric Nugteren
a133563582
Merge pull request #282 from CNugteren/CLBlast-276-program-release-improvements
Better cache behaviour of OpenCL programs
2018-05-17 20:26:53 +02:00
Cedric Nugteren
a65772cd30
Updated the roadmap
2018-05-17 12:52:23 +02:00
Cedric Nugteren
e3022e562f
Updated README with IWOCL talk and GPU zoo acknowledgment
2018-05-17 12:50:28 +02:00
Cedric Nugteren
ad57a45039
Added documentation on some details of the GEMM implementation
2018-05-17 12:50:03 +02:00
Cedric Nugteren
8290ad78b9
Fixed a few issues with canary region testing
2018-05-17 12:16:32 +02:00
Cedric Nugteren
85341836dd
Added a canary region for overflow detection to the correctness tests
2018-05-17 10:45:50 +01:00
Cedric Nugteren
b855af681f
Added a canary region for overflow detection to the tuners
2018-05-17 10:45:10 +01:00
Cedric Nugteren
fa4fee4fee
Merge pull request #279 from umar456/ci_links
Update CI links to use domain names and build names instead of IP/id
2018-05-09 18:56:25 +02:00
Umar Arshad
1659ae5432
Update CI links to use domain names and build names instead of IP/id
Updates the README badges to point to the domain name instead of
IP addresses. Also updates the names of the builds to the name
of the build instead of the id of the build.
2018-05-08 20:24:40 -04:00
Cedric Nugteren
8b381480f8
Updated README with new badges and paper citation
2018-05-01 20:51:10 +02:00
Cedric Nugteren
8258321a74
Now stores a shared_ptr to the Program class in the cache
2018-05-01 20:34:48 +02:00
Cedric Nugteren
b2248a17ae
Merge pull request #277 from CNugteren/CLBlast-257-intel-subgroups
Intel subgroup shuffling
2018-04-29 15:48:35 +02:00
Cedric Nugteren
9f22bc232b
Updated the changelog
2018-04-29 15:06:44 +02:00
Cedric Nugteren
0022107b2a
Updated the roadmap
2018-04-29 15:06:33 +02:00
Cedric Nugteren
7b416c8686
Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program
2018-04-26 21:10:17 +02:00
Cedric Nugteren
2965b87dda
Added Intel subgroup shuffle support to the 2D register caching GEMM kernel
2018-04-24 21:32:42 +02:00
Cedric Nugteren
2b1e0295e6
Added a define to enable subgroup shuffling if supported by the device
2018-04-24 20:41:15 +02:00
Cedric Nugteren
5d46a3193e
Merge pull request #274 from CNugteren/CLBlast-228-2d-register-gemm-kernel
Added 2D-register-caching GEMM kernel
2018-04-21 21:15:44 +02:00
Cedric Nugteren
3e3a26e0da
Fixes for the CUDA API
2018-04-20 21:50:36 +02:00
Cedric Nugteren
458e6717a9
Expressed HER2K as two HERK calls
2018-04-18 20:58:29 +02:00
Cedric Nugteren
dcce23d938
Expressed SYR2K as two SYRK calls
2018-04-18 20:29:28 +02:00
Cedric Nugteren
ef6b1207df
Updated HERK and SYRK to follow the GEMM style and functions to make it work with the new kernel
2018-04-17 21:13:28 +02:00
Cedric Nugteren
93610a9cba
Fixed some failing tests for GEMM and batched GEMM routines
2018-04-15 12:53:32 +02:00
Cedric Nugteren
f14e6f87d2
Updated tuning results for the Skylake ULT GT2 GPU with the new kernel
2018-04-15 11:45:45 +02:00
Cedric Nugteren
0dff7f1ac4
Made GEMM rotation expectations kernel-specific
2018-04-13 22:27:11 +02:00
Cedric Nugteren
0f49dd24e5
Updated database with defaults of GEMMK=0 and KREG=1
2018-04-10 21:26:18 +02:00
Cedric Nugteren
f6a48f05ed
Made it possible to add tuning parameters to the database using the script
2018-04-10 21:24:36 +02:00
Cedric Nugteren
3fbbb81137
Fixed a bug in the compression part of the database script
2018-04-10 21:18:11 +02:00
Cedric Nugteren
77ba11f686
Extended the maximum number of tuning parameters from 14 to 16
2018-04-08 18:12:54 +02:00
Cedric Nugteren
a93fec1026
Fixed issues with the pre-processor
2018-04-08 18:02:44 +02:00
Cedric Nugteren
7cbc6b7495
Merge branch 'master' into CLBlast-228-2d-register-gemm-kernel
2018-04-07 17:51:40 +02:00
Cedric Nugteren
16f7f49683
Added tuning results for NVIDIA GeForce 970
2018-04-07 17:48:25 +02:00
Cedric Nugteren
9596e46d01
Added tuning results for NVIDIA GeForce 920MX
2018-04-07 17:44:32 +02:00
Cedric Nugteren
cf7965dc68
Fixed a python3 import error issue with the database script
2018-04-07 17:40:43 +02:00
Cedric Nugteren
048fe90e57
Added tuning results for Intel HD Graphics 620
2018-04-07 17:33:57 +02:00
Cedric Nugteren
3519d32ac4
Extended the GEMM tuner to be able to tune the new 'kernel 1'
2018-04-07 17:05:44 +02:00
Cedric Nugteren
381f1fe67a
Fixed a compilation issue for complex datatypes and vload
2018-04-07 16:57:36 +02:00
Cedric Nugteren
2a29dc061c
Fixed a compilation issue for complex datatypes and vload
2018-04-06 21:06:13 +02:00
Cedric Nugteren
eae25f5727
Added first version of 2D register tiling kernel with A and C transposed as well
2018-04-03 21:18:40 +02:00