Cedric Nugteren
|
5d87abf780
|
Added method selection option to switch between im2col and single-kernel approach for convgemm
|
2018-05-21 11:28:11 +02:00 |
Cedric Nugteren
|
37cabd4f1f
|
Moved new convgemm kernel to levelx kernel folder
|
2018-05-19 21:05:45 +02:00 |
Cedric Nugteren
|
27b52ac2c8
|
Second version of direct reading from image tensor for convgemm: also with local memory support now
|
2018-05-19 21:02:44 +02:00 |
Cedric Nugteren
|
cbcd4ff7e8
|
Merge branch 'master' into CLBlast-267-convgemm
|
2018-05-19 17:54:27 +02:00 |
Cedric Nugteren
|
ba0b558e84
|
Added an option to run the routine tuner for a single specific GEMM size
|
2018-05-19 17:42:11 +02:00 |
Cedric Nugteren
|
76e0079a90
|
Fixed compilation issues
|
2018-05-19 14:18:23 +02:00 |
Cedric Nugteren
|
66583b3cda
|
The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run as part of the alltuners target
|
2018-05-19 12:48:59 +02:00 |
Cedric Nugteren
|
60d057c7fd
|
Merge branch 'master' into canary_buffer_overflow_protection
|
2018-05-18 21:30:11 +02:00 |
Cedric Nugteren
|
b855af681f
|
Added a canary region for overflow detection to the tuners
|
2018-05-17 10:45:10 +01:00 |
Cedric Nugteren
|
e057a9186a
|
First version of direct reading from image tensor for convgemm: only for edge cases now
|
2018-05-17 09:23:28 +01:00 |
Cedric Nugteren
|
0cb9580042
|
Created a dedicated convgemm GEMM kernel as a copy of the batched direct gemm kernel
|
2018-05-13 22:10:21 +02:00 |
Cedric Nugteren
|
ad8f1027ab
|
Plugged in the code of strided-batched-gemm into convgemm in preparation of a new kernel
|
2018-05-13 21:01:46 +02:00 |
Cedric Nugteren
|
4e6d30088d
|
Changed temporary convgemm implementation to use batched-strided GEMM
|
2018-05-09 20:38:39 +02:00 |
Cedric Nugteren
|
cc95d4fa03
|
Implemented convolution as im2col + GEMM
|
2018-05-09 17:42:59 +02:00 |
Cedric Nugteren
|
2d1f6ba7fe
|
Added convgemm skeleton, test infrastructure, and first reference implementation
|
2018-05-06 11:35:34 +02:00 |
Cedric Nugteren
|
2776d76176
|
Added interface of batched convolution as GEMM
|
2018-05-05 14:06:33 +02:00 |
Cedric Nugteren
|
8258321a74
|
Now stores a shared_ptr to the Program class in the cache
|
2018-05-01 20:34:48 +02:00 |
Cedric Nugteren
|
b2248a17ae
|
Merge pull request #277 from CNugteren/CLBlast-257-intel-subgroups
Intel subgroup shuffling
|
2018-04-29 15:48:35 +02:00 |
Cedric Nugteren
|
7b416c8686
|
Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program
|
2018-04-26 21:10:17 +02:00 |
Cedric Nugteren
|
2965b87dda
|
Added Intel subgroup shuffle support to the 2D register caching GEMM kernel
|
2018-04-24 21:32:42 +02:00 |
Cedric Nugteren
|
2b1e0295e6
|
Added a define to enable subgroup shuffling if supported by the device
|
2018-04-24 20:41:15 +02:00 |
Cedric Nugteren
|
3e3a26e0da
|
Fixes for the CUDA API
|
2018-04-20 21:50:36 +02:00 |
Cedric Nugteren
|
458e6717a9
|
Expressed HER2K as two HERK calls
|
2018-04-18 20:58:29 +02:00 |
Cedric Nugteren
|
dcce23d938
|
Expressed SYR2K as two SYRK calls
|
2018-04-18 20:29:28 +02:00 |
Cedric Nugteren
|
ef6b1207df
|
Updated HERK and SYRK to follow the GEMM style and functions to make it work with the new kernel
|
2018-04-17 21:13:28 +02:00 |
Cedric Nugteren
|
93610a9cba
|
Fixed some failing tests for GEMM and batched GEMM routines
|
2018-04-15 12:53:32 +02:00 |
Cedric Nugteren
|
f14e6f87d2
|
Updated tuning results for the Skylake ULT GT2 GPU with the new kernel
|
2018-04-15 11:45:45 +02:00 |
Cedric Nugteren
|
0dff7f1ac4
|
Made GEMM rotation expectations kernel-specific
|
2018-04-13 22:27:11 +02:00 |
Cedric Nugteren
|
0f49dd24e5
|
Updated database with defaults of GEMMK=0 and KREG=1
|
2018-04-10 21:26:18 +02:00 |
Cedric Nugteren
|
77ba11f686
|
Extended the maximum number of tuning parameters from 14 to 16
|
2018-04-08 18:12:54 +02:00 |
Cedric Nugteren
|
a93fec1026
|
Fixed issues with the pre-processor
|
2018-04-08 18:02:44 +02:00 |
Cedric Nugteren
|
7cbc6b7495
|
Merge branch 'master' into CLBlast-228-2d-register-gemm-kernel
|
2018-04-07 17:51:40 +02:00 |
Cedric Nugteren
|
16f7f49683
|
Added tuning results for NVIDIA GeForce 970
|
2018-04-07 17:48:25 +02:00 |
Cedric Nugteren
|
9596e46d01
|
Added tuning results for NVIDIA GeForce 920MX
|
2018-04-07 17:44:32 +02:00 |
Cedric Nugteren
|
048fe90e57
|
Added tuning results for Intel HD Graphics 620
|
2018-04-07 17:33:57 +02:00 |
Cedric Nugteren
|
3519d32ac4
|
Extended the GEMM tuner to be able to tune the new 'kernel 1'
|
2018-04-07 17:05:44 +02:00 |
Cedric Nugteren
|
381f1fe67a
|
Fixed a compilation issue for complex datatypes and vload
|
2018-04-07 16:57:36 +02:00 |
Cedric Nugteren
|
2a29dc061c
|
Fixed a compilation issue for complex datatypes and vload
|
2018-04-06 21:06:13 +02:00 |
Cedric Nugteren
|
eae25f5727
|
Added first version of 2D register tiling kernel with A and C transposed as well
|
2018-04-03 21:18:40 +02:00 |
Cedric Nugteren
|
63996eb68b
|
Updated pyclblast to 1.1.0 and uploaded to PyPi
|
2018-03-30 10:38:36 +02:00 |
Cedric Nugteren
|
4de220a7a2
|
Merge pull request #255 from kodonnell/py_override
Adding override parameters to pyclblast
|
2018-03-30 10:28:00 +02:00 |
Cedric Nugteren
|
d86ff75fa5
|
Added argument checking for the GEMM tuner: expects m/n to be multiples of MWG/NWG
|
2018-03-30 10:23:33 +02:00 |
Cedric Nugteren
|
bb0889fa7a
|
Merge branch 'CLBlast-227-vivante-compiler-errors'
|
2018-03-30 09:22:09 +02:00 |
kodonell
|
173a7eb928
|
merged
|
2018-03-27 08:55:39 +13:00 |
kodonell
|
f07c2a29b8
|
moved override_parameters example out of sgemm example
|
2018-03-27 08:30:58 +13:00 |
kodonell
|
58e70c56f1
|
tidying up pyclblast override_parameters api, and added example
|
2018-03-26 08:51:55 +13:00 |
Cedric Nugteren
|
1cbe2ea301
|
Removed arrays as function argument from GEMM kernels for Vivante OpenCL compiler
|
2018-03-23 20:29:20 +01:00 |
Cedric Nugteren
|
9fb6550dd0
|
Added the OpenCL local memory size constraint to the tuners
|
2018-03-22 21:01:02 +01:00 |
Cedric Nugteren
|
7a2371213b
|
Re-added support for local memory size constraint checking in the tuner
|
2018-03-21 22:58:37 +01:00 |
Cedric Nugteren
|
52791bf355
|
Fixed a failing TRSM test using a CPU with Apple OpenCL
|
2018-03-15 21:09:52 +01:00 |
Cedric Nugteren
|
7a756cbce7
|
Fixed a failing TRSV test using a CPU with Apple OpenCL
|
2018-03-15 20:58:42 +01:00 |
Cedric Nugteren
|
9ff6cd7547
|
Added queue-finish commands to PyCLBlast samples and tests
|
2018-03-15 20:37:48 +01:00 |
Cedric Nugteren
|
934893972e
|
Merge pull request #262 from CNugteren/CLBlast-237-tuning-api
CLBlast #237: Tuning API
|
2018-03-11 15:38:33 +01:00 |
Cedric Nugteren
|
bcf1208431
|
Added basic tests for PyCLBlast
|
2018-03-11 15:32:36 +01:00 |
Cedric Nugteren
|
903deaf368
|
Fixed an issue for DLL linking under Windows
|
2018-03-10 16:45:31 +01:00 |
Cedric Nugteren
|
3d2ef9331b
|
Fixed a few things for the new tuning API
|
2018-03-10 14:35:11 +01:00 |
Cedric Nugteren
|
0bdc51e47c
|
Completed the API for all tuneable kernels
|
2018-03-10 10:54:44 +01:00 |
kodonell
|
c6056da0c8
|
ok, device id working
|
2018-03-10 22:21:30 +13:00 |
Cedric Nugteren
|
6397e61746
|
Added several more tuner API functions
|
2018-03-09 21:40:22 +01:00 |
kodonell
|
54a4b871b3
|
initial add of override parameters to pyclblast - cython not complaining, but segfault
|
2018-03-09 15:27:33 +13:00 |
Cedric Nugteren
|
49cc8b31ff
|
Fixed compilation issue in Xger tuner
|
2018-03-06 20:59:23 +01:00 |
Cedric Nugteren
|
0e1a152023
|
First version of the tuning API, added interface for copy-kernel, added sample
|
2018-03-06 20:52:12 +01:00 |
Cedric Nugteren
|
a1cedf36e3
|
Separate kernel tuners in .cpp with main and .hpp with settings
|
2018-03-03 16:37:31 +01:00 |
Cedric Nugteren
|
bff64917bd
|
Fixed some small issues regarding PR#253
|
2018-03-03 10:43:12 +01:00 |
sivagnanamn
|
1433dc67f1
|
Added C API for getting GEMM temp buffer size
|
2018-03-03 03:00:17 +09:00 |
Cedric Nugteren
|
11f765c16c
|
Generated function signatures/inspect for PyCLBlast
|
2018-02-25 15:31:38 +01:00 |
Cedric Nugteren
|
13dc26e63d
|
Generated PyCLBlast docstrings
|
2018-02-25 15:30:57 +01:00 |
Cedric Nugteren
|
0557694d39
|
Fixed several issues in the new invert tuner
|
2018-02-20 20:53:13 +01:00 |
Cedric Nugteren
|
fc10a4baca
|
Set initial pyclblast to be version 1.0.0
|
2018-02-18 20:19:19 +01:00 |
Cedric Nugteren
|
ce5e2a1e00
|
Prepared PyCLBlast for release as a package on PyPi
|
2018-02-18 18:01:02 +01:00 |
Cedric Nugteren
|
76c21a95c2
|
Added PyCLBlast samples
|
2018-02-18 17:59:43 +01:00 |
Cedric Nugteren
|
a66e24a009
|
Added all other level 1/2/3 routines to pyclblast
|
2018-02-18 17:34:10 +01:00 |
Cedric Nugteren
|
e1bfb40827
|
Added GEMM to the Python wrapper
|
2018-02-18 16:33:20 +01:00 |
Cedric Nugteren
|
eb85f6b514
|
First generated version (clblastXswap only for now) of the pyclblast wrapper
|
2018-02-14 20:50:47 +01:00 |
Cedric Nugteren
|
61b8c771ed
|
Added skeleton for Python interface using Cython
|
2018-02-13 21:42:32 +01:00 |
Cedric Nugteren
|
70d0fe89c6
|
Fixed a minor typo
|
2018-02-11 15:31:08 +01:00 |
Cedric Nugteren
|
69ed46c8da
|
Implemented the XHAD Hadamard product routine
|
2018-02-02 21:18:37 +01:00 |
Cedric Nugteren
|
ef5008f5e4
|
Created the API and stubs for the HAD (hadamard-product) routines
|
2018-01-31 20:41:02 +01:00 |
Cedric Nugteren
|
caebe8a9d5
|
Fixed an event synchronisation issue in the batched gemm routines
|
2018-01-26 20:37:04 +01:00 |
Cedric Nugteren
|
19fd263fb2
|
Moved some constants from global scope to a function; removed unnecessary includes
|
2018-01-25 20:00:43 +01:00 |
Cedric Nugteren
|
6a9d6b5da2
|
Changed the default number of runs for the GEMV tuner to fix issues for FP16
|
2018-01-25 19:57:36 +01:00 |
Cedric Nugteren
|
c3f9371d16
|
Made GEMM routine tuning a bit more generic in preparation of possible separate batched tuning arguments
|
2018-01-18 19:41:59 +01:00 |
Cedric Nugteren
|
bc54411d19
|
Made the batched routines also choose direct/indirect kernel like the main GEMM routine
|
2018-01-18 19:41:02 +01:00 |
Cedric Nugteren
|
0e5eaa6eb9
|
Factored out the generic parts of the GEMM routine tuner
|
2018-01-15 21:32:51 +01:00 |
Cedric Nugteren
|
a500f537d8
|
Added a RetrieveParameters function to inspect tuning parameters
|
2018-01-11 20:32:06 +01:00 |
Cedric Nugteren
|
99a4df88a6
|
Implemented the indirect version of the strided-batched GEMM kernel
|
2018-01-08 21:07:01 +01:00 |
Cedric Nugteren
|
13f0f6fc6e
|
Implemented direct version of strided-batched GEMM kernel
|
2018-01-07 14:58:45 +01:00 |
Cedric Nugteren
|
9fb2c61b25
|
Added API and tests for new GemmStridedBatched routine
|
2018-01-07 14:27:15 +01:00 |
Cedric Nugteren
|
f1e3b35541
|
Reduced duplicate code in the batched GEMM implementation
|
2018-01-06 19:26:11 +01:00 |
Cedric Nugteren
|
c9b5d614e2
|
Fixed a vendor naming bug in the tuners and in the database
|
2018-01-06 17:02:58 +01:00 |
Cedric Nugteren
|
a7ccce1969
|
Merge pull request #238 from CNugteren/gemm_api_with_temp_buffer
GEMM API with optional temp buffer
|
2018-01-06 16:08:27 +01:00 |
Cedric Nugteren
|
ad197da08d
|
Fixed the CUDA interface: replaced nullptr with 0
|
2018-01-06 13:38:44 +01:00 |
Cedric Nugteren
|
e71c037304
|
Fixed a performance overhead in database creation: it is now a static variable again
|
2018-01-06 11:28:04 +01:00 |
Cedric Nugteren
|
ce069545d4
|
Added CUDA interface to get temporary-buffer size for GEMM routine
|
2018-01-06 10:05:28 +01:00 |
Cedric Nugteren
|
44431daecc
|
Added a CUDA version of the GEMM temp-buffer optional argument
|
2018-01-04 19:33:51 +01:00 |
Cedric Nugteren
|
af14fff1e9
|
Updated the generator script to automatically generate the temp-buffer code
|
2018-01-04 19:31:57 +01:00 |
Cedric Nugteren
|
7f893a85d9
|
Revert "Added options to disable parts of the invert kernel to find out where the AMD compiler crashes"
This reverts commit 407ed52cec.
|
2017-12-31 16:10:40 +01:00 |
Cedric Nugteren
|
69226ae828
|
Changed the invert kernel slightly; added part1a/part1b disable-defines
|
2017-12-31 14:07:08 +01:00 |
Cedric Nugteren
|
7ce415b927
|
Fixed ifdef's into ifndef's
|
2017-12-30 21:17:31 +01:00 |
Cedric Nugteren
|
407ed52cec
|
Added options to disable parts of the invert kernel to find out where the AMD compiler crashes
|
2017-12-30 21:07:50 +01:00 |
Cedric Nugteren
|
ad1227c4f2
|
Added optional temp-buffer argument to C++ interface of GEMM
|
2017-12-30 18:45:06 +01:00 |
Cedric Nugteren
|
6d1e30e61f
|
Added interface to compute the required temporary buffer size for GEMM
|
2017-12-28 14:46:45 +01:00 |
Cedric Nugteren
|
aaea9474a1
|
Factored out argument processing from the GEMM routine
|
2017-12-28 13:56:18 +01:00 |
Cedric Nugteren
|
74792ce96c
|
Refactored GEMM code in preparation of separate temp-buffer computation
|
2017-12-28 11:08:10 +01:00 |
Cedric Nugteren
|
2b9bf3a9aa
|
Simplified invert kernel a little
|
2017-12-27 17:03:06 +01:00 |
Cedric Nugteren
|
1e738db6dd
|
Split the database into multiple small compilation units
|
2017-12-27 12:04:22 +01:00 |
Cedric Nugteren
|
4a2fc4aa98
|
Made the database-vector a non-static member
|
2017-12-26 11:32:05 +01:00 |
Cedric Nugteren
|
bd540829ea
|
Fixes for the CUDA backend of CLBlast
|
2017-12-24 12:10:55 +01:00 |
Cedric Nugteren
|
ef71d8e9b5
|
Fixed unused variable warnings showing up with Clang
|
2017-12-23 16:07:26 +01:00 |
Cedric Nugteren
|
7aabeb44cc
|
Updated the tuning results for the IvyBridge M GT2 GPU
|
2017-12-23 15:46:41 +01:00 |
Cedric Nugteren
|
2b020d59f9
|
Added defines to disable OpenCL deprecation warnings
|
2017-12-23 15:32:22 +01:00 |
Cedric Nugteren
|
04bf5437bc
|
Fixed a warning under MSVC
|
2017-12-23 15:30:08 +01:00 |
Cedric Nugteren
|
288766debb
|
Now calling main TRSV routine again to fix compilation in MSVC
|
2017-12-23 14:49:21 +01:00 |
Cedric Nugteren
|
736399e528
|
Split the invert kernel in two parts to prevent error C1091 in MSVC 2013
|
2017-12-23 14:18:07 +01:00 |
Cedric Nugteren
|
b1f52f130c
|
Updated the database to use the new TRSV and Invert tuners
|
2017-12-23 13:55:22 +01:00 |
Cedric Nugteren
|
aa7db4f987
|
Added TRSV block-size tuner
|
2017-12-23 13:34:57 +01:00 |
Cedric Nugteren
|
9dec53ff52
|
Merge branch 'master' into feature/more_tuners
|
2017-12-21 20:18:05 +01:00 |
Cedric Nugteren
|
0ee81e27b9
|
Added tuning results for Apple AMD Radeon Pro 580
|
2017-12-20 19:59:31 +01:00 |
Cedric Nugteren
|
07a7012b0d
|
Added skeleton for a tuner for the invert kernel
|
2017-12-19 21:10:48 +01:00 |
Cedric Nugteren
|
249bdaa8e9
|
Reformatted tuning code to make compilation faster
|
2017-12-18 21:34:07 +01:00 |
Cedric Nugteren
|
e2f8068459
|
Fixed an issue with the tuner: it was using platform vendor rather than device vendor
|
2017-12-17 17:58:06 +01:00 |
Cedric Nugteren
|
69f6591564
|
Removed all ARM Mali tuning results; re-added Mali-T760 and Mali-T628 results based on kernel pre-processor
|
2017-12-17 16:59:08 +01:00 |
Cedric Nugteren
|
7408f6e6eb
|
Fixed an unnecessary overflow issue on 32-bit systems
|
2017-12-17 16:42:54 +01:00 |
Cedric Nugteren
|
4a58efc130
|
Fix for error C1091 in MSVC 2013
|
2017-12-10 16:40:59 +01:00 |
Cedric Nugteren
|
b4d3a50f19
|
Split GEMM kernel in 4 files instead of 3 due to MSVC 2013 string length limit
|
2017-12-10 16:09:09 +01:00 |
Cedric Nugteren
|
82467b64c4
|
Fixed a missing include
|
2017-12-10 14:49:38 +01:00 |
Cedric Nugteren
|
c2f08fa346
|
Fixed an issue in the tuners to prevent error -14 from persisting (CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST)
|
2017-12-10 14:48:13 +01:00 |
Cedric Nugteren
|
9112e587ae
|
Fixed an Android compilation issue
|
2017-12-10 13:31:57 +01:00 |
Cedric Nugteren
|
9f02fb542c
|
Completed kernel modifications for pre-processor of all other kernels
|
2017-12-09 20:44:21 +01:00 |
Cedric Nugteren
|
ca5dbcd2bd
|
Made the pre-processor run by default for ARM and Qualcomm GPUs
|
2017-12-09 15:16:53 +01:00 |
Cedric Nugteren
|
02c0d64037
|
Modified the direct GEMM kernel to support array-to-register promotion
|
2017-12-09 14:53:10 +01:00 |
Cedric Nugteren
|
23e3a85f2c
|
Reformatted GEMM kernel to support array-to-register promotion
|
2017-12-09 14:09:13 +01:00 |
Cedric Nugteren
|
d9df62b794
|
Fixed defines parsing and substituting in pre-processor; fixed some variable names in kernels
|
2017-12-09 10:49:55 +01:00 |
Cedric Nugteren
|
540896476d
|
Added register promotion to the main GEMM kernel
|
2017-12-07 22:05:29 +01:00 |
Cedric Nugteren
|
0f9637bbac
|
Improved array-to-register promotion, now handling function calls as well
|
2017-12-05 20:39:49 +01:00 |
Cedric Nugteren
|
cf4555d1f4
|
Added GEMM (direct and indirect) to the pre-processor testing; modified the loops in kernel accordingly
|
2017-12-03 16:40:36 +01:00 |
Cedric Nugteren
|
0a1a3de58a
|
Added basic bracket parsing in defines and loop expressions
|
2017-12-03 16:39:22 +01:00 |
Cedric Nugteren
|
60312e5878
|
Reformatted transpose kernels for the pre-processor; extended the number of tests
|
2017-12-03 12:00:37 +01:00 |
Cedric Nugteren
|
92842024b0
|
Improved array to register promotion in the pre-processor
|
2017-12-03 11:59:38 +01:00 |
Cedric Nugteren
|
bf7aeb8d5b
|
Improved the pre-processor's handling of defines; added a special nested defines test
|
2017-11-30 21:43:16 +01:00 |
Cedric Nugteren
|
13eb772343
|
Integrated pre-processor in compilation flow, default is still disabled
|
2017-11-30 21:32:47 +01:00 |
Cedric Nugteren
|
93ffb876c6
|
Reformatted unrollable kernel loops and added the new promote_to_registers pragma for several kernels
|
2017-11-29 20:21:08 +01:00 |
Cedric Nugteren
|
0dde6af703
|
Extended the preprocessor tests to include CopyFast and CopyPad
|
2017-11-29 20:18:36 +01:00 |
Cedric Nugteren
|
1d35f65cea
|
Improved the array-to-register promotion in the pre-processor
|
2017-11-29 19:53:50 +01:00 |
Cedric Nugteren
|
14047861ce
|
Improved the kernel pre-processor in various ways
|
2017-11-28 20:52:08 +01:00 |
Cedric Nugteren
|
35956f9db1
|
Added simple implementation of array-to-register promotion
|
2017-11-27 20:26:30 +01:00 |
Cedric Nugteren
|
9c643b293c
|
Improved the for-loop pre-processing
|
2017-11-26 13:32:48 +01:00 |
Cedric Nugteren
|
69aa3b35ed
|
Implemented first simple pre-processor: defines parser and loop unrolling based on assumptions
|
2017-11-25 17:46:01 +01:00 |
Cedric Nugteren
|
f01bcded1e
|
Moved string splitting functions; added string character removal function
|
2017-11-25 17:44:21 +01:00 |
Cedric Nugteren
|
c0c6d00b12
|
Added stub for a preprocessor and a corresponding compilation test
|
2017-11-25 10:24:05 +01:00 |
Cedric Nugteren
|
ebce82e650
|
Merge pull request #222 from CNugteren/override_params_from_json
Override params in clients from tuner JSON
|
2017-11-25 09:48:27 +01:00 |
Cedric Nugteren
|
abb4d5ab32
|
Added tuning results for ARM Mali T760 GPU
|
2017-11-24 21:16:54 +01:00 |
Cedric Nugteren
|
9527c89c30
|
Made parameter override in the clients a command-line argument and added support for multi-kernel routines
|
2017-11-22 20:53:20 +01:00 |
Cedric Nugteren
|
0f080bbc6e
|
Potentially fixed an MSVC 2013 issue with a copy-constructor not being generated
|
2017-11-20 20:54:18 +01:00 |
Cedric Nugteren
|
e0f3484084
|
Fixed some display issues in the GEMM routine tuner
|
2017-11-20 20:29:52 +01:00 |
Cedric Nugteren
|
5467c0cac5
|
Fixed a variety of warnings and an error for MSVC2013 compilation
|
2017-11-19 21:09:24 +01:00 |
Cedric Nugteren
|
4e0d08c3bc
|
Added compilation timing and better compilation error reporting
|
2017-11-19 16:58:13 +01:00 |
Cedric Nugteren
|
a3a8b44f59
|
Some fixes for the new auto-tuner to be compatible with the Python scripts
|
2017-11-19 16:31:08 +01:00 |
Cedric Nugteren
|
76d2b7f0b6
|
Revived the GEMM routine tuner; minor formatting changes
|
2017-11-19 12:59:52 +01:00 |
Cedric Nugteren
|
7a54494577
|
Modified the kernel tuners to use the newly integrated auto-tuner
|
2017-11-19 12:58:41 +01:00 |
Cedric Nugteren
|
8a5a5e031e
|
Moved some tuning functions from .hpp to .cpp
|
2017-11-17 20:58:36 +01:00 |
Cedric Nugteren
|
f94d498a37
|
Moved compilation function to separate file; removed dependency of tuners of the CLBlast library
|
2017-11-17 20:57:46 +01:00 |
Cedric Nugteren
|
2b8ad70b63
|
Added printing of the best parameters for the new tuner
|
2017-11-16 21:18:29 +01:00 |
Cedric Nugteren
|
1b2b46f2f0
|
Added first version of integrated and re-written auto-tuner
|
2017-11-15 22:49:35 +01:00 |
Cedric Nugteren
|
0cd78bb6f9
|
Added kernel timing functionality to the utilities
|
2017-11-15 22:47:06 +01:00 |
Cedric Nugteren
|
b337bffbaf
|
Added exception handle with catch-all
|
2017-11-15 22:44:44 +01:00 |
Cedric Nugteren
|
03ebf14b97
|
Made the exception dispatch function optionally silent
|
2017-11-13 21:11:31 +01:00 |
Cedric Nugteren
|
4bac1287f2
|
Moved square-difference utility function for use in the tuners
|
2017-11-13 21:10:44 +01:00 |
Cedric Nugteren
|
677afd3b96
|
Factored out the creation of the OpenCL header and the program compilation
|
2017-11-11 16:14:43 +01:00 |
Cedric Nugteren
|
c41d219ea4
|
Added tuning results for the GeForce GTX750Ti
|
2017-11-09 21:19:21 +01:00 |
Cedric Nugteren
|
b18cc9d3f1
|
Merge pull request #212 from CNugteren/kernel_selection_tuner
GEMM kernel selection tuner
|
2017-11-07 22:20:13 +01:00 |
Cedric Nugteren
|
3ec0be6fb8
|
Added various GEMM routine tuning results
|
2017-11-07 21:34:54 +01:00 |
Cedric Nugteren
|
33ac2b0175
|
Improved the way the database defaults are computed
|
2017-11-06 21:59:45 +01:00 |
Cedric Nugteren
|
34a33b54cf
|
Changed GEMM routine tuner's scoring to use L2 measure instead for better averaging
|
2017-11-06 20:50:36 +01:00 |
Cedric Nugteren
|
9b0a435fb0
|
Integrated the GEMM routine tuner for kernel selection; added first tuning results
|
2017-11-02 21:47:14 +01:00 |
Cedric Nugteren
|
73272ab97d
|
Fixed a bug in database compression/decompression
|
2017-11-02 21:19:18 +01:00 |
Cedric Nugteren
|
5c90577dfd
|
Added collecting and printing of scores for the kernel-selection tuner
|
2017-10-30 20:39:21 +01:00 |
Cedric Nugteren
|
ac5a58cfe5
|
Added platform ID to the binary program cache to prevent issues with multi-platform systems
|
2017-10-29 20:01:30 +01:00 |
Cedric Nugteren
|
319762f150
|
Added Android support using the GNU C++ STL library and the GCC toolchain
|
2017-10-29 12:07:07 +01:00 |
Cedric Nugteren
|
12b08ae491
|
Merge branch 'master' into android_support
|
2017-10-28 17:32:37 +02:00 |
Cedric Nugteren
|
334a26eb12
|
Added initial version of a GEMM kernel selection tuner
|
2017-10-28 17:30:29 +02:00 |
Cedric Nugteren
|
bd57dfa435
|
Moved timing function to a separate file
|
2017-10-28 14:12:05 +02:00 |
Cedric Nugteren
|
fa6e5e67f5
|
Fixed a bug when using the matrix A-offset argument for the TRSM routine
|
2017-10-27 22:12:30 +02:00 |
Cedric Nugteren
|
449577cf07
|
Reduced TRSM block-size for better numerical stability
|
2017-10-27 22:07:43 +02:00 |
Cedric Nugteren
|
44f7fa628a
|
Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM
|
2017-10-27 22:01:15 +02:00 |
Cedric Nugteren
|
d49aae236e
|
Fixed a bug in TRSM routine due to missing event synchronisations after GEMM calls
|
2017-10-25 20:35:39 +02:00 |
Cedric Nugteren
|
472f90501c
|
Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570
|
2017-10-20 18:06:12 +02:00 |
Cedric Nugteren
|
363568787e
|
Moved CUmodule code from Kernel to Program class to not require re-compilation every time
|
2017-10-18 18:17:30 +02:00 |
Cedric Nugteren
|
9d879c949a
|
Fix an incompatibility with CUDA's FP16 definition
|
2017-10-17 20:29:23 +02:00 |
Cedric Nugteren
|
b1270f04b8
|
Made buffers of batched routines read/write (was: read-only)
|
2017-10-17 19:56:47 +02:00 |
Cedric Nugteren
|
f349731d54
|
CUDA kernel compilation fixes
|
2017-10-17 19:53:09 +02:00 |
Cedric Nugteren
|
0719f14486
|
Made all CUDA kernel launches synchronous; removed exception raising
|
2017-10-16 21:54:42 +02:00 |
Cedric Nugteren
|
d62823f067
|
Added a missing OpenCL-to-CUDA function translation
|
2017-10-15 19:53:52 +02:00 |
Cedric Nugteren
|
7663cba234
|
Fixes for the CUDA API: first tests pass and the client runs
|
2017-10-15 17:43:20 +02:00 |
Cedric Nugteren
|
71049e8d39
|
Added the SM-compute-arch version to the nv compile options
|
2017-10-15 17:41:44 +02:00 |
Cedric Nugteren
|
7408da174c
|
Various fixes to make the first CUDA examples work
|
2017-10-15 12:17:35 +02:00 |
Cedric Nugteren
|
55a802c63d
|
Fixed a kernel/attribute order bug in the direct GEMM kernels
|
2017-10-14 17:21:34 +02:00 |
Cedric Nugteren
|
b06bc01da9
|
Make local memory pointers a define in OpenCL; some fixes to the recently changed transpose kernel code
|
2017-10-14 17:13:54 +02:00 |
Cedric Nugteren
|
d9456306e0
|
Made transpose kernel struct init proper according to the C standard
|
2017-10-14 16:48:06 +02:00 |
Cedric Nugteren
|
313fc796b2
|
Fixed several (not all) CUDA kernel compilation issues
|
2017-10-14 16:01:12 +02:00 |