Cedric Nugteren
|
6a9d6b5da2
|
Changed the default number of runs for the GEMV tuner to fix issues for FP16
|
2018-01-25 19:57:36 +01:00 |
|
Cedric Nugteren
|
b2c946c517
|
Merge pull request #244 from CNugteren/kernel_selection_batched_gemm
Kernel selection for batched GEMM
|
2018-01-20 10:19:28 +01:00 |
|
Cedric Nugteren
|
c3f9371d16
|
Made GEMM routine tuning a bit more generic in preparation of possible separate batched tuning arguments
|
2018-01-18 19:41:59 +01:00 |
|
Cedric Nugteren
|
bc54411d19
|
Made the batched routines also choose the direct/indirect kernel like the main GEMM routine
|
2018-01-18 19:41:02 +01:00 |
|
Cedric Nugteren
|
0e5eaa6eb9
|
Factored out the generic parts of the GEMM routine tuner
|
2018-01-15 21:32:51 +01:00 |
|
Cedric Nugteren
|
b35e3d1e53
|
Small improvements to benchmarking for cuBLAS
|
2018-01-14 19:50:27 +01:00 |
|
Cedric Nugteren
|
6d52eb2956
|
Merge pull request #240 from CNugteren/retrieve_tuning_parameters
Retrieve tuning parameters
|
2018-01-11 23:09:48 +01:00 |
|
Cedric Nugteren
|
90e8e55acb
|
Added test for the RetrieveParameters function
|
2018-01-11 20:34:09 +01:00 |
|
Cedric Nugteren
|
a500f537d8
|
Added a RetrieveParameters function to inspect tuning parameters
|
2018-01-11 20:32:06 +01:00 |
|
Cedric Nugteren
|
389919faec
|
Fixed bug in override parameters test
|
2018-01-11 20:30:45 +01:00 |
|
Cedric Nugteren
|
9b084d0409
|
Merge pull request #239 from CNugteren/gemm_strided_batched
GemmStridedBatched
|
2018-01-11 19:42:50 +01:00 |
|
Cedric Nugteren
|
99a4df88a6
|
Implemented the indirect version of the strided-batched GEMM kernel
|
2018-01-08 21:07:01 +01:00 |
|
Cedric Nugteren
|
13f0f6fc6e
|
Implemented direct version of strided-batched GEMM kernel
|
2018-01-07 14:58:45 +01:00 |
|
Cedric Nugteren
|
9fb2c61b25
|
Added API and tests for new GemmStridedBatched routine
|
2018-01-07 14:27:15 +01:00 |
|
Cedric Nugteren
|
0c48c6e6c4
|
Fixed a minor nullptr related issue in the code generator
|
2018-01-06 19:32:54 +01:00 |
|
Cedric Nugteren
|
00687f8d81
|
Prevented half-precision batched routines from failing in the tests
|
2018-01-06 19:26:38 +01:00 |
|
Cedric Nugteren
|
f1e3b35541
|
Reduced duplicate code in the batched GEMM implementation
|
2018-01-06 19:26:11 +01:00 |
|
Cedric Nugteren
|
c988c2cdd1
|
Updated changelog and roadmap
|
2018-01-06 17:16:11 +01:00 |
|
Cedric Nugteren
|
c9b5d614e2
|
Fixed a vendor naming bug in the tuners and in the database
|
2018-01-06 17:02:58 +01:00 |
|
Cedric Nugteren
|
a7ccce1969
|
Merge pull request #238 from CNugteren/gemm_api_with_temp_buffer
GEMM API with optional temp buffer
|
2018-01-06 16:08:27 +01:00 |
|
Cedric Nugteren
|
ad197da08d
|
Fixed the CUDA interface: replaced nullptr with 0
|
2018-01-06 13:38:44 +01:00 |
|
Cedric Nugteren
|
e71c037304
|
Fixed a performance overhead in database creation: it is again a static variable now as it was before
|
2018-01-06 11:28:04 +01:00 |
|
Cedric Nugteren
|
ce069545d4
|
Added CUDA interface to get temporary-buffer size for GEMM routine
|
2018-01-06 10:05:28 +01:00 |
|
Cedric Nugteren
|
44431daecc
|
Added a CUDA version of the GEMM temp-buffer optional argument
|
2018-01-04 19:33:51 +01:00 |
|
Cedric Nugteren
|
af14fff1e9
|
Updated the generator script to automatically generate the temp-buffer code
|
2018-01-04 19:31:57 +01:00 |
|
Cedric Nugteren
|
a3925e5060
|
Updated the ROADMAP
|
2018-01-03 20:38:48 +01:00 |
|
Cedric Nugteren
|
5315b982a9
|
Added the temp-buffer to the GEMM testers and clients
|
2018-01-03 20:20:31 +01:00 |
|
Cedric Nugteren
|
eb89371d2b
|
Added a queue argument to the get-size function when running the tests/clients
|
2018-01-03 20:19:45 +01:00 |
|
Cedric Nugteren
|
8040a4e355
|
Merge pull request #236 from CNugteren/trsm_compilation
Fixed compilation of TRSM/Invert for AMD APP
|
2018-01-01 16:10:11 +01:00 |
|
Cedric Nugteren
|
ad483123e6
|
Fixed the issue with AMD's APP compiler not being able to compile the invert kernel
|
2017-12-31 16:13:13 +01:00 |
|
Cedric Nugteren
|
1511909b6f
|
Revert "Added a simple test to check compilation of the invert kernels (issue with AMD APP)"
This reverts commit 0eb9b35481.
|
2017-12-31 16:11:35 +01:00 |
|
Cedric Nugteren
|
7f893a85d9
|
Revert "Added options to disable parts of the invert kernel to find out where the AMD compiler crashes"
This reverts commit 407ed52cec.
|
2017-12-31 16:10:40 +01:00 |
|
Cedric Nugteren
|
b4c8e1d9a5
|
Made plotting script more flexible: extra argument to set the comparison library
|
2017-12-31 16:02:46 +01:00 |
|
Cedric Nugteren
|
69226ae828
|
Changed the invert kernel slightly; added part1a/part1b disable-defines
|
2017-12-31 14:07:08 +01:00 |
|
Cedric Nugteren
|
7ce415b927
|
Changed ifdefs into ifndefs
|
2017-12-30 21:17:31 +01:00 |
|
Cedric Nugteren
|
407ed52cec
|
Added options to disable parts of the invert kernel to find out where the AMD compiler crashes
|
2017-12-30 21:07:50 +01:00 |
|
Cedric Nugteren
|
ad1227c4f2
|
Added optional temp-buffer argument to C++ interface of GEMM
|
2017-12-30 18:45:06 +01:00 |
|
Cedric Nugteren
|
6d1e30e61f
|
Added interface to compute the required temporary buffer size for GEMM
|
2017-12-28 14:46:45 +01:00 |
|
Cedric Nugteren
|
aaea9474a1
|
Factored out argument processing from the GEMM routine
|
2017-12-28 13:56:18 +01:00 |
|
Cedric Nugteren
|
74792ce96c
|
Refactored GEMM code in preparation of separate temp-buffer computation
|
2017-12-28 11:08:10 +01:00 |
|
Cedric Nugteren
|
936cf2668d
|
Merge pull request #234 from CNugteren/database_compilation_split
Database compilation split
|
2017-12-27 20:05:31 +01:00 |
|
Cedric Nugteren
|
0eb9b35481
|
Added a simple test to check compilation of the invert kernels (issue with AMD APP)
|
2017-12-27 17:16:08 +01:00 |
|
Cedric Nugteren
|
2b9bf3a9aa
|
Simplified invert kernel a little
|
2017-12-27 17:03:06 +01:00 |
|
Cedric Nugteren
|
1e738db6dd
|
Split the database into multiple small compilation units
|
2017-12-27 12:04:22 +01:00 |
|
Cedric Nugteren
|
4a2fc4aa98
|
Made the database-vector a non-static member
|
2017-12-26 11:32:05 +01:00 |
|
Cedric Nugteren
|
bd540829ea
|
Fixes for the CUDA backend of CLBlast
|
2017-12-24 12:10:55 +01:00 |
|
Cedric Nugteren
|
8657e90cf8
|
Fixed linking of the preprocessor test for MSVC
|
2017-12-24 11:33:47 +01:00 |
|
Cedric Nugteren
|
e81eb4f6d4
|
Added a note that the ArrayFire Jenkins servers are down, being switched to buildbot
|
2017-12-24 11:32:31 +01:00 |
|
Cedric Nugteren
|
ef71d8e9b5
|
Fixed unused variable warnings showing up with Clang
|
2017-12-23 16:07:26 +01:00 |
|
Cedric Nugteren
|
7aabeb44cc
|
Updated the tuning results for the IvyBridge M GT2 GPU
|
2017-12-23 15:46:41 +01:00 |
|