Cedric Nugteren
|
e1bfb40827
|
Added GEMM to the Python wrapper
|
2018-02-18 16:33:20 +01:00 |
|
Cedric Nugteren
|
eb85f6b514
|
First generated version (clblastXswap only for now) of the pyclblast wrapper
|
2018-02-14 20:50:47 +01:00 |
|
Cedric Nugteren
|
61b8c771ed
|
Added skeleton for Python interface using Cython
|
2018-02-13 21:42:32 +01:00 |
|
Cedric Nugteren
|
c5a28cd70b
|
Added CLBlast version numbering to the compiled library
|
2018-02-11 15:31:21 +01:00 |
|
Cedric Nugteren
|
70d0fe89c6
|
Fixed a minor typo
|
2018-02-11 15:31:08 +01:00 |
|
Cedric Nugteren
|
101152568a
|
Merge pull request #246 from CNugteren/CLBlast-224-hadamard-product
Hadamard product
|
2018-02-03 13:18:03 +01:00 |
|
Cedric Nugteren
|
69ed46c8da
|
Implemented the XHAD Hadamard product routine
|
2018-02-02 21:18:37 +01:00 |
|
Cedric Nugteren
|
ae66782eab
|
Fixed the XHAD documentation
|
2018-02-02 21:12:07 +01:00 |
|
Cedric Nugteren
|
ef5008f5e4
|
Created the API and stubs for the HAD (Hadamard product) routines
|
2018-01-31 20:41:02 +01:00 |
|
Cedric Nugteren
|
37c5e8f58c
|
Updated to CLBlast version 1.3.0
|
2018-01-29 20:45:21 +01:00 |
|
Cedric Nugteren
|
f12c7fcdf2
|
Merge branch 'master' of github.com:CNugteren/CLBlast
|
2018-01-29 20:34:37 +01:00 |
|
Cedric Nugteren
|
d1d80ca131
|
Fixed a compilation error in the kernel-preprocessor test under MSVC
|
2018-01-29 20:26:25 +01:00 |
|
Cedric Nugteren
|
97e92cb10c
|
Updated the known issues
|
2018-01-28 14:50:03 +01:00 |
|
Cedric Nugteren
|
180532ea39
|
Some fixes to the benchmark scripts
|
2018-01-27 20:06:13 +01:00 |
|
Cedric Nugteren
|
ada762f668
|
Minor display improvements to the graph plotting scripts
|
2018-01-26 20:38:11 +01:00 |
|
Cedric Nugteren
|
caebe8a9d5
|
Fixed an event synchronisation issue in the batched gemm routines
|
2018-01-26 20:37:04 +01:00 |
|
Cedric Nugteren
|
3651b51664
|
Improved the benchmark scripts; added gemmstridedbatched benchmark
|
2018-01-25 21:24:18 +01:00 |
|
Cedric Nugteren
|
19fd263fb2
|
Moved some constants from global scope to a function; removed unnecessary includes
|
2018-01-25 20:00:43 +01:00 |
|
Cedric Nugteren
|
6a9d6b5da2
|
Changed the default number of runs for the GEMV tuner to fix issues for FP16
|
2018-01-25 19:57:36 +01:00 |
|
Cedric Nugteren
|
b2c946c517
|
Merge pull request #244 from CNugteren/kernel_selection_batched_gemm
Kernel selection for batched GEMM
|
2018-01-20 10:19:28 +01:00 |
|
Cedric Nugteren
|
c3f9371d16
|
Made GEMM routine tuning a bit more generic in preparation for possible separate batched tuning arguments
|
2018-01-18 19:41:59 +01:00 |
|
Cedric Nugteren
|
bc54411d19
|
Made the batched routines also choose the direct/indirect kernel like the main GEMM routine
|
2018-01-18 19:41:02 +01:00 |
|
Cedric Nugteren
|
0e5eaa6eb9
|
Factored out the generic parts of the GEMM routine tuner
|
2018-01-15 21:32:51 +01:00 |
|
Cedric Nugteren
|
b35e3d1e53
|
Small improvements to benchmarking for cuBLAS
|
2018-01-14 19:50:27 +01:00 |
|
Cedric Nugteren
|
6d52eb2956
|
Merge pull request #240 from CNugteren/retrieve_tuning_parameters
Retrieve tuning parameters
|
2018-01-11 23:09:48 +01:00 |
|
Cedric Nugteren
|
90e8e55acb
|
Added test for the RetrieveParameters function
|
2018-01-11 20:34:09 +01:00 |
|
Cedric Nugteren
|
a500f537d8
|
Added a RetrieveParameters function to inspect tuning parameters
|
2018-01-11 20:32:06 +01:00 |
|
Cedric Nugteren
|
389919faec
|
Fixed bug in override parameters test
|
2018-01-11 20:30:45 +01:00 |
|
Cedric Nugteren
|
9b084d0409
|
Merge pull request #239 from CNugteren/gemm_strided_batched
GemmStridedBatched
|
2018-01-11 19:42:50 +01:00 |
|
Cedric Nugteren
|
99a4df88a6
|
Implemented the indirect version of the strided-batched GEMM kernel
|
2018-01-08 21:07:01 +01:00 |
|
Cedric Nugteren
|
13f0f6fc6e
|
Implemented direct version of strided-batched GEMM kernel
|
2018-01-07 14:58:45 +01:00 |
|
Cedric Nugteren
|
9fb2c61b25
|
Added API and tests for new GemmStridedBatched routine
|
2018-01-07 14:27:15 +01:00 |
|
Cedric Nugteren
|
0c48c6e6c4
|
Fixed a minor nullptr-related issue in the code generator
|
2018-01-06 19:32:54 +01:00 |
|
Cedric Nugteren
|
00687f8d81
|
Prevented half-precision batched routines from failing in the tests
|
2018-01-06 19:26:38 +01:00 |
|
Cedric Nugteren
|
f1e3b35541
|
Reduced duplicate code in the batched GEMM implementation
|
2018-01-06 19:26:11 +01:00 |
|
Cedric Nugteren
|
c988c2cdd1
|
Updated changelog and roadmap
|
2018-01-06 17:16:11 +01:00 |
|
Cedric Nugteren
|
c9b5d614e2
|
Fixed a vendor naming bug in the tuners and in the database
|
2018-01-06 17:02:58 +01:00 |
|
Cedric Nugteren
|
a7ccce1969
|
Merge pull request #238 from CNugteren/gemm_api_with_temp_buffer
GEMM API with optional temp buffer
|
2018-01-06 16:08:27 +01:00 |
|
Cedric Nugteren
|
ad197da08d
|
Fixed the CUDA interface: replaced nullptr with 0
|
2018-01-06 13:38:44 +01:00 |
|
Cedric Nugteren
|
e71c037304
|
Fixed a performance overhead in database creation: it is a static variable again, as it was before
|
2018-01-06 11:28:04 +01:00 |
|
Cedric Nugteren
|
ce069545d4
|
Added CUDA interface to get temporary-buffer size for GEMM routine
|
2018-01-06 10:05:28 +01:00 |
|
Cedric Nugteren
|
44431daecc
|
Added a CUDA version of the GEMM temp-buffer optional argument
|
2018-01-04 19:33:51 +01:00 |
|
Cedric Nugteren
|
af14fff1e9
|
Updated the generator script to automatically generate the temp-buffer code
|
2018-01-04 19:31:57 +01:00 |
|
Cedric Nugteren
|
a3925e5060
|
Updated the ROADMAP
|
2018-01-03 20:38:48 +01:00 |
|
Cedric Nugteren
|
5315b982a9
|
Added the temp-buffer to the GEMM testers and clients
|
2018-01-03 20:20:31 +01:00 |
|
Cedric Nugteren
|
eb89371d2b
|
Added a queue argument to the get-size function when running the tests/clients
|
2018-01-03 20:19:45 +01:00 |
|
Cedric Nugteren
|
8040a4e355
|
Merge pull request #236 from CNugteren/trsm_compilation
Fixed compilation of TRSM/Invert for AMD APP
|
2018-01-01 16:10:11 +01:00 |
|
Cedric Nugteren
|
ad483123e6
|
Fixed the issue with AMD's APP compiler not being able to compile the invert kernel
|
2017-12-31 16:13:13 +01:00 |
|
Cedric Nugteren
|
1511909b6f
|
Revert "Added a simple test to check compilation of the invert kernels (issue with AMD APP)"
This reverts commit 0eb9b35481.
|
2017-12-31 16:11:35 +01:00 |
|
Cedric Nugteren
|
7f893a85d9
|
Revert "Added options to disable parts of the invert kernel to find out where the AMD compiler crashes"
This reverts commit 407ed52cec.
|
2017-12-31 16:10:40 +01:00 |
|