Cedric Nugteren
4676ec2921
Added a FAQ document
2018-12-01 17:19:28 +01:00
Koichi Akabe
032e3b0cc0
Add kernel_mode option to im2col, col2im, and convgemm functions
2018-11-12 10:12:07 +09:00
Cedric Nugteren
6f67525ea6
Changed col2im to append to the existing im-buffer
2018-11-07 19:45:07 +01:00
Cedric Nugteren
2d32a23293
Added new col2im routine to the documentation
2018-11-01 21:46:19 +01:00
Cedric Nugteren
d45911b61d
Added groundwork for col2im algorithm plus first non-working version of kernel and test
2018-10-23 20:52:25 +02:00
Cedric Nugteren
634b2bc75c
Merge pull request #319 from CNugteren/convgemm_multi_kernel
First im2col+GEMM implementation of convolution
2018-10-14 17:27:45 +02:00
Cedric Nugteren
8676b62178
Updated the documentation for GEMV tuning
2018-10-13 17:43:51 +02:00
Cedric Nugteren
83ba3d4b7b
Merge branch 'master' into convgemm_multi_kernel
2018-09-16 20:01:18 +02:00
Cedric Nugteren
91dbd580ab
Added a kernel-parameter pair table to document the tuning API
2018-09-15 18:47:31 +02:00
Cedric Nugteren
c788e040f7
Added xCONVGEMM as im2col plus a batched GEMM kernel
2018-09-07 22:02:44 +02:00
Hendrik Ranocha
faed209f30
Add Julia Wrapper
I've written a wrapper of CLBlast in Julia which can be found [here](https://github.com/JuliaGPU/CLBlast.jl). It is published and available using the Julia package manager.
2018-09-03 15:57:16 +02:00
Cedric Nugteren
2dd539f911
Removed complex numbers support for CONVGEMM
2018-07-29 10:37:14 +02:00
Cedric Nugteren
5903820ba2
Merge branch 'master' into CLBlast-267-convgemm
2018-07-29 10:26:34 +02:00
Cedric Nugteren
db179a1e40
Updated to CLBlast version 1.4.1
2018-07-14 12:29:06 +02:00
Cedric Nugteren
f72620f474
Added tuning results for Intel i5-4970S
2018-07-13 21:25:21 +02:00
Cedric Nugteren
08b1417956
Added tuning results for GeForce GTX 1070 Ti
2018-07-13 21:07:32 +02:00
Cedric Nugteren
c459582c4f
Added tuning results for HD Graphics 6000 Broadwell GT3
2018-07-13 21:05:43 +02:00
Cedric Nugteren
1c9a741470
Merge branch 'master' into CLBlast-267-convgemm
2018-06-03 15:53:27 +02:00
Cedric Nugteren
fee8df153c
Added list of tuners to be run by 'alltuners' target
2018-06-03 10:42:15 +02:00
Cedric Nugteren
cbcd4ff7e8
Merge branch 'master' into CLBlast-267-convgemm
2018-05-19 17:54:27 +02:00
Cedric Nugteren
66583b3cda
The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run part of alltuners target
2018-05-19 12:48:59 +02:00
Cedric Nugteren
ad57a45039
Added documentation on some details of the GEMM implementation
2018-05-17 12:50:03 +02:00
Cedric Nugteren
a4119531ee
Updated the documentation for convgemm to include data layout (NCHW)
2018-05-09 17:46:27 +02:00
Cedric Nugteren
2d1f6ba7fe
Added convgemm skeleton, test infrastructure, and first reference implementation
2018-05-06 11:35:34 +02:00
Cedric Nugteren
2776d76176
Added interface of batched convolution as GEMM
2018-05-05 14:06:33 +02:00
Cedric Nugteren
16f7f49683
Added tuning results for NVIDIA GeForce 970
2018-04-07 17:48:25 +02:00
Cedric Nugteren
9596e46d01
Added tuning results for NVIDIA GeForce 920MX
2018-04-07 17:44:32 +02:00
Cedric Nugteren
7e69c422af
Updated the roadmap
2018-03-30 10:05:16 +02:00
Cedric Nugteren
934893972e
Merge pull request #262 from CNugteren/CLBlast-237-tuning-api
CLBlast #237 : Tuning API
2018-03-11 15:38:33 +01:00
Cedric Nugteren
49b02ec194
Added initial glossary
2018-03-10 17:02:38 +01:00
Cedric Nugteren
54bbc99273
Updated the documentation for the tuner API
2018-03-10 14:52:40 +01:00
Cedric Nugteren
1aef354577
Updated documentation and build badges
2018-03-03 10:57:06 +01:00
Cedric Nugteren
bff64917bd
Fixed some small issues regarding PR#253
2018-03-03 10:43:12 +01:00
sivagnanamn
1433dc67f1
Added C API for getting GEMM temp buffer size
2018-03-03 03:00:17 +09:00
Cedric Nugteren
be8d95bd03
Added a note on preventing segfaults with OpenCL using the AMD APP SDK
2018-02-26 19:52:53 +01:00
Cedric Nugteren
c10bbff751
Fixed Ubuntu PPA package name
2018-02-25 16:15:20 +01:00
Cedric Nugteren
9699169cdf
Added API documentation for two missing C++ functions
2018-02-25 14:44:22 +01:00
Cedric Nugteren
ced830539e
Split the documentation and updated where needed
2018-02-24 21:11:28 +01:00
Cedric Nugteren
e784df0230
Renamed the API documentation
2018-02-24 20:46:44 +01:00
Kirill Mavreshko
5463bd5c44
Fix of multiple duplicates in documentation
2018-02-20 18:10:31 +05:00
Cedric Nugteren
ae66782eab
Fixed the XHAD documentation
2018-02-02 21:12:07 +01:00
Cedric Nugteren
ef5008f5e4
Created the API and stubs for the HAD (hadamard-product) routines
2018-01-31 20:41:02 +01:00
Cedric Nugteren
9fb2c61b25
Added API and tests for new GemmStridedBatched routine
2018-01-07 14:27:15 +01:00
Cedric Nugteren
af14fff1e9
Updated the generator script to automatically generate the temp-buffer code
2018-01-04 19:31:57 +01:00
Cedric Nugteren
84ec50e29d
Added interface and stubs for the im2col routine
2017-07-02 12:10:22 +02:00
Cedric Nugteren
f151e56daa
Added the IxAMIN routines: absolute minimum version of IxAMAX
2017-05-12 20:01:33 -07:00
Cedric Nugteren
81d9ed3946
Removed the included performance reports; README now redirects to the new external website
2017-05-12 13:18:10 -07:00
Cedric Nugteren
49e04c7fce
Added API and test infrastructure for the batched GEMM routine
2017-03-10 21:24:35 +01:00
Cedric Nugteren
fa0a9c689f
Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes
2017-03-08 20:10:20 +01:00
Cedric Nugteren
b114ea49a9
Added first naive version of the batched AXPY routine
2017-03-05 15:06:14 +01:00