| Cedric Nugteren | 4a6c7c37a3 | Made sure that the global workgroup size is a multiple of the local size in the tuners | 2020-05-10 20:28:23 +02:00 |
| Cedric Nugteren | 0870e76fba | Updated PyCLBlast version number | 2020-05-10 14:55:03 +02:00 |
| Cedric Nugteren | 5f97d64505 | Update API documentation | 2020-03-08 11:29:47 +01:00 |
| Cedric Nugteren | e3ce88154a | Silenced a new OpenCL warning message | 2020-03-08 10:14:59 +01:00 |
| Cedric Nugteren | 8433985051 | Updated to version 1.5.1 | 2020-02-18 10:29:40 +01:00 |
| Cedric Nugteren | 49eb490ee1 | Catches all exceptions of the tuners | 2020-02-17 22:07:51 +01:00 |
| Cedric Nugteren | 6ac74008b6 | Added notion of fixes in XhadFaster | 2019-09-06 19:33:30 +02:00 |
| Cedric Nugteren | 3f9d7bca22 | Fixed a bug in the absolute-min index kernel | 2019-05-19 14:00:18 +02:00 |
| Cedric Nugteren | 1035e533cd | Added tuning parameters for Xeon E5-2630 v3 and v4 | 2019-02-09 16:29:30 +01:00 |
| Koichi Akabe | 9532f8652c | Update changelog | 2018-12-21 11:08:01 +09:00 |
| Cedric Nugteren | 0c9411c844 | Updated to version 1.5.0 | 2018-12-04 20:46:02 +01:00 |
| Cedric Nugteren | 4676ec2921 | Added a FAQ document | 2018-12-01 17:19:28 +01:00 |
| Cedric Nugteren | c0e41b87cb | Fixed an issue for unequal MWG and NWG and the new GEMMK == 1 kernel | 2018-11-30 20:23:26 +01:00 |
| Cedric Nugteren | 2d32a23293 | Added new col2im routine to the documentation | 2018-11-01 21:46:19 +01:00 |
| Cedric Nugteren | 664a238adf | Fixed a bug in the XaxpyFaster kernel for specific parameters | 2018-10-15 20:08:29 +02:00 |
| Cedric Nugteren | 634b2bc75c | Merge pull request #319 from CNugteren/convgemm_multi_kernel: First im2col+GEMM implementation of convolution | 2018-10-14 17:27:45 +02:00 |
| Cedric Nugteren | 115a8f0f3d | Updated changelog regarding tuning API change | 2018-10-13 17:49:49 +02:00 |
| Cedric Nugteren | 83ba3d4b7b | Merge branch 'master' into convgemm_multi_kernel | 2018-09-16 20:01:18 +02:00 |
| Cedric Nugteren | 8ac39fa331 | Disabled Intel subgroup shuffling for double-precision | 2018-09-15 16:53:09 +02:00 |
| Cedric Nugteren | c788e040f7 | Added xCONVGEMM as im2col plus a batched GEMM kernel | 2018-09-07 22:02:44 +02:00 |
| Cedric Nugteren | 9d9f09fce9 | Name change of setting to NETLIB_PERSISTENT_OPENCL | 2018-08-07 22:41:06 +02:00 |
| Cedric Nugteren | fe639455bd | Added an option to compile the Netlib API with static OpenCL device and context | 2018-08-05 21:12:39 +02:00 |
| Cedric Nugteren | 503ab74f02 | Fixed issue with not performing complex conjugation under certain cases when transposing | 2018-07-31 21:49:37 +02:00 |
| Cedric Nugteren | fa84ac36f2 | The tuners now also check for valid local thread configurations and skip invalid ones completely, saving compilation time | 2018-07-28 16:01:03 +02:00 |
| Cedric Nugteren | 03bed8633e | Fixed an issue with AMD GPUs and the new GEMMK == 1 kernel | 2018-07-27 23:08:49 +02:00 |
| Cedric Nugteren | 6a8b9e24f2 | Added code to report the average tuning results | 2018-07-25 22:28:44 +02:00 |
| Cedric Nugteren | db179a1e40 | Updated to CLBlast version 1.4.1 | 2018-07-14 12:29:06 +02:00 |
| Cedric Nugteren | c459582c4f | Added tuning results for HD Graphics 6000 Broadwell GT3 | 2018-07-13 21:05:43 +02:00 |
| Cedric Nugteren | 7bae54f61f | Updated changelog | 2018-07-06 19:39:46 +02:00 |
| Cedric Nugteren | e3eedacbcc | Disabled calls to clReleaseProgram under Windows to avoid segfaults when the OpenCL driver unloads first | 2018-06-28 20:35:18 +09:00 |
| Cedric Nugteren | 4471b67735 | Updated to CLBlast version 1.4.0 | 2018-06-03 13:18:05 +02:00 |
| Cedric Nugteren | 4f594e3931 | Added MKL as an alternative for CBLAS for correctness and performance comparisons | 2018-06-02 17:57:45 +02:00 |
| Cedric Nugteren | 66583b3cda | The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run part of alltuners target | 2018-05-19 12:48:59 +02:00 |
| Cedric Nugteren | 60d057c7fd | Merge branch 'master' into canary_buffer_overflow_protection | 2018-05-18 21:30:11 +02:00 |
| Cedric Nugteren | 85341836dd | Added a canary region for overflow detection to the correctness tests | 2018-05-17 10:45:50 +01:00 |
| Cedric Nugteren | 8258321a74 | Now stores a shared_ptr to the Program class in the cache | 2018-05-01 20:34:48 +02:00 |
| Cedric Nugteren | b2248a17ae | Merge pull request #277 from CNugteren/CLBlast-257-intel-subgroups: Intel subgroup shuffling | 2018-04-29 15:48:35 +02:00 |
| Cedric Nugteren | 9f22bc232b | Updated the changelog | 2018-04-29 15:06:44 +02:00 |
| Cedric Nugteren | 7b416c8686 | Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program | 2018-04-26 21:10:17 +02:00 |
| Cedric Nugteren | f14e6f87d2 | Updated tuning results for the Skylake ULT GT2 GPU with the new kernel | 2018-04-15 11:45:45 +02:00 |
| Cedric Nugteren | 9596e46d01 | Added tuning results for NVIDIA GeForce 920MX | 2018-04-07 17:44:32 +02:00 |
| Cedric Nugteren | 9fb6550dd0 | Added the OpenCL local memory size constraint to the tuners | 2018-03-22 21:01:02 +01:00 |
| Cedric Nugteren | 54bbc99273 | Updated the documentation for the tuner API | 2018-03-10 14:52:40 +01:00 |
| Cedric Nugteren | 1940e67009 | Updated the changelog | 2018-02-26 19:53:50 +01:00 |
| Cedric Nugteren | 0557694d39 | Fixed several issues in the new invert tuner | 2018-02-20 20:53:13 +01:00 |
| Cedric Nugteren | c3a3976b7d | Updated changelog and roadmap: Python package created | 2018-02-18 18:01:26 +01:00 |
| Cedric Nugteren | 69ed46c8da | Implemented the XHAD Hadamard product routine | 2018-02-02 21:18:37 +01:00 |
| Cedric Nugteren | 37c5e8f58c | Updated to CLBlast version 1.3.0 | 2018-01-29 20:45:21 +01:00 |
| Cedric Nugteren | a500f537d8 | Added a RetrieveParameters function to inspect tuning parameters | 2018-01-11 20:32:06 +01:00 |
| Cedric Nugteren | 99a4df88a6 | Implemented the in-direct version of the strided-batched GEMM kernel | 2018-01-08 21:07:01 +01:00 |