Cedric Nugteren | 9d9f09fce9 | Name change of setting to NETLIB_PERSISTENT_OPENCL | 2018-08-07 22:41:06 +02:00
Cedric Nugteren | fe639455bd | Added an option to compile the Netlib API with static OpenCL device and context | 2018-08-05 21:12:39 +02:00
Cedric Nugteren | 503ab74f02 | Fixed issue with not performing complex conjugation under certain cases when transposing | 2018-07-31 21:49:37 +02:00
Cedric Nugteren | fa84ac36f2 | The tuners now also check for valid local thread configurations and skip invalid ones completely, saving compilation time | 2018-07-28 16:01:03 +02:00
Cedric Nugteren | 03bed8633e | Fixed an issue with AMD GPUs and the new GEMMK == 1 kernel | 2018-07-27 23:08:49 +02:00
Cedric Nugteren | 6a8b9e24f2 | Added code to report the average tuning results | 2018-07-25 22:28:44 +02:00
Cedric Nugteren | db179a1e40 | Updated to CLBlast version 1.4.1 | 2018-07-14 12:29:06 +02:00
Cedric Nugteren | c459582c4f | Added tuning results for HD Graphics 6000 Broadwell GT3 | 2018-07-13 21:05:43 +02:00
Cedric Nugteren | 7bae54f61f | Updated changelog | 2018-07-06 19:39:46 +02:00
Cedric Nugteren | e3eedacbcc | Disabled calls to clReleaseProgram under Windows to avoid segfaults when the OpenCL driver unloads first | 2018-06-28 20:35:18 +09:00
Cedric Nugteren | 4471b67735 | Updated to CLBlast version 1.4.0 | 2018-06-03 13:18:05 +02:00
Cedric Nugteren | 4f594e3931 | Added MKL as an alternative for CBLAS for correctness and performance comparisons | 2018-06-02 17:57:45 +02:00
Cedric Nugteren | 66583b3cda | The GEMM routine tuner now loads kernel JSON tuning results from disk if available; now run part of alltuners target | 2018-05-19 12:48:59 +02:00
Cedric Nugteren | 60d057c7fd | Merge branch 'master' into canary_buffer_overflow_protection | 2018-05-18 21:30:11 +02:00
Cedric Nugteren | 85341836dd | Added a canary region for overflow detection to the correctness tests | 2018-05-17 10:45:50 +01:00
Cedric Nugteren | 8258321a74 | Now stores a shared_ptr to the Program class in the cache | 2018-05-01 20:34:48 +02:00
Cedric Nugteren | b2248a17ae | Merge pull request #277 from CNugteren/CLBlast-257-intel-subgroups: Intel subgroup shuffling | 2018-04-29 15:48:35 +02:00
Cedric Nugteren | 9f22bc232b | Updated the changelog | 2018-04-29 15:06:44 +02:00
Cedric Nugteren | 7b416c8686 | Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program | 2018-04-26 21:10:17 +02:00
Cedric Nugteren | f14e6f87d2 | Updated tuning results for the Skylake ULT GT2 GPU with the new kernel | 2018-04-15 11:45:45 +02:00
Cedric Nugteren | 9596e46d01 | Added tuning results for NVIDIA GeForce 920MX | 2018-04-07 17:44:32 +02:00
Cedric Nugteren | 9fb6550dd0 | Added the OpenCL local memory size constraint to the tuners | 2018-03-22 21:01:02 +01:00
Cedric Nugteren | 54bbc99273 | Updated the documentation for the tuner API | 2018-03-10 14:52:40 +01:00
Cedric Nugteren | 1940e67009 | Updated the changelog | 2018-02-26 19:53:50 +01:00
Cedric Nugteren | 0557694d39 | Fixed several issues in the new invert tuner | 2018-02-20 20:53:13 +01:00
Cedric Nugteren | c3a3976b7d | Updated changelog and roadmap: Python package created | 2018-02-18 18:01:26 +01:00
Cedric Nugteren | 69ed46c8da | Implemented the XHAD Hadamard product routine | 2018-02-02 21:18:37 +01:00
Cedric Nugteren | 37c5e8f58c | Updated to CLBlast version 1.3.0 | 2018-01-29 20:45:21 +01:00
Cedric Nugteren | a500f537d8 | Added a RetrieveParameters function to inspect tuning parameters | 2018-01-11 20:32:06 +01:00
Cedric Nugteren | 99a4df88a6 | Implemented the in-direct version of the strided-batched GEMM kernel | 2018-01-08 21:07:01 +01:00
Cedric Nugteren | c988c2cdd1 | Updated changelog and roadmap | 2018-01-06 17:16:11 +01:00
Cedric Nugteren | ad483123e6 | Fixed the issue with AMD's APP compiler not being able to compile the invert kernel | 2017-12-31 16:13:13 +01:00
Cedric Nugteren | 1e738db6dd | Split the database into multiple small compilation units | 2017-12-27 12:04:22 +01:00
Cedric Nugteren | b1f52f130c | Updated the database to use the new TRSV and Invert tuners | 2017-12-23 13:55:22 +01:00
Cedric Nugteren | c680666250 | Added try-except to database script parser to skip invalid files | 2017-12-20 19:14:04 +01:00
Cedric Nugteren | 69f6591564 | Removed all ARM Mali tuning results; re-added Mali-T760 and Mali-T628 results based on kernel pre-processor | 2017-12-17 16:59:08 +01:00
Cedric Nugteren | 11489e68ef | Updated roadmap: completed pre-processor implementation | 2017-12-10 16:08:06 +01:00
Cedric Nugteren | ca5dbcd2bd | Made the pre-processor run by default for ARM and Qualcomm GPUs | 2017-12-09 15:16:53 +01:00
Cedric Nugteren | a768b7686b | Added precision check to parameter override for the clients | 2017-11-24 21:09:39 +01:00
Cedric Nugteren | 76d2b7f0b6 | Revived the GEMM routine tuner; minor formatting changes | 2017-11-19 12:59:52 +01:00
Cedric Nugteren | c41d219ea4 | Added tuning results for the GeForce GTX750Ti | 2017-11-09 21:19:21 +01:00
Cedric Nugteren | 5d5e3f93bc | Updated to CLBlast version 1.2.0 | 2017-11-08 21:30:06 +01:00
Cedric Nugteren | b18cc9d3f1 | Merge pull request #212 from CNugteren/kernel_selection_tuner: GEMM kernel selection tuner | 2017-11-07 22:20:13 +01:00
Cedric Nugteren | 9b0a435fb0 | Integrated the GEMM routine tuner for kernel selection; added first tuning results | 2017-11-02 21:47:14 +01:00
Cedric Nugteren | f24d611e57 | Made it possible to compile the CLBlast performance clients for Android with the NDK | 2017-10-29 13:02:14 +01:00
Cedric Nugteren | fa6e5e67f5 | Fixed a bug when using the matrix A-offset argument for the TRSM routine | 2017-10-27 22:12:30 +02:00
Cedric Nugteren | 44f7fa628a | Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM | 2017-10-27 22:01:15 +02:00
Cedric Nugteren | d49aae236e | Fixed a bug in TRSM routine due to missing event synchronisations after GEMM calls | 2017-10-25 20:35:39 +02:00
Cedric Nugteren | 472f90501c | Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570 | 2017-10-20 18:06:12 +02:00
Cedric Nugteren | 03760f80eb | Added CUDA API documentation | 2017-10-16 21:54:42 +02:00