Cedric Nugteren | 9fb6550dd0 | Added the OpenCL local memory size constraint to the tuners | 2018-03-22 21:01:02 +01:00
Cedric Nugteren | 54bbc99273 | Updated the documentation for the tuner API | 2018-03-10 14:52:40 +01:00
Cedric Nugteren | 1940e67009 | Updated the changelog | 2018-02-26 19:53:50 +01:00
Cedric Nugteren | 0557694d39 | Fixed several issues in the new invert tuner | 2018-02-20 20:53:13 +01:00
Cedric Nugteren | c3a3976b7d | Updated changelog and roadmap: Python package created | 2018-02-18 18:01:26 +01:00
Cedric Nugteren | 69ed46c8da | Implemented the XHAD Hadamard product routine | 2018-02-02 21:18:37 +01:00
Cedric Nugteren | 37c5e8f58c | Updated to CLBlast version 1.3.0 | 2018-01-29 20:45:21 +01:00
Cedric Nugteren | a500f537d8 | Added a RetrieveParameters function to inspect tuning parameters | 2018-01-11 20:32:06 +01:00
Cedric Nugteren | 99a4df88a6 | Implemented the in-direct version of the strided-batched GEMM kernel | 2018-01-08 21:07:01 +01:00
Cedric Nugteren | c988c2cdd1 | Updated changelog and roadmap | 2018-01-06 17:16:11 +01:00
Cedric Nugteren | ad483123e6 | Fixed the issue with AMD's APP compiler not being able to compile the invert kernel | 2017-12-31 16:13:13 +01:00
Cedric Nugteren | 1e738db6dd | Split the database into multiple small compilation units | 2017-12-27 12:04:22 +01:00
Cedric Nugteren | b1f52f130c | Updated the database to use the new TRSV and Invert tuners | 2017-12-23 13:55:22 +01:00
Cedric Nugteren | c680666250 | Added try-except to database script parser to skip invalid files | 2017-12-20 19:14:04 +01:00
Cedric Nugteren | 69f6591564 | Removed all ARM Mali tuning results; re-added Mali-T760 and Mali-T628 results based on kernel pre-processor | 2017-12-17 16:59:08 +01:00
Cedric Nugteren | 11489e68ef | Updated roadmap: completed pre-processor implementation | 2017-12-10 16:08:06 +01:00
Cedric Nugteren | ca5dbcd2bd | Made the pre-processor run by default for ARM and Qualcomm GPUs | 2017-12-09 15:16:53 +01:00
Cedric Nugteren | a768b7686b | Added precision check to parameter override for the clients | 2017-11-24 21:09:39 +01:00
Cedric Nugteren | 76d2b7f0b6 | Revived the GEMM routine tuner; minor formatting changes | 2017-11-19 12:59:52 +01:00
Cedric Nugteren | c41d219ea4 | Added tuning results for the GeForce GTX750Ti | 2017-11-09 21:19:21 +01:00
Cedric Nugteren | 5d5e3f93bc | Updated to CLBlast version 1.2.0 | 2017-11-08 21:30:06 +01:00
Cedric Nugteren | b18cc9d3f1 | Merge pull request #212 from CNugteren/kernel_selection_tuner (GEMM kernel selection tuner) | 2017-11-07 22:20:13 +01:00
Cedric Nugteren | 9b0a435fb0 | Integrated the GEMM routine tuner for kernel selection; added first tuning results | 2017-11-02 21:47:14 +01:00
Cedric Nugteren | f24d611e57 | Made it possible to compile the CLBlast performance clients for Android with the NDK | 2017-10-29 13:02:14 +01:00
Cedric Nugteren | fa6e5e67f5 | Fixed a bug when using the matrix A-offset argument for the TRSM routine | 2017-10-27 22:12:30 +02:00
Cedric Nugteren | 44f7fa628a | Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM | 2017-10-27 22:01:15 +02:00
Cedric Nugteren | d49aae236e | Fixed a bug in TRSM routine due to missing event synchronisations after GEMM calls | 2017-10-25 20:35:39 +02:00
Cedric Nugteren | 472f90501c | Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570 | 2017-10-20 18:06:12 +02:00
Cedric Nugteren | 03760f80eb | Added CUDA API documentation | 2017-10-16 21:54:42 +02:00
Cedric Nugteren | 375193fe4e | Gemm in-direct implementation now uses only 1 larger instead of max 3 optional temporary buffers | 2017-10-03 21:55:21 +02:00
Cedric Nugteren | 6b226028d5 | Allow OverrideParameters function to work before a kernel was first used | 2017-10-01 20:32:39 +02:00
Cedric Nugteren | 29c5283c4b | Kernels are now cached based on their routine name and their tuning parameters | 2017-09-30 20:29:18 +02:00
Cedric Nugteren | f4c4674cf6 | Updated to version 1.1.0 | 2017-09-30 17:19:17 +02:00
Cedric Nugteren | 2df9f21ab8 | Added extra benchmarks to verify new database caching keys performance | 2017-09-23 18:06:43 +02:00
Cedric Nugteren | 65c492edf6 | Added OpenCL properties printing to the diagnostics helper | 2017-09-22 21:35:32 +02:00
Cedric Nugteren | 0802e3d84c | Added tuning results for Intel Core i7 6770HQ | 2017-09-16 21:19:06 +02:00
Cedric Nugteren | 4e317f5e85 | Improved compilation time of the tuner database | 2017-09-16 18:02:37 +02:00
Cedric Nugteren | 0d13d814c2 | Added architecture layer in the tuning database for better performance on unseen devices | 2017-09-14 21:27:33 +02:00
Cedric Nugteren | 28462aa050 | Removed an assumption that the 'default' tuning parameters have to be stored last; this is no longer needed | 2017-09-04 17:39:57 +02:00
Cedric Nugteren | 161fd8514d | Merge branch 'master' into im_to_col | 2017-08-24 21:15:14 +02:00
Cedric Nugteren | 4d9d03ba51 | Completed im2col implementation | 2017-08-24 21:11:12 +02:00
Cedric Nugteren | da28cc5e93 | Minor updates after merging in the PSO addition to the tuners | 2017-08-21 20:14:02 +02:00
Cedric Nugteren | eb896838b1 | Updated to version 1.0.1 (bugfix release) | 2017-08-08 20:35:49 +02:00
Cedric Nugteren | 1155c068e9 | Updated to version 1.0.0 | 2017-07-30 20:54:21 +02:00
Cedric Nugteren | b7473f50df | Added status badges for correctness tests; updated list of contributors; fixed minor typos | 2017-07-24 20:14:47 +02:00
Cedric Nugteren | 4cf516cfec | Fixed an if-statement in the direct GEMM kernel causing a bug with specific sets of input parameters | 2017-06-30 21:57:41 +02:00
Cedric Nugteren | ce528a9d39 | Fixed and suppresses several warnings for MSVC | 2017-06-26 21:38:04 +02:00
Cedric Nugteren | 615a7fdc81 | Fixes some compilation issues related to the database structure change | 2017-06-21 23:07:47 +02:00
Cedric Nugteren | 33ed1e5a06 | Added tuning results for GeForce GT 650M (thanks to bzcheeseman) | 2017-06-01 22:52:08 +02:00
Cedric Nugteren | f151e56daa | Added the IxAMIN routines: absolute minimum version of IxAMAX | 2017-05-12 20:01:33 -07:00