Cedric Nugteren
|
99a4df88a6
|
Implemented the in-direct version of the strided-batched GEMM kernel
|
2018-01-08 21:07:01 +01:00 |
|
Cedric Nugteren
|
c988c2cdd1
|
Updated changelog and roadmap
|
2018-01-06 17:16:11 +01:00 |
|
Cedric Nugteren
|
ad483123e6
|
Fixed the issue with AMD's APP compiler not being able to compile the invert kernel
|
2017-12-31 16:13:13 +01:00 |
|
Cedric Nugteren
|
1e738db6dd
|
Split the database into multiple small compilation units
|
2017-12-27 12:04:22 +01:00 |
|
Cedric Nugteren
|
b1f52f130c
|
Updated the database to use the new TRSV and Invert tuners
|
2017-12-23 13:55:22 +01:00 |
|
Cedric Nugteren
|
c680666250
|
Added try-except to database script parser to skip invalid files
|
2017-12-20 19:14:04 +01:00 |
|
Cedric Nugteren
|
69f6591564
|
Removed all ARM Mali tuning results; re-added Mali-T760 and Mali-T628 results based on kernel pre-processor
|
2017-12-17 16:59:08 +01:00 |
|
Cedric Nugteren
|
11489e68ef
|
Updated roadmap: completed pre-processor implementation
|
2017-12-10 16:08:06 +01:00 |
|
Cedric Nugteren
|
ca5dbcd2bd
|
Made the pre-processor run by default for ARM and Qualcomm GPUs
|
2017-12-09 15:16:53 +01:00 |
|
Cedric Nugteren
|
a768b7686b
|
Added precision check to parameter override for the clients
|
2017-11-24 21:09:39 +01:00 |
|
Cedric Nugteren
|
76d2b7f0b6
|
Revived the GEMM routine tuner; minor formatting changes
|
2017-11-19 12:59:52 +01:00 |
|
Cedric Nugteren
|
c41d219ea4
|
Added tuning results for the GeForce GTX750Ti
|
2017-11-09 21:19:21 +01:00 |
|
Cedric Nugteren
|
5d5e3f93bc
|
Updated to CLBlast version 1.2.0
|
2017-11-08 21:30:06 +01:00 |
|
Cedric Nugteren
|
b18cc9d3f1
|
Merge pull request #212 from CNugteren/kernel_selection_tuner
GEMM kernel selection tuner
|
2017-11-07 22:20:13 +01:00 |
|
Cedric Nugteren
|
9b0a435fb0
|
Integrated the GEMM routine tuner for kernel selection; added first tuning results
|
2017-11-02 21:47:14 +01:00 |
|
Cedric Nugteren
|
f24d611e57
|
Made it possible to compile the CLBlast performance clients for Android with the NDK
|
2017-10-29 13:02:14 +01:00 |
|
Cedric Nugteren
|
fa6e5e67f5
|
Fixed a bug when using the matrix A-offset argument for the TRSM routine
|
2017-10-27 22:12:30 +02:00 |
|
Cedric Nugteren
|
44f7fa628a
|
Added GEMV synchronisation for the TRSV routine: similar bug as in TRSM
|
2017-10-27 22:01:15 +02:00 |
|
Cedric Nugteren
|
d49aae236e
|
Fixed a bug in TRSM routine due to missing event synchronisations after GEMM calls
|
2017-10-25 20:35:39 +02:00 |
|
Cedric Nugteren
|
472f90501c
|
Added tuning parameters for GeForce GTX 580, GeForce GTX 1080Ti, and Core i5-4570
|
2017-10-20 18:06:12 +02:00 |
|
Cedric Nugteren
|
03760f80eb
|
Added CUDA API documentation
|
2017-10-16 21:54:42 +02:00 |
|
Cedric Nugteren
|
375193fe4e
|
The in-direct GEMM implementation now uses a single larger temporary buffer instead of up to 3 optional temporary buffers
|
2017-10-03 21:55:21 +02:00 |
|
Cedric Nugteren
|
6b226028d5
|
Allow OverrideParameters function to work before a kernel was first used
|
2017-10-01 20:32:39 +02:00 |
|
Cedric Nugteren
|
29c5283c4b
|
Kernels are now cached based on their routine name and their tuning parameters
|
2017-09-30 20:29:18 +02:00 |
|
Cedric Nugteren
|
f4c4674cf6
|
Updated to version 1.1.0
|
2017-09-30 17:19:17 +02:00 |
|
Cedric Nugteren
|
2df9f21ab8
|
Added extra benchmarks to verify new database caching keys performance
|
2017-09-23 18:06:43 +02:00 |
|
Cedric Nugteren
|
65c492edf6
|
Added OpenCL properties printing to the diagnostics helper
|
2017-09-22 21:35:32 +02:00 |
|
Cedric Nugteren
|
0802e3d84c
|
Added tuning results for Intel Core i7 6770HQ
|
2017-09-16 21:19:06 +02:00 |
|
Cedric Nugteren
|
4e317f5e85
|
Improved compilation time of the tuner database
|
2017-09-16 18:02:37 +02:00 |
|
Cedric Nugteren
|
0d13d814c2
|
Added architecture layer in the tuning database for better performance on unseen devices
|
2017-09-14 21:27:33 +02:00 |
|
Cedric Nugteren
|
28462aa050
|
Removed an assumption that the 'default' tuning parameters have to be stored last; this is no longer needed
|
2017-09-04 17:39:57 +02:00 |
|
Cedric Nugteren
|
161fd8514d
|
Merge branch 'master' into im_to_col
|
2017-08-24 21:15:14 +02:00 |
|
Cedric Nugteren
|
4d9d03ba51
|
Completed im2col implementation
|
2017-08-24 21:11:12 +02:00 |
|
Cedric Nugteren
|
da28cc5e93
|
Minor updates after merging in the PSO addition to the tuners
|
2017-08-21 20:14:02 +02:00 |
|
Cedric Nugteren
|
eb896838b1
|
Updated to version 1.0.1 (bugfix release)
|
2017-08-08 20:35:49 +02:00 |
|
Cedric Nugteren
|
1155c068e9
|
Updated to version 1.0.0
|
2017-07-30 20:54:21 +02:00 |
|
Cedric Nugteren
|
b7473f50df
|
Added status badges for correctness tests; updated list of contributors; fixed minor typos
|
2017-07-24 20:14:47 +02:00 |
|
Cedric Nugteren
|
4cf516cfec
|
Fixed an if-statement in the direct GEMM kernel causing a bug with specific sets of input parameters
|
2017-06-30 21:57:41 +02:00 |
|
Cedric Nugteren
|
ce528a9d39
|
Fixed and suppressed several warnings for MSVC
|
2017-06-26 21:38:04 +02:00 |
|
Cedric Nugteren
|
615a7fdc81
|
Fixed some compilation issues related to the database structure change
|
2017-06-21 23:07:47 +02:00 |
|
Cedric Nugteren
|
33ed1e5a06
|
Added tuning results for GeForce GT 650M (thanks to bzcheeseman)
|
2017-06-01 22:52:08 +02:00 |
|
Cedric Nugteren
|
f151e56daa
|
Added the IxAMIN routines: absolute minimum version of IxAMAX
|
2017-05-12 20:01:33 -07:00 |
|
Cedric Nugteren
|
86e8df60f1
|
Fixed a bug in the TRSM routine; tests now pass
|
2017-05-12 17:43:56 -07:00 |
|
Cedric Nugteren
|
81d9ed3946
|
Removed the included performance reports; README now redirects to the new external website
|
2017-05-12 13:18:10 -07:00 |
|
Cedric Nugteren
|
71933c3411
|
Added tuning results for the AMD Radeon Fiji GPU
|
2017-05-11 22:53:52 -07:00 |
|
Cedric Nugteren
|
97955fc221
|
Minor naming fixes to the benchmark script
|
2017-05-11 22:12:16 -07:00 |
|
Cedric Nugteren
|
e9d2a2f54c
|
Updated to version 0.11.0
|
2017-05-02 20:29:59 +02:00 |
|
Cedric Nugteren
|
e3bb58f602
|
Finalized support for performance testing against cuBLAS
|
2017-04-16 17:53:51 +02:00 |
|
Cedric Nugteren
|
300531b869
|
Updated the changelog with the Apple CPU override
|
2017-04-10 07:21:34 +02:00 |
|
Cedric Nugteren
|
fa5c4b00b7
|
Replaced the R graph scripts with Python/Matplotlib benchmark scripts
|
2017-03-26 15:36:34 +02:00 |
|
Cedric Nugteren
|
7b8f8fce68
|
Added initial naive version of the batched GEMM routine based on the direct GEMM kernel
|
2017-03-11 16:02:45 +01:00 |
|
Cedric Nugteren
|
d754586b49
|
Added proper testing of the alpha parameter; finalized the batched AXPY implementation
|
2017-03-10 20:49:59 +01:00 |
|
Cedric Nugteren
|
d6f1b5fca3
|
Added L2 error computation and checking for half-precision tests
|
2017-02-27 21:49:20 +01:00 |
|
Cedric Nugteren
|
00281dad26
|
Fixed half-precision bugs in HTBMV/HTPMV/HTRMV/HSYR2K/HTRMM related to incorrect constants
|
2017-02-27 21:00:04 +01:00 |
|
Cedric Nugteren
|
ea6790665d
|
Merge branch 'development' into triangular_solvers
|
2017-02-26 14:51:45 +01:00 |
|
Cedric Nugteren
|
ccac957f17
|
Added documentation for the TRSV and TRSM routines
|
2017-02-25 13:02:15 +01:00 |
|
Cedric Nugteren
|
fef11a208c
|
Added documentation for the OverrideParameters function
|
2017-02-18 11:02:57 +01:00 |
|
Cedric Nugteren
|
fd471e380c
|
Updated the changelog for PR131 and PR132
|
2017-01-24 20:34:09 +01:00 |
|
Cedric Nugteren
|
ff2bf985a3
|
Updated the link to cl.hpp in the Khronos registry for the samples
|
2017-01-07 13:57:23 +01:00 |
|
Cedric Nugteren
|
69ca271a8c
|
Always enables cl_khr_fp64 when running double-precision, not just for OpenCL 1.1 or lower
|
2017-01-07 13:31:29 +01:00 |
|
Cedric Nugteren
|
32b850b12b
|
Added tuning results for the AMD Turks GPU and the Intel Core i7-2670QM CPU
|
2017-01-03 20:30:56 +01:00 |
|
Cedric Nugteren
|
6b533dda1c
|
Fixed a bug when using offsets in the direct GEMM kernels
|
2016-12-18 11:54:32 +01:00 |
|
Cedric Nugteren
|
2cf7d8429a
|
Updated to version 0.10.0
|
2016-11-27 13:34:18 +01:00 |
|
Cedric Nugteren
|
39c49bf4f9
|
Made it possible to use the command-line environmental variables for each executable and without re-running CMake
|
2016-11-27 11:00:29 +01:00 |
|
Cedric Nugteren
|
cb398f0e42
|
Merge pull request #125 from CNugteren/netlib_blas_api
Netlib CBLAS API for CLBlast
|
2016-11-24 19:35:59 +01:00 |
|
Cedric Nugteren
|
2f0697564f
|
Fixed a bug in the TRMM routine caused by overwriting input data before consuming everything
|
2016-11-20 15:05:42 +01:00 |
|
Cedric Nugteren
|
bb14a5880e
|
Added an example and documentation for the Netlib CBLAS API
|
2016-10-25 20:37:33 +02:00 |
|
Cedric Nugteren
|
a670c4c4bf
|
All enums in the C API are now prefixed with CLBlast to avoid potential name clashes with other projects
|
2016-10-22 16:14:56 +02:00 |
|
Cedric Nugteren
|
9afbbc9ef9
|
Added documentation for the better exception handling
|
2016-10-22 15:23:18 +02:00 |
|
Cedric Nugteren
|
db17b1fbe9
|
Fixed a bug in the SYRK/SYR2K/HERK/HER2K routines that would occur with specific tuning parameters
|
2016-10-22 10:41:02 +02:00 |
|
Cedric Nugteren
|
53deed298f
|
Added documentation and minor refactoring for the recent support of static library compilation
|
2016-10-15 17:11:08 +02:00 |
|
Cedric Nugteren
|
ebb505b783
|
Added tuning results for Intel HD Graphics IvyBridge GPU
|
2016-10-13 12:18:28 +02:00 |
|
Cedric Nugteren
|
8a9d3cdf37
|
Added support for compiling the library, the client, and the samples under MSVC 2013
|
2016-10-10 22:45:39 +02:00 |
|
Cedric Nugteren
|
b698e45478
|
Added first tuning results for the single-kernel direct GEMM implementation
|
2016-10-06 21:13:14 +02:00 |
|
Cedric Nugteren
|
d59e5c570b
|
Added an option to run tuned kernels multiple times to average execution times; requires CLTune 2.5.0
|
2016-09-27 21:03:24 +02:00 |
|
Cedric Nugteren
|
db5772e521
|
Updated to version 8.0 of the CLCudaAPI header
|
2016-09-27 20:56:49 +02:00 |
|
Cedric Nugteren
|
e3076d26cc
|
Added more relaxed error checking for the half-precision tests
|
2016-09-27 19:42:58 +02:00 |
|
Cedric Nugteren
|
d595a8ed7e
|
Fixed a bug waiting for an invalid event in case of an unsuccessful CLBlast call in the tests and samples
|
2016-09-22 20:47:22 +02:00 |
|
Cedric Nugteren
|
b1929d8ce7
|
It is now possible to set the OpenCL compiler options through an environmental variable
|
2016-09-21 21:22:16 +02:00 |
|
Cedric Nugteren
|
4b94afda94
|
Updated to version 0.9.0
|
2016-09-13 19:20:39 +02:00 |
|
Cedric Nugteren
|
b30b26b89e
|
The GEMM kernel no longer adds beta*C in case beta is zero; this would cause problems if C contains NaNs
|
2016-09-04 17:21:16 +02:00 |
|
Cedric Nugteren
|
8d6a6a5bbf
|
Merge branch 'database_defaults' into development
|
2016-08-22 19:31:36 +02:00 |
|
Cedric Nugteren
|
00979faab4
|
Updated the changelog; refactored the database-get-bests code a bit
|
2016-08-21 20:16:06 +02:00 |
|
Cedric Nugteren
|
7eeef74338
|
Merge branch 'development' of github.com:CNugteren/CLBlast into development
Conflicts:
README.md
|
2016-08-20 12:59:21 +02:00 |
|
Cedric Nugteren
|
6eca53ee23
|
Merge branch 'master' of https://github.com/dvasschemacq/CLBlast into dvasschemacq-master
Conflicts:
src/kernels/level1/xaxpy.opencl
src/kernels/level2/xgemv.opencl
src/kernels/level2/xgemv_fast.opencl
src/kernels/level2/xger.opencl
src/kernels/level2/xher.opencl
src/kernels/level2/xher2.opencl
src/kernels/level3/xgemm_part2.opencl
|
2016-08-20 12:50:31 +02:00 |
|
Cedric Nugteren
|
35623cd98d
|
Minor update regarding the previous CMake export/install target changes
|
2016-07-28 20:45:09 +02:00 |
|
Cedric Nugteren
|
40a72259eb
|
Fixed a bug in the new XgemvFastRot kernel related to local memory size
|
2016-07-23 16:58:11 +02:00 |
|
Cedric Nugteren
|
c87e877bf2
|
Now passing alpha/beta to the kernel as arguments, as was done before fp16 support; for fp16, the arguments are cast on the host and in the kernel
|
2016-07-10 20:32:01 +02:00 |
|
Cedric Nugteren
|
9caa7ca5b9
|
Cache now compares cl_context instead of a pointer to a context; added verbose print statements to the cache
|
2016-07-08 20:57:58 +02:00 |
|
Cedric Nugteren
|
77325b8974
|
Added an option to the performance clients to do a warm-up run before timing
|
2016-07-06 21:25:55 +02:00 |
|
Cedric Nugteren
|
9683b50c55
|
Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp)
|
2016-07-03 20:30:47 +02:00 |
|
Cedric Nugteren
|
7cf2f8c268
|
Fixed some memory leaks related to events not being properly cleaned up
|
2016-07-02 15:34:55 +02:00 |
|
Cedric Nugteren
|
b330ab0866
|
Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library
|
2016-06-30 10:49:17 +02:00 |
|
Cedric Nugteren
|
cd74aaac52
|
Updated to version 6.0 of the CLCudaAPI header
|
2016-06-29 19:42:49 +02:00 |
|
Cedric Nugteren
|
56483347e8
|
Prepared the changelog for the next release
|
2016-06-28 22:33:13 +02:00 |
|
Cedric Nugteren
|
577f0ee117
|
Updated to version 0.8.0
|
2016-06-28 21:32:00 +02:00 |
|
Cedric Nugteren
|
7eeb790824
|
Added Appveyor Windows CI support
|
2016-06-27 12:47:39 +02:00 |
|
Cedric Nugteren
|
61203453aa
|
Renamed all C++ source files to .cpp to match the .hpp extension better
|
2016-06-19 13:55:49 +02:00 |
|
Cedric Nugteren
|
52ccaf5b25
|
Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing
|
2016-06-16 18:07:46 +02:00 |
|
Cedric Nugteren
|
995a528cec
|
Improved API documentation and added documentation for level-2 and level-3 routines
|
2016-06-13 20:17:26 +02:00 |
|