Commit graph

197 commits

Author SHA1 Message Date
Cedric Nugteren 53deed298f Added documentation and minor refactoring for the recent support of static library compilation 2016-10-15 17:11:08 +02:00
Cedric Nugteren ebb505b783 Added tuning results for Intel HD Graphics IvyBridge GPU 2016-10-13 12:18:28 +02:00
Cedric Nugteren 8a9d3cdf37 Added support for compiling the library, the client, and the samples under MSVC 2013 2016-10-10 22:45:39 +02:00
Cedric Nugteren b698e45478 Added first tuning results for the single-kernel direct GEMM implementation 2016-10-06 21:13:14 +02:00
Cedric Nugteren d59e5c570b Added an option to run tuned kernels multiple times to average execution times; requires CLTune 2.5.0 2016-09-27 21:03:24 +02:00
Cedric Nugteren db5772e521 Updated to version 8.0 of the CLCudaAPI header 2016-09-27 20:56:49 +02:00
Cedric Nugteren e3076d26cc Added more relaxed error checking for the half-precision tests 2016-09-27 19:42:58 +02:00
Cedric Nugteren d595a8ed7e Fixed a bug waiting for an invalid event in case of a non-successful CLBlast call in the tests and samples 2016-09-22 20:47:22 +02:00
Cedric Nugteren b1929d8ce7 It is now possible to set the OpenCL compiler options through an environmental variable 2016-09-21 21:22:16 +02:00
Cedric Nugteren 4b94afda94 Updated to version 0.9.0 2016-09-13 19:20:39 +02:00
Cedric Nugteren b30b26b89e The GEMM kernel no longer adds beta*C in case beta is zero; this would cause problems if C contains NaNs 2016-09-04 17:21:16 +02:00
Cedric Nugteren 8d6a6a5bbf Merge branch 'database_defaults' into development 2016-08-22 19:31:36 +02:00
Cedric Nugteren 00979faab4 Updated the changelog; refactored the database-get-bests code a bit 2016-08-21 20:16:06 +02:00
Cedric Nugteren 7eeef74338 Merge branch 'development' of github.com:CNugteren/CLBlast into development
Conflicts:
	README.md
2016-08-20 12:59:21 +02:00
Cedric Nugteren 6eca53ee23 Merge branch 'master' of https://github.com/dvasschemacq/CLBlast into dvasschemacq-master
Conflicts:
	src/kernels/level1/xaxpy.opencl
	src/kernels/level2/xgemv.opencl
	src/kernels/level2/xgemv_fast.opencl
	src/kernels/level2/xger.opencl
	src/kernels/level2/xher.opencl
	src/kernels/level2/xher2.opencl
	src/kernels/level3/xgemm_part2.opencl
2016-08-20 12:50:31 +02:00
Cedric Nugteren 35623cd98d Minor update regarding the previous CMake export/install target changes 2016-07-28 20:45:09 +02:00
Cedric Nugteren 40a72259eb Fixed a bug in the new XgemvFastRot kernel related to local memory size 2016-07-23 16:58:11 +02:00
Cedric Nugteren c87e877bf2 Now passing alpha/beta to the kernel as arguments as before fp16 support; in case of fp16 arguments are cast on host and in kernel 2016-07-10 20:32:01 +02:00
Cedric Nugteren 9caa7ca5b9 Cache now compares cl_context instead of a pointer to a context; added verbose print statements to the cache 2016-07-08 20:57:58 +02:00
Cedric Nugteren 77325b8974 Added an option to the performance clients to do a warm-up run before timing 2016-07-06 21:25:55 +02:00
Cedric Nugteren 9683b50c55 Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp) 2016-07-03 20:30:47 +02:00
Cedric Nugteren 7cf2f8c268 Fixed some memory leaks related to events not properly cleaned-up 2016-07-02 15:34:55 +02:00
Cedric Nugteren b330ab0866 Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library 2016-06-30 10:49:17 +02:00
Cedric Nugteren cd74aaac52 Updated to version 6.0 of the CLCudaAPI header 2016-06-29 19:42:49 +02:00
Cedric Nugteren 56483347e8 Prepared the changelog for the next release 2016-06-28 22:33:13 +02:00
Cedric Nugteren 577f0ee117 Updated to version 0.8.0 2016-06-28 21:32:00 +02:00
Cedric Nugteren 7eeb790824 Added Appveyor Windows CI support 2016-06-27 12:47:39 +02:00
Cedric Nugteren 61203453aa Renamed all C++ source files to .cpp to match the .hpp extension better 2016-06-19 13:55:49 +02:00
Cedric Nugteren 52ccaf5b25 Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing 2016-06-16 18:07:46 +02:00
Cedric Nugteren 995a528cec Improved API documentation and added documentation for level-2 and level-3 routines 2016-06-13 20:17:26 +02:00
Cedric Nugteren 137d1d8708 Added tuning parameters for 'GRID K520' and 'HD Graphics Skylake ULT GT2' 2016-06-01 09:39:33 +02:00
Cedric Nugteren 983df6a8b4 Made use of CMake's built-in unit testing, allowing all tests to be run using 'make test' 2016-05-31 20:53:55 +02:00
Cedric Nugteren f6b2cd9579 Increased the verbosity of the -verbose option in the correctness tests 2016-05-30 20:07:09 +02:00
Cedric Nugteren 305bf16c4c Separated the performance tests (clients) from the correctness tests in CMake 2016-05-30 16:38:26 +02:00
Cedric Nugteren 61105e3810 Merge branch 'half_precision' into development 2016-05-30 11:11:28 +02:00
Cedric Nugteren 9f87455070 Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM 2016-05-25 13:29:53 +02:00
Cedric Nugteren 3e9a07f00a Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2 2016-05-22 16:59:14 +02:00
Cedric Nugteren 95b828da12 Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV 2016-05-22 15:38:26 +02:00
Cedric Nugteren 803aaf3070 Added level-1 half-precision routines HSWAP/HSCAL/HCOPY/HAXPY/HDOT/HNRM2/HASUM/HSUM/iHAMAX/iHMAX/iHMIN 2016-05-22 14:47:14 +02:00
Cedric Nugteren 489c5d76cf Merged in latest changes from 0.7.1 release 2016-05-18 21:32:56 +02:00
Cedric Nugteren 182d2cffa1 Prepared the changelog for the next release 2016-05-18 21:26:20 +02:00
Cedric Nugteren 9a061528eb Updated to version 0.7.1 2016-05-18 21:13:04 +02:00
Cedric Nugteren 7ad5cc89d0 Made MSVC link the run-time libraries statically 2016-05-17 23:12:19 +02:00
Cedric Nugteren 4b6bdd83a2 Added header with conversions from and to half-precision floating-point 2016-05-15 20:13:57 +02:00
cnugteren 716d7c67d9 Fixed a bug in the xGEMM routine related to the event incorrectly set 2016-05-15 16:10:56 +02:00
cnugteren 9065b34684 Added support for staggered/shuffled offsets for GEMM to improve performance for large power-of-2 kernels on AMD GPUs 2016-05-15 14:04:34 +02:00
Cedric Nugteren 0dacd04bcd Prepared the changelog for the next release 2016-05-08 21:30:04 +02:00
Cedric Nugteren c5730c8b43 Updated to version 0.7.0 2016-05-08 20:29:41 +02:00
Cedric Nugteren ed2904a344 Added preliminary generated API documentation 2016-05-08 09:49:00 +02:00
Cedric Nugteren 6c9e08c5e2 Added an option to the tests to control whether to test against clBLAS or a CPU BLAS library 2016-05-07 12:22:06 +02:00
Cedric Nugteren 435729a43e Added tuning results for AMD Hawaii (R9 290X) 2016-05-02 20:20:23 +02:00
Cedric Nugteren e113ff0852 Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX 2016-04-30 09:49:39 +02:00
Cedric Nugteren d9b21d7f49 Fixed the cache to store binaries instead of OpenCL programs 2016-04-28 21:14:17 +02:00
Cedric Nugteren d7ddbdeb1f Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX 2016-04-27 18:07:30 +02:00
Cedric Nugteren 82be8f211c Moved all cache-related functions to a separate file; added a ClearCompiledProgramCache function to clear the cache 2016-04-27 16:02:13 +02:00
cnugteren 16a048f1ac Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines 2016-04-20 22:12:51 -06:00
cnugteren 5a4f8217be Updated the reduction-kernel tuner to also tune the epilogue 2016-04-14 21:37:52 -06:00
cnugteren c4ab9bda63 Updated the documentation in light of the support for a reference CPU BLAS library 2016-04-03 16:07:25 -07:00
cnugteren 8217b01702 Updated the documentation 2016-03-31 20:20:32 -07:00
Cedric Nugteren 49822c8ead Fixed the C-api export to be able to properly build a DLL on Windows 2016-03-23 20:49:28 +01:00
Cedric Nugteren 918797735d Made the library thread-safe by guarding the kernel cache with a mutex 2016-03-14 22:55:22 +01:00
Cedric Nugteren fda335ddf2 Prepared the changelog for the next release 2016-03-13 11:09:02 +01:00
Cedric Nugteren bf4bd072e2 Updated to version 0.6.0 2016-03-13 11:02:40 +01:00
Cedric Nugteren 306bf67660 Added preliminary support for xHPR2 and xSPR2 routines 2016-03-06 15:48:11 +01:00
Cedric Nugteren 3c27edb087 Updated the changelog with newly supported level-2 routines 2016-02-28 16:37:49 +01:00
Cedric Nugteren c457a70aa1 Updated the changelog 2016-02-10 21:32:09 +01:00
CNugteren 3f616366bd Prepared the changelog for the next release 2015-10-17 15:57:04 +02:00
CNugteren 92404035e8 Updated to version 0.5.0 2015-10-17 15:48:13 +02:00
CNugteren 0d4091fdfb Added guards for routine-specific level-3 pad kernels 2015-10-13 08:29:45 +02:00
CNugteren 2b56c2c603 Added TRMV/TBMV/TPMV routines 2015-09-26 16:58:03 +02:00
CNugteren de6547a92b Added SBMV and SPMV routines 2015-09-19 18:01:19 +02:00
CNugteren 80da67d28b Added the HPMV routine 2015-09-19 17:40:38 +02:00
CNugteren aebd156869 Added the HBMV routine 2015-09-19 11:11:34 +02:00
CNugteren 93dddda63e Improved the organization and performance of level 2 routines 2015-09-18 17:46:41 +02:00
CNugteren 4507ba4997 Added first version of banded matrix-vector multiplication 2015-09-18 15:25:20 +02:00
CNugteren a2e726d3bd Added xDOT/xDOTU/xDOTC dot-product routines 2015-09-14 16:57:00 +02:00
CNugteren ff0c54c386 Added the XSWAP, XSCAL and XCOPY level-1 routines 2015-08-22 17:11:20 +02:00
CNugteren 70ba7c83d4 Prepared the changelog for the next release 2015-08-22 12:50:26 +02:00
CNugteren 74f601794d Updated to version 0.4.0 2015-08-22 12:41:40 +02:00
CNugteren ff1a670e88 Updated the documentation 2015-08-22 12:40:18 +02:00
CNugteren 4242f90215 Added the plain C API 2015-08-13 18:00:09 +02:00
CNugteren fc7cd434e1 Added HEMV and SYMV 2015-07-31 17:44:17 +02:00
CNugteren a27ce11c69 Updated documentation reflecting removal of clBLAS sources 2015-07-31 11:15:48 +02:00
CNugteren b10f4a633c Prepared the changelog for the next release 2015-07-24 20:50:00 +02:00
CNugteren efbdcd2d90 Updated to version 0.3.0 2015-07-24 08:25:32 +02:00
CNugteren a76dc2f09c Updated the docs to reflect the performance improvements 2015-07-24 08:16:41 +02:00
CNugteren 6908c4ebd2 Updated changelog with pre/post-processing bypass 2015-07-15 22:24:15 +02:00
CNugteren c920400261 Added HEMM, HERK, HER2K, and TRMM 2015-07-12 15:14:35 +02:00
CNugteren 3726f6a618 Re-organized test and client infrastructure 2015-06-29 20:42:34 +02:00
CNugteren 7c8d16147a Added the SYR2K routine, tester, and client 2015-06-26 08:12:56 +02:00
CNugteren 3de4471afe Added the SYRK routine 2015-06-24 07:52:19 +02:00
CNugteren 985eeac503 Updated to version 0.2.0 2015-06-21 09:13:08 +02:00
CNugteren 84dd6ba1d7 Updated changelog with testing improvements 2015-06-20 16:47:50 +02:00
CNugteren 41ce480c51 Updated changelog with host-code performance optimisation 2015-06-19 07:34:00 +02:00
CNugteren 8b2dbdba98 Updated with conjugate transpose and CGEMM/ZGEMM CSYMM/ZSYMM 2015-06-17 07:12:45 +02:00
CNugteren f925d47dad Added GEMV to changelog and readme 2015-06-15 08:41:37 +02:00
CNugteren bc5a341dfe Initial commit of preview version 2015-05-30 12:30:43 +02:00