Cedric Nugteren
|
35623cd98d
|
Minor update regarding the previous CMake export/install target changes
|
2016-07-28 20:45:09 +02:00 |
|
Cedric Nugteren
|
40a72259eb
|
Fixed a bug in the new XgemvFastRot kernel related to local memory size
|
2016-07-23 16:58:11 +02:00 |
|
Cedric Nugteren
|
c87e877bf2
|
Now passing alpha/beta to the kernel as arguments, as was done before fp16 support; in the fp16 case, the arguments are cast on the host and in the kernel
|
2016-07-10 20:32:01 +02:00 |
|
Cedric Nugteren
|
9caa7ca5b9
|
Cache now compares cl_context instead of a pointer to a context; added verbose print statements to the cache
|
2016-07-08 20:57:58 +02:00 |
|
Cedric Nugteren
|
77325b8974
|
Added an option to the performance clients to do a warm-up run before timing
|
2016-07-06 21:25:55 +02:00 |
|
Cedric Nugteren
|
9683b50c55
|
Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp)
|
2016-07-03 20:30:47 +02:00 |
|
Cedric Nugteren
|
7cf2f8c268
|
Fixed some memory leaks related to events not properly cleaned-up
|
2016-07-02 15:34:55 +02:00 |
|
Cedric Nugteren
|
b330ab0866
|
Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library
|
2016-06-30 10:49:17 +02:00 |
|
Cedric Nugteren
|
cd74aaac52
|
Updated to version 6.0 of the CLCudaAPI header
|
2016-06-29 19:42:49 +02:00 |
|
Cedric Nugteren
|
56483347e8
|
Prepared the changelog for the next release
|
2016-06-28 22:33:13 +02:00 |
|
Cedric Nugteren
|
577f0ee117
|
Updated to version 0.8.0
|
2016-06-28 21:32:00 +02:00 |
|
Cedric Nugteren
|
7eeb790824
|
Added Appveyor Windows CI support
|
2016-06-27 12:47:39 +02:00 |
|
Cedric Nugteren
|
61203453aa
|
Renamed all C++ source files to .cpp to match the .hpp extension better
|
2016-06-19 13:55:49 +02:00 |
|
Cedric Nugteren
|
52ccaf5b25
|
Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing
|
2016-06-16 18:07:46 +02:00 |
|
Cedric Nugteren
|
995a528cec
|
Improved API documentation and added documentation for level-2 and level-3 routines
|
2016-06-13 20:17:26 +02:00 |
|
Cedric Nugteren
|
137d1d8708
|
Added tuning parameters for 'GRID K520' and 'HD Graphics Skylake ULT GT2'
|
2016-06-01 09:39:33 +02:00 |
|
Cedric Nugteren
|
983df6a8b4
|
Made use of CMake's built-in unit testing, allowing all tests to be run using 'make test'
|
2016-05-31 20:53:55 +02:00 |
|
Cedric Nugteren
|
f6b2cd9579
|
Increased the verbosity of the -verbose option in the correctness tests
|
2016-05-30 20:07:09 +02:00 |
|
Cedric Nugteren
|
305bf16c4c
|
Separated the performance tests (clients) from the correctness tests in CMake
|
2016-05-30 16:38:26 +02:00 |
|
Cedric Nugteren
|
61105e3810
|
Merge branch 'half_precision' into development
|
2016-05-30 11:11:28 +02:00 |
|
Cedric Nugteren
|
9f87455070
|
Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM
|
2016-05-25 13:29:53 +02:00 |
|
Cedric Nugteren
|
3e9a07f00a
|
Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2
|
2016-05-22 16:59:14 +02:00 |
|
Cedric Nugteren
|
95b828da12
|
Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV
|
2016-05-22 15:38:26 +02:00 |
|
Cedric Nugteren
|
803aaf3070
|
Added level-1 half-precision routines HSWAP/HSCAL/HCOPY/HAXPY/HDOT/HNRM2/HASUM/HSUM/iHAMAX/iHMAX/iHMIN
|
2016-05-22 14:47:14 +02:00 |
|
Cedric Nugteren
|
489c5d76cf
|
Merged in latest changes from 0.7.1 release
|
2016-05-18 21:32:56 +02:00 |
|
Cedric Nugteren
|
182d2cffa1
|
Prepared the changelog for the next release
|
2016-05-18 21:26:20 +02:00 |
|
Cedric Nugteren
|
9a061528eb
|
Updated to version 0.7.1
|
2016-05-18 21:13:04 +02:00 |
|
Cedric Nugteren
|
7ad5cc89d0
|
Made MSVC link the run-time libraries statically
|
2016-05-17 23:12:19 +02:00 |
|
Cedric Nugteren
|
4b6bdd83a2
|
Added header with conversions from and to half-precision floating-point
|
2016-05-15 20:13:57 +02:00 |
|
cnugteren
|
716d7c67d9
|
Fixed a bug in the xGEMM routine related to the event incorrectly set
|
2016-05-15 16:10:56 +02:00 |
|
cnugteren
|
9065b34684
|
Added support for staggered/shuffled offsets for GEMM to improve performance for large power-of-2 kernels on AMD GPUs
|
2016-05-15 14:04:34 +02:00 |
|
Cedric Nugteren
|
0dacd04bcd
|
Prepared the changelog for the next release
|
2016-05-08 21:30:04 +02:00 |
|
Cedric Nugteren
|
c5730c8b43
|
Updated to version 0.7.0
|
2016-05-08 20:29:41 +02:00 |
|
Cedric Nugteren
|
ed2904a344
|
Added preliminary generated API documentation
|
2016-05-08 09:49:00 +02:00 |
|
Cedric Nugteren
|
6c9e08c5e2
|
Added an option to the tests to control whether to test against clBLAS or a CPU BLAS library
|
2016-05-07 12:22:06 +02:00 |
|
Cedric Nugteren
|
435729a43e
|
Added tuning results for AMD Hawaii (R9 290X)
|
2016-05-02 20:20:23 +02:00 |
|
Cedric Nugteren
|
e113ff0852
|
Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX
|
2016-04-30 09:49:39 +02:00 |
|
Cedric Nugteren
|
d9b21d7f49
|
Fixed the cache to store binaries instead of OpenCL programs
|
2016-04-28 21:14:17 +02:00 |
|
Cedric Nugteren
|
d7ddbdeb1f
|
Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX
|
2016-04-27 18:07:30 +02:00 |
|
Cedric Nugteren
|
82be8f211c
|
Moved all cache-related functions to a separate file; added a ClearCompiledProgramCache function to clear the cache
|
2016-04-27 16:02:13 +02:00 |
|
cnugteren
|
16a048f1ac
|
Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines
|
2016-04-20 22:12:51 -06:00 |
|
cnugteren
|
5a4f8217be
|
Updated the reduction-kernel tuner to also tune the epilogue
|
2016-04-14 21:37:52 -06:00 |
|
cnugteren
|
c4ab9bda63
|
Updated the documentation in light of the support for a reference CPU BLAS library
|
2016-04-03 16:07:25 -07:00 |
|
cnugteren
|
8217b01702
|
Updated the documentation
|
2016-03-31 20:20:32 -07:00 |
|
Cedric Nugteren
|
49822c8ead
|
Fixed the C-api export to be able to properly build a DLL on Windows
|
2016-03-23 20:49:28 +01:00 |
|
Cedric Nugteren
|
918797735d
|
Made the library thread-safe by guarding the kernel cache with a mutex
|
2016-03-14 22:55:22 +01:00 |
|
Cedric Nugteren
|
fda335ddf2
|
Prepared the changelog for the next release
|
2016-03-13 11:09:02 +01:00 |
|
Cedric Nugteren
|
bf4bd072e2
|
Updated to version 0.6.0
|
2016-03-13 11:02:40 +01:00 |
|
Cedric Nugteren
|
306bf67660
|
Added preliminary support for xHPR2 and xSPR2 routines
|
2016-03-06 15:48:11 +01:00 |
|
Cedric Nugteren
|
3c27edb087
|
Updated the changelog with newly supported level-2 routines
|
2016-02-28 16:37:49 +01:00 |
|