63 commits

Author SHA1 Message Date
Cedric Nugteren 61105e3810 Merge branch 'half_precision' into development 2016-05-30 11:11:28 +02:00
Cedric Nugteren 9f87455070 Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM 2016-05-25 13:29:53 +02:00
Cedric Nugteren 3e9a07f00a Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2 2016-05-22 16:59:14 +02:00
Cedric Nugteren 95b828da12 Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV 2016-05-22 15:38:26 +02:00
Cedric Nugteren 803aaf3070 Added level-1 half-precision routines HSWAP/HSCAL/HCOPY/HAXPY/HDOT/HNRM2/HASUM/HSUM/iHAMAX/iHMAX/iHMIN 2016-05-22 14:47:14 +02:00
Cedric Nugteren 489c5d76cf Merged in latest changes from 0.7.1 release 2016-05-18 21:32:56 +02:00
Cedric Nugteren 182d2cffa1 Prepared the changelog for the next release 2016-05-18 21:26:20 +02:00
Cedric Nugteren 9a061528eb Updated to version 0.7.1 2016-05-18 21:13:04 +02:00
Cedric Nugteren 7ad5cc89d0 Made MSVC link the run-time libraries statically 2016-05-17 23:12:19 +02:00
Cedric Nugteren 4b6bdd83a2 Added header with conversions from and to half-precision floating-point 2016-05-15 20:13:57 +02:00
cnugteren 716d7c67d9 Fixed a bug in the xGEMM routine related to the event incorrectly set 2016-05-15 16:10:56 +02:00
cnugteren 9065b34684 Added support for staggered/shuffled offsets for GEMM to improve performance for large power-of-2 kernels on AMD GPUs 2016-05-15 14:04:34 +02:00
Cedric Nugteren 0dacd04bcd Prepared the changelog for the next release 2016-05-08 21:30:04 +02:00
Cedric Nugteren c5730c8b43 Updated to version 0.7.0 2016-05-08 20:29:41 +02:00
Cedric Nugteren ed2904a344 Added preliminary generated API documentation 2016-05-08 09:49:00 +02:00
Cedric Nugteren 6c9e08c5e2 Added an option to the tests to control whether to test against clBLAS or a CPU BLAS library 2016-05-07 12:22:06 +02:00
Cedric Nugteren 435729a43e Added tuning results for AMD Hawaii (R9 290X) 2016-05-02 20:20:23 +02:00
Cedric Nugteren e113ff0852 Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX 2016-04-30 09:49:39 +02:00
Cedric Nugteren d9b21d7f49 Fixed the cache to store binaries instead of OpenCL programs 2016-04-28 21:14:17 +02:00
Cedric Nugteren d7ddbdeb1f Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX 2016-04-27 18:07:30 +02:00
Cedric Nugteren 82be8f211c Moved all cache-related functions to a separate file; added a ClearCompiledProgramCache function to clear the cache 2016-04-27 16:02:13 +02:00
cnugteren 16a048f1ac Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines 2016-04-20 22:12:51 -06:00
cnugteren 5a4f8217be Updated the reduction-kernel tuner to also tune the epilogue 2016-04-14 21:37:52 -06:00
cnugteren c4ab9bda63 Updated the documentation in light of the support for a reference CPU BLAS library 2016-04-03 16:07:25 -07:00
cnugteren 8217b01702 Updated the documentation 2016-03-31 20:20:32 -07:00
Cedric Nugteren 49822c8ead Fixed the C-api export to be able to properly build a DLL on Windows 2016-03-23 20:49:28 +01:00
Cedric Nugteren 918797735d Made the library thread-safe by guarding the kernel cache with a mutex 2016-03-14 22:55:22 +01:00
Cedric Nugteren fda335ddf2 Prepared the changelog for the next release 2016-03-13 11:09:02 +01:00
Cedric Nugteren bf4bd072e2 Updated to version 0.6.0 2016-03-13 11:02:40 +01:00
Cedric Nugteren 306bf67660 Added preliminary support for xHPR2 and xSPR2 routines 2016-03-06 15:48:11 +01:00
Cedric Nugteren 3c27edb087 Updated the changelog with newly supported level-2 routines 2016-02-28 16:37:49 +01:00
Cedric Nugteren c457a70aa1 Updated the changelog 2016-02-10 21:32:09 +01:00
CNugteren 3f616366bd Prepared the changelog for the next release 2015-10-17 15:57:04 +02:00
CNugteren 92404035e8 Updated to version 0.5.0 2015-10-17 15:48:13 +02:00
CNugteren 0d4091fdfb Added guards for routine-specific level-3 pad kernels 2015-10-13 08:29:45 +02:00
CNugteren 2b56c2c603 Added TRMV/TBMV/TPMV routines 2015-09-26 16:58:03 +02:00
CNugteren de6547a92b Added SBMV and SPMV routines 2015-09-19 18:01:19 +02:00
CNugteren 80da67d28b Added the HPMV routine 2015-09-19 17:40:38 +02:00
CNugteren aebd156869 Added the HBMV routine 2015-09-19 11:11:34 +02:00
CNugteren 93dddda63e Improved the organization and performance of level 2 routines 2015-09-18 17:46:41 +02:00
CNugteren 4507ba4997 Added first version of banded matrix-vector multiplication 2015-09-18 15:25:20 +02:00
CNugteren a2e726d3bd Added xDOT/xDOTU/xDOTC dot-product routines 2015-09-14 16:57:00 +02:00
CNugteren ff0c54c386 Added the XSWAP, XSCAL and XCOPY level-1 routines 2015-08-22 17:11:20 +02:00
CNugteren 70ba7c83d4 Prepared the changelog for the next release 2015-08-22 12:50:26 +02:00
CNugteren 74f601794d Updated to version 0.4.0 2015-08-22 12:41:40 +02:00
CNugteren ff1a670e88 Updated the documentation 2015-08-22 12:40:18 +02:00
CNugteren 4242f90215 Added the plain C API 2015-08-13 18:00:09 +02:00
CNugteren fc7cd434e1 Added HEMV and SYMV 2015-07-31 17:44:17 +02:00
CNugteren a27ce11c69 Updated documentation reflecting removal of clBLAS sources 2015-07-31 11:15:48 +02:00
CNugteren b10f4a633c Prepared the changelog for the next release 2015-07-24 20:50:00 +02:00