Cedric Nugteren | 489c5d76cf | Merged in latest changes from 0.7.1 release | 2016-05-18 21:32:56 +02:00
Cedric Nugteren | 182d2cffa1 | Prepared the changelog for the next release | 2016-05-18 21:26:20 +02:00
Cedric Nugteren | 9a061528eb | Updated to version 0.7.1 | 2016-05-18 21:13:04 +02:00
Cedric Nugteren | 7ad5cc89d0 | Made MSVC link the run-time libraries statically | 2016-05-17 23:12:19 +02:00
Cedric Nugteren | 4b6bdd83a2 | Added header with conversions from and to half-precision floating-point | 2016-05-15 20:13:57 +02:00
cnugteren | 716d7c67d9 | Fixed a bug in the xGEMM routine related to the event incorrectly set | 2016-05-15 16:10:56 +02:00
cnugteren | 9065b34684 | Added support for staggered/shuffled offsets for GEMM to improve performance for large power-of-2 kernels on AMD GPUs | 2016-05-15 14:04:34 +02:00
Cedric Nugteren | 0dacd04bcd | Prepared the changelog for the next release | 2016-05-08 21:30:04 +02:00
Cedric Nugteren | c5730c8b43 | Updated to version 0.7.0 | 2016-05-08 20:29:41 +02:00
Cedric Nugteren | ed2904a344 | Added preliminary generated API documentation | 2016-05-08 09:49:00 +02:00
Cedric Nugteren | 6c9e08c5e2 | Added an option to the tests to control whether to test against clBLAS or a CPU BLAS library | 2016-05-07 12:22:06 +02:00
Cedric Nugteren | 435729a43e | Added tuning results for AMD Hawaii (R9 290X) | 2016-05-02 20:20:23 +02:00
Cedric Nugteren | e113ff0852 | Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX | 2016-04-30 09:49:39 +02:00
Cedric Nugteren | d9b21d7f49 | Fixed the cache to store binaries instead of OpenCL programs | 2016-04-28 21:14:17 +02:00
Cedric Nugteren | d7ddbdeb1f | Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX | 2016-04-27 18:07:30 +02:00
Cedric Nugteren | 82be8f211c | Moved all cache-related functions to a separate file; added a ClearCompiledProgramCache function to clear the cache | 2016-04-27 16:02:13 +02:00
cnugteren | 16a048f1ac | Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines | 2016-04-20 22:12:51 -06:00
cnugteren | 5a4f8217be | Updated the reduction-kernel tuner to also tune the epilogue | 2016-04-14 21:37:52 -06:00
cnugteren | c4ab9bda63 | Updated the documentation in light of the support for a reference CPU BLAS library | 2016-04-03 16:07:25 -07:00
cnugteren | 8217b01702 | Updated the documentation | 2016-03-31 20:20:32 -07:00
Cedric Nugteren | 49822c8ead | Fixed the C-api export to be able to properly build a DLL on Windows | 2016-03-23 20:49:28 +01:00
Cedric Nugteren | 918797735d | Made the library thread-safe by guarding the kernel cache with a mutex | 2016-03-14 22:55:22 +01:00
Cedric Nugteren | fda335ddf2 | Prepared the changelog for the next release | 2016-03-13 11:09:02 +01:00
Cedric Nugteren | bf4bd072e2 | Updated to version 0.6.0 | 2016-03-13 11:02:40 +01:00
Cedric Nugteren | 306bf67660 | Added preliminary support for xHPR2 and xSPR2 routines | 2016-03-06 15:48:11 +01:00
Cedric Nugteren | 3c27edb087 | Updated the changelog with newly supported level-2 routines | 2016-02-28 16:37:49 +01:00
Cedric Nugteren | c457a70aa1 | Updated the changelog | 2016-02-10 21:32:09 +01:00
CNugteren | 3f616366bd | Prepared the changelog for the next release | 2015-10-17 15:57:04 +02:00
CNugteren | 92404035e8 | Updated to version 0.5.0 | 2015-10-17 15:48:13 +02:00
CNugteren | 0d4091fdfb | Added guards for routine-specific level-3 pad kernels | 2015-10-13 08:29:45 +02:00
CNugteren | 2b56c2c603 | Added TRMV/TBMV/TPMV routines | 2015-09-26 16:58:03 +02:00
CNugteren | de6547a92b | Added SBMV and SPMV routines | 2015-09-19 18:01:19 +02:00
CNugteren | 80da67d28b | Added the HPMV routine | 2015-09-19 17:40:38 +02:00
CNugteren | aebd156869 | Added the HBMV routine | 2015-09-19 11:11:34 +02:00
CNugteren | 93dddda63e | Improved the organization and performance of level 2 routines | 2015-09-18 17:46:41 +02:00
CNugteren | 4507ba4997 | Added first version of banded matrix-vector multiplication | 2015-09-18 15:25:20 +02:00
CNugteren | a2e726d3bd | Added xDOT/xDOTU/xDOTC dot-product routines | 2015-09-14 16:57:00 +02:00
CNugteren | ff0c54c386 | Added the XSWAP, XSCAL and XCOPY level-1 routines | 2015-08-22 17:11:20 +02:00
CNugteren | 70ba7c83d4 | Prepared the changelog for the next release | 2015-08-22 12:50:26 +02:00
CNugteren | 74f601794d | Updated to version 0.4.0 | 2015-08-22 12:41:40 +02:00
CNugteren | ff1a670e88 | Updated the documentation | 2015-08-22 12:40:18 +02:00
CNugteren | 4242f90215 | Added the plain C API | 2015-08-13 18:00:09 +02:00
CNugteren | fc7cd434e1 | Added HEMV and SYMV | 2015-07-31 17:44:17 +02:00
CNugteren | a27ce11c69 | Updated documentation reflecting removal of clBLAS sources | 2015-07-31 11:15:48 +02:00
CNugteren | b10f4a633c | Prepared the changelog for the next release | 2015-07-24 20:50:00 +02:00
CNugteren | efbdcd2d90 | Updated to version 0.3.0 | 2015-07-24 08:25:32 +02:00
CNugteren | a76dc2f09c | Updated the docs to reflect the performance improvements | 2015-07-24 08:16:41 +02:00
CNugteren | 6908c4ebd2 | Updated changelog with pre/post-processing bypass | 2015-07-15 22:24:15 +02:00
CNugteren | c920400261 | Added HEMM, HERK, HER2K, and TRMM | 2015-07-12 15:14:35 +02:00
CNugteren | 3726f6a618 | Re-organized test and client infrastructure | 2015-06-29 20:42:34 +02:00
CNugteren | 7c8d16147a | Added the SYR2K routine, tester, and client | 2015-06-26 08:12:56 +02:00
CNugteren | 3de4471afe | Added the SYRK routine | 2015-06-24 07:52:19 +02:00
CNugteren | 985eeac503 | Updated to version 0.2.0 | 2015-06-21 09:13:08 +02:00
CNugteren | 84dd6ba1d7 | Updated changelog with testing improvements | 2015-06-20 16:47:50 +02:00
CNugteren | 41ce480c51 | Updated changelog with host-code performance optimisation | 2015-06-19 07:34:00 +02:00
CNugteren | 8b2dbdba98 | Updated with conjugate transpose and CGEMM/ZGEMM CSYMM/ZSYMM | 2015-06-17 07:12:45 +02:00
CNugteren | f925d47dad | Added GEMV to changelog and readme | 2015-06-15 08:41:37 +02:00
CNugteren | bc5a341dfe | Initial commit of preview version | 2015-05-30 12:30:43 +02:00