Development version (next release)
- Added exports to be able to create a DLL on Windows (thanks to Marco Hutter)
- Made the library thread-safe
- Performance and correctness tests can now be run against CPU BLAS libraries in addition to clBLAS
- Fixed the use of events within the library
- Changed the enum parameters to match the raw values of the cblas standard
- Fixed the cache of previously compiled binaries and added a function to fill or clear it
- Added additional sample programs
- Added level-1 routines:
  * SNRM2/DNRM2/ScNRM2/DzNRM2
  * SASUM/DASUM/ScASUM/DzASUM
  * SSUM/DSUM/ScSUM/DzSUM (non-absolute version of the above xASUM BLAS routines)
  * iSAMAX/iDAMAX/iCAMAX/iZAMAX
  * iSMAX/iDMAX/iCMAX/iZMAX (non-absolute version of the above ixAMAX BLAS routines)
  * iSMIN/iDMIN/iCMIN/iZMIN (non-absolute minimum version of the above ixAMAX BLAS routines)
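
The difference between the standard xASUM/ixAMAX routines and the non-absolute variants listed above can be sketched on the host as follows. This is only an illustration for real single-precision input: the helper names are hypothetical and not part of the library's API, and both the 0-based indices and the "first occurrence wins" tie-breaking are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference helpers for illustration only; none of
// these names are part of the CLBlast API.

// xASUM semantics: sum of the absolute values of all elements.
float ReferenceAsum(const std::vector<float>& x) {
  float result = 0.0f;
  for (const float value : x) { result += std::fabs(value); }
  return result;
}

// xSUM semantics: the same reduction without taking absolute values.
float ReferenceSum(const std::vector<float>& x) {
  float result = 0.0f;
  for (const float value : x) { result += value; }
  return result;
}

// ixAMAX semantics: index of the first element with the largest absolute
// value (0-based here by assumption).
std::size_t ReferenceIamax(const std::vector<float>& x) {
  assert(!x.empty());
  std::size_t best = 0;
  for (std::size_t i = 1; i < x.size(); ++i) {
    if (std::fabs(x[i]) > std::fabs(x[best])) { best = i; }
  }
  return best;
}

// ixMAX semantics: index of the first element with the largest signed value.
std::size_t ReferenceImax(const std::vector<float>& x) {
  assert(!x.empty());
  std::size_t best = 0;
  for (std::size_t i = 1; i < x.size(); ++i) {
    if (x[i] > x[best]) { best = i; }
  }
  return best;
}

// ixMIN semantics: index of the first element with the smallest signed value.
std::size_t ReferenceImin(const std::vector<float>& x) {
  assert(!x.empty());
  std::size_t best = 0;
  for (std::size_t i = 1; i < x.size(); ++i) {
    if (x[i] < x[best]) { best = i; }
  }
  return best;
}
```

For x = {1, -4, 3} these helpers give an absolute sum of 8 and a plain sum of 0, while the index helpers return 1 (largest absolute value), 2 (largest value) and 1 (smallest value).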

Version 0.6.0
- Added support for MSVC (Visual Studio) 2015
- Added tuned parameters for various devices (see README)
- Now automatically generates C++ code from JSON tuning results
- Added level-2 routines:
  * SGER/DGER
  * CGERU/ZGERU
  * CGERC/ZGERC
  * CHER/ZHER
  * CHPR/ZHPR
  * CHER2/ZHER2
  * CHPR2/ZHPR2
  * CSYR/ZSYR
  * CSPR/ZSPR
  * CSYR2/ZSYR2
  * CSPR2/ZSPR2

Version 0.5.0
- Improved structure and performance of level-2 routines (xSYMV/xHEMV)
- Reduced compilation time of level-3 OpenCL kernels
- Added level-1 routines:
  * SSWAP/DSWAP/CSWAP/ZSWAP
  * SSCAL/DSCAL/CSCAL/ZSCAL
  * SCOPY/DCOPY/CCOPY/ZCOPY
  * SDOT/DDOT
  * CDOTU/ZDOTU
  * CDOTC/ZDOTC
- Added level-2 routines:
  * SGBMV/DGBMV/CGBMV/ZGBMV
  * CHBMV/ZHBMV
  * CHPMV/ZHPMV
  * SSBMV/DSBMV
  * SSPMV/DSPMV
  * STRMV/DTRMV/CTRMV/ZTRMV
  * STBMV/DTBMV/CTBMV/ZTBMV
  * STPMV/DTPMV/CTPMV/ZTPMV

Version 0.4.0
- Now using the Claduc C++11 interface to OpenCL
- Added plain C API for increased compatibility (clblast_c.h)
- Re-organized tuner infrastructure and added JSON output
- Removed the clBLAS sources; clBLAS should now be installed separately for testing
- Added Travis continuous integration
- Added level-2 routines:
  * CHEMV/ZHEMV
  * SSYMV/DSYMV

Version 0.3.0
- Re-organized test/client infrastructure to avoid code duplication
- Added an optional bypass for pre/post-processing kernels in level-3 routines
- Significantly improved performance of level-3 routines on AMD GPUs
- Added level-3 routines:
  * CHEMM/ZHEMM
  * SSYRK/DSYRK/CSYRK/ZSYRK
  * CHERK/ZHERK
  * SSYR2K/DSYR2K/CSYR2K/ZSYR2K
  * CHER2K/ZHER2K
  * STRMM/DTRMM/CTRMM/ZTRMM

Version 0.2.0
- Added support for complex conjugate transpose
- Several host-code performance improvements
- Improved testing infrastructure and coverage
- Added level-2 routines:
  * SGEMV/DGEMV/CGEMV/ZGEMV
- Added level-3 routines:
  * CGEMM/ZGEMM
  * CSYMM/ZSYMM

Version 0.1.0
- Initial preview version release to GitHub
- Supported level-1 routines:
  * SAXPY/DAXPY/CAXPY/ZAXPY
- Supported level-3 routines:
  * SGEMM/DGEMM
  * SSYMM/DSYMM