CLBlast/CHANGELOG

Development version (next release)
- It is now possible to set OpenCL compiler options through the env variable CLBLAST_BUILD_OPTIONS
- Fixed a bug in the tests and samples related to waiting for an invalid event
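
As a rough illustration of the first item above, the sketch below shows one way a host program could merge extra compiler options from CLBLAST_BUILD_OPTIONS into a base option string. Only the variable name comes from this changelog; the helper name CombineBuildOptions and the merging behaviour are assumptions for illustration, not the library's actual implementation.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of CLBlast): appends the contents of the
// CLBLAST_BUILD_OPTIONS environment variable, if set and non-empty, to a
// base OpenCL compiler option string.
std::string CombineBuildOptions(const std::string &base) {
  const char *extra = std::getenv("CLBLAST_BUILD_OPTIONS");
  if (extra == nullptr || *extra == '\0') { return base; }
  if (base.empty()) { return std::string{extra}; }
  return base + " " + extra;
}
```

On a Unix-like shell the variable would typically be set before launching the program, for example with a standard OpenCL option such as -cl-fast-relaxed-math.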
Version 0.9.0
- Updated to version 6.0 of the CLCudaAPI C++11 OpenCL header
- Significantly improved performance of rotated GEMV computations
- Improved performance on unseen/un-tuned devices through better default tuning parameter selection
- Fixed the MSVC dllimport and dllexport declarations
- Fixed memory leaks related to events not being released
- Fixed a bug with a size_t and cl_ulong mismatch on 32-bit systems
- Fixed a bug related to the cache and retrieval of programs based on the OpenCL context
- Fixed a performance issue (caused by fp16 support) by optimizing alpha/beta parameter passing to kernels
- Fixed a bug in the OpenCL kernels: now placing __kernel before __attribute__
- Fixed a bug in level-3 routines when beta is zero and matrix C contains NaNs
- Added an option (-warm_up) to do a warm-up run before timing in the performance clients
- Various minor fixes and enhancements
- Added tuned parameters for various devices (see README)
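
The beta-is-zero fix listed above touches a general floating-point pitfall: under IEEE 754, 0 * NaN is NaN, so evaluating alpha*ab + beta*c literally lets a NaN in C leak into the result even when beta is zero. The sketch below contrasts the naive update with a guarded one. The function names are illustrative and this is not the library's kernel code.

```cpp
// Naive update of one element of C: a NaN in c survives even when
// beta == 0, because 0 * NaN evaluates to NaN.
float UpdateNaive(float alpha, float ab, float beta, float c) {
  return alpha * ab + beta * c;
}

// Guarded update: when beta is exactly zero, the old value of c is
// ignored, so a NaN stored in C cannot propagate.
float UpdateGuarded(float alpha, float ab, float beta, float c) {
  if (beta == 0.0f) { return alpha * ab; }
  return alpha * ab + beta * c;
}
```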
Version 0.8.0
- Added support for half-precision floating-point (fp16) in the library
- Made it possible to compile the performance tests (clients) separately from the correctness tests
- Made a reference BLAS and head-to-head performance comparison optional in the clients
- Increased the verbosity of the "-verbose" option in the correctness tests
- Refactored the host code for better compilation times and fewer lines of code
- Added Appveyor continuous integration and increased coverage of the Travis builds
- Improved the API documentation
- Various minor fixes and enhancements
- Added tuned parameters for various devices (see README)
- Added half-precision routines:
* Level-1: HSWAP/HSCAL/HCOPY/HAXPY/HDOT/HNRM2/HASUM/HSUM/iHAMAX/iHMAX/iHMIN
* Level-2: HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV/HGER/HSYR/HSPR/HSYR2/HSPR2
* Level-3: HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM
- Added non-BLAS routines:
* SOMATCOPY/DOMATCOPY/COMATCOPY/ZOMATCOPY/HOMATCOPY (matrix copy, scaling, and/or transpose)
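
For readers unfamiliar with the xOMATCOPY family, the following is a simplified CPU reference of the operation described above (out-of-place copy with scaling and optional transpose). It assumes a dense row-major matrix with no leading-dimension or offset arguments; the function name and signature are illustrative and do not reflect the CLBlast API.

```cpp
#include <cstddef>
#include <vector>

// Reference out-of-place matrix copy for a row-major rows x cols matrix a.
// Returns alpha * a, or alpha * transpose(a) (stored row-major as a
// cols x rows matrix) when transpose is true.
std::vector<float> OmatcopyReference(std::size_t rows, std::size_t cols, float alpha,
                                     const std::vector<float> &a, bool transpose) {
  std::vector<float> b(rows * cols);
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < cols; ++j) {
      const std::size_t dst = transpose ? j * rows + i : i * cols + j;
      b[dst] = alpha * a[i * cols + j];
    }
  }
  return b;
}
```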
Version 0.7.1
- Improved performance of large power-of-2 xGEMM kernels for AMD GPUs
- Fixed a bug in the xGEMM routine related to an incorrectly set event
- Made MSVC link the run-time libraries statically
Version 0.7.0
- Added exports to be able to create a DLL on Windows (thanks to Marco Hutter)
- Made the library thread-safe
- Performance and correctness tests can now (on top of clBLAS) be performed against CPU BLAS libraries
- Fixed the use of events within the library
- Changed the enum parameters to match the raw values of the cblas standard
- Fixed the cache of previously compiled binaries and added a function to fill or clear it
- Various minor fixes and enhancements
- Added a preliminary version of the API documentation
- Added additional sample programs
- Added tuned parameters for various devices (see README)
- Added level-1 routines:
* SNRM2/DNRM2/ScNRM2/DzNRM2
* SASUM/DASUM/ScASUM/DzASUM
* SSUM/DSUM/ScSUM/DzSUM (non-absolute version of the above xASUM BLAS routines)
* iSAMAX/iDAMAX/iCAMAX/iZAMAX
* iSMAX/iDMAX/iCMAX/iZMAX (non-absolute version of the above ixAMAX BLAS routines)
* iSMIN/iDMIN/iCMIN/iZMIN (non-absolute minimum version of the above ixAMAX BLAS routines)
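
The difference between the absolute BLAS routines and their non-absolute counterparts listed above can be sketched for real single-precision vectors as follows. This is a plain CPU reference with illustrative names; it assumes a non-empty input, returns zero-based indices, and picks the first element on ties, none of which is confirmed by this changelog.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// xASUM: sum of absolute values.
float Asum(const std::vector<float> &x) {
  float s = 0.0f;
  for (float v : x) { s += std::fabs(v); }
  return s;
}

// xSUM: plain sum, signs preserved.
float Sum(const std::vector<float> &x) {
  float s = 0.0f;
  for (float v : x) { s += v; }
  return s;
}

// ixAMAX: index of the element with the largest absolute value.
std::size_t IndexAmax(const std::vector<float> &x) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < x.size(); ++i) {
    if (std::fabs(x[i]) > std::fabs(x[best])) { best = i; }
  }
  return best;
}

// ixMAX: index of the largest signed value.
std::size_t IndexMax(const std::vector<float> &x) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < x.size(); ++i) {
    if (x[i] > x[best]) { best = i; }
  }
  return best;
}

// ixMIN: index of the smallest signed value.
std::size_t IndexMin(const std::vector<float> &x) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < x.size(); ++i) {
    if (x[i] < x[best]) { best = i; }
  }
  return best;
}
```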
Version 0.6.0
- Added support for MSVC (Visual Studio) 2015
- Added tuned parameters for various devices (see README)
- Now automatically generates C++ code from JSON tuning results
- Added level-2 routines:
* SGER/DGER
* CGERU/ZGERU
* CGERC/ZGERC
* CHER/ZHER
* CHPR/ZHPR
* CHER2/ZHER2
* CHPR2/ZHPR2
* CSYR/ZSYR
* CSPR/ZSPR
* CSYR2/ZSYR2
* CSPR2/ZSPR2
Version 0.5.0
- Improved structure and performance of level-2 routines (xSYMV/xHEMV)
- Reduced compilation time of level-3 OpenCL kernels
- Added level-1 routines:
* SSWAP/DSWAP/CSWAP/ZSWAP
* SSCAL/DSCAL/CSCAL/ZSCAL
* SCOPY/DCOPY/CCOPY/ZCOPY
* SDOT/DDOT
* CDOTU/ZDOTU
* CDOTC/ZDOTC
- Added level-2 routines:
* SGBMV/DGBMV/CGBMV/ZGBMV
* CHBMV/ZHBMV
* CHPMV/ZHPMV
* SSBMV/DSBMV
* SSPMV/DSPMV
* STRMV/DTRMV/CTRMV/ZTRMV
* STBMV/DTBMV/CTBMV/ZTBMV
* STPMV/DTPMV/CTPMV/ZTPMV
Version 0.4.0
- Now using the Claduc C++11 interface to OpenCL
- Added plain C API for increased compatibility (clblast_c.h)
- Re-organized tuner infrastructure and added JSON output
- Removed clBLAS sources; clBLAS should now be installed separately for testing
- Added Travis continuous integration
- Added level-2 routines:
* CHEMV/ZHEMV
* SSYMV/DSYMV
Version 0.3.0
- Re-organized test/client infrastructure to avoid code duplication
- Added an optional bypass for pre/post-processing kernels in level-3 routines
- Significantly improved performance of level-3 routines on AMD GPUs
- Added level-3 routines:
* CHEMM/ZHEMM
* SSYRK/DSYRK/CSYRK/ZSYRK
* CHERK/ZHERK
* SSYR2K/DSYR2K/CSYR2K/ZSYR2K
* CHER2K/ZHER2K
* STRMM/DTRMM/CTRMM/ZTRMM
Version 0.2.0
- Added support for complex conjugate transpose
- Several host-code performance improvements
- Improved testing infrastructure and coverage
- Added level-2 routines:
* SGEMV/DGEMV/CGEMV/ZGEMV
- Added level-3 routines:
* CGEMM/ZGEMM
* CSYMM/ZSYMM
Version 0.1.0
- Initial preview version release to GitHub
- Supported level-1 routines:
* SAXPY/DAXPY/CAXPY/ZAXPY
- Supported level-3 routines:
* SGEMM/DGEMM
* SSYMM/DSYMM