CLBlast

mirror of https://github.com/CNugteren/CLBlast.git synced 2024-07-15 19:05:44 +02:00

Author	SHA1	Message	Date
Cedric Nugteren	ac1575056e	Added proper argument handling and displaying for half-precision data-types	2016-05-24 14:06:16 +02:00
Cedric Nugteren	ae7d705d6f	Updated README with information on half-precision support	2016-05-23 19:23:46 +02:00
Cedric Nugteren	3e9a07f00a	Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2	2016-05-22 16:59:14 +02:00
Cedric Nugteren	f0cb3fdc81	Fixed tuning results for half-precision; added first results for the xGER kernels	2016-05-22 16:46:05 +02:00
Cedric Nugteren	c8ff3f143f	Prepared the GER kernels and tuner for half-precision support	2016-05-22 16:18:08 +02:00
Cedric Nugteren	95b828da12	Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV	2016-05-22 15:38:26 +02:00
Cedric Nugteren	b6268d0c22	Added first tuning results for the half-precision xGEMV kernels	2016-05-22 15:29:05 +02:00
Cedric Nugteren	88551b4005	Prepared the GEMV kernels and tuner for half-precision support	2016-05-22 15:22:54 +02:00
Cedric Nugteren	803aaf3070	Added level-1 half-precision routines HSWAP/HSCAL/HCOPY/HAXPY/HDOT/HNRM2/HASUM/HSUM/iHAMAX/iHMAX/iHMIN	2016-05-22 14:47:14 +02:00
Cedric Nugteren	3c9e63c054	Added first tuning results for the half-precision xDOT kernels	2016-05-22 14:43:25 +02:00
Cedric Nugteren	f70ded34f3	Added half-precision support for all level 1 routines	2016-05-22 14:26:19 +02:00
Cedric Nugteren	489c5d76cf	Merged in latest changes from 0.7.1 release	2016-05-18 21:32:56 +02:00
Cedric Nugteren	7a3b695db7	Added half precision tuning results for supporting kernels (pad, copy, transpose, padtranspose)	2016-05-16 12:45:10 +02:00
Cedric Nugteren	af2ac62212	Prepared GEMM and supporting kernels and tuners for half-precision support	2016-05-16 12:37:24 +02:00
Cedric Nugteren	591e343ec9	Added an example of using the half-precision HAXPY routine	2016-05-15 20:18:34 +02:00
Cedric Nugteren	4b6bdd83a2	Added header with conversions from and to half-precision floating-point	2016-05-15 20:13:57 +02:00
Cedric Nugteren	5e1b2e021f	Set kernel arguments for AXPY as constant memory buffers, making it possible to transfer half-precision values as well	2016-05-14 18:06:00 +02:00
Cedric Nugteren	120c31a30f	Initial experimental version of the half-precision HAXPY routine	2016-05-13 20:49:34 +02:00
Cedric Nugteren	f2ba75890c	Initial changes in preparation for half-precision fp16 support	2016-05-12 19:56:21 +02:00
Cedric Nugteren	1c72d225c5	Fixed links in the README	2016-05-10 21:03:51 +02:00
Cedric Nugteren	0dacd04bcd	Prepared the changelog for the next release	2016-05-08 21:30:04 +02:00
CNugteren	942912daeb	Fixes for compilation of the tests under Visual Studio 2015	2016-05-08 21:11:37 +02:00
Cedric Nugteren	c5730c8b43	Updated to version 0.7.0	2016-05-08 20:29:41 +02:00
cnugteren	3b81ee2c08	Fixed an issue where the xAMAX tester would incorrectly report failures when testing against CBLAS	2016-05-08 18:28:01 +02:00
cnugteren	eaf1de5745	Fixed an issue where the xNRM2 and xASUM testers would incorrectly report failures for complex inputs	2016-05-08 18:07:55 +02:00
cnugteren	25a25dbd6f	Fixed errors in xAXPY and xSCAL tests on AMD hardware	2016-05-08 17:30:31 +02:00
cnugteren	1acb31896c	Fixed an issue with computing the GFLOPS numbers for the xGEMM performance tests for non-square matrices	2016-05-08 10:06:06 +02:00
Cedric Nugteren	ed2904a344	Added preliminary generated API documentation	2016-05-08 09:49:00 +02:00
Cedric Nugteren	6c9e08c5e2	Added an option to the tests to control whether to test against clBLAS or a CPU BLAS library	2016-05-07 12:22:06 +02:00
Cedric Nugteren	56aa1701c9	Added printing of indices when testing in verbose mode	2016-05-05 23:09:57 +02:00
Cedric Nugteren	f18c12389d	Merge pull request #57 from dividiti/development Locate the C BLAS library before the F77 one.	2016-05-05 22:27:22 +02:00
Anton Lokhmotov	e075dc347a	Locate the C BLAS library before the F77 one.	2016-05-05 14:38:10 +00:00
Cedric Nugteren	aa97c836b1	Fixed an issue with linking against the ATLAS BLAS library	2016-05-04 19:16:09 +02:00
Cedric Nugteren	435729a43e	Added tuning results for AMD Hawaii (R9 290X)	2016-05-02 20:20:23 +02:00
Cedric Nugteren	a8f109296c	Fixed the calculation of the required buffer sizes in case of subvectors and submatrices	2016-05-02 20:04:55 +02:00
Cedric Nugteren	27d0ac7f38	Added tuning results for AMD Pitcairn (R9 270X)	2016-05-01 19:33:50 +02:00
Cedric Nugteren	c94b628318	Updated tuning database for reduction/dot kernels based on the new tuner; partially repopulated the database	2016-05-01 19:17:04 +02:00
Cedric Nugteren	b9317d7d0c	Made the default xDOT tuning size smaller	2016-05-01 14:39:44 +02:00
Cedric Nugteren	bee2f943ec	Changed the index buffer of IxAMAX routines to unsigned int for proper buffersize checking	2016-05-01 14:03:37 +02:00
Cedric Nugteren	9602c150aa	Added a program cache (per-context) next to the per-device binary cache	2016-05-01 12:56:08 +02:00
Cedric Nugteren	e113ff0852	Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX	2016-04-30 09:49:39 +02:00
Cedric Nugteren	2952390f27	Added an example to demonstrate the use of the ClearCache and FillCache functions	2016-04-29 23:33:36 +02:00
Cedric Nugteren	877aad693f	Added FillCache: a function to pre-compile all kernels for a specific device	2016-04-29 23:33:12 +02:00
Cedric Nugteren	4f528b1730	Added sample C programs for the SASUM and DGEMV routines	2016-04-29 20:33:19 +02:00
Cedric Nugteren	d9b21d7f49	Fixed the cache to store binaries instead of OpenCL programs	2016-04-28 21:14:17 +02:00
Cedric Nugteren	d7ddbdeb1f	Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX	2016-04-27 18:07:30 +02:00
Cedric Nugteren	13eed1a0f9	Added missing namespace to the SGEMM example	2016-04-27 17:59:28 +02:00
Cedric Nugteren	8075934ca7	Added prototypes for non-BLAS routines: xSUM and IxMAX (non-absolute counterparts of xASUM and IxAMAX)	2016-04-27 17:06:19 +02:00
Cedric Nugteren	82be8f211c	Moved all cache-related functions to a separate file; added a ClearCompiledProgramCache function to clear the cache	2016-04-27 16:02:13 +02:00
Cedric Nugteren	44bdb60e83	Relaxed the absolute error margin for floating-point value comparisons to 1e-4	2016-04-27 14:42:30 +02:00

317 commits