CLBlast

mirror of https://github.com/CNugteren/CLBlast.git synced 2024-07-07 12:23:46 +02:00

Author	SHA1	Message	Date
Cedric Nugteren	39b7dbc5e3	Added some constness to variables related to the GEMM routines	2016-06-15 12:34:05 +02:00
Cedric Nugteren	b894611ad1	Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) and renamed files and functions appropriately	2016-06-14 18:17:58 +02:00
Cedric Nugteren	3e78a99355	Moved device vendor and type checks to a common header	2016-06-14 14:30:22 +02:00
Cedric Nugteren	6e2017c67d	Added support for FP16 on ARM Mali-T628 (officially not supported)	2016-06-14 14:29:53 +02:00
Cedric Nugteren	995a528cec	Improved API documentation and added documentation for level-2 and level-3 routines	2016-06-13 20:17:26 +02:00
Cedric Nugteren	4fb8f9517c	Added documentation for the matrix-update level-2 family of routines	2016-06-10 11:16:06 +02:00
Cedric Nugteren	6925003e45	Added global memory synchronisation for better cache performance on ARM Mali GPUs	2016-06-08 10:13:37 +02:00
Cedric Nugteren	6d6b030053	Made the CPU BLAS library the default reference to test against in favor of clBLAS	2016-06-08 09:21:39 +02:00
Cedric Nugteren	7a7873d552	Fixed the RPATH settings for linking on OS X	2016-06-06 13:40:52 +02:00
Cedric Nugteren	c1895ea459	Made the tests for invalid buffer sizes also verbose in verbose mode	2016-06-06 12:20:42 +02:00
Cedric Nugteren	e561e3fbd5	Added return value to the test binaries (0: success, 1: failure), allowing it to work under CTest properly	2016-06-02 16:24:22 +02:00
Cedric Nugteren	137d1d8708	Added tuning parameters for 'GRID K520' and 'HD Graphics Skylake ULT GT2'	2016-06-01 09:39:33 +02:00
Cedric Nugteren	983df6a8b4	Made use of CMake's built-in unit testing, allowing all tests to be run using 'make test'	2016-05-31 20:53:55 +02:00
Cedric Nugteren	f6b2cd9579	Increased the verbosity of the -verbose option in the correctness tests	2016-05-30 20:07:09 +02:00
Cedric Nugteren	305bf16c4c	Separated the performance tests (clients) from the correctness tests in CMake	2016-05-30 16:38:26 +02:00
Cedric Nugteren	61105e3810	Merge branch 'half_precision' into development	2016-05-30 11:11:28 +02:00
Cedric Nugteren	03182f9d07	Added half-precision tests for the clBLAS reference through conversion to single-precision	2016-05-26 23:36:19 +02:00
Cedric Nugteren	b487d4dd44	Added half-precision tests for the CBLAS reference through conversion to single-precision	2016-05-26 13:15:27 +02:00
Cedric Nugteren	4612ff3552	Added possibility to run the performance client with half-precision	2016-05-25 14:37:26 +02:00
Cedric Nugteren	9f87455070	Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM	2016-05-25 13:29:53 +02:00
Cedric Nugteren	ac1575056e	Added proper argument handling and displaying for half-precision data-types	2016-05-24 14:06:16 +02:00
Cedric Nugteren	ae7d705d6f	Updated README with information on half-precision support	2016-05-23 19:23:46 +02:00
Cedric Nugteren	3e9a07f00a	Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2	2016-05-22 16:59:14 +02:00
Cedric Nugteren	f0cb3fdc81	Fixed tuning results for half-precision; added first results for the xGER kernels	2016-05-22 16:46:05 +02:00
Cedric Nugteren	c8ff3f143f	Prepared the GER kernels and tuner for half-precision support	2016-05-22 16:18:08 +02:00
Cedric Nugteren	95b828da12	Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV	2016-05-22 15:38:26 +02:00
Cedric Nugteren	b6268d0c22	Added first tuning results for the half-precision xGEMV kernels	2016-05-22 15:29:05 +02:00
Cedric Nugteren	88551b4005	Prepared the GEMV kernels and tuner for half-precision support	2016-05-22 15:22:54 +02:00
Cedric Nugteren	803aaf3070	Added level-1 half-precision routines HSWAP/HSCAL/HCOPY/HAXPY/HDOT/HNRM2/HASUM/HSUM/iHAMAX/iHMAX/iHMIN	2016-05-22 14:47:14 +02:00
Cedric Nugteren	3c9e63c054	Added first tuning results for the half-precision xDOT kernels	2016-05-22 14:43:25 +02:00
Cedric Nugteren	f70ded34f3	Added half-precision support for all level 1 routines	2016-05-22 14:26:19 +02:00
Cedric Nugteren	489c5d76cf	Merged in latest changes from 0.7.1 release	2016-05-18 21:32:56 +02:00
Cedric Nugteren	182d2cffa1	Prepared the changelog for the next release	2016-05-18 21:26:20 +02:00
Cedric Nugteren	181eb20bbf	Merge pull request #60 from CNugteren/development: Update to version 0.7.1	2016-05-18 21:18:07 +02:00
Cedric Nugteren	9a061528eb	Updated to version 0.7.1	2016-05-18 21:13:04 +02:00
CNugteren	748df9bf75	Fixes for Visual Studio	2016-05-18 20:53:40 +02:00
Cedric Nugteren	9bccc2544a	Fixes for CMake policy CMP0054	2016-05-18 20:36:07 +02:00
Cedric Nugteren	7ad5cc89d0	Made MSVC link the run-time libraries statically	2016-05-17 23:12:19 +02:00
Cedric Nugteren	c240774bad	Fixed warning CMP0054	2016-05-17 22:55:11 +02:00
Cedric Nugteren	7a3b695db7	Added half precision tuning results for supporting kernels (pad, copy, transpose, padtranspose)	2016-05-16 12:45:10 +02:00
Cedric Nugteren	af2ac62212	Prepared GEMM and supporting kernels and tuners for half-precision support	2016-05-16 12:37:24 +02:00
Cedric Nugteren	591e343ec9	Added an example of using the half-precision HAXPY routine	2016-05-15 20:18:34 +02:00
Cedric Nugteren	4b6bdd83a2	Added header with conversions from and to half-precision floating-point	2016-05-15 20:13:57 +02:00
cnugteren	7f5cfd92ba	Updated the performance graph for the Radeon M370X AMD GPU	2016-05-15 17:31:19 +02:00
cnugteren	fd107c9b12	Added new tuning results for SGEMM and updated the performance graph for the Radeon M370X AMD GPU	2016-05-15 17:28:22 +02:00
cnugteren	802c1f48c7	Removed comparison to CBLAS for the graph scripts	2016-05-15 17:06:36 +02:00
cnugteren	716d7c67d9	Fixed a bug in the xGEMM routine related to the event incorrectly set	2016-05-15 16:10:56 +02:00
cnugteren	9e36b3b20d	Fixed the arguments in the performance graphs to reflect the changes in enum values	2016-05-15 14:31:37 +02:00
cnugteren	9065b34684	Added support for staggered/shuffled offsets for GEMM to improve performance for large power-of-2 kernels on AMD GPUs	2016-05-15 14:04:34 +02:00
Cedric Nugteren	5e1b2e021f	Set kernel arguments for AXPY as constant memory buffers, making it possible to transfer half-precision values as well	2016-05-14 18:06:00 +02:00

506 commits