Cedric Nugteren
|
f6b2cd9579
|
Increased the verbosity of the -verbose option in the correctness tests
|
2016-05-30 20:07:09 +02:00 |
|
Cedric Nugteren
|
305bf16c4c
|
Separated the performance tests (clients) from the correctness tests in CMake
|
2016-05-30 16:38:26 +02:00 |
|
Cedric Nugteren
|
61105e3810
|
Merge branch 'half_precision' into development
|
2016-05-30 11:11:28 +02:00 |
|
Cedric Nugteren
|
03182f9d07
|
Added half-precision tests for the clBLAS reference through conversion to single-precision
|
2016-05-26 23:36:19 +02:00 |
|
Cedric Nugteren
|
b487d4dd44
|
Added half-precision tests for the CBLAS reference through conversion to single-precision
|
2016-05-26 13:15:27 +02:00 |
|
Cedric Nugteren
|
4612ff3552
|
Added possibility to run the performance client with half-precision
|
2016-05-25 14:37:26 +02:00 |
|
Cedric Nugteren
|
9f87455070
|
Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM
|
2016-05-25 13:29:53 +02:00 |
|
Cedric Nugteren
|
ac1575056e
|
Added proper argument handling and displaying for half-precision data-types
|
2016-05-24 14:06:16 +02:00 |
|
Cedric Nugteren
|
ae7d705d6f
|
Updated README with information on half-precision support
|
2016-05-23 19:23:46 +02:00 |
|
Cedric Nugteren
|
3e9a07f00a
|
Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2
|
2016-05-22 16:59:14 +02:00 |
|
Cedric Nugteren
|
f0cb3fdc81
|
Fixed tuning results for half-precision; added first results for the xGER kernels
|
2016-05-22 16:46:05 +02:00 |
|
Cedric Nugteren
|
c8ff3f143f
|
Prepared the GER kernels and tuner for half-precision support
|
2016-05-22 16:18:08 +02:00 |
|
Cedric Nugteren
|
95b828da12
|
Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV
|
2016-05-22 15:38:26 +02:00 |
|
Cedric Nugteren
|
b6268d0c22
|
Added first tuning results for the half-precision xGEMV kernels
|
2016-05-22 15:29:05 +02:00 |
|
Cedric Nugteren
|
88551b4005
|
Prepared the GEMV kernels and tuner for half-precision support
|
2016-05-22 15:22:54 +02:00 |
|
Cedric Nugteren
|
803aaf3070
|
Added level-1 half-precision routines HSWAP/HSCAL/HCOPY/HAXPY/HDOT/HNRM2/HASUM/HSUM/iHAMAX/iHMAX/iHMIN
|
2016-05-22 14:47:14 +02:00 |
|
Cedric Nugteren
|
3c9e63c054
|
Added first tuning results for the half-precision xDOT kernels
|
2016-05-22 14:43:25 +02:00 |
|
Cedric Nugteren
|
f70ded34f3
|
Added half-precision support for all level 1 routines
|
2016-05-22 14:26:19 +02:00 |
|
Cedric Nugteren
|
489c5d76cf
|
Merged in latest changes from 0.7.1 release
|
2016-05-18 21:32:56 +02:00 |
|
Cedric Nugteren
|
182d2cffa1
|
Prepared the changelog for the next release
|
2016-05-18 21:26:20 +02:00 |
|
Cedric Nugteren
|
181eb20bbf
|
Merge pull request #60 from CNugteren/development
Update to version 0.7.1
|
2016-05-18 21:18:07 +02:00 |
|
Cedric Nugteren
|
9a061528eb
|
Updated to version 0.7.1
|
2016-05-18 21:13:04 +02:00 |
|
CNugteren
|
748df9bf75
|
Fixes for Visual Studio
|
2016-05-18 20:53:40 +02:00 |
|
Cedric Nugteren
|
9bccc2544a
|
Fixes for CMake policy CMP0054
|
2016-05-18 20:36:07 +02:00 |
|
Cedric Nugteren
|
7ad5cc89d0
|
Made MSVC link the run-time libraries statically
|
2016-05-17 23:12:19 +02:00 |
|
Cedric Nugteren
|
c240774bad
|
Fixed warning CMP0054
|
2016-05-17 22:55:11 +02:00 |
|
Cedric Nugteren
|
7a3b695db7
|
Added half-precision tuning results for supporting kernels (pad, copy, transpose, padtranspose)
|
2016-05-16 12:45:10 +02:00 |
|
Cedric Nugteren
|
af2ac62212
|
Prepared GEMM and supporting kernels and tuners for half-precision support
|
2016-05-16 12:37:24 +02:00 |
|
Cedric Nugteren
|
591e343ec9
|
Added an example of using the half-precision HAXPY routine
|
2016-05-15 20:18:34 +02:00 |
|
Cedric Nugteren
|
4b6bdd83a2
|
Added header with conversions from and to half-precision floating-point
|
2016-05-15 20:13:57 +02:00 |
|
cnugteren
|
7f5cfd92ba
|
Updated the performance graph for the Radeon M370X AMD GPU
|
2016-05-15 17:31:19 +02:00 |
|
cnugteren
|
fd107c9b12
|
Added new tuning results for SGEMM and updated the performance graph for the Radeon M370X AMD GPU
|
2016-05-15 17:28:22 +02:00 |
|
cnugteren
|
802c1f48c7
|
Removed comparison to CBLAS for the graph scripts
|
2016-05-15 17:06:36 +02:00 |
|
cnugteren
|
716d7c67d9
|
Fixed a bug in the xGEMM routine related to the event incorrectly set
|
2016-05-15 16:10:56 +02:00 |
|
cnugteren
|
9e36b3b20d
|
Fixed the arguments in the performance graphs to reflect the changes in enum values
|
2016-05-15 14:31:37 +02:00 |
|
cnugteren
|
9065b34684
|
Added support for staggered/shuffled offsets for GEMM to improve performance for large power-of-2 kernels on AMD GPUs
|
2016-05-15 14:04:34 +02:00 |
|
Cedric Nugteren
|
5e1b2e021f
|
Set kernel arguments for AXPY as constant memory buffers, making it possible to transfer half-precision values as well
|
2016-05-14 18:06:00 +02:00 |
|
Cedric Nugteren
|
120c31a30f
|
Initial experimental version of the half-precision HAXPY routine
|
2016-05-13 20:49:34 +02:00 |
|
Cedric Nugteren
|
f2ba75890c
|
Initial changes in preparation for half-precision fp16 support
|
2016-05-12 19:56:21 +02:00 |
|
Cedric Nugteren
|
1c72d225c5
|
Fixed links in the README
|
2016-05-10 21:03:51 +02:00 |
|
Cedric Nugteren
|
0dacd04bcd
|
Prepared the changelog for the next release
|
2016-05-08 21:30:04 +02:00 |
|
Cedric Nugteren
|
d91356a6b7
|
Merge pull request #58 from CNugteren/development
Update to version 0.7.0
|
2016-05-08 21:25:50 +02:00 |
|
CNugteren
|
942912daeb
|
Fixes for compilation of the tests under Visual Studio 2015
|
2016-05-08 21:11:37 +02:00 |
|
Cedric Nugteren
|
c5730c8b43
|
Updated to version 0.7.0
|
2016-05-08 20:29:41 +02:00 |
|
cnugteren
|
3b81ee2c08
|
Fixed an issue where the xAMAX tester would incorrectly report failures when testing against CBLAS
|
2016-05-08 18:28:01 +02:00 |
|
cnugteren
|
eaf1de5745
|
Fixed an issue where the xNRM2 and xASUM testers would incorrectly report failures for complex inputs
|
2016-05-08 18:07:55 +02:00 |
|
cnugteren
|
25a25dbd6f
|
Fixed errors in xAXPY and xSCAL tests on AMD hardware
|
2016-05-08 17:30:31 +02:00 |
|
cnugteren
|
1acb31896c
|
Fixed an issue with computing the GFLOPS numbers for the xGEMM performance tests for non-square matrices
|
2016-05-08 10:06:06 +02:00 |
|
Cedric Nugteren
|
ed2904a344
|
Added preliminary generated API documentation
|
2016-05-08 09:49:00 +02:00 |
|
Cedric Nugteren
|
6c9e08c5e2
|
Added an option to the tests to control whether to test against clBLAS or a CPU BLAS library
|
2016-05-07 12:22:06 +02:00 |
|