Cedric Nugteren | 6925003e45 | Added global memory synchronisation for better cache performance on ARM Mali GPUs | 2016-06-08 10:13:37 +02:00
Cedric Nugteren | 6d6b030053 | Made the CPU BLAS library the default reference to test against in favor of clBLAS | 2016-06-08 09:21:39 +02:00
Cedric Nugteren | 7a7873d552 | Fixed the RPATH settings for linking on OS X | 2016-06-06 13:40:52 +02:00
Cedric Nugteren | c1895ea459 | Made the tests for invalid buffer sizes also verbose in verbose mode | 2016-06-06 12:20:42 +02:00
Cedric Nugteren | e561e3fbd5 | Added return values to the test binaries (0: success, 1: failure), allowing them to work properly under CTest | 2016-06-02 16:24:22 +02:00
Cedric Nugteren | 137d1d8708 | Added tuning parameters for 'GRID K520' and 'HD Graphics Skylake ULT GT2' | 2016-06-01 09:39:33 +02:00
Cedric Nugteren | 983df6a8b4 | Made use of CMake's built-in unit testing, allowing all tests to be run using 'make test' | 2016-05-31 20:53:55 +02:00
Cedric Nugteren | f6b2cd9579 | Increased the verbosity of the -verbose option in the correctness tests | 2016-05-30 20:07:09 +02:00
Cedric Nugteren | 305bf16c4c | Separated the performance tests (clients) from the correctness tests in CMake | 2016-05-30 16:38:26 +02:00
Cedric Nugteren | 61105e3810 | Merge branch 'half_precision' into development | 2016-05-30 11:11:28 +02:00
Cedric Nugteren | 03182f9d07 | Added half-precision tests for the clBLAS reference through conversion to single-precision | 2016-05-26 23:36:19 +02:00
Cedric Nugteren | b487d4dd44 | Added half-precision tests for the CBLAS reference through conversion to single-precision | 2016-05-26 13:15:27 +02:00
Cedric Nugteren | 4612ff3552 | Added possibility to run the performance client with half-precision | 2016-05-25 14:37:26 +02:00
Cedric Nugteren | 9f87455070 | Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM | 2016-05-25 13:29:53 +02:00
Cedric Nugteren | ac1575056e | Added proper argument handling and displaying for half-precision data-types | 2016-05-24 14:06:16 +02:00
Cedric Nugteren | ae7d705d6f | Updated README with information on half-precision support | 2016-05-23 19:23:46 +02:00
Cedric Nugteren | 3e9a07f00a | Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2 | 2016-05-22 16:59:14 +02:00
Cedric Nugteren | f0cb3fdc81 | Fixed tuning results for half-precision; added first results for the xGER kernels | 2016-05-22 16:46:05 +02:00
Cedric Nugteren | c8ff3f143f | Prepared the GER kernels and tuner for half-precision support | 2016-05-22 16:18:08 +02:00
Cedric Nugteren | 95b828da12 | Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV | 2016-05-22 15:38:26 +02:00
Cedric Nugteren | b6268d0c22 | Added first tuning results for the half-precision xGEMV kernels | 2016-05-22 15:29:05 +02:00
Cedric Nugteren | 88551b4005 | Prepared the GEMV kernels and tuner for half-precision support | 2016-05-22 15:22:54 +02:00
Cedric Nugteren | 803aaf3070 | Added level-1 half-precision routines HSWAP/HSCAL/HCOPY/HAXPY/HDOT/HNRM2/HASUM/HSUM/iHAMAX/iHMAX/iHMIN | 2016-05-22 14:47:14 +02:00
Cedric Nugteren | 3c9e63c054 | Added first tuning results for the half-precision xDOT kernels | 2016-05-22 14:43:25 +02:00
Cedric Nugteren | f70ded34f3 | Added half-precision support for all level-1 routines | 2016-05-22 14:26:19 +02:00
Cedric Nugteren | 489c5d76cf | Merged in latest changes from 0.7.1 release | 2016-05-18 21:32:56 +02:00
Cedric Nugteren | 182d2cffa1 | Prepared the changelog for the next release | 2016-05-18 21:26:20 +02:00
Cedric Nugteren | 181eb20bbf | Merge pull request #60 from CNugteren/development: Update to version 0.7.1 | 2016-05-18 21:18:07 +02:00
Cedric Nugteren | 9a061528eb | Updated to version 0.7.1 | 2016-05-18 21:13:04 +02:00
CNugteren | 748df9bf75 | Fixes for Visual Studio | 2016-05-18 20:53:40 +02:00
Cedric Nugteren | 9bccc2544a | Fixes for CMake policy CMP0054 | 2016-05-18 20:36:07 +02:00
Cedric Nugteren | 7ad5cc89d0 | Made MSVC link the run-time libraries statically | 2016-05-17 23:12:19 +02:00
Cedric Nugteren | c240774bad | Fixed warning CMP0054 | 2016-05-17 22:55:11 +02:00
Cedric Nugteren | 7a3b695db7 | Added half-precision tuning results for supporting kernels (pad, copy, transpose, padtranspose) | 2016-05-16 12:45:10 +02:00
Cedric Nugteren | af2ac62212 | Prepared GEMM and supporting kernels and tuners for half-precision support | 2016-05-16 12:37:24 +02:00
Cedric Nugteren | 591e343ec9 | Added an example of using the half-precision HAXPY routine | 2016-05-15 20:18:34 +02:00
Cedric Nugteren | 4b6bdd83a2 | Added header with conversions from and to half-precision floating-point | 2016-05-15 20:13:57 +02:00
cnugteren | 7f5cfd92ba | Updated the performance graph for the Radeon M370X AMD GPU | 2016-05-15 17:31:19 +02:00
cnugteren | fd107c9b12 | Added new tuning results for SGEMM and updated the performance graph for the Radeon M370X AMD GPU | 2016-05-15 17:28:22 +02:00
cnugteren | 802c1f48c7 | Removed the comparison to CBLAS from the graph scripts | 2016-05-15 17:06:36 +02:00
cnugteren | 716d7c67d9 | Fixed a bug in the xGEMM routine related to an incorrectly set event | 2016-05-15 16:10:56 +02:00
cnugteren | 9e36b3b20d | Fixed the arguments in the performance graphs to reflect the changes in enum values | 2016-05-15 14:31:37 +02:00
cnugteren | 9065b34684 | Added support for staggered/shuffled offsets for GEMM to improve performance for large power-of-2 kernels on AMD GPUs | 2016-05-15 14:04:34 +02:00
Cedric Nugteren | 5e1b2e021f | Set kernel arguments for AXPY as constant memory buffers, making it possible to transfer half-precision values as well | 2016-05-14 18:06:00 +02:00
Cedric Nugteren | 120c31a30f | Initial experimental version of the half-precision HAXPY routine | 2016-05-13 20:49:34 +02:00
Cedric Nugteren | f2ba75890c | Initial changes in preparation for half-precision fp16 support | 2016-05-12 19:56:21 +02:00
Cedric Nugteren | 1c72d225c5 | Fixed links in the README | 2016-05-10 21:03:51 +02:00
Cedric Nugteren | 0dacd04bcd | Prepared the changelog for the next release | 2016-05-08 21:30:04 +02:00
Cedric Nugteren | d91356a6b7 | Merge pull request #58 from CNugteren/development: Update to version 0.7.0 | 2016-05-08 21:25:50 +02:00
CNugteren | 942912daeb | Fixes for compilation of the tests under Visual Studio 2015 | 2016-05-08 21:11:37 +02:00