Cedric Nugteren
|
f726fbdc9f
|
Moved all headers into the source tree, changed headers to .hpp extension
|
2016-06-18 20:20:13 +02:00 |
|
Cedric Nugteren
|
bacb5d2bb2
|
Clean-up of the routine class, moved RunKernel to the routine/common file
|
2016-06-18 18:16:14 +02:00 |
|
Cedric Nugteren
|
7b4c0e1cf0
|
Removed the template from the Routine base-class
|
2016-06-18 14:56:55 +02:00 |
|
Cedric Nugteren
|
f9947b4d7f
|
Removed the precision argument from the routines in favor of a single templated function
|
2016-06-17 14:30:37 +02:00 |
|
Cedric Nugteren
|
536b7fe4bc
|
Removed the interface to the cache functions from the Routine class, calls them directly now
|
2016-06-17 13:57:50 +02:00 |
|
Cedric Nugteren
|
98a95c89fc
|
Moved the RunKernel and PadCopyTransposeMatrix functions out of the Routine class
|
2016-06-17 12:32:06 +02:00 |
|
Cedric Nugteren
|
520e28e7a7
|
Moved the ErrorIn function from the Routine class to the utilities header
|
2016-06-17 11:41:10 +02:00 |
|
Cedric Nugteren
|
afe8852eaa
|
Moved the test-for-valid-buffers function from the Routine class to separate functions in a separate file
|
2016-06-17 11:29:07 +02:00 |
|
Cedric Nugteren
|
52ccaf5b25
|
Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing
|
2016-06-16 18:07:46 +02:00 |
|
Cedric Nugteren
|
39b7dbc5e3
|
Added some constness to variables related to the GEMM routines
|
2016-06-15 12:34:05 +02:00 |
|
Cedric Nugteren
|
b894611ad1
|
Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) and renamed files and functions appropriately
|
2016-06-14 18:17:58 +02:00 |
|
Cedric Nugteren
|
3e78a99355
|
Moved device vendor and type checks to a common header
|
2016-06-14 14:30:22 +02:00 |
|
Cedric Nugteren
|
6e2017c67d
|
Added support for FP16 on ARM Mali-T628 (officially not supported)
|
2016-06-14 14:29:53 +02:00 |
|
Cedric Nugteren
|
995a528cec
|
Improved API documentation and added documentation for level-2 and level-3 routines
|
2016-06-13 20:17:26 +02:00 |
|
Cedric Nugteren
|
4fb8f9517c
|
Added documentation for the matrix-update level-2 family of routines
|
2016-06-10 11:16:06 +02:00 |
|
Cedric Nugteren
|
6925003e45
|
Added global memory synchronisation for better cache performance on ARM Mali GPUs
|
2016-06-08 10:13:37 +02:00 |
|
Cedric Nugteren
|
6d6b030053
|
Made the CPU BLAS library the default reference to test against in favor of clBLAS
|
2016-06-08 09:21:39 +02:00 |
|
Cedric Nugteren
|
7a7873d552
|
Fixed the RPATH settings for linking on OS X
|
2016-06-06 13:40:52 +02:00 |
|
Cedric Nugteren
|
c1895ea459
|
Made the tests for invalid buffer sizes also verbose in verbose mode
|
2016-06-06 12:20:42 +02:00 |
|
Cedric Nugteren
|
e561e3fbd5
|
Added return value to the test binaries (0: success, 1: failure), allowing it to work under CTest properly
|
2016-06-02 16:24:22 +02:00 |
|
Cedric Nugteren
|
137d1d8708
|
Added tuning parameters for 'GRID K520' and 'HD Graphics Skylake ULT GT2'
|
2016-06-01 09:39:33 +02:00 |
|
Cedric Nugteren
|
983df6a8b4
|
Made use of CMake's built-in unit testing, allowing all tests to be run using 'make test'
|
2016-05-31 20:53:55 +02:00 |
|
Cedric Nugteren
|
f6b2cd9579
|
Increased the verbosity of the -verbose option in the correctness tests
|
2016-05-30 20:07:09 +02:00 |
|
Cedric Nugteren
|
305bf16c4c
|
Separated the performance tests (clients) from the correctness tests in CMake
|
2016-05-30 16:38:26 +02:00 |
|
Cedric Nugteren
|
61105e3810
|
Merge branch 'half_precision' into development
|
2016-05-30 11:11:28 +02:00 |
|
Cedric Nugteren
|
03182f9d07
|
Added half-precision tests for the clBLAS reference through conversion to single-precision
|
2016-05-26 23:36:19 +02:00 |
|
Cedric Nugteren
|
b487d4dd44
|
Added half-precision tests for the CBLAS reference through conversion to single-precision
|
2016-05-26 13:15:27 +02:00 |
|
Cedric Nugteren
|
4612ff3552
|
Added possibility to run the performance client with half-precision
|
2016-05-25 14:37:26 +02:00 |
|
Cedric Nugteren
|
9f87455070
|
Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM
|
2016-05-25 13:29:53 +02:00 |
|
Cedric Nugteren
|
ac1575056e
|
Added proper argument handling and displaying for half-precision data-types
|
2016-05-24 14:06:16 +02:00 |
|
Cedric Nugteren
|
ae7d705d6f
|
Updated README with information on half-precision support
|
2016-05-23 19:23:46 +02:00 |
|
Cedric Nugteren
|
3e9a07f00a
|
Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2
|
2016-05-22 16:59:14 +02:00 |
|
Cedric Nugteren
|
f0cb3fdc81
|
Fixed tuning results for half-precision; added first results for the xGER kernels
|
2016-05-22 16:46:05 +02:00 |
|
Cedric Nugteren
|
c8ff3f143f
|
Prepared the GER kernels and tuner for half-precision support
|
2016-05-22 16:18:08 +02:00 |
|
Cedric Nugteren
|
95b828da12
|
Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV
|
2016-05-22 15:38:26 +02:00 |
|
Cedric Nugteren
|
b6268d0c22
|
Added first tuning results for the half-precision xGEMV kernels
|
2016-05-22 15:29:05 +02:00 |
|
Cedric Nugteren
|
88551b4005
|
Prepared the GEMV kernels and tuner for half-precision support
|
2016-05-22 15:22:54 +02:00 |
|
Cedric Nugteren
|
803aaf3070
|
Added level-1 half-precision routines HSWAP/HSCAL/HCOPY/HAXPY/HDOT/HNRM2/HASUM/HSUM/iHAMAX/iHMAX/iHMIN
|
2016-05-22 14:47:14 +02:00 |
|
Cedric Nugteren
|
3c9e63c054
|
Added first tuning results for the half-precision xDOT kernels
|
2016-05-22 14:43:25 +02:00 |
|
Cedric Nugteren
|
f70ded34f3
|
Added half-precision support for all level 1 routines
|
2016-05-22 14:26:19 +02:00 |
|
Cedric Nugteren
|
489c5d76cf
|
Merged in latest changes from 0.7.1 release
|
2016-05-18 21:32:56 +02:00 |
|
Cedric Nugteren
|
182d2cffa1
|
Prepared the changelog for the next release
|
2016-05-18 21:26:20 +02:00 |
|
Cedric Nugteren
|
181eb20bbf
|
Merge pull request #60 from CNugteren/development
Update to version 0.7.1
|
2016-05-18 21:18:07 +02:00 |
|
Cedric Nugteren
|
9a061528eb
|
Updated to version 0.7.1
|
2016-05-18 21:13:04 +02:00 |
|
CNugteren
|
748df9bf75
|
Fixes for Visual Studio
|
2016-05-18 20:53:40 +02:00 |
|
Cedric Nugteren
|
9bccc2544a
|
Fixes for CMake policy CMP0054
|
2016-05-18 20:36:07 +02:00 |
|
Cedric Nugteren
|
7ad5cc89d0
|
Made MSVC link the run-time libraries statically
|
2016-05-17 23:12:19 +02:00 |
|
Cedric Nugteren
|
c240774bad
|
Fixed warning CMP0054
|
2016-05-17 22:55:11 +02:00 |
|
Cedric Nugteren
|
7a3b695db7
|
Added half precision tuning results for supporting kernels (pad, copy, transpose, padtranspose)
|
2016-05-16 12:45:10 +02:00 |
|
Cedric Nugteren
|
af2ac62212
|
Prepared GEMM and supporting kernels and tuners for half-precision support
|
2016-05-16 12:37:24 +02:00 |
|