Cedric Nugteren | 181eb20bbf | Merge pull request #60 from CNugteren/development: Update to version 0.7.1 | 2016-05-18 21:18:07 +02:00
Cedric Nugteren | 9a061528eb | Updated to version 0.7.1 | 2016-05-18 21:13:04 +02:00
CNugteren | 748df9bf75 | Fixes for Visual Studio | 2016-05-18 20:53:40 +02:00
Cedric Nugteren | 9bccc2544a | Fixes for CMake policy CMP0054 | 2016-05-18 20:36:07 +02:00
Cedric Nugteren | 7ad5cc89d0 | Made MSVC link the run-time libraries statically | 2016-05-17 23:12:19 +02:00
Cedric Nugteren | c240774bad | Fixed warning CMP0054 | 2016-05-17 22:55:11 +02:00
Cedric Nugteren | 7a3b695db7 | Added half precision tuning results for supporting kernels (pad, copy, transpose, padtranspose) | 2016-05-16 12:45:10 +02:00
Cedric Nugteren | af2ac62212 | Prepared GEMM and supporting kernels and tuners for half-precision support | 2016-05-16 12:37:24 +02:00
Cedric Nugteren | 591e343ec9 | Added an example of using the half-precision HAXPY routine | 2016-05-15 20:18:34 +02:00
Cedric Nugteren | 4b6bdd83a2 | Added header with conversions from and to half-precision floating-point | 2016-05-15 20:13:57 +02:00
cnugteren | 7f5cfd92ba | Updated the performance graph for the Radeon M370X AMD GPU | 2016-05-15 17:31:19 +02:00
cnugteren | fd107c9b12 | Added new tuning results for SGEMM and updated the performance graph for the Radeon M370X AMD GPU | 2016-05-15 17:28:22 +02:00
cnugteren | 802c1f48c7 | Removed comparison to CBLAS for the graph scripts | 2016-05-15 17:06:36 +02:00
cnugteren | 716d7c67d9 | Fixed a bug in the xGEMM routine related to the event incorrectly set | 2016-05-15 16:10:56 +02:00
cnugteren | 9e36b3b20d | Fixed the arguments in the performance graphs to reflect the changes in enum values | 2016-05-15 14:31:37 +02:00
cnugteren | 9065b34684 | Added support for staggered/shuffled offsets for GEMM to improve performance for large power-of-2 kernels on AMD GPUs | 2016-05-15 14:04:34 +02:00
Cedric Nugteren | 5e1b2e021f | Set kernel arguments for AXPY as constant memory buffers, making it possible to transfer half-precision values as well | 2016-05-14 18:06:00 +02:00
Cedric Nugteren | 120c31a30f | Initial experimental version of the half-precision HAXPY routine | 2016-05-13 20:49:34 +02:00
Cedric Nugteren | f2ba75890c | Initial changes in preparation for half-precision fp16 support | 2016-05-12 19:56:21 +02:00
Cedric Nugteren | 1c72d225c5 | Fixed links in the README | 2016-05-10 21:03:51 +02:00
Cedric Nugteren | 0dacd04bcd | Prepared the changelog for the next release | 2016-05-08 21:30:04 +02:00
Cedric Nugteren | d91356a6b7 | Merge pull request #58 from CNugteren/development: Update to version 0.7.0 | 2016-05-08 21:25:50 +02:00
CNugteren | 942912daeb | Fixes for compilation of the tests under Visual Studio 2015 | 2016-05-08 21:11:37 +02:00
Cedric Nugteren | c5730c8b43 | Updated to version 0.7.0 | 2016-05-08 20:29:41 +02:00
cnugteren | 3b81ee2c08 | Fixed an issue where the xAMAX tester would incorrectly report failures when testing against CBLAS | 2016-05-08 18:28:01 +02:00
cnugteren | eaf1de5745 | Fixed an issue where the xNRM2 and xASUM testers would incorrectly report failures for complex inputs | 2016-05-08 18:07:55 +02:00
cnugteren | 25a25dbd6f | Fixed errors in xAXPY and xSCAL tests on AMD hardware | 2016-05-08 17:30:31 +02:00
cnugteren | 1acb31896c | Fixed an issue with computing the GFLOPS numbers for the xGEMM performance tests for non-square matrices | 2016-05-08 10:06:06 +02:00
Cedric Nugteren | ed2904a344 | Added preliminary generated API documentation | 2016-05-08 09:49:00 +02:00
Cedric Nugteren | 6c9e08c5e2 | Added an option to the tests to control whether to test against clBLAS or a CPU BLAS library | 2016-05-07 12:22:06 +02:00
Cedric Nugteren | 56aa1701c9 | Added printing of indices when testing in verbose mode | 2016-05-05 23:09:57 +02:00
Cedric Nugteren | f18c12389d | Merge pull request #57 from dividiti/development: Locate the C BLAS library before the F77 one. | 2016-05-05 22:27:22 +02:00
Anton Lokhmotov | e075dc347a | Locate the C BLAS library before the F77 one. | 2016-05-05 14:38:10 +00:00
Cedric Nugteren | aa97c836b1 | Fixed an issue with linking against the ATLAS BLAS library | 2016-05-04 19:16:09 +02:00
Cedric Nugteren | 435729a43e | Added tuning results for AMD Hawaii (R9 290X) | 2016-05-02 20:20:23 +02:00
Cedric Nugteren | a8f109296c | Fixed the calculation of the required buffer sizes in case of subvectors and submatrices | 2016-05-02 20:04:55 +02:00
Cedric Nugteren | 27d0ac7f38 | Added tuning results for AMD Pitcairn (R9 270X) | 2016-05-01 19:33:50 +02:00
Cedric Nugteren | c94b628318 | Updated tuning database for reduction/dot kernels based on the new tuner; partially repopulated the database | 2016-05-01 19:17:04 +02:00
Cedric Nugteren | b9317d7d0c | Made the default xDOT tuning size smaller | 2016-05-01 14:39:44 +02:00
Cedric Nugteren | bee2f943ec | Changed the index buffer of IxAMAX routines to unsigned int for proper buffersize checking | 2016-05-01 14:03:37 +02:00
Cedric Nugteren | 9602c150aa | Added a program cache (per-context) next to the per-device binary cache | 2016-05-01 12:56:08 +02:00
Cedric Nugteren | e113ff0852 | Added non-absolute minimum counterpart IxMIN of the BLAS routine IxAMAX | 2016-04-30 09:49:39 +02:00
Cedric Nugteren | 2952390f27 | Added an example to demonstrate the use of the ClearCache and FillCache functions | 2016-04-29 23:33:36 +02:00
Cedric Nugteren | 877aad693f | Added FillCache: a function to pre-compile all kernels for a specific device | 2016-04-29 23:33:12 +02:00
Cedric Nugteren | 4f528b1730 | Added sample C programs for the SASUM and DGEMV routines | 2016-04-29 20:33:19 +02:00
Cedric Nugteren | d9b21d7f49 | Fixed the cache to store binaries instead of OpenCL programs | 2016-04-28 21:14:17 +02:00
Cedric Nugteren | d7ddbdeb1f | Added non-absolute counterparts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX | 2016-04-27 18:07:30 +02:00
Cedric Nugteren | 13eed1a0f9 | Added missing namespace to the SGEMM example | 2016-04-27 17:59:28 +02:00
Cedric Nugteren | 8075934ca7 | Added prototypes for non-BLAS routines: xSUM and IxMAX (non-absolute counterparts of xASUM and IxAMAX) | 2016-04-27 17:06:19 +02:00
Cedric Nugteren | 82be8f211c | Moved all cache-related functions to a separate file; added a ClearCompiledProgramCache function to clear the cache | 2016-04-27 16:02:13 +02:00