Commit graph

726 commits

Author SHA1 Message Date
Cedric Nugteren f70ded34f3 Added half-precision support for all level 1 routines 2016-05-22 14:26:19 +02:00
Cedric Nugteren 489c5d76cf Merged in latest changes from 0.7.1 release 2016-05-18 21:32:56 +02:00
Cedric Nugteren 182d2cffa1 Prepared the changelog for the next release 2016-05-18 21:26:20 +02:00
Cedric Nugteren 181eb20bbf Merge pull request #60 from CNugteren/development: Update to version 0.7.1 2016-05-18 21:18:07 +02:00
Cedric Nugteren 9a061528eb Updated to version 0.7.1 2016-05-18 21:13:04 +02:00
CNugteren 748df9bf75 Fixes for Visual Studio 2016-05-18 20:53:40 +02:00
Cedric Nugteren 9bccc2544a Fixes for CMake policy CMP0054 2016-05-18 20:36:07 +02:00
Cedric Nugteren 7ad5cc89d0 Made MSVC link the run-time libraries statically 2016-05-17 23:12:19 +02:00
Cedric Nugteren c240774bad Fixed warning CMP0054 2016-05-17 22:55:11 +02:00
Cedric Nugteren 7a3b695db7 Added half precision tuning results for supporting kernels (pad, copy, transpose, padtranspose) 2016-05-16 12:45:10 +02:00
Cedric Nugteren af2ac62212 Prepared GEMM and supporting kernels and tuners for half-precision support 2016-05-16 12:37:24 +02:00
Cedric Nugteren 591e343ec9 Added an example of using the half-precision HAXPY routine 2016-05-15 20:18:34 +02:00
Cedric Nugteren 4b6bdd83a2 Added header with conversions from and to half-precision floating-point 2016-05-15 20:13:57 +02:00
cnugteren 7f5cfd92ba Updated the performance graph for the Radeon M370X AMD GPU 2016-05-15 17:31:19 +02:00
cnugteren fd107c9b12 Added new tuning results for SGEMM and updated the performance graph for the Radeon M370X AMD GPU 2016-05-15 17:28:22 +02:00
cnugteren 802c1f48c7 Removed comparison to CBLAS for the graph scripts 2016-05-15 17:06:36 +02:00
cnugteren 716d7c67d9 Fixed a bug in the xGEMM routine related to the event incorrectly set 2016-05-15 16:10:56 +02:00
cnugteren 9e36b3b20d Fixed the arguments in the performance graphs to reflect the changes in enum values 2016-05-15 14:31:37 +02:00
cnugteren 9065b34684 Added support for staggered/shuffled offsets for GEMM to improve performance for large power-of-2 kernels on AMD GPUs 2016-05-15 14:04:34 +02:00
Cedric Nugteren 5e1b2e021f Set kernel arguments for AXPY as constant memory buffers, making it possible to transfer half-precision values as well 2016-05-14 18:06:00 +02:00
Cedric Nugteren 120c31a30f Initial experimental version of the half-precision HAXPY routine 2016-05-13 20:49:34 +02:00
Cedric Nugteren f2ba75890c Initial changes in preparation for half-precision fp16 support 2016-05-12 19:56:21 +02:00
Cedric Nugteren 1c72d225c5 Fixed links in the README 2016-05-10 21:03:51 +02:00
Cedric Nugteren 0dacd04bcd Prepared the changelog for the next release 2016-05-08 21:30:04 +02:00
Cedric Nugteren d91356a6b7 Merge pull request #58 from CNugteren/development: Update to version 0.7.0 2016-05-08 21:25:50 +02:00
CNugteren 942912daeb Fixes for compilation of the tests under Visual Studio 2015 2016-05-08 21:11:37 +02:00
Cedric Nugteren c5730c8b43 Updated to version 0.7.0 2016-05-08 20:29:41 +02:00
cnugteren 3b81ee2c08 Fixed an issue where the xAMAX tester would incorrectly report failures when testing against CBLAS 2016-05-08 18:28:01 +02:00
cnugteren eaf1de5745 Fixed an issue where the xNRM2 and xASUM testers would incorrectly report failures for complex inputs 2016-05-08 18:07:55 +02:00
cnugteren 25a25dbd6f Fixed errors in xAXPY and xSCAL tests on AMD hardware 2016-05-08 17:30:31 +02:00
cnugteren 1acb31896c Fixed an issue with computing the GFLOPS numbers for the xGEMM performance tests for non-square matrices 2016-05-08 10:06:06 +02:00
Cedric Nugteren ed2904a344 Added preliminary generated API documentation 2016-05-08 09:49:00 +02:00
Cedric Nugteren 6c9e08c5e2 Added an option to the tests to control whether to test against clBLAS or a CPU BLAS library 2016-05-07 12:22:06 +02:00
Cedric Nugteren 56aa1701c9 Added printing of indices when testing in verbose mode 2016-05-05 23:09:57 +02:00
Cedric Nugteren f18c12389d Merge pull request #57 from dividiti/development: Locate the C BLAS library before the F77 one. 2016-05-05 22:27:22 +02:00
Anton Lokhmotov e075dc347a Locate the C BLAS library before the F77 one. 2016-05-05 14:38:10 +00:00
Cedric Nugteren aa97c836b1 Fixed an issue with linking against the ATLAS BLAS library 2016-05-04 19:16:09 +02:00
Cedric Nugteren 435729a43e Added tuning results for AMD Hawaii (R9 290X) 2016-05-02 20:20:23 +02:00
Cedric Nugteren a8f109296c Fixed the calculation of the required buffer sizes in case of subvectors and submatrices 2016-05-02 20:04:55 +02:00
Cedric Nugteren 27d0ac7f38 Added tuning results for AMD Pitcairn (R9 270X) 2016-05-01 19:33:50 +02:00
Cedric Nugteren c94b628318 Updated tuning database for reduction/dot kernels based on the new tuner; partially repopulated the database 2016-05-01 19:17:04 +02:00
Cedric Nugteren b9317d7d0c Made the default xDOT tuning size smaller 2016-05-01 14:39:44 +02:00
Cedric Nugteren bee2f943ec Changed the index buffer of IxAMAX routines to unsigned int for proper buffersize checking 2016-05-01 14:03:37 +02:00
Cedric Nugteren 9602c150aa Added a program cache (per-context) next to the per-device binary cache 2016-05-01 12:56:08 +02:00
Cedric Nugteren e113ff0852 Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX 2016-04-30 09:49:39 +02:00
Cedric Nugteren 2952390f27 Added an example to demonstrate the use of the ClearCache and FillCache functions 2016-04-29 23:33:36 +02:00
Cedric Nugteren 877aad693f Added FillCache: a function to pre-compile all kernels for a specific device 2016-04-29 23:33:12 +02:00
Cedric Nugteren 4f528b1730 Added sample C programs for the SASUM and DGEMV routines 2016-04-29 20:33:19 +02:00
Cedric Nugteren d9b21d7f49 Fixed the cache to store binaries instead of OpenCL programs 2016-04-28 21:14:17 +02:00
Cedric Nugteren d7ddbdeb1f Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX 2016-04-27 18:07:30 +02:00