306 commits

Author SHA1 Message Date
Cedric Nugteren 7ad5cc89d0 Made MSVC link the run-time libraries statically 2016-05-17 23:12:19 +02:00
Cedric Nugteren c240774bad Fixed warning CMP0054 2016-05-17 22:55:11 +02:00
cnugteren 7f5cfd92ba Updated the performance graph for the Radeon M370X AMD GPU 2016-05-15 17:31:19 +02:00
cnugteren fd107c9b12 Added new tuning results for SGEMM and updated the performance graph for the Radeon M370X AMD GPU 2016-05-15 17:28:22 +02:00
cnugteren 802c1f48c7 Removed comparison to CBLAS for the graph scripts 2016-05-15 17:06:36 +02:00
cnugteren 716d7c67d9 Fixed a bug in the xGEMM routine related to the event incorrectly set 2016-05-15 16:10:56 +02:00
cnugteren 9e36b3b20d Fixed the arguments in the performance graphs to reflect the changes in enum values 2016-05-15 14:31:37 +02:00
cnugteren 9065b34684 Added support for staggered/shuffled offsets for GEMM to improve performance for large power-of-2 kernels on AMD GPUs 2016-05-15 14:04:34 +02:00
Cedric Nugteren 1c72d225c5 Fixed links in the README 2016-05-10 21:03:51 +02:00
Cedric Nugteren 0dacd04bcd Prepared the changelog for the next release 2016-05-08 21:30:04 +02:00
CNugteren 942912daeb Fixes for compilation of the tests under Visual Studio 2015 2016-05-08 21:11:37 +02:00
Cedric Nugteren c5730c8b43 Updated to version 0.7.0 2016-05-08 20:29:41 +02:00
cnugteren 3b81ee2c08 Fixed an issue where the xAMAX tester would incorrectly report failures when testing against CBLAS 2016-05-08 18:28:01 +02:00
cnugteren eaf1de5745 Fixed an issue where the xNRM2 and xASUM testers would incorrectly report failures for complex inputs 2016-05-08 18:07:55 +02:00
cnugteren 25a25dbd6f Fixed errors in xAXPY and xSCAL tests on AMD hardware 2016-05-08 17:30:31 +02:00
cnugteren 1acb31896c Fixed an issue with computing the GFLOPS numbers for the xGEMM performance tests for non-square matrices 2016-05-08 10:06:06 +02:00
Cedric Nugteren ed2904a344 Added preliminary generated API documentation 2016-05-08 09:49:00 +02:00
Cedric Nugteren 6c9e08c5e2 Added an option to the tests to control whether to test against clBLAS or a CPU BLAS library 2016-05-07 12:22:06 +02:00
Cedric Nugteren 56aa1701c9 Added printing of indices when testing in verbose mode 2016-05-05 23:09:57 +02:00
Cedric Nugteren f18c12389d Merge pull request #57 from dividiti/development: Locate the C BLAS library before the F77 one. 2016-05-05 22:27:22 +02:00
Anton Lokhmotov e075dc347a Locate the C BLAS library before the F77 one. 2016-05-05 14:38:10 +00:00
Cedric Nugteren aa97c836b1 Fixed an issue with linking against the ATLAS BLAS library 2016-05-04 19:16:09 +02:00
Cedric Nugteren 435729a43e Added tuning results for AMD Hawaii (R9 290X) 2016-05-02 20:20:23 +02:00
Cedric Nugteren a8f109296c Fixed the calculation of the required buffer sizes in case of subvectors and submatrices 2016-05-02 20:04:55 +02:00
Cedric Nugteren 27d0ac7f38 Added tuning results for AMD Pitcairn (R9 270X) 2016-05-01 19:33:50 +02:00
Cedric Nugteren c94b628318 Updated tuning database for reduction/dot kernels based on the new tuner; partially repopulated the database 2016-05-01 19:17:04 +02:00
Cedric Nugteren b9317d7d0c Made the default xDOT tuning size smaller 2016-05-01 14:39:44 +02:00
Cedric Nugteren bee2f943ec Changed the index buffer of IxAMAX routines to unsigned int for proper buffersize checking 2016-05-01 14:03:37 +02:00
Cedric Nugteren 9602c150aa Added a program cache (per-context) next to the per-device binary cache 2016-05-01 12:56:08 +02:00
Cedric Nugteren e113ff0852 Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX 2016-04-30 09:49:39 +02:00
Cedric Nugteren 2952390f27 Added an example to demonstrate the use of the ClearCache and FillCache functions 2016-04-29 23:33:36 +02:00
Cedric Nugteren 877aad693f Added FillCache: a function to pre-compile all kernels for a specific device 2016-04-29 23:33:12 +02:00
Cedric Nugteren 4f528b1730 Added sample C programs for the SASUM and DGEMV routines 2016-04-29 20:33:19 +02:00
Cedric Nugteren d9b21d7f49 Fixed the cache to store binaries instead of OpenCL programs 2016-04-28 21:14:17 +02:00
Cedric Nugteren d7ddbdeb1f Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX 2016-04-27 18:07:30 +02:00
Cedric Nugteren 13eed1a0f9 Added missing namespace to the SGEMM example 2016-04-27 17:59:28 +02:00
Cedric Nugteren 8075934ca7 Added prototypes for non-BLAS routines: xSUM and IxMAX (non-absolute counterparts of xASUM and IxAMAX) 2016-04-27 17:06:19 +02:00
Cedric Nugteren 82be8f211c Moved all cache-related functions to a separate file; added a ClearCompiledProgramCache function to clear the cache 2016-04-27 16:02:13 +02:00
Cedric Nugteren 44bdb60e83 Relaxed the absolute error margin for floating-point value comparisons to 1e-4 2016-04-27 14:42:30 +02:00
Cedric Nugteren 226e834d0a Added a '-verbose' option to the test binaries to report errors in more detail if needed 2016-04-27 14:38:30 +02:00
Cedric Nugteren 3555cd0436 All CLBlast enum constants now have the same raw values as in the cblas standard 2016-04-27 11:37:55 +02:00
cnugteren c8e28a33c0 Merge branch 'level1_routines' into development 2016-04-20 22:14:55 -06:00
cnugteren 16a048f1ac Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines 2016-04-20 22:12:51 -06:00
cnugteren 894983fc3c Added prototype for ixAMAX routines 2016-04-20 21:11:33 -06:00
cnugteren 5a4f8217be Updated the reduction-kernel tuner to also tune the epilogue 2016-04-14 21:37:52 -06:00
cnugteren 8be99de82d Added support for the SASUM/DASUM/ScASUM/DzASUM routines 2016-04-14 19:58:26 -06:00
cnugteren e0497807e2 Added prototype for xASUM routines 2016-04-13 21:44:49 -06:00
cnugteren a61724ece5 Fixed the way the defaults are calculated in the database; added warning for non-matching tuner arguments 2016-04-11 22:27:44 -06:00
cnugteren 1d3d38a261 Events are now properly implemented using event waiting list and asking the user to wait for event completion 2016-04-09 22:22:24 -06:00
cnugteren c2cfee76c4 Properly set warning flags for Clang 2016-04-04 08:39:13 -07:00