Commit graph

75 commits

Author SHA1 Message Date
Ivan Shapovalov a1d80e7402 CMakeLists.txt: use ${clblast_SOURCE_DIR} instead of ${CMAKE_SOURCE_DIR} 2016-07-22 11:15:52 +03:00
Cedric Nugteren 27854070b4 Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen 2016-07-06 21:50:12 +02:00
CNugteren 2d665099ef Fixed a linking issue with the tuners on Visual Studio 2016-07-04 19:46:14 +02:00
Cedric Nugteren b330ab0866 Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library 2016-06-30 10:49:17 +02:00
Cedric Nugteren 577f0ee117 Updated to version 0.8.0 2016-06-28 21:32:00 +02:00
CNugteren 871b576c06 Made it possible to build the clients and tests on Windows using Visual Studio 2016-06-28 16:38:45 +02:00
Cedric Nugteren ca386f9883 Added fp16 to the alltuners target 2016-06-27 11:46:33 +02:00
Cedric Nugteren 61203453aa Renamed all C++ source files to .cpp to match the .hpp extension better 2016-06-19 13:55:49 +02:00
Cedric Nugteren f726fbdc9f Moved all headers into the source tree, changed headers to .hpp extension 2016-06-18 20:20:13 +02:00
Cedric Nugteren bacb5d2bb2 Clean-up of the routine class, moved RunKernel to the routine/common file 2016-06-18 18:16:14 +02:00
Cedric Nugteren 52ccaf5b25 Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing 2016-06-16 18:07:46 +02:00
Cedric Nugteren b894611ad1 Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) and renamed files and functions appropriately 2016-06-14 18:17:58 +02:00
Cedric Nugteren 6d6b030053 Made the CPU BLAS library the default reference to test against in favor of clBLAS 2016-06-08 09:21:39 +02:00
Cedric Nugteren 7a7873d552 Fixed the RPATH settings for linking on OS X 2016-06-06 13:40:52 +02:00
Cedric Nugteren 983df6a8b4 Made use of CMake's built-in unit testing, allowing all tests to be run using 'make test' 2016-05-31 20:53:55 +02:00
Cedric Nugteren 305bf16c4c Separated the performance tests (clients) from the correctness tests in CMake 2016-05-30 16:38:26 +02:00
Cedric Nugteren 489c5d76cf Merged in latest changes from 0.7.1 release 2016-05-18 21:32:56 +02:00
Cedric Nugteren 591e343ec9 Added an example of using the half-precision HAXPY routine 2016-05-15 20:18:34 +02:00
Cedric Nugteren 4b6bdd83a2 Added header with conversions from and to half-precision floating-point 2016-05-15 20:13:57 +02:00
Cedric Nugteren c5730c8b43 Updated to version 0.7.0 2016-05-08 20:29:41 +02:00
Cedric Nugteren 2952390f27 Added an example to demonstrate the use of the ClearCache and FillCache functions 2016-04-29 23:33:36 +02:00
Cedric Nugteren 4f528b1730 Added sample C programs for the SASUM and DGEMV routines 2016-04-29 20:33:19 +02:00
Cedric Nugteren 82be8f211c Moved all cache-related functions to a separate file; added a ClearCompiledProgramCache function to clear the cache 2016-04-27 16:02:13 +02:00
cnugteren 16a048f1ac Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines 2016-04-20 22:12:51 -06:00
cnugteren 8be99de82d Added support for the SASUM/DASUM/ScASUM/DzASUM routines 2016-04-14 19:58:26 -06:00
cnugteren c2cfee76c4 Properly set warning flags for Clang 2016-04-04 08:39:13 -07:00
cnugteren 1a82861a90 Added support for testing (performance and correctness) against a CPU BLAS library 2016-04-02 11:58:00 -07:00
cnugteren a2056f2216 Create a first version of CPU BLAS detection in CMake 2016-03-31 22:22:29 -07:00
cnugteren 8c3c6db7d0 Merge branch 'level1_routines' into development 2016-03-30 21:37:56 -07:00
cnugteren 6578102ae9 CMake now downloads the cl.hpp header from the Khronos website when building the samples 2016-03-30 16:24:38 -07:00
Cedric Nugteren aaa687ca98 Added preliminary support for the xNRM2 routines 2016-03-28 23:00:44 +02:00
Cedric Nugteren 706c6987c6 Fixed compilation of the two SGEMM samples 2016-03-23 20:31:25 +01:00
Cedric Nugteren bf4bd072e2 Updated to version 0.6.0 2016-03-13 11:02:40 +01:00
Cedric Nugteren 306bf67660 Added preliminary support for xHPR2 and xSPR2 routines 2016-03-06 15:48:11 +01:00
Cedric Nugteren 60da54da5d Added preliminary support for xHER2 and xSYR2 routines 2016-03-02 21:18:01 +01:00
Cedric Nugteren e3545215a5 Added support for xHER, xHPR, xSYR, and xSPR routines 2016-02-28 14:16:48 +01:00
Cedric Nugteren 6dc44da07b Added support for xGERU and xGERC routines 2016-02-20 14:15:41 +01:00
Cedric Nugteren 8854a73127 Added XGER routine, kernel, and tuner 2016-02-20 12:40:01 +01:00
Cedric Nugteren bb985f010b Changed the order of tuners in the alltuners target 2016-02-06 12:48:42 +01:00
CNugteren 9622d3be22 Fixes for compilation under Visual Studio 2016-01-30 14:57:49 +01:00
Cedric Nugteren 44fb40e5c4 Prepared for MSVC support 2016-01-30 11:54:29 +01:00
CNugteren 92404035e8 Updated to version 0.5.0 2015-10-17 15:48:13 +02:00
CNugteren 2b56c2c603 Added TRMV/TBMV/TPMV routines 2015-09-26 16:58:03 +02:00
CNugteren de6547a92b Added SBMV and SPMV routines 2015-09-19 18:01:19 +02:00
CNugteren 80da67d28b Added the HPMV routine 2015-09-19 17:40:38 +02:00
CNugteren aebd156869 Added the HBMV routine 2015-09-19 11:11:34 +02:00
CNugteren 4507ba4997 Added first version of banded matrix-vector multiplication 2015-09-18 15:25:20 +02:00
CNugteren a2e726d3bd Added xDOT/xDOTU/xDOTC dot-product routines 2015-09-14 16:57:00 +02:00
CNugteren ff0c54c386 Added the XSWAP, XSCAL and XCOPY level-1 routines 2015-08-22 17:11:20 +02:00
CNugteren 74f601794d Updated to version 0.4.0 2015-08-22 12:41:40 +02:00