Commit graph

91 commits

Author SHA1 Message Date
cnugteren 1a82861a90 Added support for testing (performance and correctness) against a CPU BLAS library 2016-04-02 11:58:00 -07:00
cnugteren 5c83217cf2 Added a wrapper for CBLAS libraries for performance/correctness testing 2016-04-01 22:36:39 -07:00
cnugteren 8c3c6db7d0 Merge branch 'level1_routines' into development 2016-03-30 21:37:56 -07:00
Cedric Nugteren c1df786764 Added prototypes for the xROTM and xROTMG routines 2016-03-30 16:13:37 -07:00
Cedric Nugteren 6ecc0d089c Added prototypes for the xROT and xROTG functions 2016-03-30 16:13:32 -07:00
Cedric Nugteren 6e5f558746 Made event an optional argument in the CLBlast C++ API 2016-03-30 16:13:26 -07:00
Cedric Nugteren 6f561abada Added missing newline to the end of the public API file 2016-03-30 16:13:22 -07:00
Cedric Nugteren 2429ad5025 Fixed properly passing of OpenCL events to CLBlast functions 2016-03-30 16:12:53 -07:00
Cedric Nugteren aaa687ca98 Added preliminary support for the xNRM2 routines 2016-03-28 23:00:44 +02:00
Cedric Nugteren 1d5a702d9d Added prototypes for ScNRM2/DzNRM2 routines 2016-03-25 10:30:38 +01:00
Cedric Nugteren 3876096c30 Added prototypes for SNRM2/DNRM2 routines 2016-03-25 10:00:40 +01:00
Cedric Nugteren 49822c8ead Fixed the C-api export to be able to properly build a DLL on Windows 2016-03-23 20:49:28 +01:00
Cedric Nugteren d935695417 Added __declspec(dllexport) to create a DLL on Windows 2016-03-19 11:09:09 +01:00
Cedric Nugteren 918797735d Made the library thread-safe by guarding the kernel cache with a mutex 2016-03-14 22:55:22 +01:00
Cedric Nugteren 88c551cdea Added tuning results for the newest xGER family kernels 2016-03-12 16:23:58 +01:00
Cedric Nugteren 83c6a51765 Added tuning results for the ARM Mali-T628 GPU 2016-03-12 15:10:35 +01:00
Cedric Nugteren 306bf67660 Added preliminary support for xHPR2 and xSPR2 routines 2016-03-06 15:48:11 +01:00
Cedric Nugteren 60da54da5d Added preliminary support for xHER2 and xSYR2 routines 2016-03-02 21:18:01 +01:00
Cedric Nugteren fa79720557 Added tuning results for Intel Iris Pro and AMD R9 M370X 2016-02-28 16:47:52 +01:00
Cedric Nugteren e3545215a5 Added support for xHER, xHPR, xSYR, and xSPR routines 2016-02-28 14:16:48 +01:00
Cedric Nugteren cef78c7356 Fixed a compilation issue under AppleClang 2016-02-28 14:14:50 +01:00
Cedric Nugteren 9f682aa66b Set a proper default precision for the CLBlast clients 2016-02-20 14:41:53 +01:00
Cedric Nugteren 6dc44da07b Added support for xGERU and xGERC routines 2016-02-20 14:15:41 +01:00
Cedric Nugteren 8854a73127 Added XGER routine, kernel, and tuner 2016-02-20 12:40:01 +01:00
Cedric Nugteren 6f4b34f813 Added tuning parameters for various devices using the new database script 2016-02-07 16:41:09 +01:00
Cedric Nugteren 00be6f7530 Added dictionary with short and long OpenCL vendor names to fix issues with Intel having multiple names 2016-02-07 11:59:30 +01:00
CNugteren fbf071ba62 Fixed a linker error in the performance client under GCC 2016-02-06 10:53:44 +01:00
Cedric Nugteren 310d05d187 Updated to version 4.0 of the CLCudaAPI header 2016-01-30 11:52:21 +01:00
Cedric Nugteren 276e772a2c Added first auto-generated database headers from the Python database; only K40 and Iris supported now 2016-01-30 11:43:21 +01:00
CNugteren 9bf6be8426 Added alpha and beta to tuner meta-data 2015-10-23 11:01:44 +02:00
CNugteren f74c9a5640 Routine names are now all default arguments defined in the header 2015-10-12 08:35:58 +02:00
CNugteren 2b56c2c603 Added TRMV/TBMV/TPMV routines 2015-09-26 16:58:03 +02:00
CNugteren 04d28b0420 Made buffer copying a const-method for the source 2015-09-26 16:48:11 +02:00
CNugteren de6547a92b Added SBMV and SPMV routines 2015-09-19 18:01:19 +02:00
CNugteren 80da67d28b Added the HPMV routine 2015-09-19 17:40:38 +02:00
CNugteren c32c4a9739 Added infrastructure for packed matrices 2015-09-19 17:37:42 +02:00
CNugteren aebd156869 Added the HBMV routine 2015-09-19 11:11:34 +02:00
CNugteren 93dddda63e Improved the organization and performance of level 2 routines 2015-09-18 17:46:41 +02:00
CNugteren 4507ba4997 Added first version of banded matrix-vector multiplication 2015-09-18 15:25:20 +02:00
CNugteren 6105ad6f5b Added interface of all level 2 routines 2015-09-17 17:05:45 +02:00
CNugteren 6307d2e5db Added script to generate API interface and implementation automatically 2015-09-17 10:14:33 +02:00
CNugteren a2e726d3bd Added xDOT/xDOTU/xDOTC dot-product routines 2015-09-14 16:57:00 +02:00
CNugteren 2a383f3450 Added extra temporary buffer to tuners in preparation of Xdot routines 2015-09-14 15:53:34 +02:00
CNugteren e0c5312abb Added support for the dot buffer and offset argument 2015-09-14 12:28:50 +02:00
CNugteren ff0c54c386 Added the XSWAP, XSCAL and XCOPY level-1 routines 2015-08-22 17:11:20 +02:00
Cedric Nugteren cf168fca70 Merge pull request #23 from CNugteren/tuner_database: Added initial version of a tuner-database 2015-08-20 08:38:18 +02:00
CNugteren 798a3b6101 Add check for supported precision to the tuners 2015-08-19 19:35:08 +02:00
CNugteren b46de22433 Moved precision tester to utilities 2015-08-19 19:34:29 +02:00
CNugteren 8a02db0746 Added precision to the JSON output 2015-08-19 11:12:42 +02:00
CNugteren 603e389545 Added all supported routines to the C API 2015-08-13 17:58:46 +02:00