Commit graph

105 commits

Author SHA1 Message Date
cnugteren 5c83217cf2 Added a wrapper for CBLAS libraries for performance/correctness testing 2016-04-01 22:36:39 -07:00
cnugteren 8c3c6db7d0 Merge branch 'level1_routines' into development 2016-03-30 21:37:56 -07:00
cnugteren 5409f349a1 Fixed the nrm2 kernel for complex data-types 2016-03-30 21:32:04 -07:00
Cedric Nugteren c1df786764 Added prototypes for the xROTM and xROTMG routines 2016-03-30 16:13:37 -07:00
Cedric Nugteren 6ecc0d089c Added prototypes for the xROT and xROTG functions 2016-03-30 16:13:32 -07:00
Cedric Nugteren 2429ad5025 Fixed properly passing of OpenCL events to CLBlast functions 2016-03-30 16:12:53 -07:00
Cedric Nugteren aaa687ca98 Added preliminary support for the xNRM2 routines 2016-03-28 23:00:44 +02:00
Cedric Nugteren 1d5a702d9d Added prototypes for ScNRM2/DzNRM2 routines 2016-03-25 10:30:38 +01:00
Cedric Nugteren 3876096c30 Added prototypes for SNRM2/DNRM2 routines 2016-03-25 10:00:40 +01:00
Cedric Nugteren 49822c8ead Fixed the C-api export to be able to properly build a DLL on Windows 2016-03-23 20:49:28 +01:00
Cedric Nugteren d935695417 Added __declspec(dllexport) to create a DLL on Windows 2016-03-19 11:09:09 +01:00
Cedric Nugteren 918797735d Made the library thread-safe by guarding the kernel cache with a mutex 2016-03-14 22:55:22 +01:00
Cedric Nugteren f4c09220c1 Fixed a bug in the GER-family of routines due to incorrect division of the workgroup size 2016-03-06 16:43:28 +01:00
Cedric Nugteren 306bf67660 Added preliminary support for xHPR2 and xSPR2 routines 2016-03-06 15:48:11 +01:00
Cedric Nugteren 60da54da5d Added preliminary support for xHER2 and xSYR2 routines 2016-03-02 21:18:01 +01:00
Cedric Nugteren 4a56822dcc Fixed a couple of correctness bugs in the Xher kernels 2016-02-28 15:49:59 +01:00
Cedric Nugteren e3545215a5 Added support for xHER, xHPR, xSYR, and xSPR routines 2016-02-28 14:16:48 +01:00
Cedric Nugteren 9f682aa66b Set a proper default precision for the CLBlast clients 2016-02-20 14:41:53 +01:00
Cedric Nugteren 6dc44da07b Added support for xGERU and xGERC routines 2016-02-20 14:15:41 +01:00
Cedric Nugteren 8854a73127 Added XGER routine, kernel, and tuner 2016-02-20 12:40:01 +01:00
Cedric Nugteren bf84463ab2 Separated the GEMM kernel in two parts to reduce string length for MSVC 2016-02-08 20:06:02 +01:00
Cedric Nugteren 38c56bbde2 Split-up the XGEMV kernel in two parts 2016-02-08 19:43:34 +01:00
Cedric Nugteren 00be6f7530 Added dictionary with short and long OpenCL vendor names to fix issues with Intel having multiple names 2016-02-07 11:59:30 +01:00
CNugteren b7900652b2 Reduced the maximum workgroup-size for GEMV kernels further 2016-02-06 13:07:19 +01:00
CNugteren 40346bb3a5 Reduced unrolling factor in xgemv kernel to reduce compilation times 2016-02-06 12:09:21 +01:00
CNugteren 9622d3be22 Fixes for compilation under Visual Studio 2016-01-30 14:57:49 +01:00
Cedric Nugteren 276e772a2c Added first auto-generated database headers from the Python database; only K40 and Iris supported now 2016-01-30 11:43:21 +01:00
CNugteren c0d469718a Now sets local memory size in xgemv tuner properly 2015-10-28 21:19:59 +01:00
CNugteren 179ad0666d Fixed an arguments-related bug in the GEMV tuner 2015-10-25 16:48:26 +01:00
CNugteren a2d5d7770e Moved the tuner database script to a separate folder 2015-10-25 16:27:14 +01:00
CNugteren 0d4091fdfb Added guards for routine-specific level-3 pad kernels 2015-10-13 08:29:45 +02:00
CNugteren f74c9a5640 Routine names are now all default arguments defined in the header 2015-10-12 08:35:58 +02:00
CNugteren 54a8723f8c Moved level3 kernel files to a subfolder 2015-10-12 08:28:40 +02:00
CNugteren 2b56c2c603 Added TRMV/TBMV/TPMV routines 2015-09-26 16:58:03 +02:00
CNugteren de6547a92b Added SBMV and SPMV routines 2015-09-19 18:01:19 +02:00
CNugteren 80da67d28b Added the HPMV routine 2015-09-19 17:40:38 +02:00
CNugteren c32c4a9739 Added infrastructure for packed matrices 2015-09-19 17:37:42 +02:00
CNugteren aebd156869 Added the HBMV routine 2015-09-19 11:11:34 +02:00
CNugteren 93dddda63e Improved the organization and performance of level 2 routines 2015-09-18 17:46:41 +02:00
CNugteren 4507ba4997 Added first version of banded matrix-vector multiplication 2015-09-18 15:25:20 +02:00
CNugteren 6105ad6f5b Added interface of all level 2 routines 2015-09-17 17:05:45 +02:00
CNugteren 6307d2e5db Added script to generate API interface and implementation automatically 2015-09-17 10:14:33 +02:00
CNugteren a2e726d3bd Added xDOT/xDOTU/xDOTC dot-product routines 2015-09-14 16:57:00 +02:00
CNugteren 2a383f3450 Added extra temporary buffer to tuners in preparation of Xdot routines 2015-09-14 15:53:34 +02:00
CNugteren e0c5312abb Added support for the dot buffer and offset argument 2015-09-14 12:28:50 +02:00
CNugteren ff0c54c386 Added the XSWAP, XSCAL and XCOPY level-1 routines 2015-08-22 17:11:20 +02:00
CNugteren 75517353d5 Re-organized level1 xaxpy kernel 2015-08-22 14:33:48 +02:00
Cedric Nugteren cf168fca70 Merge pull request #23 from CNugteren/tuner_database
Added initial version of a tuner-database
2015-08-20 08:38:18 +02:00
CNugteren 15db2bcc20 Added initial version of tuner-database Python script 2015-08-20 08:30:51 +02:00
CNugteren b46de22433 Moved precision tester to utilities 2015-08-19 19:34:29 +02:00