Commit graph

272 commits

Author SHA1 Message Date
Cedric Nugteren a8f109296c Fixed the calculation of the required buffer sizes in case of subvectors and submatrices 2016-05-02 20:04:55 +02:00
Cedric Nugteren b9317d7d0c Made the default xDOT tuning size smaller 2016-05-01 14:39:44 +02:00
Cedric Nugteren bee2f943ec Changed the index buffer of IxAMAX routines to unsigned int for proper buffersize checking 2016-05-01 14:03:37 +02:00
Cedric Nugteren 9602c150aa Added a program cache (per-context) next to the per-device binary cache 2016-05-01 12:56:08 +02:00
Cedric Nugteren e113ff0852 Added non-aboslute minimum counter-part IxMIN of the BLAS routine IxAMAX 2016-04-30 09:49:39 +02:00
Cedric Nugteren 877aad693f Added FillCache: a function to pre-compile all kernels for a specific device 2016-04-29 23:33:12 +02:00
Cedric Nugteren d9b21d7f49 Fixed the cache to store binaries instead of OpenCL programs 2016-04-28 21:14:17 +02:00
Cedric Nugteren d7ddbdeb1f Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX 2016-04-27 18:07:30 +02:00
Cedric Nugteren 8075934ca7 Added prototypes for non-BLAS routines: xSUM and IxMAX (non-absolute counterparts of xASUM and IxAMAX) 2016-04-27 17:06:19 +02:00
Cedric Nugteren 82be8f211c Moved all cache-related functions to a separate file; added a ClearCompiledProgramCache function to clear the cache 2016-04-27 16:02:13 +02:00
cnugteren 16a048f1ac Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines 2016-04-20 22:12:51 -06:00
cnugteren 894983fc3c Added prototype for ixAMAX routines 2016-04-20 21:11:33 -06:00
cnugteren 5a4f8217be Updated the reduction-kernel tuner to also tune the epilogue 2016-04-14 21:37:52 -06:00
cnugteren 8be99de82d Added support for the SASUM/DASUM/ScASUM/DzASUM routines 2016-04-14 19:58:26 -06:00
cnugteren e0497807e2 Added prototype for xASUM routines 2016-04-13 21:44:49 -06:00
cnugteren 1d3d38a261 Events are now properly implemented using event waiting list and asking the user to wait for event completion 2016-04-09 22:22:24 -06:00
cnugteren 90e237b97a Removed redundant queue synchronisation statements 2016-04-04 08:38:31 -07:00
cnugteren 5c83217cf2 Added a wrapper for CBLAS libraries for performance/correctness testing 2016-04-01 22:36:39 -07:00
cnugteren 8c3c6db7d0 Merge branch 'level1_routines' into development 2016-03-30 21:37:56 -07:00
cnugteren 5409f349a1 Fixed the nrm2 kernel for complex data-types 2016-03-30 21:32:04 -07:00
Cedric Nugteren c1df786764 Added prototypes for the xROTM and xROTMG routines 2016-03-30 16:13:37 -07:00
Cedric Nugteren 6ecc0d089c Added prototypes for the xROT and xROTG functions 2016-03-30 16:13:32 -07:00
Cedric Nugteren 2429ad5025 Fixed properly passing of OpenCL events to CLBlast functions 2016-03-30 16:12:53 -07:00
Cedric Nugteren aaa687ca98 Added preliminary support for the xNRM2 routines 2016-03-28 23:00:44 +02:00
Cedric Nugteren 1d5a702d9d Added prototypes for ScNRM2/DzNRM2 routines 2016-03-25 10:30:38 +01:00
Cedric Nugteren 3876096c30 Added prototypes for SNRM2/DNRM2 routines 2016-03-25 10:00:40 +01:00
Cedric Nugteren 49822c8ead Fixed the C-api export to be able to properly build a DLL on Windows 2016-03-23 20:49:28 +01:00
Cedric Nugteren d935695417 Added __declspec(dllexport) to create a DLL on Windows 2016-03-19 11:09:09 +01:00
Cedric Nugteren 918797735d Made the library thread-safe by guarding the kernel cache with a mutex 2016-03-14 22:55:22 +01:00
Cedric Nugteren f4c09220c1 Fixed a bug in the GER-family of routines due to incorrect division of the workgroup size 2016-03-06 16:43:28 +01:00
Cedric Nugteren 306bf67660 Added preliminary support for xHPR2 and xSPR2 routines 2016-03-06 15:48:11 +01:00
Cedric Nugteren 60da54da5d Added preliminary support for xHER2 and xSYR2 routines 2016-03-02 21:18:01 +01:00
Cedric Nugteren 4a56822dcc Fixed a couple of correctness bugs in the Xher kernels 2016-02-28 15:49:59 +01:00
Cedric Nugteren e3545215a5 Added support for xHER, xHPR, xSYR, and xSPR routines 2016-02-28 14:16:48 +01:00
Cedric Nugteren 9f682aa66b Set a proper default precision for the CLBlast clients 2016-02-20 14:41:53 +01:00
Cedric Nugteren 6dc44da07b Added support for xGERU and xGERC routines 2016-02-20 14:15:41 +01:00
Cedric Nugteren 8854a73127 Added XGER routine, kernel, and tuner 2016-02-20 12:40:01 +01:00
Cedric Nugteren bf84463ab2 Separated the GEMM kernel in two parts to reduce string length for MSVC 2016-02-08 20:06:02 +01:00
Cedric Nugteren 38c56bbde2 Split-up the XGEMV kernel in two parts 2016-02-08 19:43:34 +01:00
Cedric Nugteren 00be6f7530 Added dictionary with short and long OpenCL vendor names to fix issues with Intel having multiple names 2016-02-07 11:59:30 +01:00
CNugteren b7900652b2 Reduced the maximum workgroup-size for GEMV kernels further 2016-02-06 13:07:19 +01:00
CNugteren 40346bb3a5 Reduced unrolling factor in xgemv kernel to reduce compilation times 2016-02-06 12:09:21 +01:00
CNugteren 9622d3be22 Fixes for compilation under Visual Studio 2016-01-30 14:57:49 +01:00
Cedric Nugteren 276e772a2c Added first auto-generated database headers from the Python database; only K40 and Iris supported now 2016-01-30 11:43:21 +01:00
CNugteren c0d469718a Now sets local memory size in xgemv tuner properly 2015-10-28 21:19:59 +01:00
CNugteren 179ad0666d Fixed an arguments-related bug in the GEMV tuner 2015-10-25 16:48:26 +01:00
CNugteren a2d5d7770e Moved the tuner database script to a separate folder 2015-10-25 16:27:14 +01:00
CNugteren 0d4091fdfb Added guards for routine-specific level-3 pad kernels 2015-10-13 08:29:45 +02:00
CNugteren f74c9a5640 Routine names are now all default arguments defined in the header 2015-10-12 08:35:58 +02:00
CNugteren 54a8723f8c Moved level3 kernel files to a subfolder 2015-10-12 08:28:40 +02:00
CNugteren 2b56c2c603 Added TRMV/TBMV/TPMV routines 2015-09-26 16:58:03 +02:00
CNugteren de6547a92b Added SBMV and SPMV routines 2015-09-19 18:01:19 +02:00
CNugteren 80da67d28b Added the HPMV routine 2015-09-19 17:40:38 +02:00
CNugteren c32c4a9739 Added infrastructure for packed matrices 2015-09-19 17:37:42 +02:00
CNugteren aebd156869 Added the HBMV routine 2015-09-19 11:11:34 +02:00
CNugteren 93dddda63e Improved the organization and performance of level 2 routines 2015-09-18 17:46:41 +02:00
CNugteren 4507ba4997 Added first version of banded matrix-vector multiplication 2015-09-18 15:25:20 +02:00
CNugteren 6105ad6f5b Added interface of all level 2 routines 2015-09-17 17:05:45 +02:00
CNugteren 6307d2e5db Added script to generate API interface and implementation automatically 2015-09-17 10:14:33 +02:00
CNugteren a2e726d3bd Added xDOT/xDOTU/xDOTC dot-product routines 2015-09-14 16:57:00 +02:00
CNugteren 2a383f3450 Added extra temporary buffer to tuners in preparation of Xdot routines 2015-09-14 15:53:34 +02:00
CNugteren e0c5312abb Added support for the dot buffer and offset argument 2015-09-14 12:28:50 +02:00
CNugteren ff0c54c386 Added the XSWAP, XSCAL and XCOPY level-1 routines 2015-08-22 17:11:20 +02:00
CNugteren 75517353d5 Re-organized level1 xaxpy kernel 2015-08-22 14:33:48 +02:00
Cedric Nugteren cf168fca70 Merge pull request #23 from CNugteren/tuner_database
Added initial version of a tuner-database
2015-08-20 08:38:18 +02:00
CNugteren 15db2bcc20 Added initial version of tuner-database Python script 2015-08-20 08:30:51 +02:00
CNugteren b46de22433 Moved precision tester to utilities 2015-08-19 19:34:29 +02:00
CNugteren cbd25bffea Added hotfix 8eeb7f721f 2015-08-19 11:12:16 +02:00
Cedric Nugteren 4f6e42d052 Merge pull request #21 from CNugteren/c_api
Added a plain C API
2015-08-13 18:02:03 +02:00
CNugteren 603e389545 Added all supported routines to the C API 2015-08-13 17:58:46 +02:00
CNugteren 8eeb7f721f Fixed a complex data-type bug in the transpose kernel 2015-08-13 14:33:42 +02:00
CNugteren 8617195ac5 Added initial version of C API with just one routine 2015-08-13 13:46:13 +02:00
CNugteren dbdb58c600 Refactored the tuners, added JSON output 2015-08-09 15:50:41 +02:00
CNugteren 75b4d92ac3 Added distinguished names for GEMV inherited HEMV/SYMV 2015-08-04 08:15:39 +02:00
CNugteren d1a7cf18ec Abstracted loading of matrix A for GEMV kernel 2015-08-03 07:37:14 +02:00
CNugteren 938ca2707f Added HEMV routine 2015-07-31 17:35:42 +02:00
CNugteren b89517a2e7 Added SYMV routine 2015-07-31 17:13:41 +02:00
CNugteren f7199b831f Now using the new Claduc C++11 OpenCL header 2015-07-27 07:18:06 +02:00
CNugteren 4dcecfe934 Added workgroup shuffle option to transpose kernel for AMD GPUs 2015-07-22 07:31:16 +02:00
CNugteren d93efa3169 Transpose kernel now uses vectorized local memory loads and stores 2015-07-21 08:22:18 +02:00
CNugteren a0f0f6c8ce Triangular GEMM kernels are only compiled when needed 2015-07-19 16:36:12 +02:00
CNugteren 48e2e96f1b Kernel caching is now based on a routine's name 2015-07-19 16:24:14 +02:00
CNugteren 4e499a67c1 The kernel source string is now a routine's member variable 2015-07-19 13:44:37 +02:00
CNugteren 9300261bd4 Fixed a bug when using the Xgemm kernel without local memory 2015-07-16 22:49:55 +02:00
CNugteren 0157d6d4ea Using mad() instruction for AMD devices like clBLAS does 2015-07-16 22:42:02 +02:00
CNugteren b526623fc7 Skips pre/post processing kernels if not needed 2015-07-15 22:12:38 +02:00
CNugteren 0dc85845f7 Updated interface of the PadCopyTransposeMatrix method 2015-07-13 08:41:26 +02:00
CNugteren aa852bbe67 Added subfolders for the level1/2/3 routines 2015-07-12 16:57:09 +02:00
CNugteren b5d39d9d0c Added the HEMM routine, tester, and client 2015-07-12 15:11:50 +02:00
CNugteren 9a929f3fb2 Disabled prototype of TRSM 2015-07-10 21:08:18 +02:00
CNugteren b02876d6e9 Added the HER2K routine, tester, and client 2015-07-10 20:59:20 +02:00
CNugteren 919bba3eaf Added the HERK routine, tester, and client 2015-07-10 07:19:59 +02:00
CNugteren 5578d5ab28 Added option to set the imaginary part of the diagonal to zero 2015-07-08 07:25:18 +02:00
CNugteren 599f9a70a6 Added option to set the imaginary part of the diagonal to zero 2015-07-07 07:34:36 +02:00
CNugteren d9ea0c47c6 Added the TRMM routine, tester, and client 2015-07-02 07:16:04 +02:00
CNugteren d879eb3abf Added a set-to-one function for kernels 2015-07-02 07:11:27 +02:00
CNugteren e3dd35f91b Added the unit/non-unit diagonal enum 2015-07-01 09:39:41 +02:00
CNugteren b8d81a60d6 Fixed typos in SYMM 2015-07-01 09:38:04 +02:00
CNugteren 8574f72d46 Added the TRMM and TRSM interface 2015-06-30 07:36:11 +02:00
CNugteren 7c8d16147a Added the SYR2K routine, tester, and client 2015-06-26 08:12:56 +02:00
CNugteren 57c705dbf2 Clarified comment 2015-06-25 20:38:34 +02:00
CNugteren 60a88aac86 Added the SYRK routine, tester, and client 2015-06-24 07:50:18 +02:00
CNugteren 9fc38cdf5e Added a lower/upper triangular version of the GEMM kernel 2015-06-23 17:58:51 +02:00
CNugteren 20eb3506d6 Added a condition to update only lower/upper triangular parts in the un-pad kernels 2015-06-23 08:09:07 +02:00
CNugteren e3829c1067 Added prototypes of SYRK and SYR2K 2015-06-21 12:44:03 +02:00
CNugteren 3ea3ba2bee Distinguish between a short smoke test and a full test 2015-06-20 13:33:50 +02:00
CNugteren e26742c629 Added additional absolute error checking when testing 2015-06-20 10:58:21 +02:00
CNugteren 682c01a80c Now returns program from database by reference 2015-06-18 18:44:14 +02:00
CNugteren 7e176ccac9 Added support for conjugate transpose in GEMV 2015-06-16 08:42:52 +02:00
CNugteren af78a04eca Updated the tuners to set the conjugate argument 2015-06-16 07:50:45 +02:00
CNugteren e03582a112 Added support for CGEMM/ZGEMM and CSYMM/ZSYMM 2015-06-16 07:45:09 +02:00
CNugteren 8f01c644b5 Added support for complex conjugate transpose 2015-06-16 07:43:19 +02:00
CNugteren 01726197ab Fixed a bug in AXPBY defines for complex data-types 2015-06-15 08:38:24 +02:00
CNugteren 294a3e3d41 Split the three variations of the GEMV kernel for maximal tuning freedom 2015-06-14 11:15:53 +02:00
CNugteren ab0064dab7 Fixed number of threads launched for GEMV 2015-06-14 10:08:56 +02:00
CNugteren 9aa2989447 Fixed number of threads launched for AXPY 2015-06-14 10:08:23 +02:00
CNugteren 4b3e3dcfe0 Added a fast GEMV kernel with vector loads, no tail, and fewer if-statements 2015-06-13 20:46:01 +02:00
CNugteren 6662f5d8e9 Refactored the GEMV kernel 2015-06-13 17:07:31 +02:00
CNugteren 9b66883e9c Improved GEMV kernel with local memory and a tunable WPT 2015-06-13 14:10:07 +02:00
CNugteren e522d1a74e Added initial version of GEMV including tester and performance client 2015-06-13 11:01:20 +02:00
CNugteren 85c1db9322 Added initial naive version of Xgemv kernel 2015-06-10 08:44:30 +02:00
CNugteren bc5a341dfe Initial commit of preview version 2015-05-30 12:30:43 +02:00