Commit graph

177 commits

Author SHA1 Message Date
Cedric Nugteren 88c551cdea Added tuning results for the newest xGER family kernels 2016-03-12 16:23:58 +01:00
Cedric Nugteren 83c6a51765 Added tuning results for the ARM Mali-T628 GPU 2016-03-12 15:10:35 +01:00
Cedric Nugteren 306bf67660 Added preliminary support for xHPR2 and xSPR2 routines 2016-03-06 15:48:11 +01:00
Cedric Nugteren 60da54da5d Added preliminary support for xHER2 and xSYR2 routines 2016-03-02 21:18:01 +01:00
Cedric Nugteren fa79720557 Added tuning results for Intel Iris Pro and AMD R9 M370X 2016-02-28 16:47:52 +01:00
Cedric Nugteren e3545215a5 Added support for xHER, xHPR, xSYR, and xSPR routines 2016-02-28 14:16:48 +01:00
Cedric Nugteren cef78c7356 Fixed a compilation issue under AppleClang 2016-02-28 14:14:50 +01:00
Cedric Nugteren 9f682aa66b Set a proper default precision for the CLBlast clients 2016-02-20 14:41:53 +01:00
Cedric Nugteren 6dc44da07b Added support for xGERU and xGERC routines 2016-02-20 14:15:41 +01:00
Cedric Nugteren 8854a73127 Added XGER routine, kernel, and tuner 2016-02-20 12:40:01 +01:00
Cedric Nugteren 6f4b34f813 Added tuning parameters for various devices using the new database script 2016-02-07 16:41:09 +01:00
Cedric Nugteren 00be6f7530 Added dictionary with short and long OpenCL vendor names to fix issues with Intel having multiple names 2016-02-07 11:59:30 +01:00
CNugteren fbf071ba62 Fixed a linker error in the performance client under GCC 2016-02-06 10:53:44 +01:00
Cedric Nugteren 310d05d187 Updated to version 4.0 of the CLCudaAPI header 2016-01-30 11:52:21 +01:00
Cedric Nugteren 276e772a2c Added first auto-generated database headers from the Python database; only K40 and Iris supported now 2016-01-30 11:43:21 +01:00
CNugteren 9bf6be8426 Added alpha and beta to tuner meta-data 2015-10-23 11:01:44 +02:00
CNugteren f74c9a5640 Routine names are now all default arguments defined in the header 2015-10-12 08:35:58 +02:00
CNugteren 2b56c2c603 Added TRMV/TBMV/TPMV routines 2015-09-26 16:58:03 +02:00
CNugteren 04d28b0420 Made buffer copying a const-method for the source 2015-09-26 16:48:11 +02:00
CNugteren de6547a92b Added SBMV and SPMV routines 2015-09-19 18:01:19 +02:00
CNugteren 80da67d28b Added the HPMV routine 2015-09-19 17:40:38 +02:00
CNugteren c32c4a9739 Added infrastructure for packed matrices 2015-09-19 17:37:42 +02:00
CNugteren aebd156869 Added the HBMV routine 2015-09-19 11:11:34 +02:00
CNugteren 93dddda63e Improved the organization and performance of level 2 routines 2015-09-18 17:46:41 +02:00
CNugteren 4507ba4997 Added first version of banded matrix-vector multiplication 2015-09-18 15:25:20 +02:00
CNugteren 6105ad6f5b Added interface of all level 2 routines 2015-09-17 17:05:45 +02:00
CNugteren 6307d2e5db Added script to generate API interface and implementation automatically 2015-09-17 10:14:33 +02:00
CNugteren a2e726d3bd Added xDOT/xDOTU/xDOTC dot-product routines 2015-09-14 16:57:00 +02:00
CNugteren 2a383f3450 Added extra temporary buffer to tuners in preparation of Xdot routines 2015-09-14 15:53:34 +02:00
CNugteren e0c5312abb Added support for the dot buffer and offset argument 2015-09-14 12:28:50 +02:00
CNugteren ff0c54c386 Added the XSWAP, XSCAL and XCOPY level-1 routines 2015-08-22 17:11:20 +02:00
Cedric Nugteren cf168fca70 Merge pull request #23 from CNugteren/tuner_database: Added initial version of a tuner-database 2015-08-20 08:38:18 +02:00
CNugteren 798a3b6101 Add check for supported precision to the tuners 2015-08-19 19:35:08 +02:00
CNugteren b46de22433 Moved precision tester to utilities 2015-08-19 19:34:29 +02:00
CNugteren 8a02db0746 Added precision to the JSON output 2015-08-19 11:12:42 +02:00
CNugteren 603e389545 Added all supported routines to the C API 2015-08-13 17:58:46 +02:00
CNugteren 8617195ac5 Added initial version of C API with just one routine 2015-08-13 13:46:13 +02:00
CNugteren f85d44f602 Added argument m,n,k metadata to JSON files 2015-08-13 08:33:04 +02:00
CNugteren dbdb58c600 Refactored the tuners, added JSON output 2015-08-09 15:50:41 +02:00
CNugteren 75b4d92ac3 Added distinguished names for GEMV inherited HEMV/SYMV 2015-08-04 08:15:39 +02:00
CNugteren 938ca2707f Added HEMV routine 2015-07-31 17:35:42 +02:00
CNugteren b89517a2e7 Added SYMV routine 2015-07-31 17:13:41 +02:00
CNugteren f7199b831f Now using the new Claduc C++11 OpenCL header 2015-07-27 07:18:06 +02:00
CNugteren dd8471ba92 Set the correct name for AMD OpenCL devices 2015-07-22 19:25:06 +02:00
CNugteren 3a6bdeb79a Updated GEMM tuning results for Tahiti 2015-07-22 07:31:39 +02:00
CNugteren 4dcecfe934 Added workgroup shuffle option to transpose kernel for AMD GPUs 2015-07-22 07:31:16 +02:00
CNugteren 48e2e96f1b Kernel caching is now based on a routine's name 2015-07-19 16:24:14 +02:00
CNugteren 4e499a67c1 The kernel source string is now a routine's member variable 2015-07-19 13:44:37 +02:00
CNugteren 250f8ab295 Fixed complex performance on Intel Iris 2015-07-19 13:39:13 +02:00
CNugteren 0dc85845f7 Updated interface of the PadCopyTransposeMatrix method 2015-07-13 08:41:26 +02:00
CNugteren aa852bbe67 Added subfolders for the level1/2/3 routines 2015-07-12 16:57:09 +02:00
CNugteren b5d39d9d0c Added the HEMM routine, tester, and client 2015-07-12 15:11:50 +02:00
CNugteren 9a929f3fb2 Disabled prototype of TRSM 2015-07-10 21:08:18 +02:00
CNugteren b02876d6e9 Added the HER2K routine, tester, and client 2015-07-10 20:59:20 +02:00
CNugteren 919bba3eaf Added the HERK routine, tester, and client 2015-07-10 07:19:59 +02:00
CNugteren 5578d5ab28 Added option to set the imaginary part of the diagonal to zero 2015-07-08 07:25:18 +02:00
CNugteren d9ea0c47c6 Added the TRMM routine, tester, and client 2015-07-02 07:16:04 +02:00
CNugteren e3dd35f91b Added the unit/non-unit diagonal enum 2015-07-01 09:39:41 +02:00
CNugteren 8574f72d46 Added the TRMM and TRSM interface 2015-06-30 07:36:11 +02:00
CNugteren cf1892d22c Added buffer structure and sizes to arguments 2015-06-28 15:37:38 +02:00
CNugteren 7c8d16147a Added the SYR2K routine, tester, and client 2015-06-26 08:12:56 +02:00
CNugteren 60a88aac86 Added the SYRK routine, tester, and client 2015-06-24 07:50:18 +02:00
CNugteren 20eb3506d6 Added a condition to update only lower/upper triangular parts in the un-pad kernels 2015-06-23 08:09:07 +02:00
CNugteren e3829c1067 Added prototypes of SYRK and SYR2K 2015-06-21 12:44:03 +02:00
CNugteren 0f486d9b74 Automatically skips tests with unsupported precision 2015-06-20 14:13:54 +02:00
CNugteren 3ea3ba2bee Distinguish between a short smoke test and a full test 2015-06-20 13:33:50 +02:00
CNugteren e26742c629 Added additional absolute error checking when testing 2015-06-20 10:58:21 +02:00
CNugteren ab55df703d Added const-ref accessors to all CL++11 classes 2015-06-19 07:28:35 +02:00
CNugteren 682c01a80c Now returns program from database by reference 2015-06-18 18:44:14 +02:00
CNugteren 8f01c644b5 Added support for complex conjugate transpose 2015-06-16 07:43:19 +02:00
CNugteren ce703a2f5a Added tuning for DGEMV on Iris and SGEMV on K40m 2015-06-15 08:41:13 +02:00
CNugteren 294a3e3d41 Split the three variations of the GEMV kernel for maximal tuning freedom 2015-06-14 11:15:53 +02:00
CNugteren 4b3e3dcfe0 Added a fast GEMV kernel with vector loads, no tail, and fewer if-statements 2015-06-13 20:46:01 +02:00
CNugteren 9b66883e9c Improved GEMV kernel with local memory and a tunable WPT 2015-06-13 14:10:07 +02:00
CNugteren e522d1a74e Added initial version of GEMV including tester and performance client 2015-06-13 11:01:20 +02:00
CNugteren 85c1db9322 Added initial naive version of Xgemv kernel 2015-06-10 08:44:30 +02:00
CNugteren bc5a341dfe Initial commit of preview version 2015-05-30 12:30:43 +02:00