Commit graph

71 commits

Author SHA1 Message Date
CNugteren de6547a92b Added SBMV and SPMV routines 2015-09-19 18:01:19 +02:00
CNugteren 80da67d28b Added the HPMV routine 2015-09-19 17:40:38 +02:00
CNugteren c32c4a9739 Added infrastructure for packed matrices 2015-09-19 17:37:42 +02:00
CNugteren aebd156869 Added the HBMV routine 2015-09-19 11:11:34 +02:00
CNugteren 93dddda63e Improved the organization and performance of level 2 routines 2015-09-18 17:46:41 +02:00
CNugteren 4507ba4997 Added first version of banded matrix-vector multiplication 2015-09-18 15:25:20 +02:00
CNugteren 6105ad6f5b Added interface of all level 2 routines 2015-09-17 17:05:45 +02:00
CNugteren 6307d2e5db Added script to generate API interface and implementation automatically 2015-09-17 10:14:33 +02:00
CNugteren a2e726d3bd Added xDOT/xDOTU/xDOTC dot-product routines 2015-09-14 16:57:00 +02:00
CNugteren 2a383f3450 Added extra temporary buffer to tuners in preparation of Xdot routines 2015-09-14 15:53:34 +02:00
CNugteren e0c5312abb Added support for the dot buffer and offset argument 2015-09-14 12:28:50 +02:00
CNugteren ff0c54c386 Added the XSWAP, XSCAL and XCOPY level-1 routines 2015-08-22 17:11:20 +02:00
CNugteren 75517353d5 Re-organized level1 xaxpy kernel 2015-08-22 14:33:48 +02:00
Cedric Nugteren cf168fca70 Merge pull request #23 from CNugteren/tuner_database: Added initial version of a tuner-database 2015-08-20 08:38:18 +02:00
CNugteren 15db2bcc20 Added initial version of tuner-database Python script 2015-08-20 08:30:51 +02:00
CNugteren b46de22433 Moved precision tester to utilities 2015-08-19 19:34:29 +02:00
CNugteren cbd25bffea Added hotfix 8eeb7f721f 2015-08-19 11:12:16 +02:00
Cedric Nugteren 4f6e42d052 Merge pull request #21 from CNugteren/c_api: Added a plain C API 2015-08-13 18:02:03 +02:00
CNugteren 603e389545 Added all supported routines to the C API 2015-08-13 17:58:46 +02:00
CNugteren 8eeb7f721f Fixed a complex data-type bug in the transpose kernel 2015-08-13 14:33:42 +02:00
CNugteren 8617195ac5 Added initial version of C API with just one routine 2015-08-13 13:46:13 +02:00
CNugteren dbdb58c600 Refactored the tuners, added JSON output 2015-08-09 15:50:41 +02:00
CNugteren 75b4d92ac3 Added distinguished names for GEMV inherited HEMV/SYMV 2015-08-04 08:15:39 +02:00
CNugteren d1a7cf18ec Abstracted loading of matrix A for GEMV kernel 2015-08-03 07:37:14 +02:00
CNugteren 938ca2707f Added HEMV routine 2015-07-31 17:35:42 +02:00
CNugteren b89517a2e7 Added SYMV routine 2015-07-31 17:13:41 +02:00
CNugteren f7199b831f Now using the new Claduc C++11 OpenCL header 2015-07-27 07:18:06 +02:00
CNugteren 4dcecfe934 Added workgroup shuffle option to transpose kernel for AMD GPUs 2015-07-22 07:31:16 +02:00
CNugteren d93efa3169 Transpose kernel now uses vectorized local memory loads and stores 2015-07-21 08:22:18 +02:00
CNugteren a0f0f6c8ce Triangular GEMM kernels are only compiled when needed 2015-07-19 16:36:12 +02:00
CNugteren 48e2e96f1b Kernel caching is now based on a routine's name 2015-07-19 16:24:14 +02:00
CNugteren 4e499a67c1 The kernel source string is now a routine's member variable 2015-07-19 13:44:37 +02:00
CNugteren 9300261bd4 Fixed a bug when using the Xgemm kernel without local memory 2015-07-16 22:49:55 +02:00
CNugteren 0157d6d4ea Using mad() instruction for AMD devices like clBLAS does 2015-07-16 22:42:02 +02:00
CNugteren b526623fc7 Skips pre/post processing kernels if not needed 2015-07-15 22:12:38 +02:00
CNugteren 0dc85845f7 Updated interface of the PadCopyTransposeMatrix method 2015-07-13 08:41:26 +02:00
CNugteren aa852bbe67 Added subfolders for the level1/2/3 routines 2015-07-12 16:57:09 +02:00
CNugteren b5d39d9d0c Added the HEMM routine, tester, and client 2015-07-12 15:11:50 +02:00
CNugteren 9a929f3fb2 Disabled prototype of TRSM 2015-07-10 21:08:18 +02:00
CNugteren b02876d6e9 Added the HER2K routine, tester, and client 2015-07-10 20:59:20 +02:00
CNugteren 919bba3eaf Added the HERK routine, tester, and client 2015-07-10 07:19:59 +02:00
CNugteren 5578d5ab28 Added option to set the imaginary part of the diagonal to zero 2015-07-08 07:25:18 +02:00
CNugteren 599f9a70a6 Added option to set the imaginary part of the diagonal to zero 2015-07-07 07:34:36 +02:00
CNugteren d9ea0c47c6 Added the TRMM routine, tester, and client 2015-07-02 07:16:04 +02:00
CNugteren d879eb3abf Added a set-to-one function for kernels 2015-07-02 07:11:27 +02:00
CNugteren e3dd35f91b Added the unit/non-unit diagonal enum 2015-07-01 09:39:41 +02:00
CNugteren b8d81a60d6 Fixed typos in SYMM 2015-07-01 09:38:04 +02:00
CNugteren 8574f72d46 Added the TRMM and TRSM interface 2015-06-30 07:36:11 +02:00
CNugteren 7c8d16147a Added the SYR2K routine, tester, and client 2015-06-26 08:12:56 +02:00
CNugteren 57c705dbf2 Clarified comment 2015-06-25 20:38:34 +02:00