Commit graph

1398 commits

Author SHA1 Message Date
CNugteren 250f8ab295 Fixed complex performance on Intel Iris 2015-07-19 13:39:13 +02:00
CNugteren 9300261bd4 Fixed a bug when using the Xgemm kernel without local memory 2015-07-16 22:49:55 +02:00
CNugteren 0157d6d4ea Using mad() instruction for AMD devices like clBLAS does 2015-07-16 22:42:02 +02:00
Cedric Nugteren 3bb1b5fa6e Merge pull request #13 from CNugteren/bypass_pre_post_processing: Bypass pre/post-processing 2015-07-15 22:27:56 +02:00
CNugteren 6908c4ebd2 Updated changelog with pre/post-processing bypass 2015-07-15 22:24:15 +02:00
CNugteren ba0026d2b9 Changed performance graphs to default to column-major 2015-07-15 22:21:24 +02:00
CNugteren b526623fc7 Skips pre/post processing kernels if not needed 2015-07-15 22:12:38 +02:00
CNugteren 0dc85845f7 Updated interface of the PadCopyTransposeMatrix method 2015-07-13 08:41:26 +02:00
Cedric Nugteren 530418f06f Merge pull request #12 from CNugteren/level_subfolders: Added subfolders for the level1/2/3 routines 2015-07-12 16:59:17 +02:00
CNugteren aa852bbe67 Added subfolders for the level1/2/3 routines 2015-07-12 16:57:09 +02:00
Cedric Nugteren 721546e64a Merge pull request #11 from CNugteren/level3_routines_2: Added level-3 routines 2015-07-12 15:22:11 +02:00
CNugteren c920400261 Added HEMM, HERK, HER2K, and TRMM 2015-07-12 15:14:35 +02:00
CNugteren b5d39d9d0c Added the HEMM routine, tester, and client 2015-07-12 15:11:50 +02:00
CNugteren 9a929f3fb2 Disabled prototype of TRSM 2015-07-10 21:08:18 +02:00
CNugteren b02876d6e9 Added the HER2K routine, tester, and client 2015-07-10 20:59:20 +02:00
CNugteren 919bba3eaf Added the HERK routine, tester, and client 2015-07-10 07:19:59 +02:00
CNugteren 2fe3fe1580 The clients now distinguish between the memory and alpha/beta data-type 2015-07-10 07:18:12 +02:00
CNugteren 5578d5ab28 Added option to set the imaginary part of the diagonal to zero 2015-07-08 07:25:18 +02:00
CNugteren 82469fc764 The testers now distinguish between the memory and alpha/beta data-type 2015-07-08 07:21:44 +02:00
CNugteren 599f9a70a6 Added option to set the imaginary part of the diagonal to zero 2015-07-07 07:34:36 +02:00
CNugteren d9ea0c47c6 Added the TRMM routine, tester, and client 2015-07-02 07:16:04 +02:00
CNugteren 500416aa38 Fixed the order of arguments 2015-07-02 07:12:49 +02:00
CNugteren d879eb3abf Added a set-to-one function for kernels 2015-07-02 07:11:27 +02:00
CNugteren e3dd35f91b Added the unit/non-unit diagonal enum 2015-07-01 09:39:41 +02:00
CNugteren b8d81a60d6 Fixed typos in SYMM 2015-07-01 09:38:04 +02:00
CNugteren 8574f72d46 Added the TRMM and TRSM interface 2015-06-30 07:36:11 +02:00
CNugteren a591d5607d Added constness to all cl_mem objects 2015-06-30 07:35:54 +02:00
CNugteren 14186af590 Added TRMM and TRSM clBLAS wrappers 2015-06-30 07:19:46 +02:00
Cedric Nugteren cbf2eef179 Merge pull request #10 from CNugteren/test_infrastructure: Re-organized test infrastructure 2015-06-29 20:45:10 +02:00
CNugteren 3726f6a618 Re-organized test and client infrastructure 2015-06-29 20:42:34 +02:00
CNugteren ede78fe499 Fixed the license for the correctness testers 2015-06-29 20:39:51 +02:00
CNugteren 2914a285d4 Re-organized the performance-client infrastructure to avoid code duplication 2015-06-29 20:38:34 +02:00
CNugteren e5c0edbfd7 Re-organized the test infrastructure to avoid code duplication 2015-06-28 15:52:57 +02:00
CNugteren cf1892d22c Added buffer structure and sizes to arguments 2015-06-28 15:37:38 +02:00
Cedric Nugteren 77e2157485 Merge pull request #9 from CNugteren/level3_routines: Added SYRK and SYR2K level-3 routines 2015-06-26 20:56:21 +02:00
CNugteren e27e339ebf Replaced crosses with tickmarks 2015-06-26 17:43:17 +02:00
CNugteren 7c8d16147a Added the SYR2K routine, tester, and client 2015-06-26 08:12:56 +02:00
CNugteren 75f263ce3a Added symmetric matrix support for the ABC performance tester 2015-06-26 08:10:23 +02:00
CNugteren ff9f9fac57 Added option to test only symmetric matrices (m=n) 2015-06-25 20:39:34 +02:00
CNugteren 57c705dbf2 Clarified comment 2015-06-25 20:38:34 +02:00
CNugteren 96e4012349 Added SSYRK performance graphs 2015-06-25 19:19:31 +02:00
CNugteren 3de4471afe Added the SYRK routine 2015-06-24 07:52:19 +02:00
CNugteren 60a88aac86 Added the SYRK routine, tester, and client 2015-06-24 07:50:18 +02:00
CNugteren a17297937d Added performance-client for AC routines 2015-06-23 22:31:27 +02:00
CNugteren 9fc38cdf5e Added a lower/upper triangular version of the GEMM kernel 2015-06-23 17:58:51 +02:00
CNugteren 0a3831e6d1 Updated bandwidth computation for GEMM and SYMM 2015-06-23 08:09:46 +02:00
CNugteren 20eb3506d6 Added a condition to update only lower/upper triangular parts in the un-pad kernels 2015-06-23 08:09:07 +02:00
CNugteren 4c2a166bc5 Added test infrastructure for AB and AC routines 2015-06-21 12:57:38 +02:00
CNugteren e3829c1067 Added prototypes of SYRK and SYR2K 2015-06-21 12:44:03 +02:00
CNugteren ea7da6a497 Fixed support for complex data-types for GEMM and SYMM clients 2015-06-21 11:21:03 +02:00