Commit Graph

1473 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)

Author SHA1 Message Date
CNugteren c52c5f3d35 Added HEMV and SYMV 2015-07-31 17:41:10 +02:00
CNugteren 938ca2707f Added HEMV routine 2015-07-31 17:35:42 +02:00
CNugteren b89517a2e7 Added SYMV routine 2015-07-31 17:13:41 +02:00
Cedric Nugteren 674f69390d Merge pull request #18 from CNugteren/correctness_test_refactoring (Refactored the correctness tests) 2015-07-31 16:01:47 +02:00
CNugteren c5d5adbddd Refactored the correctness tests 2015-07-31 15:52:13 +02:00
Cedric Nugteren 6e1e7fdcaf Merge pull request #17 from CNugteren/clblas_external (Removed clBLAS sources) 2015-07-31 11:20:30 +02:00
CNugteren a27ce11c69 Updated documentation reflecting removal of clBLAS sources 2015-07-31 11:15:48 +02:00
CNugteren 68044254c7 Removed clBLAS source code, now requires separate installation 2015-07-31 11:06:07 +02:00
CNugteren e4c9f4cfe5 Moved the preferred options of clBLAS (no tests) to the CLBlast CMakeLists file 2015-07-27 07:34:19 +02:00
Cedric Nugteren 1acec9c951 Merge pull request #16 from CNugteren/claduc_header (Now using the new Claduc C++11 OpenCL header) 2015-07-27 07:21:20 +02:00
CNugteren f7199b831f Now using the new Claduc C++11 OpenCL header 2015-07-27 07:18:06 +02:00
CNugteren b10f4a633c Prepared the changelog for the next release 2015-07-24 20:50:00 +02:00
Cedric Nugteren db6846b791 Merge pull request #15 from CNugteren/development (Update to version 0.3.0) 2015-07-24 08:30:41 +02:00
CNugteren efbdcd2d90 Updated to version 0.3.0 2015-07-24 08:25:32 +02:00
Cedric Nugteren 44760e7381 Merge pull request #14 from CNugteren/amd_performance (Improved performance for AMD GPUs) 2015-07-24 08:21:01 +02:00
CNugteren a76dc2f09c Updated the docs to reflect the performance improvements 2015-07-24 08:16:41 +02:00
CNugteren 547b7afffc Updated the performance results, added HD7950 2015-07-23 18:25:39 +02:00
CNugteren 0273b622d3 Made the graph script robust against diagnostic system messages 2015-07-22 21:30:02 +02:00
CNugteren dd8471ba92 Set the correct name for AMD OpenCL devices 2015-07-22 19:25:06 +02:00
CNugteren 3a6bdeb79a Updated GEMM tuning results for Tahiti 2015-07-22 07:31:39 +02:00
CNugteren 4dcecfe934 Added workgroup shuffle option to transpose kernel for AMD GPUs 2015-07-22 07:31:16 +02:00
CNugteren d93efa3169 Transpose kernel now uses vectorized local memory loads and stores 2015-07-21 08:22:18 +02:00
CNugteren a0f0f6c8ce Triangular GEMM kernels are only compiled when needed 2015-07-19 16:36:12 +02:00
CNugteren 48e2e96f1b Kernel caching is now based on a routine's name 2015-07-19 16:24:14 +02:00
CNugteren 4e499a67c1 The kernel source string is now a routine's member variable 2015-07-19 13:44:37 +02:00
CNugteren 250f8ab295 Fixed complex performance on Intel Iris 2015-07-19 13:39:13 +02:00
CNugteren 9300261bd4 Fixed a bug when using the Xgemm kernel without local memory 2015-07-16 22:49:55 +02:00
CNugteren 0157d6d4ea Using mad() instruction for AMD devices like clBLAS does 2015-07-16 22:42:02 +02:00
Cedric Nugteren 3bb1b5fa6e Merge pull request #13 from CNugteren/bypass_pre_post_processing (Bypass pre/post-processing) 2015-07-15 22:27:56 +02:00
CNugteren 6908c4ebd2 Updated changelog with pre/post-processing bypass 2015-07-15 22:24:15 +02:00
CNugteren ba0026d2b9 Changed performance graphs to default to column-major 2015-07-15 22:21:24 +02:00
CNugteren b526623fc7 Skips pre/post processing kernels if not needed 2015-07-15 22:12:38 +02:00
CNugteren 0dc85845f7 Updated interface of the PadCopyTransposeMatrix method 2015-07-13 08:41:26 +02:00
Cedric Nugteren 530418f06f Merge pull request #12 from CNugteren/level_subfolders (Added subfolders for the level1/2/3 routines) 2015-07-12 16:59:17 +02:00
CNugteren aa852bbe67 Added subfolders for the level1/2/3 routines 2015-07-12 16:57:09 +02:00
Cedric Nugteren 721546e64a Merge pull request #11 from CNugteren/level3_routines_2 (Added level-3 routines) 2015-07-12 15:22:11 +02:00
CNugteren c920400261 Added HEMM, HERK, HER2K, and TRMM 2015-07-12 15:14:35 +02:00
CNugteren b5d39d9d0c Added the HEMM routine, tester, and client 2015-07-12 15:11:50 +02:00
CNugteren 9a929f3fb2 Disabled prototype of TRSM 2015-07-10 21:08:18 +02:00
CNugteren b02876d6e9 Added the HER2K routine, tester, and client 2015-07-10 20:59:20 +02:00
CNugteren 919bba3eaf Added the HERK routine, tester, and client 2015-07-10 07:19:59 +02:00
CNugteren 2fe3fe1580 The clients now distinguish between the memory and alpha/beta data-type 2015-07-10 07:18:12 +02:00
CNugteren 5578d5ab28 Added option to set the imaginary part of the diagonal to zero 2015-07-08 07:25:18 +02:00
CNugteren 82469fc764 The testers now distinguish between the memory and alpha/beta data-type 2015-07-08 07:21:44 +02:00
CNugteren 599f9a70a6 Added option to set the imaginary part of the diagonal to zero 2015-07-07 07:34:36 +02:00
CNugteren d9ea0c47c6 Added the TRMM routine, tester, and client 2015-07-02 07:16:04 +02:00
CNugteren 500416aa38 Fixed the order of arguments 2015-07-02 07:12:49 +02:00
CNugteren d879eb3abf Added a set-to-one function for kernels 2015-07-02 07:11:27 +02:00
CNugteren e3dd35f91b Added the unit/non-unit diagonal enum 2015-07-01 09:39:41 +02:00
CNugteren b8d81a60d6 Fixed typos in SYMM 2015-07-01 09:38:04 +02:00