| Author | Commit | Message | Date |
|---|---|---|---|
| Cedric Nugteren | 4f6e42d052 | Merge pull request #21 from CNugteren/c_api - Added a plain C API | 2015-08-13 18:02:03 +02:00 |
| CNugteren | 4242f90215 | Added the plain C API | 2015-08-13 18:00:09 +02:00 |
| CNugteren | 603e389545 | Added all supported routines to the C API | 2015-08-13 17:58:46 +02:00 |
| CNugteren | 8eeb7f721f | Fixed a complex data-type bug in the transpose kernel | 2015-08-13 14:33:42 +02:00 |
| CNugteren | a6c104ef20 | Added SGEMM example using the C API | 2015-08-13 13:47:15 +02:00 |
| CNugteren | 8617195ac5 | Added initial version of C API with just one routine | 2015-08-13 13:46:13 +02:00 |
| CNugteren | f85d44f602 | Added argument m,n,k metadata to JSON files | 2015-08-13 08:33:04 +02:00 |
| CNugteren | dbdb58c600 | Refactored the tuners, added JSON output | 2015-08-09 15:50:41 +02:00 |
| Cedric Nugteren | e4aa4519c2 | Merge pull request #19 from CNugteren/basic_level2_routines - Level-2 routines: HEMV and SYMV | 2015-08-04 08:19:42 +02:00 |
| CNugteren | 75b4d92ac3 | Added distinguished names for GEMV inherited HEMV/SYMV | 2015-08-04 08:15:39 +02:00 |
| CNugteren | d1a7cf18ec | Abstracted loading of matrix A for GEMV kernel | 2015-08-03 07:37:14 +02:00 |
| CNugteren | fc7cd434e1 | Added HEMV and SYMV | 2015-07-31 17:44:17 +02:00 |
| CNugteren | c52c5f3d35 | Added HEMV and SYMV | 2015-07-31 17:41:10 +02:00 |
| CNugteren | 938ca2707f | Added HEMV routine | 2015-07-31 17:35:42 +02:00 |
| CNugteren | b89517a2e7 | Added SYMV routine | 2015-07-31 17:13:41 +02:00 |
| Cedric Nugteren | 674f69390d | Merge pull request #18 from CNugteren/correctness_test_refactoring - Refactored the correctness tests | 2015-07-31 16:01:47 +02:00 |
| CNugteren | c5d5adbddd | Refactored the correctness tests | 2015-07-31 15:52:13 +02:00 |
| Cedric Nugteren | 6e1e7fdcaf | Merge pull request #17 from CNugteren/clblas_external - Removed clBLAS sources | 2015-07-31 11:20:30 +02:00 |
| CNugteren | a27ce11c69 | Updated documentation reflecting removal of clBLAS sources | 2015-07-31 11:15:48 +02:00 |
| CNugteren | 68044254c7 | Removed clBLAS source code, now requires separate installation | 2015-07-31 11:06:07 +02:00 |
| CNugteren | e4c9f4cfe5 | Moved the preferred options of clBLAS (no tests) to the CLBlast CMakeLists file | 2015-07-27 07:34:19 +02:00 |
| Cedric Nugteren | 1acec9c951 | Merge pull request #16 from CNugteren/claduc_header - Now using the new Claduc C++11 OpenCL header | 2015-07-27 07:21:20 +02:00 |
| CNugteren | f7199b831f | Now using the new Claduc C++11 OpenCL header | 2015-07-27 07:18:06 +02:00 |
| CNugteren | b10f4a633c | Prepared the changelog for the next release | 2015-07-24 20:50:00 +02:00 |
| CNugteren | efbdcd2d90 | Updated to version 0.3.0 | 2015-07-24 08:25:32 +02:00 |
| Cedric Nugteren | 44760e7381 | Merge pull request #14 from CNugteren/amd_performance - Improved performance for AMD GPUs | 2015-07-24 08:21:01 +02:00 |
| CNugteren | a76dc2f09c | Updated the docs to reflect the performance improvements | 2015-07-24 08:16:41 +02:00 |
| CNugteren | 547b7afffc | Updated the performance results, added HD7950 | 2015-07-23 18:25:39 +02:00 |
| CNugteren | 0273b622d3 | Made the graph script robust against diagnostic system messages | 2015-07-22 21:30:02 +02:00 |
| CNugteren | dd8471ba92 | Set the correct name for AMD OpenCL devices | 2015-07-22 19:25:06 +02:00 |
| CNugteren | 3a6bdeb79a | Updated GEMM tuning results for Tahiti | 2015-07-22 07:31:39 +02:00 |
| CNugteren | 4dcecfe934 | Added workgroup shuffle option to transpose kernel for AMD GPUs | 2015-07-22 07:31:16 +02:00 |
| CNugteren | d93efa3169 | Transpose kernel now uses vectorized local memory loads and stores | 2015-07-21 08:22:18 +02:00 |
| CNugteren | a0f0f6c8ce | Triangular GEMM kernels are only compiled when needed | 2015-07-19 16:36:12 +02:00 |
| CNugteren | 48e2e96f1b | Kernel caching is now based on a routine's name | 2015-07-19 16:24:14 +02:00 |
| CNugteren | 4e499a67c1 | The kernel source string is now a routine's member variable | 2015-07-19 13:44:37 +02:00 |
| CNugteren | 250f8ab295 | Fixed complex performance on Intel Iris | 2015-07-19 13:39:13 +02:00 |
| CNugteren | 9300261bd4 | Fixed a bug when using the Xgemm kernel without local memory | 2015-07-16 22:49:55 +02:00 |
| CNugteren | 0157d6d4ea | Using mad() instruction for AMD devices like clBLAS does | 2015-07-16 22:42:02 +02:00 |
| Cedric Nugteren | 3bb1b5fa6e | Merge pull request #13 from CNugteren/bypass_pre_post_processing - Bypass pre/post-processing | 2015-07-15 22:27:56 +02:00 |
| CNugteren | 6908c4ebd2 | Updated changelog with pre/post-processing bypass | 2015-07-15 22:24:15 +02:00 |
| CNugteren | ba0026d2b9 | Changed performance graphs to default to column-major | 2015-07-15 22:21:24 +02:00 |
| CNugteren | b526623fc7 | Skips pre/post processing kernels if not needed | 2015-07-15 22:12:38 +02:00 |
| CNugteren | 0dc85845f7 | Updated interface of the PadCopyTransposeMatrix method | 2015-07-13 08:41:26 +02:00 |
| Cedric Nugteren | 530418f06f | Merge pull request #12 from CNugteren/level_subfolders - Added subfolders for the level1/2/3 routines | 2015-07-12 16:59:17 +02:00 |
| CNugteren | aa852bbe67 | Added subfolders for the level1/2/3 routines | 2015-07-12 16:57:09 +02:00 |
| Cedric Nugteren | 721546e64a | Merge pull request #11 from CNugteren/level3_routines_2 - Added level-3 routines | 2015-07-12 15:22:11 +02:00 |
| CNugteren | c920400261 | Added HEMM, HERK, HER2K, and TRMM | 2015-07-12 15:14:35 +02:00 |
| CNugteren | b5d39d9d0c | Added the HEMM routine, tester, and client | 2015-07-12 15:11:50 +02:00 |
| CNugteren | 9a929f3fb2 | Disabled prototype of TRSM | 2015-07-10 21:08:18 +02:00 |