Cedric Nugteren
|
44fb40e5c4
|
Prepared for MSVC support
|
2016-01-30 11:54:29 +01:00 |
|
Cedric Nugteren
|
f573fe6bb3
|
Fixed a bug in the graph scripts (thanks to Victor Pakhomov)
|
2016-01-30 11:53:54 +01:00 |
|
Cedric Nugteren
|
310d05d187
|
Updated to version 4.0 of the CLCudaAPI header
|
2016-01-30 11:52:21 +01:00 |
|
Cedric Nugteren
|
deb58709c8
|
Merge branch 'tuning_database' into development
|
2016-01-30 11:46:45 +01:00 |
|
Cedric Nugteren
|
276e772a2c
|
Added first auto-generated database headers from the Python database; only K40 and Iris supported now
|
2016-01-30 11:43:21 +01:00 |
|
Cedric Nugteren
|
76c9148030
|
Minor improvements to the database script, including proper file paths
|
2016-01-24 17:56:27 +01:00 |
|
Cedric Nugteren
|
f0b3091cdb
|
Added Python function to compute defaults for a particular device/vendor combination
|
2016-01-24 17:35:31 +01:00 |
|
Cedric Nugteren
|
e4e3663e61
|
Updated FindOpenCL for Intel Linux OpenCL paths
|
2016-01-23 16:09:07 +01:00 |
|
CNugteren
|
09c94b17cf
|
Added tuning data for Tesla K40
|
2015-10-28 21:20:42 +01:00 |
|
CNugteren
|
c0d469718a
|
Now sets local memory size in xgemv tuner properly
|
2015-10-28 21:19:59 +01:00 |
|
CNugteren
|
bb4e78f737
|
Added initial tuning database with Intel Iris data
|
2015-10-25 16:49:59 +01:00 |
|
CNugteren
|
ccd1a5c7cc
|
Updated tuning database script according to the new JSON format
|
2015-10-25 16:49:29 +01:00 |
|
CNugteren
|
179ad0666d
|
Fixed an arguments-related bug in the GEMV tuner
|
2015-10-25 16:48:26 +01:00 |
|
CNugteren
|
a2d5d7770e
|
Moved the tuner database script to a separate folder
|
2015-10-25 16:27:14 +01:00 |
|
CNugteren
|
9bf6be8426
|
Added alpha and beta to tuner meta-data
|
2015-10-23 11:01:44 +02:00 |
|
CNugteren
|
3f616366bd
|
Prepared the changelog for the next release
|
2015-10-17 15:57:04 +02:00 |
|
CNugteren
|
92404035e8
|
Updated to version 0.5.0
|
2015-10-17 15:48:13 +02:00 |
|
CNugteren
|
afb3e64fd3
|
Travis now also builds the development branch
|
2015-10-17 15:42:45 +02:00 |
|
Cedric Nugteren
|
653feca564
|
Merge pull request #28 from CNugteren/kernels_reorganization
Kernels re-organization level-3
|
2015-10-17 15:30:06 +02:00 |
|
CNugteren
|
0d4091fdfb
|
Added guards for routine-specific level-3 pad kernels
|
2015-10-13 08:29:45 +02:00 |
|
CNugteren
|
f74c9a5640
|
Routine names are now all default arguments defined in the header
|
2015-10-12 08:35:58 +02:00 |
|
CNugteren
|
54a8723f8c
|
Moved level3 kernel files to a subfolder
|
2015-10-12 08:28:40 +02:00 |
|
Cedric Nugteren
|
92b4b0d1fe
|
Merge pull request #27 from CNugteren/level2_matrix_vector
Added many level-2 matrix-vector routines
|
2015-09-26 17:02:34 +02:00 |
|
CNugteren
|
2b56c2c603
|
Added TRMV/TBMV/TPMV routines
|
2015-09-26 16:58:03 +02:00 |
|
CNugteren
|
04d28b0420
|
Made buffer copying a const-method for the source
|
2015-09-26 16:48:11 +02:00 |
|
CNugteren
|
de6547a92b
|
Added SBMV and SPMV routines
|
2015-09-19 18:01:19 +02:00 |
|
CNugteren
|
80da67d28b
|
Added the HPMV routine
|
2015-09-19 17:40:38 +02:00 |
|
CNugteren
|
c32c4a9739
|
Added infrastructure for packed matrices
|
2015-09-19 17:37:42 +02:00 |
|
CNugteren
|
aebd156869
|
Added the HBMV routine
|
2015-09-19 11:11:34 +02:00 |
|
CNugteren
|
93dddda63e
|
Improved the organization and performance of level 2 routines
|
2015-09-18 17:46:41 +02:00 |
|
CNugteren
|
4507ba4997
|
Added first version of banded matrix-vector multiplication
|
2015-09-18 15:25:20 +02:00 |
|
Cedric Nugteren
|
42db8ea968
|
Merge pull request #26 from CNugteren/routine_definitions
Generated API interface and implementations
|
2015-09-18 10:23:16 +02:00 |
|
CNugteren
|
4796c9bcbd
|
Added generated main functions for correctness/performance tests for level 2 routines
|
2015-09-18 10:19:03 +02:00 |
|
CNugteren
|
6105ad6f5b
|
Added interface of all level 2 routines
|
2015-09-17 17:05:45 +02:00 |
|
CNugteren
|
6307d2e5db
|
Added script to generate API interface and implementation automatically
|
2015-09-17 10:14:33 +02:00 |
|
CNugteren
|
1c24210026
|
Made Travis always build pushes to the master branch
|
2015-09-14 17:16:31 +02:00 |
|
Cedric Nugteren
|
a2b773573d
|
Merge pull request #25 from CNugteren/level1_routines
Added several level 1 routines
|
2015-09-14 17:12:23 +02:00 |
|
CNugteren
|
224c967584
|
Removed routines from the table which are not supported by clBLAS
|
2015-09-14 17:02:33 +02:00 |
|
CNugteren
|
a2e726d3bd
|
Added xDOT/xDOTU/xDOTC dot-product routines
|
2015-09-14 16:57:00 +02:00 |
|
CNugteren
|
2a383f3450
|
Added extra temporary buffer to tuners in preparation for Xdot routines
|
2015-09-14 15:53:34 +02:00 |
|
CNugteren
|
e0c5312abb
|
Added support for the dot buffer and offset argument
|
2015-09-14 12:28:50 +02:00 |
|
CNugteren
|
b0b81deae1
|
Minor update of options-printing syntax
|
2015-08-24 07:38:20 +02:00 |
|
CNugteren
|
ff0c54c386
|
Added the XSWAP, XSCAL and XCOPY level-1 routines
|
2015-08-22 17:11:20 +02:00 |
|
CNugteren
|
75517353d5
|
Re-organized level1 xaxpy kernel
|
2015-08-22 14:33:48 +02:00 |
|
CNugteren
|
70ba7c83d4
|
Prepared the changelog for the next release
|
2015-08-22 12:50:26 +02:00 |
|
CNugteren
|
74f601794d
|
Updated to version 0.4.0
|
2015-08-22 12:41:40 +02:00 |
|
CNugteren
|
ff1a670e88
|
Updated the documentation
|
2015-08-22 12:40:18 +02:00 |
|
CNugteren
|
5f5d31754a
|
Added clblast prefix to binaries and added the alltests target
|
2015-08-21 07:36:19 +02:00 |
|
Cedric Nugteren
|
cf168fca70
|
Merge pull request #23 from CNugteren/tuner_database
Added initial version of a tuner-database
|
2015-08-20 08:38:18 +02:00 |
|
CNugteren
|
15db2bcc20
|
Added initial version of tuner-database Python script
|
2015-08-20 08:30:51 +02:00 |
|