Cedric Nugteren
|
3daba70997
|
Updated the database script to remove duplicate entries: keeps only the best-performing cases for a specific parameter combination
|
2016-09-10 11:12:09 +02:00 |
|
Cedric Nugteren
|
a2f8350703
|
Refactored the Python C++ generator script; now conforms to the PEP8 style guide
|
2016-09-04 21:26:30 +02:00 |
|
Cedric Nugteren
|
521bf6cdfc
|
Added tuning results for Intel Broadwell 5500 GT2 GPU
|
2016-09-03 16:43:23 +02:00 |
|
Cedric Nugteren
|
19574b2519
|
Updated tuning results for Haswell GT2 Mobile GPU; fixed the database script to handle duplicate entries from different runs
|
2016-09-03 12:45:11 +02:00 |
|
Cedric Nugteren
|
0c0f0ac7f9
|
Also changed the default-default for unknown device types to use the same method as for known device groups
|
2016-08-21 20:35:20 +02:00 |
|
Cedric Nugteren
|
00979faab4
|
Updated the changelog; refactored the database-get-bests code a bit
|
2016-08-21 20:16:06 +02:00 |
|
Cedric Nugteren
|
7d5631b7e4
|
Updated the database script to calculate the relative best performance of tuning results common to a device/vendor type
|
2016-08-15 21:01:07 +02:00 |
|
Cedric Nugteren
|
7da6492b36
|
Improved the speed of the new common-best defaults method for the database generation
|
2016-08-09 21:06:04 +02:00 |
|
Cedric Nugteren
|
3f5401d4c8
|
Added a first version of the database's common-best default calculation
|
2016-08-07 16:25:38 +02:00 |
|
Cedric Nugteren
|
2582f0290a
|
Moved the XgemvFast and XgemvFastRot tuning database into a separate file
|
2016-07-25 22:43:49 +02:00 |
|
Cedric Nugteren
|
622682ffe3
|
Refactored the Python database script: separated functionality into modules, now complies with the PEP8 style guide, added proper command-line argument parsing, and cleaned up
|
2016-07-24 16:41:01 +02:00 |
|
Cedric Nugteren
|
9683b50c55
|
Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp)
|
2016-07-03 20:30:47 +02:00 |
|
Cedric Nugteren
|
5a690f4e36
|
Prints the current pandas version and reports the minimum required version
|
2016-07-02 16:44:13 +02:00 |
|
Cedric Nugteren
|
b330ab0866
|
Added __declspec(dllexport) to ClearCache and FillCache, and added __declspec(dllimport) when not building the library
|
2016-06-30 10:49:17 +02:00 |
|
Cedric Nugteren
|
69beca90f4
|
Moved the performance graph scripts to the 'scripts' subfolder
|
2016-06-27 11:51:57 +02:00 |
|
Cedric Nugteren
|
eab8d3cda1
|
Minor fix to the database script
|
2016-06-19 14:55:17 +02:00 |
|
Cedric Nugteren
|
61203453aa
|
Renamed all C++ source files to .cpp to better match the .hpp extension
|
2016-06-19 13:55:49 +02:00 |
|
Cedric Nugteren
|
f726fbdc9f
|
Moved all headers into the source tree, changed headers to .hpp extension
|
2016-06-18 20:20:13 +02:00 |
|
Cedric Nugteren
|
bacb5d2bb2
|
Clean-up of the routine class, moved RunKernel to the routine/common file
|
2016-06-18 18:16:14 +02:00 |
|
Cedric Nugteren
|
52ccaf5b25
|
Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing
|
2016-06-16 18:07:46 +02:00 |
|
Cedric Nugteren
|
995a528cec
|
Improved API documentation and added documentation for level-2 and level-3 routines
|
2016-06-13 20:17:26 +02:00 |
|
Cedric Nugteren
|
4fb8f9517c
|
Added documentation for the matrix-update level-2 family of routines
|
2016-06-10 11:16:06 +02:00 |
|
Cedric Nugteren
|
e561e3fbd5
|
Added a return value to the test binaries (0: success, 1: failure), allowing them to work properly under CTest
|
2016-06-02 16:24:22 +02:00 |
|
Cedric Nugteren
|
03182f9d07
|
Added half-precision tests for the clBLAS reference through conversion to single-precision
|
2016-05-26 23:36:19 +02:00 |
|
Cedric Nugteren
|
b487d4dd44
|
Added half-precision tests for the CBLAS reference through conversion to single-precision
|
2016-05-26 13:15:27 +02:00 |
|
Cedric Nugteren
|
4612ff3552
|
Added the possibility to run the performance client with half-precision
|
2016-05-25 14:37:26 +02:00 |
|
Cedric Nugteren
|
9f87455070
|
Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM
|
2016-05-25 13:29:53 +02:00 |
|
Cedric Nugteren
|
3e9a07f00a
|
Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2
|
2016-05-22 16:59:14 +02:00 |
|
Cedric Nugteren
|
95b828da12
|
Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV
|
2016-05-22 15:38:26 +02:00 |
|
Cedric Nugteren
|
803aaf3070
|
Added level-1 half-precision routines HSWAP/HSCAL/HCOPY/HAXPY/HDOT/HNRM2/HASUM/HSUM/iHAMAX/iHMAX/iHMIN
|
2016-05-22 14:47:14 +02:00 |
|
Cedric Nugteren
|
489c5d76cf
|
Merged in latest changes from 0.7.1 release
|
2016-05-18 21:32:56 +02:00 |
|
Cedric Nugteren
|
120c31a30f
|
Initial experimental version of the half-precision HAXPY routine
|
2016-05-13 20:49:34 +02:00 |
|
Cedric Nugteren
|
f2ba75890c
|
Initial changes in preparation for half-precision fp16 support
|
2016-05-12 19:56:21 +02:00 |
|
cnugteren
|
3b81ee2c08
|
Fixed an issue where the xAMAX tester would incorrectly report failures when testing against CBLAS
|
2016-05-08 18:28:01 +02:00 |
|
cnugteren
|
eaf1de5745
|
Fixed an issue where the xNRM2 and xASUM testers would incorrectly report failures for complex inputs
|
2016-05-08 18:07:55 +02:00 |
|
Cedric Nugteren
|
ed2904a344
|
Added preliminary generated API documentation
|
2016-05-08 09:49:00 +02:00 |
|
Cedric Nugteren
|
aa97c836b1
|
Fixed an issue with linking against the ATLAS BLAS library
|
2016-05-04 19:16:09 +02:00 |
|
Cedric Nugteren
|
27d0ac7f38
|
Added tuning results for AMD Pitcairn (R9 270X)
|
2016-05-01 19:33:50 +02:00 |
|
Cedric Nugteren
|
c94b628318
|
Updated tuning database for reduction/dot kernels based on the new tuner; partially repopulated the database
|
2016-05-01 19:17:04 +02:00 |
|
Cedric Nugteren
|
bee2f943ec
|
Changed the index buffer of the IxAMAX routines to unsigned int for proper buffer-size checking
|
2016-05-01 14:03:37 +02:00 |
|
Cedric Nugteren
|
e113ff0852
|
Added IxMIN, the non-absolute minimum counterpart of the BLAS routine IxAMAX
|
2016-04-30 09:49:39 +02:00 |
|
Cedric Nugteren
|
877aad693f
|
Added FillCache: a function to pre-compile all kernels for a specific device
|
2016-04-29 23:33:12 +02:00 |
|
Cedric Nugteren
|
d7ddbdeb1f
|
Added xSUM and IxMAX, the non-absolute counterparts of the BLAS routines xASUM and IxAMAX
|
2016-04-27 18:07:30 +02:00 |
|
Cedric Nugteren
|
8075934ca7
|
Added prototypes for non-BLAS routines: xSUM and IxMAX (non-absolute counterparts of xASUM and IxAMAX)
|
2016-04-27 17:06:19 +02:00 |
|
Cedric Nugteren
|
82be8f211c
|
Moved all cache-related functions to a separate file; added a ClearCompiledProgramCache function to clear the cache
|
2016-04-27 16:02:13 +02:00 |
|
Cedric Nugteren
|
3555cd0436
|
All CLBlast enum constants now have the same raw values as in the CBLAS standard
|
2016-04-27 11:37:55 +02:00 |
|
cnugteren
|
16a048f1ac
|
Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines
|
2016-04-20 22:12:51 -06:00 |
|
cnugteren
|
894983fc3c
|
Added prototype for ixAMAX routines
|
2016-04-20 21:11:33 -06:00 |
|
cnugteren
|
8be99de82d
|
Added support for the SASUM/DASUM/ScASUM/DzASUM routines
|
2016-04-14 19:58:26 -06:00 |
|
cnugteren
|
e0497807e2
|
Added prototype for xASUM routines
|
2016-04-13 21:44:49 -06:00 |
|
cnugteren
|
a61724ece5
|
Fixed the way the defaults are calculated in the database; added warning for non-matching tuner arguments
|
2016-04-11 22:27:44 -06:00 |
|
cnugteren
|
1d3d38a261
|
Events are now properly implemented using an event wait list, and the user is asked to wait for event completion
|
2016-04-09 22:22:24 -06:00 |
|
cnugteren
|
1a82861a90
|
Added support for testing (performance and correctness) against a CPU BLAS library
|
2016-04-02 11:58:00 -07:00 |
|
cnugteren
|
5c83217cf2
|
Added a wrapper for CBLAS libraries for performance/correctness testing
|
2016-04-01 22:36:39 -07:00 |
|
cnugteren
|
8c3c6db7d0
|
Merge branch 'level1_routines' into development
|
2016-03-30 21:37:56 -07:00 |
|
Cedric Nugteren
|
c1df786764
|
Added prototypes for the xROTM and xROTMG routines
|
2016-03-30 16:13:37 -07:00 |
|
Cedric Nugteren
|
6ecc0d089c
|
Added prototypes for the xROT and xROTG functions
|
2016-03-30 16:13:32 -07:00 |
|
Cedric Nugteren
|
6e5f558746
|
Made event an optional argument in the CLBlast C++ API
|
2016-03-30 16:13:26 -07:00 |
|
Cedric Nugteren
|
aaa687ca98
|
Added preliminary support for the xNRM2 routines
|
2016-03-28 23:00:44 +02:00 |
|
Cedric Nugteren
|
1d5a702d9d
|
Added prototypes for ScNRM2/DzNRM2 routines
|
2016-03-25 10:30:38 +01:00 |
|
Cedric Nugteren
|
3876096c30
|
Added prototypes for SNRM2/DNRM2 routines
|
2016-03-25 10:00:40 +01:00 |
|
Cedric Nugteren
|
49822c8ead
|
Fixed the C API export to properly build a DLL on Windows
|
2016-03-23 20:49:28 +01:00 |
|
Cedric Nugteren
|
d935695417
|
Added __declspec(dllexport) to create a DLL on Windows
|
2016-03-19 11:09:09 +01:00 |
|
Cedric Nugteren
|
306bf67660
|
Added preliminary support for xHPR2 and xSPR2 routines
|
2016-03-06 15:48:11 +01:00 |
|
Cedric Nugteren
|
60da54da5d
|
Added preliminary support for xHER2 and xSYR2 routines
|
2016-03-02 21:18:01 +01:00 |
|
Cedric Nugteren
|
fa79720557
|
Added tuning results for Intel Iris Pro and AMD R9 M370X
|
2016-02-28 16:47:52 +01:00 |
|
Cedric Nugteren
|
e3545215a5
|
Added support for xHER, xHPR, xSYR, and xSPR routines
|
2016-02-28 14:16:48 +01:00 |
|
Cedric Nugteren
|
9f682aa66b
|
Set a proper default precision for the CLBlast clients
|
2016-02-20 14:41:53 +01:00 |
|
Cedric Nugteren
|
6dc44da07b
|
Added support for xGERU and xGERC routines
|
2016-02-20 14:15:41 +01:00 |
|
Cedric Nugteren
|
8854a73127
|
Added XGER routine, kernel, and tuner
|
2016-02-20 12:40:01 +01:00 |
|
Cedric Nugteren
|
165a94c200
|
Various fixes to the database script
|
2016-02-07 16:39:37 +01:00 |
|
Cedric Nugteren
|
00be6f7530
|
Added dictionary with short and long OpenCL vendor names to fix issues with Intel having multiple names
|
2016-02-07 11:59:30 +01:00 |
|
Cedric Nugteren
|
c76f1d9dbb
|
Made the tuning database an optional external download
|
2016-02-07 10:59:51 +01:00 |
|
CNugteren
|
704a729f5c
|
Made the database script compatible with Python 3
|
2016-02-06 13:11:36 +01:00 |
|
Cedric Nugteren
|
276e772a2c
|
Added the first auto-generated database headers from the Python database; only K40 and Iris are supported for now
|
2016-01-30 11:43:21 +01:00 |
|
Cedric Nugteren
|
76c9148030
|
Minor improvements to the database script, including proper file paths
|
2016-01-24 17:56:27 +01:00 |
|
Cedric Nugteren
|
f0b3091cdb
|
Added Python function to compute defaults for a particular device/vendor combination
|
2016-01-24 17:35:31 +01:00 |
|
CNugteren
|
09c94b17cf
|
Added tuning data for Tesla K40
|
2015-10-28 21:20:42 +01:00 |
|
CNugteren
|
bb4e78f737
|
Added initial tuning database with Intel Iris data
|
2015-10-25 16:49:59 +01:00 |
|
CNugteren
|
ccd1a5c7cc
|
Updated tuning database script according to the new JSON format
|
2015-10-25 16:49:29 +01:00 |
|
CNugteren
|
a2d5d7770e
|
Moved the tuner database script to a separate folder
|
2015-10-25 16:27:14 +01:00 |
|
CNugteren
|
2b56c2c603
|
Added TRMV/TBMV/TPMV routines
|
2015-09-26 16:58:03 +02:00 |
|
CNugteren
|
de6547a92b
|
Added SBMV and SPMV routines
|
2015-09-19 18:01:19 +02:00 |
|
CNugteren
|
80da67d28b
|
Added the HPMV routine
|
2015-09-19 17:40:38 +02:00 |
|
CNugteren
|
aebd156869
|
Added the HBMV routine
|
2015-09-19 11:11:34 +02:00 |
|
CNugteren
|
4507ba4997
|
Added a first version of banded matrix-vector multiplication
|
2015-09-18 15:25:20 +02:00 |
|
CNugteren
|
4796c9bcbd
|
Added generated main functions for correctness/performance tests for level-2 routines
|
2015-09-18 10:19:03 +02:00 |
|
CNugteren
|
6105ad6f5b
|
Added the interface of all level-2 routines
|
2015-09-17 17:05:45 +02:00 |
|
CNugteren
|
6307d2e5db
|
Added script to generate API interface and implementation automatically
|
2015-09-17 10:14:33 +02:00 |
|