Cedric Nugteren
|
bacb5d2bb2
|
Clean-up of the routine class, moved RunKernel to the routine/common file
|
2016-06-18 18:16:14 +02:00 |
|
Cedric Nugteren
|
7b4c0e1cf0
|
Removed the template from the Routine base-class
|
2016-06-18 14:56:55 +02:00 |
|
Cedric Nugteren
|
f9947b4d7f
|
Removed the precision argument from the routines in favor of a single templated function
|
2016-06-17 14:30:37 +02:00 |
|
Cedric Nugteren
|
536b7fe4bc
|
Removed the interface to the cache functions from the Routine class, calls them directly now
|
2016-06-17 13:57:50 +02:00 |
|
Cedric Nugteren
|
98a95c89fc
|
Moved the RunKernel and PadCopyTransposeMatrix functions out of the Routine class
|
2016-06-17 12:32:06 +02:00 |
|
Cedric Nugteren
|
520e28e7a7
|
Moved the ErrorIn function from the Routine class to the utilities header
|
2016-06-17 11:41:10 +02:00 |
|
Cedric Nugteren
|
afe8852eaa
|
Moved the test-for-valid-buffers function from the Routine class to separate functions in a separate file
|
2016-06-17 11:29:07 +02:00 |
|
Cedric Nugteren
|
52ccaf5b25
|
Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing
|
2016-06-16 18:07:46 +02:00 |
|
Cedric Nugteren
|
39b7dbc5e3
|
Added some constness to variables related to the GEMM routines
|
2016-06-15 12:34:05 +02:00 |
|
Cedric Nugteren
|
3e78a99355
|
Moved device vendor and type checks to a common header
|
2016-06-14 14:30:22 +02:00 |
|
Cedric Nugteren
|
6925003e45
|
Added global memory synchronisation for better cache performance on ARM Mali GPUs
|
2016-06-08 10:13:37 +02:00 |
|
Cedric Nugteren
|
137d1d8708
|
Added tuning parameters for 'GRID K520' and 'HD Graphics Skylake ULT GT2'
|
2016-06-01 09:39:33 +02:00 |
|
Cedric Nugteren
|
03182f9d07
|
Added half-precision tests for the clBLAS reference through conversion to single-precision
|
2016-05-26 23:36:19 +02:00 |
|
Cedric Nugteren
|
9f87455070
|
Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM
|
2016-05-25 13:29:53 +02:00 |
|
Cedric Nugteren
|
3e9a07f00a
|
Added level-2 half-precision routines HGER/HSYR/HSPR/HSYR2/HSPR2
|
2016-05-22 16:59:14 +02:00 |
|
Cedric Nugteren
|
f0cb3fdc81
|
Fixed tuning results for half-precision; added first results for the xGER kernels
|
2016-05-22 16:46:05 +02:00 |
|
Cedric Nugteren
|
c8ff3f143f
|
Prepared the GER kernels and tuner for half-precision support
|
2016-05-22 16:18:08 +02:00 |
|
Cedric Nugteren
|
95b828da12
|
Added level-2 half-precision routines HGEMV/HGBMV/HHEMV/HHBMV/HHPMV/HSYMV/HSBMV/HSPMV/HTRMV/HTBMV/HTPMV
|
2016-05-22 15:38:26 +02:00 |
|
Cedric Nugteren
|
b6268d0c22
|
Added first tuning results for the half-precision xGEMV kernels
|
2016-05-22 15:29:05 +02:00 |
|
Cedric Nugteren
|
88551b4005
|
Prepared the GEMV kernels and tuner for half-precision support
|
2016-05-22 15:22:54 +02:00 |
|
Cedric Nugteren
|
803aaf3070
|
Added level-1 half-precision routines HSWAP/HSCAL/HCOPY/HAXPY/HDOT/HNRM2/HASUM/HSUM/iHAMAX/iHMAX/iHMIN
|
2016-05-22 14:47:14 +02:00 |
|
Cedric Nugteren
|
3c9e63c054
|
Added first tuning results for the half-precision xDOT kernels
|
2016-05-22 14:43:25 +02:00 |
|
Cedric Nugteren
|
489c5d76cf
|
Merged in latest changes from 0.7.1 release
|
2016-05-18 21:32:56 +02:00 |
|
Cedric Nugteren
|
7a3b695db7
|
Added half precision tuning results for supporting kernels (pad, copy, transpose, padtranspose)
|
2016-05-16 12:45:10 +02:00 |
|
Cedric Nugteren
|
4b6bdd83a2
|
Added header with conversions from and to half-precision floating-point
|
2016-05-15 20:13:57 +02:00 |
|
Cedric Nugteren
|
5e1b2e021f
|
Set kernel arguments for AXPY as constant memory buffers, making it possible to transfer half-precision values as well
|
2016-05-14 18:06:00 +02:00 |
|
Cedric Nugteren
|
120c31a30f
|
Initial experimental version of the half-precision HAXPY routine
|
2016-05-13 20:49:34 +02:00 |
|
Cedric Nugteren
|
f2ba75890c
|
Initial changes in preparation for half-precision fp16 support
|
2016-05-12 19:56:21 +02:00 |
|
Cedric Nugteren
|
435729a43e
|
Added tuning results for AMD Hawaii (R9 290X)
|
2016-05-02 20:20:23 +02:00 |
|
Cedric Nugteren
|
27d0ac7f38
|
Added tuning results for AMD Pitcairn (R9 270X)
|
2016-05-01 19:33:50 +02:00 |
|
Cedric Nugteren
|
c94b628318
|
Updated tuning database for reduction/dot kernels based on the new tuner; partially repopulated the database
|
2016-05-01 19:17:04 +02:00 |
|
Cedric Nugteren
|
bee2f943ec
|
Changed the index buffer of IxAMAX routines to unsigned int for proper buffersize checking
|
2016-05-01 14:03:37 +02:00 |
|
Cedric Nugteren
|
9602c150aa
|
Added a program cache (per-context) next to the per-device binary cache
|
2016-05-01 12:56:08 +02:00 |
|
Cedric Nugteren
|
e113ff0852
|
Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX
|
2016-04-30 09:49:39 +02:00 |
|
Cedric Nugteren
|
877aad693f
|
Added FillCache: a function to pre-compile all kernels for a specific device
|
2016-04-29 23:33:12 +02:00 |
|
Cedric Nugteren
|
d9b21d7f49
|
Fixed the cache to store binaries instead of OpenCL programs
|
2016-04-28 21:14:17 +02:00 |
|
Cedric Nugteren
|
d7ddbdeb1f
|
Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX
|
2016-04-27 18:07:30 +02:00 |
|
Cedric Nugteren
|
8075934ca7
|
Added prototypes for non-BLAS routines: xSUM and IxMAX (non-absolute counterparts of xASUM and IxAMAX)
|
2016-04-27 17:06:19 +02:00 |
|
Cedric Nugteren
|
82be8f211c
|
Moved all cache-related functions to a separate file; added a ClearCompiledProgramCache function to clear the cache
|
2016-04-27 16:02:13 +02:00 |
|
Cedric Nugteren
|
226e834d0a
|
Added a '-verbose' option to the test binaries to report errors in more detail if needed
|
2016-04-27 14:38:30 +02:00 |
|
Cedric Nugteren
|
3555cd0436
|
All CLBlast enum constants now have the same raw values as in the cblas standard
|
2016-04-27 11:37:55 +02:00 |
|
cnugteren
|
16a048f1ac
|
Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines
|
2016-04-20 22:12:51 -06:00 |
|
cnugteren
|
894983fc3c
|
Added prototype for ixAMAX routines
|
2016-04-20 21:11:33 -06:00 |
|
cnugteren
|
8be99de82d
|
Added support for the SASUM/DASUM/ScASUM/DzASUM routines
|
2016-04-14 19:58:26 -06:00 |
|
cnugteren
|
e0497807e2
|
Added prototype for xASUM routines
|
2016-04-13 21:44:49 -06:00 |
|
cnugteren
|
a61724ece5
|
Fixed the way the defaults are calculated in the database; added warning for non-matching tuner arguments
|
2016-04-11 22:27:44 -06:00 |
|
cnugteren
|
1d3d38a261
|
Events are now properly implemented using event waiting list and asking the user to wait for event completion
|
2016-04-09 22:22:24 -06:00 |
|
cnugteren
|
1a82861a90
|
Added support for testing (performance and correctness) against a CPU BLAS library
|
2016-04-02 11:58:00 -07:00 |
|
cnugteren
|
5c83217cf2
|
Added a wrapper for CBLAS libraries for performance/correctness testing
|
2016-04-01 22:36:39 -07:00 |
|
cnugteren
|
8c3c6db7d0
|
Merge branch 'level1_routines' into development
|
2016-03-30 21:37:56 -07:00 |
|
Cedric Nugteren
|
c1df786764
|
Added prototypes for the xROTM and xROTMG routines
|
2016-03-30 16:13:37 -07:00 |
|
Cedric Nugteren
|
6ecc0d089c
|
Added prototypes for the xROT and xROTG functions
|
2016-03-30 16:13:32 -07:00 |
|
Cedric Nugteren
|
6e5f558746
|
Made event an optional argument in the CLBlast C++ API
|
2016-03-30 16:13:26 -07:00 |
|
Cedric Nugteren
|
6f561abada
|
Added missing newline to the end of the public API file
|
2016-03-30 16:13:22 -07:00 |
|
Cedric Nugteren
|
2429ad5025
|
Fixed the passing of OpenCL events to CLBlast functions
|
2016-03-30 16:12:53 -07:00 |
|
Cedric Nugteren
|
aaa687ca98
|
Added preliminary support for the xNRM2 routines
|
2016-03-28 23:00:44 +02:00 |
|
Cedric Nugteren
|
1d5a702d9d
|
Added prototypes for ScNRM2/DzNRM2 routines
|
2016-03-25 10:30:38 +01:00 |
|
Cedric Nugteren
|
3876096c30
|
Added prototypes for SNRM2/DNRM2 routines
|
2016-03-25 10:00:40 +01:00 |
|
Cedric Nugteren
|
49822c8ead
|
Fixed the C-api export to be able to properly build a DLL on Windows
|
2016-03-23 20:49:28 +01:00 |
|
Cedric Nugteren
|
d935695417
|
Added __declspec(dllexport) to create a DLL on Windows
|
2016-03-19 11:09:09 +01:00 |
|
Cedric Nugteren
|
918797735d
|
Made the library thread-safe by guarding the kernel cache with a mutex
|
2016-03-14 22:55:22 +01:00 |
|
Cedric Nugteren
|
88c551cdea
|
Added tuning results for the newest xGER family kernels
|
2016-03-12 16:23:58 +01:00 |
|
Cedric Nugteren
|
83c6a51765
|
Added tuning results for the ARM Mali-T628 GPU
|
2016-03-12 15:10:35 +01:00 |
|
Cedric Nugteren
|
306bf67660
|
Added preliminary support for xHPR2 and xSPR2 routines
|
2016-03-06 15:48:11 +01:00 |
|
Cedric Nugteren
|
60da54da5d
|
Added preliminary support for xHER2 and xSYR2 routines
|
2016-03-02 21:18:01 +01:00 |
|
Cedric Nugteren
|
fa79720557
|
Added tuning results for Intel Iris Pro and AMD R9 M370X
|
2016-02-28 16:47:52 +01:00 |
|
Cedric Nugteren
|
e3545215a5
|
Added support for xHER, xHPR, xSYR, and xSPR routines
|
2016-02-28 14:16:48 +01:00 |
|
Cedric Nugteren
|
cef78c7356
|
Fixed a compilation issue under AppleClang
|
2016-02-28 14:14:50 +01:00 |
|
Cedric Nugteren
|
9f682aa66b
|
Set a proper default precision for the CLBlast clients
|
2016-02-20 14:41:53 +01:00 |
|
Cedric Nugteren
|
6dc44da07b
|
Added support for xGERU and xGERC routines
|
2016-02-20 14:15:41 +01:00 |
|
Cedric Nugteren
|
8854a73127
|
Added XGER routine, kernel, and tuner
|
2016-02-20 12:40:01 +01:00 |
|
Cedric Nugteren
|
6f4b34f813
|
Added tuning parameters for various devices using the new database script
|
2016-02-07 16:41:09 +01:00 |
|
Cedric Nugteren
|
00be6f7530
|
Added dictionary with short and long OpenCL vendor names to fix issues with Intel having multiple names
|
2016-02-07 11:59:30 +01:00 |
|
CNugteren
|
fbf071ba62
|
Fixed a linker error in the performance client under GCC
|
2016-02-06 10:53:44 +01:00 |
|
Cedric Nugteren
|
310d05d187
|
Updated to version 4.0 of the CLCudaAPI header
|
2016-01-30 11:52:21 +01:00 |
|
Cedric Nugteren
|
276e772a2c
|
Added first auto-generated database headers from the Python database; only K40 and Iris supported now
|
2016-01-30 11:43:21 +01:00 |
|
CNugteren
|
9bf6be8426
|
Added alpha and beta to tuner meta-data
|
2015-10-23 11:01:44 +02:00 |
|
CNugteren
|
f74c9a5640
|
Routine names are now all default arguments defined in the header
|
2015-10-12 08:35:58 +02:00 |
|
CNugteren
|
2b56c2c603
|
Added TRMV/TBMV/TPMV routines
|
2015-09-26 16:58:03 +02:00 |
|
CNugteren
|
04d28b0420
|
Made buffer copying a const-method for the source
|
2015-09-26 16:48:11 +02:00 |
|
CNugteren
|
de6547a92b
|
Added SBMV and SPMV routines
|
2015-09-19 18:01:19 +02:00 |
|
CNugteren
|
80da67d28b
|
Added the HPMV routine
|
2015-09-19 17:40:38 +02:00 |
|
CNugteren
|
c32c4a9739
|
Added infrastructure for packed matrices
|
2015-09-19 17:37:42 +02:00 |
|
CNugteren
|
aebd156869
|
Added the HBMV routine
|
2015-09-19 11:11:34 +02:00 |
|
CNugteren
|
93dddda63e
|
Improved the organization and performance of level 2 routines
|
2015-09-18 17:46:41 +02:00 |
|
CNugteren
|
4507ba4997
|
Added first version of banded matrix-vector multiplication
|
2015-09-18 15:25:20 +02:00 |
|
CNugteren
|
6105ad6f5b
|
Added interface of all level 2 routines
|
2015-09-17 17:05:45 +02:00 |
|
CNugteren
|
6307d2e5db
|
Added script to generate API interface and implementation automatically
|
2015-09-17 10:14:33 +02:00 |
|
CNugteren
|
a2e726d3bd
|
Added xDOT/xDOTU/xDOTC dot-product routines
|
2015-09-14 16:57:00 +02:00 |
|
CNugteren
|
2a383f3450
|
Added extra temporary buffer to tuners in preparation of Xdot routines
|
2015-09-14 15:53:34 +02:00 |
|
CNugteren
|
e0c5312abb
|
Added support for the dot buffer and offset argument
|
2015-09-14 12:28:50 +02:00 |
|
CNugteren
|
ff0c54c386
|
Added the XSWAP, XSCAL and XCOPY level-1 routines
|
2015-08-22 17:11:20 +02:00 |
|
Cedric Nugteren
|
cf168fca70
|
Merge pull request #23 from CNugteren/tuner_database
Added initial version of a tuner-database
|
2015-08-20 08:38:18 +02:00 |
|
CNugteren
|
798a3b6101
|
Add check for supported precision to the tuners
|
2015-08-19 19:35:08 +02:00 |
|
CNugteren
|
b46de22433
|
Moved precision tester to utilities
|
2015-08-19 19:34:29 +02:00 |
|
CNugteren
|
8a02db0746
|
Added precision to the JSON output
|
2015-08-19 11:12:42 +02:00 |
|
CNugteren
|
603e389545
|
Added all supported routines to the C API
|
2015-08-13 17:58:46 +02:00 |
|
CNugteren
|
8617195ac5
|
Added initial version of C API with just one routine
|
2015-08-13 13:46:13 +02:00 |
|
CNugteren
|
f85d44f602
|
Added argument m,n,k metadata to JSON files
|
2015-08-13 08:33:04 +02:00 |
|
CNugteren
|
dbdb58c600
|
Refactored the tuners, added JSON output
|
2015-08-09 15:50:41 +02:00 |
|