Cedric Nugteren
|
57f09178d8
|
Added tuning results for AMD Oland and for Intel Graphics HD 530
|
2016-07-10 11:46:44 +02:00 |
|
Cedric Nugteren
|
27854070b4
|
Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen
|
2016-07-06 21:50:12 +02:00 |
|
Cedric Nugteren
|
9683b50c55
|
Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp)
|
2016-07-03 20:30:47 +02:00 |
|
Cedric Nugteren
|
9171f1c160
|
Updated the README in various places
|
2016-06-27 17:28:48 +02:00 |
|
Cedric Nugteren
|
5557a6ae81
|
Added vcvarsall to AppVeyor and added AppVeyor icons to README
|
2016-06-27 14:10:56 +02:00 |
|
Cedric Nugteren
|
7eeb790824
|
Added AppVeyor Windows CI support
|
2016-06-27 12:47:39 +02:00 |
|
Cedric Nugteren
|
5f8886339a
|
Increased coverage of Travis CI automatic builds
|
2016-06-27 12:16:12 +02:00 |
|
Cedric Nugteren
|
69beca90f4
|
Moved the performance graph scripts to the 'scripts' subfolder
|
2016-06-27 11:51:57 +02:00 |
|
Cedric Nugteren
|
66908ef5cd
|
Added tuning results for 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile' (thanks to OursDesCavernes)
|
2016-06-19 14:59:50 +02:00 |
|
Cedric Nugteren
|
61203453aa
|
Renamed all C++ source files to .cpp to match the .hpp extension better
|
2016-06-19 13:55:49 +02:00 |
|
Cedric Nugteren
|
52ccaf5b25
|
Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing
|
2016-06-16 18:07:46 +02:00 |
|
Cedric Nugteren
|
6d6b030053
|
Made the CPU BLAS library the default reference to test against in favor of clBLAS
|
2016-06-08 09:21:39 +02:00 |
|
Cedric Nugteren
|
137d1d8708
|
Added tuning parameters for 'GRID K520' and 'HD Graphics Skylake ULT GT2'
|
2016-06-01 09:39:33 +02:00 |
|
Cedric Nugteren
|
305bf16c4c
|
Separated the performance tests (clients) from the correctness tests in CMake
|
2016-05-30 16:38:26 +02:00 |
|
Cedric Nugteren
|
9f87455070
|
Added level-3 half-precision routines HGEMM/HSYMM/HSYRK/HSYR2K/HTRMM
|
2016-05-25 13:29:53 +02:00 |
|
Cedric Nugteren
|
ac1575056e
|
Added proper argument handling and displaying for half-precision data-types
|
2016-05-24 14:06:16 +02:00 |
|
Cedric Nugteren
|
ae7d705d6f
|
Updated README with information on half-precision support
|
2016-05-23 19:23:46 +02:00 |
|
Cedric Nugteren
|
489c5d76cf
|
Merged in latest changes from 0.7.1 release
|
2016-05-18 21:32:56 +02:00 |
|
Cedric Nugteren
|
1c72d225c5
|
Fixed links in the README
|
2016-05-10 21:03:51 +02:00 |
|
Cedric Nugteren
|
c5730c8b43
|
Updated to version 0.7.0
|
2016-05-08 20:29:41 +02:00 |
|
Cedric Nugteren
|
6c9e08c5e2
|
Added an option to the tests to control whether to test against clBLAS or a CPU BLAS library
|
2016-05-07 12:22:06 +02:00 |
|
Cedric Nugteren
|
435729a43e
|
Added tuning results for AMD Hawaii (R9 290X)
|
2016-05-02 20:20:23 +02:00 |
|
Cedric Nugteren
|
e113ff0852
|
Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX
|
2016-04-30 09:49:39 +02:00 |
|
Cedric Nugteren
|
d7ddbdeb1f
|
Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX
|
2016-04-27 18:07:30 +02:00 |
|
cnugteren
|
16a048f1ac
|
Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines
|
2016-04-20 22:12:51 -06:00 |
|
cnugteren
|
8be99de82d
|
Added support for the SASUM/DASUM/ScASUM/DzASUM routines
|
2016-04-14 19:58:26 -06:00 |
|
cnugteren
|
1d3d38a261
|
Events are now properly implemented using an event waiting list and by asking the user to wait for event completion
|
2016-04-09 22:22:24 -06:00 |
|
cnugteren
|
c4ab9bda63
|
Updated the documentation in light of the support for a reference CPU BLAS library
|
2016-04-03 16:07:25 -07:00 |
|
cnugteren
|
8217b01702
|
Updated the documentation
|
2016-03-31 20:20:32 -07:00 |
|
Cedric Nugteren
|
de7e68e872
|
Updated the README file
|
2016-03-13 10:48:42 +01:00 |
|
Cedric Nugteren
|
306bf67660
|
Added preliminary support for xHPR2 and xSPR2 routines
|
2016-03-06 15:48:11 +01:00 |
|
Cedric Nugteren
|
fa79720557
|
Added tuning results for Intel Iris Pro and AMD R9 M370X
|
2016-02-28 16:47:52 +01:00 |
|
Cedric Nugteren
|
e3545215a5
|
Added support for xHER, xHPR, xSYR, and xSPR routines
|
2016-02-28 14:16:48 +01:00 |
|
Cedric Nugteren
|
6dc44da07b
|
Added support for xGERU and xGERC routines
|
2016-02-20 14:15:41 +01:00 |
|
Cedric Nugteren
|
6f4b34f813
|
Added tuning parameters for various devices using the new database script
|
2016-02-07 16:41:09 +01:00 |
|
Cedric Nugteren
|
44fb40e5c4
|
Prepared for MSVC support
|
2016-01-30 11:54:29 +01:00 |
|
CNugteren
|
2b56c2c603
|
Added TRMV/TBMV/TPMV routines
|
2015-09-26 16:58:03 +02:00 |
|
CNugteren
|
de6547a92b
|
Added SBMV and SPMV routines
|
2015-09-19 18:01:19 +02:00 |
|
CNugteren
|
80da67d28b
|
Added the HPMV routine
|
2015-09-19 17:40:38 +02:00 |
|
CNugteren
|
aebd156869
|
Added the HBMV routine
|
2015-09-19 11:11:34 +02:00 |
|
CNugteren
|
4507ba4997
|
Added first version of banded matrix-vector multiplication
|
2015-09-18 15:25:20 +02:00 |
|
CNugteren
|
224c967584
|
Removed routines from the table which are not supported by clBLAS
|
2015-09-14 17:02:33 +02:00 |
|
CNugteren
|
a2e726d3bd
|
Added xDOT/xDOTU/xDOTC dot-product routines
|
2015-09-14 16:57:00 +02:00 |
|
CNugteren
|
ff0c54c386
|
Added the XSWAP, XSCAL and XCOPY level-1 routines
|
2015-08-22 17:11:20 +02:00 |
|
CNugteren
|
ff1a670e88
|
Updated the documentation
|
2015-08-22 12:40:18 +02:00 |
|
Cedric Nugteren
|
85bd783e0d
|
Merge pull request #22 from CNugteren/travis
Added Travis continuous integration
|
2015-08-19 09:34:01 +02:00 |
|
CNugteren
|
e806bc1ff0
|
Added Travis build-status to the README
|
2015-08-19 09:29:54 +02:00 |
|
CNugteren
|
4242f90215
|
Added the plain C API
|
2015-08-13 18:00:09 +02:00 |
|
CNugteren
|
c52c5f3d35
|
Added HEMV and SYMV
|
2015-07-31 17:41:10 +02:00 |
|
CNugteren
|
a27ce11c69
|
Updated documentation reflecting removal of clBLAS sources
|
2015-07-31 11:15:48 +02:00 |
|