Cedric Nugteren
|
56aa1701c9
|
Added printing of indices when testing in verbose mode
|
2016-05-05 23:09:57 +02:00 |
|
Cedric Nugteren
|
f18c12389d
|
Merge pull request #57 from dividiti/development
Locate the C BLAS library before the F77 one.
|
2016-05-05 22:27:22 +02:00 |
|
Anton Lokhmotov
|
e075dc347a
|
Locate the C BLAS library before the F77 one.
|
2016-05-05 14:38:10 +00:00 |
|
Cedric Nugteren
|
aa97c836b1
|
Fixed an issue with linking against the ATLAS BLAS library
|
2016-05-04 19:16:09 +02:00 |
|
Cedric Nugteren
|
435729a43e
|
Added tuning results for AMD Hawaii (R9 290X)
|
2016-05-02 20:20:23 +02:00 |
|
Cedric Nugteren
|
a8f109296c
|
Fixed the calculation of the required buffer sizes in case of subvectors and submatrices
|
2016-05-02 20:04:55 +02:00 |
|
Cedric Nugteren
|
27d0ac7f38
|
Added tuning results for AMD Pitcairn (R9 270X)
|
2016-05-01 19:33:50 +02:00 |
|
Cedric Nugteren
|
c94b628318
|
Updated tuning database for reduction/dot kernels based on the new tuner; partially repopulated the database
|
2016-05-01 19:17:04 +02:00 |
|
Cedric Nugteren
|
b9317d7d0c
|
Made the default xDOT tuning size smaller
|
2016-05-01 14:39:44 +02:00 |
|
Cedric Nugteren
|
bee2f943ec
|
Changed the index buffer of IxAMAX routines to unsigned int for proper buffersize checking
|
2016-05-01 14:03:37 +02:00 |
|
Cedric Nugteren
|
9602c150aa
|
Added a program cache (per-context) next to the per-device binary cache
|
2016-05-01 12:56:08 +02:00 |
|
Cedric Nugteren
|
e113ff0852
|
Added non-absolute minimum counter-part IxMIN of the BLAS routine IxAMAX
|
2016-04-30 09:49:39 +02:00 |
|
Cedric Nugteren
|
2952390f27
|
Added an example to demonstrate the use of the ClearCache and FillCache functions
|
2016-04-29 23:33:36 +02:00 |
|
Cedric Nugteren
|
877aad693f
|
Added FillCache: a function to pre-compile all kernels for a specific device
|
2016-04-29 23:33:12 +02:00 |
|
Cedric Nugteren
|
4f528b1730
|
Added sample C programs for the SASUM and DGEMV routines
|
2016-04-29 20:33:19 +02:00 |
|
Cedric Nugteren
|
d9b21d7f49
|
Fixed the cache to store binaries instead of OpenCL programs
|
2016-04-28 21:14:17 +02:00 |
|
Cedric Nugteren
|
d7ddbdeb1f
|
Added non-absolute counter-parts xSUM and IxMAX of the BLAS routines xASUM and IxAMAX
|
2016-04-27 18:07:30 +02:00 |
|
Cedric Nugteren
|
13eed1a0f9
|
Added missing namespace to the SGEMM example
|
2016-04-27 17:59:28 +02:00 |
|
Cedric Nugteren
|
8075934ca7
|
Added prototypes for non-BLAS routines: xSUM and IxMAX (non-absolute counterparts of xASUM and IxAMAX)
|
2016-04-27 17:06:19 +02:00 |
|
Cedric Nugteren
|
82be8f211c
|
Moved all cache-related functions to a separate file; added a ClearCompiledProgramCache function to clear the cache
|
2016-04-27 16:02:13 +02:00 |
|
Cedric Nugteren
|
44bdb60e83
|
Relaxed the absolute error margin for floating-point value comparisons to 1e-4
|
2016-04-27 14:42:30 +02:00 |
|
Cedric Nugteren
|
226e834d0a
|
Added a '-verbose' option to the test binaries to report errors in more detail if needed
|
2016-04-27 14:38:30 +02:00 |
|
Cedric Nugteren
|
3555cd0436
|
All CLBlast enum constants now have the same raw values as in the cblas standard
|
2016-04-27 11:37:55 +02:00 |
|
cnugteren
|
c8e28a33c0
|
Merge branch 'level1_routines' into development
|
2016-04-20 22:14:55 -06:00 |
|
cnugteren
|
16a048f1ac
|
Added support for the iSAMAX/iDAMAX/iCAMAX/iZAMAX routines
|
2016-04-20 22:12:51 -06:00 |
|
cnugteren
|
894983fc3c
|
Added prototype for ixAMAX routines
|
2016-04-20 21:11:33 -06:00 |
|
cnugteren
|
5a4f8217be
|
Updated the reduction-kernel tuner to also tune the epilogue
|
2016-04-14 21:37:52 -06:00 |
|
cnugteren
|
8be99de82d
|
Added support for the SASUM/DASUM/ScASUM/DzASUM routines
|
2016-04-14 19:58:26 -06:00 |
|
cnugteren
|
e0497807e2
|
Added prototype for xASUM routines
|
2016-04-13 21:44:49 -06:00 |
|
cnugteren
|
a61724ece5
|
Fixed the way the defaults are calculated in the database; added warning for non-matching tuner arguments
|
2016-04-11 22:27:44 -06:00 |
|
cnugteren
|
1d3d38a261
|
Events are now properly implemented using an event waiting list and by asking the user to wait for event completion
|
2016-04-09 22:22:24 -06:00 |
|
cnugteren
|
c2cfee76c4
|
Properly set warning flags for Clang
|
2016-04-04 08:39:13 -07:00 |
|
cnugteren
|
90e237b97a
|
Removed redundant queue synchronisation statements
|
2016-04-04 08:38:31 -07:00 |
|
cnugteren
|
2981ca4d3c
|
Merge branch 'cpu_blas' into development
|
2016-04-03 16:08:48 -07:00 |
|
cnugteren
|
c4ab9bda63
|
Updated the documentation in light of the support for a reference CPU BLAS library
|
2016-04-03 16:07:25 -07:00 |
|
cnugteren
|
cf841d1840
|
Added support for detection of CPU BLAS libraries OpenBLAS, BLIS and Accelerate on OS X
|
2016-04-03 15:51:03 -07:00 |
|
cnugteren
|
1a82861a90
|
Added support for testing (performance and correctness) against a CPU BLAS library
|
2016-04-02 11:58:00 -07:00 |
|
cnugteren
|
5c83217cf2
|
Added a wrapper for CBLAS libraries for performance/correctness testing
|
2016-04-01 22:36:39 -07:00 |
|
cnugteren
|
a2056f2216
|
Create a first version of CPU BLAS detection in CMake
|
2016-03-31 22:22:29 -07:00 |
|
cnugteren
|
8217b01702
|
Updated the documentation
|
2016-03-31 20:20:32 -07:00 |
|
cnugteren
|
8c3c6db7d0
|
Merge branch 'level1_routines' into development
|
2016-03-30 21:37:56 -07:00 |
|
cnugteren
|
5409f349a1
|
Fixed the nrm2 kernel for complex data-types
|
2016-03-30 21:32:04 -07:00 |
|
cnugteren
|
6578102ae9
|
CMake now downloads the cl.hpp header from the Khronos website when building the samples
|
2016-03-30 16:24:38 -07:00 |
|
Cedric Nugteren
|
c1df786764
|
Added prototypes for the xROTM and xROTMG routines
|
2016-03-30 16:13:37 -07:00 |
|
Cedric Nugteren
|
6ecc0d089c
|
Added prototypes for the xROT and xROTG functions
|
2016-03-30 16:13:32 -07:00 |
|
Cedric Nugteren
|
6e5f558746
|
Made event an optional argument in the CLBlast C++ API
|
2016-03-30 16:13:26 -07:00 |
|
Cedric Nugteren
|
6f561abada
|
Added missing newline to the end of the public API file
|
2016-03-30 16:13:22 -07:00 |
|
Cedric Nugteren
|
2429ad5025
|
Fixed the passing of OpenCL events to CLBlast functions
|
2016-03-30 16:12:53 -07:00 |
|
Cedric Nugteren
|
aaa687ca98
|
Added preliminary support for the xNRM2 routines
|
2016-03-28 23:00:44 +02:00 |
|
Cedric Nugteren
|
1d5a702d9d
|
Added prototypes for ScNRM2/DzNRM2 routines
|
2016-03-25 10:30:38 +01:00 |
|