Cedric Nugteren
|
b20c518f9f
|
Merge pull request #147 from CNugteren/cublas_reference
Added support for performance testing against cuBLAS
|
2017-04-16 19:38:50 +02:00 |
|
Cedric Nugteren
|
e3bb58f602
|
Finalized support for performance testing against cuBLAS
|
2017-04-16 17:53:51 +02:00 |
|
Cedric Nugteren
|
063ef729e1
|
Added settings for benchmarking batched routines
|
2017-04-16 16:55:49 +02:00 |
|
Cedric Nugteren
|
c88ad94338
|
Added a benchmark-all script to run multiple benchmarks automatically
|
2017-04-14 22:02:47 +02:00 |
|
Cedric Nugteren
|
5203402c41
|
Tuned the num-runs settings for the benchmarks
|
2017-04-14 21:22:02 +02:00 |
|
Cedric Nugteren
|
56b2f46fbf
|
Added output-folder for benchmarking and removed the requirement on X
|
2017-04-14 20:32:28 +02:00 |
|
Cedric Nugteren
|
8833ae51be
|
Made the number of runs a benchmark-specific setting in the benchmark scripts
|
2017-04-14 20:16:51 +02:00 |
|
Cedric Nugteren
|
10205d773e
|
Added a new Xaxpy kernel in between the regular and fast version in
|
2017-04-14 20:16:10 +02:00 |
|
Cedric Nugteren
|
f7f8ec644f
|
Fixed CUDA malloc and cuBLAS handles: cuBLAS as a performance-reference now works
|
2017-04-13 21:31:27 +02:00 |
|
Cedric Nugteren
|
f24c142948
|
Made compilation of the cuBLAS wrapper work properly
|
2017-04-11 21:50:18 +02:00 |
|
Cedric Nugteren
|
6b625f8915
|
Added reference implementations for performance-testing against cuBLAS
|
2017-04-10 22:54:14 +02:00 |
|
Cedric Nugteren
|
22b3ea9256
|
Merge branch 'development' into cublas_reference
Conflicts:
scripts/generator/generator.py
|
2017-04-10 20:11:45 +02:00 |
|
Cedric Nugteren
|
0da1e38097
|
Merge pull request #145 from CNugteren/apple_cpu_support
Patch to make tests complete on Apple's CPU implementation
|
2017-04-10 20:09:40 +02:00 |
|
Cedric Nugteren
|
7374c37e2e
|
Fixed a compilation issue under MSVC and GCC
|
2017-04-10 08:38:24 +02:00 |
|
Cedric Nugteren
|
2d45c37676
|
Removed const-vector-of-const-objects from the database class to remain compliant with the C++11 standard
|
2017-04-10 07:40:27 +02:00 |
|
Cedric Nugteren
|
300531b869
|
Updated the changelog with the Apple CPU override
|
2017-04-10 07:21:34 +02:00 |
|
Cedric Nugteren
|
fb6c78ea07
|
Added a special override database for the Apple CPU implementation on OS X: this makes the test work, it does not focus on good performance
|
2017-04-07 07:37:30 +02:00 |
|
Cedric Nugteren
|
d28ee082b0
|
Uses float2 and double2 for base complex data-types instead of a custom struct; fixes bug on Apple OpenCL
|
2017-04-07 07:35:15 +02:00 |
|
Cedric Nugteren
|
ce369702d8
|
Added some missing const-ness
|
2017-04-07 07:34:32 +02:00 |
|
Cedric Nugteren
|
52dd7433ca
|
Completed the cuBLAS wrapper
|
2017-04-06 20:56:28 +02:00 |
|
Cedric Nugteren
|
dbe22b5bf3
|
Fixed some size_t to int conversion warnings for the CBLAS interface
|
2017-04-06 19:40:51 +02:00 |
|
Cedric Nugteren
|
674ff96fdf
|
Added a first version of a cuBLAS wrapper (WIP)
|
2017-04-05 21:27:25 +02:00 |
|
Cedric Nugteren
|
af9a521042
|
Fixes the CUDA wrapper (now actually tested on a system with CUDA)
|
2017-04-03 21:46:07 +02:00 |
|
Cedric Nugteren
|
0cebcbcc71
|
Added proper CMake searching for CUDA and cuBLAS
|
2017-04-03 21:45:18 +02:00 |
|
Cedric Nugteren
|
eb1fda2729
|
In-lined the float2 and double2 types to avoid collision with CUDA's definitions
|
2017-04-03 21:44:35 +02:00 |
|
Cedric Nugteren
|
b24d364743
|
Laid the groundwork for cuBLAS comparisons in the clients
|
2017-04-02 18:06:15 +02:00 |
|
Cedric Nugteren
|
c5461d77e5
|
Factored out inclusion of clBLAS and CBLAS from the test-routine files
|
2017-04-02 15:24:21 +02:00 |
|
Cedric Nugteren
|
a9c25e9fd2
|
Factored out inclusion of clBLAS and CBLAS from the test-routine files
|
2017-04-02 15:21:19 +02:00 |
|
Cedric Nugteren
|
ea0aeadc34
|
Merge pull request #144 from CNugteren/matplotlib_graphs
Benchmark scripts re-written in Python/Matplotlib
|
2017-04-02 15:05:09 +02:00 |
|
Cedric Nugteren
|
5079fbaeff
|
Merge pull request #143 from CNugteren/test_cblas_timing
CBLAS reference code is now separated from device-host copies
|
2017-04-02 14:59:39 +02:00 |
|
Cedric Nugteren
|
0f96e9d2f9
|
Various tweaks to the new benchmark script
|
2017-04-02 14:53:55 +02:00 |
|
Cedric Nugteren
|
1ee71fdc80
|
Tuned the plots for a tight-layout for use in papers and presentations
|
2017-04-01 14:00:46 +02:00 |
|
Cedric Nugteren
|
b84d2296b8
|
Separated host-device and device-host memory copies from execution of the CBLAS reference code; for fair timing and code de-duplication
|
2017-04-01 13:36:24 +02:00 |
|
Cedric Nugteren
|
fa5c4b00b7
|
Replaced the R graph scripts with Python/Matplotlib benchmark scripts
|
2017-03-26 15:36:34 +02:00 |
|
Cedric Nugteren
|
a98c00a267
|
Fixed a GCC/MSVC compilation issue
|
2017-03-20 19:53:55 +01:00 |
|
Cedric Nugteren
|
a21d903796
|
Merge pull request #142 from CNugteren/gemm_batched
Added a first batched version of the GEMM routine
|
2017-03-19 18:27:40 +01:00 |
|
Cedric Nugteren
|
0610447a7a
|
Fixed a compilation issue for GCC/MSVC
|
2017-03-19 17:37:52 +01:00 |
|
Cedric Nugteren
|
c27d2f0c1e
|
Added an (optional) non-direct implementation of the batched GEMM routine
|
2017-03-19 16:04:04 +01:00 |
|
Cedric Nugteren
|
2fd04dae83
|
Added batched versions of the pad/copy/transpose kernels
|
2017-03-19 15:57:44 +01:00 |
|
Cedric Nugteren
|
11bb30e72b
|
Added the possibility to tune batched kernels
|
2017-03-14 20:29:51 +01:00 |
|
Cedric Nugteren
|
068ff32e9f
|
Fixed a linker issue for Clang
|
2017-03-12 10:41:18 +01:00 |
|
Cedric Nugteren
|
7b8f8fce68
|
Added initial naive version of the batched GEMM routine based on the direct GEMM kernel
|
2017-03-11 16:02:45 +01:00 |
|
Cedric Nugteren
|
49e04c7fce
|
Added API and test infrastructure for the batched GEMM routine
|
2017-03-10 21:24:35 +01:00 |
|
Cedric Nugteren
|
de3500ed18
|
Merge pull request #141 from CNugteren/axpy_batched
Added the batched version of the AXPY routine
|
2017-03-10 21:15:29 +01:00 |
|
Cedric Nugteren
|
3846f44eaf
|
Small fix for a file that isn't currently compiled anymore
|
2017-03-10 20:53:20 +01:00 |
|
Cedric Nugteren
|
d754586b49
|
Added proper testing of the alpha parameter; finalized the batched AXPY implementation
|
2017-03-10 20:49:59 +01:00 |
|
Cedric Nugteren
|
92a657290a
|
Fixed a small compilation bug for MSVC related to a floating-point constant
|
2017-03-10 20:30:10 +01:00 |
|
Cedric Nugteren
|
878d93e7dc
|
Implemented a batched version of the AXPY kernel
|
2017-03-08 20:36:35 +01:00 |
|
Cedric Nugteren
|
fa0a9c689f
|
Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes
|
2017-03-08 20:10:20 +01:00 |
|
Cedric Nugteren
|
6aba0bbae7
|
Minor fixes to the client w.r.t. the addition of the batch count
|
2017-03-05 16:44:16 +01:00 |
|