726 Commits (b0f365912159aa4d4435af211ae0a94cfbd8a977)

Author SHA1 Message Date
Cedric Nugteren b0f3659121 The master branch is now the main 'development' branch 2017-05-03 19:49:15 +02:00
Cedric Nugteren 606f2871dd Merge pull request #150 from CNugteren/development; Update to version 0.11.0 2017-05-02 22:39:50 +02:00
Cedric Nugteren e9d2a2f54c Updated to version 0.11.0 2017-05-02 20:29:59 +02:00
Cedric Nugteren c9f39ed13a Merge pull request #148 from CNugteren/benchmarking; Various updates related to benchmarking 2017-04-23 18:29:59 +02:00
Cedric Nugteren 67d4bbff66 Added an option to the database script to remove tuning results from the database 2017-04-23 17:59:16 +02:00
Cedric Nugteren 1c33af6eab Re-added Titan X (Pascal) tuning results based on more averaging when tuning 2017-04-23 17:58:56 +02:00
Cedric Nugteren 049d0fc95a Fixed a compiler warning message 2017-04-23 10:45:08 +02:00
Cedric Nugteren 3eea8dc998 Increased the default number of runs for the tuner from 2 up to 10 for fast kernels 2017-04-22 13:56:07 +02:00
Cedric Nugteren 192199c9cb Fixed the direct vs indirect setting for NVIDIA GPUs 2017-04-22 13:43:27 +02:00
Cedric Nugteren e41d204856 Increased the default number of runs for GEMV tuning; updated GEMV tuning results for Iris Pro 2017-04-21 22:12:20 +02:00
Cedric Nugteren 957aaae6ca Merge branch 'development' into benchmarking 2017-04-21 21:59:48 +02:00
Cedric Nugteren cc9ad7b33b Removed the words SUMMARY from the title of the benchmark script when benchmarking the summary 2017-04-21 21:34:44 +02:00
Cedric Nugteren 4d34083039 Updated the settings for the batched benchmarks 2017-04-20 22:19:29 +02:00
Cedric Nugteren d7314d4f8e Tuned the direct versus indirect GEMM kernel trade-off point for NVIDIA GPUs 2017-04-20 22:19:09 +02:00
Cedric Nugteren 409a5a2ad0 Fixed a namespace clash with CUDA FP16 for the half-datatype 2017-04-17 16:47:15 +02:00
Cedric Nugteren 3ec14df60e Added proper handling of mismatched arguments in the database script 2017-04-17 15:00:45 +02:00
Cedric Nugteren 3e2faa5db8 Set proper settings for the benchmarks of batched routines 2017-04-16 20:40:15 +02:00
Cedric Nugteren 2673f50518 Merge branch 'development' into benchmarking 2017-04-16 19:41:14 +02:00
Cedric Nugteren b20c518f9f Merge pull request #147 from CNugteren/cublas_reference; Added support for performance testing against cuBLAS 2017-04-16 19:38:50 +02:00
Cedric Nugteren e3bb58f602 Finalized support for performance testing against cuBLAS 2017-04-16 17:53:51 +02:00
Cedric Nugteren 063ef729e1 Added settings for benchmarking batched routines 2017-04-16 16:55:49 +02:00
Cedric Nugteren c88ad94338 Added a benchmark-all script to run multiple benchmarks automatically 2017-04-14 22:02:47 +02:00
Cedric Nugteren 5203402c41 Tuned the num-runs settings for the benchmarks 2017-04-14 21:22:02 +02:00
Cedric Nugteren 56b2f46fbf Added output-folder for benchmarking and removed the requirement on X 2017-04-14 20:32:28 +02:00
Cedric Nugteren 8833ae51be Made the number of runs a benchmark-specific setting in the benchmark scripts 2017-04-14 20:16:51 +02:00
Cedric Nugteren 10205d773e Added a new Xaxpy kernel in between the regular and fast version in 2017-04-14 20:16:10 +02:00
Cedric Nugteren f7f8ec644f Fixed CUDA malloc and cuBLAS handles: cuBLAS as a performance-reference now works 2017-04-13 21:31:27 +02:00
Cedric Nugteren f24c142948 Made compilation of the cuBLAS wrapper work properly 2017-04-11 21:50:18 +02:00
Cedric Nugteren 6b625f8915 Added reference implementations for performance-testing against cuBLAS 2017-04-10 22:54:14 +02:00
Cedric Nugteren 22b3ea9256 Merge branch 'development' into cublas_reference; Conflicts: scripts/generator/generator.py 2017-04-10 20:11:45 +02:00
Cedric Nugteren 0da1e38097 Merge pull request #145 from CNugteren/apple_cpu_support; Patch to make tests complete on Apple's CPU implementation 2017-04-10 20:09:40 +02:00
Cedric Nugteren 7374c37e2e Fixed a compilation issue under MSVC and GCC 2017-04-10 08:38:24 +02:00
Cedric Nugteren 2d45c37676 Removed const-vector-of-const-objects from the database class to remain according to the C++11 standard 2017-04-10 07:40:27 +02:00
Cedric Nugteren 300531b869 Updated the changelog with the Apple CPU override 2017-04-10 07:21:34 +02:00
Cedric Nugteren fb6c78ea07 Added a special override database for the Apple CPU implementation on OS X: this makes the test work, it does not focus on good performance 2017-04-07 07:37:30 +02:00
Cedric Nugteren d28ee082b0 Uses float2 and double2 for base complex data-types instead of a custom struct; fixes bug on Apple OpenCL 2017-04-07 07:35:15 +02:00
Cedric Nugteren ce369702d8 Added some missing const-ness 2017-04-07 07:34:32 +02:00
Cedric Nugteren 52dd7433ca Completed the cuBLAS wrapper 2017-04-06 20:56:28 +02:00
Cedric Nugteren dbe22b5bf3 Fixed some size_t to int conversion warnings for the CBLAS interface 2017-04-06 19:40:51 +02:00
Cedric Nugteren 674ff96fdf Added a first version of a cuBLAS wrapper (WIP) 2017-04-05 21:27:25 +02:00
Cedric Nugteren af9a521042 Fixes the CUDA wrapper (now actually tested on a system with CUDA) 2017-04-03 21:46:07 +02:00
Cedric Nugteren 0cebcbcc71 Added proper CMake searching for CUDA and cuBLAS 2017-04-03 21:45:18 +02:00
Cedric Nugteren eb1fda2729 In-lined the float2 and double2 types to avoid collision with CUDA's definitions 2017-04-03 21:44:35 +02:00
Cedric Nugteren b24d364743 Layed the groundwork for cuBLAS comparisons in the clients 2017-04-02 18:06:15 +02:00
Cedric Nugteren c5461d77e5 Factored out inclusion of clBLAS and CBLAS from the test-routine files 2017-04-02 15:24:21 +02:00
Cedric Nugteren a9c25e9fd2 Factored out inclusion of clBLAS and CBLAS from the test-routine files 2017-04-02 15:21:19 +02:00
Cedric Nugteren ea0aeadc34 Merge pull request #144 from CNugteren/matplotlib_graphs; Benchmark scripts re-written in Python/Matplotlib 2017-04-02 15:05:09 +02:00
Cedric Nugteren 5079fbaeff Merge pull request #143 from CNugteren/test_cblas_timing; CBLAS reference code is now separated from device-host copies 2017-04-02 14:59:39 +02:00
Cedric Nugteren 0f96e9d2f9 Various tweaks to the new benchmark script 2017-04-02 14:53:55 +02:00
Cedric Nugteren 1ee71fdc80 Tuned the plots for a tight-layout for in papers and presentations 2017-04-01 14:00:46 +02:00