741 commits

Author SHA1 Message Date
Kirill Mavreshko 628e1e8cce Fixes inability to run GEMM on multiple identical GPUs (issue #155) 2017-05-26 15:04:19 +05:00
Cedric Nugteren 9c703a6021 Merge pull request #156 from ctuning/master: changing "wb" to "w" when saving json file (text mode) 2017-05-24 20:18:41 +02:00
Grigori Fursin 35e2e6c3a4 changing "wb" to "w" when saving json file (text mode) - compatibility for Python 3 2017-05-24 15:08:34 +02:00
Cedric Nugteren 953a5a9c22 Fixed a minor compilation issue of a sample with GCC 4.8 2017-05-15 22:14:17 +02:00
Cedric Nugteren 8400ee3a09 Fixed a TRSM issue caused by incorrect block size calculation 2017-05-15 22:04:55 +02:00
Cedric Nugteren 512b83dbad Fixed a missing synchronization barrier in the invert kernel; fixes TRSM tests 2017-05-14 20:27:35 +02:00
Cedric Nugteren f151e56daa Added the IxAMIN routines: absolute minimum version of IxAMAX 2017-05-12 20:01:33 -07:00
Cedric Nugteren 86e8df60f1 Fixed a bug in the TRSM routine; tests now pass 2017-05-12 17:43:56 -07:00
Cedric Nugteren 81d9ed3946 Removed the included performance reports; README now redirects to the new external website 2017-05-12 13:18:10 -07:00
Cedric Nugteren 71933c3411 Added tuning results for the AMD Radeon Fiji GPU 2017-05-11 22:53:52 -07:00
Cedric Nugteren d67455fdb8 Fixes the build-status table in the README 2017-05-11 22:22:10 -07:00
Cedric Nugteren 93c8db7fe7 Bug-fix in the half-precision test of the amax routine 2017-05-11 22:19:15 -07:00
Cedric Nugteren 1df28a15fc Re-added random tuning for GEMM after accidental removal 2017-05-11 22:12:38 -07:00
Cedric Nugteren 97955fc221 Minor naming fixes to the benchmark script 2017-05-11 22:12:16 -07:00
Cedric Nugteren 81f598eceb Merge branch 'master_is_neww_devel_branch' 2017-05-11 21:41:18 -07:00
Cedric Nugteren b0f3659121 The master branch is now the main 'development' branch 2017-05-03 19:49:15 +02:00
Cedric Nugteren 606f2871dd Merge pull request #150 from CNugteren/development: Update to version 0.11.0 2017-05-02 22:39:50 +02:00
Cedric Nugteren e9d2a2f54c Updated to version 0.11.0 2017-05-02 20:29:59 +02:00
Cedric Nugteren c9f39ed13a Merge pull request #148 from CNugteren/benchmarking: Various updates related to benchmarking 2017-04-23 18:29:59 +02:00
Cedric Nugteren 67d4bbff66 Added an option to the database script to remove tuning results from the database 2017-04-23 17:59:16 +02:00
Cedric Nugteren 1c33af6eab Re-added Titan X (Pascal) tuning results based on more averaging when tuning 2017-04-23 17:58:56 +02:00
Cedric Nugteren 049d0fc95a Fixed a compiler warning message 2017-04-23 10:45:08 +02:00
Cedric Nugteren 3eea8dc998 Increased the default number of runs for the tuner from 2 up to 10 for fast kernels 2017-04-22 13:56:07 +02:00
Cedric Nugteren 192199c9cb Fixed the direct vs indirect setting for NVIDIA GPUs 2017-04-22 13:43:27 +02:00
Cedric Nugteren e41d204856 Increased the default number of runs for GEMV tuning; updated GEMV tuning results for Iris Pro 2017-04-21 22:12:20 +02:00
Cedric Nugteren 957aaae6ca Merge branch 'development' into benchmarking 2017-04-21 21:59:48 +02:00
Cedric Nugteren cc9ad7b33b Removed the word SUMMARY from the title of the benchmark script when benchmarking the summary 2017-04-21 21:34:44 +02:00
Cedric Nugteren 4d34083039 Updated the settings for the batched benchmarks 2017-04-20 22:19:29 +02:00
Cedric Nugteren d7314d4f8e Tuned the direct versus indirect GEMM kernel trade-off point for NVIDIA GPUs 2017-04-20 22:19:09 +02:00
Cedric Nugteren 409a5a2ad0 Fixed a namespace clash with CUDA FP16 for the half-datatype 2017-04-17 16:47:15 +02:00
Cedric Nugteren 3ec14df60e Added proper handling of mismatched arguments in the database script 2017-04-17 15:00:45 +02:00
Cedric Nugteren 3e2faa5db8 Set proper settings for the benchmarks of batched routines 2017-04-16 20:40:15 +02:00
Cedric Nugteren 2673f50518 Merge branch 'development' into benchmarking 2017-04-16 19:41:14 +02:00
Cedric Nugteren b20c518f9f Merge pull request #147 from CNugteren/cublas_reference: Added support for performance testing against cuBLAS 2017-04-16 19:38:50 +02:00
Cedric Nugteren e3bb58f602 Finalized support for performance testing against cuBLAS 2017-04-16 17:53:51 +02:00
Cedric Nugteren 063ef729e1 Added settings for benchmarking batched routines 2017-04-16 16:55:49 +02:00
Cedric Nugteren c88ad94338 Added a benchmark-all script to run multiple benchmarks automatically 2017-04-14 22:02:47 +02:00
Cedric Nugteren 5203402c41 Tuned the num-runs settings for the benchmarks 2017-04-14 21:22:02 +02:00
Cedric Nugteren 56b2f46fbf Added output-folder for benchmarking and removed the requirement on X 2017-04-14 20:32:28 +02:00
Cedric Nugteren 8833ae51be Made the number of runs a benchmark-specific setting in the benchmark scripts 2017-04-14 20:16:51 +02:00
Cedric Nugteren 10205d773e Added a new Xaxpy kernel in between the regular and fast version in 2017-04-14 20:16:10 +02:00
Cedric Nugteren f7f8ec644f Fixed CUDA malloc and cuBLAS handles: cuBLAS as a performance-reference now works 2017-04-13 21:31:27 +02:00
Cedric Nugteren f24c142948 Made compilation of the cuBLAS wrapper work properly 2017-04-11 21:50:18 +02:00
Cedric Nugteren 6b625f8915 Added reference implementations for performance-testing against cuBLAS 2017-04-10 22:54:14 +02:00
Cedric Nugteren 22b3ea9256 Merge branch 'development' into cublas_reference (conflicts: scripts/generator/generator.py) 2017-04-10 20:11:45 +02:00
Cedric Nugteren 0da1e38097 Merge pull request #145 from CNugteren/apple_cpu_support: Patch to make tests complete on Apple's CPU implementation 2017-04-10 20:09:40 +02:00
Cedric Nugteren 7374c37e2e Fixed a compilation issue under MSVC and GCC 2017-04-10 08:38:24 +02:00
Cedric Nugteren 2d45c37676 Removed const-vector-of-const-objects from the database class to remain according to the C++11 standard 2017-04-10 07:40:27 +02:00
Cedric Nugteren 300531b869 Updated the changelog with the Apple CPU override 2017-04-10 07:21:34 +02:00
Cedric Nugteren fb6c78ea07 Added a special override database for the Apple CPU implementation on OS X: this makes the test work, it does not focus on good performance 2017-04-07 07:37:30 +02:00