Cedric Nugteren
|
8400ee3a09
|
Fixed a TRSM issue caused by incorrect block size calculation
|
2017-05-15 22:04:55 +02:00 |
|
Cedric Nugteren
|
512b83dbad
|
Fixed a missing synchronization barrier in the invert kernel; fixes TRSM tests
|
2017-05-14 20:27:35 +02:00 |
|
Cedric Nugteren
|
f151e56daa
|
Added the IxAMIN routines: absolute minimum version of IxAMAX
|
2017-05-12 20:01:33 -07:00 |
|
Cedric Nugteren
|
86e8df60f1
|
Fixed a bug in the TRSM routine; tests now pass
|
2017-05-12 17:43:56 -07:00 |
|
Cedric Nugteren
|
81d9ed3946
|
Removed the included performance reports; README now redirects to the new external website
|
2017-05-12 13:18:10 -07:00 |
|
Cedric Nugteren
|
71933c3411
|
Added tuning results for the AMD Radeon Fiji GPU
|
2017-05-11 22:53:52 -07:00 |
|
Cedric Nugteren
|
d67455fdb8
|
Fixes the build-status table in the README
|
2017-05-11 22:22:10 -07:00 |
|
Cedric Nugteren
|
93c8db7fe7
|
Bug-fix in the half-precision test of the amax routine
|
2017-05-11 22:19:15 -07:00 |
|
Cedric Nugteren
|
1df28a15fc
|
Re-added random tuning for GEMM after accidental removal
|
2017-05-11 22:12:38 -07:00 |
|
Cedric Nugteren
|
97955fc221
|
Minor naming fixes to the benchmark script
|
2017-05-11 22:12:16 -07:00 |
|
Cedric Nugteren
|
81f598eceb
|
Merge branch 'master_is_neww_devel_branch'
|
2017-05-11 21:41:18 -07:00 |
|
Cedric Nugteren
|
b0f3659121
|
The master branch is now the main 'development' branch
|
2017-05-03 19:49:15 +02:00 |
|
Cedric Nugteren
|
606f2871dd
|
Merge pull request #150 from CNugteren/development
Update to version 0.11.0
|
2017-05-02 22:39:50 +02:00 |
|
Cedric Nugteren
|
e9d2a2f54c
|
Updated to version 0.11.0
|
2017-05-02 20:29:59 +02:00 |
|
Cedric Nugteren
|
c9f39ed13a
|
Merge pull request #148 from CNugteren/benchmarking
Various updates related to benchmarking
|
2017-04-23 18:29:59 +02:00 |
|
Cedric Nugteren
|
67d4bbff66
|
Added an option to the database script to remove tuning results from the database
|
2017-04-23 17:59:16 +02:00 |
|
Cedric Nugteren
|
1c33af6eab
|
Re-added Titan X (Pascal) tuning results based on more averaging when tuning
|
2017-04-23 17:58:56 +02:00 |
|
Cedric Nugteren
|
049d0fc95a
|
Fixed a compiler warning message
|
2017-04-23 10:45:08 +02:00 |
|
Cedric Nugteren
|
3eea8dc998
|
Increased the default number of runs for the tuner from 2 up to 10 for fast kernels
|
2017-04-22 13:56:07 +02:00 |
|
Cedric Nugteren
|
192199c9cb
|
Fixed the direct vs indirect setting for NVIDIA GPUs
|
2017-04-22 13:43:27 +02:00 |
|
Cedric Nugteren
|
e41d204856
|
Increased the default number of runs for GEMV tuning; updated GEMV tuning results for Iris Pro
|
2017-04-21 22:12:20 +02:00 |
|
Cedric Nugteren
|
957aaae6ca
|
Merge branch 'development' into benchmarking
|
2017-04-21 21:59:48 +02:00 |
|
Cedric Nugteren
|
cc9ad7b33b
|
Removed the word SUMMARY from the title of the benchmark script when benchmarking the summary
|
2017-04-21 21:34:44 +02:00 |
|
Cedric Nugteren
|
4d34083039
|
Updated the settings for the batched benchmarks
|
2017-04-20 22:19:29 +02:00 |
|
Cedric Nugteren
|
d7314d4f8e
|
Tuned the direct versus indirect GEMM kernel trade-off point for NVIDIA GPUs
|
2017-04-20 22:19:09 +02:00 |
|
Cedric Nugteren
|
409a5a2ad0
|
Fixed a namespace clash with CUDA FP16 for the half-datatype
|
2017-04-17 16:47:15 +02:00 |
|
Cedric Nugteren
|
3ec14df60e
|
Added proper handling of mismatched arguments in the database script
|
2017-04-17 15:00:45 +02:00 |
|
Cedric Nugteren
|
3e2faa5db8
|
Set proper settings for the benchmarks of batched routines
|
2017-04-16 20:40:15 +02:00 |
|
Cedric Nugteren
|
2673f50518
|
Merge branch 'development' into benchmarking
|
2017-04-16 19:41:14 +02:00 |
|
Cedric Nugteren
|
b20c518f9f
|
Merge pull request #147 from CNugteren/cublas_reference
Added support for performance testing against cuBLAS
|
2017-04-16 19:38:50 +02:00 |
|
Cedric Nugteren
|
e3bb58f602
|
Finalized support for performance testing against cuBLAS
|
2017-04-16 17:53:51 +02:00 |
|
Cedric Nugteren
|
063ef729e1
|
Added settings for benchmarking batched routines
|
2017-04-16 16:55:49 +02:00 |
|
Cedric Nugteren
|
c88ad94338
|
Added a benchmark-all script to run multiple benchmarks automatically
|
2017-04-14 22:02:47 +02:00 |
|
Cedric Nugteren
|
5203402c41
|
Tuned the num-runs settings for the benchmarks
|
2017-04-14 21:22:02 +02:00 |
|
Cedric Nugteren
|
56b2f46fbf
|
Added output-folder for benchmarking and removed the requirement on X
|
2017-04-14 20:32:28 +02:00 |
|
Cedric Nugteren
|
8833ae51be
|
Made the number of runs a benchmark-specific setting in the benchmark scripts
|
2017-04-14 20:16:51 +02:00 |
|
Cedric Nugteren
|
10205d773e
|
Added a new Xaxpy kernel in between the regular and fast version in
|
2017-04-14 20:16:10 +02:00 |
|
Cedric Nugteren
|
f7f8ec644f
|
Fixed CUDA malloc and cuBLAS handles: cuBLAS as a performance-reference now works
|
2017-04-13 21:31:27 +02:00 |
|
Cedric Nugteren
|
f24c142948
|
Made compilation of the cuBLAS wrapper work properly
|
2017-04-11 21:50:18 +02:00 |
|
Cedric Nugteren
|
6b625f8915
|
Added reference implementations for performance-testing against cuBLAS
|
2017-04-10 22:54:14 +02:00 |
|
Cedric Nugteren
|
22b3ea9256
|
Merge branch 'development' into cublas_reference
Conflicts:
scripts/generator/generator.py
|
2017-04-10 20:11:45 +02:00 |
|
Cedric Nugteren
|
0da1e38097
|
Merge pull request #145 from CNugteren/apple_cpu_support
Patch to make tests complete on Apple's CPU implementation
|
2017-04-10 20:09:40 +02:00 |
|
Cedric Nugteren
|
7374c37e2e
|
Fixed a compilation issue under MSVC and GCC
|
2017-04-10 08:38:24 +02:00 |
|
Cedric Nugteren
|
2d45c37676
|
Removed const-vector-of-const-objects from the database class to remain compliant with the C++11 standard
|
2017-04-10 07:40:27 +02:00 |
|
Cedric Nugteren
|
300531b869
|
Updated the changelog with the Apple CPU override
|
2017-04-10 07:21:34 +02:00 |
|
Cedric Nugteren
|
fb6c78ea07
|
Added a special override database for the Apple CPU implementation on OS X: this makes the tests work but does not focus on good performance
|
2017-04-07 07:37:30 +02:00 |
|
Cedric Nugteren
|
d28ee082b0
|
Uses float2 and double2 for base complex data-types instead of a custom struct; fixes bug on Apple OpenCL
|
2017-04-07 07:35:15 +02:00 |
|
Cedric Nugteren
|
ce369702d8
|
Added some missing const-ness
|
2017-04-07 07:34:32 +02:00 |
|
Cedric Nugteren
|
52dd7433ca
|
Completed the cuBLAS wrapper
|
2017-04-06 20:56:28 +02:00 |
|
Cedric Nugteren
|
dbe22b5bf3
|
Fixed some size_t to int conversion warnings for the CBLAS interface
|
2017-04-06 19:40:51 +02:00 |
|