Cedric Nugteren
|
75c0e861b8
|
Merge branch 'gemm_direct_bug'
|
2017-07-01 14:44:29 +02:00 |
|
Cedric Nugteren
|
4cf516cfec
|
Fixed an if-statement in the direct GEMM kernel causing a bug with specific sets of input parameters
|
2017-06-30 21:57:41 +02:00 |
|
Cedric Nugteren
|
52881f3864
|
Added batched GEMM example program
|
2017-06-29 21:15:25 +02:00 |
|
Cedric Nugteren
|
4e51b1e1f8
|
Moved and inlined some static member variables and disabled spurious clang warnings
|
2017-06-27 21:05:16 +02:00 |
|
Cedric Nugteren
|
e60b10529a
|
Undo of earlier move of TestBlas::kTransposes constant to fix MSVC 2013 compilation
|
2017-06-27 20:59:28 +02:00 |
|
Cedric Nugteren
|
ce528a9d39
|
Fixed and suppressed several warnings for MSVC
|
2017-06-26 21:38:04 +02:00 |
|
Cedric Nugteren
|
a823edb65f
|
Reduced optimization level for the (non-performance-critical) host code to speed up compilation
|
2017-06-26 21:36:56 +02:00 |
|
Cedric Nugteren
|
19504ed609
|
Moved static variable declarations from .cpp to .hpp to resolve some Clang warnings
|
2017-06-25 20:59:22 +02:00 |
|
Cedric Nugteren
|
b8df03e5bc
|
Added CLBlast paper and presentation references in README
|
2017-06-25 20:45:14 +02:00 |
|
Cedric Nugteren
|
1a8ed48a35
|
Fixed some Clang and MSVC warnings
|
2017-06-25 11:50:36 +02:00 |
|
Cedric Nugteren
|
7eab65b699
|
Merge branch 'database_compilation_speed'
|
2017-06-25 10:16:52 +02:00 |
|
Cedric Nugteren
|
615a7fdc81
|
Fixes some compilation issues related to the database structure change
|
2017-06-21 23:07:47 +02:00 |
|
Cedric Nugteren
|
e44feb8576
|
Changed the structure of the database to reduce compilation time and save memory
|
2017-06-20 21:19:26 +02:00 |
|
Cedric Nugteren
|
48f2682eb7
|
Added tuning results for the Core i7-920 CPU
|
2017-06-18 20:53:59 +02:00 |
|
Cedric Nugteren
|
3070b502b5
|
Fixed an overflow bug on 32-bit systems when choosing a GEMM kernel
|
2017-06-18 20:51:11 +02:00 |
|
Cedric Nugteren
|
33ed1e5a06
|
Added tuning results for GeForce GT 650M (thanks to bzcheeseman)
|
2017-06-01 22:52:08 +02:00 |
|
Cedric Nugteren
|
f57e209aab
|
Merge pull request #158 from CNugteren/msvc_compilation_fixes
MSVC compilation fixes
|
2017-05-27 17:53:30 +02:00 |
|
Cedric Nugteren
|
4e04008729
|
Update to AppVeyor because of changed Khronos repository (9)
|
2017-05-27 17:39:36 +02:00 |
|
Cedric Nugteren
|
7827cfbe4a
|
Update to AppVeyor because of changed Khronos repository (8)
|
2017-05-27 17:33:47 +02:00 |
|
Cedric Nugteren
|
9ae6f174d9
|
Update to AppVeyor because of changed Khronos repository (7)
|
2017-05-27 17:30:30 +02:00 |
|
Cedric Nugteren
|
bb37bd0814
|
Update to AppVeyor because of changed Khronos repository (6)
|
2017-05-27 17:17:10 +02:00 |
|
Cedric Nugteren
|
53d739129e
|
Update to AppVeyor because of changed Khronos repository (5)
|
2017-05-27 17:11:22 +02:00 |
|
Cedric Nugteren
|
f7a822110c
|
Update to AppVeyor because of changed Khronos repository (4)
|
2017-05-27 17:06:09 +02:00 |
|
Cedric Nugteren
|
3bca9f85d2
|
Update to AppVeyor because of changed Khronos repository (3)
|
2017-05-27 17:01:11 +02:00 |
|
Cedric Nugteren
|
70188686f2
|
Merge pull request #157 from kpot/improved_caching
Fixes inability to run GEMM on multiple identical GPUs (issue #155)
|
2017-05-27 09:47:25 +02:00 |
|
Kirill Mavreshko
|
64ba590279
|
Fixed comment describing the order of program cache fields
|
2017-05-27 10:30:09 +05:00 |
|
Cedric Nugteren
|
01de4b5413
|
Update to AppVeyor because of changed Khronos repository (2)
|
2017-05-26 22:20:04 +02:00 |
|
Cedric Nugteren
|
e8b6f01e04
|
Update to AppVeyor because of changed Khronos repository
|
2017-05-26 22:12:02 +02:00 |
|
Cedric Nugteren
|
f7a16d427c
|
Fixed a compilation issue under MSVC 2013
|
2017-05-26 22:10:56 +02:00 |
|
Kirill Mavreshko
|
628e1e8cce
|
Fixes inability to run GEMM on multiple identical GPUs (issue #155)
|
2017-05-26 15:04:19 +05:00 |
|
Cedric Nugteren
|
9c703a6021
|
Merge pull request #156 from ctuning/master
changing "wb" to "w" when saving json file (text mode)
|
2017-05-24 20:18:41 +02:00 |
|
Grigori Fursin
|
35e2e6c3a4
|
changing "wb" to "w" when saving json file (text mode) - compatibility for Python 3
|
2017-05-24 15:08:34 +02:00 |
|
Cedric Nugteren
|
953a5a9c22
|
Fixed a minor compilation issue of a sample with GCC 4.8
|
2017-05-15 22:14:17 +02:00 |
|
Cedric Nugteren
|
8400ee3a09
|
Fixed a TRSM issue caused by incorrect block size calculation
|
2017-05-15 22:04:55 +02:00 |
|
Cedric Nugteren
|
512b83dbad
|
Fixed a missing synchronization barrier in the invert kernel; fixes TRSM tests
|
2017-05-14 20:27:35 +02:00 |
|
Cedric Nugteren
|
f151e56daa
|
Added the IxAMIN routines: absolute minimum version of IxAMAX
|
2017-05-12 20:01:33 -07:00 |
|
Cedric Nugteren
|
86e8df60f1
|
Fixed a bug in the TRSM routine; tests now pass
|
2017-05-12 17:43:56 -07:00 |
|
Cedric Nugteren
|
81d9ed3946
|
Removed the included performance reports; README now redirects to the new external website
|
2017-05-12 13:18:10 -07:00 |
|
Cedric Nugteren
|
71933c3411
|
Added tuning results for the AMD Radeon Fiji GPU
|
2017-05-11 22:53:52 -07:00 |
|
Cedric Nugteren
|
d67455fdb8
|
Fixes the build-status table in the README
|
2017-05-11 22:22:10 -07:00 |
|
Cedric Nugteren
|
93c8db7fe7
|
Bug-fix in the half-precision test of the amax routine
|
2017-05-11 22:19:15 -07:00 |
|
Cedric Nugteren
|
1df28a15fc
|
Re-added random tuning for GEMM after accidental removal
|
2017-05-11 22:12:38 -07:00 |
|
Cedric Nugteren
|
97955fc221
|
Minor naming fixes to the benchmark script
|
2017-05-11 22:12:16 -07:00 |
|
Cedric Nugteren
|
81f598eceb
|
Merge branch 'master_is_neww_devel_branch'
|
2017-05-11 21:41:18 -07:00 |
|
Cedric Nugteren
|
b0f3659121
|
The master branch is now the main 'development' branch
|
2017-05-03 19:49:15 +02:00 |
|
Cedric Nugteren
|
606f2871dd
|
Merge pull request #150 from CNugteren/development
Update to version 0.11.0
|
2017-05-02 22:39:50 +02:00 |
|
Cedric Nugteren
|
e9d2a2f54c
|
Updated to version 0.11.0
|
2017-05-02 20:29:59 +02:00 |
|
Cedric Nugteren
|
c9f39ed13a
|
Merge pull request #148 from CNugteren/benchmarking
Various updates related to benchmarking
|
2017-04-23 18:29:59 +02:00 |
|
Cedric Nugteren
|
67d4bbff66
|
Added an option to the database script to remove tuning results from the database
|
2017-04-23 17:59:16 +02:00 |
|
Cedric Nugteren
|
1c33af6eab
|
Re-added Titan X (Pascal) tuning results based on more averaging when tuning
|
2017-04-23 17:58:56 +02:00 |
|