Cedric Nugteren
|
cc5b475425
|
CUDA API now takes context and device in instead of stream
|
2017-10-12 12:20:43 +02:00 |
|
Cedric Nugteren
|
b901809345
|
Added first (untested) version of a CUDA API
|
2017-10-11 23:16:57 +02:00 |
|
Cedric Nugteren
|
9224da19ef
|
Fixed the Python generator script w.r.t. the recent change of testing direct/in-direct GEMM kernels separately
|
2017-10-09 20:06:25 +02:00 |
|
Cedric Nugteren
|
44246053a5
|
Removed include of clpp11.hpp in places other than utilities.hpp
|
2017-10-09 19:41:40 +02:00 |
|
Cedric Nugteren
|
e8f1de0265
|
Made the half-precision header OpenCL-independent
|
2017-10-09 18:30:19 +02:00 |
|
Cedric Nugteren
|
df3c9f4a8a
|
Moved non-routine-specific API functions and includes to separate files
|
2017-10-08 21:52:02 +02:00 |
|
Cedric Nugteren
|
2bb8402ec1
|
Merge pull request #198 from CNugteren/cuda_api_preparation
Cuda API preparation
|
2017-10-08 12:03:15 +02:00 |
|
Cedric Nugteren
|
3598762029
|
Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs
|
2017-10-08 10:29:47 +02:00 |
|
Cedric Nugteren
|
6d3e1212f0
|
Synchronizes clpp11.h with CLCudaAPI 9.0
|
2017-10-07 18:43:29 +02:00 |
|
Cedric Nugteren
|
b2058320d1
|
Merge pull request #197 from CNugteren/single_temporary_gemm_buffer
Single temporary GEMM buffer
|
2017-10-07 18:41:46 +02:00 |
|
Cedric Nugteren
|
86b80cdc98
|
Fixed a small typo
|
2017-10-07 18:39:32 +02:00 |
|
Cedric Nugteren
|
375193fe4e
|
Gemm in-direct implementation now uses only 1 larger instead of max 3 optional temporary buffers
|
2017-10-03 21:55:21 +02:00 |
|
Cedric Nugteren
|
74fd6767b9
|
GEMM tests now test both the in-direct and the direct kernels separately
|
2017-10-01 20:36:56 +02:00 |
|
Cedric Nugteren
|
6b226028d5
|
Allow OverrideParameters function to work before a kernel was first used
|
2017-10-01 20:32:39 +02:00 |
|
Cedric Nugteren
|
1009303717
|
Merge branch 'additional_tuners'
|
2017-09-30 21:04:32 +02:00 |
|
Cedric Nugteren
|
c86ba85541
|
Merge pull request #196 from CNugteren/preparation_for_size_specific_parameters
Preparation for size specific parameters
|
2017-09-30 21:03:35 +02:00 |
|
Cedric Nugteren
|
29c5283c4b
|
Kernels are now cached based on their routine name and their tuning parameters
|
2017-09-30 20:29:18 +02:00 |
|
Cedric Nugteren
|
3b7371f81b
|
Merge branch 'master' into preparation_for_size_specific_parameters
|
2017-09-30 20:26:50 +02:00 |
|
Cedric Nugteren
|
c151ab1325
|
Refactored the tuning architecture: less duplication now; more defaults
|
2017-09-30 20:26:26 +02:00 |
|
Cedric Nugteren
|
ef082bba0d
|
Fixed a minor appveyor artifact issue
|
2017-09-30 17:33:37 +02:00 |
|
Cedric Nugteren
|
f4c4674cf6
|
Updated to version 1.1.0
|
2017-09-30 17:19:17 +02:00 |
|
Cedric Nugteren
|
2949e156f5
|
Added notes for Android compilation of CLBlast
|
2017-09-26 21:23:53 +02:00 |
|
Cedric Nugteren
|
00b5771477
|
Added Android header for compilation with gnustl STL
|
2017-09-26 21:20:01 +02:00 |
|
Cedric Nugteren
|
21af690472
|
Added missing headers
|
2017-09-26 21:17:55 +02:00 |
|
Cedric Nugteren
|
ed980a1df1
|
Updated database override function to work with the new database storage format
|
2017-09-24 15:44:14 +02:00 |
|
Cedric Nugteren
|
255f09843c
|
Made program and binary databases dependent on the routine parameters on top of the name
|
2017-09-23 20:40:38 +02:00 |
|
Cedric Nugteren
|
0d8313708c
|
Merge branch 'device_name_slow_on_nvidia_gpu'
|
2017-09-23 18:12:13 +02:00 |
|
Cedric Nugteren
|
2df9f21ab8
|
Added extra benchmarks to verify new database caching keys performance
|
2017-09-23 18:06:43 +02:00 |
|
Cedric Nugteren
|
890281f3e8
|
Made database-caching no longer dependent on device name but on device/platform IDs
|
2017-09-23 17:50:44 +02:00 |
|
Cedric Nugteren
|
0dd2ca9283
|
Merge pull request #192 from CNugteren/diagnostics_helper
Diagnostics helper
|
2017-09-23 11:43:19 +02:00 |
|
Cedric Nugteren
|
65c492edf6
|
Added OpenCL properties printing to the diagnostics helper
|
2017-09-22 21:35:32 +02:00 |
|
Cedric Nugteren
|
2ef6578961
|
Added first version of a small CLBlast diagnostics helper
|
2017-09-19 21:43:35 +02:00 |
|
Cedric Nugteren
|
44b59ec0cb
|
Merge branch 'msvc2013_fixes'
|
2017-09-19 19:54:43 +02:00 |
|
Cedric Nugteren
|
ae1eeb4d1f
|
Fixed type conversion warnings under MSVC 2013
|
2017-09-19 19:44:34 +02:00 |
|
Cedric Nugteren
|
1d2ee29cb9
|
Fixed compilation issues of the database for MSVC 2013
|
2017-09-19 19:44:05 +02:00 |
|
Cedric Nugteren
|
a23cd8d13a
|
Updated README with proper AMD device names; fixed device look-up for names of length 50+
|
2017-09-16 21:26:38 +02:00 |
|
Cedric Nugteren
|
0802e3d84c
|
Added tuning results for Intel Core i7 6770HQ
|
2017-09-16 21:19:06 +02:00 |
|
Cedric Nugteren
|
7d0ef8e10d
|
Merge pull request #191 from CNugteren/database_improvements
Database improvements
|
2017-09-16 20:37:09 +02:00 |
|
Cedric Nugteren
|
bcf39eb79a
|
Fixed a compilation error and warning under MacOS
|
2017-09-16 18:34:11 +02:00 |
|
Cedric Nugteren
|
163474e171
|
Fixed an issue with the NVIDIA compute capability not being retrieved properly
|
2017-09-16 18:25:23 +02:00 |
|
Cedric Nugteren
|
4e317f5e85
|
Improved compilation time of the tuner database
|
2017-09-16 18:02:37 +02:00 |
|
Cedric Nugteren
|
c21878ecce
|
Added a guard against missing AMD and NVIDIA extensions
|
2017-09-14 21:58:08 +02:00 |
|
Cedric Nugteren
|
0d13d814c2
|
Added architecture layer in the tuning database for better performance on unseen devices
|
2017-09-14 21:27:33 +02:00 |
|
Cedric Nugteren
|
14a61d2425
|
Added database compress and de-compress functions
|
2017-09-12 22:25:52 +02:00 |
|
Cedric Nugteren
|
ebe10d5118
|
Database now works with new format of clblast_[property]
|
2017-09-11 20:40:37 +02:00 |
|
Cedric Nugteren
|
76382ff6c1
|
Added the new vendor-architecture-name hierarchy to the tuners as well
|
2017-09-10 16:34:54 +02:00 |
|
Cedric Nugteren
|
91ea7fcde2
|
Introduced the notion of a device-architecture for the database and added device and architecture name mappings
|
2017-09-08 21:09:05 +02:00 |
|
Cedric Nugteren
|
20da5e33a8
|
Split the database files over multiple directories and files; first step towards separate compilation
|
2017-09-06 21:50:42 +02:00 |
|
Cedric Nugteren
|
bb947890de
|
Merge branch 'im2col_bugfix'
|
2017-09-05 20:08:00 +02:00 |
|
Cedric Nugteren
|
8905da259d
|
Fixed a modulo and division issue manifesting on Apple OpenCL for im2col
|
2017-09-05 18:49:23 +02:00 |
|