Cedric Nugteren
4ce584a014
Split the XGEMM kernel up further: now in 3 parts. This is done because MSVC can't handle long strings
2016-09-12 22:13:16 +02:00
Cedric Nugteren
aa3dffe356
Added XgemvFastRot and Xgemm 16-bit tuning results: just defaults which are now automatically taken from 32-bit if there are no entries at all
2016-09-12 20:13:38 +02:00
Cedric Nugteren
b5a67f86ec
Complete re-write of the database script. Replaced Pandas with the much faster and more convenient plain JSON/dict data-type
2016-09-11 21:29:28 +02:00
Cedric Nugteren
e21f32bc99
Updated database based on exhaustive tuning results for GEMM for the R9 M370X GPU
2016-09-10 14:00:43 +02:00
Cedric Nugteren
3daba70997
Updated the database script to remove duplicate entries: keeps only the best-performing case for a specific parameter combination
2016-09-10 11:12:09 +02:00
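The de-duplication described in this commit can be pictured with a minimal sketch. It is not the actual database script: the field names `parameters` and `time`, the parameter names, and the assumption that a lower time means better performance are all illustrative.

```python
def keep_best(results):
    """Keep one entry per parameter combination: the one with the lowest time."""
    best = {}
    for entry in results:
        # Dicts are unhashable, so a sorted tuple of items serves as the key.
        key = tuple(sorted(entry["parameters"].items()))
        if key not in best or entry["time"] < best[key]["time"]:
            best[key] = entry
    return list(best.values())


# Hypothetical tuning results: the first two share a parameter combination.
results = [
    {"parameters": {"WGS": 64, "VW": 1}, "time": 2.5},
    {"parameters": {"WGS": 64, "VW": 1}, "time": 1.9},
    {"parameters": {"WGS": 128, "VW": 2}, "time": 3.1},
]
deduplicated = keep_best(results)
```

Here `deduplicated` holds two entries, with the faster of the two duplicate runs retained.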
Cedric Nugteren
55038d3c91
Split GEMM tuning in two parts: a small set of tuning parameters which is explored exhaustively and a larger set which is explored randomly
2016-09-06 20:30:06 +02:00
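A rough sketch of the two-part search idea follows, assuming the tuner enumerates a dictionary of candidate values per parameter. The parameter values, the 25% fraction, and the helper names `exhaustive` and `random_subset` are hypothetical and not taken from the CLBlast tuner.

```python
import itertools
import random


def exhaustive(space):
    """Return every combination of the given parameter values."""
    names = sorted(space)
    return [dict(zip(names, values))
            for values in itertools.product(*(space[n] for n in names))]


def random_subset(space, fraction, seed=0):
    """Return a random fraction of all combinations (at least one)."""
    candidates = exhaustive(space)
    count = max(1, int(len(candidates) * fraction))
    return random.Random(seed).sample(candidates, count)


# Illustrative search spaces; the values are made up for this example.
small_space = {"MWG": [16, 32], "NWG": [16, 32]}
large_space = {"MWG": [16, 32, 64, 128], "NWG": [16, 32, 64, 128], "KWG": [16, 32]}

part1 = exhaustive(small_space)           # explored in full
part2 = random_subset(large_space, 0.25)  # explored by random sampling
```

The small space stays cheap enough to cover completely, while the large space is only sampled, which keeps total tuning time bounded.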
Cedric Nugteren
b30b26b89e
The GEMM kernel no longer adds beta*C in case beta is zero; this would cause problems if C contains NaNs
2016-09-04 17:21:16 +02:00
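The problem behind this change is that under IEEE 754 arithmetic, 0 * NaN is NaN, so a stale NaN in C would leak into the result even when beta is zero. A scalar sketch of the guard, with a hypothetical helper `update_c` standing in for the per-element kernel update:

```python
def update_c(alpha, ab, beta, c):
    """Return alpha*ab + beta*c, skipping the beta term when beta is zero."""
    if beta == 0.0:
        return alpha * ab
    return alpha * ab + beta * c


nan = float("nan")
naive = 1.0 * 2.0 + 0.0 * nan           # the NaN in C propagates into the result
guarded = update_c(1.0, 2.0, 0.0, nan)  # C is never read, so the result is clean
```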
Cedric Nugteren
521bf6cdfc
Added tuning results for Intel Broadwell 5500 GT2 GPU
2016-09-03 16:43:23 +02:00
Cedric Nugteren
19574b2519
Updated tuning results for Haswell GT2 Mobile GPU; fixed database script to handle duplicate entries of different runs
2016-09-03 12:45:11 +02:00
Ivan Shapovalov
ea43936e94
test/correctness: read platform and device from environment
...
Support passing environment variables CLBLAST_PLATFORM and CLBLAST_DEVICE
instead of -platform and -device arguments to test executables.
This is for `ctest`.
2016-08-27 05:37:26 +03:00
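A minimal sketch of such an environment fallback is given below. The variable names CLBLAST_PLATFORM and CLBLAST_DEVICE come from the commit message; the helper `resolve_id`, the default of 0, and the rule that an explicit command-line value takes precedence are assumptions made for illustration.

```python
import os


def resolve_id(cli_value, env_name, environ=os.environ, default=0):
    """Pick an ID from the command line if given, else from the environment."""
    if cli_value is not None:
        return int(cli_value)
    return int(environ.get(env_name, default))


# A stand-in for the process environment, so the example is deterministic.
fake_env = {"CLBLAST_PLATFORM": "1"}
platform = resolve_id(None, "CLBLAST_PLATFORM", fake_env)  # taken from the environment
device = resolve_id(None, "CLBLAST_DEVICE", fake_env)      # unset, falls back to default
override = resolve_id("2", "CLBLAST_PLATFORM", fake_env)   # explicit argument wins
```

With a fallback of this kind, a test run could be configured by exporting the two variables before invoking `ctest`, without editing the test command lines.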
Cedric Nugteren
8d6a6a5bbf
Merge branch 'database_defaults' into development
2016-08-22 19:31:36 +02:00
Cedric Nugteren
0c0f0ac7f9
Also changed the default-default for unknown device types to use the same method as for known device groups
2016-08-21 20:35:20 +02:00
Cedric Nugteren
84db8958d1
Increased the ratio of GEMM tuning results to explore; reduced the tuning search space to have a better chance to evaluate more likely parameter combinations
2016-08-21 20:28:02 +02:00
Cedric Nugteren
6eca53ee23
Merge branch 'master' of https://github.com/dvasschemacq/CLBlast into dvasschemacq-master
...
Conflicts:
src/kernels/level1/xaxpy.opencl
src/kernels/level2/xgemv.opencl
src/kernels/level2/xgemv_fast.opencl
src/kernels/level2/xger.opencl
src/kernels/level2/xher.opencl
src/kernels/level2/xher2.opencl
src/kernels/level3/xgemm_part2.opencl
2016-08-20 12:50:31 +02:00
D. Van Assche
57f1aa7685
Adapt OpenCL files for OpenCL 1.1
...
In OpenCL 1.1, __kernel has to come before __attribute__, at least with
the Vivante compiler.
2016-08-18 17:33:13 +02:00
Cedric Nugteren
7d5631b7e4
Updated the database script to calculate the relative best performance of tuning results common for a device/vendor type
2016-08-15 21:01:07 +02:00
Cedric Nugteren
de1afe168d
Removed all old tuning results for the XgemvFastRot kernel; re-added for a couple of devices
2016-07-25 22:57:23 +02:00
Cedric Nugteren
2582f0290a
Moved the XgemvFast and XgemvFastRot tuning database into a separate file
2016-07-25 22:43:49 +02:00
Cedric Nugteren
0252df731a
Merge branch 'development' into gemv_performance
2016-07-24 17:06:27 +02:00
Cedric Nugteren
ffa35c623a
Minor improvements after merging in groundwork for custom tuning parameters and kernels
2016-07-24 17:00:21 +02:00
Cedric Nugteren
40a72259eb
Fixed a bug in the new XgemvFastRot kernel related to local memory size
2016-07-23 16:58:11 +02:00
Cedric Nugteren
7a4f963763
Further improvements to the XgemvFastRot kernel, properly enables coalescing now
2016-07-23 14:52:32 +02:00
Cedric Nugteren
75fe8235f7
Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enabling better memory performance
2016-07-23 10:20:11 +02:00
Ivan Shapovalov
e4e1f05079
clblast::Database, clblast::Routine: implement "database overlays" provided by routine implementation
2016-07-22 11:15:52 +03:00
Ivan Shapovalov
ae3299da30
clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS
2016-07-22 11:15:52 +03:00
Ivan Shapovalov
5502c5eec4
cl::Kernel: skip NULL entries in waitForEvents
2016-07-22 11:15:52 +03:00
Ivan Shapovalov
2dd5ee3f75
clblast::RunKernel, cl::Kernel: take const vector as waitForEvents
2016-07-22 11:15:52 +03:00
Ivan Shapovalov
1ae71614ac
xgemm: do not hardcode kernel requirements for internal matrix layout
...
Do not hardcode the knowledge about "A and C col-major, B row-major".
This allows for easier reuse of the DoGemm() routine with different
kernels.
2016-07-22 11:15:52 +03:00
Cedric Nugteren
b33bec4a59
Fixed some more types and type conversions in the clpp11 interface to OpenCL
2016-07-16 11:13:23 +02:00
Cedric Nugteren
bee9b959f4
Merge pull request #80 from gcp/getdevinfo_fixes
...
Make sure the passed types are large enough.
2016-07-16 10:59:51 +02:00
Cedric Nugteren
066af4069b
Removed an unused variable from the copy-transpose-pad function
2016-07-16 10:56:37 +02:00
Gian-Carlo Pascutto
e0ba59c0ac
Make sure the passed types are large enough.
...
Make sure all out parameters that are passed to functions such
as clGetDeviceInfo are large enough to contain the replies.
2016-07-13 15:59:02 +02:00
Cedric Nugteren
c87e877bf2
Now passing alpha/beta to the kernel as arguments again, as was done before fp16 support; in the case of fp16, the arguments are cast on the host and in the kernel
2016-07-10 20:32:01 +02:00
Cedric Nugteren
57f09178d8
Added tuning results for AMD Oland and for Intel Graphics HD 530
2016-07-10 11:46:44 +02:00
Cedric Nugteren
39e9b1238f
Fixed a bug related to the cache and retrieval of programs based on the OpenCL context
2016-07-10 11:24:36 +02:00
Cedric Nugteren
9caa7ca5b9
Cache now compares cl_context instead of a pointer to a context; added verbose print statements to the cache
2016-07-08 20:57:58 +02:00
Cedric Nugteren
27854070b4
Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen
2016-07-06 21:50:12 +02:00
Cedric Nugteren
77325b8974
Added an option to the performance clients to do a warm-up run before timing
2016-07-06 21:25:55 +02:00
Cedric Nugteren
9683b50c55
Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp)
2016-07-03 20:30:47 +02:00
Gian-Carlo Pascutto
7424532859
Ensure clGetKernelWorkGroupInfo return value fits.
...
In LocalMemUsage(), there's a first call to clGetKernelWorkGroupInfo
to get the "bytes" amount needed to store the result from
CL_KERNEL_LOCAL_MEM_SIZE. However, the actual value passed is an
"auto result = size_t", which in 32-bit mode is 4 bytes, regardless
of the previous return value. The spec describes that it will actually
be a cl_ulong which is 8 bytes. To prevent stack corruption, make sure
we are in fact passing a cl_ulong.
Also adjust all callers to take the changed type into account.
2016-07-02 21:14:36 +02:00
Cedric Nugteren
7cf2f8c268
Fixed some memory leaks related to events not being properly cleaned up
2016-07-02 15:34:55 +02:00
Cedric Nugteren
b330ab0866
Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library
2016-06-30 10:49:17 +02:00
Cedric Nugteren
cd74aaac52
Updated to version 6.0 of the CLCudaAPI header
2016-06-29 19:42:49 +02:00
CNugteren
871b576c06
Made it possible to build the clients and tests on Windows using Visual Studio
2016-06-28 16:38:45 +02:00
Cedric Nugteren
76b20cfe0c
Fixes for the AppVeyor Windows build
2016-06-27 14:44:08 +02:00
Cedric Nugteren
66908ef5cd
Added tuning results for 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile' (thanks to OursDesCavernes)
2016-06-19 14:59:50 +02:00
Cedric Nugteren
61203453aa
Renamed all C++ source files to .cpp to match the .hpp extension better
2016-06-19 13:55:49 +02:00
Cedric Nugteren
f726fbdc9f
Moved all headers into the source tree, changed headers to .hpp extension
2016-06-18 20:20:13 +02:00
Cedric Nugteren
bacb5d2bb2
Clean-up of the routine class, moved RunKernel to the routine/common file
2016-06-18 18:16:14 +02:00
Cedric Nugteren
7b4c0e1cf0
Removed the template from the Routine base-class
2016-06-18 14:56:55 +02:00