Commit graph

203 commits

Author SHA1 Message Date
Cedric Nugteren b1929d8ce7 It is now possible to set the OpenCL compiler options through an environment variable 2016-09-21 21:22:16 +02:00
Cedric Nugteren 4ce584a014 Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC can't handle long strings 2016-09-12 22:13:16 +02:00
Cedric Nugteren aa3dffe356 Added XgemvFastRot and Xgemm 16-bit tuning results: just defaults which are now automatically taken from 32-bit if there are no entries at all 2016-09-12 20:13:38 +02:00
Cedric Nugteren b5a67f86ec Complete re-write of the database script. Replaced Pandas with the much faster and more convenient plain JSON/dict data-type 2016-09-11 21:29:28 +02:00
Cedric Nugteren e21f32bc99 Updated database based on exhaustive tuning results for GEMM for the R9 M370X GPU 2016-09-10 14:00:43 +02:00
Cedric Nugteren 3daba70997 Updated the database script to remove duplicate entries: keeps only the best-performing cases for a specific parameter combination 2016-09-10 11:12:09 +02:00
Cedric Nugteren 55038d3c91 Split GEMM tuning in two parts: a small set of tuning parameters which is explored exhaustively and a larger set which is explored randomly 2016-09-06 20:30:06 +02:00
Cedric Nugteren b30b26b89e The GEMM kernel no longer adds beta*C in case beta is zero; this would cause problems if C contains NaNs 2016-09-04 17:21:16 +02:00
Cedric Nugteren 521bf6cdfc Added tuning results for Intel Broadwell 5500 GT2 GPU 2016-09-03 16:43:23 +02:00
Cedric Nugteren 19574b2519 Updated tuning results for Haswell GT2 Mobile GPU; fixed database script to handle duplicate entries of different runs 2016-09-03 12:45:11 +02:00
Ivan Shapovalov ea43936e94 test/correctness: read platform and device from environment
Support passing environment variables CLBLAST_PLATFORM and CLBLAST_DEVICE
instead of -platform and -device arguments to test executables.

This is for `ctest`.
2016-08-27 05:37:26 +03:00
Cedric Nugteren 8d6a6a5bbf Merge branch 'database_defaults' into development 2016-08-22 19:31:36 +02:00
Cedric Nugteren 0c0f0ac7f9 Also changed the default-default for unknown device types to use the same method as for known device groups 2016-08-21 20:35:20 +02:00
Cedric Nugteren 84db8958d1 Increased the ratio of GEMM tuning results to explore; reduced the tuning search space to have a better chance to evaluate more likely parameter combinations 2016-08-21 20:28:02 +02:00
Cedric Nugteren 6eca53ee23 Merge branch 'master' of https://github.com/dvasschemacq/CLBlast into dvasschemacq-master
Conflicts:
	src/kernels/level1/xaxpy.opencl
	src/kernels/level2/xgemv.opencl
	src/kernels/level2/xgemv_fast.opencl
	src/kernels/level2/xger.opencl
	src/kernels/level2/xher.opencl
	src/kernels/level2/xher2.opencl
	src/kernels/level3/xgemm_part2.opencl
2016-08-20 12:50:31 +02:00
D. Van Assche 57f1aa7685 Adapt opencl files for 1.1 OpenCL
In OpenCL 1.1 __kernel has to be before __attribute__, at least with
Vivante compiler.
2016-08-18 17:33:13 +02:00
Cedric Nugteren 7d5631b7e4 Updated the database script to calculate the relative best performance of tuning results common for a device/vendor type 2016-08-15 21:01:07 +02:00
Cedric Nugteren de1afe168d Removed all old tuning results for the XgemvFastRot kernel; re-added for a couple of devices 2016-07-25 22:57:23 +02:00
Cedric Nugteren 2582f0290a Moved the XgemvFast and XgemvFastRot tuning database into a separate file 2016-07-25 22:43:49 +02:00
Cedric Nugteren 0252df731a Merge branch 'development' into gemv_performance 2016-07-24 17:06:27 +02:00
Cedric Nugteren ffa35c623a Minor improvements after merging in groundwork for custom tuning parameters and kernels 2016-07-24 17:00:21 +02:00
Cedric Nugteren 40a72259eb Fixed a bug in the new XgemvFastRot kernel related to local memory size 2016-07-23 16:58:11 +02:00
Cedric Nugteren 7a4f963763 Further improvements to the XgemvFastRot kernel, properly enables coalescing now 2016-07-23 14:52:32 +02:00
Cedric Nugteren 75fe8235f7 Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enabling better memory performance 2016-07-23 10:20:11 +02:00
Ivan Shapovalov e4e1f05079 clblast::Database, clblast::Routine: implement "database overlays" provided by routine implementation 2016-07-22 11:15:52 +03:00
Ivan Shapovalov ae3299da30 clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS 2016-07-22 11:15:52 +03:00
Ivan Shapovalov 5502c5eec4 cl::Kernel: skip NULL entries in waitForEvents 2016-07-22 11:15:52 +03:00
Ivan Shapovalov 2dd5ee3f75 clblast::RunKernel, cl::Kernel: take const vector as waitForEvents 2016-07-22 11:15:52 +03:00
Ivan Shapovalov 1ae71614ac xgemm: do not hardcode kernel requirements for internal matrix layout
Do not hardcode the knowledge about "A and C col-major, B row-major".

This allows for easier reuse of the DoGemm() routine with different
kernels.
2016-07-22 11:15:52 +03:00
Cedric Nugteren b33bec4a59 Fixed some more types and type conversions in the clpp11 interface to OpenCL 2016-07-16 11:13:23 +02:00
Cedric Nugteren bee9b959f4 Merge pull request #80 from gcp/getdevinfo_fixes
Make sure the passed types are large enough.
2016-07-16 10:59:51 +02:00
Cedric Nugteren 066af4069b Removed an unused variable from the copy-transpose-pad function 2016-07-16 10:56:37 +02:00
Gian-Carlo Pascutto e0ba59c0ac Make sure the passed types are large enough.
Make sure all out parameters that are passed to functions such
as clGetDeviceInfo are large enough to contain the replies.
2016-07-13 15:59:02 +02:00
Cedric Nugteren c87e877bf2 Now passing alpha/beta to the kernel as arguments, as was done before fp16 support; in the fp16 case, the arguments are cast on the host and in the kernel 2016-07-10 20:32:01 +02:00
Cedric Nugteren 57f09178d8 Added tuning results for AMD Oland and for Intel Graphics HD 530 2016-07-10 11:46:44 +02:00
Cedric Nugteren 39e9b1238f Fixed a bug related to the cache and retrieval of programs based on the OpenCL context 2016-07-10 11:24:36 +02:00
Cedric Nugteren 9caa7ca5b9 Cache now compares cl_context instead of a pointer to a context; added verbose print statements to the cache 2016-07-08 20:57:58 +02:00
Cedric Nugteren 27854070b4 Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen 2016-07-06 21:50:12 +02:00
Cedric Nugteren 77325b8974 Added an option to the performance clients to do a warm-up run before timing 2016-07-06 21:25:55 +02:00
Cedric Nugteren 9683b50c55 Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp) 2016-07-03 20:30:47 +02:00
Gian-Carlo Pascutto 7424532859 Ensure clGetKernelWorkGroupInfo return value fits.
In LocalMemUsage(), there's a first call to clGetKernelWorkGroupInfo
to get the "bytes" amount needed to store the result from
CL_KERNEL_LOCAL_MEM_SIZE. However, the actual value passed is an
"auto result = size_t", which in 32-bit mode is 4 bytes, regardless
of the previous return value. The spec describes that it will actually
be a cl_ulong which is 8 bytes. To prevent stack corruption, make sure
we are in fact passing a cl_ulong.

Also adjust all callers to take the changed type into account.
2016-07-02 21:14:36 +02:00
Cedric Nugteren 7cf2f8c268 Fixed some memory leaks related to events not properly cleaned-up 2016-07-02 15:34:55 +02:00
Cedric Nugteren b330ab0866 Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library 2016-06-30 10:49:17 +02:00
Cedric Nugteren cd74aaac52 Updated to version 6.0 of the CLCudaAPI header 2016-06-29 19:42:49 +02:00
CNugteren 871b576c06 Made it possible to build the clients and tests on Windows using Visual Studio 2016-06-28 16:38:45 +02:00
Cedric Nugteren 76b20cfe0c Fixes for the AppVeyor Windows build 2016-06-27 14:44:08 +02:00
Cedric Nugteren 66908ef5cd Added tuning results for 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile' (thanks to OursDesCavernes) 2016-06-19 14:59:50 +02:00
Cedric Nugteren 61203453aa Renamed all C++ source files to .cpp to match the .hpp extension better 2016-06-19 13:55:49 +02:00
Cedric Nugteren f726fbdc9f Moved all headers into the source tree, changed headers to .hpp extension 2016-06-18 20:20:13 +02:00
Cedric Nugteren bacb5d2bb2 Clean-up of the routine class, moved RunKernel to the routine/common file 2016-06-18 18:16:14 +02:00