CLBlast

mirror of https://github.com/CNugteren/CLBlast.git synced 2024-07-04 21:36:57 +02:00

Author	SHA1	Message	Date
Cedric Nugteren	0c0f0ac7f9	Also changed the default-default for unknown device types to use the same method as for known device groups	2016-08-21 20:35:20 +02:00
Cedric Nugteren	84db8958d1	Increased the ratio of GEMM tuning results to explore; reduced the tuning search space to have a better chance to evaluate more likely parameter combinations	2016-08-21 20:28:02 +02:00
Cedric Nugteren	00979faab4	Updated the changelog; refactored the database-get-bests code a bit	2016-08-21 20:16:06 +02:00
Cedric Nugteren	7d5631b7e4	Updated the database script to calculate the relative best performance of tuning results common for a device/vendor type	2016-08-15 21:01:07 +02:00
Cedric Nugteren	7da6492b36	Improved the speed of the new common-best defaults method for the database generation	2016-08-09 21:06:04 +02:00
Cedric Nugteren	3f5401d4c8	Added a first version of the database's common-best default calculation	2016-08-07 16:25:38 +02:00
Cedric Nugteren	35623cd98d	Minor update regarding the previous CMake export/install target changes	2016-07-28 20:45:09 +02:00
Cedric Nugteren	c3712f5b36	Merge pull request #86 from intelfx/cmake. CMakeLists.txt: provide a find_package() config for dependent projects	2016-07-28 20:17:13 +02:00
Ivan Shapovalov	227374deba	.appveyor.yml: move {OPENCL,CLBLAST}_ROOT out of source tree. Reasoning is the same as in previous commit: CMake does not like having OpenCL header path inside of the source tree. CLBLAST_ROOT is moved for uniformity.	2016-07-28 19:09:30 +03:00
Ivan Shapovalov	6c11fdc12c	.travis.yml: use OpenCL ICD Loader and headers shipped by distro. Using our own headers causes problems with CMake which does not like having OpenCL header path inside of the source tree. While at it, use distro's universal OpenCL loader as well.	2016-07-28 19:09:29 +03:00
Ivan Shapovalov	b5d7b58393	CMakeLists.txt: use target_include_directories()	2016-07-28 19:09:29 +03:00
Ivan Shapovalov	570cbcffa7	CMakeLists.txt: provide a find_package() config for dependent projects	2016-07-28 19:09:29 +03:00
Cedric Nugteren	1ec21421d7	Merge branch 'gemv_performance' into development	2016-07-26 20:02:14 +02:00
Cedric Nugteren	de1afe168d	Removed all old tuning results for the XgemvFastRot kernel; re-added for a couple of devices	2016-07-25 22:57:23 +02:00
Cedric Nugteren	2582f0290a	Moved the XgemvFast and XgemvFastRot tuning database into a separate file	2016-07-25 22:43:49 +02:00
Cedric Nugteren	0252df731a	Merge branch 'development' into gemv_performance	2016-07-24 17:06:27 +02:00
Cedric Nugteren	ffa35c623a	Minor improvements after merging in groundwork for custom tuning parameters and kernels	2016-07-24 17:00:21 +02:00
Cedric Nugteren	d4ffa6395e	Merge pull request #84 from intelfx/device-specific-kernels. Groundwork for device-specific routines	2016-07-24 16:48:20 +02:00
Cedric Nugteren	622682ffe3	Refactored the Python database script: separated functionality into modules, now complies with the PEP8 style, added proper command-line argument parsing, and cleaned up	2016-07-24 16:41:01 +02:00
Cedric Nugteren	40a72259eb	Fixed a bug in the new XgemvFastRot kernel related to local memory size	2016-07-23 16:58:11 +02:00
Cedric Nugteren	7a4f963763	Further improvements to the XgemvFastRot kernel, properly enables coalescing now	2016-07-23 14:52:32 +02:00
Cedric Nugteren	75fe8235f7	Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enabling better memory performance	2016-07-23 10:20:11 +02:00
Ivan Shapovalov	e4e1f05079	clblast::Database, clblast::Routine: implement "database overlays" provided by routine implementation	2016-07-22 11:15:52 +03:00
Ivan Shapovalov	ae3299da30	clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS	2016-07-22 11:15:52 +03:00
Ivan Shapovalov	5502c5eec4	cl::Kernel: skip NULL entries in waitForEvents	2016-07-22 11:15:52 +03:00
Ivan Shapovalov	2dd5ee3f75	clblast::RunKernel, cl::Kernel: take const vector as waitForEvents	2016-07-22 11:15:52 +03:00
Ivan Shapovalov	1ae71614ac	xgemm: do not hardcode kernel requirements for internal matrix layout. Do not hardcode the knowledge about "A and C col-major, B row-major". This allows for easier reuse of the DoGemm() routine with different kernels.	2016-07-22 11:15:52 +03:00
Ivan Shapovalov	a1d80e7402	CMakeLists.txt: use ${clblast_SOURCE_DIR} instead of ${CMAKE_SOURCE_DIR}	2016-07-22 11:15:52 +03:00
Cedric Nugteren	b33bec4a59	Fixed some more types and type conversions in the clpp11 interface to OpenCL	2016-07-16 11:13:23 +02:00
Cedric Nugteren	bee9b959f4	Merge pull request #80 from gcp/getdevinfo_fixes. Make sure the passed types are large enough.	2016-07-16 10:59:51 +02:00
Cedric Nugteren	066af4069b	Removed an unused variable from the copy-transpose-pad function	2016-07-16 10:56:37 +02:00
Gian-Carlo Pascutto	e0ba59c0ac	Make sure the passed types are large enough. Make sure all out parameters that are passed to functions such as clGetDeviceInfo are large enough to contain the replies.	2016-07-13 15:59:02 +02:00
Cedric Nugteren	c87e877bf2	Now passing alpha/beta to the kernel as arguments as before fp16 support; in case of fp16 arguments are cast on host and in kernel	2016-07-10 20:32:01 +02:00
Cedric Nugteren	57f09178d8	Added tuning results for AMD Oland and for Intel Graphics HD 530	2016-07-10 11:46:44 +02:00
Cedric Nugteren	39e9b1238f	Fixed a bug related to the cache and retrieval of programs based on the OpenCL context	2016-07-10 11:24:36 +02:00
Cedric Nugteren	9caa7ca5b9	Cache now compares cl_context instead of a pointer to a context; added verbose print statements to the cache	2016-07-08 20:57:58 +02:00
Cedric Nugteren	27854070b4	Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen	2016-07-06 21:50:12 +02:00
Cedric Nugteren	77325b8974	Added an option to the performance clients to do a warm-up run before timing	2016-07-06 21:25:55 +02:00
CNugteren	2d665099ef	Fixed a linking issue with the tuners on Visual Studio	2016-07-04 19:46:14 +02:00
Cedric Nugteren	9683b50c55	Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp)	2016-07-03 20:30:47 +02:00
Cedric Nugteren	4105a79598	Merge pull request #76 from gcp/fix_local_mem_size. Fixes clGetKernelWorkGroupInfo to work well with both 32-bit and 64-bit systems	2016-07-03 16:34:44 +02:00
Gian-Carlo Pascutto	7424532859	Ensure clGetKernelWorkGroupInfo return value fits. In LocalMemUsage(), there's a first call to clGetKernelWorkGroupInfo to get the "bytes" amount needed to store the result from CL_KERNEL_LOCAL_MEM_SIZE. However, the actual value passed is an "auto result = size_t", which in 32-bit mode is 4 bytes, regardless of the previous return value. The spec describes that it will actually be a cl_ulong which is 8 bytes. To prevent stack corruption, make sure we are in fact passing a cl_ulong. Also adjust all callers to take the changed type into account.	2016-07-02 21:14:36 +02:00
Cedric Nugteren	5a690f4e36	Prints the current pandas version and reports the minimum required version	2016-07-02 16:44:13 +02:00
Cedric Nugteren	7cf2f8c268	Fixed some memory leaks related to events not properly cleaned-up	2016-07-02 15:34:55 +02:00
Cedric Nugteren	b330ab0866	Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library	2016-06-30 10:49:17 +02:00
Cedric Nugteren	cd74aaac52	Updated to version 6.0 of the CLCudaAPI header	2016-06-29 19:42:49 +02:00
Cedric Nugteren	56483347e8	Prepared the changelog for the next release	2016-06-28 22:33:13 +02:00
Cedric Nugteren	577f0ee117	Updated to version 0.8.0	2016-06-28 21:32:00 +02:00
Cedric Nugteren	33dddd3ff1	Changed the AppVeyor buildscript to use nmake instead of 'cmake --build' (2)	2016-06-28 20:56:49 +02:00
Cedric Nugteren	a003cc2f2c	Changed the AppVeyor buildscript to use nmake instead of 'cmake --build'	2016-06-28 20:48:23 +02:00

427 commits