CLBlast

mirror of https://github.com/CNugteren/CLBlast.git synced 2024-07-04 21:36:57 +02:00

Author	SHA1	Message	Date
Cedric Nugteren	d4ffa6395e	Merge pull request #84 from intelfx/device-specific-kernels Groundwork for device-specific routines	2016-07-24 16:48:20 +02:00
Cedric Nugteren	622682ffe3	Refactored the Python database script: separated functionality in modules, now complies to the PEP8 style, added proper command-line argument parsing, and cleaned-up	2016-07-24 16:41:01 +02:00
Ivan Shapovalov	e4e1f05079	clblast::Database, clblast::Routine: implement "database overlays" provided by routine implementation	2016-07-22 11:15:52 +03:00
Ivan Shapovalov	ae3299da30	clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS	2016-07-22 11:15:52 +03:00
Ivan Shapovalov	5502c5eec4	cl::Kernel: skip NULL entries in waitForEvents	2016-07-22 11:15:52 +03:00
Ivan Shapovalov	2dd5ee3f75	clblast::RunKernel, cl::Kernel: take const vector as waitForEvents	2016-07-22 11:15:52 +03:00
Ivan Shapovalov	1ae71614ac	xgemm: do not hardcode kernel requirements for internal matrix layout. Do not hardcode the knowledge about "A and C col-major, B row-major". This allows for easier reuse of the DoGemm() routine with different kernels.	2016-07-22 11:15:52 +03:00
Ivan Shapovalov	a1d80e7402	CMakeLists.txt: use ${clblast_SOURCE_DIR} instead of ${CMAKE_SOURCE_DIR}	2016-07-22 11:15:52 +03:00
Cedric Nugteren	b33bec4a59	Fixed some more types and type conversions in the clpp11 interface to OpenCL	2016-07-16 11:13:23 +02:00
Cedric Nugteren	bee9b959f4	Merge pull request #80 from gcp/getdevinfo_fixes Make sure the passed types are large enough.	2016-07-16 10:59:51 +02:00
Cedric Nugteren	066af4069b	Removed an unused variable from the copy-transpose-pad function	2016-07-16 10:56:37 +02:00
Gian-Carlo Pascutto	e0ba59c0ac	Make sure the passed types are large enough. Make sure all out parameters that are passed to functions such as clGetDeviceInfo are large enough to contain the replies.	2016-07-13 15:59:02 +02:00
Cedric Nugteren	c87e877bf2	Now passing alpha/beta to the kernel as arguments as before fp16 support; in case of fp16 arguments are cast on host and in kernel	2016-07-10 20:32:01 +02:00
Cedric Nugteren	57f09178d8	Added tuning results for AMD Oland and for Intel Graphics HD 530	2016-07-10 11:46:44 +02:00
Cedric Nugteren	39e9b1238f	Fixed a bug related to the cache and retrieval of programs based on the OpenCL context	2016-07-10 11:24:36 +02:00
Cedric Nugteren	9caa7ca5b9	Cache now compares cl_context instead of a pointer to a context; added verbose print statements to the cache	2016-07-08 20:57:58 +02:00
Cedric Nugteren	27854070b4	Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen	2016-07-06 21:50:12 +02:00
Cedric Nugteren	77325b8974	Added an option to the performance clients to do a warm-up run before timing	2016-07-06 21:25:55 +02:00
CNugteren	2d665099ef	Fixed a linking issue with the tuners on Visual Studio	2016-07-04 19:46:14 +02:00
Cedric Nugteren	9683b50c55	Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp)	2016-07-03 20:30:47 +02:00
Cedric Nugteren	4105a79598	Merge pull request #76 from gcp/fix_local_mem_size Fixes clGetKernelWorkGroupInfo to work well with both 32-bit and 64-bit systems	2016-07-03 16:34:44 +02:00
Gian-Carlo Pascutto	7424532859	Ensure clGetKernelWorkGroupInfo return value fits. In LocalMemUsage(), there's a first call to clGetKernelWorkGroupInfo to get the "bytes" amount needed to store the result from CL_KERNEL_LOCAL_MEM_SIZE. However, the actual value passed is an "auto result = size_t", which in 32-bit mode is 4 bytes, regardless of the previous return value. The spec describes that it will actually be a cl_ulong which is 8 bytes. To prevent stack corruption, make sure we are in fact passing a cl_ulong. Also adjust all callers to take the changed type into account.	2016-07-02 21:14:36 +02:00
Cedric Nugteren	5a690f4e36	Prints the current pandas version and reports the minimum required version	2016-07-02 16:44:13 +02:00
Cedric Nugteren	7cf2f8c268	Fixed some memory leaks related to events not properly cleaned-up	2016-07-02 15:34:55 +02:00
Cedric Nugteren	b330ab0866	Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library	2016-06-30 10:49:17 +02:00
Cedric Nugteren	cd74aaac52	Updated to version 6.0 of the CLCudaAPI header	2016-06-29 19:42:49 +02:00
Cedric Nugteren	56483347e8	Prepared the changelog for the next release	2016-06-28 22:33:13 +02:00
Cedric Nugteren	577f0ee117	Updated to version 0.8.0	2016-06-28 21:32:00 +02:00
Cedric Nugteren	33dddd3ff1	Changed the AppVeyor buildscript to use nmake instead of 'cmake --build' (2)	2016-06-28 20:56:49 +02:00
Cedric Nugteren	a003cc2f2c	Changed the AppVeyor buildscript to use nmake instead of 'cmake --build'	2016-06-28 20:48:23 +02:00
Cedric Nugteren	743da1b3fc	Fixes bug in AppVeyor with install directory (2)	2016-06-28 20:06:34 +02:00
Cedric Nugteren	88014e38bc	Fixes bug in AppVeyor with install directory	2016-06-28 18:23:32 +02:00
Cedric Nugteren	7c6bb6e21d	Added configuration for AppVeyor to keep the results of the builds as an 'artifact'	2016-06-28 17:58:34 +02:00
CNugteren	871b576c06	Made it possible to build the clients and tests on Windows using Visual Studio	2016-06-28 16:38:45 +02:00
CNugteren	2c031f3e1d	Made it possible to build the OMATCOPY test and client in case only clBLAS is present	2016-06-28 16:36:01 +02:00
Cedric Nugteren	9171f1c160	Updated the README in various places	2016-06-27 17:28:48 +02:00
Cedric Nugteren	76b20cfe0c	Fixes for the AppVeyor Windows build	2016-06-27 14:44:08 +02:00
Cedric Nugteren	5557a6ae81	Added vcvarsall to AppVeyor and added AppVeyor icons to README	2016-06-27 14:10:56 +02:00
Cedric Nugteren	dac99451d9	Fixed a bug in the AppVeyor script	2016-06-27 13:55:16 +02:00
Cedric Nugteren	7eeb790824	Added AppVeyor Windows CI support	2016-06-27 12:47:39 +02:00
Cedric Nugteren	5f8886339a	Increased coverage of Travis CI automatic builds	2016-06-27 12:16:12 +02:00
Cedric Nugteren	69beca90f4	Moved the performance graph scripts to the 'scripts' subfolder	2016-06-27 11:51:57 +02:00
Cedric Nugteren	ca386f9883	Added fp16 to the alltuners target	2016-06-27 11:46:33 +02:00
Cedric Nugteren	fdfbc9af13	Changed the symbol for error-code skipped tests to distinguish from successful error-code checks in the correctness tests	2016-06-27 11:27:54 +02:00
Cedric Nugteren	8f7131bd90	Increased the verbosity of the '-verbose' option for the correctness tests, now printing when a library is called	2016-06-27 11:16:30 +02:00
Cedric Nugteren	66908ef5cd	Added tuning results for 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile' (thanks to OursDesCavernes)	2016-06-19 14:59:50 +02:00
Cedric Nugteren	eab8d3cda1	Minor fix to the database script	2016-06-19 14:55:17 +02:00
Cedric Nugteren	395a0ef34e	Merge pull request #69 from CNugteren/refactoring Refactoring of the Routine class and file-renaming	2016-06-19 14:03:53 +02:00
Cedric Nugteren	61203453aa	Renamed all C++ source files to .cpp to match the .hpp extension better	2016-06-19 13:55:49 +02:00
Cedric Nugteren	f726fbdc9f	Moved all headers into the source tree, changed headers to .hpp extension	2016-06-18 20:20:13 +02:00

407 commits