496 commits

Author SHA1 Message Date
Cedric Nugteren 9683b50c55 Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp) 2016-07-03 20:30:47 +02:00
Cedric Nugteren 4105a79598 Merge pull request #76 from gcp/fix_local_mem_size
Fixes clGetKernelWorkGroupInfo to work well with both 32-bit and 64-bit systems
2016-07-03 16:34:44 +02:00
Gian-Carlo Pascutto 7424532859 Ensure clGetKernelWorkGroupInfo return value fits.
In LocalMemUsage(), there's a first call to clGetKernelWorkGroupInfo
to get the "bytes" amount needed to store the result from
CL_KERNEL_LOCAL_MEM_SIZE. However, the actual value passed is an
"auto result = size_t", which in 32-bit mode is 4 bytes, regardless
of the previous return value. The spec describes that it will actually
be a cl_ulong which is 8 bytes. To prevent stack corruption, make sure
we are in fact passing a cl_ulong.

Also adjust all callers to take the changed type into account.
2016-07-02 21:14:36 +02:00
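The fix above follows the usual two-step OpenCL info-query pattern: ask how many bytes the value occupies, then read it into storage of exactly that type. Below is a minimal sketch of that pattern, assuming a hypothetical `mock_get_info` that stands in for `clGetKernelWorkGroupInfo` with `CL_KERNEL_LOCAL_MEM_SIZE` (which reports an 8-byte `cl_ulong`). It is an illustration only, not the code from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using ulong64 = std::uint64_t;  // stand-in for cl_ulong, a 64-bit unsigned integer

// Hypothetical stand-in for an OpenCL info query: reports the value's size via
// size_ret and copies the value into dest only if dest_size is large enough.
int mock_get_info(std::size_t dest_size, void* dest, std::size_t* size_ret) {
  const ulong64 local_mem = 4096;  // made-up value for illustration
  if (size_ret != nullptr) { *size_ret = sizeof(local_mem); }
  if (dest != nullptr) {
    if (dest_size < sizeof(local_mem)) { return -1; }  // hypothetical error code
    std::memcpy(dest, &local_mem, sizeof(local_mem));
  }
  return 0;
}

// Two-step query: first ask for the size, then read into a variable whose type
// matches what the query actually returns (a 4-byte size_t on 32-bit is too small).
bool query_local_mem(ulong64& result) {
  std::size_t bytes = 0;
  if (mock_get_info(0, nullptr, &bytes) != 0) { return false; }
  if (bytes != sizeof(ulong64)) { return false; }  // refuse a mismatched destination
  return mock_get_info(bytes, &result, nullptr) == 0;
}
```

In the buggy version, the reported byte count (8) was passed along with a pointer to a `size_t`, so on a 32-bit build the query would write 8 bytes into 4 bytes of stack storage; matching the destination type to the reported size avoids that.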
Cedric Nugteren 5a690f4e36 Prints the current pandas version and reports the minimum required version 2016-07-02 16:44:13 +02:00
Cedric Nugteren 7cf2f8c268 Fixed some memory leaks related to events not properly cleaned-up 2016-07-02 15:34:55 +02:00
Cedric Nugteren b330ab0866 Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library 2016-06-30 10:49:17 +02:00
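The export/import switch described in this commit is commonly expressed with a single macro. The sketch below assumes hypothetical names (`MYLIB_API`, `MYLIB_BUILDING`, `ClearCacheExample`); the actual CLBlast header may use different ones.

```cpp
#include <cassert>

// In a real build, the build system would define MYLIB_BUILDING (e.g. with a
// compiler -D flag) only when compiling the library itself; it is defined here
// so the sketch is self-contained.
#define MYLIB_BUILDING

#if defined(_WIN32)
  #if defined(MYLIB_BUILDING)
    #define MYLIB_API __declspec(dllexport)  // building the DLL: export the symbol
  #else
    #define MYLIB_API __declspec(dllimport)  // using the DLL: import the symbol
  #endif
#else
  #define MYLIB_API  // no decoration needed on non-Windows toolchains
#endif

// Hypothetical exported function, standing in for ClearCache/FillCache.
MYLIB_API int ClearCacheExample() { return 0; }
```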
Cedric Nugteren cd74aaac52 Updated to version 6.0 of the CLCudaAPI header 2016-06-29 19:42:49 +02:00
Cedric Nugteren 56483347e8 Prepared the changelog for the next release 2016-06-28 22:33:13 +02:00
Cedric Nugteren 7c13bacf12 Merge pull request #70 from CNugteren/development
Update to version 0.8.0
2016-06-28 22:32:25 +02:00
Cedric Nugteren 577f0ee117 Updated to version 0.8.0 2016-06-28 21:32:00 +02:00
Cedric Nugteren 33dddd3ff1 Changed the AppVeyor buildscript to use nmake instead of 'cmake --build' (2) 2016-06-28 20:56:49 +02:00
Cedric Nugteren a003cc2f2c Changed the AppVeyor buildscript to use nmake instead of 'cmake --build' 2016-06-28 20:48:23 +02:00
Cedric Nugteren 743da1b3fc Fixes bug in AppVeyor with install directory (2) 2016-06-28 20:06:34 +02:00
Cedric Nugteren 88014e38bc Fixes bug in AppVeyor with install directory 2016-06-28 18:23:32 +02:00
Cedric Nugteren 7c6bb6e21d Added configuration for AppVeyor to keep the results of the builds as an 'artifact' 2016-06-28 17:58:34 +02:00
CNugteren 871b576c06 Made it possible to build the clients and tests on Windows using Visual Studio 2016-06-28 16:38:45 +02:00
CNugteren 2c031f3e1d Made it possible to build the OMATCOPY test and client in case only clBLAS is present 2016-06-28 16:36:01 +02:00
Cedric Nugteren 9171f1c160 Updated the README in various places 2016-06-27 17:28:48 +02:00
Cedric Nugteren 76b20cfe0c Fixes for the AppVeyor Windows build 2016-06-27 14:44:08 +02:00
Cedric Nugteren 5557a6ae81 Added vcvarsall to AppVeyor and added AppVeyor icons to README 2016-06-27 14:10:56 +02:00
Cedric Nugteren dac99451d9 Fixed a bug in the Appveyor script 2016-06-27 13:55:16 +02:00
Cedric Nugteren 7eeb790824 Added Appveyor Windows CI support 2016-06-27 12:47:39 +02:00
Cedric Nugteren 5f8886339a Increased coverage of Travis CI automatic builds 2016-06-27 12:16:12 +02:00
Cedric Nugteren 69beca90f4 Moved the performance graph scripts to the 'scripts' subfolder 2016-06-27 11:51:57 +02:00
Cedric Nugteren ca386f9883 Added fp16 to the alltuners target 2016-06-27 11:46:33 +02:00
Cedric Nugteren fdfbc9af13 Changed the symbol for error-code skipped tests to distinguish from successful error-code checks in the correctness tests 2016-06-27 11:27:54 +02:00
Cedric Nugteren 8f7131bd90 Increased the verbosity of the '-verbose' option for the correctness tests, now printing when a library is called 2016-06-27 11:16:30 +02:00
Cedric Nugteren 66908ef5cd Added tuning results for 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile' (thanks to OursDesCavernes) 2016-06-19 14:59:50 +02:00
Cedric Nugteren eab8d3cda1 Minor fix to the database script 2016-06-19 14:55:17 +02:00
Cedric Nugteren 395a0ef34e Merge pull request #69 from CNugteren/refactoring
Refactoring of the Routine class and file-renaming
2016-06-19 14:03:53 +02:00
Cedric Nugteren 61203453aa Renamed all C++ source files to .cpp to match the .hpp extension better 2016-06-19 13:55:49 +02:00
Cedric Nugteren f726fbdc9f Moved all headers into the source tree, changed headers to .hpp extension 2016-06-18 20:20:13 +02:00
Cedric Nugteren bacb5d2bb2 Clean-up of the routine class, moved RunKernel to the routine/common file 2016-06-18 18:16:14 +02:00
Cedric Nugteren 7b4c0e1cf0 Removed the template from the Routine base-class 2016-06-18 14:56:55 +02:00
Cedric Nugteren f9947b4d7f Removed the precision argument from the routines in favor of a single templated function 2016-06-17 14:30:37 +02:00
Cedric Nugteren 536b7fe4bc Removed the interface to the cache functions from the Routine class, calls them directly now 2016-06-17 13:57:50 +02:00
Cedric Nugteren 98a95c89fc Moved the RunKernel and PadCopyTransposeMatrix functions out of the Routine class 2016-06-17 12:32:06 +02:00
Cedric Nugteren 520e28e7a7 Moved the ErrorIn function from the Routine class to the utilities header 2016-06-17 11:41:10 +02:00
Cedric Nugteren afe8852eaa Moved the test-for-valid-buffers function from the Routine class to separate functions in a separate file 2016-06-17 11:29:07 +02:00
Cedric Nugteren 52ccaf5b25 Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing 2016-06-16 18:07:46 +02:00
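As a rough illustration of what an out-of-place scale/copy/transpose computes, B = alpha * A or B = alpha * A^T, here is a minimal CPU reference sketch for row-major storage. The function name and argument order are hypothetical and do not reflect CLBlast's actual XOMATCOPY interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: A is rows x cols (row-major, leading dimension lda).
// Without transpose, B is rows x cols; with transpose, B is cols x rows (leading dimension ldb).
template <typename T>
void omatcopy_ref(bool transpose, std::size_t rows, std::size_t cols, T alpha,
                  const std::vector<T>& a, std::size_t lda,
                  std::vector<T>& b, std::size_t ldb) {
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < cols; ++j) {
      const T value = alpha * a[i * lda + j];  // scale while copying
      if (transpose) {
        b[j * ldb + i] = value;  // element (i, j) of A goes to (j, i) of B
      } else {
        b[i * ldb + j] = value;  // plain scaled copy
      }
    }
  }
}
```

For a 2x3 matrix A = [[1, 2, 3], [4, 5, 6]] with alpha = 2 and transpose enabled, B becomes the 3x2 matrix [[2, 8], [4, 10], [6, 12]].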
Cedric Nugteren 39b7dbc5e3 Added some constness to variables related to the GEMM routines 2016-06-15 12:34:05 +02:00
Cedric Nugteren b894611ad1 Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) and renamed files and functions appropriately 2016-06-14 18:17:58 +02:00
Cedric Nugteren 3e78a99355 Moved device vendor and type checks to a common header 2016-06-14 14:30:22 +02:00
Cedric Nugteren 6e2017c67d Added support for FP16 on ARM Mali-T628 (officially not supported) 2016-06-14 14:29:53 +02:00
Cedric Nugteren 995a528cec Improved API documentation and added documentation for level-2 and level-3 routines 2016-06-13 20:17:26 +02:00
Cedric Nugteren 4fb8f9517c Added documentation for the matrix-update level-2 family of routines 2016-06-10 11:16:06 +02:00
Cedric Nugteren 6925003e45 Added global memory synchronisation for better cache performance on ARM Mali GPUs 2016-06-08 10:13:37 +02:00
Cedric Nugteren 6d6b030053 Made the CPU BLAS library the default reference to test against in favor of clBLAS 2016-06-08 09:21:39 +02:00
Cedric Nugteren 7a7873d552 Fixed the RPATH settings for linking on OS X 2016-06-06 13:40:52 +02:00
Cedric Nugteren c1895ea459 Made the tests for invalid buffer sizes also verbose in verbose mode 2016-06-06 12:20:42 +02:00