Cedric Nugteren
27854070b4
Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen
2016-07-06 21:50:12 +02:00
Cedric Nugteren
77325b8974
Added an option to the performance clients to do a warm-up run before timing
2016-07-06 21:25:55 +02:00
CNugteren
2d665099ef
Fixed a linking issue with the tuners on Visual Studio
2016-07-04 19:46:14 +02:00
Cedric Nugteren
9683b50c55
Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp)
2016-07-03 20:30:47 +02:00
Cedric Nugteren
4105a79598
Merge pull request #76 from gcp/fix_local_mem_size
...
Fixes clGetKernelWorkGroupInfo to work well with both 32-bit and 64-bit systems
2016-07-03 16:34:44 +02:00
Gian-Carlo Pascutto
7424532859
Ensure clGetKernelWorkGroupInfo return value fits.
...
In LocalMemUsage(), there's a first call to clGetKernelWorkGroupInfo
to get the "bytes" amount needed to store the result from
CL_KERNEL_LOCAL_MEM_SIZE. However, the actual value passed is an
"auto result = size_t", which in 32-bit mode is 4 bytes, regardless
of the previous return value. The spec describes that it will actually
be a cl_ulong which is 8 bytes. To prevent stack corruption, make sure
we are in fact passing a cl_ulong.
Also adjust all callers to take the changed type into account.
2016-07-02 21:14:36 +02:00
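A minimal sketch of the two-call query pattern described in the commit above, written under stated assumptions rather than taken from the repository. `MockGetInfo` and `QueryLocalMemUsage` are hypothetical names, and the mock only imitates the size-then-value behaviour of an OpenCL info query; it is not the real `clGetKernelWorkGroupInfo`. The point it illustrates is that the destination variable should have the fixed 8-byte width the spec prescribes (a `cl_ulong`, modelled here as `std::uint64_t`) rather than `size_t`, which is only 4 bytes on 32-bit systems.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an OpenCL-style info query (not the real API).
// It reports an 8-byte result, mirroring the cl_ulong that the spec defines
// for CL_KERNEL_LOCAL_MEM_SIZE.
int MockGetInfo(std::size_t value_size, void* value, std::size_t* size_ret) {
  const std::uint64_t local_mem_bytes = 4096;  // made-up value for illustration
  if (size_ret != nullptr) { *size_ret = sizeof(local_mem_bytes); }
  if (value != nullptr) {
    if (value_size < sizeof(local_mem_bytes)) { return -1; }  // buffer too small
    std::memcpy(value, &local_mem_bytes, sizeof(local_mem_bytes));
  }
  return 0;
}

// Reads the value into a fixed 8-byte variable, so the write can never
// overrun the destination, even where size_t is only 4 bytes.
std::uint64_t QueryLocalMemUsage() {
  std::size_t bytes = 0;
  MockGetInfo(0, nullptr, &bytes);         // first call: how many bytes are needed?
  assert(bytes == sizeof(std::uint64_t));  // the reported size must match the destination
  std::uint64_t result = 0;                // 8 bytes on both 32-bit and 64-bit builds
  MockGetInfo(bytes, &result, nullptr);    // second call: fetch the value
  return result;
}
```

Had `result` been declared as a `size_t` on a 32-bit build, the second call would have copied 8 bytes into a 4-byte variable, which is the stack corruption the commit message refers to.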
Cedric Nugteren
5a690f4e36
Prints the current pandas version and reports the minimum required version
2016-07-02 16:44:13 +02:00
Cedric Nugteren
7cf2f8c268
Fixed some memory leaks related to events not being properly cleaned up
2016-07-02 15:34:55 +02:00
Cedric Nugteren
b330ab0866
Added __declspec(dllexport) to ClearCache and FillCache, and added __declspec(dllimport) when not building the library
2016-06-30 10:49:17 +02:00
Cedric Nugteren
cd74aaac52
Updated to version 6.0 of the CLCudaAPI header
2016-06-29 19:42:49 +02:00
Cedric Nugteren
56483347e8
Prepared the changelog for the next release
2016-06-28 22:33:13 +02:00
Cedric Nugteren
7c13bacf12
Merge pull request #70 from CNugteren/development
...
Update to version 0.8.0
2016-06-28 22:32:25 +02:00
Cedric Nugteren
577f0ee117
Updated to version 0.8.0
2016-06-28 21:32:00 +02:00
Cedric Nugteren
33dddd3ff1
Changed the AppVeyor buildscript to use nmake instead of 'cmake --build' (2)
2016-06-28 20:56:49 +02:00
Cedric Nugteren
a003cc2f2c
Changed the AppVeyor buildscript to use nmake instead of 'cmake --build'
2016-06-28 20:48:23 +02:00
Cedric Nugteren
743da1b3fc
Fixes bug in AppVeyor with install directory (2)
2016-06-28 20:06:34 +02:00
Cedric Nugteren
88014e38bc
Fixes bug in AppVeyor with install directory
2016-06-28 18:23:32 +02:00
Cedric Nugteren
7c6bb6e21d
Added configuration for AppVeyor to keep the results of the builds as an 'artifact'
2016-06-28 17:58:34 +02:00
CNugteren
871b576c06
Made it possible to build the clients and tests on Windows using Visual Studio
2016-06-28 16:38:45 +02:00
CNugteren
2c031f3e1d
Made it possible to build the OMATCOPY test and client in case only clBLAS is present
2016-06-28 16:36:01 +02:00
Cedric Nugteren
9171f1c160
Updated the README in various places
2016-06-27 17:28:48 +02:00
Cedric Nugteren
76b20cfe0c
Fixes for the AppVeyor Windows build
2016-06-27 14:44:08 +02:00
Cedric Nugteren
5557a6ae81
Added vcvarsall to AppVeyor and added AppVeyor icons to README
2016-06-27 14:10:56 +02:00
Cedric Nugteren
dac99451d9
Fixed a bug in the AppVeyor script
2016-06-27 13:55:16 +02:00
Cedric Nugteren
7eeb790824
Added AppVeyor Windows CI support
2016-06-27 12:47:39 +02:00
Cedric Nugteren
5f8886339a
Increased coverage of Travis CI automatic builds
2016-06-27 12:16:12 +02:00
Cedric Nugteren
69beca90f4
Moved the performance graph scripts to the 'scripts' subfolder
2016-06-27 11:51:57 +02:00
Cedric Nugteren
ca386f9883
Added fp16 to the alltuners target
2016-06-27 11:46:33 +02:00
Cedric Nugteren
fdfbc9af13
Changed the symbol for error-code skipped tests to distinguish them from successful error-code checks in the correctness tests
2016-06-27 11:27:54 +02:00
Cedric Nugteren
8f7131bd90
Increased the verbosity of the '-verbose' option for the correctness tests, now printing when a library is called
2016-06-27 11:16:30 +02:00
Cedric Nugteren
66908ef5cd
Added tuning results for 'Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile' (thanks to OursDesCavernes)
2016-06-19 14:59:50 +02:00
Cedric Nugteren
eab8d3cda1
Minor fix to the database script
2016-06-19 14:55:17 +02:00
Cedric Nugteren
395a0ef34e
Merge pull request #69 from CNugteren/refactoring
...
Refactoring of the Routine class and file-renaming
2016-06-19 14:03:53 +02:00
Cedric Nugteren
61203453aa
Renamed all C++ source files to .cpp to match the .hpp extension better
2016-06-19 13:55:49 +02:00
Cedric Nugteren
f726fbdc9f
Moved all headers into the source tree, changed headers to .hpp extension
2016-06-18 20:20:13 +02:00
Cedric Nugteren
bacb5d2bb2
Clean-up of the routine class, moved RunKernel to the routine/common file
2016-06-18 18:16:14 +02:00
Cedric Nugteren
7b4c0e1cf0
Removed the template from the Routine base-class
2016-06-18 14:56:55 +02:00
Cedric Nugteren
f9947b4d7f
Removed the precision argument from the routines in favor of a single templated function
2016-06-17 14:30:37 +02:00
Cedric Nugteren
536b7fe4bc
Removed the interface to the cache functions from the Routine class, calls them directly now
2016-06-17 13:57:50 +02:00
Cedric Nugteren
98a95c89fc
Moved the RunKernel and PadCopyTransposeMatrix functions out of the Routine class
2016-06-17 12:32:06 +02:00
Cedric Nugteren
520e28e7a7
Moved the ErrorIn function from the Routine class to the utilities header
2016-06-17 11:41:10 +02:00
Cedric Nugteren
afe8852eaa
Moved the test-for-valid-buffers function from the Routine class to separate functions in a separate file
2016-06-17 11:29:07 +02:00
Cedric Nugteren
52ccaf5b25
Added XOMATCOPY routines to perform out-of-place matrix scaling, copying, and/or transposing
2016-06-16 18:07:46 +02:00
Cedric Nugteren
39b7dbc5e3
Added some constness to variables related to the GEMM routines
2016-06-15 12:34:05 +02:00
Cedric Nugteren
b894611ad1
Re-organised the level-3 supporting kernels (copy, pad, transpose, convert) and renamed files and functions appropriately
2016-06-14 18:17:58 +02:00
Cedric Nugteren
3e78a99355
Moved device vendor and type checks to a common header
2016-06-14 14:30:22 +02:00
Cedric Nugteren
6e2017c67d
Added support for FP16 on ARM Mali-T628 (officially not supported)
2016-06-14 14:29:53 +02:00
Cedric Nugteren
995a528cec
Improved API documentation and added documentation for level-2 and level-3 routines
2016-06-13 20:17:26 +02:00
Cedric Nugteren
4fb8f9517c
Added documentation for the matrix-update level-2 family of routines
2016-06-10 11:16:06 +02:00
Cedric Nugteren
6925003e45
Added global memory synchronisation for better cache performance on ARM Mali GPUs
2016-06-08 10:13:37 +02:00