Ivan Shapovalov
227374deba
.appveyor.yml: move {OPENCL,CLBLAST}_ROOT out of source tree
Reasoning is the same as in the previous commit: CMake does not like having
OpenCL header path inside of the source tree. CLBLAST_ROOT is moved for
uniformity.
2016-07-28 19:09:30 +03:00
Ivan Shapovalov
6c11fdc12c
.travis.yml: use OpenCL ICD Loader and headers shipped by distro
Using our own headers causes problems with CMake, which does not like having
OpenCL header path inside of the source tree. While at it, use distro's
universal OpenCL loader as well.
2016-07-28 19:09:29 +03:00
Ivan Shapovalov
b5d7b58393
CMakeLists.txt: use target_include_directories()
2016-07-28 19:09:29 +03:00
Ivan Shapovalov
570cbcffa7
CMakeLists.txt: provide a find_package() config for dependent projects
2016-07-28 19:09:29 +03:00
Cedric Nugteren
ffa35c623a
Minor improvements after merging in groundwork for custom tuning parameters and kernels
2016-07-24 17:00:21 +02:00
Cedric Nugteren
d4ffa6395e
Merge pull request #84 from intelfx/device-specific-kernels
Groundwork for device-specific routines
2016-07-24 16:48:20 +02:00
Cedric Nugteren
622682ffe3
Refactored the Python database script: separated functionality into modules, now complies with the PEP8 style, added proper command-line argument parsing, and cleaned up
2016-07-24 16:41:01 +02:00
Ivan Shapovalov
e4e1f05079
clblast::Database, clblast::Routine: implement "database overlays" provided by routine implementation
2016-07-22 11:15:52 +03:00
Ivan Shapovalov
ae3299da30
clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS
2016-07-22 11:15:52 +03:00
Ivan Shapovalov
5502c5eec4
cl::Kernel: skip NULL entries in waitForEvents
2016-07-22 11:15:52 +03:00
Ivan Shapovalov
2dd5ee3f75
clblast::RunKernel, cl::Kernel: take const vector as waitForEvents
2016-07-22 11:15:52 +03:00
Ivan Shapovalov
1ae71614ac
xgemm: do not hardcode kernel requirements for internal matrix layout
Do not hardcode the knowledge about "A and C col-major, B row-major".
This allows for easier reuse of the DoGemm() routine with different
kernels.
2016-07-22 11:15:52 +03:00
Ivan Shapovalov
a1d80e7402
CMakeLists.txt: use ${clblast_SOURCE_DIR} instead of ${CMAKE_SOURCE_DIR}
2016-07-22 11:15:52 +03:00
Cedric Nugteren
b33bec4a59
Fixed some more types and type conversions in the clpp11 interface to OpenCL
2016-07-16 11:13:23 +02:00
Cedric Nugteren
bee9b959f4
Merge pull request #80 from gcp/getdevinfo_fixes
Make sure the passed types are large enough.
2016-07-16 10:59:51 +02:00
Cedric Nugteren
066af4069b
Removed an unused variable from the copy-transpose-pad function
2016-07-16 10:56:37 +02:00
Gian-Carlo Pascutto
e0ba59c0ac
Make sure the passed types are large enough.
Make sure all out parameters that are passed to functions such
as clGetDeviceInfo are large enough to contain the replies.
2016-07-13 15:59:02 +02:00
Cedric Nugteren
c87e877bf2
Now passing alpha/beta to the kernel as arguments, as was done before fp16 support; in the fp16 case, the arguments are cast on the host and in the kernel
2016-07-10 20:32:01 +02:00
Cedric Nugteren
57f09178d8
Added tuning results for AMD Oland and for Intel Graphics HD 530
2016-07-10 11:46:44 +02:00
Cedric Nugteren
39e9b1238f
Fixed a bug related to the caching and retrieval of programs based on the OpenCL context
2016-07-10 11:24:36 +02:00
Cedric Nugteren
9caa7ca5b9
Cache now compares cl_context instead of a pointer to a context; added verbose print statements to the cache
2016-07-08 20:57:58 +02:00
Cedric Nugteren
27854070b4
Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to the screen
2016-07-06 21:50:12 +02:00
Cedric Nugteren
77325b8974
Added an option to the performance clients to do a warm-up run before timing
2016-07-06 21:25:55 +02:00
CNugteren
2d665099ef
Fixed a linking issue with the tuners on Visual Studio
2016-07-04 19:46:14 +02:00
Cedric Nugteren
9683b50c55
Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp)
2016-07-03 20:30:47 +02:00
Cedric Nugteren
4105a79598
Merge pull request #76 from gcp/fix_local_mem_size
Fixes clGetKernelWorkGroupInfo to work well with both 32-bit and 64-bit systems
2016-07-03 16:34:44 +02:00
Gian-Carlo Pascutto
7424532859
Ensure clGetKernelWorkGroupInfo return value fits.
In LocalMemUsage(), there's a first call to clGetKernelWorkGroupInfo
to get the "bytes" amount needed to store the result from
CL_KERNEL_LOCAL_MEM_SIZE. However, the actual value passed is an
"auto result = size_t", which in 32-bit mode is 4 bytes, regardless
of the previous return value. The spec describes that it will actually
be a cl_ulong which is 8 bytes. To prevent stack corruption, make sure
we are in fact passing a cl_ulong.
Also adjust all callers to take the changed type into account.
2016-07-02 21:14:36 +02:00
Cedric Nugteren
5a690f4e36
Prints the current pandas version and reports the minimum required version
2016-07-02 16:44:13 +02:00
Cedric Nugteren
7cf2f8c268
Fixed some memory leaks related to events not being properly cleaned up
2016-07-02 15:34:55 +02:00
Cedric Nugteren
b330ab0866
Added declspec(dllexport) to ClearCache and FillCache, and added declspec(dllimport) when not building the library
2016-06-30 10:49:17 +02:00
Cedric Nugteren
cd74aaac52
Updated to version 6.0 of the CLCudaAPI header
2016-06-29 19:42:49 +02:00
Cedric Nugteren
56483347e8
Prepared the changelog for the next release
2016-06-28 22:33:13 +02:00
Cedric Nugteren
577f0ee117
Updated to version 0.8.0
2016-06-28 21:32:00 +02:00
Cedric Nugteren
33dddd3ff1
Changed the AppVeyor buildscript to use nmake instead of 'cmake --build' (2)
2016-06-28 20:56:49 +02:00
Cedric Nugteren
a003cc2f2c
Changed the AppVeyor buildscript to use nmake instead of 'cmake --build'
2016-06-28 20:48:23 +02:00
Cedric Nugteren
743da1b3fc
Fixes bug in AppVeyor with install directory (2)
2016-06-28 20:06:34 +02:00
Cedric Nugteren
88014e38bc
Fixes bug in AppVeyor with install directory
2016-06-28 18:23:32 +02:00
Cedric Nugteren
7c6bb6e21d
Added configuration for AppVeyor to keep the results of the builds as an 'artifact'
2016-06-28 17:58:34 +02:00
CNugteren
871b576c06
Made it possible to build the clients and tests on Windows using Visual Studio
2016-06-28 16:38:45 +02:00
CNugteren
2c031f3e1d
Made it possible to build the OMATCOPY test and client in case only clBLAS is present
2016-06-28 16:36:01 +02:00
Cedric Nugteren
9171f1c160
Updated the README in various places
2016-06-27 17:28:48 +02:00
Cedric Nugteren
76b20cfe0c
Fixes for the AppVeyor Windows build
2016-06-27 14:44:08 +02:00
Cedric Nugteren
5557a6ae81
Added vcvarsall to AppVeyor and added AppVeyor icons to README
2016-06-27 14:10:56 +02:00
Cedric Nugteren
dac99451d9
Fixed a bug in the AppVeyor script
2016-06-27 13:55:16 +02:00
Cedric Nugteren
7eeb790824
Added AppVeyor Windows CI support
2016-06-27 12:47:39 +02:00
Cedric Nugteren
5f8886339a
Increased coverage of Travis CI automatic builds
2016-06-27 12:16:12 +02:00
Cedric Nugteren
69beca90f4
Moved the performance graph scripts to the 'scripts' subfolder
2016-06-27 11:51:57 +02:00
Cedric Nugteren
ca386f9883
Added fp16 to the alltuners target
2016-06-27 11:46:33 +02:00
Cedric Nugteren
fdfbc9af13
Changed the symbol for skipped error-code tests to distinguish them from successful error-code checks in the correctness tests
2016-06-27 11:27:54 +02:00
Cedric Nugteren
8f7131bd90
Increased the verbosity of the '-verbose' option for the correctness tests, now printing when a library is called
2016-06-27 11:16:30 +02:00