Cedric Nugteren
478fb089d5
Merge pull request #93 from intelfx/test-read-environment
test/correctness: read platform and device from environment
2016-08-27 10:16:34 +02:00
Ivan Shapovalov
ea43936e94
test/correctness: read platform and device from environment
Support passing environment variables CLBLAST_PLATFORM and CLBLAST_DEVICE
instead of -platform and -device arguments to test executables.
This is for `ctest`.
2016-08-27 05:37:26 +03:00
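One way a test executable could honour both the -platform/-device arguments and the CLBLAST_PLATFORM/CLBLAST_DEVICE variables is sketched below. The helper names (FindArgument, GetIndex) are hypothetical, and the precedence order (argument first, then environment, then default) is an assumption rather than something stated in the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: finds "-<name> <value>" among the arguments and parses the value.
bool FindArgument(const std::vector<std::string>& args, const std::string& name,
                  std::size_t& value) {
  for (std::size_t i = 0; i + 1 < args.size(); ++i) {
    if (args[i] == "-" + name) {
      value = std::stoul(args[i + 1]);
      return true;
    }
  }
  return false;
}

// Hypothetical selection order (an assumption, not taken from the commit):
// command-line argument first, then the environment variable, then the default.
std::size_t GetIndex(const std::vector<std::string>& args, const std::string& arg_name,
                     const char* env_name, std::size_t default_value) {
  std::size_t value = default_value;
  if (FindArgument(args, arg_name, value)) { return value; }
  if (const char* env = std::getenv(env_name)) {
    char* end = nullptr;
    const unsigned long parsed = std::strtoul(env, &end, 10);
    // Only accept the variable if it is a complete, non-empty decimal number
    if (end != env && *end == '\0') { return static_cast<std::size_t>(parsed); }
  }
  return default_value;
}
```

Because child processes inherit the environment, a call such as GetIndex(args, "device", "CLBLAST_DEVICE", 0) would pick up a value set once for a whole run, e.g. `CLBLAST_PLATFORM=1 CLBLAST_DEVICE=0 ctest`.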
Cedric Nugteren
8d6a6a5bbf
Merge branch 'database_defaults' into development
2016-08-22 19:31:36 +02:00
Cedric Nugteren
0c0f0ac7f9
Also changed the default-default for unknown device types to use the same method as for known device groups
2016-08-21 20:35:20 +02:00
Cedric Nugteren
84db8958d1
Increased the ratio of GEMM tuning results to explore; reduced the tuning search space to have a better chance to evaluate more likely parameter combinations
2016-08-21 20:28:02 +02:00
Cedric Nugteren
00979faab4
Updated the changelog; refactored the database-get-bests code a bit
2016-08-21 20:16:06 +02:00
Cedric Nugteren
7eeef74338
Merge branch 'development' of github.com:CNugteren/CLBlast into development
Conflicts:
README.md
2016-08-20 12:59:21 +02:00
Cedric Nugteren
ce9ba27450
Merge branch 'dvasschemacq-master' into development
2016-08-20 12:51:16 +02:00
Cedric Nugteren
6eca53ee23
Merge branch 'master' of https://github.com/dvasschemacq/CLBlast into dvasschemacq-master
Conflicts:
src/kernels/level1/xaxpy.opencl
src/kernels/level2/xgemv.opencl
src/kernels/level2/xgemv_fast.opencl
src/kernels/level2/xger.opencl
src/kernels/level2/xher.opencl
src/kernels/level2/xher2.opencl
src/kernels/level3/xgemm_part2.opencl
2016-08-20 12:50:31 +02:00
D. Van Assche
57f1aa7685
Adapt opencl files for 1.1 OpenCL
In OpenCL 1.1, __kernel has to come before __attribute__, at least with the
Vivante compiler.
2016-08-18 17:33:13 +02:00
Cedric Nugteren
7d5631b7e4
Updated the database script to calculate the relative best performance of tuning results common for a device/vendor type
2016-08-15 21:01:07 +02:00
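A C++ sketch of one way a "relative best" common default could be computed is shown below; the actual database script is written in Python and may weigh results differently. Assumptions: each device's results map a configuration name to a run time, a configuration's relative performance on a device is that device's fastest time divided by the configuration's time, only configurations measured on every device are considered, and the winner has the highest mean relative performance. CommonBest and DeviceResults are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Tuning results for one device: configuration name -> run time (e.g. in ms).
using DeviceResults = std::map<std::string, double>;

// Returns the configuration with the highest mean relative performance across all
// devices, considering only configurations measured on every device.
std::string CommonBest(const std::vector<DeviceResults>& devices) {
  std::string best_config;
  double best_score = -1.0;
  if (devices.empty()) { return best_config; }

  // Fastest time per device, over all of that device's results
  std::vector<double> fastest;
  for (const auto& device : devices) {
    double t = std::numeric_limits<double>::infinity();
    for (const auto& entry : device) { if (entry.second < t) { t = entry.second; } }
    fastest.push_back(t);
  }

  for (const auto& candidate : devices.front()) {
    double sum = 0.0;
    bool common = true;
    for (std::size_t d = 0; d < devices.size(); ++d) {
      const auto it = devices[d].find(candidate.first);
      if (it == devices[d].end()) { common = false; break; }  // not tuned everywhere
      sum += fastest[d] / it->second;  // 1.0 means "as fast as this device's best"
    }
    if (!common) { continue; }
    const double score = sum / static_cast<double>(devices.size());
    if (score > best_score) { best_score = score; best_config = candidate.first; }
  }
  return best_config;
}
```

With this scoring, a configuration that is never the fastest on any single device can still be the best common default if it is close to the best everywhere.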
Cedric Nugteren
7da6492b36
Improved the speed of the new common-best defaults method for the database generation
2016-08-09 21:06:04 +02:00
Cedric Nugteren
3f5401d4c8
Added a first version of the database's common-best default calculation
2016-08-07 16:25:38 +02:00
Cedric Nugteren
35623cd98d
Minor update regarding the previous CMake export/install target changes
2016-07-28 20:45:09 +02:00
Cedric Nugteren
c3712f5b36
Merge pull request #86 from intelfx/cmake
CMakeLists.txt: provide a find_package() config for dependent projects
2016-07-28 20:17:13 +02:00
Ivan Shapovalov
227374deba
.appveyor.yml: move {OPENCL,CLBLAST}_ROOT out of source tree
Reasoning is the same as in the previous commit: CMake does not like having
the OpenCL header path inside the source tree. CLBLAST_ROOT is moved for
uniformity.
2016-07-28 19:09:30 +03:00
Ivan Shapovalov
6c11fdc12c
.travis.yml: use OpenCL ICD Loader and headers shipped by distro
Using our own headers causes problems with CMake, which does not like having
the OpenCL header path inside the source tree. While at it, use the distro's
universal OpenCL loader as well.
2016-07-28 19:09:29 +03:00
Ivan Shapovalov
b5d7b58393
CMakeLists.txt: use target_include_directories()
2016-07-28 19:09:29 +03:00
Ivan Shapovalov
570cbcffa7
CMakeLists.txt: provide a find_package() config for dependent projects
2016-07-28 19:09:29 +03:00
Cedric Nugteren
1ec21421d7
Merge branch 'gemv_performance' into development
2016-07-26 20:02:14 +02:00
Cedric Nugteren
de1afe168d
Removed all old tuning results for the XgemvFastRot kernel; re-added for a couple of devices
2016-07-25 22:57:23 +02:00
Cedric Nugteren
2582f0290a
Moved the XgemvFast and XgemvFastRot tuning database into a separate file
2016-07-25 22:43:49 +02:00
Cedric Nugteren
0252df731a
Merge branch 'development' into gemv_performance
2016-07-24 17:06:27 +02:00
Cedric Nugteren
ffa35c623a
Minor improvements after merging in groundwork for custom tuning parameters and kernels
2016-07-24 17:00:21 +02:00
Cedric Nugteren
d4ffa6395e
Merge pull request #84 from intelfx/device-specific-kernels
Groundwork for device-specific routines
2016-07-24 16:48:20 +02:00
Cedric Nugteren
622682ffe3
Refactored the Python database script: separated functionality into modules, made it comply with the PEP8 style, added proper command-line argument parsing, and cleaned up the code
2016-07-24 16:41:01 +02:00
Cedric Nugteren
40a72259eb
Fixed a bug in the new XgemvFastRot kernel related to local memory size
2016-07-23 16:58:11 +02:00
Cedric Nugteren
7a4f963763
Further improvements to the XgemvFastRot kernel, properly enables coalescing now
2016-07-23 14:52:32 +02:00
Cedric Nugteren
75fe8235f7
Improved the XgemvFastRot kernel by tiled loading of the input matrix A, enabling better memory performance
2016-07-23 10:20:11 +02:00
Ivan Shapovalov
e4e1f05079
clblast::Database, clblast::Routine: implement "database overlays" provided by routine implementation
2016-07-22 11:15:52 +03:00
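The idea of a "database overlay" can be sketched as a lookup that consults routine-supplied entries before the general tuning database. Everything here is hypothetical (the Parameters/DatabaseTable types, the Lookup function, the "default" key), and overriding whole per-device entries rather than merging individual parameters is an assumption, not a description of how clblast::Database does it.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <map>
#include <string>

// Hypothetical types: parameter name -> value, and device name -> parameters.
using Parameters = std::map<std::string, std::size_t>;
using DatabaseTable = std::map<std::string, Parameters>;

// Looks up tuning parameters for a device: an overlay supplied by a routine
// implementation takes precedence over the general database; "default" is the fallback.
Parameters Lookup(const std::string& device, const DatabaseTable& overlay,
                  const DatabaseTable& database) {
  for (const DatabaseTable* table : {&overlay, &database}) {
    const auto it = table->find(device);
    if (it != table->end()) { return it->second; }
  }
  const auto fallback = database.find("default");
  return fallback != database.end() ? fallback->second : Parameters{};
}
```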
Ivan Shapovalov
ae3299da30
clblast::RunKernel, cl::Kernel: unify variants with/without waitForEvents, support empty LWS
2016-07-22 11:15:52 +03:00
Ivan Shapovalov
5502c5eec4
cl::Kernel: skip NULL entries in waitForEvents
2016-07-22 11:15:52 +03:00
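Skipping NULL entries amounts to filtering the wait list before it reaches an enqueue call. In the sketch below, EventHandle is an assumed stand-in for cl_event (an opaque pointer) so that it compiles without OpenCL headers, and NonNullEvents is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Stand-in for an OpenCL event handle (cl_event is an opaque pointer type).
using EventHandle = const void*;

// Copies only the non-NULL events, so that an enqueue call never receives a
// NULL entry inside its event wait list.
std::vector<EventHandle> NonNullEvents(const std::vector<EventHandle>& wait_for) {
  std::vector<EventHandle> result;
  std::copy_if(wait_for.begin(), wait_for.end(), std::back_inserter(result),
               [](EventHandle e) { return e != nullptr; });
  return result;
}
```

When the filtered list is empty, OpenCL requires the wait-list pointer itself to be NULL with a count of 0, so a caller would pass `result.empty() ? nullptr : result.data()` together with `result.size()`.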
Ivan Shapovalov
2dd5ee3f75
clblast::RunKernel, cl::Kernel: take const vector as waitForEvents
2016-07-22 11:15:52 +03:00
Ivan Shapovalov
1ae71614ac
xgemm: do not hardcode kernel requirements for internal matrix layout
Do not hardcode the knowledge about "A and C col-major, B row-major".
This allows for easier reuse of the DoGemm() routine with different
kernels.
2016-07-22 11:15:52 +03:00
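Instead of hardcoding "A and C col-major, B row-major", a kernel could declare the layouts it wants and the routine could derive whether each operand needs a pre-processing transpose. The sketch below is hypothetical throughout (Layout, KernelLayoutRequirements, NeedsTranspose); it only shows the decision logic, not CLBlast's actual DoGemm().

```cpp
#include <cassert>

enum class Layout { kColMajor, kRowMajor };

// Hypothetical description of what a GEMM kernel expects for its internal buffers.
struct KernelLayoutRequirements {
  Layout a, b, c;
};

// An operand needs an extra transpose/copy step if, after applying the user's
// transpose flag, its effective layout differs from what the kernel wants.
bool NeedsTranspose(Layout given, bool transposed, Layout wanted) {
  // Reading a transposed matrix is equivalent to reading its storage in the other layout
  const Layout effective = transposed
      ? (given == Layout::kColMajor ? Layout::kRowMajor : Layout::kColMajor)
      : given;
  return effective != wanted;
}
```

A different kernel would then only need to supply a different KernelLayoutRequirements value, leaving the routine unchanged.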
Ivan Shapovalov
a1d80e7402
CMakeLists.txt: use ${clblast_SOURCE_DIR} instead of ${CMAKE_SOURCE_DIR}
2016-07-22 11:15:52 +03:00
Cedric Nugteren
b33bec4a59
Fixed some more types and type conversions in the clpp11 interface to OpenCL
2016-07-16 11:13:23 +02:00
Cedric Nugteren
bee9b959f4
Merge pull request #80 from gcp/getdevinfo_fixes
Make sure the passed types are large enough.
2016-07-16 10:59:51 +02:00
Cedric Nugteren
066af4069b
Removed an unused variable from the copy-transpose-pad function
2016-07-16 10:56:37 +02:00
Gian-Carlo Pascutto
e0ba59c0ac
Make sure the passed types are large enough.
Make sure all out parameters that are passed to functions such
as clGetDeviceInfo are large enough to contain the replies.
2016-07-13 15:59:02 +02:00
Cedric Nugteren
c87e877bf2
Now passing alpha/beta to the kernel as arguments, as before fp16 support; in the case of fp16, the arguments are cast on the host and in the kernel
2016-07-10 20:32:01 +02:00
Cedric Nugteren
57f09178d8
Added tuning results for AMD Oland and for Intel Graphics HD 530
2016-07-10 11:46:44 +02:00
Cedric Nugteren
39e9b1238f
Fixed a bug related to the cache and retrieval of programs based on the OpenCL context
2016-07-10 11:24:36 +02:00
Cedric Nugteren
9caa7ca5b9
Cache now compares cl_context instead of a pointer to a context; added verbose print statements to the cache
2016-07-08 20:57:58 +02:00
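Comparing the raw context handle rather than the address of a C++ wrapper matters because two wrapper objects can refer to the same underlying context. A minimal sketch, assuming a stand-in RawContext type for cl_context and hypothetical Context, ProgramCacheEntry, and Find names:

```cpp
#include <cassert>
#include <string>
#include <vector>

using RawContext = const void*;  // stand-in for cl_context (an opaque pointer)

// Hypothetical wrapper: copies share the same handle but live at different addresses.
class Context {
 public:
  explicit Context(RawContext handle) : handle_(handle) {}
  RawContext handle() const { return handle_; }
 private:
  RawContext handle_;
};

struct ProgramCacheEntry {
  RawContext context;        // the handle value, not a pointer to a Context object
  std::string routine_name;
  std::string binary;        // hypothetical payload
};

// Returns the cached entry matching the context handle and routine name, or nullptr.
const ProgramCacheEntry* Find(const std::vector<ProgramCacheEntry>& cache,
                              const Context& context, const std::string& routine) {
  for (const auto& entry : cache) {
    if (entry.context == context.handle() && entry.routine_name == routine) { return &entry; }
  }
  return nullptr;
}
```

Keyed on a pointer to the wrapper, a lookup made through a copy of the original Context would miss even though the program was built for the same OpenCL context.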
Cedric Nugteren
27854070b4
Added a VERBOSE mode to debug performance: now prints details about compilation and kernel execution to screen
2016-07-06 21:50:12 +02:00
Cedric Nugteren
77325b8974
Added an option to the performance clients to do a warm-up run before timing
2016-07-06 21:25:55 +02:00
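A warm-up run simply executes the measured operation once, untimed, before the timed repetitions, so that one-off costs such as program compilation or cache population are excluded. TimeMs below is a hypothetical helper, not the performance clients' actual code.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Times `run` over `repetitions` iterations and returns the mean in milliseconds.
// With `warm_up` set, one untimed call is made first.
double TimeMs(const std::function<void()>& run, std::size_t repetitions, bool warm_up) {
  assert(repetitions > 0);
  if (warm_up) { run(); }
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < repetitions; ++i) { run(); }
  const auto elapsed = std::chrono::steady_clock::now() - start;
  return std::chrono::duration<double, std::milli>(elapsed).count() /
         static_cast<double>(repetitions);
}
```

For OpenCL work, `run` would have to block until the kernel completes (for example by finishing the queue); otherwise only the enqueue overhead is measured.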
CNugteren
2d665099ef
Fixed a linking issue with the tuners on Visual Studio
2016-07-04 19:46:14 +02:00
Cedric Nugteren
9683b50c55
Added tuning results for GTX670, GTX750, and GTX1070 (thanks to gcp)
2016-07-03 20:30:47 +02:00
Cedric Nugteren
4105a79598
Merge pull request #76 from gcp/fix_local_mem_size
Fixes clGetKernelWorkGroupInfo to work well with both 32-bit and 64-bit systems
2016-07-03 16:34:44 +02:00
Gian-Carlo Pascutto
7424532859
Ensure clGetKernelWorkGroupInfo return value fits.
In LocalMemUsage(), there's a first call to clGetKernelWorkGroupInfo
to get the "bytes" amount needed to store the result from
CL_KERNEL_LOCAL_MEM_SIZE. However, the actual value passed is an
"auto result = size_t", which in 32-bit mode is 4 bytes, regardless
of the previous return value. The spec describes that it will actually
be a cl_ulong which is 8 bytes. To prevent stack corruption, make sure
we are in fact passing a cl_ulong.
Also adjust all callers to take the changed type into account.
2016-07-02 21:14:36 +02:00
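The size mismatch can be illustrated without an OpenCL runtime. QueryLocalMemSize below is a hypothetical stand-in for a CL_KERNEL_LOCAL_MEM_SIZE query whose result type is cl_ulong, modelled as std::uint64_t; like the real API it reports the required size and rejects a buffer that is declared too small (returning false in place of CL_INVALID_VALUE). The value 16384 is made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in query: the result is always 8 bytes wide, as for a cl_ulong.
bool QueryLocalMemSize(std::size_t param_value_size, void* param_value,
                       std::size_t* param_value_size_ret) {
  const std::uint64_t local_mem_bytes = 16384;  // made-up example value
  if (param_value_size_ret != nullptr) { *param_value_size_ret = sizeof(local_mem_bytes); }
  if (param_value != nullptr) {
    if (param_value_size < sizeof(local_mem_bytes)) { return false; }
    std::memcpy(param_value, &local_mem_bytes, sizeof(local_mem_bytes));
  }
  return true;
}

// The fix described above: size the output variable by the type the spec defines
// (cl_ulong), not by size_t, which is only 4 bytes in 32-bit builds.
std::uint64_t LocalMemUsage() {
  std::uint64_t result = 0;
  const bool ok = QueryLocalMemSize(sizeof(result), &result, nullptr);
  return ok ? result : 0;
}
```

The original bug corresponds to passing the queried 8-byte size together with a 4-byte variable: the implementation trusts the stated size and writes past the variable, which is the stack corruption the commit describes.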
Cedric Nugteren
5a690f4e36
Prints the current pandas version and reports the minimum required version
2016-07-02 16:44:13 +02:00