Cedric Nugteren
a9d35cf04c
Merge branch 'development' into gemm_direct
2016-10-01 13:45:08 +02:00
Cedric Nugteren
d59e5c570b
Added an option to run tuned kernels multiple times to average execution times; requires CLTune 2.5.0
2016-09-27 21:03:24 +02:00
Cedric Nugteren
db5772e521
Updated to version 8.0 of the CLCudaAPI header
2016-09-27 20:56:49 +02:00
Cedric Nugteren
adc058440c
Fixed the local memory size computation for the GEMM tuners
2016-09-27 20:03:55 +02:00
Cedric Nugteren
6178fcd584
Now generates test/client/tuner data using a fixed seed to enable reproducibility of results
2016-09-27 19:55:21 +02:00
Cedric Nugteren
e3076d26cc
Added more relaxed error checking for the half-precision tests
2016-09-27 19:42:58 +02:00
Cedric Nugteren
a2bfae3c46
Merge pull request #103 from dividiti/link_clblas_with_pthread
Link clBLAS together with pthread
2016-09-27 08:53:08 +02:00
Anton Lokhmotov
c484bb26b6
Use cross-platform thread lib idiom instead of *nix-specific pthread.
2016-09-26 21:04:28 +00:00
Anton Lokhmotov
c20a5bb7ca
Link clBLAS together with pthread.
2016-09-26 10:30:18 +00:00
Cedric Nugteren
73d135c2ce
Added a first version of a tuner for the GEMM direct kernel; collapsed MWGD, NWGD and KWGD into one WGD parameter
2016-09-25 14:48:34 +02:00
Cedric Nugteren
669f43aed6
Separated the tuning parameters of the new direct GEMM kernel from the indirect version
2016-09-25 13:52:08 +02:00
Cedric Nugteren
140dc12854
Added a first version of the direct version of GEMM with local memory
2016-09-25 11:38:35 +02:00
Cedric Nugteren
115af8c78e
Updated AppVeyor script to fix an issue with changes in the latest AppVeyor servers
2016-09-25 10:44:31 +02:00
Cedric Nugteren
8a5ce05022
Fix another issue with the packaging in the AppVeyor script
2016-09-25 10:32:12 +02:00
Cedric Nugteren
08abb7dfa4
Fix an issue with the packaging in the AppVeyor script
2016-09-25 10:20:47 +02:00
Cedric Nugteren
a594067758
Updated AppVeyor script to fix an issue with changes in the latest AppVeyor servers
2016-09-25 10:10:42 +02:00
Cedric Nugteren
c712fd4cb1
Merge pull request #101 from dividiti/add_ref_includes_to_test_correctness_common
Add path to ref library header when building tests.
2016-09-24 15:26:08 +02:00
Anton Lokhmotov
750f185ba9
Add path to ref library header when building tests.
2016-09-24 11:46:34 +00:00
Cedric Nugteren
d595a8ed7e
Fixed a bug where the tests and samples waited for an invalid event after an unsuccessful CLBlast call
2016-09-22 20:47:22 +02:00
Cedric Nugteren
6aa652d6ea
Merge branch 'development' into gemm_direct
2016-09-21 21:32:18 +02:00
Cedric Nugteren
b1929d8ce7
It is now possible to set the OpenCL compiler options through an environment variable
2016-09-21 21:22:16 +02:00
Cedric Nugteren
63003a1429
Merge branch 'master' into development
2016-09-21 20:57:23 +02:00
Cedric Nugteren
d13a98272b
Merge pull request #100 from gpu/master
Fixed link in README.md
2016-09-20 21:47:15 +02:00
Marco Hutter
9b0f6238b3
Fixed link in README.md
The GitHub link could be https://github.com/gpu
(without "s"), but the website should be OK, too
2016-09-20 18:03:57 +02:00
Cedric Nugteren
f07ac22f5b
Merge pull request #99 from CNugteren/development
Update to version 0.9.0
2016-09-13 21:14:51 +02:00
Cedric Nugteren
4b94afda94
Updated to version 0.9.0
2016-09-13 19:20:39 +02:00
Cedric Nugteren
48ab0428cb
Renamed the DEFAULT_DEVICE and DEFAULT_PLATFORM env variables to be in line with recent usages of CLBLAST_DEVICE and CLBLAST_PLATFORM
2016-09-13 19:08:49 +02:00
Cedric Nugteren
d7305346ca
Merge pull request #98 from intelfx/no-ignored-attributes
CMakeLists.txt: use -Wno-ignored-attributes to silence unfixable warnings
2016-09-13 17:58:12 +02:00
Ivan Shapovalov
9095537a6a
CMakeLists.txt: use -Wno-ignored-attributes to silence unfixable warnings
2016-09-13 16:12:30 +03:00
Cedric Nugteren
4ce584a014
Split the XGEMM kernel up further: it is now in 3 parts. This is done because MSVC can't handle long strings
2016-09-12 22:13:16 +02:00
Cedric Nugteren
9fb7a0efe1
Merge branch 'database_rewrite' into development
2016-09-12 20:16:18 +02:00
Cedric Nugteren
aa3dffe356
Added XgemvFastRot and Xgemm 16-bit tuning results: just defaults which are now automatically taken from 32-bit if there are no entries at all
2016-09-12 20:13:38 +02:00
Cedric Nugteren
b5a67f86ec
Complete rewrite of the database script. Replaced Pandas with the much faster and more convenient plain JSON/dict data type
2016-09-11 21:29:28 +02:00
Cedric Nugteren
94163970ae
Merge branch 'xgemm_tuner_exhaustive' into development
2016-09-10 14:01:21 +02:00
Cedric Nugteren
e21f32bc99
Updated database based on exhaustive tuning results for GEMM for the R9 M370X GPU
2016-09-10 14:00:43 +02:00
Cedric Nugteren
3daba70997
Updated the database script to remove duplicate entries: keeps only the best-performing cases for a specific parameter combination
2016-09-10 11:12:09 +02:00
Cedric Nugteren
55038d3c91
Split GEMM tuning in two parts: a small set of tuning parameters which is explored exhaustively and a larger set which is explored randomly
2016-09-06 20:30:06 +02:00
Cedric Nugteren
a2f8350703
Refactored the Python C++ generator script; now conforms to the PEP8 style guide
2016-09-04 21:26:30 +02:00
Cedric Nugteren
b30b26b89e
The GEMM kernel no longer adds beta*C in case beta is zero; this would cause problems if C contains NaNs
2016-09-04 17:21:16 +02:00
Cedric Nugteren
521bf6cdfc
Added tuning results for Intel Broadwell 5500 GT2 GPU
2016-09-03 16:43:23 +02:00
Cedric Nugteren
19574b2519
Updated tuning results for Haswell GT2 Mobile GPU; fixed database script to handle duplicate entries of different runs
2016-09-03 12:45:11 +02:00
Cedric Nugteren
478fb089d5
Merge pull request #93 from intelfx/test-read-environment
test/correctness: read platform and device from environment
2016-08-27 10:16:34 +02:00
Ivan Shapovalov
ea43936e94
test/correctness: read platform and device from environment
Support passing environment variables CLBLAST_PLATFORM and CLBLAST_DEVICE
instead of -platform and -device arguments to test executables.
This is for `ctest`.
2016-08-27 05:37:26 +03:00
Cedric Nugteren
8d6a6a5bbf
Merge branch 'database_defaults' into development
2016-08-22 19:31:36 +02:00
Cedric Nugteren
0c0f0ac7f9
Also changed the default-default for unknown device types to use the same method as for known device groups
2016-08-21 20:35:20 +02:00
Cedric Nugteren
84db8958d1
Increased the ratio of GEMM tuning results to explore; reduced the tuning search space to have a better chance to evaluate more likely parameter combinations
2016-08-21 20:28:02 +02:00
Cedric Nugteren
00979faab4
Updated the changelog; refactored the database-get-bests code a bit
2016-08-21 20:16:06 +02:00
Cedric Nugteren
7eeef74338
Merge branch 'development' of github.com:CNugteren/CLBlast into development
Conflicts:
README.md
2016-08-20 12:59:21 +02:00
Cedric Nugteren
ce9ba27450
Merge branch 'dvasschemacq-master' into development
2016-08-20 12:51:16 +02:00
Cedric Nugteren
6eca53ee23
Merge branch 'master' of https://github.com/dvasschemacq/CLBlast into dvasschemacq-master
Conflicts:
src/kernels/level1/xaxpy.opencl
src/kernels/level2/xgemv.opencl
src/kernels/level2/xgemv_fast.opencl
src/kernels/level2/xger.opencl
src/kernels/level2/xher.opencl
src/kernels/level2/xher2.opencl
src/kernels/level3/xgemm_part2.opencl
2016-08-20 12:50:31 +02:00