Cedric Nugteren
|
08ee57f494
|
Updated the tuning results for the GTX 750 Ti GPU
|
2016-10-10 16:41:41 +02:00 |
|
Cedric Nugteren
|
2194dee217
|
Merge branch 'gemm_direct' into development
|
2016-10-10 16:05:18 +02:00 |
|
Cedric Nugteren
|
7c228f6a67
|
Changed the thresholds for the direct/indirect GEMM kernels for NVIDIA and Intel GPUs
|
2016-10-10 16:01:02 +02:00 |
|
Cedric Nugteren
|
d7cfb6aa9b
|
Added benchmark script for small matrix sizes, testing the direct GEMM kernels
|
2016-10-08 22:05:54 +02:00 |
|
Cedric Nugteren
|
7baac46e72
|
Fixed a performance bug for Intel Iris Pro GPUs due to incorrect tuning results
|
2016-10-08 21:56:06 +02:00 |
|
Cedric Nugteren
|
b698e45478
|
Added first tuning results for the single-kernel direct GEMM implementation
|
2016-10-06 21:13:14 +02:00 |
|
Cedric Nugteren
|
a3e67f2be2
|
Added a kernel selection database to select between the direct and indirect GEMM kernels
|
2016-10-06 19:51:12 +02:00 |
|
Cedric Nugteren
|
8d5747aa54
|
Made non-standard types void-pointers in the Netlib BLAS interface
|
2016-10-05 08:23:54 +02:00 |
|
Cedric Nugteren
|
a17b714c3e
|
Added first version of Netlib BLAS API header
|
2016-10-05 00:09:39 +02:00 |
|
Cedric Nugteren
|
7052a00a3e
|
Fixed a const-correctness issue with complex conjugation in the GEMM direct kernel
|
2016-10-03 20:13:19 +02:00 |
|
Cedric Nugteren
|
ca0c075de2
|
Added functions to load from off-chip to local memory without vector loads for the GEMM direct kernels
|
2016-10-03 20:09:15 +02:00 |
|
Cedric Nugteren
|
c1c4bc5d20
|
Re-organised GEMM direct kernel and added faster fall-back version for incomplete rectangles
|
2016-10-03 19:32:01 +02:00 |
|
Cedric Nugteren
|
243cef73db
|
Set the default number of runs for all kernels to at least 2 runs
|
2016-10-02 21:23:23 +02:00 |
|
Cedric Nugteren
|
d8827e908c
|
Specialised the GEMM direct kernel in four ways for transposing/non-transposing: NN, NT, TN, TT
|
2016-10-02 17:59:05 +02:00 |
|
Cedric Nugteren
|
61f489e370
|
Split the GEMM direct kernel into two files; set the default tuning target to 256-256-256
|
2016-10-02 15:06:59 +02:00 |
|
Cedric Nugteren
|
a459920105
|
Added padding to the local memory of the GEMM direct kernel
|
2016-10-01 16:58:53 +02:00 |
|
Cedric Nugteren
|
ecc704cc76
|
Added a default num-runs setting to the tuner, averaging over 10 runs by default for the GEMM direct kernel
|
2016-10-01 16:55:21 +02:00 |
|
Cedric Nugteren
|
a9d35cf04c
|
Merge branch 'development' into gemm_direct
|
2016-10-01 13:45:08 +02:00 |
|
Cedric Nugteren
|
d59e5c570b
|
Added an option to run tuned kernels multiple times to average execution times; requires CLTune 2.5.0
|
2016-09-27 21:03:24 +02:00 |
|
Cedric Nugteren
|
db5772e521
|
Updated to version 8.0 of the CLCudaAPI header
|
2016-09-27 20:56:49 +02:00 |
|
Cedric Nugteren
|
adc058440c
|
Fixed the local memory size computation for the GEMM tuners
|
2016-09-27 20:03:55 +02:00 |
|
Cedric Nugteren
|
6178fcd584
|
Now generates test/client/tuner data using a fixed seed to enable reproducibility of results
|
2016-09-27 19:55:21 +02:00 |
|
Cedric Nugteren
|
e3076d26cc
|
Added more relaxed error checking for the half-precision tests
|
2016-09-27 19:42:58 +02:00 |
|
Cedric Nugteren
|
a2bfae3c46
|
Merge pull request #103 from dividiti/link_clblas_with_pthread
Link clBLAS together with pthread
|
2016-09-27 08:53:08 +02:00 |
|
Anton Lokhmotov
|
c484bb26b6
|
Use cross-platform thread lib idiom instead of *nix-specific pthread.
|
2016-09-26 21:04:28 +00:00 |
|
Anton Lokhmotov
|
c20a5bb7ca
|
Link clBLAS together with pthread.
|
2016-09-26 10:30:18 +00:00 |
|
Cedric Nugteren
|
73d135c2ce
|
Added a first version of a tuner for the GEMM direct kernel; collapsed MWGD, NWGD and KWGD into one WGD parameter
|
2016-09-25 14:48:34 +02:00 |
|
Cedric Nugteren
|
669f43aed6
|
Separated the tuning parameters of the new direct GEMM kernel from the indirect version
|
2016-09-25 13:52:08 +02:00 |
|
Cedric Nugteren
|
140dc12854
|
Added a first version of the direct version of GEMM with local memory
|
2016-09-25 11:38:35 +02:00 |
|
Cedric Nugteren
|
115af8c78e
|
Updated AppVeyor script to fix an issue with changes in the latest AppVeyor servers
|
2016-09-25 10:44:31 +02:00 |
|
Cedric Nugteren
|
8a5ce05022
|
Fix another issue with the packaging in the AppVeyor script
|
2016-09-25 10:32:12 +02:00 |
|
Cedric Nugteren
|
08abb7dfa4
|
Fix an issue with the packaging in the AppVeyor script
|
2016-09-25 10:20:47 +02:00 |
|
Cedric Nugteren
|
a594067758
|
Updated AppVeyor script to fix an issue with changes in the latest AppVeyor servers
|
2016-09-25 10:10:42 +02:00 |
|
Cedric Nugteren
|
c712fd4cb1
|
Merge pull request #101 from dividiti/add_ref_includes_to_test_correctness_common
Add path to ref library header when building tests.
|
2016-09-24 15:26:08 +02:00 |
|
Anton Lokhmotov
|
750f185ba9
|
Add path to ref library header when building tests.
|
2016-09-24 11:46:34 +00:00 |
|
Cedric Nugteren
|
d595a8ed7e
|
Fixed a bug waiting for an invalid event in case of an unsuccessful CLBlast call in the tests and samples
|
2016-09-22 20:47:22 +02:00 |
|
Cedric Nugteren
|
6aa652d6ea
|
Merge branch 'development' into gemm_direct
|
2016-09-21 21:32:18 +02:00 |
|
Cedric Nugteren
|
b1929d8ce7
|
It is now possible to set the OpenCL compiler options through an environment variable
|
2016-09-21 21:22:16 +02:00 |
|
Cedric Nugteren
|
63003a1429
|
Merge branch 'master' into development
|
2016-09-21 20:57:23 +02:00 |
|
Cedric Nugteren
|
d13a98272b
|
Merge pull request #100 from gpu/master
Fixed link in README.md
|
2016-09-20 21:47:15 +02:00 |
|
Marco Hutter
|
9b0f6238b3
|
Fixed link in README.md
The GitHub link could be https://github.com/gpu
(without "s"), but the website should be OK, too
|
2016-09-20 18:03:57 +02:00 |
|
Cedric Nugteren
|
f07ac22f5b
|
Merge pull request #99 from CNugteren/development
Update to version 0.9.0
|
2016-09-13 21:14:51 +02:00 |
|
Cedric Nugteren
|
4b94afda94
|
Updated to version 0.9.0
|
2016-09-13 19:20:39 +02:00 |
|
Cedric Nugteren
|
48ab0428cb
|
Renamed the DEFAULT_DEVICE and DEFAULT_PLATFORM env variables to be in line with recent usages of CLBLAST_DEVICE and CLBLAST_PLATFORM
|
2016-09-13 19:08:49 +02:00 |
|
Cedric Nugteren
|
d7305346ca
|
Merge pull request #98 from intelfx/no-ignored-attributes
CMakeLists.txt: use -Wno-ignored-attributes to silence unfixable warnings
|
2016-09-13 17:58:12 +02:00 |
|
Ivan Shapovalov
|
9095537a6a
|
CMakeLists.txt: use -Wno-ignored-attributes to silence unfixable warnings
|
2016-09-13 16:12:30 +03:00 |
|
Cedric Nugteren
|
4ce584a014
|
Split the XGEMM kernel further up: now in 3 parts. This is done because MSVC can't handle long strings
|
2016-09-12 22:13:16 +02:00 |
|
Cedric Nugteren
|
9fb7a0efe1
|
Merge branch 'database_rewrite' into development
|
2016-09-12 20:16:18 +02:00 |
|
Cedric Nugteren
|
aa3dffe356
|
Added XgemvFastRot and Xgemm 16-bit tuning results: just defaults which are now automatically taken from 32-bit if there are no entries at all
|
2016-09-12 20:13:38 +02:00 |
|
Cedric Nugteren
|
b5a67f86ec
|
Complete rewrite of the database script. Replaced Pandas with the much faster and more convenient plain JSON/dict data type
|
2016-09-11 21:29:28 +02:00 |
|