Shehzan Mohammed
|
0d958bf3b3
|
Fixes for static lib compilation on Windows
|
2016-10-14 18:45:34 -04:00 |
|
Cedric Nugteren
|
c0482ace6c
|
Fixed a bug where clblas.h couldn't be found for the performance tests (clients)
|
2016-10-14 22:11:35 +02:00 |
|
Cedric Nugteren
|
0f9311d46a
|
Fixed an issue with a growing database: the database is now a global variable in a namespace and its container uses const-pointers to the actual data
|
2016-10-14 20:56:32 +02:00 |
|
Cedric Nugteren
|
3386ad49c4
|
Set proper flags for the verbose mode (debug flags)
|
2016-10-14 20:54:05 +02:00 |
|
Cedric Nugteren
|
99a620f9a1
|
Merge pull request #112 from shehzan10/static
Add option to build shared or static library
|
2016-10-14 10:06:44 +02:00 |
|
Shehzan Mohammed
|
56f07e42b1
|
Add option to build shared or static library
|
2016-10-13 12:03:44 -04:00 |
|
Cedric Nugteren
|
ebb505b783
|
Added tuning results for Intel HD Graphics IvyBridge GPU
|
2016-10-13 12:18:28 +02:00 |
|
Cedric Nugteren
|
541415374f
|
Merge pull request #108 from CNugteren/msvc2013
Support for Visual Studio 2013
|
2016-10-13 08:34:07 +02:00 |
|
Cedric Nugteren
|
c60f6715f8
|
Removed a spurious #ifdef
|
2016-10-12 21:49:59 +02:00 |
|
Cedric Nugteren
|
ad2b6ecea2
|
Fixed missing line ending
|
2016-10-12 21:10:22 +02:00 |
|
Cedric Nugteren
|
8a9d3cdf37
|
Added support for compiling the library, the client, and the samples under MSVC 2013
|
2016-10-10 22:45:39 +02:00 |
|
Cedric Nugteren
|
f88c50522d
|
Fixed an issue with const members of structs in the database
|
2016-10-10 22:24:05 +02:00 |
|
Cedric Nugteren
|
de77f00e8c
|
Fixed an issue with the length of the GEMM OpenCL string for both MSVC 2013 and 2015
|
2016-10-10 22:23:33 +02:00 |
|
Cedric Nugteren
|
fcac81bfef
|
First fixes towards compilation on Visual Studio 2013
|
2016-10-10 20:37:45 +02:00 |
|
Cedric Nugteren
|
39afc9543b
|
Changed the storage location of the database to a separate Github repository
|
2016-10-10 19:10:12 +02:00 |
|
Cedric Nugteren
|
71f5c0c145
|
Changed the license to MIT
|
2016-10-10 18:07:17 +02:00 |
|
Cedric Nugteren
|
42ee4abbbc
|
Updated the performance graphs for Intel Iris Pro GPU and AMD Radeon M370X GPU
|
2016-10-10 18:07:05 +02:00 |
|
Cedric Nugteren
|
f563341e7b
|
Added fresh performance graphs for GeForce 750Ti; removed old GTX480 results
|
2016-10-10 16:59:28 +02:00 |
|
Cedric Nugteren
|
08ee57f494
|
Updated the tuning results for the GTX 750 Ti GPU
|
2016-10-10 16:41:41 +02:00 |
|
Cedric Nugteren
|
2194dee217
|
Merge branch 'gemm_direct' into development
|
2016-10-10 16:05:18 +02:00 |
|
Cedric Nugteren
|
7c228f6a67
|
Changed the thresholds for the direct/indirect GEMM kernels for NVIDIA and Intel GPUs
|
2016-10-10 16:01:02 +02:00 |
|
Cedric Nugteren
|
d7cfb6aa9b
|
Added benchmark script for small matrix sizes, testing the direct GEMM kernels
|
2016-10-08 22:05:54 +02:00 |
|
Cedric Nugteren
|
7baac46e72
|
Fixed a performance bug for Intel Iris Pro GPUs due to incorrect tuning results
|
2016-10-08 21:56:06 +02:00 |
|
Cedric Nugteren
|
b698e45478
|
Added first tuning results for the single-kernel direct GEMM implementation
|
2016-10-06 21:13:14 +02:00 |
|
Cedric Nugteren
|
a3e67f2be2
|
Added a kernel selection database to select between the direct and indirect GEMM kernels
|
2016-10-06 19:51:12 +02:00 |
|
Cedric Nugteren
|
7052a00a3e
|
Fixed a const-correctness issue with complex conjugation in the GEMM direct kernel
|
2016-10-03 20:13:19 +02:00 |
|
Cedric Nugteren
|
ca0c075de2
|
Added functions to load from off-chip to local memory without vector loads for the GEMM direct kernels
|
2016-10-03 20:09:15 +02:00 |
|
Cedric Nugteren
|
c1c4bc5d20
|
Re-organised GEMM direct kernel and added faster fall-back version for incomplete rectangles
|
2016-10-03 19:32:01 +02:00 |
|
Cedric Nugteren
|
243cef73db
|
Set the default number of runs for all kernels to at least 2 runs
|
2016-10-02 21:23:23 +02:00 |
|
Cedric Nugteren
|
d8827e908c
|
Specialised the GEMM direct kernel in four ways for transposing/non-transposing: NN, NT, TN, TT
|
2016-10-02 17:59:05 +02:00 |
|
Cedric Nugteren
|
61f489e370
|
Split the GEMM direct kernel into two files; set the default tuning target to 256-256-256
|
2016-10-02 15:06:59 +02:00 |
|
Cedric Nugteren
|
a459920105
|
Added padding to the local memory of the GEMM direct kernel
|
2016-10-01 16:58:53 +02:00 |
|
Cedric Nugteren
|
ecc704cc76
|
Added default num-runs to the tuner, with averaging over 10 runs as the default for the GEMM direct kernel
|
2016-10-01 16:55:21 +02:00 |
|
Cedric Nugteren
|
a9d35cf04c
|
Merge branch 'development' into gemm_direct
|
2016-10-01 13:45:08 +02:00 |
|
Cedric Nugteren
|
d59e5c570b
|
Added an option to run tuned kernels multiple times to average execution times; requires CLTune 2.5.0
|
2016-09-27 21:03:24 +02:00 |
|
Cedric Nugteren
|
db5772e521
|
Updated to version 8.0 of the CLCudaAPI header
|
2016-09-27 20:56:49 +02:00 |
|
Cedric Nugteren
|
adc058440c
|
Fixed the local memory size computation for the GEMM tuners
|
2016-09-27 20:03:55 +02:00 |
|
Cedric Nugteren
|
6178fcd584
|
Now generates test/client/tuner data using a fixed seed to enable reproducibility of results
|
2016-09-27 19:55:21 +02:00 |
|
Cedric Nugteren
|
e3076d26cc
|
Added more relaxed error checking for the half-precision tests
|
2016-09-27 19:42:58 +02:00 |
|
Cedric Nugteren
|
a2bfae3c46
|
Merge pull request #103 from dividiti/link_clblas_with_pthread
Link clBLAS together with pthread
|
2016-09-27 08:53:08 +02:00 |
|
Anton Lokhmotov
|
c484bb26b6
|
Use cross-platform thread lib idiom instead of *nix-specific pthread.
|
2016-09-26 21:04:28 +00:00 |
|
Anton Lokhmotov
|
c20a5bb7ca
|
Link clBLAS together with pthread.
|
2016-09-26 10:30:18 +00:00 |
|
Cedric Nugteren
|
73d135c2ce
|
Added a first version of a tuner for the GEMM direct kernel; collapsed MWGD, NWGD and KWGD into one WGD parameter
|
2016-09-25 14:48:34 +02:00 |
|
Cedric Nugteren
|
669f43aed6
|
Separated the tuning parameters of the new direct GEMM kernel from the indirect version
|
2016-09-25 13:52:08 +02:00 |
|
Cedric Nugteren
|
140dc12854
|
Added a first version of the direct version of GEMM with local memory
|
2016-09-25 11:38:35 +02:00 |
|
Cedric Nugteren
|
115af8c78e
|
Updated AppVeyor script to fix an issue with changes in the latest AppVeyor servers
|
2016-09-25 10:44:31 +02:00 |
|
Cedric Nugteren
|
8a5ce05022
|
Fix another issue with the packaging in the AppVeyor script
|
2016-09-25 10:32:12 +02:00 |
|
Cedric Nugteren
|
08abb7dfa4
|
Fix an issue with the packaging in the AppVeyor script
|
2016-09-25 10:20:47 +02:00 |
|
Cedric Nugteren
|
a594067758
|
Updated AppVeyor script to fix an issue with changes in the latest AppVeyor servers
|
2016-09-25 10:10:42 +02:00 |
|
Cedric Nugteren
|
c712fd4cb1
|
Merge pull request #101 from dividiti/add_ref_includes_to_test_correctness_common
Add path to ref library header when building tests.
|
2016-09-24 15:26:08 +02:00 |
|