Ivan Shapovalov
|
106565fa9a
|
src/clpp11.hpp: reinstate error checking on clGetEventProfilingInfo()
|
2016-10-22 07:25:15 +03:00 |
|
Cedric Nugteren
|
597974b40d
|
Merge pull request #118 from matze/add-pkg-config
Generate and install pkg-config description
|
2016-10-21 21:00:07 +02:00 |
|
Cedric Nugteren
|
370105148b
|
Now properly sets the Apache 2.0 license such that GitHub recognises it
|
2016-10-21 20:23:59 +02:00 |
|
Matthias Vogelgesang
|
3797d144cc
|
Generate and install pkg-config description
|
2016-10-21 09:38:25 +02:00 |
|
Cedric Nugteren
|
c8d0e41e84
|
Added the possibility to supply the env-variable CLBLAST_TEST_ARGUMENTS to specify options for the make alltest or ctest targets
|
2016-10-20 23:05:16 +02:00 |
|
Cedric Nugteren
|
d0b8ca9fba
|
Fixed compilation issues of the testers for Visual Studio 2013: mostly conversions of class constants to static
|
2016-10-18 10:19:03 +02:00 |
|
Cedric Nugteren
|
9331442a56
|
Merge branch 'development' into netlib_blas_api
|
2016-10-16 11:43:03 +02:00 |
|
Cedric Nugteren
|
53deed298f
|
Added documentation and minor refactoring for the recent support of static library compilation
|
2016-10-15 17:11:08 +02:00 |
|
Cedric Nugteren
|
a63f57297b
|
Merge pull request #115 from shehzan10/development
Fixes for static lib compilation on Windows
|
2016-10-15 09:48:03 +02:00 |
|
Shehzan Mohammed
|
0d958bf3b3
|
Fixes for static lib compilation on Windows
|
2016-10-14 18:45:34 -04:00 |
|
Cedric Nugteren
|
c0482ace6c
|
Fixed a bug where clblas.h couldn't be found for the performance tests (clients)
|
2016-10-14 22:11:35 +02:00 |
|
Cedric Nugteren
|
0f9311d46a
|
Fixed an issue with a growing database: the database is now a global variable in a namespace and its container uses const-pointers to the actual data
|
2016-10-14 20:56:32 +02:00 |
|
Cedric Nugteren
|
3386ad49c4
|
Set proper flags for the verbose mode (debug flags)
|
2016-10-14 20:54:05 +02:00 |
|
Cedric Nugteren
|
99a620f9a1
|
Merge pull request #112 from shehzan10/static
Add option to build shared or static library
|
2016-10-14 10:06:44 +02:00 |
|
Shehzan Mohammed
|
56f07e42b1
|
Add option to build shared or static library
|
2016-10-13 12:03:44 -04:00 |
|
Cedric Nugteren
|
ebb505b783
|
Added tuning results for Intel HD Graphics IvyBridge GPU
|
2016-10-13 12:18:28 +02:00 |
|
Cedric Nugteren
|
541415374f
|
Merge pull request #108 from CNugteren/msvc2013
Support for Visual Studio 2013
|
2016-10-13 08:34:07 +02:00 |
|
Cedric Nugteren
|
c60f6715f8
|
Removed a spurious #ifdef
|
2016-10-12 21:49:59 +02:00 |
|
Cedric Nugteren
|
ad2b6ecea2
|
Fixed missing line ending
|
2016-10-12 21:10:22 +02:00 |
|
Cedric Nugteren
|
8a9d3cdf37
|
Added support for compiling the library, the client, and the samples under MSVC 2013
|
2016-10-10 22:45:39 +02:00 |
|
Cedric Nugteren
|
f88c50522d
|
Fixed an issue with const members of structs in the database
|
2016-10-10 22:24:05 +02:00 |
|
Cedric Nugteren
|
de77f00e8c
|
Fixed an issue with the length of the GEMM OpenCL string for both MSVC 2013 and 2015
|
2016-10-10 22:23:33 +02:00 |
|
Cedric Nugteren
|
fcac81bfef
|
First fixes towards compilation on Visual Studio 2013
|
2016-10-10 20:37:45 +02:00 |
|
Cedric Nugteren
|
39afc9543b
|
Changed the storage location of the database to a separate GitHub repository
|
2016-10-10 19:10:12 +02:00 |
|
Cedric Nugteren
|
71f5c0c145
|
Changed the license to MIT
|
2016-10-10 18:07:17 +02:00 |
|
Cedric Nugteren
|
42ee4abbbc
|
Updated the performance graphs for Intel Iris Pro GPU and AMD Radeon M370X GPU
|
2016-10-10 18:07:05 +02:00 |
|
Cedric Nugteren
|
f563341e7b
|
Added fresh performance graphs for GeForce 750Ti; removed old GTX480 results
|
2016-10-10 16:59:28 +02:00 |
|
Cedric Nugteren
|
08ee57f494
|
Updated the tuning results for the GTX 750 Ti GPU
|
2016-10-10 16:41:41 +02:00 |
|
Cedric Nugteren
|
2194dee217
|
Merge branch 'gemm_direct' into development
|
2016-10-10 16:05:18 +02:00 |
|
Cedric Nugteren
|
7c228f6a67
|
Changed the thresholds for the direct/indirect GEMM kernels for NVIDIA and Intel GPUs
|
2016-10-10 16:01:02 +02:00 |
|
Cedric Nugteren
|
d7cfb6aa9b
|
Added benchmark script for small matrix sizes, testing the direct GEMM kernels
|
2016-10-08 22:05:54 +02:00 |
|
Cedric Nugteren
|
7baac46e72
|
Fixed a performance bug for Intel Iris Pro GPUs due to incorrect tuning results
|
2016-10-08 21:56:06 +02:00 |
|
Cedric Nugteren
|
b698e45478
|
Added first tuning results for the single-kernel direct GEMM implementation
|
2016-10-06 21:13:14 +02:00 |
|
Cedric Nugteren
|
a3e67f2be2
|
Added a kernel selection database to select between the direct and indirect GEMM kernels
|
2016-10-06 19:51:12 +02:00 |
|
Cedric Nugteren
|
8d5747aa54
|
Made non-standard types void-pointers in the Netlib BLAS interface
|
2016-10-05 08:23:54 +02:00 |
|
Cedric Nugteren
|
a17b714c3e
|
Added first version of Netlib BLAS API header
|
2016-10-05 00:09:39 +02:00 |
|
Cedric Nugteren
|
7052a00a3e
|
Fixed a const-correctness issue with complex conjugation in the GEMM direct kernel
|
2016-10-03 20:13:19 +02:00 |
|
Cedric Nugteren
|
ca0c075de2
|
Added functions to load from off-chip to local memory without vector loads for the GEMM direct kernels
|
2016-10-03 20:09:15 +02:00 |
|
Cedric Nugteren
|
c1c4bc5d20
|
Re-organised GEMM direct kernel and added faster fall-back version for incomplete rectangles
|
2016-10-03 19:32:01 +02:00 |
|
Cedric Nugteren
|
243cef73db
|
Set the default number of runs for all kernels to at least 2 runs
|
2016-10-02 21:23:23 +02:00 |
|
Cedric Nugteren
|
d8827e908c
|
Specialised the GEMM direct kernel in four ways for transposing/non-transposing: NN, NT, TN, TT
|
2016-10-02 17:59:05 +02:00 |
|
Cedric Nugteren
|
61f489e370
|
Split the GEMM direct kernel into two files; set the default tuning target to 256-256-256
|
2016-10-02 15:06:59 +02:00 |
|
Cedric Nugteren
|
a459920105
|
Added padding to the local memory of the GEMM direct kernel
|
2016-10-01 16:58:53 +02:00 |
|
Cedric Nugteren
|
ecc704cc76
|
Added default num-runs to the tuner, with averaging over 10 runs as the default for the GEMM direct kernel
|
2016-10-01 16:55:21 +02:00 |
|
Cedric Nugteren
|
a9d35cf04c
|
Merge branch 'development' into gemm_direct
|
2016-10-01 13:45:08 +02:00 |
|
Cedric Nugteren
|
d59e5c570b
|
Added an option to run tuned kernels multiple times to average execution times; requires CLTune 2.5.0
|
2016-09-27 21:03:24 +02:00 |
|
Cedric Nugteren
|
db5772e521
|
Updated to version 8.0 of the CLCudaAPI header
|
2016-09-27 20:56:49 +02:00 |
|
Cedric Nugteren
|
adc058440c
|
Fixed the local memory size computation for the GEMM tuners
|
2016-09-27 20:03:55 +02:00 |
|
Cedric Nugteren
|
6178fcd584
|
Now generates test/client/tuner data using a fixed seed to enable reproducibility of results
|
2016-09-27 19:55:21 +02:00 |
|
Cedric Nugteren
|
e3076d26cc
|
Added more relaxed error checking for the half-precision tests
|
2016-09-27 19:42:58 +02:00 |
|