Added CUDA API documentation

pull/204/head
Cedric Nugteren 2017-10-16 21:54:23 +02:00
parent 0719f14486
commit 03760f80eb
2 changed files with 17 additions and 1 deletions

@@ -1,5 +1,9 @@
Development (next version)
- Added a CUDA API to CLBlast:
  * The library and kernels can be compiled with the CUDA driver API and NVRTC (requires CUDA 7.5 or higher)
  * Two CUDA API sample programs are added: SGEMM and DAXPY
  * All correctness tests and performance clients work on CUDA as they do for OpenCL
- Kernels are now cached based on their tuning parameters: fits the use-case of 'OverrideParameters'
- Improved performance for small GEMM problems by going from 3 to 1 optional temporary buffers
- Various minor fixes and enhancements

@@ -99,11 +99,23 @@ To get started quickly, a couple of stand-alone example programs are included in
cmake -DSAMPLES=ON ..
For all of CLBlast's APIs, you can optionally set the `CLBLAST_BUILD_OPTIONS` environment variable to pass specific build options to the OpenCL compiler.
Using the library (Netlib API)
-------------
There is also a Netlib CBLAS C API available. It is not recommended when you need full control over performance, because every call copies all buffers to and from the OpenCL device. Performance suffers severely as a result, especially for level 1 and level 2 BLAS functions. However, it can be useful if you don't want to touch OpenCL at all. You can set the default device and platform through the `CLBLAST_DEVICE` and `CLBLAST_PLATFORM` environment variables. After passing the `-DNETLIB=ON` flag to CMake, this API can be used as follows:
#include <clblast_netlib_c.h>
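As a rough sketch of what a call through this API can look like, the helper below runs an SAXPY (`y = alpha * x + y`) on plain host vectors via `cblas_saxpy`. The helper name `host_saxpy` is made up for illustration. The `__has_include` fallback is an assumption added only so that the snippet still compiles on a machine without CLBlast installed; it is not part of the library.

```cpp
#include <cassert>
#include <vector>

#if __has_include(<clblast_netlib_c.h>)
#include <clblast_netlib_c.h>  // available when CLBlast was built with -DNETLIB=ON
#else
// Assumption: plain-loop stand-in with the standard CBLAS saxpy signature,
// used only when the CLBlast header is not installed.
inline void cblas_saxpy(const int n, const float alpha,
                        const float* x, const int x_inc,
                        float* y, const int y_inc) {
  for (int i = 0; i < n; ++i) { y[i * y_inc] += alpha * x[i * x_inc]; }
}
#endif

// Hypothetical helper: computes y = alpha * x + y on host vectors.
// With CLBlast, the buffers are copied to the device and back on every call.
void host_saxpy(const float alpha, const std::vector<float>& x, std::vector<float>& y) {
  assert(x.size() == y.size());
  const int n = static_cast<int>(x.size());
  cblas_saxpy(n, alpha, x.data(), 1, y.data(), 1);
}
```

No OpenCL objects appear in the calling code, which is the convenience this API trades against the per-call buffer copies described above.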
Using the library (CUDA API)
-------------
There is also a CUDA API of CLBlast available. Enabling it compiles the whole library for CUDA and thus replaces the OpenCL API. It is based on the CUDA driver and NVRTC APIs and requires NVIDIA CUDA 7.5 or higher. After passing the `-DCUDA=ON -DOPENCL=OFF` flags to CMake, the CUDA version of the library can be used as follows:
#include <clblast_cuda.h>
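A minimal sketch of a call through the CUDA API follows. It is not one of the bundled samples. It assumes that the C++ routines take `CUdeviceptr` buffers followed by a `CUcontext` and a `CUdevice`, and that the work may still be in flight when the call returns, hence the explicit `cuCtxSynchronize`. The helper name `cuda_saxpy`, the use of the primary context, and the `__has_include` stub (there only so the snippet compiles without CUDA or CLBlast) are all illustrative choices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if __has_include(<cuda.h>) && __has_include(<clblast_cuda.h>)
#include <cuda.h>          // CUDA driver API
#include <clblast_cuda.h>  // available when CLBlast was built with -DCUDA=ON -DOPENCL=OFF

// Hypothetical helper: computes y = alpha * x + y on CUDA device 0.
// Returns true on success; y is left untouched on failure.
bool cuda_saxpy(const float alpha, const std::vector<float>& x, std::vector<float>& y) {
  assert(x.size() == y.size());
  const size_t n = x.size();
  if (n == 0) { return true; }
  const size_t bytes = n * sizeof(float);

  // Sets up the device and its primary context through the driver API
  CUdevice device;
  CUcontext context;
  if (cuInit(0) != CUDA_SUCCESS) { return false; }
  if (cuDeviceGet(&device, 0) != CUDA_SUCCESS) { return false; }
  if (cuDevicePrimaryCtxRetain(&context, device) != CUDA_SUCCESS) { return false; }
  bool ok = cuCtxSetCurrent(context) == CUDA_SUCCESS;

  // Allocates device buffers and uploads both vectors
  CUdeviceptr dx = 0, dy = 0;
  ok = ok && cuMemAlloc(&dx, bytes) == CUDA_SUCCESS;
  ok = ok && cuMemAlloc(&dy, bytes) == CUDA_SUCCESS;
  ok = ok && cuMemcpyHtoD(dx, x.data(), bytes) == CUDA_SUCCESS;
  ok = ok && cuMemcpyHtoD(dy, y.data(), bytes) == CUDA_SUCCESS;
  ok = ok && cuCtxSynchronize() == CUDA_SUCCESS;

  // Calls CLBlast; the type of alpha (float) selects single precision
  ok = ok && clblast::Axpy(n, alpha, dx, 0, 1, dy, 0, 1, context, device) ==
                 clblast::StatusCode::kSuccess;
  ok = ok && cuCtxSynchronize() == CUDA_SUCCESS;

  // Downloads into a temporary so that y only changes on success
  std::vector<float> result(n);
  ok = ok && cuMemcpyDtoH(result.data(), dy, bytes) == CUDA_SUCCESS;
  if (ok) { y.swap(result); }

  // Releases device memory and the primary context
  if (dx != 0) { cuMemFree(dx); }
  if (dy != 0) { cuMemFree(dy); }
  cuDevicePrimaryCtxRelease(device);
  return ok;
}
#else
// Assumption: stub so the snippet compiles without CUDA and CLBlast; always reports failure.
bool cuda_saxpy(const float, const std::vector<float>& x, std::vector<float>& y) {
  assert(x.size() == y.size());
  return false;
}
#endif
```

Unlike the Netlib API, the caller owns the device buffers here, so data can stay on the GPU across several routine calls.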
Using the tuners (optional)