CLBlast: Supported routines overview
================

This document describes which routines are supported in CLBlast. For other information about CLBlast, see the main README.

Full API documentation is available in a separate file.

Supported types
-------------

The different data-types supported by the library are:

* `S`: Single-precision 32-bit floating-point (`float`).
* `D`: Double-precision 64-bit floating-point (`double`).
* `C`: Complex single-precision 2x32-bit floating-point (`std::complex<float>`).
* `Z`: Complex double-precision 2x64-bit floating-point (`std::complex<double>`).
* `H`: Half-precision 16-bit floating-point (`cl_half`). See section 'Half precision' below for more information.

Supported routines
-------------

CLBlast supports almost all the Netlib BLAS routines plus a couple of extra non-BLAS routines. The supported BLAS routines are marked with '✔' in the following tables. Routines marked with '-' do not exist: they are not part of BLAS at all. An empty cell means the routine is part of BLAS but not supported by CLBlast for that precision. A usage sketch of calling one of these routines follows the level-3 table below.

| Level-1 | S | D | C | Z | H |
| --------|---|---|---|---|---|
| xSWAP   | ✔ | ✔ | ✔ | ✔ | ✔ |
| xSCAL   | ✔ | ✔ | ✔ | ✔ | ✔ |
| xCOPY   | ✔ | ✔ | ✔ | ✔ | ✔ |
| xAXPY   | ✔ | ✔ | ✔ | ✔ | ✔ |
| xDOT    | ✔ | ✔ | - | - | ✔ |
| xDOTU   | - | - | ✔ | ✔ | - |
| xDOTC   | - | - | ✔ | ✔ | - |
| xNRM2   | ✔ | ✔ | ✔ | ✔ | ✔ |
| xASUM   | ✔ | ✔ | ✔ | ✔ | ✔ |
| IxAMAX  | ✔ | ✔ | ✔ | ✔ | ✔ |

| Level-2 | S | D | C | Z | H |
| --------|---|---|---|---|---|
| xGEMV   | ✔ | ✔ | ✔ | ✔ | ✔ |
| xGBMV   | ✔ | ✔ | ✔ | ✔ | ✔ |
| xHEMV   | - | - | ✔ | ✔ | - |
| xHBMV   | - | - | ✔ | ✔ | - |
| xHPMV   | - | - | ✔ | ✔ | - |
| xSYMV   | ✔ | ✔ | - | - | ✔ |
| xSBMV   | ✔ | ✔ | - | - | ✔ |
| xSPMV   | ✔ | ✔ | - | - | ✔ |
| xTRMV   | ✔ | ✔ | ✔ | ✔ | ✔ |
| xTBMV   | ✔ | ✔ | ✔ | ✔ | ✔ |
| xTPMV   | ✔ | ✔ | ✔ | ✔ | ✔ |
| xGER    | ✔ | ✔ | - | - | ✔ |
| xGERU   | - | - | ✔ | ✔ | - |
| xGERC   | - | - | ✔ | ✔ | - |
| xHER    | - | - | ✔ | ✔ | - |
| xHPR    | - | - | ✔ | ✔ | - |
| xHER2   | - | - | ✔ | ✔ | - |
| xHPR2   | - | - | ✔ | ✔ | - |
| xSYR    | ✔ | ✔ | - | - | ✔ |
| xSPR    | ✔ | ✔ | - | - | ✔ |
| xSYR2   | ✔ | ✔ | - | - | ✔ |
| xSPR2   | ✔ | ✔ | - | - | ✔ |
| xTRSV   | ✔ | ✔ | ✔ | ✔ |   |

| Level-3 | S | D | C | Z | H |
| --------|---|---|---|---|---|
| xGEMM   | ✔ | ✔ | ✔ | ✔ | ✔ |
| xSYMM   | ✔ | ✔ | ✔ | ✔ | ✔ |
| xHEMM   | - | - | ✔ | ✔ | - |
| xSYRK   | ✔ | ✔ | ✔ | ✔ | ✔ |
| xHERK   | - | - | ✔ | ✔ | - |
| xSYR2K  | ✔ | ✔ | ✔ | ✔ | ✔ |
| xHER2K  | - | - | ✔ | ✔ | - |
| xTRMM   | ✔ | ✔ | ✔ | ✔ | ✔ |
| xTRSM   | ✔ | ✔ | ✔ | ✔ |   |
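
As an illustration of how these routines are invoked, below is a minimal sketch (not one of the official CLBlast samples) of a single-precision GEMM through the C++ API of `clblast.h`. The matrix sizes and contents are arbitrary, the OpenCL boilerplate assumes a single platform and device, and most error checking is omitted for brevity:

```cpp
#include <vector>
#include <clblast.h>

int main() {
  // Regular OpenCL set-up: take the first platform and device available
  cl_platform_id platform;
  clGetPlatformIDs(1, &platform, nullptr);
  cl_device_id device;
  clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, nullptr);
  cl_context context = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
  cl_command_queue queue = clCreateCommandQueue(context, device, 0, nullptr);

  // Example problem: C (m-by-n) = alpha * A (m-by-k) * B (k-by-n) + beta * C
  const size_t m = 128, n = 64, k = 256;
  const std::vector<float> host_a(m * k, 1.0f);
  const std::vector<float> host_b(k * n, 2.0f);
  std::vector<float> host_c(m * n, 0.0f);

  // Copy the matrices to the device
  cl_mem device_a = clCreateBuffer(context, CL_MEM_READ_WRITE, m * k * sizeof(float), nullptr, nullptr);
  cl_mem device_b = clCreateBuffer(context, CL_MEM_READ_WRITE, k * n * sizeof(float), nullptr, nullptr);
  cl_mem device_c = clCreateBuffer(context, CL_MEM_READ_WRITE, m * n * sizeof(float), nullptr, nullptr);
  clEnqueueWriteBuffer(queue, device_a, CL_TRUE, 0, m * k * sizeof(float), host_a.data(), 0, nullptr, nullptr);
  clEnqueueWriteBuffer(queue, device_b, CL_TRUE, 0, k * n * sizeof(float), host_b.data(), 0, nullptr, nullptr);
  clEnqueueWriteBuffer(queue, device_c, CL_TRUE, 0, m * n * sizeof(float), host_c.data(), 0, nullptr, nullptr);

  // SGEMM: the 'float' alpha/beta arguments select the 'S' precision; the
  // leading dimensions match the row-major layout chosen here
  cl_event event = nullptr;
  const auto status = clblast::Gemm(clblast::Layout::kRowMajor,
                                    clblast::Transpose::kNo, clblast::Transpose::kNo,
                                    m, n, k,
                                    1.0f,            // alpha
                                    device_a, 0, k,  // buffer, offset, leading dimension
                                    device_b, 0, n,
                                    0.0f,            // beta
                                    device_c, 0, n,
                                    &queue, &event);

  // Wait for completion and read the result back
  if (status == clblast::StatusCode::kSuccess) {
    clWaitForEvents(1, &event);
    clReleaseEvent(event);
    clEnqueueReadBuffer(queue, device_c, CL_TRUE, 0, m * n * sizeof(float), host_c.data(), 0, nullptr, nullptr);
  }
  clReleaseMemObject(device_a);
  clReleaseMemObject(device_b);
  clReleaseMemObject(device_c);
  clReleaseCommandQueue(queue);
  clReleaseContext(context);
  return 0;
}
```

The same call pattern applies to the other routines; only the per-routine argument list differs, as detailed in the API documentation.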

Furthermore, batched versions of BLAS routines are also available, processing multiple smaller computations in one go for better performance; a call sketch follows the table:

| Batched             | S | D | C | Z | H |
| --------------------|---|---|---|---|---|
| xAXPYBATCHED        | ✔ | ✔ | ✔ | ✔ | ✔ |
| xGEMMBATCHED        | ✔ | ✔ | ✔ | ✔ | ✔ |
| xGEMMSTRIDEDBATCHED | ✔ | ✔ | ✔ | ✔ | ✔ |
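
For example, a strided-batched GEMM multiplies many equally-sized matrices laid out back-to-back in memory with a single call. The following is an illustrative sketch: the `BatchedMultiply` helper and its buffer layout are assumptions, the argument order follows the `GemmStridedBatched` entry point of the C++ API, and the buffers and queue are presumed to be set up as in the GEMM example above:

```cpp
#include <clblast.h>

// Multiplies 'batch_count' pairs of m-by-k and k-by-n matrices in one call.
// Each matrix starts a fixed number of elements (the stride) after the
// previous one; all matrices are row-major without offsets.
clblast::StatusCode BatchedMultiply(cl_mem a, cl_mem b, cl_mem c,
                                    const size_t m, const size_t n, const size_t k,
                                    const size_t batch_count,
                                    cl_command_queue* queue) {
  const size_t a_stride = m * k;  // elements between consecutive A matrices
  const size_t b_stride = k * n;
  const size_t c_stride = m * n;
  return clblast::GemmStridedBatched(clblast::Layout::kRowMajor,
                                     clblast::Transpose::kNo, clblast::Transpose::kNo,
                                     m, n, k,
                                     1.0f,               // alpha (shared by all batches)
                                     a, 0, k, a_stride,  // buffer, offset, ld, stride
                                     b, 0, n, b_stride,
                                     0.0f,               // beta
                                     c, 0, n, c_stride,
                                     batch_count,
                                     queue, nullptr);
}
```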

In addition, CLBlast supports some extra non-BLAS routines, classified as level-X. They are experimental and should be used with care:

| Level-X   | S | D | C | Z | H |
| ----------|---|---|---|---|---|
| xSUM      | ✔ | ✔ | ✔ | ✔ | ✔ |
| IxAMIN    | ✔ | ✔ | ✔ | ✔ | ✔ |
| IxMAX     | ✔ | ✔ | ✔ | ✔ | ✔ |
| IxMIN     | ✔ | ✔ | ✔ | ✔ | ✔ |
| xHAD      | ✔ | ✔ | ✔ | ✔ | ✔ |
| xOMATCOPY | ✔ | ✔ | ✔ | ✔ | ✔ |
| xIM2COL   | ✔ | ✔ | ✔ | ✔ | ✔ |

Some less commonly used BLAS routines are not yet supported by CLBlast. They are xROTG, xROTMG, xROT, xROTM, xTBSV, and xTPSV.

Half precision (fp16)
-------------

The half-precision fp16 format is a 16-bit floating-point data-type. Some OpenCL devices support the `cl_khr_fp16` extension, reducing storage and bandwidth requirements by a factor of 2 compared to single-precision floating-point. If the hardware also accelerates arithmetic on half-precision data-types, this can greatly improve the compute performance of e.g. level-3 routines such as GEMM. Devices that can benefit from this include Intel GPUs, ARM Mali GPUs, and NVIDIA's latest Pascal GPUs. Half precision is of particular interest to the deep-learning community, in which convolutional neural networks can be processed much faster at a minor accuracy loss.

Since there is no half-precision data-type in C or C++, OpenCL provides the `cl_half` type for the host device. Unfortunately, this is internally a 16-bit integer, so computations on the host using this data-type should be avoided. For convenience, CLBlast provides the `clblast_half.h` header (C99 and C++ compatible), defining the `half` type as a shorthand for `cl_half`, along with the following basic conversion functions, demonstrated in the sketch below:

* `half FloatToHalf(const float value)`: Converts a 32-bit floating-point value to a 16-bit floating-point value.
* `float HalfToFloat(const half value)`: Converts a 16-bit floating-point value to a 32-bit floating-point value.
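
A host-side round-trip through fp16 could look as follows. This is a minimal sketch, not taken from the CLBlast sources; it only uses the `half` type and the two conversion functions defined in `clblast_half.h`:

```cpp
#include <cstdio>
#include <clblast_half.h>  // defines 'half' plus FloatToHalf and HalfToFloat

int main() {
  const float value = 3.14159f;
  const half as_fp16 = FloatToHalf(value);      // 32-bit -> 16-bit
  const float restored = HalfToFloat(as_fp16);  // 16-bit -> 32-bit
  printf("original: %f, after fp16 round-trip: %f\n", value, restored);
  return 0;
}
```

Note that the round-trip is lossy: fp16 stores only around three decimal digits of precision.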

The `samples/haxpy.c` example shows how to use these convenience functions when calling the half-precision BLAS routine HAXPY.
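
For reference, a similar call through the C++ API might look as follows. This is an illustrative sketch rather than a copy of `samples/haxpy.c` (which uses the C API): the `HalfAxpy` helper and its arguments are assumptions, and the device buffers `x` and `y` are presumed to each hold `n` consecutive fp16 values, with the queue set up as in the GEMM example above:

```cpp
#include <clblast.h>
#include <clblast_half.h>

// Computes y = alpha * x + y in half precision on the device. Only the scalar
// is handled on the host, via a single FloatToHalf conversion.
clblast::StatusCode HalfAxpy(const size_t n, cl_mem x, cl_mem y,
                             cl_command_queue* queue) {
  const half alpha = FloatToHalf(2.0f);  // host-side fp32 -> fp16 conversion
  return clblast::Axpy(n, alpha,
                       x, 0, 1,  // x buffer, offset, increment
                       y, 0, 1,
                       queue, nullptr);
}
```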