*`cl_mem asum_buffer`: OpenCL buffer to store the output asum vector.
*`const size_t asum_offset`: The offset in elements from the start of the output asum vector.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
xSUM: Sum of values in a vector (non-BLAS function)
-------------
Accumulates the values of each element in the x vector. The results are stored in the sum buffer. This routine is the non-absolute version of the xASUM BLAS routine.
*`cl_mem imax_buffer`: OpenCL buffer to store the output imax vector.
*`const size_t imax_offset`: The offset in elements from the start of the output imax vector.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
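The accumulation that xSUM performs can be illustrated with a small host-side reference. This is only a sketch under stated assumptions: `ReferenceSum` is a hypothetical helper, `std::vector` stands in for the OpenCL buffers, and `n` is assumed to be the number of elements to read. It is not the library's kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for the xSUM semantics (not library code).
// Reads n elements of x, starting at x_offset with stride x_inc, and writes
// their signed sum to sum[sum_offset].
template <typename T>
void ReferenceSum(const std::size_t n,
                  std::vector<T>& sum, const std::size_t sum_offset,
                  const std::vector<T>& x, const std::size_t x_offset,
                  const std::size_t x_inc) {
  assert(x_inc > 0);
  assert(sum_offset < sum.size());
  assert(n == 0 || x_offset + (n - 1) * x_inc < x.size());
  T result = T{0};
  for (std::size_t i = 0; i < n; ++i) {
    result += x[x_offset + i * x_inc];  // no absolute value, unlike xASUM
  }
  sum[sum_offset] = result;
}
```

Offsets and increments are counted in elements, not bytes, which matches how the argument lists in this document describe them.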
xMAX: Index of maximum value in a vector (non-BLAS function)
-------------
Finds the index of the maximum of the values in the x vector. The resulting integer index is stored in the imax buffer. This routine is the non-absolute version of the IxAMAX BLAS routine.
*`cl_mem imax_buffer`: OpenCL buffer to store the output imax vector.
*`const size_t imax_offset`: The offset in elements from the start of the output imax vector.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
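To make the difference from IxAMAX concrete, here is a minimal host-side sketch of the xMAX semantics. The helper name `ReferenceMax`, the 0-based index relative to the strided vector, the `unsigned int` index type, and the first-match tie-breaking are all assumptions of this sketch, not statements about the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for the xMAX semantics (not library code).
// Writes the index of the largest signed value among the n strided elements
// of x to imax[imax_offset]. The index is 0-based and relative to the logical
// vector; ties resolve to the first occurrence (both are assumptions).
template <typename T>
void ReferenceMax(const std::size_t n,
                  std::vector<unsigned int>& imax, const std::size_t imax_offset,
                  const std::vector<T>& x, const std::size_t x_offset,
                  const std::size_t x_inc) {
  assert(n > 0 && x_inc > 0);
  assert(imax_offset < imax.size());
  assert(x_offset + (n - 1) * x_inc < x.size());
  std::size_t best = 0;
  for (std::size_t i = 1; i < n; ++i) {
    if (x[x_offset + i * x_inc] > x[x_offset + best * x_inc]) { best = i; }
  }
  imax[imax_offset] = static_cast<unsigned int>(best);
}
```

For the input `{-7, 2, -1, 5, -9}` this sketch selects the element `5`, whereas an absolute-value search would select `-9`.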
xMIN: Index of minimum value in a vector (non-BLAS function)
-------------
Finds the index of the minimum of the values in the x vector. The resulting integer index is stored in the imin buffer. This routine is the non-absolute minimum version of the IxAMAX BLAS routine.
*`cl_mem imin_buffer`: OpenCL buffer to store the output imin vector.
*`const size_t imin_offset`: The offset in elements from the start of the output imin vector.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
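The xMIN search can be sketched the same way on the host, this time exercising a non-zero offset and a stride. As before, `ReferenceMin`, the 0-based logical index, the index type, and the tie-breaking rule are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for the xMIN semantics (not library code).
// Writes the index of the smallest signed value among the n strided elements
// of x to imin[imin_offset]. Index convention and tie-breaking are assumed.
template <typename T>
void ReferenceMin(const std::size_t n,
                  std::vector<unsigned int>& imin, const std::size_t imin_offset,
                  const std::vector<T>& x, const std::size_t x_offset,
                  const std::size_t x_inc) {
  assert(n > 0 && x_inc > 0);
  assert(imin_offset < imin.size());
  assert(x_offset + (n - 1) * x_inc < x.size());
  std::size_t best = 0;
  for (std::size_t i = 1; i < n; ++i) {
    if (x[x_offset + i * x_inc] < x[x_offset + best * x_inc]) { best = i; }
  }
  imin[imin_offset] = static_cast<unsigned int>(best);
}
```

With `x_offset = 1` and `x_inc = 2`, the buffer `{4, 3, 2, -8, -1, 1}` is read as the logical vector `{3, -8, 1}`, so the stored index is 1 rather than the raw buffer position 3.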
xGEMV: General matrix-vector multiplication
-------------
Performs the operation y = alpha * A * x + beta * y, in which x is an input vector, y is an input and output vector, A is an input matrix, and alpha and beta are scalars. The matrix A can optionally be transposed before performing the operation.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Transpose`: Transposing the input matrix A, either `Transpose::kNo` (111), `Transpose::kYes` (112), or `Transpose::kConjugate` (113) for a complex-conjugate transpose.
*`const size_t m`: Integer size argument.
*`const size_t n`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem a_buffer`: OpenCL buffer to store the input A matrix.
*`const size_t a_offset`: The offset in elements from the start of the input A matrix.
*`const size_t a_ld`: Leading dimension of the input A matrix.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`const T beta`: Input scalar constant.
*`cl_mem y_buffer`: OpenCL buffer to store the output y vector.
*`const size_t y_offset`: The offset in elements from the start of the output y vector.
*`const size_t y_inc`: Stride/increment of the output y vector.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
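The interplay of layout, transpose flag, leading dimension, offsets, and increments in xGEMV is easier to see in a plain host-side reference. The sketch below is an illustration under assumptions: `ReferenceGemv` and the locally defined enums are hypothetical stand-ins (the enums only mirror the numeric values listed above), `m` and `n` are taken to be the rows and columns of A as stored (the usual BLAS convention), and only real element types are handled, so the conjugate transpose is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local stand-ins mirroring the values documented above (assumption).
enum class Layout { kRowMajor = 101, kColMajor = 102 };
enum class Transpose { kNo = 111, kYes = 112 };

// Hypothetical host-side reference for y = alpha * op(A) * x + beta * y.
// A is stored as m rows by n columns with leading dimension a_ld.
template <typename T>
void ReferenceGemv(const Layout layout, const Transpose a_transpose,
                   const std::size_t m, const std::size_t n, const T alpha,
                   const std::vector<T>& a, const std::size_t a_offset,
                   const std::size_t a_ld,
                   const std::vector<T>& x, const std::size_t x_offset,
                   const std::size_t x_inc, const T beta,
                   std::vector<T>& y, const std::size_t y_offset,
                   const std::size_t y_inc) {
  assert(x_inc > 0 && y_inc > 0);
  const bool transposed = (a_transpose == Transpose::kYes);
  const std::size_t rows = transposed ? n : m;  // length of y
  const std::size_t cols = transposed ? m : n;  // length of x
  for (std::size_t r = 0; r < rows; ++r) {
    T acc = T{0};
    for (std::size_t c = 0; c < cols; ++c) {
      // Position (i, j) in the stored matrix that supplies op(A)(r, c).
      const std::size_t i = transposed ? c : r;
      const std::size_t j = transposed ? r : c;
      const std::size_t index =
          (layout == Layout::kRowMajor) ? i * a_ld + j : j * a_ld + i;
      acc += a[a_offset + index] * x[x_offset + c * x_inc];
    }
    T& out = y[y_offset + r * y_inc];
    out = alpha * acc + beta * out;
  }
}
```

Note that transposing swaps the required lengths of x and y: without a transpose x has n elements and y has m, and with a transpose it is the other way around.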
xGBMV: General banded matrix-vector multiplication
-------------
Same operation as xGEMV, but matrix A is banded instead.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Transpose`: Transposing the input matrix A, either `Transpose::kNo` (111), `Transpose::kYes` (112), or `Transpose::kConjugate` (113) for a complex-conjugate transpose.
*`const size_t m`: Integer size argument.
*`const size_t n`: Integer size argument.
*`const size_t kl`: Integer size argument.
*`const size_t ku`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem a_buffer`: OpenCL buffer to store the input A matrix.
*`const size_t a_offset`: The offset in elements from the start of the input A matrix.
*`const size_t a_ld`: Leading dimension of the input A matrix.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`const T beta`: Input scalar constant.
*`cl_mem y_buffer`: OpenCL buffer to store the output y vector.
*`const size_t y_offset`: The offset in elements from the start of the output y vector.
*`const size_t y_inc`: Stride/increment of the output y vector.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
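What "banded" means for the layout of A is not spelled out above, so the following host-side sketch may help. It assumes the band storage convention of the reference BLAS for column-major data, in which entry (i, j) of the full matrix lives in row `ku + i - j` of column `j` of the band array and `a_ld` is at least `kl + ku + 1`. The helper name `ReferenceGbmvColMajor` is hypothetical, and the sketch covers only the non-transposed, column-major case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for y = alpha * A * x + beta * y where A is
// an m x n band matrix with kl sub-diagonals and ku super-diagonals, stored
// column-major in reference-BLAS band format (assumption).
template <typename T>
void ReferenceGbmvColMajor(const std::size_t m, const std::size_t n,
                           const std::size_t kl, const std::size_t ku,
                           const T alpha,
                           const std::vector<T>& a, const std::size_t a_offset,
                           const std::size_t a_ld,
                           const std::vector<T>& x, const std::size_t x_offset,
                           const std::size_t x_inc, const T beta,
                           std::vector<T>& y, const std::size_t y_offset,
                           const std::size_t y_inc) {
  assert(a_ld >= kl + ku + 1);
  assert(x_inc > 0 && y_inc > 0);
  for (std::size_t i = 0; i < m; ++i) {
    // Only columns with i - kl <= j <= i + ku lie inside the band.
    const std::size_t j_begin = (i > kl) ? i - kl : 0;
    const std::size_t j_end = std::min(n, i + ku + 1);
    T acc = T{0};
    for (std::size_t j = j_begin; j < j_end; ++j) {
      const std::size_t band_row = ku + i - j;  // non-negative inside the band
      acc += a[a_offset + j * a_ld + band_row] * x[x_offset + j * x_inc];
    }
    T& out = y[y_offset + i * y_inc];
    out = alpha * acc + beta * out;
  }
}
```

For a 3 x 3 tridiagonal matrix (`kl = ku = 1`) each column of the band array holds the super-diagonal, diagonal, and sub-diagonal entry of that column, in this order, and the two corner slots that fall outside the matrix are never read.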
xHEMV: Hermitian matrix-vector multiplication
-------------
Same operation as xGEMV, but matrix A is a Hermitian matrix instead.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const size_t n`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem a_buffer`: OpenCL buffer to store the input A matrix.
*`const size_t a_offset`: The offset in elements from the start of the input A matrix.
*`const size_t a_ld`: Leading dimension of the input A matrix.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`const T beta`: Input scalar constant.
*`cl_mem y_buffer`: OpenCL buffer to store the output y vector.
*`const size_t y_offset`: The offset in elements from the start of the output y vector.
*`const size_t y_inc`: Stride/increment of the output y vector.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
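Because a Hermitian matrix equals its own conjugate transpose, only the triangle selected by the `Triangle` argument needs to hold valid data. The host-side sketch below illustrates that idea for column-major storage. `ReferenceHemvColMajor` and the local enum are hypothetical, and treating the diagonal as purely real follows the reference BLAS convention, which is an assumption here.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Local stand-in mirroring the values documented above (assumption).
enum class Triangle { kUpper = 121, kLower = 122 };

// Hypothetical host-side reference for y = alpha * A * x + beta * y with a
// Hermitian n x n matrix A in column-major storage. Only the chosen triangle
// is read; the other half is reconstructed by conjugation.
template <typename T>
void ReferenceHemvColMajor(const Triangle triangle, const std::size_t n,
                           const std::complex<T> alpha,
                           const std::vector<std::complex<T>>& a,
                           const std::size_t a_offset, const std::size_t a_ld,
                           const std::vector<std::complex<T>>& x,
                           const std::size_t x_offset, const std::size_t x_inc,
                           const std::complex<T> beta,
                           std::vector<std::complex<T>>& y,
                           const std::size_t y_offset, const std::size_t y_inc) {
  assert(a_ld >= n && x_inc > 0 && y_inc > 0);
  for (std::size_t i = 0; i < n; ++i) {
    std::complex<T> acc(T{0}, T{0});
    for (std::size_t j = 0; j < n; ++j) {
      const bool stored = (triangle == Triangle::kUpper) ? (i <= j) : (i >= j);
      std::complex<T> value = stored
          ? a[a_offset + j * a_ld + i]              // entry (i, j) as stored
          : std::conj(a[a_offset + i * a_ld + j]);  // conjugate of entry (j, i)
      if (i == j) { value = std::complex<T>(value.real(), T{0}); }
      acc += value * x[x_offset + j * x_inc];
    }
    std::complex<T>& out = y[y_offset + i * y_inc];
    out = alpha * acc + beta * out;
  }
}
```

Whatever sits in the unused triangle of the buffer is ignored, so callers only need to fill the half they name.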
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const size_t n`: Integer size argument.
*`const size_t k`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem a_buffer`: OpenCL buffer to store the input A matrix.
*`const size_t a_offset`: The offset in elements from the start of the input A matrix.
*`const size_t a_ld`: Leading dimension of the input A matrix.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`const T beta`: Input scalar constant.
*`cl_mem y_buffer`: OpenCL buffer to store the output y vector.
*`const size_t y_offset`: The offset in elements from the start of the output y vector.
*`const size_t y_inc`: Stride/increment of the output y vector.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const size_t n`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem ap_buffer`: OpenCL buffer to store the input AP matrix.
*`const size_t ap_offset`: The offset in elements from the start of the input AP matrix.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`const T beta`: Input scalar constant.
*`cl_mem y_buffer`: OpenCL buffer to store the output y vector.
*`const size_t y_offset`: The offset in elements from the start of the output y vector.
*`const size_t y_inc`: Stride/increment of the output y vector.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
xSYMV: Symmetric matrix-vector multiplication
-------------
Same operation as xGEMV, but matrix A is symmetric instead.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const size_t n`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem a_buffer`: OpenCL buffer to store the input A matrix.
*`const size_t a_offset`: The offset in elements from the start of the input A matrix.
*`const size_t a_ld`: Leading dimension of the input A matrix.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`const T beta`: Input scalar constant.
*`cl_mem y_buffer`: OpenCL buffer to store the output y vector.
*`const size_t y_offset`: The offset in elements from the start of the output y vector.
*`const size_t y_inc`: Stride/increment of the output y vector.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
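For a symmetric matrix, the `Triangle` argument selects which half of the buffer is actually read, and the other half is implied by mirroring. A minimal host-side sketch for column-major storage follows. The helper name `ReferenceSymvColMajor` and the local enum are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local stand-in mirroring the values documented above (assumption).
enum class Triangle { kUpper = 121, kLower = 122 };

// Hypothetical host-side reference for y = alpha * A * x + beta * y with a
// symmetric n x n matrix A in column-major storage. Entries outside the
// chosen triangle are never read; they are taken from the mirrored position.
template <typename T>
void ReferenceSymvColMajor(const Triangle triangle, const std::size_t n,
                           const T alpha,
                           const std::vector<T>& a, const std::size_t a_offset,
                           const std::size_t a_ld,
                           const std::vector<T>& x, const std::size_t x_offset,
                           const std::size_t x_inc, const T beta,
                           std::vector<T>& y, const std::size_t y_offset,
                           const std::size_t y_inc) {
  assert(a_ld >= n && x_inc > 0 && y_inc > 0);
  for (std::size_t i = 0; i < n; ++i) {
    T acc = T{0};
    for (std::size_t j = 0; j < n; ++j) {
      const bool stored = (triangle == Triangle::kUpper) ? (i <= j) : (i >= j);
      // Read (i, j) when it is stored, otherwise its mirror image (j, i).
      const std::size_t index = stored ? j * a_ld + i : i * a_ld + j;
      acc += a[a_offset + index] * x[x_offset + j * x_inc];
    }
    T& out = y[y_offset + i * y_inc];
    out = alpha * acc + beta * out;
  }
}
```

Unlike the Hermitian case, no conjugation is involved, so the same sketch works for any real element type.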
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const size_t n`: Integer size argument.
*`const size_t k`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem a_buffer`: OpenCL buffer to store the input A matrix.
*`const size_t a_offset`: The offset in elements from the start of the input A matrix.
*`const size_t a_ld`: Leading dimension of the input A matrix.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`const T beta`: Input scalar constant.
*`cl_mem y_buffer`: OpenCL buffer to store the output y vector.
*`const size_t y_offset`: The offset in elements from the start of the output y vector.
*`const size_t y_inc`: Stride/increment of the output y vector.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const size_t n`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem ap_buffer`: OpenCL buffer to store the input AP matrix.
*`const size_t ap_offset`: The offset in elements from the start of the input AP matrix.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`const T beta`: Input scalar constant.
*`cl_mem y_buffer`: OpenCL buffer to store the output y vector.
*`const size_t y_offset`: The offset in elements from the start of the output y vector.
*`const size_t y_inc`: Stride/increment of the output y vector.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
xTRMV: Triangular matrix-vector multiplication
-------------
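Performs the operation x = A * x, a matrix-vector multiplication as in xGEMV, but with a triangular matrix A, without the alpha and beta scalars, and with the result overwriting the x vector. The matrix A can optionally be transposed before performing the operation.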
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const Transpose`: Transposing the input matrix A, either `Transpose::kNo` (111), `Transpose::kYes` (112), or `Transpose::kConjugate` (113) for a complex-conjugate transpose.
*`const Diagonal`: The property of the matrix diagonal, either `Diagonal::kNonUnit` (131) for non-unit values on the diagonal or `Diagonal::kUnit` (132) for unit values on the diagonal.
*`const size_t n`: Integer size argument.
*`const cl_mem a_buffer`: OpenCL buffer to store the input A matrix.
*`const size_t a_offset`: The offset in elements from the start of the input A matrix.
*`const size_t a_ld`: Leading dimension of the input A matrix.
*`cl_mem x_buffer`: OpenCL buffer to store the output x vector.
*`const size_t x_offset`: The offset in elements from the start of the output x vector.
*`const size_t x_inc`: Stride/increment of the output x vector.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
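The argument list above has no `alpha`, `beta`, or `y`, because the triangular product overwrites x in place. The host-side sketch below illustrates how the `Triangle`, `Transpose`, and `Diagonal` arguments interact for column-major storage. `ReferenceTrmvColMajor` and the local enums are hypothetical, only real element types are handled, and copying x into a temporary is a simplification chosen for clarity rather than how an optimized routine would work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local stand-ins mirroring the values documented above (assumption).
enum class Triangle { kUpper = 121, kLower = 122 };
enum class Transpose { kNo = 111, kYes = 112 };
enum class Diagonal { kNonUnit = 131, kUnit = 132 };

// Hypothetical host-side reference for x = op(A) * x with a triangular
// n x n matrix A in column-major storage.
template <typename T>
void ReferenceTrmvColMajor(const Triangle triangle, const Transpose a_transpose,
                           const Diagonal diagonal, const std::size_t n,
                           const std::vector<T>& a, const std::size_t a_offset,
                           const std::size_t a_ld,
                           std::vector<T>& x, const std::size_t x_offset,
                           const std::size_t x_inc) {
  assert(a_ld >= n && x_inc > 0);
  const bool transposed = (a_transpose == Transpose::kYes);
  const bool upper = (triangle == Triangle::kUpper);
  // Snapshot the input so that writes to x do not disturb later rows.
  std::vector<T> input(n);
  for (std::size_t i = 0; i < n; ++i) { input[i] = x[x_offset + i * x_inc]; }
  for (std::size_t i = 0; i < n; ++i) {
    T acc = T{0};
    for (std::size_t j = 0; j < n; ++j) {
      // Position (r, c) in the stored matrix that supplies op(A)(i, j).
      const std::size_t r = transposed ? j : i;
      const std::size_t c = transposed ? i : j;
      if (r == c) {
        // With a unit diagonal the stored diagonal entry is not read.
        const T d = (diagonal == Diagonal::kUnit) ? T{1} : a[a_offset + c * a_ld + r];
        acc += d * input[j];
      } else if (upper ? (r < c) : (r > c)) {
        acc += a[a_offset + c * a_ld + r] * input[j];
      }
    }
    x[x_offset + i * x_inc] = acc;
  }
}
```

Entries on the wrong side of the diagonal are skipped entirely, so they count as zero regardless of what the buffer contains.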
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const Transpose`: Transposing the input matrix A, either `Transpose::kNo` (111), `Transpose::kYes` (112), or `Transpose::kConjugate` (113) for a complex-conjugate transpose.
*`const Diagonal`: The property of the matrix diagonal, either `Diagonal::kNonUnit` (131) for non-unit values on the diagonal or `Diagonal::kUnit` (132) for unit values on the diagonal.
*`const size_t n`: Integer size argument.
*`const size_t k`: Integer size argument.
*`const cl_mem a_buffer`: OpenCL buffer to store the input A matrix.
*`const size_t a_offset`: The offset in elements from the start of the input A matrix.
*`const size_t a_ld`: Leading dimension of the input A matrix.
*`cl_mem x_buffer`: OpenCL buffer to store the output x vector.
*`const size_t x_offset`: The offset in elements from the start of the output x vector.
*`const size_t x_inc`: Stride/increment of the output x vector.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const Transpose`: Transposing the input matrix A, either `Transpose::kNo` (111), `Transpose::kYes` (112), or `Transpose::kConjugate` (113) for a complex-conjugate transpose.
*`const Diagonal`: The property of the matrix diagonal, either `Diagonal::kNonUnit` (131) for non-unit values on the diagonal or `Diagonal::kUnit` (132) for unit values on the diagonal.
*`const size_t n`: Integer size argument.
*`const cl_mem ap_buffer`: OpenCL buffer to store the input AP matrix.
*`const size_t ap_offset`: The offset in elements from the start of the input AP matrix.
*`cl_mem x_buffer`: OpenCL buffer to store the output x vector.
*`const size_t x_offset`: The offset in elements from the start of the output x vector.
*`const size_t x_inc`: Stride/increment of the output x vector.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const size_t m`: Integer size argument.
*`const size_t n`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`const cl_mem y_buffer`: OpenCL buffer to store the input y vector.
*`const size_t y_offset`: The offset in elements from the start of the input y vector.
*`const size_t y_inc`: Stride/increment of the input y vector.
*`cl_mem a_buffer`: OpenCL buffer to store the output A matrix.
*`const size_t a_offset`: The offset in elements from the start of the output A matrix.
*`const size_t a_ld`: Leading dimension of the output A matrix.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const size_t m`: Integer size argument.
*`const size_t n`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`const cl_mem y_buffer`: OpenCL buffer to store the input y vector.
*`const size_t y_offset`: The offset in elements from the start of the input y vector.
*`const size_t y_inc`: Stride/increment of the input y vector.
*`cl_mem a_buffer`: OpenCL buffer to store the output A matrix.
*`const size_t a_offset`: The offset in elements from the start of the output A matrix.
*`const size_t a_ld`: Leading dimension of the output A matrix.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
xGERC: General rank-1 complex conjugated matrix update
-------------
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const size_t m`: Integer size argument.
*`const size_t n`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`const cl_mem y_buffer`: OpenCL buffer to store the input y vector.
*`const size_t y_offset`: The offset in elements from the start of the input y vector.
*`const size_t y_inc`: Stride/increment of the input y vector.
*`cl_mem a_buffer`: OpenCL buffer to store the output A matrix.
*`const size_t a_offset`: The offset in elements from the start of the output A matrix.
*`const size_t a_ld`: Leading dimension of the output A matrix.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const size_t n`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`cl_mem a_buffer`: OpenCL buffer to store the output A matrix.
*`const size_t a_offset`: The offset in elements from the start of the output A matrix.
*`const size_t a_ld`: Leading dimension of the output A matrix.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const size_t n`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`cl_mem ap_buffer`: OpenCL buffer to store the output AP matrix.
*`const size_t ap_offset`: The offset in elements from the start of the output AP matrix.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const size_t n`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`const cl_mem y_buffer`: OpenCL buffer to store the input y vector.
*`const size_t y_offset`: The offset in elements from the start of the input y vector.
*`const size_t y_inc`: Stride/increment of the input y vector.
*`cl_mem a_buffer`: OpenCL buffer to store the output A matrix.
*`const size_t a_offset`: The offset in elements from the start of the output A matrix.
*`const size_t a_ld`: Leading dimension of the output A matrix.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const size_t n`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`const cl_mem y_buffer`: OpenCL buffer to store the input y vector.
*`const size_t y_offset`: The offset in elements from the start of the input y vector.
*`const size_t y_inc`: Stride/increment of the input y vector.
*`cl_mem ap_buffer`: OpenCL buffer to store the output AP matrix.
*`const size_t ap_offset`: The offset in elements from the start of the output AP matrix.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const size_t n`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`cl_mem a_buffer`: OpenCL buffer to store the output A matrix.
*`const size_t a_offset`: The offset in elements from the start of the output A matrix.
*`const size_t a_ld`: Leading dimension of the output A matrix.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const size_t n`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`cl_mem ap_buffer`: OpenCL buffer to store the output AP matrix.
*`const size_t ap_offset`: The offset in elements from the start of the output AP matrix.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const size_t n`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`const cl_mem y_buffer`: OpenCL buffer to store the input y vector.
*`const size_t y_offset`: The offset in elements from the start of the input y vector.
*`const size_t y_inc`: Stride/increment of the input y vector.
*`cl_mem a_buffer`: OpenCL buffer to store the output A matrix.
*`const size_t a_offset`: The offset in elements from the start of the output A matrix.
*`const size_t a_ld`: Leading dimension of the output A matrix.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const size_t n`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem x_buffer`: OpenCL buffer to store the input x vector.
*`const size_t x_offset`: The offset in elements from the start of the input x vector.
*`const size_t x_inc`: Stride/increment of the input x vector.
*`const cl_mem y_buffer`: OpenCL buffer to store the input y vector.
*`const size_t y_offset`: The offset in elements from the start of the input y vector.
*`const size_t y_inc`: Stride/increment of the input y vector.
*`cl_mem ap_buffer`: OpenCL buffer to store the output AP matrix.
*`const size_t ap_offset`: The offset in elements from the start of the output AP matrix.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
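
The AP matrix has no leading dimension because it holds only one triangle, stored contiguously. As a rough illustration, the sketch below computes where element `(i, j)` would sit, assuming the standard BLAS packed format in column-major order. The `Triangle` enum is a local stand-in that mirrors the values listed above, and `PackedIndex` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Local stand-in mirroring the documented enum values.
enum class Triangle { kUpper = 121, kLower = 122 };

// Hypothetical helper: zero-based position of element (i, j) of an n-by-n
// triangular matrix in packed storage.
// Assumption: standard BLAS packed format, column-major, no offset applied.
inline std::size_t PackedIndex(const Triangle triangle, const std::size_t n,
                               const std::size_t i, const std::size_t j) {
  assert(i < n && j < n);
  if (triangle == Triangle::kUpper) {
    assert(i <= j);  // only the upper triangle is stored
    return i + j * (j + 1) / 2;
  }
  assert(i >= j);  // only the lower triangle is stored
  return i + j * (2 * n - j - 1) / 2;
}
```

Either triangle occupies `n * (n + 1) / 2` elements, so the buffer needs at least that many elements beyond `ap_offset`.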
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Transpose`: Transposing the input matrix A, either `Transpose::kNo` (111), `Transpose::kYes` (112), or `Transpose::kConjugate` (113) for a complex-conjugate transpose.
*`const Transpose`: Transposing the input matrix B, either `Transpose::kNo` (111), `Transpose::kYes` (112), or `Transpose::kConjugate` (113) for a complex-conjugate transpose.
*`const size_t m`: Integer size argument.
*`const size_t n`: Integer size argument.
*`const size_t k`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem a_buffer`: OpenCL buffer to store the input A matrix.
*`const size_t a_offset`: The offset in elements from the start of the input A matrix.
*`const size_t a_ld`: Leading dimension of the input A matrix.
*`const cl_mem b_buffer`: OpenCL buffer to store the input B matrix.
*`const size_t b_offset`: The offset in elements from the start of the input B matrix.
*`const size_t b_ld`: Leading dimension of the input B matrix.
*`const T beta`: Input scalar constant.
*`cl_mem c_buffer`: OpenCL buffer to store the output C matrix.
*`const size_t c_offset`: The offset in elements from the start of the output C matrix.
*`const size_t c_ld`: Leading dimension of the output C matrix.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
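
The layout, offset, and leading-dimension arguments together determine where a matrix element lives inside its buffer. The sketch below shows one way to compute that position on the host, assuming the conventional BLAS addressing rules. The `Layout` enum is a local stand-in mirroring the documented values, and `ElementIndex` is a hypothetical helper.

```cpp
#include <cstddef>

// Local stand-in mirroring the documented enum values.
enum class Layout { kRowMajor = 101, kColMajor = 102 };

// Hypothetical helper: buffer index of matrix element (row, col).
// Assumption: the leading dimension is the distance in elements between
// consecutive rows (row-major) or consecutive columns (column-major).
inline std::size_t ElementIndex(const Layout layout,
                                const std::size_t row, const std::size_t col,
                                const std::size_t offset, const std::size_t ld) {
  if (layout == Layout::kRowMajor) {
    return offset + row * ld + col;
  }
  return offset + col * ld + row;
}
```

A leading dimension larger than the row or column length lets a routine operate on a sub-matrix of a larger allocation.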
xSYMM: Symmetric matrix-matrix multiplication
-------------
C++ API:
```
template <typename T>
StatusCode Symm(const Layout layout, const Side side, const Triangle triangle,
```
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Side`: The horizontal position of the triangular matrix, either `Side::kLeft` (141) or `Side::kRight` (142).
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const size_t m`: Integer size argument.
*`const size_t n`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem a_buffer`: OpenCL buffer to store the input A matrix.
*`const size_t a_offset`: The offset in elements from the start of the input A matrix.
*`const size_t a_ld`: Leading dimension of the input A matrix.
*`const cl_mem b_buffer`: OpenCL buffer to store the input B matrix.
*`const size_t b_offset`: The offset in elements from the start of the input B matrix.
*`const size_t b_ld`: Leading dimension of the input B matrix.
*`const T beta`: Input scalar constant.
*`cl_mem c_buffer`: OpenCL buffer to store the output C matrix.
*`const size_t c_offset`: The offset in elements from the start of the output C matrix.
*`const size_t c_ld`: Leading dimension of the output C matrix.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
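
To make the role of these arguments concrete, here is a plain host-side reference of a symmetric matrix-matrix product, `C = alpha * A * B + beta * C`. It is a sketch under simplifying assumptions (column-major layout, A on the left side, zero offsets), not the library's implementation. `SymmetricAt` and `ReferenceSymmLeft` are hypothetical names, and the `Triangle` enum is a local stand-in.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local stand-in mirroring the documented enum values.
enum class Triangle { kUpper = 121, kLower = 122 };

// Reads A(i, j) of a symmetric matrix while touching only the stored triangle.
template <typename T>
T SymmetricAt(const std::vector<T>& a, const Triangle triangle,
              const std::size_t a_ld, const std::size_t i, const std::size_t j) {
  const bool stored = (triangle == Triangle::kUpper) ? (i <= j) : (i >= j);
  return stored ? a[j * a_ld + i] : a[i * a_ld + j];  // mirror if not stored
}

// Reference only: C = alpha * A * B + beta * C, with A m-by-m symmetric and
// B, C m-by-n. Assumes column-major data, A on the left, and zero offsets.
template <typename T>
void ReferenceSymmLeft(const Triangle triangle,
                       const std::size_t m, const std::size_t n, const T alpha,
                       const std::vector<T>& a, const std::size_t a_ld,
                       const std::vector<T>& b, const std::size_t b_ld,
                       const T beta, std::vector<T>& c, const std::size_t c_ld) {
  assert(a_ld >= m && b_ld >= m && c_ld >= m);
  for (std::size_t j = 0; j < n; ++j) {
    for (std::size_t i = 0; i < m; ++i) {
      T acc = T{0};
      for (std::size_t l = 0; l < m; ++l) {
        acc += SymmetricAt(a, triangle, a_ld, i, l) * b[j * b_ld + l];
      }
      c[j * c_ld + i] = alpha * acc + beta * c[j * c_ld + i];
    }
  }
}
```

Because only one triangle is read, the other half of the A buffer may hold arbitrary values.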
xHEMM: Hermitian matrix-matrix multiplication
-------------
C++ API:
```
template <typename T>
StatusCode Hemm(const Layout layout, const Side side, const Triangle triangle,
```
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Side`: The horizontal position of the triangular matrix, either `Side::kLeft` (141) or `Side::kRight` (142).
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const size_t m`: Integer size argument.
*`const size_t n`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem a_buffer`: OpenCL buffer to store the input A matrix.
*`const size_t a_offset`: The offset in elements from the start of the input A matrix.
*`const size_t a_ld`: Leading dimension of the input A matrix.
*`const cl_mem b_buffer`: OpenCL buffer to store the input B matrix.
*`const size_t b_offset`: The offset in elements from the start of the input B matrix.
*`const size_t b_ld`: Leading dimension of the input B matrix.
*`const T beta`: Input scalar constant.
*`cl_mem c_buffer`: OpenCL buffer to store the output C matrix.
*`const size_t c_offset`: The offset in elements from the start of the output C matrix.
*`const size_t c_ld`: Leading dimension of the output C matrix.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
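
The Hermitian case differs from the symmetric one in that the element mirrored across the diagonal is the complex conjugate of the stored one. A minimal sketch of that lookup follows, assuming column-major data and a zero offset. `HermitianAt` is a hypothetical helper and the `Triangle` enum is a local stand-in.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Local stand-in mirroring the documented enum values.
enum class Triangle { kUpper = 121, kLower = 122 };

// Hypothetical helper: reads A(i, j) of a Hermitian matrix from the stored
// triangle only. Elements outside the stored triangle are obtained as the
// conjugate of their mirror image. Assumes column-major data, zero offset.
template <typename T>
std::complex<T> HermitianAt(const std::vector<std::complex<T>>& a,
                            const Triangle triangle, const std::size_t a_ld,
                            const std::size_t i, const std::size_t j) {
  const bool stored = (triangle == Triangle::kUpper) ? (i <= j) : (i >= j);
  return stored ? a[j * a_ld + i] : std::conj(a[i * a_ld + j]);
}
```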
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const Transpose`: Transposing the input matrix A, either `Transpose::kNo` (111), `Transpose::kYes` (112), or `Transpose::kConjugate` (113) for a complex-conjugate transpose.
*`const size_t n`: Integer size argument.
*`const size_t k`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem a_buffer`: OpenCL buffer to store the input A matrix.
*`const size_t a_offset`: The offset in elements from the start of the input A matrix.
*`const size_t a_ld`: Leading dimension of the input A matrix.
*`const T beta`: Input scalar constant.
*`cl_mem c_buffer`: OpenCL buffer to store the output C matrix.
*`const size_t c_offset`: The offset in elements from the start of the output C matrix.
*`const size_t c_ld`: Leading dimension of the output C matrix.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const Transpose`: Transposing the input matrix A, either `Transpose::kNo` (111), `Transpose::kYes` (112), or `Transpose::kConjugate` (113) for a complex-conjugate transpose.
*`const size_t n`: Integer size argument.
*`const size_t k`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem a_buffer`: OpenCL buffer to store the input A matrix.
*`const size_t a_offset`: The offset in elements from the start of the input A matrix.
*`const size_t a_ld`: Leading dimension of the input A matrix.
*`const T beta`: Input scalar constant.
*`cl_mem c_buffer`: OpenCL buffer to store the output C matrix.
*`const size_t c_offset`: The offset in elements from the start of the output C matrix.
*`const size_t c_ld`: Leading dimension of the output C matrix.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const Transpose`: Transposing the input matrices A and B, either `Transpose::kNo` (111), `Transpose::kYes` (112), or `Transpose::kConjugate` (113) for a complex-conjugate transpose.
*`const size_t n`: Integer size argument.
*`const size_t k`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem a_buffer`: OpenCL buffer to store the input A matrix.
*`const size_t a_offset`: The offset in elements from the start of the input A matrix.
*`const size_t a_ld`: Leading dimension of the input A matrix.
*`const cl_mem b_buffer`: OpenCL buffer to store the input B matrix.
*`const size_t b_offset`: The offset in elements from the start of the input B matrix.
*`const size_t b_ld`: Leading dimension of the input B matrix.
*`const T beta`: Input scalar constant.
*`cl_mem c_buffer`: OpenCL buffer to store the output C matrix.
*`const size_t c_offset`: The offset in elements from the start of the output C matrix.
*`const size_t c_ld`: Leading dimension of the output C matrix.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const Transpose`: Transposing the input matrices A and B, either `Transpose::kNo` (111), `Transpose::kYes` (112), or `Transpose::kConjugate` (113) for a complex-conjugate transpose.
*`const size_t n`: Integer size argument.
*`const size_t k`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem a_buffer`: OpenCL buffer to store the input A matrix.
*`const size_t a_offset`: The offset in elements from the start of the input A matrix.
*`const size_t a_ld`: Leading dimension of the input A matrix.
*`const cl_mem b_buffer`: OpenCL buffer to store the input B matrix.
*`const size_t b_offset`: The offset in elements from the start of the input B matrix.
*`const size_t b_ld`: Leading dimension of the input B matrix.
*`const U beta`: Input scalar constant.
*`cl_mem c_buffer`: OpenCL buffer to store the output C matrix.
*`const size_t c_offset`: The offset in elements from the start of the output C matrix.
*`const size_t c_ld`: Leading dimension of the output C matrix.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
*`const Layout`: Data-layout of the matrices, either `Layout::kRowMajor` (101) for row-major layout or `Layout::kColMajor` (102) for column-major data-layout.
*`const Side`: The horizontal position of the triangular matrix, either `Side::kLeft` (141) or `Side::kRight` (142).
*`const Triangle`: The vertical position of the triangular matrix, either `Triangle::kUpper` (121) or `Triangle::kLower` (122).
*`const Transpose`: Transposing the input matrix A, either `Transpose::kNo` (111), `Transpose::kYes` (112), or `Transpose::kConjugate` (113) for a complex-conjugate transpose.
*`const Diagonal`: The property of the diagonal of the triangular matrix, either `Diagonal::kNonUnit` (131) for non-unit values on the diagonal or `Diagonal::kUnit` (132) for unit values on the diagonal.
*`const size_t m`: Integer size argument.
*`const size_t n`: Integer size argument.
*`const T alpha`: Input scalar constant.
*`const cl_mem a_buffer`: OpenCL buffer to store the input A matrix.
*`const size_t a_offset`: The offset in elements from the start of the input A matrix.
*`const size_t a_ld`: Leading dimension of the input A matrix.
*`cl_mem b_buffer`: OpenCL buffer to store the output B matrix.
*`const size_t b_offset`: The offset in elements from the start of the output B matrix.
*`const size_t b_ld`: Leading dimension of the output B matrix.
*`cl_command_queue* queue`: Pointer to an OpenCL command queue associated with a context and device to execute the routine on.
*`cl_event* event`: Pointer to an OpenCL event to be able to wait for completion of the routine's OpenCL kernel(s). This is an optional argument.
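
The `Triangle` and `Diagonal` arguments together decide which stored values of A are actually used. The sketch below models that selection on the host, assuming column-major data and a zero offset. `TriangularAt` is a hypothetical helper, and both enums are local stand-ins mirroring the documented values.

```cpp
#include <cstddef>
#include <vector>

// Local stand-ins mirroring the documented enum values.
enum class Triangle { kUpper = 121, kLower = 122 };
enum class Diagonal { kNonUnit = 131, kUnit = 132 };

// Hypothetical helper: effective value of A(i, j) for a triangular matrix.
// With Diagonal::kUnit the diagonal is taken to be 1 and the stored diagonal
// values are ignored; entries outside the chosen triangle count as 0.
// Assumes column-major data and a zero offset.
template <typename T>
T TriangularAt(const std::vector<T>& a, const Triangle triangle,
               const Diagonal diagonal, const std::size_t a_ld,
               const std::size_t i, const std::size_t j) {
  if (i == j && diagonal == Diagonal::kUnit) {
    return T{1};
  }
  const bool inside = (triangle == Triangle::kUpper) ? (i <= j) : (i >= j);
  return inside ? a[j * a_ld + i] : T{0};
}
```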