mirror of
https://github.com/CNugteren/CLBlast.git
synced 2024-08-24 14:02:28 +02:00
246 lines
10 KiB
Plaintext
246 lines
10 KiB
Plaintext
# ########################################################################
|
||
# Copyright 2013 Advanced Micro Devices, Inc.
|
||
#
|
||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||
# you may not use this file except in compliance with the License.
|
||
# You may obtain a copy of the License at
|
||
#
|
||
# http://www.apache.org/licenses/LICENSE-2.0
|
||
#
|
||
# Unless required by applicable law or agreed to in writing, software
|
||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||
# See the License for the specific language governing permissions and
|
||
# limitations under the License.
|
||
# ########################################################################
|
||
|
||
clBLAS Readme
|
||
|
||
Version: 1.10
|
||
Release Date: April 2013
|
||
|
||
ChangeLog:
|
||
____________
|
||
Current Version:
|
||
New:
|
||
* New Level 1 routines added (an 'x' implies all 4 precisions)
|
||
xSWAP, xCOPY, xSCAL, CSSCAL, ZDSCAL, xAXPY, SDOT, DDOT,
|
||
CDOTU, ZDOTU, CDOTC, ZDOTC, xROTG, SROTMG, DROTMG,
|
||
SROT, DROT, CSROT, ZDROT, SROTM, DROTM, SNRM2, DNRM2,
|
||
SCNRM2, DZNRM2, ixAMAX, SASUM, DASUM, SCASUM, DZASUM
|
||
* Samples have been added for the new functions
|
||
* This release tested using the 9.012 runtime driver and the 2.8 APPSDK
|
||
Fixed:
|
||
* Failures in *trsm functions with clMAGMA tests
|
||
Known Issues:
|
||
* Failures & hangs in ztrmm, *trsv, *tpsv functions on Southern Island GPU devices
|
||
* Failures in zgemm functions on Northern Island GPU devices
|
||
* Failures & hangs are expected to be fixed in the upcoming AMD graphics driver versions.
|
||
It is strongly recommended that users keep their graphics driver versions up to date.
|
||
|
||
____________
|
||
Version 1.8.291:
|
||
Fixed:
|
||
* Failures in the following functions: ssyr2, ssyr2k, strsm, strsv, ssyrk, cher,
|
||
ctrsv, csymm, cher2, ztrmm on Southern Island GPU devices.
|
||
* Failures in the following functions: dsyr, dsyr2, dgemv, dsyrk,
|
||
dsyr2k, zsyr2k on Trinity platforms.
|
||
Known Issues:
|
||
* Failures in *trsm functions with clMAGMA tests
|
||
|
||
____________
|
||
Version 1.8.269 (Beta, clMAGMA support):
|
||
New:
|
||
* No new routines
|
||
* This release tested using the 8.961 runtime driver and the 2.6 APPSDK
|
||
|
||
Known Issues:
|
||
* The clBLASTune executable has been observed to hang on Windows. If
|
||
this happens, abort execution of the tune program; it is not required
|
||
for correct operation of the BLAS routines (as of 8.872).
|
||
* clBLAS can return invalid results on CPU devices (as
|
||
of 8.961). The CPU device is primarily a test/debug device, and GPU
|
||
devices are unaffected.
|
||
* clBLAS can return invalid results for double precision functions (dsyr,
|
||
dsyr2, dgemv, dsyrk, dsyr2k, zsyr2k) on Trinity platforms (as of
|
||
8.961).
|
||
* clBLAS can return invalid results (ssyr2, ssyr2k, strsm, strsv, ssyrk, cher,
|
||
ctrsv, csymm, cher2, ztrmm) on Southern Island GPU devices (as of 8.961).
|
||
|
||
____________
|
||
Version 1.7 (Beta, clMAGMA support):
|
||
New:
|
||
* New Level 3 routines added (an 'x' implies all 4 precisions)
|
||
CHER2K, ZHER2K
|
||
* New Level 2 routines added (an 'x' implies all 4 precisions)
|
||
xTPMV, xTPSV, SSPVM, DSPMV, CHPMV, ZHPMV, SSPR, DSPR, CHPR, ZHPR,
|
||
SSPR2, DSPR2, CHPR2, ZHPR2, xGBMV, CHBMV, ZHBMV, SSBMV, DSBMV,
|
||
xTBMV, xTBSV
|
||
* Samples have been added for the new functions, but are not fully tested
|
||
* This release tested using the 8.951 runtime driver and the 2.6 APPSDK
|
||
* Note that documentation is incomplete for the new functions
|
||
|
||
Known Issues:
|
||
* The clBLASTune executable has been observed to hang on Windows. If
|
||
this happens, abort execution of the tune program; it is not required
|
||
for correct operation of the BLAS routines (as of 8.872).
|
||
* clBLAS can return invalid results on CPU devices that support AVX (as
|
||
of 8.951). CPU devices that support up to SSE3 are unaffected. The
|
||
CPU device is primarily a test/debug device, and GPU devices are
|
||
unaffected.
|
||
* clBLAS can return invalid results for double precision functions (dsyr,
|
||
dsyr2, dgemv, dsyrk, dsyr2k, zsyr2k) on Trinity platforms (as of
|
||
8.951).
|
||
* clBLAS can return invalid results (ssyr, ssyr2, strsv, ctrsv, ssyrk,
|
||
ssyr2k, ztrmm) on Southern Island GPU devices (as of 8.951).
|
||
|
||
____________
|
||
Version 1.6:
|
||
New:
|
||
* New Level 3 routines added (an 'x' implies all 4 precisions)
|
||
CSYRK, ZSYRK, CSYR2K, ZSYR2K, CHEMM, ZHEMM, CHERK, ZHERK, xSYMM
|
||
* New Level 2 routines added (an 'x' implies all 4 precisions)
|
||
CGEMV, ZGEMV, xTRMV, xTRSV, CHEMV, ZHEMV, SGER, DGER, CGERU, ZGERU,
|
||
CGERC, ZGERC, CHER, ZHER, CHER2, ZHER2, SSYR, DSYR, SSYR2, DSYR2
|
||
* For all the original functions prior to 1.6, a new API has been introduced
|
||
with an *Ex suffix. These extended API's add new parameters that allow
|
||
users to specify an offset to a matrix argument. This allows efficient
|
||
sub-matrix indexing within a clBLAS routine without requiring expensive
|
||
sub-matrix copy operations.
|
||
* Samples have been added for the new functions
|
||
* Preview: Support for AMD Radeon<6F> HD7000 series GPUs
|
||
* This release tested using the 8.92 runtime driver and the 2.6 APP SDK
|
||
|
||
Known Issues:
|
||
* The clBLASTune executable has been observed to hang on Windows. If this
|
||
happens, abort execution of the tune program; it is not required for
|
||
correct operation of the BLAS routines (as of 8.872).
|
||
* The CPU device for clBLAS is not functioning for this release (as of
|
||
8.872). The CPU device is primarily a test/debug device, and GPU
|
||
devices are unaffected.
|
||
|
||
____________
|
||
Version 1.4:
|
||
New:
|
||
* New Level 3 routines added
|
||
SSYRK, DSYRK, SSYR2K, DSYR2K
|
||
* New Level 2 routines added
|
||
SGEMV, DGEMV, SSYMV, DSYMV
|
||
* The image support functions (clblasAddScratchImage,
|
||
clblasRemoveScratchImage) have been deprecated. Images are no
|
||
longer required for the highest performance.
|
||
* InstallShield is now used for APPML libraries. The default install
|
||
location has changed from c:\amd\clBLAS to
|
||
C:\Program Files (x86)\AMD\clBLAS. It is recommended that previous
|
||
versions of clBLAS are uninstalled first.
|
||
* Samples have been added for the new functions
|
||
* This release tested using the 8.872 runtime driver and the 2.5 APP SDK
|
||
|
||
Known Issues:
|
||
* The clBLASTune executable has been observed to hang on Windows. If this
|
||
happens, abort execution of the tune program; it is not required for
|
||
correct operation of the BLAS routines (as of 8.872).
|
||
* The CPU device for clBLAS is not functioning for this release (as of
|
||
8.872). The CPU device is primarily a test/debug device, and GPU
|
||
devices are unaffected.
|
||
|
||
|
||
____________
|
||
Version 1.2:
|
||
* The library now supports both 32- and 64-bit Windows and Linux operating
|
||
systems.
|
||
* xTRSM routines are available in 1.2.
|
||
* clBLAS routines return clBLASStatus error codes, instead of native
|
||
OpenCL error codes
|
||
|
||
Fixed:
|
||
* xTRMM routines were not properly handling implicit unit diagonal
|
||
elements and implicit off-diagonal zero values specified by the BLAS
|
||
parameters SIDE, UPLO and DIAG.
|
||
* Possible crash with CPU device on 32-bit systems.
|
||
* clblasDgemm routine return an invalid event as its last argument.
|
||
* clBLAS routines return clblasStatus error codes, instead of
|
||
native OpenCL error codes.
|
||
|
||
Known Issues:
|
||
* The clBLASTune executable has been observed to hang on Windows. If this
|
||
happens, abort execution of the tune program; it is not required for
|
||
correct operation of the BLAS routines (as of 8.872).
|
||
* The CPU device for clBLAS is not functioning for this release (as of
|
||
8.872). The CPU device is primarily a test/debug device, and GPU
|
||
devices are unaffected.
|
||
|
||
____________________
|
||
Version 1.0:
|
||
* Initial release
|
||
|
||
Known Issues:
|
||
* Available only on Linux64.
|
||
* xTRMM routines were not properly handling implicit unit diagonal elements
|
||
and implicit off-diagonal zero values specified by the BLAS parameters
|
||
SIDE, UPLO and DIAG
|
||
* clblasDgemm returned an invalid event as its last argument
|
||
|
||
_____________
|
||
Building the Samples:
|
||
|
||
To install the Linux versions of clBLAS, uncompress the initial download, then
|
||
execute the install script.
|
||
|
||
For example:
|
||
|
||
tar -xf clBLAS-${version}-Linux.tar.gz
|
||
- This installs three files into the local directory, one being an
|
||
executable bash script.
|
||
|
||
sudo mkdir /opt/clBLAS-${version}
|
||
- This pre-creates the install directory with proper permissions
|
||
in /opt if it is to be installed there. (This is the default.)
|
||
|
||
./install-clBLAS-${version}.sh
|
||
- This prints an EULA and uncompresses files into the chosen install
|
||
directory.
|
||
|
||
cd ${installDir}/bin64
|
||
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${OpenCLLibDir}:${clBLASLibDir}
|
||
- Be sure to export library dependencies to resolve all external
|
||
linkages to the client program; you can create a bash script to
|
||
help automate this procedure.
|
||
|
||
./example_sgemm
|
||
- Run a simple client; one example is provided for each supported
|
||
main BLAS function family.
|
||
|
||
The sample program does not ship with native build files; instead, a CMake
|
||
file is shipped, and the user generates a native build file for their system.
|
||
|
||
For example:
|
||
|
||
cd ${installDir}
|
||
|
||
mkdir samplesBin/
|
||
- This creates a sister directory to the samples directory that
|
||
houses the native makefiles and the generated files from the
|
||
build.
|
||
|
||
cd samplesBin/
|
||
ccmake ../samples/
|
||
- ccmake is a curses-based cmake program; it takes a parameter
|
||
that specifies the location of the source code to compile.
|
||
- Hit 'c' to configure for the platform; ensure that the
|
||
dependencies to external libraries are satisfied, including
|
||
paths to 'ATI Stream SDK'.
|
||
- After dependencies are satisfied, hit 'c' again to finalize
|
||
configuration. Then, hit 'g' to generate a makefile and
|
||
exit ccmake.
|
||
|
||
make help
|
||
- Look at the options available for make.
|
||
|
||
make
|
||
- Build the sample client program.
|
||
|
||
./example_sgemm
|
||
- Run a simple client; one example is provided for each supported main
|
||
BLAS function family.
|