Merge pull request #324 from CNugteren/CLBlast-315-tuning-api-improvements
Made tuning API more flexible
CLBlast-267-convgemm
commit ff7bee93d3
@@ -3,6 +3,7 @@ Development (next version)
 - Added support for shuffle instructions for NVIDIA GPUs (thanks to 'tyler-utah')
 - Added an option to compile the Netlib API with static OpenCL device and context (-DNETLIB_PERSISTENT_OPENCL=ON)
 - The tuners now check beforehand on invalid local thread sizes and skip those completely
+- Made the tuning API (OverrideParameters) more flexible, disregarding superfluous parameters
 - Fixed an issue with conjugate transpose not being executed in certain cases for a.o. XOMATCOPY
 - Fixed an issue with AMD GPUs and the new GEMMK == 1 kernel
 - Fixed an issue with the preprocessor and the new GEMMK == 1 kernel
@@ -201,7 +201,7 @@ These two functions require/retrieve the parameters as given in [src/database/ke
 | --------------------|-----------------------|
 | Xaxpy               | VW, WGS, WPT          |
 | Xdot                | WGS1, WGS2            |
-| Xgemv               | WGS1, WPT1, UNROLL1   |
+| Xgemv               | WGS1, WPT1            |
 | XgemvFast           | VW2, WGS2, WPT2       |
 | XgemvFastRot        | VW3, WGS3, WPT3       |
 | Xger                | WGS1, WGS2, WPT       |
@@ -161,7 +161,7 @@ StatusCode OverrideParameters(const RawDeviceID device, const std::string &kerne
 
   // Verifies the parameters size
   const auto current_parameter_names = current_database.GetParameterNames();
-  if (current_parameter_names.size() != parameters.size()) {
+  if (current_parameter_names.size() > parameters.size()) {
     return StatusCode::kMissingOverrideParameter;
   }
 
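For reviewers, the effect of relaxing the size check from `!=` to `>` can be illustrated with a small standalone sketch. It is not the library's code: the `StatusCode` enum below is a hypothetical stand-in, `CheckOverrideParameters` and `Parameters` are names invented for illustration, and the per-name lookup after the size check is an assumption about how missing names would be detected once extra entries are allowed.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the library's status codes (illustration only).
enum class StatusCode { kSuccess, kMissingOverrideParameter };

// Hypothetical alias: parameter name -> value, as passed by the caller.
using Parameters = std::unordered_map<std::string, std::size_t>;

// Sketch of the relaxed validation: every name the kernel requires must be
// supplied, but additional (superfluous) entries are tolerated.
StatusCode CheckOverrideParameters(const std::vector<std::string> &required_names,
                                   const Parameters &parameters) {
  // Fewer entries than required names means at least one name is missing.
  if (required_names.size() > parameters.size()) {
    return StatusCode::kMissingOverrideParameter;
  }
  // Assumed follow-up check: each required name must actually be present,
  // since a matching or larger size alone does not guarantee that.
  for (const auto &name : required_names) {
    if (parameters.find(name) == parameters.end()) {
      return StatusCode::kMissingOverrideParameter;
    }
  }
  return StatusCode::kSuccess;
}
```

Under these assumptions, a caller that still passes `UNROLL1` for `Xgemv` alongside `WGS1` and `WPT1` is accepted, whereas the old `!=` comparison would have rejected the call because of the extra entry. A call that omits `WPT1` is rejected in both versions.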