Mirror of https://github.com/CNugteren/CLBlast.git, synced 2024-07-07 12:23:46 +02:00
Fixed the issue with AMD's APP compiler not being able to compile the invert kernel
parent 1511909b6f
commit ad483123e6
@@ -5,6 +5,7 @@ Development (next version)
 - Added OpenCL pre-processor to unroll loops and perform array-to-register promotions for compilers
   which don't do this themselves (ARM Mali) - greatly improves performance on these platforms
 - Added first tuners for the TRSV (block size) and TRSM (invert kernel) routines
+- Fixed an issue with a crashing/hanging AMD APP compiler with the TRSM routine (invert kernel)
 - Improved compilation time by splitting the tuning database into multiple compilation units
 - Various minor fixes and enhancements
 - Added tuned parameters for various devices (see README)