Cedric Nugteren | 560f7a40f6 | Added convgemm to the CLBlast database, added initial parameters for Skylake GPU | 2018-12-31 19:05:34 +01:00
Cedric Nugteren | 7c3431a72a | Fixes for Apple OpenCL CPU implementation which requires a LWGS of 1 when barriers are present | 2018-06-01 20:59:44 +02:00
Cedric Nugteren | a8bb0c9f3c | Added Apple OpenCL TRSV block size override; removed failing old Intel GPU test from README | 2018-05-29 21:29:12 +02:00
Cedric Nugteren | 0f49dd24e5 | Updated database with defaults of GEMMK=0 and KREG=1 | 2018-04-10 21:26:18 +02:00
Cedric Nugteren | 77ba11f686 | Extended the maximum number of tuning parameters from 14 to 16 | 2018-04-08 18:12:54 +02:00
Cedric Nugteren | 7a756cbce7 | Fixed a failing TRSV test using a CPU with Apple OpenCL | 2018-03-15 20:58:42 +01:00
Cedric Nugteren | 4e317f5e85 | Improved compilation time of the tuner database | 2017-09-16 18:02:37 +02:00
Cedric Nugteren | 0d13d814c2 | Added architecture layer in the tuning database for better performance on unseen devices | 2017-09-14 21:27:33 +02:00
Cedric Nugteren | 20da5e33a8 | Split the database files over multiple directories and files; first step towards separate compilation | 2017-09-06 21:50:42 +02:00
Cedric Nugteren | e44feb8576 | Changed the structure of the database to reduce compilation time and save memory | 2017-06-20 21:19:26 +02:00
Cedric Nugteren | fb6c78ea07 | Added a special override database for the Apple CPU implementation on OS X: this makes the test work, it does not focus on good performance | 2017-04-07 07:37:30 +02:00