Cedric Nugteren | ef6b1207df | Updated HERK and SYRK to follow the GEMM style and functions to make it work with the new kernel | 2018-04-17 21:13:28 +02:00
Cedric Nugteren | 93610a9cba | Fixed some failing tests for GEMM and batched GEMM routines | 2018-04-15 12:53:32 +02:00
Cedric Nugteren | f14e6f87d2 | Updated tuning results for the Skylake ULT GT2 GPU with the new kernel | 2018-04-15 11:45:45 +02:00
Cedric Nugteren | 0dff7f1ac4 | Made GEMM rotation expectations kernel-specific | 2018-04-13 22:27:11 +02:00
Cedric Nugteren | 0f49dd24e5 | Updated database with defaults of GEMMK=0 and KREG=1 | 2018-04-10 21:26:18 +02:00
Cedric Nugteren | f6a48f05ed | Made it possible to add tuning parameters to the database using the script | 2018-04-10 21:24:36 +02:00
Cedric Nugteren | 3fbbb81137 | Fixed a bug in the compression part of the database script | 2018-04-10 21:18:11 +02:00
Cedric Nugteren | 77ba11f686 | Extended the maximum number of tuning parameters from 14 to 16 | 2018-04-08 18:12:54 +02:00
Cedric Nugteren | a93fec1026 | Fixed issues with the pre-processor | 2018-04-08 18:02:44 +02:00
Cedric Nugteren | 7cbc6b7495 | Merge branch 'master' into CLBlast-228-2d-register-gemm-kernel | 2018-04-07 17:51:40 +02:00
Cedric Nugteren | 16f7f49683 | Added tuning results for NVIDIA GeForce 970 | 2018-04-07 17:48:25 +02:00
Cedric Nugteren | 9596e46d01 | Added tuning results for NVIDIA GeForce 920MX | 2018-04-07 17:44:32 +02:00
Cedric Nugteren | cf7965dc68 | Fixed a python3 import error issue with the database script | 2018-04-07 17:40:43 +02:00
Cedric Nugteren | 048fe90e57 | Added tuning results for Intel HD Graphics 620 | 2018-04-07 17:33:57 +02:00
Cedric Nugteren | 3519d32ac4 | Extended the GEMM tuner to be able to tune the new 'kernel 1' | 2018-04-07 17:05:44 +02:00
Cedric Nugteren | 381f1fe67a | Fixed a compilation issue for complex datatypes and vload | 2018-04-07 16:57:36 +02:00
Cedric Nugteren | 2a29dc061c | Fixed a compilation issue for complex datatypes and vload | 2018-04-06 21:06:13 +02:00
Cedric Nugteren | eae25f5727 | Added first version of 2D register tiling kernel with A and C transposed as well | 2018-04-03 21:18:40 +02:00
Cedric Nugteren | 63996eb68b | Updated pyclblast to 1.1.0 and uploaded to PyPi | 2018-03-30 10:38:36 +02:00
Cedric Nugteren | 4de220a7a2 | Merge pull request #255 from kodonnell/py_override (Adding override parameters to pyclblast) | 2018-03-30 10:28:00 +02:00
Cedric Nugteren | d86ff75fa5 | Added argument checking for the GEMM tuner: expects m/n to be multiples of MWG/NWG | 2018-03-30 10:23:33 +02:00
Cedric Nugteren | 7e69c422af | Updated the roadmap | 2018-03-30 10:05:16 +02:00
Cedric Nugteren | bb0889fa7a | Merge branch 'CLBlast-227-vivante-compiler-errors' | 2018-03-30 09:22:09 +02:00
kodonell | 173a7eb928 | merged | 2018-03-27 08:55:39 +13:00
kodonell | d16f2d1317 | got the generator thing working | 2018-03-27 08:45:54 +13:00
kodonell | f07c2a29b8 | moved override_parameters example out of sgemm example | 2018-03-27 08:30:58 +13:00
kodonell | 58e70c56f1 | tidying up pyclblast override_parameters api, and added example | 2018-03-26 08:51:55 +13:00
Cedric Nugteren | 1cbe2ea301 | Removed arrays as function argument from GEMM kernels for Vivante OpenCL compiler | 2018-03-23 20:29:20 +01:00
Cedric Nugteren | a97d8a0197 | Merge pull request #269 from CNugteren/CLBlast-266-local-mem-constraint (CLBlast #266 local mem constraint) | 2018-03-22 22:42:33 +01:00
Cedric Nugteren | 9fb6550dd0 | Added the OpenCL local memory size constraint to the tuners | 2018-03-22 21:01:02 +01:00
Cedric Nugteren | 7a2371213b | Re-added support for local memory size constraint checking in the tuner | 2018-03-21 22:58:37 +01:00
Cedric Nugteren | 52791bf355 | Fixed a failing TRSM test using a CPU with Apple OpenCL | 2018-03-15 21:09:52 +01:00
Cedric Nugteren | 7a756cbce7 | Fixed a failing TRSV test using a CPU with Apple OpenCL | 2018-03-15 20:58:42 +01:00
Cedric Nugteren | f4d96e80c3 | Fixed breaking preprocessor test on certain platforms due to empty kernel string | 2018-03-15 20:45:41 +01:00
Cedric Nugteren | 9ff6cd7547 | Added queue-finish commands to PyCLBlast samples and tests | 2018-03-15 20:37:48 +01:00
Cedric Nugteren | 934893972e | Merge pull request #262 from CNugteren/CLBlast-237-tuning-api (CLBlast #237: Tuning API) | 2018-03-11 15:38:33 +01:00
Cedric Nugteren | bcf1208431 | Added basic tests for PyCLBlast | 2018-03-11 15:32:36 +01:00
Cedric Nugteren | 0dd1bc6f48 | Made benchmarking script also work for complex numbers | 2018-03-10 17:03:57 +01:00
Cedric Nugteren | 49b02ec194 | Added initial glossary | 2018-03-10 17:02:38 +01:00
Cedric Nugteren | 86455841d1 | Added badge for OSX-Intel-CPU builds | 2018-03-10 16:49:36 +01:00
Cedric Nugteren | 903deaf368 | Fixed an issue for DLL linking under Windows | 2018-03-10 16:45:31 +01:00
Cedric Nugteren | e7dccfa3cc | Fixed an issue for DLL linking under Windows | 2018-03-10 14:57:36 +01:00
Cedric Nugteren | 54bbc99273 | Updated the documentation for the tuner API | 2018-03-10 14:52:40 +01:00
Cedric Nugteren | 3d2ef9331b | Fixed a few things for the new tuning API | 2018-03-10 14:35:11 +01:00
Cedric Nugteren | 0bdc51e47c | Completed the API for all tuneable kernels | 2018-03-10 10:54:44 +01:00
kodonell | c6056da0c8 | ok, device id working | 2018-03-10 22:21:30 +13:00
Cedric Nugteren | 6397e61746 | Added several more tuner API functions | 2018-03-09 21:40:22 +01:00
kodonell | 54a4b871b3 | initial add of override parameters to pyclblast - cython not complaining, but segfault | 2018-03-09 15:27:33 +13:00
Cedric Nugteren | 49cc8b31ff | Fixed compilation issue in Xger tuner | 2018-03-06 20:59:23 +01:00
Cedric Nugteren | 0e1a152023 | First version of the tuning API, added interface for copy-kernel, added sample | 2018-03-06 20:52:12 +01:00