Cedric Nugteren | b2248a17ae | Merge pull request #277 from CNugteren/CLBlast-257-intel-subgroups (Intel subgroup shuffling) | 2018-04-29 15:48:35 +02:00
Cedric Nugteren | 9f22bc232b | Updated the changelog | 2018-04-29 15:06:44 +02:00
Cedric Nugteren | 0022107b2a | Updated the roadmap | 2018-04-29 15:06:33 +02:00
Cedric Nugteren | 7b416c8686 | Fixed an access violation when compiled with Visual Studio upon releasing the OpenCL program | 2018-04-26 21:10:17 +02:00
Cedric Nugteren | 2965b87dda | Added Intel subgroup shuffle support to the 2D register caching GEMM kernel | 2018-04-24 21:32:42 +02:00
Cedric Nugteren | 2b1e0295e6 | Added a define to enable subgroup shuffling if supported by the device | 2018-04-24 20:41:15 +02:00
Cedric Nugteren | 5d46a3193e | Merge pull request #274 from CNugteren/CLBlast-228-2d-register-gemm-kernel (Added 2D-register-caching GEMM kernel) | 2018-04-21 21:15:44 +02:00
Cedric Nugteren | 3e3a26e0da | Fixes for the CUDA API | 2018-04-20 21:50:36 +02:00
Cedric Nugteren | 458e6717a9 | Expressed HER2K as two HERK calls | 2018-04-18 20:58:29 +02:00
Cedric Nugteren | dcce23d938 | Expressed SYR2K as two SYRK calls | 2018-04-18 20:29:28 +02:00
Cedric Nugteren | ef6b1207df | Updated HERK and SYRK to follow the GEMM style and functions to make it work with the new kernel | 2018-04-17 21:13:28 +02:00
Cedric Nugteren | 93610a9cba | Fixed some failing tests for GEMM and batched GEMM routines | 2018-04-15 12:53:32 +02:00
Cedric Nugteren | f14e6f87d2 | Updated tuning results for the Skylake ULT GT2 GPU with the new kernel | 2018-04-15 11:45:45 +02:00
Cedric Nugteren | 0dff7f1ac4 | Made GEMM rotation expectations kernel-specific | 2018-04-13 22:27:11 +02:00
Cedric Nugteren | 0f49dd24e5 | Updated database with defaults of GEMMK=0 and KREG=1 | 2018-04-10 21:26:18 +02:00
Cedric Nugteren | f6a48f05ed | Made it possible to add tuning parameters to the database using the script | 2018-04-10 21:24:36 +02:00
Cedric Nugteren | 3fbbb81137 | Fixed a bug in the compression part of the database script | 2018-04-10 21:18:11 +02:00
Cedric Nugteren | 77ba11f686 | Extended the maximum number of tuning parameters from 14 to 16 | 2018-04-08 18:12:54 +02:00
Cedric Nugteren | a93fec1026 | Fixed issues with the pre-processor | 2018-04-08 18:02:44 +02:00
Cedric Nugteren | 7cbc6b7495 | Merge branch 'master' into CLBlast-228-2d-register-gemm-kernel | 2018-04-07 17:51:40 +02:00
Cedric Nugteren | 16f7f49683 | Added tuning results for NVIDIA GeForce 970 | 2018-04-07 17:48:25 +02:00
Cedric Nugteren | 9596e46d01 | Added tuning results for NVIDIA GeForce 920MX | 2018-04-07 17:44:32 +02:00
Cedric Nugteren | cf7965dc68 | Fixed a python3 import error issue with the database script | 2018-04-07 17:40:43 +02:00
Cedric Nugteren | 048fe90e57 | Added tuning results for Intel HD Graphics 620 | 2018-04-07 17:33:57 +02:00
Cedric Nugteren | 3519d32ac4 | Extended the GEMM tuner to be able to tune the new 'kernel 1' | 2018-04-07 17:05:44 +02:00
Cedric Nugteren | 381f1fe67a | Fixed a compilation issue for complex datatypes and vload | 2018-04-07 16:57:36 +02:00
Cedric Nugteren | 2a29dc061c | Fixed a compilation issue for complex datatypes and vload | 2018-04-06 21:06:13 +02:00
Cedric Nugteren | eae25f5727 | Added first version of 2D register tiling kernel with A and C transposed as well | 2018-04-03 21:18:40 +02:00
Cedric Nugteren | 63996eb68b | Updated pyclblast to 1.1.0 and uploaded to PyPi | 2018-03-30 10:38:36 +02:00
Cedric Nugteren | 4de220a7a2 | Merge pull request #255 from kodonnell/py_override (Adding override parameters to pyclblast) | 2018-03-30 10:28:00 +02:00
Cedric Nugteren | d86ff75fa5 | Added argument checking for the GEMM tuner: expects m/n to be multiples of MWG/NWG | 2018-03-30 10:23:33 +02:00
Cedric Nugteren | 7e69c422af | Updated the roadmap | 2018-03-30 10:05:16 +02:00
Cedric Nugteren | bb0889fa7a | Merge branch 'CLBlast-227-vivante-compiler-errors' | 2018-03-30 09:22:09 +02:00
kodonell | 173a7eb928 | merged | 2018-03-27 08:55:39 +13:00
kodonell | d16f2d1317 | got the generator thing working | 2018-03-27 08:45:54 +13:00
kodonell | f07c2a29b8 | moved override_parameters example out of sgemm example | 2018-03-27 08:30:58 +13:00
kodonell | 58e70c56f1 | tidying up pyclblast override_parameters api, and added example | 2018-03-26 08:51:55 +13:00
Cedric Nugteren | 1cbe2ea301 | Removed arrays as function argument from GEMM kernels for Vivante OpenCL compiler | 2018-03-23 20:29:20 +01:00
Cedric Nugteren | a97d8a0197 | Merge pull request #269 from CNugteren/CLBlast-266-local-mem-constraint (CLBlast #266 local mem constraint) | 2018-03-22 22:42:33 +01:00
Cedric Nugteren | 9fb6550dd0 | Added the OpenCL local memory size constraint to the tuners | 2018-03-22 21:01:02 +01:00
Cedric Nugteren | 7a2371213b | Re-added support for local memory size constraint checking in the tuner | 2018-03-21 22:58:37 +01:00
Cedric Nugteren | 52791bf355 | Fixed a failing TRSM test using a CPU with Apple OpenCL | 2018-03-15 21:09:52 +01:00
Cedric Nugteren | 7a756cbce7 | Fixed a failing TRSV test using a CPU with Apple OpenCL | 2018-03-15 20:58:42 +01:00
Cedric Nugteren | f4d96e80c3 | Fixed breaking preprocessor test on certain platforms due to empty kernel string | 2018-03-15 20:45:41 +01:00
Cedric Nugteren | 9ff6cd7547 | Added queue-finish commands to PyCLBlast samples and tests | 2018-03-15 20:37:48 +01:00
Cedric Nugteren | 934893972e | Merge pull request #262 from CNugteren/CLBlast-237-tuning-api (CLBlast #237: Tuning API) | 2018-03-11 15:38:33 +01:00
Cedric Nugteren | bcf1208431 | Added basic tests for PyCLBlast | 2018-03-11 15:32:36 +01:00
Cedric Nugteren | 0dd1bc6f48 | Made benchmarking script also work for complex numbers | 2018-03-10 17:03:57 +01:00
Cedric Nugteren | 49b02ec194 | Added initial glossary | 2018-03-10 17:02:38 +01:00
Cedric Nugteren | 86455841d1 | Added badge for OSX-Intel-CPU builds | 2018-03-10 16:49:36 +01:00