Commit Graph

16 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)

Author SHA1 Message Date
Angus, Alexander 73f49e9b3d Updated according to feedback from CNugteren 2023-01-17 08:35:29 -08:00
Angus, Alexander 4f394608a2 implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731 2023-01-03 10:56:04 -08:00
Cedric Nugteren af6a9eedd1 Added a function to set the OpenCL kernel standard, either 1.1 or 1.2 2019-05-11 20:39:00 +02:00
Cedric Nugteren 9cbffc9b7c Changed back to cl_intel_subgroups as suggested 2019-05-08 22:01:56 +02:00
Cedric Nugteren c5a82f6978 Added a host-code check to make sure the avc_motion_estimation is available 2019-05-07 20:47:50 +02:00
Cedric Nugteren 8ac39fa331 Disabled Intel subgroup shuffling for double-precision 2018-09-15 16:53:09 +02:00
Tyler Sorensen 7709a7308b Applied feedback from Cedric from first pull request 2018-07-14 19:50:47 -04:00
Tyler Sorensen 7f2e98a140 added inline ptx to support shuffle on Nvidia GPUs 2018-07-11 15:12:22 -04:00
Cedric Nugteren 8258321a74 Now stores a shared_ptr to the Program class in the cache 2018-05-01 20:34:48 +02:00
Cedric Nugteren 2b1e0295e6 Added a define to enable subgroup shuffling if supported by the device 2018-04-24 20:41:15 +02:00
Cedric Nugteren bd540829ea Fixes for the CUDA backend of CLBlast 2017-12-24 12:10:55 +01:00
Cedric Nugteren 69f6591564 Removed all ARM Mali tuning results; re-added Mali-T760 and Mali-T628 results based on kernel pre-processor 2017-12-17 16:59:08 +01:00
Cedric Nugteren ca5dbcd2bd Made the pre-processor run by default for ARM and Qualcomm GPUs 2017-12-09 15:16:53 +01:00
Cedric Nugteren 13eb772343 Integrated pre-processor in compilation flow, default is still disabled 2017-11-30 21:32:47 +01:00
Cedric Nugteren 4e0d08c3bc Added compilation timing and better compilation error reporting 2017-11-19 16:58:13 +01:00
Cedric Nugteren f94d498a37 Moved compilation function to separate file; removed dependency of tuners of the CLBlast library 2017-11-17 20:57:46 +01:00