27 Commits (6e2ab6ee967c4a9b3350c7ce4e7d7b736c9e45f6)

Author SHA1 Message Date
Angus, Alexander 73f49e9b3d Updated according to feedback from CNugteren 2023-01-17 08:35:29 -08:00
Angus, Alexander 4f394608a2 implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731 2023-01-03 10:56:04 -08:00
Cedric Nugteren af6a9eedd1 Added a function to set the OpenCL kernel standard, either 1.1 or 1.2 2019-05-11 20:39:00 +02:00
Koichi Akabe 032e3b0cc0 Add kernel_mode option to im2col, col2im, and convgemm functions 2018-11-12 10:12:07 +09:00
Koichi Akabe 0b3d04f709 Fix col2im implementation 2018-10-30 14:54:55 +09:00
Cedric Nugteren 3621639b63 Added device-name removal code to handle POCL naming convention 2018-07-13 21:20:27 +02:00
Cedric Nugteren 70d0fe89c6 Fixed a minor typo 2018-02-11 15:31:08 +01:00
Cedric Nugteren 9527c89c30 Made parameter override in the clients a command-line argument and added support for multi-kernel routines 2017-11-22 20:53:20 +01:00
Cedric Nugteren 4bac1287f2 Moved square-difference utility function for use in the tuners 2017-11-13 21:10:44 +01:00
Cedric Nugteren 7408da174c Various fixes to make the first CUDA examples work 2017-10-15 12:17:35 +02:00
Cedric Nugteren 3598762029 Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs 2017-10-08 10:29:47 +02:00
Cedric Nugteren 76382ff6c1 Added the new vendor-architecture-name hierarchy to the tuners as well 2017-09-10 16:34:54 +02:00
Cedric Nugteren 91ea7fcde2 Introduced the notion of a device-architecture for the database and added device and architecture name mappings 2017-09-08 21:09:05 +02:00
Cedric Nugteren 844e68853e Moved some utility functions to a test-specific utility compilation-unit 2017-08-12 15:38:17 +02:00
Cedric Nugteren fb6c78ea07 Added a special override database for the Apple CPU implementation on OS X: this makes the test work, it does not focus on good performance 2017-04-07 07:37:30 +02:00
Cedric Nugteren b84d2296b8 Separated host-device and device-host memory copies from execution of the CBLAS reference code; for fair timing and code de-duplication 2017-04-01 13:36:24 +02:00
Cedric Nugteren 92a657290a Fixed a small compilation bug for MSVC related to a floating-point constant 2017-03-10 20:30:10 +01:00
Cedric Nugteren 7f14b11f1e Changed the way the test-data is generated: now using a single MT generator and distribution for all data 2017-03-05 11:13:47 +01:00
Cedric Nugteren e993ee077b Added a proper data-preparation function for the TRSM tests 2017-03-04 15:21:33 +01:00
Cedric Nugteren e47d95887c Added PrepareData function for TRSM to create proper test input 2017-02-25 12:23:04 +01:00
Cedric Nugteren 133ebfc834 Added data-preparation function for the TRSV tests and special nan/inf checks in the error checking to make the tests pass 2017-02-19 17:43:26 +01:00
Cedric Nugteren a2c0a9c551 Set number of decimals for floating-point printing for error reporting 2017-01-20 11:13:44 +01:00
Cedric Nugteren df9a77d74d Added first version of the TRSM routine based on the diagonal invert kernel 2017-01-18 21:29:59 +01:00
Cedric Nugteren 4b3ffd9989 Added a first version of the diagonal block invert routine in preparation of TRSM 2017-01-15 17:30:00 +01:00
Cedric Nugteren 39c49bf4f9 Made it possible to use the command-line environmental variables for each executable and without re-running CMake 2016-11-27 11:00:29 +01:00
Cedric Nugteren 729862e873 Fixed some issues preventing the Netlib CBLAS API from linking correctly 2016-10-25 19:56:42 +02:00
Cedric Nugteren b0ff11acf0 Moved files around a bit; created a utilities subfolder 2016-10-22 15:36:48 +02:00