Angus, Alexander | 73f49e9b3d | Updated according to feedback from CNugteren | 2023-01-17 08:35:29 -08:00
Angus, Alexander | 4f394608a2 | implemented changes to boost Adreno performance according to https://jira-dc.qualcomm.com/jira/browse/OSR-8731 | 2023-01-03 10:56:04 -08:00
Cedric Nugteren | af6a9eedd1 | Added a function to set the OpenCL kernel standard, either 1.1 or 1.2 | 2019-05-11 20:39:00 +02:00
Koichi Akabe | 032e3b0cc0 | Add kernel_mode option to im2col, col2im, and convgemm functions | 2018-11-12 10:12:07 +09:00
Koichi Akabe | 0b3d04f709 | Fix col2im implementation | 2018-10-30 14:54:55 +09:00
Cedric Nugteren | 3621639b63 | Added device-name removal code to handle POCL naming convention | 2018-07-13 21:20:27 +02:00
Cedric Nugteren | 70d0fe89c6 | Fixed a minor typo | 2018-02-11 15:31:08 +01:00
Cedric Nugteren | 9527c89c30 | Made parameter override in the clients a command-line argument and added support for multi-kernel routines | 2017-11-22 20:53:20 +01:00
Cedric Nugteren | 4bac1287f2 | Moved square-difference utility function for use in the tuners | 2017-11-13 21:10:44 +01:00
Cedric Nugteren | 7408da174c | Various fixes to make the first CUDA examples work | 2017-10-15 12:17:35 +02:00
Cedric Nugteren | 3598762029 | Moved the remaining OpenCL specific host code to the clpp11.h header where it belongs | 2017-10-08 10:29:47 +02:00
Cedric Nugteren | 76382ff6c1 | Added the new vendor-architecture-name hierarchy to the tuners as well | 2017-09-10 16:34:54 +02:00
Cedric Nugteren | 91ea7fcde2 | Introduced the notion of a device-architecture for the database and added device and architecture name mappings | 2017-09-08 21:09:05 +02:00
Cedric Nugteren | 844e68853e | Moved some utility functions to a test-specific utility compilation-unit | 2017-08-12 15:38:17 +02:00
Cedric Nugteren | fb6c78ea07 | Added a special override database for the Apple CPU implementation on OS X: this makes the test work, it does not focus on good performance | 2017-04-07 07:37:30 +02:00
Cedric Nugteren | b84d2296b8 | Separated host-device and device-host memory copies from execution of the CBLAS reference code; for fair timing and code de-duplication | 2017-04-01 13:36:24 +02:00
Cedric Nugteren | 92a657290a | Fixed a small compilation bug for MSVC related to a floating-point constant | 2017-03-10 20:30:10 +01:00
Cedric Nugteren | 7f14b11f1e | Changed the way the test-data is generated: now using a single MT generator and distribution for all data | 2017-03-05 11:13:47 +01:00
Cedric Nugteren | e993ee077b | Added a proper data-preparation function for the TRSM tests | 2017-03-04 15:21:33 +01:00
Cedric Nugteren | e47d95887c | Added PrepareData function for TRSM to create proper test input | 2017-02-25 12:23:04 +01:00
Cedric Nugteren | 133ebfc834 | Added data-preparation function for the TRSV tests and special nan/inf checks in the error checking to make the tests pass | 2017-02-19 17:43:26 +01:00
Cedric Nugteren | a2c0a9c551 | Set number of decimals for floating-point printing for error reporting | 2017-01-20 11:13:44 +01:00
Cedric Nugteren | df9a77d74d | Added first version of the TRSM routine based on the diagonal invert kernel | 2017-01-18 21:29:59 +01:00
Cedric Nugteren | 4b3ffd9989 | Added a first version of the diagonal block invert routine in preparation of TRSM | 2017-01-15 17:30:00 +01:00
Cedric Nugteren | 39c49bf4f9 | Made it possible to use the command-line environmental variables for each executable and without re-running CMake | 2016-11-27 11:00:29 +01:00
Cedric Nugteren | 729862e873 | Fixed some issues preventing the Netlib CBLAS API from linking correctly | 2016-10-25 19:56:42 +02:00
Cedric Nugteren | b0ff11acf0 | Moved files around a bit; created a utilities subfolder | 2016-10-22 15:36:48 +02:00