Cedric Nugteren
|
878d93e7dc
|
Implemented a batched version of the AXPY kernel
|
2017-03-08 20:36:35 +01:00 |
|
Cedric Nugteren
|
fa0a9c689f
|
Make batched routines based on offsets instead of a vector of cl_mem objects - undoing many earlier changes
|
2017-03-08 20:10:20 +01:00 |
|
Cedric Nugteren
|
6aba0bbae7
|
Minor fixes to the client w.r.t. the addition of the batch count
|
2017-03-05 16:44:16 +01:00 |
|
Cedric Nugteren
|
b114ea49a9
|
Added first naive version of the batched AXPY routine
|
2017-03-05 15:06:14 +01:00 |
|
Cedric Nugteren
|
cdf354f895
|
Adjusted the test-infrastructure to support testing of batched-versions of routines
|
2017-03-05 15:04:16 +01:00 |
|
Cedric Nugteren
|
7f14b11f1e
|
Changed the way the test-data is generated: now using a single MT generator and distribution for all data
|
2017-03-05 11:13:47 +01:00 |
|
Cedric Nugteren
|
f9a520b3af
|
Prepared generator for batched routines; added batched AXPY routine interface
|
2017-03-05 10:38:38 +01:00 |
|
Cedric Nugteren
|
37228c9098
|
Fixed a missing include for the tests
|
2017-03-04 20:45:39 +01:00 |
|
Cedric Nugteren
|
e9ef037549
|
Added tuning results for the Radeon HD6750M GPU (Apple OpenCL)
|
2017-03-04 15:24:55 +01:00 |
|
Cedric Nugteren
|
e993ee077b
|
Added a proper data-preparation function for the TRSM tests
|
2017-03-04 15:21:33 +01:00 |
|
Cedric Nugteren
|
3fc73851f7
|
Added proper support for the b_offset argument in TRSM
|
2017-03-01 21:23:33 +01:00 |
|
Cedric Nugteren
|
e8d5923d27
|
Made a double to float cast explicit for MSVC compatibility (C2397)
|
2017-03-01 20:42:06 +01:00 |
|
Cedric Nugteren
|
d6f1b5fca3
|
Added L2 error computation and checking for half-precision tests
|
2017-02-27 21:49:20 +01:00 |
|
Cedric Nugteren
|
00281dad26
|
Fixed half-precision bugs in HTBMV/HTPMV/HTRMV/HSYR2K/HTRMM related to incorrect constants
|
2017-02-27 21:00:04 +01:00 |
|
Cedric Nugteren
|
4284fcd940
|
Updated the README documentation
|
2017-02-26 16:32:53 +01:00 |
|
Cedric Nugteren
|
7de7e7d8ed
|
Merge pull request #138 from CNugteren/triangular_solvers
Added the triangular solvers (TRSV/TRSM)
|
2017-02-26 16:26:41 +01:00 |
|
Cedric Nugteren
|
e09c26c706
|
Split the GEMM kernel further up to prevent C1091 in MSVC
|
2017-02-26 15:03:12 +01:00 |
|
Cedric Nugteren
|
dde67ac79e
|
Minor fix to the generator script
|
2017-02-26 14:53:58 +01:00 |
|
Cedric Nugteren
|
ea6790665d
|
Merge branch 'development' into triangular_solvers
|
2017-02-26 14:51:45 +01:00 |
|
Cedric Nugteren
|
a145890aaa
|
Added a guard against invalid buffer sizes in the prepare-data functions for tests
|
2017-02-26 14:37:29 +01:00 |
|
Cedric Nugteren
|
df7638c305
|
Fixed an out-of-bounds memory access when filling a matrix with a constant
|
2017-02-26 14:31:05 +01:00 |
|
Cedric Nugteren
|
b7310036ed
|
Removed half-precision support from the TRSM routine; too unstable
|
2017-02-26 12:56:21 +01:00 |
|
Cedric Nugteren
|
70d8c4bad7
|
Improved the correctness tests for complex numbers in case either real or imag is much larger than the other
|
2017-02-26 10:19:53 +01:00 |
|
Cedric Nugteren
|
a433987441
|
Fixes division in the kernel for inversion of complex numbers
|
2017-02-26 10:18:45 +01:00 |
|
Cedric Nugteren
|
ccac957f17
|
Added documentation for the TRSV and TRSM routines
|
2017-02-25 13:02:15 +01:00 |
|
Cedric Nugteren
|
492ee3d0a5
|
Removed the invert routine from the tests
|
2017-02-25 12:28:13 +01:00 |
|
Cedric Nugteren
|
e47d95887c
|
Added PrepareData function for TRSM to create proper test input
|
2017-02-25 12:23:04 +01:00 |
|
Cedric Nugteren
|
2f2a510c38
|
Implemented a simple row-major to col-major problem conversion for TRSM
|
2017-02-24 21:08:44 +01:00 |
|
Cedric Nugteren
|
1e5b5157bc
|
Fixed a few issues with the TRSM routine; some tests still failing
|
2017-02-22 20:31:33 +01:00 |
|
Cedric Nugteren
|
133ebfc834
|
Added data-preparation function for the TRSV tests and special nan/inf checks in the error checking to make the tests pass
|
2017-02-19 17:43:26 +01:00 |
|
Cedric Nugteren
|
0643a29af5
|
Added tuning parameters for the AMD RX480 GPU (Ellesmere)
|
2017-02-18 13:59:10 +01:00 |
|
Cedric Nugteren
|
0ea30263ac
|
Merge pull request #137 from CNugteren/custom_parameters
API to override tuning parameters
|
2017-02-18 12:34:38 +01:00 |
|
Cedric Nugteren
|
7b2170818f
|
Changed the override-parameters test such that it is compatible with more devices
|
2017-02-18 11:22:07 +01:00 |
|
Cedric Nugteren
|
2e0951c6dc
|
Fixed small typo in the documentation
|
2017-02-18 11:05:54 +01:00 |
|
Cedric Nugteren
|
fef11a208c
|
Added documentation for the OverrideParameters function
|
2017-02-18 11:02:57 +01:00 |
|
Cedric Nugteren
|
d6538dfc25
|
Fixed the naming of the C API of OverrideParameters and fixed the description
|
2017-02-18 10:59:38 +01:00 |
|
Cedric Nugteren
|
3d10690c83
|
Added missing documentation for the fill and clear cache functions
|
2017-02-18 10:32:32 +01:00 |
|
Cedric Nugteren
|
cda449a5c3
|
Added a C interface to the OverrideParameters function; added some in-line comments to the API
|
2017-02-16 21:14:48 +01:00 |
|
Cedric Nugteren
|
08bfb75a9d
|
Added input-sanity checks for the OverrideParameters function
|
2017-02-16 21:12:50 +01:00 |
|
Cedric Nugteren
|
bdc57221bd
|
Added simple tests for the OverrideParameters function
|
2017-02-14 21:09:00 +01:00 |
|
Cedric Nugteren
|
cdb3bb7166
|
Added first version of the OverrideParameters function
|
2017-02-13 20:53:06 +01:00 |
|
Cedric Nugteren
|
00eb55a2d4
|
Fixed a small bug in GEMV: unused kernel in parameter list
|
2017-02-13 20:48:32 +01:00 |
|
Cedric Nugteren
|
345a5feb9a
|
Split the database into several smaller cached per-kernel databases (in preparation of per-kernel database overrides)
|
2017-02-12 12:02:39 +01:00 |
|
Cedric Nugteren
|
faa842b927
|
Made RemoveBySubset from the cache work with references to keys
|
2017-02-12 11:58:20 +01:00 |
|
Cedric Nugteren
|
36b942a698
|
Added an option to remove items from the caches, optionally by a subset of 2 specific key-values only
|
2017-02-11 14:05:38 +01:00 |
|
Cedric Nugteren
|
dc93523204
|
Added tuning results for Titan X (Pascal version)
|
2017-02-08 21:14:38 +01:00 |
|
Cedric Nugteren
|
c248f900c0
|
Merge branch 'development' into triangular_solvers
|
2017-02-05 22:18:59 +01:00 |
|
Cedric Nugteren
|
e7cbb5915a
|
Fixed complex version of the TRSV kernel
|
2017-02-05 14:36:31 +01:00 |
|
Cedric Nugteren
|
c209dd7af9
|
Improved substitution kernels a bit; added complex support
|
2017-02-04 22:48:06 +01:00 |
|
Cedric Nugteren
|
fec8c1a806
|
Completed a first STRSV implementation
|
2017-02-04 16:04:19 +01:00 |
|