Added first version of a roadmap

Cedric Nugteren 2017-10-20 18:21:31 +02:00
parent 472f90501c
commit 5fd1f2fc60
2 changed files with 13 additions and 1 deletion


@@ -10,7 +10,7 @@ CLBlast: The tuned OpenCL BLAS library
CLBlast is a modern, lightweight, performant and tunable OpenCL BLAS library written in C++11. It is designed to leverage the full performance potential of a wide variety of OpenCL devices from different vendors, including desktop and laptop GPUs, embedded GPUs, and other accelerators. CLBlast implements BLAS routines: basic linear algebra subprograms operating on vectors and matrices. See [the CLBlast website](https://cnugteren.github.io/clblast) for performance reports on various devices as well as the latest CLBlast news.
-The library is not tuned for all possible OpenCL devices: __if out-of-the-box performance is poor, please run the tuners first__. See below for a list of already tuned devices and instructions on how to tune yourself and contribute to future releases of the CLBlast library.
+The library is not tuned for all possible OpenCL devices: __if out-of-the-box performance is poor, please run the tuners first__. See below for a list of already tuned devices and instructions on how to tune yourself and contribute to future releases of the CLBlast library. See also the [CLBlast feature roadmap](ROADMAP.md) to get an indication of the future of CLBlast.
Why CLBlast and not clBLAS or cuBLAS?

ROADMAP.md

@@ -0,0 +1,12 @@
CLBlast feature roadmap
=======================
This file gives an overview of the main features planned for addition to CLBlast. For each feature, a rough, first-order indication of the development time frame is provided:

| Issue# | When | Who | What |
| -----------|-------------|-----------|---------------|
| N/A | Oct '17 | CNugteren | CUDA API for CLBlast |
| #169, #195 | Oct-Nov '17 | CNugteren | Auto-tuning the kernel selection parameter |
| #181, #201 | Nov '17 | CNugteren | Compilation for Android and testing on Qualcomm Adreno |
| #128, #205 | Nov-Dec '17 | CNugteren | Pre-processor for loop unrolling and array-to-register promotion (e.g. for ARM Mali) |
| #169 | '17 | dividiti | Problem-specific tuning parameter selection |