llama.cpp/examples
Willy Tarreau 35a84916fb
main: add the possibility to open the prompt cache read-only (#1640)
The prompt cache provides a nice speed-up when the same prompt prefix
is reused across multiple evaluations, but using it also updates it,
which is not always desirable. One use case is a large prompt whose
first part contains context and usage rules, and whose second part
contains the variable data of the problem being studied. In that case
it is desirable to save the first part once and always reuse it as-is,
without it being updated with the second part.

The new argument --prompt-cache-ro enables this read-only mode on the
prompt cache: the parts of the prompt that match the cache are loaded
from it, but the cache file itself is never updated. This reduced a
total analysis time here from 112s to 49.7s, without having to back up
and restore a copy of the prompt cache, which takes significant time
at 500 MB.

Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-06-06 22:10:17 -04:00
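For illustration, a minimal sketch of the workflow this flag enables, assuming the usual invocation of the main example; the model path, prompt file names, and cache file name below are hypothetical, while --prompt-cache and --prompt-cache-ro are the actual flags:

```sh
# First run: evaluate the fixed prefix (context + usage rules) once and
# save its state to the prompt cache.
./main -m models/7B/ggml-model.bin -f fixed-context.txt --prompt-cache context.bin

# Subsequent runs: the full prompt is the fixed prefix plus variable data.
# The matching prefix is loaded from the cache; with --prompt-cache-ro
# the cache file itself is never rewritten.
./main -m models/7B/ggml-model.bin -f full-prompt.txt --prompt-cache context.bin --prompt-cache-ro
```

Without the read-only flag, the second invocation would rewrite the whole cache file to include the variable suffix, which is the overhead the timing comparison above avoids.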
baby-llama ggml : implement backward pass for llama + small training-llama-from-scratch example (#1360) 2023-05-13 15:56:40 +03:00
benchmark llama : add llama_init_backend() API (close #1527) 2023-05-20 11:06:37 +03:00
embedding llama : add llama_init_backend() API (close #1527) 2023-05-20 11:06:37 +03:00
jeopardy examples : add Jeopardy example (#1168) 2023-04-28 19:13:33 +03:00
main main: add the possibility to open the prompt cache read-only (#1640) 2023-06-06 22:10:17 -04:00
metal llama : Metal inference (#1642) 2023-06-04 23:34:30 +03:00
perplexity llama : add llama_init_backend() API (close #1527) 2023-05-20 11:06:37 +03:00
quantize ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684) 2023-06-05 22:56:18 +03:00
quantize-stats ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684) 2023-06-05 22:56:18 +03:00
save-load-state Remove unused n_parts parameter (#1509) 2023-05-17 22:12:01 +00:00
server Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703) 2023-06-06 21:33:23 +02:00
alpaca.sh examples : Improve Alpaca Default Repeat Penalty: Better Match Alpaca.cpp Experience (#1107) 2023-04-22 09:54:33 +03:00
chat-13B.bat Create chat-13B.bat (#592) 2023-03-29 20:21:09 +03:00
chat-13B.sh examples : read chat prompts from a template file (#1196) 2023-05-03 20:58:11 +03:00
chat-persistent.sh chat-persistent.sh : use bracket expressions in grep (#1564) 2023-05-24 09:16:22 +03:00
chat.sh If n_predict == -1, generate forever 2023-03-25 21:51:41 +02:00
CMakeLists.txt llama : Metal inference (#1642) 2023-06-04 23:34:30 +03:00
common.cpp main: add the possibility to open the prompt cache read-only (#1640) 2023-06-06 22:10:17 -04:00
common.h main: add the possibility to open the prompt cache read-only (#1640) 2023-06-06 22:10:17 -04:00
gpt4all.sh examples : add -n to alpaca and gpt4all scripts (#706) 2023-04-13 16:03:39 +03:00
Miku.sh examples : various prompt and example fixes (#1298) 2023-05-03 18:26:47 +03:00
reason-act.sh add example of re-act pattern (#583) 2023-03-29 10:10:24 -05:00