llama.cpp

Georgi Gerganov bcc0eb4591 llama : per-layer KV cache + quantum K cache (#4309)	2023-12-07 13:03:17 +02:00

* per-layer KV
* remove unnecessary copies
* less code duplication, offload k and v separately
* llama : offload KV cache per-layer
* llama : offload K shift tensors
* llama : offload for rest of the model arches
* llama : enable offload debug temporarily
* llama : keep the KV related layers on the device
* llama : remove mirrors, perform Device -> Host when partial offload
* common : add command-line arg to disable KV cache offloading
* llama : update session save/load
* llama : support quantum K cache (#4312)
* llama : support quantum K cache (wip)
* metal : add F32 -> Q8_0 copy kernel
* cuda : add F32 -> Q8_0 copy kernel

ggml-ci

* cuda : use mmv kernel for quantum cache ops
* llama : pass KV cache type through API
* llama : fix build

ggml-ci

* metal : add F32 -> Q4_0 copy kernel
* metal : add F32 -> Q4_1 copy kernel
* cuda : wip
* cuda : add F32 -> Q4_0 and F32 -> Q4_1 copy kernels
* llama-bench : support type_k/type_v
* metal : use mm kernel only for quantum KV cache
* cuda : add comment
* llama : remove memory_f16 and kv_f16 flags

---------

Co-authored-by: slaren <slarengh@gmail.com>

* readme : add API change notice

---------

Co-authored-by: slaren <slarengh@gmail.com>

base64.hpp	llava : expose as a shared library for downstream projects (#3613)	2023-11-07 00:36:23 +03:00
build-info.cpp.in	build : link against build info instead of compiling against it (#3879)	2023-11-02 08:50:16 +02:00
CMakeLists.txt	build : fix build info generation and cleanup Makefile (#3920)	2023-12-01 00:23:08 +02:00
common.cpp	llama : per-layer KV cache + quantum K cache (#4309)	2023-12-07 13:03:17 +02:00
common.h	llama : per-layer KV cache + quantum K cache (#4309)	2023-12-07 13:03:17 +02:00
console.cpp	check C++ code with -Wmissing-declarations (#3184)	2023-09-15 15:38:27 -04:00
console.h	gguf : new file format with flexible meta data (beta) (#2398)	2023-08-21 23:07:43 +03:00
grammar-parser.cpp	grammar-parser : fix typo (#4318)	2023-12-04 09:57:35 +02:00
grammar-parser.h	gguf : new file format with flexible meta data (beta) (#2398)	2023-08-21 23:07:43 +03:00
log.h	log : make generating separate log files optional (#3787)	2023-11-01 16:18:27 +02:00
sampling.cpp	common : fix compile warning	2023-12-06 10:41:03 +02:00
sampling.h	sampling : custom samplers order (#4285)	2023-12-05 12:05:51 +02:00
stb_image.h	examples: support LLaVA v1.5 (multimodal model) (#3436)	2023-10-12 18:23:18 +03:00
train.cpp	train : move number of gpu layers argument parsing to common/train.cpp (#4074)	2023-11-17 17:19:16 +02:00
train.h	sync : ggml (backend v2) (#3912)	2023-11-13 14:16:23 +02:00