haraldwolff/llama.cpp

Kerfuffle 6e08281e58 Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843)	2023-10-29 11:31:40 -06:00
* Extend llama_kv_cache_seq_rm to allow matching any sequence
* Replace llama_kv_cache_tokens_rm with llama_kv_cache_clear
  Use llama_kv_cache_clear for cache clearing
  Change calls to llama_kv_cache_tokens_rm that want to delete by position to use llama_kv_cache_seq_rm functionality
CMakeLists.txt	cmake : install targets (#2256)	2023-07-19 10:01:11 +03:00
perplexity.cpp	Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843)	2023-10-29 11:31:40 -06:00
README.md	readme : add some recent perplexity and bpw measurements to READMES, link for k-quants (#3340)	2023-09-27 18:30:36 +03:00

README.md

perplexity

TODO

Llama 2 70B Scorechart

Quantization	Model size (GiB)	Perplexity	Delta to fp16
Q4_0	36.20	3.5550	3.61%
Q4_1	40.20	3.5125	2.37%
Q5_0	44.20	3.4744	1.26%
Q2_K	27.27	3.7339	8.82%
Q3_K_S	27.86	3.7019	7.89%
Q3_K_M	30.83	3.5932	4.72%
Q3_K_L	33.67	3.5617	3.80%
Q4_K_S	36.39	3.4852	1.57%
Q4_K_M	38.54	3.4725	1.20%
Q5_K_S	44.20	3.4483	0.50%
Q5_K_M	45.41	3.4451	0.40%
Q6_K	52.70	3.4367	0.16%
fp16	128.5	3.4313	-
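
The README does not say how the "Delta to fp16" column is derived. The sketch below assumes it is the relative perplexity increase over the fp16 baseline, expressed in percent, which appears consistent with the rows above. The function name `delta_to_fp16` and the `scorechart` dictionary are illustrative and not part of llama.cpp.

```python
# Assumption: "Delta to fp16" = (ppl_quantized / ppl_fp16 - 1) * 100.
# The README does not state this formula; it is inferred from the table.

FP16_PPL = 3.4313  # fp16 perplexity, copied from the scorechart


def delta_to_fp16(ppl: float, baseline: float = FP16_PPL) -> float:
    """Return the relative perplexity increase over the baseline, in percent."""
    return (ppl / baseline - 1.0) * 100.0


# Perplexity values copied from the scorechart (a subset, for illustration).
scorechart = {
    "Q2_K": 3.7339,
    "Q4_K_M": 3.4725,
    "Q5_K_M": 3.4451,
    "Q6_K": 3.4367,
}

for name, ppl in scorechart.items():
    # Print each quantization with its perplexity and computed delta.
    print(f"{name:8s} {ppl:.4f} {delta_to_fp16(ppl):5.2f}%")
```

Read this way, the column makes the size/quality trade-off easier to compare: Q6_K is less than half the size of fp16 for a perplexity increase of well under 1%, while Q2_K is the smallest model and shows the largest increase.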