llama.cpp/examples/perplexity

perplexity

TODO

Llama 2 70B Scorechart

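| Quantization | Model size (GiB) | Perplexity | Delta to fp16 |
|--------------|------------------|------------|---------------|
| Q4_0         | 36.20            | 3.5550     | 3.61%         |
| Q4_1         | 40.20            | 3.5125     | 2.37%         |
| Q5_0         | 44.20            | 3.4744     | 1.26%         |
| Q2_K         | 27.27            | 3.7339     | 8.82%         |
| Q3_K_S       | 27.86            | 3.7019     | 7.89%         |
| Q3_K_M       | 30.83            | 3.5932     | 4.72%         |
| Q3_K_L       | 33.67            | 3.5617     | 3.80%         |
| Q4_K_S       | 36.39            | 3.4852     | 1.57%         |
| Q4_K_M       | 38.54            | 3.4725     | 1.20%         |
| Q5_K_S       | 44.20            | 3.4483     | 0.50%         |
| Q5_K_M       | 45.41            | 3.4451     | 0.40%         |
| Q6_K         | 52.70            | 3.4367     | 0.16%         |
| fp16         | 128.5            | 3.4313     | -             |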
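
The "Delta to fp16" column appears to be the relative increase in perplexity over the fp16 baseline, expressed in percent. The README does not state the formula, so this is an assumption, though the tabulated values match it to within rounding. A minimal Python sketch that recomputes the column from the perplexities above (the names `FP16_PPL`, `PERPLEXITY`, and `delta_to_fp16` are illustrative and not part of llama.cpp):

```python
# Perplexity values copied from the scorechart above.
FP16_PPL = 3.4313

PERPLEXITY = {
    "Q4_0": 3.5550,
    "Q4_1": 3.5125,
    "Q5_0": 3.4744,
    "Q2_K": 3.7339,
    "Q3_K_S": 3.7019,
    "Q3_K_M": 3.5932,
    "Q3_K_L": 3.5617,
    "Q4_K_S": 3.4852,
    "Q4_K_M": 3.4725,
    "Q5_K_S": 3.4483,
    "Q5_K_M": 3.4451,
    "Q6_K": 3.4367,
}


def delta_to_fp16(ppl: float, baseline: float = FP16_PPL) -> float:
    """Relative perplexity increase over the baseline, in percent.

    Assumed formula: (ppl - baseline) / baseline * 100.
    """
    return (ppl - baseline) / baseline * 100.0


if __name__ == "__main__":
    for name, ppl in PERPLEXITY.items():
        print(f"{name:8s} {ppl:.4f} {delta_to_fp16(ppl):5.2f}%")
```

Read this way, a lower delta means the quantized model stays closer to fp16 quality, so the table can be used to trade model size against perplexity loss.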