llama.cpp/tests
Kawrakow 49662cbed3
ggml : SOTA 2-bit quants (add IQ2_XS) (#4856)
* iq2_xs: basics

* iq2_xs: this should have been in the basics

* iq2_xs: CUDA and scalar CPU works

* iq2_xs: WIP Metal

* iq2_xs: Metal now works

* iq2_xs: working, but dog slow, ARM_NEON dot product

* iq2_xs: better ARM_NEON dot product

We are now at 19.5 t/s for TG-128 and 61 t/s for PP-512 when
running on the CPU.

* iq2_xs: AVX2 dot product - 19.5 t/s

* iq2_xs: faster AVX2 dot product

21.4 t/s for TG-128, 59.2 t/s for PP-512.
The latter is 2x faster than the previous version.

* iq2_xs: had forgotten to delete iq2-data.h

* Add llama enum for IQ2_XS

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-11 21:39:39 +02:00
CMakeLists.txt cmake : fix ld warning duplicate libraries libllama.a (#4671) 2023-12-29 16:39:15 +02:00
test-backend-ops.cpp CUDA: faster softmax via shared memory + fp16 math (#4742) 2024-01-09 08:58:55 +01:00
test-c.c tests : add a C compliance test (#2848) 2023-08-30 09:20:26 +03:00
test-double-float.cpp ggml : move FP16 <-> FP32 code to ggml-impl.h (#3861) 2023-10-30 19:19:15 +02:00
test-grad0.cpp cuda : improve cuda pool efficiency using virtual memory (#4606) 2023-12-24 14:34:22 +01:00
test-grammar-parser.cpp gguf : new file format with flexible meta data (beta) (#2398) 2023-08-21 23:07:43 +03:00
test-llama-grammar.cpp gguf : new file format with flexible meta data (beta) (#2398) 2023-08-21 23:07:43 +03:00
test-opt.cpp sync : ggml (backend v2) (#3912) 2023-11-13 14:16:23 +02:00
test-quantize-fns.cpp ggml : SOTA 2-bit quants (add IQ2_XS) (#4856) 2024-01-11 21:39:39 +02:00
test-quantize-perf.cpp ggml : use ggml_row_size where possible (#4472) 2023-12-14 20:05:21 +01:00
test-rope.cpp llama : custom attention mask + parallel decoding + no context swaps (#3228) 2023-09-28 19:04:36 +03:00
test-sampling.cpp sampling : refactor init to use llama_sampling_params (#3696) 2023-10-20 21:07:23 +03:00
test-tokenizer-0-falcon.cpp Minor improvements in GPT2 tokenizer (#3567) 2023-10-10 18:59:52 +02:00
test-tokenizer-0-falcon.py ci : add flake8 to github actions (python linting) (#4129) 2023-11-20 11:35:47 +01:00
test-tokenizer-0-llama.cpp Minor improvements in GPT2 tokenizer (#3567) 2023-10-10 18:59:52 +02:00
test-tokenizer-0-llama.py ci : add flake8 to github actions (python linting) (#4129) 2023-11-20 11:35:47 +01:00
test-tokenizer-1-bpe.cpp Add more tokenizer tests (#3742) 2023-10-24 09:17:17 +02:00
test-tokenizer-1-llama.cpp Work on the BPE tokenizer (#3252) 2023-10-03 09:16:26 +02:00