llama.cpp/examples/quantize
Latest commit: 4f0154b0ba by Kerfuffle, 2023-06-10 10:59:17 +03:00

llama : support requantizing models instead of only allowing quantization from 16/32bit (#1691)

* Add support for quantizing already quantized models
* Threaded dequantizing and f16 to f32 conversion
* Clean up thread blocks with spares calculation a bit
* Use std::runtime_error exceptions
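The second and third commit bullets concern how the dequantization work is divided: rows are split into equal per-thread blocks, and the leftover ("spare") rows are spread across the first few threads. Below is a minimal sketch of that kind of split; all names here are hypothetical, not the actual implementation in quantize.cpp or llama.cpp.

```cpp
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical worker: dequantize `count` rows starting at `first`.
static void dequantize_rows(size_t first, size_t count) {
    std::printf("dequantizing rows [%zu, %zu)\n", first, first + count);
}

// Split nrows across nthread workers: every thread gets `block` rows,
// and the first `spare` threads take one extra row so all rows are covered.
static void dequantize_threaded(size_t nrows, size_t nthread) {
    const size_t block = nrows / nthread;
    const size_t spare = nrows % nthread;
    std::vector<std::thread> workers;
    size_t first = 0;
    for (size_t t = 0; t < nthread; ++t) {
        const size_t count = block + (t < spare ? 1 : 0);
        if (count == 0) continue; // more threads than rows
        workers.emplace_back(dequantize_rows, first, count);
        first += count;
    }
    for (std::thread & w : workers) {
        w.join();
    }
}

int main() {
    dequantize_threaded(/*nrows=*/10, /*nthread=*/4);
    return 0;
}
```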
Files in this directory:

* CMakeLists.txt: "Add git-based build information for better issue tracking" (#1232), 2023-05-01
* quantize.cpp: "llama : support requantizing models instead of only allowing quantization from 16/32bit" (#1691), 2023-06-10
* README.md: "Overhaul the examples structure", 2023-03-25

README.md:

quantize

TODO
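The README body is still a TODO. For orientation, here is a hedged sketch of driving the library entry point that this example wraps; the llama_model_quantize_params fields shown (ftype, nthread, allow_requantize) are assumptions based on the #1691-era llama.h and should be verified against the actual header.

```cpp
#include "llama.h"
#include <cstdio>

int main() {
    // Start from the library defaults, then override what we need.
    llama_model_quantize_params params = llama_model_quantize_default_params();
    params.ftype            = LLAMA_FTYPE_MOSTLY_Q4_0; // target quantization type
    params.nthread          = 4;                       // worker threads for (de)quantization
    params.allow_requantize = true;                    // assumption: option added by #1691 to
                                                       // accept already-quantized input

    // Requantize an existing q8_0 model down to q4_0 (file names illustrative).
    if (llama_model_quantize("ggml-model-q8_0.bin", "ggml-model-q4_0.bin", &params) != 0) {
        std::fprintf(stderr, "quantization failed\n");
        return 1;
    }
    return 0;
}
```

The standalone quantize binary typically takes the input model, the output file, and the target quantization type on the command line; running it with no arguments should print the accepted types.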