llama.cpp


Nam D. Tran f6793491b5 llama : add AWQ for llama, llama2, mpt, and mistral models (#4593)	2023-12-27 17:39:45 +02:00
* update: awq support llama-7b model
* update: change order
* update: benchmark results for llama2-7b
* update: mistral 7b v1 benchmark
* update: support 4 models
* fix: Readme
* update: ready for PR
* update: readme
* fix: readme
* update: change order import
* black
* format code
* update: work for bot mpt and awqmpt
* update: readme
* Rename to llm_build_ffn_mpt_awq
* Formatted other files
* Fixed params count
* fix: remove code
* update: more detail for mpt
* fix: readme
* fix: readme
* update: change folder architecture
* fix: common.cpp
* fix: readme
* fix: remove ggml_repeat
* update: cicd
* update: cicd
* uppdate: remove use_awq arg
* update: readme
* llama : adapt plamo to new ffn ggml-ci
Co-authored-by: Trần Đức Nam <v.namtd12@vinai.io>
Co-authored-by: Le Hoang Anh <v.anhlh33@vinai.io>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
examples	gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981)	2023-11-11 08:04:50 +03:00
gguf	llama : add AWQ for llama, llama2, mpt, and mistral models (#4593)	2023-12-27 17:39:45 +02:00
scripts	Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040)	2023-11-16 19:14:37 -07:00
tests	gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981)	2023-11-11 08:04:50 +03:00
LICENSE	gguf : make gguf pip-installable	2023-08-25 09:26:05 +03:00
pyproject.toml	llama : add Mixtral support (#4406)	2023-12-13 14:04:25 +02:00
README.md	gguf-py : fix broken link	2023-12-21 23:20:36 +02:00

README.md

gguf

This is a Python package for writing binary files in the GGUF (GGML Universal File) format.

See convert-llama-hf-to-gguf.py for an example of its usage.

Installation

pip install gguf

API Examples/Simple Tools

examples/writer.py — Generates example.gguf in the current directory to demonstrate generating a GGUF file. Note that this file cannot be used as a model.

scripts/gguf-dump.py — Dumps a GGUF file's metadata to the console.
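To give a feel for what "metadata" means at the lowest level, here is a minimal sketch that parses only the fixed-size header at the start of a GGUF file, using nothing but the standard library. It is not how scripts/gguf-dump.py is implemented. The function name `read_gguf_header` is hypothetical, the layout assumed is GGUF version 2 or later (4-byte magic, 32-bit version, 64-bit tensor count, 64-bit key/value count), and the byte-order check is a heuristic.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (KV count)


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size header from the first bytes of a GGUF file.

    Assumes the version 2+ layout; version 1 used 32-bit counts.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError(f"need at least {HEADER_SIZE} bytes, got {len(data)}")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("bad magic: this does not look like a GGUF file")

    # Read the version as little-endian first. Real version numbers are
    # small, so if the low 16 bits are zero the file is most likely
    # big-endian (heuristic, labeled as an assumption).
    byte_order = "<"
    (version,) = struct.unpack_from("<I", data, 4)
    if version & 0xFFFF == 0:
        byte_order = ">"
        (version,) = struct.unpack_from(">I", data, 4)

    if version < 2:
        raise ValueError("version 1 headers use a different layout")

    tensor_count, kv_count = struct.unpack_from(byte_order + "QQ", data, 8)
    return {
        "version": version,
        "byte_order": "little" if byte_order == "<" else "big",
        "tensor_count": tensor_count,
        "kv_count": kv_count,
    }


if __name__ == "__main__":
    # Synthetic header: version 3, 2 tensors, 5 key/value pairs.
    sample = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5)
    print(read_gguf_header(sample))
```

For a real file, the first 24 bytes can be read with `open(path, "rb").read(24)` and passed to the function. The key/value pairs and tensor descriptions that follow the header are what the dump script prints in full.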

scripts/gguf-set-metadata.py — Allows changing simple metadata values in a GGUF file by key.

scripts/gguf-convert-endian.py — Allows converting the endianness of GGUF files.
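The core idea behind an endianness conversion can be sketched on a flat array of fixed-width values: decode each value using the source byte order, then re-encode it using the target byte order. The real script has to do this for every header field, metadata value, and tensor in the file. The helper below, with its hypothetical name `swap_endianness`, only illustrates the principle with the standard `struct` module.

```python
import struct


def swap_endianness(raw: bytes, fmt: str = "I", src: str = "<", dst: str = ">") -> bytes:
    """Re-encode a packed array of identical values in another byte order.

    fmt is a single struct format character, e.g. "I" (uint32),
    "Q" (uint64) or "f" (float32). src and dst are "<" (little-endian)
    or ">" (big-endian).
    """
    item_size = struct.calcsize(src + fmt)
    count, remainder = divmod(len(raw), item_size)
    if remainder:
        raise ValueError(f"buffer length {len(raw)} is not a multiple of {item_size}")
    values = struct.unpack(f"{src}{count}{fmt}", raw)
    return struct.pack(f"{dst}{count}{fmt}", *values)


if __name__ == "__main__":
    little = struct.pack("<3I", 1, 2, 3)
    big = swap_endianness(little, "I")
    print(little.hex(), "->", big.hex())
    # Converting back restores the original bytes.
    print(swap_endianness(big, "I", src=">", dst="<") == little)
```

Note that the width of each value matters: swapping a buffer of 64-bit values as if they were 32-bit values produces wrong results, which is why a converter needs to know the type of every field it touches.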

Development

Maintainers who participate in development of this package are advised to install it in editable mode:

cd /path/to/llama.cpp/gguf-py

pip install --editable .

Note: This may require upgrading your pip installation. The symptom is an error message saying that editable installation currently requires setup.py. In that case, upgrade pip to the latest version:

pip install --upgrade pip

Automatic publishing with CI

A GitHub workflow publishes a release automatically when a tag in the expected format is created. To trigger it:

Bump the version in pyproject.toml.
Create a tag named gguf-vx.x.x where x.x.x is the semantic version number.

git tag -a gguf-v1.0.0 -m "Version 1.0 release"

Push the tags.

git push origin --tags
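Because the workflow depends on the tag name, a quick check before tagging can catch mistakes such as a missing prefix. The sketch below validates a tag against the gguf-vx.x.x format described above and extracts the version numbers. The function name `parse_release_tag` is hypothetical, and requiring exactly three numeric components is an assumption. The pattern the workflow itself matches is not stated here and may be looser.

```python
import re
from typing import Optional, Tuple

# Assumed strict form: "gguf-v" followed by MAJOR.MINOR.PATCH.
TAG_PATTERN = re.compile(r"gguf-v(\d+)\.(\d+)\.(\d+)")


def parse_release_tag(tag: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) for a well-formed tag, else None."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


if __name__ == "__main__":
    for candidate in ("gguf-v1.0.0", "v1.0.0", "gguf-v1.0"):
        print(candidate, "->", parse_release_tag(candidate))
```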

Manual publishing

If you want to publish the package manually for any reason, you need to have twine and build installed:

pip install build twine

Then, follow these steps to release a new version:

Bump the version in pyproject.toml.
Build the package:

python -m build

Upload the generated distribution archives:

python -m twine upload dist/*

TODO

Add tests
Include conversion scripts as command line entry points in this package.