llama.cpp/gguf-py/gguf
Latest commit: 12247f4c69 by Andrew Canis, 2024-03-15 22:41:22 +02:00
llama : add Command-R support (#6033)
Information about the Command-R 35B model (128k context) can be found at:
	https://huggingface.co/CohereForAI/c4ai-command-r-v01

Based on the llama2 model with a few changes:

1) New hyperparameter to scale output logits (logit_scale)
2) Uses LayerNorm instead of RMSNorm
3) Transformer layers have a single shared LayerNorm that feeds into both the
   self-attention and FFN layers in parallel. There is no post-attention
   LayerNorm (see the sketch after this list).
4) No support for Rotary Position Embeddings (RoPE) scaling
5) No biases used
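
For illustration, here is a minimal PyTorch-style sketch of the block
structure described in points 1-3. The attention and FFN submodules are
placeholders (nn.Identity), not the real implementations, and the names
(CommandRBlock, scaled_logits) are ours, not from the PR:

	import torch
	import torch.nn as nn

	class CommandRBlock(nn.Module):
	    # One transformer layer: a single shared LayerNorm feeds both the
	    # self-attention and the FFN, which run in parallel on the same
	    # normalized input; both outputs are added back to the residual
	    # stream, with no post-attention LayerNorm.
	    def __init__(self, d_model: int):
	        super().__init__()
	        self.input_norm = nn.LayerNorm(d_model)  # LayerNorm, not RMSNorm
	        self.self_attn = nn.Identity()           # placeholder for attention
	        self.ffn = nn.Identity()                 # placeholder for the FFN

	    def forward(self, x: torch.Tensor) -> torch.Tensor:
	        h = self.input_norm(x)
	        return x + self.self_attn(h) + self.ffn(h)

	def scaled_logits(hidden: torch.Tensor, embed: torch.Tensor,
	                  logit_scale: float) -> torch.Tensor:
	    # Point 1: output logits are multiplied by logit_scale before softmax.
	    return logit_scale * (hidden @ embed.T)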

Find GGUF files here:
	https://huggingface.co/andrewcanis/c4ai-command-r-v01-GGUF

To convert the model to GGUF format yourself:

1) Download Command-R Hugging Face safetensors:
	git lfs install
	git clone https://huggingface.co/CohereForAI/c4ai-command-r-v01

2) Run:
	python3 convert-hf-to-gguf.py --outtype f16 ./c4ai-command-r-v01
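
Optionally, sanity-check the converted file with the GGUFReader from this
package. A minimal sketch, assuming the convert script wrote its default
output name (ggml-model-f16.gguf) into the model directory; field-access
details may differ between gguf-py versions:

	from gguf import GGUFReader

	# Adjust the path if you passed --outfile to the convert script.
	reader = GGUFReader("./c4ai-command-r-v01/ggml-model-f16.gguf")

	# The new logit_scale hyperparameter should appear among the metadata
	# keys (e.g. a key like "command-r.logit_scale").
	for key in reader.fields:
	    print(key)

	# First few tensors, with their shapes.
	for tensor in reader.tensors[:5]:
	    print(tensor.name, tensor.shape)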
Files in this directory (name, last commit, date):

__init__.py         gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981)  2023-11-11 08:04:50 +03:00
constants.py        llama : add Command-R support (#6033)  2024-03-15 22:41:22 +02:00
gguf.py             gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981)  2023-11-11 08:04:50 +03:00
gguf_reader.py      gguf : add support for I64 and F64 arrays (#6062)  2024-03-15 10:46:51 +02:00
gguf_writer.py      llama : add Command-R support (#6033)  2024-03-15 22:41:22 +02:00
py.typed            convert : various script cleanups/fixes + merges and special token handling (#2842)  2023-08-30 11:25:50 +03:00
tensor_mapping.py   llama : support Mamba Selective State Space Models (#5328)  2024-03-08 17:31:00 -05:00
vocab.py            fix(gguf-py): special tokens are no longer skipped when add_<token>_token is set to false (#5487)  2024-02-15 14:14:37 +01:00
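
The files above make up the gguf-py package itself. As a rough sketch of the
writer side, here is how a tiny GGUF file could be produced with GGUFWriter
from gguf_writer.py; add_logit_scale is the setter this PR introduces, the
scale value below is an assumed example, and exact signatures may vary by
version:

	import numpy as np
	from gguf import GGUFWriter

	# "command-r" is the architecture name registered by this PR.
	writer = GGUFWriter("tiny.gguf", "command-r")
	writer.add_logit_scale(0.0625)  # assumed example value, not from the model
	writer.add_tensor("dummy", np.ones((4, 4), dtype=np.float32))

	# Standard three-phase write: header, key/value metadata, tensor data.
	writer.write_header_to_file()
	writer.write_kv_data_to_file()
	writer.write_tensors_to_file()
	writer.close()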