llama.cpp/gguf-py/gguf
Latest commit: 12247f4c69 by Andrew Canis, 2024-03-15 22:41:22 +02:00
llama : add Command-R support (#6033)
Information about the Command-R 35B model (128k context) can be found at:
	https://huggingface.co/CohereForAI/c4ai-command-r-v01

Based on the llama2 model with a few changes:

1) New hyperparameter to scale output logits (logit_scale)
2) Uses LayerNorm instead of RMSNorm
3) Transformer layers have a single shared LayerNorm that feeds into both the
   self-attention and FFN layers in parallel. There is no post-attention
   LayerNorm (see the sketch after this list).
4) No support for Rotary Position Embeddings (RoPE) scaling
5) No biases used
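
For illustration, here is a minimal PyTorch-style sketch of the block
structure described in points 1-3. The attention and FFN submodules are
placeholders (nn.Identity), not the real implementations, and the names
(CommandRBlock, scaled_logits) are ours, not from the PR:

	import torch
	import torch.nn as nn

	class CommandRBlock(nn.Module):
	    # One transformer layer: a single shared LayerNorm feeds both the
	    # self-attention and the FFN, which run in parallel on the same
	    # normalized input; both outputs are added back to the residual
	    # stream, with no post-attention LayerNorm.
	    def __init__(self, d_model: int):
	        super().__init__()
	        self.input_norm = nn.LayerNorm(d_model)  # LayerNorm, not RMSNorm
	        self.self_attn = nn.Identity()           # placeholder for attention
	        self.ffn = nn.Identity()                 # placeholder for the FFN

	    def forward(self, x: torch.Tensor) -> torch.Tensor:
	        h = self.input_norm(x)
	        return x + self.self_attn(h) + self.ffn(h)

	def scaled_logits(hidden: torch.Tensor, embed: torch.Tensor,
	                  logit_scale: float) -> torch.Tensor:
	    # Point 1: output logits are multiplied by logit_scale before softmax.
	    return logit_scale * (hidden @ embed.T)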

Find GGUF files here:
	https://huggingface.co/andrewcanis/c4ai-command-r-v01-GGUF

To convert the model to GGUF format yourself:

1) Download Command-R Hugging Face safetensors:
	git lfs install
	git clone https://huggingface.co/CohereForAI/c4ai-command-r-v01

2) Run:
	python3 convert-hf-to-gguf.py --outtype f16 ./c4ai-command-r-v01
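
Optionally, sanity-check the converted file with the GGUFReader from this
package. A minimal sketch, assuming the convert script wrote its default
output name (ggml-model-f16.gguf) into the model directory; field-access
details may differ between gguf-py versions:

	from gguf import GGUFReader

	# Adjust the path if you passed --outfile to the convert script.
	reader = GGUFReader("./c4ai-command-r-v01/ggml-model-f16.gguf")

	# The new logit_scale hyperparameter should appear among the metadata
	# keys (e.g. a key like "command-r.logit_scale").
	for key in reader.fields:
	    print(key)

	# First few tensors, with their shapes.
	for tensor in reader.tensors[:5]:
	    print(tensor.name, tensor.shape)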
Files in this directory (name, last commit, date):

__init__.py         gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981)  2023-11-11 08:04:50 +03:00
constants.py        llama : add Command-R support (#6033)  2024-03-15 22:41:22 +02:00
gguf.py             gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981)  2023-11-11 08:04:50 +03:00
gguf_reader.py      gguf : add support for I64 and F64 arrays (#6062)  2024-03-15 10:46:51 +02:00
gguf_writer.py      llama : add Command-R support (#6033)  2024-03-15 22:41:22 +02:00
py.typed            convert : various script cleanups/fixes + merges and special token handling (#2842)  2023-08-30 11:25:50 +03:00
tensor_mapping.py   llama : support Mamba Selective State Space Models (#5328)  2024-03-08 17:31:00 -05:00
vocab.py            fix(gguf-py): special tokens are no longer skipped when add_<token>_token is set to false (#5487)  2024-02-15 14:14:37 +01:00
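
The files above make up the gguf-py package itself. As a rough sketch of the
writer side, here is how a tiny GGUF file could be produced with GGUFWriter
from gguf_writer.py; add_logit_scale is the setter this PR introduces, the
scale value below is an assumed example, and exact signatures may vary by
version:

	import numpy as np
	from gguf import GGUFWriter

	# "command-r" is the architecture name registered by this PR.
	writer = GGUFWriter("tiny.gguf", "command-r")
	writer.add_logit_scale(0.0625)  # assumed example value, not from the model
	writer.add_tensor("dummy", np.ones((4, 4), dtype=np.float32))

	# Standard three-phase write: header, key/value metadata, tensor data.
	writer.write_header_to_file()
	writer.write_kv_data_to_file()
	writer.write_tensors_to_file()
	writer.close()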