llama.cpp/gguf-py/gguf
postmasters 83e633c27e
llama : differentiate the KV dims in the attention (#4657)
* Add n_key_dim and n_value_dim

Some models use values that are not derived from `n_embd`.
Also remove `n_embd_head` and `n_embd_gqa` because it is not clear
which "head" is referred to (key or value).

Fix issue #4648.

* Fix `llm_build_kqv` to use `n_value_gqa`

* Rebase

* Rename variables

* Fix llm_build_kqv to be more generic wrt n_embd_head_k

* Update default values for n_embd_head_k and n_embd_head_v

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Fix llm_load_tensors: the asserts were not backcompat

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-02 13:51:28 +02:00
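The commit above separates the per-head key and value widths from `n_embd`. The Python sketch below shows roughly how such hyperparameters might be resolved. It assumes that each head width falls back to `n_embd // n_head` when no explicit key or value length is stored in the model metadata. It also assumes that the grouped-query (GQA) widths equal the per-head width times `n_head_kv`. The class and field names are hypothetical and are not part of the `gguf` package.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttentionHParams:
    # Hypothetical container; field names are illustrative, not gguf-py API.
    n_embd: int
    n_head: int
    n_head_kv: int
    key_length: Optional[int] = None    # assumed: explicit per-head key width from metadata, if any
    value_length: Optional[int] = None  # assumed: explicit per-head value width from metadata, if any

    @property
    def n_embd_head_k(self) -> int:
        # Assumed fallback: derive the key head width from n_embd when not stored explicitly.
        return self.key_length if self.key_length is not None else self.n_embd // self.n_head

    @property
    def n_embd_head_v(self) -> int:
        # Same fallback for the value head width, resolved independently of the key width.
        return self.value_length if self.value_length is not None else self.n_embd // self.n_head

    @property
    def n_embd_k_gqa(self) -> int:
        # Total key width across the KV heads (fewer than n_head under GQA).
        return self.n_embd_head_k * self.n_head_kv

    @property
    def n_embd_v_gqa(self) -> int:
        # Total value width across the KV heads.
        return self.n_embd_head_v * self.n_head_kv
```

For example, with the hypothetical values `AttentionHParams(n_embd=4096, n_head=32, n_head_kv=32, key_length=256, value_length=128)`, the key and value caches would be 8192 and 4096 elements wide per token and layer. A single shared `n_embd_gqa` could not express that difference.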
File | Last commit | Date
__init__.py | gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981) | 2023-11-11 08:04:50 +03:00
constants.py | llama : differentiate the KV dims in the attention (#4657) | 2024-01-02 13:51:28 +02:00
gguf.py | gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981) | 2023-11-11 08:04:50 +03:00
gguf_reader.py | gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981) | 2023-11-11 08:04:50 +03:00
gguf_writer.py | llama : differentiate the KV dims in the attention (#4657) | 2024-01-02 13:51:28 +02:00
py.typed | convert : various script cleanups/fixes + merges and special token handling (#2842) | 2023-08-30 11:25:50 +03:00
tensor_mapping.py | gpt2 : Add gpt2 architecture integration (#4555) | 2023-12-28 15:03:57 +01:00
vocab.py | py : open merges file as 'utf-8' (#4566) | 2023-12-21 19:07:34 +02:00