llama.cpp/gguf-py/gguf
Ebey Abraham b9e74f9bca
llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490)
* phi2 implementation

* fix breaking change

* phi-2 : various fixes

* phi-2 : use layer norm eps

* py : whitespaces

* llama : fix meta KV override bug

* convert : phi don't add BOS token

* convert : revert "added_tokens_decoder" change

* phi-2 : scale Q instead of KQ for better precision

* ggml : fix NeoX rope to rotate just first n_dims

* cuda : less diff in the rope_neox kernel

* ggml : add ggml_mul_mat_set_prec

ggml-ci

* Update ggml-cuda.cu

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml-cuda.cu

Co-authored-by: slaren <slarengh@gmail.com>

* cuda : ggml_cuda_op_mul_mat_cublas support F32 precision

* cuda : remove oboslete comment

---------

Co-authored-by: Ebey Abraham <ebeyabraham@microsoft.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2023-12-18 19:27:47 +02:00
..
__init__.py gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981) 2023-11-11 08:04:50 +03:00
constants.py llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490) 2023-12-18 19:27:47 +02:00
gguf.py gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981) 2023-11-11 08:04:50 +03:00
gguf_reader.py gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981) 2023-11-11 08:04:50 +03:00
gguf_writer.py llama : add Mixtral support (#4406) 2023-12-13 14:04:25 +02:00
py.typed convert : various script cleanups/fixes + merges and special token handling (#2842) 2023-08-30 11:25:50 +03:00
tensor_mapping.py llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490) 2023-12-18 19:27:47 +02:00
vocab.py gguf-py : fail fast on nonsensical special token IDs (#4489) 2023-12-17 10:45:46 -05:00