Commit graph

1501 commits

Author SHA1 Message Date
Galunid a75fa576ab
scripts: Generalize convert scripts (#3838)
* Replace convert-*-hf-to-gguf.py files with convert-hf-to-gguf.py
2023-11-09 11:09:29 +01:00
Mihai 57ad015dc3
server : add min_p param (#3877)
* Update server.cpp with min_p after it was introduced in https://github.com/ggerganov/llama.cpp/pull/3841

* Use spaces instead of tabs

* Update index.html.hpp after running deps.sh

* Fix test - fix line ending
2023-11-08 20:00:34 -06:00
slaren 875fb42871
ggml-alloc : fix backend assignments of views (#3982) 2023-11-08 13:15:14 +01:00
Jared Van Bortel 0a7c980b6f
gguf : track writer state, free unneeded tensors, cleanup (#3871) 2023-11-07 12:43:04 -05:00
Georgi Gerganov 413503d4b9
make : do not add linker flags when compiling static llava lib (#3977) 2023-11-07 20:25:32 +03:00
xaedes e9c1cecb9d
ggml : fix backward rope after YaRN (#3974)
* fix backward process of rope

rope backward process was broken after YaRN RoPE (#2268) implementation, due to missing changes in backward functions.

the code for the backward process is nearly identically to the forward process:
the only difference is the sign of the sin-values.

to avoid future regressions remove the near-duplicate backward functions and reuse the forward code:

for this a new function argument `bool forward` was added to `ggml_compute_forward_rope_f32` and `ggml_compute_forward_rope_f16`.
the sin-values will be negated when forward is false.

* fix finetune rope call to use correct default attn_factor of 1.0f

* remove unused `ggml_rope_xpos_back`

it is better to have only one `ggml_rope_back` function that accepts all rope parameters, so that `ggml_compute_backward` can propagate all parameters without having to switch between different rope_back variants.

* fix comments explaining the sinus sign in ggml_forward_rope

* add missing function arguments in declaration

* fix function argument type in declaration
2023-11-07 10:04:51 +02:00
Matthew Tejo 54b4df8886
Use params when loading models in llava-cli (#3976)
llava-cli was loading models with default params and ignoring settings
from the cli. This switches to a generic function to load the params
from the cli options.
2023-11-07 10:43:59 +03:00
Meng Zhang 46876d2a2c
cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946)
* protyping the idea that supports running on CPU for a GGML_USE_CUBLAS=on build

* doc: add comments to ggml_cublas_loaded()

* fix defined(...)
2023-11-07 08:49:08 +02:00
Damian Stewart 381efbf480
llava : expose as a shared library for downstream projects (#3613)
* wip llava python bindings compatibility

* add external llava API

* add base64 in-prompt image support

* wip refactor image loading

* refactor image load out of llava init

* cleanup

* further cleanup; move llava-cli into its own file and rename

* move base64.hpp into common/

* collapse clip and llava libraries

* move llava into its own subdir

* wip

* fix bug where base64 string was not removed from the prompt

* get libllava to output in the right place

* expose llava methods in libllama.dylib

* cleanup memory usage around clip_image_*

* cleanup and refactor *again*

* update headerdoc

* build with cmake, not tested (WIP)

* Editorconfig

* Editorconfig

* Build with make

* Build with make

* Fix cyclical depts on Windows

* attempt to fix build on Windows

* attempt to fix build on Windows

* Upd TODOs

* attempt to fix build on Windows+CUDA

* Revert changes in cmake

* Fix according to review comments

* Support building as a shared library

* address review comments

---------

Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2023-11-07 00:36:23 +03:00
slaren 2833a6f63c
ggml-cuda : fix f16 mul mat (#3961)
* ggml-cuda : fix f16 mul mat

ggml-ci

* silence common.cpp warning (bonus)
2023-11-05 18:45:16 +01:00
Kerfuffle d9ccce2e33
Allow common process_escapes to handle \x sequences (#3928)
* Allow common process_escapes to handle \x sequences

* Fix edge case when second hex digit is NUL
2023-11-05 10:06:06 -07:00
Thái Hoàng Tâm bb60fd0bf6
server : fix typo for --alias shortcut from -m to -a (#3958) 2023-11-05 18:15:27 +02:00
Jared Van Bortel 132d25b8a6
cuda : fix disabling device with --tensor-split 1,0 (#3951)
Co-authored-by: slaren <slarengh@gmail.com>
2023-11-05 10:08:57 -05:00
Meng Zhang 3d48f42efc
llama : mark LLM_ARCH_STARCODER as full offload supported (#3945)
as done in https://github.com/ggerganov/llama.cpp/pull/3827
2023-11-05 14:40:08 +02:00
Eve c41ea36eaa
cmake : MSVC instruction detection (fixed up #809) (#3923)
* Add detection code for avx

* Only check hardware when option is ON

* Modify per code review sugguestions

* Build locally will detect CPU

* Fixes CMake style to use lowercase like everywhere else

* cleanup

* fix merge

* linux/gcc version for testing

* msvc combines avx2 and fma into /arch:AVX2 so check for both

* cleanup

* msvc only version

* style

* Update FindSIMD.cmake

---------

Co-authored-by: Howard Su <howard0su@gmail.com>
Co-authored-by: Jeremy Dunn <jeremydunn123@gmail.com>
2023-11-05 10:03:09 +02:00
Eve a7fac013cf
ci : use intel sde when ci cpu doesn't support avx512 (#3949) 2023-11-05 09:46:44 +02:00
slaren 48ade94538
cuda : revert CUDA pool stuff (#3944)
* Revert "cuda : add ROCM aliases for CUDA pool stuff (#3918)"

This reverts commit 629f917cd6.

* Revert "cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)"

This reverts commit d6069051de.

ggml-ci
2023-11-05 09:12:13 +02:00
Kerfuffle f28af0d81a
gguf-py: Support 01.AI Yi models (#3943) 2023-11-04 16:20:34 -06:00
Peter Sugihara d9b33fe95b
metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938) 2023-11-03 21:18:18 +02:00
Xiao-Yong Jin 5ba3746171
ggml-metal: fix yarn rope (#3937) 2023-11-03 14:00:31 -04:00
slaren abb77e7319
ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921) 2023-11-03 12:13:09 +01:00
Georgi Gerganov 8f961abdc4
speculative : change default p_accept to 0.5 + CLI args (#3919)
ggml-ci
2023-11-03 09:41:56 +02:00
Georgi Gerganov 05816027d6
common : YAYF (yet another YARN fix) (#3925)
ggml-ci
2023-11-03 09:24:00 +02:00
cebtenzzre 3fdbe6b66b
llama : change yarn_ext_factor placeholder to -1 (#3922) 2023-11-03 08:31:58 +02:00
Kerfuffle 629f917cd6
cuda : add ROCM aliases for CUDA pool stuff (#3918) 2023-11-02 21:58:22 +02:00
Andrei 51b2fc11f7
cmake : fix relative path to git submodule index (#3915) 2023-11-02 21:40:31 +02:00
Georgi Gerganov 224e7d5b14
readme : add notice about #3912 2023-11-02 20:44:12 +02:00
Georgi Gerganov c7743fe1c1
cuda : fix const ptrs warning causing ROCm build issues (#3913) 2023-11-02 20:32:11 +02:00
Oleksii Maryshchenko d6069051de
cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)
* Using cuda memory pools for async alloc/dealloc.

* If cuda device doesnt support memory pool than use old implementation.

* Removed redundant cublasSetStream

---------

Co-authored-by: Oleksii Maryshchenko <omaryshchenko@dtis.com>
2023-11-02 19:10:39 +02:00
Georgi Gerganov 4ff1046d75
gguf : print error for GGUFv1 files (#3908) 2023-11-02 16:22:30 +02:00
slaren 21958bb393
cmake : disable LLAMA_NATIVE by default (#3906) 2023-11-02 14:10:33 +02:00
Georgi Gerganov 2756c4fbff
gguf : remove special-case code for GGUFv1 (#3901)
ggml-ci
2023-11-02 11:20:21 +02:00
Georgi Gerganov 1efae9b7dc
llm : prevent from 1-D tensors being GPU split (#3697) 2023-11-02 09:54:44 +02:00
cebtenzzre b12fa0d1c1
build : link against build info instead of compiling against it (#3879)
* cmake : fix build when .git does not exist

* cmake : simplify BUILD_INFO target

* cmake : add missing dependencies on BUILD_INFO

* build : link against build info instead of compiling against it

* zig : make build info a .cpp source instead of a header

Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>

* cmake : revert change to CMP0115

---------

Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>
2023-11-02 08:50:16 +02:00
Georgi Gerganov 4d719a6d4e
cuda : check if this fixes Pascal card regression (#3882) 2023-11-02 08:35:10 +02:00
Georgi Gerganov 183b3fac6c
metal : fix build errors and kernel sig after #2268 (#3898) 2023-11-02 08:33:37 +02:00
cebtenzzre 2fffa0d61f
cuda : fix RoPE after #2268 (#3897) 2023-11-02 07:49:44 +02:00
cebtenzzre 0eb332a10f
llama : fix llama_context_default_params after #2268 (#3893) 2023-11-01 19:29:14 -04:00
slaren d02e98cde0
ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel (#3891)
* ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel

* fix warnings
2023-11-01 23:10:09 +01:00
cebtenzzre 898aeca90a
llama : implement YaRN RoPE scaling (#2268)
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: Jeffrey Quesnelle <jquesnelle@gmail.com>
2023-11-01 18:04:33 -04:00
Georgi Gerganov c43c2da8af
llm : fix llm_build_kqv taking unused tensor (benign, #3837) 2023-11-01 23:08:30 +02:00
Georgi Gerganov 523e49b111
llm : fix falcon norm after refactoring (#3837) 2023-11-01 23:00:50 +02:00
Georgi Gerganov e16b9fa4ba
metal : multi-simd softmax (#3710)
ggml-ci
2023-11-01 21:25:00 +02:00
Georgi Gerganov ff8f9a88da
common : minor (#3715) 2023-11-01 21:15:55 +02:00
Georgi Gerganov 50337961a6
llm : add llm_build_context (#3881)
* llm : add llm_build_context

* llm : deduce norm eps based on type + explict max_alibi_bias, clamp_kqv

* llm : restore the non-graph llm_build_ functional API

ggml-ci

* llm : cleanup + comments
2023-11-01 20:11:02 +02:00
bandoti 0e40806c1c
common : allow caller to handle help/argument exceptions (#3715)
* Allow caller to handle help/argument exceptions

* Prepend newline to usage output

* Add new gpt_params_parse_ex function to hide arg-parse impl

* Fix issue blocking success case

* exit instead of returning false

* Update common/common.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update common/common.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-01 19:42:01 +02:00
staviq a2758d08e4
log : make generating separate log files optional (#3787)
* impl --log-new, --log-append

* Update common/log.h

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>

* Update common/log.h

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>

* Apply suggestions from code review

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>

---------

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
2023-11-01 16:18:27 +02:00
l3utterfly e75dfdd31b
sampling : null grammar field after reset (#3885) 2023-11-01 15:40:43 +02:00
Georgi Gerganov 9a3b4f6c86
ggml : fix UNUSED macro (#3762) 2023-11-01 13:50:45 +02:00
Andrew Godfrey 73bdcb395e
finetune : add -ngl parameter (#3762)
* Add '-ngl' support to finetune.cpp

* Add fprintf in ggml_cuda_op_add

When I tried CUDA offloading during finetuning following the readme, I got an assert here.
This probably isn't an important case because inference later gives a warning saying you should use f16 or f32 instead when using lora

* Add 'finetune.sh', which currently fails when using GPU

"error: operator (): Finetuning on tensors with type 'f16' is not yet supported"

* tweak finetune.sh

* Suppress some warnings in ggml.c

* Add f16 implementation to ggml_compute_forward_add_f16_f32

* Add an f16 case to ggml_add_cast_impl and llama_build_lora_finetune_graphs

* finetune.sh: Edit comments

* Add "add_f16_f32_f32_cuda"

* Tweak an error message

* finetune.sh: Add an optional LLAMA_MODEL_DIR variable

* finetune.sh: Add an optional LLAMA_TRAINING_DIR variable

* train : minor

* tabs to spaces

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
2023-11-01 13:49:04 +02:00