
Georgi Gerganov 4760e7cc0b sync : ggml (backend v2) (#3912)	2023-11-13 14:16:23 +02:00

* sync : ggml (backend v2) (wip)
* sync : migrate examples and llama.cpp to dynamic graphs (wip)
* sync : update tests + fix max op params to 64 ggml-ci
* sync : ggml-cuda ggml-ci
* llama : fix save/load state context size ggml-ci
* sync : try to fix build on tvOS
* sync : pass custom graph sizes in training examples
* sync : update graph copies to new ggml API
* sync : update sync-ggml.sh with new files
* scripts : fix header in sync script
* train : fix context size calculations
* llama : increase inference graph size up to 4096 nodes
* train : allocate grads for backward graphs
* train : allocate grads for gb_tmp
CMakeLists.txt	train : finetune LORA (#2632)	2023-09-28 21:40:11 +03:00
export-lora.cpp	sync : ggml (backend v2) (#3912)	2023-11-13 14:16:23 +02:00
README.md	train : finetune LORA (#2632)	2023-09-28 21:40:11 +03:00

README.md

export-lora

Apply LoRA adapters to a base model and export the resulting model.

usage: export-lora [options]

options:
  -h, --help                         show this help message and exit
  -m FNAME, --model-base FNAME       model path from which to load base model (default '')
  -o FNAME, --model-out FNAME        path to save exported model (default '')
  -l FNAME, --lora FNAME             apply LoRA adapter
  -s FNAME S, --lora-scaled FNAME S  apply LoRA adapter with user defined scaling S
  -t N, --threads N                  number of threads to use during computation (default: 4)

For example:

./bin/export-lora \
    -m open-llama-3b-v2-q8_0.gguf \
    -o open-llama-3b-v2-q8_0-english2tokipona-chat.gguf \
    -l lora-open-llama-3b-v2-q8_0-english2tokipona-chat-LATEST.bin

Multiple LoRA adapters can be applied in a single run by repeating the -l FNAME or -s FNAME S command-line parameters.
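
As a rough sketch of a multi-adapter run, the script below combines one -l adapter with one -s adapter scaled by 0.5. The adapter and output file names are placeholders chosen for illustration, and the EXPORT_LORA variable and the dry-run fallback are conveniences of this sketch rather than anything the tool provides. Only flags listed in the usage text above are used.

```shell
#!/bin/sh
# Sketch: merge two LoRA adapters into one base model in a single run.
# The adapter and output file names below are placeholders, not real files.
# EXPORT_LORA is a hypothetical override for the path to the built binary.
EXPORT_LORA="${EXPORT_LORA:-./bin/export-lora}"

# -l applies an adapter without user-defined scaling;
# -s applies an adapter with a user-defined scaling S (0.5 here).
set -- \
    -m open-llama-3b-v2-q8_0.gguf \
    -o open-llama-3b-v2-q8_0-merged.gguf \
    -l lora-adapter-a.bin \
    -s lora-adapter-b.bin 0.5

if [ -x "$EXPORT_LORA" ]; then
    "$EXPORT_LORA" "$@"
else
    # Binary not built yet: print the command instead of running it.
    echo "$EXPORT_LORA $*"
fi
```

Note that -s takes two values, the adapter file followed by its scaling factor, so the scaling always belongs to the file name immediately before it.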