# export-lora
Apply LoRA adapters to a base model and export the resulting model.
```
usage: llama-export-lora [options]

options:
  -m,    --model                  model path from which to load base model (default '')
         --lora FNAME             path to LoRA adapter (can be repeated to use multiple adapters)
         --lora-scaled FNAME S    path to LoRA adapter with user defined scaling S (can be repeated to use multiple adapters)
  -t,    --threads N              number of threads to use during computation (default: 4)
  -o,    --output FNAME           output file (default: 'ggml-lora-merged-f16.gguf')
```
For example:
```bash
./bin/llama-export-lora \
    -m open-llama-3b-v2.gguf \
    -o open-llama-3b-v2-english2tokipona-chat.gguf \
    --lora lora-open-llama-3b-v2-english2tokipona-chat-LATEST.gguf
```
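
The same merge can also be written with an explicit scale via the documented `--lora-scaled FNAME S` form. A minimal sketch, assuming the scale `S` acts as a multiplier on the adapter's contribution (the `0.8` value here is purely illustrative):

```bash
# Merge a single adapter at reduced strength instead of the default.
./bin/llama-export-lora \
    -m open-llama-3b-v2.gguf \
    -o open-llama-3b-v2-english2tokipona-chat.gguf \
    --lora-scaled lora-open-llama-3b-v2-english2tokipona-chat-LATEST.gguf 0.8
```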
Multiple LoRA adapters can be applied by passing multiple `--lora FNAME` or `--lora-scaled FNAME S` command line parameters:
```bash
./bin/llama-export-lora \
    -m your_base_model.gguf \
    -o your_merged_model.gguf \
    --lora-scaled lora_task_A.gguf 0.5 \
    --lora-scaled lora_task_B.gguf 0.5
```
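
Once exported, the merged GGUF behaves like any standalone model, so no `--lora` flags are needed at inference time. A minimal sketch, assuming `llama-cli` is built in the same directory and `your_merged_model.gguf` was produced by the command above:

```bash
# Run the merged model directly; the adapter weights are already baked in.
./bin/llama-cli \
    -m your_merged_model.gguf \
    -p "Hello"
```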