koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-22 19:47:49 +00:00

Aman Gupta 52fb93a2bd server : free draft/MTP resources on sleep to fix VRAM leak (#23461)	2026-05-21 16:11:11 +08:00

The destroy() function in server_context_impl only cleaned up the main model and context (via llama_init.reset()) but did not free the speculative decoder (spec), draft context (ctx_dft), or draft model (model_dft).

For MTP (Multi-Token Prediction) models, ctx_dft holds GPU-allocated resources (KV cache, compute buffers) that are not freed when entering the sleeping state. On each sleep/resume cycle, new resources are allocated without the old ones being freed, leading to a VRAM leak that eventually crashes the server with out-of-memory errors.

Fix by explicitly resetting spec, ctx_dft, and model_dft in destroy() before resetting llama_init, ensuring proper cleanup order to avoid use-after-free.

ref: https://github.com/ggml-org/llama.cpp/issues/23395

Assisted-by: llama.cpp:local pi
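
The cleanup order described in the commit message above can be illustrated with a small standalone sketch. This is not the actual llama.cpp code: `resource`, `server_context_sketch`, `load()` and `sleep_resume_cycle()` are hypothetical stand-ins, and only the member names (`spec`, `ctx_dft`, `model_dft`, `llama_init`) and the rule "reset the draft/speculative objects before `llama_init`" come from the commit message. The relative order among the three draft objects simply follows the order in which the message lists them, which is an assumption.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU-backed object (model, context, decoder).
// Its destructor records its name so the release order can be inspected.
struct resource {
    std::string name;
    std::vector<std::string> * log;
    resource(std::string n, std::vector<std::string> * l) : name(std::move(n)), log(l) {}
    ~resource() { log->push_back(name); }
};

// Hypothetical sketch of a server context; not the real server_context_impl.
struct server_context_sketch {
    std::vector<std::string> * freed; // destruction log, owned by the caller

    std::unique_ptr<resource> llama_init; // main model + context
    std::unique_ptr<resource> model_dft;  // draft model
    std::unique_ptr<resource> ctx_dft;    // draft context (KV cache, compute buffers)
    std::unique_ptr<resource> spec;       // speculative decoder

    explicit server_context_sketch(std::vector<std::string> * log) : freed(log) {}

    void load() {
        llama_init = std::make_unique<resource>("llama_init", freed);
        model_dft  = std::make_unique<resource>("model_dft", freed);
        ctx_dft    = std::make_unique<resource>("ctx_dft", freed);
        spec       = std::make_unique<resource>("spec", freed);
    }

    // Release everything that may depend on the main model first,
    // then the main model/context itself.
    void destroy() {
        spec.reset();
        ctx_dft.reset();
        model_dft.reset();
        llama_init.reset();
    }
};

// Runs one load/destroy cycle and returns the order in which resources were freed.
std::vector<std::string> sleep_resume_cycle() {
    std::vector<std::string> freed;
    server_context_sketch ctx(&freed);
    ctx.load();
    ctx.destroy();
    assert(freed.size() == 4); // nothing is left allocated after destroy()
    return freed;
}
```

Calling `sleep_resume_cycle()` returns the names in release order, with `llama_init` last. In this sketch, dropping the first three `reset()` calls from `destroy()` would leave the draft objects alive across the cycle, which is the shape of the leak the commit describes.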
batched-bench	app : add batched-bench, fit-params, quantize & perplexity (#23459)	2026-05-21 10:29:44 +03:00
cli	mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (#23329)	2026-05-21 00:35:37 +02:00
completion	mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (#23329)	2026-05-21 00:35:37 +02:00
cvector-generator	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
export-lora	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
fit-params	app : add batched-bench, fit-params, quantize & perplexity (#23459)	2026-05-21 10:29:44 +03:00
gguf-split	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
imatrix	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
llama-bench	app : introduce the llama unified executable (#23296)	2026-05-20 13:22:22 +02:00
mtmd	mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (#23329)	2026-05-21 00:35:37 +02:00
parser	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
perplexity	app : add batched-bench, fit-params, quantize & perplexity (#23459)	2026-05-21 10:29:44 +03:00
quantize	app : add batched-bench, fit-params, quantize & perplexity (#23459)	2026-05-21 10:29:44 +03:00
results	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
rpc	fix: rpc-server cache may not work in Windows environments (#22394)	2026-04-27 17:25:09 +03:00
server	server : free draft/MTP resources on sleep to fix VRAM leak (#23461)	2026-05-21 16:11:11 +08:00
tokenize	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
tts	logs : reduce (#23021)	2026-05-14 13:05:52 +03:00
ui	ui: Improve Git Hooks for UI development (#23403)	2026-05-21 08:27:50 +02:00
CMakeLists.txt	ui: Restructure repo to use `tools/ui` folder and `ui` / `UI` / `llama-ui` / `LLAMA_UI` naming (#23064)	2026-05-16 02:02:40 +02:00