koboldcpp/tools
Pascal 95d469a915
server, webui: accept continue_final_message flag for vLLM API compat (#23012)
* server, webui: accept continue_final_message flag for vLLM API compat

Add the continue_final_message body flag from the vLLM and transformers
APIs. When set together with add_generation_prompt false, it triggers
the existing prefill_assistant code path regardless of the server-side
opt.prefill_assistant option. Mutual exclusion with add_generation_prompt
true is enforced, matching vLLM behavior; see the request sketch below.

The WebUI sends continue_final_message true and add_generation_prompt
false on the Continue button, with a matching opt-in option on the chat
service.

Pure API alignment, no change to the prefill logic itself. Paves the way
for the upcoming per-template prefill plumbing in common/chat.

* test: add coverage for continue_final_message vLLM compat flag

Two cases on top of the existing assistant prefill coverage. First,
continue_final_message true with add_generation_prompt false produces
the same rendered prompt as the prefill_assistant heuristic, proving
the new flag is a correct alias of the existing path. Second, setting
both flags to true is rejected with HTTP 400, matching the
vLLM/transformers mutual-exclusion contract.
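
A hedged sketch of those two cases as standalone pytest functions against
a running server; the actual suite uses the repo's own test harness and
compares rendered prompts internally, so the URL, messages, and the
status-code-only assertions here are illustrative:

```python
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed local test server
MSGS = [
    {"role": "user", "content": "Count to five."},
    {"role": "assistant", "content": "One, two,"},  # partial turn to continue
]

def test_continue_final_message_aliases_prefill():
    # Case 1: the vLLM-style flag pair must be accepted and take the same
    # path as the prefill_assistant heuristic. (The real test also compares
    # the rendered prompts; this sketch only checks the request succeeds.)
    r = requests.post(URL, json={
        "messages": MSGS,
        "continue_final_message": True,
        "add_generation_prompt": False,
        "max_tokens": 16,
    })
    assert r.status_code == 200
    assert r.json()["choices"][0]["message"]["content"]

def test_continue_final_message_mutual_exclusion():
    # Case 2: continuing the final message while also adding a generation
    # prompt is contradictory and must be rejected, as in vLLM/transformers.
    r = requests.post(URL, json={
        "messages": MSGS,
        "continue_final_message": True,
        "add_generation_prompt": True,
    })
    assert r.status_code == 400
```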

* chore: update webui build output
2026-05-13 20:47:58 +02:00
batched-bench libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
cli spec : update CLI arguments for better consistency (#22964) 2026-05-13 09:15:39 +03:00
completion spec : update CLI arguments for better consistency (#22964) 2026-05-13 09:15:39 +03:00
cvector-generator libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
export-lora libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
fit-params fit-params : refactor + add option to output estimated memory per device (#22171) 2026-04-21 09:54:36 +03:00
gguf-split libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
imatrix libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
llama-bench spec : refactor params (#22397) 2026-04-28 09:07:33 +03:00
mtmd mtmd, server, common: expose modalities to /v1/models (#22952) 2026-05-12 19:08:07 +02:00
parser libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
perplexity fit-params : refactor + add option to output estimated memory per device (#22171) 2026-04-21 09:54:36 +03:00
quantize libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
results libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
rpc fix: rpc-server cache may not work in Windows environments (#22394) 2026-04-27 17:25:09 +03:00
server server, webui: accept continue_final_message flag for vLLM API compat (#23012) 2026-05-13 20:47:58 +02:00
tokenize libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
tts libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
CMakeLists.txt llama: end-to-end tests (#19802) 2026-03-08 12:30:21 +01:00