koboldcpp/tools
Pascal 5d44db6008
server, webui: support continue generation on reasoning models (#22727)
* server, webui : support continue generation on reasoning models (#22727)

Remove the throw blocking assistant prefill on reasoning models and
orchestrate thinking tags around the prefilled message so the parser
routes subsequent stream chunks correctly. The WebUI drops the
reasoning guard on the Continue button, sends reasoning_content with
the prefilled message (see the request sketch below), and persists
partial reasoning on stop so the CoT survives reload and resume.

Scope: templates with a simple thinking_start_tag / thinking_end_tag
pair (see the second sketch below). Channel-based templates such as
GPT-OSS are out of scope, pending a per-template prefill API in
common/chat.

First step toward #21754.

* chore: update webui build output

* server: reject reasoning prefill on channel based templates
2026-05-13 11:09:51 +02:00
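
The commit states that the WebUI sends reasoning_content along with the
prefilled assistant message to resume generation. Below is a minimal
TypeScript sketch of such a continue request against the server's
OpenAI-compatible /v1/chat/completions endpoint; the example messages
and every payload detail beyond reasoning_content on the trailing
assistant message are illustrative assumptions, not the WebUI's actual
code.

```ts
// Hedged sketch of a "Continue" request to the server's
// OpenAI-compatible chat endpoint. A trailing assistant message acts
// as the prefill; reasoning_content carries the partial CoT that the
// WebUI persisted on stop (per the commit message above).
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
  reasoning_content?: string; // partial chain-of-thought, if any
}

async function continueGeneration(baseUrl: string): Promise<Response> {
  const messages: ChatMessage[] = [
    { role: "user", content: "Prove that sqrt(2) is irrational." },
    {
      role: "assistant",
      // The answer text generated so far; the model continues from here.
      content: "Assume sqrt(2) = p/q with p, q coprime. Then",
      // Persisted partial reasoning; the server orchestrates the
      // template's thinking tags around it so the parser routes the
      // following stream chunks correctly.
      reasoning_content: "The user wants the classic contradiction proof.",
    },
  ];

  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ stream: true, messages }),
  });
  if (!res.ok) {
    // e.g. the server rejects reasoning prefill on channel-based
    // templates (the second follow-up commit in this squash).
    throw new Error(`prefill rejected: HTTP ${res.status}`);
  }
  return res; // stream consumption elided
}
```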
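The scope note can be made concrete with a second hedged illustration:
a simple start/end tag pair lets the server rebuild the prompt around a
prefill by plain string wrapping, while a channel-based template cannot
be resumed that way. The literal tags are placeholders (each chat
template defines its own pair), and the GPT-OSS-style markers are shown
only for contrast, not taken from this repository.

```ts
// Tag-pair template: re-open the thinking block around the persisted
// reasoning, then append the partial answer. Tags are illustrative.
function wrapPrefill(
  thinkingStartTag: string, // e.g. "<think>"
  thinkingEndTag: string,   // e.g. "</think>"
  reasoning: string,
  answerPrefix: string,
): string {
  return thinkingStartTag + reasoning + thinkingEndTag + answerPrefix;
}

// A channel-based template (GPT-OSS style) interleaves role/channel
// markers across separate blocks, roughly:
//   <|start|>assistant<|channel|>analysis<|message|>...<|end|>
//   <|start|>assistant<|channel|>final<|message|>...
// No single wrapped string expresses that prefill, hence the planned
// per-template prefill API in common/chat.
```
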
batched-bench libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
cli spec : update CLI arguments for better consistency (#22964) 2026-05-13 09:15:39 +03:00
completion spec : update CLI arguments for better consistency (#22964) 2026-05-13 09:15:39 +03:00
cvector-generator libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
export-lora libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
fit-params fit-params : refactor + add option to output estimated memory per device (#22171) 2026-04-21 09:54:36 +03:00
gguf-split libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
imatrix libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
llama-bench spec : refactor params (#22397) 2026-04-28 09:07:33 +03:00
mtmd mtmd, server, common: expose modalities to /v1/models (#22952) 2026-05-12 19:08:07 +02:00
parser libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
perplexity fit-params : refactor + add option to output estimated memory per device (#22171) 2026-04-21 09:54:36 +03:00
quantize libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
results libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
rpc fix: rpc-server cache may not work in Windows environments (#22394) 2026-04-27 17:25:09 +03:00
server server, webui: support continue generation on reasoning models (#22727) 2026-05-13 11:09:51 +02:00
tokenize libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
tts libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
CMakeLists.txt llama: end-to-end tests (#19802) 2026-03-08 12:30:21 +01:00