koboldcpp/tools
Pascal 5d44db6008
server, webui: support continue generation on reasoning models (#22727)
* server, webui : support continue generation on reasoning models (#22727)

Remove the throw blocking assistant prefill on reasoning models and
orchestrate thinking tags around the prefilled message so the parser
routes subsequent stream chunks correctly. The WebUI drops the
reasoning guard on the Continue button, sends reasoning_content with
the prefilled message (see the request sketch below), and persists
partial reasoning on stop so the CoT survives reload and resume.

Scope: templates with a simple thinking_start_tag / thinking_end_tag
pair (see the second sketch below). Channel-based templates such as
GPT-OSS are out of scope, pending a per-template prefill API in
common/chat.

First step toward #21754.

* chore: update webui build output

* server: reject reasoning prefill on channel based templates
2026-05-13 11:09:51 +02:00
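
The commit states that the WebUI sends reasoning_content along with the
prefilled assistant message to resume generation. Below is a minimal
TypeScript sketch of such a continue request against the server's
OpenAI-compatible /v1/chat/completions endpoint; the example messages
and every payload detail beyond reasoning_content on the trailing
assistant message are illustrative assumptions, not the WebUI's actual
code.

```ts
// Hedged sketch of a "Continue" request to the server's
// OpenAI-compatible chat endpoint. A trailing assistant message acts
// as the prefill; reasoning_content carries the partial CoT that the
// WebUI persisted on stop (per the commit message above).
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
  reasoning_content?: string; // partial chain-of-thought, if any
}

async function continueGeneration(baseUrl: string): Promise<Response> {
  const messages: ChatMessage[] = [
    { role: "user", content: "Prove that sqrt(2) is irrational." },
    {
      role: "assistant",
      // The answer text generated so far; the model continues from here.
      content: "Assume sqrt(2) = p/q with p, q coprime. Then",
      // Persisted partial reasoning; the server orchestrates the
      // template's thinking tags around it so the parser routes the
      // following stream chunks correctly.
      reasoning_content: "The user wants the classic contradiction proof.",
    },
  ];

  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ stream: true, messages }),
  });
  if (!res.ok) {
    // e.g. the server rejects reasoning prefill on channel-based
    // templates (the second follow-up commit in this squash).
    throw new Error(`prefill rejected: HTTP ${res.status}`);
  }
  return res; // stream consumption elided
}
```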
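The scope note can be made concrete with a second hedged illustration:
a simple start/end tag pair lets the server rebuild the prompt around a
prefill by plain string wrapping, while a channel-based template cannot
be resumed that way. The literal tags are placeholders (each chat
template defines its own pair), and the GPT-OSS-style markers are shown
only for contrast, not taken from this repository.

```ts
// Tag-pair template: re-open the thinking block around the persisted
// reasoning, then append the partial answer. Tags are illustrative.
function wrapPrefill(
  thinkingStartTag: string, // e.g. "<think>"
  thinkingEndTag: string,   // e.g. "</think>"
  reasoning: string,
  answerPrefix: string,
): string {
  return thinkingStartTag + reasoning + thinkingEndTag + answerPrefix;
}

// A channel-based template (GPT-OSS style) interleaves role/channel
// markers across separate blocks, roughly:
//   <|start|>assistant<|channel|>analysis<|message|>...<|end|>
//   <|start|>assistant<|channel|>final<|message|>...
// No single wrapped string expresses that prefill, hence the planned
// per-template prefill API in common/chat.
```
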
batched-bench libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
cli spec : update CLI arguments for better consistency (#22964) 2026-05-13 09:15:39 +03:00
completion spec : update CLI arguments for better consistency (#22964) 2026-05-13 09:15:39 +03:00
cvector-generator libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
export-lora libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
fit-params fit-params : refactor + add option to output estimated memory per device (#22171) 2026-04-21 09:54:36 +03:00
gguf-split libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
imatrix libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
llama-bench spec : refactor params (#22397) 2026-04-28 09:07:33 +03:00
mtmd mtmd, server, common: expose modalities to /v1/models (#22952) 2026-05-12 19:08:07 +02:00
parser libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
perplexity fit-params : refactor + add option to output estimated memory per device (#22171) 2026-04-21 09:54:36 +03:00
quantize libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
results libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
rpc fix: rpc-server cache may not work in Windows environments (#22394) 2026-04-27 17:25:09 +03:00
server server, webui: support continue generation on reasoning models (#22727) 2026-05-13 11:09:51 +02:00
tokenize libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
tts libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
CMakeLists.txt llama: end-to-end tests (#19802) 2026-03-08 12:30:21 +01:00