koboldcpp/tools/server/tests/unit
Pascal 95d469a915
server, webui: accept continue_final_message flag for vLLM API compat (#23012)
* server, webui: accept continue_final_message flag for vLLM API compat

Add the continue_final_message body flag from the vLLM and transformers
APIs. When set together with add_generation_prompt false, it triggers the
existing prefill_assistant code path regardless of the server-side
opt.prefill_assistant option. Mutual exclusion with add_generation_prompt
true is enforced, matching vLLM behavior.
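The flag resolution described above can be sketched as a small, self-contained function. This is an illustrative sketch only: the function name `resolve_prefill` and its argument layout are hypothetical, not the server's actual code, but the decision logic follows the behavior stated in the commit message.

```python
# Hypothetical sketch of the flag handling described above; names are
# illustrative, not copied from the server implementation.
def resolve_prefill(body: dict, server_prefill_assistant: bool) -> bool:
    continue_final = body.get("continue_final_message", False)
    add_gen_prompt = body.get("add_generation_prompt", True)
    if continue_final and add_gen_prompt:
        # Mutually exclusive, matching vLLM/transformers: the server
        # rejects such a request with HTTP 400.
        raise ValueError(
            "continue_final_message and add_generation_prompt "
            "are mutually exclusive"
        )
    if continue_final:
        # The request flag forces the prefill path regardless of the
        # server-side opt.prefill_assistant option.
        return True
    return server_prefill_assistant

# The flag overrides a server configured with prefill_assistant disabled:
assert resolve_prefill(
    {"continue_final_message": True, "add_generation_prompt": False},
    server_prefill_assistant=False,
) is True
```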

The WebUI sends continue_final_message and add_generation_prompt false
for the Continue button, with a matching opt-in option on the chat service.

This is pure API alignment; the prefill logic itself is unchanged. It paves
the way for the upcoming per-template prefill plumbing in common/chat.

* test: add coverage for continue_final_message vLLM compat flag

Two cases on top of the existing assistant prefill coverage. First,
continue_final_message true with add_generation_prompt false produces
the same rendered prompt as the prefill_assistant heuristic, proving
the new flag is a correct alias of the existing path. Second, setting
both flags to true is rejected with HTTP 400, matching the
vLLM/transformers mutual-exclusion contract.
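The two request bodies exercised by those test cases look roughly like the following. This is a sketch, assuming the standard OpenAI-compatible chat-completions payload shape; the message content is made up and the exact test helpers are not shown.

```python
import json

# A partial assistant turn that the server should continue in place.
messages = [
    {"role": "user", "content": "Write a haiku"},
    {"role": "assistant", "content": "Autumn leaves are"},
]

# Case 1: alias of the prefill_assistant heuristic; expected to render
# the same prompt as the existing prefill path.
ok_body = {
    "messages": messages,
    "continue_final_message": True,
    "add_generation_prompt": False,
}

# Case 2: both flags true; the server must reject this with HTTP 400.
bad_body = {
    "messages": messages,
    "continue_final_message": True,
    "add_generation_prompt": True,
}

print(json.dumps(ok_body, indent=2))
```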

* chore: update webui build output
2026-05-13 20:47:58 +02:00
test_basic.py server : support multiple model aliases via comma-separated --alias (#19926) 2026-02-27 07:05:23 +01:00
test_chat_completion.py server, webui: accept continue_final_message flag for vLLM API compat (#23012) 2026-05-13 20:47:58 +02:00
test_compat_anthropic.py server: Add cached_tokens info to oaicompat responses (#19361) 2026-03-19 19:09:33 +01:00
test_compat_gcp.py server: support Vertex AI compatible API (#22545) 2026-05-08 15:23:04 +02:00
test_compat_oai_responses.py server: /v1/responses (partial) (#18486) 2026-01-21 17:47:23 +01:00
test_completion.py backend sampling: support returning post-sampling probs (#22622) 2026-05-10 19:12:02 +02:00
test_ctx_shift.py memory : remove KV cache size padding (#16812) 2025-10-28 20:19:44 +02:00
test_embedding.py llama : fix pooling assertion crash in chunked GDN detection path (#20468) 2026-03-13 20:53:42 +02:00
test_ignore_eos.py server: respect the ignore eos flag (#21203) 2026-04-08 17:12:15 +02:00
test_infill.py server : support unified cache across slots (#16736) 2025-11-02 18:14:04 +02:00
test_kv_keep_only_active.py server: rename debug tags to match --cache-idle-slots naming (#22292) 2026-04-24 09:28:44 +03:00
test_lora.py server : disable context shift by default (#15416) 2025-08-19 16:46:37 +03:00
test_proxy.py server: Parse port numbers from MCP server URLs in CORS proxy (#20208) 2026-03-09 17:47:54 +01:00
test_rerank.py server / ranking : add sorting and management of top_n (#16403) 2025-10-11 16:39:04 +03:00
test_router.py server: implement /models?reload=1 (#21848) 2026-05-04 16:23:26 +02:00
test_security.py server: Bypass API Key validation for WebUI static bundle assets (#21269) 2026-04-01 21:32:15 +02:00
test_sleep.py server: add auto-sleep after N seconds of idle (#18228) 2025-12-21 02:24:42 +01:00
test_slot_save.py server : disable context shift by default (#15416) 2025-08-19 16:46:37 +03:00
test_speculative.py spec : parallel drafting support (#22838) 2026-05-11 19:09:43 +03:00
test_template.py tests : use reasoning instead of reasoning_budget in server tests (#20432) 2026-03-12 13:41:01 +01:00
test_tokenize.py server : disable context shift by default (#15416) 2025-08-19 16:46:37 +03:00
test_tool_call.py common/autoparser: fixes for newline handling / forced tool calls (#22654) 2026-05-04 13:18:11 +02:00
test_vision_api.py server: tests: fetch random media marker via /apply-template (#21962) (#21980) 2026-04-16 20:46:21 +03:00