server: fix checkpoints creation (#22929)

* common : add common_chat_split_by_role

* cont : fix spans to reach end of message

* server: fix checkpoints creation

- extract message_spans from chat templates
- find the prompt token position before the latest user message
- split prompt batching at that position
- create a context checkpoint before the latest user input
- avoid periodic mid-prompt checkpoints when that position is known
- handle multimodal prompts when mapping text/template positions to server prompt tokens
- add --checkpoint-min-step to control minimum spacing between checkpoints

* cont : clean-up

* Support autoparser detection for message barriers

* server: fix message span delimiter and update docs

---------

Co-authored-by: Alde Rojas <hello@alde.dev>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>
2026-05-31 05:03:44 +00:00 · 2026-05-25 07:56:18 +02:00 · e2ef8fe42c
commit e2ef8fe42c
parent 6d57c26ef8
15 changed files with 586 additions and 37 deletions
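
The commit message describes locating the position just before the latest user message and splitting the prompt there. The following is a minimal sketch of that idea, not the server's implementation. It assumes that `pos` and `len` are character offsets into the rendered prompt text. The struct `message_span` here is a hypothetical stand-in that only mirrors the three serialized fields (`role`, `pos`, `len`), and the helpers `latest_user_pos` and `split_prompt` are illustrative names. Mapping the text offset to a prompt token index, which the commit also handles for multimodal prompts, is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in mirroring the serialized span fields (role, pos, len).
// Assumption: pos/len are character offsets into the rendered prompt text.
struct message_span {
    std::string role;
    std::size_t pos;
    std::size_t len;
};

// Return the start offset of the last span whose role is "user",
// or std::nullopt when the prompt contains no user message.
std::optional<std::size_t> latest_user_pos(const std::vector<message_span> & spans) {
    for (auto it = spans.rbegin(); it != spans.rend(); ++it) {
        if (it->role == "user") {
            return it->pos;
        }
    }
    return std::nullopt;
}

// Split the rendered prompt at a text offset. The first part is the prefix
// that precedes the latest user message; the second part is the remainder.
std::pair<std::string, std::string> split_prompt(const std::string & prompt, std::size_t pos) {
    assert(pos <= prompt.size());
    return { prompt.substr(0, pos), prompt.substr(pos) };
}
```

In this sketch the prefix returned by `split_prompt` corresponds to the part of the prompt that stays unchanged when only the latest user input is edited or regenerated, which is why a checkpoint taken at that boundary can be reused.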
--- a/tools/server/server-common.cpp
+++ b/tools/server/server-common.cpp
@@ -1110,6 +1110,16 @@ json oaicompat_chat_params_parse(
        llama_params["chat_parser"] = chat_params.parser;
    }

+    llama_params["message_spans"] = json::array();
+
+    for (const auto & span : chat_params.message_spans) {
+        llama_params["message_spans"].push_back({
+            { "role", span.role },
+            { "pos",  span.pos  },
+            { "len",  span.len  },
+        });
+    }
+
    // Reasoning budget: pass parameters through to sampling layer
    {
        int reasoning_budget = opt.reasoning_budget;