server: expose prompt token counts in /slots endpoint (#23454)

Add n_prompt_tokens, n_prompt_tokens_processed, and n_prompt_tokens_cache to the /slots JSON response. These fields are already tracked internally but were not exposed, making it impossible for clients to monitor prompt evaluation progress during processing.
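
As a rough illustration of how a client might use these fields, the sketch below turns the three counters into a progress fraction. The struct and function names are hypothetical and not part of the server. It also assumes that cached tokens are not counted in n_prompt_tokens_processed, which this commit does not state. Since the fields are emitted inside the `if (ptask)` branch, a client should expect them only while the slot has a task.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical client-side mirror of the three counters in a /slots entry.
struct slot_prompt_counts {
    int32_t n_prompt_tokens;           // total tokens in the prompt
    int32_t n_prompt_tokens_processed; // tokens evaluated so far for this task
    int32_t n_prompt_tokens_cache;     // tokens reused from the prompt cache
};

// Fraction of the prompt that is already handled, clamped to [0, 1].
// Assumption: cached tokens are NOT included in n_prompt_tokens_processed,
// so the two counters are summed. If they overlap, the clamp keeps the
// result from exceeding 1.
inline double prompt_progress(const slot_prompt_counts & c) {
    if (c.n_prompt_tokens <= 0) {
        return 0.0; // nothing to process, or counters not populated yet
    }
    const double done = static_cast<double>(c.n_prompt_tokens_cache) +
                        static_cast<double>(c.n_prompt_tokens_processed);
    return std::clamp(done / c.n_prompt_tokens, 0.0, 1.0);
}
```

A polling client could parse each slot object from the /slots response, fill in slot_prompt_counts, and display prompt_progress() until generation starts.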
2026-07-10 01:18:32 +00:00 · 2026-05-21 13:29:13 +02:00 · 2026-05-21 13:29:13 +02:00 · b65bb4baae
commit b65bb4baae
parent a1a69f777a
1 changed file with 3 additions and 0 deletions
--- a/tools/server/server-context.cpp
+++ b/tools/server/server-context.cpp
@@ -506,6 +506,9 @@ struct server_slot {

        if (ptask) {
            res["id_task"] = ptask->id;
+            res["n_prompt_tokens"]           = (int32_t) prompt.tokens.size();
+            res["n_prompt_tokens_processed"] = n_prompt_tokens_processed;
+            res["n_prompt_tokens_cache"]     = n_prompt_tokens_cache;
            res["params"] = ptask->params.to_json(only_metrics);
            res["next_token"] = {
                {