From 6831fe470cdfbeeebcf5645e8d5d9acae3a4ddf0 Mon Sep 17 00:00:00 2001
From: Julien Chaumond
Date: Fri, 15 May 2026 19:33:12 +0200
Subject: [PATCH] docs: document `usage` object in server timings response (#23110)

* docs: document `usage` object in server timings response

Co-Authored-By: julien-agent

* Apply suggestion from @julien-c

---------

Co-authored-by: julien-agent
---
 tools/server/README.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/tools/server/README.md b/tools/server/README.md
index 5ba1a58f8..2b3a2b168 100644
--- a/tools/server/README.md
+++ b/tools/server/README.md
@@ -1322,6 +1322,22 @@ This provides information on the performance of the server. It also allows calcu
 
 The total number of tokens in context is equal to `prompt_n + cache_n + predicted_n`
 
+The response also includes a standard `usage` object:
+
+```js
+{
+  // ...
+  "usage": {
+    "completion_tokens": 48,
+    "prompt_tokens": 44,
+    "total_tokens": 92,
+    "prompt_tokens_details": {
+      "cached_tokens": 0
+    }
+  }
+}
+```
+
 *Reasoning support*
 
 The server supports parsing and returning reasoning via the `reasoning_content` field, similar to Deepseek API.
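
For readers of this patch, here is a minimal sketch of how a client might read the `usage` object it documents. The helper name `summarizeUsage` and the fallback to zero for missing fields are assumptions made for illustration; only the field names and the sample values come from the README addition.

```javascript
// Sketch only: `summarizeUsage` is a hypothetical client-side helper,
// not part of the server. Field names follow the documented `usage` example.
function summarizeUsage(response) {
  const usage = response.usage;
  if (!usage) {
    return null; // this response carries no `usage` object
  }

  const promptTokens = usage.prompt_tokens ?? 0;
  const completionTokens = usage.completion_tokens ?? 0;
  // Assumption: a missing `prompt_tokens_details` is treated as zero cached tokens.
  const cachedTokens = usage.prompt_tokens_details?.cached_tokens ?? 0;

  return {
    promptTokens,
    completionTokens,
    cachedTokens,
    // Fall back to the sum if `total_tokens` is absent (assumption).
    totalTokens: usage.total_tokens ?? promptTokens + completionTokens,
    // True when the reported total matches prompt + completion tokens.
    totalIsConsistent: usage.total_tokens === promptTokens + completionTokens,
  };
}

// Values copied from the example in the patch.
const exampleResponse = {
  usage: {
    completion_tokens: 48,
    prompt_tokens: 44,
    total_tokens: 92,
    prompt_tokens_details: { cached_tokens: 0 },
  },
};

console.log(summarizeUsage(exampleResponse));
```

With the sample values, `totalTokens` is 92 and `totalIsConsistent` is true, since 44 + 48 = 92. Optional chaining keeps the helper from throwing when `prompt_tokens_details` is absent.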