mirror of
https://github.com/QwenLM/qwen-code.git
synced 2026-05-19 07:54:38 +00:00
* fix(core): include cache_creation_input_tokens in Anthropic prompt accounting
Anthropic reports the prompt across three mutually-exclusive fields —
input_tokens, cache_read_input_tokens, cache_creation_input_tokens —
but the adapter only summed input + cache_read, dropping the
cache_creation bucket. On a fresh session that wrote the system prompt
to cache, the reported promptTokenCount was too low by exactly the
cache-creation amount.
Extract the normalization into a shared helper used by both streaming
and non-streaming paths, and add a guard for non-conforming providers
that expose the Anthropic protocol but follow OpenAI-style accounting
(input_tokens already covers the cache fields). When input_tokens is at
least as large as both cache fields and at least one cache field is
non-zero, trust input_tokens alone so we don't double-count.
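
As a rough illustration of the rule described in this commit, the sketch below sums the three prompt buckets and applies the guard as worded above. The interface and function names (`AnthropicUsageSketch`, `normalizePromptTokensV1`) are hypothetical and not taken from the repository; only the three field names come from the text.

```typescript
// Hypothetical usage shape; only the field names are from the commit message.
interface AnthropicUsageSketch {
  input_tokens?: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

// Sketch of the first guard: trust input_tokens alone when it is at least as
// large as both cache fields and at least one cache field is non-zero.
function normalizePromptTokensV1(usage: AnthropicUsageSketch): number {
  const input = usage.input_tokens ?? 0;
  const cacheRead = usage.cache_read_input_tokens ?? 0;
  const cacheCreation = usage.cache_creation_input_tokens ?? 0;

  const inputCoversCaches = input >= cacheRead && input >= cacheCreation;
  const anyCacheReported = cacheRead > 0 || cacheCreation > 0;
  if (inputCoversCaches && anyCacheReported) {
    // Assumed OpenAI-style accounting: input already includes the cache fields.
    return input;
  }
  // Anthropic accounting: the three buckets are mutually exclusive, so sum them.
  return input + cacheRead + cacheCreation;
}

// Fresh session that wrote the system prompt to cache (illustrative numbers).
const freshSession = normalizePromptTokensV1({
  input_tokens: 12,
  cache_creation_input_tokens: 30000,
});

// Provider whose input_tokens already covers the cached portion.
const openAiStyle = normalizePromptTokensV1({
  input_tokens: 5000,
  cache_read_input_tokens: 4000,
});
```

With these illustrative numbers the fresh session counts all 30012 prompt tokens, while the OpenAI-style response is reported as 5000 rather than 9000.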
* fix(core): prefer promptTokenCount over totalTokenCount for context display
The Footer's "context used" indicator is meant to track prompt size —
how much of the context window the next request will carry. The current
code preferred totalTokenCount (= prompt + output), so output tokens
generated in the in-flight round were double-counted. Across turns this
caused the % bar to oscillate non-monotonically: it could *decrease*
between turns whenever the prior round's output was large.
Flip the preference at every consumer site that drives the live counter:
the per-stream-chunk update in the main chat, the per-round update in
the subagent runtime (which drives auto-compaction), the session-resume
walk, and the in-process agent panel's listener. Producer sites that
expose total for billing/export are left unchanged.
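
The flipped preference can be sketched as a small selector. This is an assumption-laden illustration: `UsageMetadataSketch` and `contextTokensFor` are hypothetical names, and the real consumer sites may handle missing or zero values differently.

```typescript
// Hypothetical subset of the usage metadata; field names follow the commit text.
interface UsageMetadataSketch {
  promptTokenCount?: number;
  totalTokenCount?: number;
}

// Prefer the prompt size for the "context used" indicator and fall back to the
// total only when no prompt count is available.
function contextTokensFor(usage: UsageMetadataSketch): number | undefined {
  return usage.promptTokenCount ?? usage.totalTokenCount;
}

// Illustrative round: a 1200-token prompt that produced 300 output tokens.
const shown = contextTokensFor({ promptTokenCount: 1200, totalTokenCount: 1500 });

// A provider that reports only the total still yields a usable value.
const fallback = contextTokensFor({ totalTokenCount: 900 });
```

Here `shown` is the prompt size (1200), so the round's output tokens no longer inflate the indicator.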
* fix(core): use cache_creation as the discriminator in Anthropic usage normalization
The previous guard fell back to "input alone" whenever input_tokens was
at least as large as both cache fields. In a real Anthropic conversation
input_tokens grows past cache_creation_input_tokens as history
accumulates, so the guard inevitably mis-classified every later turn as
OpenAI-style and silently dropped the cache_creation portion from the
displayed prompt size. The Footer would show a one-shot drop at the
crossover point and then keep under-reporting by ~32k tokens.
cache_creation_input_tokens is unique to Anthropic's protocol (OpenAI
has no equivalent), so its presence is a strong signal the response
follows real Anthropic semantics. Use that as the primary discriminator
and only fall back to "input alone" when cache_creation is zero, cache
reads are reported, and input already covers them — the actual OpenAI-
on-Anthropic case the guard was meant to catch.
Adds a regression test that locks in the crossover scenario.
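
A minimal sketch of the revised discriminator, under the same caveat that the names (`AnthropicUsageShape`, `normalizePromptTokens`) are hypothetical and the token counts are made up for illustration:

```typescript
// Hypothetical usage shape; only the field names are from the commit message.
interface AnthropicUsageShape {
  input_tokens?: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

function normalizePromptTokens(usage: AnthropicUsageShape): number {
  const input = usage.input_tokens ?? 0;
  const cacheRead = usage.cache_read_input_tokens ?? 0;
  const cacheCreation = usage.cache_creation_input_tokens ?? 0;

  // Fall back to "input alone" only when cache_creation is zero, cache reads
  // are reported, and input already covers them.
  const looksOpenAiStyle =
    cacheCreation === 0 && cacheRead > 0 && input >= cacheRead;
  if (looksOpenAiStyle) {
    return input;
  }
  // Any non-zero cache_creation is treated as real Anthropic semantics.
  return input + cacheRead + cacheCreation;
}

// Crossover scenario: input_tokens has grown past cache_creation_input_tokens.
const crossover = normalizePromptTokens({
  input_tokens: 40000,
  cache_creation_input_tokens: 32000,
});

// Warm turn with both cache fields non-zero.
const warmTurn = normalizePromptTokens({
  input_tokens: 50,
  cache_read_input_tokens: 30000,
  cache_creation_input_tokens: 2000,
});

// OpenAI-style accounting over the Anthropic protocol.
const openAiOnAnthropic = normalizePromptTokens({
  input_tokens: 5000,
  cache_read_input_tokens: 4000,
});
```

In the crossover case this yields 72000, where a guard based only on `input_tokens` being at least as large as both cache fields would have returned 40000 and dropped the 32000 cache-creation tokens.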
* chore(core): address PR review — restore isFinite guard and cover cache-field plumbing
- Restore the `isFinite` guard on `lastPromptTokenCount`: the previous
  `if (contextTok)` relaxation accepted `Infinity` (truthy), which a
  malformed provider response could otherwise latch and poison the
  downstream compaction math.
- Add unit coverage for the cache-field plumbing the PR introduced:
  - usage.ts: real-Anthropic warm-turn case where `cache_read > 0` and
    `cache_creation > 0` simultaneously (mid-conversation breakpoint
    advance over an already-cached prefix).
  - converter.ts: `convertAnthropicResponseToGemini` now exercised with
    all three prompt buckets present to confirm both cache fields are
    forwarded to `usageMetadata`.
  - anthropicContentGenerator.ts: streaming pipeline test that includes
    `cache_creation_input_tokens` in `message_start` and asserts the
    accumulated `usageMetadata` carries it through to the final chunk.
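
The difference between the truthy check and the restored `isFinite` guard mentioned in the first bullet can be sketched as follows. `nextPromptTokenCount` is a hypothetical helper, not the repository's code; it only illustrates why `Infinity` must be rejected before the value is latched.

```typescript
// Returns the value to store as lastPromptTokenCount: the new reading when it
// is a usable number, otherwise the previously stored value.
function nextPromptTokenCount(
  previous: number,
  contextTok: number | undefined,
): number {
  // A bare `if (contextTok)` rejects undefined, 0 and NaN but accepts Infinity,
  // so the finiteness check is needed as well.
  if (contextTok && Number.isFinite(contextTok)) {
    return contextTok;
  }
  return previous;
}

const afterGoodReading = nextPromptTokenCount(1000, 1200);
const afterInfinity = nextPromptTokenCount(1000, Infinity);
const afterNaN = nextPromptTokenCount(1000, NaN);
```

With the guard in place, a malformed `Infinity` reading leaves the stored count at its previous value instead of propagating into later arithmetic.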
| | | |
|---|---|---|
| .. | | |
| src | | |
| index.ts | | |
| package.json | | |
| test-setup.ts | | |
| tsconfig.json | | |
| vitest.config.ts | | |