feat(cli): display real-time token consumption during streaming (#2742) (#3329)

* feat(cli): display real-time token consumption during streaming (#2742)

Show ↓/↑ token count in the spinner during model execution:
- ↓ when receiving content, ↑ when waiting for API response
- Accumulates across the whole turn (tool calls don't reset)
- Includes agent/subagent token consumption
- Uses useAnimationFrame hook (50ms polling) to avoid flickering

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix: address review feedback for real-time token display

- Replace unsafe type assertion with proper type guard in Composer
- Fix license header in useAnimationFrame.ts to match project standard
- Clarify tokenCount is replaced (not accumulated) per USAGE_METADATA event
- Use multi-line JSDoc format for isReceivingContent prop
- Improve re-sync comment in useAnimationFrame hook
- Revert unrelated streamingState dep change in AppContainer

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(core): use output-only tokens and accumulate across subagent rounds

Subagent token display had two bugs:
- Used totalTokenCount (input+output) instead of candidatesTokenCount
  (output-only), causing mixed units when aggregated with main stream
- Overwrote tokenCount per round instead of accumulating, so multi-round
  subagents only showed the last round's count

Co-Authored-By: Qwen-Coder <noreply@qwen.ai>
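The two bugs above can be made concrete with a small sketch (illustrative names and shapes, not the actual core implementation): each subagent round reports usage metadata, and the fix sums the output-only `candidatesTokenCount` across rounds instead of overwriting with the last round's `totalTokenCount`.

```typescript
// Assumed shape of per-round usage metadata for illustration only.
interface UsageMetadata {
  candidatesTokenCount: number; // output tokens only
  totalTokenCount: number; // input + output
}

// Fixed behavior: accumulate output-only tokens across all rounds.
// The buggy version was effectively `rounds.at(-1)?.totalTokenCount ?? 0`,
// which both mixed units (input+output) and dropped earlier rounds.
function accumulateRounds(rounds: UsageMetadata[]): number {
  return rounds.reduce((sum, r) => sum + r.candidatesTokenCount, 0);
}
```

Summing output-only counts keeps the subagent contribution in the same units as the main stream's running total.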

* fix(cli): smooth token counter animation and include tool args

Interpolate displayed token count toward the real value (3/frame for
small gaps, ~20% for medium, 50 for large) so chunked arrivals like
tool-call args no longer cause visible jumps. Also accumulate tool
call args JSON length into the streaming estimate, matching Claude
Code's input_json_delta handling.

Co-Authored-By: Qwen-Coder <noreply@alibabacloud.com>
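The easing described above can be sketched as a pure per-frame step (the function name and the gap thresholds of 30 and 250 are assumptions for illustration; only the step sizes 3/frame, ~20%, and 50 come from the commit message): the displayed count moves toward the real value at a gap-dependent rate, so a large chunked arrival renders as a fast count-up rather than a jump.

```typescript
// One animation-frame step of the displayed token counter.
// displayed: value currently shown; target: real accumulated count.
function nextDisplayedCount(displayed: number, target: number): number {
  const gap = target - displayed;
  if (gap <= 0) return target; // at or past the target: snap immediately
  if (gap <= 30) return displayed + Math.min(gap, 3); // small gap: 3/frame
  if (gap <= 250) return displayed + Math.ceil(gap * 0.2); // medium: ~20% of gap
  return displayed + 50; // large gap: capped at 50/frame
}
```

Calling this once per poll converges on the target without overshoot, since each branch's step never exceeds the remaining gap or is clamped on the final frame.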

* fix(cli): scope token animation re-renders to LoadingIndicator

The 50ms useAnimationFrame poll lived in Composer, causing its entire
subtree (InputPrompt, Footer, KeyboardShortcuts) to reconcile 20×/sec
during streaming. Combined with the spinner and streamed text deltas,
Ink redrew enough lines to produce visible terminal flicker.

Move the animation hook into LoadingIndicator so only that component
re-renders per frame, and slow polling to 100ms to match the spinner
cadence.

Co-Authored-By: Qwen-Coder <noreply@alibabacloud.com>

* fix: address review nits on token display

1. AgentResultDisplay.tokenCount jsdoc said "(input + output)" but the
   value has been output-only since d393f23df — update the comment so it
   matches the implementation.
2. useAnimationFrame held the previous turn's count in state until the
   next interval tick, briefly flashing stale numbers when a new turn
   reset the ref to 0. Snap displayRef down synchronously on render and
   return Math.min(displayValue, ref.current) so the reset is reflected
   immediately; the interval tick still catches state up afterward.

Co-Authored-By: Qwen-Coder <noreply@alibabacloud.com>
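Point 2 can be illustrated with a React-free model of the hook's state flow (class and method names are invented for the sketch; the real logic lives in useAnimationFrame.ts): interval ticks copy the live ref into state, so state lags the ref by up to one tick, and the render path clamps the stale state against the ref so a reset to 0 is visible immediately.

```typescript
// Minimal model: `tick()` stands in for the 100ms interval callback,
// `render()` for the value the component actually displays.
class TokenDisplay {
  private displayState = 0; // updated only on interval ticks (lags the ref)
  constructor(private ref: { current: number }) {}

  tick(): void {
    // Interval callback: catch state up to the live count.
    this.displayState = this.ref.current;
  }

  render(): number {
    // Snap down synchronously: never show more than the ref currently
    // holds, so a new turn's reset (ref -> 0) cannot flash stale numbers.
    return Math.min(this.displayState, this.ref.current);
  }
}
```

Without the `Math.min` clamp, `render()` would return the previous turn's count until the next interval tick fired.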

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <noreply@qwen.ai>
Co-authored-by: Qwen-Coder <noreply@alibabacloud.com>
qqqys 2026-04-21 17:01:40 +08:00 committed by GitHub
parent 07bd5c41cb
commit c25136f0ef
10 changed files with 255 additions and 13 deletions

@@ -15,7 +15,7 @@ import { useUIState } from '../contexts/UIStateContext.js';
 import { useUIActions } from '../contexts/UIActionsContext.js';
 import { useVimMode } from '../contexts/VimModeContext.js';
 import { useConfig } from '../contexts/ConfigContext.js';
-import { StreamingState } from '../types.js';
+import { StreamingState, type HistoryItemToolGroup } from '../types.js';
 import { ConfigInitDisplay } from '../components/ConfigInitDisplay.js';
 import { FeedbackDialog } from '../FeedbackDialog.js';
 import { t } from '../../i18n/index.js';
@@ -27,17 +27,40 @@ export const Composer = () => {
   const uiActions = useUIActions();
   const { vimEnabled } = useVimMode();
-  const { showAutoAcceptIndicator, sessionStats, taskStartTokens } = uiState;
+  const {
+    showAutoAcceptIndicator,
+    streamingResponseLengthRef,
+    isReceivingContent,
+  } = uiState;
-  const tokens = Object.values(sessionStats.metrics?.models ?? {}).reduce(
-    (acc, model) => ({
-      prompt: acc.prompt + (model.tokens?.prompt ?? 0),
-      candidates: acc.candidates + (model.tokens?.candidates ?? 0),
-    }),
-    { prompt: 0, candidates: 0 },
-  );
+  // Real-time token animation is performed inside LoadingIndicator itself, so
+  // the 100ms polling only re-renders that one component — keeping InputPrompt
+  // and Footer static avoids terminal flicker during streaming.
+  const isStreaming =
+    uiState.streamingState === StreamingState.Responding ||
+    uiState.streamingState === StreamingState.WaitingForConfirmation;
-  const taskTokens = tokens.candidates - taskStartTokens;
+  // Aggregate agent tool tokens from executing tool calls. Only changes when
+  // a subagent reports progress, so it doesn't drive the animation loop.
+  let agentTokens = 0;
+  for (const item of uiState.pendingGeminiHistoryItems ?? []) {
+    if (item.type === 'tool_group') {
+      const toolGroup = item as HistoryItemToolGroup;
+      for (const tool of toolGroup.tools) {
+        const display = tool.resultDisplay;
+        if (
+          typeof display === 'object' &&
+          display !== null &&
+          'type' in display &&
+          display.type === 'task_execution' &&
+          'tokenCount' in display &&
+          typeof display.tokenCount === 'number'
+        ) {
+          agentTokens += display.tokenCount;
+        }
+      }
+    }
+  }

   // State for keyboard shortcuts display toggle
   const [showShortcuts, setShowShortcuts] = useState(false);
@@ -74,7 +97,10 @@ export const Composer = () => {
             : uiState.currentLoadingPhrase
         }
         elapsedTime={uiState.elapsedTime}
-        candidatesTokens={taskTokens}
+        candidatesTokens={agentTokens}
+        streamingCharsRef={streamingResponseLengthRef}
+        isStreaming={isStreaming}
+        isReceivingContent={isReceivingContent}
       />
     )}