mirror of
https://github.com/QwenLM/qwen-code.git
synced 2026-05-03 14:10:43 +00:00
feat(cli,core): LLM-generated summary labels for tool-call batches (#3538)
Some checks are pending
Qwen Code CI / Test-6 (push) Blocked by required conditions
Qwen Code CI / Test-7 (push) Blocked by required conditions
Qwen Code CI / Test-8 (push) Blocked by required conditions
Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions
Qwen Code CI / CodeQL (push) Waiting to run
E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run
E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run
E2E Tests / E2E Test - macOS (push) Waiting to run
Qwen Code CI / Lint (push) Waiting to run
Qwen Code CI / Test (push) Blocked by required conditions
Qwen Code CI / Test-1 (push) Blocked by required conditions
Qwen Code CI / Test-2 (push) Blocked by required conditions
Qwen Code CI / Test-3 (push) Blocked by required conditions
Qwen Code CI / Test-4 (push) Blocked by required conditions
Qwen Code CI / Test-5 (push) Blocked by required conditions
* feat(cli,core): generate tool-use summaries for compact mode
After each tool batch completes, fire a parallel fast-model call to
generate a short git-commit-subject-style label summarizing what the
batch accomplished (e.g. "Read txt files", "Searched in auth/"). In
compact mode the label replaces the generic "Tool × N" header so N
parallel tool calls collapse to a single semantic row.
The fast-model call (~1s) runs fire-and-forget, overlapped with the
next turn's API stream, so there is no perceived latency. Missing
fast model, aborted turns, and model failures all degrade silently to
the existing rendering.
The summary is also emitted as a `tool_use_summary` history entry
with `precedingToolUseIds`, keeping the shape compatible with SDK
clients that want to render collapsed tool views on their own.
Gated by `experimental.emitToolUseSummaries` (default on). Can be
overridden per-session with `QWEN_CODE_EMIT_TOOL_USE_SUMMARIES=0|1`.
The system prompt and truncation rules (300 chars per tool field,
200 chars of trailing assistant text as intent prefix) match the
existing behavior seen in other tools that emit the same message
type, so SDK consumers see a consistent shape across clients.
* fix(core): bound cleanSummary quote-strip regex to avoid ReDoS
CodeQL js/polynomial-redos flagged the /^["'`]+|["'`]+$/g pattern in
cleanSummary because its input comes from an LLM (treated as
uncontrolled). The original regex is anchored and linear in practice,
but tightening the quantifier to {1,10} both satisfies the static
check and caps engine work on pathological model output with a long
run of quotes. Ten opening/closing quotes is well past anything a real
label would produce.
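A minimal sketch of the bounded quote-strip, for illustration. Only the character class and the `{1,10}` bound come from the description above; the helper name `stripWrappingQuotes` is hypothetical, and the real `cleanSummary` does more than this.

```typescript
// Hypothetical helper name; the real cleanSummary performs additional cleanup.
// The quantifier is bounded to {1,10} instead of + so the work done on a long
// run of quotes is capped.
const WRAPPING_QUOTES = /^["'`]{1,10}|["'`]{1,10}$/g;

function stripWrappingQuotes(raw: string): string {
  return raw.trim().replace(WRAPPING_QUOTES, '');
}
```

With this bound, a label wrapped in more than ten quotes on one side keeps the excess characters, which is acceptable because such output would not be a usable label anyway.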
* fix(cli): render tool_use_summary inline so full mode also shows the label
The summary was only visible in compact mode because the full-mode
ToolGroupMessage ignored the compactLabel prop. Compact mode got away
with this because mergeCompactToolGroups triggers refreshStatic(),
which re-renders the merged tool_group with its newly-looked-up
label. Full mode has no such refresh path, so when the fast-model
call resolves *after* the tool_group has been committed to the
append-only <Static>, there is no way to retroactively decorate it.
Switch to rendering `tool_use_summary` as its own inline history item
(a single dim `● <label>` line). New items append cleanly to <Static>,
so the summary flows in naturally once the fast-model call resolves.
Compact mode still replaces the merged tool_group header with the
label and hides the standalone summary line via the `compactMode`
guard.
With this, the feature works under the default `ui.compactMode: false`
— not just the opt-in compact view.
* docs: tool-use-summaries feature guide, settings entry, and design doc
Three new docs matching the existing fast-model feature docs layout:
- docs/users/features/tool-use-summaries.md — user-facing guide
covering full + compact rendering, configuration (settings + env),
failure modes, cost, and cross-links to followup-suggestions.
- docs/users/configuration/settings.md — register the new
experimental.emitToolUseSummaries setting next to the other
fast-model-driven UI settings.
- docs/design/tool-use-summary/tool-use-summary-design.md — deep dive
matching the compact-mode-design.md competitive-analysis style.
Documents the Claude Code port (prompt, truncation, timing, gate),
the deviations (settings layer, default on, cleanSummary, dual
render paths), and the Ink <Static> append-only rationale that
drove the inline full-mode render vs header-replacement split.
* docs: add Recommended pairing section to tool-use-summaries
Full-mode rendering of the summary works, but for small same-type
batches (Read × 3 and similar) the label visibly restates what the
tool lines already show. Pairing with ui.compactMode: true folds
the whole batch into a single labeled row, which is the cleanest
transcript shape once the label is available.
Adds a dedicated section showing the paired settings.json snippet
and explicitly calling out when each mode wins (and when to turn
the feature off instead).
* fix: address review feedback on tool-use summary generation
Addresses multiple issues from @chiga0's review:
Blocking — compact-mode label invisible for single-batch turns.
mergeCompactToolGroups's adjacency-only gating left a trailing
tool_use_summary in the merged result whenever there was no second
batch to merge across. That pushed mergedHistory.length lock-step
with history.length and MainContent's refreshStatic heuristic
(currMLen <= prevMLen) never fired, so Ink's append-only <Static>
never repainted the tool_group with its newly-looked-up label.
Drop tool_use_summary items unconditionally now; gemini_thought
still survives to avoid unnecessary repaints. New tests cover
the single-batch case and the summary-before-user-message case.
Blocking — stale summary appears after Ctrl+C on the next turn.
summarySignal captured the CURRENT turn's AbortController, but the
summary resolves during the NEXT turn's streaming window. The next
turn's submitQuery allocates a fresh controller, so the captured
signal was never aborted — Ctrl+C during the new turn used to let
the previous turn's summary land in the transcript seconds later.
Fix: dedicated per-batch AbortController tracked in a ref set,
aborted eagerly from cancelOngoingRequest; resolve-time check reads
the live abort state and turnCancelledRef.
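The per-batch cancellation idea can be sketched without React as a small registry. This is an illustration under assumptions, not the code in this commit: the class name `SummaryAbortRegistry` and the function `shouldCommitSummary` are hypothetical, and in the CLI the set and the cancelled flag would live in refs.

```typescript
// Framework-free sketch; names are hypothetical.
class SummaryAbortRegistry {
  private readonly inFlight = new Set<AbortController>();

  // Allocate a controller that belongs to one tool batch, not to the turn.
  begin(): AbortController {
    const controller = new AbortController();
    this.inFlight.add(controller);
    return controller;
  }

  // Call when the fast-model promise settles, whatever the outcome.
  finish(controller: AbortController): void {
    this.inFlight.delete(controller);
  }

  // Call from the cancel path (for example a Ctrl+C handler).
  abortAll(): void {
    this.inFlight.forEach((controller) => controller.abort());
    this.inFlight.clear();
  }
}

// Resolve-time guard: read live state rather than a signal captured earlier.
function shouldCommitSummary(
  controller: AbortController,
  turnCancelled: boolean,
): boolean {
  return !controller.signal.aborted && !turnCancelled;
}
```

Because each batch owns its controller, a cancel issued during the next turn still reaches a summary request that started in the previous one.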
High — summarizer input pollution.
geminiTools contained error/cancelled tools; retry-loop warnings
and "Cancelled by user" strings were feeding the fast model.
cleanSummary can only reject error-shaped output, not prevent the
model from hallucinating a plausible label from bad input (the PR's
own tmux screenshot showed "Read txt files · 5 tools" where 4 of
the 5 were prior-retry failures). Filter to status === 'success'
before building the prompt; skip the call entirely if nothing's
left.
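A sketch of the input filter, assuming a simplified tool record. The `CompletedTool` shape and the function name `selectToolsForSummary` are hypothetical; only the `status === 'success'` check and the skip-when-empty rule come from the text above.

```typescript
type ToolStatus = 'success' | 'error' | 'cancelled';

// Simplified, hypothetical record; the real tool objects carry more fields.
interface CompletedTool {
  callId: string;
  name: string;
  status: ToolStatus;
}

// Returns the tools worth summarizing, or null when the call should be skipped.
function selectToolsForSummary(tools: CompletedTool[]): CompletedTool[] | null {
  const succeeded = tools.filter((tool) => tool.status === 'success');
  return succeeded.length > 0 ? succeeded : null;
}
```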
High — unstable label on merged groups.
getCompactLabel iterated all callIds and returned the first hit,
so asynchronous resolution order made the header visibly flip
from SB to SA when batch A resolved after batch B. Lock onto
item.tools[0].callId to keep stable "leading batch governs"
semantics.
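The "leading batch governs" rule can be illustrated as follows. The function name `getCompactLabel` and `item.tools[0].callId` come from the text above; the parameter list, the `ToolGroupItem` shape, and keying labels by call ID in a `Map` are assumptions made for this sketch.

```typescript
// Hypothetical, simplified shape of a merged tool group.
interface ToolGroupItem {
  tools: Array<{ callId: string }>;
}

// Looks up the label only through the first tool's callId, so a later batch
// resolving first cannot take over the header.
function getCompactLabel(
  item: ToolGroupItem,
  labelsByCallId: Map<string, string>,
): string | undefined {
  const leading = item.tools[0];
  return leading ? labelsByCallId.get(leading.callId) : undefined;
}
```

Until the leading batch resolves, the sketch returns `undefined` and the caller would fall back to the default header, rather than showing another batch's label and flipping later.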
High — force-expanded groups in compact mode had no label at all.
Compact mode routes non-force-expand groups through
CompactToolGroupDisplay (consumes compactLabel) and force-expand
groups through the full ToolGroupMessage (ignores compactLabel);
the standalone ● line was gated on !compactMode, creating a dead
zone — exactly the diagnostically valuable case. MainContent now
computes absorbedCallIds (which groups actually consume the
header replacement) and passes summaryAbsorbed to
HistoryItemDisplay; force-expand groups in compact mode get the
standalone line as the label's only path to the screen.
Medium — cleanSummary robustness.
Extend quote-strip to Unicode curly + CJK corner brackets; strip
markdown emphasis (**bold**, _italic_); broaden refusal-prefix
rejection to curly-apostrophe "I can't", Chinese "我无法 / 我不能 /
抱歉 / 无法", and "Failed to / Sorry, / Request failed". 7 new
cleanSummary tests cover the added cases.
Low — concurrent-rendering safety.
Move historyRef.current = history from render phase into
useLayoutEffect so bailed renders can't leave a dropped value.
Low — CompactToolGroupDisplay readability.
Extract renderSummaryHeader / renderDefaultHeader helpers and
document the toolCalls.length > 1 count-suffix guard so a future
"fix" to >= 1 doesn't reintroduce "Read config.json · 1 tools".
Docs — add Scope & Lifecycle section to tool-use-summaries.md
covering (1) one generation per batch shared by both modes,
(2) no backfill on toggle / session resume, (3) main-agent batches
only with the Task-tool clarification.
* fix: address second-round review feedback on tool-use summaries
Critical — force-expand groups lost their summary entirely.
Previous round's "drop tool_use_summary unconditionally" merge fix
also stripped summaries for force-expanded groups, defeating the
exact case (errors, confirmations, focused shell) where the
standalone ● label is the label's only path to the screen. The
merge function now takes an absorbedCallIds set: summaries whose
preceding callIds are all absorbed by a compact tool_group header
are dropped (so refreshStatic still fires), but force-expanded
summaries pass through to be rendered standalone by
HistoryItemDisplay. MainContent computes absorbedCallIds from raw
history and passes it in. New tests cover both the absorbed-drop
and the force-expand-preserve cases plus the empty-set default
for callers that don't compute absorption.
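A sketch of the absorbed-versus-standalone split, under assumptions. The `HistoryItem` union and the function name `dropAbsorbedSummaries` are hypothetical simplifications; the rule that a summary is dropped only when all of its preceding call IDs are absorbed comes from the text above. Treating the empty-set default as "keep every summary" is also an assumption of this sketch.

```typescript
// Hypothetical, simplified history shapes.
type HistoryItem =
  | { type: 'tool_group'; callIds: string[] }
  | { type: 'tool_use_summary'; summary: string; precedingToolUseIds: string[] }
  | { type: 'other' };

function dropAbsorbedSummaries(
  history: HistoryItem[],
  absorbedCallIds: ReadonlySet<string> = new Set<string>(),
): HistoryItem[] {
  return history.filter((item) => {
    if (item.type !== 'tool_use_summary') return true;
    const ids = item.precedingToolUseIds;
    // Dropped only when every preceding call is shown via a compact header.
    const absorbed = ids.length > 0 && ids.every((id) => absorbedCallIds.has(id));
    return !absorbed;
  });
}
```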
Suggestion — late-arriving summaries could land out of order.
A slow fast-model call could resolve after the next turn's
content was committed, planting the ● label between later items
in full mode. The resolve callback now captures the first batch
callId, locates the corresponding tool_group at resolve time,
and drops the summary if a newer tool_group has already appeared
in history. New test exercises this with a manually-resolved
fast-model promise.
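The ordering check can be sketched as a pure function over the history at resolve time. The `Entry` shape and the name `isSummaryStale` are hypothetical, and dropping the summary when its tool group cannot be found is an assumption of this sketch rather than something the commit states.

```typescript
// Hypothetical, simplified history entry.
interface Entry {
  type: string;
  callIds?: string[];
}

// True when the summary should be dropped instead of appended.
function isSummaryStale(history: Entry[], firstCallId: string): boolean {
  const groupIndex = history.findIndex(
    (entry) =>
      entry.type === 'tool_group' &&
      (entry.callIds ?? []).indexOf(firstCallId) !== -1,
  );
  // Assumption: a missing group is treated as stale.
  if (groupIndex === -1) return true;
  // A newer tool_group after ours means the label would land out of order.
  return history
    .slice(groupIndex + 1)
    .some((entry) => entry.type === 'tool_group');
}
```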
Suggestion — truncateJson allocated full JSON for large strings.
A 10MB ReadFile result was being JSON.stringify'd in full only to
be sliced down to 300 chars. Added preTruncate that walks the
value (depth-bounded to 4) and slices string leaves to maxLength
before serialization. Tests verify the input never reaches its
full pre-cap form.
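A sketch of pre-truncation before serialization. The names `preTruncate` and `truncateJson`, the depth bound of 4, and the 300-character cap come from the text above; the signatures and the `'[truncated]'` placeholder used below the depth bound are assumptions.

```typescript
const MAX_DEPTH = 4;

// Slices string leaves to maxLength so JSON.stringify never sees the full value.
function preTruncate(value: unknown, maxLength: number, depth = 0): unknown {
  if (typeof value === 'string') {
    return value.length > maxLength ? value.slice(0, maxLength) : value;
  }
  if (value === null || typeof value !== 'object') return value;
  // Assumption: containers at the depth bound collapse to a placeholder.
  if (depth >= MAX_DEPTH) return '[truncated]';
  if (Array.isArray(value)) {
    return value.map((child) => preTruncate(child, maxLength, depth + 1));
  }
  const source = value as Record<string, unknown>;
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(source)) {
    out[key] = preTruncate(source[key], maxLength, depth + 1);
  }
  return out;
}

function truncateJson(value: unknown, maxLength = 300): string {
  // JSON.stringify returns undefined for some inputs (e.g. undefined itself).
  const json = JSON.stringify(preTruncate(value, maxLength)) ?? '';
  return json.slice(0, maxLength);
}
```

The final `slice` is still needed because keys, punctuation, and several capped leaves together can exceed the cap.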
Suggestion — settings description over-claimed SDK emission.
The description said summaries are emitted to SDK clients as a
tool_use_summary message; the SDK plumbing isn't actually wired
in this PR (the factory is exported for follow-up). Updated
settings.json description and regenerated the vscode schema to
state CLI-only scope explicitly.
Suggestion — fastModel data-boundary not documented.
When fastModel uses a different provider than the main session
model, tool inputs/outputs cross a new auth boundary that users
may not expect. Added "Data flow & privacy" section to the user
feature doc spelling out: same-provider fast model = no scope
change; different-provider = strictly larger sharing scope; two
escape hatches (same-provider fast model OR feature off).
Code-level mitigation (metadata-only mode) deferred.
parent 7fe853a782
commit f420742831
22 changed files with 2104 additions and 24 deletions
@ -117,6 +117,7 @@ Settings are organized into categories. All settings should be placed within the
| `ui.enableFollowupSuggestions` | boolean | Enable [followup suggestions](../features/followup-suggestions) that predict what you want to type next after the model responds. Suggestions appear as ghost text and can be accepted with Tab, Enter, or Right Arrow. | `true` |
| `ui.enableCacheSharing` | boolean | Use cache-aware forked queries for suggestion generation. Reduces cost on providers that support prefix caching (experimental). | `true` |
| `ui.enableSpeculation` | boolean | Speculatively execute accepted suggestions before submission. Results appear instantly when you accept (experimental). | `false` |
| `experimental.emitToolUseSummaries` | boolean | Generate short LLM-based labels summarizing each tool-call batch. See [Tool-Use Summaries](../features/tool-use-summaries). Requires `fastModel` to be configured; silently skipped otherwise. Can be overridden per-session with `QWEN_CODE_EMIT_TOOL_USE_SUMMARIES=0` or `=1`. | `true` |

#### ide

@ -2,6 +2,7 @@ export default {
  commands: 'Commands',
  'code-review': 'Code Review',
  'followup-suggestions': 'Followup Suggestions',
  'tool-use-summaries': 'Tool-Use Summaries',
  'sub-agents': 'SubAgents',
  arena: 'Agent Arena',
  skills: 'Skills',

178
docs/users/features/tool-use-summaries.md
Normal file
@ -0,0 +1,178 @@
# Tool-Use Summaries

Qwen Code can generate a short, git-commit-subject-style label after each tool batch completes, summarizing what the batch accomplished. The label appears inline in the transcript and replaces the generic `Tool × N` header in compact mode.

This is a UX aid for parallel tool calls: when the model fans out into several `Read` + `Grep` + `Bash` calls at once, the summary tells you the intent at a glance instead of forcing you to scan the tool list.

The feature is enabled by default and runs silently in the background. It requires a configured [fast model](./followup-suggestions#fast-model).

## What You See

### Full mode (default)

The summary appears as a dim badge line directly below the tool group:

```
╭──────────────────────────────────────────────╮
│ ✓ ReadFile a.txt                             │
│ ✓ ReadFile b.txt                             │
│ ✓ ReadFile c.txt                             │
│ ✓ ReadFile d.txt                             │
╰──────────────────────────────────────────────╯

● Read 4 text files
```

### Compact mode (`Ctrl+O` or `ui.compactMode: true`)

The label replaces the generic `Tool × N` header in the compact one-liner:

```
╭──────────────────────────────────────────────╮
│✓ Read txt files · 4 tools                    │
│Press Ctrl+O to show full tool output         │
╰──────────────────────────────────────────────╯
```

The individual tool calls are still a keystroke away (`Ctrl+O` to toggle to full mode).

## How It Works

After a tool batch finalizes, Qwen Code fires a fire-and-forget call to the configured fast model with:

- The tool names, truncated arguments, and truncated results (each capped at 300 characters).
- The assistant's most recent text output (first 200 characters) as an intent prefix.
- A system prompt instructing the model to return a past-tense, 30-character label in git-commit-subject style.

The call runs in parallel with the next turn's API streaming, so its ~1s latency is hidden behind the main model's response. When the label resolves, it is appended to the transcript as a `tool_use_summary` entry.

Example labels: `Searched in auth/`, `Fixed NPE in UserService`, `Created signup endpoint`, `Read config.json`, `Ran failing tests`.

## When It Appears

The summary is generated when **all** of the following are true:

- `experimental.emitToolUseSummaries` is `true` (default).
- A `fastModel` is configured (via settings or `/model --fast`).
- At least one tool in the batch completed successfully (failed and cancelled calls are excluded from the summarizer input).
- The turn was not aborted before tool completion.
- The fast model returned a non-empty, non-error response.

Subagent tool calls do not trigger summary generation — only the main session's tool batches do.

## When It Doesn't Appear

The summary is silently skipped (no error, no UI change) when:

- No fast model is configured.
- The fast model call fails, times out, or returns empty.
- The model returned an obvious error-message-like string (e.g., `Error: ...`, `I cannot ...`) — filtered out by the client so the UI does not show misleading labels.
- The turn was aborted (`Ctrl+C`) before the model finished.

In all these cases, the tool group renders as it always has.

## Fast Model

The label is generated using the [fast model](./followup-suggestions#fast-model) — the same model you configure for prompt suggestions and speculative execution. Configure it via:

### Via command

```
/model --fast qwen3-coder-flash
```

### Via `settings.json`

```json
{
  "fastModel": "qwen3-coder-flash"
}
```

When no fast model is configured, summary generation is skipped entirely — the feature has no effect until you set one up.

## Configuration

These settings can be configured in `settings.json`:

| Setting                             | Type    | Default | Description                                                                                        |
| ----------------------------------- | ------- | ------- | -------------------------------------------------------------------------------------------------- |
| `experimental.emitToolUseSummaries` | boolean | `true`  | Master switch for summary generation. Turn off to disable the extra fast-model call.               |
| `fastModel`                         | string  | `""`    | Fast model used for summary generation (shared with prompt suggestions). Required; no-op if empty. |

### Environment override

`QWEN_CODE_EMIT_TOOL_USE_SUMMARIES` overrides the `experimental.emitToolUseSummaries` setting for the current session:

- `QWEN_CODE_EMIT_TOOL_USE_SUMMARIES=0` or `=false` — force off.
- `QWEN_CODE_EMIT_TOOL_USE_SUMMARIES=1` or `=true` — force on.
- Unset — use the `experimental.emitToolUseSummaries` setting.
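
The precedence described in this list can be sketched as a small resolver. The function name `resolveEmitToolUseSummaries` is hypothetical, and falling back to the setting for unrecognized values is an assumption of this sketch, since only the four listed values and the unset case are documented here.

```typescript
// Hypothetical helper; the environment value is passed in so the sketch
// does not depend on any particular runtime.
function resolveEmitToolUseSummaries(
  settingValue: boolean,
  envValue: string | undefined,
): boolean {
  switch (envValue) {
    case '0':
    case 'false':
      return false; // force off
    case '1':
    case 'true':
      return true; // force on
    default:
      // Unset, or (by assumption) any unrecognized value: use the setting.
      return settingValue;
  }
}
```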

### Example

```json
{
  "fastModel": "qwen3-coder-flash",
  "experimental": {
    "emitToolUseSummaries": true
  }
}
```

## Scope & lifecycle

Three points that tend to trip up a first read of this feature:

1. **One generation per batch, shared by both display modes.** The fast-model call happens exactly once in `handleCompletedTools` when a tool batch finalizes. Toggling `Ctrl+O` afterwards does **not** trigger a new call — both modes read from the same `tool_use_summary` history entry that was captured the first time. You can flip compact mode on and off freely without extra cost.
2. **No backfill on toggle or on session resume.** A `tool_group` that completed before the feature was enabled (or before you flipped the setting on, or in a resumed session — `ChatRecordingService` does not persist summary entries) will never get a label. There is no "sweep existing history" pass. If you turn this setting on mid-session, only _future_ batches will show a label; older groups keep the default rendering with no indicator that a label is missing.
3. **Main-agent batches only.** The trigger lives in the main session's turn loop (`useGeminiStream`), so:
   - ✅ Shell, MCP, file operations, and the `Task` / subagent tool _call itself_ (as it appears in the main batch) are summarized.
   - ❌ A subagent's **internal** tool batches (run through `packages/core/src/agents/runtime/`) are not summarized.

   An outer batch that _contains_ a `Task` tool will still be labeled, but the fast model sees only the subagent tool call and its aggregated output — not the individual tool calls inside the subagent. Expect labels like `Ran research-agent` or `Delegated file search` rather than `Searched 14 files`. This is intentional — summarizing subagent internals would multiply the fast-model cost and surface noise that never shows up in the primary UI.

## Recommended pairing: enable compact mode

For batches of 3+ parallel tool calls, pairing this feature with `ui.compactMode: true` produces the cleanest transcript. The compact view folds the whole batch into a single labeled row (`✓ Read txt files · 4 tools`) instead of showing every tool line plus the trailing summary. Details remain one keystroke away via `Ctrl+O`.

```json
{
  "fastModel": "qwen3-coder-flash",
  "ui": {
    "compactMode": true
  },
  "experimental": {
    "emitToolUseSummaries": true
  }
}
```

In full mode (the default), the summary renders as a trailing `● <label>` line below the tool group — useful for large or heterogeneous batches, but for small same-type batches (e.g. `Read × 3`) the label can read as a restatement of the visible tool lines. If that matches your usual workflow, either turn compact mode on as above, or turn the summary off entirely via `experimental.emitToolUseSummaries: false`.

## Monitoring

Summary model usage appears in `/stats` output under the fast-model token totals, with the `prompt_id` `tool_use_summary_generation` so it can be distinguished from prompt suggestions and other background tasks.

## Data flow & privacy

The summary call sends each successful tool's name, truncated `args`, and truncated result (each field capped at 300 characters) to the **fast model**, plus the first 200 characters of the assistant's most recent text as an intent prefix.

If your fast model is configured for the same provider/auth as your main session model, the data flows along the same boundary your main session already uses, with no change in trust scope. If you have configured a fast model from a **different provider**, tool inputs and outputs (potentially including file contents read by `read_file`, command output from shell calls, or values surfaced through MCP tools) will be sent to that other provider as part of the summarization prompt. That is a strictly larger data-sharing scope than the main session alone.

If this matters for your workflow, you have two clean options:

- Configure `fastModel` to a model under the same provider as your main session, so the summary call doesn't cross any new auth/data boundary.
- Disable the feature entirely with `experimental.emitToolUseSummaries: false` (or `QWEN_CODE_EMIT_TOOL_USE_SUMMARIES=0`).

The 300-character per-field cap limits exposure but does not eliminate it: a secret that appears within the retained 300 characters of a tool's arguments or output is still sent. Treat the fast model's data boundary the same way you treat the main model's.

## Cost

One fast-model call per qualifying tool batch. Input is a small fixed system prompt plus the truncated tool inputs/outputs (each capped at 300 characters per field). Output is a single short line (capped at 100 characters, typically 20 tokens or fewer). On a typical fast model this is roughly $0.001 per batch.

If you do not want the extra cost, turn the feature off via `experimental.emitToolUseSummaries: false` or `QWEN_CODE_EMIT_TOOL_USE_SUMMARIES=0`.

## Related

- [Compact Mode](../configuration/settings#ui.compactMode) — toggle with `Ctrl+O`; the summary replaces the generic tool-group header when compact mode is on.
- [Followup Suggestions](./followup-suggestions) — another fast-model-driven UX enhancement that shares the same `fastModel` setting.