Mirror of https://github.com/QwenLM/qwen-code.git (synced 2026-05-05 23:42:03 +00:00)
feat(cli,core): LLM-generated summary labels for tool-call batches (#3538)
* feat(cli,core): generate tool-use summaries for compact mode
After each tool batch completes, fire a parallel fast-model call to
generate a short git-commit-subject-style label summarizing what the
batch accomplished (e.g. "Read txt files", "Searched in auth/"). In
compact mode the label replaces the generic "Tool × N" header so N
parallel tool calls collapse to a single semantic row.
The fast-model call (~1s) runs fire-and-forget, overlapped with the
next turn's API stream, so there is no perceived latency. Missing
fast model, aborted turns, and model failures all degrade silently to
the existing rendering.
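A minimal sketch of the fire-and-forget shape described above. `FastModel` and its `generate(prompt, signal)` method are hypothetical stand-ins, not the qwen-code client API. The point is that every failure path resolves to `undefined` instead of rejecting, so the caller simply keeps the default rendering.

```typescript
// Hypothetical fast-model client; the real qwen-code API differs.
interface FastModel {
  generate(prompt: string, signal: AbortSignal): Promise<string>;
}

// Never rejects: a missing model, an aborted turn, or a model error all
// resolve to undefined so the UI falls back to the existing header.
async function generateToolUseSummary(
  fastModel: FastModel | undefined,
  prompt: string,
  signal: AbortSignal,
): Promise<string | undefined> {
  if (!fastModel || signal.aborted) return undefined;
  try {
    const label = (await fastModel.generate(prompt, signal)).trim();
    if (signal.aborted || label.length === 0) return undefined;
    return label;
  } catch {
    return undefined;
  }
}

// Fire-and-forget: not awaited, so the next turn's stream starts immediately.
function scheduleToolUseSummary(
  fastModel: FastModel | undefined,
  prompt: string,
  signal: AbortSignal,
  onLabel: (label: string) => void,
): void {
  void generateToolUseSummary(fastModel, prompt, signal).then((label) => {
    if (label !== undefined) onLabel(label);
  });
}
```

Because the helper swallows errors itself, the call site never needs a `try` or an `await`, which is what keeps the summary off the critical path.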
The summary is also emitted as a `tool_use_summary` history entry
with `precedingToolUseIds`, keeping the shape compatible with SDK
clients that want to render collapsed tool views on their own.
Gated by `experimental.emitToolUseSummaries` (default on). Can be
overridden per-session with `QWEN_CODE_EMIT_TOOL_USE_SUMMARIES=0|1`.
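One way the precedence could be resolved, as a sketch: the env var wins when it is exactly `0` or `1`; otherwise the setting applies, defaulting to `true` as in `loadCliConfig`. The function name is hypothetical, and treating any other env value as unset is an assumption.

```typescript
// Hypothetical resolver: QWEN_CODE_EMIT_TOOL_USE_SUMMARIES=0|1 overrides the
// experimental.emitToolUseSummaries setting, which defaults to true.
function resolveEmitToolUseSummaries(
  settingValue: boolean | undefined,
  env: Record<string, string | undefined>,
): boolean {
  const override = env["QWEN_CODE_EMIT_TOOL_USE_SUMMARIES"];
  if (override === "0") return false;
  if (override === "1") return true;
  // Assumption: any other value is ignored and the setting applies.
  return settingValue ?? true;
}
```

In a Node process this would be called as `resolveEmitToolUseSummaries(settings.experimental?.emitToolUseSummaries, process.env)`.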
The system prompt and truncation rules (300 chars per tool field,
200 chars of trailing assistant text as intent prefix) match the
existing behavior seen in other tools that emit the same message
type, so SDK consumers see a consistent shape across clients.
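A sketch of those truncation rules. The 300 and 200 character limits come from the text above. The `ToolRecord` shape, the prompt layout, and the trailing `…` marker are assumptions for illustration, not the actual prompt.

```typescript
const MAX_TOOL_FIELD_CHARS = 300;
const MAX_INTENT_CHARS = 200;

// Hypothetical per-tool record for this sketch.
interface ToolRecord {
  name: string;
  input: unknown;
  output: unknown;
}

// Serialize a field and cut it at maxLength, marking the cut with an ellipsis.
function truncateField(value: unknown, maxLength: number): string {
  const text = typeof value === "string" ? value : (JSON.stringify(value) ?? "");
  return text.length > maxLength ? `${text.slice(0, maxLength)}…` : text;
}

function buildSummaryUserPrompt(tools: ToolRecord[], assistantText: string): string {
  // Only the trailing assistant text is kept: it usually states the batch's intent.
  const intent = assistantText.slice(-MAX_INTENT_CHARS);
  const toolLines = tools.map(
    (t) =>
      `Tool: ${t.name}\n` +
      `Input: ${truncateField(t.input, MAX_TOOL_FIELD_CHARS)}\n` +
      `Output: ${truncateField(t.output, MAX_TOOL_FIELD_CHARS)}`,
  );
  const prefix = intent ? `Assistant intent: ${intent}\n\n` : "";
  return prefix + toolLines.join("\n\n");
}
```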
* fix(core): bound cleanSummary quote-strip regex to avoid ReDoS
CodeQL js/polynomial-redos flagged the /^["'`]+|["'`]+$/g pattern in
cleanSummary because its input comes from an LLM (treated as
uncontrolled). The original regex is anchored and linear in practice,
but tightening the quantifier to {1,10} both satisfies the static
check and caps engine work on pathological model output with a long
run of quotes. Ten opening/closing quotes is well past anything a real
label would produce.
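For reference, the change only swaps the unbounded `+` for `{1,10}` on both anchored alternatives. A sketch of the quote-strip step in isolation (the function name is hypothetical; the real `cleanSummary` does more):

```typescript
// Before: /^["'`]+|["'`]+$/g  (flagged by CodeQL js/polynomial-redos)
// After: each alternative matches at most 10 quote characters.
const QUOTE_STRIP = /^["'`]{1,10}|["'`]{1,10}$/g;

function stripWrappingQuotes(label: string): string {
  return label.trim().replace(QUOTE_STRIP, "").trim();
}
```

A label wrapped in more than ten quotes keeps the excess, which is harmless for real model output and bounds the engine's work.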
* fix(cli): render tool_use_summary inline so full mode also shows the label
The summary was only visible in compact mode because the full-mode
ToolGroupMessage ignored the compactLabel prop. Compact mode got away
with this because mergeCompactToolGroups triggers refreshStatic(),
which re-renders the merged tool_group with its newly-looked-up
label. Full mode has no such refresh path, so when the fast-model
call resolves *after* the tool_group has been committed to the
append-only <Static>, there is no way to retroactively decorate it.
Switch to rendering `tool_use_summary` as its own inline history item
(a single dim `● <label>` line). New items append cleanly to <Static>,
so the summary flows in naturally once the fast-model call resolves.
Compact mode still replaces the merged tool_group header with the
label and hides the standalone summary line via the `compactMode`
guard.
With this, the feature works under the default `ui.compactMode: false`
— not just the opt-in compact view.
* docs: tool-use-summaries feature guide, settings entry, and design doc
Three new docs matching the existing fast-model feature docs layout:
- docs/users/features/tool-use-summaries.md — user-facing guide
covering full + compact rendering, configuration (settings + env),
failure modes, cost, and cross-links to followup-suggestions.
- docs/users/configuration/settings.md — register the new
experimental.emitToolUseSummaries setting next to the other
fast-model-driven UI settings.
- docs/design/tool-use-summary/tool-use-summary-design.md — deep dive
matching the compact-mode-design.md competitive-analysis style.
Documents the Claude Code port (prompt, truncation, timing, gate),
the deviations (settings layer, default on, cleanSummary, dual
render paths), and the Ink <Static> append-only rationale that
drove the inline full-mode render vs header-replacement split.
* docs: add Recommended pairing section to tool-use-summaries
Full-mode rendering of the summary works, but for small same-type
batches (Read × 3 and similar) the label visibly restates what the
tool lines already show. Pairing with ui.compactMode: true folds
the whole batch into a single labeled row, which is the cleanest
transcript shape once the label is available.
Adds a dedicated section showing the paired settings.json snippet
and explicitly calling out when each mode wins (and when to turn
the feature off instead).
* fix: address review feedback on tool-use summary generation
Addresses multiple issues from @chiga0's review:
Blocking — compact-mode label invisible for single-batch turns.
mergeCompactToolGroups's adjacency-only gating left a trailing
tool_use_summary in the merged result whenever there was no second
batch to merge across. That kept mergedHistory.length growing in
lockstep with history.length, so MainContent's refreshStatic heuristic
(currMLen <= prevMLen) never fired and Ink's append-only <Static>
never repainted the tool_group with its newly-looked-up label.
Drop tool_use_summary items unconditionally now; gemini_thought
still survives to avoid unnecessary repaints. New tests cover
the single-batch case and the summary-before-user-message case.
Blocking — stale summary appears after Ctrl+C on the next turn.
summarySignal captured the CURRENT turn's AbortController, but the
summary resolves during the NEXT turn's streaming window. The next
turn's submitQuery allocates a fresh controller, so the captured
signal was never aborted — Ctrl+C during the new turn used to let
the previous turn's summary land in the transcript seconds later.
Fix: dedicated per-batch AbortController tracked in a ref set,
aborted eagerly from cancelOngoingRequest; resolve-time check reads
the live abort state and turnCancelledRef.
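A minimal sketch of that lifecycle. Plain module variables stand in for the React refs, and `startBatchSummary` / `abortAllSummaries` are hypothetical names. Each batch gets its own controller, the cancel path aborts all of them, and the resolve path checks the live state before appending.

```typescript
// Stand-ins for the hook's refs: a Set of per-batch controllers and the
// turn-cancelled flag (turnCancelledRef in the real hook).
const summaryControllers = new Set<AbortController>();
let turnCancelled = false;

// Each batch gets its OWN controller, so cancelling a later turn can still
// reach a summary that was started during an earlier one.
function startBatchSummary(
  generate: (signal: AbortSignal) => Promise<string | undefined>,
  append: (label: string) => void,
): Promise<void> {
  const controller = new AbortController();
  summaryControllers.add(controller);
  return generate(controller.signal)
    .then((label) => {
      // Check the live abort state at resolve time, not a captured snapshot.
      if (label && !controller.signal.aborted && !turnCancelled) append(label);
    })
    .catch(() => {
      // Failures degrade silently.
    })
    .finally(() => {
      summaryControllers.delete(controller);
    });
}

// Hypothetical helper the cancel path would call: aborts every in-flight summary.
function abortAllSummaries(): void {
  turnCancelled = true;
  for (const controller of summaryControllers) controller.abort();
  summaryControllers.clear();
}
```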
High — summarizer input pollution.
geminiTools contained error/cancelled tools; retry-loop warnings
and "Cancelled by user" strings were feeding the fast model.
cleanSummary can only reject error-shaped output, not prevent the
model from hallucinating a plausible label from bad input (the PR's
own tmux screenshot showed "Read txt files · 5 tools" where 4 of
the 5 were prior-retry failures). Filter to status === 'success'
before building the prompt; skip the call entirely if nothing's
left.
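The filter step as a sketch, assuming a simplified status union (the real tracked tool-call types have more states and fields):

```typescript
type ToolStatus = "success" | "error" | "cancelled";

interface CompletedTool {
  callId: string;
  name: string;
  status: ToolStatus;
}

// Only successful calls feed the summarizer. Returns undefined when nothing
// is left, signalling that the fast-model call should be skipped entirely.
function toolsForSummary(batch: CompletedTool[]): CompletedTool[] | undefined {
  const ok = batch.filter((t) => t.status === "success");
  return ok.length > 0 ? ok : undefined;
}
```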
High — unstable label on merged groups.
getCompactLabel iterated all callIds and returned the first hit,
so asynchronous resolution order made the header visibly flip
from batch B's summary (SB) to batch A's (SA) when A resolved after
B. Lock onto item.tools[0].callId for stable "leading batch governs"
semantics.
High — force-expanded groups in compact mode had no label at all.
Compact mode routes non-force-expand groups through
CompactToolGroupDisplay (consumes compactLabel) and force-expand
groups through the full ToolGroupMessage (ignores compactLabel);
the standalone ● line was gated on !compactMode, creating a dead
zone — exactly the diagnostically valuable case. MainContent now
computes absorbedCallIds (which groups actually consume the
header replacement) and passes summaryAbsorbed to
HistoryItemDisplay; force-expand groups in compact mode get the
standalone line as the label's only path to the screen.
Medium — cleanSummary robustness.
Extend quote-strip to Unicode curly + CJK corner brackets; strip
markdown emphasis (**bold**, _italic_); broaden refusal-prefix
rejection to curly-apostrophe "I can't", Chinese "我无法 / 我不能 /
抱歉 / 无法", and "Failed to / Sorry, / Request failed". 7 new
cleanSummary tests cover the added cases.
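A sketch of the broadened cleanup, using the quote characters and refusal prefixes listed above. Stripping emphasis only when it wraps the whole label is a simplifying assumption of this sketch, and the real character set and prefix list in `cleanSummary` may differ.

```typescript
// Straight, curly, and CJK corner-bracket quotes.
const QUOTES = "\"'`\u201C\u201D\u2018\u2019\u300C\u300D\u300E\u300F";
// Leading or trailing quote runs, each bounded to 10 chars (see the ReDoS fix).
const QUOTE_RUN = new RegExp(`^[${QUOTES}]{1,10}|[${QUOTES}]{1,10}$`, "g");

// Refusal / error-shaped prefixes named in the review round.
const REJECT_PREFIXES = [
  "I can't",
  "I can\u2019t",
  "Failed to",
  "Sorry,",
  "Request failed",
  "我无法",
  "我不能",
  "抱歉",
  "无法",
];

function cleanSummarySketch(raw: string): string | undefined {
  let s = raw.trim();
  // Strip **bold** / _italic_ only when it wraps the whole label (assumption).
  s = s.replace(/^\*\*(.*)\*\*$/, "$1").replace(/^_(.*)_$/, "$1");
  s = s.replace(QUOTE_RUN, "").trim();
  if (s.length === 0) return undefined;
  if (REJECT_PREFIXES.some((p) => s.startsWith(p))) return undefined;
  return s;
}
```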
Low — concurrent-rendering safety.
Move historyRef.current = history out of the render phase into
useLayoutEffect, so a render that React discards can't leave the ref
pointing at history that was never committed.
Low — CompactToolGroupDisplay readability.
Extract renderSummaryHeader / renderDefaultHeader helpers and
document the toolCalls.length > 1 count-suffix guard so a future
"fix" to >= 1 doesn't reintroduce "Read config.json · 1 tools".
Docs — add Scope & Lifecycle section to tool-use-summaries.md
covering (1) one generation per batch shared by both modes,
(2) no backfill on toggle / session resume, (3) main-agent batches
only with the Task-tool clarification.
* fix: address second-round review feedback on tool-use summaries
Critical — force-expand groups lost their summary entirely.
Previous round's "drop tool_use_summary unconditionally" merge fix
also stripped summaries for force-expanded groups, defeating the
exact case (errors, confirmations, focused shell) where the
standalone ● label is the label's only path to the screen. The
merge function now takes an absorbedCallIds set: summaries whose
preceding callIds are all absorbed by a compact tool_group header
are dropped (so refreshStatic still fires), but force-expanded
summaries pass through to be rendered standalone by
HistoryItemDisplay. MainContent computes absorbedCallIds from raw
history and passes it in. New tests cover both the absorbed-drop
and the force-expand-preserve cases plus the empty-set default
for callers that don't compute absorption.
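A simplified sketch of just the summary-dropping rule. The `Item` union is a stand-in for the real `HistoryItem`, and the actual `mergeCompactToolGroups` also merges adjacent tool_groups, which this sketch omits. Treating an empty id list as not absorbed is an assumption.

```typescript
// Simplified history item shapes for this sketch only.
type Item =
  | { type: "tool_group"; tools: { callId: string }[] }
  | { type: "tool_use_summary"; summary: string; precedingToolUseIds: string[] }
  | { type: "user"; text: string };

// A summary is absorbed when every call it covers renders under a compact header.
function isAbsorbed(ids: string[], absorbedCallIds: ReadonlySet<string>): boolean {
  return ids.length > 0 && ids.every((id) => absorbedCallIds.has(id));
}

// Drop absorbed summaries (so refreshStatic can fire); keep the rest so they
// render as standalone `● <label>` lines. The empty-set default keeps every summary.
function dropAbsorbedSummaries(
  items: Item[],
  absorbedCallIds: ReadonlySet<string> = new Set<string>(),
): Item[] {
  return items.filter(
    (item) =>
      item.type !== "tool_use_summary" ||
      !isAbsorbed(item.precedingToolUseIds, absorbedCallIds),
  );
}
```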
Suggestion — late-arriving summaries could land out of order.
A slow fast-model call could resolve after the next turn's
content was committed, planting the ● label between later items
in full mode. The resolve callback now captures the first batch
callId, locates the corresponding tool_group at resolve time,
and drops the summary if a newer tool_group has already appeared
in history. New test exercises this with a manually-resolved
fast-model promise.
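A sketch of the resolve-time ordering check, again with a simplified history shape. It finds the tool_group that holds the batch's first callId and reports the summary as stale if any tool_group appears after it. What happens when the group is missing entirely is not stated above; dropping in that case is an assumption of this sketch.

```typescript
type HistoryEntry =
  | { type: "tool_group"; tools: { callId: string }[] }
  | { type: "other" };

// Returns true when a late-arriving summary should be discarded.
function isSummaryStale(history: HistoryEntry[], firstCallId: string): boolean {
  const groupIndex = history.findIndex(
    (h) => h.type === "tool_group" && h.tools.some((t) => t.callId === firstCallId),
  );
  // Assumption: if the batch's group is gone, there is nothing to attach to.
  if (groupIndex === -1) return true;
  // A newer tool_group means later content was already committed to <Static>.
  return history.slice(groupIndex + 1).some((h) => h.type === "tool_group");
}
```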
Suggestion — truncateJson allocated full JSON for large strings.
A 10MB ReadFile result was being JSON.stringify'd in full only to
be sliced down to 300 chars. Added preTruncate that walks the
value (depth-bounded to 4) and slices string leaves to maxLength
before serialization. Tests verify the input never reaches its
full pre-cap form.
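A sketch of depth-bounded pre-truncation for JSON-like values. The depth limit of 4 comes from the text above. What the real `preTruncate` does past that depth is not stated, so replacing deeper values with a placeholder string is an assumption here.

```typescript
const MAX_DEPTH = 4;

// Walk the value and slice string leaves *before* JSON.stringify, so a huge
// tool result is never serialized in full.
function preTruncate(value: unknown, maxLength: number, depth = 0): unknown {
  if (typeof value === "string") {
    return value.length > maxLength ? value.slice(0, maxLength) : value;
  }
  if (value === null || typeof value !== "object") return value;
  if (depth >= MAX_DEPTH) return "[…]"; // assumption: placeholder past the depth bound
  if (Array.isArray(value)) {
    return value.map((v) => preTruncate(v, maxLength, depth + 1));
  }
  const out: Record<string, unknown> = {};
  for (const [k, v] of Object.entries(value)) {
    out[k] = preTruncate(v, maxLength, depth + 1);
  }
  return out;
}

function truncateJson(value: unknown, maxLength: number): string {
  const json = JSON.stringify(preTruncate(value, maxLength)) ?? "";
  return json.length > maxLength ? json.slice(0, maxLength) : json;
}
```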
Suggestion — settings description over-claimed SDK emission.
The description said summaries are emitted to SDK clients as a
tool_use_summary message; the SDK plumbing isn't actually wired
in this PR (the factory is exported for follow-up). Updated
settings.json description and regenerated the vscode schema to
state CLI-only scope explicitly.
Suggestion — fastModel data-boundary not documented.
When fastModel uses a different provider than the main session
model, tool inputs/outputs cross a new auth boundary that users
may not expect. Added "Data flow & privacy" section to the user
feature doc spelling out: same-provider fast model = no scope
change; different-provider = strictly larger sharing scope; two
escape hatches (same-provider fast model OR feature off).
Code-level mitigation (metadata-only mode) deferred.
Parent commit: 7fe853a782
Commit: f420742831
22 changed files with 2104 additions and 24 deletions

@@ -1168,6 +1168,7 @@ export async function loadCliConfig(
       argv.maxSessionTurns ?? settings.model?.maxSessionTurns ?? -1,
     experimentalZedIntegration: argv.acp || argv.experimentalAcp || false,
     cronEnabled: settings.experimental?.cron ?? false,
+    emitToolUseSummaries: settings.experimental?.emitToolUseSummaries ?? true,
     listExtensions: argv.listExtensions || false,
     overrideExtensions: overrideExtensions || argv.extensions,
     noBrowser: !!process.env['NO_BROWSER'],

@@ -1862,6 +1862,16 @@ const SETTINGS_SCHEMA = {
           'Enable in-session cron/loop tools (experimental). When enabled, the model can create recurring prompts using cron_create, cron_list, and cron_delete tools. Can also be enabled via QWEN_CODE_ENABLE_CRON=1 environment variable.',
         showInDialog: true,
       },
+      emitToolUseSummaries: {
+        type: 'boolean',
+        label: 'Tool Use Summaries',
+        category: 'Experimental',
+        requiresRestart: false,
+        default: true,
+        description:
+          'Generate a short LLM-based label after each tool batch completes. In compact mode the label replaces the generic `Tool × N` header; in full mode it appears as a dim `● <label>` line below the tool group. Requires a fast model to be configured; runs in parallel with the next API call so latency is hidden. Currently affects interactive CLI rendering only — SDK / non-interactive emission of the `tool_use_summary` message is not yet wired (the message factory is exported for a follow-up PR). Can be overridden with QWEN_CODE_EMIT_TOOL_USE_SUMMARIES=0 or =1.',
+        showInDialog: true,
+      },
     },
   },
 } as const satisfies SettingsSchema;

@@ -284,4 +284,23 @@ describe('<HistoryItemDisplay />', () => {
 
     expect(lastFrame()).toMatchSnapshot();
   });
+
+  it('renders tool_use_summary as a dim badge line in full mode', () => {
+    const item: HistoryItem = {
+      id: 1,
+      type: 'tool_use_summary',
+      summary: 'Read txt files',
+      precedingToolUseIds: ['c1', 'c2', 'c3', 'c4'],
+    };
+    const { lastFrame } = renderWithProviders(
+      <HistoryItemDisplay
+        {...baseItem}
+        item={item}
+        isPending={false}
+        terminalWidth={80}
+      />,
+    );
+    expect(lastFrame()).toContain('Read txt files');
+    expect(lastFrame()).toContain('●');
+  });
 });

@@ -64,6 +64,21 @@ interface HistoryItemDisplayProps {
   activeShellPtyId?: number | null;
   embeddedShellFocused?: boolean;
   availableTerminalHeightGemini?: number;
+  /**
+   * When the item is a `tool_group`, an optional short LLM-generated label
+   * summarizing the batch. Replaces the generic "Tool × N" line in compact
+   * mode. Computed by the parent from `tool_use_summary` history items.
+   */
+  compactLabel?: string;
+  /**
+   * When the item is a `tool_use_summary`, true if a sibling tool_group has
+   * absorbed this label via its compact-mode header. The standalone `● <label>`
+   * line is suppressed in that case. False for force-expanded groups in
+   * compact mode (they render through the full ToolGroupMessage path and
+   * don't consume compactLabel, so the standalone line is the label's only
+   * path to the screen) and for all tool_use_summary items in full mode.
+   */
+  summaryAbsorbed?: boolean;
 }
 
 const HistoryItemDisplayComponent: React.FC<HistoryItemDisplayProps> = ({
@@ -77,6 +92,8 @@ const HistoryItemDisplayComponent: React.FC<HistoryItemDisplayProps> = ({
   activeShellPtyId,
   embeddedShellFocused,
   availableTerminalHeightGemini,
+  compactLabel,
+  summaryAbsorbed = false,
 }) => {
   const marginTop =
     item.type === 'gemini_content' || item.type === 'gemini_thought_content'
@@ -198,8 +215,33 @@ const HistoryItemDisplayComponent: React.FC<HistoryItemDisplayProps> = ({
           memoryWriteCount={itemForDisplay.memoryWriteCount}
           memoryReadCount={itemForDisplay.memoryReadCount}
           isUserInitiated={itemForDisplay.isUserInitiated}
+          compactLabel={compactLabel}
         />
       )}
+      {/*
+        `tool_use_summary` as a standalone inline item.
+
+        In full mode (`compactMode=false`), the label arrives via the fast-model
+        call AFTER the tool_group has been committed to Ink's append-only
+        <Static>, so we cannot update the tool_group's header retroactively.
+        Rendering a standalone `● <label>` line appends cleanly.
+
+        In compact mode, the label is normally absorbed into the merged
+        tool_group's header (via `compactLabel` prop to CompactToolGroupDisplay),
+        and `summaryAbsorbed=true` is set so this block does nothing. But when
+        the sibling tool_group is force-expanded (errors, confirmations,
+        user-initiated, focused shell), the full-expand path ignores
+        `compactLabel`, and `MainContent` leaves `summaryAbsorbed=false` —
+        the standalone line below is then the label's only route to the UI,
+        which is exactly the case where a summary is most diagnostically
+        useful ("Fixed NPE in UserService" on an errored batch).
+      */}
+      {itemForDisplay.type === 'tool_use_summary' &&
+        (!compactMode || !summaryAbsorbed) && (
+          <Box paddingLeft={1}>
+            <Text dimColor>● {itemForDisplay.summary}</Text>
+          </Box>
+        )}
       {itemForDisplay.type === 'compression' && (
         <CompressionMessage compression={itemForDisplay.compression} />
       )}

@@ -5,7 +5,8 @@
  */
 
 import { Box, Static } from 'ink';
-import { useEffect, useMemo, useRef } from 'react';
+import { useCallback, useEffect, useMemo, useRef } from 'react';
 import type { HistoryItem, HistoryItemWithoutId } from '../types.js';
 import { HistoryItemDisplay } from './HistoryItemDisplay.js';
 import { ShowMoreLines } from './ShowMoreLines.js';
 import { Notifications } from './Notifications.js';
@@ -16,7 +17,10 @@ import { useAppContext } from '../contexts/AppContext.js';
 import { AppHeader } from './AppHeader.js';
 import { DebugModeNotification } from './DebugModeNotification.js';
 import { useCompactMode } from '../contexts/CompactModeContext.js';
-import { mergeCompactToolGroups } from '../utils/mergeCompactToolGroups.js';
+import {
+  isForceExpandGroup,
+  mergeCompactToolGroups,
+} from '../utils/mergeCompactToolGroups.js';
 
 // Limit Gemini messages to a very high number of lines to mitigate performance
 // issues in the worst case if we somehow get an enormous response from Gemini.
@@ -37,7 +41,47 @@ export const MainContent = () => {
     availableTerminalHeight,
   } = uiState;
 
-  // Merge consecutive tool_groups for compact mode display
+  // Set of callIds whose label is absorbed by a compact-mode tool_group header.
+  // Computed from RAW history (not merged) — force-expand status depends only
+  // on the tool_group's own state, and mergeable groups don't change force-
+  // expand status when merged. Iterating raw history avoids a circular
+  // dependency with mergedHistory (which receives absorbedCallIds).
+  //
+  // In compact mode, non-force-expanded tool_groups render via
+  // CompactToolGroupDisplay and consume the label as their header replacement.
+  // Force-expanded groups (errors, confirmations, user-initiated, focused
+  // shell) render through the full ToolGroupMessage path and ignore
+  // compactLabel — their callIds are intentionally NOT in this set so the
+  // standalone `● <label>` line in HistoryItemDisplay is the label's only
+  // path to the screen.
+  const absorbedCallIds = useMemo(() => {
+    const absorbed = new Set<string>();
+    if (!compactMode) return absorbed;
+    for (const item of uiState.history) {
+      if (item.type !== 'tool_group') continue;
+      if (
+        isForceExpandGroup(
+          item,
+          uiState.embeddedShellFocused ?? false,
+          uiState.activePtyId,
+        )
+      ) {
+        continue;
+      }
+      for (const tool of item.tools) absorbed.add(tool.callId);
+    }
+    return absorbed;
+  }, [
+    compactMode,
+    uiState.history,
+    uiState.embeddedShellFocused,
+    uiState.activePtyId,
+  ]);
+
+  // Merge consecutive tool_groups for compact mode display. Summaries for
+  // absorbed call IDs are dropped during merge so refreshStatic fires;
+  // summaries for force-expanded (non-absorbed) groups pass through so
+  // HistoryItemDisplay can render them as standalone `● <label>` lines.
   const mergedHistory = useMemo(
     () =>
       compactMode
@@ -45,6 +89,7 @@ export const MainContent = () => {
           uiState.history,
           uiState.embeddedShellFocused,
           uiState.activePtyId,
+          absorbedCallIds,
         )
       : uiState.history,
     [
@@ -52,9 +97,56 @@ export const MainContent = () => {
     uiState.history,
     uiState.embeddedShellFocused,
     uiState.activePtyId,
+    absorbedCallIds,
   ],
   );
 
+  // Build a callId → summary lookup from `tool_use_summary` history items so
+  // compact-mode tool groups can render a semantic label instead of a generic
+  // "Tool × N" line. A summary is indexed under every callId it covers; when
+  // multiple groups are merged, the first group's summary wins (see below).
+  const summaryByCallId = useMemo(() => {
+    const map = new Map<string, string>();
+    for (const item of uiState.history) {
+      if (item.type === 'tool_use_summary') {
+        for (const callId of item.precedingToolUseIds) {
+          // First summary wins — earlier summaries represent the opening
+          // intent of a batch streak, later ones would override it otherwise.
+          if (!map.has(callId)) {
+            map.set(callId, item.summary);
+          }
+        }
+      }
+    }
+    return map;
+  }, [uiState.history]);
+
+  const isSummaryAbsorbed = useCallback(
+    (item: HistoryItem | HistoryItemWithoutId): boolean => {
+      if (item.type !== 'tool_use_summary') return false;
+      return item.precedingToolUseIds.some((id) => absorbedCallIds.has(id));
+    },
+    [absorbedCallIds],
+  );
+
+  const getCompactLabel = useCallback(
+    (item: HistoryItem | HistoryItemWithoutId): string | undefined => {
+      if (item.type !== 'tool_group' || item.tools.length === 0)
+        return undefined;
+      // Look up ONLY the first tool's callId. A merged group concatenates
+      // batch A (earliest calls) then batch B; earlier iterations scanned
+      // all callIds and returned "first hit", but async resolution order
+      // breaks that — if B's summary resolves first, the header renders
+      // SB; when A later resolves, the next render flips to SA. Anchoring
+      // on item.tools[0].callId gives stable "leading batch governs"
+      // semantics; if A's call failed and only B resolved, the header
+      // stays blank for that group (acceptable — the fallback is the
+      // default "Tool × N" rendering once the lookup misses).
+      return summaryByCallId.get(item.tools[0].callId);
+    },
+    [summaryByCallId],
+  );
+
   // Ink's <Static> is append-only: once an item is rendered to the terminal
   // buffer, it cannot be replaced. In compact mode, when a new tool_group is
   // merged into a previous one, the merged result has FEWER items than the
@@ -102,6 +194,8 @@ export const MainContent = () => {
               item={h}
               isPending={false}
               commands={uiState.slashCommands}
+              compactLabel={getCompactLabel(h)}
+              summaryAbsorbed={isSummaryAbsorbed(h)}
             />
           )),
         ]}
@@ -123,6 +217,8 @@ export const MainContent = () => {
             isFocused={!uiState.isEditorDialogOpen}
             activeShellPtyId={uiState.activePtyId}
             embeddedShellFocused={uiState.embeddedShellFocused}
+            compactLabel={getCompactLabel(item)}
+            summaryAbsorbed={isSummaryAbsorbed(item)}
           />
         ))}
         <ShowMoreLines constrainHeight={uiState.constrainHeight} />

@@ -36,7 +36,22 @@ function shellTool(
   };
 }
 
-describe('<CompactToolGroupDisplay />', () => {
+function toolCall(
+  overrides: Partial<IndividualToolCallDisplay> = {},
+): IndividualToolCallDisplay {
+  return {
+    callId: 'call-1',
+    name: 'read_file',
+    description: 'Read a.ts',
+    resultDisplay: 'file contents',
+    status: ToolCallStatus.Success,
+    confirmationDetails: undefined,
+    renderOutputAsMarkdown: false,
+    ...overrides,
+  };
+}
+
+describe('<CompactToolGroupDisplay /> — shell timeout plumbing', () => {
   beforeEach(() => {
     vi.useFakeTimers();
     vi.setSystemTime(NOW);
@@ -91,3 +106,76 @@ describe('<CompactToolGroupDisplay />', () => {
     expect(lastFrame()).not.toContain('timeout');
   });
 });
+
+describe('<CompactToolGroupDisplay /> — summary label', () => {
+  it('renders default header (active tool name + count) when no compactLabel is provided', () => {
+    const tools = [
+      toolCall({ callId: 'c1', name: 'read_file' }),
+      toolCall({ callId: 'c2', name: 'read_file' }),
+      toolCall({ callId: 'c3', name: 'grep' }),
+    ];
+    const { lastFrame } = render(
+      <CompactToolGroupDisplay toolCalls={tools} contentWidth={80} />,
+    );
+    const frame = lastFrame()!;
+    // Active tool = last in array when none are executing/confirming.
+    expect(frame).toContain('grep');
+    expect(frame).toContain('× 3');
+  });
+
+  it('replaces header with compactLabel when provided', () => {
+    const tools = [
+      toolCall({ callId: 'c1', name: 'read_file' }),
+      toolCall({ callId: 'c2', name: 'grep' }),
+    ];
+    const { lastFrame } = render(
+      <CompactToolGroupDisplay
+        toolCalls={tools}
+        contentWidth={80}
+        compactLabel="Searched in auth/"
+      />,
+    );
+    const frame = lastFrame()!;
+    expect(frame).toContain('Searched in auth/');
+    expect(frame).toContain('2 tools');
+    // The raw tool name should not appear as the primary header when a
+    // summary is shown.
+    expect(frame).not.toContain('read_file × 2');
+  });
+
+  it('shows tool count suffix only when multiple tools are present', () => {
+    const tools = [toolCall({ callId: 'c1', name: 'read_file' })];
+    const { lastFrame } = render(
+      <CompactToolGroupDisplay
+        toolCalls={tools}
+        contentWidth={80}
+        compactLabel="Read config.json"
+      />,
+    );
+    const frame = lastFrame()!;
+    expect(frame).toContain('Read config.json');
+    expect(frame).not.toContain('tools');
+  });
+
+  it('renders nothing for empty tool calls', () => {
+    const { lastFrame } = render(
+      <CompactToolGroupDisplay toolCalls={[]} contentWidth={80} />,
+    );
+    expect(lastFrame()).toBe('');
+  });
+
+  it('preserves default rendering for shell commands without label', () => {
+    const tools = [
+      toolCall({
+        callId: 'c1',
+        name: 'Bash',
+        description: 'ls -la',
+      }),
+    ];
+    const { lastFrame } = render(
+      <CompactToolGroupDisplay toolCalls={tools} contentWidth={80} />,
+    );
+    expect(lastFrame()).toContain('Bash');
+    expect(lastFrame()).toContain('ls -la');
+  });
+});

@@ -18,6 +18,13 @@ import { ToolElapsedTime } from '../shared/ToolElapsedTime.js';
 interface CompactToolGroupDisplayProps {
   toolCalls: IndividualToolCallDisplay[];
   contentWidth: number;
+  /**
+   * Optional LLM-generated label (~30 chars, git-commit-subject style) that
+   * replaces the "active tool name + count + description" header when
+   * present. Falls back to the default rendering while the label is still
+   * being generated or if generation was skipped/failed.
+   */
+  compactLabel?: string;
 }
 
 // Priority: Confirming > Executing > Error > Canceled > Pending > Success
@@ -66,9 +73,57 @@ function getShellTimeoutMs(
   return undefined;
 }
 
+/**
+ * Summary-label header: bold label + " · N tools" count when there are 2+
+ * tools in the batch. The count is intentionally suppressed for N=1 so
+ * single-tool batches don't read as `Read config.json · 1 tools`. Future
+ * edits: keep the `length > 1` guard, not `>= 1`.
+ */
+function renderSummaryHeader(label: string, count: number) {
+  return (
+    <>
+      <Text bold>{label}</Text>
+      {count > 1 ? (
+        <Text color={theme.text.secondary}>
+          {' · '}
+          {count} tools
+        </Text>
+      ) : null}
+    </>
+  );
+}
+
+/**
+ * Default header: active tool name + " × N" count + first-line description.
+ * Same N=1 suffix suppression as `renderSummaryHeader`.
+ */
+function renderDefaultHeader(
+  activeToolName: string,
+  activeToolDescription: string,
+  count: number,
+) {
+  return (
+    <>
+      <Text bold>{activeToolName}</Text>
+      {count > 1 ? (
+        <Text color={theme.text.secondary}>
+          {' × '}
+          {count}
+        </Text>
+      ) : null}
+      {activeToolDescription ? (
+        <Text color={theme.text.secondary}>
+          {' '}
+          {activeToolDescription}
+        </Text>
+      ) : null}
+    </>
+  );
+}
+
 export const CompactToolGroupDisplay: React.FC<
   CompactToolGroupDisplayProps
-> = ({ toolCalls, contentWidth }) => {
+> = ({ toolCalls, contentWidth, compactLabel }) => {
   if (toolCalls.length === 0) return null;
 
   const overallStatus = getOverallStatus(toolCalls);
@@ -103,24 +158,18 @@ export const CompactToolGroupDisplay: React.FC<
       borderColor={borderColor}
       gap={0}
     >
-      {/* Status line: icon + tool name + count + description + elapsed */}
+      {/* Status line: icon + (summary | tool name + description) + count + elapsed */}
       <Box flexDirection="row">
         <ToolStatusIndicator status={overallStatus} name={activeTool.name} />
         <Box flexGrow={1}>
           <Text wrap="truncate-end">
-            <Text bold>{activeTool.name}</Text>
-            {toolCalls.length > 1 ? (
-              <Text color={theme.text.secondary}>
-                {' × '}
-                {toolCalls.length}
-              </Text>
-            ) : null}
-            {activeToolDescription ? (
-              <Text color={theme.text.secondary}>
-                {' '}
-                {activeToolDescription}
-              </Text>
-            ) : null}
+            {compactLabel
+              ? renderSummaryHeader(compactLabel, toolCalls.length)
+              : renderDefaultHeader(
+                  activeTool.name,
+                  activeToolDescription,
+                  toolCalls.length,
+                )}
           </Text>
         </Box>
         <ToolElapsedTime

@@ -44,6 +44,12 @@ interface ToolGroupMessageProps {
   /** Pre-computed count of read ops from managed-auto-memory files. */
   memoryReadCount?: number;
   isUserInitiated?: boolean;
+  /**
+   * Short LLM-generated label for this batch. Used in compact mode in place
+   * of the "active tool name × count" line. Undefined when summary
+   * generation is disabled, still in-flight, or failed.
+   */
+  compactLabel?: string;
 }
 
 // Main component renders the border and maps the tools using ToolMessage
@@ -57,6 +63,7 @@ export const ToolGroupMessage: React.FC<ToolGroupMessageProps> = ({
  memoryWriteCount,
  memoryReadCount,
  isUserInitiated,
  compactLabel,
}) => {
  const config = useConfig();
  const { compactMode } = useCompactMode();
@@ -139,6 +146,7 @@ export const ToolGroupMessage: React.FC<ToolGroupMessageProps> = ({
      <CompactToolGroupDisplay
        toolCalls={toolCalls}
        contentWidth={contentWidth}
        compactLabel={compactLabel}
      />
    );
  }

@@ -207,6 +207,8 @@ describe('useGeminiStream', () => {
      getArenaAgentClient: vi.fn(() => null),
      isCronEnabled: vi.fn(() => false),
      getCronScheduler: vi.fn(() => null),
      getEmitToolUseSummaries: vi.fn(() => false),
      getFastModel: vi.fn(() => undefined),
      getBackgroundTaskRegistry: vi.fn(() => ({
        setNotificationCallback: vi.fn(),
      })),
@@ -826,6 +828,322 @@ describe('useGeminiStream', () => {
    expect(result.current.streamingState).toBe(StreamingState.Responding);
  });

  describe('Tool-use summary generation', () => {
    const makeCompletedToolCall = (
      callId: string,
      name: string,
      args: Record<string, unknown>,
    ): TrackedCompletedToolCall =>
      ({
        request: {
          callId,
          name,
          args,
          isClientInitiated: false,
          prompt_id: 'prompt-1',
        },
        status: 'success',
        responseSubmittedToGemini: false,
        tool: {
          name,
          displayName: name,
          description: 'desc',
          build: vi.fn(),
        } as any,
        invocation: {
          getDescription: () => 'Mock description',
        } as unknown as AnyToolInvocation,
        startTime: Date.now(),
        endTime: Date.now(),
        response: {
          callId,
          responseParts: [{ text: `result for ${name}` }],
          error: undefined,
          errorType: undefined,
          resultDisplay: '',
        },
      }) as TrackedCompletedToolCall;

    const runCompletion = async (
      config: Config,
      completedTools: TrackedCompletedToolCall[],
    ) => {
      let capturedOnComplete:
        | ((completedTools: TrackedToolCall[]) => Promise<void>)
        | null = null;

      mockUseReactToolScheduler.mockImplementation((onComplete) => {
        capturedOnComplete = onComplete;
        return [
          completedTools,
          mockScheduleToolCalls,
          mockMarkToolsAsSubmitted,
        ];
      });

      // Seed history with a tool_group whose callIds match the completed
      // tools, so the staleness check (which verifies the tool_group is
      // still the latest in history) passes. Without this seed the summary
      // would be dropped as stale before addItem is called.
      const historyWithToolGroup = [
        {
          type: 'tool_group',
          id: 1,
          tools: completedTools.map((tc) => ({
            callId: tc.request.callId,
            name: tc.request.name,
            description: '',
            status: 0,
            resultDisplay: undefined,
            confirmationDetails: undefined,
          })),
        } as unknown as HistoryItem,
      ];

      renderHook(() =>
        useGeminiStream(
          new MockedGeminiClientClass(config),
          historyWithToolGroup,
          mockAddItem,
          config,
          mockLoadedSettings,
          mockOnDebugMessage,
          mockHandleSlashCommand,
          false,
          () => 'vscode' as EditorType,
          () => {},
          () => Promise.resolve(),
          false,
          () => {},
          () => {},
          () => {},
          () => {},
          80,
          24,
        ),
      );

      await act(async () => {
        if (capturedOnComplete) {
          await capturedOnComplete(completedTools);
        }
      });
    };

    it('skips summary generation when the feature is disabled', async () => {
      const config = {
        ...mockConfig,
        getEmitToolUseSummaries: vi.fn(() => false),
        getFastModel: vi.fn(() => 'qwen-fast'),
        getGeminiClient: vi.fn(() => ({
          generateContent: vi.fn(),
        })),
      } as unknown as Config;

      await runCompletion(config, [
        makeCompletedToolCall('c1', 'Read', { file: 'a.ts' }),
        makeCompletedToolCall('c2', 'Grep', { pattern: 'foo' }),
      ]);

      // The flag is off — even though a fast model is configured, no summary
      // history item should be added.
      const summaryItems = (mockAddItem.mock.calls as any[][]).filter(
        (call) => call[0]?.type === 'tool_use_summary',
      );
      expect(summaryItems).toHaveLength(0);
    });

    it('skips summary generation when no fast model is configured', async () => {
      const generateContent = vi.fn();
      const config = {
        ...mockConfig,
        getEmitToolUseSummaries: vi.fn(() => true),
        getFastModel: vi.fn(() => undefined),
        getGeminiClient: vi.fn(() => ({ generateContent })),
      } as unknown as Config;

      await runCompletion(config, [
        makeCompletedToolCall('c1', 'Read', { file: 'a.ts' }),
      ]);

      expect(generateContent).not.toHaveBeenCalled();
    });

    it('fires generation with tool input/output when enabled', async () => {
      const generateContent = vi.fn().mockResolvedValue({
        candidates: [{ content: { parts: [{ text: 'Searched auth/' }] } }],
      });
      const config = {
        ...mockConfig,
        getEmitToolUseSummaries: vi.fn(() => true),
        getFastModel: vi.fn(() => 'qwen-fast'),
        getGeminiClient: vi.fn(() => ({ generateContent })),
      } as unknown as Config;

      await runCompletion(config, [
        makeCompletedToolCall('c1', 'Grep', { pattern: 'login' }),
        makeCompletedToolCall('c2', 'Read', { file: 'auth.ts' }),
      ]);

      // Wait for the fire-and-forget promise chain to settle (addItem happens in .then()).
      await waitFor(() => {
        const summaryItems = (mockAddItem.mock.calls as any[][]).filter(
          (call) => call[0]?.type === 'tool_use_summary',
        );
        expect(summaryItems).toHaveLength(1);
        expect(summaryItems[0][0]).toMatchObject({
          type: 'tool_use_summary',
          summary: 'Searched auth/',
          precedingToolUseIds: ['c1', 'c2'],
        });
      });

      // Model was called with the fast model and includes tool names in the prompt.
      expect(generateContent).toHaveBeenCalledTimes(1);
      const callArgs = generateContent.mock.calls[0];
      expect(callArgs[3]).toBe('qwen-fast');
      const userText = callArgs[0][0].parts[0].text as string;
      expect(userText).toContain('Tool: Grep');
      expect(userText).toContain('Tool: Read');
      expect(userText).toContain('"pattern":"login"');
    });

    it('drops a late summary when a newer tool_group has been added', async () => {
      // Resolve the fast-model call but ensure history shows a NEWER
      // tool_group AFTER ours — simulates a slow summary landing during
      // the next turn. The summary must not be appended; otherwise the
      // ● label line would land in the wrong transcript position.
      let resolveSummary: (val: { candidates: unknown[] }) => void;
      const generateContent = vi.fn().mockImplementation(
        () =>
          new Promise((resolve) => {
            resolveSummary = resolve;
          }),
      );
      const config = {
        ...mockConfig,
        getEmitToolUseSummaries: vi.fn(() => true),
        getFastModel: vi.fn(() => 'qwen-fast'),
        getGeminiClient: vi.fn(() => ({ generateContent })),
      } as unknown as Config;

      let capturedOnComplete:
        | ((completedTools: TrackedToolCall[]) => Promise<void>)
        | null = null;
      const completedTools = [
        makeCompletedToolCall('c1', 'Read', { file: 'a.ts' }),
      ];
      mockUseReactToolScheduler.mockImplementation((onComplete) => {
        capturedOnComplete = onComplete;
        return [
          completedTools,
          mockScheduleToolCalls,
          mockMarkToolsAsSubmitted,
        ];
      });

      // History initially has our tool_group, but a newer tool_group is
      // added before the summary resolves.
      const history: HistoryItem[] = [
        {
          type: 'tool_group',
          id: 1,
          tools: [
            {
              callId: 'c1',
              name: 'Read',
              description: '',
              status: 0,
              resultDisplay: undefined,
              confirmationDetails: undefined,
            },
          ],
        } as unknown as HistoryItem,
        {
          type: 'tool_group',
          id: 2,
          tools: [
            {
              callId: 'c2',
              name: 'Edit',
              description: '',
              status: 0,
              resultDisplay: undefined,
              confirmationDetails: undefined,
            },
          ],
        } as unknown as HistoryItem,
      ];

      renderHook(() =>
        useGeminiStream(
          new MockedGeminiClientClass(config),
          history,
          mockAddItem,
          config,
          mockLoadedSettings,
          mockOnDebugMessage,
          mockHandleSlashCommand,
          false,
          () => 'vscode' as EditorType,
          () => {},
          () => Promise.resolve(),
          false,
          () => {},
          () => {},
          () => {},
          () => {},
          80,
          24,
        ),
      );

      await act(async () => {
        if (capturedOnComplete) {
          await capturedOnComplete(completedTools);
        }
      });

      // Resolve the summary — it should be dropped because tool_group id=2
      // is newer than our anchor tool_group id=1.
      await act(async () => {
        resolveSummary!({
          candidates: [{ content: { parts: [{ text: 'Read file' }] } }],
        });
      });

      const summaryItems = (mockAddItem.mock.calls as any[][]).filter(
        (call) => call[0]?.type === 'tool_use_summary',
      );
      expect(summaryItems).toHaveLength(0);
    });

    it('does not add a history item when the model returns empty', async () => {
      const generateContent = vi.fn().mockResolvedValue({
        candidates: [{ content: { parts: [{ text: '' }] } }],
      });
      const config = {
        ...mockConfig,
        getEmitToolUseSummaries: vi.fn(() => true),
        getFastModel: vi.fn(() => 'qwen-fast'),
        getGeminiClient: vi.fn(() => ({ generateContent })),
      } as unknown as Config;

      await runCompletion(config, [
        makeCompletedToolCall('c1', 'Read', { file: 'a.ts' }),
      ]);

      // The fast-model call happened but produced no label, so no history item.
      await waitFor(() => {
        expect(generateContent).toHaveBeenCalled();
      });
      const summaryItems = (mockAddItem.mock.calls as any[][]).filter(
        (call) => call[0]?.type === 'tool_use_summary',
      );
      expect(summaryItems).toHaveLength(0);
    });
  });

  describe('Cancellation', () => {
    it('buffers streamed content until the throttle interval elapses', async () => {
      vi.useFakeTimers();

@@ -4,7 +4,14 @@
 * SPDX-License-Identifier: Apache-2.0
 */

import { useState, useRef, useCallback, useEffect, useMemo } from 'react';
import {
  useState,
  useRef,
  useCallback,
  useEffect,
  useMemo,
  useLayoutEffect,
} from 'react';
import type {
  Config,
  EditorType,
@@ -42,6 +49,7 @@ import {
  ApiCancelEvent,
  isSupportedImageMimeType,
  getUnsupportedImageFormatWarning,
  generateToolUseSummary,
} from '@qwen-code/qwen-code-core';
import { type Part, type PartListUnion, FinishReason } from '@google/genai';
import type {
@@ -80,6 +88,48 @@ import { useDualOutput } from '../../dualOutput/DualOutputContext.js';

const debugLogger = createDebugLogger('GEMINI_STREAM');

/**
 * Pull the assistant's most recent visible text from the UI history. Used as
 * an intent prefix for tool-use summary generation so the summarizer knows
 * what the user was trying to accomplish.
 */
function extractLastAssistantText(history: HistoryItem[]): string | undefined {
  for (let i = history.length - 1; i >= 0; i--) {
    const item = history[i];
    if (
      (item.type === 'gemini' || item.type === 'gemini_content') &&
      typeof item.text === 'string' &&
      item.text.trim().length > 0
    ) {
      return item.text;
    }
  }
  return undefined;
}

/**
 * Flatten `functionResponse` parts into a compact string for the summarizer.
 * The summarizer itself truncates to 300 chars per field, so we just join
 * whatever is available without re-serializing.
 */
function extractToolResultText(parts: Part[] | Part | undefined): unknown {
  if (!parts) return '';
  const list = Array.isArray(parts) ? parts : [parts];
  const chunks: unknown[] = [];
  for (const part of list) {
    if ('functionResponse' in part && part.functionResponse) {
      const response = (part.functionResponse as { response?: unknown })
        .response;
      if (response !== undefined) chunks.push(response);
    } else if ('text' in part && typeof part.text === 'string') {
      chunks.push(part.text);
    }
  }
  if (chunks.length === 0) return '';
  if (chunks.length === 1) return chunks[0];
  return chunks;
}

/**
 * Classify API error to StopFailureErrorType
 * @internal Exported for testing purposes
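The two helpers in the hunk above are pure functions over plain data, so their behavior can be checked outside React. The sketch below restates them against simplified local types: `SimplePart` and `SimpleHistoryItem` are assumptions standing in for `@google/genai`'s `Part` and the CLI's `HistoryItem`, and `flattenToolResult` / `lastAssistantText` are hypothetical names. Note that, despite its doc comment, `extractToolResultText` does not join anything into a string: it returns a lone payload unwrapped, an array when there are several, and `''` when there are none.

```typescript
// Simplified stand-ins (assumptions) for @google/genai's Part and the CLI's
// HistoryItem; only the fields the two helpers read are modeled.
type SimplePart = {
  functionResponse?: { name?: string; response?: unknown };
  text?: string;
};
type SimpleHistoryItem = { type: string; text?: string };

// Mirrors extractToolResultText: collect functionResponse payloads and text
// parts, unwrap a single chunk, return an array for several, '' for none.
function flattenToolResult(
  parts: SimplePart[] | SimplePart | undefined,
): unknown {
  if (!parts) return '';
  const list = Array.isArray(parts) ? parts : [parts];
  const chunks: unknown[] = [];
  for (const part of list) {
    if (part.functionResponse) {
      const response = part.functionResponse.response;
      if (response !== undefined) chunks.push(response);
    } else if (typeof part.text === 'string') {
      chunks.push(part.text);
    }
  }
  if (chunks.length === 0) return '';
  return chunks.length === 1 ? chunks[0] : chunks;
}

// Mirrors extractLastAssistantText: scan backwards for the newest
// non-blank assistant text.
function lastAssistantText(history: SimpleHistoryItem[]): string | undefined {
  for (let i = history.length - 1; i >= 0; i--) {
    const item = history[i];
    if (
      (item.type === 'gemini' || item.type === 'gemini_content') &&
      typeof item.text === 'string' &&
      item.text.trim().length > 0
    ) {
      return item.text;
    }
  }
  return undefined;
}

console.log(
  JSON.stringify(
    flattenToolResult([
      { functionResponse: { name: 'read_file', response: { output: 'ok' } } },
    ]),
  ),
); // {"output":"ok"}
console.log(JSON.stringify(flattenToolResult([{ text: 'a' }, { text: 'b' }]))); // ["a","b"]
console.log(
  lastAssistantText([
    { type: 'gemini', text: 'Let me check the auth module.' },
    { type: 'tool_group' },
    { type: 'gemini_content', text: '   ' },
  ]),
); // Let me check the auth module.
```

Returning the raw payload (rather than a pre-serialized string) leaves truncation and formatting to the summarizer, which the doc comment says already caps each field at 300 characters.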
@@ -229,6 +279,22 @@ export const useGeminiStream = (
  const dualOutput = useDualOutput();
  const [isResponding, setIsResponding] = useState<boolean>(false);
  const [thought, setThought] = useState<ThoughtSummary | null>(null);
  // Hold the latest history in a ref so handleCompletedTools can read it
  // without depending on `history` (which would recreate the tool scheduler
  // every render). Use useLayoutEffect instead of writing during render —
  // writing refs in the render phase is unsafe under React's concurrent
  // rendering (a bailed-out render could leave the ref with a dropped value).
  const historyRef = useRef<HistoryItem[]>(history);
  useLayoutEffect(() => {
    historyRef.current = history;
  }, [history]);
  // In-flight tool-use-summary aborters. Each batch gets its own AbortController
  // because the captured turn controller is replaced when submitQuery starts
  // the next turn, and the summary call outlives the current turn (that's the
  // whole point — it overlaps with the next turn's streaming). cancelOngoingRequest
  // aborts all in-flight summaries so Ctrl+C during the next turn also kills
  // this turn's stale summary work.
  const summaryAbortRefsRef = useRef<Set<AbortController>>(new Set());
  const [pendingHistoryItem, pendingHistoryItemRef, setPendingHistoryItem] =
    useStateAndRef<HistoryItemWithoutId | null>(null);
  const [
@@ -495,6 +561,12 @@ export const useGeminiStream = (
    turnCancelledRef.current = true;
    isSubmittingQueryRef.current = false;
    abortControllerRef.current?.abort();
    // Cancel any in-flight tool-use-summary generations so their Promise.then
    // doesn't addItem a stale label after the user cancelled.
    for (const ac of summaryAbortRefsRef.current) {
      ac.abort();
    }
    summaryAbortRefsRef.current.clear();

    // Report cancellation to arena status reporter (if in arena mode).
    // This is needed because cancellation during tool execution won't
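The cancellation hunk above pairs with the `summaryAbortRefsRef` set introduced earlier: each summary request registers its own `AbortController`, removes it when the promise settles, and `cancelOngoingRequest` aborts and clears whatever is still registered. A minimal sketch of that lifecycle as a standalone class, assuming a runtime with a global `AbortController` (Node 16+ or a browser); `SummaryAbortRegistry` and its method names are hypothetical.

```typescript
// Hypothetical helper modeling the summaryAbortRefsRef pattern: every
// fire-and-forget summary owns its own AbortController, and cancelling the
// turn aborts every controller still in flight.
class SummaryAbortRegistry {
  private readonly inFlight = new Set<AbortController>();

  // Called when a batch's summary request is fired.
  start(): AbortController {
    const controller = new AbortController();
    this.inFlight.add(controller);
    return controller;
  }

  // Called from both .then() and .catch() so settled requests don't leak.
  settle(controller: AbortController): void {
    this.inFlight.delete(controller);
  }

  // Called from the cancel path (e.g. Ctrl+C during the next turn).
  cancelAll(): void {
    for (const controller of this.inFlight) {
      controller.abort();
    }
    this.inFlight.clear();
  }

  get size(): number {
    return this.inFlight.size;
  }
}

const registry = new SummaryAbortRegistry();
const first = registry.start();
const second = registry.start();
registry.settle(first); // first summary resolved normally
registry.cancelAll(); // user cancels while the second is still pending
console.log(first.signal.aborted, second.signal.aborted, registry.size); // false true 0
```

A per-batch controller is used instead of the turn's `abortControllerRef` because `submitQuery` replaces that controller when the next turn starts, while the summary request is meant to outlive the current turn.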
@@ -1868,6 +1940,97 @@ export const useGeminiStream = (

      markToolsAsSubmitted(callIdsToMarkAsSubmitted);

      // Fire tool-use summary generation in parallel with the next API call.
      // The fast-model Haiku-equivalent latency (~1s) is hidden behind the
      // main-model streaming (5-30s). Mirrors Claude Code's query.ts:1411-1482
      // behavior. Fire-and-forget: failures are silent and never block the turn.
      // Subagent exclusion is implicit — useGeminiStream only drives the
      // main session; subagents run through agents/runtime/ with their own loop.
      if (config.getEmitToolUseSummaries()) {
        // Only summarize successful tools. Error/cancelled entries push
        // "Cancelled by user" / retry-loop warnings into the summarizer
        // prompt and produce plausibly-worded but misleading labels (the
        // fast model happily synthesizes "Attempted to read files" from a
        // batch that was mostly failures). cleanSummary can reject output
        // prefixes but not prevent this kind of polluted-input hallucination.
        const successfulTools = geminiTools.filter(
          (tc) => tc.status === 'success',
        );
        if (successfulTools.length > 0) {
          const toolInfoForSummary = successfulTools.map((tc) => ({
            name: tc.request.name,
            input: tc.request.args,
            output: extractToolResultText(tc.response.responseParts),
          }));
          const toolUseIds = successfulTools.map((tc) => tc.request.callId);
          const lastAssistantText = extractLastAssistantText(
            historyRef.current,
          );
          // Dedicated AbortController for this batch. Scoping it to the
          // current turn via abortControllerRef.current would be wrong —
          // submitQuery() below allocates a new controller for the next
          // turn, so the captured signal becomes stale the moment the
          // next turn starts. Instead, check the live abort state at
          // resolve time (which covers both Ctrl+C on the next turn and
          // mid-flight cancellation of this batch via turnCancelledRef).
          const summaryAbort = new AbortController();
          summaryAbortRefsRef.current.add(summaryAbort);

          // Capture the first callId so we can locate "our" tool_group at
          // resolve time. If a newer tool_group has been added since we
          // fired (i.e., the conversation moved on), we drop the summary
          // rather than wedging the `● <label>` line between later items.
          const anchorCallId = toolUseIds[0];

          void generateToolUseSummary({
            config,
            tools: toolInfoForSummary,
            signal: summaryAbort.signal,
            lastAssistantText,
          })
            .then((summary) => {
              summaryAbortRefsRef.current.delete(summaryAbort);
              const cancelled =
                turnCancelledRef.current ||
                abortControllerRef.current?.signal.aborted ||
                summaryAbort.signal.aborted;
              if (!summary || cancelled) return;

              // Stale-summary check: only append if our tool_group is still
              // the latest one in history. If a newer batch landed while
              // the fast-model call was in flight, the conversation has
              // moved past this batch and dropping in a `● <label>` line
              // now would land it after later content (full mode) or
              // attribute it to the wrong group (compact mode).
              const currentHistory = historyRef.current;
              const ourIdx = currentHistory.findIndex(
                (h) =>
                  h.type === 'tool_group' &&
                  h.tools.some((t) => t.callId === anchorCallId),
              );
              if (ourIdx < 0) return;
              const laterToolGroupExists = currentHistory
                .slice(ourIdx + 1)
                .some((h) => h.type === 'tool_group');
              if (laterToolGroupExists) return;

              if (summary && !cancelled) {
                addItem(
                  {
                    type: 'tool_use_summary',
                    summary,
                    precedingToolUseIds: toolUseIds,
                  } as HistoryItemWithoutId,
                  Date.now(),
                );
              }
            })
            .catch(() => {
              summaryAbortRefsRef.current.delete(summaryAbort);
            });
        }
      }

      // Don't continue if model was switched due to quota error
      if (modelSwitchedFromQuotaError) {
        return;

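The stale-summary guard in the `.then()` callback reduces to a pure check over the history array: find the tool_group holding the anchor callId, and drop the summary if that group is gone or any later tool_group exists. A sketch under simplified types (`SketchItem` and `shouldDropSummary` are hypothetical), exercising the same scenario as the "drops a late summary" test:

```typescript
// Simplified history item (assumption): only `type` and tool callIds matter.
type SketchItem = { type: string; tools?: { callId: string }[] };

// Mirrors the stale-summary guard: true means "do not addItem".
function shouldDropSummary(
  history: SketchItem[],
  anchorCallId: string,
): boolean {
  const ourIdx = history.findIndex(
    (h) =>
      h.type === 'tool_group' &&
      (h.tools ?? []).some((t) => t.callId === anchorCallId),
  );
  // Our batch's tool_group is no longer in history: nothing to attach to.
  if (ourIdx < 0) return true;
  // A newer batch landed while the fast-model call was in flight.
  return history.slice(ourIdx + 1).some((h) => h.type === 'tool_group');
}

const history: SketchItem[] = [
  { type: 'tool_group', tools: [{ callId: 'c1' }] },
  { type: 'tool_group', tools: [{ callId: 'c2' }] },
];
console.log(shouldDropSummary(history, 'c1')); // true: group c2 is newer
console.log(shouldDropSummary(history, 'c2')); // false: c2 is still the latest batch
```

Only later `tool_group` items count: assistant text or other items appended after the anchor group do not, on their own, make the summary stale.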
@@ -210,6 +210,19 @@ export type HistoryItemToolGroup = HistoryItemBase & {
  isUserInitiated?: boolean;
};

/**
 * Short LLM-generated label summarizing a preceding tool batch. Emitted after
 * the batch completes and consumed by compact-mode rendering to replace the
 * generic "Tool × N" line with something like "Searched in auth/". Also
 * surfaces to SDK clients as a `tool_use_summary` stream message.
 */
export type HistoryItemToolUseSummary = HistoryItemBase & {
  type: 'tool_use_summary';
  summary: string;
  /** Tool callIds this summary describes. Used to locate the target tool_group. */
  precedingToolUseIds: string[];
};

export type HistoryItemNotification = HistoryItemBase & {
  type: 'notification';
  text: string;
@@ -471,6 +484,7 @@ export type HistoryItemWithoutId =
  | HistoryItemAbout
  | HistoryItemHelp
  | HistoryItemToolGroup
  | HistoryItemToolUseSummary
  | HistoryItemStats
  | HistoryItemModelStats
  | HistoryItemToolStats

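`precedingToolUseIds` is what lets a consumer attach a summary to the right tool_group. The MainContent lookup that produces `compactLabel` is not part of this diff, so the following is only a hypothetical sketch of one way to build such an index: map every callId to its tool_group id, then assign each summary's label to the group of its first matching callId. `indexCompactLabels` and the simplified item types are assumptions, and the "later summary wins" rule for a group that several summaries hit is a choice made for the sketch, not taken from the source.

```typescript
// Minimal stand-ins (assumptions) for the history item shapes in this diff.
type ToolGroupItem = { type: 'tool_group'; id: number; tools: { callId: string }[] };
type ToolUseSummaryItem = {
  type: 'tool_use_summary';
  id: number;
  summary: string;
  precedingToolUseIds: string[];
};
type Item = ToolGroupItem | ToolUseSummaryItem | { type: 'user'; id: number; text: string };

// Hypothetical: tool_group id -> label of the summary that references it.
function indexCompactLabels(history: Item[]): Map<number, string> {
  const groupIdByCallId = new Map<string, number>();
  for (const item of history) {
    if (item.type === 'tool_group') {
      for (const tool of item.tools) groupIdByCallId.set(tool.callId, item.id);
    }
  }
  const labels = new Map<number, string>();
  for (const item of history) {
    if (item.type !== 'tool_use_summary') continue;
    // First preceding callId that belongs to a known tool_group.
    const groupId = item.precedingToolUseIds
      .map((id) => groupIdByCallId.get(id))
      .find((id) => id !== undefined);
    // Summaries whose callIds match no group are ignored.
    if (groupId !== undefined) labels.set(groupId, item.summary);
  }
  return labels;
}

const labels = indexCompactLabels([
  { type: 'tool_group', id: 1, tools: [{ callId: 'c1' }] },
  { type: 'tool_use_summary', id: 2, summary: 'Ran shell batch', precedingToolUseIds: ['c1'] },
]);
console.log(labels.get(1)); // Ran shell batch
```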
@@ -343,6 +343,134 @@ describe('mergeCompactToolGroups', () => {
    expect(merged[3].type).toBe('tool_group');
  });

  it('drops trailing tool_use_summary after a single absorbed tool_group', () => {
    // Single-batch turn: one tool_group, then its summary arrives. The group
    // is non-force-expanded (compact-mode candidate), so its callId is in
    // absorbedCallIds — the summary is consumed by the compact header and
    // dropped from merged output. Without this drop, mergedHistory.length
    // would grow lock-step with history.length and MainContent's
    // refreshStatic heuristic would never fire.
    const items: HistoryItem[] = [
      createToolGroup(1, [createTool('c1', 'Shell', ToolCallStatus.Success)]),
      {
        type: 'tool_use_summary',
        id: 2,
        summary: 'Ran shell batch',
        precedingToolUseIds: ['c1'],
      },
    ];

    const merged = mergeCompactToolGroups(
      items,
      false,
      undefined,
      new Set(['c1']),
    );

    expect(merged.length).toBe(1);
    expect(merged[0].id).toBe(1);
    expect(merged[0].type).toBe('tool_group');
  });

  it('drops absorbed trailing tool_use_summary even when followed by visible non-mergeable items', () => {
    const items: HistoryItem[] = [
      createToolGroup(1, [createTool('c1', 'Shell', ToolCallStatus.Success)]),
      {
        type: 'tool_use_summary',
        id: 2,
        summary: 'Ran shell batch',
        precedingToolUseIds: ['c1'],
      },
      { type: 'user', id: 3, text: 'next prompt' },
    ];

    const merged = mergeCompactToolGroups(
      items,
      false,
      undefined,
      new Set(['c1']),
    );

    expect(merged.length).toBe(2);
    expect(merged[0].type).toBe('tool_group');
    expect(merged[1].type).toBe('user');
  });

  it('preserves tool_use_summary for force-expanded (non-absorbed) tool_group', () => {
    // The errored tool_group is force-expanded: it renders through the full
    // ToolGroupMessage path, ignoring `compactLabel`. Its callId is NOT in
    // absorbedCallIds, so the summary must survive in merged output —
    // HistoryItemDisplay then renders it as a standalone `● <label>` line,
    // which is the only way the label reaches the screen for this group.
    const items: HistoryItem[] = [
      createToolGroup(1, [
        createTool('c1', 'Shell', ToolCallStatus.Error, 'boom'),
      ]),
      {
        type: 'tool_use_summary',
        id: 2,
        summary: 'Tried shell batch',
        precedingToolUseIds: ['c1'],
      },
    ];

    // Empty absorbedCallIds — the errored group's callId is not absorbed.
    const merged = mergeCompactToolGroups(items, false, undefined, new Set());

    expect(merged.length).toBe(2);
    expect(merged[0].type).toBe('tool_group');
    expect(merged[1].type).toBe('tool_use_summary');
  });

  it('preserves tool_use_summary when no absorbedCallIds set is provided (default)', () => {
    // Default empty set — preserves all summaries. This is the safe default
    // for callers that don't compute absorption (e.g., older test fixtures
    // and any future callers outside MainContent).
    const items: HistoryItem[] = [
      createToolGroup(1, [createTool('c1', 'Shell', ToolCallStatus.Success)]),
      {
        type: 'tool_use_summary',
        id: 2,
        summary: 'Ran shell batch',
        precedingToolUseIds: ['c1'],
      },
    ];

    const merged = mergeCompactToolGroups(items);

    expect(merged.length).toBe(2);
    expect(merged[0].type).toBe('tool_group');
    expect(merged[1].type).toBe('tool_use_summary');
  });

  it('merges tool_groups separated by tool_use_summary (hidden in compact)', () => {
    // Two mergeable batches separated by an absorbed summary — the summary
    // is dropped during merge, the two groups concatenate.
    const items: HistoryItem[] = [
      createToolGroup(1, [createTool('c1', 'Shell', ToolCallStatus.Success)]),
      {
        type: 'tool_use_summary',
        id: 2,
        summary: 'Ran first shell batch',
        precedingToolUseIds: ['c1'],
      },
      createToolGroup(3, [createTool('c2', 'Shell', ToolCallStatus.Success)]),
    ];

    const merged = mergeCompactToolGroups(
      items,
      false,
      undefined,
      new Set(['c1', 'c2']),
    );

    expect(merged.length).toBe(1);
    expect(merged[0].id).toBe(1);
    if (isToolGroup(merged[0])) {
      expect(merged[0].tools.map((t) => t.callId)).toEqual(['c1', 'c2']);
    }
  });

  it('preserves trailing gemini_thought after merged group', () => {
    const items: HistoryItem[] = [
      createToolGroup(1, [createTool('c1', 'Shell', ToolCallStatus.Success)]),

@@ -34,8 +34,12 @@ function isAgentWithPendingConfirmation(
/**
 * Check if a tool_group history item should be excluded from merging due to force-expand conditions.
 * These conditions match ToolGroupMessage.tsx:105-112 showCompact logic.
 * Exported so MainContent can determine which callIds get their label
 * "absorbed" by the compact tool_group header vs which need the standalone
 * `● <label>` line rendered (force-expanded groups never go through the
 * compact path, so their label would otherwise be invisible).
 */
function isForceExpandGroup(
export function isForceExpandGroup(
  item: HistoryItem,
  embeddedShellFocused: boolean,
  activeShellPtyId: number | undefined,
@@ -83,12 +87,16 @@ function isForceExpandGroup(

/**
 * Check if an item is hidden in compact mode (so it shouldn't break tool_group adjacency).
 * This mirrors HistoryItemDisplay.tsx:123-142 which hides gemini_thought / gemini_thought_content
 * when compactMode is true.
 * This mirrors HistoryItemDisplay.tsx which hides:
 * - `gemini_thought` / `gemini_thought_content` (thinking — hidden when compactMode is true),
 * - `tool_use_summary` (consumed upstream to decorate the adjacent tool_group's label;
 *   never rendered standalone so it must not break adjacency between two batches).
 */
function isHiddenInCompactMode(item: HistoryItem): boolean {
  return (
    item.type === 'gemini_thought' || item.type === 'gemini_thought_content'
    item.type === 'gemini_thought' ||
    item.type === 'gemini_thought_content' ||
    item.type === 'tool_use_summary'
  );
}

@@ -104,12 +112,20 @@ function isHiddenInCompactMode(item: HistoryItem): boolean {
 * @param items - History items array
 * @param embeddedShellFocused - Whether embedded shell is focused
 * @param activeShellPtyId - PTY ID of the active shell (if any)
 * @param absorbedCallIds - Set of tool callIds whose summary label is consumed
 * by a compact-mode tool_group header (i.e., the corresponding tool_group is
 * NOT force-expanded). Summaries for these callIds are dropped from the
 * merged result so MainContent's refreshStatic heuristic fires and the
 * tool_group re-renders with its label. Summaries for force-expanded groups
 * pass through unchanged so HistoryItemDisplay can render them as standalone
 * `● <label>` lines (the compact path doesn't consume their label).
 * @returns New array with merged tool_groups (does not mutate input)
 */
export function mergeCompactToolGroups(
  items: HistoryItem[],
  embeddedShellFocused: boolean = false,
  activeShellPtyId: number | undefined = undefined,
  absorbedCallIds: ReadonlySet<string> = new Set(),
): HistoryItem[] {
  const result: HistoryItem[] = [];
  let i = 0;
@@ -117,6 +133,34 @@ export function mergeCompactToolGroups(
  while (i < items.length) {
    const item = items[i];

    // Drop `tool_use_summary` items whose preceding callIds are *all* absorbed
    // by a compact tool_group header. Those headers will display the label
    // directly (via the `compactLabel` lookup in MainContent), so keeping the
    // standalone summary in the merged result would either double-display the
    // label (if HistoryItemDisplay rendered both) or, more importantly, would
    // bump mergedHistory.length lock-step with history.length and prevent
    // refreshStatic from firing — Ink's <Static> would never repaint the
    // committed tool_group with the new label.
    //
    // Summaries with at least one non-absorbed preceding callId — e.g., when
    // the corresponding tool_group is force-expanded (errors / confirming /
    // user-initiated / focused shell) and renders through the full
    // ToolGroupMessage path that does not consume `compactLabel` — must
    // survive in the merged result so HistoryItemDisplay can render them as
    // standalone `● <label>` lines.
    if (item.type === 'tool_use_summary') {
      const allAbsorbed =
        item.precedingToolUseIds.length > 0 &&
        item.precedingToolUseIds.every((id) => absorbedCallIds.has(id));
      if (allAbsorbed) {
        i++;
        continue;
      }
      result.push(item);
      i++;
      continue;
    }

    // Pass through non-mergeable items unchanged
    if (
      item.type !== 'tool_group' ||

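The drop rule in `mergeCompactToolGroups` hinges on `Array.prototype.every`, which returns `true` for an empty array; the `length > 0` guard is what keeps a summary with no callIds from being silently discarded. The sketch below isolates that rule and applies it as a filter, without the tool_group merging the real function also performs. `isSummaryAbsorbed`, `dropAbsorbedSummaries`, and the simplified item types are hypothetical.

```typescript
// Simplified item shapes (assumptions); only summaries carry callIds here.
type SummaryLike = { type: 'tool_use_summary'; precedingToolUseIds: string[] };
type OtherLike = { type: 'tool_group' | 'user' | 'gemini_thought' };
type MergeItem = SummaryLike | OtherLike;

// The drop rule: a summary disappears only when every callId it describes
// belongs to a compact (absorbed) tool_group.
function isSummaryAbsorbed(
  precedingToolUseIds: readonly string[],
  absorbedCallIds: ReadonlySet<string>,
): boolean {
  // [].every(...) is vacuously true, so the length guard keeps summaries
  // that reference no callIds at all.
  return (
    precedingToolUseIds.length > 0 &&
    precedingToolUseIds.every((id) => absorbedCallIds.has(id))
  );
}

// Hypothetical pre-pass applying only that rule (no tool_group merging).
function dropAbsorbedSummaries(
  items: MergeItem[],
  absorbedCallIds: ReadonlySet<string> = new Set(),
): MergeItem[] {
  return items.filter(
    (item) =>
      item.type !== 'tool_use_summary' ||
      !isSummaryAbsorbed(item.precedingToolUseIds, absorbedCallIds),
  );
}

const absorbed = new Set(['c1']);
console.log(isSummaryAbsorbed(['c1'], absorbed)); // true
console.log(isSummaryAbsorbed(['c1', 'c2'], absorbed)); // false: c2 is not absorbed
console.log(isSummaryAbsorbed([], absorbed)); // false: kept despite [].every() being true
```

As in the real function, the default empty `absorbedCallIds` keeps every summary, which is what the "preserves tool_use_summary when no absorbedCallIds set is provided" test checks.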
@@ -383,6 +383,7 @@ export interface ConfigParameters {
  sessionTokenLimit?: number;
  experimentalZedIntegration?: boolean;
  cronEnabled?: boolean;
  emitToolUseSummaries?: boolean;
  listExtensions?: boolean;
  overrideExtensions?: string[];
  allowedMcpServers?: string[];
@@ -631,6 +632,7 @@ export class Config {
  private readonly cliVersion?: string;
  private readonly experimentalZedIntegration: boolean = false;
  private readonly cronEnabled: boolean = false;
  private readonly emitToolUseSummaries: boolean = true;
  private readonly chatRecordingEnabled: boolean;
  private readonly loadMemoryFromIncludeDirectories: boolean = false;
  private readonly importFormat: 'tree' | 'flat';
@@ -773,6 +775,7 @@ export class Config {
    this.experimentalZedIntegration =
      params.experimentalZedIntegration ?? false;
    this.cronEnabled = params.cronEnabled ?? false;
    this.emitToolUseSummaries = params.emitToolUseSummaries ?? true;
    this.listExtensions = params.listExtensions ?? false;
    this.overrideExtensions = params.overrideExtensions;
    this.noBrowser = params.noBrowser ?? false;
@@ -1990,6 +1993,22 @@ export class Config {
    return this.cronEnabled;
  }

  /**
   * Whether the turn loop should fire a fast-model call after each tool batch
   * to emit a `tool_use_summary` message. Mirrors Claude Code's
   * `CLAUDE_CODE_EMIT_TOOL_USE_SUMMARIES` gate, but defaults to on so the
   * compact-mode UI benefits without configuration.
   *
   * Env overrides (either direction): `QWEN_CODE_EMIT_TOOL_USE_SUMMARIES=0`
   * or `=false` forces it off, `=1` or `=true` forces it on; any other value
   * falls back to the `emitToolUseSummaries` setting.
   */
  getEmitToolUseSummaries(): boolean {
    const env = process.env['QWEN_CODE_EMIT_TOOL_USE_SUMMARIES'];
    if (env === '0' || env === 'false') return false;
    if (env === '1' || env === 'true') return true;
    return this.emitToolUseSummaries;
  }

  getEnableRecursiveFileSearch(): boolean {
    return this.fileFiltering.enableRecursiveFileSearch;
  }
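Because `getEmitToolUseSummaries()` reads `process.env` directly, its precedence is easiest to check with the environment value and the setting passed in explicitly. `resolveEmitToolUseSummaries` below is a hypothetical helper written for illustration, not part of `Config`. Note that the comparison is exact and case-sensitive, so values such as `FALSE` or `off` fall through to the setting.

```typescript
// Hypothetical pure version of the precedence in getEmitToolUseSummaries():
// recognised env values win in either direction; anything else (unset,
// empty, 'yes', 'FALSE', ...) falls through to the setting.
function resolveEmitToolUseSummaries(
  envValue: string | undefined,
  setting: boolean | undefined,
): boolean {
  if (envValue === '0' || envValue === 'false') return false;
  if (envValue === '1' || envValue === 'true') return true;
  // Mirrors `params.emitToolUseSummaries ?? true` in the constructor.
  return setting ?? true;
}

console.log(resolveEmitToolUseSummaries(undefined, undefined)); // true
console.log(resolveEmitToolUseSummaries('0', true)); // false
console.log(resolveEmitToolUseSummaries('true', false)); // true
console.log(resolveEmitToolUseSummaries('yes', false)); // false
```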
@@ -147,6 +147,7 @@ export * from './services/sessionService.js';
export * from './services/sessionTitle.js';
export { stripTerminalControlSequences } from './utils/terminalSafe.js';
export * from './services/shellExecutionService.js';
export * from './services/toolUseSummary.js';
export * from './utils/bareMode.js';

// ============================================================================
409
packages/core/src/services/toolUseSummary.test.ts
Normal file
@@ -0,0 +1,409 @@
/**
 * @license
 * Copyright 2025 Qwen Team
 * SPDX-License-Identifier: Apache-2.0
 */

import { describe, it, expect, vi, beforeEach } from 'vitest';
import type { Config } from '../config/config.js';
import {
  cleanSummary,
  createToolUseSummaryMessage,
  generateToolUseSummary,
  TOOL_USE_SUMMARY_SYSTEM_PROMPT,
  truncateJson,
} from './toolUseSummary.js';

// Identity helper that names the bound used by the pre-truncation tests:
// the number of `y` characters in the output must stay below maxLength
// (JSON quoting and the field name consume part of the budget), which
// confirms the input never reached its full 10MB form.
function maxLengthGuard(maxLength: number) {
  return maxLength;
}

describe('truncateJson', () => {
  it('returns JSON for short values', () => {
    expect(truncateJson({ foo: 'bar' }, 100)).toBe('{"foo":"bar"}');
    expect(truncateJson('hello', 100)).toBe('"hello"');
    expect(truncateJson(42, 100)).toBe('42');
  });

  it('truncates long values with ellipsis', () => {
    const long = 'x'.repeat(500);
    const result = truncateJson(long, 50);
    expect(result.length).toBe(50);
    expect(result.endsWith('...')).toBe(true);
  });

  it('handles undefined', () => {
    expect(truncateJson(undefined, 100)).toBe('[undefined]');
  });

  it('pre-truncates large string leaves before JSON serialization', () => {
    // Ensures we don't allocate the full JSON for a 10MB string just to
    // slice it to maxLength. The result must still be ≤ maxLength.
    const huge = 'x'.repeat(10_000_000);
    const result = truncateJson(huge, 300);
    expect(result.length).toBeLessThanOrEqual(300);
    expect(result.endsWith('...')).toBe(true);
  });

  it('pre-truncates large string fields inside objects', () => {
    // The point: a 10MB string field must not be fully serialized before the
    // outer cap is applied (would allocate 10MB+ of JSON only to slice it).
    // Pre-truncation slices each string leaf to maxLength first, so the
    // serializer never sees the full payload.
    const obj = { content: 'y'.repeat(10_000_000) };
    const result = truncateJson(obj, 300);
    expect(result.length).toBeLessThanOrEqual(300);
    // The huge field is truncated to <= maxLength characters — far below
    // its original 10M length.
    const yCount = (result.match(/y/g) ?? []).length;
    expect(yCount).toBeLessThan(maxLengthGuard(300));
  });

  it('handles circular references gracefully', () => {
    const circular: Record<string, unknown> = {};
    circular['self'] = circular;
    expect(truncateJson(circular, 100)).toBe('[unable to serialize]');
  });
});

describe('cleanSummary', () => {
  it('preserves well-formed labels', () => {
    expect(cleanSummary('Searched in auth/')).toBe('Searched in auth/');
    expect(cleanSummary('Fixed NPE in UserService')).toBe(
      'Fixed NPE in UserService',
    );
  });

  it('takes first line only', () => {
    expect(cleanSummary('Created signup endpoint\nSome reasoning')).toBe(
      'Created signup endpoint',
    );
  });

  it('strips surrounding quotes', () => {
    expect(cleanSummary('"Read config.json"')).toBe('Read config.json');
    expect(cleanSummary("'Ran failing tests'")).toBe('Ran failing tests');
    expect(cleanSummary('`Fixed bug`')).toBe('Fixed bug');
  });

  it('strips leading bullet/dash', () => {
    expect(cleanSummary('- Searched auth')).toBe('Searched auth');
    expect(cleanSummary('* Read files')).toBe('Read files');
    expect(cleanSummary('• Fixed NPE')).toBe('Fixed NPE');
  });

  it('strips Label:/Summary: prefixes', () => {
    expect(cleanSummary('Label: Fixed bug')).toBe('Fixed bug');
    expect(cleanSummary('Summary: Ran tests')).toBe('Ran tests');
    expect(cleanSummary('Label:Searched files')).toBe('Searched files');
  });

  it('rejects error messages', () => {
    expect(cleanSummary('API error: 500')).toBe('');
    expect(cleanSummary('Error: something went wrong')).toBe('');
    expect(cleanSummary('I cannot generate a summary')).toBe('');
    expect(cleanSummary("I can't help with that")).toBe('');
    expect(cleanSummary('Unable to determine')).toBe('');
  });

  it('caps length at 100 chars', () => {
    const long = 'x'.repeat(200);
    expect(cleanSummary(long).length).toBe(100);
  });

  it('returns empty for empty/whitespace input', () => {
    expect(cleanSummary('')).toBe('');
    expect(cleanSummary(' ')).toBe('');
    expect(cleanSummary('\n\n')).toBe('');
  });

  it('preserves CJK labels', () => {
    expect(cleanSummary('搜索了 auth 模块')).toBe('搜索了 auth 模块');
  });

  it('strips Unicode curly quotes', () => {
    expect(cleanSummary('“Read config.json”')).toBe('Read config.json');
    expect(cleanSummary('‘Ran tests’')).toBe('Ran tests');
  });

  it('strips CJK corner brackets', () => {
    expect(cleanSummary('「搜索了 auth 模块」')).toBe('搜索了 auth 模块');
    expect(cleanSummary('『Fixed bug』')).toBe('Fixed bug');
  });

  it('strips markdown emphasis markers', () => {
    expect(cleanSummary('**Read 4 files**')).toBe('Read 4 files');
    expect(cleanSummary('_Searched auth_')).toBe('Searched auth');
    expect(cleanSummary('__Fixed NPE__')).toBe('Fixed NPE');
  });

  it('rejects Chinese refusal responses', () => {
    expect(cleanSummary('我无法生成摘要')).toBe('');
    expect(cleanSummary('我不能回答这个')).toBe('');
    expect(cleanSummary('抱歉,我不能帮助')).toBe('');
    expect(cleanSummary('无法确定')).toBe('');
    expect(cleanSummary('无法完成')).toBe('');
  });

  it('rejects curly-apostrophe English refusals', () => {
    // U+2019 right single quotation mark — models often emit this for
    // typographic apostrophes and the ASCII-only check missed it.
    expect(cleanSummary('I can’t generate that')).toBe('');
  });

  it('rejects additional English refusal patterns', () => {
    expect(cleanSummary('Failed to read files')).toBe('');
    expect(cleanSummary('Sorry, I cannot')).toBe('');
    expect(cleanSummary('Request failed')).toBe('');
  });
});

describe('createToolUseSummaryMessage', () => {
  it('creates a message with generated uuid and timestamp', () => {
    const msg = createToolUseSummaryMessage('Fixed bug', ['call-1', 'call-2']);
    expect(msg.type).toBe('tool_use_summary');
    expect(msg.summary).toBe('Fixed bug');
    expect(msg.precedingToolUseIds).toEqual(['call-1', 'call-2']);
    expect(msg.uuid).toMatch(/^[0-9a-f-]{36}$/);
    expect(msg.timestamp).toMatch(/^\d{4}-\d{2}-\d{2}T/);
  });

  it('generates distinct uuids', () => {
    const a = createToolUseSummaryMessage('a', []);
    const b = createToolUseSummaryMessage('b', []);
    expect(a.uuid).not.toBe(b.uuid);
  });
});

describe('generateToolUseSummary', () => {
  const makeMockConfig = (
    fastModel: string | undefined,
    generateContentFn?: ReturnType<typeof vi.fn>,
  ): Config => {
    const mockClient = generateContentFn
      ? { generateContent: generateContentFn }
      : undefined;
    return {
      getFastModel: () => fastModel,
      getGeminiClient: () => mockClient,
    } as unknown as Config;
  };

  const abortController = (): AbortController => new AbortController();

  beforeEach(() => {
    vi.clearAllMocks();
  });

  it('returns null when tools array is empty', async () => {
    const config = makeMockConfig('qwen-fast');
    const result = await generateToolUseSummary({
      config,
      tools: [],
      signal: abortController().signal,
    });
    expect(result).toBeNull();
  });

  it('returns null when no fast model is configured', async () => {
    const config = makeMockConfig(undefined);
    const result = await generateToolUseSummary({
      config,
      tools: [{ name: 'Read', input: { file: 'a.ts' }, output: '...' }],
      signal: abortController().signal,
    });
    expect(result).toBeNull();
  });

  it('returns null when signal is already aborted', async () => {
    const config = makeMockConfig('qwen-fast');
    const ac = abortController();
    ac.abort();
    const result = await generateToolUseSummary({
      config,
      tools: [{ name: 'Read', input: {}, output: '' }],
      signal: ac.signal,
    });
    expect(result).toBeNull();
  });

  it('calls model with fast model id and system prompt', async () => {
    const generateContentFn = vi.fn().mockResolvedValue({
      candidates: [
        {
          content: { parts: [{ text: 'Searched in auth/' }] },
        },
      ],
    });
    const config = makeMockConfig('qwen-fast', generateContentFn);

    const result = await generateToolUseSummary({
      config,
      tools: [
        { name: 'Grep', input: { pattern: 'login' }, output: '3 matches' },
      ],
      signal: abortController().signal,
    });

    expect(result).toBe('Searched in auth/');
    expect(generateContentFn).toHaveBeenCalledTimes(1);

    const args = generateContentFn.mock.calls[0];
    const [contents, generationConfig, , model, promptId] = args;

    expect(model).toBe('qwen-fast');
    expect(promptId).toBe('tool_use_summary_generation');
    expect(generationConfig.systemInstruction).toBe(
      TOOL_USE_SUMMARY_SYSTEM_PROMPT,
    );
    expect(generationConfig.tools).toEqual([]);

    const userText = contents[0].parts[0].text as string;
    expect(userText).toContain('Tool: Grep');
    expect(userText).toContain('"pattern":"login"');
    expect(userText).toContain('3 matches');
    expect(userText).toContain('Label:');
  });

  it('includes lastAssistantText as intent prefix', async () => {
    const generateContentFn = vi.fn().mockResolvedValue({
      candidates: [{ content: { parts: [{ text: 'Fixed auth bug' }] } }],
    });
    const config = makeMockConfig('qwen-fast', generateContentFn);

    await generateToolUseSummary({
      config,
      tools: [{ name: 'Edit', input: {}, output: '' }],
      signal: abortController().signal,
      lastAssistantText:
        'I will now fix the authentication bug in the login flow.',
    });

    const userText = generateContentFn.mock.calls[0][0][0].parts[0]
      .text as string;
    expect(userText).toContain(
      "User's intent (from assistant's last message):",
    );
    expect(userText).toContain('fix the authentication bug');
  });

  it('truncates lastAssistantText to 200 chars', async () => {
    const generateContentFn = vi.fn().mockResolvedValue({
      candidates: [{ content: { parts: [{ text: 'Done' }] } }],
    });
    const config = makeMockConfig('qwen-fast', generateContentFn);

    const longText = 'A'.repeat(500);
    await generateToolUseSummary({
      config,
      tools: [{ name: 'Edit', input: {}, output: '' }],
      signal: abortController().signal,
      lastAssistantText: longText,
    });

    const userText = generateContentFn.mock.calls[0][0][0].parts[0]
      .text as string;
    // 200 A's plus wrapper text, but never a run of 201 or more.
    expect(userText).toContain('A'.repeat(200));
    expect(userText).not.toContain('A'.repeat(201));
  });

  it('uses explicit model parameter over config fast model', async () => {
    const generateContentFn = vi.fn().mockResolvedValue({
      candidates: [{ content: { parts: [{ text: 'Done' }] } }],
    });
    const config = makeMockConfig('qwen-fast', generateContentFn);

    await generateToolUseSummary({
      config,
      tools: [{ name: 'Edit', input: {}, output: '' }],
      signal: abortController().signal,
      model: 'qwen-turbo-explicit',
    });

    expect(generateContentFn.mock.calls[0][3]).toBe('qwen-turbo-explicit');
  });

  it('returns null when model returns empty text', async () => {
    const generateContentFn = vi.fn().mockResolvedValue({
      candidates: [{ content: { parts: [{ text: '' }] } }],
    });
    const config = makeMockConfig('qwen-fast', generateContentFn);

    const result = await generateToolUseSummary({
      config,
      tools: [{ name: 'Read', input: {}, output: '' }],
      signal: abortController().signal,
    });
    expect(result).toBeNull();
  });

  it('returns null when model call throws', async () => {
    const generateContentFn = vi.fn().mockRejectedValue(new Error('API error'));
    const config = makeMockConfig('qwen-fast', generateContentFn);

    const result = await generateToolUseSummary({
      config,
      tools: [{ name: 'Read', input: {}, output: '' }],
      signal: abortController().signal,
    });
    expect(result).toBeNull();
  });

  it('returns null when the signal aborts during the call', async () => {
    const ac = abortController();
    const generateContentFn = vi.fn().mockImplementation(async () => {
      ac.abort();
      throw new Error('aborted');
    });
    const config = makeMockConfig('qwen-fast', generateContentFn);

    const result = await generateToolUseSummary({
      config,
      tools: [{ name: 'Read', input: {}, output: '' }],
      signal: ac.signal,
    });
    expect(result).toBeNull();
  });

  it('truncates tool input/output to 300 chars', async () => {
    const generateContentFn = vi.fn().mockResolvedValue({
      candidates: [{ content: { parts: [{ text: 'Read file' }] } }],
    });
    const config = makeMockConfig('qwen-fast', generateContentFn);

    const hugeInput = { content: 'x'.repeat(10000) };
    const hugeOutput = 'y'.repeat(10000);

    await generateToolUseSummary({
      config,
      tools: [{ name: 'Read', input: hugeInput, output: hugeOutput }],
      signal: abortController().signal,
    });

    const userText = generateContentFn.mock.calls[0][0][0].parts[0]
      .text as string;
    // Each field is capped at 300 chars, so the prompt must not contain the
    // full 10K repetition.
    expect(userText).not.toContain('x'.repeat(500));
    expect(userText).not.toContain('y'.repeat(500));
    expect(userText).toContain('...');
  });

  it('cleans markdown bullets / quotes from model output', async () => {
    const generateContentFn = vi.fn().mockResolvedValue({
      candidates: [{ content: { parts: [{ text: '- "Searched auth/"' }] } }],
    });
    const config = makeMockConfig('qwen-fast', generateContentFn);

    const result = await generateToolUseSummary({
      config,
      tools: [{ name: 'Grep', input: {}, output: '' }],
      signal: abortController().signal,
    });
    expect(result).toBe('Searched auth/');
  });
});
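The assertions above pin down the prompt layout the summarizer receives: one `Tool:`/`Input:`/`Output:` block per call with each field capped at 300 characters, an optional intent prefix capped at 200 characters, and a trailing `Label:` cue. A self-contained sketch of that assembly follows; `buildSummaryPrompt` and `truncate` are hypothetical names, and `truncate` omits the real module's string pre-truncation and circular-reference handling for brevity.

```typescript
interface ToolInfo {
  name: string;
  input: unknown;
  output: unknown;
}

const FIELD_LIMIT = 300; // max chars per serialized input/output field
const INTENT_LIMIT = 200; // max chars taken from the last assistant message

// Simplified truncation: serialize, then cap with a '...' suffix.
// (No pre-truncation and no try/catch for circular values in this sketch.)
function truncate(value: unknown, max: number): string {
  const str: string | undefined = JSON.stringify(value);
  if (str === undefined) return '[undefined]';
  return str.length <= max ? str : str.slice(0, max - 3) + '...';
}

function buildSummaryPrompt(
  tools: ToolInfo[],
  lastAssistantText?: string,
): string {
  const toolBlocks = tools
    .map(
      (t) =>
        `Tool: ${t.name}\nInput: ${truncate(t.input, FIELD_LIMIT)}\nOutput: ${truncate(t.output, FIELD_LIMIT)}`,
    )
    .join('\n\n');
  const intent = lastAssistantText
    ? `User's intent (from assistant's last message): ${lastAssistantText.slice(0, INTENT_LIMIT)}\n\n`
    : '';
  // The trailing "Label:" cue asks the model to reply with just the label.
  return `${intent}Tools completed:\n\n${toolBlocks}\n\nLabel:`;
}

console.log(
  buildSummaryPrompt([
    { name: 'Grep', input: { pattern: 'login' }, output: '3 matches' },
  ]),
);
```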
309
packages/core/src/services/toolUseSummary.ts
Normal file
@@ -0,0 +1,309 @@
/**
 * @license
 * Copyright 2025 Qwen Team
 * SPDX-License-Identifier: Apache-2.0
 */

/**
 * Tool Use Summary Generator
 *
 * Generates a short human-readable label (git-commit-subject style, ~30 chars)
 * describing what a batch of tool calls accomplished. Uses the configured fast
 * model so the call is cheap; runs in parallel with the next turn's API call so
 * its ~1s latency is hidden behind the 5-30s main-model streaming.
 *
 * Ported from Claude Code (`services/toolUseSummary/toolUseSummaryGenerator.ts`).
 * The system prompt is verbatim; the input/output shape and truncation rules
 * are preserved for behavioral parity with the SDK `tool_use_summary` message.
 */

import { randomUUID } from 'node:crypto';
import type { Config } from '../config/config.js';
import { getResponseText } from '../utils/partUtils.js';
import { createDebugLogger } from '../utils/debugLogger.js';

const debugLogger = createDebugLogger('TOOL_USE_SUMMARY');

/**
 * Message emitted into the stream after a tool batch completes with a
 * successful summary. Mirrors Claude Code's `ToolUseSummaryMessage` so SDK
 * clients consuming either stream see a compatible shape.
 */
export interface ToolUseSummaryMessage {
  type: 'tool_use_summary';
  summary: string;
  /** Tool-use call IDs this summary describes. */
  precedingToolUseIds: string[];
  uuid: string;
  timestamp: string;
}

/**
 * Creates a `tool_use_summary` message. The UUID and timestamp are generated
 * here so the message is immediately serializable for recording/SDK emission.
 */
export function createToolUseSummaryMessage(
  summary: string,
  precedingToolUseIds: string[],
): ToolUseSummaryMessage {
  return {
    type: 'tool_use_summary',
    summary,
    precedingToolUseIds,
    uuid: randomUUID(),
    timestamp: new Date().toISOString(),
  };
}

export const TOOL_USE_SUMMARY_SYSTEM_PROMPT = `Write a short summary label describing what these tool calls accomplished. It appears as a single-line row in a mobile app and truncates around 30 characters, so think git-commit-subject, not sentence.

Keep the verb in past tense and the most distinctive noun. Drop articles, connectors, and long location context first.

Examples:
- Searched in auth/
- Fixed NPE in UserService
- Created signup endpoint
- Read config.json
- Ran failing tests`;

/** Max characters per input/output field fed to the summarizer. */
const INPUT_TRUNCATE_LENGTH = 300;
/** Max characters of the last assistant text included as user-intent prefix. */
const LAST_ASSISTANT_TEXT_LENGTH = 200;
/**
 * Hard cap on the cleaned label. Deliberately looser than the ~30-char mobile
 * UI truncation so longer but useful labels (e.g. CJK phrases) survive.
 */
const MAX_SUMMARY_LENGTH = 100;

export interface ToolInfo {
  name: string;
  input: unknown;
  output: unknown;
}

export interface GenerateToolUseSummaryParams {
  config: Config;
  tools: ToolInfo[];
  signal: AbortSignal;
  /**
   * Trailing text from the assistant's last message, used as intent prefix
   * so the summarizer knows what the user was trying to accomplish. Only the
   * first `LAST_ASSISTANT_TEXT_LENGTH` (200) characters are sent.
   */
  lastAssistantText?: string;
  /**
   * Fast model to use. If omitted, falls back to `config.getFastModel()`;
   * if that also returns undefined, the call is skipped (returns null).
   * Unlike `sessionRecap`, this does not fall back to the main model —
   * summary generation is a nice-to-have and must not incur main-model cost.
   */
  model?: string;
}

/**
 * Generates a short label for a completed tool batch.
 *
 * @returns The summary string, or null when skipped (no tools, no fast model,
 *   aborted, or model failure). Non-critical: callers should not surface errors.
 */
export async function generateToolUseSummary(
  params: GenerateToolUseSummaryParams,
): Promise<string | null> {
  const { config, tools, signal, lastAssistantText } = params;

  if (tools.length === 0) {
    return null;
  }

  const model = params.model ?? config.getFastModel();
  if (!model) {
    debugLogger.debug('No fast model configured — skipping summary generation');
    return null;
  }

  if (signal.aborted) {
    return null;
  }

  try {
    const toolSummaries = tools
      .map((tool) => {
        const inputStr = truncateJson(tool.input, INPUT_TRUNCATE_LENGTH);
        const outputStr = truncateJson(tool.output, INPUT_TRUNCATE_LENGTH);
        return `Tool: ${tool.name}\nInput: ${inputStr}\nOutput: ${outputStr}`;
      })
      .join('\n\n');

    const contextPrefix = lastAssistantText
      ? `User's intent (from assistant's last message): ${lastAssistantText.slice(0, LAST_ASSISTANT_TEXT_LENGTH)}\n\n`
      : '';

    const userPrompt = `${contextPrefix}Tools completed:\n\n${toolSummaries}\n\nLabel:`;

    const geminiClient = config.getGeminiClient();
    if (!geminiClient) {
      debugLogger.debug('No gemini client available — skipping');
      return null;
    }

    const response = await geminiClient.generateContent(
      [{ role: 'user', parts: [{ text: userPrompt }] }],
      {
        systemInstruction: TOOL_USE_SUMMARY_SYSTEM_PROMPT,
        tools: [],
        maxOutputTokens: 60,
        temperature: 0.3,
      },
      signal,
      model,
      'tool_use_summary_generation',
    );

    if (signal.aborted) return null;

    const raw = getResponseText(response)?.trim();
    if (!raw) {
      debugLogger.debug('Summary generation returned empty result');
      return null;
    }

    const cleaned = cleanSummary(raw);
    if (!cleaned) {
      debugLogger.debug(`Summary cleaned to empty: raw="${raw}"`);
      return null;
    }

    debugLogger.debug(`Summary generated: "${cleaned}"`);
    return cleaned;
  } catch (err) {
    if (signal.aborted) return null;
    debugLogger.warn(
      `Summary generation failed: ${err instanceof Error ? err.message : String(err)}`,
    );
    return null;
  }
}

/**
 * Truncates a JSON value to a maximum length for the prompt. Mirrors
 * Claude Code's `truncateJson` behavior (including the `...` suffix).
 *
 * For large string inputs, pre-truncates BEFORE serialization to avoid
 * allocating the full JSON representation on the interactive turn path —
 * a 10MB ReadFile result would otherwise be fully stringified just to be
 * sliced down to 300 chars and discarded.
 *
 * For object/array inputs, pre-truncates string fields recursively (up to
 * four levels deep, see `preTruncate`) before serialization. Tool inputs
 * (`args`) and outputs (functionResponse content) are typically shallow
 * objects whose cost is dominated by a small number of long string fields.
 */
export function truncateJson(value: unknown, maxLength: number): string {
  try {
    const pre = preTruncate(value, maxLength);
    const str = JSON.stringify(pre);
    if (str == null) return '[undefined]';
    return str.length <= maxLength ? str : str.slice(0, maxLength - 3) + '...';
  } catch {
    return '[unable to serialize]';
  }
}

/**
 * Walks an arbitrary value and pre-truncates string leaves that exceed
 * `maxLength`. Bounded to depth 4 to keep the cost predictable on deeply
 * nested objects (rare in tool args/outputs but possible).
 */
function preTruncate(value: unknown, maxLength: number, depth = 0): unknown {
  if (value === null || value === undefined) return value;
  if (typeof value === 'string') {
    return value.length > maxLength ? value.slice(0, maxLength) : value;
  }
  if (depth >= 4) return value;
  if (Array.isArray(value)) {
    return value.map((v) => preTruncate(v, maxLength, depth + 1));
  }
  if (typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      out[k] = preTruncate(v, maxLength, depth + 1);
    }
    return out;
  }
  return value;
}

/**
 * Character class covering ASCII and common Unicode quote / bracket-quote
 * pairs. Chinese-instruct models frequently emit U+2018/U+2019 (curly single),
 * U+201C/U+201D (curly double), U+300C-F (CJK corner brackets). Bounded
 * quantifier keeps the regex linear (js/polynomial-redos).
 */
const QUOTE_CHARS = '"\'`‘’“”「」『』';
const LEADING_QUOTES_RE = new RegExp(`^[${QUOTE_CHARS}]{1,10}`);
const TRAILING_QUOTES_RE = new RegExp(`[${QUOTE_CHARS}]{1,10}$`);

/**
 * Error/refusal-like response prefixes that a usable label never starts
 * with. Each pattern is anchored to the start of the cleaned text (English
 * patterns are case-insensitive; full-width `:` and `,` are accepted
 * alongside their ASCII forms). Covers English and Chinese refusals common
 * in mixed-provider deployments.
 */
const REFUSAL_PREFIXES = [
  /^api error\b/i,
  /^error[::]/i,
  // Match "I can't" (ASCII U+0027 apostrophe), "I can’t" (curly U+2019),
  // and "I cannot" as a single pattern.
  /^i can(?:['’]t|not)\b/i,
  /^unable to\b/i,
  /^failed to\b/i,
  /^sorry[,,]/i,
  /^request failed\b/i,
  /^我无法/,
  /^我不能/,
  /^抱歉[,,]?/,
  /^无法/,
];

/**
 * Strips markdown, quotes, and common prefix noise from the model's raw
 * response. Enforces `MAX_SUMMARY_LENGTH` as a hard cap — the mobile UI
 * truncates around 30 chars, but we allow some slack so unusual-but-useful
 * labels (e.g. CJK phrases) survive. Returns an empty string if nothing
 * usable remains (empty input or an error/refusal response).
 */
export function cleanSummary(raw: string): string {
  // Take first line only
  let text = raw.split('\n')[0]?.trim() ?? '';

  // Strip leading bullet/dash first — otherwise a bulleted quoted label like
  // `- "Searched auth/"` would keep its leading quote after the trailing one
  // is stripped.
  text = text.replace(/^[-*•]\s+/, '').trim();

  // Strip markdown emphasis (`**bold**`, `__bold__`, `_italic_`). Bounded
  // quantifiers keep the regex linear.
  text = text
    .replace(/^[*_]{1,3}/, '')
    .replace(/[*_]{1,3}$/, '')
    .trim();

  // Strip surrounding ASCII + Unicode quotes/backticks.
  text = text
    .replace(LEADING_QUOTES_RE, '')
    .replace(TRAILING_QUOTES_RE, '')
    .trim();

  // Strip common prefix labels like "Label:" "Summary:"
  text = text.replace(/^(label|summary|result|output)\s*[::]\s*/i, '').trim();

  if (!text) return '';

  // Reject error/refusal-like responses (English + Chinese variants).
  for (const re of REFUSAL_PREFIXES) {
    if (re.test(text)) return '';
  }

  // Hard cap length
  if (text.length > MAX_SUMMARY_LENGTH) {
    text = text.slice(0, MAX_SUMMARY_LENGTH).trim();
  }

  return text;
}
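The ordering comment in `cleanSummary` (strip the bullet before the quotes) is easiest to see by running both orders on the same input. This sketch is a simplification: `stripBullet`, `stripQuotes`, `bulletThenQuotes`, and `quotesThenBullet` are hypothetical helpers, and the quote class here is ASCII-only, whereas the real `QUOTE_CHARS` also covers curly quotes and CJK corner brackets.

```typescript
// Simplified versions of two cleanSummary passes (ASCII quotes only).
const stripBullet = (s: string): string => s.replace(/^[-*•]\s+/, '').trim();
const stripQuotes = (s: string): string =>
  s.replace(/^["'`]{1,10}/, '').replace(/["'`]{1,10}$/, '').trim();

// Order used by cleanSummary: bullet first, then quotes.
function bulletThenQuotes(raw: string): string {
  return stripQuotes(stripBullet(raw.trim()));
}

// Reversed order: when the quote pass runs, the leading quote is still
// hidden behind "- ", so only the trailing quote is removed.
function quotesThenBullet(raw: string): string {
  return stripBullet(stripQuotes(raw.trim()));
}

const raw = '- "Searched auth/"';
console.log(bulletThenQuotes(raw)); // Searched auth/
console.log(quotesThenBullet(raw)); // "Searched auth/
```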
@@ -2010,6 +2010,11 @@
      "description": "Enable in-session cron/loop tools (experimental). When enabled, the model can create recurring prompts using cron_create, cron_list, and cron_delete tools. Can also be enabled via QWEN_CODE_ENABLE_CRON=1 environment variable.",
      "type": "boolean",
      "default": false
    },
    "emitToolUseSummaries": {
      "description": "Generate a short LLM-based label after each tool batch completes. In compact mode the label replaces the generic `Tool × N` header; in full mode it appears as a dim `● <label>` line below the tool group. Requires a fast model to be configured; runs in parallel with the next API call so latency is hidden. Currently affects interactive CLI rendering only — SDK / non-interactive emission of the `tool_use_summary` message is not yet wired (the message factory is exported for a follow-up PR). Can be overridden with QWEN_CODE_EMIT_TOOL_USE_SUMMARIES=0 or =1.",
      "type": "boolean",
      "default": true
    }
  }
},
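The setting description (and the header of `toolUseSummary.ts`) says the summary call runs in parallel with the next API call so its latency is hidden. The scheduling idea can be sketched as follows; `runTurnWithSummary`, `summarize`, and `nextTurn` are hypothetical names, and the sketch assumes one `AbortController` that is aborted when the session is interrupted. It is not the turn loop's actual wiring, which is not part of this excerpt.

```typescript
// Hypothetical stand-ins: `summarize` represents the fast-model summary
// call and `nextTurn` the main-model turn that follows the tool batch.
type Summarize = (signal: AbortSignal) => Promise<string | null>;
type NextTurn = () => Promise<string>;

// Starts the summary and the next turn together. The summary is never
// awaited on the turn path, so its latency overlaps the main-model call.
function runTurnWithSummary(
  summarize: Summarize,
  nextTurn: NextTurn,
  onLabel: (label: string) => void,
  controller: AbortController = new AbortController(),
): { turn: Promise<string>; summaryDone: Promise<void> } {
  const summaryDone = summarize(controller.signal)
    .then((label) => {
      // null means "skipped or failed": keep the generic header.
      if (label && !controller.signal.aborted) onLabel(label);
    })
    .catch(() => {
      // Summaries are non-critical, so errors are swallowed, not surfaced.
    });
  const turn = nextTurn();
  return { turn, summaryDone };
}

const delay = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function demo(): Promise<void> {
  const labels: string[] = [];
  const { turn, summaryDone } = runTurnWithSummary(
    async () => {
      await delay(20); // stands in for the fast-model call
      return 'Read config.json';
    },
    async () => {
      await delay(50); // stands in for main-model streaming
      return 'turn done';
    },
    (label) => labels.push(label),
  );
  console.log(await turn); // turn done
  await summaryDone;
  console.log(labels); // [ 'Read config.json' ]
}

void demo();
```

Returning `summaryDone` separately lets the caller (or a test) wait for the label without ever blocking the turn result on it.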