diff --git a/.qwen/skills/structured-debugging/SKILL.md b/.qwen/skills/structured-debugging/SKILL.md index 99ea52903..da19174cf 100644 --- a/.qwen/skills/structured-debugging/SKILL.md +++ b/.qwen/skills/structured-debugging/SKILL.md @@ -1,6 +1,6 @@ --- name: structured-debugging -description: +description: > Hypothesis-driven debugging methodology for hard bugs. Use this skill whenever you're investigating non-trivial bugs, unexpected behavior, flaky tests, or tracing issues through complex systems. Activate proactively when debugging @@ -32,11 +32,9 @@ Good: "The leader hangs because `hasActiveTeammates()` returns true after all ag have reported completed, likely because terminal status isn't being set on the agent object after the backend process exits." -Create a side note file for the investigation: - -``` -~/.qwen/investigations/-.md -``` +For bugs you expect to take more than one round, create a side note file +for the investigation in whichever location the project uses for such +notes. Write your hypothesis there. This file persists across conversation turns and even across sessions — it's your investigation journal. @@ -49,6 +47,12 @@ confirm or reject your hypothesis. Think about what data you need to see. Don't scatter `console.log` everywhere. Identify the 2-3 places where your hypothesis makes a testable prediction, and instrument those. +Prefer logging _values_ (return codes, payload contents, stream types, +message bodies, env state) over _presence checks_ ("was this function +called?", "was this branch taken?"). Code-path traces tell you what ran; +data traces tell you what it ran on. Most non-trivial bugs are correct +code processing wrong data. + Ask yourself: "If my hypothesis is correct, what will I see at point X? If it's wrong, what will I see instead?" @@ -129,6 +133,23 @@ is still broken if the inbox contains stale messages from a previous run. Always inspect the _content_ flowing through the code, not just whether the code runs. Check payloads, message contents, file data, and database state. +### Reframing the user's report instead of investigating it + +When the user reports a symptom your own run doesn't reproduce, the +contradiction _is_ the evidence — the two environments differ in some way +you haven't identified yet. The wrong move is to reframe their report +("they must be on a stale SHA", "they must be confused about what they +saw", "must be a flake") so that your run becomes the ground truth. Once +you do that, every later piece of evidence gets bent to defend the +reframing, and the actual bug stays hidden. + +The right move: catalogue what differs between their environment and +yours (TTY vs pipe, terminal emulator, shell, locale, env vars, prior +state, build artifacts) before forming any hypothesis. For ambiguous +symptoms ("no output", "it's slow", "it's wrong") ask one disambiguating +question first — e.g., "does it hang or exit cleanly?" — that prunes the +hypothesis space cheaply before any test run. + ### Losing context across attempts After several debugging rounds, you start forgetting what you already tried and @@ -164,3 +185,10 @@ Fix: [what you're changing and why it addresses the root cause] ``` Then apply the fix, remove instrumentation, and verify with a clean run. + +## Worked examples + +- [`examples/headless-bg-agent-empty-stdout.md`](examples/headless-bg-agent-empty-stdout.md) + — pipe-captured runs all passed; the user's TTY printed nothing. The + contradiction _was_ the bug. Illustrates _reproduction contradiction is + data_ and _instrument data, not code paths_. diff --git a/.qwen/skills/structured-debugging/examples/headless-bg-agent-empty-stdout.md b/.qwen/skills/structured-debugging/examples/headless-bg-agent-empty-stdout.md new file mode 100644 index 000000000..33a356b76 --- /dev/null +++ b/.qwen/skills/structured-debugging/examples/headless-bg-agent-empty-stdout.md @@ -0,0 +1,59 @@ +# Worked example: headless run prints empty stdout in zsh TTY + +A short qwen-code case to illustrate two failure modes from `SKILL.md`: +_reproduction contradiction is data_, and _instrument the data flow, not +just the code path_. + +## The bug + +User: `npm run dev -- -p "..."` in zsh prints nothing. Process exits clean, +`~/.qwen/logs` shows the model returned proper text. Stdout was empty. + +Cause: `JsonOutputAdapter.emitResult` wrote `resultMessage.result` without +a trailing `\n`. zsh's `PROMPT_SP` (powerlevel10k, agnoster, …) detects +the missing newline and emits `\r\033[K` before drawing the next prompt, +erasing the line. Pipe-captured stdout has no `PROMPT_SP`, so the bug is +invisible there. + +Fix: append `\n` to the write. + +## What made the case instructive + +Every reproduction attempt from a debugging environment that captures +stdout (Cursor's Shell tool, `out=$(...)`, `tee`, file redirect) **passed**. +14/14 success against the user's 0/N. Same SHA, same machine, same +command. The only variable was: pipe stdout vs TTY stdout. + +That contradiction was the entire investigation. Once it was named, the +fix was one line. + +## Lessons mapped to SKILL.md + +- **Reproduction contradiction is data, not user error.** When your run + succeeds and the user's fails on identical state, the _difference + between the two environments_ is where the bug lives. Catalogue what + differs (TTY vs pipe, terminal emulator, shell, locale, env vars, + prior state) before forming any hypothesis. Reframing the user's + report ("they must be on stale code") burns rounds and credibility. + +- **Ask the one disambiguating question first.** "Does it hang or exit + cleanly?" would have falsified the most tempting wrong hypothesis here + (the recently-fixed drain-loop hang) on turn one. For any "no output" + report, that question is free and prunes half the hypothesis space. + +- **Instrument the data flow, not just the code path.** Tracing whether + `write` was called showed the happy path firing every time and resolved + nothing. The breakthrough was logging the _return value_ of + `process.stdout.write` together with `process.stdout.isTTY`. Code-path + traces tell you what ran; data traces tell you what it ran on. + +- **Pipe ≠ TTY.** A passing pipe-captured run does not prove a TTY user + sees the same output. Shell prompts can post-process trailing-newline- + less writes; terminals can swallow control sequences; pipes do + neither. When debugging interactive-shell symptoms, get evidence from + the user's actual terminal at least once. + +## Reference + +Fix commit: qwen-code `feadf052f` — +`fix(cli): append newline to text-mode emitResult so zsh PROMPT_SP doesn't erase the line` diff --git a/packages/cli/src/nonInteractive/io/JsonOutputAdapter.ts b/packages/cli/src/nonInteractive/io/JsonOutputAdapter.ts index 68633675b..5d36ac7f0 100644 --- a/packages/cli/src/nonInteractive/io/JsonOutputAdapter.ts +++ b/packages/cli/src/nonInteractive/io/JsonOutputAdapter.ts @@ -67,9 +67,9 @@ export class JsonOutputAdapter if (this.config.getOutputFormat() === 'text') { if (resultMessage.is_error) { - process.stderr.write(`${resultMessage.error?.message || ''}`); + process.stderr.write(`${resultMessage.error?.message || ''}\n`); } else { - process.stdout.write(`${resultMessage.result}`); + process.stdout.write(`${resultMessage.result}\n`); } } else { // Emit the entire messages array as JSON (includes all main agent + subagent messages) diff --git a/packages/cli/src/nonInteractiveCli.test.ts b/packages/cli/src/nonInteractiveCli.test.ts index 8bd34ca22..6fe474b07 100644 --- a/packages/cli/src/nonInteractiveCli.test.ts +++ b/packages/cli/src/nonInteractiveCli.test.ts @@ -146,6 +146,11 @@ describe('runNonInteractive', () => { isInteractive: vi.fn().mockReturnValue(false), isCronEnabled: vi.fn().mockReturnValue(false), getCronScheduler: vi.fn().mockReturnValue(null), + getBackgroundTaskRegistry: vi.fn().mockReturnValue({ + setNotificationCallback: vi.fn(), + setRegisterCallback: vi.fn(), + getRunning: vi.fn().mockReturnValue([]), + }), } as unknown as Config; mockSettings = { @@ -255,7 +260,7 @@ describe('runNonInteractive', () => { 'prompt-id-1', { type: SendMessageType.UserQuery }, ); - expect(processStdoutSpy).toHaveBeenCalledWith('Hello World'); + expect(processStdoutSpy).toHaveBeenCalledWith('Hello World\n'); expect(mockShutdownTelemetry).toHaveBeenCalled(); }); @@ -319,7 +324,7 @@ describe('runNonInteractive', () => { 'prompt-id-2', { type: SendMessageType.ToolResult }, ); - expect(processStdoutSpy).toHaveBeenCalledWith('Final answer'); + expect(processStdoutSpy).toHaveBeenCalledWith('Final answer\n'); }); it('should handle error during tool execution and should send error back to the model', async () => { @@ -388,7 +393,7 @@ describe('runNonInteractive', () => { 'prompt-id-3', { type: SendMessageType.ToolResult }, ); - expect(processStdoutSpy).toHaveBeenCalledWith('Sorry, let me try again.'); + expect(processStdoutSpy).toHaveBeenCalledWith('Sorry, let me try again.\n'); }); it('should exit with error if sendMessageStream throws initially', async () => { @@ -450,7 +455,7 @@ describe('runNonInteractive', () => { expect(mockCoreExecuteToolCall).toHaveBeenCalled(); expect(mockGeminiClient.sendMessageStream).toHaveBeenCalledTimes(2); expect(processStdoutSpy).toHaveBeenCalledWith( - "Sorry, I can't find that tool.", + "Sorry, I can't find that tool.\n", ); }); @@ -514,7 +519,7 @@ describe('runNonInteractive', () => { ); // 6. Assert the final output is correct - expect(processStdoutSpy).toHaveBeenCalledWith('Summary complete.'); + expect(processStdoutSpy).toHaveBeenCalledWith('Summary complete.\n'); }); it('should process input and write JSON output with stats', async () => { @@ -887,7 +892,7 @@ describe('runNonInteractive', () => { { type: SendMessageType.UserQuery }, ); - expect(processStdoutSpy).toHaveBeenCalledWith('Response from command'); + expect(processStdoutSpy).toHaveBeenCalledWith('Response from command\n'); }); it('should handle command that requires confirmation by returning early', async () => { @@ -912,7 +917,7 @@ describe('runNonInteractive', () => { // Should write error message through adapter to stdout (TEXT mode goes through JsonOutputAdapter) expect(processStderrSpy).toHaveBeenCalledWith( - 'Shell command confirmation is not supported in non-interactive mode. Use YOLO mode or pre-approve commands.', + 'Shell command confirmation is not supported in non-interactive mode. Use YOLO mode or pre-approve commands.\n', ); }); @@ -947,7 +952,7 @@ describe('runNonInteractive', () => { { type: SendMessageType.UserQuery }, ); - expect(processStdoutSpy).toHaveBeenCalledWith('Response to unknown'); + expect(processStdoutSpy).toHaveBeenCalledWith('Response to unknown\n'); }); it('should handle known but unsupported slash commands like /help by returning early', async () => { @@ -970,7 +975,7 @@ describe('runNonInteractive', () => { // Should write error message through adapter to stdout (TEXT mode goes through JsonOutputAdapter) expect(processStderrSpy).toHaveBeenCalledWith( - 'The command "/help" is not supported in non-interactive mode.', + 'The command "/help" is not supported in non-interactive mode.\n', ); }); @@ -995,7 +1000,7 @@ describe('runNonInteractive', () => { // Should write error message to stderr expect(processStderrSpy).toHaveBeenCalledWith( - 'Unknown command result type: unhandled', + 'Unknown command result type: unhandled\n', ); }); @@ -1033,7 +1038,7 @@ describe('runNonInteractive', () => { expect(mockAction).toHaveBeenCalledWith(expect.any(Object), 'arg1 arg2'); - expect(processStdoutSpy).toHaveBeenCalledWith('Acknowledged'); + expect(processStdoutSpy).toHaveBeenCalledWith('Acknowledged\n'); }); it('should emit stream-json envelopes when output format is stream-json', async () => { diff --git a/packages/cli/src/nonInteractiveCli.ts b/packages/cli/src/nonInteractiveCli.ts index bbd7fab7d..ee735c2f7 100644 --- a/packages/cli/src/nonInteractiveCli.ts +++ b/packages/cli/src/nonInteractiveCli.ts @@ -4,7 +4,11 @@ * SPDX-License-Identifier: Apache-2.0 */ -import type { Config, ToolCallRequestInfo } from '@qwen-code/qwen-code-core'; +import type { + BackgroundAgentStatus, + Config, + ToolCallRequestInfo, +} from '@qwen-code/qwen-code-core'; import { isSlashCommand } from './ui/utils/commandUtils.js'; import type { LoadedSettings } from './config/settings.js'; import { @@ -250,6 +254,55 @@ export async function runNonInteractive( const initialParts = normalizePartList(initialPartList); let currentMessages: Content[] = [{ role: 'user', parts: initialParts }]; + // ─── Shared notification queue (cron + background agents) ────── + // Register the callback early so background agents launched during + // the main tool-call chain can push completions onto the queue. + interface LocalQueueItem { + displayText: string; + modelText: string; + sendMessageType: SendMessageType; + sdkNotification?: { + task_id: string; + tool_use_id?: string; + status: BackgroundAgentStatus; + usage?: { + total_tokens: number; + tool_uses: number; + duration_ms: number; + }; + }; + } + const localQueue: LocalQueueItem[] = []; + const registry = config.getBackgroundTaskRegistry(); + registry.setNotificationCallback((displayText, modelText, meta) => { + localQueue.push({ + displayText, + modelText, + sendMessageType: SendMessageType.Notification, + sdkNotification: { + task_id: meta.agentId, + tool_use_id: meta.toolUseId, + status: meta.status, + usage: meta.stats + ? { + total_tokens: meta.stats.totalTokens, + tool_uses: meta.stats.toolUses, + duration_ms: meta.stats.durationMs, + } + : undefined, + }, + }); + }); + + registry.setRegisterCallback((entry) => { + adapter.emitSystemMessage('task_started', { + task_id: entry.agentId, + tool_use_id: entry.toolUseId, + description: entry.description, + subagent_type: entry.subagentType, + }); + }); + let isFirstTurn = true; let modelOverride: string | undefined; while (true) { @@ -380,144 +433,257 @@ export async function runNonInteractive( } currentMessages = [{ role: 'user', parts: toolResponseParts }]; } else { - // No more tool calls — check if cron jobs are keeping us alive + // No more tool calls — drain notifications and cron, then exit. + + // Process one queue item through the full turn loop. + const drainOneItem = async () => { + if (localQueue.length === 0) return; + const item = localQueue.shift()!; + + if (item.sendMessageType === SendMessageType.Notification) { + adapter.emitUserMessage([{ text: item.displayText }]); + if (item.sdkNotification) { + adapter.emitSystemMessage( + 'task_notification', + item.sdkNotification, + ); + } + } + + turnCount++; + // Symmetry with the main turn loop: drain-turns (cron fires and + // background-agent notification replies) count toward the + // configured budget too, otherwise a looping cron or a model + // that keeps replying to notifications can exceed the cap + // silently in headless runs. + if ( + config.getMaxSessionTurns() >= 0 && + turnCount > config.getMaxSessionTurns() + ) { + handleMaxTurnsExceededError(config); + } + let itemMessages: Content[] = [ + { role: 'user', parts: [{ text: item.modelText }] }, + ]; + let itemIsFirstTurn = true; + let itemModelOverride: string | undefined; + + while (true) { + const itemToolCallRequests: ToolCallRequestInfo[] = []; + const itemApiStartTime = Date.now(); + const itemStream = geminiClient.sendMessageStream( + itemMessages[0]?.parts || [], + abortController.signal, + prompt_id, + { + type: itemIsFirstTurn + ? item.sendMessageType + : SendMessageType.ToolResult, + modelOverride: itemModelOverride, + ...(itemIsFirstTurn && { + notificationDisplayText: item.displayText, + }), + }, + ); + itemIsFirstTurn = false; + + adapter.startAssistantMessage(); + + for await (const event of itemStream) { + if (abortController.signal.aborted) { + // Pair the startAssistantMessage() above so stream-json + // mode doesn't leave an unterminated message_start. + adapter.finalizeAssistantMessage(); + return; + } + adapter.processEvent(event); + if (event.type === GeminiEventType.ToolCallRequest) { + itemToolCallRequests.push(event.value); + } + if ( + outputFormat === OutputFormat.TEXT && + event.type === GeminiEventType.Error + ) { + const errorText = parseAndFormatApiError( + event.value.error, + config.getContentGeneratorConfig()?.authType, + ); + process.stderr.write(`${errorText}\n`); + throw new Error(errorText); + } + } + + adapter.finalizeAssistantMessage(); + totalApiDurationMs += Date.now() - itemApiStartTime; + + if (itemToolCallRequests.length > 0) { + const itemToolResponseParts: Part[] = []; + + for (const requestInfo of itemToolCallRequests) { + const itemInputFormat = + typeof config.getInputFormat === 'function' + ? config.getInputFormat() + : InputFormat.TEXT; + const itemToolCallUpdateCallback = + itemInputFormat === InputFormat.STREAM_JSON && + options.controlService + ? options.controlService.permission.getToolCallUpdateCallback() + : undefined; + + const isAgentTool = requestInfo.name === 'agent'; + const { handler: outputUpdateHandler } = isAgentTool + ? createAgentToolProgressHandler( + config, + requestInfo.callId, + adapter, + ) + : createToolProgressHandler(requestInfo, adapter); + + const toolResponse = await executeToolCall( + config, + requestInfo, + abortController.signal, + { + outputUpdateHandler, + ...(itemToolCallUpdateCallback && { + onToolCallsUpdate: itemToolCallUpdateCallback, + }), + }, + ); + + if (toolResponse.error) { + handleToolError( + requestInfo.name, + toolResponse.error, + config, + toolResponse.errorType || 'TOOL_EXECUTION_ERROR', + typeof toolResponse.resultDisplay === 'string' + ? toolResponse.resultDisplay + : undefined, + ); + } + + adapter.emitToolResult(requestInfo, toolResponse); + + if (toolResponse.responseParts) { + itemToolResponseParts.push(...toolResponse.responseParts); + } + + if ('modelOverride' in toolResponse) { + itemModelOverride = toolResponse.modelOverride; + } + } + itemMessages = [{ role: 'user', parts: itemToolResponseParts }]; + } else { + break; + } + } + }; + + // Single-flight drain: concurrent callers wait for the running + // drain. A new drain starts only after the previous one finishes + // its queue, which prevents overlapping turns when cron jobs fire + // while an earlier queued item is still streaming. + // + // The null-clearing is attached via `.finally()` on the outer + // promise rather than inside the async body. When the queue is + // empty the async body runs to completion synchronously (no + // awaits) — an inner `finally { drainPromise = null }` would + // therefore fire BEFORE the outer `drainPromise = p` assignment, + // leaving drainPromise stuck holding a resolved Promise forever + // and making every future call return immediately without ever + // draining. Clearing via `p.finally()` schedules the null as a + // microtask that runs after the outer assignment. + let drainPromise: Promise | null = null; + const drainLocalQueue = (): Promise => { + if (drainPromise) return drainPromise; + const p = (async () => { + while (localQueue.length > 0) { + await drainOneItem(); + } + })(); + drainPromise = p; + void p.finally(() => { + if (drainPromise === p) drainPromise = null; + }); + return p; + }; + + // Start cron scheduler — fires enqueue onto the shared queue. const scheduler = !config.isCronEnabled() ? null : config.getCronScheduler(); - if (scheduler && scheduler.size > 0) { - // Start the scheduler and wait for all jobs to complete or be deleted. - // Each fired prompt is processed as a new turn through the same loop. - await new Promise((resolve) => { - const cronQueue: string[] = []; - let processing = false; - const checkDone = () => { - if (scheduler.size === 0 && !processing) { + if (scheduler && scheduler.size > 0) { + await new Promise((resolve, reject) => { + // Resolve on SIGINT/SIGTERM too — recurring cron jobs never + // drop scheduler.size to 0 on their own, so without this the + // hold-back loop below is unreachable after an abort. + const onAbort = () => { + scheduler.stop(); + resolve(); + }; + if (abortController.signal.aborted) { + onAbort(); + return; + } + abortController.signal.addEventListener('abort', onAbort, { + once: true, + }); + + const checkCronDone = () => { + if (scheduler.size === 0 && !drainPromise) { + abortController.signal.removeEventListener('abort', onAbort); scheduler.stop(); resolve(); } }; - const drainQueue = async () => { - if (processing) return; - processing = true; - try { - while (cronQueue.length > 0) { - const cronPrompt = cronQueue.shift()!; - turnCount++; - let cronMessages: Content[] = [ - { role: 'user', parts: [{ text: cronPrompt }] }, - ]; - let cronIsFirstTurn = true; - let cronModelOverride: string | undefined; - - while (true) { - const cronToolCallRequests: ToolCallRequestInfo[] = []; - const cronApiStartTime = Date.now(); - const cronStream = geminiClient.sendMessageStream( - cronMessages[0]?.parts || [], - abortController.signal, - prompt_id, - { - type: cronIsFirstTurn - ? SendMessageType.Cron - : SendMessageType.ToolResult, - modelOverride: cronModelOverride, - }, - ); - cronIsFirstTurn = false; - - adapter.startAssistantMessage(); - - for await (const event of cronStream) { - if (abortController.signal.aborted) { - const summary = scheduler.getExitSummary(); - scheduler.stop(); - if (summary) { - process.stderr.write(summary + '\n'); - } - resolve(); - return; - } - adapter.processEvent(event); - if (event.type === GeminiEventType.ToolCallRequest) { - cronToolCallRequests.push(event.value); - } - } - - adapter.finalizeAssistantMessage(); - totalApiDurationMs += Date.now() - cronApiStartTime; - - if (cronToolCallRequests.length > 0) { - const cronToolResponseParts: Part[] = []; - - for (const requestInfo of cronToolCallRequests) { - const isAgentTool = requestInfo.name === 'agent'; - const { handler: outputUpdateHandler } = isAgentTool - ? createAgentToolProgressHandler( - config, - requestInfo.callId, - adapter, - ) - : createToolProgressHandler(requestInfo, adapter); - - const toolResponse = await executeToolCall( - config, - requestInfo, - abortController.signal, - { outputUpdateHandler }, - ); - - if (toolResponse.error) { - handleToolError( - requestInfo.name, - toolResponse.error, - config, - toolResponse.errorType || 'TOOL_EXECUTION_ERROR', - typeof toolResponse.resultDisplay === 'string' - ? toolResponse.resultDisplay - : undefined, - ); - } - - adapter.emitToolResult(requestInfo, toolResponse); - - if (toolResponse.responseParts) { - cronToolResponseParts.push( - ...toolResponse.responseParts, - ); - } - - if ('modelOverride' in toolResponse) { - cronModelOverride = toolResponse.modelOverride; - } - } - cronMessages = [ - { role: 'user', parts: cronToolResponseParts }, - ]; - } else { - break; - } - } - } - } catch (error) { - debugLogger.error('Error processing cron prompt:', error); - } finally { - processing = false; - checkDone(); - } + // Propagate drain failures. Without this, a rejected + // drainLocalQueue() (e.g. a text-mode API error surfacing + // out of drainOneItem) would be swallowed by `void` and + // checkCronDone would never fire — hanging the run. + const onDrainError = (err: unknown) => { + abortController.signal.removeEventListener('abort', onAbort); + scheduler.stop(); + reject(err); }; scheduler.start((job: { prompt: string }) => { - cronQueue.push(job.prompt); - void drainQueue(); + const label = job.prompt.slice(0, 40); + localQueue.push({ + displayText: `Cron: ${label}`, + modelText: job.prompt, + sendMessageType: SendMessageType.Cron, + }); + drainLocalQueue().then(checkCronDone, onDrainError); }); - // Also check immediately in case jobs were already deleted - checkDone(); + // Check immediately in case jobs were already deleted + checkCronDone(); }); } + // ─── Terminal hold-back phase ────────────────────────── + // Wait for running background agents to complete and drain + // their notifications before emitting the final result. If + // SIGINT/SIGTERM fires here, abort running background agents + // (they use their own AbortControllers) and route through + // handleCancellationError so the run exits non-zero — falling + // through to the success emitResult below would silently + // convert cancellation into a successful completion. + + while (true) { + if (abortController.signal.aborted) { + registry.abortAll(); + handleCancellationError(config); + } + await drainLocalQueue(); + const running = registry.getRunning(); + if (running.length === 0 && localQueue.length === 0) break; + await new Promise((r) => setTimeout(r, 100)); + } + const metrics = uiTelemetryService.getMetrics(); const usage = computeUsageFromMetrics(metrics); // Get stats for JSON format output @@ -567,6 +733,14 @@ export async function runNonInteractive( }); handleError(error, config); } finally { + try { + const reg = config.getBackgroundTaskRegistry(); + reg.setNotificationCallback(undefined); + reg.setRegisterCallback(undefined); + } catch { + // Ignore — registry may not be initialized if we failed early. + } + process.stdout.removeListener('error', stdoutErrorHandler); // Cleanup signal handlers process.removeListener('SIGINT', shutdownHandler); diff --git a/packages/cli/src/ui/hooks/useGeminiStream.ts b/packages/cli/src/ui/hooks/useGeminiStream.ts index 3208e47e6..b584044ba 100644 --- a/packages/cli/src/ui/hooks/useGeminiStream.ts +++ b/packages/cli/src/ui/hooks/useGeminiStream.ts @@ -547,7 +547,8 @@ export const useGeminiStream = ( // Notification messages (e.g. background agent completions) are // pre-processed by the notification drain loop which already // added the display item to history. Just pass the model text - // through to the API. + // through to the API. Cron prompts still go through the normal + // slash/@-command/shell preprocessing path below. if (submitType === SendMessageType.Notification) { onDebugMessage( `Received notification (${trimmedQuery.length} chars)`, @@ -605,10 +606,15 @@ export const useGeminiStream = ( localQueryToSendToGemini = trimmedQuery; - addItem( - { type: MessageType.USER, text: trimmedQuery }, - userMessageTimestamp, - ); + // Cron prompts are already rendered as a `● Cron: …` notification by + // the queue drain, so skip the user-message history item to avoid + // a duplicate `> …` line. Preprocessing (@/slash/shell) still runs. + if (submitType !== SendMessageType.Cron) { + addItem( + { type: MessageType.USER, text: trimmedQuery }, + userMessageTimestamp, + ); + } // Handle @-commands (which might involve tool calls) if (isAtCommand(trimmedQuery)) { @@ -1862,17 +1868,29 @@ export const useGeminiStream = ( storage, ]); - // ─── Cron scheduler integration ───────────────────────── - const cronQueueRef = useRef([]); - const [cronTrigger, setCronTrigger] = useState(0); + // ─── Unified notification queue (cron + background agents) ────── + const notificationQueueRef = useRef< + Array<{ + displayText: string; + modelText: string; + sendMessageType: SendMessageType; + }> + >([]); + const [notificationTrigger, setNotificationTrigger] = useState(0); - // Start the scheduler on mount, stop on unmount + // Start the cron scheduler on mount, stop on unmount. + // Cron fires enqueue onto the shared notification queue. useEffect(() => { if (!config.isCronEnabled()) return; const scheduler = config.getCronScheduler(); scheduler.start((job: { prompt: string }) => { - cronQueueRef.current.push(job.prompt); - setCronTrigger((n) => n + 1); + const label = job.prompt.slice(0, 40); + notificationQueueRef.current.push({ + displayText: `Cron: ${label}`, + modelText: job.prompt, + sendMessageType: SendMessageType.Cron, + }); + setNotificationTrigger((n) => n + 1); }); return () => { const summary = scheduler.getExitSummary(); @@ -1883,37 +1901,23 @@ export const useGeminiStream = ( }; }, [config]); - // When idle, drain the cron queue one prompt at a time - useEffect(() => { - if ( - streamingState === StreamingState.Idle && - cronQueueRef.current.length > 0 - ) { - const prompt = cronQueueRef.current.shift()!; - submitQuery(prompt, SendMessageType.Cron); - } - }, [streamingState, submitQuery, cronTrigger]); - - // ─── Background agent notification queue ─────────────────── - const notificationQueueRef = useRef< - Array<{ displayText: string; modelText: string }> - >([]); - const [notificationTrigger, setNotificationTrigger] = useState(0); - + // Register background agent notification callback onto the shared queue. useEffect(() => { const registry = config.getBackgroundTaskRegistry(); - registry.setNotificationCallback( - (displayText: string, modelText: string) => { - notificationQueueRef.current.push({ displayText, modelText }); - setNotificationTrigger((n) => n + 1); - }, - ); + registry.setNotificationCallback((displayText, modelText) => { + notificationQueueRef.current.push({ + displayText, + modelText, + sendMessageType: SendMessageType.Notification, + }); + setNotificationTrigger((n) => n + 1); + }); return () => { - registry.setNotificationCallback(() => {}); + registry.setNotificationCallback(undefined); }; }, [config]); - // When idle, drain the notification queue one item at a time + // When idle, drain the unified queue one item at a time. useEffect(() => { if ( streamingState === StreamingState.Idle && @@ -1924,7 +1928,7 @@ export const useGeminiStream = ( { type: 'notification' as const, text: item.displayText }, Date.now(), ); - submitQuery(item.modelText, SendMessageType.Notification, undefined, { + submitQuery(item.modelText, item.sendMessageType, undefined, { notificationDisplayText: item.displayText, }); } diff --git a/packages/cli/src/ui/utils/resumeHistoryUtils.ts b/packages/cli/src/ui/utils/resumeHistoryUtils.ts index f8ee52017..bf134785a 100644 --- a/packages/cli/src/ui/utils/resumeHistoryUtils.ts +++ b/packages/cli/src/ui/utils/resumeHistoryUtils.ts @@ -256,15 +256,19 @@ function convertToHistoryItems( } switch (record.type) { case 'user': { - // Restore notification items (background agent completions) - if (record.subtype === 'notification') { + // Restore notification items (background agent completions and cron fires) + if (record.subtype === 'notification' || record.subtype === 'cron') { const payload = record.systemPayload as | { displayText?: string } | undefined; + const fallback = + record.subtype === 'cron' + ? 'Cron job fired' + : 'Background agent completed'; const text = payload?.displayText || extractTextFromParts(record.message?.parts as Part[]) || - 'Background agent completed'; + fallback; items.push({ type: 'notification', text }); break; } diff --git a/packages/core/src/agents/background-tasks.test.ts b/packages/core/src/agents/background-tasks.test.ts index 456dd27b5..477dbe2e8 100644 --- a/packages/core/src/agents/background-tasks.test.ts +++ b/packages/core/src/agents/background-tasks.test.ts @@ -193,8 +193,11 @@ describe('BackgroundTaskRegistry', () => { // Status should remain 'cancelled', not flip to 'completed' expect(registry.get('test-1')!.status).toBe('cancelled'); - // No notification should have been sent - expect(callback).not.toHaveBeenCalled(); + // Exactly one notification, emitted by cancel() itself — the late + // complete() must be no-op'd by the running-status guard. + expect(callback).toHaveBeenCalledTimes(1); + const [, modelText] = callback.mock.calls[0]; + expect(modelText).toContain('cancelled'); }); it('fail is a no-op after cancellation (state race guard)', () => { @@ -213,7 +216,9 @@ describe('BackgroundTaskRegistry', () => { registry.fail('test-1', 'late error'); expect(registry.get('test-1')!.status).toBe('cancelled'); - expect(callback).not.toHaveBeenCalled(); + expect(callback).toHaveBeenCalledTimes(1); + const [, modelText] = callback.mock.calls[0]; + expect(modelText).toContain('cancelled'); }); it('does not send notification without callback', () => { @@ -229,4 +234,68 @@ describe('BackgroundTaskRegistry', () => { registry.complete('test-1', 'done'); expect(registry.get('test-1')!.status).toBe('completed'); }); + + it('propagates toolUseId through XML and notification meta', () => { + const callback = vi.fn(); + registry.setNotificationCallback(callback); + + registry.register({ + agentId: 'test-1', + description: 'test agent', + status: 'running', + startTime: Date.now(), + abortController: new AbortController(), + toolUseId: 'call-abc-123', + }); + + registry.complete('test-1', 'done'); + + expect(callback).toHaveBeenCalledOnce(); + const [, modelText, meta] = callback.mock.calls[0]; + expect(modelText).toContain('call-abc-123'); + expect(meta.toolUseId).toBe('call-abc-123'); + }); + + it('omits tool-use-id XML tag when toolUseId is absent', () => { + const callback = vi.fn(); + registry.setNotificationCallback(callback); + + registry.register({ + agentId: 'test-1', + description: 'test agent', + status: 'running', + startTime: Date.now(), + abortController: new AbortController(), + }); + + registry.complete('test-1', 'done'); + + const [, modelText, meta] = callback.mock.calls[0]; + expect(modelText).not.toContain(''); + expect(meta.toolUseId).toBeUndefined(); + }); + + it('escapes XML metacharacters in interpolated fields', () => { + const callback = vi.fn(); + registry.setNotificationCallback(callback); + + registry.register({ + agentId: 'test-1', + description: 'summarize & ', + status: 'running', + startTime: Date.now(), + abortController: new AbortController(), + }); + + registry.complete('test-1', 'here is bold & '); + + const [, modelText] = callback.mock.calls[0]; + // No injected closing tags — subagent text is escaped so the + // parent envelope stays a single task-notification element. + expect(modelText.match(/<\/task-notification>/g)!.length).toBe(1); + expect(modelText).toContain('</result>'); + expect(modelText).toContain('</task-notification>'); + expect(modelText).toContain('<b>bold</b>'); + expect(modelText).toContain('&'); + }); }); diff --git a/packages/core/src/agents/background-tasks.ts b/packages/core/src/agents/background-tasks.ts index e7b57c44a..a02c7e0b1 100644 --- a/packages/core/src/agents/background-tasks.ts +++ b/packages/core/src/agents/background-tasks.ts @@ -17,6 +17,20 @@ import { createDebugLogger } from '../utils/debugLogger.js'; const debugLogger = createDebugLogger('BACKGROUND_TASKS'); const MAX_DESCRIPTION_LENGTH = 40; +const MAX_RESULT_LENGTH = 2000; + +// Escape text so it is safe to interpolate into an XML element body. +// Subagent-produced strings (description, result, error) can contain `<`, +// `>`, or literal `` — without escaping, a subagent +// summarizing HTML or another agent's notification could close the +// envelope early and forge sibling tags (e.g. a faked ) that the +// parent model would treat as trusted metadata. +function escapeXml(text: string): string { + return text + .replace(/&/g, '&') + .replace(//g, '>'); +} export type BackgroundAgentStatus = | 'running' @@ -24,6 +38,12 @@ export type BackgroundAgentStatus = | 'failed' | 'cancelled'; +export interface AgentCompletionStats { + totalTokens: number; + toolUses: number; + durationMs: number; +} + export interface BackgroundAgentEntry { agentId: string; description: string; @@ -35,16 +55,29 @@ export interface BackgroundAgentEntry { error?: string; abortController: AbortController; name?: string; + stats?: AgentCompletionStats; + toolUseId?: string; +} + +export interface NotificationMeta { + agentId: string; + status: BackgroundAgentStatus; + stats?: AgentCompletionStats; + toolUseId?: string; } export type BackgroundNotificationCallback = ( displayText: string, modelText: string, + meta: NotificationMeta, ) => void; +export type BackgroundRegisterCallback = (entry: BackgroundAgentEntry) => void; + export class BackgroundTaskRegistry { private readonly agents = new Map(); private notificationCallback?: BackgroundNotificationCallback; + private registerCallback?: BackgroundRegisterCallback; /** * Register a new background agent. @@ -52,6 +85,14 @@ export class BackgroundTaskRegistry { register(entry: BackgroundAgentEntry): void { this.agents.set(entry.agentId, entry); debugLogger.info(`Registered background agent: ${entry.agentId}`); + + if (this.registerCallback) { + try { + this.registerCallback(entry); + } catch (error) { + debugLogger.error('Failed to emit register callback:', error); + } + } } /** @@ -59,13 +100,18 @@ export class BackgroundTaskRegistry { * No-op if the agent is not in 'running' state (guards against race * with concurrent cancellation). */ - complete(agentId: string, result: string): void { + complete( + agentId: string, + result: string, + stats?: AgentCompletionStats, + ): void { const entry = this.agents.get(agentId); if (!entry || entry.status !== 'running') return; entry.status = 'completed'; entry.endTime = Date.now(); entry.result = result; + entry.stats = stats; debugLogger.info(`Background agent completed: ${agentId}`); this.emitNotification(entry); @@ -76,13 +122,14 @@ export class BackgroundTaskRegistry { * No-op if the agent is not in 'running' state (guards against race * with concurrent cancellation). */ - fail(agentId: string, error: string): void { + fail(agentId: string, error: string, stats?: AgentCompletionStats): void { const entry = this.agents.get(agentId); if (!entry || entry.status !== 'running') return; entry.status = 'failed'; entry.endTime = Date.now(); entry.error = error; + entry.stats = stats; debugLogger.info(`Background agent failed: ${agentId}`); this.emitNotification(entry); @@ -99,6 +146,12 @@ export class BackgroundTaskRegistry { entry.status = 'cancelled'; entry.endTime = Date.now(); debugLogger.info(`Background agent cancelled: ${agentId}`); + + // Emit the terminal notification here — the fire-and-forget + // complete()/fail() path is guarded by `status !== 'running'` and + // will no-op, so without this the SDK contract breaks: consumers + // saw task_started but never receive a matching task_notification. + this.emitNotification(entry); } /** @@ -133,10 +186,20 @@ export class BackgroundTaskRegistry { * Set the callback that delivers completion notifications to the CLI. * Called by AppContainer during initialization. */ - setNotificationCallback(cb: BackgroundNotificationCallback): void { + setNotificationCallback( + cb: BackgroundNotificationCallback | undefined, + ): void { this.notificationCallback = cb; } + /** + * Set the callback fired when a new background agent is registered. + * Used by the CLI to emit task_started SDK events. + */ + setRegisterCallback(cb: BackgroundRegisterCallback | undefined): void { + this.registerCallback = cb; + } + /** * Abort all running background agents. Called during session cleanup. */ @@ -146,22 +209,16 @@ export class BackgroundTaskRegistry { entry.abortController.abort(); entry.status = 'cancelled'; entry.endTime = Date.now(); + // Same reasoning as cancel(): emit the terminal notification + // here, because the fire-and-forget complete()/fail() will + // be no-op'd by the running-status guard. + this.emitNotification(entry); } } debugLogger.info('Aborted all background agents'); } - private emitNotification(entry: BackgroundAgentEntry): void { - if (!this.notificationCallback) return; - - const statusText = - entry.status === 'completed' - ? 'completed' - : entry.status === 'failed' - ? `failed` - : 'was cancelled'; - - // Build the label: "Explore: list ts files..." (truncated) + private buildDisplayLabel(entry: BackgroundAgentEntry): string { // Strip the subagent type prefix if the description already starts with it // to avoid duplication like "Explore: Explore: list ts files". let rawDesc = entry.description; @@ -175,24 +232,67 @@ export class BackgroundTaskRegistry { rawDesc.length > MAX_DESCRIPTION_LENGTH ? rawDesc.slice(0, MAX_DESCRIPTION_LENGTH) + '...' : rawDesc; - const label = entry.subagentType ? `${entry.subagentType}: ${desc}` : desc; + return entry.subagentType ? `${entry.subagentType}: ${desc}` : desc; + } - // Short display line shown in the UI + private emitNotification(entry: BackgroundAgentEntry): void { + if (!this.notificationCallback) return; + + const statusText = + entry.status === 'completed' + ? 'completed' + : entry.status === 'failed' + ? 'failed' + : 'was cancelled'; + + const label = this.buildDisplayLabel(entry); const displayLine = `Background agent "${label}" ${statusText}.`; - // Full model-facing text (includes result/error for the LLM to act on) - const modelLines: string[] = [ - `Background agent "${entry.description}" (${entry.agentId}) ${statusText}.`, + // Truncate before escaping so we don't slice through an escape + // sequence (e.g. mid-`&`) and emit malformed XML. + const rawResult = entry.result + ? entry.result.length > MAX_RESULT_LENGTH + ? entry.result.slice(0, MAX_RESULT_LENGTH) + '\n[truncated]' + : entry.result + : undefined; + + const xmlParts: string[] = [ + '', + `${escapeXml(entry.agentId)}`, ]; - if (entry.result) { - modelLines.push('', entry.result); + if (entry.toolUseId) { + xmlParts.push(`${escapeXml(entry.toolUseId)}`); + } + xmlParts.push( + `${escapeXml(entry.status)}`, + `Agent "${escapeXml(entry.description)}" ${statusText}.`, + ); + if (rawResult) { + xmlParts.push(`${escapeXml(rawResult)}`); } if (entry.error) { - modelLines.push('', `Error: ${entry.error}`); + xmlParts.push(`Error: ${escapeXml(entry.error)}`); } + if (entry.stats) { + xmlParts.push( + '', + `${entry.stats.totalTokens}`, + `${entry.stats.toolUses}`, + `${entry.stats.durationMs}`, + '', + ); + } + xmlParts.push(''); + + const meta: NotificationMeta = { + agentId: entry.agentId, + status: entry.status, + stats: entry.stats, + toolUseId: entry.toolUseId, + }; try { - this.notificationCallback(displayLine, modelLines.join('\n')); + this.notificationCallback(displayLine, xmlParts.join('\n'), meta); } catch (error) { debugLogger.error('Failed to emit background notification:', error); } diff --git a/packages/core/src/core/client.ts b/packages/core/src/core/client.ts index b9f343065..5faa0bf3a 100644 --- a/packages/core/src/core/client.ts +++ b/packages/core/src/core/client.ts @@ -676,8 +676,14 @@ export class GeminiClient { }); } - // record user message for session management - this.config.getChatRecordingService()?.recordUserMessage(request); + // record user/cron message for session management + if (messageType === SendMessageType.Cron) { + this.config + .getChatRecordingService() + ?.recordCronPrompt(request, options?.notificationDisplayText); + } else { + this.config.getChatRecordingService()?.recordUserMessage(request); + } // Idle cleanup: clear stale thinking blocks after idle period. // Latch: once triggered, never revert — prevents oscillation. diff --git a/packages/core/src/core/coreToolScheduler.ts b/packages/core/src/core/coreToolScheduler.ts index 826c3b21c..27007c94c 100644 --- a/packages/core/src/core/coreToolScheduler.ts +++ b/packages/core/src/core/coreToolScheduler.ts @@ -610,6 +610,7 @@ export class CoreToolScheduler { const invocationOrError = this.buildInvocation( call.tool, args as Record, + targetCallId, ); if (invocationOrError instanceof Error) { const response = createErrorResponse( @@ -646,9 +647,17 @@ export class CoreToolScheduler { private buildInvocation( tool: AnyDeclarativeTool, args: object, + callId?: string, ): AnyToolInvocation | Error { try { - return tool.build(structuredClone(args)); + const invocation = tool.build(structuredClone(args)); + if (callId) { + const maybeAware = invocation as { setCallId?: (id: string) => void }; + if (typeof maybeAware.setCallId === 'function') { + maybeAware.setCallId(callId); + } + } + return invocation; } catch (e) { if (e instanceof Error) { return e; @@ -827,6 +836,7 @@ export class CoreToolScheduler { const invocationOrError = this.buildInvocation( toolInstance, reqInfo.args, + reqInfo.callId, ); if (invocationOrError instanceof Error) { const error = reqInfo.wasOutputTruncated diff --git a/packages/core/src/services/chatRecordingService.ts b/packages/core/src/services/chatRecordingService.ts index c81541c03..b4b2c6fa0 100644 --- a/packages/core/src/services/chatRecordingService.ts +++ b/packages/core/src/services/chatRecordingService.ts @@ -58,7 +58,8 @@ export interface ChatRecord { | 'slash_command' | 'ui_telemetry' | 'at_command' - | 'notification'; + | 'notification' + | 'cron'; /** Working directory at time of message */ cwd: string; /** CLI version for compatibility tracking */ @@ -300,16 +301,33 @@ export class ChatRecordingService { } } + /** + * Records a cron-fired prompt. + * Stored as a user-role message with subtype 'cron' so the UI + * restores it as a notification item instead of a user turn. + */ + recordCronPrompt(message: PartListUnion, displayText?: string): void { + this.recordNotificationLike(message, 'cron', displayText); + } + /** * Records a background agent notification. - * Stored as a user-role message (so the API history includes it on resume) - * with subtype 'notification' (so the UI can restore it as an info item). + * Stored as a user-role message with subtype 'notification' so the + * UI restores it as an info item, not a user turn. */ recordNotification(message: PartListUnion, displayText?: string): void { + this.recordNotificationLike(message, 'notification', displayText); + } + + private recordNotificationLike( + message: PartListUnion, + subtype: 'notification' | 'cron', + displayText?: string, + ): void { try { const record: ChatRecord = { ...this.createBaseRecord('user'), - subtype: 'notification', + subtype, message: createUserContent(message), systemPayload: displayText ? ({ displayText } as NotificationRecordPayload) @@ -317,7 +335,7 @@ export class ChatRecordingService { }; this.appendRecord(record); } catch (error) { - debugLogger.error('Error saving notification:', error); + debugLogger.error(`Error saving ${subtype} record:`, error); } } diff --git a/packages/core/src/tools/agent/agent.test.ts b/packages/core/src/tools/agent/agent.test.ts index a8975693f..ce5b60498 100644 --- a/packages/core/src/tools/agent/agent.test.ts +++ b/packages/core/src/tools/agent/agent.test.ts @@ -1439,12 +1439,12 @@ describe('AgentTool', () => { }; vi.mocked(config.getApprovalMode).mockReturnValue(ApprovalMode.DEFAULT); - (config as Record).isInteractive = vi + (config as unknown as Record)['isInteractive'] = vi .fn() .mockReturnValue(true); - (config as Record).getBackgroundTaskRegistry = vi - .fn() - .mockReturnValue(mockRegistry); + (config as unknown as Record)[ + 'getBackgroundTaskRegistry' + ] = vi.fn().mockReturnValue(mockRegistry); vi.mocked(mockSubagentManager.loadSubagent).mockResolvedValue(bgSubagent); vi.mocked(mockSubagentManager.createAgentHeadless).mockResolvedValue( @@ -1526,7 +1526,7 @@ describe('AgentTool', () => { expect(mockRegistry.register).not.toHaveBeenCalled(); }); - it('should reject background when non-interactive, even with background: true config', async () => { + it('should allow background in non-interactive mode (headless support)', async () => { vi.mocked( config.isInteractive as ReturnType, ).mockReturnValue(false); @@ -1543,9 +1543,28 @@ describe('AgentTool', () => { const result = await invocation.execute(); const llmText = partToString(result.llmContent); - expect(llmText).toContain('not supported in non-interactive mode'); - const display = result.returnDisplay as AgentResultDisplay; - expect(display.status).toBe('failed'); + expect(llmText).toContain('Background agent launched'); + expect(mockRegistry.register).toHaveBeenCalled(); + }); + + it('forwards the scheduler-provided callId as toolUseId on the registry entry', async () => { + const params: AgentParams = { + description: 'Start monitor', + prompt: 'Watch for changes', + subagent_type: 'monitor', + }; + + const invocation = ( + agentTool as AgentToolWithProtectedMethods + ).createInvocation(params); + (invocation as unknown as { setCallId: (id: string) => void }).setCallId( + 'call-xyz-789', + ); + await invocation.execute(); + + expect(mockRegistry.register).toHaveBeenCalledWith( + expect.objectContaining({ toolUseId: 'call-xyz-789' }), + ); }); }); }); diff --git a/packages/core/src/tools/agent/agent.ts b/packages/core/src/tools/agent/agent.ts index 280142242..5be55444d 100644 --- a/packages/core/src/tools/agent/agent.ts +++ b/packages/core/src/tools/agent/agent.ts @@ -391,6 +391,7 @@ class AgentToolInvocation extends BaseToolInvocation { readonly eventEmitter: AgentEventEmitter = new AgentEventEmitter(); private currentDisplay: AgentResultDisplay | null = null; private currentToolCalls: AgentResultDisplay['toolCalls'] = []; + private callId?: string; constructor( private readonly config: Config, @@ -400,6 +401,15 @@ class AgentToolInvocation extends BaseToolInvocation { super(params); } + /** + * Invoked by the tool scheduler after `build` to link this invocation + * back to the model's original tool-use request. Used so background + * agents carry the tool-use id through to completion notifications. + */ + setCallId(callId: string): void { + this.callId = callId; + } + /** * Updates the current display state and calls updateOutput if provided */ @@ -660,10 +670,12 @@ class AgentToolInvocation extends BaseToolInvocation { const generationConfig = geminiClient?.getChat().getGenerationConfig(); if (generationConfig?.systemInstruction) { // Inline FunctionDeclaration[] from the parent — passed verbatim - // including `agent` itself so the fork's tool-name set matches the - // parent's. prepareTools bypasses the exclusion filter for inline - // decls; `isInForkExecution()` (ALS-based) is the sole - // recursive-fork block at runtime. + // (including `agent` and cron tools) so the fork's system prompt, + // tools, and history exactly match the parent's and share its + // DashScope cache prefix. A fork is a context-sharing extension of + // the parent, not an isolated subagent, so the general subagent + // exclusion list does not apply. Recursive forks are blocked by the + // ALS-based `isInForkExecution()` guard. const parentToolDecls: FunctionDeclaration[] = ( generationConfig.tools as Array<{ @@ -963,23 +975,6 @@ class AgentToolInvocation extends BaseToolInvocation { subagentConfig.background === true; if (shouldRunInBackground) { - // Background agents are not supported in non-interactive mode - if (!this.config.isInteractive()) { - return { - llmContent: - 'Background agents are not supported in non-interactive mode. Retry without run_in_background.', - returnDisplay: { - type: 'task_execution' as const, - subagentName: this.params.subagent_type || 'unknown', - taskDescription: this.params.description, - taskPrompt: this.params.prompt, - status: 'failed' as const, - terminateReason: - 'Background agents are not supported in non-interactive mode', - }, - }; - } - // Fire SubagentStart hook before background launch const hookSystem = this.config.getHookSystem(); if (hookSystem) { @@ -1005,6 +1000,38 @@ class AgentToolInvocation extends BaseToolInvocation { // survive ESC cancellation of the parent's current turn. const bgAbortController = new AbortController(); + // Background agents can't show interactive permission prompts + // (no UI). Instead of YOLO (which would auto-approve everything), + // we set shouldAvoidPermissionPrompts so the tool scheduler + // auto-denies 'ask' decisions — matching claw-code's approach. + // PermissionRequest hooks still run and can override the denial. + // Base on agentConfig so the resolved approval mode override (e.g. + // subagent-level `approvalMode: auto-edit`) is preserved. + // eslint-disable-next-line @typescript-eslint/no-explicit-any + const bgConfig = Object.create(agentConfig) as any; + bgConfig.getShouldAvoidPermissionPrompts = () => true; + + // Rebuild the subagent against bgConfig. For forks, go through + // createForkSubagent so the parent's rendered system prompt and + // inherited history are carried over; otherwise the background fork + // degrades to a plain FORK_AGENT without context. + // + // Register in the background task registry only AFTER init + // succeeds. If construction throws (e.g. invalid agent config or + // model setup), registering first would leave a phantom 'running' + // entry that the non-interactive hold-back loop would wait on + // forever. + let bgSubagent: AgentHeadless; + if (isFork) { + const fork = await this.createForkSubagent(bgConfig as Config); + bgSubagent = fork.subagent; + } else { + bgSubagent = await this.subagentManager.createAgentHeadless( + subagentConfig, + bgConfig as Config, + ); + } + const registry = this.config.getBackgroundTaskRegistry(); registry.register({ agentId: hookOpts.agentId, @@ -1013,23 +1040,17 @@ class AgentToolInvocation extends BaseToolInvocation { status: 'running', startTime: Date.now(), abortController: bgAbortController, + toolUseId: this.callId, }); - // Background agents can't show interactive permission prompts - // (no UI). Instead of YOLO (which would auto-approve everything), - // we set shouldAvoidPermissionPrompts so the tool scheduler - // auto-denies 'ask' decisions — matching claw-code's approach. - // PermissionRequest hooks still run and can override the denial. - // Inherit from agentConfig so the resolved approval mode is preserved. - // eslint-disable-next-line @typescript-eslint/no-explicit-any - const bgConfig = Object.create(agentConfig) as any; - bgConfig.getShouldAvoidPermissionPrompts = () => true; - - // Create a dedicated subagent that uses the bg-specific config. - const bgSubagent = await this.subagentManager.createAgentHeadless( - subagentConfig, - bgConfig as Config, - ); + const getCompletionStats = () => { + const summary = bgSubagent.getExecutionSummary(); + return { + totalTokens: summary.totalTokens, + toolUses: summary.totalToolCalls, + durationMs: summary.totalDurationMs, + }; + }; // Fire-and-forget: start the subagent without blocking the parent. void (async () => { @@ -1104,12 +1125,14 @@ class AgentToolInvocation extends BaseToolInvocation { // model (and the UI) don't treat incomplete runs as completed. const terminateMode = bgSubagent.getTerminateMode(); const finalText = bgSubagent.getFinalText(); + const completionStats = getCompletionStats(); if (terminateMode === AgentTerminateMode.GOAL) { - registry.complete(hookOpts.agentId, finalText); + registry.complete(hookOpts.agentId, finalText, completionStats); } else { registry.fail( hookOpts.agentId, finalText || `Agent terminated with mode: ${terminateMode}`, + completionStats, ); } } catch (error) { @@ -1117,7 +1140,7 @@ class AgentToolInvocation extends BaseToolInvocation { error instanceof Error ? error.message : String(error); debugLogger.error(`[Agent] Background agent failed: ${errorMsg}`); - registry.fail(hookOpts.agentId, errorMsg); + registry.fail(hookOpts.agentId, errorMsg, getCompletionStats()); } })();