* feat(cli): add bare startup mode
Skip implicit startup discovery in bare mode while keeping explicit inputs such as include directories and extension overrides.
Add a repository plan document and targeted tests for config, startup, skills, extensions, and memory discovery.
* fix(bare): enforce explicit-only startup behavior
* fix(cli): preserve bare tools in non-interactive mode
* chore(docs): remove bare mode planning note
* feat(cli): add session recap with /recap and auto-show on return
Users often open an old session days later and need to scroll through
pages to remember where they left off. This change adds a short
"where did I leave off" recap — a 1-3 sentence summary generated by
the fast model — so they can resume without re-reading the history.
Two triggers:
- /recap: manual slash command.
- Auto: when the terminal has been blurred for 5+ minutes and gets
focused again (uses the existing DECSET 1004 focus protocol via
useFocus). Gated on streamingState === Idle so it never interrupts
an active turn. Only fires once per blur cycle.
The recap is rendered in dim color with a chevron prefix, visually
distinct from assistant replies. A new `general.showSessionRecap`
setting controls the auto-trigger (default on). /recap works
independently of the setting.
Implementation notes:
- generateSessionRecap uses fastModel (falls back to main model),
tools: [], maxOutputTokens: 300, and a tight system prompt. It
strips tool calls / responses from history before sending — tool
responses can hold 10K+ tokens of file content that drown the recap
in irrelevant detail. The 30-message window respects turn boundaries
(slice never starts on a dangling model/tool response).
- Output is wrapped in <recap>...</recap> tags; the extractor returns
empty (skips render) if the tag is missing, preventing model
reasoning from leaking into the UI.
- All failures are silent (return null) and logged via a scoped
debugLogger; recap is best-effort and must never break main flow.
- /recap refuses to run while a turn is pending.
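
The tag-gated extraction described in the notes above can be sketched roughly as follows. This is a minimal single-pass illustration, not the shipped extractor: the function name `extractRecap` and the regular expression are assumptions made for the example.

```typescript
// Hypothetical helper: the name and regex are illustrative only.
function extractRecap(modelOutput: string): string {
  // Accept only the text wrapped in <recap>...</recap>.
  const match = /<recap>([\s\S]*?)<\/recap>/.exec(modelOutput);
  if (!match) {
    // Missing tag: return empty so the caller skips rendering and
    // no stray model reasoning reaches the UI.
    return '';
  }
  return (match[1] ?? '').trim();
}
```

Returning an empty string rather than the raw output is the point of the design: a malformed reply degrades to "no recap" instead of leaking text that was never meant for display.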
* fix(cli): abort in-flight recap when showSessionRecap is disabled
If the user disables showSessionRecap while an auto-recap LLM call is
already in flight, the previous code returned early without aborting.
The pending .then would still pass its idle/abort guards and append the
recap, producing an unwanted message after the user has opted out.
Abort the controller and clear it eagerly so the resolved promise no
longer adds to history.
* fix(cli): gate /recap and auto-recap on streaming idle state
Two related issues from review:
1. /recap was only refusing when ui.pendingItem was set, but a normal
model reply runs with streamingState === Responding and a null
pendingItem. Invoking /recap mid-stream would generate a recap from
a partial conversation and insert it between the user prompt and
the assistant reply.
2. useAwaySummary cleared blurredAtRef before checking isIdle, so if
focus returned during a still-streaming turn (after a >5min blur)
the recap was permanently dropped — there was no later retry when
the turn became idle, because isIdle was not in the effect deps.
Fixes:
- Expose isIdleRef on CommandContext.ui (mirrors btwAbortControllerRef
pattern). Plumb it from AppContainer through useSlashCommandProcessor.
- recapCommand now refuses when isIdleRef.current is false OR
pendingItem is non-null.
- useAwaySummary preserves blurredAtRef on the !isIdle bail and adds
isIdle to the effect deps, so the trigger re-evaluates when the
current turn finishes.
- Brief blurs (< AWAY_THRESHOLD_MS) still reset blurredAtRef.
Also seeds isIdleRef in nonInteractiveUi and mockCommandContext so the
new field has a sensible default outside the interactive UI.
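
As a rough illustration of the corrected trigger logic, the blur/focus handling can be modelled as a pure function. The shape below (`AwayState`, `evaluateAway`, the `'fire'` action) is hypothetical; only `AWAY_THRESHOLD_MS` and the 5-minute value come from the commits above.

```typescript
// Sketch only: models the described behavior, not the actual hook.
const AWAY_THRESHOLD_MS = 5 * 60 * 1000;

interface AwayState {
  blurredAt: number | null;
}

function evaluateAway(
  state: AwayState,
  isFocused: boolean,
  isIdle: boolean,
  now: number,
): { state: AwayState; action: 'none' | 'fire' } {
  if (!isFocused) {
    // Record the blur time once per blur cycle.
    return { state: { blurredAt: state.blurredAt ?? now }, action: 'none' };
  }
  if (state.blurredAt === null) {
    // First render, or right after a brief-blur reset.
    return { state, action: 'none' };
  }
  if (now - state.blurredAt < AWAY_THRESHOLD_MS) {
    // Brief blur: reset without firing.
    return { state: { blurredAt: null }, action: 'none' };
  }
  if (!isIdle) {
    // Keep blurredAt so the check can re-run when the turn finishes.
    return { state, action: 'none' };
  }
  return { state: { blurredAt: null }, action: 'fire' };
}
```

Because the non-idle branch leaves `blurredAt` untouched, re-running the function once `isIdle` flips to true fires the recap that would previously have been dropped.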
* docs: document /recap command, showSessionRecap setting, and design
- User docs: add /recap to the Session and Project Management table in
features/commands.md and a dedicated subsection covering manual use,
the auto-trigger, the dim-color rendering, and the fast-model tip.
- User docs: add general.showSessionRecap row to the configuration
settings reference.
- Design doc: docs/design/session-recap/session-recap-design.md covers
motivation, the two trigger paths, the per-file architecture, prompt
design with the <recap> tag and three-tier extractor, history
filtering rationale (functionResponse can be 10K+ tokens), the
useAwaySummary state machine, the isIdleRef gating for /recap, model
selection, observability, and out-of-scope items.
* fix(core): exclude thought parts from session recap context
filterToDialog kept any non-empty text part, but @google/genai's Part
type also marks model reasoning with part.thought / part.thoughtSignature.
That hidden chain-of-thought was being fed to the recap LLM and could
get summarized as if it were user-visible dialogue.
Drop parts where either flag is set. Update the design doc's
History filtering section to call this out alongside the existing
tool-call/response rationale.
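
A minimal sketch of the filtering rule, assuming a local `PartLike` type that mirrors only the fields discussed here (the real code works on @google/genai's `Part`; `keepDialogParts` is a made-up name):

```typescript
// Local stand-in for the relevant fields of a content part.
interface PartLike {
  text?: string;
  thought?: boolean;
  thoughtSignature?: string;
  functionCall?: unknown;
  functionResponse?: unknown;
}

// Keep only user-visible dialogue text.
function keepDialogParts(parts: PartLike[]): PartLike[] {
  return parts.filter(
    (p) =>
      typeof p.text === 'string' &&
      p.text.trim() !== '' &&
      // Drop hidden reasoning: either flag marks a thought part.
      !p.thought &&
      !p.thoughtSignature &&
      // Drop tool traffic, which can be very large.
      p.functionCall === undefined &&
      p.functionResponse === undefined,
  );
}
```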
* docs(session-recap): correct debug-logging guidance, fill in state machine, sharpen UX wording
Audit of the session recap docs against the implementation found three
issues worth fixing:
- Design doc claimed debug logs were enabled via a QWEN_CODE_DEBUG_LOGGING
env var. That var does not exist; debug logs are written to
~/.qwen/debug/<sessionId>.txt by default, gated by QWEN_DEBUG_LOG_FILE.
Replace with the accurate path + opt-out behavior, and tell the reader
to grep for the [SESSION_RECAP] tag.
- Design doc's useAwaySummary state machine table was missing the
isFocused && blurredAtRef === null path (taken on first render and
right after a brief-blur reset). Add the row.
- User doc's "Refuses to run ... failures are silent" line conflated the
inline-error refusal with silent generation failures, and "(when the
conversation is idle)" used internal jargon. Split the two cases and
spell out what "idle" means, including the wait-then-fire behavior
when focus returns mid-turn.
* docs(session-recap): correctly describe /recap vs auto-trigger failure modes
The previous wording said "Generation/network failures are silent — the
recap simply does not appear", but recapCommand returns a user-facing
info message ("Not enough conversation context for a recap yet.") in
exactly that path, and also returns inline messages for the
config-not-loaded and busy-turn guards.
Only the auto-trigger path is truly silent (it just skips addItem when
generateSessionRecap returns null). Split the two paths in the doc so
the manual command's "always responds with something" behavior is
distinguished from the auto-trigger's no-op-on-failure behavior.
* docs(session-recap): align prompt-rules section with the actual prompt
Two doc-vs-code mismatches in the design doc's "System Prompt" section,
caught with the same lens as yiliang114's failure-mode review:
- The bullet list claimed RECAP_SYSTEM_PROMPT forbids "speculating about
user intent" and "addressing the user as 'you'". Those rules existed in
an early draft but were dropped when the <recap> tag rules were added;
the current prompt has no such restrictions. Replace with the actual
rules and add a "one-to-one with RECAP_SYSTEM_PROMPT" marker so future
edits stay in sync.
- The doc said systemInstruction "overrides" the main agent prompt. True
for the agent prompt portion, but GeminiClient.generateContent
internally calls getCustomSystemPrompt which appends user memory
(QWEN.md / auto memory) as a suffix. Spell that out: the final
system prompt is recap prompt + user memory, which is actually
useful project context for the recap.
* docs(session-recap): translate design doc to English
The repo convention for docs/design is English (7 of 8 existing files;
auto-memory/memory-system.md is the only Chinese one). The first version
of this design doc followed the auto-memory example, which turned out
to be the wrong sample.
Translate to English while preserving the existing structure, the
state-machine table, the prompt-vs-doc 1:1 alignment, the
QWEN_DEBUG_LOG_FILE description, and the failure-mode notes added in
prior commits.
* fix(cli): drop empty info return from /recap interactive success path
The interactive success path inserts the away_recap history item
directly via ui.addItem and then returned `{type: 'message',
messageType: 'info', content: ''}`. The slash-command processor's
'message' case unconditionally calls addMessage, which adds another
HistoryItemInfo with empty text. The empty info renders as nothing
(StatusMessage early-returns null), but it still bloats the in-memory
history list and shows up in /export and saved sessions.
Return void on the interactive success path and on the abort path so
the processor's `if (result)` check skips the message-handler branch
entirely. Widen the action's return type to `void | SlashCommandActionReturn`
to match (same shape as btwCommand).
* feat(vscode-companion): enable Plan Mode toggle and approval UI
- Add Plan Mode to the approval mode cycle (plan → default → auto-edit → yolo → plan)
- Add Tab key shortcut to cycle approval modes in the input field
- Fix cancel handling for exit_plan_mode: reject plan without aborting agent session
- Add plan approval UI in PermissionDrawer with markdown content rendering
Closes #1985
Made-with: Cursor
* fix(vscode-ide-companion/webview): finalize rejected plan prompts
* fix(ui): constrain shell output width to prevent box overflow
When shell commands produce wide table output (e.g., gh run list),
the text would overflow the bordered box container in the TUI because
AnsiOutputText didn't apply any width constraint.
This fix:
1. Adds maxWidth prop to AnsiOutputText component
2. Wraps output in MaxSizedBox for proper width/height constraints
3. Adds wrap=truncate to individual text tokens
4. Passes childWidth from ToolMessage (matching other renderers)
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix(ui): address review feedback on AnsiOutput MaxSizedBox wrapping
MaxSizedBox requires its direct children to be row <Box> elements;
wrapping the rows in an extra <Box flexDirection="column"> broke the
layout contract and caused shell output to render as empty content.
Remove the wrapper so each line is a direct <Box> child of MaxSizedBox.
Update the "handles empty lines and empty tokens" test: with row
<Box> elements in place, empty AnsiLines are now correctly preserved
as blank output rows (matching the source terminal) instead of being
silently collapsed by the former <Text>-per-row rendering.
* test(ui): cover multi-token wide-line truncation in AnsiOutputText
The existing truncation test used a single 100-char token, which takes
the straightforward MaxSizedBox single-segment path. Real-world shell
output like `gh run list` is a single logical row composed of many
styled-column tokens whose combined width exceeds the box — that path
relies on per-token wrap="truncate" plus ink's flex layout for the
final crop, not MaxSizedBox itself. Cover that shape so future
regressions in either half of the mechanism are caught.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
The /clear command cleared the history log but left an active /btw
side-question dialog visible in the fixed bottom area, because /btw
stores state in dedicated btwItem state (via setBtwItem) rather than
in history items. The ui.clear callback only called clearItems() and
clearScreen(), never cancelBtw(), so the pending-btw dialog survived.
Call cancelBtw() from ui.clear so /clear (and /reset, /new) abort any
in-flight btw request and null out the btwItem state.
Fixes #3334
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* feat(core): add dynamic swarm worker tool
Add a swarm tool for ad-hoc parallel worker execution with bounded concurrency, wait-all and first-success modes, per-worker failure
isolation, and aggregated results.
Register the tool in core, prevent nested worker recursion, and document the new workflow.
* fix(core): harden swarm worker execution
Prevent swarm calls from bypassing the outer scheduler concurrency budget.
Disallow interactive question prompts in swarm workers by default, and avoid incomplete Markdown table escaping by using an HTML entity for
pipe characters. Add focused tests for the scheduler behavior, worker tool restrictions, and result formatting.
Replace git init --initial-branch with git init followed by
symbolic-ref HEAD refs/heads/main. This keeps new repositories on main
without requiring Git 2.28 or newer.
Also ensure checkpoint shadow repository setup uses its dedicated git
config during the initial commit.
* feat(cli): support refreshInterval in statusLine for periodic refresh
The statusLine (#3311) re-runs only when Agent state changes (token count,
model, git branch, etc.). Commands that display *external* data — a clock,
rate-limit counters, CI build status — have no Agent event to hook into
and go stale between messages.
Add an optional `ui.statusLine.refreshInterval` field (seconds, minimum 1)
that schedules a setInterval alongside the existing event-driven updates.
Overlap with state-change debounce is safe: `doUpdate` kills any in-flight
child and bumps the generation counter, so only the most recent output
reaches the footer.
Validation lives in `getStatusLineConfig`:
- Must be `number`, `Number.isFinite(...)`, `>= 1`
- Anything else is silently dropped (no interval scheduled)
No changes to the default behavior — configs without `refreshInterval`
behave exactly as before.
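
The validation rules listed above amount to a small guard. A sketch, with `parseRefreshInterval` as a hypothetical standalone name (the commit places this logic inside `getStatusLineConfig`):

```typescript
// Returns the interval in seconds, or undefined when the value is
// invalid and no interval should be scheduled.
function parseRefreshInterval(raw: unknown): number | undefined {
  if (typeof raw !== 'number' || !Number.isFinite(raw) || raw < 1) {
    return undefined; // silently dropped
  }
  return raw;
}
```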
* fix(cli): yield periodic statusLine tick when previous exec is in flight
Review feedback on #3383: with `refreshInterval: 1` and a command whose
real exec time exceeds 1s, each tick was unconditionally calling
`doUpdate()` — which kills the in-flight child and bumps the generation
counter — so the prior exec's callback was always discarded as stale.
`setOutput` was never reached and the statusline stayed empty until
`refreshInterval` was removed or the command became faster.
Guard the interval callback with an `activeChildRef` check so a pending
exec is allowed to finish. State-change triggers (model switch, token
count, branch, etc.) still go through `scheduleUpdate` → `doUpdate`
directly and legitimately preempt stale children; only the periodic
tick yields. The existing 5s exec timeout is still the hard ceiling.
Also drop the redundant `'refreshInterval' in raw` check — the `typeof
raw.refreshInterval === 'number'` guard already excludes missing /
undefined values.
Tests:
- Add regression test `'skips periodic ticks while a previous exec is
still running'` — three ticks during one unfinished exec trigger zero
new spawns; the next tick after callback completion does spawn.
- Update two existing tests to resolve the mount exec before expecting
subsequent ticks (the old tests implicitly relied on the starvation
behavior being tolerated).
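
To make the starvation mechanism concrete, here is a simplified model of the generation counter plus the yielding tick. The class and its members are illustrative assumptions; the real hook uses refs (`activeChildRef`, a generation ref) and a real child process.

```typescript
type Child = { kill(): void };

// Simplified stand-in for the statusLine updater.
class StatusLineRunner {
  private generation = 0;
  private activeChild: Child | null = null;
  output: string | null = null;

  constructor(private spawn: (done: (stdout: string) => void) => Child) {}

  // State-change path: always preempts the in-flight child.
  doUpdate(): void {
    this.activeChild?.kill();
    const gen = ++this.generation;
    this.activeChild = this.spawn((stdout) => {
      if (gen !== this.generation) return; // stale result, discard
      this.activeChild = null;
      this.output = stdout;
    });
  }

  // Periodic path: yields while a previous exec is still running.
  onTick(): void {
    if (this.activeChild) return;
    this.doUpdate();
  }
}
```

Without the early return in `onTick`, every tick would bump `generation` and the slow command's callback would always be discarded as stale.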
* test(cli): assert user-visible lines state in starvation regression
Self-review insight: the existing `skips periodic ticks while a previous
exec is still running` test only counted `exec` calls — it confirmed the
guard prevents redundant spawns, but would have silently passed even if
the eventual callback was still being discarded as stale (which is the
actual user-visible symptom of the starvation bug).
Add `expect(result.current.lines).toEqual(['done'])` after resolving the
mount's pending callback. Without the guard, generationRef would have
bumped 3 times during the yielded ticks, the callback's captured gen
would fail the stale check, `setOutput` would never fire, and `lines`
would stay empty — now caught explicitly.
* perf(cli): dedupe statusLine output to skip unchanged Footer re-renders
Review feedback on #3383 (narrow terminal stacking): when
`refreshInterval` fires at 1s and the command output is unchanged, the
mount-and-setOutput cycle still allocates a new array and triggers a
Footer re-render. Under certain narrow-terminal conditions, Ink's
erase-line accounting mis-counts wrapped rows and stale content
accumulates on screen.
The Footer-layout root cause is in #3311's narrow-mode flex setup and
Ink's truncate semantics, which is out of scope for this PR. But we
can cut the re-render surface here by preserving the `lines` array
reference when the command produces identical output — a strict
Pareto improvement for any caller (clock-style statuslines with
second-precision still re-render; rate-limit / branch / CI-status
style statuslines that change infrequently stop triggering work every
tick).
Tests:
- `preserves the same lines array reference when output is unchanged`
asserts referential equality after a re-exec with identical stdout.
- `produces a new reference when output changes` guards against
over-eager dedup that would miss legitimate updates.
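
The reference-preserving dedup can be sketched as a small pure function. `nextLines` is a hypothetical name, and splitting stdout on newlines is an assumption about how lines are derived:

```typescript
// Return the previous array itself when content is identical, so a
// state setter sees the same reference and can skip the re-render.
function nextLines(prev: string[], stdout: string): string[] {
  const next = stdout.split('\n');
  const unchanged =
    prev.length === next.length && prev.every((line, i) => line === next[i]);
  return unchanged ? prev : next;
}
```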
* fix(cli): stabilize Footer rendering in narrow terminals
Narrow-terminal E2E feedback on #3383: with `refreshInterval` at 1s,
empty lines were accumulating above the input prompt each tick. Root
cause is in the Footer flex layout — originally from #3311 — where Ink
miscounts logical rows vs the physical rows the terminal actually uses.
Two adjustments, both idiomatic (used elsewhere in the repo already):
1. Left column — `minWidth={0}`. Without this, Yoga's `min-width: auto`
default keeps the Box at its natural content width, so a statusline
wider than the terminal doesn't engage `<Text wrap="truncate">`; the
text renders at content-width and the terminal wraps it physically.
`minWidth={0}` lets the column shrink so the text child can truncate
at container width.
2. Right section — `flexWrap="wrap"`. With multiple indicators (sandbox
label, debug badge, dream, context-usage) the row can exceed a narrow
terminal's width. Without `flexWrap` Ink lays them out in a single
logical row, but the terminal physically wraps to two — Ink's erase
sequence (`\e[2K\e[1A…` per logical row) then clears one row while
two exist, and the extra row ghosts every re-render. With `wrap` Ink
tracks the second row explicitly and erases correctly.
Together these make the Footer's row count match between Ink's logical
view and the terminal's physical view, so frequent re-renders (as
`refreshInterval` enables) stop accumulating ghost rows.
Needs verification in a real narrow TTY — from this environment I can
reason about the flex semantics and confirm both props are supported by
Ink's Box, but actually observing ghost-row elimination requires
process.stdout.columns on a real terminal.
* Revert "fix(cli): stabilize Footer rendering in narrow terminals"
This reverts commit 9758cda85f. Reason: I could not reproduce BZ-D's
reported ghost-row stacking in tmux (40x25, 2-line statusline + real
exec + Static history + refreshInterval: 1) over 14+ ticks. Both
`minWidth={0}` and `flexWrap="wrap"` are legitimate defensive idioms,
but without a failing repro I can't verify they address the reported
bug, and I shouldn't ship a speculative layout change as "the fix".
Keeping the output-dedup commit (e1d321186) — that one is a strict
improvement regardless of the underlying Ink behavior. Will request
BZ-D's specific terminal setup and reopen with a verified fix (or
confirm the issue is specific to a particular emulator, not flex/Ink).
* ci(stale): enable 28+28 stale/close policy for pull requests
- Fix the repository guard so the workflow actually runs on
QwenLM/qwen-code (it was previously gated to google-gemini/gemini-cli
and never executed in this repo).
- Scope the behavior to pull requests for now; issue policy will be
introduced separately once triage labels are in place.
- Mark a PR stale after 4 weeks without activity, then close it after
another 4 weeks.
- Exempt pinned, security, status/blocked, status/on-hold, and
status/ready-for-merge from auto-close.
- Remove the stale label automatically when activity resumes, and
process the oldest PRs first on each run.
* ci(stale): loosen PR cadence from 28+28 to 35+35
Five weeks + five weeks gives contributors more slack around holidays
and busy periods, and reduces the first-run impact on the existing
backlog. The total window moves from 56 days to 70 days.
* ci(stale): move cron from 01:30 UTC to 00:30 UTC
Shift by one hour so results are ready before the Beijing work day
starts (08:30 local), while still avoiding the top of the hour (the
high-contention window for GitHub-hosted runners) and staying 30
minutes after release.yml at 00:00 UTC.
* ci(stale): drop redundant repo guard and document ops-per-run
- Remove the `github.repository == 'QwenLM/qwen-code'` job guard:
scheduled runs are already disabled on forks by GitHub, and
workflow_dispatch is manually-triggered so the guard adds no safety.
- Add a comment explaining the `operations-per-run: 100` rationale
(rate-limit headroom given the ~150-PR backlog).
* fix(build): invoke tsx directly via node --import instead of npx
npx resolution breaks when scripts/build.js is invoked under bun
(bun's npx wrapper intercepts and runs tsx inside bun's runtime, where
tsx's CJS entry fails to resolve). Using 'node --import tsx/esm' skips
the npx layer entirely and works under both npm and bun invocation.
* fix(build): use node --import tsx/esm for generate:settings-schema script
Matches the approach taken in scripts/build.js so running
`bun run generate:settings-schema` directly bypasses bun's npx wrapper
and avoids the `Cannot find module './cjs/index.cjs'` tsx CJS failure.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* feat(cli): add early input capture to prevent keystroke loss during startup (#3224)
Start raw mode stdin listening immediately after setRawMode(true), buffer
user input during REPL initialization (200-500ms), then replay it once
KeypressProvider is mounted. Prevents keystrokes typed before the REPL
is ready from being silently dropped.
- Filter out terminal response sequences (DA, DA2, OSC, DCS, APC)
while preserving real user input (arrow keys, function keys, etc.)
- 64KB buffer limit for safety
- Replay via setImmediate() to ensure subscribers are registered first
- Disable via QWEN_CODE_DISABLE_EARLY_CAPTURE=1
- Add benchmark-startup.sh / benchmark-startup-simple.sh for baseline
startup time measurement
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix(cli): fix bugs and optimize early input capture
- Fix getAndClearCapturedInput resetting captured flag, preventing potential re-arm
- Fix passthrough mode replay bypassing paste marker handling in KeypressContext
- Optimize buffer storage from O(n^2) concat to chunked collection
- Optimize filterTerminalResponses to use pre-allocated Buffer instead of number[]
- Add atomic stopAndGetCapturedInput API to prevent two-step usage errors
- Remove unrelated benchmark shell scripts
- Add test for stopAndGetCapturedInput
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix(cli): fix listener leak, silent failures, and error handling in early input capture
- Register cleanup for stdin listener in gemini.tsx to prevent orphaned
listener on any error path before UI mounts
- Add try-catch and cancellation guard to setImmediate replay in
KeypressContext to handle component unmount and replay errors gracefully
- Stop capture immediately and warn when buffer limit is reached instead
of silently dropping data with a debug-level log
- Capture stdin reference at registration time so removeListener always
operates on the correct stream instance
- Add debug log when early capture is skipped due to non-TTY stdin
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix(cli): fix early input capture being lost under React StrictMode
Move stopAndGetCapturedInput() from inside KeypressProvider's useEffect
to before render() in startInteractiveUI. When DEBUG=1, React StrictMode
deliberately runs effect→cleanup→effect, causing the first mount to drain
the buffer and schedule a replay that the cleanup immediately cancels. The
second mount found an empty buffer, silently discarding startup keystrokes.
By draining once before render() and passing the bytes as a stable prop,
StrictMode remounts always read the same data and can schedule replay on
the second (stable) mount.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix: handle split ESC prefixes in early input capture
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix: conditionally flush pending startup capture bytes
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix: drop incomplete escape sequences instead of replaying as user input
When capture stops with an incomplete ESC sequence in pendingTerminalResponse
(e.g. lone \x1b or \x1b[), classifyEscapeSequence returns 'incomplete'.
Previously shouldReplayPendingAtStop used !== 'terminal' which treated
incomplete sequences as user input. Changed to === 'user' so only
definitively-user input is replayed; ambiguous sequences are safely dropped.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
---------
Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Extract message list into a React.memo component to prevent
re-rendering the entire chat history on every keystroke.
- Extract MessageList as a memoized component
- Wrap UserMessage, AssistantMessage, ThinkingMessage with React.memo
- Stabilize onFileClick callback with useCallback
- Remove console.log from render path
- Wrap handleToggleThinking with useCallback
Fixes #2395
Made-with: Cursor
When MCP OAuth authentication falls back to the "copy this URL into
your browser" path (e.g. remote/web terminal where the browser can't
auto-open), long URLs wrap across lines inside the bordered dialog and
the trailing │ border characters get selected alongside the URL,
forcing the user to manually strip them out before pasting.
Surface the URL on a dedicated event and let the user press 'c' to
push it to the local clipboard via an OSC 52 escape sequence. Works
through SSH and modern web terminals (iTerm2, Windows Terminal,
xterm.js-based emulators, tmux with set-clipboard, etc.) without a
subprocess, and falls back to a visible "copy the URL above manually"
hint when the terminal is not a TTY or OSC 52 is blocked.
Key points:
- OAuth provider emits OAUTH_AUTH_URL_EVENT carrying the full URL.
- AuthenticateStep listens, tracks it in state, and binds 'c' while
authenticating (modifier/paste keys are filtered out).
- copyToClipboardViaOsc52 writes to stderr when it's a TTY,
falls back to stdout, and wraps the sequence for tmux/GNU screen
via DCS passthrough so multiplexed sessions still work.
- Honest feedback: distinct "copy request sent" / "cannot write to
terminal" states with a short auto-revert so repeated presses reset
the timer.
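
For readers unfamiliar with OSC 52, the sequence construction looks roughly like this. The sketch only builds the string; `buildOsc52` and the environment-based detection (`TMUX`, `TERM` starting with `screen`) are assumptions for illustration, and `btoa` assumes the URL is plain ASCII, which holds for percent-encoded URLs.

```typescript
// Build an OSC 52 "set clipboard" sequence, optionally wrapped for
// terminal multiplexers via DCS passthrough.
function buildOsc52(
  text: string,
  env: { TMUX?: string; TERM?: string } = {},
): string {
  const payload = btoa(text); // base64 of the text to copy
  const osc = `\x1b]52;c;${payload}\x07`;
  if (env.TMUX) {
    // tmux passthrough: ESC P tmux; <seq with each ESC doubled> ESC \
    return `\x1bPtmux;${osc.replace(/\x1b/g, '\x1b\x1b')}\x1b\\`;
  }
  if (env.TERM?.startsWith('screen')) {
    // GNU screen passthrough: ESC P <seq> ESC \
    return `\x1bP${osc}\x1b\\`;
  }
  return osc;
}
```

A caller would write the returned string to whichever stream is a TTY; whether the terminal honors the request cannot be observed from the application side, which is why the feedback is worded as "copy request sent".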
Fixes #3337
Make DualOutputBridge.shutdown() await the underlying write stream close
event instead of returning immediately after stream.end(). This removes
the Windows temp directory cleanup race in DualOutputBridge tests and
makes interactive cleanup reliably flush session_end.
Update the CoreToolScheduler retry-loop test registry mock to match the current
ToolRegistry interface. Add ensureTool and getAllToolNames so the tests exercise
the scheduler path used in production.
Closes #3221.
Introduces a lazy factory API on ToolRegistry (registerFactory,
ensureTool, warmAll, getAllToolNames) as infrastructure for future
esbuild code-splitting (#3226). With the current single-bundle build,
the lazy API does not change startup time on its own — the primary
immediate value is fixing three pre-existing bugs uncovered while
designing it.
Bug fixes:
- Concurrent instantiation (P0): the original ensureTool had no
concurrency protection around `await factory()` — two concurrent
calls for the same tool both passed the cache check and each ran the
factory, producing two instances. AgentTool and SkillTool register
SubagentManager listeners in their constructors, so the extra
instance leaked listeners. Fix: a per-name `inflight: Map<string,
Promise<Tool>>` so concurrent ensureTool() calls share a single
promise. On factory rejection the inflight entry is cleared so a
subsequent call can retry.
- stop() resource leak: stop() only disposed tools already in
`this.tools`; tools still loading in `inflight` when stop() ran
finished afterward and were never disposed. Fix: await
Promise.allSettled(inflight.values()) before the dispose loop.
- Cache hit left stale factory: ensureTool's cache-hit branch did not
delete the factory entry, so warmAll() would re-invoke the factory
for an already-loaded tool. Fix: delete the factory on cache hit.
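
The concurrency and cache-hit fixes can be illustrated with a stripped-down registry. This is a sketch of the pattern, not the actual ToolRegistry: `LazyRegistry` is a made-up class, and disposal, `warmAll`, and `stop()` are omitted.

```typescript
class LazyRegistry<T> {
  private tools = new Map<string, T>();
  private factories = new Map<string, () => Promise<T>>();
  private inflight = new Map<string, Promise<T>>();

  registerFactory(name: string, factory: () => Promise<T>): void {
    this.factories.set(name, factory);
  }

  async ensureTool(name: string): Promise<T | undefined> {
    const cached = this.tools.get(name);
    if (cached !== undefined) {
      this.factories.delete(name); // cache hit: drop the stale factory
      return cached;
    }
    const pending = this.inflight.get(name);
    if (pending) return pending; // share the in-flight load

    const factory = this.factories.get(name);
    if (!factory) return undefined;

    const promise = factory()
      .then((tool) => {
        this.tools.set(name, tool);
        this.factories.delete(name);
        return tool;
      })
      .finally(() => {
        // Cleared on success and on rejection, so a retry is possible.
        this.inflight.delete(name);
      });
    this.inflight.set(name, promise);
    return promise;
  }
}
```

Since the factory is left registered on rejection, a later `ensureTool()` call simply runs it again.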
Additional hardening in response to review feedback:
- warmAll({ strict?: boolean }): strict mode re-throws the first
factory failure rather than swallowing it. Config.initialize() uses
strict: true so a broken built-in tool fails startup fast instead of
silently leaving a partially initialized registry; runtime-path
callers (GeminiChat, agent runtime, etc.) continue to use the
non-strict default and log failures via debugLogger.
- getAllTools() and getFunctionDeclarationsFiltered() emit a debug
warning when called while unloaded factories remain, nudging callers
toward warmAll() without hard-breaking existing code paths.
- copyDiscoveredToolsFrom() now iterates source.tools.values()
directly instead of source.getAllTools() — the copy path deals only
with already-discovered MCP/command tools and should not trigger the
unloaded-factory warning.
- MemoryTool and SkillTool config parsing was extracted into
memory-config.ts and skill-utils.ts so a factory can resolve tool
metadata without importing the tool module.
Tests:
- tool-registry.test.ts adds 128 lines covering: concurrent ensureTool
runs the factory exactly once, warmAll and ensureTool overlap,
retries succeed after a prior factory failure, stop() disposes tools
that finish loading after stop was called, and warmAll strict vs
default behavior.
- 33 existing call sites across cli, core, agents, and subagents were
updated to await warmAll() before bulk tool access.
Primary change: prevent the model from burning tokens in an infinite retry
loop when a tool call repeatedly fails schema validation with the same
error (observed with ask_user_question and a malformed `questions`
parameter retrying 10+ times with the same validation error).
- Track consecutive validation failures per (tool name, error message)
pair in CoreToolScheduler via a `validationRetryCounts` Map.
- After 3 consecutive failures for the same (tool, error) pair, append a
RETRY LOOP DETECTED directive to the error response instructing the
model to stop, re-examine the schema, try a fundamentally different
approach, or surface the issue to the user.
- Reset per-tool counters when the tool invocation succeeds; reset
globally when an incoming batch shares no tool name with any
previously failing tool; reset the per-tool counter when the tool
returns a different validation error so unrelated mistakes do not
accumulate toward the threshold.
- Distinct from LoopDetectionService, which tracks model-behavior loops
(repeated thoughts, stagnant actions); this change catches tool-API
misuse loops at the scheduler layer.
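
A compact model of the per-tool counter behavior, under stated assumptions: `ValidationRetryTracker` and its method names are hypothetical, the threshold of 3 comes from the description above, and the global reset for unrelated batches is left out for brevity.

```typescript
const RETRY_THRESHOLD = 3;

class ValidationRetryTracker {
  // Keyed by tool name; remembers the last error and its streak.
  private counts = new Map<string, { error: string; count: number }>();

  // Returns true when the retry-loop directive should be appended.
  recordFailure(tool: string, error: string): boolean {
    const prev = this.counts.get(tool);
    // A different error for the same tool restarts the streak.
    const count = prev && prev.error === error ? prev.count + 1 : 1;
    this.counts.set(tool, { error, count });
    return count >= RETRY_THRESHOLD;
  }

  // A successful invocation clears the tool's streak.
  recordSuccess(tool: string): void {
    this.counts.delete(tool);
  }
}
```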
Piggyback fixes bundled in the same PR:
- packages/cli/index.ts, packages/core/src/services/shellExecutionService.ts:
treat PTY `EAGAIN` on the read path as an expected read error alongside
`EIO`, avoiding noisy surface-level failures from transient
non-blocking reads.
- scripts/build.js: switch the settings-schema generation step from
`npx tsx` to `node --import tsx/esm` for Bun compatibility.
Tests:
- Unit tests in coreToolScheduler.test.ts cover: directive injection on
the 3rd consecutive failure, counter reset when a different tool is
called, and counter reset after a successful invocation of the same
tool (fail → fail → succeed → fail → fail must not trip the directive).
Fixes #500.
Number keys in AskUserQuestionDialog previously only moved the highlight
cursor without submitting, inconsistent with RadioButtonSelect and the
standard tool approval dialog. Users pressed a number, saw the option
highlight, and assumed it was selected, but the dialog was still waiting
for Enter.
- For single-select predefined options, pressing a number key now
auto-submits immediately.
- Multi-select, "Other" custom input, and the Submit tab remain
highlight-only (unchanged).
- Extracted a shared selectAndAdvance helper to deduplicate the
select-and-submit/advance logic across 4 code paths (number key,
Enter, multi-select submit, custom input submit).
- Removed redundant isFocused guard inside the useKeypress callback;
it is already handled via the isActive parameter.
Tests cover all four behavioral branches: single-select auto-submits,
multi-select does not, "Other" custom input does not, and the Submit
tab does not.
PNG's magic bytes are 89 50 4E 47, but detectImageMime only checked
the first three. The WebP branch in the same function correctly checks
all four bytes of its signature — the PNG path was clearly an oversight.
Extend the PNG check to include 0x47 ('G') for consistency and to
eliminate the (admittedly rare) false-positive window.
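A minimal sketch of the four-byte check, assuming the input is a byte array. The function name `looksLikePng` is hypothetical and is not the actual `detectImageMime` code.

```typescript
// Hypothetical helper: compare all four leading bytes of the PNG signature.
const PNG_MAGIC = [0x89, 0x50, 0x4e, 0x47]; // 0x89 'P' 'N' 'G'

function looksLikePng(bytes: Uint8Array): boolean {
  return (
    bytes.length >= PNG_MAGIC.length &&
    PNG_MAGIC.every((expected, i) => bytes[i] === expected)
  );
}
```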
getPositionFromOffsets used different per-line length calculations and
different comparison operators for start vs end offsets, producing
asymmetric and sometimes invalid results at line boundaries.
Concrete failure: lines=['abc','def'], endOffset=4 (the position after
the newline at offset 3). The start calc correctly resolved this to
(row=1,col=0), but the end calc used lineLength=length (no +1 for the
last-line case) combined with >= and returned (row=0,col=4) — an
out-of-bounds column on a 3-char line.
Downstream, replaceRangeInternal rejects endCol > currentLineLen(endRow)
as an invalid range and silently returns state unchanged. This caused
vim line-change commands (vim_change_movement 'j'/'k', vim_change_line
spanning row boundaries) to no-op while still pushing an empty undo
frame.
Replace both loops with a single offsetToRowCol helper that matches the
original start-calc logic, and update the vim 'change multiple lines
down' test whose expectation was baked around the silent no-op.
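One way such a helper could look, as a sketch only: the signature and the clamping behaviour for out-of-range offsets are assumptions, and the real `offsetToRowCol` may differ. It resolves the failing case above (offset 4 in `['abc','def']`) to row 1, column 0.

```typescript
// Hypothetical sketch: map a flat character offset to [row, col].
// Every non-final line owns one extra offset for its trailing newline.
function offsetToRowCol(lines: string[], offset: number): [number, number] {
  let remaining = offset;
  for (let row = 0; row < lines.length; row++) {
    const len = (lines[row] ?? '').length;
    const isLastLine = row === lines.length - 1;
    if (remaining <= len || isLastLine) {
      // Clamp so the column never exceeds the line length.
      return [row, Math.min(remaining, len)];
    }
    remaining -= len + 1; // skip the line and its newline
  }
  return [0, 0]; // empty buffer
}
```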
schema.description is only assigned when setting.description is truthy.
For enum settings missing a description, the subsequent += produced the
literal string 'undefined Options: foo, bar' in the generated JSON
schema. Initialize the field when absent instead of concatenating onto
undefined.
The stdinDoesNotEnd option was completely broken. The original code had
a conditional stdin.end() scoped to object-type promptOrOptions, followed
by an unconditional stdin.end() that always ran — so stdinDoesNotEnd: true
had no effect.
Restructure as an explicit keepStdinOpen check: close stdin unless the
caller passed an options object with stdinDoesNotEnd: true. The string-
prompt call path still closes stdin, and null is guarded (typeof null
=== 'object' in JS).
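The check can be sketched as below. The `RunOptions` type and the function name are assumptions made for illustration; only `stdinDoesNotEnd` and the `promptOrOptions` parameter name come from the description.

```typescript
// Hypothetical shape of the options object.
type RunOptions = { prompt?: string; stdinDoesNotEnd?: boolean };

// Keep stdin open only for a non-null options object that opts in.
function shouldKeepStdinOpen(
  promptOrOptions: string | RunOptions | null | undefined,
): boolean {
  return (
    typeof promptOrOptions === 'object' &&
    promptOrOptions !== null && // typeof null === 'object'
    promptOrOptions.stdinDoesNotEnd === true
  );
}
```

The caller would then call `stdin.end()` exactly once, and only when this returns false.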
scripts/clean.js deleted the bundle directory twice. The second call
was harmless (the first already removed it) but clearly a copy-paste
leftover from when RMRF_OPTIONS was introduced.
If the sandbox image name has no explicit :tag and QWEN_SANDBOX_IMAGE_TAG
is unset, imageName.split(':')[1] returns undefined, producing a bogus
build target like 'myimage:undefined'. Fall back to 'latest' to match
Docker's conventional default.
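A sketch of the fallback, assuming the environment override takes precedence over a tag embedded in the image name (the precedence and the function name are assumptions, not confirmed by the description):

```typescript
// Hypothetical helper: resolve the tag used for the sandbox build target.
function resolveImageTag(imageName: string, envTag?: string): string {
  // split(':')[1] is undefined when the name carries no explicit tag.
  return envTag || imageName.split(':')[1] || 'latest';
}
```

With `envTag` read from `process.env.QWEN_SANDBOX_IMAGE_TAG`, an untagged `myimage` would now yield `myimage:latest` rather than `myimage:undefined`.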
* fix(cli): reduce terminal redraw cursor movement
Collapse Ink multiline erase sequences into a single relative cursor move plus erase-down operation.
This avoids excessive repeated cursor-up writes during streaming interactive renders while preserving normal TTY behavior. Screen
reader mode and non-TTY output are left unchanged, with a legacy env fallback available.
* Optimize Ink multiline erase sequences during interactive TTY rendering.
Collapse repeated cursor-up movement while preserving bounded line clearing, so redraws avoid excessive upward cursor jumps without erasing
unrelated terminal output below the frame. Non-TTY output, screen reader mode, and non-string writes are unchanged.
* feat(cli): add dual-output sidecar mode for TUI
Adds an optional **dual-output** mode for the interactive TUI: while Qwen
Code keeps rendering normally on stdout, it concurrently emits a structured
JSON event stream on a second channel (--json-fd / --json-file) and
optionally watches a JSONL command file (--input-file) for prompts and
tool-permission responses written by an external program.
This unlocks programmatic embedding of the TUI from IDE extensions, web
frontends, CI agents, or automation scripts without forcing them to give
up the rich interactive UI in favor of --output-format=stream-json.
## Design
The TUI already has a battle-tested JSON event emitter
(`StreamJsonOutputAdapter`). This change makes that adapter pluggable on
its output stream and wires a small `DualOutputBridge` that forwards TUI
events to a second instance of the adapter writing to fd / file.
For tool approvals, when a tool enters awaiting_approval the bridge emits
`control_request` (subtype `can_use_tool`); whichever side resolves first
(TUI's native UI or `confirmation_response` via --input-file) wins, and a
`control_response` is mirrored back so all observers stay in sync.
`session_start` is announced once when the bridge is constructed so
consumers can correlate the channel with a session before any other event
arrives.
## CLI surface
- `--json-fd <n>` — write JSON events to fd n (n >= 3; provided via spawn
stdio).
- `--json-file <path>` — write JSON events to a file / FIFO / /dev/fd/N.
- `--input-file <path>` — watch this file for JSONL commands.
`--json-fd` and `--json-file` are mutually exclusive. fds 0/1/2 are
rejected to prevent corrupting the TUI.
## Wire protocol
Output: existing stream-json schema with `includePartialMessages` always
enabled, plus:
- `system` / `subtype: session_start` — emitted once on bridge
construction.
- `control_request` / `subtype: can_use_tool` — pending tool approval.
- `control_response` — final approval outcome (mirrors TUI-native or
external resolution).
Input (--input-file):
{"type":"submit","text":"What does this function do?"}
{"type":"confirmation_response","request_id":"...","allowed":true}
`submit` is queued and retried when the TUI returns to idle.
`confirmation_response` is dispatched immediately — a pending tool call
is blocking and the response cannot wait behind earlier submits.
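For a consumer written in TypeScript, the two input commands could be typed and serialized as in the sketch below. The type names and the `toJsonl` helper are illustrative assumptions; the field names follow the examples above.

```typescript
// Hypothetical consumer-side types for the --input-file commands.
type SubmitCommand = { type: 'submit'; text: string };
type ConfirmationResponse = {
  type: 'confirmation_response';
  request_id: string;
  allowed: boolean;
};
type InputCommand = SubmitCommand | ConfirmationResponse;

// One JSON object per line, newline-terminated (JSONL).
function toJsonl(command: InputCommand): string {
  return JSON.stringify(command) + '\n';
}
```

A consumer would append each resulting line to the path passed as `--input-file`, for example with `fs.appendFileSync`.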
See `docs/users/features/dual-output.md` for the full schema, latency
notes, failure modes, and a spawn example.
## What changes when the flags are absent
Nothing. The bridge and watcher are constructed only when the relevant
flags are set; otherwise the React Context providers carry `null` and
every callsite short-circuits. No overhead, no behavioral change for
existing users.
## Failure handling
- Bad fd / unopenable path → warning on stderr, dual output stays
disabled, TUI launches normally.
- Consumer disconnect (EPIPE) → bridge silently disables itself, TUI
keeps running.
- Any exception inside the adapter → caught, logged, bridge disabled.
The TUI is never crashed by a dual-output failure.
## Files
New:
- packages/cli/src/dualOutput/{DualOutputBridge,DualOutputContext,index}.{ts,tsx}
- packages/cli/src/remoteInput/{RemoteInputWatcher,RemoteInputContext,index}.{ts,tsx}
- packages/cli/src/nonInteractive/io/index.ts
- docs/users/features/dual-output.md
Modified:
- packages/core/src/config/config.ts — 3 new ConfigParameters fields + getters
- packages/cli/src/config/config.ts — yargs options + mutex validation
- packages/cli/src/gemini.tsx — instantiate bridge / watcher in
startInteractiveUI, wrap with Context Providers, register cleanup
- packages/cli/src/ui/AppContainer.tsx — connect RemoteInput to
submitQuery, bridge tool confirmations
- packages/cli/src/ui/hooks/useGeminiStream.ts — call
dualOutput?.processEvent(...) at five existing event points
- packages/cli/src/nonInteractive/io/{Base,Stream}JsonOutputAdapter.ts —
StreamJsonOutputAdapter accepts an injected output stream; base adapter
exposes emitPermissionRequest / emitControlResponse through a new
emitControlMessageImpl hook (default no-op in batch mode).
## Tests
- packages/cli/src/dualOutput/DualOutputBridge.test.ts — fd validation,
auto session_start, control-event routing, post-shutdown safety.
- packages/cli/src/remoteInput/RemoteInputWatcher.test.ts — submit
forwarding, immediate confirmation dispatch, busy/idle retry,
malformed-line tolerance, shutdown.
- packages/cli/src/nonInteractive/io/StreamJsonOutputAdapter.dualOutput.test.ts —
custom outputStream injection and new emitPermissionRequest /
emitControlResponse paths.
tsc --noEmit -p packages/cli/tsconfig.json is clean.
vitest run src/nonInteractive src/dualOutput src/remoteInput → 297 passed,
1 skipped, 11 files.
* feat(cli): dual-output capability handshake, session_end, control_error, settings.json
Incremental improvements on top of the initial dual-output PR based on
reviewer feedback. All extensions are additive; older consumers that
ignore unknown fields keep working.
## Capability handshake in session_start
`session_start.data` now carries three new fields so consumers can
feature-detect without sniffing the stream:
- `protocol_version` (integer, currently 1) — bumped on any protocol
change consumers might care about.
- `version` (string) — the Qwen Code CLI version, threaded in from
`gemini.tsx`.
- `supported_events` (string[]) — the event kinds this bridge version
is known to emit, exported as `SUPPORTED_EVENTS` from the module.
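A consumer might feature-detect from the handshake along these lines. This is a sketch under assumptions: the envelope shape (`type`, `subtype`, `data`) is inferred from the wire-protocol description, and the helper name and the event kind used in the usage note are illustrative.

```typescript
// Hypothetical consumer-side view of session_start.data.
interface SessionStartData {
  protocol_version: number;
  version: string;
  supported_events: string[];
}

// Returns true if the given JSON line is a session_start event that
// advertises the requested event kind.
function supportsEvent(line: string, eventKind: string): boolean {
  const evt = JSON.parse(line) as {
    type?: string;
    subtype?: string;
    data?: Partial<SessionStartData>;
  };
  if (evt.type !== 'system' || evt.subtype !== 'session_start') return false;
  return evt.data?.supported_events?.includes(eventKind) ?? false;
}
```

For example, a consumer could call `supportsEvent(firstLine, 'control_request')` before enabling its permission UI; older bridges that omit `supported_events` simply report false.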
## session_end on bridge shutdown
DualOutputBridge.shutdown() now emits a final
`system` / `session_end` event carrying `session_id` before closing the
stream. Gives consumers a definitive termination signal rather than
requiring them to infer it from EPIPE. Idempotent — calling shutdown
twice emits exactly one session_end.
## control_error emission path
`ControlErrorResponse` (already defined in types.ts) now has a first-
class emission path: `BaseJsonOutputAdapter.emitControlError(requestId,
message)` → `control_response` with `subtype: 'error'`. Wired into
AppContainer's remote-input confirmation handler so that a
`confirmation_response` referencing an unknown / already-resolved
request_id produces a structured error reply instead of silently
dropping, letting consumers retry or surface the error.
## settings.json support
New `dualOutput` top-level settings block with `jsonFile` and
`inputFile` properties. `--json-fd` has no settings equivalent (fd
passing is a spawn-time concern). CLI flag wins over settings when
both are present, so scripted one-off runs still work unchanged.
`requiresRestart: true` since the bridge is constructed once at
startup.
## Documentation
`docs/users/features/dual-output.md` gains five major sections:
- **Use cases** — concrete integration scenarios (terminal+chat dual
sync, IDE extensions, web frontends, CI observers, multi-agent
orchestration, session replay, observability, QA).
- **Why two output flags?** — detailed rationale for coexisting
`--json-fd` and `--json-file`, including the PTY constraint
(`node-pty` / `bun-pty` expose no stdio array, and `forkpty(3)` /
`login_tty` actively close fds >= 3 before exec).
- **Comparison with Claude Code's stream-json** — schema-parity
matrix, transport-topology differences, permission-control-plane
behavioral notes, and a "room to improve" section as a design
horizon.
- **Runnable demos** — seven copy-paste POCs: event observer, remote
submit, permission bridge, Node embedder with capability
feature-detection, session_end handling, failure drills.
- **Settings-based configuration** — example settings.json snippet and
precedence rules.
## Tests
- DualOutputBridge.test.ts: new cases for capability handshake shape,
session_end on shutdown, shutdown idempotency, and emitControlError.
- StreamJsonOutputAdapter.dualOutput.test.ts: new case for
emitControlError at the adapter level.
302 passed, 1 skipped, 11 files. tsc --noEmit -p packages/cli is clean.
* docs(dual-output): shrink Claude Code comparison to one honest sentence
After actually reading the Claude Code source (src/cli/structuredIO.ts,
src/bridge/*, src/utils/messages/systemInit.ts), the previous
"Comparison with Claude Code's stream-json" section was overstated:
- Claude Code has no equivalent of TUI + sidecar running simultaneously.
Its stream-json only works with --print (non-interactive); the bridge
in src/bridge/* is Anthropic's own remote worker protocol, not a
local embedding surface.
- Claude Code uses `system/init` (not `session_start`) and has no
session_end in the wire protocol, so the schema-parity table contained
false ticks.
- Framing this PR as "parity with Claude Code" is therefore inaccurate;
it's filling a gap Claude Code does not address.
Replace the whole multi-section comparison (schema matrix, transport
table, permission notes, borrow list, roadmap) with a single sentence
stating the accurate relation: same event format in spirit, different
topology — CC's is non-interactive only.
* fix(cli): address review feedback on dual-output sidecar mode
- Fix control_response mirror: external-initiated confirmations now
emit control_response via the same mirror useEffect as TUI-native
resolutions, making the emission path symmetric for all observers.
- Fix ENOENT: --json-file with a non-existent path now falls back to
createWriteStream (auto-creates the file) instead of throwing.
- Fix race: add reading guard to RemoteInputWatcher.readNewLines()
preventing duplicate command processing on rapid appends.
- Refactor confirmationHandler to use refs (pendingToolCallsRef,
dualOutputRef) and register once (deps: [remoteInput]) to eliminate
teardown/re-registration churn.
- Add debug logging to shutdown bare catch for ops correlation.
- Add ENOENT fallback test case for DualOutputBridge.
- Regenerate settings.schema.json for dualOutput section.
Generated with AI
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix(cli): make RemoteInputWatcher poll interval configurable for CI reliability
RemoteInputWatcher.test.ts was timing out in CI (5s default) because
fs.watchFile's 500ms poll interval is unreliable under load. Fix:
- Accept optional `pollIntervalMs` in constructor (default 500ms).
- Tests use 100ms poll interval for faster feedback.
- Increase per-test timeout to 15s and waitFor timeout to 10s.
- Increase "TUI busy" wait from 800ms to 1500ms for CI headroom.
Generated with AI
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix(cli): eliminate fs.watchFile timing dependency in RemoteInputWatcher tests
Tests were flaky across all CI platforms (macOS/ubuntu/windows) because
fs.watchFile polling (even at 100ms) is unreliable under CI load.
Fix: expose checkForNewInput() as a public method that directly triggers
file reading and returns a Promise. Tests now call and await it directly
after writing to the input file: no polling, no timeouts, deterministic.
Also fixes:
- Windows ENOTEMPTY: add delay in afterEach before rmSync
- Add active check in readNewLines to respect shutdown state
- readNewLines now returns Promise<void> for awaitable reads
Generated with AI
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
---------
Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Cron prompts are rendered as `● Cron: …` notifications since the
distinct Cron message type was added; the interactive tests still
waited for the legacy `> …` user-message line and timed out.
* feat(core): add path-based context rule injection from .qwen/rules/
Support multiple rule files in `.qwen/rules/` directories with optional
YAML frontmatter for conditional loading based on glob patterns.
Rules with a `paths:` field only load when matching files exist in the
project. Rules without `paths:` always load as baseline rules.
Key behaviors:
- Global rules from ~/.qwen/rules/ always load
- Project rules from <root>/.qwen/rules/ require folder trust
- HTML comments stripped to save tokens
- Files sorted alphabetically for deterministic ordering
- Deduplication when project root equals home directory
- Uses globIterate for early termination on first match
* feat(core): align rules loading with Claude Code reference implementation
Closes three gaps with Claude Code's .claude/rules/ feature:
1. Recursive directory scanning — .qwen/rules/ now supports subdirectories
like frontend/, backend/ for organized rule hierarchies.
2. Exclusion patterns — new `contextRuleExcludes` config parameter accepts
glob patterns to skip specific rule files (useful in monorepos with
other teams' rules).
3. Turn-level lazy loading — conditional rules (with `paths:` frontmatter)
are no longer injected eagerly at session start. Instead, they are
stored in a per-session ConditionalRulesRegistry and injected on-demand
via <system-reminder> when the model reads/edits a matching file
(read_file, edit, write_file). Each rule is injected at most once per
session.
Internals:
- loadRules() now returns { content, ruleCount, conditionalRules } — only
baseline rules flow into the system prompt; conditional rules are
deferred.
- ConditionalRulesRegistry pre-compiles picomatch matchers for efficiency
and tracks injected rules to avoid duplicate injection.
- coreToolScheduler.ts injects matched rules after PostToolUse hooks but
before the tool response is sent to the model.
- Path matching defensively rejects files outside the project root.
- /memory refresh and /directory add keep the registry in sync via
setConditionalRulesRegistry().
* fix(core): correct field placement in config.test.ts mocks after merge
An earlier replace_all inserted ruleCount/conditionalRules/projectRoot
into the wrong mock call (readAutoMemoryIndex instead of
loadServerHierarchicalMemory), breaking the build with syntax errors.
Move the fields back to the correct mocked return value.
* fix(core): normalize rule display paths to forward slashes for Windows
On Windows, path.relative() returns backslash-separated paths, causing
the "Rule from:" marker to differ from Linux/macOS and breaking the
formats-rules-with-source-markers test on Windows CI.
Normalize to forward slashes for cross-platform consistency, matching
the convention used in glob patterns (paths: field) so that the model
sees the same format regardless of the host OS.
* fix(core): harden rulesDiscovery path checks and sort determinism
Two small defensive improvements surfaced by the audit:
1. matchAndConsume now rejects the exact '..' relative path in addition
to '../'-prefixed paths. path.relative returns '..' (no trailing
slash) when the target equals the parent of projectRoot — rare in
practice but worth guarding against.
2. loadRulesFromDir now uses Array.sort() default (UTF-16 code point
comparison) instead of localeCompare. The previous sort was
locale-dependent and could produce different rule loading order on
machines with non-English locales (e.g. zh-CN). Rule filenames are
typically ASCII so behaviour is unchanged in common cases, but
deterministic ordering is preferable across environments.
Adds one test case for the '..' rejection path.
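The '..' guard from item 1 can be sketched as a pure function over the result of `path.relative(projectRoot, filePath)`. The function name is hypothetical, and the backslash normalization is an assumption added so the sketch behaves the same on Windows-style input.

```typescript
// Hypothetical helper: does a relative path point outside the project root?
function escapesProjectRoot(relativePath: string): boolean {
  const rel = relativePath.replace(/\\/g, '/');
  // '..' alone means "the parent of projectRoot"; '../x' means "below it".
  return rel === '..' || rel.startsWith('../');
}
```

Matching on `'../'` rather than a bare `'..'` prefix keeps legitimate names such as `..foo/a.ts` inside the project.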
* fix(core): address CodeQL incomplete HTML comment sanitization
stripHtmlComments only matched complete <!-- ... --> pairs in a single
pass, so input like 'A<!-- one --><!-- two -->B<!--unclosed' would
leave a residual '<!--' marker — flagged by CodeQL as
incomplete-multi-character-sanitization.
Not a security issue in our context (the output goes to an LLM system
prompt, not an HTML renderer), but worth fixing to:
- clear the CodeQL alert in CI
- avoid token waste from dangling markers
- produce deterministic output
Strategy: iteratively strip <!-- ... --> pairs until stable, then
remove any residual <!-- markers (leaving the following content
visible since the author probably intended it to appear in the rule).
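A minimal sketch of that strategy follows. It is not the actual `stripHtmlComments` implementation; in particular, the outer loop that repeats both steps until nothing changes is an extra precaution assumed here, so that removing one marker cannot splice a new one together.

```typescript
// Hypothetical sketch: strip complete comments until stable, then drop
// any residual opening markers, repeating until the string stops changing.
function stripHtmlComments(input: string): string {
  let current = input;
  let previous: string;
  do {
    previous = current;
    let beforePairs: string;
    do {
      beforePairs = current;
      current = current.replace(/<!--[\s\S]*?-->/g, '');
    } while (current !== beforePairs);
    // Unclosed markers are removed; the text after them stays visible.
    current = current.replace(/<!--/g, '');
  } while (current !== previous);
  return current;
}
```

Every replacement shortens the string, so both loops terminate.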
* feat(core): add run_in_background support for Agent tool
Enable sub-agents to run asynchronously via `run_in_background: true`
parameter. Background agents execute independently from the parent,
which receives an immediate launch confirmation and continues working.
A notification is injected into the parent conversation when the
background agent completes.
Key changes:
- BackgroundTaskRegistry tracks lifecycle of background agents
- Agent tool gains async execution path with fire-and-forget semantics
- Background agents use YOLO approval mode to prevent deadlock
- Independent AbortControllers survive parent ESC cancellation
- CLI bridges notifications via useMessageQueue for between-turn delivery
- State race guards prevent complete/fail after cancellation
- Session cleanup aborts all running background agents
* feat(background): improve notification formatting and UI handling
- Add prefix/separator protocol to distinguish background notifications from user input
- Show concise summary in UI while sending full details to LLM
- Add 'notification' history item type with specialized display
- Add 'background' agent status for background-running agents
- Prevent notifications from polluting prompt history (up-arrow)
- Truncate long descriptions in display text
This improves the UX for background agents by showing cleaner, more concise
notifications while preserving full context for the LLM.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix(background): reject run_in_background in non-interactive mode
Headless mode skips AppContainer, so the notification callback is never
registered and background agent results would be silently dropped. Return
an error prompting the model to retry without run_in_background.
* refactor(background): replace prefix/separator protocol with typed notification queue
Replace the stringly-typed \x00__BG_NOTIFY__\x00 prefix/separator
encoding with a typed notification path using SendMessageType.Notification.
- Add SendMessageType.Notification to the enum
- Change BackgroundNotificationCallback to emit (displayText, modelText)
- Move notification queue from AppContainer into useGeminiStream (mirrors
the cron queue pattern): register on registry, queue structured items,
drain on idle via submitQuery
- prepareQueryForGemini short-circuits for Notification type (skips slash
commands, shell mode, @-commands, prompt history logging)
- Remove BACKGROUND_NOTIFICATION_PREFIX/SEPARATOR constants
* refactor(background): move abortAll to Config.shutdown
Background agent cleanup belongs in Config.shutdown() alongside other
resource teardown (skillManager, toolRegistry, arenaRuntime), not in
AppContainer's registerCleanup. This also ensures headless mode gets
cleanup for free.
* fix(background): persist notification items for session resume
Background agent notifications were missing after session resume because
they were never recorded in the chat history. The model text was absent
from the API history and the display item was lost.
- Add recordNotification() to ChatRecordingService — stores as user-role
message with subtype 'notification' and displayText payload
- Thread notificationDisplayText through submitQuery → sendMessageStream
- Restore as HistoryItemNotification in resumeHistoryUtils
* fix(background): replace YOLO with deny-by-default for background agents
Background agents were using YOLO approval mode which auto-approves all
tool calls — too permissive. Replace with shouldAvoidPermissionPrompts
which auto-denies tool calls that need interactive approval, matching
claw-code's approach.
The permission flow for background agents is now:
1. L3/L4 permission rules (allow/deny) — same as foreground
2. Approval mode overrides (AUTO_EDIT for edits) — same as foreground
3. PermissionRequest hooks — can override the denial
4. Auto-deny — if no hook decided, deny because prompts are unavailable
* fix(background): add missing getBackgroundTaskRegistry mock in useGeminiStream tests
* refactor(core): move fork subagent params from execute() to construction time
Identity-shaping fork inputs (parent history, generationConfig, tool decls,
env-skip flag) were threaded through `AgentHeadless.execute()`'s options bag
and re-passed by the SubagentStop hook retry loop. They belong on the agent's
construction-time configs, not its per-invocation options.
- PromptConfig gains `renderedSystemPrompt` (verbatim, bypasses templating
and userMemory injection) and drops the `systemPrompt`/`initialMessages`
XOR so fork can carry both. createChat skips env bootstrap when
`initialMessages` is non-empty.
- AgentHeadless.execute() shrinks to (context, signal?). Fork dispatch in
agent.ts builds synthetic PromptConfig/ModelConfig/ToolConfig from the
parent's cache-safe params and calls AgentHeadless.create directly
(bypassing SubagentManager). Parent's tool decls flow through verbatim
including the `agent` tool itself for cache parity.
- Recursive-fork prevention switches from fork-side tool stripping to a
runtime guard. The previous `isInForkChild(history)` helper was dead
code (it scanned the main GeminiClient's history, not the fork child's
chat). Replaced with `isInForkExecution()` backed by AsyncLocalStorage:
the fork's background execution runs inside `runInForkContext`, and the
ALS frame propagates through the standard async chain into nested
AgentTool.execute() calls where the guard fires.
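The runtime guard can be illustrated with Node's `AsyncLocalStorage`. Only `runInForkContext` and `isInForkExecution` are names taken from the description; their bodies, the store shape, and the `nestedAgentToolExecute` stand-in are assumptions for illustration.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Hypothetical store shape; the real one may carry more state.
const forkContext = new AsyncLocalStorage<{ inFork: true }>();

// Run a callback (sync or async) inside the fork frame.
function runInForkContext<T>(fn: () => T): T {
  return forkContext.run({ inFork: true }, fn);
}

// True only when called from code reached through runInForkContext.
function isInForkExecution(): boolean {
  return forkContext.getStore()?.inFork === true;
}

// Stand-in for a nested AgentTool.execute() call.
async function nestedAgentToolExecute(): Promise<string> {
  await Promise.resolve(); // the frame survives async hops
  return isInForkExecution() ? 'rejected: recursive fork' : 'fork allowed';
}
```

Because the frame follows the async call chain instead of the message history, the guard does not depend on which chat object holds the history.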
* refactor(core): move agent tool files into dedicated tools/agent/ directory
Move agent.ts, agent.test.ts, and fork-subagent.ts under
tools/agent/ and update all import paths accordingly.
* refactor(core): remove dead temp and top_p fields from ModelConfig
These fields were never populated from subagent frontmatter and served
no purpose in the fork path either. The ModelConfig interface retains
only the actively-used model field.
* refactor(core): read parent generation config directly instead of getCacheSafeParams
Fork subagent now reads system instruction and tool declarations from
the live GeminiChat via getGenerationConfig() instead of the global
getCacheSafeParams() snapshot. This removes the cross-module coupling
between the agent tool and the followup infrastructure.
* fix(core): prevent duplicate tool declarations when toolConfig has only inline decls
prepareTools() treated asStrings.length === 0 as "add all registry tools",
which is correct when no tools are specified at all, but wrong when the
caller provides only inline FunctionDeclaration[] (no string names). The
fork path passes parent tool declarations as inline decls for cache parity,
so prepareTools was adding the full registry set on top — duplicating every
non-excluded tool.
Add onlyInlineDecls.length === 0 to the condition so that pure-inline
toolConfigs bypass the registry entirely.
* feat(core): support agent-level `background: true` in frontmatter
Subagent definitions can now declare `background: true` in their YAML
frontmatter to always run as background tasks. This is OR'd with the
`run_in_background` tool parameter — useful for monitors, watchers, and
proactive agents so the LLM doesn't need to remember to set the flag.
* fix(core): address background subagent lifecycle gaps
- Inherit bgConfig from agentConfig so the resolved approval mode is
preserved for background agents (foreground would run AUTO_EDIT but
background fell back to DEFAULT, which combined with shouldAvoid-
PermissionPrompts would auto-deny every permission request).
- Honor SubagentStop blocking decisions in background runs by looping
on hook output up to 5 iterations, matching runSubagentWithHooks.
- Check terminate mode before reporting completion; non-GOAL modes
(ERROR, MAX_TURNS, TIMEOUT) are now reported as failures instead of
emitting a success notification for an incomplete run.
- Exclude SendMessageType.Notification from the UserPromptSubmit hook
guard so background completion messages are not rewritten or blocked
as if they were user input.
* feat(cli): headless support and SDK task events for background agents (#3379)
* feat(cli): unify notification queue for cron and background agents
Migrate cron from its own queue (cronQueueRef / cronQueue) to the shared
notification queue used by background agents. Both producers now push the
same item shape { displayText, modelText, sendMessageType } and a single
drain effect / helper processes them in FIFO order.
Cron fires render as HistoryItemNotification (● prefix) instead of
HistoryItemUser (> prefix), with a "Cron: <prompt>" display label.
Records use subtype 'cron' for clean resume and analytics separation.
Lift the non-interactive rejection for background agents. Register a
notification callback in nonInteractiveCli.ts with a terminal hold-back
phase (100ms poll) that keeps the process alive until all background
agents complete and their notifications are processed.
* feat(cli): emit SDK task events for background subagents
Emit `task_started` when a background agent registers and
`task_notification` when it completes, fails, or is cancelled, so
headless/SDK consumers can track lifecycle without parsing display
text. Model-facing text is now structured XML with status, summary,
truncated result, and usage stats. Completion stats (tokens, tool
uses, duration) are captured from the subagent and included in both
the SDK payload and the model XML.
* fix: address codex review issues for background subagents
- Background subagents now inherit the resolved approval mode from
agentConfig instead of the raw session config, so a subagent with
`approvalMode: auto-edit` (or execution in a trusted folder) keeps
that override when it runs asynchronously.
- Non-interactive cron drains are single-flight: concurrent cron fires
now await the same in-flight drain, and the cron-done check gates
on it, preventing the final result from being emitted while a cron
turn is still streaming.
- Background forks go through createForkSubagent so they retain the
parent's rendered system prompt and inherited history instead of
degrading to a plain FORK_AGENT.
* fix(cli): restore cancellation, approval, and error paths in queued drain
- Hold-back loop now reacts to SIGINT/SIGTERM: when the main abort
signal fires it calls registry.abortAll() so background agents with
their own AbortControllers stop promptly instead of pinning the
process open.
- Queued-turn tool execution forwards the stream-json approval update
callback (onToolCallsUpdate) so permission-gated tools inside a
background-notification follow-up emit can_use_tool requests.
- Queued-turn stream loop mirrors the main loop's text-mode handling
of GeminiEventType.Error, writing to stderr and throwing so provider
errors produce a non-zero exit code instead of silently succeeding.
- Interactive cron prompts go through the normal slash/@-command/shell
preprocessing again; only Notification messages skip that path.
* fix(cli): skip duplicate user-message item for cron prompts
Cron prompts already render as a `● Cron: …` notification via the queue
drain, so adding them again as a `USER` history item produced a
duplicate `> …` line.
* fix(cli): honor SIGINT/SIGTERM during cron scheduler wait
The non-interactive cron phase awaits a Promise that resolves only when
scheduler.size reaches 0 and no drain is in flight. Recurring cron jobs
never drop the scheduler size to 0 on their own, so the previous abort
handling (added to the hold-back loop) was unreachable — the process
hung indefinitely after SIGINT/SIGTERM. Attach an abort listener inside
the promise so abort stops the scheduler and resolves immediately,
allowing the hold-back loop to run and the process to exit cleanly.
* feat(core): propagate tool-use id through background agent notifications
Plumb the scheduler's callId into AgentToolInvocation via an optional
setCallId hook on the invocation, detected structurally in
buildInvocation. The agent tool forwards it as toolUseId on the
BackgroundTaskRegistry entry so completion notifications can carry a
<tool-use-id> tag and SDK task_started / task_notification events can
emit tool_use_id — letting consumers correlate background completions
back to the original Agent tool-use that spawned them.
* fix(cli): drain single-flight race kept task_notification from emitting
drainLocalQueue wrapped its body in an async IIFE and cleared the
promise reference via finally. When the queue is empty the IIFE has
no awaits, so its finally runs synchronously as part of the RHS of
the assignment `drainPromise = (async () => {...})()` — clearing
drainPromise BEFORE the outer assignment overwrites it with the
resolved promise. The reference then stayed stuck on that fulfilled
promise forever, so later calls short-circuited through
`if (drainPromise) return drainPromise` and never processed
queued notifications.
Symptom: in headless `--output-format json` (and `stream-json`),
task_started emitted but task_notification never did, even after
the background agent completed. The process sat in the hold-back
loop until SIGTERM.
Fix: move the null-clearing out of the async body into an outer
`.finally()` on the returned promise. The `.finally()` callback runs
as a microtask, after the synchronous assignment has completed, so it
clears the live drainPromise reference instead of writing a null that
the assignment immediately overwrites.
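A small sketch of the fixed single-flight shape, under assumptions: the
queue contents, `drainBody`, and the `processed` array are hypothetical
stand-ins, and only the `drainPromise` / `.finally()` arrangement
mirrors what the message describes.

```typescript
let drainPromise: Promise<void> | null = null;
const queue: string[] = [];
const processed: string[] = [];

// Hypothetical drain body. With an empty queue it completes without
// ever awaiting, which is the case that triggered the original bug.
async function drainBody(): Promise<void> {
  while (queue.length > 0) {
    const item = queue.shift() as string;
    await Promise.resolve(); // stand-in for real async work
    processed.push(item);
  }
}

function drainLocalQueue(): Promise<void> {
  if (drainPromise) return drainPromise;
  const p: Promise<void> = drainBody().finally(() => {
    // Runs as a microtask, after `drainPromise = p` below has executed,
    // so the null written here is not overwritten by the assignment.
    if (drainPromise === p) drainPromise = null;
  });
  drainPromise = p;
  return p;
}
```

Had the clearing stayed in a `finally` block inside the async body, an
empty-queue call would set `drainPromise` to null before the assignment
and leave it pointing at a fulfilled promise afterwards.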
* fix(cli): append newline to text-mode emitResult so zsh PROMPT_SP doesn't erase the line
Headless text mode wrote `resultMessage.result` without a trailing newline.
In a TTY, zsh themes that use PROMPT_SP (powerlevel10k, agnoster, …) detect
the missing `\n` and emit `\r\033[K` before drawing the next prompt, which
wipes the final line off the screen. Pipe-captured output was unaffected,
so the bug only surfaced for interactive shell users — most visibly in the
background-agent flow where the drain-loop's final assistant message is
the *only* stdout write in text mode.
Append `\n` to both the success (stdout) and error (stderr) writes.
* docs(skill): tighten worked-example blurb in structured-debugging
Mirror the simplified blurb from .claude/skills/structured-debugging/SKILL.md
(knowledge repo). Drops the round-by-round narrative; keeps the contradiction
+ two lessons.
* docs(skill): mirror SKILL.md improvements (reframing failure mode, generalized path, value-logging guidance)
Mirror of knowledge repo commit 38eb28d into the qwen-code .qwen/skills
copy.
* docs(skill): mirror worked example into .qwen/skills/structured-debugging/
Mirrors knowledge/.claude/skills/structured-debugging/examples/
headless-bg-agent-empty-stdout.md so the .qwen copy of the skill links
resolve.
* docs(skill): mirror generalized side-note path guidance
* fix(cli): harden headless cron and background-agent failure paths
Three regressions surfaced by Codex review of feat/background-subagent:
- Cron drain rejections were dropped by a bare `void`, so a failing
queued turn left the outer Promise unresolved and hung the run. Route
drain failures through the Promise's reject so they propagate to the
outer catch.
- The background-agent registry entry was inserted before
`createForkSubagent()` / `createAgentHeadless()` was awaited. Failed
init returned an error from the tool call but left a phantom `running`
entry, and the headless hold-back loop (`registry.getRunning()`) waited
forever. Register only after init succeeds.
- SIGINT/SIGTERM during the hold-back phase aborted background tasks,
then fell through to `emitResult({ isError: false })`, so a cancelled
`qwen -p ...` exited 0 with the prior assistant text. Route through
`handleCancellationError()` so cancellation exits non-zero, matching
the main turn loop.
* test(cli): update stdout/stderr assertions for trailing newline
`feadf052f` appended `\n` to text-mode `emitResult` output, but the
nonInteractiveCli tests still asserted the pre-change strings. Update
the 11 affected assertions to expect the trailing newline.
* fix: address review comments on background-agent notifications
Four additional issues from the PR review that the prior regression-fix
commit didn't cover:
- Escape XML metacharacters when interpolating `description`, `result`,
`error`, `agentId`, `toolUseId`, and `status` into the task-notification
envelope. Subagent output (which itself may carry untrusted tool output,
fetched HTML, or another agent's notification) could contain
`</result>` or `</task-notification>` and forge sibling tags the parent
model would treat as trusted metadata. Truncate result text *before*
escaping so the truncation never slices through an entity like `&amp;`.
- Emit the terminal notification from `cancel()` and `abortAll()`. The
fire-and-forget `complete()`/`fail()` from the subagent task is guarded
by `status !== 'running'` and was no-op'd after cancellation, so SDK
consumers saw `task_started` with no matching `task_notification`,
breaking the contract this PR establishes. Updated two race-guard
tests that asserted the old behavior.
- Call `adapter.finalizeAssistantMessage()` before the abort-triggered
early return inside `drainOneItem`'s stream loop. Without it,
`startAssistantMessage()` had already been called, so stream-json mode
left `message_start` unpaired.
- Enforce `config.getMaxSessionTurns()` in `drainOneItem` for symmetry
with the main turn loop. Cron fires and notification replies otherwise
bypass the budget cap in headless runs.
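The truncate-then-escape order from the first item above can be
sketched as follows. `escapeXml`, `renderResult`, and the `maxChars`
parameter are illustrative names, not the identifiers used in the
change.

```typescript
// Escape the five XML metacharacters. `&` is handled first so the
// entities produced by the later replacements are not double-escaped.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Truncate the raw text first, then escape, so the cut can never land
// in the middle of an entity that escaping introduced.
function renderResult(result: string, maxChars: number): string {
  const truncated =
    result.length > maxChars ? result.slice(0, maxChars) : result;
  return `<result>${escapeXml(truncated)}</result>`;
}
```

Escaping first and truncating afterwards could cut `&amp;` down to
`&am`, which leaves a malformed entity in the envelope.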
* fix: address codex review comments for background subagents
- Wrap background fork execute() in runInForkContext so the
recursive-fork guard (AsyncLocalStorage-based) fires when a
background fork's child model calls `agent` again. Previously only
the foreground fork path was wrapped, so background forks could
spawn nested implicit forks.
- Emit queued terminal task_notifications on SIGINT/SIGTERM before
handleCancellationError exits. abortAll() enqueues cancellation
notifications via the registry callback, but the process was
exiting before the drain loop had a chance to flush them — leaving
stream-json consumers that already saw task_started without a
matching terminal task_notification. Extracted the SDK-emit block
into a shared emitNotificationToSdk helper reused by the normal
drain and the cancellation flush.
- Skip notification/cron subtypes in ACP HistoryReplayer. These
records are persisted as type: 'user' so the model's chat history
keeps them for continuity, but they were never user input —
replaying them leaked raw <task-notification> XML (and cron
prompts) back into the ACP session as if the user typed them.
* test(cli): sync JsonOutputAdapter text-mode assertions with trailing newline
Commit 0da1182b7 appended a newline to text-mode emitResult output
(zsh PROMPT_SP fix) and updated the nonInteractiveCli tests, but
four assertions in JsonOutputAdapter.test.ts were missed. Update
them to expect the trailing newline so CI passes.
* refactor: simplify background subagent plumbing
- Extract the SubagentStop hook blocking-decision loop into a
runSubagentStopHookLoop helper so the foreground and background
paths no longer duplicate the iteration/abort/log scaffolding.
- Unify BackgroundTaskRegistry.abortAll to delegate to cancel,
removing copy-pasted abort/notification bookkeeping.
- Drop the unused findByName and BackgroundAgentEntry.name field.
- In nonInteractiveCli drain, hoist inputFormat and
toolCallUpdateCallback out of the inner tool loop, and drop the
unreachable try/catch around the readonly registry.
- Trim boilerplate doc/narration comments while keeping load-bearing
WHY comments.
* fix: address codex review comments for background subagents
- Use tool callId (or short random suffix) instead of Date.now() for
background agentIds; avoids registry collisions when parallel
same-type agents launch in the same millisecond.
- Reset loopDetector and lastPromptId for Notification turns so a
prior turn's loop count doesn't trip LoopDetected on the
notification response.
- Replay notification/cron displayText in ACP HistoryReplayer so
the assistant reply has an antecedent in resumed transcripts.
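A sketch of the id scheme from the first item above, assuming a
`<type>-<suffix>` format. The format, the `makeBackgroundAgentId` name,
and the 4-byte suffix length are assumptions; the message only says the
tool callId or a short random suffix replaces Date.now().

```typescript
import { randomBytes } from 'node:crypto';

// Prefer the scheduler-provided callId; otherwise fall back to a short
// random suffix. Neither depends on the clock, so two agents launched
// in the same millisecond do not share an id by construction.
function makeBackgroundAgentId(agentType: string, callId?: string): string {
  const suffix = callId ?? randomBytes(4).toString('hex');
  return `${agentType}-${suffix}`;
}
```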
---------
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* refactor(core): move fork subagent params from execute() to construction time
Identity-shaping fork inputs (parent history, generationConfig, tool decls,
env-skip flag) were threaded through `AgentHeadless.execute()`'s options bag
and re-passed by the SubagentStop hook retry loop. They belong on the agent's
construction-time configs, not its per-invocation options.
- PromptConfig gains `renderedSystemPrompt` (verbatim, bypasses templating
and userMemory injection) and drops the `systemPrompt`/`initialMessages`
XOR so fork can carry both. createChat skips env bootstrap when
`initialMessages` is non-empty.
- AgentHeadless.execute() shrinks to (context, signal?). Fork dispatch in
agent.ts builds synthetic PromptConfig/ModelConfig/ToolConfig from the
parent's cache-safe params and calls AgentHeadless.create directly
(bypassing SubagentManager). Parent's tool decls flow through verbatim
including the `agent` tool itself for cache parity.
- Recursive-fork prevention switches from fork-side tool stripping to a
runtime guard. The previous `isInForkChild(history)` helper was dead
code (it scanned the main GeminiClient's history, not the fork child's
chat). Replaced with `isInForkExecution()` backed by AsyncLocalStorage:
the fork's background execution runs inside `runInForkContext`, and the
ALS frame propagates through the standard async chain into nested
AgentTool.execute() calls where the guard fires.
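The AsyncLocalStorage guard from the last item above can be sketched
like this. `runInForkContext` and `isInForkExecution` are the names
used in the message, but their bodies, the store shape, and
`agentToolExecute` are assumptions made for illustration.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Assumed store shape: a marker object whose presence means
// "currently executing inside a fork".
const forkContext = new AsyncLocalStorage<{ inFork: true }>();

function runInForkContext<T>(fn: () => Promise<T>): Promise<T> {
  return forkContext.run({ inFork: true }, fn);
}

function isInForkExecution(): boolean {
  return forkContext.getStore() !== undefined;
}

// Hypothetical stand-in for the agent tool's execute path.
async function agentToolExecute(): Promise<string> {
  await Promise.resolve(); // the ALS frame survives across awaits
  if (isInForkExecution()) {
    return 'refused: nested fork';
  }
  return 'ok';
}
```

Because the marker travels with the async call chain instead of the
chat history, the check works no matter which history object the
caller happens to hold.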
* refactor(core): move agent tool files into dedicated tools/agent/ directory
Move agent.ts, agent.test.ts, and fork-subagent.ts under
tools/agent/ and update all import paths accordingly.
* refactor(core): remove dead temp and top_p fields from ModelConfig
These fields were never populated from subagent frontmatter and served
no purpose in the fork path either. The ModelConfig interface retains
only the actively-used model field.
* refactor(core): read parent generation config directly instead of getCacheSafeParams
Fork subagent now reads system instruction and tool declarations from
the live GeminiChat via getGenerationConfig() instead of the global
getCacheSafeParams() snapshot. This removes the cross-module coupling
between the agent tool and the followup infrastructure.
* fix(core): prevent duplicate tool declarations when toolConfig has only inline decls
prepareTools() treated asStrings.length === 0 as "add all registry tools",
which is correct when no tools are specified at all, but wrong when the
caller provides only inline FunctionDeclaration[] (no string names). The
fork path passes parent tool declarations as inline decls for cache parity,
so prepareTools was adding the full registry set on top — duplicating every
non-excluded tool.
Add onlyInlineDecls.length === 0 to the condition so that pure-inline
toolConfigs bypass the registry entirely.
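A simplified sketch of the corrected condition. `selectTools` and the
`FunctionDeclarationLike` type are hypothetical, and details the real
prepareTools() handles (such as exclusions) are left out; only the
`asStrings` / `onlyInlineDecls` check reflects the message.

```typescript
// Hypothetical, reduced declaration type for illustration.
interface FunctionDeclarationLike {
  name: string;
}
type ToolEntry = string | FunctionDeclarationLike;

function selectTools(
  tools: ToolEntry[],
  registry: FunctionDeclarationLike[],
): FunctionDeclarationLike[] {
  const asStrings = tools.filter(
    (t): t is string => typeof t === 'string',
  );
  const onlyInlineDecls = tools.filter(
    (t): t is FunctionDeclarationLike => typeof t !== 'string',
  );

  // "Add all registry tools" only when nothing at all was specified.
  if (asStrings.length === 0 && onlyInlineDecls.length === 0) {
    return [...registry];
  }

  // Otherwise: named registry tools plus inline declarations. With
  // inline-only input, `named` is empty and the registry is bypassed.
  const named = registry.filter((d) => asStrings.includes(d.name));
  return [...named, ...onlyInlineDecls];
}
```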
* refactor(core): remove dead temp and skipEnvHistory fields from AgentPathParams
These fields were carried over from earlier designs but have no remaining
effect after the fork subagent refactor:
- `temp` was never forwarded into ModelConfig, which this PR already
stripped of the temperature field.
- `skipEnvHistory` is redundant with the auto-skip in `AgentCore.createChat`,
which already bypasses env bootstrap whenever `initialMessages` is
non-empty — the condition under which any caller would set this flag.
Also drops the corresponding `skipEnvHistory: true` at the one caller in
the memory extraction planner.
* fix(core): add shell argument quoting guidance to prevent special char errors
When models pass arguments containing special characters (parentheses,
backticks, single quotes, dollar signs, etc.) to shell commands like
`gh pr create --body '...'`, bash can misinterpret them as shell syntax,
causing the command to fail with cryptic errors.
Add an explicit quoting guide to `getShellToolDescription()` covering:
- Single quotes: pass everything literally but cannot contain `'`
- ANSI-C quoting (`$'...'`): supports escape sequences including `\'`
- Heredoc: the most robust approach for multi-line or mixed-quote text,
with a concrete `gh pr create` example
Fixes #3300
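For illustration only, the single-quote rule from the guide can be
expressed as a helper. `shellSingleQuote` is a hypothetical name and is
not part of this change, which only adds guidance text to the tool
description.

```typescript
// Wrap an argument in single quotes for a POSIX shell. Single quotes
// pass everything literally but cannot contain `'`, so each embedded
// quote closes the string, adds an escaped quote, and reopens it.
function shellSingleQuote(arg: string): string {
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}
```

Parentheses, backticks, and dollar signs inside the quoted result are
passed to the command as literal characters.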
* test: update shell tool description snapshots
* docs: update authentication methods to reflect OAuth discontinuation
Remove deprecated Qwen OAuth references and update documentation to
direct users to valid authentication methods (API Key, Coding Plan,
or Local Inference) following the OAuth free tier discontinuation on
2026-04-15.
Closes #3316
* docs: fix quickstart auth description to match actual /auth UI
The /auth command shows three options: Alibaba Cloud Coding Plan,
API Key, and Qwen OAuth (discontinued). Updated quickstart.md to
accurately reflect this UI instead of splitting into Option A/B/C.
Also updated settings.md, commands.md, and troubleshooting.md with
minor OAuth-related cleanups.
* docs: update .qwen workspace description in quickstart
Remove reference to 'Qwen account' since OAuth is discontinued.
The .qwen directory is created by Qwen Code itself for storing
credentials, configuration, and session data.
* docs: fix warning block formatting in quickstart
- Add missing '>' continuation for the OAuth discontinuation warning block
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* docs: update README Qwen3.6-Plus description
- Remove mention of running Qwen3.6-Plus locally via Ollama/vLLM
- Keep only the Alibaba Cloud ModelStudio API key option
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* docs: address review feedback - remove Local Inference from auth, add dual-region links
- Local Inference removed from auth method lists, kept as separate
'Local Model Setup' section with detailed Ollama/vLLM config examples
- All links now provide dual-region URLs (Beijing + intl)
- .qwen workspace note restored to original meaning (cost tracking)
- Device auth flow error kept scoped to legacy OAuth
- API setup guide links updated with confirmed intl URL
---------
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
When switching models mid-session, reasoning_content fields from
thinking-capable models leaked into API requests sent to the new
provider, causing 422 errors on strict OpenAI-compatible endpoints.
Call stripThoughtsFromHistory() in handleModelChange() so thought parts
are removed before the next request is built for the new model.
* fix(core): limit skill watcher depth to prevent FD exhaustion (#3289)
The chokidar file watcher in SkillManager.updateWatchersFromCache() had
no depth limit or ignored paths. When skill directories contained heavy
subtrees like node_modules, chokidar recursively watched every file,
exhausting file descriptors and breaking child-process I/O (node-pty
onData/onExit callbacks silently stop firing).
Fix: set depth to 2 (skills use a fixed <skill-name>/SKILL.md layout)
and add an ignored function that filters out special file types (sockets,
FIFOs, devices) and .git directories.
Made-with: Cursor
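A sketch of the watcher settings described in the commit above.
`StatsLike` and the exact predicate body are assumptions; `depth` and
`ignored` are chokidar option names, and the options object here is
built without importing chokidar so the snippet stays self-contained.

```typescript
import * as path from 'node:path';

// Reduced stand-in for fs.Stats, limited to what the predicate needs.
interface StatsLike {
  isFile(): boolean;
  isDirectory(): boolean;
}

// Skills use a fixed <skill-name>/SKILL.md layout, so a shallow depth
// is enough.
const WATCH_DEPTH = 2;

function watcherIgnored(filePath: string, stats?: StatsLike): boolean {
  // Skip anything inside a .git directory.
  if (filePath.split(path.sep).includes('.git')) return true;
  // Skip special file types (sockets, FIFOs, devices): anything that
  // is neither a regular file nor a directory.
  if (stats && !stats.isFile() && !stats.isDirectory()) return true;
  return false;
}

// Options in the shape chokidar's watch() accepts.
const watchOptions = { depth: WATCH_DEPTH, ignored: watcherIgnored };
```

Splitting on `path.sep` is what keeps the predicate correct on
Windows, where paths built with path.join use backslashes.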
* fix(core): use path.join in watcher test for Windows compat
The watcherIgnored test used hardcoded forward-slash paths which don't
split correctly on Windows where path.sep is backslash.
Made-with: Cursor