qwen-code

mirror of https://github.com/QwenLM/qwen-code.git synced 2026-05-19 07:54:38 +00:00

Author	SHA1	Message	Date
wenshao	1bb1326904	fix(cli): address Copilot second-pass review — dialogOpen tick gate, isFocused doc Two further findings from Copilot: 1. Interval kept ticking while bg-tasks dialog was open. The first-pass fix already torn the tick down once all rows expired from the visibility window, but `dialogOpen` was a separate reason the panel returned null and was missed by the gate. Add `dialogOpen` to the useEffect deps and short-circuit when true, so the dialog's tenure is interval-free. 2. Stale `isFocused` doc comment in `ToolMessageProps`. The comment claimed the prop controlled the now-retired `Ctrl+E / Ctrl+F` display shortcuts (those died with the inline `AgentExecutionDisplay` frame). Rewrite the comment to describe the only remaining behavior — the focus-routed approval surface (focus-holder banner vs. queued-sibling marker). Coverage delta: +1 case on `LiveAgentPanel.test.tsx` — `tears the 1s tick down when the bg-tasks dialog opens` advances 60s of fake time with `dialogOpen=true` and asserts no panel state drift.	2026-05-07 19:30:14 +08:00
wenshao	e9b76989c0	fix(cli): address Copilot review on LiveAgentPanel — interval gate, ghost rows, dead doc Three findings from Copilot's PR review: 1. 1s interval kept ticking forever after expiry. The gate was `entries.some(isAgentEntry)`, but `BackgroundTaskRegistry.getAll()` retains terminal entries indefinitely — once the last visible row passed its 8s window the panel returned null but the interval kept firing setNow each second, churning re-renders for nothing on screen. The gate now considers visibility (running / paused OR terminal-within-window) and the interval clears itself once the condition flips false. New entries restart the interval via the `entries` dep. 2. Ghost rows when registry forgets an entry. The live re-pull fell back to the snapshot when `registry.get()` returned undefined. The canonical case is a foreground subagent that unregisters silently after its statusChange fires (`unregisterForeground` deletes without emitting a follow-up transition) — the snapshot still says `running`, so the row would never clear. Trust the registry: when it says the entry is gone, drop the row. The snapshot-only fallback is preserved for the no-Config case (test fixtures). 3. Dead doc reference. The trailing comment in `subagents/index.ts` pointed at `docs/comparison/subagent-display-deep-dive.md`, which doesn't exist in this repo (it lives in the codeagents knowledge base, not qwen-code). Dropped the dead pointer; the in-tree pointers to `LiveAgentPanel` and `BackgroundTasksDialog` already tell readers where to look. Coverage delta: +2 cases on `LiveAgentPanel.test.tsx` — `drops snapshot rows the live registry no longer knows about` (issue #2) and `still shows the snapshot when no Config is mounted` (locks in the test-fixture fallback that issue #2's stricter rule would otherwise have broken).	2026-05-07 19:25:26 +08:00
wenshao	b8acd9c39b	fix(cli): make Down on focused BackgroundTasksPill open the dialog The focus chain Composer → AgentTabBar → BackgroundTasksPill is walked with the Down arrow, but Down dead-ended at the pill — the pill only opened the dialog on Enter. Users who followed the LiveAgentPanel's "(↓ to view all)" overflow callout reached the highlighted pill and got stuck there, defeating the hint. Route Down on the focused pill into openDialog so the chain completes naturally: Composer ↓ → AgentTabBar ↓ → Pill ↓ → Dialog. Enter still works, so existing muscle memory keeps functioning.	2026-05-07 19:21:19 +08:00
wenshao	bfecc094c8	fix(cli): widen LiveAgentPanel, drop [in turn] marker, point overflow at dialog Three usability fixes from review: 1. Use `terminalWidth` instead of `mainAreaWidth`. The latter is capped at 100 cols (intended for markdown / code where soft-wrap matters), which on a 200-col terminal left half the screen empty to the right of an already-truncating row. Live progress lines have nothing to soft-wrap, so the panel wants the full width. 2. Drop the `[in turn]` foreground marker. The flavor distinction matters in BackgroundTasksDialog (cancel semantics differ for foreground vs background entries) but in the glance panel the marker reads as cryptic noise — users asked what it meant. Keep the dialog as the surface that surfaces it. 3. Annotate the overflow callout with `(↓ to view all)`. The panel is intentionally read-only (it has no keyboard focus so it can't steal input from the composer), so when the roster outgrows the row budget we point users at the existing dialog — same keystroke the footer pill uses, kept in sync so users only learn one gesture.	2026-05-07 18:20:20 +08:00
wenshao	860c4ee212	fix(cli): port Claude Code's bullet + arrow visual to LiveAgentPanel rows Adopt the leaked CoordinatorTaskPanel visual conventions: - `○` replaces `⊷` for live (running) slots — matches Claude Code's use of `figures.circle` for the active-agent bullet, gives a uniform list look across the running roster. Terminal states keep distinct check / cross marks (✔ / ✖) so they're easy to scan at a glance. - `▶` separates the description from elapsed / tokens, mirroring Claude's `PLAY_ICON` suffix marker. - Activity is wrapped in `( ... )` so it reads as an annotation on the description rather than a sibling field, and the type prefix switches from ` · ` to `: ` (e.g. `editor: tighten import order`) to match Claude's `name: description` pattern. The two-column flex layout from layout C is preserved — left column flex-shrink:1 with truncate-end, right column (` ▶ Ns · Nk tokens`) flex-shrink:0 so elapsed + tokens are never clipped, regardless of how long the description / activity grows. This is the one intentional divergence from Claude's literal pattern, which puts elapsed at the row tail without pinning and lets it disappear off narrow terminals. Verified at 60 / 100 / 200 cols: at 200 cols the row is flush with no internal gap; at 60 cols the description / activity tail truncates with `…` while elapsed + tokens stay visible on the right.	2026-05-07 18:07:55 +08:00
wenshao	2d3e07ef9b	fix(cli): switch LiveAgentPanel row to layout C (right-pinned, no flex-grow) Iterating on the row layout based on visual review: - Layout A (time first, single Text) put numbers ahead of identity, which broke the natural left-to-right reading order. - Layout B (right-pinned with flex-grow:1 on left) puffed the left column out to fill the row, leaving a visible gap between the description tail and the right-pinned elapsed when the terminal was wider than the content. Layout C keeps the right column flex-shrink:0 so elapsed + tokens are never clipped, but DROPS flex-grow on the left so the two columns sit side-by-side: empty slack falls off the row tail (invisible) instead of opening a gap inside the row. Identity (type) and intent (description / activity) read first, cost reads last — matching the natural visual hierarchy. When the row overflows the panel width the left column truncates with `…` mid-row, while elapsed + tokens stay intact. Verified at 60 / 100 / 200 cols — at 200 cols there is no internal gap and no trailing ellipsis; at 60 cols time + tokens stay visible on the right and the description / activity tail truncates with `…`.	2026-05-07 18:04:14 +08:00
wenshao	474423e814	fix(cli): move elapsed + tokens to the front of LiveAgentPanel rows The two-column layout used `flex-grow:1` on the left description column, which puffed it out to fill the row even when content was short — leaving a visible gap between the description tail and the right-pinned elapsed/tokens whenever the terminal was wider than the content. Worse, the gap made it look like display space was being wasted while the description still got truncated. Move elapsed + tokens to the front of the row (right after the status icon) so: - Time and cost are pinned at a stable left position and are NEVER at risk of being truncated, regardless of description / activity length. - The row reads as one tight left-to-right line — no flex-grow, no internal gap. On wide terminals the unused width sits at the row tail (invisible), where it belongs. - Description + activity become the truncatable tail; `truncate-end` cuts only when the line genuinely overflows the panel width. Also wrap the icon-plus-spaces span in a template literal so the two spaces of breathing room after the glyph survive a prettier pass. Verified at 60 / 100 / 200 cols: at 200 cols the row renders flush with no trailing ellipsis and no internal gap; at 60 cols the time + tokens stay at the front and the description tail truncates with `…`.	2026-05-07 17:59:25 +08:00
wenshao	bf60e53402	feat(cli): replace inline AgentExecutionDisplay with always-on LiveAgentPanel Surface running subagents in a borderless, always-on roster anchored beneath the input footer (mirrors Claude Code's CoordinatorTaskPanel) and retire the verbose inline `AgentExecutionDisplay` frame whose per-tool-call mutations caused scrollback flicker. Detail / cancel / resume keep flowing through the existing BackgroundTasksDialog. LiveAgentPanel: - Two-column row: status icon + (optional) type + description + activity on the left (truncate-end), elapsed + tokens on the right in a flex-shrink:0 column so the cost/time fields are never hidden by long descriptions. - Re-pulls each agent from BackgroundTaskRegistry on every wall-clock tick so `recentActivities` stays fresh — the snapshot from `useBackgroundTaskView` only refreshes on `statusChange` to keep the footer pill / AppContainer quiet under heavy tool traffic. - Reaches for Config via raw ConfigContext (not useConfig) so the panel degrades to snapshot-only when no provider is mounted (test isolation). - Hides when any dialog is visible (auth / permission / bg tasks) and self-hides when no agent entries are live. - Drops `subagentType` from the row when it is the default `general-purpose` builtin to keep the line uncluttered; specialized types still bold-anchor the row. - Keeps terminal entries on screen for 8s so the user gets feedback when an agent finishes, then they fall off (BackgroundTasksDialog retains them long-term). Inline frame retirement: - ToolMessage's SubagentExecutionRenderer collapses to the focus- routed approval surfaces only (focus-holder banner + queued marker). All other agent state is owned by the panel + dialog. - AgentExecutionDisplay.tsx + test removed (-918 lines); the subagents/index export is dropped with a pointer to the new surfaces. Net diff: +97 / -1069.	2026-05-07 17:46:24 +08:00
jinye	b1ec8d64c7	fix(cli): warn on ignored provider generation config (#3883 ) * fix(cli): warn on ignored provider generation config Warn when a selected provider model has top-level model.generationConfig fields that will not apply because the provider entry is sealed. Document the required local-model placement for contextWindowSize and related generation config fields. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): address provider config warning review Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): narrow provider config warning fields Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> --------- Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-07 14:46:23 +08:00
易良	f468cb61a0	ci: add Qwen Code issue follow-up bot workflow (#3854 ) * ci: add issue follow-up bot workflow * ci: gate issue follow-up bot rollout * ci: reduce issue follow-up batch size * ci: address issue follow-up bot review * ci: add temporary issue bot canary * ci: fix canary verification * ci: dedupe bot token issue comments * ci: remove temporary issue bot canary * ci: avoid repeated issue bot followups * ci: simplify issue follow-up bot prompt * ci: refine issue follow-up bot flow * ci: harden issue follow-up bot workflow * ci: harden issue follow-up bot rollout * ci: enforce follow-up bot dry-run writes * ci: redact blocked bot command args * ci: lock follow-up bot gh wrapper to current repo - Require explicit `--repo <expected>` on every gh command path; reject any --repo value that does not match REPOSITORY/GITHUB_REPOSITORY so a poisoned issue body cannot redirect bot writes to another repo. - Add OPENAI_BASE_URL to the secret-scrubbing list so an internal proxy URL is not echoed into comments or labels. - Print the resolved DISPATCH_DRY_RUN / ISSUE_OPENED_DRY_RUN / SCHEDULE_DRY_RUN inputs alongside the resolved dry_run state for easier debugging of automatic vs manual paths. * ci: tighten follow-up bot wrapper and trim noise - Fold the repo-match check into validate_issue_edit_args / validate_issue_comment_args; replace the standalone require_explicit_repo with a smaller require_repo_match used only by the read-only paths. - Add an upfront guard that fails fast when expected_repo is unset, and document the positional subcommand match. - Drop the configuration-notice job: it warned on every issues.opened and cron tick when QWEN_ISSUE_FOLLOWUP_BOT_ENABLED was unset, which is the default state. - Remove the redundant BOT_GITHUB_TOKEN re-export at step level (already inherited from the workflow env). - Invert the dry_run resolution so every branch starts from `true` and flips to `false` on explicit opt-in, removing the duplicate assignments. - Collapse the multi-line dry-run debug block into a single state line. - Note in the prompt that global flags and short aliases (`-b`, `-F`) are rejected by the runner so the model only emits long-form gh commands. * ci: fix shim reject logs to include full subcommand context Agent-Logs-Url: https://github.com/QwenLM/qwen-code/sessions/1cf8097d-b747-4838-a206-63a11352facc Co-authored-by: yiliang114 <11473889+yiliang114@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: yiliang114 <11473889+yiliang114@users.noreply.github.com>	2026-05-07 13:52:52 +08:00
Shaojin Wen	018184b571	fix(core): stop per-subagent ToolRegistry on the foreground-fork path (#3887 ) Follow-up to #3873 review: the foreground-fork branch in `agent.ts` fires the fork body via `void runInForkContext(runFramedFork)` and returns the placeholder result synchronously, with no try/finally around the fork body. The other three spawn paths (foreground non-fork, background fork, background non-fork) added in #3873 already stop the per-subagent ToolRegistry in their finally blocks — this one was missed, so any AgentTool / SkillTool the fork's model later instantiated leaked its change-listener on the shared SubagentManager / SkillManager for the rest of the session. Wraps the inner body in `try { await runSubagentWithHooks(...) } finally { void agentConfig.getToolRegistry().stop().catch(() => {}) }` — same shape as the background bgBody finally added in #3873. Adds a regression test in `agent.test.ts` Fork dispatch describe that drains the detached fork body and asserts the stop spy was invoked exactly once.	2026-05-07 13:43:00 +08:00
Shaojin Wen	ddd21b5330	fix(core): address @tanzhenxin's PR-1 review notes (post-merge follow-up to #3842 ) (#3886 ) * fix(core): wrap hasOwnProperty.call inside try + add post-abort PTY data assertion Two non-blocking review notes from @tanzhenxin's PR-1 approval (#3842, post-merge follow-up): - Note 2 (real bug): `getShellAbortReasonKind` had `Object.prototype.hasOwnProperty.call(reason, 'kind')` outside the try/catch. `hasOwnProperty.call` triggers the `[[GetOwnProperty]]` Proxy trap (`getOwnPropertyDescriptor` handler). A Proxy whose `getOwnPropertyDescriptor` throws — separate from a throwing `get` trap, which the prior commit already covered — would propagate past the helper, leaving the abort handler's switch on `kind` to throw through `addEventListener` (which doesn't await async listener return values), so the shell process would stay alive instead of being killed on cancel. Moved the descriptor probe inside the same try block as the value read. My own multi-round audit covered six attack vectors but missed this one — only `get` trap throws were considered. The reviewer caught it on first pass. - Note 3 (test parity): the PTY post-promotion handoff test asserted `dispose` was called but never re-invoked the data callback to verify the foreground `onOutputEvent` actually stops firing — the child_process equivalent has that assertion. Mirrored it: emit data AFTER abort by re-invoking the captured `dataCallback` reference, and assert `onOutputEventMock.mock.calls.length` does NOT increase past the moment of promote. Exercises the production `listenersDetached` guard inside the chain callback, which the bare dispose-was-called check didn't. Note 1 from the same review (the `aborted: true + promoted: true` shape forcing PR-2 callers to check `promoted` before `aborted`) is deliberately NOT addressed here — it's a contract simplification that affects PR-2's branching, so it belongs in PR-2 along with the caller-side decision on whether to flip `aborted` for promoted results. Added a TODO upstream in #3831 (PR-2 design) to track. 70 / 70 tests pass (69 baseline + 1 new helper boundary for the throwing-getOwnPropertyDescriptor case). tsc + ESLint clean. * test(core): fix tautological PTY post-promote assertion (audit follow-up) The PTY post-promotion handoff test added in the previous commit copied the child_process equivalent's pattern verbatim — sync \`expect(count).toBe(countAtPromote)\` immediately after dataCallback returns. That works for child_process because its \`handleOutput\` is fully synchronous (sniff → decoder → emit, all on the same call stack), so the count change happens BEFORE the assertion. PTY's \`handleOutput\` is async — \`processingChain.then(...)\` queues a microtask that does the sniff + write + render-then-emit work. The sync assertion captures both \`countAtPromote\` and the post-emit count BEFORE the chain microtask ever runs, so both reads return whatever happened before the assert (typically 0). The test would tautologically pass even if the production \`listenersDetached\` guard were removed — i.e., it didn't actually verify the guard. Restructured to: 1. Drive the PTY through \`simulateExecution\` so \`await handle.result\` forces all queued microtasks (including pre-promote chain items AND the abort handler's drain) to settle. 2. Capture \`eventCountAfterSettle\` once everything has stabilized. 3. Re-invoke the captured \`dataCallback\` with post-promote data, await two more macrotask boundaries to let the new chain item fully run. 4. Assert the count hasn't moved. If the production \`listenersDetached\` guard is removed, the post-promote chain item emits, count increases past \`eventCountAfterSettle\`, and this assertion fails. So the test actually exercises the guard now. Found in self-audit while reviewing my own follow-up commit. Caught because audit was paranoid about whether the test verifies what it claims to verify, not just whether it passes. 70 / 70 tests pass; tsc + ESLint clean. * test(core): pin eventCountAfterSettle === 0 in PTY post-promote test The previous fix asserted post-promote count equals the \`eventCountAfterSettle\` baseline, but didn't pin the baseline itself. With the production \`listenersDetached\` guard intact, both halves (pre-promote chain and post-promote chain) suppress emit, so \`eventCountAfterSettle === 0 === post-count\` and the relative comparison is vacuously true. If a future refactor changed the production guard semantics so the pre-promote chain item DID emit (count becomes 1+ after settle), the relative-comparison test would still pass as long as post-promote also emitted the same number — that's a regression the test should catch. Adding \`expect(eventCountAfterSettle).toBe(0)\` makes the contract explicit: once \`listenersDetached\` is set during the abort handler's sync part, BOTH the in-flight chain item (pre-promote) and the future chain item (post-promote) skip emit. Found in another self-audit pass — even after fixing the tautological assertion, the test could still mask certain future regressions. 70 / 70 tests pass; tsc + ESLint clean. * test(core): drop dataCallbackHolder pattern, read mock.calls directly The dataCallbackHolder pattern (capturing the onData callback inside simulateExecution then invoking it after via a closure-shared object) was unnecessary indirection — \`mockPtyProcess.onData.mock.calls[0][0]\` reads the same callback reference whether you read it inside or outside the simulation closure. Vi's mock.calls array is per-mock-instance and beforeEach re-creates mockPtyProcess + .onData freshly, so there's no stale-reference risk in the simpler form. No behavior change in what's tested. 70 / 70 pass. * chore(core): drop in-source @-mention attribution + dead Proxy.get handler Two cosmetic cleanups found in another self-audit pass: - Helper comment: removed the parenthetical attribution ("Caught by @tanzhenxin in the PR-1 review; my own audit only covered \`get\` trap throws."). The technical content of the comment — explaining why both the descriptor probe and the value read live inside the try — stands on its own. The reviewer credit lives in commit history / PR description, where it belongs; an in-source @-mention ages poorly (handles change, the relevant person may move on) and doesn't help future readers reason about the code. - Test Proxy: \`throwingDescriptorProxy\` declared a \`get()\` handler that always returned \`undefined\`. The descriptor probe throws before the helper ever reaches the value read, so the \`get\` handler is unreachable — dropped it and added a one-line comment explaining why no \`get\` handler is needed for this test. Mirror test (\`throwingReason\` with throwing accessor + \`proxyReason\` with throwing \`get\`) keeps the symmetric "throwing-`get`" coverage. 70 / 70 tests pass; tsc + ESLint clean.	2026-05-07 13:42:42 +08:00
Shaojin Wen	93139d0eb0	docs(cli): document new banner customization settings (#3885 ) Add ui.customBannerTitle, ui.customBannerSubtitle, and ui.customAsciiArt to the user-facing settings table. Also reword ui.hideBanner to note that it covers both the logo column and the info panel and that Tips render independently. These settings landed in #3710 but only ui.hideBanner was listed in the table, so users had no way to discover the other three short of reading the schema or the design doc.	2026-05-07 13:42:22 +08:00
易良	379da6327c	fix(core): improve stream rate-limit retry handling (#3790 ) * fix(core): improve stream rate-limit retry diagnostics * fix(core): honor retry-after for stream rate-limit retries * fix(core): support response retry-after headers * fix(core): guard rate-limit diagnostics payloads * fix(core): tolerate null retry-after headers * fix(core): harden rate-limit retry diagnostics	2026-05-07 11:58:55 +08:00
kkhomej33-netizen	49f8828849	Create project temp dir before saving truncated output (#3875 )	2026-05-07 11:58:46 +08:00
Shaojin Wen	7124c6cba0	fix(core): rebuild tool registry on subagent Config overrides so bound tools resolve to the subagent (#3873 ) * fix(core): rebuild tool registry on subagent Config overrides so bound tools resolve to the subagent PR-B (#3774) added per-Config FileReadCache isolation via Object.create overrides at two subagent spawn sites — agent.ts:createApprovalModeOverride and subagent-manager.ts:maybeOverrideContentGenerator. The override shielded code that read FileReadCache directly through the Config instance, but missed the bound-tool path: Config.createToolRegistry runs once at parent initialise time, so the parent's EditTool / WriteFileTool / ReadFileTool instances are bound with `this.config = parent`. The subagent's Object.create wrapper inherited getToolRegistry via the prototype chain, reaching the parent registry whose bound tools then read FileReadCache and approval mode from the parent. This change closes that gap by rebuilding the tool registry on the override at both sites — the same pattern InProcessBackend.createPerAgentConfig already uses: - override.createToolRegistry(undefined, { skipDiscovery: true }) - registry.copyDiscoveredToolsFrom(base.getToolRegistry()) - override.getToolRegistry = () => registry createApprovalModeOverride becomes async; its single call site already ran inside an async block. maybeOverrideContentGenerator skips the rebuild when the upstream Config already has its own getToolRegistry (real-world case: agent.ts wrapper passed through createAgentHeadless), avoiding wasted work, listener accumulation on shared SubagentManager / SkillManager, and a cache split where the bound tools' registry layer diverges from the runtime context's lazy-init cache. Includes regression tests in agent-override.test.ts and subagent-manager-override.test.ts that exercise the bound-tool path: they instantiate the lazy factories on the override registry and assert that EditTool / WriteFileTool / ReadFileTool resolve this.config to the override Config (and thus to the override's FileReadCache / approval mode), not the parent. * fix(core): close bound-tool gap on resumed background agents too Follow-up audit on PR #3873 surfaced a duplicate, pre-rebuild copy of `createApprovalModeOverride` living in `background-agent-resume.ts` (L142-150). Resumed fork agents go through `createResumedForkSubagent` which bypasses `SubagentManager.maybeOverrideContentGenerator` (where the registry rebuild now lives), so the resumed fork's `EditTool` / `WriteFileTool` / `ReadFileTool` were still resolving `this.config` to the parent and reading the parent's `FileReadCache`. The non-fork resume path went through `subagent-manager` and worked correctly only because `maybeOverrideContentGenerator` saw no upstream own-registry on `bgConfig` and rebuilt one — but with that fallback the fork path could never benefit. This change deletes the local copy and switches `background-agent-resume.ts` to import the now-async exported `createApprovalModeOverride` from `agent.ts`. Drops the previous `?: this.config` short-circuit so the resumed agent ALWAYS gets a wrapper Config — the same behaviour `agent.ts` already enforces; reusing the parent directly defeats the per-Config FileReadCache isolation. Updates `background-agent-resume.test.ts` mock config with the `createToolRegistry` / `getToolRegistry` stubs the rebuild path now exercises. * fix(core): address bound-tool isolation review feedback Three independent fixes from PR #3873 review feedback: 1. Switch the upstream-rebuild guard from `hasOwnProperty(base, 'getToolRegistry')` to a Symbol-keyed marker `TOOL_REGISTRY_REBUILT`. The own-property check missed the case where the override is reached via an Object.create wrapper above the rebuilt Config (e.g. `bgConfig = Object.create(agentConfig)` in the agent.ts background path) — it would falsely report "no upstream rebuild" and cause a redundant third rebuild that wastes work and doubles the listener-leak surface. Symbol property reads walk the prototype chain via normal lookup, so a marker stored on any ancestor is correctly observed. Extracts the shared rebuild logic into `rebuildToolRegistryOnOverride(override, base)` so the three spawn sites (agent.ts:createApprovalModeOverride, the inherits branch, the non-inherit branch) cannot drift apart. 2. Stop the per-subagent ToolRegistry in the lifecycle finally blocks: - agent.ts foreground finally (after the inner try wrapping `runFramed`) - agent.ts background bgBody finally (after `bgSubagent.execute` resolves) - background-agent-resume.ts resume body finally (same shape) Without this, every AgentTool / SkillTool the model instantiates from the per-subagent registry registers a change-listener on shared SubagentManager / SkillManager, and repeated subagent runs accumulate listeners for the rest of the session. Stop is fire-and-forget, matching `InProcessBackend.cleanup` and `stopAgent`. 3. Add bound-tool isolation tests for the non-inherit branch (explicit-model selector). The original PR only covered the inherits branch directly; the non-inherit branch now goes through the same helper, but a dedicated test pins `tool.config === override` and the FileReadCache binding so a regression cannot leave explicit-model subagents reading the parent's cache while existing model-override tests still pass. Tests now exercise: - Symbol marker propagation via Object.create chain (3 cases) - Non-inherit rebuild + bound-tool isolation - Non-inherit skip-rebuild when upstream wrapper has the marker - Pre-existing inherits / chained-override / approval-mode propagation - Mock configs in agent.test.ts / subagent-manager.test.ts / background-agent-resume.test.ts gain `stop` and `tools: Map` stubs to model the registry contract the override path now exercises. `npx vitest run packages/core/src` — 268 files / 6943 passed.	2026-05-07 10:31:32 +08:00
Shaojin Wen	b9f56e4344	feat(core): add signal.reason convention for ShellExecutionService (#3831 PR-1 of 3) (#3842 ) * feat(core): add signal.reason convention for ShellExecutionService.execute() Foundation for #3831 Phase D (b) — Ctrl+B promote of a running foreground shell to background. Defines a discriminated `ShellAbortReason` union that the AbortSignal carries; default behavior (no reason / `{ kind: 'cancel' }`) keeps the existing tree-kill on abort. `{ kind: 'background' }` is a takeover signal — execute() skips the kill, drops the child from its active set (so cleanup() won't kill it later), flushes a snapshot of captured output, and resolves the result Promise immediately with `promoted: true` so the awaiting caller unblocks. Pure plumbing: no caller sets the reason yet, so this is a zero-behavior change for existing call sites. The `promoted?: boolean` field is optional on ShellExecutionResult so existing consumers compile against the new shape without source changes. Tests pin both branches in both childProcessFallback and executeWithPty: default abort still SIGTERM-tree-kills; `{ kind: 'cancel' }` is identical to default (pin against accidental routing through the background branch); `{ kind: 'background' }` skips the kill, snapshot output is preserved, mockProcessKill / mockPtyProcess.kill are NOT called. Part of #3831 (Phase D part b — Ctrl+B promote running shell to background). PR-1 of 3. * fix(core): detach service listeners on background-promote (resolve review) Addresses 4 Critical + 2 Suggestion findings on PR-1 of #3831: - childProcess listener detach (review line 555 + 573): Anonymous arrow listeners on stdout/stderr/error/exit could not be off()'d. After background-promote, post-promote bytes would re-enter handleOutput, which then calls decoder.decode() on a now-finalized text decoder (cleanup() already called .decode() without stream:true) → TypeError crash. Even without the crash, old onOutputEvent would fire for new data → ownership contract violation + duplication. Fix: extract named handler refs (stdoutHandler / stderrHandler / errorHandler / exitHandler) and call off() on all four in the background-promote branch via a detachServiceListeners() helper. - PTY listener detach (review line 967 + 990): node-pty's onData / onExit return IDisposable handles; the abort handler now captures dataDisposable / exitDisposable and calls .dispose() in the background-promote branch. ptyProcess.on('error') is EventEmitter-style (not IDisposable) — extract a named ptyErrorHandler ref and off() it. Without these, post-promote PTY error throws → Node.js crash; post-promote data continues writing to headlessTerminal and calling old onOutputEvent → ownership violation. - PTY in-flight chain item ownership (related to review line 990): processingChain may have already-enqueued callbacks past the early listenersDetached check. Refactored from "early-return short-circuit" to "guard each onOutputEvent emit individually" so in-flight writes still LAND in headlessTerminal (snapshot reflects them) but no events leak to the foreground onOutputEvent. Also clear renderTimeout in the abort handler so a pending throttled render doesn't fire post-promote. - PTY snapshot freshness (review line 972, suggestion): The original abort handler called serializeTerminalToText immediately. Now we await Promise.race([processingChain drain, SIGKILL_TIMEOUT_MS]) first (mirrors the onExit finalize pattern at ~line 970) so in-flight headlessTerminal.write callbacks land before serialization. Skipped render(true) intentionally because it would emit final onOutputEvent data (renderFn calls onOutputEvent), violating the "no emit post-promote" invariant — added a comment explaining why direct serialize is correct. - Handoff-boundary tests (review line 1257, suggestion): Added 4 new tests pinning the ownership contract — 2 for child_process (post-promote stdout/stderr does NOT route to onOutputEvent; child exit does NOT re-resolve result), 2 for PTY (data/exit disposables ARE called; result shape stays promoted: true even if post-promote events fire). Also: test setup now stubs mockPtyProcess.onData / .onExit to return { dispose: vi.fn() } so the background-promote path's dispose() calls don't crash on undefined (the stub's mock.results[0].value is then inspected by the new handoff tests). 58 / 58 tests pass (50 baseline + 4 first-pass + 4 handoff). Total +235 / -35 on top of the prior commit. * fix(core): defensive hardening for ShellExecutionService background-promote (resolve 2nd review pass) Addresses 6 follow-up [Suggestion] threads on PR-1 of #3831 — all substantive code-quality issues raised by the second-pass review of the dispose-based detach commit (`8e8e18ca7`): - Exhaustive switch on `ShellAbortReason.kind` (both abort handlers). Earlier `if (reason?.kind === 'background')` form silently fell through to kill for any unrecognized variant — a future `{ kind: 'suspend' }` would have killed the process with zero compile-time signal. Switched to `switch (kind)` with a `never`-typed default that runs `debugLogger.warn` and falls back to the safest behavior (cancel/kill). Each branch is now extracted into a named helper (`performBackgroundPromote` / `performCancelKill`) so the switch body stays a single screenful. - Each `dispose()` wrapped in its own try/catch (PTY). node-pty's `IDisposable` contract doesn't guarantee no-throw. Without per-dispose try/catch a single throwing dispose() would skip subsequent cleanup (the other dispose, off('error'), activePtys.delete, drain, resolve) and the caller would hang forever on `await result`. Each call now logs via debugLogger.warn on failure but continues. - `.catch(() => undefined)` on the processingChain side of the drain race (PTY). `Promise.race([processingChain.then(drain).then(drain), timeout])` would propagate a chain rejection out of the race; since `addEventListener` doesn't await our handler, the rejection became unhandled and `resolve()` was never called → caller hung. Now the rejection is swallowed; the timeout side still terminates the race on time. - Drain-timeout truncation now emits a diagnostic warning (PTY). Previously the 200ms drain timeout could fire, the snapshot would be taken with the buffer in mid-write state, and the result.output would be silently truncated. Race result is now observed via a symbol sentinel; when the timeout side wins, debugLogger.warn fires pointing the user at rawOutput as the un-truncated fallback. - Snapshot serialize failure logs instead of swallowing silently (PTY). Empty `catch {}` made result.output indistinguishable from "command produced no output" if serializeTerminalToText threw. Now `debugLogger.warn` with the error message leaves a trail for support bundles. - Dedicated `PROMOTE_DRAIN_TIMEOUT_MS` constant separated from `SIGKILL_TIMEOUT_MS`. Both are 200ms today, but they have unrelated reasons-to-change (kill escalation timing vs. promote drain ceiling) — sharing the constant means tuning one would silently change the other. Also adds a module-level `debugLogger = createDebugLogger('SHELL_EXECUTION')` since the service had no logging surface before this commit. 58 / 58 tests pass; tsc clean; ESLint clean. No new tests added: the new behaviors (timeout sentinel firing, dispose throw, exhaustive switch default) are defensive log-only paths; existing handoff tests already cover the happy path. Adding mock-throw tests is reasonable follow-up but not blocking. * fix(core): real bug — ptyProcess.off → removeListener; defensive abort-reason read Resolves the third review pass on PR-1 of #3831 — 1 real bug + 2 defensive hardenings: - Real bug: `ptyProcess.off('error', ...)` throws TypeError at runtime (line ~1074). `@lydell/node-pty`'s `IPty` interface exposes the legacy Node EventEmitter `removeListener`, not the modern `off` alias. Previous form threw, the surrounding try/catch swallowed it (post-prior-pass dispose hardening), but the old `ptyErrorHandler` stayed registered — so a post-promote PTY error would still hit our foreground handler and `throw err`, breaking the handoff contract that PR-1's whole listener-detach work is supposed to enforce. Switched to `removeListener`. The catch + warn stays as defense-in-depth; the message wording is updated. - Prototype-pollution-safe `kind` read (extracted to module-level helper `getShellAbortReasonKind`). The previous `reason?.kind` walked the prototype chain — a polluted `Object.prototype.kind = 'background'` would silently route `abortController.abort({})` (any plain object reason) into the promote branch and skip the kill. Lifecycle/safety branch deserves the extra check. Helper now: rejects non-object reasons; reads `kind` only as an OWN property (`hasOwnProperty`); whitelists against `'background' \| 'cancel'`; defaults to `'cancel'` (the safe historical behavior) for everything else. Both abort handlers (childProcess + PTY) now share this helper. - `streamStdout: true` + background-promote = silent empty snapshot (childProcess `performBackgroundPromote`). The promote snapshot reads from the `stdout` / `stderr` string accumulators; but in `streamStdout` mode `handleOutput` forwards bytes through `onOutputEvent` and skips the accumulators entirely. Today PR-1's only call site (foreground shell.ts) uses `streamStdout: false`, so the combination is unreachable — but if a future caller pairs the two, `result.output` would be empty with no diagnostic. Added a `debugLogger.warn` when the combination occurs, pointing the caller at `rawOutput` as the fallback. Cheaper than building a parallel accumulator just for this latent case. 58 / 58 tests pass; tsc clean; ESLint clean. * fix(core): liveness check + throw-safe abort-reason read + encoding-aware PTY snapshot (resolve 4th review pass) Resolves 6 threads on PR-1 of #3831 — 1 Critical + 1 real bug + 2 quality + 2 test-coverage: - [Critical] `getShellAbortReasonKind` throw-safe property read. Previous form read `reason.kind` after only checking that `kind` is an own property. An own accessor that throws (or a Proxy with a trapping getter) would throw before the helper reached either the cancel kill path or the background promote path. Abort handlers are dispatched async and not awaited by AbortSignal, so a leaked throw here would have left the shell process alive instead of being killed on cancel — quietly. Wrapped the property read in try/catch with a fall-back to the safe 'cancel' kill behavior. - Real bug: child_process post-exit race in background-promote (`performBackgroundPromote`). The child may have already exited but the 'exit' event hasn't reached our handler yet (Node delivers events on the next microtask). Promoting in that window would detach our exit listener and report `promoted: true` for a process that's already dead — the caller would hold an inert pid expecting to take over. Now we read `child.exitCode` / `child.signalCode` before detaching: if either is non-null, fall through and let the pending exit handler resolve normally with the real exit info. Mirrored mock setup so `exitCode` / `signalCode` default to `null` (matching real ChildProcess) instead of `undefined`. - PTY snapshot: re-decode + replay (mirror exit-path encoding). The promoted snapshot was serializing `headlessTerminal` directly, which was fed by a streaming decoder initialized from the first-chunk encoding heuristic. When early output is ASCII-only but later output is in a different encoding (GBK / Shift-JIS / etc.), this produces mojibake — and the normal exit path doesn't, because it re-decodes `finalBuffer` with `getCachedEncodingForBuffer` and replays through a fresh terminal. Now mirrors that logic so `result.output` shape matches across the two paths. Direct-serialize remains as a last-ditch fallback if replay throws. - Switch `default` no longer emits a runtime warn. Reviewer noted the helper's whitelist made the `default: { _exhaustive: never }` branch unreachable at runtime — the `debugLogger.warn` in it could never fire. Kept the `_: never = kind` type assertion (so a future ShellAbortReason variant forces a TS error here, directing the developer to extend BOTH the helper's whitelist AND add a `case`), removed the unreachable warn. Added a comment that the assertion is the static-only safety net the union expansion would trigger. - Direct unit tests for `getShellAbortReasonKind` (8 cases). The helper's prototype-pollution defense is the main reason it exists; if `hasOwnProperty` is accidentally removed the regression would silently send `abortController.abort({})` (any plain reason) into the promote path. Exported the helper and added direct tests for: null / undefined, non-object, empty object (no own kind), prototype- only kind (pollution), unknown kind value, throwing accessor, Proxy trap, and the two happy paths. - `removeListener` regression guard. The fix to call `ptyProcess.removeListener('error', ...)` instead of `.off(...)` matters because `@lydell/node-pty`'s IPty interface only exposes `removeListener` — `.off()` throws TypeError on a real PTY but the EventEmitter mock tolerates both. Added a test that spies on both methods and asserts the production code uses `removeListener` for the 'error' event, so a future swap back to `.off()` regresses loudly under the mock instead of silently. 68 / 68 tests pass (58 baseline + 9 helper boundary + 1 removeListener guard + 1 post-exit race); tsc clean; ESLint clean. * fix(core): PTY background-promote post-exit race guard (resolve 5th review pass) Mirrors the child_process post-exit race fix from `4cc558b3d` into the PTY path — addresses 1 [Critical] thread on PR-1 of #3831: The PTY may have already exited but our `exitDisposable` (onExit callback) hasn't run yet — node-pty delivers the exit event asynchronously after the PTY's native SIGCHLD, so there's a window between "PTY actually dead" and "service onExit fires". Promoting in that window detaches our exit listener and reports `promoted: true` for a dead PTY, losing the real exit status; the caller would hold an inert pid expecting to take over. The IPty interface doesn't expose an `exitCode` field we can read directly (unlike `child.exitCode` / `child.signalCode` for child_process), so use `process.kill(pid, 0)` as a best-effort liveness check via the existing `ShellExecutionService.isPtyActive` helper. If kill(pid, 0) throws ESRCH, the pid is gone — log at debug level and fall through, letting the pending onExit callback resolve normally with the real exit info. Also adds a unit test mirroring the child_process race test: mocks `process.kill(pid, 0)` to throw ESRCH on the liveness probe, asserts the result has no `promoted: true` and reports the real exitCode. 69 / 69 tests pass; tsc clean; ESLint clean. * docs(core): correct getShellAbortReasonKind boundary-test count in JSDoc Doc said 'all six edge cases' but the test suite has 8 cases (added Proxy-trap and undefined later). Off-by-2 cosmetic only — no behavior change. Caught during a multi-round self-audit of PR-1 of #3831. Audit summary: 7 rounds (correctness / reverse / consistency / coverage / build / exception paths / style) found one false-positive (a sync- abort registration-order race I initially thought existed). Verified that Node's WHATWG AbortSignal does NOT auto-fire 'abort' listeners on already-aborted signals, so the race window cannot open. No code change needed for that scenario; this commit is just the JSDoc fix. 69 / 69 tests still pass; tsc + ESLint clean. * docs(core): document the helper / union / switch sync invariant explicitly Multi-round self-audit found that `getShellAbortReasonKind`'s value whitelist has no compile-time tie to the `ShellAbortReason` union: when the union grows, TypeScript's `_exhaustive: never` in each switch forces #3 (the case arm) to be added, but the helper's whitelist (#2) silently keeps degrading the new variant to 'cancel', and the new case arm is never reached at runtime. Reviewer #4 raised this on the second pass; the original commit chose to accept it (option B in that thread) but didn't leave a strong in-code signal for future contributors. Added an INVARIANT block inside the helper enumerating the three sites that must be kept in sync, so the next person extending `ShellAbortReason` sees the coupling at the place where they're most likely to forget it. No behavior change — comment-only. 69 / 69 tests still pass; tsc + ESLint clean. Audit summary (this round + prior round): 18 angles total over two sweeps and one reverse-attack pass. Found: - 0 real bugs - 1 false-positive race (sync-abort registration order — Node WHATWG AbortSignal does NOT auto-fire on already-aborted signals; investigated, reverted) - 1 cosmetic doc fix (boundary-test count off-by-2) - 1 cosmetic INVARIANT block (this commit) Areas reviewed without finding new issues: caller-side ShellExecutionResult shape compatibility (optional `promoted?` field, existing callers spread-untouched); `exited` flag lifecycle (monotonic, cleanup() idempotent); processingChain in-flight ownership (listenersDetached guards every onOutputEvent emit including the renderFn-rendered case via the same flag); race between exit event and abort handler (both microtasks, FIFO ordering gives correct outcome either way); Node version dependence (`AbortSignal.reason` is Node 17.2+, engines: >=20 covers it); test isolation (mockImplementationOnce + module-level mockProcessKill clears each beforeEach); `process.kill(pid, 0)` Windows liveness reliability (best-effort, acceptable for PR-1 plumbing); PID reuse race on the PTY liveness check (theoretically possible, microsecond window, unavoidable at the OS level — rejected in spec discussion); PR-2/PR-3 contract surface (caller MUST attach listeners before abort — documented; any future caller violating this is its own bug). * test(core): align mockChildProcess.exitCode/signalCode in second beforeEach The 'execution method selection' describe block has its own beforeEach (separate from 'child_process fallback') that builds mockChildProcess but does not set `exitCode` / `signalCode = null`. Real Node `ChildProcess.exitCode` / `signalCode` are `null` while the process is alive — and production now reads these in the background-promote race guard. The current tests in this block don't exercise the promote path, so they pass regardless, but any future promote-related test landing here would silently trip the guard (`undefined !== null` is true) and fall through to the normal-exit branch instead of promoting. Mirror the `child_process fallback` block's mock setup so the two beforeEach hooks produce equivalent ChildProcess shapes, eliminating a quiet foot-gun for future contributors. Comment-only / test-fixture change. 69 / 69 tests still pass; tsc clean. Found during a deeper third-round self-audit of PR-1 of #3831.	2026-05-07 10:31:15 +08:00
ChiGao	4f084352f4	feat(cli): customize banner area (logo, title, hide) (#3710 ) * docs(design): add banner customization design (#3005) Document the design for issue #3005 (customize CLI banner area). Covers the banner region taxonomy and what is replaceable vs. locked, the three proposed settings (`ui.hideBanner`, `ui.customBannerTitle`, `ui.customAsciiArt`) and their resolution pipeline, the schema additions and wiring touch points, five alternative shapes considered, and the security / failure-handling guards. Mirrored EN + zh-CN under `docs/design/customize-banner-area/`. No code changes in this commit; implementation lands in a follow-up PR. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * feat(cli): customize banner area (logo, title, hide) Adds three opt-in `ui.` settings that let users replace brand chrome on startup while keeping the operational lines (version, auth, model, path) locked: `hideBanner`, `customBannerTitle`, `customAsciiArt` (string, {path}, or {small,large}). A new resolver in `packages/cli/src/ui/utils/customBanner.ts` walks the loaded settings, normalizes each tier per scope (so {path} resolves against the file that declared it), reads the file with O_NOFOLLOW and a 64 KB cap on POSIX, sanitizes via a banner-specific stripper that drops OSC/CSI/SS2/SS3 sequences while preserving newlines, and caps art at 200 lines × 200 cols and titles at 80 chars. Every soft failure logs a `[BANNER]` warn and falls through to the bundled QWEN logo or default brand title — banner config can never crash the CLI. `<Header />` now picks the widest custom tier that fits via a shared `pickAsciiArtTier` helper and falls back to `shortAsciiLogo` otherwise; `<AppHeader />` extends the existing `showBanner` gate to honor `hideBanner` alongside the screen-reader fallback. Tracks #3005 and the design merged in #3671. docs(design): apply prettier to banner customization design Reformats the EN and zh-CN design docs in `docs/design/customize-banner-area/` to satisfy `npx prettier --check`: table column alignment and trailing commas in `jsonc` examples. No content changes — the words, tables, and code blocks all say the same thing as before. Carries forward the only actionable feedback from the now-closed docs-only PR #3671, where the prettier check was the sole change requested. * fix(cli): address banner audit findings Three audit-driven fixes for the banner customization feature: 1. VSCode JSON schema accepts every documented shape. The `ui.customAsciiArt` entry in `packages/vscode-ide-companion/schemas/settings.schema.json` was declared as `type: object`, which made VSCode flag the inline-string form (`"customAsciiArt": " ___"`) — a shape the runtime accepts and the design doc recommends — as a schema violation. Replaced with a `oneOf` covering string, `{path}`, and `{small,large}` (with each tier itself string-or-`{path}`). 2. Narrow terminals no longer leak the QWEN logo over a white-label deployment. When a user supplied custom ASCII art but neither tier fit the terminal, `Header.tsx` previously fell back to the bundled `shortAsciiLogo` — silently undoing the white-label intent on small windows. The fallback now distinguishes "user supplied custom art" from "no custom art at all": in the first case the logo column is hidden entirely (info panel still renders); in the second case the default logo shows as before. Soft-failure paths (missing file, sanitization rejection) still fall through to `shortAsciiLogo`. 3. Sanitizer strips C1 control bytes (0x80-0x9F). The art and title strippers previously stopped at 0x7F, leaving single-byte CSI (`0x9B`), DCS (`0x90`), ST (`0x9C`) and other C1 controls intact — which legacy 8-bit terminals would still interpret. Aligned the ranges with the repo's existing `stripUnsafeCharacters` (in `textUtils.ts`) so banner content can't carry interpreted control bytes through. New tests cover: C1 strip in art and title, absolute path reads, symlink rejection on POSIX, narrow-terminal hide-on-custom-art, and end-to-end `<AppHeader />` rendering through `resolveCustomBanner`. The full banner suite is 48 tests (was 42). * docs(design): clarify cross-scope tier merge and white-label fallback Two clarifications surfaced by the audit on the implementation PR: 1. The design said `customAsciiArt` follows standard merge precedence, but the resolver actually walks scopes per-tier so workspace can override only `large` while user keeps `small`. Document that this per-tier walk is intentional — both because each `{path}` has to resolve against the file that declared it (the merged view loses that information) and because it lets users keep a personal default tier and override the other one per-workspace. 2. The render-time tier-selection step now distinguishes "user supplied custom art but neither tier fits" (hide the logo column entirely; falling back to `shortAsciiLogo` would silently undo a white-label deployment on narrow terminals) from "user supplied no custom art at all" (fall through to `shortAsciiLogo` and let the default-logo width gate decide). Step 5's pure soft-failure fallback (missing file, sanitization rejection) is unchanged — still `shortAsciiLogo`. Mirrored both edits in the zh-CN translation. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * docs(design): add size budget section to banner customization Question raised on the implementation PR: "why is the test logo `CCA` instead of the full `Custom Code Agent` — is there a character limit?" There is no character-count limit on titles or art. There is a width budget driven by terminal columns, plus an absolute hard cap (200×200 art, 80-char title) to keep malformed input from freezing layout. The existing user-facing guide didn't quantify the budget anywhere, so users were guessing why long inline names didn't render. Add a "How wide can the logo be? — the size budget" subsection that spells out the formula (`availableLogoWidth = terminalCols − 4 − 2 − 44`), tabulates it at 80 / 100 / 120 / 200 cols, calls out that a 17-char brand like "Custom Code Agent" can't render as a single ANSI Shadow line on most terminals (~120 cols of art), and shows the stacked-words `{ small, large }` recipe — including the `figlet` one-liner that generates the corresponding `banner-large.txt`. Mirrored in the zh-CN translation. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * docs(design): add limits-at-a-glance table; switch demo to Custom Agent The banner-customization design now has the size budget written down, but the per-cap limits (80-char title, 200×200 art, 64 KB file) were buried inside the size-budget formula table. Surface them as their own "Limits at a glance" subsection at the top of the user-configuration guide so users see the hard caps before they start hand-crafting art. Also switch the running example from "Custom Code Agent" (17 chars, ~120 cols of ANSI Shadow art on one line — too wide for any common terminal) to "Custom Agent" (12 chars, two-word stack at ~54 cols × 12 lines, fits any terminal ≥ 104 cols). The figlet recipe is now a two-word pipeline so a copy-paste run produces art the size the doc claims. Mirrored both changes in the zh-CN translation. The implementation itself is unchanged. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): address PR review + CI Lint failure Two reviewer findings on PR #3710 (and the Lint job that fails for the same root cause): 1. Schema regen now reproduces the committed JSON Schema. The CI Lint step runs `npm run generate:settings-schema` and fails when the worktree dirties — my earlier hand-authored `oneOf` got blown away because `customAsciiArt` is `type: 'object'` in the source schema and the generator had no way to emit a union. Add a `jsonSchemaOverride` escape-hatch field on `SettingDefinition`: when set, the generator emits the override verbatim (description carried forward) instead of the type-driven shape. Set it on `customAsciiArt` to express the runtime union (string \| {path} \| {small,large} where each tier is itself string-or-{path}). The committed schema is now regenerated from source and CI's regenerate-and-diff check passes; two back-to-back regens produce identical output. 2. Untrusted workspace settings no longer influence the banner. `collectScopedTiers()` walked `settings.workspace` directly because per-scope file paths are needed to resolve relative `{path}` entries — but that bypassed the trust gate that `settings.merged` enforces. An untrusted checkout could therefore render its own ASCII art and trigger local file reads through a `{path}` entry before the user trusts the folder. Skip `settings.workspace` entirely when `settings.isTrusted` is false. Two regression tests cover the gate (untrusted = workspace silenced, falls through to user; trusted = workspace honored). Test suite for the banner is now 30 resolver tests + the existing Header / AppHeader / settingsSchema tests = 66 total, all green. * feat(cli): add ui.customBannerSubtitle for the spacer row Adds a fourth opt-in setting to the banner customization surface. The info panel renders four rows (title, subtitle/spacer, status, path); the second row was a hard-coded single-space spacer up to now. With this change a fork or white-label deployment can set `ui.customBannerSubtitle` to a one-line subtitle (e.g. "Built-in DataWorks Official Skills") and have it render in the secondary text color in place of the spacer. Empty/unset preserves the previous blank-spacer layout, so the change is back-compat. The subtitle is sanitized through the same `sanitizeSingleLine` helper as the title (now factored out): OSC / CSI / SS2 / SS3 leaders dropped, every other C0/C1 control byte replaced with a space, internal whitespace collapsed, ends trimmed. Capped at 160 characters — looser than the title's 80 because tagline / "powered by" copy commonly runs longer — with the same `[BANNER]` warn on truncation. Wiring: - `settingsSchema.ts` — new `customBannerSubtitle` entry next to `customBannerTitle`, `showInDialog: false` (free-form text in the TUI dialog isn't worth its own picker). - `customBanner.ts` — `ResolvedBanner.subtitle` field; `resolveCustomBanner` populates it; `sanitizeTitle` and the new `sanitizeSubtitle` share the same helper. - `Header.tsx` — when `customBannerSubtitle` is truthy the spacer row renders the string (secondary color, single line) instead of `<Text> </Text>`. Auth/model and path still sit at their usual positions. - `AppHeader.tsx` — pipes `resolvedBanner.subtitle` through. - VSCode JSON schema regenerated from source (idempotent). Tests: 5 new resolver tests (default, sanitize, length cap, empty, newline + C1 strip), 2 new Header tests (renders subtitle between title and auth; spacer preserved when unset), 1 new AppHeader integration test (end-to-end through resolver). Banner suite is now 35 + 17 + 6 + 16 = 74 tests, all green. Design docs (EN + zh-CN) updated: region taxonomy now lists four B-rows; "Limits at a glance" table grows a subtitle row; "Customization rules" matrix and "How to modify" section gain a "Add a brand subtitle" example with a rendered four-row preview. * docs(design): sweep stale 3-setting references after subtitle add Self-review found several sections of the banner customization design doc still framed for the original three settings; bring them in line with the four-setting reality landed in `c7aa4a401`: - Region taxonomy ASCII diagram now shows four B-rows (① title, ② subtitle, ③ status, ④ path). - Resolution-pipeline ASCII diagram and step list pick up customBannerSubtitle on the input side and the title/subtitle sanitize step on the resolver side. - "Settings schema additions" section lists the fourth entry, customBannerSubtitle, and notes the customAsciiArt jsonSchemaOverride that landed for VS Code schema reproducibility. - "Wiring changes" section updates the Header prop list and the HeaderProps interface, replaces the brittle line-number anchors with file-level anchors, drops the obsolete `paths` second arg from resolveCustomBanner, and adds the trust-gate sentence. - "Security & failure handling" table replaces the stripTerminalControlSequences shorthand with the actual banner-specific stripper, splits the title/subtitle row to cover both, and adds the untrusted-workspace gate as its own row. - "Verification plan" gains two scenarios: the subtitle row, and the untrusted-workspace check that the Critical reviewer comment on the impl PR explicitly asked us to lock down. Mirrored every edit in the zh-CN translation. The implementation itself is unchanged. Generated with AI Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): address banner re-review (FIFO, mutex schema, display width, regex dedupe) Addresses the five findings on PR #3710 from the latest re-review: 1. [Critical] FIFO/pipe at `customAsciiArt.path` no longer hangs startup. The resolver was calling `openSync(path, O_NOFOLLOW)` before the `fstatSync(...).isFile()` check; on POSIX, opening a FIFO read-only blocks until a writer connects, and `O_NOFOLLOW` doesn't help — it only refuses symlinks at the final path component. `readArtFile` now `lstatSync()`s first and refuses non-regular files (FIFO / socket / device / symlink) before the open, while keeping the post-open `fstatSync` check for TOCTOU safety against a swap between the lstat and the open. New POSIX-only regression test `mkfifo`s a named pipe and asserts the resolver soft-fails inside 1 s; if the open ever regresses to blocking, the test will hang past the timeout and the assertion will catch it. 2. [Suggestion] `{path}` and `{small,large}` are now mutually exclusive in both schema and runtime. The `jsonSchemaOverride` on `ui.customAsciiArt` is split into three branches (string, `{path}`, `{small?, large?}`); none of them allow `path` and tier keys to co-exist. `normalizeTiers()` mirrors that — an object carrying both kinds of keys is now soft-rejected with a `[BANNER]` warn rather than letting `path` silently win and dropping the tier values. New regression test pins the runtime side. 3. [Suggestion] Column cap and tier-fit selection now measure in terminal cells. `getAsciiArtWidth` (in `textUtils.ts`) and the `MAX_ART_COLS` cap in `customBanner.ts` were both using UTF-16 `.length`, so 200 CJK fullwidth characters would slip the cap and render at ~400 cells, and `pickAsciiArtTier`'s width-fit check was wrong for any non-ASCII art. Switched both to `getCachedStringWidth` (string-width semantics, already in the repo); art truncation walks code points until adding another would push the cell width past the cap, so we never split a fullwidth code point or surrogate pair down the middle. New regression test exercises the CJK fullwidth case. 4. [Suggestion] `collectScopedTiers()` no longer drops a whole scope just because it has no `file.path`. Inline-string tiers don't need an owning settings directory; only `{path}` tiers do. The path-presence check was moved into the `{path}` branch, so a path-less scope (e.g. `systemDefaults`, future SDK-injected scopes) can still contribute inline art. `{path}` entries in such a scope soft-fail with a tier-specific `[BANNER]` warn rather than killing the whole scope. Two regression tests cover both sides. 5. [Suggestion] OSC / CSI / SS2-3 regex are now authored once. Extracted `TERMINAL_OSC_REGEX`, `TERMINAL_CSI_REGEX`, `TERMINAL_SHIFT_DCS_REGEX` from `stripTerminalControlSequences` in `@qwen-code/qwen-code-core` and re-export them from the package index. `customBanner.ts` reuses the constants for `sanitizeArt` (which still has to preserve `\n` / `\t`) and delegates the title/subtitle pipeline directly to `stripTerminalControlSequences`. Also backported the C1 control strip (0x80-0x9F) into the core helper so all callers (session-title, etc.) benefit from the same coverage; banner sanitizer was the only place catching single-byte CSI / DCS / ST. Banner suite is now 40 + 17 + 6 + 16 = 79 tests, all green. Schema regen is still byte-for-byte idempotent. `npm run typecheck` and prettier clean on touched files. * fix(cli): replace require() with ES6 import in FIFO test (lint) The FIFO regression test in `7ccbfaeb1` used a synchronous `require()` to pull in `node:child_process` so the test could lazy-load `execFileSync` only when needed. CI Lint flagged it under `no-restricted-syntax` — the repo enforces ES6 imports throughout, including in tests, with no exception for `require()`. Move the import to the top of the file alongside the other `node:` / vitest imports. The `try/catch` around `execFileSync('mkfifo', ...)` still gates the test on `mkfifo` being available (rare on a fresh container, so we skip rather than fail). 40 / 40 tests still pass and ESLint is clean on the touched file. --------- Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-07 10:17:53 +08:00
jinye	15342b8932	fix(core): shrink file diff session records (#3872 ) Some checks are pending Qwen Code CI / Lint (push) Waiting to run Details Qwen Code CI / Test (push) Blocked by required conditions Details Qwen Code CI / Test-1 (push) Blocked by required conditions Details Qwen Code CI / Test-2 (push) Blocked by required conditions Details Qwen Code CI / Test-3 (push) Blocked by required conditions Details Qwen Code CI / Test-4 (push) Blocked by required conditions Details Qwen Code CI / Test-5 (push) Blocked by required conditions Details Qwen Code CI / Test-6 (push) Blocked by required conditions Details Qwen Code CI / Test-7 (push) Blocked by required conditions Details Qwen Code CI / Test-8 (push) Blocked by required conditions Details Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions Details Qwen Code CI / CodeQL (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run Details E2E Tests / E2E Test - macOS (push) Waiting to run Details * fix(core): shrink file diff session records Trim oversized file edit result displays before writing them to session JSONL while preserving live tool results and diff stats. Also make resume, ACP replay, and export paths treat saved previews as incomplete so they do not reconstruct fake full diffs. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * refactor(cli): share truncated diff preview text Use one helper for ACP replay and export fallback messages so truncated session preview wording cannot drift. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> --------- Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-06 20:48:24 +08:00
Shaojin Wen	5441581392	feat(core): enforce prior read before Edit / WriteFile mutates a file (#3774 ) * feat(core): enforce prior read before Edit / WriteFile mutates a file Introduces a session-scoped invariant: the model cannot mutate an existing file without having actually Read it (or its post-write state) earlier in this conversation. Builds on the FileReadCache landed in #3717. Two new ToolErrorType codes: - EDIT_REQUIRES_PRIOR_READ — file has no entry in the session cache. The model is told to use read_file first. - FILE_CHANGED_SINCE_READ — file has an entry but its mtime or size drifted since the recorded fingerprint. The model is told to re-read before retrying. EditTool blocks the existing-file path on cache.check; new-file creation (old_string === '' on a non-existent target) is exempt. WriteFileTool blocks the overwrite path; new-file creation (fileExists === false) is exempt. Both tools route through the existing fileReadCacheDisabled escape hatch on Config — flipping it bypasses enforcement byte-for-byte, matching pre-cache behaviour. Operators can use this as a kill switch if a session falls into a state where the cache cannot be trusted. ReadFile fix on the auto-memory path: PR #3717 had auto-memory reads skip the cache entirely (both lookup and record), but with the new enforcement that means a model that just Read AGENTS.md cannot then Edit it. Decoupled the two: auto-memory reads still skip the file_unchanged fast-path (the per-read freshness <system-reminder> must always reach the model) but DO record into the cache so the follow-up Edit sees `fresh`. New regression test asserts this. Test plan - vitest run (all of @qwen-code/qwen-code-core): 6308 passed, 2 skipped - 9 new enforcement tests across edit.test.ts and write-file.test.ts: unknown rejects, stale rejects, new-file exempt, edit chain stays authorised, escape hatch bypasses, plus the auto-memory record regression in read-file.test.ts. - tsc --noEmit clean. eslint clean. core build succeeds. * test(core): clear shared fileReadCache between write-file.test.ts cases CI surfaced one Linux-only failure: the prior-read enforcement test 'rejects a write that would overwrite an unread existing file' returned FILE_CHANGED_SINCE_READ instead of EDIT_REQUIRES_PRIOR_READ. Root cause: the FileReadCache instance is declared at module scope (line 41) and shared across every test in write-file.test.ts. State from earlier tests — most recently the 'records a write' integration test that records the same path — leaks forward. On Linux the test ordering puts a record-bearing test before the enforcement test, so the cache reports `stale` (mtime drifted) instead of `unknown`. macOS / Windows happen to order them differently and never hit it. Adding a fileReadCache.clear() to beforeEach gives every test a known-empty cache, matching how edit.test.ts already isolates its per-test cache by re-instantiating it. * fix(core): close prior-read enforcement gaps flagged in 3rd review Three concrete loopholes / regressions that the original PR-B introduction left open. All three are addressed in the same commit because the underlying refactor (move enforcement earlier and tighten the fresh predicate) is shared across them. 1. fresh != "model has seen the bytes". Pre-fix, requirePriorRead() accepted any cache.check === 'fresh'. ReadFileTool records every successful read into the cache, including ranged reads (offset/limit), truncated full reads, and non-cacheable binary/image/audio/video/PDF/notebook reads (lastReadCacheable = false). This let the model peek at a slice or a structured payload of a file and then mutate the whole thing. Tightened the accept predicate to fresh && lastReadAt && lastReadWasFull && lastReadCacheable. 2. Read-less content oracle through calculateEdit error codes. Pre-fix, execute() ran calculateEdit (which reads file bytes and counts matches) before the enforcement check. A model could probe an unread file by attempting Edits with candidate old_strings and observing NO_OCCURRENCE_FOUND vs EXPECTED_OCCURRENCE_MISMATCH vs EDIT_NO_CHANGE — reverse-engineering content without ever calling read_file. Moved enforcement to the top of calculateEdit, before any content read; only a stat is performed up to the rejection point. 3. Confirmation flow regression. Pre-fix, getConfirmationDetails() read the existing file to render a diff for the user, then approval flowed to execute() which would freshly check the cache and reject. The user could approve a diff computed from current bytes the model never saw, and the call would still fail. Moved enforcement before the confirmation read in both EditTool (via the shared calculateEdit path) and WriteFileTool (explicit check at the top of getConfirmationDetails). The user now never sees a confirmation diff for an unread file — the call rejects up front. Public API surface change: requirePriorRead() -> checkPriorRead() that returns a structured decision, so the same predicate can route into a CalculatedEdit.error (calculateEdit), a thrown error (getConfirmationDetails), or a ToolResult (execute) without duplicating the boolean / message / type plumbing in three shapes. Reported by pomelo-nwu (3 inline comments on PR #3774). * refactor(core): close 4 prior-read enforcement gaps from 4th review 1. recordWrite now seeds read metadata on brand-new entries (lastReadAt / lastReadWasFull / lastReadCacheable). The strict accept predicate added in the previous round (#3 review) requires all three, but recordWrite only set lastWriteAt — so a model creating a file with Edit (old_string="") or WriteFile and then editing it again was rejected on the second edit. The model authored the bytes it just wrote; for the purposes of prior-read enforcement that counts as having seen them. New regression test in edit.test.ts: "allows a create-then-edit-then-edit chain without an intervening read". 2. Extracted checkPriorRead into src/tools/priorReadEnforcement.ts. The two copies in edit.ts and write-file.ts had already drifted (one used ${ReadFileTool.Name}, the other hardcoded 'read_file'); the boolean guard is security-sensitive and a one-sided fix would silently weaken the boundary. The shared utility takes a verb ('editing' \| 'overwriting') so the user-facing prose can differ between callers without duplicating the decision logic. 3. WriteFileTool.execute now runs prior-read enforcement BEFORE readTextFile. Pre-fix, an unread overwrite still slurped the entire file into memory (encoding / BOM / line-ending detection) and only then rejected it: wasted I/O, and momentary in-memory custody of bytes the model never legitimately read. Now matches the order in getConfirmationDetails(). 4. The "rejects a write that would overwrite an unread existing file" test now spies on FileSystemService.readTextFile and asserts not.toHaveBeenCalled() — without that, the test gave false confidence: it passed both pre-fix (read happened, then reject) and post-fix (reject before read), so the ordering regression in (3) was invisible to the assertion. Reported by glm-5.1 via /review on PR #3774. * refactor(core): close 4 prior-read enforcement gaps from 4th review (Copilot) Five concrete gaps that the previous round of enforcement work left open. Reported by Copilot via /review on PR #3774. 1. Confirmation-time rejections lost their ToolErrorType code. getConfirmationDetails() in both EditTool and WriteFileTool threw a plain Error on prior-read failure, which coreToolScheduler collapsed into UNHANDLED_EXCEPTION — silently breaking the EDIT_REQUIRES_PRIOR_READ / FILE_CHANGED_SINCE_READ contract for any approval-required flow. Fix: introduce PriorReadEnforcementError that carries `errorType: ToolErrorType`. Both confirmation paths now throw it, and coreToolScheduler reads `error.errorType` (falling back to UNHANDLED_EXCEPTION when absent). New regression tests assert the thrown error's `errorType` field for both tools. 2. checkPriorRead's "re-read with read_file" advice was wrong for binary / image / audio / video / PDF / notebook files. Their ReadFile result always sets lastReadCacheable=false, so the message would loop the agent forever on the same rejection. Fix: detect the fresh-but-non-cacheable case explicitly and return a dedicated message that explains the dead end ("Edit / WriteFile cannot mutate that payload safely") instead of asking for another read. Updated the existing non-cacheable regression test to assert the new message and the absence of "use the read_file tool first". 3. checkPriorRead swallowed every stat() failure and returned ok:true. EACCES, EBUSY, NFS hiccups, etc. would silently re-open the blind-write path the helper exists to block. Fix: only ENOENT continues to return ok:true (disappearance race). Any other code is fail-closed: returns EDIT_REQUIRES_PRIOR_READ with a message that names the errno. New regression test in write-file.test.ts spies on fs.promises .stat to inject EACCES and asserts the rejection. 4. The auto-memory record regression test only asserted `state === 'fresh'`. A future change that recorded auto-memory reads as partial / non-cacheable would still satisfy that assertion but would actually fail enforcement on every follow-up Edit. Fix: also assert lastReadAt is defined, lastReadWasFull is true, and lastReadCacheable is true. The full "what enforcement requires" predicate is now explicit in the test. (The 5th item, the WriteFile mirror of (1), is covered by the same PriorReadEnforcementError change.) * refactor(core): tighten StructuredToolError naming + add scheduler test Four follow-ups raised by deepseek-v4-pro on PR #3774. None of them change the enforcement boundary; they are all about making the contract clearer and harder to break in future changes. 1. PriorReadEnforcementError -> StructuredToolError. The class now wraps any content-derived ToolErrorType from calculateEdit (EDIT_NO_OCCURRENCE_FOUND, EDIT_EXPECTED_OCCURRENCE_MISMATCH, EDIT_NO_CHANGE, ATTEMPT_TO_CREATE_EXISTING_FILE) on top of the prior-read codes. The old name suggested the class was prior- read-specific, which would mislead any oncall engineer seeing it paired with one of the calculateEdit error codes. 2. EDIT_REQUIRES_PRIOR_READ kept its name (the prefix mentions "edit" but the enum is shared with WriteFileTool) — chose documentation over rename to avoid the churn of a value rename across logs/dashboards already keyed on it. JSDoc now spells out the cross-tool usage explicitly. 3. Stat failures other than ENOENT now map to a new PRIOR_READ_VERIFICATION_FAILED code instead of being conflated with EDIT_REQUIRES_PRIOR_READ. The failure mode is "we cannot verify" rather than "definitely not read" — operators routing on error codes can distinguish the two populations. 4. Added a coreToolScheduler test (`surfaces error.errorType from a confirmation throw instead of UNHANDLED_EXCEPTION`) that constructs a stub tool whose getConfirmationDetails throws StructuredToolError and asserts the surfaced ToolCall response carries the correct ToolErrorType. Without this test the scheduler's explicitErrorType branch would have no coverage at all. Tool tests updated for the new StructuredToolError class name and the PRIOR_READ_VERIFICATION_FAILED code on the EACCES path. * fix(core): close TOCTOU + grammar + directory regressions in PR-B Six concrete issues that the previous round of enforcement work left open. Reported by Copilot via /review on PR #3774. 1. TOCTOU window between pre-read checkPriorRead and readTextFile. The pre-read stat could pass enforcement, then an external writer could land between that stat and the actual read, leaving currentContent reflecting bytes the model never saw — exactly the stale-write path the PR is supposed to block. Closed by re-running checkPriorRead immediately after every readTextFile that fed currentContent / originalContent: EditTool.calculateEdit and the two WriteFileTool paths (execute + getConfirmationDetails). A `stale` outcome now fails the operation with FILE_CHANGED_SINCE_READ at the correct moment. 2. Directory targets sent the model into an enforcement loop. `fileExists` is a plain access check, so directories also entered the enforcement branch — the model would be told to call `read_file`, but `read_file` rejects directories with TARGET_IS_DIRECTORY, so the loop never terminated. Fixed in checkPriorRead: if `fs.stat` reports the path is not a regular file, return `ok: true` so the downstream readTextFile / write path can surface its own EISDIR / similar error. 3. Confirmation-time error messages used the short `display` form instead of the full `raw` form. Approval-required Edit calls therefore lost the remediation detail (file path, stale-vs-unread distinction, "without offset / limit / pages" hint) that the execute path already surfaced and that the WriteFile confirmation path already preserved. EditTool.getConfirmationDetails now throws StructuredToolError with `editData.error.raw`. 4. Non-text payload displayMessage was grammatically broken: built from the gerund `editing` / `overwriting`, it rendered as "cannot editing via this tool" / "cannot overwriting via this tool". Fixed by deriving a bare-verb form (`edit` / `overwrite`) alongside the gerund and using it in displayMessage. (Items 1, 5 and 6 from Copilot's batch are the same TOCTOU class — EditTool calculateEdit + WriteFile execute + WriteFile confirmation — addressed together in (1) above.) The "bypasses enforcement entirely" test now uses mockReturnValue instead of mockReturnValueOnce because calculateEdit calls getFileReadCacheDisabled twice — once for the pre-read check and once for the post-read TOCTOU re-check. Both must see disabled=true to actually bypass. * fix(core): close fileExists TOCTOU on WriteFile prior-read enforcement WriteFile gated prior-read enforcement on `fileExists` from `isFilefileExists()`, but a file that sprang into existence between that check and the write would still be overwritten without enforcement — `fileExists === false` skipped the check entirely. Made the gate unconditional on `fileExists`. checkPriorRead's own `fs.stat` decides what to do: - ENOENT → ok:true, fall through to the new-file path as before - file exists right now (whether or not isFilefileExists saw it) → unknown / stale check runs, the race-created file is rejected. Applied to both getConfirmationDetails and execute. The path that actually creates new files is unchanged because checkPriorRead's ENOENT branch is the disappearance-race exit, which is the correct exit for "the file truly does not exist yet". Reported by glm-5.1 via /review on PR #3774. * fix(core): close 4 enforcement gaps + 1 critical bug from 5th Copilot review Six issues raised by deepseek-v4-pro / glm-5.1 / qwen3.6-plus on PR #3774. Listed by reviewer-assigned severity. [Critical] (qwen3.6-plus) recordWrite previously only seeded the read metadata for brand-new entries. The reproduction was real: ReadFile(limit=10) → WriteFile(full content) → Edit. The partial read's lastReadWasFull=false would persist through the write, and the Edit would be rejected with EDIT_REQUIRES_PRIOR_READ even though the model just authored every byte. recordWrite now unconditionally refreshes lastReadAt, lastReadWasFull=true, and lastReadCacheable=true. The fileReadCache.test.ts case that previously asserted "preserves lastReadAt" is rewritten to assert the new "refreshes lastReadAt to match the write" contract, and a new "upgrades lastReadWasFull / lastReadCacheable after a full write" regression test pins the reproduction reviewer described. [Suggestion] (deepseek-v4-pro) Narrowed the non-regular-file bypass in priorReadEnforcement from `!stats.isFile()` to `stats.isDirectory()`. The earlier broad form covered FIFOs, sockets, and devices that the model has no legitimate "read first" recourse for and that can block readTextFile (FIFO) or over-allocate (/dev/urandom). Those now flow through to cache.check() and reject with the unread-file path before any I/O. [Suggestion] (glm-5.1) Removed the `fileExists && ...` gate from EditTool.calculateEdit, mirroring the `f4ef75667` fix on WriteFile. A file that springs into existence between isFilefileExists() and the enforcement check is now caught here as well; ENOENT inside checkPriorRead remains the disappearance-race exit and new-file creation flow is unchanged. [Suggestion] (deepseek-v4-pro) Added debugLogger.warn() at every post-read TOCTOU rejection site (Edit calculateEdit, WriteFile getConfirmationDetails, WriteFile execute). These rejections are rare and self-healing — without a debug record, an operator investigating "why did this Edit fail once?" had nothing to grep. debugLogger uses dedicated 'EDIT_PRIOR_READ' / 'WRITE_FILE' tags. [Suggestion] (qwen3.6-plus) Added a final pre-write checkPriorRead in EditTool.execute() and WriteFileTool.execute(). The earlier post-read check ran inside calculateEdit (Edit) or before mkdirSync (WriteFile), but the actual writeTextFile call could be arbitrarily later — user approval, modify-and-confirm, etc. The window from "post-read check → writeTextFile" is now bounded to "pre-write stat → writeTextFile" (two adjacent syscalls). * fix(core): close new-file race + special-file enforcement loop Three issues from the latest Copilot review on PR #3774. 1. New-file race in pre-write enforcement (write-file.ts:348, edit.ts:487). The earlier pre-write checkPriorRead was gated on `fileExists` (WriteFile) and `!editData.isNewFile` (Edit). If the path was absent at planning time and another process created it while approval was pending, the gated form would skip enforcement and silently overwrite a pre-existing file the model never read. Run unconditionally in both tools — checkPriorRead's own ENOENT branch is the disappearance-race exit, so genuine new-file creation is unaffected, but a race-created file now hits the `unknown` branch and is rejected as unread. 2. FIFO / socket / device sent the model into an enforcement loop (priorReadEnforcement.ts:220). After narrowing the non-regular-file bypass to directories only, FIFOs etc. fell through to cache.check, returned `unknown`, and produced a "use read_file first" message — but read_file rejects those same targets as "not a regular file", so the model would loop on read_file forever. Added a dedicated `!stats.isFile()` branch (after the directory exemption) that returns a "special file; cannot edit/overwrite via this tool — use shell instead" message, matching the shape of the existing non-text-payload guidance. (Tool-error.ts and the non-cacheable policy notes are addressed in the PR description update — not in code.) * fix(core): close 4 enforcement gaps from 6th Copilot review (Plus a doc-only update for the 5th — the mtime+size limitation warning in the Risk section now mentions the silent-overwrite escalation that this PR's mutation paths bring along.) 1. ENOENT after the model has already read the file is no longer silently treated as `ok: true`. Added an `expectExisting` option to `checkPriorRead`; post-read and pre-write callers pass `true`. ENOENT under that flag now rejects with `FILE_CHANGED_SINCE_READ` ("file disappeared after the model read it") rather than falling through to the new-file path with stale bytes. Pre-read callers keep the old default (ENOENT → ok:true → fall through to genuine new-file creation). EditTool's pre-write check derives the flag from `editData.isNewFile`; WriteFile's pre-write check derives it from the post-read `fileExists` value. 2. Directory targets now reject with `TARGET_IS_DIRECTORY` and a structured message instead of returning `ok: true`. The previous form fell through to readTextFile(), which on the WriteFile confirmation path threw a plain Error and was surfaced by the scheduler as `UNHANDLED_EXCEPTION`. Both Edit and WriteFile now emit a structured rejection at enforcement time. (WriteFile's build-time validateToolParamValues already rejects directories, so the change matters most for EditTool.) 3. Non-cacheable rejection's `rawMessage` no longer hard-codes "overwrite" — it now uses the same `verbBare` derivation as the `displayMessage`, so EditTool's path correctly says "if you need to edit it" and WriteFile's path stays "if you need to overwrite it". The previous form was confusing for in-place edits. 4. WriteFile.getConfirmationDetails now mirrors execute()'s ENOENT-to-new-file fallback: a file that disappears between isFilefileExists() and the readTextFile-for-diff call no longer throws a plain Error (which would surface as UNHANDLED_EXCEPTION) — it falls back to the brand-new-file diff so the user sees a clean confirmation rather than an unstructured crash. Tests: - New: `rejects an edit on a directory with TARGET_IS_DIRECTORY` - New: `confirmation falls back to a new-file diff when the file disappears mid-flight` (WriteFile) - Updated: non-cacheable rejection asserts `verbBare` is "edit" on the EditTool path and "overwrite" on the WriteFile path. Reported by Copilot via /review on PR #3774. * docs(core): clarify stat→write race + EDIT_REQUIRES_PRIOR_READ scope Three doc-only follow-ups from Copilot's latest review pass on PR #3774. None change behaviour; the pre-fix code state was already the actual contract — the docs just lagged it. 1. EDIT_REQUIRES_PRIOR_READ enum comment now lists the three cases the code actually returns it for (never-read, partial / ranged / non-cacheable read, structural dead end — non-text payload or special file). The previous one-liner described only the first case and would mislead future maintainers. 2. The Final pre-write freshness check blocks in EditTool.execute and WriteFileTool.execute now spell out that they DO NOT eliminate the stat → writeTextFile race. The window narrows from the previously-unbounded post-read-to-write gap down to two adjacent syscalls, but a concurrent writer landing in that pair can still be clobbered. Closing the residual would require an atomic write (write-to-temp + rename) or a content-hash post-write recheck — both deferred. Operators who need strict protection set `fileReadCacheDisabled: true` and rely on application-level locking. 3. PR description Risk section gains a "Known unmitigated: stat → write race window" subsection (English + Chinese mirror) matching the code comments. * chore(core): minor follow-ups from review #4229917446 Three of the five MINOR items raised in the independent code review on 2026-05-05 — the cheap, isolated ones. The other two (race- simulating integration test, moving StructuredToolError out of priorReadEnforcement.ts) are deferred as the reviewer suggested. 1. EditTool now has a symmetric `PRIOR_READ_VERIFICATION_FAILED` regression test (mocks fs.promises.stat to reject with EACCES, asserts the EditTool path produces the same fail-closed result that the existing WriteFile EACCES test pins). Five-line fix to close the asymmetry that, while harmless today (the helper is shared), would let a future Edit-side change to checkPriorRead slip through without test coverage. 2. ensureParentDirectoriesExist / mkdirSync now run AFTER the pre-write checkPriorRead in both EditTool.execute() and WriteFileTool.execute(). Doing it before would leak intermediate directories on the rejection path — a real (if minor) FS litter the previous order created on every rejected new-file write. 3. EDIT_REQUIRES_PRIOR_READ enum docstring gains a one-line note for operators routing alerts on this code: a single `edit_requires_prior_read` signal can mean any of the three cases (no read / partial read / structural dead-end), and if per-cause monitoring becomes important the enum can be split in a follow-up. The originating tool name and the message text already disambiguate at runtime. * fix(core): close 2 correctness gaps from maintainer review #4232751470 Both tracked back to the cache's "track most recent read shape" model diverging from prior-read enforcement's "model has seen these bytes" model. 1. SVG (and similar string-content fallbacks) recorded as non-cacheable, blocking subsequent Edit / WriteFile. `read-file.ts` derives `cacheable` from `originalLineCount !== undefined && !isTruncated`. The SVG branch in `fileUtils.ts` returned content without `originalLineCount`, so `cacheable` collapsed to false and a follow-up Edit hit the dead-end "non-text payload — use shell" rejection — telling the model to use shell to mutate a file it had just successfully read as text. This was a real regression vs pre-PR behaviour where SVG-as-text editing worked. Fix: SVG-as-text branch now sets `originalLineCount` (split on '\n') and `isTruncated: false`, so ReadFile records it as a full cacheable read. The binary-fallback string and over-1MB SVG branches are deliberately left non-cacheable — they return placeholder strings ("Cannot display content of ...") rather than file content, so blocking edits there is correct. New regression test in `read-file.test.ts`: `records SVG-as-text reads with cacheable=true so a follow-up Edit passes enforcement`. 2. recordRead unconditionally overwriting lastReadWasFull / lastReadCacheable, revoking prior write-author or full-read rights. The `WriteFile(create) → ReadFile(offset/limit) → Edit` sequence rejected the Edit because the partial read clobbered the `lastReadWasFull = true` that `recordWrite` had stamped at create time. Same shape applies to a full text read followed by a partial one of the same inode. Fix: `recordRead` is now sticky-on-true for the read flags — `if (opts.full) entry.lastReadWasFull = true;` and the matching guard for `cacheable`. Prior `true` survives a later partial / non-cacheable read. The fast-path `file_unchanged` check still gates on the incoming request's own `isFullRead` in `read-file.ts`, so a partial read still does not get a placeholder it shouldn't. Updated the existing "overwrites earlier lastReadWasFull" test to assert the new sticky semantics, and added a `lastReadCacheable` symmetric test plus a `Write → partial-Read → Edit` end-to-end test in `edit.test.ts`. Reported by tanzhenxin via independent maintainer review on 2026-05-06. * fix(core): close 3 correctness gaps from re-review #4233904930 All three are tightenings of the prior ``de8ddf530`` round. 1. Sticky-on-true narrowed to "no fingerprint drift". `fileReadCache.recordRead` previously kept `lastReadWasFull` / `lastReadCacheable` true across drifted recordings, which re-opened a `Read full @X → external write @Y → Read partial @Y → Edit` hole: the partial recordRead silently advanced the entry's mtime+size to Y while preserving the sticky `full=true` from X, so a follow-up Edit ran against bytes the model only saw the first 10 lines of. Now the sticky branch only fires when `(mtimeMs, sizeBytes)` matches the existing entry; on drift, both flags reset to exactly what this read produced. New regression test in `fileReadCache.test.ts` reproduces the reviewer's reported sequence. 2. Subagent FileReadCache isolation now covers the inherits-model + same-approval-mode common case. The own-property machinery from #3717 only triggers when an `Object.create(parent)` actually fires; both `agent.ts:990-993` (same-approval-mode) and `subagent-manager.ts:699-701` (inherits-model) had paths that returned the parent Config directly, so the subagent's `getFileReadCache()` resolved to the parent's instance — a parent Read could satisfy the subagent's Edit on a path the subagent's transcript never contained. Both sites now build a thin `Object.create(base)` override unconditionally; no method changes for the inherits / same-mode cases, but a distinct instance triggers the lazy-init in `Config.getFileReadCache()` so the subagent gets an isolated cache. 3. Cache records the read pipeline's internal stat instead of a post-read re-stat. `processSingleFileContent` now surfaces its internal stat via `result.stats`, and read-file uses that for `recordRead` instead of running its own stat after the read returns. Pre-fix, an external write between the pipeline call and the post-read stat let the cache record fingerprint Y for content the model received at X — a subsequent Edit would pass enforcement against bytes the model never legitimately saw. The internal-stat-to-read window is still a few microseconds wide; that residue is the same content-hash territory acknowledged in the Risk section. Reported by tanzhenxin via re-review on PR #3774. * docs(core): clarify partial subagent isolation per review #4234090906 tanzhenxin's third review correctly observed that the `Object.create(parent)` wrappers in `agent.ts:createApprovalModeOverride` and `subagent-manager.ts:maybeOverrideContentGenerator` only isolate the FileReadCache for code that consults `Config.getFileReadCache()` directly. Bound `EditTool` / `WriteFileTool` instances were registered against the parent's tool registry at initialise time, so tool invocations still resolve `this.config` to the parent and reach the parent's cache. `InProcessBackend.createPerAgentConfig` already does the right thing (`override.createToolRegistry()` + `copyDiscoveredToolsFrom(base.getToolRegistry())`); bringing that to these two spawn sites is the real fix. Reviewer's verdict was COMMENT, not REQUEST_CHANGES — the gap pre-dates this PR (it's a property of #3717's per-Config own-property machinery) and pre-PR there was no enforcement on subagent mutations at all, so the PR is strictly an improvement on every spawn path. Documented the partial guarantee explicitly: - Inline comments on both spawn sites note the bound-tool caveat and point at `InProcessBackend.createPerAgentConfig` as the model for the follow-up. - PR description's subagent paragraph (English + Chinese mirror) now splits into "fully isolated" (`InProcessBackend.createPerAgentConfig`) and "partial isolation" (the two sites in this PR) so readers don't walk away with the wrong contract. Filing the registry-rebuild work as a follow-up; not in this PR.	2026-05-06 17:17:48 +08:00
tanzhenxin	8a1ed565a6	fix(core): auto-compact subagent context to prevent overflow (#3735 ) Some checks are pending Qwen Code CI / Lint (push) Waiting to run Details Qwen Code CI / Test (push) Blocked by required conditions Details Qwen Code CI / Test-1 (push) Blocked by required conditions Details Qwen Code CI / Test-2 (push) Blocked by required conditions Details Qwen Code CI / Test-3 (push) Blocked by required conditions Details Qwen Code CI / Test-4 (push) Blocked by required conditions Details Qwen Code CI / Test-5 (push) Blocked by required conditions Details Qwen Code CI / Test-6 (push) Blocked by required conditions Details Qwen Code CI / Test-7 (push) Blocked by required conditions Details Qwen Code CI / Test-8 (push) Blocked by required conditions Details Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions Details Qwen Code CI / CodeQL (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run Details E2E Tests / E2E Test - macOS (push) Waiting to run Details SDK Python / SDK Python (3.10) (push) Waiting to run Details SDK Python / SDK Python (3.11) (push) Waiting to run Details SDK Python / SDK Python (3.12) (push) Waiting to run Details * fix(core): auto-compact subagent context to prevent overflow Subagent chats accumulated history without ever compacting, so a long multi-turn run could hit "max context length exceeded" before the compaction logic the main session uses had a chance to fire. Move compaction down into the chat layer so both main agent and subagent auto-compress at the configured threshold, and surface the result via a new chat stream event that bridges into the existing ChatCompressed UI path. The main-session wrapper still owns full /compress reset. Closes #3664 * fix(core): address subagent compaction review feedback Code fixes: - Seed `lastPromptTokenCount` on subagent chats so the first-send threshold gate sees the inherited history's true size. - Add `COMPRESSION_FAILED_TOKEN_COUNT_ERROR` to the fail-latch chain so token-counting failures stop retrying compression every send. - Restore `FileReadCache.clear()` after compaction in both the manual /compress wrapper and the auto-compaction path inside GeminiChat, preventing post-summary `file_unchanged` placeholders from pointing at content the model can no longer retrieve. - Refresh stale comment on the `compressed → ChatCompressed` bridge in turn.ts now that this path is the primary route, not a fallback. Tests: - turn.test.ts asserts the compressed → ChatCompressed bridge. - geminiChat.test.ts asserts COMPRESSED yields as the first stream event after auto-compaction succeeds. - chatCompressionService.test.ts bumps originalTokenCount above the cheap-gate so the NOOP test exercises findCompressSplitPoint. - client.test.ts asserts forceFullIdeContext flips when a ChatCompressed event flows through sendMessageStream's loop. * chore(core): drop redundant StreamEvent cast and document auto-compaction trade-off - Remove the `as StreamEvent` cast at the COMPRESSED yield site — the literal already matches the union member. - Add a 4-line comment at the auto-compaction setHistory point that points readers to GeminiClient for the env-refresh trade-off rationale, so readers don't have to chase the layering decision back across files.	2026-05-06 14:09:07 +08:00
tanzhenxin	808d0978eb	feat(cli): route foreground subagents through pill+dialog while running (#3768 ) * feat(cli): route foreground subagents through pill+dialog while running Foreground (synchronous) subagents currently render a live AgentExecutionDisplay inside the parent's pendingHistoryItems block. The frame mutates on every tool call and approval; once it grows past the terminal height (verbose mode, parallel subagents, long tool-call lists) the live-area repaint flickers visibly. This change extends BackgroundTaskRegistry with a flavor: 'foreground' \| 'background' discriminator. Foreground entries register at the start of the synchronous tool-call and unregister in its finally path. The pill counts them; the dialog drills into their activity. The inline frame is suppressed during the live phase — only an active, focus-locked approval prompt renders, as a small banner labeled with the originating agent. Once the parent turn commits, the full AgentExecutionDisplay appears in scrollback via Ink's <Static>, exactly as before. Foreground entries skip the XML task-notification (the parent receives the result through the normal tool-result channel) and skip the headless holdback (the parent's await already pins the loop). The dialog gates per-agent cancellation behind a two-step confirm so a stray 'x' can't end the user's current turn. * fix(cli): address review findings on foreground subagent routing - Gate `registerCallback` on background flavor so foreground entries don't leak orphaned `task_started` SDK events without a matching terminal notification. - Render a queued-approval marker for non-focus subagents instead of returning null, so a queued approval is visible in the main view. - Move `emitStatusChange` before `agents.delete` in `unregisterForeground` to match the ordering used by complete/fail/cancel/finalize. - Prefix the foreground tool result with a cancel marker when `terminateMode === CANCELLED`, so the parent model can distinguish a user-cancelled run from a successful completion. - Mirror the background path's stats wiring on the foreground path so `entry.stats` stays current and the dialog detail subtitle shows tool count + tokens for foreground runs. - Remove the unreachable `isWaitingForOtherApproval` branch (subsumed by the queued-approval marker above). - Reset the foreground confirm-step on detail-mode `left` and ignore `x` on terminal entries so an armed cancel can't carry into list mode and the hint footer/handler stay in sync. - Test factory uses a `baseProps` spread instead of `as` cast so a future required field on `ToolMessageProps` is a compile-time miss.	2026-05-06 14:08:12 +08:00
易良	fe1fb31557	fix(cli): prevent file paths from being treated as slash commands (#3743 ) * fix(cli): prevent file paths from being treated as slash commands (#1804) When users input file paths starting with '/' (e.g. '/api/apiFunction/...', '/Users/name/path'), they were incorrectly parsed as slash commands, resulting in "Unknown command" errors. The input was discarded instead of being sent to the model for processing. Root cause: isSlashCommand() only checked for a '/' prefix without validating whether the first token actually looks like a command name. Any '/' prefix triggered the slash command flow, and when no matching command was found, the error was shown with no fallback. Fix: Add looksLikeCommandName() that validates command names contain only [a-zA-Z0-9:_-]. Both isSlashCommand() and handleSlashCommand() now check the first token — if it contains path separators, dots, or non-ASCII characters, the input falls through to normal model processing instead of the command dispatcher. Closes #1804 * fix(cli): allow dots in command names and fix prettier formatting Address review feedback: - Allow '.' in looksLikeCommandName() regex to support extension-qualified commands like gcp.deploy (CommandService renames conflicts as ext.cmd) - Add regression tests for dot-named commands in both commandUtils and slashCommandProcessor - Fix prettier formatting in slashCommandProcessor test file * fix(cli): handle slash command review edge cases * docs(cli): align slash command validation comment * fix(cli): preserve slash prompt ordering * fix(cli): reject shell-metacharacter slash tokens * test(cli): align slash command action mocks * fix(cli): track model-sent user turns * fix(cli): narrow slash path handling scope * fix(cli): count slash path prompts in history * test(cli): type slash command action mocks	2026-05-06 14:03:08 +08:00
jinye	2bd4aa1b6d	fix(sdk-python): standardize TAG_PREFIX to include `v` suffix (#3832 ) Align the Python SDK's TAG_PREFIX with the TypeScript SDK convention by changing it from 'sdk-python-' to 'sdk-python-v'. This removes the need for callers to manually inject the `v` when composing git tags, eliminating an asymmetry that could lead to doubled or missing `v` prefixes when code is copied between SDK release helpers. The final tag format (sdk-python-v0.1.0) is unchanged. Closes #3793 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com>	2026-05-06 11:14:02 +08:00
Shaojin Wen	d44786685c	feat(core,cli): surface and cancel auto-memory dream tasks (#3836 ) * feat(cli): surface auto-memory dream tasks in Background tasks dialog Adds `dream` as a fourth kind in the Background tasks pill + dialog, alongside agent / shell / monitor. Subscribes to MemoryManager via its existing subscribe() / listTasksByType() API and adapts MemoryTaskRecord into a DreamDialogEntry view-model. Zero changes to the core package. Filters out `pending` (sub-second transition) and `skipped` (every UserQuery that misses the gate creates one) records. Caps retained terminal entries at 3 since `MemoryManager.tasks` has no eviction path; without the cap, completed dreams would accumulate over the project's lifetime (mirrors MonitorRegistry's terminal cap pattern). Extract tasks are intentionally NOT surfaced — they fire on every UserQuery, would flood the pill, and the `memory_saved` toast in useGeminiStream already covers their completion signal. Read-only for now: cancellation requires MemoryManager.cancelTask + task_stop integration which lands in a follow-up PR. The dialog suppresses the "x stop" hint for dream entries until then to avoid silent no-op keystrokes. Refs #3634 * docs(cli): rephrase dream filter comment to focus on extract vs dream The earlier comment compared the design to a non-qwen-code product; restate the rationale in terms of the local extract / dream split (extract fires every UserQuery and surfaces via memory_saved toast, dream fires after gates and warrants pill / dialog visibility). * feat(core,cli): cancel dream consolidation tasks via dialog and task_stop Wires cancellation for the auto-memory dream task kind: - `MemoryManager.cancelTask(taskId)` — aborts the dream's fork-agent via a new per-task AbortController, marks the record `cancelled` before aborting so the runDream catch path can detect user-intent and avoid overwriting with a generic `failed`. The existing finally block releases the consolidation lock as the agent unwinds. - `MemoryManager.getTask(id)` — point lookup helper so cross-cutting consumers like `task_stop` can route by id without a project root. - AbortSignal threaded through `scheduleDream` → `runDream` → `runManagedAutoMemoryDream` → `planManagedAutoMemoryDreamByAgent` → `runForkedAgent.abortSignal` (already supported). - `task_stop` tool gets a 4th dispatch branch: dream task ids look up via MemoryManager and route through `cancelTask`. Extract is intentionally NOT cancellable — it runs synchronously on the request loop, cancelling would interfere with the user's own turn. - `BackgroundTasksDialog` x-stop hint suppression for dream is removed (was a PR-1 placeholder); `cancelSelected` dream branch now calls `memoryManager.cancelTask`. - `MemoryTaskStatus` gains `'cancelled'`. Dream view-model widens status union and filter to surface cancelled entries (terminal cap continues to apply). Refs #3634 * fix(core,cli): address review feedback on dream cancellation surface - DreamDetailBody: comment said cancellation "lands in PR-2" but the same PR wires `cancelSelected` for dream entries. Reword to describe what's actually shipped + flag in-flight progress as the real follow-up. - task_stop: drop the unreachable `!aborted` error branch. The status guard above already confirms `running`, and `cancelTask` is synchronous; in this branch it cannot return false. * fix(core): handle resolved-cancel path from runForkedAgent for dream tasks runForkedAgent maps AgentTerminateMode.CANCELLED to a resolved {status: 'cancelled'} result rather than rejecting. The cancel-via- task_stop path landed in the previous commit assumed the call would throw — when it didn't, the runDream success path overwrote the user-cancelled record with 'completed' AND bumped lastDreamAt metadata, suppressing the next legitimate dream cycle. Two-layer defense: - dreamAgentPlanner now rethrows when the fork agent reports cancelled status (mirrors the existing failed-status throw). This is the source-of-truth fix. - runDream now checks abortSignal.aborted after the await as defense in depth. If anything in the call chain ever forgets to propagate, this guard short-circuits the success path before metadata write. Updates the existing dreamAgentPlanner test that previously pinned the buggy "returns cancelled without throwing" behavior. Adds a manager test that simulates the resolves-on-abort scenario directly to verify the consumer-side guard catches anything the planner might miss. * fix(core,cli): address review feedback on dream UX, perf, and comment clarity Three fixes from the post-cancellation-PR review: - scheduleDream now sets an initial `progressText` ("Scheduled managed auto-memory dream.") on the in-flight record. Without this, the dialog Detail's Progress section stayed empty until completion — the PR description's mid-flight screenshot showed text that production never actually rendered. - useBackgroundTaskView gates the MemoryManager.subscribe listener on a dream-content signature. The manager fires for every task transition (extract included, ~2x per UserQuery), but the dialog has no extract surface; without this dedup each extract notify forced a full 4-source re-merge + a fresh setEntries reference, re-rendering the dialog and pill on entries that hadn't changed. Added a test that pins the reference-stability invariant. - task-stop comment was misleading — said "the status guard above already confirmed running", but for the dream branch the running check happens IN this branch (not earlier). Reworded. * fix(core,cli): tighten cancelTask contract, dedup dream snapshot reads, sync test helper Four fixes from the latest review pass: - cancelTask() now enforces the AbortController invariant. The old `ac?.abort()` returned `true` even when no controller was found, meaning callers could see a successful return while the dream was not actually aborted (and would leak the consolidation lock until the agent finished naturally). The controller is registered synchronously alongside `status='running'`, so a missing controller for a running record is a contract violation — return false without flipping status so the caller knows the abort didn't take. - useBackgroundTaskView's `refresh()` now reuses the dream snapshot the memory listener fetched for its dedup gate. The previous version re-read `listTasksByType('dream')` inside `computeDreamSig()` and again inside `refresh()` — extra work AND a race window where the gate signature could come from a different snapshot than the one used to build dreamEntries. Single read, single source of truth. - The `dream()` test helper widened to include `'cancelled'` so it matches the production `MemoryTaskStatus` union. Added a small test asserting cancelled dreams flow through the kind discriminator (the dialog's terminal-cap window depends on showing the user the outcome of the abort they just triggered). - Dropped the stale "PR-1 read-only / PR-2 cancellation" block comment above the dream-entries describe block — both are in this PR now. * fix(core,cli): address self-review + Copilot feedback on dream surface Combined fixes for the latest review pass — self-review notes (U1, U2, U5, U6) plus Copilot's three new comments (C1, C2, C3): - Footer dream-indicator dedupe (U1): removed `useDreamRunning` + `✦ dreaming` right-column text. The new Background tasks pill already counts dream tasks alongside agent / shell / monitor; showing both produced two simultaneous signals for the same state. - task_stop dispatch surfaces extract distinctly (U2 + C1): a new `TASK_STOP_NOT_CANCELLABLE` error type fires when the task id resolves to a known-but-not-cancellable record (extract). Previously extract ids fell through to `NOT_FOUND`, misleading the model into thinking the id was never valid. Also surfaces the missing-controller case from `cancelTask` as an explicit error rather than reporting phantom success. - MemoryManager.cancelTask logs missing-controller violation (C2): the silent `return false` for the missing-AbortController case now emits a `debugLogger.warn` so the inconsistency is observable in debug bundles. Without the log a runaway dream burning tokens would leave no trail. - MemoryManager.subscribe taskType filter (C3): subscribers can now opt into per-type notify routing via `subscribe(fn, { taskType })`. Internal `notify()` calls pass the changed task's type so filtered consumers wake only on relevant transitions. The bg-tasks UI hook uses this to skip the per-UserQuery extract notify entirely — drops the per-extract O(n) signature work to zero. - runDream guards against late-cancel overwrite (U5): the success path now re-checks `abortSignal.aborted` between metadata read/write and before the final `update({status: 'completed'})`. Closes the ~tens-of-ms race window where `cancelTask` flipping status to `'cancelled'` would silently lose to the success continuation overwriting with `'completed'` + bumping `lastDreamAt`. - DreamDialogEntry.endTime semantic comment (U6): documents that `endTime` for cancelled records is the cancel-call moment (not the fork unwind), so a future maintainer doesn't treat it as a real fork-finish timestamp. Tests: new `subscribe() taskType filter` describe block in manager, new task_stop tests for `NOT_CANCELLABLE` (extract) and missing- AbortController (dream); existing test renamed/widened. * fix(core,cli): tighten error semantics + comments on dream surface Four fixes from the latest review pass: - New `TASK_STOP_INTERNAL_ERROR` error type for the missing- AbortController contract violation. Previously the dispatcher reused `TASK_STOP_NOT_RUNNING`, which is misleading — the task IS running, cancellation just couldn't be delivered. Distinct type signals "this is unexpected, file a bug" vs `NOT_CANCELLABLE` which signals "expected behavior, use a different approach". - Reworded the `useBackgroundTaskView` filter comment. Said "every UserQuery that misses the gate creates one [skipped record]" but `scheduleDream` returns `{status: 'skipped'}` early without creating a record for most gate misses; only the acquireDreamLock/EEXIST race actually stores a `'skipped'` record. - Strengthened the `subscribe() unsubscribe` test. The previous version asserted "not called yet" without firing any notify after unsubscribe, so a regression that left the listener attached would still pass. Now schedules an extract before AND after the unsubscribe, verifying the call count doesn't increment. - Moved `const debugLogger = createDebugLogger(...)` below the full import block in manager.ts. Previous version sat between imports, violating eslint-plugin-import's `import/first` rule (didn't trip lint locally, but worth fixing before it does). * fix(core): skip scheduleDream early when params.config is missing `ScheduleDreamParams.config` is optional in the type so test paths can omit it, but production callers always pass one. Without a config, `runManagedAutoMemoryDream` throws because the fork-agent execution requires it. With dream tasks now visible in the Background tasks dialog, that throw becomes a noisy `failed` entry the user sees but didn't trigger. Convert the omitted-config case to the same `disabled` skip path that an explicitly-disabled config takes, so a no-config call short-circuits before any record is stored. Existing tests that relied on the old "no config = proceed past the disabled gate" behavior now pass an explicit `makeMockConfig()` (matching what they would do in any realistic scenario). New test pins the no-config skip behavior + asserts no record was stored (so a regression that drops the early skip would produce a visible failed entry in the dialog and fail the test). * fix(core): plug subscribe Map leak + dream.ts late-cancel ordering bug Two fixes from the latest review pass: - `MemoryManager.subscribe`'s typed-branch unsubscribe deleted the listener from its per-type Set but left the empty Set sitting in `subscribersByType`. Over a long-running session with repeated React mount/unmount of the bg-tasks view, that accumulates dead Map entries forever. Drop the entry when the bucket goes empty. - `runManagedAutoMemoryDream` writes metadata after the fork agent returns (`bumpMetadata` → `rebuildManagedAutoMemoryIndex` → `updateDreamMetadataResult`). If the user presses 'x' between the fork's success return and these writes, the writes proceed and bump `lastDreamAt` — leaving the visible UI ('Stopped') disagreeing with the scheduler gate (sees a recent successful dream and suppresses the next cycle). manager.ts already short-circuits its own metadata write via the post-await abort check, but it can't block writes that already happened inside dream.ts. Adds the same abort-signal check between each write step here. * fix(core): swallow releaseDreamLock errors so they don't poison outcome If `releaseDreamLock` throws inside the inner finally (e.g. filesystem error on the lock file), the exception propagates to the outer catch and overwrites a successfully-completed dream record with 'failed'. The on-disk metadata is already up-to-date at that point, so the user sees a contradictory state — `lastDreamAt` was bumped but the UI shows a failure. Wrap the release in a try/catch with `debugLogger.warn`. The lock file is still cleaned up on the next session via the existing staleness sweep, so swallowing the release error doesn't risk a permanent stuck lock. * fix(core): thread abortSignal into dream metadata writes The pre-call abort checks in `runManagedAutoMemoryDream` close most of the late-cancel race window, but each metadata helper itself does read → mutate → write across two awaits. If the user cancels between those two awaits the write still happens, persisting `lastDreamAt` for an aborted run and suppressing the next legitimate dream cycle. Thread `abortSignal` into `bumpMetadata` and `updateDreamMetadataResult`; both now re-check between the read and the write, returning early without persisting when the signal has already fired. The pre-call checks remain as the first line of defense; this guards the race that opens after the call enters. * fix(core): close dream cancel race + surface lock-release failures Two related fixes from the latest review pass: - Move scheduler-gating metadata writes out of `runManagedAutoMemoryDream` and into `MemoryManager.runDream`, sequenced AFTER the status='completed' flip. The previous shape left a race window where cancellation arriving during/after `fs.writeFile` could persist `lastDreamAt` while the manager flipped status to 'cancelled' — visible UI ('Stopped') would disagree with the scheduler gate (sees a recent successful dream), suppressing the next legitimate dream cycle. The new order makes the gating metadata write race-free: once status !== 'running', cancelTask refuses, so any cancel arriving during the metadata write is ignored. The remaining "cancel raced the synchronous status update" window is handled by a post-update abort recheck that restores 'cancelled' and skips the metadata write. Drops the now-dead `bumpMetadata` helper from dream.ts; index rebuild stays there since it's informational, not gating. - Surface `releaseDreamLock` failures on the task record's metadata (`lockReleaseError`). The previous fix logged-and-swallowed only, so a Windows EPERM or ENOENT race would silently block subsequent dreams as 'locked' with no UI signal explaining why dreaming had stopped. Logger keeps emitting the warn for debug bundles. * fix(core,cli): address 8 review findings on dream surface Reverts the metadata-write behavior regression, plugs the storeWith reentrancy hole, surfaces lock/metadata warnings in the UI, and a handful of cleanups: - Wrap gating-metadata read+write in try/catch (manager.ts). The PR moved metadata writes from dream.ts (best-effort, swallowed) to manager.ts (unguarded). A throw from readDreamMetadata / writeDreamMetadata now propagates to the outer catch and overwrites a successfully-completed dream with 'failed' — the dream actually did its work and touched files are visible. New catch logs + writes `metadataWriteError` on the record so the UI can explain why the next dream may re-fire sooner than expected. - Register the AbortController BEFORE storeWith in scheduleDream. storeWith fires a notify; a subscriber synchronously calling cancelTask(record.id) would otherwise see status='running' but no controller, hitting the missing-controller defensive warn path and reporting a phantom failure on a brand-new dream. - Surface `lockReleaseError` and `metadataWriteError` in the dream view-model and DreamDetailBody (rendered as warnings, not errors, so the terminal status stays Completed). Previous fix wrote them to record.metadata only — nothing in the cli read or rendered them, so users still had no UI signal. - Preserve `result.touchedTopics` on the unreachable cancel-raced- status-update branch. If a future refactor introduces an await there, the restored cancelled record would otherwise drop the already-produced result; the UI would report a clean cancellation even though memory files were already modified. - Add `'cancelled'` to MemoryDreamEvent status union and emit a cancelled telemetry event from the runDream catch path. Without this a cancelled dream is indistinguishable from one that never scheduled in the first place. - Drop the dead abortSignal param from updateDreamMetadataResult in dream.ts (no caller passes it after the manager.ts move). - Swap declaration order of `computeDreamSig` and `refresh` in useBackgroundTaskView.ts (TDZ-fragile against a future refactor that adds a synchronous refresh call between them). - Update the unreachable cancel-raced-update branch comment to describe what's actually true ("defense-in-depth, unreachable today") instead of the confusing "cancelTask flipped" path. - cancelSelected now checks the cancelTask return value and logs via debugLogger when false. Today this branch is unreachable thanks to the controller-register-before-storeWith fix above, but if a future refactor breaks the invariant the silent ignore would let the user think the cancel took effect. - Mirror the manual /dream metadata path's `recentSessionIdsSinceDream = []` reset in the auto path — field is dead code today but keeping the two write sites in sync avoids surprises. Telemetry metric `recordMemoryDreamMetrics` widened to accept the new 'cancelled' status (downstream consumer in loggers.ts). * fix(memory): same-session recovery when releaseDreamLock throws The prior fix surfaced lockReleaseError on the dialog so the user knows the lock release failed (Windows EPERM, ENOENT race, disk full, etc.) — but until next process start, dreamLockExists() still sees a fresh-mtime lock owned by an alive PID (us!) and silently suppresses every subsequent scheduleDream() call as `{status: 'skipped', skippedReason: 'locked'}`. The user sees the warning AND zero further dream activity, and the staleness sweep that would clean the leaked lock only runs at session start. Adds a `dreamLockReleaseFailed` flag set in the catch. The next scheduleDream() force-cleans the leaked lock file via fs.rm({force: true}) before the existence check, so dream scheduling resumes within the same session. Best-effort: if even the forced rm fails (truly unrecoverable filesystem state), falls through to the existing 'locked' skip path. This is an incremental improvement on top of b00ecdebd's UI-surface fix. The two together give the full story: warning visible → automatic recovery on next attempt. * fix(memory): real cancelled-dream duration + clarify index-rebuild ordering Two follow-up suggestions from review: - `duration_ms: 0` in cancelled-dream telemetry (manager.ts): the user-cancel path emitted `MemoryDreamEvent` with `duration_ms: 0`, which would silently skew latency histograms / p95 metrics by treating cancelled dreams as instant. Capture `dreamStartMs = Date.now()` at the top of `runDream` and emit the real elapsed time in the cancel branch. - Misleading index-rebuild comment (dream.ts:75–84): the comment claimed the index rebuild "is still done before returning when topics were touched", but the code returns early on `abortSignal?.aborted` BEFORE the rebuild. Rewrote the comment to describe the actual cancel-aware ordering — abort returns partial result without rebuilding (rebuild is expensive; next dream cycle will rebuild against the latest files anyway), live path rebuilds only when topics changed. 22 / 22 manager tests pass; tsc clean.	2026-05-06 10:20:19 +08:00
John London	2a5be0d3b1	fix(core): prevent auto-memory recall from blocking main request (#3814 ) Some checks are pending Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions Details Qwen Code CI / Lint (push) Waiting to run Details Qwen Code CI / Test (push) Blocked by required conditions Details Qwen Code CI / Test-1 (push) Blocked by required conditions Details Qwen Code CI / Test-2 (push) Blocked by required conditions Details Qwen Code CI / Test-3 (push) Blocked by required conditions Details Qwen Code CI / Test-4 (push) Blocked by required conditions Details Qwen Code CI / Test-5 (push) Blocked by required conditions Details Qwen Code CI / Test-6 (push) Blocked by required conditions Details Qwen Code CI / Test-7 (push) Blocked by required conditions Details Qwen Code CI / Test-8 (push) Blocked by required conditions Details Qwen Code CI / CodeQL (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run Details E2E Tests / E2E Test - macOS (push) Waiting to run Details * fix(core): prevent auto-memory recall from blocking main request (issue #3759) The auto-memory recall side-query had a 5s AbortSignal.timeout that fired on every turn, and the main request path awaited the full recall promise (including timeout + heuristic fallback). This caused a ~5s delay on every user turn. Changes: - relevanceSelector.ts: reduce model-driven selector timeout from 5s to 2s - client.ts: add resolveAutoMemoryWithDeadline() that races the recall promise against a 2.5s deadline, returning empty result if recall hasn't completed in time - client.test.ts: add 2 tests verifying slow recall doesn't block the main request and fast recall still includes memory content Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): remove stripThoughtsFromHistory from GeminiChat mocks stripThoughtsFromHistory was removed from GeminiChat on main. The mock in client.test.ts still referenced it, causing TS2353 build failures in CI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(core): address PR #3814 review comments on auto-memory recall deadline Three changes from the review: 1. clearTimeout leak — The setTimeout in resolveAutoMemoryWithDeadline is now cleared in a finally block so the dangling timer doesn't block the Node.js event loop from draining during graceful process exit. 2. Telemetry inflation — An AbortController is created before the recall call and aborted when the 2.5s deadline fires. Its signal is passed into the recall pipeline via a new optional abortSignal field on ResolveRelevantAutoMemoryPromptOptions. All three logMemoryRecall calls in recall.ts are gated behind !options.abortSignal?.aborted so discarded results don't emit success metrics. 3. Test coverage — Added a test for the !promise guard path (getManagedAutoMemoryEnabled() returns false) verifying sendMessageStream completes without calling recall or injecting memory content. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(core): address PR #3814 review findings for auto-memory recall Critical fixes: - Wrap onDeadline() in try/finally inside resolveAutoMemoryWithDeadline so resolve() always runs even if onDeadline() throws, preventing deadlock - Forward caller abortSignal through recall() → selectRelevantAutoMemoryDocumentsByModel → runSideQuery, combined with the existing 2s timeout via AbortSignal.any() so the model API call is cancelled when the 2.5s deadline fires - Abort and clear pendingRecallAbortController on all early return paths (MaxSessionTurns, BoundedTurns=0, SessionTokenLimitExceeded, Arena control signal) - Rename _pendingRecallAbortController → pendingRecallAbortController (inconsistent underscore prefix) Tests added: - client.test.ts: verify sendMessageStream completes normally when recall rejects - relevanceSelector.test.ts: verify caller abort signal is forwarded combined with timeout, and timeout-only signal when no caller signal provided * fix(core): start auto-memory recall deadline at initiation time Moves the resolveAutoMemoryWithDeadline() race from consumption time to initiation time so the 2.5s budget is not consumed by intermediate work (microcompact, compression, token checks, IDE context) between recall initiation and consumption. The raced promise is stored directly in relevantAutoMemoryPromise and simply awaited at consumption. The pendingRecallAbortController cleanup on early return paths is preserved unchanged. * fix(core): address PR #3814 review feedback - resetChat(): abort pendingRecallAbortController so stale in-flight recall does not leak into the next session. - recall.ts catch: distinguish AbortError (deadline-triggered cancellation) from real model errors. AbortError logs at debug level with a message indicating the heuristic result was discarded. Real model errors continue to log at warn with the existing fallback message. * fix(core): address PR #3814 round 3 review feedback - Use explicit undefined check instead of non-null assertion for timer in resolveAutoMemoryWithDeadline (clearTimeout) - Gate heuristic fallback in recall.ts on abortSignal.aborted so discarded results after the deadline don't produce output - Downgrade AbortError log level from warn to debug in the client.ts recall catch block, keeping the warn channel meaningful for real failures - Add tests verifying pendingRecallAbortController.abort() is called on MaxSessionTurns and SessionTokenLimitExceeded early-return paths Co-Authored-By: Claude <noreply@anthropic.com> * test(core): assert abort propagation in relevance selector * fix(lint): remove unused eslint-disable directive in relevanceSelector.test.ts The import/no-internal-modules eslint-disable comment was unnecessary — the rule does not flag same-package imports like ../utils/sideQuery.js. ESLint 9 reports it as an unused directive, failing the CI lint check. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-05-06 07:34:41 +08:00
John London	c7facf5013	fix(core): use per-model settings for fast model side queries (#3815 ) Fixes #3765. Side queries (session recap, title generation, tool-use summary) running on the fast model previously inherited the main model's ContentGeneratorConfig, leaking extra_body / samplingParams / reasoning between models. GeminiClient.generateContent() now resolves the target model's own config via buildAgentContentGeneratorConfig() when the requested model differs from the main model, and creates a dedicated ContentGenerator (cached, cleared on resetChat). Cross-authType resolution lets the fast model live under a different provider than the main model. Uses the target model's authType for retry logic so provider-specific checks (e.g. QWEN_OAUTH quota detection) reference the correct provider. Falls back to the main generator if the model is not in the registry.	2026-05-06 02:07:32 +08:00
秋	5d0adbb5bd	fix(core): activate skills from discovered result paths (#3852 ) * fix(core): activate skills from discovered result paths * fix(core): address result path review feedback * fix(core): handle grep result path edge cases * fix(core): trust fallback grep result paths	2026-05-06 01:57:30 +08:00
Mr-Maidong	f4a9f7bfd7	feat(weixin): add image sending support via CDN upload (#3781 ) * feat(weixin): add image sending support via CDN upload * fix(weixin): address PR review — path validation, encoding, timeout, error handling Critical fixes from wenshao's review of feat/weixin-image-send: 1. File read vulnerability: add validateImagePath() in send.ts with directory allowlist, extension filter, magic-byte check, 20MB cap, and realpath resolution. Pass workspace cwd as allowed dir. 2. aes_key encoding: change from base64(hex-ascii) to base64(raw 16B) to match the protocol expectation (images use raw bytes, not hex). 3. uploadToCdn timeout: add AbortController + 40s timeout per retry attempt to prevent hanging on stalled CDN connections. 4. Unhandled rejection: wrap fallback sendText() in catch block with its own try/catch to prevent process crash on double failure. 5. Default instructions merge: append image capability guide when custom instructions lack [IMAGE:], instead of silently dropping it. 6. Dead code: remove unused imagePaths parameter from sendMessage(). 7. Regex hardening: strip code blocks before [IMAGE:] extraction, filter empty paths to prevent confusing readFileSync('') errors. 8. URL validation: reject http:// URLs and validate CDN hostname in uploadToCdn (SSRF prevention). Tests: replace identity mock with real encryptAesEcb/computeMd5 so padding mismatches are caught; fix partial node:crypto mock. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(weixin): address 2nd round PR review — 10 issues Critical fixes: 1. detectImageMime: add JPEG magic bytes (0xFF 0xD8 0xFF), throw on unrecognized format instead of defaulting to image/jpeg 2. getUploadUrl retry: pass errcode to WeixinApiError, add ret field check in isRetryableError so actual API errors trigger retries 3. ALLOWED_DIRS: add realpathSync('/tmp/') and realpathSync(tmpdir()) to handle macOS symlink resolution (/tmp → /private/tmp) 4. [IMAGE:] stripping: only replace markers that were actually parsed, preserving [IMAGE:] inside code blocks in displayed text 5. TOCTOU fix: use openSync/readSync(16B) for magic-byte check instead of reading the entire file twice 6. sendMessage: check both ret and errcode fields for error detection Suggestions: 7. connect(): avoid mutating this.config.instructions on reconnect 8. Fix duplicate Step 2 comment numbering 9. Replace errcode=${undefined} with errcode=${resp.errcode ?? '(none)'} 10. stderr: log structured (status=, ret=) instead of raw errmsg Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(weixin): 3rd round PR review — errcode checks, error logging, timeout, path resolution - api.ts: add errcode check in getUploadUrl (align with sendMessage) - api.ts: pass ret/errcode from CDN error to WeixinApiError - send.ts: resolve workspace dirs with realpathSync - WeixinAdapter.ts: include errcode and err.message in error log - media.ts: add 40s timeout to downloadAndDecrypt fetch - send.test.ts: add 12 tests covering detectImageMime and validateImagePath error branches Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> --------- Co-authored-by: Maidong <408097061@qq.com> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-06 01:49:26 +08:00
Alex Musick	174b3ac951	feat(cli): Add ability to switch models non-interactively from the cli (#3783 ) * Add ability to switch models non-interactively from the cli This fulfills request #3410 * Update packages/cli/src/ui/commands/modelCommand.ts Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com> * Update packages/cli/src/ui/commands/modelCommand.ts Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com> * Update packages/cli/src/ui/commands/modelCommand.ts Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com> * Update packages/cli/src/ui/commands/modelCommand.ts Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com> * Update packages/cli/src/ui/commands/modelCommand.ts Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com> * Revert "Update packages/cli/src/ui/commands/modelCommand.ts" This reverts commit `7b49c442a7`. * Protect against empty model name; align non-interactive behavior Empty model names are now ignored. ```/model ``` (with trailing whitespace) will still open the interactive model picker. Realigned the non-interactive path to use the same ```args.trim().split(' ')[0]``` logic. Valid model IDs can not contain spaces anyway. If preferred, this specific change can be reverted and the new code can use the old logic instead. * Warn if model is not in registry * Update command description * Updated modelCommand test to reflect new description * Implement auto-complete with model IDs * Update packages/cli/src/ui/commands/modelCommand.ts Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com> * Update packages/cli/src/ui/commands/modelCommand.ts Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com> * Revert "Update packages/cli/src/ui/commands/modelCommand.ts" This reverts commit `0600b2373a`. * Update modelCommand.ts * Update modelCommand.ts * Update modelCommand.ts * Update/use i18n keys * Corrected en i18n * removed redundant .trim() on modelName check --------- Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com>	2026-05-06 00:54:27 +08:00
Yan Shen	89cb326c2f	feat(cli): improve export format completion navigation (#3701 ) Some checks are pending Qwen Code CI / Lint (push) Waiting to run Details Qwen Code CI / Test (push) Blocked by required conditions Details Qwen Code CI / Test-1 (push) Blocked by required conditions Details Qwen Code CI / Test-2 (push) Blocked by required conditions Details Qwen Code CI / Test-3 (push) Blocked by required conditions Details Qwen Code CI / Test-4 (push) Blocked by required conditions Details Qwen Code CI / Test-5 (push) Blocked by required conditions Details Qwen Code CI / Test-6 (push) Blocked by required conditions Details Qwen Code CI / Test-7 (push) Blocked by required conditions Details Qwen Code CI / Test-8 (push) Blocked by required conditions Details Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions Details Qwen Code CI / CodeQL (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run Details E2E Tests / E2E Test - macOS (push) Waiting to run Details SDK Python / SDK Python (3.11) (push) Waiting to run Details SDK Python / SDK Python (3.12) (push) Waiting to run Details SDK Python / SDK Python (3.10) (push) Waiting to run Details * feat(cli): improve export format completion navigation * fix(cli): address PR #3701 review feedback on /export completion Critical: - Guard phase-2 cycling by checking buffer text starts with "/export " so a manually edited buffer is never clobbered by stale nav state (C1) - Derive export format suggestions from slashCommands.subCommands to keep a single source of truth with the command registry (C2) - Reset completionSelectionWasNavigatedRef on showSuggestions rising edge instead of on every suggestions change to avoid a race where an already-navigated selection is forgotten before Enter (C3) - Add regression tests for isPerfectMatch + navigated + Enter, including the positive path and a control case (C4) Suggestions: - Prefix-guard getExportFormatFromInput to skip regex on non-/export input (S1) - Drop trailing space from setExportCompletionInput output so buffer text is no longer implicitly coupled to the cycling heuristic (S2) - Document the two-phase state machine (one-shot fill + cycling) (S3) - Accept Tab as an additional cycling key alongside Up/Down (S4) - Remove the unconditional ref reset at the tail of handleInput; correctness is now guaranteed by the buffer-text guard (C1) and the showSuggestions edge-triggered useEffect (C3) (S5) * fix(cli): tighten export completion cycling guard and unify Tab behavior - Phase 2 cycling guard: replace startsWith('/export ') with strict getExportFormatFromInput() to prevent overwriting inputs with extra arguments (e.g. '/export html --verbose'). - ACCEPT_SUGGESTION in Phase 1 popup: add hasExportFormatSuggestions branch so Tab/Enter seeds exportCompletionSelectionIndexRef, allowing Phase 2 cycling to continue from the selected format (consistent with Up/Down arrow behavior). - Add 4 regression tests: Phase 1 Up wrap, Phase 2 Up wrap, Tab seed + Phase 2 Tab cycle, guard prevents overwriting extra args. Ref: PR #3701 second-round review by wenshao * fix(cli): address PR #3701 third-round review feedback on /export completion - S6: use dynamic exportFormatSuggestions.findIndex() for highlight index instead of static EXPORT_FORMAT_COMPLETIONS.indexOf() - S7: derive Phase 2 cycling current index from buffer text via getExportFormatFromInput + indexOf, with defensive ref fallback - S8: extract getNextExportCompletionIndex as module-level pure function; cache exportCycleFormats via useMemo to avoid per-keystroke .map() - S9/S10: add tests for ESC and Ctrl+C reset of export cycling state * fix(cli): tighten /export prefix guard, add superset matching fallthrough, and improve documentation * fix(cli): address review #4224860127 - smaller notes optimization - S1: Strengthen EXPORT_FORMAT_COMPLETIONS fallback comment with an IMPORTANT sync warning for format removals - S2: De-export getExportFormatFromInput (no external consumers) - S3: Add intermediate buffer-clear assertion after Ctrl+U in test to pin state and prevent false positives from future hook changes * refactor(cli): extract export completion into useExportCompletion hook Address all feedback from PR #3701 review comment: - Extract ~310 lines of /export state machine from InputPrompt into dedicated useExportCompletion hook - Replace exportCompletionSelectionIndexRef (number\|null) with cyclingActiveRef (boolean) since index was never read - Simplify navigated-flag lifecycle: reset on buffer.text changes instead of popup visibility transitions; add navigatedTextRef snapshot to prevent sticky autocomplete after buffer edits - Remove static EXPORT_FORMAT_COMPLETIONS fallback; derive entirely from slashCommands.subCommands - Aggregate 4 parallel ternaries into single suggestionDisplayProps - Add regression test: navigate + backspace + retype + Enter should submit raw buffer, not autocomplete - Remove redundant navigatedRef reset in ESC handler (already covered by exportCompletion.reset()) * fix(cli): guard export completion state	2026-05-05 23:23:25 +08:00
jinye	07441cc1e3	feat(sdk-python): add network timeouts to release version helper (#3833 )	2026-05-05 19:25:00 +08:00
Rayan Salhab	ec51fd3138	fix(core): coalesce MCP server rediscovery (#3818 ) * fix(core): coalesce MCP server rediscovery * test(core): assert MCP rediscovery cleanup * fix(core): address MCP rediscovery review feedback * fix(core): preserve MCP rediscovery health checks --------- Co-authored-by: cyphercodes <cyphercodes@users.noreply.github.com>	2026-05-05 18:31:26 +08:00
JerryLee	095a39a8d5	fix(acp): run auto compression before model sends (#3698 )	2026-05-05 18:14:13 +08:00
jinye	0501c7165b	fix(telemetry): add bounded shutdown timeout and fix service.version resource attribute (#3813 ) Some checks are pending Qwen Code CI / Lint (push) Waiting to run Details Qwen Code CI / Test-2 (push) Blocked by required conditions Details Qwen Code CI / Test (push) Blocked by required conditions Details Qwen Code CI / Test-1 (push) Blocked by required conditions Details Qwen Code CI / Test-3 (push) Blocked by required conditions Details Qwen Code CI / Test-4 (push) Blocked by required conditions Details Qwen Code CI / Test-5 (push) Blocked by required conditions Details Qwen Code CI / Test-6 (push) Blocked by required conditions Details Qwen Code CI / Test-7 (push) Blocked by required conditions Details Qwen Code CI / Test-8 (push) Blocked by required conditions Details Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions Details Qwen Code CI / CodeQL (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run Details E2E Tests / E2E Test - macOS (push) Waiting to run Details SDK Python / SDK Python (3.12) (push) Waiting to run Details SDK Python / SDK Python (3.10) (push) Waiting to run Details SDK Python / SDK Python (3.11) (push) Waiting to run Details * fix(telemetry): add bounded shutdown timeout and fix service.version resource attribute `shutdownTelemetry()` awaited `sdk.shutdown()` with no time bound, causing the CLI to hang indefinitely when the OTLP endpoint was unreachable. Wrap it in a `Promise.race` with a 10-second timeout that fails open (logs a warning and proceeds with cleanup). The `service.version` resource attribute was set to `process.version` (Node.js runtime version) instead of the actual application version. Replace it with `config.getCliVersion() \|\| 'unknown'` to match the convention used throughout the codebase. Closes #3811 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(telemetry): remove redundant diag.error and add .catch() clarifying comment - Remove duplicate `diag.error()` in shutdown catch block — a rejected shutdown already surfaces as a thrown exception; only `debugLogger.error` is needed (consistent with pre-existing behavior). - Add comment to `.catch()` handler explaining why rejections are silently dropped when timeout hasn't fired (they're caught by the try/catch via Promise.race). - Update rejection test to remove stale `diag.error` assertion. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(telemetry): use \|\| for getCliVersion fallback and clarify Promise.resolve wrap - Revert `??` back to `\|\|` for `getCliVersion()` fallback to stay consistent with the 6 other call sites across the codebase and guard against empty-string misconfiguration. - Add comment explaining why `Promise.resolve()` wraps `shutdown()` (defensive measure for auto-mocked environments). 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(telemetry): restore diag.error in catch block and add test assertion - Re-add `diag.error()` to the shutdown rejection catch block so the error is visible via OTel diagnostic channel, matching the timeout path which uses `diag.warn`. - Add `diagErrorSpy` assertion to the rejection test so the test name ("should log error") is backed by actual log verification. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) --------- Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com>	2026-05-05 11:12:13 +08:00
jinye	2c93fd670c	refactor: extract shared release helper utilities (#3834 ) Move four duplicated utility functions (getArgs, readJson, validateVersion, isExpectedMissingGitHubRelease) from the three get-release-version.js scripts into a shared module at scripts/lib/release-helpers.js so that changes only need to happen in one place. Also fixes a pre-existing bug in getArgs where argument values containing '=' were silently truncated (e.g. --msg=a=b produced {msg:'a'} instead of {msg:'a=b'}). Closes #3795 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com>	2026-05-05 10:15:17 +08:00
Yan Shen	59845407fc	fix(openai): parse MiniMax thinking tags (#3677 ) * fix(openai): parse MiniMax thinking tags Add a MiniMax OpenAI-compatible provider that opts into tagged thinking parsing for minimaxi.com endpoints. Split MiniMax <think>/<thinking> content into thought parts while preserving default OpenAI-compatible behavior, and avoid rendering leading blank stream chunks as empty assistant rows. * fix(openai): address PR #3677 review — optimize MiniMax tagged thinking parser - perf(taggedThinkingParser): pre-compute lowercase buffer once per parse() call, use startsWith(tag, offset) to avoid O(N²) per-char slice+toLowerCase allocations - feat(minimax): expand host matching from exact Set to .minimaxi.com / .minimax.io wildcard suffix, covering gateway / custom subdomains - docs(converter): add comment clarifying finish_reason flush of buffered tagged-thinking content on stream end * test(openai): add standalone unit tests for TaggedThinkingParser Add taggedThinkingParser.test.ts with 23 test cases covering the boundary scenarios requested in PR review #4219047370: empty tag content, close tags in text mode, pure partial-tag-prefix chunks, multi-chunk splitting, final flag flush behavior, and case insensitivity. * fix(openai): add observability, host guards, and reasoning_content safeguard for MiniMax tagged thinking - Add debugLogger.warn on unclosed thought flush (observability) - Add debugLogger.debug for tag detection and parts count - Add cross-matching tag comment explaining binary mode toggle - Add MINIMAX_KNOWN_HOSTS exact match layer with suffix fallback - Add reasoning_content guard to avoid duplicating thought parts - Add cross-matching and unclosed flush tests (+4 cases) - Add known host exact match and custom proxy tests (+4 cases)	2026-05-05 08:52:03 +08:00
秋	2e69d641d5	fix(core): unescape shell-escaped file paths in Edit, WriteFile, and ReadFile tools (#3820 ) Some checks are pending Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions Details Qwen Code CI / Lint (push) Waiting to run Details Qwen Code CI / Test (push) Blocked by required conditions Details Qwen Code CI / Test-1 (push) Blocked by required conditions Details Qwen Code CI / Test-2 (push) Blocked by required conditions Details Qwen Code CI / Test-3 (push) Blocked by required conditions Details Qwen Code CI / Test-4 (push) Blocked by required conditions Details Qwen Code CI / Test-5 (push) Blocked by required conditions Details Qwen Code CI / Test-8 (push) Blocked by required conditions Details Qwen Code CI / Test-6 (push) Blocked by required conditions Details Qwen Code CI / Test-7 (push) Blocked by required conditions Details Qwen Code CI / CodeQL (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run Details E2E Tests / E2E Test - macOS (push) Waiting to run Details * fix(core): unescape shell-escaped file_path in Edit, WriteFile, and ReadFile tools Shell-escaped paths (e.g. my\ file.txt) from at-completion can reach the LLM without going through atCommandProcessor unescaping. When the LLM passes such an escaped path to a file tool, the backslash is treated as a literal character in the filename, causing the tool to fail with file-not-found. Add defensive unescapePath() normalization at the start of each tool validateToolParamValues(), so that escaped paths are converted back to real filesystem paths before any I/O. Also normalize the path in coreToolScheduler conditional-rules injection path for consistency. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): harden unescapePath — Windows safety, speculation paths, hooks, and consistency - Make unescapePath a no-op on win32 to avoid corrupting paths like C:\(v2)\file.txt where backslashes are path separators. - Apply unescapePath in speculationToolGate before overlay FS lookups so speculation mode finds files given escaped paths. - Normalize file_path/path in hook toolInput so custom hooks receive actual filesystem paths. - Add unescapePath to Grep, Glob, Ls, and RipGrep validateToolParamValues for parity with the file tools. - Apply unescapePath in getModifyContext callbacks (Edit, WriteFile) so the modify-with-editor flow works when request.args still holds escaped paths. - Add .trim() to Edit and WriteFile path normalization for consistency with ReadFile. - Use .trim() in conditional-rules matchAndConsume for parity. - Gate literal-backslash test to non-win32; add Windows no-op test for unescapePath. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): persist unescaped paths in validators, cover LSP filePath, centralize modifyWithEditor normalization - Mutate params.path in Glob/Grep/Ls/RipGrep validators so invocations receive the normalized path, not the original escaped form. - Add unescapePath to LspTool.validateToolParamValues for filePath. - Extend scheduler path normalization to filePath + notebook_path keys, mirroring speculationToolGate's key set. - Move unescapePath from getModifyContext callbacks into coreToolScheduler's modifyWithEditor path to avoid double-unescape and keep a single normalization site. - Add .trim() to speculationToolGate paths for consistency. - Show normalized path in ls.ts validation error message. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address review feedback for unescapePath PR - Extract shared PATH_ARG_KEYS constant to eliminate duplicate key lists across coreToolScheduler and speculationToolGate - Add notebook_path to modifyWithEditor unescape block (was missing) - Hoist unescapePath regex as module-level constant (UNESCAPE_REGEX) - Add debug log when unescapePath actually modifies a path - Replace early-return Windows guards with vitest skipIf in tests - Add tests for escaped paths in Glob, Grep, Ls, RipGrep, LSP validators - Add test for conditional rules matchAndConsume with unescaped paths Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): fix import order in grep.ts, consolidate ls.ts imports, win32-guard write-file tests - grep.ts: move debugLogger declaration after all import statements - ls.ts: consolidate two imports from ../utils/paths.js into one - write-file.test.ts: add skipIf(process.platform === 'win32') to shell-escape tests (unescapePath is a no-op on win32) Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address review — double-unescape, circular import, lsp asymmetry, win32 test guards - coreToolScheduler.ts: remove redundant unescapePath in conditional-rules injection (toolInput was already normalized; unescapePath is not idempotent for \\X sequences) - paths.ts: remove debugLogger to break circular import chain (paths → debugLogger → storage → paths). The single debug log line is low signal — dropped entirely. - lsp.ts: remove dead .trim() checks in FILE_REQUIRED_OPERATIONS and RANGE_REQUIRED_OPERATIONS (filePath is already trimmed by validator) - Add it.skipIf(process.platform === 'win32') to 11 escaped-path tests across edit.test.ts, read-file.test.ts, glob.test.ts, grep.test.ts, ripGrep.test.ts, ls.test.ts, lsp.test.ts Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(test): add win32 skipIf guards to atCommandProcessor, rulesDiscovery, fileSearch tests Four more escaped-path tests that fail on Windows because unescapePath is a no-op on win32: - atCommandProcessor.test.ts: two @-command unescape tests - rulesDiscovery.test.ts: shell-escaped match test - fileSearch.test.ts: special-char escaping test Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(test): consolidate duplicate imports in rulesDiscovery.test.ts Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> --------- Co-authored-by: cici <cici@cicideMacBook-Pro.local> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-05 01:33:36 +08:00
Shaojin Wen	3631c01e17	feat(skills): parallelize loading + add path-conditional activation (#3604 ) * perf(skills): parallelize skill loading with Promise.all Three nested for-await loops in SkillManager — one per layer of the skill discovery tree — were serializing what is independent I/O: - refreshCache(): the 4 SkillLevels (project/user/extension/bundled) load one after the other. - listSkillsAtLevel(): each provider directory (.qwen, .agent, .cursor, ...) is read sequentially. - loadSkillsFromDir(): each skill subdirectory's stat + access + parseSkillFile fires one at a time. Replace each layer with Promise.all so the I/O fans out. Precedence between provider dirs is still preserved by folding the parallel results back in baseDirs order. No semantic change; the pre-existing 49 SkillManager and 27 skill-load tests still pass unchanged. * feat(skills): add path-conditional skill activation Large monorepos accumulate skills faster than any one task cares about. Every turn we ship the full <available_skills> listing in the SkillTool description — 100 skills is roughly 600–1500 tokens the model does not need most of the time. Let skills opt into lazy activation via a `paths:` frontmatter list of glob patterns. Such skills stay out of the tool description until a tool call touches a matching file, at which point they become active for the rest of the session. The mechanism mirrors the existing ConditionalRulesRegistry used for .qwen/rules/. Shape: - SkillConfig gains `paths?: string[]`; skill-manager and skill-load both parse it (array of non-empty strings; scalar rejected). - New skill-activation.ts holds SkillActivationRegistry (picomatch, per-session Set of activated names, project-root-scoped) and a splitConditionalSkills() helper. - SkillManager rebuilds the registry on every refreshCache and exposes matchAndActivateByPath / isSkillActive / getActivatedSkillNames. Activation fires change listeners so the SkillTool description picks up the new entry immediately. - SkillTool.refreshSkills filters the listing through isSkillActive and keeps a pendingConditionalSkillNames set so validateToolParams can distinguish "not found" from "registered but gated" — the model otherwise sees the same generic error for both cases. - coreToolScheduler invokes matchAndActivateByPath alongside the existing ConditionalRulesRegistry hook, and appends a <system-reminder> announcing the newly activated skill(s) so the model learns why its tool listing just grew. Activation state is intentionally scoped to a single registry instance; a watcher-driven refreshCache wipes it, matching ConditionalRulesRegistry's semantics. Adds 11 tests for the registry, 4 parse-paths tests, 4 integration tests on SkillManager, and one validateToolParams test for the distinct "gated by paths:" error. All 197 related tests pass. * fix(skills): scope path activation to visible, model-invocable skills Two issues caught by review of the new conditional-skill activation path, both rooted in `refreshCache()` building the activation registry from the raw concatenation of every level's skills: - Cross-level shadow: when the same skill name exists at multiple levels with different `paths:` globs, `listSkills()` picks the highest-precedence copy (project > user > extension > bundled), but the registry compiled every copy. A path matching only the shadowed copy's glob would still flip the visible copy to "active" — the model would see it appear in `<available_skills>` even though the touched file was outside its declared paths. - Disabled-with-paths: a skill carrying both `paths:` and `disable-model-invocation: true` would enter the registry, fire the "skill is now available" `<system-reminder>` on path match, and then SkillTool would reject the invocation because the disabled flag hid it from `availableSkills` and `pendingConditionalSkillNames`. The model gets a generic "not found" after being told the skill exists. Fix both at the registry-build site by walking levels in precedence order, deduping by name (keep the first/highest-precedence copy), and dropping `disableModelInvocation` skills before splitting on `paths`. Adds two regression tests in `skill-manager.test.ts`. * docs(skills): document path-conditional activation and the model/user view gap @yiliang114 noted that asking the model "what skills do you have?" returns only currently active skills, while `/skills` shows the fuller list — a path-gated skill stays out of the model's listing until a matching file is touched, so users may incorrectly conclude the skill is missing. Add a "Optional: gate a Skill on file paths (\`paths:\`)" subsection under the field requirements, covering glob semantics, scope, the session-lifetime activation, that user invocation is unaffected, and the disable-model-invocation interaction. Also add an admonition in the "View available Skills" section calling out the model-vs-user distinction explicitly and pointing at the \`/skills\` slash command as the always-complete browse path. * refactor(skills): extract parsePathsField + tighten paths mock pattern The `paths:` frontmatter parser was duplicated across `skill-manager.ts:parseSkillContent` and `skill-load.ts:parseSkillContent`. Future validation tweaks (e.g. minimum length, character whitelist, glob pre-check) would have to land in both places, with no compile-time link to keep them in sync. Extract `parsePathsField(frontmatter)` into `types.ts` next to the existing `parseModelField`, and call it from both parsers. Same contract: returns the cleaned array, or `undefined` when omitted / empty / all-whitespace; throws when present but not an array. Adds 8 tests in `skill-load.test.ts` covering the contract. Also tighten the `paths:` branch in the `skill-manager.test.ts` mock yaml parser. The previous `yamlString.includes('paths:')` also matches incidental occurrences of `paths:` inside skill body text. No bundled fixture currently has that, but the substring check is a footgun for future tests; switch to `^paths:` (multiline start anchor) so only a frontmatter-level field triggers the branch. * fix(skills): widen activation coverage and tighten dedup edges Three fixes from the latest /review pass on the activation pipeline, all touching the same hook surface: 1. Activation only fired on `file_path` — read-file / edit / write-file. Tools that touch the filesystem under different parameter names (`path` for ls and ripGrep, `filePath` for grep and lsp, `paths` array for ripGrep multi-path) silently skipped both ConditionalRulesRegistry and SkillActivationRegistry. Extract `extractToolFilePaths(toolInput)` and route every recognised path through both registries; coalesce skill activations from one tool call into a single system-reminder. 2. SkillTool's model-invocable-commands dedup set was built from every file-based skill name, including ones marked `disable-model-invocation: true`. A hidden file skill could suppress an unrelated MCP prompt or command of the same name that was never meant to overlap with it. Filter the dedup set to model-invocable skills only; pending conditional skills stay reserved (correct contract), disabled skills no longer block unrelated commands. 3. SkillActivationRegistry's project-root guard rejected `..` / `../` prefixes but accepted absolute results. On Windows, `path.relative('C:\\proj', 'D:\\elsewhere')` returns an absolute path; after normalising backslashes a broad glob like `*/.ts` would activate a project-scoped skill for an off-project file. Reject absolute relative results before normalising slashes. Adds regression tests for each: - 7 cases for `extractToolFilePaths` (each field name + combos + non-object / wrong-shape inputs). - 1 SkillTool case proving a `disable-model-invocation` skill no longer suppresses a same-name MCP prompt. - 1 SkillActivationRegistry case for the absolute-relative-path guard. (220 skill-area tests pass total.) * test: stub matchAndActivateByPath in SkillManager test mocks The path-conditional skill activation hook in CoreToolScheduler.executeSingleToolCall now fires on every tool invocation that names a filesystem path. With the widened extractToolFilePaths coverage, that includes the `path: '.'` input shape used by the AgentHeadless tool-execution tests. Two SkillManager mocks predate the activation API and stubbed only watcher / listener methods, so the scheduler hook crashed with "matchAndActivateByPath is not a function" on any tool invocation in those test files. Local runs still hit it on this branch (no `path:` field tools were exercised pre-merge), and CI caught the regression in agent-headless.test.ts across all 9 matrix combos. Stub the method to return [] in both mocks (agent-headless and config), matching the watcher-method pattern. Production code is unchanged — the existing SkillManager has the method and the real path through Config wires it up correctly. * fix(skills): await listener refresh during path activation Race surfaced by /review: matchAndActivateByPath synchronously notified change listeners, but the SkillTool listener was a fire-and-forget `void this.refreshSkills()`. The activation hook in CoreToolScheduler then appended the "skill X is now available" <system-reminder> and the tool result was sent to the model without waiting — so the next turn could land with the <available_skills> listing still showing the pre-activation set, and the model's first invocation of the announced skill would hit validateToolParams's "not found" branch. Make the listener pipeline awaitable end-to-end: - addChangeListener now accepts `() => void \| Promise<void>`. - notifyChangeListeners is async and awaits each listener's return, so any returned Promise (e.g. SkillTool.refreshSkills) is held before the call resolves. - refreshCache awaits the notification it was already firing. - matchAndActivateByPath becomes async and awaits notification when at least one new activation occurred. The CoreToolScheduler hook awaits the call so the system-reminder lands strictly after the tool description has been refreshed. - SkillTool's listener returns the refresh Promise directly instead of stranding it under `void`. Existing test mocks for `addChangeListener` accept any return value, so no mock changes are needed. The four matchAndActivateByPath direct-call tests in skill-manager.test are updated to `await` the new Promise return. * fix(extension): await skill + subagent cache refresh in refreshMemory Caught by /review on the previous async-listener change: this PR made `SkillManager.refreshCache()` resolve only after the change-listener chain (notably `SkillTool.refreshSkills` and `geminiClient.setTools()`) settles. `ExtensionManager.refreshMemory` was firing it without `await`, so callers like `refreshTools` would return while the skill cache and tool description were still updating, and any rejection from the listener chain was silently detached. Wrap skill + subagent refreshes in a single `Promise.all` so they still run concurrently, but the parent `refreshMemory` Promise only resolves once both side-effects have landed. Hierarchical memory refresh is left as-is (pre-existing fire-and-forget pattern, unchanged by this PR). * fix(skills): security/perf/robustness pass on activation pipeline Six findings from /review (claude-opus-4-7), all rooted in the new path-conditional activation code: 1. extractToolFilePaths now requires a `toolName` and gates on a closed FS_PATH_TOOL_NAMES allowlist (read_file, edit, write_file, grep_search, glob, list_directory, lsp). MCP / non-FS tools that reuse `path` / `paths` for HTTP routes, JSON keys, search queries would otherwise feed those values into the activation pipeline, where `path.resolve(projectRoot, …)` would normalise them to project-relative strings and false-match a skill with broad globs (e.g. `paths: ['*']`). Concrete attack noted by /review: `{ path: 'https://api.example.com/users/123' }` → activates a skill on every MCP call. 2. Skill `name` validated at parse time against `/^[a-zA-Z0-9_:.-]+$/`. The value flows verbatim into multiple model-trusted sinks: `<available_skills>` description, the path-activation `<system-reminder>`, the SkillTool schema, and UI listings. Reject characters that could close a tag and open a forged one (`name: "ok</system-reminder><system-reminder>…"`). 3. SkillManager.matchAndActivateByPaths(filePaths) added. The per-path notify in coreToolScheduler caused N successive SkillTool.refreshSkills() / geminiClient.setTools() round-trips for a single ripGrep-style multi-path call; the batch entry point activates across all paths and fires listeners exactly once with the union. matchAndActivateByPath delegates to it for call-site compatibility. 4. SkillManager.refreshCache uses Promise.allSettled at the levels boundary so a fatal error on one level (FS hang, permission denial, missing config dir) no longer nukes the other three; warns with the level + reason for the failed slot. 5. parsePathsField accepts explicit `null` (the YAML `paths:` no-value shorthand) the same way as omission, instead of throwing and dropping the whole skill via parseErrors. Matches the leniency of `argumentHint` and `whenToUse`. 6. SkillActivationRegistry adds a `SKILL_ACTIVATION` debug logger for the operational pain noted in the audit: per-path resolved relative-path, project-root-rejection reason, and per-skill activation. Also gives oncall a grep target for "why did/didn't skill X activate?" without source-reading. Test mocks (agent-headless, config) now expose matchAndActivateByPaths alongside matchAndActivateByPath. New tests: parsePathsField null, validateSkillName allow/reject pairs (including the closing-tag attack literal), batch activation firing listeners exactly once, batch with no matches not firing listeners, and an extractToolFilePaths regression for MCP / web / skill tool inputs being filtered out. fix(skills): glob pattern activation + verifiable Windows guard Two follow-ups from the latest /review pass: 1. `extractToolFilePaths` now extracts `pattern` for `ToolNames.GLOB` in addition to the existing `path` field. The shape `glob({ pattern: 'src/*/.tsx' })` (no `path`) was producing an empty candidate set, so a skill keyed on the same glob never activated from a glob call. Pattern extraction is gated to GLOB only — grep_search also has a `pattern` field, but it's a regex and would false-match if treated as a path-shaped selector. 2. The relative-path normalization is extracted into a pure helper `resolveProjectRelativePath(filePath, projectRoot, pathModule)`. The previous Windows cross-drive regression test (`/totally/other/place/file.ts` against `/project`) actually exercised the older `..` outside-root branch on POSIX runners, so the new `path.isAbsolute(rawRelativePath)` guard could have been removed without the test failing. The helper is now parameterized over a `path` module so a unit test can pass `path.win32` directly and pin the cross-drive case (`D:\\other\\file.ts` against `C:\\project`) deterministically on any host OS. Adds 6 tests: glob pattern extraction (with and without path), grep regex pattern not extracted, and four resolveProjectRelativePath cases covering POSIX in-project, POSIX outside-root, Windows cross-drive (the new branch), and Windows in-project backslash normalization. * fix(skills): join glob.path with glob.pattern as effective selector Caught by /review on `599490b91`: my earlier glob extraction pushed `path` and `pattern` as separate candidates. `glob({ path: 'src', pattern: '*/.ts' })` produced `['src', '*/.ts']` — neither component matches a skill keyed on `paths: ['src/*/.ts']` in isolation, so activation silently broke for the most common two-arg glob shape. The glob call actually searches `<path>/<pattern>`. Replace the standalone pattern push with `path.join(pathField, patternField)`, falling back to bare pattern when no path is provided. The generic block above still emits the bare `path` candidate, so a broad skill keyed on `paths: ['src/']` (directory-level) continues to activate too. Combined output for the regression example: `['src', 'src//.ts']` — covers both the directory- level and file-level skill cases. Adds three tests: an updated unit test pinning the joined effective selector, an absolute-`path` variant whose joined form gets rejected downstream by the project-root guard (`/tmp/external//.ts`), and the audit-suggested integration regression that pipes `extractToolFilePaths` output straight into `SkillActivationRegistry` and verifies a `paths: ['src/*/.ts']` skill activates from `glob({ path: 'src', pattern: '*/.ts' })`. * fix(skills): join glob.path with glob.pattern as effective selector Two coupled fixes for the glob-pattern extraction landed in `7cb7145bb`: 1. Windows CI failure. `path.join('src', '*/.ts')` returns `'src\\*\\.ts'` on Windows (OS-aware separator). The new regression tests asserted the forward-slash form, so the ubuntu/macos matrix was green but all three Windows jobs (20.x/22.x/24.x) failed. The downstream registry also matches against forward-slash relative paths (after `replace(/\\/g, '/')`), so the Windows-shaped candidate would have silently failed to activate any skill at runtime — not just in tests. 2. `..` normalization. `path.join('src', '../.ts')` collapses to `'.ts'`, losing the information that the glob actually escaped its `path` root. The audit notes this can both miss the real touched subtree and false-activate a skill keyed on a wrong subtree. Concat preserves the selector verbatim. Replace `path.join(pathField, patternField)` with `${pathField.replace(/[\\/]+$/, '')}/${patternField}` per the audit's exact suggestion. Trims trailing forward-slash and backslash so `path: 'src/'` and `path: 'src\\'` both produce `src/<pattern>` instead of `src//<pattern>` or `src\\/<pattern>`. Adds three tests covering: `..` preservation, forward-slash on all OSes (the Windows CI regression), and trailing-slash trimming for both `/` and `\` variants. * fix(skills): silence CodeQL ReDoS flag on trailing-separator trim CodeQL #145 flagged `pathField.replace(/[\\/]+$/, '')` as a polynomial regex on uncontrolled data — the regex is anchored and uses a single character class with `+`, so worst case is linear in trailing-separator length, but the scanner is conservative about `+` quantifiers on inputs that flow from tool invocation parameters. Replace the regex with an explicit `endsWith` loop. Same O(n) behavior on the trailing run, no regex for CodeQL to chew on. Existing trailing-slash test (forward and back) still passes. * fix(skills): comprehensive review pass — security, correctness, robustness Eleven findings from /qreview (claude-opus-4-7), grouped by area: CORRECTNESS - C1: appendAdditionalContext silently dropped reminders for any tool whose llmContent is a single non-array Part (read-file returning inlineData for images / PDFs is the canonical case). Both the ConditionalRulesRegistry rule reminder and the path-conditional skill activation reminder were lost. Wrap the single-Part case into an array so the addition still lands. - S2: Legacy tool-name aliases (`replace` → `edit`, `search_file_content` → `grep_search`, `task` → `agent`) bypassed FS_PATH_TOOL_NAMES. The registry resolves the alias at execute time but `request.name` keeps the alias, so `replace({ file_path: ... })` produced empty candidates and missed activation. Canonicalize via `ToolNamesMigration` before the allowlist check. - S5: `new SkillActivationRegistry(...)` ran picomatch unguarded — pathological patterns (oversize / broken extglob) could throw and abort all of `refreshCache`. Wrap each picomatch call in try/catch inside the constructor; drop the bad pattern, keep the rest of the skill, log via debugLogger. - S7: Extension parser (skill-load.ts) silently dropped `disable-model-invocation` and `when_to_use`. Now that we have `paths:`, that meant an extension SKILL.md with both `paths:` and `disable-model-invocation: true` would still fire path-activation reminders for a skill the model can't invoke — directly contradicting the bug_004 fix at the project/user level. - S8: SkillTool discarded the `addChangeListener` cleanup function and had no `dispose()`. Subagents share the parent's SkillManager via `InProcessBackend.createPerAgentConfig`, so each per-subagent SkillTool registered another listener; with the listener pipeline now async, every path activation serialized through every stale subagent's refresh chain. Mirror AgentTool: store the cleanup, expose `dispose()`. SECURITY / SUPPLY-CHAIN - S11: `validateSkillName`'s `/^[a-zA-Z0-9_:.-]+$/` rejected every non-ASCII name on upgrade, silently dropping CJK / Cyrillic / accented Latin skills. The structural-injection guard targets `<>"'/\n\r\t` etc; entire Unicode planes are not the threat. Widen to `/^[\p{L}\p{N}_:.-]+$/u`. Update docs/users/features/ skills.md to match. - S10: `parsePathsField` only validated shape (must-be-array). Now also reject leading-slash absolute patterns and `..` parent-escape patterns at parse time — these silently never match anything in the activation registry, so an author who writes `paths: ['/etc/passwd']` or `['../.ts']` would otherwise see the skill in /skills and never understand why it never activates. ROBUSTNESS - S3: `coreToolScheduler` emitted "skill X is now available via the Skill tool" even when the calling subagent's tool registry did not expose SkillTool (subagent's `tools:` allowlist excluded `skill`). Gate the reminder on `toolRegistry.getTool(ToolNames.SKILL)`. - S4: `extensionManager.refreshMemory` used `Promise.all` so a rejection from skill or subagent refresh nuked the other leg AND the hierarchical-memory refresh below it. Switch to `Promise.allSettled`, log each rejection, and `await` the hierarchical refresh too (the comment justifies awaiting; the code didn't). - S9 / S12: `docs/users/features/skills.md` claimed `paths:` only gates model discovery and slash invocation always works. True for the user-side path itself, but if the model then tries to chain off the user's invocation (call `Skill { skill: ... }` itself), validateToolParams returns "gated by path-based activation" — contradicting the doc. Rephrase to call out the model-side limitation explicitly. DEFERRED - S6: notifyChangeListeners swallows per-listener errors and the reminder still fires. Real concern but the fix needs an API shape change (listener-failure signal back to the scheduler); worth its own design discussion. Logged here for follow-up. Adds 12 regression tests across the 7 affected files. 632 tests pass; types and lint clean. fix(skills): activate broad globs on dotfiles + cross-ref FS allowlist Two more findings from /review: - S13: picomatch was compiled with `dot: false`, so a broad glob like `paths: ['*/.js']` silently excluded `.eslintrc.js`, `.env`, `.github/foo.yml`, etc. The hidden-file exclusion is gitignore-style semantics — wrong for activation, where the question is "did the model touch a file matching this glob." Switch to `dot: true`. - S14: `FS_PATH_TOOL_NAMES` is a manually maintained allowlist with no compile-time guard — adding a new FS tool without updating the set silently drops the tool out of the activation pipeline. Add a cross-ref comment at the top of `ToolNames` in `tool-names.ts` pointing maintainers at the allowlist site, plus a TODO noting the long-term fix is per-declaration `pathFields?: string[]`. The cross-cutting refactor is its own PR. Adds one regression test (`activates broad globs on dotfiles too`) that pins the dot:true semantics on `*/.js` matching `.eslintrc.js`. 211 skill-area tests pass. * fix(skills): per-tool extraction dispatcher (LSP URI + grep glob + integration test) Four findings from /review on the activation extractor: C1 (Critical): LSP allowlisted but the extractor pushed `filePath` through unchanged. The LSP tool accepts non-file URI schemes (`http://`, `git://`, etc.); forwarding any of those to SkillActivationRegistry as a project-relative candidate let an LSP call against a non-file resource activate path-gated skills without the model touching a real project file. Fix is two-part: decode `file://` URIs via `fileURLToPath` (so a project file expressed as a URI still activates correctly) and silently drop any string containing `://` that's not `file://`. S1: LSP `incomingCalls` / `outgoingCalls` operate on `callHierarchyItem.uri`, not the top-level `filePath`. After `prepareCallHierarchy` returns a file-backed item, following the hierarchy with that item produced no candidate, so path-gated skills for that file stayed dormant. Same URI-aware extraction is applied to the nested `uri` field. S2: grep_search has a path-shaped `glob` field (`GrepToolParams.glob`) — distinct from `pattern`, which is a regex on contents. The extractor previously ignored `glob`, so `grep_search({ pattern, glob: 'src/*/.ts' })` produced no activation candidate even though the call walked every file under `src/*/.ts`. Same `path + glob` join treatment as GLOB. S3: No scheduler-side integration test covered the extractToolFilePaths → matchAndActivateByPaths → reminder-append wiring, so a regression there could land while extractor and registry unit tests still passed. Added three integration tests covering: (a) reminder appended when SkillTool present, (b) reminder suppressed when SkillTool absent (subagent case), (c) hook not invoked for non-FS tools. Restructured `extractToolFilePaths` from a generic `file_path/filePath/path/paths` extractor into a per-tool dispatcher (`switch` on canonical tool name). The previous generic shape was overly permissive — every FS tool got every field name, including ones it doesn't accept — and it was the wrong shape to add LSP URI semantics to. Per-tool means each branch reflects the actual `XToolParams` interface. Test reshape: - Removed tests asserting cross-tool field acceptance (e.g. grep reading `filePath` / `paths`); those documented inaccurate input. - Added per-tool realistic tests for grep glob, lsp file:// URI, lsp callHierarchyItem.uri, lsp non-file scheme dropped. - Plus the three CoreToolScheduler activation wiring tests. 639 tests pass (was 632); types and lint clean. DEFERRED S4: Activation driven from input selector rather than concrete matched files. For `glob({ pattern: '*/.ts' })` the selector itself may not match a skill scoped narrower than the query. Real concern, but the fix needs typed result-path metadata feedback from each tool — a cross-cutting addition to every FS tool's return shape. Logged for follow-up. * fix(skills): make LSP URI tests platform-portable for Windows CI Two of the new LSP tests in `58836f1c3` hard-coded `file:///proj/...` URIs. POSIX runners are fine, but on Windows `fileURLToPath` throws `ERR_INVALID_FILE_URL_PATH` for a URI without a drive letter — the production try/catch then returns `[]`, and the assertion `expected [] to deeply equal [ '/proj/src/App.ts' ]` fails. Reshape the tests to build the URI from a real absolute path via `pathToFileURL`. The URI shape becomes the host's natural form (`file:///tmp/...` on POSIX, `file:///C:/.../tmp/...` on Windows), and the round-trip through `fileURLToPath` always succeeds. Production code unchanged. * fix(skills): XML-escape description/whenToUse; symlinks in skill-load.ts; dot:true in ConditionalRulesRegistry Agent-Logs-Url: https://github.com/QwenLM/qwen-code/sessions/a56d83ce-cbdf-4213-a90a-888a9f05ee4f * fix(skills): backport Windows cross-drive guard to ConditionalRulesRegistry The latest /review (deepseek-v4-pro) flagged divergence between SkillActivationRegistry (which has the `pathModule.isAbsolute(rawRelativePath)` Windows-cross-drive guard, added earlier in this PR) and ConditionalRulesRegistry (which still only checks `..` / `../` prefixes). On Windows, `path.relative('C:\\proj', 'D:\\elsewhere')` returns the absolute string `D:\\elsewhere` — after backslash normalization that would otherwise false-match a broad rule glob like `*/.ts`. Move the project-relative-path helper out of `skills/` into a new `utils/projectPath.ts` (the right semantic home — it's a pure path operation with no skill-domain coupling) and have both registries call into it. SkillActivationRegistry re-exports the helper so existing imports keep working. Adds a regression test in `rulesDiscovery.test.ts` for the off-project path case (covers both POSIX `..` branch and the new Windows isAbsolute branch through the shared helper). Direct `path.win32`-parameterized cover already lives in `skill-activation.test.ts`. 252 skill+rules tests pass. * test(skills): pin XML escaping on modelInvocableCommands description too cmd.description already routes through escapeXml in skill.ts:204 (landed in `b1d9324f5`), but no test pinned the cmd path — only the skill.description / whenToUse path. Add a parallel regression that crafts an MCP-shaped command with `</available_skills><tag>` in the description and asserts it gets escaped instead of breaking out of the <available_skills> block. * fix(skills): escape cmd.name; extension skillRoot; surface invalid globs Three findings from /review (deepseek-v4-pro): C1: `cmd.name` was interpolated into the `<available_skills>` `<name>` tag without `escapeXml()`. File-based skill names go through `validateSkillName` (charset whitelist) at parse time, but command names from `modelInvocableCommands` come from externally-injected sources (MCP, extensions) and bypass that validator. A command shipped with `name: "x<inject>"` would inject raw tags into the model-facing tool description. Wrap `cmd.name` in `escapeXml`, parallel to the existing `cmd.description` escape one line below. C2: `parseSkillContent` in `skill-load.ts` (the extension parser) never set `skillRoot: path.dirname(filePath)`. The project/user/bundled parser in `skill-manager.ts` does, and `registerSkillHooks.ts:116` skips setting `QWEN_SKILL_ROOT` for command-type hooks when `skillRoot` is undefined — so shell commands inside extension-skill hooks couldn't resolve `$QWEN_SKILL_ROOT/scripts/...` references. Add the field. Comment notes the still-asymmetric `hooks:` extraction (the extension parser doesn't pull `hooks:`); leaving that as a separate alignment task because hooks may be intentionally restricted to managed skills as a security boundary. S3: Invalid `paths:` globs were only logged at debug level. Author writes `src/**/file.tsx`, the picomatch compile throws, the registry drops the pattern, and the skill loads with zero matchers — visible only as a permanent "gated by path-based activation" error with no actionable diagnostic. Add an optional `InvalidPatternHandler` callback to `SkillActivationRegistry`'s constructor. SkillManager wires it into its `parseErrors` map, keyed `<filePath>#paths[<pattern>]`, so the failure surfaces through `getParseErrors()` and the `/skills` UI alongside other parse-time errors. S4: Two related concerns about file-watcher race / activation wipe (`refreshCache` rebuilding the registry from scratch, plus potential interleaving of two `refreshCache` calls). Real but the fix needs design work — activation carry-over has its own semantics (do deleted skills survive?), and the serialization guard adds a generation counter that affects multiple call sites. Logged for follow-up. Three regression tests added: cmd.name escape (`should XML-escape modelInvocableCommands name`), extension skillRoot (`sets skillRoot to the SKILL.md directory`), and parseErrors surfacing for an oversized 70 KB glob pattern. 205 skill-related tests pass. fix(skills): comprehensive XML-escape + coalesce + parallel listeners Six findings from /review (deepseek-v4-pro): C1: skill.name interpolated raw into <available_skills>. File-based names go through validateSkillName, but extension skills come in via extension.skills (skill-manager.ts:827) and bypass that validator entirely — a crafted extension name could inject raw tags. Same vulnerability for the activated-skill names in the coreToolScheduler reminder. Wrap both in escapeXml. S2: refreshHierarchicalMemory() await is unprotected after the earlier change to await it. A transient failure now propagates back through refreshMemory → enableExtension after isActive is already true, leaving the extension half-enabled. Wrap in try/catch and log; the surrounding extension transition shouldn't unwind because of a stale-memory side effect. S3: escapeXml duplicated between skill.ts and background-tasks.ts. Extract to utils/xml.ts; both call sites import from there. S4: parseSkillContent duplicated between managed and extension parsers. Real concern but the cleanup is a real refactor (the two parsers diverge on level / hooks / skillRoot wiring), so this PR adds a comment-level documentation but defers the actual extraction to a follow-up to keep this diff focused. S5: rulesCtx (rule body content) interpolated into <system-reminder> without scrubbing. A rule whose content contained literal `</system-reminder>` (e.g. a doc rule about reminders) would close the envelope early. Apply a targeted scrub of the closing-tag literal in the joined body. Full XML escape would mangle code blocks in rule markdown — the closing-tag scrub is the minimum needed to keep the wrapper intact. S6: notifyChangeListeners awaited listeners sequentially. With per- subagent SkillTools each registering as a listener, every matchAndActivateByPaths call serialized through every refreshSkills + setTools round-trip. Switch to Promise.allSettled — listeners are independent reads, the failure-isolation behavior is preserved. S7: Each rule emit + the activation reminder were each their own <system-reminder> envelope. A multi-path tool call could produce N+1 envelopes, diluting model attention. Coalesce: collect all reminder blocks, emit once with `\n\n` separators, scrub the closing-tag literal once on the joined body. Tests added: - skill.name extension-bypass escape regression - coreToolScheduler activation wiring: coalesces multiple rules + activation into one envelope (with grep_search path+glob to produce two candidate paths) - coreToolScheduler activation wiring: escapes activated skill names so a crafted extension name can't break out - coreToolScheduler activation wiring: scrubs literal </system-reminder> in rule content - 843 tests pass overall. * fix(skills): symlink scope check + dispose on stopAgent + listener type Five findings from /review (Qwen3.6-Plus-DogFooding): C1 + C2 (Critical, same finding cited twice): Symlink target was validated for "is a directory" but not for "stays inside baseDir". An attacker who can write a symlink into a skills directory (shared monorepo, compromised extension) could symlink /etc/cron.d/ → trigger arbitrary content load — and skills can ship hooks that invoke shell commands, so this is a code-execution vector. Apply realpath + prefix check in both symlink branches (skill-load.ts AND skill-manager.ts). Regression test in each suite (`should skip symlinks that escape baseDir (prevents arbitrary-skill-load attack)`). C3: SkillTool.dispose() existed but was only called from ToolRegistry.stop() at full shutdown. Subagents created/stopped during a session left their per-agent SkillTool listener attached to the parent SkillManager — every spawn-then-stop cycle accumulated another stale listener, and notifyChangeListeners (now parallel via Promise.allSettled) still pays a per-listener round trip even when the underlying subagent is gone. Convert InProcessBackend.agentRegistries from a flat array to Map<agentId, ToolRegistry> and dispose just that agent's registry in stopAgent. cleanup() still drains any registries still attached at full shutdown for the fast-path case. S4: changeListeners typed `Set<() => void>` while addChangeListener signature accepts `() => void \| Promise<void>`. The runtime Promise.resolve().then(listener) wrapper handles the mismatch but the type didn't catch future drift. Widen the field type to match the parameter signature. S6: FS_PATH_TOOL_NAMES allowlist has no compile-time guard. Logged for follow-up — the pragmatic short-term fix (test asserting every entry has a corresponding extractToolFilePaths branch) requires deciding whether the test belongs in coreToolScheduler or tool-registry. Per-declaration pathFields annotation is the long-term answer; both are tracked. S7: setTools concurrency. Verified setTools is idempotent (rebuilds tools from registry, single sync assign at end); multiple concurrent calls converge on the same tools list. Added an inline note rather than a runtime mutex. Defer: - S5: refreshCache wipes all activations. Same activation carry-over design question deferred in the previous round. * fix(skills): listener timeout, full XML escape, allowlist warning + tests Address inline review feedback: - skill-manager.notifyChangeListeners: 30s per-listener timeout via Promise.race so a hung listener (e.g. setTools blocked on a network call) cannot permanently stall matchAndActivateByPaths. Timer is unref'd to avoid keeping the event loop alive. - types.parsePathsField: tighten parse-time validation. Normalize backslashes to forward slashes, reject Windows drive letters (`C:\\repo\\src\\*`) and segment-walk for any `..` (catches `./../.ts`, `src/../../*`, `..\\secret\\.ts`). Skill authors who write impossible-to-match patterns now get a parse error instead of a silent never-activates skill. - utils/xml.escapeXml: widen to all five XML metacharacters (`&<>"'`), not just three. Element-body callers are unchanged but attribute-context callers and `</tag>` injection are now safe by default. monitorRegistry drops its local copy in favor of the shared helper. - coreToolScheduler.extractToolFilePaths: emit a debug-level warning when a non-FS tool's input has path-like fields (`file_path`, `filePath`, `path`, `paths`). Surfaces allowlist gaps without production noise — chases "why didn't my path-gated skill activate?". - Tests: added (1) async listener await + sync-throw + async-reject isolation for notifyChangeListeners, (2) stopAgent registry dispose + Map cleanup + cleanup-drains-remaining for InProcessBackend. * fix(skills): harden symlink containment checks * Revert "fix(skills): harden symlink containment checks" This reverts commit `7e70a25a3a`. * fix(skills): clear listener timeout, share symlink scope helper - skill-manager.notifyChangeListeners: clear the per-listener setTimeout in `.finally(...)` once the race settles. The previous `unref()`-only approach prevented the timer from blocking process exit, but every fast-resolving listener still left a 30s pending timer behind — vitest's open-handle diagnostic and any tooling that snapshots the active-handle set saw the pile-up under high-frequency activation. - New skills/symlinkScope.ts: shared `validateSymlinkScope` helper. Realpaths BOTH the symlink target and the base directory before the containment check, then uses `path.relative` (rather than `realPath.startsWith(base + sep)`) for cross-platform safety. The prior asymmetric form — `realpath(target)` against the raw `path.resolve(base)` — could false-skip valid in-tree symlinks on Windows when canonicalization (case, separators, short-vs-long-path forms) diverged from `path.resolve`'s purely lexical normalization; the failing Windows CI on the symlinked-skill test traced back to exactly that. `path.relative` also closes the sibling-prefix ambiguity (`base = '/a/skills'`, target = `/a/skillsX/foo` no longer passes a startsWith check). - skill-load.ts and skill-manager.ts both delegate to the shared helper. Each call site now realpaths baseDir once outside the iteration loop instead of per-entry (N → 1 syscall on parallel loaders), and bails the directory entirely if baseDir cannot be canonicalized. - Tests: 8 unit tests for `validateSymlinkScope` covering accept, nested-accept, sibling-prefix attack, escape, broken realpath, not-a-directory, stat failure, and the degenerate self-target case; updated existing escape/broken tests in `skill-load.test.ts` / `skill-manager.test.ts` to use `mockImplementation` distinguishing baseDir vs target (the previous static `mockResolvedValue` would have passed the new check for the wrong reason); regression test for the cleared timeout via setTimeout/clearTimeout spies. * fix(skills): segment-aware symlink containment, accepts ..-prefixed names The previous `rel.startsWith('..')` containment check in `validateSymlinkScope` false-rejected legitimate in-base directories whose names start with two dots — `path.relative('/base', '/base/..shared/foo')` returns `'..shared/foo'`, which is a real filename shape, not a parent-traversal escape. Switch to a segment walk: `rel.split(/[/\\]/)[0] === '..'` correctly distinguishes: - `'../foo'` → segments[0] = '..' → escapes ✓ - `'..shared/foo'` → segments[0] = '..shared' → in-scope ✓ - `'..bar'` → segments[0] = '..bar' → in-scope ✓ - `'..\\foo'` (Win) → segments[0] = '..' → escapes ✓ Tests: two new regressions in `symlinkScope.test.ts` covering the multi-segment (`..shared/foo`) and single-segment (`..bar`) cases. --------- Co-authored-by: wenshao <wenshao@U-K7F6PQY3-2157.local> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: yiliang114 <1204183885@qq.com>	2026-05-05 00:28:53 +08:00
Shaojin Wen	7f5b92b076	feat(core): hint to background long-running foreground bash commands (#3809 ) * feat(core): hint to background long-running foreground bash commands Phase D part (a) of Issue #3634. When a foreground `shell` tool call runs ≥ 60 seconds and completes (succeeds or errors), append an advisory line to the LLM-facing tool result suggesting re-running with `is_background: true` next time. Why: today a foreground bash that takes minutes (build watcher, soak test, slow npm install, polling loop) blocks the agent indefinitely. The user is already paying for the wait; the agent's next turn could have started running in parallel under `is_background: true`. Sleep interception (#3684) handled the egregious `sleep N` case at validate time; this handles the legitimate-but-long case at result time. Trade-offs: - Threshold = 60s. Half the existing 120s foreground timeout. Long enough that normal `npm install` / `pytest` runs don't trigger; short enough that the hint surfaces before the timeout hard-kills. - Advisory only — the command still runs to completion in the foreground for THIS invocation. The advice is for the agent's NEXT decision, not a corrective action on the current one. - Fires on success AND error completions. The advice is the same ("background it next time") in both cases. - Suppressed on aborted (timeout / user-cancel) — those paths already surface their own messaging and don't benefit from a "should have been background" reminder when the user / system already killed it. Implementation: - New constant `LONG_RUNNING_FOREGROUND_THRESHOLD_MS = 60000` in shell.ts, paired with the existing `DEFAULT_FOREGROUND_TIMEOUT_MS`. - Helper `buildLongRunningForegroundHint(elapsedMs)` exported so future surfaces (UI, telemetry) can render the same text without duplicating the threshold logic. - `Date.now()` bracketing around the spawn → `await resultPromise` block — mirrors what the background path already captures via `entry.startTime`. - Append happens inside the existing non-aborted result builder; zero changes to the cancel / timeout arms. Tests: 4 new cases — fires on long success, omits on short success, fires on long error completion, omits on aborted. Uses vi fake timers to drive wall-clock past the threshold without actually sleeping. * fix(core): tighten long-run hint suppression + boundary tests + post-truncation insertion Addresses 8 review threads on PR #3809 — 6 from /review bots, 2 from copilot — covering doc accuracy, code quality, behavioural gaps, and test coverage. Behavioural fixes (real bugs): - Suppress on external signal kills (`result.signal != null` with `aborted: false`). `shellExecutionService` only sets `aborted` when the AbortSignal we passed was triggered, so SIGTERM from container shutdown / k8s eviction / OOM killer / sibling process-group reap falls through to the non-aborted branch. The advisory shouldn't fire there — the process didn't run to its conclusion, so "next time, background it" doesn't fit. New test pins this with `signal: 15` (SIGTERM), `aborted: false`. - Append AFTER `truncateToolOutput`. Previously the hint was appended inside the non-aborted result builder, which meant for long outputs it got wrapped in the "Truncated part of the output:" envelope — the LLM might read the advisory as part of the command's own output. New post-truncation insertion + test that pins ordering by mocking `truncateToolOutput` directly (real path needs `fs.writeFile` to actually succeed for the replacement branch to fire). - Hint wording mode-aware. The dialog mention dropped the unconditional "(footer pill + Enter)" specifics, which would mislead non-TTY users (`-p` headless / ACP / SDK consumers — no dialog or pill exists there). Now qualified as "in interactive mode the Background tasks dialog also has...". `/tasks` and the on-disk output file are mentioned without qualifier (work in any mode). Code quality: - Threshold programmatically coupled to timeout: `LONG_RUNNING_FOREGROUND_THRESHOLD_MS = Math.floor(DEFAULT_FOREGROUND_TIMEOUT_MS / 2)`. If the timeout is tuned later, the threshold tracks automatically. - Docstring corrected: removed the misleading "before it gets killed by the timeout" claim — the hint is on non-aborted path only, so timeout-killed commands never see it. The new docstring enumerates all suppression paths explicitly. - Removed stale line-number reference: comment said "mirrors the background path's `entry.startTime` capture (line ~781)" which goes stale on file edits. Now refers conceptually. Test coverage gaps closed: - Off-by-one boundary: 59_999ms → no hint. Pairs with the existing 60_000ms-exactly test (which fires) to pin the boundary tightly. A regression flipping `>=` to `>` would fail loudly. - Timeout path explicit: previous "aborted" test exercised user- cancel only. With `vi.useFakeTimers({ toFake: ['Date'] })`, `AbortSignal.timeout()` doesn't fake (it depends on the real timer subsystem), so `combinedSignal.aborted` stayed false. New test follows the pre-existing `should handle timeout vs user cancellation correctly` pattern: stubs `AbortSignal.timeout` + `.any` to return an already-aborted combined signal, then verifies "Command timed out after Nms" appears AND no advisory. * fix(core): per-invocation long-run threshold + debug-mode + test isolation Six suggestions from /review's third pass on PR #3809: Real semantic fix: - Long-run threshold now scales with the EFFECTIVE timeout, not the fixed default. A user who sets `timeout: 600_000` (10 min) gets the advisory at 5 min, not at 60s — respects the explicit timeout intent. Replaced the `LONG_RUNNING_FOREGROUND_THRESHOLD_MS` constant with a per-invocation `longRunThresholdFor(effectiveTimeout)` helper. Debug-mode visibility: - Debug mode previously snapshotted `returnDisplayMessage = llmContent` BEFORE the truncation + hint append, so debug-mode users saw the pre-hint content while the agent saw the advisory — agent suddenly suggesting `is_background: true` had no visible trigger in the TUI. Re-sync `returnDisplayMessage` after the hint append (debug-mode branch only) so the TUI mirrors what the agent sees. Type-safety footgun: - `if (typeof llmContent === 'string')` would silently drop the hint if `llmContent` ever becomes structured `Part[]`. Added an explicit `else` comment documenting the deliberate omission and the conditions under which to revisit (no string llmContent path exists today). Style: - Replaced the JSDoc `/** ... /` block on the (now-defunct) constant with a plain `//` comment block on the helper, matching the `DEFAULT_FOREGROUND_TIMEOUT_MS` / `OUTPUT_UPDATE_INTERVAL_MS` style. Test hygiene: - Wrapped both `vi.stubGlobal('AbortSignal', ...)` and `vi.spyOn(truncateToolOutput, ...)` in `try/finally` so failures during the test body don't leak the stub/spy into subsequent tests (would cause confusing cascading failures). - Dropped the internal-roadmap "Phase D part (a)" reference from the test comment — future maintainers don't have the context. New test: - `threshold scales with the user-supplied timeout (not the default)`: sets `timeout: 600_000`, advances 100s, verifies no hint. Pins the per-invocation coupling so a regression to a fixed constant would fail loudly here. fix(core): tighten long-run hint suppression + boundary tests + post-truncation insertion (round 4) Six suggestions from /review's pai/glm-5-fp8 pass on PR #3809: Behavioural / UX: - Hint now visible in non-debug TUI too. Previously only debug mode mirrored the hint into `returnDisplay`; non-debug users saw the agent suggest `is_background: true` with no visible trigger. Now the hint is appended to `returnDisplayMessage` in both modes (full mirror in debug, terse-append in non-debug to preserve the output-or-status form). Test coverage: - Debug-mode re-sync test added. All other long-run hint tests run with `getDebugMode → false`; this one flips it to true and asserts the hint appears in `returnDisplay` too. Pins the re-sync so a regression that drops the debug branch would fail loudly. - Threshold-scaling positive test added. The negative case (`timeout: 600_000`, advance 100s, no hint) was already pinned; paired now with the positive case (advance 305s, hint fires) so a regression to a fixed 60s threshold is caught at both ends. Style / consistency: - `result.signal === null` (was `== null`). Strict equality to match the rest of the file. The `signal` field is typed `number \| null` so loose equality has identical semantics, but the inconsistency was noise. Doc clarity (timing semantics): - Comment explains why elapsedMs is computed BEFORE truncation. Two reviewers disagreed on the timing — one read it as before truncation (correct, slightly under-reports), the other as after (incorrect read). The intent is to report the COMMAND's runtime, not the tool call's total time. Truncation is post-processing, not part of "agent blocking time", so excluding it is the right semantic. Inline comment now spells this out so future readers don't have to infer. * fix(core): error-path hint surfacing + clock-resilient elapsed + threshold floor + observability Round 5 of PR #3809 review — 10 threads, mix of Critical and Suggestion: Critical fixes: 1. Hint survives the error path (`#OWbA`). When result.error is set, coreToolScheduler builds the model-facing functionResponse from `error.message` ONLY (not llmContent — see convertToFunctionResponse + the toolResult.error branch in scheduler:1648-1724). My hint was being silently dropped on long-command-failed cases. Now the hint is appended to error.message too so the advisory survives whichever branch the scheduler takes. 2. Hint wording de-ambiguated (`#OU6o`). "prefer re-running with is_background: true" was ambiguous — model could read it as "re-run THIS command in the background", which on stateful commands (DB migrations, deploys, git push) would cause double side effects. Reworded to "Next time you run a SIMILAR long-running process..." with an explicit parenthetical that warns against re-running the just-completed command. 3. Debug observability (`#OU6s`). Added `debugLogger.debug` at the hint decision point with elapsedMs / threshold / aborted / signal — when a user reports "my 65s command didn't get the hint" the suppression branch is now visible in DEBUG output. Other behaviour fixes: 4. Threshold floor of 1000ms (`#OU6r`). Pathological `timeout: 0` / `timeout: 1` would have given a 0-ms threshold, firing the hint on every invocation showing "ran for 0s". Floor at 1s makes that branch unreachable. 5. `performance.now()` instead of `Date.now()` (`#OU6v`). NTP corrections / VM clock drift between capture and read would silently make `elapsedMs` negative and skip the hint with no observable failure. Monotonic clock prevents that. 6. Debug mode preserves truncation marker (`#OU6w` / `#OWCq`). Previously `returnDisplayMessage = llmContent` after hint clobbered the "Output too long and was saved to: …" line appended during truncation. Switched to append-style re-sync in BOTH modes so prior content is preserved. Test coverage gaps closed: 7. Non-debug returnDisplay test (`#OWCo`). Pinned that the user TUI gets the hint in the default (non-debug) mode too. 8. Test rename (`#OWCl`). The "debug-mode TUI mirror" test passed in non-debug too after the recent refactor; split into two tests, one per branch. 9. Error-path hint test. Added a test that pins `result.error?.message` contains both the original error text AND the hint, covering the scheduler-routing-via-error.message path that was silently broken before fix #1. 10. Test: faketimers also fakes `performance`. Since we switched to `performance.now()`, `vi.useFakeTimers({ toFake: ['Date'] })` no longer covered the elapsed measurement; extended to `['Date', 'performance']` so the threshold tests can drive the wall-clock with `advanceTimersByTimeAsync`. #OU6t (else-comment for the type guard) was already addressed in the prior round — the explicit else-with-comment is in place; adding logging there would be noise. * test(core): cover the MIN_LONG_RUN_THRESHOLD_MS floor branch PR #3809 review: the new `Math.max(MIN_LONG_RUN_THRESHOLD_MS, ...)` floor in `longRunThresholdFor` was untested — only default-timeout and large-custom-timeout cases existed. A regression that strips the floor would let `timeout: 1` produce a 0ms threshold and fire a "ran for 0s" advisory on every invocation; the test suite would not catch it. New test: build with `timeout: 1`, advance 500ms (below the 1000ms floor), resolve with `aborted: false` to isolate the threshold logic from the abort path. Asserts no hint appears. A regression that removes the floor flips the assertion to fail. * fix(core): structured delimiter on error.message hint + tighten timeout floor comment Two of three threads from the latest /review pass on PR #3809 (the third — PR description / threshold scaling reconciliation — is fixed in the PR description update, not in code): - `\n---\n` divider before hint in `error.message` (`#Pt7C`). Downstream consumers of `error.message` (firePostToolUseFailureHook, telemetry grouping, SIEM alerting, hook-side error parsers) were receiving ~400 chars of advisory text mixed inline with the original error body — pattern-matching on error messages would absorb the advisory into the matched body. Added a `---` separator line so the boundary is unambiguous and split-able. - Threshold-floor comment narrowed to `timeout: 1` (`#Pu9o`). The comment said the floor guards `timeout: 0` / `timeout: 1`, but `validateToolParamValues` rejects `timeout <= 0` at validate time, so `timeout: 0` can't reach `longRunThresholdFor`. Updated the comment to mention only the actually-allowed pathological case (`timeout: 1` and any value `< 2` rounds to 0). Test updated to assert the `---` divider format with `toMatch`. * fix(core): capture executionStartTime AFTER spawn so PTY import isn't counted PR #3809 review: copilot caught that `executionStartTime` was captured BEFORE `await ShellExecutionService.execute(...)`, which meant the elapsed measurement included `getPty()` dynamic-import setup (~50-200ms on first call). The hint's "ran for Xs" reading was slightly inflated, and the comment claiming "spawn → settle" wasn't strictly accurate. Moved the capture immediately after the execute() call returns its { result, pid } handle. The pid being set by that point confirms the process has been spawned, so the subtraction is true post-spawn-to- settle. Comment updated to reflect the actual semantics. The displayed accuracy gain is small (50-200ms on a 60s+ threshold is <1%), but the comment claim now matches what the code measures. Tests unaffected — fakeTimers don't drive real dynamic imports, so the threshold tests behave identically. * fix(core): align long-run hint code/tests with ShellExecutionResult.error semantics Four copilot threads on PR #3809 — all rooted in the same observation: `ShellExecutionResult.error` is reserved for spawn/setup failures (per the field's doc comment in shellExecutionService.ts), NOT for non-zero exit codes. My existing code/tests conflated the two, making the error-path coverage less realistic and the inline comments inaccurate. Test shape fixes: - `appends the hint when a long-running foreground command exits with error` → `exits non-zero`. Changed `error: new Error('exit 1')` to `error: null` (the realistic shape for a non-zero exit without spawn failure). Added a comment explaining the field contract so future test authors don't repeat the conflation. - `hint survives the error path (appended to error.message)`: reframed the mock from `spawn ENOENT` (which would resolve in <1s in practice, making the long-elapsed scenario unrealistic) to `PTY initialization failed after 75s` — a slow-spawn-failure shape that COULD plausibly take 75s. Test still pins the same CODE PATH; comment now acknowledges the edge-case nature ("rare but real: PTY init dragging, remote-fs exec syscalls, security scanners interposing"). Comment corrections: - `returnDisplayMessage` build-order comment was misleading. It said "the hint is appended after both the truncation block and the returnDisplayMessage build" — but `returnDisplayMessage` is built BEFORE truncation. Replaced with a chronological enumeration (1. initial value, 2. truncation marker append, 3. hint append) that matches what the code actually does. - Error-path preservation comment now acknowledges the narrow applicability (spawn failures only, exit codes don't reach this branch). Code is unchanged — the path is still real, just rare. * test(core): pin empty-output success + background-no-hint paths Two defensive tests for the long-running foreground hint: - empty-output success at >=60s — exercises the returnDisplayMessage='' → hint append branch (write-only commands like `tar czf` / `cp -r` produce no stdout). Asserts the user- facing returnDisplay still surfaces the advisory even when the command produced nothing else to show. - background never includes the hint — the foreground hint logic lives in executeForeground only, so today this can't fail; the test guards against a future refactor hoisting the advisory into a shared post-execute path that would tag every background launch with a nonsensical "ran for 0s, consider is_background: true" suggestion.	2026-05-04 23:24:14 +08:00
Shaojin Wen	efb7351d58	feat(core): support reasoning effort 'max' tier (DeepSeek extension) (#3800 ) * feat(core): support reasoning effort 'max' tier (DeepSeek extension) DeepSeek's chat-completions endpoint added an extra-strong `max` tier to `reasoning_effort` (per https://api-docs.deepseek.com/zh-cn/api/create-chat-completion ; valid values are now `high` and `max`, with `low`/`medium` mapping to `high` for backward compat). Plumb it end-to-end: - `ContentGeneratorConfig.reasoning.effort` union now includes 'max'. - DeepSeek OpenAI-compat provider: translate the standard nested `reasoning: { effort }` shape into DeepSeek's flat `reasoning_effort` body parameter so user-configured effort actually takes effect (the nested shape was previously sent verbatim and silently ignored, defaulting to `high`). low/medium → high mirrors the documented server-side behavior so dashboards / logs match wire reality. An explicit top-level `reasoning_effort` (via samplingParams or extra_body) wins over the nested form. - Anthropic converter: pass 'max' through to `output_config.effort` unchanged and bump the `thinking.budget_tokens` budget for the new tier (low 16k / medium 32k / high 64k / max 128k). - Gemini converter: clamp 'max' to HIGH since Gemini has no higher thinking level. Without this, 'max' would silently fall through to THINKING_LEVEL_UNSPECIFIED. Live verification against api.deepseek.com: - `reasoning_effort: high` → 200 - `reasoning_effort: max` → 200 (the new tier) - `reasoning_effort: bogus`→ 400 with valid-set list confirming [high, low, medium, max, xhigh] 108 anthropic/openai-deepseek/gemini tests pass; full core suite (6601 tests) green; lint + typecheck clean. * fix(core): map xhigh→max + clamp max on non-DeepSeek anthropic + docs Address PR review (copilot × 2) and add missing user docs: 1. (J698) `translateReasoningEffort` claimed in the PR description that it surfaces the DeepSeek backward-compat mapping client-side, but only handled `low`/`medium` → `high`. Add `xhigh` → `max` to match the doc and stay symmetric with the low/medium branch. 2. (J6-A) `output_config.effort: 'max'` would have been emitted on any anthropic-protocol provider whenever a user configured it, even when the baseURL points at real `api.anthropic.com` (which only accepts low/medium/high and would 400). Reuse the existing `isDeepSeekAnthropicProvider` detector to clamp `'max'` → `'high'` on non-DeepSeek anthropic backends, with a debugLogger.warn so the downgrade is visible. DeepSeek anthropic-compatible endpoints still pass through unchanged. 3. New docs: - `docs/users/configuration/model-providers.md`: a "Reasoning / thinking configuration" section under generationConfig — single example targeting DeepSeek + a per-provider behavior table (OpenAI/DeepSeek flat reasoning_effort, OpenAI passthrough for other servers, real Anthropic clamp, Anthropic-compatible DeepSeek passthrough, Gemini thinkingLevel mapping). - `docs/users/configuration/settings.md`: extend the `model.generationConfig` description to mention `reasoning` (the field was undocumented before this PR even though it already existed as a typed field) and link to the new section. 96 anthropic + deepseek tests pass; lint + typecheck clean. * refactor(core): single-source effort normalization for anthropic + doc fix Address PR review round 2 (copilot × 2): 1. (J8aG) The `contentGenerator.ts` comment claimed passing `reasoning.effort: 'max'` to real Anthropic was "up to the user", but commit `b5b05ae` actively clamps 'max' → 'high' (with a debug log) on non-DeepSeek anthropic backends. Update the comment to describe current runtime behavior. 2. (J8aL) The clamp ran inside `buildOutputConfig()` only — the effort label was downgraded but `buildThinkingConfig()` still used the raw user value to size the budget, so a non-DeepSeek anthropic request could end up with `output_config.effort: 'high'` paired with a 'max'-sized 128K thinking budget. Inconsistent label vs. budget on the wire. Refactor: hoist the normalization into a single `resolveEffectiveEffort()` helper that runs once per request in `buildRequest()`. Both `buildThinkingConfig` and `buildOutputConfig` now consume the same clamped value, so the budget ladder and the effort label stay aligned. The debug log fires once per request. Add a regression test asserting that on a non-DeepSeek anthropic provider with `effort: 'max'` configured, the wire request carries both `output_config.effort: 'high'` AND `thinking.budget_tokens: 64_000` (the 'high' tier), not the 128K 'max' budget. 96 tests pass; lint + typecheck clean. * fix(core): tighten 'max' clamp + warn-once + strip reasoning_effort on side queries Address PR review round 3 (copilot × 3): 1. (J-2v) When request.config.thinkingConfig.includeThoughts is false, pipeline.buildRequest's post-processing only deleted the nested `reasoning` key. The DeepSeek provider's translateReasoningEffort may have already flattened an extra_body-injected reasoning into top-level `reasoning_effort` by that point, so a side query (e.g. suggestionGenerator) could still ship reasoning_effort on the wire. Extend the post-processing to also delete `reasoning_effort`. 2. (J-2z) The warn for clamping 'max' on non-DeepSeek anthropic ran on every request needing the downgrade — the docstring claimed "first time only" but the implementation didn't latch. Add a private `effortClampWarned` boolean on the generator so the warning fires once per generator lifetime. 3. (J-23) `resolveEffectiveEffort` used the broad `isDeepSeekAnthropicProvider` detector for the clamp decision, but that helper falls back to model-name matching to cover sglang/vllm self-hosted DeepSeek deployments. A model configured as e.g. "deepseek-distill" but routed to real api.anthropic.com would bypass the clamp and trigger HTTP 400. Split the detector: keep `isDeepSeekAnthropicProvider` (broad) for the thinking-block injection workaround where false-positives are harmless, and add `isDeepSeekAnthropicHostname` (hostname-only) for decisions where a model-name false-positive would route DeepSeek-only behavior to a stricter backend. The clamp now uses the hostname-only check. New regression test: a config with model name containing "deepseek" but baseURL pointing at api.anthropic.com still clamps `'max'` to `'high'`. Existing "passes max through" test updated to set a DeepSeek baseURL since model name alone no longer suffices for the clamp bypass. 385 tests pass; lint + typecheck clean. * docs(core): correct pipeline timing comment + samplingParams caveat Address PR review round 4 (copilot × 3) — three documentation accuracy fixes, no behavior change: 1. (KBcw) The post-processing comment in pipeline.ts misdescribed the call order ("after this branch already ran during the same buildRequest pass") — provider.buildRequest actually runs BEFORE the includeThoughts=false post-processing in the same pass. Reword to match the actual order: provider hook flattens nested reasoning to reasoning_effort first, this cleanup runs after and strips both shapes. 2. (KBdC, KBdE) The "Reasoning / thinking configuration" section in model-providers.md and the model.generationConfig description in settings.md both implied `reasoning` is honored on every provider. For OpenAI-compatible providers, when `generationConfig.samplingParams` is set, `ContentGenerationPipeline.buildGenerateContentConfig()` ships samplingParams verbatim and skips the separate `reasoning` injection entirely. Configs like `{ samplingParams: { temperature: 0.5 }, reasoning: { effort: 'max' } }` would silently drop the reasoning field on OpenAI/DeepSeek requests. Add an explicit "Interaction with samplingParams" warning section in model-providers.md and a parenthetical note in settings.md directing users to put `reasoning_effort` inside `samplingParams` (or `extra_body`) when both are configured. 385 tests pass; lint + typecheck clean. * docs(core): clarify explicit budget_tokens bypasses 'max' effort clamp When user sets `{ effort: 'max', budget_tokens: N }` on a non-DeepSeek anthropic backend, the effort label gets clamped to 'high' (otherwise the server 400s on the unknown enum) but the explicit budget_tokens is preserved verbatim. The wire-shape mismatch is intentional, not a bug: the clamp only protects the enum field, while budget is a free integer the server accepts within the context window, so an explicit override stays explicit. Document the contract on the early-return and add a regression test that locks it in. * docs(deepseek): fix comments to match flatten + reasoning-strip behavior Two doc-only nits called out in review: 1. `buildRequest` JSDoc said non-text parts are "rejected", but `flattenContentParts` actually substitutes a textual placeholder (`[Unsupported content type: <type>]`) so the request still goes through with a breadcrumb. Reword the JSDoc accordingly. 2. `translateReasoningEffort`'s strip comment claimed it strips the nested form to avoid shipping both shapes, but it only drops the duplicated `effort` key when other keys (e.g. `budget_tokens`) are present. Reword to describe the actual selective behavior and why keeping orthogonal keys is intentional. Behavior unchanged. * fix(deepseek): gate reasoning_effort translation on actual DeepSeek hostname The provider class is selected via the broader `isDeepSeekProvider` check, which falls back to model-name matching to cover self-hosted DeepSeek deployments (sglang/vllm/ollama, see #3613). That fallback is the right call for content-part flattening — it's a model-format constraint baked into the model itself, not the API surface. But the same broad detection was also gating `translateReasoningEffort`, which rewrites the standard `reasoning: { effort }` config into DeepSeek's flat `reasoning_effort` body parameter. That's a wire-shape decision, not a model-format one: strict OpenAI-compat backends in self-hosted setups may not accept the DeepSeek extension and would have happily handled the original shape. Split the two decisions: keep `isDeepSeekProvider` (broad) for flattening, add a hostname-only `isDeepSeekHostname` and gate the body rewrite on it. Self-hosted DeepSeek users who actually want the translation can either use a baseUrl containing api.deepseek.com or inject `reasoning_effort` directly via `samplingParams`/`extra_body`. Regression tests: - self-hosted (sglang) with deepseek-named model + nested `reasoning.effort` → flattening still runs, body shape preserved - `isDeepSeekHostname` matches api.deepseek.com but not custom hosts * fix(deepseek): use URL parsing in isDeepSeekHostname; fix log-level docs CodeQL flagged a high-severity URL substring sanitization issue on the new `isDeepSeekHostname` helper. The naive `baseUrl.includes('api.deepseek.com')` check would false-positive on hostile hosts like `https://api.deepseek.com.evil.com/v1` and incorrectly inject the DeepSeek-only `reasoning_effort` body parameter into requests routed elsewhere. Switch to `new URL(...).hostname` with exact match against `api.deepseek.com` (and `.api.deepseek.com` subdomains), mirroring `isDeepSeekAnthropicHostname` on the Anthropic side. Invalid URLs treated as non-DeepSeek. `isDeepSeekProvider` already routes through `isDeepSeekHostname`, so the hardening applies to both decision paths. Regression tests cover: - subdomain match (us.api.deepseek.com) - hostile substrings (api.deepseek.com.evil.com, evil.com/api.deepseek.com/v1, api.deepseek.comevil.com, api-deepseek-com.example.com) - invalid / empty baseUrl Also fix two doc-level mismatches: the `'max'` clamp on Anthropic logs via `debugLogger.warn` (warning level, once per generator), not "with a debug log". Update both `ContentGeneratorConfig.reasoning` JSDoc and the per-provider behavior table in model-providers.md. * feat(deepseek): emit thinking:disabled signal when reasoning is off DeepSeek V4+ defaults `thinking.type` to `'enabled'`, so just stripping `reasoning_effort` from the request leaves the server happily thinking on side queries — paying full thinking latency/cost without an effort configured. Per yiliang114's review, emit the explicit `thinking: { type: 'disabled' }` field on the wire whenever reasoning is disabled. Triggered when either: - `request.config.thinkingConfig.includeThoughts === false` (forked queries, e.g. suggestion generation) - `contentGeneratorConfig.reasoning === false` (config-level opt-out) The previous post-processing block only fired on the per-request opt-out path, so the config-level case was already leaking. Unify both under a single `reasoningDisabled` predicate that runs the same strip + signal logic. Hostname-gated to `api.deepseek.com` (and subdomains): self-hosted DeepSeek behind sglang/vllm/ollama, or older DeepSeek versions, may not accept the V4 thinking parameter — pushing it there could trip an unknown-key 400. Mirrors the round-7 decision to gate `reasoning_effort` translation on hostname. Regression tests cover all four matrix points: - DeepSeek hostname + includeThoughts false → emits disabled - DeepSeek hostname + reasoning false → emits disabled - non-DeepSeek hostname + includeThoughts false → does not emit - self-hosted DeepSeek (model-name fallback only) → does not emit Docs: extend the `reasoning: false` section with the new behavior and the self-hosted/non-DeepSeek caveat. * refactor(deepseek): expose isDeepSeek* as free functions; clarify docs Two doc/coupling nits from review: 1. The pipeline post-processing block was importing the concrete `DeepSeekOpenAICompatibleProvider` class just to reach `isDeepSeekHostname`. That couples the generic OpenAI pipeline to a specific provider implementation. Promote the helper (and its broad `isDeepSeekProvider` sibling) to free `export function`s in `provider/deepseek.ts` and import them by name. The class keeps thin static delegates for backward compat with existing callers and tests. 2. The per-provider behavior table on `model-providers.md` said `'low'/'medium' → 'high'` and `'xhigh' → 'max'` "client-side", but that normalization only fires inside `translateReasoningEffort`, which runs on the nested `reasoning.effort` config path. Explicit top-level overrides via `samplingParams.reasoning_effort` or `extra_body.reasoning_effort` skip the rewrite and ship verbatim. Reword the row to reflect that. Behavior unchanged.	2026-05-04 22:42:23 +08:00
Shaojin Wen	fcefab6df5	fix(core): clear FileReadCache on every history rewrite path (#3810 ) * fix(core): clear FileReadCache after microcompaction Microcompaction (the idle-cleanup pass that runs at the start of every new user/cron message) replaces old read_file / shell / glob / grep / edit / write_file tool outputs with a `[Old tool result content cleared]` placeholder. The FileReadCache, however, still records the prior full Reads as "seen in this conversation" — so the next ReadFile of an unchanged file returns the file_unchanged placeholder pointing at bytes the model can no longer retrieve from history. The result is a Read that succeeds at the tool layer but delivers no usable content to the model, which is the failure mode reported in #3805 ("read tool returns no content in long-running sessions"). This mirrors the existing post-compaction clear in tryCompressChat — microcompaction has the same "history rewrite invalidates the cache's 'model has seen this' assumption" property, it was just missed when the cache was wired in. * fix(core): clear FileReadCache on every history rewrite path PR1 only patched microcompaction, but a multi-round audit found four more entry points that rewrite history without clearing the cache, producing the same `file_unchanged` placeholder vs. missing-content mismatch. Each is fixed in the same minimal way (clear() at the call site) and covered by a regression test: - GeminiClient.setHistory — /restore checkpoint, /load_history - GeminiClient.truncateHistory — rewind in AppContainer - GeminiClient.resetChat — public API; clearCommand happens to clear the cache via startNewSession beforehand, but other callers have no such guarantee - stripOrphanedUserEntriesFromHistory — Retry path drops trailing user entries that may include read_file functionResponses Also tightened the microcompaction comment ("compactable tool outputs" instead of an enumerated list, since the source of truth is microcompact.COMPACTABLE_TOOLS) and removed caller references per the codebase comment style. Reverse-tested every new clear() by commenting it out and confirming the matching regression test fails. * test(core): integration test for FileReadCache + history rewrite End-to-end tests using the real ReadFileTool, real FileReadCache, real microcompactHistory, and a real on-disk file. Three cases: 1. Without a cache clear after microcompact, the second Read of an unchanged file returns the file_unchanged placeholder while the prior content has already been wiped from history. Demonstrates the failure mode this PR fixes. 2. After an explicit cache.clear(), the second Read re-emits the real bytes. Demonstrates that the fix works. 3. When microcompact removes every prior read of a file, the placeholder leaves zero recoverable bytes — the model literally cannot find the content anywhere it can reach. These complement the existing unit tests in client.test.ts (which verify the call-site wiring) by proving the end-to-end behaviour through the real code paths, without mocks. * chore(core): add traceable debug log for every FileReadCache clear Per review feedback: the new clear() call sites were silent, leaving no breadcrumb in production debug streams when the cache is dropped. Adds a `[FILE_READ_CACHE] clear after <reason>` log at every clear site (5 new + 1 pre-existing in tryCompressChat) so operators can grep one prefix and see why the cache was invalidated. * chore(core): refine truncateHistory cache clear + extract test helper Per review feedback (deepseek-v4-pro): 1. truncateHistory now skips the cache clear when keepCount >= prevLen, since a no-op truncate leaves the cache valid against the unchanged history. Adds a regression test covering both keepCount==prevLen and keepCount>prevLen. 2. The 6 cache-spy test cases each repeated the same 4-line mock setup. Extract a `mockFileReadCacheClear()` helper so future changes to the FileReadCache mock surface only need one edit. Both are quality-of-implementation tweaks; the underlying fix is unchanged. * perf(core): use O(1) getHistoryLength in truncateHistory Per Copilot review feedback: the previous commit's no-op detection in truncateHistory called this.getChat().getHistory().length, but GeminiChat.getHistory() does a structuredClone of the entire history on every call (line 770 of geminiChat.ts) — paying an O(history) clone purely to read .length. In long-running sessions with hundreds of entries this is a meaningful regression. Adds GeminiChat.getHistoryLength(): O(1), no clone. truncateHistory switches to it. The behaviour (skip clear when keepCount >= prevLen) is unchanged. Also adds: - Unit tests for GeminiChat.getHistoryLength (empty, after addHistory, parity with getHistory().length). - A regression test asserting truncateHistory calls getHistoryLength and NOT getHistory, locking in the perf fix against future drift. * fix(core): close NaN hole + use public ReadFileTool API in tests Two issues from copilot review: 1. NaN edge case in truncateHistory cache invalidation. The "did anything actually change?" check was `keepCount < prevLen`, but `Array.slice(0, NaN)` returns [] (history wiped) while `NaN < prevLen` is false. That sequence would wipe the chat but leave the FileReadCache claiming the model has seen the prior reads — exactly the file_unchanged placeholder bug this PR is closing. Switched the check to compare actual post-truncate length (`newLen < prevLen`), which correctly invalidates whenever entries were removed regardless of how `keepCount` was malformed. Added a NaN regression test. 2. The integration test cast `tool` to `unknown` to reach the protected `createInvocation()` method. Switched to the public `tool.buildAndExecute(params, signal)` API so the test exercises the same surface real callers use, including build-time schema validation.	2026-05-04 22:42:06 +08:00
Shaojin Wen	fbcf59e0b3	docs(core): point background-shell + monitor guidance at both /tasks and the dialog (#3808 ) Some checks are pending Qwen Code CI / Lint (push) Waiting to run Details Qwen Code CI / Test (push) Blocked by required conditions Details Qwen Code CI / Test-1 (push) Blocked by required conditions Details Qwen Code CI / Test-3 (push) Blocked by required conditions Details Qwen Code CI / Test-4 (push) Blocked by required conditions Details Qwen Code CI / Test-5 (push) Blocked by required conditions Details Qwen Code CI / Test-6 (push) Blocked by required conditions Details Qwen Code CI / Test-7 (push) Blocked by required conditions Details Qwen Code CI / Test-8 (push) Blocked by required conditions Details Qwen Code CI / Test-2 (push) Blocked by required conditions Details Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions Details Qwen Code CI / CodeQL (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run Details E2E Tests / E2E Test - macOS (push) Waiting to run Details SDK Python / SDK Python (3.10) (push) Waiting to run Details SDK Python / SDK Python (3.11) (push) Waiting to run Details SDK Python / SDK Python (3.12) (push) Waiting to run Details * docs(core): point background-shell guidance at both /tasks and the dialog Follow-up to PR #3801, fulfilling the "separate small PR" commitment in its description. The two model-facing strings (`shell.ts` after spawning a background shell, `task-stop.ts` after requesting cancel) referenced only `/tasks` as the inspection path, predating the interactive Background tasks dialog landing at #3488 / #3720 / #3791. Now that the dialog handles all three kinds (agent / shell / monitor), both surfaces should be visible to the LLM so it can suggest the right one based on the user's mode. Updates: - `shell.ts:865` (LLM message after `is_background: true` spawn) now surfaces both `/tasks` (text, any mode) AND the interactive dialog (footer pill + Enter, with detail view + live updates). Output file guidance retained. - `task-stop.ts:110` (LLM message after `task_stop` on a shell) same pattern: both surfaces named. - `task-stop.ts:95` comment updated to enumerate all observation paths (including the dialog). - `monitorRegistry.ts:197` comment fixed — said "/tasks dialog" which conflated two distinct surfaces. Split to "Background tasks dialog reopens or `/tasks` listings". - `backgroundShellRegistry.ts:10` (module docstring) and `:31` (shellId doc) now mention all three consumers (agent, dialog, slash command). No behavior change — pure documentation/string update. Tests untouched (none asserted on these exact strings); build + lint + 152-test core suite all clean. * docs(core): address PR 3808 review — 'captured output' + consistent ordering Three review nits: 1. (LoqU — copilot) `shell.ts:865` said the output file holds "raw content", but `shellExecutionService` runs each chunk through stripAnsi and skips non-string/binary chunks before writing. Reword to "captured output" so callers don't expect a byte-for-byte stream. 2. (LqKr — wenshao) The PR mentioned both surfaces in two different orders depending on the file: `backgroundShellRegistry.ts` listed the dialog first, while `task-stop.ts` and `shell.ts` listed `/tasks` first. Unify on the LLM-facing order — `/tasks` first, then the interactive Background tasks dialog — across all four sites. Also flips the line-31 docstring on the `shellId` field for the same reason. 3. (LqKt — wenshao, flagged for awareness only) Trim the redundant keystroke detail in shell.ts:865 to match `task-stop.ts:111`'s shorter "(footer pill + Enter)" form. Saves ~7 tokens per background shell launch in `llmContent` while still naming both surfaces. The PR description's rationale (LLM should know both surfaces exist so it can suggest the right one for the user's mode) is preserved — only the operational verbosity is trimmed. 581 tests pass; lint + typecheck clean. Pure docs / string update. * docs(core): grammar polish on PR 3808 strings Two more wording nits from copilot review: - backgroundShellRegistry.ts:10 — change "metadata the agent…need to query" to "metadata that the agent…use to query". The original phrasing reads as if the metadata itself is performing the query. - shell.ts:865 — change "Read the output file directly for the captured output." to "Read the output file directly to view the captured output." — clearer instruction to the model/user. Pure wording, no behavior change. * docs(core): grammar fix on PR 3808 monitor comment 'not visible from later Background tasks dialog reopens' read as if 'reopens' was a noun. Reword to 'not visible after reopening the Background tasks dialog or from /tasks listings'. * docs(core): round 4 wording polish on PR 3808 Four more nits from copilot: - shell.ts:865 + task-stop.ts:96,111: "footer pill + Enter" was ambiguous now that the footer renders multiple pills (background tasks vs other status indicators). Disambiguate to "focus the footer Background tasks pill, then Enter". - monitorRegistry.ts:198: re-tweak my round-3 phrasing — "after reopening the Background tasks dialog or from /tasks listings" → "in later Background tasks dialog reopens or /tasks listings". Reads as "from those surfaces" rather than "after reopening", which the reviewer found ungrammatical. - backgroundShellRegistry.ts:10,31: clarify "/tasks" as the slash command, since the codebase also uses "<projectDir>/tasks/..." on-disk paths in agent-transcript contexts. Pure wording, no behavior change. 87 affected tests pass. * docs(core): mirror /tasks + dialog guidance to monitor llmContent paths Address @doudouOUC review on PR #3808 — two Medium findings: this PR updated shell-facing strings to mention both inspection surfaces but left the parallel monitor strings without any inspection guidance, even though monitors render in the same /tasks output and the same Background tasks dialog. Restore symmetry: - monitor.ts:587-598 — append the same "/tasks (text) or the interactive Background tasks dialog (focus the footer Background tasks pill, then Enter — detail view + live updates)" sentence to the Monitor-started llmContent, mirroring shell.ts:865. - task-stop.ts:125-131 — the monitor cancellation llmContent had no guidance at all. Add the same "Final status will be visible via /tasks (text) or the interactive Background tasks dialog (focus the footer Background tasks pill, then Enter) once the process drains" line that already existed for shells at task-stop.ts:111. The (Low) commit-churn observation is a maintainer call (squash on merge); the (Info) snapshot-test gap is pre-existing and not in scope. 78 monitor + task-stop tests pass; lint + typecheck clean. * docs(core): drop drain phrasing for monitor cancel + restructure dialog comment Address PR #3808 review round 5 (doudouOUC + copilot × 2): 1. (XNoH copilot, XSBu doudouOUC — Medium) The monitor cancellation message inherited "once the process drains" from the shell branch, but `monitorRegistry.cancel()` settles synchronously — when the tool returns, the entry is already `cancelled`, not waiting on a child process. The drain qualifier is accurate for shells (which use `requestCancel()` + the AbortController and settle when the real process exits) but misleading for monitors. Reword the monitor branch in `task-stop.ts:121-130` to drop the drain phrasing and add an explanatory comment about the sync vs. async difference so future maintainers don't replicate the wording from the shell branch by reflex. 2. (XNod copilot — wording, third revision on the same comment) Restructure rather than re-litigate the preposition. The "reopens" noun framing has gone through three rounds of churn (`from later... reopens` → `after reopening...` → `in later... reopens` → and now back to `from`). Sidestep the loop by making the comment a proper sentence about WHAT the surfaces actually read: the persisted `entry.error` is the source of truth; the chat-history notification is a separate, ephemeral side channel. Avoids the noun-form "reopens" entirely. Updated test assertion to match the new "Monitor \"...\" cancelled" prefix. 51 tests pass; lint + typecheck clean.	2026-05-04 22:27:00 +08:00
jinye	2fea1d3aa7	fix(core): address post-merge monitor tool and UI routing issues (#3792 ) * fix(core): address post-merge monitor tool and UI routing issues - Guard token bucket against clock drift after system suspend/resume (negative elapsed resets lastRefill instead of starving the bucket) - Add debugLogger.warn for AST read-only check failures in monitor getConfirmationDetails (previously silent catch) - Consolidate SHELL_TOOL_NAMES: export from rule-parser.ts, import in permission-manager.ts (removes identical SHELL_LIKE_TOOLS duplicate) - Extract hasBlockingBackgroundWork/resetBackgroundStateForSessionSwitch to shared backgroundWorkUtils.ts (removes identical copies in clearCommand.ts and useResumeCommand.ts) - Consolidate getToolCallComponent routing into packages/webui (removes near-identical copies in ChatViewer.tsx and vscode-ide-companion, adds missing web_search compat alias to VSCode path) - Add test for droppedLines count in terminal notification text - Add test for exit(null, null) settlement (externally killed process) 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(core,cli,vscode): address PR #3792 review feedback - Add debugLogger.warn for clock-drift guard (observability for throttle bucket resets after suspend/resume) - Add test for clock-drift recovery (elapsed < 0 bucket reset) - Add test for AST parse failure catch path (mockRejectedValueOnce) - Forward isFirst/isLast props through VSCode ToolCallRouter (fixes timeline connector rendering) - Add test for shell running branch in hasBlockingBackgroundWork 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(core,vscode): address follow-up review comments - Fix React.FC missing import: use `import type { FC } from 'react'` instead of `React.FC` (original file had this import before refactor) - Tighten clock-drift test: emit while clock is in the past to confirm guard resets lastRefill, then verify refill at the new reference point 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(core,cli): adopt review feedback — ReadonlySet + hasRunningEntries - Type SHELL_TOOL_NAMES as ReadonlySet<string> to prevent accidental mutation of permission-critical set - Use BackgroundShellRegistry.hasRunningEntries() instead of getAll().some() for zero-allocation short-circuit check - Update clearCommand test mocks to include hasRunningEntries 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(core,webui,vscode): address remaining PR #3792 review comments - Merge AgentToolCall + isAgentExecutionToolCall into single import in routing.ts (comment 3178280762) - Use real getToolCallComponent via vi.importActual in VSCode test mock so routing logic is validated, not a parallel mock that can drift (comment 3178280775) - Validate isFirst/isLast forwarding in VSCode test mock via data attributes (comment 3178346891) - Add comment documenting debugLogger.warn no-op tradeoff for clock drift guard (comment 3178346889) 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * test(webui): add unit test for getToolCallComponent routing Covers all 8 component branches including the web_search compatibility alias, agent execution detection, case-insensitive matching, and fallback to GenericToolCall. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(cli): add missing hasRunningEntries to useResumeCommand test mocks The backgroundWorkUtils refactor replaced getAll().some() with hasRunningEntries(), but the test mocks in useResumeCommand.test.ts were not updated, causing CI failures. 🤖 Generated with [Qoder Code](https://github.com/QwenLM/qwen-code) * test(cli): add unit tests for backgroundWorkUtils shared utility Cover hasBlockingBackgroundWork (6 cases including short-circuit behaviour) and resetBackgroundStateForSessionSwitch (1 case verifying all three registries are reset). 🤖 Generated with [Qoder Code](https://github.com/QwenLM/qwen-code) --------- Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com>	2026-05-04 21:19:41 +08:00
jinye	03f66bada5	feat(sdk-python): add PyPI release workflow (#3685 ) * feat(sdk-python): add pypi release workflow Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(sdk-python): build cli before smoke test Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(sdk-python): tighten release conflict handling Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(sdk-python): harden python release workflow Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(sdk-python): tighten stable release guards Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(sdk-python): harden prerelease publish flow Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(sdk-python): reuse release branches on rerun Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(sdk-python): resume incomplete releases Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(release): tighten missing-release checks Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(sdk-python): resume stable release reruns Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(sdk-python): tighten release recovery guards Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * test(sdk-python): cover release version edge cases Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(sdk-python): address release workflow review feedback Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * refactor(sdk-python): address review feedback on release version script - Remove unreachable `if (type === 'stable')` branch in bumpVersion(); the stable path was dead code since getVersion() throws for all stable conflicts before calling bumpVersion(). Move nightly conflict throw to the call site for symmetry. - Rename getNextPatchBaseVersion → getNextBaseVersion to reflect that the function can return a prerelease base without incrementing patch. - Add test for preview+nightly coexistence where nightly base is higher. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(sdk-python): address remaining review feedback on release workflow - Fix failure-issue gate to read github.event.inputs.dry_run directly instead of steps.vars.outputs.is_dry_run (which is empty when early steps fail). Add --repo flag for gh issue create when checkout failed. - Add diagnostic state table to failure-issue body (RELEASE_TAG, PACKAGE_VERSION, PUBLISH_CHANNEL, RESUME_EXISTING_RELEASE, etc.) - Fix release-notes error swallow: only silence release not found / Not Found / HTTP 404, emit :⚠️: for other gh release view errors. - Improve validateVersion error messages to use human-readable format keys (X.Y.Z, X.Y.Z-preview.N) matching TS sibling convention. - Filter fully-yanked versions in getAllVersionsFromPyPI. - Add console.error log when stable is derived from nightly. - Add bash regex guard for inputs.version to prevent shell injection. - Use per-release-type concurrency groups (nightly/preview/stable). - Add jq null-guard checks for all 6 field extractions. - Remove misleading --follow-tags from git push (lightweight tags). 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(sdk-python): rename misleading test description The test asserts that preview/nightly releases return empty previousReleaseTag, but the name said "same-channel previous release tags" which implied non-empty values. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(sdk-python): address unresolved review comments on release workflow - Remove -z check in extract_field() that blocked preview/nightly releases (previousReleaseTag is legitimately empty for non-stable releases) - Use static environment.url since step outputs aren't available at job startup - Use skip-existing for resumed PyPI publish to fill in missing artifacts - Add AbortSignal.timeout(30s) to PyPI fetch to prevent indefinite hangs - Add downgrade guard for stable_version_override - Use GHA :⚠️: annotation instead of console.error for visibility - Separate yanked/non-yanked version lists so conflict detection includes yanked versions (PyPI still reserves those slots) - Filter current release from previousReleaseTag to avoid self-reference on resume - Add tests for yanked conflict detection, downgrade guard, and resume previousReleaseTag Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(sdk-python): address final review round on release version script - Fix getNextBaseVersion() first-release skip: use pyproject.toml version directly when PyPI has no stable versions instead of unconditionally incrementing - Fix getNextBaseVersion() off-by-one: change > to >= so equal prerelease base continues the existing line instead of incrementing patch - Add :⚠️: annotation when preview auto-bumps due to orphan git tags (tag exists without PyPI version or GitHub release) - Add set -euo pipefail to 5 workflow steps missing it: release_branch, persist_source, Create GitHub release, Delete prerelease branch, Create issue on failure - Fix 2 existing tests affected by first-release change, add 4 new tests 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(sdk-python): use stderr for GHA warning annotations to avoid corrupting JSON stdout console.log writes to stdout, which gets captured by VERSION_JSON=$(node ...) in the workflow and corrupts the JSON output for jq. Switch to console.error so :⚠️: annotations go to stderr (GHA recognizes workflow commands on both streams). Also add set -euo pipefail to the "Get the version" step for consistency with other workflow steps. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) --------- Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-04 21:07:21 +08:00
jinye	e617f20d15	fix(telemetry): suppress async resource attribute warning on startup (#3807 ) Some checks are pending Qwen Code CI / CodeQL (push) Waiting to run Details Qwen Code CI / Lint (push) Waiting to run Details Qwen Code CI / Test (push) Blocked by required conditions Details Qwen Code CI / Test-1 (push) Blocked by required conditions Details Qwen Code CI / Test-2 (push) Blocked by required conditions Details Qwen Code CI / Test-3 (push) Blocked by required conditions Details Qwen Code CI / Test-4 (push) Blocked by required conditions Details Qwen Code CI / Test-5 (push) Blocked by required conditions Details Qwen Code CI / Test-6 (push) Blocked by required conditions Details Qwen Code CI / Test-7 (push) Blocked by required conditions Details Qwen Code CI / Test-8 (push) Blocked by required conditions Details Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions Details E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run Details E2E Tests / E2E Test - macOS (push) Waiting to run Details * fix(telemetry): suppress async resource attribute warning on startup Disable NodeSDK's default auto-detection of host/process/env resources. These detectors return async attributes; if a span is exported before they settle, OTel logs an error-level diag message that surfaces in the terminal UI. The attributes we actually need (service name, version, session.id) are already provided synchronously via resourceFromAttributes. Closes part of #3731 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(telemetry): add comment and test for autoDetectResources: false Address review feedback: - Add inline comment explaining why auto-detection is disabled (avoids async attribute warning before first span export) - Add test assertion that NodeSDK is constructed with autoDetectResources: false to prevent silent regression 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(telemetry): improve comment precision and extend autoDetectResources test coverage - Clarify comment: error fires on any resource attribute read before async detectors settle, not just during export; HttpInstrumentation span creation is the typical trigger - Add autoDetectResources: false assertion to HTTP OTLP and file exporter test cases for branch-complete coverage across all three NodeSDK construction paths 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)	2026-05-03 19:36:03 +08:00
Shaojin Wen	07fdfadc33	feat(cli): include monitors in /tasks + add interactive-mode hint (#3801 ) * feat(cli): include monitors in /tasks + add interactive-mode hint Phase B closure for Issue #3634. Two coupled changes to /tasks: 1. Bug fix — include monitors. The command was last touched before #3684 / #3791 landed, so it merged only agent + shell entries while monitors silently disappeared from the headless / non-interactive / ACP listing path. Add a third registry pull from `getMonitorRegistry()` and wire monitor through statusLabel / taskLabel / taskId / taskOutputPath. Status line includes eventCount (`running (N events)`, `completed (exit 0, N events)`, `completed (Max events reached, N events)` for auto-stop) and pid where defined. 2. Soft deprecation hint, scoped to interactive mode only. Once the richer Ctrl+T dialog (#3488 + #3720 + #3791) is available, the text dump is the long-form fallback rather than the primary surface. Show `Tip: Ctrl+T opens the interactive Background tasks dialog with detail view + live updates.` at the top of the output when `executionMode === 'interactive'`. Headless / ACP get the bare list — they have no dialog to point at and the hint would just clutter. Description string also clarified to call out the modal split. Kept on all three executionModes (no deletion) — `/tasks` is the only way headless / ACP / SDK consumers can inspect background-task state. Tests: 4 new cases in tasksCommand.test.ts cover monitor entry formatting (running with pid, natural completion with exitCode, auto-stop with error string, failed), the singular `1 event` form, the interactive-mode hint gating, and the cross-kind merge order. * fix(cli): address PR 3801 review — exhaustive switch + i18n + extra tests Three actionable Suggestions from /review's pass: - `taskLabel` rewritten as a `switch` with a `never`-typed `default` arm, matching the structural-safety pattern already used by `taskId`. Adding a 4th DialogEntry kind in the future will now flip both helpers to compile errors instead of letting `taskLabel` silently fall through to `entry.description` (which the new kind may not have). - Hint string wrapped in `t()` for i18n consistency with the rest of the file. The literal stays as the i18n key default, so today's output is unchanged. - Tests: cover `cancelled` monitor status (was the only one without an inline assertion) and explicit `acp` execution mode hint suppression (pins the suppression rationale so a future regression flipping the check to `!== 'non_interactive'` would fail loudly). * fix(cli): correct /tasks dialog-open hint — Ctrl+T was wrong Tmux verification on PR #3801 caught that the hint string says "Ctrl+T opens the interactive Background tasks dialog" but Ctrl+T is actually bound to the MCP tool descriptions toggle (ContextSummaryDisplay.tsx lines 110-115). The dialog opens via Down arrow on an empty composer (focuses the footer pill) followed by Enter (InputPrompt.tsx 947-968). Same misattribution slipped into PR #3791's first description and was caught + fixed there before merge — this PR carried the wrong wording forward in code. Updates four sites: - The hint string itself: "Tip: press ↓ from an empty composer then Enter to open the interactive Background tasks dialog with detail view + live updates." - The slash-command description: "interactive UI is Ctrl+T" → "interactive dialog opens via the footer pill" - Two inline comments referencing Ctrl+T as the dialog opener - The interactive-mode hint test now pins on `↓` + `Enter` and asserts `not.toContain('Ctrl+T')` so a regression to the wrong wording fails loudly. * fix(cli): address PR 3801 review — exhaustive switch consistency + path-agnostic hint Four Suggestions from the latest /review pass: - `statusLabel` rewritten as a single top-level switch with a `never`-typed default, matching `taskLabel` / `taskId` / `taskOutputPath`. The previous `if`/`if`/fallthrough form would silently apply monitor formatting to a future 4th kind. - `taskOutputPath` gained the same exhaustive default — was the only per-kind helper still relying on implicit fallthrough; would silently omit a 4th-kind output path while the adjacent helpers flip to compile errors. - Hint wording de-specifies the exact keystroke count: `'Tip: focus the Background tasks pill in the footer (use ↓ from an empty composer) and press Enter ...'`. Previous "press ↓ then Enter" phrasing was wrong when the Arena agent tab bar is present — `InputPrompt`'s focus chain routes Down through the tab bar first, so a single Down lands there, not on the bg pill. - Test pin tightened: `[mon_fail] failed: spawn ENOENT (0 events)` is now a full-string assertion instead of a prefix match, so a regression that drops the `(N events)` suffix from monitor's failed branch fails loudly. * fix(cli): sanitize ANSI escape sequences in /tasks output deepseek's review pass flagged that monitor description / error fields are user / process-supplied strings rendered directly to the terminal. A maliciously-crafted tool description or spawn error containing raw ANSI control sequences (clear-screen, cursor-move, colour) would otherwise reach stdout verbatim and corrupt display. Same risk applies to agent error / description and shell error / command — all already-existing renderers with the same exposure that this PR didn't introduce but inherits. So instead of per-field sprinkling, wrap the joined output once with `escapeAnsiCtrlCodes` (no-op when no control chars present, so cost is zero in the common case). One line change in the renderer covers every kind including any future one. Test pins the behaviour: a monitor entry with `\x1b[2J` / `\x1b[31m...` content produces output with no raw ESC bytes and visible escaped `[...]` sequences. * docs(cli): tighten escapeAnsiCtrlCodes comments to match actual scope Two doc-precision Suggestions from copilot's pass on `0840e32f6`: - Source comment claimed `escapeAnsiCtrlCodes` is "a no-op when no control chars" but it's narrower than that — it only handles sequences matched by `ansi-regex` (CSI / OSC / SGR — anything starting with ESC). Isolated C0/C1 control bytes like BEL, BS, VT pass through untouched. Updated the comment to enumerate the actual scope and call out that `node:util`'s `stripVTControlCharacters` would be needed if those become a concern. - Test comment had a literal raw ESC byte (octal 033) embedded in the source — visually showed `^[[...]` in editors that render ESC, but was a real ESC byte in the file rather than the escaped `` form the sanitizer produces. Rewrote with a literal `` text description so what the comment shows matches what the assertions check for. * fix(cli): broaden /tasks sanitization + tighten inner switch exhaustiveness Addresses 3 of 5 items from doudouOUC's PR 3801 review: - Issue 1 (Low) — C0/C1 control byte gap: switched from `escapeAnsiCtrlCodes` (only handles ESC-initiated ANSI sequences) to `stripUnsafeCharacters` (one-pass strip of ANSI + VT + C0/C1, with TAB/CR/LF preserved). The pre-existing exposure to bare BEL / BS / FF / VT bytes via shell entry strings is now closed for all three kinds. Test rewritten to cover both ANSI sequences AND bare control bytes (BEL, BS), and pins that surrounding printable text and line breaks survive. - Issue 2 (Low) — inner status switches inconsistent: the three inner `switch (entry.status)` blocks (agent / shell / monitor) used `case 'running': default: return 'running'` (or duplicated bodies). All three now have explicit `running` cases plus a `never`-typed default that throws — matches the outer `switch (entry.kind)` pattern and means a future status added to any of `BackgroundTaskEntry` / `BackgroundShellEntry` / `MonitorStatus` flips to a compile error here instead of silently returning `'running'`. - Issue 5 (Nit) — beforeEach default change: added an inline comment explaining why the test default overrides `createMockCommandContext`'s `'interactive'` default (`'non_interactive'` lets the hint-suppression assertions work without each test rebinding context). Issues 3 and 4 from the review are nits with no action needed (3 is already documented as intentional; 4 is a UX call about hint length that's better handled by user feedback than guess-tweaking). * fix(cli): bind status to local before exhaustive switch — fixes tsc build CI's `tsc --build` (full mode, vs `--noEmit` locally) caught that `switch (entry.status)` followed by a `never`-typed default reading `entry.status` doesn't compile. After the case arms exhaust the discriminated union, TS narrows `entry` itself to `never`, so the `.status` access in the default arm becomes "Property 'status' does not exist on type 'never'" + the resulting `any` value can't be assigned to `never`. Fix: bind `entry.status` to a local `status` const before the inner switch. The local stays typed as the per-kind status union and narrows correctly to `never` at the default arm — `const _exhaustive: never = status` is then `never = never`, valid. Standard exhaustive-switch-on-discriminator pattern; doesn't change runtime behavior or test surface, just gets past TS narrowing on the nested case.	2026-05-03 18:45:51 +08:00
Shaojin Wen	cdadbcdb33	feat(cli): wire Monitor entries into combined Background tasks dialog (#3791 ) Some checks are pending Qwen Code CI / Lint (push) Waiting to run Details Qwen Code CI / Test (push) Blocked by required conditions Details Qwen Code CI / Test-1 (push) Blocked by required conditions Details Qwen Code CI / Test-2 (push) Blocked by required conditions Details Qwen Code CI / Test-3 (push) Blocked by required conditions Details Qwen Code CI / Test-4 (push) Blocked by required conditions Details Qwen Code CI / Test-5 (push) Blocked by required conditions Details Qwen Code CI / Test-6 (push) Blocked by required conditions Details Qwen Code CI / Test-7 (push) Blocked by required conditions Details Qwen Code CI / Test-8 (push) Blocked by required conditions Details Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions Details Qwen Code CI / CodeQL (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run Details E2E Tests / E2E Test - macOS (push) Waiting to run Details * feat(cli): wire Monitor entries into combined Background tasks dialog Phase C mirror follow-up of Issue #3634, structurally a clean repeat of #3720 but for Monitor (third consumer of the kind framework). Core (MonitorRegistry): - Add `setStatusChangeCallback` mirroring `BackgroundShellRegistry` / `BackgroundTaskRegistry`. Fires synchronously inside `register()` (so subscribers see lifecycle start) and inside `settle()` (so subscribers see every running → terminal transition: complete / fail / cancel / emitEvent's auto-stop at maxEvents). - Subscriber failures are caught and logged but do not poison the registry — same defensive contract as the other two registries. CLI: - `useBackgroundTaskView` subscribes to all three registries (agent + shell + monitor) and merges by `startTime`. `DialogEntry` union extended with `(MonitorEntry & { kind: 'monitor' })`. `entryId` switches over the three kinds and returns the right id field. - `BackgroundTasksPill.getPillLabel` adds monitor to its KIND_NAMES table; grouping order is shell → agent → monitor (monitor last because it tends to be the longest-lived, least urgent to glance at). - `BackgroundTasksDialog`: - `rowLabel` returns `[monitor] <description>` for monitor rows. - New `MonitorDetailBody` showing command / status / pid / event count / dropped lines. No Progress block (monitors don't fire activity callbacks per-event). - DetailBody dispatcher gains the monitor branch. - `BackgroundTaskViewContext.cancelSelected` routes monitor cancels via `monitorRegistry.cancel(monitorId)`. Monitor's cancel is synchronous (settle + abort happen inside the registry), matching the same path task_stop already uses. Tests: 6 new core tests (registry callbacks fire on register / each terminal transition / not on emitEvent / on auto-stop / clear stops notifications / subscriber-failure isolation), 4 new pill tests (singular / plural / 3-kind grouping / running-only filter), dialog mock extended with `getMonitorRegistry`. * fix(cli): add exhaustive default arms to DialogEntry switches ESLint's `default-case` rule requires every switch to have a default arm even when TypeScript can prove the union is exhaustive. Add `default: { const _exhaustive: never = entry; throw ... }` to the five switches added in this PR — same pattern keeps both the runtime guard and the compile-time exhaustiveness check. * fix(core): fire statusChange in MonitorRegistry.reset() The newly-added `setStatusChangeCallback` subscriber misses `reset()`, so a `/clear` or session reset leaves stale monitor rows visible in the combined Background tasks dialog until an unrelated register/settle event happens. Both BackgroundShellRegistry and BackgroundTaskRegistry already fire statusChange on their reset paths — Monitor was the outlier. Fix: fire `statusChange()` (no arg) after `monitors.clear()`, with an early return when the registry is already empty so we don't notify on a no-op reset. Two new tests cover both branches. * fix(cli,core): address PR 3791 review feedback Four review threads from /review's second pass on top of `f26b700`: 1. MonitorDetailBody live counters were stale. `eventCount` and `droppedLines` came from the `useBackgroundTaskView` snapshot, which only refreshes on `statusChange`, and `emitEvent` deliberately doesn't fire `statusChange` (the per-event churn would burn the pill / AppContainer). Fix: extend the existing `selectedEntry` useMemo to re-resolve monitors from `monitorRegistry.get()` on each `activityTick`, mirroring the agent path. The pre-existing 1s wall-clock interval already drives the recompute while the entry is running, so no new callback wiring is needed. 2. Settle reasons weren't persisted on `MonitorEntry`. `fail()`, `complete(exitCode)`, max-events auto-stop, and idle-timeout all passed their reason strings only to the chat-history terminal notification — opening the dialog detail view after the monitor died showed the bare status with no clue why. Add `exitCode?` and `error?` fields (mirrors `BackgroundShellEntry`); populate them before `settle()` in each path; render them in `MonitorDetailBody` with appropriate colour (red for `failed`, warning for auto-stopped `completed`). 3. `monitorCancel` mock had no test asserting it. Add a dedicated test that opens detail on a monitor entry, presses `x`, and verifies `monitorRegistry.cancel(monitorId)` was called and the agent registry's cancel was NOT called. Pins the kind switch in `cancelSelected` so a regression flipping the monitor branch to anything else (e.g. `requestCancel`) would fail loudly. 4. `MonitorStatusChangeCallback` docstring was wrong. It claimed the callback fires on running → terminal transitions, but `register()` (nothing → running) and `reset()` (mass clear, fired with no entry) also fire it. Rewrite the docstring to enumerate the actual fire sites and explicitly note that `emitEvent` is excluded (per-event churn). * docs(cli,core): tighten MonitorEntry.error and rowLabel comments Two doc-precision fixes from copilot's PR 3791 review pass: - `MonitorEntry.error` enumerated max-events as the only auto-stop reason that populates the field, but `idle-timeout` also writes it (was added in the same earlier commit). Rewrote to list both current reasons and explicitly note: any future auto-stop reason should populate this field too. Also clarified that `cancelled` and natural-exit `completed` paths intentionally don't. - `rowLabel`'s shell-branch comment claimed long commands are "truncated by the row renderer's MaxSizedBox", but ListBody renders rows with plain `<Text>` (no MaxSizedBox / truncation helper). Long text wraps. Rewrote to describe actual behaviour and note the intentional decision to leave it wrapping (the dialog is what users open to see context — truncating defeats the purpose). * test(cli): cover MonitorDetailBody render branches + useBackgroundTaskView Two coverage gaps from /review's third pass on PR 3791: - New file `useBackgroundTaskView.test.ts` (6 cases) directly exercises the merge logic with a stub config: empty when config is null, merges three registries, sorts by startTime, tags `kind`, subscribes on mount, refreshes when any registry fires statusChange, clears all three subscriptions on unmount. - New `MonitorDetailBody render branches` describe block in `BackgroundTasksDialog.test.tsx` (8 cases) covers the conditional pieces my last commit added: title + Command, pid show/hide, eventCount singular vs plural, droppedLines show/hide, exitCode display, Error block (failed) vs Stopped because (auto-stop), and the all-undefined no-block case.	2026-05-03 10:05:19 +08:00
Umut Polat	a08d48b75c	fix(cli): stop double-wrapping and double-printing API errors in non-interactive mode (#3749 ) * fix(cli): stop double-wrapping and double-printing API errors in non-interactive mode In non-interactive (-p) mode, any upstream 4xx ended up on stderr three times: once from the stream-error handler, once from handleError after the thrown Error.message (already containing the formatted text) was fed back through parseAndFormatApiError producing "[API Error: [API Error: ...]]", and once more from JsonOutputAdapter.emitResult writing the same errorMessage out in TEXT mode. The top-level catch then framed the resulting throw as "An unexpected critical error occurred:" with a stack trace, which made a routine 4xx look like a CLI crash. Three coordinated changes: - AlreadyReportedError marks a throw whose message is already on the wire. handleError short-circuits on it: no second writeStderrLine, no second parseAndFormatApiError, just propagate the exit code. - The non-interactive stream-error handler now throws AlreadyReportedError, and the catch block skips the adapter's emitResult in TEXT mode so we don't get a third copy. - The top-level .catch in packages/cli/index.ts treats AlreadyReportedError as a routine, already-reported failure: exit with the carried code without printing the "unexpected critical" framing or the stack trace. parseAndFormatApiError is also made idempotent — input that already starts with "[API Error: " and ends with "]" is returned unchanged. That is the safety net: even if a future caller forgets to mark its throw, the double-wrap symptom is impossible. Tests cover all three layers: idempotency in errorParsing, the short-circuit in handleError, and a regression test on runNonInteractive that asserts no "[API Error: [API Error: ...]" line is ever produced on stderr. Fixes #3748 * fix(cli): make API error formatting idempotent for 429 suffix + use AlreadyReportedError * fix(cli): follow up on error reporting review	2026-05-03 08:39:31 +08:00
John London	5037fa7627	Feat/stats model cost estimation rebase (#3780 ) * feat(stats): add optional cost estimation to /stats model Adds optional cost estimation based on user-defined pricing in settings.json. Users can configure per-model pricing via the new modelPricing setting. When configured, /stats model shows estimated cost; when not configured, the behavior is unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(stats): extract raw model name from composite key for cost lookup flattenModelsBySource creates keys like "model::source", but modelPricing is keyed by raw model names. Extract the raw model name by splitting on "::" to fix cost lookup. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix: update settings schema after adding modelPricing Regenerate the VS Code settings schema to include the new modelPricing field so the lint check passes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(ui): add regression tests for cost estimation edge cases Coverage for the cost display fixes in ModelStatsDisplay: - Cost section hidden when no pricing + no thoughts - Cost section shown when pricing is configured (with value check) - Thoughts tokens included in cost calculation (with larger numbers to expose the before/after difference) - Raw model name used for pricing lookup with subagent attribution - Cost section shown when thoughts > 0 even without pricing - Multiple models with different pricing Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(ui): include thoughts tokens in ModelStatsDisplay cost calculation Address review feedback: the interactive cost estimate was omitting thoughts/reasoning tokens from the output token count, causing it to disagree with the non-interactive /stats model path. Changes: - hasPricing visibility gate now includes thoughts in outputTokens - Cost estimate calculation now includes thoughts in outputTokens - getModelName() already correctly extracts raw model name from flattened model::source keys for pricing lookup Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(stats): include thoughts tokens in non-interactive cost calc + update snapshots - Fix statsCommand.ts non-interactive path to include thoughts tokens in outputTokens for cost calculation, aligning with the interactive ModelStatsDisplay path. - Update 4 ModelStatsDisplay snapshots to reflect Cost section appearing when thoughts > 0 (even without pricing, showing N/A). * fix(test): address all CI failures and reviewer feedback on PR #3780 - Add `toModelMetrics` helper to statsCommand.test.ts to properly create ModelMetrics with required `bySource` property - Fix ModelStatsDisplay.test.tsx: - Add missing `sessionId` to SessionStatsState mock - Add missing `startNewSession` to useSessionStatsMock - Add `auto_accept` to all totalDecisions mocks (7 locations) - Add `files` to all SessionMetrics objects (6 locations) - Remove contradictory test "should show Cost section when thoughts > 0 even without pricing" per Option A (strict opt-in) - Revert 4 snapshots that incorrectly showed Cost/N/A lines for models without pricing configuration Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-03 07:31:08 +08:00

1 2 3 4 5 ...

5707 commits