2823 commits

Author SHA1 Message Date
pomelo-nwu
f4e01a409e fix(auth): address PR #4287 review (critical + suggestion)
vscode AuthMessageHandler (Critical):
- Add the missing protocol-selection step so custom-provider users can
  pick Anthropic/Gemini instead of being silently locked to OpenAI.
- Validate free-form base URL with the same /^https?:\/\// check the
  CLI uses; reject file:/javascript: schemes.

vscode AuthMessageHandler (Suggestion):
- Stop filtering separator entries from the provider QuickPick so
  groups (Alibaba Cloud / Third Party / Custom) actually show as
  headers instead of a flat list.
- Treat a null authInteractiveHandler as an error: surface an
  authError + cancellation notification instead of silently dropping
  the user's input.
- Call notifyAuthCancelled when validateApiKey rejects so the
  webview state resets and the user can retry.

core/providers/presets/openrouter.ts (Critical):
- Replace the substring includes() in ownsModel with a URL-hostname
  match so paths like https://api.example.com/openrouter.ai/v1 stop
  being misidentified as OpenRouter models (and getting removed on
  re-install).
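
A minimal sketch of the difference between a substring check and a hostname match. The helper name `isOpenRouterBaseUrl` is hypothetical, and accepting subdomains of `openrouter.ai` is an assumption; the real `ownsModel` signature is not shown in this log.

```typescript
// Hypothetical helper: decides whether a base URL points at OpenRouter
// by comparing the parsed hostname instead of searching the raw string.
function isOpenRouterBaseUrl(baseUrl: string): boolean {
  try {
    const { hostname } = new URL(baseUrl);
    // Assumption: subdomains of openrouter.ai also count as OpenRouter.
    return hostname === 'openrouter.ai' || hostname.endsWith('.openrouter.ai');
  } catch {
    // Not a parseable URL, so it cannot be an OpenRouter endpoint.
    return false;
  }
}
```

With this shape, `https://api.example.com/openrouter.ai/v1` is rejected because its hostname is `api.example.com`, whereas a plain `includes('openrouter.ai')` would accept it.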

vscode/services/settingsWriter.ts (Critical):
- Add stripTrailingCommas() so JSONC files with trailing commas (VSCode's
  default style) parse instead of silently returning {} and then
  overwriting the entire settings file.
- readSettings() distinguishes ENOENT (return {}) from parse errors
  (log + rethrow) so a malformed file never gets clobbered.
- writeSettings() writes through a temp file + fs.renameSync atomic
  rename, eliminating the half-written file window on EACCES /
  disk-full / crash.
- setValue() refuses to overwrite a scalar at an intermediate path
  segment (would have silently destroyed e.g. {"env": "legacy-string"}).
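
The scalar guard on `setValue()` could look roughly like the sketch below. It assumes dot-separated key paths and a plain-object settings tree; the `Json` type and the error text are illustrative, not taken from settingsWriter.ts.

```typescript
type Json = { [key: string]: unknown };

// Sketch: walk a dotted path, creating missing objects, but refuse to
// replace an existing non-object value at an intermediate segment.
// Key-safety filtering (e.g. prototype-related segments) is out of scope here.
function setValue(root: Json, path: string, value: unknown): void {
  const segments = path.split('.');
  const leaf = segments.pop();
  if (leaf === undefined) throw new Error('Empty settings path');
  let node: Json = root;
  for (const part of segments) {
    const next = node[part];
    if (next === undefined) {
      const created: Json = {};
      node[part] = created;
      node = created;
    } else if (typeof next === 'object' && next !== null && !Array.isArray(next)) {
      node = next as Json;
    } else {
      throw new Error(`Refusing to overwrite non-object value at "${part}"`);
    }
  }
  node[leaf] = value;
}
```

Calling `setValue({ env: 'legacy-string' }, 'env.API_KEY', 'x')` throws in this sketch, leaving the existing string in place.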

core/providers/install.ts (Suggestion):
- Move settings.backup?.() inside the try block so a backup failure
  still triggers the env-rollback path in catch.

cli/config/loadedSettingsAdapter.ts (Suggestion):
- Add the same UNSAFE_KEY_PARTS guard the vscode adapter has, so
  __proto__/constructor/prototype segments are rejected before
  reaching the underlying setNestedPropertySafe walker. Defense in
  depth: not exploitable today but the utility has no built-in guard.
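
A minimal sketch of such a guard, assuming dot-separated key paths. `UNSAFE_KEY_PARTS` and its three entries come from the message above; the `Set` representation and the `assertSafeKeyPath` name are assumptions.

```typescript
// Segments that could reach or modify an object's prototype chain.
const UNSAFE_KEY_PARTS = new Set(['__proto__', 'constructor', 'prototype']);

// Hypothetical adapter-side check, run before handing the key to a
// nested-property setter that has no built-in guard.
function assertSafeKeyPath(keyPath: string): void {
  for (const part of keyPath.split('.')) {
    if (UNSAFE_KEY_PARTS.has(part)) {
      throw new Error(`Unsafe settings key segment: ${part}`);
    }
  }
}
```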

vscode/webview/providers/WebViewProvider.ts (Suggestion):
- Hoist buildInstallPlan / applyProviderInstallPlanToFile to static
  imports (both modules already top-level imported); drops two
  per-call await import() round-trips.

cli/utils/doctorChecks.ts (Suggestion):
- Whitespace nit before the comma in the qwen-code-core import.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-19 13:58:25 +08:00
pomelo-nwu
18b35b48ec i18n(cli): translate "Connect an LLM provider" in all locales
Strict-parity locales (zh, zh-TW) require every built-in command
description to be translated; the renamed /auth description was
falling back to English and breaking the must-translate test.

Add translations for zh / zh-TW (required) and refresh the other
seven locales (en, ru, de, ja, fr, ca, pt) so the old
"Configure authentication information for login" key is removed
everywhere rather than left as a dangling dictionary entry.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-18 22:36:09 +08:00
pomelo-nwu
1b0b7f93a8 refactor(cli): rename /auth description to "Connect an LLM provider"
The old description ("Configure authentication information for login")
implied a Qwen-account login. After the /auth refactor it's really
about picking an LLM provider and entering credentials, so the menu
entry should say that.

Also add 'connect' as an alt-name alongside the existing 'login' so
users can type /connect when 'auth' feels wrong. Keep 'login' for
muscle-memory compatibility.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-18 22:12:36 +08:00
pomelo-nwu
ed08f7296b fix(auth): show base URL default as placeholder, not prefilled value
In Custom Provider Step 2/6 (and on protocol switch), the base URL
input started with the protocol's default URL pre-filled. Users who
wanted a non-default endpoint had to manually clear the field first.

Switch to placeholder semantics: the input starts empty, the default
URL is shown as a hint, and submitting blank falls back to that
default (then writes it back to baseUrl so downstream steps see a
real value).

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-18 21:56:28 +08:00
pomelo-nwu
0d8fe738f5 fix(auth): default Audio modality to off in provider advanced config
In the /auth Custom Provider advanced-config step, "Enable modality"
should default to Image + Video only. Audio was on by default, which
implied the model accepts audio input even though most providers
people configure here don't.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-18 21:49:31 +08:00
pomelo-nwu
efcfce5b6b Merge remote-tracking branch 'origin/main' into refactor/unify-provider-config-to-core
# Conflicts:
#	packages/cli/src/ui/components/ManageModelsDialog.test.tsx
#	packages/cli/src/ui/components/ManageModelsDialog.tsx
#	packages/cli/src/utils/apiPreconnect.ts
#	packages/core/src/providers/all-providers.ts
2026-05-18 20:49:31 +08:00
pomelo-nwu
6804a18f01 refactor(auth): unify applyProviderInstallPlan in core, drop cli/auth
CLI and vscode now share core's applyProviderInstallPlan instead of keeping
two parallel implementations. The CLI-only env rollback (snapshot
process.env, restore on error) is folded into the core version so vscode
also benefits from it.

CLI ships a LoadedSettingsAdapter that maps LoadedSettings to core's
ProviderSettingsAdapter contract. Backup/restore is layered: write a .orig
file, structuredClone settings + originalSettings, then recomputeMerged()
on restore — same guarantees as before, just routed through the adapter.

Tests for the install logic are migrated to core and rewritten against the
adapter mock (more focused than the previous LoadedSettings/Config mocks).

packages/cli/src/auth/ is gone entirely.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-18 18:14:12 +08:00
pomelo-nwu
423e453069 refactor(cli): drop OpenRouter OAuth + /manage-models, simplify /auth
OpenRouter now uses the standard API-key flow under "Third-party Providers"
(issue #4108). The whole OpenRouter OAuth implementation (PKCE, callback
server, model auto-install) and the /manage-models command (only OpenRouter
was wired in; /auth Step 2 already covers model selection) are removed.

/auth is reworked around the "Connect a Provider" mental model:
- Dialog title is now "Connect a Provider"; the OAuth main entry is gone
- handleAuthSelect (mixed close + auth trigger) is split into a single-purpose
  closeAuthDialog; legacy wrappers (handleSubscriptionPlanSubmit,
  handleApiKeyProviderSubmit, handleCustomApiKeySubmit, ...) are dropped in
  favor of the unified handleProviderSubmit

Core: openRouterProvider switches to authMethod='input', uiGroup='third-party',
ships with two recommended free models, and is reordered to the end of the
third-party list to keep DeepSeek as the default highlight.

Net diff: 34 files, +124 / -3835.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-18 17:54:40 +08:00
jinye
33b2e0dccc fix(serve): normalize Windows path separators in workspace file read responses (#4279)
`workspaceRelative` returned the platform-native separator from
`path.relative`, leaking backslashes into `/file`, `/stat`, `/list`,
and `/glob` response paths on Windows. Surfaced as a Windows-only CI
failure in the `GET /glob > scopes glob matches to cwd` test
(`['sub\\inside.ts']` vs expected `['sub/inside.ts']`).

Always emit POSIX-style separators so SDK consumers see the same
shape across platforms.
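
As an illustration only, the normalization can be as small as the sketch below. The helper name `toPosixSeparators` is hypothetical, and replacing every backslash is a simplification; the actual change to `workspaceRelative` is not shown here.

```typescript
// Sketch: convert a Windows-style relative path to POSIX-style separators.
// Simplification: every backslash is treated as a separator.
function toPosixSeparators(relativePath: string): string {
  return relativePath.split('\\').join('/');
}
```

Paths that already use forward slashes pass through unchanged, so the same call is safe on every platform in this simplified form.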

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
2026-05-18 17:53:14 +08:00
jifeng
b208f34455 feat(serve): add /demo debug page for qwen serve daemon (#4132)
* feat(serve): add /demo debug page for qwen serve daemon

Add a self-contained HTML debug page at GET /demo that provides a
browser-based UI for exercising all daemon routes: session
create/attach, prompt send/cancel, SSE event streaming, model
switching, permission voting, and health/capabilities checks.

Also add a same-origin exemption middleware (before the CORS deny
layer) so browser fetch calls from the demo page pass through while
external Origins remain blocked.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(serve): address CR feedback for /demo page security and robustness

- Fix XSS: build permission buttons with DOM APIs instead of innerHTML
- Fix SSE: move currentEvent outside read loop for cross-chunk frames
- Fix SSE: handle stream end (flush trailing buffer, update UI status)
- Security: move /demo route behind hostAllowlist and bearerAuth guards
- Security: add host.docker.internal to same-origin Origin allowlist
- Add Auth Token input and include Authorization header in API/SSE calls
- Add try/catch to /demo route handler with writeStderrLine logging
- Check API result before removing permission card from UI
- Add 7 tests for /demo route and Origin-stripping middleware

* fix(serve): move /demo before bearerAuth so browsers can reach it

Browsers cannot attach Authorization headers on address-bar navigation,
so /demo behind bearerAuth was unreachable when --token was set. Move
the /demo route after CORS + Host allowlist but before bearerAuth. The
static HTML shell contains no secrets; all API/SSE routes remain
bearer-protected and the in-page token input authenticates them.

* feat(serve): show 401 token hint on demo page

When an API call returns 401 Unauthorized, highlight the Auth Token
input field with a yellow border and display a hint message guiding
the user to enter their bearer token. Applies to both API calls and
SSE connections. The hint auto-dismisses after 6 seconds.

* fix(serve): address round-2 CR feedback for /demo security

- Loopback-gate /demo like /health: pre-auth on loopback, post-auth on
  non-loopback. Prevents unauthenticated access on public interfaces.
- Add X-Frame-Options: DENY + CSP frame-ancestors to /demo response to
  prevent clickjacking via cross-origin iframe embedding.
- Cache selfOrigins Set in Origin-stripping middleware (rebuild only
  when port changes) instead of allocating per-request.
- Clear pendingPerms + reset currentAssistantBubble on session switch
  to prevent stale permission cards from the previous session.
- Update tests: loopback vs non-loopback /demo auth, anti-clickjacking
  headers, rename CORS test, add localhost and [::1] origin coverage.

* fix(serve/demo): address CR round-2 feedback for demo page UX

- Replace innerHTML with DOM APIs in addLog() to prevent XSS
- Add MAX_LOG_ENTRIES=500 pruning to prevent unbounded DOM growth
- Add concurrent-send guard (promptInFlight) to prevent double submits
- Show error feedback in Chat tab when sendPrompt or createSession fails
- Disable prompt controls on SSE failure (both catch and non-OK paths)
- Add toolCall context detail (command/input/path/diff) to permission cards

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-18 16:31:03 +08:00
jinye
52d2850c7f feat(serve): safe workspace file read routes (#4175 PR 19) (#4269)
* refactor(serve/fs): glob audit hashes workspace + emits pattern

Closes PR #4250 follow-up #4.

Hashing the per-call cwd for glob audit produced a different
pathHash for every subdirectory glob without giving operators any
actionable difference (raw paths are privacy-gated). Replace the
hash basis with the bound workspace itself and surface the
literal pattern on a new schema field, so every glob row carries
a stable workspace marker and a per-call pattern.

The pattern field also fires on parse_error denials (path-escape
patterns, non-relative patterns) so audit consumers debugging a
production glob rejection can see the exact rejected pattern
without needing QWEN_AUDIT_RAW_PATHS=1.

* feat(serve): safe workspace file read routes (#4175 PR 19)

Add four read-only HTTP routes that consume PR 18's per-request
WorkspaceFileSystem boundary:

  - GET /file?path=...       text content + meta (encoding/BOM/lineEnding)
  - GET /list?path=...       directory entries (name/kind/ignored)
  - GET /glob?pattern=...    workspace-relative match paths
  - GET /stat?path=...       file/directory metadata

The routes share one error envelope (sendFsError) that maps
FsError.kind through the boundary's existing DEFAULT_STATUS_BY_KIND
table to a typed JSON response. All four 200 responses set
Cache-Control: no-store and X-Content-Type-Options: nosniff so a
browser-adjacent client cannot cache or sniff source content.

Routes are advertised under a single workspace_file_read capability
tag — the four endpoints share the same backing boundary and the
same failure shape, so per-route tags would force four
simultaneous registry entries with no operator-meaningful
difference between them. Mutating routes will ship in PR 20 under
their own workspace_file_write tag.

Trust gate is unchanged: read intents pass on untrusted workspaces
per PR 18's policy.ts. Auth follows the global bearer flow only;
read routes never run mutate(), since none of them mutate state.

* feat(serve): runQwenServe injects fsFactory + emit pipeline

Closes PR #4250 follow-up #2.

runQwenServe now constructs a WorkspaceFileSystem factory from
the bound workspace, threads its emit hook through to the read
routes, and exposes the trust snapshot via deps.trustedWorkspace.
Test additions pin the wiring contract:

  - audit events emitted on success / denial flow back through
    the test-supplied fsAuditEmit hook
  - deps.fsFactory override is honored (built-in default does not
    silently shadow injection)
  - trust snapshot defaults to true (operator-chosen workspace)
  - trust=false routes through to the boundary and trips
    untrusted_workspace on write intents

Default emit stays a stderr warning so a wiring regression that
drops events remains visible. PR 21's SSE fan-out will replace
the default with a workspace-scoped event channel.

* fixup(serve): address PR #4269 round-1 review feedback

Closes 8 findings from Copilot inline review + Codex review on
PR #4269 (5 P0, 3 P1):

P0 (correctness / privacy / operations)

- runQwenServe.ts: throttle the default fsAuditEmit by reusing
  the exported `createDefaultFsAuditEmit` from server.ts. The
  earlier per-event `writeStderrLine` would print one line for
  every /file/list/glob/stat audit event under normal traffic.
  Now warns once + every 100th drop with payload context, so a
  wiring regression is still visible without flooding logs.
  (Copilot runQwenServe.ts:316; Codex runQwenServe.ts:305)
- routes/workspaceFileRead.ts: probe glob with maxResults+1 and
  trim, so `truncated` reflects whether the boundary actually
  had more matches. Earlier `length === maxResults` heuristic
  false-positived when the workspace happened to hold exactly N
  matches. (Copilot workspaceFileRead.ts:399)
- routes/workspaceFileRead.ts: glob `relMatches` now flows
  through the shared `workspaceRelative` helper. Root match
  (`pattern=.`) renders as "." rather than the empty string
  `path.relative` returns; helper also covers the
  boundWorkspace-undefined edge case so the route no longer
  carries its own fallback branch.
  (Copilot workspaceFileRead.ts:388; review summary HIGH-1)
- fs/audit.ts: `pattern` field now rides on the same privacy
  gate as `relPath` / `message`. Glob patterns commonly carry
  workspace-relative or absolute path fragments
  (`src/secrets/*.env`, rejected `/Users/alice/ws/**`), so
  emitting them in privacy mode bypassed the same redaction the
  other path-bearing fields honor. Operators wanting full
  forensic context opt in via QWEN_AUDIT_RAW_PATHS=1.
  (Codex audit.ts:249)
- routes/workspaceFileRead.ts: cwd resolves with intent='list'
  rather than 'glob'. The orchestrator's `recordAndWrap`
  auto-derives `data.pattern` from `intent === 'glob'`, which
  turned cwd-resolution failures into rows where the cwd string
  masqueraded as the glob pattern (`?cwd=../outside` →
  `pattern: ../outside` in audit). Switching to 'list' is the
  correct semantic shape (cwd is a directory we intend to walk)
  with identical trust + path-resolution behavior.
  (Codex workspaceFileSystem.ts:941)
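
The `maxResults+1` probe from the glob item above can be sketched as follows. `limitMatches` and `globWithLimit` are hypothetical names, and the array `slice` stands in for a search that stops after `maxResults + 1` hits.

```typescript
// Sketch: given a probe that asked for one extra match, report whether
// more results existed and trim the list back to the requested size.
function limitMatches<T>(probe: T[], maxResults: number): { matches: T[]; truncated: boolean } {
  const truncated = probe.length > maxResults;
  return { matches: truncated ? probe.slice(0, maxResults) : probe, truncated };
}

// Stand-in for the route: the slice simulates a search capped at maxResults + 1.
function globWithLimit(allMatches: string[], maxResults: number): { matches: string[]; truncated: boolean } {
  const probe = allMatches.slice(0, maxResults + 1);
  return limitMatches(probe, maxResults);
}
```

A workspace holding exactly `maxResults` matches now reports `truncated: false`, which the earlier `length === maxResults` heuristic got wrong.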

P1 (cosmetic / comment accuracy)

- server.test.ts: `honors deps.fsFactory override` test comment
  rewritten to match the actual failure mode (a regression would
  404 on a.txt, not 200 against package.json). (Copilot server.test.ts:3219)
- routes/workspaceFileRead.ts: `limit` error message uses the
  MAX_LIST_ENTRIES constant instead of the literal 2000.
  (review summary MEDIUM)
- fs/audit.ts: expanded the JSDoc explaining why the AuditPublisher
  request types Omit four fields and pass `pattern` through.
  (review summary MEDIUM)

Test additions / adjustments

- audit.test.ts: split the existing pattern tests into raw-paths
  and privacy-default cases; added two new privacy-mode assertions
  that strip pattern under default config.
- workspaceFileSystem.test.ts: harness accepts `includeRawPaths`;
  glob audit suite runs with raw paths to observe `pattern`;
  new `glob audit privacy default` suite asserts pattern + relPath
  are stripped without the env opt-in.
- workspaceFileRead.test.ts: new GET /glob cases for the
  truncated edge case (count == maxResults → false; count >
  maxResults → true) and root-match normalization.

Not adopted (with rationale)

- review summary HIGH-2 (glob pathHash uses boundWorkspace): this
  is the deliberate follow-up #4 contract from PR 18; pattern is
  the per-call signal, pathHash is the workspace marker.
- review summary MEDIUM-1 (parseIntInRange three-state return):
  matches `parseMaxQueuedQuery` in server.ts; consistency wins.
- review summary LOW-1/2/3 (capabilities comment length, CSP
  header, reverse truncated:false assertion): rationale already
  documented in code, CSP belongs in a hardening PR, the
  reverse assertion already exists.

518/518 serve tests pass; typecheck + eslint clean within
src/serve/.

* fix(serve): address workspace file read review

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(serve): tighten workspace file read review followups

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-18 16:17:14 +08:00
易良
8f9b979ec0 test: reduce wait-dependent UI test delays (#3987)
* test: reduce wait-dependent UI test delays

* test: address UI test review feedback

* test: address remaining UI test review feedback

- Replace runPendingRetryTimer's polling loop with
  vi.advanceTimersToNextTimerAsync to decouple from connect()'s
  internal async structure.
- Document that newSessionWithRetry's 300ms auth-delay timer is
  unreachable under the current mocks so future tests advance it
  via the same helper instead of restoring shouldAdvanceTime.
- Stop swallowing runPendingRetryTimer errors behind
  connectPromise.catch(e => e); attach a noop catch only to mute
  the unhandled-rejection guard and assert directly on the
  original promise.
- Comment why flush() yields twice (Ink 7 + React 19 split a
  render across two microtask ticks) so it is not "optimized" to
  a single yield.
- Move the trailing unmount() into the finally block in the
  shell-history placeholder test so an assertion failure does
  not leak the Ink instance.
2026-05-18 15:34:34 +08:00
jinye
103090669e feat(serve): workspace memory and agents CRUD (#4175 Wave 4 PR 16) (#4249)
* feat(serve): workspace memory and agents CRUD (#4175 Wave 4 PR 16)

Adds the first Wave 4 mutation route surface: workspace-scoped memory
and subagent CRUD over HTTP. Remote clients (TUI / channels / web /
IDE adapters) can now list, read, create, update, and delete subagent
definitions and read / append / replace QWEN.md without disturbing
session state.

Routes:
- GET    /workspace/memory             (read-only snapshot)
- POST   /workspace/memory             (append/replace, strict-gated)
- GET    /workspace/agents             (list project + user + builtin)
- POST   /workspace/agents             (create-only; 409 on collision)
- GET    /workspace/agents/:agentType  (full detail incl. systemPrompt)
- POST   /workspace/agents/:agentType  (update; 403 read-only on builtin)
- DELETE /workspace/agents/:agentType  (idempotent for SDK callers)

Mutation paths use mutate({ strict: true }) from PR 15 so they refuse
unauthenticated requests even on no-token loopback defaults. Workspace
mutations validate X-Qwen-Client-Id against bridge.knownClientIds() and
stamp originatorClientId on emitted events.

Capability tags added: workspace_memory, workspace_agents.

New typed events fanned out via bridge.publishWorkspaceEvent (best-
effort to every active session bus; read-after-write is the contract):
- memory_changed { scope, filePath, mode, bytesWritten }
- agent_changed  { change, name, level }

writeContextFile.ts is the new core helper that resolves
QWEN.md placement (workspace vs ~/.qwen) and append-vs-replace
semantics. Whitespace-only appends short-circuit before fs.writeFile,
so a no-op POST does not bump mtime or fan out a misleading event.

SubagentManager is wrapped with a CRUD-scoped Config stub via Proxy:
only getSdkMode / getProjectRoot / getActiveExtensions are stubbed
(verified against subagent-manager.ts; getToolRegistry is execution-
path only). Any future Config method touched on a CRUD path throws
immediately so dependency creep is visible.

Auto-memory CRUD, persistent audit log, and the EACCES → NOT_FOUND
unlink mapping in core SubagentManager.deleteSubagent are explicit
follow-ups (PR 16.5 / PR 24 / separate fix).

Validation:
- typecheck: cli + sdk-typescript clean
- vitest:    serve 348/348, writeContextFile 10/10, SDK 335/335
- eslint:    clean

* fix(serve): address Codex P2 review on PR 16 (#4175 Wave 4 PR 16 follow-up)

Three correctness issues Codex flagged on the just-shipped workspace
memory + agents CRUD surface:

1. Concurrent POST /workspace/memory append no longer loses writes.
   Two simultaneous appends would each read the same existing file,
   compose new content in JS memory, then race the fs.writeFile —
   the later write silently overwrote the earlier appended entry.
   Add a per-resolved-path Mutex map (mirroring jsonl-utils.ts's
   fileLocks pattern) and wrap the entire read-compose-write
   sequence in runExclusive.

2. GET /workspace/agents now reflects out-of-band file changes.
   SubagentManager.listSubagents() served the in-memory cache by default;
   developer / IDE adapter edits to .qwen/agents/*.md never appeared
   even though GET /workspace/agents/:agentType always reads disk.
   Pass { force: true } so the LIST route walks disk every call,
   matching the detail route's "filesystem is the source of truth"
   contract.

3. Reject builtin agent names on POST /workspace/agents to prevent
   undeletable shadow files. A client could write a project-level
   agent named "general-purpose" — list/load resolved the shadow
   first, but SubagentManager.deleteSubagent's name-based builtin
   guard (subagent-manager.ts:302) rejected DELETE forever. Add a
   BuiltinAgentRegistry.isBuiltinAgent check in parseAgentConfig
   so the conflict surfaces at create time instead of trapping the
   file beyond the API. The check is case-insensitive, matching the
   resolver's case-insensitive cascade.
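
To illustrate item 1, here is a sketch of per-path exclusive execution. It uses a promise chain per key rather than a Mutex class, and an in-memory `Map` stands in for the file system; `files`, `locks`, and `appendEntry` are hypothetical names, and only `runExclusive` echoes the message above.

```typescript
// In-memory stand-in for files on disk, keyed by resolved path.
const files = new Map<string, string>();
// Tail of the pending work for each path.
const locks = new Map<string, Promise<void>>();

// Run `task` only after every earlier task for the same key has settled.
// Sketch only: entries in `locks` are never evicted.
function runExclusive<T>(key: string, task: () => Promise<T>): Promise<T> {
  const previous = locks.get(key) ?? Promise.resolve();
  const result = previous.then(task);
  // Keep the chain usable even if this task rejects.
  locks.set(key, result.then(() => undefined, () => undefined));
  return result;
}

// Read-compose-write wrapped as one exclusive unit.
async function appendEntry(path: string, entry: string): Promise<void> {
  await runExclusive(path, async () => {
    const existing = files.get(path) ?? '';
    await Promise.resolve(); // simulated I/O gap where a race could occur
    files.set(path, existing + entry + '\n');
  });
}
```

Without the `runExclusive` wrapper, concurrent callers would all read the same `existing` value and the last write would discard the others.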

New tests:
- writeContextFile.test.ts: 10 parallel appends, all 10 entries
  must survive in the final file (would fail without the mutex).
- workspaceAgents.test.ts: GET /workspace/agents observes a
  freshly-written agent file on the second call (force-refresh
  proof); POST with name="general-purpose" returns 422 + the
  case-insensitive variant "explore" too.

Validation:
- typecheck: cli + sdk-typescript clean
- vitest: serve 351/351 (was 348, +3 new), writeContextFile 11/11
- eslint: clean

* fix(serve): apply round-1 review fold-in 2a (HIGH + CodeQL) on PR 16

Round-1 inline review (#4249) flagged ~28 items across Copilot,
wenshao, and CodeQL. This commit lands the HIGH-severity correctness
fixes plus the two CodeQL polynomial-regex warnings.

Validation tightening (`parseAgentConfig` + `parseAgentUpdates`):
- Trim leading/trailing whitespace on `name` before passing to
  SubagentManager. `" tester "` would otherwise create a frontmatter
  name with spaces that case-insensitive lookups can never find.
- Fail-closed (422 invalid_config) on present-but-wrong-type optional
  scalars: `model`, `color`, `approvalMode`, `background`. Previously
  malformed values silently dropped through validation, masking
  client-serialization bugs.
- Validate `approvalMode` against the `APPROVAL_MODES` enum on both
  create and update; an unknown value used to 201 with the field
  silently omitted from the saved file.
- `runConfig` is now whitelist-sanitized to `{ max_time_minutes,
  max_turns }` only; unknown keys are dropped, malformed values
  return 422. Previously the whole input object was persisted
  verbatim into YAML frontmatter.
- `?scope=` query is fail-closed for repeated values
  (`?scope=workspace&scope=global`) — Express parses these as arrays
  which the previous `typeof === 'string'` check silently treated as
  absent, broadening DELETE/UPDATE semantics from one level to both.
- Empty update body returns 400 invalid_config (previously rewrote
  the file + emitted a misleading `agent_changed` event).
- No-op updates (every supplied field already matches `existing`)
  return 200 + `changed: false` and SKIP the file rewrite + event
  fan-out.
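
The fail-closed `?scope=` handling can be sketched like this. The two scope values come from the example query above; the `ScopeResult` shape, `parseScope` name, and reason strings are assumptions.

```typescript
type Scope = 'workspace' | 'global';
type ScopeResult =
  | { ok: true; scope: Scope | undefined }
  | { ok: false; reason: string };

// Sketch: an absent query value means "no scope filter"; anything that
// is present but not a single known string is rejected, not ignored.
function parseScope(raw: unknown): ScopeResult {
  if (raw === undefined) return { ok: true, scope: undefined };
  if (typeof raw !== 'string') {
    // Repeated query parameters arrive as arrays and land here.
    return { ok: false, reason: 'scope must be a single value' };
  }
  if (raw === 'workspace' || raw === 'global') return { ok: true, scope: raw };
  return { ok: false, reason: `unknown scope: ${raw}` };
}
```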

Memory write helper — `writeContextFile.ts`:
- Move whitespace-only no-op detection BEFORE `fs.mkdir`. Without
  this, an empty POST still created the parent directory and bumped
  its mtime even though `changed: false` was reported.
- Replace two polynomial regex patterns flagged by CodeQL
  (`/^\s+|\s+$/g` and `/^\n+|\n+$/g`) with hand-rolled `while` loops.
  Same pattern auth.ts:120-125 already uses for the same CodeQL rule.
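
A hand-rolled replacement for the `/^\n+|\n+$/g` case might look like the sketch below. The function name `trimNewlines` is hypothetical; the loops do linear work with no regex backtracking.

```typescript
// Sketch: strip leading and trailing newlines with two index scans.
function trimNewlines(text: string): string {
  let start = 0;
  let end = text.length;
  while (start < end && text[start] === '\n') start++;
  while (end > start && text[end - 1] === '\n') end--;
  return text.slice(start, end);
}
```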

SDK — `DaemonClient.ts` + `types.ts`:
- `DaemonWriteMemoryResult` gains optional `changed?: boolean` so
  typed callers can suppress redundant cache invalidation on no-op
  appends. Optional for forward-compat with daemons that predate the
  field; undefined is treated as "changed: true" (legacy contract).
- `deleteWorkspaceAgent` only swallows 404 when the body's `code`
  is `agent_not_found`. A bare 404 (older daemon, misrouted proxy,
  generic gateway page) now throws — previously the SDK silently
  reported success even when the request never reached a route that
  understands workspace agents.
- `updateWorkspaceAgent` adds an optional `scope` parameter
  mirroring `deleteWorkspaceAgent`, so callers can target the user-
  level definition when a project-level agent shadows it.
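
The 404 triage in `deleteWorkspaceAgent` could be sketched as follows. The `DeleteResponse` shape and the return labels are assumptions; the `agent_not_found` code and the 204 / structured 404 / bare 404 split come from this commit series.

```typescript
interface DeleteResponse {
  status: number;
  body?: unknown;
}

// True only when the body is an object whose `code` equals the expected value.
function hasCode(body: unknown, code: string): boolean {
  return typeof body === 'object' && body !== null && (body as { code?: unknown }).code === code;
}

// Sketch: treat a structured 404 as "already gone", but surface a bare 404,
// which may mean the request never reached a route that knows about agents.
function interpretDelete(res: DeleteResponse): 'deleted' | 'already-absent' {
  if (res.status === 204) return 'deleted';
  if (res.status === 404 && hasCode(res.body, 'agent_not_found')) return 'already-absent';
  throw new Error(`DELETE failed with status ${res.status}`);
}
```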

Validation:
- typecheck: cli + sdk-typescript clean
- vitest: serve 357/357 + writeContextFile 12/12 = 369/369 passing
  (was 362; +7 new)
- eslint: clean

Explicitly NOT applying (out of scope per issue #4175 PR 16
review-resolution policy):
- Copilot's "strict gate after body parser" finding — already
  documented as PR 15 review-resolved tradeoff at auth.ts:256-269.

* fix(serve): apply round-1 review fold-in 2b (MEDIUM + tests) on PR 16

MEDIUM hardening:
- Fix the JSDoc on `collectWorkspaceMemoryStatus` to match the
  workspace-root-only discovery the implementation actually does
  today. The 32-iteration upward walk is reserved for a future
  hierarchical mode but breaks after iteration 1 in v1.
- Lower the depth limit on `walkWorkspaceForMemory` from 32 → 12.
  Realistic project depth sits well below 8; 12 leaves headroom
  without amplifying blast radius from symlink cycles.
- Daemon `Config` Proxy now defines a `has` trap symmetric to the
  existing `get` trap. Without it, a future SubagentManager path
  doing `'someMethod' in this.config` would silently get `false` and
  bypass the safety net the throw-on-unknown-property design
  installed.
- Preflight `manager.loadSubagent(name, level)` before
  `manager.createSubagent`. The default-path collision check inside
  SubagentManager would otherwise miss same-frontmatter-name +
  different-filename collisions; the preflight makes 409
  agent_already_exists deterministic.
- Multi-level DELETE now emits one `agent_changed` event per level
  that actually had a file removed. Previously an unscoped DELETE
  removing both project and user shadows would publish only one
  event with one level — misleading subscribers using event metadata
  for toasts / audit / echo-suppression.
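
A sketch of the symmetric `get` / `has` traps described for the daemon `Config` Proxy. The `strictStub` helper and its error text are hypothetical, and having `has` throw for unknown names (rather than return a value) is an assumption about how the safety net behaves.

```typescript
// Sketch: wrap a partial stub so that touching any property it does not
// define fails loudly, whether through property access or the `in` operator.
function strictStub<T extends object>(stub: T, label: string): T {
  return new Proxy(stub, {
    get(target, prop, receiver) {
      if (typeof prop === 'symbol' || prop in target) {
        return Reflect.get(target, prop, receiver);
      }
      throw new Error(`${label}: unexpected access to "${prop}"`);
    },
    has(target, prop) {
      if (typeof prop === 'symbol' || prop in target) {
        return Reflect.has(target, prop);
      }
      throw new Error(`${label}: unexpected "in" check for "${prop}"`);
    },
  });
}
```

Without the `has` trap, `'someMethod' in config` would quietly evaluate to `false` and skip the throwing path.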

Test additions (covers the new event types + bridge fan-out + SDK
helpers):
- `daemonEvents.test.ts`: predicate narrowing for `memory_changed` /
  `agent_changed` (rejects malformed scope/mode/level), reducer
  records `lastWorkspaceMutation` + `lastWorkspaceMutationType` with
  latest-event-wins semantics and stays non-terminal.
- `httpAcpBridge.test.ts`: `publishWorkspaceEvent` fans out to every
  active session bus; `knownClientIds()` aggregates clientIds across
  sessions and the returned set is a snapshot (mutating it does not
  affect future calls).
- `workspaceAgents.test.ts`: success-path test stamping
  `originatorClientId` on the create / update / delete events for a
  known client.
- `DaemonClient.test.ts`: 7 round-trip tests for the new SDK helpers
  (workspaceMemory, writeWorkspaceMemory, listWorkspaceAgents,
  getWorkspaceAgent, createWorkspaceAgent, updateWorkspaceAgent with
  scope query, deleteWorkspaceAgent: 204 / structured 404 / bare 404
  triage).
- `writeContextFile.test.ts`: replace the 30ms-mtime test with a
  `vi.spyOn(fs, 'writeFile')` assertion that the no-op path never
  invokes writeFile. Deterministic on every filesystem.

Validation:
- typecheck: cli + sdk-typescript clean
- vitest: serve 363/363 + writeContextFile 12/12 + SDK 347/347
- eslint: clean

Reviewer guide: combined with fold-in 2a (commit 134c43c82),
PR 16's round-1 review feedback is closed except for the explicitly-
deferred Copilot finding on "strict gate after body parser" (already
documented as PR 15 review-resolved tradeoff at auth.ts:256-269).
The DRY refactor wenshao suggested for `resolveOriginatorClientId`
is left as a future sweep — it touches multiple Wave 4 routes and
should land alongside PR 17/19/20/21 to keep the helper's shape
informed by all consumers.

* docs(serve): apply round-1 review fold-in 2c (doc/type tightening) on PR 16

Two doc-only fixes that close the last open Copilot threads on PR
#4249 — both are JSDoc/tsdoc corrections where the wording promised
broader behavior than the implementation actually delivers, so a
maintainer or SDK consumer reading the type would form a wrong
mental model.

1. `DaemonAgentLevel` (sdk-typescript) and `ServeAgentLevel` (cli
   serve) keep `'extension'` + `'session'` on the union for forward-
   compat but the JSDoc now explicitly says the daemon does NOT
   return either today. The `'extension'` case is gated by the
   daemon's stub `Config.getActiveExtensions()` returning `[]`;
   `'session'` is a runtime-only `SubagentManager` cache the CRUD
   routes don't read. Both arms stay so a future PR exposing either
   source is not a breaking SDK change.
2. `DaemonClient.workspaceMemory()` tsdoc no longer says
   "hierarchical" — v1 only discovers files at the bound workspace
   root + the global `~/.qwen` directory, no parent-directory walk.
   The 12-iteration upward-walk loop body inside
   `walkWorkspaceForMemory` is reserved for PR 16.5 hierarchical
   mode and breaks after iteration 1 today; the SDK doc now states
   that explicitly so callers don't expect more than they receive.

No runtime change. Validation:
- typecheck: cli + sdk-typescript clean
- vitest: 363/363 serve + 12/12 writeContextFile + SDK unchanged
- eslint: clean

* fix(serve): apply round-2 review fold-in 2d on PR 16

wenshao round-2 (4 inline comments at 16:51-16:53Z): three real bugs
+ one performance-tradeoff doc note.

1. `composeAppendedContent` now inserts inside the MEMORY section,
   not at EOF. Previously a QWEN.md whose `## Qwen Added Memories`
   block was followed by another `## ...` heading would silently
   land each new entry past the next heading — moving entries into
   the wrong section. Walk the memory header forward, find the next
   `\n## ` heading, and insert just before it. Fall back to the EOF
   append when the memory section is the last block.

2. `parseAgentUpdates` now matches the create-side trim/empty rule
   for `description` (rejects whitespace-only) and ensures
   `systemPrompt` rejects the empty string. Update path used to
   silently accept `"   "` and overwrite the field with blank
   content — divergent from create which 422s the same payload.

3. `isNoOpUpdate`'s runConfig comparison no longer false-positives
   on partial updates. Comparing every known runConfig field against
   `existing` treated absent keys as `undefined` while existing had
   real values — so `{max_time_minutes: 30}` against `{max_time_minutes:
   30, max_turns: 10}` claimed non-no-op and re-emitted
   `agent_changed`. Fixed to only compare keys actually present in
   `updates.runConfig`, matching `mergeConfigurations` semantics
   (existing values preserved when not in updates).

4. JSDoc on the LIST-route `force: true` call now explains the
   tradeoff (no TTL cache / no fs.watch invalidation): re-introducing
   caching would re-introduce the stale-list bug Codex P2 #2 fixed,
   `fs.watch` is platform-fragile, and PR 24's audit/policy layer is
   the proper home for request rate limiting. Sub-millisecond cost
   per request on local SSD; revisit if profiling flags it.
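
The partial-update comparison from item 3 can be sketched as below. This is a minimal illustration under stated assumptions, not the route's code: `RunConfig` and `isRunConfigNoOp` are hypothetical names, and only the two runConfig fields named in this PR are modelled.

```typescript
// Hypothetical shape: only the two runConfig fields mentioned in this PR.
interface RunConfig {
  max_time_minutes?: number;
  max_turns?: number;
}

// Returns true when applying `updates` on top of `existing` changes nothing.
// Only keys actually present in `updates` are compared; an absent key means
// "keep the existing value", not "set it to undefined".
function isRunConfigNoOp(existing: RunConfig, updates: RunConfig): boolean {
  const keys = Object.keys(updates) as Array<keyof RunConfig>;
  return keys.every((key) => updates[key] === existing[key]);
}

const existingConfig: RunConfig = { max_time_minutes: 30, max_turns: 10 };
const sameValue = isRunConfigNoOp(existingConfig, { max_time_minutes: 30 });
const realChange = isRunConfigNoOp(existingConfig, { max_turns: 20 });
```

With this shape, the payload from the bug report (`{max_time_minutes: 30}` against an existing config that also has `max_turns`) is classified as a no-op, while a changed `max_turns` is not.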

Tests:
- writeContextFile.test.ts: section-boundary insertion + EOF fallback
- workspaceAgents.test.ts: whitespace-only description rejected; partial
  runConfig no-op detection; partial runConfig real change preserves
  omitted keys via mergeConfigurations

Validation:
- typecheck: cli + sdk-typescript clean
- vitest: 368/368 (was 363, +5 new)
- eslint: clean

* fix(serve): apply round-3 review fold-in 2e on PR 16

wenshao round-3 (5 inline [Suggestion]s, all real correctness or
forward-compat issues; one item carried over from round-2):

1. `parseAgentConfig` rejects whitespace-only `systemPrompt` on
   create, matching the description field's `trim().length === 0`
   rule. Pure-whitespace prompts collapse to nothing on YAML
   serialization and the agent can't operate without instructions —
   422 at the boundary is friendlier than the downstream "agent does
   nothing" failure.
2. `parseAgentUpdates` mirrors the same `trim()` check on the update
   path so `{systemPrompt: "   "}` returns 422 rather than silently
   blanking the field.
3. `POST /workspace/memory` `file_error` 500 response now carries
   `scope`, `mode`, optional `osCode` (`EACCES`/`EROFS`/`ENOSPC`/...)
   and a redacted `errorMessage`. Previous shape was just
   `{error, code: 'file_error'}` — callers had nothing to branch on.
4. `composeAppendedContent` runs `fs.stat` before `fs.readFile` and
   refuses with a typed `WorkspaceMemoryFileTooLargeError` when the
   existing file exceeds 16 MB. Without this cap a pathological QWEN.md
   would be loaded into the daemon heap on every append. The route
   maps the typed error to a 413 with `code: 'memory_file_too_large'`
   plus `bytes` / `limit` so callers can decide whether to trim or
   switch to mode=replace.
5. `toDetail` no longer spreads `config.runConfig` with a cast.
   Explicit field-by-field pick of `max_time_minutes` / `max_turns`
   ensures any future `SubagentConfig.runConfig` field requires a
   deliberate route-schema update rather than silently leaking
   through the HTTP API.
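
The explicit pick from item 5 might look roughly like this. `SubagentRunConfig`, `RunConfigDetail`, `pickRunConfig`, and the `internalFlag` field are hypothetical and exist only to show why a spread would leak unknown fields.

```typescript
// Hypothetical config shape; the index signature stands in for any field a
// future change might add to runConfig without updating the route schema.
interface SubagentRunConfig {
  max_time_minutes?: number;
  max_turns?: number;
  [extra: string]: unknown;
}

interface RunConfigDetail {
  max_time_minutes?: number;
  max_turns?: number;
}

// Pick the whitelisted fields one by one instead of spreading the object.
function pickRunConfig(runConfig: SubagentRunConfig): RunConfigDetail {
  const picked: RunConfigDetail = {};
  if (runConfig.max_time_minutes !== undefined) {
    picked.max_time_minutes = runConfig.max_time_minutes;
  }
  if (runConfig.max_turns !== undefined) {
    picked.max_turns = runConfig.max_turns;
  }
  return picked;
}

const pickedDetail = pickRunConfig({ max_turns: 10, internalFlag: true });
```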

Tests:
- workspaceAgents.test.ts: whitespace-only systemPrompt rejected on
  create AND update; toDetail.runConfig only emits whitelisted keys
- existing tests still cover the description-side trim and the
  partial runConfig no-op detection from fold-in 2d

Validation:
- typecheck: cli + sdk-typescript clean
- vitest: 371/371 (was 368, +3 new)
- eslint: clean

Reviewer note: response shape on 500 file_error is additive
(`scope`/`mode`/`osCode`/`errorMessage` are new fields), so SDK
callers that only consumed `{error, code}` keep working. The new
413 `memory_file_too_large` is a new error code SDK consumers can
branch on but that pre-PR-16 daemons never emitted, so adding it is
also additive.

* fix(sdk): expose `changed` on DaemonAgentMutationResult (PR 16 round-4)

wenshao round-4 review (single inline at types.ts:434): the agent
update route emits `changed: true` for real updates and
`changed: false` for no-op short-circuits (introduced in fold-in
2a alongside the no-op detection), but `DaemonAgentMutationResult`
in the SDK type still only exposed `{ ok, agent }`. Typed callers
of `updateWorkspaceAgent()` couldn't observe the no-op signal even
though `DaemonClient` already returns the raw JSON at runtime.

Add optional `changed?: boolean` matching the shape introduced for
`DaemonWriteMemoryResult.changed` in fold-in 2a. Optional for
forward-compat with daemons that predate the field; SDK consumers
should treat `undefined` as `true` (the legacy contract — every
successful create / update was a write before fold-in 2a's no-op
short-circuit landed).
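
A caller-side reading of the optional field could be sketched as follows. The interface is a trimmed-down stand-in (the `agent` payload is reduced to a name) and `didWrite` is a hypothetical helper, not part of the SDK.

```typescript
// Trimmed-down, hypothetical version of the SDK result type.
interface DaemonAgentMutationResult {
  ok: boolean;
  agent: { name: string };
  changed?: boolean;
}

// A missing `changed` comes from a daemon that predates the field, where every
// successful mutation was a write, so it is read as `true`.
function didWrite(result: DaemonAgentMutationResult): boolean {
  return result.changed ?? true;
}

const legacyDaemon = didWrite({ ok: true, agent: { name: "reviewer" } });
const noOpUpdate = didWrite({ ok: true, agent: { name: "reviewer" }, changed: false });
```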

Test:
- `DaemonClient.test.ts`: round-trip asserts the typed result
  surfaces `changed: false` from the wire payload.

Validation:
- typecheck: cli + sdk-typescript clean
- vitest: 82/82 in DaemonClient.test.ts (was 81; +1 new)
- eslint: clean

* fix(serve): apply round-6 review fold-in 2g on PR 16

Round-6 review (gpt-5.5 [Critical] + 5 wenshao [Suggestion]s).

[Critical] Per-level delete verification (workspaceAgents.ts):
- gpt-5.5 flagged that `SubagentManager.deleteSubagent` swallows
  per-level `fs.unlink()` failures (subagent-manager.ts:332-336)
  and returns success as long as ANY level was removed. Trusting
  that signal would let the route publish `agent_changed`/`deleted`
  for a file still on disk under EACCES/EBUSY/EPERM — the client UI
  would drop a still-active definition from cache.
- Route now runs `fs.access` on each pre-checked level's file path
  AFTER `manager.deleteSubagent` returns and partitions into
  `removed` / `remaining`. Events are emitted ONLY for confirmed
  removals; if any level still has its file, the route returns 500
  `agent_delete_partial` with `removedLevels` + `remainingLevels`
  so callers can act precisely.
- New test installs a 0o555 chmod on the user-level agents directory
  so `fs.unlink` raises EACCES while the project-level unlink
  succeeds, asserting both the 500 response and that exactly one
  `agent_changed` event fired for the level that actually went away.

Concurrency consistency (writeContextFile.ts):
- Whitespace-only no-op detection now happens INSIDE the per-file
  mutex's `runExclusive` block. The pre-fix layout did the
  short-circuit `fs.stat` outside the lock; under concurrent
  POSTs (one whitespace-only, one with real content) the no-op's
  `bytesWritten` could lag the post-write reality. Functional
  behavior was already correct; this aligns the snapshot with the
  post-write state.

Defense-in-depth + DRY (workspaceAgents.ts):
- `validateAgentType(req, res)` regex-validates the `:agentType` URL
  parameter at the route boundary against the same
  `/^[\p{L}\p{N}_-]+$/u` pattern as `SubagentValidator.validateName`,
  with a 64-char cap. `findSubagentByNameAtLevel`'s readdir scan
  already prevented path traversal, but failing fast at the boundary
  keeps surprising inputs out of downstream code paths. Two new
  tests cover `..%2Fetc%2Fpasswd` and over-long names.
- `parseScopeQuery(req, res)` extracts the duplicated `?scope=` query
  parser from the POST update + DELETE handlers. Same fail-closed
  semantics on repeated/non-string values.
- `assertMutableLevel(found, agentType, res)` extracts the
  duplicated 403 guard (`isBuiltin`, or `level` being `'builtin'`,
  `'extension'`, or `'session'`). Future Wave 4 mutation routes (PR
  17 / 19 / 20) call this helper instead of re-implementing the
  predicate.
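
The boundary check on `:agentType` can be sketched as a pure predicate. `isValidAgentType` and the two constants are hypothetical names; the real helper also writes the HTTP response, and the length check here counts UTF-16 code units, which is an assumption.

```typescript
// Limits taken from the commit text: letters, digits, `_` and `-`, at most
// 64 characters. Built with the RegExp constructor so the pattern is plain data.
const AGENT_TYPE_PATTERN = new RegExp("^[\\p{L}\\p{N}_-]+$", "u");
const MAX_AGENT_TYPE_LENGTH = 64;

// Hypothetical pure predicate for the route-boundary validation.
function isValidAgentType(agentType: string): boolean {
  return (
    agentType.length <= MAX_AGENT_TYPE_LENGTH &&
    AGENT_TYPE_PATTERN.test(agentType)
  );
}

const traversalAttempt = isValidAgentType(decodeURIComponent("..%2Fetc%2Fpasswd"));
const overLongName = isValidAgentType("a".repeat(65));
const ordinaryName = isValidAgentType("code-reviewer_2");
```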

Client-id helper consistency (workspaceMemory.ts):
- `resolveWorkspaceClientId` removed; the inline branch in the POST
  handler now mirrors `workspaceAgents.ts:resolveOriginatorClientId`
  (validate against `bridge.knownClientIds()`, send 400 directly,
  return so the caller short-circuits). Previously this file threw
  `InvalidClientIdError` and caught it locally — wenshao round-6
  flagged the throw-vs-direct-400 inconsistency between the two
  files. The deeper full-extraction DRY refactor remains deferred
  to the cross-Wave-4 sweep with PR 17/19/20/21.

Won't-fix doc note (workspaceMemory.ts):
- Mount-point JSDoc now explicitly explains why the route returns
  absolute on-disk paths (success / 413 / GET list): clients
  pre-flight `caps.workspaceCwd` to learn the bound workspace and
  can compute relative paths if they want; the global scope's
  `~/.qwen/QWEN.md` is NOT under the workspace root, so a
  workspace-relative form would lose information. Path redaction
  for multi-tenant deployments belongs to PR 24's `--redact-errors`
  policy work, not a per-route default flip in PR 16.

Validation:
- typecheck: cli + sdk-typescript clean
- vitest: 374/374 (was 371, +3 new)
- eslint: clean

* fix(serve): apply round-7 review fold-in 2h on PR 16

glm-5.1 round-7: 2 [Critical] + 5 [Suggestion] inline comments. Five
applied as code changes; one is a stale-snapshot false positive
(workspaceMemory.ts no longer has the InvalidClientIdError call site
glm-5.1 referenced — fold-in 2g already replaced it with inline
400); one is rationale-replied (INVALID_CONFIG → 422 mapping
suggestion is based on incorrect premise about manager semantics).

[Critical] Code-fence-aware section-boundary detection (writeContextFile.ts):
- The naive `\n## ` indexOf scan would split user-authored memory
  entries that quote markdown documentation containing `##` headings
  inside fenced code blocks. New `findNextTopLevelHeading` helper
  tracks fence state line-by-line and only accepts matches outside
  fences. Two new tests: (a) entry containing a fenced `## Request
  Body` keeps its body intact; (b) real `## post` heading outside
  fences still acts as the section boundary.
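
A line-by-line scan of this kind might look like the sketch below. It reuses the helper name from the commit text, but the signature and body are assumptions; it only handles backtick fences and simply toggles on every fence line, which is looser than full CommonMark fence matching.

```typescript
const FENCE = "`".repeat(3);

// Returns the character offset of the first line starting with "## " at or
// after `from` that is not inside a fenced block, or -1 if there is none.
// Assumes `from` points at the start of a line outside any fence.
function findNextTopLevelHeading(text: string, from: number): number {
  let offset = from;
  let inFence = false;
  for (const line of text.slice(from).split("\n")) {
    if (line.trim().startsWith(FENCE)) {
      inFence = !inFence; // an opener or closer toggles the fence state
    } else if (!inFence && line.startsWith("## ")) {
      return offset;
    }
    offset += line.length + 1; // +1 for the newline consumed by split
  }
  return -1;
}

const memoryDoc = [
  "## Qwen Added Memories",
  "- entry quoting docs:",
  FENCE + "md",
  "## Request Body",
  FENCE,
  "## post",
  "trailing section",
].join("\n");
const bodyStart = memoryDoc.indexOf("\n") + 1; // skip the memory header line
const boundary = findNextTopLevelHeading(memoryDoc, bodyStart);
```

In this sample the fenced `## Request Body` line is skipped and `boundary` points at the real `## post` heading.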

[Suggestion] errorMessage + filePath gating (workspaceMemory.ts):
- 500 `file_error` and 413 `memory_file_too_large` responses now
  omit `errorMessage` and `filePath` unless `QWEN_SERVE_DEBUG` is
  set. Default response carries `error / code / scope / mode /
  osCode` — enough for SDK callers to branch without leaking
  absolute filesystem paths. New test asserts both modes round-trip
  the right shape.
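
The gating can be sketched as a small body builder. Everything here is illustrative: `buildFileErrorBody` is a hypothetical name, the `debug` argument stands in for the `QWEN_SERVE_DEBUG` check, and the generic `error` string is a placeholder rather than the daemon's real wording.

```typescript
interface FileErrorBody {
  error: string;
  code: "file_error";
  scope: string;
  mode: string;
  osCode?: string;
  errorMessage?: string;
  filePath?: string;
}

interface FileErrorDetails {
  scope: string;
  mode: string;
  osCode?: string;
  message: string;
  filePath: string;
}

// Path-bearing fields are attached only when debug mode is on.
function buildFileErrorBody(details: FileErrorDetails, debug: boolean): FileErrorBody {
  const body: FileErrorBody = {
    error: "Failed to write workspace memory file", // placeholder text
    code: "file_error",
    scope: details.scope,
    mode: details.mode,
  };
  if (details.osCode !== undefined) body.osCode = details.osCode;
  if (debug) {
    body.errorMessage = details.message;
    body.filePath = details.filePath;
  }
  return body;
}

const sampleFailure: FileErrorDetails = {
  scope: "workspace",
  mode: "append",
  osCode: "EACCES",
  message: "EACCES: permission denied, open '/tmp/ws/QWEN.md'",
  filePath: "/tmp/ws/QWEN.md",
};
const defaultBody = buildFileErrorBody(sampleFailure, false);
const debugBody = buildFileErrorBody(sampleFailure, true);
```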

[Suggestion] publishWorkspaceEvent visibility (httpAcpBridge.ts):
- Catch block now writes to stderr unconditionally during normal
  operation; only downgrades to the debug channel when
  `shuttingDown` is true. `EventBus.publish` is documented never to
  throw, so a hit in normal ops is by definition a regression that
  must be visible in production logs — silencing via debug-gate
  could let a true bug succeed at the route layer (200 OK) while
  SSE subscribers stop receiving events.

[Suggestion] Log-injection defense for `agentType` (workspaceAgents.ts):
- New `safeLogValue` helper wraps `agentType` interpolations in
  `JSON.stringify(...).slice(0, 82)` before stderr writes (mirrors
  `server.ts:1340`). The route's `validateAgentType` regex already
  rejects names with control chars, but defense-in-depth covers
  legacy on-disk shadows and future fields. Five `writeStderrLine`
  call sites updated (GET / POST / DELETE failure, reload-failure,
  partial-delete, create-reload-failure).
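
As described, the helper amounts to JSON-encoding the value and capping its length. A minimal sketch, assuming the helper takes a plain string:

```typescript
// JSON.stringify escapes control characters such as "\n", so a hostile value
// cannot start a forged log line; slice caps the encoded form at 82 chars.
function safeLogValue(value: string): string {
  return JSON.stringify(value).slice(0, 82);
}

const forgedLine = safeLogValue("x\n[serve] fake entry");
const cappedValue = safeLogValue("a".repeat(500));
```

Note that the cap is applied after encoding, so a truncated value loses its closing quote; that is acceptable for log output but the result should not be parsed back as JSON.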

[Suggestion] Simplify walkWorkspaceForMemory (workspaceMemory.ts):
- Replaced the 12-iteration loop with a straightforward single-pass
  stat-each-filename. The `seen` Set, `cursor = parent` walk, and
  filesystem-root guard were dead code (the loop unconditionally
  broke on first iteration). PR 16.5's hierarchical mode lands as a
  fresh upward walk rather than re-enabling commented-out code.

Validation:
- typecheck: cli + sdk-typescript clean
- vitest: 377/377 (was 374, +3 new)
- eslint: clean

Reviewer notes (NOT adopting):
- glm-5.1's "InvalidClientIdError('workspace', ...)" message-confusion
  Critical: stale-snapshot false positive — fold-in 2g already
  removed `resolveWorkspaceClientId` and inlined a 400 with the
  correct "registered for this workspace" wording. Only a comment
  reference remains.
- glm-5.1's "INVALID_CONFIG → 422" suggestion: SubagentManager only
  ever throws INVALID_CONFIG for read-only conditions (built-in /
  extension / session) — not for malformed config (which uses
  VALIDATION_ERROR). The current 403 mapping in update + delete is
  correct for the manager's actual semantics.

* fix(serve): apply round-8 review fold-in 2i on PR 16

wenshao round-8: 2 [Critical] path-disclosure + 5 [Suggestion]
(name regex, per-field caps, mutex timeout, test gaps, tilde fence).
All adopted.

[Critical] C1 — 413 `err.message` path disclosure (workspaceMemory.ts):
- The 413 `memory_file_too_large` response sent `err.message`
  unconditionally as the `error` field. The
  `WorkspaceMemoryFileTooLargeError` constructor embeds the
  absolute file path in its message ("Existing memory file at
  /Users/<x>/.qwen/QWEN.md is ..."), bypassing the `debugMode()`
  gating that already hid the `filePath` field. Same gating now
  applies to both `error` and `filePath`; default response carries
  a generic string + structured `code` / `bytes` / `limit` so SDK
  callers can branch without the path leak.

[Critical] C2 — workspaceAgents FILE_ERROR `err.message` (workspaceAgents.ts):
- Two catch blocks (create + update) sent `SubagentError(FILE_ERROR)`
  messages directly in the response. Node fs errors embed paths
  like "ENOENT: ... '/Users/<x>/.qwen/agents/foo.md'". Both now
  gate behind `isServeDebugMode()`; default response is the generic
  "Failed to write workspace agent file" envelope.

Shared `isServeDebugMode` helper (debugMode.ts new):
- Moved from inlined copies in workspaceMemory.ts to a small
  shared module so both route files (and future Wave 4 mutation
  routes) share one canonical predicate.

[Suggestion] S1 — POST body `name` validation (workspaceAgents.ts):
- `parseAgentConfig` now applies the same regex + length contract
  as `validateAgentType` (`/^[\p{L}\p{N}_-]+$/u`, 2-64 chars). A
  client posting `name: "my/agent"` or 100-char name now fails at
  the body-validation boundary with a 422 `invalid_config` instead
  of bubbling a less-specific `SubagentValidator` error.

[Suggestion] S2 — Per-field size caps (workspaceAgents.ts):
- `description` / `systemPrompt`: 256 KB each
- `tools` / `disallowedTools`: 256 entries, each at most 256 chars
  Applied on both create + update; matches workspaceMemory's
  `MAX_MEMORY_CONTENT_BYTES = 1 MB` posture and keeps `GET
  /workspace/agents` list-response cost bounded.

[Suggestion] S3 — Mutex timeout (writeContextFile.ts):
- `getFileLock` now wraps each Mutex with `withTimeout(..., 30_000)`
  so a wedged filesystem (NFS hiccup, OneDrive lock, kernel I/O
  hang) cannot indefinitely hold the per-file lock. The
  `E_TIMEOUT` sentinel is caught and re-thrown as a typed
  `WorkspaceMemoryWriteTimeoutError`; the route maps it to 500
  `memory_write_timeout` with `timeoutMs` so SDK callers can
  branch on stalled-fs without parsing a generic 500.

[Suggestion] S4 — Test gaps:
- `DELETE /workspace/agents/:id?scope=workspace` happy path:
  removes only the project shadow, leaves user file on disk,
  emits exactly one `agent_changed` event with `level: project`.
- `POST /workspace/agents/:id?scope=global` happy path: updates
  user shadow, leaves project file untouched.
- 413 `memory_file_too_large`: write a 17 MB QWEN.md externally,
  POST append fails with the structured 413 payload (`bytes` /
  `limit`, no `filePath` / no path-embedding error message in
  default response).

[Nice] N1 — Tilde fence support (writeContextFile.ts):
- `findNextTopLevelHeading` now toggles fence state on both `` ``` ``
  and `~~~` openers (CommonMark allows both). A `## heading`
  inside a `~~~` fenced block no longer counts as the section
  boundary.

Validation:
- typecheck: cli + sdk-typescript clean
- vitest: 380/380 (was 377, +3 new)
- eslint: clean

* fix(serve): apply round-9 review fold-in 2j on PR 16

Two real correctness fixes from wenshao's 2026-05-18 review:

1. resolveContextFilePath now uses getCurrentGeminiMdFilename() so
   POST /workspace/memory writes to the same file GET surfaces.
   Without this, a deployment that ran setGeminiMdFilename('AGENTS.md')
   saw GET list AGENTS.md while POST kept appending to a stale QWEN.md
   — clients then observed "I just wrote content but it's missing
   from /workspace/memory".

2. runWrite no-op branch now returns bytesWritten: 0 instead of the
   existing file's stat.size. The prior value conflated "bytes I
   wrote" with "current file size"; clients accumulating writes via
   sum(bytesWritten) added the file size for every whitespace POST.
   changed: false already signals the no-op; the byte count should
   match its field name.

JSDoc updated on both WriteContextFileResult.bytesWritten and
DaemonWriteMemoryResult.bytesWritten so the contract is explicit.
New test covers setGeminiMdFilename(AGENTS.md) round-trip; existing
no-op test updated for the new bytesWritten semantics.

Round-8 thread PRRT_kwDOPB-92c6Cpyap (DRY resolveOriginatorClientId)
stays open as the cross-Wave-4 tracking marker. CodeQL "missing rate
limiting" alert deferred to PR 24's audit/policy layer (bearer +
max-connections + mutation gate provide v1 mitigations).

* fix(serve): skip two Windows-incompatible test fixtures on win32

Both tests rely on `fs.chmod(dir, 0o555)` to trigger EACCES on a
subsequent write/unlink. Windows ignores Unix-style permission bits
passed to `fs.chmod`, so the directory stays writable, the operation
succeeds, and the error path the test exercises is unreachable —
the test then sees the success status (200 / 204) instead of the
expected 500. CI failed on the Windows runner only; Ubuntu + macOS pass.

Route logic is platform-agnostic — these tests validate that:

- `workspaceMemory.test.ts` POST returns the structured 500 envelope
  (no `errorMessage` / `filePath` leakage outside QWEN_SERVE_DEBUG).
- `workspaceAgents.test.ts` DELETE returns 500 `agent_delete_partial`
  when one level's `fs.unlink` silently fails inside SubagentManager.

Both invariants are still covered by the Ubuntu + macOS runs. We can't
swap in a `vi.spyOn(fs, 'unlink')` mock for the agents case either —
`SubagentManager` does `import * as fs from 'fs/promises'`, creating
a sealed ESM namespace object vitest can't redefine.

Skip pattern mirrors `customBanner.test.ts:232`
(`if (process.platform === 'win32') return;`).
2026-05-18 14:26:59 +08:00
jinye
495d11f016
refactor(serve): add FileSystemService boundary (#4175 Wave 4 PR 18) (#4250)
* refactor(serve): add FileSystemService boundary (#4175 PR 18)

Introduce a per-request workspace filesystem boundary inside the
`qwen serve` daemon. The boundary centralizes path canonicalization,
symlink-aware boundary checks, ignore/trust policy, size/binary
limits, and audit hooks behind a single typed surface — preparing
PR 19 (read-only file routes) and PR 20 (write/edit routes) to
share a guarded chokepoint instead of re-implementing path safety
per route.

Wave 4 PR 18 of #4175 — pure refactor, no new HTTP routes; depends
on PR 12 (#4241) and PR 15 (#4236), both merged.

New module under `packages/cli/src/serve/fs/`:

- `paths.ts` extracts `canonicalizeWorkspace` from `httpAcpBridge.ts`
  (re-exported there for backward compatibility) and adds:
  - `ResolvedPath` brand and `Intent` union (read/write/edit/list/
    glob/stat) with exhaustiveness checks at the trust gate
  - `hasSuspiciousPathPattern` — detects NTFS ADS, 8.3 short names,
    long-path prefixes, UNC paths, trailing dots, DOS device names,
    and three-or-more-dot path components (claude-code-style)
  - `findExistingAncestor` with explicit ENOTDIR rejection so a
    regular file in a path component throws `parse_error` rather
    than passing boundary inspection and 500-ing later
  - `resolveWithinWorkspace` running a chain-aware realpath check
    with ENOENT-tolerant ancestor walk for write/stat intents
- `errors.ts` defines `FsError` / `FsErrorKind` plus `wrapAsFsError`,
  which categorizes raw `fs.promises` errnos (EACCES → permission_
  denied, ELOOP → symlink_escape, ENOTDIR → parse_error, etc.) so
  body-level failures emit audit events instead of escaping
  uncategorized
- `policy.ts` carries `MAX_READ_BYTES` (256 KiB), `MAX_WRITE_BYTES`
  (5 MiB), `BINARY_PROBE_BYTES` (4 KiB), `shouldIgnore` (file/
  directory aware), and `assertTrustedForIntent` with an
  exhaustive switch over `Intent`
- `audit.ts` emits typed `fs.access` / `fs.denied` `BridgeEvent`
  frames with SHA-256-hashed paths, optional raw-path passthrough
  via `QWEN_AUDIT_RAW_PATHS=1`, and discriminator `kind` fields so
  SDK consumers can exhaustively narrow `event.data`
- `workspaceFileSystem.ts` — `WorkspaceFileSystem` interface +
  `createWorkspaceFileSystemFactory` with eight methods (resolve,
  stat, readText, readBytes, list, glob, writeText, edit). Every
  body method funnels failures through `recordAndWrap`, which
  wraps raw fs errors and always emits an `fs.denied` audit event
  before rethrowing. `readText` enforces `MAX_READ_BYTES` *before*
  delegating to the slurping core service so unbounded requests
  against multi-gigabyte files can no longer OOM the daemon.
  `glob` realpath-checks each hit against the canonical workspace
  and reports filtered escapes via a single aggregated `fs.denied`
  event with the dropped count
- `index.ts` is the barrel re-export PR 19/20 will import from
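
The errno categorization described for `errors.ts` could be sketched like this. Only the three mappings named above are reproduced; `FsErrorKindSketch`, `classifyErrno`, and the `"unknown"` fallback are hypothetical and do not reflect the module's full kind list.

```typescript
// Only the mappings named in the commit text; "unknown" is a placeholder.
type FsErrorKindSketch =
  | "permission_denied"
  | "symlink_escape"
  | "parse_error"
  | "unknown";

const ERRNO_TO_KIND = new Map<string, FsErrorKindSketch>([
  ["EACCES", "permission_denied"],
  ["ELOOP", "symlink_escape"],
  ["ENOTDIR", "parse_error"],
]);

// Reads the Node-style `code` property off an arbitrary thrown value.
function classifyErrno(err: unknown): FsErrorKindSketch {
  const code =
    typeof err === "object" && err !== null && "code" in err
      ? String((err as { code: unknown }).code)
      : "";
  return ERRNO_TO_KIND.get(code) ?? "unknown";
}

const loopError = Object.assign(new Error("too many symlinks"), { code: "ELOOP" });
const loopKind = classifyErrno(loopError);
```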

Modified files:

- `packages/cli/src/serve/httpAcpBridge.ts` — extracted
  `canonicalizeWorkspace` to `fs/paths.ts`; the bridge re-exports
  it so existing callers in `server.ts` and `runQwenServe.ts` keep
  working
- `packages/cli/src/serve/server.ts` — added
  `fsFactory?: WorkspaceFileSystemFactory` to `ServeAppDeps`;
  `createServeApp` builds a strict default (`trusted: false`,
  warn-once no-op `emit`) when none is injected so a future
  refactor that forgets `fsFactory` injection cannot silently
  allow writes against an untrusted workspace; factory parked on
  `app.locals` for PR 19/20 route handlers
- `packages/core/src/index.ts` — re-exports `Ignore`,
  `loadIgnoreRules`, and `LoadIgnoreRulesOptions` from
  `utils/filesearch/ignore.js` for cli consumption

411 serve tests pass; typecheck clean.

Engineering principles checklist:
- [x] Independently mergeable (no new routes, no new capability tag)
- [x] Backward compatible (no removed routes / event fields / CLI behavior)
- [x] Default off (no public surface change; PR 19/20 will activate routes)
- [x] qwen serve Stage 1 routes preserved
- [x] Gradual migration (PR 19/20 will adopt the boundary)
- [x] Reversible (single PR rollback)
- [x] Tests-first (101 unit tests across the new module + contract test)

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve/fs): address PR review feedback (#4250)

Codex + Copilot review found 8 substantive issues against
7b0db4c3a; this commit fixes all of them. The first two are
P1 build-breakers introduced by the pre-commit `eslint --fix`
auto-promoting value imports to `import type` — 31 fs tests
were failing post-commit until this fix.

Issue list (links to PR comments at #4250):

1. Import type erased runtime values —
   workspaceFileSystem.ts:10-15. eslint's
   consistent-type-imports rule rewrote
   `import { Ignore, StandardFileSystemService, loadIgnoreRules,
   type WriteTextFileOptions }` -> `import type { ... }` because
   it saw type-only usage of `Ignore`. That erased
   `loadIgnoreRules` (called at runtime) and
   `StandardFileSystemService` (constructed at runtime), causing
   TS1361/TS2206 and runtime ReferenceErrors. Restored as a
   value import with inline `type` modifiers per-symbol and an
   eslint-disable line + comment so future autofixes don't
   repeat the regression.

2. Same import-type erasure in contract.test.ts:14 — `isFsError`
   was lumped under `import type` even though it's called at
   runtime. Same fix shape.

3. edit() OOM hole — workspaceFileSystem.ts. The earlier review
   pass added a pre-stat MAX_READ_BYTES gate to readText but
   missed edit, which reads the whole file via fsp.readFile before any
   size check. Multi-GB targets inside the workspace could OOM
   the daemon. Now stat-first; refuses above the cap with
   file_too_large; also rejects binary files (string indexOf
   over arbitrary bytes is meaningless).

4. glob accepted absolute / device patterns —
   workspaceFileSystem.ts. The ..-segment check stopped lexical
   traversal, but `/etc/**`, `C:\Users\foo\**`,
   `\\server\share\**`, and `//server/share/**` still reached
   globAsync, walking outside the workspace before per-hit
   filtering dropped the results. Now rejects these patterns
   up-front with parse_error so no I/O happens outside.

5. glob ignore filter probed every hit as `file` —
   workspaceFileSystem.ts. The underlying `ignore` library
   needs a trailing-slash probe for `dist/`- or `.git/`-style
   directory patterns; probing as `file` silently leaked
   directory matches. Now lstats each hit and routes 'directory'
   vs 'file' to shouldIgnore so dir-only ignore rules actually
   match.

6. ReadTextOptions.line off-by-one — workspaceFileSystem.ts. The
   public option was documented as 1-based but forwarded as-is
   to readFileWithLineAndLimit, which is 0-based. A request with
   line: 1 returned content starting at the second line. Now
   converts 1-based -> 0-based at the boundary; doc clarified;
   truncation check uses the converted index.

7. ServeAppDeps.fsFactory JSDoc said trusted=true — server.ts:96.
   Stale from before the strict-default refactor in the same
   review pass. Rewrote to match the actual trusted: false +
   warn-once emit behavior.

8. MAX_READ_BYTES JSDoc said reads above cap return truncated —
   policy.ts:18. Stale from before the hard-cap refactor; now
   correctly states the cap throws file_too_large and that soft
   truncation only applies under the cap via enforceReadSize.
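
The up-front pattern rejection from item 4 can be sketched as a pure check. `isUnsafeGlobPattern` is a hypothetical name, and treating every backslash as a separator is an assumption that is stricter than POSIX glob semantics, where a backslash escapes the next character.

```typescript
// True when a glob pattern could make the walker start outside the workspace
// and should be rejected before any I/O happens.
function isUnsafeGlobPattern(pattern: string): boolean {
  const normalized = pattern.replace(/\\/g, "/");
  if (normalized.startsWith("/")) return true; // POSIX-absolute or UNC (//server/...)
  if (/^[A-Za-z]:/.test(normalized)) return true; // Windows drive prefix
  return normalized.split("/").some((segment) => segment === ".."); // lexical traversal
}

const rejectedPatterns = [
  "/etc/**",
  "C:\\Users\\foo\\**",
  "\\\\server\\share\\**",
  "//server/share/**",
  "../secret/**",
].map(isUnsafeGlobPattern);
const acceptedPattern = isUnsafeGlobPattern("src/**/*.ts");
```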

7 new tests cover the new behaviors:
- POSIX-absolute pattern rejection
- Win32 / UNC pattern rejection (4 variants)
- directory-pattern ignore (dist/)
- edit file-too-large
- edit binary refusal
- readText line: 1 returns from first line
- readText line: 2 starts from second line

418/418 serve tests pass; typecheck + eslint clean.

Deferred follow-ups (per PR review reply):
- glob maxResults is applied after globAsync materializes every
  match. A streaming iterator (glob.iterate) would bound the
  walk too. Non-trivial; tracked as a separate hardening
  follow-up since current behavior is correctness-safe (just
  not optimal under huge trees).
- Per-path glob escape hash in audit hint (currently aggregated
  count) — can revisit once PR 19 wires the routes and we see
  real audit volume.
- EVENT_SCHEMA_VERSION migration mechanism — orthogonal; the
  whole BridgeEvent schema lacks one and that's a Wave 5+
  concern.

* fix(serve/fs): close 3 review-round-2 findings (#4250)

The wenshao + DeepSeek reviewer pass on a81ada43f surfaced 3 more
issues; this commit fixes them.

1. Dangling-symlink write escape (paths.ts) — Critical security
   bug. A request like `write /ws/escape` where `<ws>/escape` is
   a symlink whose target doesn't exist YET would pass the
   ENOENT-tolerant ancestor walk: realpath fails ENOENT, the
   walk-up returns `<ws>` as ancestor, the canonical becomes
   `<ws>/escape`, containment passes — but the eventual write
   follows the symlink and creates the file at the symlink
   target outside the workspace. lstat-then-readlink before the
   ancestor walk catches this; the symlink target is itself
   resolved via the deepest existing ancestor so macOS
   /private/var canonicalization stays consistent with
   boundCanonical (an absolute target inside the workspace
   tmpdir would otherwise have been false-flagged on macOS).

2. glob realpath catch over-reported symlink_escape
   (workspaceFileSystem.ts) — every realpath failure inside the
   per-hit boundary loop was counted as `symlink_escape`. EIO,
   EACCES, ENAMETOOLONG, EBUSY are environmental failures, not
   security events; mislabeling them poisoned the audit signal
   for operators trying to investigate genuine escape attempts.
   Now distinguished: ENOENT/ELOOP count as escapes; other
   errnos count as transient errors and emit a separate
   aggregated `fs.denied` with errorKind: 'permission_denied'.

3. policy.ts:enforceReadSize JSDoc said the boundary "intentionally
   does NOT throw" — stale after a81ada43f's hard-cap refactor.
   Rewrote to clarify the helper is the soft truncation gate that
   only fires under the hard cap; readText itself enforces the
   hard cap with file_too_large via its pre-stat check. The
   readme/contract is now consistent with workspaceFileSystem.ts.
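
The errno split from item 2 can be illustrated with a small tally. The names `classifyRealpathFailure` and `tallyFailures` are hypothetical; the only rule taken from the text is that ENOENT and ELOOP count as escapes and everything else as a transient failure.

```typescript
type GlobHitFailure = "escape" | "transient";

// ENOENT and ELOOP are counted as escapes; every other errno is treated as an
// environmental failure rather than a security event.
function classifyRealpathFailure(code: string | undefined): GlobHitFailure {
  return code === "ENOENT" || code === "ELOOP" ? "escape" : "transient";
}

// Tally failures so each class can be reported in its own aggregated event.
function tallyFailures(
  codes: Array<string | undefined>,
): Record<GlobHitFailure, number> {
  const counts: Record<GlobHitFailure, number> = { escape: 0, transient: 0 };
  for (const code of codes) {
    counts[classifyRealpathFailure(code)] += 1;
  }
  return counts;
}

const failureCounts = tallyFailures(["ENOENT", "EIO", "EACCES", "ELOOP", "EBUSY"]);
```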

2 new tests:
- dangling symlink targeting outside-workspace path → symlink_escape
- dangling symlink targeting future-inside-workspace path → succeeds
  (ahead-of-mkdir flow for atomic-write-via-rename)

420/420 serve tests pass; typecheck clean.

Remaining tracked follow-ups (per PR review reply):
- list/glob brand cast (P2 deferred per PR description)
- glob audit pathHash hashes pattern not paths
- edit() TOCTOU read-modify-write race (atomic-via-temp + rename)
- wrapAsFsError ENOSPC/EIO mapping to a distinct kind
- runQwenServe → fsFactory injection integration test
- glob maxResults streaming (glob.iterate)

* fix(serve/fs): cross-platform ENOTDIR detection (#4250 CI fix)

Windows CI test failure on a81ada43f surfaced a real cross-platform
bug in `findExistingAncestor`. POSIX returns `ENOTDIR` when
`fs.stat` traverses through a non-directory in a path component
(e.g. `${ws}/file.txt/leaf` where `file.txt` is a regular file).
**Windows returns `ENOENT` for the same case.** The errno-based
guard added in a81ada43f only branched on `ENOTDIR`, so the
Windows path silently fell through to the ancestor walk and the
boundary returned a "canonical" the eventual write could not
honor — `WorkspaceFileSystem - audit always emits on body errors >
rejects ENOTDIR ancestor walk with parse_error rather than
passing boundary` failed with `expected false to be true` on the
windows-latest runner.

Fix: switch from errno-based detection (platform-divergent) to
dirent-kind detection. After `fs.stat` succeeds during the
walk-up, if the existing ancestor is NOT a directory AND there
are unresolved tail components, throw `parse_error`. Both `ENOENT`
and `ENOTDIR` from `fs.stat` are now treated as "the *current*
path doesn't resolve, keep walking" — the post-walk kind check
fires regardless of which errno surfaced. Cross-platform-safe.

The local 110/110 fs tests still pass on macOS/Linux; the Windows
case will exercise the kind-check branch on next CI run.

macOS CI failures on the same workflow run (`InputPrompt.test.tsx`
placeholder reuse, `SettingsDialog.test.tsx` 5s timeout) are pre-
existing flaky UI tests, NOT touched by this PR.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve/fs): close 4 review-round-3 findings (#4250)

DeepSeek's third review pass (after b38e82157) flagged four more
issues; this commit fixes all of them.

1. Multi-hop dangling symlink bypass (paths.ts) — Critical
   security fix. The earlier single-readlink fix at efd7a4611
   was bypassed by chained dangling symlinks: <ws>/leak ->
   <ws>/middle -> /scratch/evil where every layer is a symlink
   and the final target doesn't exist. The fix only readlink'd
   the first hop (<ws>/middle), saw it was inside the workspace
   via findExistingAncestor, and let the chain through. The OS
   write at <ws>/leak would then follow both hops and create
   /scratch/evil. Now loops lstat + readlink up to
   MAX_ANCESTOR_HOPS, tracks visited inodes for cycle detection,
   and only validates containment on the fully-dereferenced
   leaf. Cycle detection rejects with symlink_escape; chains
   that exceed the hop bound also surface as symlink_escape
   with a "too long or contains a cycle" hint.

2. opts.line validation (workspaceFileSystem.ts) — the docstring
   committed to "1-based positive integer" but Infinity / floats
   / negative values flowed through to readFileWithLineAndLimit
   and degraded silently. Now enforces Number.isSafeInteger +
   line >= 1 at the boundary; everything else throws
   parse_error. Test covers Infinity, -Infinity, 0, -1, 1.5,
   NaN.

3. io_error FsErrorKind (errors.ts) — wrapAsFsError previously
   conflated EIO/EBUSY/ENAMETOOLONG/EMFILE/ENFILE/ENOSPC/ETXTBSY
   under permission_denied. Monitoring pipelines that key on
   errorKind for security alerting would page oncall on a full
   disk. New io_error kind (HTTP 503) maps the environmental-
   failure errnos with distinct hints. EACCES/EPERM stay on
   permission_denied (literal access denial); EIO (failing
   disk), EBUSY (busy file), ENAMETOOLONG (PATH_MAX),
   EMFILE/ENFILE (fd exhaustion), ENOSPC (df -h reporting
   100%), ETXTBSY (text-busy) all route to io_error.

4. glob audit kind taxonomy (workspaceFileSystem.ts) — three-way
   classification mirrors wrapAsFsError so the per-hit realpath
   catch surfaces ENOENT/ELOOP -> symlink_escape, EACCES/EPERM
   -> permission_denied, everything else -> io_error. Each
   class emits its own aggregated fs.denied event.

5. edit() matchedIgnore (workspaceFileSystem.ts) — readText and
   writeText both stamp matchedIgnore in their access audit;
   edit didn't, so operators monitoring fs.access events couldn't
   distinguish edits to .gitignore'd files (build artifacts,
   logs) from edits to tracked source. Added the same
   shouldIgnore + matchedIgnore plumbing that readText uses.
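
The multi-hop walk from item 1 can be sketched as below. This is a simplified model, not the `paths.ts` implementation: `ReadlinkFn` is a hypothetical stand-in for `lstat` + `readlink`, the sketch tracks visited paths where the commit tracks inodes, link targets are assumed absolute, and the `MAX_HOPS` value is illustrative because the real `MAX_ANCESTOR_HOPS` value is not given here.

```typescript
const MAX_HOPS = 8; // illustrative bound only

class SymlinkEscapeError extends Error {}

// Hypothetical: returns the absolute link target, or undefined when the
// path is not a symlink (that is, the chain is fully dereferenced).
type ReadlinkFn = (p: string) => string | undefined;

function isInside(root: string, candidate: string): boolean {
  return candidate === root || candidate.startsWith(root + "/");
}

function resolveSymlinkChain(
  start: string,
  workspace: string,
  readlink: ReadlinkFn,
): string {
  const visited = new Set<string>();
  let cursor = start;
  for (let hop = 0; hop <= MAX_HOPS; hop++) {
    if (visited.has(cursor)) {
      throw new SymlinkEscapeError("symlink chain too long or contains a cycle");
    }
    visited.add(cursor);
    const target = readlink(cursor);
    if (target === undefined) {
      // Containment is validated only on the fully dereferenced leaf.
      if (!isInside(workspace, cursor)) {
        throw new SymlinkEscapeError("symlink chain leaves the workspace");
      }
      return cursor;
    }
    cursor = target;
  }
  throw new SymlinkEscapeError("symlink chain too long or contains a cycle");
}
```

Checking only the first hop would accept `<ws>/leak -> <ws>/middle`; following the chain to its end is what exposes the final `/scratch/evil` target.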

8 new tests:
- multi-hop dangling symlink (security)
- symlink cycle (security)
- ENOSPC/EIO/EBUSY/ETXTBSY/ENAMETOOLONG -> io_error mapping
- io_error -> HTTP 503
- EMFILE/ENFILE updated to io_error (was permission_denied)
- opts.line rejects Infinity/-Infinity/0/-1/1.5/NaN
- edit() audit records matchedIgnore on .log file
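
The `opts.line` boundary check from item 2 amounts to a small guard. A minimal sketch, assuming a hypothetical `ParseError` class and helper name (the real check lives inline in `workspaceFileSystem.ts`):

```typescript
class ParseError extends Error {}

// Accepts only 1-based positive safe integers; everything else is rejected
// before it can reach the line-window reader.
function assertValidLine(line: number): number {
  if (!Number.isSafeInteger(line) || line < 1) {
    throw new ParseError(`line must be a 1-based positive integer, got ${line}`);
  }
  return line;
}
```

`Number.isSafeInteger` already rejects `NaN`, both infinities, and fractional values, so the only extra condition needed is the lower bound.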

426/426 serve tests pass; typecheck clean.

Remaining tracked follow-ups (per PR review reply):
- list/glob brand cast (P2 deferred per PR description)
- glob audit pathHash hashes pattern not paths
- edit() TOCTOU read-modify-write race (atomic-via-temp + rename) — pinned to PR 20
- runQwenServe → fsFactory injection integration test — pinned to PR 19
- glob maxResults streaming (glob.iterate)

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve/fs): close 3 review-round-4 critical findings (#4250)

DeepSeek's fourth review pass surfaced three more critical bugs;
all are fixed in-PR.

1. TOCTOU symlink substitution in readText/readBytes/edit
   (workspaceFileSystem.ts) — Critical. fsp.stat(p) and the
   subsequent lowFs.readTextFile(p) (or fsp.readFile for readBytes
   and edit) are two separate syscalls. An attacker who can write
   into the workspace can swap p from a regular file to a symlink
   pointing outside between the two calls; pre-stat sees the
   original, the read follows the swap.

   Fix: assertInodeStableAfterRead(p, preIno) — after each read,
   re-lstat p; reject with symlink_escape if the path is now a
   symlink (post.isSymbolicLink()) or its inode changed (preIno
   !== post.ino, with a 0-ino fallback for procfs / virtual
   mounts that don't report meaningful inodes). Catches the
   swap-and-leave attack and the swap-and-keep-swapped attack.
   Residual race (attacker swaps back AFTER our read but BEFORE
   our lstat) is much smaller than the original window and
   outside PR 18's threat model; fd-based reading via
   fsp.open + fileHandle.read would close it entirely but
   requires a new variant of lowFs that takes a FileHandle —
   tracked as a follow-up.

2. UTF-8 truncation corruption in readText (workspaceFileSystem.ts)
   — Critical. buf.subarray(0, sizeOutcome.bytesToRead).toString
   ('utf-8') silently emits U+FFFD when bytesToRead falls
   mid-codepoint (CJK, emoji). A downstream consumer parsing
   JSON or source code over the truncated content would see
   broken trailing bytes; meta.truncated would be true but the
   prefix is corrupt. A subsequent edit() with the corrupted
   string as oldText would also fail to match the on-disk content.

   Fix: safeUtf8Truncate(buf, maxBytes) walks back from maxBytes
   through any continuation bytes (0b10xxxxxx), then verifies
   the leading byte's full sequence fits within the cap; drops
   the leading byte if it doesn't. The result is always a clean
   prefix at a valid codepoint boundary. Test pins '中文测试' (12
   bytes / 3 bytes per char) truncated at 7 bytes -> '中文' (no
   U+FFFD).

3. glob opts.cwd bypasses workspace boundary
   (workspaceFileSystem.ts) — Critical. opts.cwd was used
   directly as the glob root with no validation against
   boundWorkspace. ResolvedPath is a brand cast and a stale
   or forged value lets a glob('**/*', { cwd: '/etc' })
   enumerate files outside the workspace. The pattern-side
   absolute / UNC checks added in a81ada43f only constrain
   the *pattern*; cwd is the actual hazard.

   Fix: at the entry point of glob(), path.resolve cwd and
   isWithinRoot-check against boundWorkspace. Throws
   path_outside_workspace if cwd is outside, even when the
   pattern itself is harmlessly relative. Test pins the case
   with cwd: scratch (outside workspace).
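
The post-read check from item 1 can be sketched as a pure comparison. The `LstatLike` shape and the exact treatment of the 0-inode fallback (skip the comparison when either side reports 0) are assumptions; the commit message only says that a fallback exists.

```typescript
class SymlinkEscapeError extends Error {}

// Minimal subset of fs.Stats used by the check.
interface LstatLike {
  ino: number;
  isSymbolicLink(): boolean;
}

function assertInodeStable(preIno: number, post: LstatLike): void {
  if (post.isSymbolicLink()) {
    throw new SymlinkEscapeError("path became a symlink during the read");
  }
  // Assumed fallback: an inode of 0 carries no identity, so it is not compared.
  const comparable = preIno !== 0 && post.ino !== 0;
  if (comparable && preIno !== post.ino) {
    throw new SymlinkEscapeError("path was replaced during the read");
  }
}
```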

3 new tests:
- readText with mid-operation symlink swap -> symlink_escape
- safeUtf8Truncate keeps CJK codepoints intact at 7-byte cap
- glob with opts.cwd outside workspace -> path_outside_workspace

429/429 serve tests pass; typecheck + eslint clean.

Remaining tracked follow-ups (Post-PR-18 hardening, in #4175):
- list/glob brand-cast contract (PR 19)
- runQwenServe → fsFactory injection contract test (PR 19)
- edit() write-side TOCTOU + atomic-via-temp + expectedHash (PR 20)
- glob audit pathHash (independent audit.ts commit)
- glob maxResults streaming (independent hardening)
- glob pattern preflight refactor to reuse hasSuspiciousPathPattern (cosmetic)

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve/fs): close 7 review-round-5 findings (#4250)

DeepSeek round-5 surfaced 9 comments; 7 are real findings (the
other 2 — UTF-8 truncation + glob opts.cwd — already fixed in
546834267 from round-4 and replied-to inline). Stack on round-4.

1. edit() empty-oldText silent-prepend (workspaceFileSystem.ts) —
   Critical silent data corruption. JS `''.indexOf('')` returns 0,
   so without an empty-string guard
   `current.slice(0,0) + newText + current.slice(0)` = `newText +
   current` — silently prepends `newText` to the whole file with
   a success audit event. PR 20 routes that pass user-supplied
   `oldText` through verbatim must not be able to trigger this. Now
   throws `parse_error` BEFORE the read with a hint explaining
   why empty matches are rejected.

2. DOS device name regex misses bare names + first-extension forms
   (paths.ts) — Windows attack surface. The earlier
   /\.(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])$/i only caught the
   last-extension form (file.CON). NTFS reserves these names
   regardless of extension: CON, NUL, CON.txt, NUL.dat,
   CON.foo.bar are ALL reserved device handles. New regex
   /(^|\.)(CON|...|LPT[1-9])(\.|$)/i covers all four positions
   (bare / first-ext / last-ext / middle-ext) while still
   admitting legitimate substrings (BACON, concat.txt, precon.go).

3. Buffer.byteLength replaces Buffer.from for size-only checks
   (workspaceFileSystem.ts) — 5 MB heap allocation per write
   eliminated. `writeText` and `edit()` previously materialized
   the entire UTF-8 payload (up to MAX_WRITE_BYTES) just to read
   `.length`; `Buffer.byteLength(content, 'utf-8')` returns the
   count without allocating.

4. edit() error message includes oldText snippet
   (workspaceFileSystem.ts) — Production debuggability. The
   earlier hint was just "edit() expects oldText to appear
   verbatim in the file" — at 3 AM an operator can't tell whether
   the mismatch is whitespace, a stale file, or a wrong target.
   Now includes a JSON-quoted truncated snippet (max 80 chars +
   ellipsis) of the searched-for text.

5. recordAndWrap forwards FsError message into fs.denied audit
   (workspaceFileSystem.ts + audit.ts) — Audit observability gap.
   Audit consumers debugging an incident saw `errorKind` + `hint`
   but lost the underlying OS error detail (path, errno text,
   byte count). FsDeniedAuditPayload now carries an optional
   `message` field; recordAndWrap forwards `fs.message`
   automatically.

6. EISDIR / ENOTDIR distinct hints (errors.ts) — UX. Both shared
   the same hint "a path component is not a directory where one
   was expected" — for EISDIR (path IS a directory but a file was
   expected) the wording was reversed. Now distinct hints with
   the errno name explicitly cited.

7. kindFromStats / kindFromDirent merged into kindFromStatLike
   (workspaceFileSystem.ts) — duplicate function bodies removed.
   Both fs.Stats and fs.Dirent expose the same isFile /
   isDirectory / isSymbolicLink interface, and both targets
   (FsStat['kind'], FsEntry['kind']) are the same 4-value union.
   Single helper avoids drift if the union grows.
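
Items 1 and 4 can be illustrated together with a string-only model of `edit()`. This is a sketch: `applyEdit`, `snippet`, and `ParseError` are hypothetical names, and the error kind used for the not-found case is not stated in this commit message.

```typescript
class ParseError extends Error {}

// JSON-quoted snippet, truncated to `max` characters plus an ellipsis.
function snippet(text: string, max = 80): string {
  const cut = text.length > max ? text.slice(0, max) + "…" : text;
  return JSON.stringify(cut);
}

function applyEdit(current: string, oldText: string, newText: string): string {
  // ''.indexOf('') is 0, so an empty oldText would silently prepend newText.
  if (oldText === "") {
    throw new ParseError("empty oldText is rejected: it would match at index 0");
  }
  const idx = current.indexOf(oldText);
  if (idx === -1) {
    // Plain Error here because the real error kind is not given in the source.
    throw new Error(`oldText not found verbatim in file: ${snippet(oldText)}`);
  }
  return current.slice(0, idx) + newText + current.slice(idx + oldText.length);
}
```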

4 new tests:
- bare/multi-ext DOS device names (CON, NUL, CON.txt, CON.foo.bar)
  + legitimate substrings (BACON, concat, precon, contemplating)
- edit() empty oldText -> parse_error + file unchanged
- edit() not-found error includes searched snippet in hint
- fs.denied audit payload carries FsError message
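
The widened device-name pattern from item 2 can be exercised directly. The alternation is taken from the earlier regex quoted in that item; `isReservedDosName` is a hypothetical wrapper, and the sketch assumes it is applied to a single path segment.

```typescript
const RESERVED_DOS_NAME = /(^|\.)(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\.|$)/i;

// Expects one path segment (a basename), not a full path.
function isReservedDosName(segment: string): boolean {
  return RESERVED_DOS_NAME.test(segment);
}
```

The `(^|\.)` and `(\.|$)` anchors are what separate `CON.txt` (reserved) from `concat.txt` (fine): the device name has to occupy a whole dot-delimited component.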

Plus the 2 already-fixed items (UTF-8 boundary, glob opts.cwd)
have new test coverage from round-4.

433/433 serve tests pass; typecheck + eslint clean.

Stack: 7b0db4c3a → a81ada43f → efd7a4611 → b38e82157 → 911cb8e5d → 546834267 → THIS

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve/fs): close 7 review-round-6 findings (#4250)

Round-6 review (wenshao + gpt-5.5) flagged 7 items including
two Critical and one privacy regression I introduced in round-5.

1. writeText / edit pre-write symlink guard
   (workspaceFileSystem.ts) — Critical, two reviewers (wenshao
   #CsBcq + gpt-5.5 #CsB3M) independently flagged.
   `atomicWriteFile` (`packages/core/src/utils/atomicFileWrite.ts`)
   resolves symlinks at write time, so a swap between the
   boundary's `resolve()` and `lowFs.writeTextFile()` would let
   the write follow the symlink to outside the workspace.
   `writeText` had no read phase, so its window was wider than
   `edit()`'s. New helper `assertNotSymlinkBeforeWrite(p)` lstats
   the path immediately before each `lowFs.writeTextFile` call;
   ENOENT is fine (ahead-of-create flow), but an actual symlink
   throws `symlink_escape`. Used in `writeText` AND `edit()`.
   Residual race after this guard but before the write completes
   is the deferred PR 20 atomic-via-temp follow-up.

2. recordAndWrap message field bypassed privacy mode
   (audit.ts) — Critical privacy regression I introduced at
   ebd9e78a1 (round-5). The new `FsDeniedAuditPayload.message`
   field forwarded `FsError.message` unconditionally — and many
   throw-sites embed `${p}` (absolute paths) or user-supplied
   `oldText` snippets into the message. Privacy-mode operators
   (without `QWEN_AUDIT_RAW_PATHS=1`) saw paths leak through the
   message field even though they explicitly disabled raw-path
   logging. Fixed: `message` is now gated behind `includeRawPaths`
   alongside `relPath`. Privacy mode = no path-bearing fields,
   period. Operators wanting forensic context opt in via
   `QWEN_AUDIT_RAW_PATHS=1` and accept both fields together.

3. glob opts.cwd via symlink (workspaceFileSystem.ts) — gpt-5.5
   #CsB3P. Textual `path.resolve(cwd) + isWithinRoot` admits
   `<ws>/link` even when `<ws>/link → /etc` is a symlink to
   outside; `globAsync` walks `/etc` before the per-hit filter
   drops results. Switched to `fsp.realpath(path.resolve(cwd))`
   so the containment check sees the actual walk root. ENOENT
   on cwd surfaces as `path_not_found`.

4. readText OOM via concurrent file growth
   (workspaceFileSystem.ts) — wenshao #CsBeE. The pre-stat
   `MAX_READ_BYTES` gate only sees the size at stat time; a
   concurrent writer can grow the file before the actual
   `readFileWithLineAndLimit` slurp. Added post-read
   `Buffer.byteLength(result.content) > MAX_READ_BYTES` check.
   The proper fix (fd-based read tying size + read to the same
   inode) is a hardening follow-up; this byte-length check is
   the defense-in-depth layer.

5. readBytes maxBytes can widen past MAX_READ_BYTES
   (policy.ts) — wenshao #CsBj5. `enforceReadBytesSize(st.size,
   opts.maxBytes)` used the caller-supplied `maxBytes` as the
   ceiling, replacing rather than clamping `MAX_READ_BYTES`. A
   future PR 19/20 route forwarding `req.query.maxBytes` could
   blindly bypass the daemon's 256 KiB safety cap. Now clamps
   via `Math.min(maxBytes, MAX_READ_BYTES)`.

6. ENOENT_TOLERATING_INTENTS docstring + test (paths.ts) —
   wenshao #CsBk3. The Intent docstring only mentioned `'write'`
   tolerating ENOENT; `'stat'` was in the set undocumented. A
   future maintainer removing `'stat'` thinking it was a
   copy-paste error would silently change behavior (stat on a
   concurrently-deleted path would throw `path_not_found` from
   the resolver instead of letting `fsp.lstat` throw `ENOENT`
   naturally). Amended docstring to call out `'stat'`'s rationale
   explicitly + added contract corpus case.
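
The privacy gate from item 2 reduces to one rule: path-bearing fields travel together or not at all. A minimal sketch, assuming a simplified payload shape (the field names come from this commit message; `buildDeniedPayload` and its parameters are hypothetical):

```typescript
interface FsDeniedAuditPayload {
  errorKind: string;
  hint: string;
  pathHash: string;
  relPath?: string; // path-bearing
  message?: string; // path-bearing: may embed absolute paths or oldText
}

function buildDeniedPayload(
  base: { errorKind: string; hint: string; pathHash: string },
  raw: { relPath: string; message: string },
  includeRawPaths: boolean,
): FsDeniedAuditPayload {
  // Privacy mode (the default) drops both path-bearing fields together.
  return includeRawPaths ? { ...base, ...raw } : { ...base };
}
```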

6 new tests:
- glob cwd via symlink to outside → path_outside_workspace
- writeText with mid-operation symlink swap → symlink_escape +
  outside file unchanged
- edit with mid-operation symlink swap → symlink_escape + outside
  file unchanged
- readBytes opts.maxBytes attempting widening → file_too_large
- fs.denied message field absent in privacy mode (default)
- fs.denied message field present in raw-paths mode (forensic)
- contract corpus: resolve('newdir/leaf', 'stat') succeeds for
  ENOENT path

439/439 serve tests pass; typecheck + eslint clean.

Stack: 7b0db4c3a → a81ada43f → efd7a4611 → b38e82157 → 911cb8e5d → 546834267 → ebd9e78a1 → THIS

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve/fs): close 5 review-round-7 findings (#4250)

glm-5.1 round-7 review surfaced 6 items; 5 are real fixes, 1 was
already addressed in 7f4f30d0b (writeText pre-write symlink
guard — reviewer looked at ebd9e78a1 snapshot ~8h before the
round-6 fix). Stack on round-6.

1. edit() bypassed lowFs.readTextFile (workspaceFileSystem.ts) —
   Critical encoding round-trip corruption. The earlier
   `fsp.readFile(p, 'utf-8')` included the UTF-8 BOM verbatim in
   `current`, breaking `oldText` matching even when the user
   passed the exact source text from a previous read; lost
   iconv-supported codepage handling (GBK / Big5 / Shift_JIS),
   so non-UTF-8 files would mojibake into `current` and
   round-trip-corrupt on write-back; and the subsequent
   `lowFs.writeTextFile` passed no `_meta`, stripping the BOM
   and forcing UTF-8 on write-back even when the original was
   BOM'd or non-UTF-8. Fix: read via `lowFs.readTextFile` AND
   forward `readResult._meta` into the write-back. Test pins
   UTF-8 BOM round-trip end-to-end.

2. hasSuspiciousPathPattern multi-digit and POSIX false positive
   (paths.ts) — `/~\\d/` only matched single-digit; NTFS
   allocates `~10+` on >9 collisions. POSIX false positives:
   editor swap files and backup tools use `~N` legitimately. Fixed
   to `/~\\d+/` and gated behind `process.platform === 'win32'`,
   matching the ADS-colon check.

3. canonicalizeWorkspace redundant per-request syscall
   (paths.ts) — Performance. Factory canonicalizes once at
   build; every `resolveWithinWorkspace` also ran realpathSync
   on the same path, blocking the event loop. Added
   CANONICAL_BOUND_CACHE Map keyed on the input string;
   steady-state size = 1 per `1 daemon = 1 workspace`.

4. readBytes opts.maxBytes API contract
   (workspaceFileSystem.ts) — Semantic mismatch. Parameter name
   promised window-read; impl only used it as a hard reject
   gate. Now truncates the buffer post-read so `readBytes(p,
   { maxBytes: 1024 })` on a 200 KB file returns 1 KB. Hard
   `MAX_READ_BYTES` cap still throws for files above it.

5. glob walks node_modules and .git unnecessarily
   (workspaceFileSystem.ts) — Performance. Without an `ignore`
   option, `globAsync` traversed every file under those dirs
   before our per-hit `shouldIgnore` filter. Now passes
   `ignore: ['**/node_modules/**', '**/.git/**']` to
   short-circuit traversal. Post-filter via `shouldIgnore`
   remains authoritative.
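
The `readBytes` contract from item 4 (soft window, hard cap) can be sketched on a plain byte array. `applyReadWindow` and `FileTooLargeError` are hypothetical names; the 256 KiB value is the cap mentioned in the round-6 commit.

```typescript
const MAX_READ_BYTES = 256 * 1024;

class FileTooLargeError extends Error {}

function applyReadWindow(buf: Uint8Array, maxBytes?: number): Uint8Array {
  // Hard cap: always enforced, regardless of the caller's window.
  if (buf.length > MAX_READ_BYTES) {
    throw new FileTooLargeError(`file exceeds ${MAX_READ_BYTES} bytes`);
  }
  // Soft window: truncate instead of rejecting.
  if (maxBytes === undefined || maxBytes >= buf.length) return buf;
  return buf.subarray(0, maxBytes);
}
```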

5 new tests:
- 8.3 short-name regex Windows / POSIX split
- readBytes truncates 2048-byte file to 1024 with maxBytes
- readBytes throws file_too_large only above hard cap
- edit() preserves UTF-8 BOM round-trip
- glob prunes node_modules and .git
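
The platform-gated 8.3 short-name check from item 2 can be sketched as below. Passing the platform as a parameter (instead of reading `process.platform`) is an assumption made so that the example runs anywhere; `hasShortNamePattern` is a hypothetical name.

```typescript
const SHORT_NAME_PATTERN = /~\d+/;

function hasShortNamePattern(p: string, platform: string): boolean {
  // On POSIX, `~N` suffixes are legitimate (editor swap files, backups).
  return platform === "win32" && SHORT_NAME_PATTERN.test(p);
}
```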

442/442 serve tests pass; typecheck + eslint clean.

Stack: 7b0db4c3a → a81ada43f → efd7a4611 → b38e82157 → 911cb8e5d → 546834267 → ebd9e78a1 → 7f4f30d0b → THIS

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve/fs): close 10 review-round-8/9 findings (#4250)

Two more review passes (gpt-5.5 + DeepSeek + wenshao) flagged 13
items; 10 are real fixes, 3 are reviewer-stale-snapshot or
already-tracked. Stack on round-7.

Critical (5):

1. paths.ts symlink-escape hint embedded the symlink target
   (gpt-5.5) — Privacy regression sibling to round-6 audit
   `message` gate. `recordDenied` always forwards `hint` into
   `fs.denied` even with `QWEN_AUDIT_RAW_PATHS` off; the hint
   `'symlink points to /Users/alice/secret'` leaks the
   attacker's intended exfiltration path through audit. Hint is
   now path-free; operators wanting the resolved target enable
   `QWEN_AUDIT_RAW_PATHS=1` and read it from `relPath` /
   `message`.

2. paths.ts dangling-symlink chain discarded its verified
   canonical (DeepSeek) — After the multi-hop walk validated
   `cursor → canonicalTarget` was inside the workspace, the
   code fell through to `findExistingAncestor(absolute)`,
   re-walking from the original input and discarding the
   verified result. An attacker swapping an intermediate
   symlink between the verification and the re-walk could
   produce a different canonical than the one validated. The
   verified `canonicalTarget` is now captured in
   `symlinkResolvedCanonical` and used directly; the
   `findExistingAncestor(absolute)` fallthrough only runs when
   no symlink was traversed.

3. workspaceFileSystem.ts readBytes missing post-read size
   check (DeepSeek) — Same TOCTOU shape as `readText`'s
   round-6 fix. The pre-stat `enforceReadBytesSize` sees the
   size at stat time; a concurrent appender keeps the same
   inode but grows the file past the cap before
   `fsp.readFile` returns. `assertInodeStableAfterRead`
   catches inode changes but not same-inode growth. Added a
   post-read `buf.length > MAX_READ_BYTES` check matching
   `readText`'s defense-in-depth pattern.

4. errors.ts wrapAsFsError default = permission_denied
   (DeepSeek) — Misclassified non-errno errors (`TypeError`,
   programmer-error throws, native module exceptions) as
   security denials, paging security oncall for what should
   be a developer ticket. New `internal_error` kind (HTTP
   500) is the new default; `permission_denied` reserved for
   actual `EACCES`/`EPERM`.

5. audit.ts AuditContext.sessionId not forwarded to
   BridgeEvent (DeepSeek) — Multi-session daemons couldn't
   trace audit events back to the session that triggered
   them. `originatorClientId` identifies the client, not the
   session. Added optional `sessionId` field to both
   `FsAccessAuditPayload` and `FsDeniedAuditPayload`,
   forwarded from `ctx.sessionId` when present.
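
Item 4, combined with the round-3 `io_error` split, gives a three-way classification that can be sketched as follows. Only the errnos named in these commit messages are covered; the real `wrapAsFsError` handles further kinds that are omitted here, and `classifyFsError` is a hypothetical name.

```typescript
type FsErrorKind = "permission_denied" | "io_error" | "internal_error";

const IO_ERRNOS = new Set([
  "EIO", "EBUSY", "ENAMETOOLONG", "EMFILE", "ENFILE", "ENOSPC", "ETXTBSY",
]);

function classifyFsError(err: unknown): FsErrorKind {
  const code =
    typeof err === "object" && err !== null && "code" in err
      ? (err as { code?: unknown }).code
      : undefined;
  // Literal access denial only.
  if (code === "EACCES" || code === "EPERM") return "permission_denied";
  // Environmental failures (full disk, fd exhaustion, busy file, ...).
  if (typeof code === "string" && IO_ERRNOS.has(code)) return "io_error";
  // Non-errno errors (TypeError, programmer errors) and unknown errnos.
  return "internal_error";
}
```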

Improvements (4):

6. workspaceFileSystem.ts glob cwd realpath redundant when
   cwd === boundWorkspace (wenshao) — `boundWorkspace` is
   already canonicalized by the factory (`realpathSync.native`
   at build time), so calling `fsp.realpath` per-request when
   no `opts.cwd` was supplied is a redundant async syscall.
   Added a short-circuit.

7. workspaceFileSystem.ts kindFromStatLike JSDoc orphaned
   (wenshao) — Inserting `assertNotSymlinkBeforeWrite` between
   the JSDoc and `kindFromStatLike` left the doc floating
   above the wrong function. IDE hovers showed the wrong
   description. Moved the doc back to its function.

8. workspaceFileSystem.ts shared mutable Ignore object
   (DeepSeek) — `createWorkspaceFileSystemFactory` builds one
   `Ignore` instance and shares it across every
   `WorkspaceFileSystemImpl` returned by `forRequest()`.
   `Ignore.add(): this` is a public mutator. A future
   "per-session ignore rules" feature calling `.add()` from a
   request handler would silently corrupt all concurrent
   sessions. `Object.freeze` turns the cross-request mutation
   into a `TypeError` rather than a silent leak.

9. server.ts createDefaultFsAuditEmit one-shot warning
   (DeepSeek) — Permanent silent no-op after the first event;
   only logged the event `type` with no pathHash / errorKind /
   intent. If PR 19 forgets the real factory injection, every
   write 403s and audit is silent past the first warning —
   exactly the regression the warning exists to surface.
   Periodic warning (every 100th drop) + first-event context
   (errorKind, intent, pathHash) makes the regression
   actionable in production logs.
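
The periodic warning from item 9 can be sketched as a counter wrapped around a log sink. `createDropWarner`, the event shape, and the message wording are all hypothetical; only the "first event with context, then every 100th drop" cadence comes from the commit message.

```typescript
const WARN_EVERY = 100;

interface DroppedEvent {
  type: string;
  errorKind?: string;
  intent?: string;
  pathHash?: string;
}

function createDropWarner(log: (line: string) => void): (event: DroppedEvent) => void {
  let dropped = 0;
  return (event) => {
    dropped += 1;
    if (dropped === 1) {
      // First drop: include enough context to act on.
      log(
        `fs audit emitter not wired; dropping ${event.type} ` +
          `(errorKind=${event.errorKind}, intent=${event.intent}, pathHash=${event.pathHash})`,
      );
    } else if (dropped % WARN_EVERY === 0) {
      log(`fs audit emitter not wired; ${dropped} events dropped so far`);
    }
  };
}
```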

Cleanup (1):

10. workspaceFileSystem.ts safeUtf8Truncate dead code
    (DeepSeek noted as "off-by-one") — The lead-byte
    seqLen-check block was dead code: `subarray(0, end)`
    already excludes the leading byte at `end`, so no
    further adjustment is ever needed. Removed the block;
    function is now 4 lines and still produces a valid
    codepoint prefix. Reviewer's suggested fix
    (`buf[end-1] → buf[end]`) was technically correct but
    redundant with the subarray cut.
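
The simplified walk-back from item 10 can be sketched on a `Uint8Array`. This is a reconstruction from the description above rather than the committed function, and it assumes the input is valid UTF-8.

```typescript
function safeUtf8Truncate(buf: Uint8Array, maxBytes: number): Uint8Array {
  if (maxBytes >= buf.length) return buf;
  let end = Math.max(0, maxBytes);
  // While the first excluded byte is a continuation byte (0b10xxxxxx), the
  // cut is mid-codepoint: step back until `end` sits on a lead byte.
  while (end > 0 && (buf[end] & 0xc0) === 0x80) end--;
  // subarray(0, end) excludes the byte at `end`, so the partial sequence is dropped.
  return buf.subarray(0, end);
}
```

For the 12-byte encoding of '中文测试', a 7-byte cap lands inside the third character; the loop steps back to offset 6 and the result is the 6-byte prefix '中文'.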

Already-fixed (3 reviewer-stale-snapshot, reply + resolve):

- writeText pre-write symlink guard — fixed in 7f4f30d0b
- edit() read-modify-write race — already deferred to PR 20
  atomic-via-temp follow-up
- glob maxResults walk-bound — already follow-up #5

3 new tests + 2 updated:
- wrapAsFsError unknown errno → internal_error (default change)
- internal_error has HTTP 500
- non-Error throwables → internal_error (not permission_denied)
- readBytes post-stat growth → file_too_large
- existing wrapAsFsError test updated for new default

445/445 serve tests pass; typecheck + eslint clean.

Stack: 7b0db4c3a → a81ada43f → efd7a4611 → b38e82157 → 911cb8e5d → 546834267 → ebd9e78a1 → 7f4f30d0b → 1dc9d2290 → THIS

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve/fs): close 3 review-round-10 findings (#4250)

Round-10 review (wenshao) flagged 4 items, all marked
non-blocking by the reviewer. 3 are landed in-PR; 1 (glob
withFileTypes optimization) is moved to issue #4175 follow-ups
since the reviewer themselves recommended "test it on a 100K-file
workspace after PR 19 lands."

1. canonicalizeWorkspace docstring (paths.ts) — Documentation.
   The earlier FIXME warned about sync syscall blocking the
   event loop without mentioning that the `CANONICAL_BOUND_CACHE`
   added in 1dc9d2290 brings steady-state cost to zero. Future
   reviewers reading the FIXME again would re-flag the
   already-mitigated concern. Added a paragraph noting cache hit
   rate = 100% under the `1 daemon = 1 workspace` model so
   per-request blocking only happens at boot or on fresh
   workspace values (e.g. tests).

2. enforceReadBytesSize dead `maxBytes` parameter (policy.ts) —
   Round-7 changed `readBytes` to use post-read truncation for
   the soft window cap, leaving the function's `maxBytes`
   parameter unused at the only callsite. The
   `Math.min(maxBytes, MAX_READ_BYTES)` clamp branch became dead
   code. Reviewer's option 1: tighten the signature to
   `enforceReadBytesSize(fileBytes: number): void`. Done — the
   function is now purely the hard-cap enforcer its name
   implies; soft-window truncation lives in the orchestrator's
   `buf.subarray(0, opts.maxBytes)` post-read step where it's
   visible alongside the cap check it complements.

3. relForAudit cross-drive sentinel (audit.ts) — Windows-only
   privacy edge case. `path.relative('C:\\ws', 'D:\\evil')`
   returns `'D:\\evil'` (an absolute path) because Win32 can't
   express cross-drive relatives. Even when raw-paths mode is
   ENABLED, the audit `relPath` field would carry the off-drive
   absolute path, exposing the attacker's drive letter +
   directory. Added a `path.isAbsolute(rel)` post-check that
   substitutes a `<cross-drive>` sentinel — audit consumers see
   the cross-drive case distinctly without leaking the offending
   path. This was previously P2 deferred in PR 18's description
   ("Windows cross-drive path.relative"); reviewer's "few lines
   beats deferring" assessment was right.
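
The cross-drive post-check from item 3 can be sketched as follows. The sketch calls `path.win32` explicitly so the Windows behavior is reproducible on any host; the real `relForAudit` signature is not shown in this commit message, so the parameters here are assumptions.

```typescript
import * as path from "node:path";

const CROSS_DRIVE_SENTINEL = "<cross-drive>";

function relForAudit(workspace: string, target: string): string {
  const rel = path.win32.relative(workspace, target);
  // Across drives, relative() falls back to the absolute target path.
  return path.win32.isAbsolute(rel) ? CROSS_DRIVE_SENTINEL : rel;
}
```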

Tracked as follow-up (not in this commit):

4. glob withFileTypes optimization — Replace per-hit `lstat`
   (line ~664) with `glob` v10's `withFileTypes: true` so each
   hit comes back as a `Path` object with `isDirectory()` /
   `isFile()` / `isSymbolicLink()` already available. Saves N
   syscalls in large workspaces. Non-trivial restructure (return
   type changes from `string[]` to `Path[]`). Reviewer
   themselves marked "[performance / non-blocking]" and said
   "test it on a 100K-file workspace after PR 19 lands so we
   know whether it's worth it." Added to issue #4175 body's
   Post-PR-18 hardening follow-ups.

446/446 serve tests pass; typecheck + eslint clean.

Stack: 7b0db4c3a → a81ada43f → efd7a4611 → b38e82157 → 911cb8e5d → 546834267 → ebd9e78a1 → 7f4f30d0b → 1dc9d2290 → a33d459df → THIS

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
2026-05-18 13:14:43 +08:00
jinye
96219924a0
feat(serve): MCP client guardrails (#4175 Wave 3 PR 14) (#4247)
* feat(serve): MCP client guardrails (#4175 Wave 3 PR 14)

Adds an in-process MCP client counter, slot-reservation enforcement at all 3 spawn sites (discoverAllMcpTools / discoverAllMcpToolsIncremental / readResource), new `--mcp-client-budget=N` + `--mcp-budget-mode={enforce,warn,off}` CLI flags forwarded to the ACP child via env, and additive `clientCount` / `clientBudget` / `budgetMode` / `budgets[]` fields plus `disabledReason: 'budget'` tagging on `GET /workspace/mcp`.

Always-on capability tag `mcp_guardrails` with `modes: ['warn', 'enforce']` so SDK clients can pre-flight refusal semantics. Typed SSE push events (`mcp_budget_warning` / `mcp_child_refused_batch`) intentionally deferred to a small follow-up PR — the snapshot already exposes `budgets[0].status: 'warning'|'error'` + `refusedCount` so operator visibility isn't blocked.

* fixup(serve): address PR 14 review (#4247) findings 1-7

Addresses Codex + Copilot review feedback on #4247. Seven functional and forward-compat fixes; (8) `tcp` transport mapper vs createTransport deferred pending @wenshao direction (separate core/protocol decision).

1. **Single-server rediscovery bypass** — add `tryReserveSlot` at the top of `discoverMcpToolsForServerInternal`. Pre-fix a server refused at startup could be brought online later via `/mcp reconnect <name>` and exceed the cap in enforce mode.
2. **Empty `budgets[]` when mode=off** — early `return []` in `buildBudgetCells` when mode is `off`. Protocol docs / SDK types promise empty array; pre-fix emitted a synthetic noisy cell.
3. **runQwenServe validation + env leakage** — mirror CLI budget validation in `runQwenServe` (the embedded entry point); explicitly delete `QWEN_SERVE_MCP_*` env vars when options are undefined so multiple daemons in one process don't leak prior budget config to subsequent ACP children.
4. **Disabled-vs-refused precedence + stale refusal log** — config-disable wins over budget refusal in the per-server cell; `removeServer` + `disconnectServer` drop the entry from `lastRefusedServerNames` so operator action immediately clears the budget tag.
5. **Incremental remove-before-reserve ordering** — process config-removed servers FIRST in `discoverAllMcpToolsIncremental` so freed slots are visible to subsequent `tryReserveSlot` calls. Pre-fix scenario {a,b}→{a,c} with budget=2 wasted a slot.
6. **`scope` forward-compat type widening** — `'workspace' | (string & {})` on both `ServeMcpBudgetStatusCell` and `DaemonMcpBudgetStatusCell` so SDK consumers don't break when PR 23 adds `scope: 'pool'` per the documented no-schema-bump contract.
7. **Test comment alignment** — fix "With budget=1" comment to match `clientBudget: 2` code.
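
The remove-before-reserve ordering from item 5 can be sketched with a small slot tracker. Everything except the `tryReserveSlot` name and the `reservedSlots` idea is hypothetical, including the assumption that only `enforce` mode refuses reservations.

```typescript
type BudgetMode = "enforce" | "warn" | "off";

class SlotTracker {
  private readonly reservedSlots = new Set<string>();
  private readonly budget: number | undefined;
  private readonly mode: BudgetMode;

  constructor(budget: number | undefined, mode: BudgetMode) {
    this.budget = budget;
    this.mode = mode;
  }

  tryReserveSlot(name: string): boolean {
    if (this.reservedSlots.has(name)) return true; // already holds a slot
    const full = this.budget !== undefined && this.reservedSlots.size >= this.budget;
    if (full && this.mode === "enforce") return false;
    this.reservedSlots.add(name);
    return true;
  }

  release(name: string): void {
    this.reservedSlots.delete(name);
  }
}

// Returns the names that were refused a slot.
function applyConfigChange(tracker: SlotTracker, previous: string[], next: string[]): string[] {
  // Removals first, so freed slots are visible to the reservations below.
  for (const name of previous) {
    if (!next.includes(name)) tracker.release(name);
  }
  return next.filter((name) => !tracker.tryReserveSlot(name));
}
```

With a budget of 2, moving from {a,b} to {a,c} admits `c` because `b` is released before `c` asks for a slot.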

Plus 4 new core regression tests covering #1/#2/#4/#5, and 4 new serve tests covering #3 (boot rejection + env cleanup). 237/237 pass across the affected files (36 core mcp-client-manager + 50 acpAgent + 151 serve).

* docs(serve): clarify v1 snapshot-based budget warning detection (#4247)

Address github-actions review-summary finding (I) on PR #4247: v1 operators have no SSE push event for budget pressure yet (deferred to PR 14b), so the protocol doc should explicitly say how to detect warning / error states from the snapshot. Adds the three-way mapping `budgets[0].status` ↔ live/refused counts.

* fixup(serve): address PR 14 review round 2 (#4247 wenshao)

Addresses @wenshao review on PR #4247. Three critical safety fixes + four suggestion-level improvements.

Critical (zombie slot leaks — would break `enforce` mode for the rest of the daemon's lifetime):
- C2: `discoverAllMcpTools` connect() catch now releases reservedSlots + clients entry. Pre-fix one failed connect permanently consumed a budget slot.
- C3: `readResource` wraps client.connect() in try/catch; on throw the slot + client entry are cleaned up before re-raising. Tracked `weReservedSlot` so the cleanup only fires for newly-created lazy spawns (reused already-CONNECTED clients are untouched).
- (wenshao C1 was the rediscovery-bypass also caught by Codex + Copilot — already addressed in fixup 597f011e6.)

Suggestion:
- S4: `readBudgetFromEnv` downgrades `mode='enforce'` → `'off'` when no budget is set, mirroring the CLI + `runQwenServe` invariant. Fail-closed on operator misconfiguration rather than silently bypassing enforcement.
- S5: extract duplicated `mcp_budget_decision` telemetry into private `emitBudgetTelemetry(configuredCount)`.
- S6: rename `BudgetExhaustedError` constructor param `liveCount` → `reservedCount`. `reservedSlots.size` is what's blocking the new server, not the live CONNECTED count (those differ when a reserved server is disconnected).
- S7a: bump accounting-failure log level — `debugLogger.debug` (gated on debug=true) replaced by `process.stderr.write` so production daemons surface slot-leak / type-mismatch failures in journald/docker logs.
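
The S4 downgrade can be sketched as a pure normalization step. The function name, the raw-string inputs, and the default of `off` for an unset or unknown mode are assumptions; the actual `QWEN_SERVE_MCP_*` variable names are not spelled out in this commit message, so none are used.

```typescript
type BudgetMode = "enforce" | "warn" | "off";

interface BudgetConfig {
  budget: number | undefined;
  mode: BudgetMode;
}

function normalizeBudgetConfig(
  rawBudget: string | undefined,
  rawMode: string | undefined,
): BudgetConfig {
  const parsed = rawBudget === undefined ? NaN : Number(rawBudget);
  const budget = Number.isSafeInteger(parsed) && parsed > 0 ? parsed : undefined;
  let mode: BudgetMode = rawMode === "enforce" || rawMode === "warn" ? rawMode : "off";
  // enforce without a budget has nothing to enforce: downgrade to off.
  if (mode === "enforce" && budget === undefined) mode = "off";
  return { budget, mode };
}
```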

(S7b — expose `reservedSlots[]` on the wire for slot-leak debugging — deferred as additive; will be in PR 14b alongside the typed events.)

+ 3 new core regression tests (C2 leak release, C3 lazy-spawn leak release, S4 env enforce-downgrade). 626/626 tests pass across the focused suite; typecheck + lint clean.

* fixup(serve): address PR 14 review round 3 (#4247 wenshao second pass)

Addresses @wenshao's second review pass on PR #4247 (submitted 15:56Z after round 2 fixup landed). Four code fixes + three doc clarifications.

Code:
- R3 #5: `readResource` lazy-spawn path now checks `isMcpServerDisabled` BEFORE the budget gate. Pre-existing gap: a server disabled via `mcpServers.<name>.disabled: true` or `/mcp disable <name>` could be resurrected by any resource read. Disabled precedence over budget mirrors the per-server cell logic.
- R3 #6: `buildBudgetCells` now receives the post-disabled-filter `refusedCount` so the workspace cell matches the per-server cell precedence. Pre-fix a server disabled after being refused rendered `disabled` on its per-server row but `error: budget_exhausted` on the workspace row.
- R3 #7: extract `MCP_BUDGET_WARN_FRACTION = 0.75` constant. Was hardcoded in `acpAgent.buildBudgetCells` AND `commands/serve.ts` stderr breadcrumb (the latter with `Math.ceil` divergence on non-integer multiples). Pre-extract so PR 14b's dual-threshold (0.75 warn + 0.375 rearm) lands in one file.
- R3 #1: env-var enforce-without-budget downgrade (already fixed in round 2 ba3e3febd S4 — reply-only on the new thread).

Docs:
- R3 #2: docstring on `mcpTransportOf` now spells out the `tcp` vs `createTransport` divergence + records the deferred decision (PR 14b / future core). Closes the "comment claims X but code does Y" gap.
- R3 #3: comments in both `discoverAllMcpTools` catch (release slot — stop() owns lifecycle) AND `discoverMcpToolsForServerInternal` catch (KEEP slot — operator intent + health-monitor retry). Different paths, different contracts, both explicit.
- R3 #4: invariant note in `readResource` lookup→reserve sequence documenting the synchronous no-await guarantee that closes the TOCTOU window.

+ 3 new core regression tests (readResource disabled gate, disabled-wins-over-budget precedence, MCP_BUDGET_WARN_FRACTION pin). 629/629 tests pass; typecheck + lint clean.

* fixup(serve): address PR 14 review round 4 (#4247 wenshao second + third pass)

Addresses @wenshao's second + third review passes on PR #4247. One critical scope-correction (per-session vs per-workspace) + one zombie leak fix shared across three threads.

Critical correction — per-session vs per-workspace (wenshao R3 line 117 docs):
- Reality check: `acpAgent.newSessionConfig()` constructs a fresh `Config` + `ToolRegistry` + `McpClientManager` for EVERY ACP session. Each manager independently reads `QWEN_SERVE_MCP_CLIENT_BUDGET` env. So `--mcp-client-budget=10` with 5 sessions caps at 5 × 10 = 50 live MCP clients across the daemon, NOT 10. The "per-workspace" framing in v1 docs was incorrect.
- Pragmatic v1 path (not the big refactor): rewrite docs + change `scope: 'workspace'` → `scope: 'session'` so the wire contract reflects reality. Wave 5 PR 23 (shared MCP pool) will introduce a workspace-scoped manager and add `scope: 'workspace'` cells alongside.
- Files touched: `status.ts` + `sdk types.ts` (cell `scope` field widened to `'session' | 'workspace' | (string & {})` with v1 emitting `'session'`), `acpAgent.buildBudgetCells` (emits `'session'` + new code comment explaining the per-session truth), `docs/users/qwen-serve.md` (CLI flag + budget section relabel + ⚠️ v1 limitation callout), `docs/developers/qwen-serve-protocol.md` (capabilities section + JSON example + paragraph rewrite + per-session detection hint).
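
The per-session arithmetic in the correction above can be sketched as follows. This is only an illustration of the 5 × 10 = 50 figure; `daemonWideClientCap` is a hypothetical helper, not a function in the codebase.

```typescript
// Hypothetical helper: worst-case number of live MCP clients across a daemon
// when every ACP session owns its own manager and reads the same budget.
function daemonWideClientCap(perSessionBudget: number, sessionCount: number): number {
  if (!Number.isInteger(perSessionBudget) || perSessionBudget < 0) {
    throw new RangeError("perSessionBudget must be a non-negative integer");
  }
  if (!Number.isInteger(sessionCount) || sessionCount < 0) {
    throw new RangeError("sessionCount must be a non-negative integer");
  }
  // Each session enforces its budget independently, so the caps add up.
  return perSessionBudget * sessionCount;
}

console.log(daemonWideClientCap(10, 5)); // 50
```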

Zombie leak fix — single weReserved-pattern fix in discoverMcpToolsForServerInternal closes wenshao R3 line 546 + R4 line 639 + R4 line 929:
- Same pattern as R2 C3 (`readResource`): track `weReservedSlot = reservation === 'reserved' && this.reservedSlots.has(serverName)` (the set-membership guard distinguishes a real fresh reservation from `off`-mode's no-op return). On connect-failure, release slot + drop client only when `weReservedSlot`; an `'already_held'` reconnect keeps its slot so health-monitor retry doesn't compete for capacity.
- Pre-fix a brand-new server connecting via /mcp reconnect / health monitor / incremental's serversToUpdate that failed on connect() would permanently consume a budget slot under enforce mode.
- Updated R3's "always keep" doc comment to reflect the new two-mode cleanup (release on fresh + keep on reconnect).
- Caught and added a tripwire test for the `off`-mode no-op edge case (`tryReserveSlot` returns `'reserved'` without adding to the set in off mode — without the has-guard, my fix would have broken the pre-existing "should restore health checks after failed server rediscovery" test by deleting the failed client even in unbudgeted operation).
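
A minimal sketch of the `weReservedSlot` guard described above, under simplifying assumptions: `SlotBook` and `connectOnce` are hypothetical names, `connect` is synchronous here (the real path is async), and the three return values of `tryReserveSlot` are modeled from this message only.

```typescript
type Reservation = "reserved" | "already_held" | "refused";
type Mode = "off" | "enforce";

// Hypothetical stand-in for the manager's slot bookkeeping.
class SlotBook {
  readonly reservedSlots = new Set<string>();
  readonly clients = new Map<string, string>();
  private readonly mode: Mode;
  private readonly budget: number;

  constructor(mode: Mode, budget: number) {
    this.mode = mode;
    this.budget = budget;
  }

  tryReserveSlot(name: string): Reservation {
    // In off mode this is a no-op that still reports 'reserved'.
    if (this.mode === "off") return "reserved";
    if (this.reservedSlots.has(name)) return "already_held";
    if (this.reservedSlots.size >= this.budget) return "refused";
    this.reservedSlots.add(name);
    return "reserved";
  }

  // Returns true when connect() succeeded.
  connectOnce(name: string, connect: () => void): boolean {
    const reservation = this.tryReserveSlot(name);
    if (reservation === "refused") return false;
    // The set-membership check separates a real fresh reservation from
    // off mode's no-op 'reserved'.
    const weReservedSlot = reservation === "reserved" && this.reservedSlots.has(name);
    this.clients.set(name, "client");
    try {
      connect();
      return true;
    } catch {
      if (weReservedSlot) {
        // Fresh reservation that failed: give the slot back, drop the client.
        this.reservedSlots.delete(name);
        this.clients.delete(name);
      }
      // 'already_held' reconnects and off-mode failures keep their entries.
      return false;
    }
  }
}
```

With a budget of 1 in `enforce` mode, a failed first connect leaves `reservedSlots` empty, so the next server can still be admitted; a failed reconnect of an already-held server keeps its slot.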

+ 2 new core regression tests (fresh-reserve connect-failure releases slot, reconnect connect-failure keeps slot). 631/631 focused tests pass; typecheck + lint clean.

* fixup(serve): address PR 14 review round 5 (#4247 wenshao fourth pass)

Addresses @wenshao's fourth review pass on PR #4247. Two critical zombie-leak / staleness fixes; three reviewer findings deferred or already-addressed (replied + resolved on the threads).

Critical fixes:
- R5 line 956: `runWithDiscoveryTimeout` timeout handler now releases `reservedSlots.delete(serverName)` and drops the stale `lastRefusedServerNames` entry alongside the existing `clients.delete`. Pre-fix a timed-out server in `enforce` mode permanently held its budget slot; N consecutive timeouts permanently degraded daemon capacity. + regression test.
- R5 line 1268-1: `readResource` lazy-spawn path drops the server from `lastRefusedServerNames` when `tryReserveSlot` returns `'reserved'` (a successful late re-reservation). Pre-fix a server refused at discovery but later re-reserved via `readResource` (e.g., after another server freed a slot) kept its stale `disabledReason: 'budget'` tag in the snapshot. + regression test.

Reviewer findings deferred / already done (replied + resolved):
- R5 line 1268-2 (`no try/catch around connect()` in readResource): stale view — R2 C3 fixup ba3e3febd added the try/catch with the weReservedSlot cleanup pattern.
- R5 line 1274 (`BudgetExhaustedError.liveCount` semantic mismatch): R2 S6 fixup ba3e3febd renamed the param + readonly field to `reservedCount`, exactly matching the proposed semantic.
- R5 acpAgent.ts null line (`Math.ceil(0.75 * budget)` for small budgets): the proposed fix is semantically a no-op for integer `liveCount`. With budget = 1, for example, `liveCount >= 0.75` and `liveCount >= Math.ceil(0.75)` (that is, `>= 1`) give identical results when `liveCount` is an integer. The underlying "small budgets jump ok→error" observation is a real but inherent limitation of percentage-based thresholds at small N; it is a design tradeoff, not an implementation bug.
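
The no-op claim about `Math.ceil` can be checked exhaustively for small budgets. The sketch below assumes the warn test has the shape `liveCount >= 0.75 * budget`; `warnsRaw`, `warnsCeil`, and `firstDisagreement` are hypothetical names used only for this check.

```typescript
const WARN_FRACTION = 0.75; // the value quoted in this message

// Threshold comparison without rounding.
function warnsRaw(liveCount: number, budget: number): boolean {
  return liveCount >= WARN_FRACTION * budget;
}

// Threshold comparison with the proposed Math.ceil.
function warnsCeil(liveCount: number, budget: number): boolean {
  return liveCount >= Math.ceil(WARN_FRACTION * budget);
}

// Returns the first (liveCount, budget) pair where the two disagree, or null.
function firstDisagreement(maxBudget: number): [number, number] | null {
  for (let budget = 1; budget <= maxBudget; budget++) {
    for (let live = 0; live <= budget; live++) {
      if (warnsRaw(live, budget) !== warnsCeil(live, budget)) return [live, budget];
    }
  }
  return null;
}

console.log(firstDisagreement(1000)); // null
```

For an integer `n` and any real `x`, `n >= x` holds exactly when `n >= Math.ceil(x)`, which is why no pair disagrees. The small-N limitation is also visible here: with a budget of 1, a live count of 0 is below the threshold and a live count of 1 already equals the full budget.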

46/46 core tests pass (44 prior + 2 new R5 regression). Typecheck + lint clean.

* fixup(serve): address PR 14 review round 6 (#4247 wenshao fifth pass)

Addresses @wenshao's fifth review pass on PR #4247. Two critical fixes (one TOCTOU race, one cross-daemon env leak).

Critical fixes:
- R6 Thread 2 (line 956): remove the duplicate pre-reservation block in `discoverAllMcpToolsIncremental`. The reservation already happens inside `discoverMcpToolsForServerInternal` (R1 fix #1). With both sites reserving, the timeout cleanup raced against the inner connect path — `runWithDiscoveryTimeout`'s timeout handler could release the slot mid-flight while the inner `connect()` later resolved successfully, leaving a CONNECTED client with NO reservation and breaking `enforce`-mode budget enforcement. With pre-reservation removed, the inner call owns the entire reservation lifecycle (reserve → connect → release-on-failure-via-weReservedSlot → cleared-by-timeout-if-fires) at a single site. Refusal behavior is observably identical from outside.

- R6 Thread 1 (runQwenServe.ts:216): per-handle env passthrough via new `BridgeOptions.childEnvOverrides` instead of mutating global `process.env`. Pre-fix concurrent embedded `runQwenServe()` handles with different MCP budgets would race on the global env — `defaultSpawnChannelFactory` snapshots `process.env` AT SPAWN TIME, so the last `runQwenServe()` call to set the var would silently win for ALL daemon handles' subsequent ACP child spawns. Wire surface:
  - `ChannelFactory` signature: `(workspaceCwd, childEnvOverrides?) => Promise<AcpChannel>`.
  - `BridgeOptions.childEnvOverrides?: Readonly<Record<string, string | undefined>>` — `undefined` value means "scrub this var from the child env" so an embedded caller can wipe a stale inherited var without touching global state.
  - `defaultSpawnChannelFactory` merges overrides AFTER `SCRUBBED_CHILD_ENV_KEYS` so the daemon-only secret list still wins (operators can't override the scrub).
  - `runQwenServe` closes over per-handle overrides; never touches `process.env`.
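
One way to read the override semantics listed above, as a hedged sketch: `buildChildEnv` is a hypothetical function, and `EXAMPLE_DAEMON_SECRET` is a made-up placeholder for whatever `SCRUBBED_CHILD_ENV_KEYS` actually contains. The sketch skips scrubbed keys during the override merge so that the scrub list always wins, which is an assumption about how the real merge achieves that guarantee.

```typescript
type EnvOverrides = Readonly<Record<string, string | undefined>>;

// Placeholder for the daemon-only secret list (assumed content).
const SCRUBBED_KEYS: readonly string[] = ["EXAMPLE_DAEMON_SECRET"];

function buildChildEnv(
  base: Readonly<Record<string, string | undefined>>,
  overrides: EnvOverrides = {},
): Record<string, string> {
  const env: Record<string, string> = {};
  // Step 1: copy the parent env, dropping scrubbed keys.
  for (const [key, value] of Object.entries(base)) {
    if (value !== undefined && !SCRUBBED_KEYS.includes(key)) env[key] = value;
  }
  // Step 2: apply per-handle overrides on the copy only.
  for (const [key, value] of Object.entries(overrides)) {
    if (SCRUBBED_KEYS.includes(key)) continue; // callers cannot undo the scrub
    if (value === undefined) delete env[key]; // undefined means "remove this var"
    else env[key] = value;
  }
  return env;
}
```

Because the function builds a fresh object per call and never writes to `base`, two handles with different budgets get independent child environments.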

+ 3 new regression tests (incremental refusal post-pre-reservation-removal, runQwenServe-doesn't-mutate-process.env, bridge forwards childEnvOverrides to channelFactory with two concurrent bridges asserting isolation). 327/327 focused tests pass; typecheck + lint clean.

* fixup(serve): address PR 14 review round 7 (#4247 wenshao sixth pass)

Addresses @wenshao's sixth review pass on PR #4247 (glm-5.1 via Qwen Code /review). One critical staleness fix + four real bug fixes + one operator-visibility breadcrumb + one refactor.

Critical:
- R7 #1 line 612: `discoverMcpToolsForServerInternal` now drops the entry from `lastRefusedServerNames` on successful connect+discover. Pre-fix a previously-refused server that reconnects via `/mcp reconnect` (or health-monitor retry after another server frees capacity) left the snapshot reporting `error / disabledReason: 'budget'` for a CONNECTED, working server until the next discovery pass cleared the per-pass log.

Real bugs:
- R7 #2 line 528: disabled gate added to `discoverMcpToolsForServerInternal`. Reachable from `/mcp reconnect`, OAuth re-discovery, and health-monitor `reconnectServer` — none of which previously checked `isMcpServerDisabled`. Pre-fix a disabled server could be resurrected through any of these paths, wasting a budget slot and registering tools the operator told us to ignore. Mirrors the bulk-discovery + readResource patterns. Optional-chain on the call to stay defensive against test fixtures missing the method.
- R7 #3 line 634: transport leak in the `discoverMcpToolsForServerInternal` connect-failure catch. Pre-fix when `connect()` succeeded (transport established) and `discover()` later threw, the catch deleted the client reference without calling `client.disconnect()`, leaking the stdio child / socket until Node exit. Best-effort `await client.disconnect()` added before the map cleanup.
- R7 #4 line 1302: `readResource`'s `weReservedSlot` now uses the same `reservation === 'reserved' && this.reservedSlots.has(serverName)` guard as `discoverMcpToolsForServerInternal`. Distinguishes a real fresh reservation from `off`-mode's no-op return. Maintenance-trap fix; in `off` mode the cleanup branch never fires now.
- R7 #5 line 1342: `readResource` re-checks `isMcpServerDisabled` on EVERY call, regardless of whether the client was just lazy-spawned or pre-existing. Pre-fix a server connected pre-disable and then operator-disabled mid-session via settings reload still served resource reads via its existing CONNECTED client until the next incremental discovery pass called `removeServer`.

Polish:
- R7 #6 line 191: `readBudgetFromEnv` now emits a stderr breadcrumb when env values are invalid (`QWEN_SERVE_MCP_CLIENT_BUDGET=abc`, `QWEN_SERVE_MCP_BUDGET_MODE=foo`). Pre-fix operator typos silently fell through to "no enforcement". Same pattern as the `--require-auth` boot log.
- R7 #7 line 464: extracted `dropRefusalEntry` (4 sites) + `refuseAndLog` (3 sites) helpers. Pure refactor, zero behavior change. The `readResource` refusal path now calls `refuseAndLog` before throwing `BudgetExhaustedError` so operators get the same stderr trail as bulk-discovery refusals.

+ 5 new core regression tests (refusal-cleared-on-success, internal-disabled-gate, discover-throw-disconnects, env-typo-breadcrumb, existing-client-disabled-rejected). 52/52 core tests pass; typecheck + lint clean.

* fixup(serve): address PR 14 review round 8 (#4247 wenshao seventh pass)

Addresses @wenshao's seventh review pass on PR #4247 (gpt-5.5 + DeepSeek/deepseek-v4-pro via Qwen Code /review). One critical transport leak + three soundness/consistency fixes; one optional clarity refactor explicitly deferred.

Critical:
- R8 #1 line 532 (4 duplicate threads): bulk-path transport leak. Mirrors the R7 #3 fix but in `discoverAllMcpTools` instead of the per-server path. Pre-fix: when `connect()` succeeded (transport established) and `discover()` later threw, the bulk catch deleted the client reference without calling `client.disconnect()`, leaking the stdio child / WebSocket / HTTP socket for the rest of the daemon's lifetime (`stop()` can't see what we just removed from `this.clients`). Best-effort `await client.disconnect()` added before `clients.delete` + `reservedSlots.delete`. Updated the doc comment that misleadingly claimed `stop()` was the lifecycle owner — true only for slot bookkeeping, not transports.

Soundness:
- R8 #2 line 431: tighten `readBudgetFromEnv` mode-without-budget downgrade. Originally only `enforce` got downgraded to `off` when no budget was set; `warn` mode without a budget threshold reached `emitBudgetTelemetry` with `clientBudget: undefined`, contradicting the JSDoc invariant `mode !== 'off' ⇒ clientBudget defined`. Now both `enforce` AND `warn` downgrade to `off` when no budget is configured. The invariant comment was also weakened to match the actual `?? 0` defense-in-depth (the new R8 #5 constructor downgrade closes the remaining edge case).

- R8 #5 line 302: constructor mirrors the `readBudgetFromEnv` downgrade for the direct `budgetConfig` parameter. All production callers (CLI, `runQwenServe`, env-var fallback) validate upfront, but a future code path that injects `budgetConfig` directly without re-validating would re-introduce the silent fail-open. Defense in depth.

- R8 #4 line 1221: distinguish fresh vs `'already_held'` reservations in `runWithDiscoveryTimeout`'s timeout handler. New private `freshReservations: Set<string>` field marked when `weReservedSlot === true` inside `discoverMcpToolsForServerInternal` and cleared in finally / catch / success. Timeout handler now releases the slot ONLY when `freshReservations.has(serverName)` — meaning the slot was freshly reserved by THIS in-flight call. `'already_held'` reconnect timeouts (a previously-healthy server's transient hiccup) keep the slot so health-monitor retry doesn't have to compete for capacity with new servers admitted during the timeout window. Aligns the timeout handler with the connect-failure catch's `weReservedSlot` semantics — closes the asymmetry wenshao R8 #4 caught.
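
The mode-without-budget downgrade from R8 #2 and R8 #5 can be sketched as a single normalization step. `normalizeBudgetConfig`, the `BudgetConfig` shape, the breadcrumb wording, and the rule that a valid budget is a positive integer are all assumptions made for illustration.

```typescript
type BudgetMode = "off" | "warn" | "enforce";

interface BudgetConfig {
  mode: BudgetMode;
  clientBudget?: number;
}

function normalizeBudgetConfig(
  input: BudgetConfig,
  log: (line: string) => void = console.error,
): BudgetConfig {
  const budget = input.clientBudget;
  // Assumption: only a positive integer counts as a configured budget.
  const hasBudget = budget !== undefined && Number.isInteger(budget) && budget > 0;
  if (input.mode !== "off" && !hasBudget) {
    // Both 'enforce' and 'warn' downgrade, with a visible breadcrumb.
    log(`mcp budget: mode '${input.mode}' set without a client budget; downgrading to 'off'`);
    return { mode: "off" };
  }
  return hasBudget ? { mode: input.mode, clientBudget: budget } : { mode: "off" };
}
```

After this step the invariant `mode !== 'off' ⇒ clientBudget defined` holds for any result, regardless of whether the input came from the CLI, the environment, or a direct constructor argument.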

Deferred:
- R8 #3 line 332 (`tryReserveSlot` `'observed'` return value clarity): optional, non-blocking style improvement that ripples through 3 call sites + many tests for zero behavior change. Worth doing in a focused refactor PR; flagged as deferred polish, not in this fixup.

+ 3 new core regression tests (bulk discover-throw disconnects, warn-no-budget downgrade, constructor enforce downgrade). 679/679 focused tests pass; typecheck + lint clean.

* fixup(serve): address PR 14 review round 9 (#4247 wenshao eighth pass)

Addresses @wenshao's eighth review pass on PR #4247 (glm-5.1 via Qwen Code /review). Six actionable findings adopted; two threads explained as not-actionable (one stale-view, one reviewer hallucination).

Critical / real bugs:
- R9 #2 line 1534: `readResource` lazy-spawn connect-failure catch now does best-effort `await client.disconnect()` BEFORE `clients.delete` + `reservedSlots.delete`. Mirror of R7 #3 (per-server discovery) and R8 #1 (bulk discovery) — closes the same transport-leak class for the third spawn path. Pre-fix: connect() establishing the transport but throwing on a later handshake step would orphan the stdio child / socket.
- R9 #6 line 1521: `readResource` lazy `client.connect()` now wraps in `Promise.race` against `discoveryTimeoutFor(serverConfig)` — same per-server timeout the bulk + incremental paths use. Pre-fix a hung MCP server during a resource-read spawn would block forever and permanently consume a budget slot under enforce mode, cascading into total budget exhaustion. `serverConfig` lookup hoisted to the top of `readResource` so both lazy-spawn and existing-client branches use identical timeout behavior.
- R9 #8 line 1514: `readResource` lazy spawn now calls `this.startHealthCheck(serverName)` after a successful connect. Pre-fix a lazy-spawned server that later disconnected (crash, network) had no automatic reconnect — sat DISCONNECTED until the next readResource or incremental pass. Mirrors `discoverMcpToolsForServerInternal`'s finally-block pattern.

Operator-visibility:
- R9 #7 (general): `readBudgetFromEnv` now writes a stderr breadcrumb when the `(enforce|warn)`-without-budget downgrade fires. Pre-fix a Docker Compose / k8s env that set `QWEN_SERVE_MCP_BUDGET_MODE=enforce` but forgot the matching `_BUDGET=N` would silently boot with enforcement off and the `mcp_guardrails` capability advertised; the operator's only signal was the snapshot's `budgetMode: 'off'`. Now mirrors the R7 #6 invalid-value breadcrumb pattern.

Doc fixes:
- R9 #4 line 81: `McpBudgetConfig.clientBudget` JSDoc now reflects the R4 per-session scope correction. The doc was a leftover from the original "per-workspace" framing — every other doc surface (protocol doc, user doc, type comments on the snapshot cell, capability tag) was rewritten in R4 except this one.
- R9 #5 line 870: `acpAgent.buildBudgetCells` now spells out the `liveCount` (`accounting.total`, CONNECTED only — operator observability) vs `reservedSlots.size` (all reserved including in-flight — enforcement) semantic distinction. The intentional gap was undocumented in the type signatures, JSDoc, and protocol doc; future PR 14b SSE event payloads should reference both.

Not adopted:
- R9 #1 acpAgent:15: claimed "MCP_BUDGET_WARN_FRACTION not exported + getMcpClient* methods don't exist + 4 tsc errors" — verified incorrect: the constant IS exported (mcp-client-manager.ts:61), the 3 methods ARE class members (lines 379, 407, 412), and `npm run typecheck` is clean across all 4 workspaces. Reviewer's tool hallucinated this critical finding.
- R9 #3 mcp:410: reported the bulk-path transport leak that R8 #1 (commit 7228813c5) had already closed. Reviewer was on the pre-R8 commit view.

+ 2 new core regression tests (readResource lazy connect-fail disconnects + R9 #7 stderr breadcrumb). 57/57 core tests + 679/679 focused suite pass. Typecheck + lint clean.

* fixup(serve): address PR 14 review round 10 (#4247 wenshao ninth pass)

Two non-blocking 🟢 nits — both adopted for symmetry / explicitness.

- R10 line 357: constructor downgrade now emits the same stderr breadcrumb the env-var path got in R9 #7. Pre-R10 the `(enforce|warn)`-without-budget downgrade was silent for the direct-`budgetConfig` path, so a future caller bypassing CLI / env-var validation would have shipped a daemon advertising `mcp_guardrails` while silently disabling enforcement. Now boot logs surface the misconfiguration uniformly across all three resolution paths.
- R10 line 1572: documented the `McpClient.disconnect()` cancel-pending-connect contract that the timeout-race cleanup relies on across all three spawn paths (lazy `readResource`, bulk `discoverAllMcpTools`, per-server `discoverMcpToolsForServerInternal`). The bulk path's production stability since #3889 is implicit evidence the contract holds; comment makes the assumption discoverable to the next reader and notes a follow-up unit test would be valuable. No behavior change.

57/57 core tests pass. Typecheck + lint clean.
2026-05-18 12:07:23 +08:00
ChiGao
d07c958bb5
feat(tui): add daemon adapter spike (#4202)
* docs(tui): draft daemon adapter plan

* feat(tui): add daemon adapter spike

* fix(tui): harden daemon adapter event handling

* fix(tui): report daemon prompt failures

* fix(tui): surface daemon terminal failures

* fix(tui): harden daemon adapter state handling

* fix(tui): harden daemon adapter lifecycle

* fix(tui): harden daemon adapter follow-ups

---------

Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com>
2026-05-18 11:22:39 +08:00
jinye
f44ed09412
feat(serve): preflight and env diagnostics routes (#4175 Wave 3 PR 13) (#4251)
* feat(serve): introduce ServeErrorKind and BridgeTimeoutError (#4175 Wave 3 PR 13)

Lay the type foundation for `/workspace/preflight` and `/workspace/env` (and
the eventual MCP guardrails route) so cells emitted by all three share a
closed `errorKind` taxonomy:

- `SERVE_ERROR_KINDS` literal-list + `ServeErrorKind` union — the seven
  values from #4175 (`missing_binary`, `blocked_egress`, `auth_env_error`,
  `init_timeout`, `protocol_error`, `missing_file`, `parse_error`).
- `BridgeTimeoutError` typed class — `withTimeout` now rejects with this
  rather than a plain `Error`, letting `mapDomainErrorToErrorKind` recognize
  init / heartbeat / extMethod timeouts via `instanceof` instead of
  regex-matching message strings. Message format is preserved bit-for-bit.
- `mapDomainErrorToErrorKind` helper — one place to classify
  `BridgeTimeoutError`, `SkillError`, fs ENOENT/EACCES/EPERM, ModelConfigError
  subclasses (recognized by `name` field — they aren't on the public surface
  of `@qwen-code/qwen-code-core`), `SyntaxError`, plus message-regex fallbacks
  for legacy throw sites (`agent channel closed`, missing CLI entry path).
- `ServeStatusCell.errorKind` tightened from open `string` to the closed
  `ServeErrorKind` union. Backward compatible — PR 12 never assigned the
  field.
- SDK mirrors: `DAEMON_ERROR_KINDS` const + `DaemonErrorKind` type;
  `DaemonStatusCell.errorKind` tightened.
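
The literal-list plus union pattern and the `instanceof`-based classification can be sketched as below. The seven kind names come from this message; `classifyError` is a hypothetical stand-in for `mapDomainErrorToErrorKind`, and the specific targets chosen for timeouts (`init_timeout`), `SyntaxError` (`parse_error`), and the channel-closed message (`protocol_error`) are assumptions. The `SkillError` and ModelConfigError rules are omitted.

```typescript
const SERVE_ERROR_KINDS = [
  "missing_binary",
  "blocked_egress",
  "auth_env_error",
  "init_timeout",
  "protocol_error",
  "missing_file",
  "parse_error",
] as const;

// Closed union derived from the list, so the two cannot drift apart.
type ServeErrorKind = (typeof SERVE_ERROR_KINDS)[number];

class BridgeTimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "BridgeTimeoutError";
  }
}

// Hypothetical classifier; returns undefined when no rule matches.
function classifyError(error: unknown): ServeErrorKind | undefined {
  // Typed class check instead of regex-matching the message.
  if (error instanceof BridgeTimeoutError) return "init_timeout";
  if (error instanceof SyntaxError) return "parse_error";
  if (error instanceof Error) {
    const code = (error as Error & { code?: unknown }).code;
    if (code === "ENOENT" || code === "EACCES" || code === "EPERM") return "missing_file";
    // Message-regex fallback for legacy throw sites.
    if (/agent channel closed/i.test(error.message)) return "protocol_error";
  }
  return undefined;
}
```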

Tests: 11 new unit tests in `status.test.ts` covering each mapping rule plus
the BridgeTimeoutError shape.

No route changes; no behavior changes for any existing path.

* feat(serve): add buildEnvStatusFromProcess helper (#4175 Wave 3 PR 13)

Pure helper that constructs the `/workspace/env` payload from `process.*`
state. No I/O, no ACP roundtrip, no globals beyond `process.env`. The route
itself lands in the next commit.

- `ServeEnvKind` discriminant: `runtime | platform | sandbox | proxy | env_var`
- `ServeEnvCell extends ServeStatusCell` with `name` + optional `present` /
  `value`. Cells with `kind: 'env_var'` are presence-only — `value` is
  ALWAYS omitted to keep secret env vars off the wire even by accident.
- `ServeWorkspaceEnvStatus` envelope: `{ v, workspaceCwd, initialized: true,
  acpChannelLive, cells, errors? }`. `initialized` is structurally `true`
  because env answers from the daemon process directly; `acpChannelLive`
  reports whether a child is up but does not change the payload shape.

Whitelist policy:
- Auth/secret keys (presence-only): OPENAI/ANTHROPIC/GEMINI/GOOGLE/DASHSCOPE/
  OPENROUTER `_API_KEY`, `QWEN_SERVER_TOKEN`.
- Non-secret keys (also presence-only for shape uniformity): base URLs, locale,
  TZ, NODE_EXTRA_CA_CERTS, QWEN_CLI_ENTRY.
- Proxy vars (`HTTP_PROXY`/`HTTPS_PROXY`/`NO_PROXY`/`ALL_PROXY` + lowercase
  variants): credentials stripped via `redactProxyCredentials`, then
  `URL().host` so the wire only carries `host:port`. NO_PROXY is a host list
  rather than a URL so we pass the redacted form verbatim.
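
The `host:port` reduction for proxy URLs can be illustrated with the standard `URL` class. `proxyHostOnly` is a hypothetical name, and returning `undefined` for unparseable values is an assumption; the sketch covers only the `URL().host` step, not `redactProxyCredentials` or the `NO_PROXY` pass-through.

```typescript
// Reduce a proxy URL to host[:port]; userinfo, path, and query never
// appear in URL.host, so credentials cannot leak through this value.
function proxyHostOnly(raw: string): string | undefined {
  try {
    const host = new URL(raw).host;
    return host === "" ? undefined : host;
  } catch {
    // Not a parseable absolute URL.
    return undefined;
  }
}

console.log(proxyHostOnly("http://user:secret@proxy.example.com:8080/"));
// proxy.example.com:8080
```

Note that `URL.host` omits the port when it is the default for the scheme, so `http://proxy.example.com:80` reduces to `proxy.example.com`.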

SDK mirrors: `DaemonEnvKind`, `DaemonEnvCell`, `DaemonWorkspaceEnvStatus`.

Tests: 9 unit tests covering the proxy-credential redaction, lowercase env
fallback, NO_PROXY pass-through, presence-only `env_var` invariant
(`'value' in cell === false`), whitelist enforcement, runtime tag detection,
and envelope shape.

* feat(serve): add GET /workspace/env route (#4175 Wave 3 PR 13)

Wire `buildEnvStatusFromProcess` from the previous commit through the
bridge, server, and SDK so remote clients can pre-flight the daemon's
runtime environment without spawning an ACP child.

- `workspace_env` capability tag (always advertised on a current daemon).
- `bridge.getWorkspaceEnvStatus()` answers entirely from `process.*` —
  the route never consults ACP. `acpChannelLive` reflects whether a child
  exists but does not change the payload, so an idle daemon and a busy
  one return the same env shape.
- `app.get('/workspace/env', ...)` mirrors PR 12's one-liner pattern.
- SDK: `DaemonClient.workspaceEnv()` returning `DaemonWorkspaceEnvStatus`.
- Docs: bullet in `docs/users/qwen-serve.md` calling out the
  presence-only redaction policy and the no-ACP-spawn guarantee.

Tests: server-level (env returned + `'value' in env_var === false`
assertion), bridge-level (idle and live both answer locally without
hitting ACP extMethod), SDK-level (recording-fetch round-trip on
`/workspace/env`). The `workspace_env` tag is added to the
`EXPECTED_STAGE1_FEATURES` capability list assertion.

* feat(serve): add /workspace/preflight daemon-cells path (#4175 Wave 3 PR 13)

Wire the preflight route. Daemon-level cells are populated unconditionally
from `process.*` and `node:fs`; ACP-level cells fall back to `not_started`
placeholders when no child is alive so a poll never spawns one.

- `workspace_preflight` capability tag.
- `ServePreflightKind` discriminant (12 values: node_version, cli_entry,
  workspace_dir, ripgrep, git, npm — daemon-level — plus auth, mcp_discovery,
  skills, providers, tool_registry, egress — ACP-level).
- `ServePreflightCell extends ServeStatusCell` with `locality: 'daemon' | 'acp'`
  + free-form `detail`. `ServeWorkspacePreflightStatus` envelope.
- `createIdleAcpPreflightCells()` factory: emits the six ACP-level cells with
  `status: 'not_started'` + a uniform `hint` so the bridge can stitch them in
  alongside daemon cells without ever calling ACP.
- `bridge.getWorkspacePreflightStatus()`:
  - Daemon cells via `buildDaemonPreflightCells` (Promise.all over Node-version,
    CLI-entry resolution mirroring `defaultSpawnChannelFactory`, `fs.stat` on
    `boundWorkspace` with ENOENT/EACCES/EPERM mapped to `missing_file`,
    best-effort `canUseRipgrep` / `getGitVersion` / `getNpmVersion` warnings).
  - ACP cells via `requestWorkspaceStatus` — idle factory returns the
    `not_started` placeholders; live path delegates to ACP via the
    `qwen/status/workspace/preflight` ext method (handler lands in next
    commit). Bridge-side timeout / channel-close while consulting ACP folds
    into envelope `errors[]` with `mapDomainErrorToErrorKind` classification;
    daemon cells still render.
- `app.get('/workspace/preflight', ...)` route + JSDoc bullet.
- SDK: `DaemonPreflightKind` / `DaemonPreflightCell` / `DaemonWorkspacePreflightStatus`
  mirrors; `DaemonClient.workspacePreflight()`.
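
A minimal sketch of the idle-placeholder factory described above. The six ACP-level kind names come from this message; the cell field names (`kind`, `locality`, `status`, `hint`), the function name `createIdleAcpCells`, and the hint wording are assumptions.

```typescript
const ACP_PREFLIGHT_KINDS = [
  "auth",
  "mcp_discovery",
  "skills",
  "providers",
  "tool_registry",
  "egress",
] as const;

type AcpPreflightKind = (typeof ACP_PREFLIGHT_KINDS)[number];

interface IdlePreflightCell {
  kind: AcpPreflightKind;
  locality: "acp";
  status: "not_started";
  hint: string;
}

// Builds placeholders without touching ACP, so polling never spawns a child.
function createIdleAcpCells(
  hint = "ACP child not started; cell will populate once a session is live",
): IdlePreflightCell[] {
  return ACP_PREFLIGHT_KINDS.map(
    (kind): IdlePreflightCell => ({ kind, locality: "acp", status: "not_started", hint }),
  );
}
```

The bridge can then concatenate these placeholders with the daemon-level cells to return a full-shaped payload for an idle daemon.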

Tests: server-level (route returns the bridge payload), bridge-level (idle
returns 6 daemon + 6 ACP `not_started` cells without spawning a channel),
SDK-level (`workspacePreflight()` round-trip). Capability test updated.

* feat(serve): wire ACP-side preflight cells (#4175 Wave 3 PR 13)

Populate the six ACP-level preflight cells inside the ACP child so
`/workspace/preflight` returns real values for live sessions.

- `extMethod(qwen/status/workspace/preflight, ...)` dispatches to a new
  `buildAcpPreflightCells(config)` private method.
- Five cell builders, each returning a `ServePreflightCell` with
  `locality: 'acp'`:
  - `auth`: `validateAuthMethod(authType, config)` returning non-null
    string → `auth_env_error`. Missing auth method → warning. Throws
    classified via `mapDomainErrorToErrorKind` with `auth_env_error`
    fallback.
  - `mcp_discovery`: rolls up `getMCPDiscoveryState()` + per-server
    `getMCPServerStatus(name)` counts. `connecting > 0` or in-progress
    discovery → warning + `init_timeout`; `disconnected > 0` post-discovery
    → error + `protocol_error`.
  - `skills`: `SkillManager.listSkills()`; SkillError throws are mapped
    via the helper (`PARSE_ERROR` → `parse_error`, `FILE_ERROR` →
    `missing_file`).
  - `providers`: `getAllConfiguredModels()`; empty list with a configured
    `authType` → warning + `auth_env_error`. ModelConfigError throws map
    to `auth_env_error`.
  - `tool_registry`: null registry → error + `protocol_error`. Otherwise
    surfaces tool count.
- `egress`: stays `not_started`. PR 14 plugs in the real probe.
- `errorCell` private helper extended with optional `errorKind` parameter;
  defaults to `mapDomainErrorToErrorKind(error)` so existing call sites
  (`mcp` / `skills` / `providers` envelope errors) automatically gain
  classification.

Tests: 2 new acpAgent tests — preflight returns the six expected ACP cells
with correct locality + statuses; preflight surfaces a `SkillError`
(`PARSE_ERROR`) on the `skills` cell as `errorKind: 'parse_error'`. The
core `vi.mock` block adds a SkillError class for `instanceof` matching
inside `mapDomainErrorToErrorKind`.

* docs(serve): preflight and env protocol section (#4175 Wave 3 PR 13)

Document `/workspace/env` and `/workspace/preflight` end-to-end:

- Common-cell shape: tighten `errorKind` from open `string` to the closed
  `DaemonErrorKind` enum (seven literals from #4175). Add an explicit
  redaction-policy paragraph covering env-var presence-only, proxy
  host:port reduction, and the whitelisted-secrets list.
- Capability-tag list: add `workspace_env` and `workspace_preflight`.
- New `### GET /workspace/env` section with sample payload, `DaemonEnvKind`
  / `DaemonEnvCell` types, and the redaction-policy paragraph spelling
  out which secret env vars are enumerated and how proxy URLs are
  reduced to `host:port`.
- New `### GET /workspace/preflight` section with idle sample payload,
  `DaemonPreflightKind` / `DaemonPreflightCell` types, the seven-value
  `errorKind` semantics table, and the bridge-error fallback contract
  (mid-request ACP channel close → cells drop to `not_started` + envelope
  carries one `errors[]` entry).
- Source-layout table: extend the `status.ts` row to mention the new
  `ServeErrorKind` / `BridgeTimeoutError` / `mapDomainErrorToErrorKind`
  surface; add a new `envSnapshot.ts` row.
2026-05-18 07:29:05 +08:00
qqqys
f84ddd434b
feat(core): fail impossible goals (#4230)
* feat(core): fail impossible goals

* fix(core): refine impossible goal judgement

* fix(core): include goal feedback when continuing

* fix(core): clarify impossible goal terminal state

* fix(core): harden impossible goal feedback

* fix(core): log suppressed impossible verdicts

* fix(goal): address review suggestions

* test(goal): cover impossible parsing suggestions
2026-05-18 00:31:51 +08:00
Shaojin Wen
c93d66cd23
fix(serve): align build and integration test coverage (#4248)
* fix(serve): align test coverage with build inputs

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* test(serve): address review feedback

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-18 00:01:47 +08:00
jinye
60fe594e8f
feat(serve): add read-only status routes (#4241)
* feat(serve): add read-only status routes

Add read-only daemon status endpoints for workspace MCP, skills, providers, session context, and session supported commands.

Expose matching typed SDK helpers and document the new additive v1 status surface.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(serve): harden read-only status snapshots

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(serve): address read-only status review feedback

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-17 21:37:20 +08:00
jinye
aef35c390e
feat(serve): session metadata and close/delete lifecycle (#4175 Wave 2.5 PR 11) (#4240)
* feat(serve): session metadata and close/delete lifecycle (#4175 Wave 2.5 PR 11)

Add explicit session close and metadata management to the daemon serve
infrastructure, closing the Stage 1 limitation that sessions could only
end via child crash or daemon shutdown.

- DELETE /session/:id — force-closes a live session (cancels active
  prompt, resolves pending permissions, publishes session_closed event)
- PATCH /session/:id/metadata — update mutable displayName
- Enriched GET /workspace/:id/sessions with createdAt, displayName,
  clientCount, hasActivePrompt
- session_closed + session_metadata_updated SDK event types with
  validation, reducer, and terminal event priority
- DaemonClient.closeSession / updateSessionMetadata + session client
  wrappers
- Capabilities: session_close, session_metadata

* fix(serve): address review feedback on session lifecycle PR

- Fix JSDoc on closeSession: clarify that bridge throws SessionNotFoundError
  (SDK absorbs 404 for client-side idempotency)
- Tighten event validators: isSessionClosedData checks closedBy type,
  isSessionMetadataUpdatedData checks displayName type
- PATCH /session/:id/metadata now returns effective stored metadata
  instead of echoing request fields, avoiding ambiguous no-op responses
- Only publish session_metadata_updated event when displayName changes
- Update chooseTerminalEvent comment to reflect session_closed
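
The two PATCH fixes above (return the effective stored metadata, publish only on a real change) can be sketched as follows. This is a minimal illustration, not the bridge's code: `SessionMetadataSketch`, `applyMetadataPatchSketch`, and the `publish` callback are hypothetical names, and only the `session_metadata_updated` event name comes from the commit message.

```typescript
// Hypothetical shapes; the real bridge types are not shown in this log.
interface SessionMetadataSketch {
  displayName?: string;
}

type PublishFn = (event: { type: string; data: SessionMetadataSketch }) => void;

// Applies a PATCH body to the stored metadata and returns the effective
// stored state, so a no-op request still gets an unambiguous response.
function applyMetadataPatchSketch(
  stored: SessionMetadataSketch,
  patch: { displayName?: string },
  publish: PublishFn,
): SessionMetadataSketch {
  const requested = patch.displayName;
  if (requested !== undefined && requested !== stored.displayName) {
    stored.displayName = requested;
    // Publish only when displayName actually changed.
    publish({ type: 'session_metadata_updated', data: { ...stored } });
  }
  // Return a copy of what is stored, not an echo of the request fields.
  return { ...stored };
}
```

Returning a copy rather than the request body means a client that sends an empty patch still learns the current `displayName`.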

* fix: address PR 4240 review feedback

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix: address remaining PR 4240 suggestions

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix: update serve sessions test mock

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-17 20:42:15 +08:00
JerryLee
9abd704e09
fix(cli): record mid-turn queued user prompts (#4215)
2026-05-17 20:21:06 +08:00
jinye
4e06967c2b
feat(serve): mutation gating helper and --require-auth (#4236)
* feat(serve): mutation gating helper and --require-auth

Implements issue #4175 Wave 4 PR 15. Adds the centralized
state-changing-route gate that Wave 4 follow-ups (memory CRUD, file
edit, MCP restart, device-flow auth) will reuse, plus the
`--require-auth` deployment knob that hardens the loopback developer
default for shared dev hosts / CI runners.

- `createMutationGate({ tokenConfigured, requireAuth })` factory in
  serve/auth.ts — per-route middleware with a 4-cell behavior matrix:
  pass-through under `requireAuth` or any token configured;
  `401 token_required` for `strict: true` routes on no-token loopback
  defaults; baseline pass-through otherwise.
- Existing Wave 1-2 mutation routes (POST /session, /session/:id/{load,
  resume,prompt,cancel,model}, /permission/:requestId) opt into the
  default non-strict factory call as the centralization marker. Wave 4
  routes will pass `{ strict: true }` to require a token even on
  loopback.
- `--require-auth` CLI flag + `ServeOptions.requireAuth`. Boot refuses
  without a token; closes the `/health` exemption when on so loopback
  `/health` also requires bearer auth; stderr breadcrumb so the
  hardened mode is visible in journald/docker logs.
- Conditional `require_auth` capability tag advertised only when the
  flag is on. New `CONDITIONAL_SERVE_FEATURES` registry primitive so
  future per-deployment toggles follow the same shape.
- 5 new unit tests in auth.test.ts covering the gate matrix; 5 added
  in server.test.ts for capability advertisement, conditional tag,
  /health 401 under --require-auth, and runQwenServe boot
  refusal + happy path. 245/245 serve tests pass; typecheck + eslint
  clean.
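
The gate's behavior matrix described above can be sketched as a plain decision function. This is a framework-free illustration under stated assumptions: the real `createMutationGate` is per-route Express middleware in serve/auth.ts, while `createMutationGateSketch`, `GateDecision`, and the error text here are hypothetical.

```typescript
// Hypothetical, framework-free sketch; the real gate is Express middleware.
interface MutationGateConfig {
  tokenConfigured: boolean;
  requireAuth: boolean;
}

type GateDecision =
  | { allow: true }
  | { allow: false; status: 401; body: { code: 'token_required'; error: string } };

type Gate = () => GateDecision;

function createMutationGateSketch(config: MutationGateConfig) {
  // When a token or --require-auth is in play, bearer auth already guards
  // the route, so the gate has nothing to add.
  const authEnforced = config.requireAuth || config.tokenConfigured;
  const passthrough: Gate = () => ({ allow: true });
  const strictDenier: Gate = () => ({
    allow: false,
    status: 401,
    body: { code: 'token_required', error: 'Configure a token to use this route.' },
  });
  // Only strict routes on a no-token loopback default are refused.
  return (opts: { strict?: boolean } = {}): Gate =>
    !authEnforced && opts.strict === true ? strictDenier : passthrough;
}
```

Both gates are built once per factory call, so mounting the gate on many routes allocates no extra closures.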

Refs: #4175

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4236 review feedback

Three small follow-ups from the automated reviewers on PR #4236:

1. **Drop misleading `--require-auth` from `token_required` error
   message** (Copilot inline auth.ts:262). The strict-mode 401
   listed three remediations but `--require-auth` is paired-required
   with a token at boot — naming it standalone would loop the operator
   into a different boot error. Keep the two valid standalone fixes
   (env var, --token); add inline note explaining the omission.
   `auth.test.ts` regex updated to `not.toMatch(/--require-auth/)`
   to anchor the new wording.

2. **Mention `/health` gating in `--require-auth` CLI description**
   (auto-reviewer Medium #2). Operators flipping the flag without
   reading the protocol doc would get paged when k8s/Compose probes
   start 401-ing. One sentence in the yargs description prevents that.

3. **Drift insurance comment between registry and
   `CONDITIONAL_SERVE_FEATURES`** (auto-reviewer Low #3). Document
   the four-step procedure for adding a new conditional tag so a
   future contributor doesn't update only the registry and silently
   advertise the tag unconditionally. Notes the Map<predicate>
   refactor as the right move when a second tag lands.

Deferred (not in this fix-up):
- Module-level PASSTHROUGH singleton (High #1) — micro-optimization,
  unmeasurable.
- Map<feature, predicate> for conditional features (High #2) —
  premature abstraction with one tag.
- Per-route `// non-strict marker` comments (Medium #1) — noise.
- `@see` cross-ref in types.ts (Low #2) — sugar.
- JSDoc bullet-list vs table (Low #1) — current format is fine.

Refs: #4175 #4236

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4236 round-2 review feedback

Five small follow-ups from @wenshao + DeepSeek (via Qwen Code /review)
on PR #4236:

1. **Map<predicate> refactor for `CONDITIONAL_SERVE_FEATURES`**
   (review threads #3254467192 + #3254485912). Two reviewers asked
   for the same shape on the grounds that the `Set` + per-feature
   `if`-branch needed FOUR coordinated changes per new conditional
   tag and silently fail-CLOSED when the branch was missed. The Map
   collapses the predicate-decision and the set-membership into one
   entry per feature — adding a new conditional tag is now two
   coordinated changes (registry + Map entry) and a missing predicate
   is a TypeScript error rather than a silent omission. JSDoc
   updated.

2. **Drift-insurance test that iterates `CONDITIONAL_SERVE_FEATURES`**
   (review thread #3254467192 option 1, layered on top of #1).
   `server.test.ts` now walks every Map entry and asserts the
   predicate accepts/rejects as expected; future entries that don't
   add an assertion branch fail the test loudly so a missing
   predicate cannot ship silently. Adoption-of-record for the Map
   shape rather than relying on a hand-maintained invariant.

3. **Cache `strictDenier` for allocation symmetry** (review thread
   #3254467193). Wave 4 PRs will mount strict mode on multiple
   routes; without the cache each `mutate({strict:true})` call would
   allocate a fresh 401 closure. Now both the passthrough and the
   strict denier are pre-built singletons. Identity assertion in
   `auth.test.ts` anchors the cache so a future change that loses it
   surfaces in CI.

4. **Doc cosmetic — extra blank line in qwen-serve.md** (review
   thread #3254467198). Single blank line between the `>` quoted
   example and the following non-quoted bash block now.

5. **Doc correctness — `require_auth` is post-auth confirmation**
   (review thread #3254485910 from DeepSeek). When `--require-auth`
   is on, the global `bearerAuth` middleware gates every route
   including `/capabilities`, so an unauthenticated client cannot
   pre-flight `caps.features` to discover that auth is required —
   the discovery surface is the 401 response body itself. Both
   `qwen-serve.md` and `qwen-serve-protocol.md` rewritten to
   describe the tag as a post-authentication confirmation, matching
   the auth.ts JSDoc which already stated this correctly.
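
Item 1's Map shape can be illustrated roughly as below. The names `ServeOptionsSketch`, `CONDITIONAL_FEATURES_SKETCH`, and `advertisedFeaturesSketch` are hypothetical stand-ins, and the registry is reduced to a plain string array; only the `require_auth` tag and the one-predicate-per-feature idea come from the commit message.

```typescript
// Hypothetical reduction of the registry + conditional-feature Map.
interface ServeOptionsSketch {
  requireAuth: boolean;
}

type FeaturePredicate = (opts: ServeOptionsSketch) => boolean;

// One entry per conditional tag: membership and predicate live together,
// so a tag cannot be listed as conditional without a decision function.
const CONDITIONAL_FEATURES_SKETCH = new Map<string, FeaturePredicate>([
  ['require_auth', (opts) => opts.requireAuth],
]);

function advertisedFeaturesSketch(
  registry: readonly string[],
  opts: ServeOptionsSketch,
): string[] {
  return registry.filter((tag) => {
    const predicate = CONDITIONAL_FEATURES_SKETCH.get(tag);
    // Tags without a predicate are unconditional.
    return predicate === undefined || predicate(opts);
  });
}
```

A drift test in the spirit of item 2 would iterate the Map's entries and assert each predicate both accepts and rejects for some options value.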

Trade-offs documented (no code change):

- **Body-parser ordering** (review thread #3254485915 from DeepSeek)
  noted as a comment block in `auth.ts`. Strict-mode 401 fires AFTER
  `express.json()` because the gate is per-route middleware. On
  loopback no-token defaults a strict route therefore parses the
  request body before refusing it — bounded by
  `express.json({limit: '10mb'})` × `--max-connections` (256
  default). Strict routes Wave 4 actually adds carry small bodies in
  legitimate use, so this isn't a production hot path. Future routes
  accepting large bodies should lift the gate to app-level (maintain
  a strict-path Set in `createServeApp`); flagged as a Wave 4
  follow-up rather than re-architecting the helper.

- **`bearerAuth` body-shape inconsistency** (review thread
  #3254467197 from @wenshao) flagged as a Wave 4 cross-PR
  follow-up. `bearerAuth` returns `{error: 'Unauthorized'}` while
  the strict gate returns `{code: 'token_required', error: '...'}`;
  SDK clients have to branch on both shapes. Standardizing
  `bearerAuth` to also carry a `code` field is orthogonal to this
  PR's scope.

Validation: 260/260 cli serve tests pass (was 258 — added the drift
insurance test + strict denier identity test); typecheck + eslint
clean.

Refs: #4175 #4236

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-17 20:10:37 +08:00
易良
eef06ce376
feat(cli): add structured memory diagnostics JSON (#3785)
* feat(cli): add memory diagnostics doctor command

* fix(core): platform-aware maxRSS conversion and accurate risk message

- Extract platform detection before building diagnostics so the correct
  unit conversion can be applied: multiply by 1024 on Linux (where
  process.resourceUsage().maxRSS is in KB) but leave the value unchanged
  on macOS/Windows (where it is already in bytes).
- Correct the native-memory-pressure risk message to accurately state
  that the threshold is 2× heap used, not just "larger than heapUsed".
- Add a dedicated test to assert that maxRSS is not multiplied on a
  non-Linux platform (darwin).

All 3 core and 9 CLI tests pass; typecheck clean.

Agent-Logs-Url: https://github.com/QwenLM/qwen-code/sessions/9b413337-68ed-4d5c-af99-0d42378900c3

* test(core): cover active request memory risk

* fix(cli): address memory diagnostics review feedback

* fix(cli): harden memory diagnostics review fixes

* fix(memory-diagnostics): tighten risk thresholds and expand readable output

- Add 64MB absolute floor on native-memory-pressure so cold processes don't trip
  the 2x ratio check; raise active-handles threshold from 100 to 256
- Show detachedContexts, nativeContexts, maxRSS, CPU times, smapsRollup
  availability, and v8HeapSpaces summary in the readable /doctor memory output
- Validate unknown memory subcommand args with a usage hint instead of silently
  dropping them
- Wrap human-readable strings in t(...) for i18n parity with the rest of doctor
- Advertise the memory subcommand via /doctor argumentHint while keeping
  acceptsInput false so the parent still auto-submits
- Document _getActiveHandles/_getActiveRequests as undocumented Node internals
- Update tests for new thresholds, expanded output, unknown-arg path, and
  abort-during-json
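
One way to read the tightened native-memory-pressure rule is sketched here. The commit only states the 2x heap-used ratio and the 64MB floor; treating "native memory" as `rss - heapUsed` is an assumption made for illustration, and `hasNativeMemoryPressureSketch` is a hypothetical name.

```typescript
// Assumption: native memory is approximated as rss minus heapUsed.
const NATIVE_PRESSURE_FLOOR_BYTES = 64 * 1024 * 1024;

function hasNativeMemoryPressureSketch(
  rssBytes: number,
  heapUsedBytes: number,
): boolean {
  const nativeBytes = Math.max(0, rssBytes - heapUsedBytes);
  // The absolute floor keeps small, cold processes from tripping the ratio.
  if (nativeBytes < NATIVE_PRESSURE_FLOOR_BYTES) return false;
  return nativeBytes > 2 * heapUsedBytes;
}
```

With this reading, a process using 10 MB of heap and 40 MB of RSS exceeds the ratio but stays under the floor, so no risk is reported.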

* fix(cli): harden memory doctor diagnostics

* fix(core): correct maxRSS byte handling and heapRatio consistency

- Remove incorrect * 1024 multiplier for maxRSS on Linux (Node.js >=14.10 returns bytes on all platforms)
- Use v8HeapStats.usedHeapSize for heapRatio to avoid cross-API inconsistency
- Update test expectations and rename "does not multiply" test

* fix(cli): resolve rebase conflicts in memory diagnostics

- Rename local formatMemoryDiagnostics to formatCoreDiagnostics to avoid
  naming conflict with the imported utility from memoryDiagnostics.js
- Update Session.test.ts to use objectContaining for _meta field added
  in recent main commits
- Align doctorCommand.test.ts assertions with current parent command
  state (argumentHint includes --sample/--snapshot from main)

* fix(core): use null instead of undefined for optional probes, deduplicate active count helpers

- optionalProbe/optionalSyncProbe now return null on failure so
  JSON.stringify preserves the keys instead of silently omitting them.
- Merge getActiveHandlesCount/getActiveRequestsCount into a single
  parameterized getProcessInternalCount helper.
- Update MemoryDiagnostics interface: v8HeapSpaces, openFileDescriptors,
  smapsRollup are now T | null instead of T | undefined.
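
The reason for `null` over `undefined` is that `JSON.stringify` drops object keys whose value is `undefined` but keeps keys whose value is `null`. A minimal sketch, with `optionalSyncProbeSketch` as a hypothetical stand-in for the real helper:

```typescript
// Hypothetical stand-in for optionalSyncProbe: failure becomes null.
function optionalSyncProbeSketch<T>(probe: () => T): T | null {
  try {
    return probe();
  } catch {
    return null;
  }
}

const withNull = JSON.stringify({
  openFileDescriptors: optionalSyncProbeSketch<number>(() => {
    throw new Error('probe unsupported on this platform');
  }),
  heapUsed: optionalSyncProbeSketch(() => 123),
});
// withNull is '{"openFileDescriptors":null,"heapUsed":123}'

const withUndefined = JSON.stringify({
  openFileDescriptors: undefined,
  heapUsed: 123,
});
// withUndefined is '{"heapUsed":123}': the failed probe's key is gone
```

Consumers of the diagnostics JSON can then tell "probe failed" apart from "field not part of this schema version".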

* fix(cli): finish memory diagnostics review fixes

* fix(cli): address memory diagnostics review feedback

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-05-17 19:52:46 +08:00
Yan Shen
9985d91e08
feat(cli): add configurable plansDirectory for Plan Mode (#4062)
* feat(cli): add configurable plansDirectory for Plan Mode

Add a plansDirectory setting that allows users to define a custom
directory for approved Plan Mode files. Relative paths are resolved
against the project root and validated to prevent path traversal.

- Storage: add isPathWithinDirectory() with realpathSync-based symlink
  resolution to prevent traversal bypass attacks (direct, intermediate,
  and cross-drive)
- Config: cache plansDir at construction time, use atomic write
  (write-temp then rename) to prevent corrupted plan files on crash
- CLI: respect bareMode by clearing plansDirectory in minimal mode
- Docs: document plansDirectory with requiresRestart and gitignore hint
- Tests: 26 new tests covering path validation, symlink attacks
  (direct and intermediate), Windows cross-drive paths, mixed
  separators, and configuration integration
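
A containment check along the lines of the Storage bullet might look like the sketch below. It is not the repository's `isPathWithinDirectory`: the helper names are hypothetical, and resolving the nearest existing ancestor for not-yet-created files is an assumption about how such a check would handle new plan files.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Resolve symlinks on the deepest existing ancestor, then re-append the
// segments that do not exist yet (e.g. a plan file about to be written).
function resolveThroughSymlinksSketch(target: string): string {
  let current = path.resolve(target);
  const tail: string[] = [];
  while (true) {
    try {
      return path.join(fs.realpathSync(current), ...tail);
    } catch {
      const parent = path.dirname(current);
      if (parent === current) return path.join(current, ...tail);
      tail.unshift(path.basename(current));
      current = parent;
    }
  }
}

function isPathWithinDirectorySketch(candidate: string, directory: string): boolean {
  const root = resolveThroughSymlinksSketch(directory);
  const resolved = resolveThroughSymlinksSketch(candidate);
  const rel = path.relative(root, resolved);
  if (rel === '') return true;
  // An absolute result means a different drive on Windows.
  if (path.isAbsolute(rel)) return false;
  return rel !== '..' && !rel.startsWith('..' + path.sep);
}

// Usage against a throwaway project root.
const projectRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'plans-sketch-'));
const insideRoot = isPathWithinDirectorySketch(
  path.join(projectRoot, 'plans', 'a.md'),
  projectRoot,
);
const outsideRoot = isPathWithinDirectorySketch(
  path.join(projectRoot, '..', 'escape.md'),
  projectRoot,
);
```

Comparing resolved paths rather than raw strings is what defeats a symlinked intermediate directory that points outside the root.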

Closes #3548

* fix(core): align symlink test with return value

* fix(core): harden plans directory handling

* fix(config): address PR #4062 review findings for plansDirectory

- Handle EXDEV during atomic plan writes (cross-device rename fallback)

- Sanitize session IDs to prevent path traversal in plan filenames

- Expand tilde (~) in configured plansDirectory paths

- Preserve plansDirectory in bare mode

- Add EACCES/EPERM handling to getPlanFileNames with user-visible warnings

- Close TOCTOU gap with post-write path containment validation

- Fix docs to clarify plansDirectory is a top-level key

- Add happy-path I/O tests for configured plansDirectory
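
The EXDEV item can be sketched as a write-temp-then-rename helper with a copy fallback. This is a hedged illustration: `writeFileAtomicSketch` and the temp-file naming are hypothetical, and the fallback branch is no longer atomic, which is the trade-off such a fallback accepts.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

function writeFileAtomicSketch(finalPath: string, contents: string): void {
  // Hypothetical temp naming; any unique sibling name would do.
  const tempPath = `${finalPath}.${process.pid}.${Date.now()}.tmp`;
  fs.writeFileSync(tempPath, contents, 'utf8');
  try {
    // rename replaces the destination in one step on the same filesystem.
    fs.renameSync(tempPath, finalPath);
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === 'EXDEV') {
      // Cross-device rename is not possible: copy, then drop the temp file.
      fs.copyFileSync(tempPath, finalPath);
      fs.unlinkSync(tempPath);
      return;
    }
    fs.rmSync(tempPath, { force: true });
    throw err;
  }
}

// Usage against a throwaway directory.
const planDir = fs.mkdtempSync(path.join(os.tmpdir(), 'plan-write-sketch-'));
const planFile = path.join(planDir, 'plan.md');
writeFileAtomicSketch(planFile, '# first draft\n');
writeFileAtomicSketch(planFile, '# approved plan\n');
```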
2026-05-17 19:43:24 +08:00
jinye
d2d426fad0
feat(serve): SSE replay sizing + slow_client_warning backpressure (#4175 Wave 2.5 PR 10) (#4237)
* feat(serve): SSE replay sizing + slow_client_warning backpressure

#4175 Wave 2.5 PR 10. Closes the SSE replay / backpressure knobs
called out in #3803 §02 so chatty Stage 1 sessions get an honest
reconnect window and operators get a heads-up signal before clients
are summarily evicted.

- **`DEFAULT_RING_SIZE` 4000 → 8000.** Per-session replay ring depth
  now matches the #3803 §02 target for chatty sessions.
- **`--event-ring-size <n>`** CLI flag (default 8000) lets operators
  tune the ring per daemon. Threaded `ServeOptions` →
  `BridgeOptions.eventRingSize` → both `new EventBus()` construction
  sites (fresh sessions + restore path). Validation is fail-CLOSED
  (positive finite integer; 0 / NaN / negative throw at boot).
- **`slow_client_warning` SSE frame.** When a subscriber's queue
  crosses 75% full the bus force-pushes a synthetic
  `slow_client_warning` to that subscriber once per overflow
  episode, carrying `{queueSize, maxQueued, lastEventId}`. The flag
  re-arms after the queue drains below 37.5% (hysteresis, no flap
  near threshold). If the queue actually overflows after the
  warning, the existing `client_evicted` terminal frame path still
  fires. Like `client_evicted`, the warning has no `id` (synthetic
  frame; must not burn a sequence slot for other subscribers).
- **`?maxQueued=N`** query param on `GET /session/:id/events`
  (range `[16, 2048]`, default 256). Lets cold reconnect clients
  pre-size their per-subscriber backlog so a large `Last-Event-ID:
  0` replay doesn't trip the warning on the first publish. Range
  rationale: lower bound 16 (smaller is useless for any replay);
  upper bound 2048 (so a single subscriber can't pin ~1 MB just by
  asking). Out-of-range / non-decimal returns `400
  invalid_max_queued` BEFORE opening the SSE stream — clean 4xx
  beats half-opening a stream + emitting a `stream_error` (which
  EventSource would auto-reconnect on).
- **`slow_client_warning` capability tag** — single source of truth
  for the warning frame + `?maxQueued` query param + ring-size
  knob. Old daemons silently lack all of these; pre-flight via
  `caps.features`.
- **SDK extensions** (`@qwen-code/sdk`): typed
  `DaemonSlowClientWarningEvent` (added to known event union and
  `DaemonStreamLifecycleEvent`); schema-validated by a new
  `isSlowClientWarningData` predicate; reducer
  (`reduceDaemonSessionEvent`) increments `slowClientWarningCount`
  + stores `lastSlowClientWarning`. Warning is **non-terminal** —
  `alive` stays true (only `client_evicted` / `stream_error` /
  `session_died` close the stream). Re-exported from the public
  SDK entry.
- **Docs**: `qwen-serve-protocol.md` updates the features list (adds
  `slow_client_warning` and the previously-missing `client_identity`
  to match reality post-#4231), documents the `?maxQueued` query
  param, adds the warning frame to the event table, and notes the
  new default ring size. `qwen-serve.md` adds the `--event-ring-size`
  flag row.
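
The warn-once-with-hysteresis rule described for `slow_client_warning` can be sketched as a small state machine. `WarningStateSketch` and both function names are hypothetical; the 75% and 37.5% thresholds come from the commit message, and the thresholds are computed once up front.

```typescript
// Hypothetical per-subscriber state for the warning flag.
interface WarningStateSketch {
  warned: boolean;
  readonly warnThreshold: number;
  readonly warnResetThreshold: number;
}

function createWarningStateSketch(maxQueued: number): WarningStateSketch {
  return {
    warned: false,
    warnThreshold: maxQueued * 0.75,
    warnResetThreshold: maxQueued * 0.375,
  };
}

// Returns true exactly when a warning frame should be emitted.
function shouldWarnSketch(state: WarningStateSketch, queueSize: number): boolean {
  if (!state.warned) {
    if (queueSize >= state.warnThreshold) {
      state.warned = true;
      return true;
    }
    return false;
  }
  // Already warned: re-arm only after draining well below the trigger.
  if (queueSize < state.warnResetThreshold) state.warned = false;
  return false;
}
```

The gap between the two thresholds is what prevents a queue hovering near 75% from emitting a warning on every publish.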

Tests: 19 eventBus (4 new: warning at 75%, once per episode,
no `id` on the synthetic frame, hysteresis re-arm), 106 bridge
(2 new: validate eventRingSize accept/reject), 111 server (4 new:
?maxQueued accept/absent/non-decimal/out-of-range +
EXPECTED_STAGE1_FEATURES update), 14 SDK daemonEvents (2 new:
schema validation + non-terminal reducer behavior). 321 focused
tests total, all green.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* refactor(serve): adopt PR #4237 review feedback (eventBus polish)

Address the actionable items from the Qwen Code review bot's pass
on PR #4237:

- Pre-compute `warnThreshold` / `warnResetThreshold` per
  `InternalSub` at `subscribe()` time so `publish()`'s per-event
  hot path is one integer compare per subscriber instead of a
  multiply + compare. The `!warned` short-circuit still collapses
  the steady state to a single boolean read; this just shaves a
  multiply when the threshold check actually fires.
- Document the back-of-queue ordering choice for the synthetic
  `slow_client_warning` frame in `EventBus.publish()`: front-push
  was considered but mid-stream front-insertion would mis-count
  `forcedInBuf` in `BoundedAsyncQueue.next()`, and `forcePush`
  already short-circuits via `resolvers.shift()` for the
  active-consumer case — the back-of-queue path only matters for
  stalled consumers, who can't drain regardless of warning
  position.
- Reuse the existing `collect()` helper in the "default ring size
  8000" test for consistency with the rest of the file; the new
  test also tightens the assertion by checking that the first
  retained event id is 2 (id=1 dropped by the ring) and the last
  is 8001.
- Soften the "~500 B per session" magic number in
  `BridgeOptions.eventRingSize`'s JSDoc to a qualitative
  description (each retained `BridgeEvent` is a reference plus its
  serialized payload; ceiling scales as
  `ringSize × average-event-size`).

Rejected:
- Bot's claim that the error JSON contains `\`...\`` escape
  sequences — bot misread the JS template-literal source as the
  wire output; `JSON.stringify` does not escape backticks, and
  the existing `cwd` error messages use the same style.
- Bot's "use `Record<string, never>` instead of `[key: string]:
  unknown`" suggestion on `DaemonSlowClientWarningData` — every
  other event-data type in `sdk-typescript/src/daemon/events.ts`
  carries the same index signature for additive-field
  compatibility.
- Bot's "features list breaks alphabetical order" — the
  capability list is grouped by protocol lifecycle (health →
  capabilities → session lifecycle → events → permissions), not
  alphabetical.

Tests: 139 focused tests across eventBus + httpAcpBridge + SDK
daemon events — all passing. Behavior unchanged; this is
hot-path micro-opt + comment polish only.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve): correct queue tagging + plumb maxQueued through SDK

Address both P2 findings from the Codex review pass on PR #4237.

**Bug 1: `BoundedAsyncQueue.forcedInBuf` position-invariant break**

The previous `forcedInBuf` counter only tracked LIVE-vs-FORCED
correctly when all forced entries lived at the FRONT of the buffer
(subscribe-time `Last-Event-ID` replay). The new mid-stream
`slow_client_warning` path force-pushes to the BACK of the queue
while the queue is still open, which the existing accounting was
not designed for:

  - publish 6 events at maxQueued=8 → 75% threshold trips →
    force-push warning at the back → buf=[1..6, warning],
    forcedInBuf=1.
  - consumer shifts `1` → forcedInBuf decremented to 0 (incorrect:
    `1` was a live frame, not the forced one).
  - consumer drains 2..6 + warning → buf=[], forcedInBuf=0, true
    live count = 0, but `size` getter and `push()` cap check then
    use `buf.length - forcedInBuf` which drifts over subsequent
    refills, causing premature warn / eviction before the cap is
    actually reached.

Replace the position-dependent counter with a per-entry
`{value, forced}` tag. `liveCount` is incremented in `push()` /
decremented in `next()` only when the shifted entry was non-forced
— position becomes irrelevant. `size` getter returns `liveCount`
directly. The class doc comment is rewritten to call out that the
new tag is the position-independent replacement for the old
"forced frames must stay at the front" invariant.

Regression test in `eventBus.test.ts` reproduces the codex trace
(warn at 75%, drain past warning, refill to cap) and asserts no
premature eviction.
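
The per-entry tag can be shown with a simplified synchronous queue. This is a sketch of the technique only: the real `BoundedAsyncQueue` is asynchronous and has resolver handling that is omitted here, and `TaggedQueueSketch` is a hypothetical name.

```typescript
interface TaggedEntry<T> {
  value: T;
  forced: boolean;
}

// Simplified, synchronous stand-in for the tagged-entry accounting.
class TaggedQueueSketch<T> {
  private readonly buf: TaggedEntry<T>[] = [];
  private readonly maxQueued: number;
  private liveCount = 0;

  constructor(maxQueued: number) {
    this.maxQueued = maxQueued;
  }

  // Only live entries count against the cap.
  get size(): number {
    return this.liveCount;
  }

  push(value: T): boolean {
    if (this.liveCount >= this.maxQueued) return false;
    this.buf.push({ value, forced: false });
    this.liveCount++;
    return true;
  }

  // Forced entries (replay frames, warnings) bypass the cap.
  forcePush(value: T): void {
    this.buf.push({ value, forced: true });
  }

  shift(): T | undefined {
    const entry = this.buf.shift();
    if (entry === undefined) return undefined;
    // The tag, not the position, decides whether the live count drops.
    if (!entry.forced) this.liveCount--;
    return entry.value;
  }
}
```

Because each entry carries its own tag, a forced frame at the back of the buffer no longer skews the count when a live frame at the front is consumed.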

**Bug 2: SDK does not expose `?maxQueued`**

`docs/users/qwen-serve.md` and `docs/developers/qwen-serve-protocol.md`
both document `?maxQueued=N` as something SDK clients can request,
but `SubscribeOptions` on `DaemonClient` only declared `lastEventId`
+ `signal`, and `subscribeEvents()` always fetched `/events` without
a query string. Typed-SDK consumers had no way to opt in without
hand-crafting URLs.

  - Add `SubscribeOptions.maxQueued?: number` with JSDoc noting the
    daemon range `[16, 2048]` and the pre-flight requirement on
    `caps.features.slow_client_warning`.
  - `DaemonClient.subscribeEvents` builds the URL with an optional
    `?maxQueued=<n>` segment. No client-side range validation —
    the daemon's `parseMaxQueuedQuery` is the source of truth and
    returns structured `400 invalid_max_queued`; duplicating the
    bounds in two layers would diverge on the next tweak.
  - `DaemonSessionSubscribeOptions extends SubscribeOptions` so the
    new field flows through `DaemonSessionClient` automatically.

Three new SDK tests:
  - subscribeEvents appends `?maxQueued=N` when set
  - omits the query string when absent (existing behavior preserved)
  - propagates a `400 invalid_max_queued` unchanged
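
The URL-building part of this change can be sketched as below. `buildEventsUrlSketch` and `SubscribeOptionsSketch` are hypothetical, and the base URL in the usage lines is a placeholder; the route shape `/session/:id/events` and the absence of client-side range checks follow the commit message.

```typescript
// Hypothetical subset of the SDK's subscribe options.
interface SubscribeOptionsSketch {
  maxQueued?: number;
}

function buildEventsUrlSketch(
  baseUrl: string,
  sessionId: string,
  opts: SubscribeOptionsSketch = {},
): string {
  const route = `${baseUrl}/session/${encodeURIComponent(sessionId)}/events`;
  // No range validation here: the daemon answers 400 invalid_max_queued.
  return opts.maxQueued === undefined
    ? route
    : `${route}?maxQueued=${encodeURIComponent(String(opts.maxQueued))}`;
}

// Placeholder base URL for illustration only.
const defaultEventsUrl = buildEventsUrlSketch('http://127.0.0.1:4000', 's-1');
const sizedEventsUrl = buildEventsUrlSketch('http://127.0.0.1:4000', 's-1', { maxQueued: 512 });
```

Keeping the bounds in one place means a later change to the daemon's range does not require an SDK release.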

Tests: 214 focused tests across eventBus / bridge / SDK
DaemonClient / DaemonSessionClient / daemonEvents, plus 111 in the
server suite. All green; the new eventBus regression case proves
the position-invariant fix.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* refactor(serve): adopt PR #4237 copilot review feedback

Address 6 of 8 copilot-reviewer findings on PR #4237; the other 2
(#1 forcedInBuf live-size corruption, #5 SDK lacks maxQueued) were
already fixed in bae42c88b — replied on the threads with the
commit hash.

- **[2] server.ts:1068** — `?maxQueued=` (present-but-empty) now
  fails closed with `400 invalid_max_queued` instead of silently
  falling back to the default queue cap. The API documents
  fail-closed for any malformed value before opening SSE, so an
  empty string is unambiguously malformed. New server.test.ts
  case locks this in.
- **[3] commands/serve.ts:93** — CLI help text for
  `--event-ring-size` no longer mis-shapes `Last-Event-ID` as a
  query parameter. It is an HTTP header, and the daemon's SSE
  route does not parse a `?Last-Event-ID=` query.
- **[4] docs/developers/qwen-serve-protocol.md:351** — clarify
  that `?maxQueued=N` controls the LIVE-event backlog cap.
  Replay frames are force-pushed and exempt from the cap; what
  consumes it is live events that arrive while the subscriber is
  still draining a cold-reconnect replay. Bumping for cold
  reconnects is still the right answer, but for the live tail,
  not for the replay frames themselves.
- **[6] eventBus.ts:214** — stale `ringSize=4000` performance
  comment updated to the new `ringSize=8000` default with a note
  about the O(n) `shift()` cost scaling.
- **[7] sdk-typescript events.ts:492** — `isSlowClientWarningData`
  now uses the existing `isFiniteNumber` helper instead of bare
  `typeof === 'number'`. Mirrors the sibling predicates and
  rejects `NaN` / `Infinity` payloads as schema garbage. New
  daemonEvents.test.ts assertions cover both.
- **[8] server.ts:127** — `createServeApp`'s default-bridge
  construction now also forwards `opts.eventRingSize` to
  `createHttpAcpBridge`, symmetric with the `runQwenServe.ts`
  path. Direct embeds / tests that called `createServeApp`
  without supplying their own bridge but did pass
  `ServeOptions.eventRingSize` were silently getting the
  default 8000 ring.
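
The fail-closed parsing in item [2] can be sketched like this. It is an illustration rather than the daemon's `parseMaxQueuedQuery`: the function name and result type are hypothetical, while the `[16, 2048]` range, the `invalid_max_queued` code, and the empty-string rule come from the commit messages.

```typescript
type MaxQueuedResultSketch =
  | { ok: true; value: number | undefined }
  | { ok: false; status: 400; code: 'invalid_max_queued' };

const INVALID_MAX_QUEUED: MaxQueuedResultSketch = {
  ok: false,
  status: 400,
  code: 'invalid_max_queued',
};

function parseMaxQueuedSketch(raw: string | undefined): MaxQueuedResultSketch {
  // Absent parameter: let the daemon use its default cap.
  if (raw === undefined) return { ok: true, value: undefined };
  // Present but empty, signed, hex, or fractional input is malformed.
  if (!/^[0-9]+$/.test(raw)) return INVALID_MAX_QUEUED;
  const parsed = Number(raw);
  if (parsed < 16 || parsed > 2048) return INVALID_MAX_QUEUED;
  return { ok: true, value: parsed };
}
```

Running this check before the SSE stream opens lets the caller return a plain 400 instead of a half-open stream.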

Tests: 326 focused tests across eventBus / bridge / SDK
DaemonClient / DaemonSessionClient / daemonEvents / server. All
green; the new server.test.ts case + the extended
daemonEvents.test.ts assertions cover the tightened guards.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* refactor(serve): adopt PR #4237 wenshao round-2 review feedback

Six adopted findings from @wenshao's second review pass on
PR #4237. The seventh ([10] forcedInBuf 3rd case invariant) was
already fixed in bae42c88b — replied on that thread.

- **[9] + [14] server.ts** — Sanitize attacker-controlled values
  before stderr interpolation in both `parseMaxQueuedQuery` and
  `parseLastEventId`. New `safeLogValue()` helper uses
  `JSON.stringify` to escape control characters (`\n`/`\r`/…) so a
  URL-encoded newline in `?maxQueued=%0a` can't inject extra log
  lines into journald/Loki/Splunk pipelines. Matches the
  `workspace_mismatch` sanitization style in `sendBridgeError`.
  Fixed in both helpers (the sibling pre-existing
  `parseLastEventId` had the same shape) so the file stays
  consistent.

- **[11] httpAcpBridge.ts** — `!Number.isFinite(eventRingSize)`
  was redundant: `Number.isInteger(NaN)` and
  `Number.isInteger(Infinity)` both return `false`, so the sibling
  `!Number.isInteger` already catches both. Drop the dead guard.

- **[12] httpAcpBridge.ts** — Add soft upper bound
  `MAX_EVENT_RING_SIZE = 1_000_000` on `eventRingSize` to catch
  operator typos (`--event-ring-size 80000000` vs `8000000`). At
   ~500 B per `BridgeEvent`, a 1M-frame ring already pins ~500 MB
  per session — well past any realistic workload. Not a security
  boundary (operator-controlled flag), pure typo defense. Existing
  bridge construction test extended with an `80_000_000` case.

- **[13] commands/serve.ts** — CLI `--event-ring-size` flag now
  sources its default from `DEFAULT_RING_SIZE` (imported from
  `serve/eventBus.js`) instead of the hardcoded literal `8000`.
  Without this, a future bump of the bus default would silently
  not take effect for daemons launched through the CLI because
  the flag always overrides — single source of truth fixes that.

- **[15] eventBus.ts** — Drop unreachable `event.id ?? this.lastEventId`
  fallback in the `slow_client_warning` frame. `event` is locally
  constructed at the top of `publish()` with `id: this.nextId++`
  and is guaranteed defined. Use `event.id as number` directly +
  an inline note about the invariant.
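
Items [9], [11], and [12] can be sketched together. Both function names are hypothetical reductions of the helpers named above; the 1,000,000 bound, the `Number.isInteger` behavior for `NaN` and `Infinity`, and the `JSON.stringify` escaping are as described in the commit message.

```typescript
const MAX_EVENT_RING_SIZE_SKETCH = 1_000_000;

function validateEventRingSizeSketch(size: number): number {
  // Number.isInteger already rejects NaN and Infinity, so a separate
  // Number.isFinite guard would be dead code.
  if (!Number.isInteger(size) || size <= 0 || size > MAX_EVENT_RING_SIZE_SKETCH) {
    throw new RangeError(
      `eventRingSize must be an integer in [1, ${MAX_EVENT_RING_SIZE_SKETCH}]`,
    );
  }
  return size;
}

// JSON.stringify escapes control characters, so an attacker-supplied
// newline cannot start a new log line.
function safeLogValueSketch(value: string): string {
  return JSON.stringify(value);
}

const escapedForLog = safeLogValueSketch('10\nFAKE log line');
```

The upper bound is a typo guard for operators, not a security boundary, so a generous limit is enough.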

Tests: 197 (eventBus 20 / bridge 107 / SDK DaemonClient 57 / SDK
daemonEvents 14) + 112 server. All green; the new upper-bound
bridge case + the existing log assertions pin the changed
behaviors.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-17 19:30:43 +08:00
jinye
0a4a08e443
feat(serve): add client heartbeat (#4175 Wave 2.5 PR 9) (#4235)
* feat(serve): add client heartbeat route

Adds POST /session/:id/heartbeat plus SDK helpers so long-lived
adapters (TUI/IDE/web) can refresh the daemon's last-seen
bookkeeping. Bridge stores per-session and per-client timestamps
behind a getHeartbeatState() snapshot accessor that PR 12
read-only diagnostics and PR 24 revocation policy will consume.

- Capability tag: client_heartbeat (advertised on /capabilities.features)
- Identified clients must echo X-Qwen-Client-Id; the bridge validates
  the id BEFORE bumping any timestamp so a forged id can't mask
  client absence
- Per-client entries are dropped together with the registration
  ref-count in unregisterClient, so churn doesn't leak stale ids
- getHeartbeatState returns a snapshot Map; mutating it does not
  leak into bridge state
- Anonymous heartbeats bump only the per-session watermark
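
The bookkeeping rules in the list above can be sketched as a small class. `HeartbeatStateSketch` is hypothetical, and having an identified heartbeat also bump the per-session watermark is an assumption; validate-before-bump, drop-on-unregister, and the snapshot copy follow the commit message.

```typescript
class HeartbeatStateSketch {
  private readonly registered = new Set<string>();
  private readonly clientLastSeen = new Map<string, number>();
  private sessionLastSeen: number | undefined = undefined;

  registerClient(clientId: string): void {
    this.registered.add(clientId);
  }

  unregisterClient(clientId: string): void {
    this.registered.delete(clientId);
    // Drop the timestamp with the registration so churn leaves no stale ids.
    this.clientLastSeen.delete(clientId);
  }

  heartbeat(now: number, clientId?: string): void {
    // Validate BEFORE touching any timestamp: a forged id changes nothing.
    if (clientId !== undefined && !this.registered.has(clientId)) {
      throw new Error('invalid_client_id');
    }
    this.sessionLastSeen = now;
    if (clientId !== undefined) this.clientLastSeen.set(clientId, now);
  }

  // Returns copies, so callers cannot mutate the internal state.
  getHeartbeatState(): { sessionLastSeen: number | undefined; clients: Map<string, number> } {
    return {
      sessionLastSeen: this.sessionLastSeen,
      clients: new Map(this.clientLastSeen),
    };
  }
}
```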

Errors mirror the rest of the routes — 404 SessionNotFoundError, 400
invalid_client_id (header malformed or unknown for this session).

Roadmap PR 9 from #4175. Depends on PR 7 (#4231 client identity,
merged) for the trusted clientId registry.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* feat(sdk): re-export HeartbeatResult from package root

The published @qwen-code/sdk only exposes the root entrypoint via
`exports`; daemon subpath imports are not part of the public API.
Adding HeartbeatResult to packages/sdk-typescript/src/daemon/index.ts
made it reachable internally but not for downstream consumers writing
`import type { HeartbeatResult } from '@qwen-code/sdk'` — every other
daemon result type (PromptResult, SetModelResult, DaemonSession, etc.)
is forwarded through the root barrel, so HeartbeatResult was the only
hole in the heartbeat helper's public surface.

Inserted alphabetically between DaemonStreamLifecycleEvent and
KnownDaemonEvent to match the existing ordering convention.
2026-05-17 18:57:28 +08:00
jinye
07e0e82258
feat(serve): advertise typed_event_schema + pin SDK public surface (#4175 PR 4 follow-up) (#4226)
* feat(serve): advertise typed_event_schema capability

Follow-up to #4217 (`feat(protocol): add typed daemon event schema v1`,
Wave 1 PR 4 of #4175), which landed the SDK-side typed schema +
`KnownDaemonEvent` union + reducer but did not register a daemon-side
capability tag for it. Without the tag, non-SDK clients (web debug
UI, third-party adapters, channel/IDE backends not yet on
`@qwen-code/sdk`) have no way to detect at the protocol envelope
level that the daemon promises to emit only `KnownDaemonEvent`-shaped
frames; they would either pin against an SDK version or pre-flight
every frame defensively.

Add `typed_event_schema: { since: 'v1' }` to `SERVE_CAPABILITY_REGISTRY`,
inserted right after `session_events` (the route that delivers the
frames whose schema this tag describes). The capability is purely
informational — `narrowDaemonEvent`/`asKnownDaemonEvent` already
fall back to "unknown" for older daemons that don't advertise the
tag, so the SDK does not gate any behavior off the tag.
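
For a non-SDK client, the detection described above might look like the sketch below. It assumes the capabilities payload carries a `features` string array (inferred from the `/capabilities.features` wording used elsewhere in this log); the interface and function names are hypothetical.

```typescript
// Hypothetical sketch: the payload shape is an assumption, not the documented wire format.
interface CapabilitiesPayload {
  features: string[];
}

type FrameHandling = 'typed' | 'defensive';

// Decide how to treat incoming event frames based on the advertised tag.
function chooseFrameHandling(caps: CapabilitiesPayload): FrameHandling {
  // Older daemons do not advertise the tag, so fall back to validating every frame.
  return caps.features.includes('typed_event_schema') ? 'typed' : 'defensive';
}
```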

Sync `EXPECTED_STAGE1_FEATURES` (server.test.ts) and the integration
test array (qwen-serve-routes.test.ts) with the registry order, the
same lockstep discipline #4214 codified.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* test(sdk): pin typed event surface at the public SDK entry, point DaemonSessionClient docstring at it

Two small follow-ups to #4217 (Wave 1 PR 4 of #4175).

1. Public-entry regression fence

   `@qwen-code/sdk` is a single-entry package: `package.json.exports`
   only exposes `.` (`dist/index.{cjs,mjs,d.ts}`), and the bundle
   is built from `src/index.ts`. Symbols re-exported only from
   `src/daemon/index.ts` are unreachable to consumers unless they
   are also forwarded by `src/index.ts`. #4217 forwards the typed
   event schema correctly today, but the two-layer chain has no
   compile-time test pinning it — a future daemon export that lands
   in `src/daemon/index.ts` but is missed by `src/index.ts` would
   ship invisibly.

   Add `test/unit/daemon-public-surface.test.ts` that imports
   `* as Public from '../../src/index.js'`, asserts at runtime that
   every PR 4 value is `typeof === 'function'` (or a primitive of
   the expected shape), round-trips a raw `DaemonEvent` through the
   public `asKnownDaemonEvent` to prove the wire-up actually works,
   and compile-imports every PR 4 type so any drift fails to build.

2. DaemonSessionClient docstring pointer

   The class docstring already deferred typed event consumption to
   "the protocol schema layer" without a concrete pointer. Now that
   #4217 has put `asKnownDaemonEvent` and `reduceDaemonSessionEvent`
   in `./events.js`, name them so future readers can find the
   typed surface without grepping. No code change.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
2026-05-17 18:43:38 +08:00
kkhomej33-netizen
a5e4839e07
fix(cli): restore ACP prompt counter on resume (#4233)
2026-05-17 18:32:44 +08:00
ChiGao
c25e22b575
feat(serve): add session-scoped permission route (#4232)
Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com>
2026-05-17 17:48:30 +08:00
ChiGao
4d9cbe49c0
feat(serve): add daemon-stamped client identity (#4231)
* feat(serve): add daemon-stamped client identity

* fix(serve): harden daemon client identity handling

---------

Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com>
2026-05-17 16:19:30 +08:00
kkhomej33-netizen
605e5eea16
fix(cli): include skill base dir in slash commands (#4224)
2026-05-17 15:52:29 +08:00
jinye
2453b82add
[codex] Add daemon session load/resume (#4222)
* feat(serve): add daemon session load resume

Adds HTTP and SDK support for restoring persisted daemon sessions through load/resume routes, including replay buffering for load and guarded concurrent restore handling.

Refs #4175

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(serve): address review feedback on daemon session load/resume

- Gate `defaultEntry` claim in `restoreSession` on
  `defaultSessionScope === 'single'`, mirroring `doSpawn`. Without the
  gate, a restored session silently became the omitted-scope attach
  target on `'thread'`-default daemons.
- Rename advertised capability `session_resume` to
  `unstable_session_resume` to match the underlying ACP method
  (`connection.unstable_resumeSession`). `session_load` stays stable.
- Seed `lastEventId: 0` in `DaemonSessionClient.resume`, symmetric with
  `load`. The agent's `unstable_resumeSession` schedules an
  `available_commands_update` via `setTimeout(0)`; without the seed the
  SDK consumer would miss that frame.
- Add HTTP-level test for the `RestoreInProgressError → 409` envelope.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* docs(serve): adopt review feedback comments on session load/resume

- Cross-reference the `POST /session` disconnect-cleanup rationale
  from `restoreSessionHandler`'s `!res.writable` branch so future
  maintainers find the BQ9tV race + tanzhenxin attach-rollback
  context without grep.
- Document `DaemonSessionState.{models, modes, configOptions}` in
  the SDK so callers can narrow to the ACP `SessionModelState` /
  `SessionModeState` / `SessionConfigOption` shapes.
- Add JSDoc on `DaemonClient.restoreSession` explaining why
  `loadSession` and `resumeSession` collapse into one transport.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve): preserve restore state and harden in-flight restore races

Address the four Critical findings from PR #4222 review (wenshao):

- Coalesced restore waiters now observe the same ACP state the
  original restore caller did. `state: {}` in `restoreSession`'s
  coalesce branch was clobbering the spread `restored.state`, so
  concurrent callers got different payloads based purely on timing.
  Cache the load/resume response on `SessionEntry.restoreState` and
  return it from both the existing-byId early return and the
  coalesce branch.
- Drop the `defaultEntry` promotion on restore. Explicit
  `session/load` / `session/resume` is "give me THIS id"; it must
  not become the implicit attach target for subsequent omitted-id
  `POST /session` callers under `single` scope. Reserves
  `defaultEntry` for sessions created through `doSpawn` only.
- Reserve coalesced attaches synchronously via
  `InFlightRestore.coalesceState.count` so the spawn owner's
  `requireZeroAttaches` disconnect-reaper sees a non-zero
  `attachCount` on the freshly registered entry and skips the
  kill. Without this, B's `attachCount++` happened after `await
  inFlight.promise`, leaving a window where A's HTTP-disconnect
  cleanup could reap the session out from under B.
- Include `pendingRestoreIds` in the `killSession` channel-teardown
  decision. The last live session leaving while a restore is
  in-flight on the same channel would otherwise SIGTERM the
  channel mid-restore.
- Bump `RestoreInProgressError`'s `Retry-After` from 1s to 5s
  (matches `SessionLimitExceededError`); under the default
  `initTimeoutMs` of 10s, 1s pushed clients into tight loops.
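
The first bullet (coalesced waiters observing the same restore state) can be illustrated with a small sketch. It assumes a hypothetical `RestoreCoalescer` class and is deliberately much simpler than the bridge: it ignores attach counting, channel teardown, and retry hints.

```typescript
// Hypothetical sketch of same-id restore coalescing; not the bridge implementation.
type RestoreState = Record<string, unknown>;

class RestoreCoalescer {
  private readonly inFlight = new Map<string, Promise<RestoreState>>();
  private readonly cached = new Map<string, RestoreState>();

  restore(id: string, doRestore: () => Promise<RestoreState>): Promise<RestoreState> {
    // Already restored: late callers get the cached state, not an empty object.
    const done = this.cached.get(id);
    if (done !== undefined) return Promise.resolve(done);

    // Restore in flight: hand back the same promise so every waiter sees one payload.
    const pending = this.inFlight.get(id);
    if (pending !== undefined) return pending;

    const started = doRestore().then(
      (state) => {
        this.cached.set(id, state);
        this.inFlight.delete(id);
        return state;
      },
      (err) => {
        this.inFlight.delete(id);
        throw err;
      },
    );
    this.inFlight.set(id, started);
    return started;
  }
}
```

Because the second caller receives the very same promise, the payload it observes cannot depend on timing.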

Tests: new bridge cases covering state propagation through
coalesce, the spawn-owner-disconnect race, the
pendingRestoreIds-aware channel teardown, and the no-promote-
on-restore invariant. Existing "attaches twice" test rewritten
to assert the cached restore state propagates.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* test(serve): cover acpAgent load/resume + restore route error mappings

Close the test-coverage gaps wenshao called out in PR #4222 review:

- acpAgent.test.ts gains a `QwenAgent loadSession /
  unstable_resumeSession` block that locks down the new contract
  end-to-end at the agent layer:
  * `loadSession` missing persisted session → throws
    `RequestError.resourceNotFound("session:<id>")` (code -32002
    + `data.uri`).
  * `loadSession` existing session → returns LoadSessionResponse
    AND triggers `session.replayHistory(messages)` so SSE
    subscribers see the persisted turns.
  * `unstable_resumeSession` missing session → same
    resourceNotFound contract.
  * `unstable_resumeSession` existing session → returns the
    response WITHOUT replaying history (resume restores model
    context internally; UI replay is intentionally suppressed).
  Required extending the mocked `RequestError` with
  `resourceNotFound`, and mocking `SessionService` per case.
- server.test.ts adds the missing restore-route wire mappings:
  `WorkspaceMismatchError → 400 workspace_mismatch` and
  `SessionLimitExceededError → 503 + Retry-After: 5`. Combined with
  the existing 409 case for `RestoreInProgressError`, the route
  layer now has full structured-error coverage.
- Updated the 409 test's `Retry-After` expectation from `1` to `5`
  to match the bumped retry hint.

Disconnect-cleanup tests for the restore route were intentionally
not added — the cleanup branch is line-for-line identical to
`POST /session`'s handler (which itself ships without route-level
disconnect tests due to flaky supertest + Node http close-event
timing).

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* docs(serve): document daemon session load/resume routes

Sync the docs to the routes that landed via PR #4222:

- `docs/developers/qwen-serve-protocol.md`:
  * Add `session_load` and `unstable_session_resume` to the
    advertised features list, with a note on the `unstable_`
    prefix mirroring ACP's underlying method name.
  * Document `POST /session/:id/load` and `POST /session/:id/resume`
    — request body, response shape (including the cached `state`
    field that late attachers observe), and the full error
    envelope: 404 unknown id, 400 workspace_mismatch, 503
    session_limit_exceeded (counts in-flight restores), 409
    restore_in_progress (cross-action race).
  * Note the SSE replay ring bound (4000 frames default) and the
    "subscribe immediately after load" guidance for long histories.
- `docs/users/qwen-serve.md`:
  * Add a "Loading and resuming a persisted session" section with
    the SDK example (`DaemonSessionClient.load` /
    `DaemonSessionClient.resume`) and the load-vs-resume
    decision table.
  * Update the durability model — sessions are still ephemeral
    across daemon restarts in Stage 1, but persisted sessions on
    disk can now be reloaded.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(test): use _meta payload to satisfy ACP SessionConfigOption types

The two new state-propagation tests in `httpAcpBridge.test.ts` used
`{ id, name, value }` as a `SessionConfigOption`, but ACP's actual
`SessionConfigSelect` shape requires `currentValue` + `options`. vitest
runs through esbuild and skips strict typechecking, so the local
`vitest run` passed; CI's `tsc --build` (run during `npm run prepare`)
caught it.

Switch the fixture to `_meta: { tag: '...' }` instead — `_meta` is
typed as `Record<string, unknown> | null` on the ACP response shapes,
so any payload survives. The assertions only need the bridge to
forward the state object intact, which `_meta` proves equally well
without committing the test to the full SessionConfigOption union.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve): symmetric restore coalesce guard + transportClosed leak + defensive cleanup

Address the two new Critical findings + the test/cosmetic gaps from
wenshao's second review pass on PR #4222 (`a3f38da3a`):

- **[Critical] Symmetric coalesce guard.** The previous guard only
  rejected `load`-on-`resume`; `resume` arriving while a `load` was
  in flight silently coalesced and inherited the load's history-
  replay frames over SSE — directly violating resume's "no UI
  replay" contract (made worse by `DaemonSessionClient.resume()`
  seeding `lastEventId: 0`). Tighten the guard to
  `action !== inFlight.action` so any cross-action race throws
  `RestoreInProgressError`. Same-action coalescing is unaffected.

- **[Critical] `transportClosed` dangling rejection.** When
  `withTimeout` wins the `Promise.race` against `channel.exited`,
  the `.then(throw)` chain on `channel.exited` stays pending. A
  later channel exit (next session boundary, daemon shutdown, agent
  crash) fires the `throw` with no observer attached — Node 22 logs
  `unhandledRejection`, and `--unhandled-rejections=throw`
  deployments crash the daemon. Add `transportClosed.catch(() => {})`
  to suppress the dangling rejection after the race settles.

- **`isAcpSessionResourceNotFound` exact-match fallback.** The
  message-fallback path used `message.includes(expectedUri)`, which
  would falsely match a sessionId of `"a"` against a message
  containing `"session:abc"`. Tighten to exact equality on the
  canonical `Resource not found: <uri>` form. The primary
  `data.uri` path remains the dominant code path.

- **`loadSession` mcpServers default symmetry.** `loadSession` now
  uses `params.mcpServers ?? []` to mirror `unstable_resumeSession`.
  Defends against a future ACP schema loosening that makes
  `LoadSessionRequest.mcpServers` optional — without the
  null-coalesce, `newSessionConfig` would `TypeError` on iteration.
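
The substring-versus-exact-match problem from the third bullet is easy to reproduce in isolation. The sketch below uses hypothetical function names and assumes the `session:<id>` URI and `Resource not found: <uri>` message forms quoted in this log.

```typescript
// Hypothetical sketch: function names are illustrative only.
function looseMatch(message: string, sessionId: string): boolean {
  // Substring check: "session:a" is a prefix of "session:abc", so this over-matches.
  return message.includes(`session:${sessionId}`);
}

function exactMatch(message: string, sessionId: string): boolean {
  // Compare against the whole canonical message instead.
  return message === `Resource not found: session:${sessionId}`;
}
```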

Tests added:
- `httpAcpBridge.test.ts`: `resume`-on-`load` rejection (mirror of
  the existing `load`-on-`resume` test); regression for the
  dangling `unhandledRejection` (resolves `channel.exited` after
  the restore promise has already settled and asserts no
  `unhandledRejection` event); shutdown-awaits-restore via
  `Promise.race`-based ordering.
- `server.test.ts`: 400 for non-string and over-length `cwd` on
  the restore routes (mirroring the equivalent `POST /session`
  cases for `parseOptionalWorkspaceCwd`).
- `acpAgent.test.ts`: load with `getResumedSessionData()` returning
  `undefined` — distinct code path that does NOT call
  `replayHistory`.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-17 12:58:47 +08:00
qqqys
2e773b0e60
[codex] Allow custom output directory for /export (#4193)
* feat(cli): support export output directories

* fix(cli): address export review feedback

* test(cli): cover JSON export directory handling

* fix(cli): constrain export output directories

* test(cli): cover export edge cases

* fix(cli): address export directory review feedback

* fix(cli): revalidate export directory before write

* fix(cli): validate export directory before mkdir

* fix(cli): harden export target writes

* fix(cli): refine export failure handling

* fix(cli): clarify export directory mode

* fix(cli): include export path context in errors

* fix(cli): add export debug logging

* fix(cli): make export tests path portable

* fix(cli): refine export validation diagnostics

* test(cli): cover export validation failures
2026-05-17 12:33:02 +08:00
Shaojin Wen
ef29700bce
fix(ui): trim background task results and show newest first (#4094) (#4125)
* fix(ui): trim background task results and show newest first (#4094)

Two related improvements to the background task pill and dialog:

1. Trim outdated terminal task results.
   `BackgroundTaskRegistry` and `BackgroundShellRegistry` now cap
   retained terminal entries at 32 each (mirroring `MonitorRegistry`'s
   existing `MAX_RETAINED_TERMINAL_MONITORS` pattern). Running, paused,
   and cancelled-but-not-yet-notified entries are never evicted —
   pruning a not-yet-notified entry would break the SDK contract that
   every `register` pairs with exactly one terminal `task-notification`.

2. Show newest tasks at the top of the dialog.
   `useBackgroundTaskView` now sorts entries by `startTime` descending
   so the dialog opens with the cursor on the most recently launched
   task. `LiveAgentPanel` reverses internally back to ASC for its own
   visual layout (newest row sits closest to the composer).
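
The retention cap from item 1 can be sketched as follows. The entry shape, the `notified` flag, and the oldest-first eviction order are assumptions made for illustration; only the cap of 32 and the never-evict rules come from the description above.

```typescript
// Hypothetical sketch: entry shape and eviction order are assumptions.
type Status = 'running' | 'paused' | 'completed' | 'failed' | 'cancelled';

interface Entry {
  id: string;
  status: Status;
  endTime?: number;
  notified: boolean; // whether the terminal task-notification has been sent
}

const MAX_RETAINED_TERMINAL = 32; // cap stated in the commit message

function isEvictable(e: Entry): boolean {
  const terminal =
    e.status === 'completed' || e.status === 'failed' || e.status === 'cancelled';
  // Running, paused, and not-yet-notified entries must survive pruning.
  return terminal && e.notified;
}

function pruneTerminalEntries(
  entries: Map<string, Entry>,
  cap: number = MAX_RETAINED_TERMINAL,
): void {
  const evictable = Array.from(entries.values())
    .filter(isEvictable)
    .sort((a, b) => (a.endTime ?? 0) - (b.endTime ?? 0)); // oldest finish first
  const excess = Math.max(0, evictable.length - cap);
  for (const entry of evictable.slice(0, excess)) {
    entries.delete(entry.id);
  }
}
```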

* perf(shell-registry): batch abortAll prune + statusChange into one pass

abortAll() previously delegated to cancel() per entry, so each running
shell triggered its own pruneTerminalEntries() and statusChange wakeup.
On shutdown / `/clear` with N running shells the only subscriber
(useBackgroundTaskView) re-pulled getAll() N times for what is logically
a single batch transition.

Settle each entry inline via the new private settleAsCancelled() helper,
then fire prune + statusChange exactly once after the loop. The split
keeps the running-status guard at the public-API boundary so callers
can't accidentally re-settle a terminal entry.

* fix(ui): two-bucket sort so running tasks outrank fresh terminals

The earlier startTime DESC sort surfaced the newest LAUNCH but let an
older long-running / paused entry get pushed below a batch of newer
terminal entries — the user opening the dialog to check on the running
work would find it buried under stale completed rows.

Split the merge into two buckets:
  - active (running + paused): sorted by startTime DESC so the most
    recent launch sits at the very top of the dialog.
  - terminal (completed / failed / cancelled): sorted by endTime DESC
    so the most recently FINISHED entry leads the terminal section
    (matches "what changed while I wasn't looking" intuition; a long
    task that just settled outranks an old quick task that finished
    hours ago).

Pin the new behavior with two tests covering active-above-terminal
and the endTime-vs-startTime distinction inside the terminal bucket.
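
A minimal sketch of the two-bucket ordering described above, assuming a hypothetical `TaskRow` shape; the real hook merges several registries and is not reproduced here.

```typescript
// Hypothetical sketch: TaskRow and sortForDialog are illustrative names.
type TaskStatus = 'running' | 'paused' | 'completed' | 'failed' | 'cancelled';

interface TaskRow {
  id: string;
  status: TaskStatus;
  startTime: number;
  endTime?: number;
}

function sortForDialog(rows: TaskRow[]): TaskRow[] {
  const isActive = (r: TaskRow): boolean =>
    r.status === 'running' || r.status === 'paused';

  // Active bucket: most recent launch first.
  const active = rows.filter(isActive).sort((a, b) => b.startTime - a.startTime);

  // Terminal bucket: most recently finished first (falls back to startTime if endTime is missing).
  const terminal = rows
    .filter((r) => !isActive(r))
    .sort((a, b) => (b.endTime ?? b.startTime) - (a.endTime ?? a.startTime));

  // Active entries always outrank terminal ones.
  return active.concat(terminal);
}
```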

* fix: add missing outputFile and isBackgrounded to retention cap tests

The merge brought in required fields on AgentTaskRegistration that the
retention-cap test helpers were not supplying.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-17 09:13:24 +08:00
qqqys
07165a095c
Add stop hook blocking cap (#4208)
* feat(core): add stop hook blocking cap

* fix(core): tighten stop hook cap behavior

* fix(cli): show goal judge details

* fix(core): bound stop hook blocking cap

* fix(core): surface subagent stop cap warnings

* fix(core): clean up stop hook cap loop

* test(core): cover stop hook cap integrations

* test(core): strengthen stop hook cap coverage
2026-05-17 06:52:56 +08:00
易良
ba77ddd81b
fix(lsp): expose status and startup diagnostics (#3649)
* feat(lsp): add /lsp slash command to show server status

Implements the /lsp command that displays the status of all configured
LSP servers. Previously this was documented in the FAQ but never
implemented, leaving users with no way to check if their language
servers started successfully.

Changes:
- Add LspServerStatusInfo interface to lsp/types.ts
- Add getServerStatus() to LspClient and NativeLspClient
- Expose getServerHandles() from NativeLspService
- Create lspCommand.ts with status table output
- Register /lsp in BuiltinCommandLoader (only when LSP is enabled)

The command shows: server name, command, languages, and status
(NOT_STARTED / IN_PROGRESS / READY / FAILED + error message).

* fix(lsp): expose status and startup diagnostics

* fix(lsp): harden status command diagnostics

* fix(lsp): add stderr error listener and harden initialization error handling

- Add stderr 'error' event listener in LspConnectionFactory to prevent
  unhandled stream errors from crashing the process
- Wrap setLspInitializationError calls in try-catch in config.ts to guard
  against post-initialization state changes that would throw
2026-05-17 01:42:28 +08:00
jinye
54fd5c50f0
feat(telemetry): add detailed sensitive span attributes (#4097)
Layer detailed content attributes onto the existing hierarchical spans
(qwen-code.interaction / qwen-code.llm_request / qwen-code.tool) gated
by includeSensitiveSpanAttributes:

- Interaction span: user prompt (new_context)
- LLM request span: system prompt + hash + preview + length (full text
  deduped per session via SHA-256), tool schemas (per-tool tool_schema
  events, also hash-deduped), model output
- Tool span: tool input, tool result on every exit path (success +
  pre-hook block + post-hook stop + tool error + try-block cancel +
  catch-block cancel + execution exception)

All large content truncated at 60KB with *_truncated and
*_original_length metadata. Heavy serialization (safeJsonStringify on
tool I/O, partToString on user prompt) is guarded by the sensitive
flag at the call site so it doesn't run when telemetry is off.
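
The truncation metadata can be sketched like this. Treating "60KB" as 60 * 1024 UTF-16 code units is an assumption (the real limit may be measured in bytes), and `truncatedAttributes` is a hypothetical helper name.

```typescript
// Hypothetical sketch: the unit of the 60KB limit is an assumption.
const MAX_ATTR_LENGTH = 60 * 1024;

type AttrValue = string | number | boolean;

function truncatedAttributes(
  key: string,
  value: string,
  max: number = MAX_ATTR_LENGTH,
): Record<string, AttrValue> {
  if (value.length <= max) {
    return { [key]: value };
  }
  return {
    [key]: value.slice(0, max),
    // Companion attributes let a reader tell that content was cut and by how much.
    [`${key}_truncated`]: true,
    [`${key}_original_length`]: value.length,
  };
}
```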

Also adds:
- getActiveInteractionSpan() helper for client.ts to attach prompt
  attributes to the interaction span.
- Updated config schema description and docs (telemetry.md +
  settings.md) to reflect expanded scope and add security/cost notes.
- 28 unit tests for detailed-span-attributes, 4 tests for
  getActiveInteractionSpan, integration mocks updated.
2026-05-17 00:36:48 +08:00
qqqys
daaa85e98e
feat(cli): add fork-session resume flag (#4159)
* feat(cli): add fork-session resume flag

* fix(cli): address fork-session review feedback

* fix(cli): handle fork session copy failures

* fix(cli): guard sandbox session handoff flag
2026-05-17 00:27:52 +08:00
Shaojin Wen
b9590283c0
fix(cli): pass rewind selector test props (#4211)
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-16 23:57:50 +08:00
jinye
878f35fc4f
feat(serve): per-request sessionScope override on POST /session (#4175 Wave 2 PR 5) (#4209)
* feat(serve): per-request sessionScope override on POST /session

Resolves the FIXME at httpAcpBridge.ts:BridgeOptions.sessionScope from
#3803 — clients can now override the daemon-wide sessionScope per
request instead of being stuck with whatever boot-time value the
operator picked. A VSCode window that wants strict isolation can ask
for `'thread'` against a default-`'single'` daemon, and vice versa.

Wire change:
- POST /session body accepts optional `sessionScope: 'single' | 'thread'`
- Per-request value wins; daemon-wide default remains the fallback when
  the field is omitted (bit-for-bit backward compat for every existing
  caller)
- Invalid values yield 400 `{ code: 'invalid_session_scope' }`
- New capability tag `session_scope_override` advertised on
  /capabilities.features for negotiation

Bridge changes:
- BridgeSpawnRequest gains optional `sessionScope`
- spawnOrAttach validates the per-request value and resolves
  effectiveScope = req.sessionScope ?? defaultSessionScope
- doSpawn now takes effectiveScope and only stamps `defaultEntry`
  (the single-scope attach slot) when the spawn is single-scope —
  fixes a mixed-scope leak where a thread-first call would let a
  later omitted-scope call attach to the supposedly-isolated session

SDK:
- CreateSessionRequest gains optional `sessionScope`
- DaemonClient.createOrAttachSession conditionally spreads it into the
  JSON body so omitted callers send the same wire shape as before

Tests:
- 4 new bridge tests (override single→thread, override thread→single,
  mixed-scope leak regression, invalid-value rejection)
- 3 new server tests (valid passthrough, invalid 400, omitted backward
  compat)
- 2 new SDK tests (forwards/omits sessionScope on the wire)
- EXPECTED_STAGE1_FEATURES updated for the new capability tag

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve): address Wave 2 PR 5 review findings

Three independent review passes found three real issues:

1. Bridge `TypeError` on invalid `sessionScope` collapsed to opaque 500
   in `sendBridgeError` instead of the typed `400 invalid_session_scope`
   the route layer guarantees. Direct embed / test / future entry-point
   callers bypassing the route would see a generic 500 with stack noise
   on stderr — disagreeing with the route contract.

   Fix: add `InvalidSessionScopeError` class (alongside
   `SessionNotFoundError` / `WorkspaceMismatchError` /
   `SessionLimitExceededError`); the `spawnOrAttach` validator now
   throws it, and `sendBridgeError` translates to the same
   `{ error, code: 'invalid_session_scope' }` shape.

2. SDK `DaemonClient.createOrAttachSession` used a truthy check
   (`req.sessionScope ? ...`) for the conditional spread, silently
   erasing falsy-but-defined values (`''`, `null`, `0`) on the wire.
   A buggy caller would never see the daemon's 400 — it'd inherit the
   daemon-wide default while believing it requested a specific scope.
   Fix: use `!== undefined` (matching the bridge's own validation
   shape). Same fix to the server-side spread for consistency.

3. JSDoc and docs referenced `serve --sessionScope` as if it were a
   shipping CLI flag. It isn't — `ServeOptions` has no field, neither
   `runQwenServe` nor `serve.ts` plumbs one, and the production daemon
   default is hardcoded to `'single'`. Strike the references; note
   that #4175 may add the flag in a follow-up.
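
The difference between the truthy check and the `!== undefined` check in finding 2 shows up with any falsy-but-defined value. The sketch below uses a hypothetical request shape with `sessionScope` typed as `unknown` so it can model a buggy caller.

```typescript
// Hypothetical sketch: request shape and function names are illustrative.
interface LooseCreateSessionRequest {
  cwd: string;
  sessionScope?: unknown;
}

// Truthy check: silently drops '', null, and 0 from the wire body.
function buildBodyTruthy(req: LooseCreateSessionRequest): Record<string, unknown> {
  return {
    cwd: req.cwd,
    ...(req.sessionScope ? { sessionScope: req.sessionScope } : {}),
  };
}

// Explicit check: only a truly omitted field is left off, so the daemon can reject bad values.
function buildBody(req: LooseCreateSessionRequest): Record<string, unknown> {
  return {
    cwd: req.cwd,
    ...(req.sessionScope !== undefined ? { sessionScope: req.sessionScope } : {}),
  };
}
```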

Test coverage expanded:
- Cap-bypass guard: per-request `'thread'` overrides cannot bypass
  `maxSessions` on a daemon-default-`'single'` deployment. Without
  this, a future refactor that gated the cap on `defaultSessionScope`
  instead of `effectiveScope` would silently let `'thread'` overrides
  amplify past the limit — the exact N-amplification cliff #3803 was
  about.
- Symmetric mixed-scope leak: daemon-default-`'thread'` +
  single-first-call followed by omitted-scope-second-call must produce
  distinct sessions. Mirrors the existing daemon-default-`'single'` +
  thread-first leak regression.
- Concurrent mixed-scope coalescing: simultaneous single + thread
  `spawnOrAttach` against the same workspace under slow `initialize`
  must not collide on `inFlightSpawns` (tracker keys differ by scope).
- Updated invalid-scope rejection test to assert
  `InvalidSessionScopeError` instance + carried `sessionScope` field.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
2026-05-16 23:54:20 +08:00
Dragon
8f54ae9c0f
feat(cli): add built-in status line presets with interactive dialog (#4120)
* chore(skills): add codex reproduce workflows

* feat(cli): add built-in status line presets with interactive dialog

Replace the shell-command-only status line with a preset system that
renders structured session info (model, context usage, git branch,
token counts, etc.) without external commands. Users can configure
which items to display via a new interactive dialog accessible through
/statusline or the settings UI.

- Add statusLinePresets module with 16 built-in item types
- Add StatusLineDialog component with search, multi-select, and preview
- Update /statusline command to open the preset dialog
- Extend settings schema to support { type: "preset", items: [...] }
- Enhance MultiSelect with separator items, active marker, and
  customizable checked text
- Update Footer to support theme-colored preset output

* fix(cli): refresh status line preset after saving

* chore: remove codex reproduce skills

* fix(cli): address status line preset review feedback
2026-05-16 23:22:11 +08:00
dreamWB
966b040359
feat(cli): readline Ctrl+P/N for history and selection navigation (#4082)
* feat(cli): readline Ctrl+P/N for history and selection navigation

Adds GNU-readline-style Ctrl+P (previous) and Ctrl+N (next) shortcuts
to the qwen-code TUI so users coming from bash/zsh, Emacs, or Claude
Code feel at home. The change has three orthogonal behavior groups:

1. Input prompt, history-versus-line-motion two-step edge

   Ctrl+P / Ctrl+N and the arrow keys behave identically and apply a
   two-step edge transition that matches GNU readline and Claude Code:
   inside a multi-line buffer they move the cursor between visual
   rows; on the top row with the cursor away from column 0 the first
   Up press snaps the cursor to column 0 without changing history, and
   only the second press walks one entry back. The mirror rule holds
   for Down at the last row (snap to end of line, then advance). After
   navigateUp the buffer is parked at offset 0 (the "start of older
   entry" landing position); after navigateDown setText's default
   end-of-text positioning keeps the cursor at the end. The same
   two-step rule applies to single-line buffers so the
   reverse-direction case the issue called out works: pressing Ctrl+N
   immediately after Ctrl+P loaded a single-line older entry (cursor
   at col 0) first snaps the cursor to end-of-line, and only the next
   Ctrl+N moves forward through the history. Bare k/j inside the
   input prompt remain ordinary typed letters — the vim aliases are
   selection-list shortcuts, not text-editing ones.

2. Selection lists: arrows, k/j, and Ctrl+P/N are interchangeable

   A new pair of Command bindings, SELECTION_UP and SELECTION_DOWN, is
   wired into the shared useSelectionList hook and every dialog that
   used to hand-roll an "up/down arrow only" or "up/k arrow + vim
   only" navigation check. Covered surfaces: the main selection-list
   hook itself, the MCP / extensions / agents / hooks / background-
   tasks / rewind / plugin-choice / ask-user-question dialogs, the
   memory dialog (both its file list and the auto-memory and
   auto-cleanup toggle panel above the list), the settings dialog
   list (with the in-place value editor's "block other keys while
   editing" guard preserved), and the manage-models dialog's top
   tabs row. The auth-provider wizard's Advanced Config focus rows
   and the resume-session picker's cross-mode arrows are extended
   with the readline Ctrl+P / Ctrl+N synonyms while keeping their
   existing arrow-key and (for the session picker) vim k/j semantics
   intact.

3. Selection surfaces that wrap an active text input

   AskUserQuestionDialog's "Other / type a custom answer" field,
   manage-models' search input, the resume-session picker's search
   field, and the auth-wizard's Context-window number input all
   coexist with the selection list on the same screen. In those
   surfaces typing k or j has to land in the text buffer, not scroll
   the surrounding list. The fix is to scope the input-aware handler
   to unambiguous non-letter shortcuts only — arrow keys plus
   readline-style Ctrl+P / Ctrl+N escape the text field, while bare
   letters (including k / j / p / n) are delivered to the active
   input. The keyBinding-level fix that backs this is the
   `{ key: 'k', ctrl: false }` / `{ key: 'j', ctrl: false }` clauses
   on SELECTION_UP / SELECTION_DOWN, which prevent Ctrl+K from
   accidentally matching SELECTION_UP and thereby firing both the
   list-up handler and the KILL_LINE_RIGHT handler in the same
   keystroke (the P0 finding the quality-gate review surfaced).
   Focus-traversal tokens (the agent tab bar and the background-task
   pill) and chord shortcuts (Ctrl+Shift+Up/Down for embedded-shell
   history) are deliberately left untouched because their existing
   "any printable letter yields focus back to the composer" UX would
   break under the new vim-style letter bindings, and the Help
   viewer's scroll is a viewer rather than a selection list and is
   out of this PR's scope.
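
The `ctrl: false` guard from item 3 can be illustrated with a small matcher. The binding shape here is an assumption: an omitted `ctrl` is treated as "don't care", which is what would let an unguarded `k` binding also fire on Ctrl+K. The real keyBindings module may differ.

```typescript
// Hypothetical sketch: binding shape and matcher are assumptions for illustration.
interface Key {
  name: string;
  ctrl: boolean;
}

interface KeyBinding {
  key: string;
  ctrl?: boolean; // undefined means the modifier is ignored
}

function matches(bindings: KeyBinding[], key: Key): boolean {
  return bindings.some(
    (b) => b.key === key.name && (b.ctrl === undefined || b.ctrl === key.ctrl),
  );
}

// Unguarded: bare "k" binding with no ctrl constraint.
const UNGUARDED_UP: KeyBinding[] = [{ key: 'up' }, { key: 'k' }];

// Guarded: bare k only, plus readline-style Ctrl+P.
const SELECTION_UP: KeyBinding[] = [
  { key: 'up' },
  { key: 'k', ctrl: false },
  { key: 'p', ctrl: true },
];
```

With the guarded table, Ctrl+K no longer matches the list-up command, so it can only reach the kill-line handler.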

Documentation: docs/users/reference/keyboard-shortcuts.md is updated
so the Ctrl+P / Ctrl+N entries describe the two-step edge rule and
the radio-button-select table mentions the new k/j and Ctrl+P/N
aliases. Per-dialog on-screen hints (which still read "↑↓ to
navigate") are intentionally not touched so the i18n string surface
stays unchanged; the global reference doc is the authoritative source
for the new shortcuts.

Tests:
 - packages/cli/src/ui/keyMatchers.test.ts adds positive cases
   covering ↑ / ↓ / bare k / bare j / Ctrl+P / Ctrl+N matching
   SELECTION_UP / SELECTION_DOWN and negative cases asserting that
   Ctrl+K and Ctrl+J do NOT match (the conflict guard).
 - packages/cli/src/ui/components/InputPrompt.test.tsx adds a
   "two-step edge transition for history navigation" describe block
   with four cases: a mid-line Ctrl+P snaps to col 0 without invoking
   navigateUp; an at-col-0 Ctrl+P does invoke navigateUp and then
   parks the cursor via moveToOffset(0); a not-at-end Ctrl+N snaps to
   end-of-line without invoking navigateDown; and arrow Up obeys the
   same rule as Ctrl+P for keyboard parity. The test file's mock
   buffer's setText was also corrected to mirror the real buffer's
   "cursor lands at the end of the new text" semantics so the cursor
   field is internally consistent during keypress assertions; the
   small InputPrompt render-frame snapshot in the same file's
   __snapshots__/ directory was regenerated to reflect the now-
   accurate cursor render position. Three pre-existing arrow-key
   navigation tests were updated to pre-position the mock cursor at
   the relevant edge before pressing the arrow, because the new
   two-step rule means the first arrow press at a non-edge position
   is a cursor snap, not a history step. Multi-line cursor-between-
   rows movement is covered indirectly by the keyBinding-level
   matcher tests plus the end-to-end manual demo plan.
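
The two-step edge rule that these tests exercise can be sketched for the single-line case as below. The `LineBuffer` shape and the `historyUp` / `historyDown` names are hypothetical stand-ins for this example; the real input prompt works against a richer text buffer and also handles multi-line visual rows.

```typescript
// Hypothetical single-line buffer; the real text buffer API is richer.
interface LineBuffer {
  text: string;
  cursor: number; // offset into text
}

type StepResult = 'snapped' | 'navigated';

// Up direction (Ctrl+P / arrow Up): snap to column 0 first, and only
// walk history when the cursor is already there.
function historyUp(buf: LineBuffer, navigateUp: () => void): StepResult {
  if (buf.cursor !== 0) {
    buf.cursor = 0;
    return 'snapped';
  }
  navigateUp();
  buf.cursor = 0; // park the cursor at offset 0 after the history step
  return 'navigated';
}

// Down direction (Ctrl+N / arrow Down): snap to end-of-line first, and
// only walk history when the cursor is already at the end.
function historyDown(buf: LineBuffer, navigateDown: () => void): StepResult {
  const end = buf.text.length;
  if (buf.cursor !== end) {
    buf.cursor = end;
    return 'snapped';
  }
  navigateDown();
  return 'navigated';
}
```

When the cursor already sits on the relevant edge, the snap is skipped and the first press walks history directly, which matches the behavior the four test cases describe.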

The work landed in three rounds against the planner's gate: round 1
added the unified SELECTION_UP / SELECTION_DOWN Command binding and
the cursor-first dispatch in the input prompt; round 2 picked up the
quality-gate review's P0 (the Ctrl+K double-fire in the "Other"
custom-input field) and the user's hand-test feedback on the missing
two-step edge in the reverse direction plus the MemoryDialog
top-panel sections that weren't wired through SELECTION_*; round 3
swept the remaining adjacent dialogs (SettingsDialog list,
ManageModelsDialog tabs and search transitions, ProviderSetupSteps
advancedConfig, useSessionPicker's cross-mode arrows) so the
keyboard model is uniform across the TUI.

The original issue also asks for Meta+B / Meta+F word motion and
smarter Ctrl+H token-aware backspace among other readline
conveniences. The user explicitly scoped this PR down to Ctrl+P /
Ctrl+N at the planner approval gate; the remaining wish-list items
are deferred to follow-up issues.

Closes #3821

* docs(cli): refine Ctrl+P/N input-history rows; fix Ctrl+J in selection-list comment

Both items came from a non-blocking COMMENTED review on PR #4082
(https://github.com/QwenLM/qwen-code/pull/4082#pullrequestreview-4271527787),
flagging two polish points in the readline Ctrl+P/Ctrl+N feature the parent
commit `feat(cli): readline Ctrl+P/N for history and selection navigation`
(f66427b) introduced.

The `Up Arrow`, `Down Arrow`, `Ctrl+P`, and `Ctrl+N` rows of the Input
Prompt table in `docs/users/reference/keyboard-shortcuts.md` are reworded
to describe the three-phase keystroke sequence the implementation walks
through — an intra-buffer visual-row step (a no-op in a single-line
buffer, where there's exactly one visual row), a column-edge snap when
the cursor reaches the buffer's first or last visual row with the
cursor not already at column 0 (for the up-direction pair) or
end-of-line (for the down-direction pair), and the readline-style
previous-history or next-history walk on the press after the snap. The
reviewer specifically pointed out that the prior wording described
single-line input as "navigates the input history directly", which no
longer matches the post-PR-#4082 behavior: single-line input also goes
through the snap-then-walk two-press rule (the snap is a no-op when
the cursor is already at the line's edge column, in which case the
keystroke does the history walk on its first press). The new sentence
covers the single-line and multi-line cases in one shape — single-line
is the degenerate zero-row-walk-prefix instance of the same rule. The
up-direction text is shared verbatim between the `Up Arrow` row (L31)
and the `Ctrl+P` row (L43), and the down-direction text between the
`Down Arrow` row (L27) and the `Ctrl+N` row (L42), so the keyboard-
parity alias relationship is signaled by source-side text duplication
rather than a prose cross-reference. The Input Prompt table's 234-byte
canonical row width (the separator row's `| <50-dash> | <177-dash> |`
template, which sets the column-1 and column-2 source-side widths the
file's existing untouched rows already align to) is preserved by
trailing-ASCII-space padding inside the description column.

The comment above `[Command.SELECTION_UP]` and `[Command.SELECTION_DOWN]`
in `packages/cli/src/config/keyBindings.ts` previously read

    // Selection list navigation — up/k/Ctrl+P move selection up; down/j/Ctrl+N move selection down
    // ctrl: false on k/j ensures Ctrl+K (kill-line) and Ctrl+N (history-down) are not captured here

The `Ctrl+N` half of the second line is wrong: `Ctrl+N` is intentionally
matched here as the selection-down readline alias, via the
`{ key: 'n', ctrl: true }` entry in the `SELECTION_DOWN` array literal
directly below the comment, mirroring the input-prompt-side
`[Command.HISTORY_DOWN]: [{ key: 'n', ctrl: true }]` binding at L134 of
the same file. The second Ctrl-modified key that the `ctrl: false`
opt-out on bare `k` and `j` actually guards against is `Ctrl+J`, the
ASCII line-feed (0x0A) encoding of the Enter family, which is already
bound as `{ key: 'j', ctrl: true }` inside the four-alternative
`[Command.NEWLINE]` array a few lines below. The corrected two-line
comment is

    // Selection-list nav: arrows + k/j + Ctrl+P/Ctrl+N
    // ctrl: false on bare k/j skips Ctrl+K and Ctrl+J

in the same terse no-trailing-period section-label style as the file's
adjacent `// Screen control` (L129), `// History navigation` (L132),
`// Auto-completion` (L213, post-edit numbering), and `// Text input`
(L219) header comments. A 64-line block-comment that earlier in the
review-fix cycle wrapped this same correct fact in dispatch-broadcast-
model prose plus `keyMatchers.test.ts` backreferences was condensed to
those two lines for cell-budget consistency with the rest of the file.

No code behavior change. The local verification surface the reviewer
named at the bottom of the review summary stays green: from
`packages/cli`,

    npx vitest run \
        src/ui/keyMatchers.test.ts \
        src/config/keyBindings.test.ts \
        src/ui/components/InputPrompt.test.tsx

runs 178 cases with 177 passed and one unrelated skip (the
implementation file `InputPrompt.tsx`'s feature flag for the keyboard-
queue-input-editing case that was already skipped on the parent commit),
including all four cases inside the `InputPrompt > two-step edge
transition for history navigation` describe-block — `Ctrl+P with cursor
mid-line snaps to col 0 without touching history`, `Ctrl+N with cursor
not at end-of-line snaps to end without touching history`, `Ctrl+P at
col 0 walks history and parks the cursor at offset 0`, and `arrow Up
applies the same two-step rule as Ctrl+P (snap before navigate)`. Those
four test-case names are the implementation-side anchors the new docs
wording verbally mirrors. `npx tsc --noEmit -p .` in the same package
directory reports zero diagnostics.

* fix(cli): align readline history shortcuts with dialogs

* test(cli): cover readline navigation aliases

* fix(cli): guard readline shortcuts in dialog inputs

* test(cli): cover readline aliases in more dialogs
2026-05-16 23:07:25 +08:00
tanzhenxin
8d765fec78
refactor(core): TaskBase envelope + foreground subagent persistence (#3970)
* refactor(core): TaskBase envelope + foreground subagent persistence

Establishes a shared `TaskBase` envelope across the agent / shell /
monitor task registries with a mandatory `outputFile` field. Brings the
foreground subagent path into compliance with the new contract, so it
now leaves the same JSONL transcript + meta sidecar on disk that
backgrounded subagents have always produced — closing the only gap
where a registered task wrote nothing. Renames the agent-task
discriminator from `flavor: 'foreground' | 'background'` to claw-code's
`isBackgrounded: boolean`; the deprecated names are kept as
one-release type aliases.

PR 1 of the task-registry-unification design. PR 2 will collapse the
three per-kind registries into one thin TaskRegistry plus per-kind
modules.

* refactor(core): drop unused BackgroundTaskFlavor type alias

The alias preserved only the type name. No in-tree caller used it,
and after the field rename no realistic external use survives either:
reading entry.flavor or writing { flavor: ... } fails at the use site
regardless of whether the alias resolves. Drop it instead of carrying
a hollow shim.

* fix(core): tighten foreground subagent launch path

- Register before writing the meta sidecar so a register() failure can't
  leave an orphaned 'running' meta file behind. writeAgentMeta is
  best-effort and never throws, so the inverse failure mode (registry
  entry without sidecar) is a benign degradation.
- Cache getGitBranch by cwd at the agent module level so foreground
  launches don't pay a fresh git rev-parse exec each time. Branches
  don't change within a process under normal use; the transcript
  annotation is best-effort audit metadata.
- Document on cancel() that foreground entries take a partial path
  through the method — Map deletion is the caller's responsibility
  via unregisterForeground() in the tool-call's finally path.
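
The per-cwd branch cache mentioned in the list above might look something like this. It is a sketch under assumptions: `getCachedBranch` and the injected `resolve` callback are hypothetical names, and the real code presumably wraps its own `getGitBranch` helper rather than taking a resolver argument.

```typescript
// Module-level cache keyed by working directory. A cached `undefined`
// (for example, not a git repo) is also remembered, so the resolver is
// not re-run for that cwd either.
const branchCache = new Map<string, string | undefined>();

// `resolve` stands in for the real branch lookup (for example a
// `git rev-parse` exec); it is injected here so the sketch stays
// self-contained.
function getCachedBranch(
  cwd: string,
  resolve: (cwd: string) => string | undefined,
): string | undefined {
  if (branchCache.has(cwd)) {
    return branchCache.get(cwd);
  }
  const branch = resolve(cwd);
  branchCache.set(cwd, branch);
  return branch;
}
```

The trade-off is the one the commit message names: a branch switch during the process lifetime would not be noticed, which is acceptable for best-effort audit metadata.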

* fix(agent): correct foreground meta status mapping and register order

The foreground finally block in agent.ts mapped any non-ERROR, non-CANCELLED
terminate mode (including MAX_TURNS, TIMEOUT, SHUTDOWN) to 'completed' in
the sidecar, so post-mortem readers and resume logic saw a successful
status for runs that actually hit a guardrail. Flip the ternary to mirror
the background path: GOAL -> completed, CANCELLED -> cancelled, else ->
failed.

Also reorder the background launch so registry.register() runs before
writeAgentMeta(), matching the foreground path. Both paths now share the
same orphaned-meta guarantee.
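
As a rough sketch of the corrected mapping, assuming the terminate modes are the six names this commit message lists (the `TerminateMode` and `MetaStatus` types and the `toMetaStatus` name are illustrative, not the project's actual declarations):

```typescript
// Illustrative stand-ins; the real enum and status type live in core.
type TerminateMode =
  | 'GOAL'
  | 'CANCELLED'
  | 'ERROR'
  | 'MAX_TURNS'
  | 'TIMEOUT'
  | 'SHUTDOWN';

type MetaStatus = 'completed' | 'cancelled' | 'failed';

// Only GOAL counts as success; CANCELLED is its own status; every
// other mode (guardrails and errors) is recorded as failed.
function toMetaStatus(mode: TerminateMode): MetaStatus {
  switch (mode) {
    case 'GOAL':
      return 'completed';
    case 'CANCELLED':
      return 'cancelled';
    default:
      return 'failed';
  }
}
```

Listing the success case explicitly and defaulting to `failed` means a newly added terminate mode is reported as a failure until someone decides otherwise, rather than silently reading as success.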

* test(agent): rename stale foreground-flavor test

The "default flavor (absent) behaves as background" test name and its
backwards-compat comment referenced the old optional flavor field, but
the registration shape has required isBackgrounded for a while now —
there is no "absent" path to exercise. Rename it to describe what the
assertion actually covers: that background entries fire a task-
notification on complete.

* refactor(core): alias BackgroundTaskStatus to TaskStatus

The local `BackgroundTaskStatus` union was byte-identical to the new
shared `TaskStatus` defined in `tasks/types.ts`. Replace it with a
`@deprecated` type alias so external consumers (notably
`nonInteractiveCli.ts`) keep compiling unchanged while the canonical
name lives in one place.

* refactor(core): tidy monitorRegistry signatures and document cancel ordering

Two small consistency wins flagged in review:

1. `dispatchOwnerLifecycleWake` and `dispatchNotification` were the only
   methods on the registry still typed with the deprecated `MonitorEntry`
   alias. Rename their parameters to `MonitorTask` to match every other
   signature in the file.

2. `cancel()` orders `settle()` and `abort()` differently between its two
   branches, which is intentional (silent cancel locks the terminal status
   before abort listeners run; default cancel lets a naturally-completing
   operation settle through its own terminal path). Document that
   asymmetry in a JSDoc on the method so the next reader doesn't have to
   reverse-engineer it.

* refactor(core): migrate internal BackgroundTaskStatus refs to TaskStatus

The `BackgroundTaskStatus` alias was introduced in 91b59a8fb as a
`@deprecated` synonym for external SDK consumers (notably
`nonInteractiveCli.ts`). New internal references in this PR's own
file kept the old name; migrate them so the only remaining usage of
the deprecated alias is the alias declaration itself.

No behavior change — the alias is `= TaskStatus` so the union is
identical.

* test(agent): cover foreground failed-mode terminal status mapping

The foreground finally block maps GOAL→completed, CANCELLED→cancelled,
and everything else (ERROR, MAX_TURNS, TIMEOUT, SHUTDOWN) → failed.
Only the GOAL branch was asserted; the failed-mode fallback had no
coverage even though the same mapping recently regressed (d67da6d50)
and had to be fixed by review.

Adds a table-driven case mocking getTerminateMode to ERROR /
MAX_TURNS / TIMEOUT and asserting patchAgentMeta receives
status: 'failed'. CANCELLED is already covered by the
"foreground CANCELLED prefixes the partial result" test below.

* test(agent): cover foreground CANCELLED → cancelled meta mapping

Extends the foreground terminate-mode it.each to assert that
CANCELLED is recorded as `cancelled` in the on-disk sidecar — the
existing cancel-prefix test only verified the LLM-visible payload,
leaving the patchAgentMeta mapping uncovered. A regression flipping
CANCELLED → 'failed' would now fail this case.

* test(agent): make registry path assertions platform-agnostic

The outputFile/metaPath regexes hardcoded forward slashes, so the
foreground JSONL+meta reservation test failed on Windows where paths
use backslashes. Accept either separator.
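
A separator-agnostic assertion of this kind could be written as below. The directory and file naming in the pattern are hypothetical and only illustrate the `[\\/]` character class; the real test matches the project's actual transcript layout.

```typescript
// Hypothetical file layout, used only to illustrate the separator handling.
// `[\\/]` accepts either a backslash (Windows) or a forward slash (POSIX).
const outputFilePattern = /[\\/]subagents[\\/]agent-[\w-]+\.jsonl$/;

function looksLikeOutputFile(filePath: string): boolean {
  return outputFilePattern.test(filePath);
}
```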

* fix(core): guard executeBackground register-throw window; correct outputFile contract

A throwing register() subscriber in executeBackground() would leak the
already-spawned child + open output stream, unreachable by /tasks /
task_stop. Mirror the promote path's defensive try/catch: abort the
entry's controller, destroy the stream, and rethrow so the launch fails
visibly.

Also correct the TaskBase.outputFile contract: agent JSONL is
materialized on the writer's first append, which is the launch prompt
at attach time — not the first runtime event. A subagent cancelled
before any event still leaves a prompt-only JSONL plus meta, not meta
alone.
2026-05-16 22:53:08 +08:00
jinye
379d14ad00
feat(rewind): add file restoration support to /rewind command (#4064)
* feat(rewind): add file restoration support to /rewind command (#3697)

Previously /rewind only truncated conversation history — files modified
by the assistant remained on disk. This adds a file-copy-based backup
system (ported from claude-code's fileHistory) so users can optionally
roll back file changes when rewinding.

Core changes:
- New FileHistoryService with snapshot/backup/restore lifecycle
- trackEdit() called before each file write in edit and write-file tools
- makeSnapshot() at each user turn boundary in client.ts
- Three-phase RewindSelector UI: pick turn → choose restore option → execute
- RestoreOption type: 'both' | 'conversation' | 'code' | 'cancel'

Closes #3697

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(rewind): replace findLast with reverse loop for ES2022 compat

vscode-ide-companion targets ES2022 which lacks Array.findLast.
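
A reverse loop of the kind this fix describes can be sketched as a small generic helper. The `findLastItem` name is made up for the example; the actual change may simply inline the loop at the call site.

```typescript
// Returns the last element satisfying the predicate, or undefined.
// Equivalent in effect to Array.prototype.findLast (ES2023), but only
// uses features available in ES2022 and earlier.
function findLastItem<T>(
  items: readonly T[],
  predicate: (item: T) => boolean,
): T | undefined {
  for (let i = items.length - 1; i >= 0; i--) {
    const item = items[i] as T;
    if (predicate(item)) {
      return item;
    }
  }
  return undefined;
}
```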

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(rewind): add missing i18n translations and fix test expectation

- Add file restore i18n keys to all 8 locale files (zh-TW, ca, de, fr,
  ja, pt, ru were missing)
- Update useGeminiStream test to expect promptId in user history item

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(rewind): add getFileHistoryService mock to tool tests

edit.test.ts and write-file.test.ts mock configs lacked the new
getFileHistoryService method, causing trackEdit calls to throw.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(rewind): allow Esc during diff loading and add missing i18n footer strings

Allow users to press Esc/Ctrl+C to cancel during diff stats loading
phase. Add three missing footer navigation strings to all 9 locale files.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(rewind): address review feedback — restoreBackup correctness, missing promptId warning, dead code removal

- restoreBackup now returns boolean; applySnapshot only counts a file
  as restored when the backup was actually applied (fixes misleading
  "Restored N file(s)" when backup is missing on disk)
- Show warning when user selects file restore on a turn created before
  file checkpointing was enabled (promptId undefined)
- Remove unused snapshotSequence field, canRestore(), and hasAnyChanges()
  methods that had no callers

* fix(rewind): correct diff direction, truncate snapshots on rewind, add zero-files feedback

- Swap diffLines args to diffLines(backup, current) so +/- stats
  match git convention (insertions = lines added since checkpoint)
- Truncate snapshots after rewind to discard stale timeline state,
  preventing makeSnapshot from using wrong baseline
- Show "No files needed restoration." when rewind finds files already
  at target state (all 9 locales)

* test(tools): assert trackEdit is called before file writes

* fix(i18n): add missing rewind UI locale keys across all 9 locales

* fix(core): reset fileHistoryService on session change, clean up dead code

- Reset fileHistoryService in startNewSession() so /clear gets a fresh
  instance with the new sessionId
- Rebuild trackedFiles after rewind() to avoid stale stat() calls
- Remove unused setCurrentPromptId/getCurrentPromptId dead API

* fix(rewind): validate conversation before file restore, preserve snapshots for code-only

- For 'both': validate conversation can be truncated before restoring
  files to prevent inconsistent state (files rolled back but conversation
  stays at newer state)
- For 'code'-only: pass truncateHistory=false so snapshot timeline is
  preserved — conversation turns remain visible and their snapshots stay
  available for future rewinds

* fix: correct trackEdit race comment — overwrite not orphan

* fix(types): use HistoryItemWithoutId for addItem to preserve union member properties

* fix(types): revert addItem type change, use cast at call site for promptId

* fix(rewind): guard onRewind calls with .catch() to prevent unhandled rejection

* fix(rewind): only truncate snapshot timeline when conversation truncation will execute

* fix(rewind): address tanzhenxin review - gate, partial failure, tests

1. Disable file checkpointing for non-interactive (-p) mode by gating
   on `params.interactive !== false` in addition to `!params.sdkMode`.

2. Surface partial restore failures: `rewind()` now returns
   `RewindResult { filesChanged, filesFailed }`. In "both" mode,
   conversation truncation is skipped when any file fails to restore,
   preventing inconsistent state.

3. Add comprehensive unit tests for FileHistoryService (17 tests
   covering trackEdit, makeSnapshot, rewind, eviction, diffStats).

* fix(rewind): defensive trackEdit + fix version collision on re-track

1. Wrap trackEdit calls in edit.ts and write-file.ts with try/catch
   so file history failures never break core tool operations.

2. Replace hardcoded version:1 in trackEdit with max-version lookup
   across all snapshots. Prevents backup file overwrite when the same
   file is re-tracked after a code-only rewind (truncateHistory=false).
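
The max-version lookup can be illustrated with a short sketch. The `Snapshot` and `BackupEntry` shapes here are assumptions (in particular, modeling `trackedFileBackups` as a `Map` is a guess), and `nextVersion` is a hypothetical convenience wrapper.

```typescript
// Assumed shapes for illustration; the real service stores more fields.
interface BackupEntry {
  version: number;
}

interface Snapshot {
  trackedFileBackups: Map<string, BackupEntry>;
}

// Scan every snapshot for the highest version already used for this
// file; 0 means the file has never been backed up.
function getMaxVersion(snapshots: Snapshot[], filePath: string): number {
  let max = 0;
  for (const snapshot of snapshots) {
    const entry = snapshot.trackedFileBackups.get(filePath);
    if (entry !== undefined && entry.version > max) {
      max = entry.version;
    }
  }
  return max;
}

// The next backup gets a version no surviving snapshot uses, so an
// existing backup file is never overwritten.
function nextVersion(snapshots: Snapshot[], filePath: string): number {
  return getMaxVersion(snapshots, filePath) + 1;
}
```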

* fix(rewind): add missing i18n keys + fix makeSnapshot version collision

1. Add 'Failed to restore {{count}} file(s): {{files}}' to all 7
   missing locales (ca, de, fr, ja, pt, ru, zh-TW).

2. Use global max-version scan in makeSnapshot (same as trackEdit)
   to prevent backup filename collisions after snapshot eviction.

* fix(rewind): set hasRestoreFailure when promptId is missing

In "both" mode, if the target turn has no promptId, conversation
truncation was still proceeding because hasRestoreFailure was not set.
Now correctly blocks truncation to prevent inconsistent state.

* fix(rewind): show loading state during async restore, close selector in finally

Defer setIsRewindSelectorOpen(false) to a try/finally block so the
selector stays visible during async file restore. RewindSelector now
manages its own isRestoring state: shows "Restoring..." text and
disables all keypress handlers while the restore is in progress.

This prevents the user from seeing a bare prompt with no progress
indicator during slow restores, and eliminates the race where typing
during restore could clobber the pre-filled prompt.

* fix(rewind): skip timeline truncation on partial failure + fix wording

1. rewind() now only truncates the snapshot timeline when
   filesFailed is empty, preventing loss of future checkpoints
   when the caller skips conversation truncation due to failures.

2. Change "No files needed restoration." to the more idiomatic
   "No files needed to be restored." across all 9 locales.

* fix(rewind): address review — TOCTOU in createBackup + outer catch in handleRewindConfirm

- Extract safeCopyFile(src, dst) helper that distinguishes source-missing
  (TOCTOU: file deleted between stat and copyFile) from target-dir-missing,
  so trackEdit no longer silently fails when a file disappears mid-backup.
  Same helper now covers restoreBackup.
- Wrap handleRewindConfirm with an outer catch that surfaces unexpected
  failures via historyManager error item; previously a sync throw from the
  post-rewind block would silently close the selector and leave 'both'
  mode in a half-applied state.
- Add 'Rewind failed: {{error}}' i18n key in all 9 locales.

* test(rewind): cover restoreFromSnapshots, trackEdit no-snapshot path, partial-failure timeline guard

- restoreFromSnapshots: assert relative-path shortening + external-path preservation
- trackEdit before any makeSnapshot: assert no-op early return
- rewind truncation guard: assert snapshot timeline is preserved when filesFailed > 0

* fix(rewind): clean up orphaned backups, surface no-client states, polish

- Per-eviction backup cleanup: when MAX_SNAPSHOTS overflow or rewind
  truncation drops snapshots, remove backup files no longer referenced
  by any surviving snapshot (best-effort, ENOENT-tolerant). Backup files
  are content-deduplicated across snapshots, so the live-set is computed
  from survivors before deletion.
- Surface no-client failure modes in handleRewindConfirm: 'conversation'
  mode now shows an error instead of silently returning; 'both' mode
  shows an info message after restore so the user knows the conversation
  half was skipped.
- i18n the previously hardcoded 'Conversation rewound...' message and
  add 3 new keys to all 9 locales.
- Tighten createBackup signature (drop unreachable null branch).
- Extract getMaxVersion helper to deduplicate identical loops in
  trackEdit and makeSnapshot.

Tests added: orphan-cleanup on overflow, dedupe preservation, rewind
truncation cleanup. All existing tests continue to pass (23 core, 71
AppContainer, 27 i18n).

* fix(rewind): use path separator constant in maybeShortenFilePath

The hardcoded '/' check meant Windows absolute paths (with '\') never
matched the cwd prefix, so the shortening was a no-op on Windows. The
new cleanup tests revealed this by asserting on the relative-path key:
on Windows the key was the full absolute path, so trackedFileBackups
lookups returned undefined.

Switching to the platform sep also makes Windows snapshots use the
relative key like POSIX, improving portability if cwd moves later.
restoreFromSnapshots re-runs maybeShortenFilePath on every key, so
existing on-disk sessions migrate transparently on resume.
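
A minimal sketch of separator-aware shortening follows. The `shortenFilePath` name and its explicit `sep` parameter are assumptions made so the example can show both platforms; the real `maybeShortenFilePath` presumably reads the platform separator directly.

```typescript
import * as path from 'node:path';

// Strip the cwd prefix when the file lives under cwd; otherwise return
// the path unchanged. `sep` defaults to the platform separator and is a
// parameter here only so both platforms can be demonstrated.
function shortenFilePath(
  filePath: string,
  cwd: string,
  sep: string = path.sep,
): string {
  const prefix = cwd.endsWith(sep) ? cwd : cwd + sep;
  return filePath.startsWith(prefix)
    ? filePath.slice(prefix.length)
    : filePath;
}
```

With a hardcoded `'/'`, the prefix built for a Windows cwd would be `C:\repo/`, which no real Windows path starts with, so the function would silently return the absolute path.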

* test(rewind): cover trackEdit best-effort guarantees and unchanged-file rewind

- edit.test.ts: assert tool still completes (file written, llmContent
  reflects the edit) when FileHistoryService.trackEdit rejects.
- write-file.test.ts: same for the write_file tool.
- fileHistoryService.test.ts: assert trackEdit swallows createBackup
  failures (forced via storageDir-replaced-with-file → ENOTDIR in
  recursive mkdir) without recording any backup.
- fileHistoryService.test.ts: assert applySnapshot leaves a file
  untouched (mtime unchanged, filesChanged empty) when its content
  already matches the target backup — covers the
  checkOriginFileChanged short-circuit.

* fix(rewind): align fileCheckpointing default + surface backup-missing on rewind

Two issues from a Codex review pass:

- Config: `fileCheckpointingEnabled` defaulted via `params.interactive !== false`,
  which resolves truthy when the caller omits `interactive` — but `this.interactive`
  itself defaults to `false`. Headless/programmatic callers that did not set
  `interactive` would silently start writing file-history backups under
  `~/.qwen/file-history/`. Use the same `?? false` default so the gate matches
  the resolved interactive value.

- checkOriginFileChanged: when the on-disk backup AND the working file have both
  been removed externally, the function returned `false` ("unchanged"), so
  `applySnapshot` skipped `restoreBackup` and rewind reported success even though
  the target snapshot expected the file to exist. Treat any failure to stat the
  backup as "changed" so callers attempt the restore: applySnapshot surfaces the
  missing backup via restoreBackup → filesFailed, makeSnapshot creates a fresh
  backup. Added a regression test for the both-missing path.
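
The default mismatch in the first item can be seen in a small sketch. The `ConfigParams` shape and the two function names are illustrative only; they model the gate described here, not the actual `Config` constructor.

```typescript
// Illustrative parameter shape; the real Config takes many more fields.
interface ConfigParams {
  interactive?: boolean;
  sdkMode?: boolean;
}

// Before: an omitted `interactive` passes the `!== false` check, so the
// gate opens for callers that never asked for interactive mode.
function checkpointingBefore(params: ConfigParams): boolean {
  return !params.sdkMode && params.interactive !== false;
}

// After: resolve `interactive` with the same `?? false` default the
// rest of the config uses, then gate on the resolved value.
function checkpointingAfter(params: ConfigParams): boolean {
  const interactive = params.interactive ?? false;
  return !params.sdkMode && interactive;
}
```

The two versions agree whenever `interactive` is set explicitly; they differ only when it is omitted, which is the headless case the fix targets.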

* fix(rewind): mark per-file backup failures so rewind surfaces them

Two related issues from a /review pass:

1. Silent data loss in makeSnapshot inheritance: when the per-file
   backup attempt threw inside makeSnapshot, the catch block left the
   path missing from `trackedFileBackups`, and the inheritance loop
   then copied the previous snapshot's backup into the new snapshot.
   A later rewind to that snapshot would restore older content while
   reporting success.

   Now the catch records `{ failed: true, ... }` for the path. The
   inheritance loop skips paths already present in trackedFileBackups,
   so failed paths are no longer paved over by stale carryover. Both
   applySnapshot and getDiffStats honor `failed` — rewind pushes the
   path to filesFailed and the diff preview omits it.

2. Marketing/scope mismatch: the rewind UI offers "Restore code" but
   the feature only tracks edits made via the `edit` and `write_file`
   tools — shell-mediated changes (`sed -i`, `cp`, `rm`, `mv`,
   `npm`, etc.) and out-of-tool manual edits are not captured.
   Added a class-level JSDoc on FileHistoryService spelling out the
   scope, and an inline footer in the restore-options panel:
   "Rewinding does not affect files edited manually or via shell
   commands." (matching the upstream claude-code MessageSelector
   wording). New i18n key in all 9 locales.

Test added: trackEdit/makeSnapshot per-file failure path. Asserts
the new snapshot has `failed: true`, and that rewind to that snapshot
reports the file as filesFailed instead of silently restoring the
inherited stale backup.

* fix(rewind): polish — i18n, type tightening, resumed-session UX hint

Several small wins from the latest /review pass plus a UX mitigation for
turns whose file-history snapshot is not present in memory (most often
because the conversation came from a resumed session, but also when a
turn has no captured edits):

- AppContainer: wrap the "Cannot rewind to a turn that was compressed"
  error in t(); add the new key to all 9 locales.
- RewindSelector: replace the inline `(+N -M in K file/files)` template
  literal with t() using two plural-aware keys; add to all 9 locales.
- DiffStats.filesChanged: tighten from optional to required to match
  reality (every code path that returns a DiffStats sets it). Drops the
  `!.filesChanged!` non-null cascade in RewindSelector.
- RewindSelector phase 2: when the option list does not contain
  code/both (i.e. no file-restore is actionable for this turn), show
  an explicit hint instead of leaving the user to guess why those
  options are missing. Same i18n key in all 9 locales.

The mitigation hint covers the resumed-session case Tan raised
(snapshots are not rehydrated by `/resume` today) without changing
behavior — `getRestoreOptions` already gracefully degrades to
conversation-only when `getDiffStats` returns undefined for a snapshot
that is not in memory; we just surface the "why" to the user.

* fix(rewind): unstick failed marker on the unchanged-file fast path

The `failed: true` marker added in d59838338 was sticky: once set, the
no-change optimization in `makeSnapshot` would copy the failed entry
forward into every subsequent snapshot for as long as the file stayed
unchanged. A single transient I/O error therefore poisoned `/rewind`
for that file until the user happened to modify the content again.

Add `!latestBackup.failed` to the no-change reuse guard so a failed
entry is never copied forward — the next snapshot retries the backup,
which either heals (when the underlying I/O has recovered) or honestly
records another failed entry.
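
The reuse guard can be sketched as a single predicate. The `BackupEntry` fields and the `canReuseBackup` name are assumptions for illustration; in the real `makeSnapshot` the check is presumably inline.

```typescript
// Assumed entry shape; only `failed` matters for this sketch.
interface BackupEntry {
  backupFileName: string | null;
  version: number;
  failed?: boolean;
}

// The previous entry may be copied forward only when it exists, the
// file is unchanged since it was taken, and it is not a failed marker.
function canReuseBackup(
  latestBackup: BackupEntry | undefined,
  fileChanged: boolean,
): boolean {
  return latestBackup !== undefined && !fileChanged && !latestBackup.failed;
}
```

When the predicate is false for a failed entry, the caller falls through to a fresh backup attempt, which is what lets a transient I/O error heal on the next snapshot.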

New regression test (`does not carry a failed marker forward when the
file is unchanged`):

- Snapshot p1 with file content X
- Sabotage the storage dir → p2's per-file backup throws → p2 records
  failed: true
- Restore the storage dir; file still equals X
- p3 must NOT copy p2's failed entry; it must retry createBackup and
  produce a fresh non-failed entry that allows rewind to p3 to succeed
2026-05-16 22:16:01 +08:00
qqqys
0dde1ad704
feat(cli): add session-scoped /goal command with judge-driven turn continuation (#4123)
* feat(cli): add session-scoped /goal command with judge-driven turn continuation

`/goal <condition>` pins a free-form objective for the rest of the session.
While a goal is active, an LLM judge runs at every Stop boundary and either
lets the turn end (condition met) or feeds the judge's reason back as the
next user prompt to keep the model working. Auto-clears on success;
`/goal clear` cancels early. Same primitive as Anthropic's Claude Code
2.1.140 `/goal`, built on qwen-code's existing Stop-hook + function-hook
plumbing — no new subsystem.

Core (packages/core/src/goals/):
  - activeGoalStore: per-session active goal + last-terminal cache, with a
    terminal-observer channel the CLI subscribes to so achieved/aborted
    cards land in history.
  - goalJudge: side-query against a fast model, transcript-grounded
    system prompt + json_schema response + disabled thinking. Tolerant
    JSON extraction with fallback so a flaky judge can't kill the loop;
    30s default timeout (vs. the 5s function-hook default that was
    silently killing real-world judge calls).
  - goalHook: function hook on Stop. Returns {decision:'block', reason}
    when not met (reusing client.ts's existing recursive continuation),
    {continue:true} when met. Self-clears active goal + notifies the
    terminal observer on met/aborted. MAX_GOAL_ITERATIONS=50 backstop.
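
The Stop-hook decision described in the goalHook item can be sketched as a pure function. Everything here other than `MAX_GOAL_ITERATIONS`, the `{decision:'block', reason}` and `{continue:true}` outputs, and the achieved/aborted outcomes is an assumption: the `JudgeVerdict` shape, the `decideStop` name, and in particular the choice to let the turn end once the iteration cap is reached are guesses at how the backstop behaves.

```typescript
const MAX_GOAL_ITERATIONS = 50;

// Hypothetical judge result; the real judge returns a parsed JSON verdict.
interface JudgeVerdict {
  met: boolean;
  reason: string;
}

type StopHookOutput =
  | { decision: 'block'; reason: string }
  | { continue: true };

interface GoalOutcome {
  output: StopHookOutput;
  // Set when the goal ends and should be cleared.
  terminal?: 'achieved' | 'aborted';
}

function decideStop(verdict: JudgeVerdict, iteration: number): GoalOutcome {
  if (verdict.met) {
    return { output: { continue: true }, terminal: 'achieved' };
  }
  // Assumed backstop behavior: stop looping once the cap is reached.
  if (iteration >= MAX_GOAL_ITERATIONS) {
    return { output: { continue: true }, terminal: 'aborted' };
  }
  // Not met: block the stop and feed the judge's reason back as the
  // next prompt so the model keeps working.
  return { output: { decision: 'block', reason: verdict.reason } };
}
```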

CLI:
  - goalCommand: /goal | /goal <cond> | /goal clear|stop|off|reset|none|
    cancel. 4000-char cap, trust + disableAllHooks gates. Empty /goal
    shows running status, falls back to the last completed summary.
  - GoalPill: footer chip "◎ /goal active (12s)" — terse, claude-aligned.
  - GoalStatusMessage: set / checking / achieved / cleared / aborted
    history cards. "checking" replaces the generic stop_hook_loop chip
    for goal-driven iterations.
  - restoreGoal: on session resume, rehydrate the active goal hook +
    last-terminal cache from transcript so /goal survives /resume.

Cross-cutting fixes:
  - HookSystem.hasHooksForEvent(eventName, sessionId?): also consults
    SessionHooksManager. Previously SDK / programmatic Stop function
    hooks were silently gated out by client.ts's fast-path check, so
    they never fired.
  - client.ts: yield StopHookLoop on every continuation iteration (was
    iter > 1) — first not-met turn is now visible in the UI.
  - useGeminiStream: commit pending item + clear thoughtBuffer /
    geminiMessageBuffer on every Finished event. Fixes a UI bug where
    a Stop-hook continuation's text bled into the prior turn's pending
    history item (cumulative "te" / "tes" rendering), even though the
    persisted transcript was clean.

Co-authored-by: Qwen-Coder <noreply@qwen.ai>

* test(cli): fix footer goal pill mock

* fix(goal): persist terminal status on restore

* fix(goal): harden judge hook

* fix(goal): sanitize condition in instruction prompt and update matcher test

- goalCommand.ts: collapse newlines and downgrade embedded double-quotes in
  the condition before splicing into the instruction prompt so the wrapping
  quote structure stays intact.
- goalLoop.integration.test.ts: matcher assertion updated to '*' to match the
  current registerGoalHook contract (previously '').
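
The sanitization step can be sketched roughly as follows. `sanitizeCondition` and `buildInstruction` are hypothetical names, the instruction wording is invented for illustration, and mapping double quotes to single quotes is one assumed reading of "downgrade"; the actual goalCommand.ts code may differ.

```typescript
// Hypothetical helper; not the actual goalCommand.ts implementation.
function sanitizeCondition(condition: string): string {
  return condition
    .replace(/\s*[\r\n]+\s*/g, ' ') // collapse each newline run (plus surrounding whitespace) to one space
    .replace(/"/g, "'") // downgrade embedded double quotes to single quotes
    .trim();
}

// Illustrative prompt wrapper: the condition sits inside one pair of double quotes.
function buildInstruction(condition: string): string {
  return `Keep working until this goal is met: "${sanitizeCondition(condition)}"`;
}
```

Because the sanitized condition can no longer contain a double quote or a newline, the wrapping quotes in the instruction always stay balanced on a single line.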

Co-authored-by: Qwen-Coder <noreply@alibabacloud.com>

* feat(goal): surface judge reason on terminal cards

Renders `Last check: <reason>` on the achieved / aborted history card
and on the empty-`/goal` summary so the final view records *why* the
judge ruled the goal complete. Uses a single inline-label Text instead
of the flex-row split used for `Goal:` — the reason is capped at 240
chars and almost always wraps; the flex-row variant hangs the
continuation at the value column's left edge (~12 cols of blank space,
easily mistaken for a stray empty line). Single Text + natural wrap
keeps the continuation flush.

Co-authored-by: Qwen-Coder <noreply@alibabacloud.com>

* fix(goal): re-arm /goal on runtime /resume and /branch

Cold boot path in AppContainer already calls restoreGoalFromHistory after
loading session data, but the runtime /resume and /branch paths skipped
it entirely. After /new + /resume back to a session that had an active
/goal, the in-memory activeGoalStore entry still held the pre-/new
setAt and a hookId pointing to a hook that config.startNewSession() had
torn down — leaving the footer pill ticking from the original setAt
(observable as "tens of seconds" elapsed immediately after resume) while the
Stop hook was silently dead.

Wire restoreGoalFromHistory into both handlers right after the session
data lands so unregisterGoalHook clears the stale entry and
registerGoalHook re-arms with a fresh setAt / hookId and re-installs
the terminal observer.

Co-authored-by: Qwen-Coder <noreply@alibabacloud.com>

* refactor(goal): reuse shared formatDuration utility

Drop the duplicated local formatDuration from goalCommand.ts and
GoalStatusMessage.tsx in favor of the shared formatters.ts version,
called with { hideTrailingZeros: true }. The shared util already has
its own test suite and matches Claude Code's ShellTimeDisplay style
(round values drop zero-unit tails: `5m 0s` → `5m`).
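
To illustrate the trailing-zero behavior, here is a stand-in formatter. It is not the shared formatters.ts implementation: `formatDurationSketch` is a hypothetical name, and only the `hideTrailingZeros` option and the `5m 0s` to `5m` example come from this commit message.

```typescript
// Stand-in for illustration only; not the shared formatters.ts code.
function formatDurationSketch(
  ms: number,
  opts: { hideTrailingZeros?: boolean } = {},
): string {
  const totalSeconds = Math.max(0, Math.floor(ms / 1000));
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = totalSeconds % 60;

  // Start from the largest non-zero unit and go down to seconds.
  let parts: Array<[number, string]> = [[h, 'h'], [m, 'm'], [s, 's']];
  while (parts.length > 1 && parts[0][0] === 0) parts = parts.slice(1);

  if (opts.hideTrailingZeros) {
    // Drop zero-valued units at the tail, always keeping at least one unit.
    while (parts.length > 1 && parts[parts.length - 1][0] === 0) {
      parts = parts.slice(0, -1);
    }
  }
  return parts.map(([n, unit]) => `${n}${unit}`).join(' ');
}
```

With this sketch, `formatDurationSketch(300000)` yields `5m 0s`, while passing `{ hideTrailingZeros: true }` yields `5m`.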

Co-authored-by: Qwen-Coder <noreply@alibabacloud.com>

* fix(goal): abort judge API call on judge timeout

The judge-timeout path in judgeGoalWithTimeout only resolved a fallback
verdict; the underlying judgeGoal generateContent call kept running
because the hook context signal is never aborted by the timeout. Each
timeout leaked one in-flight request that accumulated across goal-loop
iterations. Link an AbortController into the judge signal and abort it
when the timeout fires.
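
A rough sketch of the linking described above, assuming the judge call accepts an AbortSignal. `judgeWithTimeout`, the `Verdict` shape, and the fallback verdict's contents are hypothetical; the real judgeGoalWithTimeout may be structured differently.

```typescript
type Verdict = { met: boolean; reason: string };

// `judge` stands in for the model call; it must honor the signal it is given.
async function judgeWithTimeout(
  judge: (signal: AbortSignal) => Promise<Verdict>,
  hookSignal: AbortSignal,
  timeoutMs: number,
): Promise<Verdict> {
  const controller = new AbortController();

  // Forward an abort of the outer hook signal to the judge call.
  const onHookAbort = () => controller.abort();
  if (hookSignal.aborted) controller.abort();
  else hookSignal.addEventListener('abort', onHookAbort, { once: true });

  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Verdict>((resolve) => {
    timer = setTimeout(() => {
      // Resolve first so the fallback wins the race, then cancel the
      // in-flight request instead of leaving it running.
      resolve({ met: false, reason: 'judge timed out' });
      controller.abort();
    }, timeoutMs);
  });

  try {
    return await Promise.race([judge(controller.signal), timeout]);
  } finally {
    clearTimeout(timer);
    hookSignal.removeEventListener('abort', onHookAbort);
  }
}
```

Promise.race keeps a handler attached to the judge promise, so the rejection that follows the abort does not surface as an unhandled rejection.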

Co-Authored-By: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(goal): harden judge continuation feedback

* test(goal): align loop integration with safe continuation

* fix(cli): harden goal resume lifecycle

* fix(cli): address goal review blockers

* fix(goal): guard stale same-condition callbacks

---------

Co-authored-by: Qwen-Coder <noreply@qwen.ai>
Co-authored-by: Qwen-Coder <noreply@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-16 18:14:13 +08:00
jinye
264ed82273
[codex] feat(serve): add capability registry protocol versions (#4191)
* feat(serve): add capability registry protocol versions

Introduce a serve capability registry and advertise protocolVersions from /capabilities while preserving the existing v1 envelope and Stage 1 feature aliases. Update SDK wire types, docs, and focused tests for old-daemon compatibility.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(serve): clarify capability advertisement semantics

Address PR review feedback by preserving historical capability versions, separating registered and advertised feature helpers, testing protocol version metadata directly, and keeping runtime exports out of the serve types module.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-16 18:07:38 +08:00
qqqys
96b30ee427
feat(cli): add baseline /doctor memory diagnostics (#4180)
* feat(cli): add baseline doctor memory diagnostics

* fix(cli): address doctor memory review feedback

* feat(cli): add doctor memory assessment

* feat(cli): support doctor memory heap snapshots

* feat(cli): add doctor memory sampling

* fix(cli): harden doctor memory heap snapshots

* fix(cli): harden doctor memory heap snapshots

* fix(cli): harden memory heap snapshot diagnostics

* fix(cli): harden doctor memory snapshots

* fix(cli): stabilize heap snapshot cleanup ordering

* fix(cli): harden heap snapshot cleanup

* test(cli): cover memory snapshot fallbacks

* fix(cli): harden doctor memory abort and disk checks
2026-05-16 17:19:50 +08:00
qqqys
372acf1444
feat(cli): argument hint + --auto completion for /rename (#4048)
* feat(cli): argument hint + --auto completion for /rename

Closes #4047.

The /rename command supports a structured --auto flag (let the fast
model generate a sentence-case title from the conversation), but
unlike /model — which advertises --fast via argumentHint and a
completion entry — /rename's flag was undocumented inline. Users had
to either run the command incorrectly or check the docs to learn
about --auto.

- argumentHint: '[--auto] [<name>]' so the completion menu shows the
  shape when the user types `/rename` and tabs.
- completion: returns null on empty / free-text input (don't shadow
  the user typing a title) and surfaces --auto when the partial arg
  is a prefix of it ('-', '--', '--a', '--au', '--auto'). Same shape
  as /model's --fast handling.

Free-text titles intentionally don't auto-complete — there's nothing
meaningful to suggest, and offering --auto on every keystroke would
feel like noise on `/rename my-feature`.

Tests:
- pins argumentHint shape
- empty partial → null
- '-' / '--' / '--a' / '--au' / '--auto' all return the --auto suggestion
- 'my-feature' / 'fix bug' / '-x' return null (free-text path)
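
The completion rule covered by these tests can be sketched as a prefix check. `completeRenameArg` is a hypothetical name and returning a plain string array is an assumption about the suggestion shape; the real completion hook's signature is not shown in this commit message.

```typescript
const AUTO_FLAG = '--auto';

// Hypothetical completion function: returns suggestions, or null to stay out of the way.
function completeRenameArg(partial: string): string[] | null {
  if (partial.length === 0) return null; // empty input: do not shadow a free-text title
  // Suggest the flag only while the input is still a prefix of it.
  return AUTO_FLAG.startsWith(partial) ? [AUTO_FLAG] : null;
}
```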

Co-Authored-By: Qwen-Coder <noreply@qwen.ai>

* fix(core): fall back to text JSON when generateJson gets no tool call

generateJson registers schemas as a respond_in_schema function
declaration and walks parts[].functionCall for the result. When no
tool_choice is set (the OpenAI-compatible converter never sets one) and
the system prompt explicitly asks for text JSON — e.g. session-title
generation's "Return ONLY a JSON object..." — some models honor the
prompt and emit the answer as a plain text part instead of calling the
tool. The answer is semantically correct; we just weren't reading it.

This bottoms out in /rename --auto as "The fast model returned no
usable title" on qwen3.6-max-preview, and likely affects every other
generateJson caller (next-speaker checker, edit corrector, etc.) on
the same class of model.

Add a tolerant fallback: when no function call comes back, parse
getResponseText(result) — which already skips thought parts — with a
JSON-object extractor that strips optional ```json fences and reads
the outermost {...} block. Strictly additive; the function-call path
stays primary.
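
A minimal sketch of such a tolerant extractor, under assumptions: `extractJsonObject` is a hypothetical name, and the fence handling and object-only check are one plausible reading of the description above rather than the code that landed.

```typescript
// Built at runtime so this example never contains a literal triple backtick.
const FENCE = '`'.repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]*?)' + FENCE, 'i');

// Hypothetical extractor; the real helper behind generateJson may differ.
function extractJsonObject(text: string): Record<string, unknown> | null {
  // Strip an optional json fence around the payload.
  const fenced = text.match(FENCED_BLOCK);
  const body = fenced ? fenced[1] : text;

  // Read the outermost {...} block.
  const start = body.indexOf('{');
  const end = body.lastIndexOf('}');
  if (start === -1 || end <= start) return null;

  try {
    const parsed: unknown = JSON.parse(body.slice(start, end + 1));
    return typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)
      ? (parsed as Record<string, unknown>)
      : null;
  } catch {
    return null; // not valid JSON: the caller reports that nothing usable came back
  }
}
```

A caller would try the function-call result first and reach for this only when no function call is present, which keeps the fallback strictly additive.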

Closes #4057.

Co-Authored-By: Qwen-Coder <noreply@qwen.ai>

* refactor(cli): unify /rename and /rename --auto pipelines

Bare /rename (no args) used to call a private generateKebabTitle path
that asked the fast model (or main-model fallback) for a 2-4 word
kebab-case name via a plain text call. /rename --auto used the
schema-enforced tryGenerateSessionTitle path for a 3-7 word sentence-
case title. Two code paths, two prompts, two failure-message formats,
two sanitizers — with the kebab path consistently lagging on history
filtering, surrogate handling, and error specificity.

Collapse to a single fast-model schema-enforced pipeline. Both bare
/rename and /rename --auto now call tryGenerateSessionTitle and both
record titleSource: 'auto' on success. The --auto flag stays as an
explicit user-intent marker (preserves the existing argumentHint /
completion / parseArgs surface) but no longer diverges semantically.

Bare /rename now also hard-requires fastModel; users who relied on
the main-model fallback need to either /model --fast <name> or pass
a name explicitly (/rename <name>). The new failure message points
at both options.

Co-Authored-By: Qwen-Coder <noreply@qwen.ai>

* fix(cli): clarify rename title failure

* test(core): cover loose json fallback

---------

Co-authored-by: Qwen-Coder <noreply@qwen.ai>
2026-05-16 16:47:15 +08:00
jinye
435f711e33
feat(cli): warn users that rewind is disabled in IDE mode (#4122)
2026-05-15 20:27:37 +08:00