Commit graph

5905 commits

Author SHA1 Message Date
pomelo-nwu
f4e01a409e fix(auth): address PR #4287 review (critical + suggestion)
vscode AuthMessageHandler (Critical):
- Add the missing protocol-selection step so custom-provider users can
  pick Anthropic/Gemini instead of being silently locked to OpenAI.
- Validate free-form base URL with the same /^https?:\/\// check the
  CLI uses; reject file:/javascript: schemes.

vscode AuthMessageHandler (Suggestion):
- Stop filtering separator entries from the provider QuickPick so
  groups (Alibaba Cloud / Third Party / Custom) actually show as
  headers instead of a flat list.
- Treat a null authInteractiveHandler as an error: surface an
  authError + cancellation notification instead of silently dropping
  the user's input.
- Call notifyAuthCancelled when validateApiKey rejects so the
  webview state resets and the user can retry.

core/providers/presets/openrouter.ts (Critical):
- Replace the substring includes() in ownsModel with a URL-hostname
  match so paths like https://api.example.com/openrouter.ai/v1 stop
  being misidentified as OpenRouter models (and getting removed on
  re-install).

vscode/services/settingsWriter.ts (Critical):
- stripTrailingCommas() so JSONC files with trailing commas (VSCode's
  default style) parse instead of silently returning {} and then
  overwriting the entire settings file.
- readSettings() distinguishes ENOENT (return {}) from parse errors
  (log + rethrow) so a malformed file never gets clobbered.
- writeSettings() writes through a temp file + fs.renameSync atomic
  rename, eliminating the half-written file window on EACCES /
  disk-full / crash.
- setValue() refuses to overwrite a scalar at an intermediate path
  segment (would have silently destroyed e.g. {"env": "legacy-string"}).

core/providers/install.ts (Suggestion):
- Move settings.backup?.() inside the try block so a backup failure
  still triggers the env-rollback path in catch.

cli/config/loadedSettingsAdapter.ts (Suggestion):
- Add the same UNSAFE_KEY_PARTS guard the vscode adapter has, so
  __proto__/constructor/prototype segments are rejected before
  reaching the underlying setNestedPropertySafe walker. Defense in
  depth: not exploitable today but the utility has no built-in guard.

vscode/webview/providers/WebViewProvider.ts (Suggestion):
- Hoist buildInstallPlan / applyProviderInstallPlanToFile to static
  imports (both modules already top-level imported); drops two
  per-call await import() round-trips.

cli/utils/doctorChecks.ts (Suggestion):
- Whitespace nit before the comma in the qwen-code-core import.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-19 13:58:25 +08:00
pomelo-nwu
e5191b885a fix(vscode): await applyProviderInstallPlanToFile and grow test coverage
Critical: applyProviderInstallPlanToFile fired the install plan with
`void`, so any rejection (EACCES from persist(), prototype-pollution
guard throw, etc.) was silently swallowed and WebViewProvider proceeded
to disconnect/reconnect the agent as if the write had succeeded.
Make the wrapper `async` and `await` it in the only caller.

Tests added:
- core/install.test: isSameModelIdentity fallback path
  (prepend-and-remove-owned with no ownsModel) — verifies models are
  matched on id+baseUrl, not just id.
- vscode/AuthMessageHandler.test: happy-path with a fixed-baseUrl
  third-party provider, validateApiKey error branch, and BaseUrlOption
  picker presentation.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-19 10:21:31 +08:00
pomelo-nwu
18b35b48ec i18n(cli): translate "Connect an LLM provider" in all locales
Strict-parity locales (zh, zh-TW) require every built-in command
description to be translated; the renamed /auth description was
falling back to English and breaking the must-translate test.

Add translations for zh / zh-TW (required) and refresh the other
seven locales (en, ru, de, ja, fr, ca, pt) so the old
"Configure authentication information for login" key is removed
everywhere rather than left as a dangling dictionary entry.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-18 22:36:09 +08:00
pomelo-nwu
1b0b7f93a8 refactor(cli): rename /auth description to "Connect an LLM provider"
The old description ("Configure authentication information for login")
implied a Qwen-account login. After the /auth refactor it's really
about picking an LLM provider and entering credentials, so the menu
entry should say that.

Also add 'connect' as an alt-name alongside the existing 'login' so
users can type /connect when 'auth' feels wrong. Keep 'login' for
muscle-memory compatibility.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-18 22:12:36 +08:00
pomelo-nwu
ed08f7296b fix(auth): show base URL default as placeholder, not prefilled value
In Custom Provider Step 2/6 (and on protocol switch), the base URL
input started with the protocol's default URL pre-filled. Users who
wanted a non-default endpoint had to manually clear the field first.

Switch to placeholder semantics: the input starts empty, the default
URL is shown as a hint, and submitting blank falls back to that
default (then writes it back to baseUrl so downstream steps see a
real value).

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-18 21:56:28 +08:00
pomelo-nwu
0d8fe738f5 fix(auth): default Audio modality to off in provider advanced config
In the /auth Custom Provider advanced-config step, "Enable modality"
should default to Image + Video only. Audio was on by default, which
implied the model accepts audio input even though most providers
people configure here don't.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-18 21:49:31 +08:00
pomelo-nwu
c7f016271c fix(vscode): guard ProviderSettingsAdapter against prototype pollution
The dotted-key writer in createFileSettingsAdapter walked through any
segment, including __proto__/constructor/prototype, which would let a
malicious or malformed ProviderInstallPlan reach Object.prototype.

Refuse to write paths containing reserved segments and use
hasOwnProperty when traversing intermediate objects so that inherited
properties cannot redirect the walk.

Addresses CodeQL alert #226 surfaced on PR #4287.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-18 21:19:53 +08:00
pomelo-nwu
efcfce5b6b Merge remote-tracking branch 'origin/main' into refactor/unify-provider-config-to-core
# Conflicts:
#	packages/cli/src/ui/components/ManageModelsDialog.test.tsx
#	packages/cli/src/ui/components/ManageModelsDialog.tsx
#	packages/cli/src/utils/apiPreconnect.ts
#	packages/core/src/providers/all-providers.ts
2026-05-18 20:49:31 +08:00
pomelo-nwu
b4aa4b4f38 refactor(providers): prefix Alibaba plan presets with alibaba-
Rename coding-plan.{ts,test.ts} → alibaba-coding-plan.{ts,test.ts} and
token-plan.{ts,test.ts} → alibaba-token-plan.{ts,test.ts} so the file
names line up with the existing alibaba-standard preset and make it
obvious at a glance which presets belong to Alibaba ModelStudio.

Export names (codingPlanProvider, tokenPlanProvider, TOKEN_PLAN_*,
CODING_PLAN_*) are unchanged — only the file paths and the two
imports in all-providers.ts / index.ts move.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-18 18:48:41 +08:00
pomelo-nwu
9431966625 refactor(providers): drop unused authMethod field from ProviderConfig
Every preset has had authMethod='input' since OpenRouter switched to the
standard API-key flow, making the field a dead dimension. Removing it
cleans up three never-taken branches and aligns the type with reality:
connecting a provider always means entering an API key.

- core: remove ProviderConfig.authMethod; shouldShowStep('apiKey') is
  now unconditionally true; drop authMethod from 9 presets
- vscode AuthMessageHandler: drop the OAuth branch in handleAuthInteractive
- vscode WebViewProvider: simplify the apiKey-required guard
- tests: update provider-config.test and custom-provider.test

If a future provider needs a browser-based flow, the field can be
re-introduced; for now the smaller surface is worth more.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-18 18:45:17 +08:00
pomelo-nwu
6804a18f01 refactor(auth): unify applyProviderInstallPlan in core, drop cli/auth
CLI and vscode now share core's applyProviderInstallPlan instead of keeping
two parallel implementations. The CLI-only env rollback (snapshot
process.env, restore on error) is folded into the core version so vscode
also benefits from it.

CLI ships a LoadedSettingsAdapter that maps LoadedSettings to core's
ProviderSettingsAdapter contract. Backup/restore is layered: write a .orig
file, structuredClone settings + originalSettings, then recomputeMerged()
on restore — same guarantees as before, just routed through the adapter.

Tests for the install logic are migrated to core and rewritten against the
adapter mock (more focused than the previous LoadedSettings/Config mocks).

packages/cli/src/auth/ is gone entirely.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-18 18:14:12 +08:00
pomelo-nwu
423e453069 refactor(cli): drop OpenRouter OAuth + /manage-models, simplify /auth
OpenRouter now uses the standard API-key flow under "Third-party Providers"
(issue #4108). The whole OpenRouter OAuth implementation (PKCE, callback
server, model auto-install) and the /manage-models command (only OpenRouter
was wired in; /auth Step 2 already covers model selection) are removed.

/auth is renamed around the "Connect a Provider" mental model:
- Dialog title is now "Connect a Provider"; the OAuth main entry is gone
- handleAuthSelect (mixed close + auth trigger) is split into a single-purpose
  closeAuthDialog; legacy wrappers (handleSubscriptionPlanSubmit,
  handleApiKeyProviderSubmit, handleCustomApiKeySubmit, ...) are dropped in
  favor of the unified handleProviderSubmit

Core: openRouterProvider switches to authMethod='input', uiGroup='third-party',
ships with two recommended free models, and is reordered to the end of the
third-party list to keep DeepSeek as the default highlight.

Net diff: 34 files, +124 / -3835.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-18 17:54:40 +08:00
jinye
33b2e0dccc
fix(serve): normalize Windows path separators in workspace file read responses (#4279)
`workspaceRelative` returned the platform-native separator from
`path.relative`, leaking backslashes into `/file`, `/stat`, `/list`,
and `/glob` response paths on Windows. Surfaced as a Windows-only CI
failure in the `GET /glob > scopes glob matches to cwd` test
(`['sub\\inside.ts']` vs expected `['sub/inside.ts']`).

Always emit POSIX-style separators so SDK consumers see the same
shape across platforms.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
2026-05-18 17:53:14 +08:00
jifeng
b208f34455
feat(serve): add /demo debug page for qwen serve daemon (#4132)
* feat(serve): add /demo debug page for qwen serve daemon

Add a self-contained HTML debug page at GET /demo that provides a
browser-based UI for exercising all daemon routes: session
create/attach, prompt send/cancel, SSE event streaming, model
switching, permission voting, and health/capabilities checks.

Also add a same-origin exemption middleware (before the CORS deny
layer) so browser fetch calls from the demo page pass through while
external Origins remain blocked.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(serve): address CR feedback for /demo page security and robustness

- Fix XSS: build permission buttons with DOM APIs instead of innerHTML
- Fix SSE: move currentEvent outside read loop for cross-chunk frames
- Fix SSE: handle stream end (flush trailing buffer, update UI status)
- Security: move /demo route behind hostAllowlist and bearerAuth guards
- Security: add host.docker.internal to same-origin Origin allowlist
- Add Auth Token input and include Authorization header in API/SSE calls
- Add try/catch to /demo route handler with writeStderrLine logging
- Check API result before removing permission card from UI
- Add 7 tests for /demo route and Origin-stripping middleware

* fix(serve): move /demo before bearerAuth so browsers can reach it

Browsers cannot attach Authorization headers on address-bar navigation,
so /demo behind bearerAuth was unreachable when --token was set. Move
the /demo route after CORS + Host allowlist but before bearerAuth. The
static HTML shell contains no secrets; all API/SSE routes remain
bearer-protected and the in-page token input authenticates them.

* feat(serve): show 401 token hint on demo page

When an API call returns 401 Unauthorized, highlight the Auth Token
input field with a yellow border and display a hint message guiding
the user to enter their bearer token. Applies to both API calls and
SSE connections. The hint auto-dismisses after 6 seconds.

* fix(serve): address round-2 CR feedback for /demo security

- Loopback-gate /demo like /health: pre-auth on loopback, post-auth on
  non-loopback. Prevents unauthenticated access on public interfaces.
- Add X-Frame-Options: DENY + CSP frame-ancestors to /demo response to
  prevent clickjacking via cross-origin iframe embedding.
- Cache selfOrigins Set in Origin-stripping middleware (rebuild only
  when port changes) instead of allocating per-request.
- Clear pendingPerms + reset currentAssistantBubble on session switch
  to prevent stale permission cards from the previous session.
- Update tests: loopback vs non-loopback /demo auth, anti-clickjacking
  headers, rename CORS test, add localhost and [::1] origin coverage.

* fix(serve/demo): address CR round-2 feedback for demo page UX

- Replace innerHTML with DOM APIs in addLog() to prevent XSS
- Add MAX_LOG_ENTRIES=500 pruning to prevent unbounded DOM growth
- Add concurrent-send guard (promptInFlight) to prevent double submits
- Show error feedback in Chat tab when sendPrompt or createSession fails
- Disable prompt controls on SSE failure (both catch and non-OK paths)
- Add toolCall context detail (command/input/path/diff) to permission cards

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-18 16:31:03 +08:00
jinye
52d2850c7f
feat(serve): safe workspace file read routes (#4175 PR 19) (#4269)
* refactor(serve/fs): glob audit hashes workspace + emits pattern

Closes PR #4250 follow-up #4.

Hashing the per-call cwd for glob audit produced a different
pathHash for every subdirectory glob without giving operators any
actionable difference (raw paths are privacy-gated). Replace the
hash basis with the bound workspace itself and surface the
literal pattern on a new schema field, so every glob row carries
a stable workspace marker and a per-call pattern.

The pattern field also fires on parse_error denials (path-escape
patterns, non-relative patterns) so audit consumers debugging a
production glob rejection can see the exact rejected pattern
without needing QWEN_AUDIT_RAW_PATHS=1.

* feat(serve): safe workspace file read routes (#4175 PR 19)

Add four read-only HTTP routes that consume PR 18's per-request
WorkspaceFileSystem boundary:

  - GET /file?path=...       text content + meta (encoding/BOM/lineEnding)
  - GET /list?path=...       directory entries (name/kind/ignored)
  - GET /glob?pattern=...    workspace-relative match paths
  - GET /stat?path=...       file/directory metadata

The routes share one error envelope (sendFsError) that maps
FsError.kind through the boundary's existing DEFAULT_STATUS_BY_KIND
table to a typed JSON response. All four 200 responses set
Cache-Control: no-store and X-Content-Type-Options: nosniff so a
browser-adjacent client cannot cache or sniff source content.

Routes are advertised under a single workspace_file_read capability
tag — the four endpoints share the same backing boundary and the
same failure shape, so per-route tags would force four
simultaneous registry entries with no operator-meaningful
difference between them. Mutating routes will ship in PR 20 under
their own workspace_file_write tag.

Trust gate is unchanged: read intents pass on untrusted workspaces
per PR 18's policy.ts. Auth follows the global bearer flow only;
read routes never run mutate(), since none of them mutate state.

* feat(serve): runQwenServe injects fsFactory + emit pipeline

Closes PR #4250 follow-up #2.

runQwenServe now constructs a WorkspaceFileSystem factory from
the bound workspace, threads its emit hook through to the read
routes, and exposes the trust snapshot via deps.trustedWorkspace.
Test additions pin the wiring contract:

  - audit events emitted on success / denial flow back through
    the test-supplied fsAuditEmit hook
  - deps.fsFactory override is honored (built-in default does not
    silently shadow injection)
  - trust snapshot defaults to true (operator-chosen workspace)
  - trust=false routes through to the boundary and trips
    untrusted_workspace on write intents

Default emit stays a stderr warning so a wiring regression that
drops events remains visible. PR 21's SSE fan-out will replace
the default with a workspace-scoped event channel.

* fixup(serve): address PR #4269 round-1 review feedback

Closes 8 findings from Copilot inline review + Codex review on
PR #4269 (5 P0, 3 P1):

P0 (correctness / privacy / operations)

- runQwenServe.ts: throttle the default fsAuditEmit by reusing
  the exported `createDefaultFsAuditEmit` from server.ts. The
  earlier per-event `writeStderrLine` would print one line for
  every /file/list/glob/stat audit event under normal traffic.
  Now warns once + every 100th drop with payload context, so a
  wiring regression is still visible without flooding logs.
  (Copilot runQwenServe.ts:316; Codex runQwenServe.ts:305)
- routes/workspaceFileRead.ts: probe glob with maxResults+1 and
  trim, so `truncated` reflects whether the boundary actually
  had more matches. Earlier `length === maxResults` heuristic
  false-positived when the workspace happened to hold exactly N
  matches. (Copilot workspaceFileRead.ts:399)
- routes/workspaceFileRead.ts: glob `relMatches` now flows
  through the shared `workspaceRelative` helper. Root match
  (`pattern=.`) renders as "." rather than the empty string
  `path.relative` returns; helper also covers the
  boundWorkspace-undefined edge case so the route no longer
  carries its own fallback branch.
  (Copilot workspaceFileRead.ts:388; review summary HIGH-1)
- fs/audit.ts: `pattern` field now rides on the same privacy
  gate as `relPath` / `message`. Glob patterns commonly carry
  workspace-relative or absolute path fragments
  (`src/secrets/*.env`, rejected `/Users/alice/ws/**`), so
  emitting them in privacy mode bypassed the same redaction the
  other path-bearing fields honor. Operators wanting full
  forensic context opt in via QWEN_AUDIT_RAW_PATHS=1.
  (Codex audit.ts:249)
- routes/workspaceFileRead.ts: cwd resolves with intent='list'
  rather than 'glob'. The orchestrator's `recordAndWrap`
  auto-derives `data.pattern` from `intent === 'glob'`, which
  turned cwd-resolution failures into rows where the cwd string
  masqueraded as the glob pattern (`?cwd=../outside` →
  `pattern: ../outside` in audit). Switching to 'list' is the
  correct semantic shape (cwd is a directory we intend to walk)
  with identical trust + path-resolution behavior.
  (Codex workspaceFileSystem.ts:941)

P1 (cosmetic / comment accuracy)

- server.test.ts: `honors deps.fsFactory override` test comment
  rewritten to match the actual failure mode (a regression would
  404 on a.txt, not 200 against package.json). (Copilot server.test.ts:3219)
- routes/workspaceFileRead.ts: `limit` error message uses the
  MAX_LIST_ENTRIES constant instead of the literal 2000.
  (review summary MEDIUM)
- fs/audit.ts: expanded the JSDoc explaining why the AuditPublisher
  request types Omit four fields and pass `pattern` through.
  (review summary MEDIUM)

Test additions / adjustments

- audit.test.ts: split the existing pattern tests into raw-paths
  and privacy-default cases; added two new privacy-mode assertions
  that strip pattern under default config.
- workspaceFileSystem.test.ts: harness accepts `includeRawPaths`;
  glob audit suite runs with raw paths to observe `pattern`;
  new `glob audit privacy default` suite asserts pattern + relPath
  are stripped without the env opt-in.
- workspaceFileRead.test.ts: new GET /glob cases for the
  truncated edge case (count == maxResults → false; count >
  maxResults → true) and root-match normalization.

Not adopted (with rationale)

- review summary HIGH-2 (glob pathHash uses boundWorkspace): this
  is the deliberate follow-up #4 contract from PR 18; pattern is
  the per-call signal, pathHash is the workspace marker.
- review summary MEDIUM-1 (parseIntInRange three-state return):
  matches `parseMaxQueuedQuery` in server.ts; consistency wins.
- review summary LOW-1/2/3 (capabilities comment length, CSP
  header, reverse truncated:false assertion): rationale already
  documented in code, CSP belongs in a hardening PR, the
  reverse assertion already exists.

518/518 serve tests pass; typecheck + eslint clean within
src/serve/.

* fix(serve): address workspace file read review

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(serve): tighten workspace file read review followups

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-18 16:17:14 +08:00
易良
8f9b979ec0
test: reduce wait-dependent UI test delays (#3987)
* test: reduce wait-dependent UI test delays

* test: address UI test review feedback

* test: address remaining UI test review feedback

- Replace runPendingRetryTimer's polling loop with
  vi.advanceTimersToNextTimerAsync to decouple from connect()'s
  internal async structure.
- Document that newSessionWithRetry's 300ms auth-delay timer is
  unreachable under the current mocks so future tests advance it
  via the same helper instead of restoring shouldAdvanceTime.
- Stop swallowing runPendingRetryTimer errors behind
  connectPromise.catch(e => e); attach a noop catch only to mute
  the unhandled-rejection guard and assert directly on the
  original promise.
- Comment why flush() yields twice (Ink 7 + React 19 split a
  render across two microtask ticks) so it is not "optimized" to
  a single yield.
- Move the trailing unmount() into the finally block in the
  shell-history placeholder test so an assertion failure does
  not leak the Ink instance.
2026-05-18 15:34:34 +08:00
Shaojin Wen
0bd703bce6
fix(serve): add mcp_guardrails to E2E capabilities expectation (#4268)
PR #4247 (`feat(serve): MCP client guardrails`) added the always-on
`mcp_guardrails` capability to `SERVE_CAPABILITY_REGISTRY` and updated
the unit-level `EXPECTED_STAGE1_FEATURES` in
`packages/cli/src/serve/server.test.ts`, but missed the matching
integration-test expectation. The E2E `qwen serve — capabilities
envelope > advertises all baseline capabilities` assertion has been
failing on `main` since #4247 landed.

Append `'mcp_guardrails'` to the expected `caps.features` array, in the
same position as `SERVE_CAPABILITY_REGISTRY` and the unit-level
constant (after `session_metadata`, before the conditional
`require_auth`). No production code changes.
2026-05-18 14:30:27 +08:00
jinye
103090669e
feat(serve): workspace memory and agents CRUD (#4175 Wave 4 PR 16) (#4249)
* feat(serve): workspace memory and agents CRUD (#4175 Wave 4 PR 16)

Adds the first Wave 4 mutation route surface: workspace-scoped memory
and subagent CRUD over HTTP. Remote clients (TUI / channels / web /
IDE adapters) can now list, read, create, update, and delete subagent
definitions and read / append / replace QWEN.md without disturbing
session state.

Routes:
- GET    /workspace/memory             (read-only snapshot)
- POST   /workspace/memory             (append/replace, strict-gated)
- GET    /workspace/agents             (list project + user + builtin)
- POST   /workspace/agents             (create-only; 409 on collision)
- GET    /workspace/agents/:agentType  (full detail incl. systemPrompt)
- POST   /workspace/agents/:agentType  (update; 403 read-only on builtin)
- DELETE /workspace/agents/:agentType  (idempotent for SDK callers)

Mutation paths use mutate({ strict: true }) from PR 15 so they refuse
unauthenticated requests even on no-token loopback defaults. Workspace
mutations validate X-Qwen-Client-Id against bridge.knownClientIds() and
stamp originatorClientId on emitted events.

Capability tags added: workspace_memory, workspace_agents.

New typed events fanned out via bridge.publishWorkspaceEvent (best-
effort to every active session bus; read-after-write is the contract):
- memory_changed { scope, filePath, mode, bytesWritten }
- agent_changed  { change, name, level }

writeContextFile.ts is the new core helper that resolves
QWEN.md placement (workspace vs ~/.qwen) and append-vs-replace
semantics. Whitespace-only appends short-circuit before fs.writeFile,
so a no-op POST does not bump mtime or fan out a misleading event.

SubagentManager is wrapped with a CRUD-scoped Config stub via Proxy:
only getSdkMode / getProjectRoot / getActiveExtensions are stubbed
(verified against subagent-manager.ts; getToolRegistry is execution-
path only). Any future Config method touched on a CRUD path throws
immediately so dependency creep is visible.

Auto-memory CRUD, persistent audit log, and the EACCES → NOT_FOUND
unlink mapping in core SubagentManager.deleteSubagent are explicit
follow-ups (PR 16.5 / PR 24 / separate fix).

Validation:
- typecheck: cli + sdk-typescript clean
- vitest:    serve 348/348, writeContextFile 10/10, SDK 335/335
- eslint:    clean

* fix(serve): address Codex P2 review on PR 16 (#4175 Wave 4 PR 16 follow-up)

Three correctness issues Codex flagged on the just-shipped workspace
memory + agents CRUD surface:

1. Concurrent POST /workspace/memory append no longer loses writes.
   Two simultaneous appends would each read the same existing file,
   compose new content in JS memory, then race the fs.writeFile —
   the later write silently overwrote the earlier appended entry.
   Add a per-resolved-path Mutex map (mirroring jsonl-utils.ts's
   fileLocks pattern) and wrap the entire read-compose-write
   sequence in runExclusive.

2. GET /workspace/agents now reflects out-of-band file changes.
   SubagentManager.listSubagents() default served the in-memory cache;
   developer / IDE adapter edits to .qwen/agents/*.md never appeared
   even though GET /workspace/agents/:agentType always reads disk.
   Pass { force: true } so the LIST route walks disk every call,
   matching the detail route's "filesystem is the source of truth"
   contract.

3. Reject builtin agent names on POST /workspace/agents to prevent
   undeleteable shadow files. A client could write a project-level
   agent named "general-purpose" — list/load resolved the shadow
   first, but SubagentManager.deleteSubagent's name-based builtin
   guard (subagent-manager.ts:302) rejected DELETE forever. Add a
   BuiltinAgentRegistry.isBuiltinAgent check in parseAgentConfig
   so the conflict surfaces at create time instead of trapping the
   file beyond the API. The check is case-insensitive, matching the
   resolver's case-insensitive cascade.

New tests:
- writeContextFile.test.ts: 10 parallel appends, all 10 entries
  must survive in the final file (would fail without the mutex).
- workspaceAgents.test.ts: GET /workspace/agents observes a
  freshly-written agent file on the second call (force-refresh
  proof); POST with name="general-purpose" returns 422 + the
  case-insensitive variant "explore" too.

Validation:
- typecheck: cli + sdk-typescript clean
- vitest: serve 351/351 (was 348, +3 new), writeContextFile 11/11
- eslint: clean

* fix(serve): apply round-1 review fold-in 2a (HIGH + CodeQL) on PR 16

Round-1 inline review (#4249) flagged ~28 items across Copilot,
wenshao, and CodeQL. This commit lands the HIGH-severity correctness
fixes plus the two CodeQL polynomial-regex warnings.

Validation tighten — `parseAgentConfig` + `parseAgentUpdates`:
- Trim leading/trailing whitespace on `name` before passing to
  SubagentManager. `" tester "` would otherwise create a frontmatter
  name with spaces that case-insensitive lookups can never find.
- Fail-closed (422 invalid_config) on present-but-wrong-type optional
  scalars: `model`, `color`, `approvalMode`, `background`. Previously
  malformed values silently dropped through validation, masking
  client-serialization bugs.
- Validate `approvalMode` against the `APPROVAL_MODES` enum on both
  create and update; an unknown value used to 201 with the field
  silently omitted from the saved file.
- `runConfig` is now whitelist-sanitized to `{ max_time_minutes,
  max_turns }` only; unknown keys are dropped, malformed values
  return 422. Previously the whole input object was persisted
  verbatim into YAML frontmatter.
- `?scope=` query is fail-closed for repeated values
  (`?scope=workspace&scope=global`) — Express parses these as arrays
  which the previous `typeof === 'string'` check silently treated as
  absent, broadening DELETE/UPDATE semantics from one level to both.
- Empty update body returns 400 invalid_config (previously rewrote
  the file + emitted a misleading `agent_changed` event).
- No-op updates (every supplied field already matches `existing`)
  return 200 + `changed: false` and SKIP the file rewrite + event
  fan-out.

Memory write helper — `writeContextFile.ts`:
- Move whitespace-only no-op detection BEFORE `fs.mkdir`. Without
  this, an empty POST still created the parent directory and bumped
  its mtime even though `changed: false` was reported.
- Replace two polynomial regex patterns flagged by CodeQL
  (`/^\s+|\s+$/g` and `/^\n+|\n+$/g`) with hand-rolled `while` loops.
  Same pattern auth.ts:120-125 already uses for the same CodeQL rule.

SDK — `DaemonClient.ts` + `types.ts`:
- `DaemonWriteMemoryResult` gains optional `changed?: boolean` so
  typed callers can suppress redundant cache invalidation on no-op
  appends. Optional for forward-compat with daemons that predate the
  field — undefined treats as "changed: true" (legacy contract).
- `deleteWorkspaceAgent` only swallows 404 when the body's `code`
  is `agent_not_found`. A bare 404 (older daemon, misrouted proxy,
  generic gateway page) now throws — previously the SDK silently
  reported success even when the request never reached a route that
  understands workspace agents.
- `updateWorkspaceAgent` adds an optional `scope` parameter
  mirroring `deleteWorkspaceAgent`, so callers can target the user-
  level definition when a project-level agent shadows it.

Validation:
- typecheck: cli + sdk-typescript clean
- vitest: serve 357/357 + writeContextFile 12/12 = 369/369 passing
  (was 362; +7 new)
- eslint: clean

Explicitly NOT applying (out of scope per issue #4175 PR 16
review-resolution policy):
- Copilot's "strict gate after body parser" finding — already
  documented as PR 15 review-resolved tradeoff at auth.ts:256-269.

* fix(serve): apply round-1 review fold-in 2b (MEDIUM + tests) on PR 16

MEDIUM hardening:
- Fix the JSDoc on `collectWorkspaceMemoryStatus` to match the
  workspace-root-only discovery the implementation actually does
  today. The 32-iteration upward walk is reserved for a future
  hierarchical mode but breaks after iteration 1 in v1.
- Lower the depth limit on `walkWorkspaceForMemory` from 32 → 12.
  Realistic project depth sits well below 8; 12 leaves headroom
  without amplifying blast radius from symlink cycles.
- Daemon `Config` Proxy now defines a `has` trap symmetric to the
  existing `get` trap. Without it, a future SubagentManager path
  doing `'someMethod' in this.config` would silently get `false` and
  bypass the safety net the throw-on-unknown-property design
  installed.
- Preflight `manager.loadSubagent(name, level)` before
  `manager.createSubagent`. The default-path collision check inside
  SubagentManager would otherwise miss same-frontmatter-name +
  different-filename collisions; the preflight makes 409
  agent_already_exists deterministic.
- Multi-level DELETE now emits one `agent_changed` event per level
  that actually had a file removed. Previously an unscoped DELETE
  removing both project and user shadows would publish only one
  event with one level — misleading subscribers using event metadata
  for toasts / audit / echo-suppression.

Test additions (covers the new event types + bridge fan-out + SDK
helpers):
- `daemonEvents.test.ts`: predicate narrowing for `memory_changed` /
  `agent_changed` (rejects malformed scope/mode/level), reducer
  records `lastWorkspaceMutation` + `lastWorkspaceMutationType` with
  latest-event-wins semantics and stays non-terminal.
- `httpAcpBridge.test.ts`: `publishWorkspaceEvent` fans out to every
  active session bus; `knownClientIds()` aggregates clientIds across
  sessions and the returned set is a snapshot (mutating it does not
  affect future calls).
- `workspaceAgents.test.ts`: success-path test stamping
  `originatorClientId` on the create / update / delete events for a
  known client.
- `DaemonClient.test.ts`: 7 round-trip tests for the new SDK helpers
  (workspaceMemory, writeWorkspaceMemory, listWorkspaceAgents,
  getWorkspaceAgent, createWorkspaceAgent, updateWorkspaceAgent with
  scope query, deleteWorkspaceAgent: 204 / structured 404 / bare 404
  triage).
- `writeContextFile.test.ts`: replace the 30ms-mtime test with a
  `vi.spyOn(fs, 'writeFile')` assertion that the no-op path never
  invokes writeFile. Deterministic on every filesystem.

Validation:
- typecheck: cli + sdk-typescript clean
- vitest: serve 363/363 + writeContextFile 12/12 + SDK 347/347
- eslint: clean

Reviewer guide: combined with fold-in 2a (commit 134c43c82),
PR 16's round-1 review feedback is closed except for the explicitly-
deferred Copilot finding on "strict gate after body parser" (already
documented as PR 15 review-resolved tradeoff at auth.ts:256-269).
The DRY refactor wenshao suggested for `resolveOriginatorClientId`
is left as a future sweep — it touches multiple Wave 4 routes and
should land alongside PR 17/19/20/21 to keep the helper's shape
informed by all consumers.

* docs(serve): apply round-1 review fold-in 2c (doc/type tightening) on PR 16

Two doc-only fixes that close the last open Copilot threads on PR
#4249 — both are JSDoc/tsdoc corrections where the wording promised
broader behavior than the implementation actually delivers, so a
maintainer or SDK consumer reading the type would form a wrong
mental model.

1. `DaemonAgentLevel` (sdk-typescript) and `ServeAgentLevel` (cli
   serve) keep `'extension'` + `'session'` on the union for forward-
   compat but the JSDoc now explicitly says the daemon does NOT
   return either today. The `'extension'` case is gated by the
   daemon's stub `Config.getActiveExtensions()` returning `[]`;
   `'session'` is a runtime-only `SubagentManager` cache the CRUD
   routes don't read. Both arms stay so a future PR exposing either
   source is not a breaking SDK change.
2. `DaemonClient.workspaceMemory()` tsdoc no longer says
   "hierarchical" — v1 only discovers files at the bound workspace
   root + the global `~/.qwen` directory, no parent-directory walk.
   The 12-iteration upward-walk loop body inside
   `walkWorkspaceForMemory` is reserved for PR 16.5 hierarchical
   mode and breaks after iteration 1 today; the SDK doc now states
   that explicitly so callers don't expect more than they receive.

No runtime change. Validation:
- typecheck: cli + sdk-typescript clean
- vitest: 363/363 serve + 12/12 writeContextFile + SDK unchanged
- eslint: clean

* fix(serve): apply round-2 review fold-in 2d on PR 16

wenshao round-2 (4 inline comments at 16:51-16:53Z): three real bugs
+ one performance-tradeoff doc note.

1. `composeAppendedContent` now inserts inside the MEMORY section,
   not at EOF. Previously a QWEN.md whose `## Qwen Added Memories`
   block was followed by another `## ...` heading would silently
   land each new entry past the next heading — moving entries into
   the wrong section. Walk the memory header forward, find the next
   `\n## ` heading, and insert just before it. Fall back to the EOF
   append when the memory section is the last block.

2. `parseAgentUpdates` now matches the create-side trim/empty rule
   for `description` (rejects whitespace-only) and ensures
   `systemPrompt` rejects the empty string. Update path used to
   silently accept `"   "` and overwrite the field with blank
   content — divergent from create which 422s the same payload.

3. `isNoOpUpdate`'s runConfig comparison no longer false-positives
   on partial updates. Comparing every known runConfig field against
   `existing` treated absent keys as `undefined` while existing had
   real values — so `{max_time_minutes: 30}` against `{max_time_minutes:
   30, max_turns: 10}` claimed non-no-op and re-emitted
   `agent_changed`. Fixed to only compare keys actually present in
   `updates.runConfig`, matching `mergeConfigurations` semantics
   (existing values preserved when not in updates).

4. JSDoc on the LIST-route `force: true` call now explains the
   tradeoff (no TTL cache / no fs.watch invalidation): re-introducing
   caching would re-introduce the stale-list bug Codex P2 #2 fixed,
   `fs.watch` is platform-fragile, and PR 24's audit/policy layer is
   the proper home for request rate limiting. Sub-millisecond cost
   per request on local SSD; revisit if profiling flags it.

Tests:
- writeContextFile.test.ts: section-boundary insertion + EOF fallback
- workspaceAgents.test.ts: whitespace-only description rejected; partial
  runConfig no-op detection; partial runConfig real change preserves
  omitted keys via mergeConfigurations

Validation:
- typecheck: cli + sdk-typescript clean
- vitest: 368/368 (was 363, +5 new)
- eslint: clean

* fix(serve): apply round-3 review fold-in 2e on PR 16

wenshao round-3 (5 inline [Suggestion]s, all real correctness or
forward-compat issues; one item carried over from round-2):

1. `parseAgentConfig` rejects whitespace-only `systemPrompt` on
   create, matching the description field's `trim().length === 0`
   rule. Pure-whitespace prompts collapse to nothing on YAML
   serialization and the agent can't operate without instructions —
   422 at the boundary is friendlier than the downstream "agent does
   nothing" failure.
2. `parseAgentUpdates` mirrors the same `trim()` check on the update
   path so `{systemPrompt: "   "}` returns 422 rather than silently
   blanking the field.
3. `POST /workspace/memory` `file_error` 500 response now carries
   `scope`, `mode`, optional `osCode` (`EACCES`/`EROFS`/`ENOSPC`/...)
   and a redacted `errorMessage`. Previous shape was just
   `{error, code: 'file_error'}` — callers had nothing to branch on.
4. `composeAppendedContent` runs `fs.stat` before `fs.readFile` and
   refuses with a typed `WorkspaceMemoryFileTooLargeError` when the
   existing file exceeds 16 MB. Without this cap a pathological QWEN.md
   would be loaded into the daemon heap on every append. The route
   maps the typed error to a 413 with `code: 'memory_file_too_large'`
   plus `bytes` / `limit` so callers can decide whether to trim or
   switch to mode=replace.
5. `toDetail` no longer spreads `config.runConfig` with a cast.
   Explicit field-by-field pick of `max_time_minutes` / `max_turns`
   ensures any future `SubagentConfig.runConfig` field requires a
   deliberate route-schema update rather than silently leaking
   through the HTTP API.

Tests:
- workspaceAgents.test.ts: whitespace-only systemPrompt rejected on
  create AND update; toDetail.runConfig only emits whitelisted keys
- existing tests still cover the description-side trim and the
  partial runConfig no-op detection from fold-in 2d

Validation:
- typecheck: cli + sdk-typescript clean
- vitest: 371/371 (was 368, +3 new)
- eslint: clean

Reviewer note: response shape on 500 file_error is additive
(`scope`/`mode`/`osCode`/`errorMessage` are new fields), so SDK
callers that only consumed `{error, code}` keep working. The new
413 `memory_file_too_large` is a new error code SDK consumers can
branch on but that pre-PR-16 daemons never emitted, so adding it is
also additive.

* fix(sdk): expose `changed` on DaemonAgentMutationResult (PR 16 round-4)

wenshao round-4 review (single inline at types.ts:434): the agent
update route emits `changed: true` for real updates and
`changed: false` for no-op short-circuits (introduced in fold-in
2a alongside the no-op detection), but `DaemonAgentMutationResult`
in the SDK type still only exposed `{ ok, agent }`. Typed callers
of `updateWorkspaceAgent()` couldn't observe the no-op signal even
though `DaemonClient` already returns the raw JSON at runtime.

Add optional `changed?: boolean` matching the shape introduced for
`DaemonWriteMemoryResult.changed` in fold-in 2a. Optional for
forward-compat with daemons that predate the field; SDK consumers
should treat `undefined` as `true` (the legacy contract — every
successful create / update was a write before fold-in 2a's no-op
short-circuit landed).

Test:
- `DaemonClient.test.ts`: round-trip asserts the typed result
  surfaces `changed: false` from the wire payload.

Validation:
- typecheck: cli + sdk-typescript clean
- vitest: 82/82 in DaemonClient.test.ts (was 81; +1 new)
- eslint: clean

* fix(serve): apply round-6 review fold-in 2g on PR 16

Round-6 review (gpt-5.5 [Critical] + 5 wenshao [Suggestion]s).

[Critical] Per-level delete verification (workspaceAgents.ts):
- gpt-5.5 flagged that `SubagentManager.deleteSubagent` swallows
  per-level `fs.unlink()` failures (subagent-manager.ts:332-336)
  and returns success as long as ANY level was removed. Trusting
  that signal would let the route publish `agent_changed`/`deleted`
  for a file still on disk under EACCES/EBUSY/EPERM — the client UI
  would drop a still-active definition from cache.
- Route now runs `fs.access` on each pre-checked level's file path
  AFTER `manager.deleteSubagent` returns and partitions into
  `removed` / `remaining`. Events are emitted ONLY for confirmed
  removals; if any level still has its file, the route returns 500
  `agent_delete_partial` with `removedLevels` + `remainingLevels`
  so callers can act precisely.
- New test installs a 0o555 chmod on the user-level agents directory
  so `fs.unlink` raises EACCES while the project-level unlink
  succeeds, asserting both the 500 response and that exactly one
  `agent_changed` event fired for the level that actually went away.

Concurrency consistency (writeContextFile.ts):
- Whitespace-only no-op detection now happens INSIDE the per-file
  mutex's `runExclusive` block. The pre-fix layout did the
  short-circuit `fs.stat` outside the lock; under concurrent
  POSTs (one whitespace-only, one with real content) the no-op's
  `bytesWritten` could lag the post-write reality. Functional
  behavior was already correct; this aligns the snapshot with the
  post-write state.

Defense-in-depth + DRY (workspaceAgents.ts):
- `validateAgentType(req, res)` regex-validates `:agentType` URL
  parameter at the route boundary against the same
  `^[\\p{L}\\p{N}_-]+$/u` pattern as `SubagentValidator.validateName`,
  with a 64-char cap. `findSubagentByNameAtLevel`'s readdir scan
  already prevented path traversal, but failing fast at the boundary
  keeps surprising inputs out of downstream code paths. Two new
  tests cover `..%2Fetc%2Fpasswd` and over-long names.
- `parseScopeQuery(req, res)` extracts the duplicated `?scope=` query
  parser from the POST update + DELETE handlers. Same fail-closed
  semantics on repeated/non-string values.
- `assertMutableLevel(found, agentType, res)` extracts the
  duplicated `isBuiltin || level === 'builtin' || 'extension' ||
  'session'` 403 guard. Future Wave 4 mutation routes (PR
  17 / 19 / 20) call this helper instead of re-implementing the
  predicate.

Client-id helper consistency (workspaceMemory.ts):
- `resolveWorkspaceClientId` removed; the inline branch in the POST
  handler now mirrors `workspaceAgents.ts:resolveOriginatorClientId`
  (validate against `bridge.knownClientIds()`, send 400 directly,
  return so the caller short-circuits). Previously this file threw
  `InvalidClientIdError` and caught it locally — wenshao round-6
  flagged the throw-vs-direct-400 inconsistency between the two
  files. The deeper full-extraction DRY refactor remains deferred
  to the cross-Wave-4 sweep with PR 17/19/20/21.

Won't-fix doc note (workspaceMemory.ts):
- Mount-point JSDoc now explicitly explains why the route returns
  absolute on-disk paths (success / 413 / GET list): clients
  pre-flight `caps.workspaceCwd` to learn the bound workspace and
  can compute relative paths if they want; the global scope's
  `~/.qwen/QWEN.md` is NOT under the workspace root, so a
  workspace-relative form would lose information. Path redaction
  for multi-tenant deployments belongs to PR 24's `--redact-errors`
  policy work, not a per-route default flip in PR 16.

Validation:
- typecheck: cli + sdk-typescript clean
- vitest: 374/374 (was 371, +3 new)
- eslint: clean

* fix(serve): apply round-7 review fold-in 2h on PR 16

glm-5.1 round-7: 2 [Critical] + 5 [Suggestion] inline comments. Five
applied as code changes; one is a stale-snapshot false positive
(workspaceMemory.ts no longer has the InvalidClientIdError call site
glm-5.1 referenced — fold-in 2g already replaced it with inline
400); one is rationale-replied (INVALID_CONFIG → 422 mapping
suggestion is based on incorrect premise about manager semantics).

[Critical] Code-fence-aware section-boundary detection (writeContextFile.ts):
- The naive `\n## ` indexOf scan would split user-authored memory
  entries that quote markdown documentation containing `##` headings
  inside fenced code blocks. New `findNextTopLevelHeading` helper
  tracks fence state line-by-line and only accepts matches outside
  fences. Two new tests: (a) entry containing a fenced `## Request
  Body` keeps its body intact; (b) real `## post` heading outside
  fences still acts as the section boundary.

[Suggestion] errorMessage + filePath gating (workspaceMemory.ts):
- 500 `file_error` and 413 `memory_file_too_large` responses now
  omit `errorMessage` and `filePath` unless `QWEN_SERVE_DEBUG` is
  set. Default response carries `error / code / scope / mode /
  osCode` — enough for SDK callers to branch without leaking
  absolute filesystem paths. New test asserts both modes round-trip
  the right shape.

[Suggestion] publishWorkspaceEvent visibility (httpAcpBridge.ts):
- Catch block now writes to stderr unconditionally during normal
  operation; only downgrades to the debug channel when
  `shuttingDown` is true. `EventBus.publish` is documented never to
  throw, so a hit in normal ops is by definition a regression that
  must be visible in production logs — silencing via debug-gate
  could let a true bug succeed at the route layer (200 OK) while
  SSE subscribers stop receiving events.

[Suggestion] Log-injection defense for `agentType` (workspaceAgents.ts):
- New `safeLogValue` helper wraps `agentType` interpolations in
  `JSON.stringify(...).slice(0, 82)` before stderr writes (mirrors
  `server.ts:1340`). The route's `validateAgentType` regex already
  rejects names with control chars, but defense-in-depth covers
  legacy on-disk shadows and future fields. Five `writeStderrLine`
  call sites updated (GET / POST / DELETE failure, reload-failure,
  partial-delete, create-reload-failure).

[Suggestion] Simplify walkWorkspaceForMemory (workspaceMemory.ts):
- Replaced the 12-iteration loop with a straightforward single-pass
  stat-each-filename. The `seen` Set, `cursor = parent` walk, and
  filesystem-root guard were dead code (the loop unconditionally
  broke on first iteration). PR 16.5's hierarchical mode lands as a
  fresh upward walk rather than re-enabling commented-out code.

Validation:
- typecheck: cli + sdk-typescript clean
- vitest: 377/377 (was 374, +3 new)
- eslint: clean

Reviewer notes (NOT adopting):
- glm-5.1's "InvalidClientIdError('workspace', ...)" message-confusion
  Critical: stale-snapshot false positive — fold-in 2g already
  removed `resolveWorkspaceClientId` and inlined a 400 with the
  correct "registered for this workspace" wording. Only a comment
  reference remains.
- glm-5.1's "INVALID_CONFIG → 422" suggestion: SubagentManager only
  ever throws INVALID_CONFIG for read-only conditions (built-in /
  extension / session) — not for malformed config (which uses
  VALIDATION_ERROR). The current 403 mapping in update + delete is
  correct for the manager's actual semantics.

* fix(serve): apply round-8 review fold-in 2i on PR 16

wenshao round-8: 2 [Critical] path-disclosure + 5 [Suggestion]
(name regex, per-field caps, mutex timeout, test gaps, tilde fence).
All adopted.

[Critical] C1 — 413 `err.message` path disclosure (workspaceMemory.ts):
- The 413 `memory_file_too_large` response sent `err.message`
  unconditionally as the `error` field. The
  `WorkspaceMemoryFileTooLargeError` constructor embeds the
  absolute file path in its message ("Existing memory file at
  /Users/<x>/.qwen/QWEN.md is ..."), bypassing the `debugMode()`
  gating that already hid the `filePath` field. Same gating now
  applies to both `error` and `filePath`; default response carries
  a generic string + structured `code` / `bytes` / `limit` so SDK
  callers can branch without the path leak.

[Critical] C2 — workspaceAgents FILE_ERROR `err.message` (workspaceAgents.ts):
- Two catch blocks (create + update) sent `SubagentError(FILE_ERROR)`
  messages directly in the response. Node fs errors embed paths
  like "ENOENT: ... '/Users/<x>/.qwen/agents/foo.md'". Both now
  gate behind `isServeDebugMode()`; default response is the generic
  "Failed to write workspace agent file" envelope.

Shared `isServeDebugMode` helper (debugMode.ts new):
- Moved from inlined copies in workspaceMemory.ts to a small
  shared module so both route files (and future Wave 4 mutation
  routes) share one canonical predicate.

[Suggestion] S1 — POST body `name` validation (workspaceAgents.ts):
- `parseAgentConfig` now applies the same regex + length contract
  as `validateAgentType` (`^[\p{L}\p{N}_-]+$/u`, 2-64 chars). A
  client posting `name: "my/agent"` or 100-char name now fails at
  the body-validation boundary with a 422 `invalid_config` instead
  of bubbling a less-specific `SubagentValidator` error.

[Suggestion] S2 — Per-field size caps (workspaceAgents.ts):
- `description` / `systemPrompt`: 256 KB each
- `tools` / `disallowedTools`: 256 entries, each at most 256 chars
  Applied on both create + update; matches workspaceMemory's
  `MAX_MEMORY_CONTENT_BYTES = 1 MB` posture and keeps `GET
  /workspace/agents` list-response cost bounded.

[Suggestion] S3 — Mutex timeout (writeContextFile.ts):
- `getFileLock` now wraps each Mutex with `withTimeout(..., 30_000)`
  so a wedged filesystem (NFS hiccup, OneDrive lock, kernel I/O
  hang) cannot indefinitely hold the per-file lock. The
  `E_TIMEOUT` sentinel is caught and re-thrown as a typed
  `WorkspaceMemoryWriteTimeoutError`; the route maps it to 500
  `memory_write_timeout` with `timeoutMs` so SDK callers can
  branch on stalled-fs without parsing a generic 500.

[Suggestion] S4 — Test gaps:
- `DELETE /workspace/agents/:id?scope=workspace` happy path:
  removes only the project shadow, leaves user file on disk,
  emits exactly one `agent_changed` event with `level: project`.
- `POST /workspace/agents/:id?scope=global` happy path: updates
  user shadow, leaves project file untouched.
- 413 `memory_file_too_large`: write a 17 MB QWEN.md externally,
  POST append fails with the structured 413 payload (`bytes` /
  `limit`, no `filePath` / no path-embedding error message in
  default response).

[Nice] N1 — Tilde fence support (writeContextFile.ts):
- `findNextTopLevelHeading` now toggles fence state on both ``` `
  and `~~~` openers (CommonMark allows both). A `## heading`
  inside a `~~~` fenced block no longer counts as the section
  boundary.

Validation:
- typecheck: cli + sdk-typescript clean
- vitest: 380/380 (was 377, +3 new)
- eslint: clean

* fix(serve): apply round-9 review fold-in 2j on PR 16

Two real correctness fixes from wenshao's 2026-05-18 review:

1. resolveContextFilePath now uses getCurrentGeminiMdFilename() so
   POST /workspace/memory writes to the same file GET surfaces.
   Without this, a deployment that ran setGeminiMdFilename('AGENTS.md')
   saw GET list AGENTS.md while POST kept appending to a stale QWEN.md
   — clients then observed "I just wrote content but it's missing
   from /workspace/memory".

2. runWrite no-op branch now returns bytesWritten: 0 instead of the
   existing file's stat.size. The prior value conflated "bytes I
   wrote" with "current file size"; clients accumulating writes via
   sum(bytesWritten) added the file size for every whitespace POST.
   changed: false already signals the no-op; the byte count should
   match its field name.

JSDoc updated on both WriteContextFileResult.bytesWritten and
DaemonWriteMemoryResult.bytesWritten so the contract is explicit.
New test covers setGeminiMdFilename(AGENTS.md) round-trip; existing
no-op test updated for the new bytesWritten semantics.

Round-8 thread PRRT_kwDOPB-92c6Cpyap (DRY resolveOriginatorClientId)
stays open as the cross-Wave-4 tracking marker. CodeQL "missing rate
limiting" alert deferred to PR 24's audit/policy layer (bearer +
max-connections + mutation gate provide v1 mitigations).

* fix(serve): skip two Windows-incompatible test fixtures on win32

Both tests rely on `fs.chmod(dir, 0o555)` to trigger EACCES on a
subsequent write/unlink. Windows ignores Unix-style permission bits
passed to `fs.chmod`, so the directory stays writable, the operation
succeeds, and the error path the test exercises is unreachable —
the test then sees the success status (200 / 204) instead of the
expected 500. CI failed on Windows runner only; Ubuntu + macOS pass.

Route logic is platform-agnostic — these tests validate that:

- `workspaceMemory.test.ts` POST returns the structured 500 envelope
  (no `errorMessage` / `filePath` leakage outside QWEN_SERVE_DEBUG).
- `workspaceAgents.test.ts` DELETE returns 500 `agent_delete_partial`
  when one level's `fs.unlink` silently fails inside SubagentManager.

Both invariants are still covered by the Ubuntu + macOS runs. We can't
swap in a `vi.spyOn(fs, 'unlink')` mock for the agents case either —
`SubagentManager` does `import * as fs from 'fs/promises'`, creating
a sealed ESM namespace object vitest can't redefine.

Skip pattern mirrors `customBanner.test.ts:232`
(`if (process.platform === 'win32') return;`).
2026-05-18 14:26:59 +08:00
jinye
495d11f016
refactor(serve): add FileSystemService boundary (#4175 Wave 4 PR 18) (#4250)
Some checks are pending
Qwen Code CI / Classify PR (push) Waiting to run
Qwen Code CI / Lint (push) Blocked by required conditions
Qwen Code CI / Test (macos-latest, Node 22.x) (push) Blocked by required conditions
Qwen Code CI / Test (ubuntu-latest, Node 22.x) (push) Blocked by required conditions
Qwen Code CI / Test (windows-latest, Node 22.x) (push) Blocked by required conditions
Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions
Qwen Code CI / CodeQL (push) Blocked by required conditions
E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run
E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run
E2E Tests / E2E Test - macOS (push) Waiting to run
* refactor(serve): add FileSystemService boundary (#4175 PR 18)

Introduce a per-request workspace filesystem boundary inside the
`qwen serve` daemon. The boundary centralizes path canonicalization,
symlink-aware boundary checks, ignore/trust policy, size/binary
limits, and audit hooks behind a single typed surface — preparing
PR 19 (read-only file routes) and PR 20 (write/edit routes) to
share a guarded chokepoint instead of re-implementing path safety
per route.

Wave 4 PR 18 of #4175 — pure refactor, no new HTTP routes; depends
on PR 12 (#4241) and PR 15 (#4236), both merged.

New module under `packages/cli/src/serve/fs/`:

- `paths.ts` extracts `canonicalizeWorkspace` from `httpAcpBridge.ts`
  (re-exported there for backward compatibility) and adds:
  - `ResolvedPath` brand and `Intent` union (read/write/edit/list/
    glob/stat) with exhaustiveness checks at the trust gate
  - `hasSuspiciousPathPattern` — detects NTFS ADS, 8.3 short names,
    long-path prefixes, UNC paths, trailing dots, DOS device names,
    and three-or-more-dot path components (claude-code-style)
  - `findExistingAncestor` with explicit ENOTDIR rejection so a
    regular file in a path component throws `parse_error` rather
    than passing boundary inspection and 500-ing later
  - `resolveWithinWorkspace` running a chain-aware realpath check
    with ENOENT-tolerant ancestor walk for write/stat intents
- `errors.ts` defines `FsError` / `FsErrorKind` plus `wrapAsFsError`,
  which categorizes raw `fs.promises` errnos (EACCES → permission_
  denied, ELOOP → symlink_escape, ENOTDIR → parse_error, etc.) so
  body-level failures emit audit events instead of escaping
  uncategorized
- `policy.ts` carries `MAX_READ_BYTES` (256 KiB), `MAX_WRITE_BYTES`
  (5 MiB), `BINARY_PROBE_BYTES` (4 KiB), `shouldIgnore` (file/
  directory aware), and `assertTrustedForIntent` with an
  exhaustive switch over `Intent`
- `audit.ts` emits typed `fs.access` / `fs.denied` `BridgeEvent`
  frames with SHA-256-hashed paths, optional raw-path passthrough
  via `QWEN_AUDIT_RAW_PATHS=1`, and discriminator `kind` fields so
  SDK consumers can exhaustively narrow `event.data`
- `workspaceFileSystem.ts` — `WorkspaceFileSystem` interface +
  `createWorkspaceFileSystemFactory` with eight methods (resolve,
  stat, readText, readBytes, list, glob, writeText, edit). Every
  body method funnels failures through `recordAndWrap`, which
  wraps raw fs errors and always emits an `fs.denied` audit event
  before rethrowing. `readText` enforces `MAX_READ_BYTES` *before*
  delegating to the slurping core service so unbounded requests
  against multi-gigabyte files can no longer OOM the daemon.
  `glob` realpath-checks each hit against the canonical workspace
  and reports filtered escapes via a single aggregated `fs.denied`
  event with the dropped count
- `index.ts` is the barrel re-export PR 19/20 will import from

Modified files:

- `packages/cli/src/serve/httpAcpBridge.ts` — extracted
  `canonicalizeWorkspace` to `fs/paths.ts`; the bridge re-exports
  it so existing callers in `server.ts` and `runQwenServe.ts` keep
  working
- `packages/cli/src/serve/server.ts` — added
  `fsFactory?: WorkspaceFileSystemFactory` to `ServeAppDeps`;
  `createServeApp` builds a strict default (`trusted: false`,
  warn-once no-op `emit`) when none is injected so a future
  refactor that forgets `fsFactory` injection cannot silently
  allow writes against an untrusted workspace; factory parked on
  `app.locals` for PR 19/20 route handlers
- `packages/core/src/index.ts` — re-exports `Ignore`,
  `loadIgnoreRules`, and `LoadIgnoreRulesOptions` from
  `utils/filesearch/ignore.js` for cli consumption

411 serve tests pass; typecheck clean.

Engineering principles checklist:
- [x] Independently mergeable (no new routes, no new capability tag)
- [x] Backward compatible (no removed routes / event fields / CLI behavior)
- [x] Default off (no public surface change; PR 19/20 will activate routes)
- [x] qwen serve Stage 1 routes preserved
- [x] Gradual migration (PR 19/20 will adopt the boundary)
- [x] Reversible (single PR rollback)
- [x] Tests-first (101 unit tests across the new module + contract test)

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve/fs): address PR review feedback (#4250)

Codex + Copilot review found 8 substantive issues against
7b0db4c3a; this commit fixes all of them. The first two are
P1 build-breakers introduced by the pre-commit `eslint --fix`
auto-promoting value imports to `import type` — 31 fs tests
were failing post-commit until this fix.

Issue list (links to PR comments at #4250):

1. Import type erased runtime values —
   workspaceFileSystem.ts:10-15. eslint's
   consistent-type-imports rule rewrote
   `import { Ignore, StandardFileSystemService, loadIgnoreRules,
   type WriteTextFileOptions }` -> `import type { ... }` because
   it saw type-only usage of `Ignore`. That erased
   `loadIgnoreRules` (called at runtime) and
   `StandardFileSystemService` (constructed at runtime), causing
   TS1361/TS2206 and runtime ReferenceErrors. Restored as a
   value import with inline `type` modifiers per-symbol and an
   eslint-disable line + comment so future autofixes don't
   repeat the regression.

2. Same import-type erasure in contract.test.ts:14 — `isFsError`
   was lumped under `import type` even though it's called at
   runtime. Same fix shape.

3. edit() OOM hole — workspaceFileSystem.ts. The earlier review
   pass added a pre-stat MAX_READ_BYTES gate to readText but
   missed edit, which fsp.readFile's the whole file before any
   size check. Multi-GB targets inside the workspace could OOM
   the daemon. Now stat-first; refuses above the cap with
   file_too_large; also rejects binary files (string indexOf
   over arbitrary bytes is meaningless).

4. glob accepted absolute / device patterns —
   workspaceFileSystem.ts. The ..-segment check stopped lexical
   traversal but /etc/** / C:\\Users\\foo\\** /
   \\\\server\\share\\** / //server/share/** still reached
   globAsync, walking outside the workspace before per-hit
   filtering dropped the results. Now rejects these patterns
   up-front with parse_error so no I/O happens outside.

5. glob ignore filter probed every hit as `file` —
   workspaceFileSystem.ts. The underlying `ignore` library
   needs a trailing-slash probe for dist/ / .git/-style
   directory patterns; probing as `file` silently leaked
   directory matches. Now lstats each hit and routes 'directory'
   vs 'file' to shouldIgnore so dir-only ignore rules actually
   match.

6. ReadTextOptions.line off-by-one — workspaceFileSystem.ts. The
   public option was documented as 1-based but forwarded as-is
   to readFileWithLineAndLimit, which is 0-based. A request with
   line: 1 returned content starting at the second line. Now
   converts 1-based -> 0-based at the boundary; doc clarified;
   truncation check uses the converted index.

7. ServeAppDeps.fsFactory JSDoc said trusted=true — server.ts:96.
   Stale from before the strict-default refactor in the same
   review pass. Rewrote to match the actual trusted: false +
   warn-once emit behavior.

8. MAX_READ_BYTES JSDoc said reads above cap return truncated —
   policy.ts:18. Stale from before the hard-cap refactor; now
   correctly states the cap throws file_too_large and that soft
   truncation only applies under the cap via enforceReadSize.

7 new tests cover the new behaviors:
- POSIX-absolute pattern rejection
- Win32 / UNC pattern rejection (4 variants)
- directory-pattern ignore (dist/)
- edit file-too-large
- edit binary refusal
- readText line: 1 returns from first line
- readText line: 2 starts from second line

418/418 serve tests pass; typecheck + eslint clean.

Deferred follow-ups (per PR review reply):
- glob maxResults is applied after globAsync materializes every
  match. A streaming iterator (glob.iterate) would bound the
  walk too. Non-trivial; tracked as a separate hardening
  follow-up since current behavior is correctness-safe (just
  not optimal under huge trees).
- Per-path glob escape hash in audit hint (currently aggregated
  count) — can revisit once PR 19 wires the routes and we see
  real audit volume.
- EVENT_SCHEMA_VERSION migration mechanism — orthogonal; the
  whole BridgeEvent schema lacks one and that's a Wave 5+
  concern.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve/fs): close 3 review-round-2 findings (#4250)

The wenshao + DeepSeek reviewer pass on a81ada43f surfaced 3 more
issues; this commit fixes them.

1. Dangling-symlink write escape (paths.ts) — Critical security
   bug. A request like `write /ws/escape` where `<ws>/escape` is
   a symlink whose target doesn't exist YET would pass the
   ENOENT-tolerant ancestor walk: realpath fails ENOENT, the
   walk-up returns `<ws>` as ancestor, the canonical becomes
   `<ws>/escape`, containment passes — but the eventual write
   follows the symlink and creates the file at the symlink
   target outside the workspace. lstat-then-readlink before the
   ancestor walk catches this; the symlink target is itself
   resolved via the deepest existing ancestor so macOS
   /private/var canonicalization stays consistent with
   boundCanonical (an absolute target inside the workspace
   tmpdir would otherwise have been false-flagged on macOS).

2. glob realpath catch over-reported symlink_escape
   (workspaceFileSystem.ts) — every realpath failure inside the
   per-hit boundary loop was counted as `symlink_escape`. EIO,
   EACCES, ENAMETOOLONG, EBUSY are environmental failures, not
   security events; mislabeling them poisoned the audit signal
   for operators trying to investigate genuine escape attempts.
   Now distinguished: ENOENT/ELOOP count as escapes; other
   errnos count as transient errors and emit a separate
   aggregated `fs.denied` with errorKind: 'permission_denied'.

3. policy.ts:enforceReadSize JSDoc said the boundary "intentionally
   does NOT throw" — stale after a81ada43f's hard-cap refactor.
   Rewrote to clarify the helper is the soft truncation gate that
   only fires under the hard cap; readText itself enforces the
   hard cap with file_too_large via its pre-stat check. The
   readme/contract is now consistent with workspaceFileSystem.ts.

2 new tests:
- dangling symlink targeting outside-workspace path → symlink_escape
- dangling symlink targeting future-inside-workspace path → succeeds
  (ahead-of-mkdir flow for atomic-write-via-rename)

420/420 serve tests pass; typecheck clean.

Remaining tracked follow-ups (per PR review reply):
- list/glob brand cast (P2 deferred per PR description)
- glob audit pathHash hashes pattern not paths
- edit() TOCTOU read-modify-write race (atomic-via-temp + rename)
- wrapAsFsError ENOSPC/EIO mapping to a distinct kind
- runQwenServe → fsFactory injection integration test
- glob maxResults streaming (glob.iterate)

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve/fs): cross-platform ENOTDIR detection (#4250 CI fix)

Windows CI test failure on a81ada43f surfaced a real cross-platform
bug in `findExistingAncestor`. POSIX returns `ENOTDIR` when
`fs.stat` traverses through a non-directory in a path component
(e.g. `${ws}/file.txt/leaf` where `file.txt` is a regular file).
**Windows returns `ENOENT` for the same case.** The errno-based
guard added in a81ada43f only branched on `ENOTDIR`, so the
Windows path silently fell through to the ancestor walk and the
boundary returned a "canonical" the eventual write could not
honor — `WorkspaceFileSystem - audit always emits on body errors >
rejects ENOTDIR ancestor walk with parse_error rather than
passing boundary` failed with `expected false to be true` on the
windows-latest runner.

Fix: switch from errno-based detection (platform-divergent) to
dirent-kind detection. After `fs.stat` succeeds during the
walk-up, if the existing ancestor is NOT a directory AND there
are unresolved tail components, throw `parse_error`. Both `ENOENT`
and `ENOTDIR` from `fs.stat` are now treated as "the *current*
path doesn't resolve, keep walking" — the post-walk kind check
fires regardless of which errno surfaced. Cross-platform-safe.

The local 110/110 fs tests still pass on macOS/Linux; the Windows
case will exercise the kind-check branch on next CI run.

macOS CI failures on the same workflow run (`InputPrompt.test.tsx`
placeholder reuse, `SettingsDialog.test.tsx` 5s timeout) are pre-
existing flaky UI tests, NOT touched by this PR.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve/fs): close 4 review-round-3 findings (#4250)

DeepSeek's third review pass (after b38e82157) flagged four more
issues; this commit fixes all of them.

1. Multi-hop dangling symlink bypass (paths.ts) — Critical
   security fix. The earlier single-readlink fix at efd7a4611
   was bypassed by chained dangling symlinks: <ws>/leak ->
   <ws>/middle -> /scratch/evil where every layer is a symlink
   and the final target doesn't exist. The fix only readlink'd
   the first hop (<ws>/middle), saw it was inside the workspace
   via findExistingAncestor, and let the chain through. The OS
   write at <ws>/leak would then follow both hops and create
   /scratch/evil. Now loops lstat + readlink up to
   MAX_ANCESTOR_HOPS, tracks visited inodes for cycle detection,
   and only validates containment on the fully-dereferenced
   leaf. Cycle detection rejects with symlink_escape; chains
   that exceed the hop bound also surface as symlink_escape
   with a "too long or contains a cycle" hint.

2. opts.line validation (workspaceFileSystem.ts) — the docstring
   committed to "1-based positive integer" but Infinity / floats
   / negative values flowed through to readFileWithLineAndLimit
   and degraded silently. Now enforces Number.isSafeInteger +
   line >= 1 at the boundary; everything else throws
   parse_error. Test covers Infinity, -Infinity, 0, -1, 1.5,
   NaN.

3. io_error FsErrorKind (errors.ts) — wrapAsFsError previously
   conflated EIO/EBUSY/ENAMETOOLONG/EMFILE/ENFILE/ENOSPC/ETXTBSY
   under permission_denied. Monitoring pipelines that key on
   errorKind for security alerting would page oncall on a full
   disk. New io_error kind (HTTP 503) maps the environmental-
   failure errnos with distinct hints. EACCES/EPERM stay on
   permission_denied (literal access denial); EIO (failing
   disk), EBUSY (busy file), ENAMETOOLONG (PATH_MAX),
   EMFILE/ENFILE (fd exhaustion), ENOSPC (df -h reporting
   100%), ETXTBSY (text-busy) all route to io_error.

4. glob audit kind taxonomy (workspaceFileSystem.ts) — three-way
   classification mirrors wrapAsFsError so the per-hit realpath
   catch surfaces ENOENT/ELOOP -> symlink_escape, EACCES/EPERM
   -> permission_denied, everything else -> io_error. Each
   class emits its own aggregated fs.denied event.

5. edit() matchedIgnore (workspaceFileSystem.ts) — readText and
   writeText both stamp matchedIgnore in their access audit;
   edit didn't, so operators monitoring fs.access events couldn't
   distinguish edits to .gitignore'd files (build artifacts,
   logs) from edits to tracked source. Added the same
   shouldIgnore + matchedIgnore plumbing that readText uses.

8 new tests:
- multi-hop dangling symlink (security)
- symlink cycle (security)
- ENOSPC/EIO/EBUSY/ETXTBSY/ENAMETOOLONG -> io_error mapping
- io_error -> HTTP 503
- EMFILE/ENFILE updated to io_error (was permission_denied)
- opts.line rejects Infinity/-Infinity/0/-1/1.5/NaN
- edit() audit records matchedIgnore on .log file

426/426 serve tests pass; typecheck clean.

Remaining tracked follow-ups (per PR review reply):
- list/glob brand cast (P2 deferred per PR description)
- glob audit pathHash hashes pattern not paths
- edit() TOCTOU read-modify-write race (atomic-via-temp + rename) — pinned to PR 20
- runQwenServe → fsFactory injection integration test — pinned to PR 19
- glob maxResults streaming (glob.iterate)

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve/fs): close 3 review-round-4 critical findings (#4250)

DeepSeek's fourth review pass surfaced three more critical bugs;
all are fixed in-PR.

1. TOCTOU symlink substitution in readText/readBytes/edit
   (workspaceFileSystem.ts) — Critical. fsp.stat(p) and the
   subsequent lowFs.readTextFile(p) (or fsp.readFile / fsp.readFile
   for edit) are two separate syscalls. An attacker who can write
   into the workspace can swap p from a regular file to a symlink
   pointing outside between the two calls; pre-stat sees the
   original, the read follows the swap.

   Fix: assertInodeStableAfterRead(p, preIno) — after each read,
   re-lstat p; reject with symlink_escape if the path is now a
   symlink (post.isSymbolicLink()) or its inode changed (preIno
   !== post.ino, with a 0-ino fallback for procfs / virtual
   mounts that don't report meaningful inodes). Catches the
   swap-and-leave attack and the swap-and-keep-swapped attack.
   Residual race (attacker swaps back AFTER our read but BEFORE
   our lstat) is much smaller than the original window and
   outside PR 18's threat model; fd-based reading via
   fsp.open + fileHandle.read would close it entirely but
   requires a new variant of lowFs that takes a FileHandle —
   tracked as a follow-up.

2. UTF-8 truncation corruption in readText (workspaceFileSystem.ts)
   — Critical. buf.subarray(0, sizeOutcome.bytesToRead).toString
   ('utf-8') silently emits U+FFFD when bytesToRead falls
   mid-codepoint (CJK, emoji). A downstream consumer parsing
   JSON or source code over the truncated content would see
   broken trailing bytes; meta.truncated would be true but the
   prefix is corrupt. A subsequent edit() with the corrupted
   string as oldText would also fail to match the on-disk content.

   Fix: safeUtf8Truncate(buf, maxBytes) walks back from maxBytes
   through any continuation bytes (0b10xxxxxx), then verifies
   the leading byte's full sequence fits within the cap; drops
   the leading byte if it doesn't. The result is always a clean
   prefix at a valid codepoint boundary. Test pins '中文测试' (12
   bytes / 3 bytes per char) truncated at 7 bytes -> '中文' (no
   U+FFFD).

3. glob opts.cwd bypasses workspace boundary
   (workspaceFileSystem.ts) — Critical. opts.cwd was used
   directly as the glob root with no validation against
   boundWorkspace. ResolvedPath is a brand cast and a stale
   or forged value lets a glob('**/*', { cwd: '/etc' })
   enumerate files outside the workspace. The pattern-side
   absolute / UNC checks added in a81ada43f only constrain
   the *pattern*; cwd is the actual hazard.

   Fix: at the entry point of glob(), path.resolve cwd and
   isWithinRoot-check against boundWorkspace. Throws
   path_outside_workspace if cwd is outside, even when the
   pattern itself is harmlessly relative. Test pins the case
   with cwd: scratch (outside workspace).

3 new tests:
- readText with mid-operation symlink swap -> symlink_escape
- safeUtf8Truncate keeps CJK codepoints intact at 7-byte cap
- glob with opts.cwd outside workspace -> path_outside_workspace

429/429 serve tests pass; typecheck + eslint clean.

Remaining tracked follow-ups (Post-PR-18 hardening, in #4175):
- list/glob brand-cast contract (PR 19)
- runQwenServe → fsFactory injection contract test (PR 19)
- edit() write-side TOCTOU + atomic-via-temp + expectedHash (PR 20)
- glob audit pathHash (independent audit.ts commit)
- glob maxResults streaming (independent hardening)
- glob pattern preflight refactor to reuse hasSuspiciousPathPattern (cosmetic)

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve/fs): close 7 review-round-5 findings (#4250)

DeepSeek round-5 surfaced 9 comments; 7 are real findings (the
other 2 — UTF-8 truncation + glob opts.cwd — already fixed in
546834267 from round-4 and replied-to inline). Stack on round-4.

1. edit() empty-oldText silent-prepend (workspaceFileSystem.ts) —
   Critical silent data corruption. JS `''.indexOf('')` returns 0,
   so without an empty-string guard
   `current.slice(0,0) + newText + current.slice(0)` = `newText +
   current` — silently prepends `newText` to the whole file with
   a success audit event. PR 20 routes that thread user-supplied
   `oldText` verbatim must not be able to trigger this. Now
   throws `parse_error` BEFORE the read with a hint explaining
   why empty matches are rejected.

2. DOS device name regex misses bare names + first-extension forms
   (paths.ts) — Windows attack surface. The earlier
   /\.(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])$/i only caught the
   last-extension form (file.CON). NTFS reserves these names
   regardless of extension: CON, NUL, CON.txt, NUL.dat,
   CON.foo.bar are ALL reserved device handles. New regex
   /(^|\.)(CON|...|LPT[1-9])(\.|$)/i covers all four positions
   (bare / first-ext / last-ext / middle-ext) while still
   admitting legitimate substrings (BACON, concat.txt, precon.go).

3. Buffer.byteLength replaces Buffer.from for size-only checks
   (workspaceFileSystem.ts) — 5 MB heap allocation per write
   eliminated. `writeText` and `edit()` previously materialized
   the entire UTF-8 payload (up to MAX_WRITE_BYTES) just to read
   `.length`; `Buffer.byteLength(content, 'utf-8')` returns the
   count without allocating.

4. edit() error message includes oldText snippet
   (workspaceFileSystem.ts) — Production debuggability. The
   earlier hint was just "edit() expects oldText to appear
   verbatim in the file" — at 3 AM an operator can't tell whether
   the mismatch is whitespace, a stale file, or a wrong target.
   Now includes a JSON-quoted truncated snippet (max 80 chars +
   ellipsis) of the searched-for text.

5. recordAndWrap forwards FsError message into fs.denied audit
   (workspaceFileSystem.ts + audit.ts) — Audit observability gap.
   Audit consumers debugging an incident saw `errorKind` + `hint`
   but lost the underlying OS error detail (path, errno text,
   byte count). FsDeniedAuditPayload now carries an optional
   `message` field; recordAndWrap forwards `fs.message`
   automatically.

6. EISDIR / ENOTDIR distinct hints (errors.ts) — UX. Both shared
   the same hint "a path component is not a directory where one
   was expected" — for EISDIR (path IS a directory but a file was
   expected) the wording was reversed. Now distinct hints with
   the errno name explicitly cited.

7. kindFromStats / kindFromDirent merged into kindFromStatLike
   (workspaceFileSystem.ts) — duplicate function bodies removed.
   Both fs.Stats and fs.Dirent expose the same isFile /
   isDirectory / isSymbolicLink interface, and both targets
   (FsStat['kind'], FsEntry['kind']) are the same 4-value union.
   Single helper avoids drift if the union grows.

4 new tests:
- bare/multi-ext DOS device names (CON, NUL, CON.txt, CON.foo.bar)
  + legitimate substrings (BACON, concat, precon, contemplating)
- edit() empty oldText -> parse_error + file unchanged
- edit() not-found error includes searched snippet in hint
- fs.denied audit payload carries FsError message

Plus the 2 already-fixed items (UTF-8 boundary, glob opts.cwd)
have new test coverage from round-4.

433/433 serve tests pass; typecheck + eslint clean.

Stack: 7b0db4c3aa81ada43fefd7a4611b38e82157911cb8e5d546834267 → THIS

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve/fs): close 7 review-round-6 findings (#4250)

Round-6 review (wenshao + gpt-5.5) flagged 7 items including
two Critical and one privacy regression I introduced in round-5.

1. writeText / edit pre-write symlink guard
   (workspaceFileSystem.ts) — Critical, two reviewers (wenshao
   #CsBcq + gpt-5.5 #CsB3M) independently flagged.
   `atomicWriteFile` (`packages/core/src/utils/atomicFileWrite.ts`)
   resolves symlinks at write time, so a swap between the
   boundary's `resolve()` and `lowFs.writeTextFile()` would let
   the write follow the symlink to outside the workspace.
   `writeText` had no read phase, so its window was wider than
   `edit()`'s. New helper `assertNotSymlinkBeforeWrite(p)` lstats
   the path immediately before each `lowFs.writeTextFile` call;
   ENOENT is fine (ahead-of-create flow), but an actual symlink
   throws `symlink_escape`. Used in `writeText` AND `edit()`.
   Residual race after this guard but before the write completes
   is the deferred PR 20 atomic-via-temp follow-up.

2. recordAndWrap message field bypassed privacy mode
   (audit.ts) — Critical privacy regression I introduced at
   ebd9e78a1 (round-5). The new `FsDeniedAuditPayload.message`
   field forwarded `FsError.message` unconditionally — and many
   throw-sites embed `${p}` (absolute paths) or user-supplied
   `oldText` snippets into the message. Privacy-mode operators
   (without `QWEN_AUDIT_RAW_PATHS=1`) saw paths leak through the
   message field even though they explicitly disabled raw-path
   logging. Fixed: `message` is now gated behind `includeRawPaths`
   alongside `relPath`. Privacy mode = no path-bearing fields,
   period. Operators wanting forensic context opt in via
   `QWEN_AUDIT_RAW_PATHS=1` and accept both fields together.

3. glob opts.cwd via symlink (workspaceFileSystem.ts) — gpt-5.5
   #CsB3P. Textual `path.resolve(cwd) + isWithinRoot` admits
   `<ws>/link` even when `<ws>/link → /etc` is a symlink to
   outside; `globAsync` walks `/etc` before the per-hit filter
   drops results. Switched to `fsp.realpath(path.resolve(cwd))`
   so the containment check sees the actual walk root. ENOENT
   on cwd surfaces as `path_not_found`.

4. readText OOM via concurrent file growth
   (workspaceFileSystem.ts) — wenshao #CsBeE. The pre-stat
   `MAX_READ_BYTES` gate only sees the size at stat time; a
   concurrent writer can grow the file before the actual
   `readFileWithLineAndLimit` slurp. Added post-read
   `Buffer.byteLength(result.content) > MAX_READ_BYTES` check.
   The proper fix (fd-based read tying size + read to the same
   inode) is a hardening follow-up; this byte-length check is
   the defense-in-depth layer.

5. readBytes maxBytes can widen past MAX_READ_BYTES
   (policy.ts) — wenshao #CsBj5. `enforceReadBytesSize(st.size,
   opts.maxBytes)` used the caller-supplied `maxBytes` as the
   ceiling, replacing rather than clamping `MAX_READ_BYTES`. A
   future PR 19/20 route forwarding `req.query.maxBytes` could
   blindly bypass the daemon's 256 KiB safety cap. Now clamps
   via `Math.min(maxBytes, MAX_READ_BYTES)`.

6. ENOENT_TOLERATING_INTENTS docstring + test (paths.ts) —
   wenshao #CsBk3. The Intent docstring only mentioned `'write'`
   tolerating ENOENT; `'stat'` was in the set undocumented. A
   future maintainer removing `'stat'` thinking it was a
   copy-paste error would silently change behavior (stat on a
   concurrently-deleted path would throw `path_not_found` from
   the resolver instead of letting `fsp.lstat` throw `ENOENT`
   naturally). Amended docstring to call out `'stat'`'s rationale
   explicitly + added contract corpus case.

6 new tests:
- glob cwd via symlink to outside → path_outside_workspace
- writeText with mid-operation symlink swap → symlink_escape +
  outside file unchanged
- edit with mid-operation symlink swap → symlink_escape + outside
  file unchanged
- readBytes opts.maxBytes attempting widening → file_too_large
- fs.denied message field absent in privacy mode (default)
- fs.denied message field present in raw-paths mode (forensic)
- contract corpus: resolve('newdir/leaf', 'stat') succeeds for
  ENOENT path

439/439 serve tests pass; typecheck + eslint clean.

Stack: 7b0db4c3aa81ada43fefd7a4611b38e82157911cb8e5d546834267ebd9e78a1 → THIS

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve/fs): close 5 review-round-7 findings (#4250)

glm-5.1 round-7 review surfaced 6 items; 5 are real fixes, 1 was
already addressed in 7f4f30d0b (writeText pre-write symlink
guard — reviewer looked at ebd9e78a1 snapshot ~8h before the
round-6 fix). Stack on round-6.

1. edit() bypassed lowFs.readTextFile (workspaceFileSystem.ts) —
   Critical encoding round-trip corruption. The earlier
   `fsp.readFile(p, 'utf-8')` included the UTF-8 BOM verbatim in
   `current`, breaking `oldText` matching even when the user
   passed the exact source text from a previous read; lost
   iconv-supported codepage handling (GBK / Big5 / Shift_JIS),
   so non-UTF-8 files would mojibake into `current` and
   round-trip-corrupt on write-back; and the subsequent
   `lowFs.writeTextFile` passed no `_meta`, stripping the BOM
   and forcing UTF-8 on write-back even when the original was
   BOM'd or non-UTF-8. Fix: read via `lowFs.readTextFile` AND
   forward `readResult._meta` into the write-back. Test pins
   UTF-8 BOM round-trip end-to-end.

2. hasSuspiciousPathPattern multi-digit and POSIX false positive
   (paths.ts) — `/~\\d/` only matched single-digit; NTFS
   allocates `~10+` on >9 collisions. POSIX false-positives:
   editor swap files, backup tools used `~N` legitimately. Fixed
   to `/~\\d+/` and gated behind `process.platform === 'win32'`,
   matching the ADS-colon check.

3. canonicalizeWorkspace redundant per-request syscall
   (paths.ts) — Performance. Factory canonicalizes once at
   build; every `resolveWithinWorkspace` also ran realpathSync
   on the same path, blocking the event loop. Added
   CANONICAL_BOUND_CACHE Map keyed on the input string;
   steady-state size = 1 per `1 daemon = 1 workspace`.

4. readBytes opts.maxBytes API contract
   (workspaceFileSystem.ts) — Semantic mismatch. Parameter name
   promised window-read; impl only used it as a hard reject
   gate. Now truncates the buffer post-read so `readBytes(p,
   { maxBytes: 1024 })` on a 200 KB file returns 1 KB. Hard
   `MAX_READ_BYTES` cap still throws for files above it.

5. glob walks node_modules and .git unnecessarily
   (workspaceFileSystem.ts) — Performance. Without an `ignore`
   option, `globAsync` traversed every file under those dirs
   before our per-hit `shouldIgnore` filter. Now passes
   `ignore: ['**/node_modules/**', '**/.git/**']` to
   short-circuit traversal. Post-filter via `shouldIgnore`
   remains authoritative.

5 new tests:
- 8.3 short-name regex Windows / POSIX split
- readBytes truncates 2048-byte file to 1024 with maxBytes
- readBytes throws file_too_large only above hard cap
- edit() preserves UTF-8 BOM round-trip
- glob prunes node_modules and .git

442/442 serve tests pass; typecheck + eslint clean.

Stack: 7b0db4c3aa81ada43fefd7a4611b38e82157911cb8e5d546834267ebd9e78a17f4f30d0b → THIS

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve/fs): close 10 review-round-8/9 findings (#4250)

Two more review passes (gpt-5.5 + DeepSeek + wenshao) flagged 13
items; 10 are real fixes, 3 are reviewer-stale-snapshot or
already-tracked. Stack on round-7.

Critical (5):

1. paths.ts symlink-escape hint embedded the symlink target
   (gpt-5.5) — Privacy regression sibling to round-6 audit
   `message` gate. `recordDenied` always forwards `hint` into
   `fs.denied` even with `QWEN_AUDIT_RAW_PATHS` off; the hint
   `'symlink points to /Users/alice/secret'` leaks the
   attacker's intended exfiltration path through audit. Hint is
   now path-free; operators wanting the resolved target enable
   `QWEN_AUDIT_RAW_PATHS=1` and read it from `relPath` /
   `message`.

2. paths.ts dangling-symlink chain discarded its verified
   canonical (DeepSeek) — After the multi-hop walk validated
   `cursor → canonicalTarget` was inside the workspace, the
   code fell through to `findExistingAncestor(absolute)`,
   re-walking from the original input and discarding the
   verified result. An attacker swapping an intermediate
   symlink between the verification and the re-walk could
   produce a different canonical than the one validated. The
   verified `canonicalTarget` is now captured in
   `symlinkResolvedCanonical` and used directly; the
   `findExistingAncestor(absolute)` fallthrough only runs when
   no symlink was traversed.

3. workspaceFileSystem.ts readBytes missing post-read size
   check (DeepSeek) — Same TOCTOU shape as `readText`'s
   round-6 fix. The pre-stat `enforceReadBytesSize` sees the
   size at stat time; a concurrent appender keeps the same
   inode but grows the file past the cap before
   `fsp.readFile` returns. `assertInodeStableAfterRead`
   catches inode changes but not same-inode growth. Added a
   post-read `buf.length > MAX_READ_BYTES` check matching
   `readText`'s defense-in-depth pattern.

4. errors.ts wrapAsFsError default = permission_denied
   (DeepSeek) — Misclassified non-errno errors (`TypeError`,
   programmer-error throws, native module exceptions) as
   security denials, paging security oncall for what should
   be a developer ticket. New `internal_error` kind (HTTP
   500) is the new default; `permission_denied` reserved for
   actual `EACCES`/`EPERM`.

5. audit.ts AuditContext.sessionId not forwarded to
   BridgeEvent (DeepSeek) — Multi-session daemons couldn't
   trace audit events back to the session that triggered
   them. `originatorClientId` identifies the client, not the
   session. Added optional `sessionId` field to both
   `FsAccessAuditPayload` and `FsDeniedAuditPayload`,
   forwarded from `ctx.sessionId` when present.

Improvements (4):

6. workspaceFileSystem.ts glob cwd realpath redundant when
   cwd === boundWorkspace (wenshao) — `boundWorkspace` is
   already canonicalized by the factory (`realpathSync.native`
   at build time), so calling `fsp.realpath` per-request when
   no `opts.cwd` was supplied is a redundant async syscall.
   Added a short-circuit.

7. workspaceFileSystem.ts kindFromStatLike JSDoc orphaned
   (wenshao) — Inserting `assertNotSymlinkBeforeWrite` between
   the JSDoc and `kindFromStatLike` left the doc floating
   above the wrong function. IDE hovers showed the wrong
   description. Moved the doc back to its function.

8. workspaceFileSystem.ts shared mutable Ignore object
   (DeepSeek) — `createWorkspaceFileSystemFactory` builds one
   `Ignore` instance and shares it across every
   `WorkspaceFileSystemImpl` returned by `forRequest()`.
   `Ignore.add(): this` is a public mutator. A future
   "per-session ignore rules" feature calling `.add()` from a
   request handler would silently corrupt all concurrent
   sessions. `Object.freeze` turns the cross-request mutation
   into a `TypeError` rather than a silent leak.

9. server.ts createDefaultFsAuditEmit one-shot warned
   (DeepSeek) — Permanent silent no-op after the first event;
   only logged the event `type` with no pathHash / errorKind /
   intent. If PR 19 forgets the real factory injection, every
   write 403s and audit is silent past the first warning —
   exactly the regression the warning exists to surface.
   Periodic warning (every 100th drop) + first-event context
   (errorKind, intent, pathHash) makes the regression
   actionable in production logs.

Cleanup (1):

10. workspaceFileSystem.ts safeUtf8Truncate dead code
    (DeepSeek noted as "off-by-one") — The lead-byte
    seqLen-check block was dead code: `subarray(0, end)`
    already excludes the leading byte at `end`, so no
    further adjustment is ever needed. Removed the block;
    function is now 4 lines and still produces a valid
    codepoint prefix. Reviewer's suggested fix
    (`buf[end-1] → buf[end]`) was technically correct but
    redundant with the subarray cut.

Already-fixed (3 reviewer-stale-snapshot, reply + resolve):

- writeText pre-write symlink guard — fixed in 7f4f30d0b
- edit() read-modify-write race — already deferred to PR 20
  atomic-via-temp follow-up
- glob maxResults walk-bound — already follow-up #5

3 new tests + 2 updated:
- wrapAsFsError unknown errno → internal_error (default change)
- internal_error has HTTP 500
- non-Error throwables → internal_error (not permission_denied)
- readBytes post-stat growth → file_too_large
- existing wrapAsFsError test updated for new default

445/445 serve tests pass; typecheck + eslint clean.

Stack: 7b0db4c3aa81ada43fefd7a4611b38e82157911cb8e5d546834267ebd9e78a17f4f30d0b1dc9d2290 → THIS

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve/fs): close 3 review-round-10 findings (#4250)

Round-10 review (wenshao) flagged 4 items, all marked
non-blocking by the reviewer. 3 are landed in-PR; 1 (glob
withFileTypes optimization) is moved to issue #4175 follow-ups
since the reviewer themselves recommended "test it on a 100K-file
workspace after PR 19 lands."

1. canonicalizeWorkspace docstring (paths.ts) — Documentation.
   The earlier FIXME warned about sync syscall blocking the
   event loop without mentioning the `CANONICAL_BOUND_CACHE`
   added in 1dc9d2290 brings steady-state cost to zero. Future
   reviewers reading the FIXME again would re-flag the
   already-mitigated concern. Added a paragraph noting cache hit
   rate = 100% under the `1 daemon = 1 workspace` model so
   per-request blocking only happens at boot or on fresh
   workspace values (e.g. tests).

2. enforceReadBytesSize dead `maxBytes` parameter (policy.ts) —
   Round-7 changed `readBytes` to use post-read truncation for
   the soft window cap, leaving the function's `maxBytes`
   parameter unused at the only callsite. The
   `Math.min(maxBytes, MAX_READ_BYTES)` clamp branch became dead
   code. Reviewer's option 1: tighten the signature to
   `enforceReadBytesSize(fileBytes: number): void`. Done — the
   function is now purely the hard-cap enforcer its name
   implies; soft-window truncation lives in the orchestrator's
   `buf.subarray(0, opts.maxBytes)` post-read step where it's
   visible alongside the cap check it complements.

3. relForAudit cross-drive sentinel (audit.ts) — Windows-only
   privacy edge case. `path.relative('C:\\ws', 'D:\\evil')`
   returns `'D:\\evil'` (an absolute path) because Win32 can't
   express cross-drive relatives. Even when raw-paths mode is
   ENABLED, the audit `relPath` field would carry the off-drive
   absolute path, exposing the attacker's drive letter +
   directory. Added a `path.isAbsolute(rel)` post-check that
   substitutes a `<cross-drive>` sentinel — audit consumers see
   the cross-drive case distinctly without leaking the offending
   path. This was previously P2 deferred in PR 18's description
   ("Windows cross-drive path.relative"); reviewer's "few lines
   beats deferring" assessment was right.

Tracked as follow-up (not in this commit):

4. glob withFileTypes optimization — Replace per-hit `lstat`
   (line ~664) with `glob` v10's `withFileTypes: true` so each
   hit comes back as a `Path` object with `isDirectory()` /
   `isFile()` / `isSymbolicLink()` already available. Saves N
   syscalls in large workspaces. Non-trivial restructure (return
   type changes from `string[]` to `Path[]`). Reviewer
   themselves marked "[performance / non-blocking]" and said
   "test it on a 100K-file workspace after PR 19 lands so we
   know whether it's worth it." Added to issue #4175 body's
   Post-PR-18 hardening follow-ups.

446/446 serve tests pass; typecheck + eslint clean.

Stack: 7b0db4c3aa81ada43fefd7a4611b38e82157911cb8e5d546834267ebd9e78a17f4f30d0b1dc9d2290a33d459df → THIS

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
2026-05-18 13:14:43 +08:00
jinye
96219924a0
feat(serve): MCP client guardrails (#4175 Wave 3 PR 14) (#4247)
* feat(serve): MCP client guardrails (#4175 Wave 3 PR 14)

Adds an in-process MCP client counter, slot-reservation enforcement at all 3 spawn sites (discoverAllMcpTools / discoverAllMcpToolsIncremental / readResource), new `--mcp-client-budget=N` + `--mcp-budget-mode={enforce,warn,off}` CLI flags forwarded to the ACP child via env, and additive `clientCount` / `clientBudget` / `budgetMode` / `budgets[]` fields plus `disabledReason: 'budget'` tagging on `GET /workspace/mcp`.

Always-on capability tag `mcp_guardrails` with `modes: ['warn', 'enforce']` so SDK clients can pre-flight refusal semantics. Typed SSE push events (`mcp_budget_warning` / `mcp_child_refused_batch`) intentionally deferred to a small follow-up PR — the snapshot already exposes `budgets[0].status: 'warning'|'error'` + `refusedCount` so operator visibility isn't blocked.

* fixup(serve): address PR 14 review (#4247) findings 1-7

Addresses Codex + Copilot review feedback on #4247. Seven functional and forward-compat fixes; (8) `tcp` transport mapper vs createTransport deferred pending @wenshao direction (separate core/protocol decision).

1. **Single-server rediscovery bypass** — add `tryReserveSlot` at the top of `discoverMcpToolsForServerInternal`. Pre-fix a server refused at startup could be brought online later via `/mcp reconnect <name>` and exceed the cap in enforce mode.
2. **Empty `budgets[]` when mode=off** — early `return []` in `buildBudgetCells` when mode is `off`. Protocol docs / SDK types promise empty array; pre-fix emitted a synthetic noisy cell.
3. **runQwenServe validation + env leakage** — mirror CLI budget validation in `runQwenServe` (the embedded entry point); explicitly delete `QWEN_SERVE_MCP_*` env vars when options are undefined so multiple daemons in one process don't leak prior budget config to subsequent ACP children.
4. **Disabled-vs-refused precedence + stale refusal log** — config-disable wins over budget refusal in the per-server cell; `removeServer` + `disconnectServer` drop the entry from `lastRefusedServerNames` so operator action immediately clears the budget tag.
5. **Incremental remove-before-reserve ordering** — process config-removed servers FIRST in `discoverAllMcpToolsIncremental` so freed slots are visible to subsequent `tryReserveSlot` calls. Pre-fix scenario {a,b}→{a,c} with budget=2 wasted a slot.
6. **`scope` forward-compat type widening** — `'workspace' | (string & {})` on both `ServeMcpBudgetStatusCell` and `DaemonMcpBudgetStatusCell` so SDK consumers don't break when PR 23 adds `scope: 'pool'` per the documented no-schema-bump contract.
7. **Test comment alignment** — fix "With budget=1" comment to match `clientBudget: 2` code.

Plus 4 new core regression tests covering #1/#2/#4/#5, and 4 new serve tests covering #3 (boot rejection + env cleanup). 237/237 pass across the affected files (36 core mcp-client-manager + 50 acpAgent + 151 serve).

* docs(serve): clarify v1 snapshot-based budget warning detection (#4247)

Address github-actions review-summary finding (I) on PR #4247: v1 operators have no SSE push event for budget pressure yet (deferred to PR 14b), so the protocol doc should explicitly say how to detect warning / error states from the snapshot. Adds the three-way mapping `budgets[0].status` ↔ live/refused counts.

* fixup(serve): address PR 14 review round 2 (#4247 wenshao)

Addresses @wenshao review on PR #4247. Three critical safety fixes + four suggestion-level improvements.

Critical (zombie slot leaks — would break `enforce` mode for the rest of the daemon's lifetime):
- C2: `discoverAllMcpTools` connect() catch now releases reservedSlots + clients entry. Pre-fix one failed connect permanently consumed a budget slot.
- C3: `readResource` wraps client.connect() in try/catch; on throw the slot + client entry are cleaned up before re-raising. Tracked `weReservedSlot` so the cleanup only fires for newly-created lazy spawns (reused already-CONNECTED clients are untouched).
- (wenshao C1 was the rediscovery-bypass also caught by Codex + Copilot — already addressed in fixup 597f011e6.)

Suggestion:
- S4: `readBudgetFromEnv` downgrades `mode='enforce'` → `'off'` when no budget is set, mirroring the CLI + `runQwenServe` invariant. Fail-closed on operator misconfiguration rather than silently bypassing enforcement.
- S5: extract duplicated `mcp_budget_decision` telemetry into private `emitBudgetTelemetry(configuredCount)`.
- S6: rename `BudgetExhaustedError` constructor param `liveCount` → `reservedCount`. `reservedSlots.size` is what's blocking the new server, not the live CONNECTED count (those differ when a reserved server is disconnected).
- S7a: bump accounting-failure log level — `debugLogger.debug` (gated on debug=true) replaced by `process.stderr.write` so production daemons surface slot-leak / type-mismatch failures in journald/docker logs.

(S7b — expose `reservedSlots[]` on the wire for slot-leak debugging — deferred as additive; will be in PR 14b alongside the typed events.)

+ 3 new core regression tests (C2 leak release, C3 lazy-spawn leak release, S4 env enforce-downgrade). 626/626 tests pass across the focused suite; typecheck + lint clean.

* fixup(serve): address PR 14 review round 3 (#4247 wenshao second pass)

Addresses @wenshao's second review pass on PR #4247 (submitted 15:56Z after round 2 fixup landed). Four code fixes + three doc clarifications.

Code:
- R3 #5: `readResource` lazy-spawn path now checks `isMcpServerDisabled` BEFORE the budget gate. Pre-existing gap: a server disabled via `mcpServers.<name>.disabled: true` or `/mcp disable <name>` could be resurrected by any resource read. Disabled precedence over budget mirrors the per-server cell logic.
- R3 #6: `buildBudgetCells` now receives the post-disabled-filter `refusedCount` so the workspace cell matches the per-server cell precedence. Pre-fix a server disabled after being refused rendered `disabled` on its per-server row but `error: budget_exhausted` on the workspace row.
- R3 #7: extract `MCP_BUDGET_WARN_FRACTION = 0.75` constant. Was hardcoded in `acpAgent.buildBudgetCells` AND `commands/serve.ts` stderr breadcrumb (the latter with `Math.ceil` divergence on non-integer multiples). Pre-extract so PR 14b's dual-threshold (0.75 warn + 0.375 rearm) lands in one file.
- R3 #1: env-var enforce-without-budget downgrade (already fixed in round 2 ba3e3febd S4 — reply-only on the new thread).

Docs:
- R3 #2: docstring on `mcpTransportOf` now spells out the `tcp` vs `createTransport` divergence + records the deferred decision (PR 14b / future core). Closes the "comment claims X but code does Y" gap.
- R3 #3: comments in both `discoverAllMcpTools` catch (release slot — stop() owns lifecycle) AND `discoverMcpToolsForServerInternal` catch (KEEP slot — operator intent + health-monitor retry). Different paths, different contracts, both explicit.
- R3 #4: invariant note in `readResource` lookup→reserve sequence documenting the synchronous no-await guarantee that closes the TOCTOU window.

+ 3 new core regression tests (readResource disabled gate, disabled-wins-over-budget precedence, MCP_BUDGET_WARN_FRACTION pin). 629/629 tests pass; typecheck + lint clean.

* fixup(serve): address PR 14 review round 4 (#4247 wenshao second + third pass)

Addresses @wenshao's second + third review passes on PR #4247. One critical scope-correction (per-session vs per-workspace) + one zombie leak fix shared across three threads.

Critical correction — per-session vs per-workspace (wenshao R3 line 117 docs):
- Reality check: `acpAgent.newSessionConfig()` constructs a fresh `Config` + `ToolRegistry` + `McpClientManager` for EVERY ACP session. Each manager independently reads `QWEN_SERVE_MCP_CLIENT_BUDGET` env. So `--mcp-client-budget=10` with 5 sessions caps at 5 × 10 = 50 live MCP clients across the daemon, NOT 10. The "per-workspace" framing in v1 docs was incorrect.
- Pragmatic v1 path (not the big refactor): rewrite docs + change `scope: 'workspace'` → `scope: 'session'` so the wire contract reflects reality. Wave 5 PR 23 (shared MCP pool) will introduce a workspace-scoped manager and add `scope: 'workspace'` cells alongside.
- Files touched: `status.ts` + `sdk types.ts` (cell `scope` field widened to `'session' | 'workspace' | (string & {})` with v1 emitting `'session'`), `acpAgent.buildBudgetCells` (emits `'session'` + new code comment explaining the per-session truth), `docs/users/qwen-serve.md` (CLI flag + budget section relabel + ⚠️ v1 limitation callout), `docs/developers/qwen-serve-protocol.md` (capabilities section + JSON example + paragraph rewrite + per-session detection hint).

Zombie leak fix — single weReserved-pattern fix in discoverMcpToolsForServerInternal closes wenshao R3 line 546 + R4 line 639 + R4 line 929:
- Same pattern as R2 C3 (`readResource`): track `weReservedSlot = reservation === 'reserved' && this.reservedSlots.has(serverName)` (the set-membership guard distinguishes a real fresh reservation from `off`-mode's no-op return). On connect-failure, release slot + drop client only when `weReservedSlot`; an `'already_held'` reconnect keeps its slot so health-monitor retry doesn't compete for capacity.
- Pre-fix a brand-new server connecting via /mcp reconnect / health monitor / incremental's serversToUpdate that failed on connect() would permanently consume a budget slot under enforce mode.
- Updated R3's "always keep" doc comment to reflect the new two-mode cleanup (release on fresh + keep on reconnect).
- Caught and added a tripwire test for the `off`-mode no-op edge case (`tryReserveSlot` returns `'reserved'` without adding to the set in off mode — without the has-guard, my fix would have broken the pre-existing "should restore health checks after failed server rediscovery" test by deleting the failed client even in unbudgeted operation).

+ 2 new core regression tests (fresh-reserve connect-failure releases slot, reconnect connect-failure keeps slot). 631/631 focused tests pass; typecheck + lint clean.

* fixup(serve): address PR 14 review round 5 (#4247 wenshao fourth pass)

Addresses @wenshao's fourth review pass on PR #4247. Two critical zombie-leak / staleness fixes; three reviewer findings deferred or already-addressed (replied + resolved on the threads).

Critical fixes:
- R5 line 956: `runWithDiscoveryTimeout` timeout handler now releases `reservedSlots.delete(serverName)` and drops the stale `lastRefusedServerNames` entry alongside the existing `clients.delete`. Pre-fix a timed-out server in `enforce` mode permanently held its budget slot; N consecutive timeouts permanently degraded daemon capacity. + regression test.
- R5 line 1268-1: `readResource` lazy-spawn path drops the server from `lastRefusedServerNames` when `tryReserveSlot` returns `'reserved'` (a successful late re-reservation). Pre-fix a server refused at discovery but later re-reserved via `readResource` (e.g., after another server freed a slot) kept its stale `disabledReason: 'budget'` tag in the snapshot. + regression test.

Reviewer findings deferred / already done (replied + resolved):
- R5 line 1268-2 (`no try/catch around connect()` in readResource): stale view — R2 C3 fixup ba3e3febd added the try/catch with the weReservedSlot cleanup pattern.
- R5 line 1274 (`BudgetExhaustedError.liveCount` semantic mismatch): R2 S6 fixup ba3e3febd renamed the param + readonly field to `reservedCount`, exactly matching the proposed semantic.
- R5 acpAgent.ts null line (`Math.ceil(0.75 * budget)` for small budgets): proposed fix is semantically a no-op for integer liveCount — `liveCount >= 0.75` and `liveCount >= Math.ceil(0.75) === 1` give identical results when liveCount is an integer. The underlying "small budgets jump ok→error" observation is a real but inherent limitation of percentage-based thresholds at small N; design tradeoff, not implementation bug.

46/46 core tests pass (44 prior + 2 new R5 regression). Typecheck + lint clean.

* fixup(serve): address PR 14 review round 6 (#4247 wenshao fifth pass)

Addresses @wenshao's fifth review pass on PR #4247. Two critical fixes (one TOCTOU race, one cross-daemon env leak).

Critical fixes:
- R6 Thread 2 (line 956): remove the duplicate pre-reservation block in `discoverAllMcpToolsIncremental`. The reservation already happens inside `discoverMcpToolsForServerInternal` (R1 fix #1). With both sites reserving, the timeout cleanup raced against the inner connect path — `runWithDiscoveryTimeout`'s timeout handler could release the slot mid-flight while the inner `connect()` later resolved successfully, leaving a CONNECTED client with NO reservation and breaking `enforce`-mode budget enforcement. With pre-reservation removed, the inner call owns the entire reservation lifecycle (reserve → connect → release-on-failure-via-weReservedSlot → cleared-by-timeout-if-fires) at a single site. Refusal behavior is observably identical from outside.

- R6 Thread 1 (runQwenServe.ts:216): per-handle env passthrough via new `BridgeOptions.childEnvOverrides` instead of mutating global `process.env`. Pre-fix concurrent embedded `runQwenServe()` handles with different MCP budgets would race on the global env — `defaultSpawnChannelFactory` snapshots `process.env` AT SPAWN TIME, so the last `runQwenServe()` call to set the var would silently win for ALL daemon handles' subsequent ACP child spawns. Wire surface:
  - `ChannelFactory` signature: `(workspaceCwd, childEnvOverrides?) => Promise<AcpChannel>`.
  - `BridgeOptions.childEnvOverrides?: Readonly<Record<string, string | undefined>>` — `undefined` value means "scrub this var from the child env" so an embedded caller can wipe a stale inherited var without touching global state.
  - `defaultSpawnChannelFactory` merges overrides AFTER `SCRUBBED_CHILD_ENV_KEYS` so the daemon-only secret list still wins (operators can't override the scrub).
  - `runQwenServe` closes over per-handle overrides; never touches `process.env`.

+ 3 new regression tests (incremental refusal post-pre-reservation-removal, runQwenServe-doesn't-mutate-process.env, bridge forwards childEnvOverrides to channelFactory with two concurrent bridges asserting isolation). 327/327 focused tests pass; typecheck + lint clean.

* fixup(serve): address PR 14 review round 7 (#4247 wenshao sixth pass)

Addresses @wenshao's sixth review pass on PR #4247 (glm-5.1 via Qwen Code /review). One critical staleness fix + four real bug fixes + one operator-visibility breadcrumb + one refactor.

Critical:
- R7 #1 line 612: `discoverMcpToolsForServerInternal` now drops the entry from `lastRefusedServerNames` on successful connect+discover. Pre-fix a previously-refused server that reconnects via `/mcp reconnect` (or health-monitor retry after another server frees capacity) left the snapshot reporting `error / disabledReason: 'budget'` for a CONNECTED, working server until the next discovery pass cleared the per-pass log.

Real bugs:
- R7 #2 line 528: disabled gate added to `discoverMcpToolsForServerInternal`. Reachable from `/mcp reconnect`, OAuth re-discovery, and health-monitor `reconnectServer` — none of which previously checked `isMcpServerDisabled`. Pre-fix a disabled server could be resurrected through any of these paths, wasting a budget slot and registering tools the operator told us to ignore. Mirrors the bulk-discovery + readResource patterns. Optional-chain on the call to stay defensive against test fixtures missing the method.
- R7 #3 line 634: transport leak in the `discoverMcpToolsForServerInternal` connect-failure catch. Pre-fix when `connect()` succeeded (transport established) and `discover()` later threw, the catch deleted the client reference without calling `client.disconnect()`, leaking the stdio child / socket until Node exit. Best-effort `await client.disconnect()` added before the map cleanup.
- R7 #4 line 1302: `readResource`'s `weReservedSlot` now uses the same `reservation === 'reserved' && this.reservedSlots.has(serverName)` guard as `discoverMcpToolsForServerInternal`. Distinguishes a real fresh reservation from `off`-mode's no-op return. Maintenance-trap fix; in `off` mode the cleanup branch never fires now.
- R7 #5 line 1342: `readResource` re-checks `isMcpServerDisabled` on EVERY call, regardless of whether the client was just lazy-spawned or pre-existing. Pre-fix a server connected pre-disable and then operator-disabled mid-session via settings reload still served resource reads via its existing CONNECTED client until the next incremental discovery pass called `removeServer`.

Polish:
- R7 #6 line 191: `readBudgetFromEnv` now emits a stderr breadcrumb when env values are invalid (`QWEN_SERVE_MCP_CLIENT_BUDGET=abc`, `QWEN_SERVE_MCP_BUDGET_MODE=foo`). Pre-fix operator typos silently fell through to "no enforcement". Same pattern as the `--require-auth` boot log.
- R7 #7 line 464: extracted `dropRefusalEntry` (4 sites) + `refuseAndLog` (3 sites) helpers. Pure refactor, zero behavior change. The `readResource` refusal path now calls `refuseAndLog` before throwing `BudgetExhaustedError` so operators get the same stderr trail as bulk-discovery refusals.

+ 5 new core regression tests (refusal-cleared-on-success, internal-disabled-gate, discover-throw-disconnects, env-typo-breadcrumb, existing-client-disabled-rejected). 52/52 core tests pass; typecheck + lint clean.

* fixup(serve): address PR 14 review round 8 (#4247 wenshao seventh pass)

Addresses @wenshao's seventh review pass on PR #4247 (gpt-5.5 + DeepSeek/deepseek-v4-pro via Qwen Code /review). One critical transport leak + three soundness/consistency fixes; one optional clarity refactor explicitly deferred.

Critical:
- R8 #1 line 532 (4 duplicate threads): bulk-path transport leak. Mirrors the R7 #3 fix but in `discoverAllMcpTools` instead of the per-server path. Pre-fix: when `connect()` succeeded (transport established) and `discover()` later threw, the bulk catch deleted the client reference without calling `client.disconnect()`, leaking the stdio child / WebSocket / HTTP socket for the rest of the daemon's lifetime (`stop()` can't see what we just removed from `this.clients`). Best-effort `await client.disconnect()` added before `clients.delete` + `reservedSlots.delete`. Updated the doc comment that misleadingly claimed `stop()` was the lifecycle owner — true only for slot bookkeeping, not transports.

Soundness:
- R8 #2 line 431: tighten `readBudgetFromEnv` mode-without-budget downgrade. Originally only `enforce` got downgraded to `off` when no budget was set; `warn` mode without a budget threshold reached `emitBudgetTelemetry` with `clientBudget: undefined`, contradicting the JSDoc invariant `mode !== 'off' ⇒ clientBudget defined`. Now both `enforce` AND `warn` downgrade to `off` when no budget is configured. The invariant comment was also weakened to match the actual `?? 0` defense-in-depth (the new R8 #5 constructor downgrade closes the remaining edge case).

- R8 #5 line 302: constructor mirrors the `readBudgetFromEnv` downgrade for the direct `budgetConfig` parameter. All production callers (CLI, `runQwenServe`, env-var fallback) validate upfront, but a future code path that injects `budgetConfig` directly without re-validating would re-introduce the silent fail-open. Defense in depth.

- R8 #4 line 1221: distinguish fresh vs `'already_held'` reservations in `runWithDiscoveryTimeout`'s timeout handler. New private `freshReservations: Set<string>` field marked when `weReservedSlot === true` inside `discoverMcpToolsForServerInternal` and cleared in finally / catch / success. Timeout handler now releases the slot ONLY when `freshReservations.has(serverName)` — meaning the slot was freshly reserved by THIS in-flight call. `'already_held'` reconnect timeouts (a previously-healthy server's transient hiccup) keep the slot so health-monitor retry doesn't have to compete for capacity with new servers admitted during the timeout window. Aligns the timeout handler with the connect-failure catch's `weReservedSlot` semantics — closes the asymmetry wenshao R8 #4 caught.

Deferred:
- R8 #3 line 332 (`tryReserveSlot` `'observed'` return value clarity): optional, non-blocking style improvement that ripples through 3 call sites + many tests for zero behavior change. Worth doing in a focused refactor PR; flagged as deferred polish, not in this fixup.

+ 3 new core regression tests (bulk discover-throw disconnects, warn-no-budget downgrade, constructor enforce downgrade). 679/679 focused tests pass; typecheck + lint clean.

* fixup(serve): address PR 14 review round 9 (#4247 wenshao eighth pass)

Addresses @wenshao's eighth review pass on PR #4247 (glm-5.1 via Qwen Code /review). Six actionable findings adopted; two threads explained as not-actionable (one stale-view, one reviewer hallucination).

Critical / real bugs:
- R9 #2 line 1534: `readResource` lazy-spawn connect-failure catch now does best-effort `await client.disconnect()` BEFORE `clients.delete` + `reservedSlots.delete`. Mirror of R7 #3 (per-server discovery) and R8 #1 (bulk discovery) — closes the same transport-leak class for the third spawn path. Pre-fix: connect() establishing the transport but throwing on a later handshake step would orphan the stdio child / socket.
- R9 #6 line 1521: `readResource` lazy `client.connect()` now wraps in `Promise.race` against `discoveryTimeoutFor(serverConfig)` — same per-server timeout the bulk + incremental paths use. Pre-fix a hung MCP server during a resource-read spawn would block forever and permanently consume a budget slot under enforce mode, cascading into total budget exhaustion. `serverConfig` lookup hoisted to the top of `readResource` so both lazy-spawn and existing-client branches use identical timeout behavior.
- R9 #8 line 1514: `readResource` lazy spawn now calls `this.startHealthCheck(serverName)` after a successful connect. Pre-fix a lazy-spawned server that later disconnected (crash, network) had no automatic reconnect — sat DISCONNECTED until the next readResource or incremental pass. Mirrors `discoverMcpToolsForServerInternal`'s finally-block pattern.

Operator-visibility:
- R9 #7 (general): `readBudgetFromEnv` now writes a stderr breadcrumb when the `(enforce|warn)`-without-budget downgrade fires. Pre-fix a Docker Compose / k8s env that set `QWEN_SERVE_MCP_BUDGET_MODE=enforce` but forgot the matching `_BUDGET=N` would silently boot with enforcement off and `mcp_guardrails` capability advertised — operator only signal was the snapshot's `budgetMode: 'off'`. Now mirrors the R7 #6 invalid-value breadcrumb pattern.

Doc fixes:
- R9 #4 line 81: `McpBudgetConfig.clientBudget` JSDoc now reflects the R4 per-session scope correction. The doc was a leftover from the original "per-workspace" framing — every other doc surface (protocol doc, user doc, type comments on the snapshot cell, capability tag) was rewritten in R4 except this one.
- R9 #5 line 870: `acpAgent.buildBudgetCells` now spells out the `liveCount` (`accounting.total`, CONNECTED only — operator observability) vs `reservedSlots.size` (all reserved including in-flight — enforcement) semantic distinction. The intentional gap was undocumented in the type signatures, JSDoc, and protocol doc; future PR 14b SSE event payloads should reference both.

Not adopted:
- R9 #1 acpAgent:15: claimed "MCP_BUDGET_WARN_FRACTION not exported + getMcpClient* methods don't exist + 4 tsc errors" — verified incorrect: the constant IS exported (mcp-client-manager.ts:61), the 3 methods ARE class members (lines 379, 407, 412), and `npm run typecheck` is clean across all 4 workspaces. Reviewer's tool hallucinated this critical finding.
- R9 #3 mcp:410: reported the bulk-path transport leak that R8 #1 (commit 7228813c5) had already closed. Reviewer was on the pre-R8 commit view.

+ 2 new core regression tests (readResource lazy connect-fail disconnects + R9 #7 stderr breadcrumb). 57/57 core tests + 679/679 focused suite pass. Typecheck + lint clean.

* fixup(serve): address PR 14 review round 10 (#4247 wenshao ninth pass)

Two non-blocking 🟢 nits — both adopted for symmetry / explicitness.

- R10 line 357: constructor downgrade now emits the same stderr breadcrumb the env-var path got in R9 #7. Pre-R10 the `(enforce|warn)`-without-budget downgrade was silent for the direct-`budgetConfig` path, so a future caller bypassing CLI / env-var validation would have shipped a daemon advertising `mcp_guardrails` while silently disabling enforcement. Now boot logs surface the misconfiguration uniformly across all three resolution paths.
- R10 line 1572: documented the `McpClient.disconnect()` cancel-pending-connect contract that the timeout-race cleanup relies on across all three spawn paths (lazy `readResource`, bulk `discoverAllMcpTools`, per-server `discoverMcpToolsForServerInternal`). The bulk path's production stability since #3889 is implicit evidence the contract holds; comment makes the assumption discoverable to the next reader and notes a follow-up unit test would be valuable. No behavior change.

57/57 core tests pass. Typecheck + lint clean.
2026-05-18 12:07:23 +08:00
ChiGao
d07c958bb5
feat(tui): add daemon adapter spike (#4202)
* docs(tui): draft daemon adapter plan

* feat(tui): add daemon adapter spike

* fix(tui): harden daemon adapter event handling

* fix(tui): report daemon prompt failures

* fix(tui): surface daemon terminal failures

* fix(tui): harden daemon adapter state handling

* fix(tui): harden daemon adapter lifecycle

* fix(tui): harden daemon adapter follow-ups

---------

Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com>
2026-05-18 11:22:39 +08:00
ChiGao
4ab20ff6b8
feat(ide): add daemon connection spike (#4199)
* docs(ide): draft daemon adapter plan

* feat(ide): add daemon connection spike

* fix(ide): harden daemon connection lifecycle

* fix(ide): harden daemon permission replay

* fix(ide): harden daemon connection lifecycle

* fix(ide): harden daemon connection adapter

* fix(ide): harden daemon permission routing

* fix(ide): tighten daemon adapter review gaps

---------

Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com>
2026-05-18 10:38:26 +08:00
ChiGao
11ba3856df
feat(channel): add daemon bridge spike (#4203)
* docs(channel): draft daemon adapter plan

* feat(channel): add daemon bridge spike

* fix(channel): harden daemon bridge session lifecycle

* fix(channel): handle daemon terminal frames

* fix(channel): harden daemon bridge lifecycle

* fix(channel): harden daemon bridge isolation

* fix(channel): harden daemon bridge cancellation

* fix(channels): close daemon bridge review nits

---------

Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com>
2026-05-18 10:21:22 +08:00
tanzhenxin
d3cbe7614e
fix(core): preserve read-before-write state across idle microcompaction (#4243)
* fix(core): preserve read-before-write state across idle microcompaction (#4239)

Idle microcompaction blanked old tool outputs and then wiped the entire
file-read tracking cache to keep the read_file "unchanged" fast path from
pointing at content no longer in history. That wipe also destroyed the
read-before-write record, so after any idle break the model was falsely
told a file it had already read "has not been read in this session" and
forced into a redundant re-read of an unchanged file before it could
edit it.

Replace the blanket wipe with per-file targeting: when microcompaction
blanks a read/edit/write result, only that file's fast-path eligibility
is disarmed; the on-disk fingerprint and "seen this session" markers are
preserved, so edits/overwrites of an unchanged file proceed without a
re-read while a file that actually changed on disk is still correctly
caught. If a blanked result's file path cannot be resolved back to its
cache entry, fall back to the old blanket wipe so a stale "unchanged"
placeholder can never be served.

* refactor(core): harden microcompaction fast-path eviction (PR #4243 review)

- buildCallIdToFilePath now accumulates paths per callId (Map<string,
  string[]>) so a reused/duplicate functionCall.id disarms ALL candidate
  files instead of silently dropping one — over-disarming costs at most a
  redundant re-read, whereas keeping the wrong file armed would resurrect
  the issue #4239 dangling-placeholder hazard.
- Add a guard-rail comment tying COMPACTABLE_TOOLS to FILE_PATH_TOOLS.
- Document why readResidentInHistory is intentionally not reset on the
  fingerprint-drift branch.
- Add tests: duplicate-id disarm, and a mixed on-disk/ghost evicted batch
  forcing the safe blanket wipe.
2026-05-18 10:05:44 +08:00
jinye
f44ed09412
feat(serve): preflight and env diagnostics routes (#4175 Wave 3 PR 13) (#4251)
* feat(serve): introduce ServeErrorKind and BridgeTimeoutError (#4175 Wave 3 PR 13)

Lay the type foundation for `/workspace/preflight` and `/workspace/env` (and
the eventual MCP guardrails route) so cells emitted by all three share a
closed `errorKind` taxonomy:

- `SERVE_ERROR_KINDS` literal-list + `ServeErrorKind` union — the seven
  values from #4175 (`missing_binary`, `blocked_egress`, `auth_env_error`,
  `init_timeout`, `protocol_error`, `missing_file`, `parse_error`).
- `BridgeTimeoutError` typed class — `withTimeout` now rejects with this
  rather than a plain `Error`, letting `mapDomainErrorToErrorKind` recognize
  init / heartbeat / extMethod timeouts via `instanceof` instead of
  regex-matching message strings. Message format is preserved bit-for-bit.
- `mapDomainErrorToErrorKind` helper — one place to classify
  `BridgeTimeoutError`, `SkillError`, fs ENOENT/EACCES/EPERM, ModelConfigError
  subclasses (recognized by `name` field — they aren't on the public surface
  of `@qwen-code/qwen-code-core`), `SyntaxError`, plus message-regex fallbacks
  for legacy throw sites (`agent channel closed`, missing CLI entry path).
- `ServeStatusCell.errorKind` tightened from open `string` to the closed
  `ServeErrorKind` union. Backward compatible — PR 12 never assigned the
  field.
- SDK mirrors: `DAEMON_ERROR_KINDS` const + `DaemonErrorKind` type;
  `DaemonStatusCell.errorKind` tightened.

Tests: 11 new unit tests in `status.test.ts` covering each mapping rule plus
the BridgeTimeoutError shape.

No route changes; no behavior changes for any existing path.

* feat(serve): add buildEnvStatusFromProcess helper (#4175 Wave 3 PR 13)

Pure helper that constructs the `/workspace/env` payload from `process.*`
state. No I/O, no ACP roundtrip, no globals beyond `process.env`. The route
itself lands in the next commit.

- `ServeEnvKind` discriminant: `runtime | platform | sandbox | proxy | env_var`
- `ServeEnvCell extends ServeStatusCell` with `name` + optional `present` /
  `value`. Cells with `kind: 'env_var'` are presence-only — `value` is
  ALWAYS omitted to keep secret env vars off the wire even by accident.
- `ServeWorkspaceEnvStatus` envelope: `{ v, workspaceCwd, initialized: true,
  acpChannelLive, cells, errors? }`. `initialized` is structurally `true`
  because env answers from the daemon process directly; `acpChannelLive`
  reports whether a child is up but does not change the payload shape.

Whitelist policy:
- Auth/secret keys (presence-only): OPENAI/ANTHROPIC/GEMINI/GOOGLE/DASHSCOPE/
  OPENROUTER `_API_KEY`, `QWEN_SERVER_TOKEN`.
- Non-secret keys (also presence-only for shape uniformity): base URLs, locale,
  TZ, NODE_EXTRA_CA_CERTS, QWEN_CLI_ENTRY.
- Proxy vars (`HTTP_PROXY`/`HTTPS_PROXY`/`NO_PROXY`/`ALL_PROXY` + lowercase
  variants): credentials stripped via `redactProxyCredentials`, then
  `URL().host` so the wire only carries `host:port`. NO_PROXY is a host list
  rather than a URL so we pass the redacted form verbatim.

SDK mirrors: `DaemonEnvKind`, `DaemonEnvCell`, `DaemonWorkspaceEnvStatus`.

Tests: 9 unit tests covering the proxy-credential redaction, lowercase env
fallback, NO_PROXY pass-through, presence-only `env_var` invariant
(`'value' in cell === false`), whitelist enforcement, runtime tag detection,
and envelope shape.

* feat(serve): add GET /workspace/env route (#4175 Wave 3 PR 13)

Wire `buildEnvStatusFromProcess` from the previous commit through the
bridge, server, and SDK so remote clients can pre-flight the daemon's
runtime environment without spawning an ACP child.

- `workspace_env` capability tag (always advertised on a current daemon).
- `bridge.getWorkspaceEnvStatus()` answers entirely from `process.*` —
  the route never consults ACP. `acpChannelLive` reflects whether a child
  exists but does not change the payload, so an idle daemon and a busy
  one return the same env shape.
- `app.get('/workspace/env', ...)` mirrors PR 12's one-liner pattern.
- SDK: `DaemonClient.workspaceEnv()` returning `DaemonWorkspaceEnvStatus`.
- Docs: bullet in `docs/users/qwen-serve.md` calling out the
  presence-only redaction policy and the no-ACP-spawn guarantee.

Tests: server-level (env returned + `'value' in env_var === false`
assertion), bridge-level (idle and live both answer locally without
hitting ACP extMethod), SDK-level (recording-fetch round-trip on
`/workspace/env`). The `workspace_env` tag is added to the
`EXPECTED_STAGE1_FEATURES` capability list assertion.

* feat(serve): add /workspace/preflight daemon-cells path (#4175 Wave 3 PR 13)

Wire the preflight route. Daemon-level cells are populated unconditionally
from `process.*` and `node:fs`; ACP-level cells fall back to `not_started`
placeholders when no child is alive so a poll never spawns one.

- `workspace_preflight` capability tag.
- `ServePreflightKind` discriminant (12 values: node_version, cli_entry,
  workspace_dir, ripgrep, git, npm — daemon-level — plus auth, mcp_discovery,
  skills, providers, tool_registry, egress — ACP-level).
- `ServePreflightCell extends ServeStatusCell` with `locality: 'daemon' | 'acp'`
  + free-form `detail`. `ServeWorkspacePreflightStatus` envelope.
- `createIdleAcpPreflightCells()` factory: emits the six ACP-level cells with
  `status: 'not_started'` + a uniform `hint` so the bridge can stitch them in
  alongside daemon cells without ever calling ACP.
- `bridge.getWorkspacePreflightStatus()`:
  - Daemon cells via `buildDaemonPreflightCells` (Promise.all over Node-version,
    CLI-entry resolution mirroring `defaultSpawnChannelFactory`, `fs.stat` on
    `boundWorkspace` with ENOENT/EACCES/EPERM mapped to `missing_file`,
    best-effort `canUseRipgrep` / `getGitVersion` / `getNpmVersion` warnings).
  - ACP cells via `requestWorkspaceStatus` — idle factory returns the
    `not_started` placeholders; live path delegates to ACP via the
    `qwen/status/workspace/preflight` ext method (handler lands in next
    commit). Bridge-side timeout / channel-close while consulting ACP folds
    into envelope `errors[]` with `mapDomainErrorToErrorKind` classification;
    daemon cells still render.
- `app.get('/workspace/preflight', ...)` route + JSDoc bullet.
- SDK: `DaemonPreflightKind` / `DaemonPreflightCell` / `DaemonWorkspacePreflightStatus`
  mirrors; `DaemonClient.workspacePreflight()`.

Tests: server-level (route returns the bridge payload), bridge-level (idle
returns 6 daemon + 6 ACP `not_started` cells without spawning a channel),
SDK-level (`workspacePreflight()` round-trip). Capability test updated.

* feat(serve): wire ACP-side preflight cells (#4175 Wave 3 PR 13)

Populate the six ACP-level preflight cells inside the ACP child so
`/workspace/preflight` returns real values for live sessions.

- `extMethod(qwen/status/workspace/preflight, ...)` dispatches to a new
  `buildAcpPreflightCells(config)` private method.
- Five cell builders, each returning a `ServePreflightCell` with
  `locality: 'acp'`:
  - `auth`: `validateAuthMethod(authType, config)` returning non-null
    string → `auth_env_error`. Missing auth method → warning. Throws
    classified via `mapDomainErrorToErrorKind` with `auth_env_error`
    fallback.
  - `mcp_discovery`: rolls up `getMCPDiscoveryState()` + per-server
    `getMCPServerStatus(name)` counts. `connecting > 0` or in-progress
    discovery → warning + `init_timeout`; `disconnected > 0` post-discovery
    → error + `protocol_error`.
  - `skills`: `SkillManager.listSkills()`; SkillError throws are mapped
    via the helper (`PARSE_ERROR` → `parse_error`, `FILE_ERROR` →
    `missing_file`).
  - `providers`: `getAllConfiguredModels()`; empty list with a configured
    `authType` → warning + `auth_env_error`. ModelConfigError throws map
    to `auth_env_error`.
  - `tool_registry`: null registry → error + `protocol_error`. Otherwise
    surfaces tool count.
- `egress`: stays `not_started`. PR 14 plugs in the real probe.
- `errorCell` private helper extended with optional `errorKind` parameter;
  defaults to `mapDomainErrorToErrorKind(error)` so existing call sites
  (`mcp` / `skills` / `providers` envelope errors) automatically gain
  classification.

Tests: 2 new acpAgent tests — preflight returns the six expected ACP cells
with correct locality + statuses; preflight surfaces a `SkillError`
(`PARSE_ERROR`) on the `skills` cell as `errorKind: 'parse_error'`. The
core `vi.mock` block adds a SkillError class for `instanceof` matching
inside `mapDomainErrorToErrorKind`.

* docs(serve): preflight and env protocol section (#4175 Wave 3 PR 13)

Document `/workspace/env` and `/workspace/preflight` end-to-end:

- Common-cell shape: tighten `errorKind` from open `string` to the closed
  `DaemonErrorKind` enum (seven literals from #4175). Add an explicit
  redaction-policy paragraph covering env-var presence-only, proxy
  host:port reduction, and the whitelisted-secrets list.
- Capability-tag list: add `workspace_env` and `workspace_preflight`.
- New `### GET /workspace/env` section with sample payload, `DaemonEnvKind`
  / `DaemonEnvCell` types, and the redaction-policy paragraph spelling
  out which secret env vars are enumerated and how proxy URLs are
  reduced to `host:port`.
- New `### GET /workspace/preflight` section with idle sample payload,
  `DaemonPreflightKind` / `DaemonPreflightCell` types, the seven-value
  `errorKind` semantics table, and the bridge-error fallback contract
  (mid-request ACP channel close → cells drop to `not_started` + envelope
  carries one `errors[]` entry).
- Source-layout table: extend the `status.ts` row to mention the new
  `ServeErrorKind` / `BridgeTimeoutError` / `mapDomainErrorToErrorKind`
  surface; add a new `envSnapshot.ts` row.
2026-05-18 07:29:05 +08:00
qqqys
f84ddd434b
feat(core): fail impossible goals (#4230)
Some checks are pending
Qwen Code CI / Classify PR (push) Waiting to run
Qwen Code CI / Lint (push) Blocked by required conditions
Qwen Code CI / Test (macos-latest, Node 22.x) (push) Blocked by required conditions
Qwen Code CI / Test (ubuntu-latest, Node 22.x) (push) Blocked by required conditions
Qwen Code CI / Test (windows-latest, Node 22.x) (push) Blocked by required conditions
Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions
Qwen Code CI / CodeQL (push) Blocked by required conditions
E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run
E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run
E2E Tests / E2E Test - macOS (push) Waiting to run
* feat(core): fail impossible goals

* fix(core): refine impossible goal judgement

* fix(core): include goal feedback when continuing

* fix(core): clarify impossible goal terminal state

* fix(core): harden impossible goal feedback

* fix(core): log suppressed impossible verdicts

* fix(goal): address review suggestions

* test(goal): cover impossible parsing suggestions
2026-05-18 00:31:51 +08:00
Shaojin Wen
c93d66cd23
fix(serve): align build and integration test coverage (#4248)
* fix(serve): align test coverage with build inputs

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* test(serve): address review feedback

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-18 00:01:47 +08:00
Shaojin Wen
ad23c7ab34
docs: user + design docs for --json-schema structured output (#4051)
* docs: add user + design docs for --json-schema structured output

Follows up #3598 (cli/core feature shipped to main, no docs).

**User doc** `docs/users/features/structured-output.md` — covers
quick-start, schema input forms (inline + `@path`), output shapes per
`--output-format`, parse-time restrictions, retry/failure modes,
privacy redaction, permission gating, MCP shadow-tool handling, and a
worked `jq`-piped pipeline example. Registered under the existing
`features/_meta.ts` so it shows up in the docs sidebar between
"Headless Mode" and "Dual Output".

**Design doc** `docs/design/structured-output/structured-output.md` —
why the synthetic-tool-whose-param-schema-is-the-user-schema approach,
the four-stage parse-time validation pipeline,
`schemaRootAcceptsObject`'s decided-vs-deferred boundaries, main-turn
vs drain-turn parity via `processToolCallBatch`, the structured-
success terminal block, the cross-surface privacy redaction sharing
`STRUCTURED_OUTPUT_REDACTED_ARGS`, subagent context handling
(`forSubAgent`), MCP shadow-tool guard, the compatibility surface,
alternatives considered (and why rejected), and a file-by-file index.

Both docs are English-only — repo convention is English-only for
both `docs/users/features/` (zero zh-CN siblings) and `docs/design/`
(only `customize-banner-area/` has a zh-CN twin). Open to adding
zh-CN translations as a separate PR if there's demand.

* docs(structured-output): address PR review feedback

User doc:
- explicit stdout-vs-stderr contract and `{}`-schema behavior.
- 500 ms shutdown-holdback latency note.
- ReDoS warning for user-supplied `pattern` keywords.
- root `$ref` rejection + `allOf` workaround.
- per-retry token cost note.
- sibling-suppression success vs retry paths split out.
- numeric exit codes (1 / 53 / 130) for every failure mode.
- new "Session resumption" section for --continue / --resume.

Design doc:
- gloss the ToolSearch on-demand-loading reference.
- `not` row: drop the array-indexing-lookalike `[…]`.
- 500 ms holdback is best-effort, not guaranteed.
- redaction rationale extends to validation-failure retries.
- `CORE_TOOLS` phrasing: structured_output is excluded FROM the set;
  skill is in a separate dynamically-discovered category.
- subagent suppression maintainer note (single brittle call path).
- `--bare` parenthetical lists the three retained core tools.
- PR #4001 status (closed 2026-05-11, superseded).

* docs(structured-output): correct empty-schema / holdback / SIGINT claims

Three doc claims were stronger than the actual code behaviour:

- **Empty schema produces `{}`, not `null`.** `turn.ts` normalises
  the tool args via `(fnCall.args || {})` before they land in
  `structuredSubmission`, so a zero-arg call against `{}` is emitted
  as `{}` on stdout. The `?? null` in the adapter is defence-in-depth
  for the strictly-undefined case, which the upstream path doesn't
  produce.

- **Holdback is a cap, not a fixed wait.** The loop guard is
  `Date.now() < deadline && registry.hasUnfinalizedTasks()`, so it
  exits immediately when nothing is in flight. Reword as "capped at
  ~500 ms" with an early-exit note.

- **SIGINT can still flush a captured result.** The holdback loop
  does not poll the abort signal, so a SIGINT after the structured
  call is captured but before `adapter.emitResult` finishes may
  still land on stdout. Treat exit code 130 as the source of truth.

Also addresses the new auto-review summary suggestion about per-turn
schema cost: pull the cost callout up out of the bullet list (so it
covers both retry cost and schema-embedded-every-turn cost), since
the schema-embedding cost isn't retry-specific.

* docs(structured-output): correct stdout/stderr + json-mode envelope claims

Two doc claims didn't match `JsonOutputAdapter.emitResult`:

- **Model prose doesn't go to stderr in text mode.** Only error
  messages and log lines do. Successful runs emit just the
  JSON-stringified payload on stdout; accumulated assistant prose
  is discarded entirely (not mirrored to stderr). Point users at
  `--output-format json` / `stream-json` when they need the prose.

- **`--output-format json` emits a JSON array, not a single
  document with top-level fields.** The adapter calls
  `JSON.stringify(this.messages)` where `messages` is an array of
  message objects. `structured_result` lives on the final
  `type: "result"` element of that array, not at the document
  root, so consumers must read `.[-1].structured_result` rather
  than `.structured_result`.

* docs(structured-output): note schema-itself reaches the provider

The Privacy section so far only described `structured_output` *args*
being redacted from local on-device surfaces (telemetry + chat
recording). The schema body is a separate exposure surface — it
ships as the function declaration's `parameters` block on every
model request, so `enum`, `const`, `default`, `examples`,
`description`, `$comment`, etc. travel to the provider in
cleartext. Users defaulting to "redaction covers everything"
could legitimately leak secrets via schema-literal fields.

Add a callout in the user doc, plus a parallel paragraph in the
design doc explaining why the redaction stops at on-device
surfaces (the model needs the schema to satisfy the tool-call
contract, so provider-side redaction isn't possible).

* docs(structured-output): correct stdout-on-failure / ReDoS example / hooks / --bare deny / typo

Five issues from the latest /qreview pass:

- **stdout-vs-stderr is text-mode only.** In `--output-format json`
  and `stream-json`, the failure result message is emitted on
  stdout (final element of the JSON array, or the terminating
  `result` line on the JSONL stream). Wrappers in those modes must
  switch on `is_error`, not on whether stdout is empty.

- **ReDoS example didn't actually demonstrate the threat.** JSON
  Schema `pattern` only fires on string instances, and tool args
  are always objects, so the bare `{"pattern": "(a+)+b"}` schema
  doesn't constrain anything the model can supply. Move the
  pattern inside a string-typed property.

- **Hooks see raw `tool_input`.** `PreToolUse` / `PostToolUse` /
  `PostToolUseFailure` receive the unredacted args — including
  HTTP hooks that can forward off-device. Call this out
  explicitly so users with audit-style catch-all hooks know to
  filter or add hook-side redaction.

- **`--bare` drops settings-level deny.** Bare mode builds
  `mergedDeny` as `[...(bareMode ? [] : settings.permissions.deny), …]`
  — settings-level denies are skipped while the synthetic tool
  stays registered. Argv-level `--exclude-tools` still applies.
  Document this exception in the user doc and the design doc.

- **`maxSessionTurns` hint typo.** The hint points at "schema is
  unsatisfiable" — the original text inverted the polarity.

* feat(core): PR-2.5 — post-promote stream redirect + natural-exit registry settle

Closes the two limitations PR-2 (#3894) deferred for the Phase D part
(b) Ctrl+B promote flow (#3831):

1. **Post-promote stream redirect**: today the `bg_xxx.output` file
   is frozen at promote time because `ShellExecutionService` detaches
   its data listener as part of PR-1's ownership-transfer contract.
   PR-2.5 wires a caller-side `onPostPromoteData` callback so bytes
   from the still-running child append to the file via an
   `fs.createWriteStream` opened in `handlePromotedForeground`.
2. **Natural-exit registry settle**: today the registry entry stays
   `'running'` until `task_stop` / session-end `abortAll` fires its
   abort listener. PR-2.5 wires `onPostPromoteSettle` so natural
   child exit transitions the entry to `'completed'` / `'failed'`
   with the right exitCode / signal / error message.

## Service (`shellExecutionService.ts`)

- New exported types: `ShellExecuteOptions`, `ShellPostPromoteHandlers`,
  `ShellPostPromoteSettleInfo`.
- `execute()` options bag now accepts `postPromote?: { onData, onSettle }`.
  Threaded through to both `executeWithPty` and `childProcessFallback`.
- PTY's `performBackgroundPromote` (line ~1159): after disposing
  the foreground data + exit + error listeners, RE-ATTACH minimal
  forwarders that call `postPromote.onData` / `postPromote.onSettle`
  when the caller opted in. Backwards compat: when `postPromote` is
  unset the PR-2 detach-everything contract is preserved (the
  re-attach is gated on each callback being defined).
- `childProcessFallback`'s `performBackgroundPromote` (line ~706):
  same pattern — re-attach `stdout.on('data', ...)`, `stderr.on('data',
  ...)`, `child.once('exit', ...)`, `child.once('error', ...)` when
  the caller opted in. `error` listener routes through `onSettle`
  with `error` populated, so spawn-side errors after the foreground
  errorHandler detached don't crash the daemon via the default
  unhandled `'error'` event.
- Both paths wrap caller callbacks in try/catch so a thrown handler
  doesn't crash the child's data loop / unhandled-rejection the
  service.

## Shell tool (`shell.ts`)

- New `PromoteArtifacts` type — slots shared between the foreground
  `execute()` postPromote handlers (which fire on the service side
  as soon as promote happens) and the post-resolve
  `handlePromotedForeground` finalizer (which runs after
  `await resultPromise` returns). The two race; the buffer +
  settle-queue absorb that race so neither chunks nor the eventual
  exit info are lost.
- `executeForeground` wires `postPromote` handlers that route data
  to either `promoteArtifacts.stream` (if open) or
  `promoteArtifacts.buffer` (drained when the stream opens), and
  queue settle info if the wired handler isn't yet installed.
- `handlePromotedForeground` opens `fs.createWriteStream(outputPath,
  { flags: 'w' })`, writes the initial snapshot first, drains the
  buffer, then registers the entry and wires `onSettleWired` with
  the full registry decision table:
    - `error` set → `registry.fail(shellId, error.message, endTime)`
    - `exitCode === 0` → `registry.complete(shellId, 0, endTime)`
    - non-zero exitCode → `registry.fail(shellId, "Exited with code N", endTime)`
    - signal !== null → `registry.fail(shellId, "Terminated by signal N", endTime)`
    - all-null fallback → `registry.fail(shellId, "Exited with unknown status", endTime)`
- Fires queued settle synchronously after wiring so a fast command
  that exits between promote and finalizer doesn't get lost.
- Self-audit catch: closes the output stream on the
  `registry.register` throw path so the FD doesn't leak past the
  orphan-child kill.

## Tests

- 3 new in `shellExecutionService.test.ts`:
  - `post-promote bytes route to postPromote.onData when callback provided`
  - `postPromote.onSettle fires on natural child exit after promote`
  - `backwards compat: without postPromote, listeners stay fully detached`
- 3 new in `shell.test.ts` under a `foreground → background promote
  PR-2.5` describe block:
  - `post-promote bytes APPEND to bg_xxx.output via write stream`
  - `natural child exit transitions registry entry to "completed"`
  - `non-zero exit / signal / error → "failed" with descriptive message`
- Bulk-replaced 50 prior `{},` (empty 6th-arg shellExecutionConfig)
  with `expect.objectContaining({}),` + added `expect.objectContaining({
  postPromote: expect.any(Object) }),` as the 7th-arg expectation for
  the foreground execute call.
- Updated the existing `registers a bg_xxx entry on result.promoted`
  test to assert on `fs.createWriteStream` + `stream.write` instead
  of the now-removed `fs.writeFileSync` snapshot path.

182/182 shell.test.ts pass + 73/73 shellExecutionService.test.ts pass
+ 111/111 coreToolScheduler.test.ts pass + 60/60 AppContainer.test.tsx
pass; tsc + ESLint clean.

Self-audit: 3 rounds (positive / reverse / cross-file) found one
issue — output stream FD leak on `registry.register` throw — and
fixed it before flagging complete. All flagged edge cases (stream
errors, child-exits-before-wire-up race, task_stop during natural-
exit window, promote-never-happens cleanup, backwards compat
without callbacks) have explicit handling and / or test pinning.

* fix(core): #4102 review wave — 3 Critical + UTF-8 + tests

3 Critical race/correctness issues + 1 multibyte-corruption suggestion
+ 3 test coverage gaps addressed:

**Critical 1 — child_process late-chunk drop (service)**
Settle was fired on 'exit', but stdout/stderr can emit buffered data
between 'exit' and 'close'. Late chunks landed in
`promoteArtifacts.buffer` after shell.ts had already closed the
stream + transitioned the registry → silently dropped → truncated
`bg_xxx.output`. Switched to listening on 'close' which guarantees
all stdio is fully drained. (code, signal) payload is identical to
'exit', just with proper ordering.

**Critical 2 — stream-flush wait before registry transition (shell)**
`stream.end()` is asynchronous; pending writes can still be in the
libuv queue when it returns. The old code transitioned the registry
immediately after `.end()`, so a /tasks consumer could observe a
`completed` entry and read the output file BEFORE the trailing
bytes were on disk. Fixed: wired settle now `stream.once('finish',
...)` BEFORE calling `registry.complete/fail`. `error` event also
short-circuits to the transition so a late ENOSPC doesn't hang the
settle path forever.

**Critical 3 — stream-open-fail buffer leak (shell)**
If `fs.createWriteStream` threw, the catch path set `stream = null`
but the foreground `onData` handler would still take the
`stream === null` branch and push chunks into `promoteArtifacts.buffer`
— unbounded growth under a sustained child whose output file
couldn't be opened. Added a `streamFailed: boolean` latch on
`PromoteArtifacts`. When set, `onData` drops chunks (with a debug
log) instead of buffering. The catch branch sets the latch.

**Suggestion — shared TextDecoder corrupts multibyte UTF-8 (service)**
child_process post-promote used ONE TextDecoder for both stdout AND
stderr. The decoder's continuation-byte state machine assumes one
byte source; interleaved multibyte chunks corrupted. Now uses
separate decoders + flushes both with `decode()` (no `stream: true`)
on settle so trailing bytes surface as their final characters.

**Suggestion — llmContent reflects already-settled status (shell)**
When the queued-settle drain transitions the registry synchronously
(fast-exit race), the model-facing copy was still saying "Status:
running. … task_stop({...})". Updated to branch on
`postPromoteAlreadySettled` / `postPromoteFinalStatus` — when the
process is already gone, the copy says "Status: completed/failed"
and replaces the `task_stop` suggestion with "Process has already
exited; no `task_stop` needed".

**Suggestion — test coverage gaps**
Added: (a) `queued-settle race: onSettle BEFORE
handlePromotedForeground completes` — custom service impl fires
onSettle synchronously before resolving the promote promise, pins
the drain path. (b) child_process post-promote tests for stdout/stderr
forwarding + 'close'-not-'exit' settle + spawn-error settle.

**Self-audit**: Round 1 + reverse audit. Stream.once mock added to
fire 'finish' synchronously so existing tests don't hang on the new
flush wait. 76/76 shellExecutionService.test.ts (+3) + 183/183
shell.test.ts (+1) pass; tsc + ESLint clean.

* fix(core): #4102 review wave-2 — 3 more from gpt-5.5

C1 (shell.ts:2227): the WriteStream `'error'` event handler only
logged. `fs.createWriteStream` reports common open failures
(ENOENT / EACCES / ENOSPC) asynchronously via that event rather
than throwing. Result: `promoteArtifacts.stream` kept pointing at
the failed stream; `onSettleWired` attached a `.once('finish')`
listener that would never fire → registry stuck on `running`
forever. Latch the failure (null the shared `stream` slot,
set `streamFailed`); `onSettleWired`'s existing `if (!stream)`
branch then transitions the registry immediately.

C2 (shellExecutionService.ts:1468): the promote handoff removes the
foreground `ptyErrorHandler` and only re-attaches data + exit
listeners. A subsequent PTY `error` event had no listener — Node
treats an unhandled `error` from an EventEmitter as a fatal
exception that takes the whole CLI down. Attach a post-promote
forwarder that ignores expected PTY read-exit codes (EIO / EAGAIN,
same filter the foreground handler uses) and routes unexpected
errors through `postPromote.onSettle` with `error` populated.
Single-fire latch shared with `onExit` so settle never fires twice.

C3 (shell.ts:2503): `onSettleWired` waits for the stream's
asynchronous `'finish'` event before flipping
`postPromoteAlreadySettled`, but the model-facing `statusLine` was
built immediately after invoking `onSettleWired` on the queued
settle. A fast-exited promoted command could therefore land
"Status: running" + a `task_stop` instruction in production even
though settle was already observed. Split into two flags:
`postPromoteSettleObserved` (set synchronously when settle is
classified) drives the model copy; the registry transition stays
behind the stream flush.

Tests: +1 PR-2.5 wave-2 PTY error-routing test; +2 shell.ts tests
(stream open async error → registry still transitions; async
`'finish'` after queued-settle drain → llmContent says 'completed'
before registry transition fires).

* fix(core): #4102 review wave-3 — 4 actionable from deepseek-v4-pro

T2 (shell.ts:2456) — Critical buffer-leak race
`onSettleWired` previously set `promoteArtifacts.stream = null`
BEFORE calling `stream.end()`. Any `postPromote.onData` chunk that
landed between that null assignment and the actual flush completing
saw `stream === null && streamFailed === false` and pushed into
`promoteArtifacts.buffer` — a buffer that has no further drain path
(the foreground finalizer has already returned). Result: chunks
stranded indefinitely; PTY mode in particular hits this because
`onExit` can fire while kernel buffers still hold data. Fix drains
the pre-settle buffer to the stream BEFORE nulling AND latches
`streamFailed = true` so any subsequent chunk drops via the
existing `else if (streamFailed)` arm in `onData` instead of
leaking. Updates the `streamFailed` doc to cover both setters
(open-fail and settle-done) so the dual semantic is explicit.

T3 (shell.ts:2262) — silent chunk-drop in catch path
When `fs.createWriteStream` throws synchronously (rare: ENOENT on
a vanished tmpdir), chunks already in `promoteArtifacts.buffer`
were silently lost with no observability — oncall reading a
truncated `bg_xxx.output` had no way to distinguish "stream open
failed" from "child produced nothing." Logs the dropped chunk
count and empties the buffer.

T5 (shell.ts:2443) — opaque all-null fallback
The "Exited with unknown status" fallback fired the registry to
'failed' without any context about which fields were null. This
branch is meant to be unreachable; hitting it indicates the
service emitted a defective settle info object. Includes the
field values in both the fail message and a warn log so the
oncall engineer can tell this path apart from the other "failed"
branches.

T6 (shellExecutionService.ts:1452) — leaked PTY post-promote listeners
`ptyProcess.onData(...)` returns an `IDisposable` that was being
discarded; same for `onExit`. The `'error'` listener function was
also not captured (no way to `removeListener` it). EventEmitter
holds refs to listener closures, which transitively hold refs to
`onPostData` / `onPostSettle` / the caller's `promoteArtifacts`.
While bounded by the PTY's lifetime, the closures keep the
caller's state pinned for the post-settle delay window. Captures
all three handles into `postPromoteDataDisposable` /
`postPromoteExitDisposable` / `postPromoteErrorListener`, then
releases them via a shared `disposePostPromoteListeners()` call
from `firePostSettle` (idempotent — each slot null-checked and
nulled after disposal).

Tests: +1 service test for IDisposable + error-listener cleanup;
+2 shell.ts tests for buffer drain race and catch-path snapshot
fallback. Existing tests stay green (262 → 265 in the touched
suites; 7819 → 7822 across the core package).

* fix(core/test): drop unused 'registry' in wave-3 T2 test (TS6133)

CI build failed across all platforms with src/tools/shell.test.ts(4395,15): error TS6133. The variable was a leftover from copying the queued-settle test pattern; the wave-3 T2 test inspects writeStreamMock.write call history directly and never reads the registry, so the assignment is dead code. Drop it.

* fix(core): #4102 review wave-4 — 6 actionable from gpt-5.5 + deepseek-v4-pro

T1 (Critical, shellExecutionService.ts:860 child_process onSettle
exactly-once)
The PTY path used a `firePostSettle` latch but child_process wired
`close` and `error` independently to `onPostSettle`. A spawn-side
error followed by Node's auto-emitted `'close'` would call the
caller's settle TWICE, racing the registry transition. Added the
same single-fire latch on the child_process path.

T2 (Critical, shell.ts:2264 handoff race reorder)
Original order was `write(snapshot) -> drain buffer -> assign stream`.
Synchronous today (no race in current code), but assign-after-drain
leaves a hazard for any future refactor that adds an `await` inside
the drain loop — a chunk arriving in that window would land in
`promoteArtifacts.buffer`, then post-assign chunks would write to
the stream first, producing out-of-order bytes until the settle
drain. Reordered to `write(snapshot) -> assign stream -> drain
buffer`, which closes the hazard regardless of future async
additions.

T3 (Suggestion, shellExecutionService.ts:816 decoder flush gated
on onSettle)
The trailing-multibyte flush ran inside the `child.once('close', ...)`
handler, which was only installed when `onSettle` was set. An
`onData`-only caller (no onSettle) lost trailing continuation
bytes silently. Hoisted flush into `flushPostPromoteDecoders`
called from `firePostSettle`, and made `firePostSettle` available
on the `'close'` path independent of onSettle (T6 install).

T4 (Suggestion, shell.ts:1700 promoted ANSI passthrough)
The regular `executeBackground` path strips ANSI before writing to
`bg_xxx.output`; the promoted-foreground onData path appended raw
chunks. Reading `bg_xxx.output` after Ctrl+B showed plain text up
to the snapshot then raw `\x1b[31m` / cursor-move / clear-screen
sequences for the post-promote tail — unreadable. Apply
`stripAnsi(rawChunk)` before write/buffer, matching the
executeBackground contract.

T5 (Suggestion, shellExecutionService.ts:786 UTF-8 hardcoded)
The post-promote child_process decoders were hard-coded to
`new TextDecoder('utf-8')`, but the foreground decoder runs
encoding detection via `getCachedEncodingForBuffer`. On a non-UTF-8
child (e.g. GBK on a Chinese Windows shell), the snapshot decoded
correctly but the post-promote tail was mojibake. Capture the
foreground decoder's `.encoding` property and reuse it for
post-promote (with utf-8 fallback if foreground hadn't seen any
bytes yet, and a try/catch around `new TextDecoder` for the rare
unsupported-encoding case).

T6 (Suggestion, shellExecutionService.ts:1540 `error` listener
gated on onSettle)
The post-promote `error` listener was attached only when `onSettle`
was set. An `onData`-only caller still had the foreground
errorHandler detached; a post-promote spawn error would then crash
the CLI via Node's unhandled-error default. Hoisted the close +
error listeners into `if (postPromote)` so any caller opting into
post-promote gets crash protection; if `onSettle` is absent the
listeners log + drop instead of routing.

T7 (Suggestion, shellExecutionService.ts:791 onSettle-only
pipe-block deadlock)
Same root cause as T6: when only `onSettle` is set, the foreground
`stdout`/`stderr` 'data' listeners are detached and no post-promote
listener replaces them. The Readables stay paused, the OS pipe
buffer fills (~64KB on Linux), the child blocks on `stdout.write`,
'close' never fires, onSettle never fires. Added `child.stdout?.resume()`
and `child.stderr?.resume()` in the no-onData branch so the child
can drain its pipes and reach exit.

T8 (Suggestion, shell.ts:2614 dead inspectLine ternary)
`inspectLine`'s ternary returned the same string on both sides —
copy-paste leftover from when the other two adjacent ternaries
(statusLine / stopLine) were correctly varied. Collapsed to a
single string assignment.

Tests: +5 regression tests (4 child_process: T1 double-fire latch,
T3 onData-only flush, T6 onData-only error survives, T7 onSettle-
only resume; +1 shell.ts: T4 ANSI strip).

265 -> 270 in the touched suites; 7822 -> 7827 across the core
package; full suite green.

* fix(core/test): use ShellOutputEvent type in wave-4 onData callbacks (TS2345)

CI lint failed on the wave-4 (T3 / T6) tests with TS2345: pushing
ShellOutputEvent into Array<{type:string;chunk:unknown}> narrows
incompatibly. Switch to ShellOutputEvent[] (matches earlier helpers
at lines 758/966) and discriminate the union via .type === 'data'
when reading .chunk so the narrowed multibyte assertion still
type-checks.

* docs(structured-output): address doudouOUC's four review findings

- Tighten JSON/stream-json paragraph: not all failures emit a result
  to stdout (exit 53 / exit 130 are stderr-only); check exit code first
- Fix suppressed-sibling retry guidance: re-issue in a separate turn
  that does not include structured_output (avoids re-suppression)
- Distinguish settings-deny (exit 53) from --exclude-tools (exit 1)
  in Permission gating section
- Replace <projectDir> placeholder with actual path
  ~/.qwen/projects/<sanitized-cwd>/chats/<sessionId>.jsonl in both docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(structured-output): fix Permission gating — both deny paths strip registration

Forward audit against source code found that the Permission gating
section incorrectly distinguished settings.permissions.deny (claiming
tool stays visible, exit 53) from --exclude-tools (claiming
declaration stripped, exit 1). Both go through the same mergedDeny →
isToolEnabled path and both prevent registration — the model never
sees the tool. Corrected both docs to reflect the actual mechanism:
typical outcome is plain text (exit 1), with maxSessionTurns (exit 53)
as the fallback if the model loops through other tools.

* docs(structured-output): address doudouOUC's May 17 review (5 items)

- Clarify validation is client-side Ajv, not provider-side
- Qualify "same way" with DeclarativeTool abstraction parenthetical
- Match symptom→cause structure for maxSessionTurns hint
- Expand $ref workaround with concrete $defs example
- Clarify Dual Output See Also doesn't require --json-schema

* docs(structured-output): address 2 unresolved design-doc suggestions

1. Privacy/redaction section: note hooks as intentionally non-redacted
   surface (matches user-doc "Hooks see raw args" callout).
2. Dual call-site section: clarify differing post-helper termination
   flow between main-turn (direct return) and drain-turn (sentinel hop).

* docs(structured-output): address doudouOUC's May 17 review (2 nits)

1. Failure-paths table: align "three common causes" cell with the
   symptom→cause framing already used at parse-time validation pipeline
   section ("common stuck-run symptom and its two likely causes").
2. Dual call-site section: fix factual inaccuracy from prior commit —
   `drainOneItem` is `async (): Promise<void>` and returns nothing.
   The two-hop termination is via closure-mutated `structuredSubmission`
   (set by `processToolCallBatch`, checked by `drainLocalQueue` and the
   holdback loop), not a return-value sentinel.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-17 23:10:34 +08:00
xMillogx
79f38e0701
fix: add cache limits to prevent OOM during build/test (#4188)
* fix: add cache limits to prevent OOM during build/test

* chore: remove intermediate OOM analysis document

* fix(core): enforce MAX_TOTAL_PATHS cap when updating existing crawlCache key

Before: !crawlCache.has(key) guard in the MAX_TOTAL_PATHS eviction loop
short-circuited eviction when updating an existing key, allowing cache to
grow beyond 50,000 paths (F1 bug - OOM protection bypassed).

After: totalPaths is calculated excluding the current key, and the eviction
loop protects the key being written from being evicted as "largest".
FIFO bump (delete+set) ensures frequently updated keys move to end of queue.

Per @wenshao review: MAX_CACHE_ENTRIES guard on line 59 is preserved
(updating existing key doesn't increase entry count).

* test(core): add cache eviction tests and fix MAX_TOTAL_PATHS loop guard

* test(core): add eviction coverage for fileReadCache and crawlCache

* test(core): unskip bumped entries eviction test now that upsert bump is implemented

---------

Co-authored-by: Юрий Острожных <millog@mail.ru>
2026-05-17 23:01:23 +08:00
jinye
672de88a47
fix(serve): align integration test mirrors with merged capability + EventBus changes (#4245)
- qwen-serve-routes.test.ts: expand expected features list to 24, adding
  slow_client_warning (#4237) and workspace_mcp/workspace_skills/
  workspace_providers/session_context/session_supported_commands (#4241).
  Matches EXPECTED_STAGE1_FEATURES in server.test.ts:76-101.
- qwen-serve-baseline.test.ts: update SSE backpressure assertion from 3
  to 4 frames (tick, tick, slow_client_warning, client_evicted). PR #4237
  changed EventBus to force-push a slow_client_warning synthetic frame
  when the per-subscriber queue reaches the 75% warn threshold, before
  the client_evicted terminal frame fires on overflow. Mirrors the unit
  test at eventBus.test.ts:103-122.

Both integration mirrors drifted because integration tests only run on
schedule / workflow_dispatch (release.yml:4-9), not PR CI. Fixes the
release run 25992130532 failure in both Docker and No-Sandbox jobs.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
2026-05-17 22:53:11 +08:00
jinye
60fe594e8f
feat(serve): add read-only status routes (#4241)
* feat(serve): add read-only status routes

Add read-only daemon status endpoints for workspace MCP, skills, providers, session context, and session supported commands.

Expose matching typed SDK helpers and document the new additive v1 status surface.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(serve): harden read-only status snapshots

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(serve): address read-only status review feedback

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-17 21:37:20 +08:00
jinye
aef35c390e
feat(serve): session metadata and close/delete lifecycle (#4175 Wave 2.5 PR 11) (#4240)
* feat(serve): session metadata and close/delete lifecycle (#4175 Wave 2.5 PR 11)

Add explicit session close and metadata management to the daemon serve
infrastructure, closing the Stage 1 limitation that sessions could only
end via child crash or daemon shutdown.

- DELETE /session/:id — force-closes a live session (cancels active
  prompt, resolves pending permissions, publishes session_closed event)
- PATCH /session/:id/metadata — update mutable displayName
- Enriched GET /workspace/:id/sessions with createdAt, displayName,
  clientCount, hasActivePrompt
- session_closed + session_metadata_updated SDK event types with
  validation, reducer, and terminal event priority
- DaemonClient.closeSession / updateSessionMetadata + session client
  wrappers
- Capabilities: session_close, session_metadata

* fix(serve): address review feedback on session lifecycle PR

- Fix JSDoc on closeSession: clarify that bridge throws SessionNotFoundError
  (SDK absorbs 404 for client-side idempotency)
- Tighten event validators: isSessionClosedData checks closedBy type,
  isSessionMetadataUpdatedData checks displayName type
- PATCH /session/:id/metadata now returns effective stored metadata
  instead of echoing request fields, avoiding ambiguous no-op responses
- Only publish session_metadata_updated event when displayName changes
- Update chooseTerminalEvent comment to reflect session_closed

* fix: address PR 4240 review feedback

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix: address remaining PR 4240 suggestions

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix: update serve sessions test mock

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-17 20:42:15 +08:00
JerryLee
9abd704e09
fix(cli): record mid-turn queued user prompts (#4215) 2026-05-17 20:21:06 +08:00
jinye
4e06967c2b
feat(serve): mutation gating helper and --require-auth (#4236)
Some checks are pending
Qwen Code CI / Classify PR (push) Waiting to run
Qwen Code CI / Lint (push) Blocked by required conditions
Qwen Code CI / Test (macos-latest, Node 22.x) (push) Blocked by required conditions
Qwen Code CI / Test (ubuntu-latest, Node 22.x) (push) Blocked by required conditions
Qwen Code CI / Test (windows-latest, Node 22.x) (push) Blocked by required conditions
Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions
Qwen Code CI / CodeQL (push) Blocked by required conditions
E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run
E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run
E2E Tests / E2E Test - macOS (push) Waiting to run
* feat(serve): mutation gating helper and --require-auth

Implements issue #4175 Wave 4 PR 15. Adds the centralized
state-changing-route gate that Wave 4 follow-ups (memory CRUD, file
edit, MCP restart, device-flow auth) will reuse, plus the
`--require-auth` deployment knob that hardens the loopback developer
default for shared dev hosts / CI runners.

- `createMutationGate({ tokenConfigured, requireAuth })` factory in
  serve/auth.ts — per-route middleware with a 4-cell behavior matrix:
  pass-through under `requireAuth` or any token configured;
  `401 token_required` for `strict: true` routes on no-token loopback
  defaults; baseline pass-through otherwise.
- Existing Wave 1-2 mutation routes (POST /session, /session/:id/{load,
  resume,prompt,cancel,model}, /permission/:requestId) opt into the
  default non-strict factory call as the centralization marker. Wave 4
  routes will pass `{ strict: true }` to require a token even on
  loopback.
- `--require-auth` CLI flag + `ServeOptions.requireAuth`. Boot refuses
  without a token; closes the `/health` exemption when on so loopback
  `/health` also requires bearer auth; stderr breadcrumb so the
  hardened mode is visible in journald/docker logs.
- Conditional `require_auth` capability tag advertised only when the
  flag is on. New `CONDITIONAL_SERVE_FEATURES` registry primitive so
  future per-deployment toggles follow the same shape.
- 5 new unit tests in auth.test.ts covering the gate matrix; 5 added
  in server.test.ts for capability advertisement, conditional tag,
  /health 401 under --require-auth, and runQwenServe boot
  refusal + happy path. 245/245 serve tests pass; typecheck + eslint
  clean.

Refs: #4175

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4236 review feedback

Three small follow-ups from the automated reviewers on PR #4236:

1. **Drop misleading `--require-auth` from `token_required` error
   message** (Copilot inline auth.ts:262). The strict-mode 401
   listed three remediations but `--require-auth` is paired-required
   with a token at boot — naming it standalone would loop the operator
   into a different boot error. Keep the two valid standalone fixes
   (env var, --token); add inline note explaining the omission.
   `auth.test.ts` regex updated to `not.toMatch(/--require-auth/)`
   to anchor the new wording.

2. **Mention `/health` gating in `--require-auth` CLI description**
   (auto-reviewer Medium #2). Operators flipping the flag without
   reading the protocol doc would get paged when k8s/Compose probes
   start 401-ing. One sentence in the yargs description prevents that.

3. **Drift insurance comment between registry and
   `CONDITIONAL_SERVE_FEATURES`** (auto-reviewer Low #3). Document
   the four-step procedure for adding a new conditional tag so a
   future contributor doesn't update only the registry and silently
   advertise the tag unconditionally. Notes the Map<predicate>
   refactor as the right move when a second tag lands.

Deferred (not in this fix-up):
- Module-level PASSTHROUGH singleton (High #1) — micro-optimization,
  unmeasurable.
- Map<feature, predicate> for conditional features (High #2) —
  premature abstraction with one tag.
- Per-route `// non-strict marker` comments (Medium #1) — noise.
- `@see` cross-ref in types.ts (Low #2) — sugar.
- JSDoc bullet-list vs table (Low #1) — current format is fine.

Refs: #4175 #4236

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fixup(serve): address PR #4236 round-2 review feedback

Five small follow-ups from @wenshao + DeepSeek (via Qwen Code /review)
on PR #4236:

1. **Map<predicate> refactor for `CONDITIONAL_SERVE_FEATURES`**
   (review threads #3254467192 + #3254485912). Two reviewers asked
   for the same shape on the grounds that the `Set` + per-feature
   `if`-branch needed FOUR coordinated changes per new conditional
   tag and silently fail-CLOSED when the branch was missed. The Map
   collapses the predicate-decision and the set-membership into one
   entry per feature — adding a new conditional tag is now two
   coordinated changes (registry + Map entry) and a missing predicate
   is a TypeScript error rather than a silent omission. JSDoc
   updated.

2. **Drift-insurance test that iterates `CONDITIONAL_SERVE_FEATURES`**
   (review thread #3254467192 option 1, layered on top of #1).
   `server.test.ts` now walks every Map entry and asserts the
   predicate accepts/rejects as expected; future entries that don't
   add an assertion branch fail the test loudly so a missing
   predicate cannot ship silently. Adoption-of-record for the Map
   shape rather than relying on a hand-maintained invariant.

3. **Cache `strictDenier` for allocation symmetry** (review thread
   #3254467193). Wave 4 PRs will mount strict mode on multiple
   routes; without the cache each `mutate({strict:true})` call would
   allocate a fresh 401 closure. Now both the passthrough and the
   strict denier are pre-built singletons. Identity assertion in
   `auth.test.ts` anchors the cache so a future change that loses it
   surfaces in CI.

4. **Doc cosmetic — extra blank line in qwen-serve.md** (review
   thread #3254467198). Single blank line between the `>` quoted
   example and the following non-quoted bash block now.

5. **Doc correctness — `require_auth` is post-auth confirmation**
   (review thread #3254485910 from DeepSeek). When `--require-auth`
   is on, the global `bearerAuth` middleware gates every route
   including `/capabilities`, so an unauthenticated client cannot
   pre-flight `caps.features` to discover that auth is required —
   the discovery surface is the 401 response body itself. Both
   `qwen-serve.md` and `qwen-serve-protocol.md` rewritten to
   describe the tag as a post-authentication confirmation, matching
   the auth.ts JSDoc which already stated this correctly.

Trade-offs documented (no code change):

- **Body-parser ordering** (review thread #3254485915 from DeepSeek)
  noted as a comment block in `auth.ts`. Strict-mode 401 fires AFTER
  `express.json()` because the gate is per-route middleware. On
  loopback no-token defaults a strict route therefore parses the
  request body before refusing it — bounded by
  `express.json({limit: '10mb'})` × `--max-connections` (256
  default). Strict routes Wave 4 actually adds carry small bodies in
  legitimate use, so this isn't a production hot path. Future routes
  accepting large bodies should lift the gate to app-level (maintain
  a strict-path Set in `createServeApp`); flagged as a Wave 4
  follow-up rather than re-architecting the helper.

- **`bearerAuth` body-shape inconsistency** (review thread
  #3254467197 from @wenshao) flagged as a Wave 4 cross-PR
  follow-up. `bearerAuth` returns `{error: 'Unauthorized'}` while
  the strict gate returns `{code: 'token_required', error: '...'}`;
  SDK clients have to branch on both shapes. Standardizing
  `bearerAuth` to also carry a `code` field is orthogonal to this
  PR's scope.

Validation: 260/260 cli serve tests pass (was 258 — added the drift
insurance test + strict denier identity test); typecheck + eslint
clean.

Refs: #4175 #4236

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-17 20:10:37 +08:00
易良
eef06ce376
feat(cli): add structured memory diagnostics JSON (#3785)
* feat(cli): add memory diagnostics doctor command

* fix(core): platform-aware maxRSS conversion and accurate risk message

- Extract platform detection before building diagnostics so the correct
  unit conversion can be applied: multiply by 1024 on Linux (where
  process.resourceUsage().maxRSS is in KB) but leave the value unchanged
  on macOS/Windows (where it is already in bytes).
- Correct the native-memory-pressure risk message to accurately state
  that the threshold is 2× heap used, not just "larger than heapUsed".
- Add a dedicated test to assert that maxRSS is not multiplied on a
  non-Linux platform (darwin).

All 3 core and 9 CLI tests pass; typecheck clean.

Agent-Logs-Url: https://github.com/QwenLM/qwen-code/sessions/9b413337-68ed-4d5c-af99-0d42378900c3

* test(core): cover active request memory risk

* fix(cli): address memory diagnostics review feedback

* fix(cli): harden memory diagnostics review fixes

* fix(memory-diagnostics): tighten risk thresholds and expand readable output

- Add 64MB absolute floor on native-memory-pressure so cold processes don't trip
  the 2x ratio check; raise active-handles threshold from 100 to 256
- Show detachedContexts, nativeContexts, maxRSS, CPU times, smapsRollup
  availability, and v8HeapSpaces summary in the readable /doctor memory output
- Validate unknown memory subcommand args with a usage hint instead of silently
  dropping them
- Wrap human-readable strings in t(...) for i18n parity with the rest of doctor
- Advertise the memory subcommand via /doctor argumentHint while keeping
  acceptsInput false so the parent still auto-submits
- Document _getActiveHandles/_getActiveRequests as undocumented Node internals
- Update tests for new thresholds, expanded output, unknown-arg path, and
  abort-during-json

* fix(cli): harden memory doctor diagnostics

* fix(core): correct maxRSS byte handling and heapRatio consistency

- Remove incorrect * 1024 multiplier for maxRSS on Linux (Node.js >=14.10 returns bytes on all platforms)
- Use v8HeapStats.usedHeapSize for heapRatio to avoid cross-API inconsistency
- Update test expectations and rename "does not multiply" test

* fix(cli): resolve rebase conflicts in memory diagnostics

- Rename local formatMemoryDiagnostics to formatCoreDiagnostics to avoid
  naming conflict with the imported utility from memoryDiagnostics.js
- Update Session.test.ts to use objectContaining for _meta field added
  in recent main commits
- Align doctorCommand.test.ts assertions with current parent command
  state (argumentHint includes --sample/--snapshot from main)

* fix(core): use null instead of undefined for optional probes, deduplicate active count helpers

- optionalProbe/optionalSyncProbe now return null on failure so
  JSON.stringify preserves the keys instead of silently omitting them.
- Merge getActiveHandlesCount/getActiveRequestsCount into a single
  parameterized getProcessInternalCount helper.
- Update MemoryDiagnostics interface: v8HeapSpaces, openFileDescriptors,
  smapsRollup are now T | null instead of T | undefined.

* fix(cli): finish memory diagnostics review fixes

* fix(cli): address memory diagnostics review feedback

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-05-17 19:52:46 +08:00
Yan Shen
9985d91e08
feat(cli): add configurable plansDirectory for Plan Mode (#4062)
* feat(cli): add configurable plansDirectory for Plan Mode

Add a plansDirectory setting that allows users to define a custom
directory for approved Plan Mode files. Relative paths are resolved
against the project root and validated to prevent path traversal.

- Storage: add isPathWithinDirectory() with realpathSync-based symlink
  resolution to prevent traversal bypass attacks (direct, intermediate,
  and cross-drive)
- Config: cache plansDir at construction time, use atomic write
  (write-temp then rename) to prevent corrupted plan files on crash
- CLI: respect bareMode by clearing plansDirectory in minimal mode
- Docs: document plansDirectory with requiresRestart and gitignore hint
- Tests: 26 new tests covering path validation, symlink attacks
  (direct and intermediate), Windows cross-drive paths, mixed
  separators, and configuration integration

Closes #3548

* fix(core): align symlink test with return value

* fix(core): harden plans directory handling

* fix(config): address PR #4062 review findings for plansDirectory

- Handle EXDEV during atomic plan writes (cross-device rename fallback)

- Sanitize session IDs to prevent path traversal in plan filenames

- Expand tilde (~) in configured plansDirectory paths

- Preserve plansDirectory in bare mode

- Add EACCES/EPERM handling to getPlanFileNames with user-visible warnings

- Close TOCTOU gap with post-write path containment validation

- Fix docs to clarify plansDirectory is a top-level key

- Add happy-path I/O tests for configured plansDirectory
2026-05-17 19:43:24 +08:00
jinye
d2d426fad0
feat(serve): SSE replay sizing + slow_client_warning backpressure (#4175 Wave 2.5 PR 10) (#4237)
* feat(serve): SSE replay sizing + slow_client_warning backpressure

#4175 Wave 2.5 PR 10. Closes the SSE replay / backpressure knobs
called out in #3803 §02 so chatty Stage 1 sessions get an honest
reconnect window and operators get a heads-up signal before clients
are summarily evicted.

- **`DEFAULT_RING_SIZE` 4000 → 8000.** Per-session replay ring depth
  now matches the #3803 §02 target for chatty sessions.
- **`--event-ring-size <n>`** CLI flag (default 8000) lets operators
  tune the ring per daemon. Threaded `ServeOptions` →
  `BridgeOptions.eventRingSize` → both `new EventBus()` construction
  sites (fresh sessions + restore path). Validation is fail-CLOSED
  (positive finite integer; 0 / NaN / negative throw at boot).
- **`slow_client_warning` SSE frame.** When a subscriber's queue
  crosses 75% full the bus force-pushes a synthetic
  `slow_client_warning` to that subscriber once per overflow
  episode, carrying `{queueSize, maxQueued, lastEventId}`. The flag
  re-arms after the queue drains below 37.5% (hysteresis, no flap
  near threshold). If the queue actually overflows after the
  warning, the existing `client_evicted` terminal frame path still
  fires. Like `client_evicted`, the warning has no `id` (synthetic
  frame; must not burn a sequence slot for other subscribers).
- **`?maxQueued=N`** query param on `GET /session/:id/events`
  (range `[16, 2048]`, default 256). Lets cold reconnect clients
  pre-size their per-subscriber backlog so a large `Last-Event-ID:
  0` replay doesn't trip the warning on the first publish. Range
  rationale: lower bound 16 (smaller is useless for any replay);
  upper bound 2048 (so a single subscriber can't pin ~1 MB just by
  asking). Out-of-range / non-decimal returns `400
  invalid_max_queued` BEFORE opening the SSE stream — clean 4xx
  beats half-opening a stream + emitting a `stream_error` (which
  EventSource would auto-reconnect on).
- **`slow_client_warning` capability tag** — single source of truth
  for the warning frame + `?maxQueued` query param + ring-size
  knob. Old daemons silently lack all of these; pre-flight via
  `caps.features`.
- **SDK extensions** (`@qwen-code/sdk`): typed
  `DaemonSlowClientWarningEvent` (added to known event union and
  `DaemonStreamLifecycleEvent`); schema-validated by a new
  `isSlowClientWarningData` predicate; reducer
  (`reduceDaemonSessionEvent`) increments `slowClientWarningCount`
  + stores `lastSlowClientWarning`. Warning is **non-terminal** —
  `alive` stays true (only `client_evicted` / `stream_error` /
  `session_died` close the stream). Re-exported from the public
  SDK entry.
- **Docs**: `qwen-serve-protocol.md` updates the features list (adds
  `slow_client_warning` and the previously-missing `client_identity`
  to match reality post-#4231), documents the `?maxQueued` query
  param, adds the warning frame to the event table, and notes the
  new default ring size. `qwen-serve.md` adds the `--event-ring-size`
  flag row.

Tests: 19 eventBus (4 new: warning at 75%, once per episode,
no `id` on the synthetic frame, hysteresis re-arm), 106 bridge
(2 new: validate eventRingSize accept/reject), 111 server (4 new:
?maxQueued accept/absent/non-decimal/out-of-range +
EXPECTED_STAGE1_FEATURES update), 14 SDK daemonEvents (2 new:
schema validation + non-terminal reducer behavior). 321 focused
tests total, all green.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* refactor(serve): adopt PR #4237 review feedback (eventBus polish)

Address the actionable items from the Qwen Code review bot's pass
on PR #4237:

- Pre-compute `warnThreshold` / `warnResetThreshold` per
  `InternalSub` at `subscribe()` time so `publish()`'s per-event
  hot path is one integer compare per subscriber instead of a
  multiply + compare. The `!warned` short-circuit still collapses
  the steady state to a single boolean read; this just shaves a
  multiply when the threshold check actually fires.
- Document the back-of-queue ordering choice for the synthetic
  `slow_client_warning` frame in `EventBus.publish()`: front-push
  was considered but mid-stream front-insertion would mis-count
  `forcedInBuf` in `BoundedAsyncQueue.next()`, and `forcePush`
  already short-circuits via `resolvers.shift()` for the
  active-consumer case — the back-of-queue path only matters for
  stalled consumers, who can't drain regardless of warning
  position.
- Reuse the existing `collect()` helper in the "default ring size
  8000" test for consistency with the rest of the file; the new
  test also tightens the assertion by checking that the first
  retained event id is 2 (id=1 dropped by the ring) and the last
  is 8001.
- Soften the "~500 B per session" magic number in
  `BridgeOptions.eventRingSize`'s JSDoc to a qualitative
  description (each retained `BridgeEvent` is a reference plus its
  serialized payload; ceiling scales as
  `ringSize × average-event-size`).

Rejected:
- Bot's claim that the error JSON contains `\`...\`` escape
  sequences — bot misread the JS template-literal source as the
  wire output; `JSON.stringify` does not escape backticks, and
  the existing `cwd` error messages use the same style.
- Bot's "use `Record<string, never>` instead of `[key: string]:
  unknown`" suggestion on `DaemonSlowClientWarningData` — every
  other event-data type in `sdk-typescript/src/daemon/events.ts`
  carries the same index signature for additive-field
  compatibility.
- Bot's "features list breaks alphabetical order" — the
  capability list is grouped by protocol lifecycle (health →
  capabilities → session lifecycle → events → permissions), not
  alphabetical.

Tests: 139 focused tests across eventBus + httpAcpBridge + SDK
daemon events — all passing. Behavior unchanged; this is
hot-path micro-opt + comment polish only.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve): correct queue tagging + plumb maxQueued through SDK

Address both P2 findings from the Codex review pass on PR #4237.

**Bug 1: `BoundedAsyncQueue.forcedInBuf` position-invariant break**

The previous `forcedInBuf` counter only tracked LIVE-vs-FORCED
correctly when all forced entries lived at the FRONT of the buffer
(subscribe-time `Last-Event-ID` replay). The new mid-stream
`slow_client_warning` path force-pushes to the BACK of the queue
while the queue is still open, which the existing accounting was
not designed for:

  - publish 6 events at maxQueued=8 → 75% threshold trips →
    force-push warning at the back → buf=[1..6, warning],
    forcedInBuf=1.
  - consumer shifts `1` → forcedInBuf decremented to 0 (incorrect:
    `1` was a live frame, not the forced one).
  - consumer drains 2..6 + warning → buf=[], forcedInBuf=0, true
    live count = 0, but `size` getter and `push()` cap check then
    use `buf.length - forcedInBuf` which drifts over subsequent
    refills, causing premature warn / eviction before the cap is
    actually reached.

Replace the position-dependent counter with a per-entry
`{value, forced}` tag. `liveCount` is incremented in `push()` /
decremented in `next()` only when the shifted entry was non-forced
— position becomes irrelevant. `size` getter returns `liveCount`
directly. The class doc comment is rewritten to call out that the
new tag is the position-independent replacement for the old
"forced frames must stay at the front" invariant.

Regression test in `eventBus.test.ts` reproduces the codex trace
(warn at 75%, drain past warning, refill to cap) and asserts no
premature eviction.

**Bug 2: SDK does not expose `?maxQueued`**

`docs/users/qwen-serve.md` and `docs/developers/qwen-serve-protocol.md`
both document `?maxQueued=N` as something SDK clients can request,
but `SubscribeOptions` on `DaemonClient` only declared `lastEventId`
+ `signal`, and `subscribeEvents()` always fetched `/events` without
a query string. Typed-SDK consumers had no way to opt in without
hand-crafting URLs.

  - Add `SubscribeOptions.maxQueued?: number` with JSDoc noting the
    daemon range `[16, 2048]` and the pre-flight requirement on
    `caps.features.slow_client_warning`.
  - `DaemonClient.subscribeEvents` builds the URL with an optional
    `?maxQueued=<n>` segment. No client-side range validation —
    the daemon's `parseMaxQueuedQuery` is the source of truth and
    returns structured `400 invalid_max_queued`; duplicating the
    bounds in two layers would diverge on the next tweak.
  - `DaemonSessionSubscribeOptions extends SubscribeOptions` so the
    new field flows through `DaemonSessionClient` automatically.

Three new SDK tests:
  - subscribeEvents appends `?maxQueued=N` when set
  - omits the query string when absent (existing behavior preserved)
  - propagates a `400 invalid_max_queued` unchanged

Tests: 214 focused tests across eventBus / bridge / SDK
DaemonClient / DaemonSessionClient / daemonEvents, plus 111 in the
server suite. All green; the new eventBus regression case proves
the position-invariant fix.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* refactor(serve): adopt PR #4237 copilot review feedback

Address 6 of 8 copilot-reviewer findings on PR #4237; the other 2
(#1 forcedInBuf live-size corruption, #5 SDK lacks maxQueued) were
already fixed in bae42c88b — replied on the threads with the
commit hash.

- **[2] server.ts:1068** — `?maxQueued=` (present-but-empty) now
  fails closed with `400 invalid_max_queued` instead of silently
  falling back to the default queue cap. The API documents
  fail-closed for any malformed value before opening SSE, so an
  empty string is unambiguously malformed. New server.test.ts
  case locks this in.
- **[3] commands/serve.ts:93** — CLI help text for
  `--event-ring-size` no longer mis-shapes `Last-Event-ID` as a
  query parameter. It is an HTTP header, and the daemon's SSE
  route does not parse a `?Last-Event-ID=` query.
- **[4] docs/developers/qwen-serve-protocol.md:351** — clarify
  that `?maxQueued=N` controls the LIVE-event backlog cap.
  Replay frames are force-pushed and exempt from the cap; what
  consumes it is live events that arrive while the subscriber is
  still draining a cold-reconnect replay. Bumping for cold
  reconnects is still the right answer, but for the live tail,
  not for the replay frames themselves.
- **[6] eventBus.ts:214** — stale `ringSize=4000` performance
  comment updated to the new `ringSize=8000` default with a note
  about the O(n) `shift()` cost scaling.
- **[7] sdk-typescript events.ts:492** — `isSlowClientWarningData`
  now uses the existing `isFiniteNumber` helper instead of bare
  `typeof === 'number'`. Mirrors the sibling predicates and
  rejects `NaN` / `Infinity` payloads as schema garbage. New
  daemonEvents.test.ts assertions cover both.
- **[8] server.ts:127** — `createServeApp`'s default-bridge
  construction now also forwards `opts.eventRingSize` to
  `createHttpAcpBridge`, symmetric with the `runQwenServe.ts`
  path. Direct embeds / tests that called `createServeApp`
  without supplying their own bridge but did pass
  `ServeOptions.eventRingSize` were silently getting the
  default 8000 ring.

Tests: 326 focused tests across eventBus / bridge / SDK
DaemonClient / DaemonSessionClient / daemonEvents / server. All
green; the new server.test.ts case + the extended
daemonEvents.test.ts assertions cover the tightened guards.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* refactor(serve): adopt PR #4237 wenshao round-2 review feedback

Six adopted findings from @wenshao's second review pass on
PR #4237. The seventh ([10] forcedInBuf 3rd case invariant) was
already fixed in bae42c88b — replied on that thread.

- **[9] + [14] server.ts** — Sanitize attacker-controlled values
  before stderr interpolation in both `parseMaxQueuedQuery` and
  `parseLastEventId`. New `safeLogValue()` helper uses
  `JSON.stringify` to escape control characters (`\n`/`\r`/…) so a
  URL-encoded newline in `?maxQueued=%0a` can't inject extra log
  lines into journald/Loki/Splunk pipelines. Matches the
  `workspace_mismatch` sanitization style in `sendBridgeError`.
  Fixed in both helpers (the sibling pre-existing
  `parseLastEventId` had the same shape) so the file stays
  consistent.

- **[11] httpAcpBridge.ts** — `!Number.isFinite(eventRingSize)`
  was redundant: `Number.isInteger(NaN)` and
  `Number.isInteger(Infinity)` both return `false`, so the sibling
  `!Number.isInteger` already catches both. Drop the dead guard.

- **[12] httpAcpBridge.ts** — Add soft upper bound
  `MAX_EVENT_RING_SIZE = 1_000_000` on `eventRingSize` to catch
  operator typos (`--event-ring-size 80000000` vs `8000000`). At
  ~500 B per `BridgeEvent` an 1M-frame ring already pins ~500 MB
  per session — well past any realistic workload. Not a security
  boundary (operator-controlled flag), pure typo defense. Existing
  bridge construction test extended with an `80_000_000` case.

- **[13] commands/serve.ts** — CLI `--event-ring-size` flag now
  sources its default from `DEFAULT_RING_SIZE` (imported from
  `serve/eventBus.js`) instead of the hardcoded literal `8000`.
  Without this, a future bump of the bus default would silently
  not take effect for daemons launched through the CLI because
  the flag always overrides — single source of truth fixes that.

- **[15] eventBus.ts** — Drop unreachable `event.id ?? this.lastEventId`
  fallback in the `slow_client_warning` frame. `event` is locally
  constructed at the top of `publish()` with `id: this.nextId++`
  and is guaranteed defined. Use `event.id as number` directly +
  an inline note about the invariant.

Tests: 197 (eventBus 20 / bridge 107 / SDK DaemonClient 57 / SDK
daemonEvents 14) + 112 server. All green; the new upper-bound
bridge case + the existing log assertions pin the changed
behaviors.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-17 19:30:43 +08:00
JerryLee
6a1d55c737
fix(core): apply tool name migrations at dispatch (#4213) 2026-05-17 19:04:08 +08:00
jinye
0a4a08e443
feat(serve): add client heartbeat (#4175 Wave 2.5 PR 9) (#4235)
* feat(serve): add client heartbeat route

Adds POST /session/:id/heartbeat plus SDK helpers so long-lived
adapters (TUI/IDE/web) can refresh the daemon's last-seen
bookkeeping. Bridge stores per-session and per-client timestamps
behind a getHeartbeatState() snapshot accessor that PR 12
read-only diagnostics and PR 24 revocation policy will consume.

- Capability tag: client_heartbeat (advertised on /capabilities.features)
- Identified clients must echo X-Qwen-Client-Id; the bridge validates
  the id BEFORE bumping any timestamp so a forged id can't mask
  client absence
- Per-client entries are dropped together with the registration
  ref-count in unregisterClient, so churn doesn't leak stale ids
- getHeartbeatState returns a snapshot Map; mutating it does not
  leak into bridge state
- Anonymous heartbeats bump only the per-session watermark

Errors mirror the rest of the routes — 404 SessionNotFoundError, 400
invalid_client_id (header malformed or unknown for this session).

Roadmap PR 9 from #4175. Depends on PR 7 (#4231 client identity,
merged) for the trusted clientId registry.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* feat(sdk): re-export HeartbeatResult from package root

The published @qwen-code/sdk only exposes the root entrypoint via
`exports`; daemon subpath imports are not part of the public API.
Adding HeartbeatResult to packages/sdk-typescript/src/daemon/index.ts
made it reachable internally but not for downstream consumers writing
`import type { HeartbeatResult } from '@qwen-code/sdk'` — every other
daemon result type (PromptResult, SetModelResult, DaemonSession, etc.)
is forwarded through the root barrel, so HeartbeatResult was the only
hole in the heartbeat helper's public surface.

Inserted alphabetically between DaemonStreamLifecycleEvent and
KnownDaemonEvent to match the existing ordering convention.
2026-05-17 18:57:28 +08:00
jinye
07e0e82258
feat(serve): advertise typed_event_schema + pin SDK public surface (#4175 PR 4 follow-up) (#4226)
* feat(serve): advertise typed_event_schema capability

Follow-up to #4217 (`feat(protocol): add typed daemon event schema v1`,
Wave 1 PR 4 of #4175), which landed the SDK-side typed schema +
`KnownDaemonEvent` union + reducer but did not register a daemon-side
capability tag for it. Without the tag, non-SDK clients (web debug
UI, third-party adapters, channel/IDE backends not yet on
`@qwen-code/sdk`) have no way to detect at the protocol envelope
level that the daemon promises to emit only `KnownDaemonEvent`-shaped
frames — they would either pin against SDK version, or pre-flight
every frame defensively.

Add `typed_event_schema: { since: 'v1' }` to `SERVE_CAPABILITY_REGISTRY`,
inserted right after `session_events` (the route that delivers the
frames whose schema this tag describes). The capability is purely
informational — `narrowDaemonEvent`/`asKnownDaemonEvent` already
fall back to "unknown" for older daemons that don't advertise the
tag, so the SDK does not gate any behavior off the tag.

Sync `EXPECTED_STAGE1_FEATURES` (server.test.ts) and the integration
test array (qwen-serve-routes.test.ts) with the registry order, the
same lockstep discipline #4214 codified.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* test(sdk): pin typed event surface at the public SDK entry, point DaemonSessionClient docstring at it

Two small follow-ups to #4217 (Wave 1 PR 4 of #4175).

1. Public-entry regression fence

   `@qwen-code/sdk` is a single-entry package: `package.json.exports`
   only exposes `.` (`dist/index.{cjs,mjs,d.ts}`), and the bundle
   is built from `src/index.ts`. Symbols re-exported only from
   `src/daemon/index.ts` are unreachable to consumers unless they
   are also forwarded by `src/index.ts`. #4217 forwards the typed
   event schema correctly today, but the two-layer chain has no
   compile-time test pinning it — a future daemon export that lands
   in `src/daemon/index.ts` but is missed by `src/index.ts` would
   ship invisibly.

   Add `test/unit/daemon-public-surface.test.ts` that imports
   `* as Public from '../../src/index.js'`, asserts at runtime that
   every PR 4 value is `typeof === 'function'` (or a primitive of
   the expected shape), round-trips a raw `DaemonEvent` through the
   public `asKnownDaemonEvent` to prove the wire-up actually works,
   and compile-imports every PR 4 type so any drift fails to build.

2. DaemonSessionClient docstring pointer

   The class docstring already deferred typed event consumption to
   "the protocol schema layer" without a concrete pointer. Now that
   #4217 has put `asKnownDaemonEvent` and `reduceDaemonSessionEvent`
   in `./events.js`, name them so future readers can find the
   typed surface without grepping. No code change.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
2026-05-17 18:43:38 +08:00
易良
ab03badaae
fix(core): extend DashScope provider detection with additional hostname rules (#4157)
* fix(core): extend DashScope provider detection with additional hostname rules

Add internal domain patterns (*.dw.alibaba-inc.com and
*.data.aliyun-inc.com) to isDashScopeProvider() so that
DashScope-compatible private gateways are recognized
without requiring DASHSCOPE_PROXY_BASE_URL.

Closes #4138

* feat(core): add providerType field for explicit DashScope provider opt-in

Allow modelProviders entries to declare providerType: 'dashscope'
so that custom endpoints on private gateways are recognized as
DashScope-compatible without relying solely on hostname detection.

Closes #4138

* revert(core): remove DashScope providerType opt-in

* refactor(core): broaden dashscope internal domain match to *.alibaba-inc.com / *.aliyun-inc.com

Drop the business-specific dw./data. subdomain checks and treat any
*.alibaba-inc.com or *.aliyun-inc.com host as DashScope-compatible.
If a future internal service under these domains is not compatible,
narrow it via a runtime provider property rather than re-introducing
business-name subdomains.

* fix(core): clarify internal dashscope origin matching

* docs(core): clarify DashScope internal host matching
2026-05-17 18:36:42 +08:00
kkhomej33-netizen
a5e4839e07
fix(cli): restore ACP prompt counter on resume (#4233) 2026-05-17 18:32:44 +08:00
Shaojin Wen
0240c310fd
feat(core): PR-2.5 — post-promote stream redirect + natural-exit registry settle (#3831 follow-up) (#4102)
* feat(core): PR-2.5 — post-promote stream redirect + natural-exit registry settle

Closes the two limitations PR-2 (#3894) deferred for the Phase D part
(b) Ctrl+B promote flow (#3831):

1. **Post-promote stream redirect**: today the `bg_xxx.output` file
   is frozen at promote time because `ShellExecutionService` detaches
   its data listener as part of PR-1's ownership-transfer contract.
   PR-2.5 wires a caller-side `onPostPromoteData` callback so bytes
   from the still-running child append to the file via an
   `fs.createWriteStream` opened in `handlePromotedForeground`.
2. **Natural-exit registry settle**: today the registry entry stays
   `'running'` until `task_stop` / session-end `abortAll` fires its
   abort listener. PR-2.5 wires `onPostPromoteSettle` so natural
   child exit transitions the entry to `'completed'` / `'failed'`
   with the right exitCode / signal / error message.

- New exported types: `ShellExecuteOptions`, `ShellPostPromoteHandlers`,
  `ShellPostPromoteSettleInfo`.
- `execute()` options bag now accepts `postPromote?: { onData, onSettle }`.
  Threaded through to both `executeWithPty` and `childProcessFallback`.
- PTY's `performBackgroundPromote` (line ~1159): after disposing
  the foreground data + exit + error listeners, RE-ATTACH minimal
  forwarders that call `postPromote.onData` / `postPromote.onSettle`
  when the caller opted in. Backwards compat: when `postPromote` is
  unset the PR-2 detach-everything contract is preserved (the
  re-attach is gated on each callback being defined).
- `childProcessFallback`'s `performBackgroundPromote` (line ~706):
  same pattern — re-attach `stdout.on('data', ...)`, `stderr.on('data',
  ...)`, `child.once('exit', ...)`, `child.once('error', ...)` when
  the caller opted in. `error` listener routes through `onSettle`
  with `error` populated, so spawn-side errors after the foreground
  errorHandler detached don't crash the daemon via the default
  unhandled `'error'` event.
- Both paths wrap caller callbacks in try/catch so a thrown handler
  doesn't crash the child's data loop / unhandled-rejection the
  service.

- New `PromoteArtifacts` type — slots shared between the foreground
  `execute()` postPromote handlers (which fire on the service side
  as soon as promote happens) and the post-resolve
  `handlePromotedForeground` finalizer (which runs after
  `await resultPromise` returns). The two race; the buffer +
  settle-queue absorb that race so neither chunks nor the eventual
  exit info are lost.
- `executeForeground` wires `postPromote` handlers that route data
  to either `promoteArtifacts.stream` (if open) or
  `promoteArtifacts.buffer` (drained when the stream opens), and
  queue settle info if the wired handler isn't yet installed.
- `handlePromotedForeground` opens `fs.createWriteStream(outputPath,
  { flags: 'w' })`, writes the initial snapshot first, drains the
  buffer, then registers the entry and wires `onSettleWired` with
  the full registry decision table:
    - `error` set → `registry.fail(shellId, error.message, endTime)`
    - `exitCode === 0` → `registry.complete(shellId, 0, endTime)`
    - non-zero exitCode → `registry.fail(shellId, "Exited with code N", endTime)`
    - signal !== null → `registry.fail(shellId, "Terminated by signal N", endTime)`
    - all-null fallback → `registry.fail(shellId, "Exited with unknown status", endTime)`
- Fires queued settle synchronously after wiring so a fast command
  that exits between promote and finalizer doesn't get lost.
- Self-audit catch: closes the output stream on the
  `registry.register` throw path so the FD doesn't leak past the
  orphan-child kill.

- 3 new in `shellExecutionService.test.ts`:
  - `post-promote bytes route to postPromote.onData when callback provided`
  - `postPromote.onSettle fires on natural child exit after promote`
  - `backwards compat: without postPromote, listeners stay fully detached`
- 3 new in `shell.test.ts` under a `foreground → background promote
  PR-2.5` describe block:
  - `post-promote bytes APPEND to bg_xxx.output via write stream`
  - `natural child exit transitions registry entry to "completed"`
  - `non-zero exit / signal / error → "failed" with descriptive message`
- Bulk-replaced 50 prior `{},` (empty 6th-arg shellExecutionConfig)
  with `expect.objectContaining({}),` + added `expect.objectContaining({
  postPromote: expect.any(Object) }),` as the 7th-arg expectation for
  the foreground execute call.
- Updated the existing `registers a bg_xxx entry on result.promoted`
  test to assert on `fs.createWriteStream` + `stream.write` instead
  of the now-removed `fs.writeFileSync` snapshot path.

182/182 shell.test.ts pass + 73/73 shellExecutionService.test.ts pass
+ 111/111 coreToolScheduler.test.ts pass + 60/60 AppContainer.test.tsx
pass; tsc + ESLint clean.

Self-audit: 3 rounds (positive / reverse / cross-file) found one
issue — output stream FD leak on `registry.register` throw — and
fixed it before flagging complete. All flagged edge cases (stream
errors, child-exits-before-wire-up race, task_stop during natural-
exit window, promote-never-happens cleanup, backwards compat
without callbacks) have explicit handling and / or test pinning.

* fix(core): #4102 review wave — 3 Critical + UTF-8 + tests

3 Critical race/correctness issues + 1 multibyte-corruption suggestion
+ 3 test coverage gaps addressed:

**Critical 1 — child_process late-chunk drop (service)**
Settle was fired on 'exit', but stdout/stderr can emit buffered data
between 'exit' and 'close'. Late chunks landed in
`promoteArtifacts.buffer` after shell.ts had already closed the
stream + transitioned the registry → silently dropped → truncated
`bg_xxx.output`. Switched to listening on 'close' which guarantees
all stdio is fully drained. (code, signal) payload is identical to
'exit', just with proper ordering.

**Critical 2 — stream-flush wait before registry transition (shell)**
`stream.end()` is asynchronous; pending writes can still be in the
libuv queue when it returns. The old code transitioned the registry
immediately after `.end()`, so a /tasks consumer could observe a
`completed` entry and read the output file BEFORE the trailing
bytes were on disk. Fixed: wired settle now `stream.once('finish',
...)` BEFORE calling `registry.complete/fail`. `error` event also
short-circuits to the transition so a late ENOSPC doesn't hang the
settle path forever.

**Critical 3 — stream-open-fail buffer leak (shell)**
If `fs.createWriteStream` threw, the catch path set `stream = null`
but the foreground `onData` handler would still take the
`stream === null` branch and push chunks into `promoteArtifacts.buffer`
— unbounded growth under a sustained child whose output file
couldn't be opened. Added a `streamFailed: boolean` latch on
`PromoteArtifacts`. When set, `onData` drops chunks (with a debug
log) instead of buffering. The catch branch sets the latch.

**Suggestion — shared TextDecoder corrupts multibyte UTF-8 (service)**
child_process post-promote used ONE TextDecoder for both stdout AND
stderr. The decoder's continuation-byte state machine assumes one
byte source; interleaved multibyte chunks corrupted. Now uses
separate decoders + flushes both with `decode()` (no `stream: true`)
on settle so trailing bytes surface as their final characters.

**Suggestion — llmContent reflects already-settled status (shell)**
When the queued-settle drain transitions the registry synchronously
(fast-exit race), the model-facing copy was still saying "Status:
running. … task_stop({...})". Updated to branch on
`postPromoteAlreadySettled` / `postPromoteFinalStatus` — when the
process is already gone, the copy says "Status: completed/failed"
and replaces the `task_stop` suggestion with "Process has already
exited; no `task_stop` needed".

**Suggestion — test coverage gaps**
Added: (a) `queued-settle race: onSettle BEFORE
handlePromotedForeground completes` — custom service impl fires
onSettle synchronously before resolving the promote promise, pins
the drain path. (b) child_process post-promote tests for stdout/stderr
forwarding + 'close'-not-'exit' settle + spawn-error settle.

**Self-audit**: Round 1 + reverse audit. Stream.once mock added to
fire 'finish' synchronously so existing tests don't hang on the new
flush wait. 76/76 shellExecutionService.test.ts (+3) + 183/183
shell.test.ts (+1) pass; tsc + ESLint clean.

* fix(core): #4102 review wave-2 — 3 more

C1 (shell.ts:2227): the WriteStream `'error'` event handler only
logged. `fs.createWriteStream` reports common open failures
(ENOENT / EACCES / ENOSPC) asynchronously via that event rather
than throwing. Result: `promoteArtifacts.stream` kept pointing at
the failed stream; `onSettleWired` attached a `.once('finish')`
listener that would never fire → registry stuck on `running`
forever. Latch the failure (null the shared `stream` slot,
set `streamFailed`); `onSettleWired`'s existing `if (!stream)`
branch then transitions the registry immediately.

C2 (shellExecutionService.ts:1468): the promote handoff removes the
foreground `ptyErrorHandler` and only re-attaches data + exit
listeners. A subsequent PTY `error` event had no listener — Node
treats an unhandled `error` from an EventEmitter as a fatal
exception that takes the whole CLI down. Attach a post-promote
forwarder that ignores expected PTY read-exit codes (EIO / EAGAIN,
same filter the foreground handler uses) and routes unexpected
errors through `postPromote.onSettle` with `error` populated.
Single-fire latch shared with `onExit` so settle never fires twice.

C3 (shell.ts:2503): `onSettleWired` waits for the stream's
asynchronous `'finish'` event before flipping
`postPromoteAlreadySettled`, but the model-facing `statusLine` was
built immediately after invoking `onSettleWired` on the queued
settle. A fast-exited promoted command could therefore land
"Status: running" + a `task_stop` instruction in production even
though settle was already observed. Split into two flags:
`postPromoteSettleObserved` (set synchronously when settle is
classified) drives the model copy; the registry transition stays
behind the stream flush.

Tests: +1 PR-2.5 wave-2 PTY error-routing test; +2 shell.ts tests
(stream open async error → registry still transitions; async
`'finish'` after queued-settle drain → llmContent says 'completed'
before registry transition fires).

* fix(core): #4102 review wave-3 — 4 actionable

T2 (shell.ts:2456) — Critical buffer-leak race
`onSettleWired` previously set `promoteArtifacts.stream = null`
BEFORE calling `stream.end()`. Any `postPromote.onData` chunk that
landed between that null assignment and the actual flush completing
saw `stream === null && streamFailed === false` and pushed into
`promoteArtifacts.buffer` — a buffer that has no further drain path
(the foreground finalizer has already returned). Result: chunks
stranded indefinitely; PTY mode in particular hits this because
`onExit` can fire while kernel buffers still hold data. Fix drains
the pre-settle buffer to the stream BEFORE nulling AND latches
`streamFailed = true` so any subsequent chunk drops via the
existing `else if (streamFailed)` arm in `onData` instead of
leaking. Updates the `streamFailed` doc to cover both setters
(open-fail and settle-done) so the dual semantic is explicit.

T3 (shell.ts:2262) — silent chunk-drop in catch path
When `fs.createWriteStream` throws synchronously (rare: ENOENT on
a vanished tmpdir), chunks already in `promoteArtifacts.buffer`
were silently lost with no observability — oncall reading a
truncated `bg_xxx.output` had no way to distinguish "stream open
failed" from "child produced nothing." Logs the dropped chunk
count and empties the buffer.

T5 (shell.ts:2443) — opaque all-null fallback
The "Exited with unknown status" fallback fired the registry to
'failed' without any context about which fields were null. This
branch is meant to be unreachable; hitting it indicates the
service emitted a defective settle info object. Includes the
field values in both the fail message and a warn log so the
oncall engineer can tell this path apart from the other "failed"
branches.

T6 (shellExecutionService.ts:1452) — leaked PTY post-promote listeners
`ptyProcess.onData(...)` returns an `IDisposable` that was being
discarded; same for `onExit`. The `'error'` listener function was
also not captured (no way to `removeListener` it). EventEmitter
holds refs to listener closures, which transitively hold refs to
`onPostData` / `onPostSettle` / the caller's `promoteArtifacts`.
While bounded by the PTY's lifetime, the closures keep the
caller's state pinned for the post-settle delay window. Captures
all three handles into `postPromoteDataDisposable` /
`postPromoteExitDisposable` / `postPromoteErrorListener`, then
releases them via a shared `disposePostPromoteListeners()` call
from `firePostSettle` (idempotent — each slot null-checked and
nulled after disposal).

Tests: +1 service test for IDisposable + error-listener cleanup;
+2 shell.ts tests for buffer drain race and catch-path snapshot
fallback. Existing tests stay green (262 → 265 in the touched
suites; 7819 → 7822 across the core package).

* fix(core/test): drop unused 'registry' in wave-3 T2 test (TS6133)

CI build failed across all platforms with src/tools/shell.test.ts(4395,15): error TS6133. The variable was a leftover from copying the queued-settle test pattern; the wave-3 T2 test inspects writeStreamMock.write call history directly and never reads the registry, so the assignment is dead code. Drop it.

* fix(core): #4102 review wave-4 — 6 actionable

T1 (Critical, shellExecutionService.ts:860 child_process onSettle
exactly-once)
The PTY path used a `firePostSettle` latch but child_process wired
`close` and `error` independently to `onPostSettle`. A spawn-side
error followed by Node's auto-emitted `'close'` would call the
caller's settle TWICE, racing the registry transition. Added the
same single-fire latch on the child_process path.

T2 (Critical, shell.ts:2264 handoff race reorder)
Original order was `write(snapshot) -> drain buffer -> assign stream`.
Synchronous today (no race in current code), but assign-after-drain
leaves a hazard for any future refactor that adds an `await` inside
the drain loop — a chunk arriving in that window would land in
`promoteArtifacts.buffer`, then post-assign chunks would write to
the stream first, producing out-of-order bytes until the settle
drain. Reordered to `write(snapshot) -> assign stream -> drain
buffer`, which closes the hazard regardless of future async
additions.

T3 (Suggestion, shellExecutionService.ts:816 decoder flush gated
on onSettle)
The trailing-multibyte flush ran inside the `child.once('close', ...)`
handler, which was only installed when `onSettle` was set. An
`onData`-only caller (no onSettle) lost trailing continuation
bytes silently. Hoisted flush into `flushPostPromoteDecoders`
called from `firePostSettle`, and made `firePostSettle` available
on the `'close'` path independent of onSettle (T6 install).

T4 (Suggestion, shell.ts:1700 promoted ANSI passthrough)
The regular `executeBackground` path strips ANSI before writing to
`bg_xxx.output`; the promoted-foreground onData path appended raw
chunks. Reading `bg_xxx.output` after Ctrl+B showed plain text up
to the snapshot then raw `\x1b[31m` / cursor-move / clear-screen
sequences for the post-promote tail — unreadable. Apply
`stripAnsi(rawChunk)` before write/buffer, matching the
executeBackground contract.

T5 (Suggestion, shellExecutionService.ts:786 UTF-8 hardcoded)
The post-promote child_process decoders were hard-coded to
`new TextDecoder('utf-8')`, but the foreground decoder runs
encoding detection via `getCachedEncodingForBuffer`. On a non-UTF-8
child (e.g. GBK on a Chinese Windows shell), the snapshot decoded
correctly but the post-promote tail was mojibake. Capture the
foreground decoder's `.encoding` property and reuse it for
post-promote (with utf-8 fallback if foreground hadn't seen any
bytes yet, and a try/catch around `new TextDecoder` for the rare
unsupported-encoding case).

T6 (Suggestion, shellExecutionService.ts:1540 `error` listener
gated on onSettle)
The post-promote `error` listener was attached only when `onSettle`
was set. An `onData`-only caller still had the foreground
errorHandler detached; a post-promote spawn error would then crash
the CLI via Node's unhandled-error default. Hoisted the close +
error listeners into `if (postPromote)` so any caller opting into
post-promote gets crash protection; if `onSettle` is absent the
listeners log + drop instead of routing.

T7 (Suggestion, shellExecutionService.ts:791 onSettle-only
pipe-block deadlock)
Same root cause as T6: when only `onSettle` is set, the foreground
`stdout`/`stderr` 'data' listeners are detached and no post-promote
listener replaces them. The Readables stay paused, the OS pipe
buffer fills (~64KB on Linux), the child blocks on `stdout.write`,
'close' never fires, onSettle never fires. Added `child.stdout?.resume()`
and `child.stderr?.resume()` in the no-onData branch so the child
can drain its pipes and reach exit.

T8 (Suggestion, shell.ts:2614 dead inspectLine ternary)
`inspectLine`'s ternary returned the same string on both sides —
copy-paste leftover from when the other two adjacent ternaries
(statusLine / stopLine) were correctly varied. Collapsed to a
single string assignment.

Tests: +5 regression tests (4 child_process: T1 double-fire latch,
T3 onData-only flush, T6 onData-only error survives, T7 onSettle-
only resume; +1 shell.ts: T4 ANSI strip).

265 -> 270 in the touched suites; 7822 -> 7827 across the core
package; full suite green.

* fix(core/test): use ShellOutputEvent type in wave-4 onData callbacks (TS2345)

CI lint failed on the wave-4 (T3 / T6) tests with TS2345: pushing
ShellOutputEvent into Array<{type:string;chunk:unknown}> narrows
incompatibly. Switch to ShellOutputEvent[] (matches earlier helpers
at lines 758/966) and discriminate the union via .type === 'data'
when reading .chunk so the narrowed multibyte assertion still
type-checks.

* fix(core): address PR #4102 review — PTY error guard, flush timeout, diagnostic marker, failed-settle test

- Move PTY post-promote error listener from `if (postPromote?.onSettle)` to
  `if (postPromote)` to match child_process path and prevent unhandled error
  crashes for onData-only callers
- Add 10s flush timeout in onSettleWired so stalled streams don't leave
  registry entries stuck on 'running' forever
- Append diagnostic marker to output file on stream error so truncation
  is visible without debug logging
- Add queued-settle test with exitCode:1 asserting 'Status: failed.' in
  llmContent

* fix(core): address PR #4102 review — align PTY/child_process guards, add flush timeout, diagnostic marker, and tests

- Widen PTY post-promote onExit + error listener guard from
  `if (postPromote?.onSettle)` to `if (postPromote)` to match
  child_process path — prevents unhandled error crash and listener
  leak for onData-only callers
- Add 10s flush timeout in onSettleWired so stalled streams don't
  leave registry entries stuck on 'running' indefinitely
- Append diagnostic marker to output file on stream error so
  truncation is visible without debug logging
- Remove model name references from code comments
- Add tests: PTY onData-only error/exit, flush timeout fallback,
  appendFileSync diagnostic marker, queued-settle with failed exit code

* fix(core): address PR #4102 review round 2 — listener cleanup, rename, constant hoist

- Fix expect.objectContaining({}) misused as runtime arg in 2 execute() call sites
- Add child_process post-promote stdout/stderr listener cleanup in firePostSettle
- Rename streamFailed → streamClosed to reflect its overloaded semantics
- Hoist FLUSH_TIMEOUT_MS to module-level PROMOTE_FLUSH_TIMEOUT_MS constant
- Fix dangling FLUSH_TIMEOUT_MS reference (was undefined at runtime)
- Add Windows note to streams pause/resume comment
- Document PTY onData dispose-before-settle as known limitation
2026-05-17 17:57:08 +08:00
tanzhenxin
cc32ef2ff9
test(perf): skip daemon baseline harness under sandbox (#4234)
The qwen-serve-baseline harness walks the daemon process tree using
host-side `pgrep -P`. Under the Docker/Podman sandbox the daemon's
`qwen --acp` child and its MCP grandchildren run inside the container's
PID namespace, which host `pgrep` cannot observe, so the MCP-grandchild
descendant walk always sees zero and times out. The test passes in the
no-sandbox job but failed every retry in the Docker release job.

Extend the existing Windows `SKIP` gate to also skip when sandbox is
enabled, matching the precedent in acp-integration.test.ts and
cron-tools.test.ts.

Refs #4205
2026-05-17 17:52:34 +08:00
ChiGao
c25e22b575
feat(serve): add session-scoped permission route (#4232)
Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com>
2026-05-17 17:48:30 +08:00
ChiGao
4d9cbe49c0
feat(serve): add daemon-stamped client identity (#4231)
* feat(serve): add daemon-stamped client identity

* fix(serve): harden daemon client identity handling

---------

Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com>
2026-05-17 16:19:30 +08:00
kkhomej33-netizen
605e5eea16
fix(cli): include skill base dir in slash commands (#4224) 2026-05-17 15:52:29 +08:00
kkhomej33-netizen
d6914bdfd6
fix(core): align shell tool description with configured shell (#4170) 2026-05-17 15:36:59 +08:00
ChiGao
b90a2c91c9
feat(sdk): harden daemon session client (#4225)
Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com>
2026-05-17 15:05:37 +08:00
jinye
2453b82add
[codex] Add daemon session load/resume (#4222)
* feat(serve): add daemon session load resume

Adds HTTP and SDK support for restoring persisted daemon sessions through load/resume routes, including replay buffering for load and guarded concurrent restore handling.

Refs #4175

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(serve): address review feedback on daemon session load/resume

- Gate `defaultEntry` claim in `restoreSession` on
  `defaultSessionScope === 'single'`, mirroring `doSpawn`. Without the
  gate, a restored session silently became the omitted-scope attach
  target on `'thread'`-default daemons.
- Rename advertised capability `session_resume` to
  `unstable_session_resume` to match the underlying ACP method
  (`connection.unstable_resumeSession`). `session_load` stays stable.
- Seed `lastEventId: 0` in `DaemonSessionClient.resume`, symmetric with
  `load`. The agent's `unstable_resumeSession` schedules an
  `available_commands_update` via `setTimeout(0)`; without the seed the
  SDK consumer would miss that frame.
- Add HTTP-level test for the `RestoreInProgressError → 409` envelope.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* docs(serve): adopt review feedback comments on session load/resume

- Cross-reference the `POST /session` disconnect-cleanup rationale
  from `restoreSessionHandler`'s `!res.writable` branch so future
  maintainers find the BQ9tV race + tanzhenxin attach-rollback
  context without grep.
- Document `DaemonSessionState.{models, modes, configOptions}` in
  the SDK so callers can narrow to the ACP `SessionModelState` /
  `SessionModeState` / `SessionConfigOption` shapes.
- Add JSDoc on `DaemonClient.restoreSession` explaining why
  `loadSession` and `resumeSession` collapse into one transport.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve): preserve restore state and harden in-flight restore races

Address the four Critical findings from PR #4222 review (wenshao):

- Coalesced restore waiters now observe the same ACP state the
  original restore caller did. `state: {}` in `restoreSession`'s
  coalesce branch was clobbering the spread `restored.state`, so
  concurrent callers got different payloads based purely on timing.
  Cache the load/resume response on `SessionEntry.restoreState` and
  return it from both the existing-byId early return and the
  coalesce branch.
- Drop the `defaultEntry` promotion on restore. Explicit
  `session/load` / `session/resume` is "give me THIS id"; it must
  not become the implicit attach target for subsequent omitted-id
  `POST /session` callers under `single` scope. Reserves
  `defaultEntry` for sessions created through `doSpawn` only.
- Reserve coalesced attaches synchronously via
  `InFlightRestore.coalesceState.count` so the spawn owner's
  `requireZeroAttaches` disconnect-reaper sees a non-zero
  `attachCount` on the freshly registered entry and skips the
  kill. Without this, B's `attachCount++` happened after `await
  inFlight.promise`, leaving a window where A's HTTP-disconnect
  cleanup could reap the session out from under B.
- Include `pendingRestoreIds` in the `killSession` channel-teardown
  decision. The last live session leaving while a restore is
  in-flight on the same channel would otherwise SIGTERM the
  channel mid-restore.
- Bump `RestoreInProgressError`'s `Retry-After` from 1s to 5s
  (matches `SessionLimitExceededError`); under the default
  `initTimeoutMs` of 10s, 1s pushed clients into tight loops.

Tests: new bridge cases covering state propagation through
coalesce, the spawn-owner-disconnect race, the
pendingRestoreIds-aware channel teardown, and the no-promote-
on-restore invariant. Existing "attaches twice" test rewritten
to assert the cached restore state propagates.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* test(serve): cover acpAgent load/resume + restore route error mappings

Close the test-coverage gaps wenshao called out in PR #4222 review:

- acpAgent.test.ts gains a `QwenAgent loadSession /
  unstable_resumeSession` block that locks down the new contract
  end-to-end at the agent layer:
  * `loadSession` missing persisted session → throws
    `RequestError.resourceNotFound("session:<id>")` (code -32002
    + `data.uri`).
  * `loadSession` existing session → returns LoadSessionResponse
    AND triggers `session.replayHistory(messages)` so SSE
    subscribers see the persisted turns.
  * `unstable_resumeSession` missing session → same
    resourceNotFound contract.
  * `unstable_resumeSession` existing session → returns the
    response WITHOUT replaying history (resume restores model
    context internally; UI replay is intentionally suppressed).
  Required extending the mocked `RequestError` with
  `resourceNotFound`, and mocking `SessionService` per case.
- server.test.ts adds the missing restore-route wire mappings:
  `WorkspaceMismatchError → 400 workspace_mismatch` and
  `SessionLimitExceededError → 503 + Retry-After: 5`. Combined with
  the existing 409 case for `RestoreInProgressError`, the route
  layer now has full structured-error coverage.
- Updated the 409 test's `Retry-After` expectation from `1` to `5`
  to match the bumped retry hint.

Disconnect-cleanup tests for the restore route were intentionally
not added — the cleanup branch is line-for-line identical to
`POST /session`'s handler (which itself ships without route-level
disconnect tests due to flaky supertest + Node http close-event
timing).

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* docs(serve): document daemon session load/resume routes

Sync the docs to the routes that landed via PR #4222:

- `docs/developers/qwen-serve-protocol.md`:
  * Add `session_load` and `unstable_session_resume` to the
    advertised features list, with a note on the `unstable_`
    prefix mirroring ACP's underlying method name.
  * Document `POST /session/:id/load` and `POST /session/:id/resume`
    — request body, response shape (including the cached `state`
    field that late attachers observe), and the full error
    envelope: 404 unknown id, 400 workspace_mismatch, 503
    session_limit_exceeded (counts in-flight restores), 409
    restore_in_progress (cross-action race).
  * Note the SSE replay ring bound (4000 frames default) and the
    "subscribe immediately after load" guidance for long histories.
- `docs/users/qwen-serve.md`:
  * Add a "Loading and resuming a persisted session" section with
    the SDK example (`DaemonSessionClient.load` /
    `DaemonSessionClient.resume`) and the load-vs-resume
    decision table.
  * Update the durability model — sessions are still ephemeral
    across daemon restarts in Stage 1, but persisted sessions on
    disk can now be reloaded.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(test): use _meta payload to satisfy ACP SessionConfigOption types

The two new state-propagation tests in `httpAcpBridge.test.ts` used
`{ id, name, value }` as a `SessionConfigOption`, but ACP's actual
`SessionConfigSelect` shape requires `currentValue` + `options`. vitest
runs through esbuild and skips strict typechecking, so the local
`vitest run` passed; CI's `tsc --build` (run during `npm run prepare`)
caught it.

Switch the fixture to `_meta: { tag: '...' }` instead — `_meta` is
typed as `Record<string, unknown> | null` on the ACP response shapes,
so any payload survives. The assertions only need the bridge to
forward the state object intact, which `_meta` proves equally well
without committing the test to the full SessionConfigOption union.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(serve): symmetric restore coalesce guard + transportClosed leak + defensive cleanup

Address the two new Critical findings + the test/cosmetic gaps from
wenshao's second review pass on PR #4222 (`a3f38da3a`):

- **[Critical] Symmetric coalesce guard.** The previous guard only
  rejected `load`-on-`resume`; `resume` arriving while a `load` was
  in flight silently coalesced and inherited the load's history-
  replay frames over SSE — directly violating resume's "no UI
  replay" contract (made worse by `DaemonSessionClient.resume()`
  seeding `lastEventId: 0`). Tighten the guard to
  `action !== inFlight.action` so any cross-action race throws
  `RestoreInProgressError`. Same-action coalescing is unaffected.

- **[Critical] `transportClosed` dangling rejection.** When
  `withTimeout` wins the `Promise.race` against `channel.exited`,
  the `.then(throw)` chain on `channel.exited` stays pending. A
  later channel exit (next session boundary, daemon shutdown, agent
  crash) fires the `throw` with no observer attached — Node 22 logs
  `unhandledRejection`, and `--unhandled-rejections=throw`
  deployments crash the daemon. Add `transportClosed.catch(() => {})`
  to suppress the dangling rejection after the race settles.

- **`isAcpSessionResourceNotFound` exact-match fallback.** The
  message-fallback path used `message.includes(expectedUri)`, which
  would falsely match a sessionId of `"a"` against a message
  containing `"session:abc"`. Tighten to exact equality on the
  canonical `Resource not found: <uri>` form. The primary
  `data.uri` path remains the dominant code path.

- **`loadSession` mcpServers default symmetry.** `loadSession` now
  uses `params.mcpServers ?? []` to mirror `unstable_resumeSession`.
  Defends against a future ACP schema loosening that makes
  `LoadSessionRequest.mcpServers` optional — without the
  null-coalesce, `newSessionConfig` would `TypeError` on iteration.

Tests added:
- `httpAcpBridge.test.ts`: `resume`-on-`load` rejection (mirror of
  the existing `load`-on-`resume` test); regression for the
  dangling `unhandledRejection` (resolves `channel.exited` after
  the restore promise has already settled and asserts no
  `unhandledRejection` event); shutdown-awaits-restore via
  `Promise.race`-based ordering.
- `server.test.ts`: 400 for non-string and over-length `cwd` on
  the restore routes (mirroring the equivalent `POST /session`
  cases for `parseOptionalWorkspaceCwd`).
- `acpAgent.test.ts`: load with `getResumedSessionData()` returning
  `undefined` — distinct code path that does NOT call
  `replayHistory`.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

---------

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-17 12:58:47 +08:00