qwen-code

mirror of https://github.com/QwenLM/qwen-code.git synced 2026-05-24 05:26:21 +00:00

Author	SHA1	Message	Date
pomelo-nwu	f4e01a409e	fix(auth): address PR #4287 review (critical + suggestion) vscode AuthMessageHandler (Critical): - Add the missing protocol-selection step so custom-provider users can pick Anthropic/Gemini instead of being silently locked to OpenAI. - Validate free-form base URL with the same /^https?:\/\// check the CLI uses; reject file:/javascript: schemes. vscode AuthMessageHandler (Suggestion): - Stop filtering separator entries from the provider QuickPick so groups (Alibaba Cloud / Third Party / Custom) actually show as headers instead of a flat list. - Treat a null authInteractiveHandler as an error: surface an authError + cancellation notification instead of silently dropping the user's input. - Call notifyAuthCancelled when validateApiKey rejects so the webview state resets and the user can retry. core/providers/presets/openrouter.ts (Critical): - Replace the substring includes() in ownsModel with a URL-hostname match so paths like https://api.example.com/openrouter.ai/v1 stop being misidentified as OpenRouter models (and getting removed on re-install). vscode/services/settingsWriter.ts (Critical): - stripTrailingCommas() so JSONC files with trailing commas (VSCode's default style) parse instead of silently returning {} and then overwriting the entire settings file. - readSettings() distinguishes ENOENT (return {}) from parse errors (log + rethrow) so a malformed file never gets clobbered. - writeSettings() writes through a temp file + fs.renameSync atomic rename, eliminating the half-written file window on EACCES / disk-full / crash. - setValue() refuses to overwrite a scalar at an intermediate path segment (would have silently destroyed e.g. {"env": "legacy-string"}). core/providers/install.ts (Suggestion): - Move settings.backup?.() inside the try block so a backup failure still triggers the env-rollback path in catch. cli/config/loadedSettingsAdapter.ts (Suggestion): - Add the same UNSAFE_KEY_PARTS guard the vscode adapter has, so __proto__/constructor/prototype segments are rejected before reaching the underlying setNestedPropertySafe walker. Defense in depth: not exploitable today but the utility has no built-in guard. vscode/webview/providers/WebViewProvider.ts (Suggestion): - Hoist buildInstallPlan / applyProviderInstallPlanToFile to static imports (both modules already top-level imported); drops two per-call await import() round-trips. cli/utils/doctorChecks.ts (Suggestion): - Whitespace nit before the comma in the qwen-code-core import. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-19 13:58:25 +08:00
pomelo-nwu	e5191b885a	fix(vscode): await applyProviderInstallPlanToFile and grow test coverage Critical: applyProviderInstallPlanToFile fired the install plan with `void`, so any rejection (EACCES from persist(), prototype-pollution guard throw, etc.) was silently swallowed and WebViewProvider proceeded to disconnect/reconnect the agent as if the write had succeeded. Make the wrapper `async` and `await` it in the only caller. Tests added: - core/install.test: isSameModelIdentity fallback path (prepend-and-remove-owned with no ownsModel) — verifies models are matched on id+baseUrl, not just id. - vscode/AuthMessageHandler.test: happy-path with a fixed-baseUrl third-party provider, validateApiKey error branch, and BaseUrlOption picker presentation. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-19 10:21:31 +08:00
pomelo-nwu	18b35b48ec	i18n(cli): translate "Connect an LLM provider" in all locales Strict-parity locales (zh, zh-TW) require every built-in command description to be translated; the renamed /auth description was falling back to English and breaking the must-translate test. Add translations for zh / zh-TW (required) and refresh the other seven locales (en, ru, de, ja, fr, ca, pt) so the old "Configure authentication information for login" key is removed everywhere rather than left as a dangling dictionary entry. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-18 22:36:09 +08:00
pomelo-nwu	1b0b7f93a8	refactor(cli): rename /auth description to "Connect an LLM provider" The old description ("Configure authentication information for login") implied a Qwen-account login. After the /auth refactor it's really about picking an LLM provider and entering credentials, so the menu entry should say that. Also add 'connect' as an alt-name alongside the existing 'login' so users can type /connect when 'auth' feels wrong. Keep 'login' for muscle-memory compatibility. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-18 22:12:36 +08:00
pomelo-nwu	ed08f7296b	fix(auth): show base URL default as placeholder, not prefilled value In Custom Provider Step 2/6 (and on protocol switch), the base URL input started with the protocol's default URL pre-filled. Users who wanted a non-default endpoint had to manually clear the field first. Switch to placeholder semantics: the input starts empty, the default URL is shown as a hint, and submitting blank falls back to that default (then writes it back to baseUrl so downstream steps see a real value). Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-18 21:56:28 +08:00
pomelo-nwu	0d8fe738f5	fix(auth): default Audio modality to off in provider advanced config In the /auth Custom Provider advanced-config step, "Enable modality" should default to Image + Video only. Audio was on by default, which implied the model accepts audio input even though most providers people configure here don't. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-18 21:49:31 +08:00
pomelo-nwu	c7f016271c	fix(vscode): guard ProviderSettingsAdapter against prototype pollution The dotted-key writer in createFileSettingsAdapter walked through any segment, including __proto__/constructor/prototype, which would let a malicious or malformed ProviderInstallPlan reach Object.prototype. Refuse to write paths containing reserved segments and use hasOwnProperty when traversing intermediate objects so that inherited properties cannot redirect the walk. Addresses CodeQL alert #226 surfaced on PR #4287. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-18 21:19:53 +08:00
pomelo-nwu	efcfce5b6b	Merge remote-tracking branch 'origin/main' into refactor/unify-provider-config-to-core # Conflicts: # packages/cli/src/ui/components/ManageModelsDialog.test.tsx # packages/cli/src/ui/components/ManageModelsDialog.tsx # packages/cli/src/utils/apiPreconnect.ts # packages/core/src/providers/all-providers.ts	2026-05-18 20:49:31 +08:00
pomelo-nwu	b4aa4b4f38	refactor(providers): prefix Alibaba plan presets with alibaba- Rename coding-plan.{ts,test.ts} → alibaba-coding-plan.{ts,test.ts} and token-plan.{ts,test.ts} → alibaba-token-plan.{ts,test.ts} so the file names line up with the existing alibaba-standard preset and make it obvious at a glance which presets belong to Alibaba ModelStudio. Export names (codingPlanProvider, tokenPlanProvider, TOKEN_PLAN_, CODING_PLAN_) are unchanged — only the file paths and the two imports in all-providers.ts / index.ts move. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-18 18:48:41 +08:00
pomelo-nwu	9431966625	refactor(providers): drop unused authMethod field from ProviderConfig Every preset has had authMethod='input' since OpenRouter switched to the standard API-key flow, making the field a dead dimension. Removing it cleans up three never-taken branches and aligns the type with reality: connecting a provider always means entering an API key. - core: remove ProviderConfig.authMethod; shouldShowStep('apiKey') is now unconditionally true; drop authMethod from 9 presets - vscode AuthMessageHandler: drop the OAuth branch in handleAuthInteractive - vscode WebViewProvider: simplify the apiKey-required guard - tests: update provider-config.test and custom-provider.test If a future provider needs a browser-based flow, the field can be re-introduced; for now the smaller surface is worth more. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-18 18:45:17 +08:00
pomelo-nwu	6804a18f01	refactor(auth): unify applyProviderInstallPlan in core, drop cli/auth CLI and vscode now share core's applyProviderInstallPlan instead of keeping two parallel implementations. The CLI-only env rollback (snapshot process.env, restore on error) is folded into the core version so vscode also benefits from it. CLI ships a LoadedSettingsAdapter that maps LoadedSettings to core's ProviderSettingsAdapter contract. Backup/restore is layered: write a .orig file, structuredClone settings + originalSettings, then recomputeMerged() on restore — same guarantees as before, just routed through the adapter. Tests for the install logic are migrated to core and rewritten against the adapter mock (more focused than the previous LoadedSettings/Config mocks). packages/cli/src/auth/ is gone entirely. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-18 18:14:12 +08:00
pomelo-nwu	423e453069	refactor(cli): drop OpenRouter OAuth + /manage-models, simplify /auth OpenRouter now uses the standard API-key flow under "Third-party Providers" (issue #4108). The whole OpenRouter OAuth implementation (PKCE, callback server, model auto-install) and the /manage-models command (only OpenRouter was wired in; /auth Step 2 already covers model selection) are removed. /auth is renamed around the "Connect a Provider" mental model: - Dialog title is now "Connect a Provider"; the OAuth main entry is gone - handleAuthSelect (mixed close + auth trigger) is split into a single-purpose closeAuthDialog; legacy wrappers (handleSubscriptionPlanSubmit, handleApiKeyProviderSubmit, handleCustomApiKeySubmit, ...) are dropped in favor of the unified handleProviderSubmit Core: openRouterProvider switches to authMethod='input', uiGroup='third-party', ships with two recommended free models, and is reordered to the end of the third-party list to keep DeepSeek as the default highlight. Net diff: 34 files, +124 / -3835. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-18 17:54:40 +08:00
jinye	33b2e0dccc	fix(serve): normalize Windows path separators in workspace file read responses (#4279 ) `workspaceRelative` returned the platform-native separator from `path.relative`, leaking backslashes into `/file`, `/stat`, `/list`, and `/glob` response paths on Windows. Surfaced as a Windows-only CI failure in the `GET /glob > scopes glob matches to cwd` test (`['sub\\inside.ts']` vs expected `['sub/inside.ts']`). Always emit POSIX-style separators so SDK consumers see the same shape across platforms. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)	2026-05-18 17:53:14 +08:00
jifeng	b208f34455	feat(serve): add /demo debug page for qwen serve daemon (#4132 ) * feat(serve): add /demo debug page for qwen serve daemon Add a self-contained HTML debug page at GET /demo that provides a browser-based UI for exercising all daemon routes: session create/attach, prompt send/cancel, SSE event streaming, model switching, permission voting, and health/capabilities checks. Also add a same-origin exemption middleware (before the CORS deny layer) so browser fetch calls from the demo page pass through while external Origins remain blocked. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(serve): address CR feedback for /demo page security and robustness - Fix XSS: build permission buttons with DOM APIs instead of innerHTML - Fix SSE: move currentEvent outside read loop for cross-chunk frames - Fix SSE: handle stream end (flush trailing buffer, update UI status) - Security: move /demo route behind hostAllowlist and bearerAuth guards - Security: add host.docker.internal to same-origin Origin allowlist - Add Auth Token input and include Authorization header in API/SSE calls - Add try/catch to /demo route handler with writeStderrLine logging - Check API result before removing permission card from UI - Add 7 tests for /demo route and Origin-stripping middleware * fix(serve): move /demo before bearerAuth so browsers can reach it Browsers cannot attach Authorization headers on address-bar navigation, so /demo behind bearerAuth was unreachable when --token was set. Move the /demo route after CORS + Host allowlist but before bearerAuth. The static HTML shell contains no secrets; all API/SSE routes remain bearer-protected and the in-page token input authenticates them. * feat(serve): show 401 token hint on demo page When an API call returns 401 Unauthorized, highlight the Auth Token input field with a yellow border and display a hint message guiding the user to enter their bearer token. Applies to both API calls and SSE connections. The hint auto-dismisses after 6 seconds. * fix(serve): address round-2 CR feedback for /demo security - Loopback-gate /demo like /health: pre-auth on loopback, post-auth on non-loopback. Prevents unauthenticated access on public interfaces. - Add X-Frame-Options: DENY + CSP frame-ancestors to /demo response to prevent clickjacking via cross-origin iframe embedding. - Cache selfOrigins Set in Origin-stripping middleware (rebuild only when port changes) instead of allocating per-request. - Clear pendingPerms + reset currentAssistantBubble on session switch to prevent stale permission cards from the previous session. - Update tests: loopback vs non-loopback /demo auth, anti-clickjacking headers, rename CORS test, add localhost and [::1] origin coverage. * fix(serve/demo): address CR round-2 feedback for demo page UX - Replace innerHTML with DOM APIs in addLog() to prevent XSS - Add MAX_LOG_ENTRIES=500 pruning to prevent unbounded DOM growth - Add concurrent-send guard (promptInFlight) to prevent double submits - Show error feedback in Chat tab when sendPrompt or createSession fails - Disable prompt controls on SSE failure (both catch and non-OK paths) - Add toolCall context detail (command/input/path/diff) to permission cards --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-05-18 16:31:03 +08:00
jinye	52d2850c7f	feat(serve): safe workspace file read routes (#4175 PR 19) (#4269 ) * refactor(serve/fs): glob audit hashes workspace + emits pattern Closes PR #4250 follow-up #4. Hashing the per-call cwd for glob audit produced a different pathHash for every subdirectory glob without giving operators any actionable difference (raw paths are privacy-gated). Replace the hash basis with the bound workspace itself and surface the literal pattern on a new schema field, so every glob row carries a stable workspace marker and a per-call pattern. The pattern field also fires on parse_error denials (path-escape patterns, non-relative patterns) so audit consumers debugging a production glob rejection can see the exact rejected pattern without needing QWEN_AUDIT_RAW_PATHS=1. * feat(serve): safe workspace file read routes (#4175 PR 19) Add four read-only HTTP routes that consume PR 18's per-request WorkspaceFileSystem boundary: - GET /file?path=... text content + meta (encoding/BOM/lineEnding) - GET /list?path=... directory entries (name/kind/ignored) - GET /glob?pattern=... workspace-relative match paths - GET /stat?path=... file/directory metadata The routes share one error envelope (sendFsError) that maps FsError.kind through the boundary's existing DEFAULT_STATUS_BY_KIND table to a typed JSON response. All four 200 responses set Cache-Control: no-store and X-Content-Type-Options: nosniff so a browser-adjacent client cannot cache or sniff source content. Routes are advertised under a single workspace_file_read capability tag — the four endpoints share the same backing boundary and the same failure shape, so per-route tags would force four simultaneous registry entries with no operator-meaningful difference between them. Mutating routes will ship in PR 20 under their own workspace_file_write tag. Trust gate is unchanged: read intents pass on untrusted workspaces per PR 18's policy.ts. Auth follows the global bearer flow only; read routes never run mutate(), since none of them mutate state. * feat(serve): runQwenServe injects fsFactory + emit pipeline Closes PR #4250 follow-up #2. runQwenServe now constructs a WorkspaceFileSystem factory from the bound workspace, threads its emit hook through to the read routes, and exposes the trust snapshot via deps.trustedWorkspace. Test additions pin the wiring contract: - audit events emitted on success / denial flow back through the test-supplied fsAuditEmit hook - deps.fsFactory override is honored (built-in default does not silently shadow injection) - trust snapshot defaults to true (operator-chosen workspace) - trust=false routes through to the boundary and trips untrusted_workspace on write intents Default emit stays a stderr warning so a wiring regression that drops events remains visible. PR 21's SSE fan-out will replace the default with a workspace-scoped event channel. * fixup(serve): address PR #4269 round-1 review feedback Closes 8 findings from Copilot inline review + Codex review on PR #4269 (5 P0, 3 P1): P0 (correctness / privacy / operations) - runQwenServe.ts: throttle the default fsAuditEmit by reusing the exported `createDefaultFsAuditEmit` from server.ts. The earlier per-event `writeStderrLine` would print one line for every /file/list/glob/stat audit event under normal traffic. Now warns once + every 100th drop with payload context, so a wiring regression is still visible without flooding logs. (Copilot runQwenServe.ts:316; Codex runQwenServe.ts:305) - routes/workspaceFileRead.ts: probe glob with maxResults+1 and trim, so `truncated` reflects whether the boundary actually had more matches. Earlier `length === maxResults` heuristic false-positived when the workspace happened to hold exactly N matches. (Copilot workspaceFileRead.ts:399) - routes/workspaceFileRead.ts: glob `relMatches` now flows through the shared `workspaceRelative` helper. Root match (`pattern=.`) renders as "." rather than the empty string `path.relative` returns; helper also covers the boundWorkspace-undefined edge case so the route no longer carries its own fallback branch. (Copilot workspaceFileRead.ts:388; review summary HIGH-1) - fs/audit.ts: `pattern` field now rides on the same privacy gate as `relPath` / `message`. Glob patterns commonly carry workspace-relative or absolute path fragments (`src/secrets/.env`, rejected `/Users/alice/ws/`), so emitting them in privacy mode bypassed the same redaction the other path-bearing fields honor. Operators wanting full forensic context opt in via QWEN_AUDIT_RAW_PATHS=1. (Codex audit.ts:249) - routes/workspaceFileRead.ts: cwd resolves with intent='list' rather than 'glob'. The orchestrator's `recordAndWrap` auto-derives `data.pattern` from `intent === 'glob'`, which turned cwd-resolution failures into rows where the cwd string masqueraded as the glob pattern (`?cwd=../outside` → `pattern: ../outside` in audit). Switching to 'list' is the correct semantic shape (cwd is a directory we intend to walk) with identical trust + path-resolution behavior. (Codex workspaceFileSystem.ts:941) P1 (cosmetic / comment accuracy) - server.test.ts: `honors deps.fsFactory override` test comment rewritten to match the actual failure mode (a regression would 404 on a.txt, not 200 against package.json). (Copilot server.test.ts:3219) - routes/workspaceFileRead.ts: `limit` error message uses the MAX_LIST_ENTRIES constant instead of the literal 2000. (review summary MEDIUM) - fs/audit.ts: expanded the JSDoc explaining why the AuditPublisher request types Omit four fields and pass `pattern` through. (review summary MEDIUM) Test additions / adjustments - audit.test.ts: split the existing pattern tests into raw-paths and privacy-default cases; added two new privacy-mode assertions that strip pattern under default config. - workspaceFileSystem.test.ts: harness accepts `includeRawPaths`; glob audit suite runs with raw paths to observe `pattern`; new `glob audit privacy default` suite asserts pattern + relPath are stripped without the env opt-in. - workspaceFileRead.test.ts: new GET /glob cases for the truncated edge case (count == maxResults → false; count > maxResults → true) and root-match normalization. Not adopted (with rationale) - review summary HIGH-2 (glob pathHash uses boundWorkspace): this is the deliberate follow-up #4 contract from PR 18; pattern is the per-call signal, pathHash is the workspace marker. - review summary MEDIUM-1 (parseIntInRange three-state return): matches `parseMaxQueuedQuery` in server.ts; consistency wins. - review summary LOW-1/2/3 (capabilities comment length, CSP header, reverse truncated:false assertion): rationale already documented in code, CSP belongs in a hardening PR, the reverse assertion already exists. 518/518 serve tests pass; typecheck + eslint clean within src/serve/. fix(serve): address workspace file read review Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(serve): tighten workspace file read review followups Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> --------- Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-18 16:17:14 +08:00
易良	8f9b979ec0	test: reduce wait-dependent UI test delays (#3987 ) * test: reduce wait-dependent UI test delays * test: address UI test review feedback * test: address remaining UI test review feedback - Replace runPendingRetryTimer's polling loop with vi.advanceTimersToNextTimerAsync to decouple from connect()'s internal async structure. - Document that newSessionWithRetry's 300ms auth-delay timer is unreachable under the current mocks so future tests advance it via the same helper instead of restoring shouldAdvanceTime. - Stop swallowing runPendingRetryTimer errors behind connectPromise.catch(e => e); attach a noop catch only to mute the unhandled-rejection guard and assert directly on the original promise. - Comment why flush() yields twice (Ink 7 + React 19 split a render across two microtask ticks) so it is not "optimized" to a single yield. - Move the trailing unmount() into the finally block in the shell-history placeholder test so an assertion failure does not leak the Ink instance.	2026-05-18 15:34:34 +08:00
Shaojin Wen	0bd703bce6	fix(serve): add mcp_guardrails to E2E capabilities expectation (#4268 ) PR #4247 (`feat(serve): MCP client guardrails`) added the always-on `mcp_guardrails` capability to `SERVE_CAPABILITY_REGISTRY` and updated the unit-level `EXPECTED_STAGE1_FEATURES` in `packages/cli/src/serve/server.test.ts`, but missed the matching integration-test expectation. The E2E `qwen serve — capabilities envelope > advertises all baseline capabilities` assertion has been failing on `main` since #4247 landed. Append `'mcp_guardrails'` to the expected `caps.features` array, in the same position as `SERVE_CAPABILITY_REGISTRY` and the unit-level constant (after `session_metadata`, before the conditional `require_auth`). No production code changes.	2026-05-18 14:30:27 +08:00
jinye	103090669e	feat(serve): workspace memory and agents CRUD (#4175 Wave 4 PR 16) (#4249 ) * feat(serve): workspace memory and agents CRUD (#4175 Wave 4 PR 16) Adds the first Wave 4 mutation route surface: workspace-scoped memory and subagent CRUD over HTTP. Remote clients (TUI / channels / web / IDE adapters) can now list, read, create, update, and delete subagent definitions and read / append / replace QWEN.md without disturbing session state. Routes: - GET /workspace/memory (read-only snapshot) - POST /workspace/memory (append/replace, strict-gated) - GET /workspace/agents (list project + user + builtin) - POST /workspace/agents (create-only; 409 on collision) - GET /workspace/agents/:agentType (full detail incl. systemPrompt) - POST /workspace/agents/:agentType (update; 403 read-only on builtin) - DELETE /workspace/agents/:agentType (idempotent for SDK callers) Mutation paths use mutate({ strict: true }) from PR 15 so they refuse unauthenticated requests even on no-token loopback defaults. Workspace mutations validate X-Qwen-Client-Id against bridge.knownClientIds() and stamp originatorClientId on emitted events. Capability tags added: workspace_memory, workspace_agents. New typed events fanned out via bridge.publishWorkspaceEvent (best- effort to every active session bus; read-after-write is the contract): - memory_changed { scope, filePath, mode, bytesWritten } - agent_changed { change, name, level } writeContextFile.ts is the new core helper that resolves QWEN.md placement (workspace vs ~/.qwen) and append-vs-replace semantics. Whitespace-only appends short-circuit before fs.writeFile, so a no-op POST does not bump mtime or fan out a misleading event. SubagentManager is wrapped with a CRUD-scoped Config stub via Proxy: only getSdkMode / getProjectRoot / getActiveExtensions are stubbed (verified against subagent-manager.ts; getToolRegistry is execution- path only). Any future Config method touched on a CRUD path throws immediately so dependency creep is visible. Auto-memory CRUD, persistent audit log, and the EACCES → NOT_FOUND unlink mapping in core SubagentManager.deleteSubagent are explicit follow-ups (PR 16.5 / PR 24 / separate fix). Validation: - typecheck: cli + sdk-typescript clean - vitest: serve 348/348, writeContextFile 10/10, SDK 335/335 - eslint: clean * fix(serve): address Codex P2 review on PR 16 (#4175 Wave 4 PR 16 follow-up) Three correctness issues Codex flagged on the just-shipped workspace memory + agents CRUD surface: 1. Concurrent POST /workspace/memory append no longer loses writes. Two simultaneous appends would each read the same existing file, compose new content in JS memory, then race the fs.writeFile — the later write silently overwrote the earlier appended entry. Add a per-resolved-path Mutex map (mirroring jsonl-utils.ts's fileLocks pattern) and wrap the entire read-compose-write sequence in runExclusive. 2. GET /workspace/agents now reflects out-of-band file changes. SubagentManager.listSubagents() default served the in-memory cache; developer / IDE adapter edits to .qwen/agents/.md never appeared even though GET /workspace/agents/:agentType always reads disk. Pass { force: true } so the LIST route walks disk every call, matching the detail route's "filesystem is the source of truth" contract. 3. Reject builtin agent names on POST /workspace/agents to prevent undeleteable shadow files. A client could write a project-level agent named "general-purpose" — list/load resolved the shadow first, but SubagentManager.deleteSubagent's name-based builtin guard (subagent-manager.ts:302) rejected DELETE forever. Add a BuiltinAgentRegistry.isBuiltinAgent check in parseAgentConfig so the conflict surfaces at create time instead of trapping the file beyond the API. The check is case-insensitive, matching the resolver's case-insensitive cascade. New tests: - writeContextFile.test.ts: 10 parallel appends, all 10 entries must survive in the final file (would fail without the mutex). - workspaceAgents.test.ts: GET /workspace/agents observes a freshly-written agent file on the second call (force-refresh proof); POST with name="general-purpose" returns 422 + the case-insensitive variant "explore" too. Validation: - typecheck: cli + sdk-typescript clean - vitest: serve 351/351 (was 348, +3 new), writeContextFile 11/11 - eslint: clean fix(serve): apply round-1 review fold-in 2a (HIGH + CodeQL) on PR 16 Round-1 inline review (#4249) flagged ~28 items across Copilot, wenshao, and CodeQL. This commit lands the HIGH-severity correctness fixes plus the two CodeQL polynomial-regex warnings. Validation tighten — `parseAgentConfig` + `parseAgentUpdates`: - Trim leading/trailing whitespace on `name` before passing to SubagentManager. `" tester "` would otherwise create a frontmatter name with spaces that case-insensitive lookups can never find. - Fail-closed (422 invalid_config) on present-but-wrong-type optional scalars: `model`, `color`, `approvalMode`, `background`. Previously malformed values silently dropped through validation, masking client-serialization bugs. - Validate `approvalMode` against the `APPROVAL_MODES` enum on both create and update; an unknown value used to 201 with the field silently omitted from the saved file. - `runConfig` is now whitelist-sanitized to `{ max_time_minutes, max_turns }` only; unknown keys are dropped, malformed values return 422. Previously the whole input object was persisted verbatim into YAML frontmatter. - `?scope=` query is fail-closed for repeated values (`?scope=workspace&scope=global`) — Express parses these as arrays which the previous `typeof === 'string'` check silently treated as absent, broadening DELETE/UPDATE semantics from one level to both. - Empty update body returns 400 invalid_config (previously rewrote the file + emitted a misleading `agent_changed` event). - No-op updates (every supplied field already matches `existing`) return 200 + `changed: false` and SKIP the file rewrite + event fan-out. Memory write helper — `writeContextFile.ts`: - Move whitespace-only no-op detection BEFORE `fs.mkdir`. Without this, an empty POST still created the parent directory and bumped its mtime even though `changed: false` was reported. - Replace two polynomial regex patterns flagged by CodeQL (`/^\s+\|\s+$/g` and `/^\n+\|\n+$/g`) with hand-rolled `while` loops. Same pattern auth.ts:120-125 already uses for the same CodeQL rule. SDK — `DaemonClient.ts` + `types.ts`: - `DaemonWriteMemoryResult` gains optional `changed?: boolean` so typed callers can suppress redundant cache invalidation on no-op appends. Optional for forward-compat with daemons that predate the field — undefined treats as "changed: true" (legacy contract). - `deleteWorkspaceAgent` only swallows 404 when the body's `code` is `agent_not_found`. A bare 404 (older daemon, misrouted proxy, generic gateway page) now throws — previously the SDK silently reported success even when the request never reached a route that understands workspace agents. - `updateWorkspaceAgent` adds an optional `scope` parameter mirroring `deleteWorkspaceAgent`, so callers can target the user- level definition when a project-level agent shadows it. Validation: - typecheck: cli + sdk-typescript clean - vitest: serve 357/357 + writeContextFile 12/12 = 369/369 passing (was 362; +7 new) - eslint: clean Explicitly NOT applying (out of scope per issue #4175 PR 16 review-resolution policy): - Copilot's "strict gate after body parser" finding — already documented as PR 15 review-resolved tradeoff at auth.ts:256-269. * fix(serve): apply round-1 review fold-in 2b (MEDIUM + tests) on PR 16 MEDIUM hardening: - Fix the JSDoc on `collectWorkspaceMemoryStatus` to match the workspace-root-only discovery the implementation actually does today. The 32-iteration upward walk is reserved for a future hierarchical mode but breaks after iteration 1 in v1. - Lower the depth limit on `walkWorkspaceForMemory` from 32 → 12. Realistic project depth sits well below 8; 12 leaves headroom without amplifying blast radius from symlink cycles. - Daemon `Config` Proxy now defines a `has` trap symmetric to the existing `get` trap. Without it, a future SubagentManager path doing `'someMethod' in this.config` would silently get `false` and bypass the safety net the throw-on-unknown-property design installed. - Preflight `manager.loadSubagent(name, level)` before `manager.createSubagent`. The default-path collision check inside SubagentManager would otherwise miss same-frontmatter-name + different-filename collisions; the preflight makes 409 agent_already_exists deterministic. - Multi-level DELETE now emits one `agent_changed` event per level that actually had a file removed. Previously an unscoped DELETE removing both project and user shadows would publish only one event with one level — misleading subscribers using event metadata for toasts / audit / echo-suppression. Test additions (covers the new event types + bridge fan-out + SDK helpers): - `daemonEvents.test.ts`: predicate narrowing for `memory_changed` / `agent_changed` (rejects malformed scope/mode/level), reducer records `lastWorkspaceMutation` + `lastWorkspaceMutationType` with latest-event-wins semantics and stays non-terminal. - `httpAcpBridge.test.ts`: `publishWorkspaceEvent` fans out to every active session bus; `knownClientIds()` aggregates clientIds across sessions and the returned set is a snapshot (mutating it does not affect future calls). - `workspaceAgents.test.ts`: success-path test stamping `originatorClientId` on the create / update / delete events for a known client. - `DaemonClient.test.ts`: 7 round-trip tests for the new SDK helpers (workspaceMemory, writeWorkspaceMemory, listWorkspaceAgents, getWorkspaceAgent, createWorkspaceAgent, updateWorkspaceAgent with scope query, deleteWorkspaceAgent: 204 / structured 404 / bare 404 triage). - `writeContextFile.test.ts`: replace the 30ms-mtime test with a `vi.spyOn(fs, 'writeFile')` assertion that the no-op path never invokes writeFile. Deterministic on every filesystem. Validation: - typecheck: cli + sdk-typescript clean - vitest: serve 363/363 + writeContextFile 12/12 + SDK 347/347 - eslint: clean Reviewer guide: combined with fold-in 2a (commit `134c43c82`), PR 16's round-1 review feedback is closed except for the explicitly- deferred Copilot finding on "strict gate after body parser" (already documented as PR 15 review-resolved tradeoff at auth.ts:256-269). The DRY refactor wenshao suggested for `resolveOriginatorClientId` is left as a future sweep — it touches multiple Wave 4 routes and should land alongside PR 17/19/20/21 to keep the helper's shape informed by all consumers. * docs(serve): apply round-1 review fold-in 2c (doc/type tightening) on PR 16 Two doc-only fixes that close the last open Copilot threads on PR #4249 — both are JSDoc/tsdoc corrections where the wording promised broader behavior than the implementation actually delivers, so a maintainer or SDK consumer reading the type would form a wrong mental model. 1. `DaemonAgentLevel` (sdk-typescript) and `ServeAgentLevel` (cli serve) keep `'extension'` + `'session'` on the union for forward- compat but the JSDoc now explicitly says the daemon does NOT return either today. The `'extension'` case is gated by the daemon's stub `Config.getActiveExtensions()` returning `[]`; `'session'` is a runtime-only `SubagentManager` cache the CRUD routes don't read. Both arms stay so a future PR exposing either source is not a breaking SDK change. 2. `DaemonClient.workspaceMemory()` tsdoc no longer says "hierarchical" — v1 only discovers files at the bound workspace root + the global `~/.qwen` directory, no parent-directory walk. The 12-iteration upward-walk loop body inside `walkWorkspaceForMemory` is reserved for PR 16.5 hierarchical mode and breaks after iteration 1 today; the SDK doc now states that explicitly so callers don't expect more than they receive. No runtime change. Validation: - typecheck: cli + sdk-typescript clean - vitest: 363/363 serve + 12/12 writeContextFile + SDK unchanged - eslint: clean * fix(serve): apply round-2 review fold-in 2d on PR 16 wenshao round-2 (4 inline comments at 16:51-16:53Z): three real bugs + one performance-tradeoff doc note. 1. `composeAppendedContent` now inserts inside the MEMORY section, not at EOF. Previously a QWEN.md whose `## Qwen Added Memories` block was followed by another `## ...` heading would silently land each new entry past the next heading — moving entries into the wrong section. Walk the memory header forward, find the next `\n## ` heading, and insert just before it. Fall back to the EOF append when the memory section is the last block. 2. `parseAgentUpdates` now matches the create-side trim/empty rule for `description` (rejects whitespace-only) and ensures `systemPrompt` rejects the empty string. Update path used to silently accept `" "` and overwrite the field with blank content — divergent from create which 422s the same payload. 3. `isNoOpUpdate`'s runConfig comparison no longer false-positives on partial updates. Comparing every known runConfig field against `existing` treated absent keys as `undefined` while existing had real values — so `{max_time_minutes: 30}` against `{max_time_minutes: 30, max_turns: 10}` claimed non-no-op and re-emitted `agent_changed`. Fixed to only compare keys actually present in `updates.runConfig`, matching `mergeConfigurations` semantics (existing values preserved when not in updates). 4. JSDoc on the LIST-route `force: true` call now explains the tradeoff (no TTL cache / no fs.watch invalidation): re-introducing caching would re-introduce the stale-list bug Codex P2 #2 fixed, `fs.watch` is platform-fragile, and PR 24's audit/policy layer is the proper home for request rate limiting. Sub-millisecond cost per request on local SSD; revisit if profiling flags it. Tests: - writeContextFile.test.ts: section-boundary insertion + EOF fallback - workspaceAgents.test.ts: whitespace-only description rejected; partial runConfig no-op detection; partial runConfig real change preserves omitted keys via mergeConfigurations Validation: - typecheck: cli + sdk-typescript clean - vitest: 368/368 (was 363, +5 new) - eslint: clean * fix(serve): apply round-3 review fold-in 2e on PR 16 wenshao round-3 (5 inline [Suggestion]s, all real correctness or forward-compat issues; one item carried over from round-2): 1. `parseAgentConfig` rejects whitespace-only `systemPrompt` on create, matching the description field's `trim().length === 0` rule. Pure-whitespace prompts collapse to nothing on YAML serialization and the agent can't operate without instructions — 422 at the boundary is friendlier than the downstream "agent does nothing" failure. 2. `parseAgentUpdates` mirrors the same `trim()` check on the update path so `{systemPrompt: " "}` returns 422 rather than silently blanking the field. 3. `POST /workspace/memory` `file_error` 500 response now carries `scope`, `mode`, optional `osCode` (`EACCES`/`EROFS`/`ENOSPC`/...) and a redacted `errorMessage`. Previous shape was just `{error, code: 'file_error'}` — callers had nothing to branch on. 4. `composeAppendedContent` runs `fs.stat` before `fs.readFile` and refuses with a typed `WorkspaceMemoryFileTooLargeError` when the existing file exceeds 16 MB. Without this cap a pathological QWEN.md would be loaded into the daemon heap on every append. The route maps the typed error to a 413 with `code: 'memory_file_too_large'` plus `bytes` / `limit` so callers can decide whether to trim or switch to mode=replace. 5. `toDetail` no longer spreads `config.runConfig` with a cast. Explicit field-by-field pick of `max_time_minutes` / `max_turns` ensures any future `SubagentConfig.runConfig` field requires a deliberate route-schema update rather than silently leaking through the HTTP API. Tests: - workspaceAgents.test.ts: whitespace-only systemPrompt rejected on create AND update; toDetail.runConfig only emits whitelisted keys - existing tests still cover the description-side trim and the partial runConfig no-op detection from fold-in 2d Validation: - typecheck: cli + sdk-typescript clean - vitest: 371/371 (was 368, +3 new) - eslint: clean Reviewer note: response shape on 500 file_error is additive (`scope`/`mode`/`osCode`/`errorMessage` are new fields), so SDK callers that only consumed `{error, code}` keep working. The new 413 `memory_file_too_large` is a new error code SDK consumers can branch on but that pre-PR-16 daemons never emitted, so adding it is also additive. * fix(sdk): expose `changed` on DaemonAgentMutationResult (PR 16 round-4) wenshao round-4 review (single inline at types.ts:434): the agent update route emits `changed: true` for real updates and `changed: false` for no-op short-circuits (introduced in fold-in 2a alongside the no-op detection), but `DaemonAgentMutationResult` in the SDK type still only exposed `{ ok, agent }`. Typed callers of `updateWorkspaceAgent()` couldn't observe the no-op signal even though `DaemonClient` already returns the raw JSON at runtime. Add optional `changed?: boolean` matching the shape introduced for `DaemonWriteMemoryResult.changed` in fold-in 2a. Optional for forward-compat with daemons that predate the field; SDK consumers should treat `undefined` as `true` (the legacy contract — every successful create / update was a write before fold-in 2a's no-op short-circuit landed). Test: - `DaemonClient.test.ts`: round-trip asserts the typed result surfaces `changed: false` from the wire payload. Validation: - typecheck: cli + sdk-typescript clean - vitest: 82/82 in DaemonClient.test.ts (was 81; +1 new) - eslint: clean * fix(serve): apply round-6 review fold-in 2g on PR 16 Round-6 review (gpt-5.5 [Critical] + 5 wenshao [Suggestion]s). [Critical] Per-level delete verification (workspaceAgents.ts): - gpt-5.5 flagged that `SubagentManager.deleteSubagent` swallows per-level `fs.unlink()` failures (subagent-manager.ts:332-336) and returns success as long as ANY level was removed. Trusting that signal would let the route publish `agent_changed`/`deleted` for a file still on disk under EACCES/EBUSY/EPERM — the client UI would drop a still-active definition from cache. - Route now runs `fs.access` on each pre-checked level's file path AFTER `manager.deleteSubagent` returns and partitions into `removed` / `remaining`. Events are emitted ONLY for confirmed removals; if any level still has its file, the route returns 500 `agent_delete_partial` with `removedLevels` + `remainingLevels` so callers can act precisely. - New test installs a 0o555 chmod on the user-level agents directory so `fs.unlink` raises EACCES while the project-level unlink succeeds, asserting both the 500 response and that exactly one `agent_changed` event fired for the level that actually went away. Concurrency consistency (writeContextFile.ts): - Whitespace-only no-op detection now happens INSIDE the per-file mutex's `runExclusive` block. The pre-fix layout did the short-circuit `fs.stat` outside the lock; under concurrent POSTs (one whitespace-only, one with real content) the no-op's `bytesWritten` could lag the post-write reality. Functional behavior was already correct; this aligns the snapshot with the post-write state. Defense-in-depth + DRY (workspaceAgents.ts): - `validateAgentType(req, res)` regex-validates `:agentType` URL parameter at the route boundary against the same `^[\\p{L}\\p{N}_-]+$/u` pattern as `SubagentValidator.validateName`, with a 64-char cap. `findSubagentByNameAtLevel`'s readdir scan already prevented path traversal, but failing fast at the boundary keeps surprising inputs out of downstream code paths. Two new tests cover `..%2Fetc%2Fpasswd` and over-long names. - `parseScopeQuery(req, res)` extracts the duplicated `?scope=` query parser from the POST update + DELETE handlers. Same fail-closed semantics on repeated/non-string values. - `assertMutableLevel(found, agentType, res)` extracts the duplicated `isBuiltin \|\| level === 'builtin' \|\| 'extension' \|\| 'session'` 403 guard. Future Wave 4 mutation routes (PR 17 / 19 / 20) call this helper instead of re-implementing the predicate. Client-id helper consistency (workspaceMemory.ts): - `resolveWorkspaceClientId` removed; the inline branch in the POST handler now mirrors `workspaceAgents.ts:resolveOriginatorClientId` (validate against `bridge.knownClientIds()`, send 400 directly, return so the caller short-circuits). Previously this file threw `InvalidClientIdError` and caught it locally — wenshao round-6 flagged the throw-vs-direct-400 inconsistency between the two files. The deeper full-extraction DRY refactor remains deferred to the cross-Wave-4 sweep with PR 17/19/20/21. Won't-fix doc note (workspaceMemory.ts): - Mount-point JSDoc now explicitly explains why the route returns absolute on-disk paths (success / 413 / GET list): clients pre-flight `caps.workspaceCwd` to learn the bound workspace and can compute relative paths if they want; the global scope's `~/.qwen/QWEN.md` is NOT under the workspace root, so a workspace-relative form would lose information. Path redaction for multi-tenant deployments belongs to PR 24's `--redact-errors` policy work, not a per-route default flip in PR 16. Validation: - typecheck: cli + sdk-typescript clean - vitest: 374/374 (was 371, +3 new) - eslint: clean * fix(serve): apply round-7 review fold-in 2h on PR 16 glm-5.1 round-7: 2 [Critical] + 5 [Suggestion] inline comments. Five applied as code changes; one is a stale-snapshot false positive (workspaceMemory.ts no longer has the InvalidClientIdError call site glm-5.1 referenced — fold-in 2g already replaced it with inline 400); one is rationale-replied (INVALID_CONFIG → 422 mapping suggestion is based on incorrect premise about manager semantics). [Critical] Code-fence-aware section-boundary detection (writeContextFile.ts): - The naive `\n## ` indexOf scan would split user-authored memory entries that quote markdown documentation containing `##` headings inside fenced code blocks. New `findNextTopLevelHeading` helper tracks fence state line-by-line and only accepts matches outside fences. Two new tests: (a) entry containing a fenced `## Request Body` keeps its body intact; (b) real `## post` heading outside fences still acts as the section boundary. [Suggestion] errorMessage + filePath gating (workspaceMemory.ts): - 500 `file_error` and 413 `memory_file_too_large` responses now omit `errorMessage` and `filePath` unless `QWEN_SERVE_DEBUG` is set. Default response carries `error / code / scope / mode / osCode` — enough for SDK callers to branch without leaking absolute filesystem paths. New test asserts both modes round-trip the right shape. [Suggestion] publishWorkspaceEvent visibility (httpAcpBridge.ts): - Catch block now writes to stderr unconditionally during normal operation; only downgrades to the debug channel when `shuttingDown` is true. `EventBus.publish` is documented never to throw, so a hit in normal ops is by definition a regression that must be visible in production logs — silencing via debug-gate could let a true bug succeed at the route layer (200 OK) while SSE subscribers stop receiving events. [Suggestion] Log-injection defense for `agentType` (workspaceAgents.ts): - New `safeLogValue` helper wraps `agentType` interpolations in `JSON.stringify(...).slice(0, 82)` before stderr writes (mirrors `server.ts:1340`). The route's `validateAgentType` regex already rejects names with control chars, but defense-in-depth covers legacy on-disk shadows and future fields. Five `writeStderrLine` call sites updated (GET / POST / DELETE failure, reload-failure, partial-delete, create-reload-failure). [Suggestion] Simplify walkWorkspaceForMemory (workspaceMemory.ts): - Replaced the 12-iteration loop with a straightforward single-pass stat-each-filename. The `seen` Set, `cursor = parent` walk, and filesystem-root guard were dead code (the loop unconditionally broke on first iteration). PR 16.5's hierarchical mode lands as a fresh upward walk rather than re-enabling commented-out code. Validation: - typecheck: cli + sdk-typescript clean - vitest: 377/377 (was 374, +3 new) - eslint: clean Reviewer notes (NOT adopting): - glm-5.1's "InvalidClientIdError('workspace', ...)" message-confusion Critical: stale-snapshot false positive — fold-in 2g already removed `resolveWorkspaceClientId` and inlined a 400 with the correct "registered for this workspace" wording. Only a comment reference remains. - glm-5.1's "INVALID_CONFIG → 422" suggestion: SubagentManager only ever throws INVALID_CONFIG for read-only conditions (built-in / extension / session) — not for malformed config (which uses VALIDATION_ERROR). The current 403 mapping in update + delete is correct for the manager's actual semantics. * fix(serve): apply round-8 review fold-in 2i on PR 16 wenshao round-8: 2 [Critical] path-disclosure + 5 [Suggestion] (name regex, per-field caps, mutex timeout, test gaps, tilde fence). All adopted. [Critical] C1 — 413 `err.message` path disclosure (workspaceMemory.ts): - The 413 `memory_file_too_large` response sent `err.message` unconditionally as the `error` field. The `WorkspaceMemoryFileTooLargeError` constructor embeds the absolute file path in its message ("Existing memory file at /Users/<x>/.qwen/QWEN.md is ..."), bypassing the `debugMode()` gating that already hid the `filePath` field. Same gating now applies to both `error` and `filePath`; default response carries a generic string + structured `code` / `bytes` / `limit` so SDK callers can branch without the path leak. [Critical] C2 — workspaceAgents FILE_ERROR `err.message` (workspaceAgents.ts): - Two catch blocks (create + update) sent `SubagentError(FILE_ERROR)` messages directly in the response. Node fs errors embed paths like "ENOENT: ... '/Users/<x>/.qwen/agents/foo.md'". Both now gate behind `isServeDebugMode()`; default response is the generic "Failed to write workspace agent file" envelope. Shared `isServeDebugMode` helper (debugMode.ts new): - Moved from inlined copies in workspaceMemory.ts to a small shared module so both route files (and future Wave 4 mutation routes) share one canonical predicate. [Suggestion] S1 — POST body `name` validation (workspaceAgents.ts): - `parseAgentConfig` now applies the same regex + length contract as `validateAgentType` (`^[\p{L}\p{N}_-]+$/u`, 2-64 chars). A client posting `name: "my/agent"` or 100-char name now fails at the body-validation boundary with a 422 `invalid_config` instead of bubbling a less-specific `SubagentValidator` error. [Suggestion] S2 — Per-field size caps (workspaceAgents.ts): - `description` / `systemPrompt`: 256 KB each - `tools` / `disallowedTools`: 256 entries, each at most 256 chars Applied on both create + update; matches workspaceMemory's `MAX_MEMORY_CONTENT_BYTES = 1 MB` posture and keeps `GET /workspace/agents` list-response cost bounded. [Suggestion] S3 — Mutex timeout (writeContextFile.ts): - `getFileLock` now wraps each Mutex with `withTimeout(..., 30_000)` so a wedged filesystem (NFS hiccup, OneDrive lock, kernel I/O hang) cannot indefinitely hold the per-file lock. The `E_TIMEOUT` sentinel is caught and re-thrown as a typed `WorkspaceMemoryWriteTimeoutError`; the route maps it to 500 `memory_write_timeout` with `timeoutMs` so SDK callers can branch on stalled-fs without parsing a generic 500. [Suggestion] S4 — Test gaps: - `DELETE /workspace/agents/:id?scope=workspace` happy path: removes only the project shadow, leaves user file on disk, emits exactly one `agent_changed` event with `level: project`. - `POST /workspace/agents/:id?scope=global` happy path: updates user shadow, leaves project file untouched. - 413 `memory_file_too_large`: write a 17 MB QWEN.md externally, POST append fails with the structured 413 payload (`bytes` / `limit`, no `filePath` / no path-embedding error message in default response). [Nice] N1 — Tilde fence support (writeContextFile.ts): - `findNextTopLevelHeading` now toggles fence state on both ``` ` and `~~~` openers (CommonMark allows both). A `## heading` inside a `~~~` fenced block no longer counts as the section boundary. Validation: - typecheck: cli + sdk-typescript clean - vitest: 380/380 (was 377, +3 new) - eslint: clean * fix(serve): apply round-9 review fold-in 2j on PR 16 Two real correctness fixes from wenshao's 2026-05-18 review: 1. resolveContextFilePath now uses getCurrentGeminiMdFilename() so POST /workspace/memory writes to the same file GET surfaces. Without this, a deployment that ran setGeminiMdFilename('AGENTS.md') saw GET list AGENTS.md while POST kept appending to a stale QWEN.md — clients then observed "I just wrote content but it's missing from /workspace/memory". 2. runWrite no-op branch now returns bytesWritten: 0 instead of the existing file's stat.size. The prior value conflated "bytes I wrote" with "current file size"; clients accumulating writes via sum(bytesWritten) added the file size for every whitespace POST. changed: false already signals the no-op; the byte count should match its field name. JSDoc updated on both WriteContextFileResult.bytesWritten and DaemonWriteMemoryResult.bytesWritten so the contract is explicit. New test covers setGeminiMdFilename(AGENTS.md) round-trip; existing no-op test updated for the new bytesWritten semantics. Round-8 thread PRRT_kwDOPB-92c6Cpyap (DRY resolveOriginatorClientId) stays open as the cross-Wave-4 tracking marker. CodeQL "missing rate limiting" alert deferred to PR 24's audit/policy layer (bearer + max-connections + mutation gate provide v1 mitigations). * fix(serve): skip two Windows-incompatible test fixtures on win32 Both tests rely on `fs.chmod(dir, 0o555)` to trigger EACCES on a subsequent write/unlink. Windows ignores Unix-style permission bits passed to `fs.chmod`, so the directory stays writable, the operation succeeds, and the error path the test exercises is unreachable — the test then sees the success status (200 / 204) instead of the expected 500. CI failed on Windows runner only; Ubuntu + macOS pass. Route logic is platform-agnostic — these tests validate that: - `workspaceMemory.test.ts` POST returns the structured 500 envelope (no `errorMessage` / `filePath` leakage outside QWEN_SERVE_DEBUG). - `workspaceAgents.test.ts` DELETE returns 500 `agent_delete_partial` when one level's `fs.unlink` silently fails inside SubagentManager. Both invariants are still covered by the Ubuntu + macOS runs. We can't swap in a `vi.spyOn(fs, 'unlink')` mock for the agents case either — `SubagentManager` does `import * as fs from 'fs/promises'`, creating a sealed ESM namespace object vitest can't redefine. Skip pattern mirrors `customBanner.test.ts:232` (`if (process.platform === 'win32') return;`).	2026-05-18 14:26:59 +08:00
jinye	495d11f016	refactor(serve): add FileSystemService boundary (#4175 Wave 4 PR 18) (#4250 ) Some checks are pending Qwen Code CI / Classify PR (push) Waiting to run Details Qwen Code CI / Lint (push) Blocked by required conditions Details Qwen Code CI / Test (macos-latest, Node 22.x) (push) Blocked by required conditions Details Qwen Code CI / Test (ubuntu-latest, Node 22.x) (push) Blocked by required conditions Details Qwen Code CI / Test (windows-latest, Node 22.x) (push) Blocked by required conditions Details Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions Details Qwen Code CI / CodeQL (push) Blocked by required conditions Details E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run Details E2E Tests / E2E Test - macOS (push) Waiting to run Details * refactor(serve): add FileSystemService boundary (#4175 PR 18) Introduce a per-request workspace filesystem boundary inside the `qwen serve` daemon. The boundary centralizes path canonicalization, symlink-aware boundary checks, ignore/trust policy, size/binary limits, and audit hooks behind a single typed surface — preparing PR 19 (read-only file routes) and PR 20 (write/edit routes) to share a guarded chokepoint instead of re-implementing path safety per route. Wave 4 PR 18 of #4175 — pure refactor, no new HTTP routes; depends on PR 12 (#4241) and PR 15 (#4236), both merged. New module under `packages/cli/src/serve/fs/`: - `paths.ts` extracts `canonicalizeWorkspace` from `httpAcpBridge.ts` (re-exported there for backward compatibility) and adds: - `ResolvedPath` brand and `Intent` union (read/write/edit/list/ glob/stat) with exhaustiveness checks at the trust gate - `hasSuspiciousPathPattern` — detects NTFS ADS, 8.3 short names, long-path prefixes, UNC paths, trailing dots, DOS device names, and three-or-more-dot path components (claude-code-style) - `findExistingAncestor` with explicit ENOTDIR rejection so a regular file in a path component throws `parse_error` rather than passing boundary inspection and 500-ing later - `resolveWithinWorkspace` running a chain-aware realpath check with ENOENT-tolerant ancestor walk for write/stat intents - `errors.ts` defines `FsError` / `FsErrorKind` plus `wrapAsFsError`, which categorizes raw `fs.promises` errnos (EACCES → permission_ denied, ELOOP → symlink_escape, ENOTDIR → parse_error, etc.) so body-level failures emit audit events instead of escaping uncategorized - `policy.ts` carries `MAX_READ_BYTES` (256 KiB), `MAX_WRITE_BYTES` (5 MiB), `BINARY_PROBE_BYTES` (4 KiB), `shouldIgnore` (file/ directory aware), and `assertTrustedForIntent` with an exhaustive switch over `Intent` - `audit.ts` emits typed `fs.access` / `fs.denied` `BridgeEvent` frames with SHA-256-hashed paths, optional raw-path passthrough via `QWEN_AUDIT_RAW_PATHS=1`, and discriminator `kind` fields so SDK consumers can exhaustively narrow `event.data` - `workspaceFileSystem.ts` — `WorkspaceFileSystem` interface + `createWorkspaceFileSystemFactory` with eight methods (resolve, stat, readText, readBytes, list, glob, writeText, edit). Every body method funnels failures through `recordAndWrap`, which wraps raw fs errors and always emits an `fs.denied` audit event before rethrowing. `readText` enforces `MAX_READ_BYTES` before delegating to the slurping core service so unbounded requests against multi-gigabyte files can no longer OOM the daemon. `glob` realpath-checks each hit against the canonical workspace and reports filtered escapes via a single aggregated `fs.denied` event with the dropped count - `index.ts` is the barrel re-export PR 19/20 will import from Modified files: - `packages/cli/src/serve/httpAcpBridge.ts` — extracted `canonicalizeWorkspace` to `fs/paths.ts`; the bridge re-exports it so existing callers in `server.ts` and `runQwenServe.ts` keep working - `packages/cli/src/serve/server.ts` — added `fsFactory?: WorkspaceFileSystemFactory` to `ServeAppDeps`; `createServeApp` builds a strict default (`trusted: false`, warn-once no-op `emit`) when none is injected so a future refactor that forgets `fsFactory` injection cannot silently allow writes against an untrusted workspace; factory parked on `app.locals` for PR 19/20 route handlers - `packages/core/src/index.ts` — re-exports `Ignore`, `loadIgnoreRules`, and `LoadIgnoreRulesOptions` from `utils/filesearch/ignore.js` for cli consumption 411 serve tests pass; typecheck clean. Engineering principles checklist: - [x] Independently mergeable (no new routes, no new capability tag) - [x] Backward compatible (no removed routes / event fields / CLI behavior) - [x] Default off (no public surface change; PR 19/20 will activate routes) - [x] qwen serve Stage 1 routes preserved - [x] Gradual migration (PR 19/20 will adopt the boundary) - [x] Reversible (single PR rollback) - [x] Tests-first (101 unit tests across the new module + contract test) 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(serve/fs): address PR review feedback (#4250) Codex + Copilot review found 8 substantive issues against `7b0db4c3a`; this commit fixes all of them. The first two are P1 build-breakers introduced by the pre-commit `eslint --fix` auto-promoting value imports to `import type` — 31 fs tests were failing post-commit until this fix. Issue list (links to PR comments at #4250): 1. Import type erased runtime values — workspaceFileSystem.ts:10-15. eslint's consistent-type-imports rule rewrote `import { Ignore, StandardFileSystemService, loadIgnoreRules, type WriteTextFileOptions }` -> `import type { ... }` because it saw type-only usage of `Ignore`. That erased `loadIgnoreRules` (called at runtime) and `StandardFileSystemService` (constructed at runtime), causing TS1361/TS2206 and runtime ReferenceErrors. Restored as a value import with inline `type` modifiers per-symbol and an eslint-disable line + comment so future autofixes don't repeat the regression. 2. Same import-type erasure in contract.test.ts:14 — `isFsError` was lumped under `import type` even though it's called at runtime. Same fix shape. 3. edit() OOM hole — workspaceFileSystem.ts. The earlier review pass added a pre-stat MAX_READ_BYTES gate to readText but missed edit, which fsp.readFile's the whole file before any size check. Multi-GB targets inside the workspace could OOM the daemon. Now stat-first; refuses above the cap with file_too_large; also rejects binary files (string indexOf over arbitrary bytes is meaningless). 4. glob accepted absolute / device patterns — workspaceFileSystem.ts. The ..-segment check stopped lexical traversal but /etc/ / C:\\Users\\foo\\ / \\\\server\\share\\ / //server/share/ still reached globAsync, walking outside the workspace before per-hit filtering dropped the results. Now rejects these patterns up-front with parse_error so no I/O happens outside. 5. glob ignore filter probed every hit as `file` — workspaceFileSystem.ts. The underlying `ignore` library needs a trailing-slash probe for dist/ / .git/-style directory patterns; probing as `file` silently leaked directory matches. Now lstats each hit and routes 'directory' vs 'file' to shouldIgnore so dir-only ignore rules actually match. 6. ReadTextOptions.line off-by-one — workspaceFileSystem.ts. The public option was documented as 1-based but forwarded as-is to readFileWithLineAndLimit, which is 0-based. A request with line: 1 returned content starting at the second line. Now converts 1-based -> 0-based at the boundary; doc clarified; truncation check uses the converted index. 7. ServeAppDeps.fsFactory JSDoc said trusted=true — server.ts:96. Stale from before the strict-default refactor in the same review pass. Rewrote to match the actual trusted: false + warn-once emit behavior. 8. MAX_READ_BYTES JSDoc said reads above cap return truncated — policy.ts:18. Stale from before the hard-cap refactor; now correctly states the cap throws file_too_large and that soft truncation only applies under the cap via enforceReadSize. 7 new tests cover the new behaviors: - POSIX-absolute pattern rejection - Win32 / UNC pattern rejection (4 variants) - directory-pattern ignore (dist/) - edit file-too-large - edit binary refusal - readText line: 1 returns from first line - readText line: 2 starts from second line 418/418 serve tests pass; typecheck + eslint clean. Deferred follow-ups (per PR review reply): - glob maxResults is applied after globAsync materializes every match. A streaming iterator (glob.iterate) would bound the walk too. Non-trivial; tracked as a separate hardening follow-up since current behavior is correctness-safe (just not optimal under huge trees). - Per-path glob escape hash in audit hint (currently aggregated count) — can revisit once PR 19 wires the routes and we see real audit volume. - EVENT_SCHEMA_VERSION migration mechanism — orthogonal; the whole BridgeEvent schema lacks one and that's a Wave 5+ concern. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(serve/fs): close 3 review-round-2 findings (#4250) The wenshao + DeepSeek reviewer pass on `a81ada43f` surfaced 3 more issues; this commit fixes them. 1. Dangling-symlink write escape (paths.ts) — Critical security bug. A request like `write /ws/escape` where `<ws>/escape` is a symlink whose target doesn't exist YET would pass the ENOENT-tolerant ancestor walk: realpath fails ENOENT, the walk-up returns `<ws>` as ancestor, the canonical becomes `<ws>/escape`, containment passes — but the eventual write follows the symlink and creates the file at the symlink target outside the workspace. lstat-then-readlink before the ancestor walk catches this; the symlink target is itself resolved via the deepest existing ancestor so macOS /private/var canonicalization stays consistent with boundCanonical (an absolute target inside the workspace tmpdir would otherwise have been false-flagged on macOS). 2. glob realpath catch over-reported symlink_escape (workspaceFileSystem.ts) — every realpath failure inside the per-hit boundary loop was counted as `symlink_escape`. EIO, EACCES, ENAMETOOLONG, EBUSY are environmental failures, not security events; mislabeling them poisoned the audit signal for operators trying to investigate genuine escape attempts. Now distinguished: ENOENT/ELOOP count as escapes; other errnos count as transient errors and emit a separate aggregated `fs.denied` with errorKind: 'permission_denied'. 3. policy.ts:enforceReadSize JSDoc said the boundary "intentionally does NOT throw" — stale after a81ada43f's hard-cap refactor. Rewrote to clarify the helper is the soft truncation gate that only fires under the hard cap; readText itself enforces the hard cap with file_too_large via its pre-stat check. The readme/contract is now consistent with workspaceFileSystem.ts. 2 new tests: - dangling symlink targeting outside-workspace path → symlink_escape - dangling symlink targeting future-inside-workspace path → succeeds (ahead-of-mkdir flow for atomic-write-via-rename) 420/420 serve tests pass; typecheck clean. Remaining tracked follow-ups (per PR review reply): - list/glob brand cast (P2 deferred per PR description) - glob audit pathHash hashes pattern not paths - edit() TOCTOU read-modify-write race (atomic-via-temp + rename) - wrapAsFsError ENOSPC/EIO mapping to a distinct kind - runQwenServe → fsFactory injection integration test - glob maxResults streaming (glob.iterate) 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(serve/fs): cross-platform ENOTDIR detection (#4250 CI fix) Windows CI test failure on `a81ada43f` surfaced a real cross-platform bug in `findExistingAncestor`. POSIX returns `ENOTDIR` when `fs.stat` traverses through a non-directory in a path component (e.g. `${ws}/file.txt/leaf` where `file.txt` is a regular file). Windows returns `ENOENT` for the same case. The errno-based guard added in `a81ada43f` only branched on `ENOTDIR`, so the Windows path silently fell through to the ancestor walk and the boundary returned a "canonical" the eventual write could not honor — `WorkspaceFileSystem - audit always emits on body errors > rejects ENOTDIR ancestor walk with parse_error rather than passing boundary` failed with `expected false to be true` on the windows-latest runner. Fix: switch from errno-based detection (platform-divergent) to dirent-kind detection. After `fs.stat` succeeds during the walk-up, if the existing ancestor is NOT a directory AND there are unresolved tail components, throw `parse_error`. Both `ENOENT` and `ENOTDIR` from `fs.stat` are now treated as "the current path doesn't resolve, keep walking" — the post-walk kind check fires regardless of which errno surfaced. Cross-platform-safe. The local 110/110 fs tests still pass on macOS/Linux; the Windows case will exercise the kind-check branch on next CI run. macOS CI failures on the same workflow run (`InputPrompt.test.tsx` placeholder reuse, `SettingsDialog.test.tsx` 5s timeout) are pre- existing flaky UI tests, NOT touched by this PR. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(serve/fs): close 4 review-round-3 findings (#4250) DeepSeek's third review pass (after `b38e82157`) flagged four more issues; this commit fixes all of them. 1. Multi-hop dangling symlink bypass (paths.ts) — Critical security fix. The earlier single-readlink fix at `efd7a4611` was bypassed by chained dangling symlinks: <ws>/leak -> <ws>/middle -> /scratch/evil where every layer is a symlink and the final target doesn't exist. The fix only readlink'd the first hop (<ws>/middle), saw it was inside the workspace via findExistingAncestor, and let the chain through. The OS write at <ws>/leak would then follow both hops and create /scratch/evil. Now loops lstat + readlink up to MAX_ANCESTOR_HOPS, tracks visited inodes for cycle detection, and only validates containment on the fully-dereferenced leaf. Cycle detection rejects with symlink_escape; chains that exceed the hop bound also surface as symlink_escape with a "too long or contains a cycle" hint. 2. opts.line validation (workspaceFileSystem.ts) — the docstring committed to "1-based positive integer" but Infinity / floats / negative values flowed through to readFileWithLineAndLimit and degraded silently. Now enforces Number.isSafeInteger + line >= 1 at the boundary; everything else throws parse_error. Test covers Infinity, -Infinity, 0, -1, 1.5, NaN. 3. io_error FsErrorKind (errors.ts) — wrapAsFsError previously conflated EIO/EBUSY/ENAMETOOLONG/EMFILE/ENFILE/ENOSPC/ETXTBSY under permission_denied. Monitoring pipelines that key on errorKind for security alerting would page oncall on a full disk. New io_error kind (HTTP 503) maps the environmental- failure errnos with distinct hints. EACCES/EPERM stay on permission_denied (literal access denial); EIO (failing disk), EBUSY (busy file), ENAMETOOLONG (PATH_MAX), EMFILE/ENFILE (fd exhaustion), ENOSPC (df -h reporting 100%), ETXTBSY (text-busy) all route to io_error. 4. glob audit kind taxonomy (workspaceFileSystem.ts) — three-way classification mirrors wrapAsFsError so the per-hit realpath catch surfaces ENOENT/ELOOP -> symlink_escape, EACCES/EPERM -> permission_denied, everything else -> io_error. Each class emits its own aggregated fs.denied event. 5. edit() matchedIgnore (workspaceFileSystem.ts) — readText and writeText both stamp matchedIgnore in their access audit; edit didn't, so operators monitoring fs.access events couldn't distinguish edits to .gitignore'd files (build artifacts, logs) from edits to tracked source. Added the same shouldIgnore + matchedIgnore plumbing that readText uses. 8 new tests: - multi-hop dangling symlink (security) - symlink cycle (security) - ENOSPC/EIO/EBUSY/ETXTBSY/ENAMETOOLONG -> io_error mapping - io_error -> HTTP 503 - EMFILE/ENFILE updated to io_error (was permission_denied) - opts.line rejects Infinity/-Infinity/0/-1/1.5/NaN - edit() audit records matchedIgnore on .log file 426/426 serve tests pass; typecheck clean. Remaining tracked follow-ups (per PR review reply): - list/glob brand cast (P2 deferred per PR description) - glob audit pathHash hashes pattern not paths - edit() TOCTOU read-modify-write race (atomic-via-temp + rename) — pinned to PR 20 - runQwenServe → fsFactory injection integration test — pinned to PR 19 - glob maxResults streaming (glob.iterate) 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(serve/fs): close 3 review-round-4 critical findings (#4250) DeepSeek's fourth review pass surfaced three more critical bugs; all are fixed in-PR. 1. TOCTOU symlink substitution in readText/readBytes/edit (workspaceFileSystem.ts) — Critical. fsp.stat(p) and the subsequent lowFs.readTextFile(p) (or fsp.readFile / fsp.readFile for edit) are two separate syscalls. An attacker who can write into the workspace can swap p from a regular file to a symlink pointing outside between the two calls; pre-stat sees the original, the read follows the swap. Fix: assertInodeStableAfterRead(p, preIno) — after each read, re-lstat p; reject with symlink_escape if the path is now a symlink (post.isSymbolicLink()) or its inode changed (preIno !== post.ino, with a 0-ino fallback for procfs / virtual mounts that don't report meaningful inodes). Catches the swap-and-leave attack and the swap-and-keep-swapped attack. Residual race (attacker swaps back AFTER our read but BEFORE our lstat) is much smaller than the original window and outside PR 18's threat model; fd-based reading via fsp.open + fileHandle.read would close it entirely but requires a new variant of lowFs that takes a FileHandle — tracked as a follow-up. 2. UTF-8 truncation corruption in readText (workspaceFileSystem.ts) — Critical. buf.subarray(0, sizeOutcome.bytesToRead).toString ('utf-8') silently emits U+FFFD when bytesToRead falls mid-codepoint (CJK, emoji). A downstream consumer parsing JSON or source code over the truncated content would see broken trailing bytes; meta.truncated would be true but the prefix is corrupt. A subsequent edit() with the corrupted string as oldText would also fail to match the on-disk content. Fix: safeUtf8Truncate(buf, maxBytes) walks back from maxBytes through any continuation bytes (0b10xxxxxx), then verifies the leading byte's full sequence fits within the cap; drops the leading byte if it doesn't. The result is always a clean prefix at a valid codepoint boundary. Test pins '中文测试' (12 bytes / 3 bytes per char) truncated at 7 bytes -> '中文' (no U+FFFD). 3. glob opts.cwd bypasses workspace boundary (workspaceFileSystem.ts) — Critical. opts.cwd was used directly as the glob root with no validation against boundWorkspace. ResolvedPath is a brand cast and a stale or forged value lets a glob('*/', { cwd: '/etc' }) enumerate files outside the workspace. The pattern-side absolute / UNC checks added in `a81ada43f` only constrain the pattern; cwd is the actual hazard. Fix: at the entry point of glob(), path.resolve cwd and isWithinRoot-check against boundWorkspace. Throws path_outside_workspace if cwd is outside, even when the pattern itself is harmlessly relative. Test pins the case with cwd: scratch (outside workspace). 3 new tests: - readText with mid-operation symlink swap -> symlink_escape - safeUtf8Truncate keeps CJK codepoints intact at 7-byte cap - glob with opts.cwd outside workspace -> path_outside_workspace 429/429 serve tests pass; typecheck + eslint clean. Remaining tracked follow-ups (Post-PR-18 hardening, in #4175): - list/glob brand-cast contract (PR 19) - runQwenServe → fsFactory injection contract test (PR 19) - edit() write-side TOCTOU + atomic-via-temp + expectedHash (PR 20) - glob audit pathHash (independent audit.ts commit) - glob maxResults streaming (independent hardening) - glob pattern preflight refactor to reuse hasSuspiciousPathPattern (cosmetic) 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(serve/fs): close 7 review-round-5 findings (#4250) DeepSeek round-5 surfaced 9 comments; 7 are real findings (the other 2 — UTF-8 truncation + glob opts.cwd — already fixed in `546834267` from round-4 and replied-to inline). Stack on round-4. 1. edit() empty-oldText silent-prepend (workspaceFileSystem.ts) — Critical silent data corruption. JS `''.indexOf('')` returns 0, so without an empty-string guard `current.slice(0,0) + newText + current.slice(0)` = `newText + current` — silently prepends `newText` to the whole file with a success audit event. PR 20 routes that thread user-supplied `oldText` verbatim must not be able to trigger this. Now throws `parse_error` BEFORE the read with a hint explaining why empty matches are rejected. 2. DOS device name regex misses bare names + first-extension forms (paths.ts) — Windows attack surface. The earlier /\.(CON\|PRN\|AUX\|NUL\|COM[1-9]\|LPT[1-9])$/i only caught the last-extension form (file.CON). NTFS reserves these names regardless of extension: CON, NUL, CON.txt, NUL.dat, CON.foo.bar are ALL reserved device handles. New regex /(^\|\.)(CON\|...\|LPT[1-9])(\.\|$)/i covers all four positions (bare / first-ext / last-ext / middle-ext) while still admitting legitimate substrings (BACON, concat.txt, precon.go). 3. Buffer.byteLength replaces Buffer.from for size-only checks (workspaceFileSystem.ts) — 5 MB heap allocation per write eliminated. `writeText` and `edit()` previously materialized the entire UTF-8 payload (up to MAX_WRITE_BYTES) just to read `.length`; `Buffer.byteLength(content, 'utf-8')` returns the count without allocating. 4. edit() error message includes oldText snippet (workspaceFileSystem.ts) — Production debuggability. The earlier hint was just "edit() expects oldText to appear verbatim in the file" — at 3 AM an operator can't tell whether the mismatch is whitespace, a stale file, or a wrong target. Now includes a JSON-quoted truncated snippet (max 80 chars + ellipsis) of the searched-for text. 5. recordAndWrap forwards FsError message into fs.denied audit (workspaceFileSystem.ts + audit.ts) — Audit observability gap. Audit consumers debugging an incident saw `errorKind` + `hint` but lost the underlying OS error detail (path, errno text, byte count). FsDeniedAuditPayload now carries an optional `message` field; recordAndWrap forwards `fs.message` automatically. 6. EISDIR / ENOTDIR distinct hints (errors.ts) — UX. Both shared the same hint "a path component is not a directory where one was expected" — for EISDIR (path IS a directory but a file was expected) the wording was reversed. Now distinct hints with the errno name explicitly cited. 7. kindFromStats / kindFromDirent merged into kindFromStatLike (workspaceFileSystem.ts) — duplicate function bodies removed. Both fs.Stats and fs.Dirent expose the same isFile / isDirectory / isSymbolicLink interface, and both targets (FsStat['kind'], FsEntry['kind']) are the same 4-value union. Single helper avoids drift if the union grows. 4 new tests: - bare/multi-ext DOS device names (CON, NUL, CON.txt, CON.foo.bar) + legitimate substrings (BACON, concat, precon, contemplating) - edit() empty oldText -> parse_error + file unchanged - edit() not-found error includes searched snippet in hint - fs.denied audit payload carries FsError message Plus the 2 already-fixed items (UTF-8 boundary, glob opts.cwd) have new test coverage from round-4. 433/433 serve tests pass; typecheck + eslint clean. Stack: `7b0db4c3a` → `a81ada43f` → `efd7a4611` → `b38e82157` → `911cb8e5d` → `546834267` → THIS 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(serve/fs): close 7 review-round-6 findings (#4250) Round-6 review (wenshao + gpt-5.5) flagged 7 items including two Critical and one privacy regression I introduced in round-5. 1. writeText / edit pre-write symlink guard (workspaceFileSystem.ts) — Critical, two reviewers (wenshao #CsBcq + gpt-5.5 #CsB3M) independently flagged. `atomicWriteFile` (`packages/core/src/utils/atomicFileWrite.ts`) resolves symlinks at write time, so a swap between the boundary's `resolve()` and `lowFs.writeTextFile()` would let the write follow the symlink to outside the workspace. `writeText` had no read phase, so its window was wider than `edit()`'s. New helper `assertNotSymlinkBeforeWrite(p)` lstats the path immediately before each `lowFs.writeTextFile` call; ENOENT is fine (ahead-of-create flow), but an actual symlink throws `symlink_escape`. Used in `writeText` AND `edit()`. Residual race after this guard but before the write completes is the deferred PR 20 atomic-via-temp follow-up. 2. recordAndWrap message field bypassed privacy mode (audit.ts) — Critical privacy regression I introduced at `ebd9e78a1` (round-5). The new `FsDeniedAuditPayload.message` field forwarded `FsError.message` unconditionally — and many throw-sites embed `${p}` (absolute paths) or user-supplied `oldText` snippets into the message. Privacy-mode operators (without `QWEN_AUDIT_RAW_PATHS=1`) saw paths leak through the message field even though they explicitly disabled raw-path logging. Fixed: `message` is now gated behind `includeRawPaths` alongside `relPath`. Privacy mode = no path-bearing fields, period. Operators wanting forensic context opt in via `QWEN_AUDIT_RAW_PATHS=1` and accept both fields together. 3. glob opts.cwd via symlink (workspaceFileSystem.ts) — gpt-5.5 #CsB3P. Textual `path.resolve(cwd) + isWithinRoot` admits `<ws>/link` even when `<ws>/link → /etc` is a symlink to outside; `globAsync` walks `/etc` before the per-hit filter drops results. Switched to `fsp.realpath(path.resolve(cwd))` so the containment check sees the actual walk root. ENOENT on cwd surfaces as `path_not_found`. 4. readText OOM via concurrent file growth (workspaceFileSystem.ts) — wenshao #CsBeE. The pre-stat `MAX_READ_BYTES` gate only sees the size at stat time; a concurrent writer can grow the file before the actual `readFileWithLineAndLimit` slurp. Added post-read `Buffer.byteLength(result.content) > MAX_READ_BYTES` check. The proper fix (fd-based read tying size + read to the same inode) is a hardening follow-up; this byte-length check is the defense-in-depth layer. 5. readBytes maxBytes can widen past MAX_READ_BYTES (policy.ts) — wenshao #CsBj5. `enforceReadBytesSize(st.size, opts.maxBytes)` used the caller-supplied `maxBytes` as the ceiling, replacing rather than clamping `MAX_READ_BYTES`. A future PR 19/20 route forwarding `req.query.maxBytes` could blindly bypass the daemon's 256 KiB safety cap. Now clamps via `Math.min(maxBytes, MAX_READ_BYTES)`. 6. ENOENT_TOLERATING_INTENTS docstring + test (paths.ts) — wenshao #CsBk3. The Intent docstring only mentioned `'write'` tolerating ENOENT; `'stat'` was in the set undocumented. A future maintainer removing `'stat'` thinking it was a copy-paste error would silently change behavior (stat on a concurrently-deleted path would throw `path_not_found` from the resolver instead of letting `fsp.lstat` throw `ENOENT` naturally). Amended docstring to call out `'stat'`'s rationale explicitly + added contract corpus case. 6 new tests: - glob cwd via symlink to outside → path_outside_workspace - writeText with mid-operation symlink swap → symlink_escape + outside file unchanged - edit with mid-operation symlink swap → symlink_escape + outside file unchanged - readBytes opts.maxBytes attempting widening → file_too_large - fs.denied message field absent in privacy mode (default) - fs.denied message field present in raw-paths mode (forensic) - contract corpus: resolve('newdir/leaf', 'stat') succeeds for ENOENT path 439/439 serve tests pass; typecheck + eslint clean. Stack: `7b0db4c3a` → `a81ada43f` → `efd7a4611` → `b38e82157` → `911cb8e5d` → `546834267` → `ebd9e78a1` → THIS 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(serve/fs): close 5 review-round-7 findings (#4250) glm-5.1 round-7 review surfaced 6 items; 5 are real fixes, 1 was already addressed in `7f4f30d0b` (writeText pre-write symlink guard — reviewer looked at `ebd9e78a1` snapshot ~8h before the round-6 fix). Stack on round-6. 1. edit() bypassed lowFs.readTextFile (workspaceFileSystem.ts) — Critical encoding round-trip corruption. The earlier `fsp.readFile(p, 'utf-8')` included the UTF-8 BOM verbatim in `current`, breaking `oldText` matching even when the user passed the exact source text from a previous read; lost iconv-supported codepage handling (GBK / Big5 / Shift_JIS), so non-UTF-8 files would mojibake into `current` and round-trip-corrupt on write-back; and the subsequent `lowFs.writeTextFile` passed no `_meta`, stripping the BOM and forcing UTF-8 on write-back even when the original was BOM'd or non-UTF-8. Fix: read via `lowFs.readTextFile` AND forward `readResult._meta` into the write-back. Test pins UTF-8 BOM round-trip end-to-end. 2. hasSuspiciousPathPattern multi-digit and POSIX false positive (paths.ts) — `/~\\d/` only matched single-digit; NTFS allocates `~10+` on >9 collisions. POSIX false-positives: editor swap files, backup tools used `~N` legitimately. Fixed to `/~\\d+/` and gated behind `process.platform === 'win32'`, matching the ADS-colon check. 3. canonicalizeWorkspace redundant per-request syscall (paths.ts) — Performance. Factory canonicalizes once at build; every `resolveWithinWorkspace` also ran realpathSync on the same path, blocking the event loop. Added CANONICAL_BOUND_CACHE Map keyed on the input string; steady-state size = 1 per `1 daemon = 1 workspace`. 4. readBytes opts.maxBytes API contract (workspaceFileSystem.ts) — Semantic mismatch. Parameter name promised window-read; impl only used it as a hard reject gate. Now truncates the buffer post-read so `readBytes(p, { maxBytes: 1024 })` on a 200 KB file returns 1 KB. Hard `MAX_READ_BYTES` cap still throws for files above it. 5. glob walks node_modules and .git unnecessarily (workspaceFileSystem.ts) — Performance. Without an `ignore` option, `globAsync` traversed every file under those dirs before our per-hit `shouldIgnore` filter. Now passes `ignore: ['/node_modules/', '/.git/']` to short-circuit traversal. Post-filter via `shouldIgnore` remains authoritative. 5 new tests: - 8.3 short-name regex Windows / POSIX split - readBytes truncates 2048-byte file to 1024 with maxBytes - readBytes throws file_too_large only above hard cap - edit() preserves UTF-8 BOM round-trip - glob prunes node_modules and .git 442/442 serve tests pass; typecheck + eslint clean. Stack: `7b0db4c3a` → `a81ada43f` → `efd7a4611` → `b38e82157` → `911cb8e5d` → `546834267` → `ebd9e78a1` → `7f4f30d0b` → THIS 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(serve/fs): close 10 review-round-8/9 findings (#4250) Two more review passes (gpt-5.5 + DeepSeek + wenshao) flagged 13 items; 10 are real fixes, 3 are reviewer-stale-snapshot or already-tracked. Stack on round-7. Critical (5): 1. paths.ts symlink-escape hint embedded the symlink target (gpt-5.5) — Privacy regression sibling to round-6 audit `message` gate. `recordDenied` always forwards `hint` into `fs.denied` even with `QWEN_AUDIT_RAW_PATHS` off; the hint `'symlink points to /Users/alice/secret'` leaks the attacker's intended exfiltration path through audit. Hint is now path-free; operators wanting the resolved target enable `QWEN_AUDIT_RAW_PATHS=1` and read it from `relPath` / `message`. 2. paths.ts dangling-symlink chain discarded its verified canonical (DeepSeek) — After the multi-hop walk validated `cursor → canonicalTarget` was inside the workspace, the code fell through to `findExistingAncestor(absolute)`, re-walking from the original input and discarding the verified result. An attacker swapping an intermediate symlink between the verification and the re-walk could produce a different canonical than the one validated. The verified `canonicalTarget` is now captured in `symlinkResolvedCanonical` and used directly; the `findExistingAncestor(absolute)` fallthrough only runs when no symlink was traversed. 3. workspaceFileSystem.ts readBytes missing post-read size check (DeepSeek) — Same TOCTOU shape as `readText`'s round-6 fix. The pre-stat `enforceReadBytesSize` sees the size at stat time; a concurrent appender keeps the same inode but grows the file past the cap before `fsp.readFile` returns. `assertInodeStableAfterRead` catches inode changes but not same-inode growth. Added a post-read `buf.length > MAX_READ_BYTES` check matching `readText`'s defense-in-depth pattern. 4. errors.ts wrapAsFsError default = permission_denied (DeepSeek) — Misclassified non-errno errors (`TypeError`, programmer-error throws, native module exceptions) as security denials, paging security oncall for what should be a developer ticket. New `internal_error` kind (HTTP 500) is the new default; `permission_denied` reserved for actual `EACCES`/`EPERM`. 5. audit.ts AuditContext.sessionId not forwarded to BridgeEvent (DeepSeek) — Multi-session daemons couldn't trace audit events back to the session that triggered them. `originatorClientId` identifies the client, not the session. Added optional `sessionId` field to both `FsAccessAuditPayload` and `FsDeniedAuditPayload`, forwarded from `ctx.sessionId` when present. Improvements (4): 6. workspaceFileSystem.ts glob cwd realpath redundant when cwd === boundWorkspace (wenshao) — `boundWorkspace` is already canonicalized by the factory (`realpathSync.native` at build time), so calling `fsp.realpath` per-request when no `opts.cwd` was supplied is a redundant async syscall. Added a short-circuit. 7. workspaceFileSystem.ts kindFromStatLike JSDoc orphaned (wenshao) — Inserting `assertNotSymlinkBeforeWrite` between the JSDoc and `kindFromStatLike` left the doc floating above the wrong function. IDE hovers showed the wrong description. Moved the doc back to its function. 8. workspaceFileSystem.ts shared mutable Ignore object (DeepSeek) — `createWorkspaceFileSystemFactory` builds one `Ignore` instance and shares it across every `WorkspaceFileSystemImpl` returned by `forRequest()`. `Ignore.add(): this` is a public mutator. A future "per-session ignore rules" feature calling `.add()` from a request handler would silently corrupt all concurrent sessions. `Object.freeze` turns the cross-request mutation into a `TypeError` rather than a silent leak. 9. server.ts createDefaultFsAuditEmit one-shot warned (DeepSeek) — Permanent silent no-op after the first event; only logged the event `type` with no pathHash / errorKind / intent. If PR 19 forgets the real factory injection, every write 403s and audit is silent past the first warning — exactly the regression the warning exists to surface. Periodic warning (every 100th drop) + first-event context (errorKind, intent, pathHash) makes the regression actionable in production logs. Cleanup (1): 10. workspaceFileSystem.ts safeUtf8Truncate dead code (DeepSeek noted as "off-by-one") — The lead-byte seqLen-check block was dead code: `subarray(0, end)` already excludes the leading byte at `end`, so no further adjustment is ever needed. Removed the block; function is now 4 lines and still produces a valid codepoint prefix. Reviewer's suggested fix (`buf[end-1] → buf[end]`) was technically correct but redundant with the subarray cut. Already-fixed (3 reviewer-stale-snapshot, reply + resolve): - writeText pre-write symlink guard — fixed in `7f4f30d0b` - edit() read-modify-write race — already deferred to PR 20 atomic-via-temp follow-up - glob maxResults walk-bound — already follow-up #5 3 new tests + 2 updated: - wrapAsFsError unknown errno → internal_error (default change) - internal_error has HTTP 500 - non-Error throwables → internal_error (not permission_denied) - readBytes post-stat growth → file_too_large - existing wrapAsFsError test updated for new default 445/445 serve tests pass; typecheck + eslint clean. Stack: `7b0db4c3a` → `a81ada43f` → `efd7a4611` → `b38e82157` → `911cb8e5d` → `546834267` → `ebd9e78a1` → `7f4f30d0b` → `1dc9d2290` → THIS 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(serve/fs): close 3 review-round-10 findings (#4250) Round-10 review (wenshao) flagged 4 items, all marked non-blocking by the reviewer. 3 are landed in-PR; 1 (glob withFileTypes optimization) is moved to issue #4175 follow-ups since the reviewer themselves recommended "test it on a 100K-file workspace after PR 19 lands." 1. canonicalizeWorkspace docstring (paths.ts) — Documentation. The earlier FIXME warned about sync syscall blocking the event loop without mentioning the `CANONICAL_BOUND_CACHE` added in `1dc9d2290` brings steady-state cost to zero. Future reviewers reading the FIXME again would re-flag the already-mitigated concern. Added a paragraph noting cache hit rate = 100% under the `1 daemon = 1 workspace` model so per-request blocking only happens at boot or on fresh workspace values (e.g. tests). 2. enforceReadBytesSize dead `maxBytes` parameter (policy.ts) — Round-7 changed `readBytes` to use post-read truncation for the soft window cap, leaving the function's `maxBytes` parameter unused at the only callsite. The `Math.min(maxBytes, MAX_READ_BYTES)` clamp branch became dead code. Reviewer's option 1: tighten the signature to `enforceReadBytesSize(fileBytes: number): void`. Done — the function is now purely the hard-cap enforcer its name implies; soft-window truncation lives in the orchestrator's `buf.subarray(0, opts.maxBytes)` post-read step where it's visible alongside the cap check it complements. 3. relForAudit cross-drive sentinel (audit.ts) — Windows-only privacy edge case. `path.relative('C:\\ws', 'D:\\evil')` returns `'D:\\evil'` (an absolute path) because Win32 can't express cross-drive relatives. Even when raw-paths mode is ENABLED, the audit `relPath` field would carry the off-drive absolute path, exposing the attacker's drive letter + directory. Added a `path.isAbsolute(rel)` post-check that substitutes a `<cross-drive>` sentinel — audit consumers see the cross-drive case distinctly without leaking the offending path. This was previously P2 deferred in PR 18's description ("Windows cross-drive path.relative"); reviewer's "few lines beats deferring" assessment was right. Tracked as follow-up (not in this commit): 4. glob withFileTypes optimization — Replace per-hit `lstat` (line ~664) with `glob` v10's `withFileTypes: true` so each hit comes back as a `Path` object with `isDirectory()` / `isFile()` / `isSymbolicLink()` already available. Saves N syscalls in large workspaces. Non-trivial restructure (return type changes from `string[]` to `Path[]`). Reviewer themselves marked "[performance / non-blocking]" and said "test it on a 100K-file workspace after PR 19 lands so we know whether it's worth it." Added to issue #4175 body's Post-PR-18 hardening follow-ups. 446/446 serve tests pass; typecheck + eslint clean. Stack: `7b0db4c3a` → `a81ada43f` → `efd7a4611` → `b38e82157` → `911cb8e5d` → `546834267` → `ebd9e78a1` → `7f4f30d0b` → `1dc9d2290` → `a33d459df` → THIS 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)	2026-05-18 13:14:43 +08:00
jinye	96219924a0	feat(serve): MCP client guardrails (#4175 Wave 3 PR 14) (#4247 ) * feat(serve): MCP client guardrails (#4175 Wave 3 PR 14) Adds an in-process MCP client counter, slot-reservation enforcement at all 3 spawn sites (discoverAllMcpTools / discoverAllMcpToolsIncremental / readResource), new `--mcp-client-budget=N` + `--mcp-budget-mode={enforce,warn,off}` CLI flags forwarded to the ACP child via env, and additive `clientCount` / `clientBudget` / `budgetMode` / `budgets[]` fields plus `disabledReason: 'budget'` tagging on `GET /workspace/mcp`. Always-on capability tag `mcp_guardrails` with `modes: ['warn', 'enforce']` so SDK clients can pre-flight refusal semantics. Typed SSE push events (`mcp_budget_warning` / `mcp_child_refused_batch`) intentionally deferred to a small follow-up PR — the snapshot already exposes `budgets[0].status: 'warning'\|'error'` + `refusedCount` so operator visibility isn't blocked. * fixup(serve): address PR 14 review (#4247) findings 1-7 Addresses Codex + Copilot review feedback on #4247. Seven functional and forward-compat fixes; (8) `tcp` transport mapper vs createTransport deferred pending @wenshao direction (separate core/protocol decision). 1. Single-server rediscovery bypass — add `tryReserveSlot` at the top of `discoverMcpToolsForServerInternal`. Pre-fix a server refused at startup could be brought online later via `/mcp reconnect <name>` and exceed the cap in enforce mode. 2. Empty `budgets[]` when mode=off — early `return []` in `buildBudgetCells` when mode is `off`. Protocol docs / SDK types promise empty array; pre-fix emitted a synthetic noisy cell. 3. runQwenServe validation + env leakage — mirror CLI budget validation in `runQwenServe` (the embedded entry point); explicitly delete `QWEN_SERVE_MCP_` env vars when options are undefined so multiple daemons in one process don't leak prior budget config to subsequent ACP children. 4. Disabled-vs-refused precedence + stale refusal log* — config-disable wins over budget refusal in the per-server cell; `removeServer` + `disconnectServer` drop the entry from `lastRefusedServerNames` so operator action immediately clears the budget tag. 5. Incremental remove-before-reserve ordering — process config-removed servers FIRST in `discoverAllMcpToolsIncremental` so freed slots are visible to subsequent `tryReserveSlot` calls. Pre-fix scenario {a,b}→{a,c} with budget=2 wasted a slot. 6. `scope` forward-compat type widening — `'workspace' \| (string & {})` on both `ServeMcpBudgetStatusCell` and `DaemonMcpBudgetStatusCell` so SDK consumers don't break when PR 23 adds `scope: 'pool'` per the documented no-schema-bump contract. 7. Test comment alignment — fix "With budget=1" comment to match `clientBudget: 2` code. Plus 4 new core regression tests covering #1/#2/#4/#5, and 4 new serve tests covering #3 (boot rejection + env cleanup). 237/237 pass across the affected files (36 core mcp-client-manager + 50 acpAgent + 151 serve). * docs(serve): clarify v1 snapshot-based budget warning detection (#4247) Address github-actions review-summary finding (I) on PR #4247: v1 operators have no SSE push event for budget pressure yet (deferred to PR 14b), so the protocol doc should explicitly say how to detect warning / error states from the snapshot. Adds the three-way mapping `budgets[0].status` ↔ live/refused counts. * fixup(serve): address PR 14 review round 2 (#4247 wenshao) Addresses @wenshao review on PR #4247. Three critical safety fixes + four suggestion-level improvements. Critical (zombie slot leaks — would break `enforce` mode for the rest of the daemon's lifetime): - C2: `discoverAllMcpTools` connect() catch now releases reservedSlots + clients entry. Pre-fix one failed connect permanently consumed a budget slot. - C3: `readResource` wraps client.connect() in try/catch; on throw the slot + client entry are cleaned up before re-raising. Tracked `weReservedSlot` so the cleanup only fires for newly-created lazy spawns (reused already-CONNECTED clients are untouched). - (wenshao C1 was the rediscovery-bypass also caught by Codex + Copilot — already addressed in fixup `597f011e6`.) Suggestion: - S4: `readBudgetFromEnv` downgrades `mode='enforce'` → `'off'` when no budget is set, mirroring the CLI + `runQwenServe` invariant. Fail-closed on operator misconfiguration rather than silently bypassing enforcement. - S5: extract duplicated `mcp_budget_decision` telemetry into private `emitBudgetTelemetry(configuredCount)`. - S6: rename `BudgetExhaustedError` constructor param `liveCount` → `reservedCount`. `reservedSlots.size` is what's blocking the new server, not the live CONNECTED count (those differ when a reserved server is disconnected). - S7a: bump accounting-failure log level — `debugLogger.debug` (gated on debug=true) replaced by `process.stderr.write` so production daemons surface slot-leak / type-mismatch failures in journald/docker logs. (S7b — expose `reservedSlots[]` on the wire for slot-leak debugging — deferred as additive; will be in PR 14b alongside the typed events.) + 3 new core regression tests (C2 leak release, C3 lazy-spawn leak release, S4 env enforce-downgrade). 626/626 tests pass across the focused suite; typecheck + lint clean. * fixup(serve): address PR 14 review round 3 (#4247 wenshao second pass) Addresses @wenshao's second review pass on PR #4247 (submitted 15:56Z after round 2 fixup landed). Four code fixes + three doc clarifications. Code: - R3 #5: `readResource` lazy-spawn path now checks `isMcpServerDisabled` BEFORE the budget gate. Pre-existing gap: a server disabled via `mcpServers.<name>.disabled: true` or `/mcp disable <name>` could be resurrected by any resource read. Disabled precedence over budget mirrors the per-server cell logic. - R3 #6: `buildBudgetCells` now receives the post-disabled-filter `refusedCount` so the workspace cell matches the per-server cell precedence. Pre-fix a server disabled after being refused rendered `disabled` on its per-server row but `error: budget_exhausted` on the workspace row. - R3 #7: extract `MCP_BUDGET_WARN_FRACTION = 0.75` constant. Was hardcoded in `acpAgent.buildBudgetCells` AND `commands/serve.ts` stderr breadcrumb (the latter with `Math.ceil` divergence on non-integer multiples). Pre-extract so PR 14b's dual-threshold (0.75 warn + 0.375 rearm) lands in one file. - R3 #1: env-var enforce-without-budget downgrade (already fixed in round 2 `ba3e3febd` S4 — reply-only on the new thread). Docs: - R3 #2: docstring on `mcpTransportOf` now spells out the `tcp` vs `createTransport` divergence + records the deferred decision (PR 14b / future core). Closes the "comment claims X but code does Y" gap. - R3 #3: comments in both `discoverAllMcpTools` catch (release slot — stop() owns lifecycle) AND `discoverMcpToolsForServerInternal` catch (KEEP slot — operator intent + health-monitor retry). Different paths, different contracts, both explicit. - R3 #4: invariant note in `readResource` lookup→reserve sequence documenting the synchronous no-await guarantee that closes the TOCTOU window. + 3 new core regression tests (readResource disabled gate, disabled-wins-over-budget precedence, MCP_BUDGET_WARN_FRACTION pin). 629/629 tests pass; typecheck + lint clean. * fixup(serve): address PR 14 review round 4 (#4247 wenshao second + third pass) Addresses @wenshao's second + third review passes on PR #4247. One critical scope-correction (per-session vs per-workspace) + one zombie leak fix shared across three threads. Critical correction — per-session vs per-workspace (wenshao R3 line 117 docs): - Reality check: `acpAgent.newSessionConfig()` constructs a fresh `Config` + `ToolRegistry` + `McpClientManager` for EVERY ACP session. Each manager independently reads `QWEN_SERVE_MCP_CLIENT_BUDGET` env. So `--mcp-client-budget=10` with 5 sessions caps at 5 × 10 = 50 live MCP clients across the daemon, NOT 10. The "per-workspace" framing in v1 docs was incorrect. - Pragmatic v1 path (not the big refactor): rewrite docs + change `scope: 'workspace'` → `scope: 'session'` so the wire contract reflects reality. Wave 5 PR 23 (shared MCP pool) will introduce a workspace-scoped manager and add `scope: 'workspace'` cells alongside. - Files touched: `status.ts` + `sdk types.ts` (cell `scope` field widened to `'session' \| 'workspace' \| (string & {})` with v1 emitting `'session'`), `acpAgent.buildBudgetCells` (emits `'session'` + new code comment explaining the per-session truth), `docs/users/qwen-serve.md` (CLI flag + budget section relabel + ⚠️ v1 limitation callout), `docs/developers/qwen-serve-protocol.md` (capabilities section + JSON example + paragraph rewrite + per-session detection hint). Zombie leak fix — single weReserved-pattern fix in discoverMcpToolsForServerInternal closes wenshao R3 line 546 + R4 line 639 + R4 line 929: - Same pattern as R2 C3 (`readResource`): track `weReservedSlot = reservation === 'reserved' && this.reservedSlots.has(serverName)` (the set-membership guard distinguishes a real fresh reservation from `off`-mode's no-op return). On connect-failure, release slot + drop client only when `weReservedSlot`; an `'already_held'` reconnect keeps its slot so health-monitor retry doesn't compete for capacity. - Pre-fix a brand-new server connecting via /mcp reconnect / health monitor / incremental's serversToUpdate that failed on connect() would permanently consume a budget slot under enforce mode. - Updated R3's "always keep" doc comment to reflect the new two-mode cleanup (release on fresh + keep on reconnect). - Caught and added a tripwire test for the `off`-mode no-op edge case (`tryReserveSlot` returns `'reserved'` without adding to the set in off mode — without the has-guard, my fix would have broken the pre-existing "should restore health checks after failed server rediscovery" test by deleting the failed client even in unbudgeted operation). + 2 new core regression tests (fresh-reserve connect-failure releases slot, reconnect connect-failure keeps slot). 631/631 focused tests pass; typecheck + lint clean. * fixup(serve): address PR 14 review round 5 (#4247 wenshao fourth pass) Addresses @wenshao's fourth review pass on PR #4247. Two critical zombie-leak / staleness fixes; three reviewer findings deferred or already-addressed (replied + resolved on the threads). Critical fixes: - R5 line 956: `runWithDiscoveryTimeout` timeout handler now releases `reservedSlots.delete(serverName)` and drops the stale `lastRefusedServerNames` entry alongside the existing `clients.delete`. Pre-fix a timed-out server in `enforce` mode permanently held its budget slot; N consecutive timeouts permanently degraded daemon capacity. + regression test. - R5 line 1268-1: `readResource` lazy-spawn path drops the server from `lastRefusedServerNames` when `tryReserveSlot` returns `'reserved'` (a successful late re-reservation). Pre-fix a server refused at discovery but later re-reserved via `readResource` (e.g., after another server freed a slot) kept its stale `disabledReason: 'budget'` tag in the snapshot. + regression test. Reviewer findings deferred / already done (replied + resolved): - R5 line 1268-2 (`no try/catch around connect()` in readResource): stale view — R2 C3 fixup `ba3e3febd` added the try/catch with the weReservedSlot cleanup pattern. - R5 line 1274 (`BudgetExhaustedError.liveCount` semantic mismatch): R2 S6 fixup `ba3e3febd` renamed the param + readonly field to `reservedCount`, exactly matching the proposed semantic. - R5 acpAgent.ts null line (`Math.ceil(0.75 * budget)` for small budgets): proposed fix is semantically a no-op for integer liveCount — `liveCount >= 0.75` and `liveCount >= Math.ceil(0.75) === 1` give identical results when liveCount is an integer. The underlying "small budgets jump ok→error" observation is a real but inherent limitation of percentage-based thresholds at small N; design tradeoff, not implementation bug. 46/46 core tests pass (44 prior + 2 new R5 regression). Typecheck + lint clean. * fixup(serve): address PR 14 review round 6 (#4247 wenshao fifth pass) Addresses @wenshao's fifth review pass on PR #4247. Two critical fixes (one TOCTOU race, one cross-daemon env leak). Critical fixes: - R6 Thread 2 (line 956): remove the duplicate pre-reservation block in `discoverAllMcpToolsIncremental`. The reservation already happens inside `discoverMcpToolsForServerInternal` (R1 fix #1). With both sites reserving, the timeout cleanup raced against the inner connect path — `runWithDiscoveryTimeout`'s timeout handler could release the slot mid-flight while the inner `connect()` later resolved successfully, leaving a CONNECTED client with NO reservation and breaking `enforce`-mode budget enforcement. With pre-reservation removed, the inner call owns the entire reservation lifecycle (reserve → connect → release-on-failure-via-weReservedSlot → cleared-by-timeout-if-fires) at a single site. Refusal behavior is observably identical from outside. - R6 Thread 1 (runQwenServe.ts:216): per-handle env passthrough via new `BridgeOptions.childEnvOverrides` instead of mutating global `process.env`. Pre-fix concurrent embedded `runQwenServe()` handles with different MCP budgets would race on the global env — `defaultSpawnChannelFactory` snapshots `process.env` AT SPAWN TIME, so the last `runQwenServe()` call to set the var would silently win for ALL daemon handles' subsequent ACP child spawns. Wire surface: - `ChannelFactory` signature: `(workspaceCwd, childEnvOverrides?) => Promise<AcpChannel>`. - `BridgeOptions.childEnvOverrides?: Readonly<Record<string, string \| undefined>>` — `undefined` value means "scrub this var from the child env" so an embedded caller can wipe a stale inherited var without touching global state. - `defaultSpawnChannelFactory` merges overrides AFTER `SCRUBBED_CHILD_ENV_KEYS` so the daemon-only secret list still wins (operators can't override the scrub). - `runQwenServe` closes over per-handle overrides; never touches `process.env`. + 3 new regression tests (incremental refusal post-pre-reservation-removal, runQwenServe-doesn't-mutate-process.env, bridge forwards childEnvOverrides to channelFactory with two concurrent bridges asserting isolation). 327/327 focused tests pass; typecheck + lint clean. * fixup(serve): address PR 14 review round 7 (#4247 wenshao sixth pass) Addresses @wenshao's sixth review pass on PR #4247 (glm-5.1 via Qwen Code /review). One critical staleness fix + four real bug fixes + one operator-visibility breadcrumb + one refactor. Critical: - R7 #1 line 612: `discoverMcpToolsForServerInternal` now drops the entry from `lastRefusedServerNames` on successful connect+discover. Pre-fix a previously-refused server that reconnects via `/mcp reconnect` (or health-monitor retry after another server frees capacity) left the snapshot reporting `error / disabledReason: 'budget'` for a CONNECTED, working server until the next discovery pass cleared the per-pass log. Real bugs: - R7 #2 line 528: disabled gate added to `discoverMcpToolsForServerInternal`. Reachable from `/mcp reconnect`, OAuth re-discovery, and health-monitor `reconnectServer` — none of which previously checked `isMcpServerDisabled`. Pre-fix a disabled server could be resurrected through any of these paths, wasting a budget slot and registering tools the operator told us to ignore. Mirrors the bulk-discovery + readResource patterns. Optional-chain on the call to stay defensive against test fixtures missing the method. - R7 #3 line 634: transport leak in the `discoverMcpToolsForServerInternal` connect-failure catch. Pre-fix when `connect()` succeeded (transport established) and `discover()` later threw, the catch deleted the client reference without calling `client.disconnect()`, leaking the stdio child / socket until Node exit. Best-effort `await client.disconnect()` added before the map cleanup. - R7 #4 line 1302: `readResource`'s `weReservedSlot` now uses the same `reservation === 'reserved' && this.reservedSlots.has(serverName)` guard as `discoverMcpToolsForServerInternal`. Distinguishes a real fresh reservation from `off`-mode's no-op return. Maintenance-trap fix; in `off` mode the cleanup branch never fires now. - R7 #5 line 1342: `readResource` re-checks `isMcpServerDisabled` on EVERY call, regardless of whether the client was just lazy-spawned or pre-existing. Pre-fix a server connected pre-disable and then operator-disabled mid-session via settings reload still served resource reads via its existing CONNECTED client until the next incremental discovery pass called `removeServer`. Polish: - R7 #6 line 191: `readBudgetFromEnv` now emits a stderr breadcrumb when env values are invalid (`QWEN_SERVE_MCP_CLIENT_BUDGET=abc`, `QWEN_SERVE_MCP_BUDGET_MODE=foo`). Pre-fix operator typos silently fell through to "no enforcement". Same pattern as the `--require-auth` boot log. - R7 #7 line 464: extracted `dropRefusalEntry` (4 sites) + `refuseAndLog` (3 sites) helpers. Pure refactor, zero behavior change. The `readResource` refusal path now calls `refuseAndLog` before throwing `BudgetExhaustedError` so operators get the same stderr trail as bulk-discovery refusals. + 5 new core regression tests (refusal-cleared-on-success, internal-disabled-gate, discover-throw-disconnects, env-typo-breadcrumb, existing-client-disabled-rejected). 52/52 core tests pass; typecheck + lint clean. * fixup(serve): address PR 14 review round 8 (#4247 wenshao seventh pass) Addresses @wenshao's seventh review pass on PR #4247 (gpt-5.5 + DeepSeek/deepseek-v4-pro via Qwen Code /review). One critical transport leak + three soundness/consistency fixes; one optional clarity refactor explicitly deferred. Critical: - R8 #1 line 532 (4 duplicate threads): bulk-path transport leak. Mirrors the R7 #3 fix but in `discoverAllMcpTools` instead of the per-server path. Pre-fix: when `connect()` succeeded (transport established) and `discover()` later threw, the bulk catch deleted the client reference without calling `client.disconnect()`, leaking the stdio child / WebSocket / HTTP socket for the rest of the daemon's lifetime (`stop()` can't see what we just removed from `this.clients`). Best-effort `await client.disconnect()` added before `clients.delete` + `reservedSlots.delete`. Updated the doc comment that misleadingly claimed `stop()` was the lifecycle owner — true only for slot bookkeeping, not transports. Soundness: - R8 #2 line 431: tighten `readBudgetFromEnv` mode-without-budget downgrade. Originally only `enforce` got downgraded to `off` when no budget was set; `warn` mode without a budget threshold reached `emitBudgetTelemetry` with `clientBudget: undefined`, contradicting the JSDoc invariant `mode !== 'off' ⇒ clientBudget defined`. Now both `enforce` AND `warn` downgrade to `off` when no budget is configured. The invariant comment was also weakened to match the actual `?? 0` defense-in-depth (the new R8 #5 constructor downgrade closes the remaining edge case). - R8 #5 line 302: constructor mirrors the `readBudgetFromEnv` downgrade for the direct `budgetConfig` parameter. All production callers (CLI, `runQwenServe`, env-var fallback) validate upfront, but a future code path that injects `budgetConfig` directly without re-validating would re-introduce the silent fail-open. Defense in depth. - R8 #4 line 1221: distinguish fresh vs `'already_held'` reservations in `runWithDiscoveryTimeout`'s timeout handler. New private `freshReservations: Set<string>` field marked when `weReservedSlot === true` inside `discoverMcpToolsForServerInternal` and cleared in finally / catch / success. Timeout handler now releases the slot ONLY when `freshReservations.has(serverName)` — meaning the slot was freshly reserved by THIS in-flight call. `'already_held'` reconnect timeouts (a previously-healthy server's transient hiccup) keep the slot so health-monitor retry doesn't have to compete for capacity with new servers admitted during the timeout window. Aligns the timeout handler with the connect-failure catch's `weReservedSlot` semantics — closes the asymmetry wenshao R8 #4 caught. Deferred: - R8 #3 line 332 (`tryReserveSlot` `'observed'` return value clarity): optional, non-blocking style improvement that ripples through 3 call sites + many tests for zero behavior change. Worth doing in a focused refactor PR; flagged as deferred polish, not in this fixup. + 3 new core regression tests (bulk discover-throw disconnects, warn-no-budget downgrade, constructor enforce downgrade). 679/679 focused tests pass; typecheck + lint clean. * fixup(serve): address PR 14 review round 9 (#4247 wenshao eighth pass) Addresses @wenshao's eighth review pass on PR #4247 (glm-5.1 via Qwen Code /review). Six actionable findings adopted; two threads explained as not-actionable (one stale-view, one reviewer hallucination). Critical / real bugs: - R9 #2 line 1534: `readResource` lazy-spawn connect-failure catch now does best-effort `await client.disconnect()` BEFORE `clients.delete` + `reservedSlots.delete`. Mirror of R7 #3 (per-server discovery) and R8 #1 (bulk discovery) — closes the same transport-leak class for the third spawn path. Pre-fix: connect() establishing the transport but throwing on a later handshake step would orphan the stdio child / socket. - R9 #6 line 1521: `readResource` lazy `client.connect()` now wraps in `Promise.race` against `discoveryTimeoutFor(serverConfig)` — same per-server timeout the bulk + incremental paths use. Pre-fix a hung MCP server during a resource-read spawn would block forever and permanently consume a budget slot under enforce mode, cascading into total budget exhaustion. `serverConfig` lookup hoisted to the top of `readResource` so both lazy-spawn and existing-client branches use identical timeout behavior. - R9 #8 line 1514: `readResource` lazy spawn now calls `this.startHealthCheck(serverName)` after a successful connect. Pre-fix a lazy-spawned server that later disconnected (crash, network) had no automatic reconnect — sat DISCONNECTED until the next readResource or incremental pass. Mirrors `discoverMcpToolsForServerInternal`'s finally-block pattern. Operator-visibility: - R9 #7 (general): `readBudgetFromEnv` now writes a stderr breadcrumb when the `(enforce\|warn)`-without-budget downgrade fires. Pre-fix a Docker Compose / k8s env that set `QWEN_SERVE_MCP_BUDGET_MODE=enforce` but forgot the matching `_BUDGET=N` would silently boot with enforcement off and `mcp_guardrails` capability advertised — operator only signal was the snapshot's `budgetMode: 'off'`. Now mirrors the R7 #6 invalid-value breadcrumb pattern. Doc fixes: - R9 #4 line 81: `McpBudgetConfig.clientBudget` JSDoc now reflects the R4 per-session scope correction. The doc was a leftover from the original "per-workspace" framing — every other doc surface (protocol doc, user doc, type comments on the snapshot cell, capability tag) was rewritten in R4 except this one. - R9 #5 line 870: `acpAgent.buildBudgetCells` now spells out the `liveCount` (`accounting.total`, CONNECTED only — operator observability) vs `reservedSlots.size` (all reserved including in-flight — enforcement) semantic distinction. The intentional gap was undocumented in the type signatures, JSDoc, and protocol doc; future PR 14b SSE event payloads should reference both. Not adopted: - R9 #1 acpAgent:15: claimed "MCP_BUDGET_WARN_FRACTION not exported + getMcpClient* methods don't exist + 4 tsc errors" — verified incorrect: the constant IS exported (mcp-client-manager.ts:61), the 3 methods ARE class members (lines 379, 407, 412), and `npm run typecheck` is clean across all 4 workspaces. Reviewer's tool hallucinated this critical finding. - R9 #3 mcp:410: reported the bulk-path transport leak that R8 #1 (commit `7228813c5`) had already closed. Reviewer was on the pre-R8 commit view. + 2 new core regression tests (readResource lazy connect-fail disconnects + R9 #7 stderr breadcrumb). 57/57 core tests + 679/679 focused suite pass. Typecheck + lint clean. * fixup(serve): address PR 14 review round 10 (#4247 wenshao ninth pass) Two non-blocking 🟢 nits — both adopted for symmetry / explicitness. - R10 line 357: constructor downgrade now emits the same stderr breadcrumb the env-var path got in R9 #7. Pre-R10 the `(enforce\|warn)`-without-budget downgrade was silent for the direct-`budgetConfig` path, so a future caller bypassing CLI / env-var validation would have shipped a daemon advertising `mcp_guardrails` while silently disabling enforcement. Now boot logs surface the misconfiguration uniformly across all three resolution paths. - R10 line 1572: documented the `McpClient.disconnect()` cancel-pending-connect contract that the timeout-race cleanup relies on across all three spawn paths (lazy `readResource`, bulk `discoverAllMcpTools`, per-server `discoverMcpToolsForServerInternal`). The bulk path's production stability since #3889 is implicit evidence the contract holds; comment makes the assumption discoverable to the next reader and notes a follow-up unit test would be valuable. No behavior change. 57/57 core tests pass. Typecheck + lint clean.	2026-05-18 12:07:23 +08:00
ChiGao	d07c958bb5	feat(tui): add daemon adapter spike (#4202 ) * docs(tui): draft daemon adapter plan * feat(tui): add daemon adapter spike * fix(tui): harden daemon adapter event handling * fix(tui): report daemon prompt failures * fix(tui): surface daemon terminal failures * fix(tui): harden daemon adapter state handling * fix(tui): harden daemon adapter lifecycle * fix(tui): harden daemon adapter follow-ups --------- Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com>	2026-05-18 11:22:39 +08:00
ChiGao	4ab20ff6b8	feat(ide): add daemon connection spike (#4199 ) * docs(ide): draft daemon adapter plan * feat(ide): add daemon connection spike * fix(ide): harden daemon connection lifecycle * fix(ide): harden daemon permission replay * fix(ide): harden daemon connection lifecycle * fix(ide): harden daemon connection adapter * fix(ide): harden daemon permission routing * fix(ide): tighten daemon adapter review gaps --------- Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com>	2026-05-18 10:38:26 +08:00
ChiGao	11ba3856df	feat(channel): add daemon bridge spike (#4203 ) * docs(channel): draft daemon adapter plan * feat(channel): add daemon bridge spike * fix(channel): harden daemon bridge session lifecycle * fix(channel): handle daemon terminal frames * fix(channel): harden daemon bridge lifecycle * fix(channel): harden daemon bridge isolation * fix(channel): harden daemon bridge cancellation * fix(channels): close daemon bridge review nits --------- Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com>	2026-05-18 10:21:22 +08:00
tanzhenxin	d3cbe7614e	fix(core): preserve read-before-write state across idle microcompaction (#4243 ) * fix(core): preserve read-before-write state across idle microcompaction (#4239) Idle microcompaction blanked old tool outputs and then wiped the entire file-read tracking cache to keep the read_file "unchanged" fast path from pointing at content no longer in history. That wipe also destroyed the read-before-write record, so after any idle break the model was falsely told a file it had already read "has not been read in this session" and forced into a redundant re-read of an unchanged file before it could edit it. Replace the blanket wipe with per-file targeting: when microcompaction blanks a read/edit/write result, only that file's fast-path eligibility is disarmed; the on-disk fingerprint and "seen this session" markers are preserved, so edits/overwrites of an unchanged file proceed without a re-read while a file that actually changed on disk is still correctly caught. If a blanked result's file path cannot be resolved back to its cache entry, fall back to the old blanket wipe so a stale "unchanged" placeholder can never be served. * refactor(core): harden microcompaction fast-path eviction (PR #4243 review) - buildCallIdToFilePath now accumulates paths per callId (Map<string, string[]>) so a reused/duplicate functionCall.id disarms ALL candidate files instead of silently dropping one — over-disarming costs at most a redundant re-read, whereas keeping the wrong file armed would resurrect the issue #4239 dangling-placeholder hazard. - Add a guard-rail comment tying COMPACTABLE_TOOLS to FILE_PATH_TOOLS. - Document why readResidentInHistory is intentionally not reset on the fingerprint-drift branch. - Add tests: duplicate-id disarm, and a mixed on-disk/ghost evicted batch forcing the safe blanket wipe.	2026-05-18 10:05:44 +08:00
jinye	f44ed09412	feat(serve): preflight and env diagnostics routes (#4175 Wave 3 PR 13) (#4251 ) * feat(serve): introduce ServeErrorKind and BridgeTimeoutError (#4175 Wave 3 PR 13) Lay the type foundation for `/workspace/preflight` and `/workspace/env` (and the eventual MCP guardrails route) so cells emitted by all three share a closed `errorKind` taxonomy: - `SERVE_ERROR_KINDS` literal-list + `ServeErrorKind` union — the seven values from #4175 (`missing_binary`, `blocked_egress`, `auth_env_error`, `init_timeout`, `protocol_error`, `missing_file`, `parse_error`). - `BridgeTimeoutError` typed class — `withTimeout` now rejects with this rather than a plain `Error`, letting `mapDomainErrorToErrorKind` recognize init / heartbeat / extMethod timeouts via `instanceof` instead of regex-matching message strings. Message format is preserved bit-for-bit. - `mapDomainErrorToErrorKind` helper — one place to classify `BridgeTimeoutError`, `SkillError`, fs ENOENT/EACCES/EPERM, ModelConfigError subclasses (recognized by `name` field — they aren't on the public surface of `@qwen-code/qwen-code-core`), `SyntaxError`, plus message-regex fallbacks for legacy throw sites (`agent channel closed`, missing CLI entry path). - `ServeStatusCell.errorKind` tightened from open `string` to the closed `ServeErrorKind` union. Backward compatible — PR 12 never assigned the field. - SDK mirrors: `DAEMON_ERROR_KINDS` const + `DaemonErrorKind` type; `DaemonStatusCell.errorKind` tightened. Tests: 11 new unit tests in `status.test.ts` covering each mapping rule plus the BridgeTimeoutError shape. No route changes; no behavior changes for any existing path. * feat(serve): add buildEnvStatusFromProcess helper (#4175 Wave 3 PR 13) Pure helper that constructs the `/workspace/env` payload from `process.` state. No I/O, no ACP roundtrip, no globals beyond `process.env`. The route itself lands in the next commit. - `ServeEnvKind` discriminant: `runtime \| platform \| sandbox \| proxy \| env_var` - `ServeEnvCell extends ServeStatusCell` with `name` + optional `present` / `value`. Cells with `kind: 'env_var'` are presence-only — `value` is ALWAYS omitted to keep secret env vars off the wire even by accident. - `ServeWorkspaceEnvStatus` envelope: `{ v, workspaceCwd, initialized: true, acpChannelLive, cells, errors? }`. `initialized` is structurally `true` because env answers from the daemon process directly; `acpChannelLive` reports whether a child is up but does not change the payload shape. Whitelist policy: - Auth/secret keys (presence-only): OPENAI/ANTHROPIC/GEMINI/GOOGLE/DASHSCOPE/ OPENROUTER `_API_KEY`, `QWEN_SERVER_TOKEN`. - Non-secret keys (also presence-only for shape uniformity): base URLs, locale, TZ, NODE_EXTRA_CA_CERTS, QWEN_CLI_ENTRY. - Proxy vars (`HTTP_PROXY`/`HTTPS_PROXY`/`NO_PROXY`/`ALL_PROXY` + lowercase variants): credentials stripped via `redactProxyCredentials`, then `URL().host` so the wire only carries `host:port`. NO_PROXY is a host list rather than a URL so we pass the redacted form verbatim. SDK mirrors: `DaemonEnvKind`, `DaemonEnvCell`, `DaemonWorkspaceEnvStatus`. Tests: 9 unit tests covering the proxy-credential redaction, lowercase env fallback, NO_PROXY pass-through, presence-only `env_var` invariant (`'value' in cell === false`), whitelist enforcement, runtime tag detection, and envelope shape. feat(serve): add GET /workspace/env route (#4175 Wave 3 PR 13) Wire `buildEnvStatusFromProcess` from the previous commit through the bridge, server, and SDK so remote clients can pre-flight the daemon's runtime environment without spawning an ACP child. - `workspace_env` capability tag (always advertised on a current daemon). - `bridge.getWorkspaceEnvStatus()` answers entirely from `process.` — the route never consults ACP. `acpChannelLive` reflects whether a child exists but does not change the payload, so an idle daemon and a busy one return the same env shape. - `app.get('/workspace/env', ...)` mirrors PR 12's one-liner pattern. - SDK: `DaemonClient.workspaceEnv()` returning `DaemonWorkspaceEnvStatus`. - Docs: bullet in `docs/users/qwen-serve.md` calling out the presence-only redaction policy and the no-ACP-spawn guarantee. Tests: server-level (env returned + `'value' in env_var === false` assertion), bridge-level (idle and live both answer locally without hitting ACP extMethod), SDK-level (recording-fetch round-trip on `/workspace/env`). The `workspace_env` tag is added to the `EXPECTED_STAGE1_FEATURES` capability list assertion. feat(serve): add /workspace/preflight daemon-cells path (#4175 Wave 3 PR 13) Wire the preflight route. Daemon-level cells are populated unconditionally from `process.` and `node:fs`; ACP-level cells fall back to `not_started` placeholders when no child is alive so a poll never spawns one. - `workspace_preflight` capability tag. - `ServePreflightKind` discriminant (12 values: node_version, cli_entry, workspace_dir, ripgrep, git, npm — daemon-level — plus auth, mcp_discovery, skills, providers, tool_registry, egress — ACP-level). - `ServePreflightCell extends ServeStatusCell` with `locality: 'daemon' \| 'acp'` + free-form `detail`. `ServeWorkspacePreflightStatus` envelope. - `createIdleAcpPreflightCells()` factory: emits the six ACP-level cells with `status: 'not_started'` + a uniform `hint` so the bridge can stitch them in alongside daemon cells without ever calling ACP. - `bridge.getWorkspacePreflightStatus()`: - Daemon cells via `buildDaemonPreflightCells` (Promise.all over Node-version, CLI-entry resolution mirroring `defaultSpawnChannelFactory`, `fs.stat` on `boundWorkspace` with ENOENT/EACCES/EPERM mapped to `missing_file`, best-effort `canUseRipgrep` / `getGitVersion` / `getNpmVersion` warnings). - ACP cells via `requestWorkspaceStatus` — idle factory returns the `not_started` placeholders; live path delegates to ACP via the `qwen/status/workspace/preflight` ext method (handler lands in next commit). Bridge-side timeout / channel-close while consulting ACP folds into envelope `errors[]` with `mapDomainErrorToErrorKind` classification; daemon cells still render. - `app.get('/workspace/preflight', ...)` route + JSDoc bullet. - SDK: `DaemonPreflightKind` / `DaemonPreflightCell` / `DaemonWorkspacePreflightStatus` mirrors; `DaemonClient.workspacePreflight()`. Tests: server-level (route returns the bridge payload), bridge-level (idle returns 6 daemon + 6 ACP `not_started` cells without spawning a channel), SDK-level (`workspacePreflight()` round-trip). Capability test updated. feat(serve): wire ACP-side preflight cells (#4175 Wave 3 PR 13) Populate the six ACP-level preflight cells inside the ACP child so `/workspace/preflight` returns real values for live sessions. - `extMethod(qwen/status/workspace/preflight, ...)` dispatches to a new `buildAcpPreflightCells(config)` private method. - Five cell builders, each returning a `ServePreflightCell` with `locality: 'acp'`: - `auth`: `validateAuthMethod(authType, config)` returning non-null string → `auth_env_error`. Missing auth method → warning. Throws classified via `mapDomainErrorToErrorKind` with `auth_env_error` fallback. - `mcp_discovery`: rolls up `getMCPDiscoveryState()` + per-server `getMCPServerStatus(name)` counts. `connecting > 0` or in-progress discovery → warning + `init_timeout`; `disconnected > 0` post-discovery → error + `protocol_error`. - `skills`: `SkillManager.listSkills()`; SkillError throws are mapped via the helper (`PARSE_ERROR` → `parse_error`, `FILE_ERROR` → `missing_file`). - `providers`: `getAllConfiguredModels()`; empty list with a configured `authType` → warning + `auth_env_error`. ModelConfigError throws map to `auth_env_error`. - `tool_registry`: null registry → error + `protocol_error`. Otherwise surfaces tool count. - `egress`: stays `not_started`. PR 14 plugs in the real probe. - `errorCell` private helper extended with optional `errorKind` parameter; defaults to `mapDomainErrorToErrorKind(error)` so existing call sites (`mcp` / `skills` / `providers` envelope errors) automatically gain classification. Tests: 2 new acpAgent tests — preflight returns the six expected ACP cells with correct locality + statuses; preflight surfaces a `SkillError` (`PARSE_ERROR`) on the `skills` cell as `errorKind: 'parse_error'`. The core `vi.mock` block adds a SkillError class for `instanceof` matching inside `mapDomainErrorToErrorKind`. * docs(serve): preflight and env protocol section (#4175 Wave 3 PR 13) Document `/workspace/env` and `/workspace/preflight` end-to-end: - Common-cell shape: tighten `errorKind` from open `string` to the closed `DaemonErrorKind` enum (seven literals from #4175). Add an explicit redaction-policy paragraph covering env-var presence-only, proxy host:port reduction, and the whitelisted-secrets list. - Capability-tag list: add `workspace_env` and `workspace_preflight`. - New `### GET /workspace/env` section with sample payload, `DaemonEnvKind` / `DaemonEnvCell` types, and the redaction-policy paragraph spelling out which secret env vars are enumerated and how proxy URLs are reduced to `host:port`. - New `### GET /workspace/preflight` section with idle sample payload, `DaemonPreflightKind` / `DaemonPreflightCell` types, the seven-value `errorKind` semantics table, and the bridge-error fallback contract (mid-request ACP channel close → cells drop to `not_started` + envelope carries one `errors[]` entry). - Source-layout table: extend the `status.ts` row to mention the new `ServeErrorKind` / `BridgeTimeoutError` / `mapDomainErrorToErrorKind` surface; add a new `envSnapshot.ts` row.	2026-05-18 07:29:05 +08:00
qqqys	f84ddd434b	feat(core): fail impossible goals (#4230 ) Some checks are pending Qwen Code CI / Classify PR (push) Waiting to run Details Qwen Code CI / Lint (push) Blocked by required conditions Details Qwen Code CI / Test (macos-latest, Node 22.x) (push) Blocked by required conditions Details Qwen Code CI / Test (ubuntu-latest, Node 22.x) (push) Blocked by required conditions Details Qwen Code CI / Test (windows-latest, Node 22.x) (push) Blocked by required conditions Details Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions Details Qwen Code CI / CodeQL (push) Blocked by required conditions Details E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run Details E2E Tests / E2E Test - macOS (push) Waiting to run Details * feat(core): fail impossible goals * fix(core): refine impossible goal judgement * fix(core): include goal feedback when continuing * fix(core): clarify impossible goal terminal state * fix(core): harden impossible goal feedback * fix(core): log suppressed impossible verdicts * fix(goal): address review suggestions * test(goal): cover impossible parsing suggestions	2026-05-18 00:31:51 +08:00
Shaojin Wen	c93d66cd23	fix(serve): align build and integration test coverage (#4248 ) * fix(serve): align test coverage with build inputs Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * test(serve): address review feedback Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> --------- Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-18 00:01:47 +08:00
Shaojin Wen	ad23c7ab34	docs: user + design docs for --json-schema structured output (#4051 ) * docs: add user + design docs for --json-schema structured output Follows up #3598 (cli/core feature shipped to main, no docs). User doc `docs/users/features/structured-output.md` — covers quick-start, schema input forms (inline + `@path`), output shapes per `--output-format`, parse-time restrictions, retry/failure modes, privacy redaction, permission gating, MCP shadow-tool handling, and a worked `jq`-piped pipeline example. Registered under the existing `features/_meta.ts` so it shows up in the docs sidebar between "Headless Mode" and "Dual Output". Design doc `docs/design/structured-output/structured-output.md` — why the synthetic-tool-whose-param-schema-is-the-user-schema approach, the four-stage parse-time validation pipeline, `schemaRootAcceptsObject`'s decided-vs-deferred boundaries, main-turn vs drain-turn parity via `processToolCallBatch`, the structured- success terminal block, the cross-surface privacy redaction sharing `STRUCTURED_OUTPUT_REDACTED_ARGS`, subagent context handling (`forSubAgent`), MCP shadow-tool guard, the compatibility surface, alternatives considered (and why rejected), and a file-by-file index. Both docs are English-only — repo convention is English-only for both `docs/users/features/` (zero zh-CN siblings) and `docs/design/` (only `customize-banner-area/` has a zh-CN twin). Open to adding zh-CN translations as a separate PR if there's demand. * docs(structured-output): address PR review feedback User doc: - explicit stdout-vs-stderr contract and `{}`-schema behavior. - 500 ms shutdown-holdback latency note. - ReDoS warning for user-supplied `pattern` keywords. - root `$ref` rejection + `allOf` workaround. - per-retry token cost note. - sibling-suppression success vs retry paths split out. - numeric exit codes (1 / 53 / 130) for every failure mode. - new "Session resumption" section for --continue / --resume. Design doc: - gloss the ToolSearch on-demand-loading reference. - `not` row: drop the array-indexing-lookalike `[…]`. - 500 ms holdback is best-effort, not guaranteed. - redaction rationale extends to validation-failure retries. - `CORE_TOOLS` phrasing: structured_output is excluded FROM the set; skill is in a separate dynamically-discovered category. - subagent suppression maintainer note (single brittle call path). - `--bare` parenthetical lists the three retained core tools. - PR #4001 status (closed 2026-05-11, superseded). * docs(structured-output): correct empty-schema / holdback / SIGINT claims Three doc claims were stronger than the actual code behaviour: - Empty schema produces `{}`, not `null`. `turn.ts` normalises the tool args via `(fnCall.args \|\| {})` before they land in `structuredSubmission`, so a zero-arg call against `{}` is emitted as `{}` on stdout. The `?? null` in the adapter is defence-in-depth for the strictly-undefined case, which the upstream path doesn't produce. - Holdback is a cap, not a fixed wait. The loop guard is `Date.now() < deadline && registry.hasUnfinalizedTasks()`, so it exits immediately when nothing is in flight. Reword as "capped at ~500 ms" with an early-exit note. - SIGINT can still flush a captured result. The holdback loop does not poll the abort signal, so a SIGINT after the structured call is captured but before `adapter.emitResult` finishes may still land on stdout. Treat exit code 130 as the source of truth. Also addresses the new auto-review summary suggestion about per-turn schema cost: pull the cost callout up out of the bullet list (so it covers both retry cost and schema-embedded-every-turn cost), since the schema-embedding cost isn't retry-specific. * docs(structured-output): correct stdout/stderr + json-mode envelope claims Two doc claims didn't match `JsonOutputAdapter.emitResult`: - Model prose doesn't go to stderr in text mode. Only error messages and log lines do. Successful runs emit just the JSON-stringified payload on stdout; accumulated assistant prose is discarded entirely (not mirrored to stderr). Point users at `--output-format json` / `stream-json` when they need the prose. - `--output-format json` emits a JSON array, not a single document with top-level fields. The adapter calls `JSON.stringify(this.messages)` where `messages` is an array of message objects. `structured_result` lives on the final `type: "result"` element of that array, not at the document root, so consumers must read `.[-1].structured_result` rather than `.structured_result`. * docs(structured-output): note schema-itself reaches the provider The Privacy section so far only described `structured_output` args being redacted from local on-device surfaces (telemetry + chat recording). The schema body is a separate exposure surface — it ships as the function declaration's `parameters` block on every model request, so `enum`, `const`, `default`, `examples`, `description`, `$comment`, etc. travel to the provider in cleartext. Users defaulting to "redaction covers everything" could legitimately leak secrets via schema-literal fields. Add a callout in the user doc, plus a parallel paragraph in the design doc explaining why the redaction stops at on-device surfaces (the model needs the schema to satisfy the tool-call contract, so provider-side redaction isn't possible). * docs(structured-output): correct stdout-on-failure / ReDoS example / hooks / --bare deny / typo Five issues from the latest /qreview pass: - stdout-vs-stderr is text-mode only. In `--output-format json` and `stream-json`, the failure result message is emitted on stdout (final element of the JSON array, or the terminating `result` line on the JSONL stream). Wrappers in those modes must switch on `is_error`, not on whether stdout is empty. - ReDoS example didn't actually demonstrate the threat. JSON Schema `pattern` only fires on string instances, and tool args are always objects, so the bare `{"pattern": "(a+)+b"}` schema doesn't constrain anything the model can supply. Move the pattern inside a string-typed property. - Hooks see raw `tool_input`. `PreToolUse` / `PostToolUse` / `PostToolUseFailure` receive the unredacted args — including HTTP hooks that can forward off-device. Call this out explicitly so users with audit-style catch-all hooks know to filter or add hook-side redaction. - `--bare` drops settings-level deny. Bare mode builds `mergedDeny` as `[...(bareMode ? [] : settings.permissions.deny), …]` — settings-level denies are skipped while the synthetic tool stays registered. Argv-level `--exclude-tools` still applies. Document this exception in the user doc and the design doc. - `maxSessionTurns` hint typo. The hint points at "schema is unsatisfiable" — the original text inverted the polarity. * feat(core): PR-2.5 — post-promote stream redirect + natural-exit registry settle Closes the two limitations PR-2 (#3894) deferred for the Phase D part (b) Ctrl+B promote flow (#3831): 1. Post-promote stream redirect: today the `bg_xxx.output` file is frozen at promote time because `ShellExecutionService` detaches its data listener as part of PR-1's ownership-transfer contract. PR-2.5 wires a caller-side `onPostPromoteData` callback so bytes from the still-running child append to the file via an `fs.createWriteStream` opened in `handlePromotedForeground`. 2. Natural-exit registry settle: today the registry entry stays `'running'` until `task_stop` / session-end `abortAll` fires its abort listener. PR-2.5 wires `onPostPromoteSettle` so natural child exit transitions the entry to `'completed'` / `'failed'` with the right exitCode / signal / error message. ## Service (`shellExecutionService.ts`) - New exported types: `ShellExecuteOptions`, `ShellPostPromoteHandlers`, `ShellPostPromoteSettleInfo`. - `execute()` options bag now accepts `postPromote?: { onData, onSettle }`. Threaded through to both `executeWithPty` and `childProcessFallback`. - PTY's `performBackgroundPromote` (line ~1159): after disposing the foreground data + exit + error listeners, RE-ATTACH minimal forwarders that call `postPromote.onData` / `postPromote.onSettle` when the caller opted in. Backwards compat: when `postPromote` is unset the PR-2 detach-everything contract is preserved (the re-attach is gated on each callback being defined). - `childProcessFallback`'s `performBackgroundPromote` (line ~706): same pattern — re-attach `stdout.on('data', ...)`, `stderr.on('data', ...)`, `child.once('exit', ...)`, `child.once('error', ...)` when the caller opted in. `error` listener routes through `onSettle` with `error` populated, so spawn-side errors after the foreground errorHandler detached don't crash the daemon via the default unhandled `'error'` event. - Both paths wrap caller callbacks in try/catch so a thrown handler doesn't crash the child's data loop / unhandled-rejection the service. ## Shell tool (`shell.ts`) - New `PromoteArtifacts` type — slots shared between the foreground `execute()` postPromote handlers (which fire on the service side as soon as promote happens) and the post-resolve `handlePromotedForeground` finalizer (which runs after `await resultPromise` returns). The two race; the buffer + settle-queue absorb that race so neither chunks nor the eventual exit info are lost. - `executeForeground` wires `postPromote` handlers that route data to either `promoteArtifacts.stream` (if open) or `promoteArtifacts.buffer` (drained when the stream opens), and queue settle info if the wired handler isn't yet installed. - `handlePromotedForeground` opens `fs.createWriteStream(outputPath, { flags: 'w' })`, writes the initial snapshot first, drains the buffer, then registers the entry and wires `onSettleWired` with the full registry decision table: - `error` set → `registry.fail(shellId, error.message, endTime)` - `exitCode === 0` → `registry.complete(shellId, 0, endTime)` - non-zero exitCode → `registry.fail(shellId, "Exited with code N", endTime)` - signal !== null → `registry.fail(shellId, "Terminated by signal N", endTime)` - all-null fallback → `registry.fail(shellId, "Exited with unknown status", endTime)` - Fires queued settle synchronously after wiring so a fast command that exits between promote and finalizer doesn't get lost. - Self-audit catch: closes the output stream on the `registry.register` throw path so the FD doesn't leak past the orphan-child kill. ## Tests - 3 new in `shellExecutionService.test.ts`: - `post-promote bytes route to postPromote.onData when callback provided` - `postPromote.onSettle fires on natural child exit after promote` - `backwards compat: without postPromote, listeners stay fully detached` - 3 new in `shell.test.ts` under a `foreground → background promote PR-2.5` describe block: - `post-promote bytes APPEND to bg_xxx.output via write stream` - `natural child exit transitions registry entry to "completed"` - `non-zero exit / signal / error → "failed" with descriptive message` - Bulk-replaced 50 prior `{},` (empty 6th-arg shellExecutionConfig) with `expect.objectContaining({}),` + added `expect.objectContaining({ postPromote: expect.any(Object) }),` as the 7th-arg expectation for the foreground execute call. - Updated the existing `registers a bg_xxx entry on result.promoted` test to assert on `fs.createWriteStream` + `stream.write` instead of the now-removed `fs.writeFileSync` snapshot path. 182/182 shell.test.ts pass + 73/73 shellExecutionService.test.ts pass + 111/111 coreToolScheduler.test.ts pass + 60/60 AppContainer.test.tsx pass; tsc + ESLint clean. Self-audit: 3 rounds (positive / reverse / cross-file) found one issue — output stream FD leak on `registry.register` throw — and fixed it before flagging complete. All flagged edge cases (stream errors, child-exits-before-wire-up race, task_stop during natural- exit window, promote-never-happens cleanup, backwards compat without callbacks) have explicit handling and / or test pinning. * fix(core): #4102 review wave — 3 Critical + UTF-8 + tests 3 Critical race/correctness issues + 1 multibyte-corruption suggestion + 3 test coverage gaps addressed: Critical 1 — child_process late-chunk drop (service) Settle was fired on 'exit', but stdout/stderr can emit buffered data between 'exit' and 'close'. Late chunks landed in `promoteArtifacts.buffer` after shell.ts had already closed the stream + transitioned the registry → silently dropped → truncated `bg_xxx.output`. Switched to listening on 'close' which guarantees all stdio is fully drained. (code, signal) payload is identical to 'exit', just with proper ordering. Critical 2 — stream-flush wait before registry transition (shell) `stream.end()` is asynchronous; pending writes can still be in the libuv queue when it returns. The old code transitioned the registry immediately after `.end()`, so a /tasks consumer could observe a `completed` entry and read the output file BEFORE the trailing bytes were on disk. Fixed: wired settle now `stream.once('finish', ...)` BEFORE calling `registry.complete/fail`. `error` event also short-circuits to the transition so a late ENOSPC doesn't hang the settle path forever. Critical 3 — stream-open-fail buffer leak (shell) If `fs.createWriteStream` threw, the catch path set `stream = null` but the foreground `onData` handler would still take the `stream === null` branch and push chunks into `promoteArtifacts.buffer` — unbounded growth under a sustained child whose output file couldn't be opened. Added a `streamFailed: boolean` latch on `PromoteArtifacts`. When set, `onData` drops chunks (with a debug log) instead of buffering. The catch branch sets the latch. Suggestion — shared TextDecoder corrupts multibyte UTF-8 (service) child_process post-promote used ONE TextDecoder for both stdout AND stderr. The decoder's continuation-byte state machine assumes one byte source; interleaved multibyte chunks corrupted. Now uses separate decoders + flushes both with `decode()` (no `stream: true`) on settle so trailing bytes surface as their final characters. Suggestion — llmContent reflects already-settled status (shell) When the queued-settle drain transitions the registry synchronously (fast-exit race), the model-facing copy was still saying "Status: running. … task_stop({...})". Updated to branch on `postPromoteAlreadySettled` / `postPromoteFinalStatus` — when the process is already gone, the copy says "Status: completed/failed" and replaces the `task_stop` suggestion with "Process has already exited; no `task_stop` needed". Suggestion — test coverage gaps Added: (a) `queued-settle race: onSettle BEFORE handlePromotedForeground completes` — custom service impl fires onSettle synchronously before resolving the promote promise, pins the drain path. (b) child_process post-promote tests for stdout/stderr forwarding + 'close'-not-'exit' settle + spawn-error settle. Self-audit: Round 1 + reverse audit. Stream.once mock added to fire 'finish' synchronously so existing tests don't hang on the new flush wait. 76/76 shellExecutionService.test.ts (+3) + 183/183 shell.test.ts (+1) pass; tsc + ESLint clean. * fix(core): #4102 review wave-2 — 3 more from gpt-5.5 C1 (shell.ts:2227): the WriteStream `'error'` event handler only logged. `fs.createWriteStream` reports common open failures (ENOENT / EACCES / ENOSPC) asynchronously via that event rather than throwing. Result: `promoteArtifacts.stream` kept pointing at the failed stream; `onSettleWired` attached a `.once('finish')` listener that would never fire → registry stuck on `running` forever. Latch the failure (null the shared `stream` slot, set `streamFailed`); `onSettleWired`'s existing `if (!stream)` branch then transitions the registry immediately. C2 (shellExecutionService.ts:1468): the promote handoff removes the foreground `ptyErrorHandler` and only re-attaches data + exit listeners. A subsequent PTY `error` event had no listener — Node treats an unhandled `error` from an EventEmitter as a fatal exception that takes the whole CLI down. Attach a post-promote forwarder that ignores expected PTY read-exit codes (EIO / EAGAIN, same filter the foreground handler uses) and routes unexpected errors through `postPromote.onSettle` with `error` populated. Single-fire latch shared with `onExit` so settle never fires twice. C3 (shell.ts:2503): `onSettleWired` waits for the stream's asynchronous `'finish'` event before flipping `postPromoteAlreadySettled`, but the model-facing `statusLine` was built immediately after invoking `onSettleWired` on the queued settle. A fast-exited promoted command could therefore land "Status: running" + a `task_stop` instruction in production even though settle was already observed. Split into two flags: `postPromoteSettleObserved` (set synchronously when settle is classified) drives the model copy; the registry transition stays behind the stream flush. Tests: +1 PR-2.5 wave-2 PTY error-routing test; +2 shell.ts tests (stream open async error → registry still transitions; async `'finish'` after queued-settle drain → llmContent says 'completed' before registry transition fires). * fix(core): #4102 review wave-3 — 4 actionable from deepseek-v4-pro T2 (shell.ts:2456) — Critical buffer-leak race `onSettleWired` previously set `promoteArtifacts.stream = null` BEFORE calling `stream.end()`. Any `postPromote.onData` chunk that landed between that null assignment and the actual flush completing saw `stream === null && streamFailed === false` and pushed into `promoteArtifacts.buffer` — a buffer that has no further drain path (the foreground finalizer has already returned). Result: chunks stranded indefinitely; PTY mode in particular hits this because `onExit` can fire while kernel buffers still hold data. Fix drains the pre-settle buffer to the stream BEFORE nulling AND latches `streamFailed = true` so any subsequent chunk drops via the existing `else if (streamFailed)` arm in `onData` instead of leaking. Updates the `streamFailed` doc to cover both setters (open-fail and settle-done) so the dual semantic is explicit. T3 (shell.ts:2262) — silent chunk-drop in catch path When `fs.createWriteStream` throws synchronously (rare: ENOENT on a vanished tmpdir), chunks already in `promoteArtifacts.buffer` were silently lost with no observability — oncall reading a truncated `bg_xxx.output` had no way to distinguish "stream open failed" from "child produced nothing." Logs the dropped chunk count and empties the buffer. T5 (shell.ts:2443) — opaque all-null fallback The "Exited with unknown status" fallback fired the registry to 'failed' without any context about which fields were null. This branch is meant to be unreachable; hitting it indicates the service emitted a defective settle info object. Includes the field values in both the fail message and a warn log so the oncall engineer can tell this path apart from the other "failed" branches. T6 (shellExecutionService.ts:1452) — leaked PTY post-promote listeners `ptyProcess.onData(...)` returns an `IDisposable` that was being discarded; same for `onExit`. The `'error'` listener function was also not captured (no way to `removeListener` it). EventEmitter holds refs to listener closures, which transitively hold refs to `onPostData` / `onPostSettle` / the caller's `promoteArtifacts`. While bounded by the PTY's lifetime, the closures keep the caller's state pinned for the post-settle delay window. Captures all three handles into `postPromoteDataDisposable` / `postPromoteExitDisposable` / `postPromoteErrorListener`, then releases them via a shared `disposePostPromoteListeners()` call from `firePostSettle` (idempotent — each slot null-checked and nulled after disposal). Tests: +1 service test for IDisposable + error-listener cleanup; +2 shell.ts tests for buffer drain race and catch-path snapshot fallback. Existing tests stay green (262 → 265 in the touched suites; 7819 → 7822 across the core package). * fix(core/test): drop unused 'registry' in wave-3 T2 test (TS6133) CI build failed across all platforms with src/tools/shell.test.ts(4395,15): error TS6133. The variable was a leftover from copying the queued-settle test pattern; the wave-3 T2 test inspects writeStreamMock.write call history directly and never reads the registry, so the assignment is dead code. Drop it. * fix(core): #4102 review wave-4 — 6 actionable from gpt-5.5 + deepseek-v4-pro T1 (Critical, shellExecutionService.ts:860 child_process onSettle exactly-once) The PTY path used a `firePostSettle` latch but child_process wired `close` and `error` independently to `onPostSettle`. A spawn-side error followed by Node's auto-emitted `'close'` would call the caller's settle TWICE, racing the registry transition. Added the same single-fire latch on the child_process path. T2 (Critical, shell.ts:2264 handoff race reorder) Original order was `write(snapshot) -> drain buffer -> assign stream`. Synchronous today (no race in current code), but assign-after-drain leaves a hazard for any future refactor that adds an `await` inside the drain loop — a chunk arriving in that window would land in `promoteArtifacts.buffer`, then post-assign chunks would write to the stream first, producing out-of-order bytes until the settle drain. Reordered to `write(snapshot) -> assign stream -> drain buffer`, which closes the hazard regardless of future async additions. T3 (Suggestion, shellExecutionService.ts:816 decoder flush gated on onSettle) The trailing-multibyte flush ran inside the `child.once('close', ...)` handler, which was only installed when `onSettle` was set. An `onData`-only caller (no onSettle) lost trailing continuation bytes silently. Hoisted flush into `flushPostPromoteDecoders` called from `firePostSettle`, and made `firePostSettle` available on the `'close'` path independent of onSettle (T6 install). T4 (Suggestion, shell.ts:1700 promoted ANSI passthrough) The regular `executeBackground` path strips ANSI before writing to `bg_xxx.output`; the promoted-foreground onData path appended raw chunks. Reading `bg_xxx.output` after Ctrl+B showed plain text up to the snapshot then raw `\x1b[31m` / cursor-move / clear-screen sequences for the post-promote tail — unreadable. Apply `stripAnsi(rawChunk)` before write/buffer, matching the executeBackground contract. T5 (Suggestion, shellExecutionService.ts:786 UTF-8 hardcoded) The post-promote child_process decoders were hard-coded to `new TextDecoder('utf-8')`, but the foreground decoder runs encoding detection via `getCachedEncodingForBuffer`. On a non-UTF-8 child (e.g. GBK on a Chinese Windows shell), the snapshot decoded correctly but the post-promote tail was mojibake. Capture the foreground decoder's `.encoding` property and reuse it for post-promote (with utf-8 fallback if foreground hadn't seen any bytes yet, and a try/catch around `new TextDecoder` for the rare unsupported-encoding case). T6 (Suggestion, shellExecutionService.ts:1540 `error` listener gated on onSettle) The post-promote `error` listener was attached only when `onSettle` was set. An `onData`-only caller still had the foreground errorHandler detached; a post-promote spawn error would then crash the CLI via Node's unhandled-error default. Hoisted the close + error listeners into `if (postPromote)` so any caller opting into post-promote gets crash protection; if `onSettle` is absent the listeners log + drop instead of routing. T7 (Suggestion, shellExecutionService.ts:791 onSettle-only pipe-block deadlock) Same root cause as T6: when only `onSettle` is set, the foreground `stdout`/`stderr` 'data' listeners are detached and no post-promote listener replaces them. The Readables stay paused, the OS pipe buffer fills (~64KB on Linux), the child blocks on `stdout.write`, 'close' never fires, onSettle never fires. Added `child.stdout?.resume()` and `child.stderr?.resume()` in the no-onData branch so the child can drain its pipes and reach exit. T8 (Suggestion, shell.ts:2614 dead inspectLine ternary) `inspectLine`'s ternary returned the same string on both sides — copy-paste leftover from when the other two adjacent ternaries (statusLine / stopLine) were correctly varied. Collapsed to a single string assignment. Tests: +5 regression tests (4 child_process: T1 double-fire latch, T3 onData-only flush, T6 onData-only error survives, T7 onSettle- only resume; +1 shell.ts: T4 ANSI strip). 265 -> 270 in the touched suites; 7822 -> 7827 across the core package; full suite green. * fix(core/test): use ShellOutputEvent type in wave-4 onData callbacks (TS2345) CI lint failed on the wave-4 (T3 / T6) tests with TS2345: pushing ShellOutputEvent into Array<{type:string;chunk:unknown}> narrows incompatibly. Switch to ShellOutputEvent[] (matches earlier helpers at lines 758/966) and discriminate the union via .type === 'data' when reading .chunk so the narrowed multibyte assertion still type-checks. * docs(structured-output): address doudouOUC's four review findings - Tighten JSON/stream-json paragraph: not all failures emit a result to stdout (exit 53 / exit 130 are stderr-only); check exit code first - Fix suppressed-sibling retry guidance: re-issue in a separate turn that does not include structured_output (avoids re-suppression) - Distinguish settings-deny (exit 53) from --exclude-tools (exit 1) in Permission gating section - Replace <projectDir> placeholder with actual path ~/.qwen/projects/<sanitized-cwd>/chats/<sessionId>.jsonl in both docs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs(structured-output): fix Permission gating — both deny paths strip registration Forward audit against source code found that the Permission gating section incorrectly distinguished settings.permissions.deny (claiming tool stays visible, exit 53) from --exclude-tools (claiming declaration stripped, exit 1). Both go through the same mergedDeny → isToolEnabled path and both prevent registration — the model never sees the tool. Corrected both docs to reflect the actual mechanism: typical outcome is plain text (exit 1), with maxSessionTurns (exit 53) as the fallback if the model loops through other tools. * docs(structured-output): address doudouOUC's May 17 review (5 items) - Clarify validation is client-side Ajv, not provider-side - Qualify "same way" with DeclarativeTool abstraction parenthetical - Match symptom→cause structure for maxSessionTurns hint - Expand $ref workaround with concrete $defs example - Clarify Dual Output See Also doesn't require --json-schema * docs(structured-output): address 2 unresolved design-doc suggestions 1. Privacy/redaction section: note hooks as intentionally non-redacted surface (matches user-doc "Hooks see raw args" callout). 2. Dual call-site section: clarify differing post-helper termination flow between main-turn (direct return) and drain-turn (sentinel hop). * docs(structured-output): address doudouOUC's May 17 review (2 nits) 1. Failure-paths table: align "three common causes" cell with the symptom→cause framing already used at parse-time validation pipeline section ("common stuck-run symptom and its two likely causes"). 2. Dual call-site section: fix factual inaccuracy from prior commit — `drainOneItem` is `async (): Promise<void>` and returns nothing. The two-hop termination is via closure-mutated `structuredSubmission` (set by `processToolCallBatch`, checked by `drainLocalQueue` and the holdback loop), not a return-value sentinel. --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-05-17 23:10:34 +08:00
xMillogx	79f38e0701	fix: add cache limits to prevent OOM during build/test (#4188 ) * fix: add cache limits to prevent OOM during build/test * chore: remove intermediate OOM analysis document * fix(core): enforce MAX_TOTAL_PATHS cap when updating existing crawlCache key Before: !crawlCache.has(key) guard in the MAX_TOTAL_PATHS eviction loop short-circuited eviction when updating an existing key, allowing cache to grow beyond 50,000 paths (F1 bug - OOM protection bypassed). After: totalPaths is calculated excluding the current key, and the eviction loop protects the key being written from being evicted as "largest". FIFO bump (delete+set) ensures frequently updated keys move to end of queue. Per @wenshao review: MAX_CACHE_ENTRIES guard on line 59 is preserved (updating existing key doesn't increase entry count). * test(core): add cache eviction tests and fix MAX_TOTAL_PATHS loop guard * test(core): add eviction coverage for fileReadCache and crawlCache * test(core): unskip bumped entries eviction test now that upsert bump is implemented --------- Co-authored-by: Юрий Острожных <millog@mail.ru>	2026-05-17 23:01:23 +08:00
jinye	672de88a47	fix(serve): align integration test mirrors with merged capability + EventBus changes (#4245 ) - qwen-serve-routes.test.ts: expand expected features list to 24, adding slow_client_warning (#4237) and workspace_mcp/workspace_skills/ workspace_providers/session_context/session_supported_commands (#4241). Matches EXPECTED_STAGE1_FEATURES in server.test.ts:76-101. - qwen-serve-baseline.test.ts: update SSE backpressure assertion from 3 to 4 frames (tick, tick, slow_client_warning, client_evicted). PR #4237 changed EventBus to force-push a slow_client_warning synthetic frame when the per-subscriber queue reaches the 75% warn threshold, before the client_evicted terminal frame fires on overflow. Mirrors the unit test at eventBus.test.ts:103-122. Both integration mirrors drifted because integration tests only run on schedule / workflow_dispatch (release.yml:4-9), not PR CI. Fixes the release run 25992130532 failure in both Docker and No-Sandbox jobs. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)	2026-05-17 22:53:11 +08:00
jinye	60fe594e8f	feat(serve): add read-only status routes (#4241 ) * feat(serve): add read-only status routes Add read-only daemon status endpoints for workspace MCP, skills, providers, session context, and session supported commands. Expose matching typed SDK helpers and document the new additive v1 status surface. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(serve): harden read-only status snapshots Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(serve): address read-only status review feedback Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> --------- Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-17 21:37:20 +08:00
jinye	aef35c390e	feat(serve): session metadata and close/delete lifecycle (#4175 Wave 2.5 PR 11) (#4240 ) * feat(serve): session metadata and close/delete lifecycle (#4175 Wave 2.5 PR 11) Add explicit session close and metadata management to the daemon serve infrastructure, closing the Stage 1 limitation that sessions could only end via child crash or daemon shutdown. - DELETE /session/:id — force-closes a live session (cancels active prompt, resolves pending permissions, publishes session_closed event) - PATCH /session/:id/metadata — update mutable displayName - Enriched GET /workspace/:id/sessions with createdAt, displayName, clientCount, hasActivePrompt - session_closed + session_metadata_updated SDK event types with validation, reducer, and terminal event priority - DaemonClient.closeSession / updateSessionMetadata + session client wrappers - Capabilities: session_close, session_metadata * fix(serve): address review feedback on session lifecycle PR - Fix JSDoc on closeSession: clarify that bridge throws SessionNotFoundError (SDK absorbs 404 for client-side idempotency) - Tighten event validators: isSessionClosedData checks closedBy type, isSessionMetadataUpdatedData checks displayName type - PATCH /session/:id/metadata now returns effective stored metadata instead of echoing request fields, avoiding ambiguous no-op responses - Only publish session_metadata_updated event when displayName changes - Update chooseTerminalEvent comment to reflect session_closed * fix: address PR 4240 review feedback Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix: address remaining PR 4240 suggestions Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix: update serve sessions test mock Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> --------- Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-17 20:42:15 +08:00
JerryLee	9abd704e09	fix(cli): record mid-turn queued user prompts (#4215 )	2026-05-17 20:21:06 +08:00
jinye	4e06967c2b	feat(serve): mutation gating helper and --require-auth (#4236 ) Some checks are pending Qwen Code CI / Classify PR (push) Waiting to run Details Qwen Code CI / Lint (push) Blocked by required conditions Details Qwen Code CI / Test (macos-latest, Node 22.x) (push) Blocked by required conditions Details Qwen Code CI / Test (ubuntu-latest, Node 22.x) (push) Blocked by required conditions Details Qwen Code CI / Test (windows-latest, Node 22.x) (push) Blocked by required conditions Details Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions Details Qwen Code CI / CodeQL (push) Blocked by required conditions Details E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run Details E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run Details E2E Tests / E2E Test - macOS (push) Waiting to run Details * feat(serve): mutation gating helper and --require-auth Implements issue #4175 Wave 4 PR 15. Adds the centralized state-changing-route gate that Wave 4 follow-ups (memory CRUD, file edit, MCP restart, device-flow auth) will reuse, plus the `--require-auth` deployment knob that hardens the loopback developer default for shared dev hosts / CI runners. - `createMutationGate({ tokenConfigured, requireAuth })` factory in serve/auth.ts — per-route middleware with a 4-cell behavior matrix: pass-through under `requireAuth` or any token configured; `401 token_required` for `strict: true` routes on no-token loopback defaults; baseline pass-through otherwise. - Existing Wave 1-2 mutation routes (POST /session, /session/:id/{load, resume,prompt,cancel,model}, /permission/:requestId) opt into the default non-strict factory call as the centralization marker. Wave 4 routes will pass `{ strict: true }` to require a token even on loopback. - `--require-auth` CLI flag + `ServeOptions.requireAuth`. Boot refuses without a token; closes the `/health` exemption when on so loopback `/health` also requires bearer auth; stderr breadcrumb so the hardened mode is visible in journald/docker logs. - Conditional `require_auth` capability tag advertised only when the flag is on. New `CONDITIONAL_SERVE_FEATURES` registry primitive so future per-deployment toggles follow the same shape. - 5 new unit tests in auth.test.ts covering the gate matrix; 5 added in server.test.ts for capability advertisement, conditional tag, /health 401 under --require-auth, and runQwenServe boot refusal + happy path. 245/245 serve tests pass; typecheck + eslint clean. Refs: #4175 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4236 review feedback Three small follow-ups from the automated reviewers on PR #4236: 1. Drop misleading `--require-auth` from `token_required` error message (Copilot inline auth.ts:262). The strict-mode 401 listed three remediations but `--require-auth` is paired-required with a token at boot — naming it standalone would loop the operator into a different boot error. Keep the two valid standalone fixes (env var, --token); add inline note explaining the omission. `auth.test.ts` regex updated to `not.toMatch(/--require-auth/)` to anchor the new wording. 2. Mention `/health` gating in `--require-auth` CLI description (auto-reviewer Medium #2). Operators flipping the flag without reading the protocol doc would get paged when k8s/Compose probes start 401-ing. One sentence in the yargs description prevents that. 3. Drift insurance comment between registry and `CONDITIONAL_SERVE_FEATURES` (auto-reviewer Low #3). Document the four-step procedure for adding a new conditional tag so a future contributor doesn't update only the registry and silently advertise the tag unconditionally. Notes the Map<predicate> refactor as the right move when a second tag lands. Deferred (not in this fix-up): - Module-level PASSTHROUGH singleton (High #1) — micro-optimization, unmeasurable. - Map<feature, predicate> for conditional features (High #2) — premature abstraction with one tag. - Per-route `// non-strict marker` comments (Medium #1) — noise. - `@see` cross-ref in types.ts (Low #2) — sugar. - JSDoc bullet-list vs table (Low #1) — current format is fine. Refs: #4175 #4236 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fixup(serve): address PR #4236 round-2 review feedback Five small follow-ups from @wenshao + DeepSeek (via Qwen Code /review) on PR #4236: 1. Map<predicate> refactor for `CONDITIONAL_SERVE_FEATURES` (review threads #3254467192 + #3254485912). Two reviewers asked for the same shape on the grounds that the `Set` + per-feature `if`-branch needed FOUR coordinated changes per new conditional tag and silently fail-CLOSED when the branch was missed. The Map collapses the predicate-decision and the set-membership into one entry per feature — adding a new conditional tag is now two coordinated changes (registry + Map entry) and a missing predicate is a TypeScript error rather than a silent omission. JSDoc updated. 2. Drift-insurance test that iterates `CONDITIONAL_SERVE_FEATURES` (review thread #3254467192 option 1, layered on top of #1). `server.test.ts` now walks every Map entry and asserts the predicate accepts/rejects as expected; future entries that don't add an assertion branch fail the test loudly so a missing predicate cannot ship silently. Adoption-of-record for the Map shape rather than relying on a hand-maintained invariant. 3. Cache `strictDenier` for allocation symmetry (review thread #3254467193). Wave 4 PRs will mount strict mode on multiple routes; without the cache each `mutate({strict:true})` call would allocate a fresh 401 closure. Now both the passthrough and the strict denier are pre-built singletons. Identity assertion in `auth.test.ts` anchors the cache so a future change that loses it surfaces in CI. 4. Doc cosmetic — extra blank line in qwen-serve.md (review thread #3254467198). Single blank line between the `>` quoted example and the following non-quoted bash block now. 5. Doc correctness — `require_auth` is post-auth confirmation (review thread #3254485910 from DeepSeek). When `--require-auth` is on, the global `bearerAuth` middleware gates every route including `/capabilities`, so an unauthenticated client cannot pre-flight `caps.features` to discover that auth is required — the discovery surface is the 401 response body itself. Both `qwen-serve.md` and `qwen-serve-protocol.md` rewritten to describe the tag as a post-authentication confirmation, matching the auth.ts JSDoc which already stated this correctly. Trade-offs documented (no code change): - Body-parser ordering (review thread #3254485915 from DeepSeek) noted as a comment block in `auth.ts`. Strict-mode 401 fires AFTER `express.json()` because the gate is per-route middleware. On loopback no-token defaults a strict route therefore parses the request body before refusing it — bounded by `express.json({limit: '10mb'})` × `--max-connections` (256 default). Strict routes Wave 4 actually adds carry small bodies in legitimate use, so this isn't a production hot path. Future routes accepting large bodies should lift the gate to app-level (maintain a strict-path Set in `createServeApp`); flagged as a Wave 4 follow-up rather than re-architecting the helper. - `bearerAuth` body-shape inconsistency (review thread #3254467197 from @wenshao) flagged as a Wave 4 cross-PR follow-up. `bearerAuth` returns `{error: 'Unauthorized'}` while the strict gate returns `{code: 'token_required', error: '...'}`; SDK clients have to branch on both shapes. Standardizing `bearerAuth` to also carry a `code` field is orthogonal to this PR's scope. Validation: 260/260 cli serve tests pass (was 258 — added the drift insurance test + strict denier identity test); typecheck + eslint clean. Refs: #4175 #4236 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) --------- Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-17 20:10:37 +08:00
易良	eef06ce376	feat(cli): add structured memory diagnostics JSON (#3785 ) * feat(cli): add memory diagnostics doctor command * fix(core): platform-aware maxRSS conversion and accurate risk message - Extract platform detection before building diagnostics so the correct unit conversion can be applied: multiply by 1024 on Linux (where process.resourceUsage().maxRSS is in KB) but leave the value unchanged on macOS/Windows (where it is already in bytes). - Correct the native-memory-pressure risk message to accurately state that the threshold is 2× heap used, not just "larger than heapUsed". - Add a dedicated test to assert that maxRSS is not multiplied on a non-Linux platform (darwin). All 3 core and 9 CLI tests pass; typecheck clean. Agent-Logs-Url: https://github.com/QwenLM/qwen-code/sessions/9b413337-68ed-4d5c-af99-0d42378900c3 * test(core): cover active request memory risk * fix(cli): address memory diagnostics review feedback * fix(cli): harden memory diagnostics review fixes * fix(memory-diagnostics): tighten risk thresholds and expand readable output - Add 64MB absolute floor on native-memory-pressure so cold processes don't trip the 2x ratio check; raise active-handles threshold from 100 to 256 - Show detachedContexts, nativeContexts, maxRSS, CPU times, smapsRollup availability, and v8HeapSpaces summary in the readable /doctor memory output - Validate unknown memory subcommand args with a usage hint instead of silently dropping them - Wrap human-readable strings in t(...) for i18n parity with the rest of doctor - Advertise the memory subcommand via /doctor argumentHint while keeping acceptsInput false so the parent still auto-submits - Document _getActiveHandles/_getActiveRequests as undocumented Node internals - Update tests for new thresholds, expanded output, unknown-arg path, and abort-during-json * fix(cli): harden memory doctor diagnostics * fix(core): correct maxRSS byte handling and heapRatio consistency - Remove incorrect * 1024 multiplier for maxRSS on Linux (Node.js >=14.10 returns bytes on all platforms) - Use v8HeapStats.usedHeapSize for heapRatio to avoid cross-API inconsistency - Update test expectations and rename "does not multiply" test * fix(cli): resolve rebase conflicts in memory diagnostics - Rename local formatMemoryDiagnostics to formatCoreDiagnostics to avoid naming conflict with the imported utility from memoryDiagnostics.js - Update Session.test.ts to use objectContaining for _meta field added in recent main commits - Align doctorCommand.test.ts assertions with current parent command state (argumentHint includes --sample/--snapshot from main) * fix(core): use null instead of undefined for optional probes, deduplicate active count helpers - optionalProbe/optionalSyncProbe now return null on failure so JSON.stringify preserves the keys instead of silently omitting them. - Merge getActiveHandlesCount/getActiveRequestsCount into a single parameterized getProcessInternalCount helper. - Update MemoryDiagnostics interface: v8HeapSpaces, openFileDescriptors, smapsRollup are now T \| null instead of T \| undefined. * fix(cli): finish memory diagnostics review fixes * fix(cli): address memory diagnostics review feedback --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-05-17 19:52:46 +08:00
Yan Shen	9985d91e08	feat(cli): add configurable plansDirectory for Plan Mode (#4062 ) * feat(cli): add configurable plansDirectory for Plan Mode Add a plansDirectory setting that allows users to define a custom directory for approved Plan Mode files. Relative paths are resolved against the project root and validated to prevent path traversal. - Storage: add isPathWithinDirectory() with realpathSync-based symlink resolution to prevent traversal bypass attacks (direct, intermediate, and cross-drive) - Config: cache plansDir at construction time, use atomic write (write-temp then rename) to prevent corrupted plan files on crash - CLI: respect bareMode by clearing plansDirectory in minimal mode - Docs: document plansDirectory with requiresRestart and gitignore hint - Tests: 26 new tests covering path validation, symlink attacks (direct and intermediate), Windows cross-drive paths, mixed separators, and configuration integration Closes #3548 * fix(core): align symlink test with return value * fix(core): harden plans directory handling * fix(config): address PR #4062 review findings for plansDirectory - Handle EXDEV during atomic plan writes (cross-device rename fallback) - Sanitize session IDs to prevent path traversal in plan filenames - Expand tilde (~) in configured plansDirectory paths - Preserve plansDirectory in bare mode - Add EACCES/EPERM handling to getPlanFileNames with user-visible warnings - Close TOCTOU gap with post-write path containment validation - Fix docs to clarify plansDirectory is a top-level key - Add happy-path I/O tests for configured plansDirectory	2026-05-17 19:43:24 +08:00
jinye	d2d426fad0	feat(serve): SSE replay sizing + slow_client_warning backpressure (#4175 Wave 2.5 PR 10) (#4237 ) * feat(serve): SSE replay sizing + slow_client_warning backpressure #4175 Wave 2.5 PR 10. Closes the SSE replay / backpressure knobs called out in #3803 §02 so chatty Stage 1 sessions get an honest reconnect window and operators get a heads-up signal before clients are summarily evicted. - `DEFAULT_RING_SIZE` 4000 → 8000. Per-session replay ring depth now matches the #3803 §02 target for chatty sessions. - `--event-ring-size <n>` CLI flag (default 8000) lets operators tune the ring per daemon. Threaded `ServeOptions` → `BridgeOptions.eventRingSize` → both `new EventBus()` construction sites (fresh sessions + restore path). Validation is fail-CLOSED (positive finite integer; 0 / NaN / negative throw at boot). - `slow_client_warning` SSE frame. When a subscriber's queue crosses 75% full the bus force-pushes a synthetic `slow_client_warning` to that subscriber once per overflow episode, carrying `{queueSize, maxQueued, lastEventId}`. The flag re-arms after the queue drains below 37.5% (hysteresis, no flap near threshold). If the queue actually overflows after the warning, the existing `client_evicted` terminal frame path still fires. Like `client_evicted`, the warning has no `id` (synthetic frame; must not burn a sequence slot for other subscribers). - `?maxQueued=N` query param on `GET /session/:id/events` (range `[16, 2048]`, default 256). Lets cold reconnect clients pre-size their per-subscriber backlog so a large `Last-Event-ID: 0` replay doesn't trip the warning on the first publish. Range rationale: lower bound 16 (smaller is useless for any replay); upper bound 2048 (so a single subscriber can't pin ~1 MB just by asking). Out-of-range / non-decimal returns `400 invalid_max_queued` BEFORE opening the SSE stream — clean 4xx beats half-opening a stream + emitting a `stream_error` (which EventSource would auto-reconnect on). - `slow_client_warning` capability tag — single source of truth for the warning frame + `?maxQueued` query param + ring-size knob. Old daemons silently lack all of these; pre-flight via `caps.features`. - SDK extensions (`@qwen-code/sdk`): typed `DaemonSlowClientWarningEvent` (added to known event union and `DaemonStreamLifecycleEvent`); schema-validated by a new `isSlowClientWarningData` predicate; reducer (`reduceDaemonSessionEvent`) increments `slowClientWarningCount` + stores `lastSlowClientWarning`. Warning is non-terminal — `alive` stays true (only `client_evicted` / `stream_error` / `session_died` close the stream). Re-exported from the public SDK entry. - Docs: `qwen-serve-protocol.md` updates the features list (adds `slow_client_warning` and the previously-missing `client_identity` to match reality post-#4231), documents the `?maxQueued` query param, adds the warning frame to the event table, and notes the new default ring size. `qwen-serve.md` adds the `--event-ring-size` flag row. Tests: 19 eventBus (4 new: warning at 75%, once per episode, no `id` on the synthetic frame, hysteresis re-arm), 106 bridge (2 new: validate eventRingSize accept/reject), 111 server (4 new: ?maxQueued accept/absent/non-decimal/out-of-range + EXPECTED_STAGE1_FEATURES update), 14 SDK daemonEvents (2 new: schema validation + non-terminal reducer behavior). 321 focused tests total, all green. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * refactor(serve): adopt PR #4237 review feedback (eventBus polish) Address the actionable items from the Qwen Code review bot's pass on PR #4237: - Pre-compute `warnThreshold` / `warnResetThreshold` per `InternalSub` at `subscribe()` time so `publish()`'s per-event hot path is one integer compare per subscriber instead of a multiply + compare. The `!warned` short-circuit still collapses the steady state to a single boolean read; this just shaves a multiply when the threshold check actually fires. - Document the back-of-queue ordering choice for the synthetic `slow_client_warning` frame in `EventBus.publish()`: front-push was considered but mid-stream front-insertion would mis-count `forcedInBuf` in `BoundedAsyncQueue.next()`, and `forcePush` already short-circuits via `resolvers.shift()` for the active-consumer case — the back-of-queue path only matters for stalled consumers, who can't drain regardless of warning position. - Reuse the existing `collect()` helper in the "default ring size 8000" test for consistency with the rest of the file; the new test also tightens the assertion by checking that the first retained event id is 2 (id=1 dropped by the ring) and the last is 8001. - Soften the "~500 B per session" magic number in `BridgeOptions.eventRingSize`'s JSDoc to a qualitative description (each retained `BridgeEvent` is a reference plus its serialized payload; ceiling scales as `ringSize × average-event-size`). Rejected: - Bot's claim that the error JSON contains `\`...\`` escape sequences — bot misread the JS template-literal source as the wire output; `JSON.stringify` does not escape backticks, and the existing `cwd` error messages use the same style. - Bot's "use `Record<string, never>` instead of `[key: string]: unknown`" suggestion on `DaemonSlowClientWarningData` — every other event-data type in `sdk-typescript/src/daemon/events.ts` carries the same index signature for additive-field compatibility. - Bot's "features list breaks alphabetical order" — the capability list is grouped by protocol lifecycle (health → capabilities → session lifecycle → events → permissions), not alphabetical. Tests: 139 focused tests across eventBus + httpAcpBridge + SDK daemon events — all passing. Behavior unchanged; this is hot-path micro-opt + comment polish only. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(serve): correct queue tagging + plumb maxQueued through SDK Address both P2 findings from the Codex review pass on PR #4237. Bug 1: `BoundedAsyncQueue.forcedInBuf` position-invariant break The previous `forcedInBuf` counter only tracked LIVE-vs-FORCED correctly when all forced entries lived at the FRONT of the buffer (subscribe-time `Last-Event-ID` replay). The new mid-stream `slow_client_warning` path force-pushes to the BACK of the queue while the queue is still open, which the existing accounting was not designed for: - publish 6 events at maxQueued=8 → 75% threshold trips → force-push warning at the back → buf=[1..6, warning], forcedInBuf=1. - consumer shifts `1` → forcedInBuf decremented to 0 (incorrect: `1` was a live frame, not the forced one). - consumer drains 2..6 + warning → buf=[], forcedInBuf=0, true live count = 0, but `size` getter and `push()` cap check then use `buf.length - forcedInBuf` which drifts over subsequent refills, causing premature warn / eviction before the cap is actually reached. Replace the position-dependent counter with a per-entry `{value, forced}` tag. `liveCount` is incremented in `push()` / decremented in `next()` only when the shifted entry was non-forced — position becomes irrelevant. `size` getter returns `liveCount` directly. The class doc comment is rewritten to call out that the new tag is the position-independent replacement for the old "forced frames must stay at the front" invariant. Regression test in `eventBus.test.ts` reproduces the codex trace (warn at 75%, drain past warning, refill to cap) and asserts no premature eviction. Bug 2: SDK does not expose `?maxQueued` `docs/users/qwen-serve.md` and `docs/developers/qwen-serve-protocol.md` both document `?maxQueued=N` as something SDK clients can request, but `SubscribeOptions` on `DaemonClient` only declared `lastEventId` + `signal`, and `subscribeEvents()` always fetched `/events` without a query string. Typed-SDK consumers had no way to opt in without hand-crafting URLs. - Add `SubscribeOptions.maxQueued?: number` with JSDoc noting the daemon range `[16, 2048]` and the pre-flight requirement on `caps.features.slow_client_warning`. - `DaemonClient.subscribeEvents` builds the URL with an optional `?maxQueued=<n>` segment. No client-side range validation — the daemon's `parseMaxQueuedQuery` is the source of truth and returns structured `400 invalid_max_queued`; duplicating the bounds in two layers would diverge on the next tweak. - `DaemonSessionSubscribeOptions extends SubscribeOptions` so the new field flows through `DaemonSessionClient` automatically. Three new SDK tests: - subscribeEvents appends `?maxQueued=N` when set - omits the query string when absent (existing behavior preserved) - propagates a `400 invalid_max_queued` unchanged Tests: 214 focused tests across eventBus / bridge / SDK DaemonClient / DaemonSessionClient / daemonEvents, plus 111 in the server suite. All green; the new eventBus regression case proves the position-invariant fix. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * refactor(serve): adopt PR #4237 copilot review feedback Address 6 of 8 copilot-reviewer findings on PR #4237; the other 2 (#1 forcedInBuf live-size corruption, #5 SDK lacks maxQueued) were already fixed in `bae42c88b` — replied on the threads with the commit hash. - [2] server.ts:1068 — `?maxQueued=` (present-but-empty) now fails closed with `400 invalid_max_queued` instead of silently falling back to the default queue cap. The API documents fail-closed for any malformed value before opening SSE, so an empty string is unambiguously malformed. New server.test.ts case locks this in. - [3] commands/serve.ts:93 — CLI help text for `--event-ring-size` no longer mis-shapes `Last-Event-ID` as a query parameter. It is an HTTP header, and the daemon's SSE route does not parse a `?Last-Event-ID=` query. - [4] docs/developers/qwen-serve-protocol.md:351 — clarify that `?maxQueued=N` controls the LIVE-event backlog cap. Replay frames are force-pushed and exempt from the cap; what consumes it is live events that arrive while the subscriber is still draining a cold-reconnect replay. Bumping for cold reconnects is still the right answer, but for the live tail, not for the replay frames themselves. - [6] eventBus.ts:214 — stale `ringSize=4000` performance comment updated to the new `ringSize=8000` default with a note about the O(n) `shift()` cost scaling. - [7] sdk-typescript events.ts:492 — `isSlowClientWarningData` now uses the existing `isFiniteNumber` helper instead of bare `typeof === 'number'`. Mirrors the sibling predicates and rejects `NaN` / `Infinity` payloads as schema garbage. New daemonEvents.test.ts assertions cover both. - [8] server.ts:127 — `createServeApp`'s default-bridge construction now also forwards `opts.eventRingSize` to `createHttpAcpBridge`, symmetric with the `runQwenServe.ts` path. Direct embeds / tests that called `createServeApp` without supplying their own bridge but did pass `ServeOptions.eventRingSize` were silently getting the default 8000 ring. Tests: 326 focused tests across eventBus / bridge / SDK DaemonClient / DaemonSessionClient / daemonEvents / server. All green; the new server.test.ts case + the extended daemonEvents.test.ts assertions cover the tightened guards. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * refactor(serve): adopt PR #4237 wenshao round-2 review feedback Six adopted findings from @wenshao's second review pass on PR #4237. The seventh ([10] forcedInBuf 3rd case invariant) was already fixed in `bae42c88b` — replied on that thread. - [9] + [14] server.ts — Sanitize attacker-controlled values before stderr interpolation in both `parseMaxQueuedQuery` and `parseLastEventId`. New `safeLogValue()` helper uses `JSON.stringify` to escape control characters (`\n`/`\r`/…) so a URL-encoded newline in `?maxQueued=%0a` can't inject extra log lines into journald/Loki/Splunk pipelines. Matches the `workspace_mismatch` sanitization style in `sendBridgeError`. Fixed in both helpers (the sibling pre-existing `parseLastEventId` had the same shape) so the file stays consistent. - [11] httpAcpBridge.ts — `!Number.isFinite(eventRingSize)` was redundant: `Number.isInteger(NaN)` and `Number.isInteger(Infinity)` both return `false`, so the sibling `!Number.isInteger` already catches both. Drop the dead guard. - [12] httpAcpBridge.ts — Add soft upper bound `MAX_EVENT_RING_SIZE = 1_000_000` on `eventRingSize` to catch operator typos (`--event-ring-size 80000000` vs `8000000`). At ~500 B per `BridgeEvent` an 1M-frame ring already pins ~500 MB per session — well past any realistic workload. Not a security boundary (operator-controlled flag), pure typo defense. Existing bridge construction test extended with an `80_000_000` case. - [13] commands/serve.ts — CLI `--event-ring-size` flag now sources its default from `DEFAULT_RING_SIZE` (imported from `serve/eventBus.js`) instead of the hardcoded literal `8000`. Without this, a future bump of the bus default would silently not take effect for daemons launched through the CLI because the flag always overrides — single source of truth fixes that. - [15] eventBus.ts — Drop unreachable `event.id ?? this.lastEventId` fallback in the `slow_client_warning` frame. `event` is locally constructed at the top of `publish()` with `id: this.nextId++` and is guaranteed defined. Use `event.id as number` directly + an inline note about the invariant. Tests: 197 (eventBus 20 / bridge 107 / SDK DaemonClient 57 / SDK daemonEvents 14) + 112 server. All green; the new upper-bound bridge case + the existing log assertions pin the changed behaviors. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) --------- Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-17 19:30:43 +08:00
JerryLee	6a1d55c737	fix(core): apply tool name migrations at dispatch (#4213 )	2026-05-17 19:04:08 +08:00
jinye	0a4a08e443	feat(serve): add client heartbeat (#4175 Wave 2.5 PR 9) (#4235 ) * feat(serve): add client heartbeat route Adds POST /session/:id/heartbeat plus SDK helpers so long-lived adapters (TUI/IDE/web) can refresh the daemon's last-seen bookkeeping. Bridge stores per-session and per-client timestamps behind a getHeartbeatState() snapshot accessor that PR 12 read-only diagnostics and PR 24 revocation policy will consume. - Capability tag: client_heartbeat (advertised on /capabilities.features) - Identified clients must echo X-Qwen-Client-Id; the bridge validates the id BEFORE bumping any timestamp so a forged id can't mask client absence - Per-client entries are dropped together with the registration ref-count in unregisterClient, so churn doesn't leak stale ids - getHeartbeatState returns a snapshot Map; mutating it does not leak into bridge state - Anonymous heartbeats bump only the per-session watermark Errors mirror the rest of the routes — 404 SessionNotFoundError, 400 invalid_client_id (header malformed or unknown for this session). Roadmap PR 9 from #4175. Depends on PR 7 (#4231 client identity, merged) for the trusted clientId registry. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * feat(sdk): re-export HeartbeatResult from package root The published @qwen-code/sdk only exposes the root entrypoint via `exports`; daemon subpath imports are not part of the public API. Adding HeartbeatResult to packages/sdk-typescript/src/daemon/index.ts made it reachable internally but not for downstream consumers writing `import type { HeartbeatResult } from '@qwen-code/sdk'` — every other daemon result type (PromptResult, SetModelResult, DaemonSession, etc.) is forwarded through the root barrel, so HeartbeatResult was the only hole in the heartbeat helper's public surface. Inserted alphabetically between DaemonStreamLifecycleEvent and KnownDaemonEvent to match the existing ordering convention.	2026-05-17 18:57:28 +08:00
jinye	07e0e82258	feat(serve): advertise typed_event_schema + pin SDK public surface (#4175 PR 4 follow-up) (#4226 ) * feat(serve): advertise typed_event_schema capability Follow-up to #4217 (`feat(protocol): add typed daemon event schema v1`, Wave 1 PR 4 of #4175), which landed the SDK-side typed schema + `KnownDaemonEvent` union + reducer but did not register a daemon-side capability tag for it. Without the tag, non-SDK clients (web debug UI, third-party adapters, channel/IDE backends not yet on `@qwen-code/sdk`) have no way to detect at the protocol envelope level that the daemon promises to emit only `KnownDaemonEvent`-shaped frames — they would either pin against SDK version, or pre-flight every frame defensively. Add `typed_event_schema: { since: 'v1' }` to `SERVE_CAPABILITY_REGISTRY`, inserted right after `session_events` (the route that delivers the frames whose schema this tag describes). The capability is purely informational — `narrowDaemonEvent`/`asKnownDaemonEvent` already fall back to "unknown" for older daemons that don't advertise the tag, so the SDK does not gate any behavior off the tag. Sync `EXPECTED_STAGE1_FEATURES` (server.test.ts) and the integration test array (qwen-serve-routes.test.ts) with the registry order, the same lockstep discipline #4214 codified. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * test(sdk): pin typed event surface at the public SDK entry, point DaemonSessionClient docstring at it Two small follow-ups to #4217 (Wave 1 PR 4 of #4175). 1. Public-entry regression fence `@qwen-code/sdk` is a single-entry package: `package.json.exports` only exposes `.` (`dist/index.{cjs,mjs,d.ts}`), and the bundle is built from `src/index.ts`. Symbols re-exported only from `src/daemon/index.ts` are unreachable to consumers unless they are also forwarded by `src/index.ts`. #4217 forwards the typed event schema correctly today, but the two-layer chain has no compile-time test pinning it — a future daemon export that lands in `src/daemon/index.ts` but is missed by `src/index.ts` would ship invisibly. Add `test/unit/daemon-public-surface.test.ts` that imports `* as Public from '../../src/index.js'`, asserts at runtime that every PR 4 value is `typeof === 'function'` (or a primitive of the expected shape), round-trips a raw `DaemonEvent` through the public `asKnownDaemonEvent` to prove the wire-up actually works, and compile-imports every PR 4 type so any drift fails to build. 2. DaemonSessionClient docstring pointer The class docstring already deferred typed event consumption to "the protocol schema layer" without a concrete pointer. Now that #4217 has put `asKnownDaemonEvent` and `reduceDaemonSessionEvent` in `./events.js`, name them so future readers can find the typed surface without grepping. No code change. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)	2026-05-17 18:43:38 +08:00
易良	ab03badaae	fix(core): extend DashScope provider detection with additional hostname rules (#4157 ) * fix(core): extend DashScope provider detection with additional hostname rules Add internal domain patterns (.dw.alibaba-inc.com and .data.aliyun-inc.com) to isDashScopeProvider() so that DashScope-compatible private gateways are recognized without requiring DASHSCOPE_PROXY_BASE_URL. Closes #4138 * feat(core): add providerType field for explicit DashScope provider opt-in Allow modelProviders entries to declare providerType: 'dashscope' so that custom endpoints on private gateways are recognized as DashScope-compatible without relying solely on hostname detection. Closes #4138 * revert(core): remove DashScope providerType opt-in * refactor(core): broaden dashscope internal domain match to .alibaba-inc.com / .aliyun-inc.com Drop the business-specific dw./data. subdomain checks and treat any .alibaba-inc.com or .aliyun-inc.com host as DashScope-compatible. If a future internal service under these domains is not compatible, narrow it via a runtime provider property rather than re-introducing business-name subdomains. * fix(core): clarify internal dashscope origin matching * docs(core): clarify DashScope internal host matching	2026-05-17 18:36:42 +08:00
kkhomej33-netizen	a5e4839e07	fix(cli): restore ACP prompt counter on resume (#4233 )	2026-05-17 18:32:44 +08:00
Shaojin Wen	0240c310fd	feat(core): PR-2.5 — post-promote stream redirect + natural-exit registry settle (#3831 follow-up) (#4102 ) * feat(core): PR-2.5 — post-promote stream redirect + natural-exit registry settle Closes the two limitations PR-2 (#3894) deferred for the Phase D part (b) Ctrl+B promote flow (#3831): 1. Post-promote stream redirect: today the `bg_xxx.output` file is frozen at promote time because `ShellExecutionService` detaches its data listener as part of PR-1's ownership-transfer contract. PR-2.5 wires a caller-side `onPostPromoteData` callback so bytes from the still-running child append to the file via an `fs.createWriteStream` opened in `handlePromotedForeground`. 2. Natural-exit registry settle: today the registry entry stays `'running'` until `task_stop` / session-end `abortAll` fires its abort listener. PR-2.5 wires `onPostPromoteSettle` so natural child exit transitions the entry to `'completed'` / `'failed'` with the right exitCode / signal / error message. - New exported types: `ShellExecuteOptions`, `ShellPostPromoteHandlers`, `ShellPostPromoteSettleInfo`. - `execute()` options bag now accepts `postPromote?: { onData, onSettle }`. Threaded through to both `executeWithPty` and `childProcessFallback`. - PTY's `performBackgroundPromote` (line ~1159): after disposing the foreground data + exit + error listeners, RE-ATTACH minimal forwarders that call `postPromote.onData` / `postPromote.onSettle` when the caller opted in. Backwards compat: when `postPromote` is unset the PR-2 detach-everything contract is preserved (the re-attach is gated on each callback being defined). - `childProcessFallback`'s `performBackgroundPromote` (line ~706): same pattern — re-attach `stdout.on('data', ...)`, `stderr.on('data', ...)`, `child.once('exit', ...)`, `child.once('error', ...)` when the caller opted in. `error` listener routes through `onSettle` with `error` populated, so spawn-side errors after the foreground errorHandler detached don't crash the daemon via the default unhandled `'error'` event. - Both paths wrap caller callbacks in try/catch so a thrown handler doesn't crash the child's data loop / unhandled-rejection the service. - New `PromoteArtifacts` type — slots shared between the foreground `execute()` postPromote handlers (which fire on the service side as soon as promote happens) and the post-resolve `handlePromotedForeground` finalizer (which runs after `await resultPromise` returns). The two race; the buffer + settle-queue absorb that race so neither chunks nor the eventual exit info are lost. - `executeForeground` wires `postPromote` handlers that route data to either `promoteArtifacts.stream` (if open) or `promoteArtifacts.buffer` (drained when the stream opens), and queue settle info if the wired handler isn't yet installed. - `handlePromotedForeground` opens `fs.createWriteStream(outputPath, { flags: 'w' })`, writes the initial snapshot first, drains the buffer, then registers the entry and wires `onSettleWired` with the full registry decision table: - `error` set → `registry.fail(shellId, error.message, endTime)` - `exitCode === 0` → `registry.complete(shellId, 0, endTime)` - non-zero exitCode → `registry.fail(shellId, "Exited with code N", endTime)` - signal !== null → `registry.fail(shellId, "Terminated by signal N", endTime)` - all-null fallback → `registry.fail(shellId, "Exited with unknown status", endTime)` - Fires queued settle synchronously after wiring so a fast command that exits between promote and finalizer doesn't get lost. - Self-audit catch: closes the output stream on the `registry.register` throw path so the FD doesn't leak past the orphan-child kill. - 3 new in `shellExecutionService.test.ts`: - `post-promote bytes route to postPromote.onData when callback provided` - `postPromote.onSettle fires on natural child exit after promote` - `backwards compat: without postPromote, listeners stay fully detached` - 3 new in `shell.test.ts` under a `foreground → background promote PR-2.5` describe block: - `post-promote bytes APPEND to bg_xxx.output via write stream` - `natural child exit transitions registry entry to "completed"` - `non-zero exit / signal / error → "failed" with descriptive message` - Bulk-replaced 50 prior `{},` (empty 6th-arg shellExecutionConfig) with `expect.objectContaining({}),` + added `expect.objectContaining({ postPromote: expect.any(Object) }),` as the 7th-arg expectation for the foreground execute call. - Updated the existing `registers a bg_xxx entry on result.promoted` test to assert on `fs.createWriteStream` + `stream.write` instead of the now-removed `fs.writeFileSync` snapshot path. 182/182 shell.test.ts pass + 73/73 shellExecutionService.test.ts pass + 111/111 coreToolScheduler.test.ts pass + 60/60 AppContainer.test.tsx pass; tsc + ESLint clean. Self-audit: 3 rounds (positive / reverse / cross-file) found one issue — output stream FD leak on `registry.register` throw — and fixed it before flagging complete. All flagged edge cases (stream errors, child-exits-before-wire-up race, task_stop during natural- exit window, promote-never-happens cleanup, backwards compat without callbacks) have explicit handling and / or test pinning. * fix(core): #4102 review wave — 3 Critical + UTF-8 + tests 3 Critical race/correctness issues + 1 multibyte-corruption suggestion + 3 test coverage gaps addressed: Critical 1 — child_process late-chunk drop (service) Settle was fired on 'exit', but stdout/stderr can emit buffered data between 'exit' and 'close'. Late chunks landed in `promoteArtifacts.buffer` after shell.ts had already closed the stream + transitioned the registry → silently dropped → truncated `bg_xxx.output`. Switched to listening on 'close' which guarantees all stdio is fully drained. (code, signal) payload is identical to 'exit', just with proper ordering. Critical 2 — stream-flush wait before registry transition (shell) `stream.end()` is asynchronous; pending writes can still be in the libuv queue when it returns. The old code transitioned the registry immediately after `.end()`, so a /tasks consumer could observe a `completed` entry and read the output file BEFORE the trailing bytes were on disk. Fixed: wired settle now `stream.once('finish', ...)` BEFORE calling `registry.complete/fail`. `error` event also short-circuits to the transition so a late ENOSPC doesn't hang the settle path forever. Critical 3 — stream-open-fail buffer leak (shell) If `fs.createWriteStream` threw, the catch path set `stream = null` but the foreground `onData` handler would still take the `stream === null` branch and push chunks into `promoteArtifacts.buffer` — unbounded growth under a sustained child whose output file couldn't be opened. Added a `streamFailed: boolean` latch on `PromoteArtifacts`. When set, `onData` drops chunks (with a debug log) instead of buffering. The catch branch sets the latch. Suggestion — shared TextDecoder corrupts multibyte UTF-8 (service) child_process post-promote used ONE TextDecoder for both stdout AND stderr. The decoder's continuation-byte state machine assumes one byte source; interleaved multibyte chunks corrupted. Now uses separate decoders + flushes both with `decode()` (no `stream: true`) on settle so trailing bytes surface as their final characters. Suggestion — llmContent reflects already-settled status (shell) When the queued-settle drain transitions the registry synchronously (fast-exit race), the model-facing copy was still saying "Status: running. … task_stop({...})". Updated to branch on `postPromoteAlreadySettled` / `postPromoteFinalStatus` — when the process is already gone, the copy says "Status: completed/failed" and replaces the `task_stop` suggestion with "Process has already exited; no `task_stop` needed". Suggestion — test coverage gaps Added: (a) `queued-settle race: onSettle BEFORE handlePromotedForeground completes` — custom service impl fires onSettle synchronously before resolving the promote promise, pins the drain path. (b) child_process post-promote tests for stdout/stderr forwarding + 'close'-not-'exit' settle + spawn-error settle. Self-audit: Round 1 + reverse audit. Stream.once mock added to fire 'finish' synchronously so existing tests don't hang on the new flush wait. 76/76 shellExecutionService.test.ts (+3) + 183/183 shell.test.ts (+1) pass; tsc + ESLint clean. * fix(core): #4102 review wave-2 — 3 more C1 (shell.ts:2227): the WriteStream `'error'` event handler only logged. `fs.createWriteStream` reports common open failures (ENOENT / EACCES / ENOSPC) asynchronously via that event rather than throwing. Result: `promoteArtifacts.stream` kept pointing at the failed stream; `onSettleWired` attached a `.once('finish')` listener that would never fire → registry stuck on `running` forever. Latch the failure (null the shared `stream` slot, set `streamFailed`); `onSettleWired`'s existing `if (!stream)` branch then transitions the registry immediately. C2 (shellExecutionService.ts:1468): the promote handoff removes the foreground `ptyErrorHandler` and only re-attaches data + exit listeners. A subsequent PTY `error` event had no listener — Node treats an unhandled `error` from an EventEmitter as a fatal exception that takes the whole CLI down. Attach a post-promote forwarder that ignores expected PTY read-exit codes (EIO / EAGAIN, same filter the foreground handler uses) and routes unexpected errors through `postPromote.onSettle` with `error` populated. Single-fire latch shared with `onExit` so settle never fires twice. C3 (shell.ts:2503): `onSettleWired` waits for the stream's asynchronous `'finish'` event before flipping `postPromoteAlreadySettled`, but the model-facing `statusLine` was built immediately after invoking `onSettleWired` on the queued settle. A fast-exited promoted command could therefore land "Status: running" + a `task_stop` instruction in production even though settle was already observed. Split into two flags: `postPromoteSettleObserved` (set synchronously when settle is classified) drives the model copy; the registry transition stays behind the stream flush. Tests: +1 PR-2.5 wave-2 PTY error-routing test; +2 shell.ts tests (stream open async error → registry still transitions; async `'finish'` after queued-settle drain → llmContent says 'completed' before registry transition fires). * fix(core): #4102 review wave-3 — 4 actionable T2 (shell.ts:2456) — Critical buffer-leak race `onSettleWired` previously set `promoteArtifacts.stream = null` BEFORE calling `stream.end()`. Any `postPromote.onData` chunk that landed between that null assignment and the actual flush completing saw `stream === null && streamFailed === false` and pushed into `promoteArtifacts.buffer` — a buffer that has no further drain path (the foreground finalizer has already returned). Result: chunks stranded indefinitely; PTY mode in particular hits this because `onExit` can fire while kernel buffers still hold data. Fix drains the pre-settle buffer to the stream BEFORE nulling AND latches `streamFailed = true` so any subsequent chunk drops via the existing `else if (streamFailed)` arm in `onData` instead of leaking. Updates the `streamFailed` doc to cover both setters (open-fail and settle-done) so the dual semantic is explicit. T3 (shell.ts:2262) — silent chunk-drop in catch path When `fs.createWriteStream` throws synchronously (rare: ENOENT on a vanished tmpdir), chunks already in `promoteArtifacts.buffer` were silently lost with no observability — oncall reading a truncated `bg_xxx.output` had no way to distinguish "stream open failed" from "child produced nothing." Logs the dropped chunk count and empties the buffer. T5 (shell.ts:2443) — opaque all-null fallback The "Exited with unknown status" fallback fired the registry to 'failed' without any context about which fields were null. This branch is meant to be unreachable; hitting it indicates the service emitted a defective settle info object. Includes the field values in both the fail message and a warn log so the oncall engineer can tell this path apart from the other "failed" branches. T6 (shellExecutionService.ts:1452) — leaked PTY post-promote listeners `ptyProcess.onData(...)` returns an `IDisposable` that was being discarded; same for `onExit`. The `'error'` listener function was also not captured (no way to `removeListener` it). EventEmitter holds refs to listener closures, which transitively hold refs to `onPostData` / `onPostSettle` / the caller's `promoteArtifacts`. While bounded by the PTY's lifetime, the closures keep the caller's state pinned for the post-settle delay window. Captures all three handles into `postPromoteDataDisposable` / `postPromoteExitDisposable` / `postPromoteErrorListener`, then releases them via a shared `disposePostPromoteListeners()` call from `firePostSettle` (idempotent — each slot null-checked and nulled after disposal). Tests: +1 service test for IDisposable + error-listener cleanup; +2 shell.ts tests for buffer drain race and catch-path snapshot fallback. Existing tests stay green (262 → 265 in the touched suites; 7819 → 7822 across the core package). * fix(core/test): drop unused 'registry' in wave-3 T2 test (TS6133) CI build failed across all platforms with src/tools/shell.test.ts(4395,15): error TS6133. The variable was a leftover from copying the queued-settle test pattern; the wave-3 T2 test inspects writeStreamMock.write call history directly and never reads the registry, so the assignment is dead code. Drop it. * fix(core): #4102 review wave-4 — 6 actionable T1 (Critical, shellExecutionService.ts:860 child_process onSettle exactly-once) The PTY path used a `firePostSettle` latch but child_process wired `close` and `error` independently to `onPostSettle`. A spawn-side error followed by Node's auto-emitted `'close'` would call the caller's settle TWICE, racing the registry transition. Added the same single-fire latch on the child_process path. T2 (Critical, shell.ts:2264 handoff race reorder) Original order was `write(snapshot) -> drain buffer -> assign stream`. Synchronous today (no race in current code), but assign-after-drain leaves a hazard for any future refactor that adds an `await` inside the drain loop — a chunk arriving in that window would land in `promoteArtifacts.buffer`, then post-assign chunks would write to the stream first, producing out-of-order bytes until the settle drain. Reordered to `write(snapshot) -> assign stream -> drain buffer`, which closes the hazard regardless of future async additions. T3 (Suggestion, shellExecutionService.ts:816 decoder flush gated on onSettle) The trailing-multibyte flush ran inside the `child.once('close', ...)` handler, which was only installed when `onSettle` was set. An `onData`-only caller (no onSettle) lost trailing continuation bytes silently. Hoisted flush into `flushPostPromoteDecoders` called from `firePostSettle`, and made `firePostSettle` available on the `'close'` path independent of onSettle (T6 install). T4 (Suggestion, shell.ts:1700 promoted ANSI passthrough) The regular `executeBackground` path strips ANSI before writing to `bg_xxx.output`; the promoted-foreground onData path appended raw chunks. Reading `bg_xxx.output` after Ctrl+B showed plain text up to the snapshot then raw `\x1b[31m` / cursor-move / clear-screen sequences for the post-promote tail — unreadable. Apply `stripAnsi(rawChunk)` before write/buffer, matching the executeBackground contract. T5 (Suggestion, shellExecutionService.ts:786 UTF-8 hardcoded) The post-promote child_process decoders were hard-coded to `new TextDecoder('utf-8')`, but the foreground decoder runs encoding detection via `getCachedEncodingForBuffer`. On a non-UTF-8 child (e.g. GBK on a Chinese Windows shell), the snapshot decoded correctly but the post-promote tail was mojibake. Capture the foreground decoder's `.encoding` property and reuse it for post-promote (with utf-8 fallback if foreground hadn't seen any bytes yet, and a try/catch around `new TextDecoder` for the rare unsupported-encoding case). T6 (Suggestion, shellExecutionService.ts:1540 `error` listener gated on onSettle) The post-promote `error` listener was attached only when `onSettle` was set. An `onData`-only caller still had the foreground errorHandler detached; a post-promote spawn error would then crash the CLI via Node's unhandled-error default. Hoisted the close + error listeners into `if (postPromote)` so any caller opting into post-promote gets crash protection; if `onSettle` is absent the listeners log + drop instead of routing. T7 (Suggestion, shellExecutionService.ts:791 onSettle-only pipe-block deadlock) Same root cause as T6: when only `onSettle` is set, the foreground `stdout`/`stderr` 'data' listeners are detached and no post-promote listener replaces them. The Readables stay paused, the OS pipe buffer fills (~64KB on Linux), the child blocks on `stdout.write`, 'close' never fires, onSettle never fires. Added `child.stdout?.resume()` and `child.stderr?.resume()` in the no-onData branch so the child can drain its pipes and reach exit. T8 (Suggestion, shell.ts:2614 dead inspectLine ternary) `inspectLine`'s ternary returned the same string on both sides — copy-paste leftover from when the other two adjacent ternaries (statusLine / stopLine) were correctly varied. Collapsed to a single string assignment. Tests: +5 regression tests (4 child_process: T1 double-fire latch, T3 onData-only flush, T6 onData-only error survives, T7 onSettle- only resume; +1 shell.ts: T4 ANSI strip). 265 -> 270 in the touched suites; 7822 -> 7827 across the core package; full suite green. * fix(core/test): use ShellOutputEvent type in wave-4 onData callbacks (TS2345) CI lint failed on the wave-4 (T3 / T6) tests with TS2345: pushing ShellOutputEvent into Array<{type:string;chunk:unknown}> narrows incompatibly. Switch to ShellOutputEvent[] (matches earlier helpers at lines 758/966) and discriminate the union via .type === 'data' when reading .chunk so the narrowed multibyte assertion still type-checks. * fix(core): address PR #4102 review — PTY error guard, flush timeout, diagnostic marker, failed-settle test - Move PTY post-promote error listener from `if (postPromote?.onSettle)` to `if (postPromote)` to match child_process path and prevent unhandled error crashes for onData-only callers - Add 10s flush timeout in onSettleWired so stalled streams don't leave registry entries stuck on 'running' forever - Append diagnostic marker to output file on stream error so truncation is visible without debug logging - Add queued-settle test with exitCode:1 asserting 'Status: failed.' in llmContent * fix(core): address PR #4102 review — align PTY/child_process guards, add flush timeout, diagnostic marker, and tests - Widen PTY post-promote onExit + error listener guard from `if (postPromote?.onSettle)` to `if (postPromote)` to match child_process path — prevents unhandled error crash and listener leak for onData-only callers - Add 10s flush timeout in onSettleWired so stalled streams don't leave registry entries stuck on 'running' indefinitely - Append diagnostic marker to output file on stream error so truncation is visible without debug logging - Remove model name references from code comments - Add tests: PTY onData-only error/exit, flush timeout fallback, appendFileSync diagnostic marker, queued-settle with failed exit code * fix(core): address PR #4102 review round 2 — listener cleanup, rename, constant hoist - Fix expect.objectContaining({}) misused as runtime arg in 2 execute() call sites - Add child_process post-promote stdout/stderr listener cleanup in firePostSettle - Rename streamFailed → streamClosed to reflect its overloaded semantics - Hoist FLUSH_TIMEOUT_MS to module-level PROMOTE_FLUSH_TIMEOUT_MS constant - Fix dangling FLUSH_TIMEOUT_MS reference (was undefined at runtime) - Add Windows note to streams pause/resume comment - Document PTY onData dispose-before-settle as known limitation	2026-05-17 17:57:08 +08:00
tanzhenxin	cc32ef2ff9	test(perf): skip daemon baseline harness under sandbox (#4234 ) The qwen-serve-baseline harness walks the daemon process tree using host-side `pgrep -P`. Under the Docker/Podman sandbox the daemon's `qwen --acp` child and its MCP grandchildren run inside the container's PID namespace, which host `pgrep` cannot observe, so the MCP-grandchild descendant walk always sees zero and times out. The test passes in the no-sandbox job but failed every retry in the Docker release job. Extend the existing Windows `SKIP` gate to also skip when sandbox is enabled, matching the precedent in acp-integration.test.ts and cron-tools.test.ts. Refs #4205	2026-05-17 17:52:34 +08:00
ChiGao	c25e22b575	feat(serve): add session-scoped permission route (#4232 ) Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com>	2026-05-17 17:48:30 +08:00
ChiGao	4d9cbe49c0	feat(serve): add daemon-stamped client identity (#4231 ) * feat(serve): add daemon-stamped client identity * fix(serve): harden daemon client identity handling --------- Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com>	2026-05-17 16:19:30 +08:00
kkhomej33-netizen	605e5eea16	fix(cli): include skill base dir in slash commands (#4224 )	2026-05-17 15:52:29 +08:00
kkhomej33-netizen	d6914bdfd6	fix(core): align shell tool description with configured shell (#4170 )	2026-05-17 15:36:59 +08:00
ChiGao	b90a2c91c9	feat(sdk): harden daemon session client (#4225 ) Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com>	2026-05-17 15:05:37 +08:00
jinye	2453b82add	[codex] Add daemon session load/resume (#4222 ) * feat(serve): add daemon session load resume Adds HTTP and SDK support for restoring persisted daemon sessions through load/resume routes, including replay buffering for load and guarded concurrent restore handling. Refs #4175 Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(serve): address review feedback on daemon session load/resume - Gate `defaultEntry` claim in `restoreSession` on `defaultSessionScope === 'single'`, mirroring `doSpawn`. Without the gate, a restored session silently became the omitted-scope attach target on `'thread'`-default daemons. - Rename advertised capability `session_resume` to `unstable_session_resume` to match the underlying ACP method (`connection.unstable_resumeSession`). `session_load` stays stable. - Seed `lastEventId: 0` in `DaemonSessionClient.resume`, symmetric with `load`. The agent's `unstable_resumeSession` schedules an `available_commands_update` via `setTimeout(0)`; without the seed the SDK consumer would miss that frame. - Add HTTP-level test for the `RestoreInProgressError → 409` envelope. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * docs(serve): adopt review feedback comments on session load/resume - Cross-reference the `POST /session` disconnect-cleanup rationale from `restoreSessionHandler`'s `!res.writable` branch so future maintainers find the BQ9tV race + tanzhenxin attach-rollback context without grep. - Document `DaemonSessionState.{models, modes, configOptions}` in the SDK so callers can narrow to the ACP `SessionModelState` / `SessionModeState` / `SessionConfigOption` shapes. - Add JSDoc on `DaemonClient.restoreSession` explaining why `loadSession` and `resumeSession` collapse into one transport. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(serve): preserve restore state and harden in-flight restore races Address the four Critical findings from PR #4222 review (wenshao): - Coalesced restore waiters now observe the same ACP state the original restore caller did. `state: {}` in `restoreSession`'s coalesce branch was clobbering the spread `restored.state`, so concurrent callers got different payloads based purely on timing. Cache the load/resume response on `SessionEntry.restoreState` and return it from both the existing-byId early return and the coalesce branch. - Drop the `defaultEntry` promotion on restore. Explicit `session/load` / `session/resume` is "give me THIS id"; it must not become the implicit attach target for subsequent omitted-id `POST /session` callers under `single` scope. Reserves `defaultEntry` for sessions created through `doSpawn` only. - Reserve coalesced attaches synchronously via `InFlightRestore.coalesceState.count` so the spawn owner's `requireZeroAttaches` disconnect-reaper sees a non-zero `attachCount` on the freshly registered entry and skips the kill. Without this, B's `attachCount++` happened after `await inFlight.promise`, leaving a window where A's HTTP-disconnect cleanup could reap the session out from under B. - Include `pendingRestoreIds` in the `killSession` channel-teardown decision. The last live session leaving while a restore is in-flight on the same channel would otherwise SIGTERM the channel mid-restore. - Bump `RestoreInProgressError`'s `Retry-After` from 1s to 5s (matches `SessionLimitExceededError`); under the default `initTimeoutMs` of 10s, 1s pushed clients into tight loops. Tests: new bridge cases covering state propagation through coalesce, the spawn-owner-disconnect race, the pendingRestoreIds-aware channel teardown, and the no-promote- on-restore invariant. Existing "attaches twice" test rewritten to assert the cached restore state propagates. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * test(serve): cover acpAgent load/resume + restore route error mappings Close the test-coverage gaps wenshao called out in PR #4222 review: - acpAgent.test.ts gains a `QwenAgent loadSession / unstable_resumeSession` block that locks down the new contract end-to-end at the agent layer: * `loadSession` missing persisted session → throws `RequestError.resourceNotFound("session:<id>")` (code -32002 + `data.uri`). * `loadSession` existing session → returns LoadSessionResponse AND triggers `session.replayHistory(messages)` so SSE subscribers see the persisted turns. * `unstable_resumeSession` missing session → same resourceNotFound contract. * `unstable_resumeSession` existing session → returns the response WITHOUT replaying history (resume restores model context internally; UI replay is intentionally suppressed). Required extending the mocked `RequestError` with `resourceNotFound`, and mocking `SessionService` per case. - server.test.ts adds the missing restore-route wire mappings: `WorkspaceMismatchError → 400 workspace_mismatch` and `SessionLimitExceededError → 503 + Retry-After: 5`. Combined with the existing 409 case for `RestoreInProgressError`, the route layer now has full structured-error coverage. - Updated the 409 test's `Retry-After` expectation from `1` to `5` to match the bumped retry hint. Disconnect-cleanup tests for the restore route were intentionally not added — the cleanup branch is line-for-line identical to `POST /session`'s handler (which itself ships without route-level disconnect tests due to flaky supertest + Node http close-event timing). 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * docs(serve): document daemon session load/resume routes Sync the docs to the routes that landed via PR #4222: - `docs/developers/qwen-serve-protocol.md`: * Add `session_load` and `unstable_session_resume` to the advertised features list, with a note on the `unstable_` prefix mirroring ACP's underlying method name. * Document `POST /session/:id/load` and `POST /session/:id/resume` — request body, response shape (including the cached `state` field that late attachers observe), and the full error envelope: 404 unknown id, 400 workspace_mismatch, 503 session_limit_exceeded (counts in-flight restores), 409 restore_in_progress (cross-action race). * Note the SSE replay ring bound (4000 frames default) and the "subscribe immediately after load" guidance for long histories. - `docs/users/qwen-serve.md`: * Add a "Loading and resuming a persisted session" section with the SDK example (`DaemonSessionClient.load` / `DaemonSessionClient.resume`) and the load-vs-resume decision table. * Update the durability model — sessions are still ephemeral across daemon restarts in Stage 1, but persisted sessions on disk can now be reloaded. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(test): use _meta payload to satisfy ACP SessionConfigOption types The two new state-propagation tests in `httpAcpBridge.test.ts` used `{ id, name, value }` as a `SessionConfigOption`, but ACP's actual `SessionConfigSelect` shape requires `currentValue` + `options`. vitest runs through esbuild and skips strict typechecking, so the local `vitest run` passed; CI's `tsc --build` (run during `npm run prepare`) caught it. Switch the fixture to `_meta: { tag: '...' }` instead — `_meta` is typed as `Record<string, unknown> \| null` on the ACP response shapes, so any payload survives. The assertions only need the bridge to forward the state object intact, which `_meta` proves equally well without committing the test to the full SessionConfigOption union. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(serve): symmetric restore coalesce guard + transportClosed leak + defensive cleanup Address the two new Critical findings + the test/cosmetic gaps from wenshao's second review pass on PR #4222 (`a3f38da3a`): - [Critical] Symmetric coalesce guard. The previous guard only rejected `load`-on-`resume`; `resume` arriving while a `load` was in flight silently coalesced and inherited the load's history- replay frames over SSE — directly violating resume's "no UI replay" contract (made worse by `DaemonSessionClient.resume()` seeding `lastEventId: 0`). Tighten the guard to `action !== inFlight.action` so any cross-action race throws `RestoreInProgressError`. Same-action coalescing is unaffected. - [Critical] `transportClosed` dangling rejection. When `withTimeout` wins the `Promise.race` against `channel.exited`, the `.then(throw)` chain on `channel.exited` stays pending. A later channel exit (next session boundary, daemon shutdown, agent crash) fires the `throw` with no observer attached — Node 22 logs `unhandledRejection`, and `--unhandled-rejections=throw` deployments crash the daemon. Add `transportClosed.catch(() => {})` to suppress the dangling rejection after the race settles. - `isAcpSessionResourceNotFound` exact-match fallback. The message-fallback path used `message.includes(expectedUri)`, which would falsely match a sessionId of `"a"` against a message containing `"session:abc"`. Tighten to exact equality on the canonical `Resource not found: <uri>` form. The primary `data.uri` path remains the dominant code path. - `loadSession` mcpServers default symmetry. `loadSession` now uses `params.mcpServers ?? []` to mirror `unstable_resumeSession`. Defends against a future ACP schema loosening that makes `LoadSessionRequest.mcpServers` optional — without the null-coalesce, `newSessionConfig` would `TypeError` on iteration. Tests added: - `httpAcpBridge.test.ts`: `resume`-on-`load` rejection (mirror of the existing `load`-on-`resume` test); regression for the dangling `unhandledRejection` (resolves `channel.exited` after the restore promise has already settled and asserts no `unhandledRejection` event); shutdown-awaits-restore via `Promise.race`-based ordering. - `server.test.ts`: 400 for non-string and over-length `cwd` on the restore routes (mirroring the equivalent `POST /session` cases for `parseOptionalWorkspaceCwd`). - `acpAgent.test.ts`: load with `getResumedSessionData()` returning `undefined` — distinct code path that does NOT call `replayHistory`. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) --------- Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>	2026-05-17 12:58:47 +08:00

1 2 3 4 5 ...

5905 commits