Shaojin Wen 790f2d0485
refactor(serve): 1 daemon = 1 workspace (#3803 §02) (#4113)
* refactor(serve): 1 daemon = 1 workspace (#3803 §02)

Stage 1 shipped with M-workspaces-per-daemon routing (`byWorkspaceChannel`
Map keyed by request `cwd`). The §02 architectural revision in
`docs/comparison/qwen-code-daemon-design/02-architectural-decisions.md`
narrows the bridge to 1 daemon = 1 workspace × N sessions: each daemon
binds to one canonical workspace path at boot; `POST /session` with a
mismatched `cwd` returns 400 `workspace_mismatch`. Multi-workspace
deployments run multiple daemon processes (one per workspace, supervised
externally — systemd / docker-compose / k8s / `qwen-coordinator`).

Bridge state collapses from maps to single optional slots:

- `byWorkspaceChannel: Map<string, ChannelInfo>` → `channelInfo?: ChannelInfo`
- `inFlightChannelSpawns: Map<string, Promise>` → `inFlightChannelSpawn?: Promise`
- `byWorkspace: Map<string, SessionEntry>` → `defaultEntry?: SessionEntry`
- `liveChannels: Set<ChannelInfo>` → not needed; `channelInfo` is the live
  reference, cleared only by `channel.exited` (preserves the tanzhenxin
  BkUyD invariant that `killAllSync` finds a target mid-SIGTERM-grace)

`BridgeOptions.boundWorkspace` becomes required. `WorkspaceMismatchError`
is thrown from `spawnOrAttach` when the request's canonical cwd doesn't
match the bound path, translated to 400 `workspace_mismatch` (with both
paths in the body) by the route layer. `CapabilitiesEnvelope.workspaceCwd`
surfaces the bound path so clients can pre-flight check it, or omit `cwd`
from `POST /session` entirely (it falls back to the bound workspace).
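
The bind-and-reject rule can be sketched as follows. This is a minimal
illustration, not the code in `httpAcpBridge.ts`: the error's field names,
the `canonicalize` helper, and `resolveSessionWorkspace` are hypothetical
stand-ins, and the catch-all fallback is a simplification.

```typescript
import { realpathSync } from 'node:fs';
import * as path from 'node:path';

// Hypothetical stand-in for the bridge's error type; field names are assumptions.
class WorkspaceMismatchError extends Error {
  readonly bound: string;
  readonly requested: string;

  constructor(bound: string, requested: string) {
    super(`daemon is bound to ${bound}; request asked for ${requested}`);
    this.name = 'WorkspaceMismatchError';
    this.bound = bound;
    this.requested = requested;
  }
}

// Resolve symlinks when the path exists; otherwise fall back to a lexical
// resolve (the catch-all here is a simplification for the sketch).
function canonicalize(p: string): string {
  try {
    return realpathSync.native(p);
  } catch {
    return path.resolve(p);
  }
}

// Returns the workspace a new session should run in, or throws on a mismatch.
function resolveSessionWorkspace(boundWorkspace: string, requestedCwd?: string): string {
  if (requestedCwd === undefined) return boundWorkspace; // omitted cwd falls back
  const requested = canonicalize(requestedCwd);
  if (requested !== boundWorkspace) {
    throw new WorkspaceMismatchError(boundWorkspace, requested);
  }
  return requested;
}
```

In this sketch the route layer would catch the error and translate it into
the 400 `workspace_mismatch` response, putting both paths in the body.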

A new `--workspace <path>` CLI flag lets operators override
`process.cwd()` at boot. The previous `--http-bridge` / `--multi-workspace`
opt-in was never shipped; nothing changes for default users running
`qwen serve` in their project directory.

Removed code path: ~150 LOC of multi-workspace map machinery in
`httpAcpBridge.ts` plus the test cases that exercised it.

Test surgery:

- New `makeBridge()` helper in `httpAcpBridge.test.ts` injects
  `boundWorkspace: WS_A` by default; tests that need a different bind
  (the mismatch test) pass it explicitly.
- `does NOT reuse across workspaces` → `rejects cross-workspace requests
  with WorkspaceMismatchError` (the new semantics under §02).
- `shutdown kills every live channel` retargeted to single-channel
  multi-session shutdown.
- `killAllSync force-kills channels even after shutdown cleared
  byWorkspaceChannel (BkUyD)` retargeted to single-channel: the
  invariant is the same (channel reference must outlive eager shutdown
  clearing), the surface is just smaller.
- `listWorkspaceSessions` cross-workspace assertion now expects empty
  for the un-bound path.
- `--max-sessions` cap test uses two thread-scope sessions on `WS_A`
  instead of WS_A + WS_B.

Closes #3803 §02.

* fix(serve): address review findings on the §02 refactor

Two correctness fixes + four doc/test polish items surfaced by the
multi-agent review of #4113:

1. `killSession` → `spawnOrAttach` race (Critical). After killing
   the last session, `channel.kill()` runs through a 5s SIGTERM grace
   before SIGKILL. During that window a concurrent `spawnOrAttach`
   used to hit `ensureChannel`, find `channelInfo` still set, and
   reuse the dying transport — either landing the caller with a
   sessionId that 404s on every follow-up once `channel.exited`
   fires, or hanging until the newSession timeout.

   Fix: add an `isDying: boolean` flag on `ChannelInfo`, set
   synchronously by `killSession` / `doSpawn`-newSession-failure /
   `shutdown` BEFORE awaiting `channel.kill()`. `ensureChannel`
   treats a dying channel as absent and spawns a fresh one. The
   tanzhenxin BkUyD invariant ("`channelInfo` reference must outlive
   the kill-await for `killAllSync` mid-grace") is preserved — we
   set `isDying` but don't clear `channelInfo` until the OS reaps
   the child via `channel.exited`. A regression test in
   `httpAcpBridge.test.ts` pins the invariant: a never-resolving
   `kill()` keeps the SIGTERM grace open while a concurrent spawn
   verifies the factory was called twice (two distinct handles).

2. `boundWorkspace` canonicalization divergence (Critical).
   `server.ts` and `runQwenServe.ts` each computed
   `opts.workspace ?? process.cwd()` independently. The bridge
   canonicalized that string via `realpathSync.native` (resolving
   symlinks, case-folding on case-insensitive filesystems); the
   callers retained the raw form. On macOS HFS+ / APFS or any
   symlinked path, `/capabilities.workspaceCwd` advertised one
   spelling while the bridge enforced against another — clients
   echoing the advertised path back saw `POST /session` succeed but
   the response carry a different `workspaceCwd`.

   Fix: export `canonicalizeWorkspace` from `httpAcpBridge.ts` and
   call it once in `runQwenServe` (after the existence check) and
   once in `createServeApp`. Both paths land on the same canonical
   form; the bridge's own re-canonicalize is now a no-op
   (idempotent).

3. Reject `--workspace` pointing at non-existent directories at
   boot (Suggestion). `canonicalizeWorkspace`'s ENOENT fallback to
   `path.resolve` previously let the daemon boot pointed at a path
   that didn't exist; every `POST /session` then spawned a
   `qwen --acp` child with that cwd and the agent failed with an
   opaque ENOENT. Now `runQwenServe` `statSync`s the bound path at
   boot and rejects "directory does not exist" / "not a directory"
   with a clear message.

4. Stale docstrings (Nice to have). `types.ts` `ServeMode` JSDoc
   said "one `qwen --acp` child PER WORKSPACE" — directly
   contradicted the new `workspace` field's doc in the same file.
   `commands/serve.ts` `--http-bridge` description said "per
   workspace" — directly contradicted the `--workspace` flag's help
   in the same yargs builder. Both updated to "per daemon (the
   daemon binds to ONE workspace at boot)".

5. Stale `byWorkspace` comment references (Nice to have).
   `server.ts:188` ("orphaned in byId / byWorkspace") and
   `httpAcpBridge.test.ts:1210` ("still in byId/byWorkspace at the
   moment of crash") referenced the removed Map. Updated to
   `defaultEntry`.

6. `/capabilities` curl example in the Authentication section of
   `docs/users/qwen-serve.md` was missing the new `workspaceCwd`
   field — the Quickstart's curl example was updated but the
   parallel one in the auth section was not. Synced.
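
Fix (1) above can be illustrated with a reduced model. The shapes below are
hypothetical simplifications of `ChannelInfo` and the bridge, and
`MiniBridge` / `killLastSession` are names invented for this sketch; it only
shows the ordering (flag set synchronously, kill awaited afterwards) and
does not track the dying channel anywhere else.

```typescript
// Hypothetical, simplified shapes; the real ChannelInfo carries more state.
interface Channel {
  kill(): Promise<void>;
}

interface ChannelInfo {
  channel: Channel;
  isDying: boolean;
}

class MiniBridge {
  private channelInfo: ChannelInfo | undefined;
  private readonly factory: () => Channel;

  constructor(factory: () => Channel) {
    this.factory = factory;
  }

  // A dying channel is treated as absent, so the caller gets a fresh one.
  ensureChannel(): ChannelInfo {
    if (this.channelInfo && !this.channelInfo.isDying) return this.channelInfo;
    const info: ChannelInfo = { channel: this.factory(), isDying: false };
    this.channelInfo = info;
    return info;
  }

  // The flag is set synchronously, BEFORE the (possibly slow) kill is awaited.
  async killLastSession(): Promise<void> {
    const info = this.channelInfo;
    if (!info) return;
    info.isDying = true;
    await info.channel.kill();
  }
}
```

With a `kill()` that never resolves (standing in for the SIGTERM grace), a
second `ensureChannel()` call made while the kill is pending should invoke
the factory again rather than reuse the dying handle.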

Tests added:
- `killSession marks the channel dying so concurrent spawnOrAttach
   gets a fresh channel` — pins fix (1).
- `--workspace flows end-to-end and surfaces on /capabilities` —
   exercises the runQwenServe → server.ts → bridge plumbing that
   no prior test covered.
- `rejects --workspace pointing at a non-existent directory` and
   `rejects --workspace pointing at a regular file` — pin fix (3).
- `rejects relative --workspace at boot` — covers the absoluteness
   check that exists but was untested.

Net: +238 / -24 across 8 files. All 149 serve tests pass.

* fix(serve): BkUyD overwrite race + Windows-fragile test + doSpawn-failure coverage

Round-2 review of #4113 caught three follow-up issues introduced by
or left open after round-1's fixes:

1. **BkUyD invariant overwrite race (Critical).** Round-1's `isDying`
   flag lets `ensureChannel` skip a dying channel and spawn a fresh
   one. When the fresh spawn completes, `channelInfo = info` overwrote
   the dying channel's reference — leaving NO global pointer to it.
   `killAllSync()` then iterated only `channelInfo` (the fresh one)
   and missed the dying child entirely. A double-Ctrl+C arriving
   mid-SIGTERM-grace would call `process.exit(1)` before the dying
   child's per-channel SIGKILL escalation timer fired, orphaning the
   child.

   Fix: restore an `aliveChannels: Set<ChannelInfo>` (parallel to the
   original Stage 1 design, but justified by single-workspace too).
   Entries added in `ensureChannel`, removed by each channel's
   `channel.exited` handler. `killAllSync` iterates the SET, not the
   single attach-target slot. `shutdown` does the same — snapshots
   every alive channel and kills each, not just the current
   `channelInfo`.

   New regression test pins the invariant: spawn → killSession
   (channel marked dying, kill hangs) → spawnOrAttach (fresh channel
   overwrites `channelInfo`) → `killAllSync` — expect BOTH channels'
   `killSync` to fire. Pre-fix only the fresh one would have fired.

2. **Windows-fragile test path.** The new
   `rejects --workspace pointing at a regular file` test used
   `new URL(import.meta.url).pathname` to get a path to the test
   file. On Windows that returns `/C:/path/...` (leading slash);
   `fs.statSync` then resolves it as path-from-current-drive-root,
   fails with ENOENT, and the test sees the "does not exist" error
   message instead of the expected "not a directory" branch. CI runs
   `windows-latest`. Fix: `fileURLToPath(import.meta.url)` from
   `node:url`.

3. **doSpawn newSession-failure isDying path was untested.** The
   round-1 fix added `ci.isDying = true` to both `killSession` AND
   `doSpawn`'s newSession-failure catch, but only the killSession
   path had a regression test. Added a parallel one for the doSpawn
   path: thread-scope bridge with a `newSessionImpl` that throws on
   the first call → captures the rejection without awaiting it (the
   bridge's `await ci.channel.kill()` hangs in the test), yields
   enough cycles for the `isDying = true` sync prefix to settle, then
   confirms (a) the next `spawnOrAttach` produces a fresh channel
   and (b) `killAllSync` finds both channels in `aliveChannels`.
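
The set-based tracking from item 1 could look roughly like this. It is a
sketch under assumptions: `ChannelRegistry`, `register`, `onExited`, and
`attachTarget` are hypothetical names, and the real bridge wires the removal
to each channel's `channel.exited` handler rather than to an explicit call.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface Channel {
  killSync(): void;
}

interface ChannelInfo {
  channel: Channel;
  isDying: boolean;
}

class ChannelRegistry {
  // Single attach-target slot; may be overwritten while an older channel is dying.
  private channelInfo: ChannelInfo | undefined;
  // Every channel whose process has not been reaped yet.
  private readonly aliveChannels = new Set<ChannelInfo>();

  register(channel: Channel): ChannelInfo {
    const info: ChannelInfo = { channel, isDying: false };
    this.aliveChannels.add(info);
    this.channelInfo = info;
    return info;
  }

  // Stand-in for the per-channel exit handler.
  onExited(info: ChannelInfo): void {
    this.aliveChannels.delete(info);
    if (this.channelInfo === info) this.channelInfo = undefined;
  }

  attachTarget(): ChannelInfo | undefined {
    return this.channelInfo;
  }

  // Iterates the SET, so a dying channel that lost the slot is still reached.
  killAllSync(): void {
    this.aliveChannels.forEach((info) => info.channel.killSync());
  }
}
```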

Also added a `newSessionImpl` option to the test FakeAgent — the
existing `initializeThrows` hook covered handshake-time failures, but
post-init `newSession` rejections (auth, bad config, mid-init
crashes) had no test affordance.

All 151 serve tests pass.

* docs(serve): update daemon-client-quickstart for §02 single-workspace

Round-3 review caught that the SDK example doc was the only one of the
three serve-related docs that the §02 refactor didn't touch. Updated:

- Boot log example now shows the `, workspace=/path/to/your-project`
  suffix that `runQwenServe` emits after the §02 changes.
- The "Hello daemon" example now reads `caps.workspaceCwd` off
  `/capabilities` and passes it back as `workspaceCwd` on session
  creation — illustrating the documented pre-flight pattern, not a
  hand-written literal that may not match the daemon's actual bind.
- Shared-session example makes the prerequisite explicit: the daemon
  must be bound to `/work/repo` (via `--workspace` or `cd`); under §02
  two clients can only share a session if they're both hitting a
  daemon already bound to that workspace.
- New "Workspace mismatch" section shows how to handle the
  `400 workspace_mismatch` error class: catching `DaemonHttpError`,
  branching on `body.code`, surfacing `boundWorkspace` /
  `requestedWorkspace` for the operator. This is a new error
  class SDK consumers' error handlers should branch on.
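
A client-side handler for the mismatch case might be sketched like this. The
`DaemonHttpError` class below is a hypothetical stand-in with assumed
`status` and `body` fields (the SDK's real class may differ), and
`explainSessionError` is an illustrative helper, not part of the SDK.

```typescript
// Hypothetical shape; the SDK's real DaemonHttpError may differ.
class DaemonHttpError extends Error {
  readonly status: number;
  readonly body: unknown;

  constructor(status: number, body: unknown) {
    super(`daemon responded with HTTP ${status}`);
    this.name = 'DaemonHttpError';
    this.status = status;
    this.body = body;
  }
}

interface WorkspaceMismatchBody {
  code: 'workspace_mismatch';
  boundWorkspace: string;
  requestedWorkspace: string;
}

// Narrow an untyped response body to the structured mismatch payload.
function isWorkspaceMismatch(body: unknown): body is WorkspaceMismatchBody {
  if (typeof body !== 'object' || body === null) return false;
  const b = body as Record<string, unknown>;
  return (
    b.code === 'workspace_mismatch' &&
    typeof b.boundWorkspace === 'string' &&
    typeof b.requestedWorkspace === 'string'
  );
}

// Turn a session-creation failure into a message an operator can act on.
function explainSessionError(err: unknown): string {
  if (err instanceof DaemonHttpError && err.status === 400) {
    const body = err.body;
    if (isWorkspaceMismatch(body)) {
      return `daemon is bound to ${body.boundWorkspace}, but the request targeted ${body.requestedWorkspace}`;
    }
  }
  return err instanceof Error ? err.message : String(err);
}
```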

No code changes; docs only.

* feat(sdk,test): align SDK types + integration tests with §02 single-workspace

Round-4 review caught one type-drift gap + a set of integration-test
assumptions that the §02 refactor invalidated.

**SDK type drift.** `DaemonCapabilities` in
`packages/sdk-typescript/src/daemon/types.ts` was the SDK-side mirror
of `CapabilitiesEnvelope` on the daemon side. The §02 PR added
`workspaceCwd: string` to the daemon envelope (and the round-3 doc
example reads `caps.workspaceCwd` off the SDK client) but the SDK
type wasn't updated. A TypeScript consumer copying the doc snippet
verbatim would hit `TS2339 'workspaceCwd' does not exist on type
'DaemonCapabilities'`. The wire field is present so JS consumers
wouldn't notice — but the SDK is marketed as a TypeScript quickstart,
so this is a real onboarding break.

Fix: add `workspaceCwd: string` to `DaemonCapabilities` (parallel to
`DaemonSession.workspaceCwd` which is already there). The SDK unit
test for `client.capabilities()` was updated to put the new field
in the mocked response.

**Integration tests.** `qwen-serve-routes.test.ts` spawns a real
`qwen serve` daemon in `beforeAll`. Three breakages exposed:

1. The daemon was launched without `--workspace`, so it inherited
   the test runner's `cwd`. Tests then POST `workspaceCwd: REPO_ROOT`
   assuming the daemon is bound to the repo root — true when run via
   `npm test` from the repo, brittle from IDEs / launchers that have
   a different `cwd`. Added `'--workspace', REPO_ROOT` to the spawn
   args so the bound workspace is deterministic regardless of where
   the test runner is launched.

2. The `bad modelServiceId` test used `cwd: '/tmp'`. Under §02 this
   would now return 400 workspace_mismatch before the session was
   spawned. Switched to `REPO_ROOT` and softened the `attached`
   assertion (REPO_ROOT may already have a session from earlier
   tests in the suite under sessionScope:single).

3. Added three new integration tests pinning the §02 surface
   end-to-end through a real daemon process:
   - `rejects cross-workspace cwd with 400 workspace_mismatch` —
     posts `/tmp` and asserts the full structured error body
     (`code`, `boundWorkspace`, `requestedWorkspace`).
   - `omits cwd → falls back to bound workspace` — posts an empty
     body and asserts the response's `workspaceCwd` matches REPO_ROOT
     (verifies the runQwenServe → createServeApp → bridge fallback
     plumbing).
   - `GET /capabilities surfaces workspaceCwd` — asserts the new
     SDK type field is populated correctly off the wire.

All 422 unit tests pass (cli serve + sdk). Integration tests
typecheck clean.

* fix(serve): address /review feedback from gpt-5.5 + deepseek-v4-pro

Process the 7 inline /review comments on PR #4113:

- C1+C3 (SDK): make `DaemonCapabilities.workspaceCwd` and
  `CreateSessionRequest.workspaceCwd` optional in the SDK types.
  `workspaceCwd` is an additive field on the v=1 envelope per #3803
  §02; the protocol's "bump v only on incompatible changes" stance
  is honored by leaving the field optional at the type level.
  `DaemonClient.createOrAttachSession` now omits `cwd` from the body
  when `workspaceCwd` isn't passed, matching the PR description's
  "SDK accepts bound path or none". Adds a unit test pinning the
  empty-body shape.

- C2 (docs/users/qwen-serve.md): the `--http-bridge` row described
  the pre-§02 per-session model; updated to reflect one child per
  daemon with N sessions multiplexed via ACP `newSession()`.

- C4 (server.ts): `WorkspaceMismatchError` was silently 400'ing
  without a stderr breadcrumb, leaving operators blind to
  cross-workspace routing drift. Now logs the mismatch to stderr,
  mirroring the SessionLimitExceeded / InvalidPermissionOption
  observability pattern.

- C5 (server.test.ts): the `/capabilities` fallback test compared
  `res.body.workspaceCwd` against raw `process.cwd()`; on macOS
  default tmpdir flows (`/var/folders/...` → `/private/var/...`)
  the canonicalize-once route value diverges. Use
  `realpathSync.native(process.cwd())` to match the route's
  canonicalization.

- C6 (server.ts): the cwd-not-absolute error said "cwd is required
  and must be an absolute path" but cwd is now optional under §02.
  Tightened wording to "must be an absolute path when provided".

- C7 (runQwenServe.ts): the `statSync` catch only wrapped ENOENT
  with a friendly diagnostic; EACCES / EPERM (typical for
  SIP-protected dirs on macOS or root-owned paths the daemon's UID
  can't traverse) re-threw as raw `SystemError`. Wrap both codes
  with a `--workspace`-context message so the boot failure points
  at the flag the operator set.
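
The boot-time validation described in C7 (and in the earlier non-existent
directory fix) can be sketched as below. Function names and message wording
are assumptions for illustration; only the error codes and the
`statSync`-based approach come from the text above.

```typescript
import { statSync } from 'node:fs';
import * as path from 'node:path';

// Stat the path, rewriting the common failure codes into flag-specific messages.
function statOrExplain(workspace: string) {
  try {
    return statSync(workspace);
  } catch (err) {
    const code = (err as { code?: string }).code;
    if (code === 'ENOENT') {
      throw new Error(`--workspace directory does not exist: ${workspace}`);
    }
    if (code === 'EACCES' || code === 'EPERM') {
      throw new Error(`--workspace is not accessible (${code}): ${workspace}`);
    }
    throw err; // anything else is unexpected; surface it untouched
  }
}

// Hypothetical boot check: absolute, existing, and a directory.
function assertUsableWorkspace(workspace: string): void {
  if (!path.isAbsolute(workspace)) {
    throw new Error(`--workspace must be an absolute path: ${workspace}`);
  }
  if (!statOrExplain(workspace).isDirectory()) {
    throw new Error(`--workspace is not a directory: ${workspace}`);
  }
}
```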

Docs: quickstart shows the explicit-pass-or-omit options side by
side; protocol reference notes `workspaceCwd` is additive to v=1.

* fix(serve/test): make /work/bound literals Windows-portable

Windows CI failed on this PR's two new tests because resolving the
POSIX-style `/work/bound` path on Windows returns `D:\work\bound`
(drive-relative absolute), so the route's canonicalize step diverged from the hardcoded
literal. Mirror the WS_A/WS_B pattern already used in
httpAcpBridge.test.ts: define WS_BOUND / WS_DIFFERENT via
`path.resolve(path.sep, …)` and use the constants everywhere. The
400 workspace_mismatch test would still have passed (mock controls
both throw + assertion) but I aligned it for consistency.
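
A minimal sketch of the portable-constant pattern, assuming the constants
are declared once at the top of the test file:

```typescript
import * as path from 'node:path';

// Anchoring at path.sep yields '/work/bound' on POSIX and a drive-qualified
// path (for example 'D:\work\bound') on Windows, matching what the route's
// canonicalize step produces for a non-existent path.
const WS_BOUND = path.resolve(path.sep, 'work', 'bound');
const WS_DIFFERENT = path.resolve(path.sep, 'work', 'different');
```

Assertions then compare against `WS_BOUND` instead of the `'/work/bound'`
literal, so the expected value is derived the same way on every platform.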

Failures from CI run 25806528710:
  expected 'D:\work\bound' to be '/work/bound' (Object.is)

Affected tests:
  - createServeApp > GET /capabilities > reports the bound workspace
  - createServeApp > POST /session > 200 when cwd is omitted

* fix(serve): address second /review round (gpt-5.5 + deepseek-v4-pro)

Four new inline findings from the latest /review pass:

- N1 (integration-tests/cli/qwen-serve-routes.test.ts) — Critical:
  the `workspace_mismatch` assertion compared `requestedWorkspace`
  against the literal `'/tmp'`, but the bridge canonicalizes via
  `realpathSync.native` and on macOS `/tmp` is a symlink to
  `/private/tmp`. Compare against `realpathSync.native('/tmp')` so
  the assertion is portable.

- N2 (packages/cli/src/serve/types.ts):
  `CapabilitiesEnvelope.workspaceCwd: string` (server side) diverged
  from the SDK's `DaemonCapabilities.workspaceCwd?: string`. Made the
  server type optional too — matches the SDK, matches the protocol
  doc's "additive to v=1" framing, doesn't change runtime emission
  (the post-§02 server still always populates the field).

- N3 + N4 (packages/cli/src/serve/server.ts + sdk-typescript/.../DaemonClient.ts):
  the route's `cwd` validation treated every non-string body value
  (`null`, `123`, `{}`, `[]`) the same as omitted, silently falling
  back to `boundWorkspace`. That hid client/orchestrator
  serialization bugs as "session attached to wrong workspace".
  Now the route uses `'cwd' in body` to detect presence and rejects
  presence-but-not-a-string with `400 'cwd must be a string absolute
  path when provided'`. Empty string still hits the existing
  `path.isAbsolute` branch ("must be an absolute path when
  provided"), so an SDK caller passing `workspaceCwd: ''` no longer
  silently lands in the daemon's bound workspace.

  SDK side: reverted my conditional spread to `cwd: req.workspaceCwd`
  unconditional. `JSON.stringify` strips `undefined` automatically
  (so omitted `workspaceCwd` becomes "no `cwd` key" on the wire, as
  before), but empty-string is now forwarded verbatim and the server's
  400 surfaces the bug instead of the SDK swallowing it. Added a unit
  test pinning the empty-string-forwarded shape.
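
The presence-versus-type distinction from N3 + N4 can be sketched as a pure
function. `resolveRequestCwd` and its result type are hypothetical; the
error strings are the ones quoted above.

```typescript
import * as path from 'node:path';

type CwdResult =
  | { ok: true; cwd: string }
  | { ok: false; status: 400; error: string };

// Decide which cwd a POST /session body asks for, or why it is invalid.
function resolveRequestCwd(body: Record<string, unknown>, boundWorkspace: string): CwdResult {
  // Key absent: the client omitted cwd, so fall back to the bound workspace.
  if (!('cwd' in body)) return { ok: true, cwd: boundWorkspace };
  const raw = body.cwd;
  // Key present but not a string (null, number, object, array): reject.
  if (typeof raw !== 'string') {
    return { ok: false, status: 400, error: 'cwd must be a string absolute path when provided' };
  }
  // Empty or relative strings fail the absoluteness check.
  if (!path.isAbsolute(raw)) {
    return { ok: false, status: 400, error: 'cwd must be an absolute path when provided' };
  }
  return { ok: true, cwd: raw };
}
```

Because `JSON.stringify` drops keys whose value is `undefined`, an SDK call
that leaves `workspaceCwd` unset reaches this function with no `cwd` key at
all, while an empty string arrives as a present key and is rejected.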

Server tests:
  - `400 when cwd is present but not a string` covers null / number /
    object / array via a sub-loop.
  - `400 when cwd is the empty string` pins the isAbsolute path.

  bridge: 73/73; server: 80/80 (was 78, +2 new); SDK: 40/40 (was 39,
  +1 empty-string test). tsc clean for SDK and PR-touched CLI files.

* fix(serve): use const cwd in POST /session (prefer-const lint)

CI lint failed with packages/cli/src/serve/server.ts:199:9 prefer-const: 'cwd' is never reassigned. The wave-4 rewrite split the original 'let cwd; if (!cwd) cwd = boundWorkspace' into a single ternary, which removes the only mutation path; the variable should be const accordingly.

* fix(serve): address third /review round (gpt-5.5 + glm-5.1 + deepseek-v4-pro)

Five new inline findings; M1 was already resolved in 1c7f5f069.

- M2 (httpAcpBridge.ts): drop the dead `ChannelInfo.workspaceCwd`
  field. Pre-§02 it was the routing key for `byWorkspaceChannel.get`;
  after the §02 collapse all reads target `SessionEntry.workspaceCwd`
  and `ChannelInfo.workspaceCwd` was only written, never read. Per-
  channel storage also suggests variance the "1 daemon = 1 workspace"
  model forbids. Removing the field encodes the single-workspace
  invariant in the type itself; left a stub comment so future
  readers don't reintroduce it.

- M3 (httpAcpBridge.ts): fast-path `canonicalizeWorkspace` when
  `req.workspaceCwd === boundWorkspace`. The §02 recommended client
  flow is `caps.workspaceCwd` → POST `cwd: caps.workspaceCwd`, and
  the omit-cwd route in server.ts synthesizes the same equality.
  Both hit the equality check and skip the sync `realpathSync.native`
  syscall. Non-equal inputs fall through to the full canonicalize
  (clients sending `/work/./bound`, mixed casing on case-insensitive
  FS, symlink aliases) so correctness is unchanged.

- M4 (httpAcpBridge.ts): operator stderr breadcrumb in the
  `channel.exited` handler. An agent crash (OOM / segfault) used to
  be silent on the daemon side — the child-stderr forwarder caught
  whatever the child wrote before dying (often nothing on
  SIGKILL/segfault), and SSE subscribers saw `session_died` frames
  but operators reading `qwen serve`'s own output had no signal that
  the agent process was gone. Log code+signal+affected-session-count
  so the line is the canonical "agent disappeared" indicator.

- M5 (server.ts): documentation-only. The reviewer wanted
  `createServeApp` to validate `opts.workspace` exists + is a
  directory (currently only `runQwenServe` does). Trade-off: doing
  that breaks 4 existing tests which pass synthetic `/work/bound` on
  purpose to exercise route-layer behavior without a real directory.
  Deferred the helper extraction; added a JSDoc note pinning the
  contract so future entry points binding `createServeApp` to user
  input know to replicate the validation.

- M6 (runQwenServe.ts): pass the already-canonical `boundWorkspace`
  into `createServeApp` via `opts.workspace`. `canonicalizeWorkspace`
  is idempotent so the server-side recanonicalize is a no-op today,
  but if a future refactor ever makes it non-idempotent the values
  the route advertises on `/capabilities` and the bridge enforces
  would diverge — landing clients in a "/capabilities says X, POST
  /session/X returns workspace_mismatch" contradiction. Removes the
  drift risk.
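
The M3 fast path amounts to a string-equality check in front of the
canonicalize call. In this sketch `canonicalizeRequest` is a hypothetical
name, and the injectable `canonicalize` parameter exists only so the
behaviour can be exercised without touching the filesystem.

```typescript
import { realpathSync } from 'node:fs';

// Skip the sync realpath syscall when the request already carries the
// canonical bound path; everything else takes the full canonicalize route.
function canonicalizeRequest(
  requested: string,
  boundWorkspace: string,
  canonicalize: (p: string) => string = (p) => realpathSync.native(p),
): string {
  if (requested === boundWorkspace) return boundWorkspace; // fast path
  return canonicalize(requested);
}
```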

bridge: 73/73; server: 80/80; tsc clean for PR-touched files.

* fix(serve,sdk): address fourth /review round (deepseek-v4-pro x2)

Two new inline findings:

- O1 (server.ts): the POST /session route uses `'cwd' in body` against
  `safeBody`'s `Object.create(null)` output to distinguish "client
  omitted cwd" from "client sent cwd". The semantics quietly couple
  to `safeBody`'s literal strip list (`__proto__/constructor/prototype`).
  If a future maintainer adds a user-facing key (e.g. `cwd`) to that
  strip list, the route's presence-check would silently flip to
  "absent → fallback", masking the bug as "wrong workspace bound."
  Extracted `PROTOTYPE_POLLUTION_KEYS: ReadonlySet<string>` as a named
  module-scope constant; safeBody uses `.has()` on it (behavior
  unchanged); the route's comment now cross-references the const so
  the coupling is documented at both ends. The const's JSDoc spells
  out what to do if the strip set ever has to grow into user-key
  territory.

- O2 (sdk-typescript): `DaemonCapabilities.workspaceCwd` is
  `string | undefined` (additive to v=1; pre-§02 daemons omit). SDK
  consumers that pass it into a `string` context get a TS strict
  error or, against an old daemon, a runtime
  `Cannot read properties of undefined`. Added a `requireWorkspaceCwd`
  helper + `DaemonCapabilityMissingError` so consumers can opt into
  an actionable
  `DaemonCapabilities.workspaceCwd is missing — introduced in #3803 §02 …`
  error instead. Exported both from `@qwen-code/sdk`'s top-level
  module + the `daemon/` sub-module. Unit tests cover populated,
  missing, and empty-string inputs.
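
The coupling described in O1 can be illustrated as follows. Only the
constant's name, its three keys, and the `Object.create(null)` output come
from the text above; the body of `safeBody` is an assumed, simplified
version.

```typescript
// Keys stripped from request bodies. Adding a user-facing key here would
// silently change what `'cwd' in body` style presence checks observe.
const PROTOTYPE_POLLUTION_KEYS: ReadonlySet<string> = new Set([
  '__proto__',
  'constructor',
  'prototype',
]);

// Copy own keys into a null-prototype object, dropping the strip set.
function safeBody(input: unknown): Record<string, unknown> {
  const out: Record<string, unknown> = Object.create(null);
  if (typeof input !== 'object' || input === null || Array.isArray(input)) return out;
  const source = input as Record<string, unknown>;
  for (const key of Object.keys(source)) {
    if (!PROTOTYPE_POLLUTION_KEYS.has(key)) out[key] = source[key];
  }
  return out;
}
```

Because the result has no prototype, `'cwd' in body` is true only when the
client actually sent a `cwd` key, which is what the route's presence check
relies on.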

bridge: 73/73; server: 80/80; SDK DaemonClient: 43/43 (was 40, +3
new requireWorkspaceCwd cases). tsc clean for SDK and PR-touched
CLI files.

* fix(serve): address tanzhenxin REQUEST_CHANGES (cold-spawn + streaming-test bind)

Two findings from the CHANGES_REQUESTED review on PR #4113.

- T1 (integration-tests/cli/qwen-serve-streaming.test.ts) — high
  severity: the daemon spawn in `beforeAll` did not pass
  `--workspace REPO_ROOT`, so under §02 the daemon bound to
  whatever cwd the test runner was invoked from. Every later
  `createOrAttachSession({ workspaceCwd: REPO_ROOT })` then 400'd
  with `workspace_mismatch`, and the entire file — child-crash
  recovery, multi-client first-responder permission, Last-Event-ID
  resume — silently no-op'd once `SKIP_LLM_TESTS` was unset. The
  sibling `qwen-serve-routes.test.ts` got the same fix earlier in
  this PR; this file was missed in that pass. Added the flag with a
  comment pointing at the rationale so the omission can't recur.

- T2 (packages/cli/src/serve/httpAcpBridge.ts) — medium severity:
  cold-spawn window orphans the agent child on double-Ctrl+C. The
  `qwen --acp` child exists from the moment `channelFactory` spawns
  it, but pre-fix the bridge only added the channel to
  `aliveChannels` AFTER `connection.initialize()` returned. During
  the up-to-`initTimeoutMs` (default 10s) handshake window
  `aliveChannels` was empty, and a double-Ctrl+C in that window
  played out as: first SIGINT entered `shutdown()` and awaited the
  in-flight spawn; second SIGINT called `killAllSync()` against an
  empty set; `process.exit(1)` orphaned the child. Same class of
  bug the BkUyD invariant set out to close — the post-init
  overwrite race was covered, the pre-init handshake window wasn't.

  Fix: move `info` creation + `aliveChannels.add(info)` + the
  `channel.exited` handler registration BEFORE the `initialize`
  await. Init-failure / late-shutdown / child-crash-during-handshake
  all converge on the same cleanup path: mark `isDying = true`,
  `await channel.kill()`, let the exited handler `aliveChannels
  .delete(info)` once the OS reaps the process. `channelInfo` (the
  attach target) is still assigned LAST so `ensureChannel`'s
  fast-path never returns a still-handshaking channel.

  Regression test: `killAllSync force-kills the channel during the
  initialize handshake` uses a bespoke factory whose agent's
  `initialize` never resolves and asserts `killAllSync` fires
  killSync against the channel during the handshake window. Pre-fix
  the test would observe an empty `killSyncCalls` array.
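
The ordering change in T2 can be shown with a reduced model. The class and
method names are hypothetical, and the sketch leaves out the
`channel.exited` handler and the `await channel.kill()` cleanup that the
real code performs on failure.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface Channel {
  killSync(): void;
}

interface ChannelInfo {
  channel: Channel;
  isDying: boolean;
}

class ColdSpawnSketch {
  private channelInfo: ChannelInfo | undefined;
  private readonly aliveChannels = new Set<ChannelInfo>();

  async spawn(
    factory: () => Channel,
    initialize: (channel: Channel) => Promise<void>,
  ): Promise<ChannelInfo> {
    const channel = factory(); // the child process exists from here on
    const info: ChannelInfo = { channel, isDying: false };
    this.aliveChannels.add(info); // tracked BEFORE the handshake is awaited
    try {
      await initialize(channel);
    } catch (err) {
      info.isDying = true; // the real cleanup path would also kill the child
      throw err;
    }
    this.channelInfo = info; // attach target is assigned LAST
    return info;
  }

  attachTarget(): ChannelInfo | undefined {
    return this.channelInfo;
  }

  killAllSync(): void {
    this.aliveChannels.forEach((info) => info.channel.killSync());
  }
}
```

With an `initialize` that never resolves, `killAllSync()` still reaches the
handshaking channel, while `attachTarget()` stays empty so no caller attaches
to it early.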

bridge: 74/74 (was 73, +1 cold-spawn test); server: 80/80;
tsc clean for PR-touched files.

* fix(serve): address third /review round (gpt-5.5 + glm-5.1 + deepseek-v4-pro)

Eight new inline findings; six applied, two deferred-with-reply.

- P1 (httpAcpBridge.ts init-failure isDying comment): my comment
  overstated what `info.isDying` accomplishes on the init-failure
  path — concurrent `ensureChannel()` callers don't bypass via
  `isDying`, they coalesce on `inFlightChannelSpawn` and observe the
  same rejection. Reworded to describe the actual cross-path
  invariant marker.

- P2 (server.ts workspace_mismatch log injection): doudouOUC flagged
  log injection via `err.requested` (user-controlled). `path.resolve`
  + `realpathSync.native` preserve control chars in path segments,
  so a body `{"cwd": "/legit/path\nqwen serve: FAKE LOG"}` would
  emit two valid-looking daemon log lines on stderr — weaponizing
  line-based log shippers (Splunk / Loki / journald → SIEM).
  `JSON.stringify` both `err.bound` and `err.requested` in the log
  line escapes control chars + quotes the values, making any
  injection attempt visible-as-quoted-noise rather than forged-line.
  Bound is operator-controlled and inherently safe but quoted
  symmetrically for readability. The defense-in-depth alternative
  (reject control chars in canonicalizeWorkspace) is deferred —
  this single log site was the actionable interpolation; future
  workspace-path-into-stderr / -JSON / -templated-SQL flows can pick
  up the rejection if they ship.

- P3 (httpAcpBridge.test.ts): refactor the cross-workspace
  WorkspaceMismatchError test to a single `.catch((e) => e)` capture
  rather than firing the rejection twice (once for the `rejects
  .toBeInstanceOf` matcher, once for the field assertions). Logic
  unchanged.

- P4 (httpAcpBridge.ts channel.exited log): the `qwen serve:
  channel exited (...)` line fired on every channel exit including
  planned shutdown — alarming for operators who Ctrl+C'd a healthy
  daemon. Guarded with `if (!shuttingDown)` so the planned-shutdown
  case (operator already saw `received SIGINT, draining...`) stays
  silent. The killSession path (last session leaves, daemon stays
  up — no top-level context line) still logs, since the line is the
  only signal that the cleanup actually ran.

- P5 (httpAcpBridge.ts): light trim of the "pre-fix" narrative
  voice in two comment blocks (cold-spawn ensureChannel layout +
  BkUyD killAllSync aliveChannels iteration). Kept the invariant
  explanations — those carry maintenance value — dropped the
  "pre-fix the code did X" framing that's review-context not
  future-reader context.

- P6 (server.ts + runQwenServe.ts): `createServeApp` now accepts a
  pre-canonicalized `deps.boundWorkspace` to skip its own
  `canonicalizeWorkspace` syscall when the caller (runQwenServe)
  already did the work. Replaces my earlier `{...opts, workspace:
  boundWorkspace}` opts-mutation hack — cleaner separation of
  concerns + drops one `realpathSync.native` per boot. Direct
  callers (tests, embeds) that omit `deps.boundWorkspace` still get
  the in-body canonicalize path.

- P8 (httpAcpBridge.ts): defensive `aliveChannels.size > 2`
  warning. The set is intentionally multi-entry to cover the
  killSession-then-spawnOrAttach overlap window (size 2 is
  legitimate). Anything higher implies a `channel.exited` handler
  never fired for a prior channel — a real leak we'd otherwise
  catch only as gradually-growing RSS. The warning surfaces it the
  moment it happens.

- P7 (CreateSessionRequest.workspaceCwd optional): deferred with
  reply rationale. Making the field optional is the §02 design
  ("SDK accepts bound path or none"); the JSDoc already explains
  the omit-vs-explicit choice; Stage 1 has no shipping SDK
  consumers so there's no breakage to call out in a changelog file.
  No code change.
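
The P2 mitigation relies on `JSON.stringify` escaping control characters.
A small sketch, with a hypothetical log format (the daemon's actual line may
be worded differently):

```typescript
// Quote both paths so embedded newlines show up as escaped text instead of
// starting a new, forged-looking log line.
function formatMismatchLog(bound: string, requested: string): string {
  return (
    'qwen serve: workspace_mismatch ' +
    `bound=${JSON.stringify(bound)} requested=${JSON.stringify(requested)}`
  );
}
```

For a body such as `{"cwd": "/legit/path\nqwen serve: FAKE LOG"}`, the
formatted line stays a single line, and the newline appears as a literal
backslash followed by `n` inside the quoted value.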

bridge: 74/74 (cross-workspace test refactor + behavioral assertions
unchanged); server: 80/80; SDK 43/43. tsc clean for PR-touched
files.

* fix(serve): apply auto-fixes from /review (#4113)

- canonicalizeWorkspace: narrow catch to ENOENT only, propagate other filesystem errors
- listWorkspaceSessions: add fast-path string equality to avoid realpathSync on every poll
- GET /workspace/:id/sessions: return 400 workspace_mismatch for cross-workspace queries
- SessionNotFoundError: accept optional extra message; clarify agent-crash-on-spawn case
- requireWorkspaceCwd: distinguish empty-string (post-§02 bug) from absent (pre-§02 daemon)
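
The empty-versus-absent distinction in `requireWorkspaceCwd` might look
like the sketch below. The signature, the reduced `DaemonCapabilities`
shape, and the message wording are assumptions; the SDK's real helper may
differ.

```typescript
// Hypothetical minimal shapes mirroring the names used above.
interface DaemonCapabilities {
  workspaceCwd?: string;
}

class DaemonCapabilityMissingError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'DaemonCapabilityMissingError';
  }
}

// Return a usable workspace path or throw an actionable error.
function requireWorkspaceCwd(caps: DaemonCapabilities): string {
  const cwd = caps.workspaceCwd;
  if (cwd === undefined) {
    // Absent: most likely a daemon that predates the single-workspace change.
    throw new DaemonCapabilityMissingError(
      'DaemonCapabilities.workspaceCwd is missing; the daemon may predate the single-workspace change',
    );
  }
  if (cwd === '') {
    // Present but empty: a newer daemon should never emit this.
    throw new DaemonCapabilityMissingError(
      'DaemonCapabilities.workspaceCwd is an empty string; this indicates a daemon-side bug',
    );
  }
  return cwd;
}
```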

* fix(serve/test): bind workspace explicitly in GET /workspace tests

Wave-5 commit 0c6e963cd ("apply auto-fixes from /review (#4113)") added
a 400 workspace_mismatch reject path to GET /workspace/:id/sessions
for cross-workspace queries, but the existing two happy-path tests
queried `/work/a` / `/work/idle` against an unbound daemon (which
falls back to `process.cwd()`). Both turned to 400 in CI.

Bind the daemon to WS_BOUND in both happy-path tests and query the
same path. Add a third regression test that pins the §02
cross-workspace rejection contract — `code: workspace_mismatch`,
both paths in the body, bridge.listCalls untouched (no silent
fallback regression).

Brings server.test.ts from 80 → 82 tests, all passing.

* fix(serve,sdk): address fourth /review round (deepseek-v4-pro x2)

Six new inline findings; five applied, one defer-with-reply.

- Q1 (httpAcpBridge.ts + server.ts + tests): cwd length amplification
  through WorkspaceMismatchError. The error constructor interpolates
  `requested` into `.message` TWICE; `sendBridgeError` echoes it on
  stderr (now JSON.stringify-wrapped); `res.json` echoes it again — a
  ~10 MB `cwd` body (right under express.json's 10 MB cap) would
  amplify to ~60 MB per request × maxConnections (default 256). On
  loopback-default-no-token deployments this is pre-auth. Added
  `MAX_WORKSPACE_PATH_LENGTH = 4096` (Linux PATH_MAX); route rejects
  oversized `cwd` with a 400 BEFORE the bridge is touched, and the
  `WorkspaceMismatchError` constructor truncates `requested` as
  defense-in-depth for non-route callers (tests, embeds, future
  entry points that throw the error directly). Three new tests pin
  the route 400, the constructor truncation, and the normal-path
  passthrough.

- Q2 + Q5 (httpAcpBridge.ts docs): the `channelInfo` declaration
  comment + `ChannelInfo.sessionIds` JSDoc + `ChannelInfo.isDying`
  JSDoc all overstated when `channelInfo` is cleared. Post-§02 the
  BkUyD invariant is "ONLY `channel.exited` clears `channelInfo`"
  — teardown initiators (killSession last-session-leaving,
  doSpawn-newSession-failure, ensureChannel init-failure/late-
  shutdown, shutdown) set `isDying = true` but LEAVE `channelInfo`
  pointing at the dying channel until OS reap, so `killAllSync`
  can still reach it through `aliveChannels`. A future maintainer
  reading the old phrasing might "fix" killSession to also clear
  `channelInfo` and silently break the double-Ctrl+C force-kill
  path. Rewrote all three sites to describe the actual invariant +
  enumerate the 5 isDying set-sites + spell out the BkUyD rationale
  in one place (the `isDying` JSDoc) that other comments point at.

- Q3 (runQwenServe.ts): the "listening on …" boot summary goes to
  stdout but every other operational diagnostic (bearer auth, the
  workspace_mismatch breadcrumb, channel-exited, bridge errors) goes
  to stderr. Operators capturing only stderr (systemd / docker / k8s
  default) miss the `workspace=` indicator, which is the single
  piece of information they need most when triaging §02 migration
  issues. Added a `qwen serve: bound to workspace "X"` stderr line
  alongside the stdout one — keeps stdout untouched (integration
  tests + scripts parse it) while making the breadcrumb visible to
  stderr-only log shippers. The boundWorkspace value is passed through
  `JSON.stringify` (operator-controlled, but cheap defense-in-depth
  against any future flow that lands a control char in the path).

- Q4 (integration-tests/tsconfig.json): the `paths` entry resolved
  `@qwen-code/sdk` to the SDK's built `dist/` directory; `dist/` is
  gitignored and stale dist (no `npm run build` first) yields TS2339
  errors on the integration tests' imports of new SDK fields.
  Pointed `paths` at SDK source instead — `tsc -p
  integration-tests/tsconfig.json` no longer requires a prior
  rebuild. The vitest config's runtime alias still resolves to
  `dist/index.mjs` so the actual test execution exercises the
  published-bundle shape; this paths entry only affects type
  resolution.

- Q6 (httpAcpBridge.ts): `createHttpAcpBridge` constructor called
  `canonicalizeWorkspace(opts.boundWorkspace)` even when the caller
  (`runQwenServe`) had already canonicalized and threaded the same
  value through `deps.boundWorkspace` into `createServeApp`. Two
  independent `realpathSync.native` calls can theoretically diverge
  on NFS-transient / mid-rename filesystems, landing the bridge with
  a canonical form different from what `/capabilities` advertises
  and from `createServeApp`'s view. Dropped the bridge's
  re-canonicalization; kept `path.isAbsolute` (structural, not a
  syscall); documented the caller contract on
  `BridgeOptions.boundWorkspace` ("MUST be pre-canonicalized;
  tests/embeds call `canonicalizeWorkspace` first"). Tests use
  `path.resolve(path.sep, ...)`, which is already
  canonical-or-fallback for non-existent paths, so no test changes
  were needed.
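
The two-layer guard described in Q1 (a route-level length cap plus constructor truncation) might look roughly like this. It is a sketch under stated assumptions: only `MAX_WORKSPACE_PATH_LENGTH`, its value, and the `WorkspaceMismatchError` name come from the text above. The `truncateForMessage` and `checkRequestedCwd` helpers, the `cwd_too_long` / `invalid_cwd` codes, and the message wording are hypothetical.

```typescript
// Hypothetical sketch of the Q1 guard; helper names and error codes are
// illustrative assumptions, not the PR's actual identifiers.

const MAX_WORKSPACE_PATH_LENGTH = 4096; // Linux PATH_MAX

/** Cut `value` to at most `max` characters and mark the cut. */
function truncateForMessage(
  value: string,
  max: number = MAX_WORKSPACE_PATH_LENGTH,
): string {
  return value.length <= max ? value : value.slice(0, max) + "...[truncated]";
}

class WorkspaceMismatchError extends Error {
  readonly bound: string;
  readonly requested: string;

  constructor(bound: string, requested: string) {
    // Defense in depth: callers that bypass the route still cannot
    // inflate the message with an oversized path.
    const safe = truncateForMessage(requested);
    super(
      `workspace mismatch: bound to ${JSON.stringify(bound)}, ` +
        `requested ${JSON.stringify(safe)}`,
    );
    this.name = "WorkspaceMismatchError";
    this.bound = bound;
    this.requested = safe;
  }
}

type CwdCheck =
  | { ok: true; cwd: string | undefined }
  | { ok: false; status: number; code: string };

/** Route-level check, run before the bridge is touched. */
function checkRequestedCwd(cwd: unknown): CwdCheck {
  // An omitted cwd falls back to the bound workspace.
  if (cwd === undefined) return { ok: true, cwd: undefined };
  if (typeof cwd !== "string") {
    return { ok: false, status: 400, code: "invalid_cwd" };
  }
  if (cwd.length > MAX_WORKSPACE_PATH_LENGTH) {
    return { ok: false, status: 400, code: "cwd_too_long" };
  }
  return { ok: true, cwd };
}
```

Rejecting at the route keeps the oversized string out of every later echo (stderr log, JSON response), while the constructor truncation bounds the cost for callers that construct the error directly.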

bridge: 76/76 (was 74, +2 WorkspaceMismatchError truncation tests);
server: 82/82 (was 80, +2 length cap + the auto-applied helper).
tsc clean for SDK, CLI PR-touched files, and integration-tests'
qwen-serve-*.
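
The `channelInfo` invariant from Q2 + Q5 above can be modeled in a few lines. This is a toy model, not the bridge: `BridgeModel`, `beginTeardown`, `onChannelExited`, the `killed` flag, and the `canAttach` rule are hypothetical stand-ins for the real teardown sites and the `channel.exited` handler.

```typescript
// Hypothetical model of the invariant; the real bridge manages an OS
// process, and only channelInfo / isDying / killAllSync are names taken
// from the commit message.

interface ChannelInfo {
  isDying: boolean;
  killed: boolean; // stand-in for "a force-kill signal was sent"
}

class BridgeModel {
  private channelInfo: ChannelInfo | undefined = undefined;

  spawn(): ChannelInfo {
    this.channelInfo = { isDying: false, killed: false };
    return this.channelInfo;
  }

  /** Teardown initiators mark the channel but keep the reference. */
  beginTeardown(): void {
    if (this.channelInfo) this.channelInfo.isDying = true;
  }

  /** Force-kill path: must still find a channel that is mid-teardown. */
  killAllSync(): boolean {
    if (!this.channelInfo) return false;
    this.channelInfo.killed = true;
    return true;
  }

  /** The only place that clears the slot: the child has actually exited. */
  onChannelExited(): void {
    this.channelInfo = undefined;
  }

  /** Assumption: new sessions must not attach to a dying channel. */
  canAttach(): boolean {
    return this.channelInfo !== undefined && !this.channelInfo.isDying;
  }
}
```

If `beginTeardown` also cleared the slot, `killAllSync` would return `false` during the SIGTERM grace window, which is the regression the rewritten comments warn against.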
2026-05-15 12:44:36 +08:00
| Name | Last commit | Date |
| --- | --- | --- |
| configuration | feat(perf): progressive MCP availability — MCP no longer blocks first input (#3994) | 2026-05-13 22:17:16 +08:00 |
| extension | chore(deps): upgrade ink 6.2.3 → 7.0.2 + bump Node engine to 22 (#3860) | 2026-05-11 17:29:50 +08:00 |
| features | feat(perf): progressive MCP availability — MCP no longer blocks first input (#3994) | 2026-05-13 22:17:16 +08:00 |
| ide-integration | update documentation | 2025-12-19 18:16:59 +08:00 |
| reference | feat(cli): Ctrl+B promote keybind (#3831 PR-3 of 3) (#3969) | 2026-05-11 14:03:38 +08:00 |
| support | docs: update authentication methods to reflect OAuth discontinuation (#3325) | 2026-04-17 15:34:18 +08:00 |
| _meta.ts | feat(cli,sdk): qwen serve daemon (Stage 1) (#3889) | 2026-05-13 14:47:47 +08:00 |
| common-workflow.md | docs: updated all links, click and open in vscode, new showcase video in overview | 2025-12-17 11:10:31 +08:00 |
| integration-github-action.md | docs: updated all links, click and open in vscode, new showcase video in overview | 2025-12-17 11:10:31 +08:00 |
| integration-jetbrains.md | docs(integration): use CDN URLs for images and fix formatting | 2026-03-16 14:12:48 +08:00 |
| integration-vscode.md | fix: docs | 2026-01-14 10:30:03 +08:00 |
| integration-zed.md | docs(integration): use CDN URLs for images and fix formatting | 2026-03-16 14:12:48 +08:00 |
| overview.md | feat(installer): add standalone archive installation (#3776) | 2026-05-11 13:25:48 +08:00 |
| quickstart.md | chore(deps): upgrade ink 6.2.3 → 7.0.2 + bump Node engine to 22 (#3860) | 2026-05-11 17:29:50 +08:00 |
| qwen-serve.md | refactor(serve): 1 daemon = 1 workspace (#3803 §02) (#4113) | 2026-05-15 12:44:36 +08:00 |