From 66ffd7cc66bcb334468eb0235c0dfeec3683f502 Mon Sep 17 00:00:00 2001
From: wenshao
Date: Tue, 12 May 2026 07:35:23 +0800
Subject: [PATCH] =?UTF-8?q?feat(serve):=20close=203=20chiga0=20audit=20ite?=
 =?UTF-8?q?ms=20=E2=80=94=20ringSize=204000,=20--max-sessions,=20/health?=
 =?UTF-8?q?=3Fdeep=3D1=20(#3803)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three "30-minute" items from chiga0's external architecture audit
(2026-05-11). All actionable within Stage 1 scope; the remaining items in
chiga0's review (SaaS positioning, multi-token to Stage 1.5, acp-bridge
package extraction, reference orchestrator) are larger scoping decisions
deferred to Stage 1.5/2.

DEFAULT_RING_SIZE 1000 → 4000 (Risk 4):

- A single long turn can emit hundreds of frames (the test plan reports 13
  for a SHORT turn; real workloads can be 10× that). 1000 was exhausted by
  a moderate turn before a 5s reconnect window finished. 4000 gives ~30×
  headroom over a typical busy turn at the cost of a few hundred KB
  RAM/session. Updated user + protocol docs and the
  daemon-client-quickstart example.

--max-sessions (default 20) (Rec 3):

- New `ServeOptions.maxSessions` + matching `BridgeOptions`. Bridge throws
  `SessionLimitExceededError` when `byId.size + inFlightSpawns.size >= max`
  BEFORE issuing a fresh spawn. Attaches to existing sessions (single
  scope) bypass the cap so an idle daemon's reconnects keep working at
  capacity. `0` disables. Default of 20 sized below the design's N≈50
  cliff (per-session ~30–50 MB RSS + FD pressure). HTTP route maps to 503
  with `Retry-After: 5` and `code: session_limit_exceeded`. Tests cover:
  cap rejection under thread scope, attach-not-counted under single scope,
  `0` disables. Documented in CLI flags table + protocol Common-error
  section.

/health?deep=1 (Risk 3):

- Default `/health` stays cheap (no bridge access).
With `?deep=1` the response includes `sessions` and `pendingPermissions` from the bridge — touches state so a wedged bridge surfaces as 503 `{status: "degraded"}` instead of "200 ok" on a zombie daemon (the `k8s rolling deploy will see healthy` failure mode chiga0 flagged). Loopback-vs-non-loopback bearer-exempt logic from the earlier A8dZT fix is preserved via a shared handler. Tests cover: cheap default, deep response shape, throwing-getter → 503. --- .../examples/daemon-client-quickstart.md | 2 +- docs/developers/qwen-serve-protocol.md | 24 +++++- docs/users/qwen-serve.md | 15 ++-- packages/cli/src/commands/serve.ts | 9 ++ packages/cli/src/serve/eventBus.ts | 14 +++- packages/cli/src/serve/httpAcpBridge.test.ts | 83 +++++++++++++++++++ packages/cli/src/serve/httpAcpBridge.ts | 44 ++++++++++ packages/cli/src/serve/runQwenServe.ts | 3 +- packages/cli/src/serve/server.test.ts | 64 ++++++++++++++ packages/cli/src/serve/server.ts | 53 ++++++++++-- packages/cli/src/serve/types.ts | 11 +++ 11 files changed, 303 insertions(+), 19 deletions(-) diff --git a/docs/developers/examples/daemon-client-quickstart.md b/docs/developers/examples/daemon-client-quickstart.md index 3e1696069..a72069aca 100644 --- a/docs/developers/examples/daemon-client-quickstart.md +++ b/docs/developers/examples/daemon-client-quickstart.md @@ -113,7 +113,7 @@ for await (const event of client.subscribeEvents(session.sessionId, { } ``` -The daemon retains the last 1000 events per session in a ring buffer; gaps beyond that window won't be re-deliverable. +The daemon retains the last 4000 events per session in a ring buffer; gaps beyond that window won't be re-deliverable. ## Voting on permissions diff --git a/docs/developers/qwen-serve-protocol.md b/docs/developers/qwen-serve-protocol.md index 424c56b75..f072ac15a 100644 --- a/docs/developers/qwen-serve-protocol.md +++ b/docs/developers/qwen-serve-protocol.md @@ -40,6 +40,18 @@ with status `400`. with status `404`. 
+`POST /session` past the daemon's `--max-sessions` cap returns `503` with a `Retry-After: 5` header and: + +```json +{ + "error": "Session limit reached (20)", + "code": "session_limit_exceeded", + "limit": 20 +} +``` + +Attaches to existing sessions are NOT counted toward the cap, so an idle daemon's reconnects keep working even when at-capacity. + ## Capabilities Every Stage 1 daemon advertises 9 feature tags. Clients **must** gate UI off `features`, not off `mode` (per design §10). @@ -60,7 +72,15 @@ Every Stage 1 daemon advertises 9 feature tags. Clients **must** gate UI off `fe ### `GET /health` -Liveness probe. Returns `200 {"status":"ok"}` if the listener is up. +Liveness probe. Default form returns `200 {"status":"ok"}` if the listener is up — cheap, no bridge access, suitable for high-frequency k8s/Compose liveness probes. + +Pass `?deep=1` (also accepts `?deep=true` or bare `?deep`) for a probe that touches bridge state. The deep probe is what catches a "zombie daemon" — listener answering, bridge wedged: + +```json +{ "status": "ok", "sessions": 3, "pendingPermissions": 1 } +``` + +A 503 with `{"status":"degraded"}` indicates the bridge state access threw — pull the daemon out of rotation and investigate. **Auth:** required **only on non-loopback binds**. On loopback (`127.0.0.1`, `::1`, `[::1]`) `/health` is registered before the bearer middleware so k8s/Compose probes inside the pod don't need to carry the token. On non-loopback (`--hostname 0.0.0.0` etc.) the route is registered after the bearer middleware and returns 401 without a valid token — otherwise an unauthenticated caller could probe arbitrary addresses to confirm a `qwen serve` exists, a low-severity info leak that combines poorly with port scanning. CORS deny + Host allowlist still apply on the loopback exemption. 
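For an external health checker, the deep-probe decision rule reduces to "200 + `status: "ok"` → keep in rotation, anything else → pull and investigate". A minimal TypeScript sketch of that rule — the `classifyDeepHealth` helper and the `DeepHealthBody` type are illustrative names for this doc, not anything exported by qwen serve:

```typescript
// Interpret a /health?deep=1 result for a rotation decision.
// The body shape mirrors the documented response; names are illustrative.
interface DeepHealthBody {
  status: string;
  sessions?: number;
  pendingPermissions?: number;
}

function classifyDeepHealth(
  httpStatus: number,
  body: DeepHealthBody,
): 'healthy' | 'degraded' {
  // 503 + {status:"degraded"} (wedged bridge) and any other non-ok
  // answer are treated the same way: take the daemon out of rotation.
  return httpStatus === 200 && body.status === 'ok' ? 'healthy' : 'degraded';
}
```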
@@ -237,7 +257,7 @@ The SSE-level `id:` / `event:` lines duplicate `envelope.id` / `envelope.type` f Reconnect semantics: -- Send `Last-Event-ID: ` to replay events with `id > n` from the per-session ring (default depth 1000) +- Send `Last-Event-ID: ` to replay events with `id > n` from the per-session ring (default depth 4000) - **Gap detection (client-side):** if `` predates the oldest event still in the ring (e.g. you reconnect with `Last-Event-ID: 50` but the ring now holds 200–1199), the daemon replays from the oldest available event without raising. Compare the first replayed event's `id` against `n + 1`; any difference is the size of the lost window. Stage 2 will inject an explicit `stream_gap` synthetic frame on the daemon side; in Stage 1 detection is the client's responsibility. - IDs are monotonic per session, starting at 1 - Synthetic terminal frames (`client_evicted`, `stream_error`) intentionally omit `id` so they don't burn a sequence slot for other subscribers diff --git a/docs/users/qwen-serve.md b/docs/users/qwen-serve.md index bdd8eaade..23ae59a81 100644 --- a/docs/users/qwen-serve.md +++ b/docs/users/qwen-serve.md @@ -57,7 +57,7 @@ curl -N http://127.0.0.1:4170/session/$SESSION_ID/events The `data:` line is the **full event envelope** — `{id?, v, type, data, originatorClientId?}` — JSON-stringified on a single line. The ACP payload (the `sessionUpdate` block in this example) sits under `data` inside that envelope. The SSE-level `id:` / `event:` lines are convenience for EventSource clients; the same values appear inside the JSON envelope so raw-`fetch` consumers get them too. Open this **before** sending the prompt — the SSE replay buffer holds the -last 1000 events so a late subscriber can catch up via `Last-Event-ID`, +last 4000 events so a late subscriber can catch up via `Last-Event-ID`, but for the simple "watch a single prompt" case it's easiest to subscribe first and let it stream live. 
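The client-side gap check described in the protocol doc (compare the first replayed event's `id` against your last-seen `id + 1`) is small enough to sketch; `lostEventCount` is an illustrative helper name, not a shipped client API:

```typescript
// After reconnecting with `Last-Event-ID: lastSeenId`, compare the first
// replayed envelope's id against lastSeenId + 1. Returns 0 when the ring
// (default depth 4000) still held everything; otherwise the number of
// events that aged out of the ring while you were disconnected.
function lostEventCount(lastSeenId: number, firstReplayedId: number): number {
  return Math.max(0, firstReplayedId - (lastSeenId + 1));
}
```

E.g. reconnecting with `Last-Event-ID: 50` while the ring now starts at 200 reports a 149-event lost window (events 51–199).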
@@ -100,12 +100,13 @@ The token comparison is constant-time (SHA-256 + `crypto.timingSafeEqual`); 401 ## CLI flags -| Flag | Default | Purpose | -| ------------------- | ----------- | ----------------------------------------------------------------------------------------------------------------------------------- | -| `--port ` | `4170` | TCP port. `0` = OS-assigned ephemeral port. | -| `--hostname ` | `127.0.0.1` | Bind interface. Anything beyond loopback requires a token. | -| `--token ` | — | Bearer token. Falls back to `QWEN_SERVER_TOKEN` env var (with leading/trailing whitespace stripped — handy for `$(cat token.txt)`). | -| `--http-bridge` | `true` | Stage 1 mode: per-session `qwen --acp` child process. Stage 2 native in-process becomes available later. | +| Flag | Default | Purpose | +| -------------------- | ----------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `--port ` | `4170` | TCP port. `0` = OS-assigned ephemeral port. | +| `--hostname ` | `127.0.0.1` | Bind interface. Anything beyond loopback requires a token. | +| `--token ` | — | Bearer token. Falls back to `QWEN_SERVER_TOKEN` env var (with leading/trailing whitespace stripped — handy for `$(cat token.txt)`). | +| `--max-sessions ` | `20` | Cap on concurrent live sessions. New `POST /session` requests that would spawn a fresh child return `503` (with `Retry-After: 5`) when the cap is hit; attaches to existing sessions are NOT counted. Set to `0` to disable. Sized for single-user / small-team usage; raise it if your deployment has the RAM/FD headroom (~30–50 MB per session). | +| `--http-bridge` | `true` | Stage 1 mode: per-session `qwen --acp` child process. 
Stage 2 native in-process becomes available later. | ## Default deployment threat model diff --git a/packages/cli/src/commands/serve.ts b/packages/cli/src/commands/serve.ts index 0d9fa81b4..bc70f8828 100644 --- a/packages/cli/src/commands/serve.ts +++ b/packages/cli/src/commands/serve.ts @@ -16,6 +16,7 @@ interface ServeArgs { port: number; hostname: string; token?: string; + 'max-sessions': number; // Read from the kebab-case key only — the camelCase mirror that yargs // synthesizes is convenient for handlers but type-confusing here. The // handler reads `argv['http-bridge']` directly. @@ -45,6 +46,13 @@ export const serveCommand: CommandModule = { description: 'Bearer token required on every request. Falls back to the QWEN_SERVER_TOKEN env var.', }) + .option('max-sessions', { + type: 'number', + default: 20, + description: + 'Cap on concurrent live sessions. New spawn requests beyond this return 503; ' + + 'attach to existing sessions still works. Set to 0 to disable.', + }) .option('http-bridge', { type: 'boolean', default: true, @@ -79,6 +87,7 @@ export const serveCommand: CommandModule = { hostname: argv.hostname, token: argv.token, mode: 'http-bridge', + maxSessions: argv['max-sessions'], }); } catch (err) { writeStderrLine( diff --git a/packages/cli/src/serve/eventBus.ts b/packages/cli/src/serve/eventBus.ts index a917ae55b..1d099fb2a 100644 --- a/packages/cli/src/serve/eventBus.ts +++ b/packages/cli/src/serve/eventBus.ts @@ -61,7 +61,17 @@ export interface SubscribeOptions { } const DEFAULT_MAX_QUEUED = 256; -const DEFAULT_RING_SIZE = 1000; +/** + * Default replay-ring depth per session. Sized for a 5-second + * reconnect window over a chatty turn — a single long-running prompt + * can emit hundreds of frames (test plan reports 13 for a short + * turn, real workloads can be 10× that or more once tool-call / + * thought streams pile up). 
1000 was the original default and could + * be exhausted by a moderate turn before the client reconnected; + * 4000 gives ~30× headroom over a typical-but-busy turn at the cost + * of a few hundred KB of RAM per session. + */ +const DEFAULT_RING_SIZE = 4000; /** * Per-bus subscriber cap. With per-subscriber `maxQueued` defaulting to * 256 frames, 64 concurrent subscribers caps the per-session subscriber @@ -133,7 +143,7 @@ export class EventBus { ...input, }; this.ring.push(event); - // Eviction-by-shift is O(n) once the ring is full. With ringSize=1000 + // Eviction-by-shift is O(n) once the ring is full. With ringSize=4000 // and per-publish work measured in hundreds of microseconds even on // chatty sessions, this isn't a real hotspot today. A circular-buffer // refactor would push it to O(1) but adds index bookkeeping; deferred diff --git a/packages/cli/src/serve/httpAcpBridge.test.ts b/packages/cli/src/serve/httpAcpBridge.test.ts index 1ed70fbad..b5c68c017 100644 --- a/packages/cli/src/serve/httpAcpBridge.test.ts +++ b/packages/cli/src/serve/httpAcpBridge.test.ts @@ -1785,4 +1785,87 @@ describe('createHttpAcpBridge', () => { expect(events[0]?.type).toBe('session_died'); }); }); + + describe('maxSessions cap (chiga0 Rec 3)', () => { + it('refuses NEW spawns past the cap with SessionLimitExceededError', async () => { + const factory: ChannelFactory = async () => makeChannel().channel; + const bridge = createHttpAcpBridge({ + channelFactory: factory, + maxSessions: 2, + // `thread` so each call is a fresh session, not an attach. + sessionScope: 'thread', + }); + + // First two spawns succeed. + await bridge.spawnOrAttach({ workspaceCwd: WS_A }); + await bridge.spawnOrAttach({ workspaceCwd: WS_B }); + expect(bridge.sessionCount).toBe(2); + + // Third hits the cap. + await expect( + bridge.spawnOrAttach({ workspaceCwd: WS_A }), + ).rejects.toMatchObject({ + name: 'SessionLimitExceededError', + limit: 2, + }); + // Cap rejection must NOT register a new session. 
+ expect(bridge.sessionCount).toBe(2); + + await bridge.shutdown(); + }); + + it('attach to an existing session under single scope is NOT counted toward the cap', async () => { + const factory: ChannelFactory = async () => makeChannel().channel; + const bridge = createHttpAcpBridge({ + channelFactory: factory, + maxSessions: 1, + sessionScope: 'single', + }); + + // First call spawns. + const a = await bridge.spawnOrAttach({ workspaceCwd: WS_A }); + expect(a.attached).toBe(false); + expect(bridge.sessionCount).toBe(1); + + // Second call to the SAME workspace attaches — cap doesn't apply. + const b = await bridge.spawnOrAttach({ workspaceCwd: WS_A }); + expect(b.attached).toBe(true); + expect(b.sessionId).toBe(a.sessionId); + expect(bridge.sessionCount).toBe(1); + + // But a NEW workspace (would need a fresh spawn) is rejected. + await expect( + bridge.spawnOrAttach({ workspaceCwd: WS_B }), + ).rejects.toMatchObject({ + name: 'SessionLimitExceededError', + }); + + await bridge.shutdown(); + }); + + it('maxSessions: 0 disables the cap', async () => { + // Distinct sessionIdPrefix per spawn so each call gets a unique + // sessionId (otherwise they'd collide in `byId` and only the + // last would remain — making `sessionCount` stay at 1). + let n = 0; + const factory: ChannelFactory = async () => + makeChannel({ sessionIdPrefix: `s${n++}` }).channel; + const bridge = createHttpAcpBridge({ + channelFactory: factory, + maxSessions: 0, + sessionScope: 'thread', + }); + // 5 spawns stays well below the would-be default of 20, so on its + // own this can't distinguish "cap disabled" from "default cap not + // yet hit". What it does pin down is that `maxSessions: 0` maps to + // unlimited rather than to a literal cap of zero: a regression that + // treated 0 as "reject every spawn" would fail on the first + // iteration.
+ for (let i = 0; i < 5; i++) { + await bridge.spawnOrAttach({ workspaceCwd: WS_A }); + } + expect(bridge.sessionCount).toBe(5); + await bridge.shutdown(); + }); + }); }); diff --git a/packages/cli/src/serve/httpAcpBridge.ts b/packages/cli/src/serve/httpAcpBridge.ts index 8a667f0a2..88c9d9098 100644 --- a/packages/cli/src/serve/httpAcpBridge.ts +++ b/packages/cli/src/serve/httpAcpBridge.ts @@ -177,6 +177,22 @@ export class SessionNotFoundError extends Error { } } +/** + * Thrown by `spawnOrAttach` when a fresh-spawn would push `sessionCount` + * past `BridgeOptions.maxSessions`. The HTTP route maps this to 503 + * with a `Retry-After` hint. Attaches (same workspace under `single` + * scope) never trip this — only NEW children. Distinct error type so + * routes can branch without text-matching. + */ +export class SessionLimitExceededError extends Error { + readonly limit: number; + constructor(limit: number) { + super(`Session limit reached (${limit})`); + this.name = 'SessionLimitExceededError'; + this.limit = limit; + } +} + /** * One ACP NDJSON channel to a single agent. Tests inject a fake by replacing * the channel factory; production uses `defaultSpawnChannelFactory`. @@ -208,6 +224,14 @@ export interface BridgeOptions { channelFactory?: ChannelFactory; /** How long to wait for the child's `initialize` reply before giving up. */ initializeTimeoutMs?: number; + /** + * Cap on concurrent live sessions. `spawnOrAttach` calls that would + * cross this throw `SessionLimitExceededError`; attaches to an + * existing session (same workspace under `single` scope) are not + * counted. `0` / `Infinity` disable the cap. Defaults to 20 — see + * `ServeOptions.maxSessions` for the rationale. 
+ */ + maxSessions?: number; } interface SessionEntry { @@ -394,9 +418,21 @@ class BridgeClient implements Client { } const DEFAULT_INIT_TIMEOUT_MS = 10_000; +const DEFAULT_MAX_SESSIONS = 20; export function createHttpAcpBridge(opts: BridgeOptions = {}): HttpAcpBridge { const sessionScope = opts.sessionScope ?? 'single'; + // `undefined` → default 20 (intentionally tight per #3803 N≈50 cliff). + // `0` or non-finite (Infinity / NaN) → unlimited. + // Positive finite → use as-is. + let maxSessions: number; + if (opts.maxSessions === undefined) { + maxSessions = DEFAULT_MAX_SESSIONS; + } else if (opts.maxSessions <= 0 || !Number.isFinite(opts.maxSessions)) { + maxSessions = Infinity; + } else { + maxSessions = opts.maxSessions; + } if (sessionScope !== 'single' && sessionScope !== 'thread') { throw new TypeError( `Invalid sessionScope: ${JSON.stringify(sessionScope)}. ` + @@ -801,6 +837,14 @@ export function createHttpAcpBridge(opts: BridgeOptions = {}): HttpAcpBridge { } } + // Cap check: count both registered sessions and in-flight spawns + // (a fresh spawn that is still racing toward registration hasn't + // hit `byId` yet but should still count toward the limit). Attaches + // returned above bypass this — only NEW children are gated. + if (byId.size + inFlightSpawns.size >= maxSessions) { + throw new SessionLimitExceededError(maxSessions); + } + const promise = doSpawn(workspaceKey, req.modelServiceId); + // Track in-flight spawns regardless of scope. Under `single` + // this also serves the coalescing path above (a parallel diff --git a/packages/cli/src/serve/runQwenServe.ts b/packages/cli/src/serve/runQwenServe.ts index e5cf5c55c..c7e2cb4f6 100644 --- a/packages/cli/src/serve/runQwenServe.ts +++ b/packages/cli/src/serve/runQwenServe.ts @@ -74,7 +74,8 @@ export async function runQwenServe( ); } - const bridge = deps.bridge ?? createHttpAcpBridge(); + const bridge = + deps.bridge ??
createHttpAcpBridge({ maxSessions: opts.maxSessions }); let actualPort = opts.port; const app = createServeApp(opts, () => actualPort, { bridge }); diff --git a/packages/cli/src/serve/server.test.ts b/packages/cli/src/serve/server.test.ts index b6404916e..606737407 100644 --- a/packages/cli/src/serve/server.test.ts +++ b/packages/cli/src/serve/server.test.ts @@ -17,6 +17,7 @@ import type { SetSessionModelResponse, } from '@agentclientprotocol/sdk'; import { + SessionLimitExceededError, SessionNotFoundError, type BridgeSession, type BridgeSessionSummary, @@ -758,6 +759,69 @@ describe('createServeApp', () => { expect(bridge.calls).toHaveLength(0); }); }); + + describe('GET /health?deep=1 (chiga0 Risk 3)', () => { + it('default /health stays cheap (no bridge touch)', async () => { + const bridge = fakeBridge(); + const app = createServeApp(baseOpts, undefined, { bridge }); + const res = await request(app) + .get('/health') + .set('Host', `127.0.0.1:${baseOpts.port}`); + expect(res.status).toBe(200); + expect(res.body).toEqual({ status: 'ok' }); + }); + + it('deep=1 includes bridge state', async () => { + const bridge = fakeBridge(); + const app = createServeApp(baseOpts, undefined, { bridge }); + const res = await request(app) + .get('/health?deep=1') + .set('Host', `127.0.0.1:${baseOpts.port}`); + expect(res.status).toBe(200); + expect(res.body).toMatchObject({ + status: 'ok', + sessions: 0, + pendingPermissions: 0, + }); + }); + + it('deep=1 returns 503 when bridge state access throws', async () => { + // Simulate a wedged bridge by replacing the getter to throw. 
+ const bridge = fakeBridge(); + Object.defineProperty(bridge, 'sessionCount', { + get() { + throw new Error('bridge wedged'); + }, + }); + const app = createServeApp(baseOpts, undefined, { bridge }); + const res = await request(app) + .get('/health?deep=1') + .set('Host', `127.0.0.1:${baseOpts.port}`); + expect(res.status).toBe(503); + expect(res.body).toEqual({ status: 'degraded' }); + }); + }); + + describe('session limit (chiga0 Rec 3 — --max-sessions)', () => { + it('503 + Retry-After + structured error when bridge throws SessionLimitExceededError', async () => { + const bridge = fakeBridge({ + spawnImpl: async () => { + throw new SessionLimitExceededError(20); + }, + }); + const app = createServeApp(baseOpts, undefined, { bridge }); + const res = await request(app) + .post('/session') + .set('Host', `127.0.0.1:${baseOpts.port}`) + .send({ cwd: '/work/a' }); + expect(res.status).toBe(503); + expect(res.headers['retry-after']).toBe('5'); + expect(res.body).toMatchObject({ + code: 'session_limit_exceeded', + limit: 20, + }); + }); + }); }); describe('runQwenServe', () => { diff --git a/packages/cli/src/serve/server.ts b/packages/cli/src/serve/server.ts index a1b1a0e89..00c2f3345 100644 --- a/packages/cli/src/serve/server.ts +++ b/packages/cli/src/serve/server.ts @@ -12,6 +12,7 @@ import { bearerAuth, denyBrowserOriginCors, hostAllowlist } from './auth.js'; import { isLoopbackBind } from './loopbackBinds.js'; import { createHttpAcpBridge, + SessionLimitExceededError, SessionNotFoundError, type HttpAcpBridge, } from './httpAcpBridge.js'; @@ -75,11 +76,39 @@ export function createServeApp( // gate `/health` behind their token alongside everything else. // CORS deny + Host allowlist still apply to `/health` in both // cases. + // Shared handler so loopback (pre-auth) and non-loopback (post-auth) + // routes return the same shape. 
`?deep=1` opts into a probe that + // touches bridge state — if the bridge is wedged the property + // accesses throw and we surface 503 so k8s probes can distinguish + // a zombie daemon from a healthy one. Default (no query) stays + // cheap so high-frequency liveness probes don't load the bridge. + const healthHandler = ( + req: import('express').Request, + res: import('express').Response, + ): void => { + const deepQuery = req.query['deep']; + const deep = deepQuery === '1' || deepQuery === 'true' || deepQuery === ''; + if (!deep) { + res.status(200).json({ status: 'ok' }); + return; + } + try { + res.status(200).json({ + status: 'ok', + sessions: bridge.sessionCount, + pendingPermissions: bridge.pendingPermissionCount, + }); + } catch (err) { + writeStderrLine( + `qwen serve: /health deep probe failed: ${err instanceof Error ? err.message : String(err)}`, + ); + res.status(503).json({ status: 'degraded' }); + } + }; + const loopback = isLoopbackBind(opts.hostname); if (loopback) { - app.get('/health', (_req, res) => { - res.status(200).json({ status: 'ok' }); - }); + app.get('/health', healthHandler); } app.use(bearerAuth(opts.token)); @@ -89,9 +118,7 @@ export function createServeApp( // Non-loopback: register `/health` AFTER `bearerAuth` so probes // must carry the token. Otherwise unauthenticated callers can // ping any reachable address:port to confirm a daemon exists. - app.get('/health', (_req, res) => { - res.status(200).json({ status: 'ok' }); - }); + app.get('/health', healthHandler); } app.get('/capabilities', (_req, res) => { @@ -624,6 +651,20 @@ function sendBridgeError(res: import('express').Response, err: unknown): void { res.status(404).json({ error: err.message, sessionId: err.sessionId }); return; } + if (err instanceof SessionLimitExceededError) { + // 503 Service Unavailable + `Retry-After` is the canonical + // "we'd serve you, but we're full right now" shape. 
The hint + // is intentionally conservative (5s) because a session that + // finishes a prompt frees a slot quickly under normal load; + // a client that backs off too aggressively wastes capacity. + res.set('Retry-After', '5'); + res.status(503).json({ + error: err.message, + code: 'session_limit_exceeded', + limit: err.limit, + }); + return; + } // 5xx is the kind of error operators need to see in their daemon log // — bridge ENOMEM, agent stack trace, unexpected throw, etc. Without // logging here every 500 disappears once the caller consumes the diff --git a/packages/cli/src/serve/types.ts b/packages/cli/src/serve/types.ts index d55bcb3bd..0365bd2d0 100644 --- a/packages/cli/src/serve/types.ts +++ b/packages/cli/src/serve/types.ts @@ -25,6 +25,17 @@ export interface ServeOptions { */ token?: string; mode: ServeMode; + /** + * Cap on concurrent live sessions. Once `bridge.sessionCount` reaches + * this, new `POST /session` requests that would spawn fresh sessions + * return 503. Attaching to an existing session (same workspace under + * `sessionScope: 'single'`) still works — so an idle daemon doesn't + * block reconnects from existing users. Defaults to 20: comfortably + * above single-user usage, well below the design's N≈50 cliff where + * per-session RSS (~30–50 MB) and FD pressure start to bite. Set to + * `0` or `Infinity` to disable. + */ + maxSessions?: number; } /**