qwen-code

mirror of https://github.com/QwenLM/qwen-code.git synced 2026-05-18 06:05:04 +00:00

History

jinye d2d426fad0 feat(serve): SSE replay sizing + slow_client_warning backpressure (#4175 Wave 2.5 PR 10) (#4237 ) * feat(serve): SSE replay sizing + slow_client_warning backpressure #4175 Wave 2.5 PR 10. Closes the SSE replay / backpressure knobs called out in #3803 §02 so chatty Stage 1 sessions get an honest reconnect window and operators get a heads-up signal before clients are summarily evicted. - `DEFAULT_RING_SIZE` 4000 → 8000. Per-session replay ring depth now matches the #3803 §02 target for chatty sessions. - `--event-ring-size <n>` CLI flag (default 8000) lets operators tune the ring per daemon. Threaded `ServeOptions` → `BridgeOptions.eventRingSize` → both `new EventBus()` construction sites (fresh sessions + restore path). Validation is fail-CLOSED (positive finite integer; 0 / NaN / negative throw at boot). - `slow_client_warning` SSE frame. When a subscriber's queue crosses 75% full the bus force-pushes a synthetic `slow_client_warning` to that subscriber once per overflow episode, carrying `{queueSize, maxQueued, lastEventId}`. The flag re-arms after the queue drains below 37.5% (hysteresis, no flap near threshold). If the queue actually overflows after the warning, the existing `client_evicted` terminal frame path still fires. Like `client_evicted`, the warning has no `id` (synthetic frame; must not burn a sequence slot for other subscribers). - `?maxQueued=N` query param on `GET /session/:id/events` (range `[16, 2048]`, default 256). Lets cold reconnect clients pre-size their per-subscriber backlog so a large `Last-Event-ID: 0` replay doesn't trip the warning on the first publish. Range rationale: lower bound 16 (smaller is useless for any replay); upper bound 2048 (so a single subscriber can't pin ~1 MB just by asking). Out-of-range / non-decimal returns `400 invalid_max_queued` BEFORE opening the SSE stream — clean 4xx beats half-opening a stream + emitting a `stream_error` (which EventSource would auto-reconnect on). - `slow_client_warning` capability tag — single source of truth for the warning frame + `?maxQueued` query param + ring-size knob. Old daemons silently lack all of these; pre-flight via `caps.features`. - SDK extensions (`@qwen-code/sdk`): typed `DaemonSlowClientWarningEvent` (added to known event union and `DaemonStreamLifecycleEvent`); schema-validated by a new `isSlowClientWarningData` predicate; reducer (`reduceDaemonSessionEvent`) increments `slowClientWarningCount` + stores `lastSlowClientWarning`. Warning is non-terminal — `alive` stays true (only `client_evicted` / `stream_error` / `session_died` close the stream). Re-exported from the public SDK entry. - Docs: `qwen-serve-protocol.md` updates the features list (adds `slow_client_warning` and the previously-missing `client_identity` to match reality post-#4231), documents the `?maxQueued` query param, adds the warning frame to the event table, and notes the new default ring size. `qwen-serve.md` adds the `--event-ring-size` flag row. Tests: 19 eventBus (4 new: warning at 75%, once per episode, no `id` on the synthetic frame, hysteresis re-arm), 106 bridge (2 new: validate eventRingSize accept/reject), 111 server (4 new: ?maxQueued accept/absent/non-decimal/out-of-range + EXPECTED_STAGE1_FEATURES update), 14 SDK daemonEvents (2 new: schema validation + non-terminal reducer behavior). 321 focused tests total, all green. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * refactor(serve): adopt PR #4237 review feedback (eventBus polish) Address the actionable items from the Qwen Code review bot's pass on PR #4237: - Pre-compute `warnThreshold` / `warnResetThreshold` per `InternalSub` at `subscribe()` time so `publish()`'s per-event hot path is one integer compare per subscriber instead of a multiply + compare. The `!warned` short-circuit still collapses the steady state to a single boolean read; this just shaves a multiply when the threshold check actually fires. - Document the back-of-queue ordering choice for the synthetic `slow_client_warning` frame in `EventBus.publish()`: front-push was considered but mid-stream front-insertion would mis-count `forcedInBuf` in `BoundedAsyncQueue.next()`, and `forcePush` already short-circuits via `resolvers.shift()` for the active-consumer case — the back-of-queue path only matters for stalled consumers, who can't drain regardless of warning position. - Reuse the existing `collect()` helper in the "default ring size 8000" test for consistency with the rest of the file; the new test also tightens the assertion by checking that the first retained event id is 2 (id=1 dropped by the ring) and the last is 8001. - Soften the "~500 B per session" magic number in `BridgeOptions.eventRingSize`'s JSDoc to a qualitative description (each retained `BridgeEvent` is a reference plus its serialized payload; ceiling scales as `ringSize × average-event-size`). Rejected: - Bot's claim that the error JSON contains `\`...\`` escape sequences — bot misread the JS template-literal source as the wire output; `JSON.stringify` does not escape backticks, and the existing `cwd` error messages use the same style. - Bot's "use `Record<string, never>` instead of `[key: string]: unknown`" suggestion on `DaemonSlowClientWarningData` — every other event-data type in `sdk-typescript/src/daemon/events.ts` carries the same index signature for additive-field compatibility. - Bot's "features list breaks alphabetical order" — the capability list is grouped by protocol lifecycle (health → capabilities → session lifecycle → events → permissions), not alphabetical. Tests: 139 focused tests across eventBus + httpAcpBridge + SDK daemon events — all passing. Behavior unchanged; this is hot-path micro-opt + comment polish only. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(serve): correct queue tagging + plumb maxQueued through SDK Address both P2 findings from the Codex review pass on PR #4237. Bug 1: `BoundedAsyncQueue.forcedInBuf` position-invariant break The previous `forcedInBuf` counter only tracked LIVE-vs-FORCED correctly when all forced entries lived at the FRONT of the buffer (subscribe-time `Last-Event-ID` replay). The new mid-stream `slow_client_warning` path force-pushes to the BACK of the queue while the queue is still open, which the existing accounting was not designed for: - publish 6 events at maxQueued=8 → 75% threshold trips → force-push warning at the back → buf=[1..6, warning], forcedInBuf=1. - consumer shifts `1` → forcedInBuf decremented to 0 (incorrect: `1` was a live frame, not the forced one). - consumer drains 2..6 + warning → buf=[], forcedInBuf=0, true live count = 0, but `size` getter and `push()` cap check then use `buf.length - forcedInBuf` which drifts over subsequent refills, causing premature warn / eviction before the cap is actually reached. Replace the position-dependent counter with a per-entry `{value, forced}` tag. `liveCount` is incremented in `push()` / decremented in `next()` only when the shifted entry was non-forced — position becomes irrelevant. `size` getter returns `liveCount` directly. The class doc comment is rewritten to call out that the new tag is the position-independent replacement for the old "forced frames must stay at the front" invariant. Regression test in `eventBus.test.ts` reproduces the codex trace (warn at 75%, drain past warning, refill to cap) and asserts no premature eviction. Bug 2: SDK does not expose `?maxQueued` `docs/users/qwen-serve.md` and `docs/developers/qwen-serve-protocol.md` both document `?maxQueued=N` as something SDK clients can request, but `SubscribeOptions` on `DaemonClient` only declared `lastEventId` + `signal`, and `subscribeEvents()` always fetched `/events` without a query string. Typed-SDK consumers had no way to opt in without hand-crafting URLs. - Add `SubscribeOptions.maxQueued?: number` with JSDoc noting the daemon range `[16, 2048]` and the pre-flight requirement on `caps.features.slow_client_warning`. - `DaemonClient.subscribeEvents` builds the URL with an optional `?maxQueued=<n>` segment. No client-side range validation — the daemon's `parseMaxQueuedQuery` is the source of truth and returns structured `400 invalid_max_queued`; duplicating the bounds in two layers would diverge on the next tweak. - `DaemonSessionSubscribeOptions extends SubscribeOptions` so the new field flows through `DaemonSessionClient` automatically. Three new SDK tests: - subscribeEvents appends `?maxQueued=N` when set - omits the query string when absent (existing behavior preserved) - propagates a `400 invalid_max_queued` unchanged Tests: 214 focused tests across eventBus / bridge / SDK DaemonClient / DaemonSessionClient / daemonEvents, plus 111 in the server suite. All green; the new eventBus regression case proves the position-invariant fix. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * refactor(serve): adopt PR #4237 copilot review feedback Address 6 of 8 copilot-reviewer findings on PR #4237; the other 2 (#1 forcedInBuf live-size corruption, #5 SDK lacks maxQueued) were already fixed in `bae42c88b` — replied on the threads with the commit hash. - [2] server.ts:1068 — `?maxQueued=` (present-but-empty) now fails closed with `400 invalid_max_queued` instead of silently falling back to the default queue cap. The API documents fail-closed for any malformed value before opening SSE, so an empty string is unambiguously malformed. New server.test.ts case locks this in. - [3] commands/serve.ts:93 — CLI help text for `--event-ring-size` no longer mis-shapes `Last-Event-ID` as a query parameter. It is an HTTP header, and the daemon's SSE route does not parse a `?Last-Event-ID=` query. - [4] docs/developers/qwen-serve-protocol.md:351 — clarify that `?maxQueued=N` controls the LIVE-event backlog cap. Replay frames are force-pushed and exempt from the cap; what consumes it is live events that arrive while the subscriber is still draining a cold-reconnect replay. Bumping for cold reconnects is still the right answer, but for the live tail, not for the replay frames themselves. - [6] eventBus.ts:214 — stale `ringSize=4000` performance comment updated to the new `ringSize=8000` default with a note about the O(n) `shift()` cost scaling. - [7] sdk-typescript events.ts:492 — `isSlowClientWarningData` now uses the existing `isFiniteNumber` helper instead of bare `typeof === 'number'`. Mirrors the sibling predicates and rejects `NaN` / `Infinity` payloads as schema garbage. New daemonEvents.test.ts assertions cover both. - [8] server.ts:127 — `createServeApp`'s default-bridge construction now also forwards `opts.eventRingSize` to `createHttpAcpBridge`, symmetric with the `runQwenServe.ts` path. Direct embeds / tests that called `createServeApp` without supplying their own bridge but did pass `ServeOptions.eventRingSize` were silently getting the default 8000 ring. Tests: 326 focused tests across eventBus / bridge / SDK DaemonClient / DaemonSessionClient / daemonEvents / server. All green; the new server.test.ts case + the extended daemonEvents.test.ts assertions cover the tightened guards. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * refactor(serve): adopt PR #4237 wenshao round-2 review feedback Six adopted findings from @wenshao's second review pass on PR #4237. The seventh ([10] forcedInBuf 3rd case invariant) was already fixed in `bae42c88b` — replied on that thread. - [9] + [14] server.ts — Sanitize attacker-controlled values before stderr interpolation in both `parseMaxQueuedQuery` and `parseLastEventId`. New `safeLogValue()` helper uses `JSON.stringify` to escape control characters (`\n`/`\r`/…) so a URL-encoded newline in `?maxQueued=%0a` can't inject extra log lines into journald/Loki/Splunk pipelines. Matches the `workspace_mismatch` sanitization style in `sendBridgeError`. Fixed in both helpers (the sibling pre-existing `parseLastEventId` had the same shape) so the file stays consistent. - [11] httpAcpBridge.ts — `!Number.isFinite(eventRingSize)` was redundant: `Number.isInteger(NaN)` and `Number.isInteger(Infinity)` both return `false`, so the sibling `!Number.isInteger` already catches both. Drop the dead guard. - [12] httpAcpBridge.ts — Add soft upper bound `MAX_EVENT_RING_SIZE = 1_000_000` on `eventRingSize` to catch operator typos (`--event-ring-size 80000000` vs `8000000`). At ~500 B per `BridgeEvent` an 1M-frame ring already pins ~500 MB per session — well past any realistic workload. Not a security boundary (operator-controlled flag), pure typo defense. Existing bridge construction test extended with an `80_000_000` case. - [13] commands/serve.ts — CLI `--event-ring-size` flag now sources its default from `DEFAULT_RING_SIZE` (imported from `serve/eventBus.js`) instead of the hardcoded literal `8000`. Without this, a future bump of the bus default would silently not take effect for daemons launched through the CLI because the flag always overrides — single source of truth fixes that. - [15] eventBus.ts — Drop unreachable `event.id ?? this.lastEventId` fallback in the `slow_client_warning` frame. `event` is locally constructed at the top of `publish()` with `id: this.nextId++` and is guaranteed defined. Use `event.id as number` directly + an inline note about the invariant. Tests: 197 (eventBus 20 / bridge 107 / SDK DaemonClient 57 / SDK daemonEvents 14) + 112 server. All green; the new upper-bound bridge case + the existing log assertions pin the changed behaviors. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) --------- Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>		2026-05-17 19:30:43 +08:00
..
development	feat(telemetry): add detailed sensitive span attributes (#4097 )	2026-05-17 00:36:48 +08:00
examples	refactor(serve): 1 daemon = 1 workspace (#3803 §02) (#4113 )	2026-05-15 12:44:36 +08:00
tools	feat(web-search): remove built-in web_search tool, replace with MCP-based approach (#3502 )	2026-04-24 11:29:02 +08:00
_meta.ts	feat(cli,sdk): qwen serve daemon (Stage 1) (#3889 )	2026-05-13 14:47:47 +08:00
architecture.md	docs: enhance architecture documentation and add contribution guidelines	2025-12-11 18:31:24 +08:00
channel-plugins.md	feat(channels): add dispatch modes and prompt lifecycle hooks	2026-03-28 06:19:02 +00:00
contributing.md	chore(deps): upgrade ink 6.2.3 → 7.0.2 + bump Node engine to 22 (#3860 )	2026-05-11 17:29:50 +08:00
qwen-serve-protocol.md	feat(serve): SSE replay sizing + slow_client_warning backpressure (#4175 Wave 2.5 PR 10) (#4237 )	2026-05-17 19:30:43 +08:00
roadmap.md	feat(cli): improve auth dialog UX with clearer three-option layout	2026-03-01 15:22:35 +08:00
sdk-java.md	Fix typo in class name (#2189 )	2026-04-18 11:59:36 +08:00
sdk-python.md	doc[sdk-python] Expand Python SDK usage documentation (#3995 )	2026-05-12 15:27:00 +08:00
sdk-typescript.md	chore(deps): upgrade ink 6.2.3 → 7.0.2 + bump Node engine to 22 (#3860 )	2026-05-11 17:29:50 +08:00