qwen-code/integration-tests/baselines
jinye 0788ed7fb0
test(perf): add daemon baseline harness (#4175 Wave 1 PR 1) (#4205)
* test(perf): add daemon baseline harness (#4175 Wave 1 PR 1)

First implementation PR of the Mode B v0.16 rollout (issue #4175 Wave 1
PR 1). Captures reference performance metrics for the `qwen serve`
daemon so subsequent Mode B PRs (M2 MCP shared pool, M3 architecture
refactor, M4 multi-client safety) can be measured against a known
baseline rather than guessed-at numbers.

## What it captures

The new `integration-tests/cli/qwen-serve-baseline.test.ts` runs five
describe blocks against a real `qwen serve` daemon:

- RSS scaling across 1 / 5 / 10 same-workspace `createOrAttachSession`
  calls (sampled via `ps -o rss=`).
- Same-workspace attach latency for the 2nd and 5th attach.
- MCP child amplification with two configured idle-mcp servers,
  measured via a two-level `pgrep -P` walk (daemon → ACP child → MCP
  grandchildren).
- SSE backpressure invariants exercised at the unit layer by
  instantiating `EventBus` directly: queue overflow → synthetic
  `client_evicted` frame; replay across reconnect honors
  `lastEventId` up to ring size.
- Prompt p50 / p99 (skipped when `QWEN_TEST_MODEL_KEY` is unset, with
  an explicit reason recorded in the snapshot).
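
The SSE backpressure invariants in the list above can be illustrated with a minimal sketch. It is not the real `EventBus`: the names `RingEventLog`, `QueueSubscriber`, `ringSize`, and `maxQueue` are hypothetical, and the real overflow policy may differ (here the backlog is assumed to be replaced by a single synthetic `client_evicted` frame).

```typescript
// Hypothetical sketch of the two invariants; not the qwen-code EventBus API.
interface SseFrame {
  id: number;
  type: string;
  data: unknown;
}

class RingEventLog {
  private readonly ring: SseFrame[] = [];
  private readonly ringSize: number;
  private nextId = 1;

  constructor(ringSize: number) {
    this.ringSize = ringSize;
  }

  // Append a frame and drop the oldest one once the ring is full.
  publish(type: string, data: unknown): SseFrame {
    const frame: SseFrame = { id: this.nextId++, type, data };
    this.ring.push(frame);
    if (this.ring.length > this.ringSize) this.ring.shift();
    return frame;
  }

  // Frames newer than lastEventId, limited to what the ring still holds.
  replayAfter(lastEventId: number): SseFrame[] {
    return this.ring.filter((f) => f.id > lastEventId);
  }
}

class QueueSubscriber {
  readonly queue: SseFrame[] = [];
  evicted = false;
  private readonly maxQueue: number;

  constructor(maxQueue: number) {
    this.maxQueue = maxQueue;
  }

  // Enqueue a frame; on overflow, swap the backlog for one eviction frame.
  offer(frame: SseFrame): void {
    if (this.evicted) return;
    if (this.queue.length >= this.maxQueue) {
      this.queue.length = 0;
      this.queue.push({ id: frame.id, type: 'client_evicted', data: null });
      this.evicted = true;
      return;
    }
    this.queue.push(frame);
  }
}
```

A reconnecting client that sends a `lastEventId` older than the oldest frame in the ring only receives what the ring still holds, which is the "up to ring size" bound in the bullet above.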

Each run writes a structured JSON snapshot to
`<INTEGRATION_TEST_FILE_DIR>/perf-baseline.json` plus a Markdown
summary, with `gitCommit` / platform / config preserved for cross-PR
correlation.
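
As a rough illustration of the snapshot and its Markdown summary, the sketch below uses the fields named in this description (`gitCommit`, platform, config, `notes`). The `metrics` map, the `notes` array type, the `BaselineSnapshot` name, and `renderSummary` are assumptions made for the example, not the shape actually written to `perf-baseline.json`.

```typescript
// Hypothetical snapshot shape; only gitCommit / platform / config / notes
// are named in the PR description, everything else is assumed.
interface BaselineSnapshot {
  gitCommit: string;
  platform: string;
  config: Record<string, unknown>;
  notes: string[];
  // A null value stands for a metric that was skipped in this run.
  metrics: Record<string, number | null>;
}

// Render a small Markdown summary: heading, metric table, then notes.
function renderSummary(s: BaselineSnapshot): string {
  const rows = Object.entries(s.metrics).map(
    ([name, value]) => `| ${name} | ${value === null ? 'skipped' : value} |`,
  );
  return [
    `# Baseline at ${s.gitCommit} (${s.platform})`,
    '',
    '| Metric | Value |',
    '| --- | --- |',
    ...rows,
    '',
    ...s.notes.map((n) => `- ${n}`),
  ].join('\n');
}
```

Keeping the JSON as the source of truth and deriving the Markdown from it means a later run can be diffed field by field, while the summary stays readable in a PR comment.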

## Honest documentation of current limits

The captured snapshot includes a `notes` field flagging that with the
default `sessionScope: 'single'`, N successive
`createOrAttachSession` calls return the same sessionId — so the RSS
and MCP metrics here measure "N attaches to one shared session", not
"N distinct sessions". Once Wave 2 PR 5 lands the per-request
`sessionScope: 'thread'` override, the harness will be updated to
optionally force distinct sessions and surface the P1 MCP N×M
amplification before M2 fixes it.

## Reused / new

Reused: existing daemon spawn pattern from `qwen-serve-routes.test.ts`
(port-0 + stdout regex + SIGTERM teardown), `pgrep -P` pattern from
`qwen-serve-streaming.test.ts:144`, `EventBus` invariants from
`eventBus.test.ts`, `DaemonClient` SDK, integration-tests
`globalSetup.ts` env var conventions.

New (this PR):

- `integration-tests/cli/_daemon-harness.ts` (~280 lines) — extracts
  the inline daemon spawn pattern into a shared helper and adds
  `getRssMB`, `startRssPolling`, `countDescendants`, `percentiles`,
  `consumeSseEvents`, `writeWorkspaceSettings`. Future serve test
  files can import instead of inlining.
- `integration-tests/fixtures/idle-mcp/{server.mjs,package.json}` — a
  minimal stdio MCP fixture that responds to `initialize` /
  `tools/list` and idles. Lets the harness count real MCP children
  via `pgrep` without depending on a network npm package in CI.
- `integration-tests/baselines/baseline-stage-1.json` — the first
  captured baseline at this commit. Future Mode B PRs can diff their
  run against this file; updating it is a deliberate one-line change
  in a follow-up PR.
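
Two of the helper ideas listed above are small enough to sketch. The code below is not taken from `_daemon-harness.ts`: the function names `percentile` and `rssKbToMB`, their signatures, and the nearest-rank method are assumptions chosen for illustration. It relies on `ps -o rss=` printing the resident set size in kilobytes with no header line.

```typescript
// Nearest-rank percentile over a sorted copy of the samples; p is in (0, 100].
// Assumed method: the real harness may interpolate instead.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const clamped = Math.min(sorted.length, Math.max(1, rank));
  return sorted[clamped - 1];
}

// Convert the raw output of `ps -o rss= -p <pid>` (kilobytes) to megabytes.
function rssKbToMB(psOutput: string): number {
  const kb = Number.parseInt(psOutput.trim(), 10);
  if (Number.isNaN(kb)) {
    throw new Error(`unexpected ps output: ${JSON.stringify(psOutput)}`);
  }
  return kb / 1024;
}
```

With only a handful of latency samples, nearest-rank p99 is effectively the maximum, so a small run should be read as an upper bound rather than a stable tail estimate.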

## Reference patterns from opencode

JSDoc on the main test file documents the shape borrowed from
`opencode/test/memory/abort-leak.test.ts` (forced-GC heap-growth),
`opencode/src/cli/heap.ts` (RSS poll + threshold-triggered
`writeHeapSnapshot`, useful for Wave 6 production tooling), and
`opencode/src/util/cpu-watchdog.ts` (event-loop lag drift sampling).
The harness here is daemon-level multi-session — a shape neither
opencode nor qwen-code had before.

## Engineering principles checklist

- [x] Independently mergeable (test-only; no production code touched)
- [x] Backward compatible (no removed routes / event fields / CLI behavior)
- [x] Default off (PR CI does not run integration tests; baseline
      runs in release CI / nightly / manual)
- [x] `qwen serve` Stage 1 routes / SDK behavior preserved (no production
      code changed)
- [x] Gradual migration (no client adapter migration in this PR)
- [x] Reversible (revert = delete files, no other side effects)
- [x] Tests-first (this IS the test PR; harness exercises real daemon
      end-to-end; Windows skipped via existing `process.platform === 'win32'`
      precedent)

## Test plan

- [x] `KEEP_OUTPUT=true TEST_CLI_PATH=$(pwd)/packages/cli/dist/index.js
      QWEN_BASELINE_SKIP_PROMPT_LATENCY=1 QWEN_BASELINE_RSS_SAMPLE_DURATION_MS=2000
      npx vitest run integration-tests/cli/qwen-serve-baseline.test.ts`
      — 6 passed / 1 skipped (prompt latency requires model key)
- [x] `npx tsc --noEmit -p integration-tests/tsconfig.json` — only
      pre-existing tsconfig `paths` glob warning remains, no new errors

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix: import exit from node:process in idle-mcp fixture

Fixes the ESLint no-undef error: 'process' is not defined.
Replaces process.exit(0) with exit(0), imported from node:process.

* fix(test): remove stale baseline lint disable

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(test): harden daemon baseline harness

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

---------

Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-17 00:41:26 +08:00
baseline-stage-1.json test(perf): add daemon baseline harness (#4175 Wave 1 PR 1) (#4205) 2026-05-17 00:41:26 +08:00