qwen-code/docs
ChiGao d343e2c15e
feat(perf): progressive MCP availability — MCP no longer blocks first input (#3994)
* feat(perf): progressive MCP availability — MCP no longer blocks first input

Today `Config.initialize()` runs MCP discovery synchronously, so the CLI
can't accept input until every configured MCP server finishes its
discovery handshake. One slow or hung server delays first input for every
user with MCP configured. The profiler instrumentation added in this PR
confirms this (set `QWEN_CODE_PROFILE_STARTUP=1` to reproduce):

| User scenario             | Time to first prompt input |
| ------------------------- | -------------------------- |
| No MCP                    | ~480 ms                    |
| 1 fast MCP                | ~875 ms                    |
| 2 fast + 1 slow MCP       | **~7.1 s**                 |
| 1 hung MCP server         | **~10.5 s**                |

(Measured on macOS arm64 / Node 24.15, n=30/fixture, p50.)

`Config.initialize()` now passes `{ skipDiscovery: true }` to
`createToolRegistry` by default and starts MCP discovery on a
fire-and-forget background path. As each server completes discovery,
the CLI's `AppContainer` debounces `setTools()` calls into one-frame
(16 ms) batches, so the model sees the consolidated tool list shortly
after each server settles. Rollback: `QWEN_CODE_LEGACY_MCP_BLOCKING=1`.
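
The batching described above can be sketched roughly as follows. This is a minimal illustration, not the code in `AppContainer.tsx`: `createBatcher`, `Schedule`, and `timerSchedule` are hypothetical names, and it assumes the first update in a window arms a single 16 ms timer rather than restarting it on every event (the commit message does not say which variant is used).

```typescript
// Hypothetical helper: coalesces a burst of update notifications into one flush.
type Cancel = () => void;
type Schedule = (fn: () => void, delayMs: number) => Cancel;

// Default scheduler backed by the global setTimeout.
const timerSchedule: Schedule = (fn, delayMs) => {
  const handle = setTimeout(fn, delayMs);
  return () => clearTimeout(handle);
};

function createBatcher(
  flush: () => void,
  delayMs: number = 16,
  schedule: Schedule = timerSchedule,
): { notify: () => void; cancel: () => void } {
  let cancelPending: Cancel | undefined;
  return {
    // Called for every update event; only the first call in a window arms the timer.
    notify() {
      if (cancelPending) return;
      cancelPending = schedule(() => {
        cancelPending = undefined;
        flush();
      }, delayMs);
    },
    // Drops a pending flush, e.g. when the subscribing component unmounts.
    cancel() {
      if (cancelPending) cancelPending();
      cancelPending = undefined;
    },
  };
}
```

In this sketch the `flush` callback would be the place to call `setTools()`, and each `mcp-client-update` event would call `notify()`. The scheduler is injectable only so the timing can be exercised without real timers.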

- `packages/core/src/config/config.ts` — `Config.initialize` switches
  to `skipDiscovery: true` + new `startMcpDiscoveryInBackground()`
  (defensive against partially-stubbed `ToolRegistry` in tests). Adds
  `MCPServerConfig.discoveryTimeoutMs` (last positional ctor param —
  doesn't shift existing call sites). Tool-call timeout is untouched.
- `packages/core/src/tools/tool-registry.ts` — new
  `getMcpClientManager()` getter so the background path can call the
  incremental discover directly without going through `discoverMcpTools`
  (which would wipe already-registered tools).
- `packages/core/src/tools/mcp-client-manager.ts` —
  `discoverAllMcpToolsIncremental` now: emits `mcp-client-update`
  after IN_PROGRESS transition, wraps each per-server discover in a
  discovery-only timeout (stdio 30s, remote 5s), emits trailing
  `mcp-client-update` after COMPLETED so UI subscribers see the
  terminal state.
- `packages/cli/src/ui/AppContainer.tsx` — new `useEffect` (gated on
  `isConfigInitialized`) subscribes to `mcp-client-update` and
  16ms-batches `setTools()` calls. Same effect also defers
  `finalizeStartupProfile` until MCP settles (or 35s hard cap), so
  startup-perf profiles capture the full MCP timeline.

The startup profiler is activated only by `QWEN_CODE_PROFILE_STARTUP=1`;
when the variable is unset, every profiler entry point short-circuits on a
single null/flag check and returns. Measured profiling (observer) overhead
is -1.12% Δp50 between profile-on and profile-off (Welch p=0.092,
n=30/config × 3 configs), which is within statistical noise.

- `packages/cli/src/utils/startupProfiler.ts` — extended with
  `events` array (multi-fire), `recordStartupEvent`,
  `setInteractiveMode`, `derivedPhases`, per-checkpoint heap snapshots,
  `MAX_EVENTS` cap, and `QWEN_CODE_PROFILE_STARTUP_OUTER` / NO_HEAP
  env opt-ins. + 7 new tests.
- `packages/core/src/utils/startupEventSink.ts` (new) — minimal
  cross-package sink so `core` can emit profiler events without
  reverse-depending on `cli`. No-op when no sink registered. + 4 tests.
- `packages/core/src/index.ts` — export `setStartupEventSink` /
  `recordStartupEvent` / type aliases.
- `packages/cli/src/gemini.tsx` — registers the sink at `main()`
  entry, adds `first_paint` checkpoint after Ink render, calls
  `setInteractiveMode(true)` in the interactive branch.
- `packages/core/src/config/config.ts` — emits
  `tool_registry_created`.
- `packages/core/src/core/client.ts` — emits `gemini_tools_updated`
  at the end of `setTools()`.
- `packages/core/src/tools/mcp-client-manager.ts` — emits
  `mcp_discovery_start`, `mcp_server_ready:<name>`,
  `mcp_first_tool_registered`, `mcp_all_servers_settled`.
- `packages/cli/src/ui/AppContainer.tsx` — emits
  `config_initialize_start`, `config_initialize_end`, `input_enabled`.

`Config.initialize()` now returns BEFORE MCP discovery completes.
Things to check:
- Any code path that assumed "after `config.initialize()`, all MCP
  tools exist in the registry" — these will see only built-in tools
  initially; new tools appear via `mcp-client-update` events.
- `MCPDiscoveryState.COMPLETED` is now set asynchronously instead of
  synchronously after `initialize()` resolves.
- Model requests issued before MCP settles see only built-in tools;
  subsequent requests see the full set as servers come online.
- Tests that assert MCP tool count immediately after
  `config.initialize()` should wait for the `mcp-client-update` with
  COMPLETED discoveryState instead.
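
For tests affected by the last point, a wait helper along these lines may be enough. This is a sketch under assumptions: `waitForMcpCompleted`, `UpdateSource`, and the string-literal `DiscoveryState` are hypothetical stand-ins for the real event emitter and the `MCPDiscoveryState` enum.

```typescript
// Hypothetical stand-in for MCPDiscoveryState.
type DiscoveryState = 'not_started' | 'in_progress' | 'completed';
type Listener = () => void;

// Minimal subset of an event emitter that this helper relies on.
interface UpdateSource {
  on(event: 'mcp-client-update', listener: Listener): void;
  off(event: 'mcp-client-update', listener: Listener): void;
}

// Resolves once discovery reports the completed state, then unsubscribes.
function waitForMcpCompleted(
  source: UpdateSource,
  getState: () => DiscoveryState,
): Promise<void> {
  return new Promise<void>((resolve) => {
    // Discovery may already have settled before the test started waiting.
    if (getState() === 'completed') {
      resolve();
      return;
    }
    const listener: Listener = () => {
      if (getState() !== 'completed') return;
      source.off('mcp-client-update', listener);
      resolve();
    };
    source.on('mcp-client-update', listener);
  });
}
```

A test would `await` this helper after `config.initialize()` and only then assert on the MCP tool count.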

- 313 impacted-area tests green (config / mcp-client-manager / client
  / startupProfiler 18 / startupEventSink 4).
- `tsc --noEmit` clean for `packages/core` and `packages/cli`.
- `eslint` clean on touched files.
- Manual: `QWEN_CODE_PROFILE_STARTUP=1 SANDBOX=1` interactive run
  produces a JSON profile in `~/.qwen/startup-perf/` containing
  `first_paint`, `config_initialize_start/end`, `input_enabled`,
  MCP per-server events, and `gemini_tools_updated`. See PR
  description's "How to validate" section.

Generated with AI

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(core): harden progressive MCP discovery against silent regressions

Addresses review feedback on PR #3994:

- Skip user-disabled servers in discoverAllMcpToolsIncremental. The new
  incremental path used to iterate Object.entries(servers) without
  consulting isMcpServerDisabled, so a server the user had explicitly
  turned off would still get connected and its tools registered.
  Mirrors the existing protection in discoverAllMcpTools.

- Disconnect the underlying client when runWithDiscoveryTimeout fires.
  Without this, the inner discoverMcpToolsForServer kept running after
  the timeout rejected the outer promise — if discover() eventually
  succeeded it would register the late server's tools into the live
  toolRegistry (a silent registration vector, especially exploitable
  with a 0/negative discoveryTimeoutMs override).

- Clamp discoveryTimeoutMs to [100ms, 300_000ms]. 0/negative/Infinity
  values previously passed through to setTimeout unvalidated and made
  the silent-registration bug above trivially reachable.

- Classify the `tcp` (WebSocket) transport field as remote so hung WS
  handshakes use the 5s default instead of the 30s stdio default.

- Defensive delete of serverDiscoveryPromises[name] in the per-server
  catch so a doomed/orphan entry can't briefly short-circuit a
  subsequent discoverMcpToolsForServer call.

Adds focused tests for each fix.
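
The timeout selection and clamping rules above can be illustrated with a small pure function. This is a sketch, not the project's code: `resolveDiscoveryTimeoutMs` and `ServerLike` are hypothetical, the `url` and `httpUrl` fields are assumed markers for remote transports (only `tcp` is named in the commit message), and falling back to the default for `NaN` is an assumption.

```typescript
const MIN_DISCOVERY_TIMEOUT_MS = 100;
const MAX_DISCOVERY_TIMEOUT_MS = 300_000;
const STDIO_DEFAULT_TIMEOUT_MS = 30_000;
const REMOTE_DEFAULT_TIMEOUT_MS = 5_000;

// Hypothetical subset of a server config; only the fields this sketch needs.
interface ServerLike {
  url?: string; // assumed remote marker
  httpUrl?: string; // assumed remote marker
  tcp?: string; // WebSocket transport, classified as remote
  discoveryTimeoutMs?: number;
}

function resolveDiscoveryTimeoutMs(server: ServerLike): number {
  const isRemote = Boolean(server.url || server.httpUrl || server.tcp);
  const fallback = isRemote ? REMOTE_DEFAULT_TIMEOUT_MS : STDIO_DEFAULT_TIMEOUT_MS;
  const requested = server.discoveryTimeoutMs;
  // No override (or an unusable one): use the transport default.
  if (requested === undefined || Number.isNaN(requested)) return fallback;
  // Clamp so 0, negative, and Infinity can never reach setTimeout.
  return Math.min(MAX_DISCOVERY_TIMEOUT_MS, Math.max(MIN_DISCOVERY_TIMEOUT_MS, requested));
}
```

With these rules, an override of `0` or `-5` resolves to 100 ms and `Infinity` resolves to 300 000 ms, so a misconfigured value can shorten or lengthen discovery but cannot disable the timeout.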

Generated with AI

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(cli): restore runtime.json sidecar and harden non-interactive MCP visibility

Addresses review feedback on PR #3994:

- Restore writeRuntimeStatus + markRuntimeStatusEnabled in
  startInteractiveUI. The progressive-MCP diff inadvertently dropped
  the runtime.json sidecar write from the interactive entry point,
  leaving Config.refreshSessionId()'s session-swap refresh as dead
  code and silently breaking external integrations (terminal
  multiplexers, IDE integrations, status daemons) that map PID →
  sessionId via runtime.json.

- Add Config.getFailedMcpServerNames() and surface a stderr warning
  in --prompt / stream-json / ACP entry points when one or more MCP
  servers failed during background discovery. Per-server errors are
  caught inside discoverAllMcpToolsIncremental and never reached a
  TTY otherwise, so a script using non-interactive mode with broken
  MCP config would silently run with only built-in tools — a
  regression vs the legacy synchronous path.

- Pass the parsed `settings` object through to
  runNonInteractiveStreamJson. The new call site dropped the
  argument, falling back to createMinimalSettings() and losing any
  user-configured permission / approval / hook setup for stream-json
  sessions. Added regression assertion to gemini.test.tsx.

- Move finalizeStartupProfile out of gemini.tsx's stream-json branch
  and into Session.ensureConfigInitialized so it runs AFTER
  config.initialize() / waitForMcpReady() in stream-json. Previously
  the profile was finalized before any MCP / config_initialize_*
  events were emitted, producing empty stream-json profiles.

- Gate setStartupEventSink registration on isStartupProfilerEnabled()
  so core-side recordStartupEvent calls short-circuit at the first
  null-check when profiling is disabled, instead of going through an
  arrow wrapper and the profiler's own enabled gate.

- Tighten the type-unsafe ToolRegistry cast in
  startMcpDiscoveryInBackground to preserve the typed return signature
  so a rename of getMcpClientManager would be flagged at this call
  site (kept the optional-chain guard for tests that stub
  ToolRegistry as a plain object).

- Re-document first_paint as "render call returned" so consumers don't
  confuse Ink's synchronous render() return with literal pixel paint.
  Kept the checkpoint name for backward compatibility with collected
  profiles.

Generated with AI

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(cli): restore resize repaint and pin gemini_tools_lag capture in AppContainer

Addresses review feedback on PR #3994:

- Restore the terminal-resize useEffect that calls
  repaintStaticViewport() when terminalWidth changes. The
  progressive-MCP diff removed previousTerminalWidthRef, the repaint
  useCallback, and the resize useEffect, so tmux pane resizes and
  fullscreen toggles left the static region rendered at the old width:
  header content visibly tore until something else triggered
  refreshStatic.

- Pin the gemini_tools_lag startup metric. The previous onMcpUpdate
  handler called finalizeOnce() synchronously when discovery reached
  COMPLETED, but the pending setTools() batch was still 16ms away.
  setTools() emits `gemini_tools_updated` — when finalize ran first
  the profile's `finalized` guard suppressed that event, so
  gemini_tools_lag came out undefined in interactive mode. New
  onMcpUpdate flushes setTools() NOW on COMPLETED and only finalizes
  after the flush resolves, guaranteeing the event lands.

- Log setTools() batch-flush errors via debugLogger instead of
  silently swallowing them. GeminiClient.setTools() has no try/catch
  around warmAll() / getFunctionDeclarations() / getChat().setTools();
  the previous `.catch(() => {})` would have hidden production
  tool-registration regressions completely.

Generated with AI

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(core): correct MCP failure visibility and incremental cleanup

Addresses three review findings on PR #3994:

- McpClient.discover() now flips the client status to DISCONNECTED before
  re-throwing. Previously, a server that connected successfully but whose
  discoverPrompts / discoverTools then rejected (or that returned no
  prompts and no tools) would remain CONNECTED in the global status
  registry. Config.getFailedMcpServerNames() filters by
  `status !== CONNECTED`, so such servers were silently omitted from the
  non-interactive failure banner and the Footer's MCP health pill kept
  counting them as healthy.

- discoverAllMcpToolsIncremental no longer records `outcome: 'ready'`
  for servers whose connect/discover threw. The inner
  discoverMcpToolsForServerInternal catches errors without re-throwing
  (best-effort discovery semantics), so the try block resolved even for
  failures — only the runWithDiscoveryTimeout path reached the catch.
  Auth errors, server crashes, and missing-tools responses were therefore
  recorded as success in the startup profile. We now consult the actual
  server status (now correctly DISCONNECTED after the first fix) before
  emitting `ready`, and emit `outcome: 'failed'` otherwise.
  `mcp_first_tool_registered` is gated on the same check so a failed
  server can't pollute that user-facing metric.

- discoverAllMcpToolsIncremental tears down enabled→disabled mid-session
  transitions. When a previously-connected server is disabled (e.g. via
  `/mcp disable foo` or by editing settings), the incremental path used
  to just `continue` past it, leaving its client, tools, health check,
  and global status entry in place. Now calls removeServer() for any
  already-known client we encounter in the disabled branch.

Adds focused tests for each fix.
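
The status-based checks in the first two items can be sketched as follows. Everything here is illustrative: the function names and the string-literal status type are hypothetical, and the text after the colon in the warning is an assumed format (a later commit quotes only the `Warning: MCP server(s) failed to start` prefix).

```typescript
// Hypothetical stand-in for the real MCP server status enum.
type McpServerStatus = 'connected' | 'connecting' | 'disconnected';

// Mirrors the described filter: anything that is not connected counts as failed.
function failedServerNames(statuses: ReadonlyMap<string, McpServerStatus>): string[] {
  const failed: string[] = [];
  statuses.forEach((status, name) => {
    if (status !== 'connected') failed.push(name);
  });
  return failed;
}

// Outcome recorded in the startup profile for one server. The decision is
// based on the actual status, not on whether the discover call resolved.
function discoveryOutcome(status: McpServerStatus | undefined): 'ready' | 'failed' {
  return status === 'connected' ? 'ready' : 'failed';
}

// Returns undefined when there is nothing to report.
function formatFailureWarning(names: string[]): string | undefined {
  if (names.length === 0) return undefined;
  return `Warning: MCP server(s) failed to start: ${names.join(', ')}`;
}
```

The sketch shows why the first fix matters for the second: if a failed server kept a connected status, both `failedServerNames` and `discoveryOutcome` would report it as healthy.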

Generated with AI

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* docs(core): clarify ToolRegistry cast comment in startMcpDiscoveryInBackground

Addresses review feedback on PR #3994. The previous comment claimed the
call site uses "no defensive cast" but the code still casts via
`as ToolRegistry & { getMcpClientManager?: ... }`. Reword to explain
the cast's actual purpose: it exists only because some tests stub
ToolRegistry as a plain object, so we use optional chaining to avoid
crashing the init path when those tests run. Also note that the inner
shape now uses `ReturnType<ToolRegistry['getMcpClientManager']>` — a
future rename of the production method still surfaces as a type error
at this call site rather than silently falling through to the
`if (!manager)` branch.

Comment-only change; no behavior diff.

Generated with AI

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(core): close MCP timeout TOCTOU race and propagate disconnect status

Addresses two critical findings on PR #3994 round 6:

- runWithDiscoveryTimeout no longer uses fire-and-forget disconnect. The
  prior `void client.disconnect()` returned before `transport.close()`
  landed, leaving a window where an in-flight `discover()` could pump
  `tools/list` through the transport and synchronously register tools
  into the live registry BEFORE the close took effect. The earlier fix
  comment described this as a "remote-exploitable silent-tool-registration
  vector"; the await closes the timing window but doesn't help if tools
  already landed, so we also drop them with `removeMcpToolsByServer()`
  after the disconnect resolves. No-op when discover hadn't reached
  registration yet.

- McpClient.disconnect() now writes DISCONNECTED to the global registry
  directly. Previously, `isDisconnecting = true` was set BEFORE the
  internal `updateStatus(DISCONNECTED)` call, and `updateStatus`'s guard
  (designed to suppress LATE writes from a stale `connect()` catch)
  silently swallowed the write. The global stayed CONNECTED forever for
  timeout-disconnected servers, so `Config.getFailedMcpServerNames()`
  (which filters `status !== CONNECTED`) omitted them from the
  non-interactive failure banner and the Footer's MCP health pill kept
  counting them as healthy. This invalidated the round-5
  `getMCPServerStatus === CONNECTED` gate, which would always pass the
  "ready" check for timed-out servers. The guard stays in place for its
  original purpose; the legitimate disconnect→DISCONNECTED notification
  now bypasses it by writing the registry directly.

Also adds the `config_initialize_start` / `_end` profiler checkpoints
to `Session.ensureConfigInitialized()` so stream-json startup profiles
include the same derived `config_initialize_dur` phase as the
non-stream-json branch in gemini.tsx (round 6 [Suggestion]).

Tests cover (a) the disconnect-and-cleanup path on timeout and (b) the
intentional-disconnect global registry propagation regression.
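
The ordering that closes the race (await the disconnect, then drop any tools that slipped in) can be sketched like this. It is a simplified illustration under assumptions: `runWithDiscoveryTimeoutSketch` and `ClientLike` are hypothetical, and the `removeToolsForServer` callback stands in for `removeMcpToolsByServer()`.

```typescript
// Hypothetical minimal client surface used by this sketch.
interface ClientLike {
  discover(): Promise<void>;
  disconnect(): Promise<void>;
}

async function runWithDiscoveryTimeoutSketch(
  client: ClientLike,
  timeoutMs: number,
  removeToolsForServer: () => void,
): Promise<'ready' | 'timeout'> {
  let cancelTimer: () => void = () => {};
  const timedOut = new Promise<'timeout'>((resolve) => {
    const handle = setTimeout(() => resolve('timeout'), timeoutMs);
    cancelTimer = () => clearTimeout(handle);
  });
  const discovered = client.discover().then(() => 'ready' as const);
  try {
    const outcome = await Promise.race([discovered, timedOut]);
    if (outcome === 'timeout') {
      // Await the disconnect so the transport is closed before cleanup runs.
      await client.disconnect();
      // Drop anything a late discover managed to register in the meantime.
      removeToolsForServer();
    }
    return outcome;
  } finally {
    cancelTimer();
  }
}
```

The cleanup step is deliberately unconditional on the timeout path: it costs nothing when discovery never reached registration and removes the tools when it did.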

Generated with AI

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(mcp): surface failures + prevent health-check resurrection of timed-out servers

Round-7 review follow-ups:

- AppContainer (interactive): MCP startup failures now route through
  debugLogger.warn on COMPLETED. Previously only debug logs and profile
  events recorded failures, so regular interactive users got no
  indication that their MCP servers had failed. This mirrors the
  non-interactive stderr warning but uses debugLogger so it doesn't
  collide with Ink's rendered output.

- acpAgent per-session: `QwenAgent.initializeConfig()` now emits the
  same `Warning: MCP server(s) failed to start` stderr line as the
  top-level `runAcpAgent` path. Previously per-session ACP configs
  with failed MCP servers silently fell back to built-in tools.

- mcp-client-manager timeout handler: after disconnecting an
  intentionally timed-out server, also drop it from `this.clients` and
  stop any pending health-check timer. Without this the discovery
  `finally` block would arm a health-check that detected DISCONNECTED
  status and called `reconnectServer()` → `discoverMcpToolsForServer()`
  directly — bypassing `runWithDiscoveryTimeout` entirely and silently
  resurrecting the slow server. `startHealthCheck` also early-returns
  for unknown servers so the trailing finally-block call is a no-op.

- startupEventSink: silent `catch {}` now logs via `debugLogger.error`
  so a corrupted sink doesn't silently drop every subsequent event.
  Quiet by default; visible under `QWEN_CODE_DEBUG=1`.
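
The health-check invariant from the timeout-handler item can be illustrated with a small registry. This is a sketch only: `HealthCheckRegistry` and its methods are hypothetical and model just the bookkeeping (known clients and pending timers), not the real reconnect logic.

```typescript
type CancelTimer = () => void;

// Hypothetical bookkeeping for known clients and their health-check timers.
class HealthCheckRegistry {
  private readonly clients = new Set<string>();
  private readonly timers = new Map<string, CancelTimer>();
  private readonly arm: (name: string) => CancelTimer;

  constructor(arm: (name: string) => CancelTimer) {
    this.arm = arm;
  }

  addClient(name: string): void {
    this.clients.add(name);
  }

  // Timeout path: forget the client and stop any pending health check.
  dropAfterTimeout(name: string): void {
    this.clients.delete(name);
    const cancel = this.timers.get(name);
    if (cancel) cancel();
    this.timers.delete(name);
  }

  // Early-returns for unknown servers, so a trailing call made after a
  // timeout cannot re-arm a check that would later trigger a reconnect.
  startHealthCheck(name: string): boolean {
    if (!this.clients.has(name)) return false;
    if (this.timers.has(name)) return true;
    this.timers.set(name, this.arm(name));
    return true;
  }
}
```

Because the timed-out server is removed from the known set first, the later `startHealthCheck` call from the discovery `finally` block becomes a no-op instead of a reconnect path.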

Tests:
- mcp-client-manager.test.ts: regression for the timeout → no-reconnect
  invariant (clients map purged + health-check timer absent).
- acpAgent.test.ts: per-session newSession surfaces failures to stderr,
  and stays safe when Config lacks `getFailedMcpServerNames`.

Declined (with reasoning in the PR reply):
- [Critical] AppContainer batch-flush useEffect untested: this re-flags
  the round-5 deferral that wenshao acknowledged at the time.
  Lower-layer invariants (this PR's mcp-client-manager + mcp-client
  tests) pin the dependent contracts. The component-test harness for
  timers + event emitters in this file is non-trivial and out of scope;
  it is tracked for a follow-up.

Generated with AI

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

---------

Co-authored-by: 秦奇 <gary.gq@alibaba-inc.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-05-13 22:17:16 +08:00

| Name      | Last commit                                                                          | Date                       |
| --------- | ------------------------------------------------------------------------------------ | -------------------------- |
| design    | docs(auth): add custom API key wizard PRD (#3583)                                    | 2026-05-13 14:04:41 +08:00 |
| developers | feat(cli,sdk): qwen serve daemon (Stage 1) (#3889)                                  | 2026-05-13 14:47:47 +08:00 |
| plans     | feat(vscode-ide-companion): add agent execution tool display (#2590)                 | 2026-04-18 23:39:26 +08:00 |
| users     | feat(perf): progressive MCP availability — MCP no longer blocks first input (#3994)  | 2026-05-13 22:17:16 +08:00 |
| _meta.ts  | feat: refactor docs                                                                  | 2025-12-05 10:51:57 +08:00 |
| index.md  | fix: lint issues                                                                     | 2025-12-19 15:52:11 +08:00 |