qwen-code/packages/cli
tanzhenxin d7a25682e6
refactor(core): route side-query LLM calls through runSideQuery chokepoint (#3775)
* refactor(core): route side-query LLM calls through runSideQuery chokepoint

Folds every one-shot side-query call site through a single `runSideQuery`
entry point with `thinkingConfig.includeThoughts: false` and `fastModel`
(falling back to main) as the default policy. Adds a text-mode sibling
to the existing JSON-mode helper, plus a `BaseLlmClient.generateText`
primitive that calls `ContentGenerator.generateContent` directly so
side queries get neither user-memory wrapping nor the main-prompt
fallback that `geminiClient.generateContent` applies.
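
A minimal sketch of the default policy described above, assuming simplified option and settings shapes. `SideQueryOptions` and `ModelSettings` are hypothetical stand-ins, and the bodies of `resolveDefaultModel` and `applyThinkingDefault` are illustrative rather than the code in this commit:

```typescript
// Hypothetical shapes for illustration; the real types in the repo may differ.
interface ThinkingConfig {
  includeThoughts?: boolean;
}
interface SideQueryOptions {
  model?: string;
  config?: { thinkingConfig?: ThinkingConfig };
}
interface ModelSettings {
  mainModel: string;
  fastModel?: string;
}

// A caller-supplied model wins; otherwise prefer fastModel, falling back to main.
function resolveDefaultModel(
  opts: SideQueryOptions,
  settings: ModelSettings,
): string {
  return opts.model ?? settings.fastModel ?? settings.mainModel;
}

// Thinking is off by default; a caller-supplied thinkingConfig overrides it.
function applyThinkingDefault(opts: SideQueryOptions): {
  thinkingConfig: ThinkingConfig;
} {
  return {
    ...opts.config,
    thinkingConfig: { includeThoughts: false, ...opts.config?.thinkingConfig },
  };
}
```

Call sites that need main-model quality would pass `model` explicitly, which is how the overrides listed below could coexist with a fast-model default.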

Migrated call sites: session title, recap, tool-use summary, /rename,
follow-up suggestion (direct path), ACP rewrite, project /summary,
arena approach summary, chat compression, web-fetch, insight analysis,
subagent spec generation. Six call sites override the helper defaults
explicitly (subagent gen, suggestion, ACP rewrite, /summary, compression,
insight) where main-model quality or caller-supplied model matters.

The /summary path additionally fixes a latent bug: text extraction
previously did not strip thought parts, so on thinking models the
saved `.qwen/PROJECT_SUMMARY.md` could leak `reasoning_content` into
the file. The chokepoint now strips thought parts and the request
itself goes out with thinking off.
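
The thought-stripping step can be sketched as follows, assuming a simplified part shape with an optional `text` and a `thought` flag. `Part` and `extractVisibleText` are hypothetical names, not the helpers used in the commit:

```typescript
// Simplified response part, assumed for illustration.
interface Part {
  text?: string;
  thought?: boolean;
}

// Concatenate only the non-thought text parts of a response.
function extractVisibleText(parts: Part[]): string {
  return parts
    .filter((part) => !part.thought && typeof part.text === 'string')
    .map((part) => part.text)
    .join('');
}
```

Filtering at extraction time is a second line of defence: even if a provider returns thought parts despite thinking being disabled on the request, they would not reach the saved file.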

Best-effort cosmetic callers (recap, tool-use summary, kebab rename,
suggestion) opt into `maxAttempts: 1` so transient outages don't burn
seven retries on output the user will likely never see. `isInternalPromptId`
recognises the `side-query:` prefix automatically so new call sites are
filtered without per-site allowlist updates.

Removes the `getAgentContentGenerator` workaround in `InProcessBackend`
and the `getAgentSummaryGenerator` indirection in `ArenaManager` —
arena approach summaries now run through the chokepoint against
`fastModel`, giving every agent a neutral arbiter rather than a
self-summary on its own model.

* fix(core): guard isInternalPromptId against undefined prompt_id

logToolCall calls isInternalPromptId(event.prompt_id), and tool-call
events from useToolScheduler can carry an undefined prompt_id. The
side-query refactor added promptId.startsWith(SIDE_QUERY_PROMPT_PREFIX)
without a falsy guard, so the missing id crashed the logger and broke
six useToolScheduler tests across all OS / Node matrix entries on CI.
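
A minimal sketch of the guarded check, assuming the prefix constant holds the `side-query:` value mentioned earlier. Any other rules the real `isInternalPromptId` applies are omitted here:

```typescript
// Assumed value, taken from the `side-query:` prefix named in this commit message.
const SIDE_QUERY_PROMPT_PREFIX = 'side-query:';

// Returns false for a missing or empty id instead of throwing on startsWith.
function isInternalPromptId(promptId: string | undefined): boolean {
  if (!promptId) {
    return false;
  }
  return promptId.startsWith(SIDE_QUERY_PROMPT_PREFIX);
}
```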

* fix(cli,core): polish runSideQuery callers from review feedback

- Cap web-fetch, chat-compression, and ACP rewrite at maxAttempts: 1.
  These paths degrade gracefully on failure (tool error, NOOP fallback,
  null return), so 7 retries only delay the user-visible outcome.
- /summary now carries the main session's system instruction so the
  summarizer keeps the coding-assistant role, project context, and
  user memory instead of summarizing the chat in isolation.
- Add isInternalPromptId tests for the side-query: prefix so future
  callers minted via runSideQuery stay filtered out of recordings.

* refactor(core): document runSideQuery defaults and surface promptId in errors

- Add JSDoc on the model and config fields of SideQueryJsonOptions and
  SideQueryTextOptions so the fastModel-first defaulting and the
  thinkingConfig.includeThoughts: false default are visible at the API
  surface, not buried in resolveDefaultModel / applyThinkingDefault.
- BaseLlmClient.generate{Json,Text} error wraps now include promptId
  in the message and pass { cause: error }, so a side-query failure
  identifies which call site failed and preserves the original stack.
- Add tests covering maxAttempts forwarding (present + omitted) and
  rejection propagation for both JSON and text modes — the conditional
  spread is non-trivial and was previously unverified.
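
To illustrate why the conditional spread deserves a test, here is a sketch under assumed names. `RetryOptions`, `DEFAULT_RETRY`, `forwardRetryOptions`, and `mergeRetry` are hypothetical, and the default of 7 attempts is inferred from the retry counts mentioned in this message:

```typescript
interface RetryOptions {
  maxAttempts: number;
}

// Assumed default, for illustration only.
const DEFAULT_RETRY: RetryOptions = { maxAttempts: 7 };

// Include the key only when the caller actually supplied a value.
function forwardRetryOptions(maxAttempts?: number): Partial<RetryOptions> {
  return { ...(maxAttempts !== undefined ? { maxAttempts } : {}) };
}

// Later spreads win, including keys that are present but undefined.
function mergeRetry(overrides: Partial<RetryOptions>): RetryOptions {
  return { ...DEFAULT_RETRY, ...overrides };
}
```

Passing `{ maxAttempts: undefined }` straight into `mergeRetry` would overwrite the default with `undefined`, which is the failure mode the conditional spread avoids.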

* fix(core): preserve per-model provider routing in side queries

BaseLlmClient was bound to the main session's ContentGenerator and only
swapped the request `model` field, so side queries targeting a fast or
alternate model inherited the main provider's baseUrl, credentials, and
sampling settings — breaking cross-provider configurations.

Move per-model generator/authType resolution out of GeminiClient and into
BaseLlmClient as `resolveForModel`. Both generateJson and generateText
now build a per-model ContentGenerator (with cache) when the request
targets a non-main model and pass the resolved retry authType through
to retryWithBackoff. GeminiClient.generateContent delegates to the same
resolver so there is a single source of truth.
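
The per-model resolution with a cache might look roughly like this. It is a synchronous sketch with hypothetical types (`Resolved`, `GeneratorResolver`, and the `create` factory are assumptions); the real `resolveForModel` lives on `BaseLlmClient` and may be asynchronous:

```typescript
// Hypothetical stand-ins for the real generator and auth types.
interface ContentGenerator {
  readonly providerId: string;
}
interface Resolved {
  generator: ContentGenerator;
  authType: string;
}

class GeneratorResolver {
  private readonly cache = new Map<string, Resolved>();
  private readonly mainModel: string;
  private readonly main: Resolved;
  private readonly create: (model: string) => Resolved;

  constructor(
    mainModel: string,
    main: Resolved,
    create: (model: string) => Resolved,
  ) {
    this.mainModel = mainModel;
    this.main = main;
    this.create = create;
  }

  // The main model reuses the session generator; other models get their own
  // generator (and authType), built once and cached by model name.
  resolveForModel(model: string): Resolved {
    if (model === this.mainModel) {
      return this.main;
    }
    let resolved = this.cache.get(model);
    if (!resolved) {
      resolved = this.create(model);
      this.cache.set(model, resolved);
    }
    return resolved;
  }
}
```

Returning the `authType` alongside the generator lets the caller hand the matching value to its retry logic, so retry behaviour follows the provider that actually serves the request.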

Also pin the /forget destructive selector to the main model — the
runSideQuery default moved to fast model in this branch, but /forget
acts on the selection without confirmation, so a weaker fast model
could silently delete the wrong managed-memory entries.

* test(core): assert thinkingConfig/maxAttempts/model forwarding in compression

The compression caller of runSideQuery sets thinkingConfig.includeThoughts=true
and maxAttempts=1. A future refactor that silently drops either would degrade
compression quality without a test failure; this assertion locks the contract.

* fix(cli): route dynamic localization through side query

* refactor(core): remove unused memory governance review
2026-05-11 19:03:14 +08:00
| Name | Last commit | Date |
| --- | --- | --- |
| src | refactor(core): route side-query LLM calls through runSideQuery chokepoint (#3775) | 2026-05-11 19:03:14 +08:00 |
| index.ts | fix(cli): stop double-wrapping and double-printing API errors in non-interactive mode (#3749) | 2026-05-03 08:39:31 +08:00 |
| package.json | chore(deps): upgrade ink 6.2.3 → 7.0.2 + bump Node engine to 22 (#3860) | 2026-05-11 17:29:50 +08:00 |
| test-setup.ts | fix: prevent bogus shell permission rules in tests | 2026-03-20 17:55:33 +08:00 |
| tsconfig.json | Add background agent resume and continuation (#3739) | 2026-05-01 12:14:33 +08:00 |
| vitest.config.ts | refactor(core): Unify package exports and improve dev experience | 2026-02-01 11:59:05 +08:00 |