* fix(core): support cross-auth fast side queries
* refactor(core): hoist resolveForModel selector and refresh side-query docs
Compute the model selector once at the top of `resolveForModel` and pass
it through to `createContentGeneratorForModel` and
`resolveModelAcrossAuthTypes`. This eliminates the redundant selector
resolution that happened up to five times per cross-auth side query
(once per call, plus once inside each downstream helper).
Also update the JSDoc for `SideQueryJsonOptions.model` and
`SideQueryTextOptions.model` to reflect the actual fallback chain
(`getFastModelForSideQuery` → `getFastModel` → `getModel` →
`DEFAULT_QWEN_MODEL`) introduced in this PR.
* refactor(core): route side-query LLM calls through runSideQuery chokepoint
Folds every one-shot side-query call site through a single `runSideQuery`
entry point with `thinkingConfig.includeThoughts: false` and `fastModel`
(falling back to main) as the default policy. Adds a text-mode sibling
to the existing JSON-mode helper, plus a `BaseLlmClient.generateText`
primitive that calls `ContentGenerator.generateContent` directly so
side queries get neither user-memory wrapping nor the main-prompt
fallback that `geminiClient.generateContent` applies.
Migrated call sites: session title, recap, tool-use summary, /rename,
follow-up suggestion (direct path), ACP rewrite, project /summary,
arena approach summary, chat compression, web-fetch, insight analysis,
subagent spec generation. Six call sites override the helper defaults
explicitly (subagent gen, suggestion, ACP rewrite, /summary, compression,
insight) where main-model quality or caller-supplied model matters.
The /summary path additionally fixes a latent bug: text extraction
previously did not strip thought parts, so on thinking models the
saved `.qwen/PROJECT_SUMMARY.md` could leak `reasoning_content` into
the file. The chokepoint now strips thought parts and the request
itself goes out with thinking off.
Best-effort cosmetic callers (recap, tool-use summary, kebab rename,
suggestion) opt into `maxAttempts: 1` so transient outages don't burn
seven retries on output the user will likely never see. `isInternalPromptId`
recognises the `side-query:` prefix automatically so new call sites are
filtered without per-site allowlist updates.
Removes the `getAgentContentGenerator` workaround in `InProcessBackend`
and the `getAgentSummaryGenerator` indirection in `ArenaManager` —
arena approach summaries now run through the chokepoint against
`fastModel`, giving every agent a neutral arbiter rather than a
self-summary on its own model.
* fix(core): guard isInternalPromptId against undefined prompt_id
logToolCall calls isInternalPromptId(event.prompt_id), and tool-call
events from useToolScheduler can carry an undefined prompt_id. The
side-query refactor added promptId.startsWith(SIDE_QUERY_PROMPT_PREFIX)
without a falsy guard, so the missing id crashed the logger and broke
six useToolScheduler tests across all OS / Node matrix entries on CI.
* fix(cli,core): polish runSideQuery callers from review feedback
- Cap web-fetch, chat-compression, and ACP rewrite at maxAttempts: 1.
These paths degrade gracefully on failure (tool error, NOOP fallback,
null return), so 7 retries only delays the user-visible outcome.
- /summary now carries the main session's system instruction so the
summarizer keeps the coding-assistant role, project context, and
user memory instead of summarizing the chat in isolation.
- Add isInternalPromptId tests for the side-query: prefix so future
callers minted via runSideQuery stay filtered out of recordings.
* refactor(core): document runSideQuery defaults and surface promptId in errors
- Add JSDoc on the model and config fields of SideQueryJsonOptions and
SideQueryTextOptions so the fastModel-first defaulting and the
thinkingConfig.includeThoughts: false default are visible at the API
surface, not buried in resolveDefaultModel / applyThinkingDefault.
- BaseLlmClient.generate{Json,Text} error wraps now include promptId
in the message and pass { cause: error }, so a side-query failure
identifies which call site failed and preserves the original stack.
- Add tests covering maxAttempts forwarding (present + omitted) and
rejection propagation for both JSON and text modes — the conditional
spread is non-trivial and was previously unverified.
* fix(core): preserve per-model provider routing in side queries
BaseLlmClient was bound to the main session's ContentGenerator and only
swapped the request `model` field, so side queries targeting a fast or
alternate model inherited the main provider's baseUrl, credentials, and
sampling settings — breaking cross-provider configurations.
Move per-model generator/authType resolution out of GeminiClient and into
BaseLlmClient as `resolveForModel`. Both generateJson and generateText
now build a per-model ContentGenerator (with cache) when the request
targets a non-main model and pass the resolved retry authType through
to retryWithBackoff. GeminiClient.generateContent delegates to the same
resolver so there is a single source of truth.
Also pin the /forget destructive selector to the main model — the
runSideQuery default moved to fast model in this branch, but /forget
acts on the selection without confirmation, so a weaker fast model
could silently delete the wrong managed-memory entries.
* test(core): assert thinkingConfig/maxAttempts/model forwarding in compression
The compression caller of runSideQuery sets thinkingConfig.includeThoughts=true
and maxAttempts=1. A future refactor that silently drops either would degrade
compression quality without test failure; this assertion locks the contract.
* fix(cli): route dynamic localization through side query
* refactor(core): remove unused memory governance review