
# Followup Suggestions

Qwen Code can predict what you want to type next and show it as ghost text in the input area. This feature uses an LLM call to analyze the conversation context and generate a natural next-step suggestion.

This feature works end-to-end in the CLI. In the WebUI, the hook and UI plumbing are available, but host applications must trigger suggestion generation and wire the followup state for suggestions to appear.

## How It Works

After Qwen Code finishes responding, a suggestion appears as dimmed text in the input area after a short delay (~300ms). For example, after fixing a bug, you might see:

> run the tests

The suggestion is generated by sending the conversation history to the model, which predicts what you would naturally type next. If the response contains an explicit tip (e.g., `Tip: type post comments to publish findings`), the suggested action is extracted automatically.
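
For illustration, the tip-extraction behavior could look roughly like this (a minimal sketch with a hypothetical function and pattern, not the actual implementation):

```typescript
// Hypothetical sketch: pull an actionable suggestion out of an explicit
// "Tip: type X" hint in the assistant's last response.
function extractTipAction(responseText: string): string | undefined {
  // Matches phrasing like "Tip: type post comments to publish findings".
  const match = responseText.match(/Tip:\s*(?:type\s+)?(.+?)(?:\.|$)/im);
  return match?.[1]?.trim();
}

extractTipAction('Findings recorded. Tip: type post comments to publish findings.');
// => "post comments to publish findings"
```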

## Accepting Suggestions

| Key | Action |
| --- | --- |
| `Tab` | Accept the suggestion and fill it into the input |
| `Enter` | Accept the suggestion and submit it immediately |
| `Right Arrow` | Accept the suggestion and fill it into the input |
| Any typing | Dismiss the suggestion and type normally |

## When Suggestions Appear

Suggestions are generated when all of the following conditions are met (see the sketch after this list):

- The model has completed its response (not during streaming)
- At least 2 model turns have occurred in the conversation
- There are no errors in the most recent response
- No confirmation dialogs are pending (e.g., shell confirmation, permissions)
- The approval mode is not set to `plan`
- The feature is enabled in settings (enabled by default)

Suggestions will not appear in non-interactive mode (e.g., headless/SDK mode).
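
Roughly, the gating can be pictured as a single boolean check over these conditions. The sketch below is illustrative only; the interface and function names are hypothetical and do not mirror the actual Qwen Code source:

```typescript
// Illustrative sketch of the gating conditions listed above.
// All names here are hypothetical, not the real internals.
interface SuggestionContext {
  isStreaming: boolean;
  modelTurnCount: number;
  lastResponseHadError: boolean;
  hasPendingConfirmation: boolean;
  approvalMode: 'default' | 'auto-edit' | 'yolo' | 'plan';
  followupSuggestionsEnabled: boolean;
  isInteractive: boolean;
}

function shouldGenerateSuggestion(ctx: SuggestionContext): boolean {
  return (
    !ctx.isStreaming &&
    ctx.modelTurnCount >= 2 &&
    !ctx.lastResponseHadError &&
    !ctx.hasPendingConfirmation &&
    ctx.approvalMode !== 'plan' &&
    ctx.followupSuggestionsEnabled &&
    ctx.isInteractive
  );
}
```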

Suggestions are automatically dismissed when:

- You start typing
- A new model turn begins
- The suggestion is accepted

## Fast Model

By default, suggestions use the same model as your main conversation. For faster and cheaper suggestions, configure a dedicated fast model:

**Via command**

```
/model --fast qwen3.5-flash
```

Or use `/model --fast` (without a model name) to open a selection dialog.

**Via `settings.json`**

```json
{
  "fastModel": "qwen3.5-flash"
}
```

The fast model is used for background tasks like suggestion generation. When it is not configured, the main conversation model is used as a fallback.

Thinking/reasoning mode is automatically disabled for all background tasks (suggestion generation and speculation), regardless of your main model's thinking configuration. This avoids wasting tokens on internal reasoning that isn't needed for these tasks.
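
As a rough mental model (a minimal sketch with a hypothetical helper; only the `fastModel` setting name comes from this document):

```typescript
// Illustrative only: background tasks (suggestion generation, speculation)
// use the fast model when configured, otherwise the main conversation model,
// and always run with thinking/reasoning disabled.
function resolveBackgroundModel(
  settings: { fastModel?: string },
  mainModel: string,
): { model: string; thinkingEnabled: boolean } {
  return {
    model: settings.fastModel || mainModel,
    thinkingEnabled: false, // never spend tokens on internal reasoning here
  };
}
```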

## Configuration

These settings can be configured in `settings.json`:

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `ui.enableFollowupSuggestions` | boolean | `true` | Enable or disable followup suggestions |
| `ui.enableCacheSharing` | boolean | `true` | Use cache-aware forked queries to reduce cost (experimental) |
| `ui.enableSpeculation` | boolean | `false` | Speculatively execute suggestions before submission (experimental) |
| `fastModel` | string | `""` | Model for background tasks (suggestion generation, speculation) |

**Example**

```json
{
  "fastModel": "qwen3.5-flash",
  "ui": {
    "enableFollowupSuggestions": true,
    "enableCacheSharing": true
  }
}
```

## Monitoring

Suggestion model usage appears in `/stats` output, showing tokens consumed by the fast model for suggestion generation.

The fast model is also shown in `/about` output under "Fast Model".

## Suggestion Quality

Suggestions go through quality filters to ensure they are useful (a simplified sketch follows this list):

- Must be 2-12 words (CJK: 2-30 characters), under 100 characters total
- Cannot be evaluative ("looks good", "thanks")
- Cannot use AI voice ("Let me...", "I'll...")
- Cannot be multiple sentences or contain formatting (markdown, newlines)
- Cannot be meta-commentary ("nothing to suggest", "silence")
- Cannot be error messages or prefixed labels ("Suggestion: ...")
- Single-word suggestions are only allowed for common commands (yes, commit, push, etc.)
- Slash commands (e.g., `/commit`) are always allowed as single-word suggestions
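
The sketch below illustrates a few of these checks. It is a simplified, hypothetical helper, not the actual implementation, and it omits the CJK length rule and the fuller evaluative/meta-commentary word lists:

```typescript
// Hypothetical sketch of the quality filters described above; the thresholds
// follow the documented rules, but the real checks in Qwen Code may differ.
const COMMON_SINGLE_WORDS = new Set(['yes', 'commit', 'push', 'continue']);

function passesQualityFilters(suggestion: string): boolean {
  const s = suggestion.trim();
  if (s.length === 0 || s.length >= 100) return false;   // length cap
  if (/[\n*_`#]/.test(s)) return false;                   // no markdown or newlines
  if (/^(let me|i'?ll|i will)\b/i.test(s)) return false;  // no AI voice
  if (/^(suggestion:|error:)/i.test(s)) return false;     // no prefixed labels / errors
  if (/^(looks good|thanks)\b/i.test(s)) return false;    // no evaluative replies
  const words = s.split(/\s+/);
  if (words.length === 1) {
    // Single words: slash commands or a short allow-list of common actions.
    return s.startsWith('/') || COMMON_SINGLE_WORDS.has(s.toLowerCase());
  }
  return words.length <= 12 && !/[.!?].+[.!?]/.test(s);   // at most one sentence
}
```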