# Prompt Suggestion (NES) Design

Predicts what the user would naturally type next after the AI completes a response, showing it as ghost text in the input prompt.

Implementation status: `prompt-suggestion-implementation.md`. Speculation engine: `speculation-design.md`.

## Overview

A prompt suggestion (Next-step Suggestion / NES) is a short prediction (2-12 words) of the user's next input, generated by an LLM call after each AI response. It appears as ghost text in the input prompt. The user can accept it with Tab/Enter/Right Arrow or dismiss it by typing.

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│  AppContainer (CLI)                                         │
│                                                             │
│  Responding → Idle transition                               │
│       │                                                     │
│       ▼                                                     │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  Guard Conditions (11 categories)                   │    │
│  │  settings, interactive, sdk, plan mode, dialogs,    │    │
│  │  elicitation, API error                             │    │
│  └────────────────────┬────────────────────────────────┘    │
│                       │                                     │
│                       ▼                                     │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  generatePromptSuggestion()                         │    │
│  │                                                     │    │
│  │  ┌─── CacheSafeParams available? ───┐               │    │
│  │  │                                  │               │    │
│  │  ▼ YES                         NO ▼                 │    │
│  │  runForkedQuery()      BaseLlmClient.generateJson() │    │
│  │  (cache-aware)         (standalone fallback)        │    │
│  │                                                     │    │
│  │  ──── SUGGESTION_PROMPT ────                        │    │
│  │  ──── 12 filter rules ──────                        │    │
│  │  ──── getFilterReason() ────                        │    │
│  └────────────────────┬────────────────────────────────┘    │
│                       │                                     │
│                       ▼                                     │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  FollowupController (framework-agnostic)            │    │
│  │  300ms delay → show as ghost text                   │    │
│  │                                                     │    │
│  │  Tab    → accept (fill input)                       │    │
│  │  Enter  → accept + submit                           │    │
│  │  Right  → accept (fill input)                       │    │
│  │  Type   → dismiss + abort speculation               │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  Telemetry (PromptSuggestionEvent)                  │    │
│  │  outcome, accept_method, timing, similarity,        │    │
│  │  keystroke, focus, suppression reason, prompt_id    │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
```

## Suggestion Generation

### LLM Prompt

```
[SUGGESTION MODE: Suggest what the user might naturally type next.]

FIRST: Read the LAST FEW LINES of the assistant's most recent message — that's where
next-step hints, tips, and actionable suggestions usually appear. Then check the user's
recent messages and original request.

Your job is to predict what THEY would type - not what you think they should do.
THE TEST: Would they think "I was just about to type that"?

PRIORITY: If the assistant's last message contains a tip or hint like "Tip: type X to ..."
or "type X to ...", extract X as the suggestion. These are explicit next-step hints.

EXAMPLES:
Assistant says "Tip: type post comments to publish findings" → "post comments"
Assistant says "type /review to start" → "/review"
User asked "fix the bug and run tests", bug is fixed → "run the tests"
After code written → "try it out"
Task complete, obvious follow-up → "commit this" or "push it"

Format: 2-12 words, match the user's style. Or nothing.
Reply with ONLY the suggestion, no quotes or explanation.
```

### Filter Rules (12)

| Rule | Example blocked |
| --- | --- |
| `done` | "done" |
| `meta_text` | "nothing found", "no suggestion", "silence" |
| `meta_wrapped` | "(silence)", "[no suggestion]" |
| `error_message` | "api error: 500" |
| `prefixed_label` | "Suggestion: commit" |
| `too_few_words` | "hmm" (but allows "yes", "commit", "push", etc.) |
| `too_many_words` | > 12 words |
| `too_long` | >= 100 chars |
| `multiple_sentences` | "Run tests. Then commit." |
| `has_formatting` | newlines, markdown bold |
| `evaluative` | "looks good", "thanks" (with `\b` word boundaries) |
| `ai_voice` | "Let me...", "I'll...", "Here's..." |
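For concreteness, here is a minimal sketch of how a few of these rules could be checked. The function name matches `getFilterReason()` from the architecture diagram; the specific regexes and check ordering are assumptions, not the actual implementation:

```ts
// Hypothetical sketch of the rule checker: returns the name of the first
// matching filter rule, or null if the suggestion passes all checks.
function getFilterReason(suggestion: string): string | null {
  const words = suggestion.trim().split(/\s+/);
  if (words.length > 12) return 'too_many_words';
  if (suggestion.length >= 100) return 'too_long';
  if (/\n|\*\*/.test(suggestion)) return 'has_formatting';
  // AI voice: first-person assistant phrasing the user would never type.
  if (/^(let me|i'll|here's)\b/i.test(suggestion)) return 'ai_voice';
  // Evaluative: \b word boundaries avoid false positives inside larger words.
  if (/\b(looks good|thanks)\b/i.test(suggestion)) return 'evaluative';
  return null;
}
```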

## Guard Conditions

`AppContainer` `useEffect` (13 checks in code):

| Guard | Check |
| --- | --- |
| Settings toggle | `enableFollowupSuggestions` |
| Non-interactive | `config.isInteractive()` |
| SDK mode | `!config.getSdkMode()` |
| Streaming transition | Responding → Idle (2 checks) |
| API error (history) | `historyManager.history[last]?.type !== 'error'` |
| API error (pending) | `!pendingGeminiHistoryItems.some(type === 'error')` |
| Confirmation dialogs | shell + general + loop detection (3 checks) |
| Permission dialog | `isPermissionsDialogOpen` |
| Elicitation | `settingInputRequests.length === 0` |
| Plan mode | `ApprovalMode.PLAN` |

Inside `generatePromptSuggestion()`:

| Guard | Check |
| --- | --- |
| Early conversation | `modelTurns < 2` |

Separate feature flags (not in the guard block; see the sketch after this table):

| Flag | Controls |
| --- | --- |
| `enableCacheSharing` | Whether to use a forked query or fall back to `generateJson` |
| `enableSpeculation` | Whether to start speculation on suggestion display |
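Collapsed into a single predicate, the guard logic amounts to something like the sketch below. The `ctx` shape, the collapsed `dialogsOpen` flag, and the string literals are illustrative assumptions; only the checks named in the tables above come from the code:

```ts
// Condensed sketch of the 13-check guard chain; the real code lives in
// AppContainer's useEffect and checks each dialog flag separately.
function shouldSuggest(ctx: {
  enableFollowupSuggestions: boolean;
  isInteractive: boolean;
  sdkMode: boolean;
  prevStreaming: 'Responding' | 'Idle';
  streaming: 'Responding' | 'Idle';
  lastHistoryType?: string;       // last history item's type
  pendingTypes: string[];         // pending history item types
  dialogsOpen: boolean;           // shell + general + loop detection + permissions
  elicitationCount: number;       // settingInputRequests.length
  approvalMode: string;
}): boolean {
  return (
    ctx.enableFollowupSuggestions &&
    ctx.isInteractive &&
    !ctx.sdkMode &&
    ctx.prevStreaming === 'Responding' && // transition check 1
    ctx.streaming === 'Idle' &&           // transition check 2
    ctx.lastHistoryType !== 'error' &&
    !ctx.pendingTypes.includes('error') &&
    !ctx.dialogsOpen &&
    ctx.elicitationCount === 0 &&
    ctx.approvalMode !== 'plan'
  );
}
```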

## State Management

### FollowupState

```ts
interface FollowupState {
  suggestion: string | null;
  isVisible: boolean;
  shownAt: number; // timestamp for telemetry
}
```

### FollowupController

Framework-agnostic controller shared by CLI (Ink) and WebUI (React):

- `setSuggestion(text)` — 300ms delayed show, `null` clears immediately
- `accept(method)` — clears state, fires `onAccept` via microtask, 100ms debounce lock
- `dismiss()` — clears state, logs ignored telemetry
- `clear()` — hard reset of all state and timers
- `Object.freeze(INITIAL_FOLLOWUP_STATE)` prevents accidental mutation
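A minimal sketch of the accept path, using the `FollowupState` interface above. The `skipOnAccept` flag (used by the Enter path to avoid re-inserting text into a just-cleared buffer) and the exact method signature are assumptions:

```ts
// Illustrative sketch only: clear state first, then fire onAccept in a
// microtask, guarded by a short debounce lock.
const INITIAL_FOLLOWUP_STATE: FollowupState = Object.freeze({
  suggestion: null,
  isVisible: false,
  shownAt: 0,
});

class FollowupController {
  private state: FollowupState = INITIAL_FOLLOWUP_STATE;
  private lockedUntil = 0;

  constructor(private readonly onAccept: (text: string) => void) {}

  accept(method: 'tab' | 'enter' | 'right', skipOnAccept = false): void {
    const now = Date.now();
    if (now < this.lockedUntil || !this.state.isVisible) return; // debounce lock
    this.lockedUntil = now + 100;
    // In the real controller, `method` feeds the accept_method telemetry field.
    const text = this.state.suggestion;
    this.state = INITIAL_FOLLOWUP_STATE; // clear before the callback runs
    if (text !== null && !skipOnAccept) {
      queueMicrotask(() => this.onAccept(text)); // fill the input asynchronously
    }
  }
}
```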

## Keyboard Interaction

| Key | CLI | WebUI |
| --- | --- | --- |
| Tab | Fill input (no submit) | Fill input (no submit) |
| Enter | Fill + submit | Fill + submit (`explicitText` param) |
| Right Arrow | Fill input (no submit) | Fill input (no submit) |
| Typing | Dismiss + abort speculation | Dismiss |
| Paste | Dismiss + abort speculation | Dismiss |

### Key Binding Note

The Tab handler matches `key.name === 'tab'` explicitly (not the `ACCEPT_SUGGESTION` matcher) because `ACCEPT_SUGGESTION` also matches Enter, which must fall through to the SUBMIT handler.
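A sketch of that ordering constraint follows; the `Key` shape and the `followup` handle are assumptions for illustration:

```ts
// Illustrative ordering only: Tab (and Right) are matched by raw key name,
// so Enter is never consumed here and still reaches the SUBMIT handler.
interface Key {
  name: string;
}

function handleFollowupKeys(
  key: Key,
  followup: { isVisible(): boolean; accept(method: 'tab' | 'right'): void },
): boolean {
  if (!followup.isVisible()) return false;
  if (key.name === 'tab' || key.name === 'right') {
    followup.accept(key.name as 'tab' | 'right'); // fill input, no submit
    return true; // handled here
  }
  // Deliberately no ACCEPT_SUGGESTION matcher: it also matches Enter,
  // which must fall through to SUBMIT (the accept + submit path).
  return false;
}
```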

## Telemetry

### PromptSuggestionEvent

| Field | Type | Description |
| --- | --- | --- |
| `outcome` | accepted/ignored/suppressed | Final outcome |
| `prompt_id` | string | Default: `'user_intent'` |
| `accept_method` | tab/enter/right | How the user accepted |
| `time_to_accept_ms` | number | Time from shown to accept |
| `time_to_ignore_ms` | number | Time from shown to dismiss |
| `time_to_first_keystroke_ms` | number | Time to first keystroke while shown |
| `suggestion_length` | number | Character count |
| `similarity` | number | 1.0 for accept, 0.0 for ignore |
| `was_focused_when_shown` | boolean | Terminal had focus |
| `reason` | string | For suppressed: filter rule name |
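Expressed as a type, the table implies roughly this shape (a sketch; which fields are optional per outcome is an assumption, not the actual type definition):

```ts
// Sketch of the event shape implied by the table above.
interface PromptSuggestionEvent {
  outcome: 'accepted' | 'ignored' | 'suppressed';
  prompt_id: string; // default: 'user_intent'
  accept_method?: 'tab' | 'enter' | 'right';
  time_to_accept_ms?: number;
  time_to_ignore_ms?: number;
  time_to_first_keystroke_ms?: number;
  suggestion_length: number;
  similarity?: number; // 1.0 for accept, 0.0 for ignore
  was_focused_when_shown: boolean;
  reason?: string; // for suppressed: filter rule name
}
```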

### SpeculationEvent

| Field | Type | Description |
| --- | --- | --- |
| `outcome` | accepted/aborted/failed | Speculation result |
| `turns_used` | number | API round-trips |
| `files_written` | number | Files in overlay |
| `tool_use_count` | number | Tools executed |
| `duration_ms` | number | Wall-clock time |
| `boundary_type` | string | What stopped speculation |
| `had_pipelined_suggestion` | boolean | Whether the next suggestion was generated |

## Feature Flags and Settings

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `enableFollowupSuggestions` | boolean | `true` | Master toggle for prompt suggestions |
| `enableCacheSharing` | boolean | `true` | Use cache-aware forked queries |
| `enableSpeculation` | boolean | `false` | Predictive execution engine |
| `fastModel` (top-level) | string | `""` | Model for all background tasks (empty = use main model). Set via `/model --fast` |
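For instance, a user opting into speculation and a dedicated fast model might end up with settings like the following. This is a sketch: the exact settings-file location and key nesting are assumptions, and `your-fast-model` is a placeholder, not a real model name:

```json
{
  "enableFollowupSuggestions": true,
  "enableCacheSharing": true,
  "enableSpeculation": true,
  "fastModel": "your-fast-model"
}
```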

## Internal Prompt ID Filtering

Background operations use dedicated prompt IDs (`INTERNAL_PROMPT_IDS` in `utils/internalPromptIds.ts`) to prevent their API traffic and tool calls from appearing in the user-visible UI:

| Prompt ID | Used by |
| --- | --- |
| `prompt_suggestion` | Suggestion generation |
| `forked_query` | Cache-aware forked queries |
| `speculation` | Speculation engine |
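The registry itself is small; a sketch of its likely shape (a `ReadonlySet` lookup is assumed here, so that adding a new internal ID touches only this one file):

```ts
// Sketch of utils/internalPromptIds.ts: a single ReadonlySet registry.
const INTERNAL_PROMPT_IDS: ReadonlySet<string> = new Set([
  'prompt_suggestion',
  'forked_query',
  'speculation',
]);

export function isInternalPromptId(promptId: string): boolean {
  return INTERNAL_PROMPT_IDS.has(promptId);
}
```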

Filtering applied:

- `loggingContentGenerator` — skips `logApiRequest` and OpenAI interaction logging for internal IDs
- `logApiResponse` / `logApiError` — skip `chatRecordingService.recordUiTelemetryEvent`
- `logToolCall` — skips `chatRecordingService.recordUiTelemetryEvent`
- `uiTelemetryService.addEvent` — not filtered (ensures `/stats` token tracking works)

## Thinking Mode

Thinking/reasoning is explicitly disabled (`thinkingConfig: { includeThoughts: false }`) for all background task paths:

- Forked query path (`createForkedChat`) — overrides `thinkingConfig` in the cloned `generationConfig`, covering both suggestion generation and speculation
- BaseLlm fallback path (`generateViaBaseLlm`) — per-request config overrides the base content generator's thinking settings
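Both paths reduce to the same per-request override, roughly as follows (a sketch; the config types are simplified for illustration):

```ts
// Sketch of the override shared by both background paths: clone the
// caller's generation config, then force thinking off for this request.
type ThinkingConfig = { includeThoughts: boolean };
type GenConfig = { thinkingConfig?: ThinkingConfig; [key: string]: unknown };

function withoutThinking(config: GenConfig): GenConfig {
  return {
    ...config,
    thinkingConfig: { includeThoughts: false }, // no reasoning tokens
  };
}
```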

This is safe because:

- Cache prefix is determined by `systemInstruction` + tools + history, not `thinkingConfig` — cache hits are unaffected
- All backends (Gemini, OpenAI-compatible, Anthropic) handle `includeThoughts: false` by omitting the thinking field — no API errors on models without thinking support
- Suggestion generation and speculation don't benefit from reasoning tokens