* fix(core): prevent followup suggestion input/output from appearing in tool call UI

  The follow-up suggestion generation was leaking into the conversation UI through three channels:

  1. The forked query included tools in its generation config, allowing the model to produce function calls during suggestion generation. Fixed by setting `tools: []` in runForkedQuery's per-request config (kept in createForkedChat for speculation, which needs tools).
  2. logApiResponse and logApiError recorded suggestion API events to the chatRecordingService, causing them to appear in session JSONL files and the WebUI. Fixed by adding an isInternalPromptId() guard that skips chatRecordingService for 'prompt_suggestion' and 'forked_query' IDs. uiTelemetryService.addEvent() is preserved so /stats still tracks suggestion token usage.
  3. LoggingContentGenerator logged suggestion requests/responses to the OpenAI logger and telemetry pipeline. Fixed by skipping logApiRequest, buildOpenAIRequestForLogging, and logOpenAIInteraction for internal prompt IDs. _logApiResponse is preserved (for /stats) but its chatRecordingService path is filtered by fix #2.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: deduplicate isInternalPromptId into shared export from loggers.ts

  Address review feedback: extract isInternalPromptId() into a single exported function in telemetry/loggers.ts and import it in LoggingContentGenerator, eliminating the duplicate private method. Also update the loggingContentGenerator.test.ts mock to use importOriginal so the real isInternalPromptId is available during tests.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: extract isInternalPromptId to shared utils, add tests

  Address maintainer review feedback:

  1. Move isInternalPromptId() to packages/core/src/utils/internalPromptIds.ts, using a ReadonlySet for the ID registry. Adding new internal prompt IDs now only requires changing one file. loggers.ts re-exports for compatibility; loggingContentGenerator.ts imports directly from utils.
  2. Extract the `tools: []` magic value to a frozen NO_TOOLS constant in forkedQuery.ts.
  3. Add unit tests for isInternalPromptId: prompt_suggestion → true, forked_query → true, user_query → false, empty string → false.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Copilot review — docs, stream optimization, tests

  1. Update the forkedQuery.ts module docs to reflect that runForkedQuery overrides tools: [] at the per-request level while createForkedChat retains the full generationConfig for speculation callers.
  2. Propagate isInternal into loggingStreamWrapper to skip response collection and consolidation for internal prompts, avoiding unnecessary CPU/memory overhead.
  3. Add logApiResponse chatRecordingService filter tests: verify that prompt_suggestion/forked_query skip recording while normal IDs still record.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: deep-freeze NO_TOOLS, add internal prompt guard tests

  Address Copilot review round 3:

  1. Deep-freeze the NO_TOOLS.tools array to prevent shared mutable state across forked query calls.
  2. Add LoggingContentGenerator tests verifying that internal prompt IDs (prompt_suggestion, forked_query) skip logApiRequest and OpenAI interaction logging while preserving logApiResponse.
  3. Add logApiError chatRecordingService filter tests matching the existing logApiResponse coverage.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: reconcile createForkedChat JSDoc with module header

  Clarify that createForkedChat retains the full generationConfig (including tools) for speculation callers, while runForkedQuery strips tools at the per-request level via NO_TOOLS.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: build errors and Copilot round 4 feedback

  1. Fix the NO_TOOLS type: Object.freeze produces a readonly array that is incompatible with ToolUnion[]. Use Readonly<Pick<>> instead; the spread in requestConfig already creates a fresh mutable copy per call.
  2. Fix a test missing the required 'model' field in ContentGeneratorConfig.
  3. Track firstResponseId/firstModelVersion in loggingStreamWrapper so _logApiResponse/_logApiError have accurate values even when full response collection is skipped for internal prompts.
  4. Strengthen the OpenAI logger test assertion: assert that OpenAILogger was constructed (not guarded by an if), then assert that logInteraction was not called.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove dead Object.keys check, add streaming internal prompt test

  1. Simplify runForkedQuery: requestConfig always has tools: [] from the NO_TOOLS spread, so the Object.keys().length > 0 ternary is dead code. Pass requestConfig directly.
  2. Add a generateContentStream test for internal prompt IDs to match the existing generateContent coverage, ensuring the streaming wrapper also skips logApiRequest and OpenAI interaction logging.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent Enter accept from re-inserting suggestion into buffer

  When accepting a followup suggestion via Enter, accept() queued buffer.insert(suggestion) in a microtask that executed after handleSubmitAndClear had already cleared the buffer, leaving the suggestion text stuck in the input. Add a skipOnAccept option to accept() so the Enter path bypasses the onAccept callback. Also add runForkedQuery unit tests verifying that tools: [] is passed in the per-request config.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(core): add speculation to internal IDs, fix logToolCall filtering, improve suggestion prompt

  - Add 'speculation' to INTERNAL_PROMPT_IDS so speculation API traffic and tool calls are hidden from chat recordings and the tool call UI
  - Add an isInternalPromptId check to logToolCall() for consistency with logApiError/logApiResponse
  - Improve SUGGESTION_PROMPT: prioritize the assistant's last few lines and extract actionable text from explicit tips (e.g. "Tip: type X")
  - Fix garbled unicode in the prompt text
  - Update design docs and user docs to reflect the changes
  - Add test coverage for all new behavior

* fix(core): deep-freeze NO_TOOLS, add speculation to loggingContentGenerator tests

  - Object.freeze NO_TOOLS and its tools array to prevent runtime mutation
  - Add 'speculation' to the loggingContentGenerator internal prompt ID tests for consistency with loggers.test.ts and internalPromptIds.ts

* fix(core): fix NO_TOOLS Object.freeze type error

  Use `as const` with a type assertion to satisfy TypeScript while keeping runtime immutability via Object.freeze.

* refactor(core): remove unused isInternalPromptId re-export from loggers.ts

  All consumers import directly from utils/internalPromptIds.js. The re-export was dead code with no importers.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
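The central fix above is the per-request `tools: []` override. Below is a minimal sketch of that mechanism, using local stand-in types rather than the project's real GenAI config types; the `NO_TOOLS` name and the spread-into-request pattern come from the commit messages, everything else is illustrative:

```ts
// Sketch only: strip tool definitions from the forked query's per-request
// config so the suggestion model cannot emit function calls.
// RequestConfig is a local stand-in type, not the SDK's real config type.
interface RequestConfig {
  tools?: unknown[];
  thinkingConfig?: { includeThoughts: boolean };
  [key: string]: unknown;
}

// Frozen at runtime; typed via Readonly<Pick<...>> so the frozen array does
// not clash with a mutable tools[] signature (per the commit notes).
const NO_TOOLS: Readonly<Pick<RequestConfig, 'tools'>> = Object.freeze({
  tools: Object.freeze([]) as unknown as unknown[],
});

// runForkedQuery spreads NO_TOOLS last, so every per-request config gets a
// fresh copy with tools stripped; createForkedChat keeps the caller's tools
// intact for speculation.
function buildForkedRequestConfig(base: RequestConfig): RequestConfig {
  return { ...base, ...NO_TOOLS };
}
```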
# Prompt Suggestion (NES) Design
Predicts what the user would naturally type next after the AI completes a response, showing it as ghost text in the input prompt.
Implementation status: prompt-suggestion-implementation.md. Speculation engine: speculation-design.md.
## Overview
A prompt suggestion (Next-step Suggestion / NES) is a short prediction (2-12 words) of the user's next input, generated by an LLM call after each AI response. It appears as ghost text in the input prompt. The user can accept it with Tab/Enter/Right Arrow or dismiss it by typing.
## Architecture
```
┌───────────────────────────────────────────────────────────────┐
│ AppContainer (CLI)                                             │
│                                                                 │
│   Responding → Idle transition                                  │
│            │                                                    │
│            ▼                                                    │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │ Guard Conditions (11 categories)                        │   │
│   │ settings, interactive, sdk, plan mode, dialogs,         │   │
│   │ elicitation, API error                                  │   │
│   └────────────────────┬────────────────────────────────────┘   │
│                        │                                        │
│                        ▼                                        │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │ generatePromptSuggestion()                               │   │
│   │                                                          │   │
│   │   ┌─── CacheSafeParams available? ───┐                   │   │
│   │   │                                  │                   │   │
│   │   ▼ YES                           NO ▼                   │   │
│   │ runForkedQuery()   BaseLlmClient.generateJson()          │   │
│   │ (cache-aware)      (standalone fallback)                 │   │
│   │                                                          │   │
│   │        ──── SUGGESTION_PROMPT ────                       │   │
│   │        ──── 12 filter rules ──────                       │   │
│   │        ──── getFilterReason() ────                       │   │
│   └────────────────────┬────────────────────────────────────┘   │
│                        │                                        │
│                        ▼                                        │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │ FollowupController (framework-agnostic)                 │   │
│   │ 300ms delay → show as ghost text                        │   │
│   │                                                          │   │
│   │ Tab   → accept (fill input)                             │   │
│   │ Enter → accept + submit                                 │   │
│   │ Right → accept (fill input)                             │   │
│   │ Type  → dismiss + abort speculation                     │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │ Telemetry (PromptSuggestionEvent)                       │   │
│   │ outcome, accept_method, timing, similarity,             │   │
│   │ keystroke, focus, suppression reason, prompt_id         │   │
│   └─────────────────────────────────────────────────────────┘   │
└───────────────────────────────────────────────────────────────┘
```
## Suggestion Generation

### LLM Prompt

```
[SUGGESTION MODE: Suggest what the user might naturally type next.]
FIRST: Read the LAST FEW LINES of the assistant's most recent message — that's where
next-step hints, tips, and actionable suggestions usually appear. Then check the user's
recent messages and original request.
Your job is to predict what THEY would type - not what you think they should do.
THE TEST: Would they think "I was just about to type that"?
PRIORITY: If the assistant's last message contains a tip or hint like "Tip: type X to ..."
or "type X to ...", extract X as the suggestion. These are explicit next-step hints.
EXAMPLES:
Assistant says "Tip: type post comments to publish findings" → "post comments"
Assistant says "type /review to start" → "/review"
User asked "fix the bug and run tests", bug is fixed → "run the tests"
After code written → "try it out"
Task complete, obvious follow-up → "commit this" or "push it"
Format: 2-12 words, match the user's style. Or nothing.
Reply with ONLY the suggestion, no quotes or explanation.
```
### Filter Rules (12)
| Rule | Example blocked |
|---|---|
| done | "done" |
| meta_text | "nothing found", "no suggestion", "silence" |
| meta_wrapped | "(silence)", "[no suggestion]" |
| error_message | "api error: 500" |
| prefixed_label | "Suggestion: commit" |
| too_few_words | "hmm" (but allows "yes", "commit", "push" etc.) |
| too_many_words | > 12 words |
| too_long | >= 100 chars |
| multiple_sentences | "Run tests. Then commit." |
| has_formatting | newlines, markdown bold |
| evaluative | "looks good", "thanks" (with \b word boundaries) |
| ai_voice | "Let me...", "I'll...", "Here's..." |
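An illustrative sketch of how a few of these rules could be checked in a `getFilterReason()`-style function; the rule names come from the table, but the exact regexes, thresholds, and ordering here are assumptions rather than the real implementation:

```ts
// Illustrative only: a subset of the filter rules above, not the real code.
type FilterReason =
  | 'done'
  | 'too_many_words'
  | 'too_long'
  | 'multiple_sentences'
  | 'has_formatting'
  | 'ai_voice'
  | null;

function getFilterReason(suggestion: string): FilterReason {
  const text = suggestion.trim();
  const words = text.split(/\s+/).filter(Boolean);

  if (text.toLowerCase() === 'done') return 'done';
  if (words.length > 12) return 'too_many_words';           // > 12 words
  if (text.length >= 100) return 'too_long';                // >= 100 chars
  if (/[.!?]\s+\S/.test(text)) return 'multiple_sentences'; // "Run tests. Then commit."
  if (/\n|\*\*/.test(text)) return 'has_formatting';        // newlines, markdown bold
  if (/^(let me|i'll|here's)\b/i.test(text)) return 'ai_voice';
  return null; // suggestion passes
}
```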
## Guard Conditions
AppContainer useEffect (13 checks in code):
| Guard | Check |
|---|---|
| Settings toggle | enableFollowupSuggestions |
| Non-interactive | config.isInteractive() |
| SDK mode | !config.getSdkMode() |
| Streaming transition | Responding → Idle (2 checks) |
| API error (history) | historyManager.history[last]?.type !== 'error' |
| API error (pending) | !pendingGeminiHistoryItems.some(type === 'error') |
| Confirmation dialogs | shell + general + loop detection (3 checks) |
| Permission dialog | isPermissionsDialogOpen |
| Elicitation | settingInputRequests.length === 0 |
| Plan mode | ApprovalMode.PLAN |
Inside generatePromptSuggestion():
| Guard | Check |
|---|---|
| Early conversation | modelTurns < 2 |
Separate feature flags (not in guard block):
| Flag | Controls |
|---|---|
| `enableCacheSharing` | Whether to use the forked query or fall back to generateJson |
| `enableSpeculation` | Whether to start speculation on suggestion display |
## State Management

### FollowupState

```ts
interface FollowupState {
  suggestion: string | null;
  isVisible: boolean;
  shownAt: number; // timestamp for telemetry
}
```
### FollowupController

Framework-agnostic controller shared by CLI (Ink) and WebUI (React):

- `setSuggestion(text)` — 300ms delayed show; null clears immediately
- `accept(method)` — clears state, fires `onAccept` via microtask, 100ms debounce lock
- `dismiss()` — clears state, logs `ignored` telemetry
- `clear()` — hard reset of all state and timers
- `Object.freeze(INITIAL_FOLLOWUP_STATE)` prevents accidental mutation
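A rough usage sketch of that lifecycle from a host UI's perspective. The import path, the constructor options, and the `skipOnAccept` second argument are assumptions based on the commit notes above, not the controller's verified signature:

```ts
// Sketch only: how a host (Ink CLI or WebUI) might drive the controller.
// The module path and option names below are assumed for illustration.
import { FollowupController, type FollowupState } from './followupController.js';

const controller = new FollowupController({
  onStateChange: (state: FollowupState) => {
    console.log('ghost text:', state.isVisible ? state.suggestion : '(hidden)');
  },
  onAccept: (text: string) => {
    console.log('insert into input buffer:', text);
  },
});

controller.setSuggestion('run the tests');          // shows after the 300ms delay
controller.accept('tab');                            // Tab/Right: fill input via onAccept
controller.accept('enter', { skipOnAccept: true });  // Enter: submit path bypasses onAccept
controller.dismiss();                                // typing: clears state, logs "ignored"
```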
## Keyboard Interaction
| Key | CLI | WebUI |
|---|---|---|
| Tab | Fill input (no submit) | Fill input (no submit) |
| Enter | Fill + submit | Fill + submit (explicitText param) |
| Right Arrow | Fill input (no submit) | Fill input (no submit) |
| Typing | Dismiss + abort speculation | Dismiss |
| Paste | Dismiss + abort speculation | Dismiss |
### Key Binding Note

The Tab handler uses `key.name === 'tab'` explicitly (not the ACCEPT_SUGGESTION matcher) because ACCEPT_SUGGESTION also matches Enter, which must fall through to the SUBMIT handler.
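A simplified sketch of that routing constraint; the handler shape and key names below are assumptions, and the only grounded detail is that Tab is matched by name so Enter can still reach the submit handler:

```ts
// Sketch only: route Tab by name so Enter still falls through to SUBMIT.
interface KeyEvent {
  name: string; // e.g. 'tab', 'return', 'right' (assumed key names)
}

type Action = 'accept-fill' | 'accept-submit' | 'dismiss' | 'passthrough';

function routeKey(key: KeyEvent, suggestionVisible: boolean): Action {
  if (!suggestionVisible) return 'passthrough';
  // Explicit name check: an ACCEPT_SUGGESTION-style matcher would also match
  // Enter here and swallow the submit path.
  if (key.name === 'tab' || key.name === 'right') return 'accept-fill';
  // Enter is handled by the submit path; the suggestion is accepted with
  // skipOnAccept so its text is not re-inserted after the buffer is cleared.
  if (key.name === 'return') return 'accept-submit';
  // Any other keypress (typing) dismisses the suggestion.
  return 'dismiss';
}
```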
## Telemetry

### PromptSuggestionEvent
| Field | Type | Description |
|---|---|---|
| outcome | accepted/ignored/suppressed | Final outcome |
| prompt_id | string | Default: 'user_intent' |
| accept_method | tab/enter/right | How user accepted |
| time_to_accept_ms | number | Time from shown to accept |
| time_to_ignore_ms | number | Time from shown to dismiss |
| time_to_first_keystroke_ms | number | Time to first keystroke while shown |
| suggestion_length | number | Character count |
| similarity | number | 1.0 for accept, 0.0 for ignore |
| was_focused_when_shown | boolean | Terminal had focus |
| reason | string | For suppressed: filter rule name |
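The same fields expressed as a TypeScript shape for quick reference; the field names and outcomes come from the table above, while the optional markers are assumptions:

```ts
// Sketch of the telemetry payload described in the table; optionality is assumed.
interface PromptSuggestionEvent {
  outcome: 'accepted' | 'ignored' | 'suppressed';
  prompt_id: string;                      // default: 'user_intent'
  accept_method?: 'tab' | 'enter' | 'right';
  time_to_accept_ms?: number;
  time_to_ignore_ms?: number;
  time_to_first_keystroke_ms?: number;
  suggestion_length: number;
  similarity: number;                     // 1.0 for accept, 0.0 for ignore
  was_focused_when_shown: boolean;
  reason?: string;                        // for 'suppressed': the filter rule name
}
```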
### SpeculationEvent
| Field | Type | Description |
|---|---|---|
| outcome | accepted/aborted/failed | Speculation result |
| turns_used | number | API round-trips |
| files_written | number | Files in overlay |
| tool_use_count | number | Tools executed |
| duration_ms | number | Wall-clock time |
| boundary_type | string | What stopped speculation |
| had_pipelined_suggestion | boolean | Next suggestion generated |
## Feature Flags and Settings

| Setting | Type | Default | Description |
|---|---|---|---|
| `enableFollowupSuggestions` | boolean | true | Master toggle for prompt suggestions |
| `enableCacheSharing` | boolean | true | Use cache-aware forked queries |
| `enableSpeculation` | boolean | false | Predictive execution engine |
| `fastModel` (top-level) | string | `""` | Model for all background tasks (empty = use main model). Set via `/model --fast` |
## Internal Prompt ID Filtering

Background operations use dedicated prompt IDs (`INTERNAL_PROMPT_IDS` in `utils/internalPromptIds.ts`) to prevent their API traffic and tool calls from appearing in the user-visible UI:

| Prompt ID | Used by |
|---|---|
| `prompt_suggestion` | Suggestion generation |
| `forked_query` | Cache-aware forked queries |
| `speculation` | Speculation engine |
Filtering applied:
- `loggingContentGenerator` — skips `logApiRequest` and OpenAI interaction logging for internal IDs
- `logApiResponse` / `logApiError` — skip `chatRecordingService.recordUiTelemetryEvent`
- `logToolCall` — skips `chatRecordingService.recordUiTelemetryEvent`
- `uiTelemetryService.addEvent` — not filtered (ensures `/stats` token tracking works)
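A sketch of the shared registry described above and in the commit messages; the file location, the `ReadonlySet` registry, and the three IDs are from the source, while the exact export shapes are otherwise assumptions:

```ts
// utils/internalPromptIds.ts (sketch): one registry, one predicate.
// Adding a new internal prompt ID only requires extending this set.
export const INTERNAL_PROMPT_IDS: ReadonlySet<string> = new Set([
  'prompt_suggestion', // suggestion generation
  'forked_query',      // cache-aware forked queries
  'speculation',       // speculation engine
]);

export function isInternalPromptId(promptId: string): boolean {
  return INTERNAL_PROMPT_IDS.has(promptId);
}
```

Logging call sites then guard their `chatRecordingService` and OpenAI-logger paths with an early `isInternalPromptId(promptId)` check, while `uiTelemetryService.addEvent` stays unguarded so `/stats` keeps counting suggestion tokens.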
## Thinking Mode

Thinking/reasoning is explicitly disabled (`thinkingConfig: { includeThoughts: false }`) for all background task paths:

- Forked query path (`createForkedChat`) — overrides `thinkingConfig` in the cloned `generationConfig`, covering both suggestion generation and speculation
- BaseLlm fallback path (`generateViaBaseLlm`) — per-request config overrides the base content generator's thinking settings
This is safe because:
- The cache prefix is determined by systemInstruction + tools + history, not `thinkingConfig` — cache hits are unaffected
- All backends (Gemini, OpenAI-compatible, Anthropic) handle `includeThoughts: false` by omitting the thinking field — no API errors on models without thinking support
- Suggestion generation and speculation don't benefit from reasoning tokens
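A minimal sketch of that override, using a local config shape rather than the real SDK types; in the actual code the equivalent happens inside `createForkedChat` and `generateViaBaseLlm`:

```ts
// Sketch only: background-task paths clone the generation config and force
// thinking off so no reasoning tokens are requested or surfaced.
// GenerationConfig here is a local stand-in, not the SDK's type.
interface GenerationConfig {
  thinkingConfig?: { includeThoughts: boolean };
  [key: string]: unknown;
}

function withoutThinking(base: GenerationConfig): GenerationConfig {
  // The cache prefix depends on systemInstruction + tools + history, so
  // overriding thinkingConfig here does not affect cache hits.
  return { ...base, thinkingConfig: { includeThoughts: false } };
}
```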