qwen-code/docs/design/session-recap/session-recap-design.md
Shaojin Wen afbb5e71db
fix(cli): rework session recap rendering and add blur threshold setting (#3482)
* feat(cli): make recap away-threshold configurable

The 5-minute blur threshold was hard-coded. Confirmed from Claude
Code's own binary (v2.1.113) that 5 minutes is their default as well
(and that they shift to 60 minutes when 1h prompt-cache is active) —
so the default stays, but expose it as `general.sessionRecapAway
ThresholdMinutes` for users who briefly alt-tab often and don't want
recaps piling up, or who want to lower it for testing.

Non-positive / unset values fall back to the 5-minute default, so
dropping the key has the same behavior as before.

* fix(core): align recap prompt with Claude Code (1-2 sentences, ≤40 words)

The earlier "exactly one sentence, 80-char cap" was an over-correction
to a single in-the-moment ask. Going back to it: the natural shape of
"current task + next action" is two clauses, and forcing them into a
single sentence either crams them with a semicolon or drops the next
action entirely on complex sessions.

Adopt Claude Code's prompt verbatim (extracted from the v2.1.113
binary): "under 40 words, 1-2 plain sentences, no markdown. Lead with
the overall goal and current task, then the one next action. Skip
root-cause narrative, fix internals, secondary to-dos, and em-dash
tangents." Add a Chinese-budget note (~80 chars) and keep the
<recap>...</recap> wrapping that protects against reasoning-model
preambles leaking into the UI.

The sticky banner already re-measures controls height when the
recap toggles, so a 2-line render lays out cleanly.

Sweep "one-line" out of user-facing copy (settings description,
slash-command description, feature docs, design doc) so the
documentation matches the new shape.

* fix(cli): restore "one-line" in user-facing recap copy

Verified from the Claude Code v2.1.113 binary that the slash-command
description IS literally "Generate a one-line session recap now" even
though the underlying prompt allows 1-2 sentences. Claude Code is
deliberately setting a tighter user expectation than the prompt
guarantees, which keeps the surface feel "glanceable".

Mirror that asymmetry: keep the prompt at 1-2 sentences (the previous
commit) for behavioral parity, but put "one-line" back in the user-
visible copy (slash-command description, settings description, user
docs). Internal design doc keeps the accurate "1-2 sentence" wording.

* fix(cli): render recap inline in history to match Claude Code

Earlier I read the user's complaint that the recap "scrolled away" as
"the recap should be sticky above the input box," and built a sticky
banner accordingly. Disassembly of the Claude Code v2.1.113 binary
shows the actual behavior is the opposite: their away_summary is a
plain `type:"system", subtype:"away_summary"` message dispatched
through the standard message renderer (no Static, no anchor, no
flexbox pinning) — it scrolls with the conversation like every other
system message.

Tear out the sticky-banner machinery so recap matches that:

- Recap is back in the `HistoryItemWithoutId` union and `addItem`'d
  into history (both from `/recap` and from auto-trigger), so it
  serializes into session saves and behaves like every other history
  item — no special clear paths, no resume-wrapper, no layout-effect
  re-measure dance.
- `useAwaySummary` takes `addItem` again instead of a setter callback.
- `AwayRecapMessage` renders the way Claude Code does: a 2-column
  gutter with `※`, then bold "recap: " and italic content, all in
  dim color. Drop the prior `StatusMessage`-shaped layout that fused
  prefix and label into "※ recap:".
- Remove the AppContainer plumbing, the slashCommandProcessor state,
  the UIStateContext fields, the DefaultAppLayout / ScreenReader
  placement blocks, the test-utils mocks, and the noninteractive
  stub. Restore `useResumeCommand.handleResume` to a void return
  since callers no longer need the success boolean.

Sweep the design doc so the architecture diagram, files table, and
hook deps reflect the inline-history flow.

* fix(cli): dedupe back-to-back auto-recaps with no new user turns between

Two consecutive blur cycles, each over the threshold but with no new
user activity in between, would each fire their own auto-recap and
add two near-duplicate entries to history (same task, slightly
different wording from temperature-driven LLM variance). Reported
case: leaving the terminal twice while a /review of one PR was
still on screen produced two recaps both about that same review.

Add a `shouldFireRecap` gate before kicking off the LLM call:

- Need at least 3 user messages in history total (don't fire on a
  near-empty session).
- If a previous away_recap is already in history, need at least 2
  new user messages since that one before another can fire.

Same shape as Claude Code's `Ic1` gate (`Sc1=3`, `Rc1=2`). Read
history through a ref so this isn't in the effect's deps and the
effect doesn't re-run on every message.

* fix(cli): type useResumeCommand.handleResume as Promise<void>

Per gemini review on #3482: the interface declared this as `() => void`
but the implementation is `async` and returns `Promise<void>`. The
mismatch silently lost the chainable promise — tests had to launder
it through `as unknown as Promise<void> | undefined` just to await.

Tighten the interface to `Promise<void>` and drop the cast in the
"closes the dialog immediately" test.

* fix(cli): persist auto-fired recap to chat recording so /resume keeps it

Per yiliang114 review on #3482: the manual `/recap` path persists across
`/resume` because the slash-command processor records every output
history item via `chatRecorder.recordSlashCommand({ phase: 'result',
outputHistoryItems })`, but the auto path called `addItem` directly
and bypassed that recorder. The result was an asymmetry where users
who triggered recap manually saw it after `/resume`, while users whose
recap fired automatically lost it.

Mirror the manual recording from useAwaySummary's `.then` callback —
record only the `result` phase (not invocation, since we don't want
a fake `> /recap` user line replayed) with the away-recap item as the
single output. Wrapped in try/catch because recap is best-effort and
must never surface a failure to the user.

Add useAwaySummary.test.ts covering:
- the recording path is taken on a successful auto-trigger
- the dedup gate (`shouldFireRecap`) suppresses the LLM call entirely,
  including the recording, when no new user turns happened since the
  last recap

* fix(cli): cast recap item via spread to satisfy strict tsc --build

CI's `tsc --build` (stricter than local `tsc --noEmit`) rejected the
direct `item as Record<string, unknown>` cast: HistoryItemAwayRecap's
literal `type: 'away_recap'` field doesn't overlap with `unknown`,
TS2352. Use the `{ ...item } as Record<string, unknown>` spread
pattern that the rest of the codebase (arenaCommand,
slashCommandProcessor's serializer) already uses for the same
SlashCommandRecordPayload field.
2026-04-21 14:39:13 +08:00

16 KiB

Session Recap Design

A brief (1-2 sentence) "where did I leave off" summary surfaced when the user returns to an idle session, either on demand (/recap) or after the terminal has been blurred for 5+ minutes.

Overview

When a user /resumes an old session days later, scrolling back through pages of history to remember what they were doing and what came next is a real friction point. Just reloading messages does not solve this UX problem.

The goal is to proactively surface a brief 1-2 sentence recap when the user returns:

  • High-level task (what they are doing) → next step (what to do next).
  • Visually distinct from real assistant replies, so it is never mistaken for new model output.
  • Best-effort: failures must be silent and never break the main flow.

Triggers

Trigger Conditions Implementation
Manual User runs /recap recapCommand.ts calls the same underlying service
Auto Terminal blurred (DECSET 1004 focus protocol) for ≥ 5 min + focus returns + stream is Idle useAwaySummary.ts — 5min blur timer + useFocus event listener

Both paths funnel into a single function — generateSessionRecap() — to guarantee identical behavior. The auto-trigger is gated by general.showSessionRecap (default: off — explicit opt-in, so ambient LLM calls are never silently added to a user's bill); the manual command ignores that setting.

Architecture

┌────────────────────────────────────────────────────────────────────────┐
│                          AppContainer.tsx                              │
│   isFocused = useFocus()                                               │
│   isIdle = streamingState === Idle                                     │
│       │                                                                │
│       ├─→ useAwaySummary({enabled, config, isFocused, isIdle,          │
│       │       │             addItem})                                  │
│       │       └─→ 5 min blur timer + idle/dedupe gates                 │
│       │              │                                                 │
│       │              ↓                                                 │
│       └─→ recapCommand (slash) ─→ generateSessionRecap(config, signal) │
│                                          │                             │
│                                          ↓                             │
│                              ┌─────────────────────────┐               │
│                              │ packages/core/services/ │               │
│                              │   sessionRecap.ts       │               │
│                              └─────────────────────────┘               │
│                                          │                             │
│                                          ↓                             │
│                              GeminiClient.generateContent              │
│                              (fastModel + tools:[])                    │
│                                                                        │
│   addItem({type: 'away_recap', text}) ─→ HistoryItemDisplay            │
│       └─ AwayRecapMessage rendered inline like any other history       │
│         item (※ + bold "recap: " + italic content, all dim);           │
│         scrolls naturally with the conversation. Mirrors Claude        │
│         Code's away_summary system message.                            │
└────────────────────────────────────────────────────────────────────────┘

Files

File Responsibility
packages/core/src/services/sessionRecap.ts One-shot LLM call + history filter + tag extraction
packages/cli/src/ui/hooks/useAwaySummary.ts Auto-trigger React hook
packages/cli/src/ui/commands/recapCommand.ts /recap manual entry point
packages/cli/src/ui/components/messages/StatusMessages.tsx AwayRecapMessage renderer ( + bold recap: + italic content, all dim)
packages/cli/src/ui/types.ts HistoryItemAwayRecap type
packages/cli/src/ui/components/HistoryItemDisplay.tsx Dispatches away_recap history items to the renderer
packages/cli/src/config/settingsSchema.ts general.showSessionRecap + general.sessionRecapAwayThresholdMinutes settings

Prompt Design

System Prompt

generationConfig.systemInstruction replaces the main agent's system prompt for this single call, so the model behaves only as a recap generator and not as a coding assistant.

Note that GeminiClient.generateContent() internally runs the prompt through getCustomSystemPrompt(), which appends the user's memory (QWEN.md / managed auto-memory) as a suffix. The final system prompt is therefore recap prompt + user memory — useful project context for the recap, not a leak.

Bullets below correspond 1:1 with RECAP_SYSTEM_PROMPT:

  • Under 40 words, 1-2 plain sentences (no markdown / lists / headings). For Chinese, treat the budget as roughly 80 characters total.
  • First sentence: the high-level task. Then: the concrete next step.
  • Explicitly forbid: listing what was done, reciting tool calls, status reports.
  • Match the dominant language of the conversation (English or Chinese).
  • Wrap output in <recap>...</recap>; nothing outside the tags.

Structured Output + Extraction

The model is instructed to wrap its answer in <recap>...</recap>:

<recap>Refactoring loopDetectionService.ts to address long-session OOM. Next step is to implement option B.</recap>

Why: some models (GLM family, reasoning models) write a "thinking" paragraph before the final answer. Returning the raw text would leak that reasoning into the UI.

extractRecap() has three fallback tiers:

  1. Both tags present: take what is between <recap>...</recap> (preferred).
  2. Only the open tag (e.g. maxOutputTokens truncated the close tag): take everything after the open tag.
  3. Tag missing entirely: return empty string → service returns null → UI renders nothing.

The third tier is "skip rather than show the wrong thing" — surfacing the model's reasoning preamble is worse than showing no recap at all.

Call Parameters

Parameter Value Reason
model getFastModel() ?? getModel() Recap doesn't need a frontier model
tools [] One-shot query, no tool use
maxOutputTokens 300 Headroom for 1-2 short sentences + tags
temperature 0.3 Mostly deterministic, with a bit of natural variation
systemInstruction The recap-only prompt above Replaces the main agent's role definition

History Filtering

geminiClient.getChat().getHistory() returns a Content[] that includes:

  • user / model text messages
  • model functionCall parts
  • user functionResponse parts (which can hold full file contents)
  • model thought parts (part.thought / part.thoughtSignature, the model's hidden reasoning)

filterToDialog() keeps only user / model parts that have non-empty text and are not thoughts. Two reasons:

  • Tool calls / responses: a single functionResponse can be 10K+ tokens. 30 such messages would drown the recap LLM in irrelevant detail, both wasting tokens and biasing the recap toward implementation noise like "called X tool to read Y file".
  • Thought parts: carry the model's internal reasoning. Including them risks treating hidden chain-of-thought as dialogue and surfacing it in the recap text.

After dropping empty messages, takeRecentDialog slices to the last 30 messages and refuses to start the slice on a dangling model/tool response.

Concurrency and Edge Cases

Auto-trigger hook state machine

useAwaySummary keeps three refs:

Ref Meaning
blurredAtRef Blur start time (not cleared until focus returns)
recapPendingRef Whether an LLM call is in flight
inFlightRef The current in-flight AbortController

useEffect deps: [enabled, config, isFocused, isIdle, addItem, thresholdMs].

Event Action
!enabled || !config Abort in-flight call + clear inFlightRef + clear blurredAtRef
!isFocused and blurredAtRef === null Set blurredAtRef = Date.now()
isFocused and blurredAtRef === null Return early (no blur cycle to handle — first render or right after a brief-blur reset)
isFocused and blur duration < 5 min Clear blurredAtRef, wait for next blur cycle
isFocused and blur ≥ 5 min and recapPendingRef Return (dedupe)
isFocused and blur ≥ 5 min and !isIdle Preserve blurredAtRef and wait for the turn to finish (isIdle is in the deps, so the effect re-fires when streaming completes)
isFocused and blur ≥ 5 min and shouldFireRecap returns false Clear blurredAtRef and return — conversation hasn't moved enough since the last recap (≥ 2 user turns required, mirrors Claude Code)
isFocused and all conditions met Clear blurredAtRef, set recapPendingRef = true, create AbortController, send the LLM request

The .then callback re-checks isIdleRef.current: if the user has started a new turn while the LLM was running, the late-arriving recap is dropped to avoid inserting it mid-turn.

The .finally clears recapPendingRef, and clears inFlightRef only if inFlightRef.current === controller (so it doesn't overwrite a newer controller).

A second useEffect aborts the in-flight controller on unmount.

/recap gating

CommandContext.ui.isIdleRef exposes the current stream state (mirroring the existing btwAbortControllerRef pattern). In interactive mode, recapCommand refuses when !isIdleRef.current or pendingItem !== null. pendingItem alone is insufficient because a normal model reply runs with streamingState === Responding and a null pendingItem.

Configuration and Model Selection

User-facing knobs

Setting Default Notes
general.showSessionRecap false Auto-trigger only. Manual /recap ignores this.
general.sessionRecapAwayThresholdMinutes 5 Minutes blurred before auto-recap fires on focus-in. Matches Claude Code's default.
fastModel unset Recommended (e.g. qwen3-coder-flash) for fast and cheap recaps.

Model fallback

config.getFastModel() ?? config.getModel():

  • User has a fastModel set and it is valid for the current auth type → use fastModel.
  • Otherwise → fall back to the main session model (works, just costlier and slower).

Observability

createDebugLogger('SESSION_RECAP') emits:

  • caught exceptions from the recap path (debugLogger.warn).

All failures are fully transparent to the user — recap is an auxiliary feature and never throws into the UI. Developers can grep for the [SESSION_RECAP] tag in the debug log file: written by default to ~/.qwen/debug/<sessionId>.txt (latest.txt symlinks to the current session); disable via QWEN_DEBUG_LOG_FILE=0.

Out of Scope

Item Why not
Progress UI for /recap (spinner / pendingItem) 3-5 second wait is tolerable; adds complexity.
Automated tests Service is small (~150 lines), end-to-end tested manually first; unit tests can land in a separate PR.
Localized prompts The system prompt is for the model; English is the most reliable substrate. The model selects the output language from the conversation.
QWEN_CODE_ENABLE_AWAY_SUMMARY env var Claude Code uses it to keep the feature on when telemetry is disabled; Qwen Code's current telemetry model doesn't need this.
Auto-recap on /resume completion A natural follow-up but needs a hook point in useResumeCommand; out of scope for this PR.