* feat(cli): make recap away-threshold configurable

  The 5-minute blur threshold was hard-coded. Confirmed from Claude Code's own binary (v2.1.113) that 5 minutes is their default as well (and that they shift to 60 minutes when 1h prompt-cache is active) — so the default stays, but expose it as `general.sessionRecapAwayThresholdMinutes` for users who briefly alt-tab often and don't want recaps piling up, or who want to lower it for testing. Non-positive / unset values fall back to the 5-minute default, so dropping the key has the same behavior as before.

* fix(core): align recap prompt with Claude Code (1-2 sentences, ≤40 words)

  The earlier "exactly one sentence, 80-char cap" was an over-correction to a single in-the-moment ask. Walking that back: the natural shape of "current task + next action" is two clauses, and forcing them into a single sentence either crams them with a semicolon or drops the next action entirely on complex sessions.

  Adopt Claude Code's prompt verbatim (extracted from the v2.1.113 binary): "under 40 words, 1-2 plain sentences, no markdown. Lead with the overall goal and current task, then the one next action. Skip root-cause narrative, fix internals, secondary to-dos, and em-dash tangents." Add a Chinese-budget note (~80 chars) and keep the `<recap>...</recap>` wrapping that protects against reasoning-model preambles leaking into the UI. The sticky banner already re-measures controls height when the recap toggles, so a 2-line render lays out cleanly.

  Sweep "one-line" out of user-facing copy (settings description, slash-command description, feature docs, design doc) so the documentation matches the new shape.

* fix(cli): restore "one-line" in user-facing recap copy

  Verified from the Claude Code v2.1.113 binary that the slash-command description IS literally "Generate a one-line session recap now" even though the underlying prompt allows 1-2 sentences. Claude Code is deliberately setting a tighter user expectation than the prompt guarantees, which keeps the surface feel "glanceable". Mirror that asymmetry: keep the prompt at 1-2 sentences (the previous commit) for behavioral parity, but put "one-line" back in the user-visible copy (slash-command description, settings description, user docs). The internal design doc keeps the accurate "1-2 sentence" wording.

* fix(cli): render recap inline in history to match Claude Code

  Earlier I read the user's complaint that the recap "scrolled away" as "the recap should be sticky above the input box," and built a sticky banner accordingly. Disassembly of the Claude Code v2.1.113 binary shows the actual behavior is the opposite: their away_summary is a plain `type:"system", subtype:"away_summary"` message dispatched through the standard message renderer (no Static, no anchor, no flexbox pinning) — it scrolls with the conversation like every other system message. Tear out the sticky-banner machinery so recap matches that:

  - Recap is back in the `HistoryItemWithoutId` union and `addItem`'d into history (both from `/recap` and from auto-trigger), so it serializes into session saves and behaves like every other history item — no special clear paths, no resume-wrapper, no layout-effect re-measure dance.
  - `useAwaySummary` takes `addItem` again instead of a setter callback.
  - `AwayRecapMessage` renders the way Claude Code does: a 2-column gutter with `※`, then bold "recap: " and italic content, all in dim color. Drop the prior `StatusMessage`-shaped layout that fused prefix and label into "※ recap:".
  - Remove the AppContainer plumbing, the slashCommandProcessor state, the UIStateContext fields, the DefaultAppLayout / ScreenReader placement blocks, the test-utils mocks, and the noninteractive stub. Restore `useResumeCommand.handleResume` to a void return since callers no longer need the success boolean.

  Sweep the design doc so the architecture diagram, files table, and hook deps reflect the inline-history flow.

* fix(cli): dedupe back-to-back auto-recaps with no new user turns between

  Two consecutive blur cycles, each over the threshold but with no new user activity in between, would each fire their own auto-recap and add two near-duplicate entries to history (same task, slightly different wording from temperature-driven LLM variance). Reported case: leaving the terminal twice while a /review of one PR was still on screen produced two recaps both about that same review.

  Add a `shouldFireRecap` gate before kicking off the LLM call:

  - Need at least 3 user messages in history total (don't fire on a near-empty session).
  - If a previous away_recap is already in history, need at least 2 new user messages since that one before another can fire.

  Same shape as Claude Code's `Ic1` gate (`Sc1=3`, `Rc1=2`). Read history through a ref so this isn't in the effect's deps and the effect doesn't re-run on every message.

* fix(cli): type useResumeCommand.handleResume as Promise<void>

  Per gemini review on #3482: the interface declared this as `() => void` but the implementation is `async` and returns `Promise<void>`. The mismatch silently lost the chainable promise — tests had to launder it through `as unknown as Promise<void> | undefined` just to await it. Tighten the interface to `Promise<void>` and drop the cast in the "closes the dialog immediately" test.

* fix(cli): persist auto-fired recap to chat recording so /resume keeps it

  Per yiliang114 review on #3482: the manual `/recap` path persists across `/resume` because the slash-command processor records every output history item via `chatRecorder.recordSlashCommand({ phase: 'result', outputHistoryItems })`, but the auto path called `addItem` directly and bypassed that recorder. The result was an asymmetry where users who triggered recap manually saw it after `/resume`, while users whose recap fired automatically lost it.

  Mirror the manual recording from useAwaySummary's `.then` callback — record only the `result` phase (not invocation, since we don't want a fake `> /recap` user line replayed) with the away-recap item as the single output. Wrapped in try/catch because recap is best-effort and must never surface a failure to the user.

  Add useAwaySummary.test.ts covering:

  - the recording path is taken on a successful auto-trigger
  - the dedup gate (`shouldFireRecap`) suppresses the LLM call entirely, including the recording, when no new user turns happened since the last recap

* fix(cli): cast recap item via spread to satisfy strict tsc --build

  CI's `tsc --build` (stricter than local `tsc --noEmit`) rejected the direct `item as Record<string, unknown>` cast: HistoryItemAwayRecap's literal `type: 'away_recap'` field doesn't overlap with `unknown`, TS2352. Use the `{ ...item } as Record<string, unknown>` spread pattern that the rest of the codebase (arenaCommand, slashCommandProcessor's serializer) already uses for the same SlashCommandRecordPayload field.
# Session Recap Design

A brief (1-2 sentence) "where did I leave off" summary surfaced when the user returns to an idle session, either on demand (`/recap`) or after the terminal has been blurred for 5+ minutes.
## Overview
When a user /resumes an old session days later, scrolling back through
pages of history to remember what they were doing and what came next
is a real friction point. Just reloading messages does not solve this
UX problem.
The goal is to proactively surface a brief 1-2 sentence recap when the user returns:
- High-level task (what they are doing) → next step (what to do next).
- Visually distinct from real assistant replies, so it is never mistaken for new model output.
- Best-effort: failures must be silent and never break the main flow.
## Triggers

| Trigger | Conditions | Implementation |
|---|---|---|
| Manual | User runs `/recap` | `recapCommand.ts` calls the same underlying service |
| Auto | Terminal blurred (DECSET 1004 focus protocol) for ≥ 5 min + focus returns + stream is Idle | `useAwaySummary.ts` — 5 min blur timer + `useFocus` event listener |
Both paths funnel into a single function — `generateSessionRecap()` — to
guarantee identical behavior. The auto-trigger is gated by
`general.showSessionRecap` (default: off — explicit opt-in, so ambient
LLM calls are never silently added to a user's bill); the manual
command ignores that setting.
## Architecture

```
AppContainer.tsx
  isFocused = useFocus()
  isIdle    = streamingState === Idle
    │
    ├─→ useAwaySummary({enabled, config, isFocused, isIdle, addItem})
    │     └─→ 5 min blur timer + idle/dedupe gates
    │           │
    │           ↓
    └─→ recapCommand (slash) ─→ generateSessionRecap(config, signal)
                                   │
                                   ↓
                     packages/core/services/sessionRecap.ts
                                   │
                                   ↓
                     GeminiClient.generateContent
                     (fastModel + tools: [])

addItem({type: 'away_recap', text}) ─→ HistoryItemDisplay
  └─ AwayRecapMessage rendered inline like any other history item
     (※ + bold "recap: " + italic content, all dim);
     scrolls naturally with the conversation. Mirrors Claude Code's
     away_summary system message.
```
## Files

| File | Responsibility |
|---|---|
| `packages/core/src/services/sessionRecap.ts` | One-shot LLM call + history filter + tag extraction |
| `packages/cli/src/ui/hooks/useAwaySummary.ts` | Auto-trigger React hook |
| `packages/cli/src/ui/commands/recapCommand.ts` | `/recap` manual entry point |
| `packages/cli/src/ui/components/messages/StatusMessages.tsx` | `AwayRecapMessage` renderer (※ + bold "recap: " + italic content, all dim) |
| `packages/cli/src/ui/types.ts` | `HistoryItemAwayRecap` type |
| `packages/cli/src/ui/components/HistoryItemDisplay.tsx` | Dispatches `away_recap` history items to the renderer |
| `packages/cli/src/config/settingsSchema.ts` | `general.showSessionRecap` + `general.sessionRecapAwayThresholdMinutes` settings |
## Prompt Design

### System Prompt
`generationConfig.systemInstruction` replaces the main agent's system
prompt for this single call, so the model behaves only as a recap
generator and not as a coding assistant.

Note that `GeminiClient.generateContent()` internally runs the prompt
through `getCustomSystemPrompt()`, which appends the user's memory
(QWEN.md / managed auto-memory) as a suffix. The final system prompt is
therefore recap prompt + user memory — useful project context for the
recap, not a leak.
The bullets below correspond 1:1 with `RECAP_SYSTEM_PROMPT`:
- Under 40 words, 1-2 plain sentences (no markdown / lists / headings). For Chinese, treat the budget as roughly 80 characters total.
- First sentence: the high-level task. Then: the concrete next step.
- Explicitly forbid: listing what was done, reciting tool calls, status reports.
- Match the dominant language of the conversation (English or Chinese).
- Wrap output in `<recap>...</recap>`; nothing outside the tags.
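Assembled from those bullets, the prompt looks roughly like this. This is a hypothetical reconstruction for illustration; the exact `RECAP_SYSTEM_PROMPT` wording lives in `sessionRecap.ts`.

```typescript
// Hypothetical reconstruction of RECAP_SYSTEM_PROMPT; the real wording
// lives in packages/core/src/services/sessionRecap.ts.
export const RECAP_SYSTEM_PROMPT = [
  'You write a brief session recap and nothing else.',
  'Under 40 words, 1-2 plain sentences; no markdown, lists, or headings.',
  'For Chinese, treat the budget as roughly 80 characters total.',
  'Lead with the high-level task, then the one concrete next step.',
  'Do not list what was done, recite tool calls, or give status reports.',
  'Match the dominant language of the conversation (English or Chinese).',
  'Wrap the answer in <recap>...</recap>; output nothing outside the tags.',
].join('\n');
```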
### Structured Output + Extraction
The model is instructed to wrap its answer in `<recap>...</recap>`:

```
<recap>Refactoring loopDetectionService.ts to address long-session OOM. Next step is to implement option B.</recap>
```
Why: some models (GLM family, reasoning models) write a "thinking" paragraph before the final answer. Returning the raw text would leak that reasoning into the UI.
`extractRecap()` has three fallback tiers:

- Both tags present: take what is between `<recap>...</recap>` (preferred).
- Only the open tag (e.g. `maxOutputTokens` truncated the close tag): take everything after the open tag.
- Tag missing entirely: return empty string → service returns `null` → UI renders nothing.
The third tier is "skip rather than show the wrong thing" — surfacing the model's reasoning preamble is worse than showing no recap at all.
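A minimal sketch of those three tiers (illustrative only; the real `extractRecap()` in `sessionRecap.ts` may differ in detail):

```typescript
const OPEN = '<recap>';
const CLOSE = '</recap>';

// Three-tier extraction: full tags, open-tag-only (close truncated by
// maxOutputTokens), or no tags at all ('' so the service yields null).
export function extractRecap(raw: string): string {
  const start = raw.indexOf(OPEN);
  if (start === -1) return ''; // tier 3: skip rather than leak reasoning
  const body = raw.slice(start + OPEN.length);
  const end = body.indexOf(CLOSE);
  if (end === -1) return body.trim(); // tier 2: close tag truncated
  return body.slice(0, end).trim(); // tier 1: both tags present
}
```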
## Call Parameters

| Parameter | Value | Reason |
|---|---|---|
| `model` | `getFastModel() ?? getModel()` | Recap doesn't need a frontier model |
| `tools` | `[]` | One-shot query, no tool use |
| `maxOutputTokens` | `300` | Headroom for 1-2 short sentences + tags |
| `temperature` | `0.3` | Mostly deterministic, with a bit of natural variation |
| `systemInstruction` | The recap-only prompt above | Replaces the main agent's role definition |
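The parameters above amount to roughly the following builder. This is a sketch: `Config` is a simplified stand-in, and the real call site in `sessionRecap.ts` may shape the request differently.

```typescript
// Simplified stand-in for the real config object (hypothetical shape).
interface Config {
  getFastModel(): string | undefined;
  getModel(): string;
}

// Assemble the one-shot call parameters from the table above.
function recapCallParams(config: Config, systemPrompt: string) {
  return {
    model: config.getFastModel() ?? config.getModel(), // cheap model when set
    tools: [], // one-shot query, no tool use
    maxOutputTokens: 300, // headroom for 1-2 short sentences + tags
    temperature: 0.3, // mostly deterministic, slight natural variation
    systemInstruction: systemPrompt, // replaces the main agent's role
  };
}
```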
## History Filtering

`geminiClient.getChat().getHistory()` returns a `Content[]` that includes:

- `user`/`model` text messages
- `model` `functionCall` parts
- `user` `functionResponse` parts (which can hold full file contents)
- `model` thought parts (`part.thought`/`part.thoughtSignature`, the model's hidden reasoning)
`filterToDialog()` keeps only user / model parts that have non-empty
text and are not thoughts. Two reasons:

- Tool calls / responses: a single `functionResponse` can be 10K+ tokens. 30 such messages would drown the recap LLM in irrelevant detail, both wasting tokens and biasing the recap toward implementation noise like "called X tool to read Y file".
- Thought parts: carry the model's internal reasoning. Including them risks treating hidden chain-of-thought as dialogue and surfacing it in the recap text.
After dropping empty messages, `takeRecentDialog` slices to the last 30
messages and refuses to start the slice on a dangling model/tool
response.
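The filtering and slicing steps can be sketched as follows. `Content`/`Part` are simplified stand-ins for the SDK types, and the real implementation in `sessionRecap.ts` may differ:

```typescript
// Simplified stand-ins for the SDK's Content/Part types.
interface Part {
  text?: string;
  thought?: boolean;
  functionCall?: object;
  functionResponse?: object;
}
interface Content {
  role: string;
  parts?: Part[];
}

// Keep only parts with non-empty text that are not hidden thoughts;
// drop messages that end up with no parts (pure tool calls/responses).
function filterToDialog(history: Content[]): Content[] {
  return history
    .map((msg) => ({
      role: msg.role,
      parts: (msg.parts ?? []).filter((p) => p.text?.trim() && !p.thought),
    }))
    .filter((msg) => msg.parts.length > 0);
}

// Take the last `limit` messages, but never start the window mid-exchange
// on a dangling model/tool response without its user prompt.
function takeRecentDialog(dialog: Content[], limit = 30): Content[] {
  let recent = dialog.slice(-limit);
  while (recent.length > 0 && recent[0].role !== 'user') {
    recent = recent.slice(1);
  }
  return recent;
}
```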
## Concurrency and Edge Cases

### Auto-trigger hook state machine

`useAwaySummary` keeps three refs:

| Ref | Meaning |
|---|---|
| `blurredAtRef` | Blur start time (not cleared until focus returns) |
| `recapPendingRef` | Whether an LLM call is in flight |
| `inFlightRef` | The current in-flight `AbortController` |
`useEffect` deps: `[enabled, config, isFocused, isIdle, addItem, thresholdMs]`.

| Event | Action |
|---|---|
| `!enabled \|\| !config` | Abort in-flight call + clear `inFlightRef` + clear `blurredAtRef` |
| `!isFocused` and `blurredAtRef === null` | Set `blurredAtRef = Date.now()` |
| `isFocused` and `blurredAtRef === null` | Return early (no blur cycle to handle — first render or right after a brief-blur reset) |
| `isFocused` and blur duration < 5 min | Clear `blurredAtRef`, wait for next blur cycle |
| `isFocused` and blur ≥ 5 min and `recapPendingRef` | Return (dedupe) |
| `isFocused` and blur ≥ 5 min and `!isIdle` | Preserve `blurredAtRef` and wait for the turn to finish (`isIdle` is in the deps, so the effect re-fires when streaming completes) |
| `isFocused` and blur ≥ 5 min and `shouldFireRecap` returns false | Clear `blurredAtRef` and return — conversation hasn't moved enough since the last recap (≥ 2 user turns required, mirrors Claude Code) |
| `isFocused` and all conditions met | Clear `blurredAtRef`, set `recapPendingRef = true`, create `AbortController`, send the LLM request |
The `.then` callback re-checks `isIdleRef.current`: if the user has
started a new turn while the LLM was running, the late-arriving recap
is dropped to avoid inserting it mid-turn.

The `.finally` clears `recapPendingRef`, and clears `inFlightRef` only
if `inFlightRef.current === controller` (so it doesn't overwrite a
newer controller).

A second `useEffect` aborts the in-flight controller on unmount.
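The dedupe gate described above can be sketched as follows (the constants mirror Claude Code's `Sc1=3` / `Rc1=2`; `HistoryItem` is a simplified stand-in, and the real hook reads history through a ref so the effect does not depend on it):

```typescript
// Simplified history item; the real type is the HistoryItem union.
interface HistoryItem {
  type: string;
}

const MIN_USER_MESSAGES = 3; // mirrors Claude Code's Sc1
const MIN_NEW_USER_MESSAGES = 2; // mirrors Claude Code's Rc1

function shouldFireRecap(history: HistoryItem[]): boolean {
  const userCount = (items: HistoryItem[]) =>
    items.filter((i) => i.type === 'user').length;
  // Don't fire on a near-empty session.
  if (userCount(history) < MIN_USER_MESSAGES) return false;
  // First recap of the session: fire.
  const lastRecap = history.map((i) => i.type).lastIndexOf('away_recap');
  if (lastRecap === -1) return true;
  // Otherwise require enough new user turns since the previous recap.
  return userCount(history.slice(lastRecap + 1)) >= MIN_NEW_USER_MESSAGES;
}
```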
### /recap gating

`CommandContext.ui.isIdleRef` exposes the current stream state
(mirroring the existing `btwAbortControllerRef` pattern). In
interactive mode, `recapCommand` refuses when `!isIdleRef.current`
or `pendingItem !== null`. `pendingItem` alone is insufficient
because a normal model reply runs with `streamingState === Responding`
and a null `pendingItem`.
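A hypothetical sketch of that guard (the helper name is made up for illustration; the real check lives inline in `recapCommand.ts`):

```typescript
// Hypothetical guard shape: both conditions are needed, because a normal
// model reply runs with streamingState === Responding and a null pendingItem.
function canRunRecap(isIdle: boolean, pendingItem: unknown): boolean {
  return isIdle && pendingItem === null;
}
```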
## Configuration and Model Selection

### User-facing knobs

| Setting | Default | Notes |
|---|---|---|
| `general.showSessionRecap` | `false` | Auto-trigger only. Manual `/recap` ignores this. |
| `general.sessionRecapAwayThresholdMinutes` | `5` | Minutes blurred before auto-recap fires on focus-in. Matches Claude Code's default. |
| `fastModel` | unset | Recommended (e.g. `qwen3-coder-flash`) for fast and cheap recaps. |
### Model fallback

`config.getFastModel() ?? config.getModel()`:

- User has a `fastModel` set and it is valid for the current auth type → use `fastModel`.
- Otherwise → fall back to the main session model (works, just costlier and slower).
## Observability

`createDebugLogger('SESSION_RECAP')` emits:

- caught exceptions from the recap path (`debugLogger.warn`).
All failures are invisible to the user — recap is an
auxiliary feature and never throws into the UI. Developers can grep for
the `[SESSION_RECAP]` tag in the debug log file, written by default to
`~/.qwen/debug/<sessionId>.txt` (`latest.txt` symlinks to the current
session); disable via `QWEN_DEBUG_LOG_FILE=0`.
## Out of Scope

| Item | Why not |
|---|---|
| Progress UI for `/recap` (spinner / `pendingItem`) | 3-5 second wait is tolerable; adds complexity. |
| Automated tests | Service is small (~150 lines), end-to-end tested manually first; unit tests can land in a separate PR. |
| Localized prompts | The system prompt is for the model; English is the most reliable substrate. The model selects the output language from the conversation. |
| `QWEN_CODE_ENABLE_AWAY_SUMMARY` env var | Claude Code uses it to keep the feature on when telemetry is disabled; Qwen Code's current telemetry model doesn't need this. |
| Auto-recap on `/resume` completion | A natural follow-up but needs a hook point in `useResumeCommand`; out of scope for this PR. |