# Session Recap Design

> A brief (1-2 sentence) "where did I leave off" summary surfaced when the user
> returns to an idle session, either on demand (`/recap`) or after the
> terminal has been blurred for 5+ minutes.

## Overview

When a user `/resume`s an old session days later, scrolling back through pages of history to remember **what they were doing and what came next** is a real friction point. Just reloading messages does not solve this UX problem.

The goal is to proactively surface a brief 1-2 sentence recap when the user returns:

- **High-level task** (what they are doing) → **next step** (what to do next).
- Visually distinct from real assistant replies, so it is never mistaken for new model output.
- **Best-effort**: failures must be silent and never break the main flow.

## Triggers

| Trigger    | Conditions                                                                                   | Implementation                                                    |
| ---------- | -------------------------------------------------------------------------------------------- | ----------------------------------------------------------------- |
| **Manual** | User runs `/recap`                                                                           | `recapCommand.ts` calls the same underlying service               |
| **Auto**   | Terminal blurred (DECSET 1004 focus protocol) for ≥ 5 min + focus returns + stream is `Idle` | `useAwaySummary.ts`: 5 min blur timer + `useFocus` event listener |

Both paths funnel into a single function, `generateSessionRecap()`, to guarantee identical behavior. The auto-trigger is gated by `general.showSessionRecap` (default: off; it is an explicit opt-in, so ambient LLM calls are never silently added to a user's bill); the manual command ignores that setting.

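
The gating rules in the Triggers table can be condensed into a single predicate. The sketch below is illustrative only: `RecapSettings`, `RecapTrigger`, and `shouldGenerateRecap` are hypothetical names, not identifiers from the codebase, and the idle check that `/recap` performs in interactive mode is deliberately left out.

```typescript
// Hypothetical shapes for illustration; the real settings and hook types differ.
interface RecapSettings {
  showSessionRecap: boolean; // general.showSessionRecap (default: false)
  awayThresholdMinutes: number; // general.sessionRecapAwayThresholdMinutes (default: 5)
}

type RecapTrigger =
  | { kind: 'manual' } // user ran /recap
  | { kind: 'auto'; blurredMs: number; isIdle: boolean }; // focus returned after a blur

// Returns true when the trigger should call the shared recap service right now.
function shouldGenerateRecap(
  trigger: RecapTrigger,
  settings: RecapSettings,
): boolean {
  // The manual command ignores the opt-in setting.
  if (trigger.kind === 'manual') return true;

  // The auto-trigger is opt-in: no ambient LLM calls unless enabled.
  if (!settings.showSessionRecap) return false;

  const thresholdMs = settings.awayThresholdMinutes * 60_000;
  return trigger.blurredMs >= thresholdMs && trigger.isIdle;
}
```

With the default settings, a manual trigger passes and every auto trigger is rejected; flipping `showSessionRecap` to `true` lets an auto trigger through once the blur has lasted the full threshold and the stream is idle.
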
## Architecture

```
┌────────────────────────────────────────────────────────────────────────┐
│  AppContainer.tsx                                                      │
│    isFocused = useFocus()                                              │
│    isIdle = streamingState === Idle                                    │
│    │                                                                   │
│    ├─→ useAwaySummary({enabled, config, isFocused, isIdle,             │
│    │   │               addItem})                                       │
│    │   └─→ 5 min blur timer + idle/dedupe gates                        │
│    │                                   │                               │
│    │                                   ↓                               │
│    └─→ recapCommand (slash) ─→ generateSessionRecap(config, signal)    │
│                                        │                               │
│                                        ↓                               │
│                           ┌─────────────────────────┐                  │
│                           │ packages/core/services/ │                  │
│                           │ sessionRecap.ts         │                  │
│                           └─────────────────────────┘                  │
│                                        │                               │
│                                        ↓                               │
│                          GeminiClient.generateContent                  │
│                             (fastModel + tools:[])                     │
│                                                                        │
│  addItem({type: 'away_recap', text}) ─→ HistoryItemDisplay             │
│    └─ AwayRecapMessage rendered inline like any other history          │
│       item (※ + bold "recap: " + italic content, all dim);             │
│       scrolls naturally with the conversation. Mirrors Claude          │
│       Code's away_summary system message.                              │
└────────────────────────────────────────────────────────────────────────┘
```

### Files

| File                                                         | Responsibility                                                                   |
| ------------------------------------------------------------ | -------------------------------------------------------------------------------- |
| `packages/core/src/services/sessionRecap.ts`                 | One-shot LLM call + history filter + tag extraction                              |
| `packages/cli/src/ui/hooks/useAwaySummary.ts`                | Auto-trigger React hook                                                          |
| `packages/cli/src/ui/commands/recapCommand.ts`               | `/recap` manual entry point                                                      |
| `packages/cli/src/ui/components/messages/StatusMessages.tsx` | `AwayRecapMessage` renderer (`※` + bold `recap:` + italic content, all dim)      |
| `packages/cli/src/ui/types.ts`                               | `HistoryItemAwayRecap` type                                                      |
| `packages/cli/src/ui/components/HistoryItemDisplay.tsx`      | Dispatches `away_recap` history items to the renderer                            |
| `packages/cli/src/config/settingsSchema.ts`                  | `general.showSessionRecap` + `general.sessionRecapAwayThresholdMinutes` settings |

## Prompt Design

### System Prompt

`generationConfig.systemInstruction` replaces the main agent's system prompt for this single call,
so the model behaves only as a recap generator and not as a coding assistant. Note that `GeminiClient.generateContent()` internally runs the prompt through `getCustomSystemPrompt()`, which appends the user's memory (QWEN.md / managed auto-memory) as a suffix. The final system prompt is therefore `recap prompt + user memory`, which is useful project context for the recap, not a leak.

Bullets below correspond 1:1 with `RECAP_SYSTEM_PROMPT`:

- Under 40 words, 1-2 plain sentences (no markdown / lists / headings). For Chinese, treat the budget as roughly 80 characters total.
- First sentence: the high-level task. Then: the concrete next step.
- Explicitly forbid: listing what was done, reciting tool calls, status reports.
- Match the dominant language of the conversation (English or Chinese).
- Wrap output in `...`; nothing outside the tags.

### Structured Output + Extraction

The model is instructed to wrap its answer in `...`:

```
Refactoring loopDetectionService.ts to address long-session OOM. Next step is to implement option B.
```

Why: some models (GLM family, reasoning models) write a "thinking" paragraph before the final answer. Returning the raw text would leak that reasoning into the UI.

`extractRecap()` has three fallback tiers:

1. Both tags present: take what is between `...` (preferred).
2. Only the open tag (e.g. `maxOutputTokens` truncated the close tag): take everything after the open tag.
3. Tag missing entirely: return empty string → service returns `null` → UI renders nothing.

The third tier is "skip rather than show the wrong thing": surfacing the model's reasoning preamble is worse than showing no recap at all.

### Call Parameters

| Parameter           | Value                          | Reason                                                |
| ------------------- | ------------------------------ | ----------------------------------------------------- |
| `model`             | `getFastModel() ?? getModel()` | Recap doesn't need a frontier model                   |
| `tools`             | `[]`                           | One-shot query, no tool use                           |
| `maxOutputTokens`   | `300`                          | Headroom for 1-2 short sentences + tags               |
| `temperature`       | `0.3`                          | Mostly deterministic, with a bit of natural variation |
| `systemInstruction` | The recap-only prompt above    | Replaces the main agent's role definition             |

## History Filtering

`geminiClient.getChat().getHistory()` returns a `Content[]` that includes:

- `user` / `model` text messages
- `model` `functionCall` parts
- `user` `functionResponse` parts (which can hold full file contents)
- `model` thought parts (`part.thought` / `part.thoughtSignature`, the model's hidden reasoning)

`filterToDialog()` keeps only `user` / `model` parts that have **non-empty text and are not thoughts**. Two reasons:

- **Tool calls / responses**: a single `functionResponse` can be 10K+ tokens. 30 such messages would drown the recap LLM in irrelevant detail, both wasting tokens and biasing the recap toward implementation noise like "called X tool to read Y file".
- **Thought parts**: carry the model's internal reasoning. Including them risks treating hidden chain-of-thought as dialogue and surfacing it in the recap text.

After dropping empty messages, `takeRecentDialog` slices to the last 30 messages and refuses to start the slice on a dangling model/tool response.

## Concurrency and Edge Cases

### Auto-trigger hook state machine

`useAwaySummary` keeps three refs:

| Ref               | Meaning                                           |
| ----------------- | ------------------------------------------------- |
| `blurredAtRef`    | Blur start time (not cleared until focus returns) |
| `recapPendingRef` | Whether an LLM call is in flight                  |
| `inFlightRef`     | The current in-flight `AbortController`           |

`useEffect` deps: `[enabled, config, isFocused, isIdle, addItem, thresholdMs]`.

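
The event table in this section can be read top to bottom as a pure decision function over the refs and the effect's inputs. The sketch below is a model of that ordering, not the hook itself: `decide`, `Action`, `HookInputs`, and `HookState` are hypothetical names, `shouldFireRecap` is passed in as an already-computed boolean because its real signature is not shown here, and the "still blurred with `blurredAtRef` already set" case is not covered by the table and is assumed to be a no-op.

```typescript
// Hypothetical action labels, one per row of the event table.
type Action =
  | 'reset' // abort in-flight call, clear inFlightRef and blurredAtRef
  | 'start-blur' // set blurredAtRef = now
  | 'noop' // nothing to do
  | 'clear-blur' // brief blur: clear blurredAtRef, wait for the next cycle
  | 'dedupe' // an LLM call is already in flight
  | 'wait-for-idle' // preserve blurredAtRef until the turn finishes
  | 'skip' // shouldFireRecap was false: clear blurredAtRef
  | 'fire'; // clear blurredAtRef, mark pending, send the request

interface HookInputs {
  enabled: boolean;
  hasConfig: boolean;
  isFocused: boolean;
  isIdle: boolean;
  thresholdMs: number;
}

interface HookState {
  blurredAt: number | null; // models blurredAtRef
  recapPending: boolean; // models recapPendingRef
}

function decide(
  inputs: HookInputs,
  state: HookState,
  now: number,
  shouldFireRecap: boolean,
): Action {
  if (!inputs.enabled || !inputs.hasConfig) return 'reset';

  if (!inputs.isFocused) {
    // Assumption: a blur that is already being timed needs no further action.
    return state.blurredAt === null ? 'start-blur' : 'noop';
  }

  // Focused from here on.
  if (state.blurredAt === null) return 'noop';
  if (now - state.blurredAt < inputs.thresholdMs) return 'clear-blur';
  if (state.recapPending) return 'dedupe';
  if (!inputs.isIdle) return 'wait-for-idle';
  if (!shouldFireRecap) return 'skip';
  return 'fire';
}
```

Keeping the checks in table order matters: the dedupe and idle gates are only reached after the blur has been confirmed long enough, which is why a short blur clears `blurredAtRef` even while a turn is streaming.
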
| Event                                                            | Action                                                                                                                                  |
| ---------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| `!enabled \|\| !config`                                          | Abort in-flight call + clear `inFlightRef` + clear `blurredAtRef`                                                                       |
| `!isFocused` and `blurredAtRef === null`                         | Set `blurredAtRef = Date.now()`                                                                                                         |
| `isFocused` and `blurredAtRef === null`                          | Return early (no blur cycle to handle: first render or right after a brief-blur reset)                                                  |
| `isFocused` and blur duration < 5 min                            | Clear `blurredAtRef`, wait for next blur cycle                                                                                          |
| `isFocused` and blur ≥ 5 min and `recapPendingRef`               | Return (dedupe)                                                                                                                         |
| `isFocused` and blur ≥ 5 min and `!isIdle`                       | **Preserve** `blurredAtRef` and wait for the turn to finish (`isIdle` is in the deps, so the effect re-fires when streaming completes) |
| `isFocused` and blur ≥ 5 min and `shouldFireRecap` returns false | Clear `blurredAtRef` and return; conversation hasn't moved enough since the last recap (≥ 2 user turns required, mirrors Claude Code)  |
| `isFocused` and all conditions met                               | Clear `blurredAtRef`, set `recapPendingRef = true`, create `AbortController`, send the LLM request                                      |

The `.then` callback **re-checks** `isIdleRef.current`: if the user has started a new turn while the LLM was running, the late-arriving recap is dropped to avoid inserting it mid-turn.

The `.finally` clears `recapPendingRef`, and clears `inFlightRef` only if `inFlightRef.current === controller` (so it doesn't overwrite a newer controller).

A second `useEffect` aborts the in-flight controller on unmount.

### `/recap` gating

`CommandContext.ui.isIdleRef` exposes the current stream state (mirroring the existing `btwAbortControllerRef` pattern). In interactive mode, `recapCommand` refuses when `!isIdleRef.current` **or** `pendingItem !== null`.
`pendingItem` alone is insufficient because a normal model reply runs with `streamingState === Responding` and a null `pendingItem`.

## Configuration and Model Selection

### User-facing knobs

| Setting                                    | Default | Notes                                                                               |
| ------------------------------------------ | ------- | ----------------------------------------------------------------------------------- |
| `general.showSessionRecap`                 | `false` | Auto-trigger only. Manual `/recap` ignores this.                                    |
| `general.sessionRecapAwayThresholdMinutes` | `5`     | Minutes blurred before auto-recap fires on focus-in. Matches Claude Code's default. |
| `fastModel`                                | unset   | Recommended (e.g. `qwen3-coder-flash`) for fast and cheap recaps.                   |

### Model fallback

`config.getFastModel() ?? config.getModel()`:

- User has a `fastModel` set and it is valid for the current auth type → use `fastModel`.
- Otherwise → fall back to the main session model (works, just costlier and slower).

## Observability

`createDebugLogger('SESSION_RECAP')` emits:

- caught exceptions from the recap path (`debugLogger.warn`).

All failures are **silent** from the user's point of view: recap is an auxiliary feature and never throws into the UI. Developers can grep for the `[SESSION_RECAP]` tag in the debug log file, which is written by default to `~/.qwen/debug/.txt` (`latest.txt` symlinks to the current session); disable it via `QWEN_DEBUG_LOG_FILE=0`.

## Out of Scope

| Item                                             | Why not                                                                                                                                  |
| ------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------- |
| Progress UI for `/recap` (spinner / pendingItem) | 3-5 second wait is tolerable; adds complexity.                                                                                           |
| Automated tests                                  | Service is small (~150 lines), end-to-end tested manually first; unit tests can land in a separate PR.                                   |
| Localized prompts                                | The system prompt is for the model; English is the most reliable substrate. The model selects the output language from the conversation. |
| `QWEN_CODE_ENABLE_AWAY_SUMMARY` env var          | Claude Code uses it to keep the feature on when telemetry is disabled; Qwen Code's current telemetry model doesn't need this.            |
| Auto-recap on `/resume` completion               | A natural follow-up but needs a hook point in `useResumeCommand`; out of scope for this PR.                                              |

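
For reference, the two pure helpers described under History Filtering and Structured Output + Extraction could look roughly like the sketch below. Several things here are assumptions rather than facts about `sessionRecap.ts`: the tag name `recap` is hypothetical (this document does not spell out the actual tag), the `Part` and `Content` interfaces are simplified local stand-ins for the SDK types, and "refuses to start the slice on a dangling model/tool response" is interpreted as "advance the start of the slice to the next `user` message".

```typescript
// Simplified local stand-ins for the SDK types (assumption).
interface Part {
  text?: string;
  thought?: boolean;
  thoughtSignature?: string;
  functionCall?: unknown;
  functionResponse?: unknown;
}
interface Content {
  role: string;
  parts?: Part[];
}

// Keep only user/model messages, and within them only non-empty, non-thought text.
function filterToDialog(history: Content[]): Content[] {
  const dialog: Content[] = [];
  for (const message of history) {
    if (message.role !== 'user' && message.role !== 'model') continue;
    const textParts = (message.parts ?? []).filter(
      (p) =>
        typeof p.text === 'string' &&
        p.text.trim() !== '' &&
        !p.thought &&
        !p.thoughtSignature,
    );
    // Messages left with no text (pure tool calls or thoughts) are dropped.
    if (textParts.length > 0) {
      dialog.push({
        role: message.role,
        parts: textParts.map((p) => ({ text: p.text })),
      });
    }
  }
  return dialog;
}

// Take the last `max` messages, but never begin on a model message.
function takeRecentDialog(dialog: Content[], max = 30): Content[] {
  let start = Math.max(0, dialog.length - max);
  while (start < dialog.length && dialog[start]?.role !== 'user') start++;
  return dialog.slice(start);
}

// Hypothetical tag name: the real one is not given in this document.
const OPEN_TAG = '<recap>';
const CLOSE_TAG = '</recap>';

// Three tiers: both tags, open tag only (truncated output), or no tag at all.
function extractRecap(raw: string): string {
  const open = raw.indexOf(OPEN_TAG);
  if (open === -1) return ''; // tier 3: show nothing rather than the preamble
  const start = open + OPEN_TAG.length;
  const close = raw.indexOf(CLOSE_TAG, start);
  const body = close === -1 ? raw.slice(start) : raw.slice(start, close);
  return body.trim();
}
```

An empty string from `extractRecap` corresponds to the "service returns `null`, UI renders nothing" path, so a caller would map `''` to `null` before handing the result to `addItem`.
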