qwen-code/docs/design
Commit 519e5aa1de by Shaojin Wen
fix(core): recover from truncated tool calls via multi-turn continuation (#3313)
* fix(core): recover from truncated tool calls via multi-turn continuation (#3049)

When large tool calls (e.g., WriteFile with big HTML) exceed the output
token limit, the model's response gets truncated and required parameters
like file_path are missing. Previously this surfaced as a confusing
"params must have required property" error.

Three-layer defense:

1. Escalate to model's actual output limit (not fixed 64K). Models with
   128K output (Claude Opus, GPT-5) now use their full capacity.

2. Multi-turn recovery: if the escalated response is still truncated,
   keep the partial response in history and inject a recovery message
   ("Resume directly — pick up mid-thought") so the model continues
   from where it left off. Up to 3 recovery attempts before falling
   back to the tool scheduler's guidance.

3. Stronger truncation guidance as fallback: "you MUST split" instead
   of "consider splitting".
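The three layers above can be sketched roughly as follows. This is a minimal model, not the real implementation: `recoverTruncatedOutput`, `request`, and the `Turn` shape are hypothetical stand-ins, while `MAX_OUTPUT_RECOVERY_ATTEMPTS` and `OUTPUT_RECOVERY_MESSAGE` mirror names the later commits reference.

```typescript
// Simplified sketch of the three-layer defense (hypothetical helper names).
type FinishReason = 'STOP' | 'MAX_TOKENS';
interface Turn {
  role: 'user' | 'model';
  text: string;
  finishReason?: FinishReason;
}

const MAX_OUTPUT_RECOVERY_ATTEMPTS = 3;
const OUTPUT_RECOVERY_MESSAGE =
  'Your previous response was cut off. Resume directly, pick up mid-thought.';

// `request` stands in for the real API call.
function recoverTruncatedOutput(
  request: (history: Turn[], maxTokens: number) => Turn,
  history: Turn[],
  modelOutputLimit: number,
): Turn[] {
  // Layer 1: escalate to the model's actual output limit (64K floor).
  let response = request(history, Math.max(modelOutputLimit, 64 * 1024));
  history.push(response);

  // Layer 2: multi-turn recovery while the escalated response is still
  // truncated; the partial response stays in history so the model continues.
  let attempts = 0;
  while (
    response.finishReason === 'MAX_TOKENS' &&
    attempts < MAX_OUTPUT_RECOVERY_ATTEMPTS
  ) {
    history.push({ role: 'user', text: OUTPUT_RECOVERY_MESSAGE });
    response = request(history, modelOutputLimit);
    history.push(response);
    attempts += 1;
  }

  // Layer 3 lives elsewhere: if still truncated here, the tool scheduler's
  // "you MUST split" guidance takes over.
  return history;
}
```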

Also fixes:
- Clear toolCallRequests on RETRY to prevent duplicate tool execution
- Add isContinuation flag to RETRY events so the UI preserves text
  buffers during recovery (continuation) but resets them during
  escalation (fresh restart)
- Catch errors during recovery to prevent dangling history entries

* docs: update adaptive output token escalation design for recovery mechanism

Update the design doc to reflect:
- Escalation now targets model's actual output limit (64K floor)
- Multi-turn recovery loop after escalation (up to 3 attempts)
- isContinuation flag on RETRY events
- Recovery error handling (pop dangling message, break)
- Updated constants table and model-specific escalation limits
- New design decision: why multi-turn recovery over progressive escalation

* fix: remove competitor reference from code comment

* fix: address review feedback on recovery mechanism

Three correctness fixes from @tanzhenxin's review:

1. Partial text lost during continuation (useGeminiStream.ts):
   On continuation RETRY, setPendingHistoryItem(null) cleared the pending
   gemini item. The next Content event then saw a null pending item,
   created a fresh one, and reset geminiMessageBuffer = eventValue —
   discarding the preserved partial text. Now the pending item AND
   buffers are kept on continuation, so the continuation appends.

2. Recovery on truncated tool-call turns (geminiChat.ts):
   When the truncated turn already contains a complete functionCall,
   appending a user recovery message produces model(functionCall) →
   user(text) with no intervening functionResponse — an invalid API
   sequence. Now recovery skips turns with functionCall parts and
   defers to the tool scheduler's layer-3 fallback.

3. Recovery errors swallowed after partial chunks (geminiChat.ts):
   If a recovery attempt yielded chunks then failed, the catch block
   broke without emitting any terminal signal, leaving the UI with
   partial text and no Finished event. Now emits a synthetic
   finishReason=STOP chunk in the catch so the UI gets a proper
   terminal signal.
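The guard behind fix 2 can be sketched as a single predicate. The `Part`/`ModelTurn` shapes and the `shouldAttemptRecovery` name are illustrative, not the actual geminiChat.ts code:

```typescript
// Simplified part/turn shapes; names are illustrative.
interface Part {
  text?: string;
  functionCall?: { name: string };
}
interface ModelTurn {
  role: 'model';
  parts: Part[];
}

// Recovery appends a plain user text turn. If the truncated model turn
// already holds a complete functionCall, that would produce
// model(functionCall) -> user(text) with no functionResponse in between,
// an invalid API sequence. So recovery only proceeds for text-only turns
// and otherwise defers to the tool scheduler's layer-3 fallback.
function shouldAttemptRecovery(turn: ModelTurn): boolean {
  return !turn.parts.some((part) => part.functionCall !== undefined);
}
```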

* test: add coverage for output token recovery loop

Four targeted tests for the recovery mechanism introduced in the
truncated-tool-call-recovery PR:

1. Recovery loop fires when escalated response is also truncated:
   initial MAX_TOKENS → escalation MAX_TOKENS → recovery STOP. Verifies
   two RETRY events (one escalation, one continuation) and three API
   calls.

2. Recovery is skipped when truncated turn contains a functionCall:
   prevents the invalid model(functionCall) → user(text) sequence.
   Verifies no continuation RETRY and history ends with the functionCall
   intact.

3. Recovery attempts are capped at MAX_OUTPUT_RECOVERY_ATTEMPTS (3):
   persistent MAX_TOKENS triggers exactly 5 API calls (1 initial + 1
   escalation + 3 recovery).

4. Recovery catch block emits synthetic STOP chunk and pops dangling
   user message: when a recovery attempt fails (empty stream →
   InvalidStreamError), the UI gets a terminal signal and history
   ends on the model turn, not a dangling user recovery message.

* test: cover cross-iteration functionCall detection in recovery loop

Existing tests cover the functionCall guard when both initial and
escalated responses have functionCall. This adds a test for the
cross-iteration case: iter 1 returns text (recovery proceeds), iter 2
returns functionCall (recovery must break before iter 3).

Verifies:
- API called exactly 4 times (1 initial + 1 escalation + 2 recovery)
- History ends with the functionCall model turn, not a dangling user
  recovery message
- Iter 3's user recovery message is never pushed (guard fires at top
  of loop before recoveryCount increment)

* fix(core): cast synthetic STOP chunk via unknown for TS2352

The object literal {candidates, content, parts} doesn't structurally
overlap with GenerateContentResponse closely enough for TypeScript to
allow a direct assertion. Casting through 'unknown' first, as the
TS2352 message suggests, resolves the build error.

Build error from CI:
  src/core/geminiChat.ts(651,24): error TS2352: Conversion of type '...'
  to type 'GenerateContentResponse' may be a mistake because neither
  type sufficiently overlaps with the other. If this was intentional,
  convert the expression to 'unknown' first.
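A minimal illustration of the pattern, with a simplified local stand-in for the SDK's GenerateContentResponse type (the real class carries additional members the object literal lacks, which is why the direct cast failed in CI):

```typescript
// Simplified stand-in for the SDK type; not the real shape.
interface GenerateContentResponse {
  candidates: Array<{
    content: { parts: Array<{ text: string }> };
    finishReason: string;
  }>;
  text(): string; // illustrative extra member the literal does not provide
}

// A direct `as GenerateContentResponse` was rejected with TS2352 in CI;
// routing the assertion through 'unknown' always compiles.
const syntheticStop = {
  candidates: [{ content: { parts: [{ text: '' }] }, finishReason: 'STOP' }],
} as unknown as GenerateContentResponse;
```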

* test(core): tighten recovery history integrity assertions

Strengthen the "pop dangling recovery message" test to catch any
future regression that leaves consecutive same-role entries or an
empty last-model placeholder in history — conditions providers
reject on the next turn.

* fix(core): coalesce recovery pairs to avoid leaking control prompt

Previously every output-token recovery iteration left a (user, model)
pair in durable history where the user turn was the internal
OUTPUT_RECOVERY_MESSAGE control prompt. That prompt was then visible
to every later turn, biasing responses and polluting compression,
replay, and export.

Track successful recovery iterations and, after the recovery loop,
fold each completed pair back into the preceding model turn via a
new `coalesceRecoveryPairs` helper. Failed iterations already pop
their user turn in the catch block, so they need no coalescing.

Adds a targeted test that runs escalation + two successful recovery
iterations + a clean STOP, and asserts the merged history has
exactly one user turn and one model turn, no trace of the control
prompt text, and content ordered as B (escalation) + C + D.
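A rough sketch of what `coalesceRecoveryPairs` does. The `Turn` shape and the `firstPairIndex`/`pairCount` parameters are illustrative assumptions; only the merge behavior mirrors the commit:

```typescript
interface Turn {
  role: 'user' | 'model';
  text: string;
}

// After the recovery loop, fold each successful (user control prompt,
// model continuation) pair back into the preceding model turn, so the
// internal OUTPUT_RECOVERY_MESSAGE never survives into durable history.
function coalesceRecoveryPairs(
  history: Turn[],
  firstPairIndex: number, // index of the first recovery user turn
  pairCount: number, // number of successful recovery iterations
): Turn[] {
  // Turns before the truncated model turn stay as-is.
  const merged = history.slice(0, firstPairIndex - 1);
  // Copy the truncated model turn, then append each continuation's text,
  // skipping the interleaved user control prompts.
  const base = { ...history[firstPairIndex - 1] };
  for (let i = 0; i < pairCount; i++) {
    base.text += history[firstPairIndex + 2 * i + 1].text;
  }
  return [...merged, base];
}
```

Under this sketch, the test scenario from the commit (escalation B, then continuations C and D) merges to a single model turn "BCD" with the control prompt gone.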
2026-04-21 17:04:24 +08:00
adaptive-output-token-escalation fix(core): recover from truncated tool calls via multi-turn continuation (#3313) 2026-04-21 17:04:24 +08:00
auto-memory feat(memory): managed auto-memory and auto-dream system (#3087) 2026-04-16 20:05:45 +08:00
channels docs(channels): consolidate design docs into single file 2026-04-02 11:17:37 +08:00
compact-mode feat: optimize compact mode UX — shortcuts, settings sync, and safety (#3100) 2026-04-16 09:29:24 +08:00
fork-subagent feat(core): implement fork subagent for context sharing (#2936) 2026-04-14 14:27:38 +08:00
prompt-suggestion fix(followup): prevent tool call UI leak and Enter accept buffer race (#2872) 2026-04-09 00:07:03 +08:00
session-recap fix(cli): rework session recap rendering and add blur threshold setting (#3482) 2026-04-21 14:39:13 +08:00
slash-command refactor(cli): replace slash command whitelist with capability-based filtering (Phase 1) (#3283) 2026-04-20 14:34:43 +08:00