qwen-code/integration-tests
Shaojin Wen cae09279fa
fix(cli): bound SubAgent display by visual height to prevent flicker (#3721)
* fix(cli): bound SubAgent display by visual height to prevent flicker

The SubAgent runtime display used hard-coded MAX_TASK_PROMPT_LINES=5 and
MAX_TOOL_CALLS=5 plus character-length truncation (`length > 80`). On narrow
terminals the soft-wrapped content overflowed the available height as the
tool-call list grew, forcing Ink to clear and redraw on every update.

Pull AgentExecutionDisplay onto the same visual-height/visual-width slicing
pattern that ToolMessage and ConversationMessages already use:

- Add `sliceTextByVisualHeight` to textUtils — counts soft wraps as visual
  rows, supports top/bottom overflow direction.
- AgentExecutionDisplay now derives maxTaskPromptLines / maxToolCalls from
  the assigned `availableHeight` and uses `truncateToVisualWidth` (CJK +
  emoji safe) instead of substring(0, 80). Compact mode is unchanged.
- Drop the 300 ms debounced `refreshStatic` AppContainer fired on every
  terminalWidth change — that was a flicker source on resize and the
  static area no longer needs the refresh.

Tests:
- textUtils.test.ts covers undefined maxHeight, top/bottom overflow, and
  soft-wrap counting.
- AgentExecutionDisplay.test.tsx asserts the height-bounded render keeps
  the prompt + tool list inside the assigned rows.
- AppContainer.test.tsx asserts width-only changes no longer clear the
  terminal.

* test(tui): add SubAgent flicker regression script and ANSI counter

Two reusable tools for measuring TUI flicker:

- `scripts/measure-flicker.mjs` — standalone Node script that counts the
  ANSI escape sequences which betray flicker (clearTerminalPair, clearScreen,
  eraseLine, cursorUp) inside any recorded raw stream (`script` log,
  `tmux pipe-pane` output, custom PTY capture). Supports baseline diff mode.

- `integration-tests/terminal-capture/subagent-flicker-regression.ts` —
  end-to-end ratchet that boots a mock OpenAI server, drives a real qwen
  process through an `agent` tool dispatch + 5 `read_file` SubAgent rounds,
  then reads PTY bytes and asserts ANSI-redraw counts stay below configured
  ceilings. Mirrors PR #43f128b20's resize-clear-regression pattern.

Reference numbers (60-col / 18-row terminal, fixed build):
  clearTerminalPair=5, clearScreen=10, eraseLine=440, cursorUp=132

The ratchet defaults to 10/20 ceilings — roughly 2× steady state — so
regressions like reverting sliceTextByVisualHeight or restoring the
width-driven refreshStatic trip the build.

Implementation notes captured in the script's docstring:
  - Strips HTTP_PROXY family env vars (NO_PROXY isn't honored by undici,
    so corp proxy would otherwise hijack the loopback request).
  - Drops `--bare` (bare mode hard-codes the registered tool set and
    rejects the `agent` tool); HOME is sandboxed to a temp dir instead.
  - Mock server speaks SSE because the CLI requests stream:true.

* fix(cli): address inline review on SubAgent flicker fix

Three issues from inline review on PR #3721:

1. **availableHeight as total budget (Critical).** The previous formula
   only constrained prompt + tool-call height, not the surrounding
   header / section labels / gaps / footer. Default and verbose mode
   could still overrun the parent-provided budget. Subtract a fixed-row
   overhead (10 rows running, 18 completed) before computing
   `maxTaskPromptLines` / `maxToolCalls`. Add unit tests that assert the
   rendered frame line-count stays within `availableHeight` for both
   running and completed states.

2. **Ratchet that actually distinguishes fix from no-fix.** The previous
   `clearTerminalPair` / `clearScreen` ceilings passed for both fixed
   and unfixed builds. Add an `eraseLine` upper bound (default 460) —
   that's the metric whose drop reflects the in-place-update efficiency
   the visual-height fix delivers (no-fix observed 469, with-fix 434).
   Refresh docstring with the current numbers and a coverage map that
   honestly states what this ratchet does and does not exercise.

3. **Keypress scope.** `useKeypress` was active on every mounted
   `AgentExecutionDisplay`, including completed/historical instances in
   chat history — Ctrl+E / Ctrl+F would toggle them all in lock-step
   and cause large scrollback reflows. Gate `isActive` on
   `data.status === 'running'`. Test mock now also honors
   `{ isActive }` so the new "completed displays ignore Ctrl+E"
   regression is enforceable.

* fix(cli): address round-2 inline review on SubAgent flicker

Three follow-up issues from inline review on PR #3721:

1. **sliceTextByVisualHeight reservedRows early-return (Critical).**
   The early return compared `visualLineCount <= targetMaxHeight` and
   ignored `reservedRows`, so a caller asking us to keep one row free
   for a footer could still receive the full input back with
   `hiddenLinesCount: 0` even though only `targetMaxHeight - reservedRows`
   content rows were actually available. Compare against
   `visibleContentHeight` instead and add a regression test for the
   `'a\nb\nc' / 3 / reservedRows: 1` case the reviewer flagged.

2. **Footer hint and rendered prompt now share one slicing result
   (Suggestion).** Previously `hasMoreLines` looked at
   `data.taskPrompt.split('\n').length` (hard newlines only), but the
   prompt body was already truncated by `sliceTextByVisualHeight` (which
   counts soft wraps). A long single-line prompt could be visually
   truncated without the footer ever surfacing the "ctrl+f to show
   more" hint. Lift the slice into the parent component and feed both
   the rendered `TaskPromptSection` and the footer's `hasMoreLines`
   from the same `hiddenLinesCount`.

3. **Running → completed transition test (Critical).** The previous
   "completed displays ignore Ctrl+E" test rendered already-completed
   data, so `useKeypress` was inactive from the start and Ctrl+E was a
   no-op trivially. It missed the real path: a running subagent gets
   expanded, then completes while preserving the expanded
   `displayMode` — which is exactly when the completed-state budget
   has to hold the layout. Replace the test with a `rerender`-based
   one that runs the full transition, asserts the completed expanded
   frame stays within `availableHeight`, and asserts the post-transition
   Ctrl+E is a no-op. Bumped `COMPLETED_FIXED_OVERHEAD` from 18 to 22
   to accommodate the ExecutionSummary + ToolUsage block accounting
   that the new transition test exposed.

* fix(cli): gate SubAgent useKeypress on isFocused for parallel runs

Per @yiliang114's review on PR #3721 — `data.status === 'running'` alone
fixes the historical/scrollback case but two SubAgents running in parallel
both stay `running`, so a single Ctrl+E / Ctrl+F still toggles them in
lock-step and the dual reflow brings back the flicker the gating was meant
to prevent. The component already receives `isFocused` from ToolMessage
(via SubagentExecutionRenderer) for the inline confirmation prompt — reuse
it on the keypress hook:

  isActive: data.status === 'running' && isFocused

Adds a regression test that renders a running SubAgent with
`isFocused={false}` and asserts Ctrl+E is a no-op (frame unchanged).

---------

Co-authored-by: wenshao <wenshao@U-K7F6PQY3-2157.local>
2026-04-29 22:34:55 +08:00
..
cli feat(web-search): remove built-in web_search tool, replace with MCP-based approach (#3502) 2026-04-24 11:29:02 +08:00
concurrent-runner feat(web-search): remove built-in web_search tool, replace with MCP-based approach (#3502) 2026-04-24 11:29:02 +08:00
fixtures/settings-migration refactor: remove summarizeToolOutput feature 2026-03-15 13:51:32 +08:00
hook-integration feat(hooks): Add HTTP Hook, Function Hook and Async Hook support (#2827) 2026-04-16 10:10:33 +08:00
interactive test(integration): match new cron notification format in interactive tests (#3402) 2026-04-17 22:56:34 +08:00
sdk-typescript feat(web-search): remove built-in web_search tool, replace with MCP-based approach (#3502) 2026-04-24 11:29:02 +08:00
terminal-bench Terminal Bench Integration Test (#521) 2025-09-05 17:02:03 +08:00
terminal-capture fix(cli): bound SubAgent display by visual height to prevent flicker (#3721) 2026-04-29 22:34:55 +08:00
channel-plugin.test.ts docs(channels): add plugin developer guide and rename mock to plugin-example 2026-03-27 03:19:34 +00:00
globalSetup.ts feat(memory): managed auto-memory and auto-dream system (#3087) 2026-04-16 20:05:45 +08:00
test-helper.ts fix(integration-tests): honor stdinDoesNotEnd option (#2966) 2026-04-18 09:17:27 +08:00
test-mcp-server.ts # 🚀 Sync Gemini CLI v0.2.1 - Major Feature Update (#483) 2025-09-01 14:48:55 +08:00
tsconfig.json chore: rename @qwen-code/sdk-typescript to @qwen-code/sdk 2025-12-05 21:47:26 +08:00
vitest.config.ts add doc for hooks and skip integration test 2026-03-12 07:44:26 -07:00
vitest.terminal-bench.config.ts Fix E2E caused by Terminal Bench test (#529) 2025-09-08 10:51:14 +08:00