qwen-code/scripts/measure-flicker.mjs
Shaojin Wen cae09279fa
fix(cli): bound SubAgent display by visual height to prevent flicker (#3721)
* fix(cli): bound SubAgent display by visual height to prevent flicker

The SubAgent runtime display used hard-coded MAX_TASK_PROMPT_LINES=5 and
MAX_TOOL_CALLS=5 plus character-length truncation (`length > 80`). On narrow
terminals the soft-wrapped content overflowed the available height as the
tool-call list grew, forcing Ink to clear and redraw on every update.

Pull AgentExecutionDisplay onto the same visual-height/visual-width slicing
pattern that ToolMessage and ConversationMessages already use:

- Add `sliceTextByVisualHeight` to textUtils — counts soft wraps as visual
  rows, supports top/bottom overflow direction.
- AgentExecutionDisplay now derives maxTaskPromptLines / maxToolCalls from
  the assigned `availableHeight` and uses `truncateToVisualWidth` (CJK +
  emoji safe) instead of substring(0, 80). Compact mode is unchanged.
- Drop the 300 ms debounced `refreshStatic` that AppContainer fired on every
  terminalWidth change. It was a flicker source on resize, and the static
  area no longer needs the refresh.
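
As a rough illustration of the visual-height slicing described above, here is a minimal sketch. The function name, signature, and return shape are hypothetical and are not the real `sliceTextByVisualHeight` API. It assumes each character is one column wide (so it is not CJK or emoji safe) and that `overflow: 'top'` means the top of the text is hidden.

```javascript
// Hypothetical sketch only; the real helper in textUtils may differ.
// Assumption: one UTF-16 code unit == one terminal column.
function visualRows(line, width) {
  // An empty line still occupies one row on screen.
  return Math.max(1, Math.ceil(line.length / width));
}

function sliceByVisualHeightSketch(text, width, maxHeight, overflow = 'top') {
  const lines = text.split('\n');
  if (maxHeight === undefined) {
    // No height budget: return everything.
    return { lines, hiddenLinesCount: 0 };
  }
  // 'top' overflow keeps the tail of the text, so walk from the bottom up.
  const ordered = overflow === 'top' ? [...lines].reverse() : lines;
  const kept = [];
  let usedRows = 0;
  for (const line of ordered) {
    const rows = visualRows(line, width);
    if (usedRows + rows > maxHeight) break; // soft wraps count against the budget
    kept.push(line);
    usedRows += rows;
  }
  const visible = overflow === 'top' ? kept.reverse() : kept;
  // In this sketch, hidden lines are counted as logical (hard-newline) lines.
  return { lines: visible, hiddenLinesCount: lines.length - visible.length };
}
```

The point of the sketch is that a single long line can consume several rows of the budget, which is what a plain `split('\n').length` check misses.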

Tests:
- textUtils.test.ts covers undefined maxHeight, top/bottom overflow, and
  soft-wrap counting.
- AgentExecutionDisplay.test.tsx asserts the height-bounded render keeps
  the prompt + tool list inside the assigned rows.
- AppContainer.test.tsx asserts width-only changes no longer clear the
  terminal.

* test(tui): add SubAgent flicker regression script and ANSI counter

Two reusable tools for measuring TUI flicker:

- `scripts/measure-flicker.mjs` — standalone Node script that counts the
  ANSI escape sequences which betray flicker (clearTerminalPair, clearScreen,
  eraseLine, cursorUp) inside any recorded raw stream (`script` log,
  `tmux pipe-pane` output, custom PTY capture). Supports baseline diff mode.

- `integration-tests/terminal-capture/subagent-flicker-regression.ts` —
  end-to-end ratchet that boots a mock OpenAI server, drives a real qwen
  process through an `agent` tool dispatch + 5 `read_file` SubAgent rounds,
  then reads PTY bytes and asserts ANSI-redraw counts stay below configured
  ceilings. Mirrors PR #43f128b20's resize-clear-regression pattern.

Reference numbers (60-col / 18-row terminal, fixed build):
  clearTerminalPair=5, clearScreen=10, eraseLine=440, cursorUp=132

The ratchet defaults to 10/20 ceilings — roughly 2× steady state — so
regressions like reverting sliceTextByVisualHeight or restoring the
width-driven refreshStatic trip the build.
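
A ceiling check of this kind can be sketched as follows. The function and constant names are hypothetical, and the mapping of "10/20" onto `clearTerminalPair` / `clearScreen` is an assumption inferred from the reference numbers above; the real regression script may structure this differently.

```javascript
// Hypothetical sketch of a ratchet check; not the actual regression script.
// Assumption: "10/20" means clearTerminalPair <= 10 and clearScreen <= 20.
const DEFAULT_CEILINGS = { clearTerminalPair: 10, clearScreen: 20 };

function findRatchetViolations(counts, ceilings = DEFAULT_CEILINGS) {
  // Return one message per metric that exceeds its ceiling.
  return Object.entries(ceilings)
    .filter(([name, ceiling]) => (counts[name] ?? 0) > ceiling)
    .map(([name, ceiling]) => `${name}: ${counts[name]} > ${ceiling}`);
}

// Usage with the reference numbers quoted in this commit message.
const reference = {
  clearTerminalPair: 5,
  clearScreen: 10,
  eraseLine: 440,
  cursorUp: 132,
};
const violations = findRatchetViolations(reference);
if (violations.length > 0) {
  throw new Error(`flicker ratchet tripped: ${violations.join(', ')}`);
}
```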

Implementation notes captured in the script's docstring:
  - Strips HTTP_PROXY family env vars (NO_PROXY isn't honored by undici,
    so corp proxy would otherwise hijack the loopback request).
  - Drops `--bare` (bare mode hard-codes the registered tool set and
    rejects the `agent` tool); HOME is sandboxed to a temp dir instead.
  - Mock server speaks SSE because the CLI requests stream:true.
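
The proxy-stripping note could look roughly like the sketch below. The helper name and the exact list of variables are assumptions for illustration; the commit message only says "HTTP_PROXY family".

```javascript
// Hypothetical sketch; the variable list is an assumption, not taken from the script.
const PROXY_KEYS = new Set(['HTTP_PROXY', 'HTTPS_PROXY', 'ALL_PROXY']);

function withoutProxyEnv(env) {
  // Copy first so the caller's environment object is not mutated.
  const clean = { ...env };
  for (const key of Object.keys(clean)) {
    // Match both upper-case and lower-case spellings (http_proxy, HTTP_PROXY).
    if (PROXY_KEYS.has(key.toUpperCase())) {
      delete clean[key];
    }
  }
  return clean;
}
```

The cleaned object would then be passed as the `env` of the spawned qwen process, so requests to the loopback mock server are not routed through a proxy.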

* fix(cli): address inline review on SubAgent flicker fix

Three issues from inline review on PR #3721:

1. **availableHeight as total budget (Critical).** The previous formula
   only constrained prompt + tool-call height, not the surrounding
   header / section labels / gaps / footer. Default and verbose mode
   could still overrun the parent-provided budget. Subtract a fixed-row
   overhead (10 rows running, 18 completed) before computing
   `maxTaskPromptLines` / `maxToolCalls`. Add unit tests that assert the
   rendered frame line-count stays within `availableHeight` for both
   running and completed states.

2. **Ratchet that actually distinguishes fix from no-fix.** The previous
   `clearTerminalPair` / `clearScreen` ceilings passed for both fixed
   and unfixed builds. Add an `eraseLine` upper bound (default 460) —
   that's the metric whose drop reflects the in-place-update efficiency
   the visual-height fix delivers (no-fix observed 469, with-fix 434).
   Refresh docstring with the current numbers and a coverage map that
   honestly states what this ratchet does and does not exercise.

3. **Keypress scope.** `useKeypress` was active on every mounted
   `AgentExecutionDisplay`, including completed/historical instances in
   chat history — Ctrl+E / Ctrl+F would toggle them all in lock-step
   and cause large scrollback reflows. Gate `isActive` on
   `data.status === 'running'`. Test mock now also honors
   `{ isActive }` so the new "completed displays ignore Ctrl+E"
   regression is enforceable.
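
To make the budget arithmetic in item 1 concrete, here is a minimal sketch. The overhead values (10 running, 18 completed) come from this commit message; the function name, the even split between prompt and tool calls, and the floor of one row each are assumptions, not the component's actual formula.

```javascript
// Hypothetical sketch of deriving per-section limits from a total row budget.
const RUNNING_FIXED_OVERHEAD = 10; // header, labels, gaps, footer while running
const COMPLETED_FIXED_OVERHEAD = 18; // value stated for this review round

function deriveBudgets(availableHeight, status) {
  const overhead =
    status === 'running' ? RUNNING_FIXED_OVERHEAD : COMPLETED_FIXED_OVERHEAD;
  // Rows left for prompt + tool calls; assumed floor of 2 so each gets one row.
  const contentRows = Math.max(2, availableHeight - overhead);
  // Assumption: split the remaining rows evenly between the two sections.
  const maxTaskPromptLines = Math.max(1, Math.floor(contentRows / 2));
  const maxToolCalls = Math.max(1, contentRows - maxTaskPromptLines);
  return { maxTaskPromptLines, maxToolCalls };
}
```

With a floor like this, a very small `availableHeight` can still overrun the budget by a row or two; a stricter variant would render nothing for a section instead.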

* fix(cli): address round-2 inline review on SubAgent flicker

Three follow-up issues from inline review on PR #3721:

1. **sliceTextByVisualHeight reservedRows early-return (Critical).**
   The early return compared `visualLineCount <= targetMaxHeight` and
   ignored `reservedRows`, so a caller asking us to keep one row free
   for a footer could still receive the full input back with
   `hiddenLinesCount: 0` even though only `targetMaxHeight - reservedRows`
   content rows were actually available. Compare against
   `visibleContentHeight` instead and add a regression test for the
   `'a\nb\nc' / 3 / reservedRows: 1` case the reviewer flagged.

2. **Footer hint and rendered prompt now share one slicing result
   (Suggestion).** Previously `hasMoreLines` looked at
   `data.taskPrompt.split('\n').length` (hard newlines only), but the
   prompt body was already truncated by `sliceTextByVisualHeight` (which
   counts soft wraps). A long single-line prompt could be visually
   truncated without the footer ever surfacing the "ctrl+f to show
   more" hint. Lift the slice into the parent component and feed both
   the rendered `TaskPromptSection` and the footer's `hasMoreLines`
   from the same `hiddenLinesCount`.

3. **Running → completed transition test (Critical).** The previous
   "completed displays ignore Ctrl+E" test rendered already-completed
   data, so `useKeypress` was inactive from the start and Ctrl+E was a
   no-op trivially. It missed the real path: a running subagent gets
   expanded, then completes while preserving the expanded
   `displayMode` — which is exactly when the completed-state budget
   has to hold the layout. Replace the test with a `rerender`-based
   one that runs the full transition, asserts the completed expanded
   frame stays within `availableHeight`, and asserts the post-transition
   Ctrl+E is a no-op. Bumped `COMPLETED_FIXED_OVERHEAD` from 18 to 22
   to accommodate the ExecutionSummary + ToolUsage block accounting
   that the new transition test exposed.
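
The early-return bug in item 1 can be illustrated in isolation. The two function names below are hypothetical; only `targetMaxHeight`, `reservedRows`, and `visibleContentHeight` are taken from the description above.

```javascript
// Hypothetical sketch of the early-return condition before and after the fix.
function canSkipSlicingBuggy(visualLineCount, targetMaxHeight) {
  // Ignores reservedRows, so the footer row is silently given to content.
  return visualLineCount <= targetMaxHeight;
}

function canSkipSlicingFixed(visualLineCount, targetMaxHeight, reservedRows = 0) {
  // Only the rows left after the reservation are available to content.
  const visibleContentHeight = Math.max(0, targetMaxHeight - reservedRows);
  return visualLineCount <= visibleContentHeight;
}

// The reviewer's case: 'a\nb\nc' with maxHeight 3 and one reserved row.
const lineCount = 'a\nb\nc'.split('\n').length;
const buggy = canSkipSlicingBuggy(lineCount, 3);
const fixed = canSkipSlicingFixed(lineCount, 3, 1);
```

In the reviewer's case the buggy check reports that no slicing is needed, while the fixed check correctly reports that the three lines do not fit in two content rows.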

* fix(cli): gate SubAgent useKeypress on isFocused for parallel runs

Per @yiliang114's review on PR #3721 — `data.status === 'running'` alone
fixes the historical/scrollback case but two SubAgents running in parallel
both stay `running`, so a single Ctrl+E / Ctrl+F still toggles them in
lock-step and the dual reflow brings back the flicker the gating was meant
to prevent. The component already receives `isFocused` from ToolMessage
(via SubagentExecutionRenderer) for the inline confirmation prompt — reuse
it on the keypress hook:

  isActive: data.status === 'running' && isFocused

Adds a regression test that renders a running SubAgent with
`isFocused={false}` and asserts Ctrl+E is a no-op (frame unchanged).
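
The gating rule can be checked outside of React with a small sketch. `shouldHandleKeypress` and the sample display objects are hypothetical names for illustration; only the `data.status === 'running' && isFocused` condition comes from this commit.

```javascript
// Hypothetical sketch: which mounted displays should react to Ctrl+E / Ctrl+F.
function shouldHandleKeypress(data, isFocused) {
  return data.status === 'running' && isFocused;
}

// Two SubAgents running in parallel plus one historical instance.
const displays = [
  { id: 'a', data: { status: 'running' }, isFocused: true },
  { id: 'b', data: { status: 'running' }, isFocused: false },
  { id: 'c', data: { status: 'completed' }, isFocused: false },
];

const activeIds = displays
  .filter((d) => shouldHandleKeypress(d.data, d.isFocused))
  .map((d) => d.id);
```

Only the focused running instance ends up active, so a single keypress triggers one reflow instead of several.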

---------

Co-authored-by: wenshao <wenshao@U-K7F6PQY3-2157.local>
2026-04-29 22:34:55 +08:00

136 lines
4 KiB
JavaScript
Executable file

#!/usr/bin/env node
/**
 * Quick-and-dirty TUI flicker quantifier.
 *
 * Counts the ANSI escape sequences that betray flicker (clearing the screen,
 * erasing lines, or jumping the cursor up to redraw) inside a recorded raw
 * terminal stream. Useful for "before/after" sanity checks when refactoring
 * TUI components.
 *
 * Recording phase (do this yourself; this script doesn't drive interactive
 * TUIs, it just analyses what you recorded):
 *
 *   # macOS / BSD `script`:
 *   script -q /tmp/qwen.before.raw node dist/cli.js --yolo
 *   # …drive a SubAgent scenario, then exit qwen with /quit or Ctrl-D
 *
 *   # Linux / util-linux `script`:
 *   script -q -c 'node dist/cli.js --yolo' /tmp/qwen.before.raw
 *
 *   # tmux variant: a tmux session preserves whatever you do live; pipe-pane
 *   # gives you the same raw bytes.
 *   tmux new -s flicker -d 'node dist/cli.js --yolo'
 *   tmux pipe-pane -t flicker -o 'cat > /tmp/qwen.before.raw'
 *   tmux attach -t flicker
 *
 * Then:
 *
 *   node scripts/measure-flicker.mjs /tmp/qwen.before.raw
 *   node scripts/measure-flicker.mjs /tmp/qwen.after.raw /tmp/qwen.before.raw
 *
 * The second form prints both, and the delta; lower clearTerminalPair on
 * "current" vs "baseline" is the win condition for a flicker fix.
 */
import { readFileSync, statSync } from 'node:fs';
import { argv, exit, stdout } from 'node:process';
if (argv.length < 3 || argv.length > 4) {
  stdout.write(
    'usage: node scripts/measure-flicker.mjs <raw-stream> [baseline-raw-stream]\n',
  );
  exit(64);
}
/* eslint-disable no-control-regex -- ANSI escape patterns deliberately match ESC */
const PATTERNS = [
  {
    name: 'clearTerminalPair',
    description: 'ESC [ 2 J ESC [ 3 J ESC [ H (Ink full-screen redraw)',
    regex: /\x1b\[2J\x1b\[3J\x1b\[H/g,
  },
  {
    name: 'clearScreen',
    description: 'ESC [ 2 J | ESC [ 3 J | ESC c (any clear-screen op)',
    regex: /\x1b\[2J|\x1b\[3J|\x1bc/g,
  },
  {
    name: 'eraseLine',
    description: 'ESC [ 0|1|2 K (erase part/all of current line)',
    regex: /\x1b\[[0-2]?K/g,
  },
  {
    name: 'cursorUp',
    description: 'ESC [ N A (cursor up; Ink uses this for in-place redraw)',
    regex: /\x1b\[\d+A/g,
  },
];
/* eslint-enable no-control-regex */
function readBytes(path) {
  try {
    statSync(path);
  } catch {
    stdout.write(`error: ${path} not found\n`);
    exit(66);
  }
  // Read as binary so a NUL doesn't truncate the buffer; toString('binary')
  // preserves byte values 0x00..0xFF as code points 0x00..0xFF, which is
  // exactly what our regex patterns expect (they match \x1b literally).
  return readFileSync(path).toString('binary');
}
function summarize(label, path) {
  const raw = readBytes(path);
  // Count matches per pattern; String.prototype.match returns null on no match.
  const counts = Object.fromEntries(
    PATTERNS.map((p) => [p.name, raw.match(p.regex)?.length ?? 0]),
  );
  return {
    label,
    path,
    bytes: raw.length,
    counts,
  };
}
function render(summary) {
  const { label, path, bytes, counts } = summary;
  stdout.write(`── ${label}\n`);
  stdout.write(` file: ${path}\n`);
  stdout.write(` bytes: ${bytes}\n`);
  for (const p of PATTERNS) {
    stdout.write(` ${p.name.padEnd(18)} ${counts[p.name]}\n`);
  }
}
function renderDelta(current, baseline) {
  stdout.write('\n── delta (current - baseline)\n');
  for (const p of PATTERNS) {
    const c = current.counts[p.name];
    const b = baseline.counts[p.name];
    const d = c - b;
    // Negative delta means fewer redraw sequences than the baseline.
    const arrow = d < 0 ? '↓' : d > 0 ? '↑' : '·';
    stdout.write(` ${p.name.padEnd(18)} ${d > 0 ? '+' : ''}${d} ${arrow}\n`);
  }
  stdout.write(
    '\ntip: lower clearTerminalPair (and lower clearScreen) on "current" wins.\n',
  );
}
if (process.env.QWEN_FLICKER_VERBOSE) {
  stdout.write('patterns:\n');
  for (const p of PATTERNS) {
    stdout.write(` ${p.name.padEnd(18)} = ${p.description}\n`);
  }
  stdout.write('\n');
}
const current = summarize('current', argv[2]);
render(current);
if (argv[3]) {
  stdout.write('\n');
  const baseline = summarize('baseline', argv[3]);
  render(baseline);
  renderDelta(current, baseline);
}