Mirror of https://github.com/QwenLM/qwen-code.git (synced 2026-05-05 23:42:03 +00:00)
* fix(cli): bound SubAgent display by visual height to prevent flicker

The SubAgent runtime display used hard-coded MAX_TASK_PROMPT_LINES=5 and
MAX_TOOL_CALLS=5 plus character-length truncation (`length > 80`). On narrow
terminals the soft-wrapped content overflowed the available height as the
tool-call list grew, forcing Ink to clear and redraw on every update.

Pull AgentExecutionDisplay onto the same visual-height/visual-width slicing
pattern that ToolMessage and ConversationMessages already use:

- Add `sliceTextByVisualHeight` to textUtils. It counts soft wraps as visual
  rows and supports top/bottom overflow direction.
- AgentExecutionDisplay now derives maxTaskPromptLines / maxToolCalls from
  the assigned `availableHeight` and uses `truncateToVisualWidth` (CJK +
  emoji safe) instead of `substring(0, 80)`. Compact mode is unchanged.
- Drop the 300 ms debounced `refreshStatic` that AppContainer fired on every
  terminalWidth change. It was a flicker source on resize, and the static
  area no longer needs the refresh.

Tests:

- textUtils.test.ts covers undefined maxHeight, top/bottom overflow, and
  soft-wrap counting.
- AgentExecutionDisplay.test.tsx asserts the height-bounded render keeps
  the prompt + tool list inside the assigned rows.
- AppContainer.test.tsx asserts width-only changes no longer clear the
  terminal.
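A minimal sketch of the visual-height slicing idea, showing how soft wraps feed into the row count. It is hypothetical: `sliceByVisualHeight`, its option names, and its return shape are illustrative assumptions, not the real `sliceTextByVisualHeight` signature. It also assumes every code point is one terminal column wide, whereas the real helper is described as CJK and emoji safe. Here `hiddenLinesCount` counts hidden source lines.

```javascript
// Hypothetical sketch; not the actual textUtils API.
// Assumption: every code point occupies exactly one terminal column.
function visualRows(line, width) {
  // An empty line still occupies one row; longer lines soft-wrap.
  return Math.max(1, Math.ceil([...line].length / width));
}

function sliceByVisualHeight(
  text,
  maxHeight,
  width,
  { overflow = 'top', reservedRows = 0 } = {},
) {
  if (maxHeight === undefined) {
    return { text, hiddenLinesCount: 0 };
  }
  const lines = text.split('\n');
  // Rows actually available for content after reserving e.g. a footer row.
  const budget = Math.max(0, maxHeight - reservedRows);
  const total = lines.reduce((sum, line) => sum + visualRows(line, width), 0);
  if (total <= budget) {
    return { text, hiddenLinesCount: 0 };
  }
  // 'top' overflow hides the earliest lines, so fill the budget from the end.
  const ordered = overflow === 'top' ? [...lines].reverse() : lines;
  const kept = [];
  let used = 0;
  for (const line of ordered) {
    const rows = visualRows(line, width);
    if (used + rows > budget) break;
    kept.push(line);
    used += rows;
  }
  if (overflow === 'top') kept.reverse();
  return { text: kept.join('\n'), hiddenLinesCount: lines.length - kept.length };
}
```

At width 10, `sliceByVisualHeight('a\n' + 'x'.repeat(25) + '\nb', 4, 10, { overflow: 'bottom' })` keeps the first two lines (1 + 3 visual rows) and reports `hiddenLinesCount: 1`. With `overflow: 'top'` it keeps the last two lines instead.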
* test(tui): add SubAgent flicker regression script and ANSI counter

Two reusable tools for measuring TUI flicker:

- `scripts/measure-flicker.mjs`: standalone Node script that counts the
  ANSI escape sequences which betray flicker (clearTerminalPair, clearScreen,
  eraseLine, cursorUp) inside any recorded raw stream (`script` log,
  `tmux pipe-pane` output, custom PTY capture). Supports a baseline diff mode.
- `integration-tests/terminal-capture/subagent-flicker-regression.ts`:
  end-to-end ratchet that boots a mock OpenAI server, drives a real qwen
  process through an `agent` tool dispatch plus 5 `read_file` SubAgent
  rounds, then reads the PTY bytes and asserts that ANSI-redraw counts stay
  below the configured ceilings. Mirrors PR #43f128b20's
  resize-clear-regression pattern.

Reference numbers (60-col / 18-row terminal, fixed build):

    clearTerminalPair=5, clearScreen=10, eraseLine=440, cursorUp=132

The ratchet defaults to ceilings of 10 (clearTerminalPair) and 20
(clearScreen), roughly 2× the steady state, so regressions such as
reverting sliceTextByVisualHeight or restoring the width-driven
refreshStatic trip the build.
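The ratchet boils down to comparing each measured count against its ceiling. The sketch below is an assumption about the shape of that check: the name `ratchetViolations` and the `DEFAULT_CEILINGS` object are hypothetical, not taken from `subagent-flicker-regression.ts`. The ceiling values and fixed-build counts come from the text above.

```javascript
// Hypothetical ceiling check; not the actual regression-test code.
// Values are the defaults quoted above (roughly 2x the fixed build).
const DEFAULT_CEILINGS = { clearTerminalPair: 10, clearScreen: 20 };

// Returns one message per metric whose count exceeds its ceiling;
// an empty array means the run passes the ratchet.
function ratchetViolations(counts, ceilings = DEFAULT_CEILINGS) {
  return Object.entries(ceilings)
    .filter(([name, max]) => (counts[name] ?? 0) > max)
    .map(([name, max]) => `${name}=${counts[name]} exceeds ceiling ${max}`);
}

// Fixed-build reference numbers (60-col / 18-row terminal).
const fixedBuild = { clearTerminalPair: 5, clearScreen: 10, eraseLine: 440, cursorUp: 132 };
console.log(ratchetViolations(fixedBuild)); // []
```

Passing `{ ...DEFAULT_CEILINGS, eraseLine: 460 }` as the second argument would add the `eraseLine` bound that a later commit in this series introduces.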
Implementation notes captured in the script's docstring:

- Strips HTTP_PROXY family env vars (NO_PROXY isn't honored by undici,
  so a corporate proxy would otherwise hijack the loopback request).
- Drops `--bare` (bare mode hard-codes the registered tool set and
  rejects the `agent` tool); HOME is sandboxed to a temp dir instead.
- Mock server speaks SSE because the CLI requests `stream: true`.
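The first and last implementation notes can be sketched as two small helpers. Both are hypothetical: the names `childEnv` and `sseFrame`, and the exact list of proxy variables, are assumptions rather than code from the regression script. The SSE framing (a `data: <json>` line followed by a blank line, with `data: [DONE]` at the end) is the usual shape of OpenAI-compatible streaming responses.

```javascript
// Hypothetical helpers; not lifted from subagent-flicker-regression.ts.
// Assumed "HTTP_PROXY family"; matched case-insensitively below.
const PROXY_VARS = ['HTTP_PROXY', 'HTTPS_PROXY', 'ALL_PROXY'];

// Build the child-process environment: sandbox HOME and drop proxy
// variables so loopback requests to the mock server go direct.
function childEnv(parentEnv, sandboxHome) {
  const env = { ...parentEnv, HOME: sandboxHome };
  // Object.keys returns a snapshot, so deleting while iterating is safe.
  for (const key of Object.keys(env)) {
    if (PROXY_VARS.includes(key.toUpperCase())) delete env[key];
  }
  return env;
}

// One Server-Sent Events frame: a "data:" line followed by a blank line.
function sseFrame(payload) {
  const data = typeof payload === 'string' ? payload : JSON.stringify(payload);
  return `data: ${data}\n\n`;
}
```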
* fix(cli): address inline review on SubAgent flicker fix

Three issues from inline review on PR #3721:

1. **availableHeight as total budget (Critical).** The previous formula
   only constrained prompt + tool-call height, not the surrounding
   header / section labels / gaps / footer, so default and verbose mode
   could still overrun the parent-provided budget. Subtract a fixed-row
   overhead (10 rows running, 18 completed) before computing
   `maxTaskPromptLines` / `maxToolCalls`. Add unit tests that assert the
   rendered frame's line count stays within `availableHeight` for both
   running and completed states.
2. **Ratchet that actually distinguishes fix from no-fix.** The previous
   `clearTerminalPair` / `clearScreen` ceilings passed for both fixed
   and unfixed builds. Add an `eraseLine` upper bound (default 460);
   that is the metric whose drop reflects the in-place-update efficiency
   the visual-height fix delivers (no-fix observed 469, with-fix 434).
   Refresh the docstring with the current numbers and a coverage map that
   honestly states what this ratchet does and does not exercise.
3. **Keypress scope.** `useKeypress` was active on every mounted
   `AgentExecutionDisplay`, including completed/historical instances in
   chat history, so Ctrl+E / Ctrl+F would toggle them all in lock-step
   and cause large scrollback reflows. Gate `isActive` on
   `data.status === 'running'`. The test mock now also honors
   `{ isActive }`, so the new "completed displays ignore Ctrl+E"
   regression is enforceable.
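Item 1 amounts to subtracting the fixed chrome before splitting what is left. This sketch takes only the overhead values from the text. Everything else is an assumption: the function name `contentBudget`, treating each tool call as one row, and the even split between prompt lines and tool calls are all hypothetical, because the split itself is not described here.

```javascript
// Hypothetical budget split; only the overhead values come from the commit.
const RUNNING_FIXED_OVERHEAD = 10;
const COMPLETED_FIXED_OVERHEAD = 18; // raised to 22 in the round-2 fix

function contentBudget(availableHeight, status) {
  const overhead =
    status === 'running' ? RUNNING_FIXED_OVERHEAD : COMPLETED_FIXED_OVERHEAD;
  // Header, section labels, gaps and footer are paid for first.
  const rows = Math.max(0, availableHeight - overhead);
  // Assumption: half the remaining rows go to the prompt and the rest to
  // tool calls, at one row per tool call.
  const maxTaskPromptLines = Math.floor(rows / 2);
  const maxToolCalls = rows - maxTaskPromptLines;
  return { maxTaskPromptLines, maxToolCalls };
}
```

Under these assumptions, `contentBudget(25, 'running')` yields 7 prompt lines and 8 tool calls, so overhead plus content is exactly 25 rows. An 18-row budget in the completed state leaves nothing for content once the 18-row overhead is paid.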
* fix(cli): address round-2 inline review on SubAgent flicker

Three follow-up issues from inline review on PR #3721:

1. **sliceTextByVisualHeight reservedRows early-return (Critical).**
   The early return compared `visualLineCount <= targetMaxHeight` and
   ignored `reservedRows`, so a caller asking us to keep one row free
   for a footer could still receive the full input back with
   `hiddenLinesCount: 0` even though only `targetMaxHeight - reservedRows`
   content rows were actually available. Compare against
   `visibleContentHeight` instead, and add a regression test for the
   `'a\nb\nc' / 3 / reservedRows: 1` case the reviewer flagged.
2. **Footer hint and rendered prompt now share one slicing result
   (Suggestion).** Previously `hasMoreLines` looked at
   `data.taskPrompt.split('\n').length` (hard newlines only), but the
   prompt body was already truncated by `sliceTextByVisualHeight` (which
   counts soft wraps). A long single-line prompt could be visually
   truncated without the footer ever surfacing the "ctrl+f to show
   more" hint. Lift the slice into the parent component and feed both
   the rendered `TaskPromptSection` and the footer's `hasMoreLines`
   from the same `hiddenLinesCount`.
3. **Running → completed transition test (Critical).** The previous
   "completed displays ignore Ctrl+E" test rendered already-completed
   data, so `useKeypress` was inactive from the start and Ctrl+E was
   trivially a no-op. It missed the real path: a running subagent gets
   expanded, then completes while preserving the expanded
   `displayMode`, which is exactly when the completed-state budget
   has to hold the layout. Replace the test with a `rerender`-based
   one that runs the full transition, asserts the completed expanded
   frame stays within `availableHeight`, and asserts the post-transition
   Ctrl+E is a no-op. Bump `COMPLETED_FIXED_OVERHEAD` from 18 to 22
   to accommodate the ExecutionSummary + ToolUsage block accounting
   that the new transition test exposed.
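Items 1 and 2 both come down to comparing visual rows, not hard newlines, against the rows that are actually available. The sketch below is only an illustration. The helper names are hypothetical, and for simplicity every character is assumed to be one column wide.

```javascript
// Illustrative helpers; not the qwen-code implementation.
// Assumption: one terminal column per code point.
function visualRowCount(text, width) {
  return text
    .split('\n')
    .reduce((sum, line) => sum + Math.max(1, Math.ceil([...line].length / width)), 0);
}

// Pre-fix footer check: counts hard newlines only.
function hasMoreLinesByNewlines(text, maxLines) {
  return text.split('\n').length > maxLines;
}

// Post-fix check: soft wraps count as rows, and rows reserved for a
// footer are subtracted before comparing (the early-return fix).
function hasMoreLinesVisual(text, width, maxHeight, reservedRows = 0) {
  return visualRowCount(text, width) > Math.max(0, maxHeight - reservedRows);
}

const longPrompt = 'x'.repeat(200); // one hard line, four rows at width 60
console.log(hasMoreLinesByNewlines(longPrompt, 3)); // false: hint never shown
console.log(hasMoreLinesVisual(longPrompt, 60, 3)); // true
```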
* fix(cli): gate SubAgent useKeypress on isFocused for parallel runs

Per @yiliang114's review on PR #3721: `data.status === 'running'` alone
fixes the historical/scrollback case, but two SubAgents running in parallel
both stay `running`, so a single Ctrl+E / Ctrl+F still toggles them in
lock-step, and the dual reflow brings back the flicker the gating was meant
to prevent. The component already receives `isFocused` from ToolMessage
(via SubagentExecutionRenderer) for the inline confirmation prompt; reuse
it on the keypress hook:

    isActive: data.status === 'running' && isFocused

Adds a regression test that renders a running SubAgent with
`isFocused={false}` and asserts Ctrl+E is a no-op (frame unchanged).
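The effect of the combined condition on parallel SubAgents can be simulated without Ink. The sketch is hypothetical: `keypressActive`, `toggleOnCtrlE`, the `expanded` flag, and the `'completed'` status string are illustrative stand-ins. Only the `data.status === 'running' && isFocused` condition comes from the text.

```javascript
// Only the condition itself is taken from the commit message.
function keypressActive(data, isFocused) {
  return data.status === 'running' && isFocused;
}

// Simulate one Ctrl+E: only displays whose hook is active toggle.
function toggleOnCtrlE(displays) {
  return displays.map((d) =>
    keypressActive(d.data, d.isFocused) ? { ...d, expanded: !d.expanded } : d,
  );
}

const after = toggleOnCtrlE([
  { data: { status: 'running' }, isFocused: true, expanded: false }, // focused
  { data: { status: 'running' }, isFocused: false, expanded: false }, // parallel
  { data: { status: 'completed' }, isFocused: true, expanded: false }, // history
]);
console.log(after.map((d) => d.expanded)); // [ true, false, false ]
```

With the status-only gate, the second (parallel, unfocused) display would have toggled as well.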
---------
Co-authored-by: wenshao <wenshao@U-K7F6PQY3-2157.local>
136 lines
4 KiB
JavaScript
Executable file
```javascript
#!/usr/bin/env node
/**
 * Quick-and-dirty TUI flicker quantifier.
 *
 * Counts the ANSI escape sequences that betray flicker (clearing the screen,
 * erasing lines, or jumping the cursor up to redraw) inside a recorded raw
 * terminal stream. Useful for "before/after" sanity checks when refactoring
 * TUI components.
 *
 * Recording phase (do this yourself; this script doesn't drive interactive
 * TUIs but only analyses what you recorded):
 *
 *   # macOS / BSD `script`:
 *   script -q /tmp/qwen.before.raw node dist/cli.js --yolo
 *   # …drive a SubAgent scenario, then exit qwen with /quit or Ctrl-D
 *
 *   # Linux / util-linux `script`:
 *   script -q -c 'node dist/cli.js --yolo' /tmp/qwen.before.raw
 *
 *   # tmux variant: a tmux session preserves whatever you do live; pipe-pane
 *   # gives you the same raw bytes.
 *   tmux new -s flicker -d 'node dist/cli.js --yolo'
 *   tmux pipe-pane -t flicker -o 'cat > /tmp/qwen.before.raw'
 *   tmux attach -t flicker
 *
 * Then:
 *
 *   node scripts/measure-flicker.mjs /tmp/qwen.before.raw
 *   node scripts/measure-flicker.mjs /tmp/qwen.after.raw /tmp/qwen.before.raw
 *
 * The second form prints both summaries and the delta. A lower
 * clearTerminalPair count on "current" than on "baseline" is the win
 * condition for a flicker fix.
 */

import { readFileSync, statSync } from 'node:fs';
import { argv, env, exit, stdout } from 'node:process';

if (argv.length < 3 || argv.length > 4) {
  stdout.write(
    'usage: node scripts/measure-flicker.mjs <raw-stream> [baseline-raw-stream]\n',
  );
  exit(64);
}

/* eslint-disable no-control-regex -- ANSI escape patterns deliberately match ESC */
const PATTERNS = [
  {
    name: 'clearTerminalPair',
    description: 'ESC [ 2 J ESC [ 3 J ESC [ H (Ink full-screen redraw)',
    regex: /\x1b\[2J\x1b\[3J\x1b\[H/g,
  },
  {
    name: 'clearScreen',
    description: 'ESC [ 2 J | ESC [ 3 J | ESC c (any clear-screen op)',
    regex: /\x1b\[2J|\x1b\[3J|\x1bc/g,
  },
  {
    name: 'eraseLine',
    description: 'ESC [ 0|1|2 K (erase part/all of current line)',
    regex: /\x1b\[[0-2]?K/g,
  },
  {
    name: 'cursorUp',
    description: 'ESC [ N A (cursor up — Ink uses this for in-place redraw)',
    regex: /\x1b\[\d+A/g,
  },
];
/* eslint-enable no-control-regex */

function readBytes(path) {
  try {
    statSync(path);
  } catch {
    stdout.write(`error: ${path} not found\n`);
    exit(66);
  }
  // Decode as 'binary' (latin1) so each byte 0x00..0xFF becomes exactly one
  // code point 0x00..0xFF: invalid UTF-8 is never replaced or merged, the
  // regexes see ESC (\x1b) literally, and raw.length equals the byte count.
  return readFileSync(path).toString('binary');
}

function summarize(label, path) {
  const raw = readBytes(path);
  const counts = Object.fromEntries(
    PATTERNS.map((p) => [p.name, raw.match(p.regex)?.length ?? 0]),
  );
  return {
    label,
    path,
    bytes: raw.length,
    counts,
  };
}

function render(summary) {
  const { label, path, bytes, counts } = summary;
  stdout.write(`── ${label}\n`);
  stdout.write(` file: ${path}\n`);
  stdout.write(` bytes: ${bytes}\n`);
  for (const p of PATTERNS) {
    stdout.write(` ${p.name.padEnd(18)} ${counts[p.name]}\n`);
  }
}

function renderDelta(current, baseline) {
  stdout.write('\n── delta (current − baseline)\n');
  for (const p of PATTERNS) {
    const c = current.counts[p.name];
    const b = baseline.counts[p.name];
    const d = c - b;
    const arrow = d < 0 ? '↓' : d > 0 ? '↑' : '·';
    stdout.write(` ${p.name.padEnd(18)} ${d > 0 ? '+' : ''}${d} ${arrow}\n`);
  }
  stdout.write(
    '\ntip: lower clearTerminalPair (and lower clearScreen) on "current" wins.\n',
  );
}

if (env.QWEN_FLICKER_VERBOSE) {
  stdout.write('patterns:\n');
  for (const p of PATTERNS) {
    stdout.write(` ${p.name.padEnd(18)} = ${p.description}\n`);
  }
  stdout.write('\n');
}

const current = summarize('current', argv[2]);
render(current);

if (argv[3]) {
  stdout.write('\n');
  const baseline = summarize('baseline', argv[3]);
  render(baseline);
  renderDelta(current, baseline);
}
```
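Because `clearScreen` also matches the `ESC [ 2 J` and `ESC [ 3 J` inside every `clearTerminalPair`, one Ink full-screen redraw is counted once as `clearTerminalPair` and twice as `clearScreen`. That is consistent with the 5 / 10 reference numbers quoted in the commit message. The standalone check below applies the script's regexes to a synthetic stream. The `countSequences` name and the sample bytes are illustrative and do not come from the script.

```javascript
// Same regexes as PATTERNS in measure-flicker.mjs, applied to a synthetic stream.
const patterns = {
  clearTerminalPair: /\x1b\[2J\x1b\[3J\x1b\[H/g,
  clearScreen: /\x1b\[2J|\x1b\[3J|\x1bc/g,
  eraseLine: /\x1b\[[0-2]?K/g,
  cursorUp: /\x1b\[\d+A/g,
};

function countSequences(raw) {
  return Object.fromEntries(
    Object.entries(patterns).map(([name, re]) => [name, raw.match(re)?.length ?? 0]),
  );
}

const sample =
  '\x1b[2J\x1b[3J\x1b[H' + // one Ink full-screen redraw
  'frame 1\n' +
  '\x1b[2A' + // cursor up two rows
  '\x1b[2K' + 'row a\n' + // erase the whole line, then rewrite it
  '\x1b[K' + 'row b\n'; // erase to end of line, then rewrite it

console.log(countSequences(sample));
// clearTerminalPair: 1, clearScreen: 2, eraseLine: 2, cursorUp: 1
```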