Commit graph

17 commits

Author SHA1 Message Date
Shaojin Wen
cae09279fa
fix(cli): bound SubAgent display by visual height to prevent flicker (#3721)
* fix(cli): bound SubAgent display by visual height to prevent flicker

The SubAgent runtime display used hard-coded MAX_TASK_PROMPT_LINES=5 and
MAX_TOOL_CALLS=5 plus character-length truncation (`length > 80`). On narrow
terminals the soft-wrapped content overflowed the available height as the
tool-call list grew, forcing Ink to clear and redraw on every update.

Pull AgentExecutionDisplay onto the same visual-height/visual-width slicing
pattern that ToolMessage and ConversationMessages already use:

- Add `sliceTextByVisualHeight` to textUtils — counts soft wraps as visual
  rows, supports top/bottom overflow direction.
- AgentExecutionDisplay now derives maxTaskPromptLines / maxToolCalls from
  the assigned `availableHeight` and uses `truncateToVisualWidth` (CJK +
  emoji safe) instead of substring(0, 80). Compact mode is unchanged.
- Drop the 300 ms debounced `refreshStatic` AppContainer fired on every
  terminalWidth change — that was a flicker source on resize and the
  static area no longer needs the refresh.

Tests:
- textUtils.test.ts covers undefined maxHeight, top/bottom overflow, and
  soft-wrap counting.
- AgentExecutionDisplay.test.tsx asserts the height-bounded render keeps
  the prompt + tool list inside the assigned rows.
- AppContainer.test.tsx asserts width-only changes no longer clear the
  terminal.

* test(tui): add SubAgent flicker regression script and ANSI counter

Two reusable tools for measuring TUI flicker:

- `scripts/measure-flicker.mjs` — standalone Node script that counts the
  ANSI escape sequences which betray flicker (clearTerminalPair, clearScreen,
  eraseLine, cursorUp) inside any recorded raw stream (`script` log,
  `tmux pipe-pane` output, custom PTY capture). Supports baseline diff mode.

- `integration-tests/terminal-capture/subagent-flicker-regression.ts` —
  end-to-end ratchet that boots a mock OpenAI server, drives a real qwen
  process through an `agent` tool dispatch + 5 `read_file` SubAgent rounds,
  then reads PTY bytes and asserts ANSI-redraw counts stay below configured
  ceilings. Mirrors PR #43f128b20's resize-clear-regression pattern.

Reference numbers (60-col / 18-row terminal, fixed build):
  clearTerminalPair=5, clearScreen=10, eraseLine=440, cursorUp=132

The ratchet defaults to 10/20 ceilings — roughly 2× steady state — so
regressions like reverting sliceTextByVisualHeight or restoring the
width-driven refreshStatic trip the build.

Implementation notes captured in the script's docstring:
  - Strips HTTP_PROXY family env vars (NO_PROXY isn't honored by undici,
    so corp proxy would otherwise hijack the loopback request).
  - Drops `--bare` (bare mode hard-codes the registered tool set and
    rejects the `agent` tool); HOME is sandboxed to a temp dir instead.
  - Mock server speaks SSE because the CLI requests stream:true.

* fix(cli): address inline review on SubAgent flicker fix

Three issues from inline review on PR #3721:

1. **availableHeight as total budget (Critical).** The previous formula
   only constrained prompt + tool-call height, not the surrounding
   header / section labels / gaps / footer. Default and verbose mode
   could still overrun the parent-provided budget. Subtract a fixed-row
   overhead (10 rows running, 18 completed) before computing
   `maxTaskPromptLines` / `maxToolCalls`. Add unit tests that assert the
   rendered frame line-count stays within `availableHeight` for both
   running and completed states.

2. **Ratchet that actually distinguishes fix from no-fix.** The previous
   `clearTerminalPair` / `clearScreen` ceilings passed for both fixed
   and unfixed builds. Add an `eraseLine` upper bound (default 460) —
   that's the metric whose drop reflects the in-place-update efficiency
   the visual-height fix delivers (no-fix observed 469, with-fix 434).
   Refresh docstring with the current numbers and a coverage map that
   honestly states what this ratchet does and does not exercise.

3. **Keypress scope.** `useKeypress` was active on every mounted
   `AgentExecutionDisplay`, including completed/historical instances in
   chat history — Ctrl+E / Ctrl+F would toggle them all in lock-step
   and cause large scrollback reflows. Gate `isActive` on
   `data.status === 'running'`. Test mock now also honors
   `{ isActive }` so the new "completed displays ignore Ctrl+E"
   regression is enforceable.

* fix(cli): address round-2 inline review on SubAgent flicker

Three follow-up issues from inline review on PR #3721:

1. **sliceTextByVisualHeight reservedRows early-return (Critical).**
   The early return compared `visualLineCount <= targetMaxHeight` and
   ignored `reservedRows`, so a caller asking us to keep one row free
   for a footer could still receive the full input back with
   `hiddenLinesCount: 0` even though only `targetMaxHeight - reservedRows`
   content rows were actually available. Compare against
   `visibleContentHeight` instead and add a regression test for the
   `'a\nb\nc' / 3 / reservedRows: 1` case the reviewer flagged.

2. **Footer hint and rendered prompt now share one slicing result
   (Suggestion).** Previously `hasMoreLines` looked at
   `data.taskPrompt.split('\n').length` (hard newlines only), but the
   prompt body was already truncated by `sliceTextByVisualHeight` (which
   counts soft wraps). A long single-line prompt could be visually
   truncated without the footer ever surfacing the "ctrl+f to show
   more" hint. Lift the slice into the parent component and feed both
   the rendered `TaskPromptSection` and the footer's `hasMoreLines`
   from the same `hiddenLinesCount`.

3. **Running → completed transition test (Critical).** The previous
   "completed displays ignore Ctrl+E" test rendered already-completed
   data, so `useKeypress` was inactive from the start and Ctrl+E was a
   no-op trivially. It missed the real path: a running subagent gets
   expanded, then completes while preserving the expanded
   `displayMode` — which is exactly when the completed-state budget
   has to hold the layout. Replace the test with a `rerender`-based
   one that runs the full transition, asserts the completed expanded
   frame stays within `availableHeight`, and asserts the post-transition
   Ctrl+E is a no-op. Bumped `COMPLETED_FIXED_OVERHEAD` from 18 to 22
   to accommodate the ExecutionSummary + ToolUsage block accounting
   that the new transition test exposed.

* fix(cli): gate SubAgent useKeypress on isFocused for parallel runs

Per @yiliang114's review on PR #3721 — `data.status === 'running'` alone
fixes the historical/scrollback case but two SubAgents running in parallel
both stay `running`, so a single Ctrl+E / Ctrl+F still toggles them in
lock-step and the dual reflow brings back the flicker the gating was meant
to prevent. The component already receives `isFocused` from ToolMessage
(via SubagentExecutionRenderer) for the inline confirmation prompt — reuse
it on the keypress hook:

  isActive: data.status === 'running' && isFocused

Adds a regression test that renders a running SubAgent with
`isFocused={false}` and asserts Ctrl+E is a no-op (frame unchanged).

---------

Co-authored-by: wenshao <wenshao@U-K7F6PQY3-2157.local>
2026-04-29 22:34:55 +08:00
tanzhenxin
dc833d9d94 feat: add bugfix workflow, test-engineer agent, and debugging skills
- Add test-engineer agent for bug reproduction and verification
- Add /qc:bugfix command for structured bugfix workflow
- Add e2e-testing skill covering headless/interactive modes, MCP testing
- Add structured-debugging skill for hypothesis-driven debugging
- Simplify AGENTS.md to focus on essential commands and conventions
- Add terminal-capture scenario for bugfix workflow testing
- Add .qwen folder to ESLint ignore list

Known limitations: The /qc:bugfix workflow and e2e-testing skill
are experimental and may be unstable or consume significant tokens.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-04-04 18:30:09 +08:00
tanzhenxin
76d64c9464
Merge pull request #2731 from QwenLM/feat/in-session-cron-loops
feat(cron): add in-session loop scheduling with cron tools
2026-04-01 16:18:46 +08:00
tanzhenxin
89b79544d1 fix: upgrade @lydell/node-pty to 1.2.0-beta.10 to fix PTY FD leak
The previous version (1.1.0) has a native-level bug on macOS where each
PTY spawn leaks one /dev/ptmx file descriptor that is never closed. Over
a long session with hundreds of shell commands, this exhausts the
system-wide PTY pool (kern.tty.ptmx_max = 511), breaking other programs
like tmux and new terminal windows.

Root cause: microsoft/node-pty#882

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-04-01 07:55:56 +08:00
tanzhenxin
ded89618ec refactor(tests): reorganize integration tests by execution mode
Move non-interactive tests to cli/, interactive tests to interactive/.
Add cron-interactive.test.ts wrapping terminal-capture E2E in vitest.
Update npm scripts and release workflow for new directory layout.
2026-03-29 05:49:17 +00:00
tanzhenxin
707b06ca48 fix(cron): replace "Claude" with "Qwen Code" in tool messages
Also adds terminal capture test scenario for cron-loop feature.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-03-29 13:17:52 +08:00
tanzhenxin
a3623fd819 feat(cron): add interactive E2E tests and fix cron trigger reactivity
- Add getScreenText() to TerminalCapture for reading rendered xterm.js screen
- Add E2E tests for in-session cron: inline firing, user priority, error resilience
- Fix cron prompts not processing by adding cronTrigger state dependency

This ensures cron-injected prompts are processed immediately when fired,
not just when streaming state changes, and provides comprehensive test
coverage for the in-session cron feature.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-03-29 04:22:28 +00:00
Mingholy
00447356ad
Merge pull request #2602 from QwenLM/feat/hooks-refactor-hooks-ui
feat(hooks ui): refactor ui for Qwen Code hooks
2026-03-26 20:11:50 +08:00
DennisYu07
b08154dbee refactor ui for qwen code hooks 2026-03-23 11:24:59 +08:00
yiliang114
dff9822f9b fix(cli): improve /btw overlay UX — layout, dismiss hints, and history cleanup
- Make /btw overlay mutually exclusive with Composer (replaces input area)
- Add dismiss hints: "Press Escape to cancel" (pending) / "Press Space,
  Enter, or Escape to dismiss" (completed)
- Skip adding /btw to conversation history to avoid duplicate display
- Prioritize dialog shortcuts over btw dismiss via dialogsVisibleRef
- Add `sleep` property to terminal-capture FlowStep for async wait scenarios

Made-with: Cursor
2026-03-21 01:07:02 +08:00
tanzhenxin
78faa365cb feat(tools): allow read-file access to OS temp directory
- Add os.tmpdir() to allowed paths in read-file tool
- Add tests for reading files from OS temp directory
- Add terminal capture scenario for PR review testing

This supports the PR review workflow which saves context to temp files.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-03-17 17:13:23 +08:00
tanzhenxin
01ed2a7b1f test(terminal-capture): add message-components scenario for PR #2120
Add test scenario to verify message component prefixes display correctly:
- Info message prefix (● filled circle)
- Error message prefix (✕)
- User message prefix (>)
- Assistant message prefix (✦)

Also refactors GIF generation to scenario-level for cleaner output.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-03-05 21:13:13 +08:00
tanzhenxin
06aeb8a6a2 add qc-code-review scenario 2026-03-05 19:41:32 +08:00
tanzhenxin
df20ec9871 feat(terminal-capture): add streaming-insight scenario and simplify GIF timing
- Add streaming-insight scenario for /insight command demo
- Add progress.sh script to test PTY carriage return handling
- Simplify generateGif with fixed frame durations (300ms normal, 1s edges)

This enhances terminal capture testing with a real-world streaming scenario
and cleaner GIF generation logic.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-03-05 19:13:28 +08:00
tanzhenxin
b8a7ac830d feat(terminal-capture): add streaming capture with GIF generation
Add ability to capture multiple screenshots at intervals during
long-running terminal output (e.g., progress bars). Optionally
generates animated GIFs from captured frames using ffmpeg.

Features:
- Streaming capture at configurable intervals
- Early stop when output stabilizes (3 consecutive unchanged frames)
- Duplicate frame skipping
- Animated GIF generation via ffmpeg concat demuxer
- Auto-cleanup of output directory before each run
- Configurable delay before starting captures

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-03-05 17:46:09 +08:00
tanzhenxin
a172696b86 Merge branch 'main' into feat/support-insight-command
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-02-26 21:05:41 +08:00
pomelo-nwu
997fcbfaed feat: add terminal-capture for CLI screenshot automation
- Add terminal-capture engine using node-pty + xterm.js + Playwright
- Add scenario runner with TypeScript configuration
- Add pre-built scenarios (/about, /context, /export, /auth)
- Add Cursor skills for terminal-capture and pr-review workflow
- Add motivation documentation

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-02-14 21:34:42 +08:00