qwen-code

mirror of https://github.com/QwenLM/qwen-code.git synced 2026-05-08 10:00:06 +00:00

History

jinye df594f75fe feat(core): event monitor tool with throttled stdout streaming (Phase C) (#3684 ) * feat(core): event monitor tool with throttled stdout streaming (Phase C) Add a new Monitor tool that spawns a long-running shell command and streams its stdout lines back to the agent as event notifications. This is Phase C from the background task management roadmap (#3634, #3666). What changes: - New MonitorRegistry (services/monitorRegistry.ts): per-monitor entry with lifecycle (running/completed/failed/cancelled), idle timeout auto-stop, max events auto-stop, AbortController-based cancellation. Follows the same structural pattern as BackgroundTaskRegistry. - New Monitor tool (tools/monitor.ts): spawns via child_process.spawn with independent AbortController (Ctrl+C won't kill monitors), separate stdout/stderr line buffers, token-bucket throttling (burst=5, sustain=1/s). Returns immediately with monitor ID; events stream as notifications. - Sleep interception in shell.ts: detectBlockedSleepPattern() blocks foreground `sleep N` (N>=2) and guides model to use Monitor or is_background instead. - Config integration: MonitorRegistry instantiation, accessor, shutdown cleanup (abortAll), lazy tool registration. - CLI wiring: notification callbacks in useGeminiStream.ts (interactive) and nonInteractiveCli.ts (headless), including hold-back loop abort on exit and SIGINT cleanup. What this PR doesn't do (gated on #3471/#3488): - Footer pill / dialog integration - task_stop / send_message integration Test plan: - 21 MonitorRegistry unit tests (lifecycle, idle timeout, max events, XML escaping, nonexistent ID guard, callback clearing) - 20 Monitor tool unit tests (validation, spawn, line buffering, separate stdout/stderr buffers, throttling, signal-killed path, turn isolation) - 7 detectBlockedSleepPattern unit tests - 2 E2E tests (monitor invocation, sleep interception) - Full core suite: 248 files / 6151 passed Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): hold-back loop waits for monitors + emit task_started for SDK Two fixes from Codex review: P1: The non-interactive hold-back loop now includes monitorRegistry.getRunning() in its wait condition, so monitors can stream events before the CLI exits. Previously monitors were aborted immediately after the agent's first reply. P2: MonitorRegistry gains setRegisterCallback(), and nonInteractiveCli wires it to emit task_started system messages. Stream-json/SDK consumers now see a task_started for each monitor, matching the backgroundTaskRegistry contract. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): Windows process kill + pipeline sleep false-positive Two fixes from Codex review: P1: Monitor abort handler now uses `taskkill /f /t` on Windows instead of POSIX-only `process.kill(-pid)`. Follows the existing pattern in ShellExecutionService.childProcessFallback. P2: detectBlockedSleepPattern no longer uses splitCommands (which splits on `\|` pipes). Replaced with a regex that only matches sleep followed by sequential separators (&&, \|\|, ;, &, newline), not pipes. `sleep 5 \| cat` is now correctly allowed. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(test): resolve TS errors in monitor.test.ts mock types Use Object.defineProperty for readonly ChildProcess.pid and proper Readable type for stdout/stderr mocks to satisfy strict tsc builds. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): remove false notification promise + add early-abort guard P1: Sleep interception guidance no longer promises "completion notification" for is_background — that wiring doesn't exist yet (follow-up from #3642). P2: Monitor.execute() now checks _signal.aborted before spawning, preventing a race where cancellation during tool scheduling still launches a monitor. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(test): add getMonitorRegistry mock to useGeminiStream tests The useGeminiStream hook now calls config.getMonitorRegistry() to wire up monitor notification callbacks. The test mock config was missing this method, causing 64 test failures with "config.getMonitorRegistry is not a function". Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(test): add getMonitorRegistry mock to nonInteractiveCli tests Same fix as useGeminiStream.test.tsx — the mock config needs getMonitorRegistry to avoid "is not a function" errors (29 failures). Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address PR review — CORE_TOOLS, directory param, test fix 1. Add 'monitor' to PermissionManager.CORE_TOOLS so coreTools allowlist correctly gates the monitor tool (same as run_shell_command). 2. Add optional 'directory' parameter to MonitorTool with workspace validation, mirroring ShellTool's directory support for multi-root workspaces. 3. Fix sleep-interception E2E test: readToolLogs() doesn't expose toolResult, so the old assertion was dead code. Now verifies via the model's output text instead. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address MonitorTool review #4186888042 Addresses three [Critical] review comments on packages/core/src/tools/monitor.ts: 1. Partial-line buffer unbounded growth (processLines) MAX_LINE_LENGTH was only enforced after a newline, so a command emitting a long stream without newlines would grow buffer.value without bound and re-split the entire accumulated string on every chunk. Now, when the buffer has no newline and exceeds MAX_LINE_LENGTH, we force-emit a single truncated event through the throttled path and reset the buffer. 2. Missing type guard on params.command validateToolParamValues called params.command.trim() without a typeof check. Schema validation normally catches this, but SDK/direct callers could bypass it and hit an uncaught TypeError. Added typeof === 'string' guard, matching the pattern used for max_events / idle_timeout_ms. 3. Workspace check bypass via raw startsWith The directory validator used workspaceDirs.some(d => params.directory .startsWith(d)), which allowed prefix collisions (e.g. /tmp/project-evil against a /tmp/project workspace) and skipped canonicalisation / symlink resolution. Switched to WorkspaceContext.isPathWithinWorkspace, which already does fullyResolvedPath + segment-aware isPathWithinRoot matching and is the standard used elsewhere in the codebase. Test coverage: added 6 unit tests covering non-string command guard, non-absolute directory rejection, prefix-collision rejection, traversal rejection, workspace acceptance, and partial-line cap behaviour (including buffer reset). All 26 monitor.test.ts cases pass. The same startsWith pattern also exists in ShellTool and is tracked as a separate follow-up to keep this PR focused on Phase C scope. * fix(core): scope monitor always-allow permissions Populate Monitor confirmation permissionRules using the same command-rule extraction path as ShellTool, so ProceedAlways persists command-scoped Bash(...) rules instead of a broad monitor-level allow. Also add unit coverage for command-scoped rules, filtering already-allowed subcommands, and extractor fallback behavior. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): decouple monitor permission scope from Bash rules Remove pm.isCommandAllowed() from MonitorToolInvocation.getConfirmationDetails() to prevent existing Bash(...) allow rules from shrinking the monitor confirmation scope. Monitor is a long-running background process with a different risk profile than one-shot shell execution and should maintain its own permission boundary. Only AST-based read-only filtering is retained. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): unify monitor error/exit cleanup to prevent resource leaks Extract a shared cleanup() helper called from both the `exit` and `error` event handlers. Previously the `error` handler did not flush buffers, clear buffer values, remove the abort listener, or log dropped-line stats — causing potential memory leaks when `error` fires without a subsequent `exit` (e.g. ENOENT for missing commands). Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): add user-skills-directory guard to monitor directory validation Mirror ShellTool's getUserSkillsDirs() check in MonitorTool's validateToolParamValues() to prevent monitor commands from running inside user skills directories. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * feat(core): add Monitor(...) permission namespace for monitor tool (#3726) Introduce a dedicated Monitor(...) permission namespace so monitor and shell tools have independent permission boundaries. Previously monitor emitted Bash(...) rules, causing "Always Allow" to fail for future monitor invocations while unintentionally granting run_shell_command. Changes: - rule-parser.ts: add Monitor alias, SHELL_TOOL_NAMES entry, CANONICAL_TO_RULE_DISPLAY, DISPLAY_NAME_TO_VERB - permission-manager.ts: extract SHELL_LIKE_TOOLS set so evaluate(), evaluateSingle(), hasRelevantRules(), hasMatchingAskRule() handle both run_shell_command and monitor - monitor.ts: emit Monitor(...) instead of Bash(...) in permissionRules - Tests: parseRule, matchesRule, cross-tool isolation regression, buildPermissionRules, buildHumanReadableRuleLabel for Monitor Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): decouple headless monitor lifetime from final result Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): stabilize stream-json monitor session shutdown Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): deny monitor in headless approval defaults Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): honor tool aliases in headless allow checks Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address opus review — sleep regex, monitor cap, non-interactive cleanup - Fix sleep interception false positive for backgrounded sleep (`sleep 5 & echo done`). Remove bare `&` from separator character class so the background operator is not treated as a sequential separator. - Add MAX_CONCURRENT_MONITORS (16) check in MonitorRegistry.register() and early rejection in MonitorTool.execute() to prevent unbounded process spawning. - Widen monitorId from 8 to 16 hex chars to reduce birthday collision risk. - Abort all running monitors in nonInteractiveCli.ts success-path finally so piped stdio refs don't keep the Node event loop alive after result emission in one-shot (--print) mode. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): abort monitors and background shells on /clear Without this, long-running monitors from a previous session survive /clear and continue pushing events into the new session's notification queue. This enables cross-session prompt injection where a malicious monitor persists across the user's escape hatch. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): abort monitors on stream-json session shutdown Call monitorRegistry.abortAll() in both shutdown() and drainAndShutdown() so detached monitor child processes don't survive session termination. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * test(cli): use content event type in stream tests Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): isolate session cleanup on clear and shutdown Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): finalize session cleanup after drain Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): close remaining monitor review gaps Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): preserve shell cwd in virtual permission checks Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): normalize trailing background ampersands Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): align monitor permission and wrapper handling Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * test(core): make monitor CI assertions cross-platform Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): align monitor wrapper normalization Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): normalize wrapped monitor commands Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): harden monitor headless edge cases Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): preserve monitor spawn errors Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): harden monitor register cleanup Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): parse monitor wrapper script token Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address PR review comments for monitor tool - Make Bash(...) permission rules cover monitor via toolMatchesRuleToolName, so deny rules like Bash(rm ) also block monitor({command: "rm ..."}) - Remove dead `normalizeRuleToolName` mock reference in config.test.ts - Fix tool description to mention stdout/stderr instead of just stdout - Export MAX_CONCURRENT_MONITORS from monitorRegistry and use it in monitor.ts instead of hardcoded 16 - Rename ambiguous MAX_LINE_LENGTH constants: PARTIAL_LINE_BUFFER_CAP (4096, monitor.ts) and EVENT_LINE_TRUNCATE (2000, monitorRegistry.ts) - Fix schema description text: "Max 80 characters" → "Truncated to 80 characters in display" - Add .unref() to SIGTERM→SIGKILL escalation timer to prevent 200ms exit delay Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> fix(cli): resolve clear command typecheck issues Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): preserve background tasks across shutdown abort Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): close monitor review gaps Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address latest monitor review comments Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(cli): handle monitors across session switches Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * test(core): cover aborted monitor startup Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address remaining monitor PR review comments Adopts four unresolved review threads on PR #3684: * shell: trim top-level trailing comments before validating sleep separator so 'sleep 5 # wait' no longer bypasses detectBlockedSleepPattern. * monitor: add sanitizeMonitorLine to strip C0/C1 control chars (except tab) and defang structural envelope tag names with U+200B before forwarding output to the model, blocking prompt-injection attempts hidden in monitored stdout/stderr. * monitor: declare line buffers and throttledEmit before abortHandler to avoid TDZ on synchronous abort paths, and add flushPartialLineBuffers called from both abortHandler (before kill) and cleanup (natural exit/error) so partial-line data is no longer silently dropped on cancel. * permissions: document that normalizePermissionContext relies on buildPermissionCheckContext to forward monitor's directory as cwd, and add regression tests proving relative-path Read(./...) allow and deny rules resolve against the monitor's explicit cwd. * fix(core): abort running monitors in MonitorRegistry.reset() reset() previously only cleared idle timers and emptied the map without aborting running monitors' AbortControllers. This could orphan child processes when reset() was called without a prior abortAll(), e.g. via useResumeCommand → resetBackgroundStateForSessionSwitch. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(core): harden monitor notification XML and displayText - Extend escapeXml to escape " and ' as defense-in-depth: safe to reuse the helper in any future XML attribute context without re-auditing. - Strip C0 (except tab) and C1 control characters from the displayText surface before interpolation, so untrusted child-process output cannot leak ANSI escapes / NUL bytes into the operator's terminal even if a direct caller of MonitorRegistry.emitEvent skips sanitization. Adds unit tests for both hardening paths. * test(core): cover token-bucket throttling and commented-sleep bypass - Add 4 unit tests for the monitor token-bucket throttle (burst=5, 1 token/sec refill): burst cap, refill release, long-idle bucket cap, and whitespace lines not consuming budget. Uses vi.setSystemTime to exercise Date.now() without advancing pending setTimeouts. - Add an E2E case that feeds 'sleep 5 # wait for db' through the shell tool to lock in trimTrailingShellComment behavior end-to-end; the unit-level coverage in shell.test.ts remains authoritative but the E2E anchor prevents a regression from silently passing unit tests. * fix(core): address 3 remaining copilot review comments 1. shell.ts sleep interception: strip shell wrapper before detecting the blocked sleep pattern so `bash -c 'sleep 5'` / `sh -c ...` cannot route around the block. Mirrors every other sensitive check in shell.ts, which already normalizes through stripShellWrapper. 2. monitorRegistry.ts emitEvent auto-stop: settle the entry BEFORE aborting its controller so that any synchronous abort listener that flushes buffered output back through registry.emitEvent() (e.g. the Monitor tool's flushPartialLineBuffers) finds status !== 'running' and short-circuits instead of overshooting maxEvents and emitting a duplicate 'Max events reached' terminal notification. 3. monitorRegistry.ts truncateDescription: cap output at exactly MAX_DESCRIPTION_LENGTH by counting the ellipsis against the budget, instead of returning MAX_DESCRIPTION_LENGTH + 3 characters. Each fix is covered by a new unit test. * fix(core): address review comments — sanitize, notify, kill logging, throttle observability - Remove double normalize in buildPermissionCheckContext (PM is single source) - Add {notify:false} to Config.shutdown() and abortTaskRegistries() abortAll - Swap settle-before-abort in cancel() and resetIdleTimer() to prevent races - Add stripDisplayControlChars to emitTerminalNotification - Sanitize monitor description at entry creation via sanitizeMonitorLine - Surface throttle-dropped line count in terminal notification - Add .unref() to idle timer to allow clean process exit - Add error handler + stdio:ignore to Windows taskkill spawn - Log SIGTERM/SIGKILL kill failures via debugLogger.warn - Attach early child error handler to cover spawn-to-register window - Destroy child stdio on register failure to prevent handle leaks - Improve stripShellWrapper to handle absolute paths, combined flags, env prefix - Improve SHELL_TOOL_NAMES documentation and toolMatchesRuleToolName clarity Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): resolve monitor tool typecheck errors - Cast child.stdout/stderr to a minimal { destroy?: () => void } shape so the optional destroy() call compiles and still works with test mocks. - Initialize droppedLines: 0 in MonitorEntry test fixtures that predate the field becoming required. * fix(monitor): add missing stdio option in taskkill test assertions (#3784) * fix(core): address monitor review feedback * fix(core): harden monitor command lifecycle --------- Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>		2026-05-02 20:57:26 +08:00
..
acp-cron.test.ts	feat(cron): integrate cron scheduling into ACP session lifecycle	2026-03-30 18:06:09 +08:00
acp-integration.test.ts	refactor(tests): reorganize integration tests by execution mode	2026-03-29 05:49:17 +00:00
cron-tools.test.ts	fix: resolve punycode to userland package and skip env var test in sandbox	2026-04-01 18:58:19 +08:00
edit.test.ts	refactor(tests): reorganize integration tests by execution mode	2026-03-29 05:49:17 +00:00
extensions-install.test.ts	refactor(tests): reorganize integration tests by execution mode	2026-03-29 05:49:17 +00:00
file-system.test.ts	refactor(tests): reorganize integration tests by execution mode	2026-03-29 05:49:17 +00:00
json-output.test.ts	refactor(tests): reorganize integration tests by execution mode	2026-03-29 05:49:17 +00:00
list_directory.test.ts	Merge remote-tracking branch 'origin/main' into feat/in-session-cron-loops	2026-03-30 19:08:25 +08:00
mcp_server_cyclic_schema.test.ts	refactor(tests): reorganize integration tests by execution mode	2026-03-29 05:49:17 +00:00
monitor.test.ts	feat(core): event monitor tool with throttled stdout streaming (Phase C) (#3684 )	2026-05-02 20:57:26 +08:00
read_many_files.test.ts	refactor(tests): reorganize integration tests by execution mode	2026-03-29 05:49:17 +00:00
run_shell_command.test.ts	refactor(tests): reorganize integration tests by execution mode	2026-03-29 05:49:17 +00:00
save_memory.test.ts	refactor(tests): reorganize integration tests by execution mode	2026-03-29 05:49:17 +00:00
settings-migration.test.ts	test(integration): switch settings-migration probe from --help to mcp list (#3486 )	2026-04-21 14:19:44 +08:00
simple-mcp-server.test.ts	refactor(tests): reorganize integration tests by execution mode	2026-03-29 05:49:17 +00:00
sleep-interception.test.ts	feat(core): event monitor tool with throttled stdout streaming (Phase C) (#3684 )	2026-05-02 20:57:26 +08:00
stdin-context.test.ts	refactor(tests): reorganize integration tests by execution mode	2026-03-29 05:49:17 +00:00
telemetry.test.ts	refactor(tests): reorganize integration tests by execution mode	2026-03-29 05:49:17 +00:00
todo_write.test.ts	refactor(tests): reorganize integration tests by execution mode	2026-03-29 05:49:17 +00:00
utf-bom-encoding.test.ts	refactor(tests): reorganize integration tests by execution mode	2026-03-29 05:49:17 +00:00
write_file.test.ts	refactor(tests): reorganize integration tests by execution mode	2026-03-29 05:49:17 +00:00