feat(core): managed background shell pool with /tasks command (#3642)

* feat(core): managed background shell pool with /bashes command Replace shell.ts's `&` fork-and-detach background path with a managed process registry. Background shells now have observable lifecycle, captured output, and explicit cancellation — matching the pattern used by background subagents (#3076). Phase B from #3634 (background task management roadmap). What changes - New `BackgroundShellRegistry` (services/backgroundShellRegistry.ts): per-process entry with status (running / completed / failed / cancelled), AbortController, output file path. State transitions are one-shot (terminal status sticks; late callbacks no-op). Mirrors the lifecycle shape of #3471's BackgroundTaskRegistry so the two can be unified later. - `shell.ts` is_background path rewritten as `executeBackground`: - Spawns the unwrapped command (no '&', no pgrep envelope) - Streams stdout to `<projectDir>/tasks/<sessionId>/shell-<id>.output` (path layout aligns with the direction sketched in #3471 review) - Bridges the external abort signal into the entry's AbortController so a single source of truth governs cancellation - Returns immediately with id + output path; agent's turn isn't blocked - Settles the registry entry asynchronously when ShellExecutionService resolves: complete (clean exit) / fail (error) / cancel (aborted) - Removes ~120 lines of dead bg-specific code from shell.ts: pgrep wrapping, '&' appending, Windows ampersand cleanup, Windows early-return path, bg PID parsing, tempFile cleanup - New `/bashes` slash command: lists registered shells with id, status, runtime, command, output path. Empty state prints a friendly message. What this PR doesn't do - Footer pill / dialog integration — gated on #3488 landing - task_stop / send_message integration — gated on #3471 landing - Auto-backgrounding heuristics for long foreground bash — Phase D Test plan - 11 registry unit tests (state machine + idempotent terminal transitions) - 4 background-path tests in shell.test.ts (spawn no-wrap + complete / fail / cancel settle paths) - 2 /bashes command tests (empty + populated) - Full core suite: 247 files / 6075 passed (existing tests unaffected) * fix(core): address PR #3642 review feedback Three [Critical] from the auto review + naming alignment with Claude Code: - shell.ts settle: non-zero exit code or termination signal now bucket into `failed` instead of `completed`. The previous `if (result.error) fail else complete()` would misreport `false` / failed `npm test` as success because ShellExecutionService surfaces ordinary command failures as a non-zero exitCode with `error: null`. Failure reason carries the exit code or signal so `/tasks` shows the real cause. - ShellExecutionService.childProcessFallback: add `streamStdout` mode that emits each decoded chunk through the existing onOutputEvent path. The default (foreground) path continues to buffer + emit the cleaned final blob, so existing in-line shell calls are unaffected. executeBackground opts in via `{ streamStdout: true }`, which is what makes the captured output file actually useful for long-running processes (dev servers, watchers) — without it the file stayed empty until the process exited. - shell.ts test fixture: cancel-settle test was using `signal: 'SIGTERM'` but `ShellExecutionResult.signal` is `number | null`. TS2322 broke the build; switched to `signal: null`. Added a test that explicitly covers the new "non-zero exit → failed" path so the bucketing change has regression coverage. - shell.ts comment: explicitly document why background shells force `shouldUseNodePty=false` (no terminal, no human; node-pty would be dead weight for fire-and-forget commands). - /bashes → /tasks (alias bashes), description "List and manage background tasks" — matches Claude Code's command name. Currently lists shells only; will surface other task kinds (subagents, monitor) as those registries land via #3471 / #3488. * fix(core): address PR #3642 second-round review feedback - shellExecutionService streaming: drop stdout/stderr buffer + outputChunks accumulation in streaming mode. Each decoded chunk goes straight to onOutputEvent and is GC-eligible immediately. Long-running background commands (dev servers, watchers) no longer accumulate unbounded memory proportional to total output. Buffered (foreground) mode is unchanged. - shell.ts executeBackground: stripAnsi each chunk before writing to the output file. Dev servers / build tools spam color codes and cursor-move sequences that would render as garbage in the file the agent reads. - bashesCommand: command description "List and manage" → "List background tasks" — current implementation only supports listing, cancellation follows when the unified task_stop tool from #3471 is wired in. Replace the hand-rolled formatRuntime helper with the shared formatDuration utility (uses hideTrailingZeros for parity with the previous output). - backgroundShellRegistry: add a comment documenting the lack of an eviction policy as a known limitation. LRU / age-based / capped-size eviction (and on-disk output rotation) is left as a follow-up alongside the broader output-file lifecycle story. * fix(core): address PR #3642 third-round review feedback - shell.ts executeBackground: add 'error' listener on the output write stream. fs.createWriteStream surfaces write failures (disk full, permission, fs going away) as 'error' events; without a listener Node treats it as an uncaught exception and kills the entire CLI session. Log + drop is the sane default — the registry still settles via resultPromise so /tasks shows the right terminal status. - shell.ts executeBackground: store the abort handler reference and removeEventListener in the settle callback. Background shells outlive the turn signal; the dangling listener was keeping `entryAc` (and transitively `outputStream`) reachable until the turn signal itself was GC'd, which for long sessions would never happen. - shell.test.ts: extend the createWriteStream mock with an `on` stub so the new error-listener wiring doesn't crash the test suite. * refactor(cli): drop /bashes alias and rename file to tasksCommand Per follow-up review: the slash command should be exclusively /tasks. Removes the `bashes` altName, renames `bashesCommand{,.test}.ts` → `tasksCommand{,.test}.ts`, renames the exported binding `bashesCommand` → `tasksCommand`, and cleans up the remaining `/bashes` references in backgroundShellRegistry.ts comments. No behavior change beyond the alias removal. * refactor(cli): finish tasksCommand rename — apply content changes The previous commit (03c8503c8) only captured the file rename via `git mv`; the export name change (`bashesCommand` → `tasksCommand`), the removal of `altNames: ['bashes']`, the import update in BuiltinCommandLoader, and the `/bashes` → `/tasks` comments in backgroundShellRegistry.ts were unstaged when that commit landed. Squash candidate before merge. * fix(core): address PR #3642 fourth-round review feedback Four reviewer concerns from @wenshao + @doudouOUC: - [Critical] Config.shutdown() now also calls `backgroundShellRegistry.abortAll()`. Previously only the subagent registry was aborted, so a managed background shell could outlive the CLI process and orphan its child. Symmetric with how `BackgroundTaskRegistry.abortAll()` is wired in. - [P1] shell.ts executeBackground strips a trailing `&` from the command before spawn. The managed path is itself the backgrounding mechanism; forwarding `node server.js &` verbatim made bash exit immediately while the real child outlived the wrapper, causing the registry to settle as `completed` while the shell was still running and chunked output to land on a closed stream. Strip + warn. - [P2] Output file moves under `storage.getProjectTempDir()` (specifically `<projectTempDir>/background-shells/<sessionId>/shell-<id>.output`). `ReadFileTool` already auto-allows the project temp dir, so the LLM can `Read` the captured output without bouncing off a permission prompt — important because background-agent contexts can't surface interactive prompts. - [P2] Background shells are no longer killed when the current turn's AbortSignal fires. Forwarding the turn signal into the entry's AbortController meant a Ctrl+C on the turn would also terminate intentionally backgrounded dev servers / watchers, contradicting the independent-lifecycle promise. Cancellation now flows only through `entryAc` (driven by future `task_stop` integration via #3471). Tests: - New `abortAll` registry tests cover running / mixed / empty cases. - `runs background commands as managed pool entries` test stops asserting the wrapper-vs-entry signal identity since they're now structurally separate (no turn-to-entry forwarding). - New `does not forward the turn signal into the background shell` test pins the new behavior. - New `strips trailing & from the spawned command` test pins the strip. - Removed the cancel-via-outer-signal settle test — that path no longer exists; cancellation is exercised end-to-end via the registry's own `cancel` and `abortAll` tests in `backgroundShellRegistry.test.ts`. * fix(core): tighten trailing & strip — narrow regex + ReDoS-safe Two reviewer concerns on the same line of #3642 round 4: - [Critical CodeQL] `\s*&+\s*$` is a polynomial-time regex on uncontrolled input (long all-`&` strings backtrack quadratically). - [P2 doudouOUC] `&+` is too greedy: it also rewrites `npm run dev &&` into `npm run dev` (breaks logical AND syntax) and `echo foo \&` into `echo foo \` (eats the escaped literal). Only the bare bash background operator should be stripped. Replace the regex with a small linear-time helper `stripTrailingBackgroundAmp` that explicitly checks for the three "don't touch" cases (`&&`, `\&`, no trailing `&`). Plain `endsWith` / `slice` — no regex backtracking, and the intent reads off the page. Tests: - Existing strip-trailing-`&` test still passes. - New `does not strip a trailing &&` test pins the logical-AND case. - New `does not strip an escaped trailing \\&` test pins the escape case. * fix(core): keep binary-detection sniff in streaming mode @doudouOUC noted that `streamStdout` shortcut returned before the binary-sniff path, so a background command emitting binary bytes (`cat /bin/ls`, image dump, etc.) would be text-decoded and appended to the task output file unbounded. Restructure handleOutput so the sniff-and-cutover logic runs in both modes: - Both modes accumulate up to MAX_SNIFF_SIZE for the binary check. The accumulator is bounded; once the threshold is reached, it stops growing in streaming mode (dropped on binary detection / left inert on text confirmation) and continues to accumulate in buffered mode (existing foreground behavior). - Streaming mode emits 'binary_detected' as soon as `isBinary` trips so the consumer can stop writing the output file. Up to ~4KB of bytes may have been emitted as text chunks before detection — this is bounded and acceptable; the unbounded write is the pathology reviewers flagged. - Streaming text mode still emits each decoded chunk immediately and does not accumulate stdout/stderr strings, so long-running text streams remain GC-friendly. - Buffered (foreground) behavior is unchanged — the sniff accumulator is the same path the existing tests cover. Tests: 50 shellExecutionService + 11 backgroundShellRegistry + 57 shell.test.ts all pass; no regressions. * fix(core): tighten streaming sniff bound + Windows rmSync flake Two unrelated reds on the latest CI run: 1. [P1 doudouOUC] Streaming sniff buffer leaks on small chunks. The previous fix recomputed `sniffedBytes` from `Buffer.concat(outputChunks.slice(0, 20)).length` on every chunk — pinned to the first 20 chunks. If those total under MAX_SNIFF_SIZE (line-sized stdout, e.g. dev-server logs) the byte count never grew, the sniff branch stayed open forever, and `outputChunks` accumulated every later chunk — exactly the leak `streamStdout` was meant to prevent. Track sniffed bytes by running sum (`sniffedBytes += data.length`) so the bound is genuine. When sniff confirms text in streaming mode, drop the accumulator immediately so subsequent chunks fall through the streaming emit path without ever touching it. 2. file-exporters.test.ts afterEach `fs.rmSync` flaked on Windows (ENOTEMPTY: directory not empty). The exporter's underlying write stream hasn't always released its handle by the time `rmSync` runs. Pass `maxRetries: 5, retryDelay: 50` so the cleanup retries through the brief Windows handle-release window instead of failing the test on a CI quirk. --------- Co-authored-by: wenshao <wenshao@U-K7F6PQY3-2157.local>
2026-05-05 23:42:03 +00:00 · 2026-04-28 11:06:50 +08:00 · 2026-04-28 11:06:50 +08:00 · aac2e96ec3
commit aac2e96ec3
parent 03c88b7308
11 changed files with 1024 additions and 410 deletions
--- a/packages/cli/src/services/BuiltinCommandLoader.ts
+++ b/packages/cli/src/services/BuiltinCommandLoader.ts
@ -8,6 +8,7 @@ import type { ICommandLoader } from './types.js';
 import type { SlashCommand } from '../ui/commands/types.js';
 import type { Config } from '@qwen-code/qwen-code-core';
 import { aboutCommand } from '../ui/commands/aboutCommand.js';
+import { tasksCommand } from '../ui/commands/tasksCommand.js';
 import { agentsCommand } from '../ui/commands/agentsCommand.js';
 import { arenaCommand } from '../ui/commands/arenaCommand.js';
 import { approvalModeCommand } from '../ui/commands/approvalModeCommand.js';
@ -92,6 +93,7 @@ export class BuiltinCommandLoader implements ICommandLoader {
    const allDefinitions: Array<SlashCommand | null> = [
      aboutCommand,
      agentsCommand,
+      tasksCommand,
      arenaCommand,
      approvalModeCommand,
      authCommand,
--- a/packages/cli/src/ui/commands/tasksCommand.test.ts
+++ b/packages/cli/src/ui/commands/tasksCommand.test.ts
@ -0,0 +1,94 @@
+/**
+ * @license
+ * Copyright 2026 Qwen Team
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import { vi, describe, it, expect, beforeEach } from 'vitest';
+import { tasksCommand } from './tasksCommand.js';
+import { type CommandContext } from './types.js';
+import { createMockCommandContext } from '../../test-utils/mockCommandContext.js';
+import type { BackgroundShellEntry } from '@qwen-code/qwen-code-core';
+
+function entry(
+  overrides: Partial<BackgroundShellEntry> = {},
+): BackgroundShellEntry {
+  return {
+    shellId: 'bg_aaaaaaaa',
+    command: 'sleep 60',
+    cwd: '/tmp',
+    status: 'running',
+    startTime: Date.now() - 5_000,
+    outputPath: '/tmp/tasks/sess/shell-bg_aaaaaaaa.output',
+    abortController: new AbortController(),
+    ...overrides,
+  };
+}
+
+describe('tasksCommand', () => {
+  let context: CommandContext;
+  let getAll: ReturnType<typeof vi.fn>;
+
+  beforeEach(() => {
+    getAll = vi.fn().mockReturnValue([]);
+    context = createMockCommandContext({
+      services: {
+        config: {
+          getBackgroundShellRegistry: () => ({ getAll }),
+        },
+      },
+    } as unknown as Parameters<typeof createMockCommandContext>[0]);
+  });
+
+  it('reports an empty registry', async () => {
+    const result = await tasksCommand.action!(context, '');
+    expect(result).toEqual({
+      type: 'message',
+      messageType: 'info',
+      content: 'No background shells.',
+    });
+  });
+
+  it('lists running and terminal entries with status / runtime / output path', async () => {
+    getAll.mockReturnValue([
+      entry({
+        shellId: 'bg_run',
+        command: 'npm run dev',
+        status: 'running',
+        startTime: Date.now() - 12_000,
+        pid: 1111,
+      }),
+      entry({
+        shellId: 'bg_done',
+        command: 'npm test',
+        status: 'completed',
+        exitCode: 0,
+        startTime: Date.now() - 70_000,
+        endTime: Date.now() - 5_000,
+        outputPath: '/tmp/tasks/sess/shell-bg_done.output',
+      }),
+      entry({
+        shellId: 'bg_fail',
+        command: 'flaky.sh',
+        status: 'failed',
+        error: 'spawn ENOENT',
+        startTime: Date.now() - 3_000,
+        endTime: Date.now() - 2_000,
+      }),
+    ]);
+
+    const result = await tasksCommand.action!(context, '');
+    if (!result || result.type !== 'message') {
+      throw new Error('expected message result');
+    }
+    expect(result.content).toContain('Background shells (3 total)');
+    expect(result.content).toContain('[bg_run] running');
+    expect(result.content).toContain('pid=1111');
+    expect(result.content).toContain('npm run dev');
+    expect(result.content).toContain('[bg_done] completed (exit 0)');
+    expect(result.content).toContain('[bg_fail] failed: spawn ENOENT');
+    expect(result.content).toContain(
+      'output: /tmp/tasks/sess/shell-bg_done.output',
+    );
+  });
+});
--- a/packages/cli/src/ui/commands/tasksCommand.ts
+++ b/packages/cli/src/ui/commands/tasksCommand.ts
@ -0,0 +1,78 @@
+/**
+ * @license
+ * Copyright 2026 Qwen Team
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import type { BackgroundShellEntry } from '@qwen-code/qwen-code-core';
+import type { SlashCommand } from './types.js';
+import { CommandKind } from './types.js';
+import { t } from '../../i18n/index.js';
+import { formatDuration } from '../utils/formatters.js';
+
+function statusLabel(entry: BackgroundShellEntry): string {
+  switch (entry.status) {
+    case 'completed':
+      return `completed (exit ${entry.exitCode ?? '?'})`;
+    case 'failed':
+      return `failed: ${entry.error ?? 'unknown error'}`;
+    case 'cancelled':
+      return 'cancelled';
+    case 'running':
+      return 'running';
+    default:
+      return entry.status;
+  }
+}
+
+export const tasksCommand: SlashCommand = {
+  name: 'tasks',
+  get description() {
+    return t('List background tasks');
+  },
+  kind: CommandKind.BUILT_IN,
+  supportedModes: ['interactive', 'non_interactive', 'acp'] as const,
+  action: async (context) => {
+    const { config } = context.services;
+    if (!config) {
+      return {
+        type: 'message' as const,
+        messageType: 'error' as const,
+        content: 'Config not available.',
+      };
+    }
+
+    const entries = config.getBackgroundShellRegistry().getAll();
+
+    if (entries.length === 0) {
+      return {
+        type: 'message' as const,
+        messageType: 'info' as const,
+        content: 'No background shells.',
+      };
+    }
+
+    const now = Date.now();
+    const lines: string[] = [
+      `Background shells (${entries.length} total)`,
+      '',
+    ];
+    for (const entry of entries) {
+      const endTime = entry.endTime ?? now;
+      const runtime = formatDuration(endTime - entry.startTime, {
+        hideTrailingZeros: true,
+      });
+      const pidPart = entry.pid !== undefined ? ` pid=${entry.pid}` : '';
+      lines.push(
+        `[${entry.shellId}] ${statusLabel(entry)}  ${runtime}${pidPart}  ${entry.command}`,
+      );
+      lines.push(`            output: ${entry.outputPath}`);
+    }
+
+    return {
+      type: 'message' as const,
+      messageType: 'info' as const,
+      content: lines.join('\n'),
+    };
+  },
+};
--- a/packages/core/src/config/config.ts
+++ b/packages/core/src/config/config.ts
@ -61,6 +61,7 @@ import { PermissionManager } from '../permissions/permission-manager.js';
 import { SubagentManager } from '../subagents/subagent-manager.js';
 import type { SubagentConfig } from '../subagents/types.js';
 import { BackgroundTaskRegistry } from '../agents/background-tasks.js';
+import { BackgroundShellRegistry } from '../services/backgroundShellRegistry.js';
 import {
  DEFAULT_OTLP_ENDPOINT,
  DEFAULT_TELEMETRY_TARGET,
@ -545,6 +546,7 @@ export class Config {
  private promptRegistry!: PromptRegistry;
  private subagentManager!: SubagentManager;
  private readonly backgroundTaskRegistry = new BackgroundTaskRegistry();
+  private readonly backgroundShellRegistry = new BackgroundShellRegistry();
  private extensionManager!: ExtensionManager;
  private skillManager: SkillManager | null = null;
  private permissionManager: PermissionManager | null = null;
@ -1610,6 +1612,7 @@ export class Config {
      }

      this.backgroundTaskRegistry.abortAll();
+      this.backgroundShellRegistry.abortAll();

      await this.cleanupArenaRuntime();
    } catch (error) {
@ -2489,6 +2492,10 @@ export class Config {
    return this.backgroundTaskRegistry;
  }

+  getBackgroundShellRegistry(): BackgroundShellRegistry {
+    return this.backgroundShellRegistry;
+  }
+
  /**
   * Whether interactive permission prompts should be auto-denied.
   * True for background agents that have no UI to show prompts.
--- a/packages/core/src/index.ts
+++ b/packages/core/src/index.ts
@ -147,6 +147,7 @@ export * from './services/sessionService.js';
 export * from './services/sessionTitle.js';
 export { stripTerminalControlSequences } from './utils/terminalSafe.js';
 export * from './services/shellExecutionService.js';
+export * from './services/backgroundShellRegistry.js';
 export * from './services/toolUseSummary.js';
 export * from './utils/bareMode.js';

--- a/packages/core/src/services/backgroundShellRegistry.test.ts
+++ b/packages/core/src/services/backgroundShellRegistry.test.ts
@ -0,0 +1,159 @@
+/**
+ * @license
+ * Copyright 2026 Qwen Team
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import { describe, expect, it } from 'vitest';
+import {
+  BackgroundShellRegistry,
+  type BackgroundShellEntry,
+} from './backgroundShellRegistry.js';
+
+function makeEntry(
+  overrides: Partial<BackgroundShellEntry> = {},
+): BackgroundShellEntry {
+  return {
+    shellId: 's1',
+    command: 'sleep 60',
+    cwd: '/tmp',
+    status: 'running',
+    startTime: 1000,
+    outputPath: '/tmp/s1.output',
+    abortController: new AbortController(),
+    ...overrides,
+  };
+}
+
+describe('BackgroundShellRegistry', () => {
+  describe('register / get / getAll', () => {
+    it('round-trips a registered entry by id', () => {
+      const reg = new BackgroundShellRegistry();
+      const e = makeEntry({ shellId: 'a' });
+      reg.register(e);
+      expect(reg.get('a')).toBe(e);
+    });
+
+    it('returns undefined for unknown id', () => {
+      const reg = new BackgroundShellRegistry();
+      expect(reg.get('missing')).toBeUndefined();
+    });
+
+    it('lists all entries via getAll', () => {
+      const reg = new BackgroundShellRegistry();
+      const a = makeEntry({ shellId: 'a' });
+      const b = makeEntry({ shellId: 'b' });
+      reg.register(a);
+      reg.register(b);
+      const all = reg.getAll();
+      expect(all).toHaveLength(2);
+      expect(all).toContain(a);
+      expect(all).toContain(b);
+    });
+  });
+
+  describe('complete', () => {
+    it('transitions running → completed with exitCode and endTime', () => {
+      const reg = new BackgroundShellRegistry();
+      reg.register(makeEntry({ shellId: 'a' }));
+      reg.complete('a', 0, 2000);
+      const e = reg.get('a')!;
+      expect(e.status).toBe('completed');
+      expect(e.exitCode).toBe(0);
+      expect(e.endTime).toBe(2000);
+    });
+
+    it('is a no-op when entry is not running', () => {
+      const reg = new BackgroundShellRegistry();
+      reg.register(makeEntry({ shellId: 'a' }));
+      reg.cancel('a', 1500);
+      reg.complete('a', 0, 2000);
+      const e = reg.get('a')!;
+      expect(e.status).toBe('cancelled');
+      expect(e.exitCode).toBeUndefined();
+    });
+
+    it('is a no-op for unknown id', () => {
+      const reg = new BackgroundShellRegistry();
+      expect(() => reg.complete('missing', 0, 0)).not.toThrow();
+    });
+  });
+
+  describe('fail', () => {
+    it('transitions running → failed with error and endTime', () => {
+      const reg = new BackgroundShellRegistry();
+      reg.register(makeEntry({ shellId: 'a' }));
+      reg.fail('a', 'spawn error', 2000);
+      const e = reg.get('a')!;
+      expect(e.status).toBe('failed');
+      expect(e.error).toBe('spawn error');
+      expect(e.endTime).toBe(2000);
+    });
+
+    it('is a no-op when entry is not running', () => {
+      const reg = new BackgroundShellRegistry();
+      reg.register(makeEntry({ shellId: 'a' }));
+      reg.complete('a', 0, 1500);
+      reg.fail('a', 'late error', 2000);
+      const e = reg.get('a')!;
+      expect(e.status).toBe('completed');
+      expect(e.error).toBeUndefined();
+    });
+  });
+
+  describe('abortAll', () => {
+    it('cancels every running entry and leaves terminal entries alone', () => {
+      const reg = new BackgroundShellRegistry();
+      const acRunning1 = new AbortController();
+      const acRunning2 = new AbortController();
+      const acDone = new AbortController();
+      reg.register(makeEntry({ shellId: 'a', abortController: acRunning1 }));
+      reg.register(makeEntry({ shellId: 'b', abortController: acRunning2 }));
+      reg.register(makeEntry({ shellId: 'c', abortController: acDone }));
+      reg.complete('c', 0, 1500);
+
+      reg.abortAll();
+
+      expect(reg.get('a')!.status).toBe('cancelled');
+      expect(reg.get('b')!.status).toBe('cancelled');
+      expect(reg.get('c')!.status).toBe('completed');
+      expect(acRunning1.signal.aborted).toBe(true);
+      expect(acRunning2.signal.aborted).toBe(true);
+      expect(acDone.signal.aborted).toBe(false);
+    });
+
+    it('is a no-op when registry is empty', () => {
+      const reg = new BackgroundShellRegistry();
+      expect(() => reg.abortAll()).not.toThrow();
+    });
+  });
+
+  describe('cancel', () => {
+    it('transitions running → cancelled and aborts the signal', () => {
+      const reg = new BackgroundShellRegistry();
+      const ac = new AbortController();
+      reg.register(makeEntry({ shellId: 'a', abortController: ac }));
+      reg.cancel('a', 2000);
+      const e = reg.get('a')!;
+      expect(e.status).toBe('cancelled');
+      expect(e.endTime).toBe(2000);
+      expect(ac.signal.aborted).toBe(true);
+    });
+
+    it('is a no-op when entry is already terminal', () => {
+      const reg = new BackgroundShellRegistry();
+      const ac = new AbortController();
+      reg.register(makeEntry({ shellId: 'a', abortController: ac }));
+      reg.complete('a', 0, 1500);
+      reg.cancel('a', 2000);
+      const e = reg.get('a')!;
+      expect(e.status).toBe('completed');
+      expect(ac.signal.aborted).toBe(false);
+    });
+
+    it('is a no-op for unknown id', () => {
+      const reg = new BackgroundShellRegistry();
+      expect(() => reg.cancel('missing', 0)).not.toThrow();
+    });
+  });
+});
--- a/packages/core/src/services/backgroundShellRegistry.ts
+++ b/packages/core/src/services/backgroundShellRegistry.ts
@ -0,0 +1,109 @@
+/**
+ * @license
+ * Copyright 2026 Qwen Team
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+/**
+ * Tracks background shell processes spawned via the `shell` tool with
+ * `is_background: true`. Each entry holds the metadata the agent and the
+ * `/tasks` slash command need to query, observe, or terminate a running
+ * background shell.
+ *
+ * State machine: register → running → { completed | failed | cancelled }.
+ * Transitions out of running are one-shot: complete/fail/cancel become
+ * no-ops once the entry has settled. This prevents late callbacks (e.g. a
+ * process that exits during cancellation) from clobbering the terminal
+ * status.
+ */
+
+export type BackgroundShellStatus =
+  | 'running'
+  | 'completed'
+  | 'failed'
+  | 'cancelled';
+
+export interface BackgroundShellEntry {
+  /** Stable id used by the model and the `/tasks` UI. */
+  shellId: string;
+  /** The user-supplied command, after any pre-processing the tool applies. */
+  command: string;
+  /** Working directory the process was spawned in. */
+  cwd: string;
+  /** OS pid once spawned; absent if registration happens before spawn. */
+  pid?: number;
+  status: BackgroundShellStatus;
+  /** Exit code on `completed`. */
+  exitCode?: number;
+  /** Error message on `failed`. */
+  error?: string;
+  /** ms epoch when the entry was registered. */
+  startTime: number;
+  /** ms epoch when the entry transitioned out of running. */
+  endTime?: number;
+  /** Absolute path of the captured stdout/stderr file. */
+  outputPath: string;
+  /** Aborted by `cancel()`; callers should wire it into the spawn. */
+  abortController: AbortController;
+}
+
+export class BackgroundShellRegistry {
+  // Entries persist for the session lifetime — no automatic eviction of
+  // terminal entries. For typical interactive sessions (tens of background
+  // shells over an hour) this is fine, but long-running sessions that spawn
+  // many short-lived background commands will see the map and the on-disk
+  // output files grow without bound. Eviction policy (LRU? age-based? cap?)
+  // is left as a follow-up alongside output-file rotation.
+  private readonly entries = new Map<string, BackgroundShellEntry>();
+
+  register(entry: BackgroundShellEntry): void {
+    this.entries.set(entry.shellId, entry);
+  }
+
+  get(shellId: string): BackgroundShellEntry | undefined {
+    return this.entries.get(shellId);
+  }
+
+  getAll(): readonly BackgroundShellEntry[] {
+    return [...this.entries.values()];
+  }
+
+  complete(shellId: string, exitCode: number, endTime: number): void {
+    const entry = this.entries.get(shellId);
+    if (!entry || entry.status !== 'running') return;
+    entry.status = 'completed';
+    entry.exitCode = exitCode;
+    entry.endTime = endTime;
+  }
+
+  fail(shellId: string, error: string, endTime: number): void {
+    const entry = this.entries.get(shellId);
+    if (!entry || entry.status !== 'running') return;
+    entry.status = 'failed';
+    entry.error = error;
+    entry.endTime = endTime;
+  }
+
+  cancel(shellId: string, endTime: number): void {
+    const entry = this.entries.get(shellId);
+    if (!entry || entry.status !== 'running') return;
+    entry.status = 'cancelled';
+    entry.endTime = endTime;
+    entry.abortController.abort();
+  }
+
+  /**
+   * Cancel every still-running entry. Called on session/Config shutdown so
+   * background shells don't outlive the CLI process and leak orphaned
+   * children. Symmetric with `BackgroundTaskRegistry.abortAll()` for the
+   * subagent path.
+   */
+  abortAll(): void {
+    const endTime = Date.now();
+    for (const entry of Array.from(this.entries.values())) {
+      if (entry.status === 'running') {
+        this.cancel(entry.shellId, endTime);
+      }
+    }
+  }
+}
--- a/packages/core/src/services/shellExecutionService.ts
+++ b/packages/core/src/services/shellExecutionService.ts
@ -329,6 +329,7 @@ export class ShellExecutionService {
    abortSignal: AbortSignal,
    shouldUseNodePty: boolean,
    shellExecutionConfig: ShellExecutionConfig,
+    options: { streamStdout?: boolean } = {},
  ): Promise<ShellExecutionHandle> {
    if (shouldUseNodePty) {
      const ptyInfo = await getPty();
@ -353,6 +354,7 @@ export class ShellExecutionService {
      cwd,
      onOutputEvent,
      abortSignal,
+      options.streamStdout ?? false,
    );
  }

@ -361,6 +363,7 @@ export class ShellExecutionService {
    cwd: string,
    onOutputEvent: (event: ShellOutputEvent) => void,
    abortSignal: AbortSignal,
+    streamStdout: boolean,
  ): ShellExecutionHandle {
    try {
      const isWindows = os.platform() === 'win32';
@ -415,26 +418,61 @@ export class ShellExecutionService {
            }
          }

-          outputChunks.push(data);
-
+          // Binary sniff applies in both modes — even streaming consumers
+          // (e.g. background shell output file) shouldn't pile up text-decoded
+          // garbage when the command actually emits binary (`cat /bin/ls`,
+          // image dumps, etc.). Track sniffed bytes by running sum so the
+          // accumulator is truly byte-bounded — the previous version recomputed
+          // sniffedBytes from `slice(0, 20)` on every call, which never grew
+          // past the first 20 chunks' total and let the chunk array leak on
+          // line-sized streams.
          if (isStreamingRawContent && sniffedBytes < MAX_SNIFF_SIZE) {
-            const sniffBuffer = Buffer.concat(outputChunks.slice(0, 20));
-            sniffedBytes = sniffBuffer.length;
-
+            outputChunks.push(data);
+            sniffedBytes += data.length;
+            const sniffBuffer = Buffer.concat(outputChunks);
            if (isBinary(sniffBuffer)) {
              isStreamingRawContent = false;
+              if (streamStdout) {
+                // Tell the streaming consumer to stop writing text chunks;
+                // drop the sniff accumulator now so it can be GC'd.
+                onOutputEvent({ type: 'binary_detected' });
+                outputChunks.length = 0;
+              }
+            } else if (streamStdout && sniffedBytes >= MAX_SNIFF_SIZE) {
+              // Sniff passed in streaming mode — text confirmed, drop the
+              // accumulator. Subsequent chunks fall through to the streaming
+              // emit path below without ever touching outputChunks.
+              outputChunks.length = 0;
            }
+          } else if (!streamStdout) {
+            // Buffered (foreground) mode past sniff: keep accumulating for
+            // the final emit at exit. Streaming mode does not accumulate.
+            outputChunks.push(data);
          }

-          if (isStreamingRawContent) {
-            const decoder = stream === 'stdout' ? stdoutDecoder : stderrDecoder;
-            const decodedChunk = decoder.decode(data, { stream: true });
+          if (!isStreamingRawContent) {
+            // Binary mode: drop further data. Foreground emits the
+            // binary_detected event from handleExit (existing behavior);
+            // background already emitted it above.
+            return;
+          }

-            if (stream === 'stdout') {
-              stdout += decodedChunk;
-            } else {
-              stderr += decodedChunk;
-            }
+          const decoder = stream === 'stdout' ? stdoutDecoder : stderrDecoder;
+          const decodedChunk = decoder.decode(data, { stream: true });
+
+          if (streamStdout) {
+            // Streaming text mode: push through immediately, no string
+            // accumulation. (Up to ~4KB may already have been emitted
+            // before binary detection trips — bounded, acceptable.)
+            onOutputEvent({ type: 'data', chunk: decodedChunk });
+            return;
+          }
+
+          // Buffered text mode: accumulate for the final cleaned-blob emit.
+          if (stream === 'stdout') {
+            stdout += decodedChunk;
+          } else {
+            stderr += decodedChunk;
          }
        };

@ -451,7 +489,9 @@ export class ShellExecutionService {
          const finalStrippedOutput = stripAnsi(combinedOutput).trim();

          if (isStreamingRawContent) {
-            if (finalStrippedOutput) {
+            // In streaming mode chunks were already emitted as they arrived;
+            // re-emitting the final blob would duplicate everything.
+            if (!streamStdout && finalStrippedOutput) {
              onOutputEvent({ type: 'data', chunk: finalStrippedOutput });
            }
          } else {
--- a/packages/core/src/telemetry/file-exporters.test.ts
+++ b/packages/core/src/telemetry/file-exporters.test.ts
@ -27,7 +27,14 @@ describe('FileExporter.serialize', () => {

  afterEach(async () => {
    await exporter.shutdown();
-    fs.rmSync(tmpDir, { recursive: true, force: true });
+    // Windows occasionally returns ENOTEMPTY when the underlying file
+    // handle isn't fully released yet; retry a few times before failing.
+    fs.rmSync(tmpDir, {
+      recursive: true,
+      force: true,
+      maxRetries: 5,
+      retryDelay: 50,
+    });
  });

  // Regression for upstream PR #4689: a raw JSON.stringify on a ReadableSpan
--- a/packages/core/src/tools/shell.test.ts
+++ b/packages/core/src/tools/shell.test.ts
@ -31,8 +31,6 @@ import {
 } from '../services/shellExecutionService.js';
 import * as fs from 'node:fs';
 import * as os from 'node:os';
-import { EOL } from 'node:os';
-import * as path from 'node:path';
 import * as crypto from 'node:crypto';
 import { ToolErrorType } from './tool-error.js';
 import { OUTPUT_UPDATE_INTERVAL_MS } from './shell.js';
@ -55,12 +53,14 @@ describe('ShellTool', () => {
      getPermissionsDeny: vi.fn().mockReturnValue([]),
      getDebugMode: vi.fn().mockReturnValue(false),
      getTargetDir: vi.fn().mockReturnValue('/test/dir'),
+      getSessionId: vi.fn().mockReturnValue('test-session'),
      getWorkspaceContext: vi
        .fn()
        .mockReturnValue(createMockWorkspaceContext('/test/dir')),
      storage: {
        getUserSkillsDirs: vi.fn().mockReturnValue(['/test/dir/.qwen/skills']),
        getProjectTempDir: vi.fn().mockReturnValue('/tmp/qwen-temp'),
+        getProjectDir: vi.fn().mockReturnValue('/test/proj'),
      },
      getTruncateToolOutputThreshold: vi.fn().mockReturnValue(0),
      getTruncateToolOutputLines: vi.fn().mockReturnValue(0),
@ -72,8 +72,24 @@ describe('ShellTool', () => {
        email: 'qwen-coder@alibabacloud.com',
      }),
      getShouldUseNodePtyShell: vi.fn().mockReturnValue(false),
+      getBackgroundShellRegistry: vi.fn().mockReturnValue({
+        register: vi.fn(),
+        get: vi.fn(),
+        getAll: vi.fn().mockReturnValue([]),
+        cancel: vi.fn(),
+        complete: vi.fn(),
+        fail: vi.fn(),
+      }),
    } as unknown as Config;

+    // executeBackground writes to disk; stub mkdirSync + createWriteStream.
+    vi.mocked(fs.mkdirSync).mockReturnValue(undefined);
+    vi.mocked(fs.createWriteStream).mockReturnValue({
+      write: vi.fn(),
+      end: vi.fn(),
+      on: vi.fn(),
+    } as unknown as fs.WriteStream);
+
    shellTool = new ShellTool(mockConfig);

    vi.mocked(os.platform).mockReturnValue('linux');
@ -271,83 +287,202 @@ describe('ShellTool', () => {
      resolveExecutionPromise(fullResult);
    };

-    it('should wrap background command on linux and parse pgrep output', async () => {
+    it('runs background commands as managed pool entries (no & / pgrep wrap)', async () => {
+      const registry = mockConfig.getBackgroundShellRegistry();
      const invocation = shellTool.build({
-        command: 'my-command',
+        command: 'npm start',
        is_background: true,
      });
-      const promise = invocation.execute(mockAbortSignal);
-      resolveShellExecution({ pid: 54321 });

-      vi.mocked(fs.existsSync).mockReturnValue(true);
-      vi.mocked(fs.readFileSync).mockReturnValue(`54321${EOL}54322${EOL}`); // Service PID and background PID
+      const result = await invocation.execute(mockAbortSignal);

-      const result = await promise;
-
-      const tmpFile = path.join(os.tmpdir(), 'shell_pgrep_abcdef.tmp');
-      const wrappedCommand = `{ my-command & }; __code=$?; pgrep -g 0 >${tmpFile} 2>&1; exit $__code;`;
+      // Spawn happens with the unwrapped command — no '&', no pgrep envelope.
+      // Streaming mode is on so dev-server / watcher output flushes to the
+      // output file as it arrives instead of buffering until exit.
      expect(mockShellExecutionService).toHaveBeenCalledWith(
-        wrappedCommand,
+        'npm start',
        '/test/dir',
        expect.any(Function),
        expect.any(AbortSignal),
        false,
        {},
+        { streamStdout: true },
      );
-      expect(result.llmContent).toContain('PIDs: 54322');
-      expect(vi.mocked(fs.unlinkSync)).toHaveBeenCalledWith(tmpFile);
+      // Entry registered with the spawn pid.
+      expect(registry.register).toHaveBeenCalledTimes(1);
+      const entry = (registry.register as Mock).mock.calls[0][0];
+      expect(entry.command).toBe('npm start');
+      expect(entry.cwd).toBe('/test/dir');
+      expect(entry.status).toBe('running');
+      expect(entry.pid).toBe(12345);
+      expect(typeof entry.shellId).toBe('string');
+      expect(entry.outputPath).toContain('shell-');
+      // Returns immediately with id + output path; agent's turn isn't blocked.
+      expect(result.llmContent).toContain(entry.shellId);
+      expect(result.llmContent).toContain(entry.outputPath);
    });

-    it('should add ampersand to command when is_background is true and command does not end with &', async () => {
+    it('settles a background entry as completed when the process exits cleanly', async () => {
+      const registry = mockConfig.getBackgroundShellRegistry();
      const invocation = shellTool.build({
-        command: 'npm start',
+        command: 'true',
        is_background: true,
      });
-      const promise = invocation.execute(mockAbortSignal);
-      resolveShellExecution({ pid: 54321 });
+      await invocation.execute(mockAbortSignal);
+      const entry = (registry.register as Mock).mock.calls[0][0];

-      vi.mocked(fs.existsSync).mockReturnValue(true);
-      vi.mocked(fs.readFileSync).mockReturnValue('54321\n54322\n');
+      resolveExecutionPromise({
+        rawOutput: Buffer.from(''),
+        output: '',
+        exitCode: 0,
+        signal: null,
+        error: null,
+        aborted: false,
+        pid: 12345,
+        executionMethod: 'child_process',
+      });
+      // Flush the .then() microtask attached to resultPromise.
+      await new Promise((r) => setImmediate(r));

-      await promise;
+      expect(registry.complete).toHaveBeenCalledWith(
+        entry.shellId,
+        0,
+        expect.any(Number),
+      );
+      expect(registry.fail).not.toHaveBeenCalled();
+      expect(registry.cancel).not.toHaveBeenCalled();
+    });

-      const tmpFile = path.join(os.tmpdir(), 'shell_pgrep_abcdef.tmp');
-      const wrappedCommand = `{ npm start & }; __code=$?; pgrep -g 0 >${tmpFile} 2>&1; exit $__code;`;
+    it('settles a background entry as failed when ShellExecutionService reports error', async () => {
+      const registry = mockConfig.getBackgroundShellRegistry();
+      const invocation = shellTool.build({
+        command: 'no-such-command',
+        is_background: true,
+      });
+      await invocation.execute(mockAbortSignal);
+      const entry = (registry.register as Mock).mock.calls[0][0];
+
+      resolveExecutionPromise({
+        rawOutput: Buffer.from(''),
+        output: '',
+        exitCode: null,
+        signal: null,
+        error: new Error('spawn ENOENT'),
+        aborted: false,
+        pid: 12345,
+        executionMethod: 'child_process',
+      });
+      await new Promise((r) => setImmediate(r));
+
+      expect(registry.fail).toHaveBeenCalledWith(
+        entry.shellId,
+        'spawn ENOENT',
+        expect.any(Number),
+      );
+      expect(registry.complete).not.toHaveBeenCalled();
+    });
+
+    it('settles a background entry as failed on non-zero exit code (no error object)', async () => {
+      const registry = mockConfig.getBackgroundShellRegistry();
+      const invocation = shellTool.build({
+        command: 'false',
+        is_background: true,
+      });
+      await invocation.execute(mockAbortSignal);
+      const entry = (registry.register as Mock).mock.calls[0][0];
+
+      // ShellExecutionService reports a clean non-zero exit (no error object,
+      // no signal) — historically this got bucketed as `completed`, which
+      // misreported a failed `npm test` / `false` as a success.
+      resolveExecutionPromise({
+        rawOutput: Buffer.from(''),
+        output: '',
+        exitCode: 1,
+        signal: null,
+        error: null,
+        aborted: false,
+        pid: 12345,
+        executionMethod: 'child_process',
+      });
+      await new Promise((r) => setImmediate(r));
+
+      expect(registry.fail).toHaveBeenCalledWith(
+        entry.shellId,
+        expect.stringContaining('exited with code 1'),
+        expect.any(Number),
+      );
+      expect(registry.complete).not.toHaveBeenCalled();
+    });
+
+    it('strips trailing & from the spawned command (managed path handles backgrounding)', async () => {
+      const invocation = shellTool.build({
+        command: 'node server.js &',
+        is_background: true,
+      });
+      await invocation.execute(mockAbortSignal);
      expect(mockShellExecutionService).toHaveBeenCalledWith(
-        wrappedCommand,
+        'node server.js',
+        '/test/dir',
+        expect.any(Function),
+        expect.any(AbortSignal),
+        false,
+        {},
+        { streamStdout: true },
+      );
+    });
+
+    it('does not strip a trailing && (logical AND would be syntactically broken)', async () => {
+      const invocation = shellTool.build({
+        command: 'npm run dev &&',
+        is_background: true,
+      });
+      await invocation.execute(mockAbortSignal);
+      expect(mockShellExecutionService).toHaveBeenCalledWith(
+        'npm run dev &&',
        expect.any(String),
        expect.any(Function),
        expect.any(AbortSignal),
        false,
        {},
+        { streamStdout: true },
      );
    });

-    it('should not add extra ampersand when is_background is true and command already ends with &', async () => {
+    it('does not strip an escaped trailing \\& (literal &)', async () => {
      const invocation = shellTool.build({
-        command: 'npm start &',
+        command: 'echo foo \\&',
        is_background: true,
      });
-      const promise = invocation.execute(mockAbortSignal);
-      resolveShellExecution({ pid: 54321 });
-
-      vi.mocked(fs.existsSync).mockReturnValue(true);
-      vi.mocked(fs.readFileSync).mockReturnValue('54321\n54322\n');
-
-      await promise;
-
-      const tmpFile = path.join(os.tmpdir(), 'shell_pgrep_abcdef.tmp');
-      const wrappedCommand = `{ npm start & }; __code=$?; pgrep -g 0 >${tmpFile} 2>&1; exit $__code;`;
+      await invocation.execute(mockAbortSignal);
      expect(mockShellExecutionService).toHaveBeenCalledWith(
-        wrappedCommand,
+        'echo foo \\&',
        expect.any(String),
        expect.any(Function),
        expect.any(AbortSignal),
        false,
        {},
+        { streamStdout: true },
      );
    });

+    it('does not forward the turn signal into the background shell', async () => {
+      // Verifies: the AbortSignal handed to ShellExecutionService is the
+      // entry's own controller, not the outer turn signal. Cancelling the
+      // turn must not kill an intentionally backgrounded dev server / watcher.
+      const turnAc = new AbortController();
+      const invocation = shellTool.build({
+        command: 'npm run dev',
+        is_background: true,
+      });
+      await invocation.execute(turnAc.signal);
+      const passedSignal = mockShellExecutionService.mock.calls[0][3];
+      expect(passedSignal).not.toBe(turnAc.signal);
+      turnAc.abort();
+      // The signal handed to ShellExecutionService stays un-aborted —
+      // the turn's abort doesn't propagate into the background shell.
+      expect(passedSignal.aborted).toBe(false);
+    });
+
    it('should not add ampersand when is_background is false', async () => {
      const invocation = shellTool.build({
        command: 'npm test',
@ -479,23 +614,6 @@ describe('ShellTool', () => {
      ).toThrow('Directory must be an absolute path.');
    });

-    it('should clean up the temp file on synchronous execution error', async () => {
-      const error = new Error('sync spawn error');
-      mockShellExecutionService.mockImplementation(() => {
-        throw error;
-      });
-      vi.mocked(fs.existsSync).mockReturnValue(true); // Pretend the file exists
-
-      const invocation = shellTool.build({
-        command: 'a-command',
-        is_background: false,
-      });
-      await expect(invocation.execute(mockAbortSignal)).rejects.toThrow(error);
-
-      const tmpFile = path.join(os.tmpdir(), 'shell_pgrep_abcdef.tmp');
-      expect(vi.mocked(fs.unlinkSync)).toHaveBeenCalledWith(tmpFile);
-    });
-
    describe('Streaming to `updateOutput`', () => {
      let updateOutputMock: Mock;
      beforeEach(() => {
@ -1016,43 +1134,6 @@ describe('ShellTool', () => {
    });
  });

-  describe('Windows background execution', () => {
-    it('should clean up trailing ampersand on Windows for background tasks', async () => {
-      vi.mocked(os.platform).mockReturnValue('win32');
-      const mockAbortSignal = new AbortController().signal;
-
-      const invocation = shellTool.build({
-        command: 'npm start &',
-        is_background: true,
-      });
-
-      const promise = invocation.execute(mockAbortSignal);
-
-      // Simulate immediate success (process started)
-      resolveExecutionPromise({
-        rawOutput: Buffer.from(''),
-        output: '',
-        exitCode: 0,
-        signal: null,
-        error: null,
-        aborted: false,
-        pid: 12345,
-        executionMethod: 'child_process',
-      });
-
-      await promise;
-
-      expect(mockShellExecutionService).toHaveBeenCalledWith(
-        'npm start',
-        expect.any(String),
-        expect.any(Function),
-        expect.any(AbortSignal),
-        false,
-        {},
-      );
-    });
-  });
-
  describe('timeout parameter', () => {
    it('should validate timeout parameter correctly', async () => {
      // Valid timeout
@ -1177,40 +1258,6 @@ describe('ShellTool', () => {
      expect(calledSignal).not.toBe(mockAbortSignal);
    });

-    it('should not create timeout signal for background execution', async () => {
-      const mockAbortSignal = new AbortController().signal;
-      const invocation = shellTool.build({
-        command: 'npm start',
-        is_background: true,
-        timeout: 5000,
-      });
-
-      const promise = invocation.execute(mockAbortSignal);
-
-      resolveExecutionPromise({
-        rawOutput: Buffer.from(''),
-        output: 'Background command started. PID: 12345',
-        exitCode: 0,
-        signal: null,
-        error: null,
-        aborted: false,
-        pid: 12345,
-        executionMethod: 'child_process',
-      });
-
-      await promise;
-
-      // For background execution, the original signal should be used
-      expect(mockShellExecutionService).toHaveBeenCalledWith(
-        expect.any(String),
-        expect.any(String),
-        expect.any(Function),
-        mockAbortSignal,
-        false,
-        {},
-      );
-    });
-
    it('should handle timeout vs user cancellation correctly', async () => {
      const userAbortController = new AbortController();
      const invocation = shellTool.build({
--- a/packages/core/src/tools/shell.ts
+++ b/packages/core/src/tools/shell.ts
@ -6,7 +6,7 @@

 import fs from 'node:fs';
 import path from 'node:path';
-import os, { EOL } from 'node:os';
+import os from 'node:os';
 import crypto from 'node:crypto';
 import type { Config } from '../config/config.js';
 import { ToolNames, ToolDisplayNames } from './tool-names.js';
@ -29,6 +29,8 @@ import type {
  ShellOutputEvent,
 } from '../services/shellExecutionService.js';
 import { ShellExecutionService } from '../services/shellExecutionService.js';
+import type { BackgroundShellEntry } from '../services/backgroundShellRegistry.js';
+import stripAnsi from 'strip-ansi';
 import { formatMemoryUsage } from '../utils/formatters.js';
 import type { AnsiOutput } from '../utils/terminalSerializer.js';
 import { isSubpaths } from '../utils/paths.js';
@ -46,6 +48,20 @@ import {

 const debugLogger = createDebugLogger('SHELL');

+/**
+ * Strip a single bare trailing `&` (bash background operator) from a
+ * command string. Returns the input unchanged if the trailing form is
+ * `&&` (logical AND), `\&` (escaped literal `&`), or there is no `&`
+ * at the end at all. Linear time, no regex backtracking risk.
+ */
+function stripTrailingBackgroundAmp(command: string): string {
+  const trimmed = command.trimEnd();
+  if (!trimmed.endsWith('&')) return command;
+  if (trimmed.endsWith('&&')) return command;
+  if (trimmed.endsWith('\\&')) return command;
+  return trimmed.slice(0, -1).trimEnd();
+}
+
 export const OUTPUT_UPDATE_INTERVAL_MS = 1000;
 const DEFAULT_FOREGROUND_TIMEOUT_MS = 120000;

@ -208,9 +224,12 @@ export class ShellToolInvocation extends BaseToolInvocation<
      };
    }

-    const effectiveTimeout = this.params.is_background
-      ? undefined
-      : (this.params.timeout ?? DEFAULT_FOREGROUND_TIMEOUT_MS);
+    if (this.params.is_background) {
+      return this.executeBackground(signal, shellExecutionConfig);
+    }
+
+    const effectiveTimeout =
+      this.params.timeout ?? DEFAULT_FOREGROUND_TIMEOUT_MS;

    // Create combined signal with timeout for foreground execution
    let combinedSignal = signal;
@ -219,294 +238,345 @@ export class ShellToolInvocation extends BaseToolInvocation<
      combinedSignal = AbortSignal.any([signal, timeoutSignal]);
    }

-    const isWindows = os.platform() === 'win32';
-    const tempFileName = `shell_pgrep_${crypto
-      .randomBytes(6)
-      .toString('hex')}.tmp`;
-    const tempFilePath = path.join(os.tmpdir(), tempFileName);
+    // Add co-author to git commit commands
+    const processedCommand = this.addCoAuthorToGitCommit(strippedCommand);
+    const commandToExecute = processedCommand;
+    const cwd = this.params.directory || this.config.getTargetDir();

-    try {
-      // Add co-author to git commit commands
-      const processedCommand = this.addCoAuthorToGitCommit(strippedCommand);
+    let cumulativeOutput: string | AnsiOutput = '';
+    let lastUpdateTime = Date.now();
+    let isBinaryStream = false;
+    let totalLines = 0;
+    let totalBytes = 0;

-      const shouldRunInBackground = this.params.is_background;
-      let finalCommand = processedCommand;
+    const { result: resultPromise, pid } = await ShellExecutionService.execute(
+      commandToExecute,
+      cwd,
+      (event: ShellOutputEvent) => {
+        let shouldUpdate = false;

-      // On non-Windows, use & to run in background.
-      // On Windows, we don't use start /B because it creates a detached process that
-      // doesn't die when the parent dies. Instead, we rely on the race logic below
-      // to return early while keeping the process attached (detached: false).
-      if (
-        !isWindows &&
-        shouldRunInBackground &&
-        !finalCommand.trim().endsWith('&')
-      ) {
-        finalCommand = finalCommand.trim() + ' &';
-      }
-
-      // On Windows, we rely on the race logic below to handle background tasks.
-      // We just ensure the command string is clean.
-      if (isWindows && shouldRunInBackground) {
-        finalCommand = finalCommand.trim().replace(/&+$/, '').trim();
-      }
-
-      // On non-Windows background commands, wrap with pgrep to capture
-      // subprocess PIDs so we can report them to the user.
-      const commandToExecute =
-        !isWindows && shouldRunInBackground
-          ? (() => {
-              let command = finalCommand.trim();
-              if (!command.endsWith('&')) command += ';';
-              return `{ ${command} }; __code=$?; pgrep -g 0 >${tempFilePath} 2>&1; exit $__code;`;
-            })()
-          : finalCommand;
-
-      const cwd = this.params.directory || this.config.getTargetDir();
-
-      let cumulativeOutput: string | AnsiOutput = '';
-      let lastUpdateTime = Date.now();
-      let isBinaryStream = false;
-      let totalLines = 0;
-      let totalBytes = 0;
-
-      const { result: resultPromise, pid } =
-        await ShellExecutionService.execute(
-          commandToExecute,
-          cwd,
-          (event: ShellOutputEvent) => {
-            let shouldUpdate = false;
-
-            switch (event.type) {
-              case 'data':
-                if (isBinaryStream) break;
-                cumulativeOutput = event.chunk;
-                // Stats are only consumed by the ANSI-output branch below,
-                // so skip the per-chunk accounting for plain string chunks.
-                if (Array.isArray(event.chunk)) {
-                  totalLines = event.chunk.length;
-                  totalBytes = event.chunk.reduce(
-                    (sum, line) =>
-                      sum +
-                      line.reduce(
-                        (ls, token) =>
-                          ls + Buffer.byteLength(token.text, 'utf-8'),
-                        0,
-                      ),
+        switch (event.type) {
+          case 'data':
+            if (isBinaryStream) break;
+            cumulativeOutput = event.chunk;
+            // Stats are only consumed by the ANSI-output branch below,
+            // so skip the per-chunk accounting for plain string chunks.
+            if (Array.isArray(event.chunk)) {
+              totalLines = event.chunk.length;
+              totalBytes = event.chunk.reduce(
+                (sum, line) =>
+                  sum +
+                  line.reduce(
+                    (ls, token) => ls + Buffer.byteLength(token.text, 'utf-8'),
                    0,
-                  );
-                }
-                shouldUpdate = true;
-                break;
-              case 'binary_detected':
-                isBinaryStream = true;
-                cumulativeOutput =
-                  '[Binary output detected. Halting stream...]';
-                shouldUpdate = true;
-                break;
-              case 'binary_progress':
-                isBinaryStream = true;
-                cumulativeOutput = `[Receiving binary output... ${formatMemoryUsage(
-                  event.bytesReceived,
-                )} received]`;
-                if (Date.now() - lastUpdateTime > OUTPUT_UPDATE_INTERVAL_MS) {
-                  shouldUpdate = true;
-                }
-                break;
-              default: {
-                throw new Error('An unhandled ShellOutputEvent was found.');
-              }
+                  ),
+                0,
+              );
            }
-
-            if (shouldUpdate && updateOutput) {
-              if (typeof cumulativeOutput === 'string') {
-                updateOutput(cumulativeOutput);
-              } else {
-                updateOutput({
-                  ansiOutput: cumulativeOutput,
-                  totalLines,
-                  totalBytes,
-                  // Only include timeout when user explicitly set it
-                  ...(this.params.timeout != null && {
-                    timeoutMs: this.params.timeout,
-                  }),
-                });
-              }
-              lastUpdateTime = Date.now();
+            shouldUpdate = true;
+            break;
+          case 'binary_detected':
+            isBinaryStream = true;
+            cumulativeOutput = '[Binary output detected. Halting stream...]';
+            shouldUpdate = true;
+            break;
+          case 'binary_progress':
+            isBinaryStream = true;
+            cumulativeOutput = `[Receiving binary output... ${formatMemoryUsage(
+              event.bytesReceived,
+            )} received]`;
+            if (Date.now() - lastUpdateTime > OUTPUT_UPDATE_INTERVAL_MS) {
+              shouldUpdate = true;
            }
-          },
-          combinedSignal,
-          shouldRunInBackground
-            ? false
-            : this.config.getShouldUseNodePtyShell(),
-          shellExecutionConfig ?? {},
-        );
-
-      if (pid && setPidCallback) {
-        setPidCallback(pid);
-      }
-
-      // On Windows, background commands rely on early return since there's
-      // no & backgrounding or pgrep. Awaiting would block until completion.
-      if (shouldRunInBackground && isWindows) {
-        const pidMsg = pid ? ` PID: ${pid}` : '';
-        const killHint = ' (Use taskkill /F /T /PID <pid> to stop)';
-
-        return {
-          llmContent: `Background command started.${pidMsg}${killHint}`,
-          returnDisplay: `Background command started.${pidMsg}${killHint}`,
-        };
-      }
-
-      const result = await resultPromise;
-
-      if (shouldRunInBackground) {
-        // Read subprocess PIDs captured by the pgrep wrapper (non-Windows only)
-        const backgroundPIDs: number[] = [];
-        if (!isWindows) {
-          if (fs.existsSync(tempFilePath)) {
-            const pgrepLines = fs
-              .readFileSync(tempFilePath, 'utf8')
-              .split(EOL)
-              .filter(Boolean);
-            for (const line of pgrepLines) {
-              if (!/^\d+$/.test(line)) {
-                debugLogger.warn(`pgrep: ${line}`);
-                continue;
-              }
-              const bgPid = Number(line);
-              if (bgPid !== result.pid) {
-                backgroundPIDs.push(bgPid);
-              }
-            }
-          } else if (!signal.aborted) {
-            debugLogger.warn('missing pgrep output');
+            break;
+          default: {
+            throw new Error('An unhandled ShellOutputEvent was found.');
          }
        }

-        const bgPidMsg =
-          backgroundPIDs.length > 0
-            ? ` PIDs: ${backgroundPIDs.join(', ')}`
-            : pid
-              ? ` PID: ${pid}`
-              : '';
-        const killHint = ' (Use kill <pid> to stop)';
-
-        return {
-          llmContent: `Background command started.${bgPidMsg}${killHint}`,
-          returnDisplay: `Background command started.${bgPidMsg}${killHint}`,
-        };
-      }
-
-      let llmContent = '';
-      if (result.aborted) {
-        // Check if it was a timeout or user cancellation
-        const wasTimeout =
-          !this.params.is_background &&
-          effectiveTimeout &&
-          combinedSignal.aborted &&
-          !signal.aborted;
-
-        if (wasTimeout) {
-          llmContent = `Command timed out after ${effectiveTimeout}ms before it could complete.`;
-          if (result.output.trim()) {
-            llmContent += ` Below is the output before it timed out:\n${result.output}`;
+        if (shouldUpdate && updateOutput) {
+          if (typeof cumulativeOutput === 'string') {
+            updateOutput(cumulativeOutput);
          } else {
-            llmContent += ' There was no output before it timed out.';
-          }
-        } else {
-          llmContent =
-            'Command was cancelled by user before it could complete.';
-          if (result.output.trim()) {
-            llmContent += ` Below is the output before it was cancelled:\n${result.output}`;
-          } else {
-            llmContent += ' There was no output before it was cancelled.';
+            updateOutput({
+              ansiOutput: cumulativeOutput,
+              totalLines,
+              totalBytes,
+              // Only include timeout when user explicitly set it
+              ...(this.params.timeout != null && {
+                timeoutMs: this.params.timeout,
+              }),
+            });
          }
+          lastUpdateTime = Date.now();
        }
-      } else {
-        // Create a formatted error string for display, replacing the wrapper command
-        // with the user-facing command.
-        const finalError = result.error
-          ? result.error.message.replace(commandToExecute, this.params.command)
-          : '(none)';
+      },
+      combinedSignal,
+      this.config.getShouldUseNodePtyShell(),
+      shellExecutionConfig ?? {},
+    );

-        llmContent = [
-          `Command: ${this.params.command}`,
-          `Directory: ${this.params.directory || '(root)'}`,
-          `Output: ${result.output || '(empty)'}`,
-          `Error: ${finalError}`, // Use the cleaned error string.
-          `Exit Code: ${result.exitCode ?? '(none)'}`,
-          `Signal: ${result.signal ?? '(none)'}`,
-          `Process Group PGID: ${result.pid ?? '(none)'}`,
-        ].join('\n');
-      }
+    if (pid && setPidCallback) {
+      setPidCallback(pid);
+    }

-      let returnDisplayMessage = '';
-      if (this.config.getDebugMode()) {
-        returnDisplayMessage = llmContent;
-      } else {
+    const result = await resultPromise;
+
+    let llmContent = '';
+    if (result.aborted) {
+      // Check if it was a timeout or user cancellation
+      const wasTimeout =
+        effectiveTimeout && combinedSignal.aborted && !signal.aborted;
+
+      if (wasTimeout) {
+        llmContent = `Command timed out after ${effectiveTimeout}ms before it could complete.`;
        if (result.output.trim()) {
-          returnDisplayMessage = result.output;
+          llmContent += ` Below is the output before it timed out:\n${result.output}`;
        } else {
-          if (result.aborted) {
-            // Check if it was a timeout or user cancellation
-            const wasTimeout =
-              !this.params.is_background &&
-              effectiveTimeout &&
-              combinedSignal.aborted &&
-              !signal.aborted;
-
-            returnDisplayMessage = wasTimeout
-              ? `Command timed out after ${effectiveTimeout}ms.`
-              : 'Command cancelled by user.';
-          } else if (result.signal) {
-            returnDisplayMessage = `Command terminated by signal: ${result.signal}`;
-          } else if (result.error) {
-            returnDisplayMessage = `Command failed: ${getErrorMessage(
-              result.error,
-            )}`;
-          } else if (result.exitCode !== null && result.exitCode !== 0) {
-            returnDisplayMessage = `Command exited with code: ${result.exitCode}`;
-          }
-          // If output is empty and command succeeded (code 0, no error/signal/abort),
-          // returnDisplayMessage will remain empty, which is fine.
+          llmContent += ' There was no output before it timed out.';
+        }
+      } else {
+        llmContent = 'Command was cancelled by user before it could complete.';
+        if (result.output.trim()) {
+          llmContent += ` Below is the output before it was cancelled:\n${result.output}`;
+        } else {
+          llmContent += ' There was no output before it was cancelled.';
        }
      }
+    } else {
+      // Create a formatted error string for display, replacing the wrapper command
+      // with the user-facing command.
+      const finalError = result.error
+        ? result.error.message.replace(commandToExecute, this.params.command)
+        : '(none)';

-      // Truncate large output and save full content to a temp file.
-      if (typeof llmContent === 'string') {
-        const truncatedResult = await truncateToolOutput(
-          this.config,
-          ShellTool.Name,
-          llmContent,
-        );
+      llmContent = [
+        `Command: ${this.params.command}`,
+        `Directory: ${this.params.directory || '(root)'}`,
+        `Output: ${result.output || '(empty)'}`,
+        `Error: ${finalError}`, // Use the cleaned error string.
+        `Exit Code: ${result.exitCode ?? '(none)'}`,
+        `Signal: ${result.signal ?? '(none)'}`,
+        `Process Group PGID: ${result.pid ?? '(none)'}`,
+      ].join('\n');
+    }

-        if (truncatedResult.outputFile) {
-          llmContent = truncatedResult.content;
-          returnDisplayMessage +=
-            (returnDisplayMessage ? '\n' : '') +
-            `Output too long and was saved to: ${truncatedResult.outputFile}`;
+    let returnDisplayMessage = '';
+    if (this.config.getDebugMode()) {
+      returnDisplayMessage = llmContent;
+    } else {
+      if (result.output.trim()) {
+        returnDisplayMessage = result.output;
+      } else {
+        if (result.aborted) {
+          // Check if it was a timeout or user cancellation
+          const wasTimeout =
+            effectiveTimeout && combinedSignal.aborted && !signal.aborted;
+
+          returnDisplayMessage = wasTimeout
+            ? `Command timed out after ${effectiveTimeout}ms.`
+            : 'Command cancelled by user.';
+        } else if (result.signal) {
+          returnDisplayMessage = `Command terminated by signal: ${result.signal}`;
+        } else if (result.error) {
+          returnDisplayMessage = `Command failed: ${getErrorMessage(
+            result.error,
+          )}`;
+        } else if (result.exitCode !== null && result.exitCode !== 0) {
+          returnDisplayMessage = `Command exited with code: ${result.exitCode}`;
        }
-      }
-
-      const executionError = result.error
-        ? {
-            error: {
-              message: result.error.message,
-              type: ToolErrorType.SHELL_EXECUTE_ERROR,
-            },
-          }
-        : {};
-
-      return {
-        llmContent,
-        returnDisplay: returnDisplayMessage,
-        ...executionError,
-      };
-    } finally {
-      if (fs.existsSync(tempFilePath)) {
-        fs.unlinkSync(tempFilePath);
+        // If output is empty and command succeeded (code 0, no error/signal/abort),
+        // returnDisplayMessage will remain empty, which is fine.
      }
    }
+
+    // Truncate large output and save full content to a temp file.
+    if (typeof llmContent === 'string') {
+      const truncatedResult = await truncateToolOutput(
+        this.config,
+        ShellTool.Name,
+        llmContent,
+      );
+
+      if (truncatedResult.outputFile) {
+        llmContent = truncatedResult.content;
+        returnDisplayMessage +=
+          (returnDisplayMessage ? '\n' : '') +
+          `Output too long and was saved to: ${truncatedResult.outputFile}`;
+      }
+    }
+
+    const executionError = result.error
+      ? {
+          error: {
+            message: result.error.message,
+            type: ToolErrorType.SHELL_EXECUTE_ERROR,
+          },
+        }
+      : {};
+
+    return {
+      llmContent,
+      returnDisplay: returnDisplayMessage,
+      ...executionError,
+    };
+  }
+
+  /**
+   * Background-execution path: spawn the command into a managed registry
+   * entry instead of detaching with `&`. Output streams to a per-shell file
+   * the agent can `Read`; cancellation flows through the entry's
+   * AbortController; the registry's terminal status is set when the process
+   * exits. Returns immediately so the agent's turn isn't blocked.
+   */
+  private async executeBackground(
+    signal: AbortSignal,
+    shellExecutionConfig?: ShellExecutionConfig,
+  ): Promise<ToolResult> {
+    const strippedCommand = stripShellWrapper(this.params.command);
+    // Strip a single bare trailing `&` (the bash background operator) before
+    // spawn: bash treats it as background-detach, exits the wrapper
+    // immediately, and the real child outlives the wrapper — the registry
+    // would settle as `completed` while the shell is still running, and
+    // chunked output would land on a closed stream. The managed path is
+    // itself the backgrounding mechanism, so the trailing `&` is redundant.
+    //
+    // Deliberately precise: do not touch `&&` (logical AND), `\&` (escaped
+    // literal `&`), or commands without a trailing `&`. Earlier `\s*&+\s*$`
+    // was both too greedy (it ate `&&` and `\&`) and a ReDoS hazard on
+    // long all-`&` inputs. Plain string checks here are linear and clearer
+    // than a lookbehind regex.
+    const noTrailingAmp = stripTrailingBackgroundAmp(strippedCommand);
+    if (noTrailingAmp !== strippedCommand) {
+      debugLogger.warn(
+        'Stripped trailing & from background shell command — managed path handles backgrounding',
+      );
+    }
+    const processedCommand = this.addCoAuthorToGitCommit(noTrailingAmp);
+    const cwd = this.params.directory || this.config.getTargetDir();
+
+    // Output goes under the project temp dir (which `ReadFileTool`
+    // auto-allows by default), so the LLM can `Read` the captured output
+    // without bouncing off a permission prompt — important because
+    // background-agent contexts can't surface interactive prompts.
+    const outputDir = path.join(
+      this.config.storage.getProjectTempDir(),
+      'background-shells',
+      this.config.getSessionId(),
+    );
+    fs.mkdirSync(outputDir, { recursive: true });
+
+    const shellId = `bg_${crypto.randomBytes(4).toString('hex')}`;
+    const outputPath = path.join(outputDir, `shell-${shellId}.output`);
+
+    // Background shells are explicitly independent of the current turn:
+    // the user pressing Ctrl+C on a turn (which aborts `signal`) should
+    // NOT kill a long-running dev server / watcher they intentionally
+    // backgrounded. Cancellation flows only through the entry's own
+    // AbortController, driven by future `task_stop` integration (#3471).
+    // The `signal` parameter is still honored for the synchronous early
+    // return below (don't even spawn if the agent already aborted), but
+    // we deliberately do not forward it.
+    const entryAc = new AbortController();
+
+    const outputStream = fs.createWriteStream(outputPath, { flags: 'w' });
+    // Without an 'error' listener, a write failure (disk full, permission
+    // change, fs going away) would surface as an uncaught exception and
+    // kill the entire CLI session. Log + drop is the sane default — the
+    // process keeps running, the registry still settles via resultPromise.
+    outputStream.on('error', (err) => {
+      debugLogger.warn(
+        `background shell ${shellId} output write error: ${err.message}`,
+      );
+    });
+
+    const startTime = Date.now();
+    const entry: BackgroundShellEntry = {
+      shellId,
+      command: processedCommand,
+      cwd,
+      status: 'running',
+      startTime,
+      outputPath,
+      abortController: entryAc,
+    };
+
+    const { result: resultPromise, pid } = await ShellExecutionService.execute(
+      processedCommand,
+      cwd,
+      (event: ShellOutputEvent) => {
+        if (event.type === 'data' && typeof event.chunk === 'string') {
+          // Strip ANSI escape codes (color, cursor-move, clear-screen) before
+          // writing — agents read the file as plain text, and dev servers /
+          // build tools spam plenty of escape sequences that would render as
+          // garbage. Costs ~one regex per chunk; cheap relative to disk I/O.
+          outputStream.write(stripAnsi(event.chunk));
+        }
+        // ANSI array chunks and binary streams are not written to the output
+        // file: agents read the file as plain text and binary spam would be
+        // unhelpful.
+      },
+      entryAc.signal,
+      // Background shells are non-interactive by design — no terminal to
+      // attach a PTY to, no human to type at it. Force the child_process
+      // path so we don't pull in node-pty for fire-and-forget commands.
+      false,
+      shellExecutionConfig ?? {},
+      // Stream stdout/stderr through to the output file as chunks arrive.
+      // Default child_process mode buffers until exit, which would leave
+      // dev-server / watcher output files empty until the process dies.
+      { streamStdout: true },
+    );
+
+    if (pid !== undefined) entry.pid = pid;
+    const registry = this.config.getBackgroundShellRegistry();
+    registry.register(entry);
+
+    // Settle in the background — do NOT await here, the agent should be
+    // unblocked immediately.
+    void resultPromise.then(
+      (result) => {
+        outputStream.end();
+        const endTime = Date.now();
+        if (entryAc.signal.aborted) {
+          if (registry.get(shellId)?.status === 'running') {
+            registry.cancel(shellId, endTime);
+          }
+        } else if (
+          result.error ||
+          (result.exitCode !== null && result.exitCode !== 0) ||
+          result.signal !== null
+        ) {
+          // Non-zero exit / killed by signal / spawn error all count as failed.
+          // Treating them as `completed` would let `/tasks` (and any future
+          // model-facing notification) misreport a failed `npm test` or
+          // `false` command as a success.
+          const reason = result.error
+            ? result.error.message
+            : result.signal !== null
+              ? `terminated by signal ${result.signal}`
+              : `exited with code ${result.exitCode}`;
+          registry.fail(shellId, reason, endTime);
+        } else {
+          registry.complete(shellId, result.exitCode ?? 0, endTime);
+        }
+      },
+      (err) => {
+        outputStream.end();
+        registry.fail(shellId, getErrorMessage(err), Date.now());
+      },
+    );
+
+    const pidLine = pid !== undefined ? `pid: ${pid}\n` : '';
+    return {
+      llmContent:
+        `Background shell started.\n` +
+        `id: ${shellId}\n` +
+        pidLine +
+        `output file: ${outputPath}\n` +
+        `Use the /tasks command to list and inspect background shells, or Read the output file directly.`,
+      returnDisplay: `Background shell ${shellId} started${pid !== undefined ? ` (pid ${pid})` : ''}.`,
+    };
  }

  private addCoAuthorToGitCommit(command: string): string {