fix(core): liveness check + throw-safe abort-reason read + encoding-aware PTY snapshot (resolve 4th review pass)

Resolves 6 threads on PR-1 of #3831 — 1 Critical + 1 real bug + 2
quality + 2 test-coverage:

- **[Critical] `getShellAbortReasonKind` throw-safe property read.**
  Previous form read `reason.kind` after only checking that `kind` is
  an own property. An own accessor that throws (or a Proxy with a
  trapping getter) would throw before the helper reached either the
  cancel kill path or the background promote path. Abort handlers are
  dispatched async and not awaited by AbortSignal, so a leaked throw
  here would have left the shell process alive instead of being killed
  on cancel — quietly. Wrapped the property read in try/catch with a
  fall-back to the safe 'cancel' kill behavior.

- **Real bug: child_process post-exit race in background-promote**
  (`performBackgroundPromote`). The child may have already exited but
  the 'exit' event hasn't reached our handler yet (Node delivers
  events on the next microtask). Promoting in that window would
  detach our exit listener and report `promoted: true` for a process
  that's already dead — the caller would hold an inert pid expecting
  to take over. Now we read `child.exitCode` / `child.signalCode`
  before detaching: if either is non-null, fall through and let the
  pending exit handler resolve normally with the real exit info.
  Mirrored mock setup so `exitCode` / `signalCode` default to `null`
  (matching real ChildProcess) instead of `undefined`.

- **PTY snapshot: re-decode + replay (mirror exit-path encoding).**
  The promoted snapshot was serializing `headlessTerminal` directly,
  which was fed by a streaming decoder initialized from the
  first-chunk encoding heuristic. When early output is ASCII-only but
  later output is in a different encoding (GBK / Shift-JIS / etc.),
  this produces mojibake — and the normal exit path doesn't, because
  it re-decodes `finalBuffer` with `getCachedEncodingForBuffer` and
  replays through a fresh terminal. Now mirrors that logic so
  `result.output` shape matches across the two paths. Direct-serialize
  remains as a last-ditch fallback if replay throws.

- **Switch `default` no longer emits a runtime warn.** Reviewer noted
  the helper's whitelist made the `default: { _exhaustive: never }`
  branch unreachable at runtime — the `debugLogger.warn` in it could
  never fire. Kept the `_: never = kind` type assertion (so a future
  ShellAbortReason variant forces a TS error here, directing the
  developer to extend BOTH the helper's whitelist AND add a `case`),
  removed the unreachable warn. Added a comment that the assertion is
  the static-only safety net the union expansion would trigger.

- **Direct unit tests for `getShellAbortReasonKind`** (8 cases). The
  helper's prototype-pollution defense is the main reason it exists;
  if `hasOwnProperty` is accidentally removed the regression would
  silently send `abortController.abort({})` (any plain reason) into
  the promote path. Exported the helper and added direct tests for:
  null / undefined, non-object, empty object (no own kind), prototype-
  only kind (pollution), unknown kind value, throwing accessor, Proxy
  trap, and the two happy paths.

- **`removeListener` regression guard.** The fix to call
  `ptyProcess.removeListener('error', ...)` instead of `.off(...)`
  matters because `@lydell/node-pty`'s IPty interface only exposes
  `removeListener` — `.off()` throws TypeError on a real PTY but the
  EventEmitter mock tolerates both. Added a test that spies on both
  methods and asserts the production code uses `removeListener` for
  the 'error' event, so a future swap back to `.off()` regresses
  loudly under the mock instead of silently.

68 / 68 tests pass (58 baseline + 9 helper boundary + 1 removeListener
guard + 1 post-exit race); tsc clean; ESLint clean.
This commit is contained in:
Shaojin Wen 2026-05-06 15:09:02 +08:00
parent 094fff09ca
commit 4cc558b3d5
2 changed files with 256 additions and 18 deletions

View file

@ -21,7 +21,10 @@ import type {
ShellAbortReason,
ShellOutputEvent,
} from './shellExecutionService.js';
import { ShellExecutionService } from './shellExecutionService.js';
import {
getShellAbortReasonKind,
ShellExecutionService,
} from './shellExecutionService.js';
import type { AnsiOutput } from '../utils/terminalSerializer.js';
const { Terminal } = pkg;
@ -700,6 +703,40 @@ describe('ShellExecutionService', () => {
expect(exitDisposableStub.dispose).toHaveBeenCalled();
});
it("post-promotion: ptyProcess error listener is removed via 'removeListener', NOT 'off' (regression guard for @lydell/node-pty)", async () => {
// node EventEmitter exposes both `off` (Node 10+) and the legacy
// `removeListener`, but @lydell/node-pty's IPty interface only
// surfaces `removeListener` — calling `.off(...)` on a real PTY
// throws TypeError. Pin that the production code path uses
// `removeListener` so a future refactor swapping to `.off()`
// doesn't silently regress under the EventEmitter mock (which
// tolerates both).
const removeListenerSpy = vi.spyOn(mockPtyProcess, 'removeListener');
const offSpy = vi.spyOn(mockPtyProcess, 'off');
const { result } = await simulateExecution(
'tail -f /tmp/never.log',
(_pty, abortController) => {
abortController.abort({
kind: 'background',
shellId: 'bg_test123',
} satisfies ShellAbortReason);
},
);
expect(result.promoted).toBe(true);
// The 'error' handler is removed via legacy API; `.off` must not
// appear in the production teardown path.
expect(removeListenerSpy).toHaveBeenCalledWith(
'error',
expect.any(Function),
);
const offErrorCalls = offSpy.mock.calls.filter(
([event]) => event === 'error',
);
expect(offErrorCalls).toEqual([]);
});
it('post-promotion: PTY exit does NOT re-resolve the result (already resolved with promoted)', async () => {
// Pin: even if the still-running child later exits naturally and the
// caller's own exit listener fires, our foreground result Promise
@ -1057,6 +1094,22 @@ describe('ShellExecutionService child_process fallback', () => {
value: 12345,
configurable: true,
});
// Mirror real Node ChildProcess: `exitCode` / `signalCode` are `null`
// while the child is alive and become a number / signal name on
// exit. The background-promote liveness guard reads these to detect
// an exit that fired between abort dispatch and the abort handler
// run, and a default of `undefined` would mistakenly look terminal
// and skip the promote.
Object.defineProperty(mockChildProcess, 'exitCode', {
value: null,
writable: true,
configurable: true,
});
Object.defineProperty(mockChildProcess, 'signalCode', {
value: null,
writable: true,
configurable: true,
});
mockCpSpawn.mockReturnValue(mockChildProcess);
});
@ -1350,6 +1403,45 @@ describe('ShellExecutionService child_process fallback', () => {
expect(result.output).not.toContain('post-promote-stderr');
});
it('post-exit race: background-promote refuses if child is already terminal (exitCode/signalCode non-null)', async () => {
// Race window: the child may have exited (exitCode set) but the
// 'exit' event hasn't reached our handler yet because Node delivers
// child_process events on the next microtask. Promoting in that
// window would detach our exit listener and report `promoted: true`
// for a process that's already dead — the caller would hold an
// inert pid expecting to take over. Production code reads
// exitCode / signalCode before detaching; if either is non-null,
// it falls through and lets the pending exit handler resolve
// normally with the real exit info.
mockPlatform.mockReturnValue('linux');
const { result } = await simulateExecution(
'fast-and-cancelled',
(cp, abortController) => {
// Simulate the race: pretend the child has already exited
// (exitCode set on the ChildProcess) but the 'exit' event
// emit is queued behind the abort dispatch.
Object.defineProperty(cp, 'exitCode', {
value: 0,
writable: true,
configurable: true,
});
abortController.abort({
kind: 'background',
shellId: 'bg_test123',
} satisfies ShellAbortReason);
// Now drain the pending exit + close events; the normal
// exit path should resolve the result.
cp.emit('exit', 0, null);
cp.emit('close', 0, null);
},
);
// Result is the normal exit shape, not the promoted shape.
expect(result.promoted).toBeUndefined();
expect(result.aborted).toBe(true); // abortSignal.aborted is still true
expect(result.exitCode).toBe(0);
});
it('post-promotion: child exit does NOT re-resolve the result with a non-promoted shape', async () => {
mockPlatform.mockReturnValue('linux');
// Pin: even if the still-running child later exits naturally and the
@ -1665,3 +1757,77 @@ describe('ShellExecutionService execution method selection', () => {
expect(result.executionMethod).toBe('child_process');
});
});
describe('getShellAbortReasonKind (defensive abort-reason read)', () => {
it("returns 'cancel' for null reason (e.g. plain abortController.abort())", () => {
expect(getShellAbortReasonKind(null)).toBe('cancel');
expect(getShellAbortReasonKind(undefined)).toBe('cancel');
});
it("returns 'cancel' for non-object reasons (string / number / DOMException)", () => {
expect(getShellAbortReasonKind('background')).toBe('cancel');
expect(getShellAbortReasonKind(42)).toBe('cancel');
expect(getShellAbortReasonKind(true)).toBe('cancel');
// DOMException-like object — not the real DOMException constructor in
// the test runtime, but the principle is the same: a non-discriminated
// object reason without an own `kind` falls back to cancel.
expect(getShellAbortReasonKind(new Error('aborted'))).toBe('cancel');
});
it("returns 'cancel' for an empty object (no own kind)", () => {
expect(getShellAbortReasonKind({})).toBe('cancel');
});
it("returns 'cancel' when 'kind' lives only on the prototype (pollution defense)", () => {
const polluted: Record<string, unknown> = Object.create({
kind: 'background',
});
// hasOwnProperty('kind') is false → helper rejects the prototype-only kind
expect(getShellAbortReasonKind(polluted)).toBe('cancel');
});
it("returns 'cancel' for an unknown kind value (typo / future-untyped variant)", () => {
expect(getShellAbortReasonKind({ kind: 'suspend' })).toBe('cancel');
expect(getShellAbortReasonKind({ kind: 'BACKGROUND' })).toBe('cancel');
expect(getShellAbortReasonKind({ kind: 42 })).toBe('cancel');
});
it("returns 'cancel' when reading 'kind' throws (accessor / Proxy trap)", () => {
const throwingReason = Object.defineProperty({}, 'kind', {
enumerable: true,
configurable: true,
get() {
throw new Error('accessor blew up');
},
});
expect(getShellAbortReasonKind(throwingReason)).toBe('cancel');
const proxyReason = new Proxy(
{},
{
get(_target, prop) {
if (prop === 'kind') throw new Error('proxy trap blew up');
return undefined;
},
getOwnPropertyDescriptor(_target, prop) {
if (prop === 'kind') {
return { configurable: true, enumerable: true, value: 'unused' };
}
return undefined;
},
},
);
expect(getShellAbortReasonKind(proxyReason)).toBe('cancel');
});
it("returns 'background' for the canonical happy-path reason", () => {
expect(getShellAbortReasonKind({ kind: 'background' })).toBe('background');
expect(
getShellAbortReasonKind({ kind: 'background', shellId: 'bg_x' }),
).toBe('background');
});
it("returns 'cancel' for the canonical cancel reason", () => {
expect(getShellAbortReasonKind({ kind: 'cancel' })).toBe('cancel');
});
});

View file

@ -44,18 +44,34 @@ const PROMOTE_DRAIN_TIMEOUT_MS = 200;
* would force the kill path through the promote branch on any plain
* `abortController.abort({})`. Lifecycle/safety branches deserve the
* extra check.
* - Wrap the property read in try/catch an own getter or a `Proxy`
* trap may throw during inspection. A throw here would propagate up
* past the abort handler (which is dispatched async and not awaited
* by AbortSignal), leaving the shell process alive instead of being
* killed on cancel. We swallow the throw and fall back to 'cancel'.
* - Whitelist the value against the known union anything else (typos,
* future-untyped variants) defaults to `'cancel'` so the historical
* kill behavior is preserved as the safe fallback.
*
* Exported for direct unit testing of all six edge cases (null /
* non-object / `{}` no own kind / prototype-only kind / unknown kind /
* throwing accessor) the integration tests only exercise the three
* happy-path inputs.
*/
function getShellAbortReasonKind(reason: unknown): ShellAbortReason['kind'] {
export function getShellAbortReasonKind(
reason: unknown,
): ShellAbortReason['kind'] {
if (
reason !== null &&
typeof reason === 'object' &&
Object.prototype.hasOwnProperty.call(reason, 'kind')
) {
const kind = (reason as { kind?: unknown }).kind;
if (kind === 'background' || kind === 'cancel') return kind;
try {
const kind = (reason as { kind?: unknown }).kind;
if (kind === 'background' || kind === 'cancel') return kind;
} catch {
// Throwing accessor / Proxy trap — fall back to safe kill below.
}
}
return 'cancel';
}
@ -604,6 +620,25 @@ export class ShellExecutionService {
const performBackgroundPromote = (): void => {
if (!child.pid || exited) return;
// Race guard: the child may have already exited but the 'exit'
// event hasn't reached our handler yet (Node delivers
// child_process events on the next microtask). Promoting in
// that window would detach our exit listener, leak the
// already-terminal exit code, and report `promoted: true` to
// the caller for a process that's already dead — they'd hold
// an inert pid expecting to take over. Check exitCode /
// signalCode before detaching: if either is non-null the
// child is gone, so leave the listeners alone and let the
// pending exit handler fire normally with the real exit info.
if (child.exitCode !== null || child.signalCode !== null) {
debugLogger.debug(
`Background-promote requested for pid ${child.pid} but child ` +
`is already terminal (exitCode=${child.exitCode}, ` +
`signalCode=${child.signalCode}); falling through to the ` +
`normal exit-handled resolution.`,
);
return;
}
// Detach our listeners (so post-promote output doesn't leak
// into the foreground onOutputEvent or the now-finalized text
// decoder), drop the child from our active set (so cleanup()
@ -688,11 +723,17 @@ export class ShellExecutionService {
await performCancelKill();
return;
default: {
// Unreachable at runtime: getShellAbortReasonKind whitelists
// the return to the union members, so this branch only
// exists to force a TS error if the `ShellAbortReason` union
// ever gains a new variant — that error directs the
// developer to (1) extend the helper's whitelist and
// (2) add a `case` here. Without this exhaustiveness check
// the helper's whitelist and the switch could drift apart
// silently when the union grows.
const _exhaustive: never = kind;
debugLogger.warn(
`Unknown ShellAbortReason kind, falling back to cancel/kill: ${String(_exhaustive)}`,
);
await performCancelKill();
return _exhaustive;
}
}
};
@ -1171,17 +1212,42 @@ export class ShellExecutionService {
const finalBuffer = Buffer.concat(outputChunks);
let snapshot = '';
try {
snapshot = serializeTerminalToText(headlessTerminal) ?? '';
// Mirror the normal exit path's snapshot logic: re-decode
// the full buffer with the final encoding (the streaming
// decoder fed `headlessTerminal` from a first-chunk
// heuristic, which can mis-detect when early output is
// ASCII-only but later output is in a different encoding,
// e.g. GBK). Then replay through a fresh terminal so ANSI
// sequences land at the right cursor position. Falling back
// to `serializeTerminalToText(headlessTerminal)` would risk
// mojibake on the promoted snapshot that the normal exit
// path doesn't produce.
if (isStreamingRawContent) {
const finalEncoding = getCachedEncodingForBuffer(finalBuffer);
const decodedOutput = new TextDecoder(finalEncoding).decode(
finalBuffer,
);
snapshot = await replayTerminalOutput(decodedOutput, cols, rows);
} else {
snapshot = serializeTerminalToText(headlessTerminal) ?? '';
}
} catch (serErr) {
// Best-effort snapshot — terminal serialization may fail if
// the PTY is mid-render. Empty snapshot is acceptable since
// the caller has rawOutput, but log so the failure leaves a
// diagnostic trail (otherwise an empty `output` is
// indistinguishable from "command produced no output").
// Best-effort snapshot — re-decode + replay may fail (encoding
// detection error, terminal write throw, etc.). Empty snapshot
// is acceptable since the caller has rawOutput, but log so
// the failure leaves a diagnostic trail (otherwise an empty
// `output` is indistinguishable from "command produced no
// output"). Try the simpler direct-serialize path as a
// last-ditch fallback before giving up.
debugLogger.warn(
`Background-promote snapshot serialization failed: ${serErr instanceof Error ? serErr.message : String(serErr)}. ` +
`Result.output will be empty; caller should fall back to rawOutput.`,
`Background-promote snapshot replay failed: ${serErr instanceof Error ? serErr.message : String(serErr)}. ` +
`Falling back to direct headlessTerminal serialize; if that also fails, output stays empty.`,
);
try {
snapshot = serializeTerminalToText(headlessTerminal) ?? '';
} catch {
// Both paths failed — leave snapshot empty.
}
}
resolve({
rawOutput: finalBuffer,
@ -1234,11 +1300,17 @@ export class ShellExecutionService {
await performCancelKill();
return;
default: {
// Unreachable at runtime: getShellAbortReasonKind whitelists
// the return to the union members, so this branch only
// exists to force a TS error if the `ShellAbortReason` union
// ever gains a new variant — that error directs the
// developer to (1) extend the helper's whitelist and
// (2) add a `case` here. Without this exhaustiveness check
// the helper's whitelist and the switch could drift apart
// silently when the union grows.
const _exhaustive: never = kind;
debugLogger.warn(
`Unknown ShellAbortReason kind, falling back to cancel/kill: ${String(_exhaustive)}`,
);
await performCancelKill();
return _exhaustive;
}
}
};