fix(core,cli): 7 review findings on PR #4151 round 8

# Critical findings

[#3264638738] Sub-agent AUTO override stripped parent's shared PM with
no restore — DEFAULT-mode parent spawning an AUTO sub-agent silently
lost its dangerous allow rules forever (until mode toggle).

Fix: change `createApprovalModeOverride` to return `{config, cleanup}`.
The cleanup invokes `restoreDangerousRules()` if and only if this
override was responsible for the strip (parent was not already in
AUTO at override time and hasn't entered AUTO during the run). All 3
callers (agent.ts foreground + bg + fork-async, background-agent-
resume.ts, forkedAgent.ts) updated with cleanup in their existing
finally blocks. Outer catch in agent.ts also invokes cleanup so an
exception between override creation and the inner finallys doesn't
leak strip state.

[#3264638739] acceptEdits fast-path auto-approved writes to
`.git/hooks/`, `.husky/`, `package.json`, `.npmrc` etc — all paths
that execute code on subsequent tooling operations (git commit, npm
install, CI) were bypassing the classifier via the workspace-edit
fast-path. Hostile AGENTS.md → write hook → next git commit runs
arbitrary code.

Fix: PERSISTENCE_PATH_PATTERNS blocklist in passesAcceptEditsFastPath.
Edits to these paths fall through to the classifier (or to an
explicit user allow rule). Scope: code-execution surfaces only
(`.git/`, `.husky/`, `package.json`, `.npmrc`, Makefile/justfile/
Taskfile, `.github/workflows/`) — not arbitrary "sensitive" paths.

[#3264638748] Classifier ALLOW path had zero observability — operator
investigating "why was this dangerous command allowed" had no audit
trail.

Fix: `debugLogger.debug` (NOT info — skill filter 5 says no
always-info on happy paths) on stage-1 ALLOW and stage-2 ALLOW/BLOCK
paths. Off by default, grep-able when investigating.

# Suggestions

[#3264638759] ~80 lines of switch(decision.via) + denial-state updates
duplicated between coreToolScheduler.ts and ACP Session.ts.

Fix: extract `applyAutoModeDecision(decision, config, denialState)
-> AutoModeOutcome` in autoMode.ts. Both callers reduce to a small
switch on the outcome.kind (`approved` / `blocked` / `fallback`).
Single source of truth for the AUTO decision-handling protocol; drift
between CLI and ACP paths is now impossible at the structural level.

[#3264638761] Magic `40` hardcoded in scheduler + Session + transcript
builder.

Fix: export MAX_TRANSCRIPT_MESSAGES from classifier-transcript.ts,
import in both call sites.

[#3264638767] auto-mode.md promised 200-char per-entry / 50 entries
per-section caps for user hints; code in formatSection enforced
neither. Hostile workspace settings could bloat classifier system
prompt and overflow fast-model context.

Fix: enforce both caps in formatSection. Constants exported
(MAX_USER_HINT_LENGTH, MAX_USER_HINTS_PER_SECTION).

# Test coverage gaps (top-level)

[Test coverage] sanitizeClassifierReason, shouldRunAutoModeForCall,
and MAX_TRANSCRIPT_MESSAGES truncation had zero coverage.

Fix: 7 new test cases in classifier.test.ts (sanitizer), 5 cases in
autoMode.test.ts (gate function), 3 cases in classifier-transcript.
test.ts (truncation behavior). Total +15 assertions on security-
critical surfaces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
LaZzyMan 2026-05-19 17:55:36 +08:00
parent 0bac5568f3
commit 1312d5749e
15 changed files with 488 additions and 120 deletions

View file

@ -60,13 +60,11 @@ import {
isPlanModeBlocked,
abortGoalForStopHookCap,
formatStopHookBlockingCapWarning,
applyAutoModeDecision,
evaluateAutoMode,
formatClassifierBlockMessage,
isApproveOutcome,
recordAllow,
recordBlock,
MAX_TRANSCRIPT_MESSAGES,
recordFallbackApprove,
recordUnavailable,
shouldFallback,
shouldRunAutoModeForCall,
} from '@qwen-code/qwen-code-core';
@ -1923,13 +1921,15 @@ export class Session implements SessionContext {
// existing manual-approval flow below.
if (!autoModeAllowed && shouldRunAutoModeForCall(approvalMode, fc.name)) {
const denialState = this.config.getAutoModeDenialState();
// `buildClassifierContents` retains only the most recent 40
// messages; ask the chat client for exactly that tail rather
// than triggering a `structuredClone` of the whole session
// on every non-fast-path AUTO call. Parallels
// coreToolScheduler.ts.
// `buildClassifierContents` retains only the most recent
// MAX_TRANSCRIPT_MESSAGES messages; ask the chat client for
// exactly that tail rather than triggering a `structuredClone`
// of the whole session on every non-fast-path AUTO call.
// Parallels coreToolScheduler.ts.
const messages =
this.config.getGeminiClient?.()?.getHistoryTail(40, false) ?? [];
this.config
.getGeminiClient?.()
?.getHistoryTail(MAX_TRANSCRIPT_MESSAGES, false) ?? [];
const decision = await evaluateAutoMode({
ctx: pmCtx,
pmForcedAsk,
@ -1940,45 +1940,27 @@ export class Session implements SessionContext {
skipClassifier: shouldFallback(denialState).fallback,
});
// Exhaustive switch (parallels the switch(decision.via) block in
// coreToolScheduler.ts, inside the evaluateAutoMode result
// handling). Using an `if/else if` here would silently fall
// through to "allow" if a new `via` variant were added to
// `AutoModeDecision` — fail-open. The `_exhaustive: never` arm
// makes that drift a compile-time error.
switch (decision.via) {
case 'fast-path:accept-edits':
case 'fast-path:allowlist':
this.config.setAutoModeDenialState(recordAllow(denialState));
autoModeAllowed = true;
break;
case 'classifier':
if (decision.shouldBlock) {
this.config.setAutoModeDenialState(
decision.unavailable
? recordUnavailable(denialState)
: recordBlock(denialState),
);
return earlyErrorResponse(
new Error(formatClassifierBlockMessage(decision)),
fc.name,
);
}
this.config.setAutoModeDenialState(recordAllow(denialState));
// Apply decision via shared helper — eliminates ~40 lines of
// line-for-line duplication with coreToolScheduler.ts and makes
// the CLI / ACP paths share one source of truth for the
// switch + denial-tracking state updates + exhaustiveness
// guard.
const outcome = applyAutoModeDecision(
decision,
this.config,
denialState,
);
switch (outcome.kind) {
case 'approved':
autoModeAllowed = true;
break;
case 'blocked':
return earlyErrorResponse(new Error(outcome.errorMessage), fc.name);
case 'fallback':
// Drop through to the manual-approval flow below.
break;
default: {
const _exhaustive: never = decision;
// Surface drift at runtime, not just compile-time. The TS
// exhaustiveness check is bypassable (`as` cast, JS interop,
// partial build), and without this log every tool call would
// silently degrade with zero operator-visible signal.
debugLogger.error(
`Auto mode: unrecognised decision.via "${(decision as { via: string }).via}" — falling through to manual approval`,
);
const _exhaustive: never = outcome;
void _exhaustive;
}
}

View file

@ -540,10 +540,11 @@ export class BackgroundAgentResumeService {
// continuing to read the parent's. Reusing `this.config`
// directly here would short-circuit that isolation. See the
// matching wrapper in `agent.ts:createApprovalModeOverride`.
const agentConfig = await createApprovalModeOverride(
this.config,
resolvedApprovalMode as ApprovalMode,
);
const { config: agentConfig, cleanup: restoreParentPM } =
await createApprovalModeOverride(
this.config,
resolvedApprovalMode as ApprovalMode,
);
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const bgConfig = Object.create(agentConfig) as any;
bgConfig.getShouldAvoidPermissionPrompts = () => true;
@ -823,6 +824,10 @@ export class BackgroundAgentResumeService {
.getToolRegistry()
.stop()
.catch(() => {});
// Restore parent PermissionManager's dangerous allow rules if
// this override stripped them. See createApprovalModeOverride
// strip-lifecycle comment in agent.ts.
restoreParentPM();
}
};

View file

@ -65,16 +65,15 @@ import {
isAutoEditApproved,
} from './permissionFlow.js';
import {
applyAutoModeDecision,
evaluateAutoMode,
formatClassifierBlockMessage,
shouldRunAutoModeForCall,
} from '../permissions/autoMode.js';
import { MAX_TRANSCRIPT_MESSAGES } from '../permissions/classifier-transcript.js';
import {
isApproveOutcome,
recordAllow,
recordBlock,
recordFallbackApprove,
recordUnavailable,
shouldFallback,
} from '../permissions/denialTracking.js';
import { getResponseTextFromParts } from '../utils/generateContentResponseUtilities.js';
@ -1381,12 +1380,15 @@ export class CoreToolScheduler {
if (shouldRunAutoModeForCall(approvalMode, canonicalName)) {
const denialState = this.config.getAutoModeDenialState();
const fallback = shouldFallback(denialState);
// `buildClassifierContents` retains only the most recent 40
// messages; ask the chat client for exactly that tail rather
// than triggering a `structuredClone` of the whole session
// on every non-fast-path AUTO call.
// `buildClassifierContents` retains only the most recent
// MAX_TRANSCRIPT_MESSAGES messages; ask the chat client for
// exactly that tail rather than triggering a
// `structuredClone` of the whole session on every non-
// fast-path AUTO call.
const messages =
this.config.getGeminiClient?.()?.getHistoryTail(40, false) ?? [];
this.config
.getGeminiClient?.()
?.getHistoryTail(MAX_TRANSCRIPT_MESSAGES, false) ?? [];
const decision = await evaluateAutoMode({
ctx: pmCtx,
pmForcedAsk,
@ -1397,36 +1399,26 @@ export class CoreToolScheduler {
skipClassifier: fallback.fallback,
});
const markApproved = () => {
this.config.setAutoModeDenialState(recordAllow(denialState));
this.setToolCallOutcome(
reqInfo.callId,
ToolConfirmationOutcome.ProceedAlways,
);
this.setStatusInternal(reqInfo.callId, 'scheduled');
};
switch (decision.via) {
case 'fast-path:accept-edits':
case 'fast-path:allowlist':
markApproved();
continue;
case 'classifier':
if (!decision.shouldBlock) {
markApproved();
continue;
}
this.config.setAutoModeDenialState(
decision.unavailable
? recordUnavailable(denialState)
: recordBlock(denialState),
const outcome = applyAutoModeDecision(
decision,
this.config,
denialState,
);
switch (outcome.kind) {
case 'approved':
this.setToolCallOutcome(
reqInfo.callId,
ToolConfirmationOutcome.ProceedAlways,
);
this.setStatusInternal(reqInfo.callId, 'scheduled');
continue;
case 'blocked':
this.setStatusInternal(
reqInfo.callId,
'error',
createErrorResponse(
reqInfo,
new Error(formatClassifierBlockMessage(decision)),
new Error(outcome.errorMessage),
ToolErrorType.EXECUTION_DENIED,
),
);
@ -1434,7 +1426,9 @@ export class CoreToolScheduler {
case 'fallback':
// Drop through to the manual-approval flow below. The
// pending dialog tells the user what's being asked;
// operators see the cause in the debug log.
// operators see the cause in the debug log (only when
// fallback was specifically armed by denialTracking —
// a pmForcedAsk fallback isn't an audit-worthy event).
if (fallback.fallback) {
debugLogger.warn(
`Auto mode fallback to manual approval (${fallback.reason}): consecutiveBlock=${denialState.consecutiveBlock}, consecutiveUnavailable=${denialState.consecutiveUnavailable}`,
@ -1442,15 +1436,7 @@ export class CoreToolScheduler {
}
break;
default: {
const _exhaustive: never = decision;
// Surface drift at runtime, not just compile-time. The TS
// exhaustiveness check is bypassable (`as` cast, JS
// interop, partial build), and without this log every
// tool call would silently degrade with zero operator-
// visible signal.
debugLogger.error(
`Auto mode: unrecognised decision.via "${(decision as { via: string }).via}" — falling through to manual approval`,
);
const _exhaustive: never = outcome;
void _exhaustive;
}
}

View file

@ -11,7 +11,9 @@ import {
formatClassifierBlockMessage,
isInSafeToolAllowlist,
passesAcceptEditsFastPath,
shouldRunAutoModeForCall,
} from './autoMode.js';
import { ApprovalMode } from '../config/config.js';
import { ToolNames } from '../tools/tool-names.js';
import type { Config } from '../config/config.js';
import type { PermissionCheckContext } from './types.js';
@ -350,3 +352,55 @@ describe('formatClassifierBlockMessage', () => {
).toBe('Auto mode classifier unavailable; action blocked for safety');
});
});
// ─── shouldRunAutoModeForCall ─────────────────────────────────────────────
describe('shouldRunAutoModeForCall', () => {
// Security-critical gate. Drift here would either silently skip AUTO
// for tools that need it (false negative — bypass) or invoke the
// classifier on tools that must always reach the user
// (false positive — UX break for ask_user_question / exit_plan_mode).
it('returns false when approval mode is not AUTO', () => {
for (const mode of [
ApprovalMode.DEFAULT,
ApprovalMode.PLAN,
ApprovalMode.AUTO_EDIT,
ApprovalMode.YOLO,
]) {
expect(shouldRunAutoModeForCall(mode, ToolNames.SHELL)).toBe(false);
}
});
it('returns true for arbitrary tools when mode is AUTO', () => {
for (const tool of [
ToolNames.SHELL,
ToolNames.EDIT,
ToolNames.WRITE_FILE,
ToolNames.WEB_FETCH,
ToolNames.AGENT,
ToolNames.SKILL,
ToolNames.READ_FILE,
]) {
expect(shouldRunAutoModeForCall(ApprovalMode.AUTO, tool)).toBe(true);
}
});
it('excludes ASK_USER_QUESTION even under AUTO — must always reach the user', () => {
expect(
shouldRunAutoModeForCall(ApprovalMode.AUTO, ToolNames.ASK_USER_QUESTION),
).toBe(false);
});
it('excludes EXIT_PLAN_MODE even under AUTO — plan exits are operator-driven', () => {
expect(
shouldRunAutoModeForCall(ApprovalMode.AUTO, ToolNames.EXIT_PLAN_MODE),
).toBe(false);
});
it('returns false for unknown tool names when not in AUTO', () => {
expect(shouldRunAutoModeForCall(ApprovalMode.DEFAULT, 'unknown_tool')).toBe(
false,
);
});
});

View file

@ -20,9 +20,18 @@
import type { Content } from '@google/genai';
import { ApprovalMode, type Config } from '../config/config.js';
import { ToolNames } from '../tools/tool-names.js';
import { createDebugLogger } from '../utils/debugLogger.js';
import { classifyAction, type ClassifierResult } from './classifier.js';
import {
recordAllow,
recordBlock,
recordUnavailable,
type AutoModeDenialState,
} from './denialTracking.js';
import type { PermissionCheckContext } from './types.js';
const autoModeDebugLogger = createDebugLogger('AUTO_MODE');
/**
* Built-in tools whose any-parameter behavior is safe under the AUTO mode
* classifier's threat model they never write files, never perform network
@ -88,15 +97,38 @@ export function shouldRunAutoModeForCall(
return true;
}
/**
* Paths inside the workspace that nevertheless execute code on subsequent
* tooling operations (git commit, npm install, CI runs, ) and must NOT
* take the acceptEdits fast-path. Without this list, a hostile AGENTS.md
* could instruct the agent to write `.git/hooks/pre-commit` fast-path
* approves (it's in workspace) next `git commit` runs arbitrary code
* without classifier review.
*
* Edits to these paths still pass through the AUTO classifier; users
* who want to allow specific hook/script edits can add an explicit
* `permissions.allow` rule.
*/
const PERSISTENCE_PATH_PATTERNS: readonly RegExp[] = Object.freeze([
/(^|\/)\.git\//, // git config, hooks, alias — covers .git/hooks/* and .git/config
/(^|\/)\.husky\//, // git hooks via husky
/(^|\/)package\.json$/, // npm scripts (root + nested workspaces)
/(^|\/)\.npmrc$/, // registry override → malicious package fetch on next install
/(^|\/)(Makefile|makefile|GNUmakefile)$/, // make targets
/(^|\/)\.?[Jj]ustfile$/, // just task runner
/(^|\/)Taskfile\.ya?ml$/, // go-task
/(^|\/)\.github\/workflows\//, // CI workflow definitions
]);
/**
* Returns true when the pending action is a file edit / write targeting a
* path that lies within the current workspace (cwd + additional directories).
* path that lies within the current workspace (cwd + additional directories)
* AND is NOT in {@link PERSISTENCE_PATH_PATTERNS}.
*
* Symlinks ARE resolved via `WorkspaceContext.isPathWithinWorkspace`, which
* internally calls `fs.realpathSync`. A symlink whose target is outside the
* workspace correctly fails this check and falls through to the classifier
* fail-safe by implementation. (Earlier revisions of this comment
* claimed the opposite; the actual behavior has always been to resolve.)
* fail-safe by implementation.
*
* Caller should only consult this when L4 evaluation returned `'default'`.
*/
@ -106,6 +138,12 @@ export function passesAcceptEditsFastPath(
): boolean {
if (!EDIT_TOOL_NAMES.has(ctx.toolName)) return false;
if (!ctx.filePath) return false;
// Persistence paths (hooks, package.json scripts, CI definitions) must
// never auto-approve via fast-path — they execute code on subsequent
// tooling operations.
if (PERSISTENCE_PATH_PATTERNS.some((p) => p.test(ctx.filePath!))) {
return false;
}
return config.getWorkspaceContext().isPathWithinWorkspace(ctx.filePath);
}
@ -131,6 +169,74 @@ export type AutoModeDecision =
}
| { via: 'fallback' };
/**
* Outcome of {@link applyAutoModeDecision}. Boils the union of
* `AutoModeDecision` plus denial-tracking state updates down to a
* three-way "what should the caller do" instruction so the scheduler /
* ACP paths share one decision handler instead of duplicating the
* switch + state-update boilerplate.
*/
export type AutoModeOutcome =
| { kind: 'approved' }
| { kind: 'blocked'; errorMessage: string }
| { kind: 'fallback' };
/**
* Apply an {@link AutoModeDecision} to denial-tracking state and return
* an outcome the caller can act on. Shared between
* `coreToolScheduler.ts` and `acp-integration/session/Session.ts` the
* switch on `decision.via`, the `recordAllow / recordBlock /
* recordUnavailable` updates, and the formatted block message used to
* all be duplicated line-for-line across the two files. Drift between
* those copies was a recurring class of bug across PR #4151 review
* rounds; this helper makes the two paths share one source of truth.
*
* Callers retain responsibility for the surrounding integration
* (marking the tool call scheduled vs writing an error response,
* logging the fallback reason with denial-state context, etc.) those
* pieces differ between scheduler and Session.
*/
export function applyAutoModeDecision(
decision: AutoModeDecision,
config: Config,
denialState: AutoModeDenialState,
): AutoModeOutcome {
switch (decision.via) {
case 'fast-path:accept-edits':
case 'fast-path:allowlist':
config.setAutoModeDenialState(recordAllow(denialState));
return { kind: 'approved' };
case 'classifier':
if (decision.shouldBlock) {
config.setAutoModeDenialState(
decision.unavailable
? recordUnavailable(denialState)
: recordBlock(denialState),
);
return {
kind: 'blocked',
errorMessage: formatClassifierBlockMessage(decision),
};
}
config.setAutoModeDenialState(recordAllow(denialState));
return { kind: 'approved' };
case 'fallback':
return { kind: 'fallback' };
default: {
const _exhaustive: never = decision;
// Surface drift at runtime — TS exhaustiveness can be bypassed
// via `as` cast / JS interop / partial build. Without this log
// every tool call would silently degrade to manual approval with
// zero operator-visible signal.
autoModeDebugLogger.error(
`Auto mode: unrecognised decision.via "${(decision as { via: string }).via}" — falling through to manual approval`,
);
void _exhaustive;
return { kind: 'fallback' };
}
}
}
/**
* Build the tool-error message the scheduler / ACP session returns when
* the classifier blocks or is unavailable. Shared between

View file

@ -109,6 +109,17 @@ export function buildClassifierSystemPrompt(config: Config): string {
.replace('{{ENVIRONMENT}}', formatSection(BUILTIN_ENVIRONMENT, userEnv));
}
/**
* Per-entry character cap and per-section count cap on user-provided
* hints / environment lines. Documented in `auto-mode.md` ("Each entry
* is capped at 200 characters", "accept up to 50 entries each")
* enforce them here so a hostile or accidental large hint payload
* cannot bloat the classifier system prompt and overflow the fast
* model's context window.
*/
export const MAX_USER_HINT_LENGTH = 200;
export const MAX_USER_HINTS_PER_SECTION = 50;
/**
* Render built-in entries as plain bullets, then append user-provided
* entries as JSON-quoted string literals labelled `user hint`.
@ -130,8 +141,15 @@ function formatSection(
userEntries: readonly string[],
): string {
const lines = builtIn.map((entry) => `- ${entry}`);
for (const entry of userEntries) {
lines.push(`- user hint: ${JSON.stringify(entry)}`);
// Enforce documented caps: take at most MAX_USER_HINTS_PER_SECTION
// entries and truncate each to MAX_USER_HINT_LENGTH characters.
const capped = userEntries.slice(0, MAX_USER_HINTS_PER_SECTION);
for (const entry of capped) {
const truncated =
entry.length > MAX_USER_HINT_LENGTH
? entry.slice(0, MAX_USER_HINT_LENGTH) + '…'
: entry;
lines.push(`- user hint: ${JSON.stringify(truncated)}`);
}
return lines.join('\n');
}

View file

@ -6,7 +6,10 @@
import { describe, it, expect } from 'vitest';
import type { Content } from '@google/genai';
import { buildClassifierContents } from './classifier-transcript.js';
import {
buildClassifierContents,
MAX_TRANSCRIPT_MESSAGES,
} from './classifier-transcript.js';
import {
DeclarativeTool,
type ToolInvocation,
@ -314,4 +317,53 @@ describe('buildClassifierContents', () => {
expect(serialized).toContain('evil.example.com');
expect(serialized).toContain('Prior action: run_shell_command');
});
// ─── MAX_TRANSCRIPT_MESSAGES truncation ─────────────────────────────
// Security-relevant: without truncation, a long session's transcript
// can overflow the fast model's context window, fail-close the
// classifier, and trigger denialTracking. The constant is exported
// so scheduler + Session can request exactly this slice from
// GeminiClient.getHistoryTail — verify the truncation actually fires
// when the input exceeds the window.
it('exports MAX_TRANSCRIPT_MESSAGES so callers can size getHistoryTail correctly', () => {
expect(typeof MAX_TRANSCRIPT_MESSAGES).toBe('number');
expect(MAX_TRANSCRIPT_MESSAGES).toBeGreaterThan(0);
});
it('truncates input to the most recent MAX_TRANSCRIPT_MESSAGES messages', () => {
// Build a history twice the cap; the oldest half should be dropped.
const messages: Content[] = [];
for (let i = 0; i < MAX_TRANSCRIPT_MESSAGES * 2; i++) {
messages.push({
role: 'user',
parts: [{ text: `msg-${i}` }],
});
}
const result = buildClassifierContents(messages, makeRegistry({}), {
toolName: 'read_file',
toolParams: { path: 'x.ts' },
});
const serialized = JSON.stringify(result);
// The oldest message must NOT appear — got dropped by truncation.
expect(serialized).not.toContain('"msg-0"');
// Earliest retained message is at index N (where 2N is total input).
expect(serialized).toContain(`"msg-${MAX_TRANSCRIPT_MESSAGES}"`);
// Most-recent message must appear.
expect(serialized).toContain(`"msg-${MAX_TRANSCRIPT_MESSAGES * 2 - 1}"`);
});
it('passes through history shorter than the cap unchanged', () => {
const messages: Content[] = [
{ role: 'user', parts: [{ text: 'first' }] },
{ role: 'user', parts: [{ text: 'second' }] },
];
const result = buildClassifierContents(messages, makeRegistry({}), {
toolName: 'read_file',
toolParams: { path: 'x.ts' },
});
const serialized = JSON.stringify(result);
expect(serialized).toContain('first');
expect(serialized).toContain('second');
});
});

View file

@ -48,7 +48,15 @@ export interface PendingAction {
* while preserving enough of the recent action chain for the classifier to
* apply its "untrusted tool-output" rule across a multi-step interaction.
*/
const MAX_TRANSCRIPT_MESSAGES = 40;
/**
* Maximum number of session messages forwarded to the classifier as
* context. Exported so the scheduler / ACP session paths can request
* exactly this slice via `getHistoryTail(MAX_TRANSCRIPT_MESSAGES)`
* rather than hardcoding `40` keeping the constant single-sourced
* means tuning the window doesn't require lockstep edits across
* three files.
*/
export const MAX_TRANSCRIPT_MESSAGES = 40;
/**
* Build the `contents` array for the classifier sideQuery call.

View file

@ -12,7 +12,11 @@ vi.mock('../utils/sideQuery.js', () => ({
runSideQuery: (...args: unknown[]) => runSideQueryMock(...args),
}));
import { classifyAction, type ClassifierInput } from './classifier.js';
import {
classifyAction,
sanitizeClassifierReason,
type ClassifierInput,
} from './classifier.js';
import type { Config } from '../config/config.js';
import type { ToolRegistry } from '../tools/tool-registry.js';
@ -211,3 +215,54 @@ describe('classifier configuration', () => {
// Context-overflow detection now delegated to the shared
// `isContextLengthExceededError` utility; tests covering its behavior live
// alongside that module (utils/contextLengthError.test.ts).
describe('sanitizeClassifierReason', () => {
// Security-critical: the classifier reason is LLM-generated and gets
// interpolated into the main model's tool-error message. A hostile
// reason can stage a prompt injection if not sanitized.
it('passes empty / falsy through unchanged', () => {
expect(sanitizeClassifierReason('')).toBe('');
});
it('strips simple pseudo-tags like <system>...</system>', () => {
expect(sanitizeClassifierReason('safe <system>danger</system> tail')).toBe(
'safe danger tail',
);
});
it('iterates strip until stable — no complete <...> tag can survive', () => {
// The threat is a pseudo-tag like `<system>...` confusing the
// downstream model. A single /<[^>]*>/g pass on a nested input
// like `<scr<script>extra>` leaves `>` orphaned tokens which is
// fine — what must NOT survive is any complete `<...>` pair.
const result = sanitizeClassifierReason('<scr<script>extra>payload');
expect(result).not.toMatch(/<[^>]*>/);
});
it('bounds iteration so adversarial inputs cannot create unbounded work', () => {
// 8-iteration cap means the function is O(n) regardless of how the
// attacker structures the input. Even a degenerate string with many
// overlapping tags terminates promptly.
const adversarial = '<a'.repeat(2000) + '>'.repeat(2000);
const t0 = Date.now();
sanitizeClassifierReason(adversarial);
expect(Date.now() - t0).toBeLessThan(1000);
});
it('collapses whitespace and newlines to single spaces', () => {
expect(sanitizeClassifierReason('line1\nline2\n\n\nline3')).toBe(
'line1 line2 line3',
);
});
it('hard-caps length at 200 characters', () => {
expect(sanitizeClassifierReason('a'.repeat(500)).length).toBe(200);
});
it('trims surrounding whitespace after collapse', () => {
expect(sanitizeClassifierReason(' leading trailing ')).toBe(
'leading trailing',
);
});
});

View file

@ -186,6 +186,12 @@ export async function classifyAction(
}
if (!stage1.shouldBlock) {
// Audit-trail at debug level (off by default, on when investigating
// "why was this dangerous command allowed"). Info would be noise on
// every non-fast-path AUTO call; debug is grep-able when needed.
debugLogger.debug(
`ALLOW stage=fast tool=${input.toolName} durationMs=${Date.now() - overallStart}`,
);
return {
shouldBlock: false,
reason: '',
@ -234,6 +240,14 @@ export async function classifyAction(
};
}
// Audit-trail at debug level for the stage-2 verdict — both ALLOW
// (where stage 1 flagged but stage 2 cleared) and the implicit BLOCK
// (stage 2 confirmed). The reason+thinking already carry the full
// explanation; this line just makes the verdict grep-able.
debugLogger.debug(
`${stage2.shouldBlock ? 'BLOCK' : 'ALLOW'} stage=thinking ` +
`tool=${input.toolName} durationMs=${Date.now() - overallStart}`,
);
return {
shouldBlock: stage2.shouldBlock,
// Stage-2 reason is LLM-generated and ends up interpolated into the

View file

@ -11,9 +11,11 @@ export type { PermissionManagerConfig } from './permission-manager.js';
export { extractShellOperations } from './shell-semantics.js';
export type { ShellOperation } from './shell-semantics.js';
export {
applyAutoModeDecision,
evaluateAutoMode,
formatClassifierBlockMessage,
type AutoModeDecision,
type AutoModeOutcome,
type EvaluateAutoModeInput,
SAFE_TOOL_ALLOWLIST,
isInSafeToolAllowlist,
@ -33,3 +35,4 @@ export {
resetDenialState,
shouldFallback,
} from './denialTracking.js';
export { MAX_TRANSCRIPT_MESSAGES } from './classifier-transcript.js';

View file

@ -140,7 +140,7 @@ describe('SubagentManager.buildSubagentContextOverride bound-tool isolation', ()
(parent as any).toolRegistry = parentRegistry;
// Layer 1: actual createApprovalModeOverride (sets the marker).
const upstreamWrapper = await createApprovalModeOverride(
const { config: upstreamWrapper } = await createApprovalModeOverride(
parent,
ApprovalMode.AUTO_EDIT,
);

View file

@ -53,7 +53,7 @@ describe('createApprovalModeOverride bound-tool isolation', () => {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
(parent as any).toolRegistry = parentRegistry;
const child = await createApprovalModeOverride(
const { config: child } = await createApprovalModeOverride(
parent,
ApprovalMode.AUTO_EDIT,
);
@ -71,7 +71,7 @@ describe('createApprovalModeOverride bound-tool isolation', () => {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
(parent as any).toolRegistry = parentRegistry;
const child = await createApprovalModeOverride(
const { config: child } = await createApprovalModeOverride(
parent,
ApprovalMode.AUTO_EDIT,
);
@ -110,7 +110,7 @@ describe('createApprovalModeOverride bound-tool isolation', () => {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
(parent as any).toolRegistry = parentRegistry;
const child = await createApprovalModeOverride(
const { config: child } = await createApprovalModeOverride(
parent,
ApprovalMode.AUTO_EDIT,
);
@ -140,7 +140,10 @@ describe('createApprovalModeOverride bound-tool isolation', () => {
expect(parent.getApprovalMode()).toBe(ApprovalMode.DEFAULT);
const child = await createApprovalModeOverride(parent, ApprovalMode.YOLO);
const { config: child } = await createApprovalModeOverride(
parent,
ApprovalMode.YOLO,
);
expect(child.getApprovalMode()).toBe(ApprovalMode.YOLO);
const childEdit = await child.getToolRegistry().ensureTool(ToolNames.EDIT);
@ -162,7 +165,7 @@ describe('createApprovalModeOverride bound-tool isolation', () => {
// hook is reachable by introspecting the parent registry first.
const beforeNames = parentRegistry.getAllToolNames().sort();
const child = await createApprovalModeOverride(
const { config: child } = await createApprovalModeOverride(
parent,
ApprovalMode.AUTO_EDIT,
);
@ -199,7 +202,7 @@ describe('createApprovalModeOverride bound-tool isolation', () => {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
(parentNonBare as any).toolRegistry = parentNonBareRegistry;
const childNonBare = await createApprovalModeOverride(
const { config: childNonBare } = await createApprovalModeOverride(
parentNonBare,
ApprovalMode.AUTO_EDIT,
);
@ -228,7 +231,7 @@ describe('createApprovalModeOverride bound-tool isolation', () => {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
(parent as any).toolRegistry = parentRegistry;
const upstream = await createApprovalModeOverride(
const { config: upstream } = await createApprovalModeOverride(
parent,
ApprovalMode.AUTO_EDIT,
);

View file

@ -338,6 +338,20 @@ export async function rebuildToolRegistryOnOverride(
ov[TOOL_REGISTRY_REBUILT] = true;
}
/**
* Handle returned by {@link createApprovalModeOverride}.
*
* The `cleanup` callback MUST be invoked in a `finally` block after the
* sub-agent lifecycle ends. It restores the parent PermissionManager's
* dangerous allow rules if and only if this override was responsible
* for stripping them see {@link createApprovalModeOverride} below
* for the cases.
*/
export interface ApprovalModeOverrideHandle {
config: Config;
cleanup: () => void;
}
/**
* Creates a Config override with a different approval mode.
*
@ -348,28 +362,58 @@ export async function rebuildToolRegistryOnOverride(
* instances continue to resolve `this.config` to the parent, defeating
* per-Config isolation of FileReadCache / approval mode for any code
* path that goes through the bound tool.
*
* Returns `{ config, cleanup }`. Callers MUST invoke `cleanup` in a
* `finally` block after the override is no longer in use, otherwise
* the parent's PermissionManager may leak a strip across the sub-agent
* boundary (see strip lifecycle below).
*
* Strip lifecycle for AUTO overrides:
* - parent not in AUTO, override in AUTO: this function strips the
* PARENT's PM (shared via prototype chain the override cannot
* have its own PM without a much bigger refactor). `cleanup`
* restores the strip when the sub-agent finishes, but ONLY if the
* parent hasn't itself entered AUTO in the meantime (in which
* case restoring would undo the parent's own strip).
* - parent already in AUTO, override in AUTO: parent's
* `setApprovalMode` already stripped on its own entry. We don't
* strip again (would be a no-op anyway via sentinel) and don't
* restore on cleanup (lifecycle is parent-owned).
* - override not in AUTO: no strip, no restore.
*/
export async function createApprovalModeOverride(
base: Config,
mode: ApprovalMode,
): Promise<Config> {
): Promise<ApprovalModeOverrideHandle> {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const override = Object.create(base) as any;
override.getApprovalMode = (): ApprovalMode => mode;
await rebuildToolRegistryOnOverride(override as Config, base);
// AUTO mode requires dangerous allow rules (broad Bash / Agent / Skill)
// to be stripped so the classifier still gates them. The normal entry
// hook is `Config.setApprovalMode` (config.ts:~2560); the lightweight
// override path bypasses that, so we trigger the strip directly here.
// The strip is idempotent (sentinel-guarded), shared with the parent
// PermissionManager, and persists until ApprovalMode is toggled away
// from AUTO — same lifecycle as the top-level entry path.
let cleanup: () => void = () => {};
if (mode === ApprovalMode.AUTO) {
base.getPermissionManager?.()?.stripDangerousRulesForAutoMode();
const baseWasAuto = base.getApprovalMode() === ApprovalMode.AUTO;
if (!baseWasAuto) {
// This override is bringing AUTO into a non-AUTO parent. Strip
// dangerous allow rules so the sub-agent's classifier actually
// gates them, then arrange to restore on cleanup.
base.getPermissionManager?.()?.stripDangerousRulesForAutoMode();
cleanup = () => {
// Defensive: parent could have toggled to AUTO during the sub-
// agent's run. In that case parent now owns the strip lifecycle
// (its own `setApprovalMode(AUTO)` hook was responsible) and we
// must NOT restore — that would un-strip the parent's intent.
if (base.getApprovalMode() !== ApprovalMode.AUTO) {
base.getPermissionManager?.()?.restoreDangerousRules();
}
};
}
// baseWasAuto: parent's setApprovalMode already stripped; cleanup
// stays no-op since lifecycle is parent-owned.
}
return override as Config;
return { config: override as Config, cleanup };
}
/**
@ -1330,6 +1374,13 @@ class AgentToolInvocation extends BaseToolInvocation<AgentParams, ToolResult> {
return '';
};
// Hoisted so the outer catch can restore parent PermissionManager
// state when an exception lands between `createApprovalModeOverride`
// and the fg / bg / fork inner finallys (e.g. worktree provisioning
// or `createAgentHeadless` throw). Assigned only after the override
// is created; stays a no-op for any earlier failure.
let restoreParentPM: () => void = () => {};
try {
const isFork = !this.params.subagent_type;
let subagentConfig: SubagentConfig;
@ -1549,10 +1600,11 @@ class AgentToolInvocation extends BaseToolInvocation<AgentParams, ToolResult> {
// resolve `this.config` to the parent, reaching the parent's
// FileReadCache rather than the subagent's. See
// `createApprovalModeOverride` above for details.
const agentConfig = await createApprovalModeOverride(
const { config: agentConfig, cleanup } = await createApprovalModeOverride(
this.config,
resolvedApprovalMode,
);
restoreParentPM = cleanup;
// ── Optional worktree isolation (Phase 2: rebind cwd) ─────────
// Rebind every "where am I?" surface on the agent's Config
@ -1975,6 +2027,11 @@ class AgentToolInvocation extends BaseToolInvocation<AgentParams, ToolResult> {
.getToolRegistry()
.stop()
.catch(() => {});
// Restore parent PermissionManager's dangerous allow rules
// if this AUTO override stripped them. Background path:
// restore fires when the bg agent terminates (complete /
// fail / cancel), not when this outer execute() returns.
restoreParentPM();
}
};
// Wrap in the agent-identity frame so nested `agent` tool calls
@ -2037,6 +2094,11 @@ class AgentToolInvocation extends BaseToolInvocation<AgentParams, ToolResult> {
.getToolRegistry()
.stop()
.catch(() => {});
// Restore parent PM's dangerous allow rules if this AUTO
// override stripped them. Fork-async path: restore fires
// when the fork body terminates, not when the outer
// execute() returns the FORK_PLACEHOLDER_RESULT.
restoreParentPM();
}
});
void runInForkContext(runFramedFork);
@ -2298,6 +2360,11 @@ class AgentToolInvocation extends BaseToolInvocation<AgentParams, ToolResult> {
.getToolRegistry()
.stop()
.catch(() => {});
// Restore parent PermissionManager's dangerous allow rules if
// this AUTO override stripped them on creation. No-op for non-
// AUTO overrides and for AUTO overrides when parent was already
// AUTO. See createApprovalModeOverride strip-lifecycle comment.
restoreParentPM();
}
} catch (error) {
const errorMessage =
@ -2322,6 +2389,18 @@ class AgentToolInvocation extends BaseToolInvocation<AgentParams, ToolResult> {
}
}
// Restore parent PermissionManager if an exception landed between
// createApprovalModeOverride and the inner fg/bg/fork finallys.
// No-op when restoreParentPM is still the hoisted default (e.g.
// when createApprovalModeOverride itself threw).
try {
restoreParentPM();
} catch (restoreError) {
debugLogger.warn(
`[AgentTool] restoreParentPM after error failed: ${restoreError}`,
);
}
const errorDisplay: AgentResultDisplay = {
...this.currentDisplay!,
status: 'failed',

View file

@ -379,10 +379,12 @@ export async function runForkedAgent(
// wrapper's bound tools resolve `this.config.getPermissionManager()`
// through the prototype chain to the scoped wrapper's own override,
// while `this.config.getApprovalMode()` lands on YOLO.
const yoloConfig = await createApprovalModeOverride(
params.config,
ApprovalMode.YOLO,
);
const { config: yoloConfig, cleanup: restoreParentPM } =
await createApprovalModeOverride(params.config, ApprovalMode.YOLO);
// YOLO never triggers strip → restoreParentPM is a no-op. Kept for
// API symmetry with the other createApprovalModeOverride callers; if
// this function ever switches away from YOLO the lifecycle stays
// correct without further refactor.
const filesTouched = new Set<string>();
const emitter = new AgentEventEmitter();
@ -460,5 +462,6 @@ export async function runForkedAgent(
.getToolRegistry()
.stop()
.catch(() => {});
restoreParentPM();
}
}