fix(core,cli): 7 review findings on PR #4151 round 8

# Critical findings [#3264638738] Sub-agent AUTO override stripped parent's shared PM with no restore — DEFAULT-mode parent spawning an AUTO sub-agent silently lost its dangerous allow rules forever (until mode toggle). Fix: change `createApprovalModeOverride` to return `{config, cleanup}`. The cleanup invokes `restoreDangerousRules()` if and only if this override was responsible for the strip (parent was not already in AUTO at override time and hasn't entered AUTO during the run). All 3 callers (agent.ts foreground + bg + fork-async, background-agent- resume.ts, forkedAgent.ts) updated with cleanup in their existing finally blocks. Outer catch in agent.ts also invokes cleanup so an exception between override creation and the inner finallys doesn't leak strip state. [#3264638739] acceptEdits fast-path auto-approved writes to `.git/hooks/`, `.husky/`, `package.json`, `.npmrc` etc — all paths that execute code on subsequent tooling operations (git commit, npm install, CI) were bypassing the classifier via the workspace-edit fast-path. Hostile AGENTS.md → write hook → next git commit runs arbitrary code. Fix: PERSISTENCE_PATH_PATTERNS blocklist in passesAcceptEditsFastPath. Edits to these paths fall through to the classifier (or to an explicit user allow rule). Scope: code-execution surfaces only (`.git/`, `.husky/`, `package.json`, `.npmrc`, Makefile/justfile/ Taskfile, `.github/workflows/`) — not arbitrary "sensitive" paths. [#3264638748] Classifier ALLOW path had zero observability — operator investigating "why was this dangerous command allowed" had no audit trail. Fix: `debugLogger.debug` (NOT info — skill filter 5 says no always-info on happy paths) on stage-1 ALLOW and stage-2 ALLOW/BLOCK paths. Off by default, grep-able when investigating. # Suggestions [#3264638759] ~80 lines of switch(decision.via) + denial-state updates duplicated between coreToolScheduler.ts and ACP Session.ts. Fix: extract `applyAutoModeDecision(decision, config, denialState) -> AutoModeOutcome` in autoMode.ts. Both callers reduce to a small switch on the outcome.kind (`approved` / `blocked` / `fallback`). Single source of truth for the AUTO decision-handling protocol; drift between CLI and ACP paths is now impossible at the structural level. [#3264638761] Magic `40` hardcoded in scheduler + Session + transcript builder. Fix: export MAX_TRANSCRIPT_MESSAGES from classifier-transcript.ts, import in both call sites. [#3264638767] auto-mode.md promised 200-char per-entry / 50 entries per-section caps for user hints; code in formatSection enforced neither. Hostile workspace settings could bloat classifier system prompt and overflow fast-model context. Fix: enforce both caps in formatSection. Constants exported (MAX_USER_HINT_LENGTH, MAX_USER_HINTS_PER_SECTION). # Test coverage gaps (top-level) [Test coverage] sanitizeClassifierReason, shouldRunAutoModeForCall, and MAX_TRANSCRIPT_MESSAGES truncation had zero coverage. Fix: 7 new test cases in classifier.test.ts (sanitizer), 5 cases in autoMode.test.ts (gate function), 3 cases in classifier-transcript. test.ts (truncation behavior). Total +15 assertions on security- critical surfaces. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 12:44:02 +00:00 · 2026-05-19 17:55:36 +08:00 · 2026-05-19 17:55:36 +08:00 · 1312d5749e
commit 1312d5749e
parent 0bac5568f3
15 changed files with 488 additions and 120 deletions
--- a/packages/cli/src/acp-integration/session/Session.ts
+++ b/packages/cli/src/acp-integration/session/Session.ts
@ -60,13 +60,11 @@ import {
  isPlanModeBlocked,
  abortGoalForStopHookCap,
  formatStopHookBlockingCapWarning,
+  applyAutoModeDecision,
  evaluateAutoMode,
-  formatClassifierBlockMessage,
  isApproveOutcome,
-  recordAllow,
-  recordBlock,
+  MAX_TRANSCRIPT_MESSAGES,
  recordFallbackApprove,
-  recordUnavailable,
  shouldFallback,
  shouldRunAutoModeForCall,
 } from '@qwen-code/qwen-code-core';
@ -1923,13 +1921,15 @@ export class Session implements SessionContext {
      // existing manual-approval flow below.
      if (!autoModeAllowed && shouldRunAutoModeForCall(approvalMode, fc.name)) {
        const denialState = this.config.getAutoModeDenialState();
-        // `buildClassifierContents` retains only the most recent 40
-        // messages; ask the chat client for exactly that tail rather
-        // than triggering a `structuredClone` of the whole session
-        // on every non-fast-path AUTO call. Parallels
-        // coreToolScheduler.ts.
+        // `buildClassifierContents` retains only the most recent
+        // MAX_TRANSCRIPT_MESSAGES messages; ask the chat client for
+        // exactly that tail rather than triggering a `structuredClone`
+        // of the whole session on every non-fast-path AUTO call.
+        // Parallels coreToolScheduler.ts.
        const messages =
-          this.config.getGeminiClient?.()?.getHistoryTail(40, false) ?? [];
+          this.config
+            .getGeminiClient?.()
+            ?.getHistoryTail(MAX_TRANSCRIPT_MESSAGES, false) ?? [];
        const decision = await evaluateAutoMode({
          ctx: pmCtx,
          pmForcedAsk,
@ -1940,45 +1940,27 @@ export class Session implements SessionContext {
          skipClassifier: shouldFallback(denialState).fallback,
        });

-        // Exhaustive switch (parallels the switch(decision.via) block in
-        // coreToolScheduler.ts, inside the evaluateAutoMode result
-        // handling). Using an `if/else if` here would silently fall
-        // through to "allow" if a new `via` variant were added to
-        // `AutoModeDecision` — fail-open. The `_exhaustive: never` arm
-        // makes that drift a compile-time error.
-        switch (decision.via) {
-          case 'fast-path:accept-edits':
-          case 'fast-path:allowlist':
-            this.config.setAutoModeDenialState(recordAllow(denialState));
-            autoModeAllowed = true;
-            break;
-          case 'classifier':
-            if (decision.shouldBlock) {
-              this.config.setAutoModeDenialState(
-                decision.unavailable
-                  ? recordUnavailable(denialState)
-                  : recordBlock(denialState),
-              );
-              return earlyErrorResponse(
-                new Error(formatClassifierBlockMessage(decision)),
-                fc.name,
-              );
-            }
-            this.config.setAutoModeDenialState(recordAllow(denialState));
+        // Apply decision via shared helper — eliminates ~40 lines of
+        // line-for-line duplication with coreToolScheduler.ts and makes
+        // the CLI / ACP paths share one source of truth for the
+        // switch + denial-tracking state updates + exhaustiveness
+        // guard.
+        const outcome = applyAutoModeDecision(
+          decision,
+          this.config,
+          denialState,
+        );
+        switch (outcome.kind) {
+          case 'approved':
            autoModeAllowed = true;
            break;
+          case 'blocked':
+            return earlyErrorResponse(new Error(outcome.errorMessage), fc.name);
          case 'fallback':
            // Drop through to the manual-approval flow below.
            break;
          default: {
-            const _exhaustive: never = decision;
-            // Surface drift at runtime, not just compile-time. The TS
-            // exhaustiveness check is bypassable (`as` cast, JS interop,
-            // partial build), and without this log every tool call would
-            // silently degrade with zero operator-visible signal.
-            debugLogger.error(
-              `Auto mode: unrecognised decision.via "${(decision as { via: string }).via}" — falling through to manual approval`,
-            );
+            const _exhaustive: never = outcome;
            void _exhaustive;
          }
        }
--- a/packages/core/src/agents/background-agent-resume.ts
+++ b/packages/core/src/agents/background-agent-resume.ts
@ -540,10 +540,11 @@ export class BackgroundAgentResumeService {
      // continuing to read the parent's. Reusing `this.config`
      // directly here would short-circuit that isolation. See the
      // matching wrapper in `agent.ts:createApprovalModeOverride`.
-      const agentConfig = await createApprovalModeOverride(
-        this.config,
-        resolvedApprovalMode as ApprovalMode,
-      );
+      const { config: agentConfig, cleanup: restoreParentPM } =
+        await createApprovalModeOverride(
+          this.config,
+          resolvedApprovalMode as ApprovalMode,
+        );
      // eslint-disable-next-line @typescript-eslint/no-explicit-any
      const bgConfig = Object.create(agentConfig) as any;
      bgConfig.getShouldAvoidPermissionPrompts = () => true;
@ -823,6 +824,10 @@ export class BackgroundAgentResumeService {
            .getToolRegistry()
            .stop()
            .catch(() => {});
+          // Restore parent PermissionManager's dangerous allow rules if
+          // this override stripped them. See createApprovalModeOverride
+          // strip-lifecycle comment in agent.ts.
+          restoreParentPM();
        }
      };

--- a/packages/core/src/core/coreToolScheduler.ts
+++ b/packages/core/src/core/coreToolScheduler.ts
@ -65,16 +65,15 @@ import {
  isAutoEditApproved,
 } from './permissionFlow.js';
 import {
+  applyAutoModeDecision,
  evaluateAutoMode,
-  formatClassifierBlockMessage,
  shouldRunAutoModeForCall,
 } from '../permissions/autoMode.js';
+import { MAX_TRANSCRIPT_MESSAGES } from '../permissions/classifier-transcript.js';
 import {
  isApproveOutcome,
  recordAllow,
-  recordBlock,
  recordFallbackApprove,
-  recordUnavailable,
  shouldFallback,
 } from '../permissions/denialTracking.js';
 import { getResponseTextFromParts } from '../utils/generateContentResponseUtilities.js';
@ -1381,12 +1380,15 @@ export class CoreToolScheduler {
          if (shouldRunAutoModeForCall(approvalMode, canonicalName)) {
            const denialState = this.config.getAutoModeDenialState();
            const fallback = shouldFallback(denialState);
-            // `buildClassifierContents` retains only the most recent 40
-            // messages; ask the chat client for exactly that tail rather
-            // than triggering a `structuredClone` of the whole session
-            // on every non-fast-path AUTO call.
+            // `buildClassifierContents` retains only the most recent
+            // MAX_TRANSCRIPT_MESSAGES messages; ask the chat client for
+            // exactly that tail rather than triggering a
+            // `structuredClone` of the whole session on every non-
+            // fast-path AUTO call.
            const messages =
-              this.config.getGeminiClient?.()?.getHistoryTail(40, false) ?? [];
+              this.config
+                .getGeminiClient?.()
+                ?.getHistoryTail(MAX_TRANSCRIPT_MESSAGES, false) ?? [];
            const decision = await evaluateAutoMode({
              ctx: pmCtx,
              pmForcedAsk,
@ -1397,36 +1399,26 @@ export class CoreToolScheduler {
              skipClassifier: fallback.fallback,
            });

-            const markApproved = () => {
-              this.config.setAutoModeDenialState(recordAllow(denialState));
-              this.setToolCallOutcome(
-                reqInfo.callId,
-                ToolConfirmationOutcome.ProceedAlways,
-              );
-              this.setStatusInternal(reqInfo.callId, 'scheduled');
-            };
-
-            switch (decision.via) {
-              case 'fast-path:accept-edits':
-              case 'fast-path:allowlist':
-                markApproved();
-                continue;
-              case 'classifier':
-                if (!decision.shouldBlock) {
-                  markApproved();
-                  continue;
-                }
-                this.config.setAutoModeDenialState(
-                  decision.unavailable
-                    ? recordUnavailable(denialState)
-                    : recordBlock(denialState),
+            const outcome = applyAutoModeDecision(
+              decision,
+              this.config,
+              denialState,
+            );
+            switch (outcome.kind) {
+              case 'approved':
+                this.setToolCallOutcome(
+                  reqInfo.callId,
+                  ToolConfirmationOutcome.ProceedAlways,
                );
+                this.setStatusInternal(reqInfo.callId, 'scheduled');
+                continue;
+              case 'blocked':
                this.setStatusInternal(
                  reqInfo.callId,
                  'error',
                  createErrorResponse(
                    reqInfo,
-                    new Error(formatClassifierBlockMessage(decision)),
+                    new Error(outcome.errorMessage),
                    ToolErrorType.EXECUTION_DENIED,
                  ),
                );
@ -1434,7 +1426,9 @@ export class CoreToolScheduler {
              case 'fallback':
                // Drop through to the manual-approval flow below. The
                // pending dialog tells the user what's being asked;
-                // operators see the cause in the debug log.
+                // operators see the cause in the debug log (only when
+                // fallback was specifically armed by denialTracking —
+                // a pmForcedAsk fallback isn't an audit-worthy event).
                if (fallback.fallback) {
                  debugLogger.warn(
                    `Auto mode fallback to manual approval (${fallback.reason}): consecutiveBlock=${denialState.consecutiveBlock}, consecutiveUnavailable=${denialState.consecutiveUnavailable}`,
@ -1442,15 +1436,7 @@ export class CoreToolScheduler {
                }
                break;
              default: {
-                const _exhaustive: never = decision;
-                // Surface drift at runtime, not just compile-time. The TS
-                // exhaustiveness check is bypassable (`as` cast, JS
-                // interop, partial build), and without this log every
-                // tool call would silently degrade with zero operator-
-                // visible signal.
-                debugLogger.error(
-                  `Auto mode: unrecognised decision.via "${(decision as { via: string }).via}" — falling through to manual approval`,
-                );
+                const _exhaustive: never = outcome;
                void _exhaustive;
              }
            }
--- a/packages/core/src/permissions/autoMode.test.ts
+++ b/packages/core/src/permissions/autoMode.test.ts
@ -11,7 +11,9 @@ import {
  formatClassifierBlockMessage,
  isInSafeToolAllowlist,
  passesAcceptEditsFastPath,
+  shouldRunAutoModeForCall,
 } from './autoMode.js';
+import { ApprovalMode } from '../config/config.js';
 import { ToolNames } from '../tools/tool-names.js';
 import type { Config } from '../config/config.js';
 import type { PermissionCheckContext } from './types.js';
@ -350,3 +352,55 @@ describe('formatClassifierBlockMessage', () => {
    ).toBe('Auto mode classifier unavailable; action blocked for safety');
  });
 });
+
+// ─── shouldRunAutoModeForCall ─────────────────────────────────────────────
+
+describe('shouldRunAutoModeForCall', () => {
+  // Security-critical gate. Drift here would either silently skip AUTO
+  // for tools that need it (false negative — bypass) or invoke the
+  // classifier on tools that must always reach the user
+  // (false positive — UX break for ask_user_question / exit_plan_mode).
+
+  it('returns false when approval mode is not AUTO', () => {
+    for (const mode of [
+      ApprovalMode.DEFAULT,
+      ApprovalMode.PLAN,
+      ApprovalMode.AUTO_EDIT,
+      ApprovalMode.YOLO,
+    ]) {
+      expect(shouldRunAutoModeForCall(mode, ToolNames.SHELL)).toBe(false);
+    }
+  });
+
+  it('returns true for arbitrary tools when mode is AUTO', () => {
+    for (const tool of [
+      ToolNames.SHELL,
+      ToolNames.EDIT,
+      ToolNames.WRITE_FILE,
+      ToolNames.WEB_FETCH,
+      ToolNames.AGENT,
+      ToolNames.SKILL,
+      ToolNames.READ_FILE,
+    ]) {
+      expect(shouldRunAutoModeForCall(ApprovalMode.AUTO, tool)).toBe(true);
+    }
+  });
+
+  it('excludes ASK_USER_QUESTION even under AUTO — must always reach the user', () => {
+    expect(
+      shouldRunAutoModeForCall(ApprovalMode.AUTO, ToolNames.ASK_USER_QUESTION),
+    ).toBe(false);
+  });
+
+  it('excludes EXIT_PLAN_MODE even under AUTO — plan exits are operator-driven', () => {
+    expect(
+      shouldRunAutoModeForCall(ApprovalMode.AUTO, ToolNames.EXIT_PLAN_MODE),
+    ).toBe(false);
+  });
+
+  it('returns false for unknown tool names when not in AUTO', () => {
+    expect(shouldRunAutoModeForCall(ApprovalMode.DEFAULT, 'unknown_tool')).toBe(
+      false,
+    );
+  });
+});
--- a/packages/core/src/permissions/autoMode.ts
+++ b/packages/core/src/permissions/autoMode.ts
@ -20,9 +20,18 @@
 import type { Content } from '@google/genai';
 import { ApprovalMode, type Config } from '../config/config.js';
 import { ToolNames } from '../tools/tool-names.js';
+import { createDebugLogger } from '../utils/debugLogger.js';
 import { classifyAction, type ClassifierResult } from './classifier.js';
+import {
+  recordAllow,
+  recordBlock,
+  recordUnavailable,
+  type AutoModeDenialState,
+} from './denialTracking.js';
 import type { PermissionCheckContext } from './types.js';

+const autoModeDebugLogger = createDebugLogger('AUTO_MODE');
+
 /**
 * Built-in tools whose any-parameter behavior is safe under the AUTO mode
 * classifier's threat model — they never write files, never perform network
@ -88,15 +97,38 @@ export function shouldRunAutoModeForCall(
  return true;
 }

+/**
+ * Paths inside the workspace that nevertheless execute code on subsequent
+ * tooling operations (git commit, npm install, CI runs, …) and must NOT
+ * take the acceptEdits fast-path. Without this list, a hostile AGENTS.md
+ * could instruct the agent to write `.git/hooks/pre-commit` → fast-path
+ * approves (it's in workspace) → next `git commit` runs arbitrary code
+ * without classifier review.
+ *
+ * Edits to these paths still pass through the AUTO classifier; users
+ * who want to allow specific hook/script edits can add an explicit
+ * `permissions.allow` rule.
+ */
+const PERSISTENCE_PATH_PATTERNS: readonly RegExp[] = Object.freeze([
+  /(^|\/)\.git\//, // git config, hooks, alias — covers .git/hooks/* and .git/config
+  /(^|\/)\.husky\//, // git hooks via husky
+  /(^|\/)package\.json$/, // npm scripts (root + nested workspaces)
+  /(^|\/)\.npmrc$/, // registry override → malicious package fetch on next install
+  /(^|\/)(Makefile|makefile|GNUmakefile)$/, // make targets
+  /(^|\/)\.?[Jj]ustfile$/, // just task runner
+  /(^|\/)Taskfile\.ya?ml$/, // go-task
+  /(^|\/)\.github\/workflows\//, // CI workflow definitions
+]);
+
 /**
 * Returns true when the pending action is a file edit / write targeting a
- * path that lies within the current workspace (cwd + additional directories).
+ * path that lies within the current workspace (cwd + additional directories)
+ * AND is NOT in {@link PERSISTENCE_PATH_PATTERNS}.
 *
 * Symlinks ARE resolved via `WorkspaceContext.isPathWithinWorkspace`, which
 * internally calls `fs.realpathSync`. A symlink whose target is outside the
 * workspace correctly fails this check and falls through to the classifier
- * — fail-safe by implementation. (Earlier revisions of this comment
- * claimed the opposite; the actual behavior has always been to resolve.)
+ * — fail-safe by implementation.
 *
 * Caller should only consult this when L4 evaluation returned `'default'`.
 */
@ -106,6 +138,12 @@ export function passesAcceptEditsFastPath(
 ): boolean {
  if (!EDIT_TOOL_NAMES.has(ctx.toolName)) return false;
  if (!ctx.filePath) return false;
+  // Persistence paths (hooks, package.json scripts, CI definitions) must
+  // never auto-approve via fast-path — they execute code on subsequent
+  // tooling operations.
+  if (PERSISTENCE_PATH_PATTERNS.some((p) => p.test(ctx.filePath!))) {
+    return false;
+  }
  return config.getWorkspaceContext().isPathWithinWorkspace(ctx.filePath);
 }

@ -131,6 +169,74 @@ export type AutoModeDecision =
    }
  | { via: 'fallback' };

+/**
+ * Outcome of {@link applyAutoModeDecision}. Boils the union of
+ * `AutoModeDecision` plus denial-tracking state updates down to a
+ * three-way "what should the caller do" instruction so the scheduler /
+ * ACP paths share one decision handler instead of duplicating the
+ * switch + state-update boilerplate.
+ */
+export type AutoModeOutcome =
+  | { kind: 'approved' }
+  | { kind: 'blocked'; errorMessage: string }
+  | { kind: 'fallback' };
+
+/**
+ * Apply an {@link AutoModeDecision} to denial-tracking state and return
+ * an outcome the caller can act on. Shared between
+ * `coreToolScheduler.ts` and `acp-integration/session/Session.ts` — the
+ * switch on `decision.via`, the `recordAllow / recordBlock /
+ * recordUnavailable` updates, and the formatted block message used to
+ * all be duplicated line-for-line across the two files. Drift between
+ * those copies was a recurring class of bug across PR #4151 review
+ * rounds; this helper makes the two paths share one source of truth.
+ *
+ * Callers retain responsibility for the surrounding integration
+ * (marking the tool call scheduled vs writing an error response,
+ * logging the fallback reason with denial-state context, etc.) — those
+ * pieces differ between scheduler and Session.
+ */
+export function applyAutoModeDecision(
+  decision: AutoModeDecision,
+  config: Config,
+  denialState: AutoModeDenialState,
+): AutoModeOutcome {
+  switch (decision.via) {
+    case 'fast-path:accept-edits':
+    case 'fast-path:allowlist':
+      config.setAutoModeDenialState(recordAllow(denialState));
+      return { kind: 'approved' };
+    case 'classifier':
+      if (decision.shouldBlock) {
+        config.setAutoModeDenialState(
+          decision.unavailable
+            ? recordUnavailable(denialState)
+            : recordBlock(denialState),
+        );
+        return {
+          kind: 'blocked',
+          errorMessage: formatClassifierBlockMessage(decision),
+        };
+      }
+      config.setAutoModeDenialState(recordAllow(denialState));
+      return { kind: 'approved' };
+    case 'fallback':
+      return { kind: 'fallback' };
+    default: {
+      const _exhaustive: never = decision;
+      // Surface drift at runtime — TS exhaustiveness can be bypassed
+      // via `as` cast / JS interop / partial build. Without this log
+      // every tool call would silently degrade to manual approval with
+      // zero operator-visible signal.
+      autoModeDebugLogger.error(
+        `Auto mode: unrecognised decision.via "${(decision as { via: string }).via}" — falling through to manual approval`,
+      );
+      void _exhaustive;
+      return { kind: 'fallback' };
+    }
+  }
+}
+
 /**
 * Build the tool-error message the scheduler / ACP session returns when
 * the classifier blocks or is unavailable. Shared between
--- a/packages/core/src/permissions/classifier-prompts/system-prompt.ts
+++ b/packages/core/src/permissions/classifier-prompts/system-prompt.ts
@ -109,6 +109,17 @@ export function buildClassifierSystemPrompt(config: Config): string {
    .replace('{{ENVIRONMENT}}', formatSection(BUILTIN_ENVIRONMENT, userEnv));
 }

+/**
+ * Per-entry character cap and per-section count cap on user-provided
+ * hints / environment lines. Documented in `auto-mode.md` ("Each entry
+ * is capped at 200 characters", "accept up to 50 entries each") —
+ * enforce them here so a hostile or accidental large hint payload
+ * cannot bloat the classifier system prompt and overflow the fast
+ * model's context window.
+ */
+export const MAX_USER_HINT_LENGTH = 200;
+export const MAX_USER_HINTS_PER_SECTION = 50;
+
 /**
 * Render built-in entries as plain bullets, then append user-provided
 * entries as JSON-quoted string literals labelled `user hint`.
@ -130,8 +141,15 @@ function formatSection(
  userEntries: readonly string[],
 ): string {
  const lines = builtIn.map((entry) => `- ${entry}`);
-  for (const entry of userEntries) {
-    lines.push(`- user hint: ${JSON.stringify(entry)}`);
+  // Enforce documented caps: take at most MAX_USER_HINTS_PER_SECTION
+  // entries and truncate each to MAX_USER_HINT_LENGTH characters.
+  const capped = userEntries.slice(0, MAX_USER_HINTS_PER_SECTION);
+  for (const entry of capped) {
+    const truncated =
+      entry.length > MAX_USER_HINT_LENGTH
+        ? entry.slice(0, MAX_USER_HINT_LENGTH) + '…'
+        : entry;
+    lines.push(`- user hint: ${JSON.stringify(truncated)}`);
  }
  return lines.join('\n');
 }
--- a/packages/core/src/permissions/classifier-transcript.test.ts
+++ b/packages/core/src/permissions/classifier-transcript.test.ts
@ -6,7 +6,10 @@

 import { describe, it, expect } from 'vitest';
 import type { Content } from '@google/genai';
-import { buildClassifierContents } from './classifier-transcript.js';
+import {
+  buildClassifierContents,
+  MAX_TRANSCRIPT_MESSAGES,
+} from './classifier-transcript.js';
 import {
  DeclarativeTool,
  type ToolInvocation,
@ -314,4 +317,53 @@ describe('buildClassifierContents', () => {
    expect(serialized).toContain('evil.example.com');
    expect(serialized).toContain('Prior action: run_shell_command');
  });
+
+  // ─── MAX_TRANSCRIPT_MESSAGES truncation ─────────────────────────────
+  // Security-relevant: without truncation, a long session's transcript
+  // can overflow the fast model's context window, fail-close the
+  // classifier, and trigger denialTracking. The constant is exported
+  // so scheduler + Session can request exactly this slice from
+  // GeminiClient.getHistoryTail — verify the truncation actually fires
+  // when the input exceeds the window.
+
+  it('exports MAX_TRANSCRIPT_MESSAGES so callers can size getHistoryTail correctly', () => {
+    expect(typeof MAX_TRANSCRIPT_MESSAGES).toBe('number');
+    expect(MAX_TRANSCRIPT_MESSAGES).toBeGreaterThan(0);
+  });
+
+  it('truncates input to the most recent MAX_TRANSCRIPT_MESSAGES messages', () => {
+    // Build a history twice the cap; the oldest half should be dropped.
+    const messages: Content[] = [];
+    for (let i = 0; i < MAX_TRANSCRIPT_MESSAGES * 2; i++) {
+      messages.push({
+        role: 'user',
+        parts: [{ text: `msg-${i}` }],
+      });
+    }
+    const result = buildClassifierContents(messages, makeRegistry({}), {
+      toolName: 'read_file',
+      toolParams: { path: 'x.ts' },
+    });
+    const serialized = JSON.stringify(result);
+    // The oldest message must NOT appear — got dropped by truncation.
+    expect(serialized).not.toContain('"msg-0"');
+    // Earliest retained message is at index N (where 2N is total input).
+    expect(serialized).toContain(`"msg-${MAX_TRANSCRIPT_MESSAGES}"`);
+    // Most-recent message must appear.
+    expect(serialized).toContain(`"msg-${MAX_TRANSCRIPT_MESSAGES * 2 - 1}"`);
+  });
+
+  it('passes through history shorter than the cap unchanged', () => {
+    const messages: Content[] = [
+      { role: 'user', parts: [{ text: 'first' }] },
+      { role: 'user', parts: [{ text: 'second' }] },
+    ];
+    const result = buildClassifierContents(messages, makeRegistry({}), {
+      toolName: 'read_file',
+      toolParams: { path: 'x.ts' },
+    });
+    const serialized = JSON.stringify(result);
+    expect(serialized).toContain('first');
+    expect(serialized).toContain('second');
+  });
 });
--- a/packages/core/src/permissions/classifier-transcript.ts
+++ b/packages/core/src/permissions/classifier-transcript.ts
@ -48,7 +48,15 @@ export interface PendingAction {
 * while preserving enough of the recent action chain for the classifier to
 * apply its "untrusted tool-output" rule across a multi-step interaction.
 */
-const MAX_TRANSCRIPT_MESSAGES = 40;
+/**
+ * Maximum number of session messages forwarded to the classifier as
+ * context. Exported so the scheduler / ACP session paths can request
+ * exactly this slice via `getHistoryTail(MAX_TRANSCRIPT_MESSAGES)`
+ * rather than hardcoding `40` — keeping the constant single-sourced
+ * means tuning the window doesn't require lockstep edits across
+ * three files.
+ */
+export const MAX_TRANSCRIPT_MESSAGES = 40;

 /**
 * Build the `contents` array for the classifier sideQuery call.
--- a/packages/core/src/permissions/classifier.test.ts
+++ b/packages/core/src/permissions/classifier.test.ts
@ -12,7 +12,11 @@ vi.mock('../utils/sideQuery.js', () => ({
  runSideQuery: (...args: unknown[]) => runSideQueryMock(...args),
 }));

-import { classifyAction, type ClassifierInput } from './classifier.js';
+import {
+  classifyAction,
+  sanitizeClassifierReason,
+  type ClassifierInput,
+} from './classifier.js';
 import type { Config } from '../config/config.js';
 import type { ToolRegistry } from '../tools/tool-registry.js';

@ -211,3 +215,54 @@ describe('classifier configuration', () => {
 // Context-overflow detection now delegated to the shared
 // `isContextLengthExceededError` utility; tests covering its behavior live
 // alongside that module (utils/contextLengthError.test.ts).
+
+describe('sanitizeClassifierReason', () => {
+  // Security-critical: the classifier reason is LLM-generated and gets
+  // interpolated into the main model's tool-error message. A hostile
+  // reason can stage a prompt injection if not sanitized.
+
+  it('passes empty / falsy through unchanged', () => {
+    expect(sanitizeClassifierReason('')).toBe('');
+  });
+
+  it('strips simple pseudo-tags like <system>...</system>', () => {
+    expect(sanitizeClassifierReason('safe <system>danger</system> tail')).toBe(
+      'safe danger tail',
+    );
+  });
+
+  it('iterates strip until stable — no complete <...> tag can survive', () => {
+    // The threat is a pseudo-tag like `<system>...` confusing the
+    // downstream model. A single /<[^>]*>/g pass on a nested input
+    // like `<scr<script>extra>` leaves `>` orphaned tokens which is
+    // fine — what must NOT survive is any complete `<...>` pair.
+    const result = sanitizeClassifierReason('<scr<script>extra>payload');
+    expect(result).not.toMatch(/<[^>]*>/);
+  });
+
+  it('bounds iteration so adversarial inputs cannot create unbounded work', () => {
+    // 8-iteration cap means the function is O(n) regardless of how the
+    // attacker structures the input. Even a degenerate string with many
+    // overlapping tags terminates promptly.
+    const adversarial = '<a'.repeat(2000) + '>'.repeat(2000);
+    const t0 = Date.now();
+    sanitizeClassifierReason(adversarial);
+    expect(Date.now() - t0).toBeLessThan(1000);
+  });
+
+  it('collapses whitespace and newlines to single spaces', () => {
+    expect(sanitizeClassifierReason('line1\nline2\n\n\nline3')).toBe(
+      'line1 line2 line3',
+    );
+  });
+
+  it('hard-caps length at 200 characters', () => {
+    expect(sanitizeClassifierReason('a'.repeat(500)).length).toBe(200);
+  });
+
+  it('trims surrounding whitespace after collapse', () => {
+    expect(sanitizeClassifierReason('   leading  trailing   ')).toBe(
+      'leading trailing',
+    );
+  });
+});
--- a/packages/core/src/permissions/classifier.ts
+++ b/packages/core/src/permissions/classifier.ts
@ -186,6 +186,12 @@ export async function classifyAction(
  }

  if (!stage1.shouldBlock) {
+    // Audit-trail at debug level (off by default, on when investigating
+    // "why was this dangerous command allowed"). Info would be noise on
+    // every non-fast-path AUTO call; debug is grep-able when needed.
+    debugLogger.debug(
+      `ALLOW stage=fast tool=${input.toolName} durationMs=${Date.now() - overallStart}`,
+    );
    return {
      shouldBlock: false,
      reason: '',
@ -234,6 +240,14 @@ export async function classifyAction(
    };
  }

+  // Audit-trail at debug level for the stage-2 verdict — both ALLOW
+  // (where stage 1 flagged but stage 2 cleared) and the implicit BLOCK
+  // (stage 2 confirmed). The reason+thinking already carry the full
+  // explanation; this line just makes the verdict grep-able.
+  debugLogger.debug(
+    `${stage2.shouldBlock ? 'BLOCK' : 'ALLOW'} stage=thinking ` +
+      `tool=${input.toolName} durationMs=${Date.now() - overallStart}`,
+  );
  return {
    shouldBlock: stage2.shouldBlock,
    // Stage-2 reason is LLM-generated and ends up interpolated into the
--- a/packages/core/src/permissions/index.ts
+++ b/packages/core/src/permissions/index.ts
@ -11,9 +11,11 @@ export type { PermissionManagerConfig } from './permission-manager.js';
 export { extractShellOperations } from './shell-semantics.js';
 export type { ShellOperation } from './shell-semantics.js';
 export {
+  applyAutoModeDecision,
  evaluateAutoMode,
  formatClassifierBlockMessage,
  type AutoModeDecision,
+  type AutoModeOutcome,
  type EvaluateAutoModeInput,
  SAFE_TOOL_ALLOWLIST,
  isInSafeToolAllowlist,
@ -33,3 +35,4 @@ export {
  resetDenialState,
  shouldFallback,
 } from './denialTracking.js';
+export { MAX_TRANSCRIPT_MESSAGES } from './classifier-transcript.js';
--- a/packages/core/src/subagents/subagent-manager-override.test.ts
+++ b/packages/core/src/subagents/subagent-manager-override.test.ts
@ -140,7 +140,7 @@ describe('SubagentManager.buildSubagentContextOverride bound-tool isolation', ()
    (parent as any).toolRegistry = parentRegistry;

    // Layer 1: actual createApprovalModeOverride (sets the marker).
-    const upstreamWrapper = await createApprovalModeOverride(
+    const { config: upstreamWrapper } = await createApprovalModeOverride(
      parent,
      ApprovalMode.AUTO_EDIT,
    );
--- a/packages/core/src/tools/agent/agent-override.test.ts
+++ b/packages/core/src/tools/agent/agent-override.test.ts
@ -53,7 +53,7 @@ describe('createApprovalModeOverride bound-tool isolation', () => {
    // eslint-disable-next-line @typescript-eslint/no-explicit-any
    (parent as any).toolRegistry = parentRegistry;

-    const child = await createApprovalModeOverride(
+    const { config: child } = await createApprovalModeOverride(
      parent,
      ApprovalMode.AUTO_EDIT,
    );
@ -71,7 +71,7 @@ describe('createApprovalModeOverride bound-tool isolation', () => {
    // eslint-disable-next-line @typescript-eslint/no-explicit-any
    (parent as any).toolRegistry = parentRegistry;

-    const child = await createApprovalModeOverride(
+    const { config: child } = await createApprovalModeOverride(
      parent,
      ApprovalMode.AUTO_EDIT,
    );
@ -110,7 +110,7 @@ describe('createApprovalModeOverride bound-tool isolation', () => {
    // eslint-disable-next-line @typescript-eslint/no-explicit-any
    (parent as any).toolRegistry = parentRegistry;

-    const child = await createApprovalModeOverride(
+    const { config: child } = await createApprovalModeOverride(
      parent,
      ApprovalMode.AUTO_EDIT,
    );
@ -140,7 +140,10 @@ describe('createApprovalModeOverride bound-tool isolation', () => {

    expect(parent.getApprovalMode()).toBe(ApprovalMode.DEFAULT);

-    const child = await createApprovalModeOverride(parent, ApprovalMode.YOLO);
+    const { config: child } = await createApprovalModeOverride(
+      parent,
+      ApprovalMode.YOLO,
+    );
    expect(child.getApprovalMode()).toBe(ApprovalMode.YOLO);

    const childEdit = await child.getToolRegistry().ensureTool(ToolNames.EDIT);
@ -162,7 +165,7 @@ describe('createApprovalModeOverride bound-tool isolation', () => {
    // hook is reachable by introspecting the parent registry first.
    const beforeNames = parentRegistry.getAllToolNames().sort();

-    const child = await createApprovalModeOverride(
+    const { config: child } = await createApprovalModeOverride(
      parent,
      ApprovalMode.AUTO_EDIT,
    );
@ -199,7 +202,7 @@ describe('createApprovalModeOverride bound-tool isolation', () => {
    // eslint-disable-next-line @typescript-eslint/no-explicit-any
    (parentNonBare as any).toolRegistry = parentNonBareRegistry;

-    const childNonBare = await createApprovalModeOverride(
+    const { config: childNonBare } = await createApprovalModeOverride(
      parentNonBare,
      ApprovalMode.AUTO_EDIT,
    );
@ -228,7 +231,7 @@ describe('createApprovalModeOverride bound-tool isolation', () => {
      // eslint-disable-next-line @typescript-eslint/no-explicit-any
      (parent as any).toolRegistry = parentRegistry;

-      const upstream = await createApprovalModeOverride(
+      const { config: upstream } = await createApprovalModeOverride(
        parent,
        ApprovalMode.AUTO_EDIT,
      );
--- a/packages/core/src/tools/agent/agent.ts
+++ b/packages/core/src/tools/agent/agent.ts
@ -338,6 +338,20 @@ export async function rebuildToolRegistryOnOverride(
  ov[TOOL_REGISTRY_REBUILT] = true;
 }

+/**
+ * Handle returned by {@link createApprovalModeOverride}.
+ *
+ * The `cleanup` callback MUST be invoked in a `finally` block after the
+ * sub-agent lifecycle ends. It restores the parent PermissionManager's
+ * dangerous allow rules if and only if this override was responsible
+ * for stripping them — see {@link createApprovalModeOverride} below
+ * for the cases.
+ */
+export interface ApprovalModeOverrideHandle {
+  config: Config;
+  cleanup: () => void;
+}
+
 /**
 * Creates a Config override with a different approval mode.
 *
@ -348,28 +362,58 @@ export async function rebuildToolRegistryOnOverride(
 * instances continue to resolve `this.config` to the parent, defeating
 * per-Config isolation of FileReadCache / approval mode for any code
 * path that goes through the bound tool.
+ *
+ * Returns `{ config, cleanup }`. Callers MUST invoke `cleanup` in a
+ * `finally` block after the override is no longer in use, otherwise
+ * the parent's PermissionManager may leak a strip across the sub-agent
+ * boundary (see strip lifecycle below).
+ *
+ * Strip lifecycle for AUTO overrides:
+ *   - parent not in AUTO, override in AUTO: this function strips the
+ *     PARENT's PM (shared via prototype chain — the override cannot
+ *     have its own PM without a much bigger refactor). `cleanup`
+ *     restores the strip when the sub-agent finishes, but ONLY if the
+ *     parent hasn't itself entered AUTO in the meantime (in which
+ *     case restoring would undo the parent's own strip).
+ *   - parent already in AUTO, override in AUTO: parent's
+ *     `setApprovalMode` already stripped on its own entry. We don't
+ *     strip again (would be a no-op anyway via sentinel) and don't
+ *     restore on cleanup (lifecycle is parent-owned).
+ *   - override not in AUTO: no strip, no restore.
 */
 export async function createApprovalModeOverride(
  base: Config,
  mode: ApprovalMode,
-): Promise<Config> {
+): Promise<ApprovalModeOverrideHandle> {
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  const override = Object.create(base) as any;
  override.getApprovalMode = (): ApprovalMode => mode;
  await rebuildToolRegistryOnOverride(override as Config, base);

-  // AUTO mode requires dangerous allow rules (broad Bash / Agent / Skill)
-  // to be stripped so the classifier still gates them. The normal entry
-  // hook is `Config.setApprovalMode` (config.ts:~2560); the lightweight
-  // override path bypasses that, so we trigger the strip directly here.
-  // The strip is idempotent (sentinel-guarded), shared with the parent
-  // PermissionManager, and persists until ApprovalMode is toggled away
-  // from AUTO — same lifecycle as the top-level entry path.
+  let cleanup: () => void = () => {};
+
  if (mode === ApprovalMode.AUTO) {
-    base.getPermissionManager?.()?.stripDangerousRulesForAutoMode();
+    const baseWasAuto = base.getApprovalMode() === ApprovalMode.AUTO;
+    if (!baseWasAuto) {
+      // This override is bringing AUTO into a non-AUTO parent. Strip
+      // dangerous allow rules so the sub-agent's classifier actually
+      // gates them, then arrange to restore on cleanup.
+      base.getPermissionManager?.()?.stripDangerousRulesForAutoMode();
+      cleanup = () => {
+        // Defensive: parent could have toggled to AUTO during the sub-
+        // agent's run. In that case parent now owns the strip lifecycle
+        // (its own `setApprovalMode(AUTO)` hook was responsible) and we
+        // must NOT restore — that would un-strip the parent's intent.
+        if (base.getApprovalMode() !== ApprovalMode.AUTO) {
+          base.getPermissionManager?.()?.restoreDangerousRules();
+        }
+      };
+    }
+    // baseWasAuto: parent's setApprovalMode already stripped; cleanup
+    // stays no-op since lifecycle is parent-owned.
  }

-  return override as Config;
+  return { config: override as Config, cleanup };
 }

 /**
@ -1330,6 +1374,13 @@ class AgentToolInvocation extends BaseToolInvocation<AgentParams, ToolResult> {
      return '';
    };

+    // Hoisted so the outer catch can restore parent PermissionManager
+    // state when an exception lands between `createApprovalModeOverride`
+    // and the fg / bg / fork inner finallys (e.g. worktree provisioning
+    // or `createAgentHeadless` throw). Assigned only after the override
+    // is created; stays a no-op for any earlier failure.
+    let restoreParentPM: () => void = () => {};
+
    try {
      const isFork = !this.params.subagent_type;
      let subagentConfig: SubagentConfig;
@ -1549,10 +1600,11 @@ class AgentToolInvocation extends BaseToolInvocation<AgentParams, ToolResult> {
      // resolve `this.config` to the parent, reaching the parent's
      // FileReadCache rather than the subagent's. See
      // `createApprovalModeOverride` above for details.
-      const agentConfig = await createApprovalModeOverride(
+      const { config: agentConfig, cleanup } = await createApprovalModeOverride(
        this.config,
        resolvedApprovalMode,
      );
+      restoreParentPM = cleanup;

      // ── Optional worktree isolation (Phase 2: rebind cwd) ─────────
      // Rebind every "where am I?" surface on the agent's Config
@ -1975,6 +2027,11 @@ class AgentToolInvocation extends BaseToolInvocation<AgentParams, ToolResult> {
              .getToolRegistry()
              .stop()
              .catch(() => {});
+            // Restore parent PermissionManager's dangerous allow rules
+            // if this AUTO override stripped them. Background path:
+            // restore fires when the bg agent terminates (complete /
+            // fail / cancel), not when this outer execute() returns.
+            restoreParentPM();
          }
        };
        // Wrap in the agent-identity frame so nested `agent` tool calls
@ -2037,6 +2094,11 @@ class AgentToolInvocation extends BaseToolInvocation<AgentParams, ToolResult> {
                .getToolRegistry()
                .stop()
                .catch(() => {});
+              // Restore parent PM's dangerous allow rules if this AUTO
+              // override stripped them. Fork-async path: restore fires
+              // when the fork body terminates, not when the outer
+              // execute() returns the FORK_PLACEHOLDER_RESULT.
+              restoreParentPM();
            }
          });
        void runInForkContext(runFramedFork);
@ -2298,6 +2360,11 @@ class AgentToolInvocation extends BaseToolInvocation<AgentParams, ToolResult> {
          .getToolRegistry()
          .stop()
          .catch(() => {});
+        // Restore parent PermissionManager's dangerous allow rules if
+        // this AUTO override stripped them on creation. No-op for non-
+        // AUTO overrides and for AUTO overrides when parent was already
+        // AUTO. See createApprovalModeOverride strip-lifecycle comment.
+        restoreParentPM();
      }
    } catch (error) {
      const errorMessage =
@ -2322,6 +2389,18 @@ class AgentToolInvocation extends BaseToolInvocation<AgentParams, ToolResult> {
        }
      }

+      // Restore parent PermissionManager if an exception landed between
+      // createApprovalModeOverride and the inner fg/bg/fork finallys.
+      // No-op when restoreParentPM is still the hoisted default (e.g.
+      // when createApprovalModeOverride itself threw).
+      try {
+        restoreParentPM();
+      } catch (restoreError) {
+        debugLogger.warn(
+          `[AgentTool] restoreParentPM after error failed: ${restoreError}`,
+        );
+      }
+
      const errorDisplay: AgentResultDisplay = {
        ...this.currentDisplay!,
        status: 'failed',
--- a/packages/core/src/utils/forkedAgent.ts
+++ b/packages/core/src/utils/forkedAgent.ts
@ -379,10 +379,12 @@ export async function runForkedAgent(
  // wrapper's bound tools resolve `this.config.getPermissionManager()`
  // through the prototype chain to the scoped wrapper's own override,
  // while `this.config.getApprovalMode()` lands on YOLO.
-  const yoloConfig = await createApprovalModeOverride(
-    params.config,
-    ApprovalMode.YOLO,
-  );
+  const { config: yoloConfig, cleanup: restoreParentPM } =
+    await createApprovalModeOverride(params.config, ApprovalMode.YOLO);
+  // YOLO never triggers strip → restoreParentPM is a no-op. Kept for
+  // API symmetry with the other createApprovalModeOverride callers; if
+  // this function ever switches away from YOLO the lifecycle stays
+  // correct without further refactor.
  const filesTouched = new Set<string>();

  const emitter = new AgentEventEmitter();
@ -460,5 +462,6 @@ export async function runForkedAgent(
      .getToolRegistry()
      .stop()
      .catch(() => {});
+    restoreParentPM();
  }
 }