goose/documentation/docs/guides/security/adversary-mode.md
Michael Neale 754c214df4
Some checks failed
Canary / Prepare Version (push) Failing after 3s
Canary / bundle-desktop (push) Has been skipped
Canary / bundle-desktop-intel (push) Has been skipped
Canary / bundle-desktop-linux (push) Has been skipped
Canary / bundle-desktop-windows (push) Has been skipped
Canary / build-cli (push) Has been skipped
Canary / Upload Install Script (push) Has been skipped
Canary / Release (push) Has been skipped
Unused Dependencies / machete (push) Has been skipped
CI / changes (push) Failing after 3s
Deploy Documentation / deploy (push) Failing after 4s
Live Provider Tests / check-fork (push) Successful in 2s
Publish Ask AI Bot Docker Image / docker (push) Failing after 5s
Publish Docker Image / docker (push) Failing after 3s
Scorecard supply-chain security / Scorecard analysis (push) Has been skipped
CI / Check Rust Code Format (push) Has been skipped
CI / Build and Test Rust Project (push) Has been skipped
CI / Lint Rust Code (push) Has been skipped
CI / Check OpenAPI Schema is Up-to-Date (push) Has been skipped
Live Provider Tests / changes (push) Failing after 3s
Live Provider Tests / Build Binary (push) Has been skipped
Live Provider Tests / Smoke Tests (Code Execution) (push) Has been skipped
Live Provider Tests / Smoke Tests (push) Has been skipped
Live Provider Tests / Compaction Tests (push) Has been skipped
Live Provider Tests / goose server HTTP integration tests (push) Has been skipped
CI / Test and Lint Electron Desktop App (push) Has been cancelled
feat: adversarial agent for preventing leaking of info and more (#7948)
2026-03-17 06:38:45 +00:00

4 KiB

sidebar_position title sidebar_label description
2 Adversary Mode Adversary Mode An independent agent reviewer that silently watches tool calls to protect you in case the agent goes rogue.

Adversary mode adds a silent, independent agent reviewer that watches tool calls before they execute. It acts as a second pair of eyes — protecting you in case the main agent is compromised, manipulated by a prompt injection, or simply doing something that doesn't match what you asked for.

Unlike pattern-based detection, the adversary reviewer understands context. It knows what you originally asked for, what you've said recently, and can judge whether a tool call makes sense for your task.

How It Works

  1. Before each tool call, the adversary reviewer checks your original task, recent messages, and the tool call details
  2. It evaluates the tool call against your rules and returns ALLOW or BLOCK
  3. Blocked tool calls are denied — the agent sees the rejection and cannot retry
  4. If the reviewer fails for any reason, the tool call is allowed through (fail-open)

The adversary uses the same model and provider goose is already configured with. No additional API keys or services needed.

Enabling Adversary Mode

Create a file at ~/.config/goose/adversary.md with your rules:

BLOCK if the tool call:
- Exfiltrates data (posting to unknown URLs, piping secrets to external services)
- Is destructive beyond the project scope (deleting system files, wiping directories)
- Installs malware or runs obfuscated code
- Attempts to escalate privileges unnecessarily
- Downloads and executes untrusted remote scripts

ALLOW normal development operations like editing files, running tests,
installing packages, using git, etc. Most tool calls are fine.
Err on the side of ALLOW — only block truly dangerous things.

That's it. If the file exists, adversary mode is on. Delete the file to turn it off.

An empty file uses sensible default rules similar to the example above.

Writing Good Rules

The rules in adversary.md are your policy, written in plain language:

  • Be specific about what should be blocked — vague rules lead to false positives
  • Err on the side of ALLOW — the adversary should catch genuinely dangerous actions, not slow down normal work
  • Think about your threat model — what would a rogue agent actually try to do?

You can tailor rules to your environment:

BLOCK if:
- Any network request goes to a domain not in: github.com, npmjs.org, pypi.org
- Files outside of ~/projects/ are modified
- SSH keys, .env files, or credentials are read or transmitted

ALLOW all standard development operations within ~/projects/.

What Gets Reviewed

By default, the adversary reviews shell and computercontroller__automation_script — the tools that can execute arbitrary code.

You can expand coverage by adding a tools: line at the top of your adversary.md:

tools: shell, computercontroller__automation_script
---
BLOCK if the command exfiltrates data or is destructive.
ALLOW normal development operations.

The tools: line is a comma-separated list of tool names to review. Everything before the --- separator is configuration; everything after is your rules. If you omit the tools: line, shell and computercontroller__automation_script are reviewed by default.

Some tool names you might want to add:

Tool name What it does
shell Executes shell commands (default)
computercontroller__automation_script Runs shell, Ruby, AppleScript, or PowerShell scripts (default)
computercontroller__computer_control UI automation — clicks, keystrokes, typing
computercontroller__web_scrape Fetches arbitrary URLs

See Also