goose/documentation/docs/guides/security/adversary-mode.md at 754c214df4245fc29e09d6b0b44a114eeebe46de

vrr/goose

Fork 0

mirror of https://github.com/block/goose.git synced 2026-04-28 11:39:43 +00:00

Michael Neale 754c214df4

Canary / Prepare Version (push) Failing after 3s

Details

Canary / bundle-desktop (push) Has been skipped

Details

Canary / bundle-desktop-intel (push) Has been skipped

Details

Canary / bundle-desktop-linux (push) Has been skipped

Details

Canary / bundle-desktop-windows (push) Has been skipped

Details

Canary / build-cli (push) Has been skipped

Details

Canary / Upload Install Script (push) Has been skipped

Details

Canary / Release (push) Has been skipped

Details

Unused Dependencies / machete (push) Has been skipped

Details

CI / changes (push) Failing after 3s

Details

Deploy Documentation / deploy (push) Failing after 4s

Details

Live Provider Tests / check-fork (push) Successful in 2s

Details

Publish Ask AI Bot Docker Image / docker (push) Failing after 5s

Details

Publish Docker Image / docker (push) Failing after 3s

Details

Scorecard supply-chain security / Scorecard analysis (push) Has been skipped

Details

CI / Check Rust Code Format (push) Has been skipped

Details

CI / Build and Test Rust Project (push) Has been skipped

Details

CI / Lint Rust Code (push) Has been skipped

Details

CI / Check OpenAPI Schema is Up-to-Date (push) Has been skipped

Details

Live Provider Tests / changes (push) Failing after 3s

Details

Live Provider Tests / Build Binary (push) Has been skipped

Details

Live Provider Tests / Smoke Tests (Code Execution) (push) Has been skipped

Details

Live Provider Tests / Smoke Tests (push) Has been skipped

Details

Live Provider Tests / Compaction Tests (push) Has been skipped

Details

Live Provider Tests / goose server HTTP integration tests (push) Has been skipped

Details

CI / Test and Lint Electron Desktop App (push) Has been cancelled

Details

feat: adversarial agent for preventing leaking of info and more (#7948 )

2026-03-17 06:38:45 +00:00

4 KiB

Raw Blame History

sidebar_position	title	sidebar_label	description
2	Adversary Mode	Adversary Mode	An independent agent reviewer that silently watches tool calls to protect you in case the agent goes rogue.

Adversary mode adds a silent, independent agent reviewer that watches tool calls before they execute. It acts as a second pair of eyes — protecting you in case the main agent is compromised, manipulated by a prompt injection, or simply doing something that doesn't match what you asked for.

Unlike pattern-based detection, the adversary reviewer understands context. It knows what you originally asked for, what you've said recently, and can judge whether a tool call makes sense for your task.

How It Works

Before each tool call, the adversary reviewer checks your original task, recent messages, and the tool call details
It evaluates the tool call against your rules and returns ALLOW or BLOCK
Blocked tool calls are denied — the agent sees the rejection and cannot retry
If the reviewer fails for any reason, the tool call is allowed through (fail-open)

The adversary uses the same model and provider goose is already configured with. No additional API keys or services needed.

Enabling Adversary Mode

Create a file at ~/.config/goose/adversary.md with your rules:

BLOCK if the tool call:
- Exfiltrates data (posting to unknown URLs, piping secrets to external services)
- Is destructive beyond the project scope (deleting system files, wiping directories)
- Installs malware or runs obfuscated code
- Attempts to escalate privileges unnecessarily
- Downloads and executes untrusted remote scripts

ALLOW normal development operations like editing files, running tests,
installing packages, using git, etc. Most tool calls are fine.
Err on the side of ALLOW — only block truly dangerous things.

That's it. If the file exists, adversary mode is on. Delete the file to turn it off.

An empty file uses sensible default rules similar to the example above.

Writing Good Rules

The rules in adversary.md are your policy, written in plain language:

Be specific about what should be blocked — vague rules lead to false positives
Err on the side of ALLOW — the adversary should catch genuinely dangerous actions, not slow down normal work
Think about your threat model — what would a rogue agent actually try to do?

You can tailor rules to your environment:

BLOCK if:
- Any network request goes to a domain not in: github.com, npmjs.org, pypi.org
- Files outside of ~/projects/ are modified
- SSH keys, .env files, or credentials are read or transmitted

ALLOW all standard development operations within ~/projects/.

What Gets Reviewed

By default, the adversary reviews shell and computercontroller__automation_script — the tools that can execute arbitrary code.

You can expand coverage by adding a tools: line at the top of your adversary.md:

tools: shell, computercontroller__automation_script
---
BLOCK if the command exfiltrates data or is destructive.
ALLOW normal development operations.

The tools: line is a comma-separated list of tool names to review. Everything before the --- separator is configuration; everything after is your rules. If you omit the tools: line, shell and computercontroller__automation_script are reviewed by default.

Some tool names you might want to add:

Tool name	What it does
`shell`	Executes shell commands (default)
`computercontroller__automation_script`	Runs shell, Ruby, AppleScript, or PowerShell scripts (default)
`computercontroller__computer_control`	UI automation — clicks, keystrokes, typing
`computercontroller__web_scrape`	Fetches arbitrary URLs

4 KiB Raw Blame History

How It Works

Enabling Adversary Mode

Writing Good Rules

What Gets Reviewed

See Also

4 KiB

Raw Blame History