Mirror of https://github.com/block/goose.git, synced 2026-05-05 15:30:13 +00:00
feat: adversarial agent for preventing leaking of info and more (#7948)
Some checks failed
Live Provider Tests / check-fork (push) Successful in 2s
Publish Docker Image / docker (push) Failing after 3s
Scorecard supply-chain security / Scorecard analysis (push) Has been skipped
CI / Check OpenAPI Schema is Up-to-Date (push) Has been skipped
Live Provider Tests / changes (push) Failing after 3s
Canary / Prepare Version (push) Failing after 3s
Canary / bundle-desktop (push) Has been skipped
Canary / bundle-desktop-intel (push) Has been skipped
Canary / bundle-desktop-linux (push) Has been skipped
Canary / bundle-desktop-windows (push) Has been skipped
Canary / build-cli (push) Has been skipped
Canary / Upload Install Script (push) Has been skipped
Canary / Release (push) Has been skipped
Unused Dependencies / machete (push) Has been skipped
CI / changes (push) Failing after 3s
Deploy Documentation / deploy (push) Failing after 4s
Publish Ask AI Bot Docker Image / docker (push) Failing after 5s
CI / Check Rust Code Format (push) Has been skipped
CI / Build and Test Rust Project (push) Has been skipped
CI / Lint Rust Code (push) Has been skipped
Live Provider Tests / Build Binary (push) Has been skipped
Live Provider Tests / Smoke Tests (Code Execution) (push) Has been skipped
Live Provider Tests / Smoke Tests (push) Has been skipped
Live Provider Tests / Compaction Tests (push) Has been skipped
Live Provider Tests / goose server HTTP integration tests (push) Has been skipped
CI / Test and Lint Electron Desktop App (push) Has been cancelled
Parent: a0835be10f
Commit: 754c214df4
6 changed files with 904 additions and 0 deletions
88
documentation/docs/guides/security/adversary-mode.md
Normal file
---
sidebar_position: 2
title: Adversary Mode
sidebar_label: Adversary Mode
description: An independent agent reviewer that silently watches tool calls to protect you in case the agent goes rogue.
---

Adversary mode adds a silent, independent agent reviewer that watches tool calls before they execute. It acts as a second pair of eyes, protecting you in case the main agent is compromised, manipulated by a prompt injection, or simply doing something that doesn't match what you asked for.

Unlike [pattern-based detection](/docs/guides/security/prompt-injection-detection), the adversary reviewer understands context. It knows what you originally asked for and what you've said recently, so it can judge whether a tool call makes sense for your task.

## How It Works

1. Before each tool call, the adversary reviewer checks your **original task**, your **recent messages**, and the **tool call details**.
2. It evaluates the tool call against your rules and returns **ALLOW** or **BLOCK**.
3. Blocked tool calls are denied: the agent sees the rejection and cannot retry.
4. If the reviewer fails for any reason, the tool call is allowed through (fail-open).

The adversary uses the same model and provider that goose is already configured with, so no additional API keys or services are needed.
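
The review loop above can be sketched in Python. This is a minimal illustration under stated assumptions, not goose's implementation. `ToolCall`, `build_review_prompt`, `review_tool_call`, and the `ask_model` callable (standing in for the already-configured model and provider) are hypothetical names, and the prompt layout is invented. The sketch shows the three inputs the reviewer sees, the ALLOW/BLOCK decision, and the fail-open path.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToolCall:
    """Hypothetical stand-in for a pending tool call."""
    name: str
    arguments: Dict[str, str]


def build_review_prompt(rules: str, original_task: str,
                        recent_messages: List[str], call: ToolCall) -> str:
    """Assemble the context the reviewer sees (layout is an assumption)."""
    recent = "\n".join(f"- {m}" for m in recent_messages)
    return (
        f"RULES:\n{rules}\n\n"
        f"ORIGINAL TASK:\n{original_task}\n\n"
        f"RECENT MESSAGES:\n{recent}\n\n"
        f"TOOL CALL: {call.name} {call.arguments}\n\n"
        "Reply with ALLOW or BLOCK."
    )


def review_tool_call(rules: str, original_task: str,
                     recent_messages: List[str], call: ToolCall,
                     ask_model: Callable[[str], str]) -> bool:
    """Return True if the call may run, False if the reviewer blocks it.

    `ask_model` is a hypothetical hook for the configured provider.
    """
    prompt = build_review_prompt(rules, original_task, recent_messages, call)
    try:
        verdict = ask_model(prompt).strip().upper()
    except Exception:
        # Fail-open: if the reviewer errors for any reason, allow the call.
        return True
    # Only an explicit BLOCK verdict denies the call (assumed parsing rule).
    return not verdict.startswith("BLOCK")
```

For example, a reviewer reply of `BLOCK: sends ~/.ssh/id_rsa to an external host` makes `review_tool_call` return `False`, while a provider error makes it return `True`. Treating any reply other than an explicit BLOCK as ALLOW is an assumption chosen to match the fail-open rule; the real reviewer may parse replies differently.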

## Enabling Adversary Mode

Create a file at `~/.config/goose/adversary.md` with your rules:

```markdown
BLOCK if the tool call:
- Exfiltrates data (posting to unknown URLs, piping secrets to external services)
- Is destructive beyond the project scope (deleting system files, wiping directories)
- Installs malware or runs obfuscated code
- Attempts to escalate privileges unnecessarily
- Downloads and executes untrusted remote scripts

ALLOW normal development operations like editing files, running tests,
installing packages, using git, etc. Most tool calls are fine.
Err on the side of ALLOW; only block truly dangerous things.
```

That's it. If the file exists, adversary mode is on. Delete the file to turn it off.

An empty file uses sensible default rules similar to the example above.
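
The on/off behavior above reduces to a small file lookup. The Python sketch below is illustrative only. It assumes a whitespace-only file counts as empty, `load_adversary_rules` is a hypothetical name, and `DEFAULT_RULES` is a placeholder paraphrasing the example rules, not goose's actual built-in text.

```python
from pathlib import Path
from typing import Optional

# Placeholder paraphrasing the example above; goose's real defaults may differ.
DEFAULT_RULES = (
    "BLOCK if the tool call exfiltrates data, is destructive beyond the "
    "project scope, installs malware, escalates privileges unnecessarily, "
    "or downloads and executes untrusted remote scripts.\n"
    "ALLOW normal development operations."
)

DEFAULT_PATH = Path.home() / ".config" / "goose" / "adversary.md"


def load_adversary_rules(path: Path = DEFAULT_PATH) -> Optional[str]:
    """Return the active rules, or None when adversary mode is off (hypothetical)."""
    if not path.is_file():
        return None  # No file: adversary mode is disabled.
    text = path.read_text(encoding="utf-8")
    if not text.strip():
        return DEFAULT_RULES  # Empty (or whitespace-only) file: use defaults.
    return text  # Otherwise the file contents are the policy.
```

In this sketch, calling `load_adversary_rules()` after deleting the file returns `None`, which is how it represents adversary mode being turned off.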

## Writing Good Rules

The rules in `adversary.md` are your policy, written in plain language:

- **Be specific** about what should be blocked; vague rules lead to false positives.
- **Err on the side of ALLOW**: the adversary should catch genuinely dangerous actions, not slow down normal work.
- **Think about your threat model**: what would a rogue agent actually try to do?

You can tailor rules to your environment:

```markdown
BLOCK if:
- Any network request goes to a domain not in: github.com, npmjs.org, pypi.org
- Files outside of ~/projects/ are modified
- SSH keys, .env files, or credentials are read or transmitted

ALLOW all standard development operations within ~/projects/.
```

## What Gets Reviewed

By default, the adversary reviews **`shell`** and **`computercontroller__automation_script`**, the tools that can execute arbitrary code.

You can change which tools are reviewed by adding a `tools:` line at the top of your `adversary.md`. The example below lists the two defaults; add more tool names to expand coverage:

```markdown
tools: shell, computercontroller__automation_script
---
BLOCK if the command exfiltrates data or is destructive.
ALLOW normal development operations.
```

The `tools:` line is a comma-separated list of tool names to review. Everything before the `---` separator is configuration; everything after is your rules. If you omit the `tools:` line, `shell` and `computercontroller__automation_script` are reviewed by default.
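
A parser for the `tools:` header and `---` separator might look like the Python sketch below. It is an assumption-laden illustration, not goose's parser: `parse_adversary_md` and `is_reviewed` are hypothetical names, the header is honored only when a `tools:` line appears before the first `---` line, and an empty `tools:` list falls back to the defaults.

```python
from typing import List, Tuple

DEFAULT_TOOLS = ["shell", "computercontroller__automation_script"]


def parse_adversary_md(text: str) -> Tuple[List[str], str]:
    """Split adversary.md into (tool names to review, rules text)."""
    lines = text.splitlines()
    stripped = [line.strip() for line in lines]
    if "---" in stripped:
        sep = stripped.index("---")
        # Everything before the separator is treated as configuration.
        for line in lines[:sep]:
            key, _, value = line.partition(":")
            if key.strip() == "tools":
                tools = [t.strip() for t in value.split(",") if t.strip()]
                rules = "\n".join(lines[sep + 1:]).strip()
                return (tools or list(DEFAULT_TOOLS)), rules
    # No `tools:` header found, so review the defaults and treat
    # the whole file as rules.
    return list(DEFAULT_TOOLS), text.strip()


def is_reviewed(tool_name: str, tools: List[str]) -> bool:
    """A tool call goes to the reviewer only if its name is listed."""
    return tool_name in tools
```

For instance, under this sketch a header of `tools: shell, computercontroller__automation_script, computercontroller__web_scrape` would make `is_reviewed("computercontroller__web_scrape", tools)` return `True`.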

Some tool names you might want to add:

| Tool name | What it does |
|-----------|-------------|
| `shell` | Executes shell commands (default) |
| `computercontroller__automation_script` | Runs shell, Ruby, AppleScript, or PowerShell scripts (default) |
| `computercontroller__computer_control` | UI automation: clicks, keystrokes, typing |
| `computercontroller__web_scrape` | Fetches arbitrary URLs |

## See Also

- [Prompt Injection Detection](/docs/guides/security/prompt-injection-detection): pattern-based detection that complements adversary mode and is always on once enabled
- [goose Permission Modes](/docs/guides/goose-permissions): control goose's autonomy level