mirror of
https://github.com/block/goose.git
synced 2026-05-02 21:40:58 +00:00
docs: ml-based prompt injection detection (#6627)
Some checks are pending
Canary / Prepare Version (push) Waiting to run
Canary / build-cli (push) Blocked by required conditions
Canary / Upload Install Script (push) Blocked by required conditions
Canary / bundle-desktop (push) Blocked by required conditions
Canary / bundle-desktop-linux (push) Blocked by required conditions
Canary / bundle-desktop-windows (push) Blocked by required conditions
Canary / Release (push) Blocked by required conditions
CI / changes (push) Waiting to run
CI / Check Rust Code Format (push) Blocked by required conditions
CI / Build and Test Rust Project (push) Blocked by required conditions
CI / Lint Rust Code (push) Blocked by required conditions
CI / Check OpenAPI Schema is Up-to-Date (push) Blocked by required conditions
CI / Test and Lint Electron Desktop App (push) Blocked by required conditions
Deploy Documentation / deploy (push) Waiting to run
Live Provider Tests / check-fork (push) Waiting to run
Live Provider Tests / changes (push) Blocked by required conditions
Live Provider Tests / Build Release Binary (push) Blocked by required conditions
Live Provider Tests / Smoke Tests (push) Blocked by required conditions
Live Provider Tests / Smoke Tests (Code Execution) (push) Blocked by required conditions
Documentation Site Preview / deploy (push) Waiting to run
Documentation Site Preview / cleanup (push) Blocked by required conditions
Publish Docker Image / docker (push) Waiting to run
Scorecard supply-chain security / Scorecard analysis (push) Waiting to run
Some checks are pending
Canary / Prepare Version (push) Waiting to run
Canary / build-cli (push) Blocked by required conditions
Canary / Upload Install Script (push) Blocked by required conditions
Canary / bundle-desktop (push) Blocked by required conditions
Canary / bundle-desktop-linux (push) Blocked by required conditions
Canary / bundle-desktop-windows (push) Blocked by required conditions
Canary / Release (push) Blocked by required conditions
CI / changes (push) Waiting to run
CI / Check Rust Code Format (push) Blocked by required conditions
CI / Build and Test Rust Project (push) Blocked by required conditions
CI / Lint Rust Code (push) Blocked by required conditions
CI / Check OpenAPI Schema is Up-to-Date (push) Blocked by required conditions
CI / Test and Lint Electron Desktop App (push) Blocked by required conditions
Deploy Documentation / deploy (push) Waiting to run
Live Provider Tests / check-fork (push) Waiting to run
Live Provider Tests / changes (push) Blocked by required conditions
Live Provider Tests / Build Release Binary (push) Blocked by required conditions
Live Provider Tests / Smoke Tests (push) Blocked by required conditions
Live Provider Tests / Smoke Tests (Code Execution) (push) Blocked by required conditions
Documentation Site Preview / deploy (push) Waiting to run
Documentation Site Preview / cleanup (push) Blocked by required conditions
Publish Docker Image / docker (push) Waiting to run
Scorecard supply-chain security / Scorecard analysis (push) Waiting to run
This commit is contained in:
parent
c57c2562a1
commit
4578c77576
5 changed files with 84 additions and 21 deletions
|
|
@ -1,4 +1,5 @@
|
|||
---
|
||||
sidebar_position: 1
|
||||
title: Prompt Injection Detection
|
||||
sidebar_label: Prompt Injection Detection
|
||||
description: Protect your workflow by detecting potentially harmful commands before they run.
|
||||
|
|
@ -16,15 +17,17 @@ You can help protect your goose workflows by enabling prompt injection detection
|
|||
- Attempts to access or exfiltrate sensitive data like SSH keys
|
||||
- System modifications that could compromise security
|
||||
|
||||
In addition, you can optionally enable [ML-based scanning](#enhanced-detection-with-machine-learning) using a specified model.
|
||||
|
||||
:::important
|
||||
These checks provide a safeguard, not a guarantee. They detect known patterns but cannot catch all possible threats, especially novel or sophisticated attacks.
|
||||
:::
|
||||
|
||||
## How Detection Works
|
||||
|
||||
When enabled, goose scans tool calls for risky patterns before they run:
|
||||
When enabled, goose uses a multi-layered approach to detect threats before they run:
|
||||
|
||||
1. **Tool call is intercepted and analyzed** - When goose prepares to execute a tool, the security system extracts the tool parameter text and checks it against [threat patterns](https://github.com/block/goose/blob/main/crates/goose/src/security/patterns.rs)
|
||||
1. **Tool call is intercepted and analyzed** - When goose prepares to execute a tool, the security system extracts the tool parameter text and checks it against [threat patterns](https://github.com/block/goose/blob/main/crates/goose/src/security/patterns.rs). If ML-based detection is enabled, it also uses machine learning to analyze the semantic content of the tool call and recent conversation messages to better understand context and reduce false positives.
|
||||
2. **Risk is assessed** - Detected threats are assigned confidence scores
|
||||
3. **Execution pauses** - Threats that exceed your configured threshold need your decision
|
||||
4. **Security alert appears** - The alert displays the confidence level, a description of the finding, and a unique finding ID. For example:
|
||||
|
|
@ -60,15 +63,25 @@ When in doubt, deny.
|
|||
3. Click the `Chat` tab
|
||||
4. Toggle `Enable Prompt Injection Detection` to the on setting
|
||||
5. Optionally adjust the `Detection Threshold` to [configure the sensitivity](#configuring-detection-threshold)
|
||||
6. Optionally enable ML-based detection:
|
||||
1. Toggle `Enable ML-based Detection` to the on setting
|
||||
2. Configure your inference endpoint:
|
||||
- `Endpoint URL`: URL to the classification service (e.g., Hugging Face)
|
||||
- `API Token`: Authentication token if required by your service
|
||||
|
||||
</TabItem>
|
||||
<TabItem value="config" label="goose config file">
|
||||
|
||||
Add these settings to your [`config.yaml`](/docs/guides/config-files):
|
||||
Add security prompt settings to your [`config.yaml`](/docs/guides/config-files):
|
||||
|
||||
```yaml
|
||||
SECURITY_PROMPT_ENABLED: true
|
||||
SECURITY_PROMPT_THRESHOLD: 0.7 # Optional, default is 0.7
|
||||
SECURITY_PROMPT_THRESHOLD: 0.8 # Optional, default is 0.8
|
||||
|
||||
# Optional: Enable ML-based detection (Hugging Face example)
|
||||
SECURITY_PROMPT_CLASSIFIER_ENABLED: true
|
||||
SECURITY_PROMPT_CLASSIFIER_ENDPOINT: "https://router.huggingface.co/hf-inference/models/protectai/deberta-v3-base-prompt-injection-v2"
|
||||
SECURITY_PROMPT_CLASSIFIER_TOKEN: "YOUR_HUGGING_FACE_TOKEN"
|
||||
```
|
||||
|
||||
</TabItem>
|
||||
|
|
@ -92,10 +105,27 @@ The threshold (0.01-1.0) controls how strict detection is:
|
|||
| **0.70-0.90** | Strict | Working with sensitive data or systems |
|
||||
| **0.90-1.00** | Maximum | High-security environments |
|
||||
|
||||
When the injection prompt detection feature is enabled, the default threshold is 0.7 (recommended for most users).
|
||||
When the injection prompt detection feature is enabled, the default threshold is 0.8 (recommended for most users).
|
||||
|
||||
Lower thresholds mean fewer alerts but might miss threats. Higher thresholds catch more potential issues but may flag legitimate operations. You can control this sensitivity/convenience tradeoff based on your needs.
|
||||
|
||||
## Enhanced Detection with Machine Learning
|
||||
|
||||
By default, prompt injection detection uses pattern matching, but you can optionally enable ML-based detection for improved accuracy and fewer false positives.
|
||||
|
||||
ML-based detection:
|
||||
- Analyzes the semantic content of tool calls and recent messages
|
||||
- Detects sophisticated attacks that patterns might miss
|
||||
- Reduces false positives by understanding conversation context
|
||||
- Requires providing a classification endpoint URL and API token (if required)
|
||||
|
||||
:::warning Privacy Consideration
|
||||
When ML-based detection is enabled, tool call content and recent messages are sent to the configured endpoint for analysis.
|
||||
:::
|
||||
|
||||
#### Self-Hosting ML Detection Endpoints
|
||||
If you want to run your own classification endpoint, see the [Classification API Specification](/docs/guides/security/classification-api-spec) for implementation details. The API follows the Hugging Face Inference API format.
|
||||
|
||||
## See Also
|
||||
|
||||
- [goose Permission Modes](/docs/guides/goose-permissions) - Control goose's autonomy level
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue