mirror of
https://github.com/block/goose.git
synced 2026-05-02 21:40:58 +00:00
docs: ml-based prompt injection detection (#6627)
Some checks are pending
Canary / Prepare Version (push) Waiting to run
Canary / build-cli (push) Blocked by required conditions
Canary / Upload Install Script (push) Blocked by required conditions
Canary / bundle-desktop (push) Blocked by required conditions
Canary / bundle-desktop-linux (push) Blocked by required conditions
Canary / bundle-desktop-windows (push) Blocked by required conditions
Canary / Release (push) Blocked by required conditions
CI / changes (push) Waiting to run
CI / Check Rust Code Format (push) Blocked by required conditions
CI / Build and Test Rust Project (push) Blocked by required conditions
CI / Lint Rust Code (push) Blocked by required conditions
CI / Check OpenAPI Schema is Up-to-Date (push) Blocked by required conditions
CI / Test and Lint Electron Desktop App (push) Blocked by required conditions
Deploy Documentation / deploy (push) Waiting to run
Live Provider Tests / check-fork (push) Waiting to run
Live Provider Tests / changes (push) Blocked by required conditions
Live Provider Tests / Build Release Binary (push) Blocked by required conditions
Live Provider Tests / Smoke Tests (push) Blocked by required conditions
Live Provider Tests / Smoke Tests (Code Execution) (push) Blocked by required conditions
Documentation Site Preview / deploy (push) Waiting to run
Documentation Site Preview / cleanup (push) Blocked by required conditions
Publish Docker Image / docker (push) Waiting to run
Scorecard supply-chain security / Scorecard analysis (push) Waiting to run
This commit is contained in:
parent
c57c2562a1
commit
4578c77576
5 changed files with 84 additions and 21 deletions
@@ -1,19 +1,24 @@
---
sidebar_position: 2
title: Classification API Specification
unlisted: true
description: API specification for self-hosting ML-based prompt injection detection endpoints.
---

This document defines the API that Goose uses for ML-based prompt injection detection.
This API specification defines the API that goose uses for ML-based [prompt injection detection](/docs/guides/security/prompt-injection-detection).

## Overview
:::info For Self-Hosting Only
This API specification is intended as a reference for users who want to self-host their own model and classification endpoint.

Goose requires a classification endpoint that can analyze text and return a score indicating the likelihood of prompt injection. This API follows the **HuggingFace Inference API format** for text classification, making it compatible with [HuggingFace Inference Endpoints](https://huggingface.co/docs/inference-providers/providers/hf-inference).
If you're using an existing inference service like Hugging Face, you can just configure it in your [prompt injection detection](/docs/guides/security/prompt-injection-detection) settings.
:::

goose requires a classification endpoint that can analyze text and return a score indicating the likelihood of prompt injection. This API follows the Hugging Face Inference API format for text classification, making it compatible with [Hugging Face Inference Endpoints](https://huggingface.co/docs/inference-providers/providers/hf-inference).

## Security & Privacy Considerations
**Warning:** When using ML-based prompt injection detection, all tool call content and user messages sent for classification will be transmitted to the configured endpoint. This may include sensitive or confidential information.
- If you use an external or third-party endpoint (e.g., HuggingFace Inference API, cloud-hosted models), your data will be sent over the network and processed by that service.
- If you use an external or third-party endpoint (e.g., Hugging Face Inference API, cloud-hosted models), your data will be sent over the network and processed by that service.
- Consider the sensitivity of your data before enabling ML-based detection or selecting an endpoint.
- For highly sensitive or regulated data, use a self-hosted endpoint, run BERT models locally (see reference implementation) or ensure your chosen provider meets your security and compliance requirements.
- For highly sensitive or regulated data, use a self-hosted endpoint, run BERT models locally or ensure your chosen provider meets your security and compliance requirements.
- Review the endpoint's privacy policy and data handling practices.

## Endpoint

@@ -22,7 +27,7 @@ Goose requires a classification endpoint that can analyze text and return a scor

Analyzes text for prompt injection and returns classification results.

**Note:** The endpoint path can be configured. For HuggingFace, it's typically `/models/{model-id}`. For custom implementations, it can be any path (e.g., `/classify`, `/v1/classify`).
**Note:** The endpoint path can be configured. For Hugging Face, it's typically `/models/{model-id}`. For custom implementations, it can be any path (e.g., `/classify`, `/v1/classify`).

#### Request

@@ -68,17 +73,17 @@ Analyzes text for prompt injection and returns classification results.
- `"SAFE"` or `"LABEL_0"`: Indicates safe/benign text
- Implementations SHOULD return results sorted by score (highest first)

**Goose's Usage:**
- Goose looks for the label with the highest score
- If the top label is "INJECTION" (or "LABEL_1"), the score is used as the injection confidence
- If the top label is "SAFE" (or "LABEL_0"), Goose uses `1.0 - score` as the injection confidence
**goose's Usage:**
- goose looks for the label with the highest score
- If the top label is `"INJECTION"` (or `"LABEL_1"`), the score is used as the injection confidence
- If the top label is `"SAFE"` (or `"LABEL_0"`), goose uses `1.0 - score` as the injection confidence
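
The usage rules above can be sketched in a few lines of Python. This is only an illustration, not goose's actual implementation: the function name `injection_confidence` is hypothetical, the input is assumed to be a flat list of `{"label", "score"}` objects (the response format itself is not shown in this diff), and raising an error on an unrecognized label is an assumption rather than documented behavior.

```python
from typing import Dict, List, Union

# Label names taken from the specification text above.
INJECTION_LABELS = {"INJECTION", "LABEL_1"}
SAFE_LABELS = {"SAFE", "LABEL_0"}


def injection_confidence(results: List[Dict[str, Union[str, float]]]) -> float:
    """Derive an injection confidence from classification results.

    Hypothetical helper: `results` is assumed to be a list of
    {"label": str, "score": float} dictionaries.
    """
    if not results:
        raise ValueError("empty classification result")

    # Pick the highest-scoring label explicitly instead of trusting the order,
    # because sorting by score is only a SHOULD for implementations.
    top = max(results, key=lambda item: float(item["score"]))
    label = str(top["label"])
    score = float(top["score"])

    if label in INJECTION_LABELS:
        return score
    if label in SAFE_LABELS:
        return 1.0 - score

    # Assumption: unknown labels are treated as an error in this sketch.
    raise ValueError(f"unrecognized label: {label!r}")
```

For example, a top result of `"SAFE"` with score `0.8` yields an injection confidence of about `0.2`, while a top result of `"INJECTION"` with score `0.9` yields `0.9`.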

#### Status Codes

- `200 OK`: Successful classification
- `400 Bad Request`: Invalid request format
- `500 Internal Server Error`: Classification failed
- `503 Service Unavailable`: Model is loading (HuggingFace specific)
- `503 Service Unavailable`: Model is loading (Hugging Face specific)
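
A caller of a self-hosted endpoint might map these status codes to follow-up actions along the following lines. This is a hypothetical client-side policy written for illustration: the `Action` enum and `action_for_status` function are invented names, and nothing in this diff states that goose handles the codes this way.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical follow-up actions for a classification response."""

    USE_RESULT = "use_result"          # 200: parse the classification result
    FIX_REQUEST = "fix_request"        # 400: the request body is malformed
    RETRY_LATER = "retry_later"        # 503: the model is still loading
    REPORT_FAILURE = "report_failure"  # 500 or any status not listed above


def action_for_status(status: int) -> Action:
    """Map an HTTP status code from the endpoint to a follow-up action."""
    if status == 200:
        return Action.USE_RESULT
    if status == 400:
        return Action.FIX_REQUEST
    if status == 503:
        return Action.RETRY_LATER
    # Assumption: 500 and undocumented codes are surfaced as failures.
    return Action.REPORT_FAILURE
```

Treating `503` as retryable follows from its description as a model-loading state. Whether to retry on `500` is a design choice that this sketch leaves to the caller.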

#### Example
