pi-mono/packages/coding-agent/docs/models.md
Mario Zechner 7adb8e7634
Some checks are pending
CI / build-check-test (push) Waiting to run
feat(ai): add Together AI provider
2026-05-08 16:44:18 +02:00

474 lines
16 KiB
Markdown

# Custom Models
Add custom providers and models (Ollama, vLLM, LM Studio, proxies) via `~/.pi/agent/models.json`.
## Table of Contents
- [Minimal Example](#minimal-example)
- [Full Example](#full-example)
- [Supported APIs](#supported-apis)
- [Provider Configuration](#provider-configuration)
- [Model Configuration](#model-configuration)
- [Overriding Built-in Providers](#overriding-built-in-providers)
- [Per-model Overrides](#per-model-overrides)
- [Anthropic Messages Compatibility](#anthropic-messages-compatibility)
- [OpenAI Compatibility](#openai-compatibility)
## Minimal Example
For local models (Ollama, LM Studio, vLLM), only `id` is required per model:
```json
{
"providers": {
"ollama": {
"baseUrl": "http://localhost:11434/v1",
"api": "openai-completions",
"apiKey": "ollama",
"models": [
{ "id": "llama3.1:8b" },
{ "id": "qwen2.5-coder:7b" }
]
}
}
}
```
The `apiKey` is required but Ollama ignores it, so any value works.
Some OpenAI-compatible servers do not understand the `developer` role used for reasoning-capable models. For those providers, set `compat.supportsDeveloperRole` to `false` so pi sends the system prompt as a `system` message instead. If the server also does not support `reasoning_effort`, set `compat.supportsReasoningEffort` to `false` too.
You can set `compat` at the provider level to apply to all models, or at the model level to override a specific model. This commonly applies to Ollama, vLLM, SGLang, and similar OpenAI-compatible servers.
```json
{
"providers": {
"ollama": {
"baseUrl": "http://localhost:11434/v1",
"api": "openai-completions",
"apiKey": "ollama",
"compat": {
"supportsDeveloperRole": false,
"supportsReasoningEffort": false
},
"models": [
{
"id": "gpt-oss:20b",
"reasoning": true
}
]
}
}
}
```
## Full Example
Override defaults when you need specific values:
```json
{
"providers": {
"ollama": {
"baseUrl": "http://localhost:11434/v1",
"api": "openai-completions",
"apiKey": "ollama",
"models": [
{
"id": "llama3.1:8b",
"name": "Llama 3.1 8B (Local)",
"reasoning": false,
"input": ["text"],
"contextWindow": 128000,
"maxTokens": 32000,
"cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 }
}
]
}
}
}
```
The file reloads each time you open `/model`. Edit during session; no restart needed.
## Google AI Studio Example
Use `google-generative-ai` with a `baseUrl` to add models from Google AI Studio, including custom Gemma 4 entries:
```json
{
"providers": {
"my-google": {
"baseUrl": "https://generativelanguage.googleapis.com/v1beta",
"api": "google-generative-ai",
"apiKey": "GEMINI_API_KEY",
"models": [
{
"id": "gemma-4-31b-it",
"name": "Gemma 4 31B",
"input": ["text", "image"],
"contextWindow": 262144,
"reasoning": true
}
]
}
}
}
```
The `baseUrl` is required when adding custom models to the `google-generative-ai` API type.
## Supported APIs
| API | Description |
|-----|-------------|
| `openai-completions` | OpenAI Chat Completions (most compatible) |
| `openai-responses` | OpenAI Responses API |
| `anthropic-messages` | Anthropic Messages API |
| `google-generative-ai` | Google Generative AI |
Set `api` at provider level (default for all models) or model level (override per model).
## Provider Configuration
| Field | Description |
|-------|-------------|
| `baseUrl` | API endpoint URL |
| `api` | API type (see above) |
| `apiKey` | API key (see value resolution below) |
| `headers` | Custom headers (see value resolution below) |
| `authHeader` | Set `true` to add `Authorization: Bearer <apiKey>` automatically |
| `models` | Array of model configurations |
| `modelOverrides` | Per-model overrides for built-in models on this provider |
### Value Resolution
The `apiKey` and `headers` fields support three formats:
- **Shell command:** `"!command"` executes and uses stdout
```json
"apiKey": "!security find-generic-password -ws 'anthropic'"
"apiKey": "!op read 'op://vault/item/credential'"
```
- **Environment variable:** Uses the value of the named variable
```json
"apiKey": "MY_API_KEY"
```
- **Literal value:** Used directly
```json
"apiKey": "sk-..."
```
For `models.json`, shell commands are resolved at request time. pi intentionally does not apply built-in TTL, stale reuse, or recovery logic for arbitrary commands. Different commands need different caching and failure strategies, and pi cannot infer the right one.
If your command is slow, expensive, rate-limited, or should keep using a previous value on transient failures, wrap it in your own script or command that implements the caching or TTL behavior you want.
`/model` availability checks use configured auth presence and do not execute shell commands.
### Custom Headers
```json
{
"providers": {
"custom-proxy": {
"baseUrl": "https://proxy.example.com/v1",
"apiKey": "MY_API_KEY",
"api": "anthropic-messages",
"headers": {
"x-portkey-api-key": "PORTKEY_API_KEY",
"x-secret": "!op read 'op://vault/item/secret'"
},
"models": [...]
}
}
}
```
## Model Configuration
| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `id` | Yes | — | Model identifier (passed to the API) |
| `name` | No | `id` | Human-readable model label. Used for matching (`--model` patterns) and shown in model details/status text. |
| `api` | No | provider's `api` | Override provider's API for this model |
| `reasoning` | No | `false` | Supports extended thinking |
| `thinkingLevelMap` | No | omitted | Maps pi thinking levels to provider values and marks unsupported levels (see below) |
| `input` | No | `["text"]` | Input types: `["text"]` or `["text", "image"]` |
| `contextWindow` | No | `128000` | Context window size in tokens |
| `maxTokens` | No | `16384` | Maximum output tokens |
| `cost` | No | all zeros | `{"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0}` (per million tokens) |
| `compat` | No | provider `compat` | Provider compatibility overrides. Merged with provider-level `compat` when both are set. |
Current behavior:
- `/model` and `--list-models` list entries by model `id`.
- The configured `name` is used for model matching and detail/status text.
### Thinking Level Map
Use `thinkingLevelMap` on a model to describe model-specific thinking controls. Keys are pi thinking levels: `off`, `minimal`, `low`, `medium`, `high`, `xhigh`.
Values are tristate:
| Value | Meaning |
|-------|---------|
| omitted | Level is supported and uses the provider's default mapping |
| string | Level is supported and this value is sent to the provider |
| `null` | Level is unsupported and hidden/skipped/clamped away |
Example for a model that only supports off, high, and max reasoning:
```json
{
"id": "deepseek-v4-pro",
"reasoning": true,
"thinkingLevelMap": {
"minimal": null,
"low": null,
"medium": null,
"high": "high",
"xhigh": "max"
}
}
```
Example for a model where thinking cannot be disabled:
```json
{
"id": "always-thinking-model",
"reasoning": true,
"thinkingLevelMap": {
"off": null
}
}
```
Migration: older configs that used `compat.reasoningEffortMap` should move that mapping to model-level `thinkingLevelMap`. Use `null` for levels that should not appear in the UI.
## Overriding Built-in Providers
Route a built-in provider through a proxy without redefining models:
```json
{
"providers": {
"anthropic": {
"baseUrl": "https://my-proxy.example.com/v1"
}
}
}
```
All built-in Anthropic models remain available. Existing OAuth or API key auth continues to work.
To merge custom models into a built-in provider, include the `models` array:
```json
{
"providers": {
"anthropic": {
"baseUrl": "https://my-proxy.example.com/v1",
"apiKey": "ANTHROPIC_API_KEY",
"api": "anthropic-messages",
"models": [...]
}
}
}
```
Merge semantics:
- Built-in models are kept.
- Custom models are upserted by `id` within the provider.
- If a custom model `id` matches a built-in model `id`, the custom model replaces that built-in model.
- If a custom model `id` is new, it is added alongside built-in models.
## Per-model Overrides
Use `modelOverrides` to customize specific built-in models without replacing the provider's full model list.
```json
{
"providers": {
"openrouter": {
"modelOverrides": {
"anthropic/claude-sonnet-4": {
"name": "Claude Sonnet 4 (Bedrock Route)",
"compat": {
"openRouterRouting": {
"only": ["amazon-bedrock"]
}
}
}
}
}
}
}
```
`modelOverrides` supports these fields per model: `name`, `reasoning`, `input`, `cost` (partial), `contextWindow`, `maxTokens`, `headers`, `compat`.
Behavior notes:
- `modelOverrides` are applied to built-in provider models.
- Unknown model IDs are ignored.
- You can combine provider-level `baseUrl`/`headers` with `modelOverrides`.
- If `models` is also defined for a provider, custom models are merged after built-in overrides. A custom model with the same `id` replaces the overridden built-in model entry.
## Anthropic Messages Compatibility
For providers or proxies using `api: "anthropic-messages"`, use `compat.supportsEagerToolInputStreaming` to control Anthropic fine-grained tool streaming compatibility.
By default pi sends per-tool `eager_input_streaming: true`. If a proxy or Anthropic-compatible backend rejects that field, set `supportsEagerToolInputStreaming` to `false`. Pi will omit `tools[].eager_input_streaming` and send the legacy `fine-grained-tool-streaming-2025-05-14` beta header for tool-enabled requests instead.
```json
{
"providers": {
"anthropic-proxy": {
"baseUrl": "https://proxy.example.com",
"api": "anthropic-messages",
"apiKey": "ANTHROPIC_PROXY_KEY",
"compat": {
"supportsEagerToolInputStreaming": false,
"supportsLongCacheRetention": true
},
"models": [
{
"id": "claude-opus-4-7",
"reasoning": true,
"input": ["text", "image"]
}
]
}
}
}
```
| Field | Description |
|-------|-------------|
| `supportsEagerToolInputStreaming` | Whether the provider accepts per-tool `eager_input_streaming`. Default: `true`. Set to `false` to omit that field and use the legacy fine-grained tool streaming beta header on tool-enabled requests. |
| `supportsLongCacheRetention` | Whether the provider accepts Anthropic long cache retention (`cache_control.ttl: "1h"`) when cache retention is `long`. Default: `true`. |
## OpenAI Compatibility
For providers with partial OpenAI compatibility, use the `compat` field.
- Provider-level `compat` applies defaults to all models under that provider.
- Model-level `compat` overrides provider-level values for that model.
```json
{
"providers": {
"local-llm": {
"baseUrl": "http://localhost:8080/v1",
"api": "openai-completions",
"compat": {
"supportsUsageInStreaming": false,
"maxTokensField": "max_tokens"
},
"models": [...]
}
}
}
```
| Field | Description |
|-------|-------------|
| `supportsStore` | Provider supports `store` field |
| `supportsDeveloperRole` | Use `developer` vs `system` role |
| `supportsReasoningEffort` | Support for `reasoning_effort` parameter |
| `supportsUsageInStreaming` | Supports `stream_options: { include_usage: true }` (default: `true`) |
| `maxTokensField` | Use `max_completion_tokens` or `max_tokens` |
| `requiresToolResultName` | Include `name` on tool result messages |
| `requiresAssistantAfterToolResult` | Insert an assistant message before a user message after tool results |
| `requiresThinkingAsText` | Convert thinking blocks to plain text |
| `requiresReasoningContentOnAssistantMessages` | Include empty `reasoning_content` on all replayed assistant messages when reasoning is enabled |
| `thinkingFormat` | Use `reasoning_effort`, `openrouter`, `deepseek`, `together`, `zai`, `qwen`, or `qwen-chat-template` thinking parameters |
| `cacheControlFormat` | Use Anthropic-style `cache_control` markers on the system prompt, last tool definition, and last user/assistant text content. Currently only `anthropic` is supported. |
| `supportsStrictMode` | Include the `strict` field in tool definitions |
| `supportsLongCacheRetention` | Whether the provider accepts long cache retention when cache retention is `long`: `prompt_cache_retention: "24h"` for OpenAI prompt caching, or `cache_control.ttl: "1h"` when `cacheControlFormat` is `anthropic`. Default: `true`. |
| `openRouterRouting` | OpenRouter provider routing preferences. This object is sent as-is in the `provider` field of the [OpenRouter API request](https://openrouter.ai/docs/guides/routing/provider-selection). |
| `vercelGatewayRouting` | Vercel AI Gateway routing config for provider selection (`only`, `order`) |
`openrouter` uses `reasoning: { effort }`. `together` uses `reasoning: { enabled }` and also `reasoning_effort` when `supportsReasoningEffort` is enabled. `qwen` uses top-level `enable_thinking`. Use `qwen-chat-template` for local Qwen-compatible servers that require `chat_template_kwargs.enable_thinking`.
`cacheControlFormat: "anthropic"` is for OpenAI-compatible providers that expose Anthropic-style prompt caching through `cache_control` markers on text content and tool definitions.
Example:
```json
{
"providers": {
"openrouter": {
"baseUrl": "https://openrouter.ai/api/v1",
"apiKey": "OPENROUTER_API_KEY",
"api": "openai-completions",
"models": [
{
"id": "openrouter/anthropic/claude-3.5-sonnet",
"name": "OpenRouter Claude 3.5 Sonnet",
"compat": {
"openRouterRouting": {
"allow_fallbacks": true,
"require_parameters": false,
"data_collection": "deny",
"zdr": true,
"enforce_distillable_text": false,
"order": ["anthropic", "amazon-bedrock", "google-vertex"],
"only": ["anthropic", "amazon-bedrock"],
"ignore": ["gmicloud", "friendli"],
"quantizations": ["fp16", "bf16"],
"sort": {
"by": "price",
"partition": "model"
},
"max_price": {
"prompt": 10,
"completion": 20
},
"preferred_min_throughput": {
"p50": 100,
"p90": 50
},
"preferred_max_latency": {
"p50": 1,
"p90": 3,
"p99": 5
}
}
}
}
]
}
}
}
```
Vercel AI Gateway example:
```json
{
"providers": {
"vercel-ai-gateway": {
"baseUrl": "https://ai-gateway.vercel.sh/v1",
"apiKey": "AI_GATEWAY_API_KEY",
"api": "openai-completions",
"models": [
{
"id": "moonshotai/kimi-k2.5",
"name": "Kimi K2.5 (Fireworks via Vercel)",
"reasoning": true,
"input": ["text", "image"],
"cost": { "input": 0.6, "output": 3, "cacheRead": 0, "cacheWrite": 0 },
"contextWindow": 262144,
"maxTokens": 262144,
"compat": {
"vercelGatewayRouting": {
"only": ["fireworks", "novita"],
"order": ["fireworks", "novita"]
}
}
}
]
}
}
}
```