Built-in `xiaomi` provider now targets the API billing endpoint (https://api.xiaomimimo.com/anthropic) — a single stable URL for keys issued at platform.xiaomimimo.com. The Token Plan endpoints are exposed as three sibling providers, each with its own env var:
- xiaomi-token-plan-cn: XIAOMI_TOKEN_PLAN_CN_API_KEY
- xiaomi-token-plan-ams: XIAOMI_TOKEN_PLAN_AMS_API_KEY
- xiaomi-token-plan-sgp: XIAOMI_TOKEN_PLAN_SGP_API_KEY
BREAKING CHANGE: users who previously set XIAOMI_API_KEY against the Token Plan AMS endpoint must move to xiaomi-token-plan-ams and set XIAOMI_TOKEN_PLAN_AMS_API_KEY. This also resolves the 401 reported by on #4005, where a platform.xiaomimimo.com key fails against the Token Plan endpoint.
closes#4082
* feat(ai): add Cloudflare AI Gateway as a provider
Routes through Cloudflare's Unified API (`/compat`) for Workers AI and
Anthropic models, and through the provider-specific `/openai` subpath
for OpenAI models so reasoning models (gpt-5.x, o-series) can hit
`/v1/responses` natively. Once `/compat` adds Responses-API support,
the OpenAI subpath can be folded back in.
Catalog layout:
workers-ai/@cf/... -> openai-completions, gateway/.../compat
anthropic/... -> openai-completions, gateway/.../compat
<native-id> -> openai-responses, gateway/.../openai
(gpt-5.1, claude-... no, sorry: gpt-5.x and o-series only;
prefix stripped because the OpenAI SDK posts native ids)
Touches:
packages/ai/src/types.ts add cloudflare-ai-gateway to KnownProvider
packages/ai/src/env-api-keys.ts map to CLOUDFLARE_API_KEY
packages/ai/src/providers/cloudflare.ts add CLOUDFLARE_AI_GATEWAY_COMPAT_BASE_URL
and CLOUDFLARE_AI_GATEWAY_OPENAI_BASE_URL
packages/ai/src/providers/openai-responses.ts one-line dispatch through resolveCloudflareBaseUrl
(matches what openai-completions.ts already does)
packages/ai/scripts/generate-models.ts branch openai/* vs workers-ai/anthropic/*
packages/ai/src/models.generated.ts spliced 34 entries
packages/ai/test/stream.test.ts 3 e2e blocks (one per upstream)
packages/coding-agent/* defaultModelPerProvider, login, env docs,
README, providers.md
Verified end-to-end against a real Cloudflare account with unified
billing: 9/9 e2e tests pass across all three upstreams (Workers AI
Kimi K2.6, OpenAI gpt-5.1 reasoning, Anthropic claude-sonnet-4-5).
* refactor(ai): move AI Gateway User-Agent and per-route session-affinity flag to catalog
Mirrors the same per-model metadata refactor done for Workers AI in the
parent branch. All cloudflare-ai-gateway entries get the User-Agent
header. Only workers-ai/* gateway entries set
`compat.sendSessionAffinityHeaders: true` because the gateway
forwards that header to the underlying Workers AI runtime; anthropic/*
upstream and openai/* (openai-responses) don't use it.
packages/ai/scripts/generate-models.ts: emit headers (always) and
per-upstream compat (workers-ai only) on each cloudflare-ai-gateway
entry.
packages/ai/src/models.generated.ts: re-spliced 35 entries with
headers + conditional compat.
Behavior unchanged - 9/9 e2e tests pass across all three upstream
families.
* fix(ai): align AI Gateway with telemetry-aware UA helper
Adapts to badlogic/pi-mono#3851's follow-up fix ("honor telemetry for
Cloudflare attribution headers", fbb5eed) which moved the
'User-Agent: pi-coding-agent' header out of per-model catalog metadata
and into a centralized telemetry-honoring helper
(coding-agent/src/core/sdk.ts:getAttributionHeaders).
- packages/coding-agent/src/core/sdk.ts: extend the cloudflare branch of
getAttributionHeaders to also match cloudflare-ai-gateway and
gateway.ai.cloudflare.com.
- packages/ai/scripts/generate-models.ts and src/models.generated.ts:
drop 'headers' from the 35 cloudflare-ai-gateway entries (constant
CLOUDFLARE_STATIC_HEADERS no longer exists). Per-route
compat.sendSessionAffinityHeaders is unchanged.
End-to-end behavior unchanged: 9/9 tests still pass across all three
upstream families (Workers AI, Anthropic, OpenAI Responses).
---------
Co-authored-by: Mario Zechner <badlogicgames@gmail.com>
* feat(ai): add Cloudflare Workers AI as a provider
Cloudflare Workers AI hosts open-weight LLMs (Kimi K2.6, GPT-OSS,
GLM-4.7, Llama 4, Gemma 4, Nemotron 3) on Cloudflare's GPU network with
an OpenAI-compatible endpoint. Reuses the openai-completions API
protocol; the per-account URL contains a {CLOUDFLARE_ACCOUNT_ID}
placeholder resolved at request time by a small helper.
Pi automatically sets x-session-affinity for prefix caching:
https://developers.cloudflare.com/workers-ai/features/prompt-caching/
Auth: CLOUDFLARE_API_KEY (matches pi's *_API_KEY convention) +
CLOUDFLARE_ACCOUNT_ID. The User-Agent identifies traffic as
'pi-coding-agent' in Cloudflare analytics.
Verified end-to-end against a real Cloudflare account: 17 e2e tests
pass across stream/empty/tokens/unicode/tool-call-without-result/
total-tokens against @cf/moonshotai/kimi-k2.6.
Cloudflare AI Gateway is a separate, larger change (it requires routing
through provider-specific subpaths with the matching API protocol per
upstream) and will land in a follow-up PR.
* refactor(ai): move Cloudflare User-Agent and session-affinity flag to per-model metadata
Instead of conditionally setting them in openai-completions.ts based on
provider detection, declare them as model-level fields in the catalog
(headers + compat). This is consistent with how the github-copilot and
kimi-coding entries already declare their static headers.
packages/ai/scripts/generate-models.ts: emit headers and compat fields
on each cloudflare-workers-ai entry (CLOUDFLARE_STATIC_HEADERS).
packages/ai/src/providers/openai-completions.ts: drop the
isCloudflareProvider conditional that injected User-Agent and the
isCloudflareWorkersAI override of sendSessionAffinityHeaders.
packages/ai/src/models.generated.ts: re-spliced 8 cloudflare-workers-ai
entries with headers + compat.
Behavior is unchanged - verified via fetch interceptor that User-Agent
and x-session-affinity / session_id / x-client-request-id are still sent
on outbound requests. 5/5 e2e tests pass.
Add AWS_BEDROCK_FORCE_CACHE=1 environment variable support. When the
model ID doesn't contain a recognizable Claude model name, users can
set this variable to force cache point injection.
The process.env access was outside the typeof process check, which
would throw in browser environments. Moved inside the Node.js/Bun
block for consistency with other env var access.
Also added changelog entry for #1320 and improved docs clarity.
API keys in auth.json now support the same resolution as models.json:
- Shell command: "\!command" executes and uses stdout (cached)
- Environment variable: uses the value of the named variable
- Literal value: used directly
Extracted shared resolveConfigValue() to new resolve-config-value.ts module.
- Add kimi-coding provider using Anthropic Messages API
- API endpoint: https://api.kimi.com/coding/v1
- Environment variable: KIMI_API_KEY
- Models: kimi-k2-thinking (text), k2p5 (text + image)
- Add context overflow detection pattern for Kimi errors
- Add tests for all standard test suites
- Add huggingface to KnownProvider type
- Add HF_TOKEN env var mapping
- Process huggingface models from models.dev (14 models)
- Use openai-completions API with compat settings
- Add tests for all provider test suites
- Update documentation
fixes#994