mirror of https://github.com/anomalyco/opencode.git synced 2026-05-22 19:55:11 +00:00

History

Kit Langton e92c4fb460 chore: drop dead imports across opencode/core/llm (#28720 )		2026-05-21 17:03:12 -04:00
..
example	Refactor LLM route-first provider API (#28523 )	2026-05-20 20:15:52 -04:00
script	chore: generate	2026-05-08 20:57:36 +00:00
src	chore: drop dead imports across opencode/core/llm (#28720 )	2026-05-21 17:03:12 -04:00
test	chore: drop dead imports across opencode/core/llm (#28720 )	2026-05-21 17:03:12 -04:00
AGENTS.md	Refactor LLM route-first provider API (#28523 )	2026-05-20 20:15:52 -04:00
package.json	sync release versions for v1.15.7	2026-05-21 15:51:26 +00:00
README.md	Refactor LLM route-first provider API (#28523 )	2026-05-20 20:15:52 -04:00
sst-env.d.ts	sync	2026-05-10 02:17:32 -04:00
tsconfig.json

README.md

@opencode-ai/llm

Schema-first LLM core for opencode. One typed request, response, event, and tool language; provider quirks live in adapters, not in calling code.

import { Effect } from "effect"
import { LLM, LLMClient } from "@opencode-ai/llm"
import { OpenAI } from "@opencode-ai/llm/providers"

const model = OpenAI.configure({ apiKey: process.env.OPENAI_API_KEY }).responses("gpt-4o-mini")

const request = LLM.request({
  model,
  system: "You are concise.",
  prompt: "Say hello in one short sentence.",
  generation: { maxTokens: 40 },
})

const program = Effect.gen(function* () {
  const response = yield* LLMClient.generate(request)
  console.log(response.text)
})

Run LLMClient.stream(request) instead of generate when you want incremental LLMEvents. The event stream is provider-neutral — same shape across OpenAI Chat, OpenAI Responses, Anthropic Messages, Gemini, Bedrock Converse, and any OpenAI-compatible deployment.

Public API

LLM.request({...}) — build a provider-neutral LLMRequest. Accepts ergonomic inputs (system: string, prompt: string) that normalize into the canonical Schema classes.
LLM.generate / LLM.stream — re-exported from LLMClient for one-import use.
Message.user(...) / Message.assistant(...) / Message.tool(...) — message constructors from the canonical schema model.
Model.make(...) / ToolCallPart.make(...) / ToolResultPart.make(...) / ToolDefinition.make(...) — model and tool-related constructors from the canonical schema model.
LLMClient.prepare(request) — compile a request through protocol body construction, validation, and HTTP preparation without sending. Useful for inspection and testing.
LLMEvent.is.* — typed guards (is.textDelta, is.toolCall, is.finish, …) for filtering streams.

Caching

Prompt caching is on by default. Every LLMRequest resolves to cache: "auto" unless the caller opts out with cache: "none". Each protocol translates CacheHints to its wire format (cache_control on Anthropic, cachePoint on Bedrock; OpenAI and Gemini do implicit caching server-side and don't need inline markers — auto is a no-op there).

Auto placement

"auto" places three breakpoints — last tool definition, last system part, latest user message. The last-user-message boundary is the load-bearing detail: in a tool-use loop, a single user turn expands into many assistant/tool round-trips, all sharing that prefix. Caching at that boundary lets every intra-turn API call hit.

The math justifies the default: Anthropic's 5-minute cache write is 1.25× base, read is 0.1×, so a single reuse within 5 minutes already wins. One-shot completions below the per-model minimum-cacheable-token threshold silently no-op on the wire, so the worst case is harmless.

Opting out

LLM.request({
  model,
  system,
  prompt: "one-off question",
  cache: "none",
})

Granular policy

cache: {
  tools?: boolean,
  system?: boolean,
  messages?: "latest-user-message" | "latest-assistant" | { tail: number },
  ttlSeconds?: number,         // ≥ 3600 → 1h on Anthropic/Bedrock; else 5m
}

Manual hints

Inline CacheHint on any text / system / tool / tool-result part overrides automatic placement. The auto policy preserves manual hints; it only fills gaps.

LLM.request({
  model,
  system: [
    { type: "text", text: "stable system prompt", cache: { type: "ephemeral" } },
  ],
  ...
})

Provider behavior table

Protocol	`cache: "auto"`
Anthropic Messages	emits up to 3 `cache_control` markers (4-breakpoint cap enforced)
Bedrock Converse	emits up to 3 `cachePoint` blocks (4-breakpoint cap enforced)
OpenAI Chat / Responses	no-op (implicit caching above 1024 tokens)
Gemini	no-op (implicit caching on 2.5+; explicit `CachedContent` is out-of-band)

Normalized cache usage is read back into response.usage.cacheReadInputTokens and cacheWriteInputTokens across every provider.

Providers

Provider facades configure endpoint/auth/deployment details first, then expose model selectors that take only a model or deployment id. The selected model carries the executable route value used at runtime.

import { OpenAI, CloudflareAIGateway } from "@opencode-ai/llm/providers"

const openai = OpenAI.configure({ apiKey: process.env.OPENAI_API_KEY }).responses("gpt-4o-mini")
const gateway = CloudflareAIGateway.configure({
  accountId: process.env.CLOUDFLARE_ACCOUNT_ID,
  gatewayApiKey: process.env.CLOUDFLARE_API_TOKEN,
}).model("workers-ai/@cf/meta/llama-3.1-8b-instruct")

Included providers: OpenAI, Anthropic, Google (Gemini), Amazon Bedrock, Azure OpenAI, Cloudflare AI Gateway, Cloudflare Workers AI, GitHub Copilot, OpenRouter, xAI, plus generic OpenAI-compatible helpers for DeepSeek, Cerebras, Groq, Fireworks, Together, etc.

Provider options & HTTP overlays

Three escape hatches in order of stability:

generation — portable knobs (maxTokens, temperature, topP, topK, penalties, seed, stop).
providerOptions: { <provider>: {...} } — typed-at-the-facade provider-specific knobs (OpenAI promptCacheKey, Anthropic thinking, Gemini thinkingConfig, OpenRouter routing).
http: { body, headers, query } — last-resort serializable overlays merged into the final HTTP request. Reach for this only when a stable typed path doesn't yet exist.

Route/provider defaults are overridden by request-level values for each axis.

Routes

Adding a new model or deployment is usually 5-15 lines using Route.make({ protocol, endpoint, auth, framing, ... }). The route owns endpoint/auth/framing and the protocol owns body construction plus stream parsing. Transports are reusable IO templates that receive route endpoint/auth at compile time. Capability/catalog metadata lives outside this low-level package; unsupported request shapes fail during protocol lowering. See AGENTS.md for the architectural detail.

Effect

This package is built on Effect. Public methods return Effect or Stream; provide LLMClient.layer for runtime dispatch and import the provider/protocol modules for the routes you use. The example at example/tutorial.ts is a runnable walkthrough.

README.md Unescape Escape