OmniRoute/AGENTS.md
Diego Rodrigues de Sa e Souza 9e45baae58
chore(release): v3.6.6 — Stabilization (#1241)
* fix(streaming): #1211 greedy strip omniModel tags to prevent literal \n\n artifacts

- Changed regex quantifier from ? to * in combo.ts, comboAgentMiddleware.ts,
  and contextHandoff.ts to greedily strip all JSON-escaped newline sequences
  surrounding <omniModel> tags in SSE streaming chunks
- Added \r to the character class for cross-platform robustness
- Fixed Playwright strict-mode violation in combo-unification.spec.ts
- Bumped OpenAPI version and CHANGELOG to 3.6.6

* fix: 3 bugs found during issue triage (#1175, #1187/#1218, #1202)

- fix(gemini): strip VS Code JSON Schema extensions from tool schemas (#1175)
  Add enumDescriptions, markdownDescription, markdownEnumDescriptions,
  enumItemLabels and tags to UNSUPPORTED_SCHEMA_CONSTRAINTS so the Gemini
  sanitizer removes them before forwarding. GitHub Copilot injects these
  non-standard fields into tool definitions, causing Gemini to reject with
  'Unknown name enumDescriptions at functionDeclarations[n].parameters'.

- fix(health-check): unwrap proxy config object before passing to getAccessToken (#1187 #1218)
  resolveProxyForConnection() returns { proxy, level, levelId } but the health
  check loop was passing the full wrapper to getAccessToken(), which expects the
  inner config object (.host, .port etc). The proxy dispatcher validated .host
  on the wrapper (undefined) and threw 'Context proxy host is required', silently
  marking every connection as unhealthy every sweep. Fix mirrors the pattern
  already used in chatHelpers.ts: proxyResult?.proxy || null.

- fix(ui): debounce models.dev sync interval slider to save only on release (#1202)
  The slider's onChange fired updateInterval() on every drag tick, sending a
  PATCH per pixel of movement. Rapid API responses overwrote UI state mid-drag.
  Introduce draftIntervalHours for smooth visual feedback; the PATCH fires
  on onMouseUp / onBlur once the user releases the control.

* fix(providers): update Xiaomi MiMo token-plan endpoints (#1238)

Integrated into release/v3.6.6

* fix(cc-compatible): trim beta flags and preserve cache passthrough (#1230)

Integrated into release/v3.6.6

* feat(memory+skills): full-featured memory & skills systems with tests (#1228)

Integrated into release/v3.6.6

* fix: forward client x-initiator header to GitHub Copilot upstream (#1227)

Integrated into release/v3.6.6

* feat(bailian-quota): add Alibaba Coding Plan quota monitoring (#1235)

* fix: resolve v3.6.6 backlog bugs (#1206, #1211, #1220, #1231)

- fix(core): #1206 inject startup guard against app/ and src/app/ conflict
- fix(health): #1220 add HEALTHCHECK_STAGGER_MS to prevent token refresh bursting
- fix(proxy): #1231 prioritize HTTP 429 over quota body heuristics
- fix(sse): #1211 strip leading double-newlines in responses API stream

* fix(tests): resolve memory migration and skills route pagination bugs from PR overlaps

* docs: Update CHANGELOG.md with v3.6.6 features (#1182, #1165, #1177)

* chore(release): bump version to 3.6.6

Update package versions for the electron app and open-sse package.
Sync llm.txt metadata and feature headings with the 3.6.6 release.

* feat(core): harden outbound provider calls and add cooldown retries

Add guarded outbound fetch helpers with private/local URL blocking,
controlled retries, timeout normalization, and route-level status
propagation for provider validation and model discovery.

Introduce cooldown-aware chat retries with configurable
requestRetry and maxRetryIntervalSec settings, model-scoped cooldown
responses, and improved rate-limit learning from headers and error
bodies so short upstream lockouts can recover automatically.

Also align Antigravity and Codex header handling, require API keys
for Pollinations, validate web runtime env at startup, restore
sanitized Gemini tool names in translated responses, and inject a
synthetic Claude text block when upstream SSE completes empty.

* feat(models): add glmt preset and hybrid token counting

Introduce GLM Thinking as a first-class provider preset with shared GLM
model metadata, pricing, usage sync, dashboard support, and provider
request defaults for higher token budgets and longer timeouts.

Use provider-side /messages/count_tokens when a Claude-compatible
upstream supports it, while preserving estimated fallback behavior for
missing models, missing credentials, and upstream failures.

Also add startup seeding for default model aliases and normalize common
cross-proxy model dialects so canonical slashful model ids do not get
misrouted during resolution.

* feat(api): add sync tokens and v1 websocket bridge

Add dedicated sync token storage, issuance, revocation, and bundle
download routes backed by stable config bundle versioning and ETag
support.

Expose the v1 websocket handshake route and custom Next server bridge so
OpenAI-compatible websocket traffic can be upgraded and proxied through
the dashboard and API bridge.

Expand compliance auditing with structured metadata, pagination, request
context, auth and provider credential events, and SSRF-blocked
validation logging.

* docs: Update all documentation for v3.6.6

- CHANGELOG: Add WebSocket bridge, GLM Thinking preset, safe outbound
  fetch/SSRF guard, cooldown-aware retries, compliance audit v2, model
  alias seeding, and all Internal Improvements for the 3 new commits
- README: Expand v3.6.x highlights table with 10 new features; add
  SafeOutboundFetch, CooldownAwareRetry, SSRF guard, TPS metric, sync
  tokens, WebSocket bridge to Resilience/Observability/Deployment tables
- ARCHITECTURE: Bump date; add new modules to executive summary, API
  routes, SSE core services, Auth/Security section; add SSRF/Outbound
  guard failure mode (section 6); expand module mapping
- ENVIRONMENT: Add OMNIROUTE_CRYPT_KEY/OMNIROUTE_API_KEY_BASE64 legacy
  aliases, OUTBOUND_SSRF_GUARD_ENABLED, CODEX_CLIENT_VERSION, and
  REQUEST_RETRY/MAX_RETRY_INTERVAL_SEC cooldown retry settings
- FEATURES: Add 6 new feature sections — V1 WebSocket Bridge, Sync
  Tokens & Config Bundle, GLM Thinking Preset, Safe Outbound Fetch &
  SSRF Guard, Cooldown-Aware Retries, Compliance Audit v2

* fix: use api64 for proxy test (#1255)

Integrated into release/v3.6.6 — IPv6 proxy test fix

* fix(page): update custom models section to include all providers #1200 (#1256)

Integrated into release/v3.6.6 — Gemini custom model picker fix

* fix: provide default client_id fallbacks to prevent broken OAuth requests (#1246)

Integrated into release/v3.6.6 — OAuth client_id default fallbacks

* fix: translate max_tokens/max_completion_tokens → max_output_tokens in Chat→Responses translator (#1245)

Integrated into release/v3.6.6 — max_tokens → max_output_tokens Responses API translation + unit tests

* feat(oauth): support cursor-agent CLI as Cursor credential source (#1258)

Integrated into release/v3.6.6 — cursor-agent CLI credential source support

* fix(cc-compatible): restore upstream SSE and correct stream/combo timeout behavior (#1257)

Integrated into release/v3.6.6 — CC-compatible upstream SSE restore + stream timeout fix + README table repair

* fix(cli-tools): resolve API key resolution and model mapping bugs in CLI tools (#1263)

Integrated into release/v3.6.6

* feat(cli-tools): add Qwen Code CLI integration (#1266)

Integrated into release/v3.6.6

* fix(i18n): add missing zh-CN translations and fix logger imports (#1269)

Integrated into release/v3.6.6

* fix(i18n): add Chinese i18n support to dashboard components (#1274)

Integrated into release/v3.6.6

* feat: update Pollinations to require API key, remove free tier flag (#1177)

* feat: friendly error messages for crypto/encryption failures (#1165)

* feat: add TPS (tokens per second) metric column to request logs (#1182)

* feat: merge custom/imported models into filter list for all providers (#1191)

* feat(fallback): Fix provider-profile-driven lockouts (#1267)

This integrates rdself's unify-provider-profile-locks PR manually to handle structural conflicts.

* fix(claude): proper Anthropic SDK integration (#1271)

* fix(healthcheck): use correct proxy wrapper format for getAccessToken (#1272)

* chore(release): v3.6.6 — skills registry stability fix + final integration

* fix(auth): harden bootstrap auth and memory dashboard behavior

Restrict unauthenticated writes to /api/settings/require-login to
the initial bootstrap window while keeping read-only checks public.
This prevents post-setup config changes without blocking first-run
login setup, and the onboarding flow now logs in immediately after
setting the password.

Restore memory API filtering and pagination behavior by supporting q
searches, honoring offset-based requests, and avoiding unrelated
fallback results when FTS misses. Update dashboard stats fallback to
use the response totals consistently.

Package the MCP server with explicit file entries and add regression
tests for bootstrap auth and memory route behavior

* fix(codex): remove max_output_tokens from body for compatibility

* chore(release): v3.6.6 — include PR 1274 fixes in changelog

* chore: exclude additional build artifacts and internal directories from npm package distribution

* fix: update Gemini OAuth test to match registry defaults + codex UI improvements

* fix: restore .mjs refs for scripts/ in test imports after ts migration

* fix: restore next.config.mjs ref in dev-origins test

* fix: implement db migration safety checks and codex config format

* fix: disable mass-migration abort during unit tests based on auto-backup flag

* fix: update script regex in auto-update tests to use .mjs

* feat: Add Perplexity Web (Session) provider (#1289)

Integrated into release/v3.6.6

* fix(cli): resolve codex routing config parsing, standardize select model button positioning, and clarify oauth documentation

* docs(changelog): record recent cli, provider, and test updates

Document the latest fixes for Codex routing configuration parsing and
Lobehub provider icon fallback behavior.

Add the note that the remaining JavaScript test files were migrated to
TypeScript ES modules to reflect the completed test stack transition.

* chore(release): merge #1286 minor improvements manually to avoid testing conflict

* chore(test): rename perplexity-web.test.mjs to .ts to maintain 100% TS codebase

* chore(docs): update CHANGELOG.md for perplexity-web provider

* fix(security): resolve CodeQL incomplete URL substring sanitization via URL parsing in test mocks

* fix: integrate compressContext() into chatCore.ts request pipeline

Proactively compress oversized contexts before sending to upstream providers,
preventing context_length_exceeded errors. Compression triggers at 85% of
model's context limit using the existing 3-layer compressContext() function.

- Import compressContext, estimateTokens, getTokenLimit from contextManager
- Add compression check after translation, before executor dispatch
- Estimate tokens and compare against 85% threshold of model's context limit
- Apply 3-layer compression (trim tools, compress thinking, purify history)
- Log compression events with before/after token counts and layers applied
- Audit compression events for observability
- Add unit tests verifying integration behavior

Closes #1290

* fix(tests): align reasoning expectations with GLM thinking structure

* fix: prevent orphaned tool_result messages in purifyHistory()

When purifyHistory() drops oldest messages to fit context window, it can
split tool_use/tool_result pairs — keeping the tool_result but dropping
the tool_use that initiated it. This causes upstream providers to reject
the request with format errors.

Add fixToolPairs() that runs after each purification pass to remove:
- OpenAI format: orphaned role='tool' messages without matching tool_calls ID
- Claude format: orphaned tool_result content blocks without matching tool_use ID

Closes #1291

* fix(tests): supply tool_use in mock so it is not dropped

* chore: convert remaining test to TypeScript

* fix(tests): restore compatibility with compressContext threshold test after tsx migration

* docs: finalize v3.6.6 release documentation

* fix(core): finalize provider removal, type issues, and codex API key config

* fix(dashboard): render Web/Cookie, Search, Audio provider sections and fix TypeScript errors

* fix: increase MCP web_search timeout to 60s (#1278)

* fix: route combo testing properly for embedding models (#1260)

* fix: accumulate excluded accounts in combo fallback loop (#1233)

* fix: strip leading whitespace and newlines from first streaming chunk (#1211)

* docs: clarify VPS and Docker settings for OAuth credentials (#1204)

* fix: return real retry-after for pipeline gates (#1301)

Integrated into release/v3.6.6 — returns real Retry-After values from pipeline gates

* feat: streaming semantic cache, Cursor auto-version detection, and call-log enhancements (#1296)

Integrated into release/v3.6.6 — streaming semantic cache, Cursor auto-version detection, call-log cache_source tracking

* feat(api): support more OpenAI types (image, embeddings, audio-transcriptions, audio-speech) (#1297)

Integrated into release/v3.6.6 — adds embeddings, audio-transcriptions, audio-speech, and images-generations support for custom OpenAI-compatible providers, plus Pollinations image registry

* deps: bump hono from 4.12.12 to 4.12.14 (#1302)

Integrated into release/v3.6.6

* deps: bump hono from 4.12.12 to 4.12.14 (#1306)

Integrated into release/v3.6.6

* chore: stabilization fixes for v3.6.6 (#1298, #1254, #59, CI)

* fix(providers): match correct endpoint for Xiaomi MiMo, strip routing prefix for custom openai endpoints (#1303, #1261)

* feat(storage): add database backup cleanup controls

* chore(release): v3.6.6 — Final Stabilization Push

* Backport call log storage refactor to release/v3.6.6 (#1307)

Integrated into release/v3.6.6

* deps: update dompurify to 3.4.0 to resolve CVE-XYZ (#60)

* test: disable sqlite auto backup in CI to resolve E2E timeout (#24481475058)

* chore(docs): sync CHANGELOG for v3.6.6 with missing features and fixes

* chore(release): prep v3.6.6 infrastructure and type safety fixes

- Migrated legacy .mjs scripts to .ts (bin, prepublish, policies)
- Resolved pre-commit strict lint (t11 budget) errors in combo.ts
- Explicitly typed all TS bindings in pack-artifact policies
- Updated package.json commands to run Node via tsx/esm internally
- Hardened CI/CD with explicit node version 22.22.2 checks
- Completed stage validations for v3.6.6 final release

* chore: fix TS build errors and e2e timeouts in CI

- Migrate nodeRuntimeSupport to TS interfaces avoiding implicit any
- Increase visibility timeouts in skills-marketplace E2E test to 15s to bypass CI flakiness
- Complete migration of .mjs scripts to .ts ensuring type safety

* chore(release): sync package version 3.6.6 across workspaces

* test(e2e): universally increase UI component visibility timeouts from 5s to 15s to bypass CI starvation

* chore(build): inject baseUrl, paths, and types:node into MITM tsconfig within prepublish hook to fix missing types in CI check

---------

Co-authored-by: diegosouzapw <diegosouzapw@users.noreply.github.com>
Co-authored-by: Jack <5443152+hijak@users.noreply.github.com>
Co-authored-by: Randi <55005611+rdself@users.noreply.github.com>
Co-authored-by: Paijo <14921983+oyi77@users.noreply.github.com>
Co-authored-by: Samuel Cedric <ceds.sam@gmail.com>
Co-authored-by: Max Garmash <max@37bytes.com>
Co-authored-by: Markus Hartung <mail@hartmark.se>
Co-authored-by: Gi99lin <74502520+Gi99lin@users.noreply.github.com>
Co-authored-by: Payne <baboialex95@gmail.com>
Co-authored-by: Benson K B <bensonkbmca@gmail.com>
Co-authored-by: clousky2020 <33016567+clousky2020@users.noreply.github.com>
Co-authored-by: Ravi Tharuma <25951435+RaviTharuma@users.noreply.github.com>
Co-authored-by: oyi77 <oyi77@users.noreply.github.com>
Co-authored-by: Hdsje <vovan877@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: xiaoge1688 <moyekongling@gmail.com>
2026-04-16 05:26:17 -03:00


omniroute — Agent Guidelines

Project

Unified AI proxy/router — route any LLM through one endpoint. Multi-provider support with 100+ providers (OpenAI, Anthropic, Gemini, DeepSeek, Groq, xAI, Mistral, Fireworks, Cohere, NVIDIA, Cerebras, Pollinations, Puter, Cloudflare AI, HuggingFace, DeepInfra, SambaNova, Meta Llama API, Moonshot AI, AI21 Labs, Databricks, Snowflake, and many more) with MCP Server (25 tools), A2A v0.3 Protocol, and Electron desktop app.

Stack

  • Runtime: Next.js 16 (App Router), Node.js ≥18 <24, ES Modules ("type": "module")
  • Language: TypeScript 5.9 (src/) + JavaScript (open-sse/, electron/)
  • Database: better-sqlite3 (SQLite) — DATA_DIR configurable, default ~/.omniroute/
  • Streaming: SSE via open-sse internal workspace package
  • Styling: Tailwind CSS v4
  • i18n: next-intl with 30 languages
  • Desktop: Electron (cross-platform: Windows, macOS, Linux)
  • Schemas: Zod v4 for all API / MCP input validation

Build, Lint, and Test Commands

| Command | Description |
| --- | --- |
| npm run dev | Start Next.js dev server |
| npm run build | Production build (isolated) |
| npm run start | Run production build |
| npm run build:cli | Build CLI package |
| npm run lint | ESLint on all source files |
| npm run typecheck:core | TypeScript core type checking |
| npm run typecheck:noimplicit:core | Strict checking (no implicit any) |
| npm run check | Run lint + test |
| npm run check:cycles | Check for circular dependencies |
| npm run electron:dev | Run Electron app in dev mode |
| npm run electron:build | Build Electron app for current OS |

Running Tests

# All tests (unit + vitest + ecosystem + e2e)
npm run test:all

# Single test file (Node.js native test runner — most tests use this)
node --import tsx/esm --test tests/unit/your-file.test.ts
node --import tsx/esm --test tests/unit/plan3-p0.test.ts
node --import tsx/esm --test tests/unit/fixes-p1.test.ts
node --import tsx/esm --test tests/unit/security-fase01.test.ts

# Integration tests
node --import tsx/esm --test tests/integration/*.test.ts

# Vitest (MCP server, autoCombo)
npm run test:vitest

# E2E with Playwright
npm run test:e2e

# Protocol clients E2E (MCP transports, A2A)
npm run test:protocols:e2e

# Ecosystem compatibility tests
npm run test:ecosystem

# Coverage (see CONTRIBUTING.md)
npm run test:coverage

For authoritative coverage requirements, test execution, and PR gates, see CONTRIBUTING.md.


Code Style Guidelines

Formatting (Prettier — enforced via lint-staged)

2 spaces · semicolons required · double quotes (") · 100 char width · es5 trailing commas. Always run prettier --write on changed files.

TypeScript

  • Target: ES2022 · Module: esnext · Resolution: bundler
  • strict: false — prefer explicit types, don't rely on inference
  • Path aliases: @/* → src/, @omniroute/open-sse → open-sse/, @omniroute/open-sse/* → open-sse/*
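Assuming the aliases above are declared the usual way in tsconfig.json, the paths block would look roughly like this (illustrative, not copied from the repo):

```json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "esnext",
    "moduleResolution": "bundler",
    "baseUrl": ".",
    "paths": {
      "@/*": ["src/*"],
      "@omniroute/open-sse": ["open-sse"],
      "@omniroute/open-sse/*": ["open-sse/*"]
    }
  }
}
```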

ESLint Rules

  • Security (error, everywhere): no-eval, no-implied-eval, no-new-func
  • Relaxed in open-sse/ and tests/: @typescript-eslint/no-explicit-any = warn
  • React hooks rules and @next/next/no-assign-module-variable disabled in open-sse/ and tests/

Naming

| Element | Convention | Example |
| --- | --- | --- |
| Files | camelCase / kebab-case | chatCore.ts, tokenHealthCheck.ts |
| React components | PascalCase | Dashboard.tsx, ProviderCard.tsx |
| Functions/variables | camelCase | getHealth(), switchCombo() |
| Constants | UPPER_SNAKE | MAX_RETRIES, DEFAULT_TIMEOUT |
| Interfaces | PascalCase (I prefix optional) | ProviderConfig |
| Enums | PascalCase (members too) | LogLevel.Error |

Imports

  • Order: external → internal (@/, @omniroute/open-sse) → relative (./, ../)
  • No barrel imports from localDb.ts — import from the specific db/ module instead

Error Handling

  • try/catch with specific error types; always log with context (pino logger)
  • Never silently swallow errors in SSE streams — use abort signals for cleanup
  • Return proper HTTP status codes (4xx client, 5xx server)
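A minimal sketch of these conventions (the logger here is a console stand-in for the project's pino instance; UpstreamError and toHttpStatus are illustrative helpers, not existing project code):

```typescript
// Stand-in logger mimicking pino's (context, message) call shape.
const logger = { error: (ctx: object, msg: string) => console.error(msg, ctx) };

class UpstreamError extends Error {
  constructor(message: string, public status: number) {
    super(message);
  }
}

// Map errors to proper HTTP statuses: 4xx for client-caused, 5xx otherwise.
function toHttpStatus(err: unknown): number {
  if (err instanceof UpstreamError) {
    return err.status >= 400 && err.status < 500 ? err.status : 502;
  }
  return 500;
}

async function callProvider(url: string, signal: AbortSignal): Promise<string> {
  try {
    const res = await fetch(url, { signal }); // abort signal enables stream cleanup
    if (!res.ok) throw new UpstreamError(`upstream returned ${res.status}`, res.status);
    return await res.text();
  } catch (err) {
    logger.error({ url, err: String(err) }, "provider call failed"); // log with context
    throw err; // never swallow: rethrow so the route can map it to a status
  }
}
```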

Security

  • NEVER commit API keys, secrets, or credentials
  • Validate all user inputs with Zod schemas
  • Auth middleware required on all API routes
  • Never log SQLite encryption keys
  • Sanitize user content (dompurify for HTML)

Architecture

Data Layer (src/lib/db/)

All persistence uses SQLite through domain-specific modules: core.ts, providers.ts, models.ts, combos.ts, apiKeys.ts, settings.ts, backup.ts, proxies.ts, prompts.ts, webhooks.ts, detailedLogs.ts, domainState.ts, registeredKeys.ts, quotaSnapshots.ts, modelComboMappings.ts, cliToolState.ts, encryption.ts, readCache.ts, secrets.ts, stateReset.ts, contextHandoffs.ts. Schema migrations live in db/migrations/ and run via migrationRunner.ts. src/lib/localDb.ts is a re-export layer only — never add logic there.

DB Internals

  • core.ts: getDbInstance() returns a singleton better-sqlite3 instance with WAL journaling. SCHEMA_SQL defines 15 base tables. Helpers: rowToCamel, encryptConnectionFields.
  • migrationRunner.ts: Applies versioned SQL files from db/migrations/ inside transactions. Tracks applied migrations in _omniroute_migrations table.
  • Migrations: 21 files (001_initial_schema.sql through 021_combo_call_log_targets.sql). Each migration is idempotent and runs in a transaction.
  • Domain modules import getDbInstance() from core.ts for all CRUD operations. Each module owns a specific table or set of tables (e.g., providers.ts → provider_connections, combos.ts → combos). Encryption helpers protect sensitive fields at rest.
  • localDb.ts re-exports all domain modules — consumers import from here for convenience.
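The rowToCamel helper mentioned above can be sketched like this (behavior inferred from the name alone: SQLite snake_case columns become camelCase object keys):

```typescript
// Guess at core.ts's rowToCamel: map snake_case row keys to camelCase.
function rowToCamel(row: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(row)) {
    // Replace each `_x` with `X`; keys without underscores pass through unchanged.
    const camel = key.replace(/_([a-z])/g, (_m, c: string) => c.toUpperCase());
    out[camel] = value;
  }
  return out;
}
```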

API Route Layer (src/app/api/v1/)

Next.js App Router routes — each follows a consistent pattern:

Route → CORS preflight → Body validation (Zod) → Optional auth (extractApiKey/isValidApiKey)
  → API key policy enforcement (enforceApiKeyPolicy) → Handler delegation (open-sse)
| Route | Handler | Notes |
| --- | --- | --- |
| chat/completions/route.ts | handleChat() | Prompt injection guard (clones request) |
| responses/route.ts | handleChat() (unified) | Responses API format |
| embeddings/route.ts | handleEmbedding() | Model listing + creation |
| images/generations/route.ts | handleImageGeneration() | Model listing + creation |
| audio/transcriptions/route.ts | audio handler | Multipart form data |
| audio/speech/route.ts | TTS handler | Binary audio response |
| videos/generations/route.ts | video handler | ComfyUI/SD WebUI |
| music/generations/route.ts | music handler | ComfyUI workflows |
| moderations/route.ts | moderation handler | Content safety |
| rerank/route.ts | rerank handler | Document relevance |
| search/route.ts | search handler | Web search (5 providers) |

No global Next.js middleware file — interception is route-specific. Auth is optional (controlled by REQUIRE_API_KEY env). Prompt injection guard is unique to chat completions.
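The per-route pattern can be sketched with web-standard Request/Response (extractApiKey is simplified here; enforceApiKeyPolicy and the open-sse delegation are elided, and the names follow the doc while the bodies are hypothetical):

```typescript
// Simplified stand-in for the real extractApiKey helper.
const extractApiKey = (req: Request): string | null =>
  req.headers.get("authorization")?.replace(/^Bearer /, "") ?? null;

async function routeHandler(req: Request): Promise<Response> {
  if (req.method === "OPTIONS") {
    // CORS preflight short-circuits before any validation.
    return new Response(null, { status: 204, headers: { "access-control-allow-origin": "*" } });
  }
  let body: unknown;
  try {
    body = await req.json(); // the real routes validate the body with a Zod schema here
  } catch {
    return new Response(JSON.stringify({ error: "invalid JSON body" }), { status: 400 });
  }
  const apiKey = extractApiKey(req); // optional auth, gated by REQUIRE_API_KEY
  // ...enforceApiKeyPolicy(apiKey), then delegate into open-sse handlers...
  return new Response(JSON.stringify({ ok: body !== null, authed: apiKey !== null }), {
    status: 200,
  });
}
```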

Request Pipeline (open-sse/)

The open-sse/ workspace is the core streaming engine. Full request flow:

Client Request
  → src/app/api/v1/.../route.ts (Next.js route)
    → open-sse/handlers/chatCore.ts::handleChatCore()
      → Semantic/signature cache check
      → Rate limit check (rateLimitManager)
      → Combo routing? → open-sse/services/combo.ts::handleComboChat()
        → resolveComboTargets() → ordered ResolvedComboTarget[]
        → For each target: handleSingleModel() (wraps chatCore)
      → translateRequest() (open-sse/translator/)
        → Convert source format (e.g., OpenAI) → target format (e.g., Claude)
      → getExecutor() → provider-specific executor instance
        → executor.execute() (BaseExecutor → DefaultExecutor or provider-specific)
          → buildUrl() + buildHeaders() + transformRequest()
          → fetch() to upstream provider
          → Retry logic with exponential backoff
      → Response translation back to client format
      → If Responses API: responsesTransformer.ts TransformStream
  → SSE stream or JSON response to client

Handlers (open-sse/handlers/): chatCore.ts, responsesHandler.ts, embeddings.ts, imageGeneration.ts, videoGeneration.ts, musicGeneration.ts, audioSpeech.ts, audioTranscription.ts, moderations.ts, rerank.ts, search.ts.

Upstream headers: custom headers are merged after the default auth headers, and a header with the same name replaces the executor-provided value. T5 intra-family fallback recomputes headers using only the fallback model id. Forbidden header names live in src/shared/constants/upstreamHeaders.ts — keep the sanitizer, Zod schemas, and unit tests aligned when editing.

Provider Categories

  • Free (4): Qoder AI, Qwen Code, Gemini CLI (deprecated), Kiro AI
  • OAuth (8): Claude Code, Antigravity, Codex, GitHub Copilot, Cursor, Kimi Coding, Kilo Code, Cline
  • API Key (91): OpenAI, Anthropic, Gemini, DeepSeek, Groq, xAI, Mistral, Perplexity, Together, Fireworks, Cerebras, Cohere, NVIDIA, Nebius, SiliconFlow, Hyperbolic, HuggingFace, OpenRouter, Vertex AI, Cloudflare AI, Scaleway, AI/ML API, Pollinations, Puter, Longcat, Alibaba, Kimi, Minimax, Blackbox, Synthetic, Kilo Gateway, Z.AI, GLM, Deepgram, AssemblyAI, ElevenLabs, Cartesia, PlayHT, Inworld, NanoBanana, SD WebUI, ComfyUI, Ollama Cloud, Perplexity Search, Serper, Brave, Exa, Tavily, OpenCode Zen/Go, Bailian Coding Plan, DeepInfra, Vercel AI Gateway, Lambda AI, SambaNova, nScale, OVHcloud AI, Baseten, PublicAI, Moonshot AI, Meta Llama API, v0 (Vercel), Morph, Featherless AI, FriendliAI, LlamaGate, Galadriel, Weights & Biases Inference, Volcengine, AI21 Labs, Venice.ai, Codestral, Upstage, Maritalk, Xiaomi MiMo, Inference.net, NanoGPT, Predibase, Bytez, Heroku AI, Databricks, Snowflake Cortex, GigaChat (Sber), and more.
  • Custom: OpenAI-compatible (openai-compatible-*) and Anthropic-compatible (anthropic-compatible-*) prefixes

Providers are registered in src/shared/constants/providers.ts with Zod validation at module load.

Executors (open-sse/executors/)

Provider-specific request executors: base.ts, default.ts, cursor.ts, codex.ts, antigravity.ts, github.ts, gemini-cli.ts, kiro.ts, qoder.ts, vertex.ts, cloudflare-ai.ts, opencode.ts, pollinations.ts, puter.ts.

Executor Internals

  • base.ts (BaseExecutor): Abstract base with buildUrl(), buildHeaders(), transformRequest(), retry logic (exponential backoff), and execute(). Subclasses override URL/header/transform methods for provider-specific behavior.
  • default.ts (DefaultExecutor extends BaseExecutor): Handles most OpenAI-compatible providers. Reads provider config from providerRegistry.ts to resolve base URL, auth header format, and request transformations.
  • getExecutor() (executors/index.ts): Factory that returns the correct executor instance based on provider ID. Provider-specific executors (Cursor, Codex, Vertex, etc.) override only what differs from the default.
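The override-only-what-differs pattern can be sketched as follows (ExampleExecutor and the method shapes are hypothetical, not the real base.ts interfaces, which also carry retry/backoff and execute()):

```typescript
abstract class BaseExecutor {
  abstract buildUrl(model: string): string;
  buildHeaders(apiKey: string): Record<string, string> {
    // Default: Bearer auth with a JSON body, what most OpenAI-compatible APIs expect.
    return { authorization: `Bearer ${apiKey}`, "content-type": "application/json" };
  }
  transformRequest(body: object): object {
    return body; // identity unless a provider needs request reshaping
  }
}

class ExampleExecutor extends BaseExecutor {
  buildUrl(): string {
    return "https://api.example.com/v1/chat/completions"; // placeholder URL
  }
  buildHeaders(apiKey: string): Record<string, string> {
    // This provider uses a custom auth scheme: override only what differs.
    return { "x-api-key": apiKey, "content-type": "application/json" };
  }
}
```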

Translator (open-sse/translator/)

Translates between API formats (OpenAI-format ↔ Anthropic, Gemini, etc.). Includes request/response translators with helpers for image handling.

Translator Internals

  • translator/index.ts: Exports translateRequest() and format constants. Called by chatCore.ts before executor dispatch.
  • Flow: translateRequest(body, sourceFormat, targetFormat) → detects source format (OpenAI, Anthropic, Gemini) → applies the matching translator module → returns transformed body ready for the target provider.
  • Response translation runs in reverse after upstream response, converting back to the client's expected format.
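The dispatch flow can be sketched like this (the openai→anthropic body shown is a toy translator, far simpler than the real modules, which also cover tools, images, and streaming options):

```typescript
type Format = "openai" | "anthropic" | "gemini";

// Keyed by "source->target"; each entry reshapes one format into another.
const translators: Record<string, (body: any) => any> = {
  "openai->anthropic": (body) => ({
    model: body.model,
    max_tokens: body.max_tokens ?? 1024, // Anthropic requires max_tokens
    system: body.messages.find((m: any) => m.role === "system")?.content,
    messages: body.messages.filter((m: any) => m.role !== "system"),
  }),
};

function translateRequest(body: any, source: Format, target: Format): any {
  if (source === target) return body; // no-op when formats already match
  const translate = translators[`${source}->${target}`];
  if (!translate) throw new Error(`no translator for ${source} -> ${target}`);
  return translate(body);
}
```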

Transformer (open-sse/transformer/)

responsesTransformer.ts — transforms Responses API format to/from Chat Completions format.

Transformer Internals

  • createResponsesApiTransformStream(): Returns a TransformStream that converts Chat Completions SSE chunks (data: {"choices":[...]}) into Responses API SSE events (response.output_item.added, response.output_text.delta, etc.).
  • Used when the client sends a Responses API request: the request is internally converted to Chat Completions format, dispatched normally, and the response is piped through this transform stream before reaching the client.
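A reduced sketch of the idea (the real responsesTransformer.ts also emits item added/done events, handles tool calls, and flushes [DONE]; only text deltas are shown here):

```typescript
// Convert one Chat Completions SSE line into a Responses API delta event.
const chatToResponsesDelta = new TransformStream<string, string>({
  transform(chunk, controller) {
    if (!chunk.startsWith("data: ") || chunk.includes("[DONE]")) return;
    const payload = JSON.parse(chunk.slice("data: ".length));
    const text = payload.choices?.[0]?.delta?.content;
    if (typeof text === "string") {
      controller.enqueue(
        `event: response.output_text.delta\ndata: ${JSON.stringify({ delta: text })}\n\n`
      );
    }
  },
});
```

In the pipeline, the upstream SSE readable would be piped through this stream (readable.pipeThrough(...)) before reaching the client.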

Services (open-sse/services/)

36+ service modules including: combo.ts (routing engine), usage.ts, tokenRefresh.ts, rateLimitManager.ts, accountFallback.ts, sessionManager.ts, wildcardRouter.ts, autoCombo/, intentClassifier.ts, taskAwareRouter.ts, thinkingBudget.ts, contextManager.ts, modelDeprecation.ts, modelFamilyFallback.ts, emergencyFallback.ts, workflowFSM.ts, backgroundTaskDetector.ts, ipFilter.ts, signatureCache.ts, volumeDetector.ts, contextHandoff.ts, and more.

Combo Routing Engine (combo.ts)

  • handleComboChat(): Entry point for combo-routed requests. Receives the combo config and iterates through targets in order until one succeeds or all fail.
  • resolveComboTargets(): Expands a combo configuration into an ordered array of ResolvedComboTarget[], each specifying provider + model + account + credentials.
  • Strategies (13): priority, weighted, fill-first, round-robin, P2C, random, least-used, cost-optimized, strict-random, auto, lkgp, context-optimized, context-relay.
  • Each target calls handleSingleModel() which wraps handleChatCore() with per-target error handling and circuit breaker checks.
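The first-success-wins iteration can be sketched as follows (types and names are illustrative; the real loop also tracks excluded accounts and circuit-breaker state):

```typescript
interface ResolvedComboTarget {
  provider: string;
  model: string;
}

async function tryTargetsInOrder<T>(
  targets: ResolvedComboTarget[],
  attempt: (t: ResolvedComboTarget) => Promise<T>
): Promise<T> {
  let lastError: unknown;
  for (const target of targets) {
    try {
      return await attempt(target); // first target that succeeds wins
    } catch (err) {
      lastError = err; // remember the failure, fall through to the next target
    }
  }
  throw lastError ?? new Error("combo resolved to no targets");
}
```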

Domain Layer (src/domain/)

Policy engine modules: policyEngine.ts, comboResolver.ts, costRules.ts, degradation.ts, fallbackPolicy.ts, lockoutPolicy.ts, modelAvailability.ts, providerExpiration.ts, quotaCache.ts, responses.ts, configAudit.ts.

MCP Server (open-sse/mcp-server/)

25 tools, 3 transports (stdio / SSE / Streamable HTTP). Scoped auth (10 scopes), Zod schemas.

Core tools (18): get_health, list_combos, get_combo_metrics, switch_combo, check_quota, route_request, cost_report, list_models_catalog, simulate_route, set_budget_guard, set_routing_strategy, set_resilience_profile, test_combo, get_provider_metrics, best_combo_for_task, explain_route, get_session_snapshot, sync_pricing.

Memory tools (3): memory_search, memory_add, memory_clear.

Skill tools (4): skills_list, skills_enable, skills_execute, skills_executions.

MCP Internals

  • Tool registration: Each tool is an object with { name, description, inputSchema: ZodSchema, handler: async (args) => {...} }. Zod validates inputs before the handler fires.
  • createMcpServer() and startMcpStdio() exported from mcp-server/index.ts. createMcpServer() wires all tool sets; startMcpStdio() launches the stdio transport.
  • Transports: stdio (CLI omniroute --mcp), SSE (/api/mcp/sse), Streamable HTTP (/api/mcp/stream). All share the same tool/scope engine.
  • Scopes (10): Control which tool categories an API key can access. Enforcement happens before handler dispatch.
  • Audit: Every tool invocation is logged to SQLite (mcp_audit table) with tool name, args, success/failure, API key attribution, and timestamp.
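A dependency-free sketch of the registration shape (the real inputSchema is a Zod schema; a plain validate predicate stands in here so the example carries no dependency):

```typescript
interface McpTool {
  name: string;
  description: string;
  validate: (args: unknown) => boolean; // stand-in for Zod's safeParse
  handler: (args: { query: string }) => Promise<object>;
}

const memorySearch: McpTool = {
  name: "memory_search",
  description: "Search persistent memory entries",
  validate: (args) => typeof (args as { query?: unknown })?.query === "string",
  handler: async ({ query }) => ({ query, results: [] }), // stub handler
};

async function invoke(tool: McpTool, args: unknown): Promise<object> {
  // Validation runs before the handler fires, mirroring the flow above.
  if (!tool.validate(args)) throw new Error(`invalid args for ${tool.name}`);
  return tool.handler(args as { query: string });
}
```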

A2A Server (src/lib/a2a/)

JSON-RPC 2.0, SSE streaming, Task Manager with TTL cleanup. Agent Card at /.well-known/agent.json. Skills: quotaManagement.ts, smartRouting.ts.

A2A Internals

  • taskManager.ts: State machine lifecycle for tasks: submitted → working → completed | failed | canceled. Tasks have TTL and are cleaned up automatically.
  • JSON-RPC methods: message/send (sync), message/stream (SSE), tasks/get, tasks/cancel. Dispatched via POST /a2a.
  • Skills: Registered in a DB-backed registry. Each skill receives task context (messages, metadata) and returns structured results. quotaManagement.ts summarizes quota; smartRouting.ts recommends routing decisions.
  • Agent Card: /.well-known/agent.json exposes capabilities, skills, and metadata for client auto-discovery.
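The lifecycle above implies a transition table roughly like this (the exact allowed edges are an assumption, e.g. whether submitted can be canceled directly):

```typescript
type TaskState = "submitted" | "working" | "completed" | "failed" | "canceled";

const TRANSITIONS: Record<TaskState, TaskState[]> = {
  submitted: ["working", "canceled"],
  working: ["completed", "failed", "canceled"],
  completed: [], // terminal
  failed: [],    // terminal
  canceled: [],  // terminal
};

function canTransition(from: TaskState, to: TaskState): boolean {
  return TRANSITIONS[from].includes(to);
}
```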

ACP Module (src/lib/acp/)

Agent Communication Protocol registry and manager.

Memory System (src/lib/memory/)

Extraction, injection, retrieval, summarization, and store modules for persistent conversational memory across sessions.

Skills System (src/lib/skills/)

Extensible skill framework: registry, executor, sandbox, built-in skills, custom skill support, interception, and injection.

Skills Internals

  • registry.ts: DB-backed skill registration and discovery. Skills have metadata (name, description, version, enabled status) stored in SQLite.
  • executor.ts: Execution engine with configurable timeout and retry logic. Receives skill name + input, looks up the skill, runs it in the sandbox.
  • sandbox.ts: Isolation layer for custom (user-provided) skills. Limits resource access and execution time.
  • Built-in skills: Ship with OmniRoute (e.g., quota management, routing). Located alongside the registry.
  • Interception/Injection: Skills can intercept requests in the pipeline (pre/post processing) or inject context into prompts.

Compliance (src/lib/compliance/)

Policy index for compliance enforcement.

MITM Proxy (src/mitm/)

MITM proxy capability with certificate management, DNS handling, and target routing.

Middleware (src/middleware/)

Request middleware including promptInjectionGuard.ts.

Adding a New Provider

  1. Register in src/shared/constants/providers.ts
  2. Add executor in open-sse/executors/ (if custom logic needed)
  3. Add translator in open-sse/translator/ (if non-OpenAI format)
  4. Add OAuth config in src/lib/oauth/constants/oauth.ts (if OAuth-based)
  5. Add models in open-sse/config/providerRegistry.ts
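Step 1 can be pictured as a registry entry like this (field names are hypothetical and do not mirror the real providers.ts schema, which is Zod-validated at module load):

```typescript
interface ProviderEntry {
  id: string;
  baseUrl: string;
  authHeader: "authorization" | "x-api-key";
  format: "openai" | "anthropic" | "gemini";
}

const registry = new Map<string, ProviderEntry>();

function registerProvider(entry: ProviderEntry): void {
  if (registry.has(entry.id)) throw new Error(`duplicate provider id: ${entry.id}`);
  registry.set(entry.id, entry);
}

registerProvider({
  id: "example-ai",
  baseUrl: "https://api.example.ai/v1", // placeholder endpoint
  authHeader: "authorization",
  format: "openai", // OpenAI-compatible: steps 2-3 (executor/translator) not needed
});
```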


Review Focus

  • DB ops go through src/lib/db/ modules, never raw SQL in routes
  • Provider requests flow through open-sse/handlers/
  • MCP/A2A pages are tabs inside /dashboard/endpoint, not standalone routes
  • No memory leaks in SSE streams (abort signals, cleanup)
  • Rate limit headers must be parsed correctly
  • All API inputs validated with Zod schemas
  • Provider constants validated at module load via Zod (src/shared/validation/providerSchema.ts)
  • Pricing data syncs from LiteLLM via src/lib/pricingSync.ts
  • Memory/Skills are cross-cutting: affect MCP tools, request pipeline, and A2A skills