* fix(streaming): #1211 greedy strip omniModel tags to prevent literal \n\n artifacts
  - Changed the regex quantifier from ? to * in combo.ts, comboAgentMiddleware.ts, and contextHandoff.ts to greedily strip all JSON-escaped newline sequences surrounding <omniModel> tags in SSE streaming chunks
  - Added \r to the character class for cross-platform robustness
  - Fixed a Playwright strict-mode violation in combo-unification.spec.ts
  - Bumped the OpenAPI version and CHANGELOG to 3.6.6
* fix: 3 bugs found during issue triage (#1175, #1187/#1218, #1202)
  - fix(gemini): strip VS Code JSON Schema extensions from tool schemas (#1175). Add enumDescriptions, markdownDescription, markdownEnumDescriptions, enumItemLabels, and tags to UNSUPPORTED_SCHEMA_CONSTRAINTS so the Gemini sanitizer removes them before forwarding. GitHub Copilot injects these non-standard fields into tool definitions, causing Gemini to reject with 'Unknown name enumDescriptions at functionDeclarations[n].parameters'.
  - fix(health-check): unwrap the proxy config object before passing it to getAccessToken (#1187, #1218). resolveProxyForConnection() returns { proxy, level, levelId }, but the health-check loop was passing the full wrapper to getAccessToken(), which expects the inner config object (.host, .port, etc.). The proxy dispatcher validated .host on the wrapper (undefined) and threw 'Context proxy host is required', silently marking every connection as unhealthy on every sweep. The fix mirrors the pattern already used in chatHelpers.ts: proxyResult?.proxy || null.
  - fix(ui): debounce the models.dev sync interval slider to save only on release (#1202). The slider's onChange fired updateInterval() on every drag tick, sending a PATCH per pixel of movement, and rapid API responses overwrote UI state mid-drag. Introduce draftIntervalHours for smooth visual feedback; the PATCH fires on onMouseUp/onBlur once the user releases the control.
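The quantifier change above can be illustrated with a small sketch. The tag name matches the commit, but the exact patterns and helper names here are illustrative, not the source from combo.ts:

```typescript
// Sketch of the #1211 fix: with a lazy `?` only one JSON-escaped newline
// around the tag is consumed; with a greedy `*` all of them are.
const lazyStrip = (chunk: string) =>
  chunk.replace(/(?:\\n|\\r)?<omniModel>[\s\S]*?<\/omniModel>(?:\\n|\\r)?/g, "");

const greedyStrip = (chunk: string) =>
  chunk.replace(/(?:\\n|\\r)*<omniModel>[\s\S]*?<\/omniModel>(?:\\n|\\r)*/g, "");

// An SSE data payload with JSON-escaped newlines around the tag
// (String.raw keeps the backslashes literal, as they appear on the wire):
const payload = String.raw`\n\n<omniModel>glm-4.6</omniModel>\n\nHello`;

console.log(lazyStrip(payload));   // leaves a stray \n escape on each side
console.log(greedyStrip(payload)); // all escapes consumed
```

The `(?:\\n|\\r)` alternation stands in for the character-class form mentioned in the commit; the behavioral difference between `?` and `*` is the same either way.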
* fix(providers): update Xiaomi MiMo token-plan endpoints (#1238). Integrated into release/v3.6.6
* fix(cc-compatible): trim beta flags and preserve cache passthrough (#1230). Integrated into release/v3.6.6
* feat(memory+skills): full-featured memory & skills systems with tests (#1228). Integrated into release/v3.6.6
* fix: forward the client x-initiator header to the GitHub Copilot upstream (#1227). Integrated into release/v3.6.6
* feat(bailian-quota): add Alibaba Coding Plan quota monitoring (#1235)
* fix: resolve v3.6.6 backlog bugs (#1206, #1211, #1220, #1231)
  - fix(core): #1206 inject a startup guard against the app/ and src/app/ conflict
  - fix(health): #1220 add HEALTHCHECK_STAGGER_MS to prevent token refresh bursting
  - fix(proxy): #1231 prioritize HTTP 429 over quota body heuristics
  - fix(sse): #1211 strip leading double-newlines in the Responses API stream
* fix(tests): resolve memory migration and skills route pagination bugs from PR overlaps
* docs: update CHANGELOG.md with v3.6.6 features (#1182, #1165, #1177)
* chore(release): bump version to 3.6.6. Update package versions for the electron app and the open-sse package; sync llm.txt metadata and feature headings with the 3.6.6 release.
* feat(core): harden outbound provider calls and add cooldown retries. Add guarded outbound fetch helpers with private/local URL blocking, controlled retries, timeout normalization, and route-level status propagation for provider validation and model discovery. Introduce cooldown-aware chat retries with configurable requestRetry and maxRetryIntervalSec settings, model-scoped cooldown responses, and improved rate-limit learning from headers and error bodies so short upstream lockouts can recover automatically. Also align Antigravity and Codex header handling, require API keys for Pollinations, validate the web runtime env at startup, restore sanitized Gemini tool names in translated responses, and inject a synthetic Claude text block when the upstream SSE completes empty.
* feat(models): add glmt preset and hybrid token counting
  Introduce GLM Thinking as a first-class provider preset with shared GLM model metadata, pricing, usage sync, dashboard support, and provider request defaults for higher token budgets and longer timeouts. Use the provider-side /messages/count_tokens endpoint when a Claude-compatible upstream supports it, while preserving estimated-fallback behavior for missing models, missing credentials, and upstream failures. Also add startup seeding for default model aliases and normalize common cross-proxy model dialects so canonical slashful model ids do not get misrouted during resolution.
* feat(api): add sync tokens and v1 websocket bridge
  Add dedicated sync token storage, issuance, revocation, and bundle download routes backed by stable config bundle versioning and ETag support. Expose the v1 websocket handshake route and a custom Next server bridge so OpenAI-compatible websocket traffic can be upgraded and proxied through the dashboard and API bridge. Expand compliance auditing with structured metadata, pagination, request context, auth and provider credential events, and SSRF-blocked validation logging.
* docs: update all documentation for v3.6.6
  - CHANGELOG: add WebSocket bridge, GLM Thinking preset, safe outbound fetch/SSRF guard, cooldown-aware retries, compliance audit v2, model alias seeding, and all Internal Improvements for the 3 new commits
  - README: expand the v3.6.x highlights table with 10 new features; add SafeOutboundFetch, CooldownAwareRetry, SSRF guard, TPS metric, sync tokens, and the WebSocket bridge to the Resilience/Observability/Deployment tables
  - ARCHITECTURE: bump the date; add new modules to the executive summary, API routes, SSE core services, and Auth/Security sections; add the SSRF/outbound guard failure mode (section 6); expand the module mapping
  - ENVIRONMENT: add the OMNIROUTE_CRYPT_KEY/OMNIROUTE_API_KEY_BASE64 legacy aliases, OUTBOUND_SSRF_GUARD_ENABLED, CODEX_CLIENT_VERSION, and the REQUEST_RETRY/MAX_RETRY_INTERVAL_SEC cooldown retry settings
  - FEATURES: add 6 new feature sections — V1 WebSocket Bridge, Sync Tokens & Config Bundle, GLM Thinking Preset, Safe Outbound Fetch & SSRF Guard, Cooldown-Aware Retries, Compliance Audit v2
* fix: use api64 for the proxy test (#1255). Integrated into release/v3.6.6 — IPv6 proxy test fix
* fix(page): update the custom models section to include all providers #1200 (#1256). Integrated into release/v3.6.6 — Gemini custom model picker fix
* fix: provide default client_id fallbacks to prevent broken OAuth requests (#1246). Integrated into release/v3.6.6 — OAuth client_id default fallbacks
* fix: translate max_tokens/max_completion_tokens → max_output_tokens in the Chat→Responses translator (#1245). Integrated into release/v3.6.6 — max_tokens → max_output_tokens Responses API translation + unit tests
* feat(oauth): support the cursor-agent CLI as a Cursor credential source (#1258). Integrated into release/v3.6.6 — cursor-agent CLI credential source support
* fix(cc-compatible): restore upstream SSE and correct stream/combo timeout behavior (#1257). Integrated into release/v3.6.6 — CC-compatible upstream SSE restore + stream timeout fix + README table repair
* fix(cli-tools): resolve API key resolution and model mapping bugs in CLI tools (#1263). Integrated into release/v3.6.6
* feat(cli-tools): add Qwen Code CLI integration (#1266). Integrated into release/v3.6.6
* fix(i18n): add missing zh-CN translations and fix logger imports (#1269). Integrated into release/v3.6.6
* fix(i18n): add Chinese i18n support to dashboard components (#1274). Integrated into release/v3.6.6
* feat: update Pollinations to require an API key, remove the free-tier flag (#1177)
* feat: friendly error messages for crypto/encryption failures (#1165)
* feat: add a TPS (tokens per second) metric column to request logs (#1182)
* feat: merge custom/imported models into the filter list for all providers (#1191)
* feat(fallback): fix provider-profile-driven lockouts (#1267). Integrates rdself's unify-provider-profile-locks PR manually to handle structural conflicts.
* fix(claude): proper Anthropic SDK integration (#1271)
* fix(healthcheck): use the correct proxy wrapper format for getAccessToken (#1272)
* chore(release): v3.6.6 — skills registry stability fix + final integration
* fix(auth): harden bootstrap auth and memory dashboard behavior
  Restrict unauthenticated writes to /api/settings/require-login to the initial bootstrap window while keeping read-only checks public. This prevents post-setup config changes without blocking first-run login setup, and the onboarding flow now logs in immediately after setting the password. Restore memory API filtering and pagination behavior by supporting q searches, honoring offset-based requests, and avoiding unrelated fallback results when FTS misses. Update the dashboard stats fallback to use the response totals consistently. Package the MCP server with explicit file entries and add regression tests for bootstrap auth and memory route behavior.
* fix(codex): remove max_output_tokens from the body for compatibility
* chore(release): v3.6.6 — include PR #1274 fixes in the changelog
* chore: exclude additional build artifacts and internal directories from the npm package distribution
* fix: update the Gemini OAuth test to match registry defaults + Codex UI improvements
* fix: restore .mjs refs for scripts/ in test imports after the TS migration
* fix: restore the next.config.mjs ref in the dev-origins test
* fix: implement DB migration safety checks and the Codex config format
* fix: disable the mass-migration abort during unit tests based on the auto-backup flag
* fix: update the script regex in auto-update tests to use .mjs
* feat: add Perplexity Web (Session) provider (#1289). Integrated into release/v3.6.6
* fix(cli): resolve Codex routing config parsing, standardize select-model button positioning, and clarify OAuth documentation
* docs(changelog): record recent CLI, provider, and test updates
  Document the latest fixes for Codex routing configuration parsing and Lobehub provider icon fallback behavior. Note that the remaining JavaScript test files were migrated to TypeScript ES modules, completing the test stack transition.
* chore(release): merge #1286 minor improvements manually to avoid a testing conflict
* chore(test): rename perplexity-web.test.mjs to .ts to maintain a 100% TS codebase
* chore(docs): update CHANGELOG.md for the perplexity-web provider
* fix(security): resolve CodeQL incomplete-URL-substring sanitization via URL parsing in test mocks
* fix: integrate compressContext() into the chatCore.ts request pipeline
  Proactively compress oversized contexts before sending to upstream providers, preventing context_length_exceeded errors. Compression triggers at 85% of the model's context limit using the existing 3-layer compressContext() function.
  - Import compressContext, estimateTokens, and getTokenLimit from contextManager
  - Add the compression check after translation, before executor dispatch
  - Estimate tokens and compare against the 85% threshold of the model's context limit
  - Apply 3-layer compression (trim tools, compress thinking, purify history)
  - Log compression events with before/after token counts and layers applied
  - Audit compression events for observability
  - Add unit tests verifying integration behavior
  Closes #1290
* fix(tests): align reasoning expectations with the GLM thinking structure
* fix: prevent orphaned tool_result messages in purifyHistory()
  When purifyHistory() drops the oldest messages to fit the context window, it can split tool_use/tool_result pairs — keeping the tool_result but dropping the tool_use that initiated it. This causes upstream providers to reject the request with format errors. Add fixToolPairs(), which runs after each purification pass to remove:
  - OpenAI format: orphaned role='tool' messages without a matching tool_calls ID
  - Claude format: orphaned tool_result content blocks without a matching tool_use ID
  Closes #1291
* fix(tests): supply tool_use in the mock so it is not dropped
* chore: convert the remaining test to TypeScript
* fix(tests): restore compatibility with the compressContext threshold test after the tsx migration
* docs: finalize v3.6.6 release documentation
* fix(core): finalize provider removal, type issues, and the Codex API key config
* fix(dashboard): render the Web/Cookie, Search, and Audio provider sections and fix TypeScript errors
* fix: increase the MCP web_search timeout to 60s (#1278)
* fix: route combo testing properly for embedding models (#1260)
* fix: accumulate excluded accounts in the combo fallback loop (#1233)
* fix: strip leading whitespace and newlines from the first streaming chunk (#1211)
* docs: clarify VPS and Docker settings for OAuth credentials (#1204)
* fix: return real retry-after for pipeline gates (#1301). Integrated into release/v3.6.6 — returns real Retry-After values from pipeline gates
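The fixToolPairs() behavior described for the OpenAI format can be sketched roughly as below. The message shape is simplified and the helper name follows the commit; the real implementation also handles Claude-style tool_result content blocks:

```typescript
// Rough sketch of orphan removal after history purification (OpenAI shape).
// A role:"tool" message is orphaned when no surviving assistant message
// still carries a tool_calls entry with the same id.
type Msg = {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  tool_calls?: { id: string }[];
  tool_call_id?: string;
};

function fixToolPairs(messages: Msg[]): Msg[] {
  // Collect the ids of tool calls that survived purification.
  const liveToolCallIds = new Set<string>();
  for (const m of messages) {
    for (const c of m.tool_calls ?? []) liveToolCallIds.add(c.id);
  }
  // Drop tool results whose initiating tool_use was purged.
  return messages.filter(
    (m) => m.role !== "tool" || liveToolCallIds.has(m.tool_call_id ?? "")
  );
}
```

For example, if purification dropped the assistant turn that issued `call_1`, its `role:"tool"` reply is removed too, so the upstream no longer sees a dangling tool result.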
* feat: streaming semantic cache, Cursor auto-version detection, and call-log enhancements (#1296). Integrated into release/v3.6.6 — streaming semantic cache, Cursor auto-version detection, call-log cache_source tracking
* feat(api): support more OpenAI types (image, embeddings, audio-transcriptions, audio-speech) (#1297). Integrated into release/v3.6.6 — adds embeddings, audio-transcriptions, audio-speech, and images-generations support for custom OpenAI-compatible providers, plus the Pollinations image registry
* deps: bump hono from 4.12.12 to 4.12.14 (#1302). Integrated into release/v3.6.6
* deps: bump hono from 4.12.12 to 4.12.14 (#1306). Integrated into release/v3.6.6
* chore: stabilization fixes for v3.6.6 (#1298, #1254, #59, CI)
* fix(providers): match the correct endpoint for Xiaomi MiMo; strip the routing prefix for custom OpenAI endpoints (#1303, #1261)
* feat(storage): add database backup cleanup controls
* chore(release): v3.6.6 — final stabilization push
* Backport the call log storage refactor to release/v3.6.6 (#1307). Integrated into release/v3.6.6
* deps: update dompurify to 3.4.0 to resolve CVE-XYZ (#60)
* test: disable SQLite auto-backup in CI to resolve an E2E timeout (#24481475058)
* chore(docs): sync the CHANGELOG for v3.6.6 with missing features and fixes
* chore(release): prep v3.6.6 infrastructure and type-safety fixes
  - Migrated legacy .mjs scripts to .ts (bin, prepublish, policies)
  - Resolved pre-commit strict lint (t11 budget) errors in combo.ts
  - Explicitly typed all TS bindings in pack-artifact policies
  - Updated package.json commands to run Node via tsx/esm internally
  - Hardened CI/CD with explicit Node version 22.22.2 checks
  - Completed stage validations for the v3.6.6 final release
* chore: fix TS build errors and E2E timeouts in CI
  - Migrate nodeRuntimeSupport to TS interfaces, avoiding implicit any
  - Increase visibility timeouts in the skills-marketplace E2E test to 15s to bypass CI flakiness
  - Complete the migration of .mjs scripts to .ts, ensuring type safety
* chore(release): sync package version 3.6.6 across workspaces
* test(e2e): universally increase UI component visibility timeouts from 5s to 15s to bypass CI starvation
* chore(build): inject baseUrl, paths, and types:node into the MITM tsconfig within the prepublish hook to fix missing types in the CI check

---------

Co-authored-by: diegosouzapw <diegosouzapw@users.noreply.github.com>
Co-authored-by: Jack <5443152+hijak@users.noreply.github.com>
Co-authored-by: Randi <55005611+rdself@users.noreply.github.com>
Co-authored-by: Paijo <14921983+oyi77@users.noreply.github.com>
Co-authored-by: Samuel Cedric <ceds.sam@gmail.com>
Co-authored-by: Max Garmash <max@37bytes.com>
Co-authored-by: Markus Hartung <mail@hartmark.se>
Co-authored-by: Gi99lin <74502520+Gi99lin@users.noreply.github.com>
Co-authored-by: Payne <baboialex95@gmail.com>
Co-authored-by: Benson K B <bensonkbmca@gmail.com>
Co-authored-by: clousky2020 <33016567+clousky2020@users.noreply.github.com>
Co-authored-by: Ravi Tharuma <25951435+RaviTharuma@users.noreply.github.com>
Co-authored-by: oyi77 <oyi77@users.noreply.github.com>
Co-authored-by: Hdsje <vovan877@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: xiaoge1688 <moyekongling@gmail.com>
OmniRoute Architecture
🌐 Languages: 🇺🇸 English | 🇧🇷 Português (Brasil) | 🇪🇸 Español | 🇫🇷 Français | 🇮🇹 Italiano | 🇷🇺 Русский | 🇨🇳 中文 (简体) | 🇩🇪 Deutsch | 🇮🇳 हिन्दी | 🇹🇭 ไทย | 🇺🇦 Українська | 🇸🇦 العربية | 🇯🇵 日本語 | 🇻🇳 Tiếng Việt | 🇧🇬 Български | 🇩🇰 Dansk | 🇫🇮 Suomi | 🇮🇱 עברית | 🇭🇺 Magyar | 🇮🇩 Bahasa Indonesia | 🇰🇷 한국어 | 🇲🇾 Bahasa Melayu | 🇳🇱 Nederlands | 🇳🇴 Norsk | 🇵🇹 Português (Portugal) | 🇷🇴 Română | 🇵🇱 Polski | 🇸🇰 Slovenčina | 🇸🇪 Svenska | 🇵🇭 Filipino | 🇨🇿 Čeština
Last updated: 2026-04-15
Executive Summary
OmniRoute is a local AI routing gateway and dashboard built on Next.js.
It provides a single OpenAI-compatible endpoint (/v1/*) and routes traffic across multiple upstream providers with translation, fallback, token refresh, and usage tracking.
Core capabilities:
- OpenAI-compatible API surface for CLI/tools (100+ providers, 16 executors)
- Request/response translation across provider formats
- Model combo fallback (multi-model sequence)
- Structured combo steps (provider + model + connection) with runtime ordering by `compositeTiers`
- Account-level fallback (multi-account per provider)
- Quota preflight and quota-aware P2C account selection in the main chat path
- OAuth + API-key provider connection management (13 OAuth modules)
- Embedding generation via `/v1/embeddings` (6 providers, 9 models)
- Image generation via `/v1/images/generations` (10+ providers, 20+ models)
- Audio transcription via `/v1/audio/transcriptions` (7 providers)
- Text-to-speech via `/v1/audio/speech` (10 providers)
- Video generation via `/v1/videos/generations` (ComfyUI + SD WebUI)
- Music generation via `/v1/music/generations` (ComfyUI)
- Web search via `/v1/search` (5 providers)
- Moderations via `/v1/moderations`
- Reranking via `/v1/rerank`
- Think tag parsing (`<think>...</think>`) for reasoning models
- Response sanitization for strict OpenAI SDK compatibility
- Role normalization (developer→system, system→user) for cross-provider compatibility
- Structured output conversion (json_schema → Gemini responseSchema)
- Local persistence for providers, keys, aliases, combos, settings, pricing (26 DB modules)
- Usage/cost tracking and request logging
- Optional cloud sync for multi-device/state sync
- IP allowlist/blocklist for API access control
- Thinking budget management (passthrough/auto/custom/adaptive)
- Global system prompt injection
- Session tracking and fingerprinting
- Per-account enhanced rate limiting with provider-specific profiles
- Circuit breaker pattern for provider resilience
- Anti-thundering herd protection with mutex locking
- Signature-based request deduplication cache
- Domain layer: model availability, cost rules, fallback policy, lockout policy
- Context Relay: session handoff summaries for account rotation continuity
- Domain state persistence (SQLite write-through cache for fallbacks, budgets, lockouts, circuit breakers)
- Policy engine for centralized request evaluation (lockout → budget → fallback)
- Request telemetry with p50/p95/p99 latency aggregation
- Combo target telemetry and historical combo target health via `combo_execution_key`/`combo_step_id`
- Correlation ID (`X-Request-Id`) for end-to-end tracing
- Compliance audit logging with opt-out per API key
- Eval framework for LLM quality assurance
- Resilience UI dashboard with real-time circuit breaker status
- MCP Server (25 tools) with 3 transports (stdio/SSE/Streamable HTTP)
- A2A Server (JSON-RPC 2.0 + SSE) with skills and task lifecycle
- Memory system (extraction, injection, retrieval, summarization)
- Skills system (registry, executor, sandbox, built-in skills)
- MITM proxy with certificate management and DNS handling
- Prompt injection guard middleware
- ACP (Agent Communication Protocol) registry
- Modular OAuth providers (13 individual modules under `src/lib/oauth/providers/`)
- Uninstall/full-uninstall scripts
- OAuth environment repair action
- WebSocket bridge for OpenAI-compatible WS clients (`/v1/ws`)
- Sync token management (issue/revoke, ETag-versioned config bundle download)
- GLM Thinking (`glmt`) first-class provider preset
- Hybrid token counting (provider-side `/messages/count_tokens` with estimation fallback)
- Model alias auto-seeding (30+ cross-proxy dialect normalizations at startup)
- Safe outbound fetch with SSRF guard, private URL blocking, and configurable retry
- Cooldown-aware chat retries with configurable `requestRetry` and `maxRetryIntervalSec`
- Runtime environment validation with Zod at startup
- Compliance audit v2 with pagination, provider CRUD events, and SSRF-blocked validation logging
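One capability above, role normalization (developer→system, system→user), amounts to a small mapping pass over the message list. A minimal sketch follows; the remap table and function name are illustrative, and the real `roleNormalizer.ts` is per-provider and more involved:

```typescript
// Minimal sketch: remap roles a given provider does not accept.
// Which remaps apply is provider-specific; this table is illustrative.
type Role = "developer" | "system" | "user" | "assistant";

function normalizeRoles(
  messages: { role: Role; content: string }[],
  remap: Partial<Record<Role, Role>>
): { role: Role; content: string }[] {
  return messages.map((m) => ({ ...m, role: remap[m.role] ?? m.role }));
}

// For a provider that supports neither "developer" nor "system":
const out = normalizeRoles(
  [
    { role: "developer", content: "Be terse." },
    { role: "user", content: "Hi" },
  ],
  { developer: "system", system: "user" }
);
```

Note the remap is applied in a single pass, so `developer` becomes `system` without then cascading to `user`.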
Primary runtime model:
- Next.js app routes under `src/app/api/*` implement both dashboard APIs and compatibility APIs
- A shared SSE/routing core in `src/sse/*` + `open-sse/*` handles provider execution, translation, streaming, fallback, and usage
Scope and Boundaries
In Scope
- Local gateway runtime
- Dashboard management APIs
- Provider authentication and token refresh
- Request translation and SSE streaming
- Local state + usage persistence
- Optional cloud sync orchestration
Out of Scope
- Cloud service implementation behind `NEXT_PUBLIC_CLOUD_URL`
- Provider SLA/control plane outside the local process
- External CLI binaries themselves (Claude CLI, Codex CLI, etc.)
Dashboard Surface (Current)
Main pages under `src/app/(dashboard)/dashboard/`:
- `/dashboard` — quick start + provider overview
- `/dashboard/endpoint` — endpoint proxy + MCP + A2A + API endpoint tabs
- `/dashboard/providers` — provider connections and credentials
- `/dashboard/combos` — combo strategies, templates, step-based builder, model routing rules, manual persisted ordering
- `/dashboard/costs` — cost aggregation and pricing visibility
- `/dashboard/analytics` — usage analytics, evaluations, combo target health
- `/dashboard/limits` — quota/rate controls
- `/dashboard/cli-tools` — CLI onboarding, runtime detection, config generation
- `/dashboard/agents` — detected ACP agents + custom agent registration
- `/dashboard/media` — image/video/music playground
- `/dashboard/search-tools` — search provider testing and history
- `/dashboard/health` — uptime, circuit breakers, rate limits, quota-monitored sessions
- `/dashboard/logs` — request/proxy/audit/console logs
- `/dashboard/settings` — system settings tabs (general, routing, combo defaults, etc.)
- `/dashboard/api-manager` — API key lifecycle and model permissions
High-Level System Context
```mermaid
flowchart LR
  subgraph Clients[Developer Clients]
    C1[Claude Code]
    C2[Codex CLI]
    C3[OpenClaw / Droid / Cline / Continue / Roo]
    C4[Custom OpenAI-compatible clients]
    BROWSER[Browser Dashboard]
  end
  subgraph Router[OmniRoute Local Process]
    API[V1 Compatibility API\n/v1/*]
    DASH[Dashboard + Management API\n/api/*]
    CORE[SSE + Translation Core\nopen-sse + src/sse]
    DB[(storage.sqlite)]
    UDB[(usage tables + log artifacts)]
  end
  subgraph Upstreams[Upstream Providers]
    P1[OAuth Providers\nClaude/Codex/Gemini/Qwen/Qoder/GitHub/Kiro/Cursor/Antigravity]
    P2[API Key Providers\nOpenAI/Anthropic/OpenRouter/GLM/Kimi/MiniMax\nDeepSeek/Groq/xAI/Mistral/Perplexity\nTogether/Fireworks/Cerebras/Cohere/NVIDIA]
    P3[Compatible Nodes\nOpenAI-compatible / Anthropic-compatible]
  end
  subgraph Cloud[Optional Cloud Sync]
    CLOUD[Cloud Sync Endpoint\nNEXT_PUBLIC_CLOUD_URL]
  end
  C1 --> API
  C2 --> API
  C3 --> API
  C4 --> API
  BROWSER --> DASH
  API --> CORE
  DASH --> DB
  CORE --> DB
  CORE --> UDB
  CORE --> P1
  CORE --> P2
  CORE --> P3
  DASH --> CLOUD
```
Core Runtime Components
1) API and Routing Layer (Next.js App Routes)
Main directories:
- `src/app/api/v1/*` and `src/app/api/v1beta/*` for compatibility APIs
- `src/app/api/*` for management/configuration APIs
- Next rewrites in `next.config.mjs` map `/v1/*` to `/api/v1/*`
Important compatibility routes:
- `src/app/api/v1/chat/completions/route.ts`
- `src/app/api/v1/messages/route.ts`
- `src/app/api/v1/responses/route.ts`
- `src/app/api/v1/models/route.ts` — includes custom models with `custom: true`
- `src/app/api/v1/embeddings/route.ts` — embedding generation (6 providers)
- `src/app/api/v1/images/generations/route.ts` — image generation (4+ providers incl. Antigravity/Nebius)
- `src/app/api/v1/messages/count_tokens/route.ts`
- `src/app/api/v1/providers/[provider]/chat/completions/route.ts` — dedicated per-provider chat
- `src/app/api/v1/providers/[provider]/embeddings/route.ts` — dedicated per-provider embeddings
- `src/app/api/v1/providers/[provider]/images/generations/route.ts` — dedicated per-provider images
- `src/app/api/v1beta/models/route.ts`
- `src/app/api/v1beta/models/[...path]/route.ts`
Management domains:
- Auth/settings: `src/app/api/auth/*`, `src/app/api/settings/*`
- Providers/connections: `src/app/api/providers*`
- Provider nodes: `src/app/api/provider-nodes*`
- Custom models: `src/app/api/provider-models` (GET/POST/DELETE)
- Model catalog: `src/app/api/models/route.ts` (GET)
- Proxy config: `src/app/api/settings/proxy` (GET/PUT/DELETE) + `src/app/api/settings/proxy/test` (POST)
- OAuth: `src/app/api/oauth/*`
- Keys/aliases/combos/pricing: `src/app/api/keys*`, `src/app/api/models/alias`, `src/app/api/combos*`, `src/app/api/pricing`
- Usage: `src/app/api/usage/*`
- Sync/cloud: `src/app/api/sync/*`, `src/app/api/cloud/*`
- CLI tooling helpers: `src/app/api/cli-tools/*`
- IP filter: `src/app/api/settings/ip-filter` (GET/PUT)
- Thinking budget: `src/app/api/settings/thinking-budget` (GET/PUT)
- System prompt: `src/app/api/settings/system-prompt` (GET/PUT)
- Sessions: `src/app/api/sessions` (GET)
- Rate limits: `src/app/api/rate-limits` (GET)
- Resilience: `src/app/api/resilience` (GET/PATCH) — provider profiles, circuit breaker, rate limit state
- Resilience reset: `src/app/api/resilience/reset` (POST) — reset breakers + cooldowns
- Cache stats: `src/app/api/cache/stats` (GET/DELETE)
- Model availability: `src/app/api/models/availability` (GET/POST)
- Telemetry: `src/app/api/telemetry/summary` (GET)
- Budget: `src/app/api/usage/budget` (GET/POST)
- Fallback chains: `src/app/api/fallback/chains` (GET/POST/DELETE)
- Compliance audit: `src/app/api/compliance/audit-log` (GET, with pagination + structured metadata)
- Evals: `src/app/api/evals` (GET/POST), `src/app/api/evals/[suiteId]` (GET)
- Policies: `src/app/api/policies` (GET/POST)
- Sync tokens: `src/app/api/sync/tokens` (GET/POST), `src/app/api/sync/tokens/[id]` (GET/DELETE)
- Config bundle: `src/app/api/sync/bundle` (GET, ETag-versioned snapshot of settings/providers/combos/keys)
- WebSocket: `src/app/api/v1/ws/route.ts` — Upgrade handler for OpenAI-compatible WS clients
2) SSE + Translation Core
Main flow modules:
- Entry: `src/sse/handlers/chat.ts`
- Core orchestration: `open-sse/handlers/chatCore.ts`
- Provider execution adapters: `open-sse/executors/*`
- Format detection/provider config: `open-sse/services/provider.ts`
- Model parse/resolve: `src/sse/services/model.ts`, `open-sse/services/model.ts`
- Account fallback logic: `open-sse/services/accountFallback.ts`
- Translation registry: `open-sse/translator/index.ts`
- Stream transformations: `open-sse/utils/stream.ts`, `open-sse/utils/streamHandler.ts`
- Usage extraction/normalization: `open-sse/utils/usageTracking.ts`
- Think tag parser: `open-sse/utils/thinkTagParser.ts`
- Embedding handler: `open-sse/handlers/embeddings.ts`
- Embedding provider registry: `open-sse/config/embeddingRegistry.ts`
- Image generation handler: `open-sse/handlers/imageGeneration.ts`
- Image provider registry: `open-sse/config/imageRegistry.ts`
- Response sanitization: `open-sse/handlers/responseSanitizer.ts`
- Role normalization: `open-sse/services/roleNormalizer.ts`
Services (business logic):
- Account selection/scoring: `open-sse/services/accountSelector.ts`
- Context lifecycle management: `open-sse/services/contextManager.ts`
- IP filter enforcement: `open-sse/services/ipFilter.ts`
- Session tracking: `open-sse/services/sessionManager.ts`
- Request deduplication: `open-sse/services/signatureCache.ts`
- System prompt injection: `open-sse/services/systemPrompt.ts`
- Thinking budget management: `open-sse/services/thinkingBudget.ts`
- Wildcard model routing: `open-sse/services/wildcardRouter.ts`
- Rate limit management: `open-sse/services/rateLimitManager.ts`
- Circuit breaker: `open-sse/services/circuitBreaker.ts`
- Context handoff: `open-sse/services/contextHandoff.ts` — handoff summary generation and injection for the context-relay strategy
- Codex quota fetcher: `open-sse/services/codexQuotaFetcher.ts` — fetches Codex quota for context-relay handoff decisions
- Cooldown-aware retry: `src/sse/services/cooldownAwareRetry.ts` — per-model cooldown retries with configurable `requestRetry`/`maxRetryIntervalSec`
- Safe outbound fetch: `src/shared/network/safeOutboundFetch.ts` — guarded provider/model fetch with SSRF guard, private-URL blocking, retry, and timeout
- Outbound URL guard: `src/shared/network/outboundUrlGuard.ts` — validates provider URLs against private/localhost CIDR ranges
- Provider request defaults: `open-sse/services/providerRequestDefaults.ts` — provider-level `maxTokens`, `temperature`, `thinkingBudgetTokens` defaults
- GLM provider constants: `open-sse/config/glmProvider.ts` — shared GLM models, quota URLs, GLMT timeout/defaults
- Antigravity upstream: `open-sse/config/antigravityUpstream.ts` — base URL and discovery path constants
- Codex client constants: `open-sse/config/codexClient.ts` — versioned user-agent and client-version values
- Model alias seed: `src/lib/modelAliasSeed.ts` — seeds 30+ cross-proxy dialect aliases at startup
Domain layer modules:
- Model availability: `src/lib/domain/modelAvailability.ts`
- Cost rules/budgets: `src/lib/domain/costRules.ts`
- Fallback policy: `src/lib/domain/fallbackPolicy.ts`
- Combo resolver: `src/lib/domain/comboResolver.ts`
- Lockout policy: `src/lib/domain/lockoutPolicy.ts`
- Policy engine: `src/domain/policyEngine.ts` — centralized lockout → budget → fallback evaluation
- Error codes catalog: `src/lib/domain/errorCodes.ts`
- Request ID: `src/lib/domain/requestId.ts`
- Fetch timeout: `src/lib/domain/fetchTimeout.ts`
- Request telemetry: `src/lib/domain/requestTelemetry.ts`
- Compliance/audit: `src/lib/domain/compliance/index.ts`
- Eval runner: `src/lib/domain/evalRunner.ts`
- Domain state persistence: `src/lib/db/domainState.ts` — SQLite CRUD for fallback chains, budgets, cost history, lockout state, circuit breakers
OAuth provider modules (13 individual files under src/lib/oauth/providers/):
- Registry index: `src/lib/oauth/providers/index.ts`
- Individual providers: `claude.ts`, `codex.ts`, `gemini.ts`, `antigravity.ts`, `qoder.ts`, `qwen.ts`, `kimi-coding.ts`, `github.ts`, `kiro.ts`, `cursor.ts`, `kilocode.ts`, `cline.ts`
- Thin wrapper: `src/lib/oauth/providers.ts` — re-exports from the individual modules
3) Persistence Layer
Primary state DB (SQLite):
- Core infra: `src/lib/db/core.ts` (better-sqlite3, migrations, WAL)
- Re-export facade: `src/lib/localDb.ts` (thin compatibility layer for callers)
- File: `${DATA_DIR}/storage.sqlite` (or `$XDG_CONFIG_HOME/omniroute/storage.sqlite` when set, else `~/.omniroute/storage.sqlite`)
- Entities (tables + KV namespaces): providerConnections, providerNodes, modelAliases, combos, apiKeys, settings, pricing, customModels, proxyConfig, ipFilter, thinkingBudget, systemPrompt
Usage persistence:
- Facade: `src/lib/usageDb.ts` (decomposed modules in `src/lib/usage/*`)
- SQLite tables in `storage.sqlite`: `usage_history`, `call_logs`, `proxy_logs`
- Optional file artifacts remain for compatibility/debug (`${DATA_DIR}/log.txt`, `${DATA_DIR}/call_logs/`, `<repo>/logs/...`)
- Legacy JSON files are migrated to SQLite by startup migrations when present
Domain State DB (SQLite):
- `src/lib/db/domainState.ts` — CRUD operations for domain state
- Tables (created in `src/lib/db/core.ts`): `domain_fallback_chains`, `domain_budgets`, `domain_cost_history`, `domain_lockout_state`, `domain_circuit_breakers`
- Write-through cache pattern: in-memory Maps are authoritative at runtime; mutations are written synchronously to SQLite; state is restored from the DB on cold start
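The write-through pattern described above (memory authoritative, synchronous mirror writes, restore on cold start) can be sketched as follows. The `KvStore` interface is a stand-in for the real better-sqlite3 prepared statements, and the class name is illustrative:

```typescript
// Sketch of the write-through cache: the in-memory Map answers all reads;
// every mutation is mirrored synchronously to durable storage; on cold
// start the Map is rebuilt from storage.
interface KvStore {
  put(key: string, value: string): void;
  delete(key: string): void;
  loadAll(): [string, string][];
}

class WriteThroughCache {
  private mem = new Map<string, string>();

  constructor(private db: KvStore) {
    // Cold start: restore authoritative state from durable storage.
    for (const [k, v] of db.loadAll()) this.mem.set(k, v);
  }

  get(key: string): string | undefined {
    return this.mem.get(key); // reads never touch the DB
  }

  set(key: string, value: string): void {
    this.mem.set(key, value);
    this.db.put(key, value); // synchronous mirror write
  }

  remove(key: string): void {
    this.mem.delete(key);
    this.db.delete(key);
  }
}
```

The synchronous mirror write is the key property: a process crash between a mutation and the next sweep cannot lose circuit-breaker or lockout state, because there is no deferred flush.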
4) Auth + Security Surfaces
- Dashboard cookie auth: `src/proxy.ts`, `src/app/api/auth/login/route.ts`
- API key generation/verification: `src/shared/utils/apiKey.ts`
- Provider secrets persisted in `providerConnections` entries
- Outbound proxy support via `open-sse/utils/proxyFetch.ts` (env vars) and `open-sse/utils/networkProxy.ts` (configurable per-provider or global)
- SSRF / outbound URL guard: `src/shared/network/outboundUrlGuard.ts` — blocks private/loopback/link-local ranges for all provider calls
- Runtime env validation: `src/lib/env/runtimeEnv.ts` — Zod schema for all environment variables, surfaced as startup errors/warnings
- Sync tokens: `src/lib/db/syncTokens.ts` — scoped tokens for config bundle download endpoints; backed by the `sync_tokens` SQLite table (migration `024_create_sync_tokens.sql`)
- WebSocket handshake auth: `src/lib/ws/handshake.ts` — validates WS upgrade requests via API key or session cookie
5) Cloud Sync
- Scheduler init:
src/lib/initCloudSync.ts,src/shared/services/initializeCloudSync.ts,src/shared/services/modelSyncScheduler.ts - Periodic task:
src/shared/services/cloudSyncScheduler.ts - Periodic task:
src/shared/services/modelSyncScheduler.ts - Control route:
src/app/api/sync/cloud/route.ts
Request Lifecycle (/v1/chat/completions)
sequenceDiagram
autonumber
participant Client as CLI/SDK Client
participant Route as /api/v1/chat/completions
participant Chat as src/sse/handlers/chat
participant Core as open-sse/handlers/chatCore
participant Model as Model Resolver
participant Auth as Credential Selector
participant Exec as Provider Executor
participant Prov as Upstream Provider
participant Stream as Stream Translator
participant Usage as usageDb
Client->>Route: POST /v1/chat/completions
Route->>Chat: handleChat(request)
Chat->>Model: parse/resolve model or combo
alt Combo model
Chat->>Chat: iterate combo models (handleComboChat)
end
Chat->>Auth: getProviderCredentials(provider)
Auth-->>Chat: active account + tokens/api key
Chat->>Core: handleChatCore(body, modelInfo, credentials)
Core->>Core: detect source format
Core->>Core: translate request to target format
Core->>Exec: execute(provider, transformedBody)
Exec->>Prov: upstream API call
Prov-->>Exec: SSE/JSON response
Exec-->>Core: response + metadata
alt 401/403
Core->>Exec: refreshCredentials()
Exec-->>Core: updated tokens
Core->>Exec: retry request
end
Core->>Stream: translate/normalize stream to client format
Stream-->>Client: SSE chunks / JSON response
Stream->>Usage: extract usage + persist history/log
Combo + Account Fallback Flow
flowchart TD
A[Incoming model string] --> B{Is combo name?}
B -- Yes --> C[Load combo models sequence]
B -- No --> D[Single model path]
C --> E[Try model N]
E --> F[Resolve provider/model]
D --> F
F --> G[Select account credentials]
G --> H{Credentials available?}
H -- No --> I[Return provider unavailable]
H -- Yes --> J[Execute request]
J --> K{Success?}
K -- Yes --> L[Return response]
K -- No --> M{Fallback-eligible error?}
M -- No --> N[Return error]
M -- Yes --> O[Mark account unavailable cooldown]
O --> P{Another account for provider?}
P -- Yes --> G
P -- No --> Q{In combo with next model?}
Q -- Yes --> E
Q -- No --> R[Return all unavailable]
Fallback decisions are driven by open-sse/services/accountFallback.ts using status codes and error-message heuristics. Combo routing adds one extra guard: provider-scoped 400s such as upstream content-block and role-validation failures are treated as model-local failures so later combo targets can still run.
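That decision can be sketched as a small predicate. The status codes and message patterns below are illustrative only; the real, richer rules live in `open-sse/services/accountFallback.ts`:

```typescript
// Sketch of fallback eligibility based on status codes and error-message
// heuristics. Patterns are illustrative; the real logic lives in
// open-sse/services/accountFallback.ts.
type FallbackDecision = "account-fallback" | "model-local" | "fail";

function classifyError(status: number, message: string): FallbackDecision {
  // Auth, rate-limit, and server errors: try the next account.
  if (status === 401 || status === 403 || status === 429 || status >= 500) {
    return "account-fallback";
  }
  // Provider-scoped 400s (e.g. content-block / role-validation failures)
  // are treated as model-local so later combo targets can still run.
  if (status === 400 && /content.block|role|unsupported/i.test(message)) {
    return "model-local";
  }
  return "fail"; // anything else is a genuine client error: surface it
}
```

An `"account-fallback"` result loops back to credential selection in the flowchart above, while `"model-local"` advances the combo to its next model.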
OAuth Onboarding and Token Refresh Lifecycle
sequenceDiagram
autonumber
participant UI as Dashboard UI
participant OAuth as /api/oauth/[provider]/[action]
participant ProvAuth as Provider Auth Server
participant DB as localDb
participant Test as /api/providers/[id]/test
participant Exec as Provider Executor
UI->>OAuth: GET authorize or device-code
OAuth->>ProvAuth: create auth/device flow
ProvAuth-->>OAuth: auth URL or device code payload
OAuth-->>UI: flow data
UI->>OAuth: POST exchange or poll
OAuth->>ProvAuth: token exchange/poll
ProvAuth-->>OAuth: access/refresh tokens
OAuth->>DB: createProviderConnection(oauth data)
OAuth-->>UI: success + connection id
UI->>Test: POST /api/providers/[id]/test
Test->>Exec: validate credentials / optional refresh
Exec-->>Test: valid or refreshed token info
Test->>DB: update status/tokens/errors
Test-->>UI: validation result
Refresh during live traffic is executed inside open-sse/handlers/chatCore.ts via executor refreshCredentials().
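The 401/403 branch in the diagrams can be sketched as a single refresh-then-retry wrapper. `Executor`, `execute`, and `refreshCredentials` are simplified stand-ins for the real executor surface:

```typescript
// Sketch of the refresh-and-retry step: one retry after credential refresh
// on 401/403, never a refresh loop. Method shapes are hypothetical
// simplifications of the real executor API.
interface Executor<T> {
  execute(token: string): Promise<{ status: number; body?: T }>;
  refreshCredentials(): Promise<string>; // returns a fresh access token
}

async function executeWithRefresh<T>(
  exec: Executor<T>,
  token: string,
): Promise<{ status: number; body?: T }> {
  const first = await exec.execute(token);
  if (first.status !== 401 && first.status !== 403) return first;
  const fresh = await exec.refreshCredentials(); // may throw; surfaces as error
  return exec.execute(fresh); // exactly one retry
}
```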
Cloud Sync Lifecycle (Enable / Sync / Disable)
sequenceDiagram
autonumber
participant UI as Endpoint Page UI
participant Sync as /api/sync/cloud
participant DB as localDb
participant Cloud as External Cloud Sync
participant Claude as ~/.claude/settings.json
UI->>Sync: POST action=enable
Sync->>DB: set cloudEnabled=true
Sync->>DB: ensure API key exists
Sync->>Cloud: POST /sync/{machineId} (providers/aliases/combos/keys)
Cloud-->>Sync: sync result
Sync->>Cloud: GET /{machineId}/v1/verify
Sync-->>UI: enabled + verification status
UI->>Sync: POST action=sync
Sync->>Cloud: POST /sync/{machineId}
Cloud-->>Sync: remote data
Sync->>DB: update newer local tokens/status
Sync-->>UI: synced
UI->>Sync: POST action=disable
Sync->>DB: set cloudEnabled=false
Sync->>Cloud: DELETE /sync/{machineId}
Sync->>Claude: switch ANTHROPIC_BASE_URL back to local (if needed)
Sync-->>UI: disabled
Periodic sync is triggered by CloudSyncScheduler when cloud is enabled.
Data Model and Storage Map
erDiagram
SETTINGS ||--o{ PROVIDER_CONNECTION : controls
PROVIDER_NODE ||--o{ PROVIDER_CONNECTION : backs_compatible_provider
PROVIDER_CONNECTION ||--o{ USAGE_ENTRY : emits_usage
SETTINGS {
boolean cloudEnabled
number stickyRoundRobinLimit
boolean requireLogin
string password_hash
string fallbackStrategy
json rateLimitDefaults
json providerProfiles
}
PROVIDER_CONNECTION {
string id
string provider
string authType
string name
number priority
boolean isActive
string apiKey
string accessToken
string refreshToken
string expiresAt
string testStatus
string lastError
string rateLimitedUntil
json providerSpecificData
}
PROVIDER_NODE {
string id
string type
string name
string prefix
string apiType
string baseUrl
}
MODEL_ALIAS {
string alias
string targetModel
}
COMBO {
string id
string name
string[] models
}
API_KEY {
string id
string name
string key
string machineId
}
USAGE_ENTRY {
string provider
string model
number prompt_tokens
number completion_tokens
string connectionId
string timestamp
}
CUSTOM_MODEL {
string id
string name
string providerId
}
PROXY_CONFIG {
string global
json providers
}
IP_FILTER {
string mode
string[] allowlist
string[] blocklist
}
THINKING_BUDGET {
string mode
number customBudget
string effortLevel
}
SYSTEM_PROMPT {
boolean enabled
string prompt
string position
}
Physical storage files:
- Primary runtime DB: `${DATA_DIR}/storage.sqlite`
- Request log lines: `${DATA_DIR}/log.txt` (compat/debug artifact)
- Structured call payload archives: `${DATA_DIR}/call_logs/`
- Optional translator/request debug sessions: `<repo>/logs/...`
Deployment Topology
flowchart LR
subgraph LocalHost[Developer Host]
CLI[CLI Tools]
Browser[Dashboard Browser]
end
subgraph ContainerOrProcess[OmniRoute Runtime]
Next[Next.js Server\nPORT=20128]
Core[SSE Core + Executors]
MainDB[(storage.sqlite)]
UsageDB[(usage tables + log artifacts)]
end
subgraph External[External Services]
Providers[AI Providers]
SyncCloud[Cloud Sync Service]
end
CLI --> Next
Browser --> Next
Next --> Core
Next --> MainDB
Core --> MainDB
Core --> UsageDB
Core --> Providers
Next --> SyncCloud
Module Mapping (Decision-Critical)
Route and API Modules
- `src/app/api/v1/*`, `src/app/api/v1beta/*`: compatibility APIs
- `src/app/api/v1/providers/[provider]/*`: dedicated per-provider routes (chat, embeddings, images)
- `src/app/api/providers*`: provider CRUD, validation, testing
- `src/app/api/provider-nodes*`: custom compatible node management
- `src/app/api/provider-models`: custom model management (CRUD)
- `src/app/api/models/route.ts`: model catalog API (aliases + custom models)
- `src/app/api/oauth/*`: OAuth/device-code flows
- `src/app/api/keys*`: local API key lifecycle
- `src/app/api/models/alias`: alias management
- `src/app/api/combos*`: fallback combo management
- `src/app/api/pricing`: pricing overrides for cost calculation
- `src/app/api/settings/proxy`: proxy configuration (GET/PUT/DELETE)
- `src/app/api/settings/proxy/test`: outbound proxy connectivity test (POST)
- `src/app/api/usage/*`: usage and logs APIs
- `src/app/api/sync/*` + `src/app/api/cloud/*`: cloud sync and cloud-facing helpers
- `src/app/api/cli-tools/*`: local CLI config writers/checkers
- `src/app/api/settings/ip-filter`: IP allowlist/blocklist (GET/PUT)
- `src/app/api/settings/thinking-budget`: thinking token budget config (GET/PUT)
- `src/app/api/settings/system-prompt`: global system prompt (GET/PUT)
- `src/app/api/sessions`: active session listing (GET)
- `src/app/api/rate-limits`: per-account rate limit status (GET)
- `src/app/api/sync/tokens`: sync token CRUD (GET/POST)
- `src/app/api/sync/tokens/[id]`: sync token get/delete (GET/DELETE)
- `src/app/api/sync/bundle`: config bundle download (GET, ETag versioning)
- `src/app/api/v1/ws`: WebSocket upgrade handler for OpenAI-compatible WS clients
Routing and Execution Core
- `src/sse/handlers/chat.ts`: request parse, combo handling, account selection loop
- `open-sse/handlers/chatCore.ts`: translation, executor dispatch, retry/refresh handling, stream setup
- `open-sse/executors/*`: provider-specific network and format behavior
Translation Registry and Format Converters
- `open-sse/translator/index.ts`: translator registry and orchestration
- Request translators: `open-sse/translator/request/*`
- Response translators: `open-sse/translator/response/*`
- Format constants: `open-sse/translator/formats.ts`
Persistence
- `src/lib/db/*`: persistent config/state and domain persistence on SQLite
- `src/lib/localDb.ts`: compatibility re-export for DB modules
- `src/lib/usageDb.ts`: usage history/call logs facade on top of SQLite tables
Provider Executor Coverage (Strategy Pattern)
Each provider has a specialized executor extending BaseExecutor (in open-sse/executors/base.ts), which provides URL building, header construction, retry with exponential backoff, credential refresh hooks, and the execute() orchestration method.
| Executor | Provider(s) | Special Handling |
|---|---|---|
| DefaultExecutor | OpenAI, Claude, Gemini, Qwen, OpenRouter, GLM, Kimi, MiniMax, DeepSeek, Groq, xAI, Mistral, Perplexity, Together, Fireworks, Cerebras, Cohere, NVIDIA, etc. | Dynamic URL/header config per provider |
| AntigravityExecutor | Google Antigravity | Custom project/session IDs, Retry-After parsing |
| CliProxyApiExecutor | CLIProxyAPI-compatible providers | Custom auth and protocol handling |
| CloudflareAiExecutor | Cloudflare Workers AI | Account ID injection, Neurons-based usage tracking |
| CodexExecutor | OpenAI Codex | Injects system instructions, forces reasoning effort |
| CursorExecutor | Cursor IDE | ConnectRPC protocol, Protobuf encoding, request signing via checksum |
| GithubExecutor | GitHub Copilot | Copilot token refresh, VSCode-mimicking headers |
| GeminiCLIExecutor | Gemini CLI | Google OAuth token refresh cycle |
| KiroExecutor | AWS CodeWhisperer/Kiro | AWS EventStream binary format → SSE conversion |
| OpenCodeExecutor | OpenCode | AI SDK compatible provider setup |
| PollinationsExecutor | Pollinations AI | API key now required (as of 3.6.6), rate-limited requests |
| PuterExecutor | Puter | Browser-based provider integration |
| QoderExecutor | Qoder AI | PAT and OAuth support, multi-model free tier |
| VertexExecutor | Google Vertex AI | Service account auth, region-based endpoints |
All other providers (including custom compatible nodes) use the DefaultExecutor.
Provider Compatibility Matrix
| Provider | Format | Auth | Stream | Non-Stream | Token Refresh | Usage API |
|---|---|---|---|---|---|---|
| Claude | claude | API Key / OAuth | ✅ | ✅ | ✅ | ⚠️ Admin only |
| Gemini | gemini | API Key / OAuth | ✅ | ✅ | ✅ | ⚠️ Cloud Console |
| Gemini CLI | gemini-cli | OAuth | ✅ | ✅ | ✅ | ⚠️ Cloud Console |
| Antigravity | antigravity | OAuth | ✅ | ✅ | ✅ | ✅ Full quota API |
| OpenAI | openai | API Key | ✅ | ✅ | ❌ | ❌ |
| Codex | openai-responses | OAuth | ✅ forced | ❌ | ✅ | ✅ Rate limits |
| GitHub Copilot | openai | OAuth + Copilot Token | ✅ | ✅ | ✅ | ✅ Quota snapshots |
| Cursor | cursor | Custom checksum | ✅ | ✅ | ❌ | ❌ |
| Kiro | kiro | AWS SSO OIDC | ✅ (EventStream) | ❌ | ✅ | ✅ Usage limits |
| Qwen | openai | OAuth | ✅ | ✅ | ✅ | ⚠️ Per request |
| Qoder | openai | OAuth / PAT | ✅ | ✅ | ✅ | ⚠️ Per request |
| Kilo Code | openai | OAuth | ✅ | ✅ | ✅ | ❌ |
| Cline | openai | OAuth | ✅ | ✅ | ✅ | ❌ |
| Kimi Coding | openai | OAuth | ✅ | ✅ | ✅ | ❌ |
| OpenRouter | openai | API Key | ✅ | ✅ | ❌ | ❌ |
| GLM/Kimi/MiniMax | claude | API Key | ✅ | ✅ | ❌ | ❌ |
| DeepSeek | openai | API Key | ✅ | ✅ | ❌ | ❌ |
| Groq | openai | API Key | ✅ | ✅ | ❌ | ❌ |
| xAI (Grok) | openai | API Key | ✅ | ✅ | ❌ | ❌ |
| Mistral | openai | API Key | ✅ | ✅ | ❌ | ❌ |
| Perplexity | openai | API Key | ✅ | ✅ | ❌ | ❌ |
| Together AI | openai | API Key | ✅ | ✅ | ❌ | ❌ |
| Fireworks AI | openai | API Key | ✅ | ✅ | ❌ | ❌ |
| Cerebras | openai | API Key | ✅ | ✅ | ❌ | ❌ |
| Cohere | openai | API Key | ✅ | ✅ | ❌ | ❌ |
| NVIDIA NIM | openai | API Key | ✅ | ✅ | ❌ | ❌ |
| Cloudflare AI | openai | API Token + Acct ID | ✅ | ✅ | ❌ | ❌ |
| Pollinations | openai | API Key (required as of 3.6.6) | ✅ | ✅ | ❌ | ❌ |
| Scaleway AI | openai | API Key | ✅ | ✅ | ❌ | ❌ |
| LongCat | openai | API Key | ✅ | ✅ | ❌ | ❌ |
| Ollama Cloud | openai | API Key (optional) | ✅ | ✅ | ❌ | ❌ |
| HuggingFace | openai | API Key | ✅ | ✅ | ❌ | ❌ |
| Nebius | openai | API Key | ✅ | ✅ | ❌ | ❌ |
| SiliconFlow | openai | API Key | ✅ | ✅ | ❌ | ❌ |
| Hyperbolic | openai | API Key | ✅ | ✅ | ❌ | ❌ |
| Vertex AI | gemini | Service Account | ✅ | ✅ | ✅ | ⚠️ Cloud Console |
| Puter | openai | API Key | ✅ | ✅ | ❌ | ❌ |
Format Translation Coverage
Detected source formats include: `openai`, `openai-responses`, `claude`, `gemini`.
Target formats include:
- OpenAI chat/Responses
- Claude
- Gemini/Gemini-CLI/Antigravity envelope
- Kiro
- Cursor
Translations use OpenAI as the hub format — all conversions go through OpenAI as intermediate:
Source Format → OpenAI (hub) → Target Format
Translations are selected dynamically based on source payload shape and provider target format.
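The hub pattern means each format needs only two converters (to and from OpenAI) instead of a pairwise converter for every source/target combination. A minimal sketch with simplified message shapes (the real translators handle far more fields):

```typescript
// Hub-and-spoke translation sketch: every source is lifted to an
// OpenAI-shaped intermediate, then lowered to the target. Message shapes
// are simplified stand-ins for the real translator payloads.
type OpenAIMsg = { role: string; content: string };
type ClaudeMsg = { role: string; content: Array<{ type: "text"; text: string }> };
type GeminiMsg = { role: "user" | "model"; parts: Array<{ text: string }> };

// Source format → OpenAI hub
function claudeToHub(msgs: ClaudeMsg[]): OpenAIMsg[] {
  return msgs.map((m) => ({
    role: m.role,
    content: m.content.map((b) => b.text).join(""),
  }));
}

// OpenAI hub → target format
function hubToGemini(msgs: OpenAIMsg[]): GeminiMsg[] {
  return msgs.map((m) => ({
    role: m.role === "assistant" ? "model" : "user",
    parts: [{ text: m.content }],
  }));
}

// Claude → Gemini composes through the hub; no direct pairwise converter.
function claudeToGemini(msgs: ClaudeMsg[]): GeminiMsg[] {
  return hubToGemini(claudeToHub(msgs));
}
```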
Additional processing layers in the translation pipeline:
- Response sanitization — strips non-standard fields from OpenAI-format responses (both streaming and non-streaming) to ensure strict SDK compliance
- Role normalization — converts `developer` → `system` for non-OpenAI targets; merges `system` → `user` for models that reject the system role (GLM, ERNIE)
- Think tag extraction — parses `<think>...</think>` blocks from content into the `reasoning_content` field
- Structured output — converts OpenAI `response_format.json_schema` to Gemini's `responseMimeType` + `responseSchema`
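The think-tag extraction layer can be sketched as a small pure function; the field name `reasoning_content` comes from the description above, while the parsing details are an assumption about how the real pipeline works:

```typescript
// Sketch of think-tag extraction: pull <think>...</think> blocks out of
// assistant content and surface them as reasoning_content. Parsing details
// here are illustrative, not the real translator implementation.
function extractThink(content: string): {
  content: string;
  reasoning_content?: string;
} {
  const matches = [...content.matchAll(/<think>([\s\S]*?)<\/think>/g)];
  if (matches.length === 0) return { content }; // nothing to extract
  const reasoning = matches.map((m) => m[1]).join("\n").trim();
  const cleaned = content.replace(/<think>[\s\S]*?<\/think>/g, "").trim();
  return { content: cleaned, reasoning_content: reasoning };
}
```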
Supported API Endpoints
| Endpoint | Format | Handler |
|---|---|---|
| POST /v1/chat/completions | OpenAI Chat | `src/sse/handlers/chat.ts` |
| POST /v1/messages | Claude Messages | Same handler (auto-detected) |
| POST /v1/responses | OpenAI Responses | `open-sse/handlers/responsesHandler.ts` |
| POST /v1/embeddings | OpenAI Embeddings | `open-sse/handlers/embeddings.ts` |
| GET /v1/embeddings | Model listing | API route |
| POST /v1/images/generations | OpenAI Images | `open-sse/handlers/imageGeneration.ts` |
| GET /v1/images/generations | Model listing | API route |
| POST /v1/providers/{provider}/chat/completions | OpenAI Chat | Dedicated per-provider route with model validation |
| POST /v1/providers/{provider}/embeddings | OpenAI Embeddings | Dedicated per-provider route with model validation |
| POST /v1/providers/{provider}/images/generations | OpenAI Images | Dedicated per-provider route with model validation |
| POST /v1/messages/count_tokens | Claude Token Count | API route |
| GET /v1/models | OpenAI Models list | API route (chat + embedding + image + custom models) |
| GET /api/models/catalog | Catalog | All models grouped by provider + type |
| POST /v1beta/models/*:streamGenerateContent | Gemini native | API route |
| GET/PUT/DELETE /api/settings/proxy | Proxy Config | Network proxy configuration |
| POST /api/settings/proxy/test | Proxy Connectivity | Proxy health/connectivity test endpoint |
| GET/POST/DELETE /api/provider-models | Provider Models | Provider model metadata backing custom and managed available models |
Bypass Handler
The bypass handler (open-sse/utils/bypassHandler.ts) intercepts known "throwaway" requests from Claude CLI — warmup pings, title extractions, and token counts — and returns a fake response without consuming upstream provider tokens. This is triggered only when User-Agent contains claude-cli.
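The gating decision can be sketched as a predicate. The User-Agent check comes from the description above; the request-classification markers below are hypothetical examples, not the real heuristics in `open-sse/utils/bypassHandler.ts`:

```typescript
// Sketch of the bypass decision: only Claude CLI traffic that looks like
// throwaway requests is short-circuited. The content markers here are
// hypothetical; real heuristics live in open-sse/utils/bypassHandler.ts.
function shouldBypass(
  userAgent: string,
  body: { messages?: Array<{ content?: string }> },
): boolean {
  if (!/claude-cli/i.test(userAgent)) return false; // only Claude CLI traffic
  const first = body.messages?.[0]?.content ?? "";
  // Hypothetical markers for warmup pings and title-extraction prompts.
  return first === "" || first === "ping" || /extract a short title/i.test(first);
}
```

A bypassed request gets a synthetic response and never consumes upstream provider tokens.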
Request Logging and Artifacts
The older file-based request logger (`open-sse/utils/requestLogger.ts`) is retained only for legacy compatibility. The current runtime contract uses:
- `APP_LOG_TO_FILE=true` for application and audit logs written under `<repo>/logs/`
- SQLite-backed call log records in `call_logs`
- `${DATA_DIR}/call_logs/YYYY-MM-DD/...` artifacts when the call log pipeline is enabled
Failure Modes and Resilience
1) Account/Provider Availability
- provider account cooldown on transient/rate/auth errors
- account fallback before failing request
- combo model fallback when current model/provider path is exhausted
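The cooldown step in that list can be sketched as simple deadline bookkeeping; durations and names are illustrative, not the real account-state module:

```typescript
// Sketch of account cooldown bookkeeping: a failed account is marked
// unavailable until a deadline, and selection skips accounts still cooling
// down. Names and durations are illustrative.
const cooldownUntil = new Map<string, number>(); // accountId → epoch ms

function markCooldown(accountId: string, ms: number, now = Date.now()): void {
  cooldownUntil.set(accountId, now + ms);
}

function isAvailable(accountId: string, now = Date.now()): boolean {
  const until = cooldownUntil.get(accountId);
  return until === undefined || now >= until; // expired cooldowns recover automatically
}

function pickAccount(accounts: string[], now = Date.now()): string | undefined {
  // First available account in priority order; undefined → all unavailable.
  return accounts.find((id) => isAvailable(id, now));
}
```

Because availability is recomputed against the clock on every pick, short upstream lockouts recover without any explicit reset.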
2) Token Expiry
- pre-check and refresh with retry for refreshable providers
- 401/403 retry after refresh attempt in core path
3) Stream Safety
- disconnect-aware stream controller
- translation stream with end-of-stream flush and
[DONE]handling - usage estimation fallback when provider usage metadata is missing
4) Cloud Sync Degradation
- sync errors are surfaced but local runtime continues
- scheduler has retry-capable logic, but periodic execution currently calls single-attempt sync by default
5) Data Integrity
- SQLite schema migrations and auto-upgrade hooks at startup
- legacy JSON → SQLite migration compatibility path
6) SSRF / Outbound URL Guard
- `src/shared/network/outboundUrlGuard.ts` blocks all private/loopback/link-local target URLs before they reach provider executors
- Provider model discovery and validation routes use `src/shared/network/safeOutboundFetch.ts`, which applies the guard before every outbound request
- Guard errors surface as `URL_GUARD_BLOCKED` with HTTP 422 and are logged to the compliance audit trail via `providerAudit.ts`
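The range check behind such a guard can be sketched as follows. This is a minimal illustration for literal IPv4 hosts and obvious loopback names only; a real guard (like the module above) must also resolve DNS before checking, to defeat rebinding tricks:

```typescript
// Sketch of a private/loopback/link-local host check. Only literal IPv4
// addresses and obvious names are handled; real code resolves DNS first.
function isBlockedHost(hostname: string): boolean {
  if (hostname === "localhost" || hostname === "::1") return true;
  const m = hostname.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return false; // non-literal hosts need DNS resolution first
  const [a, b] = [Number(m[1]), Number(m[2])];
  return (
    a === 127 ||                         // loopback 127.0.0.0/8
    a === 10 ||                          // private 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // private 172.16.0.0/12
    (a === 192 && b === 168) ||          // private 192.168.0.0/16
    (a === 169 && b === 254) ||          // link-local 169.254.0.0/16
    a === 0                              // "this network" 0.0.0.0/8
  );
}

function guardOutboundUrl(raw: string): void {
  const url = new URL(raw);
  if (isBlockedHost(url.hostname)) {
    throw new Error("URL_GUARD_BLOCKED"); // surfaced as HTTP 422 upstream
  }
}
```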
Observability and Operational Signals
Runtime visibility sources:
- console logs from `src/sse/utils/logger.ts`
- per-request usage aggregates in SQLite (`usage_history`, `call_logs`, `proxy_logs`)
- four-stage detailed payload captures in SQLite (`request_detail_logs`) when `settings.detailed_logs_enabled=true`
- textual request status log in `log.txt` (optional/compat)
- optional application log files under `logs/` when `APP_LOG_TO_FILE=true`
- optional request artifacts under `${DATA_DIR}/call_logs/` when the call log pipeline is enabled
- dashboard usage endpoints (`/api/usage/*`) for UI consumption
Detailed request payload capture stores up to four JSON payload stages per routed call:
- raw request received from the client
- translated request actually sent upstream
- provider response reconstructed as JSON; streamed responses are compacted to the final summary plus stream metadata
- final client response returned by OmniRoute; streamed responses are stored in the same compact summary form
Security-Sensitive Boundaries
- JWT secret (`JWT_SECRET`) secures dashboard session cookie signing/verification
- Initial password bootstrap (`INITIAL_PASSWORD`) should be explicitly configured for first-run provisioning
- API key HMAC secret (`API_KEY_SECRET`) secures the generated local API key format
- Provider secrets (API keys/tokens) are persisted in the local DB and should be protected at the filesystem level
- Cloud sync endpoints rely on API key auth + machine-id semantics
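To illustrate the HMAC-secured key format idea: a key can embed a random id plus an HMAC over that id, so format validity is checkable with only the secret. The prefix and layout below are invented for this sketch and are not the real omniroute key format in `src/shared/utils/apiKey.ts`:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch of HMAC-signed API keys: the key carries an id plus an HMAC over
// that id, so verification needs only the secret. Prefix and layout are
// illustrative, not the real omniroute key format.
function signKey(id: string, secret: string): string {
  const mac = createHmac("sha256", secret).update(id).digest("hex").slice(0, 32);
  return `sk-local-${id}-${mac}`;
}

function verifyKey(key: string, secret: string): boolean {
  const m = key.match(/^sk-local-([a-z0-9]+)-([a-f0-9]{32})$/);
  if (!m) return false;
  const expected = createHmac("sha256", secret).update(m[1]).digest("hex").slice(0, 32);
  // Constant-time comparison avoids timing side channels.
  return timingSafeEqual(Buffer.from(m[2]), Buffer.from(expected));
}
```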
Environment and Runtime Matrix
Environment variables actively used by code:
- App/auth: `JWT_SECRET`, `INITIAL_PASSWORD`
- Storage: `DATA_DIR`
- Compatible node behavior: `ALLOW_MULTI_CONNECTIONS_PER_COMPAT_NODE`
- Optional storage base override (Linux/macOS when `DATA_DIR` unset): `XDG_CONFIG_HOME`
- Security hashing: `API_KEY_SECRET`, `MACHINE_ID_SALT`
- Logging: `APP_LOG_TO_FILE`, `APP_LOG_RETENTION_DAYS`, `CALL_LOG_RETENTION_DAYS`
- Sync/cloud URLs: `NEXT_PUBLIC_BASE_URL`, `NEXT_PUBLIC_CLOUD_URL`
- Outbound proxy: `HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY`, `NO_PROXY` and lowercase variants
- SOCKS5 feature flags: `ENABLE_SOCKS5_PROXY`, `NEXT_PUBLIC_ENABLE_SOCKS5_PROXY`
- Platform/runtime helpers (not app-specific config): `APPDATA`, `NODE_ENV`, `PORT`, `HOSTNAME`
Known Architectural Notes
- `usageDb` and `localDb` share the same base directory policy (`DATA_DIR` -> `XDG_CONFIG_HOME/omniroute` -> `~/.omniroute`) with legacy file migration.
- `/api/v1/route.ts` delegates to the same unified catalog builder used by `/api/v1/models` (`src/app/api/v1/models/catalog.ts`) to avoid semantic drift.
- The request logger writes full headers/body when enabled; treat the log directory as sensitive.
- Cloud behavior depends on a correct `NEXT_PUBLIC_BASE_URL` and cloud endpoint reachability.
- The `open-sse/` directory is published as the `@omniroute/open-sse` npm workspace package. Source code imports it via `@omniroute/open-sse/...` (resolved by Next.js `transpilePackages`). File paths in this document still use the directory name `open-sse/` for consistency.
- Charts in the dashboard use Recharts (SVG-based) for accessible, interactive analytics visualizations (model usage bar charts, provider breakdown tables with success rates).
- E2E tests use Playwright (`tests/e2e/`), run via `npm run test:e2e`. Unit tests use the Node.js test runner (`tests/unit/`), run via `npm run test:unit`. Source code under `src/` is TypeScript (`.ts`/`.tsx`); the `open-sse/` workspace remains JavaScript (`.js`).
- The Settings page is organized into 5 tabs: Security, Routing (6 global strategies: fill-first, round-robin, p2c, random, least-used, cost-optimized), Resilience (editable rate limits, circuit breaker, policies, Context Relay handoff config), AI (thinking budget, system prompt, prompt cache), Advanced (proxy).
- The Context Relay strategy (`context-relay`) is split across two layers: `combo.ts` decides if a handoff should be generated, and `chat.ts` injects the handoff after account resolution. Handoff data lives in the `context_handoffs` SQLite table. This split is intentional because only `chat.ts` knows whether the actual account changed.
- Proxy enforcement is now comprehensive: `tokenHealthCheck.ts` resolves the proxy per connection, `/api/providers/validate` uses `runWithProxyContext`, and `proxyFetch.ts` uses `undici.fetch()` to maintain dispatcher compatibility on Node 22.
- Node.js runtime policy detection: `/api/settings/require-login` returns `nodeVersion` and `nodeCompatible` fields. The login page renders a warning banner when the runtime falls outside the supported secure Node.js lines.
Operational Verification Checklist
- Build from source: `npm run build`
- Build Docker image: `docker build -t omniroute .`
- Start the service and verify: `GET /api/settings`, `GET /api/v1/models`
- The CLI target base URL should be `http://<host>:20128/v1` when `PORT=20128`