🚀 OmniRoute — The Free AI Gateway
🌐 Languages: 🇺🇸 English · 🇸🇦 ar · 🇧🇬 bg · 🇧🇩 bn · 🇨🇿 cs · 🇩🇰 da · 🇩🇪 de · 🇪🇸 es · 🇮🇷 fa · 🇫🇮 fi · 🇫🇷 fr · 🇮🇳 gu · 🇮🇱 he · 🇮🇳 hi · 🇭🇺 hu · 🇮🇩 id · 🇮🇹 it · 🇯🇵 ja · 🇰🇷 ko · 🇮🇳 mr · 🇲🇾 ms · 🇳🇱 nl · 🇳🇴 no · 🇵🇭 phi · 🇵🇱 pl · 🇵🇹 pt · 🇧🇷 pt-BR · 🇷🇴 ro · 🇷🇺 ru · 🇸🇰 sk · 🇸🇪 sv · 🇰🇪 sw · 🇮🇳 ta · 🇮🇳 te · 🇹🇭 th · 🇹🇷 tr · 🇺🇦 uk-UA · 🇵🇰 ur · 🇻🇳 vi · 🇨🇳 zh-CN
Never stop coding. Smart routing to FREE & low-cost AI models with automatic fallback.
Your universal API proxy — one endpoint, 100+ providers, zero downtime. Now with MCP Server (25 tools), A2A Protocol, Memory/Skills Systems & Electron Desktop App.
Chat Completions • Embeddings • Image Generation • Video • Music • Audio • Reranking • Web Search • MCP Server • A2A Protocol • 100% TypeScript
🌐 Available in: 🇺🇸 English | 🇧🇷 Português (Brasil) | 🇪🇸 Español | 🇫🇷 Français | 🇮🇹 Italiano | 🇷🇺 Русский | 🇨🇳 中文 (简体) | 🇩🇪 Deutsch | 🇮🇳 हिन्दी | 🇹🇭 ไทย | 🇺🇦 Українська | 🇸🇦 العربية | 🇯🇵 日本語 | 🇻🇳 Tiếng Việt | 🇧🇬 Български | 🇩🇰 Dansk | 🇫🇮 Suomi | 🇮🇱 עברית | 🇭🇺 Magyar | 🇮🇩 Bahasa Indonesia | 🇰🇷 한국어 | 🇲🇾 Bahasa Melayu | 🇳🇱 Nederlands | 🇳🇴 Norsk | 🇵🇹 Português (Portugal) | 🇷🇴 Română | 🇵🇱 Polski | 🇸🇰 Slovenčina | 🇸🇪 Svenska | 🇵🇭 Filipino | 🇨🇿 Čeština
🖼️ Main Dashboard
📸 Dashboard Preview — screenshots of the Providers, Combos, Analytics, Health, Translator, Settings, CLI Tools, Usage Logs, and Endpoints pages.
🤖 Free AI Provider for your favorite coding agents
Connect any AI-powered IDE or CLI tool through OmniRoute — free API gateway for unlimited coding.
| OpenClaw ⭐ 205K | NanoBot ⭐ 20.9K | PicoClaw ⭐ 14.6K | ZeroClaw ⭐ 9.9K | IronClaw ⭐ 2.1K |
|---|---|---|---|---|
| OpenCode ⭐ 106K | Codex CLI ⭐ 60.8K | Claude Code ⭐ 67.3K | Gemini CLI ⭐ 94.7K | Kilo Code ⭐ 15.5K |
📡 All agents connect via http://localhost:20128/v1 or http://cloud.omniroute.online/v1 — one config, unlimited models and quota
🤔 Why OmniRoute?
Stop wasting money and hitting limits:
- Subscription quota expires unused every month
- Rate limits stop you mid-coding
- Expensive APIs ($20-50/month per provider)
- Manual switching between providers
OmniRoute solves this:
- ✅ Maximize subscriptions - Track quota, use every bit before reset
- ✅ Auto fallback - Subscription → API Key → Cheap → Free, zero downtime
- ✅ Multi-account - Round-robin between accounts per provider
- ✅ Universal - Works with Claude Code, Codex, Gemini CLI, Cursor, Cline, OpenClaw, any CLI tool
📧 Support
💬 Join our community! WhatsApp Group — Get help, share tips, and stay updated.
- Website: omniroute.online
- GitHub: github.com/diegosouzapw/OmniRoute
- Issues: github.com/diegosouzapw/OmniRoute/issues
- WhatsApp: Community Group
- Contributing: See CONTRIBUTING.md, open a PR, or pick a good first issue
- Original Project: 9router by decolua
🐛 Reporting a Bug?
When opening an issue, please run the system-info command and attach the generated file:
npm run system-info
This generates a system-info.txt with your Node.js version, OmniRoute version, OS details, installed CLI tools (qoder, gemini, claude, codex, antigravity, droid, etc.), Docker/PM2 status, and system packages — everything we need to reproduce your issue quickly. Attach the file directly to your GitHub issue.
🔄 How It Works
┌─────────────┐
│ Your CLI │ (Claude Code, Codex, Gemini CLI, OpenClaw, Cursor, Cline...)
│ Tool │
└──────┬──────┘
│ http://localhost:20128/v1
↓
┌─────────────────────────────────────────┐
│ OmniRoute (Smart Router) │
│ • Format translation (OpenAI ↔ Claude) │
│ • Quota tracking + Embeddings + Images │
│ • Auto token refresh │
└──────┬──────────────────────────────────┘
│
├─→ [Tier 1: SUBSCRIPTION] Claude Code, Codex, Gemini CLI
│ ↓ quota exhausted
├─→ [Tier 2: API KEY] DeepSeek, Groq, xAI, Mistral, NVIDIA NIM, etc.
│ ↓ budget limit
├─→ [Tier 3: CHEAP] GLM ($0.6/1M), MiniMax ($0.2/1M)
│ ↓ budget limit
└─→ [Tier 4: FREE] Qoder, Qwen, Kiro (unlimited)
Result: Never stop coding, minimal cost
🎯 What OmniRoute Solves — 30 Real Pain Points & Use Cases
Every developer using AI tools faces these problems daily. OmniRoute was built to solve them all — from cost overruns to regional blocks, from broken OAuth flows to protocol operations and enterprise observability.
💸 1. "I pay for an expensive subscription but still get interrupted by limits"
Developers pay $20–200/month for Claude Pro, Codex Pro, or GitHub Copilot. Even paying customers hit a ceiling — 5h of usage, weekly limits, or per-minute rate limits. Mid-session, the provider stops responding and the developer loses flow and productivity.
How OmniRoute solves it:
- Smart 4-Tier Fallback — If subscription quota runs out, automatically redirects to API Key → Cheap → Free with zero manual intervention
- Provider Limits Tracking — Cached quota snapshots refresh on a server-side schedule (default `PROVIDER_LIMITS_SYNC_INTERVAL_MINUTES=70`) with manual refresh available in the UI
- Multi-Account Support — Multiple accounts per provider with auto round-robin — when one runs out, switches to the next
- Custom Combos — Customizable fallback chains with 13 balancing strategies (priority, weighted, fill-first, round-robin, P2C, random, least-used, cost-optimized, strict-random, auto, lkgp, context-optimized, context-relay)
- Structured Combo Builder — Build combos step-by-step with explicit provider + model + account selection, including repeated providers and fixed-account targets
- Quota-Aware P2C — Power-of-two account selection now factors quota headroom, backoff, recent errors, and consecutive use
- Codex Business Quotas — Business/Team workspace quota monitoring directly in the dashboard
🔌 2. "I need to use multiple providers but each has a different API"
OpenAI uses one format, Claude (Anthropic) uses another, Gemini yet another. If a dev wants to test models from different providers or fall back between them, they need to reconfigure SDKs, change endpoints, and deal with incompatible formats. Custom providers (FriendLI, NIM) have non-standard model endpoints.
How OmniRoute solves it:
- Unified Endpoint — A single `http://localhost:20128/v1` serves as proxy for all 100+ providers
- Format Translation — Automatic and transparent: OpenAI ↔ Claude ↔ Gemini ↔ Responses API
- Response Sanitization — Strips non-standard fields (`x_groq`, `usage_breakdown`, `service_tier`) that break OpenAI SDK v1.83+
- Role Normalization — Converts `developer` → `system` for non-OpenAI providers; `system` → `user` for GLM/ERNIE
- Think Tag Extraction — Extracts `<think>` blocks from models like DeepSeek R1 into standardized `reasoning_content`
- Structured Output for Gemini — `json_schema` → `responseMimeType`/`responseSchema` automatic conversion
- `stream` defaults to `false` — Aligns with OpenAI spec, avoiding unexpected SSE in Python/Rust/Go SDKs
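To see what this buys you in practice, here is a minimal client-side sketch: the stock OpenAI SDK pointed at OmniRoute, where the only thing that changes between upstream families is the model prefix. The API key is whatever you created on the dashboard's Endpoints page; the model names are taken from examples elsewhere in this README.

```typescript
import OpenAI from "openai";

// One client for every provider behind OmniRoute.
const client = new OpenAI({
  baseURL: "http://localhost:20128/v1",
  apiKey: process.env.OMNIROUTE_API_KEY ?? "any-string",
});

// Same OpenAI-format call; OmniRoute translates to each upstream dialect.
for (const model of ["if/kimi-k2-thinking", "gc/gemini-3-flash-preview"]) {
  const res = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: "Say hello in one sentence." }],
  });
  console.log(model, "→", res.choices[0].message.content);
}
```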
🌐 3. "My AI provider blocks my region/country"
Providers like OpenAI/Codex block access from certain geographic regions. Users get errors like unsupported_country_region_territory during OAuth and API connections. This is especially frustrating for developers from developing countries.
How OmniRoute solves it:
- 3-Level Proxy Config — Configurable proxy at 3 levels: global (all traffic), per-provider (one provider only), and per-connection/key
- Color-Coded Proxy Badges — Visual indicators: 🟢 global proxy, 🟡 provider proxy, 🔵 connection proxy, always showing the IP
- OAuth Token Exchange Through Proxy — OAuth flow also goes through the proxy, solving `unsupported_country_region_territory`
- Connection Tests via Proxy — Connection tests use the configured proxy (no more direct bypass)
- SOCKS5 Support — Full SOCKS5 proxy support for outbound routing
- TLS Fingerprint Spoofing — Browser-like TLS fingerprint via `wreq-js` to bypass bot detection
- 🔏 CLI Fingerprint Matching — Reorders headers and body fields to match native CLI binary signatures, drastically reducing account flagging risk. The proxy IP is preserved — you get both stealth and IP masking simultaneously
🆓 4. "I want to use AI for coding but I have no money"
Not everyone can pay $20–200/month for AI subscriptions. Students, devs from emerging countries, hobbyists, and freelancers need access to quality models at zero cost.
How OmniRoute solves it:
- Free Tier Providers Built-in — Native support for 100% free providers: Qoder (5 unlimited models via OAuth: kimi-k2-thinking, qwen3-coder-plus, deepseek-r1, minimax-m2, kimi-k2), Qwen (4 unlimited models: qwen3-coder-plus, qwen3-coder-flash, qwen3-coder-next, vision-model), Kiro (Claude + AWS Builder ID for free), Gemini CLI (180K tokens/month free)
- Ollama Cloud — Cloud-hosted Ollama models at `api.ollama.com` with free "Light usage" tier; use the `ollamacloud/<model>` prefix
- Free-Only Combos — Chain `gc/gemini-3-flash → if/kimi-k2-thinking → qw/qwen3-coder-plus` = $0/month with zero downtime
- NVIDIA NIM Free Access — ~40 RPM dev-forever free access to 70+ models at build.nvidia.com (transitioning from credits to pure rate limits)
- Cost Optimized Strategy — Routing strategy that automatically chooses the cheapest available provider
🔒 5. "I need to protect my AI gateway from unauthorized access"
When exposing an AI gateway to the network (LAN, VPS, Docker), anyone with the address can consume the developer's tokens/quota. Without protection, APIs are vulnerable to misuse, prompt injection, and abuse.
How OmniRoute solves it:
- API Key Management — Generation, rotation, and scoping per provider with a dedicated `/dashboard/api-manager` page
- Model-Level Permissions — Restrict API keys to specific models (`openai/*`, wildcard patterns), with Allow All/Restrict toggle
- API Endpoint Protection — Require a key for `/v1/models` and block specific providers from the listing
- Auth Guard + CSRF Protection — All dashboard routes protected with `withAuth` middleware + CSRF tokens
- Rate Limiter — Per-IP rate limiting with configurable windows
- IP Filtering — Allowlist/blocklist for access control
- Prompt Injection Guard — Sanitization against malicious prompt patterns
- AES-256-GCM Encryption — Credentials encrypted at rest
🛑 6. "My provider went down and I lost my coding flow"
AI providers can become unstable, return 5xx errors, or hit temporary rate limits. If a dev depends on a single provider, they're interrupted. Without circuit breakers, repeated retries can crash the application.
How OmniRoute solves it:
- Request Queue & Pacing — Per-connection request buckets smooth bursts before they hit upstream rate caps
- Connection Cooldown — A single connection cools down after retryable failures with optional upstream `Retry-After` hints and exponential backoff
- Provider Circuit Breaker — The provider only trips after fallback is exhausted and the provider request still fails with provider-wide transient errors; connection-scoped `429` rate limits stay in Connection Cooldown
- Wait For Cooldown — The server can wait for the earliest connection cooldown to expire and retry the same client request automatically
- Anti-Thundering Herd — Mutex + semaphore protection against concurrent retry storms
- Combo Fallback Chains — If the primary provider fails, automatically falls through the chain with no intervention
- Health Dashboard — Uptime monitoring, provider circuit breaker states, cooldowns, cache stats, p50/p95/p99 latency
🔧 7. "Configuring each AI tool is tedious and repetitive"
Developers use Cursor, Claude Code, Codex CLI, OpenClaw, Gemini CLI, Kilo Code... Each tool needs a different config (API endpoint, key, model). Reconfiguring when switching providers or models is a waste of time.
How OmniRoute solves it:
- CLI Tools Dashboard — Dedicated page with one-click setup for Claude Code, Codex CLI, OpenClaw, Kilo Code, Antigravity, Cline
- GitHub Copilot Config Generator — Generates `chatLanguageModels.json` for VS Code with bulk model selection
- Onboarding Wizard — Guided 4-step setup for first-time users
- One endpoint, all models — Configure `http://localhost:20128/v1` once, access 100+ providers
🔑 8. "Managing OAuth tokens from multiple providers is hell"
Claude Code, Codex, Gemini CLI, Copilot — all use OAuth 2.0 with expiring tokens. Developers need to re-authenticate constantly, deal with client_secret is missing, redirect_uri_mismatch, and failures on remote servers. OAuth on LAN/VPS is particularly problematic.
How OmniRoute solves it:
- Auto Token Refresh — OAuth tokens refresh in background before expiration
- OAuth 2.0 (PKCE) Built-in — Automatic flow for Claude Code, Codex, Gemini CLI, Copilot, Kiro, Qwen, Qoder
- Multi-Account OAuth — Multiple accounts per provider via JWT/ID token extraction
- OAuth LAN/Remote Fix — Private IP detection for `redirect_uri` + manual URL mode for remote servers
- OAuth Behind Nginx — Uses `window.location.origin` for reverse proxy compatibility
- Remote OAuth Guide — Step-by-step guide for Google Cloud credentials on VPS/Docker
📊 9. "I don't know how much I'm spending or where"
Developers use multiple paid providers but have no unified view of spending. Each provider has its own billing dashboard, but there's no consolidated view. Unexpected costs can pile up.
How OmniRoute solves it:
- Cost Analytics Dashboard — Per-token cost tracking and budget management per provider
- Budget Limits per Tier — Spending ceiling per tier that triggers automatic fallback
- Per-Model Pricing Configuration — Configurable prices per model
- Usage Statistics Per API Key — Request count and last-used timestamp per key
- Analytics Dashboard — Stat cards, model usage chart, provider table with success rates and latency
🐛 10. "I can't diagnose errors and problems in AI calls"
When a call fails, the dev doesn't know if it was a rate limit, expired token, wrong format, or provider error. Fragmented logs across different terminals. Without observability, debugging is trial-and-error.
How OmniRoute solves it:
- Unified Logs Dashboard — 4 tabs: Request Logs, Proxy Logs, Audit Logs, Console
- Console Log Viewer — Real-time terminal-style viewer with color-coded levels, auto-scroll, search, filter
- SQLite Summary Logs — Request and proxy log indexes stay queryable across restarts without loading large payload blobs into SQLite
- Translator Playground — 4 debugging modes: Playground (format translation), Chat Tester (round-trip), Test Bench (batch), Live Monitor (real-time)
- Request Telemetry — p50/p95/p99 latency + X-Request-Id tracing
- File-Based Detail Artifacts — App logs rotate by size, retention days, and archive count; detailed request/response payloads live in `DATA_DIR/call_logs/` and rotate independently of SQLite summaries
- System Info Report — `npm run system-info` generates `system-info.txt` with your full environment (Node version, OmniRoute version, OS, CLI tools, Docker/PM2 status). Attach it when reporting issues for instant triage.
🏗️ 11. "Deploying and maintaining the gateway is complex"
Installing, configuring, and maintaining an AI proxy across different environments (local, VPS, Docker, cloud) is labor-intensive. Problems like hardcoded paths, EACCES on directories, port conflicts, and cross-platform builds add friction.
How OmniRoute solves it:
- npm global install — `npm install -g omniroute && omniroute` — done
- Docker Multi-Platform — AMD64 + ARM64 native (Apple Silicon, AWS Graviton, Raspberry Pi)
- Docker Compose Profiles — `base` (no CLI tools) and `cli` (with Claude Code, Codex, OpenClaw)
- Electron Desktop App — Native app for Windows/macOS/Linux with system tray, auto-start, offline mode
- Split-Port Mode — API and Dashboard on separate ports for advanced scenarios (reverse proxy, container networking)
- Cloud Sync — Config synchronization across devices via Cloudflare Workers
- DB Backups — Automatic backup, restore, export and import of all settings, with `DISABLE_SQLITE_AUTO_BACKUP` for externally managed backups
🌍 12. "The interface is English-only and my team doesn't speak English"
Teams in non-English-speaking countries, especially in Latin America, Asia, and Europe, struggle with English-only interfaces. Language barriers reduce adoption and increase configuration errors.
How OmniRoute solves it:
- Dashboard i18n — 30 Languages — All 500+ keys translated including Arabic, Bulgarian, Danish, German, Spanish, Finnish, French, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Malay, Dutch, Norwegian, Polish, Portuguese (PT/BR), Romanian, Russian, Slovak, Swedish, Thai, Ukrainian, Vietnamese, Chinese, Filipino, English
- RTL Support — Right-to-left support for Arabic and Hebrew
- Multi-Language READMEs — 30 complete documentation translations
- Language Selector — Globe icon in header for real-time switching
🔄 13. "I need more than chat — I need embeddings, images, audio"
AI isn't just chat completion. Devs need to generate images, transcribe audio, create embeddings for RAG, rerank documents, and moderate content. Each API has a different endpoint and format.
How OmniRoute solves it:
- Embeddings — `/v1/embeddings` with 6 providers and 9+ models
- Image Generation — `/v1/images/generations` with 10 providers and 20+ models (OpenAI, xAI, Together, Fireworks, Nebius, Hyperbolic, NanoBanana, Antigravity, SD WebUI, ComfyUI)
- Text-to-Video — `/v1/videos/generations` — ComfyUI (AnimateDiff, SVD) and SD WebUI
- Text-to-Music — `/v1/music/generations` — ComfyUI (Stable Audio Open, MusicGen)
- Audio Transcription — `/v1/audio/transcriptions` — Whisper + Nvidia NIM, HuggingFace, Qwen3
- Text-to-Speech — `/v1/audio/speech` — ElevenLabs, Nvidia NIM, HuggingFace, Coqui, Tortoise, Qwen3, Inworld, Cartesia, PlayHT, + existing providers
- Moderations — `/v1/moderations` — Content safety checks
- Reranking — `/v1/rerank` — Document relevance reranking
- Responses API — Full `/v1/responses` support for Codex
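As a quick illustration, the non-chat endpoints follow the same OpenAI-compatible shapes, so the stock SDK covers them too. A sketch (the embedding model name below is a placeholder — pick a real one from `/v1/models`):

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:20128/v1",
  apiKey: process.env.OMNIROUTE_API_KEY ?? "any-string",
});

// Embeddings for a RAG pipeline; "provider/embedding-model" is hypothetical.
const emb = await client.embeddings.create({
  model: "provider/embedding-model",
  input: ["OmniRoute routes more than chat."],
});
console.log(emb.data[0].embedding.length, "dimensions");
```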
🧪 14. "I have no way to test and compare quality across models"
Developers want to know which model is best for their use case — code, translation, reasoning — but comparing manually is slow. No integrated eval tools exist.
How OmniRoute solves it:
- LLM Evaluations — Golden set testing with 10 pre-loaded cases covering greetings, math, geography, code generation, JSON compliance, translation, markdown, safety refusal
- 4 Match Strategies — `exact`, `contains`, `regex`, `custom` (JS function)
- Translator Playground Test Bench — Batch testing with multiple inputs and expected outputs, cross-provider comparison
- Chat Tester — Full round-trip with visual response rendering
- Live Monitor — Real-time stream of all requests flowing through the proxy
📈 15. "I need to scale without losing performance"
As request volume grows, without caching the same questions generate duplicate costs. Without idempotency, duplicate requests waste processing. Per-provider rate limits must be respected.
How OmniRoute solves it:
- Semantic Cache — Two-tier cache (signature + semantic) reduces cost and latency
- Request Idempotency — 5s deduplication window for identical requests
- Rate Limit Detection — Per-provider RPM, min gap, and max concurrent tracking
- Request Queue & Pacing — Configurable queue, pacing, and concurrency defaults in Settings → Resilience
- API Key Validation Cache — 3-tier cache for production performance
- Health Dashboard with Telemetry — p50/p95/p99 latency, cache stats, uptime
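To make the idempotency idea concrete, here is an illustrative sketch — not OmniRoute's actual implementation — of a content-hash deduplication window in which identical request bodies inside the window share one upstream call:

```typescript
import { createHash } from "node:crypto";

const WINDOW_MS = 5_000; // matches the 5s deduplication window described above
const inflight = new Map<string, { at: number; result: Promise<unknown> }>();

// Identical (url, body) pairs within WINDOW_MS reuse the same upstream promise.
// A real implementation would also evict stale entries.
function dedupedPost(url: string, body: string): Promise<unknown> {
  const key = createHash("sha256").update(url).update(body).digest("hex");
  const hit = inflight.get(key);
  if (hit && Date.now() - hit.at < WINDOW_MS) return hit.result;
  const result = fetch(url, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body,
  }).then((r) => r.json());
  inflight.set(key, { at: Date.now(), result });
  return result;
}
```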
🤖 16. "I want to control model behavior globally"
Developers who want all responses in a specific language, with a specific tone, or want to limit reasoning tokens. Configuring this in every tool/request is impractical.
How OmniRoute solves it:
- System Prompt Injection — Global prompt applied to all requests
- Thinking Budget Validation — Reasoning token allocation control per request (passthrough, auto, custom, adaptive)
- 9 Routing Strategies — Global strategies that determine how requests are distributed
- Wildcard Router — `provider/*` patterns route dynamically to any provider
- Combo Enable/Disable Toggle — Toggle combos directly from the dashboard
- Manual Combo Ordering — Drag combo cards by handle and persist the order in SQLite
- Provider Toggle — Enable/disable all connections for a provider with one click
- Blocked Providers — Exclude specific providers from `/v1/models` listing
🧰 17. "I need MCP tools as first-class product capabilities"
Many AI gateways expose MCP only as a hidden implementation detail. Teams need a visible, manageable operation layer.
How OmniRoute solves it:
- MCP appears in the dashboard navigation and endpoint protocol tab
- Dedicated MCP management page with process, tools, scopes, and audit
- Built-in quick-start for `omniroute --mcp` and client onboarding
🧠 18. "I need A2A orchestration with sync + stream task paths"
Agent workflows need both direct replies and long-running streamed execution with lifecycle control.
How OmniRoute solves it:
- A2A JSON-RPC endpoint (`POST /a2a`) with `message/send` and `message/stream`
- SSE streaming with terminal state propagation
- Task lifecycle APIs for `tasks/get` and `tasks/cancel`
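A minimal client-side sketch of that lifecycle over plain JSON-RPC. The `message/send` params mirror the quick-start example later in this README; the params shape for `tasks/get` and `tasks/cancel` (a task id) and the state names are assumptions:

```typescript
// Tiny JSON-RPC helper against the A2A endpoint.
async function rpc(method: string, params: unknown): Promise<any> {
  const res = await fetch("http://localhost:20128/a2a", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: crypto.randomUUID(), method, params }),
  });
  return res.json();
}

const sent = await rpc("message/send", {
  skill: "quota-management",
  messages: [{ role: "user", content: "Give me a short quota summary." }],
});

// Poll the task, then cancel if still running (id/state shapes assumed).
const task = await rpc("tasks/get", { id: sent.result?.id });
if (task.result?.state === "running") await rpc("tasks/cancel", { id: sent.result?.id });
```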
🛰️ 19. "I need real MCP process health, not guessed status"
Operational teams need to know if MCP is actually alive, not just whether an API is reachable.
How OmniRoute solves it:
- Runtime heartbeat file with PID, timestamps, transport, tool count, and scope mode
- MCP status API combining heartbeat + recent activity
- UI status cards for process/uptime/heartbeat freshness
📋 20. "I need auditable MCP tool execution"
When tools mutate config or trigger ops actions, teams need forensic traceability.
How OmniRoute solves it:
- SQLite-backed audit logging for MCP tool calls
- Filters by tool, success/failure, API key, and pagination
- Dashboard audit table + stats endpoints for automation
🔐 21. "I need scoped MCP permissions per integration"
Different clients should have least-privilege access to tool categories.
How OmniRoute solves it:
- 10 granular MCP scopes for controlled tool access
- Scope enforcement and visibility in MCP management UI
- Safe default posture for operational tooling
⚙️ 22. "I need operational controls without redeploying"
Teams need quick runtime changes during incidents or cost events.
How OmniRoute solves it:
- Switch combo activation directly from MCP dashboard
- Tune queue, cooldown, breaker, and wait settings from the dedicated Resilience page
- Review live provider breaker state from the Health dashboard
🔄 23. "I need live A2A task lifecycle visibility and cancellation"
Without lifecycle visibility, task incidents become hard to triage.
How OmniRoute solves it:
- Task listing/filtering by state/skill with pagination
- Drill-down on task metadata, events, and artifacts
- Task cancellation endpoint and UI action with confirmation
🌊 24. "I need active stream metrics for A2A load"
Streaming workflows require operational insight into concurrency and live connections.
How OmniRoute solves it:
- Active stream counters integrated into A2A status
- Last task timestamp and per-state counts
- A2A dashboard cards for real-time ops monitoring
🪪 25. "I need standard agent discovery for clients"
External clients and orchestrators need machine-readable metadata for onboarding.
How OmniRoute solves it:
- Agent Card exposed at `/.well-known/agent.json`
- Capabilities and skills shown in management UI
- A2A status API includes discovery metadata for automation
🧭 26. "I need protocol discoverability in the product UX"
If users cannot discover protocol surfaces, adoption and support quality drop.
How OmniRoute solves it:
- Consolidated Endpoints page with tabs for Proxy, MCP, A2A, and API Endpoints
- Inline service status toggles (Online/Offline) for MCP and A2A
- Links from overview to dedicated management tabs
🧪 27. "I need end-to-end protocol validation with real clients"
Mock tests are not enough to validate protocol compatibility before release.
How OmniRoute solves it:
- E2E suite that boots app and uses real MCP SDK client transport
- A2A client tests for discovery, send, stream, get, and cancel flows
- Cross-check assertions against MCP audit and A2A tasks APIs
📡 28. "I need unified observability across all interfaces"
Splitting observability by protocol creates blind spots and longer MTTR.
How OmniRoute solves it:
- Unified dashboards/logs/analytics in one product
- Health + audit + request telemetry across OpenAI, MCP, and A2A layers
- Operational APIs for status and automation
💼 29. "I need one runtime for proxy + tools + agent orchestration"
Running many separate services increases operational cost and failure modes.
How OmniRoute solves it:
- OpenAI-compatible proxy, MCP server, and A2A server in one stack
- Shared auth, resilience, data store, and observability
- Consistent policy model across all interaction surfaces
🚀 30. "I need to ship agentic workflows without glue-code sprawl"
Teams lose velocity when stitching multiple ad-hoc services and scripts.
How OmniRoute solves it:
- Unified endpoint strategy for clients and agents
- Built-in protocol management UIs and smoke validation paths
- Production-ready foundations (security, logging, resilience, backup)
📚 31. "My long sessions crash with 'context_length_exceeded' limits"
During deep debugging, long histories with tool results quickly exceed provider token windows, causing failed requests and orphaned context.
How OmniRoute solves it:
- Proactive Context Compression — Evaluates token budgets before the request hits upstream and proactively prunes old conversation history with a smart binary-search mechanism.
- Structural Integrity Guards — Automatically tracks explicit `tool_use` definitions and ensures that if a tool input is truncated, its corresponding `tool_result` is also safely removed, preventing API validation errors.
- Multi-Layer Dropping — Progressively drops system messages, regular messages, and finally enforces strict length limits without breaking conversational logic.
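For intuition, the binary-search step can be sketched like this (illustrative only, not OmniRoute's actual code; `countTokens` is an assumed tokenizer helper): find the smallest number of oldest messages to drop so that the remaining history fits the budget.

```typescript
// Drop the fewest oldest messages such that the suffix fits the token budget.
function pruneHistory<M>(
  messages: M[],
  budget: number,
  countTokens: (msgs: M[]) => number, // assumed tokenizer helper
): M[] {
  let lo = 0;                // dropping no messages
  let hi = messages.length;  // dropping everything always fits
  while (lo < hi) {
    const cut = Math.floor((lo + hi) / 2);
    if (countTokens(messages.slice(cut)) <= budget) {
      hi = cut;      // fits — try keeping more history
    } else {
      lo = cut + 1;  // still too large — drop more
    }
  }
  return messages.slice(lo);
}
```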
Example Playbooks (Integrated Use Cases)
Playbook A: Maximize paid subscription + cheap backup
Combo: "maximize-claude"
1. cc/claude-opus-4-7
2. glm/glm-4.7
3. if/kimi-k2-thinking
Monthly cost: $20 + small backup spend
Outcome: higher quality, near-zero interruption
Playbook B: Zero-cost coding stack
Combo: "free-forever"
1. gc/gemini-3-flash
2. if/kimi-k2-thinking
3. qw/qwen3-coder-plus
Monthly cost: $0
Outcome: stable free coding workflow
Playbook C: 24/7 always-on fallback chain
Combo: "always-on"
1. cc/claude-opus-4-7
2. cx/gpt-5.2-codex
3. glm/glm-4.7
4. minimax/MiniMax-M2.1
5. if/kimi-k2-thinking
Outcome: deep fallback depth for deadline-critical workloads
Playbook D: Agent ops with MCP + A2A
1) Start MCP transport (`omniroute --mcp`) for tool-driven operations
2) Run A2A tasks via `message/send` and `message/stream`
3) Observe via /dashboard/endpoint (MCP and A2A tabs)
4) Toggle services via inline status controls
🆓 Start Free — Zero Configuration Cost
Setup AI coding in minutes at $0/month. Connect these free accounts and use the built-in Free Stack combo.
| Step | Action | Providers Unlocked |
|---|---|---|
| 1 | Connect Kiro (AWS Builder ID OAuth) | Claude Sonnet 4.5, Haiku 4.5 — unlimited |
| 2 | Connect Qoder (Google OAuth) | kimi-k2-thinking, qwen3-coder-plus, deepseek-r1... — unlimited |
| 3 | Connect Qwen (Device Code) | qwen3-coder-plus, qwen3-coder-flash... — unlimited |
| 4 | Connect Gemini CLI (Google OAuth) | gemini-3-flash, gemini-2.5-pro — 180K/mo free |
| 5 | /dashboard/combos → Free Stack ($0) template | Round-robin all free providers automatically |
Point any IDE/CLI to: http://localhost:20128/v1 · API Key: any-string · Done.
Optional extra coverage (also free): Groq API key (30 RPM free), NVIDIA NIM (40 RPM free, 70+ models), Cerebras (1M tok/day), LongCat API key (50M tokens/day!), Cloudflare Workers AI (10K Neurons/day, 50+ models).
Quick Start
1) Install and run
npm install -g omniroute
omniroute
pnpm users: Run `pnpm approve-builds -g` after install to enable native build scripts required by `better-sqlite3` and `@swc/core`:
pnpm install -g omniroute
pnpm approve-builds -g   # Select all packages → approve
omniroute
Dashboard opens at http://localhost:20128 and API base URL is http://localhost:20128/v1.
Arch Linux (AUR)
Arch Linux users can install the AUR package, which installs OmniRoute and provides a systemd user service:
yay -S omniroute-bin
systemctl --user enable --now omniroute.service
| Command | Description |
|---|---|
| `omniroute` | Start server (PORT=20128, API and dashboard on same port) |
| `omniroute --port 3000` | Set canonical/API port to 3000 |
| `omniroute --mcp` | Start MCP server (stdio transport) |
| `omniroute --no-open` | Don't auto-open browser |
| `omniroute --help` | Show help |
Optional split-port mode:
PORT=20128 DASHBOARD_PORT=20129 omniroute
# API: http://localhost:20128/v1
# Dashboard: http://localhost:20129
Uninstalling
When you no longer need OmniRoute, we provide two quick scripts for a clean removal:
| Command | Action |
|---|---|
| `npm run uninstall` | Removes the system app but keeps your DB and configurations in `~/.omniroute` |
| `npm run uninstall:full` | Removes the app AND permanently erases all configurations, keys, and databases |
Note: Run these commands from the OmniRoute project folder (if you cloned it). Alternatively, if globally installed, you can simply run `npm uninstall -g omniroute`.
Long-Running Streaming Timeouts
For most deployments, you only need:
| Variable | Default | Purpose |
|---|---|---|
| `REQUEST_TIMEOUT_MS` | 600000 | Shared baseline for upstream response-start timeout, hidden Undici timeouts, TLS fingerprint requests, and API bridge request/proxy timeouts |
| `STREAM_IDLE_TIMEOUT_MS` | inherits `REQUEST_TIMEOUT_MS` | Maximum gap between streaming chunks before OmniRoute aborts the SSE stream |
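For example (values are illustrative), a 20-minute response-start baseline with a 5-minute stream-stall cap uses the same env-prefix pattern shown elsewhere in this README:
REQUEST_TIMEOUT_MS=1200000 STREAM_IDLE_TIMEOUT_MS=300000 omniroute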
Backward compatibility is preserved: existing FETCH_TIMEOUT_MS, API_BRIDGE_PROXY_TIMEOUT_MS, and other per-layer timeout vars still work and override the shared baseline.
For Claude Code-compatible upstreams (anthropic-compatible-cc-*), OmniRoute also derives the outbound X-Stainless-Timeout header from the resolved fetch timeout so provider-side read timeouts stay aligned with your env configuration.
For third-party Claude Code-compatible reverse proxies, OmniRoute keeps the default `anthropic-beta` set conservative and, when Client Cache Control is left on Auto, only forwards client-provided `cache_control` markers. If the request does not include `cache_control`, OmniRoute does not inject bridge-owned markers.
Advanced overrides are available if you need finer control:
| Variable | Default | Purpose |
|---|---|---|
| `FETCH_TIMEOUT_MS` | inherits `REQUEST_TIMEOUT_MS` | Upstream response-start timeout used until response headers arrive |
| `FETCH_HEADERS_TIMEOUT_MS` | inherits `FETCH_TIMEOUT_MS` | Undici time limit for receiving upstream response headers |
| `FETCH_BODY_TIMEOUT_MS` | inherits `FETCH_TIMEOUT_MS` | Undici time limit between upstream body chunks (0 disables it) |
| `FETCH_CONNECT_TIMEOUT_MS` | 30000 | Undici TCP connect timeout |
| `FETCH_KEEPALIVE_TIMEOUT_MS` | 4000 | Undici idle keep-alive socket timeout |
| `TLS_CLIENT_TIMEOUT_MS` | inherits `FETCH_TIMEOUT_MS` | Timeout for TLS fingerprint requests made through wreq-js |
| `API_BRIDGE_PROXY_TIMEOUT_MS` | inherits `REQUEST_TIMEOUT_MS` or 30000 | Timeout for /v1 proxy forwarding from API port to dashboard port |
| `API_BRIDGE_SERVER_REQUEST_TIMEOUT_MS` | max(`API_BRIDGE_PROXY_TIMEOUT_MS`, 300000) | Incoming request timeout on the API bridge server |
| `API_BRIDGE_SERVER_HEADERS_TIMEOUT_MS` | 60000 | Incoming header timeout on the API bridge server |
| `API_BRIDGE_SERVER_KEEPALIVE_TIMEOUT_MS` | 5000 | Keep-alive timeout on the API bridge server |
| `API_BRIDGE_SERVER_SOCKET_TIMEOUT_MS` | 0 | Socket inactivity timeout on the API bridge server (0 disables it) |
For streaming requests, FETCH_TIMEOUT_MS only covers connection setup / waiting for the first upstream response. Once the stream is active, OmniRoute will only abort on an actual stall (STREAM_IDLE_TIMEOUT_MS) or Undici body inactivity (FETCH_BODY_TIMEOUT_MS).
If you run OmniRoute behind Nginx, Caddy, Cloudflare, or another reverse proxy, make sure the proxy timeouts are also higher than your OmniRoute stream/fetch timeouts.
2) Connect providers and create your API key
- Open Dashboard → `Providers` and connect at least one provider (OAuth or API key).
- Open Dashboard → `Endpoints` and create an API key.
- (Optional) Open Dashboard → `Combos` and set your fallback chain.
3) Point your coding tool to OmniRoute
Base URL: http://localhost:20128/v1
API Key: [copy from Endpoint page]
Model: if/kimi-k2-thinking (or any provider/model prefix)
Works with Claude Code, Codex CLI, Gemini CLI, Cursor, Cline, OpenClaw, OpenCode, and OpenAI-compatible SDKs.
4) Enable and validate protocols (v2.0)
MCP (for tool-driven operations):
omniroute --mcp
Then connect your MCP client over stdio and test tools like:
`omniroute_get_health`, `omniroute_list_combos`
A2A (for agent-to-agent workflows):
curl http://localhost:20128/.well-known/agent.json
curl -X POST http://localhost:20128/a2a \
-H 'content-type: application/json' \
-d '{"jsonrpc":"2.0","id":"quickstart","method":"message/send","params":{"skill":"quota-management","messages":[{"role":"user","content":"Give me a short quota summary."}]}}'
5) Validate everything end-to-end (recommended)
npm run test:protocols:e2e
This suite validates real MCP and A2A client flows against a running app.
Alternative: run from source
cp .env.example .env
npm install
PORT=20128 DASHBOARD_PORT=20129 NEXT_PUBLIC_BASE_URL=http://localhost:20129 npm run dev
Void Linux (`xbps-src` template)
For Void Linux users, you can build a native package using xbps-src. Save this block as srcpkgs/omniroute/template:
# Template file for 'omniroute'
pkgname=omniroute
version=3.4.1
revision=1
hostmakedepends="nodejs python3 make"
depends="openssl"
short_desc="Universal AI gateway with smart routing for multiple LLM providers"
maintainer="zenobit <zenobit@disroot.org>"
license="MIT"
homepage="https://github.com/diegosouzapw/OmniRoute"
distfiles="https://github.com/diegosouzapw/OmniRoute/archive/refs/tags/v${version}.tar.gz"
checksum=009400afee90a9f32599d8fe734145cfd84098140b7287990183dde45ae2245b
system_accounts="_omniroute"
omniroute_homedir="/var/lib/omniroute"
export NODE_ENV=production
export npm_config_engine_strict=false
export npm_config_loglevel=error
export npm_config_fund=false
export npm_config_audit=false
do_build() {
# Determine target CPU arch for node-gyp
local _gyp_arch
case "$XBPS_TARGET_MACHINE" in
aarch64*) _gyp_arch=arm64 ;;
armv7*|armv6*) _gyp_arch=arm ;;
i686*) _gyp_arch=ia32 ;;
*) _gyp_arch=x64 ;;
esac
# 1) Install all deps – skip scripts (no network in do_build, native modules
# compiled separately below; better-sqlite3 is serverExternalPackage so
# Next.js does not execute it during next build)
NODE_ENV=development npm ci --ignore-scripts
# 2) Build the Next.js standalone bundle
npm run build
# 3) Copy static assets into standalone
cp -r .next/static .next/standalone/.next/static
[ -d public ] && cp -r public .next/standalone/public || true
# 4) Compile better-sqlite3 native binding for the target architecture.
# Use node-gyp directly so CC/CXX from xbps-src cross-toolchain are used
# without npm altering them.
local _node_gyp=/usr/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js
(cd node_modules/better-sqlite3 && node "$_node_gyp" rebuild --arch="$_gyp_arch")
# 5) Place the compiled binding into the standalone bundle
local _bs3_release=.next/standalone/node_modules/better-sqlite3/build/Release
mkdir -p "$_bs3_release"
cp node_modules/better-sqlite3/build/Release/better_sqlite3.node "$_bs3_release/"
# 6) Remove arch-specific sharp bundles – upstream sets images.unoptimized=true
# so sharp is not used at runtime; x64 .so files would break aarch64 strip
rm -rf .next/standalone/node_modules/@img
# 7) Copy pino runtime deps omitted by Next.js static analysis:
# pino-abstract-transport – required by pino's worker thread
# split2 – dep of pino-abstract-transport
# process-warning – dep of pino itself
for _mod in pino-abstract-transport split2 process-warning; do
cp -r "node_modules/$_mod" .next/standalone/node_modules/
done
}
do_check() {
npm run test:unit
}
do_install() {
vmkdir usr/lib/omniroute/.next
vcopy .next/standalone/. usr/lib/omniroute/.next/standalone
# Prevent removal of empty Next.js app router dirs by the post-install hook
for _d in \
.next/standalone/.next/server/app/dashboard \
.next/standalone/.next/server/app/dashboard/settings \
.next/standalone/.next/server/app/dashboard/providers; do
touch "${DESTDIR}/usr/lib/omniroute/${_d}/.keep"
done
cat > "${WRKDIR}/omniroute" <<'EOF'
#!/bin/sh
export PORT="${PORT:-20128}"
export DATA_DIR="${DATA_DIR:-${XDG_DATA_HOME:-${HOME}/.local/share}/omniroute}"
export APP_LOG_TO_FILE="${APP_LOG_TO_FILE:-false}"
mkdir -p "${DATA_DIR}"
exec node /usr/lib/omniroute/.next/standalone/server.js "$@"
EOF
vbin "${WRKDIR}/omniroute"
}
post_install() {
vlicense LICENSE
}
🐳 Docker
OmniRoute is available as a public Docker image on Docker Hub.
Quick run:
docker run -d \
--name omniroute \
--restart unless-stopped \
--stop-timeout 40 \
-p 20128:20128 \
-v omniroute-data:/app/data \
diegosouzapw/omniroute:latest
With environment file:
# Copy and edit .env first
cp .env.example .env
docker run -d \
--name omniroute \
--restart unless-stopped \
--stop-timeout 40 \
--env-file .env \
-p 20128:20128 \
-v omniroute-data:/app/data \
diegosouzapw/omniroute:latest
Using Docker Compose:
# Base profile (no CLI tools)
docker compose --profile base up -d
# CLI profile (Claude Code, Codex, OpenClaw built-in)
docker compose --profile cli up -d
Dashboard support for Docker deployments now includes a one-click Cloudflare Quick Tunnel on Dashboard → Endpoints. The first enable downloads cloudflared only when needed, starts a temporary tunnel to your current /v1 endpoint, and shows the generated https://*.trycloudflare.com/v1 URL directly below your normal public URL.
Notes:
- Quick Tunnel URLs are temporary and change after every restart.
- Quick Tunnels are not auto-restored after an OmniRoute or container restart. Re-enable them from the dashboard when needed.
- Managed install currently supports Linux, macOS, and Windows on x64/arm64.
- Managed Quick Tunnels default to HTTP/2 transport to avoid noisy QUIC UDP buffer warnings in constrained container environments. Set `CLOUDFLARED_PROTOCOL=quic` or `auto` if you want a different transport.
- Docker images bundle system CA roots and pass them to managed `cloudflared`, which avoids TLS trust failures when the tunnel bootstraps inside the container.
- SQLite runs in WAL mode. `docker stop` should be allowed to finish so OmniRoute can checkpoint the latest changes back into `storage.sqlite`.
- The bundled Compose files already set a 40s stop grace period. If you run the image directly, keep `--stop-timeout 40` (or similar) so manual stops do not cut off shutdown cleanup.
- Set `CLOUDFLARED_BIN=/absolute/path/to/cloudflared` if you want OmniRoute to use an existing binary instead of downloading one.
Using Docker Compose with Caddy (HTTPS Auto-TLS):
OmniRoute can be securely exposed using Caddy's automatic SSL provisioning. Ensure your domain's DNS A record points to your server's IP.
services:
omniroute:
image: diegosouzapw/omniroute:latest
container_name: omniroute
restart: unless-stopped
volumes:
- omniroute-data:/app/data
environment:
- PORT=20128
- NEXT_PUBLIC_BASE_URL=https://your-domain.com
caddy:
image: caddy:latest
container_name: caddy
restart: unless-stopped
ports:
- "80:80"
- "443:443"
command: caddy reverse-proxy --from https://your-domain.com --to http://omniroute:20128
volumes:
omniroute-data:
| Image | Tag | Size | Description |
|---|---|---|---|
| `diegosouzapw/omniroute` | `latest` | ~250MB | Latest stable release |
| `diegosouzapw/omniroute` | `3.6.2` | ~250MB | Current version |
🖥️ Desktop App — Offline & Always-On
🆕 NEW! OmniRoute is now available as a native desktop application for Windows, macOS, and Linux.
Run OmniRoute as a standalone desktop app — no terminal, no browser, no internet required for local models. The Electron-based app includes:
- 🖥️ Native Window — Dedicated app window with system tray integration
- 🔄 Auto-Start — Launch OmniRoute on system login
- 🔔 Native Notifications — Get alerts for quota exhaustion or provider issues
- ⚡ One-Click Install — NSIS (Windows), DMG (macOS), AppImage (Linux)
- 🌐 Offline Mode — Works fully offline with bundled server
Quick Start
# Development mode
npm run electron:dev
# Build for your platform
npm run electron:build # Current platform
npm run electron:build:win # Windows (.exe)
npm run electron:build:mac # macOS (.dmg) — x64 & arm64
npm run electron:build:linux # Linux (.AppImage)
System Tray
When minimized, OmniRoute lives in your system tray with quick actions:
- Open dashboard
- Change server port
- Quit application
📖 Full documentation: electron/README.md
💰 Pricing at a Glance
| Tier | Provider | Cost | Quota Reset | Best For |
|---|---|---|---|---|
| 💳 SUBSCRIPTION | Claude Code (Pro) | $20/mo | 5h + weekly | Already subscribed |
| | Codex (Plus/Pro) | $20-200/mo | 5h + weekly | OpenAI users |
| | Gemini CLI | FREE | 180K/mo + 1K/day | Everyone! |
| | GitHub Copilot | $10-19/mo | Monthly | GitHub users |
| 🔑 API KEY | NVIDIA NIM | FREE (dev forever) | ~40 RPM | 70+ open models |
| | Cerebras | FREE (1M tok/day) | 60K TPM / 30 RPM | World's fastest |
| | Groq | FREE (30 RPM) | 14.4K RPD | Ultra-fast Llama/Gemma |
| | DeepSeek V3.2 | $0.27/$1.10 per 1M | None | Best price/quality reasoning |
| | xAI Grok-4 Fast | $0.20/$0.50 per 1M 🆕 | None | Fastest + tool calling, ultra-low cost |
| | xAI Grok-4 (standard) | $0.20/$1.50 per 1M 🆕 | None | Reasoning flagship from xAI |
| | Mistral | Free trial + paid | Rate limited | European AI |
| | OpenRouter | Pay-per-use | None | 100+ models aggregated |
| 💰 CHEAP | GLM-5 (via Z.AI) 🆕 | $0.5/1M | Daily 10AM | 128K output, newest flagship |
| | GLM-4.7 | $0.6/1M | Daily 10AM | Budget backup |
| | MiniMax M2.5 🆕 | $0.3/1M input | 5-hour rolling | Reasoning + agentic tasks |
| | MiniMax M2.1 | $0.2/1M | 5-hour rolling | Cheapest option |
| | Kimi K2.5 (Moonshot API) 🆕 | Pay-per-use | None | Direct Moonshot API access |
| | Kimi K2 | $9/mo flat | 10M tokens/mo | Predictable cost |
| 🆓 FREE | Qoder | $0 | Unlimited | 5 models unlimited |
| | Qwen | $0 | Unlimited | 4 models unlimited |
| | Kiro | $0 | Unlimited | Claude Sonnet/Haiku (AWS Builder) |
| | LongCat Flash-Lite 🆕 | $0 (50M tok/day 🔥) | 1 RPS | Largest free quota on Earth |
| | Pollinations AI 🆕 | $0 (no key needed) | 1 req/15s | GPT-5, Claude, DeepSeek, Llama 4 |
| | Cloudflare Workers AI 🆕 | $0 (10K Neurons/day) | ~150 resp/day | 50+ models, global edge |
| | Scaleway AI 🆕 | $0 (1M tokens total) | Rate limited | EU/GDPR, Qwen3 235B, Llama 70B |
🆕 New models added (Mar 2026): Grok-4 Fast family at $0.20/$0.50/M (benchmarked at 1143ms — 30% faster than Gemini 2.5 Flash), GLM-5 via Z.AI with 128K output, MiniMax M2.5 reasoning, DeepSeek V3.2 updated pricing, Kimi K2.5 via Moonshot direct API.
💡 $0 Combo Stack — The Complete Free Setup:
# 🆓 Ultimate Free Stack 2026 — 11 Providers, $0 Forever
Kiro (kr/) → Claude Sonnet/Haiku UNLIMITED
Qoder (if/) → kimi-k2-thinking, qwen3-coder-plus, deepseek-r1 UNLIMITED
LongCat Lite (lc/) → LongCat-Flash-Lite — 50M tokens/day 🔥
Pollinations (pol/) → GPT-5, Claude, DeepSeek, Llama 4 — no key needed
Qwen (qw/) → qwen3-coder-plus, qwen3-coder-flash, qwen3-coder-next UNLIMITED
Gemini (gemini/) → Gemini 2.5 Flash — 1,500 req/day free API key
Cloudflare AI (cf/) → Llama 70B, Gemma 3, Mistral — 10K Neurons/day
Scaleway (scw/) → Qwen3 235B, Llama 70B — 1M free tokens (EU)
Groq (groq/) → Llama/Gemma ultra-fast — 14.4K req/day
NVIDIA NIM (nvidia/) → 70+ open models — 40 RPM forever
Cerebras (cerebras/) → Llama/Qwen world-fastest — 1M tok/day
Zero cost. Never stops coding. Configure this as one OmniRoute combo and all fallbacks happen automatically — no manual switching ever.
🆓 Free Models — What You Actually Get
All models below are 100% free with zero credit card required. OmniRoute auto-routes between them when one quota runs out — combine them all for an unbreakable $0 combo.
🔵 CLAUDE MODELS (via Kiro — AWS Builder ID)
| Model | Prefix | Limit | Rate Limit |
|---|---|---|---|
| `claude-sonnet-4.5` | `kr/` | Unlimited | No reported daily cap |
| `claude-haiku-4.5` | `kr/` | Unlimited | No reported daily cap |
| `claude-opus-4.6` | `kr/` | Unlimited | Latest Opus via Kiro |
🟢 QODER MODELS (Free PAT via qodercli)
| Model | Prefix | Limit | Rate Limit |
|---|---|---|---|
| `kimi-k2-thinking` | `if/` | Unlimited | No reported cap |
| `qwen3-coder-plus` | `if/` | Unlimited | No reported cap |
| `deepseek-r1` | `if/` | Unlimited | No reported cap |
| `minimax-m2.1` | `if/` | Unlimited | No reported cap |
| `kimi-k2` | `if/` | Unlimited | No reported cap |
Recommended connection method: Personal Access Token + `qodercli`. Browser OAuth is experimental and disabled by default unless `QODER_OAUTH_*` environment variables are configured.
🟡 QWEN MODELS (Device Code Auth)
| Model | Prefix | Limit | Rate Limit |
|---|---|---|---|
| `qwen3-coder-plus` | `qw/` | Unlimited | No reported cap |
| `qwen3-coder-flash` | `qw/` | Unlimited | No reported cap |
| `qwen3-coder-next` | `qw/` | Unlimited | No reported cap |
| `vision-model` | `qw/` | Unlimited | Multimodal (images) |
🟣 GEMINI CLI (Google OAuth)
| Model | Prefix | Limit | Rate Limit |
|---|---|---|---|
| `gemini-3-flash-preview` | `gc/` | 180K tok/month + 1K/day | Monthly reset |
| `gemini-2.5-pro` | `gc/` | 180K/month (shared pool) | High quality |
⚫ NVIDIA NIM (Free API Key — build.nvidia.com)
| Tier | Daily Limit | Rate Limit | Notes |
|---|---|---|---|
| Free (Dev) | No token cap | ~40 RPM | 70+ models; transitioning to pure rate limits mid-2025 |
Popular free models: moonshotai/kimi-k2.5 (Kimi K2.5), z-ai/glm4.7 (GLM 4.7), deepseek-ai/deepseek-v3.2 (DeepSeek V3.2), nvidia/llama-3.3-70b-instruct, deepseek/deepseek-r1
⚪ CEREBRAS (Free API Key — inference.cerebras.ai)
| Tier | Daily Limit | Rate Limit | Notes |
|---|---|---|---|
| Free | 1M tokens/day | 60K TPM / 30 RPM | World's fastest LLM inference; resets daily |
Available free: llama-3.3-70b, llama-3.1-8b, deepseek-r1-distill-llama-70b
🔴 GROQ (Free API Key — console.groq.com)
| Tier | Daily Limit | Rate Limit | Notes |
|---|---|---|---|
| Free | 14.4K RPD | 30 RPM per model | No credit card; 429 on limit, not charged |
Available free: llama-3.3-70b-versatile, gemma2-9b-it, mixtral-8x7b, whisper-large-v3
🔴 LONGCAT AI (Free API Key — longcat.chat) 🆕
| Model | Prefix | Daily Free Quota | Notes |
|---|---|---|---|
| `LongCat-Flash-Lite` | `lc/` | 50M tokens 💥 | Largest free quota ever |
| `LongCat-Flash-Chat` | `lc/` | 500K tokens | Multi-turn chat |
| `LongCat-Flash-Thinking` | `lc/` | 500K tokens | Reasoning / CoT |
| `LongCat-Flash-Thinking-2601` | `lc/` | 500K tokens | Jan 2026 version |
| `LongCat-Flash-Omni-2603` | `lc/` | 500K tokens | Multimodal |
100% free while in public beta. Sign up at longcat.chat with email or phone. Resets daily 00:00 UTC.
🟢 POLLINATIONS AI (No API Key Required) 🆕
| Model | Prefix | Rate Limit | Provider Behind |
|---|---|---|---|
| `openai` | `pol/` | 1 req/15s | GPT-5 |
| `claude` | `pol/` | 1 req/15s | Anthropic Claude |
| `gemini` | `pol/` | 1 req/15s | Google Gemini |
| `deepseek` | `pol/` | 1 req/15s | DeepSeek V3 |
| `llama` | `pol/` | 1 req/15s | Meta Llama 4 Scout |
| `mistral` | `pol/` | 1 req/15s | Mistral AI |
✨ Zero friction: No signup, no API key. Add the Pollinations provider with an empty key field and it works immediately.
🟠 CLOUDFLARE WORKERS AI (Free API Key — cloudflare.com) 🆕
| Tier | Daily Neurons | Equivalent Usage | Notes |
|---|---|---|---|
| Free | 10,000 | ~150 LLM resp / 500s audio / 15K embeds | Global edge, 50+ models |
Popular free models: @cf/meta/llama-3.3-70b-instruct, @cf/google/gemma-3-12b-it, @cf/openai/whisper-large-v3-turbo (free audio!), @cf/qwen/qwen2.5-coder-15b-instruct
Requires API Token + Account ID from dash.cloudflare.com. Store Account ID in provider settings.
🟣 SCALEWAY AI (1M Free Tokens — scaleway.com) 🆕
| Tier | Free Quota | Location | Notes |
|---|---|---|---|
| Free | 1M tokens | 🇫🇷 Paris, EU | No credit card needed within limits |
Available free: qwen3-235b-a22b-instruct-2507 (Qwen3 235B!), llama-3.1-70b-instruct, mistral-small-3.2-24b-instruct-2506, deepseek-v3-0324
EU/GDPR compliant. Get API key at console.scaleway.com.
💡 The Ultimate Free Stack (11 Providers, $0 Forever):
Kiro (kr/) → Claude Sonnet/Haiku UNLIMITED
Qoder (if/) → kimi-k2-thinking, qwen3-coder-plus, deepseek-r1 UNLIMITED
LongCat Lite (lc/) → LongCat-Flash-Lite — 50M tokens/day 🔥
Pollinations (pol/) → GPT-5, Claude, DeepSeek, Llama 4 — no key needed
Qwen (qw/) → qwen3-coder models UNLIMITED
Gemini (gemini/) → Gemini 2.5 Flash — 1,500 req/day free
Cloudflare AI (cf/) → 50+ models — 10K Neurons/day
Scaleway (scw/) → Qwen3 235B, Llama 70B — 1M free tokens (EU)
Groq (groq/) → Llama/Gemma — 14.4K req/day ultra-fast
NVIDIA NIM (nvidia/) → 70+ open models — 40 RPM forever
Cerebras (cerebras/) → Llama/Qwen world-fastest — 1M tok/day
🎙️ Free Transcription Combo
Transcribe any audio/video for $0 — Deepgram leads with $200 free, AssemblyAI $50 fallback, Groq Whisper as unlimited emergency backup.
| Provider | Free Credits | Best Model | Rate Limit |
|---|---|---|---|
| 🟢 Deepgram | $200 free (signup) | `nova-3` — best accuracy, 30+ languages | No RPM limit on free credits |
| 🔵 AssemblyAI | $50 free (signup) | `universal-3-pro` — chapters, sentiment, PII | No RPM limit on free credits |
| 🔴 Groq | Free forever | `whisper-large-v3` — OpenAI Whisper | 30 RPM (rate limited) |
Suggested combo in /dashboard/combos:
Name: free-transcription
Strategy: Priority
Nodes:
[1] deepgram/nova-3 → uses $200 free first
[2] assemblyai/universal-3-pro → fallback when Deepgram credits run out
[3] groq/whisper-large-v3 → free forever, emergency fallback
Then in /dashboard/media → Transcription tab: upload any audio or video file → select your combo endpoint → get transcription in supported formats.
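If you prefer calling the endpoint directly instead of the Media tab, a minimal sketch looks like this (assumes Node 20+ for `openAsBlob`; passing the combo name as the `model` field is an assumption based on the combo-endpoint selection above):

```typescript
import { openAsBlob } from "node:fs";

const form = new FormData();
form.append("file", await openAsBlob("meeting.mp3"), "meeting.mp3");
form.append("model", "free-transcription"); // the combo defined above (assumed usage)

const res = await fetch("http://localhost:20128/v1/audio/transcriptions", {
  method: "POST",
  headers: { authorization: `Bearer ${process.env.OMNIROUTE_API_KEY}` },
  body: form,
});
console.log(await res.json());
```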
💡 Key Features
OmniRoute v3.6 is built as an operational platform, not just a relay proxy.
🆕 New — v3.6.x Highlights (Apr 2026)
| Feature | What It Does |
|---|---|
| 🌐 V1 WebSocket Bridge | OpenAI-compatible WebSocket traffic upgraded and proxied via /v1/ws — full streaming over WS with session auth (API key or session cookie) |
| 🔑 Sync Tokens & Config Bundle | Issue/revoke sync tokens for config sync endpoints. Config bundles versioned with ETag for bandwidth-efficient polling |
| 🧠 GLM Thinking (glmt) Preset | GLM Thinking registered first-class: 65 536 max tokens, 24 576 thinking budget, 900s timeout, usage sync & pricing — Claude-compatible API |
| 🔢 Hybrid Token Counting | Uses provider-side /messages/count_tokens when available; falls back to estimation — accurate usage tracking without guessing |
| 🌱 Model Alias Auto-Seed | 30+ cross-proxy dialect aliases normalised at startup — no more routing mismatches |
| 🛡️ Safe Outbound Fetch | All provider validation and model discovery go through a guarded fetch layer blocking private/local URLs with retry, timeout, and SSRF protection |
| ⏳ Wait For Cooldown | Server-side chat retries when every candidate connection is cooling down; configurable enabled, maxRetries, and maxRetryWaitSec |
| 🔍 Runtime Env Validation | Startup validates all env vars with Zod schemas — clear errors for missing secrets, invalid URLs, or wrong types |
| 📋 Compliance Audit Expansion | Structured audit logs with pagination, request context, auth events, provider CRUD events, and SSRF-blocked validation logging |
| 🔐 TPS Log Metric | Log details modal shows Tokens Per Second (TPS) — quick performance at-a-glance for every request |
| 🗑️ Uninstall / Full Uninstall | npm run uninstall keeps data, npm run uninstall:full removes everything — clean removal for all install methods |
| 🔧 OAuth Env Repair | One-click "Repair env" action for OAuth providers restores missing env vars and fixes broken auth state |
| 🔒 Graceful Electron Shutdown | Electron before-quit shuts down Next.js gracefully, preventing SQLite WAL database locks on desktop close |
| 👁️ Model Visibility Toggle | Per-model visibility toggle (👁 icon) with search filter and active-count badge (N/M active) on provider pages |
| 📧 Email Privacy Masking | OAuth account emails masked (di*****@g****.com), full address visible on hover |
| 🔗 Context Relay Strategy | Combo strategy preserving session continuity via structured handoff summaries when accounts rotate mid-conversation |
| 🛡️ Proxy Hardening | Token health check, API key validation, and undici dispatcher all honor proxy config |
| ⚠️ Node.js 24 Login Warning | Login page proactively detects incompatible Node.js versions and shows a clear warning banner |
| 📎 Gemini PDF Attachments | PDF attachments correctly routed to Gemini via inline_data and generic base64 detection |
| 🔒 CodeQL Security Hardening | Resolved SSRF, insecure randomness, polynomial ReDoS, and incomplete URL sanitization alerts |
🆕 New — ClawRouter-Inspired Improvements (Mar 2026)
| Feature | What It Does |
|---|---|
| ⚡ Grok-4 Fast Family | xAI models at $0.20/$0.50/M — benchmarked 1143ms (30% faster than Gemini 2.5 Flash) |
| 🧠 GLM-5 via Z.AI | 128K output context, $0.5/1M — newest flagship from the GLM family |
| 🔮 MiniMax M2.5 | Reasoning + agentic tasks at $0.30/1M — significant upgrade from M2.1 |
| 🎯 toolCalling Flag per Model | Per-model toolCalling: true/false in registry — AutoCombo skips non-tool-capable models |
| 🌍 Multilingual Intent Detection | PT/ZH/ES/AR keywords in AutoCombo scoring — better model selection for non-English content |
| 📊 Benchmark-Driven Fallbacks | Real p95 latency from live requests feeds combo scoring — AutoCombo learns from actual data |
| 🔁 Request Deduplication | Content-hash based dedup window — multi-agent safe, prevents duplicate charges |
| 🔌 Pluggable RouterStrategy | Extensible RouterStrategy interface — add custom routing logic as plugins |
🚀 Previous v2.0.9+ — Playground, CLI Fingerprints & ACP
| Feature | What It Does |
|---|---|
| 🎮 Model Playground | Dashboard page to test any model directly — provider/model/endpoint selectors, Monaco Editor, streaming, abort, timing |
| 🔏 CLI Fingerprint Matching | Per-provider header/body ordering to match native CLI signatures — toggle per provider in Settings > Security. Your proxy IP is preserved |
| 🤝 ACP Support (Agent Client Protocol) | CLI agent discovery (Codex, Claude, Goose, Gemini CLI, OpenClaw + 9 more), process spawner, /api/acp/agents endpoint |
| 🤖 ACP Agents Dashboard | Debug › Agents page — grid of 14 agents with install status, version, custom agent form for any CLI tool. OpenCode users get a "Download opencode.json" button that auto-generates a ready-to-use config with all available models. |
| 🔧 Custom Model apiFormat Routing | Custom models with `apiFormat: "responses"` now correctly route to the Responses API translator |
| 🏢 Codex Workspace Isolation | Multiple Codex workspaces per email — OAuth correctly separates connections by workspace ID |
| 🔄 Electron Auto-Update | Desktop app checks for updates + auto-install on restart |
🤖 Agent & Protocol Operations (v2.0)
| Feature | What It Does |
|---|---|
| 🔧 MCP Server (25 tools) | IDE/agent tools via 3 transports: stdio, SSE (/api/mcp/sse), Streamable HTTP (/api/mcp/stream). 18 core + 3 memory + 4 skill tools |
| 🤝 A2A Server (JSON-RPC + SSE) | Agent-to-agent task execution with sync and streaming flows |
| 🧭 Consolidated Endpoints Page | Tabbed management page with Endpoint Proxy, MCP, A2A, and API Endpoints tabs |
| 🎚️ Service Enable/Disable Toggles | ON/OFF switches for MCP and A2A with settings persistence (default: OFF) |
| 🛰️ MCP Runtime Heartbeat | Real process status (pid, uptime, heartbeat age, transport, scope mode) |
| 📋 MCP Audit Trail | Filterable audit logs with success/failure and key attribution |
| 🔐 MCP Scope Enforcement | 10 granular scope permissions for controlled tool access |
| 📡 A2A Task Lifecycle Management | List/filter tasks, inspect events/artifacts, cancel running tasks |
| 📋 Agent Card Discovery | /.well-known/agent.json for client auto-discovery |
| 🧪 Protocol E2E Test Harness | Real MCP SDK + A2A client flows in test:protocols:e2e |
| ⚙️ Operational Controls | Switch combos, tune resilience settings, and review breaker state from dedicated Health and Settings surfaces |
🧠 Routing & Intelligence
| Feature | What It Does |
|---|---|
| 🎯 Smart 4-Tier Fallback | Auto-route: Subscription → API Key → Cheap → Free |
| 📊 Real-Time Quota Tracking | Live token count + reset countdown per provider |
| 🔄 Format Translation | OpenAI ↔ Claude ↔ Gemini ↔ Responses with schema-safe conversions |
| 👥 Multi-Account Support | Multiple accounts per provider with intelligent selection |
| 🔄 Auto Token Refresh | OAuth tokens refresh automatically with retry |
| 🎨 Custom Combos | 13 balancing strategies + fallback chain control |
| 🔗 Context Relay | Session continuity handoffs when account rotation happens mid-session |
| 🌐 Wildcard Router | provider/* dynamic routing |
| 🧠 Thinking Budget Controls | Passthrough, auto, custom, and adaptive reasoning limits |
| 🔀 Model Aliases | Built-in + custom model aliasing and migration safety |
| ⚡ Background Degradation | Route low-priority background tasks to cheaper models |
| 🧪 Task-Aware Smart Routing | Auto-select model by content type (coding/vision/analysis/summarization) |
| 🔄 A2A Agent Workflows | Deterministic FSM orchestrator for stateful multi-step agent executions |
| 🔀 Adaptive Routing | Dynamic strategy override based on token volume and prompt complexity |
| 🎲 Provider Diversity | Shannon entropy scoring that balances auto-combo traffic distribution |
| 💬 System Prompt Injection | Global behavior controls applied consistently |
| 📄 Responses API Compatibility | Full /v1/responses support for Codex and advanced agentic workflows |
🎵 Multi-Modal APIs
| Feature | What It Does |
|---|---|
| 🖼️ Image Generation | /v1/images/generations with cloud and local backends |
| 📐 Embeddings | /v1/embeddings for search and RAG pipelines |
| 🎤 Audio Transcription | /v1/audio/transcriptions — 7 providers (Deepgram Nova 3, AssemblyAI, Groq Whisper, HuggingFace, ElevenLabs, OpenAI, Azure), auto-language detection, MP4/MP3/WAV support |
| 🔊 Text-to-Speech | /v1/audio/speech — 10 providers (ElevenLabs, OpenAI, Deepgram, Cartesia, PlayHT, HuggingFace, Nvidia NIM, Inworld, Coqui, Tortoise) with correct error messages |
| 🎬 Video Generation | /v1/videos/generations (ComfyUI + SD WebUI workflows) |
| 🎵 Music Generation | /v1/music/generations (ComfyUI workflows) |
| 🛡️ Moderations | /v1/moderations safety checks |
| 🔀 Reranking | /v1/rerank for relevance scoring |
| 🔍 Web Search 🆕 | /v1/search — 5 providers (Serper, Brave, Perplexity, Exa, Tavily), 6,500+ free/month, auto-failover, cache |
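All multi-modal endpoints accept OpenAI-style request bodies. As a minimal sketch, an embeddings call looks like this (the model ID and API key are illustrative; pick any embeddings-capable model from your `/v1/models` list):

```bash
# Illustrative model ID — substitute one from your own /v1/models response
curl http://localhost:20128/v1/embeddings \
  -H "Authorization: Bearer your-omniroute-api-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/text-embedding-3-small", "input": "How does combo fallback work?"}'
```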
🛡️ Resilience, Security & Governance
| Feature | What It Does |
|---|---|
| 🔌 Provider Circuit Breakers | Provider-wide trip/recover after fallback exhaustion with configurable thresholds |
| 🔒 Daily Quota Lock 🆕 | Detects exhaustion signals and locks routing for the specific model until midnight |
| 🎯 Endpoint-Aware Models | Custom models declare supported endpoints + API format |
| 🛡️ Anti-Thundering Herd | Mutex + semaphore protections on retry/rate events |
| 🧠 Semantic + Signature Cache | Cost/latency reduction with two cache layers |
| ⚡ Request Idempotency | Duplicate protection window |
| 🔒 TLS Fingerprint Spoofing | Browser-like TLS fingerprint — reduces bot detection and account flagging |
| 🔏 CLI Fingerprint Matching | Matches native CLI request signatures — reduces ban risk while preserving proxy IP |
| 🌐 IP Filtering | Allowlist/blocklist control for exposed deployments |
| 🚦 Request Queue & Pacing | Configurable per-connection request buckets for RPM, spacing, concurrency, and max wait |
| 📉 Graceful Degradation | Multi-layer capability fallbacks protecting core gateway operations |
| 📜 Config Audit Trail | Diff-based change tracking preventing operational drift with simple rollbacks |
| ⏳ Provider Health Sync | Proactive token expiration monitoring triggering alerts before authorization failures |
| ❄️ Connection Cooldown | Retryable 408/429/5xx failures cool down a single connection with optional upstream hints |
| 🚪 Auto-Disable Banned Accounts | Permanently blocked token accounts can be disabled automatically |
| 🔑 API Key Management + Scoping | Secure key issuance/rotation and model/provider controls |
| 👁️ Scoped API Key Reveal 🆕 | Opt-in recovery of API keys via ALLOW_API_KEY_REVEAL |
| 🛡️ Protected /models | Optional auth gating and provider hiding for model catalog |
| 🛡️ Safe Outbound Fetch 🆕 | Guarded fetch for provider calls — blocks private/local URLs, retries, SSRF protection |
| ⏳ Wait For Cooldown 🆕 | Auto-retry chat after connection cooldowns; configurable enabled, maxRetries, and maxRetryWaitSec |
| 🔍 Runtime Env Validation 🆕 | Zod-based env schema validation at startup with actionable error messages |
| 📋 Compliance Audit v2 🆕 | Pagination, request context, auth events, provider CRUD, and SSRF-blocked logging |
📊 Observability & Analytics
| Feature | What It Does |
|---|---|
| 📝 Request + Proxy Logging | Full request/response and proxy logging |
| 📉 Streamed Detailed Logs | Reconstructs SSE payload streams cleanly into the UI |
| 🏷️ Real-Time Model Badges 🆕 | Live model status and daily quota countdown timers |
| 📋 Unified Logs Dashboard | Request, proxy, audit, and console views in one page |
| 🔍 Request Telemetry | p50/p95/p99 latency and request tracing |
| 🏥 Health Dashboard | Uptime, breaker states, lockouts, cache stats |
| 💰 Cost Tracking | Budget controls and per-model pricing visibility |
| 📈 Analytics Visualizations | Model/provider usage insights and trend views |
| 🧪 Evaluation Framework | Golden set testing with configurable match strategies |
| 📡 Live Diagnostics 🆕 | Semantic cache bypass for accurate combo live testing |
| 🔐 TPS Log Metric 🆕 | Tokens Per Second badge in log details modal |
☁️ Deployment & Platform
| Feature | What It Does |
|---|---|
| 🌐 Deploy Anywhere | Localhost, VPS, Docker, Cloud environments |
| 🚇 Cloudflare Tunnel 🆕 | One-click Quick Tunnel integration from the dashboard |
| 🔑 API Key Model Filtering | Native /v1/models response filtered via assigned Bearer context roles |
| ⚡ Smart Cache Bypass | Configurable TTL heuristics and forced refetch controls |
| 🔄 Backup/Restore | Export/import and disaster recovery flows |
| 🧙 Onboarding Wizard | First-run guided setup |
| 🔧 CLI Tools Dashboard | One-click setup for popular coding tools |
| 🎮 Model Playground | Test any provider/model/endpoint from the dashboard |
| 🔏 CLI Fingerprint Toggle | Per-provider fingerprint matching in Settings > Security |
| 🌐 i18n (30 languages) | Full dashboard + docs language support with RTL coverage |
| 🧹 Clear All Models | One-click model list clearing in provider details |
| 👁️ Sidebar Controls 🆕 | Hide components and integrations from Appearance Settings |
| 📋 Issue Templates | Standardized GitHub templates for bugs and features |
| 📂 Custom Data Directory | DATA_DIR override for storage location |
| 🌐 V1 WebSocket Bridge 🆕 | OpenAI-compatible WebSocket traffic proxied via /v1/ws |
| 🔑 Sync Tokens & Bundle 🆕 | Config sync tokens + versioned bundle endpoint with ETag support |
Feature Deep Dive
Smart fallback with practical cost control
Combo: "my-coding-stack"
1. cc/claude-opus-4-7
2. nvidia/llama-3.3-70b
3. glm/glm-4.7
4. if/kimi-k2-thinking
When a quota, rate limit, or health check fails, OmniRoute automatically moves to the next candidate without manual switching.
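Because a combo name is used exactly like a model ID, pointing any OpenAI-compatible client at the chain is a one-line change. A minimal sketch (the API key is illustrative):

```bash
curl http://localhost:20128/v1/chat/completions \
  -H "Authorization: Bearer your-omniroute-api-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "my-coding-stack", "messages": [{"role": "user", "content": "Refactor this function to be async."}]}'
```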
Protocol management that is visible and operable
- MCP + A2A are discoverable in UI and docs (not hidden)
- Protocol status APIs expose live operational data (`/api/mcp/*`, `/api/a2a/*`)
- Dashboards include actions for day-2 ops (combo toggles, breaker resets, task cancellation)
Translator + validation workflow
The Translator area includes:
- Playground: request transformation checks
- Chat Tester: full request/response round-trip
- Test Bench: multiple cases in one run
- Live Monitor: real-time traffic view
Plus protocol validation with real clients via npm run test:protocols:e2e.
📖 MCP Server README — Tool reference, IDE configs, and client examples
📖 A2A Server README — Skills, JSON-RPC methods, streaming, and task lifecycle
🧪 Evaluations (Evals)
OmniRoute includes a built-in evaluation framework to test LLM response quality against a golden set. Access it via Analytics → Evals in the dashboard.
Built-in Golden Set
The pre-loaded "OmniRoute Golden Set" contains test cases for:
- Greetings, math, geography, code generation
- JSON format compliance, translation, markdown generation
- Safety refusal (harmful content), counting, boolean logic
Evaluation Strategies
| Strategy | Description | Example |
|---|---|---|
| `exact` | Output must match exactly | `"4"` |
| `contains` | Output must contain substring (case-insensitive) | `"Paris"` |
| `regex` | Output must match regex pattern | `"1.*2.*3"` |
| `custom` | Custom JS function returns true/false | `(output) => output.length > 10` |
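A `custom` strategy is just a JavaScript predicate over the raw model output. A hypothetical matcher combining a content check with a length check (the exact golden-set schema may differ from this sketch):

```js
// Hypothetical custom matcher: passes only when the output mentions Paris
// and is longer than a bare one-word answer.
(output) => output.toLowerCase().includes("paris") && output.length > 10
```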
📖 Setup Guide
Protocol Setup (MCP + A2A)
🧩 MCP Setup (Model Context Protocol)
Start MCP transport in stdio mode:
omniroute --mcp
Recommended validation flow:
1. Connect your MCP client over stdio (see the config sketch below).
2. Run `omniroute_get_health`.
3. Run `omniroute_list_combos`.
4. Open `/dashboard/mcp` to confirm heartbeat, activity, and audit.
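For step 1, most MCP clients register stdio servers with an entry like the following (the `omniroute` server name and global binary are assumptions; adjust to your install):

```json
{
  "mcpServers": {
    "omniroute": {
      "command": "omniroute",
      "args": ["--mcp"]
    }
  }
}
```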
Useful APIs for automation:
- `GET /api/mcp/status`
- `GET /api/mcp/tools`
- `GET /api/mcp/audit`
- `GET /api/mcp/audit/stats`
🤝 A2A Setup (Agent2Agent)
Discover the agent:
curl http://localhost:20128/.well-known/agent.json
Send a task:
curl -X POST http://localhost:20128/a2a \
-H 'content-type: application/json' \
-d '{"jsonrpc":"2.0","id":"setup-a2a","method":"message/send","params":{"skill":"quota-management","messages":[{"role":"user","content":"Summarize quota status."}]}}'
Manage lifecycle:
- `GET /api/a2a/status`
- `GET /api/a2a/tasks`
- `GET /api/a2a/tasks/:id`
- `POST /api/a2a/tasks/:id/cancel`
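A minimal shell sketch of the lifecycle calls (the task ID is illustrative; take it from the `message/send` response or the task list):

```bash
curl http://localhost:20128/api/a2a/status
curl http://localhost:20128/api/a2a/tasks
curl http://localhost:20128/api/a2a/tasks/task-123
curl -X POST http://localhost:20128/api/a2a/tasks/task-123/cancel
```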
Operational UI:
- `/dashboard/a2a` for task/state/stream observability and smoke actions
🧪 End-to-end protocol validation
Validate both protocols with real clients:
npm run test:protocols:e2e
This verifies:
- MCP SDK client connect/list/call
- A2A discovery/send/stream/get/cancel
- Cross-check data in MCP audit and A2A task management APIs
💳 Subscription Providers
Claude Code (Pro/Max)
Dashboard → Providers → Connect Claude Code
→ OAuth login → Auto token refresh
→ 5-hour + weekly quota tracking
Models:
cc/claude-opus-4-7
cc/claude-sonnet-4-5-20250929
cc/claude-haiku-4-5-20251001
Pro Tip: Use Opus for complex tasks, Sonnet for speed. OmniRoute tracks quota per model!
OpenAI Codex (Plus/Pro)
Dashboard → Providers → Connect Codex
→ OAuth login (port 1455)
→ 5-hour + weekly reset
Models:
cx/gpt-5.2-codex
cx/gpt-5.1-codex-max
Codex Account Limit Management (5h + Weekly)
Each Codex account now has policy toggles in Dashboard → Providers:

- `5h` (ON/OFF): enforce the 5-hour window threshold policy.
- `Weekly` (ON/OFF): enforce the weekly window threshold policy.
- Threshold behavior: when an enabled window reaches >=90% usage, that account is skipped.
- Rotation behavior: OmniRoute routes to the next eligible Codex account automatically.
- Reset behavior: when the provider `resetAt` time passes, the account becomes eligible again automatically.
Scenarios:
- `5h ON` + `Weekly ON`: the account is skipped when either window reaches its threshold.
- `5h OFF` + `Weekly ON`: only weekly usage can block the account.
- `5h ON` + `Weekly OFF`: only 5-hour usage can block the account.
- `resetAt` passed: the account re-enters rotation automatically (no manual re-enable).
Gemini CLI (FREE 180K/month!)
Dashboard → Providers → Connect Gemini CLI
→ Google OAuth
→ 180K completions/month + 1K/day
Models:
gc/gemini-3-flash-preview
gc/gemini-2.5-pro
Best Value: Huge free tier! Use this before paid tiers.
GitHub Copilot
Dashboard → Providers → Connect GitHub
→ OAuth via GitHub
→ Monthly reset (1st of month)
Models:
gh/gpt-5
gh/claude-4.5-sonnet
gh/gemini-3.1-pro-preview
🔑 API Key Providers
NVIDIA NIM (FREE developer access — 70+ models)
- Sign up: build.nvidia.com
- Get free API key (1000 inference credits included)
- Dashboard → Add Provider → NVIDIA NIM:
  - API Key: `nvapi-your-key`
Models: nvidia/llama-3.3-70b-instruct, nvidia/mistral-7b-instruct, and 50+ more
Pro Tip: OpenAI-compatible API — works seamlessly with OmniRoute's format translation!
DeepSeek
- Sign up: platform.deepseek.com
- Get API key
- Dashboard → Add Provider → DeepSeek
Models: deepseek/deepseek-chat, deepseek/deepseek-coder
Groq (Free Tier Available!)
- Sign up: console.groq.com
- Get API key (free tier included)
- Dashboard → Add Provider → Groq
Models: groq/llama-3.3-70b, groq/mixtral-8x7b
Pro Tip: Ultra-fast inference — best for real-time coding!
OpenRouter (100+ Models)
- Sign up: openrouter.ai
- Get API key
- Dashboard → Add Provider → OpenRouter
Models: Access 100+ models from all major providers through a single API key.
Dashboard behavior: OpenRouter models are managed from Available Models. Manual add, import, and auto-sync all update the same list.
💰 Cheap Providers (Backup)
GLM-4.7 (Daily reset, $0.6/1M)
- Sign up: Zhipu AI
- Get API key from Coding Plan
- Dashboard → Add API Key:
  - Provider: `glm`
  - API Key: `your-key`
Use: glm/glm-4.7
Pro Tip: Coding Plan offers 3× the quota at 1/7 the cost! Quota resets daily at 10:00 AM.
MiniMax M2.1 (5h reset, $0.20/1M)
- Sign up: MiniMax
- Get API key
- Dashboard → Add API Key
Use: minimax/MiniMax-M2.1
Pro Tip: Cheapest option for long context (1M tokens)!
Kimi K2 ($9/month flat)
- Subscribe: Moonshot AI
- Get API key
- Dashboard → Add API Key
Use: kimi/kimi-latest
Pro Tip: Fixed $9/month for 10M tokens = $0.90/1M effective cost!
🆓 FREE Providers (Emergency Backup)
Qoder (5 FREE models via OAuth)
Dashboard → Connect Qoder
→ Qoder OAuth login
→ Unlimited usage
Models:
if/kimi-k2-thinking
if/qwen3-coder-plus
if/glm-4.7
if/minimax-m2
if/deepseek-r1
Qwen (4 FREE models via Device Code)
Dashboard → Connect Qwen
→ Device code authorization
→ Unlimited usage
Models:
qw/qwen3-coder-plus
qw/qwen3-coder-flash
Kiro (Claude FREE)
Dashboard → Connect Kiro
→ AWS Builder ID or Google/GitHub
→ Unlimited usage
Models:
kr/claude-sonnet-4.5
kr/claude-haiku-4.5
🎨 Create Combos
Example 1: Maximize Subscription → Cheap Backup
Dashboard → Combos → Create New
Name: premium-coding
Models:
1. cc/claude-opus-4-7 (Subscription primary)
2. glm/glm-4.7 (Cheap backup, $0.6/1M)
3. minimax/MiniMax-M2.1 (Cheapest fallback, $0.20/1M)
Use in CLI: premium-coding
Example 2: Free-Only (Zero Cost)
Name: free-combo
Models:
1. gc/gemini-3-flash-preview (180K free/month)
2. if/kimi-k2-thinking (unlimited)
3. qw/qwen3-coder-plus (unlimited)
Cost: $0 forever!
🔧 CLI Integration
Cursor IDE
Settings → Models → Advanced:
OpenAI API Base URL: http://localhost:20128/v1
OpenAI API Key: [from OmniRoute dashboard]
Model: cc/claude-opus-4-7
Claude Code
Use the CLI Tools page in the dashboard for one-click configuration, or edit ~/.claude/settings.json manually.
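A minimal manual sketch, assuming the standard Claude Code `env` overrides in `~/.claude/settings.json` (the key and model values are illustrative):

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:20128",
    "ANTHROPIC_AUTH_TOKEN": "your-omniroute-api-key",
    "ANTHROPIC_MODEL": "cc/claude-opus-4-7"
  }
}
```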
Codex CLI
export OPENAI_BASE_URL="http://localhost:20128"
export OPENAI_API_KEY="your-omniroute-api-key"
codex "your prompt"
OpenClaw
Option 1 — Dashboard (recommended):
Dashboard → CLI Tools → OpenClaw → Select Model → Apply
Option 2 — Manual: Edit ~/.openclaw/openclaw.json:
{
"models": {
"providers": {
"omniroute": {
"baseUrl": "http://127.0.0.1:20128/v1",
"apiKey": "sk_omniroute",
"api": "openai-completions"
}
}
}
}
> Note: OpenClaw only works with a local OmniRoute instance. Use `127.0.0.1` instead of `localhost` to avoid IPv6 resolution issues.
Cline / Continue / RooCode
Settings → API Configuration:
Provider: OpenAI Compatible
Base URL: http://localhost:20128/v1
API Key: [from OmniRoute dashboard]
Model: if/kimi-k2-thinking
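For Continue specifically, the equivalent `config.json` entry is a sketch along these lines (the title and key are illustrative, and Continue versions differ in config format):

```json
{
  "models": [
    {
      "title": "OmniRoute Kimi K2",
      "provider": "openai",
      "model": "if/kimi-k2-thinking",
      "apiBase": "http://localhost:20128/v1",
      "apiKey": "your-omniroute-api-key"
    }
  ]
}
```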
OpenCode
Step 1: Add OmniRoute as a custom provider:
opencode
/connect
# Select "Other" → Enter ID: "omniroute" → Enter your OmniRoute API key
Step 2: Create/edit opencode.json in your project root:
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"omniroute": {
"npm": "@ai-sdk/openai-compatible",
"name": "OmniRoute",
"options": {
"baseURL": "http://localhost:20128/v1"
},
"models": {
"cc/claude-sonnet-4-20250514": { "name": "Claude Sonnet 4" },
"gg/gemini-2.5-pro": { "name": "Gemini 2.5 Pro" },
"if/kimi-k2-thinking": { "name": "Kimi K2 (Free)" }
}
}
}
}
Step 3: Select the model in OpenCode:
/models
# Select any OmniRoute model from the list
> Tip: Add any model available from your OmniRoute `/v1/models` endpoint to the `models` section, using the `provider/model-id` format from your OmniRoute dashboard.
Troubleshooting
Click to expand troubleshooting guide
"Language model did not provide messages"
- Provider quota exhausted → Check dashboard quota tracker
- Solution: Use combo fallback or switch to cheaper tier
Rate limiting
- Subscription quota out → Fallback to GLM/MiniMax
- Add a combo: `cc/claude-opus-4-7 → glm/glm-4.7 → if/kimi-k2-thinking`
OAuth token expired
- Auto-refreshed by OmniRoute
- If issues persist: Dashboard → Provider → Reconnect
High costs
- Check usage stats in Dashboard → Costs
- Switch primary model to GLM/MiniMax
- Use free tier (Gemini CLI, Qoder) for non-critical tasks
Dashboard/API ports are wrong
- `PORT` is the canonical base port (and the API port by default)
- `API_PORT` overrides only the OpenAI-compatible API listener
- `DASHBOARD_PORT` overrides only the dashboard/Next.js listener
- Set `NEXT_PUBLIC_BASE_URL` to your dashboard/public URL (for OAuth callbacks)
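An illustrative `.env` sketch for a split-port setup (values are examples only):

```bash
PORT=20128                 # canonical base port (serves the API by default)
DASHBOARD_PORT=20129       # dashboard/Next.js listener only
NEXT_PUBLIC_BASE_URL=https://omniroute.example.com   # public URL for OAuth callbacks
```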
Cloud sync errors
- Verify `BASE_URL` points to your running instance
- Verify `CLOUD_URL` points to your expected cloud endpoint
- Keep `NEXT_PUBLIC_*` values aligned with the server-side values
First login not working
- Check `INITIAL_PASSWORD` in `.env`
- If unset, the fallback password is `123456`
No request logs
- `call_logs` in SQLite stores summary metadata for the Request Logs table and analytics views
- Detailed request/response payloads are written to `DATA_DIR/call_logs/` as one JSON artifact per request
- Enable pipeline capture from Dashboard → Logs → Request Logs if you need detailed per-stage payloads
- `Export Logs` reads the artifact files on demand, while `Export All` includes the `call_logs/` directory alongside `storage.sqlite`
- Set `APP_LOG_TO_FILE=true` if you also want application console logs in `logs/application/app.log`
- Adjust `APP_LOG_MAX_FILE_SIZE`, `APP_LOG_RETENTION_DAYS`, `APP_LOG_MAX_FILES`, and `CALL_LOG_MAX_ENTRIES` as needed
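An illustrative `.env` sketch for the logging knobs above (values and units are assumptions; see Environment Config for the exact defaults):

```bash
APP_LOG_TO_FILE=true
APP_LOG_MAX_FILE_SIZE=10485760   # assumed bytes: ~10 MB per file
APP_LOG_RETENTION_DAYS=7
APP_LOG_MAX_FILES=5
CALL_LOG_MAX_ENTRIES=10000
```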
Connection test shows "Invalid" for OpenAI-compatible providers
- Many providers don't expose a `/models` endpoint
- OmniRoute v1.0.6+ includes fallback validation via chat completions
- Ensure the base URL includes the `/v1` suffix
🔐 OAuth on a Remote Server
⚠️ Important for users running OmniRoute on a VPS, Docker, or any remote server
Why does Antigravity / Gemini CLI OAuth fail on remote servers?
The Antigravity and Gemini CLI providers use Google OAuth 2.0. Google requires the redirect_uri in the OAuth flow to exactly match one of the pre-registered URIs in the app's Google Cloud Console.
The OAuth credentials bundled in OmniRoute are registered for localhost only. When you access OmniRoute on a remote server (e.g. https://omniroute.myserver.com), Google rejects the authentication with:
Error 400: redirect_uri_mismatch
Solution: Configure your own OAuth credentials
You need to create an OAuth 2.0 Client ID in Google Cloud Console with your server's URI.
Step-by-step
1. Open Google Cloud Console
Go to: https://console.cloud.google.com/apis/credentials
2. Create a new OAuth 2.0 Client ID
- Click "+ Create Credentials" → "OAuth client ID"
- Application type: "Web application"
- Name: anything you like (e.g. `OmniRoute Remote`)
3. Add Authorized Redirect URIs
In the "Authorized redirect URIs" field, add:
https://your-server.com/callback
> Replace `your-server.com` with your server's domain or IP (include the port if needed, e.g. `http://45.33.32.156:20128/callback`).
4. Save and copy the credentials
After creating, Google will show the Client ID and Client Secret.
5. Set environment variables
In your .env (or Docker environment variables):
# For Antigravity:
ANTIGRAVITY_OAUTH_CLIENT_ID=your-client-id.apps.googleusercontent.com
ANTIGRAVITY_OAUTH_CLIENT_SECRET=GOCSPX-your-secret
# For Gemini CLI:
GEMINI_OAUTH_CLIENT_ID=your-client-id.apps.googleusercontent.com
GEMINI_OAUTH_CLIENT_SECRET=GOCSPX-your-secret
GEMINI_CLI_OAUTH_CLIENT_SECRET=GOCSPX-your-secret
6. Restart OmniRoute
# npm:
npm run dev
# Docker:
docker restart omniroute
7. Try connecting again
Dashboard → Providers → Antigravity (or Gemini CLI) → OAuth
Google will now redirect correctly to https://your-server.com/callback.
Temporary workaround (without custom credentials)
If you don't want to set up your own credentials right now, you can still use the manual URL flow:
1. OmniRoute opens the Google authorization URL
2. After authorizing, Google tries to redirect to `localhost` (which fails on the remote server)
3. Copy the full URL from your browser's address bar (even if the page doesn't load)
4. Paste that URL into the field shown in the OmniRoute connection modal
5. Click "Connect"
This works because the authorization code in the URL is valid regardless of whether the redirect page loaded.
🛠️ Tech Stack
Click to expand tech stack details
- Runtime: Node.js 18–22 LTS (⚠️ Node.js 24+ is not supported — `better-sqlite3` native binaries are incompatible)
- Language: TypeScript 5.9 — 100% TypeScript across `src/` and `open-sse/` (zero `any` in core modules since v2.0)
- Framework: Next.js 16 + React 19 + Tailwind CSS 4
- Database: better-sqlite3 (SQLite) + LowDB (JSON legacy) — domain state, proxy logs, MCP audit, routing decisions, memory, skills
- Schemas: Zod (MCP tool I/O validation, API contracts)
- Protocols: MCP (stdio/HTTP) + A2A v0.3 (JSON-RPC 2.0 + SSE)
- Streaming: Server-Sent Events (SSE)
- Auth: OAuth 2.0 (PKCE) + JWT + API Keys + MCP Scoped Authorization
- Testing: Node.js test runner + Vitest (900+ tests including unit, integration, E2E)
- CI/CD: GitHub Actions (auto npm publish + Docker Hub on release)
- Website: omniroute.online
- Package: npmjs.com/package/omniroute
- Docker: hub.docker.com/r/diegosouzapw/omniroute
- Resilience: Circuit breaker, exponential backoff, anti-thundering herd, TLS spoofing, auto-combo self-healing
Documentation
| Document | Description |
|---|---|
| User Guide | Providers, combos, CLI integration, deployment |
| API Reference | All endpoints with examples |
| MCP Server | 25 MCP tools, IDE configs, Python/TS/Go clients |
| A2A Server | JSON-RPC 2.0 protocol, skills, streaming, task mgmt |
| Auto-Combo Engine | 6-factor scoring, mode packs, self-healing |
| Context Relay | Session handoff strategy for account rotation |
| Troubleshooting | Common problems and solutions |
| Architecture | System architecture and internals |
| Codebase Documentation | Beginner-friendly codebase walkthrough |
| Uninstall Guide | Clean removal for all install methods |
| Environment Config | Complete .env variables and references |
| Contributing | Development setup and guidelines |
| OpenAPI Spec | OpenAPI 3.0 specification |
| Security Policy | Vulnerability reporting and security practices |
| VM Deployment | Complete guide: VM + nginx + Cloudflare setup |
| Features Gallery | Visual dashboard tour with screenshots |
| Release Checklist | Pre-release validation steps |
🗺️ Roadmap
OmniRoute has 218+ features planned across multiple development phases. Here are the key areas:
| Category | Planned Features | Highlights |
|---|---|---|
| 🧠 Routing & Intelligence | 25+ | Lowest-latency routing, tag-based routing, quota preflight, quota-aware P2C, step-based combo routing |
| 🔒 Security & Compliance | 20+ | SSRF hardening, credential cloaking, rate-limit per endpoint, management key scoping |
| 📊 Observability | 15+ | OpenTelemetry integration, real-time quota monitoring, combo target health, cost tracking per model |
| 🔄 Provider Integrations | 20+ | Dynamic model registry, connection cooldowns, multi-account Codex, Copilot quota parsing |
| ⚡ Performance | 15+ | Dual cache layer, prompt cache, response cache, streaming keepalive, batch API |
| 🌐 Ecosystem | 10+ | WebSocket API, config hot-reload, distributed config store, commercial mode |
🔜 Coming Soon
- 🔗 OpenCode Integration — Native provider support for the OpenCode AI coding IDE
- 🔗 TRAE Integration — Full support for the TRAE AI development framework
- 📦 Batch API — Asynchronous batch processing for bulk requests
- 🎯 Tag-Based Routing — Route requests based on custom tags and metadata
- 💰 Lowest-Cost Strategy — Automatically select the cheapest available provider
> 📝 Full feature specifications available in `docs/new-features/` (217 detailed specs)
👥 Contributors
How to Contribute
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
See CONTRIBUTING.md for detailed guidelines.
Releasing a New Version
# Create a release — npm publish happens automatically
gh release create v2.0.0 --title "v2.0.0" --generate-notes
📊 Star History
🌍 StarMapper
🙏 Acknowledgments
Special thanks to 9router by decolua — the original project that inspired this fork. OmniRoute builds upon that incredible foundation with additional features, multi-modal APIs, and a full TypeScript rewrite.
Special thanks to CLIProxyAPI — the original Go implementation that inspired this JavaScript port.
License
MIT License - see LICENSE for details.








