mirror of
https://github.com/diegosouzapw/OmniRoute.git
synced 2026-04-28 06:19:46 +00:00
2379 lines
118 KiB
Markdown
2379 lines
118 KiB
Markdown
# 🚀 OmniRoute — The Free AI Gateway (বাংলা)
|
||
|
||
🌐 **Languages:** 🇺🇸 [English](../../../README.md) · 🇸🇦 [ar](../ar/README.md) · 🇧🇬 [bg](../bg/README.md) · 🇧🇩 [bn](../bn/README.md) · 🇨🇿 [cs](../cs/README.md) · 🇩🇰 [da](../da/README.md) · 🇩🇪 [de](../de/README.md) · 🇪🇸 [es](../es/README.md) · 🇮🇷 [fa](../fa/README.md) · 🇫🇮 [fi](../fi/README.md) · 🇫🇷 [fr](../fr/README.md) · 🇮🇳 [gu](../gu/README.md) · 🇮🇱 [he](../he/README.md) · 🇮🇳 [hi](../hi/README.md) · 🇭🇺 [hu](../hu/README.md) · 🇮🇩 [id](../id/README.md) · 🇮🇹 [it](../it/README.md) · 🇯🇵 [ja](../ja/README.md) · 🇰🇷 [ko](../ko/README.md) · 🇮🇳 [mr](../mr/README.md) · 🇲🇾 [ms](../ms/README.md) · 🇳🇱 [nl](../nl/README.md) · 🇳🇴 [no](../no/README.md) · 🇵🇭 [phi](../phi/README.md) · 🇵🇱 [pl](../pl/README.md) · 🇵🇹 [pt](../pt/README.md) · 🇧🇷 [pt-BR](../pt-BR/README.md) · 🇷🇴 [ro](../ro/README.md) · 🇷🇺 [ru](../ru/README.md) · 🇸🇰 [sk](../sk/README.md) · 🇸🇪 [sv](../sv/README.md) · 🇰🇪 [sw](../sw/README.md) · 🇮🇳 [ta](../ta/README.md) · 🇮🇳 [te](../te/README.md) · 🇹🇭 [th](../th/README.md) · 🇹🇷 [tr](../tr/README.md) · 🇺🇦 [uk-UA](../uk-UA/README.md) · 🇵🇰 [ur](../ur/README.md) · 🇻🇳 [vi](../vi/README.md) · 🇨🇳 [zh-CN](../zh-CN/README.md)
|
||
|
||
---
|
||
|
||
### Never stop coding. Smart routing to **FREE & low-cost AI models** with automatic fallback.
|
||
|
||
_Your universal API proxy — one endpoint, 100+ providers, zero downtime. Now with **MCP Server (25 tools)**, **A2A Protocol**, **Memory/Skills Systems** & **Electron Desktop App**._
|
||
|
||
**Chat Completions • Embeddings • Image Generation • Video • Music • Audio • Reranking • **Web Search** • MCP Server • A2A Protocol • 100% TypeScript**
|
||
|
||
---
|
||
|
||
<div align="center">
|
||
|
||
[](https://www.npmjs.com/package/omniroute)
|
||
[](https://hub.docker.com/r/diegosouzapw/omniroute)
|
||
|
||

|
||

|
||
|
||

|
||

|
||

|
||
|
||
[](https://github.com/diegosouzapw/OmniRoute/stargazers)
|
||
[](https://github.com/diegosouzapw/OmniRoute/issues)
|
||
[](https://github.com/diegosouzapw/OmniRoute/blob/main/LICENSE)
|
||
[](https://github.com/diegosouzapw/OmniRoute/commits/main)
|
||
[](https://github.com/diegosouzapw)
|
||
[](https://github.com/diegosouzapw/OmniRoute)
|
||
[](https://github.com/diegosouzapw/OmniRoute/pulls?q=is%3Apr+is%3Aclosed)
|
||
[](https://github.com/diegosouzapw/OmniRoute/tags)
|
||
[](https://github.com/diegosouzapw)
|
||
[](https://github.com/diegosouzapw?tab=followers)
|
||
[](https://github.com/diegosouzapw/OmniRoute/network/members)
|
||
[](https://github.com/diegosouzapw/OmniRoute/watchers)
|
||
|
||
[](https://github.com/diegosouzapw/OmniRoute/blob/main/LICENSE)
|
||
[](https://omniroute.online)
|
||
[](https://chat.whatsapp.com/JI7cDQ1GyaiDHhVBpLxf8b?mode=gi_t)
|
||
|
||
[🌐 Website](https://omniroute.online) • [🚀 Quick Start](#-quick-start) • [💡 Features](#-key-features) • [📖 Docs](#-documentation) • [💰 Pricing](#-pricing-at-a-glance) • [💬 WhatsApp](https://chat.whatsapp.com/JI7cDQ1GyaiDHhVBpLxf8b?mode=gi_t)
|
||
|
||
</div>
|
||
|
||
🌐 **Available in:** 🇺🇸 [English](README.md) | 🇧🇷 [Português (Brasil)](docs/i18n/pt-BR/README.md) | 🇪🇸 [Español](docs/i18n/es/README.md) | 🇫🇷 [Français](docs/i18n/fr/README.md) | 🇮🇹 [Italiano](docs/i18n/it/README.md) | 🇷🇺 [Русский](docs/i18n/ru/README.md) | 🇨🇳 [中文 (简体)](docs/i18n/zh-CN/README.md) | 🇩🇪 [Deutsch](docs/i18n/de/README.md) | 🇮🇳 [हिन्दी](docs/i18n/in/README.md) | 🇹🇭 [ไทย](docs/i18n/th/README.md) | 🇺🇦 [Українська](docs/i18n/uk-UA/README.md) | 🇸🇦 [العربية](docs/i18n/ar/README.md) | 🇯🇵 [日本語](docs/i18n/ja/README.md) | 🇻🇳 [Tiếng Việt](docs/i18n/vi/README.md) | 🇧🇬 [Български](docs/i18n/bg/README.md) | 🇩🇰 [Dansk](docs/i18n/da/README.md) | 🇫🇮 [Suomi](docs/i18n/fi/README.md) | 🇮🇱 [עברית](docs/i18n/he/README.md) | 🇭🇺 [Magyar](docs/i18n/hu/README.md) | 🇮🇩 [Bahasa Indonesia](docs/i18n/id/README.md) | 🇰🇷 [한국어](docs/i18n/ko/README.md) | 🇲🇾 [Bahasa Melayu](docs/i18n/ms/README.md) | 🇳🇱 [Nederlands](docs/i18n/nl/README.md) | 🇳🇴 [Norsk](docs/i18n/no/README.md) | 🇵🇹 [Português (Portugal)](docs/i18n/pt/README.md) | 🇷🇴 [Română](docs/i18n/ro/README.md) | 🇵🇱 [Polski](docs/i18n/pl/README.md) | 🇸🇰 [Slovenčina](docs/i18n/sk/README.md) | 🇸🇪 [Svenska](docs/i18n/sv/README.md) | 🇵🇭 [Filipino](docs/i18n/phi/README.md) | 🇨🇿 [Čeština](docs/i18n/cs/README.md)
|
||
|
||
---
|
||
|
||
## 🖼️ Main Dashboard
|
||
|
||
<div align="center">
|
||
<img src="./docs/screenshots/MainOmniRoute.png" alt="OmniRoute Dashboard" width="800"/>
|
||
</div>
|
||
|
||
---
|
||
|
||
## 📸 Dashboard Preview
|
||
|
||
<details>
|
||
<summary><b>Click to see dashboard screenshots</b></summary>
|
||
|
||
| Page | Screenshot |
|
||
| -------------- | ------------------------------------------------- |
|
||
| **Providers** |  |
|
||
| **Combos** |  |
|
||
| **Analytics** |  |
|
||
| **Health** |  |
|
||
| **Translator** |  |
|
||
| **Settings** |  |
|
||
| **CLI Tools** |  |
|
||
| **Usage Logs** |  |
|
||
| **Endpoints** |  |
|
||
|
||
</details>
|
||
|
||
---
|
||
|
||
### 🤖 Free AI Provider for your favorite coding agents
|
||
|
||
_Connect any AI-powered IDE or CLI tool through OmniRoute — free API gateway for unlimited coding._
|
||
|
||
<table>
|
||
<tr>
|
||
<td align="center" width="110">
|
||
<a href="https://github.com/openclaw/openclaw">
|
||
<img src="./public/providers/openclaw.png" alt="OpenClaw" width="48"/><br/>
|
||
<b>OpenClaw</b>
|
||
</a><br/>
|
||
<sub>⭐ 205K</sub>
|
||
</td>
|
||
<td align="center" width="110">
|
||
<a href="https://github.com/HKUDS/nanobot">
|
||
<img src="./public/providers/nanobot.png" alt="NanoBot" width="48"/><br/>
|
||
<b>NanoBot</b>
|
||
</a><br/>
|
||
<sub>⭐ 20.9K</sub>
|
||
</td>
|
||
<td align="center" width="110">
|
||
<a href="https://github.com/sipeed/picoclaw">
|
||
<img src="./public/providers/picoclaw.jpg" alt="PicoClaw" width="48"/><br/>
|
||
<b>PicoClaw</b>
|
||
</a><br/>
|
||
<sub>⭐ 14.6K</sub>
|
||
</td>
|
||
<td align="center" width="110">
|
||
<a href="https://github.com/zeroclaw-labs/zeroclaw">
|
||
<img src="./public/providers/zeroclaw.png" alt="ZeroClaw" width="48"/><br/>
|
||
<b>ZeroClaw</b>
|
||
</a><br/>
|
||
<sub>⭐ 9.9K</sub>
|
||
</td>
|
||
<td align="center" width="110">
|
||
<a href="https://github.com/nearai/ironclaw">
|
||
<img src="./public/providers/ironclaw.png" alt="IronClaw" width="48"/><br/>
|
||
<b>IronClaw</b>
|
||
</a><br/>
|
||
<sub>⭐ 2.1K</sub>
|
||
</td>
|
||
</tr>
|
||
<tr>
|
||
<td align="center" width="110">
|
||
<a href="https://github.com/anomalyco/opencode">
|
||
<img src="./public/providers/opencode.svg" alt="OpenCode" width="48"/><br/>
|
||
<b>OpenCode</b>
|
||
</a><br/>
|
||
<sub>⭐ 106K</sub>
|
||
</td>
|
||
<td align="center" width="110">
|
||
<a href="https://github.com/openai/codex">
|
||
<img src="./public/providers/codex.png" alt="Codex CLI" width="48"/><br/>
|
||
<b>Codex CLI</b>
|
||
</a><br/>
|
||
<sub>⭐ 60.8K</sub>
|
||
</td>
|
||
<td align="center" width="110">
|
||
<a href="https://github.com/anthropics/claude-code">
|
||
<img src="./public/providers/claude.png" alt="Claude Code" width="48"/><br/>
|
||
<b>Claude Code</b>
|
||
</a><br/>
|
||
<sub>⭐ 67.3K</sub>
|
||
</td>
|
||
<td align="center" width="110">
|
||
<a href="https://github.com/google-gemini/gemini-cli">
|
||
<img src="./public/providers/gemini-cli.png" alt="Gemini CLI" width="48"/><br/>
|
||
<b>Gemini CLI</b>
|
||
</a><br/>
|
||
<sub>⭐ 94.7K</sub>
|
||
</td>
|
||
<td align="center" width="110">
|
||
<a href="https://github.com/Kilo-Org/kilocode">
|
||
<img src="./public/providers/kilocode.png" alt="Kilo Code" width="48"/><br/>
|
||
<b>Kilo Code</b>
|
||
</a><br/>
|
||
<sub>⭐ 15.5K</sub>
|
||
</td>
|
||
</tr>
|
||
</table>
|
||
|
||
<sub>📡 All agents connect via <code>http://localhost:20128/v1</code> or <code>http://cloud.omniroute.online/v1</code> — one config, unlimited models and quota</sub>
|
||
|
||
---
|
||
|
||
## 🤔 Why OmniRoute?
|
||
|
||
**Stop wasting money and hitting limits:**
|
||
|
||
- <img src="https://img.shields.io/badge/✗-e74c3c?style=flat-square" height="16"/> Subscription quota expires unused every month
|
||
- <img src="https://img.shields.io/badge/✗-e74c3c?style=flat-square" height="16"/> Rate limits stop you mid-coding
|
||
- <img src="https://img.shields.io/badge/✗-e74c3c?style=flat-square" height="16"/> Expensive APIs ($20-50/month per provider)
|
||
- <img src="https://img.shields.io/badge/✗-e74c3c?style=flat-square" height="16"/> Manual switching between providers
|
||
|
||
**OmniRoute solves this:**
|
||
|
||
- ✅ **Maximize subscriptions** - Track quota, use every bit before reset
|
||
- ✅ **Auto fallback** - Subscription → API Key → Cheap → Free, zero downtime
|
||
- ✅ **Multi-account** - Round-robin between accounts per provider
|
||
- ✅ **Universal** - Works with Claude Code, Codex, Gemini CLI, Cursor, Cline, OpenClaw, any CLI tool
|
||
|
||
---
|
||
|
||
## 📧 Support
|
||
|
||
> 💬 **Join our community!** [WhatsApp Group](https://chat.whatsapp.com/JI7cDQ1GyaiDHhVBpLxf8b?mode=gi_t) — Get help, share tips, and stay updated.
|
||
|
||
- **Website**: [omniroute.online](https://omniroute.online)
|
||
- **GitHub**: [github.com/diegosouzapw/OmniRoute](https://github.com/diegosouzapw/OmniRoute)
|
||
- **Issues**: [github.com/diegosouzapw/OmniRoute/issues](https://github.com/diegosouzapw/OmniRoute/issues)
|
||
- **WhatsApp**: [Community Group](https://chat.whatsapp.com/JI7cDQ1GyaiDHhVBpLxf8b?mode=gi_t)
|
||
- **Contributing**: See [CONTRIBUTING.md](CONTRIBUTING.md), open a PR, or pick a `good first issue`
|
||
- **Original Project**: [9router by decolua](https://github.com/decolua/9router)
|
||
|
||
### 🐛 Reporting a Bug?
|
||
|
||
When opening an issue, please run the system-info command and attach the generated file:
|
||
|
||
```bash
|
||
npm run system-info
|
||
```
|
||
|
||
This generates a `system-info.txt` with your Node.js version, OmniRoute version, OS details, installed CLI tools (qoder, gemini, claude, codex, antigravity, droid, etc.), Docker/PM2 status, and system packages — everything we need to reproduce your issue quickly. Attach the file directly to your GitHub issue.
|
||
|
||
---
|
||
|
||
## 🔄 How It Works
|
||
|
||
```
|
||
┌─────────────┐
|
||
│ Your CLI │ (Claude Code, Codex, Gemini CLI, OpenClaw, Cursor, Cline...)
|
||
│ Tool │
|
||
└──────┬──────┘
|
||
│ http://localhost:20128/v1
|
||
↓
|
||
┌─────────────────────────────────────────┐
|
||
│ OmniRoute (Smart Router) │
|
||
│ • Format translation (OpenAI ↔ Claude) │
|
||
│ • Quota tracking + Embeddings + Images │
|
||
│ • Auto token refresh │
|
||
└──────┬──────────────────────────────────┘
|
||
│
|
||
├─→ [Tier 1: SUBSCRIPTION] Claude Code, Codex, Gemini CLI
|
||
│ ↓ quota exhausted
|
||
├─→ [Tier 2: API KEY] DeepSeek, Groq, xAI, Mistral, NVIDIA NIM, etc.
|
||
│ ↓ budget limit
|
||
├─→ [Tier 3: CHEAP] GLM ($0.6/1M), MiniMax ($0.2/1M)
|
||
│ ↓ budget limit
|
||
└─→ [Tier 4: FREE] Qoder, Qwen, Kiro (unlimited)
|
||
|
||
Result: Never stop coding, minimal cost
|
||
```
|
||
|
||
---
|
||
|
||
## 🎯 What OmniRoute Solves — 30 Real Pain Points & Use Cases
|
||
|
||
> **Every developer using AI tools faces these problems daily.** OmniRoute was built to solve them all — from cost overruns to regional blocks, from broken OAuth flows to protocol operations and enterprise observability.
|
||
|
||
<details>
|
||
<summary><b>💸 1. "I pay for an expensive subscription but still get interrupted by limits"</b></summary>
|
||
|
||
Developers pay $20–200/month for Claude Pro, Codex Pro, or GitHub Copilot. Even paying, quota has a ceiling — 5h of usage, weekly limits, or per-minute rate limits. Mid-coding session, the provider stops responding and the developer loses flow and productivity.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- **Smart 4-Tier Fallback** — If subscription quota runs out, automatically redirects to API Key → Cheap → Free with zero manual intervention
|
||
- **Provider Limits Tracking** — Cached quota snapshots refresh on a server-side schedule (default `PROVIDER_LIMITS_SYNC_INTERVAL_MINUTES=70`) with manual refresh available in the UI
|
||
- **Multi-Account Support** — Multiple accounts per provider with auto round-robin — when one runs out, switches to the next
|
||
- **Custom Combos** — Customizable fallback chains with 13 balancing strategies (priority, weighted, fill-first, round-robin, P2C, random, least-used, cost-optimized, strict-random, auto, lkgp, context-optimized, **context-relay**)
|
||
- **Structured Combo Builder** — Build combos step-by-step with explicit provider + model + account selection, including repeated providers and fixed-account targets
|
||
- **Quota-Aware P2C** — Power-of-two account selection now factors quota headroom, backoff, recent errors, and consecutive use
|
||
- **Codex Business Quotas** — Business/Team workspace quota monitoring directly in the dashboard
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🔌 2. "I need to use multiple providers but each has a different API"</b></summary>
|
||
|
||
OpenAI uses one format, Claude (Anthropic) uses another, Gemini yet another. If a dev wants to test models from different providers or fallback between them, they need to reconfigure SDKs, change endpoints, deal with incompatible formats. Custom providers (FriendLI, NIM) have non-standard model endpoints.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- **Unified Endpoint** — A single `http://localhost:20128/v1` serves as proxy for all 100+ providers
|
||
- **Format Translation** — Automatic and transparent: OpenAI ↔ Claude ↔ Gemini ↔ Responses API
|
||
- **Response Sanitization** — Strips non-standard fields (`x_groq`, `usage_breakdown`, `service_tier`) that break OpenAI SDK v1.83+
|
||
- **Role Normalization** — Converts `developer` → `system` for non-OpenAI providers; `system` → `user` for GLM/ERNIE
|
||
- **Think Tag Extraction** — Extracts `<think>` blocks from models like DeepSeek R1 into standardized `reasoning_content`
|
||
- **Structured Output for Gemini** — `json_schema` → `responseMimeType`/`responseSchema` automatic conversion
|
||
- **`stream` defaults to `false`** — Aligns with OpenAI spec, avoiding unexpected SSE in Python/Rust/Go SDKs
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🌐 3. "My AI provider blocks my region/country"</b></summary>
|
||
|
||
Providers like OpenAI/Codex block access from certain geographic regions. Users get errors like `unsupported_country_region_territory` during OAuth and API connections. This is especially frustrating for developers from developing countries.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- **3-Level Proxy Config** — Configurable proxy at 3 levels: global (all traffic), per-provider (one provider only), and per-connection/key
|
||
- **Color-Coded Proxy Badges** — Visual indicators: 🟢 global proxy, 🟡 provider proxy, 🔵 connection proxy, always showing the IP
|
||
- **OAuth Token Exchange Through Proxy** — OAuth flow also goes through the proxy, solving `unsupported_country_region_territory`
|
||
- **Connection Tests via Proxy** — Connection tests use the configured proxy (no more direct bypass)
|
||
- **SOCKS5 Support** — Full SOCKS5 proxy support for outbound routing
|
||
- **TLS Fingerprint Spoofing** — Browser-like TLS fingerprint via `wreq-js` to bypass bot detection
|
||
- **🔏 CLI Fingerprint Matching** — Reorders headers and body fields to match native CLI binary signatures, drastically reducing account flagging risk. The proxy IP is preserved — you get both stealth **and** IP masking simultaneously
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🆓 4. "I want to use AI for coding but I have no money"</b></summary>
|
||
|
||
Not everyone can pay $20–200/month for AI subscriptions. Students, devs from emerging countries, hobbyists, and freelancers need access to quality models at zero cost.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- **Free Tier Providers Built-in** — Native support for 100% free providers: Qoder (5 unlimited models via OAuth: kimi-k2-thinking, qwen3-coder-plus, deepseek-r1, minimax-m2, kimi-k2), Qwen (4 unlimited models: qwen3-coder-plus, qwen3-coder-flash, qwen3-coder-next, vision-model), Kiro (Claude + AWS Builder ID for free), Gemini CLI (180K tokens/month free)
|
||
- **Ollama Cloud** — Cloud-hosted Ollama models at `api.ollama.com` with free "Light usage" tier; use `ollamacloud/<model>` prefix
|
||
- **Free-Only Combos** — Chain `gc/gemini-3-flash → if/kimi-k2-thinking → qw/qwen3-coder-plus` = $0/month with zero downtime
|
||
- **NVIDIA NIM Free Access** — ~40 RPM dev-forever free access to 70+ models at build.nvidia.com (transitioning from credits to pure rate limits)
|
||
- **Cost Optimized Strategy** — Routing strategy that automatically chooses the cheapest available provider
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🔒 5. "I need to protect my AI gateway from unauthorized access"</b></summary>
|
||
|
||
When exposing an AI gateway to the network (LAN, VPS, Docker), anyone with the address can consume the developer's tokens/quota. Without protection, APIs are vulnerable to misuse, prompt injection, and abuse.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- **API Key Management** — Generation, rotation, and scoping per provider with a dedicated `/dashboard/api-manager` page
|
||
- **Model-Level Permissions** — Restrict API keys to specific models (`openai/*`, wildcard patterns), with Allow All/Restrict toggle
|
||
- **API Endpoint Protection** — Require a key for `/v1/models` and block specific providers from the listing
|
||
- **Auth Guard + CSRF Protection** — All dashboard routes protected with `withAuth` middleware + CSRF tokens
|
||
- **Rate Limiter** — Per-IP rate limiting with configurable windows
|
||
- **IP Filtering** — Allowlist/blocklist for access control
|
||
- **Prompt Injection Guard** — Sanitization against malicious prompt patterns
|
||
- **AES-256-GCM Encryption** — Credentials encrypted at rest
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🛑 6. "My provider went down and I lost my coding flow"</b></summary>
|
||
|
||
AI providers can become unstable, return 5xx errors, or hit temporary rate limits. If a dev depends on a single provider, they're interrupted. Without circuit breakers, repeated retries can crash the application.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- **Request Queue & Pacing** — Per-connection request buckets smooth bursts before they hit upstream rate caps
|
||
- **Connection Cooldown** — A single connection cools down after retryable failures with optional upstream `Retry-After` hints and exponential backoff
|
||
- **Provider Circuit Breaker** — The provider only trips after fallback is exhausted and the provider request still fails with provider-wide transient errors; connection-scoped `429` rate limits stay in Connection Cooldown
|
||
- **Wait For Cooldown** — The server can wait for the earliest connection cooldown to expire and retry the same client request automatically
|
||
- **Anti-Thundering Herd** — Mutex + semaphore protection against concurrent retry storms
|
||
- **Combo Fallback Chains** — If the primary provider fails, automatically falls through the chain with no intervention
|
||
- **Health Dashboard** — Uptime monitoring, provider circuit breaker states, cooldowns, cache stats, p50/p95/p99 latency
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🔧 7. "Configuring each AI tool is tedious and repetitive"</b></summary>
|
||
|
||
Developers use Cursor, Claude Code, Codex CLI, OpenClaw, Gemini CLI, Kilo Code... Each tool needs a different config (API endpoint, key, model). Reconfiguring when switching providers or models is a waste of time.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- **CLI Tools Dashboard** — Dedicated page with one-click setup for Claude Code, Codex CLI, OpenClaw, Kilo Code, Antigravity, Cline
|
||
- **GitHub Copilot Config Generator** — Generates `chatLanguageModels.json` for VS Code with bulk model selection
|
||
- **Onboarding Wizard** — Guided 4-step setup for first-time users
|
||
- **One endpoint, all models** — Configure `http://localhost:20128/v1` once, access 100+ providers
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🔑 8. "Managing OAuth tokens from multiple providers is hell"</b></summary>
|
||
|
||
Claude Code, Codex, Gemini CLI, Copilot — all use OAuth 2.0 with expiring tokens. Developers need to re-authenticate constantly, deal with `client_secret is missing`, `redirect_uri_mismatch`, and failures on remote servers. OAuth on LAN/VPS is particularly problematic.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- **Auto Token Refresh** — OAuth tokens refresh in background before expiration
|
||
- **OAuth 2.0 (PKCE) Built-in** — Automatic flow for Claude Code, Codex, Gemini CLI, Copilot, Kiro, Qwen, Qoder
|
||
- **Multi-Account OAuth** — Multiple accounts per provider via JWT/ID token extraction
|
||
- **OAuth LAN/Remote Fix** — Private IP detection for `redirect_uri` + manual URL mode for remote servers
|
||
- **OAuth Behind Nginx** — Uses `window.location.origin` for reverse proxy compatibility
|
||
- **Remote OAuth Guide** — Step-by-step guide for Google Cloud credentials on VPS/Docker
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>📊 9. "I don't know how much I'm spending or where"</b></summary>
|
||
|
||
Developers use multiple paid providers but have no unified view of spending. Each provider has its own billing dashboard, but there's no consolidated view. Unexpected costs can pile up.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- **Cost Analytics Dashboard** — Per-token cost tracking and budget management per provider
|
||
- **Budget Limits per Tier** — Spending ceiling per tier that triggers automatic fallback
|
||
- **Per-Model Pricing Configuration** — Configurable prices per model
|
||
- **Usage Statistics Per API Key** — Request count and last-used timestamp per key
|
||
- **Analytics Dashboard** — Stat cards, model usage chart, provider table with success rates and latency
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🐛 10. "I can't diagnose errors and problems in AI calls"</b></summary>
|
||
|
||
When a call fails, the dev doesn't know if it was a rate limit, expired token, wrong format, or provider error. Fragmented logs across different terminals. Without observability, debugging is trial-and-error.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- **Unified Logs Dashboard** — 4 tabs: Request Logs, Proxy Logs, Audit Logs, Console
|
||
- **Console Log Viewer** — Real-time terminal-style viewer with color-coded levels, auto-scroll, search, filter
|
||
- **SQLite Summary Logs** — Request and proxy log indexes stay queryable across restarts without loading large payload blobs into SQLite
|
||
- **Translator Playground** — 4 debugging modes: Playground (format translation), Chat Tester (round-trip), Test Bench (batch), Live Monitor (real-time)
|
||
- **Request Telemetry** — p50/p95/p99 latency + X-Request-Id tracing
|
||
- **File-Based Detail Artifacts** — App logs rotate by size, retention days, and archive count; detailed request/response payloads live in `DATA_DIR/call_logs/` and rotate independently of SQLite summaries
|
||
- **System Info Report** — `npm run system-info` generates `system-info.txt` with your full environment (Node version, OmniRoute version, OS, CLI tools, Docker/PM2 status). Attach it when reporting issues for instant triage.
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🏗️ 11. "Deploying and maintaining the gateway is complex"</b></summary>
|
||
|
||
Installing, configuring, and maintaining an AI proxy across different environments (local, VPS, Docker, cloud) is labor-intensive. Problems like hardcoded paths, `EACCES` on directories, port conflicts, and cross-platform builds add friction.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- **npm global install** — `npm install -g omniroute && omniroute` — done
|
||
- **Docker Multi-Platform** — AMD64 + ARM64 native (Apple Silicon, AWS Graviton, Raspberry Pi)
|
||
- **Docker Compose Profiles** — `base` (no CLI tools) and `cli` (with Claude Code, Codex, OpenClaw)
|
||
- **Electron Desktop App** — Native app for Windows/macOS/Linux with system tray, auto-start, offline mode
|
||
- **Split-Port Mode** — API and Dashboard on separate ports for advanced scenarios (reverse proxy, container networking)
|
||
- **Cloud Sync** — Config synchronization across devices via Cloudflare Workers
|
||
- **DB Backups** — Automatic backup, restore, export and import of all settings, with `DISABLE_SQLITE_AUTO_BACKUP` for externally managed backups
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🌍 12. "The interface is English-only and my team doesn't speak English"</b></summary>
|
||
|
||
Teams in non-English-speaking countries, especially in Latin America, Asia, and Europe, struggle with English-only interfaces. Language barriers reduce adoption and increase configuration errors.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- **Dashboard i18n — 30 Languages** — All 500+ keys translated including Arabic, Bulgarian, Danish, German, Spanish, Finnish, French, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Malay, Dutch, Norwegian, Polish, Portuguese (PT/BR), Romanian, Russian, Slovak, Swedish, Thai, Ukrainian, Vietnamese, Chinese, Filipino, English
|
||
- **RTL Support** — Right-to-left support for Arabic and Hebrew
|
||
- **Multi-Language READMEs** — 30 complete documentation translations
|
||
- **Language Selector** — Globe icon in header for real-time switching
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🔄 13. "I need more than chat — I need embeddings, images, audio"</b></summary>
|
||
|
||
AI isn't just chat completion. Devs need to generate images, transcribe audio, create embeddings for RAG, rerank documents, and moderate content. Each API has a different endpoint and format.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- **Embeddings** — `/v1/embeddings` with 6 providers and 9+ models
|
||
- **Image Generation** — `/v1/images/generations` with 10 providers and 20+ models (OpenAI, xAI, Together, Fireworks, Nebius, Hyperbolic, NanoBanana, Antigravity, SD WebUI, ComfyUI)
|
||
- **Text-to-Video** — `/v1/videos/generations` — ComfyUI (AnimateDiff, SVD) and SD WebUI
|
||
- **Text-to-Music** — `/v1/music/generations` — ComfyUI (Stable Audio Open, MusicGen)
|
||
- **Audio Transcription** — `/v1/audio/transcriptions` — Whisper + Nvidia NIM, HuggingFace, Qwen3
|
||
- **Text-to-Speech** — `/v1/audio/speech` — ElevenLabs, Nvidia NIM, HuggingFace, Coqui, Tortoise, Qwen3, **Inworld**, **Cartesia**, **PlayHT**, + existing providers
|
||
- **Moderations** — `/v1/moderations` — Content safety checks
|
||
- **Reranking** — `/v1/rerank` — Document relevance reranking
|
||
- **Responses API** — Full `/v1/responses` support for Codex
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🧪 14. "I have no way to test and compare quality across models"</b></summary>
|
||
|
||
Developers want to know which model is best for their use case — code, translation, reasoning — but comparing manually is slow. No integrated eval tools exist.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- **LLM Evaluations** — Golden set testing with 10 pre-loaded cases covering greetings, math, geography, code generation, JSON compliance, translation, markdown, safety refusal
|
||
- **4 Match Strategies** — `exact`, `contains`, `regex`, `custom` (JS function)
|
||
- **Translator Playground Test Bench** — Batch testing with multiple inputs and expected outputs, cross-provider comparison
|
||
- **Chat Tester** — Full round-trip with visual response rendering
|
||
- **Live Monitor** — Real-time stream of all requests flowing through the proxy
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>📈 15. "I need to scale without losing performance"</b></summary>
|
||
|
||
As request volume grows, without caching the same questions generate duplicate costs. Without idempotency, duplicate requests waste processing. Per-provider rate limits must be respected.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- **Semantic Cache** — Two-tier cache (signature + semantic) reduces cost and latency
|
||
- **Request Idempotency** — 5s deduplication window for identical requests
|
||
- **Rate Limit Detection** — Per-provider RPM, min gap, and max concurrent tracking
|
||
- **Request Queue & Pacing** — Configurable queue, pacing, and concurrency defaults in Settings → Resilience
|
||
- **API Key Validation Cache** — 3-tier cache for production performance
|
||
- **Health Dashboard with Telemetry** — p50/p95/p99 latency, cache stats, uptime
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🤖 16. "I want to control model behavior globally"</b></summary>
|
||
|
||
Developers who want all responses in a specific language, with a specific tone, or want to limit reasoning tokens. Configuring this in every tool/request is impractical.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- **System Prompt Injection** — Global prompt applied to all requests
|
||
- **Thinking Budget Validation** — Reasoning token allocation control per request (passthrough, auto, custom, adaptive)
|
||
- **9 Routing Strategies** — Global strategies that determine how requests are distributed
|
||
- **Wildcard Router** — `provider/*` patterns route dynamically to any provider
|
||
- **Combo Enable/Disable Toggle** — Toggle combos directly from the dashboard
|
||
- **Manual Combo Ordering** — Drag combo cards by handle and persist the order in SQLite
|
||
- **Provider Toggle** — Enable/disable all connections for a provider with one click
|
||
- **Blocked Providers** — Exclude specific providers from `/v1/models` listing
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🧰 17. "I need MCP tools as first-class product capabilities"</b></summary>
|
||
|
||
Many AI gateways expose MCP only as a hidden implementation detail. Teams need a visible, manageable operation layer.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- MCP appears in the dashboard navigation and endpoint protocol tab
|
||
- Dedicated MCP management page with process, tools, scopes, and audit
|
||
- Built-in quick-start for `omniroute --mcp` and client onboarding
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🧠 18. "I need A2A orchestration with sync + stream task paths"</b></summary>
|
||
|
||
Agent workflows need both direct replies and long-running streamed execution with lifecycle control.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- A2A JSON-RPC endpoint (`POST /a2a`) with `message/send` and `message/stream`
|
||
- SSE streaming with terminal state propagation
|
||
- Task lifecycle APIs for `tasks/get` and `tasks/cancel`
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🛰️ 19. "I need real MCP process health, not guessed status"</b></summary>
|
||
|
||
Operational teams need to know if MCP is actually alive, not just whether an API is reachable.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- Runtime heartbeat file with PID, timestamps, transport, tool count, and scope mode
|
||
- MCP status API combining heartbeat + recent activity
|
||
- UI status cards for process/uptime/heartbeat freshness
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>📋 20. "I need auditable MCP tool execution"</b></summary>
|
||
|
||
When tools mutate config or trigger ops actions, teams need forensic traceability.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- SQLite-backed audit logging for MCP tool calls
|
||
- Filters by tool, success/failure, API key, and pagination
|
||
- Dashboard audit table + stats endpoints for automation
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🔐 21. "I need scoped MCP permissions per integration"</b></summary>
|
||
|
||
Different clients should have least-privilege access to tool categories.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- 10 granular MCP scopes for controlled tool access
|
||
- Scope enforcement and visibility in MCP management UI
|
||
- Safe default posture for operational tooling
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>⚙️ 22. "I need operational controls without redeploying"</b></summary>
|
||
|
||
Teams need quick runtime changes during incidents or cost events.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- Switch combo activation directly from MCP dashboard
|
||
- Tune queue, cooldown, breaker, and wait settings from the dedicated Resilience page
|
||
- Review live provider breaker state from the Health dashboard
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🔄 23. "I need live A2A task lifecycle visibility and cancellation"</b></summary>
|
||
|
||
Without lifecycle visibility, task incidents become hard to triage.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- Task listing/filtering by state/skill with pagination
|
||
- Drill-down on task metadata, events, and artifacts
|
||
- Task cancellation endpoint and UI action with confirmation
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🌊 24. "I need active stream metrics for A2A load"</b></summary>
|
||
|
||
Streaming workflows require operational insight into concurrency and live connections.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- Active stream counters integrated into A2A status
|
||
- Last task timestamp and per-state counts
|
||
- A2A dashboard cards for real-time ops monitoring
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🪪 25. "I need standard agent discovery for clients"</b></summary>
|
||
|
||
External clients and orchestrators need machine-readable metadata for onboarding.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- Agent Card exposed at `/.well-known/agent.json`
|
||
- Capabilities and skills shown in management UI
|
||
- A2A status API includes discovery metadata for automation
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🧭 26. "I need protocol discoverability in the product UX"</b></summary>
|
||
|
||
If users cannot discover protocol surfaces, adoption and support quality drop.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- Consolidated **Endpoints** page with tabs for Proxy, MCP, A2A, and API Endpoints
|
||
- Inline service status toggles (Online/Offline) for MCP and A2A
|
||
- Links from overview to dedicated management tabs
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🧪 27. "I need end-to-end protocol validation with real clients"</b></summary>
|
||
|
||
Mock tests are not enough to validate protocol compatibility before release.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- E2E suite that boots app and uses real MCP SDK client transport
|
||
- A2A client tests for discovery, send, stream, get, and cancel flows
|
||
- Cross-check assertions against MCP audit and A2A tasks APIs
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>📡 28. "I need unified observability across all interfaces"</b></summary>
|
||
|
||
Splitting observability by protocol creates blind spots and longer MTTR.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- Unified dashboards/logs/analytics in one product
|
||
- Health + audit + request telemetry across OpenAI, MCP, and A2A layers
|
||
- Operational APIs for status and automation
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>💼 29. "I need one runtime for proxy + tools + agent orchestration"</b></summary>
|
||
|
||
Running many separate services increases operational cost and failure modes.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- OpenAI-compatible proxy, MCP server, and A2A server in one stack
|
||
- Shared auth, resilience, data store, and observability
|
||
- Consistent policy model across all interaction surfaces
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🚀 30. "I need to ship agentic workflows without glue-code sprawl"</b></summary>
|
||
|
||
Teams lose velocity when stitching multiple ad-hoc services and scripts.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- Unified endpoint strategy for clients and agents
|
||
- Built-in protocol management UIs and smoke validation paths
|
||
- Production-ready foundations (security, logging, resilience, backup)
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>📚 31. "My long sessions crash with 'context_length_exceeded' limits"</b></summary>
|
||
|
||
During deep debugging, long histories with tool results quickly exceed provider token windows, causing failed requests and orphaned context.
|
||
|
||
**How OmniRoute solves it:**
|
||
|
||
- **Proactive Context Compression** — Evaluates token budgets before the request hits upstream and proactively prunes old conversation history with a smart binary-search mechanism.
|
||
- **Structural Integrity Guards** — Automatically tracks explicit `tool_use` definitions and ensures that if a tool input is truncated, its corresponding `tool_result` is also safely removed, preventing API validation errors.
|
||
- **Multi-Layer Dropping** — Progressively drops system messages, regular messages, and finally enforces strict length limits without breaking conversational logic.
|
||
|
||
</details>
|
||
|
||
### Example Playbooks (Integrated Use Cases)
|
||
|
||
**Playbook A: Maximize paid subscription + cheap backup**
|
||
|
||
```txt
|
||
Combo: "maximize-claude"
|
||
1. cc/claude-opus-4-7
|
||
2. glm/glm-4.7
|
||
3. if/kimi-k2-thinking
|
||
|
||
Monthly cost: $20 + small backup spend
|
||
Outcome: higher quality, near-zero interruption
|
||
```
|
||
|
||
**Playbook B: Zero-cost coding stack**
|
||
|
||
```txt
|
||
Combo: "free-forever"
|
||
1. gc/gemini-3-flash
|
||
2. if/kimi-k2-thinking
|
||
3. qw/qwen3-coder-plus
|
||
|
||
Monthly cost: $0
|
||
Outcome: stable free coding workflow
|
||
```
|
||
|
||
**Playbook C: 24/7 always-on fallback chain**
|
||
|
||
```txt
|
||
Combo: "always-on"
|
||
1. cc/claude-opus-4-7
|
||
2. cx/gpt-5.2-codex
|
||
3. glm/glm-4.7
|
||
4. minimax/MiniMax-M2.1
|
||
5. if/kimi-k2-thinking
|
||
|
||
Outcome: deep fallback depth for deadline-critical workloads
|
||
```
|
||
|
||
**Playbook D: Agent ops with MCP + A2A**
|
||
|
||
```txt
|
||
1) Start MCP transport (`omniroute --mcp`) for tool-driven operations
|
||
2) Run A2A tasks via `message/send` and `message/stream`
|
||
3) Observe via /dashboard/endpoint (MCP and A2A tabs)
|
||
4) Toggle services via inline status controls
|
||
```
|
||
|
||
---
|
||
|
||
## 🆓 Start Free — Zero Configuration Cost
|
||
|
||
> Setup AI coding in minutes at **$0/month**. Connect these free accounts and use the built-in **Free Stack** combo.
|
||
|
||
| Step | Action | Providers Unlocked |
|
||
| ---- | -------------------------------------------------- | ------------------------------------------------------------------ |
|
||
| 1 | Connect **Kiro** (AWS Builder ID OAuth) | Claude Sonnet 4.5, Haiku 4.5 — **unlimited** |
|
||
| 2 | Connect **Qoder** (Google OAuth) | kimi-k2-thinking, qwen3-coder-plus, deepseek-r1... — **unlimited** |
|
||
| 3 | Connect **Qwen** (Device Code) | qwen3-coder-plus, qwen3-coder-flash... — **unlimited** |
|
||
| 4 | Connect **Gemini CLI** (Google OAuth) | gemini-3-flash, gemini-2.5-pro — **180K/mo free** |
|
||
| 5 | `/dashboard/combos` → **Free Stack ($0)** template | Round-robin all free providers automatically |
|
||
|
||
**Point any IDE/CLI to:** `http://localhost:20128/v1` · API Key: `any-string` · Done.
|
||
|
||
> **Optional extra coverage (also free):** Groq API key (30 RPM free), NVIDIA NIM (40 RPM free, 70+ models), Cerebras (1M tok/day), LongCat API key (50M tokens/day!), Cloudflare Workers AI (10K Neurons/day, 50+ models).
|
||
|
||
## Inicio Rápido
|
||
|
||
### 1) Install and run
|
||
|
||
```bash
|
||
npm install -g omniroute
|
||
omniroute
|
||
```
|
||
|
||
> **pnpm users:** Run `pnpm approve-builds -g` after install to enable native build scripts required by `better-sqlite3` and `@swc/core`:
|
||
>
|
||
> ```bash
|
||
> pnpm install -g omniroute
|
||
> pnpm approve-builds -g # Select all packages → approve
|
||
> omniroute
|
||
> ```
|
||
|
||
Dashboard opens at `http://localhost:20128` and API base URL is `http://localhost:20128/v1`.
|
||
|
||
#### Arch Linux (AUR)
|
||
|
||
Arch Linux users can install the [AUR package](https://aur.archlinux.org/packages/omniroute-bin), which installs OmniRoute and provides a systemd user service:
|
||
|
||
```bash
|
||
yay -S omniroute-bin
|
||
systemctl --user enable --now omniroute.service
|
||
```
|
||
|
||
| Command | Description |
|
||
| ----------------------- | ----------------------------------------------------------- |
|
||
| `omniroute` | Start server (`PORT=20128`, API and dashboard on same port) |
|
||
| `omniroute --port 3000` | Set canonical/API port to 3000 |
|
||
| `omniroute --mcp` | Start MCP server (stdio transport) |
|
||
| `omniroute --no-open` | Don't auto-open browser |
|
||
| `omniroute --help` | Show help |
|
||
|
||
Optional split-port mode:
|
||
|
||
```bash
|
||
PORT=20128 DASHBOARD_PORT=20129 omniroute
|
||
# API: http://localhost:20128/v1
|
||
# Dashboard: http://localhost:20129
|
||
```
|
||
|
||
### 2) Uninstalling
|
||
|
||
When you no longer need OmniRoute, we provide two quick scripts for a clean removal:
|
||
|
||
| Command | Action |
|
||
| ------------------------ | ----------------------------------------------------------------------------------- |
|
||
| `npm run uninstall` | Removes the system app but **keeps your DB and configurations** in `~/.omniroute`. |
|
||
| `npm run uninstall:full` | Removes the app AND permanently **erases all configurations, keys, and databases**. |
|
||
|
||
> Note: To run these commands, navigate to the OmniRoute project folder (if you cloned it) and run them. Alternatively, if globally installed, you can simply run `npm uninstall -g omniroute`.
|
||
|
||
### Long-Running Streaming Timeouts
|
||
|
||
For most deployments, you only need:
|
||
|
||
| Variable | Default | Purpose |
|
||
| ------------------------ | ----------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
|
||
| `REQUEST_TIMEOUT_MS` | `600000` | Shared baseline for upstream response-start timeout, hidden Undici timeouts, TLS fingerprint requests, and API bridge request/proxy timeouts |
|
||
| `STREAM_IDLE_TIMEOUT_MS` | inherits `REQUEST_TIMEOUT_MS` | Maximum gap between streaming chunks before OmniRoute aborts the SSE stream |
|
||
|
||
Backward compatibility is preserved: existing `FETCH_TIMEOUT_MS`, `API_BRIDGE_PROXY_TIMEOUT_MS`, and other per-layer timeout vars still work and override the shared baseline.
|
||
|
||
For Claude Code-compatible upstreams (`anthropic-compatible-cc-*`), OmniRoute also derives the outbound `X-Stainless-Timeout` header from the resolved fetch timeout so provider-side read timeouts stay aligned with your env configuration.
|
||
|
||
For third-party Claude Code-compatible reverse proxies, OmniRoute keeps the default
|
||
`anthropic-beta` set conservative and, when `Client Cache Control` is left on `Auto`,
|
||
only forwards client-provided `cache_control` markers. If the request does not include
|
||
`cache_control`, OmniRoute does not inject bridge-owned markers.
|
||
|
||
Advanced overrides are available if you need finer control:
|
||
|
||
| Variable | Default | Purpose |
|
||
| ---------------------------------------- | ------------------------------------------ | -------------------------------------------------------------------- |
|
||
| `FETCH_TIMEOUT_MS` | inherits `REQUEST_TIMEOUT_MS` | Upstream response-start timeout used until response headers arrive |
|
||
| `FETCH_HEADERS_TIMEOUT_MS` | inherits `FETCH_TIMEOUT_MS` | Undici time limit for receiving upstream response headers |
|
||
| `FETCH_BODY_TIMEOUT_MS` | inherits `FETCH_TIMEOUT_MS` | Undici time limit between upstream body chunks (`0` disables it) |
|
||
| `FETCH_CONNECT_TIMEOUT_MS` | `30000` | Undici TCP connect timeout |
|
||
| `FETCH_KEEPALIVE_TIMEOUT_MS` | `4000` | Undici idle keep-alive socket timeout |
|
||
| `TLS_CLIENT_TIMEOUT_MS` | inherits `FETCH_TIMEOUT_MS` | Timeout for TLS fingerprint requests made through `wreq-js` |
|
||
| `API_BRIDGE_PROXY_TIMEOUT_MS` | inherits `REQUEST_TIMEOUT_MS` or `30000` | Timeout for `/v1` proxy forwarding from API port to dashboard port |
|
||
| `API_BRIDGE_SERVER_REQUEST_TIMEOUT_MS` | `max(API_BRIDGE_PROXY_TIMEOUT_MS, 300000)` | Incoming request timeout on the API bridge server |
|
||
| `API_BRIDGE_SERVER_HEADERS_TIMEOUT_MS` | `60000` | Incoming header timeout on the API bridge server |
|
||
| `API_BRIDGE_SERVER_KEEPALIVE_TIMEOUT_MS` | `5000` | Keep-alive timeout on the API bridge server |
|
||
| `API_BRIDGE_SERVER_SOCKET_TIMEOUT_MS` | `0` | Socket inactivity timeout on the API bridge server (`0` disables it) |
|
||
|
||
For streaming requests, `FETCH_TIMEOUT_MS` only covers connection setup / waiting for the first upstream response. Once the stream is active, OmniRoute will only abort on an actual stall (`STREAM_IDLE_TIMEOUT_MS`) or Undici body inactivity (`FETCH_BODY_TIMEOUT_MS`).
|
||
|
||
If you run OmniRoute behind Nginx, Caddy, Cloudflare, or another reverse proxy, make sure the proxy
|
||
timeouts are also higher than your OmniRoute stream/fetch timeouts.
|
||
|
||
### 2) Connect providers and create your API key
|
||
|
||
1. Open Dashboard → `Providers` and connect at least one provider (OAuth or API key).
|
||
2. Open Dashboard → `Endpoints` and create an API key.
|
||
3. (Optional) Open Dashboard → `Combos` and set your fallback chain.
|
||
|
||
### 3) Point your coding tool to OmniRoute
|
||
|
||
```txt
|
||
Base URL: http://localhost:20128/v1
|
||
API Key: [copy from Endpoint page]
|
||
Model: if/kimi-k2-thinking (or any provider/model prefix)
|
||
```
|
||
|
||
Works with Claude Code, Codex CLI, Gemini CLI, Cursor, Cline, OpenClaw, OpenCode, and OpenAI-compatible SDKs.
|
||
|
||
### 4) Enable and validate protocols (v2.0)
|
||
|
||
**MCP (for tool-driven operations):**
|
||
|
||
```bash
|
||
omniroute --mcp
|
||
```
|
||
|
||
Then connect your MCP client over `stdio` and test tools like:
|
||
|
||
- `omniroute_get_health`
|
||
- `omniroute_list_combos`
|
||
|
||
**A2A (for agent-to-agent workflows):**
|
||
|
||
```bash
|
||
curl http://localhost:20128/.well-known/agent.json
|
||
```
|
||
|
||
```bash
|
||
curl -X POST http://localhost:20128/a2a \
|
||
-H 'content-type: application/json' \
|
||
-d '{"jsonrpc":"2.0","id":"quickstart","method":"message/send","params":{"skill":"quota-management","messages":[{"role":"user","content":"Give me a short quota summary."}]}}'
|
||
```
|
||
|
||
### 5) Validate everything end-to-end (recommended)
|
||
|
||
```bash
|
||
npm run test:protocols:e2e
|
||
```
|
||
|
||
This suite validates real MCP and A2A client flows against a running app.
|
||
|
||
### Alternative: run from source
|
||
|
||
```bash
|
||
cp .env.example .env
|
||
npm install
|
||
PORT=20128 DASHBOARD_PORT=20129 NEXT_PUBLIC_BASE_URL=http://localhost:20129 npm run dev
|
||
```
|
||
|
||
<details>
|
||
<summary><b>Void Linux (`xbps-src` template)</b></summary>
|
||
|
||
For Void Linux users, you can build a native package using `xbps-src`. Save this block as `srcpkgs/omniroute/template`:
|
||
|
||
```bash
|
||
# Template file for 'omniroute'
|
||
pkgname=omniroute
|
||
version=3.4.1
|
||
revision=1
|
||
hostmakedepends="nodejs python3 make"
|
||
depends="openssl"
|
||
short_desc="Universal AI gateway with smart routing for multiple LLM providers"
|
||
maintainer="zenobit <zenobit@disroot.org>"
|
||
license="MIT"
|
||
homepage="https://github.com/diegosouzapw/OmniRoute"
|
||
distfiles="https://github.com/diegosouzapw/OmniRoute/archive/refs/tags/v${version}.tar.gz"
|
||
checksum=009400afee90a9f32599d8fe734145cfd84098140b7287990183dde45ae2245b
|
||
system_accounts="_omniroute"
|
||
omniroute_homedir="/var/lib/omniroute"
|
||
export NODE_ENV=production
|
||
export npm_config_engine_strict=false
|
||
export npm_config_loglevel=error
|
||
export npm_config_fund=false
|
||
export npm_config_audit=false
|
||
|
||
do_build() {
|
||
# Determine target CPU arch for node-gyp
|
||
local _gyp_arch
|
||
case "$XBPS_TARGET_MACHINE" in
|
||
aarch64*) _gyp_arch=arm64 ;;
|
||
armv7*|armv6*) _gyp_arch=arm ;;
|
||
i686*) _gyp_arch=ia32 ;;
|
||
*) _gyp_arch=x64 ;;
|
||
esac
|
||
|
||
# 1) Install all deps – skip scripts (no network in do_build, native modules
|
||
# compiled separately below; better-sqlite3 is serverExternalPackage so
|
||
# Next.js does not execute it during next build)
|
||
NODE_ENV=development npm ci --ignore-scripts
|
||
|
||
# 2) Build the Next.js standalone bundle
|
||
npm run build
|
||
|
||
# 3) Copy static assets into standalone
|
||
cp -r .next/static .next/standalone/.next/static
|
||
[ -d public ] && cp -r public .next/standalone/public || true
|
||
|
||
# 4) Compile better-sqlite3 native binding for the target architecture.
|
||
# Use node-gyp directly so CC/CXX from xbps-src cross-toolchain are used
|
||
# without npm altering them.
|
||
local _node_gyp=/usr/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js
|
||
(cd node_modules/better-sqlite3 && node "$_node_gyp" rebuild --arch="$_gyp_arch")
|
||
|
||
# 5) Place the compiled binding into the standalone bundle
|
||
local _bs3_release=.next/standalone/node_modules/better-sqlite3/build/Release
|
||
mkdir -p "$_bs3_release"
|
||
cp node_modules/better-sqlite3/build/Release/better_sqlite3.node "$_bs3_release/"
|
||
|
||
# 6) Remove arch-specific sharp bundles – upstream sets images.unoptimized=true
|
||
# so sharp is not used at runtime; x64 .so files would break aarch64 strip
|
||
rm -rf .next/standalone/node_modules/@img
|
||
|
||
# 7) Copy pino runtime deps omitted by Next.js static analysis:
|
||
# pino-abstract-transport – required by pino's worker thread
|
||
# split2 – dep of pino-abstract-transport
|
||
# process-warning – dep of pino itself
|
||
for _mod in pino-abstract-transport split2 process-warning; do
|
||
cp -r "node_modules/$_mod" .next/standalone/node_modules/
|
||
done
|
||
}
|
||
|
||
do_check() {
|
||
npm run test:unit
|
||
}
|
||
|
||
do_install() {
|
||
vmkdir usr/lib/omniroute/.next
|
||
|
||
vcopy .next/standalone/. usr/lib/omniroute/.next/standalone
|
||
|
||
# Prevent removal of empty Next.js app router dirs by the post-install hook
|
||
for _d in \
|
||
.next/standalone/.next/server/app/dashboard \
|
||
.next/standalone/.next/server/app/dashboard/settings \
|
||
.next/standalone/.next/server/app/dashboard/providers; do
|
||
touch "${DESTDIR}/usr/lib/omniroute/${_d}/.keep"
|
||
done
|
||
|
||
cat > "${WRKDIR}/omniroute" <<'EOF'
|
||
#!/bin/sh
|
||
export PORT="${PORT:-20128}"
|
||
export DATA_DIR="${DATA_DIR:-${XDG_DATA_HOME:-${HOME}/.local/share}/omniroute}"
|
||
export APP_LOG_TO_FILE="${APP_LOG_TO_FILE:-false}"
|
||
mkdir -p "${DATA_DIR}"
|
||
exec node /usr/lib/omniroute/.next/standalone/server.js "$@"
|
||
EOF
|
||
vbin "${WRKDIR}/omniroute"
|
||
}
|
||
|
||
post_install() {
|
||
vlicense LICENSE
|
||
}
|
||
```
|
||
|
||
</details>
|
||
|
||
---
|
||
|
||
## 🐳 Docker
|
||
|
||
OmniRoute is available as a public Docker image on [Docker Hub](https://hub.docker.com/r/diegosouzapw/omniroute).
|
||
|
||
**Quick run:**
|
||
|
||
```bash
|
||
docker run -d \
|
||
--name omniroute \
|
||
--restart unless-stopped \
|
||
--stop-timeout 40 \
|
||
-p 20128:20128 \
|
||
-v omniroute-data:/app/data \
|
||
diegosouzapw/omniroute:latest
|
||
```
|
||
|
||
**With environment file:**
|
||
|
||
```bash
|
||
# Copy and edit .env first
|
||
cp .env.example .env
|
||
|
||
docker run -d \
|
||
--name omniroute \
|
||
--restart unless-stopped \
|
||
--stop-timeout 40 \
|
||
--env-file .env \
|
||
-p 20128:20128 \
|
||
-v omniroute-data:/app/data \
|
||
diegosouzapw/omniroute:latest
|
||
```
|
||
|
||
**Using Docker Compose:**
|
||
|
||
```bash
|
||
# Base profile (no CLI tools)
|
||
docker compose --profile base up -d
|
||
|
||
# CLI profile (Claude Code, Codex, OpenClaw built-in)
|
||
docker compose --profile cli up -d
|
||
```
|
||
|
||
Dashboard support for Docker deployments now includes a one-click **Cloudflare Quick Tunnel** on `Dashboard → Endpoints`. The first enable downloads `cloudflared` only when needed, starts a temporary tunnel to your current `/v1` endpoint, and shows the generated `https://*.trycloudflare.com/v1` URL directly below your normal public URL.
|
||
|
||
Notes:
|
||
|
||
- Quick Tunnel URLs are temporary and change after every restart.
|
||
- Quick Tunnels are not auto-restored after an OmniRoute or container restart. Re-enable them from the dashboard when needed.
|
||
- Managed install currently supports Linux, macOS, and Windows on `x64` / `arm64`.
|
||
- Managed Quick Tunnels default to HTTP/2 transport to avoid noisy QUIC UDP buffer warnings in constrained container environments. Set `CLOUDFLARED_PROTOCOL=quic` or `auto` if you want a different transport.
|
||
- Docker images bundle system CA roots and pass them to managed `cloudflared`, which avoids TLS trust failures when the tunnel bootstraps inside the container.
|
||
- SQLite runs in WAL mode. `docker stop` should be allowed to finish so OmniRoute can checkpoint the latest changes back into `storage.sqlite`.
|
||
- The bundled Compose files already set a 40s stop grace period. If you run the image directly, keep `--stop-timeout 40` (or similar) so manual stops do not cut off shutdown cleanup.
|
||
- Set `CLOUDFLARED_BIN=/absolute/path/to/cloudflared` if you want OmniRoute to use an existing binary instead of downloading one.
|
||
|
||
**Using Docker Compose with Caddy (HTTPS Auto-TLS):**
|
||
|
||
OmniRoute can be securely exposed using Caddy's automatic SSL provisioning. Ensure your domain's DNS A record points to your server's IP.
|
||
|
||
```yaml
|
||
services:
|
||
omniroute:
|
||
image: diegosouzapw/omniroute:latest
|
||
container_name: omniroute
|
||
restart: unless-stopped
|
||
volumes:
|
||
- omniroute-data:/app/data
|
||
environment:
|
||
- PORT=20128
|
||
- NEXT_PUBLIC_BASE_URL=https://your-domain.com
|
||
|
||
caddy:
|
||
image: caddy:latest
|
||
container_name: caddy
|
||
restart: unless-stopped
|
||
ports:
|
||
- "80:80"
|
||
- "443:443"
|
||
command: caddy reverse-proxy --from https://your-domain.com --to http://omniroute:20128
|
||
|
||
volumes:
|
||
omniroute-data:
|
||
```
|
||
|
||
| Image | Tag | Size | Description |
|
||
| ------------------------ | -------- | ------ | --------------------- |
|
||
| `diegosouzapw/omniroute` | `latest` | ~250MB | Latest stable release |
|
||
| `diegosouzapw/omniroute` | `3.6.2` | ~250MB | Current version |
|
||
|
||
---
|
||
|
||
## 🖥️ Desktop App — Offline & Always-On
|
||
|
||
> 🆕 **NEW!** OmniRoute is now available as a **native desktop application** for Windows, macOS, and Linux.
|
||
|
||
Run OmniRoute as a standalone desktop app — no terminal, no browser, no internet required for local models. The Electron-based app includes:
|
||
|
||
- 🖥️ **Native Window** — Dedicated app window with system tray integration
|
||
- 🔄 **Auto-Start** — Launch OmniRoute on system login
|
||
- 🔔 **Native Notifications** — Get alerts for quota exhaustion or provider issues
|
||
- ⚡ **One-Click Install** — NSIS (Windows), DMG (macOS), AppImage (Linux)
|
||
- 🌐 **Offline Mode** — Works fully offline with bundled server
|
||
|
||
### Inicio Rápido
|
||
|
||
```bash
|
||
# Development mode
|
||
npm run electron:dev
|
||
|
||
# Build for your platform
|
||
npm run electron:build # Current platform
|
||
npm run electron:build:win # Windows (.exe)
|
||
npm run electron:build:mac # macOS (.dmg) — x64 & arm64
|
||
npm run electron:build:linux # Linux (.AppImage)
|
||
```
|
||
|
||
### System Tray
|
||
|
||
When minimized, OmniRoute lives in your system tray with quick actions:
|
||
|
||
- Open dashboard
|
||
- Change server port
|
||
- Quit application
|
||
|
||
📖 Full documentation: [`electron/README.md`](electron/README.md)
|
||
|
||
---
|
||
|
||
## 💰 Pricing at a Glance
|
||
|
||
| Tier | Provider | Cost | Quota Reset | Best For |
|
||
| ------------------- | --------------------------- | ------------------------- | ---------------- | --------------------------------- |
|
||
| **💳 SUBSCRIPTION** | Claude Code (Pro) | $20/mo | 5h + weekly | Already subscribed |
|
||
| | Codex (Plus/Pro) | $20-200/mo | 5h + weekly | OpenAI users |
|
||
| | Gemini CLI | **FREE** | 180K/mo + 1K/day | Everyone! |
|
||
| | GitHub Copilot | $10-19/mo | Monthly | GitHub users |
|
||
| **🔑 API KEY** | NVIDIA NIM | **FREE** (dev forever) | ~40 RPM | 70+ open models |
|
||
| | Cerebras | **FREE** (1M tok/day) | 60K TPM / 30 RPM | World's fastest |
|
||
| | Groq | **FREE** (30 RPM) | 14.4K RPD | Ultra-fast Llama/Gemma |
|
||
| | DeepSeek V3.2 | $0.27/$1.10 per 1M | None | Best price/quality reasoning |
|
||
| | xAI Grok-4 Fast | **$0.20/$0.50 per 1M** 🆕 | None | Fastest + tool calling, ultralow |
|
||
| | xAI Grok-4 (standard) | $0.20/$1.50 per 1M 🆕 | None | Reasoning flagship from xAI |
|
||
| | Mistral | Free trial + paid | Rate limited | European AI |
|
||
| | OpenRouter | Pay-per-use | None | 100+ models aggr. |
|
||
| **💰 CHEAP** | GLM-5 (via Z.AI) 🆕 | $0.5/1M | Daily 10AM | 128K output, newest flagship |
|
||
| | GLM-4.7 | $0.6/1M | Daily 10AM | Budget backup |
|
||
| | MiniMax M2.5 🆕 | $0.3/1M input | 5-hour rolling | Reasoning + agentic tasks |
|
||
| | MiniMax M2.1 | $0.2/1M | 5-hour rolling | Cheapest option |
|
||
| | Kimi K2.5 (Moonshot API) 🆕 | Pay-per-use | None | Direct Moonshot API access |
|
||
| | Kimi K2 | $9/mo flat | 10M tokens/mo | Predictable cost |
|
||
| **🆓 FREE** | Qoder | **$0** | Unlimited | 5 models unlimited |
|
||
| | Qwen | **$0** | Unlimited | 4 models unlimited |
|
||
| | Kiro | **$0** | Unlimited | Claude Sonnet/Haiku (AWS Builder) |
|
||
| | LongCat Flash-Lite 🆕 | **$0** (50M tok/day 🔥) | 1 RPS | Largest free quota on Earth |
|
||
| | Pollinations AI 🆕 | **$0** (no key needed) | 1 req/15s | GPT-5, Claude, DeepSeek, Llama 4 |
|
||
| | Cloudflare Workers AI 🆕 | **$0** (10K Neurons/day) | ~150 resp/day | 50+ models, global edge |
|
||
| | Scaleway AI 🆕 | **$0** (1M tokens total) | Rate limited | EU/GDPR, Qwen3 235B, Llama 70B |
|
||
|
||
> 🆕 **New models added (Mar 2026):** Grok-4 Fast family at $0.20/$0.50/M (benchmarked at 1143ms — 30% faster than Gemini 2.5 Flash), GLM-5 via Z.AI with 128K output, MiniMax M2.5 reasoning, DeepSeek V3.2 updated pricing, Kimi K2.5 via Moonshot direct API.
|
||
|
||
**💡 $0 Combo Stack — The Complete Free Setup:**
|
||
|
||
```
|
||
# 🆓 Ultimate Free Stack 2026 — 11 Providers, $0 Forever
|
||
Kiro (kr/) → Claude Sonnet/Haiku UNLIMITED
|
||
Qoder (if/) → kimi-k2-thinking, qwen3-coder-plus, deepseek-r1 UNLIMITED
|
||
LongCat Lite (lc/) → LongCat-Flash-Lite — 50M tokens/day 🔥
|
||
Pollinations (pol/) → GPT-5, Claude, DeepSeek, Llama 4 — no key needed
|
||
Qwen (qw/) → qwen3-coder-plus, qwen3-coder-flash, qwen3-coder-next UNLIMITED
|
||
Gemini (gemini/) → Gemini 2.5 Flash — 1,500 req/day free API key
|
||
Cloudflare AI (cf/) → Llama 70B, Gemma 3, Mistral — 10K Neurons/day
|
||
Scaleway (scw/) → Qwen3 235B, Llama 70B — 1M free tokens (EU)
|
||
Groq (groq/) → Llama/Gemma ultra-fast — 14.4K req/day
|
||
NVIDIA NIM (nvidia/) → 70+ open models — 40 RPM forever
|
||
Cerebras (cerebras/) → Llama/Qwen world-fastest — 1M tok/day
|
||
```
|
||
|
||
**Zero cost. Never stops coding.** Configure this as one OmniRoute combo and all fallbacks happen automatically — no manual switching ever.
|
||
|
||
---
|
||
|
||
---
|
||
|
||
## 🆓 Free Models — What You Actually Get
|
||
|
||
> All models below are **100% free with zero credit card required**. OmniRoute auto-routes between them when one quota runs out — combine them all for an unbreakable $0 combo.
|
||
|
||
### 🔵 CLAUDE MODELS (via Kiro — AWS Builder ID)
|
||
|
||
| Model | Prefix | Limit | Rate Limit |
|
||
| ------------------- | ------ | ------------- | --------------------- |
|
||
| `claude-sonnet-4.5` | `kr/` | **Unlimited** | No reported daily cap |
|
||
| `claude-haiku-4.5` | `kr/` | **Unlimited** | No reported daily cap |
|
||
| `claude-opus-4.6` | `kr/` | **Unlimited** | Latest Opus via Kiro |
|
||
|
||
### 🟢 QODER MODELS (Free PAT via qodercli)
|
||
|
||
| Model | Prefix | Limit | Rate Limit |
|
||
| ------------------ | ------ | ------------- | --------------- |
|
||
| `kimi-k2-thinking` | `if/` | **Unlimited** | No reported cap |
|
||
| `qwen3-coder-plus` | `if/` | **Unlimited** | No reported cap |
|
||
| `deepseek-r1` | `if/` | **Unlimited** | No reported cap |
|
||
| `minimax-m2.1` | `if/` | **Unlimited** | No reported cap |
|
||
| `kimi-k2` | `if/` | **Unlimited** | No reported cap |
|
||
|
||
> Recommended connection method: **Personal Access Token + `qodercli`**. Browser OAuth is
|
||
> experimental and disabled by default unless `QODER_OAUTH_*` environment variables are configured.
|
||
|
||
### 🟡 QWEN MODELS (Device Code Auth)
|
||
|
||
| Model | Prefix | Limit | Rate Limit |
|
||
| ------------------- | ------ | ------------- | ------------------- |
|
||
| `qwen3-coder-plus` | `qw/` | **Unlimited** | No reported cap |
|
||
| `qwen3-coder-flash` | `qw/` | **Unlimited** | No reported cap |
|
||
| `qwen3-coder-next` | `qw/` | **Unlimited** | No reported cap |
|
||
| `vision-model` | `qw/` | **Unlimited** | Multimodal (images) |
|
||
|
||
### 🟣 GEMINI CLI (Google OAuth)
|
||
|
||
| Model | Prefix | Limit | Rate Limit |
|
||
| ------------------------ | ------ | --------------------------- | ------------- |
|
||
| `gemini-3-flash-preview` | `gc/` | **180K tok/month** + 1K/day | Monthly reset |
|
||
| `gemini-2.5-pro` | `gc/` | 180K/month (shared pool) | High quality |
|
||
|
||
### ⚫ NVIDIA NIM (Free API Key — build.nvidia.com)
|
||
|
||
| Tier | Daily Limit | Rate Limit | Notes |
|
||
| ---------- | ------------ | ----------- | ------------------------------------------------------ |
|
||
| Free (Dev) | No token cap | **~40 RPM** | 70+ models; transitioning to pure rate limits mid-2025 |
|
||
|
||
Popular free models: `moonshotai/kimi-k2.5` (Kimi K2.5), `z-ai/glm4.7` (GLM 4.7), `deepseek-ai/deepseek-v3.2` (DeepSeek V3.2), `nvidia/llama-3.3-70b-instruct`, `deepseek/deepseek-r1`
|
||
|
||
### ⚪ CEREBRAS (Free API Key — inference.cerebras.ai)
|
||
|
||
| Tier | Daily Limit | Rate Limit | Notes |
|
||
| ---- | ----------------- | ---------------- | ------------------------------------------- |
|
||
| Free | **1M tokens/day** | 60K TPM / 30 RPM | World's fastest LLM inference; resets daily |
|
||
|
||
Available free: `llama-3.3-70b`, `llama-3.1-8b`, `deepseek-r1-distill-llama-70b`
|
||
|
||
### 🔴 GROQ (Free API Key — console.groq.com)
|
||
|
||
| Tier | Daily Limit | Rate Limit | Notes |
|
||
| ---- | ------------- | ---------------- | ----------------------------------------- |
|
||
| Free | **14.4K RPD** | 30 RPM per model | No credit card; 429 on limit, not charged |
|
||
|
||
Available free: `llama-3.3-70b-versatile`, `gemma2-9b-it`, `mixtral-8x7b`, `whisper-large-v3`
|
||
|
||
### 🔴 LONGCAT AI (Free API Key — longcat.chat) 🆕
|
||
|
||
| Model | Prefix | Daily Free Quota | Notes |
|
||
| ----------------------------- | ------ | ----------------- | ----------------------- |
|
||
| `LongCat-Flash-Lite` | `lc/` | **50M tokens** 💥 | Largest free quota ever |
|
||
| `LongCat-Flash-Chat` | `lc/` | 500K tokens | Multi-turn chat |
|
||
| `LongCat-Flash-Thinking` | `lc/` | 500K tokens | Reasoning / CoT |
|
||
| `LongCat-Flash-Thinking-2601` | `lc/` | 500K tokens | Jan 2026 version |
|
||
| `LongCat-Flash-Omni-2603` | `lc/` | 500K tokens | Multimodal |
|
||
|
||
> 100% free while in public beta. Sign up at [longcat.chat](https://longcat.chat) with email or phone. Resets daily 00:00 UTC.
|
||
|
||
### 🟢 POLLINATIONS AI (No API Key Required) 🆕
|
||
|
||
| Model | Prefix | Rate Limit | Provider Behind |
|
||
| ---------- | ------ | ---------- | ------------------ |
|
||
| `openai` | `pol/` | 1 req/15s | GPT-5 |
|
||
| `claude` | `pol/` | 1 req/15s | Anthropic Claude |
|
||
| `gemini` | `pol/` | 1 req/15s | Google Gemini |
|
||
| `deepseek` | `pol/` | 1 req/15s | DeepSeek V3 |
|
||
| `llama` | `pol/` | 1 req/15s | Meta Llama 4 Scout |
|
||
| `mistral` | `pol/` | 1 req/15s | Mistral AI |
|
||
|
||
> ✨ **Zero friction:** No signup, no API key. Add the Pollinations provider with an empty key field and it works immediately.
|
||
|
||
### 🟠 CLOUDFLARE WORKERS AI (Free API Key — cloudflare.com) 🆕
|
||
|
||
| Tier | Daily Neurons | Equivalent Usage | Notes |
|
||
| ---- | ------------- | --------------------------------------- | ----------------------- |
|
||
| Free | **10,000** | ~150 LLM resp / 500s audio / 15K embeds | Global edge, 50+ models |
|
||
|
||
Popular free models: `@cf/meta/llama-3.3-70b-instruct`, `@cf/google/gemma-3-12b-it`, `@cf/openai/whisper-large-v3-turbo` (free audio!), `@cf/qwen/qwen2.5-coder-15b-instruct`
|
||
|
||
> Requires API Token + Account ID from [dash.cloudflare.com](https://dash.cloudflare.com). Store Account ID in provider settings.
|
||
|
||
### 🟣 SCALEWAY AI (1M Free Tokens — scaleway.com) 🆕
|
||
|
||
| Tier | Free Quota | Location | Notes |
|
||
| ---- | ------------- | ------------ | ----------------------------------- |
|
||
| Free | **1M tokens** | 🇫🇷 Paris, EU | No credit card needed within limits |
|
||
|
||
Available free: `qwen3-235b-a22b-instruct-2507` (Qwen3 235B!), `llama-3.1-70b-instruct`, `mistral-small-3.2-24b-instruct-2506`, `deepseek-v3-0324`
|
||
|
||
> EU/GDPR compliant. Get API key at [console.scaleway.com](https://console.scaleway.com).
|
||
|
||
> **💡 The Ultimate Free Stack (11 Providers, $0 Forever):**
|
||
>
|
||
> ```
|
||
> Kiro (kr/) → Claude Sonnet/Haiku UNLIMITED
|
||
> Qoder (if/) → kimi-k2-thinking, qwen3-coder-plus, deepseek-r1 UNLIMITED
|
||
> LongCat Lite (lc/) → LongCat-Flash-Lite — 50M tokens/day 🔥
|
||
> Pollinations (pol/) → GPT-5, Claude, DeepSeek, Llama 4 — no key needed
|
||
> Qwen (qw/) → qwen3-coder models UNLIMITED
|
||
> Gemini (gemini/) → Gemini 2.5 Flash — 1,500 req/day free
|
||
> Cloudflare AI (cf/) → 50+ models — 10K Neurons/day
|
||
> Scaleway (scw/) → Qwen3 235B, Llama 70B — 1M free tokens (EU)
|
||
> Groq (groq/) → Llama/Gemma — 14.4K req/day ultra-fast
|
||
> NVIDIA NIM (nvidia/) → 70+ open models — 40 RPM forever
|
||
> Cerebras (cerebras/) → Llama/Qwen world-fastest — 1M tok/day
|
||
> ```
|
||
|
||
## 🎙️ Free Transcription Combo
|
||
|
||
> Transcribe any audio/video for **$0** — Deepgram leads with $200 free, AssemblyAI $50 fallback, Groq Whisper as unlimited emergency backup.
|
||
|
||
| Provider | Free Credits | Best Model | Rate Limit |
|
||
| ----------------- | ---------------------- | -------------------------------------------- | ---------------------------- |
|
||
| 🟢 **Deepgram** | **$200 free** (signup) | `nova-3` — best accuracy, 30+ languages | No RPM limit on free credits |
|
||
| 🔵 **AssemblyAI** | **$50 free** (signup) | `universal-3-pro` — chapters, sentiment, PII | No RPM limit on free credits |
|
||
| 🔴 **Groq** | **Free forever** | `whisper-large-v3` — OpenAI Whisper | 30 RPM (rate limited) |
|
||
|
||
**Suggested combo in `/dashboard/combos`:**
|
||
|
||
```
|
||
Name: free-transcription
|
||
Strategy: Priority
|
||
Nodes:
|
||
[1] deepgram/nova-3 → uses $200 free first
|
||
[2] assemblyai/universal-3-pro → fallback when Deepgram credits run out
|
||
[3] groq/whisper-large-v3 → free forever, emergency fallback
|
||
```
|
||
|
||
Then in `/dashboard/media` → **Transcription** tab: upload any audio or video file → select your combo endpoint → get transcription in supported formats.
|
||
|
||
## 💡 Key Features
|
||
|
||
OmniRoute v3.6 is built as an operational platform, not just a relay proxy.
|
||
|
||
### 🆕 New — v3.6.x Highlights (Apr 2026)
|
||
|
||
| Feature | What It Does |
|
||
| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||
| 🌐 **V1 WebSocket Bridge** | OpenAI-compatible WebSocket traffic upgraded and proxied via `/v1/ws` — full streaming over WS with session auth (API key or session cookie) |
|
||
| 🔑 **Sync Tokens & Config Bundle** | Issue/revoke sync tokens for config sync endpoints. Config bundles versioned with ETag for bandwidth-efficient polling |
|
||
| 🧠 **GLM Thinking (glmt) Preset** | GLM Thinking registered first-class: 65 536 max tokens, 24 576 thinking budget, 900s timeout, usage sync & pricing — Claude-compatible API |
|
||
| 🔢 **Hybrid Token Counting** | Uses provider-side `/messages/count_tokens` when available; falls back to estimation — accurate usage tracking without guessing |
|
||
| 🌱 **Model Alias Auto-Seed** | 30+ cross-proxy dialect aliases normalised at startup — no more routing mismatches |
|
||
| 🛡️ **Safe Outbound Fetch** | All provider validation and model discovery go through a guarded fetch layer blocking private/local URLs with retry, timeout, and SSRF protection |
|
||
| ⏳ **Wait For Cooldown** | Server-side chat retries when every candidate connection is cooling down; configurable `enabled`, `maxRetries`, and `maxRetryWaitSec` |
|
||
| 🔍 **Runtime Env Validation** | Startup validates all env vars with Zod schemas — clear errors for missing secrets, invalid URLs, or wrong types |
|
||
| 📋 **Compliance Audit Expansion** | Structured audit logs with pagination, request context, auth events, provider CRUD events, and SSRF-blocked validation logging |
|
||
| 🔐 **TPS Log Metric** | Log details modal shows Tokens Per Second (TPS) — quick performance at-a-glance for every request |
|
||
| 🗑️ **Uninstall / Full Uninstall** | `npm run uninstall` keeps data, `npm run uninstall:full` removes everything — clean removal for all install methods |
|
||
| 🔧 **OAuth Env Repair** | One-click "Repair env" action for OAuth providers restores missing env vars and fixes broken auth state |
|
||
| 🔒 **Graceful Electron Shutdown** | Electron `before-quit` shuts down Next.js gracefully, preventing SQLite WAL database locks on desktop close |
|
||
| 👁️ **Model Visibility Toggle** | Per-model visibility toggle (👁 icon) with search filter and active-count badge (`N/M active`) on provider pages |
|
||
| 📧 **Email Privacy Masking** | OAuth account emails masked (`di*****@g****.com`), full address visible on hover |
|
||
| 🔗 **Context Relay Strategy** | Combo strategy preserving session continuity via structured handoff summaries when accounts rotate mid-conversation |
|
||
| 🛡️ **Proxy Hardening** | Token health check, API key validation, and undici dispatcher all honor proxy config |
|
||
| ⚠️ **Node.js 24 Login Warning** | Login page proactively detects incompatible Node.js versions and shows a clear warning banner |
|
||
| 📎 **Gemini PDF Attachments** | PDF attachments correctly routed to Gemini via `inline_data` and generic base64 detection |
|
||
| 🔒 **CodeQL Security Hardening** | Resolved SSRF, insecure randomness, polynomial ReDoS, and incomplete URL sanitization alerts |
|
||
|
||
### 🆕 New — ClawRouter-Inspired Improvements (Mar 2026)
|
||
|
||
| Feature | What It Does |
|
||
| ------------------------------------ | ------------------------------------------------------------------------------------------- |
|
||
| ⚡ **Grok-4 Fast Family** | xAI models at $0.20/$0.50/M — benchmarked 1143ms (30% faster than Gemini 2.5 Flash) |
|
||
| 🧠 **GLM-5 via Z.AI** | 128K output context, $0.5/1M — newest flagship from the GLM family |
|
||
| 🔮 **MiniMax M2.5** | Reasoning + agentic tasks at $0.30/1M — significant upgrade from M2.1 |
|
||
| 🎯 **toolCalling Flag per Model** | Per-model `toolCalling: true/false` in registry — AutoCombo skips non-tool-capable models |
|
||
| 🌍 **Multilingual Intent Detection** | PT/ZH/ES/AR keywords in AutoCombo scoring — better model selection for non-English content |
|
||
| 📊 **Benchmark-Driven Fallbacks** | Real p95 latency from live requests feeds combo scoring — AutoCombo learns from actual data |
|
||
| 🔁 **Request Deduplication** | Content-hash based dedup window — multi-agent safe, prevents duplicate charges |
|
||
| 🔌 **Pluggable RouterStrategy** | Extensible `RouterStrategy` interface — add custom routing logic as plugins |
|
||
|
||
### 🚀 Previous v2.0.9+ — Playground, CLI Fingerprints & ACP
|
||
|
||
| Feature | What It Does |
|
||
| ------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||
| 🎮 **Model Playground** | Dashboard page to test any model directly — provider/model/endpoint selectors, Monaco Editor, streaming, abort, timing |
|
||
| 🔏 **CLI Fingerprint Matching** | Per-provider header/body ordering to match native CLI signatures — toggle per provider in Settings > Security. **Your proxy IP is preserved** |
|
||
| 🤝 **ACP Support (Agent Client Protocol)** | CLI agent discovery (Codex, Claude, Goose, Gemini CLI, OpenClaw + 9 more), process spawner, `/api/acp/agents` endpoint |
|
||
| 🤖 **ACP Agents Dashboard** | Debug › Agents page — grid of 14 agents with install status, version, custom agent form for any CLI tool. **OpenCode** users get a "Download opencode.json" button that auto-generates a ready-to-use config with all available models. |
|
||
| 🔧 **Custom Model `apiFormat` Routing** | Custom models with `apiFormat: "responses"` now correctly route to the Responses API translator |
|
||
| 🏢 **Codex Workspace Isolation** | Multiple Codex workspaces per email — OAuth correctly separates connections by workspace ID |
|
||
| 🔄 **Electron Auto-Update** | Desktop app checks for updates + auto-install on restart |
|
||
|
||
### 🤖 Agent & Protocol Operations (v2.0)
|
||
|
||
| Feature | What It Does |
|
||
| ------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------- |
|
||
| 🔧 **MCP Server (25 tools)** | IDE/agent tools via 3 transports: stdio, SSE (`/api/mcp/sse`), Streamable HTTP (`/api/mcp/stream`). 18 core + 3 memory + 4 skill tools |
|
||
| 🤝 **A2A Server (JSON-RPC + SSE)** | Agent-to-agent task execution with sync and streaming flows |
|
||
| 🧭 **Consolidated Endpoints Page** | Tabbed management page with Endpoint Proxy, MCP, A2A, and API Endpoints tabs |
|
||
| 🎚️ **Service Enable/Disable Toggles** | ON/OFF switches for MCP and A2A with settings persistence (default: OFF) |
|
||
| 🛰️ **MCP Runtime Heartbeat** | Real process status (pid, uptime, heartbeat age, transport, scope mode) |
|
||
| 📋 **MCP Audit Trail** | Filterable audit logs with success/failure and key attribution |
|
||
| 🔐 **MCP Scope Enforcement** | 10 granular scope permissions for controlled tool access |
|
||
| 📡 **A2A Task Lifecycle Management** | List/filter tasks, inspect events/artifacts, cancel running tasks |
|
||
| 📋 **Agent Card Discovery** | `/.well-known/agent.json` for client auto-discovery |
|
||
| 🧪 **Protocol E2E Test Harness** | Real MCP SDK + A2A client flows in `test:protocols:e2e` |
|
||
| ⚙️ **Operational Controls** | Switch combos, tune resilience settings, and review breaker state from dedicated Health and Settings surfaces |
|
||
|
||
### 🧠 Routing & Intelligence
|
||
|
||
| Feature | What It Does |
|
||
| ---------------------------------- | ------------------------------------------------------------------------ |
|
||
| 🎯 **Smart 4-Tier Fallback** | Auto-route: Subscription → API Key → Cheap → Free |
|
||
| 📊 **Real-Time Quota Tracking** | Live token count + reset countdown per provider |
|
||
| 🔄 **Format Translation** | OpenAI ↔ Claude ↔ Gemini ↔ Responses with schema-safe conversions |
|
||
| 👥 **Multi-Account Support** | Multiple accounts per provider with intelligent selection |
|
||
| 🔄 **Auto Token Refresh** | OAuth tokens refresh automatically with retry |
|
||
| 🎨 **Custom Combos** | 13 balancing strategies + fallback chain control |
|
||
| 🔗 **Context Relay** | Session continuity handoffs when account rotation happens mid-session |
|
||
| 🌐 **Wildcard Router** | `provider/*` dynamic routing |
|
||
| 🧠 **Thinking Budget Controls** | Passthrough, auto, custom, and adaptive reasoning limits |
|
||
| 🔀 **Model Aliases** | Built-in + custom model aliasing and migration safety |
|
||
| ⚡ **Background Degradation** | Route low-priority background tasks to cheaper models |
|
||
| 🧪 **Task-Aware Smart Routing** | Auto-select model by content type (coding/vision/analysis/summarization) |
|
||
| 🔄 **A2A Agent Workflows** | Deterministic FSM orchestrator for stateful multi-step agent executions |
|
||
| 🔀 **Adaptive Routing** | Dynamic strategy override based on token volume and prompt complexity |
|
||
| 🎲 **Provider Diversity** | Shannon entropy scoring balancing auto-combo traffic distribution |
|
||
| 💬 **System Prompt Injection** | Global behavior controls applied consistently |
|
||
| 📄 **Responses API Compatibility** | Full `/v1/responses` support for Codex and advanced agentic workflows |
|
||
|
||
### 🎵 Multi-Modal APIs
|
||
|
||
| Feature | What It Does |
|
||
| -------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||
| 🖼️ **Image Generation** | `/v1/images/generations` with cloud and local backends |
|
||
| 📐 **Embeddings** | `/v1/embeddings` for search and RAG pipelines |
|
||
| 🎤 **Audio Transcription** | `/v1/audio/transcriptions` — 7 providers (Deepgram Nova 3, AssemblyAI, Groq Whisper, HuggingFace, ElevenLabs, OpenAI, Azure), auto-language detection, MP4/MP3/WAV support |
|
||
| 🔊 **Text-to-Speech** | `/v1/audio/speech` — 10 providers (ElevenLabs, OpenAI, Deepgram, Cartesia, PlayHT, HuggingFace, Nvidia NIM, Inworld, Coqui, Tortoise) with correct error messages |
|
||
| 🎬 **Video Generation** | `/v1/videos/generations` (ComfyUI + SD WebUI workflows) |
|
||
| 🎵 **Music Generation** | `/v1/music/generations` (ComfyUI workflows) |
|
||
| 🛡️ **Moderations** | `/v1/moderations` safety checks |
|
||
| 🔀 **Reranking** | `/v1/rerank` for relevance scoring |
|
||
| 🔍 **Web Search** 🆕 | `/v1/search` — 5 providers (Serper, Brave, Perplexity, Exa, Tavily), 6,500+ free/month, auto-failover, cache |
|
||
|
||
### 🛡️ Resilience, Security & Governance
|
||
|
||
| Feature | What It Does |
|
||
| ----------------------------------- | ------------------------------------------------------------------------------------------------------- |
|
||
| 🔌 **Provider Circuit Breakers** | Provider-wide trip/recover after fallback exhaustion with configurable thresholds |
|
||
| 🔒 **Daily Quota Lock** 🆕 | Detects exhaustion signals and locks routing for the specific model until midnight |
|
||
| 🎯 **Endpoint-Aware Models** | Custom models declare supported endpoints + API format |
|
||
| 🛡️ **Anti-Thundering Herd** | Mutex + semaphore protections on retry/rate events |
|
||
| 🧠 **Semantic + Signature Cache** | Cost/latency reduction with two cache layers |
|
||
| ⚡ **Request Idempotency** | Duplicate protection window |
|
||
| 🔒 **TLS Fingerprint Spoofing** | Browser-like TLS fingerprint — **reduces bot detection and account flagging** |
|
||
| 🔏 **CLI Fingerprint Matching** | Matches native CLI request signatures — **reduces ban risk while preserving proxy IP** |
|
||
| 🌐 **IP Filtering** | Allowlist/blocklist control for exposed deployments |
|
||
| 🚦 **Request Queue & Pacing** | Configurable per-connection request buckets for RPM, spacing, concurrency, and max wait |
|
||
| 📉 **Graceful Degradation** | Multi-layer capability fallbacks protecting core gateway operations |
|
||
| 📜 **Config Audit Trail** | Diff-based change tracking preventing operational drift with simple rollbacks |
|
||
| ⏳ **Provider Health Sync** | Proactive token expiration monitoring triggering alerts before authorization failures |
|
||
| ❄️ **Connection Cooldown** | Retryable 408/429/5xx failures cool down a single connection with optional upstream hints |
|
||
| 🚪 **Auto-Disable Banned Accounts** | Permanently blocked token accounts can be disabled automatically |
|
||
| 🔑 **API Key Management + Scoping** | Secure key issuance/rotation and model/provider controls |
|
||
| 👁️ **Scoped API Key Reveal** 🆕 | Opt-in recovery of API keys via `ALLOW_API_KEY_REVEAL` |
|
||
| 🛡️ **Protected `/models`** | Optional auth gating and provider hiding for model catalog |
|
||
| 🛡️ **Safe Outbound Fetch** 🆕 | Guarded fetch for provider calls — blocks private/local URLs, retries, SSRF protection |
|
||
| ⏳ **Wait For Cooldown** 🆕 | Auto-retry chat after connection cooldowns; configurable `enabled`, `maxRetries`, and `maxRetryWaitSec` |
|
||
| 🔍 **Runtime Env Validation** 🆕 | Zod-based env schema validation at startup with actionable error messages |
|
||
| 📋 **Compliance Audit v2** 🆕 | Pagination, request context, auth events, provider CRUD, and SSRF-blocked logging |
|
||
|
||
### 📊 Observability & Analytics
|
||
|
||
| Feature | What It Does |
|
||
| -------------------------------- | ----------------------------------------------------- |
|
||
| 📝 **Request + Proxy Logging** | Full request/response and proxy logging |
|
||
| 📉 **Streamed Detailed Logs** | Reconstructs SSE payload streams cleanly into the UI |
|
||
| 🏷️ **Real-Time Model Badges** 🆕 | Live model status and daily quota countdown timers |
|
||
| 📋 **Unified Logs Dashboard** | Request, proxy, audit, and console views in one page |
|
||
| 🔍 **Request Telemetry** | p50/p95/p99 latency and request tracing |
|
||
| 🏥 **Health Dashboard** | Uptime, breaker states, lockouts, cache stats |
|
||
| 💰 **Cost Tracking** | Budget controls and per-model pricing visibility |
|
||
| 📈 **Analytics Visualizations** | Model/provider usage insights and trend views |
|
||
| 🧪 **Evaluation Framework** | Golden set testing with configurable match strategies |
|
||
| 📡 **Live Diagnostics** 🆕 | Semantic cache bypass for accurate combo live testing |
|
||
| 🔐 **TPS Log Metric** 🆕 | Tokens Per Second badge in log details modal |
|
||
|
||
### ☁️ Deployment & Platform
|
||
|
||
| Feature | What It Does |
|
||
| ------------------------------ | --------------------------------------------------------------------- |
|
||
| 🌐 **Deploy Anywhere** | Localhost, VPS, Docker, Cloud environments |
|
||
| 🚇 **Cloudflare Tunnel** 🆕 | One-click Quick Tunnel integration from the dashboard |
|
||
| 🔑 **API Key Model Filtering** | Native /v1/models response filtered via assigned Bearer context roles |
|
||
| ⚡ **Smart Cache Bypass** | Configurable TTL heuristics and forced refetch controls |
|
||
| 🔄 **Backup/Restore** | Export/import and disaster recovery flows |
|
||
| 🧙 **Onboarding Wizard** | First-run guided setup |
|
||
| 🔧 **CLI Tools Dashboard** | One-click setup for popular coding tools |
|
||
| 🎮 **Model Playground** | Test any provider/model/endpoint from the dashboard |
|
||
| 🔏 **CLI Fingerprint Toggle** | Per-provider fingerprint matching in Settings > Security |
|
||
| 🌐 **i18n (30 languages)** | Full dashboard + docs language support with RTL coverage |
|
||
| 🧹 **Clear All Models** | One-click model list clearing in provider details |
|
||
| 👁️ **Sidebar Controls** 🆕 | Hide components and integrations from Appearance Settings |
|
||
| 📋 **Issue Templates** | Standardized GitHub templates for bugs and features |
|
||
| 📂 **Custom Data Directory** | `DATA_DIR` override for storage location |
|
||
| 🌐 **V1 WebSocket Bridge** 🆕 | OpenAI-compatible WebSocket traffic proxied via `/v1/ws` |
|
||
| 🔑 **Sync Tokens & Bundle** 🆕 | Config sync tokens + versioned bundle endpoint with ETag support |
|
||
|
||
### Feature Deep Dive
|
||
|
||
#### Smart fallback with practical cost control
|
||
|
||
```txt
|
||
Combo: "my-coding-stack"
|
||
1. cc/claude-opus-4-7
|
||
2. nvidia/llama-3.3-70b
|
||
3. glm/glm-4.7
|
||
4. if/kimi-k2-thinking
|
||
```
|
||
|
||
When quota, rate, or health fails, OmniRoute automatically moves to the next candidate without manual switching.
|
||
|
||
#### Protocol management that is visible and operable
|
||
|
||
- MCP + A2A are discoverable in UI and docs (not hidden)
|
||
- Protocol status APIs expose live operational data (`/api/mcp/*`, `/api/a2a/*`)
|
||
- Dashboards include actions for day-2 ops (combo toggles, breaker resets, task cancellation)
|
||
|
||
#### Translator + validation workflow
|
||
|
||
The Translator area includes:
|
||
|
||
- **Playground**: request transformation checks
|
||
- **Chat Tester**: full request/response round-trip
|
||
- **Test Bench**: multiple cases in one run
|
||
- **Live Monitor**: real-time traffic view
|
||
|
||
Plus protocol validation with real clients via `npm run test:protocols:e2e`.
|
||
|
||
> 📖 **[MCP Server README](open-sse/mcp-server/README.md)** — Tool reference, IDE configs, and client examples
|
||
>
|
||
> 📖 **[A2A Server README](src/lib/a2a/README.md)** — Skills, JSON-RPC methods, streaming, and task lifecycle
|
||
|
||
## 🧪 Evaluations (Evals)
|
||
|
||
OmniRoute includes a built-in evaluation framework to test LLM response quality against a golden set. Access it via **Analytics → Evals** in the dashboard.
|
||
|
||
### Built-in Golden Set
|
||
|
||
The pre-loaded "OmniRoute Golden Set" contains test cases for:
|
||
|
||
- Greetings, math, geography, code generation
|
||
- JSON format compliance, translation, markdown generation
|
||
- Safety refusal (harmful content), counting, boolean logic
|
||
|
||
### Evaluation Strategies
|
||
|
||
| Strategy | Description | Example |
|
||
| ---------- | ------------------------------------------------ | -------------------------------- |
|
||
| `exact` | Output must match exactly | `"4"` |
|
||
| `contains` | Output must contain substring (case-insensitive) | `"Paris"` |
|
||
| `regex` | Output must match regex pattern | `"1.*2.*3"` |
|
||
| `custom` | Custom JS function returns true/false | `(output) => output.length > 10` |
|
||
|
||
---
|
||
|
||
## 📖 Setup Guide
|
||
|
||
### Protocol Setup (MCP + A2A)
|
||
|
||
<details>
|
||
<summary><b>🧩 MCP Setup (Model Context Protocol)</b></summary>
|
||
|
||
Start MCP transport in stdio mode:
|
||
|
||
```bash
|
||
omniroute --mcp
|
||
```
|
||
|
||
Recommended validation flow:
|
||
|
||
1. Connect your MCP client over stdio.
|
||
2. Run `omniroute_get_health`.
|
||
3. Run `omniroute_list_combos`.
|
||
4. Open `/dashboard/mcp` to confirm heartbeat, activity, and audit.
|
||
|
||
Useful APIs for automation:
|
||
|
||
- `GET /api/mcp/status`
|
||
- `GET /api/mcp/tools`
|
||
- `GET /api/mcp/audit`
|
||
- `GET /api/mcp/audit/stats`
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🤝 A2A Setup (Agent2Agent)</b></summary>
|
||
|
||
Discover the agent:
|
||
|
||
```bash
|
||
curl http://localhost:20128/.well-known/agent.json
|
||
```
|
||
|
||
Send a task:
|
||
|
||
```bash
|
||
curl -X POST http://localhost:20128/a2a \
|
||
-H 'content-type: application/json' \
|
||
-d '{"jsonrpc":"2.0","id":"setup-a2a","method":"message/send","params":{"skill":"quota-management","messages":[{"role":"user","content":"Summarize quota status."}]}}'
|
||
```
|
||
|
||
Manage lifecycle:
|
||
|
||
- `GET /api/a2a/status`
|
||
- `GET /api/a2a/tasks`
|
||
- `GET /api/a2a/tasks/:id`
|
||
- `POST /api/a2a/tasks/:id/cancel`
|
||
|
||
Operational UI:
|
||
|
||
- `/dashboard/a2a` for task/state/stream observability and smoke actions
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🧪 End-to-end protocol validation</b></summary>
|
||
|
||
Validate both protocols with real clients:
|
||
|
||
```bash
|
||
npm run test:protocols:e2e
|
||
```
|
||
|
||
This verifies:
|
||
|
||
- MCP SDK client connect/list/call
|
||
- A2A discovery/send/stream/get/cancel
|
||
- Cross-check data in MCP audit and A2A task management APIs
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>💳 Subscription Providers</b></summary>
|
||
|
||
### Claude Code (Pro/Max)
|
||
|
||
```bash
|
||
Dashboard → Providers → Connect Claude Code
|
||
→ OAuth login → Auto token refresh
|
||
→ 5-hour + weekly quota tracking
|
||
|
||
Models:
|
||
cc/claude-opus-4-7
|
||
cc/claude-sonnet-4-5-20250929
|
||
cc/claude-haiku-4-5-20251001
|
||
```
|
||
|
||
**Pro Tip:** Use Opus for complex tasks, Sonnet for speed. OmniRoute tracks quota per model!
|
||
|
||
### OpenAI Codex (Plus/Pro)
|
||
|
||
```bash
|
||
Dashboard → Providers → Connect Codex
|
||
→ OAuth login (port 1455)
|
||
→ 5-hour + weekly reset
|
||
|
||
Models:
|
||
cx/gpt-5.2-codex
|
||
cx/gpt-5.1-codex-max
|
||
```
|
||
|
||
#### Codex Account Limit Management (5h + Weekly)
|
||
|
||
Each Codex account now has policy toggles in `Dashboard -> Providers`:
|
||
|
||
- `5h` (ON/OFF): enforce the 5-hour window threshold policy.
|
||
- `Weekly` (ON/OFF): enforce the weekly window threshold policy.
|
||
- Threshold behavior: when an enabled window reaches >=90% usage, that account is skipped.
|
||
- Rotation behavior: OmniRoute routes to the next eligible Codex account automatically.
|
||
- Reset behavior: when the provider `resetAt` time passes, the account becomes eligible again automatically.
|
||
|
||
Scenarios:
|
||
|
||
- `5h ON` + `Weekly ON`: account is skipped when either window reaches threshold.
|
||
- `5h OFF` + `Weekly ON`: only weekly usage can block the account.
|
||
- `5h ON` + `Weekly OFF`: only 5-hour usage can block the account.
|
||
- `resetAt` passed: account re-enters rotation automatically (no manual re-enable).
|
||
|
||
### Gemini CLI (FREE 180K/month!)
|
||
|
||
```bash
|
||
Dashboard → Providers → Connect Gemini CLI
|
||
→ Google OAuth
|
||
→ 180K completions/month + 1K/day
|
||
|
||
Models:
|
||
gc/gemini-3-flash-preview
|
||
gc/gemini-2.5-pro
|
||
```
|
||
|
||
**Best Value:** Huge free tier! Use this before paid tiers.
|
||
|
||
### GitHub Copilot
|
||
|
||
```bash
|
||
Dashboard → Providers → Connect GitHub
|
||
→ OAuth via GitHub
|
||
→ Monthly reset (1st of month)
|
||
|
||
Models:
|
||
gh/gpt-5
|
||
gh/claude-4.5-sonnet
|
||
gh/gemini-3.1-pro-preview
|
||
```
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🔑 API Key Providers</b></summary>
|
||
|
||
### NVIDIA NIM (FREE developer access — 70+ models)
|
||
|
||
1. Sign up: [build.nvidia.com](https://build.nvidia.com)
|
||
2. Get free API key (1000 inference credits included)
|
||
3. Dashboard → Add Provider → NVIDIA NIM:
|
||
- API Key: `nvapi-your-key`
|
||
|
||
**Models:** `nvidia/llama-3.3-70b-instruct`, `nvidia/mistral-7b-instruct`, and 50+ more
|
||
|
||
**Pro Tip:** OpenAI-compatible API — works seamlessly with OmniRoute's format translation!
|
||
|
||
### DeepSeek
|
||
|
||
1. Sign up: [platform.deepseek.com](https://platform.deepseek.com)
|
||
2. Get API key
|
||
3. Dashboard → Add Provider → DeepSeek
|
||
|
||
**Models:** `deepseek/deepseek-chat`, `deepseek/deepseek-coder`
|
||
|
||
### Groq (Free Tier Available!)
|
||
|
||
1. Sign up: [console.groq.com](https://console.groq.com)
|
||
2. Get API key (free tier included)
|
||
3. Dashboard → Add Provider → Groq
|
||
|
||
**Models:** `groq/llama-3.3-70b`, `groq/mixtral-8x7b`
|
||
|
||
**Pro Tip:** Ultra-fast inference — best for real-time coding!
|
||
|
||
### OpenRouter (100+ Models)
|
||
|
||
1. Sign up: [openrouter.ai](https://openrouter.ai)
|
||
2. Get API key
|
||
3. Dashboard → Add Provider → OpenRouter
|
||
|
||
**Models:** Access 100+ models from all major providers through a single API key.
|
||
|
||
**Dashboard behavior:** OpenRouter models are managed from **Available Models**. Manual add, import, and auto-sync all update the same list.
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>💰 Cheap Providers (Backup)</b></summary>
|
||
|
||
### GLM-4.7 (Daily reset, $0.6/1M)
|
||
|
||
1. Sign up: [Zhipu AI](https://open.bigmodel.cn/)
|
||
2. Get API key from Coding Plan
|
||
3. Dashboard → Add API Key:
|
||
- Provider: `glm`
|
||
- API Key: `your-key`
|
||
|
||
**Use:** `glm/glm-4.7`
|
||
|
||
**Pro Tip:** Coding Plan offers 3× quota at 1/7 cost! Reset daily 10:00 AM.
|
||
|
||
### MiniMax M2.1 (5h reset, $0.20/1M)
|
||
|
||
1. Sign up: [MiniMax](https://www.minimax.io/)
|
||
2. Get API key
|
||
3. Dashboard → Add API Key
|
||
|
||
**Use:** `minimax/MiniMax-M2.1`
|
||
|
||
**Pro Tip:** Cheapest option for long context (1M tokens)!
|
||
|
||
### Kimi K2 ($9/month flat)
|
||
|
||
1. Subscribe: [Moonshot AI](https://platform.moonshot.ai/)
|
||
2. Get API key
|
||
3. Dashboard → Add API Key
|
||
|
||
**Use:** `kimi/kimi-latest`
|
||
|
||
**Pro Tip:** Fixed $9/month for 10M tokens = $0.90/1M effective cost!
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🆓 FREE Providers (Emergency Backup)</b></summary>
|
||
|
||
### Qoder (5 FREE models via OAuth)
|
||
|
||
```bash
|
||
Dashboard → Connect Qoder
|
||
→ Qoder OAuth login
|
||
→ Unlimited usage
|
||
|
||
Models:
|
||
if/kimi-k2-thinking
|
||
if/qwen3-coder-plus
|
||
if/glm-4.7
|
||
if/minimax-m2
|
||
if/deepseek-r1
|
||
```
|
||
|
||
### Qwen (4 FREE models via Device Code)
|
||
|
||
```bash
|
||
Dashboard → Connect Qwen
|
||
→ Device code authorization
|
||
→ Unlimited usage
|
||
|
||
Models:
|
||
qw/qwen3-coder-plus
|
||
qw/qwen3-coder-flash
|
||
```
|
||
|
||
### Kiro (Claude FREE)
|
||
|
||
```bash
|
||
Dashboard → Connect Kiro
|
||
→ AWS Builder ID or Google/GitHub
|
||
→ Unlimited usage
|
||
|
||
Models:
|
||
kr/claude-sonnet-4.5
|
||
kr/claude-haiku-4.5
|
||
```
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🎨 Create Combos</b></summary>
|
||
|
||
### Example 1: Maximize Subscription → Cheap Backup
|
||
|
||
```
|
||
Dashboard → Combos → Create New
|
||
|
||
Name: premium-coding
|
||
Models:
|
||
1. cc/claude-opus-4-7 (Subscription primary)
|
||
2. glm/glm-4.7 (Cheap backup, $0.6/1M)
|
||
3. minimax/MiniMax-M2.1 (Cheapest fallback, $0.20/1M)
|
||
|
||
Use in CLI: premium-coding
|
||
```
|
||
|
||
### Example 2: Free-Only (Zero Cost)
|
||
|
||
```
|
||
Name: free-combo
|
||
Models:
|
||
1. gc/gemini-3-flash-preview (180K free/month)
|
||
2. if/kimi-k2-thinking (unlimited)
|
||
3. qw/qwen3-coder-plus (unlimited)
|
||
|
||
Cost: $0 forever!
|
||
```
|
||
|
||
</details>
|
||
|
||
<details>
|
||
<summary><b>🔧 CLI Integration</b></summary>
|
||
|
||
### Cursor IDE
|
||
|
||
```
|
||
Settings → Models → Advanced:
|
||
OpenAI API Base URL: http://localhost:20128/v1
|
||
OpenAI API Key: [from OmniRoute dashboard]
|
||
Model: cc/claude-opus-4-7
|
||
```
|
||
|
||
### Claude Code
|
||
|
||
Use the **CLI Tools** page in the dashboard for one-click configuration, or edit `~/.claude/settings.json` manually.
|
||
|
||
### Codex CLI
|
||
|
||
```bash
|
||
export OPENAI_BASE_URL="http://localhost:20128"
|
||
export OPENAI_API_KEY="your-omniroute-api-key"
|
||
|
||
codex "your prompt"
|
||
```
|
||
|
||
### OpenClaw
|
||
|
||
**Option 1 — Dashboard (recommended):**
|
||
|
||
```
|
||
Dashboard → CLI Tools → OpenClaw → Select Model → Apply
|
||
```
|
||
|
||
**Option 2 — Manual:** Edit `~/.openclaw/openclaw.json`:
|
||
|
||
```json
|
||
{
|
||
"models": {
|
||
"providers": {
|
||
"omniroute": {
|
||
"baseUrl": "http://127.0.0.1:20128/v1",
|
||
"apiKey": "sk_omniroute",
|
||
"api": "openai-completions"
|
||
}
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
> **Note:** OpenClaw only works with local OmniRoute. Use `127.0.0.1` instead of `localhost` to avoid IPv6 resolution issues.
|
||
|
||
### Cline / Continue / RooCode
|
||
|
||
```
|
||
Settings → API Configuration:
|
||
Provider: OpenAI Compatible
|
||
Base URL: http://localhost:20128/v1
|
||
API Key: [from OmniRoute dashboard]
|
||
Model: if/kimi-k2-thinking
|
||
```
|
||
|
||
### OpenCode
|
||
|
||
**Step 1:** Add OmniRoute as a custom provider:
|
||
|
||
```bash
|
||
opencode
|
||
/connect
|
||
# Select "Other" → Enter ID: "omniroute" → Enter your OmniRoute API key
|
||
```
|
||
|
||
**Step 2:** Create/edit `opencode.json` in your project root:
|
||
|
||
```json
|
||
{
|
||
"$schema": "https://opencode.ai/config.json",
|
||
"provider": {
|
||
"omniroute": {
|
||
"npm": "@ai-sdk/openai-compatible",
|
||
"name": "OmniRoute",
|
||
"options": {
|
||
"baseURL": "http://localhost:20128/v1"
|
||
},
|
||
"models": {
|
||
"cc/claude-sonnet-4-20250514": { "name": "Claude Sonnet 4" },
|
||
"gg/gemini-2.5-pro": { "name": "Gemini 2.5 Pro" },
|
||
"if/kimi-k2-thinking": { "name": "Kimi K2 (Free)" }
|
||
}
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
**Step 3:** Select the model in OpenCode:
|
||
|
||
```bash
|
||
/models
|
||
# Select any OmniRoute model from the list
|
||
```
|
||
|
||
> **Tip:** Add any model available in your OmniRoute `/v1/models` endpoint to the `models` section. Use the format `provider/model-id` from your OmniRoute dashboard.
|
||
|
||
</details>
|
||
|
||
---
|
||
|
||
## Solución de Problemas
|
||
|
||
<details>
|
||
<summary><b>Click to expand troubleshooting guide</b></summary>
|
||
|
||
**"Language model did not provide messages"**
|
||
|
||
- Provider quota exhausted → Check dashboard quota tracker
|
||
- Solution: Use combo fallback or switch to cheaper tier
|
||
|
||
**Rate limiting**
|
||
|
||
- Subscription quota out → Fallback to GLM/MiniMax
|
||
- Add combo: `cc/claude-opus-4-7 → glm/glm-4.7 → if/kimi-k2-thinking`
|
||
|
||
**OAuth token expired**
|
||
|
||
- Auto-refreshed by OmniRoute
|
||
- If issues persist: Dashboard → Provider → Reconnect
|
||
|
||
**High costs**
|
||
|
||
- Check usage stats in Dashboard → Costs
|
||
- Switch primary model to GLM/MiniMax
|
||
- Use free tier (Gemini CLI, Qoder) for non-critical tasks
|
||
|
||
**Dashboard/API ports are wrong**
|
||
|
||
- `PORT` is the canonical base port (and API port by default)
|
||
- `API_PORT` overrides only OpenAI-compatible API listener
|
||
- `DASHBOARD_PORT` overrides only dashboard/Next.js listener
|
||
- Set `NEXT_PUBLIC_BASE_URL` to your dashboard/public URL (for OAuth callbacks)
|
||
|
||
**Cloud sync errors**
|
||
|
||
- Verify `BASE_URL` points to your running instance
|
||
- Verify `CLOUD_URL` points to your expected cloud endpoint
|
||
- Keep `NEXT_PUBLIC_*` values aligned with server-side values
|
||
|
||
**First login not working**
|
||
|
||
- Check `INITIAL_PASSWORD` in `.env`
|
||
- If unset, fallback password is `123456`
|
||
|
||
**No request logs**
|
||
|
||
- `call_logs` in SQLite stores summary metadata for the Request Logs table and analytics views
|
||
- Detailed request/response payloads are written to `DATA_DIR/call_logs/` as one JSON artifact per request
|
||
- Enable pipeline capture from Dashboard → Logs → Request Logs if you need detailed per-stage payloads
|
||
- `Export Logs` reads the artifact files on demand, while `Export All` includes the `call_logs/` directory alongside `storage.sqlite`
|
||
- Set `APP_LOG_TO_FILE=true` if you also want application console logs in `logs/application/app.log`
|
||
- Adjust `APP_LOG_MAX_FILE_SIZE`, `APP_LOG_RETENTION_DAYS`, `APP_LOG_MAX_FILES`, and `CALL_LOG_MAX_ENTRIES` as needed
|
||
|
||
**Connection test shows "Invalid" for OpenAI-compatible providers**
|
||
|
||
- Many providers don't expose a `/models` endpoint
|
||
- OmniRoute v1.0.6+ includes fallback validation via chat completions
|
||
- Ensure base URL includes `/v1` suffix
|
||
|
||
### 🔐 OAuth on a Remote Server
|
||
|
||
<a name="oauth-on-a-remote-server"></a>
|
||
<a name="oauth-em-servidor-remoto"></a>
|
||
|
||
> **⚠️ Important for users running OmniRoute on a VPS, Docker, or any remote server**
|
||
|
||
#### Why does Antigravity / Gemini CLI OAuth fail on remote servers?
|
||
|
||
The **Antigravity** and **Gemini CLI** providers use **Google OAuth 2.0**. Google requires the `redirect_uri` in the OAuth flow to exactly match one of the pre-registered URIs in the app's Google Cloud Console.
|
||
|
||
The OAuth credentials bundled in OmniRoute are registered **for `localhost` only**. When you access OmniRoute on a remote server (e.g. `https://omniroute.myserver.com`), Google rejects the authentication with:
|
||
|
||
```
|
||
Error 400: redirect_uri_mismatch
|
||
```
|
||
|
||
#### Solution: Configure your own OAuth credentials
|
||
|
||
You need to create an **OAuth 2.0 Client ID** in Google Cloud Console with your server's URI.
|
||
|
||
#### Step-by-step
|
||
|
||
**1. Open Google Cloud Console**
|
||
|
||
Go to: [https://console.cloud.google.com/apis/credentials](https://console.cloud.google.com/apis/credentials)
|
||
|
||
**2. Create a new OAuth 2.0 Client ID**
|
||
|
||
- Click **"+ Create Credentials"** → **"OAuth client ID"**
|
||
- Application type: **"Web application"**
|
||
- Name: anything you like (e.g. `OmniRoute Remote`)
|
||
|
||
**3. Add Authorized Redirect URIs**
|
||
|
||
In the **"Authorized redirect URIs"** field, add:
|
||
|
||
```
|
||
https://your-server.com/callback
|
||
```
|
||
|
||
> Replace `your-server.com` with your server's domain or IP (include the port if needed, e.g. `http://45.33.32.156:20128/callback`).
|
||
|
||
**4. Save and copy the credentials**
|
||
|
||
After creating, Google will show the **Client ID** and **Client Secret**.
|
||
|
||
**5. Set environment variables**
|
||
|
||
In your `.env` (or Docker environment variables):
|
||
|
||
```bash
|
||
# For Antigravity:
|
||
ANTIGRAVITY_OAUTH_CLIENT_ID=your-client-id.apps.googleusercontent.com
|
||
ANTIGRAVITY_OAUTH_CLIENT_SECRET=GOCSPX-your-secret
|
||
|
||
# For Gemini CLI:
|
||
GEMINI_OAUTH_CLIENT_ID=your-client-id.apps.googleusercontent.com
|
||
GEMINI_OAUTH_CLIENT_SECRET=GOCSPX-your-secret
|
||
GEMINI_CLI_OAUTH_CLIENT_SECRET=GOCSPX-your-secret
|
||
```
|
||
|
||
**6. Restart OmniRoute**
|
||
|
||
```bash
|
||
# npm:
|
||
npm run dev
|
||
|
||
# Docker:
|
||
docker restart omniroute
|
||
```
|
||
|
||
**7. Try connecting again**
|
||
|
||
Dashboard → Providers → Antigravity (or Gemini CLI) → OAuth
|
||
|
||
Google will now redirect correctly to `https://your-server.com/callback`.
|
||
|
||
---
|
||
|
||
#### Temporary workaround (without custom credentials)
|
||
|
||
If you don't want to set up your own credentials right now, you can still use the **manual URL flow**:
|
||
|
||
1. OmniRoute opens the Google authorization URL
|
||
2. After authorizing, Google tries to redirect to `localhost` (which fails on the remote server)
|
||
3. **Copy the full URL** from your browser's address bar (even if the page doesn't load)
|
||
4. Paste that URL into the field shown in the OmniRoute connection modal
|
||
5. Click **"Connect"**
|
||
|
||
> This works because the authorization code in the URL is valid regardless of whether the redirect page loaded.
|
||
|
||
---
|
||
|
||
<details>
|
||
<summary><b>🇧🇷 Versão em Português</b></summary>
|
||
|
||
#### Por que o OAuth do Antigravity / Gemini CLI falha em servidores remotos?
|
||
|
||
Os provedores **Antigravity** e **Gemini CLI** usam **Google OAuth 2.0** para autenticação. O Google exige que a `redirect_uri` usada no fluxo OAuth seja **exatamente** uma das URIs pré-cadastradas no Google Cloud Console do aplicativo.
|
||
|
||
As credenciais OAuth embutidas no OmniRoute estão cadastradas **apenas para `localhost`**. Quando você acessa o OmniRoute em um servidor remoto (ex: `https://omniroute.meuservidor.com`), o Google rejeita a autenticação com:
|
||
|
||
```
|
||
Error 400: redirect_uri_mismatch
|
||
```
|
||
|
||
#### Solução: Configure suas próprias credenciais OAuth
|
||
|
||
Você precisa criar um **OAuth 2.0 Client ID** no Google Cloud Console com a URI do seu servidor.
|
||
|
||
#### Passo a passo
|
||
|
||
**1. Acesse o Google Cloud Console**
|
||
|
||
Abra: [https://console.cloud.google.com/apis/credentials](https://console.cloud.google.com/apis/credentials)
|
||
|
||
**2. Crie um novo OAuth 2.0 Client ID**
|
||
|
||
- Clique em **"+ Create Credentials"** → **"OAuth client ID"**
|
||
- Tipo de aplicativo: **"Web application"**
|
||
- Nome: escolha qualquer nome (ex: `OmniRoute Remote`)
|
||
|
||
**3. Adicione as Authorized Redirect URIs**
|
||
|
||
No campo **"Authorized redirect URIs"**, adicione:
|
||
|
||
```
|
||
https://seu-servidor.com/callback
|
||
```
|
||
|
||
> Substitua `seu-servidor.com` pelo domínio ou IP do seu servidor (inclua a porta se necessário, ex: `http://45.33.32.156:20128/callback`).
|
||
|
||
**4. Salve e copie as credenciais**
|
||
|
||
Após criar, o Google mostrará o **Client ID** e o **Client Secret**.
|
||
|
||
**5. Configure as variáveis de ambiente**
|
||
|
||
No seu `.env` (ou nas variáveis de ambiente do Docker):
|
||
|
||
```bash
|
||
# Para Antigravity:
|
||
ANTIGRAVITY_OAUTH_CLIENT_ID=seu-client-id.apps.googleusercontent.com
|
||
ANTIGRAVITY_OAUTH_CLIENT_SECRET=GOCSPX-seu-secret
|
||
|
||
# Para Gemini CLI:
|
||
GEMINI_OAUTH_CLIENT_ID=seu-client-id.apps.googleusercontent.com
|
||
GEMINI_OAUTH_CLIENT_SECRET=GOCSPX-seu-secret
|
||
GEMINI_CLI_OAUTH_CLIENT_SECRET=GOCSPX-seu-secret
|
||
```
|
||
|
||
**6. Reinicie o OmniRoute**
|
||
|
||
```bash
|
||
# Se usando npm:
|
||
npm run dev
|
||
|
||
# Se usando Docker:
|
||
docker restart omniroute
|
||
```
|
||
|
||
**7. Tente conectar novamente**
|
||
|
||
Dashboard → Providers → Antigravity (ou Gemini CLI) → OAuth
|
||
|
||
Agora o Google redirecionará corretamente para `https://seu-servidor.com/callback` e a autenticação funcionará.
|
||
|
||
---
|
||
|
||
#### Workaround temporário (sem configurar credenciais próprias)
|
||
|
||
Se não quiser criar credenciais próprias agora, ainda é possível usar o fluxo **manual de URL**:
|
||
|
||
1. O OmniRoute abrirá a URL de autorização do Google
|
||
2. Após você autorizar, o Google tentará redirecionar para `localhost` (que falha no servidor remoto)
|
||
3. **Copie a URL completa** da barra de endereço do seu browser (mesmo que a página não carregue)
|
||
4. Cole essa URL no campo que aparece no modal de conexão do OmniRoute
|
||
5. Clique em **"Connect"**
|
||
|
||
> Este workaround funciona porque o código de autorização na URL é válido independente do redirect ter carregado ou não.
|
||
|
||
</details>
|
||
|
||
---
|
||
|
||
</details>
|
||
|
||
## 🛠️ Tech Stack
|
||
|
||
<details>
|
||
<summary><b>Click to expand tech stack details</b></summary>
|
||
|
||
- **Runtime**: Node.js 18–22 LTS (⚠️ Node.js 24+ is **not supported** — `better-sqlite3` native binaries are incompatible)
|
||
- **Language**: TypeScript 5.9 — **100% TypeScript** across `src/` and `open-sse/` (zero `any` in core modules since v2.0)
|
||
- **Framework**: Next.js 16 + React 19 + Tailwind CSS 4
|
||
- **Database**: better-sqlite3 (SQLite) + LowDB (JSON legacy) — domain state, proxy logs, MCP audit, routing decisions, memory, skills
|
||
- **Schemas**: Zod (MCP tool I/O validation, API contracts)
|
||
- **Protocols**: MCP (stdio/HTTP) + A2A v0.3 (JSON-RPC 2.0 + SSE)
|
||
- **Streaming**: Server-Sent Events (SSE)
|
||
- **Auth**: OAuth 2.0 (PKCE) + JWT + API Keys + MCP Scoped Authorization
|
||
- **Testing**: Node.js test runner + Vitest (900+ tests including unit, integration, E2E)
|
||
- **CI/CD**: GitHub Actions (auto npm publish + Docker Hub on release)
|
||
- **Website**: [omniroute.online](https://omniroute.online)
|
||
- **Package**: [npmjs.com/package/omniroute](https://www.npmjs.com/package/omniroute)
|
||
- **Docker**: [hub.docker.com/r/diegosouzapw/omniroute](https://hub.docker.com/r/diegosouzapw/omniroute)
|
||
- **Resilience**: Circuit breaker, exponential backoff, anti-thundering herd, TLS spoofing, auto-combo self-healing
|
||
|
||
</details>
|
||
|
||
---
|
||
|
||
## Documentación
|
||
|
||
| Document | Description |
|
||
| -------------------------------------------------------- | --------------------------------------------------- |
|
||
| [User Guide](docs/USER_GUIDE.md) | Providers, combos, CLI integration, deployment |
|
||
| [API Reference](docs/API_REFERENCE.md) | All endpoints with examples |
|
||
| [MCP Server](open-sse/mcp-server/README.md) | 25 MCP tools, IDE configs, Python/TS/Go clients |
|
||
| [A2A Server](src/lib/a2a/README.md) | JSON-RPC 2.0 protocol, skills, streaming, task mgmt |
|
||
| [Auto-Combo Engine](docs/auto-combo.md) | 6-factor scoring, mode packs, self-healing |
|
||
| [Context Relay](docs/features/context-relay.md) | Session handoff strategy for account rotation |
|
||
| [Troubleshooting](docs/TROUBLESHOOTING.md) | Common problems and solutions |
|
||
| [Architecture](docs/ARCHITECTURE.md) | System architecture and internals |
|
||
| [Codebase Documentation](docs/CODEBASE_DOCUMENTATION.md) | Beginner-friendly codebase walkthrough |
|
||
| [Uninstall Guide](docs/UNINSTALL.md) | Clean removal for all install methods |
|
||
| [Environment Config](docs/ENVIRONMENT.md) | Complete `.env` variables and references |
|
||
| [Contributing](CONTRIBUTING.md) | Development setup and guidelines |
|
||
| [OpenAPI Spec](docs/openapi.yaml) | OpenAPI 3.0 specification |
|
||
| [Security Policy](SECURITY.md) | Vulnerability reporting and security practices |
|
||
| [VM Deployment](docs/VM_DEPLOYMENT_GUIDE.md) | Complete guide: VM + nginx + Cloudflare setup |
|
||
| [Features Gallery](docs/FEATURES.md) | Visual dashboard tour with screenshots |
|
||
| [Release Checklist](docs/RELEASE_CHECKLIST.md) | Pre-release validation steps |
|
||
|
||
---
|
||
|
||
## 🗺️ Roadmap
|
||
|
||
OmniRoute has **218+ features planned** across multiple development phases. Here are the key areas:
|
||
|
||
| Category | Planned Features | Highlights |
|
||
| ----------------------------- | ---------------- | ----------------------------------------------------------------------------------------------------- |
|
||
| 🧠 **Routing & Intelligence** | 25+ | Lowest-latency routing, tag-based routing, quota preflight, quota-aware P2C, step-based combo routing |
|
||
| 🔒 **Security & Compliance** | 20+ | SSRF hardening, credential cloaking, rate-limit per endpoint, management key scoping |
|
||
| 📊 **Observability** | 15+ | OpenTelemetry integration, real-time quota monitoring, combo target health, cost tracking per model |
|
||
| 🔄 **Provider Integrations** | 20+ | Dynamic model registry, connection cooldowns, multi-account Codex, Copilot quota parsing |
|
||
| ⚡ **Performance** | 15+ | Dual cache layer, prompt cache, response cache, streaming keepalive, batch API |
|
||
| 🌐 **Ecosystem** | 10+ | WebSocket API, config hot-reload, distributed config store, commercial mode |
|
||
|
||
### 🔜 Coming Soon
|
||
|
||
- 🔗 **OpenCode Integration** — Native provider support for the OpenCode AI coding IDE
|
||
- 🔗 **TRAE Integration** — Full support for the TRAE AI development framework
|
||
- 📦 **Batch API** — Asynchronous batch processing for bulk requests
|
||
- 🎯 **Tag-Based Routing** — Route requests based on custom tags and metadata
|
||
- 💰 **Lowest-Cost Strategy** — Automatically select the cheapest available provider
|
||
|
||
> 📝 Full feature specifications available in [`docs/new-features/`](docs/new-features/) (217 detailed specs)
|
||
|
||
---
|
||
|
||
## 👥 Contributors
|
||
|
||
[](https://github.com/diegosouzapw/OmniRoute/graphs/contributors)
|
||
|
||
### How to Contribute
|
||
|
||
1. Fork the repository
|
||
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
|
||
3. Commit your changes (`git commit -m 'Add amazing feature'`)
|
||
4. Push to the branch (`git push origin feature/amazing-feature`)
|
||
5. Open a Pull Request
|
||
|
||
See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.
|
||
|
||
### Releasing a New Version
|
||
|
||
```bash
|
||
# Create a release — npm publish happens automatically
|
||
gh release create v2.0.0 --title "v2.0.0" --generate-notes
|
||
```
|
||
|
||
---
|
||
|
||
## 📊 Star History
|
||
|
||
<a href="https://www.star-history.com/?repos=diegosouzapw%2Fomniroute&type=date&legend=top-left">
|
||
<picture>
|
||
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/chart?repos=diegosouzapw/omniroute&type=date&theme=dark&legend=top-left" />
|
||
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/chart?repos=diegosouzapw/omniroute&type=date&legend=top-left" />
|
||
<img alt="Star History Chart" src="https://api.star-history.com/chart?repos=diegosouzapw/omniroute&type=date&legend=top-left" />
|
||
</picture>
|
||
</a>
|
||
|
||
## 🌍 StarMapper
|
||
|
||
<a href="https://starmapper.bruniaux.com/diegosouzapw/omniroute">
|
||
<picture>
|
||
<source media="(prefers-color-scheme: dark)" srcset="https://starmapper.bruniaux.com/api/map-image/diegosouzapw/omniroute?theme=dark" />
|
||
<source media="(prefers-color-scheme: light)" srcset="https://starmapper.bruniaux.com/api/map-image/diegosouzapw/omniroute?theme=light" />
|
||
<img alt="StarMapper" src="https://starmapper.bruniaux.com/api/map-image/diegosouzapw/omniroute" />
|
||
</picture>
|
||
</a>
|
||
|
||
## 🙏 Acknowledgments
|
||
|
||
Special thanks to **[9router](https://github.com/decolua/9router)** by **[decolua](https://github.com/decolua)** — the original project that inspired this fork. OmniRoute builds upon that incredible foundation with additional features, multi-modal APIs, and a full TypeScript rewrite.
|
||
|
||
Special thanks to **[CLIProxyAPI](https://github.com/router-for-me/CLIProxyAPI)** — the original Go implementation that inspired this JavaScript port.
|
||
|
||
---
|
||
|
||
## Licencia
|
||
|
||
MIT License - see [LICENSE](LICENSE) for details.
|
||
|
||
---
|
||
|
||
<div align="center">
|
||
<sub>Built with ❤️ for developers who code 24/7</sub>
|
||
<br/>
|
||
<sub><a href="https://omniroute.online">omniroute.online</a></sub>
|
||
</div>
|
||
<!-- GitHub Discussions enabled for community Q&A -->
|