# 🚀 OmniRoute - The Free AI Gateway (বাংলা)

🌐 **Languages:** 🇺🇸 [English](../../../README.md) · 🇸🇦 [ar](../ar/README.md) · 🇧🇬 [bg](../bg/README.md) · 🇧🇩 [bn](../bn/README.md) · 🇨🇿 [cs](../cs/README.md) · 🇩🇰 [da](../da/README.md) · 🇩🇪 [de](../de/README.md) · 🇪🇸 [es](../es/README.md) · 🇮🇷 [fa](../fa/README.md) · 🇫🇮 [fi](../fi/README.md) · 🇫🇷 [fr](../fr/README.md) · 🇮🇳 [gu](../gu/README.md) · 🇮🇱 [he](../he/README.md) · 🇮🇳 [hi](../hi/README.md) · 🇭🇺 [hu](../hu/README.md) · 🇮🇩 [id](../id/README.md) · 🇮🇹 [it](../it/README.md) · 🇯🇵 [ja](../ja/README.md) · 🇰🇷 [ko](../ko/README.md) · 🇮🇳 [mr](../mr/README.md) · 🇲🇾 [ms](../ms/README.md) · 🇳🇱 [nl](../nl/README.md) · 🇳🇴 [no](../no/README.md) · 🇵🇭 [phi](../phi/README.md) · 🇵🇱 [pl](../pl/README.md) · 🇵🇹 [pt](../pt/README.md) · 🇧🇷 [pt-BR](../pt-BR/README.md) · 🇷🇴 [ro](../ro/README.md) · 🇷🇺 [ru](../ru/README.md) · 🇸🇰 [sk](../sk/README.md) · 🇸🇪 [sv](../sv/README.md) · 🇰🇪 [sw](../sw/README.md) · 🇮🇳 [ta](../ta/README.md) · 🇮🇳 [te](../te/README.md) · 🇹🇭 [th](../th/README.md) · 🇹🇷 [tr](../tr/README.md) · 🇺🇦 [uk-UA](../uk-UA/README.md) · 🇵🇰 [ur](../ur/README.md) · 🇻🇳 [vi](../vi/README.md) · 🇨🇳 [zh-CN](../zh-CN/README.md)

---

### Never stop coding. Smart routing to **FREE & low-cost AI models** with automatic fallback.

_Your universal API proxy: one endpoint, 100+ providers, zero downtime. Now with **MCP Server (25 tools)**, **A2A Protocol**, **Memory/Skills Systems** & **Electron Desktop App**._

**Chat Completions • Embeddings • Image Generation • Video • Music • Audio • Reranking • Web Search • MCP Server • A2A Protocol • 100% TypeScript**

---

[![npm version](https://img.shields.io/npm/v/omniroute?color=cb3837&logo=npm)](https://www.npmjs.com/package/omniroute) [![Docker Hub](https://img.shields.io/docker/v/diegosouzapw/omniroute?label=Docker%20Hub&logo=docker&color=2496ED)](https://hub.docker.com/r/diegosouzapw/omniroute) ![NPM Downloads](https://img.shields.io/npm/dw/omniroute?label=npm%20down%20week&color=red) ![NPM Downloads](https://img.shields.io/npm/dm/omniroute?label=npm%20down%20month&color=red) ![NPM Downloads](https://img.shields.io/npm/d18m/omniroute?label=npm%20down%20year&color=red) ![Docker Pulls](https://img.shields.io/docker/pulls/diegosouzapw/omniroute) ![GitHub Downloads (all assets, all releases)](https://img.shields.io/github/downloads/diegosouzapw/omniroute/total?style=flat&label=eletron%20donwloads&color=blue) [![stars](https://custom-icon-badges.demolab.com/github/stars/diegosouzapw/OmniRoute?logo=star&style=flat)](https://github.com/diegosouzapw/OmniRoute/stargazers) [![open issues](https://custom-icon-badges.demolab.com/github/issues-raw/diegosouzapw/OmniRoute?logo=issue)](https://github.com/diegosouzapw/OmniRoute/issues) [![license](https://custom-icon-badges.demolab.com/github/license/diegosouzapw/OmniRoute?logo=law)](https://github.com/diegosouzapw/OmniRoute/blob/main/LICENSE) [![last commit](https://custom-icon-badges.demolab.com/github/last-commit/diegosouzapw/OmniRoute?logo=history&logoColor=white)](https://github.com/diegosouzapw/OmniRoute/commits/main) [![total contributions](https://custom-icon-badges.demolab.com/badge/dynamic/json?logo=graph&logoColor=fff&color=blue&label=total%20contributions&query=%24.totalContributions&url=https%3A%2F%2Fstreak-stats.demolab.com%2F%3Fuser%3Ddiegosouzapw%26type%3Djson)](https://github.com/diegosouzapw) [![code size](https://custom-icon-badges.demolab.com/github/languages/code-size/diegosouzapw/OmniRoute?logo=file-code&logoColor=white)](https://github.com/diegosouzapw/OmniRoute) [![pr 
closed](https://custom-icon-badges.demolab.com/github/issues-pr-closed/diegosouzapw/OmniRoute?color=purple&logo=git-pull-request&logoColor=white)](https://github.com/diegosouzapw/OmniRoute/pulls?q=is%3Apr+is%3Aclosed) [![tag](https://custom-icon-badges.demolab.com/github/v/tag/diegosouzapw/OmniRoute?logo=tag&logoColor=white)](https://github.com/diegosouzapw/OmniRoute/tags) [![github streak](https://custom-icon-badges.demolab.com/badge/dynamic/json?logo=fire&logoColor=fff&color=orange&label=github%20streak&query=%24.currentStreak.length&suffix=%20days&url=https%3A%2F%2Fstreak-stats.demolab.com%2F%3Fuser%3Ddiegosouzapw%26type%3Djson)](https://github.com/diegosouzapw) [![followers](https://custom-icon-badges.demolab.com/github/followers/diegosouzapw?logo=person-add)](https://github.com/diegosouzapw?tab=followers) [![fork](https://custom-icon-badges.demolab.com/github/forks/diegosouzapw/OmniRoute?logo=fork)](https://github.com/diegosouzapw/OmniRoute/network/members) [![watch](https://custom-icon-badges.demolab.com/github/watchers/diegosouzapw/OmniRoute?logo=eye)](https://github.com/diegosouzapw/OmniRoute/watchers) [![License](https://img.shields.io/github/license/diegosouzapw/OmniRoute)](https://github.com/diegosouzapw/OmniRoute/blob/main/LICENSE) [![Website](https://img.shields.io/badge/Website-omniroute.online-blue?logo=google-chrome&logoColor=white)](https://omniroute.online) [![WhatsApp](https://img.shields.io/badge/WhatsApp-Community-25D366?logo=whatsapp&logoColor=white)](https://chat.whatsapp.com/JI7cDQ1GyaiDHhVBpLxf8b?mode=gi_t)

[🌐 Website](https://omniroute.online) • [🚀 Quick Start](#-quick-start) • [💡 Features](#-key-features) • [📖 Docs](#-documentation) • [💰 Pricing](#-pricing-at-a-glance) • [💬 WhatsApp](https://chat.whatsapp.com/JI7cDQ1GyaiDHhVBpLxf8b?mode=gi_t)

๐ŸŒ **Available in:** ๐Ÿ‡บ๐Ÿ‡ธ [English](README.md) | ๐Ÿ‡ง๐Ÿ‡ท [Portuguรชs (Brasil)](docs/i18n/pt-BR/README.md) | ๐Ÿ‡ช๐Ÿ‡ธ [Espaรฑol](docs/i18n/es/README.md) | ๐Ÿ‡ซ๐Ÿ‡ท [Franรงais](docs/i18n/fr/README.md) | ๐Ÿ‡ฎ๐Ÿ‡น [Italiano](docs/i18n/it/README.md) | ๐Ÿ‡ท๐Ÿ‡บ [ะ ัƒััะบะธะน](docs/i18n/ru/README.md) | ๐Ÿ‡จ๐Ÿ‡ณ [ไธญๆ–‡ (็ฎ€ไฝ“)](docs/i18n/zh-CN/README.md) | ๐Ÿ‡ฉ๐Ÿ‡ช [Deutsch](docs/i18n/de/README.md) | ๐Ÿ‡ฎ๐Ÿ‡ณ [เคนเคฟเคจเฅเคฆเฅ€](docs/i18n/in/README.md) | ๐Ÿ‡น๐Ÿ‡ญ [เน„เธ—เธข](docs/i18n/th/README.md) | ๐Ÿ‡บ๐Ÿ‡ฆ [ะฃะบั€ะฐั—ะฝััŒะบะฐ](docs/i18n/uk-UA/README.md) | ๐Ÿ‡ธ๐Ÿ‡ฆ [ุงู„ุนุฑุจูŠุฉ](docs/i18n/ar/README.md) | ๐Ÿ‡ฏ๐Ÿ‡ต [ๆ—ฅๆœฌ่ชž](docs/i18n/ja/README.md) | ๐Ÿ‡ป๐Ÿ‡ณ [Tiแบฟng Viแป‡t](docs/i18n/vi/README.md) | ๐Ÿ‡ง๐Ÿ‡ฌ [ะ‘ัŠะปะณะฐั€ัะบะธ](docs/i18n/bg/README.md) | ๐Ÿ‡ฉ๐Ÿ‡ฐ [Dansk](docs/i18n/da/README.md) | ๐Ÿ‡ซ๐Ÿ‡ฎ [Suomi](docs/i18n/fi/README.md) | ๐Ÿ‡ฎ๐Ÿ‡ฑ [ืขื‘ืจื™ืช](docs/i18n/he/README.md) | ๐Ÿ‡ญ๐Ÿ‡บ [Magyar](docs/i18n/hu/README.md) | ๐Ÿ‡ฎ๐Ÿ‡ฉ [Bahasa Indonesia](docs/i18n/id/README.md) | ๐Ÿ‡ฐ๐Ÿ‡ท [ํ•œ๊ตญ์–ด](docs/i18n/ko/README.md) | ๐Ÿ‡ฒ๐Ÿ‡พ [Bahasa Melayu](docs/i18n/ms/README.md) | ๐Ÿ‡ณ๐Ÿ‡ฑ [Nederlands](docs/i18n/nl/README.md) | ๐Ÿ‡ณ๐Ÿ‡ด [Norsk](docs/i18n/no/README.md) | ๐Ÿ‡ต๐Ÿ‡น [Portuguรชs (Portugal)](docs/i18n/pt/README.md) | ๐Ÿ‡ท๐Ÿ‡ด [Romรขnฤƒ](docs/i18n/ro/README.md) | ๐Ÿ‡ต๐Ÿ‡ฑ [Polski](docs/i18n/pl/README.md) | ๐Ÿ‡ธ๐Ÿ‡ฐ [Slovenฤina](docs/i18n/sk/README.md) | ๐Ÿ‡ธ๐Ÿ‡ช [Svenska](docs/i18n/sv/README.md) | ๐Ÿ‡ต๐Ÿ‡ญ [Filipino](docs/i18n/phi/README.md) | ๐Ÿ‡จ๐Ÿ‡ฟ [ฤŒeลกtina](docs/i18n/cs/README.md) --- ## ๐Ÿ–ผ๏ธ Main Dashboard
OmniRoute Dashboard

---

## 📸 Dashboard Preview

Click to see dashboard screenshots

| Page           | Screenshot                                        |
| -------------- | ------------------------------------------------- |
| **Providers**  | ![Providers](docs/screenshots/01-providers.png)   |
| **Combos**     | ![Combos](docs/screenshots/02-combos.png)         |
| **Analytics**  | ![Analytics](docs/screenshots/03-analytics.png)   |
| **Health**     | ![Health](docs/screenshots/04-health.png)         |
| **Translator** | ![Translator](docs/screenshots/05-translator.png) |
| **Settings**   | ![Settings](docs/screenshots/06-settings.png)     |
| **CLI Tools**  | ![CLI Tools](docs/screenshots/07-cli-tools.png)   |
| **Usage Logs** | ![Usage](docs/screenshots/08-usage.png)           |
| **Endpoints**  | ![Endpoints](docs/screenshots/09-endpoint.png)    |

---

### 🤖 Free AI Provider for your favorite coding agents

_Connect any AI-powered IDE or CLI tool through OmniRoute, a free API gateway for unlimited coding._

| Agent       | GitHub stars |
| ----------- | ------------ |
| OpenClaw    | ⭐ 205K      |
| NanoBot     | ⭐ 20.9K     |
| PicoClaw    | ⭐ 14.6K     |
| ZeroClaw    | ⭐ 9.9K      |
| IronClaw    | ⭐ 2.1K      |
| OpenCode    | ⭐ 106K      |
| Codex CLI   | ⭐ 60.8K     |
| Claude Code | ⭐ 67.3K     |
| Gemini CLI  | ⭐ 94.7K     |
| Kilo Code   | ⭐ 15.5K     |

📡 All agents connect via `http://localhost:20128/v1` or `http://cloud.omniroute.online/v1` - one config, unlimited models and quota

---

## 🤔 Why OmniRoute?

**Stop wasting money and hitting limits:**

- Subscription quota expires unused every month
- Rate limits stop you mid-coding
- Expensive APIs ($20-50/month per provider)
- Manual switching between providers

**OmniRoute solves this:**

- ✅ **Maximize subscriptions** - Track quota, use every bit before reset
- ✅ **Auto fallback** - Subscription → API Key → Cheap → Free, zero downtime
- ✅ **Multi-account** - Round-robin between accounts per provider
- ✅ **Universal** - Works with Claude Code, Codex, Gemini CLI, Cursor, Cline, OpenClaw, any CLI tool

---

## 📧 Support

> 💬 **Join our community!** [WhatsApp Group](https://chat.whatsapp.com/JI7cDQ1GyaiDHhVBpLxf8b?mode=gi_t) - Get help, share tips, and stay updated.

- **Website**: [omniroute.online](https://omniroute.online)
- **GitHub**: [github.com/diegosouzapw/OmniRoute](https://github.com/diegosouzapw/OmniRoute)
- **Issues**: [github.com/diegosouzapw/OmniRoute/issues](https://github.com/diegosouzapw/OmniRoute/issues)
- **WhatsApp**: [Community Group](https://chat.whatsapp.com/JI7cDQ1GyaiDHhVBpLxf8b?mode=gi_t)
- **Contributing**: See [CONTRIBUTING.md](CONTRIBUTING.md), open a PR, or pick a `good first issue`
- **Original Project**: [9router by decolua](https://github.com/decolua/9router)

### 🐛 Reporting a Bug?

When opening an issue, please run the system-info command and attach the generated file:

```bash
npm run system-info
```

This generates a `system-info.txt` with your Node.js version, OmniRoute version, OS details, installed CLI tools (qoder, gemini, claude, codex, antigravity, droid, etc.), Docker/PM2 status, and system packages: everything we need to reproduce your issue quickly. Attach the file directly to your GitHub issue.

---

## 🔄 How It Works

```
┌─────────────┐
│  Your CLI   │  (Claude Code, Codex, Gemini CLI, OpenClaw, Cursor, Cline...)
│   Tool      │
└──────┬──────┘
       │ http://localhost:20128/v1
       ↓
┌─────────────────────────────────────────┐
│        OmniRoute (Smart Router)         │
│  • Format translation (OpenAI ↔ Claude) │
│  • Quota tracking + Embeddings + Images │
│  • Auto token refresh                   │
└──────┬──────────────────────────────────┘
       │
       ├─→ [Tier 1: SUBSCRIPTION] Claude Code, Codex, Gemini CLI
       │   ↓ quota exhausted
       ├─→ [Tier 2: API KEY] DeepSeek, Groq, xAI, Mistral, NVIDIA NIM, etc.
       │   ↓ budget limit
       ├─→ [Tier 3: CHEAP] GLM ($0.6/1M), MiniMax ($0.2/1M)
       │   ↓ budget limit
       └─→ [Tier 4: FREE] Qoder, Qwen, Kiro (unlimited)

Result: Never stop coding, minimal cost
```

---

## 🎯 What OmniRoute Solves - 30 Real Pain Points & Use Cases

> **Every developer using AI tools faces these problems daily.** OmniRoute was built to solve them all, from cost overruns to regional blocks, from broken OAuth flows to protocol operations and enterprise observability.

**💸 1. "I pay for an expensive subscription but still get interrupted by limits"**

Developers pay $20–200/month for Claude Pro, Codex Pro, or GitHub Copilot. Even on a paid plan, quota has a ceiling: 5h of usage, weekly limits, or per-minute rate limits. Mid-session, the provider stops responding and the developer loses flow and productivity.

**How OmniRoute solves it:**

- **Smart 4-Tier Fallback** - If subscription quota runs out, automatically redirects to API Key → Cheap → Free with zero manual intervention
- **Provider Limits Tracking** - Cached quota snapshots refresh on a server-side schedule (default `PROVIDER_LIMITS_SYNC_INTERVAL_MINUTES=70`) with manual refresh available in the UI
- **Multi-Account Support** - Multiple accounts per provider with auto round-robin; when one runs out, it switches to the next
- **Custom Combos** - Customizable fallback chains with 13 balancing strategies (priority, weighted, fill-first, round-robin, P2C, random, least-used, cost-optimized, strict-random, auto, lkgp, context-optimized, **context-relay**)
- **Structured Combo Builder** - Build combos step-by-step with explicit provider + model + account selection, including repeated providers and fixed-account targets
- **Quota-Aware P2C** - Power-of-two account selection now factors in quota headroom, backoff, recent errors, and consecutive use
- **Codex Business Quotas** - Business/Team workspace quota monitoring directly in the dashboard

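
The tier order described above can be pictured as a small selection function. This is a minimal sketch, not OmniRoute's actual router: the `Candidate` shape, the `available` flag, and `pickCandidate` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical types for illustration; OmniRoute's real data model may differ.
type Tier = "subscription" | "apiKey" | "cheap" | "free";

interface Candidate {
  model: string; // e.g. "cc/claude-opus-4-7"
  tier: Tier;
  available: boolean; // false when quota or budget is exhausted
}

// Fallback order from the text: Subscription -> API Key -> Cheap -> Free.
const TIER_ORDER: Tier[] = ["subscription", "apiKey", "cheap", "free"];

// Return the first available candidate in the highest-priority tier.
function pickCandidate(candidates: Candidate[]): Candidate | undefined {
  for (const tier of TIER_ORDER) {
    for (const c of candidates) {
      if (c.tier === tier && c.available) return c;
    }
  }
  return undefined; // every tier is exhausted
}

const chosen = pickCandidate([
  { model: "cc/claude-opus-4-7", tier: "subscription", available: false },
  { model: "glm/glm-4.7", tier: "cheap", available: true },
  { model: "if/kimi-k2-thinking", tier: "free", available: true },
]);
// The subscription is exhausted, so the cheap-tier model is chosen.
```

A real router would also weigh balancing strategy and per-account state; the sketch only shows the tier ordering.
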
**🔌 2. "I need to use multiple providers but each has a different API"**

OpenAI uses one format, Claude (Anthropic) uses another, and Gemini yet another. A developer who wants to test models from different providers, or fall back between them, has to reconfigure SDKs, change endpoints, and deal with incompatible formats. Custom providers (FriendLI, NIM) have non-standard model endpoints.

**How OmniRoute solves it:**

- **Unified Endpoint** - A single `http://localhost:20128/v1` serves as proxy for all 100+ providers
- **Format Translation** - Automatic and transparent: OpenAI ↔ Claude ↔ Gemini ↔ Responses API
- **Response Sanitization** - Strips non-standard fields (`x_groq`, `usage_breakdown`, `service_tier`) that break OpenAI SDK v1.83+
- **Role Normalization** - Converts `developer` → `system` for non-OpenAI providers; `system` → `user` for GLM/ERNIE
- **Think Tag Extraction** - Extracts `<think>` blocks from models like DeepSeek R1 into standardized `reasoning_content`
- **Structured Output for Gemini** - `json_schema` → `responseMimeType`/`responseSchema` automatic conversion
- **`stream` defaults to `false`** - Aligns with OpenAI spec, avoiding unexpected SSE in Python/Rust/Go SDKs

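
As an illustration of the think tag extraction mentioned above, here is a minimal sketch. It assumes the reasoning is wrapped in `<think>...</think>` tags; `extractThinking` and the `Extracted` shape are hypothetical and not taken from OmniRoute's source.

```typescript
// Assumed output shape: visible text plus the extracted reasoning.
interface Extracted {
  content: string;
  reasoning_content: string | null;
}

// Pull every <think>...</think> block out of the text and join the blocks.
function extractThinking(raw: string): Extracted {
  const blocks: string[] = [];
  const content = raw
    .replace(/<think>([\s\S]*?)<\/think>/g, (_match, inner: string) => {
      blocks.push(inner.trim());
      return "";
    })
    .trim();
  return {
    content,
    reasoning_content: blocks.length > 0 ? blocks.join("\n") : null,
  };
}

const extracted = extractThinking("<think>2 + 2 is 4</think>The answer is 4.");
// extracted.content holds the visible answer; extracted.reasoning_content the reasoning.
```
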
๐ŸŒ 3. "My AI provider blocks my region/country" Providers like OpenAI/Codex block access from certain geographic regions. Users get errors like `unsupported_country_region_territory` during OAuth and API connections. This is especially frustrating for developers from developing countries. **How OmniRoute solves it:** - **3-Level Proxy Config** โ€” Configurable proxy at 3 levels: global (all traffic), per-provider (one provider only), and per-connection/key - **Color-Coded Proxy Badges** โ€” Visual indicators: ๐ŸŸข global proxy, ๐ŸŸก provider proxy, ๐Ÿ”ต connection proxy, always showing the IP - **OAuth Token Exchange Through Proxy** โ€” OAuth flow also goes through the proxy, solving `unsupported_country_region_territory` - **Connection Tests via Proxy** โ€” Connection tests use the configured proxy (no more direct bypass) - **SOCKS5 Support** โ€” Full SOCKS5 proxy support for outbound routing - **TLS Fingerprint Spoofing** โ€” Browser-like TLS fingerprint via `wreq-js` to bypass bot detection - **๐Ÿ” CLI Fingerprint Matching** โ€” Reorders headers and body fields to match native CLI binary signatures, drastically reducing account flagging risk. The proxy IP is preserved โ€” you get both stealth **and** IP masking simultaneously
**🆓 4. "I want to use AI for coding but I have no money"**

Not everyone can pay $20–200/month for AI subscriptions. Students, devs from emerging countries, hobbyists, and freelancers need access to quality models at zero cost.

**How OmniRoute solves it:**

- **Free Tier Providers Built-in** - Native support for 100% free providers: Qoder (5 unlimited models via OAuth: kimi-k2-thinking, qwen3-coder-plus, deepseek-r1, minimax-m2, kimi-k2), Qwen (4 unlimited models: qwen3-coder-plus, qwen3-coder-flash, qwen3-coder-next, vision-model), Kiro (Claude + AWS Builder ID for free), Gemini CLI (180K tokens/month free)
- **Ollama Cloud** - Cloud-hosted Ollama models at `api.ollama.com` with free "Light usage" tier; use `ollamacloud/` prefix
- **Free-Only Combos** - Chain `gc/gemini-3-flash → if/kimi-k2-thinking → qw/qwen3-coder-plus` = $0/month with zero downtime
- **NVIDIA NIM Free Access** - ~40 RPM dev-forever free access to 70+ models at build.nvidia.com (transitioning from credits to pure rate limits)
- **Cost Optimized Strategy** - Routing strategy that automatically chooses the cheapest available provider

**🔒 5. "I need to protect my AI gateway from unauthorized access"**

When exposing an AI gateway to the network (LAN, VPS, Docker), anyone with the address can consume the developer's tokens/quota. Without protection, APIs are vulnerable to misuse, prompt injection, and abuse.

**How OmniRoute solves it:**

- **API Key Management** - Generation, rotation, and scoping per provider with a dedicated `/dashboard/api-manager` page
- **Model-Level Permissions** - Restrict API keys to specific models (`openai/*`, wildcard patterns), with Allow All/Restrict toggle
- **API Endpoint Protection** - Require a key for `/v1/models` and block specific providers from the listing
- **Auth Guard + CSRF Protection** - All dashboard routes protected with `withAuth` middleware + CSRF tokens
- **Rate Limiter** - Per-IP rate limiting with configurable windows
- **IP Filtering** - Allowlist/blocklist for access control
- **Prompt Injection Guard** - Sanitization against malicious prompt patterns
- **AES-256-GCM Encryption** - Credentials encrypted at rest

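
The wildcard patterns used for model-level permissions (such as `openai/*`) can be sketched as a simple glob match. The function names and the "empty list means Allow All" rule below are assumptions made for illustration, as is the example model id.

```typescript
// Escape regex metacharacters so that only "*" acts as a wildcard.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// A key with no patterns is treated here as "Allow All" (an assumption).
function isModelAllowed(model: string, allowedPatterns: string[]): boolean {
  if (allowedPatterns.length === 0) return true;
  return allowedPatterns.some((p) => patternToRegExp(p).test(model));
}

const allowed = isModelAllowed("openai/gpt-4o", ["openai/*"]);
const denied = isModelAllowed("glm/glm-4.7", ["openai/*"]);
```

Escaping before expanding `*` keeps characters like `.` in model names literal, so `glm/glm-4.7` never matches `glm/glm-4x7`.
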
**🛑 6. "My provider went down and I lost my coding flow"**

AI providers can become unstable, return 5xx errors, or hit temporary rate limits. If a dev depends on a single provider, they're interrupted. Without circuit breakers, repeated retries can crash the application.

**How OmniRoute solves it:**

- **Request Queue & Pacing** - Per-connection request buckets smooth bursts before they hit upstream rate caps
- **Connection Cooldown** - A single connection cools down after retryable failures with optional upstream `Retry-After` hints and exponential backoff
- **Provider Circuit Breaker** - The provider only trips after fallback is exhausted and the provider request still fails with provider-wide transient errors; connection-scoped `429` rate limits stay in Connection Cooldown
- **Wait For Cooldown** - The server can wait for the earliest connection cooldown to expire and retry the same client request automatically
- **Anti-Thundering Herd** - Mutex + semaphore protection against concurrent retry storms
- **Combo Fallback Chains** - If the primary provider fails, automatically falls through the chain with no intervention
- **Health Dashboard** - Uptime monitoring, provider circuit breaker states, cooldowns, cache stats, p50/p95/p99 latency

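
The connection cooldown described above combines an optional `Retry-After` hint with exponential backoff. The sketch below shows one plausible way to compute it; the base and maximum values and the `cooldownMs` helper are hypothetical, not OmniRoute defaults.

```typescript
// Hypothetical defaults; OmniRoute's real values are configurable and may differ.
const BASE_COOLDOWN_MS = 1000;
const MAX_COOLDOWN_MS = 60000;

// Cooldown for a connection after `failures` consecutive retryable errors.
// An upstream Retry-After hint (in seconds), when present, takes precedence.
function cooldownMs(failures: number, retryAfterSeconds?: number): number {
  if (retryAfterSeconds !== undefined && retryAfterSeconds >= 0) {
    return Math.min(retryAfterSeconds * 1000, MAX_COOLDOWN_MS);
  }
  const exponent = Math.max(0, failures - 1);
  return Math.min(BASE_COOLDOWN_MS * Math.pow(2, exponent), MAX_COOLDOWN_MS);
}

const firstFailure = cooldownMs(1); // base delay
const thirdFailure = cooldownMs(3); // base delay doubled twice
const hinted = cooldownMs(3, 5); // upstream asked for 5 seconds
```

Capping the delay keeps a long failure streak from parking a connection indefinitely.
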
**🔧 7. "Configuring each AI tool is tedious and repetitive"**

Developers use Cursor, Claude Code, Codex CLI, OpenClaw, Gemini CLI, Kilo Code... Each tool needs a different config (API endpoint, key, model). Reconfiguring when switching providers or models is a waste of time.

**How OmniRoute solves it:**

- **CLI Tools Dashboard** - Dedicated page with one-click setup for Claude Code, Codex CLI, OpenClaw, Kilo Code, Antigravity, Cline
- **GitHub Copilot Config Generator** - Generates `chatLanguageModels.json` for VS Code with bulk model selection
- **Onboarding Wizard** - Guided 4-step setup for first-time users
- **One endpoint, all models** - Configure `http://localhost:20128/v1` once, access 100+ providers

**🔑 8. "Managing OAuth tokens from multiple providers is hell"**

Claude Code, Codex, Gemini CLI, Copilot: all use OAuth 2.0 with expiring tokens. Developers need to re-authenticate constantly and deal with `client_secret is missing`, `redirect_uri_mismatch`, and failures on remote servers. OAuth on LAN/VPS is particularly problematic.

**How OmniRoute solves it:**

- **Auto Token Refresh** - OAuth tokens refresh in background before expiration
- **OAuth 2.0 (PKCE) Built-in** - Automatic flow for Claude Code, Codex, Gemini CLI, Copilot, Kiro, Qwen, Qoder
- **Multi-Account OAuth** - Multiple accounts per provider via JWT/ID token extraction
- **OAuth LAN/Remote Fix** - Private IP detection for `redirect_uri` + manual URL mode for remote servers
- **OAuth Behind Nginx** - Uses `window.location.origin` for reverse proxy compatibility
- **Remote OAuth Guide** - Step-by-step guide for Google Cloud credentials on VPS/Docker

**📊 9. "I don't know how much I'm spending or where"**

Developers use multiple paid providers but have no unified view of spending. Each provider has its own billing dashboard, but there's no consolidated view. Unexpected costs can pile up.

**How OmniRoute solves it:**

- **Cost Analytics Dashboard** - Per-token cost tracking and budget management per provider
- **Budget Limits per Tier** - Spending ceiling per tier that triggers automatic fallback
- **Per-Model Pricing Configuration** - Configurable prices per model
- **Usage Statistics Per API Key** - Request count and last-used timestamp per key
- **Analytics Dashboard** - Stat cards, model usage chart, provider table with success rates and latency

๐Ÿ› 10. "I can't diagnose errors and problems in AI calls" When a call fails, the dev doesn't know if it was a rate limit, expired token, wrong format, or provider error. Fragmented logs across different terminals. Without observability, debugging is trial-and-error. **How OmniRoute solves it:** - **Unified Logs Dashboard** โ€” 4 tabs: Request Logs, Proxy Logs, Audit Logs, Console - **Console Log Viewer** โ€” Real-time terminal-style viewer with color-coded levels, auto-scroll, search, filter - **SQLite Summary Logs** โ€” Request and proxy log indexes stay queryable across restarts without loading large payload blobs into SQLite - **Translator Playground** โ€” 4 debugging modes: Playground (format translation), Chat Tester (round-trip), Test Bench (batch), Live Monitor (real-time) - **Request Telemetry** โ€” p50/p95/p99 latency + X-Request-Id tracing - **File-Based Detail Artifacts** โ€” App logs rotate by size, retention days, and archive count; detailed request/response payloads live in `DATA_DIR/call_logs/` and rotate independently of SQLite summaries - **System Info Report** โ€” `npm run system-info` generates `system-info.txt` with your full environment (Node version, OmniRoute version, OS, CLI tools, Docker/PM2 status). Attach it when reporting issues for instant triage.
๐Ÿ—๏ธ 11. "Deploying and maintaining the gateway is complex" Installing, configuring, and maintaining an AI proxy across different environments (local, VPS, Docker, cloud) is labor-intensive. Problems like hardcoded paths, `EACCES` on directories, port conflicts, and cross-platform builds add friction. **How OmniRoute solves it:** - **npm global install** โ€” `npm install -g omniroute && omniroute` โ€” done - **Docker Multi-Platform** โ€” AMD64 + ARM64 native (Apple Silicon, AWS Graviton, Raspberry Pi) - **Docker Compose Profiles** โ€” `base` (no CLI tools) and `cli` (with Claude Code, Codex, OpenClaw) - **Electron Desktop App** โ€” Native app for Windows/macOS/Linux with system tray, auto-start, offline mode - **Split-Port Mode** โ€” API and Dashboard on separate ports for advanced scenarios (reverse proxy, container networking) - **Cloud Sync** โ€” Config synchronization across devices via Cloudflare Workers - **DB Backups** โ€” Automatic backup, restore, export and import of all settings, with `DISABLE_SQLITE_AUTO_BACKUP` for externally managed backups
๐ŸŒ 12. "The interface is English-only and my team doesn't speak English" Teams in non-English-speaking countries, especially in Latin America, Asia, and Europe, struggle with English-only interfaces. Language barriers reduce adoption and increase configuration errors. **How OmniRoute solves it:** - **Dashboard i18n โ€” 30 Languages** โ€” All 500+ keys translated including Arabic, Bulgarian, Danish, German, Spanish, Finnish, French, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Malay, Dutch, Norwegian, Polish, Portuguese (PT/BR), Romanian, Russian, Slovak, Swedish, Thai, Ukrainian, Vietnamese, Chinese, Filipino, English - **RTL Support** โ€” Right-to-left support for Arabic and Hebrew - **Multi-Language READMEs** โ€” 30 complete documentation translations - **Language Selector** โ€” Globe icon in header for real-time switching
**🔄 13. "I need more than chat: I need embeddings, images, audio"**

AI isn't just chat completion. Devs need to generate images, transcribe audio, create embeddings for RAG, rerank documents, and moderate content. Each API has a different endpoint and format.

**How OmniRoute solves it:**

- **Embeddings** - `/v1/embeddings` with 6 providers and 9+ models
- **Image Generation** - `/v1/images/generations` with 10 providers and 20+ models (OpenAI, xAI, Together, Fireworks, Nebius, Hyperbolic, NanoBanana, Antigravity, SD WebUI, ComfyUI)
- **Text-to-Video** - `/v1/videos/generations` with ComfyUI (AnimateDiff, SVD) and SD WebUI
- **Text-to-Music** - `/v1/music/generations` with ComfyUI (Stable Audio Open, MusicGen)
- **Audio Transcription** - `/v1/audio/transcriptions` with Whisper + Nvidia NIM, HuggingFace, Qwen3
- **Text-to-Speech** - `/v1/audio/speech` with ElevenLabs, Nvidia NIM, HuggingFace, Coqui, Tortoise, Qwen3, **Inworld**, **Cartesia**, **PlayHT**, + existing providers
- **Moderations** - `/v1/moderations` for content safety checks
- **Reranking** - `/v1/rerank` for document relevance reranking
- **Responses API** - Full `/v1/responses` support for Codex

**🧪 14. "I have no way to test and compare quality across models"**

Developers want to know which model is best for their use case (code, translation, reasoning), but comparing manually is slow. No integrated eval tools exist.

**How OmniRoute solves it:**

- **LLM Evaluations** - Golden set testing with 10 pre-loaded cases covering greetings, math, geography, code generation, JSON compliance, translation, markdown, safety refusal
- **4 Match Strategies** - `exact`, `contains`, `regex`, `custom` (JS function)
- **Translator Playground Test Bench** - Batch testing with multiple inputs and expected outputs, cross-provider comparison
- **Chat Tester** - Full round-trip with visual response rendering
- **Live Monitor** - Real-time stream of all requests flowing through the proxy

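
The four match strategies listed above (`exact`, `contains`, `regex`, `custom`) map naturally onto a small discriminated union. The `Matcher` type and `matchesExpected` function are a hypothetical sketch, not the evaluator OmniRoute ships.

```typescript
// Hypothetical matcher shapes, one per strategy named in the list above.
type Matcher =
  | { strategy: "exact"; expected: string }
  | { strategy: "contains"; expected: string }
  | { strategy: "regex"; pattern: string }
  | { strategy: "custom"; check: (output: string) => boolean };

// Decide whether a model output passes a golden-set case.
function matchesExpected(output: string, m: Matcher): boolean {
  switch (m.strategy) {
    case "exact":
      return output.trim() === m.expected;
    case "contains":
      return output.indexOf(m.expected) !== -1;
    case "regex":
      return new RegExp(m.pattern).test(output);
    case "custom":
      return m.check(output);
  }
}

// Example: a JSON-compliance case implemented as a custom check.
const isJson: Matcher = {
  strategy: "custom",
  check: (out) => {
    try {
      JSON.parse(out);
      return true;
    } catch {
      return false;
    }
  },
};
const jsonOk = matchesExpected('{"ok": true}', isJson);
```
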
**📈 15. "I need to scale without losing performance"**

As request volume grows, uncached repeats of the same question generate duplicate costs. Without idempotency, duplicate requests waste processing. Per-provider rate limits must be respected.

**How OmniRoute solves it:**

- **Semantic Cache** - Two-tier cache (signature + semantic) reduces cost and latency
- **Request Idempotency** - 5s deduplication window for identical requests
- **Rate Limit Detection** - Per-provider RPM, min gap, and max concurrent tracking
- **Request Queue & Pacing** - Configurable queue, pacing, and concurrency defaults in Settings → Resilience
- **API Key Validation Cache** - 3-tier cache for production performance
- **Health Dashboard with Telemetry** - p50/p95/p99 latency, cache stats, uptime

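
The 5s deduplication window can be illustrated with a tiny in-memory tracker. This is a sketch under assumptions: how OmniRoute fingerprints a request is not stated here, so the caller supplies the key, and the `Deduplicator` class is a hypothetical name.

```typescript
const DEDUP_WINDOW_MS = 5000; // the 5s window mentioned above

// Remembers when each request fingerprint was first seen in its window.
class Deduplicator {
  private firstSeen: Record<string, number> = {};

  // Returns true when the same key was already seen within the window.
  isDuplicate(key: string, nowMs: number): boolean {
    const previous = this.firstSeen[key];
    if (previous !== undefined && nowMs - previous < DEDUP_WINDOW_MS) {
      return true;
    }
    this.firstSeen[key] = nowMs; // start a new window for this key
    return false;
  }
}

const dedup = new Deduplicator();
const first = dedup.isDuplicate("req-abc", 0); // new request
const repeat = dedup.isDuplicate("req-abc", 4999); // inside the window
const later = dedup.isDuplicate("req-abc", 5000); // window has elapsed
```

Passing the clock in as `nowMs` keeps the logic deterministic and easy to test; a production version would also evict old keys.
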
**🤖 16. "I want to control model behavior globally"**

Some developers want all responses in a specific language or tone, or want to limit reasoning tokens. Configuring this in every tool/request is impractical.

**How OmniRoute solves it:**

- **System Prompt Injection** - Global prompt applied to all requests
- **Thinking Budget Validation** - Reasoning token allocation control per request (passthrough, auto, custom, adaptive)
- **9 Routing Strategies** - Global strategies that determine how requests are distributed
- **Wildcard Router** - `provider/*` patterns route dynamically to any provider
- **Combo Enable/Disable Toggle** - Toggle combos directly from the dashboard
- **Manual Combo Ordering** - Drag combo cards by handle and persist the order in SQLite
- **Provider Toggle** - Enable/disable all connections for a provider with one click
- **Blocked Providers** - Exclude specific providers from `/v1/models` listing

**🧰 17. "I need MCP tools as first-class product capabilities"**

Many AI gateways expose MCP only as a hidden implementation detail. Teams need a visible, manageable operation layer.

**How OmniRoute solves it:**

- MCP appears in the dashboard navigation and endpoint protocol tab
- Dedicated MCP management page with process, tools, scopes, and audit
- Built-in quick-start for `omniroute --mcp` and client onboarding

**🧠 18. "I need A2A orchestration with sync + stream task paths"**

Agent workflows need both direct replies and long-running streamed execution with lifecycle control.

**How OmniRoute solves it:**

- A2A JSON-RPC endpoint (`POST /a2a`) with `message/send` and `message/stream`
- SSE streaming with terminal state propagation
- Task lifecycle APIs for `tasks/get` and `tasks/cancel`

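
A request to the `POST /a2a` endpoint is a JSON-RPC 2.0 envelope whose `method` is `message/send` or `message/stream`. The sketch below builds such an envelope; the `params` layout loosely follows the public A2A message format and should be treated as an assumption, and `buildSendRequest` is a hypothetical helper.

```typescript
interface JsonRpcRequest<P> {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: P;
}

// Assumed params shape, loosely following the public A2A message format.
interface SendParams {
  message: {
    role: "user";
    messageId: string;
    parts: { kind: "text"; text: string }[];
  };
}

// Build the envelope for a direct reply or a streamed task.
function buildSendRequest(
  id: number,
  messageId: string,
  text: string,
  stream: boolean = false,
): JsonRpcRequest<SendParams> {
  return {
    jsonrpc: "2.0",
    id,
    method: stream ? "message/stream" : "message/send",
    params: {
      message: { role: "user", messageId, parts: [{ kind: "text", text }] },
    },
  };
}

const a2aPayload = JSON.stringify(
  buildSendRequest(1, "msg-1", "Summarize today's errors"),
);
// POST a2aPayload to /a2a with Content-Type: application/json.
```
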
**🛰️ 19. "I need real MCP process health, not guessed status"**

Operational teams need to know if MCP is actually alive, not just whether an API is reachable.

**How OmniRoute solves it:**

- Runtime heartbeat file with PID, timestamps, transport, tool count, and scope mode
- MCP status API combining heartbeat + recent activity
- UI status cards for process/uptime/heartbeat freshness

**📋 20. "I need auditable MCP tool execution"**

When tools mutate config or trigger ops actions, teams need forensic traceability.

**How OmniRoute solves it:**

- SQLite-backed audit logging for MCP tool calls
- Filters by tool, success/failure, API key, and pagination
- Dashboard audit table + stats endpoints for automation

๐Ÿ” 21. "I need scoped MCP permissions per integration" Different clients should have least-privilege access to tool categories. **How OmniRoute solves it:** - 10 granular MCP scopes for controlled tool access - Scope enforcement and visibility in MCP management UI - Safe default posture for operational tooling
**⚙️ 22. "I need operational controls without redeploying"**

Teams need quick runtime changes during incidents or cost events.

**How OmniRoute solves it:**

- Switch combo activation directly from MCP dashboard
- Tune queue, cooldown, breaker, and wait settings from the dedicated Resilience page
- Review live provider breaker state from the Health dashboard

**🔄 23. "I need live A2A task lifecycle visibility and cancellation"**

Without lifecycle visibility, task incidents become hard to triage.

**How OmniRoute solves it:**

- Task listing/filtering by state/skill with pagination
- Drill-down on task metadata, events, and artifacts
- Task cancellation endpoint and UI action with confirmation

**🌊 24. "I need active stream metrics for A2A load"**

Streaming workflows require operational insight into concurrency and live connections.

**How OmniRoute solves it:**

- Active stream counters integrated into A2A status
- Last task timestamp and per-state counts
- A2A dashboard cards for real-time ops monitoring

**🪪 25. "I need standard agent discovery for clients"**

External clients and orchestrators need machine-readable metadata for onboarding.

**How OmniRoute solves it:**

- Agent Card exposed at `/.well-known/agent.json`
- Capabilities and skills shown in management UI
- A2A status API includes discovery metadata for automation

**🧭 26. "I need protocol discoverability in the product UX"**

If users cannot discover protocol surfaces, adoption and support quality drop.

**How OmniRoute solves it:**

- Consolidated **Endpoints** page with tabs for Proxy, MCP, A2A, and API Endpoints
- Inline service status toggles (Online/Offline) for MCP and A2A
- Links from overview to dedicated management tabs

**🧪 27. "I need end-to-end protocol validation with real clients"**

Mock tests are not enough to validate protocol compatibility before release.

**How OmniRoute solves it:**

- E2E suite that boots the app and uses the real MCP SDK client transport
- A2A client tests for discovery, send, stream, get, and cancel flows
- Cross-check assertions against MCP audit and A2A tasks APIs

๐Ÿ“ก 28. "I need unified observability across all interfaces" Splitting observability by protocol creates blind spots and longer MTTR. **How OmniRoute solves it:** - Unified dashboards/logs/analytics in one product - Health + audit + request telemetry across OpenAI, MCP, and A2A layers - Operational APIs for status and automation
๐Ÿ’ผ 29. "I need one runtime for proxy + tools + agent orchestration" Running many separate services increases operational cost and failure modes. **How OmniRoute solves it:** - OpenAI-compatible proxy, MCP server, and A2A server in one stack - Shared auth, resilience, data store, and observability - Consistent policy model across all interaction surfaces
๐Ÿš€ 30. "I need to ship agentic workflows without glue-code sprawl" Teams lose velocity when stitching multiple ad-hoc services and scripts. **How OmniRoute solves it:** - Unified endpoint strategy for clients and agents - Built-in protocol management UIs and smoke validation paths - Production-ready foundations (security, logging, resilience, backup)
๐Ÿ“š 31. "My long sessions crash with 'context_length_exceeded' limits" During deep debugging, long histories with tool results quickly exceed provider token windows, causing failed requests and orphaned context. **How OmniRoute solves it:** - **Proactive Context Compression** โ€” Evaluates token budgets before the request hits upstream and proactively prunes old conversation history with a smart binary-search mechanism. - **Structural Integrity Guards** โ€” Automatically tracks explicit `tool_use` definitions and ensures that if a tool input is truncated, its corresponding `tool_result` is also safely removed, preventing API validation errors. - **Multi-Layer Dropping** โ€” Progressively drops system messages, regular messages, and finally enforces strict length limits without breaking conversational logic.
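The pruning idea above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not OmniRoute's actual implementation: the `Message` shape, the `toolUseId` / `toolResultFor` fields, the characters-divided-by-four token estimate, and all function names are hypothetical. The sketch binary-searches for the smallest number of oldest non-system messages to drop so that the history fits a token budget, and it removes any tool result whose originating tool call was dropped. Unlike the multi-layer dropping described above, it never drops system messages.

```typescript
// Hypothetical message shape, for illustration only.
type Role = "system" | "user" | "assistant" | "tool";

interface Message {
  role: Role;
  content: string;
  toolUseId?: string; // set on an assistant message that issues a tool call
  toolResultFor?: string; // set on a tool message answering that call
}

// Assumption: a crude ~4 characters per token estimate, not a real tokenizer.
const estimateTokens = (m: Message): number => Math.ceil(m.content.length / 4);

const totalTokens = (ms: Message[]): number =>
  ms.reduce((sum, m) => sum + estimateTokens(m), 0);

// Drop the `n` oldest non-system messages, then remove tool results whose
// matching tool call is no longer present (the structural integrity guard).
function dropOldest(messages: Message[], n: number): Message[] {
  let dropped = 0;
  const kept = messages.filter((m) => {
    if (m.role !== "system" && dropped < n) {
      dropped++;
      return false;
    }
    return true;
  });
  const liveToolUses = new Set(kept.map((m) => m.toolUseId));
  return kept.filter(
    (m) => m.toolResultFor === undefined || liveToolUses.has(m.toolResultFor),
  );
}

// Binary search over the number of dropped messages. Dropping more messages
// never increases the token total, so the "fits" predicate is monotonic.
function compressHistory(messages: Message[], budget: number): Message[] {
  let lo = 0;
  let hi = messages.filter((m) => m.role !== "system").length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (totalTokens(dropOldest(messages, mid)) <= budget) {
      hi = mid;
    } else {
      lo = mid + 1;
    }
  }
  // If nothing fits, this returns only the system messages.
  return dropOldest(messages, lo);
}
```

Calling `compressHistory(history, budget)` before forwarding a request keeps the most recent turns and avoids sending a `tool_result` without its `tool_use`.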
### Example Playbooks (Integrated Use Cases)

**Playbook A: Maximize paid subscription + cheap backup**

```txt
Combo: "maximize-claude"
  1. cc/claude-opus-4-7
  2. glm/glm-4.7
  3. if/kimi-k2-thinking

Monthly cost: $20 + small backup spend
Outcome: higher quality, near-zero interruption
```

**Playbook B: Zero-cost coding stack**

```txt
Combo: "free-forever"
  1. gc/gemini-3-flash
  2. if/kimi-k2-thinking
  3. qw/qwen3-coder-plus

Monthly cost: $0
Outcome: stable free coding workflow
```

**Playbook C: 24/7 always-on fallback chain**

```txt
Combo: "always-on"
  1. cc/claude-opus-4-7
  2. cx/gpt-5.2-codex
  3. glm/glm-4.7
  4. minimax/MiniMax-M2.1
  5. if/kimi-k2-thinking

Outcome: deep fallback depth for deadline-critical workloads
```

**Playbook D: Agent ops with MCP + A2A**

```txt
1) Start MCP transport (`omniroute --mcp`) for tool-driven operations
2) Run A2A tasks via `message/send` and `message/stream`
3) Observe via /dashboard/endpoint (MCP and A2A tabs)
4) Toggle services via inline status controls
```

---

## 🆓 Start Free: Zero Configuration Cost

> Set up AI coding in minutes at **$0/month**. Connect these free accounts and use the built-in **Free Stack** combo.

| Step | Action | Providers Unlocked |
| ---- | ------ | ------------------ |
| 1 | Connect **Kiro** (AWS Builder ID OAuth) | Claude Sonnet 4.5, Haiku 4.5 (**unlimited**) |
| 2 | Connect **Qoder** (Google OAuth) | kimi-k2-thinking, qwen3-coder-plus, deepseek-r1... (**unlimited**) |
| 3 | Connect **Qwen** (Device Code) | qwen3-coder-plus, qwen3-coder-flash... (**unlimited**) |
| 4 | Connect **Gemini CLI** (Google OAuth) | gemini-3-flash, gemini-2.5-pro (**180K/mo free**) |
| 5 | `/dashboard/combos` → **Free Stack ($0)** template | Round-robin all free providers automatically |

**Point any IDE/CLI to:** `http://localhost:20128/v1` · API Key: `any-string` · Done.
> **Optional extra coverage (also free):** Groq API key (30 RPM free), NVIDIA NIM (40 RPM free, 70+ models), Cerebras (1M tok/day), LongCat API key (50M tokens/day!), Cloudflare Workers AI (10K Neurons/day, 50+ models).

## Quick Start

### 1) Install and run

```bash
npm install -g omniroute
omniroute
```

> **pnpm users:** Run `pnpm approve-builds -g` after install to enable native build scripts required by `better-sqlite3` and `@swc/core`:
>
> ```bash
> pnpm install -g omniroute
> pnpm approve-builds -g   # Select all packages → approve
> omniroute
> ```

The dashboard opens at `http://localhost:20128` and the API base URL is `http://localhost:20128/v1`.

#### Arch Linux (AUR)

Arch Linux users can install the [AUR package](https://aur.archlinux.org/packages/omniroute-bin), which installs OmniRoute and provides a systemd user service:

```bash
yay -S omniroute-bin
systemctl --user enable --now omniroute.service
```

| Command | Description |
| ------- | ----------- |
| `omniroute` | Start server (`PORT=20128`, API and dashboard on same port) |
| `omniroute --port 3000` | Set canonical/API port to 3000 |
| `omniroute --mcp` | Start MCP server (stdio transport) |
| `omniroute --no-open` | Don't auto-open browser |
| `omniroute --help` | Show help |

Optional split-port mode:

```bash
PORT=20128 DASHBOARD_PORT=20129 omniroute
# API:       http://localhost:20128/v1
# Dashboard: http://localhost:20129
```

### Uninstalling

When you no longer need OmniRoute, two scripts provide a clean removal:

| Command | Action |
| ------- | ------ |
| `npm run uninstall` | Removes the system app but **keeps your DB and configurations** in `~/.omniroute`. |
| `npm run uninstall:full` | Removes the app AND permanently **erases all configurations, keys, and databases**. |

> Note: Run these commands from the OmniRoute project folder (if you cloned it). If OmniRoute is installed globally, you can instead run `npm uninstall -g omniroute`.

### Long-Running Streaming Timeouts

For most deployments, you only need:

| Variable | Default | Purpose |
| -------- | ------- | ------- |
| `REQUEST_TIMEOUT_MS` | `600000` | Shared baseline for upstream response-start timeout, hidden Undici timeouts, TLS fingerprint requests, and API bridge request/proxy timeouts |
| `STREAM_IDLE_TIMEOUT_MS` | inherits `REQUEST_TIMEOUT_MS` | Maximum gap between streaming chunks before OmniRoute aborts the SSE stream |

Backward compatibility is preserved: existing `FETCH_TIMEOUT_MS`, `API_BRIDGE_PROXY_TIMEOUT_MS`, and other per-layer timeout vars still work and override the shared baseline.

For Claude Code-compatible upstreams (`anthropic-compatible-cc-*`), OmniRoute also derives the outbound `X-Stainless-Timeout` header from the resolved fetch timeout so provider-side read timeouts stay aligned with your env configuration.

For third-party Claude Code-compatible reverse proxies, OmniRoute keeps the default `anthropic-beta` set conservative and, when `Client Cache Control` is left on `Auto`, only forwards client-provided `cache_control` markers. If the request does not include `cache_control`, OmniRoute does not inject bridge-owned markers.
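The inheritance between the timeout variables in this section can be sketched as follows. This is an illustrative reading of the tables, not OmniRoute's actual resolver: the `resolveTimeouts` and `readMs` names and the choice to ignore non-numeric or negative values are assumptions.

```typescript
type Env = Record<string, string | undefined>;

// Parse a millisecond value. Assumption: unset, empty, non-numeric, or
// negative values are ignored so the next fallback applies.
function readMs(env: Env, name: string): number | undefined {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isFinite(n) && n >= 0 ? n : undefined;
}

interface ResolvedTimeouts {
  requestTimeoutMs: number;
  fetchTimeoutMs: number;
  streamIdleTimeoutMs: number;
}

// REQUEST_TIMEOUT_MS is the shared baseline (documented default 600000);
// per-layer variables override it when they are set.
function resolveTimeouts(env: Env): ResolvedTimeouts {
  const request = readMs(env, "REQUEST_TIMEOUT_MS") ?? 600_000;
  return {
    requestTimeoutMs: request,
    fetchTimeoutMs: readMs(env, "FETCH_TIMEOUT_MS") ?? request,
    streamIdleTimeoutMs: readMs(env, "STREAM_IDLE_TIMEOUT_MS") ?? request,
  };
}
```

For example, setting only `REQUEST_TIMEOUT_MS=900000` raises all three values together, while additionally setting `FETCH_TIMEOUT_MS=120000` shortens only the response-start wait.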
Advanced overrides are available if you need finer control:

| Variable | Default | Purpose |
| -------- | ------- | ------- |
| `FETCH_TIMEOUT_MS` | inherits `REQUEST_TIMEOUT_MS` | Upstream response-start timeout used until response headers arrive |
| `FETCH_HEADERS_TIMEOUT_MS` | inherits `FETCH_TIMEOUT_MS` | Undici time limit for receiving upstream response headers |
| `FETCH_BODY_TIMEOUT_MS` | inherits `FETCH_TIMEOUT_MS` | Undici time limit between upstream body chunks (`0` disables it) |
| `FETCH_CONNECT_TIMEOUT_MS` | `30000` | Undici TCP connect timeout |
| `FETCH_KEEPALIVE_TIMEOUT_MS` | `4000` | Undici idle keep-alive socket timeout |
| `TLS_CLIENT_TIMEOUT_MS` | inherits `FETCH_TIMEOUT_MS` | Timeout for TLS fingerprint requests made through `wreq-js` |
| `API_BRIDGE_PROXY_TIMEOUT_MS` | inherits `REQUEST_TIMEOUT_MS` or `30000` | Timeout for `/v1` proxy forwarding from API port to dashboard port |
| `API_BRIDGE_SERVER_REQUEST_TIMEOUT_MS` | `max(API_BRIDGE_PROXY_TIMEOUT_MS, 300000)` | Incoming request timeout on the API bridge server |
| `API_BRIDGE_SERVER_HEADERS_TIMEOUT_MS` | `60000` | Incoming header timeout on the API bridge server |
| `API_BRIDGE_SERVER_KEEPALIVE_TIMEOUT_MS` | `5000` | Keep-alive timeout on the API bridge server |
| `API_BRIDGE_SERVER_SOCKET_TIMEOUT_MS` | `0` | Socket inactivity timeout on the API bridge server (`0` disables it) |

For streaming requests, `FETCH_TIMEOUT_MS` only covers connection setup / waiting for the first upstream response. Once the stream is active, OmniRoute will only abort on an actual stall (`STREAM_IDLE_TIMEOUT_MS`) or Undici body inactivity (`FETCH_BODY_TIMEOUT_MS`).

If you run OmniRoute behind Nginx, Caddy, Cloudflare, or another reverse proxy, make sure the proxy timeouts are also higher than your OmniRoute stream/fetch timeouts.

### 2) Connect providers and create your API key

1. Open Dashboard → `Providers` and connect at least one provider (OAuth or API key).
2. Open Dashboard → `Endpoints` and create an API key.
3. (Optional) Open Dashboard → `Combos` and set your fallback chain.

### 3) Point your coding tool to OmniRoute

```txt
Base URL: http://localhost:20128/v1
API Key:  [copy from Endpoint page]
Model:    if/kimi-k2-thinking (or any provider/model prefix)
```

Works with Claude Code, Codex CLI, Gemini CLI, Cursor, Cline, OpenClaw, OpenCode, and OpenAI-compatible SDKs.

### 4) Enable and validate protocols (v2.0)

**MCP (for tool-driven operations):**

```bash
omniroute --mcp
```

Then connect your MCP client over `stdio` and test tools like:

- `omniroute_get_health`
- `omniroute_list_combos`

**A2A (for agent-to-agent workflows):**

```bash
curl http://localhost:20128/.well-known/agent.json
```

```bash
curl -X POST http://localhost:20128/a2a \
  -H 'content-type: application/json' \
  -d '{"jsonrpc":"2.0","id":"quickstart","method":"message/send","params":{"skill":"quota-management","messages":[{"role":"user","content":"Give me a short quota summary."}]}}'
```

### 5) Validate everything end-to-end (recommended)

```bash
npm run test:protocols:e2e
```

This suite validates real MCP and A2A client flows against a running app.

### Alternative: run from source

```bash
cp .env.example .env
npm install
PORT=20128 DASHBOARD_PORT=20129 NEXT_PUBLIC_BASE_URL=http://localhost:20129 npm run dev
```
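Whichever way the server is started, the A2A `message/send` request from step 4 can also be built programmatically. The sketch below only constructs the JSON-RPC body shown in the `curl` example; the interface and function names are hypothetical, and the field layout is copied from that example rather than from a formal schema.

```typescript
// Shapes mirror the curl example in step 4; names are illustrative only.
interface A2AMessage {
  role: "user" | "assistant";
  content: string;
}

interface MessageSendRequest {
  jsonrpc: "2.0";
  id: string;
  method: "message/send";
  params: { skill: string; messages: A2AMessage[] };
}

// Build a single-turn message/send request for the given skill.
function buildMessageSend(id: string, skill: string, text: string): MessageSendRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "message/send",
    params: { skill, messages: [{ role: "user", content: text }] },
  };
}

// Serialized body, equivalent to the -d payload of the curl example.
const body: string = JSON.stringify(
  buildMessageSend("quickstart", "quota-management", "Give me a short quota summary."),
);
```

The resulting `body` string can be sent as a `POST` to `http://localhost:20128/a2a` with a `content-type: application/json` header, using whatever HTTP client your runtime provides.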
#### Void Linux (`xbps-src` template)

For Void Linux users, you can build a native package using `xbps-src`. Save this block as `srcpkgs/omniroute/template`:

```bash
# Template file for 'omniroute'
pkgname=omniroute
version=3.4.1
revision=1
hostmakedepends="nodejs python3 make"
depends="openssl"
short_desc="Universal AI gateway with smart routing for multiple LLM providers"
maintainer="zenobit "
license="MIT"
homepage="https://github.com/diegosouzapw/OmniRoute"
distfiles="https://github.com/diegosouzapw/OmniRoute/archive/refs/tags/v${version}.tar.gz"
checksum=009400afee90a9f32599d8fe734145cfd84098140b7287990183dde45ae2245b
system_accounts="_omniroute"
omniroute_homedir="/var/lib/omniroute"

export NODE_ENV=production
export npm_config_engine_strict=false
export npm_config_loglevel=error
export npm_config_fund=false
export npm_config_audit=false

do_build() {
    # Determine target CPU arch for node-gyp
    local _gyp_arch
    case "$XBPS_TARGET_MACHINE" in
        aarch64*) _gyp_arch=arm64 ;;
        armv7*|armv6*) _gyp_arch=arm ;;
        i686*) _gyp_arch=ia32 ;;
        *) _gyp_arch=x64 ;;
    esac

    # 1) Install all deps - skip scripts (no network in do_build, native modules
    #    compiled separately below; better-sqlite3 is serverExternalPackage so
    #    Next.js does not execute it during next build)
    NODE_ENV=development npm ci --ignore-scripts

    # 2) Build the Next.js standalone bundle
    npm run build

    # 3) Copy static assets into standalone
    cp -r .next/static .next/standalone/.next/static
    [ -d public ] && cp -r public .next/standalone/public || true

    # 4) Compile better-sqlite3 native binding for the target architecture.
    #    Use node-gyp directly so CC/CXX from xbps-src cross-toolchain are used
    #    without npm altering them.
    local _node_gyp=/usr/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js
    (cd node_modules/better-sqlite3 && node "$_node_gyp" rebuild --arch="$_gyp_arch")

    # 5) Place the compiled binding into the standalone bundle
    local _bs3_release=.next/standalone/node_modules/better-sqlite3/build/Release
    mkdir -p "$_bs3_release"
    cp node_modules/better-sqlite3/build/Release/better_sqlite3.node "$_bs3_release/"

    # 6) Remove arch-specific sharp bundles - upstream sets images.unoptimized=true
    #    so sharp is not used at runtime; x64 .so files would break aarch64 strip
    rm -rf .next/standalone/node_modules/@img

    # 7) Copy pino runtime deps omitted by Next.js static analysis:
    #    pino-abstract-transport - required by pino's worker thread
    #    split2                  - dep of pino-abstract-transport
    #    process-warning         - dep of pino itself
    for _mod in pino-abstract-transport split2 process-warning; do
        cp -r "node_modules/$_mod" .next/standalone/node_modules/
    done
}

do_check() {
    npm run test:unit
}

do_install() {
    vmkdir usr/lib/omniroute/.next
    vcopy .next/standalone/. usr/lib/omniroute/.next/standalone

    # Prevent removal of empty Next.js app router dirs by the post-install hook
    for _d in \
        .next/standalone/.next/server/app/dashboard \
        .next/standalone/.next/server/app/dashboard/settings \
        .next/standalone/.next/server/app/dashboard/providers; do
        touch "${DESTDIR}/usr/lib/omniroute/${_d}/.keep"
    done

    cat > "${WRKDIR}/omniroute" <<'EOF'
#!/bin/sh
export PORT="${PORT:-20128}"
export DATA_DIR="${DATA_DIR:-${XDG_DATA_HOME:-${HOME}/.local/share}/omniroute}"
export APP_LOG_TO_FILE="${APP_LOG_TO_FILE:-false}"
mkdir -p "${DATA_DIR}"
exec node /usr/lib/omniroute/.next/standalone/server.js "$@"
EOF
    vbin "${WRKDIR}/omniroute"
}

post_install() {
    vlicense LICENSE
}
```
---

## 🐳 Docker

OmniRoute is available as a public Docker image on [Docker Hub](https://hub.docker.com/r/diegosouzapw/omniroute).

**Quick run:**

```bash
docker run -d \
  --name omniroute \
  --restart unless-stopped \
  --stop-timeout 40 \
  -p 20128:20128 \
  -v omniroute-data:/app/data \
  diegosouzapw/omniroute:latest
```

**With environment file:**

```bash
# Copy and edit .env first
cp .env.example .env

docker run -d \
  --name omniroute \
  --restart unless-stopped \
  --stop-timeout 40 \
  --env-file .env \
  -p 20128:20128 \
  -v omniroute-data:/app/data \
  diegosouzapw/omniroute:latest
```

**Using Docker Compose:**

```bash
# Base profile (no CLI tools)
docker compose --profile base up -d

# CLI profile (Claude Code, Codex, OpenClaw built-in)
docker compose --profile cli up -d
```

Dashboard support for Docker deployments now includes a one-click **Cloudflare Quick Tunnel** on `Dashboard → Endpoints`. The first enable downloads `cloudflared` only when needed, starts a temporary tunnel to your current `/v1` endpoint, and shows the generated `https://*.trycloudflare.com/v1` URL directly below your normal public URL.

Notes:

- Quick Tunnel URLs are temporary and change after every restart.
- Quick Tunnels are not auto-restored after an OmniRoute or container restart. Re-enable them from the dashboard when needed.
- Managed install currently supports Linux, macOS, and Windows on `x64` / `arm64`.
- Managed Quick Tunnels default to HTTP/2 transport to avoid noisy QUIC UDP buffer warnings in constrained container environments. Set `CLOUDFLARED_PROTOCOL=quic` or `auto` if you want a different transport.
- Docker images bundle system CA roots and pass them to managed `cloudflared`, which avoids TLS trust failures when the tunnel bootstraps inside the container.
- SQLite runs in WAL mode. `docker stop` should be allowed to finish so OmniRoute can checkpoint the latest changes back into `storage.sqlite`.
- The bundled Compose files already set a 40s stop grace period. If you run the image directly, keep `--stop-timeout 40` (or similar) so manual stops do not cut off shutdown cleanup.
- Set `CLOUDFLARED_BIN=/absolute/path/to/cloudflared` if you want OmniRoute to use an existing binary instead of downloading one.

**Using Docker Compose with Caddy (HTTPS Auto-TLS):**

OmniRoute can be securely exposed using Caddy's automatic SSL provisioning. Ensure your domain's DNS A record points to your server's IP.

```yaml
services:
  omniroute:
    image: diegosouzapw/omniroute:latest
    container_name: omniroute
    restart: unless-stopped
    volumes:
      - omniroute-data:/app/data
    environment:
      - PORT=20128
      - NEXT_PUBLIC_BASE_URL=https://your-domain.com

  caddy:
    image: caddy:latest
    container_name: caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    command: caddy reverse-proxy --from https://your-domain.com --to http://omniroute:20128

volumes:
  omniroute-data:
```

| Image | Tag | Size | Description |
| ----- | --- | ---- | ----------- |
| `diegosouzapw/omniroute` | `latest` | ~250MB | Latest stable release |
| `diegosouzapw/omniroute` | `3.6.2` | ~250MB | Current version |

---

## 🖥️ Desktop App: Offline & Always-On

> 🆕 **NEW!** OmniRoute is now available as a **native desktop application** for Windows, macOS, and Linux.

Run OmniRoute as a standalone desktop app: no terminal, no browser, no internet required for local models.
The Electron-based app includes:

- 🖥️ **Native Window:** Dedicated app window with system tray integration
- 🔄 **Auto-Start:** Launch OmniRoute on system login
- 🔔 **Native Notifications:** Get alerts for quota exhaustion or provider issues
- ⚡ **One-Click Install:** NSIS (Windows), DMG (macOS), AppImage (Linux)
- 🌐 **Offline Mode:** Works fully offline with bundled server

### Quick Start

```bash
# Development mode
npm run electron:dev

# Build for your platform
npm run electron:build         # Current platform
npm run electron:build:win     # Windows (.exe)
npm run electron:build:mac     # macOS (.dmg), x64 & arm64
npm run electron:build:linux   # Linux (.AppImage)
```

### System Tray

When minimized, OmniRoute lives in your system tray with quick actions:

- Open dashboard
- Change server port
- Quit application

📖 Full documentation: [`electron/README.md`](electron/README.md)

---

## 💰 Pricing at a Glance

| Tier | Provider | Cost | Quota Reset | Best For |
| ---- | -------- | ---- | ----------- | -------- |
| **💳 SUBSCRIPTION** | Claude Code (Pro) | $20/mo | 5h + weekly | Already subscribed |
| | Codex (Plus/Pro) | $20-200/mo | 5h + weekly | OpenAI users |
| | Gemini CLI | **FREE** | 180K/mo + 1K/day | Everyone! |
| | GitHub Copilot | $10-19/mo | Monthly | GitHub users |
| **🔑 API KEY** | NVIDIA NIM | **FREE** (dev forever) | ~40 RPM | 70+ open models |
| | Cerebras | **FREE** (1M tok/day) | 60K TPM / 30 RPM | World's fastest |
| | Groq | **FREE** (30 RPM) | 14.4K RPD | Ultra-fast Llama/Gemma |
| | DeepSeek V3.2 | $0.27/$1.10 per 1M | None | Best price/quality reasoning |
| | xAI Grok-4 Fast | **$0.20/$0.50 per 1M** 🆕 | None | Fastest + tool calling, ultra-low cost |
| | xAI Grok-4 (standard) | $0.20/$1.50 per 1M 🆕 | None | Reasoning flagship from xAI |
| | Mistral | Free trial + paid | Rate limited | European AI |
| | OpenRouter | Pay-per-use | None | 100+ models aggregated |
| **💰 CHEAP** | GLM-5 (via Z.AI) 🆕 | $0.5/1M | Daily 10AM | 128K output, newest flagship |
| | GLM-4.7 | $0.6/1M | Daily 10AM | Budget backup |
| | MiniMax M2.5 🆕 | $0.3/1M input | 5-hour rolling | Reasoning + agentic tasks |
| | MiniMax M2.1 | $0.2/1M | 5-hour rolling | Cheapest option |
| | Kimi K2.5 (Moonshot API) 🆕 | Pay-per-use | None | Direct Moonshot API access |
| | Kimi K2 | $9/mo flat | 10M tokens/mo | Predictable cost |
| **🆓 FREE** | Qoder | **$0** | Unlimited | 5 models unlimited |
| | Qwen | **$0** | Unlimited | 4 models unlimited |
| | Kiro | **$0** | Unlimited | Claude Sonnet/Haiku (AWS Builder) |
| | LongCat Flash-Lite 🆕 | **$0** (50M tok/day 🔥) | 1 RPS | Largest free quota on Earth |
| | Pollinations AI 🆕 | **$0** (no key needed) | 1 req/15s | GPT-5, Claude, DeepSeek, Llama 4 |
| | Cloudflare Workers AI 🆕 | **$0** (10K Neurons/day) | ~150 resp/day | 50+ models, global edge |
| | Scaleway AI 🆕 | **$0** (1M tokens total) | Rate limited | EU/GDPR, Qwen3 235B, Llama 70B |

> 🆕 **New models added (Mar 2026):** Grok-4 Fast family at $0.20/$0.50/M (benchmarked at 1143ms, 30% faster than Gemini 2.5 Flash), GLM-5 via Z.AI with 128K output, MiniMax M2.5 reasoning, DeepSeek V3.2 updated pricing, Kimi K2.5 via Moonshot direct API.
**💡 $0 Combo Stack: The Complete Free Setup**

```
# 🆓 Ultimate Free Stack 2026 - 11 Providers, $0 Forever
Kiro (kr/)            → Claude Sonnet/Haiku UNLIMITED
Qoder (if/)           → kimi-k2-thinking, qwen3-coder-plus, deepseek-r1 UNLIMITED
LongCat Lite (lc/)    → LongCat-Flash-Lite - 50M tokens/day 🔥
Pollinations (pol/)   → GPT-5, Claude, DeepSeek, Llama 4 - no key needed
Qwen (qw/)            → qwen3-coder-plus, qwen3-coder-flash, qwen3-coder-next UNLIMITED
Gemini (gemini/)      → Gemini 2.5 Flash - 1,500 req/day free API key
Cloudflare AI (cf/)   → Llama 70B, Gemma 3, Mistral - 10K Neurons/day
Scaleway (scw/)       → Qwen3 235B, Llama 70B - 1M free tokens (EU)
Groq (groq/)          → Llama/Gemma ultra-fast - 14.4K req/day
NVIDIA NIM (nvidia/)  → 70+ open models - 40 RPM forever
Cerebras (cerebras/)  → Llama/Qwen world-fastest - 1M tok/day
```

**Zero cost. Never stops coding.** Configure this as one OmniRoute combo and all fallbacks happen automatically, with no manual switching.

---

## 🆓 Free Models: What You Actually Get

> All models below are **100% free with zero credit card required**. OmniRoute auto-routes between them when one quota runs out, so you can combine them all into a resilient $0 combo.
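The automatic fallback across a combo can be pictured with the following sketch. It is a simplified, synchronous illustration under stated assumptions, not OmniRoute's routing code: `routeWithFallback` and the `attempt` callback are hypothetical names, and a real gateway would make asynchronous upstream calls and distinguish quota errors from other failures.

```typescript
interface Routed<T> {
  model: string; // the combo entry that finally answered
  result: T;
}

// Try each combo entry in priority order; on any failure (for example an
// exhausted quota) move on to the next entry. Throws only if all entries fail.
function routeWithFallback<T>(
  combo: string[],
  attempt: (model: string) => T,
): Routed<T> {
  const failures: string[] = [];
  for (const model of combo) {
    try {
      return { model, result: attempt(model) };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${model}: ${reason}`);
    }
  }
  throw new Error(`all combo targets failed: ${failures.join("; ")}`);
}

// Usage: pretend the Kiro quota is exhausted, so the second entry answers.
const freeCombo = ["kr/claude-sonnet-4.5", "if/kimi-k2-thinking", "qw/qwen3-coder-plus"];
const routed = routeWithFallback(freeCombo, (model) => {
  if (model.startsWith("kr/")) throw new Error("quota exhausted");
  return `response from ${model}`;
});
```

Here `routed.model` is the first entry whose attempt succeeded, which is how a priority-ordered combo keeps requests flowing when an earlier provider runs out of quota.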
### 🔵 CLAUDE MODELS (via Kiro, AWS Builder ID)

| Model | Prefix | Limit | Rate Limit |
| ----- | ------ | ----- | ---------- |
| `claude-sonnet-4.5` | `kr/` | **Unlimited** | No reported daily cap |
| `claude-haiku-4.5` | `kr/` | **Unlimited** | No reported daily cap |
| `claude-opus-4.6` | `kr/` | **Unlimited** | Latest Opus via Kiro |

### 🟢 QODER MODELS (Free PAT via qodercli)

| Model | Prefix | Limit | Rate Limit |
| ----- | ------ | ----- | ---------- |
| `kimi-k2-thinking` | `if/` | **Unlimited** | No reported cap |
| `qwen3-coder-plus` | `if/` | **Unlimited** | No reported cap |
| `deepseek-r1` | `if/` | **Unlimited** | No reported cap |
| `minimax-m2.1` | `if/` | **Unlimited** | No reported cap |
| `kimi-k2` | `if/` | **Unlimited** | No reported cap |

> Recommended connection method: **Personal Access Token + `qodercli`**. Browser OAuth is
> experimental and disabled by default unless `QODER_OAUTH_*` environment variables are configured.
### 🟡 QWEN MODELS (Device Code Auth)

| Model | Prefix | Limit | Rate Limit |
| ----- | ------ | ----- | ---------- |
| `qwen3-coder-plus` | `qw/` | **Unlimited** | No reported cap |
| `qwen3-coder-flash` | `qw/` | **Unlimited** | No reported cap |
| `qwen3-coder-next` | `qw/` | **Unlimited** | No reported cap |
| `vision-model` | `qw/` | **Unlimited** | Multimodal (images) |

### 🟣 GEMINI CLI (Google OAuth)

| Model | Prefix | Limit | Rate Limit |
| ----- | ------ | ----- | ---------- |
| `gemini-3-flash-preview` | `gc/` | **180K tok/month** + 1K/day | Monthly reset |
| `gemini-2.5-pro` | `gc/` | 180K/month (shared pool) | High quality |

### ⚫ NVIDIA NIM (Free API Key, build.nvidia.com)

| Tier | Daily Limit | Rate Limit | Notes |
| ---- | ----------- | ---------- | ----- |
| Free (Dev) | No token cap | **~40 RPM** | 70+ models; transitioning to pure rate limits mid-2025 |

Popular free models: `moonshotai/kimi-k2.5` (Kimi K2.5), `z-ai/glm4.7` (GLM 4.7), `deepseek-ai/deepseek-v3.2` (DeepSeek V3.2), `nvidia/llama-3.3-70b-instruct`, `deepseek/deepseek-r1`

### ⚪ CEREBRAS (Free API Key, inference.cerebras.ai)

| Tier | Daily Limit | Rate Limit | Notes |
| ---- | ----------- | ---------- | ----- |
| Free | **1M tokens/day** | 60K TPM / 30 RPM | World's fastest LLM inference; resets daily |

Available free: `llama-3.3-70b`, `llama-3.1-8b`, `deepseek-r1-distill-llama-70b`

### 🔴 GROQ (Free API Key, console.groq.com)

| Tier | Daily Limit | Rate Limit | Notes |
| ---- | ----------- | ---------- | ----- |
| Free | **14.4K RPD** | 30 RPM per model | No credit card; 429 on limit, not charged |

Available free: `llama-3.3-70b-versatile`, `gemma2-9b-it`, `mixtral-8x7b`, `whisper-large-v3`

### 🔴 LONGCAT AI (Free API Key, longcat.chat) 🆕

| Model | Prefix | Daily Free Quota | Notes |
| ----- | ------ | ---------------- | ----- |
| `LongCat-Flash-Lite` | `lc/` | **50M tokens** 💥 | Largest free quota ever |
| `LongCat-Flash-Chat` | `lc/` | 500K tokens | Multi-turn chat |
| `LongCat-Flash-Thinking` | `lc/` | 500K tokens | Reasoning / CoT |
| `LongCat-Flash-Thinking-2601` | `lc/` | 500K tokens | Jan 2026 version |
| `LongCat-Flash-Omni-2603` | `lc/` | 500K tokens | Multimodal |

> 100% free while in public beta. Sign up at [longcat.chat](https://longcat.chat) with email or phone. Resets daily 00:00 UTC.

### 🟢 POLLINATIONS AI (No API Key Required) 🆕

| Model | Prefix | Rate Limit | Provider Behind |
| ----- | ------ | ---------- | --------------- |
| `openai` | `pol/` | 1 req/15s | GPT-5 |
| `claude` | `pol/` | 1 req/15s | Anthropic Claude |
| `gemini` | `pol/` | 1 req/15s | Google Gemini |
| `deepseek` | `pol/` | 1 req/15s | DeepSeek V3 |
| `llama` | `pol/` | 1 req/15s | Meta Llama 4 Scout |
| `mistral` | `pol/` | 1 req/15s | Mistral AI |

> ✨ **Zero friction:** No signup, no API key. Add the Pollinations provider with an empty key field and it works immediately.

### 🟠 CLOUDFLARE WORKERS AI (Free API Key, cloudflare.com) 🆕

| Tier | Daily Neurons | Equivalent Usage | Notes |
| ---- | ------------- | ---------------- | ----- |
| Free | **10,000** | ~150 LLM resp / 500s audio / 15K embeds | Global edge, 50+ models |

Popular free models: `@cf/meta/llama-3.3-70b-instruct`, `@cf/google/gemma-3-12b-it`, `@cf/openai/whisper-large-v3-turbo` (free audio!), `@cf/qwen/qwen2.5-coder-15b-instruct`

> Requires API Token + Account ID from [dash.cloudflare.com](https://dash.cloudflare.com). Store Account ID in provider settings.
### 🟣 SCALEWAY AI (1M Free Tokens, scaleway.com) 🆕

| Tier | Free Quota | Location | Notes |
| ---- | ---------- | -------- | ----- |
| Free | **1M tokens** | 🇫🇷 Paris, EU | No credit card needed within limits |

Available free: `qwen3-235b-a22b-instruct-2507` (Qwen3 235B!), `llama-3.1-70b-instruct`, `mistral-small-3.2-24b-instruct-2506`, `deepseek-v3-0324`

> EU/GDPR compliant. Get API key at [console.scaleway.com](https://console.scaleway.com).

> **💡 The Ultimate Free Stack (11 Providers, $0 Forever):**
>
> ```
> Kiro (kr/)            → Claude Sonnet/Haiku UNLIMITED
> Qoder (if/)           → kimi-k2-thinking, qwen3-coder-plus, deepseek-r1 UNLIMITED
> LongCat Lite (lc/)    → LongCat-Flash-Lite - 50M tokens/day 🔥
> Pollinations (pol/)   → GPT-5, Claude, DeepSeek, Llama 4 - no key needed
> Qwen (qw/)            → qwen3-coder models UNLIMITED
> Gemini (gemini/)      → Gemini 2.5 Flash - 1,500 req/day free
> Cloudflare AI (cf/)   → 50+ models - 10K Neurons/day
> Scaleway (scw/)       → Qwen3 235B, Llama 70B - 1M free tokens (EU)
> Groq (groq/)          → Llama/Gemma - 14.4K req/day ultra-fast
> NVIDIA NIM (nvidia/)  → 70+ open models - 40 RPM forever
> Cerebras (cerebras/)  → Llama/Qwen world-fastest - 1M tok/day
> ```

## 🎙️ Free Transcription Combo

> Transcribe any audio/video for **$0**: Deepgram leads with $200 free, AssemblyAI is the $50 fallback, and Groq Whisper is the unlimited emergency backup.
| Provider | Free Credits | Best Model | Rate Limit |
| -------- | ------------ | ---------- | ---------- |
| 🟢 **Deepgram** | **$200 free** (signup) | `nova-3`: best accuracy, 30+ languages | No RPM limit on free credits |
| 🔵 **AssemblyAI** | **$50 free** (signup) | `universal-3-pro`: chapters, sentiment, PII | No RPM limit on free credits |
| 🔴 **Groq** | **Free forever** | `whisper-large-v3`: OpenAI Whisper | 30 RPM (rate limited) |

**Suggested combo in `/dashboard/combos`:**

```
Name: free-transcription
Strategy: Priority
Nodes:
  [1] deepgram/nova-3              → uses $200 free first
  [2] assemblyai/universal-3-pro   → fallback when Deepgram credits run out
  [3] groq/whisper-large-v3        → free forever, emergency fallback
```

Then in `/dashboard/media` → **Transcription** tab: upload any audio or video file → select your combo endpoint → get transcription in supported formats.

## 💡 Key Features

OmniRoute v3.6 is built as an operational platform, not just a relay proxy.

### 🆕 New: v3.6.x Highlights (Apr 2026)

| Feature | What It Does |
| ------- | ------------ |
| 🌐 **V1 WebSocket Bridge** | OpenAI-compatible WebSocket traffic upgraded and proxied via `/v1/ws`; full streaming over WS with session auth (API key or session cookie) |
| 🔑 **Sync Tokens & Config Bundle** | Issue/revoke sync tokens for config sync endpoints. Config bundles versioned with ETag for bandwidth-efficient polling |
| 🧠 **GLM Thinking (glmt) Preset** | GLM Thinking registered first-class: 65,536 max tokens, 24,576 thinking budget, 900s timeout, usage sync & pricing; Claude-compatible API |
| 🔢 **Hybrid Token Counting** | Uses provider-side `/messages/count_tokens` when available; falls back to estimation for accurate usage tracking without guessing |
| 🌱 **Model Alias Auto-Seed** | 30+ cross-proxy dialect aliases normalised at startup, avoiding routing mismatches |
| 🛡️ **Safe Outbound Fetch** | All provider validation and model discovery go through a guarded fetch layer blocking private/local URLs with retry, timeout, and SSRF protection |
| ⏳ **Wait For Cooldown** | Server-side chat retries when every candidate connection is cooling down; configurable `enabled`, `maxRetries`, and `maxRetryWaitSec` |
| 🔐 **Runtime Env Validation** | Startup validates all env vars with Zod schemas, giving clear errors for missing secrets, invalid URLs, or wrong types |
| 📋 **Compliance Audit Expansion** | Structured audit logs with pagination, request context, auth events, provider CRUD events, and SSRF-blocked validation logging |
| 🔍 **TPS Log Metric** | Log details modal shows Tokens Per Second (TPS), an at-a-glance performance figure for every request |
| 🗑️ **Uninstall / Full Uninstall** | `npm run uninstall` keeps data, `npm run uninstall:full` removes everything; clean removal for all install methods |
| 🔧 **OAuth Env Repair** | One-click "Repair env" action for OAuth providers restores missing env vars and fixes broken auth state |
| 🔒 **Graceful Electron Shutdown** | Electron `before-quit` shuts down Next.js gracefully, preventing SQLite WAL database locks on desktop close |
| 👁️ **Model Visibility Toggle** | Per-model visibility toggle (👁 icon) with search filter and active-count badge (`N/M active`) on provider pages |
| 📧 **Email Privacy Masking** | OAuth account emails masked (`di*****@g****.com`), full address visible on hover |
| 🔗 **Context Relay Strategy** | Combo strategy preserving session continuity via structured handoff summaries when accounts rotate mid-conversation |
| 🛡️ **Proxy Hardening** | Token health check, API key validation, and undici dispatcher all honor proxy config |
| ⚠️ **Node.js 24 Login Warning** | Login page proactively detects incompatible Node.js versions and shows a clear warning banner |
| 📎 **Gemini PDF Attachments** | PDF attachments correctly routed to Gemini via `inline_data` and generic base64 detection |
| 🔒 **CodeQL Security Hardening** | Resolved SSRF, insecure randomness, polynomial ReDoS, and incomplete URL sanitization alerts |

### 🆕 New: ClawRouter-Inspired Improvements (Mar 2026)

| Feature | What It Does |
| ------- | ------------ |
| ⚡ **Grok-4 Fast Family** | xAI models at $0.20/$0.50/M, benchmarked at 1143ms (30% faster than Gemini 2.5 Flash) |
| 🧠 **GLM-5 via Z.AI** | 128K output context, $0.5/1M; newest flagship from the GLM family |
| 🔮 **MiniMax M2.5** | Reasoning + agentic tasks at $0.30/1M; significant upgrade from M2.1 |
| 🎯 **toolCalling Flag per Model** | Per-model `toolCalling: true/false` in registry; AutoCombo skips non-tool-capable models |
| 🌐 **Multilingual Intent Detection** | PT/ZH/ES/AR keywords in AutoCombo scoring for better model selection on non-English content |
| 📊 **Benchmark-Driven Fallbacks** | Real p95 latency from live requests feeds combo scoring, so AutoCombo learns from actual data |
| 🔁 **Request Deduplication** | Content-hash based dedup window; multi-agent safe, prevents duplicate charges |
| 🔌 **Pluggable RouterStrategy** | Extensible `RouterStrategy` interface for adding custom routing logic as plugins |

### 🚀 Previous v2.0.9+: Playground, CLI Fingerprints & ACP

| Feature | What It Does |
| ------- | ------------ |
| 🎮 **Model Playground** | Dashboard page to test any model directly: provider/model/endpoint selectors, Monaco Editor, streaming, abort, timing |
| 🔏 **CLI Fingerprint Matching** | Per-provider header/body ordering to match native CLI signatures; toggle per provider in Settings > Security. **Your proxy IP is preserved** |
| 🤝 **ACP Support (Agent Client Protocol)** | CLI agent discovery (Codex, Claude, Goose, Gemini CLI, OpenClaw + 9 more), process spawner, `/api/acp/agents` endpoint |
| 🤖 **ACP Agents Dashboard** | Debug › Agents page: grid of 14 agents with install status, version, custom agent form for any CLI tool. **OpenCode** users get a "Download opencode.json" button that auto-generates a ready-to-use config with all available models. |
| 🔧 **Custom Model `apiFormat` Routing** | Custom models with `apiFormat: "responses"` now correctly route to the Responses API translator |
| 🏢 **Codex Workspace Isolation** | Multiple Codex workspaces per email; OAuth correctly separates connections by workspace ID |
| 🔄 **Electron Auto-Update** | Desktop app checks for updates + auto-install on restart |

### 🤖 Agent & Protocol Operations (v2.0)

| Feature | What It Does |
| ------- | ------------ |
| 🔧 **MCP Server (25 tools)** | IDE/agent tools via 3 transports: stdio, SSE (`/api/mcp/sse`), Streamable HTTP (`/api/mcp/stream`). 18 core + 3 memory + 4 skill tools |
| 🤝 **A2A Server (JSON-RPC + SSE)** | Agent-to-agent task execution with sync and streaming flows |
| 🧭 **Consolidated Endpoints Page** | Tabbed management page with Endpoint Proxy, MCP, A2A, and API Endpoints tabs |
| 🎚️ **Service Enable/Disable Toggles** | ON/OFF switches for MCP and A2A with settings persistence (default: OFF) |
| 🛰️ **MCP Runtime Heartbeat** | Real process status (pid, uptime, heartbeat age, transport, scope mode) |
| 📋 **MCP Audit Trail** | Filterable audit logs with success/failure and key attribution |
| 🔐 **MCP Scope Enforcement** | 10 granular scope permissions for controlled tool access |
| 📡 **A2A Task Lifecycle Management** | List/filter tasks, inspect events/artifacts, cancel running tasks |
| 📋 **Agent Card Discovery** | `/.well-known/agent.json` for client auto-discovery |
| 🧪 **Protocol E2E Test Harness** | Real MCP SDK + A2A client flows in `test:protocols:e2e` |
| ⚙️ **Operational Controls** | Switch combos, tune resilience settings, and review breaker state from dedicated Health and Settings surfaces |

### 🧠 Routing & Intelligence

| Feature | What It Does |
| ------- | ------------ |
| 🎯 **Smart 4-Tier Fallback** | Auto-route: Subscription → API Key → Cheap → Free |
| 📊 **Real-Time Quota Tracking** | Live token count + reset countdown per provider |
| 🔄 **Format Translation** | OpenAI ↔ Claude ↔ Gemini ↔ Responses with schema-safe conversions |
| 👥 **Multi-Account Support** | Multiple accounts per provider with intelligent selection |
| 🔄 **Auto Token Refresh** | OAuth tokens refresh automatically with retry |
| 🎨 **Custom Combos** | 13 balancing strategies + fallback chain control |
| 🔗 **Context Relay** | Session continuity handoffs when account rotation happens mid-session |
| 🌐 **Wildcard Router** | `provider/*` dynamic routing |
๐Ÿง  **Thinking Budget Controls** | Passthrough, auto, custom, and adaptive reasoning limits | | ๐Ÿ”€ **Model Aliases** | Built-in + custom model aliasing and migration safety | | โšก **Background Degradation** | Route low-priority background tasks to cheaper models | | ๐Ÿงช **Task-Aware Smart Routing** | Auto-select model by content type (coding/vision/analysis/summarization) | | ๐Ÿ”„ **A2A Agent Workflows** | Deterministic FSM orchestrator for stateful multi-step agent executions | | ๐Ÿ”€ **Adaptive Routing** | Dynamic strategy override based on token volume and prompt complexity | | ๐ŸŽฒ **Provider Diversity** | Shannon entropy scoring balancing auto-combo traffic distribution | | ๐Ÿ’ฌ **System Prompt Injection** | Global behavior controls applied consistently | | ๐Ÿ“„ **Responses API Compatibility** | Full `/v1/responses` support for Codex and advanced agentic workflows | ### ๐ŸŽต Multi-Modal APIs | Feature | What It Does | | -------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ๐Ÿ–ผ๏ธ **Image Generation** | `/v1/images/generations` with cloud and local backends | | ๐Ÿ“ **Embeddings** | `/v1/embeddings` for search and RAG pipelines | | ๐ŸŽค **Audio Transcription** | `/v1/audio/transcriptions` โ€” 7 providers (Deepgram Nova 3, AssemblyAI, Groq Whisper, HuggingFace, ElevenLabs, OpenAI, Azure), auto-language detection, MP4/MP3/WAV support | | ๐Ÿ”Š **Text-to-Speech** | `/v1/audio/speech` โ€” 10 providers (ElevenLabs, OpenAI, Deepgram, Cartesia, PlayHT, HuggingFace, Nvidia NIM, Inworld, Coqui, Tortoise) with correct error messages | | ๐ŸŽฌ **Video Generation** | `/v1/videos/generations` (ComfyUI + SD WebUI workflows) | | ๐ŸŽต **Music Generation** | `/v1/music/generations` (ComfyUI workflows) | | ๐Ÿ›ก๏ธ **Moderations** | `/v1/moderations` safety checks | | ๐Ÿ”€ **Reranking** | `/v1/rerank` for relevance scoring | | 
๐Ÿ” **Web Search** ๐Ÿ†• | `/v1/search` โ€” 5 providers (Serper, Brave, Perplexity, Exa, Tavily), 6,500+ free/month, auto-failover, cache | ### ๐Ÿ›ก๏ธ Resilience, Security & Governance | Feature | What It Does | | ----------------------------------- | ------------------------------------------------------------------------------------------------------- | | ๐Ÿ”Œ **Provider Circuit Breakers** | Provider-wide trip/recover after fallback exhaustion with configurable thresholds | | ๐Ÿ”’ **Daily Quota Lock** ๐Ÿ†• | Detects exhaustion signals and locks routing for the specific model until midnight | | ๐ŸŽฏ **Endpoint-Aware Models** | Custom models declare supported endpoints + API format | | ๐Ÿ›ก๏ธ **Anti-Thundering Herd** | Mutex + semaphore protections on retry/rate events | | ๐Ÿง  **Semantic + Signature Cache** | Cost/latency reduction with two cache layers | | โšก **Request Idempotency** | Duplicate protection window | | ๐Ÿ”’ **TLS Fingerprint Spoofing** | Browser-like TLS fingerprint โ€” **reduces bot detection and account flagging** | | ๐Ÿ” **CLI Fingerprint Matching** | Matches native CLI request signatures โ€” **reduces ban risk while preserving proxy IP** | | ๐ŸŒ **IP Filtering** | Allowlist/blocklist control for exposed deployments | | ๐Ÿšฆ **Request Queue & Pacing** | Configurable per-connection request buckets for RPM, spacing, concurrency, and max wait | | ๐Ÿ“‰ **Graceful Degradation** | Multi-layer capability fallbacks protecting core gateway operations | | ๐Ÿ“œ **Config Audit Trail** | Diff-based change tracking preventing operational drift with simple rollbacks | | โณ **Provider Health Sync** | Proactive token expiration monitoring triggering alerts before authorization failures | | โ„๏ธ **Connection Cooldown** | Retryable 408/429/5xx failures cool down a single connection with optional upstream hints | | ๐Ÿšช **Auto-Disable Banned Accounts** | Permanently blocked token accounts can be disabled automatically | | ๐Ÿ”‘ **API Key Management + 
Scoping** | Secure key issuance/rotation and model/provider controls | | ๐Ÿ‘๏ธ **Scoped API Key Reveal** ๐Ÿ†• | Opt-in recovery of API keys via `ALLOW_API_KEY_REVEAL` | | ๐Ÿ›ก๏ธ **Protected `/models`** | Optional auth gating and provider hiding for model catalog | | ๐Ÿ›ก๏ธ **Safe Outbound Fetch** ๐Ÿ†• | Guarded fetch for provider calls โ€” blocks private/local URLs, retries, SSRF protection | | โณ **Wait For Cooldown** ๐Ÿ†• | Auto-retry chat after connection cooldowns; configurable `enabled`, `maxRetries`, and `maxRetryWaitSec` | | ๐Ÿ” **Runtime Env Validation** ๐Ÿ†• | Zod-based env schema validation at startup with actionable error messages | | ๐Ÿ“‹ **Compliance Audit v2** ๐Ÿ†• | Pagination, request context, auth events, provider CRUD, and SSRF-blocked logging | ### ๐Ÿ“Š Observability & Analytics | Feature | What It Does | | -------------------------------- | ----------------------------------------------------- | | ๐Ÿ“ **Request + Proxy Logging** | Full request/response and proxy logging | | ๐Ÿ“‰ **Streamed Detailed Logs** | Reconstructs SSE payload streams cleanly into the UI | | ๐Ÿท๏ธ **Real-Time Model Badges** ๐Ÿ†• | Live model status and daily quota countdown timers | | ๐Ÿ“‹ **Unified Logs Dashboard** | Request, proxy, audit, and console views in one page | | ๐Ÿ” **Request Telemetry** | p50/p95/p99 latency and request tracing | | ๐Ÿฅ **Health Dashboard** | Uptime, breaker states, lockouts, cache stats | | ๐Ÿ’ฐ **Cost Tracking** | Budget controls and per-model pricing visibility | | ๐Ÿ“ˆ **Analytics Visualizations** | Model/provider usage insights and trend views | | ๐Ÿงช **Evaluation Framework** | Golden set testing with configurable match strategies | | ๐Ÿ“ก **Live Diagnostics** ๐Ÿ†• | Semantic cache bypass for accurate combo live testing | | ๐Ÿ” **TPS Log Metric** ๐Ÿ†• | Tokens Per Second badge in log details modal | ### โ˜๏ธ Deployment & Platform | Feature | What It Does | | ------------------------------ | 
--------------------------------------------------------------------- | | ๐ŸŒ **Deploy Anywhere** | Localhost, VPS, Docker, Cloud environments | | ๐Ÿš‡ **Cloudflare Tunnel** ๐Ÿ†• | One-click Quick Tunnel integration from the dashboard | | ๐Ÿ”‘ **API Key Model Filtering** | Native /v1/models response filtered via assigned Bearer context roles | | โšก **Smart Cache Bypass** | Configurable TTL heuristics and forced refetch controls | | ๐Ÿ”„ **Backup/Restore** | Export/import and disaster recovery flows | | ๐Ÿง™ **Onboarding Wizard** | First-run guided setup | | ๐Ÿ”ง **CLI Tools Dashboard** | One-click setup for popular coding tools | | ๐ŸŽฎ **Model Playground** | Test any provider/model/endpoint from the dashboard | | ๐Ÿ” **CLI Fingerprint Toggle** | Per-provider fingerprint matching in Settings > Security | | ๐ŸŒ **i18n (30 languages)** | Full dashboard + docs language support with RTL coverage | | ๐Ÿงน **Clear All Models** | One-click model list clearing in provider details | | ๐Ÿ‘๏ธ **Sidebar Controls** ๐Ÿ†• | Hide components and integrations from Appearance Settings | | ๐Ÿ“‹ **Issue Templates** | Standardized GitHub templates for bugs and features | | ๐Ÿ“‚ **Custom Data Directory** | `DATA_DIR` override for storage location | | ๐ŸŒ **V1 WebSocket Bridge** ๐Ÿ†• | OpenAI-compatible WebSocket traffic proxied via `/v1/ws` | | ๐Ÿ”‘ **Sync Tokens & Bundle** ๐Ÿ†• | Config sync tokens + versioned bundle endpoint with ETag support | ### Feature Deep Dive #### Smart fallback with practical cost control ```txt Combo: "my-coding-stack" 1. cc/claude-opus-4-7 2. nvidia/llama-3.3-70b 3. glm/glm-4.7 4. if/kimi-k2-thinking ``` When quota, rate, or health fails, OmniRoute automatically moves to the next candidate without manual switching. 
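
The fallback behavior described above can be sketched in a few lines. This is a minimal illustration, not OmniRoute's actual routing code: `ProviderUnavailable`, `route_with_fallback`, and `fake_send` are hypothetical names, and a real gateway would also consult quota and health state before each attempt.

```python
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Hypothetical error standing in for a quota, rate-limit, or health failure."""


def route_with_fallback(
    combo: Sequence[str],
    send: Callable[[str], str],
) -> tuple[str, str]:
    """Try each model in order and return (model, response) from the first success."""
    failures: list[str] = []
    for model in combo:
        try:
            return model, send(model)
        except ProviderUnavailable as exc:
            # Record the failure and move on to the next candidate.
            failures.append(f"{model}: {exc}")
    raise RuntimeError("all combo candidates failed: " + "; ".join(failures))


# The combo from the example above, in priority order.
MY_CODING_STACK = [
    "cc/claude-opus-4-7",
    "nvidia/llama-3.3-70b",
    "glm/glm-4.7",
    "if/kimi-k2-thinking",
]


def fake_send(model: str) -> str:
    """Stand-in transport: pretend the first two candidates are out of quota."""
    if model in ("cc/claude-opus-4-7", "nvidia/llama-3.3-70b"):
        raise ProviderUnavailable("quota exhausted")
    return f"ok from {model}"
```

Calling `route_with_fallback(MY_CODING_STACK, fake_send)` skips the two exhausted candidates and answers from `glm/glm-4.7`, which is the "no manual switching" behavior the combo is meant to provide.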
#### Protocol management that is visible and operable - MCP + A2A are discoverable in UI and docs (not hidden) - Protocol status APIs expose live operational data (`/api/mcp/*`, `/api/a2a/*`) - Dashboards include actions for day-2 ops (combo toggles, breaker resets, task cancellation) #### Translator + validation workflow The Translator area includes: - **Playground**: request transformation checks - **Chat Tester**: full request/response round-trip - **Test Bench**: multiple cases in one run - **Live Monitor**: real-time traffic view Plus protocol validation with real clients via `npm run test:protocols:e2e`. > ๐Ÿ“– **[MCP Server README](open-sse/mcp-server/README.md)** โ€” Tool reference, IDE configs, and client examples > > ๐Ÿ“– **[A2A Server README](src/lib/a2a/README.md)** โ€” Skills, JSON-RPC methods, streaming, and task lifecycle ## ๐Ÿงช Evaluations (Evals) OmniRoute includes a built-in evaluation framework to test LLM response quality against a golden set. Access it via **Analytics โ†’ Evals** in the dashboard. ### Built-in Golden Set The pre-loaded "OmniRoute Golden Set" contains test cases for: - Greetings, math, geography, code generation - JSON format compliance, translation, markdown generation - Safety refusal (harmful content), counting, boolean logic ### Evaluation Strategies | Strategy | Description | Example | | ---------- | ------------------------------------------------ | -------------------------------- | | `exact` | Output must match exactly | `"4"` | | `contains` | Output must contain substring (case-insensitive) | `"Paris"` | | `regex` | Output must match regex pattern | `"1.*2.*3"` | | `custom` | Custom JS function returns true/false | `(output) => output.length > 10` | --- ## ๐Ÿ“– Setup Guide ### Protocol Setup (MCP + A2A)
**🧩 MCP Setup (Model Context Protocol)**

Start the MCP transport in stdio mode:

```bash
omniroute --mcp
```

Recommended validation flow:

1. Connect your MCP client over stdio.
2. Run `omniroute_get_health`.
3. Run `omniroute_list_combos`.
4. Open `/dashboard/mcp` to confirm heartbeat, activity, and audit.

Useful APIs for automation:

- `GET /api/mcp/status`
- `GET /api/mcp/tools`
- `GET /api/mcp/audit`
- `GET /api/mcp/audit/stats`
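
For scripted checks, the automation endpoints listed above can be wrapped in a small helper. This is a sketch under assumptions: it presumes the MCP APIs are served from the same base URL as the rest of the gateway (`http://localhost:20128` is used elsewhere in this guide), that they return JSON, and that no auth header is required. `mcp_urls` and `fetch_json` are illustrative names, not part of OmniRoute.

```python
import json
from urllib.request import urlopen

# Endpoint suffixes taken from the "Useful APIs for automation" list.
MCP_ENDPOINTS = ("status", "tools", "audit", "audit/stats")


def mcp_urls(base_url: str) -> dict[str, str]:
    """Build the full URL for each MCP automation endpoint."""
    base = base_url.rstrip("/")
    return {name: f"{base}/api/mcp/{name}" for name in MCP_ENDPOINTS}


def fetch_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With a running instance, `fetch_json(mcp_urls("http://localhost:20128")["status"])` would be the scripted counterpart of opening `/dashboard/mcp`.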
**🤝 A2A Setup (Agent2Agent)**

Discover the agent:

```bash
curl http://localhost:20128/.well-known/agent.json
```

Send a task:

```bash
curl -X POST http://localhost:20128/a2a \
  -H 'content-type: application/json' \
  -d '{"jsonrpc":"2.0","id":"setup-a2a","method":"message/send","params":{"skill":"quota-management","messages":[{"role":"user","content":"Summarize quota status."}]}}'
```

Manage lifecycle:

- `GET /api/a2a/status`
- `GET /api/a2a/tasks`
- `GET /api/a2a/tasks/:id`
- `POST /api/a2a/tasks/:id/cancel`

Operational UI:

- `/dashboard/a2a` for task/state/stream observability and smoke actions
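
The same `message/send` call can be issued from a script. The sketch below rebuilds the JSON-RPC body from the curl example; `build_message_send` and `send_task` are hypothetical helper names, and the response shape is not documented here, so `send_task` returns the decoded JSON as-is.

```python
import json
from urllib.request import Request, urlopen


def build_message_send(skill: str, text: str, request_id: str = "setup-a2a") -> dict:
    """Build the JSON-RPC 2.0 envelope used in the curl example."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "message/send",
        "params": {
            "skill": skill,
            "messages": [{"role": "user", "content": text}],
        },
    }


def send_task(payload: dict, url: str = "http://localhost:20128/a2a"):
    """POST the payload to the A2A endpoint (requires a running instance)."""
    request = Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=30) as resp:
        return json.load(resp)


payload = build_message_send("quota-management", "Summarize quota status.")
body = json.dumps(payload, separators=(",", ":"))
```

Keeping the envelope in a builder function makes it easy to reuse one `id` scheme across tasks and to look them up later under `/api/a2a/tasks`.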
**🧪 End-to-end protocol validation**

Validate both protocols with real clients:

```bash
npm run test:protocols:e2e
```

This verifies:

- MCP SDK client connect/list/call
- A2A discovery/send/stream/get/cancel
- Cross-check data in MCP audit and A2A task management APIs
**💳 Subscription Providers**

### Claude Code (Pro/Max)

```bash
Dashboard → Providers → Connect Claude Code
→ OAuth login
→ Auto token refresh
→ 5-hour + weekly quota tracking

Models:
  cc/claude-opus-4-7
  cc/claude-sonnet-4-5-20250929
  cc/claude-haiku-4-5-20251001
```

**Pro Tip:** Use Opus for complex tasks, Sonnet for speed. OmniRoute tracks quota per model!

### OpenAI Codex (Plus/Pro)

```bash
Dashboard → Providers → Connect Codex
→ OAuth login (port 1455)
→ 5-hour + weekly reset

Models:
  cx/gpt-5.2-codex
  cx/gpt-5.1-codex-max
```

#### Codex Account Limit Management (5h + Weekly)

Each Codex account now has policy toggles in `Dashboard -> Providers`:

- `5h` (ON/OFF): enforce the 5-hour window threshold policy.
- `Weekly` (ON/OFF): enforce the weekly window threshold policy.
- Threshold behavior: when an enabled window reaches >=90% usage, that account is skipped.
- Rotation behavior: OmniRoute routes to the next eligible Codex account automatically.
- Reset behavior: when the provider `resetAt` time passes, the account becomes eligible again automatically.

Scenarios:

- `5h ON` + `Weekly ON`: account is skipped when either window reaches threshold.
- `5h OFF` + `Weekly ON`: only weekly usage can block the account.
- `5h ON` + `Weekly OFF`: only 5-hour usage can block the account.
- `resetAt` passed: account re-enters rotation automatically (no manual re-enable).

### Gemini CLI (FREE 180K/month!)

```bash
Dashboard → Providers → Connect Gemini CLI
→ Google OAuth
→ 180K completions/month + 1K/day

Models:
  gc/gemini-3-flash-preview
  gc/gemini-2.5-pro
```

**Best Value:** Huge free tier! Use this before paid tiers.

### GitHub Copilot

```bash
Dashboard → Providers → Connect GitHub
→ OAuth via GitHub
→ Monthly reset (1st of month)

Models:
  gh/gpt-5
  gh/claude-4.5-sonnet
  gh/gemini-3.1-pro-preview
```
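
The Codex account limit policy described earlier in this section (per-window toggles, a 90% threshold, rotation, and `resetAt` recovery) can be modeled as follows. This is a sketch of the documented rules only; the `Window` and `CodexAccount` types and the function names are hypothetical, and OmniRoute's internal data model may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence

USAGE_THRESHOLD = 0.90  # an enabled window at >=90% usage blocks the account


@dataclass
class Window:
    enabled: bool       # the ON/OFF policy toggle
    usage: float        # fraction of the window consumed, 0.0 to 1.0
    reset_at: datetime  # provider-reported resetAt time


@dataclass
class CodexAccount:
    name: str
    five_hour: Window
    weekly: Window


def window_blocks(window: Window, now: datetime) -> bool:
    """A window blocks only if it is enabled, not yet reset, and at threshold."""
    if not window.enabled:
        return False
    if now >= window.reset_at:
        return False  # resetAt passed: the window no longer counts
    return window.usage >= USAGE_THRESHOLD


def is_eligible(account: CodexAccount, now: datetime) -> bool:
    """An account is skipped when either enabled window reaches the threshold."""
    return not (
        window_blocks(account.five_hour, now) or window_blocks(account.weekly, now)
    )


def next_eligible(
    accounts: Sequence[CodexAccount], now: datetime
) -> Optional[CodexAccount]:
    """Rotation: return the first account that is currently eligible."""
    for account in accounts:
        if is_eligible(account, now):
            return account
    return None


# Illustrative data: account "a" is over the 5h threshold, "b" has 5h disabled.
now = datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)
later = now + timedelta(hours=2)
accounts = [
    CodexAccount("a", Window(True, 0.95, later), Window(True, 0.10, later)),
    CodexAccount("b", Window(False, 0.95, later), Window(True, 0.50, later)),
]
```

Here `next_eligible(accounts, now)` picks account `b`, and account `a` becomes eligible again once the clock reaches its `reset_at`, with no manual re-enable.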
**🔑 API Key Providers**

### NVIDIA NIM (FREE developer access, 70+ models)

1. Sign up: [build.nvidia.com](https://build.nvidia.com)
2. Get free API key (1000 inference credits included)
3. Dashboard → Add Provider → NVIDIA NIM:
   - API Key: `nvapi-your-key`

**Models:** `nvidia/llama-3.3-70b-instruct`, `nvidia/mistral-7b-instruct`, and 50+ more

**Pro Tip:** The API is OpenAI-compatible, so it works seamlessly with OmniRoute's format translation!

### DeepSeek

1. Sign up: [platform.deepseek.com](https://platform.deepseek.com)
2. Get API key
3. Dashboard → Add Provider → DeepSeek

**Models:** `deepseek/deepseek-chat`, `deepseek/deepseek-coder`

### Groq (Free Tier Available!)

1. Sign up: [console.groq.com](https://console.groq.com)
2. Get API key (free tier included)
3. Dashboard → Add Provider → Groq

**Models:** `groq/llama-3.3-70b`, `groq/mixtral-8x7b`

**Pro Tip:** Ultra-fast inference, best for real-time coding!

### OpenRouter (100+ Models)

1. Sign up: [openrouter.ai](https://openrouter.ai)
2. Get API key
3. Dashboard → Add Provider → OpenRouter

**Models:** Access 100+ models from all major providers through a single API key.

**Dashboard behavior:** OpenRouter models are managed from **Available Models**. Manual add, import, and auto-sync all update the same list.
**💰 Cheap Providers (Backup)**

### GLM-4.7 (Daily reset, $0.6/1M)

1. Sign up: [Zhipu AI](https://open.bigmodel.cn/)
2. Get API key from Coding Plan
3. Dashboard → Add API Key:
   - Provider: `glm`
   - API Key: `your-key`

**Use:** `glm/glm-4.7`

**Pro Tip:** Coding Plan offers 3× quota at 1/7 cost! Reset daily 10:00 AM.

### MiniMax M2.1 (5h reset, $0.20/1M)

1. Sign up: [MiniMax](https://www.minimax.io/)
2. Get API key
3. Dashboard → Add API Key

**Use:** `minimax/MiniMax-M2.1`

**Pro Tip:** Cheapest option for long context (1M tokens)!

### Kimi K2 ($9/month flat)

1. Subscribe: [Moonshot AI](https://platform.moonshot.ai/)
2. Get API key
3. Dashboard → Add API Key

**Use:** `kimi/kimi-latest`

**Pro Tip:** Fixed $9/month for 10M tokens = $0.90/1M effective cost!
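
The price comparisons in this section come down to simple arithmetic, sketched below. The prices and the 10M-token Kimi allotment are the figures quoted above; the function names are illustrative, and real bills may differ (for example if input and output tokens are priced separately).

```python
def effective_cost_per_million(monthly_fee_usd: float, tokens_included: int) -> float:
    """Flat-fee plan: effective USD per 1M tokens if the allotment is fully used."""
    if tokens_included <= 0:
        raise ValueError("tokens_included must be positive")
    return monthly_fee_usd * 1_000_000 / tokens_included


def metered_cost(tokens_used: int, price_per_million_usd: float) -> float:
    """Pay-per-token plan: USD cost for a given number of tokens."""
    return tokens_used * price_per_million_usd / 1_000_000


# Per-1M prices quoted in this section.
PRICES = {"glm/glm-4.7": 0.6, "minimax/MiniMax-M2.1": 0.20}

kimi_effective = effective_cost_per_million(9.0, 10_000_000)
ten_million_on_glm = metered_cost(10_000_000, PRICES["glm/glm-4.7"])
ten_million_on_minimax = metered_cost(10_000_000, PRICES["minimax/MiniMax-M2.1"])
```

Note that the flat Kimi plan only reaches its $0.90/1M effective rate when the whole 10M-token allotment is consumed; at lower volumes the effective rate is higher.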
**🆓 FREE Providers (Emergency Backup)**

### Qoder (5 FREE models via OAuth)

```bash
Dashboard → Connect Qoder
→ Qoder OAuth login
→ Unlimited usage

Models:
  if/kimi-k2-thinking
  if/qwen3-coder-plus
  if/glm-4.7
  if/minimax-m2
  if/deepseek-r1
```

### Qwen (4 FREE models via Device Code)

```bash
Dashboard → Connect Qwen
→ Device code authorization
→ Unlimited usage

Models:
  qw/qwen3-coder-plus
  qw/qwen3-coder-flash
```

### Kiro (Claude FREE)

```bash
Dashboard → Connect Kiro
→ AWS Builder ID or Google/GitHub
→ Unlimited usage

Models:
  kr/claude-sonnet-4.5
  kr/claude-haiku-4.5
```
**🎨 Create Combos**

### Example 1: Maximize Subscription → Cheap Backup

```
Dashboard → Combos → Create New

Name: premium-coding
Models:
  1. cc/claude-opus-4-7 (Subscription primary)
  2. glm/glm-4.7 (Cheap backup, $0.6/1M)
  3. minimax/MiniMax-M2.1 (Cheapest fallback, $0.20/1M)

Use in CLI: premium-coding
```

### Example 2: Free-Only (Zero Cost)

```
Name: free-combo
Models:
  1. gc/gemini-3-flash-preview (180K free/month)
  2. if/kimi-k2-thinking (unlimited)
  3. qw/qwen3-coder-plus (unlimited)

Cost: $0 forever!
```
**🔧 CLI Integration**

### Cursor IDE

```
Settings → Models → Advanced:
  OpenAI API Base URL: http://localhost:20128/v1
  OpenAI API Key: [from OmniRoute dashboard]
  Model: cc/claude-opus-4-7
```

### Claude Code

Use the **CLI Tools** page in the dashboard for one-click configuration, or edit `~/.claude/settings.json` manually.

### Codex CLI

```bash
export OPENAI_BASE_URL="http://localhost:20128"
export OPENAI_API_KEY="your-omniroute-api-key"

codex "your prompt"
```

### OpenClaw

**Option 1: Dashboard (recommended):**

```
Dashboard → CLI Tools → OpenClaw → Select Model → Apply
```

**Option 2: Manual:** Edit `~/.openclaw/openclaw.json`:

```json
{
  "models": {
    "providers": {
      "omniroute": {
        "baseUrl": "http://127.0.0.1:20128/v1",
        "apiKey": "sk_omniroute",
        "api": "openai-completions"
      }
    }
  }
}
```

> **Note:** OpenClaw only works with local OmniRoute. Use `127.0.0.1` instead of `localhost` to avoid IPv6 resolution issues.

### Cline / Continue / RooCode

```
Settings → API Configuration:
  Provider: OpenAI Compatible
  Base URL: http://localhost:20128/v1
  API Key: [from OmniRoute dashboard]
  Model: if/kimi-k2-thinking
```

### OpenCode

**Step 1:** Add OmniRoute as a custom provider:

```bash
opencode
/connect
# Select "Other" → Enter ID: "omniroute" → Enter your OmniRoute API key
```

**Step 2:** Create/edit `opencode.json` in your project root:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "omniroute": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "OmniRoute",
      "options": {
        "baseURL": "http://localhost:20128/v1"
      },
      "models": {
        "cc/claude-sonnet-4-20250514": { "name": "Claude Sonnet 4" },
        "gg/gemini-2.5-pro": { "name": "Gemini 2.5 Pro" },
        "if/kimi-k2-thinking": { "name": "Kimi K2 (Free)" }
      }
    }
  }
}
```

**Step 3:** Select the model in OpenCode:

```bash
/models
# Select any OmniRoute model from the list
```

> **Tip:** Add any model available in your OmniRoute `/v1/models` endpoint to the `models` section.
Use the format `provider/model-id` from your OmniRoute dashboard.
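
If you maintain `opencode.json` by hand, the `models` section can be generated from a list of `provider/model-id` strings. The sketch below mirrors the config shown in Step 2; `opencode_models` and `opencode_config` are hypothetical helpers, and using the model id's suffix as the display name is an assumption (the example above uses friendlier names).

```python
import json


def opencode_models(model_ids: list[str]) -> dict[str, dict[str, str]]:
    """Map `provider/model-id` strings to the `models` section of opencode.json."""
    models: dict[str, dict[str, str]] = {}
    for model_id in model_ids:
        provider, sep, name = model_id.partition("/")
        if not sep or not provider or not name:
            raise ValueError(f"expected provider/model-id, got {model_id!r}")
        # Assumption: fall back to the raw model name as the display name.
        models[model_id] = {"name": name}
    return models


def opencode_config(
    model_ids: list[str], base_url: str = "http://localhost:20128/v1"
) -> dict:
    """Assemble the provider block from Step 2 around the generated models."""
    return {
        "$schema": "https://opencode.ai/config.json",
        "provider": {
            "omniroute": {
                "npm": "@ai-sdk/openai-compatible",
                "name": "OmniRoute",
                "options": {"baseURL": base_url},
                "models": opencode_models(model_ids),
            }
        },
    }


config = opencode_config(["if/kimi-k2-thinking", "gc/gemini-2.5-pro"])
config_text = json.dumps(config, indent=2)
```

Writing `config_text` to `opencode.json` in the project root gives the same structure as the hand-written example, with one entry per model id.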
---

## Troubleshooting
Click to expand troubleshooting guide **"Language model did not provide messages"** - Provider quota exhausted โ†’ Check dashboard quota tracker - Solution: Use combo fallback or switch to cheaper tier **Rate limiting** - Subscription quota out โ†’ Fallback to GLM/MiniMax - Add combo: `cc/claude-opus-4-7 โ†’ glm/glm-4.7 โ†’ if/kimi-k2-thinking` **OAuth token expired** - Auto-refreshed by OmniRoute - If issues persist: Dashboard โ†’ Provider โ†’ Reconnect **High costs** - Check usage stats in Dashboard โ†’ Costs - Switch primary model to GLM/MiniMax - Use free tier (Gemini CLI, Qoder) for non-critical tasks **Dashboard/API ports are wrong** - `PORT` is the canonical base port (and API port by default) - `API_PORT` overrides only OpenAI-compatible API listener - `DASHBOARD_PORT` overrides only dashboard/Next.js listener - Set `NEXT_PUBLIC_BASE_URL` to your dashboard/public URL (for OAuth callbacks) **Cloud sync errors** - Verify `BASE_URL` points to your running instance - Verify `CLOUD_URL` points to your expected cloud endpoint - Keep `NEXT_PUBLIC_*` values aligned with server-side values **First login not working** - Check `INITIAL_PASSWORD` in `.env` - If unset, fallback password is `123456` **No request logs** - `call_logs` in SQLite stores summary metadata for the Request Logs table and analytics views - Detailed request/response payloads are written to `DATA_DIR/call_logs/` as one JSON artifact per request - Enable pipeline capture from Dashboard โ†’ Logs โ†’ Request Logs if you need detailed per-stage payloads - `Export Logs` reads the artifact files on demand, while `Export All` includes the `call_logs/` directory alongside `storage.sqlite` - Set `APP_LOG_TO_FILE=true` if you also want application console logs in `logs/application/app.log` - Adjust `APP_LOG_MAX_FILE_SIZE`, `APP_LOG_RETENTION_DAYS`, `APP_LOG_MAX_FILES`, and `CALL_LOG_MAX_ENTRIES` as needed **Connection test shows "Invalid" for OpenAI-compatible providers** - Many providers don't expose a 
`/models` endpoint - OmniRoute v1.0.6+ includes fallback validation via chat completions - Ensure base URL includes `/v1` suffix ### ๐Ÿ” OAuth on a Remote Server > **โš ๏ธ Important for users running OmniRoute on a VPS, Docker, or any remote server** #### Why does Antigravity / Gemini CLI OAuth fail on remote servers? The **Antigravity** and **Gemini CLI** providers use **Google OAuth 2.0**. Google requires the `redirect_uri` in the OAuth flow to exactly match one of the pre-registered URIs in the app's Google Cloud Console. The OAuth credentials bundled in OmniRoute are registered **for `localhost` only**. When you access OmniRoute on a remote server (e.g. `https://omniroute.myserver.com`), Google rejects the authentication with: ``` Error 400: redirect_uri_mismatch ``` #### Solution: Configure your own OAuth credentials You need to create an **OAuth 2.0 Client ID** in Google Cloud Console with your server's URI. #### Step-by-step **1. Open Google Cloud Console** Go to: [https://console.cloud.google.com/apis/credentials](https://console.cloud.google.com/apis/credentials) **2. Create a new OAuth 2.0 Client ID** - Click **"+ Create Credentials"** โ†’ **"OAuth client ID"** - Application type: **"Web application"** - Name: anything you like (e.g. `OmniRoute Remote`) **3. Add Authorized Redirect URIs** In the **"Authorized redirect URIs"** field, add: ``` https://your-server.com/callback ``` > Replace `your-server.com` with your server's domain or IP (include the port if needed, e.g. `http://45.33.32.156:20128/callback`). **4. Save and copy the credentials** After creating, Google will show the **Client ID** and **Client Secret**. **5. 
Set environment variables** In your `.env` (or Docker environment variables): ```bash # For Antigravity: ANTIGRAVITY_OAUTH_CLIENT_ID=your-client-id.apps.googleusercontent.com ANTIGRAVITY_OAUTH_CLIENT_SECRET=GOCSPX-your-secret # For Gemini CLI: GEMINI_OAUTH_CLIENT_ID=your-client-id.apps.googleusercontent.com GEMINI_OAUTH_CLIENT_SECRET=GOCSPX-your-secret GEMINI_CLI_OAUTH_CLIENT_SECRET=GOCSPX-your-secret ``` **6. Restart OmniRoute** ```bash # npm: npm run dev # Docker: docker restart omniroute ``` **7. Try connecting again** Dashboard โ†’ Providers โ†’ Antigravity (or Gemini CLI) โ†’ OAuth Google will now redirect correctly to `https://your-server.com/callback`. --- #### Temporary workaround (without custom credentials) If you don't want to set up your own credentials right now, you can still use the **manual URL flow**: 1. OmniRoute opens the Google authorization URL 2. After authorizing, Google tries to redirect to `localhost` (which fails on the remote server) 3. **Copy the full URL** from your browser's address bar (even if the page doesn't load) 4. Paste that URL into the field shown in the OmniRoute connection modal 5. Click **"Connect"** > This works because the authorization code in the URL is valid regardless of whether the redirect page loaded. ---
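
The manual URL flow works because the pasted address still carries the authorization code as a query parameter. The sketch below shows how such a URL can be parsed; `code` and `error` are the standard OAuth 2.0 redirect parameters, but `extract_authorization_code` is a hypothetical helper and not OmniRoute's own implementation.

```python
from urllib.parse import parse_qs, urlparse


def extract_authorization_code(pasted_url: str) -> str:
    """Pull the OAuth 2.0 authorization code out of a pasted redirect URL."""
    query = parse_qs(urlparse(pasted_url.strip()).query)
    if "error" in query:
        # The provider redirected with an error instead of a code.
        raise ValueError(f"authorization failed: {query['error'][0]}")
    codes = query.get("code")
    if not codes:
        raise ValueError("no `code` parameter found in the pasted URL")
    return codes[0]
```

For example, a pasted URL such as `http://localhost:20128/callback?code=4%2F0AbCd&state=xyz` (an illustrative value) yields the percent-decoded code `4/0AbCd`, even though the page itself never loaded.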
---
## 🛠️ Tech Stack
- **Runtime**: Node.js 18–22 LTS (⚠️ Node.js 24+ is **not supported** because `better-sqlite3` native binaries are incompatible)
- **Language**: TypeScript 5.9, **100% TypeScript** across `src/` and `open-sse/` (zero `any` in core modules since v2.0)
- **Framework**: Next.js 16 + React 19 + Tailwind CSS 4
- **Database**: better-sqlite3 (SQLite) + LowDB (JSON legacy) for domain state, proxy logs, MCP audit, routing decisions, memory, and skills
- **Schemas**: Zod (MCP tool I/O validation, API contracts)
- **Protocols**: MCP (stdio/HTTP) + A2A v0.3 (JSON-RPC 2.0 + SSE)
- **Streaming**: Server-Sent Events (SSE)
- **Auth**: OAuth 2.0 (PKCE) + JWT + API Keys + MCP Scoped Authorization
- **Testing**: Node.js test runner + Vitest (900+ tests including unit, integration, E2E)
- **CI/CD**: GitHub Actions (auto npm publish + Docker Hub on release)
- **Website**: [omniroute.online](https://omniroute.online)
- **Package**: [npmjs.com/package/omniroute](https://www.npmjs.com/package/omniroute)
- **Docker**: [hub.docker.com/r/diegosouzapw/omniroute](https://hub.docker.com/r/diegosouzapw/omniroute)
- **Resilience**: Circuit breaker, exponential backoff, anti-thundering herd, TLS spoofing, auto-combo self-healing
---

## Documentation

| Document                                                 | Description                                         |
| -------------------------------------------------------- | --------------------------------------------------- |
| [User Guide](docs/USER_GUIDE.md)                         | Providers, combos, CLI integration, deployment      |
| [API Reference](docs/API_REFERENCE.md)                   | All endpoints with examples                         |
| [MCP Server](open-sse/mcp-server/README.md)              | 25 MCP tools, IDE configs, Python/TS/Go clients     |
| [A2A Server](src/lib/a2a/README.md)                      | JSON-RPC 2.0 protocol, skills, streaming, task mgmt |
| [Auto-Combo Engine](docs/auto-combo.md)                  | 6-factor scoring, mode packs, self-healing          |
| [Context Relay](docs/features/context-relay.md)          | Session handoff strategy for account rotation       |
| [Troubleshooting](docs/TROUBLESHOOTING.md)               | Common problems and solutions                       |
| [Architecture](docs/ARCHITECTURE.md)                     | System architecture and internals                   |
| [Codebase Documentation](docs/CODEBASE_DOCUMENTATION.md) | Beginner-friendly codebase walkthrough              |
| [Uninstall Guide](docs/UNINSTALL.md)                     | Clean removal for all install methods               |
| [Environment Config](docs/ENVIRONMENT.md)                | Complete `.env` variables and references            |
| [Contributing](CONTRIBUTING.md)                          | Development setup and guidelines                    |
| [OpenAPI Spec](docs/openapi.yaml)                        | OpenAPI 3.0 specification                           |
| [Security Policy](SECURITY.md)                           | Vulnerability reporting and security practices      |
| [VM Deployment](docs/VM_DEPLOYMENT_GUIDE.md)             | Complete guide: VM + nginx + Cloudflare setup       |
| [Features Gallery](docs/FEATURES.md)                     | Visual dashboard tour with screenshots              |
| [Release Checklist](docs/RELEASE_CHECKLIST.md)           | Pre-release validation steps                        |

---

## 🗺️ Roadmap

OmniRoute has **218+ features planned** across multiple development phases.
Here are the key areas:

| Category | Planned Features | Highlights |
| ----------------------------- | ---------------- | ----------------------------------------------------------------------------------------------------- |
| 🧠 **Routing & Intelligence** | 25+ | Lowest-latency routing, tag-based routing, quota preflight, quota-aware P2C, step-based combo routing |
| 🔒 **Security & Compliance** | 20+ | SSRF hardening, credential cloaking, rate-limit per endpoint, management key scoping |
| 📊 **Observability** | 15+ | OpenTelemetry integration, real-time quota monitoring, combo target health, cost tracking per model |
| 🔄 **Provider Integrations** | 20+ | Dynamic model registry, connection cooldowns, multi-account Codex, Copilot quota parsing |
| ⚡ **Performance** | 15+ | Dual cache layer, prompt cache, response cache, streaming keepalive, batch API |
| 🌐 **Ecosystem** | 10+ | WebSocket API, config hot-reload, distributed config store, commercial mode |

### 🔜 Coming Soon

- 🔗 **OpenCode Integration**: Native provider support for the OpenCode AI coding IDE
- 🔗 **TRAE Integration**: Full support for the TRAE AI development framework
- 📦 **Batch API**: Asynchronous batch processing for bulk requests
- 🎯 **Tag-Based Routing**: Route requests based on custom tags and metadata
- 💰 **Lowest-Cost Strategy**: Automatically select the cheapest available provider

> 📁 Full feature specifications available in [`docs/new-features/`](docs/new-features/) (217 detailed specs)

---

## 👥 Contributors

[![Contributors](https://contrib.rocks/image?repo=diegosouzapw/OmniRoute&max=100&columns=20&anon=1)](https://github.com/diegosouzapw/OmniRoute/graphs/contributors)

### How to Contribute

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.

### Releasing a New Version

```bash
# Create a release; npm publish happens automatically
gh release create v2.0.0 --title "v2.0.0" --generate-notes
```

---

## 📊 Star History

Star History Chart

## 🌐 StarMapper

StarMapper

## 🙏 Acknowledgments

Special thanks to **[9router](https://github.com/decolua/9router)** by **[decolua](https://github.com/decolua)**, the original project that inspired this fork. OmniRoute builds upon that incredible foundation with additional features, multi-modal APIs, and a full TypeScript rewrite.

Special thanks to **[CLIProxyAPI](https://github.com/router-for-me/CLIProxyAPI)**, the original Go implementation that inspired this JavaScript port.

---

## License

MIT License - see [LICENSE](LICENSE) for details.

---
Built with ❤️ for developers who code 24/7
omniroute.online