mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-25 15:03:46 +00:00
SSE Proxy Decoupling (ADR-130): - Fix ruvbrain-sse proxy: proper MCP handshake, session creation, drain polling - Fix internal queue endpoints: session_create keeps receiver, drain returns buffered messages - Add response_queues to AppState for SSE proxy communication - Skip sparsifier for >5M edge graphs (was crashing on 16M edges) - Add SSE_DISABLED/MAX_SSE env vars for configurable connection limits - Route SSE to dedicated mcp.pi.ruv.io subdomain (Cloudflare CNAME) - Serve SSE at root / path on proxy (no /sse needed) - Update all references from pi.ruv.io/sse to mcp.pi.ruv.io - Fix Dockerfile consciousness crate build (feature/version mismatches) Claude Code CLI Source Research (ADR-133): - 19 research documents analyzing Claude Code internals (3000+ lines) - Decompiler script + RVF corpus builder for all major versions - Binary RVF containers for v0.2, v1.0, v2.0, v2.1 (300-2068 vectors each) - Call graphs, class hierarchies, state machines from minified source Integration Strategy (ADR-134): - 6-tier integration plan: WASM MCP, agents, hooks, cache, SDK, plugin - Integration guide with architecture diagrams and performance targets Co-Authored-By: claude-flow <ruv@ruv.net>
5.7 KiB
5.7 KiB
Claude Code CLI: Models and API Integration
Supported Models
Model References Found in Source
| Model ID | Family | Notes |
|---|---|---|
claude-opus-4-6 |
Opus | Latest (current default for complex) |
claude-opus-4-5 |
Opus | |
claude-opus-4-5-20251101 |
Opus | Dated release |
claude-opus-4-1 |
Opus | |
claude-opus-4-1-20250805 |
Opus | Dated release |
claude-opus-4-0 |
Opus | |
claude-opus-4 |
Opus | Alias |
claude-opus-4-20250514 |
Opus | Dated release |
claude-4-opus-20250514 |
Opus | Legacy naming |
claude-sonnet-4-6 |
Sonnet | Latest Sonnet |
claude-sonnet-4-5 |
Sonnet | |
claude-sonnet-4-5-20250929 |
Sonnet | Dated release |
claude-sonnet-4 |
Sonnet | Alias |
claude-sonnet-4-20250514 |
Sonnet | Dated release |
claude-sonnet-3-7 |
Sonnet | Legacy |
claude-3-7-sonnet-20250219 |
Sonnet | Legacy naming |
claude-3-5-sonnet-20241022 |
Sonnet | Legacy |
claude-3-sonnet-20240229 |
Sonnet | Legacy |
claude-haiku-4-5 |
Haiku | |
claude-haiku-4-5-20251001 |
Haiku | Dated release |
claude-haiku-4 |
Haiku | Alias |
claude-haiku-3-5 |
Haiku | Legacy |
claude-3-5-haiku-20241022 |
Haiku | Legacy naming |
claude-instant-1.1 |
Instant | Legacy |
claude-instant-1.2 |
Instant | Legacy |
claude-code-20250219 |
Code | Specialized code model |
claude-3-opus-20240229 |
Opus | Legacy |
Model Selection
| Mechanism | Priority | Description |
|---|---|---|
--model CLI flag |
Highest | Runtime override |
ANTHROPIC_MODEL env var |
High | Environment override |
model in settings |
Medium | Persistent config |
availableModels allowlist |
- | Restricts options |
mainLoopModel |
- | Internal selection |
| Built-in default | Lowest | Fallback |
Model Aliases
Users can specify short aliases: "opus", "sonnet", "haiku".
The availableModels allowlist accepts these aliases, which map to
the latest model in each family.
Model Overrides
modelOverrides maps Anthropic model IDs to provider-specific IDs:
{
"modelOverrides": {
"claude-sonnet-4-6": "us.anthropic.claude-sonnet-4-6-v1:0"
}
}
API Integration
Anthropic Direct API
- Endpoint:
ANTHROPIC_BASE_URL(default:https://api.anthropic.com) - Authentication:
ANTHROPIC_API_KEYor OAuth token - Unix socket:
ANTHROPIC_UNIX_SOCKETfor local proxying
API Endpoints Used
| Endpoint | Purpose |
|---|---|
/v1/messages |
Main conversation API (streaming) |
/v1/messages/count_tokens |
Token counting |
/v1/messages/batches |
Batch processing |
/v1/models |
Model listing |
/v1/complete |
Legacy completion |
/v1/token |
Token validation |
/v1/files |
File management |
/v1/code/upstreamproxy/ws |
WebSocket proxy |
/v2/session_ingress/shttp/mcp/ |
MCP session ingress |
Provider Backends
| Provider | Client | Auth |
|---|---|---|
| Anthropic Direct | Anthropic (SDK) |
API key / OAuth |
| AWS Bedrock | BedrockClient / BedrockRuntimeClient |
AWS IAM |
| Google Vertex AI | Native HTTP | GCP credentials |
| Azure Foundry | Native HTTP | Azure credentials |
| Anthropic AWS | AnthropicAws |
Hybrid auth |
Prompt Caching
Anthropic's prompt caching reduces repeat token costs:
cache_control: { type: "ephemeral" }-- Standard cache- 1-hour cache on Bedrock (
ENABLE_PROMPT_CACHING_1H_BEDROCK) - Per-model disable:
DISABLE_PROMPT_CACHING_HAIKU/SONNET/OPUS - Cache sharing:
promptCacheSharingEnabled - Token tracking:
promptCacheReadTokens,promptCacheWriteTokens
Retry and Fallback
CLAUDE_CODE_MAX_RETRIES-- Max API retry count--fallback-model-- Fallback model for overload (print mode only)FALLBACK_FOR_ALL_PRIMARY_MODELS-- Universal fallbackCLAUDE_CODE_SKIP_FAST_MODE_NETWORK_ERRORS-- Skip on network errorsCLAUDE_CODE_DISABLE_NONSTREAMING_FALLBACK-- Disable non-streaming fallback
Request Customization
ANTHROPIC_BETAS/--betas-- Beta feature headersANTHROPIC_CUSTOM_HEADERS-- Custom request headersCLAUDE_CODE_EXTRA_BODY-- Extra request body fieldsCLAUDE_CODE_EXTRA_METADATA-- Extra metadataAPI_TIMEOUT_MS-- Request timeoutCLAUDE_CODE_ATTRIBUTION_HEADER-- Attribution headerCLAUDE_CODE_STALL_TIMEOUT_MS_FOR_TESTING-- Stall detection
Structured Output
--json-schema-- Enforce JSON schema on outputMAX_STRUCTURED_OUTPUT_RETRIES-- Retry on schema validation failure
Token Management
MAX_THINKING_TOKENS-- Cap thinking tokensCLAUDE_CODE_MAX_OUTPUT_TOKENS-- Cap output tokensCLAUDE_CODE_FILE_READ_MAX_OUTPUT_TOKENS-- File read budgetCLAUDE_CODE_BLOCKING_LIMIT_OVERRIDE-- Rate limit override- Token counting via
/v1/messages/count_tokens
Effort Level
--effortCLI flag:low,medium,high,maxCLAUDE_CODE_EFFORT_LEVELenv vareffortLevelin settingsCLAUDE_CODE_ALWAYS_ENABLE_EFFORT-- Always apply effort level/effortslash command -- Interactive change
Thinking/Reasoning
alwaysThinkingEnabled-- Enable extended thinkingCLAUDE_CODE_DISABLE_THINKING-- Disable thinkingCLAUDE_CODE_DISABLE_ADAPTIVE_THINKING-- Disable adaptive modeDISABLE_INTERLEAVED_THINKING-- Disable interleaved thinkingMAX_THINKING_TOKENS-- Token budget for thinkingCLAUDE_CODE_DISABLE_FAST_MODE-- Disable fast mode shortcut
Fast Mode
Fast mode uses smaller/faster models for simple operations:
fastModesetting (boolean)fastModePerSessionOptIn-- Opt-in per session/fastslash command -- ToggleANTHROPIC_SMALL_FAST_MODEL-- Custom small modelANTHROPIC_SMALL_FAST_MODEL_AWS_REGION-- Region for small model