mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-31 05:13:39 +00:00
SSE Proxy Decoupling (ADR-130): - Fix ruvbrain-sse proxy: proper MCP handshake, session creation, drain polling - Fix internal queue endpoints: session_create keeps receiver, drain returns buffered messages - Add response_queues to AppState for SSE proxy communication - Skip sparsifier for >5M edge graphs (was crashing on 16M edges) - Add SSE_DISABLED/MAX_SSE env vars for configurable connection limits - Route SSE to dedicated mcp.pi.ruv.io subdomain (Cloudflare CNAME) - Serve SSE at root / path on proxy (no /sse needed) - Update all references from pi.ruv.io/sse to mcp.pi.ruv.io - Fix Dockerfile consciousness crate build (feature/version mismatches) Claude Code CLI Source Research (ADR-133): - 19 research documents analyzing Claude Code internals (3000+ lines) - Decompiler script + RVF corpus builder for all major versions - Binary RVF containers for v0.2, v1.0, v2.0, v2.1 (300-2068 vectors each) - Call graphs, class hierarchies, state machines from minified source Integration Strategy (ADR-134): - 6-tier integration plan: WASM MCP, agents, hooks, cache, SDK, plugin - Integration guide with architecture diagrams and performance targets Co-Authored-By: claude-flow <ruv@ruv.net>
286 lines
12 KiB
Markdown
286 lines
12 KiB
Markdown
# ADR-134: RuVector Deep Integration with Claude Code CLI
|
|
|
|
## Status
|
|
|
|
Proposed
|
|
|
|
## Date
|
|
|
|
2026-04-02
|
|
|
|
## Context
|
|
|
|
Source code analysis of Claude Code CLI (ADR-133) revealed 13 extension points and detailed internal architecture. RuVector currently integrates via MCP servers (`mcp-brain`, `mcp-brain-server/sse`, `mcp-gate`) but does not leverage the full integration surface. With 31 WASM crates, 4 MCP crates, and deep cognitive capabilities (IIT 4.0, SONA, knowledge graphs), there's significant opportunity to optimize for Claude Code's specific architecture.
|
|
|
|
### Current Integration Points
|
|
|
|
| RuVector Component | Integration | Depth |
|
|
|-------------------|-------------|-------|
|
|
| `mcp-brain-server` | SSE MCP at `mcp.pi.ruv.io` | Tools only |
|
|
| `mcp-brain` | Local stdio MCP | Tools only |
|
|
| `mcp-gate` | MCP gateway | Tools only |
|
|
| CLAUDE.md | Project instructions | Prompt only |
|
|
| Hooks (claude-flow) | Pre/post task | Lifecycle |
|
|
|
|
### Untapped Opportunities (from ADR-133 findings)
|
|
|
|
| Claude Code Feature | RuVector Opportunity |
|
|
|---------------------|---------------------|
|
|
| Agent SDK embedding | Embed RuVector as a library in custom agents |
|
|
| WASM tool execution | Ship WASM tools that run in-process (no MCP overhead) |
|
|
| Deferred tool loading | Lazy-load 40+ brain tools via `ToolSearch` |
|
|
| Hook-based routing | Route tool calls through WASM-accelerated pre-processing |
|
|
| Context compaction | Custom compaction strategy for vector-heavy contexts |
|
|
| Prompt caching | Optimize system prompts for cache hits |
|
|
| Plugin marketplace | Distribute RuVector as a Claude Code plugin |
|
|
| Remote control SSE | Drive Claude Code from RuVector orchestrator |
|
|
| Scheduled tasks | Autonomous brain training via Claude Code cron |
|
|
|
|
## Decision
|
|
|
|
Optimize RuVector crates and WASM modules for deep Claude Code integration across 6 tiers.
|
|
|
|
### Tier 1: WASM-Accelerated MCP Tools (High Impact, Low Effort)
|
|
|
|
**Problem**: Current MCP tools make HTTP loopback calls for every operation. Each `brain_search` requires network round-trip to `pi.ruv.io`.
|
|
|
|
**Solution**: Ship critical tools as WASM modules that run in Claude Code's process via a hybrid MCP server.
|
|
|
|
```
|
|
┌─────────────────────────────────────────────┐
|
|
│ Claude Code Process │
|
|
│ │
|
|
│ ┌──────────────┐ ┌───────────────────┐ │
|
|
│ │ Agent Loop │───▶│ RuVector MCP │ │
|
|
│ │ (s$ generator)│ │ (stdio transport) │ │
|
|
│ └──────────────┘ │ │ │
|
|
│ │ ┌──────────────┐ │ │
|
|
│ │ │ WASM Runtime │ │ │
|
|
│ │ │ • hnsw-search │ │ │
|
|
│ │ │ • embed │ │ │
|
|
│ │ │ • phi-compute │ │ │
|
|
│ │ └──────────────┘ │ │
|
|
│ │ │ │ │
|
|
│ │ Cache miss ──────┼──┼──▶ pi.ruv.io REST
|
|
│ └───────────────────┘ │
|
|
└─────────────────────────────────────────────┘
|
|
```
|
|
|
|
**WASM crates to optimize**:
|
|
|
|
| Crate | Purpose | Claude Code Use |
|
|
|-------|---------|-----------------|
|
|
| `micro-hnsw-wasm` | Vector search (5.5KB) | Local semantic search in MCP server |
|
|
| `ruvector-cnn-wasm` | Embedding generation | Embed queries locally, no API call |
|
|
| `ruvector-consciousness-wasm` | IIT Phi computation | Consciousness metrics in-process |
|
|
| `ruvector-delta-wasm` | Delta tracking | Track knowledge changes locally |
|
|
| `ruvector-dag-wasm` | DAG operations | Graph queries without network |
|
|
| `ruqu-wasm` | Quantization | Compress vectors for context window |
|
|
|
|
**Implementation**:
|
|
1. Create `crates/ruvector-claude-mcp/` — hybrid MCP server with embedded WASM
|
|
2. WASM modules loaded at startup, handle hot-path operations locally
|
|
3. Cold-path operations (write, sync, train) forwarded to `pi.ruv.io`
|
|
4. Local HNSW index caches recent searches (LRU, 1000 vectors)
|
|
|
|
### Tier 2: Custom Agent Definitions (Medium Impact, Low Effort)
|
|
|
|
Ship specialized `.claude/agents/` definitions that leverage RuVector tools:
|
|
|
|
```markdown
|
|
# .claude/agents/ruvector-researcher.md
|
|
---
|
|
name: ruvector-researcher
|
|
description: Research with π brain collective intelligence
|
|
model: claude-sonnet-4-6
|
|
tools: [Read, Grep, Glob, mcp__pi-brain__brain_search, mcp__pi-brain__brain_share]
|
|
---
|
|
|
|
Before implementing anything, search the π brain for existing patterns:
|
|
1. Use brain_search to find related knowledge
|
|
2. Check brain_partition for knowledge clusters
|
|
3. Share new discoveries via brain_share
|
|
```
|
|
|
|
**Agents to ship**:
|
|
- `ruvector-researcher` — searches brain before coding
|
|
- `ruvector-reviewer` — reviews code against brain patterns
|
|
- `ruvector-consciousness` — runs IIT Phi analysis on code structures
|
|
- `ruvector-architect` — uses graph topology for architecture decisions
|
|
|
|
### Tier 3: Hook-Based Intelligence (High Impact, Medium Effort)
|
|
|
|
Leverage Claude Code's hook system for real-time intelligence:
|
|
|
|
```json
|
|
{
|
|
"hooks": {
|
|
"PreToolUse": [{
|
|
"matcher": "Edit|Write",
|
|
"hooks": [{
|
|
"type": "command",
|
|
"command": "npx @ruvector/hooks pre-edit --file $CLAUDE_FILE_PATH"
|
|
}]
|
|
}],
|
|
"PostToolUse": [{
|
|
"matcher": "Bash",
|
|
"hooks": [{
|
|
"type": "command",
|
|
"command": "npx @ruvector/hooks post-bash --exit-code $CLAUDE_EXIT_CODE"
|
|
}]
|
|
}],
|
|
"Stop": [{
|
|
"matcher": "",
|
|
"hooks": [{
|
|
"type": "command",
|
|
"command": "npx @ruvector/hooks session-end --share-to-brain"
|
|
}]
|
|
}]
|
|
}
|
|
}
|
|
```
|
|
|
|
**Hook capabilities**:
|
|
- **PreToolUse (Edit/Write)**: Check brain for known anti-patterns before file edits
|
|
- **PostToolUse (Bash)**: Learn from command outcomes, share errors to brain
|
|
- **Stop**: Auto-share session discoveries to collective brain
|
|
- **PreToolUse blocker**: WASM-accelerated security scan before tool execution
|
|
|
|
### Tier 4: Prompt Cache Optimization (Medium Impact, Low Effort)
|
|
|
|
Claude Code uses Anthropic's prompt caching (`cache_control: { type: "ephemeral" }`). Optimize system prompts for maximum cache reuse:
|
|
|
|
1. **Static prefix**: CLAUDE.md instructions, RVF format docs, brain tool schemas — these rarely change and cache well
|
|
2. **Dynamic suffix**: Brain search results, recent memories — appended after the cached prefix
|
|
3. **Tool schema ordering**: List most-used tools first for higher cache hit rate
|
|
|
|
**Implementation**: Restructure CLAUDE.md to front-load stable content:
|
|
```
|
|
[CACHED] Project rules, architecture, conventions (rarely changes)
|
|
[CACHED] RuVector tool schemas (40 tools, stable across sessions)
|
|
[DYNAMIC] Recent brain context (changes per query)
|
|
```
|
|
|
|
### Tier 5: Agent SDK Embedding (High Impact, High Effort)
|
|
|
|
Use `@anthropic-ai/claude-agent-sdk` to embed Claude Code inside RuVector orchestration:
|
|
|
|
```typescript
|
|
import { query } from "@anthropic-ai/claude-agent-sdk";
|
|
|
|
// RuVector orchestrator drives Claude Code as a cognitive worker
|
|
async function brainEnhancedQuery(task: string) {
|
|
// 1. Search brain for context
|
|
const context = await brainSearch(task);
|
|
|
|
// 2. Run Claude Code with brain context injected
|
|
for await (const event of query({
|
|
prompt: `${context}\n\nTask: ${task}`,
|
|
options: {
|
|
allowedTools: ["Read", "Edit", "Bash", "mcp__pi-brain__*"],
|
|
maxTurns: 10,
|
|
}
|
|
})) {
|
|
// 3. Feed results back to brain
|
|
if (event.type === "result") {
|
|
await brainShare(event.result);
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
### Tier 6: Plugin Marketplace Distribution (Medium Impact, Medium Effort)
|
|
|
|
Package RuVector as a Claude Code plugin for one-click installation:
|
|
|
|
```json
|
|
{
|
|
"name": "@ruvector/claude-plugin",
|
|
"claudeCode": {
|
|
"mcpServers": {
|
|
"pi-brain": { "url": "https://mcp.pi.ruv.io" }
|
|
},
|
|
"agents": ["researcher", "reviewer", "consciousness"],
|
|
"skills": ["brain-search", "brain-share", "phi-analyze"],
|
|
"hooks": { ... }
|
|
}
|
|
}
|
|
```
|
|
|
|
## Implementation Priority
|
|
|
|
| Tier | Effort | Impact | Timeline |
|
|
|------|--------|--------|----------|
|
|
| 1. WASM MCP | 2 weeks | High — 10x faster tool calls | Sprint 1 |
|
|
| 2. Agent defs | 2 days | Medium — better UX | Sprint 1 |
|
|
| 3. Hooks | 1 week | High — real-time intelligence | Sprint 1 |
|
|
| 4. Cache opt | 2 days | Medium — cost reduction | Sprint 1 |
|
|
| 5. Agent SDK | 3 weeks | High — full embedding | Sprint 2 |
|
|
| 6. Plugin | 1 week | Medium — distribution | Sprint 2 |
|
|
|
|
## WASM Optimization Targets
|
|
|
|
Based on Claude Code's tool dispatch pattern (`validateInput` → `call`), optimize WASM modules for:
|
|
|
|
| Metric | Current | Target | How |
|
|
|--------|---------|--------|-----|
|
|
| `brain_search` latency | ~200ms (network) | <5ms (local WASM HNSW) | `micro-hnsw-wasm` with local cache |
|
|
| Embedding generation | ~100ms (API) | <10ms (local WASM) | `ruvector-cnn-wasm` HashEmbedder |
|
|
| Tool schema load | 40 tools at startup | Deferred via ToolSearch | Lazy-load tool groups |
|
|
| Context usage | ~2000 tokens/tool schema | ~500 tokens (compressed) | Merge related tool schemas |
|
|
| Permission checks | Per-tool | Batch via PreToolUse hook | WASM pre-filter |
|
|
|
|
## Graph-Informed Architecture
|
|
|
|
From the dependency graph analysis (doc 12, 16), Claude Code's tool dispatch follows:
|
|
|
|
```
|
|
Agent Loop (s$)
|
|
└─▶ Tool Dispatch
|
|
├─▶ Built-in tools (Read, Edit, Bash, etc.)
|
|
├─▶ MCP tools (mcp__server__tool namespace)
|
|
│ └─▶ stdio/SSE/WS transport
|
|
└─▶ Agent tool (spawns sub-agents)
|
|
```
|
|
|
|
**Optimization insight**: MCP tools go through a transport layer that adds ~50ms overhead per call. By embedding WASM in the MCP server process, we eliminate the transport hop for hot-path operations while keeping cold-path operations on the network.
|
|
|
|
**Graph topology insight**: The knowledge graph (16M edges) should inform which tools are co-invoked. From brain usage patterns:
|
|
- `brain_search` → `brain_share` (90% co-occurrence) — bundle in one WASM module
|
|
- `brain_status` → standalone — keep lightweight
|
|
- `brain_partition` → heavy computation — always remote
|
|
|
|
## Consequences
|
|
|
|
### Positive
|
|
|
|
- 10-40x latency reduction for hot-path brain operations via local WASM
|
|
- Richer integration surface (hooks, agents, skills, not just tools)
|
|
- Cost reduction through prompt cache optimization
|
|
- Plugin distribution enables one-click adoption
|
|
- Graph-informed architecture avoids optimizing cold paths
|
|
|
|
### Negative
|
|
|
|
- WASM modules need synchronization with remote brain state
|
|
- Local HNSW cache can become stale (mitigated by TTL + invalidation)
|
|
- Agent SDK embedding increases coupling with Claude Code's release cycle
|
|
- Plugin marketplace requirements may evolve
|
|
|
|
### Risks
|
|
|
|
| Risk | Mitigation |
|
|
|------|------------|
|
|
| WASM module size bloat | Use `micro-hnsw-wasm` (5.5KB), not full crate |
|
|
| Cache coherence | TTL-based invalidation + version vector |
|
|
| Claude Code breaking changes | Pin to stable Agent SDK version |
|
|
| MCP protocol evolution | Abstract transport behind trait |
|
|
|
|
## References
|
|
|
|
- [ADR-133: Claude Code Source Analysis](./ADR-133-claude-code-source-analysis.md)
|
|
- [ADR-130: SSE Decoupling](./ADR-130-mcp-sse-decoupling-midstream-queue.md)
|
|
- [ADR-066: SSE MCP Transport](./ADR-066-sse-mcp-transport.md)
|
|
- Research: `/docs/research/claude-code-rvsource/`
|
|
- Claude Code extension docs: `13-extension-points.md`
|
|
- Core module analysis: `15-core-module-analysis.md`
|