ruvector/docs/adr/ADR-134-ruvector-claude-code-deep-integration.md
rUv 930fca916f feat(sse): decouple SSE to mcp.pi.ruv.io proxy + Claude Code source research
SSE Proxy Decoupling (ADR-130):
- Fix ruvbrain-sse proxy: proper MCP handshake, session creation, drain polling
- Fix internal queue endpoints: session_create keeps receiver, drain returns buffered messages
- Add response_queues to AppState for SSE proxy communication
- Skip sparsifier for >5M edge graphs (was crashing on 16M edges)
- Add SSE_DISABLED/MAX_SSE env vars for configurable connection limits
- Route SSE to dedicated mcp.pi.ruv.io subdomain (Cloudflare CNAME)
- Serve SSE at root / path on proxy (no /sse needed)
- Update all references from pi.ruv.io/sse to mcp.pi.ruv.io
- Fix Dockerfile consciousness crate build (feature/version mismatches)

Claude Code CLI Source Research (ADR-133):
- 19 research documents analyzing Claude Code internals (3000+ lines)
- Decompiler script + RVF corpus builder for all major versions
- Binary RVF containers for v0.2, v1.0, v2.0, v2.1 (300-2068 vectors each)
- Call graphs, class hierarchies, state machines from minified source

Integration Strategy (ADR-134):
- 6-tier integration plan: WASM MCP, agents, hooks, cache, SDK, plugin
- Integration guide with architecture diagrams and performance targets

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-02 23:39:56 +00:00


# ADR-134: RuVector Deep Integration with Claude Code CLI
## Status
Proposed
## Date
2026-04-02
## Context
Source code analysis of Claude Code CLI (ADR-133) revealed 13 extension points and detailed internal architecture. RuVector currently integrates via MCP servers (`mcp-brain`, `mcp-brain-server/sse`, `mcp-gate`) but does not leverage the full integration surface. With 31 WASM crates, 4 MCP crates, and deep cognitive capabilities (IIT 4.0, SONA, knowledge graphs), there's significant opportunity to optimize for Claude Code's specific architecture.
### Current Integration Points
| RuVector Component | Integration | Depth |
|-------------------|-------------|-------|
| `mcp-brain-server` | SSE MCP at `mcp.pi.ruv.io` | Tools only |
| `mcp-brain` | Local stdio MCP | Tools only |
| `mcp-gate` | MCP gateway | Tools only |
| CLAUDE.md | Project instructions | Prompt only |
| Hooks (claude-flow) | Pre/post task | Lifecycle |
### Untapped Opportunities (from ADR-133 findings)
| Claude Code Feature | RuVector Opportunity |
|---------------------|---------------------|
| Agent SDK embedding | Embed RuVector as a library in custom agents |
| WASM tool execution | Ship WASM tools that run in-process (no MCP overhead) |
| Deferred tool loading | Lazy-load 40+ brain tools via `ToolSearch` |
| Hook-based routing | Route tool calls through WASM-accelerated pre-processing |
| Context compaction | Custom compaction strategy for vector-heavy contexts |
| Prompt caching | Optimize system prompts for cache hits |
| Plugin marketplace | Distribute RuVector as a Claude Code plugin |
| Remote control SSE | Drive Claude Code from RuVector orchestrator |
| Scheduled tasks | Autonomous brain training via Claude Code cron |
## Decision
Optimize RuVector crates and WASM modules for deep Claude Code integration across 6 tiers.
### Tier 1: WASM-Accelerated MCP Tools (High Impact, Low Effort)
**Problem**: Current MCP tools make an HTTP loopback call for every operation. Each `brain_search` requires a network round trip to `pi.ruv.io`.
**Solution**: Ship critical tools as WASM modules that run in Claude Code's process via a hybrid MCP server.
```
┌─────────────────────────────────────────────┐
│ Claude Code Process                         │
│                                             │
│ ┌────────────────┐    ┌───────────────────┐ │
│ │ Agent Loop     │───▶│ RuVector MCP      │ │
│ │ (s$ generator) │    │ (stdio transport) │ │
│ └────────────────┘    │                   │ │
│                       │ ┌───────────────┐ │ │
│                       │ │ WASM Runtime  │ │ │
│                       │ │ • hnsw-search │ │ │
│                       │ │ • embed       │ │ │
│                       │ │ • phi-compute │ │ │
│                       │ └───────────────┘ │ │
│                       │         │         │ │
│                       │ Cache miss ───────┼─┼──▶ pi.ruv.io REST
│                       └───────────────────┘ │
└─────────────────────────────────────────────┘
```
**WASM crates to optimize**:
| Crate | Purpose | Claude Code Use |
|-------|---------|-----------------|
| `micro-hnsw-wasm` | Vector search (5.5KB) | Local semantic search in MCP server |
| `ruvector-cnn-wasm` | Embedding generation | Embed queries locally, no API call |
| `ruvector-consciousness-wasm` | IIT Phi computation | Consciousness metrics in-process |
| `ruvector-delta-wasm` | Delta tracking | Track knowledge changes locally |
| `ruvector-dag-wasm` | DAG operations | Graph queries without network |
| `ruqu-wasm` | Quantization | Compress vectors for context window |
**Implementation**:
1. Create `crates/ruvector-claude-mcp/` — hybrid MCP server with embedded WASM
2. WASM modules loaded at startup, handle hot-path operations locally
3. Cold-path operations (write, sync, train) forwarded to `pi.ruv.io`
4. Local HNSW index caches recent searches (LRU, 1000 vectors)
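The hot-path/cold-path split and the 1000-vector LRU cache from the steps above can be sketched as follows. This is a minimal illustration, not the planned implementation: `LruVectorCache`, `COLD_OPS`, and `routeOperation` are hypothetical names, and the sketch assumes that only the write, sync, and train operations named above are cold-path.

```typescript
// Hypothetical sketch: none of these names come from the RuVector codebase.
type Vector = Float32Array;

/** LRU cache built on Map, whose iteration order is insertion order. */
class LruVectorCache {
  private entries = new Map<string, Vector>();
  private capacity: number;

  constructor(capacity: number = 1000) {
    this.capacity = capacity;
  }

  get(id: string): Vector | undefined {
    const hit = this.entries.get(id);
    if (hit === undefined) return undefined;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(id);
    this.entries.set(id, hit);
    return hit;
  }

  set(id: string, vector: Vector): void {
    if (this.entries.has(id)) this.entries.delete(id);
    this.entries.set(id, vector);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// Assumption: mutations are cold-path, everything else is served locally.
const COLD_OPS = new Set(["write", "sync", "train"]);

function routeOperation(op: string): "local-wasm" | "remote" {
  return COLD_OPS.has(op) ? "remote" : "local-wasm";
}
```

A lookup that misses the local cache would still fall through to `pi.ruv.io`, as in the diagram; the sketch only covers eviction and routing.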
### Tier 2: Custom Agent Definitions (Medium Impact, Low Effort)
Ship specialized `.claude/agents/` definitions that leverage RuVector tools:
```markdown
# .claude/agents/ruvector-researcher.md
---
name: ruvector-researcher
description: Research with π brain collective intelligence
model: claude-sonnet-4-6
tools: [Read, Grep, Glob, mcp__pi-brain__brain_search, mcp__pi-brain__brain_partition, mcp__pi-brain__brain_share]
---
Before implementing anything, search the π brain for existing patterns:
1. Use brain_search to find related knowledge
2. Check brain_partition for knowledge clusters
3. Share new discoveries via brain_share
```
**Agents to ship**:
- `ruvector-researcher` — searches brain before coding
- `ruvector-reviewer` — reviews code against brain patterns
- `ruvector-consciousness` — runs IIT Phi analysis on code structures
- `ruvector-architect` — uses graph topology for architecture decisions
### Tier 3: Hook-Based Intelligence (High Impact, Medium Effort)
Leverage Claude Code's hook system for real-time intelligence:
```json
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{
        "type": "command",
        "command": "npx @ruvector/hooks pre-edit --file $CLAUDE_FILE_PATH"
      }]
    }],
    "PostToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "npx @ruvector/hooks post-bash --exit-code $CLAUDE_EXIT_CODE"
      }]
    }],
    "Stop": [{
      "matcher": "",
      "hooks": [{
        "type": "command",
        "command": "npx @ruvector/hooks session-end --share-to-brain"
      }]
    }]
  }
}
```
**Hook capabilities**:
- **PreToolUse (Edit/Write)**: Check brain for known anti-patterns before file edits
- **PostToolUse (Bash)**: Learn from command outcomes, share errors to brain
- **Stop**: Auto-share session discoveries to collective brain
- **PreToolUse blocker**: WASM-accelerated security scan before tool execution
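The decision logic behind a `pre-edit` hook might look like the sketch below. It rests on several assumptions that should be checked against the Claude Code hooks reference: that the hook receives a JSON payload with `tool_name` and `tool_input` fields, that `Write` carries `content` and `Edit` carries `new_string`, and that exit code 2 blocks the tool call. The `ANTI_PATTERNS` list is a hard-coded, hypothetical stand-in for a brain lookup.

```typescript
// Assumed payload shape; verify against the Claude Code hooks reference.
interface PreToolUsePayload {
  tool_name: string;
  tool_input: { file_path?: string; content?: string; new_string?: string };
}

interface HookDecision {
  exitCode: 0 | 2; // assumption: 0 lets the tool run, 2 blocks it
  message: string;
}

// Hypothetical stand-in for anti-patterns returned by a brain lookup.
const ANTI_PATTERNS: { pattern: RegExp; reason: string }[] = [
  { pattern: /\beval\s*\(/, reason: "eval() on dynamic input" },
  { pattern: /AKIA[0-9A-Z]{16}/, reason: "looks like a hard-coded AWS access key" },
];

function decidePreEdit(payload: PreToolUsePayload): HookDecision {
  if (payload.tool_name !== "Edit" && payload.tool_name !== "Write") {
    return { exitCode: 0, message: "not an edit; skipped" };
  }
  // Write supplies the whole file, Edit supplies only the replacement text.
  const text = payload.tool_input.content ?? payload.tool_input.new_string ?? "";
  for (const { pattern, reason } of ANTI_PATTERNS) {
    if (pattern.test(text)) {
      return { exitCode: 2, message: `blocked: ${reason}` };
    }
  }
  return { exitCode: 0, message: "ok" };
}
```

Keeping the decision in a pure function leaves the command-line wrapper to read the payload, call `decidePreEdit`, print the message, and exit with the returned code.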
### Tier 4: Prompt Cache Optimization (Medium Impact, Low Effort)
Claude Code uses Anthropic's prompt caching (`cache_control: { type: "ephemeral" }`). Optimize system prompts for maximum cache reuse:
1. **Static prefix**: CLAUDE.md instructions, RVF format docs, brain tool schemas — these rarely change and cache well
2. **Dynamic suffix**: Brain search results, recent memories — appended after the cached prefix
3. **Tool schema ordering**: Keep tool schemas in a fixed order, most-used first. Prompt caching matches on an exact prefix, so a stable order is what preserves cache hits
**Implementation**: Restructure CLAUDE.md to front-load stable content:
```
[CACHED] Project rules, architecture, conventions (rarely changes)
[CACHED] RuVector tool schemas (40 tools, stable across sessions)
[DYNAMIC] Recent brain context (changes per query)
```
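The cached-prefix layout above can be illustrated with the block shape the Anthropic Messages API accepts for `system` content. This is a sketch only: Claude Code assembles its own requests, and `buildSystemBlocks` is a hypothetical helper that shows where the cache breakpoint would sit relative to stable and per-query content.

```typescript
// Block shape modelled on Anthropic Messages API `system` text blocks.
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Hypothetical helper: stable parts first, per-query context last.
function buildSystemBlocks(staticParts: string[], dynamicContext: string): SystemBlock[] {
  const blocks: SystemBlock[] = staticParts.map(
    (text): SystemBlock => ({ type: "text", text }),
  );
  // Mark the last stable block: the cached prefix runs up to and including it.
  const lastStatic = blocks[blocks.length - 1];
  if (lastStatic) {
    lastStatic.cache_control = { type: "ephemeral" };
  }
  // Per-query content goes after the breakpoint so it never changes the prefix.
  blocks.push({ type: "text", text: dynamicContext });
  return blocks;
}
```

Any change to a block before the breakpoint alters the prefix, which is why the rarely-changing material is front-loaded.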
### Tier 5: Agent SDK Embedding (High Impact, High Effort)
Use `@anthropic-ai/claude-agent-sdk` to embed Claude Code inside RuVector orchestration:
```typescript
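import { query } from "@anthropic-ai/claude-agent-sdk";

// Hypothetical helpers wrapping the π brain API; they are not part of the
// Agent SDK and must be supplied by the orchestrator.
declare function brainSearch(task: string): Promise<string>;
declare function brainShare(result: string): Promise<void>;

// RuVector orchestrator drives Claude Code as a cognitive worker
async function brainEnhancedQuery(task: string) {
  // 1. Search brain for context
  const context = await brainSearch(task);

  // 2. Run Claude Code with brain context injected
  for await (const event of query({
    prompt: `${context}\n\nTask: ${task}`,
    options: {
      allowedTools: ["Read", "Edit", "Bash", "mcp__pi-brain__*"],
      maxTurns: 10,
    },
  })) {
    // 3. Feed results back to brain (only successful results carry `result`)
    if (event.type === "result" && event.subtype === "success") {
      await brainShare(event.result);
    }
  }
}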
```
### Tier 6: Plugin Marketplace Distribution (Medium Impact, Medium Effort)
Package RuVector as a Claude Code plugin for one-click installation:
```json
{
  "name": "@ruvector/claude-plugin",
  "claudeCode": {
    "mcpServers": {
      "pi-brain": { "url": "https://mcp.pi.ruv.io" }
    },
    "agents": ["researcher", "reviewer", "consciousness"],
    "skills": ["brain-search", "brain-share", "phi-analyze"],
    "hooks": { ... }
  }
}
```
## Implementation Priority
| Tier | Effort | Impact | Timeline |
|------|--------|--------|----------|
| 1. WASM MCP | 2 weeks | High — 10x faster tool calls | Sprint 1 |
| 2. Agent defs | 2 days | Medium — better UX | Sprint 1 |
| 3. Hooks | 1 week | High — real-time intelligence | Sprint 1 |
| 4. Cache opt | 2 days | Medium — cost reduction | Sprint 1 |
| 5. Agent SDK | 3 weeks | High — full embedding | Sprint 2 |
| 6. Plugin | 1 week | Medium — distribution | Sprint 2 |
## WASM Optimization Targets
Based on Claude Code's tool dispatch pattern (`validateInput` → `call`), optimize WASM modules for:
| Metric | Current | Target | How |
|--------|---------|--------|-----|
| `brain_search` latency | ~200ms (network) | <5ms (local WASM HNSW) | `micro-hnsw-wasm` with local cache |
| Embedding generation | ~100ms (API) | <10ms (local WASM) | `ruvector-cnn-wasm` HashEmbedder |
| Tool schema load | 40 tools at startup | Deferred via ToolSearch | Lazy-load tool groups |
| Context usage | ~2000 tokens/tool schema | ~500 tokens (compressed) | Merge related tool schemas |
| Permission checks | Per-tool | Batch via PreToolUse hook | WASM pre-filter |
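The "lazy-load tool groups" row could be realized on the MCP server side with a registry that exposes only group names at startup and materializes schemas on first use. How Claude Code's `ToolSearch` actually requests deferred tools is not specified here, so `LazyToolRegistry` and its methods are hypothetical and show only the memoized loading pattern.

```typescript
interface ToolSchema {
  name: string;
  description: string;
}

// Hypothetical registry: group loaders are registered cheaply and run lazily.
class LazyToolRegistry {
  private loaders = new Map<string, () => ToolSchema[]>();
  private loaded = new Map<string, ToolSchema[]>();

  register(group: string, loader: () => ToolSchema[]): void {
    this.loaders.set(group, loader);
  }

  /** Cheap index available at startup: group names only, no schemas. */
  listGroups(): string[] {
    return [...this.loaders.keys()];
  }

  /** Loads a group's schemas on first request and memoizes the result. */
  load(group: string): ToolSchema[] {
    const cached = this.loaded.get(group);
    if (cached) return cached;
    const loader = this.loaders.get(group);
    if (!loader) throw new Error(`unknown tool group: ${group}`);
    const schemas = loader();
    this.loaded.set(group, schemas);
    return schemas;
  }
}
```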
## Graph-Informed Architecture
From the dependency graph analysis (docs 12 and 16), Claude Code's tool dispatch follows:
```
Agent Loop (s$)
└─▶ Tool Dispatch
├─▶ Built-in tools (Read, Edit, Bash, etc.)
├─▶ MCP tools (mcp__server__tool namespace)
│ └─▶ stdio/SSE/WS transport
└─▶ Agent tool (spawns sub-agents)
```
**Optimization insight**: MCP tools go through a transport layer that adds ~50ms overhead per call. Embedding WASM in the MCP server process does not remove the stdio hop between Claude Code and the server, but it does remove the further network hop to `pi.ruv.io` for hot-path operations, while cold-path operations stay on the network.
**Graph topology insight**: The knowledge graph (16M edges) should inform which tools are co-invoked. From brain usage patterns:
- `brain_search` and `brain_share` (90% co-occurrence): bundle in one WASM module
- `brain_status` (standalone): keep lightweight
- `brain_partition` (heavy computation): always remote
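A co-occurrence figure like the one above can be derived from per-session tool-call logs. The sketch below assumes a simple definition (the share of sessions calling tool `a` that also call tool `b`) and a hypothetical 0.9 bundling threshold; the document does not state how the 90% figure was measured.

```typescript
/** Fraction of sessions containing `a` that also contain `b`. */
function coOccurrence(sessions: string[][], a: string, b: string): number {
  const withA = sessions.filter((calls) => calls.includes(a));
  if (withA.length === 0) return 0;
  const withBoth = withA.filter((calls) => calls.includes(b));
  return withBoth.length / withA.length;
}

// Hypothetical rule: bundle two tools once they co-occur often enough.
function shouldBundle(
  sessions: string[][],
  a: string,
  b: string,
  threshold: number = 0.9,
): boolean {
  return coOccurrence(sessions, a, b) >= threshold;
}
```

The measure is asymmetric, so a bundling decision may want to check both directions before merging two tools into one module.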
## Consequences
### Positive
- 10-40x latency reduction for hot-path brain operations via local WASM
- Richer integration surface (hooks, agents, skills, not just tools)
- Cost reduction through prompt cache optimization
- Plugin distribution enables one-click adoption
- Graph-informed architecture avoids optimizing cold paths
### Negative
- WASM modules need synchronization with remote brain state
- Local HNSW cache can become stale (mitigated by TTL + invalidation)
- Agent SDK embedding increases coupling with Claude Code's release cycle
- Plugin marketplace requirements may evolve
### Risks
| Risk | Mitigation |
|------|------------|
| WASM module size bloat | Use `micro-hnsw-wasm` (5.5KB), not full crate |
| Cache coherence | TTL-based invalidation + version vector |
| Claude Code breaking changes | Pin to stable Agent SDK version |
| MCP protocol evolution | Abstract transport behind trait |
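The cache-coherence mitigation (TTL plus version vector) can be sketched as a freshness check. The types and function names are hypothetical, and the sketch assumes each version vector maps a writer ID to a monotonically increasing update counter.

```typescript
// Assumption: writer ID -> number of updates seen from that writer.
type VersionVector = Record<string, number>;

/** True when `local` has seen every update that `remote` has seen. */
function dominates(local: VersionVector, remote: VersionVector): boolean {
  return Object.entries(remote).every(
    ([node, count]) => (local[node] ?? 0) >= count,
  );
}

interface CacheEntry {
  storedAtMs: number;
  version: VersionVector;
}

/** An entry is usable only if it is within its TTL and not behind the remote state. */
function isFresh(
  entry: CacheEntry,
  remote: VersionVector,
  nowMs: number,
  ttlMs: number,
): boolean {
  const withinTtl = nowMs - entry.storedAtMs < ttlMs;
  return withinTtl && dominates(entry.version, remote);
}
```

The TTL bounds staleness when the remote version is unavailable, while the version comparison invalidates entries as soon as a newer remote state is known.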
## References
- [ADR-133: Claude Code Source Analysis](./ADR-133-claude-code-source-analysis.md)
- [ADR-130: SSE Decoupling](./ADR-130-mcp-sse-decoupling-midstream-queue.md)
- [ADR-066: SSE MCP Transport](./ADR-066-sse-mcp-transport.md)
- Research: `/docs/research/claude-code-rvsource/`
- Claude Code extension docs: `13-extension-points.md`
- Core module analysis: `15-core-module-analysis.md`