ruvector/docs/adr/ADR-134-ruvector-claude-code-deep-integration.md

ADR-134: RuVector Deep Integration with Claude Code CLI

Status

Proposed

Date

2026-04-02

Context

Source code analysis of Claude Code CLI (ADR-133) revealed 13 extension points and a detailed picture of its internal architecture. RuVector currently integrates via MCP servers (mcp-brain, mcp-brain-server/sse, mcp-gate) but does not use the full integration surface. With 31 WASM crates, 4 MCP crates, and deep cognitive capabilities (IIT 4.0, SONA, knowledge graphs), there is a significant opportunity to optimize for Claude Code's specific architecture.

Current Integration Points

| RuVector Component | Integration | Depth |
|---|---|---|
| mcp-brain-server | SSE MCP at mcp.pi.ruv.io | Tools only |
| mcp-brain | Local stdio MCP | Tools only |
| mcp-gate | MCP gateway | Tools only |
| CLAUDE.md | Project instructions | Prompt only |
| Hooks (claude-flow) | Pre/post task | Lifecycle |

Untapped Opportunities (from ADR-133 findings)

| Claude Code Feature | RuVector Opportunity |
|---|---|
| Agent SDK embedding | Embed RuVector as a library in custom agents |
| WASM tool execution | Ship WASM tools that run in-process (no MCP overhead) |
| Deferred tool loading | Lazy-load 40+ brain tools via ToolSearch |
| Hook-based routing | Route tool calls through WASM-accelerated pre-processing |
| Context compaction | Custom compaction strategy for vector-heavy contexts |
| Prompt caching | Optimize system prompts for cache hits |
| Plugin marketplace | Distribute RuVector as a Claude Code plugin |
| Remote control SSE | Drive Claude Code from RuVector orchestrator |
| Scheduled tasks | Autonomous brain training via Claude Code cron |

Decision

Optimize RuVector crates and WASM modules for deep Claude Code integration across 6 tiers.

Tier 1: WASM-Accelerated MCP Tools (High Impact, Low Effort)

Problem: Current MCP tools make HTTP loopback calls for every operation. Each brain_search requires a network round trip to pi.ruv.io.

Solution: Ship critical tools as WASM modules that run in Claude Code's process via a hybrid MCP server.

┌─────────────────────────────────────────────┐
│ Claude Code Process                          │
│                                              │
│  ┌──────────────┐    ┌───────────────────┐  │
│  │ Agent Loop    │───▶│ RuVector MCP      │  │
│  │ (s$ generator)│    │ (stdio transport)  │  │
│  └──────────────┘    │                    │  │
│                       │ ┌──────────────┐  │  │
│                       │ │ WASM Runtime  │  │  │
│                       │ │ • hnsw-search │  │  │
│                       │ │ • embed       │  │  │
│                       │ │ • phi-compute │  │  │
│                       │ └──────────────┘  │  │
│                       │        │           │  │
│                       │   Cache miss ──────┼──┼──▶ pi.ruv.io REST
│                       └───────────────────┘  │
└─────────────────────────────────────────────┘

WASM crates to optimize:

| Crate | Purpose | Claude Code Use |
|---|---|---|
| micro-hnsw-wasm | Vector search (5.5KB) | Local semantic search in MCP server |
| ruvector-cnn-wasm | Embedding generation | Embed queries locally, no API call |
| ruvector-consciousness-wasm | IIT Phi computation | Consciousness metrics in-process |
| ruvector-delta-wasm | Delta tracking | Track knowledge changes locally |
| ruvector-dag-wasm | DAG operations | Graph queries without network |
| ruqu-wasm | Quantization | Compress vectors for context window |

Implementation:

  1. Create crates/ruvector-claude-mcp/ — hybrid MCP server with embedded WASM
  2. WASM modules loaded at startup, handle hot-path operations locally
  3. Cold-path operations (write, sync, train) forwarded to pi.ruv.io
  4. Local HNSW index caches recent searches (LRU, 1000 vectors)
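The hot-path/cold-path split and the bounded local cache from the steps above can be sketched as follows. This is a minimal illustration, not the planned implementation: `LruCache`, `COLD_PATH_OPS`, and `routeOperation` are hypothetical names, and the cache stores arbitrary values rather than a real HNSW index.

```typescript
// Illustrative sketch only: LruCache and routeOperation are hypothetical names,
// not part of any RuVector crate.

class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so the key becomes the most recently used
    // (a Map iterates in insertion order).
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// Cold-path operations named in the implementation steps: always forwarded.
const COLD_PATH_OPS = ["write", "sync", "train"];

function routeOperation(op: string, cacheHit: boolean): "local" | "remote" {
  if (COLD_PATH_OPS.indexOf(op) !== -1) return "remote";
  // Hot-path operations stay local only when the cache can answer them.
  return cacheHit ? "local" : "remote";
}
```

A server following this sketch would construct the cache with a capacity of 1000 and call `routeOperation` before each tool call, falling back to the pi.ruv.io REST API on a miss.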

Tier 2: Custom Agent Definitions (Medium Impact, Low Effort)

Ship specialized .claude/agents/ definitions that leverage RuVector tools:

# .claude/agents/ruvector-researcher.md
---
name: ruvector-researcher
description: Research with π brain collective intelligence
model: claude-sonnet-4-6
tools: [Read, Grep, Glob, mcp__pi-brain__brain_search, mcp__pi-brain__brain_share]
---

Before implementing anything, search the π brain for existing patterns:
1. Use brain_search to find related knowledge
2. Check brain_partition for knowledge clusters
3. Share new discoveries via brain_share

Agents to ship:

  • ruvector-researcher — searches brain before coding
  • ruvector-reviewer — reviews code against brain patterns
  • ruvector-consciousness — runs IIT Phi analysis on code structures
  • ruvector-architect — uses graph topology for architecture decisions

Tier 3: Hook-Based Intelligence (High Impact, Medium Effort)

Leverage Claude Code's hook system for real-time intelligence:

{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{
        "type": "command",
        "command": "npx @ruvector/hooks pre-edit --file $CLAUDE_FILE_PATH"
      }]
    }],
    "PostToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "npx @ruvector/hooks post-bash --exit-code $CLAUDE_EXIT_CODE"
      }]
    }],
    "Stop": [{
      "matcher": "",
      "hooks": [{
        "type": "command",
        "command": "npx @ruvector/hooks session-end --share-to-brain"
      }]
    }]
  }
}

Hook capabilities:

  • PreToolUse (Edit/Write): Check brain for known anti-patterns before file edits
  • PostToolUse (Bash): Learn from command outcomes, share errors to brain
  • Stop: Auto-share session discoveries to collective brain
  • PreToolUse blocker: WASM-accelerated security scan before tool execution
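The decision logic behind a pre-edit hook might look like the sketch below. It assumes the hook command receives a JSON payload containing `tool_name` and `tool_input`, and that exiting with code 2 blocks the tool call; both should be verified against the hooks documentation for the Claude Code version in use. `AntiPattern` and `evaluatePreEdit` are hypothetical names, and the anti-pattern list stands in for whatever the brain lookup would return.

```typescript
// Hypothetical sketch of the decision step inside a pre-edit hook command.
// The payload shape and the "exit code 2 blocks" convention are assumptions.

interface HookInput {
  tool_name: string;
  tool_input: { file_path?: string; content?: string; new_string?: string };
}

interface AntiPattern {
  id: string;
  pattern: RegExp; // use non-global regexes: a /g regex keeps state between test() calls
  advice: string;
}

interface HookDecision {
  exitCode: 0 | 2;
  message: string;
}

function evaluatePreEdit(input: HookInput, known: AntiPattern[]): HookDecision {
  // Only file-writing tools are inspected; everything else passes through.
  if (input.tool_name !== "Edit" && input.tool_name !== "Write") {
    return { exitCode: 0, message: "" };
  }
  // Write carries the full content, Edit carries the replacement string.
  const text = input.tool_input.content ?? input.tool_input.new_string ?? "";
  const hits = known.filter((p) => p.pattern.test(text));
  if (hits.length === 0) {
    return { exitCode: 0, message: "" };
  }
  // One line per matched anti-pattern, intended for stderr.
  const message = hits.map((p) => `${p.id}: ${p.advice}`).join("\n");
  return { exitCode: 2, message };
}
```

Keeping the decision a pure function makes it testable without spawning a process; the command wrapper would only parse stdin, call the function, write `message` to stderr, and exit with `exitCode`.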

Tier 4: Prompt Cache Optimization (Medium Impact, Low Effort)

Claude Code uses Anthropic's prompt caching (cache_control: { type: "ephemeral" }). Optimize system prompts for maximum cache reuse:

  1. Static prefix: CLAUDE.md instructions, RVF format docs, brain tool schemas — these rarely change and cache well
  2. Dynamic suffix: Brain search results, recent memories — appended after the cached prefix
  3. Tool schema ordering: List most-used tools first for higher cache hit rate

Implementation: Restructure CLAUDE.md to front-load stable content:

[CACHED] Project rules, architecture, conventions (rarely changes)
[CACHED] RuVector tool schemas (40 tools, stable across sessions)
[DYNAMIC] Recent brain context (changes per query)
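For a caller that assembles the request itself (for example the orchestrator in Tier 5), the stable-prefix layout can be sketched as below. `buildSystemBlocks` and its parameters are hypothetical; only the `cache_control: { type: "ephemeral" }` marker comes from the text above. The usage counts are assumed to be a fixed snapshot, because reordering tools between requests would change the prefix and defeat the cache.

```typescript
// Hypothetical sketch: builds system blocks with a stable, cacheable prefix
// followed by a dynamic suffix.

interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface ToolSchema {
  name: string;
  description: string;
}

function buildSystemBlocks(
  rules: string,
  tools: ToolSchema[],
  usageSnapshot: Map<string, number>,
  recentContext: string
): TextBlock[] {
  // Most-used tools first; ties broken by name so the order is deterministic.
  const ordered = tools.slice().sort((a, b) => {
    const diff = (usageSnapshot.get(b.name) ?? 0) - (usageSnapshot.get(a.name) ?? 0);
    if (diff !== 0) return diff;
    return a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
  });
  const toolText = ordered.map((t) => `${t.name}: ${t.description}`).join("\n");
  return [
    { type: "text", text: rules },
    // The marker sits on the last stable block, so the prefix up to and
    // including it is eligible for caching.
    { type: "text", text: toolText, cache_control: { type: "ephemeral" } },
    // Dynamic content comes after the cached prefix.
    { type: "text", text: recentContext },
  ];
}
```

Two calls with the same rules, tools, and snapshot produce byte-identical first and second blocks regardless of input order, which is the property a prefix cache depends on.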

Tier 5: Agent SDK Embedding (High Impact, High Effort)

Use @anthropic-ai/claude-agent-sdk to embed Claude Code inside RuVector orchestration:

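import { query } from "@anthropic-ai/claude-agent-sdk";

// Hypothetical helpers (not defined in this ADR): thin clients for the
// brain_search and brain_share tools.
declare function brainSearch(task: string): Promise<string>;
declare function brainShare(result: string): Promise<void>;

// RuVector orchestrator drives Claude Code as a cognitive worker
async function brainEnhancedQuery(task: string): Promise<void> {
  // 1. Search brain for context
  const context = await brainSearch(task);

  // 2. Run Claude Code with brain context injected
  for await (const event of query({
    prompt: `${context}\n\nTask: ${task}`,
    options: {
      allowedTools: ["Read", "Edit", "Bash", "mcp__pi-brain__*"],
      maxTurns: 10,
    },
  })) {
    // 3. Feed successful results back to brain (error results carry no `result` text)
    if (event.type === "result" && event.subtype === "success") {
      await brainShare(event.result);
    }
  }
}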

Tier 6: Plugin Marketplace Distribution (Medium Impact, Medium Effort)

Package RuVector as a Claude Code plugin for one-click installation:

{
  "name": "@ruvector/claude-plugin",
  "claudeCode": {
    "mcpServers": {
      "pi-brain": { "url": "https://mcp.pi.ruv.io" }
    },
    "agents": ["researcher", "reviewer", "consciousness"],
    "skills": ["brain-search", "brain-share", "phi-analyze"],
    "hooks": { ... }
  }
}

Implementation Priority

| Tier | Effort | Impact | Timeline |
|---|---|---|---|
| 1. WASM MCP | 2 weeks | High: 10x faster tool calls | Sprint 1 |
| 2. Agent defs | 2 days | Medium: better UX | Sprint 1 |
| 3. Hooks | 1 week | High: real-time intelligence | Sprint 1 |
| 4. Cache opt | 2 days | Medium: cost reduction | Sprint 1 |
| 5. Agent SDK | 3 weeks | High: full embedding | Sprint 2 |
| 6. Plugin | 1 week | Medium: distribution | Sprint 2 |

WASM Optimization Targets

Based on Claude Code's tool dispatch pattern (validateInput → call), optimize WASM modules for:

| Metric | Current | Target | How |
|---|---|---|---|
| brain_search latency | ~200ms (network) | <5ms (local WASM HNSW) | micro-hnsw-wasm with local cache |
| Embedding generation | ~100ms (API) | <10ms (local WASM) | ruvector-cnn-wasm HashEmbedder |
| Tool schema load | 40 tools at startup | Deferred via ToolSearch | Lazy-load tool groups |
| Context usage | ~2000 tokens/tool schema | ~500 tokens (compressed) | Merge related tool schemas |
| Permission checks | Per-tool | Batch via PreToolUse hook | WASM pre-filter |
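The "lazy-load tool groups" target can be illustrated with a small registry that loads a group's schemas only when a query first matches it. This sketches the general idea under simple keyword matching; it does not describe how Claude Code's ToolSearch works internally, and `DeferredToolRegistry` and `ToolGroup` are hypothetical names.

```typescript
// Hypothetical sketch of deferred tool-group loading.

interface ToolSchema {
  name: string;
  description: string;
}

interface ToolGroup {
  keywords: string[];       // lowercase terms that select this group
  load: () => ToolSchema[]; // expensive step, deferred until first match
}

class DeferredToolRegistry {
  private readonly groups: Map<string, ToolGroup>;
  private readonly loaded = new Map<string, ToolSchema[]>();

  constructor(groups: Map<string, ToolGroup>) {
    this.groups = groups;
  }

  // Returns schemas for every group whose keywords appear in the query,
  // loading each matching group at most once.
  search(query: string): ToolSchema[] {
    const q = query.toLowerCase();
    const results: ToolSchema[] = [];
    this.groups.forEach((group, name) => {
      if (!group.keywords.some((k) => q.indexOf(k) !== -1)) return;
      let schemas = this.loaded.get(name);
      if (schemas === undefined) {
        schemas = group.load();
        this.loaded.set(name, schemas);
      }
      results.push(...schemas);
    });
    return results;
  }

  get loadedGroupCount(): number {
    return this.loaded.size;
  }
}
```

With this shape, startup cost is limited to registering group names and keywords; schema text enters the context only for groups a task actually touches.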

Graph-Informed Architecture

From the dependency graph analysis (docs 12 and 16), Claude Code's tool dispatch follows:

Agent Loop (s$)
  └─▶ Tool Dispatch
       ├─▶ Built-in tools (Read, Edit, Bash, etc.)
       ├─▶ MCP tools (mcp__server__tool namespace)
       │    └─▶ stdio/SSE/WS transport
       └─▶ Agent tool (spawns sub-agents)

Optimization insight: MCP tools go through a transport layer that adds ~50ms overhead per call. Embedding WASM in the local stdio MCP server process removes the network hop to pi.ruv.io for hot-path operations, while cold-path operations stay on the network.

Graph topology insight: The knowledge graph (16M edges) should inform which tools are co-invoked. From brain usage patterns:

  • brain_search + brain_share (90% co-occurrence) → bundle in one WASM module
  • brain_status → standalone — keep lightweight
  • brain_partition → heavy computation — always remote
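One way to derive such bundling decisions from usage logs is sketched below. The ADR does not define how co-occurrence is measured, so this assumes a per-session definition: of the sessions that call tool A, the fraction that also call tool B. `coOccurrence`, `shouldBundle`, and the 0.9 default threshold are illustrative.

```typescript
// Hypothetical sketch: per-session co-occurrence of two tools.
// Each session is the list of tool names it invoked.

function coOccurrence(sessions: string[][], a: string, b: string): number {
  let withA = 0;
  let withBoth = 0;
  for (const tools of sessions) {
    if (tools.indexOf(a) === -1) continue;
    withA += 1;
    if (tools.indexOf(b) !== -1) withBoth += 1;
  }
  // Defined as 0 when tool A never appears, to avoid dividing by zero.
  return withA === 0 ? 0 : withBoth / withA;
}

function shouldBundle(
  sessions: string[][],
  a: string,
  b: string,
  threshold: number = 0.9
): boolean {
  return coOccurrence(sessions, a, b) >= threshold;
}
```

Note that this measure is asymmetric: `coOccurrence(s, a, b)` and `coOccurrence(s, b, a)` can differ, so a bundling rule may want to check both directions.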

Consequences

Positive

  • 10-40x latency reduction for hot-path brain operations via local WASM
  • Richer integration surface (hooks, agents, skills, not just tools)
  • Cost reduction through prompt cache optimization
  • Plugin distribution enables one-click adoption
  • Graph-informed architecture avoids optimizing cold paths

Negative

  • WASM modules need synchronization with remote brain state
  • Local HNSW cache can become stale (mitigated by TTL + invalidation)
  • Agent SDK embedding increases coupling with Claude Code's release cycle
  • Plugin marketplace requirements may evolve

Risks

| Risk | Mitigation |
|---|---|
| WASM module size bloat | Use micro-hnsw-wasm (5.5KB), not full crate |
| Cache coherence | TTL-based invalidation + version vector |
| Claude Code breaking changes | Pin to stable Agent SDK version |
| MCP protocol evolution | Abstract transport behind trait |
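The cache-coherence mitigation (TTL plus version vector) could be checked as in the following sketch. It assumes each cached entry records when it was stored and the version vector it was read at, and that the remote brain exposes its current vector; `CachedEntry`, `VersionVector`, and `isStale` are hypothetical names.

```typescript
// Hypothetical sketch: staleness check combining a TTL with a version vector.
// A version vector maps a writer or partition id to a monotonically
// increasing counter.

type VersionVector = Record<string, number>;

interface CachedEntry {
  storedAtMs: number;
  version: VersionVector; // vector observed when the entry was cached
}

function isStale(
  entry: CachedEntry,
  nowMs: number,
  ttlMs: number,
  remote: VersionVector
): boolean {
  // TTL check: entries at or past their time-to-live are always stale.
  if (nowMs - entry.storedAtMs >= ttlMs) return true;
  // Version check: stale if the remote has advanced on any component
  // beyond what the entry saw (missing components count as 0).
  return Object.keys(remote).some(
    (node) => (remote[node] ?? 0) > (entry.version[node] ?? 0)
  );
}
```

The TTL bounds staleness when the remote vector cannot be fetched, while the vector comparison allows early invalidation as soon as a newer write is observed.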

References