ruvector/docs/adr/ADR-134-ruvector-claude-code-deep-integration.md

# ADR-134: RuVector Deep Integration with Claude Code CLI

## Status

Proposed

## Date

2026-04-02

## Context

Source code analysis of Claude Code CLI (ADR-133) revealed 13 extension points and detailed internal architecture. RuVector currently integrates via MCP servers (`mcp-brain`, `mcp-brain-server/sse`, `mcp-gate`) but does not leverage the full integration surface. With 31 WASM crates, 4 MCP crates, and deep cognitive capabilities (IIT 4.0, SONA, knowledge graphs), there's significant opportunity to optimize for Claude Code's specific architecture.

### Current Integration Points

| RuVector Component | Integration | Depth |
|-------------------|-------------|-------|
| `mcp-brain-server` | SSE MCP at `mcp.pi.ruv.io` | Tools only |
| `mcp-brain` | Local stdio MCP | Tools only |
| `mcp-gate` | MCP gateway | Tools only |
| CLAUDE.md | Project instructions | Prompt only |
| Hooks (claude-flow) | Pre/post task | Lifecycle |

### Untapped Opportunities (from ADR-133 findings)

| Claude Code Feature | RuVector Opportunity |
|---------------------|---------------------|
| Agent SDK embedding | Embed RuVector as a library in custom agents |
| WASM tool execution | Ship WASM tools that run in-process (no MCP overhead) |
| Deferred tool loading | Lazy-load 40+ brain tools via `ToolSearch` |
| Hook-based routing | Route tool calls through WASM-accelerated pre-processing |
| Context compaction | Custom compaction strategy for vector-heavy contexts |
| Prompt caching | Optimize system prompts for cache hits |
| Plugin marketplace | Distribute RuVector as a Claude Code plugin |
| Remote control SSE | Drive Claude Code from RuVector orchestrator |
| Scheduled tasks | Autonomous brain training via Claude Code cron |

## Decision

Optimize RuVector crates and WASM modules for deep Claude Code integration across 6 tiers.

### Tier 1: WASM-Accelerated MCP Tools (High Impact, Low Effort)

**Problem**: Current MCP tools make HTTP loopback calls for every operation. Each `brain_search` requires network round-trip to `pi.ruv.io`.

**Solution**: Ship critical tools as WASM modules that run in Claude Code's process via a hybrid MCP server.

```
┌─────────────────────────────────────────────┐
│ Claude Code Process                          │
│                                              │
│  ┌──────────────┐    ┌───────────────────┐  │
│  │ Agent Loop    │───▶│ RuVector MCP      │  │
│  │ (s$ generator)│    │ (stdio transport)  │  │
│  └──────────────┘    │                    │  │
│                       │ ┌──────────────┐  │  │
│                       │ │ WASM Runtime  │  │  │
│                       │ │ • hnsw-search │  │  │
│                       │ │ • embed       │  │  │
│                       │ │ • phi-compute │  │  │
│                       │ └──────────────┘  │  │
│                       │        │           │  │
│                       │   Cache miss ──────┼──┼──▶ pi.ruv.io REST
│                       └───────────────────┘  │
└─────────────────────────────────────────────┘
```

**WASM crates to optimize**:

| Crate | Purpose | Claude Code Use |
|-------|---------|-----------------|
| `micro-hnsw-wasm` | Vector search (5.5KB) | Local semantic search in MCP server |
| `ruvector-cnn-wasm` | Embedding generation | Embed queries locally, no API call |
| `ruvector-consciousness-wasm` | IIT Phi computation | Consciousness metrics in-process |
| `ruvector-delta-wasm` | Delta tracking | Track knowledge changes locally |
| `ruvector-dag-wasm` | DAG operations | Graph queries without network |
| `ruqu-wasm` | Quantization | Compress vectors for context window |

**Implementation**:
1. Create `crates/ruvector-claude-mcp/` — hybrid MCP server with embedded WASM
2. WASM modules loaded at startup, handle hot-path operations locally
3. Cold-path operations (write, sync, train) forwarded to `pi.ruv.io`
4. Local HNSW index caches recent searches (LRU, 1000 vectors)

### Tier 2: Custom Agent Definitions (Medium Impact, Low Effort)

Ship specialized `.claude/agents/` definitions that leverage RuVector tools:

```markdown
# .claude/agents/ruvector-researcher.md
---
name: ruvector-researcher
description: Research with π brain collective intelligence
model: claude-sonnet-4-6
tools: [Read, Grep, Glob, mcp__pi-brain__brain_search, mcp__pi-brain__brain_share]
---

Before implementing anything, search the π brain for existing patterns:
1. Use brain_search to find related knowledge
2. Check brain_partition for knowledge clusters
3. Share new discoveries via brain_share
```

**Agents to ship**:
- `ruvector-researcher` — searches brain before coding
- `ruvector-reviewer` — reviews code against brain patterns
- `ruvector-consciousness` — runs IIT Phi analysis on code structures
- `ruvector-architect` — uses graph topology for architecture decisions

### Tier 3: Hook-Based Intelligence (High Impact, Medium Effort)

Leverage Claude Code's hook system for real-time intelligence:

```json
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{
        "type": "command",
        "command": "npx @ruvector/hooks pre-edit --file $CLAUDE_FILE_PATH"
      }]
    }],
    "PostToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "npx @ruvector/hooks post-bash --exit-code $CLAUDE_EXIT_CODE"
      }]
    }],
    "Stop": [{
      "matcher": "",
      "hooks": [{
        "type": "command",
        "command": "npx @ruvector/hooks session-end --share-to-brain"
      }]
    }]
  }
}
```

**Hook capabilities**:
- **PreToolUse (Edit/Write)**: Check brain for known anti-patterns before file edits
- **PostToolUse (Bash)**: Learn from command outcomes, share errors to brain
- **Stop**: Auto-share session discoveries to collective brain
- **PreToolUse blocker**: WASM-accelerated security scan before tool execution

### Tier 4: Prompt Cache Optimization (Medium Impact, Low Effort)

Claude Code uses Anthropic's prompt caching (`cache_control: { type: "ephemeral" }`). Optimize system prompts for maximum cache reuse:

1. **Static prefix**: CLAUDE.md instructions, RVF format docs, brain tool schemas — these rarely change and cache well
2. **Dynamic suffix**: Brain search results, recent memories — appended after the cached prefix
3. **Tool schema ordering**: List most-used tools first for higher cache hit rate

**Implementation**: Restructure CLAUDE.md to front-load stable content:
```
[CACHED] Project rules, architecture, conventions (rarely changes)
[CACHED] RuVector tool schemas (40 tools, stable across sessions)
[DYNAMIC] Recent brain context (changes per query)
```

### Tier 5: Agent SDK Embedding (High Impact, High Effort)

Use `@anthropic-ai/claude-agent-sdk` to embed Claude Code inside RuVector orchestration:

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

// RuVector orchestrator drives Claude Code as a cognitive worker
async function brainEnhancedQuery(task: string) {
  // 1. Search brain for context
  const context = await brainSearch(task);

  // 2. Run Claude Code with brain context injected
  for await (const event of query({
    prompt: `${context}\n\nTask: ${task}`,
    options: {
      allowedTools: ["Read", "Edit", "Bash", "mcp__pi-brain__*"],
      maxTurns: 10,
    }
  })) {
    // 3. Feed results back to brain
    if (event.type === "result") {
      await brainShare(event.result);
    }
  }
}
```

### Tier 6: Plugin Marketplace Distribution (Medium Impact, Medium Effort)

Package RuVector as a Claude Code plugin for one-click installation:

```json
{
  "name": "@ruvector/claude-plugin",
  "claudeCode": {
    "mcpServers": {
      "pi-brain": { "url": "https://mcp.pi.ruv.io" }
    },
    "agents": ["researcher", "reviewer", "consciousness"],
    "skills": ["brain-search", "brain-share", "phi-analyze"],
    "hooks": { ... }
  }
}
```

## Implementation Priority

| Tier | Effort | Impact | Timeline |
|------|--------|--------|----------|
| 1. WASM MCP | 2 weeks | High — 10x faster tool calls | Sprint 1 |
| 2. Agent defs | 2 days | Medium — better UX | Sprint 1 |
| 3. Hooks | 1 week | High — real-time intelligence | Sprint 1 |
| 4. Cache opt | 2 days | Medium — cost reduction | Sprint 1 |
| 5. Agent SDK | 3 weeks | High — full embedding | Sprint 2 |
| 6. Plugin | 1 week | Medium — distribution | Sprint 2 |

## WASM Optimization Targets

Based on Claude Code's tool dispatch pattern (`validateInput` → `call`), optimize WASM modules for:

| Metric | Current | Target | How |
|--------|---------|--------|-----|
| `brain_search` latency | ~200ms (network) | <5ms (local WASM HNSW) | `micro-hnsw-wasm` with local cache |
| Embedding generation | ~100ms (API) | <10ms (local WASM) | `ruvector-cnn-wasm` HashEmbedder |
| Tool schema load | 40 tools at startup | Deferred via ToolSearch | Lazy-load tool groups |
| Context usage | ~2000 tokens/tool schema | ~500 tokens (compressed) | Merge related tool schemas |
| Permission checks | Per-tool | Batch via PreToolUse hook | WASM pre-filter |

## Graph-Informed Architecture

From the dependency graph analysis (doc 12, 16), Claude Code's tool dispatch follows:

```
Agent Loop (s$)
  └─▶ Tool Dispatch
       ├─▶ Built-in tools (Read, Edit, Bash, etc.)
       ├─▶ MCP tools (mcp__server__tool namespace)
       │    └─▶ stdio/SSE/WS transport
       └─▶ Agent tool (spawns sub-agents)
```

**Optimization insight**: MCP tools go through a transport layer that adds ~50ms overhead per call. By embedding WASM in the MCP server process, we eliminate the transport hop for hot-path operations while keeping cold-path operations on the network.

**Graph topology insight**: The knowledge graph (16M edges) should inform which tools are co-invoked. From brain usage patterns:
- `brain_search` → `brain_share` (90% co-occurrence) — bundle in one WASM module
- `brain_status` → standalone — keep lightweight
- `brain_partition` → heavy computation — always remote

## Consequences

### Positive

- 10-40x latency reduction for hot-path brain operations via local WASM
- Richer integration surface (hooks, agents, skills, not just tools)
- Cost reduction through prompt cache optimization
- Plugin distribution enables one-click adoption
- Graph-informed architecture avoids optimizing cold paths

### Negative

- WASM modules need synchronization with remote brain state
- Local HNSW cache can become stale (mitigated by TTL + invalidation)
- Agent SDK embedding increases coupling with Claude Code's release cycle
- Plugin marketplace requirements may evolve

### Risks

| Risk | Mitigation |
|------|------------|
| WASM module size bloat | Use `micro-hnsw-wasm` (5.5KB), not full crate |
| Cache coherence | TTL-based invalidation + version vector |
| Claude Code breaking changes | Pin to stable Agent SDK version |
| MCP protocol evolution | Abstract transport behind trait |

## References

- [ADR-133: Claude Code Source Analysis](./ADR-133-claude-code-source-analysis.md)
- [ADR-130: SSE Decoupling](./ADR-130-mcp-sse-decoupling-midstream-queue.md)
- [ADR-066: SSE MCP Transport](./ADR-066-sse-mcp-transport.md)
- Research: `/docs/research/claude-code-rvsource/`
- Claude Code extension docs: `13-extension-points.md`
- Core module analysis: `15-core-module-analysis.md`