mirror of https://github.com/ruvnet/RuVector.git synced 2026-05-25 15:03:46 +00:00

rUv 1e09c2fe89 feat(sse): decouple SSE to mcp.pi.ruv.io proxy + Claude Code source research

SSE Proxy Decoupling (ADR-130):
- Fix ruvbrain-sse proxy: proper MCP handshake, session creation, drain polling
- Fix internal queue endpoints: session_create keeps receiver, drain returns buffered messages
- Add response_queues to AppState for SSE proxy communication
- Skip sparsifier for >5M edge graphs (was crashing on 16M edges)
- Add SSE_DISABLED/MAX_SSE env vars for configurable connection limits
- Route SSE to dedicated mcp.pi.ruv.io subdomain (Cloudflare CNAME)
- Serve SSE at root / path on proxy (no /sse needed)
- Update all references from pi.ruv.io/sse to mcp.pi.ruv.io
- Fix Dockerfile consciousness crate build (feature/version mismatches)

Claude Code CLI Source Research (ADR-133):
- 19 research documents analyzing Claude Code internals (3000+ lines)
- Decompiler script + RVF corpus builder for all major versions
- Binary RVF containers for v0.2, v1.0, v2.0, v2.1 (300-2068 vectors each)
- Call graphs, class hierarchies, state machines from minified source

Integration Strategy (ADR-134):
- 6-tier integration plan: WASM MCP, agents, hooks, cache, SDK, plugin
- Integration guide with architecture diagrams and performance targets

Co-Authored-By: claude-flow <ruv@ruv.net>

2026-04-02 23:39:56 +00:00


ADR-134: RuVector Deep Integration with Claude Code CLI

Status

Proposed

Date

2026-04-02

Context

Source code analysis of Claude Code CLI (ADR-133) revealed 13 extension points and a detailed picture of its internal architecture. RuVector currently integrates via MCP servers (mcp-brain, mcp-brain-server/sse, mcp-gate) but does not use the full integration surface. With 31 WASM crates, 4 MCP crates, and deep cognitive capabilities (IIT 4.0, SONA, knowledge graphs), there is a significant opportunity to optimize for Claude Code's specific architecture.

Current Integration Points

RuVector Component	Integration	Depth
`mcp-brain-server`	SSE MCP at `mcp.pi.ruv.io`	Tools only
`mcp-brain`	Local stdio MCP	Tools only
`mcp-gate`	MCP gateway	Tools only
CLAUDE.md	Project instructions	Prompt only
Hooks (claude-flow)	Pre/post task	Lifecycle

Untapped Opportunities (from ADR-133 findings)

Claude Code Feature	RuVector Opportunity
Agent SDK embedding	Embed RuVector as a library in custom agents
WASM tool execution	Ship WASM tools that run in-process (no MCP overhead)
Deferred tool loading	Lazy-load 40+ brain tools via `ToolSearch`
Hook-based routing	Route tool calls through WASM-accelerated pre-processing
Context compaction	Custom compaction strategy for vector-heavy contexts
Prompt caching	Optimize system prompts for cache hits
Plugin marketplace	Distribute RuVector as a Claude Code plugin
Remote control SSE	Drive Claude Code from RuVector orchestrator
Scheduled tasks	Autonomous brain training via Claude Code cron

Decision

Optimize RuVector crates and WASM modules for deep Claude Code integration across 6 tiers.

Tier 1: WASM-Accelerated MCP Tools (High Impact, Low Effort)

Problem: Current MCP tools make an HTTP call for every operation, so each brain_search requires a network round-trip to pi.ruv.io.

Solution: Ship critical tools as WASM modules that run in Claude Code's process via a hybrid MCP server.

┌─────────────────────────────────────────────┐
│ Claude Code Process                          │
│                                              │
│  ┌──────────────┐    ┌───────────────────┐  │
│  │ Agent Loop    │───▶│ RuVector MCP      │  │
│  │ (s$ generator)│    │ (stdio transport)  │  │
│  └──────────────┘    │                    │  │
│                       │ ┌──────────────┐  │  │
│                       │ │ WASM Runtime  │  │  │
│                       │ │ • hnsw-search │  │  │
│                       │ │ • embed       │  │  │
│                       │ │ • phi-compute │  │  │
│                       │ └──────────────┘  │  │
│                       │        │           │  │
│                       │   Cache miss ──────┼──┼──▶ pi.ruv.io REST
│                       └───────────────────┘  │
└─────────────────────────────────────────────┘

WASM crates to optimize:

Crate	Purpose	Claude Code Use
`micro-hnsw-wasm`	Vector search (5.5KB)	Local semantic search in MCP server
`ruvector-cnn-wasm`	Embedding generation	Embed queries locally, no API call
`ruvector-consciousness-wasm`	IIT Phi computation	Consciousness metrics in-process
`ruvector-delta-wasm`	Delta tracking	Track knowledge changes locally
`ruvector-dag-wasm`	DAG operations	Graph queries without network
`ruqu-wasm`	Quantization	Compress vectors for context window

Implementation:

Create crates/ruvector-claude-mcp/ — hybrid MCP server with embedded WASM
WASM modules loaded at startup, handle hot-path operations locally
Cold-path operations (write, sync, train) forwarded to pi.ruv.io
Local HNSW index caches recent searches (LRU, 1000 vectors)
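
The hot-path/cold-path split and the LRU bound above can be sketched as follows. This is a minimal illustration, not the planned crate: `LruCache`, `routeOperation`, and the operation names in `COLD_PATH` are hypothetical, chosen only to mirror the "write, sync, train" examples and the 1000-entry limit stated in this section.

```typescript
// Minimal LRU cache built on Map, which iterates keys in insertion order.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so the key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }
}

type Route = "local" | "remote";

// Hypothetical operation names; the ADR only names write, sync, and train as cold-path.
const COLD_PATH = new Set<string>(["write", "sync", "train"]);

// Cold-path operations go to pi.ruv.io; everything else is handled by local WASM.
function routeOperation(op: string): Route {
  return COLD_PATH.has(op) ? "remote" : "local";
}

// Cache sized to the limit proposed in this section (1000 entries).
const searchCache = new LruCache<string, number[]>(1000);
searchCache.set("how to shard hnsw", [0.12, 0.98]);
```

A lookup that misses `searchCache` would fall through to the pi.ruv.io REST call shown in the diagram and then populate the cache with the response.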

Tier 2: Custom Agent Definitions (Medium Impact, Low Effort)

Ship specialized .claude/agents/ definitions that leverage RuVector tools:

# .claude/agents/ruvector-researcher.md
---
name: ruvector-researcher
description: Research with π brain collective intelligence
model: claude-sonnet-4-6
tools: [Read, Grep, Glob, mcp__pi-brain__brain_search, mcp__pi-brain__brain_share]
---

Before implementing anything, search the π brain for existing patterns:
1. Use brain_search to find related knowledge
2. Check brain_partition for knowledge clusters
3. Share new discoveries via brain_share

Agents to ship:

ruvector-researcher — searches brain before coding
ruvector-reviewer — reviews code against brain patterns
ruvector-consciousness — runs IIT Phi analysis on code structures
ruvector-architect — uses graph topology for architecture decisions

Tier 3: Hook-Based Intelligence (High Impact, Medium Effort)

Leverage Claude Code's hook system for real-time intelligence:

{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{
        "type": "command",
        "command": "npx @ruvector/hooks pre-edit --file $CLAUDE_FILE_PATH"
      }]
    }],
    "PostToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "npx @ruvector/hooks post-bash --exit-code $CLAUDE_EXIT_CODE"
      }]
    }],
    "Stop": [{
      "matcher": "",
      "hooks": [{
        "type": "command",
        "command": "npx @ruvector/hooks session-end --share-to-brain"
      }]
    }]
  }
}

Hook capabilities:

PreToolUse (Edit/Write): Check brain for known anti-patterns before file edits
PostToolUse (Bash): Learn from command outcomes, share errors to brain
Stop: Auto-share session discoveries to collective brain
PreToolUse blocker: WASM-accelerated security scan before tool execution
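
A sketch of the decision logic behind the pre-edit check follows. It assumes the hook receives the tool call as a JSON payload with `tool_name` and `tool_input` fields and that a non-zero blocking exit code (2 here) stops the tool call; verify both against the current Claude Code hooks documentation. `decidePreEdit` and the anti-pattern list are hypothetical; in this design the patterns would come from the brain.

```typescript
// Assumed subset of the JSON payload passed to a PreToolUse hook.
interface PreToolUsePayload {
  hook_event_name: string;
  tool_name: string;
  tool_input: { file_path?: string; content?: string; new_string?: string };
}

interface HookDecision {
  exitCode: 0 | 2; // 0 = allow, 2 = block (assumed convention)
  message: string;
}

// Decide whether an Edit/Write call should proceed, given known anti-patterns.
// Patterns must not use the `g` flag, since RegExp.test is stateful with it.
function decidePreEdit(payload: PreToolUsePayload, antiPatterns: RegExp[]): HookDecision {
  if (payload.tool_name !== "Edit" && payload.tool_name !== "Write") {
    return { exitCode: 0, message: "" };
  }
  // Edit carries the new text in new_string, Write in content (assumed shapes).
  const text = payload.tool_input.new_string ?? payload.tool_input.content ?? "";
  const hit = antiPatterns.find((pattern) => pattern.test(text));
  if (hit === undefined) {
    return { exitCode: 0, message: "" };
  }
  return { exitCode: 2, message: `Blocked: new content matches anti-pattern ${hit.source}` };
}
```

A hook command would parse the payload, call `decidePreEdit`, write `message` to stderr, and exit with `exitCode`.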

Tier 4: Prompt Cache Optimization (Medium Impact, Low Effort)

Claude Code uses Anthropic's prompt caching (cache_control: { type: "ephemeral" }). Optimize system prompts for maximum cache reuse:

Static prefix: CLAUDE.md instructions, RVF format docs, brain tool schemas — these rarely change and cache well
Dynamic suffix: Brain search results, recent memories — appended after the cached prefix
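Tool schema ordering: Keep tool schemas in a stable, deterministic order (most-used tools first), because prompt caching matches on an exact prefix and any reordering invalidates the cached portion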

Implementation: Restructure CLAUDE.md to front-load stable content:

[CACHED] Project rules, architecture, conventions (rarely changes)
[CACHED] RuVector tool schemas (40 tools, stable across sessions)
[DYNAMIC] Recent brain context (changes per query)
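
The cached-prefix layout can be illustrated with the Anthropic Messages API's system-block format, where a `cache_control` marker on a block caches everything up to and including it. This is a sketch for the case where RuVector assembles the request itself; Claude Code places its own cache breakpoints internally. `buildSystemBlocks` is a hypothetical helper.

```typescript
// Shape of a text block in the Messages API `system` array.
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Stable sections first, with a cache breakpoint on the last one;
// per-query context goes after the breakpoint so it never invalidates the prefix.
function buildSystemBlocks(staticSections: string[], dynamicContext: string): TextBlock[] {
  const blocks: TextBlock[] = staticSections.map(
    (text): TextBlock => ({ type: "text", text }),
  );
  const last = blocks[blocks.length - 1];
  if (last !== undefined) {
    last.cache_control = { type: "ephemeral" };
  }
  if (dynamicContext.length > 0) {
    blocks.push({ type: "text", text: dynamicContext });
  }
  return blocks;
}

const systemBlocks = buildSystemBlocks(
  ["Project rules, architecture, conventions", "RuVector tool schemas"],
  "Recent brain context for this query",
);
```

Only the second block carries the marker here, so both static sections form the cached prefix and the brain context is billed as fresh input on each request.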

Tier 5: Agent SDK Embedding (High Impact, High Effort)

Use @anthropic-ai/claude-agent-sdk to embed Claude Code inside RuVector orchestration:

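import { query } from "@anthropic-ai/claude-agent-sdk";

// Hypothetical RuVector helpers (not part of the SDK); implementations live elsewhere.
declare function brainSearch(task: string): Promise<string>;
declare function brainShare(result: string): Promise<void>;

// RuVector orchestrator drives Claude Code as a cognitive worker
async function brainEnhancedQuery(task: string): Promise<void> {
  // 1. Search brain for context
  const context = await brainSearch(task);

  // 2. Run Claude Code with brain context injected
  for await (const event of query({
    prompt: `${context}\n\nTask: ${task}`,
    options: {
      allowedTools: ["Read", "Edit", "Bash", "mcp__pi-brain__*"],
      maxTurns: 10,
    },
  })) {
    // 3. Feed successful results back to the brain (error results carry no result text)
    if (event.type === "result" && event.subtype === "success") {
      await brainShare(event.result);
    }
  }
}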

Tier 6: Plugin Marketplace Distribution (Medium Impact, Medium Effort)

Package RuVector as a Claude Code plugin for one-click installation:

{
  "name": "@ruvector/claude-plugin",
  "claudeCode": {
    "mcpServers": {
      "pi-brain": { "url": "https://mcp.pi.ruv.io" }
    },
    "agents": ["researcher", "reviewer", "consciousness"],
    "skills": ["brain-search", "brain-share", "phi-analyze"],
    "hooks": { ... }
  }
}

Implementation Priority

Tier	Effort	Impact	Timeline
1. WASM MCP	2 weeks	High — 10x faster tool calls	Sprint 1
2. Agent defs	2 days	Medium — better UX	Sprint 1
3. Hooks	1 week	High — real-time intelligence	Sprint 1
4. Cache opt	2 days	Medium — cost reduction	Sprint 1
5. Agent SDK	3 weeks	High — full embedding	Sprint 2
6. Plugin	1 week	Medium — distribution	Sprint 2

WASM Optimization Targets

Based on Claude Code's tool dispatch pattern (validateInput → call), optimize WASM modules for:

Metric	Current	Target	How
`brain_search` latency	~200ms (network)	<5ms (local WASM HNSW)	`micro-hnsw-wasm` with local cache
Embedding generation	~100ms (API)	<10ms (local WASM)	`ruvector-cnn-wasm` HashEmbedder
Tool schema load	40 tools at startup	Deferred via ToolSearch	Lazy-load tool groups
Context usage	~2000 tokens/tool schema	~500 tokens (compressed)	Merge related tool schemas
Permission checks	Per-tool	Batch via PreToolUse hook	WASM pre-filter
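
The latency targets depend on how often the local cache answers a query. The sketch below works through that arithmetic under a simple assumed model: a hit costs the local latency, a miss costs the remote latency, and nothing else contributes. The function names are hypothetical; the 5 ms and 200 ms figures are the `brain_search` target and current values from the table.

```typescript
// Expected latency per call for a given cache hit rate (simple two-outcome model).
function effectiveLatencyMs(hitRate: number, localMs: number, remoteMs: number): number {
  if (hitRate < 0 || hitRate > 1) {
    throw new RangeError("hitRate must be in [0, 1]");
  }
  return hitRate * localMs + (1 - hitRate) * remoteMs;
}

// Speedup relative to always paying the remote latency.
function speedup(hitRate: number, localMs: number, remoteMs: number): number {
  return remoteMs / effectiveLatencyMs(hitRate, localMs, remoteMs);
}

// Print the curve for a few hit rates using the table's brain_search figures.
for (const hitRate of [0.5, 0.9, 0.99, 1]) {
  const ms = effectiveLatencyMs(hitRate, 5, 200);
  console.log(`hit rate ${hitRate}: ${ms.toFixed(1)} ms, ${speedup(hitRate, 5, 200).toFixed(1)}x`);
}
```

Under this model a 90% hit rate gives about 24.5 ms per call, roughly an 8x improvement, and the full 40x is reached only when every query is served locally, so the hit rate of the local index is worth measuring alongside raw WASM latency.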

Graph-Informed Architecture

From the dependency graph analysis (research docs 12 and 16), Claude Code's tool dispatch follows:

Agent Loop (s$)
  └─▶ Tool Dispatch
       ├─▶ Built-in tools (Read, Edit, Bash, etc.)
       ├─▶ MCP tools (mcp__server__tool namespace)
       │    └─▶ stdio/SSE/WS transport
       └─▶ Agent tool (spawns sub-agents)

Optimization insight: MCP tools go through a transport layer that adds ~50ms overhead per call. By embedding WASM in the MCP server process, we eliminate the network hop to pi.ruv.io for hot-path operations (the stdio transport between Claude Code and the MCP server remains) while keeping cold-path operations on the network.

Graph topology insight: The knowledge graph (16M edges) and observed tool co-invocation should inform which tools are bundled together. From brain usage patterns:

brain_search → brain_share (90% co-occurrence) — bundle in one WASM module
brain_status → standalone — keep lightweight
brain_partition → heavy computation — always remote
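
A co-occurrence figure like the 90% above could be derived from session logs as sketched below. The log shape (one ordered list of tool names per session), the function names, and the 0.8 bundling threshold are assumptions for illustration, not part of the existing tooling.

```typescript
// Fraction of sessions that call `first` and later also call `second`.
function coOccurrenceRate(sessions: string[][], first: string, second: string): number {
  let withFirst = 0;
  let withBoth = 0;
  for (const calls of sessions) {
    const index = calls.indexOf(first);
    if (index === -1) continue;
    withFirst += 1;
    // Only count `second` when it appears after the first call to `first`.
    if (calls.indexOf(second, index + 1) !== -1) {
      withBoth += 1;
    }
  }
  return withFirst === 0 ? 0 : withBoth / withFirst;
}

// Hypothetical rule: bundle two tools into one WASM module above a threshold.
function shouldBundle(rate: number, threshold: number = 0.8): boolean {
  return rate >= threshold;
}

const sampleSessions: string[][] = [
  ["brain_search", "brain_share"],
  ["brain_search"],
  ["brain_status"],
];
const rate = coOccurrenceRate(sampleSessions, "brain_search", "brain_share");
```

In the sample, two sessions call `brain_search` and one of them follows it with `brain_share`, so `rate` is 0.5 and `shouldBundle(rate)` is false under the default threshold.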

Consequences

Positive

10-40x latency reduction for hot-path brain operations via local WASM
Richer integration surface (hooks, agents, skills, not just tools)
Cost reduction through prompt cache optimization
Plugin distribution enables one-click adoption
Graph-informed architecture avoids optimizing cold paths

Negative

WASM modules need synchronization with remote brain state
Local HNSW cache can become stale (mitigated by TTL + invalidation)
Agent SDK embedding increases coupling with Claude Code's release cycle
Plugin marketplace requirements may evolve

Risks

Risk	Mitigation
WASM module size bloat	Use `micro-hnsw-wasm` (5.5KB), not full crate
Cache coherence	TTL-based invalidation + version vector
Claude Code breaking changes	Pin to stable Agent SDK version
MCP protocol evolution	Abstract transport behind trait
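
The cache-coherence mitigation (TTL plus a version vector) can be sketched as a freshness check. This is one possible reading of that mitigation, with hypothetical names throughout: a cached entry is served only if it is younger than the TTL and its version vector is at least as new as the remote one for every replica.

```typescript
// Per-replica update counters.
type VersionVector = Record<string, number>;

interface CacheEntry<V> {
  value: V;
  storedAtMs: number;
  version: VersionVector;
}

// True when `local` has seen at least every update that `remote` has.
function dominates(local: VersionVector, remote: VersionVector): boolean {
  return Object.entries(remote).every(
    ([replica, count]) => (local[replica] ?? 0) >= count,
  );
}

// An entry is fresh only if both the TTL and the version check pass.
function isFresh<V>(
  entry: CacheEntry<V>,
  nowMs: number,
  ttlMs: number,
  remoteVersion: VersionVector,
): boolean {
  const withinTtl = nowMs - entry.storedAtMs < ttlMs;
  return withinTtl && dominates(entry.version, remoteVersion);
}

const entry: CacheEntry<number[]> = {
  value: [0.1, 0.2],
  storedAtMs: 1_000,
  version: { brainA: 3, brainB: 1 },
};
```

With a 60-second TTL, `entry` is fresh at `nowMs = 30_000` against a remote vector of `{ brainA: 3, brainB: 1 }`, and stale as soon as either the TTL elapses or any remote counter moves ahead.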

References

ADR-133: Claude Code Source Analysis
ADR-130: SSE Decoupling
ADR-066: SSE MCP Transport
Research: /docs/research/claude-code-rvsource/
Claude Code extension docs: 13-extension-points.md
Core module analysis: 15-core-module-analysis.md
