ruvector/docs/adr/ADR-133-claude-code-source-analysis.md
rUv 930fca916f feat(sse): decouple SSE to mcp.pi.ruv.io proxy + Claude Code source research
SSE Proxy Decoupling (ADR-130):
- Fix ruvbrain-sse proxy: proper MCP handshake, session creation, drain polling
- Fix internal queue endpoints: session_create keeps receiver, drain returns buffered messages
- Add response_queues to AppState for SSE proxy communication
- Skip sparsifier for >5M edge graphs (was crashing on 16M edges)
- Add SSE_DISABLED/MAX_SSE env vars for configurable connection limits
- Route SSE to dedicated mcp.pi.ruv.io subdomain (Cloudflare CNAME)
- Serve SSE at root / path on proxy (no /sse needed)
- Update all references from pi.ruv.io/sse to mcp.pi.ruv.io
- Fix Dockerfile consciousness crate build (feature/version mismatches)

Claude Code CLI Source Research (ADR-133):
- 19 research documents analyzing Claude Code internals (3000+ lines)
- Decompiler script + RVF corpus builder for all major versions
- Binary RVF containers for v0.2, v1.0, v2.0, v2.1 (300-2068 vectors each)
- Call graphs, class hierarchies, state machines from minified source

Integration Strategy (ADR-134):
- 6-tier integration plan: WASM MCP, agents, hooks, cache, SDK, plugin
- Integration guide with architecture diagrams and performance targets

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-02 23:39:56 +00:00

4.5 KiB

ADR-133: Claude Code CLI Source Code Analysis

Status

Deployed (2026-04-02)

Date

2026-04-02

Context

Understanding Claude Code's internal architecture is critical for building effective extensions (agents, hooks, skills, MCP servers) and for debugging integration issues. The CLI ships as a minified Bun 1.3.11 Single Executable Application (229 MB) with no public source code, making direct inspection necessary.

Decision

Perform systematic reverse-engineering analysis of the Claude Code CLI binary using string extraction, pattern matching, and structural analysis. Store findings as structured research documents in /docs/research/claude-code-rvsource/.

Approach: Agentic Jujutsu

Use Claude Code's own agent capabilities (background research agents, parallel tool calls) to analyze itself — the tool examines its own binary.

Architecture Findings

Binary Structure

  • Format: Bun 1.3.11 SEA (Single Executable Application)
  • Size: 229 MB binary, ~12.8 MB bundled/minified JavaScript
  • Entry: cli.js — single bundle with all application code
  • Runtime: Bun (not Node.js) with native module support

Code Metrics (from decompilation)

Metric Count
Classes 1,557
Functions 19,464
Async functions 884
Arrow functions 23,537
Environment variables 498
Built-in tools 25+
Slash commands 39
Permission modes 6
Auth providers 5
Extension points 13

Core Modules Identified

Module Minified Symbol Role
Agent Loop s$ Async generator — recursive yield* after tool execution
Tool Registry XF0 validateInput/call interface, 25+ built-in tools
Permission Checker 5 modes, sandbox (bubblewrap/seatbelt), managed settings
Context Manager Auto-compaction via clear_tool_uses_20250919 API feature
MCP Client stdio/SSE/WebSocket transports, OAuth/OIDC, MCPB bundles
Streaming Handler SSE event processing, 3 stream modes

Key Architectural Patterns

  1. Recursive generator loop: The agent loop is an async generator that yield*s itself after each tool execution, producing 13 distinct event types
  2. Loopback proxy for MCP: Tool calls from MCP are proxied to REST endpoints via HTTP loopback on localhost
  3. Deferred tool loading: MCP tool schemas loaded lazily via ToolSearch to keep initial context small
  4. Context compaction: Automatic context window management with micro-compaction for stale tool results

Research Output

Documents (18 files, ~3,000 lines)

File Content
00-index.md Master index
01 - 13 Architecture analysis (overview, tools, agent loop, permissions, MCP, hooks, context, config, subagents, models, telemetry, dependencies, extensions)
14 - 18 Source code analysis (extraction, core modules, call graphs, class hierarchy, state machines)

Extracted Source (6 RVF-header files)

Module extractions in extracted/ with metadata headers:

  • agent-loop.rvf, tool-dispatch.rvf, permission-system.rvf
  • mcp-client.rvf, context-manager.rvf, streaming-handler.rvf

Note: These use text format with RVF-style metadata headers, not binary RVF containers. Future work: convert to proper RVF cognitive containers with vector embeddings using rvf-cli.

Decompiler Script

/scripts/claude-code-decompile.sh — automated extraction tool:

  • Finds Claude Code source (npm package or Bun SEA binary)
  • Extracts JavaScript via strings + pattern matching
  • Basic beautification and module splitting
  • Generates files with metadata headers

Consequences

Positive

  • Comprehensive internal reference for building Claude Code extensions
  • Call graphs and state machines clarify execution flow
  • Decompiler script enables analysis of future versions
  • Understanding of tool dispatch enables better MCP server design

Negative

  • Analysis based on minified code — some mappings are speculative
  • Findings may become stale as Claude Code updates frequently
  • Binary extraction captures string literals well but misses control flow nuances

Risks

  • Decompiled source is for internal research only — respect Anthropic's terms of service
  • Minified symbol names (s$, XF0, ye) may change between versions

References