ruvector/docs/research/claude-code-rvsource/14-source-extraction.md
rUv 930fca916f feat(sse): decouple SSE to mcp.pi.ruv.io proxy + Claude Code source research
Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-02 23:39:56 +00:00


# 14 - Source Extraction and Code Metrics
## Binary Analysis
### Distribution Formats
Claude Code ships in two forms:
| Format | Path | Size | Version |
|--------|------|------|---------|
| Bun SEA (ELF) | `~/.local/share/claude/versions/2.1.90` | 229,902,976 bytes (219 MiB) | 2.1.90 |
| NPM bundle | `@anthropic-ai/claude-code/cli.js` | 11,044,554 bytes (10.5 MiB) | 2.0.62 |
The Bun SEA binary is a dynamically linked ELF executable that embeds the Bun runtime (v1.2+) with
the JavaScript bundle inlined. The NPM package ships the same application code as a single minified
`cli.js` file, plus tree-sitter WASM modules and a vendored ripgrep binary. The two artifacts examined
here are different releases (2.1.90 vs 2.0.62), so their code matches in structure, not byte for byte.
### Binary Structure (Bun SEA)
```
ELF 64-bit LSB executable, x86-64
├── Bun runtime (bulk of the ~219 MiB file)
│   ├── JavaScriptCore bindings
│   ├── libc, libcrypto, libssl
│   └── Node.js compatibility layer
├── Embedded JS bundle (~11 MB equivalent)
│   ├── Minified application code
│   ├── Bundled npm dependencies (lodash, zod, ink, etc.)
│   └── Tree-sitter grammars (WASM)
└── Bun SEA metadata markers
    ├── @bun @bytecode @bun-cjs
    └── {"method":"Bun.canReload"}
```
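The container type and the markers listed above can be checked without special tooling. The following is a minimal sketch, assuming a POSIX shell with `od` and a `grep` that supports `-a` (treat binary as text). The helper names `is_elf` and `has_bun_marker` are hypothetical and not part of any Bun or Claude Code tooling.

```bash
# Succeeds if the file starts with the ELF magic bytes 0x7f 'E' 'L' 'F'.
is_elf() {
  [ "$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')" = "7f454c46" ]
}

# Succeeds if the file contains the literal marker string ($2, default '@bun-cjs').
has_bun_marker() {
  grep -a -q -F -- "${2:-@bun-cjs}" "$1"
}

# Example (paths as given in this document):
#   is_elf ~/.local/share/claude/versions/2.1.90 && echo "ELF container"
#   has_bun_marker ~/.local/share/claude/versions/2.1.90 '@bytecode' && echo "Bun marker found"
```

A match on the marker only shows that the string is present somewhere in the file; it does not locate the embedded bundle.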
### NPM Package Layout
```
@anthropic-ai/claude-code/
├── cli.js                  11,044,554 bytes  (single minified bundle)
├── sdk-tools.d.ts              65,511 bytes  (TypeScript type defs for tools)
├── package.json                 1,196 bytes
├── tree-sitter-bash.wasm    1,380,769 bytes
├── tree-sitter.wasm           205,498 bytes
├── vendor/
│   └── ripgrep/            (bundled rg binary)
├── LICENSE.md
└── README.md
```
### Version Management
Multiple versions coexist under `~/.local/share/claude/versions/`:
- `2.1.86` (228,280,960 bytes) - 2026-03-27
- `2.1.87` (228,280,960 bytes) - 2026-03-29
- `2.1.90` (229,902,976 bytes) - 2026-04-02 (current)
The active version is symlinked: `~/.local/bin/claude -> ~/.local/share/claude/versions/2.1.90`
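To confirm which version is active, the symlink can be resolved directly. This is a small sketch assuming a `readlink` that supports `-f` (GNU coreutils); `active_claude_version` is a hypothetical helper name.

```bash
# Print the version directory name that the launcher symlink resolves to.
# $1 = path to the launcher symlink (defaults to the path given in this document).
active_claude_version() {
  local link="${1:-$HOME/.local/bin/claude}"
  basename "$(readlink -f "$link")"
}

# Example:
#   active_claude_version
#   ls -l ~/.local/share/claude/versions/    # list all installed versions with sizes
```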
## Code Metrics (from cli.js v2.0.62)
| Metric | Count |
|--------|-------|
| File size | 11,044,554 bytes |
| Lines (minified) | 4,836 |
| Estimated functions | 19,464 |
| Async functions | 884 |
| Arrow functions | 23,537 |
| Classes | 1,557 |
| Class inheritance (extends) | 956 |
| `for await` loops | 41 |
| `yield*` statements | 18 |
| Async generators | 6 (core loop functions) |
| Node.js built-in imports | 25+ modules |
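Because the bundle has only 4,836 very long lines, line-based counting with `grep -c` undercounts; counting every match with `grep -o` is closer to what a table like this needs. The sketch below is one way to produce such counts. The patterns are assumptions, they also match inside string literals, and they may not reproduce the figures above exactly. `count_occurrences` is a hypothetical helper name.

```bash
# Count every occurrence of an extended regex in a file, not just matching lines.
# $1 = extended regular expression, $2 = file
count_occurrences() {
  grep -o -E -- "$1" "$2" | wc -l | tr -d ' '
}

# Example against the NPM bundle (CLI is assumed to hold the path to cli.js):
#   count_occurrences 'async function' "$CLI"
#   count_occurrences '=>'             "$CLI"
#   count_occurrences 'for await'      "$CLI"
#   count_occurrences 'yield\*'        "$CLI"
#   count_occurrences ' extends '      "$CLI"
```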
### Estimated Original Source Size
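The minified bundle is ~11 MB in 4,836 lines. Applying a typical minification ratio of 3-5x to the
whole bundle, bundled dependencies included, gives an unminified size of roughly 33-55 MB. The original
source is estimated at 50,000-150,000 lines across hundreds of modules. That line count is best read as
application code only, since 33-55 MB spread over 150,000 lines would average more than 200 bytes per
line. Both figures are rough estimates.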
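The size estimate is plain arithmetic on the measured bundle size, sketched here with shell integer math. The 3x and 5x ratios are the assumed minification ratios from the text, not measured values, and the helper names are hypothetical.

```bash
# Print the low and high unminified-size estimates in bytes.
# $1 = minified size in bytes, $2 = low ratio, $3 = high ratio
estimate_source_bytes() {
  echo "$(( $1 * $2 )) $(( $1 * $3 ))"
}

# Print the average bytes per line (integer division).
# $1 = size in bytes, $2 = line count
avg_line_bytes() {
  echo "$(( $1 / $2 ))"
}

# Example with the figures from the metrics table:
#   estimate_source_bytes 11044554 3 5    # prints: 33133662 55222770
#   avg_line_bytes 11044554 4836          # prints: 2283
```

An average of about 2.3 KB per minified line shows why line counts of the bundle say little about the size of the original source.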
### Minification Characteristics
The code uses aggressive mangling:
- All local variables shortened to 1-3 characters (`A`, `Q`, `B`, `G`, `Z`, `Y`)
- Module-scoped functions use short mangled identifiers (`s$`, `ye`, `nB`, `QA`, `wG`)
- Class names are mangled to 2-3 characters (`L6`, `V9`, `GI`, `RX`, `oJ`, `c90`)
- Original names survive only in string literals and public API surfaces
### Key Identifiable Functions (from code patterns)
| Minified Name | Likely Original | Evidence |
|--------------|-----------------|----------|
| `s$` | `agentLoop` / `queryLoop` | Core async generator; receives messages, systemPrompt, canUseTool, toolUseContext |
| `ye` | `resolveModel` | Takes permissionMode, mainLoopModel, exceeds200kTokens |
| `Ll` | `createInitialAppState` | Returns full AppState object with all fields |
| `QA` | `trackEvent` / `telemetry` | Called everywhere with event name + payload |
| `nB` | `updateSettings` | Writes to settings store |
| `wG` | `logDebug` | Debug logging with event names like "query_query_start" |
| `Y7` | `getMainLoopModel` | Returns current model |
| `Y0` | `getCwd` | Returns current working directory |
| `GB` | `getProjectDir` | Returns project directory |
| `Bd` | `getContextWindow` | Takes model, returns context window size |
| `xC` | `getToolPermissionContext` | Returns permission context object |
| `Sn` | `extractTextContent` | Extracts text from API response |
| `B0` | `getAgentId` | Returns current agent ID |
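Mappings like those in the table rest on string literals that survive minification. One way to gather such evidence is to print a window of code around a known literal and read off the minified caller. This is a sketch, assuming the literal contains no regex metacharacters; `context_around` is a hypothetical helper name.

```bash
# Print each occurrence of a literal with up to N characters of context on both sides.
# $1 = literal (assumed free of regex metacharacters), $2 = file, $3 = context width (default 80)
context_around() {
  local n="${3:-80}"
  grep -o -E -- ".{0,${n}}$1.{0,${n}}" "$2"
}

# Example (CLI is assumed to hold the path to cli.js):
#   context_around 'query_query_start' "$CLI" 60
```

The surrounding window typically shows which minified function is called with the literal, which is the kind of evidence the "Evidence" column summarizes.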
## Extraction Methods
### Method 1: NPM Package (recommended)
The `cli.js` from the NPM package is directly analyzable JavaScript:
```bash
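# If claude-flow is installed globally, the bundle is already on disk:
CLI="$(npm root -g)/claude-flow/node_modules/@anthropic-ai/claude-code/cli.js"
# Otherwise download and unpack the tarball (npm pack does not install; files land in ./package/):
npm pack @anthropic-ai/claude-code && tar xzf anthropic-ai-claude-code-*.tgz
CLI="package/cli.js"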
```
### Method 2: Binary Strings
```bash
# Count lines of printable strings that contain "function" or "class "
strings ~/.local/share/claude/versions/2.1.90 | grep -cE 'function|class '
# Reports ~9,887 lines containing readable JS fragments
```
The binary contains the same JS but embedded within the Bun SEA container. The NPM
package provides cleaner access to the same code.
### Method 3: Bun SEA Extraction
The Bun SEA format embeds a bytecode blob. To extract:
```bash
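BIN=~/.local/share/claude/versions/2.1.90
# Find the decimal byte offset of the first shebang string, a likely start of the JS entry section
strings -t d "$BIN" | grep '#!/usr/bin/env' | head -1
# The first column of the output is the offset; carve from there to extract the embedded module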
```
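The carving step can be sketched with standard tools. This assumes GNU `grep` (for `-a`, `-b`, `-o`) and treats the shebang as a heuristic start marker; where the embedded module ends is not determined here, so the caller picks a byte count. `first_offset_of` and `carve_from` are hypothetical helper names.

```bash
# Print the 0-based byte offset of the first occurrence of a literal in a file.
# $1 = literal string, $2 = file
first_offset_of() {
  grep -a -b -o -F -m1 -- "$1" "$2" | head -n 1 | cut -d: -f1
}

# Write up to N bytes of a file, starting at a 0-based byte offset, to stdout.
# $1 = file, $2 = offset, $3 = number of bytes
carve_from() {
  tail -c +"$(( $2 + 1 ))" "$1" | head -c "$3"
}

# Example:
#   BIN=~/.local/share/claude/versions/2.1.90
#   OFF=$(first_offset_of '#!/usr/bin/env' "$BIN")
#   carve_from "$BIN" "$OFF" 1000000 > entry-fragment.js
```

The carved fragment is only a starting point for inspection; the NPM package remains the cleaner source for analysis.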
## Source Files Referenced
- Extracted module analysis: `extracted/agent-loop.rvf`
- Tool dispatch patterns: `extracted/tool-dispatch.rvf`
- Permission system: `extracted/permission-system.rvf`
- MCP client: `extracted/mcp-client.rvf`
- Context manager: `extracted/context-manager.rvf`
- Streaming handler: `extracted/streaming-handler.rvf`
## Dependencies (Node.js Built-in Imports)
```
assert, async_hooks, child_process, crypto, events, fs, fs/promises,
http, https, module, net, os, path, process, stream, tty, url, util, zlib
node:buffer, node:child_process, node:crypto, node:fs, node:fs/promises,
node:http, node:https, node:module, node:net, node:os, node:path,
node:process, node:stream, node:timers/promises, node:tty, node:url,
node:util, node:zlib
```
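A list like this can be approximated by collecting the module names passed to `require(...)` in the bundle. The sketch below assumes double-quoted, lowercase module names and ignores ESM `import` forms; any non-built-in package that is still required at runtime would also appear. `list_required_modules` is a hypothetical helper name.

```bash
# Print the unique module names passed to require("...") in a file, sorted.
# $1 = file
list_required_modules() {
  grep -o -E 'require\("[a-z_:/]+"\)' "$1" \
    | sed -E 's/^require\("//; s/"\)$//' \
    | sort -u
}

# Example (CLI is assumed to hold the path to cli.js):
#   list_required_modules "$CLI"
```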
## Bundled Third-Party Libraries (identified from code patterns)
- Zod (schema validation - `S.string()`, `S.enum()`, `S.record()`)
- Ink / React (terminal UI - `createElement`, `useCallback`, `useEffect`, `useRef`)
- Sentry (error tracking - `globalEventProcessors`, `_dispatching`)
- GrowthBook (feature flags - `stickyBucketService`, `getExperiment`)
- Statsig (experimentation - `_getFeatureGateImpl`)
- Sharp (image processing - `@img/sharp-*` optional deps)
- node-forge (crypto - `aes.startEncrypting`, `aes.createDecryptionCipher`)
- tree-sitter (AST parsing - WASM modules)
- ripgrep (file search - vendored binary)
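The identification above amounts to searching the bundle for marker strings that are characteristic of each library. A minimal sketch using a subset of the markers listed in this section; a hit suggests the library is bundled but does not prove it, and `detect_libs` is a hypothetical helper name.

```bash
# Print the name of each library whose marker string occurs in the file.
# $1 = file; the library:marker pairs are taken from the list in this section.
detect_libs() {
  local file="$1" entry
  for entry in \
    "Sentry:globalEventProcessors" \
    "GrowthBook:stickyBucketService" \
    "Statsig:_getFeatureGateImpl" \
    "node-forge:aes.startEncrypting"; do
    # ${entry#*:} is the marker, ${entry%%:*} is the library name
    if grep -q -F -- "${entry#*:}" "$file"; then
      echo "${entry%%:*}"
    fi
  done
}

# Example (CLI is assumed to hold the path to cli.js):
#   detect_libs "$CLI"
```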