mirror of https://github.com/ruvnet/RuVector.git synced 2026-05-25 15:03:46 +00:00

rUv e39b5901c1 feat(decompiler): rebuild all versions — organized source/rvf separation, 100% coverage

Rebuilt all 4 versions from scratch:
- v0.2.x: 1,049 classes, 13,869 functions, 3,375 RVF vectors
- v1.0.x: 1,390 classes, 16,593 functions, 4,669 RVF vectors
- v2.0.x: 1,612 classes, 20,395 functions, 5,712 RVF vectors
- v2.1.x: 1,632 classes, 19,906 functions, 9,058 RVF vectors

Structure: source/ (17 JS modules in subfolders) + rvf/ (9 containers)
- Zero mixing: no JS in rvf dirs, no RVF in source dirs
- 100% code coverage: uncategorized/ catches everything
- 17 modules: core/3, tools/3, permissions/1, config/3, telemetry/1, ui/2, types/1, uncategorized/1
- 9 RVF containers per version (1 master + 8 per-category)

Co-Authored-By: claude-flow <ruv@ruv.net>

2026-04-03 03:18:41 +00:00

8.7 KiB


Claude Code CLI Source Analysis: Research Index

Project

Deep-dive reverse engineering of the Claude Code CLI (v2.1.91) internal architecture, based on binary analysis, string extraction, pattern matching, and configuration schema examination.

Methodology

This analysis used "agentic jujutsu" -- the tool analyzed itself by:

1. Locating the binary and extension files on disk
2. Extracting the embedded JavaScript source from the Bun SEA binary
3. Pattern-matching against 12.8 MB of application code for class names, function signatures, string literals, and configuration patterns
4. Analyzing the 76-property settings schema
5. Cross-referencing 498 environment variables
6. Mapping tool definitions, hook events, and MCP protocol methods
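
The pattern-matching step can be illustrated with a minimal sketch. The `scanBundle` helper and its two regexes are assumptions made for illustration; they are not the patterns the analysis itself used, and a real pass over minified code would need many more cases.

```javascript
// Hypothetical helper: scan bundled JavaScript text for two cheap signals.
// The regexes are illustrative only, not the ones used in the analysis.
function scanBundle(source) {
  const classNames = new Set();
  const envVars = new Set();

  // `class Foo` and `class Foo extends Bar` declarations
  for (const m of source.matchAll(/\bclass\s+([A-Za-z_$][\w$]*)/g)) {
    classNames.add(m[1]);
  }
  // `process.env.NAME` property reads
  for (const m of source.matchAll(/\bprocess\.env\.([A-Z][A-Z0-9_]*)/g)) {
    envVars.add(m[1]);
  }
  return {
    classNames: [...classNames].sort(),
    envVars: [...envVars].sort(),
  };
}

// A made-up minified fragment using mangled names of the kind the text mentions.
const sample =
  'class Yq extends f9{run(){return process.env.ANTHROPIC_API_KEY}}' +
  'class f9{}var d=process.env.HOME||process.env.ANTHROPIC_API_KEY;';

const result = scanBundle(sample);
console.log(result.classNames); // [ 'Yq', 'f9' ]
console.log(result.envVars);    // [ 'ANTHROPIC_API_KEY', 'HOME' ]
```

Using sets keeps repeated references from inflating the counts, which matters when the same environment variable is read in many places.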

Research Documents

#	Document	Description
01	Overview and Binary Structure	Binary format, installation paths, Bun SEA architecture, version management
02	Tool System	25+ built-in tools, MCP tool integration, tool schemas, validation, content block types
03	Agent Loop and Execution Flow	Entry points, main loop, streaming, conversation management, slash commands, output formats
04	Permission System	6 permission modes, permission flow, sandbox integration, managed settings
05	MCP Integration	4 transports, 13 protocol methods, connection management, OAuth, tool discovery
06	Hooks System	6 hook events, command/HTTP hook types, lifecycle, security controls
07	Context and Session Management	Token budgets, auto-compaction, session persistence, CLAUDE.md, file checkpointing, prompt caching
08	Configuration and Environment	Settings hierarchy, 76 settings, 498 env vars, home directory structure
09	Agent and Subagent System	Agent types, task/subagent lifecycle, skill system, plugin marketplace
10	Models and API	27+ model IDs, 5 provider backends, API endpoints, prompt caching, effort levels
11	Telemetry and Observability	OpenTelemetry, Datadog, Perfetto, debugging, cost tracking
12	Dependency Graph	Module relationships, data flow, state management, initialization sequence
13	Extension Points	13 extension mechanisms from CLAUDE.md to Agent SDK
14	Source Extraction	Binary analysis, code metrics, extraction methods, dependency identification
15	Core Module Analysis	Agent loop, tool dispatch, permissions, context management, MCP, streaming
16	Call Graphs	Mermaid call graphs: boot, agent loop, tool dispatch, permissions, MCP, compaction
17	Class Hierarchy	1,557 classes, inheritance trees, AppState type, tool registry
18	State Machines	Agent loop, permission, session, streaming, MCP, sandbox state machines

Extracted Source (v2.1.91)

Source files and RVF containers are kept in separate directories. The master RVF holds 9,058 vectors.

Directory	Module	Fragments	Confidence
`source/core/`	agent-loop.js	77	High
`source/core/`	context-manager.js	49	High
`source/core/`	streaming-handler.js	24	High
`source/core/`	session.js	361	High
`source/tools/`	tool-dispatch.js	531	High
`source/tools/mcp/`	mcp-client.js	51	High
`source/permissions/`	permission-system.js	500	High
`source/config/`	config.js	473	High
`source/config/`	model-provider.js	165	Medium
`source/config/`	env-vars.js	223	Pattern
`source/telemetry/`	telemetry.js	524	High
`source/telemetry/`	telemetry-events.js	861	Pattern
`source/ui/`	commands.js	80	Medium
`source/ui/`	command-defs.js	93	Pattern
`source/types/`	class-hierarchy.js	1,467	Pattern
`source/types/`	api-endpoints.js	52	Pattern
`source/uncategorized/`	uncategorized.js	3,162	Low

RVF containers in rvf/: master.rvf (all), core.rvf, tools.rvf, permissions.rvf, config.rvf, telemetry.rvf, etc.
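
The source/RVF separation can be checked mechanically. The sketch below assumes a flat list of paths relative to one version folder; `findMixedFiles` is a hypothetical helper, not part of the repository's scripts.

```javascript
// Hypothetical check: report files that sit in the wrong top-level directory.
// Assumes paths are relative to one version folder, e.g. "source/core/session.js".
function findMixedFiles(paths) {
  const violations = [];
  for (const p of paths) {
    const [top] = p.split('/');
    if (top === 'rvf' && p.endsWith('.js')) violations.push(p);      // JS inside rvf/
    if (top === 'source' && p.endsWith('.rvf')) violations.push(p);  // RVF inside source/
  }
  return violations;
}

const listing = [
  'source/core/agent-loop.js',
  'source/tools/mcp/mcp-client.js',
  'rvf/master.rvf',
  'rvf/core.rvf',
];
console.log(findMixedFiles(listing)); // []
```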

Additional Research

#	Document	Description
19	RuVector Integration Guide	6-tier integration plan: WASM MCP, agents, hooks, cache, SDK, plugin
20	SOTA Decompiler Research	Survey of JSNice, DeGuard, DIRE, VarCLR + ruDevolution validation
21	Model Weight Analysis	Embedded models, LoRA federation, GPU training, GGUF parsing

RVF Version Corpus

Version	Latest	Vectors	RVF Size	Bundle	Classes	Functions	Modules
v0.2.x	0.2.126	3,375	1,731 KB	6.9 MB	1,049	13,869	17
v1.0.x	1.0.128	4,669	2,388 KB	8.9 MB	1,390	16,593	17
v2.0.x	2.0.77	5,712	2,918 KB	10.5 MB	1,612	20,395	17
v2.1.x	2.1.91	9,058	4,617 KB	12.6 MB	1,632	19,906	17
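
Version-to-version growth can be derived from the table with a few lines of arithmetic. The figures below are copied from the table; the `growth` helper is an illustrative assumption.

```javascript
// Figures copied from the RVF Version Corpus table.
const corpus = [
  { version: 'v0.2.x', vectors: 3375, classes: 1049, functions: 13869 },
  { version: 'v1.0.x', vectors: 4669, classes: 1390, functions: 16593 },
  { version: 'v2.0.x', vectors: 5712, classes: 1612, functions: 20395 },
  { version: 'v2.1.x', vectors: 9058, classes: 1632, functions: 19906 },
];

// Percentage change of one field between consecutive rows, rounded to 0.1.
function growth(rows, field) {
  return rows.slice(1).map((row, i) => {
    const prev = rows[i];
    const pct = ((row[field] - prev[field]) / prev[field]) * 100;
    return { from: prev.version, to: row.version, pct: Number(pct.toFixed(1)) };
  });
}

console.log(growth(corpus, 'vectors'));   // pct: 38.3, 22.3, 58.6
console.log(growth(corpus, 'functions')); // pct: 19.6, 22.9, -2.4
```

Between v2.0.x and v2.1.x the vector count rises by 58.6% while the function count falls by 2.4%, so the jump in vectors is not explained by a larger code base alone.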

Tools

Tool	Description
`scripts/rebuild-all-versions.mjs`	Full rebuild of all version decompilations (Node.js)
`scripts/claude-code-decompile.sh`	CLI decompiler (extract, beautify, split)
`scripts/claude-code-rvf-corpus.sh`	Build RVF containers for all versions (shell wrapper)
`npm/packages/ruvector/src/decompiler/`	Decompiler library (module-splitter, metrics, witness)
`npx ruvector decompile <package>`	npm CLI decompiler
`examples/decompiler-dashboard/`	Visual explorer (Vite + React)
`crates/ruvector-decompiler/`	Rust decompiler crate (MinCut + AI + witness)

ruDevolution SOTA Results

95.7% name accuracy, ahead of JSNice (63%), DIRE (65.8%), and VarCLR (72%) by 23.7 to 32.7 percentage points.

The model is a 673K-parameter transformer trained on 8,201 pairs, with pure-Rust inference (<5 ms, zero dependencies).

Key Findings

Architecture Summary

- Runtime: Bun 1.3.11 Single Executable Application (229 MB binary)
- Application code: ~12.8 MB of bundled, minified JavaScript
- UI: React 18.3.1 WebView (VS Code) + Ink-style terminal (CLI)
- API: Anthropic Messages API with streaming SSE
- Extension: MCP client protocol with 4 transports
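
To make the streaming SSE point concrete, here is a minimal sketch of parsing a Server-Sent Events buffer. `parseSse` is a hypothetical helper written for this example, not code recovered from the bundle. It assumes the whole response text is already in memory, and the sample payload is abbreviated.

```javascript
// Minimal Server-Sent Events parser for a complete text buffer.
// Handles `event:` and `data:` fields and comment lines; ignores `id:` and `retry:`.
function parseSse(text) {
  const events = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = 'message'; // default event type in the SSE format
    const data = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === '' || line.startsWith(':')) continue; // blank or comment line
      const idx = line.indexOf(':');
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? '' : line.slice(idx + 1);
      if (value.startsWith(' ')) value = value.slice(1); // strip one leading space
      if (field === 'event') event = value;
      else if (field === 'data') data.push(value);
    }
    // Blocks without any data field do not produce an event.
    if (data.length > 0) events.push({ event, data: data.join('\n') });
  }
  return events;
}

// Sample shaped like a streamed text delta (illustrative and abbreviated).
const stream =
  'event: content_block_delta\n' +
  'data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hi"}}\n\n' +
  'event: message_stop\n' +
  'data: {"type":"message_stop"}\n\n';

for (const { event, data } of parseSse(stream)) {
  const payload = JSON.parse(data);
  if (event === 'content_block_delta') console.log(payload.delta.text); // Hi
}
```

A production client would parse incrementally as chunks arrive rather than splitting a finished buffer.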

By the Numbers

Metric	Count
Built-in tools	25+
Slash commands	39
Environment variables	498
Settings properties	76
Supported models	27+
MCP protocol methods	13
Hook event types	6
Permission modes	5 (acceptEdits, bypassPermissions, default, dontAsk, plan)
Extension mechanisms	13
Auth providers	5
MCP transports	4
Output formats	3
Source code classes	1,557
Functions (estimated)	19,464
Async generators (core loops)	6
Bundle size (minified)	11 MB / 4,836 lines

Architecture Pattern

Claude Code follows a plugin-oriented monolith pattern:

- Single binary deployment (Bun SEA)
- Modular internal architecture with clear subsystem boundaries
- Extensive extension surface (MCP, hooks, agents, skills, plugins)
- Multi-provider backend abstraction (Anthropic/AWS/GCP/Azure)
- Layered security (permissions -> sandbox -> hooks -> managed settings)
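
The layered-security idea can be modeled as a chain of checks evaluated in order. Everything in the sketch below is an assumption made for illustration: the rules inside each layer, the request fields, and the first-deny-wins semantics are invented and are not recovered from Claude Code. Only the layer order comes from the list above.

```javascript
// Hypothetical model of a layered check. Layer order follows the text;
// the rules and request fields are invented for illustration.
const layers = [
  { name: 'permissions', check: (req) => (req.mode === 'plan' && req.writes ? 'deny' : 'pass') },
  { name: 'sandbox', check: (req) => (req.path.startsWith('/etc/') ? 'deny' : 'pass') },
  { name: 'hooks', check: () => 'pass' },
  { name: 'managedSettings', check: (req) => (req.blockedByPolicy === true ? 'deny' : 'pass') },
];

// Walk the layers in order; the first deny short-circuits the request (assumed semantics).
function evaluate(request, chain = layers) {
  for (const layer of chain) {
    if (layer.check(request) === 'deny') {
      return { allowed: false, deniedBy: layer.name };
    }
  }
  return { allowed: true, deniedBy: null };
}

console.log(evaluate({ mode: 'plan', writes: true, path: '/tmp/a' }));
// { allowed: false, deniedBy: 'permissions' }
console.log(evaluate({ mode: 'default', writes: true, path: '/etc/hosts' }));
// { allowed: false, deniedBy: 'sandbox' }
console.log(evaluate({ mode: 'default', writes: false, path: '/tmp/a' }));
// { allowed: true, deniedBy: null }
```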

Limitations

- Source is minified/mangled: variable names are meaningless (e.g., Yq, f9)
- Exact function boundaries and module structure cannot be traced
- A ~100 MB region identified as a V8 snapshot could not be decompiled; Bun embeds JavaScriptCore rather than V8, so that identification is tentative
- Some patterns may come from bundled dependencies, not Claude Code itself
- This analysis reflects v2.1.91; architecture may change between versions
