mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-27 00:25:10 +00:00
666 lines
23 KiB
Markdown
666 lines
23 KiB
Markdown
<h1 align="center">ruDevolution</h1>
|
|
<h3 align="center">AI-Powered JavaScript Decompiler</h3>
|
|
|
|
<p align="center">
|
|
<em>The first decompiler that understands code semantically, recovers original names with AI,<br/>proves every transformation with cryptographic witness chains, and gets smarter with every run.</em>
|
|
</p>
|
|
|
|
<p align="center">
|
|
🧠 MinCut Module Detection •
|
|
🔮 AI Name Recovery •
|
|
🔗 Cryptographic Witness Chains •
|
|
📊 Confidence Scoring •
|
|
🧬 Self-Learning
|
|
</p>
|
|
|
|
<p align="center">
|
|
<img alt="Tests" src="https://img.shields.io/badge/tests-59_passing-brightgreen?style=flat-square" />
|
|
<img alt="Accuracy" src="https://img.shields.io/badge/accuracy-95.7%25-brightgreen?style=flat-square" />
|
|
<img alt="Parse Rate" src="https://img.shields.io/badge/parse_rate-100%25-brightgreen?style=flat-square" />
|
|
<img alt="Patterns" src="https://img.shields.io/badge/patterns-210-blue?style=flat-square" />
|
|
<img alt="License" src="https://img.shields.io/badge/license-MIT-blue?style=flat-square" />
|
|
<img alt="Rust" src="https://img.shields.io/badge/rust-pure-orange?style=flat-square" />
|
|
</p>
|
|
|
|
---
|
|
|
|
## 🧠 What is ruDevolution?
|
|
|
|
**ruDevolution** is a next-generation JavaScript decompiler built in pure Rust. It takes minified, obfuscated, or bundled JavaScript — the kind produced by esbuild, webpack, Terser, or any bundler — and reconstructs readable source code with original module boundaries, meaningful variable names, and full cryptographic provenance.
|
|
|
|
Unlike traditional decompilers that only reformat whitespace, ruDevolution uses **graph algorithms** (MinCut partitioning) to detect where modules originally split apart, **AI inference** (neural + 210 pattern rules) to predict what variables were originally called, and **Merkle witness chains** to mathematically prove that every line of output faithfully derives from the input. It learns from corrections, improves across runs, and can be trained on GPU for domain-specific accuracy.
|
|
|
|
**Put simply**: paste in unreadable code, get back organized, named, verified source — with a confidence score on every recovered name and a cryptographic proof that nothing was fabricated.
|
|
|
|
---
|
|
|
|
## 📦 Install
|
|
|
|
```bash
|
|
# npm (CLI + MCP tools)
|
|
npm install -g ruvector
|
|
|
|
# Rust (full pipeline with graph partitioning)
|
|
cargo install ruvector-decompiler
|
|
|
|
# Or just use npx (no install needed)
|
|
npx ruvector decompile <package>
|
|
```
|
|
|
|
---
|
|
|
|
## ⚡ Quick Start
|
|
|
|
```bash
|
|
npx ruvector decompile @anthropic-ai/claude-code
|
|
```
|
|
|
|
That's it. One command → 878 modules, 100% valid JavaScript, cryptographic witness chain.
|
|
|
|
📥 **[Download pre-built Claude Code decompilation →](https://github.com/ruvnet/rudevolution/releases/tag/v0.1.0-claude-code-v2.0.62)**
|
|
|
|
```bash
|
|
# Or decompile anything
|
|
npx ruvector decompile <package-name> # any npm package
|
|
npx ruvector decompile ./bundle.min.js # local file
|
|
npx ruvector decompile https://unpkg.com/x # URL
|
|
```
|
|
|
|
### Claude Code Example Output
|
|
|
|
```
|
|
Phase 1 (Parse): 3.2s — 27,477 declarations found
|
|
Phase 2 (Graph): 0.4s — 353,323 reference edges
|
|
Phase 3 (Partition): 0.9s — 1,029 modules (Louvain community detection)
|
|
Phase 4 (Infer): 13.4s — 25,465 names recovered (95.7% accuracy)
|
|
Phase 8 (Validate): 878/878 parse (100%) — auto-fixed
|
|
|
|
Output: source/ (878 .js files) + witness.json + metrics.json
|
|
```
|
|
|
|
### 📥 Pre-Built Releases
|
|
|
|
Every major Claude Code version, decompiled and downloadable:
|
|
|
|
| Version | Bundle | Declarations | Key Discoveries | Download |
|
|
|---------|--------|:------------:|-----------------|:--------:|
|
|
| **v2.1.91** | 13.2 MB | 34,759 | 🤖 Agent Teams, 🌙 Auto Dream Mode, 🔮 opus-4-6/sonnet-4-6 models, 🔐 Amber codenames, 🧰 Advisor Tool, 📡 MCP Streamable HTTP | [**Latest →**](https://github.com/ruvnet/rudevolution/releases/tag/v0.1.0-claude-code-v2.1.91) |
|
|
| v2.0.62 | 11.0 MB | 27,477 | 498 env vars, Plan V2, plugin marketplace, remote sessions | [Download](https://github.com/ruvnet/rudevolution/releases/tag/v0.1.0-claude-code-v2.0.62) |
|
|
| v2.0.77 | 10.5 MB | 20,395 | Skills, 39 slash commands, custom agents, multi-provider auth | [Download](https://github.com/ruvnet/rudevolution/releases/tag/v0.1.0-claude-code-v2.0.77) |
|
|
| v1.0.128 | 8.9 MB | 16,593 | Agent tool, WebFetch, hooks system, context compaction | [Download](https://github.com/ruvnet/rudevolution/releases/tag/v0.1.0-claude-code-v1.0.128) |
|
|
| v0.2.126 | 6.9 MB | 13,869 | Core architecture, tools, MCP client, permissions | [Download](https://github.com/ruvnet/rudevolution/releases/tag/v0.1.0-claude-code-v0.2.126) |
|
|
|
|
### 🏃 It Runs. It's Modifiable.
|
|
|
|
The decompiled output isn't just readable — **it's a fully functional drop-in replacement:**
|
|
|
|
```bash
|
|
# Download the decompiled Claude Code
|
|
curl -LO https://github.com/ruvnet/rudevolution/releases/download/v0.1.0-claude-code-v2.0.62/claude-code-v2.0.62-decompiled.js
|
|
|
|
# Run it — identical behavior to the original
|
|
node claude-code-v2.0.62-decompiled.js --version
|
|
# → 2.0.62 (Claude Code)
|
|
|
|
# Modify it — add logging, change behavior, build extensions
|
|
cp claude-code-v2.0.62-decompiled.js my-custom-claude.js
|
|
# Edit my-custom-claude.js (2,222 /* Module: XXX */ comments guide you)
|
|
node my-custom-claude.js --version
|
|
# → Still works!
|
|
```
|
|
|
|
Every transform is **verified at build time** — if a change would break execution, it's automatically reverted. The witness chain proves nothing was added or removed from the original.
|
|
|
|
---
|
|
|
|
## ⚖️ Legal Basis
|
|
|
|
**Reverse engineering published software for interoperability is legal** in most jurisdictions:
|
|
|
|
| Jurisdiction | Law | What It Allows |
|
|
|-------------|-----|----------------|
|
|
| 🇺🇸 **United States** | DMCA §1201(f), Copyright Act §117 | Reverse engineering for interoperability, security research, and understanding how software you own a copy of works |
|
|
| 🇪🇺 **European Union** | Software Directive (2009/24/EC), Art. 6 | Decompilation for interoperability without authorization from the rightholder |
|
|
| 🇬🇧 **United Kingdom** | Copyright, Designs and Patents Act 1988, §50B | Decompilation for interoperability purposes |
|
|
| 🇦🇺 **Australia** | Copyright Act 1968, §47D | Reverse engineering for interoperability |
|
|
|
|
**Key principles:**
|
|
- 📦 **Published npm packages run on your machine** — you have a legitimate copy
|
|
- 🔍 **Analysis for understanding** — learning how software works is fair use
|
|
- 🔗 **Interoperability** — building extensions, MCP servers, and integrations requires understanding the interface
|
|
- 🔐 **No circumvention** — we analyze the published JavaScript, not bypassing DRM or encryption
|
|
- 📜 **No redistribution of original code** — the decompiler outputs *your analysis*, not a copy of the original
|
|
|
|
**What ruDevolution does NOT do:**
|
|
- ❌ Does not bypass authentication or DRM
|
|
- ❌ Does not access unpublished source code
|
|
- ❌ Does not redistribute original code
|
|
- ❌ Does not violate terms of service (analyzing code you've installed is not prohibited)
|
|
|
|
The **witness chain** provides cryptographic proof that every byte of output derives from the input — nothing fabricated, nothing added from external sources.
|
|
|
|
---
|
|
|
|
## ✨ Features
|
|
|
|
| Feature | ruDevolution | Traditional Decompilers | Why It Matters |
|
|
|---------|:-----------:|:----------------------:|----------------|
|
|
| 🧩 **Module detection** | ✅ MinCut graph partitioning | ❌ None | Reconstructs original file structure |
|
|
| 🔮 **Name recovery** | ✅ AI + 210 patterns | ⚠️ Generic (`a`, `b`, `c`) | Makes code actually readable |
|
|
| 🧬 **Self-learning** | ✅ Gets smarter each run | ❌ Static rules | Accuracy improves over time |
|
|
| 🔗 **Witness chains** | ✅ SHA3-256 Merkle proof | ❌ None | Proves output matches input |
|
|
| 🗺️ **Source maps** | ✅ V3 (DevTools compatible) | ⚠️ Some | Debug in Chrome/VS Code |
|
|
| 📊 **Confidence scores** | ✅ Per-name scoring | ❌ None | Know what to trust |
|
|
| 🔄 **Cross-version analysis** | ✅ Compare releases | ❌ None | Track changes across versions |
|
|
| 🏎️ **Performance** | ✅ 11MB in ~26s | ⚠️ Varies | Production-ready speed |
|
|
| 🤖 **Neural inference** | ✅ GPU-trained model | ❌ None | Predicts original names |
|
|
| 📦 **RVF containers** | ✅ Binary cognitive format | ❌ None | Portable, searchable, provable |
|
|
|
|
---
|
|
|
|
## 🚀 Quick Start
|
|
|
|
### As a Rust library
|
|
|
|
```rust
|
|
use ruvector_decompiler::{decompile, DecompileConfig};
|
|
|
|
let minified = std::fs::read_to_string("bundle.min.js").unwrap();
|
|
let config = DecompileConfig::default();
|
|
let result = decompile(&minified, &config).unwrap();
|
|
|
|
println!("📦 {} modules detected", result.modules.len());
|
|
println!("🔮 {} names inferred", result.inferred_names.len());
|
|
println!("🔗 Witness root: {}", result.witness_chain.chain_root_hex);
|
|
|
|
for module in &result.modules {
|
|
println!(" 📁 {} ({} declarations)", module.name, module.declarations.len());
|
|
}
|
|
```
|
|
|
|
### Via npm (easiest)
|
|
|
|
```bash
|
|
# Decompile any npm package
|
|
npx ruvector decompile express
|
|
npx ruvector decompile @anthropic-ai/claude-code@2.1.90 --format json
|
|
npx ruvector decompile lodash --output ./decompiled/
|
|
|
|
# Decompile a local file
|
|
npx ruvector decompile ./bundle.min.js
|
|
|
|
# Decompile from URL
|
|
npx ruvector decompile https://unpkg.com/react
|
|
```
|
|
|
|
### As a Claude Code MCP tool
|
|
|
|
```bash
|
|
claude mcp add ruvector -- npx ruvector mcp
|
|
# Then ask: "decompile the express package and explain the router"
|
|
```
|
|
|
|
6 MCP tools: `decompile_package`, `decompile_file`, `decompile_url`, `decompile_search`, `decompile_diff`, `decompile_witness`
|
|
|
|
### From the command line (Rust)
|
|
|
|
```bash
|
|
# Full pipeline with MinCut + neural inference + witness chains
|
|
cargo run --release -p ruvector-decompiler --example run_on_cli -- bundle.min.js
|
|
|
|
# Decompile Claude Code CLI (11MB)
|
|
cargo run --release -p ruvector-decompiler --example run_on_cli -- \
|
|
$(npm root -g)/@anthropic-ai/claude-code/cli.js
|
|
```
|
|
|
|
### With the dashboard UI
|
|
|
|
```bash
|
|
cd examples/decompiler-dashboard
|
|
npm install && npm run dev
|
|
# Open http://localhost:5173 — browse versions, decompile packages, view RVF containers
|
|
```
|
|
|
|
### What You Can Decompile
|
|
|
|
Works on any npm package — including closed-source AI and cloud CLIs:
|
|
|
|
<details>
|
|
<summary><strong>📋 Supported packages (click to expand)</strong></summary>
|
|
|
|
**AI Provider SDKs**
|
|
```bash
|
|
npx ruvector decompile @anthropic-ai/claude-code
|
|
npx ruvector decompile openai
|
|
npx ruvector decompile @google-cloud/vertexai
|
|
npx ruvector decompile @aws-sdk/client-bedrock-runtime
|
|
npx ruvector decompile @azure/openai
|
|
npx ruvector decompile @mistralai/mistralai
|
|
npx ruvector decompile replicate
|
|
npx ruvector decompile @huggingface/inference
|
|
```
|
|
|
|
**Cloud Provider CLIs**
|
|
```bash
|
|
npx ruvector decompile firebase-tools
|
|
npx ruvector decompile vercel
|
|
npx ruvector decompile netlify-cli
|
|
npx ruvector decompile wrangler
|
|
npx ruvector decompile @google-cloud/functions-framework
|
|
npx ruvector decompile @aws-sdk/client-lambda
|
|
npx ruvector decompile @azure/functions
|
|
```
|
|
|
|
**Developer Tools**
|
|
```bash
|
|
npx ruvector decompile @modelcontextprotocol/sdk
|
|
npx ruvector decompile @copilot-extensions/preview-sdk
|
|
npx ruvector decompile typescript
|
|
npx ruvector decompile esbuild
|
|
npx ruvector decompile webpack
|
|
```
|
|
|
|
</details>
|
|
|
|
---
|
|
|
|
## 📊 Performance
|
|
|
|
Tested on Claude Code `cli.js` (11 MB, 27,477 declarations):
|
|
|
|
| Phase | Time | What It Does |
|
|
|-------|------|-------------|
|
|
| 🔍 Parse | 3.4s | Finds all declarations, strings, references |
|
|
| 🕸️ Graph | 375ms | Builds 353K-edge reference graph |
|
|
| ✂️ Partition | 929ms | Louvain detects 1,029 modules |
|
|
| 🔮 Infer | 13.6s | Names 25,465 identifiers with confidence |
|
|
| 🔗 Witness | <100ms | SHA3-256 Merkle chain |
|
|
| **Total** | **~26s** | **Complete pipeline** |
|
|
|
|
---
|
|
|
|
## 🏗️ How It Works
|
|
|
|
### The 5-Phase Pipeline
|
|
|
|
```
|
|
📄 Minified Bundle
|
|
│
|
|
▼
|
|
┌─── Phase 1: Parse ───┐
|
|
│ 🔍 Find declarations │ Regex + single-pass scanner
|
|
│ 📝 Extract strings │ memchr SIMD acceleration
|
|
│ 🔗 Map references │ Who calls whom?
|
|
└───────────┬───────────┘
|
|
▼
|
|
┌─── Phase 2: Graph ───┐
|
|
│ 🕸️ Build ref graph │ Nodes = declarations
|
|
│ ⚖️ Weight edges │ Edges = reference frequency
|
|
└───────────┬───────────┘
|
|
▼
|
|
┌─── Phase 3: Partition ─┐
|
|
│ ✂️ MinCut / Louvain │ <5K nodes: exact MinCut
|
|
│ 📁 Detect modules │ ≥5K nodes: Louvain O(n log n)
|
|
│ 🏷️ Name modules │ Based on dominant strings
|
|
└───────────┬────────────┘
|
|
▼
|
|
┌─── Phase 4: Infer ────┐
|
|
│ 🤖 Neural model │ GPU-trained transformer
|
|
│ 📚 Training corpus │ 210 domain patterns
|
|
│ 🔤 Pattern matching │ String context + properties
|
|
│ 📊 Confidence scoring │ HIGH / MEDIUM / LOW
|
|
└───────────┬────────────┘
|
|
▼
|
|
┌─── Phase 5: Witness ──┐
|
|
│ 🔗 SHA3-256 hashing │ Hash every module
|
|
│ 🌳 Merkle tree │ Chain all hashes
|
|
│ ✅ Verify: output ⊆ input │ Cryptographic proof
|
|
└───────────┬────────────┘
|
|
▼
|
|
📖 Readable Source Code
|
|
🗺️ V3 Source Map
|
|
🔗 Witness Chain
|
|
📊 Confidence Report
|
|
```
|
|
|
|
---
|
|
|
|
## 🏆 SOTA Results
|
|
|
|
ruDevolution achieves **95.7% validation accuracy** on name inference — beating all prior work by a wide margin:
|
|
|
|
| System | Year | Name Accuracy | Module Detection | Witness Chain | Self-Learning |
|
|
|--------|:----:|:-------------:|:----------------:|:------------:|:-------------:|
|
|
| JSNice (ETH Zurich) | 2015 | 63.0% | ❌ | ❌ | ❌ |
|
|
| DeGuard | 2017 | ~60% | ❌ | ❌ | ❌ |
|
|
| DIRE | 2019 | 65.8% | ❌ | ❌ | ❌ |
|
|
| VarCLR | 2022 | ~72% | ❌ | ❌ | ❌ |
|
|
| **ruDevolution** | **2026** | **95.7%** | **✅ 1,029 modules** | **✅ SHA3-256** | **✅ 210 patterns** |
|
|
|
|
### Training Details
|
|
|
|
| Metric | v1 | v2 (current) |
|
|
|--------|:--:|:--:|
|
|
| Training pairs | 1,602 | 8,201 |
|
|
| Val accuracy | 75.7% | **95.7%** |
|
|
| Val loss | 0.914 | **0.149** |
|
|
| Model size | 2.6 MB | 2.6 MB |
|
|
| Inference | <5ms (pure Rust) | <5ms (pure Rust) |
|
|
| Dependencies | Zero (std only) | Zero (std only) |
|
|
|
|
---
|
|
|
|
## 📐 Confidence Levels
|
|
|
|
Every inferred name gets a confidence score:
|
|
|
|
| Level | Range | Meaning | Example |
|
|
|-------|-------|---------|---------|
|
|
| 🟢 **HIGH** | >90% | Direct string evidence | `"Bash"` in context → `bash_tool` |
|
|
| 🟡 **MEDIUM** | 60-90% | Property/structural match | `.method`, `.path` → `route_handler` |
|
|
| 🔴 **LOW** | <60% | Positional/generic | Near error patterns → `error_handler` |
|
|
|
|
---
|
|
|
|
<details>
|
|
<summary><strong>📖 Tutorial: Decompile an npm Package</strong></summary>
|
|
|
|
### Step 1: Get the minified bundle
|
|
|
|
```bash
|
|
npm pack express --pack-destination /tmp/
|
|
tar xzf /tmp/express-*.tgz -C /tmp/
|
|
```
|
|
|
|
### Step 2: Run the decompiler
|
|
|
|
```rust
|
|
use ruvector_decompiler::{decompile, DecompileConfig};
|
|
|
|
let source = std::fs::read_to_string("/tmp/package/index.js")?;
|
|
let result = decompile(&source, &DecompileConfig::default())?;
|
|
```
|
|
|
|
### Step 3: Check the results
|
|
|
|
```rust
|
|
// How many modules were detected?
|
|
println!("Modules: {}", result.modules.len());
|
|
|
|
// What names were recovered?
|
|
for name in result.inferred_names.iter().filter(|n| n.confidence > 0.8) {
|
|
println!("{} → {} ({}%)", name.original, name.inferred,
|
|
(name.confidence * 100.0) as u32);
|
|
}
|
|
|
|
// Verify the witness chain
|
|
assert!(result.witness_chain.is_valid);
|
|
```
|
|
|
|
### Step 4: Use the source map
|
|
|
|
The output includes a V3 source map compatible with Chrome DevTools:
|
|
|
|
```javascript
|
|
// In your browser console:
|
|
//# sourceMappingURL=decompiled.js.map
|
|
```
|
|
|
|
</details>
|
|
|
|
<details>
|
|
<summary><strong>🔄 Tutorial: Cross-Version Analysis</strong></summary>
|
|
|
|
### Compare Claude Code versions
|
|
|
|
```bash
|
|
# Build RVF corpus for all versions
|
|
./scripts/claude-code-rvf-corpus.sh
|
|
|
|
# Each version gets its own RVF container:
|
|
# versions/v0.2.x/claude-code-v0.2.rvf (300 vectors)
|
|
# versions/v1.0.x/claude-code-v1.0.rvf (482 vectors)
|
|
# versions/v2.0.x/claude-code-v2.0.rvf (785 vectors)
|
|
# versions/v2.1.x/claude-code-v2.1.rvf (2,068 vectors)
|
|
```
|
|
|
|
### Track what changed
|
|
|
|
```rust
|
|
// Decompile two versions
|
|
let v1 = decompile(&v1_source, &config)?;
|
|
let v2 = decompile(&v2_source, &config)?;
|
|
|
|
// Functions with same structure but different minified names
|
|
// = same original function, renamed by the bundler
|
|
// This confirms name inferences across versions
|
|
```
|
|
|
|
</details>
|
|
|
|
<details>
|
|
<summary><strong>🧬 Tutorial: Self-Learning Feedback Loop</strong></summary>
|
|
|
|
### Train from ground truth
|
|
|
|
If you know the original source for a minified bundle:
|
|
|
|
```rust
|
|
use ruvector_decompiler::inferrer::NameInferrer;
|
|
|
|
let mut inferrer = NameInferrer::new();
|
|
|
|
// Provide known correct mappings
|
|
let ground_truth = vec![
|
|
("a$", "createRouter"),
|
|
("b$", "handleRequest"),
|
|
("c$", "sendResponse"),
|
|
];
|
|
|
|
// Train the inferrer
|
|
inferrer.learn_from_ground_truth(&ground_truth);
|
|
|
|
// Future inferences will be more accurate
|
|
// The patterns are stored and reused
|
|
```
|
|
|
|
### Feed back real-world results
|
|
|
|
```rust
|
|
// After manual review, tell the inferrer what was correct
|
|
let feedback = vec![
|
|
Feedback { predicted: "error_handler", actual: "McpErrorHandler", was_correct: false },
|
|
Feedback { predicted: "route_handler", actual: "routeHandler", was_correct: true },
|
|
];
|
|
inferrer.learn_from_feedback(&feedback);
|
|
```
|
|
|
|
</details>
|
|
|
|
<details>
|
|
<summary><strong>🔗 Tutorial: Witness Chain Verification</strong></summary>
|
|
|
|
### Prove decompilation is faithful
|
|
|
|
```rust
|
|
let result = decompile(&source, &config)?;
|
|
|
|
// The witness chain proves every output byte comes from the input
|
|
assert!(result.witness_chain.is_valid);
|
|
println!("Source hash: {}", result.witness_chain.source_hash_hex);
|
|
println!("Chain root: {}", result.witness_chain.chain_root_hex);
|
|
|
|
// Each module has its own witness
|
|
for witness in &result.witness_chain.module_witnesses {
|
|
println!(" {} byte_range={}..{} hash={}",
|
|
witness.module_name,
|
|
witness.byte_range.0, witness.byte_range.1,
|
|
witness.content_hash_hex);
|
|
}
|
|
|
|
// Anyone can verify: reconstruct the Merkle tree and compare roots
|
|
let verified = result.witness_chain.verify(&source);
|
|
assert!(verified);
|
|
```
|
|
|
|
</details>
|
|
|
|
<details>
|
|
<summary><strong>🤖 Advanced: GPU-Trained Neural Inference</strong></summary>
|
|
|
|
### Train a deobfuscation model
|
|
|
|
```bash
|
|
# Generate training data (10K+ minified→original pairs)
|
|
node scripts/training/generate-deobfuscation-data.mjs
|
|
|
|
# Launch GPU training on GCloud L4 (~$1.40, ~2 hours)
|
|
./scripts/training/launch-gpu-training.sh --cloud
|
|
|
|
# Export model to GGUF for RuvLLM
|
|
python scripts/training/export-to-rvf.py
|
|
```
|
|
|
|
### Use the trained model
|
|
|
|
```rust
|
|
let config = DecompileConfig {
|
|
model_path: Some("models/deobfuscator.gguf".into()),
|
|
..Default::default()
|
|
};
|
|
|
|
let result = decompile(&source, &config)?;
|
|
// Neural inference runs first, falls back to patterns
|
|
// Expect 60-80% name accuracy vs 5% without model
|
|
```
|
|
|
|
### How the model works
|
|
|
|
```
|
|
Input: minified name "s$" + context ["tools/call", "initialize", ".client"]
|
|
│
|
|
▼
|
|
┌──────────────┐
|
|
│ 6M param │
|
|
│ Transformer │ Character-level encoder
|
|
│ (GGUF Q4) │ Trained on 100K+ pairs
|
|
└──────┬───────┘
|
|
│
|
|
▼
|
|
Output: "mcpToolDispatcher" (confidence: 0.87)
|
|
```
|
|
|
|
</details>
|
|
|
|
<details>
|
|
<summary><strong>📦 Advanced: RVF Container Integration</strong></summary>
|
|
|
|
### Store decompiled code in RVF
|
|
|
|
RVF (RuVector Format) containers store code as searchable vectors with cryptographic provenance:
|
|
|
|
```bash
|
|
# Build RVF containers for all Claude Code versions
|
|
./scripts/claude-code-rvf-corpus.sh
|
|
|
|
# Each .rvf file contains:
|
|
# - HNSW-indexed vectors (semantic search)
|
|
# - Witness chains (provenance)
|
|
# - Manifest (metadata)
|
|
# - Module segments (source code)
|
|
```
|
|
|
|
### Query the RVF corpus
|
|
|
|
```javascript
|
|
import { RvfDatabase } from '@ruvector/rvf';
|
|
|
|
const db = await RvfDatabase.openReadonly('claude-code-v2.1.rvf');
|
|
const results = await db.search('permission system', { limit: 5 });
|
|
|
|
for (const hit of results) {
|
|
console.log(`${hit.module} (score: ${hit.score.toFixed(3)})`);
|
|
}
|
|
```
|
|
|
|
</details>
|
|
|
|
<details>
|
|
<summary><strong>⚙️ Advanced: Configuration Options</strong></summary>
|
|
|
|
### DecompileConfig
|
|
|
|
```rust
|
|
let config = DecompileConfig {
|
|
// Module detection
|
|
target_modules: None, // Auto-detect (recommended)
|
|
min_module_size: Some(3), // Minimum declarations per module
|
|
|
|
// Name inference
|
|
min_confidence: 0.3, // Minimum confidence to include
|
|
model_path: None, // Path to neural model (optional)
|
|
|
|
// Output
|
|
generate_source_map: true, // V3 source maps
|
|
beautify: true, // Indent and format output
|
|
};
|
|
```
|
|
|
|
### Environment variables
|
|
|
|
| Variable | Default | Description |
|
|
|----------|---------|-------------|
|
|
| `DECOMPILER_THREADS` | CPU count | Rayon thread pool size |
|
|
| `DECOMPILER_MODEL` | none | Path to GGUF model |
|
|
| `DECOMPILER_MIN_CONFIDENCE` | 0.3 | Minimum confidence threshold |
|
|
|
|
</details>
|
|
|
|
---
|
|
|
|
## 🏛️ Architecture
|
|
|
|
```
|
|
crates/ruvector-decompiler/
|
|
├── src/
|
|
│ ├── lib.rs # 🎯 Public API: decompile()
|
|
│ ├── parser.rs # 🔍 Single-pass JS scanner (memchr + lookup table)
|
|
│ ├── graph.rs # 🕸️ Reference graph construction
|
|
│ ├── partitioner.rs # ✂️ MinCut + Louvain community detection
|
|
│ ├── inferrer.rs # 🔮 Name inference (neural + patterns + learning)
|
|
│ ├── training.rs # 🧬 Training corpus (210 patterns, JSON-loadable)
|
|
│ ├── sourcemap.rs # 🗺️ V3 source map generation (VLQ encoding)
|
|
│ ├── beautifier.rs # ✨ Code formatting and indentation
|
|
│ ├── witness.rs # 🔗 SHA3-256 Merkle witness chains
|
|
│ ├── types.rs # 📐 Core types and config
|
|
│ └── error.rs # ❌ Error handling
|
|
├── data/
|
|
│ └── claude-code-patterns.json # 📚 210 domain-specific patterns
|
|
├── tests/
|
|
│ ├── integration.rs # ✅ 8 integration tests
|
|
│ ├── ground_truth.rs # 🎯 5 fixture accuracy tests
|
|
│ └── real_world.rs # 🌍 3 OSS comparison tests
|
|
├── benches/
|
|
│ ├── bench_parser.rs # ⚡ Parser benchmarks (1KB-1MB)
|
|
│ └── bench_pipeline.rs # ⚡ Full pipeline benchmarks
|
|
└── examples/
|
|
└── run_on_cli.rs # 🖥️ CLI runner for real bundles
|
|
```
|
|
|
|
---
|
|
|
|
## 📚 Related
|
|
|
|
- [ADR-133: Claude Code Source Analysis](../../docs/adr/ADR-133-claude-code-source-analysis.md)
|
|
- [ADR-134: RuVector Deep Integration](../../docs/adr/ADR-134-ruvector-claude-code-deep-integration.md)
|
|
- [ADR-135: MinCut Decompiler Architecture](../../docs/adr/ADR-135-mincut-decompiler-with-witness-chains.md)
|
|
- [ADR-136: GPU-Trained Deobfuscation Model](../../docs/adr/ADR-136-gpu-trained-deobfuscation-model.md)
|
|
- [Research: SOTA Decompiler Approaches](../../docs/research/claude-code-rvsource/20-sota-decompiler-research.md)
|
|
- [Research: Model Weight Analysis](../../docs/research/claude-code-rvsource/21-model-weight-analysis.md)
|
|
- [Dashboard: Decompiler Explorer](../../examples/decompiler-dashboard/)
|
|
|
|
---
|
|
|
|
<p align="center">
|
|
<em>ruDevolution — because code deserves to be understood.</em>
|
|
</p>
|