|
|
||
|---|---|---|
| .. | ||
| benches | ||
| examples | ||
| src | ||
| tests | ||
| Cargo.toml | ||
| README.md | ||
ruDevolution
The first decompiler that understands code, proves its work, and learns from every run.
🧠 MinCut Module Detection • 🔮 AI Name Recovery • 🔗 Cryptographic Witness Chains • 📊 Confidence Scoring • 🧬 Self-Learning
🧠 What is ruDevolution?
ruDevolution turns scrambled, minified JavaScript back into readable, organized source code — then proves every step with cryptographic witness chains.
Most decompilers just reformat code. ruDevolution understands it:
📦 Input (minified) 📖 Output (reconstructed)
───────────────────── ──────────────────────────
var a=function(b){ // Module: http-router (92% confidence)
return b.c("d")}; var createRoute = function(request) {
var e=class extends f{ return request.method("GET");
constructor(){this.g="h"}} };
// Module: base-component (88% confidence)
var Component = class extends BaseElement {
constructor() {
this.tagName = "div";
}
}
✅ Witness chain: a3f2c8...→ 7b1e9d...
📋 Source map: output.js.map (V3)
✨ Features
| Feature | ruDevolution | Traditional Decompilers | Why It Matters |
|---|---|---|---|
| 🧩 Module detection | ✅ MinCut graph partitioning | ❌ None | Reconstructs original file structure |
| 🔮 Name recovery | ✅ AI + 210 patterns | ⚠️ Generic (a, b, c) |
Makes code actually readable |
| 🧬 Self-learning | ✅ Gets smarter each run | ❌ Static rules | Accuracy improves over time |
| 🔗 Witness chains | ✅ SHA3-256 Merkle proof | ❌ None | Proves output matches input |
| 🗺️ Source maps | ✅ V3 (DevTools compatible) | ⚠️ Some | Debug in Chrome/VS Code |
| 📊 Confidence scores | ✅ Per-name scoring | ❌ None | Know what to trust |
| 🔄 Cross-version analysis | ✅ Compare releases | ❌ None | Track changes across versions |
| 🏎️ Performance | ✅ 11MB in ~26s | ⚠️ Varies | Production-ready speed |
| 🤖 Neural inference | ✅ GPU-trained model | ❌ None | Predicts original names |
| 📦 RVF containers | ✅ Binary cognitive format | ❌ None | Portable, searchable, provable |
🚀 Quick Start
As a Rust library
use ruvector_decompiler::{decompile, DecompileConfig};
let minified = std::fs::read_to_string("bundle.min.js").unwrap();
let config = DecompileConfig::default();
let result = decompile(&minified, &config).unwrap();
println!("📦 {} modules detected", result.modules.len());
println!("🔮 {} names inferred", result.inferred_names.len());
println!("🔗 Witness root: {}", result.witness_chain.chain_root_hex);
for module in &result.modules {
println!(" 📁 {} ({} declarations)", module.name, module.declarations.len());
}
From the command line
# Decompile a minified bundle
cargo run --release -p ruvector-decompiler --example run_on_cli -- bundle.min.js
# Decompile Claude Code CLI
cargo run --release -p ruvector-decompiler --example run_on_cli -- \
$(npm root -g)/@anthropic-ai/claude-code/cli.js
With the dashboard UI
cd examples/decompiler-dashboard
npm install && npm run dev
# Open http://localhost:5173 — paste any npm package name to decompile
📊 Performance
Tested on Claude Code cli.js (11 MB, 27,477 declarations):
| Phase | Time | What It Does |
|---|---|---|
| 🔍 Parse | 3.4s | Finds all declarations, strings, references |
| 🕸️ Graph | 375ms | Builds 353K-edge reference graph |
| ✂️ Partition | 929ms | Louvain detects 1,029 modules |
| 🔮 Infer | 13.6s | Names 25,465 identifiers with confidence |
| 🔗 Witness | <100ms | SHA3-256 Merkle chain |
| Total | ~26s | Complete pipeline |
🏗️ How It Works
The 5-Phase Pipeline
📄 Minified Bundle
│
▼
┌─── Phase 1: Parse ───┐
│ 🔍 Find declarations │ Regex + single-pass scanner
│ 📝 Extract strings │ memchr SIMD acceleration
│ 🔗 Map references │ Who calls whom?
└───────────┬───────────┘
▼
┌─── Phase 2: Graph ───┐
│ 🕸️ Build ref graph │ Nodes = declarations
│ ⚖️ Weight edges │ Edges = reference frequency
└───────────┬───────────┘
▼
┌─── Phase 3: Partition ─┐
│ ✂️ MinCut / Louvain │ <5K nodes: exact MinCut
│ 📁 Detect modules │ ≥5K nodes: Louvain O(n log n)
│ 🏷️ Name modules │ Based on dominant strings
└───────────┬────────────┘
▼
┌─── Phase 4: Infer ────┐
│ 🤖 Neural model │ GPU-trained transformer
│ 📚 Training corpus │ 210 domain patterns
│ 🔤 Pattern matching │ String context + properties
│ 📊 Confidence scoring │ HIGH / MEDIUM / LOW
└───────────┬────────────┘
▼
┌─── Phase 5: Witness ──┐
│ 🔗 SHA3-256 hashing │ Hash every module
│ 🌳 Merkle tree │ Chain all hashes
│ ✅ Verify: output ⊆ input │ Cryptographic proof
└───────────┬────────────┘
▼
📖 Readable Source Code
🗺️ V3 Source Map
🔗 Witness Chain
📊 Confidence Report
📐 Confidence Levels
Every inferred name gets a confidence score:
| Level | Range | Meaning | Example |
|---|---|---|---|
| 🟢 HIGH | >90% | Direct string evidence | "Bash" in context → bash_tool |
| 🟡 MEDIUM | 60-90% | Property/structural match | .method, .path → route_handler |
| 🔴 LOW | <60% | Positional/generic | Near error patterns → error_handler |
📖 Tutorial: Decompile an npm Package
Step 1: Get the minified bundle
npm pack express --pack-destination /tmp/
tar xzf /tmp/express-*.tgz -C /tmp/
Step 2: Run the decompiler
use ruvector_decompiler::{decompile, DecompileConfig};
let source = std::fs::read_to_string("/tmp/package/index.js")?;
let result = decompile(&source, &DecompileConfig::default())?;
Step 3: Check the results
// How many modules were detected?
println!("Modules: {}", result.modules.len());
// What names were recovered?
for name in result.inferred_names.iter().filter(|n| n.confidence > 0.8) {
println!("{} → {} ({}%)", name.original, name.inferred,
(name.confidence * 100.0) as u32);
}
// Verify the witness chain
assert!(result.witness_chain.is_valid);
Step 4: Use the source map
The output includes a V3 source map compatible with Chrome DevTools:
// In your browser console:
//# sourceMappingURL=decompiled.js.map
🔄 Tutorial: Cross-Version Analysis
Compare Claude Code versions
# Build RVF corpus for all versions
./scripts/claude-code-rvf-corpus.sh
# Each version gets its own RVF container:
# versions/v0.2.x/claude-code-v0.2.rvf (300 vectors)
# versions/v1.0.x/claude-code-v1.0.rvf (482 vectors)
# versions/v2.0.x/claude-code-v2.0.rvf (785 vectors)
# versions/v2.1.x/claude-code-v2.1.rvf (2,068 vectors)
Track what changed
// Decompile two versions
let v1 = decompile(&v1_source, &config)?;
let v2 = decompile(&v2_source, &config)?;
// Functions with same structure but different minified names
// = same original function, renamed by the bundler
// This confirms name inferences across versions
🧬 Tutorial: Self-Learning Feedback Loop
Train from ground truth
If you know the original source for a minified bundle:
use ruvector_decompiler::inferrer::NameInferrer;
let mut inferrer = NameInferrer::new();
// Provide known correct mappings
let ground_truth = vec![
("a$", "createRouter"),
("b$", "handleRequest"),
("c$", "sendResponse"),
];
// Train the inferrer
inferrer.learn_from_ground_truth(&ground_truth);
// Future inferences will be more accurate
// The patterns are stored and reused
Feed back real-world results
// After manual review, tell the inferrer what was correct
let feedback = vec![
Feedback { predicted: "error_handler", actual: "McpErrorHandler", was_correct: false },
Feedback { predicted: "route_handler", actual: "routeHandler", was_correct: true },
];
inferrer.learn_from_feedback(&feedback);
🔗 Tutorial: Witness Chain Verification
Prove decompilation is faithful
let result = decompile(&source, &config)?;
// The witness chain proves every output byte comes from the input
assert!(result.witness_chain.is_valid);
println!("Source hash: {}", result.witness_chain.source_hash_hex);
println!("Chain root: {}", result.witness_chain.chain_root_hex);
// Each module has its own witness
for witness in &result.witness_chain.module_witnesses {
println!(" {} byte_range={}..{} hash={}",
witness.module_name,
witness.byte_range.0, witness.byte_range.1,
witness.content_hash_hex);
}
// Anyone can verify: reconstruct the Merkle tree and compare roots
let verified = result.witness_chain.verify(&source);
assert!(verified);
🤖 Advanced: GPU-Trained Neural Inference
Train a deobfuscation model
# Generate training data (10K+ minified→original pairs)
node scripts/training/generate-deobfuscation-data.mjs
# Launch GPU training on GCloud L4 (~$1.40, ~2 hours)
./scripts/training/launch-gpu-training.sh --cloud
# Export model to GGUF for RuvLLM
python scripts/training/export-to-rvf.py
Use the trained model
let config = DecompileConfig {
model_path: Some("models/deobfuscator.gguf".into()),
..Default::default()
};
let result = decompile(&source, &config)?;
// Neural inference runs first, falls back to patterns
// Expect 60-80% name accuracy vs 5% without model
How the model works
Input: minified name "s$" + context ["tools/call", "initialize", ".client"]
│
▼
┌──────────────┐
│ 6M param │
│ Transformer │ Character-level encoder
│ (GGUF Q4) │ Trained on 100K+ pairs
└──────┬───────┘
│
▼
Output: "mcpToolDispatcher" (confidence: 0.87)
📦 Advanced: RVF Container Integration
Store decompiled code in RVF
RVF (RuVector Format) containers store code as searchable vectors with cryptographic provenance:
# Build RVF containers for all Claude Code versions
./scripts/claude-code-rvf-corpus.sh
# Each .rvf file contains:
# - HNSW-indexed vectors (semantic search)
# - Witness chains (provenance)
# - Manifest (metadata)
# - Module segments (source code)
Query the RVF corpus
import { RvfDatabase } from '@ruvector/rvf';
const db = await RvfDatabase.openReadonly('claude-code-v2.1.rvf');
const results = await db.search('permission system', { limit: 5 });
for (const hit of results) {
console.log(`${hit.module} (score: ${hit.score.toFixed(3)})`);
}
⚙️ Advanced: Configuration Options
DecompileConfig
let config = DecompileConfig {
// Module detection
target_modules: None, // Auto-detect (recommended)
min_module_size: Some(3), // Minimum declarations per module
// Name inference
min_confidence: 0.3, // Minimum confidence to include
model_path: None, // Path to neural model (optional)
// Output
generate_source_map: true, // V3 source maps
beautify: true, // Indent and format output
};
Environment variables
| Variable | Default | Description |
|---|---|---|
DECOMPILER_THREADS |
CPU count | Rayon thread pool size |
DECOMPILER_MODEL |
none | Path to GGUF model |
DECOMPILER_MIN_CONFIDENCE |
0.3 | Minimum confidence threshold |
🏛️ Architecture
crates/ruvector-decompiler/
├── src/
│ ├── lib.rs # 🎯 Public API: decompile()
│ ├── parser.rs # 🔍 Single-pass JS scanner (memchr + lookup table)
│ ├── graph.rs # 🕸️ Reference graph construction
│ ├── partitioner.rs # ✂️ MinCut + Louvain community detection
│ ├── inferrer.rs # 🔮 Name inference (neural + patterns + learning)
│ ├── training.rs # 🧬 Training corpus (210 patterns, JSON-loadable)
│ ├── sourcemap.rs # 🗺️ V3 source map generation (VLQ encoding)
│ ├── beautifier.rs # ✨ Code formatting and indentation
│ ├── witness.rs # 🔗 SHA3-256 Merkle witness chains
│ ├── types.rs # 📐 Core types and config
│ └── error.rs # ❌ Error handling
├── data/
│ └── claude-code-patterns.json # 📚 210 domain-specific patterns
├── tests/
│ ├── integration.rs # ✅ 8 integration tests
│ ├── ground_truth.rs # 🎯 5 fixture accuracy tests
│ └── real_world.rs # 🌍 3 OSS comparison tests
├── benches/
│ ├── bench_parser.rs # ⚡ Parser benchmarks (1KB-1MB)
│ └── bench_pipeline.rs # ⚡ Full pipeline benchmarks
└── examples/
└── run_on_cli.rs # 🖥️ CLI runner for real bundles
📚 Related
- ADR-133: Claude Code Source Analysis
- ADR-134: RuVector Deep Integration
- ADR-135: MinCut Decompiler Architecture
- ADR-136: GPU-Trained Deobfuscation Model
- Research: SOTA Decompiler Approaches
- Research: Model Weight Analysis
- Dashboard: Decompiler Explorer
ruDevolution — because code deserves to be understood.