rUv
7704c94624
feat(decompiler): LLM weight decompiler + API prober (ADR-138)
Model weight decompilation:
- GGUF v2/v3 parser (self-contained, no ruvllm dep)
- Safetensors JSON header parser
- Architecture inference from tensor shapes (GQA, FFN, vocab)
- Tokenizer extraction, quantization detection
- Witness chain for model provenance
- 6 integration tests, behind `model` feature flag
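A minimal sketch of how head counts might be inferred from tensor shapes, as the list above describes. The struct, the function name, and the `[out_features, in_features]` layout are assumptions made for illustration (GGUF and Safetensors may order dimensions differently), not the crate's actual API:

```rust
/// Hypothetical result type; field names are illustrative only.
#[derive(Debug, PartialEq)]
pub struct AttentionShape {
    pub hidden_size: usize,
    pub num_heads: usize,
    pub num_kv_heads: usize,
    pub is_gqa: bool,
}

/// Infer attention head counts from the query and key projection shapes.
/// Shapes are assumed to be [out_features, in_features]; `head_dim` must be
/// known or guessed by the caller. Returns None when the shapes disagree.
pub fn infer_attention(
    q_shape: [usize; 2],
    k_shape: [usize; 2],
    head_dim: usize,
) -> Option<AttentionShape> {
    // Both projections must read from the same hidden dimension.
    if head_dim == 0 || q_shape[1] != k_shape[1] {
        return None;
    }
    // Output widths must be whole multiples of the head dimension.
    if q_shape[0] % head_dim != 0 || k_shape[0] % head_dim != 0 {
        return None;
    }
    let num_heads = q_shape[0] / head_dim;
    let num_kv_heads = k_shape[0] / head_dim;
    // Query heads must divide evenly into key/value groups.
    if num_kv_heads == 0 || num_heads % num_kv_heads != 0 {
        return None;
    }
    Some(AttentionShape {
        hidden_size: q_shape[1],
        num_heads,
        num_kv_heads,
        // Fewer key/value heads than query heads indicates grouped-query attention.
        is_gqa: num_kv_heads < num_heads,
    })
}
```

For example, a 4096x4096 query projection with a 1024x4096 key projection and a head dimension of 128 would be read as 32 query heads sharing 8 key/value heads.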
API probing (live tested):
- Probes Claude, OpenAI, Gemini APIs without weight access
- Detects: streaming, tools, system_prompt, vision capabilities
- Measures: latency, tokens/sec, tokenizer type
- Model fingerprinting via self-identification + math tests
- Verified: Gemini 2.0 Flash (556ms, 46 tok/s, all capabilities detected)
CLI:
  npx ruvector decompile --model file.gguf
  npx ruvector decompile --api gemini-2.0-flash
78 Rust tests passing.
Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-03 19:08:30 +00:00
rUv
99d13f6811
fix(decompiler): proper multi-level folder hierarchy from graph
tree.rs fixes:
- Target 10 top-level folders (was collapsing to 1)
- Max cluster size capped at 20% of total (prevents mega-folders)
- Geometric mean normalization (prevents giant clusters absorbing all)
- Leaf threshold: 20 modules at depth 1+ (was 3)
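One plausible reading of the size cap and geometric mean normalization above, sketched under assumptions: the function names are hypothetical, and the exact formula used in tree.rs is not stated in this message.

```rust
/// Score a candidate merge of two clusters. Dividing the inter-cluster edge
/// weight by the geometric mean of the cluster sizes keeps a large cluster
/// from winning every merge simply because it has more edges.
/// Both sizes must be at least 1.
pub fn merge_score(edge_weight: f64, size_a: usize, size_b: usize) -> f64 {
    edge_weight / ((size_a as f64) * (size_b as f64)).sqrt()
}

/// Reject merges whose result would exceed 20% of all modules.
/// Integer form of (size_a + size_b) <= total / 5, avoiding float rounding.
pub fn merge_allowed(size_a: usize, size_b: usize, total: usize) -> bool {
    5 * (size_a + size_b) <= total
}
```

With 100 modules in total, two clusters of 10 may still merge (20 is exactly the cap), while clusters of 10 and 11 may not.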
Claude Code result: 19 directories with graph-derived names
(asyncgenerator, bedrockclient, systempromptsectioncache, etc.)
59 tests passing.
Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-03 15:51:48 +00:00
rUv
a3029eaecb
feat(decompiler): WASM Louvain pipeline — npx now produces 589+ modules
Compiled ruvector-decompiler to WASM via wasm-pack:
- crates/ruvector-decompiler-wasm/ — wasm-bindgen wrapper (cdylib)
- rayon gated behind optional `parallel` feature (sequential in WASM)
- DecompileConfig now Deserializable for JSON config passing
- 1.5MB WASM binary at npm/packages/ruvector/wasm/
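A sketch of how a function can be gated on the `parallel` feature so that WASM builds stay sequential. It assumes Cargo.toml declares rayon as an optional dependency enabled by that feature; `total_weight` is a hypothetical function, not one from the crate:

```rust
// Assumed Cargo.toml (illustrative):
//   [dependencies] rayon = { version = "1", optional = true }
//   [features]     parallel = ["dep:rayon"]

/// Parallel version, compiled only when the `parallel` feature is enabled.
#[cfg(feature = "parallel")]
pub fn total_weight(weights: &[f64]) -> f64 {
    use rayon::prelude::*;
    weights.par_iter().sum()
}

/// Sequential version, used for WASM and any build without the feature.
#[cfg(not(feature = "parallel"))]
pub fn total_weight(weights: &[f64]) -> f64 {
    weights.iter().sum()
}
```

Callers see a single `total_weight` either way, so the rest of the pipeline does not need its own feature checks.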
npx ruvector decompile now tries: WASM Louvain → Rust binary → keyword split
Result: 589 modules from Claude Code (was 5 with keyword splitter)
59 Rust tests pass, WASM verified from Node.js.
Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-03 15:25:23 +00:00
rUv
c28fce2229
feat(decompiler): graph-derived hierarchical folder structure (Phase 7)
Folder structure emerges from the dependency graph — not hardcoded keywords.
tree.rs (362 lines):
- Agglomerative clustering on inter-module edge weights
- TF-IDF naming: most discriminative strings name each folder
- Recursive depth control (configurable max_depth, min_folder_size)
inferrer.rs: infer_folder_name() with TF-IDF scoring
types.rs: ModuleTree struct, hierarchical config options
run_on_cli.rs: --output-dir writes the folder tree to disk
module-splitter.js: JS-side tree builder with same approach
Key principle: tightly-coupled code shares a folder,
MinCut boundaries become folder boundaries, names from context.
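The TF-IDF naming idea can be sketched as follows. This is an illustration under assumptions, not the body of infer_folder_name(): the function name `name_folders`, the token input format, and the tie-breaking rule are all hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// For each folder (a list of string tokens found in its modules), pick the
/// token with the highest TF-IDF score. Tokens that appear in every folder
/// get an IDF of zero, so only discriminative tokens can win.
pub fn name_folders(folders: &[Vec<&str>]) -> Vec<Option<String>> {
    let n = folders.len() as f64;

    // Document frequency: in how many folders does each token appear?
    let mut df: HashMap<&str, usize> = HashMap::new();
    for tokens in folders {
        let unique: HashSet<&str> = tokens.iter().copied().collect();
        for t in unique {
            *df.entry(t).or_insert(0) += 1;
        }
    }

    folders
        .iter()
        .map(|tokens| {
            // Term frequency within this folder.
            let mut tf: HashMap<&str, usize> = HashMap::new();
            for &t in tokens {
                *tf.entry(t).or_insert(0) += 1;
            }
            let mut best: Option<(f64, &str)> = None;
            for (&t, &count) in &tf {
                let idf = (n / df[t] as f64).ln();
                let score = count as f64 / tokens.len() as f64 * idf;
                // Break ties alphabetically so the result is deterministic.
                let better = match best {
                    None => true,
                    Some((s, name)) => score > s || (score == s && t < name),
                };
                if better {
                    best = Some((score, t));
                }
            }
            best.map(|(_, t)| t.to_string())
        })
        .collect()
}
```

A folder with no tokens yields `None`, leaving the caller free to fall back to a positional name.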
59 tests passing, zero warnings.
Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-03 03:26:54 +00:00
rUv
38086742ff
feat(decompiler): pure Rust transformer inference — zero ML dependencies
transformer.rs (416 lines): complete forward pass in std Rust
- Multi-head self-attention with padding mask
- GELU activation, layer norm, softmax
- Loads weights from simple binary format (2.6MB)
- Zero external deps — just f32 math
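For illustration, the three element-wise building blocks named above can be written in std-only Rust roughly as below. This is a sketch, not the contents of transformer.rs; in particular, the tanh approximation of GELU is an assumption, since the message does not say which variant is used.

```rust
use std::f32::consts::PI;

/// Numerically stable softmax: subtract the maximum before exponentiating.
pub fn softmax(x: &[f32]) -> Vec<f32> {
    let max = x.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = x.iter().map(|v| (v - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// GELU activation, tanh approximation (assumed variant).
pub fn gelu(x: f32) -> f32 {
    let inner = (2.0 / PI).sqrt() * (x + 0.044715 * x * x * x);
    0.5 * x * (1.0 + inner.tanh())
}

/// Layer normalization over one vector, with learned scale and shift.
/// `x`, `gamma`, and `beta` are expected to have the same length.
pub fn layer_norm(x: &[f32], gamma: &[f32], beta: &[f32], eps: f32) -> Vec<f32> {
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    let var = x.iter().map(|v| (v - mean) * (v - mean)).sum::<f32>() / n;
    let inv_std = 1.0 / (var + eps).sqrt();
    x.iter()
        .zip(gamma)
        .zip(beta)
        .map(|((v, g), b)| (v - mean) * inv_std * g + b)
        .collect()
}
```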
neural.rs: Backend enum (Transformer/ONNX/Stub)
- .bin → pure Rust (always available, no feature flag)
- .onnx → ort (behind neural feature flag)
- .gguf/.rvf → stub for future RuvLLM integration
export-weights-bin.py: PyTorch → binary weight dump
- 42 tensors, 673,152 parameters, 2.6MB output
56 tests passing, zero warnings.
Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-03 02:41:47 +00:00
rUv
e7e48eb88e
feat(decompiler): ONNX Runtime neural inference + 8,226 training pairs
Neural inference (behind `neural` feature flag):
- Full ONNX Runtime integration via `ort` crate
- Loads .onnx models, encodes context as byte tensors
- Softmax confidence scoring, character-level decoding
- Falls back to pattern-based when model unavailable
Training data expansion: 1,602 → 8,226 pairs
- 200+ function names, 90+ class names, 170+ variable names
- 16 minifier styles, 5 context variations per entry
- Extracted identifier dictionaries (381 lines)
Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-03 02:30:41 +00:00
rUv
8aafea328d
feat(decompiler): GPU training pipeline for neural name inference (ADR-136)
Training pipeline:
- generate-deobfuscation-data.mjs: 1,200+ training pairs from fixtures + synthetic
- train-deobfuscator.py: 6M param transformer (3 layers, 4 heads, 128 embed)
- export-to-rvf.py: PyTorch → ONNX → GGUF Q4 → RVF OVERLAY
- launch-gpu-training.sh: GCloud L4 GPU (--local, --cloud-run, --spot)
- Dockerfile.deobfuscator: pytorch/pytorch:2.2.0-cuda12.1
Decompiler integration:
- NeuralInferrer behind optional `neural` feature flag
- model_path in DecompileConfig
- Falls through to pattern-based when model unavailable
- Zero binary impact without feature flag
All tests pass, cargo check clean with and without neural feature.
Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-03 02:08:19 +00:00
rUv
829537d998
perf(decompiler): ultra-optimize — 35x faster Louvain, memchr, 210 patterns
Louvain partitioning: 33s → 929ms (35x faster!)
- Pre-computed sigma_totals replaces O(n²) community_total_weight
- Rayon parallel local-move phase
- Incremental O(1) updates per node move
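A sketch of the bookkeeping behind the sigma_totals change: keep one running degree total per community and adjust it when a node moves, instead of re-summing the community each time. The struct and method names are hypothetical; the gain expression is the standard Louvain modularity gain for inserting an isolated node, which may differ in detail from what the crate computes.

```rust
/// Per-community running totals for the Louvain local-move phase.
pub struct Communities {
    pub community_of: Vec<usize>, // current community of each node
    pub degree: Vec<f64>,         // weighted degree k_i of each node
    pub sigma_tot: Vec<f64>,      // sum of degrees in each community
}

impl Communities {
    /// Start with every node in its own community.
    pub fn singletons(degree: Vec<f64>) -> Self {
        let community_of = (0..degree.len()).collect();
        let sigma_tot = degree.clone();
        Self { community_of, degree, sigma_tot }
    }

    /// Move a node in O(1): two additions, no rescan of either community.
    pub fn move_node(&mut self, node: usize, target: usize) {
        let source = self.community_of[node];
        if source == target {
            return;
        }
        self.sigma_tot[source] -= self.degree[node];
        self.sigma_tot[target] += self.degree[node];
        self.community_of[node] = target;
    }

    /// Modularity gain from placing `node` into `target`, where `k_i_in` is
    /// the edge weight between the node and `target`, and `m` is the total
    /// edge weight of the graph: k_i_in / m - sigma_tot * k_i / (2 m^2).
    pub fn gain(&self, node: usize, target: usize, k_i_in: f64, m: f64) -> f64 {
        let k_i = self.degree[node];
        let mut sigma = self.sigma_tot[target];
        // The node's own degree must not count toward the target's total.
        if self.community_of[node] == target {
            sigma -= k_i;
        }
        k_i_in / m - sigma * k_i / (2.0 * m * m)
    }
}
```

On a three-node path graph (degrees 1, 2, 1, total edge weight 2), merging node 0 into node 1's community and then evaluating node 2 against that community gives a gain of 0.125.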
Parser: 4.5s → 3.4s (1.3x faster)
- memchr SIMD for string delimiter scanning
- 256-entry lookup table for character classification
- unsafe from_utf8_unchecked for ASCII-guaranteed identifiers
- Pre-sized HashSet allocations
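The 256-entry lookup table can be sketched as a compile-time array indexed by byte value, so classifying a character is a single load rather than a chain of comparisons. The class flags and function names here are illustrative assumptions, not the parser's actual definitions:

```rust
const IDENT_START: u8 = 1; // may begin a JS identifier (ASCII subset)
const IDENT_CONT: u8 = 2;  // may continue an identifier
const QUOTE: u8 = 4;       // string delimiter

/// Build the byte-class table at compile time.
const fn build_table() -> [u8; 256] {
    let mut table = [0u8; 256];
    let mut i = 0usize;
    while i < 256 {
        let b = i as u8;
        let alpha = (b >= b'a' && b <= b'z')
            || (b >= b'A' && b <= b'Z')
            || b == b'_'
            || b == b'$';
        let digit = b >= b'0' && b <= b'9';
        if alpha {
            table[i] = IDENT_START | IDENT_CONT;
        } else if digit {
            table[i] = IDENT_CONT;
        } else if b == b'"' || b == b'\'' || b == b'`' {
            table[i] = QUOTE;
        }
        i += 1;
    }
    table
}

static CLASS: [u8; 256] = build_table();

/// True if the byte opens or closes a string literal.
pub fn is_quote(b: u8) -> bool {
    (CLASS[b as usize] & QUOTE) != 0
}

/// Length of the identifier at the start of `bytes`, or 0 if there is none.
pub fn ident_len(bytes: &[u8]) -> usize {
    match bytes.first() {
        Some(&b) if (CLASS[b as usize] & IDENT_START) != 0 => {}
        _ => return 0,
    }
    bytes
        .iter()
        .take_while(|&&b| (CLASS[b as usize] & IDENT_CONT) != 0)
        .count()
}
```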
Training patterns: 50 → 210 (4.2x more coverage)
- 27 tool patterns, 23 MCP, 21 UI/Ink, 20 config
- 16 error, 14 session, 14 streaming, 15 auth
- 14 CLI, 10 telemetry
51 tests passing, zero warnings.
Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-03 02:01:17 +00:00
rUv
f1ee2f8eb2
perf(decompiler): 4x parser speedup, Louvain partitioning, training corpus
Bottleneck 1 - Parser: 18.3s → 4.5s (4x faster)
- Single-pass body scanner replaces 3 regex passes per declaration
- scan_body_single_pass() collects strings, props, idents in one traversal
Bottleneck 2 - Partitioning: skipped → 33s (now works on 27K nodes)
- Louvain community detection for graphs ≥5K nodes
- Detects 1,029 modules in Claude Code (was 1 or skipped)
- Falls back to exact MinCut for <5K nodes
Bottleneck 3 - Memory: 592MB → 568MB (incremental; more work needed)
- Pre-allocated output buffers in beautifier
- Direct write via format_declaration_into() / indent_braces_into()
Bottleneck 4 - Name inference: HIGH-confidence rate unchanged at 5.2% (training data now loaded)
- 50 domain-specific patterns in data/claude-code-patterns.json
- TrainingCorpus with compile-time embedding via include_str!()
- Runtime corpus loading via TrainingCorpus::from_json()
51 tests passing, zero warnings.
Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-03 01:18:31 +00:00
rUv
8315e0a61a
fix(decompiler): review fixes, benchmarks, real-world validation
Bugs fixed:
- assert!() in witness verification → proper Err return
- Swapped property-to-name mappings in inferrer
- Escape sequences in beautifier indent_braces
- Doc comments: SHAKE-256 → SHA3-256 (correct hash function)
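The first fix above, replacing a panic with a returned error, might look like the sketch below. The types and the link check are hypothetical stand-ins: the real witness chain hashes its contents with SHA3-256, which is omitted here to keep the example std-only.

```rust
/// Hypothetical error type for a failed verification.
#[derive(Debug, PartialEq)]
pub enum WitnessError {
    /// Entry `index` does not point at the hash of the entry before it.
    BrokenLink { index: usize },
}

/// Hypothetical chain entry: its own hash plus the previous entry's hash.
pub struct WitnessEntry {
    pub hash: [u8; 32],
    pub prev: [u8; 32],
}

/// Check that every entry links to its predecessor. A mismatch is reported
/// as Err so callers can handle tampered input instead of the process
/// aborting on assert!().
pub fn verify_links(chain: &[WitnessEntry]) -> Result<(), WitnessError> {
    for i in 1..chain.len() {
        if chain[i].prev != chain[i - 1].hash {
            return Err(WitnessError::BrokenLink { index: i });
        }
    }
    Ok(())
}
```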
Performance:
- Cached regex compilation via once_cell::Lazy (7 regexes)
- HashSet for O(1) lookups (was Vec O(n))
- Optimized hex encoding with lookup table
- Added ES module export support
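As an illustration of the hex-encoding item above, a lookup-table encoder can be as small as this. It is a sketch with an assumed function name, not the crate's code:

```rust
/// Digit table: the index is a 4-bit value, the entry is its hex character.
const HEX: &[u8; 16] = b"0123456789abcdef";

/// Encode bytes as lowercase hex, two table lookups per byte and a single
/// up-front allocation for the output.
pub fn to_hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        out.push(HEX[(b >> 4) as usize] as char);
        out.push(HEX[(b & 0x0f) as usize] as char);
    }
    out
}
```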
Benchmarks (criterion):
- 1KB: 58μs parse, 230μs pipeline
- 10KB: 581μs parse, 1.7ms pipeline
- 100KB: 5.4ms parse, 26.2ms pipeline
- 1MB: 53.5ms parse (linear scaling)
Real-world: Claude Code cli.js (10.53 MB):
- 27,477 declarations, 601,653 edges
- 1,344 HIGH confidence names (5.2%)
- 5,843 MEDIUM confidence names (22.8%)
- 24.6s total pipeline time
OSS fixtures: lodash, express, redux with self-learning loop
Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-03 00:47:13 +00:00
rUv
2804e9c650
feat(decompiler): MinCut-based JS decompiler with witness chains (ADR-135)
5-phase decompilation pipeline:
1. Regex-based parser extracts declarations, strings, property accesses
2. MinCut graph partitioning detects original module boundaries
3. Name inference with confidence scoring (HIGH/MEDIUM/LOW)
4. V3 source map generation (browser DevTools compatible)
5. SHAKE-256 Merkle witness chains for cryptographic provenance
Ground-truth validation:
- 5 test fixtures (Express, MCP Server, React, Multi-Module, Tools)
- Self-learning feedback loop via learn_from_ground_truth()
- 14 tests, all passing
SOTA research document covering JSNice, DeGuard, cross-version
fingerprinting, and RuVector's unique advantage combining MinCut,
IIT Phi, SONA, and HNSW for decompilation.
Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-03 00:04:36 +00:00