ruvector

mirror of https://github.com/ruvnet/RuVector.git synced 2026-05-24 13:54:31 +00:00

Author	SHA1	Message	Date
rUv	7704c94624	feat(decompiler): LLM weight decompiler + API prober (ADR-138). Model weight decompilation: GGUF v2/v3 parser (self-contained, no ruvllm dep); Safetensors JSON header parser; architecture inference from tensor shapes (GQA, FFN, vocab); tokenizer extraction, quantization detection; witness chain for model provenance; 6 integration tests, behind `model` feature flag. API probing (live tested): probes Claude, OpenAI, Gemini APIs without weight access; detects streaming, tools, system_prompt, vision capabilities; measures latency, tokens/sec, tokenizer type; model fingerprinting via self-identification + math tests; verified on Gemini 2.0 Flash (556ms, 46 tok/s, all caps detected). CLI: `npx ruvector decompile --model file.gguf`, `npx ruvector decompile --api gemini-2.0-flash`. 78 Rust tests passing. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-04-03 19:08:30 +00:00
rUv	99d13f6811	fix(decompiler): proper multi-level folder hierarchy from graph. tree.rs fixes: target 10 top-level folders (was collapsing to 1); max cluster size capped at 20% of total (prevents mega-folders); geometric mean normalization (prevents giant clusters absorbing all); leaf threshold of 20 modules at depth 1+ (was 3). Claude Code result: 19 directories with graph-derived names (asyncgenerator, bedrockclient, systempromptsectioncache, etc.). 59 tests passing. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-04-03 15:51:48 +00:00
rUv	a3029eaecb	feat(decompiler): WASM Louvain pipeline, npx now produces 589+ modules. Compiled ruvector-decompiler to WASM via wasm-pack: crates/ruvector-decompiler-wasm/ is a wasm-bindgen wrapper (cdylib); rayon gated behind optional `parallel` feature (sequential in WASM); DecompileConfig now Deserializable for JSON config passing; 1.5MB WASM binary at npm/packages/ruvector/wasm/. `npx ruvector decompile` now tries: WASM Louvain → Rust binary → keyword split. Result: 589 modules from Claude Code (was 5 with keyword splitter). 59 Rust tests pass, WASM verified from Node.js. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-04-03 15:25:23 +00:00
rUv	c28fce2229	feat(decompiler): graph-derived hierarchical folder structure (Phase 7). Folder structure emerges from the dependency graph, not hardcoded keywords. tree.rs (362 lines): agglomerative clustering on inter-module edge weights; TF-IDF naming (most discriminative strings name each folder); recursive depth control (configurable max_depth, min_folder_size). inferrer.rs: infer_folder_name() with TF-IDF scoring. types.rs: ModuleTree struct, hierarchical config options. run_on_cli.rs: --output-dir prints folder tree to disk. module-splitter.js: JS-side tree builder with same approach. Key principle: tightly-coupled code shares a folder, MinCut boundaries become folder boundaries, names from context. 59 tests passing, zero warnings. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-04-03 03:26:54 +00:00
rUv	38086742ff	feat(decompiler): pure Rust transformer inference, zero ML dependencies. transformer.rs (416 lines), complete forward pass in std Rust: multi-head self-attention with padding mask; GELU activation, layer norm, softmax; loads weights from simple binary format (2.6MB); zero external deps, just f32 math. neural.rs, Backend enum (Transformer/ONNX/Stub): .bin → pure Rust (always available, no feature flag); .onnx → ort (behind neural feature flag); .gguf/.rvf → stub for future RuvLLM integration. export-weights-bin.py, PyTorch → binary weight dump: 42 tensors, 673,152 parameters, 2.6MB output. 56 tests passing, zero warnings. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-04-03 02:41:47 +00:00
rUv	e7e48eb88e	feat(decompiler): ONNX Runtime neural inference + 8,226 training pairs. Neural inference (behind `neural` feature flag): full ONNX Runtime integration via `ort` crate; loads .onnx models, encodes context as byte tensors; softmax confidence scoring, character-level decoding; falls back to pattern-based when model unavailable. Training data expansion, 1,602 → 8,226 pairs: 200+ function names, 90+ class names, 170+ variable names; 16 minifier styles, 5 context variations per entry; extracted identifier dictionaries (381 lines). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-04-03 02:30:41 +00:00
rUv	8aafea328d	feat(decompiler): GPU training pipeline for neural name inference (ADR-136). Training pipeline: generate-deobfuscation-data.mjs (1,200+ training pairs from fixtures + synthetic); train-deobfuscator.py (6M param transformer: 3 layers, 4 heads, 128 embed); export-to-rvf.py (PyTorch → ONNX → GGUF Q4 → RVF OVERLAY); launch-gpu-training.sh (GCloud L4 GPU; --local, --cloud-run, --spot); Dockerfile.deobfuscator (pytorch/pytorch:2.2.0-cuda12.1). Decompiler integration: NeuralInferrer behind optional `neural` feature flag; model_path in DecompileConfig; falls through to pattern-based when model unavailable; zero binary impact without feature flag. All tests pass, cargo check clean with and without neural feature. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-04-03 02:08:19 +00:00
rUv	829537d998	perf(decompiler): ultra-optimize, 35x faster Louvain, memchr, 210 patterns. Louvain partitioning, 33s → 929ms (35x faster): pre-computed sigma_totals replaces O(n²) community_total_weight; Rayon parallel local-move phase; incremental O(1) updates per node move. Parser, 4.5s → 3.4s (1.3x faster): memchr SIMD for string delimiter scanning; 256-entry lookup table for character classification; unsafe from_utf8_unchecked for ASCII-guaranteed identifiers; pre-sized HashSet allocations. Training patterns, 50 → 210 (4.2x more coverage): 27 tool patterns, 23 MCP, 21 UI/Ink, 20 config; 16 error, 14 session, 14 streaming, 15 auth; 14 CLI, 10 telemetry. 51 tests passing, zero warnings. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-04-03 02:01:17 +00:00
rUv	f1ee2f8eb2	perf(decompiler): 4x parser speedup, Louvain partitioning, training corpus. Bottleneck 1, parser, 18.3s → 4.5s (4x faster): single-pass body scanner replaces 3 regex passes per declaration; scan_body_single_pass() collects strings, props, idents in one traversal. Bottleneck 2, partitioning, skipped → 33s (now works on 27K nodes): Louvain community detection for graphs ≥5K nodes; detects 1,029 modules in Claude Code (was 1 or skipped); falls back to exact MinCut for <5K nodes. Bottleneck 3, memory, 592MB → 568MB (incremental, more needed): pre-allocated output buffers in beautifier; direct write via format_declaration_into() / indent_braces_into(). Bottleneck 4, name inference, 5.2% → 5.2% HIGH (training data loaded): 50 domain-specific patterns in data/claude-code-patterns.json; TrainingCorpus with compile-time embedding via include_str!(); runtime corpus loading via TrainingCorpus::from_json(). 51 tests passing, zero warnings. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-04-03 01:18:31 +00:00
rUv	8315e0a61a	fix(decompiler): review fixes, benchmarks, real-world validation. Bugs fixed: assert!() in witness verification → proper Err return; swapped property-to-name mappings in inferrer; escape sequences in beautifier indent_braces; doc comments SHAKE-256 → SHA3-256 (correct hash function). Performance: cached regex compilation via once_cell::Lazy (7 regexes); HashSet for O(1) lookups (was Vec O(n)); optimized hex encoding with lookup table; added ES module export support. Benchmarks (criterion): 1KB, 58μs parse, 230μs pipeline; 10KB, 581μs parse, 1.7ms pipeline; 100KB, 5.4ms parse, 26.2ms pipeline; 1MB, 53.5ms parse (linear scaling). Real-world, Claude Code cli.js (10.53 MB): 27,477 declarations, 601,653 edges; 1,344 HIGH confidence names (5.2%); 5,843 MEDIUM confidence names (22.8%); 24.6s total pipeline time. OSS fixtures: lodash, express, redux with self-learning loop. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-04-03 00:47:13 +00:00
rUv	2804e9c650	feat(decompiler): MinCut-based JS decompiler with witness chains (ADR-135). 5-phase decompilation pipeline: (1) regex-based parser extracts declarations, strings, property accesses; (2) MinCut graph partitioning detects original module boundaries; (3) name inference with confidence scoring (HIGH/MEDIUM/LOW); (4) V3 source map generation (browser DevTools compatible); (5) SHAKE-256 Merkle witness chains for cryptographic provenance. Ground-truth validation: 5 test fixtures (Express, MCP Server, React, Multi-Module, Tools); self-learning feedback loop via learn_from_ground_truth(); 14 tests, all passing. SOTA research document covering JSNice, DeGuard, cross-version fingerprinting, and RuVector's unique advantage combining MinCut, IIT Phi, SONA, and HNSW for decompilation. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-04-03 00:04:36 +00:00

11 commits
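
Commit 7704c94624 mentions a self-contained GGUF v2/v3 parser. The log does not show that code, so the following is only a minimal sketch of what reading the fixed-size GGUF header could look like, assuming the publicly documented layout (4-byte magic `GGUF`, `u32` version, `u64` tensor count, `u64` metadata key/value count) and little-endian byte order. The names `GgufHeader`, `HeaderError`, and `parse_gguf_header` are hypothetical and are not taken from the repository.

```rust
/// Fixed-size fields at the start of a GGUF v2/v3 file.
/// Hypothetical type name, not the repository's own struct.
#[derive(Debug, PartialEq)]
pub struct GgufHeader {
    pub version: u32,
    pub tensor_count: u64,
    pub metadata_kv_count: u64,
}

/// Reasons the header could not be read (hypothetical error type).
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    TooShort,
    BadMagic,
    UnsupportedVersion(u32),
}

/// Reads a little-endian u32 at byte offset `at`; caller checks bounds.
fn read_u32_le(bytes: &[u8], at: usize) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&bytes[at..at + 4]);
    u32::from_le_bytes(buf)
}

/// Reads a little-endian u64 at byte offset `at`; caller checks bounds.
fn read_u64_le(bytes: &[u8], at: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[at..at + 8]);
    u64::from_le_bytes(buf)
}

/// Parses the 24-byte header prefix: magic, version, and the two counts.
/// Assumes little-endian data; only versions 2 and 3 are accepted because
/// version 1 used narrower count fields.
pub fn parse_gguf_header(bytes: &[u8]) -> Result<GgufHeader, HeaderError> {
    if bytes.len() < 24 {
        return Err(HeaderError::TooShort);
    }
    if &bytes[0..4] != b"GGUF" {
        return Err(HeaderError::BadMagic);
    }
    let version = read_u32_le(bytes, 4);
    if version != 2 && version != 3 {
        return Err(HeaderError::UnsupportedVersion(version));
    }
    Ok(GgufHeader {
        version,
        tensor_count: read_u64_le(bytes, 8),
        metadata_kv_count: read_u64_le(bytes, 16),
    })
}
```

In a full parser the metadata key/value pairs and tensor descriptors follow this prefix. Architecture inference of the kind the commit describes would work from those tensor names and shapes, which this sketch does not attempt.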