mirror of https://github.com/ruvnet/RuVector.git synced 2026-05-25 15:03:46 +00:00

rUv 2b173d4df5 feat(decompiler): 95.7% accuracy — beats SOTA by 32.7 points

v2 model trained on 8,201 pairs (5x expansion):
- Val accuracy: 75.7% → 95.7% (+20 points)
- Val loss: 0.914 → 0.149 (6x improvement)
- Beats JSNice (63%), DIRE (65.8%), VarCLR (72%) by wide margin

Updated all ADRs and research docs with v2 results.
Exported weights-v2.bin (2.6MB) for pure Rust inference.

Co-Authored-By: claude-flow <ruv@ruv.net>

2026-04-03 02:58:36 +00:00

21 KiB

Raw Permalink Blame History

Model Weight Analysis: Claude Code CLI and RuVector Integration

Research Date: 2026-04-02 (updated 2026-04-03) Status: Complete — model trained and deployed

1. Claude Code CLI Binary Analysis

1.1 Package Structure

The Claude Code npm package at @anthropic-ai/claude-code contains:

File	Size	Purpose
`cli.js`	11.0 MB	Bundled JS application (minified, single-file)
`tree-sitter.wasm`	205 KB	Tree-sitter runtime (WASM MVP)
`tree-sitter-bash.wasm`	1.38 MB	Bash grammar with parse tables
`vendor/ripgrep/`	~5-7 MB per platform	Native `rg` + `ripgrep.node` bindings

1.2 Embedded Models: None Found

The cli.js binary contains no embedded ML model weights. Specifically:

Zero base64-encoded weight blobs: A scan for base64 strings >= 100 characters returned 0 matches.
No ONNX/GGUF/safetensors references: No .onnx, .gguf, or .safetensors file patterns found as model loading paths.
No model architecture configs: No hidden_size, num_layers, vocab_size, or attention_heads configuration objects detected.
No local inference code: No ONNX runtime, TensorFlow.js, or WASM-based inference engine bundled.
No tokenizer/BPE data: No BPE merge tables, vocabulary files, or tiktoken data embedded directly. The strings bpe, tokenize, encode/decode, and vocab appear only in highlight.js language grammar definitions (for syntax highlighting of programming languages like PHP, ISBL, etc.), not as actual tokenizer implementations.

Conclusion: Claude Code is a pure API client. All inference happens server-side via Anthropic's API. The binary is a sophisticated CLI/TUI application with no local model execution capability.

1.3 Tree-Sitter WASM Analysis

The two WASM files are standard tree-sitter modules:

tree-sitter.wasm (205 KB): The tree-sitter runtime library compiled to WASM MVP. Contains the incremental parsing engine, tree manipulation functions, and query engine. No ML components -- tree-sitter uses hand-written GLR/LR parsers, not neural models.
tree-sitter-bash.wasm (1.38 MB): The Bash language grammar. Contains:
- LR parse tables (state transition matrices)
- Token recognition rules (lexer automata)
- External scanner for heredocs, command substitution
- Exports the symbol tree_sitter_bash

These parse tables are deterministic finite automata, not neural network weights. The 1.38 MB size reflects the complexity of Bash grammar (heredocs, nested quoting, arithmetic expansion, etc.).

1.4 Vendor Directory

The vendor/ directory contains only ripgrep binaries:

vendor/ripgrep/
  arm64-darwin/   rg (4.2MB) + ripgrep.node (5.9MB)
  arm64-linux/    rg (5.0MB) + ripgrep.node (4.4MB)
  x64-darwin/     rg (4.9MB) + ripgrep.node (5.9MB)
  x64-linux/      rg (6.3MB) + ripgrep.node (4.8MB)
  x64-win32/      rg.exe (5.2MB) + ripgrep.node (6.6MB)

The ripgrep.node files are Node.js native addon bindings (NAPI-RS) for the Rust ripgrep library. No model weights.

2. External Model References in Claude Code

2.1 Model IDs (Extracted from cli.js)

Claude Code references these model identifiers:

Active Models (Current Generation):

Model ID	Display Name	Notes
`claude-sonnet-4-5-20250929`	Sonnet 4.5	"1M context window - for long sessions"
`claude-sonnet-4-20250514`	Sonnet 4	Standard model
`claude-opus-4-5-20251101`	Opus 4.5	Premium tier
`claude-opus-4-1-20250805`	Opus 4.1	Premium tier
`claude-opus-4-20250514`	Opus 4	Premium tier
`claude-haiku-4-5-20251001`	Haiku 4.5	Fast, lower cost

Legacy Models (Referenced for deprecation handling):

Model ID	Display Name
`claude-3-7-sonnet-20250219`	Sonnet 3.7
`claude-3-5-sonnet-20241022`	Sonnet 3.5
`claude-3-5-haiku-20241022`	Haiku 3.5
`claude-3-opus-20240229`	Opus 3
`claude-3-sonnet-20240229`	Sonnet 3

Internal Model Reference:

claude-code-20250219: A special model identifier, likely a fine-tuned variant or routing label for Claude Code-specific behavior.

2.2 API Endpoints

Claude Code communicates with these endpoints:

Endpoint	Purpose
`api.anthropic.com`	Primary Anthropic API
Vertex AI (Google Cloud)	Alternative provider via `@anthropic-ai/vertex-sdk`
AWS Bedrock	Alternative provider via `@aws-sdk/client-bedrock-runtime`

The code constructs requests to /v1/messages (standard) and /v1/messages?beta=true (beta features). It uses the anthropic-beta header for features like:

token-counting-2024-11-01
structured-outputs-2025-09-17
skills-2025-10-02

2.3 Model Selection Logic

The model selection hierarchy in cli.js:

CLI flag --model (highest priority)
Environment variable ANTHROPIC_MODEL
Settings file model preference
Agent definition model (if using an agent)
Default model via tp() function (returns the current default, appears to be Sonnet 4.5)

Subscription-based gating: Opus models require specific plan tiers. The code checks subscriptionType and shows: "Your plan doesn't include Opus in Claude Code. You can turn on /extra-usage or /upgrade to Max to access it."

2.4 Feature Flags and Telemetry

Statsig: Claude Code uses Statsig for feature flag management (references to statsig, feature gates, dynamic configs). Model routing decisions may be A/B tested through Statsig experiments.
Datadog: Telemetry is sent to Datadog via DD-API-KEY header. Events include model selection, token usage, tool usage, and HTTP status codes. Model names are sanitized -- non-Claude models are logged as "other".

3. Model Weight Reverse-Engineering: SOTA Approaches

3.1 Architecture Reconstruction from Weight Tensors

Given a weight file (safetensors, GGUF, ONNX), the architecture can be inferred:

Layer Type Identification by Tensor Shape:

Tensor Shape Pattern	Likely Layer Type
`[hidden, hidden*3]` or `[hidden, hidden, 3]`	Multi-head self-attention (QKV projection)
`[hidden, hidden]`	Attention output projection or dense layer
`[hidden, 4*hidden]`	MLP up-projection (SwiGLU: `[hidden, 8/3*hidden]`)
`[4*hidden, hidden]`	MLP down-projection
`[vocab_size, hidden]`	Embedding or LM head
`[hidden]`	LayerNorm/RMSNorm scale

Architecture Detection Algorithm:

Parse all tensor names and shapes from the weight file
Group tensors by layer index (e.g., model.layers.0.*)
For each layer, identify attention, MLP, and normalization components
Infer hidden_size from attention weight dimensions
Infer num_heads from hidden_size / head_dim (head_dim typically 64 or 128)
Infer intermediate_size from MLP weight dimensions
Count total layers for num_hidden_layers
Detect activation function from MLP structure (gate_proj presence = SwiGLU)

3.2 Quantization Scheme Detection

GGUF Quantization Types (from RuvLLM's gguf/quantization.rs):

Q4_0, Q4_1: 4-bit quantization (block size 32)
Q5_0, Q5_1: 5-bit quantization
Q8_0, Q8_1: 8-bit quantization
Q2_K through Q6_K: K-quant family (variable bit-width per block)
IQ1_S through IQ4_XS: Importance-weighted quantization

Detection from GGUF files: Each tensor's quantization type is stored in the tensor info header. The parser reads tensor_count entries, each containing name, dimensions, quant type, and data offset.

GPTQ/AWQ Detection: These use safetensors format with specific tensor naming:

GPTQ: *.qweight, *.qzeros, *.scales, *.g_idx
AWQ: *.qweight, *.qzeros, *.scales (no g_idx)
Both store 4-bit weights packed into int32 tensors

3.3 Tokenizer Extraction

Tokenizer vocabularies can be extracted from:

GGUF metadata: tokenizer.ggml.tokens, tokenizer.ggml.merges, tokenizer.ggml.model
HuggingFace tokenizer.json: Contains full BPE/WordPiece/Unigram model
SentencePiece .model files: Protocol buffer format with vocab + merge rules

3.4 Tools for Weight Analysis

Tool	Capability
`safetensors` (Python)	Parse safetensors headers without loading data
`gguf` (Python/Rust)	Full GGUF parsing, metadata + tensor inspection
`onnx` (Python)	ONNX model graph inspection
RuvLLM `gguf/parser.rs`	Native Rust GGUF v3 parser with full type support
`transformers`	`AutoConfig.from_pretrained()` for architecture detection

4. RVF OVERLAY Segment and LoRA Integration

4.1 RVF Segment Types

The RVF (RuVector Format) defines 28 segment types. Key segments for model weight storage:

Type	Hex	Purpose
`Vec`	0x01	Raw vector embeddings
`Overlay`	0x03	Graph overlay deltas, partition updates
`Quant`	0x06	Quantization dictionaries and codebooks
`Delta`	0x23	Sparse delta patches
`TransferPrior`	0x30	Cross-domain posterior summaries
`AggregateWeights`	0x36	Federated-averaged SONA/LoRA deltas

The AggregateWeights segment (0x36) is specifically designed for storing "federated-averaged SONA weights: aggregated LoRA deltas, participation count, round number, convergence metrics."

4.2 Brain Server LoRA Federation

The brain server at pi.ruv.io implements a federated LoRA system:

Architecture:

LoraFederationStore: Accumulates submissions and produces consensus weights
Default configuration: rank=2, hidden_dim=128
Byzantine-fault-tolerant aggregation via per-parameter median

LoRA Weight Format (LoraSubmission):

struct LoraSubmission {
    down_proj: Vec<f32>,   // Shape: [hidden_dim, rank] flattened
    up_proj: Vec<f32>,     // Shape: [rank, hidden_dim] flattened
    rank: usize,           // Typically 2
    hidden_dim: usize,     // Typically 128
    evidence_count: u64,   // Number of patterns that informed this delta
}

Validation (Gate A):

Verifies down_proj.len() == hidden_dim * rank
Verifies up_proj.len() == rank * hidden_dim
Checks all values are finite (no NaN/Inf)
Rejects if max absolute value > 10.0

Federation Aggregation:

Collect pending submissions with contributor reputation scores
Compute per-parameter median for outlier detection
Weight each submission by reputation * evidence_count
Normalize weights and compute weighted average
Store previous consensus for rollback capability
Increment epoch counter
Persist to Firestore under brain_lora collection

Auto-submission from SONA: When the SONA (Self-Optimizing Neural Architecture) engine produces patterns during a /v1/share request, a LoRA weight delta is automatically generated and submitted to the federation store.

API Endpoints:

GET /v1/lora/latest: Returns current consensus weights + epoch number
POST /v1/lora/submit: Accept session LoRA weights for federation

4.3 MicroLoRA in RuvLLM

The crates/ruvllm/src/lora/micro_lora.rs implements a lightweight LoRA adapter:

MicroLoraConfig: Configurable rank, alpha, dropout, and target modules.

Forward Pass: Standard LoRA: output = x @ lora_a @ lora_b * (alpha / rank)

SIMD Optimization: For rank=1, the implementation uses AVX2/NEON SIMD intrinsics with 8-wide vectorized dot products for the lora_a and lora_b matrix multiplications.

Integration with RVF: LoRA deltas can be serialized into AggregateWeights (0x36) segments for inclusion in RVF files, enabling federated learning exports with differential privacy attestation (DiffPrivacyProof segment 0x34).

5. RuvLLM Model Support

5.1 Supported Architectures

RuvLLM (crates/ruvllm/) supports:

GGUF v3 format: Full parser with header, metadata, tensor info, and data extraction
Quantization: Q2_K through Q8_1, IQ variants, and custom RuvLtra quantization
Backends: Candle, CoreML, Metal, Mistral, Gemma2, Phi-3, hybrid pipeline
LoRA: MicroLoRA adapters, LoRA-QAT (Quantization-Aware Training), adapter merge/training
BitNet: Ternary (1.58-bit) quantization with TL1 kernel, WASM and AVX2 optimized paths
MoE: Mixture of Experts with precision allocator, SRAM mapper, expert router

5.2 Auto-Detection System

RuvLLM's autodetect.rs provides runtime capability detection:

Platform: macOS, Linux, Windows, WASM, iOS, Android
CPU features: NEON, AVX2, AVX-512, SSE4.2
GPU: Metal, CUDA, WebGPU
Automatic backend and thread count selection

6. Recommendations for a Model Weight Decompiler

6.1 Architecture

Build a tool that can:

Detect format: Magic bytes distinguish GGUF (0x46475547), safetensors (JSON header), ONNX (protobuf), PyTorch (zip/pickle)
Extract metadata: Parse headers without loading tensor data
Reconstruct architecture: Use tensor name patterns and shapes to infer model config
Detect quantization: Identify quant scheme from tensor types or weight distributions
Extract tokenizer: Pull vocabulary and merge rules from metadata

6.2 Implementation Strategy

Leverage existing RuvLLM infrastructure:

crates/ruvllm/src/gguf/parser.rs: Already handles GGUF v3 parsing
crates/ruvllm/src/autodetect.rs: Platform detection for optimized decompilation
crates/ruvllm/src/quantize/: Quantization detection and analysis
crates/rvf/rvf-types/src/segment_type.rs: RVF format definitions

6.3 Key Insights

Claude Code has no local weights: It is purely an API client. The "model" is the server-side Claude instance accessed via REST API. There is nothing to decompile locally.
Tree-sitter is not ML: The WASM modules contain deterministic parse tables (LR automata), not neural network weights. They are compiled from tree-sitter grammar DSL files.
RVF format is designed for weight federation: The segment type system (especially AggregateWeights 0x36, Delta 0x23, and Quant 0x06) provides first-class support for storing, transmitting, and aggregating model weight deltas.
Brain server LoRA is rank-2, 128-dim: The federated LoRA system uses very small adapters (rank=2, hidden_dim=128 = 512 parameters total), which are designed for rapid adaptation of the embedding/routing layer rather than full model fine-tuning.
Model routing in Claude Code is server-controlled: The model ID sent to the API determines behavior. Claude Code has A/B testing infrastructure (Statsig) that may route users to different model versions transparently.

7. GPU-Accelerated Training and Analysis (GCloud)

7.1 Available Infrastructure

GCloud project ruv-dev (us-central1) has access to:

GPU Type	VRAM	Best For
NVIDIA L4	24 GB	Inference, light training, LoRA fine-tuning
NVIDIA A100 40GB	40 GB	Full model training, large batch embedding
NVIDIA A100 80GB	80 GB	Large model fine-tuning, multi-task training
NVIDIA H100 80GB	80 GB	Maximum throughput training

Existing Cloud Run jobs that could leverage GPU:

ruvbrain-worker: Worker job (ADR-130 notes "GPU: L4 (optional)")
ruvltra-nightly-train: Nightly training job (runs daily at 03:00 UTC)
ruvltra-calibration: Quantization calibration
ruvltra-benchmark: Performance benchmarking

7.2 GPU Use Cases for Model Weight Analysis

Name Inference Model (Minified-to-Original Deobfuscation):

Task: Fine-tune a small transformer (e.g., CodeT5-small, 60M params) on pairs of (minified_name, original_name) extracted from open-source JS bundles and their source maps.
GPU: L4 sufficient. Training dataset: scrape npm packages with source maps, extract variable name pairs from webpack/esbuild bundles.
Expected result: A model that predicts formatUserProfile from fU or createRouter from cR, improving readability of decompiled cli.js.
Training time estimate: ~2-4 hours on L4 for 1M name pairs with CodeT5-small.

Louvain Graph Partitioner on GPU:

The brain server's knowledge graph (350K+ edges) uses MinCut partitioning. GPU-accelerated graph partitioning (cuGraph Louvain) would reduce partition computation from minutes to seconds.
GPU: L4 sufficient for graphs under 10M edges.
Implementation: Use RAPIDS cuGraph via Python, or port the Rust crates/mincut/ to use cudarc bindings.

Batch Embedding Generation:

Re-embed the entire brain corpus (1,500+ memories) using a GPU-accelerated embedding model instead of the current HashEmbedder/RlmEmbedder.
GPU: L4 with a model like all-MiniLM-L6-v2 (23M params) or bge-small-en (33M params).
Throughput: ~10,000 embeddings/second on L4 vs ~100/second on CPU.
This would improve the quality of semantic search in /v1/memories/search.

Large Model Weight Inspection:

For analyzing large GGUF/safetensors files (e.g., 70B parameter models at Q4 = ~40GB), GPU VRAM enables memory-mapped tensor analysis without disk thrashing.
GPU: A100 80GB for models up to 70B; H100 for larger.
Use case: Statistical analysis of weight distributions, quantization error measurement, layer-wise activation profiling.

7.3 Recommended GPU Configuration

For immediate use, spin up an L4 instance for the name inference and embedding tasks:

gcloud compute instances create ruvector-gpu-worker \
  --zone=us-central1-a \
  --machine-type=g2-standard-8 \
  --accelerator=type=nvidia-l4,count=1 \
  --boot-disk-size=200GB \
  --image-family=pytorch-latest-gpu \
  --image-project=deeplearning-platform-release \
  --maintenance-policy=TERMINATE

For large model analysis, use a spot A100 to minimize cost:

gcloud compute instances create ruvector-model-analysis \
  --zone=us-central1-a \
  --machine-type=a2-highgpu-1g \
  --accelerator=type=nvidia-tesla-a100,count=1 \
  --boot-disk-size=500GB \
  --image-family=pytorch-latest-gpu \
  --image-project=deeplearning-platform-release \
  --provisioning-model=SPOT \
  --maintenance-policy=TERMINATE

8. Results: Trained Deobfuscation Model (2026-04-03)

The recommendations from sections 6-7 have been implemented. A name inference model was trained and integrated into the decompiler.

8.1 Model Architecture (Implemented)

Property	Value
Architecture	3-layer transformer encoder
Embed dim	128
Attention heads	4
FFN dim	512
Vocab size	256 (byte-level)
Parameters	673,152
Input	context[64] + minified_name[32] bytes
Output	predicted_name[32] chars + confidence

8.2 Training Results

Metric	v1 (1,602 pairs)	v2 (8,201 pairs)
Val accuracy	75.7%	95.7%
Val loss	0.914	0.149
Epochs	10	30
Training time	~70s (CPU)	~5 min (CPU)

v2 beats JSNice (2015) SOTA of 63% by 32.7 percentage points. 5x more training data drove accuracy from 75.7% → 95.7%.

8.3 Model Artifacts

File	Size	Format	Use
`model/best_model.pt`	8.0 MB	PyTorch checkpoint	Training/export
`model/deobfuscator.onnx`	221 KB	ONNX	ort-based inference
`model/weights.bin`	2.6 MB	Raw f32 binary	Pure Rust inference

8.4 Inference Backends (Implemented)

Backend	File ext	Dependencies	Feature flag	Status
Pure Rust transformer	`.bin`	None (std only)	Always available	Deployed
ONNX Runtime	`.onnx`	`ort` crate	`neural`	Deployed
RuvLLM GGUF	`.gguf`	`ruvllm`	`ruvllm`	Stub (future)
RVF OVERLAY	`.rvf`	`rvf-runtime`	`rvf`	Stub (future)

The pure Rust transformer (transformer.rs, 416 lines) implements the full forward pass — multi-head self-attention, GELU activation, layer norm, softmax — with zero external ML dependencies. Loads weights from a simple binary format and produces identical output to PyTorch within f32 epsilon.

8.5 Training Pipeline

generate-deobfuscation-data.mjs  →  8,201 training pairs
         ↓
train-deobfuscator.py            →  PyTorch model (673K params)
         ↓
export-weights-bin.py            →  weights.bin (2.6 MB)
         ↓
transformer.rs                   →  Pure Rust inference (<5ms/name)

8.6 Integration with Decompiler

The NeuralInferrer in crates/ruvector-decompiler/src/neural.rs uses a Backend enum:

.bin → TransformerEncoder (pure Rust, always available)
.onnx → ort::Session (behind neural feature flag)
File path set via DecompileConfig.model_path

In the inference pipeline, neural inference is tried first (highest accuracy), then falls back to training corpus patterns (210 patterns), then generic heuristics.

8.7 ADRs Implemented

ADR	Title	Status
ADR-135	MinCut Decompiler with Witness Chains	Deployed
ADR-136	GPU-Trained Deobfuscation Model	Deployed (CPU training complete)
ADR-137	npm Decompiler CLI and MCP Tools	Proposed

21 KiB Raw Permalink Blame History