- speculativeEmbed: real FNV-1a hash embedding (128-dim) from file content
- ragRetrieve: cosine similarity on embeddings + TF-IDF keyword fallback
- contextRank: TF-IDF weighted scoring instead of raw keyword matching
- Remove false DiskANN claim (will implement as Rust crate next)

Co-Authored-By: claude-flow <ruv@ruv.net>
ADR-143: Implement Missing Capabilities in ruvector
Status
Accepted
Date
2026-04-06
Context
A comprehensive audit of the ruvector npm package (v0.2.22) identified 3 gaps where claimed capabilities were either stubs or trivially implemented:
- Speculative Embedding (parallel-workers.ts) - The speculativeEmbed worker returned { embedding: [], confidence: 0.5 } for all files. No actual embedding computation occurred.
- RAG Retrieval (parallel-workers.ts) - The ragRetrieve and contextRank workers used keyword-matching (string.includes()) instead of semantic similarity on embeddings, despite the module claiming "Parallel RAG chunking and retrieval" and "Semantic deduplication."
- DiskANN / Vamana (README, package.json) - Claimed in README ("billion-scale SSD-backed ANN with <10ms latency") and package.json description/keywords, but no implementation exists anywhere in the codebase.
All other 14 modules were verified as real implementations (see release v2.1.1 audit).
Decision
1. Speculative Embedding - Implement real hash-based embedding
Replace the stub with the same multi-hash embedding approach used in intelligence-engine.ts (FNV-1a + positional encoding). This produces deterministic, consistent embeddings from file content without requiring ONNX or native modules. The worker already has access to fs for reading file content.
Embedding dimension: 128 (sufficient for co-edit prediction, avoids overhead of 384-dim).
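A minimal sketch of what a deterministic FNV-1a hash embedding could look like. It is not the code in intelligence-engine.ts: the names fnv1a and hashEmbed, the tokenizer, the second seeded hash used for the sign, and the logarithmic position weight are all assumptions made for illustration.

```typescript
// Hypothetical sketch: names, tokenizer, and weighting are assumptions,
// not the implementation in intelligence-engine.ts.
const EMBEDDING_DIM = 128;

/** 32-bit FNV-1a over the UTF-16 code units of `text`, fed as two bytes each. */
function fnv1a(text: string, seed: number = 0x811c9dc5): number {
  let hash = seed >>> 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    hash ^= unit & 0xff;                 // low byte
    hash = Math.imul(hash, 0x01000193);  // multiply by the 32-bit FNV prime
    hash ^= unit >>> 8;                  // high byte
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

/** Hash each token into one of `dim` buckets, then L2-normalize the vector. */
function hashEmbed(content: string, dim: number = EMBEDDING_DIM): number[] {
  const vec: number[] = new Array<number>(dim).fill(0);
  const tokens = content.toLowerCase().split(/[^a-z0-9_]+/).filter(Boolean);

  tokens.forEach((token, position) => {
    const index = fnv1a(token) % dim;
    // A second, differently seeded hash picks the sign (assumed "multi-hash" step).
    const sign = (fnv1a(token, 0x9e3779b9) & 1) === 0 ? 1 : -1;
    // Assumed positional weighting: earlier tokens contribute slightly more.
    const weight = 1 / (1 + Math.log1p(position));
    vec[index] = (vec[index] ?? 0) + sign * weight;
  });

  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vec : vec.map((v) => v / norm);
}
```

In a worker, the input would typically be file content read with fs.readFileSync(path, "utf8"). The same content always yields the same vector, and empty content yields an all-zero vector rather than NaN values.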
2. RAG Retrieval - Implement cosine similarity on embeddings
When chunks include embeddings, use cosine similarity for ranking. Fall back to keyword matching only when embeddings are absent. This makes the existing embedding? field on ContextChunk actually functional.
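The ranking rule above can be sketched as follows. The ContextChunk shape shown here is an assumption (only the optional embedding field is confirmed by this ADR), and rankChunks, keywordScore, and the topK parameter are hypothetical names.

```typescript
// Hypothetical shape: the real ContextChunk in parallel-workers.ts may differ.
interface ContextChunk {
  content: string;
  embedding?: number[];
}

/** Cosine similarity; returns 0 for mismatched lengths or zero-length vectors. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) return 0;
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

/** Fallback score: fraction of query terms that occur in the chunk text. */
function keywordScore(query: string, content: string): number {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return 0;
  const haystack = content.toLowerCase();
  return terms.filter((term) => haystack.includes(term)).length / terms.length;
}

/** Rank by cosine similarity when both embeddings exist, else by keywords. */
function rankChunks(
  query: string,
  queryEmbedding: number[] | undefined,
  chunks: ContextChunk[],
  topK: number = 5
): { chunk: ContextChunk; score: number }[] {
  return chunks
    .map((chunk) => ({
      chunk,
      score:
        queryEmbedding && chunk.embedding
          ? cosineSimilarity(queryEmbedding, chunk.embedding)
          : keywordScore(query, chunk.content),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

One caveat of this sketch: cosine and keyword scores are on different scales, so a result set that mixes chunks with and without embeddings is only roughly comparable. A real implementation may want to rank the two groups separately.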
Also upgrade contextRank to use TF-IDF weighting instead of raw keyword matching.
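As an illustration of TF-IDF weighting, the sketch below scores each document against a query so that rare query terms count more than common ones. The function names, the tokenizer, and the smoothed IDF formula are assumptions, not the contextRank implementation.

```typescript
// Hypothetical sketch of TF-IDF scoring; not the contextRank implementation.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9_]+/).filter(Boolean);
}

/** Returns one TF-IDF score per document, in the same order as `documents`. */
function tfidfScores(query: string, documents: string[]): number[] {
  const docs = documents.map(tokenize);
  // Deduplicate query terms so a repeated term is not counted twice.
  const queryTerms = tokenize(query).filter((t, i, all) => all.indexOf(t) === i);

  // Document frequency: in how many documents each query term appears.
  const df = new Map<string, number>();
  for (const term of queryTerms) {
    df.set(term, docs.filter((tokens) => tokens.indexOf(term) !== -1).length);
  }

  return docs.map((tokens) => {
    if (tokens.length === 0) return 0;
    let score = 0;
    for (const term of queryTerms) {
      const count = tokens.filter((t) => t === term).length;
      if (count === 0) continue;
      const tf = count / tokens.length; // term frequency, length-normalized
      // Smoothed IDF (assumed variant): ln((1 + N) / (1 + df)) + 1.
      const idf = Math.log((1 + docs.length) / (1 + (df.get(term) ?? 0))) + 1;
      score += tf * idf;
    }
    return score;
  });
}
```

With raw keyword matching, a document that merely repeats a common word can tie with one that contains a distinctive query term. Here the distinctive term receives a larger IDF and lifts its document's score.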
3. DiskANN - Remove false claims, add roadmap note
DiskANN/Vamana requires SSD-backed graph storage with product quantization (PQ) compression. That is a significant implementation effort and should be a dedicated Rust crate. Rather than ship a stub, remove the claim from README/package.json and add it to a roadmap section. The existing HNSW index (backed by hnsw_rs) already provides fast ANN search for in-memory datasets.
Consequences
- Speculative embedding becomes functional for co-edit prediction use cases
- RAG retrieval produces semantically meaningful results when embeddings are available
- README accurately reflects capabilities (no DiskANN claim without implementation)
- No new dependencies required (all implementations use existing math primitives)