mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-27 00:25:10 +00:00
Fix CLI dimension field mismatch + add TurboQuant to README (#309)
* fix(cli): correct field name mismatch in create and benchmark commands The CLI passed `dimension` (singular) but the native NAPI binding expects `dimensions` (plural). Also fix `db.save()` call which doesn't exist on VectorDBWrapper — use `storagePath` constructor option instead. Fixes #307 Co-Authored-By: claude-flow <ruv@ruv.net> * docs: add TurboQuant to README capabilities and comparison tables Co-Authored-By: claude-flow <ruv@ruv.net> * docs(npm): update ruvector npm package for v2.1 SOTA features - Add v2.1 section with FlashAttention-3, Graph RAG, hybrid search, DiskANN, ColBERT, Matryoshka, MLA, Mamba SSM, TurboQuant, OPQ, GraphMAE - Update description to highlight hybrid retrieval and Graph RAG - Add keywords: graph-rag, diskann, hybrid-search, colbert, turboquant, mamba - Bump version to 0.2.19 Co-Authored-By: claude-flow <ruv@ruv.net> * feat(ruvllm): update npm package with TurboQuant docs and SEO keywords - Add TurboQuant KV-cache compression section (2-4 bit, 6-8x savings) - Update description and add v2.5 feature table - Add SEO keywords: turboquant, kv-cache, quantization, flash-attention, speculative-decoding, gguf, mamba, edge-ai, local-llm, model-compression - Bump to v2.5.4, publish ruvllm crate to 2.1.0 Co-Authored-By: claude-flow <ruv@ruv.net>
This commit is contained in:
parent
8e0943acac
commit
e7e545e22c
6 changed files with 87 additions and 26 deletions
|
|
@ -45,6 +45,7 @@ User Query → [SONA Engine] → Model Response → User Feedback
|
|||
| [Hybrid search](./crates/ruvector-core) | 🔍 Sparse vectors + dense vectors with RRF fusion — 20-49% better retrieval | Keyword OR vector, not both |
|
||||
| [Graph RAG](./crates/ruvector-core) | 📊 Knowledge graph + community detection for multi-hop queries — 30-60% improvement | Naive chunk-based RAG |
|
||||
| [DiskANN](./crates/ruvector-core) | 💾 Billion-scale SSD-backed ANN with <10ms latency via Vamana graph | Memory-only indexes |
|
||||
| [TurboQuant](./crates/ruvllm) | ⚡ 2-4 bit KV-cache quantization — 6-8x memory savings with <0.5% quality loss | No quantization or 8-bit only |
|
||||
| [ColBERT multi-vector](./crates/ruvector-core) | 🎯 Per-token late interaction retrieval (MaxSim) for fine-grained matching | Single-vector only |
|
||||
| [Matryoshka embeddings](./crates/ruvector-core) | 🪆 Adaptive-dimension search — coarse-to-fine funnel for speed with minimal recall loss | Fixed dimensions only |
|
||||
| **Graph & Relationships** | | |
|
||||
|
|
@ -97,6 +98,7 @@ User Query → [SONA Engine] → Model Response → User Feedback
|
|||
| 8f | [**OPQ**](./crates/ruvector-core) | Optimized Product Quantization with learned rotation — 10-30% error reduction vs standard PQ |
|
||||
| 8g | [**LSM compaction**](./crates/ruvector-core) | Log-Structured Merge-tree for write-heavy vector workloads with bloom filters |
|
||||
| 8h | [**GraphMAE**](./crates/ruvector-gnn) | Graph Masked Autoencoder — self-supervised node representation learning with GAT encoder |
|
||||
| 8i | [**TurboQuant**](./crates/ruvllm) | 2-4 bit asymmetric KV-cache quantization — 6-8x memory reduction, <0.5% perplexity loss, H2O/PyramidKV eviction |
|
||||
|
||||
**Distributed Systems**
|
||||
| # | Capability | What It Does |
|
||||
|
|
|
|||
|
|
@ -10,9 +10,9 @@
|
|||
|
||||
**The fastest vector database for Node.js—built in Rust, runs everywhere**
|
||||
|
||||
Ruvector is a next-generation vector database that brings **enterprise-grade semantic search** to Node.js applications. Unlike cloud-only solutions or Python-first databases, Ruvector is designed specifically for JavaScript/TypeScript developers who need **blazing-fast vector similarity search** without the complexity of external services.
|
||||
Ruvector is a self-learning vector database with **enterprise-grade semantic search**, hybrid retrieval (sparse + dense), Graph RAG, FlashAttention-3, and billion-scale DiskANN — all in a single npm package. Unlike cloud-only solutions or Python-first databases, Ruvector is designed for JavaScript/TypeScript developers who need **blazing-fast vector search** without external services.
|
||||
|
||||
> 🚀 **Sub-millisecond queries** • 🎯 **52,000+ inserts/sec** • 💾 **~50 bytes per vector** • 🌍 **Runs anywhere**
|
||||
> 🚀 **Sub-millisecond queries** • 🎯 **52,000+ inserts/sec** • 💾 **~50 bytes per vector** • 🌍 **Runs anywhere** • 🧠 **859 tests passing**
|
||||
|
||||
Built by [rUv](https://ruv.io) with production-grade Rust performance and intelligent platform detection—**automatically uses native bindings when available, falls back to WebAssembly when needed**.
|
||||
|
||||
|
|
@ -36,12 +36,25 @@ npx ruvector hooks init --pretrain --build-agents quality
|
|||
- 🔗 **Co-edit Patterns** — Learns file relationships from git history
|
||||
- 💾 **Vector Memory** — HNSW-indexed semantic recall (150x faster)
|
||||
|
||||
### New in v2.1 — SOTA Vector Search
|
||||
- **FlashAttention-3** — IO-aware tiled attention, O(N) memory instead of O(N^2)
|
||||
- **Graph RAG** — Knowledge graph + community detection for multi-hop queries (30-60% improvement)
|
||||
- **Hybrid Search** — Sparse + dense vectors with RRF fusion (20-49% better retrieval)
|
||||
- **DiskANN / Vamana** — Billion-scale SSD-backed ANN with <10ms latency
|
||||
- **ColBERT Multi-Vector** — Per-token late interaction retrieval (MaxSim)
|
||||
- **Matryoshka Embeddings** — Adaptive-dimension search with funnel/cascade modes
|
||||
- **MLA** — Multi-Head Latent Attention with ~93% KV-cache compression (DeepSeek-V2/V3)
|
||||
- **Mamba SSM** — Selective State Space Models for linear-time sequence processing
|
||||
- **TurboQuant** — 2-4 bit KV-cache quantization, 6-8x memory reduction
|
||||
- **OPQ** — Optimized Product Quantization with learned rotation (10-30% error reduction)
|
||||
- **GraphMAE** — Graph Masked Autoencoder for self-supervised node learning
|
||||
|
||||
### New in v2.0
|
||||
- ⚡ **ONNX WASM Embeddings** — all-MiniLM-L6-v2 (384d) runs locally, no API needed
|
||||
- 🌳 **AST Analysis** — Symbol extraction, complexity metrics, import graphs
|
||||
- 📊 **Diff Embeddings** — Semantic change classification with risk scoring
|
||||
- 🧪 **Coverage Routing** — Test coverage-aware agent selection
|
||||
- 🔍 **Graph Algorithms** — MinCut boundaries, Louvain communities, Spectral clustering
|
||||
- **ONNX WASM Embeddings** — all-MiniLM-L6-v2 (384d) runs locally, no API needed
|
||||
- **AST Analysis** — Symbol extraction, complexity metrics, import graphs
|
||||
- **Diff Embeddings** — Semantic change classification with risk scoring
|
||||
- **Coverage Routing** — Test coverage-aware agent selection
|
||||
- **Graph Algorithms** — MinCut boundaries, Louvain communities, Spectral clustering
|
||||
- 🛡️ **Security Scanning** — Parallel vulnerability pattern detection
|
||||
- 🎯 **RAG Context** — Semantic retrieval with HNSW indexing
|
||||
|
||||
|
|
|
|||
|
|
@ -144,13 +144,11 @@ program
|
|||
try {
|
||||
const dimension = parseInt(options.dimension);
|
||||
const db = new VectorDB({
|
||||
dimension,
|
||||
dimensions: dimension,
|
||||
metric: options.metric,
|
||||
path: dbPath,
|
||||
autoPersist: true
|
||||
storagePath: dbPath,
|
||||
});
|
||||
|
||||
db.save(dbPath);
|
||||
spinner.succeed(chalk.green(`Database created: ${dbPath}`));
|
||||
console.log(chalk.gray(` Dimension: ${dimension}`));
|
||||
console.log(chalk.gray(` Metric: ${options.metric}`));
|
||||
|
|
@ -322,7 +320,7 @@ program
|
|||
let spinner = ora('Creating database...').start();
|
||||
|
||||
try {
|
||||
const db = new VectorDB({ dimension, metric: 'cosine' });
|
||||
const db = new VectorDB({ dimensions: dimension, metric: 'cosine' });
|
||||
spinner.succeed();
|
||||
|
||||
// Insert benchmark
|
||||
|
|
@ -366,10 +364,9 @@ program
|
|||
console.log(chalk.gray(` Avg Latency: ${chalk.yellow(avgLatency)}ms`));
|
||||
|
||||
// Stats
|
||||
const stats = db.stats();
|
||||
console.log(chalk.cyan('\nFinal Stats:'));
|
||||
console.log(chalk.white(` Vector Count: ${chalk.yellow(stats.count)}`));
|
||||
console.log(chalk.white(` Dimension: ${chalk.yellow(stats.dimension)}`));
|
||||
console.log(chalk.white(` Vector Count: ${chalk.yellow(numVectors)}`));
|
||||
console.log(chalk.white(` Dimension: ${chalk.yellow(dimension)}`));
|
||||
console.log(chalk.white(` Implementation: ${chalk.yellow(getImplementationType())}`));
|
||||
|
||||
} catch (error) {
|
||||
|
|
@ -2537,7 +2534,7 @@ program
|
|||
const spinner = ora('Creating demo database...').start();
|
||||
|
||||
try {
|
||||
const db = new VectorDB({ dimension: 4, metric: 'cosine' });
|
||||
const db = new VectorDB({ dimensions: 4, metric: 'cosine' });
|
||||
|
||||
spinner.text = 'Inserting vectors...';
|
||||
db.insert('vec1', [1.0, 0.0, 0.0, 0.0], { label: 'x-axis' });
|
||||
|
|
|
|||
|
|
@ -1,7 +1,7 @@
|
|||
{
|
||||
"name": "ruvector",
|
||||
"version": "0.2.0",
|
||||
"description": "High-performance vector database for Node.js with automatic native/WASM fallback",
|
||||
"version": "0.2.19",
|
||||
"description": "Self-learning vector database for Node.js — hybrid search, Graph RAG, FlashAttention-3, DiskANN, 50+ attention mechanisms",
|
||||
"main": "dist/index.js",
|
||||
"types": "dist/index.d.ts",
|
||||
"bin": {
|
||||
|
|
@ -46,6 +46,12 @@
|
|||
"shared-intelligence",
|
||||
"mcp",
|
||||
"edge-computing",
|
||||
"graph-rag",
|
||||
"diskann",
|
||||
"hybrid-search",
|
||||
"colbert",
|
||||
"turboquant",
|
||||
"mamba",
|
||||
"pi-brain",
|
||||
"identity",
|
||||
"pi-key",
|
||||
|
|
|
|||
|
|
@ -1,6 +1,13 @@
|
|||
# @ruvector/ruvllm v2.3
|
||||
# @ruvector/ruvllm
|
||||
|
||||
Self-learning LLM orchestration with SONA adaptive learning, HNSW memory, and SIMD inference for Node.js.
|
||||
[](https://www.npmjs.com/package/@ruvector/ruvllm)
|
||||
[](https://www.npmjs.com/package/@ruvector/ruvllm)
|
||||
[](https://opensource.org/licenses/MIT)
|
||||
[](https://github.com/ruvnet/ruvector)
|
||||
|
||||
**Self-learning LLM runtime for Node.js** — GGUF inference, TurboQuant KV-cache compression (6-8x memory savings), SONA adaptive learning, FlashAttention, speculative decoding, and SIMD-optimized kernels. Built in Rust, runs everywhere.
|
||||
|
||||
> Inference at **88-135 tok/s** on M4 Pro | **<1ms** SONA adaptation | **6-8x** KV-cache compression via TurboQuant
|
||||
|
||||
## Installation
|
||||
|
||||
|
|
@ -34,18 +41,43 @@ for await (const token of llm.stream('Write a haiku about Rust')) {
|
|||
}
|
||||
```
|
||||
|
||||
## What's New in v2.3
|
||||
## What's New in v2.5
|
||||
|
||||
| Feature | Description |
|
||||
|---------|-------------|
|
||||
| **TurboQuant KV-Cache** | 2-4 bit asymmetric quantization with per-channel scale/zero-point — 6-8x memory reduction, <0.5% perplexity loss |
|
||||
| **TurboQuant Embedding Store** | Quantized vector storage with compressed search — 10-30x memory savings |
|
||||
| **H2O / PyramidKV Eviction** | Intelligent cache eviction policies for long-context inference |
|
||||
| **Optimized Inner Product** | Asymmetric distance on quantized data — skip decompression for 2-4x faster search |
|
||||
| **RuvLTRA Models** | Purpose-built 0.5B & 3B models for Claude Flow |
|
||||
| **Task-Specific LoRA** | 5 pre-trained adapters (coder, researcher, security, architect, reviewer) |
|
||||
| **HuggingFace Hub** | Download/upload models directly |
|
||||
| **Adapter Merging** | TIES, DARE, SLERP strategies |
|
||||
| **HNSW Routing** | 150x faster semantic matching |
|
||||
| **Evaluation Harness** | SWE-Bench testing with 5 ablation modes |
|
||||
| **Auto-Dimension** | HNSW auto-detects model embedding size |
|
||||
| **mistral-rs Backend** | Production serving with PagedAttention, X-LoRA, ISQ (5-10x concurrent users) |
|
||||
| **mistral-rs Backend** | Production serving with PagedAttention, X-LoRA, ISQ |
|
||||
|
||||
## TurboQuant — KV-Cache Compression
|
||||
|
||||
Reduce inference memory by 6-8x with <0.5% quality loss:
|
||||
|
||||
```typescript
|
||||
import { simd } from '@ruvector/ruvllm/simd';
|
||||
|
||||
// TurboQuant compresses KV-cache entries at 2-4 bit precision
|
||||
// with per-channel asymmetric quantization (scale + zero-point).
|
||||
// Eviction policies (H2O, Sliding Window, PyramidKV) keep the
|
||||
// most important tokens in cache during long-context generation.
|
||||
|
||||
// Supported bit widths: 2-bit (32x), 3-bit (10.7x), 4-bit (8x), 8-bit (4x)
|
||||
```
|
||||
|
||||
| Bits | Compression | Perplexity Loss | Use Case |
|
||||
|------|-------------|-----------------|----------|
|
||||
| 2-bit | 32x | ~2% | Maximum compression, edge devices |
|
||||
| 3-bit | 10.7x | <1% | Balanced — recommended for most uses |
|
||||
| 4-bit | 8x | <0.5% | High quality, long-context inference |
|
||||
| 8-bit | 4x | ~0% | Baseline quantization |
|
||||
|
||||
## CLI Usage
|
||||
|
||||
|
|
|
|||
|
|
@ -1,7 +1,7 @@
|
|||
{
|
||||
"name": "@ruvector/ruvllm",
|
||||
"version": "2.5.3",
|
||||
"description": "Self-learning LLM orchestration with SONA adaptive learning, HNSW memory, FastGRNN routing, and SIMD inference",
|
||||
"version": "2.5.4",
|
||||
"description": "Self-learning LLM runtime — TurboQuant KV-cache (6-8x compression), SONA adaptive learning, FlashAttention, speculative decoding, GGUF inference",
|
||||
"main": "dist/cjs/index.js",
|
||||
"module": "dist/esm/index.js",
|
||||
"types": "dist/cjs/index.d.ts",
|
||||
|
|
@ -92,7 +92,18 @@
|
|||
"deep-learning",
|
||||
"napi",
|
||||
"rust",
|
||||
"ruvector"
|
||||
"ruvector",
|
||||
"turboquant",
|
||||
"kv-cache",
|
||||
"quantization",
|
||||
"flash-attention",
|
||||
"speculative-decoding",
|
||||
"gguf",
|
||||
"mamba",
|
||||
"transformer",
|
||||
"edge-ai",
|
||||
"local-llm",
|
||||
"model-compression"
|
||||
],
|
||||
"author": "rUv Team <team@ruv.io>",
|
||||
"license": "MIT OR Apache-2.0",
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue