mirror of https://github.com/ruvnet/RuVector.git synced 2026-05-26 07:44:05 +00:00

README.md

Ruvector Core

High-performance Rust vector database engine with HNSW indexing, quantization, and SIMD optimizations.

ruvector-core is the foundational Rust library powering Ruvector—a next-generation vector database built for extreme performance and universal deployment. This crate provides the core vector database engine with state-of-the-art algorithms optimized for modern hardware.

🌟 Why Ruvector Core?

⚡ Blazing Fast: <0.5ms p50 query latency with HNSW indexing
🧠 Memory Efficient: 4-32x compression via quantization techniques
🎯 High Accuracy: 95%+ recall with HNSW + Product Quantization
🚀 SIMD Accelerated: Hardware-optimized distance calculations using SimSIMD
🔧 Minimal Dependencies: Rust implementation built on a small set of external crates (see Acknowledgments)
📦 Production Ready: Battle-tested algorithms with comprehensive benchmarks

🚀 Features

Core Capabilities

HNSW Indexing: Hierarchical Navigable Small World graphs for O(log n) approximate nearest neighbor search
Multiple Distance Metrics: Euclidean, Cosine, Dot Product, Manhattan
Advanced Quantization: Scalar (4x), Product (8-32x), and Binary (32x) quantization
SIMD Optimizations: Hardware-accelerated distance calculations via simsimd
Zero-Copy I/O: Memory-mapped storage for instant loading
Concurrent Operations: Lock-free data structures and parallel batch processing
Flexible Storage: Persistent storage with redb and memory-mapped files

Advanced Features

Hybrid Search: Combine dense vector search with sparse BM25 text search
Filtered Search: Apply metadata filters during vector search
MMR Diversification: Maximal Marginal Relevance for diverse result sets
Conformal Prediction: Uncertainty quantification for search results
Product Quantization: Memory-efficient vector compression with high accuracy
Cache Optimization: Multi-level caching for improved performance
Lock-Free Indexing: High-concurrency operations without blocking
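
As a rough illustration of how dense and sparse scores can be combined in hybrid search, the sketch below min-max normalizes each score list and blends them with a weight `alpha`. The function names `normalize` and `fuse_scores` are hypothetical and are not part of the ruvector-core API; the crate's actual fusion strategy may differ.

```rust
/// Min-max normalize scores into [0, 1]; a constant list maps to all zeros.
fn normalize(scores: &[f32]) -> Vec<f32> {
    let min = scores.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;
    scores
        .iter()
        .map(|s| if range > 0.0 { (s - min) / range } else { 0.0 })
        .collect()
}

/// Blend per-document scores: alpha * dense + (1 - alpha) * sparse.
/// Both slices must list the same documents in the same order.
fn fuse_scores(dense: &[f32], sparse: &[f32], alpha: f32) -> Vec<f32> {
    assert_eq!(dense.len(), sparse.len(), "score lists must align by document");
    let d = normalize(dense);
    let s = normalize(sparse);
    d.iter()
        .zip(&s)
        .map(|(d, s)| alpha * d + (1.0 - alpha) * s)
        .collect()
}
```

With `alpha = 0.7`, a document that ranks first on the dense side and last on the sparse side ends up with a fused score of about 0.7.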

📦 Installation

Add ruvector-core to your Cargo.toml:

[dependencies]
ruvector-core = "0.1.0"

Feature Flags

[dependencies]
ruvector-core = { version = "0.1.0", features = ["simd", "uuid-support"] }

Available features:

simd (default): Enable SIMD-optimized distance calculations
uuid-support (default): Enable UUID generation for vector IDs

⚡ Quick Start

Basic Usage

use ruvector_core::{VectorDB, DbOptions, VectorEntry, SearchQuery, DistanceMetric};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a new vector database
    let mut options = DbOptions::default();
    options.dimensions = 384;  // Vector dimensions
    options.storage_path = "./my_vectors.db".to_string();
    options.distance_metric = DistanceMetric::Cosine;

    let db = VectorDB::new(options)?;

    // Insert vectors (each must have exactly `options.dimensions` components)
    db.insert(VectorEntry {
        id: Some("doc1".to_string()),
        vector: vec![0.1; 384],  // Placeholder: 384 identical components
        metadata: None,
    })?;

    db.insert(VectorEntry {
        id: Some("doc2".to_string()),
        vector: (0..384).map(|i| i as f32 / 384.0).collect(),  // Placeholder: 384-dim ramp
        metadata: None,
    })?;

    // Search for similar vectors
    let results = db.search(SearchQuery {
        vector: vec![0.1; 384],  // Query must also have 384 dimensions
        k: 10,  // Return top 10 results
        filter: None,
        ef_search: None,
    })?;

    for result in results {
        println!("ID: {}, Score: {}", result.id, result.score);
    }

    Ok(())
}

Batch Operations

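use ruvector_core::{VectorDB, VectorEntry};

// Assumes `db` is an open VectorDB configured with dimensions = 3,
// matching the shortened vectors below
// Insert multiple vectors efficiently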
let entries = vec![
    VectorEntry {
        id: Some("doc1".to_string()),
        vector: vec![0.1, 0.2, 0.3],
        metadata: None,
    },
    VectorEntry {
        id: Some("doc2".to_string()),
        vector: vec![0.4, 0.5, 0.6],
        metadata: None,
    },
];

let ids = db.insert_batch(entries)?;
println!("Inserted {} vectors", ids.len());

With Metadata Filtering

use std::collections::HashMap;
use ruvector_core::{SearchQuery, VectorEntry};
use serde_json::json;

// Insert with metadata
db.insert(VectorEntry {
    id: Some("product1".to_string()),
    vector: vec![0.1, 0.2, 0.3],
    metadata: Some(HashMap::from([
        ("category".to_string(), json!("electronics")),
        ("price".to_string(), json!(299.99)),
    ])),
})?;

// Search with metadata filter
let results = db.search(SearchQuery {
    vector: vec![0.1, 0.2, 0.3],
    k: 10,
    filter: Some(HashMap::from([
        ("category".to_string(), json!("electronics")),
    ])),
    ef_search: None,
})?;
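
To make the filter semantics concrete, here is a minimal sketch of exact-match metadata filtering. It assumes a filter matches when every key in the filter is present in the entry's metadata with an equal value; that semantics, the `Value` stand-in enum (used here in place of `serde_json::Value` so the sketch needs only the standard library), and the `matches_filter` helper are assumptions, not the crate's confirmed behavior.

```rust
use std::collections::HashMap;

/// Stand-in for a JSON value; the examples above use serde_json::Value instead.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Text(String),
    Number(f64),
}

/// True when every (key, value) pair in `filter` appears unchanged in `metadata`.
/// An empty filter matches everything.
fn matches_filter(
    metadata: &HashMap<String, Value>,
    filter: &HashMap<String, Value>,
) -> bool {
    filter
        .iter()
        .all(|(key, wanted)| metadata.get(key) == Some(wanted))
}
```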

HNSW Configuration

use ruvector_core::{VectorDB, DbOptions, HnswConfig, DistanceMetric};

let mut options = DbOptions::default();
options.dimensions = 384;
options.distance_metric = DistanceMetric::Cosine;

// Configure HNSW index parameters
options.hnsw_config = Some(HnswConfig {
    m: 32,                    // Max connections per node (16-64 typical)
    ef_construction: 200,     // Build-time candidate list size (100-500 typical)
    ef_search: 100,           // Search-time candidate list size (50-200 typical)
    max_elements: 10_000_000, // Maximum vectors
});

let db = VectorDB::new(options)?;
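
For intuition about how `m` shapes the layer hierarchy, the standard HNSW rule assigns each new node a top layer of floor(-ln(u) * mL), where u is a uniform random sample and mL = 1 / ln(m). The sketch below applies that rule to a caller-supplied sample so it stays deterministic. The function `level_for` and its `max_level` cap are hypothetical; this is not necessarily how ruvector-core or hnsw_rs implements it.

```rust
/// Map a uniform sample `u` in (0, 1] to an HNSW layer index.
/// Uses level = floor(-ln(u) * mL) with mL = 1 / ln(m), capped at `max_level`.
fn level_for(u: f64, m: usize, max_level: usize) -> usize {
    if m < 2 {
        // ln(1) = 0 would make mL infinite, so fall back to a single layer.
        return 0;
    }
    let ml = 1.0 / (m as f64).ln();
    // Clamp away from zero because ln(0) is negative infinity.
    let u = u.clamp(f64::MIN_POSITIVE, 1.0);
    let level = (-u.ln() * ml).floor() as usize;
    level.min(max_level)
}
```

Larger `m` shrinks mL, so fewer nodes reach the upper layers; most samples land on layer 0.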

Quantization

use ruvector_core::{VectorDB, DbOptions, QuantizationConfig};

let mut options = DbOptions::default();
options.dimensions = 384;

// Enable scalar quantization (4x compression)
options.quantization = Some(QuantizationConfig::Scalar);

// Or product quantization (8-32x compression); this assignment replaces the scalar setting above
options.quantization = Some(QuantizationConfig::Product {
    subspaces: 8,  // Number of subspaces
    k: 256,        // Codebook size
});

let db = VectorDB::new(options)?;
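
The 4x figure for scalar quantization comes from storing each f32 component (4 bytes) as a single u8 code. The following is a simplified sketch of that idea using a per-vector minimum and scale; `quantize` and `dequantize` are illustrative names, and the crate's internal `ScalarQuantized` type may store and compute things differently.

```rust
/// Quantize f32 values to u8 codes; returns (codes, min, scale).
/// When all values are equal the range is zero, so scale defaults to 1.0.
fn quantize(values: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = values.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = values.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;
    let scale = if range > 0.0 { range / 255.0 } else { 1.0 };
    let codes: Vec<u8> = values
        .iter()
        .map(|v| ((v - min) / scale).round() as u8)
        .collect();
    (codes, min, scale)
}

/// Reconstruct approximate values: min + code * scale.
fn dequantize(codes: &[u8], min: f32, scale: f32) -> Vec<f32> {
    codes.iter().map(|&c| min + c as f32 * scale).collect()
}
```

The reconstruction error per component is bounded by roughly half of `scale`, which is why recall drops only slightly compared with full precision.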

📊 API Overview

Core Types

// Main database interface
pub struct VectorDB { /* ... */ }

// Vector entry with optional ID and metadata
pub struct VectorEntry {
    pub id: Option<VectorId>,
    pub vector: Vec<f32>,
    pub metadata: Option<HashMap<String, serde_json::Value>>,
}

// Search query parameters
pub struct SearchQuery {
    pub vector: Vec<f32>,
    pub k: usize,
    pub filter: Option<HashMap<String, serde_json::Value>>,
    pub ef_search: Option<usize>,
}

// Search result with score
pub struct SearchResult {
    pub id: VectorId,
    pub score: f32,
    pub vector: Option<Vec<f32>>,
    pub metadata: Option<HashMap<String, serde_json::Value>>,
}

Main Operations

impl VectorDB {
    // Create new database with options
    pub fn new(options: DbOptions) -> Result<Self>;

    // Create with just dimensions (uses defaults)
    pub fn with_dimensions(dimensions: usize) -> Result<Self>;

    // Insert single vector
    pub fn insert(&self, entry: VectorEntry) -> Result<VectorId>;

    // Insert multiple vectors
    pub fn insert_batch(&self, entries: Vec<VectorEntry>) -> Result<Vec<VectorId>>;

    // Search for similar vectors
    pub fn search(&self, query: SearchQuery) -> Result<Vec<SearchResult>>;

    // Delete vector by ID
    pub fn delete(&self, id: &str) -> Result<bool>;

    // Get vector by ID
    pub fn get(&self, id: &str) -> Result<Option<VectorEntry>>;

    // Get total count
    pub fn len(&self) -> Result<usize>;

    // Check if empty
    pub fn is_empty(&self) -> Result<bool>;
}

Distance Metrics

pub enum DistanceMetric {
    Euclidean,   // L2 distance - default for embeddings
    Cosine,      // Cosine distance (1 - cosine similarity)
    DotProduct,  // Negative dot product (so smaller values mean more similar)
    Manhattan,   // L1 distance
}
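
For reference, the four metrics can be written as plain scalar functions. This is a straightforward, unoptimized sketch for illustration only; the crate itself uses SIMD-accelerated routines, and the function names here are hypothetical.

```rust
/// L2 distance: square root of the summed squared differences.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// L1 distance: sum of absolute differences.
fn manhattan(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Negated dot product, so that a smaller value means "more similar".
fn dot_product_distance(a: &[f32], b: &[f32]) -> f32 {
    -dot(a, b)
}

/// 1 - cosine similarity; returns 1.0 if either vector has zero length.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let norms = dot(a, a).sqrt() * dot(b, b).sqrt();
    if norms == 0.0 {
        1.0
    } else {
        1.0 - dot(a, b) / norms
    }
}
```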

Advanced Features

// Hybrid search (dense + sparse)
use ruvector_core::{HybridSearch, HybridConfig};

let hybrid = HybridSearch::new(HybridConfig {
    alpha: 0.7,  // Balance between dense (0.7) and sparse (0.3)
    ..Default::default()
});

// Filtered search with expressions
use ruvector_core::{FilteredSearch, FilterExpression};
use serde_json::json;

let filtered = FilteredSearch::new(db);
let expr = FilterExpression::And(vec![
    FilterExpression::Equals("category".to_string(), json!("books")),
    FilterExpression::GreaterThan("price".to_string(), json!(10.0)),
]);

// MMR diversification
use ruvector_core::{MMRSearch, MMRConfig};

let mmr = MMRSearch::new(MMRConfig {
    lambda: 0.5,  // Balance relevance (0.5) and diversity (0.5)
    ..Default::default()
});
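
To show what `lambda` trades off, here is a minimal greedy MMR selection over precomputed scores. At each step it picks the candidate maximizing lambda * relevance - (1 - lambda) * (highest similarity to anything already selected). The function `mmr_select` and its inputs are hypothetical, it assumes similarities are non-negative, and it is not the crate's `MMRSearch` implementation.

```rust
/// Greedy Maximal Marginal Relevance.
/// `relevance[i]` is candidate i's query relevance; `similarity[i][j]` is the
/// (non-negative) similarity between candidates i and j.
/// Returns the indices of up to `k` selected candidates, in selection order.
fn mmr_select(relevance: &[f32], similarity: &[Vec<f32>], lambda: f32, k: usize) -> Vec<usize> {
    let mut selected: Vec<usize> = Vec::new();
    let mut remaining: Vec<usize> = (0..relevance.len()).collect();

    while selected.len() < k && !remaining.is_empty() {
        let mut best_pos = 0;
        let mut best_score = f32::NEG_INFINITY;
        for (pos, &cand) in remaining.iter().enumerate() {
            // Redundancy: how close this candidate is to the nearest selected item.
            let redundancy = selected
                .iter()
                .map(|&s| similarity[cand][s])
                .fold(0.0_f32, f32::max);
            let score = lambda * relevance[cand] - (1.0 - lambda) * redundancy;
            if score > best_score {
                best_score = score;
                best_pos = pos;
            }
        }
        selected.push(remaining.remove(best_pos));
    }
    selected
}
```

With `lambda = 1.0` this reduces to a plain relevance ranking; lower values increasingly penalize near-duplicates of results already chosen.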

🎯 Performance Characteristics

Latency (Single Query)

Operation           Flat Index    HNSW Index
─────────────────────────────────────────────
Search (1K vecs)    ~0.1ms       ~0.2ms
Search (100K vecs)  ~10ms        ~0.5ms
Search (1M vecs)    ~100ms       <1ms
Insert              ~0.1ms       ~1ms
Batch (1000)        ~50ms        ~500ms
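
The flat index column grows linearly with collection size because every query is compared against every stored vector. A minimal sketch of such an exhaustive scan is shown below; `flat_search` and `squared_l2` are illustrative names, not the crate's API.

```rust
/// Squared Euclidean distance (the square root is not needed for ranking).
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exhaustive nearest-neighbor search: score all vectors, sort, keep the top k.
/// Returns (index, squared distance) pairs, nearest first.
fn flat_search(vectors: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| (i, squared_l2(v, query)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

An HNSW index avoids this full scan by walking a layered graph, which is why its latency in the table stays nearly flat as the collection grows.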

Memory Usage (1M Vectors, 384 Dimensions)

Configuration              Memory      Recall
─────────────────────────────────────────────
Full Precision (f32)       ~1.5GB      100%
Scalar Quantization        ~400MB      98%
Product Quantization       ~200MB      95%
Binary Quantization        ~50MB       85%
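
The full-precision, scalar, and binary rows can be sanity-checked with simple arithmetic on the raw vector payload: count × dimensions × bits per component ÷ 8. The helper below is a hypothetical back-of-the-envelope calculation that ignores index graph links, IDs, metadata, and codebooks, so real usage will be somewhat higher.

```rust
/// Bytes needed for the raw vector payload only.
fn raw_vector_bytes(count: u64, dims: u64, bits_per_component: u64) -> u64 {
    count * dims * bits_per_component / 8
}

/// Payload sizes for 1M vectors of 384 dimensions, in bytes:
/// (f32 at 32 bits, scalar-quantized at 8 bits, binary at 1 bit).
fn example_sizes() -> (u64, u64, u64) {
    (
        raw_vector_bytes(1_000_000, 384, 32),
        raw_vector_bytes(1_000_000, 384, 8),
        raw_vector_bytes(1_000_000, 384, 1),
    )
}
```

These work out to 1,536,000,000, 384,000,000, and 48,000,000 bytes, in line with the ~1.5GB, ~400MB, and ~50MB rows above.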

Throughput (Queries Per Second)

Configuration              QPS         Latency (p50)
─────────────────────────────────────────────────
Single Thread             ~2,000      ~0.5ms
Multi-Thread (8 cores)    ~50,000     <0.5ms
With SIMD                 ~80,000     <0.3ms
With Quantization         ~100,000    <0.2ms

🔧 Configuration Guide

For Maximum Accuracy

let options = DbOptions {
    dimensions: 384,
    distance_metric: DistanceMetric::Cosine,
    hnsw_config: Some(HnswConfig {
        m: 64,
        ef_construction: 500,
        ef_search: 200,
        max_elements: 10_000_000,
    }),
    quantization: None,  // Full precision
    ..Default::default()
};
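
When tuning for accuracy, it helps to measure recall directly: the fraction of the true nearest neighbors (from an exhaustive search) that the approximate search also returned. The helper `recall_at_k` below is a hypothetical measurement utility, not something provided by the crate.

```rust
use std::collections::HashSet;

/// Recall = |approx ∩ exact| / |exact|.
/// `exact` holds ground-truth neighbor IDs; `approx` holds the IDs the index returned.
fn recall_at_k(approx: &[usize], exact: &[usize]) -> f64 {
    if exact.is_empty() {
        return 1.0; // Nothing to find, so nothing was missed.
    }
    let truth: HashSet<usize> = exact.iter().copied().collect();
    let found: HashSet<usize> = approx.iter().copied().collect();
    found.intersection(&truth).count() as f64 / truth.len() as f64
}
```

Raising `ef_search` and re-running this check over a sample of queries is one way to find the smallest setting that meets a recall target.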

For Maximum Speed

let options = DbOptions {
    dimensions: 384,
    distance_metric: DistanceMetric::DotProduct,
    hnsw_config: Some(HnswConfig {
        m: 16,
        ef_construction: 100,
        ef_search: 50,
        max_elements: 10_000_000,
    }),
    quantization: Some(QuantizationConfig::Binary),
    ..Default::default()
};
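
Binary quantization is fast because each component collapses to one bit and distance becomes a popcount over XORed words. The sketch below assumes a simple sign-based encoding (bit set when the component is positive); `binarize` and `hamming` are illustrative names, and the crate's actual encoding may differ.

```rust
/// Pack the sign of each component into 64-bit words (bit set when value > 0).
fn binarize(values: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (values.len() + 63) / 64];
    for (i, &v) in values.iter().enumerate() {
        if v > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Hamming distance: number of bit positions where the two codes differ.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

A 384-dimensional vector packs into six 64-bit words (48 bytes instead of 1,536), which is where the 32x compression figure comes from.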

For Balanced Performance

let options = DbOptions::default(); // Recommended defaults

🔨 Building and Testing

Build

# Build with default features
cargo build --release

# Build without SIMD
cargo build --release --no-default-features --features uuid-support

# Build for specific target with optimizations
RUSTFLAGS="-C target-cpu=native" cargo build --release

Testing

# Run all tests
cargo test

# Run with specific features
cargo test --features simd

# Run with logging
RUST_LOG=debug cargo test

Benchmarks

# Run all benchmarks
cargo bench

# Run specific benchmark
cargo bench --bench hnsw_search

# Run with features
cargo bench --features simd

Available benchmarks:

distance_metrics - SIMD-optimized distance calculations
hnsw_search - HNSW index search performance
quantization_bench - Quantization techniques
batch_operations - Batch insert/search operations
comprehensive_bench - Full system benchmarks

📚 Documentation

Complete Ruvector Documentation

This crate is part of the larger Ruvector project:

Main README - Complete project overview
Getting Started Guide - Quick start tutorial
Rust API Reference - Detailed API documentation
Advanced Features Guide - Quantization, indexing, tuning
Performance Tuning - Optimization strategies
Benchmarking Guide - Running benchmarks

API Documentation

Generate and view the full API documentation:

cargo doc --open --no-deps

🌐 Related Crates

ruvector-core is the foundation for platform-specific bindings:

ruvector-node - Node.js bindings via NAPI-RS
ruvector-wasm - WebAssembly bindings for browsers
ruvector-cli - Command-line interface
ruvector-bench - Performance benchmarks

🤝 Contributing

We welcome contributions! See the main Contributing Guidelines for details.

Areas for Contribution

🐛 Bug fixes and stability improvements
✨ New distance metrics or quantization techniques
📈 Performance optimizations
🧪 Additional test coverage
📝 Documentation and examples

📊 Comparison

Why Ruvector Core vs. Alternatives?

Feature	Ruvector Core	hnswlib-rs	faiss-rs	qdrant
Pure Rust	✅	✅	❌ (C++)	✅
SIMD	✅ SimSIMD	❌	✅	✅
Quantization	✅ Multiple	❌	✅	✅
Zero-Copy I/O	✅	❌	✅	✅
Metadata Filter	✅	❌	❌	✅
Hybrid Search	✅	❌	❌	✅
P50 Latency	<0.5ms	~1ms	~0.5ms	~1ms
Dependencies	Minimal	Minimal	Heavy	Heavy

📜 License

MIT License - see LICENSE for details.

🙏 Acknowledgments

Built with state-of-the-art algorithms and libraries:

hnsw_rs - HNSW implementation
simsimd - SIMD distance calculations
redb - Embedded database
rayon - Data parallelism
memmap2 - Memory-mapped files

Part of Ruvector • Built by rUv

Documentation • Crates.io • GitHub
