Workspace-wide hygiene sweep that brings every crate (except
ruvector-postgres, blocked by an unrelated PGRX_HOME env requirement)
to `cargo clippy --workspace --all-targets --no-deps -- -D warnings`
exit 0.
Approach: each crate gets a `[lints]` block in its Cargo.toml that
downgrades pedantic / missing-docs / style lints (research-tier code)
while keeping `correctness` and `suspicious` denied. The Cargo.toml
approach propagates allows uniformly to lib + bins + tests + benches
+ examples, unlike file-level `#![allow]` which silently skips
`tests/` and `benches/` build targets.
Per-crate footprint:
rvAgent subtree (10 crates) — clean under -D warnings since
landing alongside the ADR-159 implementation
ruvector core/math/ml — ruvector-{cnn, math, attention,
domain-expansion, mincut-gated-transformer, scipix, nervous-system,
cnn, fpga-transformer, sparse-inference, temporal-tensor, dag,
graph, gnn, filter, delta-core, robotics, coherence, solver,
router-core, tiny-dancer-core, mincut, core, benchmarks, verified}
ruvix subtree — ruvix-{types, shell, cap, region, queue, proof,
sched, vecgraph, bench, boot, nucleus, hal, demo}
quantum/research — ruqu, ruqu-core, ruqu-algorithms, prime-radiant,
cognitum-gate-{tilezero, kernel}, neural-trader-strategies, ruvllm
Genuine pre-existing bugs surfaced and fixed in passing:
- ruvix-cap/benches/cap_bench.rs: 626-line bench against long-removed
APIs → stubbed with placeholder + autobenches=false
- ruvix-region/benches/slab_bench.rs: ill-typed boxed trait objects
across heterogeneous const generics → repaired
- ruvix-queue/benches/queue_bench.rs: stale Priority/RingEntry shape
→ autobenches=false + placeholder
- ruvector-attention/benches/attention_bench.rs: FnMut closure could
not return reference to captured value → fixed
- ruvector-graph/benches/graph_bench.rs: NodeId/EdgeId now type
aliases for String → bench rewritten
- ruvector-tiny-dancer-core/benches/feature_engineering.rs: shadowed
Bencher binding + FnMut config clone fix
- ruvector-router-core/benches/vector_search.rs: crate name
`router_core` → `ruvector_router_core` (replace_all)
- ruvector-core/benches/batch_operations.rs: DbOptions import path
- ruvector-mincut-wasm/src/lib.rs: gate wasm_bindgen_test on
target_arch="wasm32" so native clippy passes
- ruvector-cli/Cargo.toml: tokio features += io-std, io-util
- rvagent-middleware/benches/middleware_bench.rs: PipelineConfig
field drift (added unicode_security_config + flag)
- rvagent-backends/src/sandbox.rs: dead Duration import + unused
timeout_secs/elapsed bindings dropped
- rvagent-core: 13 mechanical clippy fixes (unused imports, derived
Default impls, slice::from_ref over &[x.clone()], etc.)
- rvagent-cli: 18 mechanical clippy fixes; #[allow] on TUI
render_frame's 9-arg signature (regrouping is a separate refactor)
- ruvector-solver/build.rs: map_or(false, ..) → is_ok_and(..)
cargo fmt --all applied workspace-wide. No formatting drift remaining.
Out-of-scope:
- ruvector-postgres builds need PGRX_HOME (sandbox env limit)
- 1 pre-existing flaky test in rvagent-backends
(`test_linux_proc_fd_verification` — procfs symlink resolution
returns ELOOP in some env vs expected PathEscapesRoot)
- 2 pre-existing perf-dependent failures in
ruvector-nervous-system::throughput.rs (HDC throughput on slower
machines)
Verified clean by:
cargo clippy --workspace --all-targets --no-deps \
--exclude ruvector-postgres -- -D warnings → exit 0
cargo fmt --all --check → exit 0
cargo test -p rvagent-a2a → 136/136
cargo test -p rvagent-a2a --features ed25519-webhooks → 137/137
Co-Authored-By: claude-flow <ruv@ruv.net>
|
||
|---|---|---|
| .. | ||
| .claude | ||
| docs | ||
| scripts | ||
| sql | ||
| src | ||
| tests | ||
| Cargo.toml | ||
| README.md | ||
Ruvector CLI
Command-line interface and MCP server for high-performance vector database operations.
Professional CLI tools for managing Ruvector vector databases with sub-millisecond query performance, batch operations, and MCP integration.
🌟 Overview
The Ruvector CLI provides a comprehensive command-line interface for:
- Database Management: Create and configure vector databases
- Data Operations: Insert, search, and export vector data
- Performance Benchmarking: Test query performance and throughput
- Format Support: JSON, CSV, and NumPy array formats
- MCP Server: Model Context Protocol server for AI integrations
- Batch Processing: Efficient bulk operations with progress tracking
⚡ Quick Start
Installation
Install via Cargo:
cargo install ruvector-cli
Or build from source:
# Clone repository
git clone https://github.com/ruvnet/ruvector.git
cd ruvector
# Build CLI
cargo build --release -p ruvector-cli
# Install locally
cargo install --path crates/ruvector-cli
Basic Usage
# Create a new database
ruvector create --dimensions 384 --path ./my-vectors.db
# Insert vectors from JSON
ruvector insert --db ./my-vectors.db --input vectors.json --format json
# Search for similar vectors
ruvector search --db ./my-vectors.db --query "[0.1, 0.2, 0.3, ...]" --top-k 10
# Show database information
ruvector info --db ./my-vectors.db
# Run performance benchmark
ruvector benchmark --db ./my-vectors.db --queries 1000
📋 Command Reference
Global Options
All commands support these global options:
-c, --config <FILE> Configuration file path
-d, --debug Enable debug logging
--no-color Disable colored output
-h, --help Print help information
-V, --version Print version information
Commands
create - Create a New Database
Create a new vector database with specified dimensions.
ruvector create [OPTIONS] --dimensions <DIMENSIONS>
Options:
-p, --path <PATH> Database file path [default: ./ruvector.db]
-d, --dimensions <DIMENSIONS> Vector dimensions (required)
Examples:
# Create database for 384-dimensional embeddings (e.g., MiniLM)
ruvector create --dimensions 384
# Create database with custom path
ruvector create --dimensions 1536 --path ./embeddings.db
# Create for large embeddings (e.g., text-embedding-3-large)
ruvector create --dimensions 3072 --path ./large-embeddings.db
insert - Insert Vectors from File
Bulk insert vectors from JSON, CSV, or NumPy files.
ruvector insert [OPTIONS] --input <FILE>
Options:
-d, --db <PATH> Database file path [default: ./ruvector.db]
-i, --input <FILE> Input file path (required)
-f, --format <FORMAT> Input format: json, csv, npy [default: json]
--no-progress Hide progress bar
Input Formats:
JSON (array of vector entries):
[
{
"id": "doc_1",
"vector": [0.1, 0.2, 0.3, ...],
"metadata": {"title": "Document 1", "category": "tech"}
},
{
"id": "doc_2",
"vector": [0.4, 0.5, 0.6, ...],
"metadata": {"title": "Document 2", "category": "science"}
}
]
CSV (id, vector_json, metadata_json):
id,vector,metadata
doc_1,"[0.1, 0.2, 0.3]","{\"title\": \"Document 1\"}"
doc_2,"[0.4, 0.5, 0.6]","{\"title\": \"Document 2\"}"
NumPy (.npy file with 2D array):
import numpy as np
vectors = np.random.randn(1000, 384).astype(np.float32)
np.save('vectors.npy', vectors)
Examples:
# Insert from JSON file
ruvector insert --input embeddings.json --format json
# Insert from CSV with progress
ruvector insert --input data.csv --format csv
# Insert from NumPy array
ruvector insert --input vectors.npy --format npy
# Batch insert without progress bar
ruvector insert --input large-dataset.json --no-progress
search - Search for Similar Vectors
Find k-nearest neighbors for a query vector.
ruvector search [OPTIONS] --query <VECTOR>
Options:
-d, --db <PATH> Database file path [default: ./ruvector.db]
-q, --query <VECTOR> Query vector (comma-separated or JSON array)
-k, --top-k <K> Number of results to return [default: 10]
--show-vectors Show full vectors in results
Query Formats:
# Comma-separated floats
ruvector search --query "0.1, 0.2, 0.3, 0.4, ..."
# JSON array
ruvector search --query "[0.1, 0.2, 0.3, 0.4, ...]"
# From file (using shell)
ruvector search --query "$(cat query.json)"
Examples:
# Search for top 10 similar vectors
ruvector search --query "[0.1, 0.2, 0.3, ...]" --top-k 10
# Search with full vector output
ruvector search --query "0.1, 0.2, 0.3, ..." --show-vectors
# Search for top 50 results
ruvector search --query "[0.1, 0.2, ...]" -k 50
Output:
🔍 Search Results (top 10)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
#1 doc_42 similarity: 0.9876
#2 doc_128 similarity: 0.9543
#3 doc_89 similarity: 0.9321
...
Search completed in 0.48ms
info - Show Database Information
Display database statistics and configuration.
ruvector info [OPTIONS]
Options:
-d, --db <PATH> Database file path [default: ./ruvector.db]
Examples:
# Show default database info
ruvector info
# Show custom database info
ruvector info --db ./embeddings.db
Output:
📊 Database Statistics
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total vectors: 1,234,567
Dimensions: 384
Distance metric: Cosine
HNSW Configuration:
M: 16
ef_construction: 200
ef_search: 100
benchmark - Run Performance Benchmark
Test query performance with random vectors.
ruvector benchmark [OPTIONS]
Options:
-d, --db <PATH> Database file path [default: ./ruvector.db]
-n, --queries <N> Number of queries to run [default: 1000]
Examples:
# Quick benchmark (1000 queries)
ruvector benchmark
# Extended benchmark (10,000 queries)
ruvector benchmark --queries 10000
# Benchmark specific database
ruvector benchmark --db ./prod.db --queries 5000
Output:
Running benchmark...
Queries: 1000
Dimensions: 384
Benchmark Results:
Total time: 0.48s
Queries per second: 2083
Average latency: 0.48ms
export - Export Database to File
Export vector data to JSON or CSV format.
ruvector export [OPTIONS] --output <FILE>
Options:
-d, --db <PATH> Database file path [default: ./ruvector.db]
-o, --output <FILE> Output file path (required)
-f, --format <FORMAT> Output format: json, csv [default: json]
Examples:
# Export to JSON
ruvector export --output backup.json --format json
# Export to CSV
ruvector export --output export.csv --format csv
# Export with custom database
ruvector export --db ./prod.db --output prod-backup.json
Note
: Export functionality requires
VectorDB::all_ids()method. This feature is planned for a future release.
import - Import from Other Vector Databases
Import vectors from external vector database formats.
ruvector import [OPTIONS] --source <TYPE> --source-path <PATH>
Options:
-d, --db <PATH> Database file path [default: ./ruvector.db]
-s, --source <TYPE> Source database type: faiss, pinecone, weaviate
-p, --source-path <PATH> Source file or connection path
Examples:
# Import from FAISS index
ruvector import --source faiss --source-path ./index.faiss
# Import from Pinecone export
ruvector import --source pinecone --source-path ./pinecone-export.json
# Import from Weaviate backup
ruvector import --source weaviate --source-path ./weaviate-backup.json
Note
: Import functionality for external databases is planned for future releases.
🔧 Configuration
Configuration File
Create a ruvector.toml configuration file for default settings:
[database]
storage_path = "./ruvector.db"
dimensions = 384
distance_metric = "Cosine" # Cosine, Euclidean, DotProduct, Manhattan
[database.hnsw]
m = 16
ef_construction = 200
ef_search = 100
[database.quantization]
type = "Scalar" # Scalar, Product, or None
[cli]
progress = true
colors = true
batch_size = 1000
[mcp]
host = "127.0.0.1"
port = 3000
cors = true
Configuration Locations
The CLI searches for configuration files in this order:
- Path specified via
--configflag ./ruvector.toml(current directory)./.ruvector.toml(current directory, hidden)~/.config/ruvector/config.toml(user config)/etc/ruvector/config.toml(system config)
Environment Variables
Override configuration with environment variables:
# Database settings
export RUVECTOR_STORAGE_PATH="./my-db.db"
export RUVECTOR_DIMENSIONS=384
export RUVECTOR_DISTANCE_METRIC="cosine"
# MCP server settings
export RUVECTOR_MCP_HOST="0.0.0.0"
export RUVECTOR_MCP_PORT=3000
# Run with environment overrides
ruvector info
🔌 MCP Server
The Ruvector CLI includes a Model Context Protocol (MCP) server for AI agent integration.
Start MCP Server
STDIO Transport (for local AI tools):
ruvector-mcp --transport stdio
SSE Transport (for web-based AI tools):
ruvector-mcp --transport sse --host 0.0.0.0 --port 3000
With Configuration:
ruvector-mcp --config ./ruvector.toml --transport sse --debug
MCP Integration Examples
Claude Desktop Integration (claude_desktop_config.json):
{
"mcpServers": {
"ruvector": {
"command": "ruvector-mcp",
"args": ["--transport", "stdio"],
"env": {
"RUVECTOR_STORAGE_PATH": "/path/to/vectors.db"
}
}
}
}
HTTP/SSE Client:
const evtSource = new EventSource('http://localhost:3000/sse');
evtSource.addEventListener('message', (event) => {
const data = JSON.parse(event.data);
console.log('MCP Response:', data);
});
// Send search request
fetch('http://localhost:3000/mcp', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
method: 'search',
params: {
query: [0.1, 0.2, 0.3],
k: 10
}
})
});
📊 Common Workflows
RAG System Setup
Build a retrieval-augmented generation (RAG) system:
# 1. Create database for your embedding model
ruvector create --dimensions 384 --path ./rag-embeddings.db
# 2. Generate embeddings and save to JSON
# (Use your preferred embedding model)
# 3. Insert embeddings
ruvector insert --db ./rag-embeddings.db --input embeddings.json
# 4. Query for relevant context
ruvector search --db ./rag-embeddings.db \
--query "[0.123, 0.456, ...]" \
--top-k 5
# 5. Start MCP server for AI agent access
ruvector-mcp --transport stdio
Semantic Search Engine
Build a semantic search system:
# Create database
ruvector create --dimensions 768 --path ./search-engine.db
# Batch insert documents
ruvector insert \
--db ./search-engine.db \
--input documents.json \
--format json
# Benchmark performance
ruvector benchmark --db ./search-engine.db --queries 10000
# Search interface via MCP
ruvector-mcp --transport sse --port 8080
Migration from Other Databases
Migrate from existing vector databases:
# 1. Export from source database
# (Use source database's export tools)
# 2. Create Ruvector database
ruvector create --dimensions 1536 --path ./migrated.db
# 3. Import data (planned feature)
ruvector import \
--db ./migrated.db \
--source pinecone \
--source-path ./pinecone-export.json
# 4. Verify migration
ruvector info --db ./migrated.db
ruvector benchmark --db ./migrated.db
Performance Testing
Test vector database performance:
# Create test database
ruvector create --dimensions 384 --path ./benchmark.db
# Generate synthetic test data
python generate_test_vectors.py --count 100000 --dims 384 --output test.npy
# Insert test data
ruvector insert --db ./benchmark.db --input test.npy --format npy
# Run comprehensive benchmark
ruvector benchmark --db ./benchmark.db --queries 10000
# Test search performance
time ruvector search --db ./benchmark.db --query "[0.1, 0.2, ...]" -k 100
🎯 Shell Completion
Generate shell completion scripts for faster command entry:
Bash
# Generate completion script
ruvector --help > /dev/null # Trigger clap completion
complete -C ruvector ruvector
# Or add to ~/.bashrc
echo 'complete -C ruvector ruvector' >> ~/.bashrc
Zsh
# Add to ~/.zshrc
autoload -U compinit && compinit
complete -o nospace -C ruvector ruvector
Fish
# Generate and save completion
ruvector --help > /dev/null
complete -c ruvector -f
⚙️ Performance Tips
Optimize Insertion
# Use larger batch sizes for bulk inserts (set in config)
[cli]
batch_size = 10000
# Disable progress bar for maximum speed
ruvector insert --input large-file.json --no-progress
Optimize Search
Configure HNSW parameters for your use case:
[database.hnsw]
# Higher M = better recall, more memory
m = 32
# Higher ef_construction = better index quality, slower builds
ef_construction = 400
# Higher ef_search = better recall, slower queries
ef_search = 200
Memory Optimization
Enable quantization to reduce memory usage:
[database.quantization]
type = "Product" # 4-8x memory reduction
Benchmarking Tips
# Run warm-up queries first
ruvector search --query "[...]" -k 10
ruvector search --query "[...]" -k 10
# Then benchmark
ruvector benchmark --queries 10000
# Test different k values
for k in 10 50 100; do
time ruvector search --query "[...]" -k $k
done
🔗 Related Documentation
- Rust API Reference - Core Ruvector API
- Getting Started Guide - Complete tutorial
- Performance Tuning - Optimization guide
- Main README - Project overview
🐛 Troubleshooting
Common Issues
Database file not found:
# Ensure database exists
ruvector info --db ./ruvector.db
# Or create it first
ruvector create --dimensions 384 --path ./ruvector.db
Dimension mismatch:
# Error: "Vector dimension mismatch"
# Solution: Ensure all vectors match database dimensions
# Check database dimensions
ruvector info --db ./ruvector.db
Invalid query format:
# Use proper JSON or comma-separated format
ruvector search --query "[0.1, 0.2, 0.3]" # JSON
ruvector search --query "0.1, 0.2, 0.3" # CSV
MCP server connection issues:
# Check if port is available
lsof -i :3000
# Try different port
ruvector-mcp --transport sse --port 8080
# Enable debug logging
ruvector-mcp --transport sse --debug
🤝 Contributing
Contributions welcome! Please see the Contributing Guidelines.
Development Setup
# Clone repository
git clone https://github.com/ruvnet/ruvector.git
cd ruvector/crates/ruvector-cli
# Run tests
cargo test
# Check formatting
cargo fmt -- --check
# Run clippy
cargo clippy -- -D warnings
# Build release
cargo build --release
📜 License
MIT License - see LICENSE for details.
🙏 Acknowledgments
Built with:
- clap - Command-line argument parsing
- tokio - Async runtime
- serde - Serialization framework
- indicatif - Progress bars and spinners
- colored - Terminal colors