mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-22 19:56:25 +00:00
* fix: batch 1 — deadlock, AVX-512 gating, Windows case-collisions
Closes #437: VectorDb::delete in ruvector-router-core acquired the stats
RwLock twice in one statement. parking_lot::RwLock is non-reentrant, so
the second .write() deadlocked against the first guard's lifetime. Bind
the guard once.
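The shape of the #437 fix can be sketched roughly as below. This is an illustration only, not the actual `VectorDb::delete` code: `Stats`, its fields, `Db`, and `record_delete` are hypothetical names, and `std::sync::RwLock` stands in for `parking_lot::RwLock` so the snippet has no dependencies (std's lock is also unsafe to re-acquire on the same thread).

```rust
use std::sync::RwLock;

// Hypothetical stats struct; the real field names may differ.
#[derive(Debug, Default)]
struct Stats {
    total_vectors: u64,
    deletes: u64,
}

struct Db {
    stats: RwLock<Stats>,
}

impl Db {
    // Deadlock-prone shape (do NOT do this): both guards are temporaries that
    // live until the end of the statement, so whichever is acquired second
    // blocks on the one already held.
    //
    //     self.stats.write().unwrap().total_vectors =
    //         self.stats.read().unwrap().total_vectors - 1;

    /// Fixed shape: take the write guard once and do every read and write
    /// through that single guard.
    fn record_delete(&self) {
        let mut stats = self.stats.write().unwrap();
        stats.total_vectors = stats.total_vectors.saturating_sub(1);
        stats.deletes += 1;
    } // guard is dropped here, releasing the lock
}

fn main() {
    let db = Db {
        stats: RwLock::new(Stats { total_vectors: 3, deletes: 0 }),
    };
    db.record_delete();
    let stats = db.stats.read().unwrap();
    println!("{} vectors, {} deletes", stats.total_vectors, stats.deletes);
}
```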
Closes #438: Gate AVX-512 intrinsics behind a new `simd-avx512` Cargo
feature (default-on). Lets downstream consumers on stable Rust 1.77–1.88
(before avx512f stabilization in 1.89) opt out without forcing nightly:
cargo build --no-default-features --features simd,storage,hnsw,api-embeddings,parallel
Runtime dispatch falls back to AVX2 + FMA when the feature is disabled.
All 4 #[target_feature(enable = "avx512f")] sites + 4 dispatch branches
updated. Both feature configurations verified to compile cleanly; all
18 simd_intrinsics tests pass.
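A minimal sketch of the gating pattern described for #438, under stated assumptions: the function names (`dot`, `dot_avx512`, `dot_avx2_fma`, `dot_scalar`) are hypothetical, and the kernel bodies are plain loops standing in for the real intrinsics. Only the `simd-avx512` feature name and the AVX2 + FMA fallback come from the commit text.

```rust
/// Portable fallback, always compiled.
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// AVX2 + FMA kernel. Placeholder body: a plain loop that the compiler may
/// vectorize with the enabled features; real code would use intrinsics.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn dot_avx2_fma(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// AVX-512 kernel. Compiled only when the Cargo feature is on, so toolchains
/// that predate avx512f stabilization never see this attribute.
#[cfg(all(target_arch = "x86_64", feature = "simd-avx512"))]
#[target_feature(enable = "avx512f")]
unsafe fn dot_avx512(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Runtime dispatch: widest available kernel first, scalar last.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");

    // This branch disappears entirely when `simd-avx512` is disabled.
    #[cfg(all(target_arch = "x86_64", feature = "simd-avx512"))]
    {
        if is_x86_feature_detected!("avx512f") {
            // Safety: the CPU feature was just detected at runtime.
            return unsafe { dot_avx512(a, b) };
        }
    }

    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // Safety: both CPU features were just detected at runtime.
            return unsafe { dot_avx2_fma(a, b) };
        }
    }

    dot_scalar(a, b)
}

fn main() {
    let a = [1.0_f32, 2.0, 3.0, 4.0];
    let b = [5.0_f32, 6.0, 7.0, 8.0];
    println!("dot = {}", dot(&a, &b));
}
```

Gating both the kernel definition and its dispatch branch behind the same `cfg` keeps the two feature configurations compiling independently, which is what the "with AND without" check in the regression guard exercises.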
Closes #458: Rename two pairs of case-colliding research artifacts under
docs/research/claude-code-rvsource/versions/v2.1.x/tree/react_memo_cache_sentinel/
that broke `git clone` on Windows/NTFS:
tmux.js → tmux_lc.js (TMUX.js kept)
type.js → type_lc.js (Type.js kept)
modules-manifest.json updated to match.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(brain): observable hydration + larger page-error budget (issue #464)
Bisect outcome: source diff between the 2026-04-14 working revision
(00203-brv → 22,005 memories) and current main (00204-92l → 10,227)
is whitespace-only (cargo fmt 2026-04-24 + clippy 2026-04-25). No
semantic change in store.rs, types.rs, or graph.rs. BrainMemory schema
is byte-identical. So the regression is environmental, surfacing
through a code path that has no observability today.
Two changes:
1. load_from_firestore() now emits per-collection counters so the next
deploy is diagnosable instead of a black box:
Hydrate brain_memories: considered=N accepted=M rejected_parse=K
First 5 parse errors are logged with the serde_json error so any
live schema drift surfaces immediately.
2. firestore_list MAX_PAGE_ERRORS raised 3 → 8. Hydration crosses ~75
pages of 300 docs each; 3 transient OAuth-refresh blips at the
wrong moment terminated the load at ~10K, consistent with the
reported 10,227 number. 8 still bounds runaway behaviour while
tolerating realistic blip rates.
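The two changes above can be sketched together as follows. This is a simplified model, not the real `load_from_firestore()`: `hydrate`, `HydrateStats`, and `parse_doc` are hypothetical, integer parsing stands in for the serde_json deserialization, and the sketch assumes the page-error budget is cumulative and that a failed page is skipped rather than retried.

```rust
const MAX_PAGE_ERRORS: usize = 8; // raised from 3 in this change
const MAX_LOGGED_PARSE_ERRORS: usize = 5;

#[derive(Debug, Default, PartialEq)]
struct HydrateStats {
    considered: usize,
    accepted: usize,
    rejected_parse: usize,
    page_errors: usize,
    aborted: bool,
}

/// Stand-in for deserializing one document (the real code uses serde_json).
fn parse_doc(raw: &str) -> Result<u64, std::num::ParseIntError> {
    raw.trim().parse::<u64>()
}

/// Walk the pages, counting what was seen, kept, and rejected.
fn hydrate<I>(pages: I) -> (Vec<u64>, HydrateStats)
where
    I: IntoIterator<Item = Result<Vec<String>, String>>,
{
    let mut stats = HydrateStats::default();
    let mut loaded = Vec::new();

    for page in pages {
        let docs = match page {
            Ok(docs) => docs,
            Err(e) => {
                stats.page_errors += 1;
                eprintln!("page error {}/{}: {}", stats.page_errors, MAX_PAGE_ERRORS, e);
                if stats.page_errors >= MAX_PAGE_ERRORS {
                    stats.aborted = true; // budget exhausted: stop loading
                    break;
                }
                continue;
            }
        };
        for raw in docs {
            stats.considered += 1;
            match parse_doc(&raw) {
                Ok(value) => {
                    stats.accepted += 1;
                    loaded.push(value);
                }
                Err(e) => {
                    stats.rejected_parse += 1;
                    // Log only the first few errors to keep the output bounded.
                    if stats.rejected_parse <= MAX_LOGGED_PARSE_ERRORS {
                        eprintln!("parse error on {:?}: {}", raw, e);
                    }
                }
            }
        }
    }

    println!(
        "Hydrate brain_memories: considered={} accepted={} rejected_parse={}",
        stats.considered, stats.accepted, stats.rejected_parse
    );
    (loaded, stats)
}

fn main() {
    let pages: Vec<Result<Vec<String>, String>> = vec![
        Ok(vec!["1".to_string(), "oops".to_string(), "3".to_string()]),
        Err("transient OAuth refresh failure".to_string()),
        Ok(vec!["4".to_string()]),
    ];
    let (loaded, stats) = hydrate(pages);
    println!("{:?} {:?}", loaded, stats);
}
```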
The actual environmental cause is recoverable from one deploy with the
new logs in place. Until then, traffic stays on 00203-brv (which is
what the rollback already did).
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(router-core): HNSW result-heap inversion, prune drops oldest, k > ef_search (#430)
Three correctness bugs in crates/ruvector-router-core/src/index.rs that
together collapsed recall@1 at scale:
1. `Neighbor::Ord` is reversed so BinaryHeap acts as a min-heap. Correct
for `candidates` (pop closest unexplored first), but WRONG for the
`result` heap — peek returned the BEST candidate, so the eviction
path kept dropping the best item instead of the worst whenever the
set was full. Wrap result in `std::cmp::Reverse<Neighbor>` so
peek/pop return the furthest item (the actual eviction target). This
is the primary recall@1 fix.
2. Per-insert connection pruning used `truncate(m)`, which keeps the
OLDEST m connections — including dropping the just-pushed edge when
it landed past index m. Switch to `drain(0..len-m)` so the freshly
inserted edge always survives.
3. `search()` capped at `ef_search` regardless of caller's k. With
default ef_search=10 and k=25, results were silently 10. Raise ef
to `max(ef_search, k)` before invoking search_knn_internal.
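The three fixes above can be illustrated in isolation. This is a minimal sketch, not the code in index.rs: the `Neighbor` fields and the helper names (`keep_closest`, `prune_keep_newest`, `effective_ef`) are assumptions made for the example.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

#[derive(Debug, Clone, Copy)]
struct Neighbor {
    id: usize,
    dist: f32,
}

impl PartialEq for Neighbor {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Neighbor {}
impl PartialOrd for Neighbor {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Neighbor {
    // Reversed on purpose: a smaller distance compares as "greater", so a
    // plain BinaryHeap<Neighbor> pops the closest item first.
    fn cmp(&self, other: &Self) -> Ordering {
        other.dist.total_cmp(&self.dist)
    }
}

/// Fix 1: wrap the result heap in `Reverse` so pop/peek yield the FURTHEST
/// item, which is the one to evict when the set is full.
fn keep_closest(items: &[Neighbor], ef: usize) -> Vec<usize> {
    let mut result: BinaryHeap<Reverse<Neighbor>> = BinaryHeap::new();
    for &n in items {
        result.push(Reverse(n));
        if result.len() > ef {
            result.pop(); // evicts the furthest neighbor
        }
    }
    let mut out: Vec<Neighbor> = result.into_iter().map(|Reverse(n)| n).collect();
    out.sort_by(|a, b| a.dist.total_cmp(&b.dist));
    out.into_iter().map(|n| n.id).collect()
}

/// Fix 2: drop the oldest connections from the front so the most recently
/// pushed edge survives (`truncate(m)` would drop from the back instead).
fn prune_keep_newest(connections: &mut Vec<usize>, m: usize) {
    if connections.len() > m {
        let excess = connections.len() - m;
        connections.drain(0..excess);
    }
}

/// Fix 3: never search with a beam narrower than the requested k.
fn effective_ef(ef_search: usize, k: usize) -> usize {
    ef_search.max(k)
}

fn main() {
    let items = [
        Neighbor { id: 0, dist: 0.9 },
        Neighbor { id: 1, dist: 0.1 },
        Neighbor { id: 2, dist: 0.5 },
        Neighbor { id: 3, dist: 0.3 },
    ];
    println!("closest two: {:?}", keep_closest(&items, 2));

    let mut conns = vec![10, 11, 12, 13];
    prune_keep_newest(&mut conns, 2);
    println!("after prune: {:?}", conns);

    println!("ef used: {}", effective_ef(10, 25));
}
```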
New tests:
- `test_recall_at_1_with_biased_insertion_order`: 1024 vectors,
biased insertion order (the topology that historically exposed the
bug); asserts recall@1 ≥ 95% AND ≥ 80% distinct ids across queries.
- `test_k_exceeds_ef_search_default`: 50 vectors, default ef_search=10,
k=25; asserts 25 results returned.
All 19 router-core tests pass.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(npm): publish pipeline — dist/ guaranteed + dual ESM/CJS pi-brain (#462/#415/#376/#372)
@ruvector/pi-brain 0.1.1 → 0.1.2 (closes #462, #372):
* Add `prepack` hook so dist/ is always built before publish — tarballs
on 0.1.0/0.1.1 shipped without dist/ because `tsc` never ran.
* Add a second tsconfig (tsconfig.cjs.json) that emits CommonJS to
dist/cjs/ alongside the ESM build in dist/. A generated
dist/cjs/package.json carries {"type":"commonjs"} so Node treats
that subtree as CJS regardless of the package-level "type":"module".
* Expand the exports map with import + require + default conditions
so ruvector@0.2.x's CJS MCP server (Node 20.x, no require(ESM)
until 22.12) can require() the package. Add subpath exports for
./mcp and ./client.
* Verified locally: dist/cjs/index.js loads via `require()` and
dist/index.js loads via dynamic `import()`.
@ruvector/rvf-wasm 0.1.5 → 0.1.6 (closes #415):
* pkg/rvf_wasm.js contains ESM syntax (`import.meta.url`,
`export default`). The old exports map pointed `require` at this
file, which fails on every CJS consumer. Mark the package
explicitly `"type": "module"`, drop the `require` condition (the
`.mjs` build is the canonical one), and add a `./wasm` subpath for
consumers that want the raw bytes.
ruvector npm 0.2.25 (extends #376 mitigation):
* Add `prepack` mirroring `prepublishOnly` so `npm pack` (and CI
smoke tests that run pack) regenerate dist/ + run verify-dist.
Without this, `npm pack` skips prepublishOnly, masking
missing-dist regressions until publish.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(mcp): hooks_route_enhanced in-process — drop spawnSync (#463/#422)
The hooks_route_enhanced MCP tool shelled out via
execSync('npx ruvector hooks route-enhanced …', { timeout: 30000 })
which deterministically timed out: npx's package-resolution and
bin-launch overhead can spike past 30s on cold-cache machines, even
though the underlying work finishes in ~500ms. Callers got
deterministic `spawnSync /bin/sh ETIMEDOUT`.
The sibling hooks_route tool (reported as working in #463) uses
intel.route() directly. Mirror that pattern: call intel.route(), then
inline the same coverage-router + AST-parser signal enrichment the CLI
does. No subprocess, no timeout, no npx dependency.
Falls back gracefully when coverage-router or ast-parser aren't
installed (try/catch around each optional enhancement, same as the
CLI handler).
Co-Authored-By: claude-flow <ruv@ruv.net>
* ci: regression guard for 9 issues + fixes for 5 latent regressions it surfaced
New workflow .github/workflows/regression-guard.yml runs on every push +
PR. Each job pins one of these issue classes shut:
#437 reentrant-rwlock-double-write
Forbids `x.write()…x.(write|read)()` and `x.read()…x.write()` in
a single statement (parking_lot is non-reentrant). PCRE
backreference matches only same-lock cases.
#458 case-insensitive-collisions
Fails if `git ls-files` has any two paths that match after
lowercasing — Windows clones drop one of each silently.
#438 ruvector-core-no-avx512-builds-on-stable
cargo check ruvector-core with AND without the simd-avx512
feature so the AVX-512 gating doesn't regress.
#430 hnsw-recall-at-1
Runs the new recall@1 (biased insertion / 1024 vectors) test
and the k > ef_search test in release mode.
#462 / #376 npm-publish-pipeline
npm pack each shipped package and assert every entry referenced
by main/module/types/exports is actually inside the tarball.
#463 / #422 no-npx-execSync-in-mcp-server
Forbids execSync('npx ruvector …') anywhere in the MCP server.
#256 shell-injection-in-mcp-server
Flags any exec*/spawn* call that interpolates ${args.X} without
wrapping in sanitizeShellArg(...).
#267 no-systemtime-in-wasm-crates
Crates named *wasm* with ungated SystemTime::now / Instant::now
calls are rejected (the wasm32-unknown-unknown panic class).
#359 no-hardcoded-workspaces-paths
Devcontainer-only `/workspaces/ruvector` literals are banned
from .github/workflows, .claude/settings*, and scripts/publish/.
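As one concrete example from the list above, the #458 check (paths that collide after lowercasing) could be expressed roughly like this. The workflow itself operates on `git ls-files` output; this Rust rendering, including the `case_collisions` name, is a hypothetical sketch, and `to_lowercase` only approximates NTFS case folding.

```rust
use std::collections::BTreeMap;

/// Group paths by their lowercased form and return every group that has more
/// than one member, i.e. paths that would overwrite each other on a
/// case-insensitive filesystem.
fn case_collisions(paths: &[&str]) -> Vec<Vec<String>> {
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for path in paths {
        groups
            .entry(path.to_lowercase())
            .or_default()
            .push(path.to_string());
    }
    groups.into_values().filter(|g| g.len() > 1).collect()
}

fn main() {
    // Example input standing in for the output of `git ls-files`.
    let tracked = ["dir/TMUX.js", "dir/tmux.js", "dir/Type.js", "dir/type.js", "src/lib.rs"];
    let collisions = case_collisions(&tracked);
    for group in &collisions {
        println!("collision: {}", group.join(" <-> "));
    }
    // A CI guard would fail the job when this is non-empty.
    if !collisions.is_empty() {
        eprintln!("{} case-insensitive collision(s) found", collisions.len());
    }
}
```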
Adding the guard surfaced five real, already-present regressions of
these classes — fixed in this commit:
* crates/prime-radiant/src/coherence/engine.rs (3 sites):
self.stats.write().X = self.stats.read().X - 1 in the same
statement — exactly issue #437's shape on a different lock. Bind
the write guard once.
* crates/ruvector-wasm/src/lib.rs:465 (benchmark fn):
used std::time::Instant which panics on wasm32 (issue #267).
Switch to js_sys::Date::now().
* scripts/publish/publish-router-wasm.sh + check-and-publish-router-wasm.sh:
hardcoded /workspaces/ruvector paths (issue #359). Resolve REPO_ROOT
from BASH_SOURCE instead.
Co-Authored-By: claude-flow <ruv@ruv.net>
* ci: narrow scope of two guards to avoid pre-existing-debt false positives
After the first PR run two guards caught existing technical debt rather
than fresh regressions:
* no-npx-execSync-in-mcp-server flagged 10 other execSync('npx
ruvector …') sites (ast-analyze, coverage-route, graph-mincut,
security-scan, git-churn, …) which predate issue #463 and are a
distinct concern (some legitimately need subprocess). Narrow the
guard to the EXACT regression — execSync inside the
hooks_route_enhanced case body — using awk to extract that case's
body before grepping. Rename: no-npx-execSync-in-route-enhanced.
* npm-publish-pipeline failed at npm install (peer-dep ERESOLVE).
Add --legacy-peer-deps. The point of this guard is the tarball
content, not the install graph.
Co-Authored-By: claude-flow <ruv@ruv.net>
* style: cargo fmt --all (mechanical, pre-existing diffs on main + my new code)
Workspace had 11 files with rustfmt diffs predating this branch, plus
one new diff in store.rs from the hydration counters added in
Ruvector Graph
A graph database with Cypher queries, hyperedges, and vector search -- all in one crate.
[dependencies]
ruvector-graph = "0.1.1"
Most graph databases make you choose: you can have relationships or vector search, a query language or raw traversals, pairwise edges or nothing. ruvector-graph gives you all of them together. Write familiar Cypher queries as you would in Neo4j, attach vector embeddings to any node for semantic search, and model complex group relationships with hyperedges that connect three or more nodes at once. It runs on servers, in browsers via WASM, and across clusters with built-in RAFT consensus. Part of the RuVector ecosystem.
| | ruvector-graph | Neo4j / Typical Graph DB | Vector DB + Custom Glue |
|---|---|---|---|
| Query language | Full Cypher parser built-in | Cypher (Neo4j) or proprietary | No graph queries |
| Hyperedges | Native -- one edge connects N nodes | Pairwise only -- workarounds needed | Not applicable |
| Vector search | HNSW on every node, semantic similarity | Separate plugin or not available | Vectors only, no graph structure |
| SIMD acceleration | SimSIMD hardware-optimized ops | JVM-based | Varies |
| Browser / WASM | default-features = false, features = ["wasm"] | Server only | Server only |
| Distributed | Built-in RAFT consensus + federation | Enterprise tier (paid) | Varies |
| Cost | Free, open source (MIT) | Community or paid license | Varies |
Key Features
| Feature | What It Does | Why It Matters |
|---|---|---|
| Cypher Engine | Parse and execute Cypher queries -- MATCH (a)-[:KNOWS]->(b) | Use a query language you already know instead of raw traversal code |
| Hypergraph Model | Edges connect any number of nodes, not just pairs | Model meetings, co-authorships, reactions -- any group relationship -- natively |
| Vector Embeddings | Attach embeddings to nodes, run HNSW similarity search | Combine "who is connected to whom" with "what is semantically similar" |
| Property Graph | Rich JSON properties on every node and edge | Store real data on your graph elements, not just IDs |
| Label Indexes | Roaring bitmap indexes for fast label lookups | Filter millions of nodes by label in microseconds |
| SIMD Optimized | Hardware-accelerated distance calculations via SimSIMD | Faster vector operations without changing your code |
| Distributed Mode | RAFT consensus for multi-node deployments | Scale out without bolting on a separate coordination layer |
| Federation | Cross-cluster graph queries | Query across data centers as if they were one graph |
| Compression | ZSTD and LZ4 for storage | Smaller on disk without sacrificing read speed |
| WASM Compatible | Run in browsers with WebAssembly | Same graph engine on server and client |
Installation
[dependencies]
ruvector-graph = "0.1.1"
Feature Flags
[dependencies]
# Full feature set
ruvector-graph = { version = "0.1.1", features = ["full"] }
# Minimal WASM-compatible build
ruvector-graph = { version = "0.1.1", default-features = false, features = ["wasm"] }
# Distributed deployment
ruvector-graph = { version = "0.1.1", features = ["distributed"] }
Available features:
- full (default): Complete feature set with all optimizations
- simd: SIMD-optimized operations
- storage: Persistent storage with redb
- async-runtime: Tokio async support
- compression: ZSTD/LZ4 compression
- distributed: RAFT consensus support
- federation: Cross-cluster federation
- wasm: WebAssembly-compatible minimal build
- metrics: Prometheus monitoring
Quick Start
Create a Graph
use ruvector_graph::{Graph, Node, Edge, GraphConfig};
fn main() -> Result<(), Box<dyn std::error::Error>> {
// Create a new graph
let config = GraphConfig::default();
let graph = Graph::new(config)?;
// Create nodes
let alice = graph.create_node(Node {
labels: vec!["Person".to_string()],
properties: serde_json::json!({
"name": "Alice",
"age": 30
}),
..Default::default()
})?;
let bob = graph.create_node(Node {
labels: vec!["Person".to_string()],
properties: serde_json::json!({
"name": "Bob",
"age": 25
}),
..Default::default()
})?;
// Create relationship
graph.create_edge(Edge {
label: "KNOWS".to_string(),
source: alice.id,
target: bob.id,
properties: serde_json::json!({
"since": 2020
}),
..Default::default()
})?;
Ok(())
}
Cypher Queries
use ruvector_graph::{Graph, CypherExecutor};
// Execute a Cypher query against the `graph` built in "Create a Graph"
let executor = CypherExecutor::new(&graph);
let results = executor.execute("
MATCH (p:Person)-[:KNOWS]->(friend:Person)
WHERE p.name = 'Alice'
RETURN friend.name AS name, friend.age AS age
")?;
for row in results {
println!("Friend: {} (age {})", row["name"], row["age"]);
}
Vector-Enhanced Graph
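// Assumes DistanceMetric is re-exported at the crate root; adjust the path if it lives elsewhere.
use ruvector_graph::{DistanceMetric, Graph, GraphConfig, Node, VectorConfig};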
// Enable vector embeddings on nodes
let config = GraphConfig {
vector_config: Some(VectorConfig {
dimensions: 384,
distance_metric: DistanceMetric::Cosine,
..Default::default()
}),
..Default::default()
};
let graph = Graph::new(config)?;
// Create node with embedding
let node = graph.create_node(Node {
labels: vec!["Document".to_string()],
properties: serde_json::json!({"title": "Introduction to Graphs"}),
embedding: Some(vec![0.1, 0.2, 0.3, /* ... 384 dims */]),
..Default::default()
})?;
// Semantic similarity search
let similar = graph.search_similar_nodes(
vec![0.1, 0.2, 0.3, /* query vector */],
10, // top-k
Some(vec!["Document".to_string()]), // filter by labels
)?;
Hyperedges
use ruvector_graph::{Graph, Hyperedge};
// Create a hyperedge connecting multiple nodes
// (alice, bob, and charlie are nodes created as in "Create a Graph")
let meeting = graph.create_hyperedge(Hyperedge {
label: "PARTICIPATED_IN".to_string(),
nodes: vec![alice.id, bob.id, charlie.id],
properties: serde_json::json!({
"event": "Team Meeting",
"date": "2024-01-15"
}),
..Default::default()
})?;
API Overview
Core Types
// Node in the graph
pub struct Node {
pub id: NodeId,
pub labels: Vec<String>,
pub properties: serde_json::Value,
pub embedding: Option<Vec<f32>>,
}
// Edge connecting two nodes
pub struct Edge {
pub id: EdgeId,
pub label: String,
pub source: NodeId,
pub target: NodeId,
pub properties: serde_json::Value,
}
// Hyperedge connecting multiple nodes
pub struct Hyperedge {
pub id: HyperedgeId,
pub label: String,
pub nodes: Vec<NodeId>,
pub properties: serde_json::Value,
}
Graph Operations
impl Graph {
// Node operations
pub fn create_node(&self, node: Node) -> Result<Node>;
pub fn get_node(&self, id: &NodeId) -> Result<Option<Node>>;
pub fn update_node(&self, node: Node) -> Result<Node>;
pub fn delete_node(&self, id: &NodeId) -> Result<bool>;
// Edge operations
pub fn create_edge(&self, edge: Edge) -> Result<Edge>;
pub fn get_edge(&self, id: &EdgeId) -> Result<Option<Edge>>;
pub fn delete_edge(&self, id: &EdgeId) -> Result<bool>;
// Traversal
pub fn neighbors(&self, id: &NodeId, direction: Direction) -> Result<Vec<Node>>;
pub fn traverse(&self, start: &NodeId, config: TraversalConfig) -> Result<Vec<Path>>;
// Vector search
pub fn search_similar_nodes(&self, query: Vec<f32>, k: usize, labels: Option<Vec<String>>) -> Result<Vec<Node>>;
}
Performance
Benchmarks (1M Nodes, 10M Edges)
| Operation | Latency (p50) | Throughput |
|---|---|---|
| Node lookup | ~0.1ms | 100K ops/s |
| Edge traversal | ~0.5ms | 50K ops/s |
| 1-hop neighbors | ~1ms | 20K ops/s |
| Cypher simple query | ~5ms | 5K ops/s |
| Vector similarity | ~2ms | 10K ops/s |
Related Crates
- ruvector-core - Core vector database engine
- ruvector-graph-node - Node.js bindings
- ruvector-graph-wasm - WebAssembly bindings
- ruvector-raft - RAFT consensus for distributed mode
- ruvector-cluster - Clustering and sharding
Documentation
- RuVector README - Complete project overview
- API Documentation - Full API reference
- GitHub Repository - Source code
License
MIT License - see LICENSE for details.