mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-23 12:55:26 +00:00
Three remaining root causes from issue #430, plus the storage-rebuild gap from PR #460. Bug B — insert beam was clamped to ef_construction.min(m * 2). With defaults (m=16, ef_construction=200) the beam silently became 32. Late- inserted clusters got wired through whatever was near the entry point instead of through ef_construction-wide neighbour search. Bug C — adjacency-list pruning used `drain(0..drain_count)`, dropping the OLDEST edges regardless of distance. Proper HNSW pruning keeps the m CLOSEST edges. Now sort by `calculate_distance` to the anchor vector and truncate to m. Kept a fallback that preserves the newest-m behaviour when the anchor vector lookup fails so we never panic on a missing vector. Storage — VectorDB::new() always created a fresh empty HnswIndex, so previously persisted vectors were invisible to search after reopening the database. Now rebuild via storage.get_all_ids() + index.insert_batch() on open, and seed VectorDbStats.total_vectors with the recovered count. Tests: - test_pruning_keeps_closest_not_newest: builds a hub with 20 close neighbours then 6 far neighbours, asserts no "far_*" id appears in top-10 around the hub. Fails on FIFO pruning. - test_index_rebuilt_from_storage_on_open: writes 5 vectors via one VectorDB instance, reopens against the same path, asserts search returns the persisted match. Fails on the historical empty-index bug. Regression-guard CI additions: - hnsw-insert-beam-no-m2-clamp: textually forbids the ef_construction.min(m*2) pattern in index.rs. - hnsw-distance-based-neighbor-pruning: requires calculate_distance and the `> m * 2` overflow gate to both live in index.rs. - vector-db-rebuilds-index-on-open: requires storage.get_all_ids() in vector_db.rs. - hnsw-recall-at-1 job now also runs the two new tests. Supersedes PR #460 (CoolDude1969) which covered storage rebuild + an overlapping heap fix already in main from PR #466. Closes #430. Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|---|---|---|
| .. | ||
| benchmarks | ||
| workflows | ||