mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-23 12:55:26 +00:00
Three remaining root causes from issue #430, plus the storage-rebuild gap from PR #460. Bug B — insert beam was clamped to ef_construction.min(m * 2). With defaults (m=16, ef_construction=200) the beam silently became 32. Late- inserted clusters got wired through whatever was near the entry point instead of through ef_construction-wide neighbour search. Bug C — adjacency-list pruning used `drain(0..drain_count)`, dropping the OLDEST edges regardless of distance. Proper HNSW pruning keeps the m CLOSEST edges. Now sort by `calculate_distance` to the anchor vector and truncate to m. Kept a fallback that preserves the newest-m behaviour when the anchor vector lookup fails so we never panic on a missing vector. Storage — VectorDB::new() always created a fresh empty HnswIndex, so previously persisted vectors were invisible to search after reopening the database. Now rebuild via storage.get_all_ids() + index.insert_batch() on open, and seed VectorDbStats.total_vectors with the recovered count. Tests: - test_pruning_keeps_closest_not_newest: builds a hub with 20 close neighbours then 6 far neighbours, asserts no "far_*" id appears in top-10 around the hub. Fails on FIFO pruning. - test_index_rebuilt_from_storage_on_open: writes 5 vectors via one VectorDB instance, reopens against the same path, asserts search returns the persisted match. Fails on the historical empty-index bug. Regression-guard CI additions: - hnsw-insert-beam-no-m2-clamp: textually forbids the ef_construction.min(m*2) pattern in index.rs. - hnsw-distance-based-neighbor-pruning: requires calculate_distance and the `> m * 2` overflow gate to both live in index.rs. - vector-db-rebuilds-index-on-open: requires storage.get_all_ids() in vector_db.rs. - hnsw-recall-at-1 job now also runs the two new tests. Supersedes PR #460 (CoolDude1969) which covered storage rebuild + an overlapping heap fix already in main from PR #466. Closes #430. Co-Authored-By: claude-flow <ruv@ruv.net> |
||
|---|---|---|
| .. | ||
| agentic-synth-ci.yml | ||
| benchmarks.yml | ||
| build-attention.yml | ||
| build-diskann.yml | ||
| build-gnn.yml | ||
| build-graph-node.yml | ||
| build-graph-transformer.yml | ||
| build-native.yml | ||
| build-router.yml | ||
| build-rvf-node.yml | ||
| build-tiny-dancer.yml | ||
| build-verified.yml | ||
| ci.yml | ||
| clippy-fmt.yml | ||
| copilot-setup-steps.yml | ||
| docker-publish.yml | ||
| edge-net-models.yml | ||
| hailo-backend-audit.yml | ||
| hooks-ci.yml | ||
| mirror-rulake.yml | ||
| postgres-extension-ci.yml | ||
| publish-all.yml | ||
| regression-guard.yml | ||
| RELEASE-FLOW.md | ||
| release-rvf-cli.yml | ||
| RELEASE.md | ||
| release.yml | ||
| ruvector-postgres-ci.yml | ||
| ruvllm-benchmarks.yml | ||
| ruvllm-build.yml | ||
| ruvllm-esp32-firmware.yml | ||
| ruvllm-native.yml | ||
| ruvltra-tests.yml | ||
| sona-napi.yml | ||
| sync-rvf-examples.yml | ||
| thermorust-ci.yml | ||
| ui-ci.yml | ||
| validate-lockfile.yml | ||
| wasm-dedup-check.yml | ||