ruvector/.github
ruvnet d5e07f6e6d fix(ruvector-router-core): #430 HNSW insert beam + distance-based pruning + storage rebuild
Three remaining root causes from issue #430, plus the storage-rebuild gap from PR #460.

  Bug B — insert beam was clamped to ef_construction.min(m * 2). With defaults
          (m=16, ef_construction=200) the beam silently became 32. Late-
          inserted clusters got wired through whatever was near the entry
          point instead of through ef_construction-wide neighbour search.

  Bug C — adjacency-list pruning used `drain(0..drain_count)`, dropping the
          OLDEST edges regardless of distance. Proper HNSW pruning keeps the
          m CLOSEST edges. Now sort by `calculate_distance` to the anchor
          vector and truncate to m. Kept a fallback that preserves the
          newest-m behaviour when the anchor vector lookup fails so we
          never panic on a missing vector.

  Storage — VectorDB::new() always created a fresh empty HnswIndex, so
            previously persisted vectors were invisible to search after
            reopening the database. Now rebuild via storage.get_all_ids()
            + index.insert_batch() on open, and seed VectorDbStats.total_vectors
            with the recovered count.

Tests:
  - test_pruning_keeps_closest_not_newest: builds a hub with 20 close
    neighbours then 6 far neighbours, asserts no "far_*" id appears in
    top-10 around the hub. Fails on FIFO pruning.
  - test_index_rebuilt_from_storage_on_open: writes 5 vectors via one
    VectorDB instance, reopens against the same path, asserts search
    returns the persisted match. Fails on the historical empty-index bug.

Regression-guard CI additions:
  - hnsw-insert-beam-no-m2-clamp: textually forbids the ef_construction.min(m*2)
    pattern in index.rs.
  - hnsw-distance-based-neighbor-pruning: requires calculate_distance and the
    `> m * 2` overflow gate to both live in index.rs.
  - vector-db-rebuilds-index-on-open: requires storage.get_all_ids() in
    vector_db.rs.
  - hnsw-recall-at-1 job now also runs the two new tests.

Supersedes PR #460 (CoolDude1969) which covered storage rebuild + an
overlapping heap fix already in main from PR #466.

Closes #430.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-18 16:30:32 -04:00
..
benchmarks feat: Add Neo4j-compatible hypergraph database package (ruvector-graph) 2025-11-25 23:11:54 +00:00
workflows fix(ruvector-router-core): #430 HNSW insert beam + distance-based pruning + storage rebuild 2026-05-18 16:30:32 -04:00