mirror of https://github.com/ruvnet/RuVector.git synced 2026-05-22 19:56:25 +00:00

History

rUv bc3a9b1c93 fix: 9-issue cleanup batch + regression-guard CI workflow (#466 ) * fix: batch 1 — deadlock, AVX-512 gating, Windows case-collisions Closes #437: VectorDb::delete in ruvector-router-core acquired the stats RwLock twice in one statement. parking_lot::RwLock is non-reentrant, so the second .write() deadlocked against the first guard's lifetime. Bind the guard once. Closes #438: Gate AVX-512 intrinsics behind a new `simd-avx512` Cargo feature (default-on). Lets downstream consumers on stable Rust 1.77–1.88 (before avx512f stabilization in 1.89) opt out without forcing nightly: cargo build --no-default-features --features simd,storage,hnsw,api-embeddings,parallel Runtime dispatch falls back to AVX2 + FMA when the feature is disabled. All 4 #[target_feature(enable = "avx512f")] sites + 4 dispatch branches updated. Both feature configurations verified to compile cleanly; all 18 simd_intrinsics tests pass. Closes #458: Rename two pairs of case-colliding research artifacts under docs/research/claude-code-rvsource/versions/v2.1.x/tree/react_memo_cache_sentinel/ that broke `git clone` on Windows/NTFS: tmux.js → tmux_lc.js (TMUX.js kept) type.js → type_lc.js (Type.js kept) modules-manifest.json updated to match. Co-Authored-By: claude-flow <ruv@ruv.net> * fix(brain): observable hydration + larger page-error budget (issue #464) Bisect outcome: source diff between the 2026-04-14 working revision (00203-brv → 22,005 memories) and current main (00204-92l → 10,227) is whitespace-only (cargo fmt 2026-04-24 + clippy 2026-04-25). No semantic change in store.rs, types.rs, or graph.rs. BrainMemory schema is byte-identical. So the regression is environmental, surfacing through a code path that has no observability today. Two changes: 1. load_from_firestore() now emits per-collection counters so the next deploy is diagnosable instead of a black box: Hydrate brain_memories: considered=N accepted=M rejected_parse=K First 5 parse errors are logged with the serde_json error so any live schema drift surfaces immediately. 2. firestore_list MAX_PAGE_ERRORS raised 3 → 8. Hydration crosses ~75 pages of 300 docs each; 3 transient OAuth-refresh blips at the wrong moment terminated the load at ~10K, consistent with the reported 10,227 number. 8 still bounds runaway behaviour while tolerating realistic blip rates. The actual environmental cause is recoverable from one deploy with the new logs in place. Until then, traffic stays on 00203-brv (which is what the rollback already did). Co-Authored-By: claude-flow <ruv@ruv.net> * fix(router-core): HNSW result-heap inversion, prune drops oldest, k > ef_search (#430) Three correctness bugs in crates/ruvector-router-core/src/index.rs that together collapsed recall@1 at scale: 1. `Neighbor::Ord` is reversed so BinaryHeap acts as a min-heap. Correct for `candidates` (pop closest unexplored first), but WRONG for the `result` heap — peek returned the BEST candidate, so the eviction path kept dropping the best item instead of the worst whenever the set was full. Wrap result in `std::cmp::Reverse<Neighbor>` so peek/pop return the furthest item (the actual eviction target). This is the primary recall@1 fix. 2. Per-insert connection pruning used `truncate(m)`, which keeps the OLDEST m connections — including dropping the just-pushed edge when it landed past index m. Switch to `drain(0..len-m)` so the freshly inserted edge always survives. 3. `search()` capped at `ef_search` regardless of caller's k. With default ef_search=10 and k=25, results were silently 10. Raise ef to `max(ef_search, k)` before invoking search_knn_internal. New tests: - `test_recall_at_1_with_biased_insertion_order`: 1024 vectors, biased insertion order (the topology that historically exposed the bug); asserts recall@1 ≥ 95% AND ≥ 80% distinct ids across queries. - `test_k_exceeds_ef_search_default`: 50 vectors, default ef_search=10, k=25; asserts 25 results returned. All 19 router-core tests pass. Co-Authored-By: claude-flow <ruv@ruv.net> * fix(npm): publish pipeline — dist/ guaranteed + dual ESM/CJS pi-brain (#462/#415/#376/#372) @ruvector/pi-brain 0.1.1 → 0.1.2 (closes #462, #372): * Add `prepack` hook so dist/ is always built before publish — tarballs on 0.1.0/0.1.1 shipped without dist/ because `tsc` never ran. * Add a second tsconfig (tsconfig.cjs.json) that emits CommonJS to dist/cjs/ alongside the ESM build in dist/. A generated dist/cjs/package.json carries {"type":"commonjs"} so Node treats that subtree as CJS regardless of the package-level "type":"module". * Expand the exports map with import + require + default conditions so ruvector@0.2.x's CJS MCP server (Node 20.x, no require(ESM) until 22.12) can require() the package. Add subpath exports for ./mcp and ./client. * Verified locally: dist/cjs/index.js loads via `require()` and dist/index.js loads via dynamic `import()`. @ruvector/rvf-wasm 0.1.5 → 0.1.6 (closes #415): * pkg/rvf_wasm.js contains ESM syntax (`import.meta.url`, `export default`). The old exports map pointed `require` at this file, which fails on every CJS consumer. Mark the package explicitly `"type": "module"`, drop the `require` condition (the `.mjs` build is the canonical one), and add a `./wasm` subpath for consumers that want the raw bytes. ruvector npm 0.2.25 (extends #376 mitigation): * Add `prepack` mirroring `prepublishOnly` so `npm pack` (and CI smoke tests that run pack) regenerate dist/ + run verify-dist. Without this, `npm pack` skips prepublishOnly, masking missing-dist regressions until publish. Co-Authored-By: claude-flow <ruv@ruv.net> * fix(mcp): hooks_route_enhanced in-process — drop spawnSync (#463/#422) The hooks_route_enhanced MCP tool shelled out via execSync('npx ruvector hooks route-enhanced …', { timeout: 30000 }) which deterministically timed out: npx's package-resolution and bin-launch overhead can spike past 30s on cold-cache machines, even though the underlying work finishes in ~500ms. Callers got deterministic `spawnSync /bin/sh ETIMEDOUT`. The sibling hooks_route tool (reported as working in #463) uses intel.route() directly. Mirror that pattern: call intel.route(), then inline the same coverage-router + AST-parser signal enrichment the CLI does. No subprocess, no timeout, no npx dependency. Falls back gracefully when coverage-router or ast-parser aren't installed (try/catch around each optional enhancement, same as the CLI handler). Co-Authored-By: claude-flow <ruv@ruv.net> * ci: regression guard for 9 issues + fixes for 5 latent regressions it surfaced New workflow .github/workflows/regression-guard.yml runs on every push + PR. Each job pins one of these issue classes shut: #437 reentrant-rwlock-double-write Forbids `x.write()…x.(write\|read)()` and `x.read()…x.write()` in a single statement (parking_lot is non-reentrant). PCRE backreference matches only same-lock cases. #458 case-insensitive-collisions Fails if `git ls-files` has any two paths that match after lowercasing — Windows clones drop one of each silently. #438 ruvector-core-no-avx512-builds-on-stable cargo check ruvector-core with AND without the simd-avx512 feature so the AVX-512 gating doesn't regress. #430 hnsw-recall-at-1 Runs the new recall@1 (biased insertion / 1024 vectors) test and the k > ef_search test in release mode. #462 / #376 npm-publish-pipeline npm pack each shipped package and assert every entry referenced by main/module/types/exports is actually inside the tarball. #463 / #422 no-npx-execSync-in-mcp-server Forbids execSync('npx ruvector …') anywhere in the MCP server. #256 shell-injection-in-mcp-server Flags any exec/spawn call that interpolates ${args.X} without wrapping in sanitizeShellArg(...). #267 no-systemtime-in-wasm-crates Crates named wasm with ungated SystemTime::now / Instant::now calls are rejected (the wasm32-unknown-unknown panic class). #359 no-hardcoded-workspaces-paths Devcontainer-only `/workspaces/ruvector` literals are banned from .github/workflows, .claude/settings, and scripts/publish/. Adding the guard surfaced five real, already-present regressions of these classes — fixed in this commit: crates/prime-radiant/src/coherence/engine.rs (3 sites): self.stats.write().X = self.stats.read().X - 1 in the same statement — exactly issue #437's shape on a different lock. Bind the write guard once. * crates/ruvector-wasm/src/lib.rs:465 (benchmark fn): used std::time::Instant which panics on wasm32 (issue #267). Switch to js_sys::Date::now(). * scripts/publish/publish-router-wasm.sh + check-and-publish-router-wasm.sh: hardcoded /workspaces/ruvector paths (issue #359). Resolve REPO_ROOT from BASH_SOURCE instead. Co-Authored-By: claude-flow <ruv@ruv.net> * ci: narrow scope of two guards to avoid pre-existing-debt false positives After the first PR run two guards caught existing technical debt rather than fresh regressions: * no-npx-execSync-in-mcp-server flagged 10 other execSync('npx ruvector …') sites (ast-analyze, coverage-route, graph-mincut, security-scan, git-churn, …) which predate issue #463 and are a distinct concern (some legitimately need subprocess). Narrow the guard to the EXACT regression — execSync inside the hooks_route_enhanced case body — using awk to extract that case's body before grepping. Rename: no-npx-execSync-in-route-enhanced. * npm-publish-pipeline failed at npm install (peer-dep ERESOLVE). Add --legacy-peer-deps. The point of this guard is the tarball content, not the install graph. Co-Authored-By: claude-flow <ruv@ruv.net> * style: cargo fmt --all (mechanical, pre-existing diffs on main + my new code) Workspace had 11 files with rustfmt diffs predating this branch, plus one new diff in store.rs from the hydration counters added in `97c07520d`. Running `cargo fmt --all` brings them all in line so the Rustfmt CI job passes on this branch. No semantic changes — pure whitespace. Co-Authored-By: claude-flow <ruv@ruv.net> * ci+build: isolate npm pack from workspace + fix ruvector build mkdir CI regression-guard's npm-publish-pipeline failed because pi-brain and ruvector both live inside the npm workspace at npm/package.json, whose other workspace members declare cross-platform native binaries (e.g. router-darwin-arm64). Running `npm install` from a package directory still walks the workspace and rejects EBADPLATFORM on the wrong-host binary. Fix: copy each package to a workspace-free /tmp dir, strip its lockfile, and install with --no-workspaces. The point of this guard is the tarball content, so isolating from the workspace doesn't reduce coverage. Also fixes ruvector's `build` script — it copy'd a file into dist/core/onnx/pkg/ without `mkdir -p` first, so the build crashed on any fresh install. Now: `tsc && mkdir -p dist/core/onnx/pkg && cp ...`. Verified locally: both pi-brain (8.9 kB, 15 files) and ruvector (826 kB, 134 files) pack cleanly with the new flow. Co-Authored-By: claude-flow <ruv@ruv.net> * fix(ci): bump rkyv to 0.8.16 (RUSTSEC-2026-0122) + downgrade clippy on research crates Three CI failures left after the previous push: * cargo-deny / cargo-audit — RUSTSEC-2026-0122: rkyv 0.8.15 InlineVec::clear / SerVec::clear are not panic-safe → potential use-after-free / double-free via catch_unwind. Solution per the advisory: `cargo update -p rkyv`. Bumps rkyv 0.8.15 → 0.8.16 and rkyv_derive 0.8.15 → 0.8.16, pulls in hashbrown 0.17.1. Verified that ruvector-core + ruvector-hailo + ruvector-hailo-cluster (the rkyv consumers) all still cargo-check clean. * Clippy (workspace, deny warnings) — 12 stylistic clippy errors in ruvllm_sparse_attention (subquadratic attention research crate) and 11 more in ruvllm_retrieval_diffusion (training-free retrieval LM). The lints flagged: needless_range_loop, if_same_then_else, derivable_impls, redundant_closure, iter_cloned_collect, doc_lazy_continuation, unusual_byte_groupings, needless_lifetimes. None affect correctness — these are research-tier crates where the explicit indexing style is intentional. Add a per-crate `[lints.clippy]` section in each Cargo.toml downgrading the flagged lints to `allow`. The workspace-level `-D warnings` stays strict for every other crate. clippy --fix also auto-rewrote two minor sites in ruvllm_sparse_attention/examples/{sparse_mario,esp32s3_smoke}.rs that were stylistic improvements; kept those. Co-Authored-By: claude-flow <ruv@ruv.net> --------- Co-authored-by: ruvnet <ruvnet@gmail.com>		2026-05-16 12:14:49 -04:00
..
benchmark	chore: reorganize scripts into subfolders	2026-01-21 23:48:37 -05:00
build	feat: Update NAPI-RS bindings with new capabilities (v0.1.15)	2025-11-26 18:47:48 +00:00
ci	chore: reorganize scripts into subfolders	2026-01-21 23:48:37 -05:00
deploy	chore: reorganize scripts into subfolders	2026-01-21 23:48:37 -05:00
lib	fix(decompiler): statement-boundary splitting — 14/14 modules now parse (was 2/17)	2026-04-03 11:50:34 +00:00
patches/hnsw_rs	Add WebAssembly binary and TypeScript definitions for rvlite	2025-12-25 19:50:53 +00:00
publish	fix: 9-issue cleanup batch + regression-guard CI workflow (#466 )	2026-05-16 12:14:49 -04:00
test	chore: reorganize scripts into subfolders	2026-01-21 23:48:37 -05:00
training	feat(training): source map extraction + v2 model (83.67% val accuracy)	2026-04-03 04:57:47 +00:00
validate	chore: reorganize scripts into subfolders	2026-01-21 23:48:37 -05:00
analyze-evolution.js	feat: DrAgnes + Common Crawl WET + Gemini grounding agents (#282 )	2026-03-23 10:12:50 -04:00
analyze-ham10000.js	feat: DrAgnes + Common Crawl WET + Gemini grounding agents (#282 )	2026-03-23 10:12:50 -04:00
build-solver.sh	feat: Implement complete sublinear-time sparse solver crate	2026-02-20 06:49:14 +00:00
check_brain_status.sh	feat: add brain training and status scripts for pi.ruv.io	2026-03-16 23:14:43 -04:00
claude-code-decompile.sh	feat(sse): decouple SSE to mcp.pi.ruv.io proxy + Claude Code source research	2026-04-02 23:39:56 +00:00
claude-code-rvf-corpus.sh	feat(sse): decouple SSE to mcp.pi.ruv.io proxy + Claude Code source research	2026-04-02 23:39:56 +00:00
create-brainpedia.py	fix: ruvector-postgres v0.3.1 — audit bug fixes, 46 SQL functions, Docker publish (#227 )	2026-03-03 12:53:10 -05:00
deploy-crawl-phase1.sh	feat: DrAgnes + Common Crawl WET + Gemini grounding agents (#282 )	2026-03-23 10:12:50 -04:00
deploy-dragnes.sh	feat(sse): decouple SSE to mcp.pi.ruv.io proxy + Claude Code source research	2026-04-02 23:39:56 +00:00
deploy-gemini-agents.sh	feat: DrAgnes + Common Crawl WET + Gemini grounding agents (#282 )	2026-03-23 10:12:50 -04:00
deploy-wet-job.sh	feat: DrAgnes + Common Crawl WET + Gemini grounding agents (#282 )	2026-03-23 10:12:50 -04:00
deploy_brain_services.sh	feat(brain): ADR-130 service split — SSE proxy, worker, internal queue	2026-03-30 11:54:01 -04:00
deploy_trainer.sh	feat: update ADR-093 + add deploy_trainer.sh for Cloud Run scheduling	2026-03-16 23:14:43 -04:00
discover_and_train.sh	feat: discover ↔ train feedback loop with live API discovery	2026-03-16 23:16:24 -04:00
gemini-agents.js	feat: DrAgnes + Common Crawl WET + Gemini grounding agents (#282 )	2026-03-23 10:12:50 -04:00
generate-rvf-manifest.py	fix: ruvector-postgres v0.3.1 — audit bug fixes, 46 SQL functions, Docker publish (#227 )	2026-03-03 12:53:10 -05:00
historical-crawl-import.sh	feat: DrAgnes + Common Crawl WET + Gemini grounding agents (#282 )	2026-03-23 10:12:50 -04:00
publish-rvf.sh	feat(rvf): RuVector Format — Universal Cognitive Container SDK (#166 )	2026-02-14 13:14:49 -05:00
README.md	chore: reorganize scripts into subfolders	2026-01-21 23:48:37 -05:00
rebuild-all-versions.mjs	feat(decompiler): rebuild all versions — organized source/rvf separation, 100% coverage	2026-04-03 03:18:41 +00:00
run_mincut_bench.sh	feat: Add min-cut gating experiment scaffolding (WIP)	2026-02-20 06:52:43 +00:00
seed-brain-all.py	fix: ruvector-postgres v0.3.1 — audit bug fixes, 46 SQL functions, Docker publish (#227 )	2026-03-03 12:53:10 -05:00
seed-brain.rs	fix: ruvector-postgres v0.3.1 — audit bug fixes, 46 SQL functions, Docker publish (#227 )	2026-03-03 12:53:10 -05:00
seed-specialized.py	fix: ruvector-postgres v0.3.1 — audit bug fixes, 46 SQL functions, Docker publish (#227 )	2026-03-03 12:53:10 -05:00
setup-gcs-examples.sh	fix: ruvector-postgres v0.3.1 — audit bug fixes, 46 SQL functions, Docker publish (#227 )	2026-03-03 12:53:10 -05:00
sql-audit-v3.sql	fix: ruvector-postgres v0.3.2 — 100% audit pass (HNSW + hybrid fixes) (#230 )	2026-03-03 13:21:48 -05:00
swarm_train_15.sh	feat: 15-agent concurrent discovery swarm with 12 new data sources	2026-03-16 23:16:24 -04:00
sync-lockfile.sh	style: apply rustfmt across entire codebase	2026-01-28 17:00:26 +00:00
train-lora.py	fix: ruvector-postgres v0.3.1 — audit bug fixes, 46 SQL functions, Docker publish (#227 )	2026-03-03 12:53:10 -05:00
train_brain.sh	feat: brain trainer core module + auth fix — 56 discoveries ingested	2026-03-16 23:14:43 -04:00
training_orchestrator.sh	update: training orchestrator with improved PII stripping and color output	2026-03-16 23:21:01 -04:00
upvote_memories.py	fix(brain): defer sparsifier build on startup for large graphs	2026-03-24 12:29:52 +00:00
vote-boost.py	fix: ruvector-postgres v0.3.1 — audit bug fixes, 46 SQL functions, Docker publish (#227 )	2026-03-03 12:53:10 -05:00
wet-filter-inject.js	feat: DrAgnes + Common Crawl WET + Gemini grounding agents (#282 )	2026-03-23 10:12:50 -04:00
wet-full-import.sh	feat: DrAgnes + Common Crawl WET + Gemini grounding agents (#282 )	2026-03-23 10:12:50 -04:00
wet-job.yaml	feat: DrAgnes + Common Crawl WET + Gemini grounding agents (#282 )	2026-03-23 10:12:50 -04:00
wet-orchestrate.sh	feat: DrAgnes + Common Crawl WET + Gemini grounding agents (#282 )	2026-03-23 10:12:50 -04:00
wet-processor.sh	feat: DrAgnes + Common Crawl WET + Gemini grounding agents (#282 )	2026-03-23 10:12:50 -04:00

README.md

RuVector Automation Scripts

This directory contains automation scripts organized by purpose.

📁 Directory Structure

scripts/
├── README.md           # This file
├── benchmark/          # Performance benchmarking
├── build/              # Build utilities
├── ci/                 # CI/CD automation
├── deploy/             # Deployment scripts
├── patches/            # Patch files
├── publish/            # Package publishing
├── test/               # Testing scripts
└── validate/           # Validation & verification

🚀 Deployment

Scripts for deploying to production.

Script	Description
`deploy/deploy.sh`	Comprehensive deployment (crates.io + npm)
`deploy/test-deploy.sh`	Test deployment without publishing
`deploy/DEPLOYMENT.md`	Full deployment documentation
`deploy/DEPLOYMENT-QUICKSTART.md`	Quick deployment guide

Usage:

# Full deployment
./scripts/deploy/deploy.sh

# Dry run
./scripts/deploy/deploy.sh --dry-run

# Test deployment
./scripts/deploy/test-deploy.sh

📦 Publishing

Scripts for publishing packages to registries.

Script	Description
`publish/publish-all.sh`	Publish all packages
`publish/publish-crates.sh`	Publish Rust crates to crates.io
`publish/publish-cli.sh`	Publish CLI package
`publish/publish-router-wasm.sh`	Publish router WASM package
`publish/check-and-publish-router-wasm.sh`	Check and publish router WASM

Usage:

# Set credentials first
export CRATES_API_KEY="your-crates-io-token"
export NPM_TOKEN="your-npm-token"

# Publish all
./scripts/publish/publish-all.sh

# Publish crates only
./scripts/publish/publish-crates.sh

📊 Benchmarking

Performance benchmarking scripts.

Script	Description
`benchmark/run_benchmarks.sh`	Run core benchmarks
`benchmark/run_llm_benchmarks.sh`	Run LLM inference benchmarks

Usage:

# Run core benchmarks
./scripts/benchmark/run_benchmarks.sh

# Run LLM benchmarks
./scripts/benchmark/run_llm_benchmarks.sh

🧪 Testing

Testing and validation scripts.

Script	Description
`test/test-wasm.mjs`	Test WASM bindings
`test/test-graph-cli.sh`	Test graph CLI commands
`test/test-all-graph-commands.sh`	Test all graph commands
`test/test-docker-package.sh`	Test Docker packaging

Usage:

# Test WASM
node ./scripts/test/test-wasm.mjs

# Test graph CLI
./scripts/test/test-graph-cli.sh

✅ Validation

Package and build verification scripts.

Script	Description
`validate/validate-packages.sh`	Validate package configs
`validate/validate-packages-simple.sh`	Simple package validation
`validate/verify-paper-impl.sh`	Verify paper implementation
`validate/verify_hnsw_build.sh`	Verify HNSW build

Usage:

# Validate packages
./scripts/validate/validate-packages.sh

# Verify HNSW
./scripts/validate/verify_hnsw_build.sh

🔄 CI/CD

Continuous integration scripts.

Script	Description
`ci/ci-sync-lockfile.sh`	Auto-fix lock files in CI
`ci/sync-lockfile.sh`	Sync package-lock.json
`ci/install-hooks.sh`	Install git hooks

Usage:

# Install git hooks (recommended)
./scripts/ci/install-hooks.sh

# Sync lockfile
./scripts/ci/sync-lockfile.sh

🛠️ Build

Build utility scripts located in build/.

🩹 Patches

Patch files for dependencies located in patches/.

🚀 Quick Start

For Development

Install git hooks (recommended):
```
./scripts/ci/install-hooks.sh
```
Run tests:
```
./scripts/test/test-wasm.mjs
```

For Deployment

Set credentials:

export CRATES_API_KEY="your-crates-io-token"
export NPM_TOKEN="your-npm-token"

Dry run first:
```
./scripts/deploy/deploy.sh --dry-run
```
Deploy:
```
./scripts/deploy/deploy.sh
```

🔐 Security

Never commit credentials! Always use environment variables or .env file.

See deploy/DEPLOYMENT.md for security best practices.