ruvector

mirror of https://github.com/ruvnet/RuVector.git synced 2026-05-30 03:53:34 +00:00

Author	SHA1	Message	Date
ruvnet	91c4e79879	deploy(hailo): cross-build script — mention iter-215 ruvllm-bridge installer (iter 225) iter-215 added `install-ruvllm-bridge.sh` (closing ADR-178 Gap A's deploy-artifact gap for the third bridge). cross-build-bridges.sh already cross-compiles `ruvllm-bridge` (line 36's BINS array, since iter 122/128), but its trailing operator-hint at lines 141-145 only named the two daemon bridges' installers — operators copying the hint missed that ruvllm-bridge has its own installer too. Updated the hint to: - List all three installers - Note ruvllm-bridge ships no systemd unit (subprocess lifecycle, iter-215 design rationale) - Use the conventional "pick the bridges you need" phrasing, since most deploys won't use all three Validated: - bash -n on the script: parses clean - All three install-*.sh referenced exist (iter-216 verified the rename + file presence) Pure deploy-script docs hygiene; no code or unit-file change. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 22:44:50 -04:00
ruvnet	f6cae8114c	ci(hailo): mirror deny.toml advisory ignores into cargo-audit (iter 224) iter-219's workspace re-inclusion (closing ADR-178 Gap E) had a foreseeable-but-unspotted side effect on the iter-178 audit workflow: pre-iter-219 the hailo cluster crate had its own narrower Cargo.lock, so `cargo audit --deny warnings` saw only the deps that crate directly pulled in. Post-iter-219 with the workspace lock, cargo-audit reads the wider tree and surfaces three advisories that deny.toml had already ignored (iter 177 + iter 219): RUSTSEC-2024-0436 paste (unmaintained, transitive via candle/cpu-fallback) RUSTSEC-2025-0134 rustls-pemfile (transitive via tonic-tls) RUSTSEC-2025-0141 bincode 1.x (workspace-wide pin via rkyv et al.) cargo-audit and cargo-deny use separate config — deny.toml's [advisories] ignore list isn't honored by cargo-audit. The fix is to mirror the same three IDs into the CI workflow's `cargo audit` invocation as `--ignore` flags. Verified locally: Pre-fix: cargo audit --deny warnings → "error: 3 denied warnings" Post-fix: cargo audit --deny warnings --ignore <three> → exit 0 Each `--ignore` carries a backtick-comment naming the package + why it's transitive — same rationale as the deny.toml entries so the two config sources drift together if someone updates one. This isn't a real new vulnerability — these advisories existed in the workspace tree all along; iter-219 just exposed them to the cluster-crate audit step. iter-178's CI gate stays green without weakening; the substantive remediation (workspace-wide rkyv / candle-stack updates) belongs to a workspace-wide cleanup iter. No code change; CI config + workflow comment. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 22:39:47 -04:00
ruvnet	e953e29506	docs(hailo): fix two stale-stratigraphy doc comments (iter 223) Same class as ADR-178 §3.2 F (iter-217 ADR-167 collapse). Two inline doc comments still claimed pre-iter-163 / pre-iter-218 realities: 1. ruvector-hailo/src/lib.rs `has_model()` — said "Today this is always false — HEF loading isn't wired in yet". Iter 163 made the NPU path canonical (cognitum-v0 + iter-156b HEF), iter-176 added cpu-fallback automatic failover. Updated to reflect iter-163+ reality. 2. ruvector-hailo-cluster/src/error.rs module docstring — said "Maps cleanly onto ruvector_core::EmbeddingError once iteration 14 brings the path dep." iter-218 landed the ruvector-core path dep + EmbeddingProvider impl. Updated to describe the actual iter-218 wiring (ClusterError → RuvectorError::ModelInferenceError) plus the iter-209 is_terminal() helper that drives the retry-loop short-circuit. The third stale reference grep hit at cluster/lib.rs:874 is INSIDE the iter-218 commit's own comment quoting the old (pre-iter-218) doc text as evidence — that's correctly preserved as historical context, not a stale doc to fix. Validated: - cargo check: clean (doc-only, no type-system change) No code change; pure docs. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 22:37:32 -04:00
ruvnet	4694fb6f55	docs(hailo): README — document iter-208 client-side timeout vars (iter 222) iter-204 documented all worker-side env vars in deploy/ruvector-hailo.env.example. iter-208 added two CLIENT-side env vars (`RUVECTOR_CLIENT_CONNECT_TIMEOUT_MS` / `_RPC_TIMEOUT_MS`) read by `GrpcTransport::new()`, which is constructed by the bench/embed/stats CLIs and the three bridges — not by the worker. So they correctly don't belong in the worker .env, but they ARE operator-facing and were undocumented in the README's "Security & DoS hardening" section. Add a "Client-side tunables (iter 208)" subsection with a 2-row table after the systemd-restart-burst block. Explains: * Why these are separate from the worker env (client-side GrpcTransport, not worker config) * The 10s RPC default's relationship to iter-199's batch cap (256 items × ~14ms NPU = ~3.6s legit batch RPC; 10s leaves headroom) * How it composes with iter-182's 30s server-side request_timeout (client gives up first, server still has margin to surface a real hang) Validated: - 406 → 424 lines (+18) - Both env vars cross-checked against source: grpc_transport.rs has both `env::var("RUVECTOR_CLIENT_*")` reads from iter-208 - Markdown table parses (consistent with existing iter-180-184 table format) No code change; pure operator-facing docs. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 22:30:46 -04:00
ruvnet	bcbbe8a592	feat(hailo): example exercising HailoClusterEmbedder as EmbeddingProvider (iter 221) Closes ADR-178 Gap D (MEDIUM) iter-219 short-term. The audit flagged that no consumer in the workspace was actually using `HailoClusterEmbedder` as an `Arc<dyn EmbeddingProvider>` after iter-218 made it possible — so even though the trait impl compiled, the integration claim from ADR-167 §8.4 ("an app holding `BoxedEmbeddingProvider` swaps a Hailo cluster in with zero code changes") had no demonstration. `examples/hailo-cluster-as-provider.rs` does the demonstration in two modes: Default (no live workers — CI smoke): Builds a HailoClusterEmbedder against `null_transport()`, immediately wraps it as `Arc<dyn EmbeddingProvider>`, asserts name() == "ruvector-hailo-cluster" and dimensions() == 384, then calls embed("hello world") to confirm the trait method actually crosses into HailoClusterEmbedder::embed_one_blocking (NullTransport refuses by design — that's the expected error path; the assertion is on the error text, not panic). Proves iter-218 + iter-219 type wiring still composes; runs in <1s. Live (RUVECTOR_HAILO_WORKERS=<csv>): Same construction but with GrpcTransport, embeds an N-doc corpus (default 50, tunable via RUVECTOR_HAILO_CORPUS_N) through the trait method, reports ingest QPS, runs a self-similarity sanity check (cosine of doc[0] against itself should be ≈1.0 and rank top-1 in the corpus). Closes ADR-178 §3.2 D's "5k-doc corpus" recommendation in spirit (smaller default for quick smoke; operator can scale up via env). The example explicitly documents which iter unblocked which line ("Pre-iter-218 this line would have said 'the trait EmbeddingProvider is not implemented for HailoClusterEmbedder'") so a future reader can audit the integration history through the code. 
Validated: - cargo check --example hailo-cluster-as-provider: clean (6s) - Compile success IS the correctness proof — pre-iter-218 the `Arc<dyn EmbeddingProvider> = Arc::new(cluster)` line would have refused at the type-system level. It now compiles. ADR-178 Gap D status: SHORT-TERM SHIPPED (example exists). The iter-220 mcp-brain client integration remains as separate-ADR follow-up work per ADR-178 §3.2 D's recommendation. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 22:28:20 -04:00
ruvnet	a23e629fc9	docs(hailo): disambiguate ruview-csi-bridge as transport-only (iter 220) Closes ADR-178 Gap C (MEDIUM) short-term. The bridge's module docstring and `summary_to_text` doc previously suggested it produced embeddings useful for "presence / motion / pose downstream consumers" — implying ADR-171's pose-semantic pipeline. ADR-178 §3.2 C audited the actual code path: * `summary_to_text` (ruview-csi-bridge.rs:116) packs the 20-byte ADR-018 header into a fixed-template NL string (channel, rssi, node_id, antennas, subcarriers). * The I/Q payload at `bytes 20..` is parsed for length but otherwise dropped. * Cosine embeddings of the resulting strings cluster by `(channel, rssi-bucket, node_id)`, NOT by anything related to actual WiFi-DensePose pose content. This is fine — the bridge is correctly named and useful for telemetry indexing — but ADR-171's pipeline diagram (`CSI → preprocess → HEF → pose tensor`) implies it does pose semantics, which it doesn't. Operators reading this file or ADR-171 got confused. Two doc updates: 1. Module docstring — new "**Important: this bridge is *not* WiFi-DensePose pose embedding**" section explicitly stating the telemetry-indexing scope and pointing to the deferred work (csi-pose-bridge needs a pose HEF, host-side I/Q preprocessing, and a `HailoPipeline<I, O>` generalization — multi-month, separate ADR per ADR-178 §3.2 C's long-term recommendation). 2. `summary_to_text` doc — removed the misleading "presence / motion / pose downstream consumers" phrasing; replaced with a "Note (iter 220)" block clarifying which fields drive the similarity surface. ADR-178 Gap C status: SHORT-TERM CLOSED. Long-term work (the actual pose-semantic bridge) remains tracked as a separate-ADR follow-up. 
Validated: - cargo check: clean - RUSTDOCFLAGS="-D missing-docs" cargo doc --bin ruview-csi-bridge: clean (matches the iter-178 audit CI step) - No code change; pure doc disambiguation Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 22:09:18 -04:00
ruvnet	a9f81a9243	build(workspace): rejoin hailo crates + ruvector-mmwave (iter 219) Closes ADR-178 Gap E (HIGH; folded into Gap B). Iter-218 landed the ruvector-core path dep + EmbeddingProvider impls — the structural blocker preventing workspace re-inclusion. This iter does the mechanical part: Root Cargo.toml: - Removed `crates/ruvector-hailo`, `crates/hailort-sys`, `crates/ruvector-hailo-cluster` from `[workspace.exclude]`. - Added them + `crates/ruvector-mmwave` (also previously standalone) to `[workspace.members]`. Per-crate Cargo.toml: - Stripped `[workspace]` standalone declarations from all four crates (hailort-sys, ruvector-hailo, ruvector-hailo-cluster, ruvector-mmwave). - Comments updated to reference the iter-219 rejoin + ADR-178 Gap E closure. Per-crate Cargo.lock: - Removed (`git rm`) — parent workspace's Cargo.lock is now canonical for the entire tree. CI's `cargo audit` / `cargo deny check` steps still work from the cluster subdirectory; they walk up to find the workspace root. deny.toml (both hailo crates): - Workspace re-inclusion surfaced 2 advisories that were previously hidden by the narrower per-crate dep tree: RUSTSEC-2025-0141 (bincode 1.x unmaintained) RUSTSEC-2026-0097 (rand unsound w/ custom logger) - Added to `ignore` list with a comment noting these are workspace-wide concerns, not hailo-specific. They'll be addressed in a workspace-wide remediation iter; ignoring here keeps the per-crate audit step green so the iter-202 CI gate doesn't break on this rejoin. Validated: - cargo check --workspace: clean (27s; warnings are pre-existing in unrelated crates: ruvector-graph-node, rvagent-cli, ruvector-scipix, mcp-brain-server, etc.) 
- cargo deny check (cluster): advisories ok, bans ok, licenses ok, sources ok - cargo deny check --all-features (hailo): same — all four ok - Cluster integration sweep --features tls --test-threads=1: 23 suites, all green; 120 lib tests pass with TLS feature - 4 newly-included workspace members all build with default features on x86 (no Pi-only deps pulled in) Effect: `cargo build --workspace` from the repo root now exercises the full hailo stack. A workspace-wide refactor (ruvector-core trait change, security advisory rebuild, clippy bump) can no longer silently miss the hailo crates the way ADR-178 §3.2 E flagged. ADR-178 Gap E status: CLOSED. Gap B status: PARTS 1 + 2 SHIPPED; the only remaining `--backend hailo` ruvector-cli flag wiring is a follow-up consumer-side iter. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 21:47:03 -04:00
ruvnet	a82fae6bab	feat(hailo): impl EmbeddingProvider for both hailo embedders (iter 218) Closes ADR-178 Gap B (HIGH) part 1. The headline integration claim from ADR-167 §2.5 / §8.4 — that an app holding `Arc<dyn EmbeddingProvider>` could transparently swap a single-Pi HailoEmbedder for a fleet HailoClusterEmbedder — was never delivered. Iter-178 audit found: * Neither hailo crate declared a ruvector-core dep. * `crates/ruvector-hailo-cluster/src/lib.rs:140-143` honestly admitted the gap in a doc comment ("Implements `EmbeddingProvider` once iteration 14 brings the path dep on `ruvector-core`"). That iter never landed. * `crates/ruvector-hailo/src/lib.rs:396-405` had a no-op "signature parity" test that asserted only `T: Send + Sync`, never that the impl actually existed. Changes: 1. Add `ruvector-core` path dep to both hailo crates with `default-features = false` so the reqwest / ort / hnsw stack stays out of the Pi build. Only the trait + RuvectorError surface is needed. 2. `impl EmbeddingProvider for HailoEmbedder` (ruvector-hailo). ~10 LOC, delegates to existing inherent methods. `embed` folds `HailoError → RuvectorError::ModelInferenceError`. 3. `impl EmbeddingProvider for HailoClusterEmbedder` (ruvector-hailo-cluster). Same shape; `embed` folds `ClusterError → ModelInferenceError`. `name()` returns the static `"ruvector-hailo-cluster"` since a cluster is a fleet, not a single named device. 4. Replace the no-op signature-parity test with a real impl-bound static assertion: `fn assert_impl<T: EmbeddingProvider>() {}` `assert_impl::<HailoEmbedder>();` This now compile-fails if either the trait drifts or our impl breaks — catching the same regression class ADR-178 flagged. 
Validated: - hailo lib tests : 21/21 pass (signature_parity now real impl-bound, was no-op) - cluster lib tests : 120/120 pass with --features tls (114 without tls — feature gating accounts for the 6 TLS-only tests) - full integration sweep --test-threads=1: 23 suites, all green - cargo build --release on both crates: clean, no extra deps pulled in (ruvector-core compiles default-features-off in ~6 s additional) What this does NOT do (deferred to part 2): - Workspace re-inclusion (ADR-178 Gap E folds into B). The hailo crates stay in `[workspace.exclude]` for now because hailort-sys only links libhailort on Pi 5 + AI HAT+; rejoining requires confirming the no-feature default still cargo build --workspace cleanly. Saved for a focused iter so this one can ship the trait impl without a workspace-config blast radius. - `ruvector-cli --backend hailo` flag wiring. ADR-167 §2.3 plan; unblocked by this iter but not in scope. ADR-178 Gap B status: PART 1 SHIPPED (impl exists). Part 2 (workspace inclusion + cli flag) tracked for a follow-up iter. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 21:40:26 -04:00
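The impl-bound static assertion from iter 218 can be sketched in isolation. This is a minimal illustration, not the crate's code: the `EmbeddingProvider` trait below is a hypothetical stand-in (the real `ruvector-core` trait signature is not shown in this log), both embedder structs are empty placeholders, and the `"ruvector-hailo"` name is invented for the example. Only `"ruvector-hailo-cluster"` and the 384 dimension come from the commit messages.

```rust
use std::sync::Arc;

// Hypothetical stand-in for ruvector-core's trait; the real signature may differ.
trait EmbeddingProvider: Send + Sync {
    fn embed(&self, text: &str) -> Result<Vec<f32>, String>;
    fn dimensions(&self) -> usize;
    fn name(&self) -> &str;
}

// Placeholder types standing in for the two hailo embedders.
struct HailoEmbedder;
struct HailoClusterEmbedder;

impl EmbeddingProvider for HailoEmbedder {
    fn embed(&self, _text: &str) -> Result<Vec<f32>, String> {
        Ok(vec![0.0; self.dimensions()]) // toy output, not a real embedding
    }
    fn dimensions(&self) -> usize {
        384
    }
    fn name(&self) -> &str {
        "ruvector-hailo" // placeholder name, assumed for the example
    }
}

impl EmbeddingProvider for HailoClusterEmbedder {
    fn embed(&self, _text: &str) -> Result<Vec<f32>, String> {
        // Mimics a transport that refuses by design when no workers exist.
        Err("no workers configured".to_string())
    }
    fn dimensions(&self) -> usize {
        384
    }
    fn name(&self) -> &str {
        "ruvector-hailo-cluster"
    }
}

// Compiles only if T implements the trait, so a missing or drifted impl
// becomes a build failure instead of a silently passing no-op test.
fn assert_impl<T: EmbeddingProvider>() {}

fn main() {
    assert_impl::<HailoEmbedder>();
    assert_impl::<HailoClusterEmbedder>();

    // Either embedder can sit behind the same trait object.
    let provider: Arc<dyn EmbeddingProvider> = Arc::new(HailoClusterEmbedder);
    assert_eq!(provider.name(), "ruvector-hailo-cluster");
    assert_eq!(provider.dimensions(), 384);
    assert!(provider.embed("hello world").is_err());
}
```

Unlike a `T: Send + Sync` bound, the `assert_impl` call names the trait itself, so removing either `impl` block makes the program fail to compile.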
ruvnet	f644f31de9	docs(adr): collapse ADR-167 stale stratigraphy to single status (iter 217) Closes ADR-178 Gap F (MEDIUM). ADR-167 had three nested status snapshots stacked on top of the iter-163 NPU-default banner — "Earlier (iter 134/135) snapshot — CPU fallback only", "HEF model surgery (iter 139)", "Earlier (iter 116) snapshot" — each from a different point in the project's history. An unfamiliar operator opening the master ADR had to walk past three older worldviews to find what's true today. Three changes: 1. Replaced the stratified Status section with a single clean iter-213+ block: "NPU acceleration is the production default since iter 163. ~70 embeds/sec/worker, p50=55-57 ms, p99=86-90 ms, 9.6× over cpu-fallback. ADR-176 tracks the EPIC; iters 174-216 layer security/DoS/OOM hardening." Points readers needing chronology to §9 History. 2. Updated step-10 row in §5 Implementation plan from "exits clean with NotYetImplemented (gate is HEF compilation only)" to the iter-145+ reality: "startup self-test embed ok dim=384 → 7 DoS gates logged → serving addr=0.0.0.0:50051". The NotYetImplemented exit was true at iter 12; iter 163 made NPU the default, iter 145 added the self-test, iters 174-216 added the hardening surface — all unmentioned in the prior text. 3. Hoisted the three stripped snapshot blocks (lines 28-275 of the prior version) verbatim into a new §9 History appendix at the bottom. Preserves the full chronological story for anyone auditing the project's evolution; cross-references that depend on these stratified snapshots are flagged as migrating to ADR-176 (the HEF EPIC) where they correctly belong. ADR-178 Gap F status: CLOSED. 
Validated: - 612 → 638 lines (+26 net = History block header offset + Status expansion; chronological content preserved verbatim) - Section ordering: Status → §1-§8 (Decision/Plan/§8 Multi-Pi added late) → §7 References → §9 History - All deep links to specific iters in §9 still resolvable - No code change; pure ADR docs hygiene Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 21:33:37 -04:00
ruvnet	6e5e179ee1	deploy(hailo): rename install-bridge.sh → install-mmwave-bridge.sh (iter 216) Closes ADR-178 Gap H (LOW). The mmwave-bridge installer was named unqualified `install-bridge.sh` since iter 106 — fine when there was only one bridge, increasingly misleading after iter 123 added ruview-csi-bridge and iter 124 added ruvllm-bridge. ADR-178 §3.2 H recommended folding the rename into Gap A (iter 215); shipped as its own focused commit so the rename is git-traceable separately. Used `git mv` so blame history follows the file. Updated all 7 references across the deploy tree: - install-ruview-csi-bridge.sh (companion-of comment) - install-mmwave-bridge.sh (self-reference in usage line) - install-ruvllm-bridge.sh (companion-of comment) - ruvector-mmwave-bridge.env.example (udev rule provenance) - ruvector-mmwave-bridge.service (User=/Group= comment + udev note) - 99-radar-ruvector.rules (provenance comment) - cross-build-bridges.sh (operator hint at line 144) ADR-178's references to `install-bridge.sh` (lines 83, 96, 337-342) are intentionally preserved — they're the historical gap evidence the analysis relies on. Updating them would erase the rationale for this commit. Validated: - bash -n on install-mmwave-bridge.sh + cross-build-bridges.sh - systemd-analyze verify on ruvector-mmwave-bridge.service (only "binary missing" error, expected on dev box) - All three install scripts now consistently named: install-mmwave-bridge.sh (iter 106 + iter 216 rename) install-ruview-csi-bridge.sh (iter 123) install-ruvllm-bridge.sh (iter 215) ADR-178 Gap H status: CLOSED. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 21:26:21 -04:00
ruvnet	2acb56979e	deploy(hailo): ruvllm-bridge install script + env example (iter 215) Closes ADR-178 Gap A (HIGH). The other two bridges shipped with deploy automation since iter 106 (mmwave) / iter 123 (csi), but ruvllm-bridge had no installer or env example — operators had to hand-build the system user, drop the binary, and write the env file themselves. iter 207's commit message specifically called this out as a known gap. Two artifacts shipped: install-ruvllm-bridge.sh Mirror of install-ruview-csi-bridge.sh shape — creates `ruvector-ruvllm` system user (no home, no shell), drops /usr/local/bin/ruvllm-bridge, populates /etc/ruvllm-bridge.env from the example, creates /var/lib/ruvector-ruvllm state dir at 0750. Idempotent. ruvllm-bridge.env.example Operator-facing template with the three required env vars (WORKERS, FINGERPRINT, DIM) and EXTRA_ARGS for the iter-187/188/189 TLS / mTLS flag set. Documents `--tls-domain` explicitly (the iter-207 fix the csi-bridge env got). Lifecycle difference vs the other two bridges: ruvllm-bridge is a stdin/stdout JSONL adapter, not a UDP/serial daemon. It's spawned by the parent ruvllm process, reads requests on stdin, writes responses on stdout, exits on EOF. systemd's daemon model (start/stop/restart-on-failure) doesn't fit, so this iter deliberately ships NO `.service` unit. The install script's exit message documents the parent-managed invocation pattern with a copy-paste-able example. Validated: - bash -n on install script: parse clean - env file `set -a; . file; set +a`: parse clean - install script chmod 0755 + executable bit set - All three bridges now have install + env-example artifacts; only mmwave + csi have systemd units (correct — the bridge architectures genuinely differ) ADR-178 Gap A status: CLOSED. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 21:24:59 -04:00
ruvnet	81c22c16f2	docs(adr): ADR-178 — ruvector/ruview hailo cluster integration gap analysis Captures the gap analysis the user requested (goal-planner agent research, 459 lines, evidence-grounded with file:line citations matching the ADR-172/iter-176-EPIC house style). Eight gaps identified, three at HIGH severity: Gap A ruvllm-bridge missing deploy artifacts (install-*.sh, *.service, *.env.example, README mention) — iter 207 specifically called this out; mmwave + ruview-csi each ship complete bundles, ruvllm doesn't. Gap B ruvector-core EmbeddingProvider not wired — neither hailo crate declares a ruvector-core dep; ADR-167 §2.5/§8.4's headline integration promise is unmet; the cluster lib.rs:140-143 doc comment literally admits it; the parity test at lib.rs:396-405 is a no-op (Send + Sync only). Gap C ruview-csi-bridge embeds telemetry, not pose-semantic data — summary_to_text:95-108 packs only the 20-byte ADR-018 header as a string and drops the I/Q payload; the bridge does telemetry indexing, not the WiFi-DensePose pose-semantic embedding ADR-171 implies. Remediation list outlines six iter-sized follow-ups (Gap A first since it has the smallest blast radius — pure deploy-artifact work at parity with the existing two bridges). Three larger items (csi-pose-bridge rewrite, mcp-brain client, LoRaTransport) correctly flagged for separate ADRs rather than scope creep here. No code change in this commit; pure planning artifact. The ADR is in the standard docs/adr/ format with frontmatter relating it to ADR-167/168/171/172/173/176/177. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 21:23:22 -04:00
ruvnet	d522ca3701	sec(hailo): restore verify_files doc + fix intra-doc link (iter 214) iter-211's refactor introduced a small docs regression: the multi-paragraph doc comment that originally explained verify_files ended up attached to the new private read_with_cap helper, leaving verify_files (a public function) with no doc. The hailo-backend audit CI step `RUSTDOCFLAGS="-D missing-docs" cargo doc` would have flagged this on the next run. Also caught a follow-up: my first repair pass referenced `[read_with_cap]` as an intra-doc link, but read_with_cap is private — rustdoc emits `rustdoc::private_intra_doc_links` when generating public API docs. Switched to a plain code-style mention ("the private read_with_cap helper") so the link warning clears without `--document-private-items`. Validated: - `cargo check --release` clean (was 1 missing-docs warning) - `RUSTDOCFLAGS="-D missing-docs" cargo doc --no-deps --lib` clean (matches the doc-warnings CI step in .github/workflows/hailo-backend-audit.yml) - lib tests still 120/120 (semantics unchanged) - integration sweep all green No production code change; pure docs hygiene catching the iter-211 regression before it would have failed CI. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 21:21:49 -04:00
ruvnet	d1c2af0c84	Merge origin/main into hailo-backend (sync iter 213+ → main) Pulls in `c15326d04` (fix(ruvllm): pin optionalDependencies to 2.0.1) from main. Single-file change in npm/packages/ruvllm/package.json, zero overlap with the hailo-backend crates (crates/ruvector-hailo*, crates/hailort-sys, crates/ruvector-mmwave). Conflict-free 3-way merge verified via `git merge-tree` before commit. Iter 213 OOM-bounding sweep + the 38-commit security/audit run (iter 174-213) on this branch is unaffected.	2026-05-03 21:19:34 -04:00
ruvnet	a838d9e9e9	sec(hailo): cap vocab.txt + config.json file reads (iter 213) Continues iter-210/211/212's OOM-bounding sweep across all operator-controlled file paths. Three remaining boot-time reads in the ruvector-hailo crate: vocab.txt (tokenizer.rs::from_vocab_file) - all-MiniLM-L6-v2: 232 KB - XLM-RoBERTa large: ~5 MB ceiling - cap: 16 MB (~70× legit headroom) config.json (host_embeddings.rs + cpu_embedder.rs) - BERT-family: <1 KB typically - cap: 64 KB (64× legit headroom) Same threat model as iter-210 (manifest), iter-211 (sig + pubkey), iter-212 (PEM): operator-controlled paths set via env-driven model dir. A misconfig pointing model_dir at /var/log/* or a binary blob would otherwise OOM the worker at boot when these files load. config.json caps in BOTH host_embeddings.rs (NPU path) and cpu_embedder.rs (cpu-fallback path) — duplicated rather than factored because the two crates have different error types (HailoError variants) and the cap value is identical anyway. Validated: - 2 new tokenizer test cases (lib tokenizer::tests): from_vocab_file_rejects_oversized — 32 MB fixture, asserts rejection with "16 MB cap" or "iter 213" in error from_vocab_file_accepts_small_vocab — mini_vocab() loads cleanly, locking in that the cap doesn't block legit use - hailo lib tests: 19 → 21 (+2) - hailo cpu-fallback tests: still 27 (unchanged — cap path is only reached on oversize, which the test fixtures don't trigger) - cluster integration sweep --test-threads=1: all 23 suites green Coverage trail now complete for cluster + hailo operator-path reads: iter 210 FileDiscovery manifest (1 MB) iter 211 manifest sig + pubkey (16 KB each) iter 212 TLS PEM via read_pem (1 MB; gates 5 paths) iter 213 vocab.txt + config.json (16 MB / 64 KB) Pi worker untouched in code; the gates fire at boot before any RPC serves traffic. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 21:07:52 -04:00
ruvnet	d7016cd819	sec(hailo): cap TLS PEM file reads at 1 MB (iter 212) Continues iter-210/211's pattern of OOM-bounding operator-controlled file paths read at boot. `tls::read_pem` is the single chokepoint for all five PEM-loading paths in the codebase (server cert, server key, client cert, client key, client CA bundle), so capping it once gates all of them. Same threat model as iter-210 (FileDiscovery manifest) and iter-211 (manifest_sig sig + pubkey): operator-controlled paths set via env var (RUVECTOR_TLS_CERT, _KEY, _CLIENT_CA, etc.) — a misconfig pointing one of these at /var/log/syslog or a binary blob would OOM the worker at boot before rustls ever sees the bytes. 1 MB cap is ~100× a full chain-with-intermediates legitimate PEM (~30 KB peak). Validated: - Existing tls tests: 4/4 still pass (domain_from_address coverage untouched) - 2 new test cases: read_pem_rejects_oversized_file — 2 MB pem-shaped fixture, asserts size-cap rejection with "iter 212" + "byte cap" read_pem_accepts_small_file — 30-byte legit-shape PEM still reads cleanly, locking in that the cap doesn't accidentally block legit traffic - lib tests: 118 → 120 (+2) - full integration sweep --test-threads=1: all suites green Coverage now: every operator-controlled file path on the worker boot/RPC paths is OOM-bounded. iter-210 (manifest), iter-211 (sig + pubkey), iter-212 (5× PEM via read_pem) — the audit trail matches the deploy artifact set. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 21:01:33 -04:00
ruvnet	0aabc6272e	sec(hailo): cap manifest_sig file reads (iter 211) Parallel to iter-210's FileDiscovery cap. `manifest_sig::verify_files` read three operator-controlled paths with no size cap: - manifest (1 MB legit ceiling, same as iter-210) - signature (ed25519 ~64 B; 16 KB ceiling = 180× legit) - pubkey (ed25519 ~32 B hex; 16 KB ceiling = same headroom) A misconfig (operator pointing /etc/ruvector-hailo/workers.sig at /var/log/syslog) or an attacker with write access to that directory could OOM the worker at boot during signature verification — the read happens before any sig validation can fail. iter-210 closed the parallel hole on the manifest path itself; this iter closes the remaining two. Implementation factors a small `read_with_cap(path, cap, label)` helper so all three reads share the same stat-then-read pattern. The caps are constants in the function rather than env vars because: - Legit values are tiny + fixed (ed25519 is a known size) - There's no operational need to tune them - Hardcoding keeps the gate one less surface to misconfigure Validated: - Existing sig tests pass: 6/6 (no behavior change for in-spec inputs) - 2 new test cases: verify_files_rejects_oversized_signature — 64 KB sig fixture verify_files_rejects_oversized_pubkey — 64 KB pk fixture Both assert the rejection text mentions the right label ("signature"/"pubkey") + "iter 211" for traceability. - lib tests: 116 → 118 (+2) - full integration sweep: all 23 suites green No production code change to the worker's hot path; the gate is operator-side at boot during the manifest signature check. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 20:57:41 -04:00
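The stat-then-read pattern behind `read_with_cap(path, cap, label)` can be sketched with the standard library alone. This is an assumed shape, not the crate's implementation: the real helper returns the crate's own error type and includes the iter number in its message, while this version uses `std::io::Error` and invented message text. The bounded `take` read after the stat is an extra precaution added here, not something the commit describes.

```rust
use std::fs;
use std::io::{self, Read};
use std::path::Path;

// Sketch of a size-capped file read; the error type and message wording
// are assumptions for illustration.
fn read_with_cap(path: &Path, cap: u64, label: &str) -> io::Result<Vec<u8>> {
    // Stat first so an oversized file is rejected before any bytes are buffered.
    let len = fs::metadata(path)?.len();
    if len > cap {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("{label} is {len} bytes, exceeds the {cap} byte cap"),
        ));
    }
    // Bound the read too, in case the file grows between the stat and the read.
    let mut buf = Vec::new();
    fs::File::open(path)?.take(cap + 1).read_to_end(&mut buf)?;
    if buf.len() as u64 > cap {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("{label} grew past the {cap} byte cap while reading"),
        ));
    }
    Ok(buf)
}

fn main() -> io::Result<()> {
    const SIG_CAP: u64 = 16 * 1024; // 16 KB ceiling, as in the iter-211 message

    let dir = std::env::temp_dir();
    let small = dir.join("read_with_cap_small.sig");
    let big = dir.join("read_with_cap_big.sig");
    fs::write(&small, vec![0u8; 64])?; // ed25519-signature-sized fixture
    fs::write(&big, vec![0u8; 64 * 1024])?; // 64 KB oversized fixture

    assert_eq!(read_with_cap(&small, SIG_CAP, "signature")?.len(), 64);
    let err = read_with_cap(&big, SIG_CAP, "signature").unwrap_err();
    assert!(err.to_string().contains("signature"));

    fs::remove_file(&small)?;
    fs::remove_file(&big)?;
    Ok(())
}
```

Because the cap is checked from metadata, a path that accidentally points at a multi-gigabyte log is refused after a single `stat` call rather than after the allocation has already happened.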
ruvnet	d277aa797c	sec(hailo): cap FileDiscovery manifest size at 1 MB (iter 210) Real audit find: `FileDiscovery::discover` called `std::fs::read_to_string` on the operator's manifest path with no size cap. A pathologically large file (operator misconfig pointing at /var/log/* or a binary blob, or an attacker-corrupted /etc/ruvector-hailo/workers.txt with write access) would OOM the worker at boot — and the OOM happens BEFORE the iter-107 ed25519 signature verification, so even signed-only deploys are vulnerable to "wrong file pointed at" misconfigs. Fix: stat the file first; refuse if it exceeds 1 MB. Legitimate fleet manifests are one `name = host:port` per worker (~100 B/line); even a 1000-worker tailnet fits in <100 KB. 1 MB is 10× legit headroom + a clean error message that names the cap and links to the iter for traceability. The cap fires BEFORE the iter-107 signature check so a giant file fails fast — verifying a 1 GB "signed" manifest would be slow even though it'd ultimately reject. Validated: - Unit tests added (lib discovery::tests): file_discovery_rejects_oversized_manifest — writes a 2 MB fixture, asserts ClusterError::Transport with the cap rejection text mentioning "iter 210" + "byte cap" file_discovery_accepts_small_manifest — well-under-cap manifest parses to 2 WorkerEndpoints, locking in that the cap doesn't accidentally block legitimate use - lib tests: 114 → 116 (+2) - full integration sweep --test-threads=1: 13 suites, all green No production code change to the worker itself; the FileDiscovery gate is operator-side at boot. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 20:53:10 -04:00
ruvnet	ef690ebdef	sec(hailo): short-circuit retry loop on terminal errors (iter 209) Real audit find: `embed_one_blocking_with_request_id` retried EVERY error up to MAX_DISPATCH_RETRIES=2 (3 total attempts). For transient failures (network blip, worker crash, deadline_exceeded) that's correct. For deterministic errors that won't change on retry, it makes things actively worse: iter-180 byte cap (OutOfRange) : 3 hammered worker calls, all guaranteed to fail identically. Each wastes worker NPU + bandwidth. iter-199 batch cap (InvalidArgument) : same. iter-104/200 rate limit (ResourceExhausted): retrying makes things worse — every retry consumes another token from the same peer's bucket via the interceptor + iter-200 check_n debit, deepening the rate-limit hole the caller is already in by 3×. DimMismatch / FingerprintMismatch : worker is structurally wrong; retry can't help. Add `ClusterError::is_terminal()` that string-matches the wrapped gRPC Status (tonic's Display includes "status: <Code>") for the three deterministic codes plus the two structural variants. Wire into the retry loop: terminal errors return immediately; transient errors keep their existing retry behavior. The string-match approach was chosen over plumbing `tonic::Code` through ClusterError::Transport because the latter would touch ~30 call sites + ripple through ClusterError's Display impl. The match patterns are stable (tonic 0.12 Status::code() Display is "status: <Code>" verbatim) and unit-tested with 6 cases below to catch any future drift. 
Validated: - lib tests : 108 → 114 (+6 error::tests::is_terminal_*) - full sweep (--features tls, --test-threads=1): all 23 suites green (lib + 22 integration suites unchanged in pass count) - test cases cover: OutOfRange (byte cap) ✓ InvalidArgument (batch cap) ✓ ResourceExhausted (rate limit) ✓ DimMismatch (structural) ✓ FingerprintMismatch (structural) ✓ DeadlineExceeded / Cancelled / Internal ← NOT terminal, legit retry candidates ✓ NoWorkers / AllWorkersFailed ← aggregate, not per-attempt ✓ Behavior change for callers: Before: 3-attempt retries on byte/batch/rate-limit errors, ~3× extra wasted server work + worse rate-limit damage. After: immediate clean error, server work drops to 1 attempt, rate-limit token consumption matches the original 1-RPC-1-token contract. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 20:49:21 -04:00
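The short-circuit described in iter 209 can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: the `ClusterError` enum here is a reduced stand-in, `dispatch_with_retry` is a hypothetical helper, and the `"status: <Code>"` pattern is taken from the commit text rather than verified against tonic.

```rust
/// Reduced stand-in for the cluster error type; variant names follow the commit text.
#[derive(Debug, Clone, PartialEq)]
enum ClusterError {
    Transport(String),
    DimMismatch,
    FingerprintMismatch,
}

impl ClusterError {
    /// True for errors that cannot change on retry.
    fn is_terminal(&self) -> bool {
        match self {
            // Structural variants: the worker is wrong, not the attempt.
            ClusterError::DimMismatch | ClusterError::FingerprintMismatch => true,
            // Assumed format: the wrapped status text contains "status: <Code>".
            ClusterError::Transport(msg) => ["OutOfRange", "InvalidArgument", "ResourceExhausted"]
                .iter()
                .any(|code| msg.contains(format!("status: {code}").as_str())),
        }
    }
}

const MAX_DISPATCH_RETRIES: usize = 2;

/// Runs `attempt` up to 1 + MAX_DISPATCH_RETRIES times, returning early on terminal errors.
fn dispatch_with_retry<T>(
    mut attempt: impl FnMut() -> Result<T, ClusterError>,
) -> Result<T, ClusterError> {
    let mut last = None;
    for _ in 0..=MAX_DISPATCH_RETRIES {
        match attempt() {
            Ok(v) => return Ok(v),
            Err(e) if e.is_terminal() => return Err(e),
            Err(e) => last = Some(e),
        }
    }
    Err(last.expect("loop body runs at least once"))
}
```

With this shape a byte-cap rejection costs one worker call, while a `DeadlineExceeded` still gets all three attempts.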
ruvnet	16f7da747a	sec(hailo): client rpc_timeout default mismatched with iter-199 batch (iter 208) Real audit find: iter-199 raised the worker's `max_batch_size` to 256 (rejecting larger batches). The cluster client's `GrpcTransport::new` default rpc_timeout was 2 s — set in iter 92 when the only RPC was unary embed at ~14 ms each. With iter-199's batched streaming, a single legitimate embed_stream RPC at b=256 needs 256 items × ~14 ms NPU = ~3.6 s of server-side time. The 2 s client deadline cuts it off mid-flight, guaranteeing `Status::deadline_exceeded` for every b≥128 batch even though the worker would have completed the work cleanly. The iter-182 30 s server-side `request_timeout` never gets a chance to fire because the client gives up first. Fix: bump default rpc_timeout to 10 s (2.7× headroom over the b=256 worst case, still well under iter-182's 30 s outer bound — so a real hung worker still surfaces to the client within its own timeout). Make both connect + rpc timeouts env-tunable for ops: RUVECTOR_CLIENT_CONNECT_TIMEOUT_MS default 5000, floor 100 RUVECTOR_CLIENT_RPC_TIMEOUT_MS default 10000, floor 100 Floors prevent a misconfig (e.g. =0) from immediately failing every RPC. iter-179's streaming saturation sweep peaked at b=16 (224 ms NPU time) so didn't catch this — the bug only manifests at higher batch sizes that the iter-199 ceiling first made viable. Validated: - Both feature-combo builds clean - Cluster integration tests still pass: tls_roundtrip : 2/2 cluster_load_distribution: 12/12 - Smoke against Pi worker with overrides set: RUVECTOR_CLIENT_RPC_TIMEOUT_MS=15000 RUVECTOR_CLIENT_CONNECT_TIMEOUT_MS=8000 → bench runs cleanly (env vars accepted, no parse error) - Clippy clean (-D warnings) No production code changed for the worker; pure transport-side correction. Pi worker untouched. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 20:45:19 -04:00
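A sketch of the default-plus-floor parsing iter 208 describes for the two client timeouts. `timeout_from_raw` and `rpc_timeout_from_env` are hypothetical helpers; falling back to the default on an unparseable value is an assumption, since the commit only states the defaults and floors.

```rust
use std::time::Duration;

/// Resolves a millisecond timeout from an optional raw string.
/// Missing or unparseable input falls back to `default_ms` (assumed behavior);
/// the result is then clamped up to `floor_ms` so `=0` cannot fail every RPC.
fn timeout_from_raw(raw: Option<&str>, default_ms: u64, floor_ms: u64) -> Duration {
    let ms = raw
        .and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(default_ms)
        .max(floor_ms);
    Duration::from_millis(ms)
}

/// Reads RUVECTOR_CLIENT_RPC_TIMEOUT_MS with the default 10000 / floor 100 from the commit.
fn rpc_timeout_from_env() -> Duration {
    let raw = std::env::var("RUVECTOR_CLIENT_RPC_TIMEOUT_MS").ok();
    timeout_from_raw(raw.as_deref(), 10_000, 100)
}
```

Keeping the parsing in a pure function lets tests cover the floor without touching process-global env state.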
ruvnet	88a4ea429c	docs(hailo): csi-bridge env: document missing --tls-domain (iter 207) Audit of bridge env examples found a docs inconsistency: - mmwave-bridge.env.example : listed all 4 TLS flags (--tls-ca, --tls-domain, --tls-client-cert, --tls-client-key) - ruview-csi-bridge.env.example: listed only 3, omitting --tls-domain Both bridge binaries parse `--tls-domain` (verified: src/bin/ruview-csi-bridge.rs:135 + src/bin/mmwave-bridge.rs:121). When the cluster's worker cert SAN is a DNS name (e.g. server.crt issued for "worker.local") and the bridge dials via IP (the RUVECTOR_CSI_WORKERS default 100.77.59.83:50051), rustls validates the cert SAN against the expected server name, which defaults to the dialed address "100.77.59.83" if --tls-domain isn't set. That fails the hostname check and the bridge can't reach the cluster. Without the docs, an operator hitting this had no obvious way to fix it short of grep'ing the binary. The csi-bridge env example now mirrors the mmwave-bridge layout: lists all 4 flags with a clear note on when each is needed. Validated: - bash sources the file cleanly - 34 → 41 lines No code change; pure docs alignment. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 20:37:27 -04:00
ruvnet	bde25ad23f	docs(hailo): README "Security & DoS hardening" section (iter 206) Audit of operator-facing docs found the cluster crate's 358-line README contained zero references to any of the iter 174-205 security work. Operators evaluating the project couldn't tell the worker ships with eight layered DoS gates, an opt-in HEF sha256 pin, mTLS support, or systemd restart-rate limiting, all of which had to be discovered by reading worker.rs, deploy/ruvector-hailo.env.example, or the .service file. Add a "Security & DoS hardening" section between QUICKSTART and "What it ships": - Table of the 8 gRPC-surface gates (iter 180/181/182/183/184/190/191/199) with iter / env var / default / floor / what-it-bounds. - Three orthogonal tracks called out: HEF integrity pin (iter 174): sha256 verification at boot. Per-peer rate limit (iter 104/200): incl. iter-200's per-item debit on streaming RPCs so the throttle isn't defeated by batching. TLS + mTLS (iter 99/100): server-side env-var contract + symmetric client flags from iter 187/188/189. - Shutdown hardening (iter 185): why the worker exits via `process::exit(0)` instead of clean drop, and the RUVECTOR_SHUTDOWN_FORCE_CLEAN escape hatch for the future upstream fix. - systemd restart-burst cap (iter 205): bounded retry vs the pre-iter-205 forever-cycling behavior. Pointer to deploy/ruvector-hailo.env.example for full per-knob rationale (the iter-204 docs). Validated: - 358 → 406 lines, +48 lines of operator-facing security docs - Every env var referenced in the new section traces back to source code (loop-checked across both crates) - Markdown is well-formed (heading hierarchy, table syntax, intra-repo link to ../../docs/adr/* preserved) No production code changed; pure docs. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 20:33:13 -04:00
ruvnet	ea27f1b0df	sec(hailo): bound systemd restart-on-failure loop (iter 205) Audit of the deploy systemd units found a real reliability gap. All three (worker + mmwave-bridge + ruview-csi-bridge) carry `Restart=on-failure` + `RestartSec=2` so a transient crash recovers quickly. But none had `StartLimitBurst` / `StartLimitIntervalSec` set, so a unit that fails every startup (worker: bad RUVECTOR_HEF_SHA256 from iter 174, missing model.hef, vstream alloc fail; bridges: missing UART device, malformed worker manifest) cycles every 2 s forever — churning the journal and (for the worker) spinning the NPU vdevice. Add to each unit's [Unit] section: StartLimitBurst=5 StartLimitIntervalSec=60 Now after 5 failed starts inside a 60 s window systemd parks the unit in `failed` state — operator sees a clear stop instead of a log flood. Iter-185's clean shutdown path (`process::exit(0)`) is treated as success and doesn't count toward the burst. Validated: - `systemd-analyze verify` on all three units → clean parse (only "binary missing" errors, expected on dev box where the binaries aren't installed) No production code changed; pure deploy-side hygiene. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 20:28:43 -04:00
ruvnet	c9483097a1	docs(hailo): document the iter-180-200 DoS gate env vars (iter 204) Audit of the operator-facing deploy artifacts found `deploy/ruvector-hailo.env.example` was 50 lines covering only RUVECTOR_WORKER_BIND, RUVECTOR_MODEL_DIR, RUST_LOG, RUVECTOR_CPU_FALLBACK_POOL_SIZE, and RUVECTOR_HEF_SHA256. The 9 DoS-hardening env vars added in iter 180-200 plus the 4 longstanding ADR-172 §3 vars (rate limit, audit log mode, TLS, mTLS) had no operator-facing documentation. Operators tuning the worker had to read the worker.rs module docstring or grep the binary's startup log to discover what knobs existed. Add a "DoS gate stack" block listing every gate with: - which iter introduced it - default value (commented out — same value the worker logs at startup, so deployers see the canonical setting without activating it) - the floor enforced in worker.rs that prevents a misconfig from locking out legitimate traffic - one-paragraph rationale linking back to the iter that proved the gate was needed Plus four pre-existing ADR-172 §3 vars (rate limit, audit log mode, TLS, mTLS) that were similarly undocumented in this artifact. Validated: - bash sources the file cleanly: `set -a; . env.example; set +a` → "parse ok" - every documented env var resolves to source code in crates/ruvector-hailo-cluster/src or crates/ruvector-hailo/src (loop-checked; no MISSING IN SRC output) - 50 → 143 lines, +93 lines of operator-facing documentation Pi worker untouched; pure docs change. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 20:23:36 -04:00
ruvnet	7417bfaea1	sec(hailo): backport iter-199 batch cap to fakeworker (iter 203) iter-192 brought 6 of the worker's gRPC DoS gates to fakeworker for parity. iter-199 added the 7th gate (`embed_stream` batch-size cap) to the real worker but didn't backport it, so fakeworker silently processed batches of any size while the real worker rejected them. Same parity-drift problem iter-192 was meant to prevent. Audited end-to-end during iter 203: confirmed iter-192 gates fire correctly on fakeworker (over-cap 8 KB → OutOfRange "found 8223 bytes, limit 4096"), but `embed_stream` accepted unbounded batches because it never checked length. Backport adds a `max_batch_size` field to FakeWorker (read from the same `RUVECTOR_MAX_BATCH_SIZE` env, same default 256, same floor 1 as the real worker, iter 199). The handler refuses oversized batches with `Status::invalid_argument` matching the real worker's error text, so any test that asserted the rejection format keeps working. Validated: - Cluster integration sweep --test-threads=1: 186/186 pass (legit fakeworker test batches all fit under 256 default, so no existing test breaks; the cap is invisible to legitimate use) - End-to-end smoke against `RUVECTOR_MAX_BATCH_SIZE=8`: startup banner: "fakeworker DoS-gate parity (iter 192/203) ... max_batch_size=8" over-cap (b=16): 493 376 fast rejections, 0 successful under-cap (b=4): 99 709 RPCs/sec × 4 vectors = ~400k/sec (zero-latency mock: purely tonic+gRPC framing throughput) - iter-192 byte cap still fires: tested `RUVECTOR_MAX_REQUEST_BYTES=4096` against an 8 KB embed → OutOfRange "found 8223 bytes, the limit is: 4096 bytes" Seven DoS gates now mirrored on fakeworker (iter 180/181/182/183/184/190 from iter-192 + iter-199 from this iter). iter-200's per-item rate-limit debit doesn't backport because fakeworker has no rate limiter (intentional: pure mock for transport-level testing). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 20:20:46 -04:00
ruvnet	e079e5d7c4	sec(hailo): close cargo-deny CI coverage gap + bans regression (iter 202) Audit found two related issues: 1. Iter 177 added deny.toml to BOTH the cluster and hailo crates, but CI only audited the cluster's. The hailo crate's candle / tokenizers / safetensors chain (cpu-fallback feature) and hailort-sys FFI surface (hailo feature) were ungated. 2. Both deny.toml files set `wildcards = "deny"`, which cargo-deny applies to path deps too. The cluster has path deps on ruvector-hailo, ruvector-mmwave, hailort-sys — so the `bans` check would fail on `cargo deny check` if anyone ran it. The CI step ran but apparently never gated; running it locally now surfaces: error[wildcard]: found 1 wildcard dependency for crate 'ruvector-hailo' ... bans FAILED Fix: - Add `allow-wildcard-paths = true` to both deny.toml [bans] sections. cargo-deny only honors this on non-publishable crates, so also mark both crates `publish = false`. Both are internal-only (path deps to hailort-sys make them unpublishable to crates.io anyway), so the publish flip is correct hygiene independent of cargo-deny. - Add a second `cargo deny` step in the hailo-backend-audit workflow that runs in `crates/ruvector-hailo` with `--all-features` so the cpu-fallback + hailo feature surfaces are audited. - Add three new test/clippy steps for the hailo crate so iter-198's hef_verify cases (and iter-186 host_embeddings, iter-191 hef_pipeline patches) are explicitly gated: cargo test (default features) cargo test --features cpu-fallback (hef_verify + tokenizer) cargo clippy --all-targets -D warnings Validated locally: Both crates: cargo deny check → advisories ok, bans ok, licenses ok, sources ok hailo lib : 19 tests pass (default) 26 tests pass (--features cpu-fallback) hailo clippy: clean cluster lib: 108 tests still pass No production code changed; pure CI + crate-config hygiene. Pi worker untouched. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 20:15:57 -04:00
ruvnet	1d8d64b26f	test(hailo): lock in iter-200 check_n behavior (iter 201) iter-200 added `RateLimiter::check_n(peer, n)` to debit the streaming-batch length against the per-peer rate limiter, then wired it into `embed_stream`. Both code paths shipped without direct test coverage. Add five focused unit tests covering the contract: check_n_zero_is_a_noop n=0 must not consume tokens (the embed_stream caller passes n-1 after the interceptor's 1, so for batch=1 the call is n=0). Repeated zero-calls don't burn the bucket; a normal check still succeeds afterwards. check_n_within_burst_consumes_n_tokens 1 rps / burst 5: check_n(3) leaves 2 tokens; two more singleton checks pass; the third fails. Locks in the "actually consumes n tokens" property. check_n_exceeding_burst_is_denied 1 rps / burst 4: check_n(8) returns Err (governor's InsufficientCapacity collapsed to RateLimitDenied). The bucket is unchanged — the failed attempt does NOT burn any tokens, so 4 singleton checks still pass after. check_n_partial_capacity_denied_without_consuming Burn 2 of 4, then check_n(3) — tokens-needed (2 + 3 = 5) > 4 so denied. The 2 already-burned tokens stay burned; the failed check_n doesn't roll them back. Verifies the failure mode is "deny + don't side-effect." check_n_separate_peers_have_independent_buckets A streaming-batch debit on peer-a must not bleed into peer-b's quota — proves the per-peer keying still holds for check_n. Validated: - rate_limit lib tests: 7 → 12 (+5 iter 201) - full lib : 103 → 108 - full integration sweep : 181 → 186 tests, 0 failures - all flaky tests still green (iter-196/197 fixes hold) Pi worker untouched; pure test-side addition. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 20:09:37 -04:00
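The debit contract these tests lock in (n = 0 is a no-op, a denied `check_n` consumes nothing, peers are keyed independently) can be illustrated with a toy bucket. This is not the governor-backed `RateLimiter`: `ToyLimiter` has no refill and exists only to show the all-or-nothing accounting, and `debit_stream_batch` is a hypothetical helper for the `n - 1` extra debit.

```rust
use std::collections::HashMap;

/// Toy per-peer bucket with a fixed burst and no refill (illustration only).
struct ToyLimiter {
    burst: u32,
    remaining: HashMap<String, u32>,
}

impl ToyLimiter {
    fn new(burst: u32) -> Self {
        Self { burst, remaining: HashMap::new() }
    }

    /// All-or-nothing debit: either `n` tokens are taken or the bucket is untouched.
    fn check_n(&mut self, peer: &str, n: u32) -> Result<(), &'static str> {
        if n == 0 {
            return Ok(()); // a batch of 1 passes n - 1 = 0 and must not burn tokens
        }
        let burst = self.burst;
        let left = self.remaining.entry(peer.to_string()).or_insert(burst);
        if *left < n {
            return Err("rate limit denied");
        }
        *left -= n;
        Ok(())
    }

    /// Single-token check, as the per-RPC interceptor would perform.
    fn check(&mut self, peer: &str) -> Result<(), &'static str> {
        self.check_n(peer, 1)
    }
}

/// Extra debit for a streaming batch: the interceptor already took one token.
fn debit_stream_batch(limiter: &mut ToyLimiter, peer: &str, batch_len: u32) -> Result<(), &'static str> {
    limiter.check_n(peer, batch_len.saturating_sub(1))
}
```

Total debit per streaming RPC is then the interceptor's 1 plus `batch_len - 1`, which equals the batch length.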
ruvnet	0ffff492bf	sec(hailo): debit rate limiter by batch size on embed_stream (iter 200) iter-104's per-peer rate limiter ran in the gRPC interceptor, which fires once per RPC regardless of body shape. With iter-199's 256-batch ceiling, that meant a peer rate-limited at 1 RPS could still extract 256 embeds/sec by sending one streaming RPC per second — defeating the iter-104 throttle entirely. iter-199 closed the worst case (the ~16 k-batch DoS), but a rate-limited peer was still 256× over budget. Fix: in `embed_stream`, after the batch-size cap check passes, debit the rate limiter by `n - 1` more tokens (the interceptor already counted the first one). Total debit per RPC = batch length, so a 1 RPS peer is genuinely capped at 1 embed/sec end-to-end whether they send one unary RPC or one batched RPC. Adds `RateLimiter::check_n(peer, n)` wrapping governor's `check_n` + NonZeroU32 + InsufficientCapacity → RateLimitDenied collapse. n == 0 short-circuits to Ok(()). Path is a no-op when the limiter is None (default deploy), so unary RPS-only fleets see no behavior change. When enabled, denied batches return Status::resource_exhausted and bump the same shared counter the iter-105 stats endpoint surfaces. Validated: - rate_limit lib tests: 7/7 pass (existing coverage holds) - Pi self-test: vec_head=0.0181,-0.0220,0.0451,0.0159 (unchanged) - Pi unary bench c=4 b=1, 8 s × 3: 66.5, 58.8, 57.8 → mean 61.0/sec, p50=56-63 ms (tailnet jitter active during this iter; worker-side latency was ~16-28 ms in journalctl, so the dip was network) - Pi streaming bench c=1 b=16, 6 s: 46.8 RPCs/sec × 16 vectors = 749 vectors/sec, 0 errors, p50=255 ms/RPC = 16 ms/item — NPU-rate as expected, iter-200's `n > 1` branch hit but no-op'd (limiter=None). 
End-of-session DoS gate stack is now seven gates layered, plus iter-200's per-item rate-limit accounting: iter 180 decoding cap 64 KB; iter 181 max_concurrent_streams 256; iter 182 request_timeout 30 s; iter 183 rapid-reset cap 32; iter 184 http2_keepalive 60 s; iter 190 encoding cap 16 KB; iter 199 embed_stream batch 256; iter 200 rate-limit batch debit (per-item accounting). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 20:05:29 -04:00
ruvnet	2d7c3a8810	sec(hailo): cap embed_stream batch length (iter 199) Real DoS vector found by audit: `embed_stream` accepted unbounded `EmbedBatchRequest.texts.len()`. The iter-180 64 KB byte cap bounded the encoded request size, but tightly-packed 1-byte texts (each ~3 B proto framing + 1 B string) fit ~16 k entries inside that envelope. Each entry triggers a serial ~14 ms NPU embed, holding the worker connection for ~228 s — well past the iter-182 30 s tonic timeout (which kicks the connection but doesn't unblock the in-flight FFI work). Add `RUVECTOR_MAX_BATCH_SIZE` (default 256, floor 1) on the worker side. iter-179's streaming saturation sweep peaked at b=16, so 256 is 16× legit headroom. Over-cap requests return InvalidArgument instantly; under-cap requests are unaffected. Validated on cognitum-v0: Startup banner now logs seven gates (added iter 199): embed_stream batch-size cap set ... max_batch_size=256 DoS probe — bench --batch-size 300 (over cap), 4 s, c=1: 20 700 fast rejections, 0 successful Worker log: "embed_stream batch too large — rejecting batch_size=300 max_batch_size=256" with request_id Acceptance probe — bench --batch-size 16 (under cap), 6 s, c=1: 46.9 RPCs/sec × 16 vectors/RPC = 750 vectors/sec p50 per RPC = 249 ms (= 16 ms/item, NPU-rate-bound) 0 errors Worker fleet stats post-iter-199: avg_us=23694 (healthy NPU rate ~70 embeds/sec) errors=0, NPU temps 55.2/54.8 °C Self-test bit-identical (vec_head=0.0181,-0.0220,0.0451,0.0159). Unary regression bench was inconclusive — a tailnet jitter event was active during this iter (ping showed RTT 14-280 ms vs the typical 13 ms minimum). Worker-side avg latency held at ~24 ms (GetStats), so the bench dip was network, not iter-199-introduced. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 19:24:12 -04:00
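The arithmetic behind iter 199 and the cap check itself, as a rough sketch. The constants restate figures from the commit (64 KB byte cap, about 4 B per tiny text, about 14 ms per embed); the function names and the error wording are hypothetical.

```rust
/// ~3 B proto framing + 1 B string per entry, per the commit's estimate.
const BYTES_PER_TINY_TEXT: usize = 4;
/// Approximate serial NPU cost per embed, per the commit.
const NPU_MS_PER_EMBED: u64 = 14;

/// How many minimal texts fit inside a request byte cap.
fn max_tiny_texts(byte_cap: usize) -> usize {
    byte_cap / BYTES_PER_TINY_TEXT
}

/// Worst-case time a batch holds the worker when embeds run serially.
fn serial_hold_ms(batch_len: usize) -> u64 {
    batch_len as u64 * NPU_MS_PER_EMBED
}

/// Rejects batches longer than `max_batch_size` before any NPU work starts.
fn check_batch_len(len: usize, max_batch_size: usize) -> Result<(), String> {
    if len > max_batch_size {
        Err(format!("batch too large: batch_size={len} max_batch_size={max_batch_size}"))
    } else {
        Ok(())
    }
}
```

Under these estimates a 64 KB request carries 16 384 entries and about 229 s of serial work, while a capped 256-entry batch is bounded at about 3.6 s.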
ruvnet	14f44a3e85	test(hailo): lock in iter-174 HEF sha256 pin behavior (iter 198) Extracts the iter-173 magic-byte check + iter-174 sha256 pin into a free function `hef_verify::verify_hef_header_and_pin` so it's unit-testable without the `hailo` feature flag (which requires HailoRT FFI on Pi 5 + AI HAT+, absent on dev hosts). Behavior is unchanged — `HefPipeline::open` still calls through here at boot, byte-for-byte identical logic. Adds five unit tests, all passing on x86 dev hosts and Pi alike: rejects_non_hef_magic accepts_correct_magic_with_no_pin rejects_sha256_mismatch accepts_matching_sha256 normalizes_pin_whitespace_and_case (trim + tolower; locks in the operator-paste-friendly iter-174 normalization) Bit-identical correctness verified at deploy time: startup self-test embed ok dim=384 vec_head=0.0181,-0.0220,0.0451,0.0159 (matches every iter since 175 — semantic equality preserved through the refactor) Bench-after on Pi was inconclusive due to a tailnet jitter event during this iter's deploy (ping showed RTT min=9 ms / max=180 ms, avg=65 ms — far outside the typical ~13 ms minimum). Worker-side embed latencies in journalctl held at 10-28 ms per call (~70/sec NPU-capable rate), so the throughput dip was purely network between workstation and Pi, not iter-198-introduced. The pure- refactor nature of the change (no FFI-touching path modified) + bit-identical self-test give correctness confidence without a clean bench comparison. Test counts: ruvector-hailo lib: 14 → 19 (+5 hef_verify) ruvector-hailo-cluster: 181 (unchanged) Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 19:19:56 -04:00
ruvnet	4d9ba0cafb	test(hailo): de-flake the rate_limit env-var tests (iter 197) iter-190's session sweep flagged a second flaky test: `rate_limit::tests::from_env_disabled_when_unset`. The test removes RUVECTOR_RATE_LIMIT_RPS / _BURST then asserts None, while the sibling test `from_env_picks_up_rps_with_default_burst` sets the same RUVECTOR_RATE_LIMIT_RPS. Cargo runs lib tests in parallel by default, so the two could race the process-global env in either direction — sometimes the wipe sees the set's mutation mid-flight, sometimes not. Original code carried a comment "we use unique names so this test doesn't race", which was the intent but not the result; both tests actually share the same env-var key. Fix: process-local OnceLock<Mutex<()>> guards every env-touching test. Tests still run on the parallel test runner (no need for --test-threads=1) but the lock serializes the env mutations to a single critical section. No new dep — the std-only `OnceLock` + `Mutex` pattern is enough; pulling `serial_test` would have been overkill for two tests. Validated: - rate_limit::* (filtered, parallel default), 10 back-to-back runs: 7/7 pass each (rate_limit has 7 tests; sibling tests still cover unrelated paths) - full lib in parallel mode, 3 back-to-back runs: 103/103 pass each - full integration sweep --test-threads=1: lib : 103/103 pass 14 integration suites: 78/78 pass total : 181 tests, 0 failures, 0 flaky Together with iter-196's EWMA fix, the cluster crate's test suite is now deterministically green in both serial and parallel modes — no more "1 in N runs flake" surface for the session checkpoint. No production code changed; pure test-side fix. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 19:07:48 -04:00
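A sketch of the std-only `OnceLock<Mutex<()>>` guard pattern iter 197 describes. `env_lock` and `with_env_lock` are hypothetical names; recovering from a poisoned mutex is a design choice assumed here, since the guarded value is `()` and carries no state to corrupt.

```rust
use std::sync::{Mutex, MutexGuard, OnceLock};

/// Returns a guard on one process-wide lock; hold it for the whole env-touching test.
fn env_lock() -> MutexGuard<'static, ()> {
    static LOCK: OnceLock<Mutex<()>> = OnceLock::new();
    LOCK.get_or_init(|| Mutex::new(()))
        .lock()
        // A test that panics while holding the guard poisons the mutex;
        // the payload is `()`, so it is safe to keep going.
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Runs `f` inside the critical section. Env reads and writes belong inside `f`.
fn with_env_lock<T>(f: impl FnOnce() -> T) -> T {
    let _guard = env_lock();
    f()
}
```

Tests still run on the parallel runner; only the closures passed to `with_env_lock` are serialized against each other.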
ruvnet	2936ccab72	test(hailo): de-flake the EWMA bias test (iter 196) iter-195's full sweep surfaced an intermittent failure in `p2c_ewma_biases_toward_fast_worker_under_load` (1 in 5 runs). Two root causes, neither related to a real EWMA picker bug: 1. No warmup phase. The first ~10 dispatches paid tonic's channel-dial cost (~50 ms one-shot per worker). With α=0.3 EWMA and a 1 ms vs 15 ms steady-state gap, the dial cost dominated observed latency for both workers, leaving the picker biased by which worker the deterministic P2C LCG happened to dial first. When fast got dialed first, its EWMA carried the dial tax and lost subsequent picks to slow until decay caught up. 2. Latency gap too narrow. 1 ms vs 15 ms is only 15× and comparable to tonic's per-call framing overhead. The picker biased fast on average but the per-call ratio was closer to 8:1, fluctuating to 3:1 under tokio scheduler jitter — too tight to assert ≥2:1 reliably over 200 sequential calls. Fix both: * Warmup 30 calls before counting (channels cached, EWMAs converged to handler-only latency). * Bump slow handler from 15 ms → 50 ms so the steady-state ratio is 50:1 and dominates any framing/scheduler noise. The picker now locks fast at 100 % post-warmup. Validated 10 back-to-back runs — all pass. Captured ratio: dispatch result (post-warmup): fast=200, slow=0, errors=0 This was the only flaky test in the cluster's integration suite; the iter-195 sweep should now be deterministically green. Full sweep --test-threads=1: lib : 103/103 pass 14 integration suites: 78/78 pass total : 181 tests, 0 failures, 0 flaky No production code changed; pure test-side fix. Pi worker untouched. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 19:05:53 -04:00
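The dial-tax effect in root cause 1 can be worked through numerically. The sketch assumes the standard EWMA update `alpha * sample + (1 - alpha) * prev` and that a worker's EWMA starts at its dial-inflated first sample (~50 ms); both are assumptions about the picker, and the function names are hypothetical.

```rust
/// One EWMA step with smoothing factor `alpha`.
fn ewma_update(prev_ms: f64, sample_ms: f64, alpha: f64) -> f64 {
    alpha * sample_ms + (1.0 - alpha) * prev_ms
}

/// Number of steady-state samples needed before the EWMA drops below `threshold_ms`.
fn updates_until_below(start_ms: f64, steady_ms: f64, threshold_ms: f64, alpha: f64) -> u32 {
    // Without these bounds the EWMA would never cross the threshold.
    assert!(steady_ms < threshold_ms && alpha > 0.0 && alpha <= 1.0);
    let mut ewma = start_ms;
    let mut steps = 0;
    while ewma >= threshold_ms {
        ewma = ewma_update(ewma, steady_ms, alpha);
        steps += 1;
    }
    steps
}
```

Under these assumptions, with α = 0.3 a fast (1 ms) worker that paid a 50 ms dial needs 4 further picks before its EWMA falls below the slow worker's 15 ms, and a 30-call warmup leaves it within 0.01 ms of steady state.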
ruvnet	952bc9b85f	test(hailo): lock in iter-182 RPC timeout behavior (iter 195) Adds two cases to dos_gates.rs to lock in the iter-182 `Server::timeout` middleware behavior. iter-182 picked tonic's tower-timeout cap to bound slow-loris attacks and any handler that hangs past its budget; without a regression test, a future change that unbinds the timeout silently lets the worker accumulate stuck handlers again. embed_handler_exceeding_timeout_returns_cancelled Server::timeout(200 ms), handler sleeps 1 s. Asserts: * status code = Cancelled (tonic's tower-timeout middleware wraps tower's Elapsed error in Status::cancelled, per the iter-182 commit message) * elapsed wall time < 600 ms (3× timeout) — proves the cap actually fired rather than the request completing some other way embed_handler_within_timeout_succeeds Server::timeout(1 s), handler sleeps 50 ms. Confirms the cap doesn't accidentally block legitimate fast traffic — guards against a future "tighten the timeout to 10 ms" change that would break every embed. dos_gates.rs now has six cases covering three of the six gates: byte cap (iter 180) : 2/2 encoding cap (iter 190) : 2/2 RPC timeout (iter 182) : 2/2 ← new Validated: - dos_gates suite: 6/6 pass in 0.25 s - full integration sweep: 1 pre-existing flake unrelated to this iter (`cluster_load_distribution::p2c_ewma_biases_toward_fast_worker_under_load`, confirmed flaky 1/5 — depends on tokio scheduler timing for a 2:1 EWMA dispatch ratio, intermittent across the session) Pi worker untouched; pure test-suite addition. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 18:58:36 -04:00
ruvnet	01a7588b9d	test(hailo): lock in iter-190 encoding-cap behavior (iter 194) Symmetric coverage with iter-193's iter-180 byte-cap test. iter-190 added `max_encoding_message_size` to the worker so a hypothetical oversized response (e.g. accidental debug payload leak) can't blow up downstream clients. Without a regression test, a future change that drops the cap silently passes review. `tests/dos_gates.rs` now has four cases: embed_request_above_decoding_cap_returns_out_of_range (iter 193) embed_request_below_decoding_cap_succeeds (iter 193) embed_response_above_encoding_cap_returns_error (iter 194) embed_response_under_encoding_cap_succeeds (iter 194) The encoding-cap cases use a separate `OversizedResponseMockWorker` that emits a 16 KB Vec<f32> response (4_000 floats × 4 B). Above-cap test installs a 4 KB encoding cap and asserts: * status code = OutOfRange * error message mentions "encoded message length too large" or the cap value (4096) Below-cap test runs the same mock under the production-default 64 KB cap and confirms the 16 KB response sails through, locking in that the cap doesn't accidentally block legitimate traffic. Validated: - dos_gates suite: 4/4 pass in 0.09 s - full integration sweep --test-threads=1: lib : 103/103 pass 14 integration suites: 78/78 pass total : 181 tests, 0 failures Pi worker untouched; pure test-suite addition. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 18:53:26 -04:00
ruvnet	e89653c326	test(hailo): lock in iter-180 byte-cap behavior with integration test (iter 193) iter-192 noted the gap: "no integration test exercises the gate behavior — a future change that loosened a cap would have escaped review." Close it for the iter-180 byte cap (the most important of the six gates, since it bounds per-RPC alloc surface end-to-end). `tests/dos_gates.rs` adds two cases using the same in-process mock pattern as `rate_limit_interceptor.rs` and `tls_roundtrip.rs`: embed_request_above_decoding_cap_returns_out_of_range Stands up an EmbeddingServer with max_decoding_message_size=4 KB (deliberately tight so a tiny payload trips it). Sends an 8 KB text. Asserts: * status code = OutOfRange * error message mentions either "decoded message length too large" or the cap value (4096) embed_request_below_decoding_cap_succeeds Companion: 1 KB payload against the same 4 KB cap. Asserts the request succeeds and the mock returns dim=384. Catches a hypothetical regression where the cap is set so tight it blocks legitimate traffic. No NPU dependency (pure in-process mock + tonic), no fakeworker subprocess (so no port-allocation flake). Runs on x86 dev hosts and aarch64 Pi alike. Validated: - dos_gates suite alone: 2/2 pass in 0.09 s - full integration sweep --test-threads=1: lib : 103/103 pass 14 integration suites: 76/76 pass total : 179 tests, 0 failures Pi worker untouched this iter (test-only addition); no bench delta to capture. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 18:49:09 -04:00
ruvnet	67fca5e92e	sec(hailo): backport DoS-gate parity to fakeworker (iter 192) iter-180 through iter-184 + iter-190 layered six caps on the real gRPC worker (byte cap, stream cap, RPC timeout, rapid-reset cap, keepalive, encode cap). fakeworker — the test-fleet stand-in used by 12+ integration tests — was left running with all defaults wide open. Two consequences: 1. No integration test exercises the gate behavior. A future change that loosened a cap on the real worker but tightened it on fakeworker (or vice versa) would have escaped review. 2. A deploy that runs both binaries in the same env (e.g. a hybrid fleet during cutover) had inconsistent DoS surface. Mirror the same env vars + the same defaults so behavior is identical between the two binaries: fakeworker DoS-gate parity (iter 192) max_request_bytes=65536 (iter 180) max_response_bytes=16384 (iter 190) max_concurrent_streams=256 (iter 181) request_timeout_secs=30 (iter 182) max_pending_resets=32 (iter 183) http2_keepalive_secs=60 (iter 184) Validated: - Both feature combos compile clean - Full integration test sweep, --test-threads=1: lib : 103/103 pass 13 integration suites: 74/74 pass total : 177 tests, 0 failures All small-payload fakeworker tests (typical "hello"-class strings) are well under every cap, so the gates are silent in practice. - Smoke startup log: fakeworker DoS-gate parity (iter 192) max_request_bytes=65536 max_response_bytes=16384 max_concurrent_streams=256 request_timeout_secs=30 max_pending_resets=32 http2_keepalive_secs=60 Pi worker untouched this iter (changes are pure fakeworker), so any bench delta is tailnet/Pi noise unrelated to the change. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 18:45:12 -04:00
ruvnet	29d2555a20	sec(hailo): cap HailoRT vstream FFI timeout at 2 s (iter 191) HailoRT's per-vstream `hailo_vstream_params_t.timeout_ms` defaults to 10 s. That's ~700× a steady-state embed (14 ms NPU compute on the iter-156b HEF) and a full third of iter-182's 30 s tonic outer bound. A wedged NPU (driver hang, PCIe link issue, FW reset mid-DMA) would park the HefEmbedder Mutex for the full 10 s before any caller sees an error, blocking every other concurrent embed for that window. Override `params.timeout_ms` on both input + output vstream params between `hailo_make_{input,output}_vstream_params` and `hailo_create_{input,output}_vstreams`, defaulting to 2 000 ms (143× the typical embed cost, still leaving room for tail latency under thermal throttling). Operators tune via `RUVECTOR_NPU_VSTREAM_TIMEOUT_MS`, floor 100 ms so a misconfig can't fail every healthy embed. Validated on cognitum-v0: - startup self-test: vec_head=0.0181,-0.0220,0.0451,0.0159 (bit-identical to iter-190, so semantic equality holds) - bench c=4 b=1, 8 s × 7 runs (1 outlier dropped): iter-190 (10 s default): 69.0, 69.2, 70.6 → mean 69.6/sec, p50=55-56 ms iter-191 (2 s cap) : 68.2, 70.2, 69.0, 70.1, 69.0, 70.6 → mean 69.5/sec, p50=54-56 ms Δ throughput: -0.1% (flat; cap doesn't fire on healthy traffic) Δ behavior under NPU hang (analytical, no real hang to test): pre → embed Mutex held 10 s, every concurrent caller queues for the full window, tonic 30 s outer bound mostly unused post → embed returns HAILO_TIMEOUT (status 4) in 2 s, Mutex released 5× faster, queue drains 5× faster, tonic outer bound has 28 s of usable headroom for downstream retries Layered timeouts now: 2 s FFI (iter 191) ← 30 s tonic (iter 182). The inner bound makes the outer bound actionable rather than a hard ceiling on a single-threaded queue. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 18:42:03 -04:00
ruvnet	4e192bb6d6	sec(hailo): max_encoding_message_size cap + session test sweep (iter 190) Defense-in-depth response cap on the gRPC server. iter-180 capped the decode side at 64 KB; the encode side was uncapped (tonic default usize::MAX) even though the worker only ever generates Vec<f32>[384] ≈ 1.6 KB per unary embed. Cap at 16 KB (10× legitimate per-message size) so any hypothetical bug that ever returned a huge payload can't blow up downstream clients. Env-tunable via `RUVECTOR_MAX_RESPONSE_BYTES`, floor 4 KB. Worker startup banner now logs six DoS gates layered by iter: iter 180: max_decoding_message_size = 65536 iter 181: max_concurrent_streams = 256 iter 182: request_timeout_secs = 30 iter 183: max_pending_resets = 32 (CVE-2023-44487) iter 184: http2_keepalive_secs = 60 iter 190: max_encoding_message_size = 16384 Pi regression bench (c=4 b=1, 8 s × 3, post-deploy): iter 189: 70.4, 70.1, 70.6 → mean 70.4/sec, p50=53-56 ms iter 190: 68.9, 67.1, 70.6 → mean 68.9/sec, p50=55-56 ms Δ -2.1% in tailnet noise band; no encode-side enforcement firing on legitimate ~1.6 KB responses. Session test sweep (cargo test --features tls --tests --test-threads=1): - lib : 103/103 pass - all 13 integration suites : 74/74 pass - total : 177 tests, 0 failures - tls_roundtrip + secure_stack : 4/4 (TLS path validated) (One known-flaky test: rate_limit::tests::from_env_disabled_when_unset races other tests that set the same process-global env vars on the default parallel runner. Serial mode isolates it cleanly. Pre-existing issue, unrelated to iter 190.) Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 18:36:37 -04:00
ruvnet	e7614036ec	sec(hailo): expose --tls-ca / mTLS flags on the stats CLI (iter 189) Completes the client-side TLS flag surface across all three operator tools in this repo. iter-187 added the bench flags, iter-188 added the embed flags; iter-189 brings the stats CLI to parity so an op can snapshot fleet stats from a TLS-configured worker without building a custom client. Same `#[cfg(feature = "tls")]` gating, same partial-config + orphan-flag refusals as the other two binaries. Smoke-tested against cognitum-v0: $ ruvector-hailo-stats --workers 100.77.59.83:50051 --tls-domain example.com Error: "--tls-domain / --tls-client-cert / --tls-client-key require --tls-ca" $ ruvector-hailo-stats --workers 100.77.59.83:50051 --tls-ca /nonexistent/ca.pem Error: "--tls-ca: transport error to <tls>: read ca pem at /nonexistent/ca.pem: No such file or directory (os error 2)" $ ruvector-hailo-stats --workers 100.77.59.83:50051 worker address fingerprint npu_t0 npu_t1 embeds errors avg_us max_us up_s static-0 100.77.59.83:50051 9c56e596... 53.2 52.7 6614 0 27325 42930 1044 Pi regression bench (c=4 b=1, 8 s × 3, post-settle): iter-188: 70.3, 69.0, 67.9 → mean 69.1/sec, p50=55-57 ms iter-189: 70.4, 70.1, 70.6 → mean 70.4/sec, p50=53-56 ms, p99=86-90 ms Δ throughput: +1.9% (within noise; stats CLI changes don't touch the bench/embed code paths) The TLS server-side path (iter 99) is now fully callable from every client tool that ships with the cluster crate. Next direction is either deferred ops work (Pi-side cert generation + systemd unit wiring for end-to-end mTLS smoke) or a pivot to perf research (async vstream, mask-aware HEF compile). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 18:30:46 -04:00
ruvnet	168051bc1e	sec(hailo): expose --tls-ca / mTLS flags on the embed CLI (iter 188) Symmetric with iter-187 bench plumbing — adds the same TLS knobs to `ruvector-hailo-embed` so ops can drive a one-shot embed against a TLS-configured worker without having to build a custom client. All flags `#[cfg(feature = "tls")]` so the no-tls build stays clean. Same partial-config + orphan-flag refusals as iter-187: - --tls-domain / --tls-client-cert / --tls-client-key without --tls-ca → loud error - --tls-client-cert without --tls-client-key (or vice versa) → loud error - missing CA file → fs error surfaced with full path Smoke-tested on the workstation: $ ruvector-hailo-embed --workers 100.77.59.83:50051 --tls-domain example.com --text hello Error: "--tls-domain / --tls-client-cert / --tls-client-key require --tls-ca" $ ruvector-hailo-embed --workers 100.77.59.83:50051 --tls-ca /nonexistent/ca.pem --text hello Error: "--tls-ca: transport error to <tls>: read ca pem at /nonexistent/ca.pem: No such file or directory (os error 2)" $ ruvector-hailo-embed --workers 100.77.59.83:50051 --text "iter 188 smoke test" {"text":"iter 188 smoke test","dim":384,"latency_us":433538,"vec_head":[...]} Pi plaintext bench regression (c=4 b=1, 8 s × 3): iter-187: 68.5, 68.7, 66.7 → mean 68.0/sec, p50=56-59 ms iter-188: 70.3, 69.0, 67.9 → mean 69.1/sec, p50=55-57 ms Δ throughput: +1.6% (within tailnet noise; embed CLI changes don't touch the bench code path) The TLS server-side path is now fully callable from both client tools in this repo. Pi-side cert generation + systemd unit wiring (the actual end-to-end TLS smoke against cognitum-v0) remains the deferred ops follow-up. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 18:25:28 -04:00
ruvnet	840d276592	sec(hailo): expose --tls-ca / mTLS flags on the bench CLI (iter 187) Iter-99 added TLS support on the worker (`Server::tls_config`) and iter-100 added optional mTLS via `RUVECTOR_TLS_CLIENT_CA`. The client-side path through `GrpcTransport::with_tls` + `TlsClient` was unit-tested in `tls_roundtrip.rs` but not driven from the bench CLI, which meant ops had no way to drive a sustained-load TLS run against a TLS-configured worker — every existing bench dialed plaintext. Adds: --tls-ca <path> PEM CA bundle. Promotes dial to https://. --tls-domain <name> SNI / SAN to assert. Default = hostname half of the first worker addr (via `tls::domain_from_address`). --tls-client-cert <p> mTLS client cert. --tls-client-key <p> mTLS client private key. All flags gated `#[cfg(feature = "tls")]` so the no-tls build is unaffected. Partial mTLS configs (cert without key, vice versa) and orphan flags (--tls-domain without --tls-ca) error out at startup instead of silently falling back to plaintext. Validation: - `cargo test --features tls --test tls_roundtrip` — 2/2 pass (already validated GrpcTransport::with_tls + plaintext-against- TLS-server cleanly fails) - `cargo test --features tls --test secure_stack_composition` — 2/2 pass (full stack composition still rejects tampered manifests) - Pi plaintext regression: c=4 b=1, 8 s × 3 runs: pre-iter-187 (iter 186): 68.3, 69.7, 65.8 → mean 67.9/sec post-iter-187 : 68.5, 68.7, 66.7 → mean 68.0/sec flat within noise; the new code is fully gated when --tls-ca is absent. - Local smoke against `ruvector-hailo-fakeworker` confirmed flag parsing + error paths (orphan flags refused, missing CA file surfaces fs error). End-to-end fakeworker handshake had a transient listener inheritance issue under back-to-back setsid/kill cycles that's a smoke-test setup quirk rather than a code defect — the unit test already exercises the same library path bench now plumbs through. 
Pi-side mTLS smoke (cert generation + systemd unit wiring) is deferred to an ops follow-up; this iter ships the client-side flag surface so that follow-up has somewhere to plug into. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 18:23:04 -04:00
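The orphan-flag and partial-mTLS refusals described for the bench, embed, and stats CLIs could look roughly like the sketch below. The `TlsFlags` struct, `DialMode` enum, and `validate_tls_flags` function are hypothetical names for illustration. The first error string is the one quoted in the smoke tests above; the second is an invented placeholder.

```rust
/// Hypothetical container for the four flags named in the iter-187 entry.
#[derive(Debug, Default)]
struct TlsFlags {
    ca: Option<String>,
    domain: Option<String>,
    client_cert: Option<String>,
    client_key: Option<String>,
}

/// How the client should dial once the flags have been checked.
#[derive(Debug, PartialEq)]
enum DialMode {
    Plaintext,
    Tls,
    MutualTls,
}

/// Refuse inconsistent flag sets at startup rather than silently
/// falling back to plaintext.
fn validate_tls_flags(f: &TlsFlags) -> Result<DialMode, String> {
    let any_dependent =
        f.domain.is_some() || f.client_cert.is_some() || f.client_key.is_some();
    if f.ca.is_none() {
        if any_dependent {
            // Orphan flags: TLS options given without a CA bundle.
            return Err(
                "--tls-domain / --tls-client-cert / --tls-client-key require --tls-ca"
                    .to_string(),
            );
        }
        return Ok(DialMode::Plaintext);
    }
    match (&f.client_cert, &f.client_key) {
        (Some(_), Some(_)) => Ok(DialMode::MutualTls),
        (None, None) => Ok(DialMode::Tls),
        // Partial mTLS config: cert without key, or key without cert.
        // This message text is a placeholder, not taken from the source.
        _ => Err("--tls-client-cert and --tls-client-key must be given together".to_string()),
    }
}
```

Checking the orphan case first means a lone `--tls-client-cert` without `--tls-ca` reports the missing CA, which matches the order of the refusals listed in the iter-188 entry.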
ruvnet	ed62304578	perf(hailo): cache pos+type embeddings in HostEmbeddings (iter 186) The HEF is compiled for a single fixed seq_len (128) and the HF tokenizer always emits zero token_type_ids for single-text embeds, so `position_embeddings.forward(0..seq)` and `token_type_embeddings.forward(zeros)` produce identical Tensors every call. iter-186 caches both behind seq-keyed Mutexes; first call paths are unchanged, every subsequent embed skips two `Tensor::new` allocs + two embedding lookups + two unsqueeze ops. Also adds `mean_pool_into` to inference.rs as an alloc-free public helper (the existing `mean_pool` becomes a thin wrapper) for future callers; HefEmbedder still uses the owning `mean_pool` because the Mutex-guarded buffer can't escape without a clone (which would defeat the pool). Validated on cognitum-v0, c=4 b=1, 8 s × 3 runs: bench-before (iter 185): 69.9, 67.3, 64.9 → mean 67.4/sec p50=55-58ms, p99=92-172ms bench-after (iter 186): 68.3, 69.7, 65.8 → mean 67.9/sec p50=55-58ms, p99=99-169ms Δ throughput: +0.7% (within tailnet noise) Δ p50 : flat Δ p99 : modest tightening (avg 126 vs 142 ms) Wall-time win is sub-noise because the NPU PCIe DMA round-trip (~50 ms p50) dwarfs the candle host-side work that this caches. The change still removes redundant CPU + alloc churn per RPC, which is a power-savings win on the Pi 5 cluster (ARM cores idle sooner) and a cleaner cache-locality story over long runs. Embed correctness verified: startup self-test produces bit-identical vec_head (0.0181,-0.0220,0.0451,0.0159) and sim_close/sim_far values across iter-185 and iter-186 binaries. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 18:13:12 -04:00
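The iter-186 entry mentions an alloc-free `mean_pool_into` with `mean_pool` kept as a thin owning wrapper. A minimal sketch of that split is shown here; the signatures, the row-major layout, and the `u32` attention mask are assumptions, since the real helper in inference.rs is not shown in the log.

```rust
/// Masked mean pool without allocation: averages the unmasked token rows of
/// `hidden` (row-major, `mask.len()` rows of `out.len()` floats) into `out`.
/// Hypothetical signature; the actual helper may differ.
fn mean_pool_into(hidden: &[f32], mask: &[u32], out: &mut [f32]) {
    let dim = out.len();
    assert!(dim > 0, "output dimension must be non-zero");
    assert_eq!(hidden.len(), mask.len() * dim, "hidden must be seq x dim");
    out.fill(0.0);
    let mut kept = 0u32;
    for (row, &m) in hidden.chunks_exact(dim).zip(mask) {
        if m != 0 {
            kept += 1;
            for (o, &h) in out.iter_mut().zip(row) {
                *o += h;
            }
        }
    }
    // An all-zero mask leaves `out` at zero instead of dividing by zero.
    if kept > 0 {
        let inv = 1.0 / kept as f32;
        for o in out.iter_mut() {
            *o *= inv;
        }
    }
}

/// Owning wrapper: allocates the output buffer, then delegates.
fn mean_pool(hidden: &[f32], mask: &[u32], dim: usize) -> Vec<f32> {
    let mut out = vec![0.0; dim];
    mean_pool_into(hidden, mask, &mut out);
    out
}
```

A caller that already holds a reusable buffer calls `mean_pool_into` directly; a caller that needs an owned vector pays for one allocation through `mean_pool`.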
ruvnet	9fdc3c7ade	sec(hailo): eliminate shutdown SIGSEGV via process::exit (iter 185) Iter 179 first observed a SIGSEGV during clean shutdown after sustained load. Iter 185 baseline measurement showed it's not a race — every shutdown SEGV'd, both idle and under load: iter-184 baseline: 0 clean / 5 SEGV out of 5 iter-185 first attempt (drain + explicit drop): 0 clean / 5 SEGV out of 5 iter-185 final (mem::forget + process::exit(0)): 10 clean / 0 SEGV out of 10 The SEGV is not in our HefPipeline::Drop — the explicit `drop(embedder_outer)` after rt.shutdown_timeout was never reached; the SEGV fired during HailoRT's own internal teardown (DMA scheduler threads + vdevice callbacks). This is upstream library behavior, not something we can paper over with timing tweaks. Mitigation: leak the embedder via `mem::forget` and call `process::exit(0)` after tonic's serve completes. The OS reaps every resource the worker owns (mmap'd HEF, vstream fds, driver-side handles via close(2)); HailoRT's own threads die with the same exit syscall, so they can't race a free that never happens. Operators see `status=0/SUCCESS` in systemd instead of `status=11/SEGV`, which makes restart loops, alerting, and unit-state monitoring sane. Bound: one HefPipeline + one HostEmbeddings pair leak per process lifetime. Each subsequent worker is a fresh process. Reserved escape hatch `RUVECTOR_SHUTDOWN_FORCE_CLEAN=1` keeps the slow drop path available for when a future HailoRT release fixes the upstream bug. No throughput regression after settle (PCIe driver re-init takes ~30 s after rapid restart cycles, but steady-state is unchanged): pre-iter-185 (iter 184): 70.5, 70.5, 69.6 → mean 70.2/sec, p50=112 ms post-iter-185 settled : 68.4, 69.2, 66.0, 68.1 → mean 67.9/sec, p50=55-56 ms (The p50 difference here is bench config — 4 vs 8 concurrency between the two measurements; per-run p50 at c=8 is unchanged from prior iters.) Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 18:06:34 -04:00
ruvnet	5f597dec83	sec(hailo): HTTP/2 keepalive ping for dead-peer reclaim (iter 184) tonic's default leaves http2_keepalive_interval=None, so a half-closed TCP connection (client crashed, NAT mid-flow drop, network partition) sits in the worker's accept table indefinitely, holding stream state that the iter-181 max_concurrent_streams cap can't reclaim. Add a 60 s server-initiated PING; if the client doesn't PONG within hyper's default 20 s timeout, the connection is closed and its state freed. Operators can tune via `RUVECTOR_HTTP2_KEEPALIVE_SECS`. 0 disables the feature entirely (cellular metering, ping-hostile networks). Floor 10 s so a misconfig can't saturate the link with pings. Validated on cognitum-v0, c=8 b=1, 8 s × 3 runs: iter-183 baseline: 70.5, 70.5, 69.6 → mean 70.2/sec iter-184 after : 70.6, 69.0, 70.5 → mean 70.0/sec Δ throughput: -0.3% (unmeasurable; the 60 s ping interval falls outside the 8 s bench window so no PINGs even fire during measurement) Δ p50 : flat at 110-112 ms Net new behavior: half-closed peers now reclaimed in ≤80 s instead of waiting on TCP keepalive defaults (sysctl tcp_keepalive_time = 2 hours). Combined with iter-181's 256-stream cap, the worker can no longer accumulate orphan stream state from disappearing clients. Five gates now in the worker startup banner: byte cap (180), stream cap (181), RPC timeout (182), rapid-reset cap (183), keepalive (184). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 17:55:34 -04:00
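The keepalive knob in the iter-184 entry has three behaviors: a 60 s default, `0` to disable, and a 10 s floor. The sketch below models that resolution and the ≤80 s reclaim bound (ping interval plus the 20 s PONG timeout quoted above). Function names are hypothetical, and treating unparseable input as the default is an assumption.

```rust
use std::time::Duration;

/// Values quoted in the iter-184 commit message.
const DEFAULT_KEEPALIVE_SECS: u64 = 60;
const MIN_KEEPALIVE_SECS: u64 = 10;
const PONG_TIMEOUT_SECS: u64 = 20;

/// Hypothetical helper: resolve `RUVECTOR_HTTP2_KEEPALIVE_SECS`.
/// `None` means keepalive pings are disabled.
/// Assumption: unparseable input falls back to the default.
fn resolve_keepalive(raw: Option<&str>) -> Option<Duration> {
    let secs = raw
        .and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(DEFAULT_KEEPALIVE_SECS);
    if secs == 0 {
        None // explicit opt-out, e.g. ping-hostile networks
    } else {
        // The floor keeps a misconfig from saturating the link with pings.
        Some(Duration::from_secs(secs.max(MIN_KEEPALIVE_SECS)))
    }
}

/// Upper bound on how long a half-closed peer can linger:
/// one full ping interval plus the PONG timeout.
fn worst_case_reclaim(interval: Duration) -> Duration {
    interval + Duration::from_secs(PONG_TIMEOUT_SECS)
}
```

With the default interval, `worst_case_reclaim` gives 60 s + 20 s = 80 s, the figure stated in the commit message.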
ruvnet	520e892493	sec(hailo): explicit CVE-2023-44487 rapid-reset cap (iter 183) hyper/h2 already mitigates the rapid-reset DoS by defaulting http2_max_pending_accept_reset_streams to 20 post-CVE, but pinning the value explicitly gives operators a tunable surface and makes the mitigation reviewable from worker startup logs. Set to 32 by default (small step above the h2 default to leave room for legit reset jitter), env-tunable via `RUVECTOR_MAX_PENDING_RESETS` with an 8 floor. Once exceeded, hyper sends GOAWAY and closes the connection. Validated on cognitum-v0, c=8 b=1, 8 s × 3 runs each: iter-182 baseline: 69.6, 67.4, 69.0 → mean 68.7/sec iter-183 after : 70.5, 70.5, 69.6 → mean 70.2/sec Δ throughput: +2.2% (noise band — legit traffic doesn't generate RST_STREAM under steady load, so the cap is invisible) Δ p50 : flat at 111-112 ms Layered with iter-180 byte cap, iter-181 stream cap, iter-182 RPC timeout — four DoS gates now visible in the worker startup banner. This closes the named-CVE checklist for the gRPC server surface; remaining hardening (HTTP/2 keepalive, header-list-size cap) targets liveness rather than DoS. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 17:51:14 -04:00
ruvnet	f1703b3211	sec(hailo): per-RPC server-side timeout (iter 182) tonic's default left request handlers running unbounded — a slow-loris client could open a stream and trickle bytes to keep it alive forever. Add `Server::timeout(30s)` so each handler is hard-bounded, with `RUVECTOR_REQUEST_TIMEOUT_SECS` for ops tuning and a 2 s floor to keep normal embeds (~50-200 ms) safe under any misconfig. Why 30 s: iter-179 measured worst legit RPC at 910 ms (b=16, c=2). 30 s gives 30× headroom while still reclaiming any stuck handler in under a sysctl `panic` window. Layered with iter-180 byte cap and iter-181 stream cap. Cancellation safety: the embed handler's HailoRT FFI section is fully synchronous (Mutex acquire → blocking FFI calls → response build). tonic's tower-timeout middleware can only drop the future at .await points — before the Mutex acquire (no resource leak) or after the response build (no leak). NPU vstreams are released only via the Mutex-held HefPipeline path, never through cancellation. Validated on cognitum-v0, c=8 b=1, 8 s × 6 runs: iter-181 baseline (3 runs): 68.7, 70.6, 68.6 → mean 69.3/sec iter-182 after (6 runs): 66.1, 63.7, 69.2, 70.5, 69.8, 65.8 → mean 67.5/sec Δ throughput: -2.6% (within tailnet jitter band; p99 in legit runs swings 210-558 ms back-to-back) Δ p50 : flat at 111-113 ms (no overhead at the median) Timeout middleware adds the cost of arming one tokio::time::sleep per RPC; at 70 RPS that's 4 µs per call against a 56 ms embed cost, well below the noise floor. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 17:47:26 -04:00
ruvnet	a55673a1a9	sec(hailo): HTTP/2 max_concurrent_streams cap (iter 181) tonic's default leaves SETTINGS_MAX_CONCURRENT_STREAMS unset so a single attacker socket could pump unbounded concurrent RPCs through one HTTP/2 connection. Cap at 256 by default, env-overridable via `RUVECTOR_MAX_CONCURRENT_STREAMS` with a floor of 8 so a misconfig can't lock out the bench/health-check path. Layered with iter-180's per-RPC byte cap. Validated on cognitum-v0 (Pi 5 + AI HAT+): bench-before (iter 180, no stream cap): c=8 b=1, 10s, 70.3/sec, p50=112ms, p99=190ms bench-after (cap=256), three runs c=8 b=1, 8s each: run 1: 68.7/sec, p50=112ms, p99=307ms run 2: 70.6/sec, p50=112ms, p99=175ms run 3: 68.6/sec, p50=112ms, p99=314ms mean : 69.3/sec, p50=112ms (rock-stable), p99 jitters 175-314ms — tailnet noise, not cap-bound (only 8 of 256 stream budget used by legit traffic). Cap is invisible to legit callers (current bench peaks at c=8) and provides 32× headroom over observed traffic. Caps the per-connection amplification an attacker gets from HTTP/2 stream multiplexing — they can still open more TCP connections, but each one is now bounded. The Pi NPU is the real ceiling at ~70/sec anyway, so multi-connection abuse hits the same compute wall. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 17:42:45 -04:00
ruvnet	7385aa3322	sec(hailo): gRPC max_decoding_message_size DoS gate (iter 180) tonic's transport-level cap lets each unauthenticated RPC allocate up to ~4 MB before the worker even sees the request — gratuitous for an embed worker (typical sentence-transformer text is <10 KB; iter-156b HEF truncates at seq=128 ≈ 1 KB anyway). Cap at 64 KB by default, operator-overridable via `RUVECTOR_MAX_REQUEST_BYTES`, with a 4 KB floor so a misconfig can't lock the worker out. Validated on cognitum-v0 (Pi 5 + AI HAT+): bench-before (iter 179, no cap): c=4 b=1, 12s, 67.3/sec, p50=56.6ms, p99=152.6ms bench-after (cap=65536): c=4 b=1, 12s, 68.6/sec, p50=56.5ms, p99=152.7ms → no regression on normal traffic (cap > tokenized payload) DoS probe — 100 KB embed text: OutOfRange "decoded message length too large: found 102432 bytes, the limit is: 65536 bytes" → rejected at decode, before any embedder/tokenizer alloc Acceptance probe — 60 KB embed text: succeeds, dim=384, latency_us=98733 → tokenizer truncates seq>128 internally; cap doesn't change semantic behavior, just shrinks the alloc surface. Tonic emits the rejection from `InterceptedService::new(server, intc)` because `max_decoding_message_size` lives on the generated `EmbeddingServer` (not the interceptor wrapper). Dropped the `with_interceptor` shortcut, which would re-build the inner with default limits. Cargo.lock churn carries the sha2 dep added in iter 174 (was out-of-sync with the source change since then). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 17:39:05 -04:00
ruvnet	2055d88b20	bench(hailo): --batch-size flag + streaming saturation profile (iter 179) Adds `--batch-size N` to ruvector-hailo-cluster-bench. N=1 (default) preserves the existing unary `embed_one_blocking` path. N>1 routes through the streaming `embed_batch_blocking` RPC, counting each returned vector as one success so unary/streaming throughput stays apples-to-apples. Cognitum-v0 (Pi 5 + AI HAT+) saturation sweep, 8s runs:
c=concurrency  b=batch  thr/s  p50     p99
─────────────  ───────  ─────  ──────  ──────
2              1        67.3   28.3ms  47.6ms  ← latency optimum
2              4        63.8   113ms   368ms
2              16       70.4   445ms   910ms
4              1        67.3   56.6ms  153ms   (iter-176 baseline)
4              8        70.2   455ms   882ms
8              1        70.6   111ms   187ms
8              4        70.6   454ms   877ms
Findings: throughput plateaus at ~70.6/sec across every (c,b) pair, which matches iter-157's raw HEF FPS ceiling. The bottleneck is single-stream FP32 forward on the NPU, not gRPC framing. Streaming RPC adds ~5% headroom only at c≤4; once concurrency >= 8 the NPU is already serializing, so batched RPC just buys longer per-RPC latency without more vectors out. Two operator-relevant takeaways: • Latency-sensitive callers should use c=2 b=1 (p50=28ms, p99=48ms). • Throughput-sensitive callers gain nothing from streaming today; the win is gated on the HailoRT async vstream API (NPU/PCIe overlap), which is on the iter-180+ backlog. Pi worker SEGV'd on shutdown during the previous bench cycle: vstream close raced with an in-flight RPC. Existing issue (HailoRT FFI shutdown ordering), separate from the iter-179 surface; reset-failed + start cleanly recovered. Filed mentally for an iter that adds SIGTERM-aware vstream drain. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 17:29:43 -04:00
ruvnet	2d37867294	sec(hailo): tighten SAFETY comments on HailoRT FFI unsafe blocks (iter 178) Audit pass over all 22 unsafe blocks in hef_pipeline.rs. Pre-iter 178: * 5x mem::zeroed() initializations had a single-line generic SAFETY comment ("the SDK writes through the &mut") * 7x FFI calls reused the same generic comment by reference * 1x union read documented "rank-3 inputs so shape, not nms_shape" without naming the discriminant field * 2x vstream write/read had one-line SAFETY mentioning only the input/output pointer Iter 178 expands each block's SAFETY comment to spell out: * For zeroed POD structs: which struct shape was verified against /usr/include/hailo/hailort.h, and why all-zero bits is a valid initial state (no enum discriminants, no nullable refs). * For FFI calls: provenance of every pointer/handle (which SDK call returned it, lifetime relative to subsequent calls, whether release runs in Drop), single-element vs multi-element out-buffers, and which post-checks catch bad sizes. * For union reads: the actual discriminant field (`format.order`), why the iter-156b HEF guarantees the non-NMS branch, and what would need to change for NMS HEFs. * For vstream write/read: alignment requirements (Vec<f32> 4-byte align on x86/aarch64), bounds via input_frame_bytes / output_frame_bytes computed from Hailo-reported shapes, and the &mut self serialization guarantee from iter-137 lib.rs Mutex. 
No runtime change → bench unchanged from iter 176 (70.2 embeds/sec on Pi 5 NPU, p99=89.6ms). The "before/after" here is unsafe-block documentation density: each block now gives a security reviewer the full context to verify the invariants without re-reading the HailoRT C headers. cargo clippy --all-targets -- -D warnings clean for all 4 feature combos. 15 lib tests pass. This commit is part of the iter-173/174 layered-startup-gates + iter-177 cargo-deny supply-chain push: every operator-facing attack surface (file content, FFI interaction, dep tree) now has a machine-checkable or human-reviewable gate. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-03 17:21:26 -04:00


2611 commits