iter-215 added `install-ruvllm-bridge.sh` (closing ADR-178 Gap A's
deploy-artifact gap for the third bridge). cross-build-bridges.sh
already cross-compiles `ruvllm-bridge` (line 36's BINS array, since
iter 122/128), but its trailing operator-hint at lines 141-145 only
named the two daemon bridges' installers — operators copying the
hint missed that ruvllm-bridge has its own installer too.
Updated the hint to:
- List all three installers
- Note ruvllm-bridge ships no systemd unit (subprocess lifecycle,
iter-215 design rationale)
- Use the conventional "pick the bridges you need" phrasing,
since most deploys won't use all three
Validated:
- bash -n on the script: parses clean
- All three install-*.sh referenced exist (iter-216 verified the
rename + file presence)
Pure deploy-script docs hygiene; no code or unit-file change.
Co-Authored-By: claude-flow <ruv@ruv.net>
iter-219's workspace re-inclusion (closing ADR-178 Gap E) had a
foreseeable-but-unspotted side effect on the iter-178
audit workflow: pre-iter-219 the hailo cluster crate had its own
narrower Cargo.lock, so `cargo audit --deny warnings` saw only the
deps that crate directly pulled in. Post-iter-219 with the workspace
lock, cargo-audit reads the wider tree and surfaces three advisories
that **deny.toml had already ignored** (iter 177 + iter 219):
RUSTSEC-2024-0436 paste (unmaintained, transitive
via candle/cpu-fallback)
RUSTSEC-2025-0134 rustls-pemfile (transitive via tonic-tls)
RUSTSEC-2025-0141 bincode 1.x (workspace-wide pin via
rkyv et al.)
cargo-audit and cargo-deny use separate config — deny.toml's
[advisories] ignore list isn't honored by cargo-audit. The fix is
to mirror the same three IDs into the CI workflow's `cargo audit`
invocation as `--ignore` flags.
Verified locally:
Pre-fix: cargo audit --deny warnings → "error: 3 denied warnings"
Post-fix: cargo audit --deny warnings --ignore <three> → exit 0
Each `--ignore` carries a backtick-comment naming the package + why
it's transitive — same rationale as the deny.toml entries so the two
config sources drift together if someone updates one.
This isn't a real new vulnerability — these advisories existed in
the workspace tree all along; iter-219 just exposed them to the
cluster-crate audit step. iter-178's CI gate stays green without
weakening; the substantive remediation (workspace-wide rkyv /
candle-stack updates) belongs to a workspace-wide cleanup iter.
No code change; CI config + workflow comment.
Co-Authored-By: claude-flow <ruv@ruv.net>
Same class as ADR-178 §3.2 F (iter-217 ADR-167 collapse). Two
inline doc comments still claimed pre-iter-163 / pre-iter-218
realities:
1. ruvector-hailo/src/lib.rs `has_model()` — said "Today this is
**always false** — HEF loading isn't wired in yet". Iter 163
made the NPU path canonical (cognitum-v0 + iter-156b HEF),
iter-176 added cpu-fallback automatic failover. Updated to
reflect iter-163+ reality.
2. ruvector-hailo-cluster/src/error.rs module docstring — said
"Maps cleanly onto ruvector_core::EmbeddingError once
iteration 14 brings the path dep." iter-218 landed the
ruvector-core path dep + EmbeddingProvider impl. Updated to
describe the actual iter-218 wiring (ClusterError →
RuvectorError::ModelInferenceError) plus the iter-209
is_terminal() helper that drives the retry-loop short-circuit.
The third stale reference grep hit at cluster/lib.rs:874 is INSIDE
the iter-218 commit's own comment quoting the old (pre-iter-218)
doc text as evidence — that's correctly preserved as historical
context, not a stale doc to fix.
Validated:
- cargo check: clean (doc-only, no type-system change)
No code change; pure docs.
Co-Authored-By: claude-flow <ruv@ruv.net>
iter-204 documented all worker-side env vars in
deploy/ruvector-hailo.env.example. iter-208 added two CLIENT-side
env vars (`RUVECTOR_CLIENT_CONNECT_TIMEOUT_MS` /
`_RPC_TIMEOUT_MS`) read by `GrpcTransport::new()`, which is
constructed by the bench/embed/stats CLIs and the three bridges —
not by the worker. So they correctly don't belong in the worker
.env, but they ARE operator-facing and were undocumented in the
README's "Security & DoS hardening" section.
Add a "Client-side tunables (iter 208)" subsection with a 2-row
table after the systemd-restart-burst block. Explains:
* Why these are separate from the worker env (client-side
GrpcTransport, not worker config)
* The 10s RPC default's relationship to iter-199's batch cap
(256 items × ~14ms NPU = ~3.6s legit batch RPC; 10s leaves
headroom)
* How it composes with iter-182's 30s server-side
request_timeout (client gives up first, server still has
margin to surface a real hang)
Validated:
- 406 → 424 lines (+18)
- Both env vars cross-checked against source:
grpc_transport.rs has both `env::var("RUVECTOR_CLIENT_*")`
reads from iter-208
- Markdown table parses (consistent with existing iter-180-184
table format)
No code change; pure operator-facing docs.
Co-Authored-By: claude-flow <ruv@ruv.net>
Closes ADR-178 Gap D (MEDIUM) iter-219 short-term. The audit flagged
that no consumer in the workspace was actually using
`HailoClusterEmbedder` as an `Arc<dyn EmbeddingProvider>` after
iter-218 made it possible — so even though the trait impl compiled,
the integration claim from ADR-167 §8.4 ("an app holding
`BoxedEmbeddingProvider` swaps a Hailo cluster in with zero code
changes") had no demonstration.
`examples/hailo-cluster-as-provider.rs` does the demonstration in
two modes:
Default (no live workers — CI smoke):
Builds a HailoClusterEmbedder against `null_transport()`,
immediately wraps it as `Arc<dyn EmbeddingProvider>`, asserts
name() == "ruvector-hailo-cluster" and dimensions() == 384,
then calls embed("hello world") to confirm the trait method
actually crosses into HailoClusterEmbedder::embed_one_blocking
(NullTransport refuses by design — that's the expected error
path; the assertion is on the error text, not panic). Proves
iter-218 + iter-219 type wiring still composes; runs in <1s.
Live (RUVECTOR_HAILO_WORKERS=<csv>):
Same construction but with GrpcTransport, embeds an N-doc
corpus (default 50, tunable via RUVECTOR_HAILO_CORPUS_N) through
the trait method, reports ingest QPS, runs a self-similarity
sanity check (cosine of doc[0] against itself should be ≈1.0
and rank top-1 in the corpus). Closes ADR-178 §3.2 D's
"5k-doc corpus" recommendation in spirit (smaller default for
quick smoke; operator can scale up via env).
The example explicitly documents which iter unblocked which line
("Pre-iter-218 this line would have said 'the trait
EmbeddingProvider is not implemented for HailoClusterEmbedder'") so
a future reader can audit the integration history through the code.
Validated:
- cargo check --example hailo-cluster-as-provider: clean (6s)
- Compile success IS the correctness proof — pre-iter-218 the
`Arc<dyn EmbeddingProvider> = Arc::new(cluster)` line would
have refused at the type-system level. It now compiles.
ADR-178 Gap D status: SHORT-TERM SHIPPED (example exists). The
iter-220 mcp-brain client integration remains as separate-ADR
follow-up work per ADR-178 §3.2 D's recommendation.
Co-Authored-By: claude-flow <ruv@ruv.net>
Closes ADR-178 Gap C (MEDIUM) short-term. The bridge's module
docstring and `summary_to_text` doc previously suggested it produced
embeddings useful for "presence / motion / pose downstream
consumers" — implying ADR-171's pose-semantic pipeline. ADR-178 §3.2
C audited the actual code path:
* `summary_to_text` (ruview-csi-bridge.rs:116) packs the 20-byte
ADR-018 header into a fixed-template NL string (channel, rssi,
node_id, antennas, subcarriers).
* The I/Q payload at `bytes 20..` is parsed for length but
otherwise dropped.
* Cosine embeddings of the resulting strings cluster by
`(channel, rssi-bucket, node_id)`, NOT by anything related to
actual WiFi-DensePose pose content.
This is fine — the bridge is correctly named and useful for
telemetry indexing — but ADR-171's pipeline diagram
(`CSI → preprocess → HEF → pose tensor`) implies it does pose
semantics, which it doesn't. Operators reading this file or ADR-171
got confused.
Two doc updates:
1. Module docstring — new "**Important: this bridge is *not*
WiFi-DensePose pose embedding**" section explicitly stating the
telemetry-indexing scope and pointing to the deferred work
(csi-pose-bridge needs a pose HEF, host-side I/Q preprocessing,
and a `HailoPipeline<I, O>` generalization — multi-month, separate
ADR per ADR-178 §3.2 C's long-term recommendation).
2. `summary_to_text` doc — removed the misleading "presence /
motion / pose downstream consumers" phrasing; replaced with a
"Note (iter 220)" block clarifying which fields drive the
similarity surface.
ADR-178 Gap C status: SHORT-TERM CLOSED. Long-term work (the actual
pose-semantic bridge) remains tracked as a separate-ADR follow-up.
Validated:
- cargo check: clean
- RUSTDOCFLAGS="-D missing-docs" cargo doc --bin ruview-csi-bridge:
clean (matches the iter-178 audit CI step)
- No code change; pure doc disambiguation
Co-Authored-By: claude-flow <ruv@ruv.net>
Closes ADR-178 Gap E (HIGH; folded into Gap B). Iter-218 landed the
ruvector-core path dep + EmbeddingProvider impls — the structural
blocker preventing workspace re-inclusion. This iter does the
mechanical part:
Root Cargo.toml:
- Removed `crates/ruvector-hailo`, `crates/hailort-sys`,
`crates/ruvector-hailo-cluster` from `[workspace.exclude]`.
- Added them + `crates/ruvector-mmwave` (also previously
standalone) to `[workspace.members]`.
Per-crate Cargo.toml:
- Stripped `[workspace]` standalone declarations from all four
crates (hailort-sys, ruvector-hailo, ruvector-hailo-cluster,
ruvector-mmwave).
- Comments updated to reference the iter-219 rejoin + ADR-178
Gap E closure.
Per-crate Cargo.lock:
- Removed (`git rm`) — parent workspace's Cargo.lock is now
canonical for the entire tree. CI's `cargo audit` /
`cargo deny check` steps still work from the cluster
subdirectory; they walk up to find the workspace root.
deny.toml (both hailo crates):
- Workspace re-inclusion surfaced 2 advisories that were
previously hidden by the narrower per-crate dep tree:
RUSTSEC-2025-0141 (bincode 1.x unmaintained)
RUSTSEC-2026-0097 (rand unsound w/ custom logger)
- Added to `ignore` list with a comment noting these are
workspace-wide concerns, not hailo-specific. They'll be
addressed in a workspace-wide remediation iter; ignoring
here keeps the per-crate audit step green so the iter-202
CI gate doesn't break on this rejoin.
Validated:
- cargo check --workspace: clean (27s; warnings are pre-existing
in unrelated crates: ruvector-graph-node, rvagent-cli,
ruvector-scipix, mcp-brain-server, etc.)
- cargo deny check (cluster): advisories ok, bans ok,
licenses ok, sources ok
- cargo deny check --all-features (hailo): same — all four ok
- Cluster integration sweep --features tls --test-threads=1:
23 suites, all green; 120 lib tests pass with TLS feature
- 4 newly-included workspace members all build with default
features on x86 (no Pi-only deps pulled in)
Effect: `cargo build --workspace` from the repo root now exercises
the full hailo stack. A workspace-wide refactor (ruvector-core
trait change, security advisory rebuild, clippy bump) can no longer
silently miss the hailo crates the way ADR-178 §3.2 E flagged.
ADR-178 Gap E status: CLOSED. Gap B status: PARTS 1 + 2 SHIPPED;
the only remaining `--backend hailo` ruvector-cli flag wiring is a
follow-up consumer-side iter.
Co-Authored-By: claude-flow <ruv@ruv.net>
Closes ADR-178 Gap B (HIGH) part 1. The headline integration claim
from ADR-167 §2.5 / §8.4 — that an app holding `Arc<dyn
EmbeddingProvider>` could transparently swap a single-Pi
HailoEmbedder for a fleet HailoClusterEmbedder — was never
delivered. Iter-178 audit found:
* Neither hailo crate declared a ruvector-core dep.
* `crates/ruvector-hailo-cluster/src/lib.rs:140-143` honestly
admitted the gap in a doc comment ("Implements
`EmbeddingProvider` once iteration 14 brings the path dep on
`ruvector-core`"). That iter never landed.
* `crates/ruvector-hailo/src/lib.rs:396-405` had a no-op
"signature parity" test that asserted only `T: Send + Sync`,
never that the impl actually existed.
Changes:
1. Add `ruvector-core` path dep to both hailo crates with
`default-features = false` so the reqwest / ort / hnsw stack
stays out of the Pi build. Only the trait + RuvectorError
surface is needed.
2. `impl EmbeddingProvider for HailoEmbedder` (ruvector-hailo).
~10 LOC, delegates to existing inherent methods. `embed`
folds `HailoError → RuvectorError::ModelInferenceError`.
3. `impl EmbeddingProvider for HailoClusterEmbedder`
(ruvector-hailo-cluster). Same shape; `embed` folds
`ClusterError → ModelInferenceError`. `name()` returns the
static `"ruvector-hailo-cluster"` since a cluster is a fleet,
not a single named device.
4. Replace the no-op signature-parity test with a real
impl-bound static assertion:
`fn assert_impl<T: EmbeddingProvider>() {}`
`assert_impl::<HailoEmbedder>();`
This now compile-fails if either the trait drifts or our impl
breaks — catching the same regression class ADR-178 flagged.
Validated:
- hailo lib tests : 21/21 pass (signature_parity now
real impl-bound, was no-op)
- cluster lib tests : 120/120 pass with --features tls
(114 without tls — feature gating
accounts for the 6 TLS-only tests)
- full integration sweep --test-threads=1: 23 suites, all green
- cargo build --release on both crates: clean, no extra deps
pulled in (ruvector-core compiles default-features-off in
~6 s additional)
What this does NOT do (deferred to part 2):
- Workspace re-inclusion (ADR-178 Gap E folds into B). The hailo
crates stay in `[workspace.exclude]` for now because hailort-sys
only links libhailort on Pi 5 + AI HAT+; rejoining requires
confirming the no-feature default still cargo build --workspace
cleanly. Saved for a focused iter so this one can ship the trait
impl without a workspace-config blast radius.
- `ruvector-cli --backend hailo` flag wiring. ADR-167 §2.3 plan;
unblocked by this iter but not in scope.
ADR-178 Gap B status: PART 1 SHIPPED (impl exists). Part 2 (workspace
inclusion + cli flag) tracked for a follow-up iter.
Co-Authored-By: claude-flow <ruv@ruv.net>
Closes ADR-178 Gap F (MEDIUM). ADR-167 had three nested status
snapshots stacked on top of the iter-163 NPU-default banner —
"Earlier (iter 134/135) snapshot — CPU fallback only", "HEF model
surgery (iter 139)", "Earlier (iter 116) snapshot" — each from a
different point in the project's history. An unfamiliar operator
opening the master ADR had to walk past three older worldviews to
find what's true today.
Three changes:
1. Replaced the stratified Status section with a single clean
iter-213+ block: "NPU acceleration is the production default
since iter 163. ~70 embeds/sec/worker, p50=55-57 ms, p99=86-90
ms, 9.6× over cpu-fallback. ADR-176 tracks the EPIC; iters
174-216 layer security/DoS/OOM hardening." Points readers
needing chronology to §9 History.
2. Updated step-10 row in §5 Implementation plan from "exits clean
with NotYetImplemented (gate is HEF compilation only)" to the
iter-145+ reality: "startup self-test embed ok dim=384 → 7 DoS
gates logged → serving addr=0.0.0.0:50051". The
NotYetImplemented exit was true at iter 12; iter 163 made NPU
the default, iter 145 added the self-test, iters 174-216 added
the hardening surface — all unmentioned in the prior text.
3. Hoisted the three stripped snapshot blocks (lines 28-275 of the
prior version) verbatim into a new §9 History appendix at the
bottom. Preserves the full chronological story for anyone
auditing the project's evolution; cross-references that depend
on these stratified snapshots are flagged as migrating to
ADR-176 (the HEF EPIC) where they correctly belong.
ADR-178 Gap F status: CLOSED.
Validated:
- 612 → 638 lines (+26 net = History block header offset + Status
expansion; chronological content preserved verbatim)
- Section ordering: Status → §1-§8 (Decision/Plan/§8 Multi-Pi
added late) → §7 References → §9 History
- All deep links to specific iters in §9 still resolvable
- No code change; pure ADR docs hygiene
Co-Authored-By: claude-flow <ruv@ruv.net>
Closes ADR-178 Gap H (LOW). The mmwave-bridge installer was named
unqualified `install-bridge.sh` since iter 106 — fine when there was
only one bridge, increasingly misleading after iter 123 added
ruview-csi-bridge and iter 124 added ruvllm-bridge. ADR-178 §3.2 H
recommended folding the rename into Gap A (iter 215); shipped as
its own focused commit so the rename is git-traceable separately.
Used `git mv` so blame history follows the file. Updated all 7
references across the deploy tree:
- install-ruview-csi-bridge.sh (companion-of comment)
- install-mmwave-bridge.sh (self-reference in usage line)
- install-ruvllm-bridge.sh (companion-of comment)
- ruvector-mmwave-bridge.env.example (udev rule provenance)
- ruvector-mmwave-bridge.service (User=/Group= comment + udev note)
- 99-radar-ruvector.rules (provenance comment)
- cross-build-bridges.sh (operator hint at line 144)
ADR-178's references to `install-bridge.sh` (lines 83, 96, 337-342)
are intentionally preserved — they're the historical gap evidence
the analysis relies on. Updating them would erase the rationale for
this commit.
Validated:
- bash -n on install-mmwave-bridge.sh + cross-build-bridges.sh
- systemd-analyze verify on ruvector-mmwave-bridge.service
(only "binary missing" error, expected on dev box)
- All three install scripts now consistently named:
install-mmwave-bridge.sh (iter 106 + iter 216 rename)
install-ruview-csi-bridge.sh (iter 123)
install-ruvllm-bridge.sh (iter 215)
ADR-178 Gap H status: CLOSED.
Co-Authored-By: claude-flow <ruv@ruv.net>
Closes ADR-178 Gap A (HIGH). The other two bridges shipped with
deploy automation since iter 106 (mmwave) / iter 123 (csi), but
ruvllm-bridge had no installer or env example — operators had to
hand-build the system user, drop the binary, and write the env file
themselves. iter 207's commit message specifically called this out
as a known gap.
Two artifacts shipped:
install-ruvllm-bridge.sh
Mirror of install-ruview-csi-bridge.sh shape — creates
`ruvector-ruvllm` system user (no home, no shell), drops
/usr/local/bin/ruvllm-bridge, populates /etc/ruvllm-bridge.env
from the example, creates /var/lib/ruvector-ruvllm state dir
at 0750. Idempotent.
ruvllm-bridge.env.example
Operator-facing template with the three required env vars
(WORKERS, FINGERPRINT, DIM) and EXTRA_ARGS for the iter-187/188/
189 TLS / mTLS flag set. Documents `--tls-domain` explicitly
(the iter-207 fix the csi-bridge env got).
**Lifecycle difference vs the other two bridges:** ruvllm-bridge is
a stdin/stdout JSONL adapter, not a UDP/serial daemon. It's spawned
by the parent ruvllm process, reads requests on stdin, writes
responses on stdout, exits on EOF. systemd's daemon model
(start/stop/restart-on-failure) doesn't fit, so this iter
deliberately ships NO `.service` unit. The install script's
exit message documents the parent-managed invocation pattern with
a copy-paste-able example.
Validated:
- bash -n on install script: parse clean
- env file `set -a; . file; set +a`: parse clean
- install script chmod 0755 + executable bit set
- All three bridges now have install + env-example artifacts;
only mmwave + csi have systemd units (correct — the bridge
architectures genuinely differ)
ADR-178 Gap A status: CLOSED.
Co-Authored-By: claude-flow <ruv@ruv.net>
Captures the gap analysis the user requested (goal-planner agent
research, 459 lines, evidence-grounded with file:line citations
matching the ADR-172/iter-176-EPIC house style).
Eight gaps identified, three at HIGH severity:
Gap A ruvllm-bridge missing deploy artifacts
(install-*.sh, *.service, *.env.example, README mention)
— iter 207 specifically called this out; mmwave + ruview-csi
each ship complete bundles, ruvllm doesn't.
Gap B ruvector-core EmbeddingProvider not wired
— neither hailo crate declares a ruvector-core dep;
ADR-167 §2.5/§8.4's headline integration promise is unmet;
the cluster lib.rs:140-143 doc comment literally admits it;
the parity test at lib.rs:396-405 is a no-op (Send + Sync
only).
Gap C ruview-csi-bridge embeds telemetry, not pose-semantic data
— summary_to_text:95-108 packs only the 20-byte ADR-018
header as a string and drops the I/Q payload; the bridge
does telemetry indexing, not the WiFi-DensePose pose-
semantic embedding ADR-171 implies.
Remediation list outlines six iter-sized follow-ups (Gap A first
since it has the smallest blast radius — pure deploy-artifact work
at parity with the existing two bridges). Three larger items
(csi-pose-bridge rewrite, mcp-brain client, LoRaTransport)
correctly flagged for separate ADRs rather than scope creep here.
No code change in this commit; pure planning artifact. The ADR is
in the standard docs/adr/ format with frontmatter relating it to
ADR-167/168/171/172/173/176/177.
Co-Authored-By: claude-flow <ruv@ruv.net>
iter-211's refactor introduced a small docs regression: the
multi-paragraph doc comment that originally explained verify_files
ended up attached to the new private read_with_cap helper, leaving
verify_files (a public function) with no doc. The hailo-backend
audit CI step `RUSTDOCFLAGS="-D missing-docs" cargo doc` would have
flagged this on the next run.
Also caught a follow-up: my first repair pass referenced
`[read_with_cap]` as an intra-doc link, but read_with_cap is
private — rustdoc emits `rustdoc::private_intra_doc_links` when
generating public API docs. Switched to a plain code-style mention
("the private read_with_cap helper") so the link warning clears
without `--document-private-items`.
Validated:
- `cargo check --release` clean (was 1 missing-docs warning)
- `RUSTDOCFLAGS="-D missing-docs" cargo doc --no-deps --lib` clean
(matches the doc-warnings CI step in
.github/workflows/hailo-backend-audit.yml)
- lib tests still 120/120 (semantics unchanged)
- integration sweep all green
No production code change; pure docs hygiene catching the iter-211
regression before it would have failed CI.
Co-Authored-By: claude-flow <ruv@ruv.net>
Pulls in c15326d04 (fix(ruvllm): pin optionalDependencies to 2.0.1)
from main. Single-file change in npm/packages/ruvllm/package.json,
zero overlap with the hailo-backend crates (crates/ruvector-hailo*,
crates/hailort-sys, crates/ruvector-mmwave). Conflict-free 3-way
merge verified via `git merge-tree` before commit.
Iter 213 OOM-bounding sweep + the 38-commit security/audit run
(iter 174-213) on this branch is unaffected.
Continues iter-210/211/212's OOM-bounding sweep across all
operator-controlled file paths. Three remaining boot-time reads in
the ruvector-hailo crate:
vocab.txt (tokenizer.rs::from_vocab_file)
- all-MiniLM-L6-v2: 232 KB
- XLM-RoBERTa large: ~5 MB ceiling
- cap: 16 MB (~70× legit headroom)
config.json (host_embeddings.rs + cpu_embedder.rs)
- BERT-family: <1 KB typically
- cap: 64 KB (64× legit headroom)
Same threat model as iter-210 (manifest), iter-211 (sig + pubkey),
iter-212 (PEM): operator-controlled paths set via env-driven model
dir. A misconfig pointing model_dir at /var/log/* or a binary blob
would otherwise OOM the worker at boot when these files load.
config.json caps in BOTH host_embeddings.rs (NPU path) and
cpu_embedder.rs (cpu-fallback path) — duplicated rather than
factored because the two crates have different error types
(HailoError variants) and the cap value is identical anyway.
Validated:
- 2 new tokenizer test cases (lib tokenizer::tests):
from_vocab_file_rejects_oversized — 32 MB fixture, asserts
rejection with "16 MB cap" or "iter 213" in error
from_vocab_file_accepts_small_vocab — mini_vocab() loads
cleanly, locking in that the cap doesn't block legit use
- hailo lib tests: 19 → 21 (+2)
- hailo cpu-fallback tests: still 27 (unchanged — cap path is
only reached on oversize, which the test fixtures don't trigger)
- cluster integration sweep --test-threads=1: all 23 suites green
Coverage trail now complete for cluster + hailo operator-path reads:
iter 210 FileDiscovery manifest (1 MB)
iter 211 manifest sig + pubkey (16 KB each)
iter 212 TLS PEM via read_pem (1 MB; gates 5 paths)
iter 213 vocab.txt + config.json (16 MB / 64 KB)
Pi worker untouched in code; the gates fire at boot before any RPC
serves traffic.
Co-Authored-By: claude-flow <ruv@ruv.net>
Continues iter-210/211's pattern of OOM-bounding operator-controlled
file paths read at boot. `tls::read_pem` is the single chokepoint for
all five PEM-loading paths in the codebase (server cert, server key,
client cert, client key, client CA bundle), so capping it once gates
all of them.
Same threat model as iter-210 (FileDiscovery manifest) and iter-211
(manifest_sig sig + pubkey): operator-controlled paths set via env
var (RUVECTOR_TLS_CERT, _KEY, _CLIENT_CA, etc.) — a misconfig
pointing one of these at /var/log/syslog or a binary blob would OOM
the worker at boot before rustls ever sees the bytes. 1 MB cap is
~100× a full chain-with-intermediates legitimate PEM (~30 KB peak).
Validated:
- Existing tls tests: 4/4 still pass (domain_from_address coverage
untouched)
- 2 new test cases:
read_pem_rejects_oversized_file — 2 MB pem-shaped fixture,
asserts size-cap rejection with "iter 212" + "byte cap"
read_pem_accepts_small_file — 30-byte legit-shape PEM still
reads cleanly, locking in that the cap doesn't accidentally
block legit traffic
- lib tests: 118 → 120 (+2)
- full integration sweep --test-threads=1: all suites green
Coverage now: every operator-controlled file path on the worker
boot/RPC paths is OOM-bounded. iter-210 (manifest), iter-211
(sig + pubkey), iter-212 (5× PEM via read_pem) — the audit trail
matches the deploy artifact set.
Co-Authored-By: claude-flow <ruv@ruv.net>
Parallel to iter-210's FileDiscovery cap. `manifest_sig::verify_files`
read three operator-controlled paths with no size cap:
- manifest (1 MB legit ceiling, same as iter-210)
- signature (ed25519 ~64 B; 16 KB ceiling = 180× legit)
- pubkey (ed25519 ~32 B hex; 16 KB ceiling = same headroom)
A misconfig (operator pointing /etc/ruvector-hailo/workers.sig at
/var/log/syslog) or an attacker with write access to that directory
could OOM the worker at boot during signature verification — the
read happens before any sig validation can fail. iter-210 closed the
parallel hole on the manifest path itself; this iter closes the
remaining two.
Implementation factors a small `read_with_cap(path, cap, label)`
helper so all three reads share the same stat-then-read pattern. The
caps are constants in the function rather than env vars because:
- Legit values are tiny + fixed (ed25519 is a known size)
- There's no operational need to tune them
- Hardcoding keeps the gate one less surface to misconfigure
Validated:
- Existing sig tests pass: 6/6 (no behavior change for in-spec inputs)
- 2 new test cases:
verify_files_rejects_oversized_signature — 64 KB sig fixture
verify_files_rejects_oversized_pubkey — 64 KB pk fixture
Both assert the rejection text mentions the right label
("signature"/"pubkey") + "iter 211" for traceability.
- lib tests: 116 → 118 (+2)
- full integration sweep: all 23 suites green
No production code change to the worker's hot path; the gate is
operator-side at boot during the manifest signature check.
Co-Authored-By: claude-flow <ruv@ruv.net>
Real audit find: `FileDiscovery::discover` called
`std::fs::read_to_string` on the operator's manifest path with no
size cap. A pathologically large file (operator misconfig pointing
at /var/log/* or a binary blob, or an attacker-corrupted
/etc/ruvector-hailo/workers.txt with write access) would OOM the
worker at boot — and the OOM happens BEFORE the iter-107 ed25519
signature verification, so even signed-only deploys are vulnerable
to "wrong file pointed at" misconfigs.
Fix: stat the file first; refuse if it exceeds 1 MB. Legitimate
fleet manifests are one `name = host:port` per worker (~100 B/line);
even a 1000-worker tailnet fits in <100 KB. 1 MB is 10× legit
headroom + a clean error message that names the cap and links to
the iter for traceability. The cap fires BEFORE the iter-107
signature check so a giant file fails fast — verifying a 1 GB
"signed" manifest would be slow even though it'd ultimately reject.
Validated:
- Unit tests added (lib discovery::tests):
file_discovery_rejects_oversized_manifest — writes a 2 MB
fixture, asserts ClusterError::Transport with the cap
rejection text mentioning "iter 210" + "byte cap"
file_discovery_accepts_small_manifest — well-under-cap
manifest parses to 2 WorkerEndpoints, locking in that
the cap doesn't accidentally block legitimate use
- lib tests: 114 → 116 (+2)
- full integration sweep --test-threads=1: 13 suites, all green
No production code change to the worker itself; the FileDiscovery
gate is operator-side at boot.
Co-Authored-By: claude-flow <ruv@ruv.net>
Real audit find: `embed_one_blocking_with_request_id` retried EVERY
error up to MAX_DISPATCH_RETRIES=2 (3 total attempts). For transient
failures (network blip, worker crash, deadline_exceeded) that's
correct. For deterministic errors that won't change on retry, it
makes things actively worse:
iter-180 byte cap (OutOfRange) : 3 hammered worker calls,
all guaranteed to fail
identically. Each wastes
worker NPU + bandwidth.
iter-199 batch cap (InvalidArgument) : same.
iter-104/200 rate limit (ResourceExhausted):
retrying makes things
*worse* — every retry
consumes another token
from the same peer's
bucket via the
interceptor + iter-200
check_n debit, deepening
the rate-limit hole the
caller is already in by
3×.
DimMismatch / FingerprintMismatch : worker is structurally
wrong; retry can't help.
Add `ClusterError::is_terminal()` that string-matches the wrapped
gRPC Status (tonic's Display includes "status: <Code>") for the
three deterministic codes plus the two structural variants. Wire
into the retry loop: terminal errors return immediately; transient
errors keep their existing retry behavior.
The string-match approach was chosen over plumbing `tonic::Code`
through ClusterError::Transport because the latter would touch
~30 call sites + ripple through ClusterError's Display impl. The
match patterns are stable (tonic 0.12 Status::code() Display is
"status: <Code>" verbatim) and unit-tested with 6 cases below to
catch any future drift.
Validated:
- lib tests : 108 → 114 (+6 error::tests::is_terminal_*)
- full sweep (--features tls, --test-threads=1): all 23 suites green
(lib + 22 integration suites unchanged in pass count)
- test cases cover:
OutOfRange (byte cap) ✓
InvalidArgument (batch cap) ✓
ResourceExhausted (rate limit) ✓
DimMismatch (structural) ✓
FingerprintMismatch (structural) ✓
DeadlineExceeded / Cancelled / Internal ← NOT terminal,
legit retry candidates ✓
NoWorkers / AllWorkersFailed ← aggregate, not
per-attempt ✓
Behavior change for callers:
Before: 3-attempt retries on byte/batch/rate-limit errors,
~3× extra wasted server work + worse rate-limit damage.
After: immediate clean error, server work drops to 1 attempt,
rate-limit token consumption matches the original
1-RPC-1-token contract.
Co-Authored-By: claude-flow <ruv@ruv.net>
Real audit find: iter-199 raised the worker's `max_batch_size` to 256
(rejecting larger batches). The cluster client's `GrpcTransport::new`
default rpc_timeout was 2 s — set in iter 92 when the only RPC was
unary embed at ~14 ms each. With iter-199's batched streaming, a
single legitimate embed_stream RPC at b=256 needs
256 items × ~14 ms NPU = ~3.6 s
of server-side time. The 2 s client deadline cuts it off mid-flight,
guaranteeing `Status::deadline_exceeded` for every b≥128 batch even
though the worker would have completed the work cleanly. The
iter-182 30 s server-side `request_timeout` never gets a chance to
fire because the client gives up first.
Fix: bump default rpc_timeout to 10 s (2.7× headroom over the b=256
worst case, still well under iter-182's 30 s outer bound — so a real
hung worker still surfaces to the client within its own timeout).
Make both connect + rpc timeouts env-tunable for ops:
RUVECTOR_CLIENT_CONNECT_TIMEOUT_MS default 5000, floor 100
RUVECTOR_CLIENT_RPC_TIMEOUT_MS default 10000, floor 100
Floors prevent a misconfig (e.g. =0) from immediately failing every
RPC.
iter-179's streaming saturation sweep peaked at b=16 (224 ms NPU
time) so didn't catch this — the bug only manifests at higher batch
sizes that the iter-199 ceiling first made viable.
Validated:
- Both feature-combo builds clean
- Cluster integration tests still pass:
tls_roundtrip : 2/2
cluster_load_distribution: 12/12
- Smoke against Pi worker with overrides set:
RUVECTOR_CLIENT_RPC_TIMEOUT_MS=15000
RUVECTOR_CLIENT_CONNECT_TIMEOUT_MS=8000
→ bench runs cleanly (env vars accepted, no parse error)
- Clippy clean (-D warnings)
No production code changed for the worker; pure transport-side
correction. Pi worker untouched.
Co-Authored-By: claude-flow <ruv@ruv.net>
Audit of bridge env examples found a docs inconsistency:
- mmwave-bridge.env.example : listed all 4 TLS flags
(--tls-ca, --tls-domain,
--tls-client-cert, --tls-client-key)
- ruview-csi-bridge.env.example: listed only 3 — omitted --tls-domain
Both bridge binaries parse `--tls-domain` (verified: src/bin/
ruview-csi-bridge.rs:135 + src/bin/mmwave-bridge.rs:121). When the
cluster's worker cert SAN is a DNS name (e.g. server.crt issued for
"worker.local") and the bridge dials via IP (the
RUVECTOR_CSI_WORKERS default 100.77.59.83:50051), rustls validates
the cert SAN against the SNI — which defaults to "100.77.59.83" if
--tls-domain isn't set. That fails the hostname check and the
bridge can't reach the cluster.
Without the docs, an operator hitting this had no obvious way to fix
it short of grep'ing the binary. The csi-bridge env example now
mirrors the mmwave-bridge layout: lists all 4 flags with a clear
note on when each is needed.
Validated:
- bash sources the file cleanly
- 34 → 41 lines
No code change; pure docs alignment.
Co-Authored-By: claude-flow <ruv@ruv.net>
Audit of operator-facing docs found the cluster crate's 358-line
README contained zero references to any of the iter 174-205 security
work. Operators evaluating the project couldn't tell the worker
ships with eight layered DoS gates, an opt-in HEF sha256 pin, mTLS
support, or systemd restart-rate limiting — all of which had to be
discovered by reading worker.rs, deploy/ruvector-hailo.env.example,
or the .service file.
Add a "Security & DoS hardening" section between QUICKSTART and "What
it ships":
- Table of the 8 gRPC-surface gates (iter 180/181/182/183/184/190/
191/199) with iter / env var / default / floor / what-it-bounds.
- Three orthogonal tracks called out:
HEF integrity pin (iter 174) — sha256 verification at boot
Per-peer rate limit (iter 104/200) — incl. iter-200's per-item
debit on streaming RPCs so the throttle isn't defeated by
batching
TLS + mTLS (iter 99/100) — server-side env-var contract +
symmetric client flags from iter 187/188/189
- Shutdown hardening (iter 185) — why the worker exits via
`process::exit(0)` instead of clean drop, and the
RUVECTOR_SHUTDOWN_FORCE_CLEAN escape hatch for the future
upstream fix.
- systemd restart-burst cap (iter 205) — bounded retry vs the
pre-iter-205 forever-cycling behavior.
Pointer to deploy/ruvector-hailo.env.example for full per-knob
rationale (the iter-204 docs).
Validated:
- 358 → 406 lines, +48 lines of operator-facing security docs
- Every env var referenced in the new section traces back to
source code (loop-checked across both crates)
- Markdown is well-formed (heading hierarchy, table syntax, intra-
repo link to ../../docs/adr/* preserved)
No production code changed; pure docs.
Co-Authored-By: claude-flow <ruv@ruv.net>
Audit of the deploy systemd units found a real reliability gap. All
three (worker + mmwave-bridge + ruview-csi-bridge) carry
`Restart=on-failure` + `RestartSec=2` so a transient crash recovers
quickly. But none had `StartLimitBurst` / `StartLimitIntervalSec`
set, so a unit that fails *every* startup (worker: bad
RUVECTOR_HEF_SHA256 from iter 174, missing model.hef, vstream alloc
fail; bridges: missing UART device, malformed worker manifest) cycles
every 2 s forever — churning the journal and (for the worker)
spinning the NPU vdevice.
Add to each unit's [Unit] section:
StartLimitBurst=5
StartLimitIntervalSec=60
Now after 5 failed starts inside a 60 s window systemd parks the
unit in `failed` state — operator sees a clear stop instead of a
log flood. Iter-185's clean shutdown path (`process::exit(0)`) is
treated as success and doesn't count toward the burst.
Validated:
- `systemd-analyze verify` on all three units → clean parse
(only "binary missing" errors, expected on dev box where the
binaries aren't installed)
No production code changed; pure deploy-side hygiene.
Co-Authored-By: claude-flow <ruv@ruv.net>
Audit of the operator-facing deploy artifacts found
`deploy/ruvector-hailo.env.example` was 50 lines covering only
RUVECTOR_WORKER_BIND, RUVECTOR_MODEL_DIR, RUST_LOG,
RUVECTOR_CPU_FALLBACK_POOL_SIZE, and RUVECTOR_HEF_SHA256. The 9
DoS-hardening env vars added in iter 180-200 plus the 4 longstanding
ADR-172 §3 vars (rate limit, audit log mode, TLS, mTLS) had no
operator-facing documentation. Operators tuning the worker had to
read the worker.rs module docstring or grep the binary's startup
log to discover what knobs existed.
Add a "DoS gate stack" block listing every gate with:
- which iter introduced it
- default value (commented out — same value the worker logs at
startup, so deployers see the canonical setting without
activating it)
- the floor enforced in worker.rs that prevents a misconfig
from locking out legitimate traffic
- one-paragraph rationale linking back to the iter that proved
the gate was needed
Plus four pre-existing ADR-172 §3 vars (rate limit, audit log mode,
TLS, mTLS) that were similarly undocumented in this artifact.
Validated:
- bash sources the file cleanly: `set -a; . env.example; set +a`
→ "parse ok"
- every documented env var resolves to source code in
crates/ruvector-hailo-cluster/src or crates/ruvector-hailo/src
(loop-checked; no MISSING IN SRC output)
- 50 → 143 lines, +93 lines of operator-facing documentation
Pi worker untouched; pure docs change.
Co-Authored-By: claude-flow <ruv@ruv.net>
iter-192 brought 6 of the worker's gRPC DoS gates to fakeworker for
parity. iter-199 added the 7th gate (`embed_stream` batch-size cap)
to the real worker but **didn't backport it** — fakeworker silently
processed batches of any size while the real worker rejected them.
Same parity-drift problem iter-192 was meant to prevent.
Audited end-to-end during iter 203: confirmed iter-192 gates fire
correctly on fakeworker (over-cap 8 KB → OutOfRange "found 8223
bytes, limit 4096"), but `embed_stream` accepted unbounded batches
because it never checked length.
Backport adds a `max_batch_size` field to FakeWorker (read from the
same `RUVECTOR_MAX_BATCH_SIZE` env, same default 256, same floor 1
as the real worker, iter 199). The handler refuses oversized batches
with `Status::invalid_argument` matching the real worker's error
text, so any test that asserted the rejection format keeps working.
Validated:
- Cluster integration sweep --test-threads=1: 186/186 pass
(legit fakeworker test batches all fit under 256 default — no
existing test breaks; the cap is invisible to legitimate use)
- End-to-end smoke against `RUVECTOR_MAX_BATCH_SIZE=8`:
startup banner: "fakeworker DoS-gate parity (iter 192/203) ...
max_batch_size=8"
over-cap (b=16): 493 376 fast rejections, 0 successful
under-cap (b=4): 99 709 RPCs/sec × 4 vectors = ~400k/sec
(zero-latency mock — purely tonic+gRPC framing throughput)
- iter-192 byte cap still fires: tested
`RUVECTOR_MAX_REQUEST_BYTES=4096` against an 8 KB embed →
OutOfRange "found 8223 bytes, the limit is: 4096 bytes"
Eight DoS gates now mirrored on fakeworker (iter 180/181/182/183/
184/190 from iter-192 + iter-199 from this iter). iter-200's per-item
rate-limit debit doesn't backport because fakeworker has no rate
limiter (intentional — pure mock for transport-level testing).
Co-Authored-By: claude-flow <ruv@ruv.net>
Audit found two related issues:
1. Iter 177 added deny.toml to BOTH the cluster and hailo crates,
but CI only audited the cluster's. The hailo crate's
candle / tokenizers / safetensors chain (cpu-fallback feature)
and hailort-sys FFI surface (hailo feature) were ungated.
2. Both deny.toml files set `wildcards = "deny"`, which
cargo-deny applies to path deps too. The cluster has path
deps on ruvector-hailo, ruvector-mmwave, hailort-sys — so the
`bans` check would fail on `cargo deny check` if anyone ran
it. The CI step ran but apparently never gated; running it
locally now surfaces:
error[wildcard]: found 1 wildcard dependency for crate
'ruvector-hailo' ...
bans FAILED
Fix:
- Add `allow-wildcard-paths = true` to both deny.toml [bans]
sections. cargo-deny only honors this on non-publishable
crates, so also mark both crates `publish = false`. Both
are internal-only (path deps to hailort-sys make them
unpublishable to crates.io anyway), so the publish flip is
correct hygiene independent of cargo-deny.
- Add a second `cargo deny` step in the hailo-backend-audit
workflow that runs in `crates/ruvector-hailo` with
`--all-features` so the cpu-fallback + hailo feature surfaces
are audited.
- Add three new test/clippy steps for the hailo crate so iter-198's
hef_verify cases (and iter-186 host_embeddings, iter-191
hef_pipeline patches) are explicitly gated:
cargo test (default features)
cargo test --features cpu-fallback (hef_verify + tokenizer)
cargo clippy --all-targets -D warnings
Validated locally:
Both crates: cargo deny check → advisories ok, bans ok,
licenses ok, sources ok
hailo lib : 19 tests pass (default)
26 tests pass (--features cpu-fallback)
hailo clippy: clean
cluster lib: 108 tests still pass
No production code changed; pure CI + crate-config hygiene. Pi
worker untouched.
Co-Authored-By: claude-flow <ruv@ruv.net>
iter-200 added `RateLimiter::check_n(peer, n)` to debit the
streaming-batch length against the per-peer rate limiter, then
wired it into `embed_stream`. Both code paths shipped without
direct test coverage. Add five focused unit tests covering the
contract:
check_n_zero_is_a_noop
n=0 must not consume tokens (the embed_stream caller passes
n-1 after the interceptor's 1, so for batch=1 the call is
n=0). Repeated zero-calls don't burn the bucket; a normal
check still succeeds afterwards.
check_n_within_burst_consumes_n_tokens
1 rps / burst 5: check_n(3) leaves 2 tokens; two more singleton
checks pass; the third fails. Locks in the "actually consumes
n tokens" property.
check_n_exceeding_burst_is_denied
1 rps / burst 4: check_n(8) returns Err (governor's
InsufficientCapacity collapsed to RateLimitDenied). The bucket
is unchanged — the failed attempt does NOT burn any tokens, so
4 singleton checks still pass after.
check_n_partial_capacity_denied_without_consuming
Burn 2 of 4, then check_n(3) — tokens-needed (2 + 3 = 5) > 4 so
denied. The 2 already-burned tokens stay burned; the failed
check_n doesn't roll them back. Verifies the failure mode is
"deny + don't side-effect."
check_n_separate_peers_have_independent_buckets
A streaming-batch debit on peer-a must not bleed into peer-b's
quota — proves the per-peer keying still holds for check_n.
Validated:
- rate_limit lib tests: 7 → 12 (+5 iter 201)
- full lib : 103 → 108
- full integration sweep : 181 → 186 tests, 0 failures
- all flaky tests still green (iter-196/197 fixes hold)
Pi worker untouched; pure test-side addition.
Co-Authored-By: claude-flow <ruv@ruv.net>
iter-104's per-peer rate limiter ran in the gRPC interceptor, which
fires once per RPC regardless of body shape. With iter-199's 256-batch
ceiling, that meant a peer rate-limited at 1 RPS could still extract
256 embeds/sec by sending one streaming RPC per second — defeating
the iter-104 throttle entirely. iter-199 closed the worst case (the
~16 k-batch DoS), but a rate-limited peer was still 256× over budget.
Fix: in `embed_stream`, after the batch-size cap check passes, debit
the rate limiter by `n - 1` more tokens (the interceptor already
counted the first one). Total debit per RPC = batch length, so a
1 RPS peer is genuinely capped at 1 embed/sec end-to-end whether
they send one unary RPC or one batched RPC.
Adds `RateLimiter::check_n(peer, n)` wrapping governor's `check_n`
+ NonZeroU32 + InsufficientCapacity → RateLimitDenied collapse.
n == 0 short-circuits to Ok(()).
Path is a no-op when the limiter is None (default deploy), so unary
RPS-only fleets see no behavior change. When enabled, denied batches
return Status::resource_exhausted and bump the same shared counter
the iter-105 stats endpoint surfaces.
Validated:
- rate_limit lib tests: 7/7 pass (existing coverage holds)
- Pi self-test: vec_head=0.0181,-0.0220,0.0451,0.0159 (unchanged)
- Pi unary bench c=4 b=1, 8 s × 3:
66.5, 58.8, 57.8 → mean 61.0/sec, p50=56-63 ms
(tailnet jitter active during this iter; worker-side latency
was ~16-28 ms in journalctl, so the dip was network)
- Pi streaming bench c=1 b=16, 6 s:
46.8 RPCs/sec × 16 vectors = 749 vectors/sec, 0 errors,
p50=255 ms/RPC = 16 ms/item — NPU-rate as expected,
iter-200's `n > 1` branch hit but no-op'd (limiter=None).
End-of-session DoS gate stack is now seven gates layered:
iter 180 decoding cap 64 KB
iter 181 max_concurrent_streams 256
iter 182 request_timeout 30 s
iter 183 rapid-reset cap 32
iter 184 http2_keepalive 60 s
iter 190 encoding cap 16 KB
iter 199 embed_stream batch 256
iter 200 rate-limit batch debit per-item accounting
Co-Authored-By: claude-flow <ruv@ruv.net>
Real DoS vector found by audit: `embed_stream` accepted unbounded
`EmbedBatchRequest.texts.len()`. The iter-180 64 KB byte cap bounded
the encoded request size, but tightly-packed 1-byte texts (each ~3 B
proto framing + 1 B string) fit ~16 k entries inside that envelope.
Each entry triggers a serial ~14 ms NPU embed, holding the worker
connection for ~228 s — well past the iter-182 30 s tonic timeout
(which kicks the connection but doesn't unblock the in-flight FFI
work).
Add `RUVECTOR_MAX_BATCH_SIZE` (default 256, floor 1) on the worker
side. iter-179's streaming saturation sweep peaked at b=16, so 256
is 16× legit headroom. Over-cap requests return InvalidArgument
instantly; under-cap requests are unaffected.
Validated on cognitum-v0:
Startup banner now logs seven gates (added iter 199):
embed_stream batch-size cap set ... max_batch_size=256
DoS probe — bench --batch-size 300 (over cap), 4 s, c=1:
20 700 fast rejections, 0 successful
Worker log: "embed_stream batch too large — rejecting
batch_size=300 max_batch_size=256" with request_id
Acceptance probe — bench --batch-size 16 (under cap), 6 s, c=1:
46.9 RPCs/sec × 16 vectors/RPC = 750 vectors/sec
p50 per RPC = 249 ms (= 16 ms/item, NPU-rate-bound)
0 errors
Worker fleet stats post-iter-199:
avg_us=23694 (healthy NPU rate ~70 embeds/sec)
errors=0, NPU temps 55.2/54.8 °C
Self-test bit-identical (vec_head=0.0181,-0.0220,0.0451,0.0159).
Unary regression bench was inconclusive — a tailnet jitter event
was active during this iter (ping showed RTT 14-280 ms vs the
typical 13 ms minimum). Worker-side avg latency held at ~24 ms
(GetStats), so the bench dip was network, not iter-199-introduced.
Co-Authored-By: claude-flow <ruv@ruv.net>
Extracts the iter-173 magic-byte check + iter-174 sha256 pin into a
free function `hef_verify::verify_hef_header_and_pin` so it's
unit-testable without the `hailo` feature flag (which requires
HailoRT FFI on Pi 5 + AI HAT+, absent on dev hosts). Behavior is
unchanged — `HefPipeline::open` still calls through here at boot,
byte-for-byte identical logic.
Adds five unit tests, all passing on x86 dev hosts and Pi alike:
rejects_non_hef_magic
accepts_correct_magic_with_no_pin
rejects_sha256_mismatch
accepts_matching_sha256
normalizes_pin_whitespace_and_case (trim + tolower; locks in
the operator-paste-friendly
iter-174 normalization)
Bit-identical correctness verified at deploy time:
startup self-test embed ok dim=384
vec_head=0.0181,-0.0220,0.0451,0.0159 (matches every iter
since 175 — semantic equality preserved through the refactor)
Bench-after on Pi was inconclusive due to a tailnet jitter event
during this iter's deploy (ping showed RTT min=9 ms / max=180 ms,
avg=65 ms — far outside the typical ~13 ms minimum). Worker-side
embed latencies in journalctl held at 10-28 ms per call (~70/sec
NPU-capable rate), so the throughput dip was purely network
between workstation and Pi, not iter-198-introduced. The pure-
refactor nature of the change (no FFI-touching path modified) +
bit-identical self-test give correctness confidence without a
clean bench comparison.
Test counts:
ruvector-hailo lib: 14 → 19 (+5 hef_verify)
ruvector-hailo-cluster: 181 (unchanged)
Co-Authored-By: claude-flow <ruv@ruv.net>
iter-190's session sweep flagged a second flaky test:
`rate_limit::tests::from_env_disabled_when_unset`. The test removes
RUVECTOR_RATE_LIMIT_RPS / _BURST then asserts None, while the sibling
test `from_env_picks_up_rps_with_default_burst` sets the same
RUVECTOR_RATE_LIMIT_RPS. Cargo runs lib tests in parallel by default,
so the two could race the process-global env in either direction —
sometimes the wipe sees the set's mutation mid-flight, sometimes not.
Original code carried a comment "we use unique names so this test
doesn't race", which was the intent but not the result; both tests
actually share the same env-var key.
Fix: process-local OnceLock<Mutex<()>> guards every env-touching
test. Tests still run on the parallel test runner (no need for
--test-threads=1) but the lock serializes the env mutations to a
single critical section. No new dep — the std-only `OnceLock` +
`Mutex` pattern is enough; pulling `serial_test` would have been
overkill for two tests.
Validated:
- rate_limit::* (filtered, parallel default), 10 back-to-back runs:
7/7 pass each (rate_limit has 7 tests; sibling tests still
cover unrelated paths)
- full lib in parallel mode, 3 back-to-back runs:
103/103 pass each
- full integration sweep --test-threads=1:
lib : 103/103 pass
14 integration suites: 78/78 pass
total : 181 tests, 0 failures, 0 flaky
Together with iter-196's EWMA fix, the cluster crate's test suite
is now deterministically green in both serial and parallel modes —
no more "1 in N runs flake" surface for the session checkpoint.
No production code changed; pure test-side fix.
Co-Authored-By: claude-flow <ruv@ruv.net>
iter-195's full sweep surfaced an intermittent failure in
`p2c_ewma_biases_toward_fast_worker_under_load` (1 in 5 runs). Two
root causes, neither related to a real EWMA picker bug:
1. **No warmup phase.** The first ~10 dispatches paid tonic's
channel-dial cost (~50 ms one-shot per worker). With α=0.3 EWMA
and a 1 ms vs 15 ms steady-state gap, the dial cost dominated
observed latency for both workers, leaving the picker biased
by which worker the deterministic P2C LCG happened to dial
first. When fast got dialed first, its EWMA carried the dial
tax and lost subsequent picks to slow until decay caught up.
2. **Latency gap too narrow.** 1 ms vs 15 ms is only 15× and
comparable to tonic's per-call framing overhead. The picker
biased fast on average but the per-call ratio was closer to
8:1, fluctuating to 3:1 under tokio scheduler jitter — too
tight to assert ≥2:1 reliably over 200 sequential calls.
Fix both:
* Warmup 30 calls before counting (channels cached, EWMAs
converged to handler-only latency).
* Bump slow handler from 15 ms → 50 ms so the steady-state ratio
is 50:1 and dominates any framing/scheduler noise. The picker
now locks fast at 100 % post-warmup.
Validated 10 back-to-back runs — all pass. Captured ratio:
dispatch result (post-warmup): fast=200, slow=0, errors=0
This was the only flaky test in the cluster's integration suite;
the iter-195 sweep should now be deterministically green.
Full sweep --test-threads=1:
lib : 103/103 pass
14 integration suites: 78/78 pass
total : 181 tests, 0 failures, 0 flaky
No production code changed; pure test-side fix. Pi worker untouched.
Co-Authored-By: claude-flow <ruv@ruv.net>
Adds two cases to dos_gates.rs to lock in the iter-182
`Server::timeout` middleware behavior. iter-182 picked tonic's
tower-timeout cap to bound slow-loris attacks and any handler that
hangs past its budget; without a regression test, a future change
that unbinds the timeout silently lets the worker accumulate stuck
handlers again.
embed_handler_exceeding_timeout_returns_cancelled
Server::timeout(200 ms), handler sleeps 1 s. Asserts:
* status code = Cancelled (tonic's tower-timeout middleware
wraps tower's Elapsed error in Status::cancelled, per the
iter-182 commit message)
* elapsed wall time < 600 ms (3× timeout) — proves the cap
actually fired rather than the request completing some
other way
embed_handler_within_timeout_succeeds
Server::timeout(1 s), handler sleeps 50 ms. Confirms the cap
doesn't accidentally block legitimate fast traffic — guards
against a future "tighten the timeout to 10 ms" change that
would break every embed.
dos_gates.rs now has six cases covering three of the six gates:
byte cap (iter 180) : 2/2
encoding cap (iter 190) : 2/2
RPC timeout (iter 182) : 2/2 ← new
Validated:
- dos_gates suite: 6/6 pass in 0.25 s
- full integration sweep: 1 pre-existing flake unrelated to this
iter (`cluster_load_distribution::p2c_ewma_biases_toward_fast_worker_under_load`,
confirmed flaky 1/5 — depends on tokio scheduler timing for
a 2:1 EWMA dispatch ratio, intermittent across the session)
Pi worker untouched; pure test-suite addition.
Co-Authored-By: claude-flow <ruv@ruv.net>
Symmetric coverage with iter-193's iter-180 byte-cap test. iter-190
added `max_encoding_message_size` to the worker so a hypothetical
oversized response (e.g. accidental debug payload leak) can't blow
up downstream clients. Without a regression test, a future change
that drops the cap silently passes review.
`tests/dos_gates.rs` now has four cases:
embed_request_above_decoding_cap_returns_out_of_range (iter 193)
embed_request_below_decoding_cap_succeeds (iter 193)
embed_response_above_encoding_cap_returns_error (iter 194)
embed_response_under_encoding_cap_succeeds (iter 194)
The encoding-cap cases use a separate `OversizedResponseMockWorker`
that emits a 16 KB Vec<f32> response (4_000 floats × 4 B). Above-cap
test installs a 4 KB encoding cap and asserts:
* status code = OutOfRange
* error message mentions "encoded message length too large" or
the cap value (4096)
Below-cap test runs the same mock under the production-default
64 KB cap and confirms the 16 KB response sails through, locking
in that the cap doesn't accidentally block legitimate traffic.
Validated:
- dos_gates suite: 4/4 pass in 0.09 s
- full integration sweep --test-threads=1:
lib : 103/103 pass
14 integration suites: 78/78 pass
total : 181 tests, 0 failures
Pi worker untouched; pure test-suite addition.
Co-Authored-By: claude-flow <ruv@ruv.net>
iter-192 noted the gap: "no integration test exercises the gate
behavior — a future change that loosened a cap would have escaped
review." Close it for the iter-180 byte cap (the most important of
the six gates, since it bounds per-RPC alloc surface end-to-end).
`tests/dos_gates.rs` adds two cases using the same in-process mock
pattern as `rate_limit_interceptor.rs` and `tls_roundtrip.rs`:
embed_request_above_decoding_cap_returns_out_of_range
Stands up an EmbeddingServer with max_decoding_message_size=4 KB
(deliberately tight so a tiny payload trips it). Sends an 8 KB
text. Asserts:
* status code = OutOfRange
* error message mentions either "decoded message length too
large" or the cap value (4096)
embed_request_below_decoding_cap_succeeds
Companion: 1 KB payload against the same 4 KB cap. Asserts the
request succeeds and the mock returns dim=384. Catches a
hypothetical regression where the cap is set so tight it blocks
legitimate traffic.
No NPU dependency (pure in-process mock + tonic), no fakeworker
subprocess (so no port-allocation flake). Runs on x86 dev hosts and
aarch64 Pi alike.
Validated:
- dos_gates suite alone: 2/2 pass in 0.09 s
- full integration sweep --test-threads=1:
lib : 103/103 pass
14 integration suites: 76/76 pass
total : 179 tests, 0 failures
Pi worker untouched this iter (test-only addition); no bench delta
to capture.
Co-Authored-By: claude-flow <ruv@ruv.net>
iter-180 through iter-184 + iter-190 layered six caps on the real
gRPC worker (byte cap, stream cap, RPC timeout, rapid-reset cap,
keepalive, encode cap). fakeworker — the test-fleet stand-in used
by 12+ integration tests — was left running with all defaults wide
open. Two consequences:
1. No integration test exercises the gate behavior. A future
change that loosened a cap on the real worker but tightened
it on fakeworker (or vice versa) would have escaped review.
2. A deploy that runs both binaries in the same env (e.g. a
hybrid fleet during cutover) had inconsistent DoS surface.
Mirror the same env vars + the same defaults so behavior is
identical between the two binaries:
fakeworker DoS-gate parity (iter 192)
max_request_bytes=65536 (iter 180)
max_response_bytes=16384 (iter 190)
max_concurrent_streams=256 (iter 181)
request_timeout_secs=30 (iter 182)
max_pending_resets=32 (iter 183)
http2_keepalive_secs=60 (iter 184)
Validated:
- Both feature combos compile clean
- Full integration test sweep, --test-threads=1:
lib : 103/103 pass
13 integration suites: 74/74 pass
total : 177 tests, 0 failures
All small-payload fakeworker tests (typical "hello"-class strings)
are well under every cap, so the gates are silent in practice.
- Smoke startup log:
fakeworker DoS-gate parity (iter 192) max_request_bytes=65536
max_response_bytes=16384 max_concurrent_streams=256
request_timeout_secs=30 max_pending_resets=32
http2_keepalive_secs=60
Pi worker untouched this iter (changes are pure fakeworker), so any
bench delta is tailnet/Pi noise unrelated to the change.
Co-Authored-By: claude-flow <ruv@ruv.net>
HailoRT's per-vstream `hailo_vstream_params_t.timeout_ms` defaults to
10 s. That's ~700× a steady-state embed (14 ms NPU compute on the
iter-156b HEF) and well above iter-182's 30 s tonic outer bound.
A wedged NPU (driver hang, PCIe link issue, FW reset mid-DMA) would
park the HefEmbedder Mutex for the full 10 s before any caller sees
an error, blocking every other concurrent embed for that window.
Override `params.timeout_ms` on both input + output vstream params
between `hailo_make_*_vstream_params` and `hailo_create_*_vstreams`,
defaulting to 2 000 ms (143× the typical embed cost — still room for
tail latency under thermal throttling). Operators tune via
`RUVECTOR_NPU_VSTREAM_TIMEOUT_MS`, floor 100 ms so a misconfig can't
fail every healthy embed.
Validated on cognitum-v0:
- startup self-test: vec_head=0.0181,-0.0220,0.0451,0.0159
(bit-identical to iter-190 — semantic equality holds)
- bench c=4 b=1, 8 s × 7 runs (1 outlier dropped):
iter-190 (10 s default): 69.0, 69.2, 70.6
→ mean 69.6/sec, p50=55-56 ms
iter-191 (2 s cap) : 68.2, 70.2, 69.0, 70.1, 69.0, 70.6
→ mean 69.5/sec, p50=54-56 ms
Δ throughput: -0.1% (flat; cap doesn't fire on healthy traffic)
Δ behavior under NPU hang (analytical, no real hang to test):
pre → embed Mutex held 10 s, every concurrent caller queues
for the full window, tonic 30 s outer bound mostly unused
post → embed returns HAILO_TIMEOUT (status 4) in 2 s, Mutex
released 5× faster, queue drains 5× faster, tonic outer
bound has 28 s of usable headroom for downstream retries
Layered timeouts now: 2 s FFI (iter 191) ← 30 s tonic (iter 182).
The inner bound makes the outer bound actionable rather than a hard
ceiling on a single-threaded queue.
Co-Authored-By: claude-flow <ruv@ruv.net>
Defense-in-depth response cap on the gRPC server. iter-180 capped the
decode side at 64 KB; the encode side was uncapped (tonic default
usize::MAX) even though the worker only ever generates Vec<f32>[384]
≈ 1.6 KB per unary embed. Cap at 16 KB (10× legitimate per-message
size) so any hypothetical bug that ever returned a huge payload
can't blow up downstream clients. Env-tunable via
`RUVECTOR_MAX_RESPONSE_BYTES`, floor 4 KB.
Worker startup banner now logs six DoS gates layered by iter:
iter 180: max_decoding_message_size = 65536
iter 181: max_concurrent_streams = 256
iter 182: request_timeout_secs = 30
iter 183: max_pending_resets = 32 (CVE-2023-44487)
iter 184: http2_keepalive_secs = 60
iter 190: max_encoding_message_size = 16384
Pi regression bench (c=4 b=1, 8 s × 3, post-deploy):
iter 189: 70.4, 70.1, 70.6 → mean 70.4/sec, p50=53-56 ms
iter 190: 68.9, 67.1, 70.6 → mean 68.9/sec, p50=55-56 ms
Δ -2.1% in tailnet noise band; no encode-side enforcement firing
on legitimate ~1.6 KB responses.
Session test sweep (cargo test --features tls --tests --test-threads=1):
- lib : 103/103 pass
- all 13 integration suites : 74/74 pass
- total : 177 tests, 0 failures
- tls_roundtrip + secure_stack : 4/4 (TLS path validated)
(One known-flaky test: rate_limit::tests::from_env_disabled_when_unset
races other tests that set the same process-global env vars on the
default parallel runner. Serial mode isolates it cleanly. Pre-existing
issue, unrelated to iter 190.)
Co-Authored-By: claude-flow <ruv@ruv.net>
Completes the client-side TLS flag surface across all three operator
tools in this repo. iter-187 added the bench flags, iter-188 added
the embed flags; iter-189 brings the stats CLI to parity so an op
can snapshot fleet stats from a TLS-configured worker without
building a custom client. Same `#[cfg(feature = "tls")]` gating, same
partial-config + orphan-flag refusals as the other two binaries.
Smoke-tested against cognitum-v0:
$ ruvector-hailo-stats --workers 100.77.59.83:50051 --tls-domain example.com
Error: "--tls-domain / --tls-client-cert / --tls-client-key require --tls-ca"
$ ruvector-hailo-stats --workers 100.77.59.83:50051 --tls-ca /nonexistent/ca.pem
Error: "--tls-ca: transport error to <tls>: read ca pem at /nonexistent/ca.pem: No such file or directory (os error 2)"
$ ruvector-hailo-stats --workers 100.77.59.83:50051
worker address fingerprint npu_t0 npu_t1 embeds errors avg_us max_us up_s
static-0 100.77.59.83:50051 9c56e596... 53.2 52.7 6614 0 27325 42930 1044
Pi regression bench (c=4 b=1, 8 s × 3, post-settle):
iter-188: 70.3, 69.0, 67.9 → mean 69.1/sec, p50=55-57 ms
iter-189: 70.4, 70.1, 70.6 → mean 70.4/sec, p50=53-56 ms, p99=86-90 ms
Δ throughput: +1.9% (within noise; stats CLI changes don't touch
the bench/embed code paths)
The TLS server-side path (iter 99) is now fully callable from every
client tool that ships with the cluster crate. Next direction is
either deferred ops work (Pi-side cert generation + systemd unit
wiring for end-to-end mTLS smoke) or a pivot to perf research
(async vstream, mask-aware HEF compile).
Co-Authored-By: claude-flow <ruv@ruv.net>
Symmetric with iter-187 bench plumbing — adds the same TLS knobs to
`ruvector-hailo-embed` so ops can drive a one-shot embed against a
TLS-configured worker without having to build a custom client. All
flags `#[cfg(feature = "tls")]` so the no-tls build stays clean.
Same partial-config + orphan-flag refusals as iter-187:
- --tls-domain / --tls-client-cert / --tls-client-key without
--tls-ca → loud error
- --tls-client-cert without --tls-client-key (or vice versa) →
loud error
- missing CA file → fs error surfaced with full path
Smoke-tested on the workstation:
$ ruvector-hailo-embed --workers 100.77.59.83:50051 --tls-domain example.com --text hello
Error: "--tls-domain / --tls-client-cert / --tls-client-key require --tls-ca"
$ ruvector-hailo-embed --workers 100.77.59.83:50051 --tls-ca /nonexistent/ca.pem --text hello
Error: "--tls-ca: transport error to <tls>: read ca pem at /nonexistent/ca.pem: No such file or directory (os error 2)"
$ ruvector-hailo-embed --workers 100.77.59.83:50051 --text "iter 188 smoke test"
{"text":"iter 188 smoke test","dim":384,"latency_us":433538,"vec_head":[...]}
Pi plaintext bench regression (c=4 b=1, 8 s × 3):
iter-187: 68.5, 68.7, 66.7 → mean 68.0/sec, p50=56-59 ms
iter-188: 70.3, 69.0, 67.9 → mean 69.1/sec, p50=55-57 ms
Δ throughput: +1.6% (within tailnet noise; embed CLI changes don't
touch the bench code path)
The TLS server-side path is now fully callable from both client tools
in this repo. Pi-side cert generation + systemd unit wiring (the
actual end-to-end TLS smoke against cognitum-v0) remains the deferred
ops follow-up.
Co-Authored-By: claude-flow <ruv@ruv.net>
Iter-99 added TLS support on the worker (`Server::tls_config`) and
iter-100 added optional mTLS via `RUVECTOR_TLS_CLIENT_CA`. The
client-side path through `GrpcTransport::with_tls` + `TlsClient` was
unit-tested in `tls_roundtrip.rs` but not driven from the bench CLI,
which meant ops had no way to drive a sustained-load TLS run against
a TLS-configured worker — every existing bench dialed plaintext.
Adds:
--tls-ca <path> PEM CA bundle. Promotes dial to https://.
--tls-domain <name> SNI / SAN to assert. Default = hostname half
of the first worker addr (via
`tls::domain_from_address`).
--tls-client-cert <p> mTLS client cert.
--tls-client-key <p> mTLS client private key.
All flags gated `#[cfg(feature = "tls")]` so the no-tls build is
unaffected. Partial mTLS configs (cert without key, vice versa) and
orphan flags (--tls-domain without --tls-ca) error out at startup
instead of silently falling back to plaintext.
Validation:
- `cargo test --features tls --test tls_roundtrip` — 2/2 pass
(already validated GrpcTransport::with_tls + plaintext-against-
TLS-server cleanly fails)
- `cargo test --features tls --test secure_stack_composition` —
2/2 pass (full stack composition still rejects tampered manifests)
- Pi plaintext regression: c=4 b=1, 8 s × 3 runs:
pre-iter-187 (iter 186): 68.3, 69.7, 65.8 → mean 67.9/sec
post-iter-187 : 68.5, 68.7, 66.7 → mean 68.0/sec
flat within noise; the new code is fully gated when --tls-ca is
absent.
- Local smoke against `ruvector-hailo-fakeworker` confirmed flag
parsing + error paths (orphan flags refused, missing CA file
surfaces fs error). End-to-end fakeworker handshake had a
transient listener inheritance issue under back-to-back
setsid/kill cycles that's a smoke-test setup quirk rather than
a code defect — the unit test already exercises the same library
path bench now plumbs through.
Pi-side mTLS smoke (cert generation + systemd unit wiring) is
deferred to an ops follow-up; this iter ships the client-side flag
surface so that follow-up has somewhere to plug into.
Co-Authored-By: claude-flow <ruv@ruv.net>
The HEF is compiled for a single fixed seq_len (128) and the HF
tokenizer always emits zero token_type_ids for single-text embeds,
so `position_embeddings.forward(0..seq)` and
`token_type_embeddings.forward(zeros)` produce identical Tensors
every call. iter-186 caches both behind seq-keyed Mutexes; first
call paths are unchanged, every subsequent embed skips two
`Tensor::new` allocs + two embedding lookups + two unsqueeze ops.
Also adds `mean_pool_into` to inference.rs as an alloc-free public
helper (the existing `mean_pool` becomes a thin wrapper) for future
callers; HefEmbedder still uses the owning `mean_pool` because the
Mutex-guarded buffer can't escape without a clone (which would
defeat the pool).
Validated on cognitum-v0, c=4 b=1, 8 s × 3 runs:
bench-before (iter 185): 69.9, 67.3, 64.9 → mean 67.4/sec
p50=55-58ms, p99=92-172ms
bench-after (iter 186): 68.3, 69.7, 65.8 → mean 67.9/sec
p50=55-58ms, p99=99-169ms
Δ throughput: +0.7% (within tailnet noise)
Δ p50 : flat
Δ p99 : modest tightening (avg 126 vs 142 ms)
Wall-time win is sub-noise because the NPU PCIe DMA round-trip
(~50 ms p50) dwarfs the candle host-side work that this caches.
The change still removes redundant CPU + alloc churn per RPC,
which is a power-savings win on the Pi 5 cluster (ARM cores idle
sooner) and a cleaner cache-locality story over long runs.
Embed correctness verified: startup self-test produces bit-identical
vec_head (0.0181,-0.0220,0.0451,0.0159) and sim_close/sim_far values
across iter-185 and iter-186 binaries.
Co-Authored-By: claude-flow <ruv@ruv.net>
Iter 179 first observed a SIGSEGV during clean shutdown after
sustained load. Iter 185 baseline measurement showed it's not a
race — every shutdown SEGV'd, both idle and under load:
iter-184 baseline: 0 clean / 5 SEGV out of 5
iter-185 first attempt (drain + explicit drop):
0 clean / 5 SEGV out of 5
iter-185 final (mem::forget + process::exit(0)):
10 clean / 0 SEGV out of 10
The SEGV is not in our HefPipeline::Drop — the explicit
`drop(embedder_outer)` after rt.shutdown_timeout was never reached;
the SEGV fired during HailoRT's own internal teardown (DMA scheduler
threads + vdevice callbacks). This is upstream library behavior, not
something we can paper over with timing tweaks.
Mitigation: leak the embedder via `mem::forget` and call
`process::exit(0)` after tonic's serve completes. The OS reaps every
resource the worker owns (mmap'd HEF, vstream fds, driver-side
handles via close(2)); HailoRT's own threads die with the same exit
syscall, so they can't race a free that never happens. Operators see
`status=0/SUCCESS` in systemd instead of `status=11/SEGV`, which
makes restart loops, alerting, and unit-state monitoring sane.
Bound: one HefPipeline + one HostEmbeddings pair leak per process
lifetime. Each subsequent worker is a fresh process. Reserved escape
hatch `RUVECTOR_SHUTDOWN_FORCE_CLEAN=1` keeps the slow drop path
available for when a future HailoRT release fixes the upstream bug.
No throughput regression after settle (PCIe driver re-init takes
~30 s after rapid restart cycles, but steady-state is unchanged):
pre-iter-185 (iter 184): 70.5, 70.5, 69.6 → mean 70.2/sec, p50=112 ms
post-iter-185 settled : 68.4, 69.2, 66.0, 68.1 → mean 67.9/sec,
p50=55-56 ms
(The p50 difference here is bench config — 4 vs 8 concurrency between
the two measurements; per-run p50 at c=8 is unchanged from prior iters.)
Co-Authored-By: claude-flow <ruv@ruv.net>
tonic's default leaves http2_keepalive_interval=None, so a half-closed
TCP connection (client crashed, NAT mid-flow drop, network partition)
sits in the worker's accept table indefinitely, holding stream state
that the iter-181 max_concurrent_streams cap can't reclaim. Add a
60 s server-initiated PING; if the client doesn't PONG within hyper's
default 20 s timeout, the connection is closed and its state freed.
Operators can tune via `RUVECTOR_HTTP2_KEEPALIVE_SECS`. 0 disables
the feature entirely (cellular metering, ping-hostile networks).
Floor 10 s so a misconfig can't saturate the link with pings.
Validated on cognitum-v0, c=8 b=1, 8 s × 3 runs:
iter-183 baseline: 70.5, 70.5, 69.6 → mean 70.2/sec
iter-184 after : 70.6, 69.0, 70.5 → mean 70.0/sec
Δ throughput: -0.3% (unmeasurable; the 60 s ping interval falls
outside the 8 s bench window so no PINGs even fire
during measurement)
Δ p50 : flat at 110-112 ms
Net new behavior: half-closed peers now reclaimed in ≤80 s instead
of waiting on TCP keepalive defaults (sysctl tcp_keepalive_time =
2 hours). Combined with iter-181's 256-stream cap, the worker can
no longer accumulate orphan stream state from disappearing clients.
Five gates now in the worker startup banner: byte cap (180), stream
cap (181), RPC timeout (182), rapid-reset cap (183), keepalive (184).
Co-Authored-By: claude-flow <ruv@ruv.net>
hyper/h2 already mitigates the rapid-reset DoS by defaulting
http2_max_pending_accept_reset_streams to 20 post-CVE, but pinning
the value explicitly gives operators a tunable surface and makes the
mitigation reviewable from worker startup logs. Set to 32 by default
(small step above the h2 default to leave room for legit reset
jitter), env-tunable via `RUVECTOR_MAX_PENDING_RESETS` with an 8
floor. Once exceeded, hyper sends GOAWAY and closes the connection.
Validated on cognitum-v0, c=8 b=1, 8 s × 3 runs each:
iter-182 baseline: 69.6, 67.4, 69.0 → mean 68.7/sec
iter-183 after : 70.5, 70.5, 69.6 → mean 70.2/sec
Δ throughput: +2.2% (noise band — legit traffic doesn't generate
RST_STREAM under steady load, so the cap is invisible)
Δ p50 : flat at 111-112 ms
Layered with iter-180 byte cap, iter-181 stream cap, iter-182 RPC
timeout — four DoS gates now visible in the worker startup banner.
This closes the named-CVE checklist for the gRPC server surface;
remaining hardening (HTTP/2 keepalive, header-list-size cap) targets
liveness rather than DoS.
Co-Authored-By: claude-flow <ruv@ruv.net>
tonic's default left request handlers running unbounded — a slow-loris
client could open a stream and trickle bytes to keep it alive forever.
Add `Server::timeout(30s)` so each handler is hard-bounded, with
`RUVECTOR_REQUEST_TIMEOUT_SECS` for ops tuning and a 2 s floor to
keep normal embeds (~50-200 ms) safe under any misconfig.
Why 30 s: iter-179 measured worst legit RPC at 910 ms (b=16, c=2).
30 s gives 30× headroom while still reclaiming any stuck handler in
under a sysctl `panic` window. Layered with iter-180 byte cap and
iter-181 stream cap.
Cancellation safety: the embed handler's HailoRT FFI section is fully
synchronous (Mutex acquire → blocking FFI calls → response build).
tonic's tower-timeout middleware can only drop the future at .await
points — before the Mutex acquire (no resource leak) or after the
response build (no leak). NPU vstreams are released only via the
Mutex-held HefPipeline path, never through cancellation.
Validated on cognitum-v0, c=8 b=1, 8 s × 6 runs:
iter-181 baseline (3 runs): 68.7, 70.6, 68.6 → mean 69.3/sec
iter-182 after (6 runs): 66.1, 63.7, 69.2, 70.5, 69.8, 65.8
→ mean 67.5/sec
Δ throughput: -2.6% (within tailnet jitter band; p99 in legit
runs swings 210-558 ms back-to-back)
Δ p50 : flat at 111-113 ms (no overhead at the median)
Timeout middleware adds the cost of arming one tokio::time::sleep per
RPC; at 70 RPS that's 4 µs per call against a 56 ms embed cost, well
below the noise floor.
Co-Authored-By: claude-flow <ruv@ruv.net>
tonic's default leaves SETTINGS_MAX_CONCURRENT_STREAMS unset so a
single attacker socket could pump unbounded concurrent RPCs through
one HTTP/2 connection. Cap at 256 by default, env-overridable via
`RUVECTOR_MAX_CONCURRENT_STREAMS` with a floor of 8 so a misconfig
can't lock out the bench/health-check path. Layered with iter-180's
per-RPC byte cap.
Validated on cognitum-v0 (Pi 5 + AI HAT+):
bench-before (iter 180, no stream cap):
c=8 b=1, 10s, 70.3/sec, p50=112ms, p99=190ms
bench-after (cap=256), three runs c=8 b=1, 8s each:
run 1: 68.7/sec, p50=112ms, p99=307ms
run 2: 70.6/sec, p50=112ms, p99=175ms
run 3: 68.6/sec, p50=112ms, p99=314ms
mean : 69.3/sec, p50=112ms (rock-stable), p99 jitters
175-314ms — tailnet noise, not cap-bound (only 8 of 256
stream budget used by legit traffic).
Cap is invisible to legit callers (current bench peaks at c=8) and
provides 32× headroom over observed traffic. Caps the per-connection
amplification an attacker gets from HTTP/2 stream multiplexing — they
can still open more TCP connections, but each one is now bounded.
The Pi NPU is the real ceiling at ~70/sec anyway, so multi-connection
abuse hits the same compute wall.
Co-Authored-By: claude-flow <ruv@ruv.net>
tonic's transport-level cap lets each unauthenticated RPC allocate up
to ~4 MB before the worker even sees the request — gratuitous for an
embed worker (typical sentence-transformer text is <10 KB; iter-156b
HEF truncates at seq=128 ≈ 1 KB anyway). Cap at 64 KB by default,
operator-overridable via `RUVECTOR_MAX_REQUEST_BYTES`, with a 4 KB
floor so a misconfig can't lock the worker out.
Validated on cognitum-v0 (Pi 5 + AI HAT+):
bench-before (iter 179, no cap):
c=4 b=1, 12s, 67.3/sec, p50=56.6ms, p99=152.6ms
bench-after (cap=65536):
c=4 b=1, 12s, 68.6/sec, p50=56.5ms, p99=152.7ms
→ no regression on normal traffic (cap > tokenized payload)
DoS probe — 100 KB embed text:
OutOfRange "decoded message length too large: found 102432 bytes,
the limit is: 65536 bytes"
→ rejected at decode, before any embedder/tokenizer alloc
Acceptance probe — 60 KB embed text:
succeeds, dim=384, latency_us=98733
→ tokenizer truncates seq>128 internally; cap doesn't change
semantic behavior, just shrinks the alloc surface.
Tonic emits the rejection from `InterceptedService::new(server, intc)`
because `max_decoding_message_size` lives on the generated
`EmbeddingServer` (not the interceptor wrapper). Dropped the
`with_interceptor` shortcut, which would re-build the inner with
default limits.
Cargo.lock churn carries the sha2 dep added in iter 174 (was
out-of-sync with the source change since then).
Co-Authored-By: claude-flow <ruv@ruv.net>
Adds `--batch-size N` to ruvector-hailo-cluster-bench. N=1 (default)
preserves the existing unary `embed_one_blocking` path. N>1 routes
through the streaming `embed_batch_blocking` RPC, counting each
returned vector as one success so unary/streaming throughput stays
apples-to-apples.
Cognitum-v0 (Pi 5 + AI HAT+) saturation sweep, 8s runs:
c=concurrency b=batch thr/s p50 p99
───────────── ─────── ───── ─── ───
2 1 67.3 28.3ms 47.6ms ← latency optimum
2 4 63.8 113ms 368ms
2 16 70.4 445ms 910ms
4 1 67.3 56.6ms 153ms (iter-176 baseline)
4 8 70.2 455ms 882ms
8 1 70.6 111ms 187ms
8 4 70.6 454ms 877ms
Findings: throughput plateaus at ~70.6/sec across every (c,b) pair —
matches iter-157's raw HEF FPS ceiling. The bottleneck is single-stream
FP32 forward on the NPU, not gRPC framing. Streaming RPC adds ~5%
headroom only at c≤4; once concurrency >= 8 the NPU is already
serializing, so batched RPC just buys longer per-RPC latency without
more vectors out.
Two operator-relevant takeaways:
• Latency-sensitive callers should use c=2 b=1 (p50=28ms, p99=48ms).
• Throughput-sensitive callers gain nothing from streaming today —
the win is gated on the HailoRT async vstream API (NPU/PCIe
overlap), which is on the iter-180+ backlog.
Pi worker SEGV'd on shutdown during the previous bench cycle — vstream
close raced with an in-flight RPC. Existing issue (HailoRT FFI
shutdown ordering), separate from the iter-179 surface; reset-failed
+ start cleanly recovered. Filed mentally for an iter that adds
SIGTERM-aware vstream drain.
Co-Authored-By: claude-flow <ruv@ruv.net>
Audit pass over all 22 unsafe blocks in hef_pipeline.rs. Pre-iter 178:
* 5x mem::zeroed() initializations had a single-line generic
SAFETY comment ("the SDK writes through the &mut")
* 7x FFI calls reused the same generic comment by reference
* 1x union read documented "rank-3 inputs so shape, not nms_shape"
without naming the discriminant field
* 2x vstream write/read had one-line SAFETY mentioning only the
input/output pointer
Iter 178 expands each block's SAFETY comment to spell out:
* For zeroed POD structs: which struct shape was verified against
/usr/include/hailo/hailort.h, and why all-zero bits is a valid
initial state (no enum discriminants, no nullable refs).
* For FFI calls: provenance of every pointer/handle (which SDK
call returned it, lifetime relative to subsequent calls,
whether release runs in Drop), single-element vs multi-element
out-buffers, and which post-checks catch bad sizes.
* For union reads: the actual discriminant field
(`format.order`), why the iter-156b HEF guarantees the
non-NMS branch, and what would need to change for NMS HEFs.
* For vstream write/read: alignment requirements (Vec<f32> 4-byte
align on x86/aarch64), bounds via input_frame_bytes /
output_frame_bytes computed from Hailo-reported shapes, and
the &mut self serialization guarantee from iter-137 lib.rs Mutex.
No runtime change → bench unchanged from iter 176 (70.2 embeds/sec
on Pi 5 NPU, p99=89.6ms). The "before/after" here is unsafe-block
documentation density: each block now gives a security reviewer
the full context to verify the invariants without re-reading the
HailoRT C headers.
cargo clippy --all-targets -- -D warnings clean for all 4 feature
combos. 15 lib tests pass.
This commit is part of the iter-173/174 layered-startup-gates +
iter-177 cargo-deny supply-chain push: every operator-facing
attack surface (file content, FFI interaction, dep tree) now has
a machine-checkable or human-reviewable gate.
Co-Authored-By: claude-flow <ruv@ruv.net>