docs: ADR-176 EPIC accepted; ADR-167/175 + cluster README mark NPU production-default (iter 165)

ADR-176 transitions from `in-progress` to `accepted`. Six phases shipped iter 158-164, all acceptance criteria met: ✅ build cleanly on Pi 5 (--features hailo,cpu-fallback) ✅ systemctl boot with HEF, fingerprint computed ✅ iter-145 self-test embed ok dim=384 ✅ ruvllm-bridge → cluster → Pi worker returns real semantic vector ✅ cluster-bench ≥5x throughput (measured 9.6x: 7/sec → 67.3/sec) ✅ NPU output preserves semantic ordering (sim(close) > sim(far)) ✅ clippy clean all 4 feature combos Updated: ADR-167 status: NPU is now production-default; old "CPU fallback only, HEF blocked" snapshot preserved below as historical context. iter-163 measurements quoted. ADR-175 status: Option A is now the production default (was "shipped iter 156b but not yet integrated"). References ADR-176 for the integration EPIC. README ruvector-hailo-cluster opening status: NPU acceleration shipped; cpu-fallback is the automatic failover. Pi worker stopped post-validation; the systemd unit is configured to start it back up on the next reboot or `systemctl start`. The HEF lives at /var/lib/ruvector-hailo/models/all-minilm-l6-v2/model.hef ready for the next deploy. EPIC closed. The cron loop b7f30007 will continue ticking but has nothing left to ship — the acceptance gate is met. Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-29 19:33:34 +00:00 · 2026-05-03 15:34:07 -04:00 · 2026-05-03 15:34:07 -04:00 · 4f1bc906a2
commit 4f1bc906a2
parent 52cd6617b1
4 changed files with 44 additions and 9 deletions
--- a/crates/ruvector-hailo-cluster/README.md
+++ b/crates/ruvector-hailo-cluster/README.md
@ -8,11 +8,17 @@ in-process caching, and Tailscale-tag-based discovery.
 > passing across lib unit / cluster integration / 3 CLI integration / 7
 > doctest suites.
 >
-> **End-to-end embedding works today** via the iter-134 cpu-fallback
-> path — `cargo build --features cpu-fallback --bin ruvector-hailo-worker`
-> + `deploy/download-cpu-fallback-model.sh` produces a worker that
-> returns real 384-dim semantic vectors via candle BERT-6 on host CPU.
-> NPU acceleration is gated on HEF model surgery (see [ADR-167][adr167] §6).
+> **NPU acceleration is the production default** as of iter 163
+> (ADR-176). `cargo build --release --features hailo,cpu-fallback
+> --bin ruvector-hailo-worker` produces a worker that auto-detects
+> `model.hef` in the model dir and runs encoder forward pass on the
+> Hailo-8 NPU. Measured **67.3 embeds/sec/worker** on real Pi 5 +
+> AI HAT+ (9.6× over cpu-fallback baseline).
+>
+> **cpu-fallback remains the automatic failover.** When `model.hef`
+> isn't present, the worker uses host-CPU candle BERT-6 (~7
+> embeds/sec/worker on Pi 5). `deploy/download-cpu-fallback-model.sh`
+> fetches the safetensors trio with sha256 pinning.

 [adr167]: ../../docs/adr/ADR-167-ruvector-hailo-npu-embedding-backend.md
 [adr168]: ../../docs/adr/ADR-168-ruvector-hailo-cluster-cli-surface.md
--- a/docs/adr/ADR-167-ruvector-hailo-npu-embedding-backend.md
+++ b/docs/adr/ADR-167-ruvector-hailo-npu-embedding-backend.md
@ -13,9 +13,23 @@ related: [ADR-SYS-0027, ADR-165, ADR-166]

 ## Status

-**Iter 134/135 (2026-05-02): CPU fallback path is production-deployable
-today; HEF compile is unblocked at the tooling layer but blocked at the
-model-graph layer.** Branch `hailo-backend`.
+**Iter 163 (2026-05-03): NPU acceleration is the production default.**
+The full embed path runs on Hailo-8 hardware end-to-end via
+`HailoEmbedder` → `HefEmbedder` → tokenize → host-side BertEmbeddings
+→ `HefPipeline::forward` (NPU) → mean-pool → L2-normalize. Measured
+on real cognitum-v0 (Pi 5 + AI HAT+) at concurrency=4:
+**67.3 embeds/sec/worker, p50=57ms, 9.6× over cpu-fallback.**
+See ADR-176 for the EPIC tracking the iter 158-164 integration work.
+cpu-fallback remains as the automatic failover when no `model.hef`
+is present in the model dir.
+
+---
+
+**Earlier (iter 134/135) snapshot — CPU fallback only, HEF blocked:**
+
+CPU fallback path is production-deployable today; HEF compile is
+unblocked at the tooling layer but blocked at the model-graph layer.
+Branch `hailo-backend`.

 | Surface | Status |
 |---|---|
--- a/docs/adr/ADR-175-hailo-rust-side-workarounds.md
+++ b/docs/adr/ADR-175-hailo-rust-side-workarounds.md
@ -17,6 +17,21 @@ unblocked via Rust-side SDK monkey-patch.** A working
 on Hailo-8 — 15.7 MB,
 sha256 `cdbc892765d3099f74723ee6c28ab3f0daade2358827823ba08d2969b07ebd40`.

+**Iter 163 final update (2026-05-03): Option A is now the production
+default.** ADR-176 P5 shipped end-to-end NPU acceleration on real Pi
+5 + AI HAT+ hardware:
+
+  cpu-fallback (Option E):     7.0 embeds/sec/worker
+  NPU HEF      (Option A):    67.3 embeds/sec/worker  (9.6× faster)
+  p50 latency:                 572 ms → 57 ms          (10× faster)
+  p99 latency:                 813 ms → 152 ms         (5.4× faster)
+
+`HailoEmbedder` (iter 162, ADR-176 P4) routes
+`HEF > cpu-fallback > NoModelLoaded` automatically. Operators
+following deploy/install.sh + dropping `model.hef` into the model
+dir get NPU acceleration with no other config changes. cpu-fallback
+remains the failover path when no HEF is available.
+
 The 156-iteration arc resolved every SDK bug encountered:
 1. KeyError input_layer1 (iter 142): keyed calibration dict by
   internal HN layer name discovered via `runner.get_hn()` introspection
--- a/docs/adr/ADR-176-hef-integration-epic.md
+++ b/docs/adr/ADR-176-hef-integration-epic.md
@ -1,7 +1,7 @@
 ---
 adr: 176
 title: "EPIC — Wire HEF into HailoEmbedder for NPU-accelerated production embeddings"
-status: in-progress
+status: accepted
 date: 2026-05-03
 authors: [ruvnet, claude-flow]
 related: [ADR-167, ADR-172, ADR-173, ADR-175]