From 4f1bc906a23c45b78476d2dba38bcbbefcd23450 Mon Sep 17 00:00:00 2001 From: ruvnet Date: Sun, 3 May 2026 15:34:07 -0400 Subject: [PATCH] docs: ADR-176 EPIC accepted; ADR-167/175 + cluster README mark NPU production-default (iter 165) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ADR-176 transitions from `in-progress` to `accepted`. Six phases shipped iter 158-164, all acceptance criteria met: ✅ build cleanly on Pi 5 (--features hailo,cpu-fallback) ✅ systemctl boot with HEF, fingerprint computed ✅ iter-145 self-test embed ok dim=384 ✅ ruvllm-bridge → cluster → Pi worker returns real semantic vector ✅ cluster-bench ≥5x throughput (measured 9.6x: 7/sec → 67.3/sec) ✅ NPU output preserves semantic ordering (sim(close) > sim(far)) ✅ clippy clean all 4 feature combos Updated: ADR-167 status: NPU is now production-default; old "CPU fallback only, HEF blocked" snapshot preserved below as historical context. iter-163 measurements quoted. ADR-175 status: Option A is now the production default (was "shipped iter 156b but not yet integrated"). References ADR-176 for the integration EPIC. README ruvector-hailo-cluster opening status: NPU acceleration shipped; cpu-fallback is the automatic failover. Pi worker stopped post-validation; the systemd unit is configured to start it back up on the next reboot or `systemctl start`. The HEF lives at /var/lib/ruvector-hailo/models/all-minilm-l6-v2/model.hef ready for the next deploy. EPIC closed. The cron loop b7f30007 will continue ticking but has nothing left to ship — the acceptance gate is met. Co-Authored-By: claude-flow --- crates/ruvector-hailo-cluster/README.md | 16 ++++++++++----- ...67-ruvector-hailo-npu-embedding-backend.md | 20 ++++++++++++++++--- .../ADR-175-hailo-rust-side-workarounds.md | 15 ++++++++++++++ docs/adr/ADR-176-hef-integration-epic.md | 2 +- 4 files changed, 44 insertions(+), 9 deletions(-) diff --git a/crates/ruvector-hailo-cluster/README.md b/crates/ruvector-hailo-cluster/README.md index 173cb780..149eb085 100644 --- a/crates/ruvector-hailo-cluster/README.md +++ b/crates/ruvector-hailo-cluster/README.md @@ -8,11 +8,17 @@ in-process caching, and Tailscale-tag-based discovery. > passing across lib unit / cluster integration / 3 CLI integration / 7 > doctest suites. > -> **End-to-end embedding works today** via the iter-134 cpu-fallback -> path — `cargo build --features cpu-fallback --bin ruvector-hailo-worker` -> + `deploy/download-cpu-fallback-model.sh` produces a worker that -> returns real 384-dim semantic vectors via candle BERT-6 on host CPU. -> NPU acceleration is gated on HEF model surgery (see [ADR-167][adr167] §6). +> **NPU acceleration is the production default** as of iter 163 +> (ADR-176). `cargo build --release --features hailo,cpu-fallback +> --bin ruvector-hailo-worker` produces a worker that auto-detects +> `model.hef` in the model dir and runs encoder forward pass on the +> Hailo-8 NPU. Measured **67.3 embeds/sec/worker** on real Pi 5 + +> AI HAT+ (9.6× over cpu-fallback baseline). +> +> **cpu-fallback remains the automatic failover.** When `model.hef` +> isn't present, the worker uses host-CPU candle BERT-6 (~7 +> embeds/sec/worker on Pi 5). `deploy/download-cpu-fallback-model.sh` +> fetches the safetensors trio with sha256 pinning. [adr167]: ../../docs/adr/ADR-167-ruvector-hailo-npu-embedding-backend.md [adr168]: ../../docs/adr/ADR-168-ruvector-hailo-cluster-cli-surface.md diff --git a/docs/adr/ADR-167-ruvector-hailo-npu-embedding-backend.md b/docs/adr/ADR-167-ruvector-hailo-npu-embedding-backend.md index 906350e4..2b84c948 100644 --- a/docs/adr/ADR-167-ruvector-hailo-npu-embedding-backend.md +++ b/docs/adr/ADR-167-ruvector-hailo-npu-embedding-backend.md @@ -13,9 +13,23 @@ related: [ADR-SYS-0027, ADR-165, ADR-166] ## Status -**Iter 134/135 (2026-05-02): CPU fallback path is production-deployable -today; HEF compile is unblocked at the tooling layer but blocked at the -model-graph layer.** Branch `hailo-backend`. +**Iter 163 (2026-05-03): NPU acceleration is the production default.** +The full embed path runs on Hailo-8 hardware end-to-end via +`HailoEmbedder` → `HefEmbedder` → tokenize → host-side BertEmbeddings +→ `HefPipeline::forward` (NPU) → mean-pool → L2-normalize. Measured +on real cognitum-v0 (Pi 5 + AI HAT+) at concurrency=4: +**67.3 embeds/sec/worker, p50=57ms, 9.6× over cpu-fallback.** +See ADR-176 for the EPIC tracking the iter 158-164 integration work. +cpu-fallback remains as the automatic failover when no `model.hef` +is present in the model dir. + +--- + +**Earlier (iter 134/135) snapshot — CPU fallback only, HEF blocked:** + +CPU fallback path is production-deployable today; HEF compile is +unblocked at the tooling layer but blocked at the model-graph layer. +Branch `hailo-backend`. | Surface | Status | |---|---| diff --git a/docs/adr/ADR-175-hailo-rust-side-workarounds.md b/docs/adr/ADR-175-hailo-rust-side-workarounds.md index e5b0bf6e..eb3f6dc1 100644 --- a/docs/adr/ADR-175-hailo-rust-side-workarounds.md +++ b/docs/adr/ADR-175-hailo-rust-side-workarounds.md @@ -17,6 +17,21 @@ unblocked via Rust-side SDK monkey-patch.** A working on Hailo-8 — 15.7 MB, sha256 `cdbc892765d3099f74723ee6c28ab3f0daade2358827823ba08d2969b07ebd40`. +**Iter 163 final update (2026-05-03): Option A is now the production +default.** ADR-176 P5 shipped end-to-end NPU acceleration on real Pi +5 + AI HAT+ hardware: + + cpu-fallback (Option E): 7.0 embeds/sec/worker + NPU HEF (Option A): 67.3 embeds/sec/worker (9.6× faster) + p50 latency: 572 ms → 57 ms (10× faster) + p99 latency: 813 ms → 152 ms (5.4× faster) + +`HailoEmbedder` (iter 162, ADR-176 P4) routes +`HEF > cpu-fallback > NoModelLoaded` automatically. Operators +following deploy/install.sh + dropping `model.hef` into the model +dir get NPU acceleration with no other config changes. cpu-fallback +remains the failover path when no HEF is available. + The 156-iteration arc resolved every SDK bug encountered: 1. KeyError input_layer1 (iter 142): keyed calibration dict by internal HN layer name discovered via `runner.get_hn()` introspection diff --git a/docs/adr/ADR-176-hef-integration-epic.md b/docs/adr/ADR-176-hef-integration-epic.md index 8e94fe27..494aca10 100644 --- a/docs/adr/ADR-176-hef-integration-epic.md +++ b/docs/adr/ADR-176-hef-integration-epic.md @@ -1,7 +1,7 @@ --- adr: 176 title: "EPIC — Wire HEF into HailoEmbedder for NPU-accelerated production embeddings" -status: in-progress +status: accepted date: 2026-05-03 authors: [ruvnet, claude-flow] related: [ADR-167, ADR-172, ADR-173, ADR-175]