docs: ADR-176 EPIC accepted; ADR-167/175 + cluster README mark NPU production-default (iter 165)

ADR-176 transitions from `in-progress` to `accepted`. Six phases
shipped iter 158-164, all acceptance criteria met:

   build cleanly on Pi 5 (--features hailo,cpu-fallback)
   systemctl boot with HEF, fingerprint computed
   iter-145 self-test embed ok dim=384
   ruvllm-bridge → cluster → Pi worker returns real semantic vector
   cluster-bench ≥5x throughput (measured 9.6x: 7/sec → 67.3/sec)
   NPU output preserves semantic ordering (sim(close) > sim(far))
   clippy clean all 4 feature combos

Updated:

  ADR-167  status: NPU is now production-default; old "CPU fallback
                   only, HEF blocked" snapshot preserved below as
                   historical context. iter-163 measurements quoted.
  ADR-175  status: Option A is now the production default (was
                   "shipped iter 156b but not yet integrated").
                   References ADR-176 for the integration EPIC.
  README   ruvector-hailo-cluster opening status: NPU acceleration
                   shipped; cpu-fallback is the automatic failover.

Pi worker stopped post-validation; the systemd unit is configured
to start it back up on the next reboot or `systemctl start`. The
HEF lives at /var/lib/ruvector-hailo/models/all-minilm-l6-v2/model.hef
ready for the next deploy.

EPIC closed. The cron loop b7f30007 will continue ticking but has
nothing left to ship — the acceptance gate is met.

Co-Authored-By: claude-flow <ruv@ruv.net>
This commit is contained in:
ruvnet 2026-05-03 15:34:07 -04:00
parent 52cd6617b1
commit 4f1bc906a2
4 changed files with 44 additions and 9 deletions

View file

@ -8,11 +8,17 @@ in-process caching, and Tailscale-tag-based discovery.
> passing across lib unit / cluster integration / 3 CLI integration / 7
> doctest suites.
>
> **End-to-end embedding works today** via the iter-134 cpu-fallback
> path — `cargo build --features cpu-fallback --bin ruvector-hailo-worker`
> + `deploy/download-cpu-fallback-model.sh` produces a worker that
> returns real 384-dim semantic vectors via candle BERT-6 on host CPU.
> NPU acceleration is gated on HEF model surgery (see [ADR-167][adr167] §6).
> **NPU acceleration is the production default** as of iter 163
> (ADR-176). `cargo build --release --features hailo,cpu-fallback
> --bin ruvector-hailo-worker` produces a worker that auto-detects
> `model.hef` in the model dir and runs encoder forward pass on the
> Hailo-8 NPU. Measured **67.3 embeds/sec/worker** on real Pi 5 +
> AI HAT+ (9.6× over cpu-fallback baseline).
>
> **cpu-fallback remains the automatic failover.** When `model.hef`
> isn't present, the worker uses host-CPU candle BERT-6 (~7
> embeds/sec/worker on Pi 5). `deploy/download-cpu-fallback-model.sh`
> fetches the safetensors trio with sha256 pinning.
[adr167]: ../../docs/adr/ADR-167-ruvector-hailo-npu-embedding-backend.md
[adr168]: ../../docs/adr/ADR-168-ruvector-hailo-cluster-cli-surface.md

View file

@ -13,9 +13,23 @@ related: [ADR-SYS-0027, ADR-165, ADR-166]
## Status
**Iter 134/135 (2026-05-02): CPU fallback path is production-deployable
today; HEF compile is unblocked at the tooling layer but blocked at the
model-graph layer.** Branch `hailo-backend`.
**Iter 163 (2026-05-03): NPU acceleration is the production default.**
The full embed path runs on Hailo-8 hardware end-to-end via
`HailoEmbedder``HefEmbedder` → tokenize → host-side BertEmbeddings
`HefPipeline::forward` (NPU) → mean-pool → L2-normalize. Measured
on real cognitum-v0 (Pi 5 + AI HAT+) at concurrency=4:
**67.3 embeds/sec/worker, p50=57ms, 9.6× over cpu-fallback.**
See ADR-176 for the EPIC tracking the iter 158-164 integration work.
cpu-fallback remains as the automatic failover when no `model.hef`
is present in the model dir.
---
**Earlier (iter 134/135) snapshot — CPU fallback only, HEF blocked:**
CPU fallback path is production-deployable today; HEF compile is
unblocked at the tooling layer but blocked at the model-graph layer.
Branch `hailo-backend`.
| Surface | Status |
|---|---|

View file

@ -17,6 +17,21 @@ unblocked via Rust-side SDK monkey-patch.** A working
on Hailo-8 — 15.7 MB,
sha256 `cdbc892765d3099f74723ee6c28ab3f0daade2358827823ba08d2969b07ebd40`.
**Iter 163 final update (2026-05-03): Option A is now the production
default.** ADR-176 P5 shipped end-to-end NPU acceleration on real Pi
5 + AI HAT+ hardware:
cpu-fallback (Option E): 7.0 embeds/sec/worker
NPU HEF (Option A): 67.3 embeds/sec/worker (9.6× faster)
p50 latency: 572 ms → 57 ms (10× faster)
p99 latency: 813 ms → 152 ms (5.4× faster)
`HailoEmbedder` (iter 162, ADR-176 P4) routes
`HEF > cpu-fallback > NoModelLoaded` automatically. Operators
following deploy/install.sh + dropping `model.hef` into the model
dir get NPU acceleration with no other config changes. cpu-fallback
remains the failover path when no HEF is available.
The 156-iteration arc resolved every SDK bug encountered:
1. KeyError input_layer1 (iter 142): keyed calibration dict by
internal HN layer name discovered via `runner.get_hn()` introspection

View file

@ -1,7 +1,7 @@
---
adr: 176
title: "EPIC — Wire HEF into HailoEmbedder for NPU-accelerated production embeddings"
status: in-progress
status: accepted
date: 2026-05-03
authors: [ruvnet, claude-flow]
related: [ADR-167, ADR-172, ADR-173, ADR-175]