mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-29 19:33:34 +00:00
docs: ADR-176 EPIC accepted; ADR-167/175 + cluster README mark NPU production-default (iter 165)
ADR-176 transitions from `in-progress` to `accepted`. Six phases shipped iter 158-164, all acceptance criteria met: ✅ build cleanly on Pi 5 (--features hailo,cpu-fallback) ✅ systemctl boot with HEF, fingerprint computed ✅ iter-145 self-test embed ok dim=384 ✅ ruvllm-bridge → cluster → Pi worker returns real semantic vector ✅ cluster-bench ≥5x throughput (measured 9.6x: 7/sec → 67.3/sec) ✅ NPU output preserves semantic ordering (sim(close) > sim(far)) ✅ clippy clean all 4 feature combos Updated: ADR-167 status: NPU is now production-default; old "CPU fallback only, HEF blocked" snapshot preserved below as historical context. iter-163 measurements quoted. ADR-175 status: Option A is now the production default (was "shipped iter 156b but not yet integrated"). References ADR-176 for the integration EPIC. README ruvector-hailo-cluster opening status: NPU acceleration shipped; cpu-fallback is the automatic failover. Pi worker stopped post-validation; the systemd unit is configured to start it back up on the next reboot or `systemctl start`. The HEF lives at /var/lib/ruvector-hailo/models/all-minilm-l6-v2/model.hef ready for the next deploy. EPIC closed. The cron loop b7f30007 will continue ticking but has nothing left to ship — the acceptance gate is met. Co-Authored-By: claude-flow <ruv@ruv.net>
This commit is contained in:
parent
52cd6617b1
commit
4f1bc906a2
4 changed files with 44 additions and 9 deletions
|
|
@ -8,11 +8,17 @@ in-process caching, and Tailscale-tag-based discovery.
|
|||
> passing across lib unit / cluster integration / 3 CLI integration / 7
|
||||
> doctest suites.
|
||||
>
|
||||
> **End-to-end embedding works today** via the iter-134 cpu-fallback
|
||||
> path — `cargo build --features cpu-fallback --bin ruvector-hailo-worker`
|
||||
> + `deploy/download-cpu-fallback-model.sh` produces a worker that
|
||||
> returns real 384-dim semantic vectors via candle BERT-6 on host CPU.
|
||||
> NPU acceleration is gated on HEF model surgery (see [ADR-167][adr167] §6).
|
||||
> **NPU acceleration is the production default** as of iter 163
|
||||
> (ADR-176). `cargo build --release --features hailo,cpu-fallback
|
||||
> --bin ruvector-hailo-worker` produces a worker that auto-detects
|
||||
> `model.hef` in the model dir and runs encoder forward pass on the
|
||||
> Hailo-8 NPU. Measured **67.3 embeds/sec/worker** on real Pi 5 +
|
||||
> AI HAT+ (9.6× over cpu-fallback baseline).
|
||||
>
|
||||
> **cpu-fallback remains the automatic failover.** When `model.hef`
|
||||
> isn't present, the worker uses host-CPU candle BERT-6 (~7
|
||||
> embeds/sec/worker on Pi 5). `deploy/download-cpu-fallback-model.sh`
|
||||
> fetches the safetensors trio with sha256 pinning.
|
||||
|
||||
[adr167]: ../../docs/adr/ADR-167-ruvector-hailo-npu-embedding-backend.md
|
||||
[adr168]: ../../docs/adr/ADR-168-ruvector-hailo-cluster-cli-surface.md
|
||||
|
|
|
|||
|
|
@ -13,9 +13,23 @@ related: [ADR-SYS-0027, ADR-165, ADR-166]
|
|||
|
||||
## Status
|
||||
|
||||
**Iter 134/135 (2026-05-02): CPU fallback path is production-deployable
|
||||
today; HEF compile is unblocked at the tooling layer but blocked at the
|
||||
model-graph layer.** Branch `hailo-backend`.
|
||||
**Iter 163 (2026-05-03): NPU acceleration is the production default.**
|
||||
The full embed path runs on Hailo-8 hardware end-to-end via
|
||||
`HailoEmbedder` → `HefEmbedder` → tokenize → host-side BertEmbeddings
|
||||
→ `HefPipeline::forward` (NPU) → mean-pool → L2-normalize. Measured
|
||||
on real cognitum-v0 (Pi 5 + AI HAT+) at concurrency=4:
|
||||
**67.3 embeds/sec/worker, p50=57ms, 9.6× over cpu-fallback.**
|
||||
See ADR-176 for the EPIC tracking the iter 158-164 integration work.
|
||||
cpu-fallback remains as the automatic failover when no `model.hef`
|
||||
is present in the model dir.
|
||||
|
||||
---
|
||||
|
||||
**Earlier (iter 134/135) snapshot — CPU fallback only, HEF blocked:**
|
||||
|
||||
CPU fallback path is production-deployable today; HEF compile is
|
||||
unblocked at the tooling layer but blocked at the model-graph layer.
|
||||
Branch `hailo-backend`.
|
||||
|
||||
| Surface | Status |
|
||||
|---|---|
|
||||
|
|
|
|||
|
|
@ -17,6 +17,21 @@ unblocked via Rust-side SDK monkey-patch.** A working
|
|||
on Hailo-8 — 15.7 MB,
|
||||
sha256 `cdbc892765d3099f74723ee6c28ab3f0daade2358827823ba08d2969b07ebd40`.
|
||||
|
||||
**Iter 163 final update (2026-05-03): Option A is now the production
|
||||
default.** ADR-176 P5 shipped end-to-end NPU acceleration on real Pi
|
||||
5 + AI HAT+ hardware:
|
||||
|
||||
cpu-fallback (Option E): 7.0 embeds/sec/worker
|
||||
NPU HEF (Option A): 67.3 embeds/sec/worker (9.6× faster)
|
||||
p50 latency: 572 ms → 57 ms (10× faster)
|
||||
p99 latency: 813 ms → 152 ms (5.4× faster)
|
||||
|
||||
`HailoEmbedder` (iter 162, ADR-176 P4) routes
|
||||
`HEF > cpu-fallback > NoModelLoaded` automatically. Operators
|
||||
following deploy/install.sh + dropping `model.hef` into the model
|
||||
dir get NPU acceleration with no other config changes. cpu-fallback
|
||||
remains the failover path when no HEF is available.
|
||||
|
||||
The 156-iteration arc resolved every SDK bug encountered:
|
||||
1. KeyError input_layer1 (iter 142): keyed calibration dict by
|
||||
internal HN layer name discovered via `runner.get_hn()` introspection
|
||||
|
|
|
|||
|
|
@ -1,7 +1,7 @@
|
|||
---
|
||||
adr: 176
|
||||
title: "EPIC — Wire HEF into HailoEmbedder for NPU-accelerated production embeddings"
|
||||
status: in-progress
|
||||
status: accepted
|
||||
date: 2026-05-03
|
||||
authors: [ruvnet, claude-flow]
|
||||
related: [ADR-167, ADR-172, ADR-173, ADR-175]
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue