diff --git a/crates/ruvector-hailo-cluster/README.md b/crates/ruvector-hailo-cluster/README.md index 173cb780..149eb085 100644 --- a/crates/ruvector-hailo-cluster/README.md +++ b/crates/ruvector-hailo-cluster/README.md @@ -8,11 +8,17 @@ in-process caching, and Tailscale-tag-based discovery. > passing across lib unit / cluster integration / 3 CLI integration / 7 > doctest suites. > -> **End-to-end embedding works today** via the iter-134 cpu-fallback -> path — `cargo build --features cpu-fallback --bin ruvector-hailo-worker` -> + `deploy/download-cpu-fallback-model.sh` produces a worker that -> returns real 384-dim semantic vectors via candle BERT-6 on host CPU. -> NPU acceleration is gated on HEF model surgery (see [ADR-167][adr167] §6). +> **NPU acceleration is the production default** as of iter 163 +> (ADR-176). `cargo build --release --features hailo,cpu-fallback +> --bin ruvector-hailo-worker` produces a worker that auto-detects +> `model.hef` in the model dir and runs encoder forward pass on the +> Hailo-8 NPU. Measured **67.3 embeds/sec/worker** on real Pi 5 + +> AI HAT+ (9.6× over cpu-fallback baseline). +> +> **cpu-fallback remains the automatic failover.** When `model.hef` +> isn't present, the worker uses host-CPU candle BERT-6 (~7 +> embeds/sec/worker on Pi 5). `deploy/download-cpu-fallback-model.sh` +> fetches the safetensors trio with sha256 pinning. [adr167]: ../../docs/adr/ADR-167-ruvector-hailo-npu-embedding-backend.md [adr168]: ../../docs/adr/ADR-168-ruvector-hailo-cluster-cli-surface.md diff --git a/docs/adr/ADR-167-ruvector-hailo-npu-embedding-backend.md b/docs/adr/ADR-167-ruvector-hailo-npu-embedding-backend.md index 906350e4..2b84c948 100644 --- a/docs/adr/ADR-167-ruvector-hailo-npu-embedding-backend.md +++ b/docs/adr/ADR-167-ruvector-hailo-npu-embedding-backend.md @@ -13,9 +13,23 @@ related: [ADR-SYS-0027, ADR-165, ADR-166] ## Status -**Iter 134/135 (2026-05-02): CPU fallback path is production-deployable -today; HEF compile is unblocked at the tooling layer but blocked at the -model-graph layer.** Branch `hailo-backend`. +**Iter 163 (2026-05-03): NPU acceleration is the production default.** +The full embed path runs on Hailo-8 hardware end-to-end via +`HailoEmbedder` → `HefEmbedder` → tokenize → host-side BertEmbeddings +→ `HefPipeline::forward` (NPU) → mean-pool → L2-normalize. Measured +on real cognitum-v0 (Pi 5 + AI HAT+) at concurrency=4: +**67.3 embeds/sec/worker, p50=57ms, 9.6× over cpu-fallback.** +See ADR-176 for the EPIC tracking the iter 158-164 integration work. +cpu-fallback remains as the automatic failover when no `model.hef` +is present in the model dir. + +--- + +**Earlier (iter 134/135) snapshot — CPU fallback only, HEF blocked:** + +CPU fallback path is production-deployable today; HEF compile is +unblocked at the tooling layer but blocked at the model-graph layer. +Branch `hailo-backend`. | Surface | Status | |---|---| diff --git a/docs/adr/ADR-175-hailo-rust-side-workarounds.md b/docs/adr/ADR-175-hailo-rust-side-workarounds.md index e5b0bf6e..eb3f6dc1 100644 --- a/docs/adr/ADR-175-hailo-rust-side-workarounds.md +++ b/docs/adr/ADR-175-hailo-rust-side-workarounds.md @@ -17,6 +17,21 @@ unblocked via Rust-side SDK monkey-patch.** A working on Hailo-8 — 15.7 MB, sha256 `cdbc892765d3099f74723ee6c28ab3f0daade2358827823ba08d2969b07ebd40`. +**Iter 163 final update (2026-05-03): Option A is now the production +default.** ADR-176 P5 shipped end-to-end NPU acceleration on real Pi +5 + AI HAT+ hardware: + + cpu-fallback (Option E): 7.0 embeds/sec/worker + NPU HEF (Option A): 67.3 embeds/sec/worker (9.6× faster) + p50 latency: 572 ms → 57 ms (10× faster) + p99 latency: 813 ms → 152 ms (5.4× faster) + +`HailoEmbedder` (iter 162, ADR-176 P4) routes +`HEF > cpu-fallback > NoModelLoaded` automatically. Operators +following deploy/install.sh + dropping `model.hef` into the model +dir get NPU acceleration with no other config changes. cpu-fallback +remains the failover path when no HEF is available. + The 156-iteration arc resolved every SDK bug encountered: 1. KeyError input_layer1 (iter 142): keyed calibration dict by internal HN layer name discovered via `runner.get_hn()` introspection diff --git a/docs/adr/ADR-176-hef-integration-epic.md b/docs/adr/ADR-176-hef-integration-epic.md index 8e94fe27..494aca10 100644 --- a/docs/adr/ADR-176-hef-integration-epic.md +++ b/docs/adr/ADR-176-hef-integration-epic.md @@ -1,7 +1,7 @@ --- adr: 176 title: "EPIC — Wire HEF into HailoEmbedder for NPU-accelerated production embeddings" -status: in-progress +status: accepted date: 2026-05-03 authors: [ruvnet, claude-flow] related: [ADR-167, ADR-172, ADR-173, ADR-175]