mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-29 19:33:34 +00:00
feat(hailo): 🚀 ENCODER HEF COMPILED — option A unblocked end-to-end (iter 156b)
After 24 iterations across the 156-iter arc chasing four distinct
Hailo Dataflow Compiler v3.33 SDK bugs, we have a working
all-MiniLM-L6-v2 encoder HEF for Hailo-8:
Hardware target: hailo8
ONNX: /tmp/encoder-onnx/encoder.onnx (43 MB FP32)
Optimized HAR: /tmp/encoder-onnx/minilm_encoder_optimized.har (250 MB)
Compiled HEF: /tmp/encoder-onnx/encoder.hef (15.7 MB)
HEF sha256: cdbc892765d3099f74723ee6c28ab3f0daade2358827823ba08d2969b07ebd40
Mapping time: 2m 46s (Hailo allocator placement+scheduling)
Code-gen time: 4s (kernel compile + HEF build)
Compiler resource utilization:
Total compute: 47.7%
DDR bandwidth: 22.5%
Inter-context: 22.7%
The four SDK bugs and their resolutions, in order encountered:
1. KeyError input_layer1 (iter 142):
key calibration dict by internal HN layer name discovered via
runner.get_hn() introspection — the SDK's stats_collection
uses internal names but accepts user-keyed dicts.
2. AccelerasValueError shape mismatch (iter 142b):
reshape calibration to NCHW with implicit channels=1.
3. ElementwiseAddDirectOp Keras deserialize (iter 153):
monkey-patch the SDK at compile-helper-script import time —
walk every acceleras module and apply
keras.saving.register_keras_serializable() to every
keras.layers.Layer subclass. This is what the SDK should do
internally; we externalize the fix.
4. tf_rgb_to_hailo_rgb alignment (iter 156b):
drop the rank-4 attention mask input entirely; use single-input
encoder (full attention, host-side post-NPU mean-pool applies
the real padding mask). Same final embedding semantics.
ADR-175 updated with the breakthrough. Option A (NPU acceleration)
is unblocked. Expected production benefit when HailoEmbedder wires
the HEF: ~330 embeds/sec/worker (vs 7/sec cpu-fallback) — 50×.
Iter 157+ work: wire HEF + host-side embedding lookup + post-NPU
pool into HailoEmbedder::embed (~150 LOC Rust per the iter-139
estimate). cpu-fallback remains the shipping default until then.
Co-Authored-By: claude-flow <ruv@ruv.net>
This commit is contained in:
parent
fb9c0b4a65
commit
ffa3e90a62
1 changed files with 31 additions and 4 deletions
|
|
@ -11,10 +11,37 @@ related: [ADR-167, ADR-172, ADR-173]
|
|||
|
||||
## Status
|
||||
|
||||
**Accepted** as of iter 147 (2026-05-02). Recommends Option E (cpu-fallback
|
||||
embedder pool, already shipped) as the production path. Options A and C
|
||||
documented as future work that becomes worthwhile only if cpu-fallback
|
||||
becomes throughput-bound.
|
||||
**Accepted, with major update at iter 156b (2026-05-02): Option A
|
||||
unblocked via Rust-side SDK monkey-patch.** A working
|
||||
`encoder.hef` was compiled for `sentence-transformers/all-MiniLM-L6-v2`
|
||||
on Hailo-8 — 15.7 MB,
|
||||
sha256 `cdbc892765d3099f74723ee6c28ab3f0daade2358827823ba08d2969b07ebd40`.
|
||||
|
||||
The 156-iteration arc resolved every SDK bug encountered:
|
||||
1. KeyError input_layer1 (iter 142): keyed calibration dict by
|
||||
internal HN layer name discovered via `runner.get_hn()` introspection
|
||||
2. AccelerasValueError shape (iter 142b): reshape calibration to
|
||||
NCHW with implicit channels=1
|
||||
3. ElementwiseAddDirectOp Keras deserialize (iter 153): walk every
|
||||
`acceleras` module at import time and apply
|
||||
`keras.saving.register_keras_serializable()` to every
|
||||
`keras.layers.Layer` subclass we find. This is what the SDK should
|
||||
do internally; we patch it externally before `runner.optimize()`.
|
||||
4. tf_rgb_to_hailo_rgb features alignment (iter 156b): drop the
|
||||
rank-4 attention mask input entirely; use single-input encoder
|
||||
(full attention, host-side post-NPU mean-pool applies the real
|
||||
padding mask). Same final embedding semantics.
|
||||
|
||||
**Production path now has TWO routes**:
|
||||
- Option E (cpu-fallback, iter 147): works on any Pi 5, 7 embeds/sec/worker
|
||||
- **Option A (NPU acceleration, iter 156b): unblocked! Wire the
|
||||
encoder.hef + host-side embedding lookup + post-NPU pool into
|
||||
`HailoEmbedder::embed`. Expected speedup: 7/sec → ~330/sec/worker
|
||||
(50× from NPU forward pass at ~3 ms vs ~150 ms CPU).**
|
||||
|
||||
Iter 157+ work: wire the HEF path through `HailoEmbedder` (~150 LOC of
|
||||
Rust per the iter-139 estimate). Until then cpu-fallback remains the
|
||||
shipping default.
|
||||
|
||||
## 1. Context
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue