ruvector

mirror of https://github.com/ruvnet/RuVector.git synced 2026-05-28 01:44:41 +00:00

HAILO-SUPPORT-TICKET.md	feat(hailo): cpu-fallback embedder pool — 1.75x throughput, p99 halved (iter 147)	2026-05-02 17:30:38 -04:00

ruvnet 4edd404328 feat(hailo): cpu-fallback embedder pool — 1.75x throughput, p99 halved (iter 147)

The single Mutex around BertModel capped cluster throughput at
25.7 embeds/sec no matter how many client threads dispatched
concurrently: the 8-thread bench measured the same number as a single
thread, because every call queued on the one lock. Iter 147 replaces
the single Mutex with a pool of N independent BertModel instances,
each behind its own Mutex.

`embed()` starts at a round-robin slot and probes the slots with
`try_lock`, so concurrent callers run in parallel in the happy case.
If every slot is busy, it falls through to a blocking lock on the
originally chosen slot (bounded wait, fair-ish under load).
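A minimal Rust sketch of the pool described above, assuming a generic model type `M` in place of BertModel. `EmbedderPool` and `with_slot` are hypothetical names, not the crate's actual API. Each call takes a starting slot from an atomic round-robin counter, probes every slot with `try_lock`, and only blocks on its starting slot when all slots are busy.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, MutexGuard};

/// Hypothetical pool: N independent model instances, each behind its own Mutex.
/// `M` stands in for BertModel.
pub struct EmbedderPool<M> {
    slots: Vec<Mutex<M>>,
    next: AtomicUsize, // round-robin cursor shared by all callers
}

impl<M> EmbedderPool<M> {
    pub fn new(models: Vec<M>) -> Self {
        assert!(!models.is_empty(), "pool needs at least one slot");
        Self {
            slots: models.into_iter().map(Mutex::new).collect(),
            next: AtomicUsize::new(0),
        }
    }

    /// Happy path: take the first free slot, probing with try_lock from a
    /// round-robin start. If every slot is busy, block on the start slot.
    fn acquire(&self) -> MutexGuard<'_, M> {
        let n = self.slots.len();
        // fetch_add wraps on overflow, and the modulo keeps the index in range.
        let start = self.next.fetch_add(1, Ordering::Relaxed) % n;
        for i in 0..n {
            // Busy (or poisoned) slots are skipped; try the next one.
            if let Ok(guard) = self.slots[(start + i) % n].try_lock() {
                return guard;
            }
        }
        // All busy: wait on the originally chosen slot.
        self.slots[start]
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner())
    }

    /// Run `f` against whichever instance was acquired; the lock is
    /// released when the guard drops at the end of this call.
    pub fn with_slot<R>(&self, f: impl FnOnce(&mut M) -> R) -> R {
        let mut guard = self.acquire();
        f(&mut *guard)
    }
}
```

Because the guard only lives for one `with_slot` call, a slot is held exactly as long as one embedding takes, which is what lets N callers make progress at once.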

**Sizing**: `RUVECTOR_CPU_FALLBACK_POOL_SIZE` env var, default 1
(for backward compatibility). Recommended on Pi 5: 4 (one per
Cortex-A76 core).
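One way the setting could be read, as a sketch: `parse_pool_size` and `pool_size_from_env` are hypothetical helpers, and treating `0` or an unparsable value as the default of 1 is an assumption, not documented behavior.

```rust
/// Hypothetical helper: parse the raw value of
/// RUVECTOR_CPU_FALLBACK_POOL_SIZE, defaulting to 1 when unset.
fn parse_pool_size(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        // Assumption: 0 is rejected like an unparsable value.
        .filter(|&n| n > 0)
        .unwrap_or(1)
}

/// Hypothetical: read the env var and fall back to a single slot.
fn pool_size_from_env() -> usize {
    let raw = std::env::var("RUVECTOR_CPU_FALLBACK_POOL_SIZE").ok();
    parse_pool_size(raw.as_deref())
}
```

On a Pi 5 the recommended value would then be supplied as `RUVECTOR_CPU_FALLBACK_POOL_SIZE=4` in the process environment.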

**Memory cost**: each BertModel calls `from_mmaped_safetensors` on
the same .safetensors file. Because the file is memory-mapped, the OS
backs the 90 MB weight blob with shared physical pages, so the
per-slot cost is just the candle graph structure (a few hundred KB).
Pool=4 is ≈ 100 MB resident vs 90 MB for pool=1.

**Measured throughput** (cluster-bench, x86 release, concurrency=8,
pool=4):
  throughput_per_s : 45.0  (was 25.7 with pool=1 → 1.75× improvement)
  latency_us p50   : 175,164  (was 279,315 → median latency cut by 37%)
  latency_us p99   : 278,993  (was 581,620 → 52% reduction)
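The ratios above follow directly from the raw bench numbers; this small Rust check (helper names are illustrative) reproduces them, and the roughly 52% p99 drop is what "p99 halved" in the title refers to.

```rust
/// Throughput speedup as a ratio (new / old).
fn speedup(old_per_s: f64, new_per_s: f64) -> f64 {
    new_per_s / old_per_s
}

/// Latency reduction in percent, going from old to new.
fn reduction_pct(old_us: f64, new_us: f64) -> f64 {
    (old_us - new_us) / old_us * 100.0
}
```

With the measured figures, `speedup(25.7, 45.0)` is about 1.75, `reduction_pct(279_315.0, 175_164.0)` about 37.3, and `reduction_pct(581_620.0, 278_993.0)` about 52.0.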

On a Pi 5 with 4 Cortex-A76 cores, the speedup will likely be closer
to linear (up to 4×), since the remaining bottleneck there is pure CPU
compute rather than lock contention.

Also adds `docs/hailo/HAILO-SUPPORT-TICKET.md`: pre-drafted ticket
text covering the three SDK bugs (KeyError, AccelerasValueError, and
the ElementwiseAddDirectOp Keras serialization bug), with the encoder
ONNX repro and stack traces, ready to paste into Hailo's Developer
Zone.

99 cluster lib tests + 14 hailo lib tests pass; strict clippy is
clean for both feature combinations.

Co-Authored-By: claude-flow <ruv@ruv.net>

2026-05-02 17:30:38 -04:00
