feat: Add appliance-optimized RLM embedder (Pi 5 + STM32 offload)

Implements AD-25 appliance deployment optimizations for the RLM recursive
sentence transformer embedder, targeting a Raspberry Pi 5 with seven STM32
coprocessors:

- Pi 5 config presets: pi5_optimized() (2-iter, 3-neighbor) and pi5_streaming() (1-iter)
- STM32 offload protocol: ComputeHash, FilterNeighbors, GateCheck, WatchdogPing, ScheduleReorder
- NullStm32 software fallback for development/cloud environments
- Batch embedding with per-chunk latency tracking and STM32 gate-checking
- Priority-scheduled batch embedding via STM32-driven reordering
- HashEmbedder: lightweight FNV-1a pseudo-embedder for testing/baseline
- FlatNeighborStore: in-memory neighbor retriever for small corpora (<100K chunks)
- EmbedderBenchmark: throughput, P95/P99 latency, peak memory reporting
- NEON-optimizable math: 4-element unrolled cosine_similarity, l2_normalize
- vec_accumulate_weighted and mean_embedding helpers
- 41 tests (27 new): STM32 protocol, batch, HashEmbedder, FlatNeighborStore, benchmark, integration
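
For reviewers unfamiliar with the "4-element unrolled" phrasing in the math bullet, here is a minimal sketch of what such a layout could look like. The diff is not shown here, so this is an illustration only and not the committed code: the function names come from the bullet, but the signatures, the zero-vector handling, and the scalar tail loop are assumptions.

```rust
/// Cosine similarity with four independent accumulator lanes.
/// Keeping the lanes separate gives the compiler a shape it can usually
/// map onto 128-bit SIMD registers (such as NEON on the Pi 5).
/// ASSUMPTION: signature and zero-vector behaviour are illustrative.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");

    let mut dot = [0.0f32; 4];
    let mut norm_a = [0.0f32; 4];
    let mut norm_b = [0.0f32; 4];

    // Main loop: process four elements per iteration.
    for (x, y) in a.chunks_exact(4).zip(b.chunks_exact(4)) {
        for i in 0..4 {
            dot[i] += x[i] * y[i];
            norm_a[i] += x[i] * x[i];
            norm_b[i] += y[i] * y[i];
        }
    }

    // Reduce the four lanes to scalars.
    let mut d: f32 = dot.iter().sum();
    let mut sa: f32 = norm_a.iter().sum();
    let mut sb: f32 = norm_b.iter().sum();

    // Scalar tail for lengths that are not a multiple of four.
    let tail_start = a.len() - a.len() % 4;
    for i in tail_start..a.len() {
        d += a[i] * b[i];
        sa += a[i] * a[i];
        sb += b[i] * b[i];
    }

    let denom = (sa * sb).sqrt();
    if denom == 0.0 {
        0.0 // Assumed convention: similarity with a zero vector is 0.
    } else {
        d / denom
    }
}

/// Scales `v` in place to unit L2 norm; a zero vector is left unchanged.
pub fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}
```

Whether the compiler actually emits NEON instructions for the inner loop depends on the target and optimization flags, so the layout should be read as SIMD-friendly rather than guaranteed SIMD.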

All 165 bitnet module tests pass.
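
As background for the HashEmbedder entry above, the following sketch shows one way an FNV-1a pseudo-embedder could work. The FNV-1a constants are the standard 64-bit ones; everything else (the `hash_embed` name, whitespace tokenization, signed bucket hashing, and the final normalization) is a hypothetical design chosen for illustration and may differ from what this commit implements.

```rust
// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// 64-bit FNV-1a: XOR each byte into the state, then multiply by the prime.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut h = FNV_OFFSET_BASIS;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// Hypothetical pseudo-embedder (ASSUMPTION, not the committed API):
/// each whitespace-separated token is hashed to a bucket and a sign,
/// and the resulting vector is scaled to unit L2 norm.
pub fn hash_embed(text: &str, dim: usize) -> Vec<f32> {
    assert!(dim > 0, "embedding dimension must be non-zero");
    let mut v = vec![0.0f32; dim];

    for token in text.split_whitespace() {
        let h = fnv1a_64(token.as_bytes());
        let idx = (h % (dim as u64)) as usize;
        // Use the top bit as a sign so colliding tokens can cancel
        // instead of always reinforcing each other.
        let sign = if (h >> 63) & 1 == 0 { 1.0 } else { -1.0 };
        v[idx] += sign;
    }

    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
    v
}
```

An embedder of this kind is deterministic and needs no model weights, which suits tests and baselines, but its vectors carry no semantic information beyond token overlap.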

https://claude.ai/code/session_011nTcGcn49b8YKJRVoh4TaK
Claude 2026-02-03 15:53:40 +00:00
parent 767901ea79
commit c85ba8a498
