feat(hailo): cpu-fallback fingerprint integrity + ADR-167 SDK bug chain (iter 143)

Production fix: cpu-fallback workers now produce a real model
fingerprint instead of empty-string. Previously, compute_fingerprint
only hashed model.hef + vocab.txt so cpu-fallback workers always
reported empty, which caused the cluster's ADR-167 §8.3 fleet
integrity check to silently skip them.

compute_fingerprint now also hashes model.safetensors + tokenizer.json
+ config.json (streaming the safetensors so we don't hold 90 MB in
RAM). NPU-layout vs cpu-fallback workers produce different
fingerprints by design — they run different code paths so the cluster
will refuse to mix them.

Verified end-to-end: booted cpu-fallback worker against
/tmp/cpu-fallback-test, got real fingerprint 2517aa00... (was empty
before). One new lib test, total 16 fingerprint tests green.

Worker startup warning updated to mention both layouts.

ADR-167 documents the iter-142/142b/143 SDK bug chain found by reading
hailo_sdk source: KeyError fixed by internal-layer-name keying;
AccelerasValueError fixed by 4D NCHW calib; then TypeError on
ElementwiseAddDirectOp deserialization in spawned subprocess — that
last one is beyond user-space patching. NPU acceleration remains
blocked; cpu-fallback remains the production path.

Co-Authored-By: claude-flow <ruv@ruv.net>
This commit is contained in:
ruvnet 2026-05-02 17:16:31 -04:00
parent fb091c7567
commit 4ca64ed7d3
3 changed files with 112 additions and 17 deletions

View file

@ -337,8 +337,10 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
let fingerprint = compute_fingerprint(&model_dir);
if fingerprint.is_empty() {
warn!(
"model_dir {} has no model.hef / vocab.txt — fingerprint empty; \
coordinators will skip the integrity check",
"model_dir {} has no recognizable model artifacts \
(NPU: model.hef + vocab.txt; cpu-fallback: model.safetensors + \
tokenizer.json + config.json) fingerprint empty; coordinators \
will skip the integrity check",
model_dir.display()
);
} else {

View file

@ -1,29 +1,50 @@
//! Model fingerprint — sha256 over HEF + tokenizer artifacts.
//! Model fingerprint — sha256 over the model artifacts on disk.
//!
//! ADR-167 §8.3 fleet integrity guard: coordinators refuse to mix
//! workers reporting different model fingerprints. Computed at worker
//! startup over the files actually loaded, so any swap of HEF or
//! vocab.txt produces a different fingerprint and the coordinator can
//! eject the drift.
//! startup over the files actually loaded, so any swap of model
//! weights or tokenizer produces a different fingerprint and the
//! coordinator can eject the drift.
//!
//! Format: hex-lowercase, 64 chars.
//! Iter 143 — covers both deployment paths:
//! * NPU path: sha256(model.hef || vocab.txt)
//! * cpu-fallback: sha256(model.safetensors || tokenizer.json || config.json)
//!
//! Mixed clusters (some workers on NPU, some on CPU) intentionally
//! produce different fingerprints — they're running different code
//! paths so the cluster should reject the mix.
//!
//! Format: hex-lowercase, 64 chars. Empty when no recognizable model
//! artifacts are present in `model_dir`.
use sha2::{Digest, Sha256};
use std::path::Path;
/// Compute sha256 over (hef_bytes || vocab_bytes). Missing files are
/// treated as empty so the fingerprint is *also* a witness of which
/// files exist — a worker that loads only the HEF (no tokenizer)
/// produces a different fingerprint than a worker with both.
/// Compute sha256 over the model artifacts. Missing files are treated
/// as empty within their layout so the fingerprint is *also* a witness
/// of which files exist — a worker that loads only the HEF (no
/// tokenizer) produces a different fingerprint than a worker with both.
///
/// Returns "" when both files are missing — caller (worker startup)
/// uses empty-string as "skip the check" sentinel until step 6 lands.
/// Returns "" when neither layout has any recognizable file — caller
/// (worker startup) uses empty-string as "skip the check" sentinel.
pub fn compute_fingerprint(model_dir: &Path) -> String {
let hef = std::fs::read(model_dir.join("model.hef")).unwrap_or_default();
let vocab = std::fs::read(model_dir.join("vocab.txt")).unwrap_or_default();
if hef.is_empty() && vocab.is_empty() {
// Iter 143: cpu-fallback artifacts. We don't read the full
// safetensors (90 MB) into memory — sha256 it in streaming chunks.
let safetensors_present = model_dir.join("model.safetensors").exists();
let tokenizer_json = std::fs::read(model_dir.join("tokenizer.json")).unwrap_or_default();
let config_json = std::fs::read(model_dir.join("config.json")).unwrap_or_default();
let npu_layout_present = !hef.is_empty() || !vocab.is_empty();
let cpu_layout_present =
safetensors_present || !tokenizer_json.is_empty() || !config_json.is_empty();
if !npu_layout_present && !cpu_layout_present {
return String::new();
}
let mut h = Sha256::new();
// Length-prefix each input so a hef of N bytes + vocab of M bytes
// never collides with a hef of N+M bytes + empty vocab.
@ -31,6 +52,30 @@ pub fn compute_fingerprint(model_dir: &Path) -> String {
h.update(&hef);
h.update((vocab.len() as u64).to_le_bytes());
h.update(&vocab);
// Stream-hash the safetensors so we don't read 90 MB into memory.
if safetensors_present {
// Tag with file marker so an empty hef doesn't blend with safetensors.
h.update(b"safetensors:");
if let Ok(mut f) = std::fs::File::open(model_dir.join("model.safetensors")) {
let mut buf = [0u8; 64 * 1024];
let mut total: u64 = 0;
use std::io::Read;
while let Ok(n) = f.read(&mut buf) {
if n == 0 {
break;
}
h.update(&buf[..n]);
total += n as u64;
}
h.update(total.to_le_bytes());
}
}
h.update((tokenizer_json.len() as u64).to_le_bytes());
h.update(&tokenizer_json);
h.update((config_json.len() as u64).to_le_bytes());
h.update(&config_json);
let digest = h.finalize();
hex_lower(&digest)
}
@ -95,6 +140,37 @@ mod tests {
assert_ne!(fp2, fp3);
}
#[test]
fn cpu_fallback_safetensors_layout_yields_distinct_fingerprint() {
// Iter 143: a worker with safetensors+tokenizer+config but no
// hef must produce a non-empty fingerprint, distinct from a
// worker with the same files but different content.
let d1 = tmpdir();
std::fs::write(d1.path().join("model.safetensors"), b"weights-A").unwrap();
std::fs::write(d1.path().join("tokenizer.json"), b"tok-A").unwrap();
std::fs::write(d1.path().join("config.json"), b"cfg-A").unwrap();
let fp1 = compute_fingerprint(d1.path());
assert_eq!(fp1.len(), 64);
assert!(!fp1.is_empty());
let d2 = tmpdir();
std::fs::write(d2.path().join("model.safetensors"), b"weights-B").unwrap();
std::fs::write(d2.path().join("tokenizer.json"), b"tok-A").unwrap();
std::fs::write(d2.path().join("config.json"), b"cfg-A").unwrap();
let fp2 = compute_fingerprint(d2.path());
assert_ne!(fp1, fp2, "different safetensors must yield different fp");
// Per ADR-167 §8.3, an NPU-layout worker and a cpu-fallback
// worker run different code paths so their fingerprints SHOULD
// differ even with the same logical model — the cluster will
// refuse to mix them.
let d3 = tmpdir();
std::fs::write(d3.path().join("model.hef"), b"weights-A").unwrap();
std::fs::write(d3.path().join("vocab.txt"), b"tok-A").unwrap();
let fp3 = compute_fingerprint(d3.path());
assert_ne!(fp1, fp3, "NPU layout vs cpu-fallback must differ");
}
#[test]
fn length_prefix_prevents_split_collision() {
// Without length-prefixing, sha256(b"abc" || b"de") == sha256(b"ab" || b"cde").

View file

@ -66,10 +66,27 @@ re-exported the BERT encoder block in isolation:
quantized weights (full-precision HEF would be possible on
hailo15h but not hailo8).
**Iter 142/142b/143 follow-up debugging** (after reading the SDK source):
- Root-caused the iter-139 `KeyError` to a mismatch between
`_get_build_inputs` (returns dict keyed by user dataset keys) and
`hailo_model.build` (looks up by internal `flow.input_nodes` names).
Workaround: introspect the parsed HN, key the calibration dict by
the actual layer name (`minilm_encoder/input_layer1`).
- Past that: `AccelerasValueError` shape mismatch — Hailo's HN treats
inputs as 4D NCHW with implicit channels=1. Workaround: reshape
calibration from `[batch, seq, hidden]` to `[batch, 1, seq, hidden]`.
- Past **that**: a Keras serialization bug —
`TypeError: Could not locate class 'ElementwiseAddDirectOp'`
during the SDK's deepcopy of its own internal layer types. This is
hailo_model_optimization deepcopy-ing a custom Keras layer it
registered itself, then failing to deserialize it because the
`@register_keras_serializable` decorator isn't running in the
spawned subprocess. Cannot be fixed from user-space.
**Status:** the encoder ONNX is fundamentally Hailo-compatible (it
parses + full-precision-optimizes cleanly). The remaining gap is an
SDK-internal bug in INT8 quantization of transformer encoders that
can't be worked around from user-space. The cleanest unblock paths:
parses + full-precision-optimizes cleanly). The remaining gap is a
chain of SDK-internal bugs in INT8 quantization of transformer encoders
that can't be worked around from user-space. The cleanest unblock paths:
1. Hailo support ticket (the SDK should not KeyError on a layer it
knows about — this is a quantization-flow bug, not a
user-input bug)