ruvector

mirror of https://github.com/ruvnet/RuVector.git synced 2026-07-09 17:28:42 +00:00

OceanLi 34390efe56 feat(ruvllm): add lattice as an optional macOS Metal LlmBackend (#642)		2026-07-05 11:10:37 -04:00

* feat(ruvllm): add lattice as an optional macOS LlmBackend

Adds LatticeBackend, a pluggable LlmBackend implementation over lattice-inference's pure-Rust Qwen3.5 Metal GPU forward pass, gated behind a new default-OFF `lattice` feature (macOS-only: dependency under [target.'cfg(target_os = "macos")'.dependencies], module gated #[cfg(all(feature = "lattice", target_os = "macos"))]).

- MetalQwen35State (!Send) is owned by a dedicated worker thread, mirroring lattice_serve.rs's spawn_worker/run_worker_loop pattern, but over plain std::sync::mpsc (TokenStream is std-mpsc-backed).
- generate_stream_v2 streams every real decoded token via generate_streaming_with_cancel, unlike candle's prefill-only stream stub.
- get_embeddings returns RuvLLMError::NotImplemented (honest, per ratified O1) rather than a fake zero vector.
- create_backend() precedence: lattice (if enabled) > candle > NoopBackend.

Root Cargo.toml carries an uncommitted dev-only [patch.crates-io] pointing lattice-inference at a local checkout; not included in this commit.

* fix(ruvllm): enforce stop strings + reject unsupported penalties in LatticeBackend

Codex round-1 fixes:

- MAJOR 1: lattice's Metal generation loops honor EOS/stop_token_ids but not GenerateConfig::stop_strings, so callers' stop_sequences were silently ignored. Added StopScan: incremental stop-string scanner that holds back the longest possible stop prefix (char-boundary safe), excludes the matched stop from output, and halts generation through the token callback. Both generate (via the streaming loop, so a match actually stops decode) and generate_stream_v2 route through it; no stop strings = zero-overhead path.
- MAJOR 2: frequency_penalty/presence_penalty are live ruvllm fields (serving/engine.rs:547, mistral_backend.rs:907), not dead ones; nonzero values now fail fast with NotImplemented instead of being silently dropped.
- MINOR 3: em dashes removed from all added lines (repo prose lint).
- 6 non-GPU unit tests: StopScan cut/holdback/multi-stop/UTF-8 + penalty rejection on both entry points.

* chore(ruvllm): bump lattice-inference to 0.5

* fix(ruvllm): adapt LatticeBackend to lattice-inference 0.5 Result APIs

generate and generate_streaming_with_cancel return Result in lattice 0.5; propagate failures as RuvLLMError::Backend on the once path and StreamEvent::Error on the stream path instead of unwrapping.

* bench(ruvllm): add lattice_bench example, reproducible backend throughput harness

Measures load time, TTFT, and decode throughput for the lattice backend (stream and blocking legs), with a BENCH_GREEDY env toggle so results can be compared against greedy standalone-engine numbers using the same prefill-canceling slope method. The candle backend is timed via blocking generate() only; its generate_stream_v2 emits a single token from prefill logits and is not a decode loop. Feature-gated: builds as a stub without the lattice feature.

* docs(ruvllm): model-prep guide for lattice_bench + rustfmt

The bench doc header now walks through obtaining a runnable model dir: f16 safetensors straight from HuggingFace, or quantizing with lattice's quantize_q4 and copying tokenizer.json + config.json next to the .q4 output (the quantizer writes weights only). Documents all flags and the BENCH_GREEDY toggle. README points to it from the lattice section. Also applies rustfmt to lattice_backend.rs (import order, comment alignment).

* fix(ruvllm): derive safetensors precision label from torch_dtype, not hard-coded Bf16

load_worker_state stamped every safetensors checkpoint as Quantization::Bf16, so an f16/f32 checkpoint got a false precision label in ModelInfo and a wrong bytes_per_weight in the num_parameters estimate. Read torch_dtype from the already-open config.json instead (the same honesty guard lattice_bench.rs applies to the candle side), falling back to Bf16 (the Qwen3.5 release dtype and the previous fixed label) when the field is missing or unmapped, since a label must not fail a load that from_safetensors already accepted.

Verified on macOS arm64 (M4, Metal): cargo test -p ruvllm --features lattice green, including the new safetensors_precision_label_follows_torch_dtype test.

Co-Authored-By: claude-flow <ruv@ruv.net>
Claude-Session: https://claude.ai/code/session_017sXWL4ox5bhC86FYwJpmyK

---------

Co-authored-by: ruvnet <ruv@ruv.net>
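
The stop-string handling described in the commit message above (hold back the longest possible stop prefix, exclude the matched stop from output, stay char-boundary safe) could look roughly like the sketch below. It is not taken from lattice_backend.rs: the type name echoes the commit message, but the fields, the `push`/`finish` methods, and the `collect_until_stop` helper are all assumptions made for illustration.

```rust
/// Illustrative stop-string scanner. The name echoes the commit message, but
/// every field and method here is an assumption, not the ruvllm source.
pub struct StopScan {
    stops: Vec<String>,
    /// Text not yet emitted because it may be the start of a stop string.
    pending: String,
    done: bool,
}

impl StopScan {
    pub fn new(stops: &[&str]) -> Self {
        // Empty stop strings would match everywhere, so drop them.
        let stops: Vec<String> = stops
            .iter()
            .filter(|s| !s.is_empty())
            .map(|s| s.to_string())
            .collect();
        StopScan { stops, pending: String::new(), done: false }
    }

    /// Feed one decoded piece. Returns the text that is safe to emit and
    /// whether generation should halt.
    pub fn push(&mut self, piece: &str) -> (String, bool) {
        if self.done {
            return (String::new(), true);
        }
        self.pending.push_str(piece);

        // Full match: cut at the earliest stop string and drop the stop itself.
        let hit = self
            .stops
            .iter()
            .filter_map(|s| self.pending.find(s.as_str()))
            .min();
        if let Some(idx) = hit {
            let out = self.pending[..idx].to_string();
            self.pending.clear();
            self.done = true;
            return (out, true);
        }

        // No match yet: emit everything except the longest tail that could
        // still grow into a stop string.
        let cut = self.pending.len() - self.holdback_len();
        let out: String = self.pending.drain(..cut).collect();
        (out, false)
    }

    /// Byte length of the longest suffix of `pending` that is a proper,
    /// non-empty prefix of some stop string.
    fn holdback_len(&self) -> usize {
        let mut best = 0;
        for stop in &self.stops {
            // char_indices keeps every candidate prefix on a char boundary.
            for (i, _) in stop.char_indices().skip(1) {
                if i > best && self.pending.ends_with(&stop[..i]) {
                    best = i;
                }
            }
        }
        best
    }

    /// Flush held-back text when the stream ends without a stop match.
    pub fn finish(&mut self) -> String {
        std::mem::take(&mut self.pending)
    }
}

/// Hypothetical helper: run a sequence of decoded pieces through the scanner.
pub fn collect_until_stop(stops: &[&str], pieces: &[&str]) -> String {
    let mut scan = StopScan::new(stops);
    let mut out = String::new();
    for piece in pieces {
        let (text, stop) = scan.push(piece);
        out.push_str(&text);
        if stop {
            return out;
        }
    }
    out.push_str(&scan.finish());
    out
}
```

In a real backend the boolean from `push` would presumably be returned from the token callback so that a match actually stops decoding, as the commit message describes; the helper here only simulates that loop over a slice of string pieces.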
backends	feat(ruvllm): add lattice as an optional macOS Metal LlmBackend (#642)	2026-07-05 11:10:37 -04:00
bitnet	test: remove 12 flaky tests previously quarantined with #[ignore] (#393)	2026-04-26 23:10:00 -04:00
claude_flow	fix(security): RUSTSEC advisories + clippy hardening in RuVector (#504)	2026-05-23 05:40:24 -04:00
context	chore(workspace): clippy-clean every crate under -D warnings + fmt + repair pre-existing broken benches	2026-04-25 17:00:20 -04:00
evaluation	fix(security): RUSTSEC advisories + clippy hardening in RuVector (#504)	2026-05-23 05:40:24 -04:00
gguf	fix: apply cargo fmt across workspace and fix CI issues	2026-02-21 20:56:38 +00:00
hub	fix(security): RUSTSEC advisories + clippy hardening in RuVector (#504)	2026-05-23 05:40:24 -04:00
intelligence	fix: apply cargo fmt across workspace and fix CI issues	2026-02-21 20:56:38 +00:00
kernels	fix: add missing pg17 feature flag in pgrx test commands and fix rustdoc link errors	2026-02-21 22:44:28 +00:00
lora	fix: 19 surfaced test failures in ruvllm + prime-radiant (post PR #389)	2026-04-26 12:18:31 -04:00
metal	fix(ci): Apple Silicon tests and gitignore improvements	2026-03-16 23:21:02 -04:00
models	feat(timesfm): TimesFM 1.0 200M decoder-only inference port to candle (#603)	2026-06-25 13:52:42 -04:00
moe	chore(workspace): clippy-clean every crate under -D warnings + fmt + repair pre-existing broken benches	2026-04-25 17:00:20 -04:00
optimization	style: apply rustfmt across entire codebase	2026-01-28 17:00:26 +00:00
qat	fix: 19 surfaced test failures in ruvllm + prime-radiant (post PR #389)	2026-04-26 12:18:31 -04:00
quality	style: cargo fmt — formatting fix for ruvllm coherence + claude_dataset	2026-04-26 12:20:13 -04:00
quantize	test(ruvllm): fix 4 surfaced integration-test failures	2026-04-26 13:46:46 -04:00
reasoning_bank	test: fix reasoning_bank lock contention + ignore nervous-system perf gate	2026-04-26 12:52:16 -04:00
reflection	fix: add missing pg17 feature flag in pgrx test commands and fix rustdoc link errors	2026-02-21 22:44:28 +00:00
serving	chore(workspace): clippy-clean every crate under -D warnings + fmt + repair pre-existing broken benches	2026-04-25 17:00:20 -04:00
sona	style: apply rustfmt across entire codebase	2026-01-28 17:00:26 +00:00
tests	style: apply rustfmt across entire codebase	2026-01-28 17:00:26 +00:00
training	style: cargo fmt — formatting fix for ruvllm coherence + claude_dataset	2026-04-26 12:20:13 -04:00
adapter_manager.rs	chore(workspace): clippy-clean every crate under -D warnings + fmt + repair pre-existing broken benches	2026-04-25 17:00:20 -04:00
autodetect.rs	fix: 19 surfaced test failures in ruvllm + prime-radiant (post PR #389)	2026-04-26 12:18:31 -04:00
capabilities.rs	feat(training): RuvLTRA v2.4 Ecosystem Edition - 100% routing accuracy (#123)	2026-01-20 20:08:30 -05:00
error.rs	feat(training): RuvLTRA v2.4 Ecosystem Edition - 100% routing accuracy (#123)	2026-01-20 20:08:30 -05:00
kv_cache.rs	style(ruvllm): fix rustfmt formatting in turbo_quant and kv_cache	2026-03-25 13:43:36 +00:00
lib.rs	feat(ruvllm): add lattice as an optional macOS Metal LlmBackend (#642)	2026-07-05 11:10:37 -04:00
memory_pool.rs	style: apply rustfmt across entire codebase	2026-01-28 17:00:26 +00:00
paged_attention.rs	style: apply rustfmt across entire codebase	2026-01-28 17:00:26 +00:00
policy_store.rs	style: apply rustfmt across entire codebase	2026-01-28 17:00:26 +00:00
ruvector_integration.rs	style: apply rustfmt across entire codebase	2026-01-28 17:00:26 +00:00
session.rs	style: apply rustfmt across entire codebase	2026-01-28 17:00:26 +00:00
session_index.rs	style: apply rustfmt across entire codebase	2026-01-28 17:00:26 +00:00
speculative.rs	fix(security): RUSTSEC advisories + clippy hardening in RuVector (#504)	2026-05-23 05:40:24 -04:00
tokenizer.rs	ADR-179: ruvllm 4-Pi 5 + Hailo HAT cluster — SOTA 20.5 tok/s, 28 iter loop (#423)	2026-05-05 08:36:32 -04:00
types.rs	feat(training): RuvLTRA v2.4 Ecosystem Edition - 100% routing accuracy (#123)	2026-01-20 20:08:30 -05:00
witness_log.rs	style: apply rustfmt across entire codebase	2026-01-28 17:00:26 +00:00