10 commits

Author SHA1 Message Date
rUv
d102bb87ab
fix: apply cargo fmt across workspace and fix CI issues
- Run cargo fmt --all to fix formatting in 362 files across the entire workspace
- Add PGDG repository for PostgreSQL 17 in CI test-all-features and benchmark jobs
- Add missing rvf dependency crates to standalone Dockerfile for domain-expansion
- Add sona-learning and domain-expansion features to standalone Dockerfile build
- Create npu.rs stub for ruvector-sparse-inference (fixes rustfmt resolution error)

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-02-21 20:56:38 +00:00
Claude
a8b9b25e16
feat: Add benchmarks for new features + persistence integration tests
Benchmarks (store.rs, 8 new bench tests):
- Batch scoring 10k blocks vs individual scoring
- 5-bit and 7-bit dequant fast paths (4096 values)
- 5-bit quantize fast path (4096 values)
- SVD adaptive rank selection (64x64 matrix)
- format_report and format_json throughput
- MetricsSeries trend computation (100 snapshots)

Persistence tests (10 tests, feature-gated):
- FileBlockIO: write/read, multi-tier, delete, overwrite, missing key
- FileMetaLog: append/get, upsert, iter, missing key, multi-block

354 tests pass (with --features persistence).

https://claude.ai/code/session_01Ksy165BL5nGpVoWaAfTE7t
2026-02-08 04:56:40 +00:00
Claude
52c8499574
feat: Wire coherence gate, epoch tracker, and metrics series into TieredStore
- Add CoherenceCheck, EpochTracker, MetricsSeries fields to TieredStore
- put() now records write epochs for staleness detection
- tick() auto-records metrics snapshots for trend analysis
- Add enable_coherence()/disable_coherence() + accessor methods
- Add coherence_check() convenience method on TieredStore
- 4 new integration tests verify wiring

https://claude.ai/code/session_01Ksy165BL5nGpVoWaAfTE7t
2026-02-08 04:50:33 +00:00
Claude
54d5cd83dd
feat: Implement all 11 temporal tensor improvements
- 5-bit quantize fast path (8 values → 5 bytes, no bit accumulator)
- 5-bit + 7-bit dequant fast paths (8-at-a-time byte extraction)
- Batch scoring: compute_scores_batch, choose_tiers_batch,
  score_and_partition, top_k_coldest with partial sort
- SVD improvements: reconstruction_error, energy_captured,
  compression_ratio, from_data_adaptive (auto-rank selection)
- Metrics dashboard: format_report, format_json, health_check
  with StoreHealthStatus, MetricsSeries with trend analysis
- Core trait integration (TensorStore, TensorStoreExt, TensorStoreSnapshot)
- AgentDB adapter (PatternIndex, InMemoryPatternIndex, AdaptiveTiering)
- Coherence gate (CoherenceCheck, EpochTracker, verify_put)
- Persistence layer (FileBlockIO, FileMetaLog, feature-gated)
- Stress/fuzz tests (8 adversarial scenarios)
- WASM FFI end-to-end test (feature-gated behind ffi)
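
The "8 values → 5 bytes" fast path can be illustrated with a minimal sketch. It assumes an LSB-first bit order and the hypothetical names `pack8x5` and `unpack8x5`; the crate's actual layout and function names may differ. Because eight 5-bit codes always fill exactly 40 bits, the group can be handled as one fixed-width word with no running bit count.

```rust
/// Pack eight 5-bit codes (each in 0..=31) into five bytes.
/// Hypothetical helper: LSB-first order is an assumption, not the crate's documented layout.
fn pack8x5(codes: [u8; 8]) -> [u8; 5] {
    // Eight 5-bit codes occupy exactly 40 bits, so they fit in one u64.
    let mut word: u64 = 0;
    for (i, &code) in codes.iter().enumerate() {
        word |= u64::from(code & 0x1F) << (5 * i);
    }
    let b = word.to_le_bytes();
    [b[0], b[1], b[2], b[3], b[4]]
}

/// Inverse of `pack8x5`: recover eight 5-bit codes from five bytes.
fn unpack8x5(bytes: [u8; 5]) -> [u8; 8] {
    let word = u64::from_le_bytes([
        bytes[0], bytes[1], bytes[2], bytes[3], bytes[4], 0, 0, 0,
    ]);
    let mut out = [0u8; 8];
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = ((word >> (5 * i)) & 0x1F) as u8;
    }
    out
}
```

Calling `unpack8x5(pack8x5(v))` returns `v` for any input whose codes are below 32.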

306 tests pass (257 unit + 12 integration + 11 benchmarks +
14 property + 8 stress + 4 doctests).

https://claude.ai/code/session_01Ksy165BL5nGpVoWaAfTE7t
2026-02-08 04:34:12 +00:00
Claude
8ea1a54ffa
perf: Optimize quantizer + store; add comprehensive tests
- Eliminate round() call in quantize hot path (1.8x speedup)
- Add 3-bit dequant fast path (8-values-from-3-bytes, 2.4x speedup)
- Wire WitnessLog into TieredStore (put/get/evict audit trail)
- Add TieredStore.metrics() for aggregate store statistics
- Add TieredStore.witness_log() accessors
- Update store.get() to accept `now` tick for access tracking
- 14 property-based tests (roundtrip, bitpack, segment, delta, f16,
  score monotonicity, extreme values, compression ratio, determinism)
- 11 end-to-end integration tests (lifecycle, delta chain, quality
  sweep, persistence, eviction, checksum, multi-tensor, stress,
  compressor-to-store, factor reconstruction, witness logging)
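
The 8-values-from-3-bytes idea behind the 3-bit dequant fast path might look roughly like the sketch below. The LSB-first bit order, the bias of 3 used to map codes to signed values, and the names `unpack8x3` and `dequant8x3` are all assumptions for illustration, not the crate's confirmed implementation.

```rust
/// Extract eight 3-bit codes from three bytes (24 bits), LSB-first.
/// Hypothetical helper; the real bit order may differ.
fn unpack8x3(bytes: [u8; 3]) -> [u8; 8] {
    // Load the three bytes into one 24-bit word, then slice it into 3-bit fields.
    let word = u32::from(bytes[0]) | (u32::from(bytes[1]) << 8) | (u32::from(bytes[2]) << 16);
    let mut out = [0u8; 8];
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = ((word >> (3 * i)) & 0x7) as u8;
    }
    out
}

/// Dequantize one 3-byte group. Assumes codes are stored with a bias of 3,
/// so code 3 maps to 0.0 (an assumption made for this sketch).
fn dequant8x3(bytes: [u8; 3], scale: f32) -> [f32; 8] {
    let codes = unpack8x3(bytes);
    let mut out = [0.0f32; 8];
    for (value, &code) in out.iter_mut().zip(codes.iter()) {
        *value = (i32::from(code) - 3) as f32 * scale;
    }
    out
}
```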

Benchmarks (4096-element tensors, release mode):
  8-bit quantize: 10,745 ns (1.52 GB/s)
  8-bit dequant:    992 ns (16.52 GB/s)
  3-bit dequant:  2,998 ns (5.46 GB/s)
  Zipf P95 read:     41 ns
  Tier flip rate: 0.074/block/min (threshold: 0.1)
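
The GB/s figures above are consistent with dividing the uncompressed f32 size of the tensor by the elapsed time. A small sketch of that arithmetic follows; the function name is hypothetical, and the use of f32 bytes and decimal gigabytes is inferred from the reported numbers rather than stated in the commit.

```rust
/// Throughput in GB/s for processing `elements` f32 values in `nanos` nanoseconds.
/// Bytes per nanosecond equals (decimal) gigabytes per second.
fn throughput_gb_per_s(elements: usize, nanos: f64) -> f64 {
    let bytes = (elements * std::mem::size_of::<f32>()) as f64;
    bytes / nanos
}
```

For 4096 elements, 992 ns gives about 16.52 GB/s, 10,745 ns about 1.52 GB/s, and 2,998 ns about 5.46 GB/s.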

All 204 tests pass.

https://claude.ai/code/session_01Ksy165BL5nGpVoWaAfTE7t
2026-02-08 03:57:57 +00:00
Claude
f1423dac51
feat: Implement temporal tensor store with block-based tiered compression
Implements the block-based storage engine specified in ADR-018 through
ADR-023 with 5 new modules and 1 benchmark/test suite.

New modules:
- store.rs (1056 lines): BlockKey, BlockMeta, Tier, TieredStore with
  HashMap index, per-tier data storage, CRC32 checksums, eviction,
  and BlockIO/MetaLog/Clock traits
- tiering.rs (846 lines): EMA + popcount + recency scoring with LUT-based
  fast_exp_neg, hysteresis, min_residency, budgeted maintenance,
  MigrationCandidate selection, warm aggressive mode (7->5 bit)
- delta.rs (825 lines): Sparse delta format (u16 index + i16 value),
  DeltaChain with bounded length and compaction, FactorSet for
  low-rank reconstruction, encode/decode serialization
- metrics.rs (770 lines): WitnessLog (ring buffer), WitnessEvent enum
  (Access, TierChange, Eviction, Maintenance, Compaction, etc.),
  StoreMetrics aggregates, StoreSnapshot serialization
- store_ffi.rs (680 lines): tts_init/put/get/tick/stats/touch/evict
  WASM exports with u128 split into hi/lo u64, feature-gated
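
Splitting a u128 into hi/lo u64 halves, as the store_ffi.rs exports do, can be sketched as below. The helper names are hypothetical, and the motivation (128-bit integers being awkward to pass across the WASM boundary) is an assumption.

```rust
/// Split a 128-bit identifier into (hi, lo) 64-bit halves for an FFI boundary.
fn split_u128(id: u128) -> (u64, u64) {
    // The `as u64` cast keeps the low 64 bits; the shift exposes the high 64 bits.
    ((id >> 64) as u64, id as u64)
}

/// Rebuild the 128-bit identifier from its halves.
fn join_u128(hi: u64, lo: u64) -> u128 {
    (u128::from(hi) << 64) | u128::from(lo)
}
```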

Optimizations:
- 8-bit fast path in quantizer: direct byte read/write, no bit
  accumulator. Dequant: 7313ns -> 1290ns (5.7x faster, 12.7 GB/s)
- 8-bit fast path in bitpack: direct copy, no accumulator.
  Pack: 8484ns -> 742ns (11.4x), Unpack: 8845ns -> 396ns (22.3x)
- #[inline] on hot functions

Benchmark results (release, 16KB blocks):
  Quantize 8-bit:  18.9us   Dequant 8-bit: 1.3us (12.7 GB/s)
  Quantize 3-bit:  22.5us   Dequant 3-bit: 7.2us (2.3 GB/s)
  Score compute:   10ns      Single frame decode: 178ns
  Segment 8-bit decode: 1.5us (11.2 GB/s)
  Zipf P95 read: 48ns        Tier flip rate: 0.074/block/min

Quality (all PASS):
  8-bit: 0.39% max error    7-bit: 0.79% max error
  5-bit: 3.33% max error    3-bit: 16.67% max error
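
These error figures match the worst-case rounding error of symmetric quantization, half a step relative to the group maximum, i.e. 0.5 / qmax with qmax = 2^(bits-1) - 1. The sketch below reproduces them; the definition of `qmax_from_bits` is assumed here (the name appears in an earlier commit, but its body is not shown in this log), and `max_rel_error_percent` is a hypothetical helper.

```rust
/// Largest positive code for a signed symmetric quantizer (assumed definition).
fn qmax_from_bits(bits: u32) -> i32 {
    (1i32 << (bits - 1)) - 1
}

/// Worst-case rounding error as a percentage of the group's maximum magnitude:
/// half a quantization step, where one step is max_abs / qmax.
fn max_rel_error_percent(bits: u32) -> f64 {
    50.0 / f64::from(qmax_from_bits(bits))
}
```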

Tests: 170 unit + 12 integration/benchmark, all passing.

https://claude.ai/code/session_01Ksy165BL5nGpVoWaAfTE7t
2026-02-08 03:18:51 +00:00
Claude
96730e60d1
docs: Add README for ruvector-temporal-tensor crate
Plain-language introduction explaining what temporal tensor compression
does and why it matters, feature tables, Quick Start with 4 code
examples (basic, streaming, random-access, custom policy), full API
reference, segment binary format spec, FFI/WASM guide, and build
instructions.

https://claude.ai/code/session_01U63xtGd5Q8mUevyY7nUSfJ
2026-02-06 01:01:48 +00:00
Claude
4f5d31f324
docs: Polish temporal tensor crate with clippy fixes, docs, and utilities
- Fix all clippy warnings: module-level //! docs, .div_ceil(), is_empty()
- Optimize segment::decode to call dequantize_f32 directly (skip legacy wrapper)
- Add decode_single_frame() for random-access frame decoding
- Add compression_ratio() utility for segment inspection
- Add comprehensive doc-examples with 3 tested examples in lib.rs
- Fix HEADER_SIZE offset bug in decode_single_frame (22 vs 26)
- All 41 unit tests + 3 doc-tests pass, 0 clippy warnings

https://claude.ai/code/session_01U63xtGd5Q8mUevyY7nUSfJ
2026-02-06 00:50:48 +00:00
Claude
8f0849389d
perf: Optimize temporal tensor compression hot paths
Key optimizations:
- Eliminate per-value modulo/division in dequantize by restructuring
  to iterate by frame→group→element (was: flat index with val_idx % tensor_len)
- Cache f32 scales in TemporalTensorCompressor to avoid repeated f16→f32
  conversion on every push_frame (drift check + quantization)
- Add optimized _f32 API variants (frame_fits_scales_f32, quantize_and_pack_f32,
  dequantize_f32) that accept pre-converted scales
- Pre-reserve Vec capacity in quantize_and_pack (avoids reallocations)
- Add #[inline] on qmax_from_bits, f32_to_f16_bits, f16_bits_to_f32
- Use chunks() iterator instead of manual index tracking in compute_scales
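
The frame→group→element restructuring can be illustrated with a simplified sketch. It assumes unpacked i8 codes, one f32 scale per group, and scales shared across frames; the function names and signatures are hypothetical and do not mirror the crate's API. The nested version walks the data with `chunks()` so no per-value `%` or `/` is needed, while the flat version shows the pattern being replaced.

```rust
/// Nested iteration: frame -> group -> element, no per-value modulo or division.
/// Both lengths must be non-zero (`chunks` panics on a chunk size of 0).
fn dequantize_nested(codes: &[i8], scales: &[f32], tensor_len: usize, group_len: usize) -> Vec<f32> {
    assert_eq!(scales.len(), tensor_len.div_ceil(group_len));
    let mut out = Vec::with_capacity(codes.len());
    for frame in codes.chunks(tensor_len) {
        for (group, &scale) in frame.chunks(group_len).zip(scales) {
            for &q in group {
                out.push(f32::from(q) * scale);
            }
        }
    }
    out
}

/// Flat iteration: recomputes the position and group index for every value.
fn dequantize_flat(codes: &[i8], scales: &[f32], tensor_len: usize, group_len: usize) -> Vec<f32> {
    codes
        .iter()
        .enumerate()
        .map(|(val_idx, &q)| {
            let pos = val_idx % tensor_len; // per-value modulo
            f32::from(q) * scales[pos / group_len] // per-value division
        })
        .collect()
}
```

Both functions produce the same output; only the index arithmetic in the hot loop differs.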

New tests (41 total, up from 33):
- Roundtrip tests for 5-bit and 7-bit quantization
- Non-finite value handling (NaN, Inf, -Inf)
- Single-element group edge case
- Compression ratio validation for all tiers
- Cold-tier (3-bit) full roundtrip
- Large tensor multi-group (512-dim, 50 frames)
- Accessor method coverage

https://claude.ai/code/session_01U63xtGd5Q8mUevyY7nUSfJ
2026-02-06 00:40:36 +00:00
Claude
b431351d75
feat: Add ADR-017 temporal tensor compression with tiered quantization
Introduces a complete temporal tensor compression system with:

- ADR-017: SOTA research-backed architecture decision record covering
  groupwise symmetric quantization, temporal segment reuse, access-pattern
  driven tier selection (8/7/5/3 bit), and WASM-compatible design

- ruvector-temporal-tensor crate (zero external dependencies):
  - tier_policy: Score-based hot/warm/cold bit-width selection
  - f16: Software IEEE 754 half-precision conversion
  - bitpack: Arbitrary bit-width stream packing (no alignment waste)
  - quantizer: Groupwise symmetric quantization with f16 scales
  - segment: Binary segment format (TQTC) encode/decode
  - compressor: Temporal segment manager with drift detection
  - ffi: WASM/C FFI with handle-based resource management
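
Groupwise symmetric quantization, as named in the quantizer module, can be sketched for a single group as follows. This is a simplified illustration under stated assumptions: the scale is kept as f32 rather than f16, codes are left unpacked as i8, inputs are finite, `bits` is between 2 and 8, and the function names are hypothetical.

```rust
/// Quantize one group: scale = max|x| / qmax, code = round(x / scale).
/// Returns the scale and the signed codes in -qmax..=qmax.
fn quantize_group(values: &[f32], bits: u32) -> (f32, Vec<i8>) {
    let qmax = ((1i32 << (bits - 1)) - 1) as f32;
    let max_abs = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    // An all-zero group gets a dummy scale of 1.0 to avoid dividing by zero.
    let scale = if max_abs > 0.0 { max_abs / qmax } else { 1.0 };
    let codes: Vec<i8> = values
        .iter()
        .map(|v| (v / scale).round().clamp(-qmax, qmax) as i8)
        .collect();
    (scale, codes)
}

/// Reconstruct approximate values from the codes and the group scale.
fn dequantize_group(scale: f32, codes: &[i8]) -> Vec<f32> {
    codes.iter().map(|&q| f32::from(q) * scale).collect()
}
```

Each reconstructed value differs from its input by at most half a quantization step, that is `scale / 2`.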

- ruvector-temporal-tensor-wasm crate for wasm32 targets

- 33 passing unit tests covering all modules

Compression targets: 4x (hot/8-bit), 4.57x (warm/7-bit),
6.4x (warm/5-bit), 10.67x (cold/3-bit) vs f32 baseline.
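
The targets above follow from dividing the 32 bits of an f32 by the per-value bit width. The sketch below shows that arithmetic; the function name is hypothetical, and the figure ignores per-group scales and segment headers, which reduce the ratio achieved in practice.

```rust
/// Ideal compression ratio versus an f32 baseline for a given bit width.
/// Ignores scale and header overhead.
fn ideal_ratio_vs_f32(bits: u32) -> f64 {
    32.0 / f64::from(bits)
}
```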

https://claude.ai/code/session_01U63xtGd5Q8mUevyY7nUSfJ
2026-02-06 00:28:21 +00:00