ruvector/docs/adr/ADR-090-implementation-checklist.md
Reuven 6c1a674a6c docs(adr): add governance improvements to ADR-090/091, create ADR-092, add implementation checklists
ADR-090 (Ultra-Low-Bit QAT):
- Changed status to "Accepted (Staged Implementation)"
- Added decision statement choosing LoRA-QAT as first path
- Added staged implementation phases (4 phases, explicit gates)
- Added validation plan defining "better" (MSE, spectral, cosine, outlier retention)
- Added reasoning preservation metrics (PPL, GSM8K, HumanEval, tool use, long context)
- Added system invariants (INV-1 through INV-8)
- Added acceptance gates (G1-G6) with rollback triggers
- Restructured success criteria into correctness/performance/quality/rollout

ADR-091 (INT8 CNN Quantization):
- Changed status to "Accepted"
- Added decision statement with acceptance benchmark
- Added system invariants (INV-1 through INV-8)
- Added operator coverage table (11 operators)
- Added graph rewrite passes section (4 passes)
- Added deployment policy matrix
- Added acceptance gates (7 gates) with rollback conditions

ADR-092 (MoE Memory-Aware Routing):
- Split from ADR-090 as routing affects scheduling/cache, not representation
- Added decision statement with acceptance benchmark (≥70% cache hit rate)
- Added system invariants (INV-1 through INV-6)
- Added acceptance gates (G1-G5) with rollback conditions
- Added domain analysis with bounded context

Implementation Checklists:
- ADR-090: 6 phases, ~28 files, 16 new + 12 extended
- ADR-091: 6 phases, acceptance gate verification commands

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-03-12 10:47:00 -04:00


ADR-090 Implementation Checklist: Ultra-Low-Bit QAT & Pi-Quantization

ADR: ADR-090-ultra-low-bit-qat-pi-quantization-ddd.md
Status: Ready for Implementation (Staged)
Target Crate: ruvllm


Phase 1: PiQ3 PTQ (Weeks 1-3)

1.1 Pi-Quantization Core

  • File: crates/ruvllm/src/quantize/pi_quant.rs
    • PiQuantizer struct (bits, k, alpha per-channel)
    • quantize_scalar() method with pi/k step size
    • quantize_block() method for packed storage
    • Pi3BitBlock packed format (3 bytes → 8 values)
    • Pi2BitBlock packed format (1 byte → 4 values)
    • Unit tests for quantize/dequantize round-trip
    • Invariant Check: INV-2 (scale positivity), INV-3 (step size constraint)
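
The items above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: the step size is assumed to be `alpha * pi / k`, codes are assumed to be signed two's-complement values, and `pack_3bit`, `unpack_3bit` and `dequantize_scalar` are hypothetical helper names. Only `PiQuantizer`, `quantize_scalar()` and the 3-bytes-per-8-values layout come from the checklist.

```rust
use std::f32::consts::PI;

/// Hypothetical per-channel pi-quantizer. Field names follow the checklist
/// (bits, k, alpha); their exact semantics are an assumption of this sketch.
#[derive(Debug, Clone, Copy)]
pub struct PiQuantizer {
    pub bits: u32,  // bit width, e.g. 3 for PiQ3
    pub k: u32,     // step divisor
    pub alpha: f32, // per-channel scale, expected to be > 0 (INV-2)
}

impl PiQuantizer {
    /// Step size, assumed here to be alpha * pi / k.
    pub fn step(&self) -> f32 {
        self.alpha * PI / self.k as f32
    }

    /// Round to the nearest step and clamp to the signed range
    /// [-2^(bits-1), 2^(bits-1) - 1].
    pub fn quantize_scalar(&self, w: f32) -> i8 {
        let half = 1i32 << (self.bits - 1);
        let q = (w / self.step()).round() as i32;
        q.clamp(-half, half - 1) as i8
    }

    /// Map a code back to an f32 weight.
    pub fn dequantize_scalar(&self, q: i8) -> f32 {
        q as f32 * self.step()
    }
}

/// Pack eight 3-bit codes into 3 bytes (value i occupies bits 3i..3i+3).
pub fn pack_3bit(codes: &[i8; 8]) -> [u8; 3] {
    let mut bits: u32 = 0;
    for (i, &c) in codes.iter().enumerate() {
        bits |= ((c as u8 & 0b111) as u32) << (3 * i);
    }
    [bits as u8, (bits >> 8) as u8, (bits >> 16) as u8]
}

/// Inverse of `pack_3bit`, sign-extending each 3-bit field.
pub fn unpack_3bit(bytes: &[u8; 3]) -> [i8; 8] {
    let bits = (bytes[0] as u32) | ((bytes[1] as u32) << 8) | ((bytes[2] as u32) << 16);
    let mut out = [0i8; 8];
    for (i, slot) in out.iter_mut().enumerate() {
        let raw = ((bits >> (3 * i)) & 0b111) as i8;
        // Shift the 3-bit sign into the i8 sign bit, then shift back.
        *slot = (raw << 5) >> 5;
    }
    out
}
```

A round-trip unit test would then quantize a block, pack it, unpack it, and compare the codes and the dequantized values against the scalar path.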

1.2 Pi-Quant SIMD Kernels

  • File: crates/ruvllm/src/quantize/pi_quant_simd.rs
    • pi_dequantize_neon() - ARM NEON kernel
    • pi_dequantize_avx2() - x86_64 AVX2 kernel
    • Scalar reference implementation
    • Kernel equivalence tests (≤1 ULP, INV-8)
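
One way to express the "≤1 ULP" equivalence check is to compare the scalar reference output with a SIMD kernel's output on an ordered integer view of the float bits. The sketch below is an assumption about how such a test could look; `ulp_distance` and `kernels_equivalent` are hypothetical names, and the exact definition of INV-8 lives in the ADR.

```rust
/// Map an f32 onto an integer line where adjacent representable floats
/// differ by exactly 1 and +0.0 / -0.0 coincide.
fn ordered_bits(x: f32) -> i64 {
    let b = x.to_bits() as i32;
    if b < 0 {
        // Negative floats sort in reverse bit order; mirror them below zero.
        (i32::MIN as i64) - (b as i64)
    } else {
        b as i64
    }
}

/// Distance between two finite floats in units in the last place.
pub fn ulp_distance(a: f32, b: f32) -> u64 {
    (ordered_bits(a) - ordered_bits(b)).unsigned_abs()
}

/// True if both outputs have the same length, contain no NaN, and every
/// element pair is within `max_ulp` of each other.
pub fn kernels_equivalent(reference: &[f32], candidate: &[f32], max_ulp: u64) -> bool {
    reference.len() == candidate.len()
        && reference.iter().zip(candidate).all(|(&r, &c)| {
            !r.is_nan() && !c.is_nan() && ulp_distance(r, c) <= max_ulp
        })
}
```

In a kernel test, `reference` would come from the scalar implementation and `candidate` from `pi_dequantize_neon()` or `pi_dequantize_avx2()`, with `max_ulp` set to 1.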

1.3 TargetFormat Extension

  • File: crates/ruvllm/src/quantize/ruvltra_quant.rs
    • Add PiQ3 variant to TargetFormat enum
    • Add PiQ2 variant to TargetFormat enum
    • Implement bits_per_weight() for new variants
    • Update quantize_tensor() dispatch

1.4 GGUF Type Registration

  • File: crates/ruvllm/src/gguf/quantization.rs
    • Register PiQ3 = 40 type ID (INV-7)
    • Register PiQ2 = 41 type ID
    • Implement serialization/deserialization
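
As a rough illustration of the type registration, the sketch below models only the two new IDs. The enum name `PiGgufType` and its methods are hypothetical; the real `gguf/quantization.rs` presumably has a larger enum. The block sizes are taken from section 1.1, and per-block scale storage is deliberately left out because the checklist does not specify it.

```rust
/// Hypothetical stand-in for the GGUF tensor-type enum. Only the IDs 40 and
/// 41 come from the checklist.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum PiGgufType {
    PiQ3 = 40,
    PiQ2 = 41,
}

impl PiGgufType {
    /// Deserialize from a raw type ID; unknown IDs yield `None`.
    pub fn from_id(id: u32) -> Option<Self> {
        match id {
            40 => Some(PiGgufType::PiQ3),
            41 => Some(PiGgufType::PiQ2),
            _ => None,
        }
    }

    /// Serialize the type ID as a little-endian u32.
    pub fn to_le_bytes(self) -> [u8; 4] {
        (self as u32).to_le_bytes()
    }

    /// (bytes per packed block, values per packed block), per section 1.1.
    pub fn block_layout(self) -> (usize, usize) {
        match self {
            PiGgufType::PiQ3 => (3, 8),
            PiGgufType::PiQ2 => (1, 4),
        }
    }

    /// Bytes needed for `n_values` packed codes, rounding up to whole blocks.
    /// Scale metadata is not included (assumption of this sketch).
    pub fn packed_len(self, n_values: usize) -> usize {
        let (bytes, values) = self.block_layout();
        (n_values + values - 1) / values * bytes
    }
}
```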

1.5 Phase 1 Validation

  • Metrics collection: MSE, spectral distortion, cosine similarity, outlier retention
  • Gate G1: PiQ3 beats uniform Q3 on ≥2/4 quality metrics
  • Benchmark: benches/pi_quant_bench.rs created
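
Gate G1 reduces to counting how many of the four metrics favour PiQ3. The following is a hedged sketch of that comparison: `QualityMetrics`, `metrics_won` and `gate_g1` are hypothetical names, and it assumes lower is better for MSE and spectral distortion (in dB) while higher is better for cosine similarity and outlier retention.

```rust
/// Quality metrics for one quantization format (hypothetical container).
pub struct QualityMetrics {
    pub mse: f64,               // lower is better
    pub spectral_db: f64,       // lower is better (assumed)
    pub cosine: f64,            // higher is better
    pub outlier_retention: f64, // higher is better
}

/// Mean squared error between original and dequantized weights.
/// Both slices must be non-empty and of equal length.
pub fn mse(a: &[f32], b: &[f32]) -> f64 {
    assert_eq!(a.len(), b.len());
    let sum: f64 = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| {
            let d = f64::from(x) - f64::from(y);
            d * d
        })
        .sum();
    sum / a.len() as f64
}

/// Cosine similarity between two weight vectors.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f64 {
    assert_eq!(a.len(), b.len());
    let dot: f64 = a.iter().zip(b).map(|(&x, &y)| f64::from(x) * f64::from(y)).sum();
    let na: f64 = a.iter().map(|&x| f64::from(x).powi(2)).sum::<f64>().sqrt();
    let nb: f64 = b.iter().map(|&y| f64::from(y).powi(2)).sum::<f64>().sqrt();
    dot / (na * nb)
}

/// Number of metrics on which `candidate` strictly beats `baseline`.
pub fn metrics_won(candidate: &QualityMetrics, baseline: &QualityMetrics) -> usize {
    [
        candidate.mse < baseline.mse,
        candidate.spectral_db < baseline.spectral_db,
        candidate.cosine > baseline.cosine,
        candidate.outlier_retention > baseline.outlier_retention,
    ]
    .iter()
    .filter(|&&won| won)
    .count()
}

/// Gate G1: the candidate must win on at least 2 of the 4 metrics.
pub fn gate_g1(candidate: &QualityMetrics, baseline: &QualityMetrics) -> bool {
    metrics_won(candidate, baseline) >= 2
}
```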

Phase 2: PiQ3 + LoRA-QAT (Weeks 4-7)

2.1 Straight-Through Estimator

  • File: crates/ruvllm/src/qat/ste.rs
    • SteVariant enum (Standard, Clipped, LearnedStepSize, Ewgs)
    • backward() method for each variant
    • Gradient correctness tests vs PyTorch reference (INV-1)
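
For orientation, the four variants could behave as in the sketch below. The enum payloads (`min`, `max`, `delta`) are assumptions, since the checklist only names the variants. The clipped rule zeroes gradients outside the quantization range, and the EWGS rule scales the gradient by `1 + delta * sign(g) * (x - x_q)` as in the published method. For `LearnedStepSize`, only the gradient with respect to the input is shown here.

```rust
/// Straight-through estimator variants. Payload fields are assumptions of
/// this sketch; the checklist lists only the variant names.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SteVariant {
    /// Pass the upstream gradient through unchanged.
    Standard,
    /// Zero the gradient where the input fell outside [min, max].
    Clipped { min: f32, max: f32 },
    /// Same input gradient as `Clipped`; the step-size gradient is separate.
    LearnedStepSize { min: f32, max: f32 },
    /// Element-wise gradient scaling by the quantization error.
    Ewgs { delta: f32 },
}

impl SteVariant {
    /// Gradient with respect to the pre-quantization input `x`, given the
    /// quantized value `x_q` and the upstream gradient `grad_out`.
    pub fn backward(&self, x: f32, x_q: f32, grad_out: f32) -> f32 {
        match *self {
            SteVariant::Standard => grad_out,
            SteVariant::Clipped { min, max } | SteVariant::LearnedStepSize { min, max } => {
                if x >= min && x <= max {
                    grad_out
                } else {
                    0.0
                }
            }
            SteVariant::Ewgs { delta } => {
                grad_out * (1.0 + delta * grad_out.signum() * (x - x_q))
            }
        }
    }
}
```

A reference comparison against PyTorch would feed the same `(x, x_q, grad_out)` triples through both implementations and compare the resulting gradients.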

2.2 Differentiable Quantization

  • File: crates/ruvllm/src/qat/differentiable_quant.rs
    • DifferentiableQuantizer trait
    • PiQuantDifferentiable impl
    • Forward/backward pass with STE
    • Scale gradient computation for LSQ variant
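
A minimal sketch of the forward/backward pair with an LSQ-style scale gradient is shown below. The trait methods, the struct fields, and the step definition `scale * pi / k` are assumptions; only the trait and impl names come from the checklist. The LSQ gradient-scaling factor from the original paper is omitted for brevity.

```rust
use std::f32::consts::PI;

/// Hypothetical trait shape; the checklist names the trait but not its methods.
pub trait DifferentiableQuantizer {
    /// Fake-quantize `x`: quantize, then dequantize back to f32.
    fn forward(&self, x: f32) -> f32;
    /// Return (gradient w.r.t. x, gradient w.r.t. scale) for upstream `grad_out`.
    fn backward(&self, x: f32, grad_out: f32) -> (f32, f32);
}

pub struct PiQuantDifferentiable {
    pub scale: f32, // learnable scale, expected to stay > 0
    pub k: u32,     // step divisor
    pub q_min: i32, // smallest code, e.g. -4 for 3 bits
    pub q_max: i32, // largest code, e.g. 3 for 3 bits
}

impl PiQuantDifferentiable {
    /// Step size, assumed to be scale * pi / k.
    fn step(&self) -> f32 {
        self.scale * PI / self.k as f32
    }
}

impl DifferentiableQuantizer for PiQuantDifferentiable {
    fn forward(&self, x: f32) -> f32 {
        let q = (x / self.step())
            .round()
            .clamp(self.q_min as f32, self.q_max as f32);
        q * self.step()
    }

    fn backward(&self, x: f32, grad_out: f32) -> (f32, f32) {
        let v = x / self.step();
        let (lo, hi) = (self.q_min as f32, self.q_max as f32);
        // STE for x; LSQ rule for the step: round(v) - v inside the range,
        // the saturated code outside it.
        let (dx, dstep) = if v < lo {
            (0.0, lo)
        } else if v > hi {
            (0.0, hi)
        } else {
            (1.0, v.round() - v)
        };
        // Chain rule from step to scale: d(step)/d(scale) = pi / k.
        let dstep_dscale = PI / self.k as f32;
        (grad_out * dx, grad_out * dstep * dstep_dscale)
    }
}
```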

2.3 Calibration Pipeline

  • File: crates/ruvllm/src/qat/calibration.rs
    • CalibrationEngine aggregate root
    • Mixed-domain calibration (tool use + reasoning)
    • Per-layer scale/zero-point initialization
    • Calibration artifact serialization (INV-5)
    • Integration with training/tool_dataset.rs
    • Integration with training/claude_dataset.rs

2.4 Distillation Loss

  • File: crates/ruvllm/src/qat/distillation.rs
    • DistillationLoss struct
    • L_task component (task loss)
    • L_KD component (KL divergence from teacher)
    • L_reasoning component (CoT fidelity)
    • Composite loss with configurable weights
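
The composite loss can be read as a weighted sum of the three components. The sketch below assumes the field names (`w_task`, `w_kd`, `w_reasoning`, `temperature`), takes L_task and L_reasoning as precomputed scalars, and computes L_KD as KL(teacher || student) over temperature-softened logits with the conventional T² factor. None of these details are confirmed by the checklist.

```rust
/// Hypothetical composite distillation loss with configurable weights.
pub struct DistillationLoss {
    pub w_task: f32,
    pub w_kd: f32,
    pub w_reasoning: f32,
    pub temperature: f32,
}

/// Numerically stable log-softmax of `logits / temperature`.
fn log_softmax(logits: &[f32], temperature: f32) -> Vec<f32> {
    let scaled: Vec<f32> = logits.iter().map(|&z| z / temperature).collect();
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let log_sum = scaled.iter().map(|&z| (z - max).exp()).sum::<f32>().ln();
    scaled.iter().map(|&z| z - max - log_sum).collect()
}

/// KL(teacher || student) over softened distributions.
pub fn kl_divergence(teacher: &[f32], student: &[f32], temperature: f32) -> f32 {
    assert_eq!(teacher.len(), student.len());
    let lt = log_softmax(teacher, temperature);
    let ls = log_softmax(student, temperature);
    lt.iter().zip(&ls).map(|(&t, &s)| t.exp() * (t - s)).sum()
}

impl DistillationLoss {
    /// Weighted sum of the task, distillation, and reasoning components.
    pub fn total(
        &self,
        l_task: f32,
        teacher_logits: &[f32],
        student_logits: &[f32],
        l_reasoning: f32,
    ) -> f32 {
        let t = self.temperature;
        // The T^2 factor keeps gradient magnitudes comparable across temperatures.
        let l_kd = t * t * kl_divergence(teacher_logits, student_logits, t);
        self.w_task * l_task + self.w_kd * l_kd + self.w_reasoning * l_reasoning
    }
}
```

When the student matches the teacher exactly, the KD term vanishes and the total reduces to the weighted task and reasoning losses.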

2.5 Reasoning Loss

  • File: crates/ruvllm/src/qat/reasoning_loss.rs
    • Chain-of-thought fidelity loss
    • Step-wise reasoning preservation
    • Integration with evaluation harness

2.6 LoRA-QAT Integration

  • File: crates/ruvllm/src/qat/lora_qat.rs
    • LoraQatTrainer struct
    • Quantization-aware LoRA forward pass
    • Memory-efficient gradient checkpointing
    • Integration with lora/micro_lora.rs
    • Integration with lora/training.rs
  • File: crates/ruvllm/src/lora/micro_lora.rs (extend)
    • Add AdapterMode::Qat variant
    • Support quantized base + FP32 adapter

2.7 QAT Training Loop

  • File: crates/ruvllm/src/qat/training_loop.rs
    • QatTrainer orchestrator
    • run() method: calibrate → train → export
    • Epoch metrics: loss, PPL, reasoning score
    • Domain event emission: QatEpochComplete

2.8 QAT Config

  • File: crates/ruvllm/src/qat/config.rs
    • QatConfig struct (bits, STE variant, loss weights, epochs)
    • QuantGranularity enum (PerTensor, PerChannel, PerToken)
    • Serialization for config persistence

2.9 Phase 2 Validation

  • Reasoning metrics: PPL delta, GSM8K delta, HumanEval delta, tool use delta, long context
  • Gate G2: All 5 reasoning metrics within acceptable delta
  • Memory check: LoRA-QAT ≤2 GB for 0.5B model

Phase 3: PiQ2 + Incoherence (Weeks 8-10)

3.1 Hadamard Transform

  • File: crates/ruvllm/src/quantize/hadamard.rs
    • HadamardTransform struct
    • forward_inplace() - O(n log n) Walsh-Hadamard
    • inverse_inplace() - inverse transform
    • Random sign flip support
    • Property test: H × H^T = n × I (INV-4)
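
The in-place transform and the property behind INV-4 can be illustrated with an unnormalized fast Walsh-Hadamard butterfly. This is a sketch using free functions with hypothetical names (`fwht_inplace`, `fwht_inverse_inplace`) rather than the `HadamardTransform` struct, and it leaves out the random sign flips. Because H is symmetric and H × H^T = n × I, applying the forward transform twice multiplies the input by n, so the inverse is the forward pass followed by a division by n.

```rust
/// Unnormalized fast Walsh-Hadamard transform, O(n log n), in place.
/// Panics unless the length is a power of two.
pub fn fwht_inplace(data: &mut [f32]) {
    let n = data.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let mut h = 1;
    while h < n {
        // Each block of 2h elements is split into halves and combined
        // with a butterfly: (a, b) -> (a + b, a - b).
        for block in data.chunks_mut(2 * h) {
            let (lo, hi) = block.split_at_mut(h);
            for (a, b) in lo.iter_mut().zip(hi.iter_mut()) {
                let (x, y) = (*a, *b);
                *a = x + y;
                *b = x - y;
            }
        }
        h *= 2;
    }
}

/// Inverse transform: forward pass followed by scaling with 1/n.
pub fn fwht_inverse_inplace(data: &mut [f32]) {
    fwht_inplace(data);
    let n = data.len() as f32;
    for v in data.iter_mut() {
        *v /= n;
    }
}
```

A property test would generate random power-of-two-length vectors and check that forward followed by inverse reproduces the input within a small tolerance.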

3.2 Incoherence Processing

  • File: crates/ruvllm/src/quantize/incoherence.rs
    • IncoherenceTransform aggregate root
    • Apply Hadamard before quantization
    • Store transform metadata
    • Domain event: IncoherenceApplied

3.3 QuIP-Enhanced Quantization

  • File: crates/ruvllm/src/quantize/quip.rs
    • Q2_QuIP variant in TargetFormat
    • Combine incoherence + 2-bit K-quant
    • Metadata for inverse transform

3.4 PiQ2 Implementation

  • File: crates/ruvllm/src/quantize/pi_quant.rs (extend)
    • quantize_2bit() method
    • 2-bit packing (reuse bitnet/ternary_tensor.rs pattern)
    • Integration with incoherence pipeline
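
The 2-bit layout (1 byte per 4 values) could be packed as in the following sketch. It assumes signed two's-complement codes in [-2, 1] with value i of each group stored in bits 2i..2i+2; the function names are hypothetical, and the pattern in bitnet/ternary_tensor.rs may differ.

```rust
/// Pack 2-bit signed codes (each in [-2, 1]) four to a byte.
/// A trailing partial group is zero-padded.
pub fn pack_2bit(codes: &[i8]) -> Vec<u8> {
    codes
        .chunks(4)
        .map(|group| {
            let mut byte = 0u8;
            for (i, &c) in group.iter().enumerate() {
                byte |= ((c as u8) & 0b11) << (2 * i);
            }
            byte
        })
        .collect()
}

/// Unpack the first `n` codes from `bytes`, sign-extending each 2-bit field.
pub fn unpack_2bit(bytes: &[u8], n: usize) -> Vec<i8> {
    (0..n)
        .map(|idx| {
            let raw = (bytes[idx / 4] >> (2 * (idx % 4))) & 0b11;
            // Move the 2-bit sign into the i8 sign bit, then shift back.
            ((raw << 6) as i8) >> 6
        })
        .collect()
}
```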

3.5 Phase 3 Validation

  • Gate G3: PiQ2 + incoherence achieves acceptable quality without full QAT
  • Benchmark: 2-bit throughput targets

Phase 4: WASM Integration (Week 11)

4.1 WASM SIMD Kernels

  • File: crates/ruvllm/src/quantize/pi_quant_wasm_simd.rs
    • pi_dequant_wasm_simd() using SIMD128
    • Reuse LUT pattern from bitnet/tl1_wasm.rs
    • In-browser kernel tests

4.2 WASM Bindings

  • File: crates/ruvllm-wasm/src/pi_quant_wasm.rs
    • PiQuantWasm wasm_bindgen struct
    • quantize() method
    • dequantize() method
    • computeMse() method
    • spectralDistortion() method
    • JSON serialization

4.3 WASM Benchmarks

  • File: crates/ruvllm-wasm/src/quant_bench_wasm.rs
    • QuantBenchWasm struct
    • runBench() method
    • compareFormats() method

4.4 WASM Feature Gating

  • File: crates/ruvllm-wasm/Cargo.toml
    • Add pi-quant feature
    • Add qat feature (depends on pi-quant)
    • Feature flag tests

Phase 5: Security & Observability (Week 12)

5.1 Weight Integrity

  • File: crates/ruvllm/src/quantize/security.rs
    • WeightIntegrity struct (hashes, config)
    • validate_quantized_model() function
    • SHA-256 checksum computation
    • GGUF security validation

5.2 Observability

  • File: crates/ruvllm/src/qat/metrics.rs
    • Per-epoch loss tracking
    • Quality metric export
    • Training duration metrics

Phase 6: Integration & Benchmarks (Weeks 13-14)

6.1 SONA Integration

  • File: crates/ruvllm/src/sona/integration.rs (extend)
    • Tier 2: Quantization scale adaptation
    • Quality signal for dynamic precision

6.2 Evaluation Harness Extension

  • File: crates/ruvllm/src/evaluation/real_harness.rs (extend)
    • Add quantized model evaluation
    • GSM8K, HumanEval, WikiText-2 benchmarks
    • Tool use evaluation

6.3 Criterion Benchmarks

  • File: crates/ruvllm/benches/pi_quant_bench.rs
    • bench_pi_quantize (target: >1 GB/s)
    • bench_pi_dequantize_simd (target: >10 GB/s NEON, >2 GB/s WASM)
    • bench_hadamard_transform (target: <50 μs for 4096-dim)
    • bench_qat_forward_backward (target: <500 ms/step)

6.4 Integration Tests

  • File: crates/ruvllm/tests/qat_integration.rs
    • Full QAT pipeline test (calibrate → train → export)
    • Quantized model inference test
    • WASM export test

Acceptance Gates Verification

| Gate | Phase | Test Command |
|------|-------|--------------|
| G1 | 1→2 | cargo test -p ruvllm gate_piq3_quality |
| G2 | 2→3 | cargo test -p ruvllm gate_lora_qat_convergence |
| G3 | 3→4 | cargo test -p ruvllm gate_piq2_viability |
| G4 | Any | cargo bench -p ruvllm -- --baseline main |
| G5 | Pre-merge | cargo clippy -p ruvllm -- -D clippy::undocumented_unsafe_blocks |
| G6 | Pre-release | cd crates/ruvllm-wasm && wasm-pack build --target web |

Quality Metrics Collection

Phase 1 Metrics (PiQ3 PTQ)

# Run quality comparison
cargo test -p ruvllm --release pi_quant_quality -- --nocapture

# Expected output:
# MSE (PiQ3 vs Q3): 0.042 vs 0.051 (✓ PiQ3 better)
# Spectral (PiQ3 vs Q3): -18.2 dB vs -17.5 dB (✓ PiQ3 better)
# Cosine (PiQ3 vs Q3): 0.9981 vs 0.9975 (✓ PiQ3 better)
# Outlier Retention: 87% vs 68% (✓ PiQ3 better)

Phase 2 Metrics (LoRA-QAT)

# Run reasoning evaluation
cargo run -p ruvllm --release -- evaluate \
  --model quantized-0.5b-piq3-qat.gguf \
  --benchmarks wikitext2,gsm8k,humaneval,tool_use,long_context

# Expected output:
# WikiText-2 PPL: 13.2 (+7% from FP16 12.3) ✓
# GSM8K: 40% (-5pt from 45%) ✓
# HumanEval: 25% (-3pt from 28%) ✓
# Tool Use: 89% (-3pt from 92%) ✓
# Long Context 8K: 88% (-4pt from 92%) ✓

Rollback Triggers

| Trigger | Detection | Response |
|---------|-----------|----------|
| PiQ3 ≤ Uniform | G1 fails (0-1 metrics better) | Investigate step size; if fails, de-scope to research |
| LoRA-QAT OOM | Training crashes | Reduce rank 16→8→4; if OOM persists, defer to GPU cluster |
| Reasoning collapse | >25 point GSM8K drop | Increase λ_KD; if fails, revert to PTQ-only |
| SIMD divergence | >1 ULP vs scalar | Fix kernel; block merge |
| Benchmark regression | >5% slower than baseline | Bisect and fix; block merge |

File Summary

New Files (19 files)

crates/ruvllm/src/quantize/
  pi_quant.rs              # Pi-constant quantization core
  pi_quant_simd.rs         # NEON/AVX2 kernels
  pi_quant_wasm_simd.rs    # WASM SIMD128 kernels
  incoherence.rs           # Hadamard rotation transforms
  hadamard.rs              # Fast Walsh-Hadamard O(n log n)
  quip.rs                  # QuIP-enhanced 2-bit
  security.rs              # Weight integrity validation

crates/ruvllm/src/qat/
  mod.rs                   # Public API
  config.rs                # QatConfig, QuantGranularity
  ste.rs                   # Straight-through estimator
  differentiable_quant.rs  # DifferentiableQuantizer trait
  calibration.rs           # Mixed-domain calibration
  distillation.rs          # Teacher-student loss
  reasoning_loss.rs        # Chain-of-thought fidelity
  training_loop.rs         # Main QAT orchestrator
  lora_qat.rs              # LoRA-QAT lightweight variant

crates/ruvllm-wasm/src/
  pi_quant_wasm.rs         # Pi-quantization WASM bindings
  quant_bench_wasm.rs      # In-browser benchmarks

crates/ruvllm/benches/
  pi_quant_bench.rs        # Criterion benchmarks

Extended Files (9 files)

crates/ruvllm/src/quantize/ruvltra_quant.rs  # TargetFormat enum
crates/ruvllm/src/gguf/quantization.rs       # GGUF type IDs
crates/ruvllm/src/training/real_trainer.rs   # QatMode
crates/ruvllm/src/lora/micro_lora.rs         # AdapterMode::Qat
crates/ruvllm/src/lora/training.rs           # LoRA-QAT gradients
crates/ruvllm/src/sona/integration.rs        # Tier 2 adaptation
crates/ruvllm/src/evaluation/real_harness.rs # Quantized eval
crates/ruvllm-wasm/src/bindings.rs           # WASM exports
crates/ruvllm-wasm/Cargo.toml                # Feature flags

Definition of Done

  • All checkboxes above completed
  • Gates G1-G6 passing
  • All 5 reasoning metrics within acceptable delta
  • Benchmarks meet performance targets
  • Documentation in crates/ruvllm/docs/qat/
  • CHANGELOG entry added
  • PR reviewed and approved
  • Merged to main