# ADR-090 Implementation Checklist: Ultra-Low-Bit QAT & Pi-Quantization

**ADR**: ADR-090-ultra-low-bit-qat-pi-quantization-ddd.md
**Status**: Ready for Implementation (Staged)
**Target Crate**: `ruvllm`

---

## Phase 1: PiQ3 PTQ (Weeks 1-3)

### 1.1 Pi-Quantization Core

- [ ] **File**: `crates/ruvllm/src/quantize/pi_quant.rs`
  - [ ] `PiQuantizer` struct (bits, k, alpha per-channel)
  - [ ] `quantize_scalar()` method with pi/k step size
  - [ ] `quantize_block()` method for packed storage
  - [ ] `Pi3BitBlock` packed format (3 bytes → 8 values)
  - [ ] `Pi2BitBlock` packed format (1 byte → 4 values)
  - [ ] Unit tests for quantize/dequantize round-trip
  - [ ] **Invariant Check**: INV-2 (scale positivity), INV-3 (step size constraint)
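
The items above could be prototyped roughly as follows. This is a minimal sketch under stated assumptions, not the ADR's definitive design: the step size is taken to be `alpha * pi / k`, codes are assumed to be signed (`-4..=3` for 3 bits), and the helper names `step`, `dequantize_scalar`, `pack3`, and `unpack3` are hypothetical. A single `alpha` stands in for the per-channel scale.

```rust
use std::f32::consts::PI;

/// Hypothetical single-channel quantizer; step = alpha * pi / k (assumed formula).
#[derive(Debug, Clone, Copy)]
pub struct PiQuantizer {
    pub bits: u8,   // bit width (3 for PiQ3)
    pub k: u32,     // divisor of pi in the step size
    pub alpha: f32, // scale for this channel; expected to be > 0 (cf. INV-2)
}

impl PiQuantizer {
    pub fn step(&self) -> f32 {
        self.alpha * PI / self.k as f32
    }

    /// Round to the nearest level, then clamp to the signed code range.
    pub fn quantize_scalar(&self, w: f32) -> i8 {
        let half = 1i32 << (self.bits - 1);
        let q = (w / self.step()).round() as i32;
        q.clamp(-half, half - 1) as i8
    }

    pub fn dequantize_scalar(&self, q: i8) -> f32 {
        q as f32 * self.step()
    }
}

/// Pack eight signed 3-bit codes into three bytes (code i occupies bits 3i..3i+3).
pub fn pack3(codes: &[i8; 8]) -> [u8; 3] {
    let mut bits: u32 = 0;
    for (i, &c) in codes.iter().enumerate() {
        bits |= ((c as u8 & 0b111) as u32) << (3 * i);
    }
    [bits as u8, (bits >> 8) as u8, (bits >> 16) as u8]
}

/// Inverse of `pack3`, sign-extending each 3-bit field back to i8.
pub fn unpack3(bytes: &[u8; 3]) -> [i8; 8] {
    let bits = bytes[0] as u32 | (bytes[1] as u32) << 8 | (bytes[2] as u32) << 16;
    let mut out = [0i8; 8];
    for (i, o) in out.iter_mut().enumerate() {
        let raw = ((bits >> (3 * i)) & 0b111) as u8;
        // Move the field to the top of the byte, then arithmetic-shift back down.
        *o = ((raw << 5) as i8) >> 5;
    }
    out
}
```

A round-trip unit test would then assert that `unpack3(&pack3(&codes)) == codes` for every code in range, and that `dequantize_scalar(quantize_scalar(w))` stays within half a step of `w` for unclamped inputs.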

### 1.2 Pi-Quant SIMD Kernels

- [ ] **File**: `crates/ruvllm/src/quantize/pi_quant_simd.rs`
  - [ ] `pi_dequantize_neon()` - ARM NEON kernel
  - [ ] `pi_dequantize_avx2()` - x86_64 AVX2 kernel
  - [ ] Scalar reference implementation
  - [ ] Kernel equivalence tests (≤1 ULP, INV-8)
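
One way the ≤1 ULP equivalence check might be expressed is by comparing the integer ordering of the float bit patterns. The function names below (`ordered_bits`, `ulp_distance`, `kernels_equivalent`) are hypothetical, and treating any NaN as a mismatch is an assumption rather than something INV-8 is known to require.

```rust
/// Map an f32 bit pattern onto an integer line that preserves numeric order.
fn ordered_bits(x: f32) -> i64 {
    let b = x.to_bits() as i32;
    // Negative floats sort in reverse by raw bits, so mirror them around zero.
    let o = if b < 0 { i32::MIN - b } else { b };
    o as i64
}

/// Number of representable f32 values between `a` and `b` (0 if equal, +0.0 == -0.0).
pub fn ulp_distance(a: f32, b: f32) -> u64 {
    (ordered_bits(a) - ordered_bits(b)).unsigned_abs()
}

/// True when every element of `candidate` is within `max_ulp` of the scalar reference.
pub fn kernels_equivalent(reference: &[f32], candidate: &[f32], max_ulp: u64) -> bool {
    reference.len() == candidate.len()
        && reference.iter().zip(candidate).all(|(&r, &c)| {
            // Assumption: NaN in either output counts as a failure.
            !r.is_nan() && !c.is_nan() && ulp_distance(r, c) <= max_ulp
        })
}
```

A kernel test would run the scalar reference and the NEON or AVX2 kernel on the same packed input and assert `kernels_equivalent(&scalar_out, &simd_out, 1)`.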

### 1.3 TargetFormat Extension

- [ ] **File**: `crates/ruvllm/src/quantize/ruvltra_quant.rs`
  - [ ] Add `PiQ3` variant to `TargetFormat` enum
  - [ ] Add `PiQ2` variant to `TargetFormat` enum
  - [ ] Implement `bits_per_weight()` for new variants
  - [ ] Update `quantize_tensor()` dispatch

### 1.4 GGUF Type Registration

- [ ] **File**: `crates/ruvllm/src/gguf/quantization.rs`
  - [ ] Register `PiQ3 = 40` type ID (INV-7)
  - [ ] Register `PiQ2 = 41` type ID
  - [ ] Implement serialization/deserialization
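
As a rough illustration of the type-ID registration, the two IDs above can be modelled as a `#[repr(u32)]` enum with a checked conversion. The enum name `PiGgufType` and the helpers `from_id`, `bits_per_weight`, and `packed_len` are hypothetical; the real `quantization.rs` presumably already has its own type enum that these variants would join.

```rust
/// Hypothetical stand-in for the GGUF quantization type enum.
#[repr(u32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PiGgufType {
    PiQ3 = 40,
    PiQ2 = 41,
}

impl PiGgufType {
    /// Deserialize from the on-disk type ID; unknown IDs are rejected.
    pub fn from_id(id: u32) -> Option<Self> {
        match id {
            40 => Some(Self::PiQ3),
            41 => Some(Self::PiQ2),
            _ => None,
        }
    }

    /// Nominal payload bits per weight, excluding any per-block scale overhead.
    pub fn bits_per_weight(self) -> usize {
        match self {
            Self::PiQ3 => 3,
            Self::PiQ2 => 2,
        }
    }

    /// Bytes needed for `n` packed weights, rounding up to a whole byte.
    pub fn packed_len(self, n: usize) -> usize {
        (n * self.bits_per_weight() + 7) / 8
    }
}
```

Serialization is then `ty as u32`, and the packed sizes agree with the block formats in section 1.1: 8 PiQ3 values take 3 bytes and 4 PiQ2 values take 1 byte.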

### 1.5 Phase 1 Validation

- [ ] **Metrics collection**: MSE, spectral distortion, cosine similarity, outlier retention
- [ ] **Gate G1**: PiQ3 beats uniform Q3 on ≥2/4 quality metrics
- [ ] **Benchmark**: `benches/pi_quant_bench.rs` created
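
Two of the four metrics and the G1 decision rule can be sketched as below. The `QualityMetrics` struct and `gate_g1` function are hypothetical names, and the direction of each comparison (lower is better for MSE and spectral distortion, higher is better for cosine similarity and outlier retention) is an assumption inferred from the sample output later in this checklist.

```rust
/// Mean squared error between original and dequantized weights.
pub fn mse(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let sum: f32 = a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum();
    sum / a.len() as f32
}

/// Cosine similarity between two weight vectors.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Hypothetical container for the four Phase 1 metrics.
#[derive(Debug, Clone, Copy)]
pub struct QualityMetrics {
    pub mse: f32,
    pub spectral_db: f32,
    pub cosine: f32,
    pub outlier_retention: f32,
}

/// Gate G1: PiQ3 must strictly beat uniform Q3 on at least 2 of the 4 metrics.
pub fn gate_g1(piq3: &QualityMetrics, uniform: &QualityMetrics) -> bool {
    let wins = [
        piq3.mse < uniform.mse,                             // lower is better
        piq3.spectral_db < uniform.spectral_db,             // lower is better
        piq3.cosine > uniform.cosine,                       // higher is better
        piq3.outlier_retention > uniform.outlier_retention, // higher is better
    ];
    wins.iter().filter(|&&w| w).count() >= 2
}
```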

---

## Phase 2: PiQ3 + LoRA-QAT (Weeks 4-7)

### 2.1 Straight-Through Estimator

- [ ] **File**: `crates/ruvllm/src/qat/ste.rs`
  - [ ] `SteVariant` enum (Standard, Clipped, LearnedStepSize, Ewgs)
  - [ ] `backward()` method for each variant
  - [ ] Gradient correctness tests vs PyTorch reference (INV-1)
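
A possible shape for the per-variant `backward()` is sketched below. The enum payloads (`clip`, `delta`) are assumptions added for illustration; the checklist only names the variants. The EWGS arm uses the element-wise gradient scaling rule `g * (1 + delta * sign(g) * (w - w_q))`, and the `LearnedStepSize` arm is assumed to pass weight gradients like the clipped STE, with the step-size gradient handled elsewhere.

```rust
/// Hypothetical STE variants; payload fields are illustrative assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SteVariant {
    Standard,
    Clipped { clip: f32 },
    LearnedStepSize { clip: f32 },
    Ewgs { delta: f32 },
}

impl SteVariant {
    /// Gradient with respect to the latent weight `w`, given the upstream
    /// gradient `grad_out` taken with respect to the quantized value `w_q`.
    pub fn backward(self, w: f32, w_q: f32, grad_out: f32) -> f32 {
        match self {
            // Identity: treat round() as if its derivative were 1 everywhere.
            SteVariant::Standard => grad_out,
            // Pass the gradient only where the weight was not clamped.
            SteVariant::Clipped { clip } | SteVariant::LearnedStepSize { clip } => {
                if w.abs() <= clip {
                    grad_out
                } else {
                    0.0
                }
            }
            // Scale the gradient by the signed distance to the quantized value.
            SteVariant::Ewgs { delta } => {
                grad_out * (1.0 + delta * grad_out.signum() * (w - w_q))
            }
        }
    }
}
```

Gradient correctness tests would feed the same `(w, w_q, grad_out)` triples to this function and to a reference implementation and compare the results within a tolerance.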

### 2.2 Differentiable Quantization

- [ ] **File**: `crates/ruvllm/src/qat/differentiable_quant.rs`
  - [ ] `DifferentiableQuantizer` trait
  - [ ] `PiQuantDifferentiable` impl
  - [ ] Forward/backward pass with STE
  - [ ] Scale gradient computation for LSQ variant
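
For the LSQ item, the scale gradient has a simple closed form. The sketch below assumes the usual learned-step-size formulation with integer clamp bounds `-q_n` and `q_p`; the function names are hypothetical and the per-layer gradient scaling factor that LSQ typically applies is omitted.

```rust
/// Fake-quantize: w_q = clamp(round(w / s), -q_n, q_p) * s.
pub fn fake_quant(w: f32, s: f32, q_n: f32, q_p: f32) -> f32 {
    (w / s).round().clamp(-q_n, q_p) * s
}

/// d(w_q)/d(s) under the straight-through estimator.
/// Inside the clamp range the derivative is round(w/s) - w/s;
/// outside it the output is a constant multiple of s.
pub fn lsq_scale_grad(w: f32, s: f32, q_n: f32, q_p: f32) -> f32 {
    let v = w / s;
    if v <= -q_n {
        -q_n
    } else if v >= q_p {
        q_p
    } else {
        v.round() - v
    }
}
```

For a signed 3-bit code the bounds would be `q_n = 4.0` and `q_p = 3.0`; the upstream gradient is multiplied by `lsq_scale_grad` and summed over the channel to update the scale.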

### 2.3 Calibration Pipeline

- [ ] **File**: `crates/ruvllm/src/qat/calibration.rs`
  - [ ] `CalibrationEngine` aggregate root
  - [ ] Mixed-domain calibration (tool use + reasoning)
  - [ ] Per-layer scale/zero-point initialization
  - [ ] Calibration artifact serialization (INV-5)
  - [ ] Integration with `training/tool_dataset.rs`
  - [ ] Integration with `training/claude_dataset.rs`

### 2.4 Distillation Loss

- [ ] **File**: `crates/ruvllm/src/qat/distillation.rs`
  - [ ] `DistillationLoss` struct
  - [ ] `L_task` component (task loss)
  - [ ] `L_KD` component (KL divergence from teacher)
  - [ ] `L_reasoning` component (CoT fidelity)
  - [ ] Composite loss with configurable weights
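
The composite loss can be read as a weighted sum of the three components, with `L_KD` computed as the KL divergence from the teacher's softmax distribution to the student's. The sketch below is illustrative only: the field names, the `temperature` parameter, and the `combine` method are assumptions, not the ADR's API.

```rust
/// Hypothetical loss configuration; field names are illustrative.
#[derive(Debug, Clone, Copy)]
pub struct DistillationLoss {
    pub w_task: f32,
    pub w_kd: f32,
    pub w_reasoning: f32,
    pub temperature: f32, // softmax temperature for L_KD (assumed parameter)
}

/// Numerically stable softmax with temperature `t`.
fn softmax(logits: &[f32], t: f32) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&z| ((z - max) / t).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

impl DistillationLoss {
    /// KL(teacher || student) over one position's logits.
    pub fn l_kd(&self, teacher_logits: &[f32], student_logits: &[f32]) -> f32 {
        let p = softmax(teacher_logits, self.temperature);
        let q = softmax(student_logits, self.temperature);
        p.iter()
            .zip(&q)
            .map(|(&pi, &qi)| if pi > 0.0 { pi * (pi / qi).ln() } else { 0.0 })
            .sum()
    }

    /// Weighted sum of the task, distillation, and reasoning components.
    pub fn combine(&self, l_task: f32, l_kd: f32, l_reasoning: f32) -> f32 {
        self.w_task * l_task + self.w_kd * l_kd + self.w_reasoning * l_reasoning
    }
}
```

Raising `w_kd` corresponds to the "Increase λ_KD" response listed under Rollback Triggers.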

### 2.5 Reasoning Loss

- [ ] **File**: `crates/ruvllm/src/qat/reasoning_loss.rs`
  - [ ] Chain-of-thought fidelity loss
  - [ ] Step-wise reasoning preservation
  - [ ] Integration with evaluation harness

### 2.6 LoRA-QAT Integration

- [ ] **File**: `crates/ruvllm/src/qat/lora_qat.rs`
  - [ ] `LoraQatTrainer` struct
  - [ ] Quantization-aware LoRA forward pass
  - [ ] Memory-efficient gradient checkpointing
  - [ ] Integration with `lora/micro_lora.rs`
  - [ ] Integration with `lora/training.rs`

- [ ] **File**: `crates/ruvllm/src/lora/micro_lora.rs` (extend)
  - [ ] Add `AdapterMode::Qat` variant
  - [ ] Support quantized base + FP32 adapter

### 2.7 QAT Training Loop

- [ ] **File**: `crates/ruvllm/src/qat/training_loop.rs`
  - [ ] `QatTrainer` orchestrator
  - [ ] `run()` method: calibrate → train → export
  - [ ] Epoch metrics: loss, PPL, reasoning score
  - [ ] Domain event emission: `QatEpochComplete`

### 2.8 QAT Config

- [ ] **File**: `crates/ruvllm/src/qat/config.rs`
  - [ ] `QatConfig` struct (bits, STE variant, loss weights, epochs)
  - [ ] `QuantGranularity` enum (PerTensor, PerChannel, PerToken)
  - [ ] Serialization for config persistence

### 2.9 Phase 2 Validation

- [ ] **Reasoning metrics**: PPL delta, GSM8K delta, HumanEval delta, tool use delta, long context
- [ ] **Gate G2**: All 5 reasoning metrics within acceptable delta
- [ ] **Memory check**: LoRA-QAT ≤2 GB for 0.5B model

---

## Phase 3: PiQ2 + Incoherence (Weeks 8-10)

### 3.1 Hadamard Transform

- [ ] **File**: `crates/ruvllm/src/quantize/hadamard.rs`
  - [ ] `HadamardTransform` struct
  - [ ] `forward_inplace()` - O(n log n) Walsh-Hadamard
  - [ ] `inverse_inplace()` - inverse transform
  - [ ] Random sign flip support
  - [ ] Property test: H × H^T = n × I (INV-4)
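
The in-place transform and its inverse might look like the following free functions, shown here without the `HadamardTransform` struct or the random sign flips. It is a sketch of the standard unnormalized fast Walsh-Hadamard butterfly; because H is symmetric and H × H^T = n × I, applying the forward transform twice and dividing by n recovers the input, which is exactly what the INV-4 property test can check.

```rust
/// Unnormalized fast Walsh-Hadamard transform, O(n log n), in place.
/// Panics unless the length is a power of two.
pub fn forward_inplace(data: &mut [f32]) {
    let n = data.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let mut h = 1;
    while h < n {
        // Butterfly over blocks of size 2h: (a, b) -> (a + b, a - b).
        for block in (0..n).step_by(2 * h) {
            for i in block..block + h {
                let (a, b) = (data[i], data[i + h]);
                data[i] = a + b;
                data[i + h] = a - b;
            }
        }
        h *= 2;
    }
}

/// Inverse transform: apply H again, then scale by 1/n.
pub fn inverse_inplace(data: &mut [f32]) {
    forward_inplace(data);
    let n = data.len() as f32;
    for x in data.iter_mut() {
        *x /= n;
    }
}
```

For example, transforming `[1.0, 2.0, 3.0, 4.0]` yields `[10.0, -2.0, -4.0, 0.0]`, and `inverse_inplace` maps that back to the original vector.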

### 3.2 Incoherence Processing

- [ ] **File**: `crates/ruvllm/src/quantize/incoherence.rs`
  - [ ] `IncoherenceTransform` aggregate root
  - [ ] Apply Hadamard before quantization
  - [ ] Store transform metadata
  - [ ] Domain event: `IncoherenceApplied`

### 3.3 QuIP-Enhanced Quantization

- [ ] **File**: `crates/ruvllm/src/quantize/quip.rs`
  - [ ] `Q2_QuIP` variant in TargetFormat
  - [ ] Combine incoherence + 2-bit K-quant
  - [ ] Metadata for inverse transform

### 3.4 PiQ2 Implementation

- [ ] **File**: `crates/ruvllm/src/quantize/pi_quant.rs` (extend)
  - [ ] `quantize_2bit()` method
  - [ ] 2-bit packing (reuse `bitnet/ternary_tensor.rs` pattern)
  - [ ] Integration with incoherence pipeline
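
The 2-bit packing step (1 byte → 4 values) can be sketched as below. This does not reproduce the actual `bitnet/ternary_tensor.rs` layout, which is not shown in this checklist; the bit order (code i in bits 2i..2i+2) and the unsigned `0..=3` code range are assumptions, and `pack2`/`unpack2` are hypothetical names.

```rust
/// Pack unsigned 2-bit codes (0..=3), four per byte; a short final chunk is zero-padded.
pub fn pack2(codes: &[u8]) -> Vec<u8> {
    codes
        .chunks(4)
        .map(|chunk| {
            chunk
                .iter()
                .enumerate()
                .fold(0u8, |byte, (i, &c)| byte | ((c & 0b11) << (2 * i)))
        })
        .collect()
}

/// Unpack the first `n` 2-bit codes from `bytes`.
pub fn unpack2(bytes: &[u8], n: usize) -> Vec<u8> {
    (0..n)
        .map(|i| (bytes[i / 4] >> (2 * (i % 4))) & 0b11)
        .collect()
}
```

A signed PiQ2 code in `-2..=1` could be stored by adding an offset of 2 before packing and subtracting it after unpacking.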

### 3.5 Phase 3 Validation

- [ ] **Gate G3**: PiQ2 + incoherence achieves acceptable quality without full QAT
- [ ] **Benchmark**: 2-bit throughput targets

---

## Phase 4: WASM Integration (Week 11)

### 4.1 WASM SIMD Kernels

- [ ] **File**: `crates/ruvllm/src/quantize/pi_quant_wasm_simd.rs`
  - [ ] `pi_dequant_wasm_simd()` using SIMD128
  - [ ] Reuse LUT pattern from `bitnet/tl1_wasm.rs`
  - [ ] In-browser kernel tests

### 4.2 WASM Bindings

- [ ] **File**: `crates/ruvllm-wasm/src/pi_quant_wasm.rs`
  - [ ] `PiQuantWasm` wasm_bindgen struct
  - [ ] `quantize()` method
  - [ ] `dequantize()` method
  - [ ] `computeMse()` method
  - [ ] `spectralDistortion()` method
  - [ ] JSON serialization

### 4.3 WASM Benchmarks

- [ ] **File**: `crates/ruvllm-wasm/src/quant_bench_wasm.rs`
  - [ ] `QuantBenchWasm` struct
  - [ ] `runBench()` method
  - [ ] `compareFormats()` method

### 4.4 WASM Feature Gating

- [ ] **File**: `crates/ruvllm-wasm/Cargo.toml`
  - [ ] Add `pi-quant` feature
  - [ ] Add `qat` feature (depends on `pi-quant`)
  - [ ] Feature flag tests

---

## Phase 5: Security & Observability (Week 12)

### 5.1 Weight Integrity

- [ ] **File**: `crates/ruvllm/src/quantize/security.rs`
  - [ ] `WeightIntegrity` struct (hashes, config)
  - [ ] `validate_quantized_model()` function
  - [ ] SHA-256 checksum computation
  - [ ] GGUF security validation

### 5.2 Observability

- [ ] **File**: `crates/ruvllm/src/qat/metrics.rs`
  - [ ] Per-epoch loss tracking
  - [ ] Quality metric export
  - [ ] Training duration metrics

---

## Phase 6: Integration & Benchmarks (Weeks 13-14)

### 6.1 SONA Integration

- [ ] **File**: `crates/ruvllm/src/sona/integration.rs` (extend)
  - [ ] Tier 2: Quantization scale adaptation
  - [ ] Quality signal for dynamic precision

### 6.2 Evaluation Harness Extension

- [ ] **File**: `crates/ruvllm/src/evaluation/real_harness.rs` (extend)
  - [ ] Add quantized model evaluation
  - [ ] GSM8K, HumanEval, WikiText-2 benchmarks
  - [ ] Tool use evaluation

### 6.3 Criterion Benchmarks

- [ ] **File**: `crates/ruvllm/benches/pi_quant_bench.rs`
  - [ ] `bench_pi_quantize` (target: >1 GB/s)
  - [ ] `bench_pi_dequantize_simd` (target: >10 GB/s NEON, >2 GB/s WASM)
  - [ ] `bench_hadamard_transform` (target: <50 μs for 4096-dim)
  - [ ] `bench_qat_forward_backward` (target: <500 ms/step)

### 6.4 Integration Tests

- [ ] **File**: `crates/ruvllm/tests/qat_integration.rs`
  - [ ] Full QAT pipeline test (calibrate → train → export)
  - [ ] Quantized model inference test
  - [ ] WASM export test

---

## Acceptance Gates Verification

| Gate | Phase | Test Command |
|------|-------|--------------|
| G1 | 1→2 | `cargo test -p ruvllm gate_piq3_quality` |
| G2 | 2→3 | `cargo test -p ruvllm gate_lora_qat_convergence` |
| G3 | 3→4 | `cargo test -p ruvllm gate_piq2_viability` |
| G4 | Any | `cargo bench -p ruvllm -- --baseline main` |
| G5 | Pre-merge | `cargo clippy -p ruvllm -- -D clippy::undocumented_unsafe_blocks` |
| G6 | Pre-release | `cd crates/ruvllm-wasm && wasm-pack build --target web` |

---

## Quality Metrics Collection

### Phase 1 Metrics (PiQ3 PTQ)

```bash
# Run quality comparison
cargo test -p ruvllm --release pi_quant_quality -- --nocapture

# Expected output:
# MSE (PiQ3 vs Q3): 0.042 vs 0.051 (✓ PiQ3 better)
# Spectral (PiQ3 vs Q3): -18.2 dB vs -17.5 dB (✓ PiQ3 better)
# Cosine (PiQ3 vs Q3): 0.9981 vs 0.9975 (✓ PiQ3 better)
# Outlier Retention: 87% vs 68% (✓ PiQ3 better)
```

### Phase 2 Metrics (LoRA-QAT)

```bash
# Run reasoning evaluation
cargo run -p ruvllm --release -- evaluate \
  --model quantized-0.5b-piq3-qat.gguf \
  --benchmarks wikitext2,gsm8k,humaneval,tool_use,long_context

# Expected output:
# WikiText-2 PPL: 13.2 (+7% from FP16 12.3) ✓
# GSM8K: 40% (-5pt from 45%) ✓
# HumanEval: 25% (-3pt from 28%) ✓
# Tool Use: 89% (-3pt from 92%) ✓
# Long Context 8K: 88% (-4pt from 92%) ✓
```

---

## Rollback Triggers

| Trigger | Detection | Response |
|---------|-----------|----------|
| PiQ3 ≤ Uniform | G1 fails (0-1 metrics better) | Investigate step size; if G1 still fails, de-scope to research |
| LoRA-QAT OOM | Training crashes | Reduce rank 16→8→4; if OOM persists, defer to GPU cluster |
| Reasoning collapse | >25 point GSM8K drop | Increase λ_KD; if the drop persists, revert to PTQ-only |
| SIMD divergence | >1 ULP vs scalar | Fix kernel; block merge |
| Benchmark regression | >5% slower than baseline | Bisect and fix; block merge |

---

## File Summary

### New Files (19 files)

```
crates/ruvllm/src/quantize/
  pi_quant.rs              # Pi-constant quantization core
  pi_quant_simd.rs         # NEON/AVX2 kernels
  pi_quant_wasm_simd.rs    # WASM SIMD128 kernels
  incoherence.rs           # Hadamard rotation transforms
  hadamard.rs              # Fast Walsh-Hadamard O(n log n)
  quip.rs                  # QuIP-enhanced 2-bit
  security.rs              # Weight integrity validation

crates/ruvllm/src/qat/
  mod.rs                   # Public API
  config.rs                # QatConfig, QuantGranularity
  ste.rs                   # Straight-through estimator
  differentiable_quant.rs  # DifferentiableQuantizer trait
  calibration.rs           # Mixed-domain calibration
  distillation.rs          # Teacher-student loss
  reasoning_loss.rs        # Chain-of-thought fidelity
  training_loop.rs         # Main QAT orchestrator
  lora_qat.rs              # LoRA-QAT lightweight variant

crates/ruvllm-wasm/src/
  pi_quant_wasm.rs         # Pi-quantization WASM bindings
  quant_bench_wasm.rs      # In-browser benchmarks

crates/ruvllm/benches/
  pi_quant_bench.rs        # Criterion benchmarks
```

### Extended Files (9 files)

```
crates/ruvllm/src/quantize/ruvltra_quant.rs  # TargetFormat enum
crates/ruvllm/src/gguf/quantization.rs       # GGUF type IDs
crates/ruvllm/src/training/real_trainer.rs   # QatMode
crates/ruvllm/src/lora/micro_lora.rs         # AdapterMode::Qat
crates/ruvllm/src/lora/training.rs           # LoRA-QAT gradients
crates/ruvllm/src/sona/integration.rs        # Tier 2 adaptation
crates/ruvllm/src/evaluation/real_harness.rs # Quantized eval
crates/ruvllm-wasm/src/bindings.rs           # WASM exports
crates/ruvllm-wasm/Cargo.toml                # Feature flags
```

---

## Definition of Done

- [ ] All checkboxes above completed
- [ ] Gates G1-G6 passing
- [ ] All 5 reasoning metrics within acceptable delta
- [ ] Benchmarks meet performance targets
- [ ] Documentation in `crates/ruvllm/docs/qat/`
- [ ] CHANGELOG entry added
- [ ] PR reviewed and approved
- [ ] Merged to main