ruvector/docs/adr/ADR-090-implementation-checklist.md
Reuven f091d45575 docs(adr): add governance improvements to ADR-090/091, create ADR-092, add implementation checklists
ADR-090 (Ultra-Low-Bit QAT):
- Changed status to "Accepted (Staged Implementation)"
- Added decision statement choosing LoRA-QAT as first path
- Added staged implementation phases (4 phases, explicit gates)
- Added validation plan defining "better" (MSE, spectral, cosine, outlier retention)
- Added reasoning preservation metrics (PPL, GSM8K, HumanEval, tool use, long context)
- Added system invariants (INV-1 through INV-8)
- Added acceptance gates (G1-G6) with rollback triggers
- Restructured success criteria into correctness/performance/quality/rollout

ADR-091 (INT8 CNN Quantization):
- Changed status to "Accepted"
- Added decision statement with acceptance benchmark
- Added system invariants (INV-1 through INV-8)
- Added operator coverage table (11 operators)
- Added graph rewrite passes section (4 passes)
- Added deployment policy matrix
- Added acceptance gates (7 gates) with rollback conditions

ADR-092 (MoE Memory-Aware Routing):
- Split from ADR-090 as routing affects scheduling/cache, not representation
- Added decision statement with acceptance benchmark (≥70% cache hit rate)
- Added system invariants (INV-1 through INV-6)
- Added acceptance gates (G1-G5) with rollback conditions
- Added domain analysis with bounded context

Implementation Checklists:
- ADR-090: 6 phases, ~28 files, 16 new + 12 extended
- ADR-091: 6 phases, acceptance gate verification commands

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-03-12 10:47:00 -04:00

# ADR-090 Implementation Checklist: Ultra-Low-Bit QAT & Pi-Quantization
**ADR**: ADR-090-ultra-low-bit-qat-pi-quantization-ddd.md
**Status**: Ready for Implementation (Staged)
**Target Crate**: `ruvllm`
---
## Phase 1: PiQ3 PTQ (Weeks 1-3)
### 1.1 Pi-Quantization Core
- [ ] **File**: `crates/ruvllm/src/quantize/pi_quant.rs`
- [ ] `PiQuantizer` struct (bits, k, alpha per-channel)
- [ ] `quantize_scalar()` method with pi/k step size
- [ ] `quantize_block()` method for packed storage
- [ ] `Pi3BitBlock` packed format (3 bytes → 8 values)
- [ ] `Pi2BitBlock` packed format (1 byte → 4 values)
- [ ] Unit tests for quantize/dequantize round-trip
- [ ] **Invariant Check**: INV-2 (scale positivity), INV-3 (step size constraint)
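
The items above could be sketched roughly as follows. This is a minimal illustration, not the crate's implementation: it assumes the step size is `alpha * pi / k`, that 3-bit codes are signed values in `[-4, 3]`, and that eight codes are packed little-endian into 3 bytes. The field layout and the helper names `pack3`/`unpack3` are hypothetical.

```rust
use std::f32::consts::PI;

/// Hypothetical 3-bit pi-quantizer with step size `alpha * pi / k`.
pub struct PiQuantizer {
    pub k: u32,     // divisor of pi (assumed nonzero)
    pub alpha: f32, // per-channel scale (assumed strictly positive and finite)
}

impl PiQuantizer {
    /// Rejects a zero divisor and non-positive or non-finite scales.
    pub fn new(k: u32, alpha: f32) -> Option<Self> {
        if k == 0 || !alpha.is_finite() || alpha <= 0.0 {
            return None;
        }
        Some(Self { k, alpha })
    }

    pub fn step(&self) -> f32 {
        self.alpha * PI / self.k as f32
    }

    /// Round to the nearest level, then clamp to the signed 3-bit range.
    pub fn quantize_scalar(&self, w: f32) -> i8 {
        (w / self.step()).round().clamp(-4.0, 3.0) as i8
    }

    pub fn dequantize_scalar(&self, q: i8) -> f32 {
        f32::from(q) * self.step()
    }
}

/// Pack eight signed 3-bit codes into 3 bytes (24 bits), lowest code first.
pub fn pack3(codes: &[i8; 8]) -> [u8; 3] {
    let mut bits: u32 = 0;
    for (i, &c) in codes.iter().enumerate() {
        // Keep only the low three bits of the two's-complement code.
        bits |= u32::from((c as u8) & 0b111) << (3 * i);
    }
    [bits as u8, (bits >> 8) as u8, (bits >> 16) as u8]
}

/// Inverse of `pack3`, sign-extending each 3-bit field.
pub fn unpack3(bytes: &[u8; 3]) -> [i8; 8] {
    let bits = u32::from(bytes[0]) | (u32::from(bytes[1]) << 8) | (u32::from(bytes[2]) << 16);
    let mut out = [0i8; 8];
    for (i, o) in out.iter_mut().enumerate() {
        let raw = ((bits >> (3 * i)) & 0b111) as i8; // 0..=7
        *o = if raw >= 4 { raw - 8 } else { raw };
    }
    out
}
```

For in-range weights, the round-trip error of this sketch is bounded by half a step, which is the property a quantize/dequantize unit test would typically assert.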
### 1.2 Pi-Quant SIMD Kernels
- [ ] **File**: `crates/ruvllm/src/quantize/pi_quant_simd.rs`
- [ ] `pi_dequantize_neon()` - ARM NEON kernel
- [ ] `pi_dequantize_avx2()` - x86_64 AVX2 kernel
- [ ] Scalar reference implementation
- [ ] Kernel equivalence tests (≤1 ULP, INV-8)
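
One way to express the ≤1 ULP equivalence check is to compare the bit patterns of the scalar and SIMD outputs on an ordered integer line. The sketch below is an assumption about how such a test helper might look; `ulp_distance` and `kernels_agree` are hypothetical names, not existing crate functions.

```rust
/// Map an f32 onto a signed integer line whose order matches numeric order.
/// Both +0.0 and -0.0 map to 0.
fn ordered(x: f32) -> i64 {
    let bits = x.to_bits();
    if (bits & 0x8000_0000) != 0 {
        -i64::from(bits & 0x7FFF_FFFF)
    } else {
        i64::from(bits)
    }
}

/// Distance in units in the last place; `None` if either input is NaN.
pub fn ulp_distance(a: f32, b: f32) -> Option<u64> {
    if a.is_nan() || b.is_nan() {
        return None;
    }
    Some((ordered(a) - ordered(b)).unsigned_abs())
}

/// True when every candidate output is within `max_ulp` of the reference.
pub fn kernels_agree(reference: &[f32], candidate: &[f32], max_ulp: u64) -> bool {
    reference.len() == candidate.len()
        && reference
            .iter()
            .zip(candidate)
            .all(|(&r, &c)| matches!(ulp_distance(r, c), Some(d) if d <= max_ulp))
}
```

A kernel test would then run the scalar reference and the NEON or AVX2 kernel on the same packed input and assert `kernels_agree(&scalar_out, &simd_out, 1)`.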
### 1.3 TargetFormat Extension
- [ ] **File**: `crates/ruvllm/src/quantize/ruvltra_quant.rs`
- [ ] Add `PiQ3` variant to `TargetFormat` enum
- [ ] Add `PiQ2` variant to `TargetFormat` enum
- [ ] Implement `bits_per_weight()` for new variants
- [ ] Update `quantize_tensor()` dispatch
### 1.4 GGUF Type Registration
- [ ] **File**: `crates/ruvllm/src/gguf/quantization.rs`
- [ ] Register `PiQ3 = 40` type ID (INV-7)
- [ ] Register `PiQ2 = 41` type ID
- [ ] Implement serialization/deserialization
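
As a rough sketch of the type registration, the two IDs can be tied to an enum with a lossless round-trip. The enum name `PiQuantType` and the `payload_bytes` helper are hypothetical; the byte counts follow the packed formats listed in 1.1 (3 bytes per 8 values, 1 byte per 4 values) and deliberately ignore any scale metadata.

```rust
/// Hypothetical stand-in for the new GGUF quantization types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PiQuantType {
    PiQ3,
    PiQ2,
}

impl PiQuantType {
    /// Type IDs as listed in this checklist.
    pub const fn type_id(self) -> u32 {
        match self {
            Self::PiQ3 => 40,
            Self::PiQ2 => 41,
        }
    }

    /// Inverse of `type_id`; unknown IDs are rejected rather than guessed.
    pub fn from_type_id(id: u32) -> Option<Self> {
        match id {
            40 => Some(Self::PiQ3),
            41 => Some(Self::PiQ2),
            _ => None,
        }
    }

    /// Packed payload size for `n` weights, excluding scales (assumption).
    pub const fn payload_bytes(self, n: usize) -> usize {
        match self {
            Self::PiQ3 => (n + 7) / 8 * 3, // 8 values per 3-byte block
            Self::PiQ2 => (n + 3) / 4,     // 4 values per byte
        }
    }
}
```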
### 1.5 Phase 1 Validation
- [ ] **Metrics collection**: MSE, spectral distortion, cosine similarity, outlier retention
- [ ] **Gate G1**: PiQ3 beats uniform Q3 on ≥2/4 quality metrics
- [ ] **Benchmark**: `benches/pi_quant_bench.rs` created
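
Gate G1 can be read as a simple count of per-metric wins. The sketch below assumes lower is better for MSE and spectral distortion (in dB) and higher is better for cosine similarity and outlier retention; the struct and function names are hypothetical, and only two of the four metrics are computed here.

```rust
/// Mean squared error between original and dequantized weights.
pub fn mse(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    assert!(!a.is_empty());
    let sum: f32 = a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum();
    sum / a.len() as f32
}

/// Cosine similarity; assumes neither vector is all zeros.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

/// Hypothetical container for the four Phase 1 quality metrics.
pub struct QualityMetrics {
    pub mse: f32,
    pub spectral_db: f32,
    pub cosine: f32,
    pub outlier_retention: f32,
}

/// Number of metrics on which `piq3` is strictly better than `uniform`.
pub fn metrics_won(piq3: &QualityMetrics, uniform: &QualityMetrics) -> usize {
    let wins = [
        piq3.mse < uniform.mse,
        piq3.spectral_db < uniform.spectral_db,
        piq3.cosine > uniform.cosine,
        piq3.outlier_retention > uniform.outlier_retention,
    ];
    wins.iter().filter(|w| **w).count()
}

/// Gate G1: PiQ3 must win on at least 2 of the 4 metrics.
pub fn gate_g1(piq3: &QualityMetrics, uniform: &QualityMetrics) -> bool {
    metrics_won(piq3, uniform) >= 2
}
```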
---
## Phase 2: PiQ3 + LoRA-QAT (Weeks 4-7)
### 2.1 Straight-Through Estimator
- [ ] **File**: `crates/ruvllm/src/qat/ste.rs`
- [ ] `SteVariant` enum (Standard, Clipped, LearnedStepSize, Ewgs)
- [ ] `backward()` method for each variant
- [ ] Gradient correctness tests vs PyTorch reference (INV-1)
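
For orientation, the two simplest variants might look like the sketch below. It covers only `Standard` and `Clipped`; the `lo`/`hi` fields, the `fake_quantize` helper, and the per-element signature are assumptions for illustration, and `LearnedStepSize` and `Ewgs` are omitted.

```rust
/// Forward pass used during QAT: quantize and immediately dequantize.
pub fn fake_quantize(x: f32, step: f32) -> f32 {
    (x / step).round() * step
}

/// Two of the four planned variants, sketched per element.
#[derive(Debug, Clone, Copy)]
pub enum SteVariant {
    /// Treat rounding as the identity: pass the gradient through unchanged.
    Standard,
    /// Pass the gradient only where the input lies inside the clip range.
    Clipped { lo: f32, hi: f32 },
}

impl SteVariant {
    /// Gradient with respect to the pre-quantization input `x`.
    pub fn backward(self, x: f32, grad_out: f32) -> f32 {
        match self {
            SteVariant::Standard => grad_out,
            SteVariant::Clipped { lo, hi } => {
                if x >= lo && x <= hi {
                    grad_out
                } else {
                    0.0
                }
            }
        }
    }
}
```

A reference test against PyTorch would compare these per-element gradients with the autograd output of an equivalent custom function on the same inputs.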
### 2.2 Differentiable Quantization
- [ ] **File**: `crates/ruvllm/src/qat/differentiable_quant.rs`
- [ ] `DifferentiableQuantizer` trait
- [ ] `PiQuantDifferentiable` impl
- [ ] Forward/backward pass with STE
- [ ] Scale gradient computation for LSQ variant
### 2.3 Calibration Pipeline
- [ ] **File**: `crates/ruvllm/src/qat/calibration.rs`
- [ ] `CalibrationEngine` aggregate root
- [ ] Mixed-domain calibration (tool use + reasoning)
- [ ] Per-layer scale/zero-point initialization
- [ ] Calibration artifact serialization (INV-5)
- [ ] Integration with `training/tool_dataset.rs`
- [ ] Integration with `training/claude_dataset.rs`
### 2.4 Distillation Loss
- [ ] **File**: `crates/ruvllm/src/qat/distillation.rs`
- [ ] `DistillationLoss` struct
- [ ] `L_task` component (task loss)
- [ ] `L_KD` component (KL divergence from teacher)
- [ ] `L_reasoning` component (CoT fidelity)
- [ ] Composite loss with configurable weights
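
The composite loss can be sketched as a weighted sum of the three components, with `L_KD` computed as KL(teacher || student) over softmax distributions. The weight field names and the absence of a temperature parameter are assumptions made to keep the example short.

```rust
/// Numerically stable softmax over one logit vector.
pub fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&z| (z - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// KL(teacher || student) for two probability vectors of equal length.
pub fn kl_divergence(teacher: &[f32], student: &[f32]) -> f32 {
    assert_eq!(teacher.len(), student.len());
    teacher
        .iter()
        .zip(student)
        .map(|(&p, &q)| {
            // Terms with p == 0 contribute nothing; guard q against log(0).
            if p > 0.0 {
                p * (p / q.max(1e-12)).ln()
            } else {
                0.0
            }
        })
        .sum()
}

/// Hypothetical weights for the three loss components.
pub struct DistillationLoss {
    pub w_task: f32,
    pub w_kd: f32,
    pub w_reasoning: f32,
}

impl DistillationLoss {
    /// Weighted sum of L_task, L_KD and L_reasoning.
    pub fn total(&self, l_task: f32, l_kd: f32, l_reasoning: f32) -> f32 {
        self.w_task * l_task + self.w_kd * l_kd + self.w_reasoning * l_reasoning
    }
}
```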
### 2.5 Reasoning Loss
- [ ] **File**: `crates/ruvllm/src/qat/reasoning_loss.rs`
- [ ] Chain-of-thought fidelity loss
- [ ] Step-wise reasoning preservation
- [ ] Integration with evaluation harness
### 2.6 LoRA-QAT Integration
- [ ] **File**: `crates/ruvllm/src/qat/lora_qat.rs`
- [ ] `LoraQatTrainer` struct
- [ ] Quantization-aware LoRA forward pass
- [ ] Memory-efficient gradient checkpointing
- [ ] Integration with `lora/micro_lora.rs`
- [ ] Integration with `lora/training.rs`
- [ ] **File**: `crates/ruvllm/src/lora/micro_lora.rs` (extend)
- [ ] Add `AdapterMode::Qat` variant
- [ ] Support quantized base + FP32 adapter
### 2.7 QAT Training Loop
- [ ] **File**: `crates/ruvllm/src/qat/training_loop.rs`
- [ ] `QatTrainer` orchestrator
- [ ] `run()` method: calibrate → train → export
- [ ] Epoch metrics: loss, PPL, reasoning score
- [ ] Domain event emission: `QatEpochComplete`
### 2.8 QAT Config
- [ ] **File**: `crates/ruvllm/src/qat/config.rs`
- [ ] `QatConfig` struct (bits, STE variant, loss weights, epochs)
- [ ] `QuantGranularity` enum (PerTensor, PerChannel, PerToken)
- [ ] Serialization for config persistence
### 2.9 Phase 2 Validation
- [ ] **Reasoning metrics**: PPL delta, GSM8K delta, HumanEval delta, tool use delta, long context
- [ ] **Gate G2**: All 5 reasoning metrics within acceptable delta
- [ ] **Memory check**: LoRA-QAT ≤2 GB for 0.5B model
---
## Phase 3: PiQ2 + Incoherence (Weeks 8-10)
### 3.1 Hadamard Transform
- [ ] **File**: `crates/ruvllm/src/quantize/hadamard.rs`
- [ ] `HadamardTransform` struct
- [ ] `forward_inplace()` - O(n log n) Walsh-Hadamard
- [ ] `inverse_inplace()` - inverse transform
- [ ] Random sign flip support
- [ ] Property test: H × H^T = n × I (INV-4)
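
A minimal in-place fast Walsh-Hadamard transform is sketched below, assuming a power-of-two length and the unnormalized convention, so that applying the transform twice returns `n` times the input (the H × H^T = n × I property). Function names are illustrative, and random sign flips are left out.

```rust
/// Unnormalized fast Walsh-Hadamard transform, O(n log n), in place.
/// Panics if the length is not a power of two.
pub fn fwht_inplace(x: &mut [f32]) {
    let n = x.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let mut h = 1;
    while h < n {
        // Butterfly: combine the two halves of each block of size 2h.
        for block in x.chunks_mut(2 * h) {
            let (lo, hi) = block.split_at_mut(h);
            for (u, v) in lo.iter_mut().zip(hi.iter_mut()) {
                let (sum, diff) = (*u + *v, *u - *v);
                *u = sum;
                *v = diff;
            }
        }
        h *= 2;
    }
}

/// Inverse transform: the forward transform followed by a 1/n scaling.
pub fn inverse_fwht_inplace(x: &mut [f32]) {
    fwht_inplace(x);
    let scale = 1.0 / x.len() as f32;
    for v in x.iter_mut() {
        *v *= scale;
    }
}
```

A property test can then generate random power-of-two-length vectors and check that `inverse_fwht_inplace(fwht_inplace(x))` reproduces `x` within a small tolerance.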
### 3.2 Incoherence Processing
- [ ] **File**: `crates/ruvllm/src/quantize/incoherence.rs`
- [ ] `IncoherenceTransform` aggregate root
- [ ] Apply Hadamard before quantization
- [ ] Store transform metadata
- [ ] Domain event: `IncoherenceApplied`
### 3.3 QuIP-Enhanced Quantization
- [ ] **File**: `crates/ruvllm/src/quantize/quip.rs`
- [ ] `Q2_QuIP` variant in TargetFormat
- [ ] Combine incoherence + 2-bit K-quant
- [ ] Metadata for inverse transform
### 3.4 PiQ2 Implementation
- [ ] **File**: `crates/ruvllm/src/quantize/pi_quant.rs` (extend)
- [ ] `quantize_2bit()` method
- [ ] 2-bit packing (reuse `bitnet/ternary_tensor.rs` pattern)
- [ ] Integration with incoherence pipeline
### 3.5 Phase 3 Validation
- [ ] **Gate G3**: PiQ2 + incoherence achieves acceptable quality without full QAT
- [ ] **Benchmark**: 2-bit throughput targets
---
## Phase 4: WASM Integration (Week 11)
### 4.1 WASM SIMD Kernels
- [ ] **File**: `crates/ruvllm/src/quantize/pi_quant_wasm_simd.rs`
- [ ] `pi_dequant_wasm_simd()` using SIMD128
- [ ] Reuse LUT pattern from `bitnet/tl1_wasm.rs`
- [ ] In-browser kernel tests
### 4.2 WASM Bindings
- [ ] **File**: `crates/ruvllm-wasm/src/pi_quant_wasm.rs`
- [ ] `PiQuantWasm` wasm_bindgen struct
- [ ] `quantize()` method
- [ ] `dequantize()` method
- [ ] `computeMse()` method
- [ ] `spectralDistortion()` method
- [ ] JSON serialization
### 4.3 WASM Benchmarks
- [ ] **File**: `crates/ruvllm-wasm/src/quant_bench_wasm.rs`
- [ ] `QuantBenchWasm` struct
- [ ] `runBench()` method
- [ ] `compareFormats()` method
### 4.4 WASM Feature Gating
- [ ] **File**: `crates/ruvllm-wasm/Cargo.toml`
- [ ] Add `pi-quant` feature
- [ ] Add `qat` feature (depends on `pi-quant`)
- [ ] Feature flag tests
---
## Phase 5: Security & Observability (Week 12)
### 5.1 Weight Integrity
- [ ] **File**: `crates/ruvllm/src/quantize/security.rs`
- [ ] `WeightIntegrity` struct (hashes, config)
- [ ] `validate_quantized_model()` function
- [ ] SHA-256 checksum computation
- [ ] GGUF security validation
### 5.2 Observability
- [ ] **File**: `crates/ruvllm/src/qat/metrics.rs`
- [ ] Per-epoch loss tracking
- [ ] Quality metric export
- [ ] Training duration metrics
---
## Phase 6: Integration & Benchmarks (Weeks 13-14)
### 6.1 SONA Integration
- [ ] **File**: `crates/ruvllm/src/sona/integration.rs` (extend)
- [ ] Tier 2: Quantization scale adaptation
- [ ] Quality signal for dynamic precision
### 6.2 Evaluation Harness Extension
- [ ] **File**: `crates/ruvllm/src/evaluation/real_harness.rs` (extend)
- [ ] Add quantized model evaluation
- [ ] GSM8K, HumanEval, WikiText-2 benchmarks
- [ ] Tool use evaluation
### 6.3 Criterion Benchmarks
- [ ] **File**: `crates/ruvllm/benches/pi_quant_bench.rs`
- [ ] `bench_pi_quantize` (target: >1 GB/s)
- [ ] `bench_pi_dequantize_simd` (target: >10 GB/s NEON, >2 GB/s WASM)
- [ ] `bench_hadamard_transform` (target: <50 μs for 4096-dim)
- [ ] `bench_qat_forward_backward` (target: <500 ms/step)
### 6.4 Integration Tests
- [ ] **File**: `crates/ruvllm/tests/qat_integration.rs`
- [ ] Full QAT pipeline test (calibrate → train → export)
- [ ] Quantized model inference test
- [ ] WASM export test
---
## Acceptance Gates Verification
| Gate | Phase | Test Command |
|------|-------|--------------|
| G1 | 1→2 | `cargo test -p ruvllm gate_piq3_quality` |
| G2 | 2→3 | `cargo test -p ruvllm gate_lora_qat_convergence` |
| G3 | 3→4 | `cargo test -p ruvllm gate_piq2_viability` |
| G4 | Any | `cargo bench -p ruvllm -- --baseline main` |
| G5 | Pre-merge | `cargo clippy -p ruvllm -- -D clippy::undocumented_unsafe_blocks` |
| G6 | Pre-release | `cd crates/ruvllm-wasm && wasm-pack build --target web` |
---
## Quality Metrics Collection
### Phase 1 Metrics (PiQ3 PTQ)
```bash
# Run quality comparison
cargo test -p ruvllm --release pi_quant_quality -- --nocapture
# Expected output:
# MSE (PiQ3 vs Q3): 0.042 vs 0.051 (✓ PiQ3 better)
# Spectral (PiQ3 vs Q3): -18.2 dB vs -17.5 dB (✓ PiQ3 better)
# Cosine (PiQ3 vs Q3): 0.9981 vs 0.9975 (✓ PiQ3 better)
# Outlier Retention: 87% vs 68% (✓ PiQ3 better)
```
### Phase 2 Metrics (LoRA-QAT)
```bash
# Run reasoning evaluation
cargo run -p ruvllm --release -- evaluate \
--model quantized-0.5b-piq3-qat.gguf \
--benchmarks wikitext2,gsm8k,humaneval,tool_use,long_context
# Expected output:
# WikiText-2 PPL: 13.2 (+7% from FP16 12.3) ✓
# GSM8K: 40% (-5pt from 45%) ✓
# HumanEval: 25% (-3pt from 28%) ✓
# Tool Use: 89% (-3pt from 92%) ✓
# Long Context 8K: 88% (-4pt from 92%) ✓
```
---
## Rollback Triggers
| Trigger | Detection | Response |
|---------|-----------|----------|
| PiQ3 no better than uniform Q3 | G1 fails (0-1 metrics better) | Investigate step size; if it still fails, de-scope to research |
| LoRA-QAT OOM | Training crashes | Reduce rank 16→8→4; if OOM persists, defer to GPU cluster |
| Reasoning collapse | >25 point GSM8K drop | Increase λ_KD; if that fails, revert to PTQ-only |
| SIMD divergence | >1 ULP vs scalar | Fix kernel; block merge |
| Benchmark regression | >5% slower than baseline | Bisect and fix; block merge |
---
## File Summary
### New Files (19 files)
```
crates/ruvllm/src/quantize/
  pi_quant.rs              # Pi-constant quantization core
  pi_quant_simd.rs         # NEON/AVX2 kernels
  pi_quant_wasm_simd.rs    # WASM SIMD128 kernels
  incoherence.rs           # Hadamard rotation transforms
  hadamard.rs              # Fast Walsh-Hadamard O(n log n)
  quip.rs                  # QuIP-enhanced 2-bit
  security.rs              # Weight integrity validation
crates/ruvllm/src/qat/
  mod.rs                   # Public API
  config.rs                # QatConfig, SteVariant
  ste.rs                   # Straight-through estimator
  differentiable_quant.rs  # DifferentiableQuantizer trait
  calibration.rs           # Mixed-domain calibration
  distillation.rs          # Teacher-student loss
  reasoning_loss.rs        # Chain-of-thought fidelity
  training_loop.rs         # Main QAT orchestrator
  lora_qat.rs              # LoRA-QAT lightweight variant
crates/ruvllm-wasm/src/
  pi_quant_wasm.rs         # Pi-quantization WASM bindings
  quant_bench_wasm.rs      # In-browser benchmarks
crates/ruvllm/benches/
  pi_quant_bench.rs        # Criterion benchmarks
```
### Extended Files (9 files)
```
crates/ruvllm/src/quantize/ruvltra_quant.rs # TargetFormat enum
crates/ruvllm/src/gguf/quantization.rs # GGUF type IDs
crates/ruvllm/src/training/real_trainer.rs # QatMode
crates/ruvllm/src/lora/micro_lora.rs # AdapterMode::Qat
crates/ruvllm/src/lora/training.rs # LoRA-QAT gradients
crates/ruvllm/src/sona/integration.rs # Tier 2 adaptation
crates/ruvllm/src/evaluation/real_harness.rs # Quantized eval
crates/ruvllm-wasm/src/bindings.rs # WASM exports
crates/ruvllm-wasm/Cargo.toml # Feature flags
```
---
## Definition of Done
- [ ] All checkboxes above completed
- [ ] Gates G1-G6 passing
- [ ] All 5 reasoning metrics within acceptable delta
- [ ] Benchmarks meet performance targets
- [ ] Documentation in `crates/ruvllm/docs/qat/`
- [ ] CHANGELOG entry added
- [ ] PR reviewed and approved
- [ ] Merged to main