mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-25 15:03:46 +00:00
ADR-090 (Ultra-Low-Bit QAT):
- Changed status to "Accepted (Staged Implementation)"
- Added decision statement choosing LoRA-QAT as first path
- Added staged implementation phases (4 phases, explicit gates)
- Added validation plan defining "better" (MSE, spectral, cosine, outlier retention)
- Added reasoning preservation metrics (PPL, GSM8K, HumanEval, tool use, long context)
- Added system invariants (INV-1 through INV-8)
- Added acceptance gates (G1-G6) with rollback triggers
- Restructured success criteria into correctness/performance/quality/rollout

ADR-091 (INT8 CNN Quantization):
- Changed status to "Accepted"
- Added decision statement with acceptance benchmark
- Added system invariants (INV-1 through INV-8)
- Added operator coverage table (11 operators)
- Added graph rewrite passes section (4 passes)
- Added deployment policy matrix
- Added acceptance gates (7 gates) with rollback conditions

ADR-092 (MoE Memory-Aware Routing):
- Split from ADR-090 as routing affects scheduling/cache, not representation
- Added decision statement with acceptance benchmark (≥70% cache hit rate)
- Added system invariants (INV-1 through INV-6)
- Added acceptance gates (G1-G5) with rollback conditions
- Added domain analysis with bounded context

Implementation Checklists:
- ADR-090: 6 phases, ~28 files, 16 new + 12 extended
- ADR-091: 6 phases, acceptance gate verification commands

Co-Authored-By: claude-flow <ruv@ruv.net>
ADR-090 Implementation Checklist: Ultra-Low-Bit QAT & Pi-Quantization
ADR: ADR-090-ultra-low-bit-qat-pi-quantization-ddd.md
Status: Ready for Implementation (Staged)
Target Crate: ruvllm
Phase 1: PiQ3 PTQ (Weeks 1-3)
1.1 Pi-Quantization Core
- File: `crates/ruvllm/src/quantize/pi_quant.rs`
  - `PiQuantizer` struct (bits, k, alpha per-channel)
  - `quantize_scalar()` method with pi/k step size
  - `quantize_block()` method for packed storage
  - `Pi3BitBlock` packed format (3 bytes → 8 values)
  - `Pi2BitBlock` packed format (1 byte → 4 values)
  - Unit tests for quantize/dequantize round-trip
- Invariant Check: INV-2 (scale positivity), INV-3 (step size constraint)
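The quantizer and 3-bit packed block listed in 1.1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the crate's implementation: the step formula `alpha * π / k`, the signed code range, the bit layout inside the three bytes, and the helper names `step`, `dequantize_scalar`, `pack3`, and `unpack3` are all hypothetical.

```rust
use std::f32::consts::PI;

/// Hypothetical quantizer: codes are signed `bits`-bit integers and the
/// step size is assumed to be `alpha * PI / k`.
pub struct PiQuantizer {
    pub bits: u32,
    pub k: u32,
    pub alpha: f32, // per-channel scale; must stay positive (INV-2)
}

impl PiQuantizer {
    /// Assumed step size: a per-channel multiple of pi/k.
    pub fn step(&self) -> f32 {
        self.alpha * PI / self.k as f32
    }

    /// Round to the nearest multiple of the step, then clamp to the signed range.
    pub fn quantize_scalar(&self, w: f32) -> i8 {
        let half = 1i32 << (self.bits - 1);
        let q = (w / self.step()).round() as i32;
        q.clamp(-half, half - 1) as i8
    }

    pub fn dequantize_scalar(&self, q: i8) -> f32 {
        q as f32 * self.step()
    }
}

/// Pack eight 3-bit codes in -4..=3 into 3 bytes (8 x 3 = 24 bits).
pub fn pack3(codes: &[i8; 8]) -> [u8; 3] {
    let mut bits: u32 = 0;
    for (i, &c) in codes.iter().enumerate() {
        // Offset the signed code into 0..=7 before packing.
        let u = ((c + 4) as u32) & 0x7;
        bits |= u << (3 * i);
    }
    [bits as u8, (bits >> 8) as u8, (bits >> 16) as u8]
}

/// Inverse of `pack3`.
pub fn unpack3(bytes: &[u8; 3]) -> [i8; 8] {
    let bits = (bytes[0] as u32) | ((bytes[1] as u32) << 8) | ((bytes[2] as u32) << 16);
    let mut out = [0i8; 8];
    for (i, o) in out.iter_mut().enumerate() {
        *o = ((bits >> (3 * i)) & 0x7) as i8 - 4;
    }
    out
}
```

A round-trip unit test in this style would quantize a block, pack it, unpack it, and check that every reconstructed value lies within half a step of the original, provided the input is inside the representable range.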
1.2 Pi-Quant SIMD Kernels
- File: `crates/ruvllm/src/quantize/pi_quant_simd.rs`
  - `pi_dequantize_neon()` - ARM NEON kernel
  - `pi_dequantize_avx2()` - x86_64 AVX2 kernel
  - Scalar reference implementation
  - Kernel equivalence tests (≤1 ULP, INV-8)
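One way to express the ≤1 ULP equivalence check between a SIMD kernel and the scalar reference is sketched below. The function names `ulp_distance` and `kernels_agree` are hypothetical, and treating any NaN as a mismatch is an assumption, not something the checklist specifies.

```rust
/// Map an f32 onto an integer line where adjacent floats differ by exactly 1.
fn ordered(x: f32) -> i64 {
    let b = x.to_bits() as i32;
    // Negative floats have the sign bit set; mirror them so ordering is monotonic
    // and -0.0 lands on the same point as +0.0.
    (if b < 0 { i32::MIN - b } else { b }) as i64
}

/// Distance in units in the last place; `None` if either input is NaN.
pub fn ulp_distance(a: f32, b: f32) -> Option<u64> {
    if a.is_nan() || b.is_nan() {
        return None;
    }
    Some((ordered(a) - ordered(b)).unsigned_abs())
}

/// True when every candidate output is within `max_ulp` of the reference output.
pub fn kernels_agree(reference: &[f32], candidate: &[f32], max_ulp: u64) -> bool {
    reference.len() == candidate.len()
        && reference
            .iter()
            .zip(candidate)
            .all(|(&r, &c)| matches!(ulp_distance(r, c), Some(d) if d <= max_ulp))
}
```

An equivalence test would run the scalar reference and the NEON or AVX2 kernel on the same packed input and assert `kernels_agree(&scalar_out, &simd_out, 1)`.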
1.3 TargetFormat Extension
- File: `crates/ruvllm/src/quantize/ruvltra_quant.rs`
  - Add `PiQ3` variant to `TargetFormat` enum
  - Add `PiQ2` variant to `TargetFormat` enum
  - Implement `bits_per_weight()` for new variants
  - Update `quantize_tensor()` dispatch
1.4 GGUF Type Registration
- File: `crates/ruvllm/src/gguf/quantization.rs`
  - Register `PiQ3 = 40` type ID (INV-7)
  - Register `PiQ2 = 41` type ID
  - Implement serialization/deserialization
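The `TargetFormat` extension from 1.3 and the type ID registration from 1.4 might fit together along these lines. The sketch shows only the two new variants (the existing enum has others that are not reproduced here). Reporting nominal payload bits without per-block scale overhead is an assumption, and `gguf_type_id` / `from_gguf_type_id` are hypothetical helper names.

```rust
/// Only the two new variants are shown; the real enum is assumed to have more.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TargetFormat {
    PiQ3,
    PiQ2,
}

impl TargetFormat {
    /// Nominal payload bits per weight, ignoring any per-block scale overhead.
    pub fn bits_per_weight(self) -> f32 {
        match self {
            TargetFormat::PiQ3 => 3.0,
            TargetFormat::PiQ2 => 2.0,
        }
    }

    /// GGUF type IDs as listed in the checklist (40 and 41).
    pub fn gguf_type_id(self) -> u32 {
        match self {
            TargetFormat::PiQ3 => 40,
            TargetFormat::PiQ2 => 41,
        }
    }

    /// Reverse lookup used when reading a file; unknown IDs are rejected.
    pub fn from_gguf_type_id(id: u32) -> Option<Self> {
        match id {
            40 => Some(TargetFormat::PiQ3),
            41 => Some(TargetFormat::PiQ2),
            _ => None,
        }
    }
}
```

Keeping the forward and reverse mappings next to each other makes it easy to test that every registered ID round-trips.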
1.5 Phase 1 Validation
- Metrics collection: MSE, spectral distortion, cosine similarity, outlier retention
- Gate G1: PiQ3 beats uniform Q3 on ≥2/4 quality metrics
- Benchmark: `benches/pi_quant_bench.rs` created
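Two of the Phase 1 metrics and the G1 decision rule (PiQ3 must win on at least 2 of 4 metrics) can be sketched as below. The struct `QualityMetrics` and the function names are hypothetical. Treating a lower spectral distortion in dB as better follows the sample output in this checklist. Spectral distortion and outlier retention are taken as precomputed inputs because the checklist does not define them.

```rust
/// Mean squared error between original and dequantized weights.
pub fn mse(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>() / a.len() as f32
}

/// Cosine similarity between two weight vectors (undefined for zero vectors).
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

/// Hypothetical container for the four Phase 1 metrics.
pub struct QualityMetrics {
    pub mse: f32,               // lower is better
    pub spectral_db: f32,       // lower (more negative) is better
    pub cosine: f32,            // higher is better
    pub outlier_retention: f32, // higher is better
}

/// Gate G1: the candidate must beat the baseline on at least 2 of 4 metrics.
pub fn gate_g1(candidate: &QualityMetrics, baseline: &QualityMetrics) -> bool {
    let wins = [
        candidate.mse < baseline.mse,
        candidate.spectral_db < baseline.spectral_db,
        candidate.cosine > baseline.cosine,
        candidate.outlier_retention > baseline.outlier_retention,
    ]
    .iter()
    .filter(|&&won| won)
    .count();
    wins >= 2
}
```

Ties count as losses in this sketch, so a candidate that merely matches the baseline everywhere does not pass the gate.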
Phase 2: PiQ3 + LoRA-QAT (Weeks 4-7)
2.1 Straight-Through Estimator
- File: `crates/ruvllm/src/qat/ste.rs`
  - `SteVariant` enum (Standard, Clipped, LearnedStepSize, Ewgs)
  - `backward()` method for each variant
  - Gradient correctness tests vs PyTorch reference (INV-1)
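A straight-through estimator treats the rounding step as the identity during the backward pass. The sketch below covers only the `Standard` and `Clipped` variants. The clipping bounds stored on the variant, the `fake_quantize` helper, and the method signature are assumptions. `LearnedStepSize` and `Ewgs` are omitted because the checklist gives no detail on them.

```rust
/// Two of the four variants named in the checklist; payloads are assumed.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SteVariant {
    Standard,
    Clipped { lo: f32, hi: f32 },
}

/// Forward pass: quantize then dequantize, so the value stays in f32.
pub fn fake_quantize(w: f32, step: f32) -> f32 {
    (w / step).round() * step
}

impl SteVariant {
    /// Gradient with respect to the latent weight `w`, given the upstream gradient.
    pub fn backward(&self, w: f32, grad_out: f32) -> f32 {
        match *self {
            // Rounding is treated as the identity: pass the gradient through.
            SteVariant::Standard => grad_out,
            // Outside the clipping range the quantizer saturates, so block the gradient.
            SteVariant::Clipped { lo, hi } => {
                if w >= lo && w <= hi {
                    grad_out
                } else {
                    0.0
                }
            }
        }
    }
}
```

A gradient correctness test against a PyTorch reference would compare these values element-wise with the gradients an equivalent custom autograd function produces for the same inputs.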
2.2 Differentiable Quantization
- File: `crates/ruvllm/src/qat/differentiable_quant.rs`
  - `DifferentiableQuantizer` trait
  - `PiQuantDifferentiable` impl
  - Forward/backward pass with STE
  - Scale gradient computation for LSQ variant
2.3 Calibration Pipeline
- File: `crates/ruvllm/src/qat/calibration.rs`
  - `CalibrationEngine` aggregate root
  - Mixed-domain calibration (tool use + reasoning)
  - Per-layer scale/zero-point initialization
  - Calibration artifact serialization (INV-5)
  - Integration with `training/tool_dataset.rs`
  - Integration with `training/claude_dataset.rs`
2.4 Distillation Loss
- File: `crates/ruvllm/src/qat/distillation.rs`
  - `DistillationLoss` struct
  - `L_task` component (task loss)
  - `L_KD` component (KL divergence from teacher)
  - `L_reasoning` component (CoT fidelity)
  - Composite loss with configurable weights
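The composite loss combines the three components as a weighted sum, with `L_KD` computed as the KL divergence from the teacher distribution to the student distribution. The sketch below assumes a plain weighted sum and per-token logits. The field names `lambda_task`, `lambda_kd`, and `lambda_reasoning`, the `total` method, and the absence of a softmax temperature are assumptions.

```rust
/// Numerically stable softmax over one vector of logits.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// KL(teacher || student) for a single token distribution.
pub fn kl_divergence(teacher_logits: &[f32], student_logits: &[f32]) -> f32 {
    assert_eq!(teacher_logits.len(), student_logits.len());
    let p = softmax(teacher_logits);
    let q = softmax(student_logits);
    let mut kl = 0.0;
    for (&pi, &qi) in p.iter().zip(q.iter()) {
        // Terms with zero teacher probability contribute nothing.
        if pi > 0.0 {
            kl += pi * (pi / qi).ln();
        }
    }
    kl
}

/// Hypothetical weights for the three loss components.
pub struct DistillationLoss {
    pub lambda_task: f32,
    pub lambda_kd: f32,
    pub lambda_reasoning: f32,
}

impl DistillationLoss {
    /// Weighted sum of the task, distillation, and reasoning losses.
    pub fn total(&self, l_task: f32, l_kd: f32, l_reasoning: f32) -> f32 {
        self.lambda_task * l_task + self.lambda_kd * l_kd + self.lambda_reasoning * l_reasoning
    }
}
```

Raising `lambda_kd` pulls the student closer to the teacher's output distribution, which is the lever the rollback table names for recovering from a reasoning collapse.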
2.5 Reasoning Loss
- File: `crates/ruvllm/src/qat/reasoning_loss.rs`
  - Chain-of-thought fidelity loss
  - Step-wise reasoning preservation
  - Integration with evaluation harness
2.6 LoRA-QAT Integration
- File: `crates/ruvllm/src/qat/lora_qat.rs`
  - `LoraQatTrainer` struct
  - Quantization-aware LoRA forward pass
  - Memory-efficient gradient checkpointing
  - Integration with `lora/micro_lora.rs`
  - Integration with `lora/training.rs`
- File: `crates/ruvllm/src/lora/micro_lora.rs` (extend)
  - Add `AdapterMode::Qat` variant
  - Support quantized base + FP32 adapter
2.7 QAT Training Loop
- File: `crates/ruvllm/src/qat/training_loop.rs`
  - `QatTrainer` orchestrator
  - `run()` method: calibrate → train → export
  - Epoch metrics: loss, PPL, reasoning score
  - Domain event emission: `QatEpochComplete`
2.8 QAT Config
- File: `crates/ruvllm/src/qat/config.rs`
  - `QatConfig` struct (bits, STE variant, loss weights, epochs)
  - `QuantGranularity` enum (PerTensor, PerChannel, PerToken)
  - Serialization for config persistence
2.9 Phase 2 Validation
- Reasoning metrics: PPL delta, GSM8K delta, HumanEval delta, tool use delta, long context
- Gate G2: All 5 reasoning metrics within acceptable delta
- Memory check: LoRA-QAT ≤2 GB for 0.5B model
Phase 3: PiQ2 + Incoherence (Weeks 8-10)
3.1 Hadamard Transform
- File: `crates/ruvllm/src/quantize/hadamard.rs`
  - `HadamardTransform` struct
  - `forward_inplace()` - O(n log n) Walsh-Hadamard
  - `inverse_inplace()` - inverse transform
  - Random sign flip support
  - Property test: H × H^T = n × I (INV-4)
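The O(n log n) transform and the INV-4 property can be illustrated with a standard in-place fast Walsh-Hadamard butterfly. This is a sketch as free functions with hypothetical names. It uses the unnormalized transform (so that H × H^T = n × I holds as stated), requires a power-of-two length, and leaves out the random sign flips.

```rust
/// In-place unnormalized fast Walsh-Hadamard transform.
/// Panics unless the length is a power of two.
pub fn fwht_inplace(data: &mut [f32]) {
    let n = data.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let mut h = 1;
    while h < n {
        // Each pass combines pairs that are `h` apart: log2(n) passes of n/2 butterflies.
        for block in (0..n).step_by(2 * h) {
            for i in block..block + h {
                let (a, b) = (data[i], data[i + h]);
                data[i] = a + b;
                data[i + h] = a - b;
            }
        }
        h *= 2;
    }
}

/// Inverse transform: H is symmetric and H * H = n * I,
/// so applying H again and dividing by n recovers the input.
pub fn fwht_inverse_inplace(data: &mut [f32]) {
    fwht_inplace(data);
    let n = data.len() as f32;
    for x in data.iter_mut() {
        *x /= n;
    }
}
```

A property test for INV-4 can apply `fwht_inplace` twice to a random vector and check that the result equals the input scaled by n, within floating-point tolerance.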
3.2 Incoherence Processing
- File: `crates/ruvllm/src/quantize/incoherence.rs`
  - `IncoherenceTransform` aggregate root
  - Apply Hadamard before quantization
  - Store transform metadata
  - Domain event: `IncoherenceApplied`
3.3 QuIP-Enhanced Quantization
- File: `crates/ruvllm/src/quantize/quip.rs`
  - `Q2_QuIP` variant in TargetFormat
  - Combine incoherence + 2-bit K-quant
  - Metadata for inverse transform
3.4 PiQ2 Implementation
- File: `crates/ruvllm/src/quantize/pi_quant.rs` (extend)
  - `quantize_2bit()` method
  - 2-bit packing (reuse `bitnet/ternary_tensor.rs` pattern)
  - Integration with incoherence pipeline
3.5 Phase 3 Validation
- Gate G3: PiQ2 + incoherence achieves acceptable quality without full QAT
- Benchmark: 2-bit throughput targets
Phase 4: WASM Integration (Week 11)
4.1 WASM SIMD Kernels
- File: `crates/ruvllm/src/quantize/pi_quant_wasm_simd.rs`
  - `pi_dequant_wasm_simd()` using SIMD128
  - Reuse LUT pattern from `bitnet/tl1_wasm.rs`
  - In-browser kernel tests
4.2 WASM Bindings
- File: `crates/ruvllm-wasm/src/pi_quant_wasm.rs`
  - `PiQuantWasm` wasm_bindgen struct
  - `quantize()` method
  - `dequantize()` method
  - `computeMse()` method
  - `spectralDistortion()` method
  - JSON serialization
4.3 WASM Benchmarks
- File: `crates/ruvllm-wasm/src/quant_bench_wasm.rs`
  - `QuantBenchWasm` struct
  - `runBench()` method
  - `compareFormats()` method
4.4 WASM Feature Gating
- File: `crates/ruvllm-wasm/Cargo.toml`
  - Add `pi-quant` feature
  - Add `qat` feature (depends on `pi-quant`)
  - Feature flag tests
Phase 5: Security & Observability (Week 12)
5.1 Weight Integrity
- File: `crates/ruvllm/src/quantize/security.rs`
  - `WeightIntegrity` struct (hashes, config)
  - `validate_quantized_model()` function
  - SHA-256 checksum computation
  - GGUF security validation
5.2 Observability
- File: `crates/ruvllm/src/qat/metrics.rs`
  - Per-epoch loss tracking
  - Quality metric export
  - Training duration metrics
Phase 6: Integration & Benchmarks (Weeks 13-14)
6.1 SONA Integration
- File: `crates/ruvllm/src/sona/integration.rs` (extend)
  - Tier 2: Quantization scale adaptation
  - Quality signal for dynamic precision
6.2 Evaluation Harness Extension
- File: `crates/ruvllm/src/evaluation/real_harness.rs` (extend)
  - Add quantized model evaluation
  - GSM8K, HumanEval, WikiText-2 benchmarks
  - Tool use evaluation
6.3 Criterion Benchmarks
- File: `crates/ruvllm/benches/pi_quant_bench.rs`
  - `bench_pi_quantize` (target: >1 GB/s)
  - `bench_pi_dequantize_simd` (target: >10 GB/s NEON, >2 GB/s WASM)
  - `bench_hadamard_transform` (target: <50 μs for 4096-dim)
  - `bench_qat_forward_backward` (target: <500 ms/step)
6.4 Integration Tests
- File: `crates/ruvllm/tests/qat_integration.rs`
  - Full QAT pipeline test (calibrate → train → export)
  - Quantized model inference test
  - WASM export test
Acceptance Gates Verification
| Gate | Phase | Test Command |
|---|---|---|
| G1 | 1→2 | cargo test -p ruvllm gate_piq3_quality |
| G2 | 2→3 | cargo test -p ruvllm gate_lora_qat_convergence |
| G3 | 3→4 | cargo test -p ruvllm gate_piq2_viability |
| G4 | Any | cargo bench -p ruvllm -- --baseline main |
| G5 | Pre-merge | cargo clippy -p ruvllm -- -D clippy::undocumented_unsafe_blocks |
| G6 | Pre-release | cd crates/ruvllm-wasm && wasm-pack build --target web |
Quality Metrics Collection
Phase 1 Metrics (PiQ3 PTQ)
# Run quality comparison
cargo test -p ruvllm --release pi_quant_quality -- --nocapture
# Expected output:
# MSE (PiQ3 vs Q3): 0.042 vs 0.051 (✓ PiQ3 better)
# Spectral (PiQ3 vs Q3): -18.2 dB vs -17.5 dB (✓ PiQ3 better)
# Cosine (PiQ3 vs Q3): 0.9981 vs 0.9975 (✓ PiQ3 better)
# Outlier Retention: 87% vs 68% (✓ PiQ3 better)
Phase 2 Metrics (LoRA-QAT)
# Run reasoning evaluation
cargo run -p ruvllm --release -- evaluate \
--model quantized-0.5b-piq3-qat.gguf \
--benchmarks wikitext2,gsm8k,humaneval,tool_use,long_context
# Expected output:
# WikiText-2 PPL: 13.2 (+7% from FP16 12.3) ✓
# GSM8K: 40% (-5pt from 45%) ✓
# HumanEval: 25% (-3pt from 28%) ✓
# Tool Use: 89% (-3pt from 92%) ✓
# Long Context 8K: 88% (-4pt from 92%) ✓
Rollback Triggers
| Trigger | Detection | Response |
|---|---|---|
| PiQ3 ≤ Uniform | G1 fails (0-1 metrics better) | Investigate step size; if fails, de-scope to research |
| LoRA-QAT OOM | Training crashes | Reduce rank 16→8→4; if OOM persists, defer to GPU cluster |
| Reasoning collapse | >25 point GSM8K drop | Increase λ_KD; if fails, revert to PTQ-only |
| SIMD divergence | >1 ULP vs scalar | Fix kernel; block merge |
| Benchmark regression | >5% slower than baseline | Bisect and fix; block merge |
File Summary
New Files (16 files)
crates/ruvllm/src/quantize/
  pi_quant.rs              # Pi-constant quantization core
  pi_quant_simd.rs         # NEON/AVX2 kernels
  pi_quant_wasm_simd.rs    # WASM SIMD128 kernels
  incoherence.rs           # Hadamard rotation transforms
  hadamard.rs              # Fast Walsh-Hadamard O(n log n)
  quip.rs                  # QuIP-enhanced 2-bit
  security.rs              # Weight integrity validation
crates/ruvllm/src/qat/
  mod.rs                   # Public API
  config.rs                # QatConfig, SteVariant
  ste.rs                   # Straight-through estimator
  differentiable_quant.rs  # DifferentiableQuantizer trait
  calibration.rs           # Mixed-domain calibration
  distillation.rs          # Teacher-student loss
  reasoning_loss.rs        # Chain-of-thought fidelity
  training_loop.rs         # Main QAT orchestrator
  lora_qat.rs              # LoRA-QAT lightweight variant
crates/ruvllm-wasm/src/
  pi_quant_wasm.rs         # Pi-quantization WASM bindings
  quant_bench_wasm.rs      # In-browser benchmarks
crates/ruvllm/benches/
  pi_quant_bench.rs        # Criterion benchmarks
Extended Files (12 files)
crates/ruvllm/src/quantize/ruvltra_quant.rs # TargetFormat enum
crates/ruvllm/src/gguf/quantization.rs # GGUF type IDs
crates/ruvllm/src/training/real_trainer.rs # QatMode
crates/ruvllm/src/lora/micro_lora.rs # AdapterMode::Qat
crates/ruvllm/src/lora/training.rs # LoRA-QAT gradients
crates/ruvllm/src/sona/integration.rs # Tier 2 adaptation
crates/ruvllm/src/evaluation/real_harness.rs # Quantized eval
crates/ruvllm-wasm/src/bindings.rs # WASM exports
crates/ruvllm-wasm/Cargo.toml # Feature flags
Definition of Done
- All checkboxes above completed
- Gates G1-G6 passing
- All 5 reasoning metrics within acceptable delta
- Benchmarks meet performance targets
- Documentation in `crates/ruvllm/docs/qat/`
- CHANGELOG entry added
- PR reviewed and approved
- Merged to main