mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-28 01:44:41 +00:00
ADR-090 (Ultra-Low-Bit QAT): - Changed status to "Accepted (Staged Implementation)" - Added decision statement choosing LoRA-QAT as first path - Added staged implementation phases (4 phases, explicit gates) - Added validation plan defining "better" (MSE, spectral, cosine, outlier retention) - Added reasoning preservation metrics (PPL, GSM8K, HumanEval, tool use, long context) - Added system invariants (INV-1 through INV-8) - Added acceptance gates (G1-G6) with rollback triggers - Restructured success criteria into correctness/performance/quality/rollout ADR-091 (INT8 CNN Quantization): - Changed status to "Accepted" - Added decision statement with acceptance benchmark - Added system invariants (INV-1 through INV-8) - Added operator coverage table (11 operators) - Added graph rewrite passes section (4 passes) - Added deployment policy matrix - Added acceptance gates (7 gates) with rollback conditions ADR-092 (MoE Memory-Aware Routing): - Split from ADR-090 as routing affects scheduling/cache, not representation - Added decision statement with acceptance benchmark (≥70% cache hit rate) - Added system invariants (INV-1 through INV-6) - Added acceptance gates (G1-G5) with rollback conditions - Added domain analysis with bounded context Implementation Checklists: - ADR-090: 6 phases, ~28 files, 16 new + 12 extended - ADR-091: 6 phases, acceptance gate verification commands Co-Authored-By: claude-flow <ruv@ruv.net>
7.6 KiB
7.6 KiB
ADR-091 Implementation Checklist: INT8 CNN Quantization
ADR: ADR-091-int8-cnn-quantization-ddd.md
Status: Ready for Implementation
Target Crate: ruvector-cnn
Phase 1: Core Quantization Infrastructure (Weeks 1-2)
1.1 Quantization Parameter Module
- File:
crates/ruvector-cnn/src/quantize/params.rsQuantizationParamsstruct (scale, zero_point, qmin, qmax)QuantizationSchemeenum (PerTensor, PerChannel)QuantizationModeenum (Symmetric, Asymmetric)from_minmax()constructorfrom_percentile()constructor (calibration-based)- Unit tests for parameter computation
1.2 INT8 Tensor Types
- File:
crates/ruvector-cnn/src/quantize/tensor.rsQuantizedTensor<i8>struct with metadataQuantizationMetadata(scale, zero_point, shape)quantize()method (f32 → i8)dequantize()method (i8 → f32)- Bounds checking (INV-1, INV-2, INV-3)
- Unit tests for round-trip accuracy
1.3 Calibration Engine
- File:
crates/ruvector-cnn/src/quantize/calibration.rsCalibrationMethodenum (MinMax, Percentile, MSE, Entropy)CalibrationCollectorfor activation statisticsCalibrationResultwith per-layer paramscalibrate_model()entry point- Calibration artifact serialization (INV-7)
- Unit tests for each calibration method
Phase 2: INT8 Kernels (Weeks 3-4)
2.1 Scalar Reference Kernels
- File:
crates/ruvector-cnn/src/kernels/int8_scalar.rsconv2d_int8_scalar()- reference implementationdepthwise_conv2d_int8_scalar()matmul_int8_scalar()for FC layersrequantize_scalar()(i32 → i8 with scale adjustment)- Property tests: output bounds, accumulator overflow checks
2.2 AVX2 Kernels
- File:
crates/ruvector-cnn/src/kernels/int8_avx2.rsconv2d_int8_avx2()using_mm256_maddubs_epi16depthwise_conv2d_int8_avx2()matmul_int8_avx2()using VNNI intrinsics (if available)- Kernel equivalence tests vs scalar (≤1 ULP, INV-6)
2.3 NEON Kernels
- File:
crates/ruvector-cnn/src/kernels/int8_neon.rsconv2d_int8_neon()usingvmlal_s8depthwise_conv2d_int8_neon()matmul_int8_neon()using dot product instructions- Kernel equivalence tests vs scalar
2.4 WASM SIMD128 Kernels
- File:
crates/ruvector-cnn/src/kernels/int8_wasm.rsconv2d_int8_wasm()usingi8x16operationsdepthwise_conv2d_int8_wasm()matmul_int8_wasm()- In-browser kernel equivalence tests
Phase 3: Graph Rewrite Passes (Weeks 5-6)
3.1 BatchNorm Fusion
- File:
crates/ruvector-cnn/src/quantize/graph_rewrite.rsfuse_batchnorm_to_conv()pass (GR-1)- Fused weight/bias computation
- Integration with model loader
- Unit test: verify BN params absorbed correctly
3.2 Zero-Point Correction
- File:
crates/ruvector-cnn/src/quantize/graph_rewrite.rs(continued)fuse_zp_to_bias()pass (GR-2)- Pre-compute:
bias_q = bias - zp_input × Σweights - Unit test: verify runtime subtraction eliminated
3.3 Quantize/Dequantize Insertion
- File:
crates/ruvector-cnn/src/quantize/graph_rewrite.rs(continued)insert_qdq_nodes()pass (GR-3)QuantizeNodeandDequantizeNodetypes- Boundary detection for INT8 subgraph
- Integration test: full model quantization
3.4 Activation Fusion
- File:
crates/ruvector-cnn/src/quantize/graph_rewrite.rs(continued)fuse_relu()pass (GR-4)fuse_hardswish()pass (GR-4)- LUT-based HardSwish (256-entry table)
- Unit tests for activation correctness
Phase 4: Quantized Layer Implementations (Weeks 7-8)
4.1 QuantizedConv2d
- File:
crates/ruvector-cnn/src/layers/quantized_conv2d.rsQuantizedConv2dstructforward()method with SIMD dispatch- Weight packing for SIMD efficiency
- Unit tests with known inputs/outputs
4.2 QuantizedDepthwiseConv2d
- File:
crates/ruvector-cnn/src/layers/quantized_depthwise.rsQuantizedDepthwiseConv2dstruct- Separate kernel for channel-wise ops
- Unit tests
4.3 QuantizedLinear
- File:
crates/ruvector-cnn/src/layers/quantized_linear.rsQuantizedLinearstruct- GEMM-based forward pass
- Unit tests
4.4 Quantized Pooling
- File:
crates/ruvector-cnn/src/layers/quantized_pooling.rsQuantizedMaxPool2d- operates in INT8 domainQuantizedAvgPool2d- requires intermediate precision- Unit tests for pooling correctness
4.5 Quantized Residual Add
- File:
crates/ruvector-cnn/src/layers/quantized_residual.rsQuantizedResidualAddwith requantization- Handle mismatched scales between branches
- Unit tests
Phase 5: Model Export & Integration (Weeks 9-10)
5.1 Model Exporter
- File:
crates/ruvector-cnn/src/quantize/export.rs- Export quantized model to binary format
- Include quantization config (INV-8)
- Checksum generation (INV-7)
- Version stamping
5.2 Model Loader
- File:
crates/ruvector-cnn/src/quantize/loader.rs- Load quantized model from binary
- Checksum verification
- Config validation
5.3 MobileNetV3 Integration
- File:
crates/ruvector-cnn/src/models/mobilenetv3_int8.rsMobileNetV3Int8model definitionfrom_float_model()conversion- End-to-end inference test
Phase 6: Benchmarks & Validation (Weeks 11-12)
6.1 Criterion Benchmarks
- File:
crates/ruvector-cnn/benches/int8_bench.rs- Conv2d INT8 throughput (target: 2x FP32)
- MobileNetV3 INT8 latency (target: 2.5x speedup)
- Memory usage comparison
6.2 Quality Validation
- File:
crates/ruvector-cnn/tests/quality_validation.rs- Cosine similarity ≥0.995 vs FP32 (GATE-1)
- Per-layer MSE tracking
- Embedding validation on test set (GATE-2)
6.3 Acceptance Gate Tests
- File:
crates/ruvector-cnn/tests/acceptance_gates.rs- GATE-1: Calibration produces valid params
- GATE-2: Cosine similarity ≥0.995
- GATE-3: Latency improvement ≥2.5x
- GATE-4: Memory reduction ≥3x
- GATE-5: Zero unsafe without assertion
- GATE-6: WASM build succeeds
- GATE-7: CI pipeline passes
Acceptance Criteria Verification
| Gate | Test File | Command |
|---|---|---|
| GATE-1 | tests/acceptance_gates.rs |
cargo test gate_calibration |
| GATE-2 | tests/quality_validation.rs |
cargo test gate_cosine_similarity |
| GATE-3 | benches/int8_bench.rs |
cargo bench --bench int8_bench -- latency |
| GATE-4 | tests/acceptance_gates.rs |
cargo test gate_memory_reduction |
| GATE-5 | CI | cargo clippy -- -D clippy::undocumented_unsafe_blocks |
| GATE-6 | CI | wasm-pack build --target web |
| GATE-7 | CI | GitHub Actions workflow |
Rollback Triggers
| Trigger | Detection | Response |
|---|---|---|
| Cosine <0.99 | GATE-2 fails | Review calibration method |
| Latency <2x | GATE-3 fails | Profile kernel hot paths |
| SIMD ≠ Scalar | Kernel tests fail | Fix kernel bug, block merge |
| CI failure | GATE-7 fails | Fix or revert |
Definition of Done
- All checkboxes above completed
- All 7 acceptance gates passing
- Documentation in
crates/ruvector-cnn/docs/ - CHANGELOG entry added
- PR reviewed and approved
- Merged to main