ruvector/docs/research/quantization-edge
rUv 3ed78842dd docs(research): add ultra-low-bit quantization & edge deployment research (#255)
* docs(research): add ultra-low-bit quantization & edge deployment research

Comprehensive research collection on 2-bit/3-bit quantization for ruvLLM:

- 01: Ultra-low-bit quantization survey (ICLR'26, QuIP, BitNet, I-quants)
- 02: Quantization-aware training (QAT) with reasoning preservation
- 03: QuIP 2-bit framework analysis (incoherence processing, E8 lattice)
- 04: MoE memory-aware routing for edge SRAM budgets
- 05: ruvLLM quantization architecture deep review and gap analysis
- 06: Rust implementation plan for 2-bit QAT pipeline (14-week roadmap)
- 07: Novel 3-int pi-constant quantization using irrational scaling

Key findings: ruvLLM has strong foundations (BitNet, K-quants, GGUF, KV cache)
but still needs a QAT training loop and differentiable quantization primitives.
Pi-constant scaling is reported to give roughly a 0.5-bit gain in effective
precision at 3-bit.
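
To make the pi-constant idea concrete, here is a minimal sketch of uniform quantization with a step size of pi/k. The function names, the signed 3-bit code range [-4, 3], the round-to-nearest rule, and treating `k` as a plain `f32` parameter are all assumptions for illustration; the research notes may define the scheme differently.

```rust
use std::f32::consts::PI;

/// Hypothetical helper: map `x` to a signed 3-bit code in [-4, 3]
/// using a step size of pi / k (round to nearest, then saturate).
fn pi_quantize(x: f32, k: f32) -> i8 {
    let step = PI / k;
    (x / step).round().clamp(-4.0, 3.0) as i8
}

/// Hypothetical helper: reconstruct the value represented by code `q`.
fn pi_dequantize(q: i8, k: f32) -> f32 {
    q as f32 * (PI / k)
}
```

For inputs inside the representable range, the reconstruction error of this sketch is at most half a step, i.e. pi / (2k); values outside the range saturate to the nearest end code.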

https://claude.ai/code/session_01E4pmfETYzknb1xq2dzCCaj

* docs(adr): add ADR-090 ultra-low-bit QAT & pi-quantization DDD architecture

Comprehensive architecture decision record for implementing 2-bit/3-bit
quantization-aware training in ruvLLM using Domain-Driven Design:

- 5 bounded contexts: Quantization Core, Training, MoE Routing, WASM Runtime, Observability
- Pi-constant quantization with irrational scaling (pi/k step sizes)
- QAT training loop with STE variants and LoRA-QAT lightweight path
- QuIP incoherence processing via fast Walsh-Hadamard transform (O(n log n))
- Memory-aware MoE routing with expert precision allocation
- WASM SIMD128 kernels reusing existing tl1_wasm.rs LUT pattern
- Security: weight integrity, GGUF validation, WASM sandbox
- Benchmarking: criterion suite with throughput/quality targets
- 14-week timeline, maps to 18 existing files for extension
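
The O(n log n) cost quoted for the Walsh-Hadamard step in the list above comes from the standard butterfly structure. The sketch below is a generic, unnormalized in-place transform, not code from ruvLLM; the function name `fwht` is an assumption, and the random sign flips that QuIP-style incoherence processing typically applies around the transform are left out.

```rust
/// Hypothetical sketch: in-place, unnormalized fast Walsh-Hadamard transform.
/// Panics if the slice length is not a power of two.
fn fwht(data: &mut [f32]) {
    let n = data.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let mut h = 1;
    // log2(n) passes, each touching all n elements: O(n log n) total.
    while h < n {
        for block in data.chunks_mut(2 * h) {
            let (lo, hi) = block.split_at_mut(h);
            for (a, b) in lo.iter_mut().zip(hi.iter_mut()) {
                let (x, y) = (*a, *b);
                *a = x + y; // butterfly: sum
                *b = x - y; // butterfly: difference
            }
        }
        h *= 2;
    }
}
```

Because the transform is unnormalized, applying it twice returns the input scaled by n; dividing by sqrt(n) after each pass would make it orthonormal.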

Placed in docs/adr/ddd/ per DDD architectural pattern organization.

https://claude.ai/code/session_01E4pmfETYzknb1xq2dzCCaj

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-03-12 10:21:30 -04:00
00-README.md
01-ultra-low-bit-quantization-survey.md
02-quantization-aware-training-qat.md
03-quip-2bit-framework.md
04-moe-memory-aware-routing.md
05-ruvllm-quantization-architecture.md
06-implementation-plan-rust-ruvllm.md
07-3int-pi-constant-quantization.md