Claude
51bb16ca09
docs(research): add TurboQuant KV cache compression research document
Comprehensive research document covering TurboQuant (ICLR 2026) and its
mapping to ruvLLM. Covers algorithm details, performance results,
integration architecture, PiQ3 comparison, risks/mitigations, and
implementation summary.
https://claude.ai/code/session_011ogX2uc7Zf8d8aQ3UAbNcd
2026-03-25 12:14:17 +00:00
rUv
3ed78842dd
docs(research): add ultra-low-bit quantization & edge deployment research (#255)
* docs(research): add ultra-low-bit quantization & edge deployment research
Comprehensive research collection on 2-bit/3-bit quantization for ruvLLM:
- 01: Ultra-low-bit quantization survey (ICLR'26, QuIP, BitNet, I-quants)
- 02: Quantization-aware training (QAT) with reasoning preservation
- 03: QuIP 2-bit framework analysis (incoherence processing, E8 lattice)
- 04: MoE memory-aware routing for edge SRAM budgets
- 05: ruvLLM quantization architecture deep review and gap analysis
- 06: Rust implementation plan for 2-bit QAT pipeline (14-week roadmap)
- 07: Novel 3-int pi-constant quantization using irrational scaling
Key findings: ruvLLM has strong foundations (BitNet, K-quants, GGUF, KV cache)
but needs QAT training loop and differentiable quantization primitives.
Pi-constant scaling provides ~0.5 bit effective precision gain at 3-bit.
https://claude.ai/code/session_01E4pmfETYzknb1xq2dzCCaj
* docs(adr): add ADR-090 ultra-low-bit QAT & pi-quantization DDD architecture
Comprehensive architecture decision record for implementing 2-bit/3-bit
quantization-aware training in ruvLLM using Domain-Driven Design:
- 5 bounded contexts: Quantization Core, Training, MoE Routing, WASM Runtime, Observability
- Pi-constant quantization with irrational scaling (pi/k step sizes)
- QAT training loop with STE variants and LoRA-QAT lightweight path
- QuIP incoherence via fast Walsh-Hadamard (O(n log n))
- Memory-aware MoE routing with expert precision allocation
- WASM SIMD128 kernels reusing existing tl1_wasm.rs LUT pattern
- Security: weight integrity, GGUF validation, WASM sandbox
- Benchmarking: criterion suite with throughput/quality targets
- 14-week timeline, maps to 18 existing files for extension
Placed in docs/adr/ddd/ per DDD architectural pattern organization.
https://claude.ai/code/session_01E4pmfETYzknb1xq2dzCCaj
---------
Co-authored-by: Claude <noreply@anthropic.com>
2026-03-12 10:21:30 -04:00