ruvector/docs/research/quantization-edge
rUv 76679927c8 research(kv-cache): TriAttention + TurboQuant stacked compression analysis (#342)
Add deep research into three-axis KV cache compression:
- TriAttention (arXiv:2604.04921): trigonometric RoPE-based token sparsity, 10.7x
- Stacked compression: TriAttention × TurboQuant for ~50x KV reduction
- ADR-147: formal architecture decision with GOAP implementation plan

We found no published work that combines these orthogonal methods, which makes this a
first-mover opportunity for ruvLLM edge inference (128K context in 175MB on a Raspberry Pi 5).

Co-authored-by: Reuven <cohen@ruv-mac-mini.local>
2026-04-08 13:29:16 -05:00
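The ~50x figure above is consistent with treating token sparsity and quantization as independent axes whose reduction factors multiply. The sketch below is written in Rust to match ruvLLM and assumes exactly that. It takes the 10.7x TriAttention figure from the commit message and back-solves the factor TurboQuant would need to contribute. The fp16 baseline and all function names are hypothetical illustrations, not part of the ruvLLM API.

```rust
/// Combined KV-cache reduction when two methods are stacked.
/// Assumption: token sparsity (fewer cached tokens) and quantization
/// (fewer bits per cached value) act on independent axes, so their
/// reduction factors multiply.
fn stacked_ratio(sparsity_ratio: f64, quant_ratio: f64) -> f64 {
    sparsity_ratio * quant_ratio
}

/// Quantization factor needed to hit `target` overall, given a sparsity factor.
fn implied_quant_ratio(target: f64, sparsity_ratio: f64) -> f64 {
    target / sparsity_ratio
}

/// Bits per value after dividing an fp16 (16-bit) baseline by `quant_ratio`.
/// The fp16 baseline is an assumption, not stated in the commit message.
fn bits_per_value_from_fp16(quant_ratio: f64) -> f64 {
    16.0 / quant_ratio
}

fn main() {
    let triattention = 10.7; // reduction factor quoted for TriAttention
    let target = 50.0; // "~50x" stacked target from the commit message

    let quant = implied_quant_ratio(target, triattention);
    println!("implied quantization factor: {:.2}x", quant);
    println!("implied bits/value from fp16: {:.2}", bits_per_value_from_fp16(quant));
    println!("stacked reduction: {:.1}x", stacked_ratio(triattention, quant));
}
```

Under these assumptions, TurboQuant would need to supply roughly 4.7x on its own, which corresponds to storing fp16 values at about 3.4 bits each.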
00-README.md docs(research): add ultra-low-bit quantization & edge deployment research (#255) 2026-03-12 10:21:30 -04:00
01-ultra-low-bit-quantization-survey.md docs(research): add ultra-low-bit quantization & edge deployment research (#255) 2026-03-12 10:21:30 -04:00
02-quantization-aware-training-qat.md docs(research): add ultra-low-bit quantization & edge deployment research (#255) 2026-03-12 10:21:30 -04:00
03-quip-2bit-framework.md docs(research): add ultra-low-bit quantization & edge deployment research (#255) 2026-03-12 10:21:30 -04:00
04-moe-memory-aware-routing.md docs(research): add ultra-low-bit quantization & edge deployment research (#255) 2026-03-12 10:21:30 -04:00
05-ruvllm-quantization-architecture.md docs(research): add ultra-low-bit quantization & edge deployment research (#255) 2026-03-12 10:21:30 -04:00
06-implementation-plan-rust-ruvllm.md docs(research): add ultra-low-bit quantization & edge deployment research (#255) 2026-03-12 10:21:30 -04:00
07-3int-pi-constant-quantization.md docs(research): add ultra-low-bit quantization & edge deployment research (#255) 2026-03-12 10:21:30 -04:00
08-turboquant-kv-cache-compression.md docs(research): add TurboQuant KV cache compression research document 2026-03-25 12:14:17 +00:00
09-triattention-kv-sparsity.md research(kv-cache): TriAttention + TurboQuant stacked compression analysis (#342) 2026-04-08 13:29:16 -05:00
10-stacked-kv-compression.md research(kv-cache): TriAttention + TurboQuant stacked compression analysis (#342) 2026-04-08 13:29:16 -05:00