mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-23 04:27:11 +00:00
Add deep research into three-axis KV cache compression: - TriAttention (arXiv:2604.04921): trigonometric RoPE-based token sparsity, 10.7x - Stacked compression: TriAttention × TurboQuant for ~50x KV reduction - ADR-147: formal architecture decision with GOAP implementation plan No published work combines these orthogonal methods. First-mover opportunity for ruvLLM edge inference (128K context in 175MB on Pi 5). Co-authored-by: Reuven <cohen@ruv-mac-mini.local> |
||
|---|---|---|
| .. | ||
| 00-README.md | ||
| 01-ultra-low-bit-quantization-survey.md | ||
| 02-quantization-aware-training-qat.md | ||
| 03-quip-2bit-framework.md | ||
| 04-moe-memory-aware-routing.md | ||
| 05-ruvllm-quantization-architecture.md | ||
| 06-implementation-plan-rust-ruvllm.md | ||
| 07-3int-pi-constant-quantization.md | ||
| 08-turboquant-kv-cache-compression.md | ||
| 09-triattention-kv-sparsity.md | ||
| 10-stacked-kv-compression.md | ||