ruvector/docs/adr
Reuven a0a8065a17 docs(adr): add P0 SOTA feature ADRs - Structured Output, Function Calling, Prefix Caching
Add architecture decision records for the 3 critical P0 features needed for
production LLM inference parity with vLLM/SGLang:

ADR-009: Structured Output (JSON Mode)
- Constrained decoding with state machine token filtering
- GBNF grammar support for complex schemas
- Incremental JSON validation during generation
- Performance: <2ms overhead per token
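
The first bullet above (state machine token filtering) can be illustrated with a minimal sketch. It is not the ADR-009 design: the `State` enum, `step`, `advance`, and `mask_logits` are hypothetical names, the grammar is a toy JSON subset (flat objects with string keys and string values, no escapes or whitespace), and GBNF grammars are out of scope here. The idea is only that a token whose characters would drive the state machine into an invalid state has its logit set to negative infinity before sampling.

```rust
/// Toy state machine for a tiny JSON subset (assumption for illustration):
/// flat objects whose keys and values are strings without escapes.
#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Start,      // expect '{'
    KeyOrEnd,   // expect '"' (start of key) or '}'
    InKey,      // inside a key string
    Colon,      // expect ':'
    Value,      // expect '"' (start of value)
    InValue,    // inside a value string
    CommaOrEnd, // expect ',' or '}'
    NextKey,    // after ',', expect '"'
    Done,       // object closed, nothing more allowed
}

/// Advance by one character; `None` means the character is not allowed here.
fn step(state: State, c: char) -> Option<State> {
    use State::*;
    match (state, c) {
        (Start, '{') => Some(KeyOrEnd),
        (KeyOrEnd, '"') | (NextKey, '"') => Some(InKey),
        (KeyOrEnd, '}') | (CommaOrEnd, '}') => Some(Done),
        (InKey, '"') => Some(Colon),
        (InKey, _) => Some(InKey),
        (Colon, ':') => Some(Value),
        (Value, '"') => Some(InValue),
        (InValue, '"') => Some(CommaOrEnd),
        (InValue, _) => Some(InValue),
        (CommaOrEnd, ',') => Some(NextKey),
        _ => None,
    }
}

/// Advance by a whole token (a token may span several characters).
fn advance(state: State, token: &str) -> Option<State> {
    token.chars().try_fold(state, step)
}

/// Set the logit of every token that would break the grammar to -inf,
/// so a sampler can never pick it.
fn mask_logits(state: State, vocab: &[&str], logits: &mut [f32]) {
    for (tok, logit) in vocab.iter().zip(logits.iter_mut()) {
        if advance(state, tok).is_none() {
            *logit = f32::NEG_INFINITY;
        }
    }
}
```

In a decoding loop, `mask_logits` would run once per step and the chosen token would be fed through `advance` to update the state. A real implementation would likely precompute the allowed-token set per state rather than rescan the vocabulary each step.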

ADR-010: Function Calling (Tool Use)
- OpenAI-compatible tool definition format
- Stop-sequence-based argument extraction
- Parallel and sequential function execution
- Automatic retry with error context
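
As a rough illustration of the stop-sequence extraction and retry bullets above, the sketch below pulls the argument text out of raw model output and retries with the error appended to the prompt. Everything here is an assumption made for the example: the `<tool_call>` / `</tool_call>` markers, the function names, and the shape of the retry prompt are not taken from ADR-010, and the `generate` and `validate` callbacks stand in for the model and for JSON schema validation.

```rust
// Hypothetical markers; the real start/stop sequences are not given in the source.
const START: &str = "<tool_call>";
const STOP: &str = "</tool_call>";

/// Return the text between the first `start` marker and the next `stop`
/// sequence, trimmed, or `None` if either marker is missing.
fn extract_call<'a>(output: &'a str, start: &str, stop: &str) -> Option<&'a str> {
    let begin = output.find(start)? + start.len();
    let rest = &output[begin..];
    let end = rest.find(stop)?;
    Some(rest[..end].trim())
}

/// Call `generate` up to `max_attempts` times. On failure, the error message
/// is appended to the original prompt so the next attempt sees what went wrong.
fn generate_with_retry<G, V>(
    mut generate: G,
    validate: V,
    prompt: &str,
    max_attempts: usize,
) -> Result<String, String>
where
    G: FnMut(&str) -> String,
    V: Fn(&str) -> Result<(), String>,
{
    let mut current = prompt.to_string();
    let mut last_err = String::from("no attempts made");
    for _ in 0..max_attempts {
        let output = generate(&current);
        match extract_call(&output, START, STOP) {
            Some(args) => match validate(args) {
                Ok(()) => return Ok(args.to_string()),
                Err(e) => last_err = e,
            },
            None => last_err = "no tool call found".to_string(),
        }
        // Rebuild from the original prompt so error context does not pile up.
        current = format!("{prompt}\nPrevious attempt failed: {last_err}");
    }
    Err(last_err)
}
```

Passing the model and the validator in as closures keeps the sketch independent of any particular backend or schema library.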

ADR-011: Prefix Caching (Radix Tree)
- SGLang-style radix tree for prefix matching
- Copy-on-write KV cache page sharing
- LRU eviction with configurable cache size
- 10x speedup target for chat/RAG workloads
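
The prefix-matching idea behind the bullets above can be sketched as follows. This is a simplification, not the ADR-011 design: it uses a plain trie with one token per edge instead of a compressed radix tree, it stores only a recency timestamp where a real cache would hold KV page references, it evicts by a timestamp cutoff rather than by a configured cache size, and copy-on-write page sharing is not modeled. `Node`, `PrefixCache`, and their methods are hypothetical names.

```rust
use std::collections::HashMap;

/// One trie node per token; `last_used` is a logical clock value.
#[derive(Default)]
struct Node {
    children: HashMap<u32, Node>,
    last_used: u64,
}

#[derive(Default)]
struct PrefixCache {
    root: Node,
    clock: u64,
}

/// Drop every subtree whose root was last touched before `cutoff`.
/// A parent is always at least as recent as its descendants, so
/// dropping a stale child never discards a fresher entry.
fn evict(node: &mut Node, cutoff: u64) {
    node.children.retain(|_, child| {
        evict(child, cutoff);
        child.last_used >= cutoff
    });
}

impl PrefixCache {
    /// Record a token sequence, creating nodes as needed.
    fn insert(&mut self, tokens: &[u32]) {
        self.clock += 1;
        let now = self.clock;
        let mut node = &mut self.root;
        for &t in tokens {
            node = node.children.entry(t).or_default();
            node.last_used = now;
        }
    }

    /// Return how many leading tokens are already cached, refreshing
    /// the recency of every node on the matched path.
    fn match_prefix(&mut self, tokens: &[u32]) -> usize {
        self.clock += 1;
        let now = self.clock;
        let mut node = &mut self.root;
        let mut matched = 0;
        for t in tokens {
            match node.children.get_mut(t) {
                Some(child) => {
                    child.last_used = now;
                    node = child;
                    matched += 1;
                }
                None => break,
            }
        }
        matched
    }

    /// Evict entries not used since logical time `cutoff`.
    fn evict_older_than(&mut self, cutoff: u64) {
        evict(&mut self.root, cutoff);
    }
}
```

A request whose prompt shares a system message or chat history with an earlier one would get a nonzero `match_prefix` result, and only the remaining tokens would need prefill.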

Also includes:
- GitHub issue markdown for tracking implementation
- Comprehensive SOTA analysis comparing RuvLLM vs competitors
- Detailed roadmap (Q1-Q4 2026) for feature parity

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 15:02:07 -05:00
ADR-001-ruvector-core-architecture.md fix(security): Apply 8 critical security fixes and update ADRs 2026-01-19 11:21:31 -05:00
ADR-002-ruvllm-integration.md docs(adr): Update ADRs with v2.1.1 performance optimizations 2026-01-19 12:03:43 -05:00
ADR-003-simd-optimization-strategy.md docs(adr): Update ADRs with v2.1.1 performance optimizations 2026-01-19 12:03:43 -05:00
ADR-004-kv-cache-management.md fix(security): Apply 8 critical security fixes and update ADRs 2026-01-19 11:21:31 -05:00
ADR-005-wasm-runtime-integration.md fix(security): Apply 8 critical security fixes and update ADRs 2026-01-19 11:21:31 -05:00
ADR-006-memory-management.md fix(security): Apply 8 critical security fixes and update ADRs 2026-01-19 11:21:31 -05:00
ADR-007-security-review-technical-debt.md fix(security): Apply 8 critical security fixes and update ADRs 2026-01-19 11:21:31 -05:00
ADR-008-mistral-rs-integration.md feat(ruvllm): mistral-rs backend integration for production-scale serving 2026-01-20 14:03:48 -05:00
ADR-009-structured-output.md docs(adr): add P0 SOTA feature ADRs - Structured Output, Function Calling, Prefix Caching 2026-01-20 15:02:07 -05:00
ADR-010-function-calling.md docs(adr): add P0 SOTA feature ADRs - Structured Output, Function Calling, Prefix Caching 2026-01-20 15:02:07 -05:00
ADR-011-prefix-caching.md docs(adr): add P0 SOTA feature ADRs - Structured Output, Function Calling, Prefix Caching 2026-01-20 15:02:07 -05:00