Add ModelDownloader module to @ruvector/ruvllm npm package with
automatic download capability for RuvLTRA models from HuggingFace.
New CLI commands:
- `ruvllm models list` - Show available models with download status
- `ruvllm models download <id>` - Download specific model
- `ruvllm models download --all` - Download all models
- `ruvllm models status` - Check which models are downloaded
- `ruvllm models delete <id>` - Remove downloaded model
Available models (from https://huggingface.co/ruv/ruvltra):
- claude-code (398 MB) - Optimized for Claude Code workflows
- small (398 MB) - Edge devices, IoT
- medium (669 MB) - General purpose
Features:
- Progress tracking with speed and ETA
- Automatic directory creation (~/.ruvllm/models)
- Resume support (skips already downloaded)
- Force re-download option
- JSON output for scripting
- Model aliases (cc, sm, med)
Also updates Rust registry to use consolidated HuggingFace repo.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add architecture decision records for the 3 critical P0 features needed for
production LLM inference parity with vLLM/SGLang:
ADR-009: Structured Output (JSON Mode)
- Constrained decoding with state machine token filtering
- GBNF grammar support for complex schemas
- Incremental JSON validation during generation
- Performance: <2ms overhead per token
ADR-010: Function Calling (Tool Use)
- OpenAI-compatible tool definition format
- Stop-sequence based argument extraction
- Parallel and sequential function execution
- Automatic retry with error context
ADR-011: Prefix Caching (Radix Tree)
- SGLang-style radix tree for prefix matching
- Copy-on-write KV cache page sharing
- LRU eviction with configurable cache size
- 10x speedup target for chat/RAG workloads
Also includes:
- GitHub issue markdown for tracking implementation
- Comprehensive SOTA analysis comparing RuvLLM vs competitors
- Detailed roadmap (Q1-Q4 2026) for feature parity
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>