ruvector

mirror of https://github.com/ruvnet/RuVector.git synced 2026-05-24 05:43:58 +00:00

History

Reuven 72493c7abd feat: Complete LLM system with Candle, MicroLoRA, NEON kernels Implements a full LLM inference and fine-tuning system optimized for Mac M4 Pro: ## New Crates - ruvllm-cli: CLI tool with download, serve, chat, benchmark commands ## Backends (crates/ruvllm/src/backends/) - LlmBackend trait for pluggable inference backends - CandleBackend with Metal acceleration, GGUF quantization, HF Hub ## MicroLoRA (crates/ruvllm/src/lora/) - Rank 1-2 adapters for <1ms per-request adaptation - EWC++ regularization to prevent catastrophic forgetting - Hot-swap adapter registry with composition strategies - Training pipeline with LR schedules (Constant, Cosine, OneCycle) ## NEON Kernels (crates/ruvllm/src/kernels/) - Flash Attention 2 with online softmax - Paged Attention for KV cache efficiency - Multi-Query (MQA) and Grouped-Query (GQA) attention - RoPE with precomputed tables and NTK-aware scaling - RMSNorm and LayerNorm with batched variants - GEMV, GEMM, batched GEMM with 4x unrolling ## Real-time Optimization (crates/ruvllm/src/optimization/) - SONA-LLM with 3 learning loops (instant <1ms, background ~100ms, deep) - RealtimeOptimizer with dynamic batch sizing - KV cache pressure policies (Evict, Quantize, Reject, Spill) - Metrics collection with moving averages and histograms ## Benchmarks - 6 Criterion benchmark suites for M4 Pro profiling - Runner script with baseline comparison ## Tests - 297 total tests (171 unit + 126 integration) - Full coverage of backends, LoRA, kernels, SONA, e2e ## Recommended Models for 48GB M4 Pro - Primary: Qwen2.5-14B-Instruct (Q8, 15-25 t/s) - Fast: Mistral-7B-Instruct-v0.3 (Q8, 30-45 t/s) - Tiny: Phi-4-mini (Q4, 40-60 t/s) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>		2026-01-18 21:04:21 -05:00
..
LLM-Integration-Architecture.md	feat: Complete LLM system with Candle, MicroLoRA, NEON kernels	2026-01-18 21:04:21 -05:00
NPM_PACKAGE_ARCHITECTURE.md	feat: Phase 3 - WASM architecture with in-memory storage	2025-11-21 13:40:34 +00:00
SYSTEM_OVERVIEW.md	feat: Complete ALL Ruvector phases - production-ready vector database	2025-11-19 14:37:21 +00:00
TECHNICAL_PLAN.md	docs: Reorganize documentation and add postgres README	2025-12-02 16:45:44 +00:00