ruvector/docs
Reuven 1e358ce819 feat: Complete production LLM system with Metal GPU, streaming, speculative decoding
This commit completes the RuvLLM system with all missing production features:

## New Features

### mistral-rs Backend (mistral_backend.rs)
- PagedAttention integration for memory efficiency
- X-LoRA dynamic adapter mixing with learned routing
- ISQ runtime quantization (AWQ, GPTQ, SmoothQuant)
- 9 tests passing

### Real Model Loading (candle_backend.rs ~1,590 lines)
- GGUF quantized loading (Q4_K_M, Q4_0, Q8_0)
- Safetensors memory-mapped loading
- HuggingFace Hub auto-download
- Full generation pipeline with sampling
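
As a rough illustration of the first step of GGUF loading, the sketch below checks the file magic and reads the format version. It assumes a little-endian file; `gguf_version` is a hypothetical helper name, not a function from candle_backend.rs.

```rust
/// Hypothetical helper: returns the GGUF format version if `header`
/// begins with the 4-byte magic "GGUF", otherwise `None`.
/// Assumes the version field is a little-endian u32 right after the magic.
pub fn gguf_version(header: &[u8]) -> Option<u32> {
    // Need 4 magic bytes plus 4 version bytes.
    if header.len() < 8 || !header.starts_with(b"GGUF") {
        return None;
    }
    Some(u32::from_le_bytes([header[4], header[5], header[6], header[7]]))
}
```

A loader would typically read the first 8 bytes of the file, call this check, and only then go on to parse metadata and tensor descriptors.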

### Tokenizer Integration (tokenizer.rs)
- HuggingFace tokenizers with chat templates
- Llama3, Llama2, Mistral, Qwen/ChatML, Phi, Gemma formats
- Streaming decode with UTF-8 buffer
- Auto-detection from model ID
- 14 tests passing
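
Streaming decode needs a byte buffer because a single token can end in the middle of a multi-byte UTF-8 character. The sketch below shows one way to do this with the standard library only; `Utf8StreamBuffer` is a hypothetical type for illustration and is not taken from tokenizer.rs.

```rust
/// Hypothetical helper: accumulates raw token bytes and emits only
/// complete UTF-8 text, holding back an unfinished trailing sequence.
pub struct Utf8StreamBuffer {
    pending: Vec<u8>,
}

impl Utf8StreamBuffer {
    pub fn new() -> Self {
        Self { pending: Vec::new() }
    }

    /// Appends the bytes of one decoded token and returns all text that
    /// is now complete. Invalid bytes become U+FFFD.
    pub fn push(&mut self, bytes: &[u8]) -> String {
        self.pending.extend_from_slice(bytes);
        let mut out = String::new();
        loop {
            let err = match std::str::from_utf8(&self.pending) {
                Ok(s) => {
                    out.push_str(s);
                    None
                }
                Err(e) => Some(e),
            };
            let Some(e) = err else {
                // Everything buffered was valid and has been emitted.
                self.pending.clear();
                return out;
            };
            let valid = e.valid_up_to();
            // The prefix up to `valid` is guaranteed to be valid UTF-8.
            out.push_str(std::str::from_utf8(&self.pending[..valid]).unwrap());
            match e.error_len() {
                // Incomplete sequence at the end: keep it for the next push.
                None => {
                    self.pending.drain(..valid);
                    return out;
                }
                // Definitely invalid bytes: substitute and keep scanning.
                Some(n) => {
                    out.push('\u{FFFD}');
                    self.pending.drain(..valid + n);
                }
            }
        }
    }
}
```

Calling `push` once per generated token yields printable text as soon as it is complete, so a character split across two tokens is never shown half-decoded.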

### Metal GPU Shaders (metal/)
- Flash Attention 2 using Metal simdgroup_matrix operations
- FP16 GEMM with 2x throughput
- RMSNorm, LayerNorm
- RoPE (with YaRN scaling) and ALiBi support
- Buffer pooling with RAII scoping
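
For reference, a scalar CPU version of RMSNorm is sketched below. It is not the Metal shader from this commit; it only shows the arithmetic such a kernel computes, and `rms_norm` and its parameters are illustrative names.

```rust
/// Reference RMSNorm: y[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i].
/// Illustrative CPU sketch only, not the Metal kernel.
pub fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert!(!x.is_empty(), "input must not be empty");
    assert_eq!(x.len(), weight.len(), "weight must match input length");
    // Mean of squares over the hidden dimension.
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}
```

Unlike LayerNorm, there is no mean subtraction and no bias term, which is why the kernel needs only a single reduction per row.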

### Streaming Generation
- Real token-by-token generation
- CLI colored streaming output
- HTTP SSE for OpenAI-compatible API
- Async support via AsyncTokenStream
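
The SSE wire format used by OpenAI-style streaming endpoints is simple enough to sketch directly: each chunk is sent as a `data:` line followed by a blank line, and the stream ends with a `[DONE]` sentinel. The function names below are hypothetical and do not come from the ruvllm server code.

```rust
/// Frames one payload as a Server-Sent Event.
/// Multi-line payloads get one `data:` line per line, per the SSE format.
pub fn sse_event(payload: &str) -> String {
    let mut out = String::new();
    for line in payload.split('\n') {
        out.push_str("data: ");
        out.push_str(line);
        out.push('\n');
    }
    // A blank line terminates the event.
    out.push('\n');
    out
}

/// Terminal event used by OpenAI-compatible streaming responses.
pub fn sse_done() -> String {
    sse_event("[DONE]")
}
```

A handler would write `sse_event(json_chunk)` for every generated token and finish the response with `sse_done()`.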

### Speculative Decoding (speculative.rs ~1,119 lines)
- Adaptive lookahead (2-8 tokens)
- Tree-based speculation
- 2-3x speedup for low-temperature sampling
- 29 tests passing
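
The core of speculative decoding under greedy sampling can be sketched in a few lines: a draft model proposes several tokens, the target model scores them in one pass, and the longest agreeing prefix is accepted. The code below is a minimal sketch under that assumption; `verify_greedy` and `adapt_lookahead` are hypothetical names, and the adaptation policy is an illustrative guess rather than the one in speculative.rs.

```rust
/// Greedy verification step.
/// `draft`  : tokens proposed by the draft model.
/// `target` : the target model's greedy token at each draft position,
///            plus one extra position (so `target.len() == draft.len() + 1`).
/// Returns the accepted draft prefix followed by one target token.
pub fn verify_greedy(draft: &[u32], target: &[u32]) -> Vec<u32> {
    assert_eq!(target.len(), draft.len() + 1);
    // Count how many leading draft tokens the target agrees with.
    let accepted = draft
        .iter()
        .zip(target)
        .take_while(|(d, t)| d == t)
        .count();
    let mut out = draft[..accepted].to_vec();
    // The target's own token at the first mismatch (or the bonus token
    // when everything matched) is always valid output.
    out.push(target[accepted]);
    out
}

/// Hypothetical adaptation policy: grow the lookahead after a fully
/// accepted draft, shrink it when fewer than half were accepted, and
/// keep the result within the 2-8 token range.
pub fn adapt_lookahead(current: usize, accepted: usize) -> usize {
    let next = if accepted == current {
        current + 1
    } else if accepted * 2 < current {
        current.saturating_sub(1)
    } else {
        current
    };
    next.clamp(2, 8)
}
```

Every call to `verify_greedy` produces at least one token, so the output matches plain greedy decoding with the target model while needing fewer target forward passes when the draft is accurate.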

## Optimizations (52% attention speedup)
- 8x loop unrolling throughout
- Dual accumulator pattern for FMA latency hiding
- 64-byte aligned buffers
- Memory pooling in KV cache
- Fused A*B operations in MicroLoRA
- Fast exp polynomial approximation
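
The dual accumulator pattern is easiest to see on a dot product. The sketch below keeps two independent running sums so that consecutive fused multiply-adds do not wait on each other; it uses a stride of 2 for brevity rather than the 8x unrolling described above, and `dot_dual_acc` is an illustrative name, not a kernel from this commit.

```rust
/// Dot product with two independent accumulators.
/// Splitting the sum breaks the dependency chain between successive
/// FMAs, which lets the CPU overlap their latency.
pub fn dot_dual_acc(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let mut acc0 = 0.0f32;
    let mut acc1 = 0.0f32;
    let mut a_chunks = a.chunks_exact(2);
    let mut b_chunks = b.chunks_exact(2);
    for (ca, cb) in (&mut a_chunks).zip(&mut b_chunks) {
        // The two updates are independent of each other.
        acc0 = ca[0].mul_add(cb[0], acc0);
        acc1 = ca[1].mul_add(cb[1], acc1);
    }
    // Handle the leftover element when the length is odd.
    let tail: f32 = a_chunks
        .remainder()
        .iter()
        .zip(b_chunks.remainder())
        .map(|(x, y)| x * y)
        .sum();
    acc0 + acc1 + tail
}
```

Because the partial sums are combined in a different order than a single-accumulator loop, results can differ from the naive version in the last bits of floating-point precision.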

## Benchmark Results (All Targets Met)
- Flash Attention (256 seq): 840µs (<2ms target) 
- RMSNorm (4096 dim): 620ns (<10µs target) 
- GEMV (4096x4096): 1.36ms (<5ms target) 
- MicroLoRA forward: 2.61µs (<1ms target) 

## Documentation
- Comprehensive rustdoc on all public APIs
- Performance tables with benchmarks
- Architecture diagrams
- Usage examples

## Tests
- 307 total tests, 300 passing, 7 ignored (doc tests)
- Full coverage: backends, kernels, LoRA, SONA, speculative, e2e

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 22:06:22 -05:00
adr docs: Add comprehensive ADRs for ruvector and ruvllm architecture 2026-01-18 16:31:14 -05:00
api docs: Add Cypher reference, include Tiny Dancer, fix WASM build 2025-11-26 12:54:04 +00:00
architecture feat: Complete LLM system with Candle, MicroLoRA, NEON kernels 2026-01-18 21:04:21 -05:00
benchmarks feat(postgres): Add HNSW index and embedding functions support (#62) 2025-12-09 11:14:52 -05:00
cloud-architecture Implement global streaming optimization for 500M concurrent streams 2025-11-20 18:51:26 +00:00
dag docs(dag): add comprehensive Neural DAG Learning implementation plan 2025-12-29 22:15:55 +00:00
development feat(micro-hnsw-wasm): Add Neuromorphic HNSW v2.3 with SNN Integration (#40) 2025-12-01 22:30:15 -05:00
examples feat(nervous-system): Complete bio-inspired neural architecture implementation 2025-12-28 04:05:08 +00:00
gnn feat(micro-hnsw-wasm): Add Neuromorphic HNSW v2.3 with SNN Integration (#40) 2025-12-01 22:30:15 -05:00
guides feat(postgres): Add HNSW index and embedding functions support (#62) 2025-12-09 11:14:52 -05:00
hnsw docs: Reorganize documentation and add postgres README 2025-12-02 16:45:44 +00:00
hooks feat(cli): Implement full hooks system in Rust CLI 2025-12-27 01:08:36 +00:00
implementation feat(micro-hnsw-wasm): Add Neuromorphic HNSW v2.3 with SNN Integration (#40) 2025-12-01 22:30:15 -05:00
integration feat(micro-hnsw-wasm): Add Neuromorphic HNSW v2.3 with SNN Integration (#40) 2025-12-01 22:30:15 -05:00
nervous-system feat(nervous-system): Complete bio-inspired neural architecture implementation 2025-12-28 04:05:08 +00:00
optimization chore(docs): Clean up and reorganize documentation structure 2025-12-25 19:39:44 +00:00
plans/subpolynomial-time-mincut chore(docs): Clean up and reorganize documentation structure 2025-12-25 19:39:44 +00:00
postgres Feat/ruvector postgres v2 (#82) 2025-12-25 17:02:55 -05:00
project-phases Clean up repository structure and organize documentation 2025-11-20 19:50:03 +00:00
publishing feat: Implement all 6 ADRs for ruvector and ruvllm optimization 2026-01-18 16:52:15 -05:00
research chore(docs): Clean up and reorganize documentation structure 2025-12-25 19:39:44 +00:00
ruvllm feat: Complete production LLM system with Metal GPU, streaming, speculative decoding 2026-01-18 22:06:22 -05:00
sparse-inference feat: Add PowerInfer-style sparse inference engine with precision lanes (#106) 2026-01-04 23:40:31 -05:00
sql feat(postgres): Add ruvector-postgres extension with SIMD optimizations (#42) 2025-12-02 09:55:07 -05:00
testing Clean up repository structure and organize documentation 2025-11-20 19:50:03 +00:00
.gitkeep Clean up repository structure and organize documentation 2025-11-20 19:50:03 +00:00
algorithmic-optimization-analysis.md docs: Add performance optimization analysis reports 2025-12-26 17:41:13 +00:00
BENCHMARK_RESULTS.md docs: Add comprehensive benchmark results and CI script 2026-01-18 17:01:06 -05:00
BTSP_IMPLEMENTATION.md feat(nervous-system): Complete bio-inspired neural architecture implementation 2025-12-28 04:05:08 +00:00
code-review-mincut-gated-transformer.md fix(security): Critical security and performance improvements 2025-12-26 16:25:02 +00:00
dendrite-implementation-summary.md feat(nervous-system): Complete bio-inspired neural architecture implementation 2025-12-28 04:05:08 +00:00
exotic-neural-trader-code-review.md docs: add neural-trader code review and performance analysis reports 2025-12-31 02:56:08 +00:00
INDEX.md chore(docs): Clean up and reorganize documentation structure 2025-12-25 19:39:44 +00:00
LLM_BENCHMARK_RESULTS.md feat: Complete production LLM system with Metal GPU, streaming, speculative decoding 2026-01-18 22:06:22 -05:00
mincut-transformer-memory-optimization-analysis.md docs: Add performance optimization analysis reports 2025-12-26 17:41:13 +00:00
nervous-system-eventbus-summary.md feat(nervous-system): Complete bio-inspired neural architecture implementation 2025-12-28 04:05:08 +00:00
neural-trader-performance-analysis.md docs: add neural-trader code review and performance analysis reports 2025-12-31 02:56:08 +00:00
plaid-bottleneck-summary.md fix(security): Address critical security and performance issues 2026-01-01 18:36:58 +00:00
plaid-optimization-guide.md fix(security): Address critical security and performance issues 2026-01-01 18:36:58 +00:00
plaid-performance-analysis.md fix(security): Address critical security and performance issues 2026-01-01 18:36:58 +00:00
qudag-token-implementation.md feat(dag): implement Neural Self-Learning DAG with QuDAG integration 2025-12-29 22:58:43 +00:00
README.md Clean up repository structure and organize documentation 2025-11-20 19:50:03 +00:00
REPO_STRUCTURE.md feat(micro-hnsw-wasm): Add Neuromorphic HNSW v2.3 with SNN Integration (#40) 2025-12-01 22:30:15 -05:00
security-audit-fpga-transformer.md feat: Add FPGA Transformer backend crates (#105) 2026-01-04 18:59:02 -05:00
SECURITY_AUDIT.md feat: Implement all 6 ADRs for ruvector and ruvllm optimization 2026-01-18 16:52:15 -05:00
simd-optimization-analysis.md docs: Add performance optimization analysis reports 2025-12-26 17:41:13 +00:00
SPECULATIVE_DECODING.md docs(mincut-transformer): Add examples and documentation for SOTA features 2025-12-26 19:55:06 +00:00
workspace-implementation-summary.md feat(nervous-system): Complete bio-inspired neural architecture implementation 2025-12-28 04:05:08 +00:00
zk_security_audit_report.md fix(security): Address critical security and performance issues 2026-01-01 18:36:58 +00:00

RuVector Documentation

Complete documentation for RuVector, a high-performance Rust vector database with global-scale capabilities.

📚 Documentation Structure

Getting Started

Quick start guides and tutorials for new users:

Architecture & Design

System architecture and design documentation:

API Reference

API documentation for different platforms:

User Guides

Comprehensive user guides:

Performance & Optimization

Performance tuning and benchmarking:

Development

Contributing and development guides:

Testing

Testing documentation and reports:

Project History

Historical project phase documentation:

Implementation Summary


For New Users

  1. Start with Getting Started Guide
  2. Try the Basic Tutorial
  3. Review API Documentation

For Cloud Deployment

  1. Read Architecture Overview
  2. Follow Deployment Guide
  3. Apply Performance Optimizations

For Contributors

  1. Read Contributing Guidelines
  2. Review Technical Plan
  3. Check Migration Guide

For Performance Tuning

  1. Review Optimization Guide
  2. Run Benchmarks
  3. Apply Query Optimizations

📊 Documentation Status

| Category | Files | Status |
|---|---|---|
| Getting Started | 7 | Complete |
| Architecture | 11 | Complete |
| API Reference | 2 | Complete |
| User Guides | 4 | Complete |
| Optimization | 4 | Complete |
| Development | 3 | Complete |
| Testing | 2 | Complete |
| Project Phases | 8 | 📚 Historical |

Total Documentation: 40+ comprehensive documents


🔗 External Resources


Last Updated: 2025-11-20 | Version: 0.1.0 | Status: Production Ready