mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-24 13:54:31 +00:00
Key optimizations in v0.1.31:

- W2 matrix stored transposed for contiguous row access during sparse accumulation
- SIMD GELU/SiLU using AVX2+FMA polynomial approximations
- Cached SIMD feature detection with OnceLock (eliminates runtime CPUID calls)
- SIMD axpy for vectorized weight accumulation

Benchmark results (512 input, 2048 hidden):

- 10% active: 130µs (83% reduction, 52× vs dense)
- 30% active: 383µs (83% reduction, 18× vs dense)
- 50% active: 651µs (83% reduction, 10× vs dense)
- 70% active: 912µs (83% reduction, 7× vs dense)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
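The transposed-W2, cached-detection, and axpy items in the commit message can be illustrated with a minimal sketch. This is not the crate's actual code: the names `SimdLevel`, `simd_level`, `axpy`, and `sparse_output` are hypothetical, the W2 layout `[hidden_dim][out_dim]` is an assumption about what "stored transposed" means here, and the `axpy` shown is a plain scalar loop standing in for the AVX2+FMA kernel.

```rust
use std::sync::OnceLock;

/// SIMD tiers distinguished by this sketch (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SimdLevel {
    Scalar,
    Avx2Fma,
}

/// Detects CPU features on the first call and returns the cached
/// result on every later call.
pub fn simd_level() -> SimdLevel {
    static LEVEL: OnceLock<SimdLevel> = OnceLock::new();
    *LEVEL.get_or_init(detect)
}

#[cfg(target_arch = "x86_64")]
fn detect() -> SimdLevel {
    if std::is_x86_feature_detected!("avx2") && std::is_x86_feature_detected!("fma") {
        SimdLevel::Avx2Fma
    } else {
        SimdLevel::Scalar
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn detect() -> SimdLevel {
    SimdLevel::Scalar
}

/// Computes y += a * x element-wise. Scalar stand-in for a SIMD axpy;
/// a real kernel would dispatch on `simd_level()`.
pub fn axpy(a: f32, x: &[f32], y: &mut [f32]) {
    assert_eq!(x.len(), y.len());
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi += a * xi;
    }
}

/// Second-layer output of a sparse feed-forward block.
///
/// `w2_t` is assumed to be row-major with shape [hidden_dim][out_dim],
/// so the weights of hidden neuron `j` form one contiguous slice.
/// Only the neurons listed in `active` contribute to the output.
pub fn sparse_output(
    hidden: &[f32],
    active: &[usize],
    w2_t: &[f32],
    out_dim: usize,
) -> Vec<f32> {
    let mut out = vec![0.0f32; out_dim];
    for &j in active {
        let row = &w2_t[j * out_dim..(j + 1) * out_dim];
        axpy(hidden[j], row, &mut out);
    }
    out
}
```

With this layout, skipping an inactive neuron skips one whole contiguous row, so the work scales with the number of active neurons instead of with the full hidden width.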
| Name |
|---|
| backend |
| integration |
| model |
| pi |
| precision |
| predictor |
| sparse |
| config.rs |
| error.rs |
| lib.rs |
| memory.rs |
| ops.rs |