mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-24 22:15:18 +00:00
Key optimizations in v0.1.31:

- W2 matrix stored transposed for contiguous row access during sparse accumulation
- SIMD GELU/SiLU using AVX2+FMA polynomial approximations
- Cached SIMD feature detection with OnceLock (eliminates runtime CPUID calls)
- SIMD axpy for vectorized weight accumulation

Benchmark results (512 input, 2048 hidden):

- 10% active: 130µs (83% reduction, 52× vs dense)
- 30% active: 383µs (83% reduction, 18× vs dense)
- 50% active: 651µs (83% reduction, 10× vs dense)
- 70% active: 912µs (83% reduction, 7× vs dense)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

| Name |
|---|
| darwin-arm64 |
| darwin-x64 |
| linux-arm64-gnu |
| linux-x64-gnu |
| linux-x64-musl |
| win32-arm64-msvc |
| win32-x64-msvc |
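
The optimizations named in the commit message can be illustrated with a minimal Rust sketch. This is not the repository's code: the function names (`has_avx2_fma`, `axpy`, `sparse_output`), the flat row-major layout of the transposed matrix, and the scalar loop standing in for the AVX2+FMA kernel are all assumptions made for illustration. It shows feature detection cached in a `OnceLock` and a sparse second-layer pass that, because W2 is stored transposed, reads one contiguous row per active hidden neuron and accumulates it with an axpy.

```rust
use std::sync::OnceLock;

// Hypothetical cache for the CPU feature check; filled on first use.
static HAS_AVX2_FMA: OnceLock<bool> = OnceLock::new();

#[cfg(target_arch = "x86_64")]
fn detect_avx2_fma() -> bool {
    is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma")
}

#[cfg(not(target_arch = "x86_64"))]
fn detect_avx2_fma() -> bool {
    false
}

/// Runs the detection once; later calls return the cached result.
pub fn has_avx2_fma() -> bool {
    *HAS_AVX2_FMA.get_or_init(detect_avx2_fma)
}

/// y += a * x. Scalar stand-in for a SIMD axpy kernel (assumption:
/// a real implementation would dispatch on `has_avx2_fma()`).
pub fn axpy(a: f32, x: &[f32], y: &mut [f32]) {
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi += a * xi;
    }
}

/// Sparse second layer. `w2_t` is assumed to hold W2 transposed in
/// row-major order with shape [hidden_dim][out_dim], so the weights
/// of hidden neuron `j` form the contiguous slice
/// `w2_t[j * out_dim..(j + 1) * out_dim]`.
pub fn sparse_output(
    active: &[usize],
    hidden: &[f32],
    w2_t: &[f32],
    out_dim: usize,
) -> Vec<f32> {
    let mut out = vec![0.0f32; out_dim];
    for &j in active {
        let row = &w2_t[j * out_dim..(j + 1) * out_dim];
        axpy(hidden[j], row, &mut out);
    }
    out
}
```

With the untransposed layout, the same pass would read one strided column per active neuron. The transposed layout trades that for a sequential read, which is the access pattern a vectorized axpy needs.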