ruvector/crates/ruvector-attention-node/npm
rUv 253faf3902 perf(sparse-inference): 6x speedup with W2 transpose and SIMD activations
Key optimizations in v0.1.31:
- W2 matrix stored transposed for contiguous row access during sparse accumulation
- SIMD GELU/SiLU using AVX2+FMA polynomial approximations
- Cached SIMD feature detection with OnceLock (eliminates repeated runtime CPUID calls)
- SIMD axpy for vectorized weight accumulation
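A minimal sketch of how three of these optimizations could fit together, assuming an x86_64 target with a scalar fallback elsewhere. All names here (`has_avx2_fma`, `axpy`, `axpy_avx2`, `sparse_accumulate`) and the flat row-major layout of the transposed W2 are hypothetical and chosen for illustration; this is not the crate's actual code, and the SIMD GELU/SiLU approximations are not shown.

```rust
use std::sync::OnceLock;

// Runtime check for AVX2 + FMA; compiled only on x86_64.
#[cfg(target_arch = "x86_64")]
fn detect_avx2_fma() -> bool {
    std::arch::is_x86_feature_detected!("avx2") && std::arch::is_x86_feature_detected!("fma")
}

// Other architectures always take the scalar path in this sketch.
#[cfg(not(target_arch = "x86_64"))]
fn detect_avx2_fma() -> bool {
    false
}

/// Hypothetical helper: run detection once and cache the result in a OnceLock.
fn has_avx2_fma() -> bool {
    static CACHED: OnceLock<bool> = OnceLock::new();
    *CACHED.get_or_init(detect_avx2_fma)
}

/// AVX2+FMA axpy: y[i] += a * x[i], eight lanes at a time, scalar tail.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn axpy_avx2(a: f32, x: &[f32], y: &mut [f32]) {
    use std::arch::x86_64::*;
    let n = x.len().min(y.len());
    let va = _mm256_set1_ps(a);
    let mut i = 0;
    while i + 8 <= n {
        let vx = _mm256_loadu_ps(x.as_ptr().add(i));
        let vy = _mm256_loadu_ps(y.as_ptr().add(i));
        // fmadd computes va * vx + vy in one instruction.
        _mm256_storeu_ps(y.as_mut_ptr().add(i), _mm256_fmadd_ps(va, vx, vy));
        i += 8;
    }
    while i < n {
        y[i] += a * x[i];
        i += 1;
    }
}

/// Dispatching axpy: uses the SIMD kernel when available, else a scalar loop.
fn axpy(a: f32, x: &[f32], y: &mut [f32]) {
    #[cfg(target_arch = "x86_64")]
    {
        if has_avx2_fma() {
            // SAFETY: AVX2 and FMA support was verified at runtime.
            unsafe { axpy_avx2(a, x, y) };
            return;
        }
    }
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi += a * xi;
    }
}

/// Sparse second-layer accumulation over active hidden neurons only.
/// `w2_t` is assumed to hold W2 transposed as `hidden x out_dim`, row-major,
/// so the weights for hidden neuron `j` form one contiguous slice.
fn sparse_accumulate(w2_t: &[f32], out_dim: usize, active: &[(usize, f32)], y: &mut [f32]) {
    for &(j, h) in active {
        let row = &w2_t[j * out_dim..(j + 1) * out_dim];
        axpy(h, row, y);
    }
}
```

With the transposed layout, each active neuron contributes one contiguous row, which is what lets a plain axpy do the accumulation; in the untransposed layout the same weights would be strided across memory.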

Benchmark results (512 input, 2048 hidden):
- 10% active: 130µs (83% reduction, 52× vs dense)
- 30% active: 383µs (83% reduction, 18× vs dense)
- 50% active: 651µs (83% reduction, 10× vs dense)
- 70% active: 912µs (83% reduction, 7× vs dense)
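
The "83% reduction" in each row and the "6x speedup" in the commit subject appear to describe the same improvement. Assuming the reduction is measured against the sparse kernel's time before this commit (the message does not name the baseline), the two are related by:

```latex
\text{speedup} = \frac{t_{\text{old}}}{t_{\text{new}}}
               = \frac{1}{1 - 0.83} \approx 5.9 \approx 6\times
```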

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 05:07:42 +00:00
Name              Last commit                                                                   Date
darwin-arm64      feat: Export all 39 attention mechanisms and utilities                        2025-11-30 22:23:21 +00:00
darwin-x64        perf(sparse-inference): 6x speedup with W2 transpose and SIMD activations     2026-01-05 05:07:42 +00:00
linux-arm64-gnu   feat: Export all 39 attention mechanisms and utilities                        2025-11-30 22:23:21 +00:00
linux-x64-gnu     perf(sparse-inference): 6x speedup with W2 transpose and SIMD activations     2026-01-05 05:07:42 +00:00
linux-x64-musl    fix: Fix PQ integration test failures and add v0.1.18 release                 2025-11-30 20:45:43 +00:00
win32-arm64-msvc  fix: Fix PQ integration test failures and add v0.1.18 release                 2025-11-30 20:45:43 +00:00
win32-x64-msvc    perf(sparse-inference): 6x speedup with W2 transpose and SIMD activations     2026-01-05 05:07:42 +00:00