ruvector/crates/ruvector-sparse-inference/src
rUv 253faf3902 perf(sparse-inference): 6x speedup with W2 transpose and SIMD activations
Key optimizations in v0.1.31:
- W2 matrix stored transposed for contiguous row access during sparse accumulation
- SIMD GELU/SiLU using AVX2+FMA polynomial approximations
- Cached SIMD feature detection with OnceLock (eliminates runtime CPUID calls)
- SIMD axpy for vectorized weight accumulation

Benchmark results (512 input, 2048 hidden):
- 10% active: 130µs (83% reduction, 52× vs dense)
- 30% active: 383µs (83% reduction, 18× vs dense)
- 50% active: 651µs (83% reduction, 10× vs dense)
- 70% active: 912µs (83% reduction, 7× vs dense)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 05:07:42 +00:00
..
backend perf(sparse-inference): 6x speedup with W2 transpose and SIMD activations 2026-01-05 05:07:42 +00:00
integration feat: Add PowerInfer-style sparse inference engine with precision lanes (#106) 2026-01-04 23:40:31 -05:00
model feat: Add PowerInfer-style sparse inference engine with precision lanes (#106) 2026-01-04 23:40:31 -05:00
pi feat: Add PowerInfer-style sparse inference engine with precision lanes (#106) 2026-01-04 23:40:31 -05:00
precision feat: Add PowerInfer-style sparse inference engine with precision lanes (#106) 2026-01-04 23:40:31 -05:00
predictor feat: Add PowerInfer-style sparse inference engine with precision lanes (#106) 2026-01-04 23:40:31 -05:00
sparse perf(sparse-inference): 6x speedup with W2 transpose and SIMD activations 2026-01-05 05:07:42 +00:00
config.rs feat: Add PowerInfer-style sparse inference engine with precision lanes (#106) 2026-01-04 23:40:31 -05:00
error.rs feat: Add PowerInfer-style sparse inference engine with precision lanes (#106) 2026-01-04 23:40:31 -05:00
lib.rs feat: Add PowerInfer-style sparse inference engine with precision lanes (#106) 2026-01-04 23:40:31 -05:00
memory.rs feat: Add PowerInfer-style sparse inference engine with precision lanes (#106) 2026-01-04 23:40:31 -05:00
ops.rs feat: Add PowerInfer-style sparse inference engine with precision lanes (#106) 2026-01-04 23:40:31 -05:00