ruvector/crates/ruvector-core/examples
Reuven 48304e7a11 feat: Add ARM NEON SIMD optimizations for Apple Silicon (M1/M2/M3/M4)
Performance improvements on Apple Silicon M4 Pro:
- Euclidean distance: 2.96x faster
- Dot product: 3.09x faster
- Cosine similarity: 5.96x faster

Changes:
- Add NEON implementations using std::arch::aarch64 intrinsics
- Use vfmaq_f32 (fused multiply-add) for better accuracy and performance
- Use vaddvq_f32 for efficient horizontal sum
- Add Manhattan distance SIMD implementation
- Update public API with architecture-dispatching _simd functions
- Maintain backward compatibility with _avx2 function aliases
- Add comprehensive tests for SIMD correctness
- Add NEON benchmark example
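
As an illustration of the NEON changes listed above, here is a minimal sketch of a dot product built on `vfmaq_f32` and `vaddvq_f32`. It is not the crate's actual code: the function names (`dot_product_scalar`, `dot_product_neon`, `dot_product`) are hypothetical, and the loop processes one 4-lane vector per iteration without the unrolling a tuned kernel would likely use.

```rust
/// Scalar reference implementation (hypothetical name); used as the
/// fallback on targets other than aarch64.
pub fn dot_product_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// NEON sketch (hypothetical name): accumulate four lanes at a time with
/// fused multiply-add, then reduce the lanes with a horizontal sum.
#[cfg(target_arch = "aarch64")]
pub fn dot_product_neon(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::aarch64::{vaddvq_f32, vdupq_n_f32, vfmaq_f32, vld1q_f32};

    assert_eq!(a.len(), b.len());
    let chunks = a.len() / 4;

    // SAFETY: every load reads 4 floats starting at index i * 4, and
    // i * 4 + 3 < chunks * 4 <= a.len() == b.len().
    let mut sum = unsafe {
        let mut acc = vdupq_n_f32(0.0);
        for i in 0..chunks {
            let va = vld1q_f32(a.as_ptr().add(i * 4));
            let vb = vld1q_f32(b.as_ptr().add(i * 4));
            // acc = acc + va * vb, rounded once per lane
            acc = vfmaq_f32(acc, va, vb);
        }
        // Horizontal sum of the four accumulator lanes
        vaddvq_f32(acc)
    };

    // Scalar tail for lengths that are not a multiple of 4
    for i in chunks * 4..a.len() {
        sum += a[i] * b[i];
    }
    sum
}

/// Compile-time selection (hypothetical name): NEON on aarch64.
#[cfg(target_arch = "aarch64")]
pub fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    dot_product_neon(a, b)
}

/// Compile-time selection (hypothetical name): scalar everywhere else.
#[cfg(not(target_arch = "aarch64"))]
pub fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    dot_product_scalar(a, b)
}
```

Because the SIMD path sums in a different order than the scalar loop, results can differ in the last few bits, so correctness tests for such kernels usually compare against the scalar version with a small tolerance rather than exact equality.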

The SIMD functions now automatically dispatch:
- x86_64: AVX2 (with runtime detection)
- aarch64: NEON (baseline on aarch64, including Apple Silicon; no runtime detection needed)
- Other: Scalar fallback
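
The dispatch scheme above could be sketched roughly as follows, using Euclidean distance as the example. This is an assumption-laden illustration, not the crate's real API: `euclidean_simd`, `euclidean_avx2`, `euclidean_neon`, and `euclidean_scalar` are hypothetical names, and the AVX2 kernel deliberately uses separate multiply and add so that it depends only on the `avx2` feature.

```rust
/// Scalar fallback (hypothetical name).
pub fn euclidean_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// AVX2 kernel (hypothetical name). Callers must verify AVX2 support first.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn euclidean_avx2(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::*;
    let chunks = a.len() / 8;
    // SAFETY: each unaligned load/store touches 8 in-bounds floats.
    unsafe {
        let mut acc = _mm256_setzero_ps();
        for i in 0..chunks {
            let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
            let d = _mm256_sub_ps(va, vb);
            acc = _mm256_add_ps(acc, _mm256_mul_ps(d, d));
        }
        // Spill the 8 lanes and reduce them in scalar code
        let mut lanes = [0.0f32; 8];
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
        let mut sum: f32 = lanes.iter().sum();
        // Scalar tail for lengths that are not a multiple of 8
        for i in chunks * 8..a.len() {
            let d = a[i] - b[i];
            sum += d * d;
        }
        sum.sqrt()
    }
}

/// NEON kernel (hypothetical name).
#[cfg(target_arch = "aarch64")]
fn euclidean_neon(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::aarch64::{vaddvq_f32, vdupq_n_f32, vfmaq_f32, vld1q_f32, vsubq_f32};
    let chunks = a.len() / 4;
    // SAFETY: each load reads 4 in-bounds floats.
    let mut sum = unsafe {
        let mut acc = vdupq_n_f32(0.0);
        for i in 0..chunks {
            let va = vld1q_f32(a.as_ptr().add(i * 4));
            let vb = vld1q_f32(b.as_ptr().add(i * 4));
            let d = vsubq_f32(va, vb);
            acc = vfmaq_f32(acc, d, d); // acc += d * d
        }
        vaddvq_f32(acc)
    };
    for i in chunks * 4..a.len() {
        let d = a[i] - b[i];
        sum += d * d;
    }
    sum.sqrt()
}

/// x86_64: choose AVX2 at runtime, otherwise fall back to scalar.
#[cfg(target_arch = "x86_64")]
pub fn euclidean_simd(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    if is_x86_feature_detected!("avx2") {
        // SAFETY: AVX2 support was confirmed on the line above.
        unsafe { euclidean_avx2(a, b) }
    } else {
        euclidean_scalar(a, b)
    }
}

/// aarch64: NEON is selected at compile time.
#[cfg(target_arch = "aarch64")]
pub fn euclidean_simd(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    euclidean_neon(a, b)
}

/// Any other architecture: scalar fallback.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
pub fn euclidean_simd(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    euclidean_scalar(a, b)
}
```

Only the x86_64 branch needs a runtime check in this sketch, since a binary compiled for x86_64 may run on CPUs without AVX2, whereas the aarch64 branch is resolved entirely by `cfg` at compile time.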

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 13:14:32 -05:00
embeddings_example.rs fix(ci): Fix formatting and workflow permission issues 2025-12-26 22:11:57 +00:00
neon_benchmark.rs feat: Add ARM NEON SIMD optimizations for Apple Silicon (M1/M2/M3/M4) 2026-01-18 13:14:32 -05:00