ruvector/docs/research/cnn/RUVECTOR_INTEGRATION.md
2026-03-11 17:41:53 -04:00


RuVector CNN Integration Architecture

Executive Summary

This document outlines the integration architecture for CNN-based feature extraction with RuVector's vector similarity search capabilities. The goal is to provide end-to-end image embedding generation and retrieval using SIMD-optimized Rust implementations.

Integration Overview

Architecture Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                         RuVector CNN Pipeline                       │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌──────────┐    ┌───────────────┐    ┌──────────────┐              │
│  │  Image   │───▶│ Preprocessor  │───▶│    CNN       │              │
│  │  Input   │    │ (Resize/Norm) │    │  Backbone    │              │
│  └──────────┘    └───────────────┘    └──────┬───────┘              │
│                                              │                       │
│                                              ▼                       │
│  ┌──────────┐    ┌───────────────┐    ┌──────────────┐              │
│  │ RuVector │◀───│  L2 Normalize │◀───│  Embedding   │              │
│  │  Index   │    │               │    │  Extraction  │              │
│  └──────────┘    └───────────────┘    └──────────────┘              │
│       │                                                              │
│       ▼                                                              │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │                    Vector Search (HNSW/IVF)                   │   │
│  │  • Approximate Nearest Neighbors                             │   │
│  │  • Cosine Similarity (via inner product on normalized vecs)  │   │
│  │  • Filtering and Re-ranking                                  │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘
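The diagram relies on the fact that, once embeddings are L2-normalized, cosine similarity reduces to a plain inner product. A minimal sketch of that step follows; the function names are illustrative and are not taken from `ruvector-core`.

```rust
/// Scale `v` in place to unit L2 norm. Zero vectors are left unchanged.
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// Plain inner product of two equal-length slices.
fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same length");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine similarity computed directly, for comparison.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let na = inner_product(a, a).sqrt();
    let nb = inner_product(b, b).sqrt();
    inner_product(a, b) / (na * nb)
}
```

Normalizing once at indexing time means the search path only needs the cheaper inner product.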

Module Structure

Proposed Directory Layout

ruvector/
├── crates/
│   └── ruvector-cnn/
│       ├── Cargo.toml
│       ├── src/
│       │   ├── lib.rs
│       │   ├── backbone/
│       │   │   ├── mod.rs
│       │   │   ├── mobilenet.rs
│       │   │   ├── efficientnet.rs
│       │   │   └── shufflenet.rs
│       │   ├── layers/
│       │   │   ├── mod.rs
│       │   │   ├── conv.rs
│       │   │   ├── batchnorm.rs
│       │   │   ├── pooling.rs
│       │   │   └── activation.rs
│       │   ├── simd/
│       │   │   ├── mod.rs
│       │   │   ├── avx2.rs
│       │   │   ├── avx512.rs
│       │   │   └── neon.rs
│       │   ├── quantization/
│       │   │   ├── mod.rs
│       │   │   ├── int8.rs
│       │   │   └── calibration.rs
│       │   ├── contrastive/
│       │   │   ├── mod.rs
│       │   │   ├── infonce.rs
│       │   │   └── triplet.rs
│       │   └── embedding.rs
│       └── tests/
│           └── integration.rs

Cargo.toml

[package]
name = "ruvector-cnn"
version = "0.1.0"
edition = "2021"
description = "CNN feature extraction for RuVector"

[features]
default = ["avx2"]
avx2 = []
avx512 = []
neon = []
quantized = []
onnx = ["tract-onnx"]

[dependencies]
ndarray = { version = "0.16", features = ["blas"] }
ruvector-core = { path = "../ruvector-core" }

# Optional: ONNX model loading
tract-onnx = { version = "0.21", optional = true }

# Image preprocessing
image = { version = "0.25", default-features = false, features = ["jpeg", "png"] }

# Parallelism
rayon = "1.8"

[dev-dependencies]
criterion = "0.5"
rand = "0.8"

Core Interfaces

Embedding Trait

// src/embedding.rs

use ndarray::Array2;
use ruvector_core::Vector;

/// Trait for models that extract embeddings from images
pub trait EmbeddingExtractor: Send + Sync {
    /// Dimension of output embedding
    fn embedding_dim(&self) -> usize;

    /// Extract embedding from a single image
    fn extract(&self, image: &[u8], width: u32, height: u32) -> Result<Vector, CnnError>;

    /// Batch extraction for efficiency
    fn extract_batch(
        &self,
        images: &[(&[u8], u32, u32)],
    ) -> Result<Vec<Vector>, CnnError> {
        images.iter()
            .map(|(data, w, h)| self.extract(data, *w, *h))
            .collect()
    }
}

/// Configuration for embedding extraction
#[derive(Clone, Debug)]
pub struct EmbeddingConfig {
    /// Input image size (square)
    pub input_size: u32,
    /// Normalize embeddings to unit length
    pub normalize: bool,
    /// Embedding dimension
    pub embedding_dim: usize,
    /// Use INT8 quantization
    pub quantized: bool,
}

impl Default for EmbeddingConfig {
    fn default() -> Self {
        Self {
            input_size: 224,
            normalize: true,
            embedding_dim: 512,
            quantized: false,
        }
    }
}
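The `extract` method receives raw pixel bytes, so some preprocessing has to happen before the backbone runs. The sketch below shows one plausible normalization step, assuming interleaved RGB8 input that has already been resized to `input_size`. The mean and standard deviation are the values commonly used with ImageNet-pretrained models; whether `ruvector-cnn` uses them is an assumption, and `rgb8_to_chw` is a hypothetical helper.

```rust
/// Per-channel statistics commonly used with ImageNet-pretrained models
/// (an assumption, not taken from ruvector-cnn).
const MEAN: [f32; 3] = [0.485, 0.456, 0.406];
const STD: [f32; 3] = [0.229, 0.224, 0.225];

/// Convert interleaved RGB8 pixels (HWC) into a planar CHW f32 buffer,
/// scaling to [0, 1] and then normalizing each channel.
/// Returns None if the buffer length does not match width * height * 3.
fn rgb8_to_chw(pixels: &[u8], width: usize, height: usize) -> Option<Vec<f32>> {
    let plane = width * height;
    if pixels.len() != plane * 3 {
        return None;
    }
    let mut out = vec![0.0f32; plane * 3];
    for (i, px) in pixels.chunks_exact(3).enumerate() {
        for c in 0..3 {
            let scaled = px[c] as f32 / 255.0;
            out[c * plane + i] = (scaled - MEAN[c]) / STD[c];
        }
    }
    Some(out)
}
```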

CNN Backbone Interface

// src/backbone/mod.rs

use ndarray::Array4;

/// Trait for CNN backbones
pub trait Backbone: Send + Sync {
    /// Forward pass returning feature maps
    fn forward(&self, input: &Array4<f32>) -> Array4<f32>;

    /// Output feature dimension (channels)
    fn output_channels(&self) -> usize;

    /// Expected input size
    fn input_size(&self) -> (usize, usize);
}

/// Factory for creating backbones
pub enum BackboneType {
    MobileNetV3Small,
    MobileNetV3Large,
    ShuffleNetV2_0_5,
    ShuffleNetV2_1_0,
    EfficientNetB0,
}

pub fn create_backbone(
    backbone_type: BackboneType,
    weights_path: Option<&str>,
) -> Result<Box<dyn Backbone>, CnnError> {
    match backbone_type {
        BackboneType::MobileNetV3Small => {
            Ok(Box::new(MobileNetV3::small(weights_path)?))
        }
        BackboneType::MobileNetV3Large => {
            Ok(Box::new(MobileNetV3::large(weights_path)?))
        }
        BackboneType::ShuffleNetV2_0_5 => {
            Ok(Box::new(ShuffleNetV2::new(0.5, weights_path)?))
        }
        BackboneType::ShuffleNetV2_1_0 => {
            Ok(Box::new(ShuffleNetV2::new(1.0, weights_path)?))
        }
        BackboneType::EfficientNetB0 => {
            Ok(Box::new(EfficientNet::b0(weights_path)?))
        }
    }
}
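`Backbone::forward` returns feature maps rather than a flat embedding, so a pooling step is needed in between. A common choice is global average pooling, sketched here on a single CHW feature map stored as a flat slice. The flat layout and the function signature are assumptions made to keep the example self-contained.

```rust
/// Global average pooling over one CHW feature map: each channel's
/// H * W activations collapse to a single mean value.
fn global_avg_pool(features: &[f32], channels: usize, height: usize, width: usize) -> Vec<f32> {
    let plane = height * width;
    assert_eq!(features.len(), channels * plane, "shape mismatch");
    assert!(plane > 0, "spatial size must be non-zero");
    features
        .chunks_exact(plane)
        .map(|ch| ch.iter().sum::<f32>() / plane as f32)
        .collect()
}
```

The output length equals `output_channels()`, which is why that value can double as the embedding dimension when no projection head follows.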

Integration with RuVector Index

Vector Index Integration

// Integration with ruvector-core

use ruvector_core::{Index, Vector, SearchResult};
use ruvector_cnn::{EmbeddingExtractor, MobileNetEmbedder};

/// Image index combining CNN embeddings with vector search
pub struct ImageIndex {
    embedder: Box<dyn EmbeddingExtractor>,
    index: Index,
    metadata: Vec<ImageMetadata>,
}

pub struct ImageMetadata {
    pub id: u64,
    pub path: String,
    pub size: (u32, u32),
    pub format: ImageFormat,
}

impl ImageIndex {
    pub fn new(config: ImageIndexConfig) -> Result<Self, Error> {
        let embedder = create_embedder(&config)?;
        let index = Index::new(IndexConfig {
            dimension: embedder.embedding_dim(),
            metric: Metric::InnerProduct, // Cosine on normalized vectors
            index_type: config.index_type,
        })?;

        Ok(Self {
            embedder,
            index,
            metadata: Vec::new(),
        })
    }

    /// Add image to index
    pub fn add_image(
        &mut self,
        id: u64,
        image_data: &[u8],
        width: u32,
        height: u32,
        path: String,
    ) -> Result<(), Error> {
        // Extract embedding
        let embedding = self.embedder.extract(image_data, width, height)?;

        // Add to vector index
        self.index.add(id, embedding)?;

        // Store metadata
        self.metadata.push(ImageMetadata {
            id,
            path,
            size: (width, height),
            format: detect_format(image_data),
        });

        Ok(())
    }

    /// Search for similar images
    pub fn search(
        &self,
        query_image: &[u8],
        width: u32,
        height: u32,
        k: usize,
    ) -> Result<Vec<ImageSearchResult>, Error> {
        // Extract query embedding
        let query_embedding = self.embedder.extract(query_image, width, height)?;

        // Vector search
        let results = self.index.search(&query_embedding, k)?;

        // Attach metadata
        Ok(results
            .into_iter()
            .map(|r| ImageSearchResult {
                id: r.id,
                score: r.score,
                metadata: self.get_metadata(r.id),
            })
            .collect())
    }

    /// Batch add for efficiency
    pub fn add_images_batch(
        &mut self,
        images: &[(u64, &[u8], u32, u32, String)],
    ) -> Result<(), Error> {
        use rayon::prelude::*;

        // Parallel embedding extraction
        let embeddings: Vec<_> = images
            .par_iter()
            .map(|(_, data, w, h, _)| {
                self.embedder.extract(data, *w, *h)
            })
            .collect::<Result<Vec<_>, _>>()?;

        // Batch add to index
        for ((id, data, w, h, path), embedding) in images.iter().zip(embeddings) {
            self.index.add(*id, embedding)?;
            self.metadata.push(ImageMetadata {
                id: *id,
                path: path.clone(),
                size: (*w, *h),
                // Detect the format from the bytes, as add_image does
                format: detect_format(data),
            });
        }

        Ok(())
    }
}
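The `search` method calls `get_metadata`, which is not defined in this document. One way it could work is a map keyed by image id, so lookups do not scan the `Vec`. In the sketch below, `MetadataStore` is hypothetical and `ImageMetadata` is a simplified stand-in for the struct above.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the document's `ImageMetadata`.
#[derive(Clone, Debug, PartialEq)]
struct ImageMetadata {
    id: u64,
    path: String,
    size: (u32, u32),
}

/// Hypothetical metadata store keyed by image id.
#[derive(Default)]
struct MetadataStore {
    by_id: HashMap<u64, ImageMetadata>,
}

impl MetadataStore {
    /// Insert or replace the metadata for `meta.id`.
    fn insert(&mut self, meta: ImageMetadata) {
        self.by_id.insert(meta.id, meta);
    }

    /// Look up metadata for a search hit; None if the id was never indexed.
    fn get(&self, id: u64) -> Option<&ImageMetadata> {
        self.by_id.get(&id)
    }
}
```

Returning `Option` makes the caller handle ids that exist in the vector index but have no metadata entry.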

Contrastive Learning Integration

Training Pipeline

// src/contrastive/mod.rs

use ndarray::Array2;

/// InfoNCE / NT-Xent loss for contrastive learning
pub struct InfoNCELoss {
    temperature: f32,
}

impl InfoNCELoss {
    pub fn new(temperature: f32) -> Self {
        Self { temperature }
    }

    /// Compute loss for positive pairs
    /// embeddings: [2N, D] where (i, i+N) are positive pairs
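    pub fn forward(&self, embeddings: &Array2<f32>) -> f32 {
        let batch_size = embeddings.nrows() / 2;

        // Compute similarity matrix.
        // SAFETY: compute_similarity_matrix requires AVX2; callers must verify
        // support first, e.g. with is_x86_feature_detected!("avx2").
        let sim = unsafe { self.compute_similarity_matrix(embeddings) };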

        // InfoNCE loss
        let mut loss = 0.0;
        for i in 0..batch_size {
            let positive_idx = i + batch_size;

            // Positive similarity
            let pos_sim = sim[[i, positive_idx]] / self.temperature;

            // Denominator: sum over all negatives + positive
            let mut denom = 0.0f32;
            for j in 0..(2 * batch_size) {
                if j != i {
                    denom += (sim[[i, j]] / self.temperature).exp();
                }
            }

            loss -= pos_sim - denom.ln();
        }

        loss / batch_size as f32
    }

    #[target_feature(enable = "avx2")]
    unsafe fn compute_similarity_matrix(&self, embeddings: &Array2<f32>) -> Array2<f32> {
        let n = embeddings.nrows();
        let mut sim = Array2::zeros((n, n));

        // Cosine similarity = dot product of normalized vectors
        for i in 0..n {
            for j in i..n {
                let a = embeddings.row(i);
                let b = embeddings.row(j);
                let dot = dot_product_avx2(a.as_slice().unwrap(), b.as_slice().unwrap());
                sim[[i, j]] = dot;
                sim[[j, i]] = dot;
            }
        }

        sim
    }
}

/// Triplet loss for metric learning
pub struct TripletLoss {
    margin: f32,
}

impl TripletLoss {
    pub fn new(margin: f32) -> Self {
        Self { margin }
    }

    /// anchor, positive, negative: [N, D]
    pub fn forward(
        &self,
        anchor: &Array2<f32>,
        positive: &Array2<f32>,
        negative: &Array2<f32>,
    ) -> f32 {
        let batch_size = anchor.nrows();
        let mut loss = 0.0;

        for i in 0..batch_size {
            let d_pos = euclidean_distance(
                anchor.row(i).as_slice().unwrap(),
                positive.row(i).as_slice().unwrap(),
            );
            let d_neg = euclidean_distance(
                anchor.row(i).as_slice().unwrap(),
                negative.row(i).as_slice().unwrap(),
            );

            loss += (d_pos - d_neg + self.margin).max(0.0);
        }

        loss / batch_size as f32
    }
}
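For checking the SIMD path, a scalar reference version of the InfoNCE computation can be useful. The sketch below uses the same row layout (rows `i` and `i + N` are positives) on plain `Vec<Vec<f32>>` input and applies the log-sum-exp trick, which the code above does not use. The function `info_nce` is illustrative only and assumes the rows are already L2-normalized.

```rust
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Scalar InfoNCE over 2N rows where rows i and i + N are positives.
fn info_nce(embeddings: &[Vec<f32>], temperature: f32) -> f32 {
    let total = embeddings.len();
    assert!(total >= 2 && total % 2 == 0, "need an even number of rows");
    let n = total / 2;

    let mut loss = 0.0f32;
    for i in 0..n {
        // Scaled similarities of anchor i to every other row.
        let logits: Vec<f32> = (0..total)
            .filter(|&j| j != i)
            .map(|j| dot(&embeddings[i], &embeddings[j]) / temperature)
            .collect();
        let positive = dot(&embeddings[i], &embeddings[i + n]) / temperature;

        // log(sum(exp(x))) = max + log(sum(exp(x - max)))
        let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let log_denom = max + logits.iter().map(|x| (x - max).exp()).sum::<f32>().ln();

        loss += log_denom - positive;
    }
    loss / n as f32
}
```

Subtracting the maximum logit keeps `exp` from overflowing at low temperatures.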

Data Augmentation for Contrastive Learning

// src/contrastive/augmentation.rs

use image::{DynamicImage, ImageBuffer, Rgb};
use rand::Rng;

/// Augmentation pipeline for SimCLR/MoCo
pub struct ContrastiveAugmentation {
    crop_scale: (f32, f32),      // Random crop scale range
    flip_prob: f32,              // Horizontal flip probability
    color_jitter: ColorJitter,   // Color augmentation
    blur_prob: f32,              // Gaussian blur probability
}

impl ContrastiveAugmentation {
    pub fn simclr_default() -> Self {
        Self {
            crop_scale: (0.08, 1.0),
            flip_prob: 0.5,
            color_jitter: ColorJitter {
                brightness: 0.8,
                contrast: 0.8,
                saturation: 0.8,
                hue: 0.2,
            },
            blur_prob: 0.5,
        }
    }

    /// Generate two augmented views of the same image
    pub fn generate_pair(&self, image: &DynamicImage) -> (DynamicImage, DynamicImage) {
        let mut rng = rand::thread_rng();

        let view1 = self.augment(image, &mut rng);
        let view2 = self.augment(image, &mut rng);

        (view1, view2)
    }

    fn augment<R: Rng>(&self, image: &DynamicImage, rng: &mut R) -> DynamicImage {
        let mut img = image.clone();

        // Random resized crop
        img = self.random_resized_crop(&img, rng);

        // Horizontal flip
        if rng.gen::<f32>() < self.flip_prob {
            img = img.fliph();
        }

        // Color jitter
        img = self.apply_color_jitter(&img, rng);

        // Gaussian blur
        if rng.gen::<f32>() < self.blur_prob {
            img = self.gaussian_blur(&img);
        }

        img
    }

    fn random_resized_crop<R: Rng>(
        &self,
        image: &DynamicImage,
        rng: &mut R,
    ) -> DynamicImage {
        let (w, h) = (image.width(), image.height());
        let area = w * h;

        // Random scale
        let scale = rng.gen_range(self.crop_scale.0..self.crop_scale.1);
        let target_area = (area as f32 * scale) as u32;

        // Random aspect ratio (3/4 to 4/3)
        let aspect = rng.gen_range(0.75..1.333);

        let crop_w = ((target_area as f32 * aspect).sqrt() as u32).min(w);
        let crop_h = ((target_area as f32 / aspect).sqrt() as u32).min(h);

        let x = rng.gen_range(0..=(w - crop_w));
        let y = rng.gen_range(0..=(h - crop_h));

        image.crop_imm(x, y, crop_w, crop_h)
    }
}
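The crop-size arithmetic in `random_resized_crop` can be isolated from the random sampling, which makes it easier to test. The helper below is a hypothetical extraction that takes an already-sampled `scale` and `aspect`. Clamping to at least 1 pixel is an addition that the code above does not make.

```rust
/// Compute the crop size for a random resized crop from an already-sampled
/// area `scale` (fraction of the image area) and `aspect` ratio (width / height).
/// The result is clamped to the image bounds and to at least 1 pixel.
fn crop_dims(width: u32, height: u32, scale: f32, aspect: f32) -> (u32, u32) {
    assert!(width > 0 && height > 0, "image must be non-empty");
    let target_area = width as f32 * height as f32 * scale;
    // crop_w * crop_h is approximately target_area; crop_w / crop_h is approximately aspect.
    let crop_w = ((target_area * aspect).sqrt() as u32).clamp(1, width);
    let crop_h = ((target_area / aspect).sqrt() as u32).clamp(1, height);
    (crop_w, crop_h)
}
```

For a 100x100 image with `scale = 0.25` and `aspect = 1.0`, the target area is 2500 pixels and the crop is 50x50.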

ONNX Model Loading

Using tract-onnx

// src/backbone/onnx.rs

use tract_onnx::prelude::*;
use ndarray::Array4;

pub struct OnnxBackbone {
    model: SimplePlan<TypedFact, Box<dyn TypedOp>, Graph<TypedFact, Box<dyn TypedOp>>>,
    input_size: (usize, usize),
    output_channels: usize,
}

impl OnnxBackbone {
    pub fn load(model_path: &str) -> Result<Self, CnnError> {
        let model = tract_onnx::onnx()
            .model_for_path(model_path)?
            .with_input_fact(0, f32::fact(&[1, 3, 224, 224]).into())?
            .into_optimized()?
            .into_runnable()?;

        // Infer output shape
        let dummy = tract_ndarray::Array4::<f32>::zeros((1, 3, 224, 224));
        let output = model.run(tvec!(dummy.into()))?;
        let output_channels = output[0].shape()[1];

        Ok(Self {
            model,
            input_size: (224, 224),
            output_channels,
        })
    }
}

impl Backbone for OnnxBackbone {
    fn forward(&self, input: &Array4<f32>) -> Array4<f32> {
        let input_tract: tract_ndarray::ArrayD<f32> = input
            .clone()
            .into_dyn();

        let result = self.model
            .run(tvec!(input_tract.into()))
            .expect("Forward pass failed");

        result[0]
            .to_array_view::<f32>()
            .expect("Output conversion failed")
            .to_owned()
            .into_dimensionality::<ndarray::Ix4>()
            .expect("Shape conversion failed")
    }

    fn output_channels(&self) -> usize {
        self.output_channels
    }

    fn input_size(&self) -> (usize, usize) {
        self.input_size
    }
}

Performance Benchmarks

Target Performance

| Component | Target Latency | Throughput |
|-----------|----------------|------------|
| Image resize (224x224) | <1ms | >1000 img/s |
| MobileNet-V3 Small forward | <5ms | >200 img/s |
| ShuffleNet-V2 forward | <3ms | >300 img/s |
| L2 normalization (512-d) | <0.01ms | >100k vec/s |
| HNSW search (1M vectors) | <1ms | >1000 qps |
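The throughput targets appear to follow from the latency targets under a single-stream assumption (one item at a time, no batching or parallelism). That relationship is sketched below; `throughput_per_sec` is an illustrative helper.

```rust
/// Single-stream throughput (items per second) implied by a per-item latency.
fn throughput_per_sec(latency_ms: f64) -> f64 {
    assert!(latency_ms > 0.0, "latency must be positive");
    1000.0 / latency_ms
}
```

A 5 ms forward pass gives 200 images per second, and a 3 ms pass gives about 333. Batching or multi-threading would raise throughput without changing per-item latency.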

Benchmark Suite

// benches/embedding_bench.rs

use criterion::{criterion_group, criterion_main, Criterion, BenchmarkId};
use ruvector_cnn::{MobileNetEmbedder, EmbeddingExtractor};

fn benchmark_embedding_extraction(c: &mut Criterion) {
    let embedder = MobileNetEmbedder::new_v3_small().unwrap();

    // Generate random test images
    let images: Vec<Vec<u8>> = (0..100)
        .map(|_| random_image_rgb(224, 224))
        .collect();

    let mut group = c.benchmark_group("embedding_extraction");

    group.bench_function("mobilenet_v3_small", |b| {
        b.iter(|| {
            for img in &images {
                // black_box keeps the optimizer from discarding the result
                let _ = criterion::black_box(embedder.extract(img, 224, 224));
            }
        })
    });

    group.finish();
}

fn benchmark_simd_operations(c: &mut Criterion) {
    let mut group = c.benchmark_group("simd_operations");

    for size in [256, 512, 1024, 2048].iter() {
        let a: Vec<f32> = (0..*size).map(|i| i as f32 / *size as f32).collect();
        let b: Vec<f32> = (0..*size).map(|i| (size - i) as f32 / *size as f32).collect();

        group.bench_with_input(
            BenchmarkId::new("dot_product_scalar", size),
            size,
            |bench, _| {
                bench.iter(|| dot_product_scalar(&a, &b))
            },
        );

        group.bench_with_input(
            BenchmarkId::new("dot_product_avx2", size),
            size,
            |bench, _| {
                bench.iter(|| unsafe { dot_product_avx2(&a, &b) })
            },
        );
    }

    group.finish();
}

criterion_group!(benches, benchmark_embedding_extraction, benchmark_simd_operations);
criterion_main!(benches);
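The benchmark refers to `dot_product_scalar` and `dot_product_avx2` without defining them. The sketch below shows one way the pair might look, together with a safe dispatcher that checks CPU support at runtime. It is not necessarily how `ruvector-cnn` implements them, and `dot_product` is a hypothetical wrapper.

```rust
/// Portable reference implementation.
fn dot_product_scalar(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "slices must have equal length");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// AVX2 version: 8 lanes per iteration, scalar loop for the remainder.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn dot_product_avx2(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::*;
    assert_eq!(a.len(), b.len(), "slices must have equal length");
    let chunks = a.len() / 8;
    let mut acc = _mm256_setzero_ps();
    for i in 0..chunks {
        let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
    }
    // Horizontal sum of the 8 accumulator lanes.
    let mut lanes = [0.0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    let mut sum: f32 = lanes.iter().sum();
    for i in (chunks * 8)..a.len() {
        sum += a[i] * b[i];
    }
    sum
}

/// Safe entry point: use AVX2 when the CPU reports it, otherwise fall back.
fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just verified at runtime.
            return unsafe { dot_product_avx2(a, b) };
        }
    }
    dot_product_scalar(a, b)
}
```

Because the two versions add the products in different orders, their results can differ in the last bits for general inputs, so comparisons should use a tolerance.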

Usage Examples

Basic Image Embedding

use ruvector_cnn::{MobileNetEmbedder, EmbeddingConfig, EmbeddingExtractor};
use ruvector_core::Index;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create embedder with MobileNet-V3 Small
    let embedder = MobileNetEmbedder::new(EmbeddingConfig {
        input_size: 224,
        embedding_dim: 512,
        normalize: true,
        quantized: false,
    })?;

    // Load and embed an image
    let image_data = std::fs::read("image.jpg")?;
    let (width, height) = get_image_dimensions(&image_data)?;

    let embedding = embedder.extract(&image_data, width, height)?;

    println!("Embedding dimension: {}", embedding.len());
    println!("First 5 values: {:?}", &embedding.as_slice()[..5]);

    Ok(())
}

Building an Image Search Index

use ruvector_cnn::ImageIndex;
use std::path::Path;

fn build_image_index(image_dir: &Path) -> Result<ImageIndex, Box<dyn std::error::Error>> {
    let mut index = ImageIndex::new(ImageIndexConfig {
        backbone: BackboneType::MobileNetV3Small,
        index_type: IndexType::HNSW { ef_construction: 200, m: 16 },
        ..Default::default()
    })?;

    // Add all images from directory
    let mut id = 0u64;
    for entry in std::fs::read_dir(image_dir)? {
        let path = entry?.path();
        if path.extension().map_or(false, |e| e == "jpg" || e == "png") {
            let data = std::fs::read(&path)?;
            let (w, h) = get_image_dimensions(&data)?;

            index.add_image(id, &data, w, h, path.to_string_lossy().to_string())?;
            id += 1;
        }
    }

    println!("Indexed {} images", id);
    Ok(index)
}

fn search_similar(index: &ImageIndex, query_path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let query_data = std::fs::read(query_path)?;
    let (w, h) = get_image_dimensions(&query_data)?;

    let results = index.search(&query_data, w, h, 10)?;

    println!("Top 10 similar images:");
    for (i, result) in results.iter().enumerate() {
        println!("  {}. {} (score: {:.4})", i + 1, result.metadata.path, result.score);
    }

    Ok(())
}

Contrastive Pre-training

use image::DynamicImage;
use ruvector_cnn::contrastive::{InfoNCELoss, ContrastiveAugmentation};
use ruvector_cnn::backbone::MobileNetV3;

fn pretrain_embedder(
    images: &[DynamicImage],
    epochs: usize,
    batch_size: usize,
) -> Result<MobileNetV3, Box<dyn std::error::Error>> {
    let augmentation = ContrastiveAugmentation::simclr_default();
    let loss_fn = InfoNCELoss::new(0.1); // temperature = 0.1
    let mut model = MobileNetV3::small_untrained()?;

    for epoch in 0..epochs {
        let mut epoch_loss = 0.0;

        for batch in images.chunks(batch_size) {
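            // Generate augmented pairs, ordered as all first views followed by
            // all second views so rows (i, i + N) are the positives that
            // InfoNCELoss::forward expects
            let (mut pairs, second_views): (Vec<_>, Vec<_>) = batch
                .iter()
                .map(|img| augmentation.generate_pair(img))
                .unzip();
            pairs.extend(second_views);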

            // Forward pass
            let embeddings = model.forward_batch(&pairs)?;

            // Compute loss
            let loss = loss_fn.forward(&embeddings);
            epoch_loss += loss;

            // Backward pass and update (pseudo-code)
            // let gradients = loss.backward();
            // optimizer.step(&mut model, gradients);
        }

        // Each batch loss is already a mean, so average over the number of batches
        println!("Epoch {}: loss = {:.4}", epoch + 1, epoch_loss / images.chunks(batch_size).len() as f32);
    }

    Ok(model)
}

SONA Integration (Self-Optimizing Neural Architecture)

The sona crate provides adaptive learning capabilities that enhance CNN embeddings through trajectory-based learning.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    CNN + SONA Pipeline                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────┐    ┌───────────────┐    ┌──────────────┐          │
│  │  Image   │───▶│    CNN        │───▶│  Embedding   │          │
│  │  Input   │    │  Backbone     │    │  (512-d)     │          │
│  └──────────┘    └───────────────┘    └──────┬───────┘          │
│                                              │                   │
│                                              ▼                   │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    SONA Engine                             │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐    │  │
│  │  │ Micro-LoRA  │  │ Base-LoRA   │  │  EWC++          │    │  │
│  │  │ (rank=1-2)  │  │ (rank=8-16) │  │  (anti-forget)  │    │  │
│  │  └─────────────┘  └─────────────┘  └─────────────────┘    │  │
│  │                         │                                  │  │
│  │                         ▼                                  │  │
│  │  ┌─────────────────────────────────────────────────────┐  │  │
│  │  │              ReasoningBank                          │  │  │
│  │  │  • Pattern extraction from successful embeddings    │  │  │
│  │  │  • Similarity search for related contexts           │  │  │
│  │  │  • Trajectory storage for replay learning           │  │  │
│  │  └─────────────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

SONA-Enhanced Embedding Extraction

use sona::{SonaEngine, SonaConfig, TrajectoryBuilder};
use ruvector_cnn::{MobileNetEmbedder, EmbeddingExtractor};

/// CNN embedder enhanced with SONA adaptive learning
pub struct SonaEnhancedEmbedder {
    cnn: MobileNetEmbedder,
    sona: SonaEngine,
}

impl SonaEnhancedEmbedder {
    pub fn new(cnn_config: EmbeddingConfig, sona_config: SonaConfig) -> Result<Self, Error> {
        Ok(Self {
            cnn: MobileNetEmbedder::new(cnn_config)?,
            sona: SonaEngine::new(sona_config),
        })
    }

    /// Extract embedding with SONA transformation
    pub fn extract_enhanced(&self, image_data: &[u8], w: u32, h: u32) -> Result<Vec<f32>, Error> {
        // Base CNN embedding
        let base_embedding = self.cnn.extract(image_data, w, h)?;

        // Apply SONA's learned Micro-LoRA transformation
        let mut enhanced = vec![0.0; base_embedding.len()];
        self.sona.apply_micro_lora(&base_embedding, &mut enhanced);

        // L2 normalize
        l2_normalize(&mut enhanced);

        Ok(enhanced)
    }

    /// Record contrastive pair for trajectory learning
    pub fn record_contrastive_pair(
        &self,
        anchor: &[u8],
        positive: &[u8],
        w: u32,
        h: u32,
    ) -> Result<(), Error> {
        let anchor_emb = self.cnn.extract(anchor, w, h)?;
        let positive_emb = self.cnn.extract(positive, w, h)?;

        // Begin trajectory with anchor
        let mut trajectory = self.sona.begin_trajectory(anchor_emb.clone());

        // Add positive as a step with high similarity signal
        let similarity = cosine_similarity(&anchor_emb, &positive_emb);
        trajectory.add_step(positive_emb, vec![], similarity);

        // End trajectory - triggers SONA learning
        self.sona.end_trajectory(trajectory, similarity);

        Ok(())
    }

    /// Query ReasoningBank for similar patterns
    pub fn find_similar_patterns(&self, embedding: &[f32], k: usize) -> Vec<LearnedPattern> {
        self.sona.reasoning_bank().search_similar(embedding, k)
    }
}
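The Micro-LoRA step can be thought of as a low-rank residual update to the embedding, `y = x + scale * B(Ax)`. The sketch below illustrates that general idea only. `LowRankAdapter` is a hypothetical type and does not reflect the `sona` crate's actual API or internals.

```rust
/// Hypothetical low-rank adapter: y = x + scale * B * (A * x),
/// with A of shape [rank, dim] and B of shape [dim, rank], both row-major.
struct LowRankAdapter {
    dim: usize,
    rank: usize,
    a: Vec<f32>, // rank * dim weights
    b: Vec<f32>, // dim * rank weights
    scale: f32,
}

impl LowRankAdapter {
    /// Apply the adapter to `input`, writing the result into `output`.
    fn apply(&self, input: &[f32], output: &mut [f32]) {
        assert_eq!(input.len(), self.dim);
        assert_eq!(output.len(), self.dim);

        // Down-projection: h = A * x, producing only `rank` values.
        let hidden: Vec<f32> = (0..self.rank)
            .map(|r| {
                let row = &self.a[r * self.dim..(r + 1) * self.dim];
                row.iter().zip(input).map(|(w, x)| w * x).sum::<f32>()
            })
            .collect();

        // Up-projection plus residual: y = x + scale * B * h.
        for (i, out) in output.iter_mut().enumerate() {
            let row = &self.b[i * self.rank..(i + 1) * self.rank];
            let delta: f32 = row.iter().zip(&hidden).map(|(w, h)| w * h).sum();
            *out = input[i] + self.scale * delta;
        }
    }
}
```

With rank 1 or 2 the extra cost is a few dot products of length `dim`. Initializing `b` to zeros makes the adapter start as the identity.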

Trajectory-Based Contrastive Learning

/// Contrastive learning loop with SONA integration
pub fn contrastive_train_with_sona(
    embedder: &SonaEnhancedEmbedder,
    images: &[DynamicImage],
    augmentation: &ContrastiveAugmentation,
    epochs: usize,
) -> Result<(), Error> {
    for epoch in 0..epochs {
        for image in images {
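            // Generate two augmented views
            let (view1, view2) = augmentation.generate_pair(image);

            // The random crops can differ in size, so resize the second view
            // to match the first before sharing one (width, height) pair
            let view2 = view2.resize_exact(
                view1.width(),
                view1.height(),
                image::imageops::FilterType::Triangle,
            );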

            // Record as contrastive pair - SONA learns the invariance
            embedder.record_contrastive_pair(
                &view1.as_bytes(),
                &view2.as_bytes(),
                view1.width(),
                view1.height(),
            )?;
        }

        // Periodic consolidation to prevent forgetting
        if epoch % 10 == 0 {
            embedder.sona.consolidate_memory();
        }

        println!("Epoch {}: SONA patterns = {}", epoch + 1,
            embedder.sona.reasoning_bank().pattern_count());
    }

    Ok(())
}

Hyperbolic HNSW Integration

Leverage ruvector-hyperbolic-hnsw for hierarchical image search where semantic concepts have natural tree-like relationships.

Why Hyperbolic for Images?

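| Property | Euclidean | Hyperbolic |
|----------|-----------|------------|
| Hierarchy | Poor | Excellent |
| Example | "car" vs "sedan" | "vehicle" → "car" → "sedan" |
| Volume growth | Polynomial | Exponential |
| Tree embedding | Distorted | Near-isometric (arbitrarily low distortion) |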

Poincaré Ball Projection

use ruvector_hyperbolic_hnsw::{
    HyperbolicHnsw, PoincareBall, HyperbolicConfig,
    exp_map_zero, log_map_zero, poincare_distance,
};

/// Project Euclidean CNN embedding to Poincaré ball
pub struct HyperbolicImageIndex {
    embedder: MobileNetEmbedder,
    poincare: PoincareBall,
    hnsw: HyperbolicHnsw,
}

impl HyperbolicImageIndex {
    pub fn new(config: HyperbolicConfig) -> Result<Self, Error> {
        Ok(Self {
            embedder: MobileNetEmbedder::new_v3_small()?,
            poincare: PoincareBall::new(config.curvature),
            hnsw: HyperbolicHnsw::new(config)?,
        })
    }

    /// Add image with hyperbolic embedding
    pub fn add_image(&mut self, id: u64, image: &[u8], w: u32, h: u32) -> Result<(), Error> {
        // Extract Euclidean embedding
        let euclidean_emb = self.embedder.extract(image, w, h)?;

        // Project to Poincaré ball using exponential map at origin
        let hyperbolic_emb = exp_map_zero(&euclidean_emb, self.poincare.curvature());

        // Add to hyperbolic HNSW index
        self.hnsw.add(id, &hyperbolic_emb)?;

        Ok(())
    }

    /// Search using hyperbolic distance
    pub fn search(&self, query_image: &[u8], w: u32, h: u32, k: usize) -> Result<Vec<SearchResult>, Error> {
        let euclidean_emb = self.embedder.extract(query_image, w, h)?;
        let hyperbolic_query = exp_map_zero(&euclidean_emb, self.poincare.curvature());

        self.hnsw.search(&hyperbolic_query, k)
    }

    /// Hierarchical search: find images at specific semantic level
    pub fn search_at_level(
        &self,
        query: &[f32],
        target_norm: f32,  // Norm ~ hierarchy level in Poincaré ball
        k: usize,
    ) -> Result<Vec<SearchResult>, Error> {
        // In Poincaré ball, points near origin are "general" (e.g., "animal")
        // Points near boundary are "specific" (e.g., "golden retriever puppy")
        let scaled_query = self.poincare.scale_to_norm(query, target_norm);
        self.hnsw.search(&scaled_query, k)
    }
}
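For reference, the two operations the index depends on can be written out directly. With curvature parameter `c > 0` (ball radius `1/sqrt(c)`), the exponential map at the origin is `tanh(sqrt(c)*|v|) * v / (sqrt(c)*|v|)`, and the distance is `(1/sqrt(c)) * acosh(1 + 2c|x-y|^2 / ((1 - c|x|^2)(1 - c|y|^2)))`. The functions below are local stand-ins with deliberately different names. They are not the `ruvector-hyperbolic-hnsw` implementations, whose signatures may differ.

```rust
fn norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Exponential map at the origin of the Poincaré ball (curvature -c).
fn exp_map_origin(v: &[f32], c: f32) -> Vec<f32> {
    let n = norm(v);
    if n == 0.0 {
        return v.to_vec();
    }
    let sqrt_c = c.sqrt();
    let factor = (sqrt_c * n).tanh() / (sqrt_c * n);
    v.iter().map(|x| x * factor).collect()
}

/// Geodesic distance between two points strictly inside the ball.
fn poincare_dist(x: &[f32], y: &[f32], c: f32) -> f32 {
    let diff_sq: f32 = x.iter().zip(y).map(|(a, b)| (a - b) * (a - b)).sum();
    let denom = (1.0 - c * norm(x).powi(2)) * (1.0 - c * norm(y).powi(2));
    let arg = 1.0 + 2.0 * c * diff_sq / denom;
    // Guard against rounding pushing the argument slightly below 1.
    arg.max(1.0).acosh() / c.sqrt()
}
```

The radius of a mapped point depends only on the norm of its input. If the CNN embeddings are L2-normalized first, every image lands at the same distance from the origin, so the norm cannot encode hierarchy level; that is worth keeping in mind when `normalize` is enabled.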

Mixed-Curvature Embedding (via ruvector-math)

use ruvector_math::product_manifold::{ProductManifold, ManifoldType};

/// Embed images in mixed-curvature space
/// - Euclidean: color/texture features
/// - Hyperbolic: semantic hierarchy
/// - Spherical: cyclical patterns (time of day, seasons)
pub struct MixedCurvatureImageEmbedder {
    cnn: MobileNetEmbedder,
    manifold: ProductManifold,
}

impl MixedCurvatureImageEmbedder {
    pub fn new() -> Result<Self, Error> {
        Ok(Self {
            cnn: MobileNetEmbedder::new(EmbeddingConfig {
                embedding_dim: 512,
                ..Default::default()
            })?,
            // 256-dim Euclidean + 200-dim Hyperbolic + 56-dim Spherical = 512
            manifold: ProductManifold::new(256, 200, 56),
        })
    }

    pub fn embed(&self, image: &[u8], w: u32, h: u32) -> Result<ProductEmbedding, Error> {
        let flat_emb = self.cnn.extract(image, w, h)?;

        // Split into components
        let euclidean = flat_emb[..256].to_vec();
        let hyperbolic = self.manifold.project_hyperbolic(&flat_emb[256..456]);
        let spherical = self.manifold.project_spherical(&flat_emb[456..]);

        Ok(ProductEmbedding { euclidean, hyperbolic, spherical })
    }

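    /// Distance between two product embeddings on the product manifold.
    ///
    /// Panics if the underlying `ProductManifold::distance` call returns an error.
    pub fn distance(&self, a: &ProductEmbedding, b: &ProductEmbedding) -> f32 {
        // Concatenate the components in the same order used by `embed`:
        // Euclidean, then hyperbolic, then spherical.
        let flat_a = [&a.euclidean[..], &a.hyperbolic[..], &a.spherical[..]].concat();
        let flat_b = [&b.euclidean[..], &b.hyperbolic[..], &b.spherical[..]].concat();
        self.manifold
            .distance(&flat_a, &flat_b)
            .expect("product manifold distance failed")
    }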
}
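
To make the product distance concrete, here is a self-contained sketch of one common way to combine the three components: take the geodesic distance in each factor and combine them as an L2 sum. That combination rule, the fixed unit curvature, and every name below (`ProductPoint`, `product_distance`) are assumptions for illustration. The document does not say how `ProductManifold::distance` combines the factors.

```rust
/// Hypothetical stand-in for `ProductEmbedding`.
pub struct ProductPoint {
    pub euclidean: Vec<f32>,
    pub hyperbolic: Vec<f32>, // assumed to lie inside the unit Poincaré ball
    pub spherical: Vec<f32>,  // assumed to have unit norm
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Straight-line distance in the Euclidean factor.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    sq_dist(a, b).sqrt()
}

/// Poincaré ball distance with curvature -1.
fn hyperbolic_distance(a: &[f32], b: &[f32]) -> f32 {
    let denom = (1.0 - dot(a, a)) * (1.0 - dot(b, b));
    (1.0 + 2.0 * sq_dist(a, b) / denom).max(1.0).acosh()
}

/// Great-circle distance between unit vectors.
fn spherical_distance(a: &[f32], b: &[f32]) -> f32 {
    dot(a, b).clamp(-1.0, 1.0).acos()
}

/// L2 combination of the per-factor geodesic distances.
pub fn product_distance(a: &ProductPoint, b: &ProductPoint) -> f32 {
    let de = euclidean_distance(&a.euclidean, &b.euclidean);
    let dh = hyperbolic_distance(&a.hyperbolic, &b.hyperbolic);
    let ds = spherical_distance(&a.spherical, &b.spherical);
    (de * de + dh * dh + ds * ds).sqrt()
}
```

Under this rule, two images that differ only in their cyclical (spherical) component are separated by the angle between those components, while the Euclidean and hyperbolic factors contribute nothing.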

CNN vs GNN: Complementary Roles

RuVector has both ruvector-cnn (proposed) and ruvector-gnn (existing). They serve different purposes:

Comparison

| Aspect     | CNN (ruvector-cnn)       | GNN (ruvector-gnn)           |
|------------|--------------------------|------------------------------|
| Input      | Grid-structured (images) | Graph-structured (relations) |
| Learns     | Spatial hierarchies      | Relational patterns          |
| Invariance | Translation invariance   | Permutation invariance       |
| Key ops    | Convolution, pooling     | Message passing, aggregation |
| Use case   | Image → embedding        | Embedding → reasoning        |

Combined Pipeline

┌────────────────────────────────────────────────────────────────┐
│                    Multimodal Pipeline                         │
├────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Images          ┌─────────┐                                   │
│  ─────────────▶  │   CNN   │ ──▶ Image Embeddings              │
│                  └─────────┘           │                        │
│                                        │                        │
│                                        ▼                        │
│                            ┌─────────────────────┐             │
│  Relationships             │  Image Similarity   │             │
│  (co-occurrence,     ───▶  │  Graph Construction │             │
│   captions, etc.)          └──────────┬──────────┘             │
│                                       │                        │
│                                       ▼                        │
│                            ┌─────────────────────┐             │
│                            │        GNN          │             │
│                            │  (ruvector-gnn)     │             │
│                            │  • Message passing  │             │
│                            │  • Cross-image      │             │
│                            │    reasoning        │             │
│                            └──────────┬──────────┘             │
│                                       │                        │
│                                       ▼                        │
│                            ┌─────────────────────┐             │
│                            │  Enhanced Embeddings│             │
│                            │  (context-aware)    │             │
│                            └─────────────────────┘             │
│                                                                 │
└────────────────────────────────────────────────────────────────┘
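
The "Image Similarity Graph Construction" box can be illustrated with a k-nearest-neighbour graph over the CNN embeddings. The sketch below is an assumption about how such a graph might be built: it uses cosine similarity and a brute-force scan, and `knn_similarity_edges` is a hypothetical name rather than an existing RuVector function. External relationships such as co-occurrence or captions would be merged in as extra edges.

```rust
/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// For every image, emit directed edges to its `k` most similar images.
/// Brute force, O(n^2) comparisons; an ANN index would replace the scan at scale.
pub fn knn_similarity_edges(embeddings: &[Vec<f32>], k: usize) -> Vec<(usize, usize)> {
    let mut edges = Vec::new();
    for (i, a) in embeddings.iter().enumerate() {
        // Score every other image against image `i`.
        let mut scored: Vec<(usize, f32)> = embeddings
            .iter()
            .enumerate()
            .filter(|(j, _)| *j != i)
            .map(|(j, b)| (j, cosine_similarity(a, b)))
            .collect();
        // Highest similarity first.
        scored.sort_by(|x, y| y.1.total_cmp(&x.1));
        edges.extend(scored.into_iter().take(k).map(|(j, _)| (i, j)));
    }
    edges
}
```

The resulting `(usize, usize)` pairs have the same shape as the `edges` argument taken by `reason_over_images` in the next section.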

Code Example: CNN + GNN

use ruvector_cnn::{MobileNetEmbedder, EmbeddingExtractor};
use ruvector_gnn::{GraphAttentionNetwork, GnnConfig};

pub struct MultimodalImageReasoner {
    cnn: MobileNetEmbedder,
    gnn: GraphAttentionNetwork,
}

impl MultimodalImageReasoner {
    /// Extract image embeddings and reason over relationships
    pub fn reason_over_images(
        &self,
        images: &[(&[u8], u32, u32)],
        edges: &[(usize, usize)],  // Image relationships
    ) -> Result<Vec<Vec<f32>>, Error> {
        // Step 1: CNN extracts initial embeddings
        let embeddings: Vec<_> = images
            .iter()
            .map(|(data, w, h)| self.cnn.extract(data, *w, *h))
            .collect::<Result<Vec<_>, _>>()?;

        // Step 2: Build graph from relationships
        let graph = self.build_graph(&embeddings, edges);

        // Step 3: GNN refines embeddings based on relationships
        let refined = self.gnn.forward(&graph)?;

        Ok(refined)
    }

    fn build_graph(&self, nodes: &[Vec<f32>], edges: &[(usize, usize)]) -> Graph {
        // ... construct graph structure
        todo!("construct graph structure from node embeddings and edge list")
    }
}
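
Steps 2 and 3 above can be sketched without any RuVector types. The code below builds an undirected adjacency list and then runs one round of mean aggregation, a much simpler stand-in for the attention-weighted aggregation a `GraphAttentionNetwork` would perform. `SimpleGraph`, `build_graph_sketch`, and `mean_aggregate` are hypothetical names introduced only for this illustration.

```rust
/// Hypothetical minimal graph: one feature vector and one neighbour list per node.
pub struct SimpleGraph {
    pub features: Vec<Vec<f32>>,
    pub neighbors: Vec<Vec<usize>>,
}

/// Build an undirected graph, skipping self-loops and out-of-range edges.
pub fn build_graph_sketch(nodes: &[Vec<f32>], edges: &[(usize, usize)]) -> SimpleGraph {
    let mut neighbors: Vec<Vec<usize>> = vec![Vec::new(); nodes.len()];
    for &(src, dst) in edges {
        if src < nodes.len() && dst < nodes.len() && src != dst {
            neighbors[src].push(dst);
            neighbors[dst].push(src);
        }
    }
    SimpleGraph { features: nodes.to_vec(), neighbors }
}

/// One round of message passing: each node becomes the mean of its own
/// embedding and its neighbours' embeddings. Assumes equal dimensions.
pub fn mean_aggregate(graph: &SimpleGraph) -> Vec<Vec<f32>> {
    graph
        .features
        .iter()
        .enumerate()
        .map(|(i, own)| {
            let mut sum = own.clone();
            for &j in &graph.neighbors[i] {
                for (s, x) in sum.iter_mut().zip(&graph.features[j]) {
                    *s += x;
                }
            }
            let count = (graph.neighbors[i].len() + 1) as f32;
            sum.into_iter().map(|s| s / count).collect()
        })
        .collect()
}
```

Connected images are pulled toward each other while isolated images keep their original CNN embedding, which is the "context-aware" effect shown in the pipeline diagram.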

Future Enhancements

Phase 1: Core Implementation

  • Basic MobileNet-V3 Small/Large in Rust
  • SIMD-optimized convolution (AVX2)
  • Integration with RuVector index
  • ONNX model loading support

Phase 2: Performance Optimization

  • INT8 quantization support
  • AVX-512 optimizations
  • Winograd convolution for 3x3 kernels
  • Multi-threaded batch processing
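
As a rough picture of what the INT8 item involves, the sketch below shows plain symmetric per-tensor quantization: one scale factor maps the largest absolute value to 127. This is a generic textbook scheme chosen for illustration, not the calibration method the crate uses or plans to use, and both function names are hypothetical.

```rust
/// Quantize to i8 with a single symmetric scale; returns (values, scale).
pub fn quantize_symmetric(values: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    // An all-zero tensor gets scale 1.0 to avoid dividing by zero.
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let quantized = values
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (quantized, scale)
}

/// Map quantized values back to f32.
pub fn dequantize(quantized: &[i8], scale: f32) -> Vec<f32> {
    quantized.iter().map(|&q| q as f32 * scale).collect()
}
```

The round-trip error per element is bounded by half the scale, so tensors with a few large outliers lose precision on the small values. That is the usual motivation for calibrated or per-channel schemes.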

Phase 3: Training Support

  • InfoNCE/NT-Xent loss implementation
  • Data augmentation pipeline
  • Gradient computation
  • Model checkpointing
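
For orientation, the InfoNCE item can be sketched as follows. The function computes the one-directional InfoNCE loss over a batch of anchor/positive pairs, using cosine similarity and treating every other positive in the batch as a negative. It is a forward-pass illustration only (no gradients), `info_nce_loss` is a hypothetical name, and the symmetric NT-Xent variant would additionally average over both directions.

```rust
/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Mean InfoNCE loss: for anchor i, positives[i] is the positive and
/// positives[j] (j != i) are the in-batch negatives.
pub fn info_nce_loss(anchors: &[Vec<f32>], positives: &[Vec<f32>], temperature: f32) -> f32 {
    assert_eq!(anchors.len(), positives.len(), "batch sizes must match");
    let n = anchors.len();
    if n == 0 {
        return 0.0;
    }
    let mut total = 0.0f32;
    for i in 0..n {
        let logits: Vec<f32> = positives
            .iter()
            .map(|p| cosine(&anchors[i], p) / temperature)
            .collect();
        // Numerically stable log-sum-exp over the row of logits.
        let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let log_sum_exp = max + logits.iter().map(|l| (l - max).exp()).sum::<f32>().ln();
        // -log softmax of the positive entry.
        total += log_sum_exp - logits[i];
    }
    total / n as f32
}
```

When every embedding in the batch is identical the loss equals `ln(N)`, the value for a model that cannot tell positives from negatives. Well-separated pairs drive it toward zero.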

Phase 4: Advanced Features

  • ShuffleNet and EfficientNet backbones
  • Domain-specific fine-tuning
  • Hard negative mining
  • Multi-modal embedding support

Phase 5: SONA & Hyperbolic Integration

  • SONA trajectory learning for contrastive pairs
  • ReasoningBank pattern extraction from embeddings
  • Hyperbolic HNSW integration
  • Mixed-curvature product manifold support
  • CNN + GNN combined pipeline
