ruvector/examples/scipix/docs/07_IMAGE_PREPROCESSING.md
rUv 3ed8784b41 Plan Rust Mathpix clone for ruvector (#28)
2025-11-29 17:34:47 -05:00


Image Preprocessing Pipeline for Optimal OCR Performance

Overview

This document details the image preprocessing pipeline optimized for OCR performance in the ruvector-scipix project. The pipeline transforms raw images into clean, normalized inputs suitable for mathematical OCR models.

Architecture

Raw Image → Load/Decode → Enhancement → Geometric Correction →
Resolution Normalization → Text Region Detection → OCR-Ready Output

1. Image Loading and Decoding

Format Support

The loader accepts the common raster formats (JPEG, PNG, TIFF, WebP, BMP, GIF) and decodes them through the image crate:

use image::{DynamicImage, ImageFormat, ImageError, GenericImageView};
use std::path::Path;
use std::io::BufReader;
use std::fs::File;

pub struct ImageLoader {
    max_dimension: u32,
    supported_formats: Vec<ImageFormat>,
}

impl ImageLoader {
    pub fn new() -> Self {
        Self {
            max_dimension: 8192, // Prevent memory exhaustion
            supported_formats: vec![
                ImageFormat::Jpeg,
                ImageFormat::Png,
                ImageFormat::Tiff,
                ImageFormat::WebP,
                ImageFormat::Bmp,
                ImageFormat::Gif,
            ],
        }
    }

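    /// Load an image from disk, detecting the format from the file extension
    pub fn load<P: AsRef<Path>>(&self, path: P) -> Result<DynamicImage, ImageError> {
        // Borrow the path once so it can be used for both opening and format detection
        let path = path.as_ref();
        let file = File::open(path)?;
        let reader = BufReader::new(file);

        // Detect the format from the extension, then decode
        let img = image::load(reader, ImageFormat::from_path(path)?)?;

        // Reject oversized images. This check runs after decoding, so it bounds
        // memory use in later stages, not during the decode itself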
        let (width, height) = img.dimensions();
        if width > self.max_dimension || height > self.max_dimension {
            return Err(ImageError::Limits(image::error::LimitError::from_kind(
                image::error::LimitErrorKind::DimensionError
            )));
        }

        Ok(img)
    }

    /// Load image from bytes with format hint
    pub fn load_from_memory(&self, buffer: &[u8], format: ImageFormat) -> Result<DynamicImage, ImageError> {
        image::load_from_memory_with_format(buffer, format)
    }

    /// Load through a buffered reader; progressive JPEGs are decoded transparently
    pub fn load_progressive<P: AsRef<Path>>(&self, path: P) -> Result<DynamicImage, ImageError> {
        let path = path.as_ref();
        let file = File::open(path)?;
        let reader = BufReader::with_capacity(8192, file);

        // Treat `.jpg` and `.jpeg` (any case) as JPEG. The decoder handles
        // baseline and progressive files alike, so no extra option is needed
        let is_jpeg = path
            .extension()
            .and_then(|ext| ext.to_str())
            .map(|ext| ext.eq_ignore_ascii_case("jpg") || ext.eq_ignore_ascii_case("jpeg"))
            .unwrap_or(false);
        if is_jpeg {
            return image::load(reader, ImageFormat::Jpeg);
        }

        image::load(reader, ImageFormat::from_path(path)?)
    }
}

/// Convenience loader. Despite the name, `image::open` infers the format from
/// the file extension and decodes the whole image into memory; it does not
/// memory-map the file
pub fn load_zero_copy<P: AsRef<Path>>(path: P) -> Result<DynamicImage, ImageError> {
    image::open(path)
}

Memory-Efficient Loading Strategies

use image::{GenericImageView, ImageBuffer, ImageError, Rgba};
use std::path::Path;

pub struct StreamingLoader {
    chunk_size: usize,
}

impl StreamingLoader {
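    /// Hand the image to `callback` in horizontal strips. The file is still
    /// decoded in full by `image::open`; chunking only bounds the size of
    /// the RGBA buffers passed downstream
    pub fn load_chunked<P: AsRef<Path>>(
        &self,
        path: P,
        callback: impl Fn(&ImageBuffer<Rgba<u8>, Vec<u8>>) -> Result<(), ImageError>
    ) -> Result<(), ImageError> {
        let img = image::open(path)?;
        let (width, height) = img.dimensions();

        // Clamp to at least one row: `step_by(0)` panics
        let chunk_height = (self.chunk_size as u32).min(height).max(1);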

        for y in (0..height).step_by(chunk_height as usize) {
            let h = chunk_height.min(height - y);
            let chunk = img.crop_imm(0, y, width, h).to_rgba8();
            callback(&chunk)?;
        }

        Ok(())
    }
}
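
The strip arithmetic used by `load_chunked` can be checked on its own, without decoding an image. The following is a minimal sketch using only the standard library; `chunk_rows` is a hypothetical helper introduced for illustration and is not part of the `image` crate or of the project code.

```rust
/// Split `height` rows into strips of at most `chunk_height` rows.
/// Returns `(start_row, row_count)` pairs that cover every row exactly once.
/// Hypothetical helper, for illustration only.
pub fn chunk_rows(height: u32, chunk_height: u32) -> Vec<(u32, u32)> {
    // A step of zero would make `step_by` panic, so use at least one row.
    let step = chunk_height.max(1);
    (0..height)
        .step_by(step as usize)
        // The last strip may be shorter than `step`.
        .map(|y| (y, step.min(height - y)))
        .collect()
}
```

For a 10-row image and strips of 4 rows, `chunk_rows(10, 4)` yields `[(0, 4), (4, 4), (8, 2)]`: two full strips and a final strip holding the remaining two rows.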

2. Image Enhancement

Contrast Adjustment (CLAHE)

use imageproc::contrast::adaptive_threshold;
use imageproc::filter::{gaussian_blur_f32, bilateral_filter};
use image::{GrayImage, Luma};

pub struct ImageEnhancer {
    clahe_clip_limit: f32,
    clahe_tile_size: (u32, u32),
}

impl ImageEnhancer {
    pub fn new() -> Self {
        Self {
            clahe_clip_limit: 2.0,
            clahe_tile_size: (8, 8),
        }
    }

    /// Apply Contrast Limited Adaptive Histogram Equalization
    pub fn apply_clahe(&self, img: &GrayImage) -> GrayImage {
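        // Simplified CLAHE: each tile (size given in pixels) is equalized
        // independently, without the bilinear blending between neighboring
        // tiles that full CLAHE uses, so tile seams may be visible.
        // For production, use a dedicated CLAHE library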
        let (width, height) = img.dimensions();
        let (tile_w, tile_h) = self.clahe_tile_size;

        let mut output = GrayImage::new(width, height);

        // Process each tile
        for tile_y in (0..height).step_by(tile_h as usize) {
            for tile_x in (0..width).step_by(tile_w as usize) {
                let w = tile_w.min(width - tile_x);
                let h = tile_h.min(height - tile_y);

                // Extract tile
                let tile = imageproc::rect::Rect::at(tile_x as i32, tile_y as i32)
                    .of_size(w, h);

                // Compute histogram and equalize with clipping
                let equalized = self.equalize_tile_with_clip(img, tile);

                // Copy back to output
                for y in 0..h {
                    for x in 0..w {
                        output.put_pixel(
                            tile_x + x,
                            tile_y + y,
                            equalized[(y * w + x) as usize]
                        );
                    }
                }
            }
        }

        output
    }

    fn equalize_tile_with_clip(&self, img: &GrayImage, rect: imageproc::rect::Rect) -> Vec<Luma<u8>> {
        let mut histogram = [0u32; 256];
        let mut pixels = Vec::new();

        // Build histogram
        for y in rect.top()..(rect.top() + rect.height() as i32) {
            for x in rect.left()..(rect.left() + rect.width() as i32) {
                if x >= 0 && y >= 0 {
                    let pixel = img.get_pixel(x as u32, y as u32)[0];
                    histogram[pixel as usize] += 1;
                    pixels.push(Luma([pixel]));
                }
            }
        }

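        // Apply clip limit. Keep it at 1 or above: for an 8x8 tile the raw
        // value is 2.0 * 64 / 256 = 0.5, which truncates to 0 and would
        // clip every bin to zero
        let total_pixels = pixels.len() as u32;
        let clip_limit = (((self.clahe_clip_limit * total_pixels as f32) / 256.0) as u32).max(1);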

        let mut clipped_total = 0u32;
        for count in &mut histogram {
            if *count > clip_limit {
                clipped_total += *count - clip_limit;
                *count = clip_limit;
            }
        }

        // Redistribute clipped pixels
        let redistribute = clipped_total / 256;
        for count in &mut histogram {
            *count += redistribute;
        }

        // Build CDF
        let mut cdf = [0u32; 256];
        cdf[0] = histogram[0];
        for i in 1..256 {
            cdf[i] = cdf[i - 1] + histogram[i];
        }

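        // Normalize and map pixels. Use the CDF's own maximum: integer
        // redistribution can leave the histogram total below `total_pixels`
        let cdf_min = cdf.iter().find(|&&x| x > 0).copied().unwrap_or(0);
        let cdf_range = cdf[255] - cdf_min;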

        pixels.into_iter().map(|pixel| {
            let old_val = pixel[0] as usize;
            let new_val = if cdf_range > 0 {
                (((cdf[old_val] - cdf_min) as f32 / cdf_range as f32) * 255.0) as u8
            } else {
                pixel[0]
            };
            Luma([new_val])
        }).collect()
    }
}
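
The clip-and-redistribute step is the part of CLAHE that is easiest to get wrong with integer arithmetic, so it helps to look at it in isolation. The sketch below uses only the standard library. Both function names are hypothetical, and giving the division remainder to the lowest bins is one simple choice assumed here, not something the pipeline above prescribes.

```rust
/// Clip limit in counts for a tile of `tile_pixels` pixels, never below 1.
/// Hypothetical helper, for illustration only.
pub fn clip_limit_for_tile(clip_factor: f32, tile_pixels: u32) -> u32 {
    (((clip_factor * tile_pixels as f32) / 256.0) as u32).max(1)
}

/// Clip a 256-bin histogram at `clip_limit` counts per bin and spread the
/// excess evenly over all bins. The remainder of the division goes to the
/// lowest bins, one count each, so the total count is preserved.
pub fn clip_and_redistribute(hist: &[u32; 256], clip_limit: u32) -> [u32; 256] {
    let mut out = *hist;
    let mut excess = 0u32;
    for count in out.iter_mut() {
        if *count > clip_limit {
            excess += *count - clip_limit;
            *count = clip_limit;
        }
    }
    let share = excess / 256;
    let remainder = (excess % 256) as usize;
    for (i, count) in out.iter_mut().enumerate() {
        *count += share + if i < remainder { 1 } else { 0 };
    }
    out
}
```

With a clip factor of 2.0, a 64-pixel tile gets a clip limit of 1 and a 4096-pixel tile gets 32. Because the remainder is handed out explicitly, the clipped histogram still sums to the tile's pixel count, which keeps the CDF normalization consistent.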

Noise Reduction

use imageproc::filter::{gaussian_blur_f32, median_filter};

pub struct NoiseReducer;

impl NoiseReducer {
    /// Apply Gaussian blur for noise reduction
    pub fn gaussian_denoise(img: &GrayImage, sigma: f32) -> GrayImage {
        gaussian_blur_f32(img, sigma)
    }

    /// Apply bilateral filter (edge-preserving)
    pub fn bilateral_denoise(
        img: &GrayImage,
        spatial_sigma: f32,
        range_sigma: f32
    ) -> GrayImage {
        let (width, height) = img.dimensions();
        let mut output = GrayImage::new(width, height);

        let kernel_radius = (3.0 * spatial_sigma).ceil() as i32;

        for y in 0..height {
            for x in 0..width {
                let center_val = img.get_pixel(x, y)[0] as f32;
                let mut sum_weights = 0.0f32;
                let mut sum_values = 0.0f32;

                for dy in -kernel_radius..=kernel_radius {
                    for dx in -kernel_radius..=kernel_radius {
                        let nx = (x as i32 + dx).clamp(0, width as i32 - 1) as u32;
                        let ny = (y as i32 + dy).clamp(0, height as i32 - 1) as u32;

                        let neighbor_val = img.get_pixel(nx, ny)[0] as f32;

                        // Spatial weight
                        let spatial_dist = ((dx * dx + dy * dy) as f32).sqrt();
                        let spatial_weight = (-spatial_dist * spatial_dist /
                            (2.0 * spatial_sigma * spatial_sigma)).exp();

                        // Range weight
                        let range_dist = (center_val - neighbor_val).abs();
                        let range_weight = (-range_dist * range_dist /
                            (2.0 * range_sigma * range_sigma)).exp();

                        let weight = spatial_weight * range_weight;
                        sum_weights += weight;
                        sum_values += weight * neighbor_val;
                    }
                }

                let filtered_val = (sum_values / sum_weights).round() as u8;
                output.put_pixel(x, y, Luma([filtered_val]));
            }
        }

        output
    }

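    /// Apply median filter for impulse noise.
    /// `median_filter` takes x and y radii, so a `kernel_size` of 3 maps to radius 1
    pub fn median_denoise(img: &GrayImage, kernel_size: u32) -> GrayImage {
        let radius = kernel_size / 2;
        median_filter(img, radius, radius)
    }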
}

Sharpening Filters

use image::{GrayImage, Luma};
use imageproc::filter::{gaussian_blur_f32, sharpen3x3};

pub struct Sharpener;

impl Sharpener {
    /// Apply unsharp mask sharpening
    pub fn unsharp_mask(img: &GrayImage, sigma: f32, amount: f32) -> GrayImage {
        let blurred = gaussian_blur_f32(img, sigma);
        let (width, height) = img.dimensions();
        let mut output = GrayImage::new(width, height);

        for y in 0..height {
            for x in 0..width {
                let original = img.get_pixel(x, y)[0] as f32;
                let blur = blurred.get_pixel(x, y)[0] as f32;
                let sharpened = original + amount * (original - blur);
                output.put_pixel(x, y, Luma([sharpened.clamp(0.0, 255.0) as u8]));
            }
        }

        output
    }

    /// Apply 3x3 sharpening kernel
    pub fn sharpen_3x3(img: &GrayImage) -> GrayImage {
        sharpen3x3(img)
    }

    /// Apply Laplacian sharpening
    pub fn laplacian_sharpen(img: &GrayImage, strength: f32) -> GrayImage {
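        let (width, height) = img.dimensions();
        // Start from a copy so the one-pixel border keeps its original values
        let mut output = img.clone();

        // Images smaller than the kernel have no interior to sharpen
        if width < 3 || height < 3 {
            return output;
        }

        // Laplacian kernel
        let kernel: [[f32; 3]; 3] = [
            [0.0, -1.0, 0.0],
            [-1.0, 4.0, -1.0],
            [0.0, -1.0, 0.0],
        ];

        for y in 1..height - 1 {
            for x in 1..width - 1 {
                let mut laplacian = 0.0f32;

                for ky in 0..3u32 {
                    for kx in 0..3u32 {
                        let px = img.get_pixel(x + kx - 1, y + ky - 1)[0] as f32;
                        // Array indices must be usize, pixel coordinates are u32
                        laplacian += px * kernel[ky as usize][kx as usize];
                    }
                }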

                let original = img.get_pixel(x, y)[0] as f32;
                let sharpened = original + strength * laplacian;
                output.put_pixel(x, y, Luma([sharpened.clamp(0.0, 255.0) as u8]));
            }
        }

        output
    }
}

Binarization

use imageproc::contrast::{adaptive_threshold, otsu_level, threshold, ThresholdType};

pub struct Binarizer;

impl Binarizer {
    /// Apply Otsu's automatic thresholding
    pub fn otsu_binarize(img: &GrayImage) -> GrayImage {
        let threshold_value = otsu_level(img);
        threshold(img, threshold_value, ThresholdType::Binary)
    }

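    /// Apply adaptive (local mean) thresholding.
    /// Assumes the imageproc 0.25 signature, which takes a block radius and no
    /// offset: `block_size` is halved, and `_c` is kept in the API but unused
    pub fn adaptive_binarize(img: &GrayImage, block_size: u32, _c: i32) -> GrayImage {
        adaptive_threshold(img, (block_size / 2).max(1))
    }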

    /// Apply Sauvola's local thresholding (good for varied illumination)
    pub fn sauvola_binarize(img: &GrayImage, window_size: u32, k: f32) -> GrayImage {
        let (width, height) = img.dimensions();
        let mut output = GrayImage::new(width, height);
        let half_window = (window_size / 2) as i32;

        for y in 0..height {
            for x in 0..width {
                // Compute local mean and standard deviation
                let mut sum = 0.0f32;
                let mut sum_sq = 0.0f32;
                let mut count = 0u32;

                for dy in -half_window..=half_window {
                    for dx in -half_window..=half_window {
                        let nx = (x as i32 + dx).clamp(0, width as i32 - 1) as u32;
                        let ny = (y as i32 + dy).clamp(0, height as i32 - 1) as u32;
                        let val = img.get_pixel(nx, ny)[0] as f32;
                        sum += val;
                        sum_sq += val * val;
                        count += 1;
                    }
                }

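                let mean = sum / count as f32;
                // Clamp at zero: rounding can make the variance slightly
                // negative, and the square root would then be NaN
                let variance = ((sum_sq / count as f32) - (mean * mean)).max(0.0);
                let std_dev = variance.sqrt();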

                // Sauvola threshold
                let threshold = mean * (1.0 + k * ((std_dev / 128.0) - 1.0));
                let pixel_val = img.get_pixel(x, y)[0] as f32;

                let binary = if pixel_val > threshold { 255 } else { 0 };
                output.put_pixel(x, y, Luma([binary]));
            }
        }

        output
    }
}
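
To make the Sauvola rule concrete, the threshold for a single window can be computed separately from the image loop. This is a minimal standard-library sketch; `window_mean_std` and `sauvola_threshold` are hypothetical names, and `r = 128` is the dynamic range of the standard deviation used in the code above for 8-bit images.

```rust
/// Mean and population standard deviation of a window of 8-bit pixels.
/// Accumulates in f64 and clamps the variance at zero before the square root.
/// Hypothetical helper, for illustration only.
pub fn window_mean_std(window: &[u8]) -> (f64, f64) {
    if window.is_empty() {
        return (0.0, 0.0);
    }
    let n = window.len() as f64;
    let sum: f64 = window.iter().map(|&p| p as f64).sum();
    let sum_sq: f64 = window.iter().map(|&p| (p as f64) * (p as f64)).sum();
    let mean = sum / n;
    let variance = (sum_sq / n - mean * mean).max(0.0);
    (mean, variance.sqrt())
}

/// Sauvola threshold: T = mean * (1 + k * (std_dev / r - 1)).
pub fn sauvola_threshold(mean: f64, std_dev: f64, k: f64, r: f64) -> f64 {
    mean * (1.0 + k * (std_dev / r - 1.0))
}
```

In a flat region (standard deviation 0) with mean 200 and k = 0.2, the threshold is 200 × 0.8 = 160, so background pixels at 200 stay white. In a high-contrast window the standard deviation approaches `r`, the factor approaches 1, and the threshold moves up toward the local mean.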

3. Geometric Corrections

Auto-Rotation Detection

use imageproc::geometric_transformations::{rotate_about_center, Interpolation};
use std::f32::consts::PI;

pub struct RotationDetector;

impl RotationDetector {
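    /// Detect the rotation that best aligns text rows, using projection profiles.
    /// The result is the candidate angle whose rotated image has the highest
    /// row-projection variance. A projection profile cannot reliably tell an
    /// image from its 180-degree rotation, so upside-down pages need a separate check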
    pub fn detect_rotation_angle(img: &GrayImage) -> f32 {
        let angles = [-90.0, -45.0, -30.0, -15.0, 0.0, 15.0, 30.0, 45.0, 90.0, 180.0];
        let mut max_variance = 0.0f32;
        let mut best_angle = 0.0f32;

        for &angle in &angles {
            let rotated = Self::rotate_image(img, angle);
            let variance = Self::compute_horizontal_projection_variance(&rotated);

            if variance > max_variance {
                max_variance = variance;
                best_angle = angle;
            }
        }

        best_angle
    }

    /// Compute variance of horizontal projection profile
    fn compute_horizontal_projection_variance(img: &GrayImage) -> f32 {
        let (width, height) = img.dimensions();
        let mut projections = vec![0u32; height as usize];

        for y in 0..height {
            let mut sum = 0u32;
            for x in 0..width {
                sum += img.get_pixel(x, y)[0] as u32;
            }
            projections[y as usize] = sum;
        }

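        // Compute variance. Sum in u64: 255 * width * height overflows u32
        // for images larger than roughly 4000 x 4000
        let mean = projections.iter().map(|&x| x as u64).sum::<u64>() as f32 / height as f32;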
        let variance = projections.iter()
            .map(|&x| (x as f32 - mean).powi(2))
            .sum::<f32>() / height as f32;

        variance
    }

    /// Rotate image about its center by the specified angle in degrees
    pub fn rotate_image(img: &GrayImage, angle_degrees: f32) -> GrayImage {
        let angle_rad = angle_degrees * PI / 180.0;

        // `rotate_about_center` takes no center argument (use `rotate` to
        // choose one); the canvas keeps its size and gaps are filled with white
        rotate_about_center(img, angle_rad, Interpolation::Bilinear, Luma([255]))
    }

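    /// Detect and apply the best rotation from the candidate angles.
    /// Returns the corrected image and the rotation that was applied
    pub fn auto_correct_rotation(img: &GrayImage) -> (GrayImage, f32) {
        let angle = Self::detect_rotation_angle(img);
        // `angle` is already the rotation that maximized the score, so apply it as is
        let corrected = Self::rotate_image(img, angle);
        (corrected, angle)
    }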
}

Deskewing Algorithms

use imageproc::hough::detect_lines;

pub struct Deskewer;

impl Deskewer {
    /// Detect skew angle using Hough transform
    pub fn detect_skew_hough(img: &GrayImage) -> f32 {
        // Apply edge detection first
        let edges = imageproc::edges::canny(img, 50.0, 100.0);

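        // Use Hough transform to detect dominant lines. `detect_lines` takes an
        // options struct; the suppression radius of 8 is an assumed starting value
        let options = imageproc::hough::LineDetectionOptions {
            vote_threshold: 50,
            suppression_radius: 8,
        };
        let lines = detect_lines(&edges, options);

        if lines.is_empty() {
            return 0.0;
        }

        // Extract angles from detected lines. `angle_in_degrees` is an integer
        // field in [0, 180); subtracting 90 and folding maps both near-horizontal
        // and near-vertical lines to a skew near 0
        let mut angles: Vec<f32> = lines.iter()
            .map(|line| {
                let theta = line.angle_in_degrees as f32 - 90.0;
                // Fold into the [-45, 45] range
                if theta > 45.0 {
                    theta - 90.0
                } else if theta < -45.0 {
                    theta + 90.0
                } else {
                    theta
                }
            })
            .collect();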

        // Find median angle
        angles.sort_by(|a, b| a.partial_cmp(b).unwrap());
        let median_angle = if angles.len() % 2 == 0 {
            (angles[angles.len() / 2 - 1] + angles[angles.len() / 2]) / 2.0
        } else {
            angles[angles.len() / 2]
        };

        median_angle
    }

    /// Detect skew using projection profiles
    pub fn detect_skew_projection(img: &GrayImage) -> f32 {
        let test_angles: Vec<f32> = (-45..=45).step_by(1).map(|x| x as f32).collect();
        let mut max_variance = 0.0f32;
        let mut best_angle = 0.0f32;

        for angle in test_angles {
            let rotated = RotationDetector::rotate_image(img, angle);
            let variance = Self::compute_projection_variance(&rotated);

            if variance > max_variance {
                max_variance = variance;
                best_angle = angle;
            }
        }

        -best_angle // best_angle is the correcting rotation; negate to report the skew itself
    }

    fn compute_projection_variance(img: &GrayImage) -> f32 {
        let (width, height) = img.dimensions();
        let mut row_sums = vec![0u32; height as usize];

        for y in 0..height {
            for x in 0..width {
                row_sums[y as usize] += img.get_pixel(x, y)[0] as u32;
            }
        }

        // Sum in u64 so large images cannot overflow u32
        let mean = row_sums.iter().map(|&x| x as u64).sum::<u64>() as f32 / height as f32;
        row_sums.iter()
            .map(|&x| (x as f32 - mean).powi(2))
            .sum::<f32>() / height as f32
    }

    /// Apply deskewing correction
    pub fn deskew(img: &GrayImage) -> (GrayImage, f32) {
        let angle = Self::detect_skew_hough(img);
        let corrected = RotationDetector::rotate_image(img, -angle);
        (corrected, angle)
    }
}
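
The projection-profile score used for both rotation detection and deskewing relies on one property: rows of text produce alternating dark and light row sums, so the variance of the row sums peaks when text lines are horizontal. The sketch below demonstrates that property on a raw row-major buffer using only the standard library; `row_projection_variance` is a hypothetical stand-in for the methods above, not project code.

```rust
/// Variance of per-row intensity sums for a row-major 8-bit grayscale buffer.
/// Hypothetical helper, for illustration only.
pub fn row_projection_variance(pixels: &[u8], width: usize, height: usize) -> f64 {
    assert_eq!(pixels.len(), width * height, "buffer size must match dimensions");
    if width == 0 || height == 0 {
        return 0.0;
    }
    // Sum each row in u64 so large images cannot overflow.
    let row_sums: Vec<u64> = pixels
        .chunks_exact(width)
        .map(|row| row.iter().map(|&p| p as u64).sum())
        .collect();
    let mean = row_sums.iter().sum::<u64>() as f64 / height as f64;
    row_sums
        .iter()
        .map(|&s| (s as f64 - mean).powi(2))
        .sum::<f64>()
        / height as f64
}
```

For a 4 x 4 image with horizontal stripes (rows alternating 0 and 255) the row sums alternate between 0 and 1020, which gives a high variance. The same stripes turned vertical give every row the same sum of 510 and a variance of zero, which is why the search prefers the angle that makes text lines horizontal.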

Perspective Correction

use imageproc::geometric_transformations::warp;
use imageproc::geometric_transformations::Projection;

pub struct PerspectiveCorrector;

impl PerspectiveCorrector {
    /// Detect document corners
    pub fn detect_corners(img: &GrayImage) -> Option<[(f32, f32); 4]> {
        // Apply edge detection
        let edges = imageproc::edges::canny(img, 50.0, 150.0);

        // Find contours (simplified - use a contour detection library in production)
        // For now, assume corners are provided or detected via another method

        // Return corners in order: top-left, top-right, bottom-right, bottom-left
        None // Placeholder
    }

    /// Apply perspective transformation.
    /// Returns `None` if the corners are degenerate and no homography exists
    pub fn correct_perspective(
        img: &GrayImage,
        src_corners: [(f32, f32); 4],
        output_width: u32,
        output_height: u32
    ) -> Option<GrayImage> {
        // Define destination corners (rectangle)
        let dst_corners = [
            (0.0, 0.0),
            (output_width as f32, 0.0),
            (output_width as f32, output_height as f32),
            (0.0, output_height as f32),
        ];

        // Compute perspective transformation matrix
        let projection = Self::compute_perspective_transform(&src_corners, &dst_corners)?;

        // `warp` always returns an image of the input's size, so warp into a
        // buffer of the requested output size instead
        let mut output = GrayImage::new(output_width, output_height);
        imageproc::geometric_transformations::warp_into(
            img,
            &projection,
            Interpolation::Bilinear,
            Luma([255]),
            &mut output,
        );
        Some(output)
    }

    /// Compute the homography that maps the `src` corners onto the `dst` corners
    fn compute_perspective_transform(
        src: &[(f32, f32); 4],
        dst: &[(f32, f32); 4]
    ) -> Option<Projection> {
        // imageproc solves the four-point homography directly
        Projection::from_control_points(*src, *dst)
    }
}
}

4. Resolution Handling

Optimal Resolution for OCR

use image::{DynamicImage, GenericImageView};
use image::imageops::{FilterType, resize};

pub struct ResolutionHandler {
    target_size: (u32, u32),
    min_size: (u32, u32),
    max_size: (u32, u32),
}

impl ResolutionHandler {
    pub fn new() -> Self {
        Self {
            target_size: (384, 384), // Default working size; tune to the OCR model's expected input
            min_size: (224, 224),
            max_size: (640, 640),
        }
    }

    /// Resize image to optimal resolution
    pub fn normalize_resolution(&self, img: &DynamicImage) -> DynamicImage {
        let (width, height) = img.dimensions();

        // Check if resize is needed
        if width >= self.min_size.0 && width <= self.max_size.0 &&
           height >= self.min_size.1 && height <= self.max_size.1 {
            return img.clone();
        }

        // Compute scaling factor while preserving aspect ratio
        let scale = self.compute_scale_factor(width, height);
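        // Round rather than truncate, and never produce a zero-sized dimension
        let new_width = ((width as f32 * scale).round() as u32).max(1);
        let new_height = ((height as f32 * scale).round() as u32).max(1);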

        // Use high-quality filter for upscaling, faster filter for downscaling
        let filter = if scale > 1.0 {
            FilterType::Lanczos3
        } else {
            FilterType::Triangle
        };

        DynamicImage::ImageRgba8(resize(img, new_width, new_height, filter))
    }

    /// Compute optimal scale factor
    fn compute_scale_factor(&self, width: u32, height: u32) -> f32 {
        let target_w = self.target_size.0 as f32;
        let target_h = self.target_size.1 as f32;

        let scale_w = target_w / width as f32;
        let scale_h = target_h / height as f32;

        // Use the minimum scale to ensure entire image fits
        scale_w.min(scale_h)
    }

    /// Upscale with super-resolution (placeholder for SR models)
    pub fn upscale_sr(&self, img: &DynamicImage, factor: u32) -> DynamicImage {
        // Placeholder for super-resolution model
        // In production, integrate with ONNX/TensorFlow models
        let (width, height) = img.dimensions();
        DynamicImage::ImageRgba8(resize(
            img,
            width * factor,
            height * factor,
            FilterType::Lanczos3
        ))
    }

    /// Preserve aspect ratio with padding
    pub fn resize_with_padding(&self, img: &DynamicImage) -> DynamicImage {
        let (width, height) = img.dimensions();
        let (target_w, target_h) = self.target_size;

        let scale = (target_w as f32 / width as f32).min(target_h as f32 / height as f32);
        let new_width = (width as f32 * scale) as u32;
        let new_height = (height as f32 * scale) as u32;

        // Resize image
        let resized = resize(img, new_width, new_height, FilterType::Lanczos3);

        // Create padded image
        let mut padded = DynamicImage::new_rgba8(target_w, target_h);
        let x_offset = (target_w - new_width) / 2;
        let y_offset = (target_h - new_height) / 2;

        image::imageops::overlay(&mut padded, &resized, x_offset as i64, y_offset as i64);

        padded
    }
}
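
The arithmetic behind `resize_with_padding` (one scale factor for both axes, then centering offsets) is easy to check in isolation. The following is a small std-only sketch; the `Letterbox` struct and `letterbox` function are illustrative names that do not appear in the pipeline above.

```rust
/// Geometry of an aspect-preserving resize into a fixed target canvas.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
pub struct Letterbox {
    pub new_width: u32,
    pub new_height: u32,
    pub x_offset: u32,
    pub y_offset: u32,
}

/// Compute the resized dimensions and the padding offsets that center the
/// resized image inside `target`. Returns None for zero-sized inputs.
pub fn letterbox(width: u32, height: u32, target: (u32, u32)) -> Option<Letterbox> {
    let (target_w, target_h) = target;
    if width == 0 || height == 0 || target_w == 0 || target_h == 0 {
        return None;
    }

    // The smaller of the two ratios guarantees the whole image fits.
    let scale = (target_w as f64 / width as f64).min(target_h as f64 / height as f64);

    // Clamp so rounding can never exceed the canvas or collapse to zero.
    let new_width = ((width as f64 * scale).round() as u32).clamp(1, target_w);
    let new_height = ((height as f64 * scale).round() as u32).clamp(1, target_h);

    Some(Letterbox {
        new_width,
        new_height,
        x_offset: (target_w - new_width) / 2,
        y_offset: (target_h - new_height) / 2,
    })
}
```

For a 1920x1080 input and the 384x384 target used by `ResolutionHandler::new`, the limiting ratio is 384/1920 = 0.2, so the image becomes 384x216 with 84 rows of padding above and below.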

5. Text Region Detection

Connected Component Analysis

use imageproc::region_labelling::{connected_components, Connectivity};
use imageproc::rect::Rect;

pub struct TextDetector {
    min_component_size: u32,
    max_component_size: u32,
}

impl TextDetector {
    pub fn new() -> Self {
        Self {
            min_component_size: 10,
            max_component_size: 10000,
        }
    }

    /// Detect text regions using connected components
    pub fn detect_text_regions(&self, binary_img: &GrayImage) -> Vec<Rect> {
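        // Pixels equal to the background value get label 0. Binarized documents are
        // dark text on a white page, so white (255) is the background here; this
        // matches LineSegmenter, which treats pixels below 128 as text.
        let labeled = connected_components(binary_img, Connectivity::Eight, Luma([255u8]));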
        let mut components = std::collections::HashMap::new();

        // Group pixels by label
        for y in 0..labeled.height() {
            for x in 0..labeled.width() {
                let label = labeled.get_pixel(x, y)[0];
                if label > 0 {
                    components.entry(label)
                        .or_insert_with(Vec::new)
                        .push((x, y));
                }
            }
        }

        // Compute bounding boxes
        let mut bboxes = Vec::new();
        for (_, pixels) in components {
            if pixels.len() < self.min_component_size as usize ||
               pixels.len() > self.max_component_size as usize {
                continue;
            }

            let min_x = pixels.iter().map(|(x, _)| x).min().unwrap();
            let max_x = pixels.iter().map(|(x, _)| x).max().unwrap();
            let min_y = pixels.iter().map(|(_, y)| y).min().unwrap();
            let max_y = pixels.iter().map(|(_, y)| y).max().unwrap();

            bboxes.push(Rect::at(*min_x as i32, *min_y as i32)
                .of_size((max_x - min_x + 1), (max_y - min_y + 1)));
        }

        bboxes
    }

    /// Merge nearby bounding boxes (for text lines)
    pub fn merge_bboxes(&self, bboxes: Vec<Rect>, max_distance: u32) -> Vec<Rect> {
        if bboxes.is_empty() {
            return Vec::new();
        }

        let mut merged = Vec::new();
        let mut used = vec![false; bboxes.len()];

        for i in 0..bboxes.len() {
            if used[i] {
                continue;
            }

            let mut current = bboxes[i];
            used[i] = true;

            // Try to merge with other boxes
            let mut changed = true;
            while changed {
                changed = false;
                for j in 0..bboxes.len() {
                    if used[j] {
                        continue;
                    }

                    if Self::boxes_close(&current, &bboxes[j], max_distance) {
                        current = Self::union_rect(&current, &bboxes[j]);
                        used[j] = true;
                        changed = true;
                    }
                }
            }

            merged.push(current);
        }

        merged
    }

    fn boxes_close(a: &Rect, b: &Rect, max_distance: u32) -> bool {
        let dx = if a.left() > b.right() {
            a.left() - b.right()
        } else if b.left() > a.right() {
            b.left() - a.right()
        } else {
            0
        };

        let dy = if a.top() > b.bottom() {
            a.top() - b.bottom()
        } else if b.top() > a.bottom() {
            b.top() - a.bottom()
        } else {
            0
        };

        (dx * dx + dy * dy) <= (max_distance * max_distance) as i32
    }

    fn union_rect(a: &Rect, b: &Rect) -> Rect {
        let left = a.left().min(b.left());
        let top = a.top().min(b.top());
        let right = a.right().max(b.right());
        let bottom = a.bottom().max(b.bottom());

        // Rect::right() and Rect::bottom() are inclusive, so add 1 to recover the size
        Rect::at(left, top).of_size((right - left + 1) as u32, (bottom - top + 1) as u32)
    }
}
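
To make the connected-component step concrete, here is a std-only sketch that labels 8-connected foreground regions with a breadth-first flood fill and returns their bounding boxes. It is an illustration of the technique rather than what `imageproc` does internally; `component_bboxes`, the `BBox` alias, and the boolean grid representation (assumed rectangular) are all hypothetical.

```rust
use std::collections::VecDeque;

/// Bounding box as (min_x, min_y, max_x, max_y), all inclusive.
pub type BBox = (usize, usize, usize, usize);

/// Find 8-connected regions of `true` cells and return the bounding box of
/// every region with at least `min_pixels` cells, in scan order.
pub fn component_bboxes(grid: &[Vec<bool>], min_pixels: usize) -> Vec<BBox> {
    let height = grid.len();
    let width = grid.first().map_or(0, |row| row.len());
    let mut visited = vec![vec![false; width]; height];
    let mut boxes = Vec::new();

    for sy in 0..height {
        for sx in 0..width {
            if !grid[sy][sx] || visited[sy][sx] {
                continue;
            }

            // Flood fill from this seed pixel.
            let mut queue = VecDeque::from([(sx, sy)]);
            visited[sy][sx] = true;
            let (mut min_x, mut min_y, mut max_x, mut max_y) = (sx, sy, sx, sy);
            let mut count = 0usize;

            while let Some((x, y)) = queue.pop_front() {
                count += 1;
                min_x = min_x.min(x);
                max_x = max_x.max(x);
                min_y = min_y.min(y);
                max_y = max_y.max(y);

                // Visit all eight neighbours (the centre is already visited).
                for dy in -1i64..=1 {
                    for dx in -1i64..=1 {
                        let (nx, ny) = (x as i64 + dx, y as i64 + dy);
                        if nx < 0 || ny < 0 || nx >= width as i64 || ny >= height as i64 {
                            continue;
                        }
                        let (nx, ny) = (nx as usize, ny as usize);
                        if grid[ny][nx] && !visited[ny][nx] {
                            visited[ny][nx] = true;
                            queue.push_back((nx, ny));
                        }
                    }
                }
            }

            if count >= min_pixels {
                boxes.push((min_x, min_y, max_x, max_y));
            }
        }
    }

    boxes
}
```

Unlike `detect_text_regions`, this version tracks the bounding box while filling, so it never stores the full pixel list of each component. The `min_pixels` filter plays the role of `min_component_size`.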

Line Segmentation

pub struct LineSegmenter;

impl LineSegmenter {
    /// Segment image into text lines using horizontal projection
    pub fn segment_lines(img: &GrayImage) -> Vec<(u32, u32)> {
        let (_, height) = img.dimensions();
        let projection = Self::horizontal_projection(img);

        // Find valleys in projection (gaps between lines)
        let mut lines = Vec::new();
        let mut in_line = false;
        let mut line_start = 0u32;

        let threshold = Self::compute_threshold(&projection);

        for (y, &val) in projection.iter().enumerate() {
            if val > threshold && !in_line {
                line_start = y as u32;
                in_line = true;
            } else if val <= threshold && in_line {
                lines.push((line_start, y as u32));
                in_line = false;
            }
        }

        // Handle last line
        if in_line {
            lines.push((line_start, height));
        }

        lines
    }

    fn horizontal_projection(img: &GrayImage) -> Vec<u32> {
        let (width, height) = img.dimensions();
        let mut projection = vec![0u32; height as usize];

        for y in 0..height {
            let mut sum = 0u32;
            for x in 0..width {
                // Count black pixels (text)
                if img.get_pixel(x, y)[0] < 128 {
                    sum += 1;
                }
            }
            projection[y as usize] = sum;
        }

        projection
    }

    fn compute_threshold(projection: &[u32]) -> u32 {
        let max_val = projection.iter().max().copied().unwrap_or(0);
        max_val / 10 // 10% of maximum
    }
}
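
The valley-finding logic in `segment_lines` can be exercised on a bare projection profile, without any image. The sketch below repeats the same rule (a row belongs to a line when its ink count exceeds 10% of the maximum) on a slice of per-row counts; `segment_rows` is a hypothetical name used only for this example.

```rust
/// Split a horizontal projection profile into text lines.
/// Returns (start_row, end_row_exclusive) for each run of rows whose ink
/// count is strictly above 10% of the largest count in the profile.
pub fn segment_rows(ink_per_row: &[u32]) -> Vec<(usize, usize)> {
    let threshold = ink_per_row.iter().max().copied().unwrap_or(0) / 10;
    let mut lines = Vec::new();
    let mut start: Option<usize> = None;

    for (y, &ink) in ink_per_row.iter().enumerate() {
        match (ink > threshold, start) {
            // Entering a line
            (true, None) => start = Some(y),
            // Leaving a line: the current row is the first gap row
            (false, Some(s)) => {
                lines.push((s, y));
                start = None;
            }
            _ => {}
        }
    }

    // A line that runs to the bottom edge has no closing gap row
    if let Some(s) = start {
        lines.push((s, ink_per_row.len()));
    }

    lines
}
```

With the profile `[0, 40, 50, 45, 2, 0, 38, 42, 0]` the threshold is 5, so the stray row with 2 dark pixels is treated as a gap and two lines come out: rows 1 to 3 and rows 6 to 7.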

6. PDF/Document Processing

PDF Rasterization

// Note: Requires pdfium-render or pdf crate
// This is a conceptual implementation

use std::path::Path;

pub struct PdfProcessor {
    dpi: u32,
    page_limit: Option<usize>,
}

impl PdfProcessor {
    pub fn new() -> Self {
        Self {
            dpi: 300, // Standard DPI for OCR
            page_limit: Some(100), // Prevent processing huge PDFs
        }
    }

    /// Rasterize PDF to images
    /// Note: Requires pdfium-render crate
    pub fn rasterize_pdf<P: AsRef<Path>>(
        &self,
        pdf_path: P
    ) -> Result<Vec<DynamicImage>, Box<dyn std::error::Error>> {
        // Placeholder - actual implementation requires pdfium-render
        //
        // Example with pdfium-render:
        // let pdfium = Pdfium::new(...)?;
        // let document = pdfium.load_pdf_from_file(pdf_path, None)?;
        //
        // let mut images = Vec::new();
        // for page_index in 0..document.pages().len() {
        //     let page = document.pages().get(page_index)?;
        //     let render_config = PdfRenderConfig::new()
        //         .set_target_width(((page.width() * self.dpi as f32) / 72.0) as u32)
        //         .set_target_height(((page.height() * self.dpi as f32) / 72.0) as u32);
        //
        //     let bitmap = page.render_with_config(&render_config)?;
        //     let image = bitmap.as_image();
        //     images.push(image);
        // }

        Ok(Vec::new())
    }

    /// Extract embedded images from PDF
    pub fn extract_images<P: AsRef<Path>>(
        &self,
        pdf_path: P
    ) -> Result<Vec<DynamicImage>, Box<dyn std::error::Error>> {
        // Placeholder for image extraction
        // Actual implementation requires PDF parsing
        Ok(Vec::new())
    }

    /// Process multi-page PDF
    pub fn process_multipage<P: AsRef<Path>>(
        &self,
        pdf_path: P,
        processor: impl Fn(DynamicImage) -> Result<(), Box<dyn std::error::Error>>
    ) -> Result<(), Box<dyn std::error::Error>> {
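        // NOTE: a real rasterize_pdf should itself stop at page_limit so that huge
        // PDFs are never fully rendered; here the limit only bounds the callback.
        let images = self.rasterize_pdf(pdf_path)?;
        let limit = self.page_limit.unwrap_or(usize::MAX);

        for img in images.into_iter().take(limit) {
            processor(img)?;
        }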

        Ok(())
    }
}
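
The target width and height in the commented pdfium example follow from the fact that PDF page sizes are expressed in points (1/72 inch). The sketch below works through that conversion and the resulting memory cost; `raster_size` and `rgba_bytes` are hypothetical helpers, and the 4 bytes per pixel figure assumes pages are held as RGBA8.

```rust
/// PDF page dimensions are given in points: 1 pt = 1/72 inch.
const POINTS_PER_INCH: f32 = 72.0;

/// Pixel dimensions of a page rendered at `dpi` dots per inch.
pub fn raster_size(width_pts: f32, height_pts: f32, dpi: u32) -> (u32, u32) {
    let to_px = |pts: f32| (pts * dpi as f32 / POINTS_PER_INCH).round() as u32;
    (to_px(width_pts), to_px(height_pts))
}

/// Bytes needed to hold one rendered page as RGBA8 (assumed: 4 bytes per pixel).
pub fn rgba_bytes(width_px: u32, height_px: u32) -> u64 {
    width_px as u64 * height_px as u64 * 4
}
```

A US Letter page (612 x 792 pt) at 300 DPI becomes 2550 x 3300 pixels, or about 33.7 MB as RGBA8. Collecting 100 such pages into a `Vec` before processing would need roughly 3.4 GB, which is why a page limit (or page-at-a-time streaming) matters.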

7. Memory and Performance Optimization

Zero-Copy Operations

use std::sync::Arc;
use image::ImageBuffer;

pub struct OptimizedProcessor {
    use_zero_copy: bool,
}

impl OptimizedProcessor {
    /// Process image with zero-copy where possible
    pub fn process_zero_copy(img: Arc<DynamicImage>) -> Arc<DynamicImage> {
        // Use Arc to share image data without copying
        // Apply non-mutating operations
        img
    }

    /// Use image views instead of copying
    pub fn process_view<'a>(img: &'a DynamicImage) -> &'a DynamicImage {
        // Return views when possible
        img
    }
}

SIMD Acceleration

// Enable SIMD for performance-critical operations
#[cfg(target_feature = "avx2")]
use std::arch::x86_64::*;

pub struct SimdProcessor;

impl SimdProcessor {
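    /// SIMD-accelerated threshold operation
    #[cfg(target_feature = "avx2")]
    pub unsafe fn threshold_simd(data: &mut [u8], threshold: u8) {
        // AVX2 only offers a signed byte compare, so flip the sign bit of both
        // operands to obtain the unsigned ordering.
        let sign = _mm256_set1_epi8(i8::MIN);
        let threshold_vec = _mm256_xor_si256(_mm256_set1_epi8(threshold as i8), sign);
        let mut chunks = data.chunks_exact_mut(32);

        for chunk in &mut chunks {
            let values = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i);
            // The compare mask is already 0xFF (255) or 0x00 per byte
            let mask = _mm256_cmpgt_epi8(_mm256_xor_si256(values, sign), threshold_vec);
            _mm256_storeu_si256(chunk.as_mut_ptr() as *mut __m256i, mask);
        }

        // Handle the tail that does not fill a full 32-byte vector
        Self::threshold_portable(chunks.into_remainder(), threshold);
    }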

    /// Portable version without SIMD
    pub fn threshold_portable(data: &mut [u8], threshold: u8) {
        for pixel in data {
            *pixel = if *pixel > threshold { 255 } else { 0 };
        }
    }
}
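
`#[cfg(target_feature = "avx2")]` is resolved at compile time, so the SIMD path above only exists when the crate is built with AVX2 enabled. An alternative is to compile both paths and choose at runtime. The sketch below shows one way to do that with the standard library's `is_x86_feature_detected!` macro; the function names are illustrative and not part of the pipeline.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar reference: pixels strictly above `level` become 255, others 0.
pub fn threshold_scalar(data: &mut [u8], level: u8) {
    for pixel in data.iter_mut() {
        *pixel = if *pixel > level { 255 } else { 0 };
    }
}

/// AVX2 path, compiled regardless of build flags but only safe to call
/// on CPUs that actually support AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn threshold_avx2(data: &mut [u8], level: u8) {
    // Flip the sign bit so the signed compare behaves like an unsigned one.
    let sign = _mm256_set1_epi8(i8::MIN);
    let limit = _mm256_xor_si256(_mm256_set1_epi8(level as i8), sign);
    let mut chunks = data.chunks_exact_mut(32);

    for chunk in &mut chunks {
        let values = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i);
        let mask = _mm256_cmpgt_epi8(_mm256_xor_si256(values, sign), limit);
        _mm256_storeu_si256(chunk.as_mut_ptr() as *mut __m256i, mask);
    }

    // Finish the last (fewer than 32) bytes with the scalar loop.
    threshold_scalar(chunks.into_remainder(), level);
}

/// Threshold in place, using AVX2 when the running CPU supports it.
pub fn threshold_in_place(data: &mut [u8], level: u8) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            unsafe { threshold_avx2(data, level) };
            return;
        }
    }
    threshold_scalar(data, level);
}
```

Runtime dispatch keeps a single binary usable on older CPUs, at the cost of one feature check per call. Comparing the result against `threshold_scalar` on inputs that include values of 128 and above is a cheap way to catch signed-compare mistakes.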

GPU Preprocessing (via wgpu)

// Conceptual GPU preprocessing using wgpu
use wgpu;

pub struct GpuPreprocessor {
    device: wgpu::Device,
    queue: wgpu::Queue,
}

impl GpuPreprocessor {
    /// Initialize GPU context
    pub async fn new() -> Result<Self, Box<dyn std::error::Error>> {
        let instance = wgpu::Instance::new(wgpu::InstanceDescriptor::default());
        let adapter = instance.request_adapter(&wgpu::RequestAdapterOptions::default())
            .await
            .ok_or("Failed to find adapter")?;

        let (device, queue) = adapter.request_device(
            &wgpu::DeviceDescriptor::default(),
            None
        ).await?;

        Ok(Self { device, queue })
    }

    /// GPU-accelerated gaussian blur
    pub fn gpu_gaussian_blur(&self, img: &GrayImage, sigma: f32) -> GrayImage {
        // Implement GPU-based blur using compute shaders
        // Upload image to GPU, run shader, download result
        img.clone() // Placeholder
    }

    /// Batch processing on GPU
    pub fn batch_process_gpu(&self, images: Vec<GrayImage>) -> Vec<GrayImage> {
        // Process multiple images in parallel on GPU
        images // Placeholder
    }
}

Batch Operations

use rayon::prelude::*;

pub struct BatchProcessor;

impl BatchProcessor {
    /// Process multiple images in parallel
    pub fn batch_process<F>(images: Vec<DynamicImage>, processor: F) -> Vec<DynamicImage>
    where
        F: Fn(DynamicImage) -> DynamicImage + Sync + Send
    {
        images.into_par_iter()
            .map(processor)
            .collect()
    }

    /// Pipeline processing with parallelism
    pub fn pipeline_process(images: Vec<DynamicImage>) -> Vec<GrayImage> {
        images.into_par_iter()
            .map(|img| img.to_luma8())
            .map(|img| NoiseReducer::gaussian_denoise(&img, 1.0))
            .map(|img| Binarizer::otsu_binarize(&img))
            .collect()
    }
}

Memory Pool

use std::collections::VecDeque;
use image::{ImageBuffer, Rgba};

pub struct ImagePool {
    pool: VecDeque<ImageBuffer<Rgba<u8>, Vec<u8>>>,
    max_size: usize,
}

impl ImagePool {
    pub fn new(max_size: usize) -> Self {
        Self {
            pool: VecDeque::new(),
            max_size,
        }
    }

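    /// Get buffer from pool or allocate new
    pub fn acquire(&mut self, width: u32, height: u32) -> ImageBuffer<Rgba<u8>, Vec<u8>> {
        // Only reuse a pooled buffer whose dimensions match the request;
        // popping an arbitrary buffer could return the wrong size.
        match self.pool.iter().position(|b| b.dimensions() == (width, height)) {
            Some(idx) => self.pool.remove(idx).expect("index from position is valid"),
            None => ImageBuffer::new(width, height),
        }
    }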

    /// Return buffer to pool
    pub fn release(&mut self, mut buffer: ImageBuffer<Rgba<u8>, Vec<u8>>) {
        if self.pool.len() < self.max_size {
            // Clear buffer
            for pixel in buffer.pixels_mut() {
                *pixel = Rgba([0, 0, 0, 0]);
            }
            self.pool.push_back(buffer);
        }
    }
}

Complete Pipeline Implementation

use image::{DynamicImage, GenericImageView, GrayImage, ImageError};

pub struct PreprocessingPipeline {
    loader: ImageLoader,
    enhancer: ImageEnhancer,
    rotation_detector: RotationDetector,
    deskewer: Deskewer,
    resolution_handler: ResolutionHandler,
    binarizer: Binarizer,
    text_detector: TextDetector,
}

impl PreprocessingPipeline {
    pub fn new() -> Self {
        Self {
            loader: ImageLoader::new(),
            enhancer: ImageEnhancer::new(),
            rotation_detector: RotationDetector,
            deskewer: Deskewer,
            resolution_handler: ResolutionHandler::new(),
            binarizer: Binarizer,
            text_detector: TextDetector::new(),
        }
    }

    /// Run complete preprocessing pipeline
    pub fn process<P: AsRef<std::path::Path>>(
        &self,
        path: P
    ) -> Result<ProcessedImage, ImageError> {
        // 1. Load image
        let img = self.loader.load(path)?;

        // 2. Convert to grayscale
        let gray = img.to_luma8();

        // 3. Enhance contrast
        let enhanced = self.enhancer.apply_clahe(&gray);

        // 4. Denoise
        let denoised = NoiseReducer::bilateral_denoise(&enhanced, 3.0, 50.0);

        // 5. Auto-rotate
        let (rotated, rotation_angle) = self.rotation_detector.auto_correct_rotation(&denoised);

        // 6. Deskew
        let (deskewed, skew_angle) = self.deskewer.deskew(&rotated);

        // 7. Normalize resolution
        let normalized = self.resolution_handler.normalize_resolution(
            &DynamicImage::ImageLuma8(deskewed)
        );

        // 8. Final enhancement and binarization
        let final_gray = normalized.to_luma8();
        let sharpened = Sharpener::unsharp_mask(&final_gray, 1.0, 0.5);
        let binary = self.binarizer.otsu_binarize(&sharpened);

        // 9. Detect text regions
        let text_regions = self.text_detector.detect_text_regions(&binary);
        let merged_regions = self.text_detector.merge_bboxes(text_regions, 10);

        Ok(ProcessedImage {
            processed: binary,
            original_dimensions: img.dimensions(),
            rotation_angle,
            skew_angle,
            text_regions: merged_regions,
        })
    }

    /// Process with custom pipeline steps
    pub fn process_custom<P: AsRef<std::path::Path>>(
        &self,
        path: P,
        config: PipelineConfig
    ) -> Result<ProcessedImage, ImageError> {
        let img = self.loader.load(path)?;
        let mut current = img.to_luma8();
        // Track the detected angles so they are reported instead of hard-coded zeros
        let mut rotation_angle = 0.0;
        let mut skew_angle = 0.0;

        if config.enhance {
            current = self.enhancer.apply_clahe(&current);
        }

        if config.denoise {
            current = NoiseReducer::gaussian_denoise(&current, config.denoise_sigma);
        }

        if config.auto_rotate {
            let (rotated, angle) = self.rotation_detector.auto_correct_rotation(&current);
            current = rotated;
            rotation_angle = angle;
        }

        if config.deskew {
            let (deskewed, angle) = self.deskewer.deskew(&current);
            current = deskewed;
            skew_angle = angle;
        }

        if config.binarize {
            current = self.binarizer.adaptive_binarize(&current, config.binarize_block_size, 0);
        }

        Ok(ProcessedImage {
            processed: current,
            original_dimensions: img.dimensions(),
            rotation_angle,
            skew_angle,
            text_regions: Vec::new(),
        })
    }
}

pub struct ProcessedImage {
    pub processed: GrayImage,
    pub original_dimensions: (u32, u32),
    pub rotation_angle: f32,
    pub skew_angle: f32,
    pub text_regions: Vec<imageproc::rect::Rect>,
}

pub struct PipelineConfig {
    pub enhance: bool,
    pub denoise: bool,
    pub denoise_sigma: f32,
    pub auto_rotate: bool,
    pub deskew: bool,
    pub binarize: bool,
    pub binarize_block_size: u32,
}

impl Default for PipelineConfig {
    fn default() -> Self {
        Self {
            enhance: true,
            denoise: true,
            denoise_sigma: 1.0,
            auto_rotate: true,
            deskew: true,
            binarize: true,
            binarize_block_size: 15,
        }
    }
}

Performance Benchmarks

Expected performance characteristics:

  • Image Loading: 10-50ms (depending on size and format)
  • CLAHE Enhancement: 20-100ms (depends on tile size)
  • Bilateral Filtering: 50-200ms (edge-preserving but slower)
  • Gaussian Blur: 10-30ms (fast)
  • Rotation Detection: 100-300ms (tests multiple angles)
  • Deskewing: 50-150ms
  • Binarization: 10-30ms
  • Text Detection: 30-100ms
  • Full Pipeline: 300-800ms per image

SIMD and GPU acceleration can improve these by 2-10x.
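
Since the figures above are expectations rather than measurements, it helps to have a way to time individual stages on real inputs. The following std-only sketch shows one approach; `time_stage` and `median_time` are hypothetical helpers, and for rigorous numbers a benchmark harness such as Criterion is the better tool.

```rust
use std::time::{Duration, Instant};

/// Run `stage` once and return its output together with the elapsed
/// wall-clock time.
pub fn time_stage<T>(stage: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let output = stage();
    (output, start.elapsed())
}

/// Run `stage` `runs` times and return the median duration, which is less
/// sensitive to one-off stalls than the mean.
pub fn median_time(runs: usize, mut stage: impl FnMut()) -> Duration {
    assert!(runs > 0, "need at least one run");
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            stage();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[runs / 2]
}
```

A pipeline stage can be wrapped as `time_stage(|| enhancer.apply_clahe(&gray))` to get both the enhanced image and its cost, which makes it straightforward to compare measured timings against the ranges listed above.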

Dependencies

Add to Cargo.toml:

[dependencies]
image = "0.24"
imageproc = "0.23"
rayon = "1.7"  # Parallel processing
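nalgebra = "0.32"  # Linear algebra for transformations
wgpu = { version = "0.19", optional = true }  # Required by the `gpu` feature; the version is an assumption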

[target.'cfg(target_feature = "avx2")'.dependencies]
# SIMD-specific dependencies

[features]
gpu = ["wgpu"]  # Optional GPU support
simd = []  # Enable SIMD optimizations

Usage Example

use preprocessing_pipeline::PreprocessingPipeline;
use rayon::prelude::*; // brings `into_par_iter` into scope for the batch example

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let pipeline = PreprocessingPipeline::new();

    // Process single image
    let result = pipeline.process("document.jpg")?;
    result.processed.save("processed.png")?;

    println!("Rotation: {}°", result.rotation_angle);
    println!("Skew: {}°", result.skew_angle);
    println!("Text regions: {}", result.text_regions.len());

    // Batch process
    let images = vec!["doc1.jpg", "doc2.png", "doc3.tiff"];
    let results: Vec<_> = images.into_par_iter()
        .map(|path| pipeline.process(path))
        .collect::<Result<_, _>>()?;

    Ok(())
}

Future Enhancements

  1. Deep Learning Integration: Use neural networks for super-resolution and denoising
  2. Document Layout Analysis: Detect columns, tables, figures
  3. Adaptive Pipeline: Automatically select preprocessing steps based on image quality
  4. Real-time Processing: Optimize for video/camera feeds
  5. Cloud GPU Support: Integrate with cloud GPU services for batch processing

This preprocessing pipeline provides a solid foundation for optimal OCR performance in the ruvector-scipix project.