mirror of https://github.com/ruvnet/RuVector.git synced 2026-05-25 06:36:37 +00:00

Plan Rust Mathpix clone for ruvector (#28 )

* feat(mathpix): Add complete ruvector-mathpix OCR implementation

Comprehensive Rust-based Mathpix API clone with full SPARC methodology:

## Core Implementation (98 Rust files)
- OCR engine with ONNX Runtime inference
- Math/LaTeX parsing with 200+ symbol mappings
- Image preprocessing pipeline (rotation, deskew, CLAHE, thresholding)
- Multi-format output (LaTeX, MathML, MMD, AsciiMath, HTML)
- REST API server with Axum (Mathpix v3 compatible)
- CLI tool with batch processing
- WebAssembly bindings for browser use
- Performance optimizations (SIMD, parallel processing, caching)

## Documentation (35 markdown files)
- SPARC specification and architecture
- OCR research and Rust ecosystem analysis
- Benchmarking and optimization roadmaps
- Test strategy and security design
- lean-agentic integration guide

## Testing & CI/CD
- Unit tests with 80%+ coverage target
- Integration tests for full pipeline
- Criterion benchmark suite (7 benchmarks)
- GitHub Actions workflows (CI, release, security)

## Key Features
- Vector-based caching via ruvector-core
- lean-agentic agent orchestration support
- Multi-platform: Linux, macOS, Windows, WASM
- Performance targets: <100ms latency, 95%+ accuracy

Part of ruvector v0.1.16 ecosystem.

* fix(mathpix): Fix compilation errors and dependency conflicts

- Fix getrandom dependency: use wasm_js feature instead of js
- Remove duplicate WASM dependency declarations in Cargo.toml
- Add Clone derive to CLI argument structs (OcrArgs, BatchArgs, ServeArgs, ConfigArgs)
- Fix borrow-after-move error in CLI by borrowing command enum

The project now compiles successfully with only warnings (unused imports/variables).

* fix(mathpix): Add missing test dependencies and font assets

- Add dev-dependencies: predicates, assert_cmd, ab_glyph, tokio[process], reqwest[blocking]
- Download and add DejaVuSans.ttf font for test image generation
- Update tests/common/images.rs to use ab_glyph instead of rusttype (imageproc 0.25 compatibility)

* chore: Update Cargo.lock with new dev-dependencies

* security(mathpix): Fix critical authentication and remove mock implementations

SECURITY FIXES:
- Replace insecure credential validation that accepted ANY non-empty credentials
- Implement proper SHA-256 hashed API key storage in AppState
- Add constant-time comparison to prevent timing attacks
- Add configurable auth_enabled flag for development vs production

API IMPROVEMENTS:
- Remove mock OCR responses - now returns 503 with setup instructions
- Add service_unavailable and not_implemented error responses
- Convert document endpoint properly returns 501 Not Implemented
- Usage/history endpoints now clearly indicate no database configured

OCR ENGINE:
- Remove mock detection/recognition - now returns proper errors
- Add is_ready() check for model availability
- Implement real image preprocessing (decode, resize, normalize)
- Add clear error messages directing users to model setup docs

These changes ensure the API fails safely and informs users how to
properly configure the service rather than returning fake data.

* fix(mathpix): Fix test module organization and circular dependencies

- Create common/types.rs for shared test types (OutputFormat, ProcessingOptions, etc.)
- Update server.rs to use common types instead of circular imports
- Add #[cfg(feature = "math")] to math_tests.rs for conditional compilation
- Fix CLI serve test to use std::env::var instead of env! macro
- Remove duplicate type definitions from pipeline_tests.rs and cache_tests.rs

* feat(mathpix): Implement real ONNX inference with ort 2.0 API

- Update models.rs to load actual ONNX sessions via ort crate
- Add is_loaded() method to check if model session is available
- Implement run_onnx_detection, run_onnx_recognition, run_onnx_math_recognition
- Use ndarray + Tensor::from_array for proper tensor creation
- Parse detection output with bounding box extraction and region cropping
- Properly handle softmax for confidence scores
- All inference methods return proper errors when models unavailable

* feat(scipix): Rebrand mathpix to scipix with comprehensive documentation

- Rename examples/mathpix folder to examples/scipix
- Update package name from ruvector-mathpix to ruvector-scipix
- Update binary names: mathpix-cli -> scipix-cli, mathpix-server -> scipix-server
- Update library name: ruvector_mathpix -> ruvector_scipix
- Update all internal type names: MathpixError -> ScipixError, MathpixWasm -> ScipixWasm
- Update all imports and module references throughout codebase
- Update Makefile, scripts, and configuration files
- Create comprehensive README.md with:
  - Better introduction and feature overview
  - Quick start guide (30-second setup)
  - Six step-by-step tutorials covering all use cases
  - Complete API reference with request/response examples
  - Configuration options and environment variables
  - Project structure documentation
  - Performance benchmarks and optimization tips
  - Troubleshooting guide

* perf(scipix): Add SIMD-optimized preprocessing with 4.4x pipeline speedup

- Add SIMD-accelerated bilinear resize for 1.5x faster image resizing
- Add fast area average resize for large image downscaling
- Implement parallel SIMD resize using rayon for HD images
- Add comprehensive benchmark binary comparing original vs SIMD performance

Performance improvements:
- SIMD Grayscale: 4.22x speedup (426µs → 101µs)
- SIMD Resize: 1.51x speedup (3.98ms → 2.63ms)
- Full Pipeline: 4.39x speedup (2.16ms → 0.49ms)

State-of-the-art comparison:
- Estimated latency: 55ms @ 18 images/sec
- Comparable to PaddleOCR (~50ms, ~20 img/s)
- Faster than Tesseract (~200ms) and EasyOCR (~100ms)

* chore: Ignore generated test images

* feat(scipix): Add MCP server for AI integration

Implement Model Context Protocol (MCP) 2025-11 server to expose OCR
capabilities as tools for AI hosts like Claude.

Available MCP tools:
- ocr_image: Process image files with OCR
- ocr_base64: Process base64-encoded images
- batch_ocr: Batch process multiple images
- preprocess_image: Apply image preprocessing
- latex_to_mathml: Convert LaTeX to MathML
- benchmark_performance: Run performance benchmarks

Usage:
  scipix-cli mcp              # Start MCP server
  scipix-cli mcp --debug      # Enable debug logging

Claude Code integration:
  claude mcp add scipix -- scipix-cli mcp

* docs(mcp): Add Anthropic best practices for tool definitions

Update MCP tool descriptions following guidelines from:
https://www.anthropic.com/engineering/advanced-tool-use

Improvements:
- Add "WHEN TO USE" guidance for each tool
- Include concrete usage EXAMPLES with JSON
- Add RETURNS section describing output format
- Document WORKFLOW patterns (e.g., preprocess -> ocr)
- Improve parameter descriptions and constraints

This improves tool selection accuracy from ~72% to ~90% based on
Anthropic's benchmarks for complex parameter handling.

* feat(scipix): Add doctor command for environment optimization

Add a comprehensive `doctor` command to the SciPix CLI that:
- Detects CPU cores, SIMD capabilities (SSE2/AVX/AVX2/AVX-512/NEON)
- Analyzes memory availability and per-core allocation
- Checks dependencies (ONNX Runtime, OpenSSL)
- Validates configuration files and environment variables
- Tests network port availability
- Generates optimal configuration recommendations
- Supports --fix to auto-create configuration files
- Outputs in human-readable or JSON format
- Allows filtering by check category (cpu, memory, config, deps, network)

* fix(scipix): Add required-features for OCR-dependent examples

- Add required-features = ["ocr"] to batch_processing and streaming examples
- Fix imports to use ruvector_scipix::ocr::OcrEngine instead of root export
- Update example documentation to show --features ocr flag

This ensures examples that depend on the OCR feature won't fail to compile
when the feature is not enabled.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(scipix): Fix all 22 compiler warnings

Remove unused imports:
- tokio::sync::mpsc from mcp.rs
- uuid::Uuid from handlers.rs
- ScipixError from cache/mod.rs
- PreprocessError from pipeline.rs and segmentation.rs
- BoundingBox and WordData from json.rs
- crate::error::Result from parallel.rs
- mpsc from batch.rs

Fix unused variables:
- Rename idx to _idx in batch.rs
- Rename image to _image in segmentation.rs
- Rename pixels to _pixels, y_frac to _y_frac, y_frac_inv to _y_frac_inv in simd.rs
- Fix pixel_idx variable name (was using undefined idx)

Mark intentionally unused fields with #[allow(dead_code)]:
- jsonrpc field in JsonRpcRequest
- ToolResult and ContentBlock structs
- models_dir in McpServer
- style in StyledLaTeXFormatter
- include_styles in DocxFormatter
- max_size in BufferPool

Remove unnecessary mut from merge_overlapping_regions parameter.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* docs(scipix): Update README and Cargo.toml for crates.io publishing

- Completely rewrite README.md with comprehensive documentation:
  - crates.io badges and metadata
  - Installation guide (cargo add, from source, pre-built binaries)
  - Feature flags documentation
  - SDK usage examples (basic, preprocessing, OCR, math, caching)
  - CLI reference for all commands (ocr, batch, serve, config, doctor, mcp)
  - 6 tutorials covering basic OCR to MCP integration
  - API reference for REST endpoints
  - Configuration options (env vars and TOML)
  - Performance benchmarks

- Update Cargo.toml with crates.io publishing metadata:
  - description, readme, keywords, categories
  - documentation and homepage URLs
  - rust-version requirement (1.77)
  - exclude patterns for unnecessary files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* docs(scipix): Improve introduction and SEO optimize crate metadata

README improvements:
- Enhanced title for better search visibility
- Added downloads and CI badges
- Expanded "Why SciPix?" section with use cases
- Added feature comparison table with detailed descriptions
- Added performance benchmarks vs Tesseract/Mathpix
- Better keyword-rich descriptions for discoverability

Cargo.toml SEO optimization:
- Expanded description with key search terms (LaTeX, MathML, ONNX, GPU)
- Updated keywords for crates.io search: ocr, latex, mathml, scientific-computing, image-recognition

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* docs: Add SciPix OCR crate to root README

- Add Scientific OCR (SciPix) section to Crates table
- Include brief description of capabilities: LaTeX/MathML extraction,
  ONNX inference, SIMD preprocessing, REST API, CLI, MCP integration
- Add crates.io badge and quick usage examples

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>

2025-11-29 17:34:47 -05:00

47 KiB

Raw Permalink Blame History

SPARC Pseudocode: Ruvector-Scipix OCR & Math Recognition Pipeline

Document Overview

This document provides algorithmic pseudocode for the core components of the ruvector-scipix OCR and mathematical expression recognition system. All algorithms use Rust-like syntax and include complexity analysis.

1. Image Preprocessing Pipeline

1.1 Main Preprocessing Algorithm

ALGORITHM: PreprocessImage
INPUT: imageBytes (Vec<u8>), config (PreprocessConfig)
OUTPUT: Result<ProcessedImage, Error>

CONSTANTS:
    MAX_IMAGE_SIZE = 4096 × 4096 pixels
    MIN_DPI = 150
    TARGET_DPI = 300
    NOISE_THRESHOLD = 0.15

DATA STRUCTURES:
    ProcessedImage {
        data: Vec<u8>,
        width: u32,
        height: u32,
        channels: u8,
        metadata: ImageMetadata,
        regions: Vec<TextRegion>
    }

    ImageMetadata {
        dpi: u32,
        rotation: f32,
        quality_score: f32,
        has_math: bool
    }

    TextRegion {
        bbox: BoundingBox,
        confidence: f32,
        region_type: RegionType  // Text, Math, Diagram
    }

BEGIN
    // Phase 1: Image Loading and Validation
    rawImage ← DecodeImage(imageBytes)
    IF rawImage.is_error() THEN
        RETURN Error("Failed to decode image")
    END IF

    IF rawImage.width > MAX_IMAGE_SIZE OR rawImage.height > MAX_IMAGE_SIZE THEN
        rawImage ← ResizeImage(rawImage, MAX_IMAGE_SIZE)
    END IF

    // Phase 2: Rotation Detection and Correction
    rotationAngle ← DetectRotation(rawImage)
    IF ABS(rotationAngle) > 0.5 THEN
        rawImage ← RotateImage(rawImage, -rotationAngle)
    END IF

    // Phase 3: DPI Normalization
    currentDPI ← EstimateDPI(rawImage)
    IF currentDPI < MIN_DPI THEN
        RETURN Error("Image resolution too low")
    END IF

    IF currentDPI != TARGET_DPI THEN
        scaleFactor ← TARGET_DPI / currentDPI
        rawImage ← ResizeImage(rawImage, scaleFactor)
    END IF

    // Phase 4: Noise Reduction
    noiseLevel ← EstimateNoise(rawImage)
    IF noiseLevel > NOISE_THRESHOLD THEN
        rawImage ← ApplyBilateralFilter(rawImage, sigma: 2.0, radius: 3)
    END IF

    // Phase 5: Contrast Enhancement
    enhancedImage ← AdaptiveHistogramEqualization(rawImage, clip_limit: 2.0)

    // Phase 6: Text Region Detection
    regions ← DetectTextRegions(enhancedImage)

    // Phase 7: Quality Assessment
    qualityScore ← AssessQuality(enhancedImage, regions)

    metadata ← ImageMetadata {
        dpi: TARGET_DPI,
        rotation: rotationAngle,
        quality_score: qualityScore,
        has_math: ContainsMathRegions(regions)
    }

    RETURN Ok(ProcessedImage {
        data: enhancedImage.to_bytes(),
        width: enhancedImage.width,
        height: enhancedImage.height,
        channels: enhancedImage.channels,
        metadata: metadata,
        regions: regions
    })
END

COMPLEXITY ANALYSIS:
    Time Complexity:
        - Image decoding: O(n) where n = pixel count
        - Rotation detection: O(n log n) using Hough transform
        - Image rotation: O(n)
        - DPI scaling: O(n)
        - Bilateral filter: O(n × r²) where r = radius
        - CLAHE: O(n)
        - Region detection: O(n log n)
        Total: O(n log n)

    Space Complexity:
        - Raw image buffer: O(n)
        - Intermediate buffers: O(n)
        - Region storage: O(k) where k = region count
        Total: O(n)

1.2 Rotation Detection Algorithm

ALGORITHM: DetectRotation
INPUT: image (Image)
OUTPUT: angle (f32)

BEGIN
    // Convert to grayscale if needed
    grayImage ← ToGrayscale(image)

    // Apply edge detection
    edges ← CannyEdgeDetection(grayImage, low: 50, high: 150)

    // Use Hough Line Transform to detect dominant lines
    lines ← HoughLineTransform(edges, rho: 1.0, theta: PI/180, threshold: 100)

    IF lines.is_empty() THEN
        RETURN 0.0
    END IF

    // Cluster angles into dominant orientations
    angles ← []
    FOR EACH line IN lines DO
        angle ← line.theta * 180 / PI
        // Normalize to [-45, 45] range
        WHILE angle > 45 DO
            angle ← angle - 90
        END WHILE
        WHILE angle < -45 DO
            angle ← angle + 90
        END WHILE
        angles.push(angle)
    END FOR

    // Use median for robustness
    angles.sort()
    medianAngle ← angles[angles.len() / 2]

    RETURN medianAngle
END

COMPLEXITY ANALYSIS:
    Time: O(n log n) for Hough transform
    Space: O(n) for edge map

1.3 Text Region Detection

ALGORITHM: DetectTextRegions
INPUT: image (Image)
OUTPUT: regions (Vec<TextRegion>)

DATA STRUCTURES:
    Component {
        pixels: Vec<Point>,
        bbox: BoundingBox,
        area: u32
    }

BEGIN
    // Use MSER (Maximally Stable Extremal Regions)
    binaryImage ← AdaptiveThreshold(image, window: 15)

    components ← FindConnectedComponents(binaryImage)

    regions ← []
    FOR EACH comp IN components DO
        // Filter by geometric properties
        aspectRatio ← comp.bbox.width / comp.bbox.height
        density ← comp.area / (comp.bbox.width * comp.bbox.height)

        IF aspectRatio > 0.1 AND aspectRatio < 10.0 AND density > 0.3 THEN
            // Classify region type
            features ← ExtractRegionFeatures(comp, image)
            regionType ← ClassifyRegion(features)

            region ← TextRegion {
                bbox: comp.bbox,
                confidence: features.confidence,
                region_type: regionType
            }
            regions.push(region)
        END IF
    END FOR

    // Merge nearby regions
    mergedRegions ← MergeOverlappingRegions(regions, iou_threshold: 0.3)

    RETURN mergedRegions
END

COMPLEXITY ANALYSIS:
    Time: O(n × α(n)) where α is inverse Ackermann (connected components)
    Space: O(k) where k = component count

2. OCR Engine Core

2.1 Main OCR Pipeline

ALGORITHM: RecognizeText
INPUT: image (ProcessedImage), model (VisionTransformer)
OUTPUT: Result<RecognitionResult, Error>

DATA STRUCTURES:
    RecognitionResult {
        lines: Vec<TextLine>,
        confidence: f32,
        processing_time_ms: u64
    }

    TextLine {
        text: String,
        bbox: BoundingBox,
        words: Vec<Word>,
        confidence: f32
    }

    Word {
        text: String,
        bbox: BoundingBox,
        chars: Vec<Character>,
        confidence: f32
    }

    Character {
        char: char,
        bbox: BoundingBox,
        confidence: f32,
        alternatives: Vec<(char, f32)>
    }

BEGIN
    startTime ← GetCurrentTime()

    // Phase 1: Vision Transformer Encoding
    encodedFeatures ← EncodeImageFeatures(image, model)

    // Phase 2: Text Line Detection
    textLines ← DetectTextLines(encodedFeatures, image.regions)

    // Phase 3: Character Recognition
    recognizedLines ← []
    totalConfidence ← 0.0

    FOR EACH lineRegion IN textLines DO
        lineImage ← CropRegion(image, lineRegion.bbox)

        // Run sequence-to-sequence recognition
        words ← RecognizeLineSequence(lineImage, model, encodedFeatures)

        lineText ← words.map(|w| w.text).join(" ")
        lineConfidence ← ComputeLineConfidence(words)

        textLine ← TextLine {
            text: lineText,
            bbox: lineRegion.bbox,
            words: words,
            confidence: lineConfidence
        }

        recognizedLines.push(textLine)
        totalConfidence ← totalConfidence + lineConfidence
    END FOR

    avgConfidence ← totalConfidence / recognizedLines.len()
    processingTime ← GetCurrentTime() - startTime

    RETURN Ok(RecognitionResult {
        lines: recognizedLines,
        confidence: avgConfidence,
        processing_time_ms: processingTime
    })
END

COMPLEXITY ANALYSIS:
    Time Complexity:
        - Vision Transformer encoding: O(n² × d) where d = embedding dim
        - Line detection: O(k × log k) where k = regions
        - Character recognition per line: O(m × d²) where m = line length
        - Total lines L: O(L × m × d²)
        Overall: O(n² × d + L × m × d²)

    Space Complexity:
        - Feature maps: O(n × d)
        - Attention maps: O(n² × h) where h = attention heads
        - Output storage: O(L × m)
        Total: O(n² × h + n × d)

2.2 Vision Transformer Encoding

ALGORITHM: EncodeImageFeatures
INPUT: image (ProcessedImage), model (VisionTransformer)
OUTPUT: features (FeatureMap)

DATA STRUCTURES:
    FeatureMap {
        embeddings: Tensor<f32>,  // Shape: [seq_len, embed_dim]
        attention_weights: Tensor<f32>,  // Shape: [heads, seq_len, seq_len]
        positions: Vec<Point>
    }

    VisionTransformer {
        patch_size: u32,
        embed_dim: u32,
        num_heads: u32,
        num_layers: u32,
        weights: ModelWeights
    }

BEGIN
    // Phase 1: Patch Extraction
    patchSize ← model.patch_size
    numPatchesH ← image.height / patchSize
    numPatchesW ← image.width / patchSize

    patches ← []
    positions ← []

    FOR h IN 0..numPatchesH DO
        FOR w IN 0..numPatchesW DO
            y ← h * patchSize
            x ← w * patchSize
            patch ← ExtractPatch(image, x, y, patchSize)
            patches.push(patch)
            positions.push(Point{x, y})
        END FOR
    END FOR

    // Phase 2: Patch Embedding
    embeddings ← []
    FOR EACH patch IN patches DO
        // Linear projection of flattened patch
        flatPatch ← Flatten(patch)
        embedding ← MatMul(model.weights.patch_projection, flatPatch)
        embeddings.push(embedding)
    END FOR

    // Phase 3: Positional Encoding
    FOR i IN 0..embeddings.len() DO
        posEncoding ← ComputePositionalEncoding(i, model.embed_dim)
        embeddings[i] ← embeddings[i] + posEncoding
    END FOR

    // Add [CLS] token
    clsToken ← model.weights.cls_token
    embeddings.insert(0, clsToken)

    // Phase 4: Transformer Layers
    x ← Tensor::from(embeddings)
    allAttentionWeights ← []

    FOR layer IN 0..model.num_layers DO
        // Multi-head self-attention
        (x, attentionWeights) ← MultiHeadAttention(
            x,
            model.weights.layers[layer],
            num_heads: model.num_heads
        )

        allAttentionWeights.push(attentionWeights)

        // Feed-forward network
        x ← FeedForward(x, model.weights.layers[layer])

        // Layer normalization
        x ← LayerNorm(x, model.weights.layers[layer])
    END FOR

    RETURN FeatureMap {
        embeddings: x,
        attention_weights: Stack(allAttentionWeights),
        positions: positions
    }
END

COMPLEXITY ANALYSIS:
    Time Complexity:
        - Patch extraction: O(n) where n = pixels
        - Patch embedding: O(p × d²) where p = patches, d = embed_dim
        - Attention per layer: O(p² × d)
        - Total layers L: O(L × p² × d)
        Overall: O(L × p² × d)

    Space Complexity:
        - Embeddings: O(p × d)
        - Attention matrices: O(L × h × p²) where h = heads
        Total: O(L × h × p² + p × d)

2.3 Character Recognition Sequence

ALGORITHM: RecognizeLineSequence
INPUT: lineImage (Image), model (VisionTransformer), features (FeatureMap)
OUTPUT: words (Vec<Word>)

DATA STRUCTURES:
    BeamSearchState {
        sequence: Vec<char>,
        score: f32,
        hidden_state: Tensor<f32>
    }

CONSTANTS:
    BEAM_WIDTH = 5
    MAX_SEQUENCE_LENGTH = 256
    END_TOKEN = '<END>'
    SPACE_TOKEN = '<SPACE>'

BEGIN
    // Initialize beam search
    initialState ← BeamSearchState {
        sequence: [],
        score: 0.0,
        hidden_state: features.embeddings[0]  // CLS token
    }

    beams ← [initialState]

    // Beam search decoding
    FOR step IN 0..MAX_SEQUENCE_LENGTH DO
        candidates ← []

        FOR EACH beam IN beams DO
            IF beam.sequence.last() == END_TOKEN THEN
                candidates.push(beam)
                CONTINUE
            END IF

            // Get character probabilities from model
            (logits, newHiddenState) ← model.decode_step(
                beam.hidden_state,
                features.embeddings
            )

            probabilities ← Softmax(logits)

            // Get top-k characters
            topK ← GetTopK(probabilities, k: BEAM_WIDTH)

            FOR EACH (char, prob) IN topK DO
                newSequence ← beam.sequence.clone()
                newSequence.push(char)

                // Log probability for numerical stability
                newScore ← beam.score + LOG(prob)

                newBeam ← BeamSearchState {
                    sequence: newSequence,
                    score: newScore,
                    hidden_state: newHiddenState
                }

                candidates.push(newBeam)
            END FOR
        END FOR

        // Keep top BEAM_WIDTH candidates
        candidates.sort_by(|a, b| b.score.cmp(a.score))
        beams ← candidates[0..BEAM_WIDTH]

        // Check if all beams ended
        allEnded ← beams.all(|b| b.sequence.last() == END_TOKEN)
        IF allEnded THEN
            BREAK
        END IF
    END FOR

    // Take best beam
    bestBeam ← beams[0]

    // Split sequence into words
    words ← []
    currentWord ← []
    currentBBox ← BoundingBox::new()

    FOR i IN 0..bestBeam.sequence.len() DO
        char ← bestBeam.sequence[i]

        IF char == SPACE_TOKEN OR char == END_TOKEN THEN
            IF NOT currentWord.is_empty() THEN
                wordText ← currentWord.join("")
                word ← Word {
                    text: wordText,
                    bbox: currentBBox,
                    chars: currentWord.clone(),
                    confidence: EXP(bestBeam.score / bestBeam.sequence.len())
                }
                words.push(word)
                currentWord.clear()
            END IF
        ELSE
            currentWord.push(Character {
                char: char,
                bbox: EstimateCharBBox(lineImage, i),
                confidence: EXP(bestBeam.score / (i + 1)),
                alternatives: []
            })
        END IF
    END FOR

    RETURN words
END

COMPLEXITY ANALYSIS:
    Time Complexity:
        - Beam search steps: O(T × B × V) where:
            T = max sequence length
            B = beam width
            V = vocabulary size
        - Sorting per step: O(B × V × log(B × V))
        Overall: O(T × B × V × log(B × V))

    Space Complexity:
        - Beam storage: O(B × T × d) where d = hidden dim
        - Candidate buffer: O(B × V)
        Total: O(B × T × d)

3. Mathematical Expression Parser

3.1 Math Expression Recognition

ALGORITHM: RecognizeMathExpression
INPUT: region (TextRegion), image (ProcessedImage), model (MathModel)
OUTPUT: Result<MathExpression, Error>

DATA STRUCTURES:
    MathExpression {
        latex: String,
        tree: ExpressionTree,
        symbols: Vec<MathSymbol>,
        confidence: f32
    }

    ExpressionTree {
        root: Box<TreeNode>,
        height: u32
    }

    TreeNode {
        symbol: MathSymbol,
        relationship: SpatialRelation,
        children: Vec<Box<TreeNode>>
    }

    MathSymbol {
        symbol_type: SymbolType,  // Digit, Operator, Letter, Special
        value: String,
        bbox: BoundingBox,
        confidence: f32
    }

    SpatialRelation {
        relation_type: RelationType,  // Above, Below, Right, Superscript, Subscript
        distance: f32,
        alignment: f32
    }

BEGIN
    // Phase 1: Extract math region
    mathImage ← CropRegion(image, region.bbox)

    // Phase 2: Symbol Detection and Classification
    symbols ← DetectMathSymbols(mathImage, model)

    IF symbols.is_empty() THEN
        RETURN Error("No mathematical symbols detected")
    END IF

    // Phase 3: Spatial Relationship Analysis
    relationships ← AnalyzeSpatialRelationships(symbols)

    // Phase 4: Expression Tree Construction
    tree ← BuildExpressionTree(symbols, relationships)

    // Phase 5: LaTeX Generation
    latex ← GenerateLaTeX(tree)

    // Calculate overall confidence
    avgConfidence ← symbols.map(|s| s.confidence).average()

    RETURN Ok(MathExpression {
        latex: latex,
        tree: tree,
        symbols: symbols,
        confidence: avgConfidence
    })
END

COMPLEXITY ANALYSIS:
    Time: O(n² × log n) where n = symbol count
    Space: O(n × h) where h = tree height

3.2 Symbol Detection and Classification

ALGORITHM: DetectMathSymbols
INPUT: mathImage (Image), model (MathModel)
OUTPUT: symbols (Vec<MathSymbol>)

CONSTANTS:
    SYMBOL_MIN_SIZE = 8 pixels
    SYMBOL_MAX_SIZE = 128 pixels
    CONFIDENCE_THRESHOLD = 0.7

BEGIN
    // Phase 1: Connected Component Analysis
    binaryImage ← AdaptiveThreshold(mathImage, window: 11)
    components ← FindConnectedComponents(binaryImage)

    symbols ← []

    FOR EACH comp IN components DO
        // Filter by size
        width ← comp.bbox.width
        height ← comp.bbox.height

        IF width < SYMBOL_MIN_SIZE OR height < SYMBOL_MIN_SIZE THEN
            CONTINUE
        END IF

        IF width > SYMBOL_MAX_SIZE OR height > SYMBOL_MAX_SIZE THEN
            // Might be compound symbol, try to split
            subComponents ← SplitComponent(comp)
            FOR EACH subComp IN subComponents DO
                ProcessSymbol(subComp, mathImage, model, symbols)
            END FOR
        ELSE
            ProcessSymbol(comp, mathImage, model, symbols)
        END IF
    END FOR

    // Sort symbols left-to-right, top-to-bottom
    symbols.sort_by(|a, b| {
        IF ABS(a.bbox.y - b.bbox.y) < 10 THEN
            a.bbox.x.cmp(b.bbox.x)
        ELSE
            a.bbox.y.cmp(b.bbox.y)
        END IF
    })

    RETURN symbols
END

SUBROUTINE: ProcessSymbol
INPUT: component (Component), image (Image), model (MathModel), symbols (Vec<MathSymbol>)
OUTPUT: None (modifies symbols)

BEGIN
    // Extract symbol image
    symbolImage ← CropRegion(image, component.bbox)

    // Normalize to model input size
    normalizedSymbol ← ResizeImage(symbolImage, 64, 64)

    // Classify symbol
    (symbolClass, confidence) ← model.classify_symbol(normalizedSymbol)

    IF confidence >= CONFIDENCE_THRESHOLD THEN
        symbol ← MathSymbol {
            symbol_type: DetermineSymbolType(symbolClass),
            value: symbolClass.to_string(),
            bbox: component.bbox,
            confidence: confidence
        }

        symbols.push(symbol)
    END IF
END

COMPLEXITY ANALYSIS:
    Time: O(n × c) where n = components, c = classification time
    Space: O(n) for symbol storage

3.3 Spatial Relationship Analysis

ALGORITHM: AnalyzeSpatialRelationships
INPUT: symbols (Vec<MathSymbol>)
OUTPUT: relationships (Vec<(usize, usize, SpatialRelation)>)

DATA STRUCTURES:
    RelationFeatures {
        horizontal_distance: f32,
        vertical_distance: f32,
        size_ratio: f32,
        vertical_alignment: f32,
        horizontal_alignment: f32
    }

CONSTANTS:
    SUPERSCRIPT_Y_THRESHOLD = 0.6  // Relative to symbol height
    SUBSCRIPT_Y_THRESHOLD = 0.4
    FRACTION_ALIGNMENT_THRESHOLD = 0.8

BEGIN
    relationships ← []

    // Build spatial index for efficient queries
    spatialIndex ← BuildQuadTree(symbols)

    FOR i IN 0..symbols.len() DO
        symbolA ← symbols[i]

        // Find nearby symbols
        nearbySymbols ← spatialIndex.query_radius(
            symbolA.bbox.center(),
            radius: symbolA.bbox.width * 3
        )

        FOR EACH (j, symbolB) IN nearbySymbols DO
            IF i >= j THEN
                CONTINUE  // Avoid duplicate pairs
            END IF

            // Extract relationship features
            features ← ExtractRelationFeatures(symbolA, symbolB)

            // Classify relationship
            relation ← ClassifyRelation(features, symbolA, symbolB)

            IF relation.is_some() THEN
                relationships.push((i, j, relation.unwrap()))
            END IF
        END FOR
    END FOR

    RETURN relationships
END

SUBROUTINE: ClassifyRelation
INPUT: features (RelationFeatures), symbolA (MathSymbol), symbolB (MathSymbol)
OUTPUT: Option<SpatialRelation>

BEGIN
    centerA ← symbolA.bbox.center()
    centerB ← symbolB.bbox.center()

    deltaX ← centerB.x - centerA.x
    deltaY ← centerB.y - centerA.y

    // Determine dominant relationship

    // Superscript/Subscript detection
    IF deltaX > 0 AND deltaX < symbolA.bbox.width * 1.5 THEN
        relativeY ← deltaY / symbolA.bbox.height

        IF relativeY < -SUPERSCRIPT_Y_THRESHOLD THEN
            RETURN Some(SpatialRelation {
                relation_type: Superscript,
                distance: SQRT(deltaX² + deltaY²),
                alignment: features.horizontal_alignment
            })
        ELSE IF relativeY > SUBSCRIPT_Y_THRESHOLD THEN
            RETURN Some(SpatialRelation {
                relation_type: Subscript,
                distance: SQRT(deltaX² + deltaY²),
                alignment: features.horizontal_alignment
            })
        END IF
    END IF

    // Fraction detection (vertical alignment)
    IF features.vertical_alignment > FRACTION_ALIGNMENT_THRESHOLD THEN
        IF deltaY < 0 THEN
            RETURN Some(SpatialRelation {
                relation_type: Above,
                distance: ABS(deltaY),
                alignment: features.vertical_alignment
            })
        ELSE IF deltaY > 0 THEN
            RETURN Some(SpatialRelation {
                relation_type: Below,
                distance: ABS(deltaY),
                alignment: features.vertical_alignment
            })
        END IF
    END IF

    // Horizontal sequence (default)
    IF deltaX > 0 AND ABS(deltaY) < symbolA.bbox.height * 0.3 THEN
        RETURN Some(SpatialRelation {
            relation_type: Right,
            distance: deltaX,
            alignment: features.horizontal_alignment
        })
    END IF

    RETURN None
END

COMPLEXITY ANALYSIS:
    Time Complexity:
        - QuadTree construction: O(n log n)
        - For each symbol, query nearby: O(log n + k) where k = nearby count
        - Total: O(n × (log n + k))
        Average case: O(n log n) if k is constant

    Space Complexity:
        - QuadTree: O(n)
        - Relationships: O(n²) worst case, O(n) average
        Total: O(n²) worst case

3.4 Expression Tree Construction

ALGORITHM: BuildExpressionTree
INPUT: symbols (Vec<MathSymbol>), relationships (Vec<(usize, usize, SpatialRelation)>)
OUTPUT: tree (ExpressionTree)

DATA STRUCTURES:
    TreeBuilder {
        nodes: Vec<Box<TreeNode>>,
        parent_map: HashMap<usize, usize>,
        relation_graph: AdjacencyList
    }

BEGIN
    // Phase 1: Build relationship graph
    graph ← BuildRelationGraph(symbols, relationships)

    // Phase 2: Identify root candidates (symbols with no parents)
    rootCandidates ← []
    FOR i IN 0..symbols.len() DO
        IF NOT HasIncomingEdge(graph, i, excludeRight: true) THEN
            rootCandidates.push(i)
        END IF
    END FOR

    // Phase 3: Build tree from leftmost root
    rootCandidates.sort_by(|a, b| {
        symbols[*a].bbox.x.cmp(&symbols[*b].bbox.x)
    })

    rootIdx ← rootCandidates[0]

    // Phase 4: Recursive tree construction
    root ← BuildSubtree(rootIdx, symbols, graph, visited: Set::new())

    // Phase 5: Calculate tree height
    height ← CalculateHeight(root)

    RETURN ExpressionTree {
        root: root,
        height: height
    }
END

SUBROUTINE: BuildSubtree
INPUT: nodeIdx (usize), symbols (Vec<MathSymbol>), graph (AdjacencyList), visited (Set<usize>)
OUTPUT: node (Box<TreeNode>)

BEGIN
    IF visited.contains(nodeIdx) THEN
        RETURN Error("Cycle detected in expression tree")
    END IF

    visited.insert(nodeIdx)

    symbol ← symbols[nodeIdx]
    children ← []

    // Get all outgoing edges sorted by relationship priority
    edges ← graph.get_outgoing(nodeIdx)
    edges.sort_by(|a, b| {
        // Priority: Superscript > Subscript > Above > Below > Right
        GetRelationPriority(a.relation).cmp(GetRelationPriority(b.relation))
    })

    FOR EACH edge IN edges DO
        IF NOT visited.contains(edge.target) THEN
            childNode ← BuildSubtree(edge.target, symbols, graph, visited)
            childNode.relationship ← edge.relation
            children.push(childNode)
        END IF
    END FOR

    node ← TreeNode {
        symbol: symbol.clone(),
        relationship: SpatialRelation::default(),
        children: children
    }

    RETURN Box::new(node)
END

COMPLEXITY ANALYSIS:
    Time: O(n × log n) for graph construction and tree building
    Space: O(n × h) where h = average tree height

3.5 LaTeX Generation

ALGORITHM: GenerateLaTeX
INPUT: tree (ExpressionTree)
OUTPUT: latex (String)

BEGIN
    latex ← RecursiveGenerateLaTeX(tree.root)

    // Wrap in delimiters
    latex ← "\\(" + latex + "\\)"

    RETURN latex
END

SUBROUTINE: RecursiveGenerateLaTeX
INPUT: node (Box<TreeNode>)
OUTPUT: latex (String)

BEGIN
    symbol ← node.symbol
    baseLatex ← SymbolToLatex(symbol)

    // Group children by relationship type
    superscripts ← []
    subscripts ← []
    numerator ← None
    denominator ← None
    rightChildren ← []

    FOR EACH child IN node.children DO
        MATCH child.relationship.relation_type:
            Superscript → superscripts.push(child)
            Subscript → subscripts.push(child)
            Above → numerator ← Some(child)
            Below → denominator ← Some(child)
            Right → rightChildren.push(child)
        END MATCH
    END FOR

    // Build LaTeX string
    result ← baseLatex

    // Handle fractions
    IF numerator.is_some() AND denominator.is_some() THEN
        numLatex ← RecursiveGenerateLaTeX(numerator.unwrap())
        denomLatex ← RecursiveGenerateLaTeX(denominator.unwrap())
        result ← "\\frac{" + numLatex + "}{" + denomLatex + "}"
    END IF

    // Handle superscripts
    IF NOT superscripts.is_empty() THEN
        superLatex ← superscripts
            .map(|c| RecursiveGenerateLaTeX(c))
            .join("")
        result ← result + "^{" + superLatex + "}"
    END IF

    // Handle subscripts
    IF NOT subscripts.is_empty() THEN
        subLatex ← subscripts
            .map(|c| RecursiveGenerateLaTeX(c))
            .join("")
        result ← result + "_{" + subLatex + "}"
    END IF

    // Handle right children (sequential)
    FOR EACH child IN rightChildren DO
        childLatex ← RecursiveGenerateLaTeX(child)

        // Add spacing for operators
        IF IsOperator(child.symbol) THEN
            result ← result + " " + childLatex + " "
        ELSE
            result ← result + childLatex
        END IF
    END FOR

    RETURN result
END

SUBROUTINE: SymbolToLatex
INPUT: symbol (MathSymbol)
OUTPUT: latex (String)

BEGIN
    MATCH symbol.symbol_type:
        Digit → RETURN symbol.value
        Letter → RETURN symbol.value
        Operator → RETURN OperatorToLatex(symbol.value)
        Special → RETURN SpecialToLatex(symbol.value)
    END MATCH

    RETURN symbol.value
END

COMPLEXITY ANALYSIS:
    Time: O(n) where n = nodes in tree
    Space: O(h) for recursion stack where h = tree height

4. Output Format Conversion

4.1 Multi-Format Generation

ALGORITHM: ConvertToFormats
INPUT: mathExpr (MathExpression), formats (Vec<OutputFormat>)
OUTPUT: Result<HashMap<OutputFormat, String>, Error>

DATA STRUCTURES:
    OutputFormat {
        MMD,      // Markdown with delimiters
        LaTeXStyled,  // Standalone LaTeX
        MathML,   // MathML XML
        HTML      // Rendered HTML
    }

BEGIN
    results ← HashMap::new()

    FOR EACH format IN formats DO
        output ← MATCH format:
            MMD → GenerateMMD(mathExpr)
            LaTeXStyled → GenerateStyledLaTeX(mathExpr)
            MathML → GenerateMathML(mathExpr.tree)
            HTML → GenerateHTML(mathExpr)
        END MATCH

        results.insert(format, output)
    END FOR

    RETURN Ok(results)
END

COMPLEXITY ANALYSIS:
    Time: O(f × n) where f = format count, n = expression size
    Space: O(f × n) for storing all formats

4.2 MMD Generation

ALGORITHM: GenerateMMD
INPUT: mathExpr (MathExpression)
OUTPUT: mmd (String)

CONSTANTS:
    INLINE_DELIMITER = "$"
    DISPLAY_DELIMITER = "$$"

BEGIN
    latex ← mathExpr.latex

    // Determine if expression should be display or inline
    isDisplayMath ← ShouldBeDisplayMath(mathExpr)

    IF isDisplayMath THEN
        mmd ← DISPLAY_DELIMITER + "\n" + latex + "\n" + DISPLAY_DELIMITER
    ELSE
        mmd ← INLINE_DELIMITER + latex + INLINE_DELIMITER
    END IF

    RETURN mmd
END

SUBROUTINE: ShouldBeDisplayMath
INPUT: mathExpr (MathExpression)
OUTPUT: isDisplay (bool)

BEGIN
    // Display math if:
    // 1. Contains fractions or large operators
    // 2. Tree height > 2
    // 3. Width > threshold

    hasFractions ← mathExpr.latex.contains("\\frac")
    hasLargeOps ← mathExpr.latex.contains("\\sum") OR
                   mathExpr.latex.contains("\\int") OR
                   mathExpr.latex.contains("\\prod")

    isTall ← mathExpr.tree.height > 2
    isWide ← mathExpr.symbols.len() > 10

    RETURN hasFractions OR hasLargeOps OR isTall OR isWide
END

COMPLEXITY ANALYSIS:
    Time: O(n) where n = LaTeX string length
    Space: O(n) for output string

4.3 MathML Generation

ALGORITHM: GenerateMathML
INPUT: tree (ExpressionTree)
OUTPUT: mathml (String)

BEGIN
    xml ← XMLBuilder::new()
    xml.start_element("math", [("xmlns", "http://www.w3.org/1998/Math/MathML")])

    RecursiveGenerateMathML(tree.root, xml)

    xml.end_element("math")

    RETURN xml.to_string()
END

SUBROUTINE: RecursiveGenerateMathML
INPUT: node (Box<TreeNode>), xml (XMLBuilder)
OUTPUT: None (modifies xml)

BEGIN
    symbol ← node.symbol

    // Determine MathML element type
    MATCH symbol.symbol_type:
        Digit OR Letter →
            xml.element("mi", symbol.value)

        Operator →
            xml.element("mo", symbol.value)

        Special →
            HandleSpecialSymbol(symbol, xml)
    END MATCH

    // Handle relationships
    IF HasSuperscript(node) THEN
        xml.start_element("msup")
        RecursiveGenerateMathML(GetBase(node), xml)
        RecursiveGenerateMathML(GetSuperscript(node), xml)
        xml.end_element("msup")
    ELSE IF HasSubscript(node) THEN
        xml.start_element("msub")
        RecursiveGenerateMathML(GetBase(node), xml)
        RecursiveGenerateMathML(GetSubscript(node), xml)
        xml.end_element("msub")
    ELSE IF HasFraction(node) THEN
        xml.start_element("mfrac")
        RecursiveGenerateMathML(GetNumerator(node), xml)
        RecursiveGenerateMathML(GetDenominator(node), xml)
        xml.end_element("mfrac")
    END IF

    // Process right children
    FOR EACH child IN GetRightChildren(node) DO
        RecursiveGenerateMathML(child, xml)
    END FOR
END

COMPLEXITY ANALYSIS:
    Time: O(n) tree traversal
    Space: O(n) for XML string

4.4 HTML Rendering

ALGORITHM: GenerateHTML
INPUT: mathExpr (MathExpression)
OUTPUT: html (String)

BEGIN
    // Use KaTeX or MathJax for rendering
    latex ← mathExpr.latex

    html ← """
    <div class="math-expression" data-confidence="{mathExpr.confidence}">
        <script type="math/tex">
        {latex}
        </script>
    </div>
    """

    // Add accessibility attributes
    html ← AddAriaLabels(html, mathExpr)

    RETURN html
END

COMPLEXITY ANALYSIS:
    Time: O(n) string concatenation
    Space: O(n) output size

5. Batch Processing

5.1 Parallel Batch Processing

ALGORITHM: ProcessBatch
INPUT: inputs (Vec<InputSource>), config (ProcessConfig)
OUTPUT: Result<Vec<ProcessResult>, Error>

DATA STRUCTURES:
    InputSource {
        source_type: SourceType,  // Image, PDF, Directory
        path: PathBuf,
        page_range: Option<Range<u32>>
    }

    ProcessResult {
        input: InputSource,
        output: RecognitionResult,
        processing_time_ms: u64,
        status: ResultStatus
    }

    ProcessConfig {
        max_parallel: usize,
        timeout_ms: u64,
        cache_enabled: bool
    }

BEGIN
    // Phase 1: Expand inputs (handle PDFs and directories)
    expandedInputs ← []
    FOR EACH input IN inputs DO
        MATCH input.source_type:
            PDF →
                pages ← ExtractPDFPages(input.path, input.page_range)
                expandedInputs.extend(pages)
            Directory →
                images ← FindImagesInDirectory(input.path)
                expandedInputs.extend(images)
            Image →
                expandedInputs.push(input)
        END MATCH
    END FOR

    // Phase 2: Create processing queue
    queue ← WorkQueue::new(expandedInputs)
    results ← ConcurrentVec::new()

    // Phase 3: Parallel processing
    numWorkers ← MIN(config.max_parallel, CPU_COUNT)

    PARALLEL FOR worker IN 0..numWorkers DO
        LOOP
            input ← queue.pop()
            IF input.is_none() THEN
                BREAK
            END IF

            startTime ← GetCurrentTime()

            // Process single input
            result ← ProcessSingleInput(
                input.unwrap(),
                config,
                timeout: config.timeout_ms
            )

            processingTime ← GetCurrentTime() - startTime

            processResult ← ProcessResult {
                input: input.unwrap(),
                output: result,
                processing_time_ms: processingTime,
                status: DetermineStatus(result)
            }

            results.push(processResult)
        END LOOP
    END PARALLEL

    // Phase 4: Aggregate and return
    finalResults ← results.into_vec()
    finalResults.sort_by(|a, b| a.input.path.cmp(&b.input.path))

    RETURN Ok(finalResults)
END

COMPLEXITY ANALYSIS:
    Time Complexity:
        - With P workers, N inputs, T time per input
        - Parallel: O(N × T / P)
        - Sequential equivalent: O(N × T)
        - Speedup: ~P (linear with worker count)

    Space Complexity:
        - Queue: O(N)
        - Results: O(N × R) where R = result size
        - Worker memory: O(P × M) where M = model size
        Total: O(N × R + P × M)

5.2 PDF Page Extraction

ALGORITHM: ExtractPDFPages
INPUT: pdfPath (PathBuf), pageRange (Option<Range<u32>>)
OUTPUT: pages (Vec<InputSource>)

BEGIN
    // Load PDF document
    document ← PDFDocument::load(pdfPath)

    IF document.is_error() THEN
        RETURN Error("Failed to load PDF")
    END IF

    // Determine page range
    totalPages ← document.page_count()
    range ← pageRange.unwrap_or(0..totalPages)

    pages ← []

    FOR pageNum IN range DO
        IF pageNum >= totalPages THEN
            BREAK
        END IF

        // Render page to image
        page ← document.get_page(pageNum)

        // Render at high DPI for quality
        image ← page.render(dpi: 300)

        // Create temporary file
        tempPath ← CreateTempFile(format!("page_{}.png", pageNum))
        image.save(tempPath)

        inputSource ← InputSource {
            source_type: Image,
            path: tempPath,
            page_range: None
        }

        pages.push(inputSource)
    END FOR

    RETURN pages
END

COMPLEXITY ANALYSIS:
    Time: O(P × R) where P = pages, R = render time per page
    Space: O(P × S) where S = image size

5.3 Result Aggregation

ALGORITHM: AggregateResults
INPUT: results (Vec<ProcessResult>)
OUTPUT: aggregated (AggregatedResults)

DATA STRUCTURES:
    AggregatedResults {
        total_count: usize,
        success_count: usize,
        failure_count: usize,
        total_processing_time_ms: u64,
        average_confidence: f32,
        results_by_status: HashMap<ResultStatus, Vec<ProcessResult>>
    }

BEGIN
    totalCount ← results.len()
    successCount ← 0
    failureCount ← 0
    totalTime ← 0
    totalConfidence ← 0.0
    byStatus ← HashMap::new()

    FOR EACH result IN results DO
        totalTime ← totalTime + result.processing_time_ms

        MATCH result.status:
            Success →
                successCount ← successCount + 1
                totalConfidence ← totalConfidence + result.output.confidence
            Failure →
                failureCount ← failureCount + 1
        END MATCH

        // Group by status
        IF NOT byStatus.contains_key(result.status) THEN
            byStatus.insert(result.status, [])
        END IF
        byStatus.get_mut(result.status).push(result)
    END FOR

    avgConfidence ← IF successCount > 0 THEN
        totalConfidence / successCount
    ELSE
        0.0
    END IF

    RETURN AggregatedResults {
        total_count: totalCount,
        success_count: successCount,
        failure_count: failureCount,
        total_processing_time_ms: totalTime,
        average_confidence: avgConfidence,
        results_by_status: byStatus
    }
END

COMPLEXITY ANALYSIS:
    Time: O(n) single pass
    Space: O(n) for grouping

6. Caching and Memoization

6.1 Model Weight Caching

ALGORITHM: LoadModelWithCache
INPUT: modelPath (PathBuf), cacheConfig (CacheConfig)
OUTPUT: Result<Model, Error>

DATA STRUCTURES:
    CacheConfig {
        enabled: bool,
        cache_dir: PathBuf,
        max_cache_size_mb: u64,
        ttl_seconds: u64
    }

    CachedModel {
        weights: Vec<u8>,
        metadata: ModelMetadata,
        cached_at: Timestamp,
        access_count: u64
    }

BEGIN
    IF NOT cacheConfig.enabled THEN
        RETURN LoadModelDirect(modelPath)
    END IF

    // Generate cache key from model path and version
    cacheKey ← ComputeHash(modelPath, algorithm: SHA256)
    cachePath ← cacheConfig.cache_dir.join(cacheKey)

    // Check if cached version exists and is valid
    IF cachePath.exists() THEN
        cachedModel ← DeserializeCachedModel(cachePath)

        // Check TTL
        age ← GetCurrentTime() - cachedModel.cached_at
        IF age < cacheConfig.ttl_seconds THEN
            // Cache hit
            cachedModel.access_count ← cachedModel.access_count + 1
            UpdateCacheMetadata(cachePath, cachedModel.metadata)

            model ← DeserializeModel(cachedModel.weights)
            RETURN Ok(model)
        ELSE
            // Cache expired
            DeleteFile(cachePath)
        END IF
    END IF

    // Cache miss - load from disk
    model ← LoadModelDirect(modelPath)

    IF model.is_error() THEN
        RETURN model
    END IF

    // Serialize and cache
    serializedWeights ← SerializeModel(model.unwrap())

    cachedModel ← CachedModel {
        weights: serializedWeights,
        metadata: model.metadata,
        cached_at: GetCurrentTime(),
        access_count: 1
    }

    // Check cache size limit
    EnsureCacheSize(cacheConfig)

    // Write to cache
    WriteCachedModel(cachePath, cachedModel)

    RETURN model
END

SUBROUTINE: EnsureCacheSize
INPUT: cacheConfig (CacheConfig)
OUTPUT: None

BEGIN
    currentSize ← GetDirectorySize(cacheConfig.cache_dir)
    maxSize ← cacheConfig.max_cache_size_mb * 1024 * 1024

    IF currentSize <= maxSize THEN
        RETURN
    END IF

    // Evict least recently used models
    cachedFiles ← ListFiles(cacheConfig.cache_dir)

    // Sort by last access time
    cachedFiles.sort_by(|a, b| {
        a.metadata.accessed_at.cmp(&b.metadata.accessed_at)
    })

    freedSpace ← 0
    targetFree ← currentSize - maxSize

    FOR EACH file IN cachedFiles DO
        IF freedSpace >= targetFree THEN
            BREAK
        END IF

        fileSize ← GetFileSize(file)
        DeleteFile(file)
        freedSpace ← freedSpace + fileSize
    END FOR
END

COMPLEXITY ANALYSIS:
    Time Complexity:
        - Cache hit: O(1) for lookup + O(m) for deserialization
        - Cache miss: O(m) for model loading + O(m) for serialization
        - Eviction: O(k log k) where k = cached files

    Space Complexity:
        - Cached model: O(m) where m = model size
        - LRU tracking: O(k)

6.2 Result Caching with Ruvector

ALGORITHM: CacheResultWithVector
INPUT: imageHash (Hash), result (RecognitionResult), vectorStore (RuvectorStore)
OUTPUT: Result<(), Error>

DATA STRUCTURES:
    RuvectorStore {
        index: VectorIndex,
        metadata_db: HashMap<Hash, ResultMetadata>,
        config: VectorConfig
    }

    VectorConfig {
        embedding_dim: usize,
        similarity_threshold: f32,
        max_cache_entries: usize
    }

    ResultMetadata {
        result: RecognitionResult,
        image_hash: Hash,
        cached_at: Timestamp,
        hit_count: u64
    }

BEGIN
    // Phase 1: Generate perceptual hash
    perceptualHash ← ComputePerceptualHash(imageHash)

    // Phase 2: Check if already cached
    IF vectorStore.metadata_db.contains_key(perceptualHash) THEN
        // Update metadata
        metadata ← vectorStore.metadata_db.get_mut(perceptualHash)
        metadata.hit_count ← metadata.hit_count + 1
        RETURN Ok(())
    END IF

    // Phase 3: Generate embedding for the result
    embedding ← GenerateResultEmbedding(result)

    // Phase 4: Store in vector index
    vectorStore.index.insert(
        id: perceptualHash,
        vector: embedding
    )

    // Phase 5: Store metadata
    metadata ← ResultMetadata {
        result: result,
        image_hash: imageHash,
        cached_at: GetCurrentTime(),
        hit_count: 1
    }

    vectorStore.metadata_db.insert(perceptualHash, metadata)

    // Phase 6: Enforce cache size limit
    IF vectorStore.metadata_db.len() > vectorStore.config.max_cache_entries THEN
        EvictLeastUsedEntry(vectorStore)
    END IF

    RETURN Ok(())
END

ALGORITHM: QuerySimilarCachedResult
INPUT: imageHash (Hash), vectorStore (RuvectorStore)
OUTPUT: Option<RecognitionResult>

BEGIN
    // Generate perceptual hash
    perceptualHash ← ComputePerceptualHash(imageHash)

    // Exact match check
    IF vectorStore.metadata_db.contains_key(perceptualHash) THEN
        metadata ← vectorStore.metadata_db.get(perceptualHash)
        metadata.hit_count ← metadata.hit_count + 1
        RETURN Some(metadata.result.clone())
    END IF

    // Generate query embedding
    queryEmbedding ← GenerateImageEmbedding(imageHash)

    // Search for similar results
    results ← vectorStore.index.search(
        query: queryEmbedding,
        k: 1,
        threshold: vectorStore.config.similarity_threshold
    )

    IF results.is_empty() THEN
        RETURN None
    END IF

    bestMatch ← results[0]

    IF bestMatch.similarity >= vectorStore.config.similarity_threshold THEN
        metadata ← vectorStore.metadata_db.get(bestMatch.id)
        metadata.hit_count ← metadata.hit_count + 1
        RETURN Some(metadata.result.clone())
    END IF

    RETURN None
END

COMPLEXITY ANALYSIS:
    Caching:
        Time: O(d) for embedding + O(log n) for index insertion
        Space: O(n × d) where n = cached entries, d = embedding dim

    Querying:
        Time: O(d) for embedding + O(log n × d) for ANN search
        Space: O(k) for results where k = search parameter

6.3 Incremental Update Cache

ALGORITHM: UpdateCacheIncremental
INPUT: updates (Vec<CacheUpdate>), vectorStore (RuvectorStore)
OUTPUT: Result<(), Error>

DATA STRUCTURES:
    CacheUpdate {
        operation: UpdateOp,  // Insert, Update, Delete
        image_hash: Hash,
        result: Option<RecognitionResult>
    }

    UpdateOp {
        Insert,
        Update,
        Delete
    }

BEGIN
    // Batch updates for efficiency
    insertBatch ← []
    updateBatch ← []
    deleteBatch ← []

    FOR EACH update IN updates DO
        MATCH update.operation:
            Insert →
                insertBatch.push(update)
            Update →
                updateBatch.push(update)
            Delete →
                deleteBatch.push(update)
        END MATCH
    END FOR

    // Process deletes first
    FOR EACH update IN deleteBatch DO
        perceptualHash ← ComputePerceptualHash(update.image_hash)
        vectorStore.index.remove(perceptualHash)
        vectorStore.metadata_db.remove(perceptualHash)
    END FOR

    // Process updates
    FOR EACH update IN updateBatch DO
        perceptualHash ← ComputePerceptualHash(update.image_hash)

        IF vectorStore.metadata_db.contains_key(perceptualHash) THEN
            // Update existing entry
            embedding ← GenerateResultEmbedding(update.result.unwrap())
            vectorStore.index.update(perceptualHash, embedding)

            metadata ← vectorStore.metadata_db.get_mut(perceptualHash)
            metadata.result ← update.result.unwrap()
            metadata.cached_at ← GetCurrentTime()
        END IF
    END FOR

    // Process inserts in batch
    IF NOT insertBatch.is_empty() THEN
        embeddings ← []
        metadataList ← []

        FOR EACH update IN insertBatch DO
            embedding ← GenerateResultEmbedding(update.result.unwrap())
            embeddings.push(embedding)

            perceptualHash ← ComputePerceptualHash(update.image_hash)
            metadata ← ResultMetadata {
                result: update.result.unwrap(),
                image_hash: update.image_hash,
                cached_at: GetCurrentTime(),
                hit_count: 1
            }
            metadataList.push((perceptualHash, metadata))
        END FOR

        // Batch insert into vector index
        vectorStore.index.insert_batch(embeddings)

        // Batch insert metadata
        FOR EACH (hash, metadata) IN metadataList DO
            vectorStore.metadata_db.insert(hash, metadata)
        END FOR
    END IF

    RETURN Ok(())
END

COMPLEXITY ANALYSIS:
    Time: O(b × d) where b = batch size, d = embedding dim
    Space: O(b × d) for batch processing

Summary: Complexity Analysis

Overall System Complexity

Component	Time Complexity	Space Complexity
Image Preprocessing	O(n log n)	O(n)
Vision Transformer	O(L × p² × d)	O(L × h × p²)
Text Recognition	O(T × B × V × log(BV))	O(B × T × d)
Math Symbol Detection	O(s × c)	O(s)
Spatial Analysis	O(s log s)	O(s²) worst case
Tree Construction	O(s log s)	O(s × h)
LaTeX Generation	O(s)	O(h)
Batch Processing	O(N × T / P)	O(N × R + P × M)
Vector Caching	O(d + log n)	O(n × d)

Legend:

n = pixel count
L = transformer layers
p = number of patches
d = embedding dimension
h = attention heads
T = sequence length
B = beam width
V = vocabulary size
s = symbol count
N = batch size
P = parallel workers
R = result size
M = model size

Optimization Opportunities

Preprocessing: Use GPU-accelerated image operations
Transformer: Implement efficient attention (FlashAttention)
Beam Search: Prune low-probability beams early
Spatial Analysis: Use spatial indexing (QuadTree/R-tree)
Caching: Implement tiered cache (L1: memory, L2: disk)
Batch Processing: Dynamic load balancing across workers
Vector Search: Use approximate nearest neighbor (HNSW)

Design Patterns Used

Pipeline Pattern: Image preprocessing → OCR → Math parsing → Output
Strategy Pattern: Multiple output format generators
Observer Pattern: Progress tracking in batch processing
Factory Pattern: Model and cache instantiation
Adapter Pattern: Format conversion layers
Repository Pattern: Vector store abstraction
Command Pattern: Cache update operations
Builder Pattern: Expression tree and XML construction

This pseudocode serves as the algorithmic blueprint for implementation in the Refinement phase.

47 KiB Raw Permalink Blame History Unescape Escape