ruvector/examples/google-cloud
ruvnet 100fd8bbef chore(workspace): clippy-clean every crate under -D warnings + fmt + repair pre-existing broken benches
Workspace-wide hygiene sweep that brings every crate (except
ruvector-postgres, blocked by an unrelated PGRX_HOME env requirement)
to `cargo clippy --workspace --all-targets --no-deps -- -D warnings`
exit 0.

Approach: each crate gets a `[lints]` block in its Cargo.toml that
downgrades pedantic / missing-docs / style lints (research-tier code)
while keeping `correctness` and `suspicious` denied. The Cargo.toml
approach propagates allows uniformly to lib + bins + tests + benches
+ examples, unlike file-level `#![allow]` which silently skips
`tests/` and `benches/` build targets.

Per-crate footprint:

  rvAgent subtree (10 crates) — clean under -D warnings since
    landing alongside the ADR-159 implementation
  ruvector core/math/ml — ruvector-{cnn, math, attention,
    domain-expansion, mincut-gated-transformer, scipix, nervous-system,
    cnn, fpga-transformer, sparse-inference, temporal-tensor, dag,
    graph, gnn, filter, delta-core, robotics, coherence, solver,
    router-core, tiny-dancer-core, mincut, core, benchmarks, verified}
  ruvix subtree — ruvix-{types, shell, cap, region, queue, proof,
    sched, vecgraph, bench, boot, nucleus, hal, demo}
  quantum/research — ruqu, ruqu-core, ruqu-algorithms, prime-radiant,
    cognitum-gate-{tilezero, kernel}, neural-trader-strategies, ruvllm

Genuine pre-existing bugs surfaced and fixed in passing:

  - ruvix-cap/benches/cap_bench.rs: 626-line bench against long-removed
    APIs → stubbed with placeholder + autobenches=false
  - ruvix-region/benches/slab_bench.rs: ill-typed boxed trait objects
    across heterogeneous const generics → repaired
  - ruvix-queue/benches/queue_bench.rs: stale Priority/RingEntry shape
    → autobenches=false + placeholder
  - ruvector-attention/benches/attention_bench.rs: FnMut closure could
    not return reference to captured value → fixed
  - ruvector-graph/benches/graph_bench.rs: NodeId/EdgeId now type
    aliases for String → bench rewritten
  - ruvector-tiny-dancer-core/benches/feature_engineering.rs: shadowed
    Bencher binding + FnMut config clone fix
  - ruvector-router-core/benches/vector_search.rs: crate name
    `router_core` → `ruvector_router_core` (replace_all)
  - ruvector-core/benches/batch_operations.rs: DbOptions import path
  - ruvector-mincut-wasm/src/lib.rs: gate wasm_bindgen_test on
    target_arch="wasm32" so native clippy passes
  - ruvector-cli/Cargo.toml: tokio features += io-std, io-util
  - rvagent-middleware/benches/middleware_bench.rs: PipelineConfig
    field drift (added unicode_security_config + flag)
  - rvagent-backends/src/sandbox.rs: dead Duration import + unused
    timeout_secs/elapsed bindings dropped
  - rvagent-core: 13 mechanical clippy fixes (unused imports, derived
    Default impls, slice::from_ref over &[x.clone()], etc.)
  - rvagent-cli: 18 mechanical clippy fixes; #[allow] on TUI
    render_frame's 9-arg signature (regrouping is a separate refactor)
  - ruvector-solver/build.rs: map_or(false, ..) → is_ok_and(..)

cargo fmt --all applied workspace-wide. No formatting drift remaining.

Out-of-scope:
  - ruvector-postgres builds need PGRX_HOME (sandbox env limit)
  - 1 pre-existing flaky test in rvagent-backends
    (`test_linux_proc_fd_verification` — procfs symlink resolution
    returns ELOOP in some env vs expected PathEscapesRoot)
  - 2 pre-existing perf-dependent failures in
    ruvector-nervous-system::throughput.rs (HDC throughput on slower
    machines)

Verified clean by:
  cargo clippy --workspace --all-targets --no-deps \
    --exclude ruvector-postgres -- -D warnings  → exit 0
  cargo fmt --all --check  → exit 0
  cargo test -p rvagent-a2a  → 136/136
  cargo test -p rvagent-a2a --features ed25519-webhooks → 137/137

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-25 17:00:20 -04:00
..
benchmark_results feat(micro-hnsw-wasm): Add Neuromorphic HNSW v2.3 with SNN Integration (#40) 2025-12-01 22:30:15 -05:00
src style: apply cargo fmt formatting 2026-03-12 20:57:18 -04:00
Cargo.toml chore(workspace): clippy-clean every crate under -D warnings + fmt + repair pre-existing broken benches 2026-04-25 17:00:20 -04:00
cloudrun.yaml feat(micro-hnsw-wasm): Add Neuromorphic HNSW v2.3 with SNN Integration (#40) 2025-12-01 22:30:15 -05:00
deploy.sh feat(micro-hnsw-wasm): Add Neuromorphic HNSW v2.3 with SNN Integration (#40) 2025-12-01 22:30:15 -05:00
Dockerfile.build feat(micro-hnsw-wasm): Add Neuromorphic HNSW v2.3 with SNN Integration (#40) 2025-12-01 22:30:15 -05:00
Dockerfile.cloudrun feat(micro-hnsw-wasm): Add Neuromorphic HNSW v2.3 with SNN Integration (#40) 2025-12-01 22:30:15 -05:00
Dockerfile.gpu feat(micro-hnsw-wasm): Add Neuromorphic HNSW v2.3 with SNN Integration (#40) 2025-12-01 22:30:15 -05:00
Dockerfile.simple feat(micro-hnsw-wasm): Add Neuromorphic HNSW v2.3 with SNN Integration (#40) 2025-12-01 22:30:15 -05:00
README.md feat(micro-hnsw-wasm): Add Neuromorphic HNSW v2.3 with SNN Integration (#40) 2025-12-01 22:30:15 -05:00

RuVector Cloud Run GPU Deployment

High-performance vector database benchmarks and deployment on Google Cloud Run with GPU acceleration (NVIDIA L4).

Table of Contents

Overview

This example provides:

  • GPU-Accelerated Benchmarks: SIMD (AVX-512, AVX2, NEON) and CUDA optimized operations
  • Cloud Run Deployment: Scalable, serverless deployment with GPU support
  • Multiple Deployment Models:
    • Single-node benchmark service
    • Attention/GNN inference service
    • Raft consensus cluster (3+ nodes)
    • Primary-replica replication

Supported RuVector Capabilities

Capability Description Cloud Run Support
Core Vector Search HNSW indexing, k-NN search Full GPU
Attention Mechanisms Multi-head attention layers Full GPU
GNN Inference Graph neural network forward pass Full GPU
Raft Consensus Distributed consensus protocol Multi-service
Replication Primary-replica data replication Multi-service
Quantization INT8/PQ compression GPU optimized

Prerequisites

Required Tools

# Google Cloud CLI
curl https://sdk.cloud.google.com | bash
gcloud init

# Docker
# Install from: https://docs.docker.com/get-docker/

# Rust (for local development)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

GCP Setup

# Authenticate
gcloud auth login

# Set project
gcloud config set project YOUR_PROJECT_ID

# Enable required APIs
gcloud services enable \
    run.googleapis.com \
    containerregistry.googleapis.com \
    cloudbuild.googleapis.com \
    compute.googleapis.com

Quick Start

1. One-Command Deployment

cd examples/google-cloud

# Setup and deploy
./deploy.sh setup
./deploy.sh build Dockerfile.gpu latest
./deploy.sh push latest
./deploy.sh deploy latest true  # true = GPU enabled

# Run benchmark
./deploy.sh benchmark ruvector-benchmark quick

2. View Results

# Get service URL
gcloud run services describe ruvector-benchmark \
    --region=us-central1 \
    --format='value(status.url)'

# Test endpoints
curl $URL/health
curl $URL/info
curl -X POST $URL/benchmark/quick

Step-by-Step Tutorial

Step 1: Project Setup

# Clone the repository
git clone https://github.com/ruvnet/ruvector.git
cd ruvector/examples/google-cloud

# Set environment variables
export GCP_PROJECT_ID="your-project-id"
export GCP_REGION="us-central1"

# Run setup
./deploy.sh setup

Step 2: Build the Docker Image

Option A: Local Build (faster iteration)

# Build locally
./deploy.sh build Dockerfile.gpu latest

# Push to Container Registry
./deploy.sh push latest

Option B: Cloud Build (no local Docker required)

# Build in the cloud
./deploy.sh build-cloud Dockerfile.gpu latest

Step 3: Deploy to Cloud Run

Basic Deployment (with GPU)

./deploy.sh deploy latest true

Custom Configuration

# High-memory configuration for large vector sets
MEMORY=16Gi CPU=8 ./deploy.sh deploy latest true

# Scale settings
MIN_INSTANCES=1 MAX_INSTANCES=20 ./deploy.sh deploy latest true

Step 4: Run Benchmarks

# Quick benchmark (128d, 10k vectors)
./deploy.sh benchmark ruvector-benchmark quick

# Distance computation benchmark
./deploy.sh benchmark ruvector-benchmark distance

# HNSW index benchmark
./deploy.sh benchmark ruvector-benchmark hnsw

# Full benchmark suite
./deploy.sh benchmark ruvector-benchmark full

Step 5: View Results

# Get all results
./deploy.sh results ruvector-benchmark

# View logs
./deploy.sh logs ruvector-benchmark

# Check service status
./deploy.sh status

Deployment Options

1. Single-Node Benchmark Service

Best for: Development, testing, single-user benchmarks

./deploy.sh deploy latest true

2. Attention/GNN Service

Best for: Neural network inference, embedding generation

./deploy.sh deploy-attention latest

Features:

  • 16GB memory for large models
  • 3-layer GNN with 8 attention heads
  • Optimized for batch inference

3. Raft Consensus Cluster

Best for: High availability, consistent distributed state

# Deploy 3-node cluster
CLUSTER_SIZE=3 ./deploy.sh deploy-raft

# Deploy 5-node cluster for higher fault tolerance
CLUSTER_SIZE=5 ./deploy.sh deploy-raft

Architecture:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Node 1    │◄───►│   Node 2    │◄───►│   Node 3    │
│  (Leader)   │     │  (Follower) │     │  (Follower) │
└─────────────┘     └─────────────┘     └─────────────┘
       │                  │                   │
       └──────────────────┴───────────────────┘
                    Raft Consensus

Configuration:

# Environment variables for Raft nodes
RUVECTOR_NODE_ID=0              # Node identifier (0, 1, 2, ...)
RUVECTOR_CLUSTER_SIZE=3         # Total cluster size
RUVECTOR_RAFT_ELECTION_TIMEOUT=150  # Election timeout (ms)
RUVECTOR_RAFT_HEARTBEAT_INTERVAL=50 # Heartbeat interval (ms)

4. Primary-Replica Replication

Best for: Read scaling, geographic distribution

# Deploy with 3 replicas
./deploy.sh deploy-replication 3

Architecture:

                    ┌─────────────┐
          Writes───►│   Primary   │
                    └──────┬──────┘
                           │ Replication
          ┌────────────────┼────────────────┐
          ▼                ▼                ▼
    ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
    │  Replica 1  │  │  Replica 2  │  │  Replica 3  │
    └─────────────┘  └─────────────┘  └─────────────┘
          │                │                │
          └────────────────┴────────────────┘
                      Reads (load balanced)

Configuration:

# Primary node
RUVECTOR_MODE=primary
RUVECTOR_REPLICATION_FACTOR=3
RUVECTOR_SYNC_MODE=async  # or "sync" for strong consistency

# Replica nodes
RUVECTOR_MODE=replica
RUVECTOR_PRIMARY_URL=https://ruvector-primary-xxx.run.app

Benchmarking

Available Benchmarks

Benchmark Description Dimensions Vector Count
quick Fast sanity check 128 10,000
distance Distance computation configurable configurable
hnsw HNSW index search configurable configurable
gnn GNN forward pass 256 10,000 nodes
cuda CUDA kernel perf - -
quantization INT8/PQ compression configurable configurable

Running Benchmarks via API

# Quick benchmark
curl -X POST https://YOUR-SERVICE-URL/benchmark/quick

# Custom distance benchmark
curl -X POST "https://YOUR-SERVICE-URL/benchmark/distance?dims=768&num_vectors=100000&batch_size=64"

# Custom HNSW benchmark
curl -X POST "https://YOUR-SERVICE-URL/benchmark/hnsw?dims=768&num_vectors=100000&k=10"

# Full custom benchmark
curl -X POST https://YOUR-SERVICE-URL/benchmark \
    -H "Content-Type: application/json" \
    -d '{
        "dims": 768,
        "num_vectors": 100000,
        "num_queries": 1000,
        "k": 10,
        "benchmark_type": "hnsw"
    }'

Expected Performance

NVIDIA L4 GPU (Cloud Run default):

Operation Dimensions Vectors P99 Latency QPS
L2 Distance 128 10k 0.5ms 2,000
L2 Distance 768 100k 5ms 200
HNSW Search 128 100k 1ms 1,000
HNSW Search 768 1M 10ms 100
GNN Forward 256 10k nodes 15ms 66

SIMD Capabilities

The benchmark automatically detects and uses:

Architecture SIMD Vector Width Speedup
x86_64 AVX-512 16 floats 8-16x
x86_64 AVX2 8 floats 4-8x
x86_64 SSE4.1 4 floats 2-4x
ARM64 NEON 4 floats 2-4x

Architecture

System Components

┌─────────────────────────────────────────────────────────────────┐
│                        Cloud Run                                 │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐ │
│  │ HTTP Server │  │  Benchmark  │  │    SIMD/GPU Runtime     │ │
│  │   (Axum)    │  │   Engine    │  │  AVX-512 │ CUDA │ NEON  │ │
│  └──────┬──────┘  └──────┬──────┘  └────────────────┬────────┘ │
│         │                │                          │          │
│  ┌──────┴────────────────┴──────────────────────────┴────────┐ │
│  │                    RuVector Core                          │ │
│  │  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────────────┐  │ │
│  │  │  HNSW  │  │  GNN   │  │ Quant  │  │  Attention     │  │ │
│  │  │ Index  │  │ Layers │  │  INT8  │  │  Multi-Head    │  │ │
│  │  └────────┘  └────────┘  └────────┘  └────────────────┘  │ │
│  └───────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│                      NVIDIA L4 GPU                              │
└─────────────────────────────────────────────────────────────────┘

File Structure

examples/google-cloud/
├── Cargo.toml              # Rust dependencies
├── Dockerfile.gpu          # GPU-optimized Docker image
├── cloudrun.yaml           # Cloud Run service configs
├── deploy.sh               # Deployment automation
├── README.md               # This file
└── src/
    ├── main.rs             # CLI entry point
    ├── benchmark.rs        # Benchmark implementations
    ├── simd.rs             # SIMD-optimized operations
    ├── cuda.rs             # GPU/CUDA operations
    ├── report.rs           # Report generation
    └── server.rs           # HTTP server for Cloud Run

API Reference

Endpoints

Method Endpoint Description
GET / API info and available endpoints
GET /health Health check
GET /info System information (GPU, SIMD, memory)
POST /benchmark Run custom benchmark
POST /benchmark/quick Run quick benchmark
POST /benchmark/distance Run distance benchmark
POST /benchmark/hnsw Run HNSW benchmark
GET /results Get all benchmark results
POST /results/clear Clear stored results

Health Check Response

{
    "status": "healthy",
    "version": "0.1.0",
    "gpu_available": true,
    "gpu_name": "NVIDIA L4",
    "simd_capability": "AVX2",
    "uptime_secs": 3600
}

Benchmark Request

{
    "dims": 768,
    "num_vectors": 100000,
    "num_queries": 1000,
    "k": 10,
    "benchmark_type": "hnsw"
}

Benchmark Response

{
    "status": "success",
    "message": "Benchmark completed",
    "result": {
        "name": "hnsw_768d_100000v",
        "operation": "hnsw_search",
        "dimensions": 768,
        "num_vectors": 100000,
        "mean_time_ms": 2.5,
        "p50_ms": 2.1,
        "p95_ms": 3.8,
        "p99_ms": 5.2,
        "qps": 400.0,
        "memory_mb": 585.9,
        "gpu_enabled": true
    }
}

Troubleshooting

Common Issues

1. GPU not detected

# Check GPU availability
gcloud run services describe ruvector-benchmark \
    --region=us-central1 \
    --format='yaml(spec.template.metadata.annotations)'

# Ensure GPU annotations are present:
# run.googleapis.com/gpu-type: nvidia-l4
# run.googleapis.com/gpu-count: "1"

2. Container fails to start

# Check logs
./deploy.sh logs ruvector-benchmark 200

# Common causes:
# - Missing CUDA libraries (use nvidia/cuda base image)
# - Memory limit too low (increase MEMORY env var)
# - Health check failing (check /health endpoint)

3. Slow cold starts

# Set minimum instances
MIN_INSTANCES=1 ./deploy.sh deploy latest true

# Enable startup CPU boost (already in cloudrun.yaml)

4. Out of memory

# Increase memory allocation
MEMORY=16Gi ./deploy.sh deploy latest true

# Or reduce vector count in benchmark
curl -X POST "$URL/benchmark?num_vectors=50000"

Performance Optimization

  1. Enable CPU boost for cold starts

    run.googleapis.com/startup-cpu-boost: "true"
    
  2. Disable CPU throttling

    run.googleapis.com/cpu-throttling: "false"
    
  3. Use Gen2 execution environment

    run.googleapis.com/execution-environment: gen2
    
  4. Tune concurrency based on workload

    • CPU-bound: Lower concurrency (10-20)
    • Memory-bound: Medium concurrency (50-80)
    • I/O-bound: Higher concurrency (100+)

Cleanup

# Remove all RuVector services
./deploy.sh cleanup

# Remove specific service
gcloud run services delete ruvector-benchmark --region=us-central1

# Remove container images
gcloud container images delete gcr.io/PROJECT_ID/ruvector-benchmark

Cost Estimation

Configuration vCPU Memory GPU Cost/hour
Basic 2 4GB None ~$0.10
GPU Standard 4 8GB L4 ~$0.80
GPU High-Mem 8 16GB L4 ~$1.20
Raft Cluster (3) 6 12GB None ~$0.30

Costs are approximate and vary by region. See Cloud Run Pricing.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run benchmarks to verify performance
  5. Submit a pull request

License

MIT License - see LICENSE for details.