DrAgnes Google Cloud Deployment Plan

Status: Research & Planning
Date: 2026-03-21

Overview

DrAgnes leverages the existing pi.ruv.io Google Cloud infrastructure, extending it with dermatology-specific services. The deployment follows a multi-region, HIPAA-compliant architecture using Google Cloud's BAA-covered services.

Architecture Overview

                        ┌─────────────────────────────────┐
                        │        Cloud CDN + LB            │
                        │   (Global, HTTPS termination)    │
                        └──────────┬──────────────────────┘
                                   │
                    ┌──────────────┼──────────────┐
                    │              │              │
              ┌─────┴─────┐ ┌─────┴─────┐ ┌─────┴─────┐
              │ us-east1  │ │ us-west1  │ │ europe-w1 │
              │ (primary) │ │ (failover)│ │ (EU data) │
              └─────┬─────┘ └─────┬─────┘ └─────┬─────┘
                    │              │              │
         ┌──────────┴──────────────┴──────────────┴──────────┐
         │                  Service Mesh                      │
         │                                                    │
         │  ┌────────────┐  ┌────────────┐  ┌────────────┐   │
         │  │ DrAgnes    │  │ Brain      │  │ CNN Model  │   │
         │  │ API        │  │ Server     │  │ Server     │   │
         │  │ (Cloud Run)│  │ (Cloud Run)│  │ (Cloud Run)│   │
         │  └─────┬──────┘  └─────┬──────┘  └─────┬──────┘   │
         │        │               │               │           │
         │  ┌─────┴───────────────┴───────────────┴─────┐     │
         │  │              Data Layer                    │     │
         │  │                                            │     │
         │  │  Firestore │ GCS │ Memorystore │ BigQuery  │     │
         │  └────────────────────────────────────────────┘     │
         │                                                    │
         │  ┌────────────────────────────────────────────┐     │
         │  │           Event Layer                      │     │
         │  │                                            │     │
         │  │  Pub/Sub │ Cloud Scheduler │ Cloud Tasks   │     │
         │  └────────────────────────────────────────────┘     │
         │                                                    │
         │  ┌────────────────────────────────────────────┐     │
         │  │           Security Layer                   │     │
         │  │                                            │     │
         │  │  IAM │ Secret Manager │ CMEK │ VPC-SC     │     │
         │  └────────────────────────────────────────────┘     │
         └────────────────────────────────────────────────────┘

Service Configuration

1. DrAgnes API Service (Cloud Run)

Primary API service for classification requests and practice management.

# dragnes-api.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: dragnes-api
  annotations:
    run.googleapis.com/launch-stage: GA
    run.googleapis.com/ingress: internal-and-cloud-load-balancing
spec:
  template:
    metadata:
      annotations:
        # Cloud Run reads instance bounds from the minScale/maxScale annotations
        autoscaling.knative.dev/minScale: "2"
        autoscaling.knative.dev/maxScale: "100"
        run.googleapis.com/cpu-throttling: "false"
        run.googleapis.com/execution-environment: gen2
    spec:
      containerConcurrency: 80
      timeoutSeconds: 300
      containers:
        - image: gcr.io/ruvector-brain-dev/dragnes-api:latest
          ports:
            - containerPort: 8080
          resources:
            limits:
              cpu: "2"
              memory: 2Gi
          env:
            - name: BRAIN_URL
              value: "https://brain-server-internal.run.app"
            - name: MODEL_BUCKET
              value: "gs://dragnes-models"
            - name: RUST_LOG
              value: "info"
          startupProbe:
            httpGet:
              path: /health
            initialDelaySeconds: 5
            periodSeconds: 5

2. CNN Model Server (Cloud Run)

Server-side CNN inference for practices without WASM capability.

# dragnes-cnn.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: dragnes-cnn
spec:
  template:
    metadata:
      annotations:
        # Cloud Run reads instance bounds from the minScale/maxScale annotations
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "50"
        run.googleapis.com/cpu-throttling: "false"
        run.googleapis.com/execution-environment: gen2
    spec:
      containerConcurrency: 20
      timeoutSeconds: 30
      containers:
        - image: gcr.io/ruvector-brain-dev/dragnes-cnn:latest
          ports:
            - containerPort: 8080
          resources:
            limits:
              cpu: "4"
              memory: 4Gi
          env:
            - name: MODEL_PATH
              value: "/models/mobilenetv3_small_int8.bin"
            - name: SIMD_ENABLED
              value: "true"

Performance Notes:

  • Cloud Run gen2 provides AVX2 SIMD acceleration
  • INT8 quantized model weights are about 5 MB
  • Target: <50ms inference per image
  • Concurrency limited to 20 (CPU-bound workload)
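
As a rough capacity check on the two manifests above, the ceiling on in-flight requests per service is the per-container concurrency multiplied by the maximum instance count. The sketch below only restates values from the YAML; the function name is illustrative.

```python
def max_in_flight(container_concurrency: int, max_instances: int) -> int:
    """Upper bound on requests a Cloud Run service can process at once."""
    return container_concurrency * max_instances


# Values copied from the two manifests above.
api_ceiling = max_in_flight(container_concurrency=80, max_instances=100)
cnn_ceiling = max_in_flight(container_concurrency=20, max_instances=50)
print(f"dragnes-api: {api_ceiling} in-flight, dragnes-cnn: {cnn_ceiling} in-flight")
```

These are ceilings rather than throughput targets; the sustained request rate depends on per-request latency.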

3. Brain Server (Existing)

The existing pi.ruv.io brain server at brain-server-*.run.app handles:

  • Knowledge graph management (316K edges)
  • HNSW search (128-dim, PiQ3 quantized)
  • PubMed integration
  • Sparsifier analytics (ADR-116)
  • Witness chain management

DrAgnes-specific extensions:

  • New memory namespace: dragnes-dermatology
  • Custom similarity threshold for dermoscopic embeddings
  • Dermoscopy-specific PubMed search templates
  • Classification feedback ingestion endpoint

4. PWA Frontend (Firebase Hosting)

Firebase Hosting Configuration
    │
    ├── Hosting
    │       ├── SPA routing (all paths → index.html)
    │       ├── CDN caching (immutable assets: 1 year)
    │       ├── WASM files: Cache-Control: public, max-age=31536000
    │       ├── Model weights: Cache-Control: public, max-age=86400
    │       └── API proxy: /api/** → Cloud Run dragnes-api
    │
    ├── Service Worker (Workbox)
    │       ├── Precache: app shell, WASM module, model weights
    │       ├── Runtime cache: brain search results (stale-while-revalidate)
    │       ├── Background sync: diagnosis submissions
    │       └── Offline fallback page
    │
    └── PWA Manifest
            ├── name: "DrAgnes"
            ├── display: "standalone"
            ├── orientation: "portrait"
            ├── theme_color: "#1a365d"
            └── icons: 192x192, 512x512 (maskable)

Data Storage

Firestore (De-Identified Metadata)

Firestore Collections
    │
    ├── /practices/{practiceId}
    │       ├── name: string
    │       ├── region: string
    │       ├── modelVersion: string
    │       ├── totalClassifications: number
    │       ├── dpBudgetUsed: number
    │       └── createdAt: timestamp
    │
    ├── /classifications/{classificationId}
    │       ├── practiceId: string (hashed)
    │       ├── lesionClass: string
    │       ├── confidence: number
    │       ├── abcdeTotal: number
    │       ├── sevenPointScore: number
    │       ├── riskLevel: string
    │       ├── clinicianAction: string
    │       ├── fitzpatrickType: number (1-6, Fitzpatrick types I-VI)
    │       ├── bodyLocationCategory: string
    │       ├── ageDecade: number
    │       ├── witnessHash: string
    │       └── createdAt: timestamp
    │       NOTE: No patient identifiers. No raw images.
    │
    ├── /feedback/{feedbackId}
    │       ├── classificationId: string
    │       ├── clinicianReview: string
    │       ├── correctedClass: string (optional)
    │       ├── histopathResult: string (optional)
    │       └── createdAt: timestamp
    │
    └── /modelVersions/{versionId}
            ├── version: string (semver)
            ├── trainedOn: number (embedding count)
            ├── accuracy: number
            ├── sensitivityMelanoma: number
            ├── specificityMelanoma: number
            ├── fairnessScore: number
            └── releasedAt: timestamp
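
One way to enforce the "no patient identifiers, no raw images" note is to reject any write whose fields fall outside the schema. The following is a minimal sketch, assuming the check runs in application code before the Firestore write; `validate_classification` is a hypothetical helper, and storing Fitzpatrick types as the integers 1 to 6 is an assumption.

```python
# Field names copied from /classifications/{classificationId} above.
ALLOWED_FIELDS = {
    "practiceId", "lesionClass", "confidence", "abcdeTotal",
    "sevenPointScore", "riskLevel", "clinicianAction", "fitzpatrickType",
    "bodyLocationCategory", "ageDecade", "witnessHash", "createdAt",
}


def validate_classification(doc: dict) -> dict:
    """Reject any document carrying fields outside the de-identified schema."""
    unexpected = set(doc) - ALLOWED_FIELDS
    if unexpected:
        raise ValueError(f"fields outside schema: {sorted(unexpected)}")
    # Assumption: Fitzpatrick types I-VI are stored as the integers 1-6.
    if doc.get("fitzpatrickType") not in range(1, 7):
        raise ValueError("fitzpatrickType must be an integer from 1 to 6")
    return doc
```

An allow-list is used instead of a deny-list so that a new identifying field is rejected by default rather than needing to be anticipated.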

Firestore Security Rules:

  • Practice-level tenant isolation
  • Write access: authenticated clinicians only
  • Read access: same practice only
  • Admin access: platform operators only
  • No cross-practice data access
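
The isolation rules above reduce to two predicates. The sketch below expresses them in Python for illustration only; in a real deployment they would be written as Firestore Security Rules. The `Caller` type is hypothetical, and the platform-operator admin path is left out because the plan does not specify its scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    practice_id: str
    is_authenticated_clinician: bool


def can_read(caller: Caller, doc_practice_id: str) -> bool:
    """Read access: same practice only."""
    return caller.practice_id == doc_practice_id


def can_write(caller: Caller, doc_practice_id: str) -> bool:
    """Write access: authenticated clinicians, and only within their practice."""
    return caller.is_authenticated_clinician and can_read(caller, doc_practice_id)
```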

Google Cloud Storage (GCS)

GCS Buckets
    │
    ├── gs://dragnes-models/
    │       ├── mobilenetv3_small_int8.bin          (INT8 model, ~5MB)
    │       ├── mobilenetv3_small_fp32.bin           (FP32 model, ~15MB)
    │       ├── mobilenetv3_small.wasm               (WASM module, ~2MB)
    │       ├── lora_weights/{practiceId}/latest.bin (per-practice LoRA)
    │       └── reference_embeddings/top1000.bin     (offline cache)
    │       Encryption: CMEK (AES-256)
    │       Access: dragnes-api service account only
    │
    ├── gs://dragnes-rvf/
    │       ├── {contributorHash}/{memoryId}.rvf     (RVF containers)
    │       Encryption: CMEK (AES-256)
    │       Access: brain server service account only
    │       Lifecycle: Archive after 90 days, delete after 7 years
    │
    └── gs://dragnes-audit/
            ├── access_logs/YYYY/MM/DD/*.jsonl
            ├── classification_logs/YYYY/MM/DD/*.jsonl
            └── security_events/YYYY/MM/DD/*.jsonl
            Encryption: CMEK (AES-256)
            Retention: 6 years (HIPAA minimum)
            Access: Security team only
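
The date-partitioned layout of gs://dragnes-audit/ can be produced by a small helper. This is a sketch under two assumptions not stated in the plan: partitions use the UTC date, and each writer supplies its own shard name for the file.

```python
from datetime import datetime, timezone

# Prefixes copied from the bucket layout above.
AUDIT_STREAMS = ("access_logs", "classification_logs", "security_events")


def audit_object_name(stream: str, ts: datetime, shard: str) -> str:
    """Build the object name <stream>/YYYY/MM/DD/<shard>.jsonl for an audit record."""
    if stream not in AUDIT_STREAMS:
        raise ValueError(f"unknown audit stream: {stream}")
    day = ts.astimezone(timezone.utc)  # pass a timezone-aware datetime
    return f"{stream}/{day:%Y/%m/%d}/{shard}.jsonl"
```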

Memorystore (Redis) -- Optional Performance Layer

Redis Instance (Basic tier, 1GB)
    │
    ├── Session cache (15-min TTL)
    ├── Rate limiting counters (per-practice, per-hour)
    ├── HNSW search result cache (5-min TTL)
    └── Model version cache (1-hour TTL)
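
The TTLs and the per-practice, per-hour rate-limit window above might be encoded as follows. The key format and names are hypothetical; only the durations come from the plan.

```python
from datetime import datetime, timezone

# TTLs from the cache layout above, in seconds.
TTL_SECONDS = {
    "session": 15 * 60,
    "hnsw_result": 5 * 60,
    "model_version": 60 * 60,
}


def rate_limit_key(practice_id: str, now: datetime) -> str:
    """Counter key that rolls over every UTC hour (hypothetical key format)."""
    bucket = now.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"ratelimit:{practice_id}:{bucket}"
```

With a key like this, each request increments the counter for the current hour, and an expiry slightly longer than one hour lets stale counters disappear on their own.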

Event Architecture

Pub/Sub Topics

Pub/Sub Configuration
    │
    ├── dragnes-classification (new classification events)
    │       ├── Publisher: dragnes-api
    │       ├── Subscriber: brain-server (brain ingestion)
    │       ├── Subscriber: dragnes-analytics (BigQuery sink)
    │       └── Subscriber: dragnes-alerts (monitoring)
    │
    ├── dragnes-feedback (clinician feedback events)
    │       ├── Publisher: dragnes-api
    │       ├── Subscriber: brain-server (model improvement)
    │       └── Subscriber: dragnes-analytics (accuracy tracking)
    │
    ├── dragnes-model-update (model version events)
    │       ├── Publisher: dragnes-training (Cloud Run job)
    │       ├── Subscriber: dragnes-api (hot-reload)
    │       └── Subscriber: dragnes-cnn (hot-reload)
    │
    └── dragnes-alerts (monitoring alerts)
            ├── Publisher: various services
            └── Subscriber: Cloud Monitoring → PagerDuty
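
The fan-out implied by this topology can be modelled in memory to reason about which services see each event. The sketch does not use the Pub/Sub client library; `fan_out` and the "cloud-monitoring" label are illustrative.

```python
# Topic -> subscribers, copied from the configuration above.
SUBSCRIBERS = {
    "dragnes-classification": ["brain-server", "dragnes-analytics", "dragnes-alerts"],
    "dragnes-feedback": ["brain-server", "dragnes-analytics"],
    "dragnes-model-update": ["dragnes-api", "dragnes-cnn"],
    "dragnes-alerts": ["cloud-monitoring"],
}


def fan_out(topic: str, message: dict) -> list:
    """Return one (subscriber, message) delivery per subscription on the topic."""
    if topic not in SUBSCRIBERS:
        raise KeyError(f"unknown topic: {topic}")
    return [(subscriber, message) for subscriber in SUBSCRIBERS[topic]]
```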

Cloud Scheduler Jobs

Scheduled Jobs
    │
    ├── dragnes-model-retrain
    │       ├── Schedule: Weekly (Sunday 02:00 UTC)
    │       ├── Action: Trigger Cloud Run job for model retraining
    │       ├── Input: New feedback + brain embeddings since last train
    │       └── Output: New model version to GCS
    │
    ├── dragnes-drift-check
    │       ├── Schedule: Daily (06:00 UTC)
    │       ├── Action: Brain drift analysis on dermoscopy namespace
    │       └── Alert: If drift > 0.15, trigger early retrain
    │
    ├── dragnes-fairness-audit
    │       ├── Schedule: Weekly (Monday 08:00 UTC)
    │       ├── Action: Compute accuracy by Fitzpatrick type
    │       └── Alert: If disparity > 5%, flag for investigation
    │
    ├── dragnes-privacy-audit
    │       ├── Schedule: Daily (04:00 UTC)
    │       ├── Action: Verify no PII in Firestore/GCS
    │       └── Alert: Any PII detection triggers incident
    │
    └── dragnes-backup
            ├── Schedule: Daily (00:00 UTC)
            ├── Action: Firestore export to GCS
            └── Retention: 30 daily + 12 monthly + 7 yearly
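
Cloud Scheduler takes schedules in unix-cron format (minute, hour, day of month, month, day of week, with 0 meaning Sunday). The schedules above translate as sketched below, assuming each job's time zone is set to UTC; the `cron` helper is illustrative.

```python
def cron(minute: int, hour: int, day_of_week: str = "*") -> str:
    """Build a unix-cron expression that fires at hour:minute on the given weekday."""
    return f"{minute} {hour} * * {day_of_week}"


SCHEDULES = {
    "dragnes-model-retrain": cron(0, 2, "0"),   # weekly, Sunday 02:00
    "dragnes-drift-check": cron(0, 6),          # daily 06:00
    "dragnes-fairness-audit": cron(0, 8, "1"),  # weekly, Monday 08:00
    "dragnes-privacy-audit": cron(0, 4),        # daily 04:00
    "dragnes-backup": cron(0, 0),               # daily 00:00
}

for job, expression in SCHEDULES.items():
    print(f"{job}: {expression}")
```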

Security Configuration

Google Secret Manager

Secrets (extending existing pi.ruv.io secrets)
    │
    ├── dragnes-api-key              (API authentication key)
    ├── dragnes-jwt-signing-key      (JWT token signing)
    ├── dragnes-cmek-key-id          (CMEK key reference)
    ├── dragnes-oauth-client-id      (Google OAuth client)
    ├── dragnes-oauth-client-secret  (Google OAuth secret)
    ├── dragnes-firebase-config      (Firebase project config)
    └── dragnes-pubmed-api-key       (NCBI E-utilities key)

    Existing secrets reused:
    ├── ANTHROPIC_API_KEY            (for chat interface LLM)
    └── huggingface-token            (for model downloads)

IAM Configuration

Service Accounts
    │
    ├── dragnes-api@ruvector-brain-dev.iam.gserviceaccount.com
    │       ├── roles/run.invoker (invoke brain server)
    │       ├── roles/datastore.user (Firestore read/write)
    │       ├── roles/storage.objectViewer (model bucket)
    │       ├── roles/pubsub.publisher (classification events)
    │       └── roles/secretmanager.secretAccessor (secrets)
    │
    ├── dragnes-cnn@ruvector-brain-dev.iam.gserviceaccount.com
    │       ├── roles/storage.objectViewer (model bucket)
    │       └── roles/secretmanager.secretAccessor (secrets)
    │
    └── dragnes-training@ruvector-brain-dev.iam.gserviceaccount.com
            ├── roles/storage.objectAdmin (model bucket, write new versions)
            ├── roles/datastore.viewer (read feedback data)
            ├── roles/pubsub.publisher (model update events)
            └── roles/bigquery.dataViewer (analytics queries)

VPC Service Controls

VPC-SC Perimeter: dragnes-perimeter
    │
    ├── Protected Services
    │       ├── firestore.googleapis.com
    │       ├── storage.googleapis.com
    │       ├── bigquery.googleapis.com
    │       └── secretmanager.googleapis.com
    │
    ├── Access Levels
    │       ├── Corporate network CIDR ranges
    │       ├── Cloud Run service accounts (internal)
    │       └── Emergency break-glass accounts
    │
    └── Ingress Rules
            ├── Allow: Cloud Run → Firestore/GCS (internal)
            ├── Allow: Cloud Scheduler → Cloud Run (internal)
            └── Deny: All other access to protected services

Multi-Region Deployment

Region Selection

| Region | Role | Justification |
| --- | --- | --- |
| us-east1 (South Carolina) | Primary | Low latency to East Coast US; HIPAA eligible |
| us-west1 (Oregon) | Failover | West Coast coverage; disaster recovery |
| europe-west1 (Belgium) | EU Data Residency | GDPR compliance for EU practices |
| asia-southeast1 (Singapore) | Future | APAC coverage (Phase 4) |

Cross-Region Data Flow

Data Residency Rules
    │
    ├── Patient metadata: Region-locked (US data stays in US, EU in EU)
    ├── De-identified brain embeddings: Global (privacy-preserving)
    ├── Model weights: Global (no PHI)
    ├── Audit logs: Region-locked
    └── WASM/PWA assets: Global CDN
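
These residency rules amount to a lookup from data class and practice jurisdiction to a storage location. A minimal sketch, assuming US practices are homed in the primary region and EU practices in europe-west1; the snake_case data-class keys and the function name are illustrative.

```python
# Data classes from the residency rules above; the snake_case keys are illustrative.
REGION_LOCKED = {"patient_metadata", "audit_logs"}
GLOBAL = {"brain_embeddings", "model_weights", "pwa_assets"}

# Assumption: US practices use the primary region, EU practices use europe-west1.
HOME_REGION = {"US": "us-east1", "EU": "europe-west1"}


def storage_location(data_class: str, jurisdiction: str) -> str:
    """Return the region a record must be stored in, or "global" if unrestricted."""
    if data_class in GLOBAL:
        return "global"
    if data_class in REGION_LOCKED:
        return HOME_REGION[jurisdiction]  # KeyError for an unsupported jurisdiction
    raise ValueError(f"unclassified data class: {data_class}")
```

Raising on an unclassified data class keeps new data types from defaulting to global storage by accident.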

Monitoring & Observability

Cloud Monitoring Dashboard

DrAgnes Operations Dashboard
    │
    ├── Service Health
    │       ├── API latency (p50, p95, p99)
    │       ├── CNN inference latency
    │       ├── Error rate by endpoint
    │       ├── Active instances per region
    │       └── Request volume (per hour, per practice)
    │
    ├── Classification Metrics
    │       ├── Classifications per hour (global)
    │       ├── Distribution by lesion class
    │       ├── Average confidence score
    │       ├── Clinician override rate
    │       └── Sensitivity/specificity (rolling 30-day)
    │
    ├── Brain Health
    │       ├── Memory count (dermatology namespace)
    │       ├── Drift status
    │       ├── Embedding quality score
    │       └── Sync latency
    │
    ├── Privacy & Compliance
    │       ├── PII scan results (should always be 0)
    │       ├── DP budget consumption per practice
    │       ├── Access audit anomalies
    │       └── Witness chain verification failures
    │
    └── Cost Tracking
            ├── Cloud Run cost by service
            ├── Storage cost by bucket
            ├── Network egress cost
            └── Total monthly cost vs. budget

Alert Policies

| Alert | Condition | Severity | Action |
| --- | --- | --- | --- |
| API error rate > 1% | 5-min window | P2 | PagerDuty notification |
| CNN latency > 500ms (p95) | 15-min window | P3 | Slack notification |
| PII detected in cloud | Any occurrence | P1 | Immediate incident response |
| Melanoma sensitivity < 90% | 7-day rolling | P1 | Model freeze + investigation |
| Fairness disparity > 5% | Weekly audit | P2 | Investigation within 24 hours |
| Brain drift > 0.15 | Daily check | P3 | Trigger early retrain |
| DP budget > 80% for practice | Per check | P3 | Notify practice admin |
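
The thresholds in this table can be evaluated against a snapshot of metrics as sketched below. The metric keys are hypothetical, rates are assumed to be expressed as fractions (1% = 0.01), and the evaluation windows are left to the monitoring system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Policy:
    name: str
    severity: str
    breached: Callable[[float], bool]


# Thresholds and severities copied from the alert table; the keys are illustrative.
POLICIES: Dict[str, Policy] = {
    "api_error_rate": Policy("API error rate > 1%", "P2", lambda v: v > 0.01),
    "cnn_latency_p95_ms": Policy("CNN latency > 500ms (p95)", "P3", lambda v: v > 500),
    "pii_detections": Policy("PII detected in cloud", "P1", lambda v: v > 0),
    "melanoma_sensitivity": Policy("Melanoma sensitivity < 90%", "P1", lambda v: v < 0.90),
    "fairness_disparity": Policy("Fairness disparity > 5%", "P2", lambda v: v > 0.05),
    "brain_drift": Policy("Brain drift > 0.15", "P3", lambda v: v > 0.15),
    "dp_budget_used": Policy("DP budget > 80% for practice", "P3", lambda v: v > 0.80),
}


def firing(metrics: Dict[str, float]) -> List[Tuple[str, str]]:
    """Return (severity, alert name) for every breached policy, most severe first."""
    hits = [
        (POLICIES[key].severity, POLICIES[key].name)
        for key, value in metrics.items()
        if key in POLICIES and POLICIES[key].breached(value)
    ]
    return sorted(hits)
```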

Cost Projections

Monthly Cost Estimates (by Scale)

| Component | 10 Practices | 100 Practices | 1,000 Practices |
| --- | --- | --- | --- |
| Cloud Run (API) | $50 | $200 | $1,500 |
| Cloud Run (CNN) | $30 | $150 | $1,000 |
| Brain Server (shared) | $150 (existing) | $150 | $300 |
| Firestore | $10 | $50 | $300 |
| GCS (models + RVF) | $5 | $20 | $100 |
| Cloud CDN | $10 | $30 | $150 |
| Firebase Hosting | $0 (free tier) | $25 | $100 |
| Memorystore (Redis) | $0 (skip) | $50 | $100 |
| Cloud Monitoring | $0 (free tier) | $50 | $200 |
| Secret Manager | $1 | $1 | $5 |
| Pub/Sub | $1 | $5 | $30 |
| Cloud Scheduler | $1 | $1 | $5 |
| BigQuery (analytics) | $0 (free tier) | $20 | $100 |
| Total Monthly | ~$258 | ~$752 | ~$3,890 |
| Per Practice/Month | $25.80 | $7.52 | $3.89 |
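
The totals and per-practice figures can be reproduced by summing each column and dividing by the practice count. The sketch below uses only the numbers in the table; the variable names are illustrative.

```python
SCALES = (10, 100, 1000)  # number of practices per column

# Monthly cost in USD per component, one value per scale.
COSTS = {
    "Cloud Run (API)": (50, 200, 1500),
    "Cloud Run (CNN)": (30, 150, 1000),
    "Brain Server (shared)": (150, 150, 300),
    "Firestore": (10, 50, 300),
    "GCS (models + RVF)": (5, 20, 100),
    "Cloud CDN": (10, 30, 150),
    "Firebase Hosting": (0, 25, 100),
    "Memorystore (Redis)": (0, 50, 100),
    "Cloud Monitoring": (0, 50, 200),
    "Secret Manager": (1, 1, 5),
    "Pub/Sub": (1, 5, 30),
    "Cloud Scheduler": (1, 1, 5),
    "BigQuery (analytics)": (0, 20, 100),
}

totals = tuple(sum(row[i] for row in COSTS.values()) for i in range(len(SCALES)))
per_practice = tuple(round(total / n, 2) for total, n in zip(totals, SCALES))
print(totals, per_practice)
```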

Revenue Model

| Tier | Price | Features |
| --- | --- | --- |
| Starter | $99/mo/practice | 500 classifications/mo, WASM offline, basic brain |
| Professional | $199/mo/practice | Unlimited, LoRA adaptation, full brain, teledermatology |
| Enterprise | Custom | Multi-practice, EHR integration, dedicated support, SLA |
| Academic | Free | Research use, data contribution agreement |
| Underserved | Free | Qualifying community health centers |

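Break-even: approximately 30 practices on the Professional tier (30 × $199 = $5,970/month) at the 100-practice scale. The itemized infrastructure total at that scale (~$752/month) is covered by 4 Professional-tier practices (4 × $199 = $796), so the 30-practice estimate implies additional operating costs that are not itemized in the table above.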
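
The break-even count for any tier is the monthly cost divided by the tier price, rounded up. The sketch below applies that to the 100-practice infrastructure total and the listed tier prices. It covers infrastructure only, so staffing, support, and compliance spend would raise the result; the function name is illustrative.

```python
import math


def break_even_practices(monthly_cost: float, price_per_practice: float) -> int:
    """Smallest number of paying practices whose revenue covers the monthly cost."""
    if price_per_practice <= 0:
        raise ValueError("price must be positive")
    return math.ceil(monthly_cost / price_per_practice)


INFRA_COST_100_PRACTICES = 752  # USD/month, from the cost table
professional = break_even_practices(INFRA_COST_100_PRACTICES, 199)
starter = break_even_practices(INFRA_COST_100_PRACTICES, 99)
print(professional, starter)
```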

Deployment Pipeline

Deployment Pipeline (Cloud Build)
    │
    ├── Source: GitHub (ruvector/dragnes)
    ├── Trigger: Push to main branch
    │
    ├── Build Stage
    │       ├── Rust compilation (--release --target x86_64-unknown-linux-gnu)
    │       ├── WASM compilation (--target wasm32-unknown-unknown)
    │       ├── Docker image build (distroless base)
    │       └── SvelteKit build (npm run build)
    │
    ├── Test Stage
    │       ├── Unit tests (cargo test)
    │       ├── Integration tests (against staging brain)
    │       ├── WASM inference accuracy test (reference images)
    │       ├── Security scan (cargo audit + npm audit)
    │       └── HIPAA compliance checks (PII scanner)
    │
    ├── Deploy Stage (Canary)
    │       ├── Deploy to staging (full test suite)
    │       ├── Canary deployment (5% traffic for 30 minutes)
    │       ├── Monitor error rate and latency
    │       ├── Auto-rollback if error rate > 0.5%
    │       └── Promote to 100% if healthy
    │
    └── Post-Deploy
            ├── Smoke tests against production
            ├── Notify operations channel
            ├── Update model version registry
            └── Archive previous version artifacts
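
The canary gate in the deploy stage can be expressed as a three-way decision. This is a minimal sketch using the 0.5% error-rate limit and the 30-minute window from the pipeline above; the function is hypothetical, and latency is left out because the plan gives no latency threshold for rollback.

```python
def canary_decision(
    error_rate: float,
    minutes_elapsed: float,
    window_minutes: float = 30,
    max_error_rate: float = 0.005,
) -> str:
    """Return "rollback", "promote", or "hold" for a canary serving 5% of traffic."""
    if error_rate > max_error_rate:
        return "rollback"  # auto-rollback as soon as the limit is exceeded
    if minutes_elapsed >= window_minutes:
        return "promote"   # healthy for the full window: shift to 100%
    return "hold"          # healthy so far, keep observing
```

Checking the error rate before the elapsed time means a regression that appears late in the window still triggers a rollback instead of a promotion.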

Disaster Recovery

| Scenario | RTO | RPO | Recovery Procedure |
| --- | --- | --- | --- |
| Single region outage | 5 minutes | 0 (multi-region) | Automatic failover via Cloud LB |
| Firestore corruption | 1 hour | 24 hours | Restore from daily export |
| Model corruption | 10 minutes | N/A | Roll back to previous model version |
| Brain server outage | 5 minutes | 0 | Existing brain HA (pi.ruv.io) |
| Complete GCP outage | 4 hours | 24 hours | Multi-cloud DR (backup to AWS S3) |
| Security breach | 1 hour | N/A | Incident response plan activation |