# DrAgnes Google Cloud Deployment Plan **Status**: Research & Planning **Date**: 2026-03-21 ## Overview DrAgnes leverages the existing pi.ruv.io Google Cloud infrastructure, extending it with dermatology-specific services. The deployment follows a multi-region, HIPAA-compliant architecture using Google Cloud's BAA-covered services. ## Architecture Overview ``` ┌─────────────────────────────────┐ │ Cloud CDN + LB │ │ (Global, HTTPS termination) │ └──────────┬──────────────────────┘ │ ┌──────────────┼──────────────┐ │ │ │ ┌─────┴─────┐ ┌─────┴─────┐ ┌─────┴─────┐ │ us-east1 │ │ us-west1 │ │ europe-w1 │ │ (primary) │ │ (failover)│ │ (EU data) │ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │ │ │ ┌──────────┴──────────────┴──────────────┴──────────┐ │ Service Mesh │ │ │ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │ │ DrAgnes │ │ Brain │ │ CNN Model │ │ │ │ API │ │ Server │ │ Server │ │ │ │ (Cloud Run)│ │ (Cloud Run)│ │ (Cloud Run)│ │ │ └─────┬──────┘ └─────┬──────┘ └─────┬──────┘ │ │ │ │ │ │ │ ┌─────┴───────────────┴───────────────┴─────┐ │ │ │ Data Layer │ │ │ │ │ │ │ │ Firestore │ GCS │ Memorystore │ BigQuery │ │ │ └────────────────────────────────────────────┘ │ │ │ │ ┌────────────────────────────────────────────┐ │ │ │ Event Layer │ │ │ │ │ │ │ │ Pub/Sub │ Cloud Scheduler │ Cloud Tasks │ │ │ └────────────────────────────────────────────┘ │ │ │ │ ┌────────────────────────────────────────────┐ │ │ │ Security Layer │ │ │ │ │ │ │ │ IAM │ Secret Manager │ CMEK │ VPC-SC │ │ │ └────────────────────────────────────────────┘ │ └────────────────────────────────────────────────────┘ ``` ## Service Configuration ### 1. DrAgnes API Service (Cloud Run) Primary API service for classification requests and practice management. ```yaml # dragnes-api.yaml apiVersion: serving.knative.dev/v1 kind: Service metadata: name: dragnes-api annotations: run.googleapis.com/launch-stage: GA run.googleapis.com/ingress: internal-and-cloud-load-balancing spec: template: metadata: annotations: autoscaling.knative.dev/minInstances: "2" autoscaling.knative.dev/maxInstances: "100" run.googleapis.com/cpu-throttling: "false" run.googleapis.com/execution-environment: gen2 spec: containerConcurrency: 80 timeoutSeconds: 300 containers: - image: gcr.io/ruvector-brain-dev/dragnes-api:latest ports: - containerPort: 8080 resources: limits: cpu: "2" memory: 2Gi env: - name: BRAIN_URL value: "https://brain-server-internal.run.app" - name: MODEL_BUCKET value: "gs://dragnes-models" - name: RUST_LOG value: "info" startupProbe: httpGet: path: /health initialDelaySeconds: 5 periodSeconds: 5 ``` ### 2. CNN Model Server (Cloud Run) Server-side CNN inference for practices without WASM capability. ```yaml # dragnes-cnn.yaml apiVersion: serving.knative.dev/v1 kind: Service metadata: name: dragnes-cnn spec: template: metadata: annotations: autoscaling.knative.dev/minInstances: "1" autoscaling.knative.dev/maxInstances: "50" run.googleapis.com/cpu-throttling: "false" run.googleapis.com/execution-environment: gen2 spec: containerConcurrency: 20 timeoutSeconds: 30 containers: - image: gcr.io/ruvector-brain-dev/dragnes-cnn:latest ports: - containerPort: 8080 resources: limits: cpu: "4" memory: 4Gi env: - name: MODEL_PATH value: "/models/mobilenetv3_small_int8.bin" - name: SIMD_ENABLED value: "true" ``` **Performance Notes**: - Cloud Run gen2 provides AVX2 SIMD acceleration - INT8 quantized model fits in <5MB memory - Target: <50ms inference per image - Concurrency limited to 20 (CPU-bound workload) ### 3. Brain Server (Existing) The existing pi.ruv.io brain server at `brain-server-*.run.app` handles: - Knowledge graph management (316K edges) - HNSW search (128-dim, PiQ3 quantized) - PubMed integration - Sparsifier analytics (ADR-116) - Witness chain management **DrAgnes-specific extensions**: - New memory namespace: `dragnes-dermatology` - Custom similarity threshold for dermoscopic embeddings - Dermoscopy-specific PubMed search templates - Classification feedback ingestion endpoint ### 4. PWA Frontend (Firebase Hosting) ``` Firebase Hosting Configuration │ ├── Hosting │ ├── SPA routing (all paths → index.html) │ ├── CDN caching (immutable assets: 1 year) │ ├── WASM files: Cache-Control: public, max-age=31536000 │ ├── Model weights: Cache-Control: public, max-age=86400 │ └── API proxy: /api/** → Cloud Run dragnes-api │ ├── Service Worker (Workbox) │ ├── Precache: app shell, WASM module, model weights │ ├── Runtime cache: brain search results (stale-while-revalidate) │ ├── Background sync: diagnosis submissions │ └── Offline fallback page │ └── PWA Manifest ├── name: "DrAgnes" ├── display: "standalone" ├── orientation: "portrait" ├── theme_color: "#1a365d" └── icons: 192x192, 512x512 (maskable) ``` ## Data Storage ### Firestore (De-Identified Metadata) ``` Firestore Collections │ ├── /practices/{practiceId} │ ├── name: string │ ├── region: string │ ├── modelVersion: string │ ├── totalClassifications: number │ ├── dpBudgetUsed: number │ └── createdAt: timestamp │ ├── /classifications/{classificationId} │ ├── practiceId: string (hashed) │ ├── lesionClass: string │ ├── confidence: number │ ├── abcdeTotal: number │ ├── sevenPointScore: number │ ├── riskLevel: string │ ├── clinicianAction: string │ ├── fitzpatrickType: number (I-VI) │ ├── bodyLocationCategory: string │ ├── ageDecade: number │ ├── witnessHash: string │ └── createdAt: timestamp │ NOTE: No patient identifiers. No raw images. │ ├── /feedback/{feedbackId} │ ├── classificationId: string │ ├── clinicianReview: string │ ├── correctedClass: string (optional) │ ├── histopathResult: string (optional) │ └── createdAt: timestamp │ └── /modelVersions/{versionId} ├── version: string (semver) ├── trainedOn: number (embedding count) ├── accuracy: number ├── sensitivityMelanoma: number ├── specificityMelanoma: number ├── fairnessScore: number └── releasedAt: timestamp ``` **Firestore Security Rules**: - Practice-level tenant isolation - Write access: authenticated clinicians only - Read access: same practice only - Admin access: platform operators only - No cross-practice data access ### Google Cloud Storage (GCS) ``` GCS Buckets │ ├── gs://dragnes-models/ │ ├── mobilenetv3_small_int8.bin (INT8 model, ~5MB) │ ├── mobilenetv3_small_fp32.bin (FP32 model, ~15MB) │ ├── mobilenetv3_small.wasm (WASM module, ~2MB) │ ├── lora_weights/{practiceId}/latest.bin (per-practice LoRA) │ └── reference_embeddings/top1000.bin (offline cache) │ Encryption: CMEK (AES-256) │ Access: dragnes-api service account only │ ├── gs://dragnes-rvf/ │ ├── {contributorHash}/{memoryId}.rvf (RVF containers) │ Encryption: CMEK (AES-256) │ Access: brain server service account only │ Lifecycle: Archive after 90 days, delete after 7 years │ └── gs://dragnes-audit/ ├── access_logs/YYYY/MM/DD/*.jsonl ├── classification_logs/YYYY/MM/DD/*.jsonl └── security_events/YYYY/MM/DD/*.jsonl Encryption: CMEK (AES-256) Retention: 6 years (HIPAA minimum) Access: Security team only ``` ### Memorystore (Redis) -- Optional Performance Layer ``` Redis Instance (Basic tier, 1GB) │ ├── Session cache (15-min TTL) ├── Rate limiting counters (per-practice, per-hour) ├── HNSW search result cache (5-min TTL) └── Model version cache (1-hour TTL) ``` ## Event Architecture ### Pub/Sub Topics ``` Pub/Sub Configuration │ ├── dragnes-classification (new classification events) │ ├── Publisher: dragnes-api │ ├── Subscriber: brain-server (brain ingestion) │ ├── Subscriber: dragnes-analytics (BigQuery sink) │ └── Subscriber: dragnes-alerts (monitoring) │ ├── dragnes-feedback (clinician feedback events) │ ├── Publisher: dragnes-api │ ├── Subscriber: brain-server (model improvement) │ └── Subscriber: dragnes-analytics (accuracy tracking) │ ├── dragnes-model-update (model version events) │ ├── Publisher: dragnes-training (Cloud Run job) │ ├── Subscriber: dragnes-api (hot-reload) │ └── Subscriber: dragnes-cnn (hot-reload) │ └── dragnes-alerts (monitoring alerts) ├── Publisher: various services └── Subscriber: Cloud Monitoring → PagerDuty ``` ### Cloud Scheduler Jobs ``` Scheduled Jobs │ ├── dragnes-model-retrain │ ├── Schedule: Weekly (Sunday 02:00 UTC) │ ├── Action: Trigger Cloud Run job for model retraining │ ├── Input: New feedback + brain embeddings since last train │ └── Output: New model version to GCS │ ├── dragnes-drift-check │ ├── Schedule: Daily (06:00 UTC) │ ├── Action: Brain drift analysis on dermoscopy namespace │ └── Alert: If drift > 0.15, trigger early retrain │ ├── dragnes-fairness-audit │ ├── Schedule: Weekly (Monday 08:00 UTC) │ ├── Action: Compute accuracy by Fitzpatrick type │ └── Alert: If disparity > 5%, flag for investigation │ ├── dragnes-privacy-audit │ ├── Schedule: Daily (04:00 UTC) │ ├── Action: Verify no PII in Firestore/GCS │ └── Alert: Any PII detection triggers incident │ └── dragnes-backup ├── Schedule: Daily (00:00 UTC) ├── Action: Firestore export to GCS └── Retention: 30 daily + 12 monthly + 7 yearly ``` ## Security Configuration ### Google Secrets Manager ``` Secrets (extending existing pi.ruv.io secrets) │ ├── dragnes-api-key (API authentication key) ├── dragnes-jwt-signing-key (JWT token signing) ├── dragnes-cmek-key-id (CMEK key reference) ├── dragnes-oauth-client-id (Google OAuth client) ├── dragnes-oauth-client-secret (Google OAuth secret) ├── dragnes-firebase-config (Firebase project config) └── dragnes-pubmed-api-key (NCBI E-utilities key) Existing secrets reused: ├── ANTHROPIC_API_KEY (for chat interface LLM) └── huggingface-token (for model downloads) ``` ### IAM Configuration ``` Service Accounts │ ├── dragnes-api@ruvector-brain-dev.iam.gserviceaccount.com │ ├── roles/run.invoker (invoke brain server) │ ├── roles/datastore.user (Firestore read/write) │ ├── roles/storage.objectViewer (model bucket) │ ├── roles/pubsub.publisher (classification events) │ └── roles/secretmanager.secretAccessor (secrets) │ ├── dragnes-cnn@ruvector-brain-dev.iam.gserviceaccount.com │ ├── roles/storage.objectViewer (model bucket) │ └── roles/secretmanager.secretAccessor (secrets) │ └── dragnes-training@ruvector-brain-dev.iam.gserviceaccount.com ├── roles/storage.objectAdmin (model bucket, write new versions) ├── roles/datastore.viewer (read feedback data) ├── roles/pubsub.publisher (model update events) └── roles/bigquery.dataViewer (analytics queries) ``` ### VPC Service Controls ``` VPC-SC Perimeter: dragnes-perimeter │ ├── Protected Services │ ├── firestore.googleapis.com │ ├── storage.googleapis.com │ ├── bigquery.googleapis.com │ └── secretmanager.googleapis.com │ ├── Access Levels │ ├── Corporate network CIDR ranges │ ├── Cloud Run service accounts (internal) │ └── Emergency break-glass accounts │ └── Ingress Rules ├── Allow: Cloud Run → Firestore/GCS (internal) ├── Allow: Cloud Scheduler → Cloud Run (internal) └── Deny: All other access to protected services ``` ## Multi-Region Deployment ### Region Selection | Region | Role | Justification | |--------|------|---------------| | us-east1 (South Carolina) | Primary | Low latency to East Coast US; HIPAA eligible | | us-west1 (Oregon) | Failover | West Coast coverage; disaster recovery | | europe-west1 (Belgium) | EU Data Residency | GDPR compliance for EU practices | | asia-southeast1 (Singapore) | Future | APAC coverage (Phase 4) | ### Cross-Region Data Flow ``` Data Residency Rules │ ├── Patient metadata: Region-locked (US data stays in US, EU in EU) ├── De-identified brain embeddings: Global (privacy-preserving) ├── Model weights: Global (no PHI) ├── Audit logs: Region-locked └── WASM/PWA assets: Global CDN ``` ## Monitoring & Observability ### Cloud Monitoring Dashboard ``` DrAgnes Operations Dashboard │ ├── Service Health │ ├── API latency (p50, p95, p99) │ ├── CNN inference latency │ ├── Error rate by endpoint │ ├── Active instances per region │ └── Request volume (per hour, per practice) │ ├── Classification Metrics │ ├── Classifications per hour (global) │ ├── Distribution by lesion class │ ├── Average confidence score │ ├── Clinician override rate │ └── Sensitivity/specificity (rolling 30-day) │ ├── Brain Health │ ├── Memory count (dermatology namespace) │ ├── Drift status │ ├── Embedding quality score │ └── Sync latency │ ├── Privacy & Compliance │ ├── PII scan results (should always be 0) │ ├── DP budget consumption per practice │ ├── Access audit anomalies │ └── Witness chain verification failures │ └── Cost Tracking ├── Cloud Run cost by service ├── Storage cost by bucket ├── Network egress cost └── Total monthly cost vs. budget ``` ### Alert Policies | Alert | Condition | Severity | Action | |-------|-----------|----------|--------| | API error rate > 1% | 5-min window | P2 | PagerDuty notification | | CNN latency > 500ms (p95) | 15-min window | P3 | Slack notification | | PII detected in cloud | Any occurrence | P1 | Immediate incident response | | Melanoma sensitivity < 90% | 7-day rolling | P1 | Model freeze + investigation | | Fairness disparity > 5% | Weekly audit | P2 | Investigation within 24 hours | | Brain drift > 0.15 | Daily check | P3 | Trigger early retrain | | DP budget > 80% for practice | Per check | P3 | Notify practice admin | ## Cost Projections ### Monthly Cost Estimates (by Scale) | Component | 10 Practices | 100 Practices | 1,000 Practices | |-----------|-------------|--------------|-----------------| | Cloud Run (API) | $50 | $200 | $1,500 | | Cloud Run (CNN) | $30 | $150 | $1,000 | | Brain Server (shared) | $150 (existing) | $150 | $300 | | Firestore | $10 | $50 | $300 | | GCS (models + RVF) | $5 | $20 | $100 | | Cloud CDN | $10 | $30 | $150 | | Firebase Hosting | $0 (free tier) | $25 | $100 | | Memorystore (Redis) | $0 (skip) | $50 | $100 | | Cloud Monitoring | $0 (free tier) | $50 | $200 | | Secret Manager | $1 | $1 | $5 | | Pub/Sub | $1 | $5 | $30 | | Cloud Scheduler | $1 | $1 | $5 | | BigQuery (analytics) | $0 (free tier) | $20 | $100 | | **Total Monthly** | **~$258** | **~$752** | **~$3,890** | | **Per Practice/Month** | **$25.80** | **$7.52** | **$3.89** | ### Revenue Model | Tier | Price | Features | |------|-------|---------| | Starter | $99/mo/practice | 500 classifications/mo, WASM offline, basic brain | | Professional | $199/mo/practice | Unlimited, LoRA adaptation, full brain, teledermatology | | Enterprise | Custom | Multi-practice, EHR integration, dedicated support, SLA | | Academic | Free | Research use, data contribution agreement | | Underserved | Free | Qualifying community health centers | **Break-even**: approximately 30 practices on Professional tier covers infrastructure costs at the 100-practice scale. ## Deployment Pipeline ``` Deployment Pipeline (Cloud Build) │ ├── Source: GitHub (ruvector/dragnes) ├── Trigger: Push to main branch │ ├── Build Stage │ ├── Rust compilation (--release --target x86_64-unknown-linux-gnu) │ ├── WASM compilation (--target wasm32-unknown-unknown) │ ├── Docker image build (distroless base) │ └── SvelteKit build (npm run build) │ ├── Test Stage │ ├── Unit tests (cargo test) │ ├── Integration tests (against staging brain) │ ├── WASM inference accuracy test (reference images) │ ├── Security scan (cargo audit + npm audit) │ └── HIPAA compliance checks (PII scanner) │ ├── Deploy Stage (Canary) │ ├── Deploy to staging (full test suite) │ ├── Canary deployment (5% traffic for 30 minutes) │ ├── Monitor error rate and latency │ ├── Auto-rollback if error rate > 0.5% │ └── Promote to 100% if healthy │ └── Post-Deploy ├── Smoke tests against production ├── Notify operations channel ├── Update model version registry └── Archive previous version artifacts ``` ## Disaster Recovery | Scenario | RTO | RPO | Recovery Procedure | |----------|-----|-----|-------------------| | Single region outage | 5 minutes | 0 (multi-region) | Automatic failover via Cloud LB | | Firestore corruption | 1 hour | 24 hours | Restore from daily export | | Model corruption | 10 minutes | N/A | Roll back to previous model version | | Brain server outage | 5 minutes | 0 | Existing brain HA (pi.ruv.io) | | Complete GCP outage | 4 hours | 24 hours | Multi-cloud DR (backup to AWS S3) | | Security breach | 1 hour | N/A | Incident response plan activation |